Interception of KiDebugTrapOrFault
Although many of the proposed techniques for blocking PatchGuard have so far
relied on the fact that PatchGuard utilizes SEH to kick off execution of the
system integrity check, there are different approaches that can be taken which
do not rely on this specific PatchGuard implementation detail. One such
alternative technique for bypassing PatchGuard involves subverting the
kernel debug fault handler: KiDebugTrapOrFault. This handler represents
the entry point for all debug exceptions (such as so-called hardware
breakpoints), and as such presents an attractive target for bypassing
PatchGuard.
The basis of this proposed technique is to utilize a set of hardware
breakpoints to intercept execution at a convenient critical location within
PatchGuard's execution path leading up to the system integrity check. This
technique has a greater degree of flexibility than many of the previously
described techniques, though this flexibility comes at the cost of a significantly
more involved (and difficult) implementation. Specifically, one could use
this proposed technique to intercept control at any point critical to the
execution of PatchGuard's system integrity check (for example, the kernel DPC
dispatcher, one of the PatchGuard DPC routines, or a convenient location in
the exception dispatching code path, such as _C_specific_handler).
The means by which this interception of execution could be accomplished is by
assuming control of debug exception handling. This could be done in several
different ways; for example, one could hook KiDebugTrapOrFault or alter the
IDT directly to simply repoint the debug exception to driver-supplied code,
bypassing KiDebugTrapOrFault entirely. There are even ways that this
interception could be done in a way that is transparent to the current
PatchGuard implementation, such as by intercepting PsInvertedFunctionTable as
described in technique 4.3. A driver could then alter the unwind metadata for
KiDebugTrapOrFault and create an exception handler for this routine. This
step would allow transparent, first-chance access to all debug faults
(because KiDebugTrapOrFault internally constructs and dispatches a
STATUS_SINGLE_STEP exception describing the debug fault; normally, this would
present the STATUS_SINGLE_STEP exception to a debugger, but there is no
technical reason why a standard SEH-style exception handler could not catch
the exception). Regardless of how control of execution at the debug trap
handler is gained, the next step in this proposed approach is to
alter execution at the requested point of interest (whether it be the kernel
timer DPC dispatcher, which could be easily found by queuing a DPC and
executing a virtual unwind, or a PatchGuard DPC routine, or
_C_specific_handler or any other place of interest in the critical PatchGuard
execution path) to prevent PatchGuard's system integrity check from executing.
After the implementor has established control over the debug trap handler
(through whichever means desired), all that remains is to set
debug-register-based breakpoints on target locations. When these breakpoints
are hit, control is transferred to the debug trap handler, and from there to
the implementor's driver code which can act as necessary, such as by altering
the execution context of the processor at the time of the exception before
resuming execution.
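The bookkeeping behind a debug-register-based breakpoint can be sketched in portable C. The sketch below is illustrative only and is not taken from any driver: the function names are hypothetical, and it only computes the values that would be loaded into DR7 (or read from DR6), following the architectural bit layout (Gn at bit 2n+1, the R/Wn and LENn fields starting at bit 16+4n, and the B0..B3 hit bits in the low four bits of DR6). Actually writing the debug registers requires privileged instructions and is omitted here.

```c
#include <assert.h>
#include <stdint.h>

/* Return dr7 with breakpoint `slot` (0..3) armed as a global
 * instruction-execution breakpoint. Execution breakpoints require
 * R/Wn = 00 and LENn = 00, so the slot's whole 4-bit field is cleared. */
static uint64_t dr7_arm_exec(uint64_t dr7, unsigned slot)
{
    assert(slot < 4);
    dr7 &= ~(UINT64_C(0xF) << (16 + 4 * slot)); /* R/Wn = 00, LENn = 00 */
    dr7 |= UINT64_C(1) << (2 * slot + 1);       /* set Gn (global enable) */
    return dr7;
}

/* Return dr7 with breakpoint `slot` disabled (both Ln and Gn cleared). */
static uint64_t dr7_disarm(uint64_t dr7, unsigned slot)
{
    assert(slot < 4);
    dr7 &= ~(UINT64_C(3) << (2 * slot));
    return dr7;
}

/* Return the lowest breakpoint slot reported as hit in a DR6 value
 * (bits B0..B3), or -1 if none of the four bits is set. */
static int dr6_hit_slot(uint64_t dr6)
{
    for (int slot = 0; slot < 4; slot++) {
        if (dr6 & (UINT64_C(1) << slot))
            return slot;
    }
    return -1;
}
```

In a real handler the target address would go in the matching DR0..DR3 register, and the slot reported by DR6 would select which interception logic to run.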
The advantages of this approach over directly patching into kernel code (i.e.
opcode replacement) are threefold. First, it is more flexible in that there
are no difficulties with placing an absolute 64-bit jump in an arbitrary
location (in x64, this typically takes around 12 opcode bytes to do from any
arbitrary location in memory). For example, one does not have to worry about
whether the opcode space overwritten by the jump might overlap a whole
instruction boundary that is a jump target, which might lead to invalid code
being executed. Secondly, this approach can be used to get out of having to
implement a disassembler (or other similar forms of code analysis) in kernel
mode, as hardware breakpoints allow one to gain control of execution at a
precise location without having to worry about creating enough space for a
jump patch, and then placing the original instructions back into a jump stub
to allow execution to resume at the original effective instruction stream (if
desired). Finally, if done correctly, this technique could be implemented in
a truly race-condition free manner (as the only patching that would need to be
done is an interlocked 8-byte swap of a pointer-aligned value in
PsInvertedFunctionTable, if one took that approach).
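To make the 12-byte figure concrete, the following sketch encodes one common form of an absolute 64-bit jump: mov rax, imm64 followed by jmp rax. This particular sequence is an assumption chosen for illustration (the text does not say which encoding is meant), and encode_abs_jmp is a hypothetical helper name. Note that this form clobbers rax, which is one more constraint a jump patch has to account for and a breakpoint does not.

```c
#include <stddef.h>
#include <stdint.h>

#define ABS_JMP_SIZE 12

/* Write "mov rax, target ; jmp rax" into out and return its length.
 *   48 B8 <imm64>   mov rax, imm64   (10 bytes)
 *   FF E0           jmp rax          ( 2 bytes)                     */
static size_t encode_abs_jmp(uint8_t out[ABS_JMP_SIZE], uint64_t target)
{
    out[0] = 0x48;                                 /* REX.W prefix    */
    out[1] = 0xB8;                                 /* mov rax, imm64  */
    for (int i = 0; i < 8; i++)
        out[2 + i] = (uint8_t)(target >> (8 * i)); /* little-endian   */
    out[10] = 0xFF;                                /* jmp r/m64 ...   */
    out[11] = 0xE0;                                /* ... with rax    */
    return ABS_JMP_SIZE;
}
```

Every one of those 12 bytes has to land on whole instructions that are not themselves branch targets, which is the difficulty a hardware breakpoint sidesteps.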
This approach does require that the implementor pick a location (or
multiple locations) in the kernel at which breakpoints are to be set
in order to gain execution control. There are many
possibilities, such as the DPC dispatcher (where one could filter
out the PatchGuard DPC by detecting, say, invalid kernel pointers in
DeferredContext), the exception dispatcher path (where one could
unwind past a PatchGuard DPC's access violation exception), a
PatchGuard DPC itself (where one could again unwind past with
RtlVirtualUnwind, bypassing PatchGuard if the DPC is being invoked
by PatchGuard), or any other choice area. One of the advantages of
this approach is that it is comparatively easy to intercept
execution anywhere in the kernel that can be reliably located across
kernel versions, which potentially makes it a great deal easier to
adapt to future PatchGuard implementations than some of the
previously discussed bypass techniques.
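As a rough illustration of the DeferredContext filter mentioned above, the sketch below tests whether a 64-bit value is a canonical address. Two assumptions are made here that the text does not state: that "invalid" can be approximated as "non-canonical", and that the processor implements 48-bit virtual addresses (so bits 63..47 must be all zeros or all ones). The function names are hypothetical, and a legitimate driver may of course pass arbitrary values in DeferredContext, so a real filter would need further checks.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if va is canonical under 48-bit virtual addressing, i.e. the
 * upper 17 bits (63..47) are either all clear or all set. */
static bool is_canonical(uint64_t va)
{
    uint64_t upper = va >> 47;
    return upper == 0 || upper == 0x1FFFF;
}

/* Hypothetical filter predicate: flag a DPC whose DeferredContext
 * value could not be dereferenced as a pointer at all. */
static bool deferred_context_is_noncanonical(uint64_t deferred_context)
{
    return !is_canonical(deferred_context);
}
```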
Normally, the kernel has logic in place that prevents stray kernel addresses
from being placed in debug registers by user mode code via NtSetContextThread.
It may be necessary to make additional alterations to ensure that the custom
values in the debug registers are persisted across context switches, via the
same mechanisms used by the kernel debugger to persist debug registers.
In the author's opinion, this technique would be difficult for
Microsoft to defeat in principle, barring hardware support (like
virtualization). Although Microsoft could move around critical code
paths for PatchGuard, this technique presents a general mechanism by
which any location in the kernel could be surreptitiously
intercepted, thus lending itself to relatively easy adaptation to
future PatchGuard revisions. One approach that could be taken is to
perform increased validation of the debug trap handler in an attempt
to make it more difficult to intercept without being detected by
PatchGuard or some other validation mechanism. Other counters to
this sort of tactic (in general) would be to make it difficult to
reliably locate all of the critical code paths in a consistent and
reliable manner across all kernel versions, from the perspective of
a third party driver. This is likely to prove difficult, as a great
deal of the internal workings of the kernel are exposed in some way
to drivers (e.g. exported functions), or are otherwise indirectly
exposed to drivers (e.g. trap labels via the IDT, exception handlers
via unwind metadata and exports used in the process of dispatching
exceptions to SEH registrations). Completely insulating PatchGuard
from all such externally visible locations (that could be
comparatively easily compromised by a third party driver) would, as
a result, likely be an arduous task.
The debug trap handler can be used to do more than simply evade PatchGuard for
purposes of allowing conventional kernel code patches via opcode replacement.
It can also be utilized in order to completely eliminate the need to perform
opcode-replacement-based kernel patches in order to gain execution control.
In this vein, via assuming control of the debug trap handler in a way that is
transparent to PatchGuard (such as via the proposed
PsInvertedFunctionTable-based approach), it would then be possible to set
debug-register-based breakpoints at every address of interest (assuming that
there are enough debug registers to patch all of the locations of interest). From
the debug trap handler, it is possible to completely alter the execution
context at the point of the debug exception, which is exactly the same as what
one could do via traditional opcode-replacement-based patching for a given
location. This sort of transparent patching would be extremely difficult for
Microsoft to detect, because the debug registers must remain available for use
by the kernel debugger. Without completely crippling the ability of the
kernel debugger to set breakpoints without being attached before PatchGuard is
initialized, the author does not see a particularly viable (i.e. without a
trivial workaround) way for Microsoft to prevent the use of debug registers to
alter execution context at select points in the kernel (from a third party
driver). Because such an approach would capitalize on the fact that Microsoft
must, from a business case perspective, make it possible for IHVs and ISVs to
debug their code on Windows, the author believes that it would be unlikely to
be successfully disabled by Microsoft. Furthermore, because such techniques
can be implemented without even having the basic requirement of disabling
PatchGuard, they would be inherently much more likely to work with future
PatchGuard revisions. After all, if PatchGuard can't even detect changes to
the kernel (because kernel code isn't being patched), then there is no reason
to even bother with trying to disable it, which gets one out of the
comparatively messy business of playing catch-up with Microsoft with each new
PatchGuard revision.
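The idea of patching by altering the execution context can be modeled in a few lines of C. The sketch below is a user-mode simulation under stated assumptions, not kernel code: struct trap_context is a hypothetical stand-in for the saved register state, struct bp_redirect and handle_debug_trap are invented names, and the table is capped at four entries to mirror the four address debug registers. It relies on two architectural facts: an instruction breakpoint is reported with the saved instruction pointer equal to the breakpoint address, and setting the resume flag (RF, bit 16 of EFLAGS) lets execution continue at that same address without immediately faulting again.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_HW_BREAKPOINTS 4
#define EFLAGS_RF (UINT32_C(1) << 16)   /* resume flag */

/* Hypothetical, minimal stand-in for the saved register state. */
struct trap_context {
    uint64_t rip;
    uint32_t eflags;
};

/* One entry per armed breakpoint: where it is set, and where execution
 * should continue instead (0 means "observe only, resume in place"). */
struct bp_redirect {
    uint64_t address;
    uint64_t replacement;
};

/* Returns true if the trap belonged to one of our breakpoints and the
 * context was adjusted; false means it should be passed along unchanged. */
static bool handle_debug_trap(struct trap_context *ctx,
                              const struct bp_redirect *table, size_t count)
{
    for (size_t i = 0; i < count && i < MAX_HW_BREAKPOINTS; i++) {
        if (table[i].address != ctx->rip)
            continue;
        if (table[i].replacement != 0)
            ctx->rip = table[i].replacement; /* divert, like a jump patch */
        else
            ctx->eflags |= EFLAGS_RF;        /* resume without re-faulting */
        return true;
    }
    return false;
}
```

The effect on control flow is the same as a jump patch at each listed address, but no kernel code bytes are modified.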