Uninformed: Informative Information for the Uninformed

Vol 6» 2007.Jan



Interception of KiDebugTrapOrFault

Although many of the proposed techniques for blocking PatchGuard have so far relied on the fact that PatchGuard utilizes SEH to kick off execution of the system integrity check, there are different approaches that can be taken which do not rely on this specific PatchGuard implementation detail. One such alternative technique for bypassing PatchGuard involves subverting the kernel debug fault handler: KiDebugTrapOrFault. This handler represents the entry point for all debug exceptions (such as so-called hardware breakpoints), and as such presents an attractive target for bypassing PatchGuard.

The basis of this proposed technique is to utilize a set of hardware breakpoints to intercept execution at a convenient critical location within PatchGuard's execution path leading up to the system integrity check. This technique has a greater degree of flexibility than many of the previously described techniques, though this flexibility comes at the cost of a significantly more involved (and difficult) implementation. Specifically, one could use this proposed technique to intercept control at any point critical to the execution of PatchGuard's system integrity check (for example, the kernel DPC dispatcher, one of the PatchGuard DPC routines, or a convenient location in the exception dispatching code path, such as _C_specific_handler).

This interception of execution could be accomplished by assuming control of debug exception handling. This could be done in several different ways; for example, one could hook KiDebugTrapOrFault, or alter the IDT directly to repoint the debug exception to driver-supplied code, bypassing KiDebugTrapOrFault entirely. There are even ways to perform this interception that are transparent to the current PatchGuard implementation, such as by intercepting PsInvertedFunctionTable as described in technique 4.3. A driver could then alter the unwind metadata for KiDebugTrapOrFault and create an exception handler for this routine. This step would allow transparent, first-chance access to all debug faults, because KiDebugTrapOrFault internally constructs and dispatches a STATUS_SINGLE_STEP exception describing the debug fault. Normally, this would present the STATUS_SINGLE_STEP exception to a debugger, but there is no technical reason why a standard SEH-style exception handler could not catch the exception. Regardless of how control of execution at the debug trap handler is gained, the next step in this proposed approach is to alter execution at the chosen point of interest to prevent PatchGuard's system integrity check from executing. That point could be the kernel timer DPC dispatcher (which could be easily found by queuing a DPC and executing a virtual unwind), a PatchGuard DPC routine, _C_specific_handler, or any other place of interest in the critical PatchGuard execution path.
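The decision logic of such an exception handler can be sketched in portable C. This is a minimal user-mode simulation and not kernel code: the sim_ structures are hypothetical, trimmed-down stand-ins for EXCEPTION_RECORD and CONTEXT, the hook table is an invented illustration, and only the STATUS_SINGLE_STEP value (0x80000004) is taken from the real platform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Real NTSTATUS value carried by debug (single-step/breakpoint) exceptions. */
#define SIM_STATUS_SINGLE_STEP 0x80000004u

/* Filter dispositions, mirroring the usual SEH filter return values. */
#define SIM_CONTINUE_EXECUTION (-1)
#define SIM_CONTINUE_SEARCH    0

/* Hypothetical stand-in for the parts of EXCEPTION_RECORD used here. */
typedef struct {
    uint32_t code;     /* exception code */
    uint64_t address;  /* address of the instruction that faulted */
} sim_exception_record;

/* Hypothetical stand-in for the parts of CONTEXT used here. */
typedef struct {
    uint64_t rip;      /* where execution resumes */
} sim_context;

/* One interception point: where the breakpoint sits, where to resume. */
typedef struct {
    uint64_t breakpoint;
    uint64_t redirect;
} sim_hook;

/*
 * Decide what to do with an exception raised by the debug trap handler.
 * Only debug exceptions at a known breakpoint address are claimed; the
 * saved instruction pointer is then rewritten so that execution resumes
 * at the redirect target. Everything else is passed along untouched.
 */
int sim_debug_filter(const sim_exception_record *rec, sim_context *ctx,
                     const sim_hook *hooks, size_t count)
{
    assert(rec != NULL && ctx != NULL);

    if (rec->code != SIM_STATUS_SINGLE_STEP)
        return SIM_CONTINUE_SEARCH;

    for (size_t i = 0; i < count; i++) {
        if (hooks[i].breakpoint == rec->address) {
            ctx->rip = hooks[i].redirect;
            return SIM_CONTINUE_EXECUTION;
        }
    }
    return SIM_CONTINUE_SEARCH;
}
```

Passing unrelated exceptions along with a "continue search" disposition matters in this design, since the kernel debugger and ordinary single-stepping share the same exception code.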

After the implementor has established control over the debug trap handler (through whichever means desired), all that remains is to set debug-register-based breakpoints on target locations. When these breakpoints are hit, control is transferred to the debug trap handler, and from there to the implementor's driver code which can act as necessary, such as by altering the execution context of the processor at the time of the exception before resuming execution.
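As an illustration of what setting a debug-register-based breakpoint involves, the following sketch computes the DR7 control value for an execute breakpoint and decodes DR6 to find which slot fired. It only manipulates integers and can be run in user mode; actually loading the debug registers requires privileged code that is not shown. The function names are hypothetical, and the choice of the local enable bit rather than the global one is an assumption (the processor treats either bit as sufficient to arm a slot).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return a DR7 value with breakpoint slot `slot` (0..3) armed as an
 * execute breakpoint. For slot n, the R/W field occupies bits 16+4n..17+4n
 * and the LEN field bits 18+4n..19+4n. An instruction breakpoint needs
 * R/W = 00 and LEN = 00, so all four bits are cleared.
 */
uint64_t dr7_enable_exec(uint64_t dr7, unsigned slot)
{
    assert(slot < 4);
    dr7 &= ~(0xFull << (16 + 4 * slot)); /* R/W = 00, LEN = 00 */
    dr7 |= 1ull << (2 * slot);           /* local enable bit Ln */
    return dr7;
}

/*
 * Return the lowest-numbered breakpoint slot reported in DR6 (bits B0..B3),
 * or -1 if DR6 reports none of them.
 */
int dr6_hit_slot(uint64_t dr6)
{
    for (int slot = 0; slot < 4; slot++) {
        if (dr6 & (1ull << slot))
            return slot;
    }
    return -1;
}
```

The target address itself would go in the matching DR0 through DR3 register. If the handler resumes at the same instruction rather than redirecting, the resume flag (RF, bit 16 of EFLAGS) in the saved context is what keeps the instruction breakpoint from firing again immediately.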

The advantages of this approach over directly patching into kernel code (i.e. opcode replacement) are threefold. First, it is more flexible in that there are no difficulties with placing an absolute 64-bit jump in an arbitrary location (on x64, this typically takes around 12 opcode bytes from any arbitrary location in memory). For example, one does not have to worry about whether the opcode space overwritten by the jump might extend across an instruction boundary that is itself a jump target, which might lead to invalid code being executed. Secondly, this approach removes the need to implement a disassembler (or other similar forms of code analysis) in kernel mode, as hardware breakpoints allow one to gain control of execution at a precise location without having to make enough space for a jump patch and then relocate the original instructions into a jump stub so that execution can resume in the original instruction stream (if desired). Finally, if done correctly, this technique could be implemented in a truly race-condition-free manner (as the only patching that would need to be done is an interlocked 8-byte swap of a pointer-aligned value in PsInvertedFunctionTable, if one took that approach).
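To make the 12-byte figure concrete, here is a small sketch that emits one common absolute-jump sequence for x64: mov rax, imm64 (10 bytes) followed by jmp rax (2 bytes). The text does not say which sequence the author had in mind, so this particular encoding is an assumption, and encode_abs_jmp is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ABS_JMP_SIZE 12

/*
 * Write "mov rax, target; jmp rax" into `out` and return its length.
 * Note that the sequence clobbers rax, and that all 12 bytes overwrite
 * whatever instructions previously occupied that space.
 */
size_t encode_abs_jmp(uint8_t out[ABS_JMP_SIZE], uint64_t target)
{
    assert(out != NULL);

    out[0] = 0x48;                 /* REX.W prefix            */
    out[1] = 0xB8;                 /* mov rax, imm64          */
    for (int i = 0; i < 8; i++)    /* little-endian immediate */
        out[2 + i] = (uint8_t)(target >> (8 * i));
    out[10] = 0xFF;                /* jmp rax                 */
    out[11] = 0xE0;
    return ABS_JMP_SIZE;
}
```

Every one of those 12 bytes has to land on instructions that can be safely displaced, which is the constraint a hardware breakpoint avoids entirely.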

This approach does require that the implementor pick a location (or multiple locations) in the kernel on which to set breakpoints in order to gain execution control. There are many possibilities, such as the DPC dispatcher (where one could filter out the PatchGuard DPC by detecting, say, invalid kernel pointers in DeferredContext), the exception dispatcher path (where one could unwind past a PatchGuard DPC's access violation exception), a PatchGuard DPC itself (where one could again unwind past it with RtlVirtualUnwind, bypassing PatchGuard if the DPC is being invoked by PatchGuard), or any other choice area. One of the advantages of this approach is that it is comparatively easy to intercept execution anywhere in the kernel that can be reliably located across kernel versions, which potentially makes it much easier to adapt to future PatchGuard implementations than some of the previously discussed bypass techniques.
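The DeferredContext filter mentioned above could take roughly the following shape. This sketch assumes that "invalid kernel pointer" means a non-canonical address under 48-bit virtual addressing, which is an interpretation rather than something the text states; sim_kdpc is a hypothetical stand-in for the relevant field of the real DPC object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the one DPC field this check looks at. */
typedef struct {
    uint64_t deferred_context;
} sim_kdpc;

/*
 * With 48-bit virtual addresses, an address is canonical when bits 63..47
 * are all zeros or all ones. Dereferencing anything else faults.
 */
int is_canonical48(uint64_t va)
{
    uint64_t top = va >> 47;  /* the upper 17 bits */
    return top == 0 || top == 0x1FFFF;
}

/* Flag a DPC whose context value could not be dereferenced as a pointer. */
int dpc_context_is_suspicious(const sim_kdpc *dpc)
{
    assert(dpc != NULL);
    return !is_canonical48(dpc->deferred_context);
}
```

A legitimate driver may pass an arbitrary integer rather than a pointer as its context, so a check like this would need to be combined with other evidence before a DPC is dropped.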

Normally, the kernel has logic in place that prevents stray kernel addresses from being placed in debug registers by user mode code via NtSetContextThread. It may be necessary to make additional alterations to ensure that the custom values in the debug registers are persisted across context switches, via the same mechanisms used by the kernel debugger to persist debug registers.

In the author's opinion, this technique would be difficult for Microsoft to defeat in principle, barring hardware support (like virtualization). Although Microsoft could move around critical code paths for PatchGuard, this technique presents a general mechanism by which any location in the kernel could be surreptitiously intercepted, thus lending itself to relatively easy adaptation to future PatchGuard revisions. One approach that could be taken is to perform increased validation of the debug trap handler in an attempt to make it more difficult to intercept without being detected by PatchGuard or some other validation mechanism. Another counter to this sort of tactic (in general) would be to make it difficult for a third party driver to locate all of the critical code paths reliably across all kernel versions. This is likely to prove difficult, as a great deal of the internal workings of the kernel are exposed in some way to drivers (e.g. exported functions), or are otherwise indirectly exposed to drivers (e.g. trap labels via the IDT, exception handlers via unwind metadata, and exports used in the process of dispatching exceptions to SEH registrations). Completely insulating PatchGuard from all such externally visible locations (that could be comparatively easily compromised by a third party driver) would, as a result, likely be an arduous task.

The debug trap handler can be used to do more than simply evade PatchGuard for purposes of allowing conventional kernel code patches via opcode replacement. It can also be used to completely eliminate the need to perform opcode-replacement-based kernel patches in order to gain execution control. In this vein, by assuming control of the debug trap handler in a way that is transparent to PatchGuard (such as via the proposed PsInvertedFunctionTable-based approach), it would then be possible to set debug-register-based breakpoints at every address of interest (assuming that there are enough debug registers to cover all of the locations of interest; the architecture provides four breakpoint address registers, DR0 through DR3). From the debug trap handler, it is possible to completely alter the execution context at the point of the debug exception, which is exactly the same as what one could do via traditional opcode-replacement-based patching for a given location. This sort of transparent patching would be extremely difficult for Microsoft to detect, because the debug registers must remain available for use by the kernel debugger. Without completely crippling the ability of the kernel debugger to set breakpoints without being attached before PatchGuard is initialized, the author does not see a particularly viable (i.e. without a trivial workaround) way for Microsoft to prevent the use of debug registers to alter execution context at select points in the kernel (from a third party driver). Because such an approach would capitalize on the fact that Microsoft must, from a business case perspective, make it possible for IHVs and ISVs to debug their code on Windows, the author believes that it would be unlikely to be successfully disabled by Microsoft. Furthermore, because such techniques can be implemented without even having the basic requirement of disabling PatchGuard, they would be inherently much more likely to work with future PatchGuard revisions. After all, if PatchGuard can't even detect changes to the kernel (because kernel code isn't being patched), then there is no reason to even bother with trying to disable it, which gets one out of the comparatively messy business of playing catch-up with Microsoft with each new PatchGuard revision.