Hybrid Exception Interception and Memory Searching

As PatchGuard 3 utilizes completely randomized (self-decrypting) blocks of code and data for its constituent PatchGuard contexts in the SEH execution case, it is not generally possible to trivially locate and disable PatchGuard contexts through a non-paged pool scan. Additionally, because PatchGuard 3 no longer relies upon SEH to invoke PatchGuard in all cases, it is also not generally possible to disable PatchGuard 3 reliably via interception of the SEH dispatching code path.

While these defenses do complement one another, there still exist weaknesses that can be exploited by a third party. Specifically, when PatchGuard is running through a re-purposed DPC routine that is invoked via SEH, it is vulnerable in that the SEH dispatching code path can be intercepted to locate (and disable) PatchGuard just before it is executed. Furthermore, in the case where PatchGuard runs without any SEH obfuscation, it is vulnerable to a memory search, as there is (necessarily) some static code placed in non-paged pool memory which makes the translation between the DPC function calling convention and the PatchGuard stage 1 decryption routine's calling convention.

By combining a memory search approach with the previously described SEH interception approach, it is possible to attack both launch vectors of PatchGuard simultaneously, with the effect of disabling it no matter which vector(s) are used in a particular boot.

However, there are still some sticking points that need to be resolved in the SEH interception case. As previously mentioned, the SEH-obfuscation-based launch vector was significantly improved over PatchGuard 2, with obfuscation of the exception information and randomization of the call stack from the point of view of the exception dispatcher logic itself. These obstacles must be overcome in order to successfully mount an attack using this approach against PatchGuard 3.

The first problem, the obfuscation and randomization of the exception information, turns out not to be the roadblock that one might think at first glance. There are some weaknesses in the obfuscation logic that allow the true colors of the exception to show through if one is clever about examining the information available at the point of _C_specific_handler. Furthermore, it is also possible to hook at a lower level than _C_specific_handler, such as KiGeneralProtectionFault (easily located by examining the IDT), which would get one in before the assembly-language exception handler logic has a chance to fudge the exception information.

Although the KiGeneralProtectionFault vector is easier to implement, in that it completely bypasses one of the new defensive mechanisms with respect to the SEH-related PatchGuard execution code path, it is still possible to attack PatchGuard using _C_specific_handler by relying upon information leakage when _C_specific_handler is called. Specifically, three facts hold: all exceptions altered by PatchGuard originate within the confines of the kernel itself; all of these exceptions have two parameters (most of the "legitimate" versions of exceptions like STATUS_INSUFFICIENT_RESOURCES always have zero parameters, because they originate from within RtlRaiseStatus, which never stores any exception parameters in the exception record); and somewhere in the call stack, the kernel routine responsible for dispatching DPCs or timer DPCs is going to be present.

By combining these facts, it is possible to make a highly accurate determination as to whether an exception is caused by PatchGuard. The latter piece of information (checking whether the routine responsible for calling the DPC or timer DPC is in the call stack) also proves valuable when one must later counteract the second defense added to the SEH code path, that is, the randomization of the call stack.
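
The three observations above can be combined into a single predicate. The following is only a minimal sketch in portable C, not the implementation that accompanies the article. The names mock_exception, address_range, in_range and looks_like_patchguard are hypothetical and were invented for illustration; the structure is a simplified stand-in rather than the real exception record; and the address ranges of the kernel image and of the two dispatcher routines are assumed to have been determined separately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of what the handler can observe.
 * This is NOT the layout of the real exception record. */
typedef struct {
    uint64_t raise_address;    /* where the exception originated */
    uint32_t parameter_count;  /* number of exception parameters */
} mock_exception;

/* Half-open address range [base, base + size). */
typedef struct {
    uint64_t base;
    uint64_t size;
} address_range;

static bool in_range(const address_range *range, uint64_t address)
{
    return address >= range->base && address - range->base < range->size;
}

/* Returns true only when all three conditions described in the text hold:
 * kernel origin, exactly two parameters, and a DPC or timer DPC dispatcher
 * somewhere in the call stack. */
static bool looks_like_patchguard(const mock_exception *exception,
                                  const address_range *kernel_image,
                                  const address_range *dispatchers,
                                  size_t dispatcher_count,
                                  const uint64_t *return_addresses,
                                  size_t frame_count)
{
    assert(exception != NULL && kernel_image != NULL);

    if (!in_range(kernel_image, exception->raise_address))
        return false;
    if (exception->parameter_count != 2)
        return false;

    for (size_t f = 0; f < frame_count; f++)
        for (size_t d = 0; d < dispatcher_count; d++)
            if (in_range(&dispatchers[d], return_addresses[f]))
                return true;

    return false;
}
```

Requiring all three conditions at once is a deliberate choice in this sketch: each condition on its own would match some legitimate exceptions, so the conjunction is what keeps false positives low.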

In order to determine whether the DPC or timer DPC dispatcher is in a given call stack, it is first necessary to locate it in the kernel image. There are some complications here. First of all, the timer DPC dispatcher routine has three call instructions that can call a timer DPC, not all of which are readily triggerable. Additionally, neither the timer DPC dispatcher nor the DPC dispatcher is exported.

However, while it is not possible to simply ask for the addresses of those two routines, it is possible to find them programmatically by requesting that a DPC and a timer DPC be executed through the documented APIs for DPCs and timer DPCs. From within the DPC or timer DPC routine, it is then possible to locate the return address via the use of the _ReturnAddress() compiler intrinsic. This works because the return address will be guaranteed to reside within the DPC or timer DPC dispatcher. Alternatively, an assembly language routine could be written that simply examines the current pointer at [rsp] at the time of the call.
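
The return-address trick can be demonstrated outside of the kernel. The following is a minimal user-mode sketch, assuming a compiler that provides either the _ReturnAddress() intrinsic (Microsoft) or __builtin_return_address (GCC and Clang). The names mock_dispatcher, probe_routine and g_dispatcher_address are hypothetical; mock_dispatcher merely stands in for the unexported kernel dispatcher, and a real implementation would instead queue an actual DPC and timer DPC through the documented APIs.

```c
#include <assert.h>
#include <stddef.h>

#if defined(_MSC_VER)
#include <intrin.h>
#define CALLER_RETURN_ADDRESS() _ReturnAddress()
#define NOINLINE __declspec(noinline)
#else
#define CALLER_RETURN_ADDRESS() __builtin_return_address(0)
#define NOINLINE __attribute__((noinline))
#endif

typedef void (*deferred_routine)(void *context);

/* Filled in by the probe with an address inside whichever routine called it. */
static void *volatile g_dispatcher_address = NULL;

/* Plays the role of the DPC routine: it records where it will return to.
 * It must not be inlined, or it would have no return address of its own. */
NOINLINE static void probe_routine(void *context)
{
    (void)context;
    g_dispatcher_address = CALLER_RETURN_ADDRESS();
}

/* Hypothetical stand-in for the dispatcher whose address is not exported.
 * Returning a value after the call keeps the call from becoming a tail
 * call, so the recorded return address lies inside this function. */
NOINLINE static int mock_dispatcher(deferred_routine routine, void *context)
{
    assert(routine != NULL);
    routine(context);
    return 1;
}
```

After mock_dispatcher(probe_routine, NULL) has run, g_dispatcher_address holds the address of the instruction following the call inside mock_dispatcher, which is exactly the kind of "known address within the dispatcher" that the technique requires.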

This still leaves a problem in the timer DPC dispatcher case, as there are three call instructions, and it is not easy to observe calls from all three call sites within the timer DPC dispatcher on demand, since it is necessary to programmatically find the return points at runtime. However, once again, the very same metadata that is critical to x64 SEH support dooms PatchGuard with respect to this approach, as it is possible to go from an arbitrary instruction in the middle of any function to the start of that function by following chained unwind metadata until an unwind metadata block is reached that has no parent[3]. This top-level unwind metadata block has a reference to the first instruction in the function.

Now that it is possible to locate the start of a function from any arbitrary valid instruction location within that function, it becomes trivial to determine whether two addresses reside in the same function: one need only follow the unwind metadata chain for both addresses, and then check whether both top-level unwind metadata blocks refer to the same function. With this technique, combined with the ability to locate at least one call site within the timer DPC dispatcher, it again becomes possible to identify the timer DPC dispatcher, as no matter which call site is used, that call site is guaranteed to reside within the timer DPC dispatcher routine KiTimerExpiration. By comparing top-level unwind metadata blocks, it becomes possible to authoritatively discern whether any arbitrary instruction resides within the timer DPC dispatcher or not.
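
The chain-following comparison can be sketched as follows. This is an illustrative model only: mock_unwind_entry is a hypothetical, flattened stand-in for the real x64 unwind metadata (where the link to the parent is stored inside the unwind information itself rather than as a table index), and the names find_entry, function_start and same_function were invented for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NO_ENTRY ((size_t)-1)

/* Simplified model of one unwind metadata block. */
typedef struct {
    uint32_t begin;   /* first address covered by this block */
    uint32_t end;     /* one past the last address covered */
    size_t   parent;  /* index of the parent block, or NO_ENTRY if top-level */
} mock_unwind_entry;

/* Finds the block covering the given address, or NO_ENTRY if none does. */
static size_t find_entry(const mock_unwind_entry *table, size_t count,
                         uint32_t address)
{
    for (size_t i = 0; i < count; i++)
        if (address >= table[i].begin && address < table[i].end)
            return i;
    return NO_ENTRY;
}

/* Follows parent links until a block with no parent is reached and reports
 * the first address of that top-level block. The walk is bounded by the
 * table size so that a malformed cyclic chain cannot loop forever. */
static bool function_start(const mock_unwind_entry *table, size_t count,
                           uint32_t address, uint32_t *start)
{
    size_t index = find_entry(table, count, address);
    if (index == NO_ENTRY)
        return false;

    for (size_t hops = 0; hops < count; hops++) {
        if (table[index].parent == NO_ENTRY) {
            *start = table[index].begin;
            return true;
        }
        index = table[index].parent;
        if (index >= count)
            return false;
    }
    return false;
}

/* Two addresses are in the same function when both chains end at the
 * same top-level block. */
static bool same_function(const mock_unwind_entry *table, size_t count,
                          uint32_t a, uint32_t b)
{
    uint32_t start_a, start_b;
    return function_start(table, count, a, &start_a) &&
           function_start(table, count, b, &start_b) &&
           start_a == start_b;
}
```

Given one address known to lie inside the timer DPC dispatcher (for example, a captured return address), same_function can then answer whether any other address, such as one of the other two call sites, belongs to that same routine.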

It is also possible to bypass the alterations to the exception (and instruction pointer) addresses that KiCustomAccessHandler (the assembly-language "first chance" exception handler routines for the repurposed DPC routines) makes by performing a stack trace from the _C_specific_handler itself instead of relying on the context record or exception handler information. This is because the call stack is conveyed as if the faulting instruction in the repurposed DPC call stack was the site of a call to KiGeneralProtectionFault. As a result, it is possible to substitute the current context for the context presented to _C_specific_handler for unwind purposes. This also provides a layer of defense against Microsoft altering other registers in the exception handler context in future PatchGuard revisions, which could cause manual unwinds to return incorrect register values, resulting in system crashes after an unwind intended to effect a hard return out of the re-purposed DPC routine.

Furthermore, by clever usage of this mechanism for determining whether an address resides within a particular function, it is also now possible to determine the real return address for any given re-purposed DPC routine. Specifically, by checking whether each address in the call stack as of _C_specific_handler is within either the DPC dispatcher or the timer DPC dispatcher, one can determine whether a given call frame corresponds to the call site that called the re-purposed DPC routine or not, irrespective of how many bogus function calls may be layered on top of the re-purposed DPC. This in turn defeats the remaining improvement to the SEH PatchGuard code path, as it once again becomes possible to cleanly unwind from any arbitrary point in the PatchGuard exception call stack.
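
As a rough illustration of the frame search just described, the sketch below walks a list of return addresses (innermost frame first) and reports the first one that lies within a dispatcher. All names here (find_dispatcher_frame, dispatcher_test, mock_dispatcher_range, in_mock_dispatcher) are hypothetical. The membership test is passed in as a callback because, in practice, it would be the unwind-metadata comparison rather than the simple range check used for demonstration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FRAME_NOT_FOUND ((size_t)-1)

/* Answers whether a return address lies within a dispatcher routine. */
typedef bool (*dispatcher_test)(uint64_t return_address, void *context);

/* Returns the index of the innermost frame whose return address lies in a
 * dispatcher. That frame belongs to the re-purposed DPC routine; every
 * frame before it is one of the bogus calls layered on top. */
static size_t find_dispatcher_frame(const uint64_t *return_addresses,
                                    size_t frame_count,
                                    dispatcher_test is_dispatcher,
                                    void *context)
{
    assert(is_dispatcher != NULL);

    for (size_t i = 0; i < frame_count; i++)
        if (is_dispatcher(return_addresses[i], context))
            return i;
    return FRAME_NOT_FOUND;
}

/* Demonstration-only membership test over a half-open range [begin, end). */
typedef struct {
    uint64_t begin;
    uint64_t end;
} mock_dispatcher_range;

static bool in_mock_dispatcher(uint64_t return_address, void *context)
{
    const mock_dispatcher_range *range = context;
    return return_address >= range->begin && return_address < range->end;
}
```

The index returned identifies the frame to unwind to; unwinding that far has the effect of returning directly to the dispatcher, regardless of how many intermediate frames were inserted.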

Through the combination of the ability to either circumvent entirely or "see through" the deception that KiCustomAccessHandler creates over the exception information passed to _C_specific_handler, and the ability to recover the correct return address of a repurposed DPC routine, it now becomes possible to disable the SEH control flow path of PatchGuard 3. This leaves the remaining problem of locating the non-SEH control flow path of PatchGuard in non-paged pool memory as the last piece of the puzzle with respect to this method of disabling PatchGuard. However, locating the trampoline routine that adapts a DPC routine call to a PatchGuard stage 1 decryption stub call is trivial, as the adapter trampoline is static and contains a very recognizable signature in terms of the constants written to the beginning of the decryption stub. In order to disable the trampoline routine, it is enough to simply patch it with a "ret" instruction (effectively the same thing as the SEH bypass technique, but as implemented in code instead of a virtual unwind).
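
The search-and-patch step can be sketched in a few lines. This is a minimal model operating on an ordinary buffer, not on non-paged pool, and it makes two assumptions that should be stated loudly: the signature bytes used with it are placeholders (the actual constants are not reproduced here), and the signature is assumed to begin at the first byte of the routine to be neutralized. The names find_signature and neutralize_routine are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NOT_FOUND ((size_t)-1)
#define X64_RET   0xC3  /* single-byte near return instruction */

/* Returns the offset of the first occurrence of the signature within the
 * region, or NOT_FOUND if it does not occur. */
static size_t find_signature(const uint8_t *region, size_t region_size,
                             const uint8_t *signature, size_t signature_size)
{
    assert(region != NULL && signature != NULL);

    if (signature_size == 0 || signature_size > region_size)
        return NOT_FOUND;

    for (size_t i = 0; i + signature_size <= region_size; i++)
        if (memcmp(region + i, signature, signature_size) == 0)
            return i;
    return NOT_FOUND;
}

/* Overwrites the first byte of the matched routine with a return, so that
 * calling it returns immediately. ASSUMPTION: the signature starts at the
 * routine's first instruction. Returns the patched offset or NOT_FOUND. */
static size_t neutralize_routine(uint8_t *region, size_t region_size,
                                 const uint8_t *signature,
                                 size_t signature_size)
{
    size_t offset = find_signature(region, region_size,
                                   signature, signature_size);
    if (offset != NOT_FOUND)
        region[offset] = X64_RET;
    return offset;
}
```

Because the patch destroys the first byte of the signature, a second search over the same region no longer matches, which gives a simple way to confirm that the routine has already been handled.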

The source code to a working implementation of the hybrid exception interception and memory searching bypass technique for PatchGuard 3 is included with the article.

Although this approach is successful in disabling the current iteration of PatchGuard 3, it is not without its weaknesses. Microsoft could, for instance, defeat this technique by altering the SEH-less PatchGuard DPC-to-decryption-stub adapter so that it is no longer static (i.e., by randomizing the code placed into non-paged pool at runtime). There are also a number of assumptions of the SEH-based approach that could be invalidated by Microsoft in a future PatchGuard release. However, in keeping with the fact that it is possible to gain control flow at a lower level than the exception dispatcher path itself (i.e., by patching KiGeneralProtectionFault), the author feels that it would be better to focus on removing relevant information before any exception handlers (assembler or C-language) are called, rather than after the defining moment (in other words, the exception) occurs, as it is the exception that presents the first easily-accessible interception point to an outside attacker.