Subverting PatchGuard

PatchGuard currently possesses a formidable array of defensive mechanisms that are aimed at making it difficult to reverse engineer and debug. Given that Microsoft does not currently have in place the infrastructure to make PatchGuard enforced by hardware, this is arguably the best that Microsoft will ever really be able to do in the short term. They're only able to build a system that is based on obfuscation and anti-debugging techniques in an attempt to make it difficult for third parties to detect, disable, or bypass it.

There are other classes of software that seek to create defenses similar to PatchGuard's. However, these other classes usually have far more nefarious purposes than preventing third parties from patching the kernel. Specifically, anti-debugging, anti-reverse-engineering, and self-decrypting code have often been used to hide viruses, rootkits, and other malicious software on compromised systems.

Although Microsoft may have intended the defensive mechanisms employed by PatchGuard for an (arguably) good cause, these same anti-debugging, anti-detection, and anti-reverse-engineering techniques that protect PatchGuard from attack by third party drivers can also be subverted to protect custom code from detection or analysis by anti-virus or anti-rootkit software. In this respect, Microsoft has created a double-edged sword, as the same elaborate obfuscation and anti-debugging schemes that guard PatchGuard against third party software can also be used to guard malicious software from system security software. It is in fact quite possible to subvert PatchGuard version 2's myriad defenses to execute custom code instead of PatchGuard's system integrity check routine. While doing so is not exactly trivial, it is far from impossible.

In order to subvert PatchGuard to do one's bidding, one must first catch PatchGuard in the act, so to speak. To accomplish this, the author recommends turning to one of the proposed bypass techniques as a starting place. For example, consider the first proposed bypass technique, wherein the author recommends hooking _C_specific_handler to intercept control of execution at the exception generated by the PatchGuard DPC routine in order to trigger execution of the system integrity check. An implementation of this bypass technique provides direct access to the machine context inside the PatchGuard DPC routine, and this machine context contains the information necessary to locate the PatchGuard system integrity check routine.

Since the objective is to repurpose the system integrity check routine to execute custom code, this is a good starting point. However, determining the location of the system integrity check routine is much more involved than simply skipping over PatchGuard's checks entirely; the pointer to the routine in question is encrypted based off of the original arguments to the DPC (the Dpc and DeferredContext arguments). Additionally, the original arguments to the PatchGuard DPC have at this point already been moved from registers to the stack and obfuscated (rotated left or right by a magical constant). As the original contents of the argument registers are deliberately overwritten by the DPC routine before the access violation is triggered, there is no choice other than to somehow fish the DPC arguments out of the caller's stack. This is actually somewhat of a challenge, given that such an approach must work for all kernel versions, and must also work for all of the different DPC permutations. Since this set of possibilities represents an unmaintainably large number of routines to reverse engineer in order to determine rotate obfuscation values and stack offsets, a more generalized approach to locating the original arguments on the stack must be taken. In order to create such a generic approach, one must take a closer look at the first few instructions of each DPC routine (leading up to the intentional access violation). Although PatchGuard has put into place several barriers to prevent easy retrieval of the original arguments from this context, there might be a pattern or weakness that could be exploited in order to recover the arguments in question.

The basic things common to each DPC routine, when it comes to the machine context at the time of the access violation, are:

The original arguments have been stored on the stack in an obfuscated form (rotated left or right by an arbitrary magical constant).
The access violation always occurs by dereferencing rax. Here, rax is always the deobfuscated form of the DeferredContext argument. This gives us one of the arguments for free, as rax in the register context at the time of the access violation is always the plaintext DeferredContext value.
The stack location where the Dpc argument is stored varies greatly from one DPC routine to another. Furthermore, it also varies between different kernel flavors within an operating system family, and between operating system families. As a result, it is not practical to hardcode stack displacements for this argument.
The instruction immediately prior to the faulting instruction is always an instruction in the form of ror rax, <magical constant>. Here, the magical constant is an immediate value, which means that it is encoded as a part of the instruction itself. Each DPC has its own unique magical constant, and the magical constants used do not change for a particular DPC flavor across all kernel flavors and operating system families. This gives us a nice way to quickly identify which of the ten PatchGuard DPCs is in use from the context of the _C_specific_handler hook (without having to do ugly code fingerprinting or analysis). Unfortunately, we still don't have a way to determine the stack displacement of the Dpc argument.
The r8 register is always equal to the original Dpc argument, shifted right by the low byte of the DeferredContext argument. Although this may seem tantalizingly close to what we're looking for, it can't actually be used as a substitute for the original Dpc argument, even though the DeferredContext argument is known here (due to the value of rax). This is because the right shift operation is destructive, in that information is permanently lost as bits are shifted right off of the register into oblivion. As a result, depending on the low byte of the DeferredContext argument, important bits in the Dpc argument have already been permanently lost in the pseudo-copy residing in r8.
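
As a rough illustration of how the rotate constant could be read back out of the instruction stream, the following is a minimal user-mode sketch. It assumes the compiler emits the imm8 form of the rotate (encoded as 48 C1 C8 followed by the one-byte immediate), so that the four bytes directly before the faulting instruction hold the whole ror instruction. The function name ExtractRotateConstant is hypothetical and does not come from the example program.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Encoding of "ror rax, imm8" on x64: REX.W (48), opcode C1, ModRM C8
 * (mod=11, /1, rm=rax), followed by the one-byte immediate.
 */
#define ROR_RAX_IMM8_LENGTH 4

/*
 * Given a pointer to the faulting instruction (the exception rip), look at
 * the four bytes immediately before it. Returns the rotate constant
 * (0..255), or -1 if those bytes are not a "ror rax, imm8" instruction.
 * ExtractRotateConstant is a hypothetical helper name.
 */
int ExtractRotateConstant(const uint8_t *FaultingInstruction)
{
    const uint8_t *Ror;

    assert(FaultingInstruction != NULL);

    Ror = FaultingInstruction - ROR_RAX_IMM8_LENGTH;

    if (Ror[0] != 0x48 || Ror[1] != 0xC1 || Ror[2] != 0xC8)
        return -1;

    return Ror[3];
}
```

A hook would pass the rip value from the exception's machine context and use the returned constant to decide which DPC routine raised the fault. A result of -1 suggests that the exception did not originate from a PatchGuard DPC at all.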

Although the situation may initially appear grim, it is in fact still possible to locate the Dpc argument given the above information; all that is needed is a bit of work (and getting one's hands dirty with some ugly tricks). Specifically, it is possible to search the stack frame of the DPC routine for the Dpc argument with a brute-force attack. This isn't exactly elegant, but it gets the job done. There are a number of hints that can be used to increase the chance of successfully finding the real Dpc argument on the stack:

The stack is 8-byte aligned (at least) due to x64 calling convention requirements, and the Microsoft C/C++ compiler will always place pointer-sized values on the stack in 8-byte-aligned locations. As a result, the search can be narrowed down to 8-byte-aligned locations on the stack, instead of a bytewise search.
Because the identity of the current DPC routine is known (due to analyzing the ror instruction immediately preceding the faulting mov eax, [rax] instruction), the rotate constant used to obfuscate the Dpc argument is known. Each DPC routine has its own unique magical rotate constant, and as the current DPC routine has been positively identified, the rotate constant used to obfuscate the Dpc argument on the stack is thus also known.
A quick check as to whether a value on the stack could possibly be the Dpc argument can be made by rotating the value on the stack by the known obfuscation constant, then shifting the value right by the low byte in the DeferredContext argument and comparing the result to the r8 value at the time of the exception. If there is a mismatch, then the current stack location can be eliminated from the search. This does not provide a positive match, but it does provide a way to positively eliminate possibilities. This step is also optional, in that it is still possible to locate the Dpc argument without relying on r8; the check against r8 is simply an optimization.
The Dpc argument should point to a valid non-paged pool address, given that it must represent a valid kernel pointer. In order to check that this is the case, MmIsAddressValid can be used to test whether the deobfuscated value in question is a valid pointer or not. (Yes, MmIsAddressValid is a bit of a race condition and certainly a hack. The author would like to note that this approach was described as requiring that the implementor get his or her "hands dirty with some ugly tricks", in an attempt to forestall the inevitable complaints that this approach is an unstomachable ugly hack.)
The Dpc argument should point to a valid non-paged pool address whose length is great enough to contain a KDPC object, plus at least one pointer-sized additional field. A secondary MmIsAddressValid test can be used to verify that the pointer describes a valid region large enough to contain the KDPC object, plus the additional pointer-sized field following it (the PatchGuard decryption key).
The Dpc argument should point to a DPC whose Type and DeferredContext arguments have been zeroed. (The DPC routine intentionally zeros these values in the DPC before intentionally triggering an access violation.) If the suspected Dpc argument, when treated as a PKDPC, does not have these properties then it can be eliminated as a possibility.

By repeatedly applying these rules to every applicable location within a reasonable distance upward from the rsp value at the time of the exception (say, 256 bytes, although the exact size can be greater; the only requirement is that the entire local variable space of the DPC routine with the largest local variable space is completely contained within the search region), it is possible to recover the Dpc argument with virtual certainty. In the author's experience, this technique works quite reliably, even though one might expect a search of an unknown stack frame to be prone to failing or turning up false positives.
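
The search described above can be sketched in user-mode C as follows. This is an illustration under stated assumptions and not the code from the example program: the stack is modeled as an array of 8-byte slots copied from the exception-time rsp upward, the obfuscation is assumed to be undone with a right rotate, the shift that produced r8 is assumed to behave like a shr by cl (whose count the processor masks to six bits for 64-bit operands), and the MmIsAddressValid and KDPC field checks are represented by a hypothetical callback. The names FindDpcArgument and CANDIDATE_CHECK are invented for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback standing in for the pointer and KDPC sanity checks. */
typedef int (*CANDIDATE_CHECK)(uint64_t Candidate);

/* Rotate a 64-bit value right by Count bits (Count is reduced modulo 64). */
static uint64_t RotateRight64(uint64_t Value, unsigned Count)
{
    Count &= 63;
    return Count ? (Value >> Count) | (Value << (64 - Count)) : Value;
}

/*
 * Scan 8-byte-aligned stack slots for the obfuscated Dpc argument.
 *
 * Slots           - copy of the stack starting at the exception-time rsp
 * SlotCount       - number of 8-byte slots to examine
 * RotateConstant  - per-DPC constant recovered from the ror instruction
 * DeferredContext - plaintext value taken from rax
 * R8              - r8 at the time of the exception
 * IsPlausibleDpc  - extra validation; may be NULL to skip it
 *
 * Returns the deobfuscated candidate, or 0 if no slot survives the filters.
 */
uint64_t FindDpcArgument(const uint64_t *Slots, size_t SlotCount,
                         unsigned RotateConstant, uint64_t DeferredContext,
                         uint64_t R8, CANDIDATE_CHECK IsPlausibleDpc)
{
    /* Assumption: shr-by-cl semantics, so only the low six bits count. */
    unsigned ShiftCount = (unsigned)(DeferredContext & 0xFF) & 63;
    size_t i;

    assert(Slots != NULL);

    for (i = 0; i < SlotCount; i++) {
        /* Assumption: the stored value is undone with a right rotate. */
        uint64_t Candidate = RotateRight64(Slots[i], RotateConstant);

        /* Cheap negative filter: must reproduce the lossy copy in r8. */
        if ((Candidate >> ShiftCount) != R8)
            continue;

        if (IsPlausibleDpc != NULL && !IsPlausibleDpc(Candidate))
            continue;

        return Candidate;
    }

    return 0;
}
```

With a 256-byte search window, SlotCount would be 32. Because the r8 comparison can only eliminate candidates, a kernel-mode version would supply a callback that performs the pointer validity and zeroed-field checks before accepting a match.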

After both the Dpc and DeferredContext arguments to the PatchGuard DPC routine have been recovered, it is a simple matter of analyzing how PatchGuard invokes the system integrity check in order to determine how to locate it in-memory. This has been discussed previously, and it amounts to the following set of statements:

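ULONG64 DecryptionKey, PatchGuardCheckFunction;

//
// The decryption key is the pointer-sized field that
// directly follows the KDPC object.
//

DecryptionKey            = *(PULONG64)((PUCHAR)Dpc + 0x40);
PatchGuardCheckFunction  = DecryptionKey ^ (ULONG64)DeferredContext;
PatchGuardCheckFunction |= 0xFFFFF80000000000;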

At this point, it's almost possible to replace the system integrity check routine with custom code. However, there is still the matter of the pesky self-decrypting stub that runs before the check function. Because the DPC routine's exception handler rewrites the first instruction of the stub before it is executed, one doesn't have a whole lot of choice but to implement at least a very basic version of the decryption stub for the system integrity check routine.

Recall that the first instruction in the stub is set to the following:

lock xor qword ptr [rcx],rdx

Looking at the prototype for the decryption stub, rcx corresponds to the address of the decryption stub itself, and rdx corresponds to the decryption key. Since this instruction modifies both itself and the next instruction (the instruction is four bytes long and the xor alters eight bytes), the replacement code for the system integrity check routine must allow the first instruction to be the above xor instruction, and must allow for the second instruction (at a minimum) to be initially xor-obfuscated. For simplicity's sake, the author has chosen to implement the simplest possible solution to this conundrum, which is to make the second instruction in the replacement code a duplicate of the first instruction. In other words, the replacement code would read as follows:

;
; This instruction is forced on us by PatchGuard,
; and cannot be altered; it is rewritten at runtime.
;

lock xor qword ptr [rcx],rdx

;
; The next instruction, conveniently four bytes
; long, re-encrypts itself by xoring the first
; eight bytes of the decryption stub (which includes
; the second instruction) by the decryption key a
; second time;
;

lock xor qword ptr [rcx],rdx

;
; (... any custom code may follow here ...)
;

As noted previously, after specially constructing the replacement code, it is necessary to initially encrypt the second instruction (as it will be immediately decrypted by the first instruction). This must be done before control is returned to PatchGuard.
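
To make the byte-level effect concrete, the following user-mode sketch builds such a replacement buffer and models what each xor does to it. It is a simulation and not the code from the example program. It relies on the instruction encoding F0 48 31 11 for lock xor qword ptr [rcx],rdx, and it applies the key one byte at a time to mirror the little-endian memory layout on x64. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Encoding of "lock xor qword ptr [rcx], rdx": F0 48 31 11. */
static const uint8_t LockXorInstruction[4] = { 0xF0, 0x48, 0x31, 0x11 };

/*
 * Model the effect of "lock xor qword ptr [rcx], rdx" on the first eight
 * bytes of the stub. Working bytewise mirrors the x64 little-endian layout.
 */
void XorFirstQword(uint8_t *Stub, uint64_t Key)
{
    size_t i;

    assert(Stub != NULL);

    for (i = 0; i < 8; i++)
        Stub[i] ^= (uint8_t)(Key >> (8 * i));
}

/*
 * Lay out the replacement stub in Buffer (at least 8 + CustomCodeLength
 * bytes): the forced first instruction, the pre-encrypted duplicate, then
 * the custom code in plaintext.
 */
void PrepareReplacementStub(uint8_t *Buffer, uint64_t Key,
                            const uint8_t *CustomCode, size_t CustomCodeLength)
{
    size_t i;

    assert(Buffer != NULL);

    memcpy(Buffer, LockXorInstruction, 4);
    memcpy(Buffer + 4, LockXorInstruction, 4);

    /* Pre-encrypt only the second instruction, using key bytes 4..7. */
    for (i = 4; i < 8; i++)
        Buffer[i] ^= (uint8_t)(Key >> (8 * i));

    if (CustomCodeLength != 0)
        memcpy(Buffer + 8, CustomCode, CustomCodeLength);
}
```

Calling XorFirstQword once on a prepared buffer leaves the second instruction in plaintext, which models the state just after the first instruction has run. Calling it a second time restores the first instruction and re-encrypts the second, after which execution would fall through to the custom code at offset 8.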

After the custom code is configured and the second instruction is encrypted, all that remains is to copy the custom code over the PatchGuard decryption stub. When this is accomplished, the PatchGuard DPC's exception handler will invoke the supplied custom code instead of the system integrity check routine.

However, this is not really all that interesting due to the fact that PatchGuard utilizes a one-shot timer. The custom code that was substituted for the decryption stub will never be run again. To account for this fact, it would be prudent to place a call to queue a timer with an associated DPC routine (pointing to the DPC routine that PatchGuard selected at boot) within the custom code block.

At this point, it is possible to simply allow the normal exception dispatching process to continue (i.e. to resume _C_specific_handler), after which the custom code will be invoked instead of PatchGuard. In essence, PatchGuard has been not only disabled, but completely subverted to call customized code under the control of a third party driver instead of the system integrity check.

Still, the situation is less than optimal. Presently, there is still a hook in _C_specific_handler that is there for anyone to see (and recognize that someone has tampered with the kernel). Additionally, the driver that was used to subvert PatchGuard in the first place is still loaded, which may also be a tell-tale sign that someone has done something unsavory to the kernel.

These problems are also solvable, however. It turns out that after PatchGuard has been subverted, it is safe to unhook from _C_specific_handler, and then simply call back into _C_specific_handler after the hook is removed. Furthermore, everything necessary to run the subverted system integrity check routine could even reside within PatchGuard's own internal data structures; for example, one could simply utilize extra space after the custom code, where the decryption stub and PatchGuard check routine would normally reside, as a parameter block. This is especially convenient, as the custom code block is given a pointer to itself in rcx (the first argument), and it is easy to add a known constant value to that pointer in order to retrieve the parameter block for the custom code. At this point, all of the code and data necessary for the custom code that the driver has subverted PatchGuard with is located in dynamically allocated memory. Given this, the original driver is no longer needed and can even be unloaded (so as to further disguise the fact that any alterations to the kernel have taken place). After the driver has been unloaded, the only traces of the alterations that have taken place would be the unloaded module list (easily modified), and the re-written PatchGuard system integrity routine itself (which could easily be bolstered to be self-decrypting, with a differing encryption key, in order to make for a target that is extremely difficult to locate in-memory).
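
The parameter block idea can be sketched as follows. Everything in this sketch is hypothetical: the offset value, the structure layout, and the field names are chosen purely for illustration and are not taken from the example program. The only element drawn from the text is that the custom code receives its own address in rcx and adds a known constant to it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed distance from the start of the code to its data. */
#define PARAMETER_BLOCK_OFFSET 0x100

/* Hypothetical parameter layout; the fields are illustrative only. */
typedef struct _CUSTOM_PARAMETERS {
    uint64_t DpcRoutine;     /* DPC routine to re-queue */
    uint64_t TimerPeriod;    /* delay before the next run */
} CUSTOM_PARAMETERS;

/* Self is the value the custom code receives in rcx (its own address). */
CUSTOM_PARAMETERS *GetParameterBlock(void *Self)
{
    assert(Self != NULL);
    return (CUSTOM_PARAMETERS *)((uint8_t *)Self + PARAMETER_BLOCK_OFFSET);
}
```

Because the block is addressed relative to the code itself, no global variable or driver image needs to stay resident for the custom code to find its data.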

The end result is that PatchGuard has been disabled, and in its place, arbitrary custom code is periodically executed. Furthermore, no modifications or patches to kernel code or global data are present and no suspicious drivers (or even suspicious extraneous memory allocations) remain present in memory. In essence, the only traces of the fact that PatchGuard has been subverted would be visible only to someone (or something) that knows how to locate and disable PatchGuard.

The supplied example program for subverting PatchGuard is fairly simple, and it does not utilize all of the defensive technologies employed by PatchGuard. For instance, it does not change the decryption key on every execution, nor does it follow through with keeping the entire code block encrypted except just before execution. These features could be easily added, however, and would greatly increase the difficulty of locating the subverted PatchGuard code in memory.