TLB Desynchronization (Split TLB)

All x86 processors that support protected mode and paging employ a caching scheme to speed up the translation of virtual addresses to physical addresses. This scheme is implemented via a set of Translation Lookaside Buffers, or TLBs, which cache the page attributes (and the associated physical address) for a given virtual address. Recent x86 processors (Pentium II-class or later) utilize several sets of TLBs, such as one set for data accesses and one set for instruction accesses. In normal system operation, these TLBs (if a processor supports multiple TLBs) maintain a consistent view of the attributes of a particular page; however, it is possible to deliberately desynchronize their contents, thereby creating the illusion that a single page has different attributes depending on whether it is referenced as data or as executable code.

This deliberate desynchronization of TLBs has many uses, from the implementation of no-execute support (utilized by PaX/GRsec on GNU/Linux [6]) to ``memory cloaking'', a technique often used by rootkits to provide one view of memory when it is referenced as data by a read operation and a different view when it is referenced by an instruction fetch. The same memory cloaking technique that has appealed to rootkit developers as a means of hiding rootkits from detection can also be used to hide kernel patching from PatchGuard's integrity check. Strictly speaking, the proposed technique is not a bypass mechanism for PatchGuard; rather, it is a mechanism to hide kernel patching from PatchGuard, thus making PatchGuard harmless to third parties that are patching the kernel.
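The desynchronized state described above can be illustrated with a small user-mode model. This is only a sketch under simplifying assumptions, not how a processor or any real driver is implemented: each TLB holds a single entry, only one page is mapped, and the names `struct cpu`, `translate`, and `desync` are hypothetical. It shows that once each TLB has been primed with a different frame, instruction fetches and data reads of the same virtual page resolve to different physical frames until an entry is evicted or flushed.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model (hypothetical): one-entry instruction and data TLBs
   in front of a single page table entry. */
enum access { ACCESS_READ, ACCESS_EXEC };

struct tlb_entry {
    int      valid;
    uint32_t vpn;   /* virtual page number */
    uint32_t pfn;   /* cached physical frame number */
};

struct cpu {
    struct tlb_entry itlb;  /* consulted on instruction fetches */
    struct tlb_entry dtlb;  /* consulted on data accesses */
    uint32_t pte_vpn;       /* the only mapped page in this model */
    uint32_t pte_pfn;       /* frame the page table currently names */
};

/* Translate a virtual page; on a TLB miss, refill from the page table. */
uint32_t translate(struct cpu *c, uint32_t vpn, enum access a)
{
    struct tlb_entry *t = (a == ACCESS_EXEC) ? &c->itlb : &c->dtlb;

    assert(vpn == c->pte_vpn);  /* the model maps exactly one page */
    if (!t->valid || t->vpn != vpn) {
        t->valid = 1;
        t->vpn = vpn;
        t->pfn = c->pte_pfn;
    }
    return t->pfn;
}

/* Prime each TLB while the page table names a different frame, leaving
   the two TLBs with inconsistent translations for the same page. */
void desync(struct cpu *c, uint32_t vpn, uint32_t code_pfn, uint32_t data_pfn)
{
    c->itlb.valid = 0;
    c->dtlb.valid = 0;
    c->pte_vpn = vpn;

    c->pte_pfn = code_pfn;
    (void)translate(c, vpn, ACCESS_EXEC);  /* fills the instruction TLB */

    c->pte_pfn = data_pfn;
    (void)translate(c, vpn, ACCESS_READ);  /* fills the data TLB */
}
```

On real hardware the entries are eventually evicted, which is why practical implementations need a way to regain control and re-establish the split, such as the page fault hook discussed later in this section.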

The details of this approach are similar in most respects to those of any program that uses a split-TLB scheme to alter page attributes or contents depending on whether a page is fetched for execution or read as data. The exact details of how this can be accomplished are beyond the scope of this paper and are discussed elsewhere: by the PaX team (in the context of implementing no-execute on legacy platforms) [6], and by Sherri Sparks and Jamie Butler (in the context of implementing a Windows rootkit that utilizes split-TLBs to implement so-called ``memory cloaking'') [7]. Interested readers are encouraged to review these references for the raw details on how the general split-TLB concept is implemented. Although the referenced articles apply directly to x86, the concepts apply in principle to x64 as well, and can likely be made to work on x64 with minimal modification.

After one has established a mechanism for desynchronizing TLBs (such as by hooking the page fault handler), the recommended approach for this technique is to desynchronize the TLBs for any regions in the kernel where one is performing traditional opcode-replacement-based patching or hooking. Specifically, when kernel code is fetched for execution on a page where an opcode-replacement-based patch is in place, the patched page should be returned. If kernel code is read as data (such as PatchGuard reading kernel code to validate its integrity), then the original data should be returned. This technique effectively hides all modifications to kernel code from any access other than direct execution, which prevents PatchGuard from detecting that kernel code has been altered by a third party.
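The selection rule in the preceding paragraph can be sketched as a pure decision function that a hooked page fault handler might call. This is a minimal sketch with hypothetical names (`struct cloaked_page`, `decide_fault`), and it assumes one common heuristic for classifying the access: a fault whose address equals the faulting instruction pointer is treated as an instruction fetch, and anything else as a data access. Actually loading the chosen frame into the instruction or data TLB is omitted here; the cited references [6] and [7] cover that step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12  /* 4 KiB pages assumed */

/* Hypothetical bookkeeping for one patched kernel code page. */
struct cloaked_page {
    uint64_t vpn;           /* virtual page number of the patched code */
    uint64_t patched_pfn;   /* frame containing the patched bytes */
    uint64_t original_pfn;  /* frame containing the pristine bytes */
};

enum fault_action {
    FAULT_NOT_OURS,       /* forward to the original page fault handler */
    FAULT_SHOW_PATCHED,   /* instruction fetch: expose the patched frame */
    FAULT_SHOW_ORIGINAL   /* data access: expose the original frame */
};

struct fault_decision {
    enum fault_action action;
    uint64_t pfn;         /* frame to expose; 0 when the fault is not ours */
};

/* Decide which view of a page a faulting access should see. */
struct fault_decision decide_fault(const struct cloaked_page *pages,
                                   size_t count,
                                   uint64_t fault_va, uint64_t fault_ip)
{
    struct fault_decision d = { FAULT_NOT_OURS, 0 };
    uint64_t vpn = fault_va >> PAGE_SHIFT;

    assert(pages != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (pages[i].vpn != vpn)
            continue;
        /* Heuristic (an assumption of this sketch): the faulting address
           matching the instruction pointer indicates a code fetch. */
        if (fault_va == fault_ip) {
            d.action = FAULT_SHOW_PATCHED;
            d.pfn = pages[i].patched_pfn;
        } else {
            d.action = FAULT_SHOW_ORIGINAL;
            d.pfn = pages[i].original_pfn;
        }
        return d;
    }
    return d;  /* not a cloaked page */
}
```

Keeping the decision separate from the TLB-loading mechanics makes the rule easy to test in user mode: a read by an integrity checker yields the original frame, while execution of the same address yields the patched one.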

Note that in order for this approach to succeed, the hook on the page fault handler itself must be hidden from PatchGuard. This cannot be directly accomplished by the same TLB desynchronization tactic, as the page fault handler must remain resident. A combined approach might be needed in order to hook the page fault handler in a manner truly transparent to PatchGuard: for example, a debug breakpoint on the page fault handler (coupled with a subverted debug trap handler, perhaps via PsInvertedFunctionTable as described previously in technique 3), along with a scheme to prevent PatchGuard from disabling debug-register-based breakpoints (such as described in technique 5).
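For readers unfamiliar with debug-register breakpoints, the following sketch computes the DR7 control value that arms an execute breakpoint in one of the four hardware slots, using the architectural layout of DR7 (enable bits in the low byte, R/W and LEN fields starting at bit 16). The function name `dr7_arm_exec_breakpoint` is hypothetical, and the sketch only computes the value; writing DR0-DR3 and DR7 requires privileged code and is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Return a DR7 value with an execute breakpoint armed in `slot` (0-3).
   The breakpoint address itself would be written to DR0-DR3 separately. */
uint64_t dr7_arm_exec_breakpoint(uint64_t dr7, unsigned slot)
{
    assert(slot < 4);

    /* R/Wn occupies bits 16+4n..17+4n and LENn bits 18+4n..19+4n. */
    uint64_t fields = (uint64_t)0xF << (16 + 4 * slot);
    /* Gn (global enable) is bit 2n+1; Ln (local enable) is bit 2n. */
    uint64_t global = (uint64_t)1 << (2 * slot + 1);

    dr7 &= ~fields;  /* R/W = 00 (break on execution), LEN = 00 (one byte) */
    dr7 |= global;   /* enable the slot globally */
    return dr7;
}
```

For example, arming slot 0 starting from a cleared register yields 0x2, and arming slot 1 clears any previous R/W1 and LEN1 settings before setting G1.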

The most logical defense against this approach is to attempt to detect a compromise in the page fault dispatching path. Because TLB desynchronization cannot in general be used to hide the page fault handler itself (the page fault handler must remain marked present in memory), it would be difficult for a third party to conceal the alteration to the page fault handler from the kernel. In practice, only a limited number of ways exist in which alterations to the page fault handler could be hidden, such as clever utilization of debug registers. As a result, the key to preventing this technique from remaining viable is to develop a way for PatchGuard to detect the page fault hook. If, for example, the debug trap handler and a debug breakpoint on the page fault handler were used to gain control on a page fault, then Microsoft might be able to prevent this technique by blocking or detecting the interception of the debug trap handler. One such approach might be to better secure PsInvertedFunctionTable, which represents an easy way for a third party to subvert the debug trap handler without PatchGuard's knowledge. Such countermeasures will vary, however, based on the mechanism used to hide the page fault handler hook.
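As one narrow illustration of detecting a debug-register-based hook, the sketch below checks whether any enabled execute breakpoint targets an address inside a given code range, such as the page fault handler. It is not a description of anything PatchGuard does; the function name `find_exec_breakpoint` and the idea of passing in previously captured DR0-DR3 and DR7 values are assumptions of this example. A check of this kind would not by itself address schemes that interfere with access to the debug registers.

```c
#include <assert.h>
#include <stdint.h>

/* Return the index (0-3) of the first enabled execute breakpoint whose
   address lies in [start, end), or -1 if there is none.
   dr[] holds captured DR0-DR3 values; dr7 is the captured DR7 value. */
int find_exec_breakpoint(const uint64_t dr[4], uint64_t dr7,
                         uint64_t start, uint64_t end)
{
    assert(start < end);

    for (int slot = 0; slot < 4; slot++) {
        /* Ln and Gn enable bits for this slot. */
        unsigned enabled = (unsigned)((dr7 >> (2 * slot)) & 0x3);
        /* R/Wn field: 00 means break on instruction execution. */
        unsigned rw = (unsigned)((dr7 >> (16 + 4 * slot)) & 0x3);

        if (enabled != 0 && rw == 0 &&
            dr[slot] >= start && dr[slot] < end)
            return slot;
    }
    return -1;
}
```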