|
TLB Desynchronization (Split TLB)
All x86 processors supporting protected mode and paging employ a
caching scheme to speed up the translation of virtual addresses to
physical addresses. This scheme is implemented via a set of
Translation Lookaside Buffers, or TLBs, which cache the page
attributes (and associated physical address) for a given
virtual address. Recent x86 processors (Pentium II-class or later)
utilize several sets of TLBs, such as one set of TLBs for data
accesses and one set of TLBs for instruction accesses. In normal
system operation, all TLBs (if a processor supports multiple TLBs)
maintain a consistent view of the attributes of a particular page;
however, it is possible to deliberately desynchronize the contents
of these TLBs, thereby maintaining the illusion that a single page
has different attributes depending on whether it is referenced as
data or as executable code. This deliberate desynchronization of
TLBs has many uses, from the implementation of no-execute support
(utilized by PaX/GRsec on GNU/Linux [6]) to ``memory
cloaking'', a technique often used by rootkits to provide one view
of memory when memory is referenced as data by a read operation, and
a different view of memory if memory is referenced by an instruction
fetch. This same memory cloaking technique that has appealed to
rootkit developers for the purpose of hiding rootkits from detection
can also be used to hide kernel patching from PatchGuard's integrity
check. Strictly speaking, this proposed technique is not a bypass
mechanism for PatchGuard; rather, it is a mechanism to hide kernel
patching from PatchGuard, thus making PatchGuard harmless to third
parties that are patching the kernel.
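The desynchronized state described above can be illustrated with a toy
model. The sketch below is not real hardware or kernel code: the class
SplitTlbCpu, its methods, and the page and frame numbers are all
hypothetical, and it assumes only that each TLB caches a translation on
first use and keeps it until flushed. It primes the instruction TLB while
the page table entry points at one frame, then restores the entry before
the data TLB is filled.

```python
PAGE_SIZE = 4096


class SplitTlbCpu:
    """Toy CPU with separate instruction and data TLBs (illustrative only)."""

    def __init__(self, page_table, physical_memory):
        self.page_table = page_table    # virtual page number -> physical frame number
        self.memory = physical_memory   # physical frame number -> page contents
        self.itlb = {}                  # translations cached for instruction fetches
        self.dtlb = {}                  # translations cached for data reads

    def _translate(self, tlb, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn not in tlb:              # TLB miss: walk the page table, cache the result
            tlb[vpn] = self.page_table[vpn]
        return tlb[vpn], offset

    def fetch(self, vaddr):
        """Instruction fetch: translated through the instruction TLB."""
        frame, offset = self._translate(self.itlb, vaddr)
        return self.memory[frame][offset]

    def read(self, vaddr):
        """Data read: translated through the data TLB."""
        frame, offset = self._translate(self.dtlb, vaddr)
        return self.memory[frame][offset]

    def flush(self, vpn):
        """Drop cached translations for one page from both TLBs."""
        self.itlb.pop(vpn, None)
        self.dtlb.pop(vpn, None)


# Frame 0 holds the "original" bytes, frame 1 the "patched" bytes.
memory = {0: bytes([0x90]) * PAGE_SIZE, 1: bytes([0xCC]) * PAGE_SIZE}
cpu = SplitTlbCpu(page_table={5: 0}, physical_memory=memory)
vaddr = 5 * PAGE_SIZE + 0x10

# Normal operation: both TLBs agree.
consistent = cpu.fetch(vaddr) == cpu.read(vaddr) == 0x90

# Desynchronize: prime the instruction TLB while the page table entry
# points at frame 1, then restore the entry before any data access.
cpu.flush(5)
cpu.page_table[5] = 1
cpu.fetch(vaddr)            # instruction TLB now caches frame 1
cpu.page_table[5] = 0

fetched = cpu.fetch(vaddr)  # served from the stale instruction TLB entry
read_back = cpu.read(vaddr) # data TLB misses and caches frame 0
```

After this sequence the same virtual address yields the patched byte for
an instruction fetch and the original byte for a data read, which is the
illusion the text describes. Real TLB entries can be evicted at any time,
which is why practical implementations rely on a page fault handler hook
to re-establish the split.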
The details of this approach are similar in most respects to those of any
program that uses split TLBs to alter page attributes or contents depending on
whether an access is an instruction fetch or a data read. The exact details of how
this can be accomplished are beyond the scope of this paper, and are discussed
elsewhere, by the PaX team (in the context of implementing no-execute on
legacy platforms) [6], and by Sherri Sparks and Jamie Butler (in the context
of implementing a Windows rootkit that utilizes split-TLBs to implement
so-called ``memory cloaking'') [7]. Interested readers are encouraged to review
these references for the raw details on how the general split-TLB concept is
implemented. Although the referenced articles directly apply to x86, the
concepts apply in principle to x64 as well, and can likely be made to work on
x64 with minimal modification.
After one has established a mechanism for desynchronizing TLBs (such
as by hooking the page fault handler), the recommended approach for
this technique is to desynchronize the TLBs for any regions in the
kernel where one is performing traditional opcode-replacement-based
patching or hooking. Specifically, when kernel code is fetched for
execution on a page where an opcode-replacement-based patch is in
place, the patched page should be returned. If kernel code is
read as data (such as when PatchGuard reads
kernel code to validate its integrity), then the original data
should be returned. This technique effectively hides all
modifications to kernel code from any access other than direct
execution, which prevents PatchGuard from detecting that kernel code
has been altered by a third party.
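The policy above (patched page for execution, original page for data
reads) can be sketched as a simulation. Everything here is an assumption
made for illustration: the names page_fault_hook, access_byte and
integrity_checksum are hypothetical, the virtual page number and the
5-byte patch are invented, and CRC32 merely stands in for whatever
integrity check PatchGuard actually performs.

```python
import zlib

PAGE_SIZE = 4096
CLOAKED_VPN = 0x80                               # hypothetical virtual page number

original = bytes([0x90]) * PAGE_SIZE             # unmodified kernel code page
patched_buf = bytearray(original)
patched_buf[0x20:0x25] = b"\xE9\x00\x10\x00\x00" # hypothetical 5-byte jump patch
patched = bytes(patched_buf)

# Two views of the same virtual page, selected by access type.
cloaked = {CLOAKED_VPN: {"execute": patched, "read": original}}
itlb, dtlb = {}, {}                              # toy instruction and data TLBs


def page_fault_hook(vpn, access):
    """Choose which page backs this virtual page for the faulting access."""
    return cloaked[vpn][access]


def access_byte(vaddr, access):
    """Translate through the TLB matching the access type, faulting on a miss."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    tlb = itlb if access == "execute" else dtlb
    if vpn not in tlb:
        tlb[vpn] = page_fault_hook(vpn, access)
    return tlb[vpn][offset]


def integrity_checksum(vpn):
    """Checksum a page the way a data-reading integrity checker would see it."""
    base = vpn * PAGE_SIZE
    data = bytes(access_byte(base + i, "read") for i in range(PAGE_SIZE))
    return zlib.crc32(data)


expected = zlib.crc32(original)                  # value recorded before patching
checksum_ok = integrity_checksum(CLOAKED_VPN) == expected
executed = bytes(
    access_byte(CLOAKED_VPN * PAGE_SIZE + i, "execute") for i in range(0x20, 0x25)
)
```

In this model the checksum computed through data reads still matches the
value recorded before patching, while the bytes seen by instruction
fetches are the patched ones.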
Note that in order for this approach to succeed, the hook on the page fault
handler itself must be hidden from PatchGuard. This cannot be directly
accomplished by the same TLB desynchronization tactic, as the page fault
handler must remain resident. A combined approach, such as utilizing a debug
breakpoint on the page fault handler (when coupled with a subverted debug trap
handler, perhaps via PsInvertedFunctionTable as described previously in
technique 3) along with a scheme to prevent PatchGuard from disabling
debug-register-based breakpoints (such as described in technique 5) might be
needed in order to hook the page fault handler in a manner truly transparent
to PatchGuard.
The most logical defense against this approach is to attempt to detect a
compromise in the page fault dispatching path. Because TLB
desynchronization cannot in general be used to hide the page fault
handler itself (the page fault handler must remain marked present in
memory), it would be difficult for a third party to conceal the
alteration to the page fault handler from the kernel. In practice,
only a limited number of ways exist in which alterations to the
page fault handler could be hidden, such as by clever use of
debug registers. As a result, the key to
preventing this technique from remaining viable is to develop a way
for PatchGuard to detect the page fault hook. If, for example, the
debug trap handler and a debug breakpoint on the page fault handler
were used to gain control on a page fault, then Microsoft might be
able to prevent this technique by blocking or detecting the
interception of the debug trap handler. One such approach might be
to better secure PsInvertedFunctionTable, which represents an easy
way for a third party to subvert the debug trap handler without
PatchGuard's knowledge. Such countermeasures will vary, however,
based on the mechanism used to hide the page fault handler hook.
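As one illustration of what detecting a debug-register-based hook might
involve, the sketch below decodes a DR7 value and reports whether an
enabled execution breakpoint targets a given handler address. It relies
only on the documented DR7 layout (enable bits L0/G0 through L3/G3 in
bits 0-7, and a two-bit R/W field per breakpoint starting at bit 16,
where 00 means break on instruction execution). The function names and
the handler address are hypothetical, and the sketch assumes the checker
can read the debug registers reliably, which the hiding techniques
discussed earlier are specifically meant to prevent.

```python
def execute_breakpoints(dr7, dr_addresses):
    """Return the addresses of enabled execution breakpoints.

    dr7 is the raw DR7 value; dr_addresses holds the DR0..DR3 values in order.
    """
    hits = []
    for i, address in enumerate(dr_addresses):
        enabled = (dr7 >> (2 * i)) & 0b11        # Li (bit 2i) or Gi (bit 2i + 1)
        rw = (dr7 >> (16 + 4 * i)) & 0b11        # R/Wi field; 00 = break on execution
        if enabled and rw == 0b00:
            hits.append(address)
    return hits


def handler_is_trapped(dr7, dr_addresses, handler_address):
    """True if an enabled execution breakpoint sits exactly on the handler entry."""
    return handler_address in execute_breakpoints(dr7, dr_addresses)


PAGE_FAULT_HANDLER = 0xFFFFF80001000000          # hypothetical handler address
debug_registers = [PAGE_FAULT_HANDLER, 0, 0, 0]  # DR0 points at the handler

trapped = handler_is_trapped(0b10, debug_registers, PAGE_FAULT_HANDLER)
not_enabled = handler_is_trapped(0, debug_registers, PAGE_FAULT_HANDLER)
write_only = handler_is_trapped(0b10 | (0b01 << 16), debug_registers,
                                PAGE_FAULT_HANDLER)
```

Here only the first case (G0 set with an execution-type breakpoint)
reports the handler as trapped; a disabled breakpoint or a data-write
breakpoint at the same address does not.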
|