Subverting PatchGuard Version 2
Skywing
12/2006
skywing@valhallalegends.com 
http://www.nynaeve.net
            
1) Foreword

Abstract: Windows Vista x64 and recently hotfixed versions of the Windows
Server 2003 x64 kernel contain an updated version of Microsoft's kernel-mode
patch prevention technology known as PatchGuard.  This new version of
PatchGuard improves on the previous version in several ways, primarily dealing
with attempts to increase the difficulty of bypassing PatchGuard from the
perspective of an independent software vendor (ISV) deploying a driver that
patches the kernel.  The feature-set of PatchGuard version 2 is otherwise
quite similar to PatchGuard version 1; the SSDT, IDT/GDT, various MSRs, and
several kernel global function pointer variables (as well as kernel code) are
guarded against unauthorized modification.  This paper proposes several
methods that can be used to bypass PatchGuard version 2 completely.  Potential
solutions to these bypass techniques are also suggested.  Additionally, this
paper describes a mechanism by which PatchGuard version 2 can be subverted to
run custom code in place of PatchGuard's system integrity checking code, all
while leaving no traces of any kernel patching or custom kernel drivers loaded
in the system after PatchGuard has been subverted.  This is particularly
interesting from the perspective of using PatchGuard's defenses to hide kernel
mode code, a goal that is (in many respects) completely contrary to what
PatchGuard is designed to do.

Thanks: The author would like to thank skape, bugcheck, and Alex Ionescu.

Disclaimer: This paper is presented in the interest of education and the
furthering of general public knowledge.  The author cannot be held responsible
for any potential use (or misuse) of the information disclosed in this paper.
While the author has attempted to be as vigilant as possible with respect to
ensuring that this paper is accurate, it is possible that one or more mistakes
might remain.  If such an inaccuracy or mistake is located, the author would
appreciate being notified so that the appropriate corrections can be made.

2) Introduction

With x64 versions of the Windows kernel, Microsoft has attempted to take an
aggressive stance[1] against the use of a certain class of techniques that have
been frequently used to ``extend'' the kernel in potentially unsafe fashions
on previous versions of Windows.  This includes patching the kernel itself,
hooking the kernel's system service tables, redirecting interrupt handlers,
and several other less common techniques for intercepting control of execution
before the kernel is reached, such as the alteration of the system call
target MSR.

The technology that Microsoft has deployed to prevent the unauthorized
patching of the kernel that has been historically rampant on x86 is known as
PatchGuard.  This technology was initially released with Windows Server 2003
x64 Edition and Windows XP x64 Edition (known as PatchGuard version 1).  The
x64 editions of Windows Vista, and recently hotfixed versions of the Windows
Server 2003 x64 kernel contain a newer version of the PatchGuard technology,
known as PatchGuard version 2.  The new version is designed to make it
significantly more difficult for independent software vendors (ISVs) to
deploy, in the field, solutions that involve patching the kernel after
disabling the kernel patch protection mechanisms afforded by PatchGuard.  The
inner details of PatchGuard itself are much the same as they were in
PatchGuard version 1 and thus will not be discussed in detail in this paper
(excluding version 2's improved anti-debugging and anti-patch technologies).
Readers wishing for more background information may find out how PatchGuard
version 1 functions in Uninformed's previous article [2] on the subject,
``Bypassing PatchGuard on Windows x64''.

PatchGuard version 2 takes the original PatchGuard release and attempts to
plug various holes in its implementation of an obfuscation-based anti-patching
system.  In this respect, it has met with mixed success.  Although the new
PatchGuard version does, on the surface, appear to disable the majority of the
bypass techniques that had been proposed [2] as a means to disable the
original PatchGuard release, several of these techniques may be re-enabled
fairly trivially through minor alterations or a small amount of additional
code.  Furthermore, it is still possible to bypass PatchGuard version 2
without relying on dangerous (version-specific) constructs such as hard-coded
offsets or code fingerprinting on frequently changing code.  Additionally,
aside from techniques that are based on disabling PatchGuard itself, there
still exist several potential bypass mechanisms that have a strong potential
to be ``future-compatible'' with new PatchGuard versions by virtue of
preventing PatchGuard from even detecting that unauthorized alterations to
the kernel have been made (and thus isolating themselves from any
obfuscation-based changes to how PatchGuard's system integrity check is
invoked).  To Microsoft's credit, however, the resilience of PatchGuard to
being debugged and analyzed has been significantly improved (at least with
regard to certain key steps, such as initialization at boot time).

3) Notable Protection Mechanisms

PatchGuard version 2 implements a variety of anti-debug, anti-analysis, and
obfuscation mechanisms that are worth covering.  Not all of PatchGuard's
defenses are covered in detail in this paper; in particular, mechanisms (such
as the obfuscation of PatchGuard's internal data structures) that are the same
in principle as in the previous PatchGuard release, and that were already
disclosed by Uninformed's previous article [2] on PatchGuard, are not covered
here.

3.1) Anti-Debug Code During Initialization

That being said, there are still a number of interesting things to examine as
far as PatchGuard's protection mechanisms go.  Many of these techniques are on
their own worthy of discussion, simply from the perspective of their worth as
general debug/analysis protection mechanisms.  PatchGuard version 2 begins as
an appended addition to the nt!SepAdtInitializePrivilegeAuditing routine in
the kernel (PatchGuard version 2 continues the tactic of misleading and/or
bogus function names that PatchGuard version 1 introduced).  This routine is
responsible for performing the bulk of PatchGuard's initialization, including
setting up the encrypted PatchGuard context data structures.  Unlike
PatchGuard version 1, the initialization routine is littered with statements
that are intended to frustrate debugging, such as the following construct that
enters an infinite loop if a debugger is connected (this particular construct
is used in many places during PatchGuard initialization):

cli
cmp     cs:KdDebuggerNotPresent, r12b
jnz     short continue_initialization_1
infinite_loop_1:
jmp     short infinite_loop_1
sti

This particular approach is not all that robust as implemented in PatchGuard
version 2.  It remains relatively easy to locate these references to
nt!KdDebuggerNotPresent ahead of time and disable them.  If Microsoft had
elected to corrupt the execution context in a creative way on each occurrence
(such as zeroing some registers, or otherwise arranging for a failure to occur
much later on if a debugger was attached) before entering the infinite loop,
then these constructs might have been somewhat more effective as far as
anti-debugging goes.

Other constructs include the highly obfuscated selection of a randomized set
of bogus pool tags used to allocate PatchGuard data structures.  Like
PatchGuard version 1, PatchGuard version 2 uses a randomly chosen bogus pool
tag and randomly adjusted allocation sizes in an attempt to frustrate easy
detection of the PatchGuard context in-memory by scanning pool allocations.
The following is an example of one of the sections of code used by PatchGuard
to randomly pick a pool tag and random allocation delta from a list of
possible pool tags.  The actual allocation size is the minimum size of the
PatchGuard context structure plus the random allocation delta, which is masked
to the range 0 through 2047 (7FFh) bytes.  Here, the rdtsc instruction is used
for random number generation
purposes (readers that have examined the previous [2] PatchGuard paper may
recognize this random number generation construct; it is used throughout
PatchGuard anywhere a random quantity is required).

;
; Generate a random value, using rdtsc.
;
lea     ebx, [r14+r13+200h]
mov     dword ptr [rsp+0A28h+Timer], ebx
rdtsc
mov     r10, qword ptr [rsp+0A28h+arg_5F8]
shl     rdx, 20h
mov     r11, 7010008004002001h
or      rax, rdx
mov     rcx, r10
xor     rcx, rax
lea     rax, [rsp+0A28h+var_2C8]
xor     rcx, rax
mov     rax, rcx
ror     rax, 3
xor     rcx, rax
mov     rax, r11
mul     rcx
mov     [rsp+0A28h+var_2C8], rax
xor     eax, edx
mov     [rsp+0A28h+arg_1F0], rdx
;
; This is essentially a switch(eax & 7), where eax
; is a random value.  Each case statement selects
; a unique obfuscated pooltag value.  The 432E10h
; constant below is the RVA of the jump table that
; holds the RVAs of the switch case handlers.
;
lea     rdx, cs:400000h
and     eax, 7
mov     ecx, [rdx+rax*4+432E10h]
add     rcx, rdx
jmp     rcx
--------------------------------------------------
mov     dword ptr [rsp+0A28h+var_9D8], 0D098D0D8h
mov     r9d, dword ptr [rsp+0A28h+var_9D8]
ror     r9d, 6
jmp     DoAllocation
--------------------------------------------------
mov     dword ptr [rsp+0A28h+var_9D8], 0B2AD31A1h
mov     r9d, dword ptr [rsp+0A28h+var_9D8]
rol     r9d, 1
jmp     DoAllocation
--------------------------------------------------
mov     dword ptr [rsp+0A28h+var_9D8], 85B5910Dh
mov     r9d, dword ptr [rsp+0A28h+var_9D8]
ror     r9d, 2
jmp     DoAllocation
--------------------------------------------------
mov     dword ptr [rsp+0A28h+var_9D8], 0A8223938h
mov     r9d, dword ptr [rsp+0A28h+var_9D8]
xor     r9d, 3
ror     r9d, 0Fh
jmp     DoAllocation
--------------------------------------------------
mov     dword ptr [rsp+0A28h+var_9D8], 67076494h
mov     r9d, dword ptr [rsp+0A28h+var_9D8]
rol     r9d, 4
jmp     DoAllocation
--------------------------------------------------
mov     dword ptr [rsp+0A28h+var_9D8], 288C49EDh
mov     r9d, dword ptr [rsp+0A28h+var_9D8]
ror     r9d, 5
jmp     DoAllocation
--------------------------------------------------
mov     dword ptr [rsp+0A28h+var_9D8], 4E574672h
mov     r9d, dword ptr [rsp+0A28h+var_9D8]
xor     r9d, 6
ror     r9d, 18h
jmp     DoAllocation
--------------------------------------------------
DoAllocation:
;
; Get another random value (for the allocation size),
; and deobfuscate the pooltag value that was selected.
;
; Eventually, the value ending up in "r9d" is used as
; the pooltag value.
;
rdtsc
shl     rdx, 20h
mov     rcx, r10
or      rax, rdx
xor     rcx, rax
lea     rax, [rsp+0A28h+var_858]
xor     rcx, rax
mov     rax, rcx
ror     rax, 3
xor     rcx, rax
mov     rax, r11
mul     rcx
mov     [rsp+0A28h+ValueName], rdx
mov     r9, rax
mov     [rsp+0A28h+var_858], rax
xor     r9d, edx
mov     eax, 4EC4EC4Fh
mov     ecx, r9d
mul     r9d
shr     edx, 3
shr     r9d, 5
mov     r8d, r9d
mov     eax, 4EC4EC4Fh
imul    edx, 1Ah
sub     ecx, edx
add     ecx, 61h
shl     ecx, 8
mul     r9d
shr     edx, 3
shr     r9d, 5
mov     eax, 4EC4EC4Fh
imul    edx, 1Ah
sub     r8d, edx
mul     r9d
add     r8d, 41h
mov     eax, 4EC4EC4Fh
or      r8d, ecx
shr     edx, 3
mov     ecx, r9d
shr     r9d, 5
shl     r8d, 8
imul    edx, 1Ah
sub     ecx, edx
add     ecx, 61h
or      ecx, r8d
shl     ecx, 8
mul     r9d
shr     edx, 3
imul    edx, 1Ah
sub     r9d, edx
add     r9d, 41h
or      r9d, ecx
rdtsc
shl     rdx, 20h
mov     rcx, r10
mov     r8d, r9d        ; Tag
or      rax, rdx
xor     rcx, rax
lea     rax, [rsp+0A28h+var_2E8]
xor     rcx, rax
mov     rax, rcx
ror     rax, 3
xor     rcx, rax
mov     rax, r11
mul     rcx
;
; Perform the actual allocation.  We're requesting NonPagedPool,
; with the random pooltag selected by the deobfuscation and
; randomization code above.  The actual size of the block being
; allocated here is the minimum size given in ebx, plus a random
; "fuzz factor" that is first masked to a maximum of 2047 (7FFh)
; bytes.
;
xor     ecx, ecx        ; PoolType
mov     [rsp+0A28h+var_310], rdx
xor     rdx, rax
mov     [rsp+0A28h+var_2E8], rax
and     edx, 7FFh
add     edx, ebx        ; NumberOfBytes
call    ExAllocatePoolWithTag

3.2) Expanded Set of DPC Routines

Other protection mechanisms used in PatchGuard version 2 include an expanded
set of DPC routines used to arrange for the execution of the PatchGuard
integrity check routine.  Recall that in PatchGuard version 1, there existed a
set of three possible DPC routines.  In PatchGuard version 2, this set of
potential DPC routines that can be repurposed for PatchGuard's use has been
expanded to ten possibilities.  One DPC routine is selected at boot time from
this set, and from that point on is used for all further PatchGuard
operations for the lifetime of the session.  The fact that only one DPC
routine is used in a particular Windows session is a weakness inherited from
the previous PatchGuard version (and, as the reader will discover, one that
eventually comes in handy if one is set on bypassing PatchGuard).  The DPC
routine to be used for the current boot session is selected in the
nt!SepAdtInitializePrivilegeAuditing routine, much the same as how the bogus
pooltag to be used for all PatchGuard allocations is selected:

INIT:0000000000832741:
PatchGuard_Pick_Random_DPC:
;
; Use the time stamp counter as a random seed.
;
rdtsc
shl     rdx, 20h
mov     rcx, r15
or      rax, rdx
xor     rcx, rax
lea     rax, [rsp+0A28h+var_360]
xor     rcx, rax
mov     rax, rcx
ror     rax, 3
xor     rcx, rax
mov     rax, 7010008004002001h
mul     rcx
mov     [rsp+0A28h+var_360], rax
mov     rcx, rdx
mov     qword ptr [rsp+0A28h+arg_260], rdx
xor     rcx, rax
mov     rax, 0CCCCCCCCCCCCCCCDh
mul     rcx
shr     rdx, 3
;
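; rdx now holds the random value divided by 10 (the high half of the
; product with 0CCCCCCCCCCCCCCCDh, shifted right by 3).  The three
; instructions below leave the random value modulo 10 in `rcx'; this is
; the index into a switch jump table that is used to locate the DPC to
; be repurposed for initiating PatchGuard checks for this session.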
;
lea     rax, [rdx+rdx*4]
add     rax, rax
sub     rcx, rax
jmp     PatchGuard_DPC_Switch

INIT:0000000000832317:
PatchGuard_DPC_Switch:
;
; The address of the case statement is formed by adding the image base (here,
; being loaded into `rdx') and an RVA in the table indexed by rax.
;
lea     rdx, cs:400000h
mov     eax, ecx
;
; Locate the case statement RVA by indexing the jump offset table.
;
mov     ecx, [rdx+rax*4+432E60h]
;
; Add it to the image base to form a complete 64-bit address.
;
add     rcx, rdx
;
; Execute the case handler.
;
jmp     rcx


;
; The set of case statements are as follows:
;
; Each case statement block simply loads the full 64-bit address
; of the DPC routine to be repurposed for PatchGuard checks into
; the r8 register.  This register is later stored into one of
; PatchGuard's internal data structures for future use.
;

lea     r8, CmpEnableLazyFlushDpcRoutine
jmp     short PatchGuardSelectDpcRoutine
lea     r8, _CmpLazyFlushDpcRoutine
jmp     short PatchGuardSelectDpcRoutine
lea     r8, ExpTimeRefreshDpcRoutine
jmp     short PatchGuardSelectDpcRoutine
lea     r8, ExpTimeZoneDpcRoutine
jmp     short PatchGuardSelectDpcRoutine
lea     r8, ExpCenturyDpcRoutine
jmp     short PatchGuardSelectDpcRoutine
lea     r8, ExpTimerDpcRoutine
jmp     short PatchGuardSelectDpcRoutine
lea     r8, IopTimerDispatch
jmp     short PatchGuardSelectDpcRoutine
lea     r8, IopIrpStackProfilerTimer
jmp     short PatchGuardSelectDpcRoutine
lea     r8, KiScanReadyQueues
jmp     short PatchGuardSelectDpcRoutine
lea     r8, PopThermalZoneDpc
;
; (fallthrough from last case statement)
;
INIT:0000000000832800:
PatchGuardSelectDpcRoutine:
xor     ecx, ecx
;
; Store the DPC routine into r14+178.  r14 points to one of
; the PatchGuard context structures in this particular instance.
;
mov     [r14+178h], r8

Much like PatchGuard version 1, each of the DPCs selected for use in launching
the PatchGuard integrity checks has a legitimate function.  Furthermore, the
DPC routines are ones that are important for normal system operation, thus it
is not possible for one to simply detect all DPCs that refer to these DPC
routines and cancel them.  Instead, much as with PatchGuard version 1, if one
wanted to go the route of blocking PatchGuard's DPC, a mechanism to detect the
particular PatchGuard DPC (as opposed to the legitimate system invocations
thereof) must be developed.  This aspect of PatchGuard's obfuscation
mechanisms is relatively similar to version 1, other than the logical
extension to ten DPCs instead of three DPCs.

3.3) Self-Decrypting and Mutating System Integrity Check Routine

PatchGuard version 2 also inherits the capability to encrypt its
data structures and executable code in-memory from version 1.  This is a
defensive mechanism that intends to make it difficult for an attacker to
perform a classic egghunt style search, wherein the attacker has devised an
identifiable signature for PatchGuard data structures that can be used to
locate it in an exhaustive non-paged-pool memory scan.  From this perspective,
the obfuscation and encryption of PatchGuard code and data structures that are
dynamically allocated is still a reasonably strong defensive mechanism.
Unfortunately for Microsoft, though, some of the data structures linking to
PatchGuard are internal system structures (such as a KDPC and associated
KTIMER used to kick off PatchGuard execution).  This presents a weakness that
could be potentially used to identify PatchGuard structures in memory (which
will be explored in more detail later).

The encryption of PatchGuard's internal context structures was covered by
Uninformed's original paper [2] on the subject.  However, the mechanism by which
PatchGuard obfuscates its system integrity checking and validation routines
was not discussed.  This mechanism is novel enough to warrant some
explanation.  The technique used to obfuscate PatchGuard's executable code
in-memory involves two layers of decryption/deobfuscation functions, each of
which decrypts the next layer.  After both layers have run their course,
PatchGuard's validation routines are plaintext in memory and are then directly
executed.

The first decryption layer is the code block that is called from the
repurposed DPC routine selected by PatchGuard at boot time.  Its job is to
decrypt itself (in 8 byte chunks, starting with the second instruction in the
function).  After the decryption of this code block is complete, the
decryption stub continues on to decrypt a second code block (the actual
PatchGuard validation routine).  When this second decryption/deobfuscation
cycle is completed, the decryption stub then executes the actual PatchGuard
system integrity check routine.

As noted above, the first task for the decryption stub is to decrypt itself.
Except for the first instruction of the stub, the entire routine is encrypted
when entered.  The first instruction re-encrypts itself and decrypts the next
instruction.  The following instruction decrypts the next two instructions,
and so forth.  This is accomplished by a series of four-byte instructions,
each of which xors an eight-byte quantity in memory with the decryption key,
starting at the address of the first instruction (on entry, rcx and rip
always have the same value).  An example of how this process works is
illustrated below:

;
; rcx: Address of the decryption stub (same as rip)
; rdx: Decryption key
;
Breakpoint 5 hit
nt!ExpTimeRefreshDpcRoutine+0x20a:
fffff800`0112c98b ff5538          call    qword ptr [rbp+38h]
0: kd> u poi(rbp+38)
;
; Note that beyond the first instruction, the decryption stub is initially seemingly
; garbage data (though it has an apparent pattern to it, since it is merely obfuscated
; by xor).
;
fffffadf`f6e6d55d f0483111        lock xor qword ptr [rcx],rdx
fffffadf`f6e6d561 88644d68        mov     byte ptr [rbp+rcx*2+68h],ah
fffffadf`f6e6d565 62              ???
fffffadf`f6e6d566 d257df          rcl     byte ptr [rdi-21h],cl
fffffadf`f6e6d569 88644d78        mov     byte ptr [rbp+rcx*2+78h],ah
fffffadf`f6e6d56d 62              ???
fffffadf`f6e6d56e d257ef          rcl     byte ptr [rdi-11h],cl
fffffadf`f6e6d571 88644d48        mov     byte ptr [rbp+rcx*2+48h],ah
0: kd> t
fffffadf`f6e6d55d f0483111        lock xor qword ptr [rcx],rdx
0: kd> r
;
; Note the initial input arguments.  rcx points to the decryption stub's first
; instruction (same as rip), and rdx is the decryption key.
;
rax=fffffadff6e6d55d rbx=fffff8000116d894 rcx=fffffadff6e6d55d
rdx=601c55c0cf06e32a rsi=fffff800003c7ad0 rdi=0000000000000003
rip=fffffadff6e6d55d rsp=fffff800003c51f8 rbp=fffff800003c7ad0
 r8=0000000000000000  r9=0000000000000000 r10=0000000001c7111e
r11=fffff800003c54c0 r12=fffff8000116d858 r13=fffff800003c5370
r14=fffff80001000000 r15=fffff800003c60a0
iopl=0         nv up ei pl zr na po nc
cs=0010  ss=0018  ds=002b  es=002b  fs=0053  gs=002b             efl=00000246
fffffadf`f6e6d55d f0483111        lock xor qword ptr [rcx],rdx ds:002b:fffffadf`f6e6d55d=684d6488113148f0

;
; After allowing the decryption of the stub to progress, we see the stub in its executable
; form.  The first instruction is re-encrypted after it executes, but a later
; instruction in the decryption stub returns the initial instruction to its executable,
; plaintext form.
;

0: kd> u FFFFFADFF6E6D55D
;
; The `lock' prefix is used to make this instruction four bytes long,
; because it has no displacement (a MASM limitation: the assembler
; converts a zero displacement into the shorter encoding that has no
; displacement byte).
;
fffffadf`f6e6d55d f0483111        lock xor qword ptr [rcx],rdx
fffffadf`f6e6d561 48315108        xor     qword ptr [rcx+8],rdx
fffffadf`f6e6d565 48315110        xor     qword ptr [rcx+10h],rdx
fffffadf`f6e6d569 48315118        xor     qword ptr [rcx+18h],rdx
fffffadf`f6e6d56d 48315120        xor     qword ptr [rcx+20h],rdx
fffffadf`f6e6d571 48315128        xor     qword ptr [rcx+28h],rdx
fffffadf`f6e6d575 48315130        xor     qword ptr [rcx+30h],rdx
fffffadf`f6e6d579 48315138        xor     qword ptr [rcx+38h],rdx
0: kd> u
fffffadf`f6e6d57d 48315140        xor     qword ptr [rcx+40h],rdx
fffffadf`f6e6d581 48315148        xor     qword ptr [rcx+48h],rdx
;
; Because the initial instruction was re-encrypted after it was executed,
; we need to decrypt it again.
;
fffffadf`f6e6d585 3111            xor     dword ptr [rcx],edx
fffffadf`f6e6d587 488bc2          mov     rax,rdx
fffffadf`f6e6d58a 488bd1          mov     rdx,rcx
fffffadf`f6e6d58d 8b4a4c          mov     ecx,dword ptr [rdx+4Ch]
;
; The following is the second-stage decryption loop.  Its purpose is to
; decrypt a code block following the current decryption stub in memory.
;
; This code block is then executed (it is responsible for performing the
; actual PatchGuard system verification checks).
;

fffffadf`f6e6d590 483144ca48      xor     qword ptr [rdx+rcx*8+48h],rax
fffffadf`f6e6d595 48d3c8          ror     rax,cl
0: kd> u
fffffadf`f6e6d598 e2f6            loop    fffffadf`f6e6d590
;
; After decryption of the second block is completed, we'll execute it
; by jumping to it.  Doing so kicks off the system verification routine,
; which arranges for a bug check if it detects a violation of system
; integrity, and otherwise arranges for itself to be executed again
; several minutes later.
;
fffffadf`f6e6d59a 8b8288010000    mov     eax,dword ptr [rdx+188h]
fffffadf`f6e6d5a0 4803c2          add     rax,rdx
fffffadf`f6e6d5a3 ffe0            jmp     rax

Prior to returning control, the verification routine re-encrypts itself so
that it does not remain in plaintext after the first invocation.  In addition,
PatchGuard also re-randomizes the key used to encrypt and decrypt the
PatchGuard validation routine on each execution, such that a would-be attacker
has a frequently mutating target.  Due to this behavior, the PatchGuard
validation routine changes appearance (in encrypted form) in-memory every few
minutes, which is the period of PatchGuard's validation checks.  While this is
perhaps an admirable effort on Microsoft's part as far as interesting
obfuscation techniques go, it turns out that there are much easier avenues of
attack that can be used to disable PatchGuard without having to involve
oneself in the search of a target that alters its appearance in-memory every
few minutes.

3.4) Obfuscation of System Integrity Check Calls via Structured Exception Handling

Much like PatchGuard version 1, this version of PatchGuard utilizes structured
exception handling (SEH) support as an integral part of the process used to
kick off execution of the system integrity check routine.  The means by which
this is accomplished have changed somewhat since the last PatchGuard version.
In particular, there are several layers of obfuscation in each PatchGuard DPC
that are used to shroud the actual call to the integrity check routine.  In an
effort to make matters more difficult for would-be attackers, the exact
details of the obfuscation used vary between each of the ten DPCs that may be
repurposed for use with PatchGuard.  They all exhibit a common pattern,
however, which can be described at a high level.

The first step in invoking the PatchGuard system integrity checking routine is
a KTIMER with an associated KDPC (indicating a DPC callback routine to be
called when the timer expires).  This timer is primed for
single-shot execution in an interval on the order of several minutes (with a
random fuzz factor delta applied to increase the difficulty of performing a
classic egghunt style attack to locate the KTIMER in non-paged pool).  The DPC
routine indicated with the KDPC that is associated with PatchGuard's KTIMER is
one of the set of ten legitimate DPC routines that may be repurposed for use
with PatchGuard.  The means by which this particular invocation of the DPC
routine is distinguished from a legitimate system invocation of the DPC
routine in question is by the use of a deliberately invalid kernel pointer as
one of the arguments to the DPC routine.

The prototype for a DPC routine is described by PKDEFERRED_ROUTINE:

typedef
VOID
(*PKDEFERRED_ROUTINE) (
    IN struct _KDPC *Dpc,       // pointer to parent DPC
    IN PVOID DeferredContext,   // arbitrary context - assigned at DPC initialization
    IN PVOID SystemArgument1,   // arbitrary context - assigned when DPC is queued
    IN PVOID SystemArgument2    // arbitrary context - assigned when DPC is queued
    );

Essentially, a DPC is a callback routine with a set of user-defined context
parameters whose interpretation is entirely up to the DPC routine itself.  The
standard use for context arguments in callback functions is to use them to
point to a larger structure which contains information necessary for the
callback routine to function, and this is exactly how the ten DPC routines
that can be used by PatchGuard treat the DeferredContext argument during
legitimate execution.  It is this usage of the DeferredContext argument which
allows PatchGuard to trigger its execution for each of the ten DPC routines
via an exception; PatchGuard arranges for a bogus DeferredContext value to be
passed to the DPC routine when it is called.  The first time that the DPC
routine tries to dereference the DPC-specific structure referred to by
DeferredContext, an exception occurs (which transfers control to the exception
dispatching system, and eventually to PatchGuard's integrity check routine).
While this may seem simple at first, if the reader is familiar with kernel
mode programming, then there should be a couple of red flags set off by this
description; normally, it is not possible to catch bogus memory references at
DISPATCH_LEVEL or above with SEH (usually, one of the
PAGE_FAULT_IN_NON_PAGED_AREA or IRQL_NOT_LESS_OR_EQUAL bugchecks will be
raised, depending on whether the bogus reference was to a reserved non-paged
region or a paged-out pageable memory region).  As a result, one would expect
that PatchGuard would be putting the system at risk of randomly bugchecking by
passing bogus pointers that are referenced at DISPATCH_LEVEL, the IRQL at
which DPC routines run.  However, PatchGuard has a couple of tricks up its
metaphorical sleeve.  It takes advantage of an implementation-specific detail
of the current generation of x64 processors shipped by AMD in order to form
kernel mode addresses that, while bogus, will not result in a page fault when
referenced.  Instead, these bogus addresses will result in a general
protection fault, which eventually manifests itself as a
STATUS_ACCESS_VIOLATION SEH exception.  This path to raising a
STATUS_ACCESS_VIOLATION exception does in fact work even at DISPATCH_LEVEL,
thus allowing PatchGuard to provide safe bogus pointer values for the
DeferredContext argument in order to trigger SEH dispatching without risking
bringing the system down with a bugcheck.

Specifically, the implementation detail that PatchGuard relies upon relates to
the 48-bit address space limitation in AMD's Hammer family of processors[4].
Current AMD processors only implement 48 bits of the 64-bit address space
presented by the x64 architecture.  This is accomplished by requiring that
bits 63 through the most significant bit implemented by the processor (current
AMD processors implement 48 bits) of any given address be set to either all
ones or all zeros.  An address of this form is defined to be a canonical
address, or a well-formed address.  Attempts to reference addresses that are
not canonical as defined by this definition result in the processor
immediately raising a general protection fault.  This restriction on the
address space essentially splits the usable address space into two halves; one
region at the high end of the address space, and one region at the low end of
the address space, with a no-man's-land in between the two.  Windows utilizes
this split to divide user mode from kernel mode, with the high end of the
address space being reserved for kernel mode usage and the low end of the
address space being reserved for user mode usage.  PatchGuard takes advantage
of this processor-mandated no-man's-land to create bogus pointer values that
can be safely dereferenced and caught by SEH, even at high IRQLs.

All of the DPC routines that are in the set which may be repurposed for use by
PatchGuard dereference the DeferredContext argument as the first part of work
that does not involve shuffling stack variables around.  In other words, the
first real work involved in any of the PatchGuard-enabled DPC routines is to
touch a structure or variable pointed to by the DeferredContext argument.  In
the execution path of PatchGuard attempting to trigger a system integrity
check, the DeferredContext argument is invalid, which eventually results in
an access violation exception that is routed to the SEH registrations for the
DPC routine.  If one examines any of the PatchGuard DPC routines, it is clear
that all of them have several overlapping SEH registrations (a construct that
normally indicates several levels of nested try/except and try/finally
constructs):

1: kd> !fnseh nt!ExpTimeRefreshDpcRoutine
nt!ExpTimeRefreshDpcRoutine Lc8 0A,02 [EU ] nt!_C_specific_handler (C)
> fffff8000100358a La (fffff8000112c830 -> fffff80001000000)
> fffff8000100358a Lc (fffff8000112c870 -> fffff80001003596)
> fffff8000100358a L16 (fffff8000112c8a0 -> fffff80001000000)
> fffff8000100358a L18 (fffff8000112c8f0 -> fffff800010035a2)

These SEH registrations are integral to the operation of PatchGuard's system
integrity checks.  The specifics of how each handler registration works differ
for each DPC routine (in an attempt to frustrate reverse engineering), but the
general idea is that each registered handler performs a portion
of the work necessary to set up a call to the PatchGuard integrity check
routine.  This work is divided up among four different exception/unwind
handlers in an effort to make it difficult to understand what is going on, but
ultimately the end result is the same for each of the DPC routines: one of the
exception/unwind handlers ends up making a direct call to the system integrity
check decryption stub in-memory.  The decryption stub decrypts itself, and
then decrypts the PatchGuard check routine, followed by a transfer of
control to the integrity check routine so that PatchGuard can inspect various
protected registers, MSRs, and kernel images (such as the kernel itself) for
unauthorized modification.

Additionally, all of the PatchGuard DPCs have been enhanced to obfuscate the
DPC routine arguments in stack variables (whose exact stack displacement
varies from DPC routine to DPC routine, and furthermore from kernel flavor
to kernel flavor; for example, the multiprocessor and uniprocessor kernel
builds have different stack frame layouts for many of the PatchGuard DPC
routines).  Recall that in the x64 calling convention, the first four
arguments are passed via registers (rcx, rdx, r8, and r9 respectively).  Each
PatchGuard DPC routine takes special care to save away significant register
arguments onto the stack (in an obfuscated form).  Several of the arguments
remain obfuscated until just before the decryption stub for the system
integrity check routine is called, in an effort to make it difficult for third
parties to patch into the middle of a particular DPC routine and easily access
the original arguments to the DPC.  This is presumably designed in an attempt
to make it more difficult to differentiate DPC invocations that perform the
DPC routine's legitimate function from DPC invocations that will call
PatchGuard.  It also makes it difficult, though not impossible, for a third
party to recover the original arguments to the DPC routine from the context of
any of the exception handlers registered to the DPC routine in a generalized
fashion.

This obfuscation of arguments can be clearly seen by disassembling any of the
PatchGuard DPC routines.  For example, when looking at
ExpTimeRefreshDpcRoutine, one can see that the routine saves away the Dpc
(rcx) and DeferredContext (rdx) arguments on the stack, rotates them by a
magical constant (this constant differs for each DPC routine flavor and is
used to further complicate the task of recovering the original DPC arguments
in a generalized fashion), and then overwrites the original argument
registers:

0: kd> uf nt!ExpTimeRefreshDpcRoutine
;
; On entry, we have the following:
;
; rcx -> Dpc
; rdx -> DeferredContext (if this is being called for
;                         PatchGuard, then DeferredContext
;                         is a bogus kernel pointer).
; r8  -> SystemArgument1
; r9  -> SystemArgument2
;
nt!ExpTimeRefreshDpcRoutine:
;
; r11 is used as an ephemeral frame pointer here.
;
; Ephemeral frame pointers are an x64-specific compiler
; construct, wherein a volatile register is used as a
; frame pointer until the first function call is made.
;
fffff800`01003540 4c8bdc          mov     r11,rsp
fffff800`01003543 4881ecc8000000  sub     rsp,0C8h
fffff800`0100354a 4889642460      mov     qword ptr [rsp+60h],rsp
;
; This DPC routine does not use SystemArgument1 or
; SystemArgument2.  As a result, it is free to overwrite
; these argument registers immediately without preserving
; their value.
;
; r8  = Dpc
; rcx = Dpc
; rdx = DeferredContext
;
fffff800`0100354f 4c8bc1          mov     r8,rcx
fffff800`01003552 4889542448      mov     qword ptr [rsp+48h],rdx
;
; Set [rsp+20h] to zero.  This is a state variable that is
; used by the exception/unwind scope handlers in order to
; coordinate the PatchGuard execution process across the
; set of four exception/unwind scope handlers associated
; with this section of code.
;
fffff800`01003557 4533c9          xor     r9d,r9d
fffff800`0100355a 44894c2420      mov     dword ptr [rsp+20h],r9d
;
; PatchGuard zeros out various key fields in the DPC.
; This is an attempt to make it difficult to locate the DPC
; in-memory from the context of an exception handler called
; when a PatchGuard DPC accesses the bogus DeferredContext
; argument.  Specifically, PatchGuard zeros the Type and
; DeferredContext fields of the KDPC structure, shown below:
;
; 0: kd> dt nt!_KDPC
;   +0x000 Type             : UChar
;   +0x001 Importance       : UChar
;   +0x002 Number           : UChar
;   +0x003 Expedite         : UChar
;   +0x008 DpcListEntry     : _LIST_ENTRY
;   +0x018 DeferredRoutine  : Ptr64
;   +0x020 DeferredContext  : Ptr64 Void
;   +0x028 SystemArgument1  : Ptr64 Void
;   +0x030 SystemArgument2  : Ptr64 Void
;   +0x038 DpcData          : Ptr64 Void
;
; Dpc->Type = 0
;
fffff800`0100355f 448809          mov     byte ptr [rcx],r9b
;
; Dpc->DeferredContext = 0
;
fffff800`01003562 4c894920        mov     qword ptr [rcx+20h],r9
;
; Here, the DPC loads [r11-20h] with an obfuscated
; copy of the DeferredContext argument (rotated
; left by 0x34 bits).
;
; Recall that rsp == r11+0xc8, so this location
; can also be aliased by [rsp+0A8h].
;
; [rsp+0A8h] -> ROL(DeferredContext, 0x34)
;
fffff800`01003566 488bc2          mov     rax,rdx
fffff800`01003569 48c1c034        rol     rax,34h
fffff800`0100356d 498943e0        mov     qword ptr [r11-20h],rax
;
; Similarly, the DPC loads [r11-48h] with an
; obfuscated copy of the Dpc argument (rotated
; right by 0x48 bits).
;
; This location may be aliased as [rsp+80h].
;
; [rsp+80h] -> ROR(Dpc, 0x48)
;
fffff800`01003571 488bc1          mov     rax,rcx
fffff800`01003574 48c1c848        ror     rax,48h
fffff800`01003578 498943b8        mov     qword ptr [r11-48h],rax
;
; The following register context is now in place:
;
; r8         = Dpc
; rcx        = Dpc
; rdx        = DeferredContext
; rax        = ROR(Dpc, 0x48)
; [rsp+0A8h] = ROL(DeferredContext, 0x34)
; [rsp+80h]  = ROR(Dpc, 0x48)
;
; The DPC routine destroys the contents of rcx by
; zero extending it with a copy of the low byte of
; the DeferredContext value.
;
fffff800`0100357c 0fb6ca          movzx   ecx,dl
;
; The DPC routine destroys the contents of r8 with
; a right shift (unlike a rotate, the incoming left
; bits are simply zero filled instead of set to the
; rightmost bits being shifted off.  The rightmost
; bits are thus lost forever, destroying the r8
; register as a useful source of the Dpc argument.
;
fffff800`0100357f 49d3e8          shr     r8,cl
;
; r8 is saved away on the stack, but it is no longer
; directly useful as a way to locate the Dpc argument
; due to the destructive right shift above.
;
fffff800`01003582 4c898424d8000000 mov     qword ptr [rsp+0D8h],r8
;
; r8         = Dpc >> (UCHAR)DeferredContext
; rcx        = (UCHAR)DeferredContext
; rdx        = DeferredContext
; rax        = ROR(Dpc, 0x48)
; [rsp+0A8h] = ROL(DeferredContext, 0x34)
; [rsp+80h]  = ROR(Dpc, 0x48)
;
; Here, we temporarily deobfuscate the DeferredContext
; argument stored at [r11-20h] above.  In this particular
; instance, rdx also happens to contain the deobfuscated
; DeferredContext value, but not all instances of
; PatchGuard's DPC routines share this property of
; retaining a plaintext copy of DeferredContext in rdx.
;
fffff800`0100358a 498b43e0        mov     rax,qword ptr [r11-20h]
fffff800`0100358e 48c1c834        ror     rax,34h
;
; Now, we have the following context in place:
;
; r8         = Dpc >> (UCHAR)DeferredContext
; rcx        = (UCHAR)DeferredContext
; rdx        = DeferredContext   (* But not valid for
;                                 all DPC routines.)
; rax        = DeferredContext
; [rsp+0A8h] = ROL(DeferredContext, 0x34)
; [rsp+80h]  = ROR(Dpc, 0x48)
;
; The next step is to dereference the DeferredContext value.
; For a legitimate DPC invocation, this operation is harmless;
; the DeferredContext value would point to valid kernel memory.
;
; For PatchGuard, however, this triggers an access violation
; that winds up with control being transferred to the exception
; handlers registered to the DPC routine.
;
fffff800`01003592 8b00            mov     eax,dword ptr [rax]
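The stack-slot obfuscation in the listing above can be modeled in C as follows. This is a sketch that mirrors the three slots written by ExpTimeRefreshDpcRoutine (the struct and function names are hypothetical). It shows that the two rotations are trivially reversible once the constants are known, while the shr applied to r8 is not. Note that for 64-bit operands the x64 rol, ror and shr instructions mask their count to 6 bits, so "ror rax,48h" actually rotates by 8; the helpers apply the same masking, so the round trip still holds.

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit rotates with the count masked to 6 bits, as rol/ror do in hardware. */
static uint64_t rol64(uint64_t v, unsigned n)
{
    n &= 63;
    return n ? (v << n) | (v >> (64 - n)) : v;
}

static uint64_t ror64(uint64_t v, unsigned n)
{
    n &= 63;
    return n ? (v >> n) | (v << (64 - n)) : v;
}

/* Hypothetical model of the three stack slots. */
typedef struct {
    uint64_t slot_a8;   /* [rsp+0A8h] = ROL(DeferredContext, 0x34) */
    uint64_t slot_80;   /* [rsp+80h]  = ROR(Dpc, 0x48)             */
    uint64_t slot_d8;   /* [rsp+0D8h] = Dpc >> (UCHAR)DeferredContext */
} obfuscated_frame;

static obfuscated_frame obfuscate(uint64_t dpc, uint64_t deferred_context)
{
    obfuscated_frame f;
    f.slot_a8 = rol64(deferred_context, 0x34);
    f.slot_80 = ror64(dpc, 0x48);
    /* shr masks cl to 6 bits too; the bits shifted out are lost for good. */
    f.slot_d8 = dpc >> ((uint8_t)deferred_context & 63);
    return f;
}

/* Reversing the rotations recovers the original arguments exactly. */
static uint64_t recover_deferred_context(const obfuscated_frame *f)
{
    return ror64(f->slot_a8, 0x34);
}

static uint64_t recover_dpc(const obfuscated_frame *f)
{
    return rol64(f->slot_80, 0x48);
}
```

The slot_d8 value cannot be inverted in general: two Dpc values that differ only in the shifted-out low bits produce the same result.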

At this point, it is necessary to investigate the various exception/unwind
handlers registered to the DPC routine in order to determine what happens
next.  Most of these handlers can be skipped as they are nothing more than
minor layers of obfuscation that, while differing significantly between each
DPC routine, have the same end result.  One of the exception/unwind handlers,
however, makes the call to PatchGuard's integrity check, and this handler is
worthy of further discussion.  Because the exception registrations for all of
the PatchGuard DPC routines make use of nt!_C_specific_handler, the scope
handlers conform to a standard prototype, defined below:

//
// Define the standard type used to describe a C-language exception handler,
// which is used with _C_specific_handler.
//
// The actual parameter values differ depending on whether the low byte of the
// first argument contains the value 0x1.  If this is the case, then the call
// is to the unwind handler for the routine; otherwise, the call is to the
// exception handler for the routine.  The two cases interpret the arguments
// quite differently, though the prototypes are compatible as far as calling
// conventions go.
//

typedef
LONG
(NTAPI * PC_LANGUAGE_EXCEPTION_HANDLER)(
        __in    PEXCEPTION_POINTERS    ExceptionPointers,  // if low byte is 0x1, then we're an unwind
        __in    ULONG64                EstablisherFrame    // faulting routine stack pointer
        );

In the case of nt!ExpTimeRefreshDpcRoutine, the fourth scope handler
registration is the one that performs the call to PatchGuard's integrity check
routine.  Here, the routine only executes the integrity check if a state
variable stored at [rsp+20h] in the DPC routine is set to a particular value.
This state variable is modified as the access violation exception traverses
each of the exception/unwind scope handlers until it reaches this handler,
which eventually leads up to the execution of PatchGuard's system integrity
check.  For now, it is best to assume that this routine is being called with
[rsp+20h] in the DPC routine having been set to 0x15 (the handler adds 7 and
compares the result against 0x1C).  This signifies that PatchGuard should be
executed.
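The check made by the fourth scope handler reduces to the following C sketch, assuming only the state variable at [rsp+20h] matters. The frame struct and the stub_called flag are hypothetical stand-ins for the DPC's stack and the call through [rbp+38h]; the return values are the standard SEH filter constants, matching what the handler leaves in eax (1 on the PatchGuard path, 0 otherwise).

```c
#include <assert.h>
#include <stdint.h>

/* Standard SEH filter return values. */
#define EXCEPTION_EXECUTE_HANDLER 1
#define EXCEPTION_CONTINUE_SEARCH 0

/* Hypothetical view of the DPC frame; only [rsp+20h] is modeled. */
typedef struct {
    int32_t state;  /* [rsp+20h], zeroed on DPC entry */
} dpc_frame;

static int fourth_scope_filter(dpc_frame *frame, int *stub_called)
{
    frame->state += 7;                  /* add dword ptr [rbp+20h],7 */
    if (frame->state != 0x1C)           /* cmp eax,1Ch / jne */
        return EXCEPTION_CONTINUE_SEARCH;
    *stub_called = 1;                   /* call qword ptr [rbp+38h] */
    return EXCEPTION_EXECUTE_HANDLER;
}
```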

0: kd> uf fffff8000112c8f0
nt!ExpTimeRefreshDpcRoutine+0x17f:
;
; mov eax, eax is a hotpatch stub and can be ignored.
;
fffff800`0112c8f0 8bc0            mov     eax,eax
fffff800`0112c8f2 55              push    rbp
fffff800`0112c8f3 4883ec20        sub     rsp,20h
;
; rdx corresponds to the EstablisherFrame argument.
; This argument is the stack pointer (rsp) value for
; the routine that this exception/unwind handler is
; associated with.  The typical use of this argument
; is to allow seamless access to local variables in
; the routine with which the try/except filter is
; associated.  This is what eventually ends up
; occurring here, with the rbp register being loaded
; with the stack pointer of the DPC routine at the
; point in time where the exception occurred.
;
fffff800`0112c8f7 488bea          mov     rbp,rdx
;
; We make the check against the state variable.
; Recall that when the DPC routine was first entered,
; [rsp+20h] in the DPC routine's context was set to
; zero.  That location corresponds to [rbp+20h] in
; this context, as rbp has been loaded with the stack
; pointer that was in use in the DPC routine.  This
; location is checked and altered by each of the
; registered exception/unwind handlers, and will
; eventually be set to 0x15 when this routine is called.
;
fffff800`0112c8fa 83452007        add     dword ptr [rbp+20h],7
fffff800`0112c8fe 8b4520          mov     eax,dword ptr [rbp+20h]
fffff800`0112c901 83f81c          cmp     eax,1Ch
;
; For the moment, consider the case where this jump is
; not taken.  The jump is taken when PatchGuard is not
; being executed (which is not the interesting case).
;
fffff800`0112c904 0f858c000000    jne     nt!ExpTimeRefreshDpcRoutine+0x215 (fffff800`0112c996)

nt!ExpTimeRefreshDpcRoutine+0x189:
;
; To understand the following instructions, it is
; necessary to look back at the stack variable context
; that was set up by the DPC routine prior to the
; faulting instruction that caused the access
; violation exception.  The following values were
; set on the stack at that time:
;
; [rsp+0A8h] = ROL(DeferredContext, 0x34)
; [rsp+80h]  = ROR(Dpc, 0x48)
;
; The following set of instructions utilize these
; obfuscated copies of the original arguments to the
; DPC routine in order to make the call to PatchGuard's
; integrity check routine.
;
; The first step taken is to deobfuscate the Dpc value
; that was stored at [rsp+80h], or [rbp+80h] as seen from
; this context.
;
fffff800`0112c90a 488b8580000000  mov     rax,qword ptr [rbp+80h]
;
; rax = Dpc
;
fffff800`0112c911 48c1c048        rol     rax,48h
;
; [rbp+50h] -> Dpc
;
fffff800`0112c915 48894550        mov     qword ptr [rbp+50h],rax
;
; Next, the DeferredContext argument is deobfuscated and
; stored plaintext.
;
fffff800`0112c919 488b85a8000000  mov     rax,qword ptr [rbp+0A8h]
;
; rax = DeferredContext
;
fffff800`0112c920 48c1c834        ror     rax,34h
;
; [rbp+58h] -> DeferredContext
;
fffff800`0112c924 48894558        mov     qword ptr [rbp+58h],rax
;
; rax = Dpc
;
fffff800`0112c928 488b4550        mov     rax,qword ptr [rbp+50h]
;
; The next instruction accesses memory after the KDPC
; object in memory.  Recall that a KDPC object is 0x40
; bytes in length on x64, so [Dpc+40h] is the first
; value beyond the DPC in memory.  In reality, the KDPC
; is a member of a larger structure, which is defined
; as follows:
;
; struct PATCHGUARD_DPC_CONTEXT {
;  KDPC      Dpc;            // +0x00
;  ULONGLONG DecryptionKey;  // +0x40
; };
;
; As a result, this instruction is equivalent to casting
; the Dpc argument to a PATCHGUARD_DPC_CONTEXT*, and then
; accessing the DecryptionKey member.
;
; rcx = Dpc->DecryptionKey
;
fffff800`0112c92c 488b4840        mov     rcx,qword ptr [rax+40h]
;
; [rbp+40h] -> DecryptionKey
;
fffff800`0112c930 48894d40        mov     qword ptr [rbp+40h],rcx
;
; rax = DecryptionKey
;
fffff800`0112c934 488b4540        mov     rax,qword ptr [rbp+40h]
;
; The DeferredContext value is then xor'd with the
; decryption key stored in the PATCHGUARD_DPC_CONTEXT
; structure.  This yields the significant bits of the
; pointer to the PatchGuard decryption stub.  Recall
; that due to the "no-mans-land" region in between the
; kernel mode and user mode address space boundaries
; on current AMD64 processors, the rest of the bits
; are required to be either all ones or all zeros in
; order to form a valid address.  Because we are
; dealing with a kernel mode address, it can be safely
; assumed that all of the bits must be ones.
;
fffff800`0112c938 48334558        xor     rax,qword ptr [rbp+58h]
;
; [rbp+30h] -> DeferredContext ^ DecryptionKey
;
fffff800`0112c93c 48894530        mov     qword ptr [rbp+30h],rax
;
; Set the required bits to ones in the decrypted
; pointer, as required to form a canonical address on
; current AMD64 systems.
;
fffff800`0112c940 48b80000000000f8ffff mov rax,0FFFFF80000000000h
;
; [rbp+30h] -> [rbp+30h] | 0xFFFFF80000000000
;
; Now, [rbp+30h] is the pointer to the decryption stub.
;
fffff800`0112c94a 48094530        or      qword ptr [rbp+30h],rax
;
; The following instructions make extra copies of the decryption
; stub pointer on the stack of the DPC routine.  There is no real purpose
; to this, other than a half-hearted attempt to confuse anyone
; attempting to reverse engineer this section of PatchGuard.
;
; [rbp+38h] -> [rbp+30h] (Decryption stub)
;
fffff800`0112c94e 488b4530        mov     rax,qword ptr [rbp+30h]
fffff800`0112c952 48894538        mov     qword ptr [rbp+38h],rax
;
; [rbp+28h] -> [rbp+38h] (Decryption stub)
;
fffff800`0112c956 488b4538        mov     rax,qword ptr [rbp+38h]
fffff800`0112c95a 48894528        mov     qword ptr [rbp+28h],rax
;
; The next set of instructions rewrites the first
; four bytes of the initial opcode in the decryption
; stub.  This opcode must be set to the following
; instruction:
;
; f0483111        lock xor qword ptr [rcx],rdx
;
; The individual opcode bytes for the instruction are
; written to the decryption stub one byte at a time.
;
; *(PULONG)DecryptionStub = 0x113148f0
;
fffff800`0112c95e 488b4528        mov     rax,qword ptr [rbp+28h]
fffff800`0112c962 c600f0          mov     byte ptr [rax],0F0h
fffff800`0112c965 488b4528        mov     rax,qword ptr [rbp+28h]
fffff800`0112c969 c6400148        mov     byte ptr [rax+1],48h
fffff800`0112c96d 488b4528        mov     rax,qword ptr [rbp+28h]
fffff800`0112c971 c6400231        mov     byte ptr [rax+2],31h
fffff800`0112c975 488b4528        mov     rax,qword ptr [rbp+28h]
fffff800`0112c979 c6400311        mov     byte ptr [rax+3],11h
;
; Finally, a call to the decryption stub is made.  The
; decryption stub has a prototype that conforms to the
; following definition:
;
; VOID
; NTAPI
; PgDecryptionStub(
;       __in PVOID   PatchGuardRoutine,
;       __in ULONG64 DecryptionKey,
;       __in ULONG   Reserved0,
;       __in ULONG   Reserved1
;       );
;
; The two 'reserved' ULONG values are always set to zero.
;
; rcx is loaded with the address of the decryption stub,
; and rdx is loaded with the DecryptionKey value.
;
fffff800`0112c97d 4533c9          xor     r9d,r9d
fffff800`0112c980 4533c0          xor     r8d,r8d
fffff800`0112c983 488b5540        mov     rdx,qword ptr [rbp+40h]
fffff800`0112c987 488b4d38        mov     rcx,qword ptr [rbp+38h]
;
; At this point, control is transferred to the decryption
; stub, as described previously.  The decryption stub will
; decrypt itself, decrypt the PatchGuard integrity check
; routine, and then transfer control to the PatchGuard
; integrity check routine.  The integrity check routine is
; responsible for ensuring that the DPC is returned to a
; usable state (recall that parts of it were zeroed out
; by the DPC routine earlier), and that it is re-queued
; for execution.  It is also responsible for re-encrypting
; the decryption stub as desired.
;
fffff800`0112c98b ff5538          call    qword ptr [rbp+38h]
;
; After the call is made, the exception filter returns
; the EXCEPTION_EXECUTE_HANDLER manifest constant.  This
; causes one of the registered handlers to be invoked
; in order to handle the exception.  The handler will
; transfer control to the return point of the DPC routine,
; thus skipping the body of the DPC (since the call to
; the DPC was not a request for the legitimate function of
; the DPC to be performed).
;
fffff800`0112c98e 41b901000000    mov     r9d,1
fffff800`0112c994 eb03            jmp     nt!ExpTimeRefreshDpcRoutine+0x218 (fffff800`0112c999)

nt!ExpTimeRefreshDpcRoutine+0x215:
fffff800`0112c996 4533c9          xor     r9d,r9d

nt!ExpTimeRefreshDpcRoutine+0x218:
fffff800`0112c999 418bc1          mov     eax,r9d
fffff800`0112c99c 4883c420        add     rsp,20h
fffff800`0112c9a0 5d              pop     rbp
fffff800`0112c9a1 c3              ret

This does represent a significant level of obfuscation, but it is not
impenetrable, and there are various simple ways through which an attacker
could bypass all of these layers of obfuscation entirely.
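The pointer arithmetic performed by the handler above, and the opcode rewrite that precedes the call, can be expressed in C roughly as follows. This is only a sketch: the KDPC is modeled as 0x40 opaque bytes, and the function names are hypothetical. It assumes, as the listing does, that the decryption stub lives at a kernel address whose bits 43 through 63 are all ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model: a 0x40-byte KDPC followed by the decryption key. */
typedef struct {
    uint8_t  Dpc[0x40];      /* +0x00, KDPC treated as opaque bytes */
    uint64_t DecryptionKey;  /* +0x40 */
} PATCHGUARD_DPC_CONTEXT;

/* (DeferredContext ^ DecryptionKey) | 0xFFFFF80000000000 */
static uint64_t decode_stub_pointer(const PATCHGUARD_DPC_CONTEXT *ctx,
                                    uint64_t deferred_context)
{
    return (deferred_context ^ ctx->DecryptionKey) | 0xFFFFF80000000000ULL;
}

/* Writes f0 48 31 11 (lock xor qword ptr [rcx],rdx) over the stub's start. */
static void restore_stub_opcode(uint8_t *stub)
{
    static const uint8_t opcode[4] = { 0xF0, 0x48, 0x31, 0x11 };
    memcpy(stub, opcode, sizeof opcode);
}
```

Read back as a little-endian dword, the four rewritten bytes equal 0x113148F0, the value given in the listing.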

3.5) Disruption of Debug Register-Based Breakpoints

PatchGuard version 2 attempts to protect itself from breakpoints that are set
using the hardware debug registers.  These breakpoints operate by designating
up to four memory locations of interest.  Each memory
location can be configured to cause a debug exception when it is read,
written, or executed.  Because breakpoints of this flavor are not visible to
PatchGuard's code integrity checks (unlike conventional breakpoints, these
breakpoints do not involve int 3 (0xcc) opcodes being substituted for target
instructions), debug register-based breakpoints (sometimes known as "memory
breakpoints" or "hardware breakpoints") pose a threat to PatchGuard.
PatchGuard attempts to counter this threat by disabling all such debug
register-based breakpoints as a first step after the system integrity checking
routine has been decrypted in-memory:

;
; Here, the second stage decryption sequence is
; set to run to decrypt the system integrity
; check routine.  We step over the second stage
; decryption and examine the integrity check
; routine in its plaintext state...
;

fffffadf`f6edc043 8b4a4c          mov     ecx,dword ptr [rdx+4Ch]
fffffadf`f6edc046 483144ca48      xor     qword ptr [rdx+rcx*8+48h],rax
fffffadf`f6edc04b 48d3c8          ror     rax,cl
fffffadf`f6edc04e e2f6            loop    fffffadf`f6edc046
fffffadf`f6edc050 8b8288010000    mov     eax,dword ptr [rdx+188h]
fffffadf`f6edc056 4803c2          add     rax,rdx
fffffadf`f6edc059 ffe0            jmp     rax
fffffadf`f6edc05b 90              nop
;
; We set a breakpoint on the 'jmp rax' instruction
; above.  This instruction is what transfers control
; to the system integrity check routine.
;
0: kd> ba e1 fffffadf`f6edc059
0: kd> g
Breakpoint 2 hit
fffffadf`f6edc059 ffe0            jmp     rax
;
; rax now points to the decrypted system
; integrity check routine in-memory.  The
; first call it makes is to a routine whose
; purpose is to disable all debug register-based
; breakpoints by clearing the debug control
; register (dr7).  Doing so effectively turns
; off all of the debug register breakpoints.
;
0: kd> u @rax
fffffadf`f6edd8de 4883ec78        sub     rsp,78h
fffffadf`f6edd8e2 48895c2470      mov     qword ptr [rsp+70h],rbx
fffffadf`f6edd8e7 48896c2468      mov     qword ptr [rsp+68h],rbp
fffffadf`f6edd8ec 4889742460      mov     qword ptr [rsp+60h],rsi
fffffadf`f6edd8f1 48897c2458      mov     qword ptr [rsp+58h],rdi
fffffadf`f6edd8f6 4c89642450      mov     qword ptr [rsp+50h],r12
fffffadf`f6edd8fb 488bda          mov     rbx,rdx
fffffadf`f6edd8fe 4c896c2448      mov     qword ptr [rsp+48h],r13
0: kd> u
fffffadf`f6edd903 e8863a0000      call    fffffadf`f6ee138e
;
; The routine simply writes all zeros to dr7.
;
0: kd> u fffffadf`f6ee138e
fffffadf`f6ee138e 33c0            xor     eax,eax
fffffadf`f6ee1390 0f23f8          mov     dr7,rax
fffffadf`f6ee1393 c3              ret
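Both steps shown above, the second-stage decryption loop and the clearing of dr7, are easy to model in C. In the sketch below, blob stands for the memory starting at rdx+48h; the initial key in rax is set up by code that is not shown, so it is simply a parameter, and a nonzero count is assumed. Index 0 is never touched, which is consistent with the count itself being read from rdx+4Ch, inside that first qword. Because the keystream depends only on the key and the index, running the loop twice restores the original data. The DR7 helper relies on the architectural layout of the enable bits (L0/G0 in bits 0 and 1, L1/G1 in bits 2 and 3, and so on); all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t ror64(uint64_t v, unsigned n)
{
    n &= 63;  /* ror masks its count to 6 bits for 64-bit operands */
    return n ? (v >> n) | (v << (64 - n)) : v;
}

/* Model of: xor qword ptr [rdx+rcx*8+48h],rax ; ror rax,cl ; loop
 * blob[i] is the qword at rdx+48h+i*8; indices count..1 are processed. */
static void stage2_xor(uint64_t *blob, uint32_t count, uint64_t key)
{
    for (uint32_t i = count; i != 0; i--) {
        blob[i] ^= key;
        key = ror64(key, i & 0xFF);  /* cl is the low byte of rcx */
    }
}

/* A debug register breakpoint is armed if its L or G enable bit is set. */
static int dr7_breakpoint_enabled(uint64_t dr7, unsigned index)
{
    return ((dr7 >> (index * 2)) & 3) != 0;
}
```

After PatchGuard writes zero to dr7, dr7_breakpoint_enabled reports every one of the four breakpoints as disarmed.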

3.6) Misleading Symbol Names

One of the things that Microsoft needed to consider when implementing
PatchGuard is that would-be attackers would have access to the operating
system symbols.  As a debugging aid, Microsoft makes symbols for the entire
operating system publicly available.  It is not feasible to remove the
operating system symbols from public access (doing so would severely hinder
ISVs in the process of debugging their own drivers).  As a result, Microsoft
took the route of using misleading function names to shroud PatchGuard
routines from casual inspection.  Many of the internal PatchGuard routines
have names that sound legitimate at first glance, such that
without a detailed knowledge of the kernel or actually inspecting these
routines, it would be difficult to simply look at a list of all symbols in the
kernel and locate the routines responsible for setting up PatchGuard.

The following is a listing of some of the misleading symbols that are used
during PatchGuard initialization:

  1. RtlpDeleteFunctionTable
  2. FsRtlMdlReadCompleteDevEx
  3. RtlLookupFunctionEntryEx
  4. SdbpCheckDll
  5. FsRtlUninitializeSmallMcb
  6. KiNoDebugRoutine
  7. SepAdtInitializePrivilegeAuditing
  8. KiFilterFiberContext

3.7) Integrity Checks Performed During System Initialization

During system initialization, PatchGuard performs integrity checks on several
of the anti-debug mechanisms it has in place.  If these mechanisms are altered
on-disk, PatchGuard will detect the changes.  For example, PatchGuard
validates that the routine responsible for clearing debug register-based
breakpoints contains the correct opcode bytes corresponding to the
instructions used to actually zero out Dr7:

;
; Here, we are in SepAdtInitializePrivilegeAuditing, or the
; initialization routine for PatchGuard during system startup.
;
; This code fragment is designed to validate that the
; KiNoDebugRoutine routine contains the expected opcodes that
; are used to zero out debug register breakpoints.  If the
; routine does not contain the correct opcodes, PatchGuard
; makes an early exit from SepAdtInitializePrivilegeAuditing.
;
INIT:0000000000832A6D lea     rax, KiNoDebugRoutine
INIT:0000000000832A74 cmp     dword ptr [rax], 230FC033h
INIT:0000000000832A7A jnz     abort_initialization
INIT:0000000000832A80 add     rax, 4
INIT:0000000000832A84 cmp     word ptr [rax], 0C3F8h
INIT:0000000000832A89 jnz     abort_initialization
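The two comparisons above amount to checking six known bytes: 33 c0 (xor eax,eax), 0f 23 f8 (mov dr7,rax) and c3 (ret), read as a little-endian dword followed by a little-endian word. A C sketch of the same check, with hypothetical names and explicit little-endian loads so that it does not depend on host byte order:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static uint16_t load_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | p[1] << 8);
}

/* Mirrors: cmp dword ptr [rax],230FC033h / cmp word ptr [rax+4],0C3F8h */
static bool no_debug_routine_intact(const uint8_t *code)
{
    return load_le32(code) == 0x230FC033u && load_le16(code + 4) == 0xC3F8u;
}
```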

3.8) Overwriting PatchGuard Initialization Code Post-Boot

After PatchGuard has initialized itself, it intentionally zeros out much of
the code responsible for setting up PatchGuard.  It is assumed that this is
done in an attempt to prevent third party drivers from analyzing kernel code
in-memory in order to detect or defeat PatchGuard.  This approach is obviously
trivially bypassed by opening the kernel image on disk, however.

After boot, many PatchGuard-related routines contain all zeros:

0: kd> u nt!KiNoDebugRoutine
nt!KiNoDebugRoutine:
fffff800`011a4b20 0000            add     byte ptr [rax],al

0: kd> u nt!FsRtlUninitializeSmallMcb
nt!FsRtlUninitializeSmallMcb:
fffff800`011a4aa2 0000            add     byte ptr [rax],al

0: kd> u nt!KiGetGdtIdt
nt!KiGetGdtIdt:
fffff800`011a4a20 0000            add     byte ptr [rax],al

0: kd> u nt!RtlpDeleteFunctionTable
nt!RtlpDeleteFunctionTable:
fffff800`011a1010 0000            add     byte ptr [rax],al

Most of the PatchGuard initialization code resides in the INITKDBG section of
ntoskrnl.  Portions of this section are zeroed out during initialization.

4) Bypass Techniques

Despite the myriad anti-reverse-engineering and anti-debug techniques employed
by PatchGuard version 2, it is far from immune to being bypassed by third
party code.  Contrary to what one might expect, given the descriptions in the
initial section of this article, there are a number of holes in PatchGuard's
armor that can be exploited by third party software.  Several potential
techniques for bypassing PatchGuard version 2 are outlined below, including
one technique that includes functional proof of concept code.  These
techniques are applicable to the version of PatchGuard currently shipping with
Windows XP x64 Edition with all hotfixes, Windows Server 2003 x64 Edition with
all hotfixes, and Windows Vista x64 with all hotfixes at the time that this
article was written.  The author has only written a complete implementation of
the first proposed bypass technique, although the remaining proposed bypass
approaches are expected to be viable in principle.

4.1) Interception of _C_specific_handler

The simplest course of action for disabling PatchGuard version 2 is, in the
author's opinion, to intercept execution at _C_specific_handler.  The
_C_specific_handler routine is responsible for dispatching exceptions for
routines compiled with the Microsoft C/C++ compiler that use structured
exception handling (try/except or try/finally clauses).  This set of functions
includes all ten of the PatchGuard DPC routines and most other C/C++ functions
in the kernel.  It also includes many third party driver routines;
_C_specific_handler is exported, and the compiler references this function
(imported from ntoskrnl) for all C/C++ driver images that utilize SEH in some
form.  Due to this, Microsoft is
forced to export _C_specific_handler from the kernel perpetually, making it
difficult for Microsoft to deny access to the routine's address from the
perspective of third party drivers.  Furthermore, because _C_specific_handler
is exported from the kernel, it is trivial to retrieve its address across all
kernel versions from the context of a third party driver.  This approach
capitalizes on the fact that PatchGuard utilizes SEH in order to obfuscate the
call to the system integrity checking routine, in effect turning this
obfuscation mechanism into a convenient way to hijack execution control before
the system integrity check is actually performed.

This approach can be implemented in several different ways, but the basic idea
is to intercept execution somewhere between the faulting instruction in the
PatchGuard DPC (whichever is selected at boot time), and the exception
handlers associated with the DPC routine which invoke the PatchGuard system
integrity check routine.  With this in mind, _C_specific_handler is exactly
what one could hope for; _C_specific_handler is invoked when the benign access
violation triggered by the bogus DeferredContext value to the PatchGuard DPC
routine is raised.  Furthermore, because the routine is exported, there are no
concerns about compatibility with future kernel versions, or different flavors
of the kernel (PAE vs non-PAE, MP vs UP, and so forth).

Although hooking _C_specific_handler provides a convenient way to gain control
of execution in the execution path for the PatchGuard check routine, there
remains the problem of how to safely defuse the check routine and resume
execution at a safe point such that DPCs continue to be processed by the
system in a timely fashion.  On x86, this would pose a serious problem, as in
this context, we (as an attacker attempting to bypass PatchGuard) would gain
control at an exception handler with a context record describing the context
in the middle of the PatchGuard DPC routine, with no good way to unwind the
context back up to the DPC routine's caller (the kernel timer DPC dispatcher).

Ironically, because PatchGuard exists only on x64 and not on x86, this problem,
which might have been difficult to solve in a generalized fashion on x86, is
trivial.  Specifically, there is extensive unwind support baked into the core
of the x64 calling convention on Windows, such that there exists metadata
describing how to unwind any function that manipulates the stack at any point
in its execution lifetime.  This metadata is used to implement unwind
semantics that allow functions to be cleanly unwound without having to call
exception/unwind handlers implemented in code that depend on the execution
context of the routine they are associated with.  This extensive unwind
metadata can be used to our advantage here, as it provides a clean mechanism
to unwind past the DPC routine (to the DPC dispatcher) in a completely
compatible and kernel-version-independent manner.  Furthermore, there is no
good way for Microsoft to disable this unwind metadata, given how deeply
involved it is with the x64 calling convention.

The process of using the unwind metadata of a function to unwind an execution
context is known as a virtual unwind, and there is a documented, exported
routine [5] to implement this mechanism: RtlVirtualUnwind.  Using
RtlVirtualUnwind, it is possible to alter the execution context that is
provided as an argument to _C_specific_handler (and thus the hook on
_C_specific_handler).  This execution context describes the machine state at
the time of the access violation in the PatchGuard DPC routine.  After
performing a virtual unwind on this execution context, all that remains is to
return the manifest ExceptionContinueExecution constant to the kernel mode
exception dispatcher in order to realize the altered context.  This completely
bypasses the PatchGuard system integrity check.  As an added bonus, the hook
on _C_specific_handler is only needed until the first time PatchGuard is
called.  This is due to the fact that the PatchGuard timer is a one-shot
timer, and as the code to re-queue the timer is skipped by the virtual unwind,
PatchGuard is effectively permanently disabled for the remainder of the
Windows boot session.
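To make the idea of a virtual unwind concrete, the following C sketch handles only the simplest case: a function whose sole prolog operation is a fixed stack allocation, like the sub rsp,0C8h at the start of ExpTimeRefreshDpcRoutine. It is a hypothetical stand-in for illustration, not RtlVirtualUnwind itself; the real routine interprets the full unwind metadata (pushes, saved nonvolatile registers, frame registers) and updates a complete CONTEXT record.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, heavily reduced register context. */
typedef struct {
    uint64_t Rip;
    uint64_t Rsp;
} mini_context;

/* Unwinds one frame whose prolog is only "sub rsp,frame_alloc".
 * `stack` models memory, with stack[0] located at address stack_base. */
static void mini_virtual_unwind(mini_context *ctx, uint64_t frame_alloc,
                                const uint64_t *stack, uint64_t stack_base)
{
    ctx->Rsp += frame_alloc;                        /* undo the allocation */
    ctx->Rip = stack[(ctx->Rsp - stack_base) / 8];  /* caller's return address */
    ctx->Rsp += 8;                                  /* pop the return address */
}
```

Once the context describes the caller (the DPC dispatcher), returning ExceptionContinueExecution resumes execution there, skipping both the DPC body and the code that would re-queue the PatchGuard timer.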

The last remaining obstacle with this bypass technique is filtering out the
specific PatchGuard access violation exceptions from legitimate access
violations that kernel mode code may produce. This is important, as access
violations in kernel mode are a normal part of parameter validation (the probe
and lock model used to validate user mode pointers) for drivers and system
services. Fortunately, it is easy to make this determination, as it is
generally only legal to use a try/except to catch an access violation relating
to a user mode address from kernel mode (as previously described). PatchGuard
is a rare exception to this rule, in that it has a well-defined no-man's-land
region where accesses can be attempted without fear of a bugcheck occurring.
As a result, it is a safe assumption that any access violation relating to a
kernel mode address is either PatchGuard triggering its own execution, or a very
badly behaved third party driver that is grossly breaking the rules relating
to Windows kernel mode drivers.  It is the author's opinion that the latter
case is not worth considering as a blocker, especially since if such a
completely broken driver were to exist, it would already be randomly bringing
the system down with bugchecks.  It is worth noting, as an addendum, that the
referenced address in the exception information block passed to the exception
handler will always be 0xFFFFFFFF`FFFFFFFF due to how violations on
non-canonical addresses are reported by the processor (such an access raises a
general protection fault rather than a page fault, and a general protection
fault does not report the faulting address).  This does not impact the
viability of this technique as a valid way to bypass PatchGuard in a
version-independent manner, however.
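The filtering rule above depends on classifying the referenced address.  The
following is a minimal sketch, assuming 48-bit virtual addressing (bits 63
through 47 of a canonical address are all equal); the helper name
classify_x64_address and the enum are hypothetical illustrations, not kernel
APIs.  Note that the 0xFFFFFFFF`FFFFFFFF value reported for PatchGuard's fault
is itself a canonical kernel-half address, so it falls on the kernel side of
such a filter.

```c
#include <stdint.h>

/* Hypothetical classification of an x64 virtual address, assuming 48-bit
   virtual addressing: a canonical address has bits 63..47 all equal. */
typedef enum {
    ADDR_USER_HALF,     /* canonical, bit 47 clear (0 .. 0x00007FFF`FFFFFFFF) */
    ADDR_KERNEL_HALF,   /* canonical, bit 47 set (0xFFFF8000`00000000 and up)  */
    ADDR_NON_CANONICAL  /* the hole between the two canonical halves           */
} addr_class;

static addr_class classify_x64_address(uint64_t va)
{
    uint64_t top = va >> 47;          /* the 17 bits 63..47 */
    if (top == 0)
        return ADDR_USER_HALF;
    if (top == 0x1FFFF)
        return ADDR_KERNEL_HALF;
    return ADDR_NON_CANONICAL;
}
```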

It is worth noting that the fact that this technique involves modifying the
kernel is not a problem (aside from the inherent race conditions involved in
safely patching a running binary).  The hook will disable PatchGuard before
PatchGuard has a chance to notice the hook from the context of the system
integrity check routine.

This proposed approach has several advantages over the approach previously
suggested in Uninformed's original paper on PatchGuard [2].  Specifically, it does
not involve locating each individual DPC routine (and does not even rely on
any sort of code fingerprinting; only exported symbols are used).  This
improves both the reliability of the proposed approach (as code fingerprinting
always introduces an additional margin of error as far as false positives go)
and its resiliency to attack by Microsoft.  Because this technique relies
solely on exported functions, and does not carry any sort of dependency on how
many possible DPCs are available to PatchGuard for use (or any sort of
dependency on locating them at runtime), blocking this approach would be
significantly more involved than simply adding another possible DPC routine or
changing the attributes of an existing DPC routine in an effort to thwart
third-party drivers that were taking a signature-based approach to locating
DPC routines for patching.

Although this technique is quite resilient to kernel changes that do not
directly involve the underlying mechanisms by which PatchGuard itself
functions (the fact that it can operate unmodified on both Windows Server 2003
x64 and Windows Vista x64 is testament to this fact), there are a number of
different ways by which Microsoft could block this attack in a future update
to PatchGuard.  The most obvious solution is to entirely abandon SEH as a core
mechanism involved in arranging for the PatchGuard system integrity check.
Abandoning SEH removes the convenient mechanism (hooking _C_specific_handler)
that is presented here as a version-independent way to hook in to the
execution path involved in PatchGuard's system integrity check. If Microsoft
were to go this route, a would-be attacker would need to devise another
mechanism to achieve control of execution before the system integrity check
runs.  Assuming that Microsoft played their hand correctly, a future
PatchGuard revision would not have such an easily-accessible mechanism to hook
into the execution process in a generic manner, largely counteracting this
proposed approach.  Microsoft could also employ some sort of pre-validation of
the exception handler path before the DPC triggers an exception, although
given that this is neither the easiest nor the most elegant way to counter
such a technique, the author feels that it is an unlikely solution.

4.2) Interception of DPC Exception Registration

Presently, all execution paths leading to the execution of PatchGuard DPC
routines involve an exception/unwind handler.  This is another single point of
failure weakness that can be exploited by third parties attempting to disable
PatchGuard.  An approach involving the detection of all of the PatchGuard DPC
routines, followed by interception of the exception handler registrations for
each DPC is proposed as another means of defeating PatchGuard.

Though this technique is not as clean or clear-cut as the technique proposed
in 4.1, this approach is considered by the author as a viable bypass mechanism
for PatchGuard version 2.  This technique essentially involves patching the
exception registrations for each possible DPC routine that could be used by
PatchGuard, such that each exception registration points to a routine that
employs a virtual unwind to safely exit out of the PatchGuard DPC without
invoking the system integrity check.  Any such approach faces several
obstacles, however.

The first major difficulty for this technique is locating each PatchGuard DPC.
Since none of the PatchGuard DPC routines are exported, a little bit more
creative thinking is involved in finding the locations to patch.  The author
feels that a combination of pattern matching and code fingerprinting would
best serve this goal; there are a number of commonalities between the
different PatchGuard DPC routines that could be used to locate them with a
relatively high degree of confidence in PatchGuard version 2.  Specifically,
the author feels that the following criteria are acceptable for use in
detecting the PatchGuard DPC routines:

  1. Each DPC routine has one exception/unwind-marked registration with
     _C_specific_handler.  
  2. Each DPC routine has exactly four _C_specific_handler scopes.
  3. Each DPC routine is referenced in raw address form (64-bit pointer) in
     the executable code sections comprising ntoskrnl at least twice.
  4. Each DPC routine has at least two _C_specific_handler scopes with an
     associated unwind/exception handler.
  5. Each DPC routine has exactly one _C_specific_handler scope with a call to
     a common subfunction that references RtlUnwindEx (an exported routine).
  6. Each DPC routine has several sets of distinctive, normally rare
     instructions (ror/rol instructions).
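Criteria 1, 2 and 4 all require reading the handler registration out of a
routine's unwind metadata.  The sketch below parses an UNWIND_INFO block
according to Microsoft's documented x64 exception-handling format.  It assumes
a little-endian host, that the block is already mapped into memory, and
(without verifying it) that the registered handler is _C_specific_handler, so
that a SCOPE_TABLE whose first ULONG is the scope count follows the handler
RVA.  The names parse_unwind_handler and unwind_handler_info are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Flag values from the documented x64 UNWIND_INFO format. */
#define UNW_FLAG_EHANDLER 0x1
#define UNW_FLAG_UHANDLER 0x2

typedef struct {
    uint32_t handler_rva;  /* RVA of the registered language-specific handler */
    uint32_t scope_count;  /* SCOPE_TABLE.Count, meaningful for _C_specific_handler */
} unwind_handler_info;

/* Returns 1 and fills *out if the UNWIND_INFO at 'ui' registers an exception
   or unwind handler; returns 0 otherwise or if the buffer is too short. */
static int parse_unwind_handler(const uint8_t *ui, size_t size,
                                unwind_handler_info *out)
{
    if (size < 4)
        return 0;
    uint8_t flags = ui[0] >> 3;       /* byte 0: Version:3 (low), Flags:5 (high) */
    uint8_t count_of_codes = ui[2];   /* number of 2-byte UNWIND_CODE slots */
    if (!(flags & (UNW_FLAG_EHANDLER | UNW_FLAG_UHANDLER)))
        return 0;
    /* The UNWIND_CODE array is padded to an even number of slots. */
    size_t off = 4 + 2 * (size_t)((count_of_codes + 1) & ~1u);
    if (off + 8 > size)
        return 0;
    memcpy(&out->handler_rva, ui + off, 4);      /* ExceptionHandler RVA */
    memcpy(&out->scope_count, ui + off + 4, 4);  /* first ULONG of SCOPE_TABLE */
    return 1;
}
```

Checking that handler_rva resolves to _C_specific_handler and that
scope_count equals four would cover criteria 1 and 2; criterion 4 would
additionally require walking the individual scope records.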

Given several (or even all) of these criteria, it should be possible to
accurately locate all ten DPC routines via scanning non-pageable code in the
kernel.  It is possible to locate the exception registration information for
the DPC routines through processing of the exception directory for the kernel
(and indeed, most of the criteria require doing this as a prerequisite).
Locating the kernel image base is fairly trivial as well; the address of an
exported routine can be taken, and truncated to a 64K region.  From there, one
need only perform downward searches in 64K increments for the DOS header
signature (followed by a check for a PE32+ header).
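The image base search can be sketched as follows.  To keep the example
self-contained and testable, kernel memory is modelled as a byte array whose
offset 0 is assumed to be 64K-aligned; the function names are hypothetical.
The header checks follow the PE format (e_lfanew at offset 0x3C, the
"PE\0\0" signature, and an optional header magic of 0x20B for PE32+).  The
sketch does not address the possibility of touching unmapped pages during a
walk over live memory.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SEARCH_GRANULARITY 0x10000u   /* 64K steps, per the text */

/* Returns 1 if a PE32+ image header starts at mem[off]. */
static int is_pe32plus_at(const uint8_t *mem, size_t mem_size, size_t off)
{
    if (off + 0x40 > mem_size || mem[off] != 'M' || mem[off + 1] != 'Z')
        return 0;
    uint32_t e_lfanew;
    memcpy(&e_lfanew, mem + off + 0x3C, 4);      /* IMAGE_DOS_HEADER.e_lfanew */
    size_t nt = off + e_lfanew;
    if (nt + 4 + 20 + 2 > mem_size || memcmp(mem + nt, "PE\0\0", 4) != 0)
        return 0;
    uint16_t magic;
    memcpy(&magic, mem + nt + 4 + 20, 2);        /* OptionalHeader.Magic */
    return magic == 0x20B;                       /* PE32+ */
}

/* Truncate 'inside' (an offset known to lie within the image) to 64K and walk
   downward in 64K steps until a PE32+ header is found.  Returns the offset of
   the image base, or (size_t)-1 if the start of 'mem' is reached. */
static size_t find_image_base(const uint8_t *mem, size_t mem_size, size_t inside)
{
    size_t off = inside & ~(size_t)(SEARCH_GRANULARITY - 1);
    for (;;) {
        if (is_pe32plus_at(mem, mem_size, off))
            return off;
        if (off < SEARCH_GRANULARITY)
            return (size_t)-1;
        off -= SEARCH_GRANULARITY;
    }
}
```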

Another hurdle that must be solved for this approach is the placement of the
replacement exception handler routines.  These routines are required to be
located within the 4GB region starting at the kernel image base (handler
addresses in the unwind metadata are stored as unsigned 32-bit RVAs relative to
the image base), meaning that in general, it is not practical to simply store them
in a driver binary or pool allocation (by default, these addresses are usually
far more than 4GB away from the kernel image base).  There are no documented
and exported routines to allocate kernel mode virtual memory at a specific
virtual address to the author's knowledge.  However, other, less savory
approaches could theoretically be taken (such as allocating physical memory
and altering paging structures directly to create a valid memory region within
4GB of the kernel image base).
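The 4GB restriction follows from how handler addresses are encoded: the
exception dispatcher adds an unsigned 32-bit RVA to the image base.  A minimal
check, using a hypothetical helper name, for whether a candidate replacement
handler can be expressed in the unwind metadata at all:

```c
#include <stdint.h>

/* Returns 1 and stores the RVA if 'handler' lies in
   [image_base, image_base + 4GB); returns 0 otherwise. */
static int handler_rva_for(uint64_t image_base, uint64_t handler, uint32_t *rva)
{
    if (handler < image_base)
        return 0;                     /* RVAs are unsigned: no negative offsets */
    uint64_t delta = handler - image_base;
    if (delta > UINT32_MAX)
        return 0;                     /* too far above the image base */
    *rva = (uint32_t)delta;
    return 1;
}
```

As the text notes, driver images and pool allocations usually fail this check,
which is why placing the replacement handlers is a real obstacle.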

After one has solved these difficulties, the rest of this approach is fairly
trivial (and similar to portions of the technique described in 4.1).
Specifically, the replaced exception handlers need to invoke RtlVirtualUnwind
to unwind back to the kernel DPC dispatcher, and then request that execution
be resumed at the unwound context.

This mechanism is not nearly as robust as the first from the author's point of
view, though both approaches could be disabled by abandoning SEH entirely as a
critical path in the execution of the PatchGuard system integrity check
routine.  Specifically, Microsoft could change the characteristics of the DPC
routines in an attempt to frustrate fingerprinting and detection of them at
runtime.  Pre-validation of unwind metadata (or additional checks in the
exception dispatcher itself to ensure that all SEH routines registered as part
of an image are within the confines of the image in-memory) could also be used
to defeat this technique.  There are other security benefits to validating
that SEH routines on x64 that are registered as part of an image really exist
within an image, as will be discussed below.  As such, the author would expect
this to appear in a future Windows version.

4.3) Interception of PsInvertedFunctionTable

Another variation on the theme of intercepting PatchGuard within the SEH code
path critical to the system integrity check routine involves taking advantage
of an optimization that exists in the x64 exception dispatcher.  Specifically,
it is possible to utilize the fact that the exception dispatcher on x64 uses a
cache to improve the performance of exception handling.  By taking advantage
of this cache, it may be possible to intercept control of execution when the
PatchGuard DPC routine deliberately creates an access violation exception in
order to trigger the system integrity check.  This proposed technique uses the
nt!PsInvertedFunctionTable global variable in the kernel, which represents a
cache used to perform a fast translation of RIP values to an associated image
base and exception directory pointer, without having to do a (slow) search
through the linked list of loaded kernel modules.

This technique is fairly similar to the one described in technique 4.2.  Instead
of altering the actual exception directory entries corresponding to each
PatchGuard DPC routine in the kernel's image in-memory, this technique alters
the cached exception directory pointer stored within PsInvertedFunctionTable.
PsInvertedFunctionTable is consulted by RtlLookupFunctionTableEntry, in order
to translate a RIP value to an associated image (and unwind metadata block).
The logic within RtlLookupFunctionTableEntry is essentially to search through the
cached entries resident in PsInvertedFunctionTable for an image that
corresponds to a given RIP value.  If a hit is found, then the exception
directory pointer is loaded directly from the PsInvertedFunctionTable cache,
instead of through the (slower) process of parsing the PE header of the given
image.  If no hit is found, then the loaded module linked list is searched.
Assuming a hit is made in the loaded module list, then the PE header for the
associated module is processed in order to locate the exception directory for
the module.  From there, the exception directory is searched to locate the
unwind metadata block corresponding to the function containing the specified
RIP value.

The structure backing PsInvertedFunctionTable (RTL_INVERTED_FUNCTION_TABLE)
can be described as so in C:

typedef struct _RTL_INVERTED_FUNCTION_TABLE_ENTRY
{
    PIMAGE_RUNTIME_FUNCTION_ENTRY ExceptionDirectory;     // cached exception directory pointer
    PVOID                         ImageBase;              // start of the module in memory
    ULONG                         ImageSize;              // size of the module in memory
    ULONG                         ExceptionDirectorySize; // size of the exception directory
} RTL_INVERTED_FUNCTION_TABLE_ENTRY, * PRTL_INVERTED_FUNCTION_TABLE_ENTRY;

typedef struct _RTL_INVERTED_FUNCTION_TABLE
{
    ULONG Count;      // number of entries currently in use
    ULONG MaxCount;   // 160 in Windows Server 2003, 512 in Windows Vista
    ULONG Pad[ 0x2 ]; // header is 16 bytes in total
    RTL_INVERTED_FUNCTION_TABLE_ENTRY Entries[ ANYSIZE_ARRAY ];
} RTL_INVERTED_FUNCTION_TABLE, * PRTL_INVERTED_FUNCTION_TABLE;

In Windows Server 2003, there is space reserved for up to 160 loaded modules
in the array contained within PsInvertedFunctionTable.  In Windows Vista, this
number has been expanded to 512 module entries.  The array of loaded modules
is maintained by the system module loader such that when a module is loaded or
unloaded, a corresponding entry within PsInvertedFunctionTable is created or
deleted, respectively.  It is not a fatal error for the module array within
PsInvertedFunctionTable to be exhausted; in this case, performance for
exception dispatching relating to additional modules will be slower, but the
system will still function.
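For illustration, the structure above can be mirrored with fixed-width types
and searched the way the text describes RtlLookupFunctionTableEntry consulting
the cache.  This is a sketch under stated assumptions: the layout is taken from
the definition above (an undocumented structure that may change between
builds), the array is sized for the Windows Server 2003 limit of 160 entries,
and the linear scan is chosen for clarity rather than reproduced from the
kernel.  All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define INV_MAX_ENTRIES 160   /* Windows Server 2003 limit; 512 on Vista */

/* Fixed-width mirror of RTL_INVERTED_FUNCTION_TABLE_ENTRY (pointers as uint64_t). */
typedef struct {
    uint64_t ExceptionDirectory;
    uint64_t ImageBase;
    uint32_t ImageSize;
    uint32_t ExceptionDirectorySize;
} inv_entry;

typedef struct {
    uint32_t Count;
    uint32_t MaxCount;
    uint32_t Pad[2];
    inv_entry Entries[INV_MAX_ENTRIES];
} inv_table;

_Static_assert(sizeof(inv_entry) == 24, "each cached entry is 24 bytes");
_Static_assert(offsetof(inv_table, Entries) == 16, "the header is 16 bytes");

/* Return the cached entry whose image range contains 'rip', or NULL on a
   cache miss (the real code then falls back to the loaded module list). */
static const inv_entry *lookup_inverted(const inv_table *t, uint64_t rip)
{
    for (uint32_t i = 0; i < t->Count && i < INV_MAX_ENTRIES; i++) {
        const inv_entry *e = &t->Entries[i];
        if (rip >= e->ImageBase && rip - e->ImageBase < e->ImageSize)
            return e;
    }
    return NULL;
}
```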

Because the RIP-to-exception-directory cache described by
PsInvertedFunctionTable maintains a full 64-bit pointer to the exception
directory of the associated module, it is possible to disassociate the cached
exception directory pointer from its corresponding image.  In other words, it
is possible to modify the ExceptionDirectory member of a particular cached
RTL_INVERTED_FUNCTION_TABLE_ENTRY to point to an arbitrary location instead of
the exception directory of that module.  There are no security or integrity
checks that validate that the ExceptionDirectory member points to within the
given image.  This could be exploited by a third-party driver in order to take
control of exception dispatching for any of the first 160 (or 512, in the case
of Windows Vista) kernel modules.  This loaded module list includes critical
images such as the HAL (typically the first entry in the cache) and the kernel
itself (typically the second entry in the cache).  With respect to bypassing
PatchGuard, this makes it possible for a third party driver to copy the
exception directory data of the kernel to dynamically allocated memory and
adjust it such that exception handlers for the PatchGuard DPC routines point
to a stub function that invokes a virtual unwind as described in technique 4.2.
After setting up its altered shadow copy of the exception directory for the
kernel, all that a third party driver would need to do is swap the
ExceptionDirectory pointer within the PsInvertedFunctionTable cache entry for
the kernel with the pointer to the shadow copy.  Following that, this approach
is essentially the same as the proposed approach described in 4.2.  It has the
added advantage of being more difficult to detect from the perspective of
validating the integrity of the exception dispatching path, as the exception
directory associated with the kernel image in-memory is not actually altered;
only a pointer to the exception directory in a cache is changed.

This approach does require a reliable mechanism to detect
PsInvertedFunctionTable (which is not exported) at run-time, however.  The
author feels that this is not a particularly difficult task, as the first few
members of PsInvertedFunctionTable (specifically, the maximum entry count and
the entries for the HAL and kernel) will have predictable values that can be
used in a classic egghunt style search of kernel global variable space.
Additional heuristics, such as requiring several data references to the
suspected PsInvertedFunctionTable location within kernel code could be applied
as well, in the interest of improving accuracy.
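The egghunt could look roughly like the following, run over a snapshot of
kernel global data.  The match conditions (MaxCount equal to an expected
value, a plausible Count, and the HAL and kernel as the first two cached
images) come from the text; the function name, the 8-byte scan stride and the
field offsets derived from the layout given earlier are assumptions.  The
additional heuristic of counting code references to a candidate is not shown.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Offsets assume the 16-byte header and 24-byte entries described above:
   Entries[i].ImageBase lives at 16 + 24 * i + 8. */
static size_t find_inverted_table(const uint8_t *data, size_t size,
                                  uint32_t max_count,
                                  uint64_t hal_base, uint64_t kernel_base)
{
    for (size_t off = 0; off + 56 <= size; off += 8) {
        uint32_t count, maxc;
        uint64_t base0, base1;
        memcpy(&count, data + off, 4);
        memcpy(&maxc, data + off + 4, 4);
        if (maxc != max_count || count < 2 || count > maxc)
            continue;                              /* header does not fit */
        memcpy(&base0, data + off + 16 + 8, 8);      /* Entries[0].ImageBase */
        memcpy(&base1, data + off + 16 + 24 + 8, 8); /* Entries[1].ImageBase */
        if (base0 == hal_base && base1 == kernel_base)
            return off;
    }
    return (size_t)-1;
}
```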

This proposed approach may be countered by many of the proposed counters to
techniques 4.1 and 4.2.  Additionally, this technique could also be countered by
validating exception directory pointers within PsInvertedFunctionTable, such
as by ensuring that such exception directory pointers are within the confines
of the purported associated image.  Although this validation is not perfect
since it might still be possible for one to reposition the exception directory
pointer to a different location within the image that could be safely modified
at runtime, such as overlapping a large global variable array or the like, it
would certainly increase the difficulty of subverting the exception
dispatcher's RIP translation cache.  Additional validation techniques, such as
requiring that the exception directory point to read-only memory, could be
similarly adopted to reduce the chance that a third party driver could
meaningfully subvert the cache (with results leading to something other than a
system crash).
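The simplest of the proposed counters, checking that a cached exception
directory lies inside its image, amounts to a bounds check.  A minimal sketch
with a hypothetical name; as noted above, it would not catch a pointer moved to
another writable location inside the image, and the read-only requirement
would need a separate check of section or page protections.

```c
#include <stdint.h>

/* Returns 1 if [exc_dir, exc_dir + exc_dir_size) lies entirely within
   [image_base, image_base + image_size). */
static int exception_directory_in_image(uint64_t image_base, uint32_t image_size,
                                        uint64_t exc_dir, uint32_t exc_dir_size)
{
    if (exc_dir < image_base)
        return 0;
    uint64_t start = exc_dir - image_base;        /* offset into the image */
    return start <= image_size && exc_dir_size <= image_size - start;
}
```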

It should be noted that in the current implementation, PsInvertedFunctionTable
presents a relatively inviting target for potentially malicious software to
hijack parts of the kernel without being detected.  Indeed, through carefully
planned subversion of PsInvertedFunctionTable, third party software could take
control of exception dispatchers throughout the kernel in order to gain
control of execution.  Though this technique would be much more limited than
outright kernel patching, it has the advantage of being completely undetected
by current PatchGuard versions (which cannot validate global variables that
may change without notice at runtime, for obvious reasons).  It also has the
advantage of being undetected by current rootkit detection systems, which are
presently (to the author's knowledge) blissfully unaware of
PsInvertedFunctionTable.  Although it would require administrative permissions
(or an exploit granting such permissions) for an attacker to modify
PsInvertedFunctionTable in the first place, Microsoft has of late focused a
great deal of effort on protecting the kernel even from users with
administrator permissions.  For example, one could conceive of a rootkit-style
program that intercepts exception dispatchers for system services, and passes
invalid user mode pointers to system services in order to surreptitiously
execute kernel mode code without detection when the standard pointer probe
throws an exception indicating that the given user mode pointer parameter is
invalid.  Given this sort of threat (from the rootkit perspective), the author
feels that it would be in Microsoft's best interests to put into place
additional validation of PsInvertedFunctionTable's cached exception directory
pointers (assuming that Microsoft wishes to continue down the path of
strengthening the kernel against malicious administratively-privileged code).

4.4) Interception of KiDebugTrapOrFault

Although many of the proposed techniques for blocking PatchGuard have so far
relied on the fact that PatchGuard utilizes SEH to kick off execution of the
system integrity check, there are different approaches that can be taken which
do not rely on this specific PatchGuard implementation detail.  One such
alternative technique for bypassing PatchGuard involves subverting the kernel
debug fault handler: KiDebugTrapOrFault.  This handler represents the entry
point for all debug exceptions (such as so-called hardware breakpoints), and
as such presents an attractive target for bypassing PatchGuard.

The basis of this proposed technique is to utilize a set of hardware
breakpoints to intercept execution at a convenient critical location within
PatchGuard's execution path leading up to the system integrity check.  This
technique has a greater degree of flexibility than many of the previously
described techniques, though this flexibility comes at the cost of a significantly
more involved (and difficult) implementation.  Specifically, one could use
this proposed technique to intercept control at any point critical to the
execution of PatchGuard's system integrity check (for example, the kernel DPC
dispatcher, one of the PatchGuard DPC routines, or a convenient location in
the exception dispatching code path, such as _C_specific_handler).

The means by which this interception of execution could be accomplished is by
assuming control of debug exception handling.  This could be done in several
different ways; for example, one could hook KiDebugTrapOrFault or alter the
IDT directly to simply repoint the debug exception to driver-supplied code,
bypassing KiDebugTrapOrFault entirely.  There are even ways that this
interception could be done in a way that is transparent to the current
PatchGuard implementation, such as by intercepting PsInvertedFunctionTable as
described in technique 4.3.  A driver could then alter the unwind metadata for
KiDebugTrapOrFault and create an exception handler for this routine.  This
step would allow transparent, first-chance access to all debug faults (because
KiDebugTrapOrFault internally constructs and dispatches a STATUS_SINGLE_STEP
exception describing the debug fault; normally, this would present the
STATUS_SINGLE_STEP exception to a debugger, but there is no technical reason why
a standard SEH-style exception handler could not catch the exception).
Regardless of how control of execution at the debug trap handler is gained,
the next step in this proposed approach is to alter execution at the requested
point of interest (whether it be the kernel timer DPC dispatcher, which could
be easily found by queuing a DPC and executing a virtual unwind, or a
PatchGuard DPC routine, or _C_specific_handler or any other place of interest
in the critical PatchGuard execution path) to prevent PatchGuard's system
integrity check from executing.

After the implementor has established control over the debug trap handler
(through whichever means desired), all that remains is to set
debug-register-based breakpoints on target locations.  When these breakpoints
are hit, control is transferred to the debug trap handler, and from there to
the implementor's driver code which can act as necessary, such as by altering
the execution context of the processor at the time of the exception before
resuming execution.
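Arming such a breakpoint means loading the target address into one of Dr0
through Dr3 and enabling it in Dr7.  The sketch below only computes the Dr7
value, using the bit layout documented in the Intel and AMD architecture
manuals (local enable Ln at bit 2n; the R/W and LEN fields for slot n at bits
16 + 4n through 19 + 4n, both 00 for an instruction breakpoint).  Actually
writing the debug registers is a privileged operation outside the scope of
this sketch, and the helper name is hypothetical.

```c
#include <stdint.h>

/* Return 'dr7' with an instruction-execution breakpoint enabled for debug
   register slot 'slot' (0..3); other slots are left untouched. */
static uint64_t dr7_enable_exec_breakpoint(uint64_t dr7, unsigned slot)
{
    if (slot > 3)
        return dr7;                        /* only Dr0..Dr3 hold addresses */
    unsigned field = 16 + 4 * slot;        /* R/Wn at field, LENn at field + 2 */
    dr7 &= ~((uint64_t)0xF << field);      /* R/W = 00 (execute), LEN = 00 */
    dr7 |= (uint64_t)1 << (2 * slot);      /* Ln: local enable */
    return dr7;
}
```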

The advantages of this approach over directly patching into kernel code (i.e.
opcode replacement) are threefold.  First, it is more flexible in that there
are no difficulties with placing an absolute 64-bit jump in an arbitrary
location (in x64, this typically takes around 12 opcode bytes to do from any
arbitrary location in memory).  For example, one does not have to worry about
whether the opcode space overwritten by the jump might span an instruction
boundary that is a jump target, which might lead to invalid code being
executed.  Secondly, this approach can be used to avoid having to
implement a disassembler (or other similar forms of code analysis) in kernel
mode, as hardware breakpoints allow one to gain control of execution at a
precise location without having to worry about creating enough space for a
jump patch, and then placing the original instructions back into a jump stub
to allow execution to resume at the original effective instruction stream (if
desired).  Finally, if done correctly, this technique could be implemented in
a truly race-condition free manner (as the only patching that would need to be
done is an interlocked 8-byte swap of a pointer-aligned value in
PsInvertedFunctionTable, if one took that approach).
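For comparison, the following sketch emits the 12-byte absolute jump the text
alludes to, in the common mov rax, imm64 / jmp rax form.  It is only one
possible encoding: it clobbers rax, and a register-preserving
jmp qword ptr [rip+0] followed by the 8-byte target needs 14 bytes.  The helper
name is hypothetical, and writing these bytes over live kernel code is exactly
the step that the debug register approach avoids.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Encode "mov rax, imm64 ; jmp rax" into 'out'; returns the length (12). */
static size_t encode_abs_jump(uint8_t out[12], uint64_t target)
{
    out[0] = 0x48;                 /* REX.W prefix */
    out[1] = 0xB8;                 /* mov rax, imm64 */
    memcpy(out + 2, &target, 8);   /* little-endian immediate (x64 host assumed) */
    out[10] = 0xFF;                /* jmp r/m64 (FF /4) */
    out[11] = 0xE0;                /* ModRM: mod=11, reg=100, rm=000 (rax) */
    return 12;
}
```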

This approach does require that the implementor pick a location (or multiple
locations) in the kernel at which breakpoints are to be set in order to gain
execution control.  There are many possibilities, such as the DPC dispatcher
(where one could filter out the PatchGuard DPC by detecting, say, invalid
kernel pointers in DeferredContext), the exception dispatcher path (where one
could unwind past a PatchGuard DPC's access violation exception), a PatchGuard
DPC itself (where one could again unwind past with RtlVirtualUnwind, bypassing
PatchGuard if the DPC is being invoked by PatchGuard), or any other suitable
location.  One of the advantages of this approach is that it is comparatively
easy to intercept execution anywhere in the kernel that can be reliably
located across kernel versions, making it potentially much easier to adapt to
future PatchGuard implementations than some of the previously discussed bypass
techniques.

Normally, the kernel has logic in place that prevents stray kernel addresses
from being placed in debug registers by user mode code via NtSetContextThread.
It may be necessary to make additional alterations to ensure that the custom
values in the debug registers are persisted across context switches, via the
same mechanisms used by the kernel debugger to persist debug registers.

In the author's opinion, this technique would be difficult for Microsoft to
defeat in principle, barring hardware support (like virtualization).  Although
Microsoft could move around critical code paths for PatchGuard, this technique
presents a general mechanism by which any location in the kernel could be
surreptitiously intercepted, thus lending itself to relatively easy adaptation
to future PatchGuard revisions.  One approach that could be taken is to
perform increased validation of the debug trap handler in an attempt to make
it more difficult to intercept without being detected by PatchGuard or some
other validation mechanism.  Other counters to this sort of tactic (in
general) would be to make it difficult to reliably locate all of the critical
code paths in a consistent and reliable manner across all kernel versions,
from the perspective of a third party driver.  This is likely to prove
difficult, as a great deal of the internal workings of the kernel are exposed
in some way to drivers (e.g. exported functions), or are otherwise indirectly
exposed to drivers (e.g. trap labels via the IDT, exception handlers via
unwind metadata and exports used in the process of dispatching exceptions to
SEH registrations).  Completely insulating PatchGuard from all such externally
visible locations (that could be comparatively easily compromised by a third
party driver) would, as a result, likely be an arduous task.

The debug trap handler can be used to do more than simply evade PatchGuard for
purposes of allowing conventional kernel code patches via opcode replacement.
It can also be utilized in order to completely eliminate the need to perform
opcode-replacement-based kernel patches in order to gain execution control.
In this vein, via assuming control of the debug trap handler in a way that is
transparent to PatchGuard (such as via the proposed
PsInvertedFunctionTable-based approach), it would then be possible to set
debug-register-based breakpoints at every address of interest (assuming that
there are enough debug registers to cover all of the locations of interest;
only four breakpoint addresses, Dr0 through Dr3, can be armed at once).  From
the debug trap handler, it is possible to completely alter the execution
context at the point of the debug exception, which is exactly the same as what
one could do via traditional opcode-replacement-based patching for a given
location.  This sort of transparent patching would be extremely difficult for
Microsoft to detect, because the debug registers must remain available for use
by the kernel debugger.  Without completely crippling the ability of the
kernel debugger to set breakpoints without being attached before PatchGuard is
initialized, the author does not see a particularly viable (i.e. without a
trivial workaround) way for Microsoft to prevent the use of debug registers to
alter execution context at select points in the kernel (from a third party
driver).  Because such an approach would capitalize on the fact that Microsoft
must, from a business case perspective, make it possible for IHVs and ISVs to
debug their code on Windows, the author believes that it would be unlikely to
be successfully disabled by Microsoft.  Furthermore, because such techniques
can be implemented without even having the basic requirement of disabling
PatchGuard, they would be inherently much more likely to work with future
PatchGuard revisions.  After all, if PatchGuard can't even detect changes to
the kernel (because kernel code isn't being patched), then there is no reason
to even bother with trying to disable it, which gets one out of the
comparatively messy business of playing catch-up with Microsoft with each new
PatchGuard revision.

4.5) General Detect Bit Interception

One of PatchGuard's anti-debug mechanisms relates to debug registers.
Specifically, PatchGuard attempts to clear Dr7 (the debug control register) in
an attempt to disable all debug-register-based breakpoints, as one of the
first tasks upon entering the system integrity check routine.  This presents
an inherent weakness within PatchGuard, as there is support built-in to the
processor that allows one to detect (and intercept) direct accesses to any of
the debug registers.  This support is primarily legacy, intended for so-called
in-circuit emulators (ICEs), which were special hardware components that acted
as a true hardware-based debugger by allowing one to control a processor from
outside the context of the system entirely, in essence truly isolating the
debugger from the operating system and any programs running under it.  This
support is embodied in the General Detect bit (bit 13) in Dr7, which, when set,
causes a debug exception to be raised before any instruction that accesses a
debug register executes.  This
is significant in that it provides a way for an attacker to trap PatchGuard's
access to Dr7 (zeroing it), which in effect provides a means to pinpoint the
exact location of PatchGuard's system integrity routine in-memory,
in-plaintext.  Furthermore, it gives an attacker the possibility of making any
alterations desired to the execution context at the very start of the system
integrity check, which could be trivially used in order to simply implement an
immediate return out of the system integrity check logic without actually
verifying the system's integrity (as Dr7 is zeroed before any integrity checks
are performed).  This approach effectively turns another one of PatchGuard's
protection mechanisms against it, utilizing the anti-debug-register behavior
to detect (and block) PatchGuard.

The general idea behind this approach is similar to that described in
technique 4.4.  In the same fashion as in technique 4.4, an implementor of this
approach is required to gain control of the debug trap handler.  For this
task, any of the proposed approaches in technique 4.4 may be used.  After control
of the debug trap handler is established, an attacker must then set the
general detect bit in Dr7 and wait for PatchGuard to access the debug
registers.  It should be noted that during the legitimate course of execution,
the kernel itself will often directly access debug registers, such as during
context switches or if NtSetContextThread/NtGetContextThread are invoked.  Any
such implementation of this technique must be able to differentiate between
PatchGuard's accesses of the debug registers and legitimate accesses.  This
could be trivially implemented by checking if the RIP value at the time of the
trap was within a valid kernel image or not, as the PatchGuard system
integrity check routine resides in dynamically allocated non-paged pool and
not within the confines of the kernel images in-memory.
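The filtering step can be sketched as follows.  The Dr7 GD and Dr6 BD bit
positions (both bit 13) are taken from the Intel architecture manuals; the
image_range type, the list of loaded images and the function names are
hypothetical stand-ins for whatever module enumeration a driver would
actually use.

```c
#include <stddef.h>
#include <stdint.h>

#define DR7_GD (1ull << 13)   /* Dr7: general detect enable */
#define DR6_BD (1ull << 13)   /* Dr6: debug register access detected */

typedef struct { uint64_t base; uint64_t size; } image_range;  /* hypothetical */

static uint64_t dr7_with_general_detect(uint64_t dr7)
{
    return dr7 | DR7_GD;
}

/* Returns 1 if 'rip' falls inside any of the 'n' loaded image ranges. */
static int rip_in_any_image(uint64_t rip, const image_range *images, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (rip >= images[i].base && rip - images[i].base < images[i].size)
            return 1;
    return 0;
}

/* A debug exception raised by general detect whose faulting instruction lies
   outside every loaded image, which the text treats as PatchGuard's
   pool-resident check routine. */
static int is_gd_trap_outside_images(uint64_t dr6, uint64_t rip,
                                     const image_range *images, size_t n)
{
    return (dr6 & DR6_BD) != 0 && !rip_in_any_image(rip, images, n);
}
```

The processor clears the GD bit when it enters the debug exception handler, so
an implementation would need to set it again before resuming execution.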

When the debug trap handler is invoked as a result of PatchGuard zeroing Dr7,
then the appropriate action (which could be as trivial as simply executing a
hard return out of the system integrity check routine) can be taken by the
third-party driver wishing to disable PatchGuard.

Like the techniques that capitalize on PatchGuard's use of SEH to obfuscate
the call to the system integrity check routine, this approach relies on using
one of PatchGuard's defensive mechanisms against it.  The most obvious counter
would thus be to remove the behavior of zeroing debug registers.  However,
disabling this behavior may not be very desirable, as it would then be very
easy to detect PatchGuard by, say, setting a read breakpoint on kernel code
and waiting for PatchGuard to perform a read.  Since reads of kernel code (as
opposed to execute fetches) are fairly atypical, this would open up another
easy mechanism by which PatchGuard could be bypassed.

The best course of action by Microsoft here would be to make it as difficult
as possible to differentiate between legitimate accesses to debug registers
and PatchGuard's own accesses, although this is unlikely to be very feasible.
Strengthening the debug trap path against interception by placing additional
validation checks over that code path might also be useful in countering this
technique, although likely to only a limited, easily-bypassable extent.

4.6) Patching the Kernel Timer DPC Dispatcher

Currently, PatchGuard utilizes a timer with an associated DPC to transfer
control to a preselected one of ten possible legitimate DPC routines that have
been slightly modified for use with PatchGuard.  Because third party kernel
drivers are given a documented and exported interface to create timers with
associated DPC routines, this represents a weakness in PatchGuard, in that it
presents an easily-detectable location in the critical execution path for
PatchGuard's system integrity check routine that could be relatively easily
compromised by a third-party driver.  This technique focuses on gaining
control of the timer DPC dispatcher, with the goal of detecting when the
PatchGuard DPC is about to be dispatched.  When the PatchGuard DPC is
detected, then the third-party driver could skip over the PatchGuard DPC
routine entirely, thus disabling PatchGuard.

In order to accomplish this, a third party driver would need to locate the
exact instruction within the kernel timer DPC dispatcher that is responsible
for making calls to timer DPC routines.  Fortunately, this is a fairly easy
task for a driver, as the interfaces for creating timers with associated DPCs
and DPC routines are documented and exported.  Specifically, a third party
driver could queue a timer DPC, and then record the address of the DPC dispatcher
routine via inspection of the return address of the timer DPC routine when it
is called.  From there, the driver can derive the address of the call
instruction responsible for making the call to the DPC routine associated with
a DPC object that is associated with a timer.
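Deriving the call instruction from the observed return address amounts to decoding backward from it. The sketch below assumes, without confirmation from the text, that the dispatcher reaches the DPC routine through a register-indirect call. That form is FF /2 with a register operand, optionally preceded by a REX.B prefix (0x41) to select r8 through r15. Other call forms would need additional patterns, and the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return the length of a register-indirect call that ends exactly at ret_off
 * within 'code', or 0 if none is recognized.  Assumption: the call has the form
 * FF /2 with a register operand (ModRM 0xD0-0xD7), optionally preceded by 0x41
 * (REX.B) to select r8-r15.  This is a heuristic: a stray 0x41 belonging to the
 * previous instruction could be misread as a prefix.
 */
static size_t register_call_length_before(const uint8_t *code, size_t ret_off) {
    if (ret_off >= 3 && code[ret_off - 3] == 0x41 &&
        code[ret_off - 2] == 0xFF && (code[ret_off - 1] & 0xF8) == 0xD0)
        return 3;                                  /* e.g. 41 FF D3 = call r11 */
    if (ret_off >= 2 && code[ret_off - 2] == 0xFF &&
        (code[ret_off - 1] & 0xF8) == 0xD0)
        return 2;                                  /* e.g. FF D0 = call rax */
    return 0;
}
```

The call instruction then begins at the return address minus the returned length. A result of 0 means the dispatcher uses some other call form and the pattern set must be extended.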

At this point, all a third party driver needs to do is patch the call
instruction in the DPC dispatcher to transfer execution control to the
driver's code.  From there, the driver can filter all timer DPCs for the
PatchGuard DPC routine (perhaps by looking for a bogus kernel address in
DeferredContext, paired with a DPC routine that is within the confines of the
kernel image in-memory).  When the PatchGuard DPC is detected, then the driver
can decline to call the DPC routine and instead simply return control to the
kernel DPC dispatcher after the call instruction in the logical original
instruction stream.  This effectively prevents PatchGuard from ever running
the system integrity check, which again gives the driver free rein to patch
the kernel without fear of intervention by PatchGuard.
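The filter described above reduces to two tests. The sketch assumes 48-bit virtual addressing, where an address is canonical only when bits 63 through 47 all match. DpcInfo and the kernel base/size inputs are hypothetical stand-ins for values a driver would have to obtain itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* With 48-bit virtual addresses, an address is canonical when bits 63..47 are all equal. */
static bool is_canonical(uint64_t va) {
    uint64_t top = va >> 47;              /* the 17 most significant bits */
    return top == 0 || top == 0x1FFFF;
}

/* Hypothetical summary of a timer DPC that is about to be dispatched. */
typedef struct {
    uint64_t deferred_routine;
    uint64_t deferred_context;
} DpcInfo;

/* Heuristic from the text: the routine lies inside the kernel image, and
   DeferredContext is a bogus (non-canonical) "kernel" address. */
static bool looks_like_patchguard_dpc(const DpcInfo *dpc,
                                      uint64_t kernel_base, uint64_t kernel_size) {
    bool routine_in_kernel = dpc->deferred_routine >= kernel_base &&
                             dpc->deferred_routine - kernel_base < kernel_size;
    return routine_in_kernel && !is_canonical(dpc->deferred_context);
}
```

The same predicate applies unchanged to the per-routine hooks discussed in technique 4.9, since both rely on the non-canonical DeferredContext as the distinguishing mark.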

In the author's opinion, the best way to prevent this approach is to use a
multitude of different mechanisms to kick off execution of the PatchGuard
check routine.  For example, a dedicated thread waiting on a timer could also
be used, or a frequently-called system routine could be modified to
periodically make calls to PatchGuard.  As long as calls to PatchGuard are
funneled through one location, such as the timer DPC dispatcher, the entire
PatchGuard integrity check system is at risk of being trivially bypassed in
one fell swoop by third party drivers.

4.7) Searching for the PatchGuard DPC

PatchGuard currently uses a KTIMER object with an associated KDPC object, both
allocated within non-paged pool memory, as a periodic trigger used to start
PatchGuard's system integrity check routine.  It should be possible to locate
this timer object in memory and cancel it, preventing PatchGuard from
executing.

The implementation of this technique is essentially a classical egghunt-style
search through non-paged pool, with some specially defined restrictions as to
how to find the target.  Specifically, one is looking for a region of memory
matching the following criteria:

  1. The memory is a valid KTIMER object.  This means that the linked list
     entries should be valid, and point to other seemingly valid KTIMER objects (or
     the list head), and that the type field of the KTIMER is consistent with a
     timer object.
  2. The timer should have a timer interval in the range of several minutes.
     PatchGuard applies a randomized fuzz factor to the timer interval (within a
     small range), but verifying that the timer's interval is no more than
     several minutes (say, 7 or 8) should be an ample sanity check.
  3. The timer should have a KDPC associated with it (and the pointer should
     be valid non-paged pool).
  4. The associated KDPC should have the appropriate type field.
  5. The associated KDPC should have a DPC routine that is within the confines
     of the kernel image in-memory.
  6. The associated KDPC should have a DeferredContext value that is a
     non-canonical kernel address.
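The six criteria can be collected into a single predicate. The sketch below operates on a hypothetical, already-captured snapshot of a candidate timer; TimerCandidate is not a real kernel structure. The object type constants are assumed values from the kernel's KOBJECTS enumeration and should be checked against the target build. Criterion 2 is approximated as the time remaining until the timer's absolute due time. Times are in the kernel's 100 ns units, so the 8-minute bound is 8 * 60 * 10,000,000 = 4,800,000,000 ticks.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed KOBJECTS values; verify against the headers for the kernel in question. */
#define TYPE_TIMER_NOTIFICATION     8
#define TYPE_TIMER_SYNCHRONIZATION  9
#define TYPE_DPC                    19

/* 100 ns ticks per minute: 60 s * 10,000,000 ticks/s. */
#define TICKS_PER_MINUTE (UINT64_C(60) * 10000000)

/* Hypothetical, already-captured view of a candidate KTIMER and its KDPC. */
typedef struct {
    bool     list_links_valid;   /* criterion 1: timer list entries look sane  */
    uint8_t  timer_type;         /* criterion 1: header type is a timer        */
    uint64_t due_time;           /* criterion 2: absolute due time, 100 ns     */
    bool     dpc_pointer_valid;  /* criterion 3: Dpc non-null, non-paged pool  */
    uint8_t  dpc_type;           /* criterion 4                                */
    uint64_t dpc_routine;        /* criterion 5                                */
    uint64_t deferred_context;   /* criterion 6                                */
} TimerCandidate;

static bool is_canonical(uint64_t va) {
    uint64_t top = va >> 47;
    return top == 0 || top == 0x1FFFF;
}

static bool matches_patchguard_timer(const TimerCandidate *c, uint64_t now,
                                     uint64_t kernel_base, uint64_t kernel_size) {
    if (!c->list_links_valid)
        return false;
    if (c->timer_type != TYPE_TIMER_NOTIFICATION &&
        c->timer_type != TYPE_TIMER_SYNCHRONIZATION)
        return false;
    /* Criterion 2: due no more than 8 minutes from now. */
    if (c->due_time < now || c->due_time - now > 8 * TICKS_PER_MINUTE)
        return false;
    if (!c->dpc_pointer_valid || c->dpc_type != TYPE_DPC)
        return false;
    if (c->dpc_routine < kernel_base || c->dpc_routine - kernel_base >= kernel_size)
        return false;
    return !is_canonical(c->deferred_context);
}
```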

After the DPC is found, it can simply be canceled in order to disable
PatchGuard.  The countermeasures proposed for technique 4.6 would also be
applicable here.  Specifically, diversity in the set of paths leading up to
PatchGuard's execution would make a technique like this one, which is targeted
at a single path (such as the timer DPC dispatcher), less effective at
blocking PatchGuard.

4.8) TLB Desynchronization (Split TLB)

All x86 processors supporting protected mode and paging employ a caching
scheme to speed up the translation of virtual addresses to physical addresses.
This scheme is implemented via a set of Translation Lookaside Buffers, or
TLBs, which cache the contents of the page attributes (and associated physical
address) for a given virtual address.  Recent x86 processors (Pentium II-class
or later) utilize several sets of TLBs, such as one set of TLBs for data
accesses and one set of TLBs for instruction accesses.  In normal system
operation, both TLBs (if a processor supports multiple TLBs) maintain
consistent views for the attributes of a particular page; however, it is
possible to deliberately desynchronize the contents of these TLBs, thereby
maintaining the illusion that a single page has different attributes depending
on whether it is referenced as data or as executable code.  This deliberate
desynchronization of TLBs has many uses, from the implementation of no-execute
support (utilized by PaX/GRsec on GNU/Linux [6]) to ``memory cloaking'', a
technique often used by rootkits to provide one view of memory when memory is
referenced as data by a read operation, and a different view of memory if
memory is referenced by an instruction fetch.  This same memory cloaking
technique that has appealed to rootkit developers for the purpose of hiding
rootkits from detection can also be used to hide kernel patching from
PatchGuard's integrity check.  Strictly speaking, this proposed technique is
not a bypass mechanism for PatchGuard; rather, it is a mechanism to hide
kernel patching from PatchGuard, thus making PatchGuard harmless to third
parties that are patching the kernel.

The details of this approach are similar in many respects to those of
any program implementing a split-TLB approach to altering page attributes
or contents based on execute or read fetches.  The exact details behind how
this can be accomplished are beyond the scope of this paper, and are discussed
elsewhere, by the PaX team (in the context of implementing no-execute on
legacy platforms) [6], and by Sherri Sparks and Jamie Butler (in the context of
implementing a Windows rootkit that utilizes split-TLBs to implement so-called
``memory cloaking'') [7].  Interested readers are encouraged to review these
references for the raw details on how the general split-TLB concept is
implemented.  Although the referenced articles directly apply to x86, the
concepts apply in principle to x64 as well, and can likely be made to work on
x64 with minimal modification.

After one has established a mechanism for desynchronizing TLBs (such as by
hooking the page fault handler), the recommended approach for this technique
is to desynchronize the TLBs for any regions in the kernel where one is
performing traditional opcode-replacement-based patching or hooking.
Specifically, when kernel code is read for execute on a page where an
opcode-replacement-based patch is in place, then the patched page should be
returned.  If kernel code is read for a data reference (such as PatchGuard
making a read of kernel code to validate its integrity), then the original
data should be returned.  This technique effectively hides all modifications
to kernel code to any access other than direct execution, which prevents
PatchGuard from detecting that kernel code has been altered by a third party.
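Setting aside the page fault and TLB mechanics covered in [6] and [7], the policy itself is a choice between two views of the same page based on the kind of access. The toy model below captures only that decision and does not manipulate any TLB. CloakedPage and view_for_access are hypothetical names.

```c
#include <stdint.h>

/* Kind of access that missed in the TLB and is being serviced. */
typedef enum { ACCESS_EXECUTE, ACCESS_READ } AccessKind;

/* Hypothetical model of one cloaked page: two backing views of one virtual page. */
typedef struct {
    const uint8_t *original;   /* unmodified kernel code, served to data reads     */
    const uint8_t *patched;    /* code with the patch applied, served to fetches    */
} CloakedPage;

/* Pick the view that the desynchronized TLB entry should map for this access. */
static const uint8_t *view_for_access(const CloakedPage *page, AccessKind kind) {
    return kind == ACCESS_EXECUTE ? page->patched : page->original;
}
```

A data read, such as PatchGuard's integrity check, always receives the original bytes. Only instruction fetches see the patch.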

Note that in order for this approach to succeed, the hook on the page fault
handler itself must be hidden from PatchGuard.  This cannot be directly
accomplished by the same TLB desynchronization tactic, as the page fault
handler must remain resident.  A combined approach, such as utilizing a debug
breakpoint on the page fault handler (when coupled with a subverted debug trap
handler, perhaps via PsInvertedFunctionTable as described previously in
technique 4.3) along with a scheme to prevent PatchGuard from disabling
debug-register-based breakpoints (such as described in technique 4.5) might be
needed in order to hook the page fault handler in a manner truly transparent
to PatchGuard.

The most logical defense for this approach is to attempt to detect a
compromise in the page fault dispatching path.  Because TLB desynchronization
cannot in general be used to hide the page fault handler itself (the page
fault handler must remain marked present in memory), it would be difficult for
a third party to conceal the alteration to the page fault handler from the
kernel.  Consequently, there are only a limited number of ways in which
alterations to the page fault handler could be hidden, such as by clever
utilization of debug registers.  The key to preventing this
technique from remaining viable is to develop a way for PatchGuard to detect
the page fault hook.  If, for example, the debug trap handler and a debug
breakpoint on the page fault handler were used to gain control on a page
fault, then Microsoft might be able to prevent this technique by blocking or
detecting the interception of the debug trap handler.  One such approach might
be to better secure PsInvertedFunctionTable, which represents an easy way for
a third party to subvert the debug trap handler without PatchGuard's
knowledge.  Such counters will vary based on the mechanism used to hide the
page fault handler hook, however.

4.9) DPC Routine Patching

A very simple-minded approach to disabling PatchGuard, a variation on
technique 4.2, would be to hook every possible DPC routine, check whether the
DPC is probably being called in order to execute PatchGuard's system integrity
check, and, if so, simply return from the DPC to the kernel timer DPC
dispatcher.  In order to implement this approach, one first needs to locate
each possible DPC routine.  Technique 4.2 lists a number of viable algorithms for
fingerprinting (and locating) each DPC routine; any (preferably multiple) of
the suggested algorithms in that technique would be directly applicable to
this proposed approach.

After one has identified all the possible DPC routines, all that is left is to
patch each one to branch to driver controlled code.  From there, the driver
could make the decision as to whether the DPC is being invoked legitimately,
or whether it is being invoked as part of PatchGuard's system integrity check
process (easily identified by a non-canonical kernel address being passed as
DeferredContext).  If the DPC is PatchGuard-related, then all the driver need
do to block PatchGuard is to immediately return to the DPC dispatcher.

This approach is fairly trivial to prevent (from Microsoft's point of view).
Because it is signature-based, one possible counter-approach Microsoft could
implement would be determining which signatures third party drivers use to
detect PatchGuard DPCs, and altering the PatchGuard DPC routines to not match
those signatures in the next PatchGuard version.  Microsoft could also change
the number of DPC routines to throw off drivers that assume PatchGuard will
use exactly ten DPCs, or Microsoft could switch to an alternative delivery
mechanism other than DPCs in order to prevent existing code that detects and
hooks specific DPC routines from blocking PatchGuard.

5) Subverting PatchGuard

PatchGuard currently possesses a formidable array of defensive mechanisms that
are aimed at making it difficult to reverse engineer and debug.  Given that
Microsoft does not currently have in place the infrastructure to make
PatchGuard enforced by hardware, this is arguably the best that Microsoft can
do in the short term.  Microsoft is limited to building a system that relies
on obfuscation and anti-debugging techniques in an attempt to make it
difficult for third parties to detect, disable, or bypass it.

There are other classes of software that seek to create defenses similar to
PatchGuard's.  However, these other classes usually have far more nefarious
purposes than preventing third parties from patching the kernel.
Specifically, anti-debugging, anti-reverse-engineering, and self-decrypting
code have often been used to hide viruses, rootkits, and other malicious
software on compromised systems.

Although Microsoft may have intended the defensive mechanisms employed by
PatchGuard for an (arguably) good cause, these same anti-debugging,
anti-detection, and anti-reverse-engineering techniques that protect
PatchGuard from attack by third party drivers can also be subverted to protect
custom code from detection or analysis by anti-virus or anti-rootkit software.
In this respect, Microsoft has created a double-edged sword, as the same
elaborate obfuscation and anti-debugging schemes that guard PatchGuard against
third party software can also be used to guard malicious software from system
security software.  It is in fact quite possible to subvert PatchGuard version
2's myriad defenses to execute custom code instead of PatchGuard's system
integrity check routine.  While doing so is not exactly trivial, it is far
from impossible.

In order to subvert PatchGuard to do one's bidding, one must first catch
PatchGuard in the act, so to speak.  To accomplish this, the author recommends
turning to one of the proposed bypass techniques as a starting place.  For
example, consider the first proposed bypass technique, wherein the author
recommends hooking _C_specific_handler to intercept control of execution at
the exception generated by the PatchGuard DPC routine in order to trigger
execution of the system integrity check.  An implementation of this bypass
technique provides direct access to the machine context inside the PatchGuard
DPC routine, and this machine context contains the information necessary to
locate the PatchGuard system integrity check routine.

Since the objective is to repurpose the system integrity check routine to
execute custom code, this is a good starting point.  However, determining the
location of the system integrity check routine is much more involved than
simply skipping over PatchGuard's checks entirely; the pointer to the routine
in question is encrypted based on the original arguments to the DPC (the
Dpc and DeferredContext arguments).  Additionally, the original arguments to
the PatchGuard DPC have at this point already been moved from registers to the
stack and obfuscated (rotated left or right by a magical constant).  As the
original contents of the argument registers are deliberately overwritten by
the DPC routine before the access violation is triggered, there is no choice
other than to somehow fish the DPC arguments out of the caller's stack.  This
is actually somewhat of a challenge, given that such an approach must work for
all kernel versions, and must also work for all of the different DPC
permutations.  Since this set of possibilities represents an unmaintainably
large number of routines to reverse engineer in order to determine rotate
obfuscation values and stack offsets, a more generalized approach to locating
the original arguments on the stack must be taken.  In order to create such a
generic approach, one must take a closer look at the first few instructions of
each DPC routine (leading up to the intentional access violation).  Although
PatchGuard has put into place several barriers to prevent easy retrieval of
the original arguments from this context, there might be a pattern or weakness
that could be exploited in order to recover the arguments in question.

The basic things common to each DPC routine, when it comes to the machine
context at the time of the access violation, are:

  1. The original arguments have been stored on the stack in an obfuscated
     form (rotated left or right by an arbitrary magical constant).

  2. The access violation always occurs by dereferencing rax.  Here, rax is
     always the deobfuscated form of the DeferredContext argument.  This gives us
     one of the arguments for free, as rax in the register context at the time of
     the access violation is always the plaintext DeferredContext value.

  3. The stack location at which the Dpc argument is stored varies greatly
     from one DPC to another.  Furthermore, it also varies between
     different kernel flavors within an operating system family, and between
     operating system families.  As a result, it is not practical to hardcode stack
     displacements for this argument.
  
  4. The instruction immediately prior to the faulting instruction is always
     an instruction in the form of ror rax, <magical constant>.  Here, the magical
     constant is an immediate value, which means that it is encoded as a part of
     the opcode for this instruction itself.  Each DPC has its own unique magical
     constant, and the magical constants used do not change for a particular DPC
     flavor across all kernel flavors and operating system families.  This gives us
     a nice way to quickly identify which of the ten PatchGuard DPCs is in use from
     the context of the _C_specific_handler hook (without having to do ugly code
     fingerprinting or analysis).  Unfortunately, we still don't have a way to
     determine the stack displacement of the Dpc argument.

  5. The r8 register is always equal to the original Dpc argument, shifted
     right by the low byte of the DeferredContext argument.  Although this may seem
     tantalizingly close to what we're looking for, it can't actually be used as a
     substitute for the original Dpc argument, even though the DeferredContext
     argument is known here (due to the value of rax).  This is because the right
     shift operation is destructive, in that information is permanently lost as
     bits are shifted right off of the register into oblivion.  As a result,
     depending on the low byte of the DeferredContext argument, important bits in
     the Dpc argument have already been permanently lost in the pseudo-copy
     residing in r8.
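Observation 4 suggests a cheap way to identify the DPC from the faulting RIP. The sketch assumes the standard encodings ror rax, imm8 = 48 C1 C8 ib and mov eax, [rax] = 8B 00. It also assumes the four bytes preceding the fault are readable. A rotate by one could instead be encoded as 48 D1 C8, which this sketch does not handle. Mapping the extracted constant to one of the ten DPCs would require a table of known constants that is not reproduced here.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed encodings (standard x64):
 *   ror rax, imm8   -> 48 C1 C8 ib   (REX.W, C1 /1 ib, ModRM 11 001 000)
 *   mov eax, [rax]  -> 8B 00
 * 'code' points at the faulting instruction; the 4 bytes before it must be readable.
 */
static bool extract_ror_constant(const uint8_t *code, uint8_t *constant) {
    if (code[0] != 0x8B || code[1] != 0x00)      /* faulting mov eax, [rax] */
        return false;
    const uint8_t *ror = code - 4;               /* preceding ror rax, imm8 */
    if (ror[0] != 0x48 || ror[1] != 0xC1 || ror[2] != 0xC8)
        return false;
    *constant = ror[3];                          /* the per-DPC magical constant */
    return true;
}
```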

Although the situation may initially appear grim, it is in fact still possible
to locate the Dpc argument given the above information; all that is needed is
a bit of work (and getting one's hands dirty with some ugly tricks).
Specifically, it is possible to search the stack frame of the DPC routine for
the Dpc argument with a brute-force attack.  This isn't exactly elegant, but
it gets the job done.  There are a number of hints that can be used to
increase the chance of successfully finding the real Dpc argument on the
stack:

  1. The stack is 8-byte aligned (at least) due to x64 calling convention
     requirements, and the Microsoft C/C++ compiler will always place pointer-sized
     values on the stack in 8-byte-aligned locations.  As a result, the search can
     be narrowed down to 8-byte-aligned locations on the stack, instead of a
     bytewise search.

  2. Because the identity of the current DPC routine is known (due to
     analyzing the ror instruction immediately preceding the faulting mov eax,
     [rax] instruction), the rotate constant used to obfuscate the Dpc argument is
     known.  Each DPC routine has its own unique magical rotate constant, and as
     the current DPC routine has been positively identified, the rotate constant
     used to obfuscate the Dpc argument on the stack is thus also known.

  3. A quick check as to whether a value on the stack could possibly be the
     Dpc argument can be made by rotating the value on the stack by the known
     obfuscation constant, then shifting the value right by the low byte in the
     DeferredContext argument and comparing the result to the r8 value at the time
     of the exception.  If there is a mismatch, then the current stack location can
     be eliminated from the search.  This does not provide a positive match, but it
     does provide a way to positively eliminate possibilities.  This step is also
     optional, in that it is still possible to locate the Dpc argument without
     relying on r8; the check against r8 is simply an optimization.

  4. The Dpc argument should point to a valid non-paged pool address, given
     that it must represent a valid kernel pointer.  In order to check that this is
     the case, MmIsAddressValid can be used to test whether the deobfuscated value
     in question is a valid pointer or not.  (Yes, MmIsAddressValid is a bit of a
     race condition and certainly a hack.  The author would like to note that this
     approach was described as requiring that the implementor get his or her
     ``hands dirty with some ugly tricks'', in an attempt to forestall the
     inevitable complaints about how this approach might be decried as an
     unstomachable ugly hack by some.)

  5. The Dpc argument should point to a valid non-paged pool address whose
     length is great enough to contain a KDPC object, plus at least one
     pointer-sized additional field.  A secondary MmIsAddressValid test can be used
     to verify that the pointer describes a valid region large enough to contain
     the KDPC object, plus the additional pointer-sized field following it (the
     PatchGuard decryption key).

  6. The Dpc argument should point to a DPC whose Type and DeferredContext
     fields have been zeroed.  (The DPC routine intentionally zeros these values
     in the DPC before intentionally triggering an access violation.)  If the
     suspected Dpc argument, when treated as a PKDPC, does not have these
     properties then it can be eliminated as a possibility.

By repeatedly applying these rules to every applicable location within a
reasonable distance upward from the rsp value at the time of the exception
(say, 256 bytes, although the exact size can be greater; the only requirement
is that the entire local variable space of the DPC routine with the largest
local variable space is completely contained within the search region), it is
possible to recover the Dpc argument with virtual certainty.  In the author's
experience, this technique works quite reliably, even though one might expect
a search of an unknown stack frame to be prone to failing or turning up false
positives.
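Hints 1 through 3 and the bounded search described above can be combined into a short scan. Several assumptions here are not confirmed by the text. The stored value is assumed to be the Dpc pointer rotated left by the per-DPC constant, so a right rotate undoes it; swap the direction if the DPC rotates the other way. r8 is assumed to come from a 64-bit right shift whose count the processor masks to 6 bits. The DpcValidator callback is a hypothetical hook standing in for hints 4 through 6 (MmIsAddressValid and the KDPC field checks), which cannot be exercised outside the kernel.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static uint64_t rotr64(uint64_t v, unsigned n) {
    n &= 63;
    return n ? (v >> n) | (v << (64 - n)) : v;
}

/* Hypothetical stand-in for hints 4-6 (address validity and KDPC field checks). */
typedef bool (*DpcValidator)(uint64_t candidate, void *ctx);

/*
 * Scan 8-byte-aligned slots upward from rsp (hint 1: 'stack' is a uint64_t array)
 * for the obfuscated Dpc argument.  Returns the first candidate that survives.
 */
static bool find_dpc_argument(const uint64_t *stack, size_t slots, unsigned rot,
                              uint64_t deferred_context, uint64_t r8,
                              DpcValidator validate, void *ctx, uint64_t *dpc_out) {
    /* Assumed: shift count is the low byte of DeferredContext, masked like x64 shr. */
    unsigned shift = (unsigned)(deferred_context & 0xFF) & 63;
    for (size_t i = 0; i < slots; i++) {
        uint64_t candidate = rotr64(stack[i], rot);   /* hint 2: undo the known rotate */
        if ((candidate >> shift) != r8)               /* hint 3: cheap elimination     */
            continue;
        if (validate && !validate(candidate, ctx))    /* hints 4-6                     */
            continue;
        *dpc_out = candidate;
        return true;
    }
    return false;
}
```

With the 256-byte window suggested above, slots is 32. A more cautious implementation might keep scanning and treat more than one surviving candidate as a failure rather than returning the first.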

After both the Dpc and DeferredContext arguments to the PatchGuard DPC routine
have been recovered, it is a simple matter of analyzing how PatchGuard invokes
the system integrity check in order to determine how to locate it in-memory.
This has been discussed previously, and it amounts to the following set of
statements:

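ULONG64 DecryptionKey, PatchGuardCheckFunction;

//
// The key is stored just past the 0x40-byte KDPC; use byte arithmetic.
//
DecryptionKey            = *(PULONG64)((PUCHAR)Dpc + 0x40);
PatchGuardCheckFunction  = DecryptionKey ^ (ULONG64)DeferredContext;
// Restore the high bits of the kernel address.
PatchGuardCheckFunction |= 0xFFFFF80000000000;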

At this point, it's almost possible to replace the system integrity check
routine with custom code.  However, there is still the matter of the pesky
self-decrypting stub that runs before the check function.  Because the DPC
routine's exception handler rewrites the first instruction of the stub before
it is executed, one doesn't have a whole lot of choice but to implement at
least a very basic version of the decryption stub for the system integrity
check routine.

Recall that the first instruction in the stub is set to the following:

lock xor qword ptr [rcx],rdx

Looking at the prototype for the decryption stub, rcx corresponds to the
address of the decryption stub itself, and rdx corresponds to the decryption
key.  Since this instruction modifies both itself and the next instruction
(the instruction is four bytes long and the xor alters eight bytes), the
replacement code for the system integrity check routine must allow the first
instruction to be the above xor instruction, and must allow for the second
instruction (at a minimum) to be initially xor-obfuscated.  For simplicity's
sake, the author has chosen to implement the simplest possible solution to
this conundrum, which is to make the second instruction in the replacement
code a duplicate of the first instruction.  In other words, the replacement
code would read as follows:

;
; This instruction is forced on us by PatchGuard,
; and cannot be altered; it is rewritten at runtime.
;

lock xor qword ptr [rcx],rdx

;
; The next instruction, conveniently four bytes
; long, re-encrypts itself by xoring the first
; eight bytes of the decryption stub (which includes
; the second instruction) by the decryption key a
; second time.
;

lock xor qword ptr [rcx],rdx

;
; (... any custom code may follow here ...)
;

As noted previously, after specially constructing the replacement code, it is
necessary to initially encrypt the second instruction (as it will be
immediately decrypted by the first instruction).  This must be done before
control is returned to PatchGuard.
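The byte-level effect of the two-instruction prologue can be checked outside the kernel by emulating the XOR. The sketch below is only that emulation. It assumes the encoding F0 48 31 11 for lock xor qword ptr [rcx],rdx, which matches the four-byte length stated above, and x64 little-endian byte order. The helper names are hypothetical. It shows that after both instructions run, the first eight bytes return to their pre-encrypted state.

```c
#include <stdint.h>
#include <string.h>

/* lock xor qword ptr [rcx], rdx  ->  F0 48 31 11 (4 bytes) */
static const uint8_t LOCK_XOR_RCX_RDX[4] = { 0xF0, 0x48, 0x31, 0x11 };

/* Emulate one execution: XOR the 8 bytes at 'stub' with 'key' (little-endian). */
static void emulate_lock_xor(uint8_t stub[8], uint64_t key) {
    for (int i = 0; i < 8; i++)
        stub[i] ^= (uint8_t)(key >> (8 * i));
}

/* Build the prologue: two copies of the instruction, with only the second
   pre-encrypted by the key's upper four bytes. */
static void build_prologue(uint8_t stub[8], uint64_t key) {
    memcpy(stub, LOCK_XOR_RCX_RDX, 4);
    memcpy(stub + 4, LOCK_XOR_RCX_RDX, 4);
    for (int i = 4; i < 8; i++)
        stub[i] ^= (uint8_t)(key >> (8 * i));
}
```

Running the first instruction decrypts the second (bytes 4 through 7 become F0 48 31 11 again). Running the second re-applies the key, so the prologue is left exactly as it was built.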

After the custom code is configured and the second instruction is encrypted,
all that remains is to copy the custom code over the PatchGuard decryption
stub.  When this is accomplished, the PatchGuard DPC's exception handler will
invoke the supplied custom code instead of the system integrity check routine.

However, this is not really all that interesting, because
PatchGuard utilizes a one-shot timer.  The custom code that was substituted
for the decryption stub will never be run again.  To account for this fact, it
would be prudent to place a call to queue a timer with an associated DPC
routine (pointing to the DPC routine that PatchGuard selected at boot) within
the custom code block.

At this point, it is possible to simply allow the normal exception dispatching
process to continue (i.e. to resume _C_specific_handler), after which the
custom code will be invoked instead of PatchGuard.  In essence, PatchGuard has
been not only disabled, but completely subverted to call customized code under
the control of a third party driver instead of the system integrity check.

Still, the situation is less than optimal.  Presently, there is still a hook
in _C_specific_handler that is there for anyone to see (and recognize that
someone has tampered with the kernel).  Additionally, the driver that was used
to subvert PatchGuard in the first place is still loaded, which may also be a
telltale sign that someone may have done something unsavory to the
kernel.

These problems are also solvable, however.  It turns out that after PatchGuard
has been subverted, it is safe to unhook from _C_specific_handler, and then
simply call back into _C_specific_handler after the hook is removed.
Furthermore, everything necessary to run the subverted system integrity check
routine could even reside within PatchGuard's own internal data structures;
for example, the extra space after the custom code, where the decryption stub
and PatchGuard check routine would normally reside, could serve as a
parameter block.  This is especially convenient, as the custom code block is
given a pointer to itself in rcx (the first argument), and it is easy to add a
known constant value to that pointer in order to retrieve the parameter block
for the custom code.  At this point, all of the code and data used by the
custom code that has replaced PatchGuard resides in dynamically allocated
memory.  Given this, the original driver is no longer needed and can even be
unloaded (so as to further disguise the fact that any alterations to the
kernel have taken place).  After the driver has been unloaded, the only traces
of the alterations would be the unloaded module list (easily modified) and the
rewritten PatchGuard system integrity routine itself, which could easily be
bolstered to be self-decrypting (with a differing encryption key) in order to
make it an extremely difficult target to locate in memory.
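Locating the parameter block from the self pointer in rcx is a fixed-offset computation. The sketch below uses a hypothetical layout: SubvertedRegion and ParamBlock, an arbitrary 0x200-byte code area, and illustrative fields. Deriving the offset with offsetof keeps the constant in sync with the layout instead of hardcoding it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parameter block consumed by the custom code. */
typedef struct {
    uint64_t timer_address;   /* e.g. the KTIMER to re-arm for the next run */
    uint64_t dpc_address;     /* e.g. the KDPC PatchGuard selected at boot  */
    uint64_t due_time;        /* e.g. relative due time for the next run    */
} ParamBlock;

/* Hypothetical layout of the memory that formerly held the stub and check routine. */
typedef struct {
    uint8_t    code[0x200];   /* custom code, beginning with the two lock xor instructions */
    ParamBlock params;        /* parameter block stored in the spare space after the code  */
} SubvertedRegion;

/* Given the stub address passed in rcx, return the parameter block at a fixed offset. */
static ParamBlock *params_from_self(void *self) {
    return (ParamBlock *)((uint8_t *)self + offsetof(SubvertedRegion, params));
}
```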

The end result is that PatchGuard has been disabled, and in its place,
arbitrary custom code is periodically executed.  Furthermore, no modifications
or patches to kernel code or global data are present and no suspicious drivers
(or even suspicious extraneous memory allocations) remain present in memory.
In essence, the fact that PatchGuard has been subverted would be visible only
to someone (or something) that knows how to locate and disable PatchGuard.

The supplied example program for subverting PatchGuard is fairly simple, and
it does not utilize all of the defensive technologies employed by PatchGuard.
For instance, it does not change the decryption key on every execution, nor
does it follow through with keeping the entire code block encrypted except
just before execution.  These features could be easily added, however, and
would greatly increase the difficulty of locating the subverted PatchGuard
code in memory.

6) Future Direction of PatchGuard and ``Anti-Hack'' Systems

In the future, there are a couple of generalized approaches that Microsoft
could take to significantly strengthen PatchGuard against attack.
Specifically, these involve adding redundancy and removing single points of
failure from PatchGuard.  It is often helpful to look at an anti-hack system
like PatchGuard as a critical system that one would like to keep running at
all times with minimal downtime (i.e. a network or service with
high-availability).  The logical way to accomplish that goal is to locate and
eliminate single points of failure, such as by adding redundancy.  In a high
availability network, one would accomplish this by adding redundant cables,
switches, and the like, such that if one component were to fail, the system as
a whole would continue to operate instead of failing entirely. With an
anti-hack system such as PatchGuard, it is helpful to add redundancy to all
critical code paths such that there is no single point where an attacker can
simply change an opcode or hook in with the end result of disabling the entire
anti-hack system.

Removing these single points of failure is critical to the longevity of an
anti-hack system.  The main concept to grasp in such cases is that the
attacker will always try to seek out the easiest way to break the defenses of
the target system.  All the obfuscation and encryption in the world does
little good if an attacker can simply change a jmp to a nop and prevent
elaborate encryption and anti-debugging facilities from ever getting the
chance to run.  In this respect, PatchGuard is flawed in its current
implementation.  There are many different single points of failure where an
attacker could inject themselves at a single place and completely disrupt
PatchGuard.

One possible solution to this problem might be to ensure that there are
multiple different code paths that can lead to every point in the PatchGuard
system integrity check.  The nature of the battle between anti-hack systems
and attackers relates to how easy it is to bypass the weakest link in the
anti-hack system.  Until all of the weak links in the system are shored up
simultaneously, the system remains much more vulnerable to easy attack or
bypass.  In this respect, PatchGuard version 2 does little to improve on the
weakest links of the system, and as such there are still a vast number of ways
to bypass it.  Even worse, each bypass technique often needs to attack only
one specific aspect of PatchGuard in order to disable it as a whole.

As far as PatchGuard itself is concerned, one approach that Microsoft could
take to significantly increase the resiliency and robustness of the system to
outside interference would be to merge some sort of critical system
functionality with the PatchGuard system integrity check.  Such an approach
would make it difficult for a would-be attacker to simply bypass a call to
PatchGuard, as doing so would also bypass some sort of critical system
functionality that would (ideally) be required for the system to operate in
any usable capacity.  The challenge for attackers would then become one of
three options: replicating the critical system functionality contained within
PatchGuard, finding a way to split that functionality away from the system
integrity check portions of PatchGuard, or evading PatchGuard's detection of
kernel patching entirely.  Microsoft can make the first two options
arbitrarily difficult, especially since knowledge of Windows internals is
presumably greater inside Microsoft than outside it.  The incorporation of
critical system functionality would be
theoretically easier for Microsoft to do than it would be for would-be
attackers to reliably reverse engineer and re-implement such functionality on
their own, forcing would-be attackers to take the hard route of trying to
separate PatchGuard from critical system functionality.  This is where clever
use of obfuscation and anti-debug techniques would really see maximum
effectiveness, as an attacker would (optimally) have no choice other than to
step through and understand PatchGuard entirely before being able to replicate
the critical functionality contained within PatchGuard (or selectively
activate the critical functionality without activating the system integrity
check).
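To make the coupling idea above more concrete, the following user-mode C
fragment is a minimal, hypothetical sketch; it is not derived from PatchGuard
or from any Windows code.  It assumes that a table needed by some critical
routine is shipped masked with the checksum of the unmodified guarded code, so
the routine can only recover its data by running the integrity check, and any
patch to the guarded bytes silently corrupts that data.  FNV-1a merely stands
in for whatever checksum a real implementation might use, and the names
guarded_code, plain_table, build_masked_table, and critical_lookup are
illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 4

/* Hypothetical stand-in for a region of code that the integrity check guards. */
static uint8_t guarded_code[8] = { 0x48, 0x89, 0x5c, 0x24, 0x08, 0x57, 0x48, 0x83 };

/* Illustrative data that some critical system functionality depends on. */
static const uint32_t plain_table[TABLE_SIZE] = { 10, 20, 30, 40 };

/* The same table as it would be shipped: XOR-masked with the pristine checksum. */
static uint32_t masked_table[TABLE_SIZE];

/* FNV-1a (32-bit), standing in for the system integrity check. */
static uint32_t integrity_check(const uint8_t *p, size_t n)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Simulates a build-time step: mask the table using the checksum of unmodified code. */
static void build_masked_table(void)
{
    uint32_t key = integrity_check(guarded_code, sizeof guarded_code);
    for (size_t i = 0; i < TABLE_SIZE; i++)
        masked_table[i] = plain_table[i] ^ key;
}

/* The critical operation can only unmask its data by running the check itself;
 * if guarded_code has been modified, the derived key (and thus the result) is wrong. */
static uint32_t critical_lookup(size_t i)
{
    assert(i < TABLE_SIZE);
    uint32_t key = integrity_check(guarded_code, sizeof guarded_code);
    return masked_table[i] ^ key;
}
```

After calling build_masked_table() once, critical_lookup(2) yields 30; after
flipping any single byte of guarded_code, the same call returns a different
value, because each FNV-1a step is a bijection on the running hash.  Note that
an attacker who keeps a pristine copy of the guarded bytes and hashes that
copy instead defeats this particular scheme, which corresponds to the "split
the critical functionality away from the check" route described above; a real
design would need to make that separation considerably harder.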

The last option (evading PatchGuard's detection entirely) is likely to be a
much more difficult problem for Microsoft to tackle, however.  Techniques such as the clever use
of debug registers, TLB desynchronization, and other related attacks are
extremely difficult to detect (and typically very easy to alter to avoid
detection after a known detection scheme for such attacks is developed).  In
this particular respect, Microsoft is presently at a great disadvantage.
Improving PatchGuard to avoid such evasion tactics is likely to prove both
difficult and a poor investment of time relative to how quickly attackers can
adapt and compensate for Microsoft's efforts at bolstering PatchGuard's
capabilities.

Looking towards the future, it can be expected that PatchGuard will ultimately
see the obfuscation-based defensive mechanisms currently in place substituted
with hardware-based defensive mechanisms.  In particular, the author expects
that Microsoft will eventually deploy a PatchGuard version that is augmented
by the hardware-based virtualization support present in recent processors,
in conjunction with the hypervisor being developed for Windows Server
"Longhorn" (code-named "Viridian").  An implementation of PatchGuard that
is guarded by a hypervisor would be immune to being simply patched out of
existence (which eliminates some of the most significant flaws in current
versions of PatchGuard), at least as long as the hypervisor itself remains
secure and free from exploitable bugs.  In a hypervisor-based system with
PatchGuard, third party drivers would not be permitted to execute with
hypervisor privileges, thus completely preventing runtime patching of
PatchGuard itself (which would be a part of the privileged hypervisor layer).
A hypervisor-based system might also be able to implement concepts such as
write-once memory that could be adapted to prevent the kernel from being
patched in the first place once it is initially loaded into memory (as opposed
to detecting patching after the fact, and bringing down the system in response
to third party drivers performing underhanded deeds).

Even with hypervisor support in place, however, it is anticipated that there
will still be ways for third parties to alter the behavior of the kernel in
ways not completely authorized by Microsoft.  For instance, as long as support
for debug registers must be retained in order for the kernel debugger to
function, it may be difficult to prevent an approach that utilizes debug
registers to modify execution context at arbitrary locations within the kernel
(at least, not without making the hypervisor completely responsible for
managing all activities relating to the processor's complement of debug
registers).

7) Conclusion

Although PatchGuard version 2 introduces significant improvements in some
areas, it still remains vulnerable to a wide variety of potential attacks.
Additionally, it is possible (though involved) to subvert PatchGuard entirely,
with the purpose of running arbitrary custom code in a difficult-to-detect
manner in the place of PatchGuard.

With these points in mind, it is perhaps time to re-evaluate whether
PatchGuard, in its current incarnation, is really worth all the trouble that
Microsoft has put into it.  Forcing the IHV and ISV world to clean house with
their kernel mode code is certainly a reasonable goal (and one which
ultimately benefits all Windows customers, no matter how certain companies
with poorly written kernel mode code [8] may care to spin the facts), as badly
written kernel mode code results in the chronic instability that Windows is
often associated with at best, and in privilege escalation and arbitrary code
execution exploits in the worst case.  However, there are still significant
counterpoints to what PatchGuard represents: it may provide a convenient way
for malicious kernel mode code to hide in a very difficult to detect manner,
and the restrictions that PatchGuard places on the system stifle real
innovation.  As an example of
the latter, consider that Microsoft's very own Virtual Server 2005 R2 SP1
(Beta) runs afoul of PatchGuard and requires a special kernel hotfix to alter
what, exactly, PatchGuard protects in order to run without bugchecking the
system with the infamous CRITICAL_STRUCTURE_CORRUPTION bugcheck made famous by
PatchGuard [3].  This alone should be taken as an indicator that there *are* in
fact legitimate uses for some of the techniques that PatchGuard prevents,
despite Microsoft's insistence to the contrary.  It should also be noted that
despite Microsoft's statements that no exceptions would be made for PatchGuard
[1], Microsoft has had to make adjustments at least once so that its own code
could run on PatchGuard-protected systems.  The conspiracy theorists among you
might wonder whether Microsoft would be so gracious as to make such exemptions
for third party software with legitimate uses for techniques blocked by
PatchGuard and needs similar to those of Virtual Server 2005 R2 SP1, given its
pointed statements to the contrary.

As a final note relating to the objectives of PatchGuard, even with hypervisor
technology deployed (and furthermore, even with so-called immutable memory as
implemented by a hypervisor), there is little that can be done to protect
drivers from each other.  Even in a hypervisor-based system (where the kernel
itself is protected from drivers), interdependent drivers will still be able
to interfere with each other so long as they co-exist in the same domain.
This is particularly problematic in Windows, given the concepts of device
stacks and device interfaces that allow drivers to directly interact with
each other in a variety of ways.  It will be difficult to ensure that drivers
do not resort to patching each other (or modifying pool allocations instead of
patching code, in the case where immutable memory on code regions is being
enforced by a hypervisor).  Depending on the objectives of a third party ISV
attempting to bypass PatchGuard, it may be possible to simply patch drivers
(such as Ntfs.sys or Tcpip.sys) in lieu of patching the kernel.  From this
perspective, it is unlikely that Windows will ever become an environment where
kernel mode drivers are completely isolated and unable to interfere with each
other, despite the efforts of technologies such as PatchGuard.

Microsoft has already started down a path that may eventually lead to a system
where buggy drivers will be unable to crash the system (or patch each other),
with the advent of the User Mode Driver Framework (UMDF).  It remains to be
seen, however, whether isolated user-mode drivers will become a viable
alternative for high performance devices (such as PCI/PCI Express devices, as
opposed to USB devices), instead of simply being confined to a small subset of
the devices that ship with a typical computer.  The author expects that
wherever possible, Microsoft will attempt to move third party code outside of
sensitive areas (like the kernel) and into more contained locations (such as a
user-mode process).  This is in line with the purported goals of PatchGuard:
increasing system stability by preventing third party drivers from performing
questionable actions (or at least, from performing questionable actions in a
way that might bring down the system).

Bibliography

[1] Microsoft Corporation. Patching Policy for x64-Based Systems. 
    http://www.microsoft.com/whdc/driver/kernel/64bitpatching.mspx; 
    accessed December 10, 2006.

[2] skape, Skywing. Bypassing PatchGuard on Windows x64. 
    http://uninformed.org/index.cgi?v=3&a=3&t=sumry; 
    accessed December 10, 2006.

[3] Microsoft Corporation. Connect: Virtual Server 2005 R2 SP1 Beta. 
    https://connect.microsoft.com/site/sitehome.aspx?SiteID=151; 
    accessed December 28, 2006.

[4] Advanced Micro Devices, Inc. AMD 64-Bit Technology.
    http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/x86-64_overview.pdf;
    accessed December 28, 2006.

[5] Microsoft Corporation. RtlVirtualUnwind. 
    http://msdn2.microsoft.com/en-us/library/ms680617.aspx; 
    accessed December 28, 2006.

[6] The PaX Team. Paging Based Non-Executable Pages. 
    http://pax.grsecurity.net/docs/pageexec.txt;
    accessed December 30, 2006.

[7] Sherri Sparks and Jamie Butler. "SHADOW WALKER" Raising the Bar for Rootkit Detection. 
    http://www.blackhat.com/presentations/bh-jp-05/bh-jp-05-sparks-butler.pdf; 
    accessed December 30, 2006.

[8] Skywing. Anti-Virus Software Gone Wrong. 
    http://www.uninformed.org/?v=4&a=4&t=sumry; 
    accessed December 31, 2006.