Uninformed: Informative Information for the Uninformed

Vol 3» 2006.Jan

Thread APC

One of the most logical ways to go about staging a payload from R0 to R3 is through the use of Asynchronous Procedure Calls (APCs). The purpose of an APC is to allow code to be executed in the context of an existing thread without disrupting the normal course of execution for that thread. This makes it very useful for R0 payloads that want to run an R3 payload. This is the technique that was discussed at length in eEye's paper[2]. A few steps are required to accomplish this.

First, the R3 payload must be copied to a location that will be accessible from a user-mode process, such as SharedUserData. After the copy has completed, the next step is to locate the thread that the APC should be queued to. There are two important things to keep in mind in this step. One is that the R3 payload will most likely want to run in the context of a privileged process, so a privileged process must first be located and a thread running within it must be found. The other is that the thread that will have the APC queued to it must be in an alertable state; otherwise the user-mode APC will not be delivered until the thread next enters an alertable wait, which may never happen.

Once a suitable thread has been located, the final step is to initialize the APC and point the APC routine to the user-mode equivalent address via nt!KeInitializeApc and insert it into the thread's APC queue via nt!KeInsertQueueApc. After that has completed, the code will be run in the context of the thread that the APC was queued to and all will be well.

One of the major concerns about this type of approach is that it will generally have to rely on undocumented offsets for fields in structures like EPROCESS and ETHREAD that are very volatile across operating system versions. Making a portable payload that uses this technique is still perfectly feasible, but it may come at the cost of size, since the payload must carry the offsets for each supported version and detect the version at runtime.
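The size cost of portability can be pictured as a lookup table keyed on the operating system version. The following is a minimal sketch, not taken from the eEye paper: the type os_offsets and the function lookup_offsets are hypothetical, and the two ActiveProcessLinks offsets are the values commonly cited for 32-bit Windows 2000 and Windows XP, which should be checked against symbols for the exact build being targeted.

```c
#include <assert.h>
#include <stddef.h>

/* One row per supported OS version (hypothetical helper type). */
typedef struct os_offsets {
    unsigned major;
    unsigned minor;
    unsigned active_process_links; /* offset of EPROCESS.ActiveProcessLinks */
} os_offsets;

/* Assumed values for 32-bit builds; verify in a debugger before use. */
static const os_offsets known_offsets[] = {
    { 5, 0, 0xA0 }, /* Windows 2000 */
    { 5, 1, 0x88 }, /* Windows XP   */
};

/* Copy the row matching major.minor into *out.
 * Returns 1 if the version is known and 0 otherwise. */
int lookup_offsets(unsigned major, unsigned minor, os_offsets *out)
{
    assert(out != NULL);
    for (size_t i = 0; i < sizeof known_offsets / sizeof known_offsets[0]; i++) {
        if (known_offsets[i].major == major && known_offsets[i].minor == minor) {
            *out = known_offsets[i];
            return 1;
        }
    }
    return 0;
}
```

Every additional version adds a row, and every additional field adds a column, which is where the extra payload size comes from.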

The approach outlined by eEye works perfectly fine and is well thought out, and as such this subsection will merely describe ways in which it might be possible to improve the existing implementation. One way in which it might be optimized would be to eliminate the call to nt!PsLookupProcessByProcessId, but as their paper points out, this would only be possible for vulnerabilities that are triggered outside of the context of the Idle process. However, for cases where this is not a limitation, it would be easier to extract the current thread's process from Kpcr->Kprcb->CurrentThread->ApcState->Process. This can be accomplished through the following disassembly (see footnote 4.4):

00000000  A124F1DFFF        mov eax,[0xffdff124]
00000005  8B4044            mov eax,[eax+0x44]
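The two constants in the disassembly can be read as sums of structure offsets. The sketch below models the same pointer chase in user-mode C against a mock memory buffer. It assumes the 32-bit Windows XP layout, in which KPCR.PrcbData is at 0x120, KPRCB.CurrentThread at 0x4, KTHREAD.ApcState at 0x34, and KAPC_STATE.Process at 0x10; the names read32 and current_process are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed 32-bit Windows XP structure offsets. */
enum {
    KPCR_PRCBDATA       = 0x120,
    KPRCB_CURRENTTHREAD = 0x004, /* 0x120 + 0x004 = 0x124 */
    KTHREAD_APCSTATE    = 0x034,
    KAPC_STATE_PROCESS  = 0x010  /* 0x034 + 0x010 = 0x44  */
};

/* Read a 32-bit value at byte offset off within the mock memory. */
static uint32_t read32(const uint8_t *mem, uint32_t off)
{
    uint32_t v;
    memcpy(&v, mem + off, sizeof v);
    return v;
}

/* Model of: mov eax,[kpcr+0x124] ; mov eax,[eax+0x44]
 * Addresses are byte offsets into mem rather than real pointers. */
uint32_t current_process(const uint8_t *mem, uint32_t kpcr)
{
    assert(mem != NULL);
    uint32_t thread = read32(mem, kpcr + KPCR_PRCBDATA + KPRCB_CURRENTTHREAD);
    return read32(mem, thread + KTHREAD_APCSTATE + KAPC_STATE_PROCESS);
}
```

In the disassembly the KPCR base is the fixed address 0xffdff000, so the first load reads 0xffdff124 directly.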

After the process has been extracted, enumeration to find a privileged system process could be done in exactly the same manner as the paper describes (by enumerating the ActiveProcessLinks).
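The enumeration described above amounts to walking a circular doubly-linked list whose links are embedded inside each process structure. A minimal user-mode sketch of that walk follows. The mock_process type is a hypothetical stand-in for EPROCESS, with links playing the role of ActiveProcessLinks and image playing the role of ImageFileName; real field offsets differ between operating system versions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same shape as the kernel's LIST_ENTRY. */
typedef struct list_entry {
    struct list_entry *flink;
    struct list_entry *blink;
} list_entry;

/* Hypothetical stand-in for EPROCESS. */
typedef struct mock_process {
    unsigned long pid;
    list_entry    links;     /* role of ActiveProcessLinks */
    char          image[16]; /* role of ImageFileName */
} mock_process;

/* Walk the list once, starting with the entry after start, and return
 * the first process whose image name matches. The starting process is
 * not examined. Returns NULL if no other process matches. */
mock_process *find_process(mock_process *start, const char *image)
{
    assert(start != NULL && image != NULL);
    list_entry *head = &start->links;
    for (list_entry *e = head->flink; e != head; e = e->flink) {
        /* Recover the containing structure from the embedded link. */
        mock_process *p =
            (mock_process *)((char *)e - offsetof(mock_process, links));
        if (strncmp(p->image, image, sizeof p->image) == 0)
            return p;
    }
    return NULL;
}
```

One difference from this mock is worth keeping in mind: in the real kernel the list also passes through a bare list head that is not embedded in any process structure, so a payload has to tolerate or skip that entry.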

Another improvement that might be made would be to use SharedUserData as a storage location for the initialized KAPC structure rather than allocating storage for it with nt!ExAllocatePool. This would save some space by eliminating the need to resolve and call nt!ExAllocatePool. While the approach outlined in the paper describes nt!ExAllocatePool as being used to stage the payload to an IRQL-safe buffer, it would be equally feasible to do so by using SharedUserData for storage.