The Flaw

Standard exploit development techniques rarely work well when applied to kernel-level vulnerabilities. The kernel environment is much less friendly to the exploit writer than user mode. Each specific vulnerability will likely require custom techniques. The flaw described in the previous chapter was found in the driver provided by Apple in their Mac OS X version 10.4.7 on Macbooks and Mac Minis running on an Intel processor. This flaw allows an attacker to compromise and gain complete control of a targeted machine. Since the flaw requires a targeted machine to receive and process a wireless management frame, the attacker must be within range in order to transmit the frame^3.1.

As was described above, this flaw was discovered accidentally while fuzz testing other devices. The ``scapy'' fuzzing tool was used to generate wireless management frames with a random numbers of Information Elements (IEs) of random sizes that were then transmitted to the broadcast address^3.2. The Macbook crashed due to a page fault caused by the wireless driver during the processing of one of these fuzz packets. The panic log showed arbitrary memory corruption in the form of overwriting values in source or destination copies in memory. Three crash dumps which are described below clearly show that memory was corrupted during the handling of these fuzz packets.

Example 1: Attempt to access 0x62413863:

panic(cpu 0 caller 0x0019CADF): 
Unresolved kernel trap (CPU 0, Type 14=pagefault), registers:
CR0: 0x8001003b, CR2: 0x62413863, CR3: 0x021d7000, CR4: 0x000006e0
EAX: 0x62413862, EBX: 0x00000003, ECX: 0x0c67bc8c, EDX: 0x00000003
ESP: 0x62413863, EBP: 0x0c67bad4, ESI: 0x03717804, EDI: 0x0371787c
EFL: 0x00010202, EIP: 0x008c923d, CS:  0x00000008, DS:  0x0c670010
<Removed for length>
#3  0x00197c7d in trap_from_kernel ()
#4  0x008c923d in ieee80211_saveie ()
#5  0x008c7303 in sta_add ()
#6  0x008bccb9 in ieee80211_add_scan ()
#7  0x008cd799 in ieee80211_recv_mgmt ()
#8  0x008ddbd9 in ath_recv_mgmt ()
#9  0x008ce9a5 in ieee80211_input ()
#10 0x008de86a in ath_intr ()

Example 2: Attempt to access 0xcc

panic(cpu 1 caller 0x0019CADF): 
Unresolved kernel trap (CPU 1, Type 14=pagefault), registers:
CR0: 0x8001003b, CR2: 0x000000cc, CR3: 0x021d7000, CR4: 0x000006a0
EAX: 0x00000033, EBX: 0x037d8504, ECX: 0x036a4c78, EDX: 0x0360b610
ESP: 0x000000cc, EBP: 0x0c6ebea4, ESI: 0x037d8504, EDI: 0x0369b46c
EFL: 0x00010206, EIP: 0x008c5f03, CS:  0x00000008, DS:  0x00000010
<Removed for length>
#3  0x00197c7d in trap_from_kernel ()
#4  0x008c5f03 in sta_update_notseen ()
#5  0x008c6ba0 in sta_pick_bss ()
#6  0x008bd77c in scan_next ()
#7  0x008bc314 in thread_call_func ()

Example 3: Attempt to copy from 0x41316341

eax            0xaca7000	181039104
ecx            0xc98	3224
edx            0x3263	12899
ebx            0xf	15
esp            0xc6e3714	0xc6e3714
ebp            0xc6e3758	0xc6e3758
esi            0x41316341	1093755713
edi            0xaca7000	181039104
eip            0x1933de	0x1933de <memcpy_common+10>
eflags         0x10203	66051
cs             0x8	8
ss             0x10	16
ds             0x120010	1179664
es             0xc6e0010	208535568
fs             0x10	16
gs             0x900048	9437256
Program received signal SIGTRAP, Trace/breakpoint trap.
0x001933de in memcpy_common ()
2: x/i $eip  0x1933de <memcpy_common+10>:	repz movs DWORD PTR es:[edi],DWORD PTR ds:[esi]
#0  0x001933de in memcpy_common ()
#1  0x03915004 in ?? ()
#2  0x008c6083 in sta_iterate ()
#3  0x008e52b7 in AirPort_Athr5424::ieee80211_notify_scan_done ()
#4  0x008e55b9 in AirPort_Athr5424::setSCAN_REQ ()
#5  0x008b2c91 in IO80211Scanner::scan ()
#6  0x008aa00c in IO80211Controller::execCommand ()
#7  0x0038e698 in IOCommandGate::runAction (this=0x3595300, 
inAction=0x8a9c7c <IO80211Controller::execCommand(OSObject*, void*, void*, 
void*, void*)>, arg0=0x8, arg1=0x399aea5, arg2=0x0, arg3=0xc6e3d2c) at
/SourceCache/xnu/xnu-792.9.72/iokit/Kernel/IOCommandGate.cpp:152
#8  0x008a9284 in IO80211Controller::queueCommand ()

Tracking down the packet that crashes a wireless driver can be frustrating because it's not necessarily the last packet to be received or transmitted. This is important when the number of packets produced and injected can be as many as several thousands per minute. Since the memory overwrites illustrated above cover an entire 32 bit value, like 0x41414141, a method to tag which packet number is responsible for the overwrite can help to cut down on this frustration.

A counter for packet tracking can be inserted into packets when at generation time. There are a few specific places where storing this counter can help with packet identification. The first place is the last 4 bytes of a BSSID with the first two bytes remaining static. For example, 0xcc 0xcc 0x41 0x41 0x41 0x01 is the BSSID of the first packet sent. When the last byte of the MAC address reaches 0xff the next higher byte starts counting. As such, 0xcc 0xcc 0x41 0x41 0x01 0x01 is the BSSID for the 256th packet sent. Likewise, the fuzzer can pad the information-element buffer in the same way with a repeating pattern of 0x41 0x41 0x41 0x01 for the first packet sent. The reason for padding the value with the extra data instead of just setting them to 0x00 is related to the page faults. While 0x41 0x41 0x41 0xf1 may translate to a bad address and cause a page fault during access attempts, 0x00 0x00 0x43 0x12 may be valid and cause no problems. Since kernel panics are the primary source of isolating the flaw at this point, they need to cause a crash instead of silently allowing the kernel to continue executing.

Several tests reveal that the only anomaly common to all the packets that cause overwrite is an overly long Extended Rate Element which is an IE sent by the access point to advertise additional speeds, such as 11mpb, that the access point supports. To verify this, the author changed the script so that it would generate a distinctive pattern in the Extend Rate IE. This pattern showing up in the crash dumps made it possible to prove that it was the ``Extended Rate'' IE that was the problem. The amount of the pattern found in memory made it easy to determine how much memory was corrupted. The following Ruby code shows how the packet was crafted that made it possible to come to this conclusion:

ssid 	= Rex::Text.rand_text_alphanumeric(rand(255))
bssid	= "\x61\x61\x61" + Rex::Text.rand_text(3)
seq	    = [rand(255)].pack('n')
xrate	= Rex.Text.rand_pattern_create(240)
  frame =
  "\x80" +
  "\x00" +
  "\x00\x00" +
  "\xff\xff\xff\xff\xff\xff" +
  bssid +
  bssid +
  seq +
  Rex::Text.rand_text(8) +
  "\xff\xff" +
  Rex::Text.rand_text(2) +
  #ssid tag
  "\x00" + ssid.length.chr + ssid +
  #supported rates
  "\x01" + "\x08" + "\x82\x84\x8b\x96\x0c\x18\x30\x48" +
  #current channel
  "\x03" + "\x01" + channel.chr +
  #Xrate
  "\x32" + xrate.length.chr + xrate

When this packet is transmitted, the victim machine will not crash right away. The vulnerable code does not process the packets the instant they are received. The packets are instead only processed when the information is needed for a scan. OS X produces a new scan every five minutes. As such, the machine may take up to five minutes to crash after receiving a corrupted packet. Pinning down this bug meant that forcing a scan would be necessary.

As luck would have it, Apple provides a tool called airport for this sort of thing (located in /System/Library/PrivateFrameworks/Apple80211.framework/Versions/A/Resources). Executing airport –z will disassociate the machine from whatever wireless access point it is currently using. Executing airport –s will force the driver to run a scan and report all access points within range. In order to crash the machine quickly after a corrupted Extended Rate IE is sent, the author ran the command airport –s –r 10000. The ``-r'' option tells the airport command to repeat an action a given number of times which, in this case, causes 10000 re-scans.

Running this command would cause the machine to reliably crash in the same manner every time. This makes it possible to figure out where, precisely, the wireless driver is a crashing. In this case, the corrupted IE in the packet that is transmitted causes a crash in a memcpy called from a function named ath_copy_scan_results in the Apple driver. It appears that the attacker can influence where the memcpy will read from and how much data will be copied. Since an attacker can copy arbitrary data from one area of memory (such as the packet) to another area of memory, it will most likely be possible to gain code execution.

If no scan is forced and the target machine is not associated with an access point, a different crash will reliably occur in a memcmp called from a function named sta_add. The memcmp is meant to check to see if a BSSID is the same as one that has been stored. However, the overflow corrupts a structure so that it compares the pointer to the new BSSID against a pointer that the attacker can set.

Most of the beacon intervals in the test scripts are set to 0xffff, which is a little over 67 seconds. This means that a machine that receives and adds one of these beacon packets into its scan cache is not expecting to get another update from the BSSID for a little over 67 seconds. Generally, management frame fuzzing means the creation of something like a fake beacon frame that is quickly injected and forgotten. A real AP would continue sending beacon packets to let a potential client know it is still available. A driver will wait up until its beacon interval before taking actions such as marking the AP with the missed beacon as non-preferential for connection or even removing it from the scan cache altogether. In order to have many packets processed, the author set the beacon interval time to its maximum so the driver would not get suspicious for at least 67 seconds, thus allowing time for the fake AP to go through processing. In other words, most beacons are sent with intervals of several times a second. By using the maximum interval, one only needs to send a corrupted beacon packet once a minute.

If the memcmp crash does not occur during normal operations, a crash in a function called sta_update can occur. Although the specific locations that the crash occurs at within this function can be different, the crash will occur reliably with the same data if the malicious frame is the same.

Analyzing these repeated crashes helps to localize where memory corruption is occurring in the code. This can include static analysis using tools like IDA Pro to read the compiled driver code. This can also include dynamic analysis such as by stepping through the code with a debugger like gdb to watch step-by-step what the driver does when it overwrites memory. Debugging a kernel driver in real-time requires setting up two machines for gdb and enabling the kernel core dump facility. There are numerous documents on how to set up live kernel debugging with gdb, so rather than rehashing the information[3].

The specific OS X boot settings the author uses involve setting the nvram boot-args argument to debug=0xd44 _panicd_ip=192.168.1.1 –v. This setting is the easiest for two machine debugging, however, the target machine will no longer produce a panic log.

Things I Wish Google told me: kernel core dumps on Intel are broken

The core kernel dumping functionality on the Intel architecture appears to be broken. Following the directions for the target and development machine yielded no core dumps. After investigating this problem, it seems to stem from the fact that the panicing machine performs no ARP resolution during a crash. The panicing machine instead forwards information to its default router. OS X expects the default router to forward this information to the core dump server. The author has found that the best way to encourage proper handling is to place the development machine on a different subnet from the target machine. Keep in mind that this information was gleaned through a series of changes and tests and observations with a network sniffer. Setting the ARP entry statically with the command arp -s did not help.