|Informative Information for the Uninformed|
Next: Transferring Control to the Up: Implementing the Decoder Previous: Determining the Stub's Base Contents
The most important part of any decoder is the way in which it transforms the data from its encoded form to its actual form. For example, many of the decoders used in the Metasploit Framework and elsewhere will xor a portion of the encoded data with a key that results in the actual bytes of the original payload being produced. While this an effective way of obtaining the desired results, it's not possible to use such a technique with the character set limitations currently defined in this paper.
In order to transform encoded data back to its original form, it must be possible to produce any byte from 0x00 to 0xff using any number of combinations of bytes that fall within the valid character set. This means that this decoder will be limited to using combinations of character that fall within 0x01-0x40 and 0x5b-0x7f. To figure out the best possible means of accomplishing the transformation, it makes sense to investigate each of the useful instructions that were identified earlier in this chapter.
The bitwise instructions, such as and, or, and xor are not going to be particularly useful to this decoder. The main reason for this is that they are unable to produce values that reside outside of the valid character sets without the aide of a bit shifting instruction. For example, it is impossible to bitwise-and two non-zero values in the valid character set together to produce 0x00. While xor could be used to accomplish this, that's about all that it could do other than producing other values below the 0x80 boundary. These restrictions make the bitwise instructions unusable.
The imul instruction could be useful in that it is possible to multiply two characters from the valid character set together to produce values that reside outside of the valid character set. For example, multiplying 0x02 by 0x7f produces 0xfe. While this may have its uses, there are two remaining instructions that are actually the most useful.
The add instruction can be used to produce almost all possible characters. However, it's unable to produce a few specific values. For example, it's impossible to add two valid characters together to produce 0x00. It is also impossible to add two valid characters together to produce 0xff and 0x01. While this limitation may make it appear that the add instruction is unusable, its saving grace is the sub instruction.
Like the add instruction, the sub instruction is capable of producing almost all possible characters. It is certainly capable of producing the values that add cannot. For example, it can produce 0x00 by subtracting 0x02 from 0x02. It can also produce 0xff by subtracting 0x03 from 0x02. Finally, 0x01 can be produce by subtracting 0x02 from 0x03. However, like the add instruction, there are also characters that the sub instruction cannot produce. These characters include 0x7f, 0x80, and 0x81.
Given this analysis, it seems that using add and sub in combination is most likely going to be the best choice when it comes to transforming encoded data for this decoder. With the fundamental operations selected, the next step is to attempt to implement the code that actually performs the transformation. In most decoders, the transform will be accomplished through a loop that simply performs the same operation on a pointer that is incremented by a set number of bytes each iteration. This type of approach results in all of the encoded data being decoded prior to executing it. Using this type of technique is a little bit more complicated for this decoder, though, because it can't simply rely on the use of a static key and it's also limited in terms of what instructions it can use within the loop.
For these reasons, the author decided to go with an alternative technique for the transformation portion of the decoder stub. Rather than using a loop that iterates over the encoded data, the author chose to use a series of sequential transformations where each block of the encoded data was decoded. This technique has been used before in similar situations. One negative aspect of using this approach over a loop-based approach is that it substantially increases the size of the encoded payload. While figure gives an idea of the structure of the decoder, it doesn't give a concrete understanding of how it's actually implemented. It's at this point that one must descend from the lofty high-level. What better way to do this than diving right into the disassembly?
00000011 6830703C14 push dword 0x143c7030 00000016 5F pop edi 00000017 0139 add [ecx],edi 00000019 030C24 add ecx,[esp]
The form of each transform will look exactly like this one. What's actually occurring is a four byte value is pushed onto the stack and then popped into the edi register. This is done in place of a mov instruction because the mov instruction contains invalid characters. Once the value is in the edi register, it is either added to or subtracted from its respective encoded data block. The result of the add or subtract is stored in place of the previously encoded data. Once the transform has completed, it adds the value at the top of the stack, which was set to 0x4 in the decoder stub header, to the register that holds the pointer into the encoded data. This results in the pointer moving on to the next encoded data block so that the subsequent transform will operate on the correct block.
This simple process is all that's necessary to perform the transformations using only valid characters. As mentioned above, one of the negative aspects of this approach is that it does add quite a bit of overhead to the original payload. For each four byte block, 11 bytes of overhead are added. The approach is also limited by the fact that if there is ever a portion of the raw payload that contains characters that add cannot handle, such as 0x00, and also contains characters that sub cannot handle, such as 0x80, then it will not be possible to encode it.