Transforming the Encoded Data
The most important part of any decoder is the way in which it
transforms the data from its encoded form to its actual form. For
example, many of the decoders used in the Metasploit Framework and
elsewhere will xor a portion of the encoded data with a key
that results in the actual bytes of the original payload being
produced. While this an effective way of obtaining the desired
results, it's not possible to use such a technique with the
character set limitations currently defined in this paper.
In order to transform encoded data back to its original form, it
must be possible to produce any byte from 0x00 to
0xff using any number of combinations of bytes that fall
within the valid character set. This means that this decoder will
be limited to using combinations of character that fall within
0x01-0x40 and 0x5b-0x7f. To
figure out the best possible means of accomplishing the
transformation, it makes sense to investigate each of the useful
instructions that were identified earlier in this chapter.
The bitwise instructions, such as and, or, and
xor are not going to be particularly useful to this
decoder. The main reason for this is that they are unable to
produce values that reside outside of the valid character sets
without the aide of a bit shifting instruction. For example, it is
impossible to bitwise-and two non-zero values in the valid
character set together to produce 0x00. While xor
could be used to accomplish this, that's about all that it could do
other than producing other values below the 0x80 boundary.
These restrictions make the bitwise instructions unusable.
The imul instruction could be useful in that it is possible
to multiply two characters from the valid character set together to
produce values that reside outside of the valid character set. For
example, multiplying 0x02 by 0x7f produces
0xfe. While this may have its uses, there are two
remaining instructions that are actually the most useful.
The add instruction can be used to produce almost all
possible characters. However, it's unable to produce a few specific
values. For example, it's impossible to add two valid
characters together to produce 0x00. It is also impossible
to add two valid characters together to produce 0xff and
0x01. While this limitation may make it appear that the
add instruction is unusable, its saving grace is the
sub instruction.
Like the add instruction, the sub instruction is
capable of producing almost all possible characters. It is
certainly capable of producing the values that add cannot.
For example, it can produce 0x00 by subtracting
0x02 from 0x02. It can also produce 0xff
by subtracting 0x03 from 0x02. Finally,
0x01 can be produce by subtracting 0x02 from
0x03. However, like the add instruction, there
are also characters that the sub instruction cannot
produce. These characters include 0x7f, 0x80,
and 0x81.
Given this analysis, it seems that using add and
sub in combination is most likely going to be the best
choice when it comes to transforming encoded data for this decoder.
With the fundamental operations selected, the next step is to
attempt to implement the code that actually performs the
transformation. In most decoders, the transform will be
accomplished through a loop that simply performs the same operation
on a pointer that is incremented by a set number of bytes each
iteration. This type of approach results in all of the encoded data
being decoded prior to executing it. Using this type of technique
is a little bit more complicated for this decoder, though, because
it can't simply rely on the use of a static key and it's also
limited in terms of what instructions it can use within the loop.
For these reasons, the author decided to go with an alternative
technique for the transformation portion of the decoder stub. Rather
than using a loop that iterates over the encoded data, the author
chose to use a series of sequential transformations where each block
of the encoded data was decoded. This technique has been used
before in similar situations. One negative aspect of using this
approach over a loop-based approach is that it substantially
increases the size of the encoded payload. While figure
gives an idea of the structure of the decoder,
it doesn't give a concrete understanding of how it's actually
implemented. It's at this point that one must descend from the
lofty high-level. What better way to do this than diving right into
the disassembly?
00000011 6830703C14 push dword 0x143c7030
00000016 5F pop edi
00000017 0139 add [ecx],edi
00000019 030C24 add ecx,[esp]
The form of each transform will look exactly like this one. What's
actually occurring is a four byte value is pushed onto the stack and
then popped into the edi register. This is done in place
of a mov instruction because the mov instruction
contains invalid characters. Once the value is in the edi
register, it is either added to or subtracted from its respective
encoded data block. The result of the add or subtract is stored in
place of the previously encoded data. Once the transform has
completed, it adds the value at the top of the stack, which was set
to 0x4 in the decoder stub header, to the register that
holds the pointer into the encoded data. This results in the
pointer moving on to the next encoded data block so that the
subsequent transform will operate on the correct block.
This simple process is all that's necessary to perform the
transformations using only valid characters. As mentioned above,
one of the negative aspects of this approach is that it does add
quite a bit of overhead to the original payload. For each four byte
block, 11 bytes of overhead are added. The approach is also limited
by the fact that if there is ever a portion of the raw payload that
contains characters that add cannot handle, such as
0x00, and also contains characters that sub cannot
handle, such as 0x80, then it will not be possible to
encode it.
|