Uninformed: Informative Information for the Uninformed

Vol 3» 2006.Jan


Table creation

The creation of rainbow tables to precompute the hashes is a good approach to easily breaking the hashes now, but as harddisks grow bigger and bigger while costing ever less, I decided to roll my own table layout instead. As the reader will see, my approach requires way more harddisk space than rainbow tables do since they are computationally less expensive to create and contain a determined set of data, unlike rainbow tables with their less than 100% probability approach to contain a certain password.

In order to create those tables, the big question is how to efficiently store all the data. In order to stay within certain bounds, I decided to stick to alphanumeric tables only. Alphanumeric, that's 26 chars from a-z, 26 chars from A-Z, and additional 10 for 0-9. Thats 62 possible values for each character, so thats 62^7 permutations, right? Wrong. NTLM hashes use the LM hash as input. The LM hashing algorithm upper-cases its input. Therefore the possible keyspace shrinks to 36 characters, and the number of possible permutations goes down to 36^7. The only other input that needs accounting is the NULL padding bytes used, bringing the total permutations to a bit more than 36^7.

The approach taken here to allow for easy storage and recovery of hashes and plain text is essentially to place every possible plaintext password into one of 2048 buckets. It could easily be expanded to more. The table creation tool simply generates every valid alphanumeric password, hashes it and checks the first 11 bits of the hash. These bits determine which of the 2048 buckets (implemented as files in this case) the plaintext password belongs to. The plaintext password is then added to the bucket. Now whenever a hash is captured, looking at the first 11 bits of the hash determines the correct bucket to look into for the password. All that's left to do now is hashing all the passwords in the bucket until a match is found. This will take on average case ((36^7)/2048))/2, or 19131876 hash operations. This takes approximately three minutes on my Pentium 4 2.8 Ghz machine. It takes the NTLM table generation tool 94 hours to run on my machine. Fortunately, I only had to do that once :)

The question is how to store more than 36^7 plaintext passwords, ranging in size from 0(empty password) to 7 bytes.

Approach #1: Store each password separated by newlines. As most passwords are 7 byte in size and an additional newline extends that to 8 byte, the outcome would be somewhere around (36^7)*8 bytes. That's roughly 584 gigabytes, for the alphanumeric keyspace. There has to be a better way.

Approach #2: By storing each password with 7 bytes, be it shorter than 7 or not, the average space required for each password goes down from 8 to 7, as it's possible to get rid of the newlines. There's no need to separate passwords by newlines if they're all the same size. (36^7)*7 is still way too much though.

Approach #3: The plaintext passwords are generated by 7 nested loops. The first character changes all the time. The second character changes every time the first has exhausted the entire keyspace. The third increments each time the second has exhausted the keyspace and so on. What's interesting is that the final 3 bytes rarely change. By storing them only when they change, it's possible to store only the first 4 bytes of each password, and once in a while a marker that signals a change in the final 3 bytes, and is followed by the 3 byte that now form the end of each plaintext password up to the next marker. That's roughly (36^7)*4 bytes = 292 gigabytes. Much better. Still too much.

Approach #4: For each character, there's 37 possible values. A-Z, 0-9 and the 0 byte. 37 different values can be expressed by 6 bits. So we can stuff 4 characters into 4*6 = 24 bits, which is 3 byte. How convenient! (37^7)*3 == 265 gigabytes. Still too much.

Approach #5: The passwords are being generated and stored in a consecutive way. The hash determines which bucket to place each new plaintext password into, but it's always 'bigger' than the previous one. Using 2048 buckets, a test showed that, within any one file, no offset between a password being stored and the next one stored into this bucket exceeded ~55000. By storing offsets to the previous password instead of the full word, each password can be stored as a 2 byte value.

For example, say the first password stored into one bucket is the one char word "A". That's index 10 in the list of possible characters, as it starts with 0-9. The table creation tool would now save 10 into the bucket, as it's the first index from the start of the new bucket, and it's 10 bigger than zero, the start value for each bucket. Now if by chance the one character password "C" was to be stored into the same bucket next, the number 2 would be stored, as "C" has an offset of 2 to the previous password. If the next password for this bucket was "JH6", the offset might be 31337.

Basically each password is being stored in a base36 system, so the first 2 byte password, being "00", has an index of 37, and all the previous password offsets and the offset for "00" itself of the bucket that "00" is being stored in add up to 37. To retrieve a password saved in this way requires a transformation of the decimal index back into the base36 system, and using the resulting individual numbers as indexes into the char keyspace[].

The resulting table size is (36^7 )*2 == 146 gigabytes. Still pretty big, but small enough to easily fit on today's harddisks. As I mentioned earlier the actual resulting size is a bit bigger in fact, as a bunch of passwords that end with null bytes have to be stored too. In the end it's not 146 gigabytes, but 151 instead.