Uninformed: Informative Information for the Uninformed

Vol 1» 2005.May


Next: The Hybrid Up: Botnets Previous: Botnets   Contents

The Linked Network

Though I called this approach a passive network, as the nodes idle and wait for commands to come to them, this type of botnet is in fact quite active. The mechanisms described now will not (easily) work when most of the nodes are on dynamic IP addresses. It is thus more interesting for nodes installed after exploiting some kind of server software. Of course, while not solving the uptime problem, a rogue dyndns account can always give a dynamic IP a static hostname.

This kind of network focuses on all of its nodes forming some kind of self-organizing peer to peer network. A node that infects some other host can send over the botnet program and make the new host link to itself, thus becoming that node's parent. This technique can make the infected hosts form a sort of tree structure over time, as each newly infected host tries to link to the infecting host. Updates, information, and commands can be transmitted using this worm network to reach each node, no matter which node it was sent from, as each node informs both child nodes as well as its parent nodes. In its early (or final) stages, a network of this type might look like this piece of ascii art:

    Level
    0      N
         /   \
    1   N     N
       / \   /
    2 N   N N

To make sure a 'successful' node that infects lots of hosts doesn't become the parent of all of those hosts, nodes must refuse link requests from child nodes after a certain number have been linked (say 5). The parent can instead in form the would-be child to link to one of its already established children instead. By keeping track of the number of nodes linked to each location in the tree, a parent can even try to keep the tree thats hierarchically below it well balanced. This way a certain node would know about its parent and up to 5 children, thus keeping the number of other hosts that someone who compromises a node rather low, while still making sure to have a network that's as effective as possible. Depending on the number of nodes in the entire network, the amount of children that may link to a parent node could be easily changed to make the network scale better. As each node may be some final link as well as a parent node, every host runs the same program. There's no need for special 'client' and 'server' nodes.

Whats the problem with a tree structure? Well, what if a parent fails? Say a node has 3 children, each having 2 children of its own. Now this node fails because the owner decides to reinstall the host. Are we left with 3 networks that can't communicate with each other any more? Not necessarily. While possibly giving a forensics expert information on additional hosts, to increase reliability each node has to know about at least one more upstream node that it can try to link to if its parent is gone. An ideal candidate could be the parent's parent. In order to make sure that all nodes are still linked to the network, a periodic (once a day) sort of ``ping'' through the entire network has to happen in any case. By giving a child node the IP of its ``grandparent'', the direct parent of the child node always knows that the fail-over node, the one its kids will try to link to if it should fail, is still up and running.

Though this may help to address the issue of parent death, another issue remains. If the topmost node fails, there are no more upstream nodes that the children could link to. Thats why in this case the children should have the ip of one(!) of its siblings as the fail-over address so that they can make this one the new top node in the case of a fail-over condition. Making use of the node-based ping, each node also knows how many of its children are still up and running. By including this number into the ping sent to the parent, the topmost node could always tell the number of linked hosts. In order to not have to rely on connecting to the topmost node to collect this type of information, a simple command can be implemented to make the topmost node report this info to any node on the network that asks for it. Using a public key stored into all the nodes, it's even possible to encrypt every piece of information thats destined for the botnet owner, making sure that no one besides the owner can decrypt the data. Although this type of botnet may give a forensics expert or someone with a sniffer information on other nodes that are part of the network, it also offers fast response times and more flexibility in the (ab)use of the network compared to the previous approach with the unlinked nodes. It's a sort of trade off between the biggest possible level of anonymity on one hand, and flexibility on the other. It is a huge step up compared to all of the zombies sitting on IRC servers right now, where a single channel contains the zombies of the entire botnet. By employing cryptography to store the IPs of the child and parent nodes, and keeping those IPs only in RAM mitigates the problem further.

Once a drone network of this type has been established with several hundreds of hosts, there are lots of possibilities of putting it to use. To conceal the originating IP address of a connection, hopping over several nodes of the drone network to a target host can be easily accomplished. A command packet tells one node to bind a port. Once it receives a connection on it, it is told to command a second node to do the same, and from then on node 1 forwards all the traffic to node 2. Node 2 does the same, and forwards to node 3, then 4, maybe 5, until finally the last node connects to the intended destination IP. By encrypting the entire connection from the original source IP address up to the last node, a possible investigator sniffing node 2 will not see the commands (and thus the IP addresses) which tell node 3 to connect to node 4, node 4 to node 5, and of course especially not the destination host's address. An idle timeout makes sure that connections don't stay up forever.

As manually updating several hundreds or thousands of hosts is tedious work, an easy updating system should be coded into the nodes. There are basically two possible ways to realize that. A command, distributed from node to node all over the network, could make each node replace itself with a newer version which it may download from a certain HTTP address. The other way is by updating the server software on one node, which in turn distributes this update to all the nodes it's linked to (children and parent), which do just the same. Cryptographic signatures are a must of course to make sure someone doesn't replace all of the precious nodes with SETI@home. Vlad902 suggested a simple and effective way to do that. Each node gets an MD5 hash hardcoded into it. Whenever someone offers a software update, it will download the first X bytes and see wether they hash to the hardcoded value. If they do, the update will be installed. Of course, a forensics expert may extract the hash out of an identified node. However, due to the nature of cryptographic hashes, he won't be able to tell which byte sequence generates that hash. This will prevent the forensics export from creating a malicious update to take down the network. As the value used to generate the hash has to be considered compromised after an update, each update has to supply a new hash value to look out for.

Further security mechanisms could include making the network completely memory resident, and parents keeping track of kids, and reinfecting those if necessary. What never hit the hard-disk can obviously not be found by forensics. Also, commands should be time-stamped to make sure a certain command will only work once, and replay attacks (sending a sniffed command packet to trigger a response from a node) will fail. Using public key cryptography to sign and encrypt data and communication is always a nice idea too, but it also has 2 disadvantages:

  1. It usually produces quite a big overhead to incorporate into the code.
  2. Holding the one and only private key matching to a public key thats been found on hundreds of hacked hosts is quite incriminating evidence.

An additional feature could be the incorporation of global unique identifiers into the network, providing each node with a unique ID that's set upon installation on each new victim. While the network master would have to keep track of host addresses and unique IDs, he could use this feature to his advantage. Imagine a sort of traceroute within the node network. The master wants to know where a certain host is linked to. Every node knows the IDs of all of the child nodes linked hierarchically below it. So he asks the topmost node to find out the path to the node he's interested in. The topmost node realizes it's linked somewhere under child 2, and in turn asks child 2. This node knows it's linked somewhere below child 4, and so on and so on. In the end, the master gets his information, a couple of IDs, while no node thats not directly linked to another gets to know the IPs of further hosts that are linked to the network.

Since a portscan shouldn't reveal a compromised host, a raw socket must be used to sniff command packets off the wire. Also, command packets should be structured as unsuspicious as possible, to make it look like the host just got hit by yet another packet of ``internet background noise''. DNS replies or certain values in TCP SYN packets could do the trick.


Next: The Hybrid Up: Botnets Previous: Botnets   Contents