What Is A Rainbow Table?
Generically, a rainbow table is nothing more than a time-storage trade-off. Instead of recomputing a function every time you want to attack it, you pre-compute the function's output for a large number of inputs and store the results. Then, given a result, it should be easy to look it up in the table to determine which input(s) generate it. That way, you can effectively reverse a non-reversible function...
Applied to hashing (and in this particular context, password hashing), a rainbow table is built by generating a large number of candidate passwords (typically random, but possibly dictionary based as well), and storing the password->hash mapping in a database or data file. Then you simply look up the hash that you have to get a plain text password that may have generated it.
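To make the lookup idea concrete, here is a minimal sketch in Python. It is the plain password->hash lookup described above, not the chain-and-reduction compression that real rainbow table implementations add on top. The tiny alphabet, the `build_table` name and the sample password are assumptions chosen purely for illustration.

```python
import hashlib
from itertools import product


def build_table(alphabet, length):
    """Precompute an MD5 hash -> password mapping for every candidate."""
    table = {}
    for chars in product(alphabet, repeat=length):
        candidate = "".join(chars)
        digest = hashlib.md5(candidate.encode("ascii")).digest()
        table[digest] = candidate
    return table


# Illustrative only: 6 characters, length 3, so 6^3 = 216 candidates.
table = build_table("abc123", 3)

# "Reversing" a hash is now just a dictionary lookup.
stolen_hash = hashlib.md5(b"c3a").digest()
print(table.get(stolen_hash))  # prints c3a
```

A hash whose password was never put in the table simply isn't found, which is why the table has to be so large to be useful.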
The First Problem: Storage Space
For a rainbow table to be effective, it must have a lot of candidate passwords in it. Let's take a look at an MD5 rainbow table, and see how much storage space it will require. Let's also assume that it will be stored in MySQL with a char(10) column for the password, and a binary(16) column for the hash (storing it in a binary format). So each row will have approximately 26 bytes of data (not including any overhead). And let's look at source passwords drawn from a set of 77 printable ASCII characters (a subset of the 95 printable characters ASCII defines).
|Length Of Password||Number Of Possibilities||Size Of Table|
|4 characters||35,153,041||913 MB|
|5 characters||2,706,784,157||70 GB|
|6 characters||208,422,380,089||5.4 TB|
|7 characters||16,048,523,266,853||417 TB|
|8 characters||1,235,736,291,547,681||32 PB (petabytes, 10^15 bytes)|
As you can see, the number of possibilities goes up quite fast as you support longer passwords. So that means for a rainbow table to be practical, it must actually reduce the number of possible candidates that it stores. After all, who would want to download 32 petabytes to crack a hash? Sure, you could use a dictionary and permutations on the words to try to reduce the search space significantly without cutting down on effectiveness much (statistically speaking). But that also means the table becomes much weaker against strong-but-short passwords.
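The sizes in the table can be reproduced with a few lines of Python. This is only a sketch of the arithmetic, using the same assumptions as above: 77 possible characters and 26 bytes per row with no overhead. The names `ALPHABET_SIZE`, `ROW_BYTES` and `table_bytes` are mine.

```python
ALPHABET_SIZE = 77      # characters assumed in the article
ROW_BYTES = 10 + 16     # char(10) password + binary(16) MD5 hash


def table_bytes(length):
    """Bytes needed to store every password of exactly this length."""
    return ALPHABET_SIZE ** length * ROW_BYTES


for length in range(4, 9):
    candidates = ALPHABET_SIZE ** length
    print(f"{length} characters: {candidates:,} candidates, "
          f"{table_bytes(length):,} bytes")
```

Each extra character multiplies both columns by 77, which is why the table becomes impractical so quickly.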
The Second Problem: Hash Algorithms
Hash algorithms are designed with two things in mind: security and speed. Their typical role is to create a digest (a short fingerprint) of a document. So by hashing the document, you can tell that it is the same as the original as long as the generated hashes match. And since they need to process a lot of data (potentially gigabytes or more), a key requirement is speed. In fact, even the modern "secure" algorithms such as sha256 are built to run very quickly, and every new generation of hardware computes them faster still.
The faster a hash can be computed, the less reason there is to use a rainbow table. After all, the rainbow table is just a time-storage trade-off (you're reducing time by using more storage). So since hashing only gets faster as hardware improves, the benefit of a rainbow table is diminished.
The Third Problem: Salts
A salt is a random token (usually used only once) that is combined with the password before hashing. Salts are specifically used to prevent the use of a rainbow table. Note that using a salt doesn't directly prevent a rainbow table from being used; it just reduces its effectiveness. It artificially increases the length of a password in the rainbow table (so to crack a 4 character password with a 4 character salt, you'd need to generate an 8 character rainbow table). In practice, most usual lengths of salts are too big to generate a universal rainbow table (for a 32 character salt and 8 character password, the rainbow table would need 77^40 entries, which is about 2.9*10^75). So another method that attackers use is to steal the salt along with the hash, and then generate a new rainbow table for each salt. That's why it's so important to use a unique salt for each stored password (it reduces the return on investment that the new rainbow table will provide).
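A small Python sketch shows why a unique salt defeats a shared table. The `salted_hash` helper, the salt-then-password ordering, the 16 byte salt length and the sample password are all assumptions for illustration; real schemes differ in how they combine the two.

```python
import hashlib
import os


def salted_hash(password, salt):
    """Hash the salt followed by the password (one common construction)."""
    return hashlib.sha256(salt + password.encode("utf-8")).digest()


# Two users pick the same password, but each gets a fresh random salt.
salt_a = os.urandom(16)
salt_b = os.urandom(16)
hash_a = salted_hash("hunter2", salt_a)
hash_b = salted_hash("hunter2", salt_b)

# A table precomputed for unsalted sha256 contains neither value,
# and a table built for salt_a is useless against salt_b.
unsalted = hashlib.sha256(b"hunter2").digest()
print(hash_a != hash_b, hash_a != unsalted)
```

The salt is stored next to the hash, so verification just repeats the same computation with the stored salt.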
Why Were They Popular?
Rainbow tables were popular for one key reason: Up until very recently, disk was significantly cheaper than CPU time. It was easier to pre-compute the rainbow table (which can take a very long time) than to do hashes as needed.
The Reality Today
I know what you're thinking... "Isn't disk space even cheaper today than it was a few years ago?"... Yes it is. But computing power has gotten cheaper even faster. In 2000, the cost of a hard drive was about $13 per gigabyte. Today, the cost of a hard drive is about $0.10 per gigabyte. That's 2 orders of magnitude! But if we look at a Pentium 3, it could achieve about 300 mflops (millions of floating point operations per second) for $825, for an average of $2.75 per mflop. A modern Intel i7 can do about 107,000 mflops for $999, averaging about $0.0093 per mflop. That's roughly 300 times cheaper, or about 2.5 orders of magnitude!
But wait; we have a reasonably new contender! Enter the GPU. A single Radeon HD 6990M can achieve approximately 1,600,000 mflops for about $700. Computed down, that's a whopping $0.00044 per mflop. That's about an order of magnitude less than the Intel i7, and nearly 4 orders of magnitude less than the P3. Not to mention that the raw performance is also nearly 4 orders of magnitude greater!
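The price comparison is easy to check. The following Python sketch just divides the prices and mflops figures quoted above and expresses each result relative to the Pentium 3; the dictionary layout and function name are mine.

```python
import math

# (price in dollars, mflops) as quoted in the text
hardware = {
    "Pentium 3": (825, 300),
    "Intel i7": (999, 107_000),
    "Radeon HD 6990M": (700, 1_600_000),
}


def dollars_per_mflop(price, mflops):
    return price / mflops


costs = {name: dollars_per_mflop(*spec) for name, spec in hardware.items()}
baseline = costs["Pentium 3"]

for name, cost in costs.items():
    ratio = baseline / cost
    print(f"{name}: ${cost:.5f} per mflop, {ratio:,.0f}x cheaper than the P3 "
          f"({math.log10(ratio):.1f} orders of magnitude)")
```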
How Many Hashes Per Second?
Well, there's a password cracking tool called John the Ripper. Currently, it can hash up to 514 million (DES crypt()) hashes per second (abbreviated mhps from here out) on a modern 4 core CPU (Intel x7550). When using a more modern algorithm such as sha256, John the Ripper can do a rather measly 200,000 hashes per second. At that rate it would take 3 minutes to generate a 4 character rainbow table. Fast, but not fast enough for our purposes.
Now, let's look at what a GPU can do. Bitcoin currently uses 2 internal sha256 rounds to compute a single "hash". So when we look at the performance numbers they are reporting, we need to realize that's for 2 sha256 hashes. If we look at the fastest single card setup (an ATI 5970), it does over 860 million bitcoin hashes per second. That's over 1.72 billion sha256 hashes per second! And a 3 card setup can hit almost 4.2 billion sha256 hashes per second. So let's take a look at our chart again, this time for a salted sha256 password:
|Length Of Password||Number Of Possibilities||CPU||GPU|
|4 characters||35,153,041||3 minutes||0.0083 seconds|
|5 characters||2,706,784,157||3.75 hours||0.64 seconds|
|6 characters||208,422,380,089||12 days||49 seconds|
|7 characters||16,048,523,266,853||2.5 years||1.06 hours|
|8 characters||1,235,736,291,547,681||195 years||3.4 days|
So, for about $2100, we can have a set of 3 GPUs that can brute force any printable 8 character password possible in about 3.4 days. And that's the absolute worst case possible. If we started to do intelligent things such as using a dictionary as the base for our search, we could likely find that password much, much faster.
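The times in the chart come straight from dividing the number of candidates by the hash rate. Here is a rough Python sketch of that calculation, assuming the rates quoted above (200,000 sha256 hashes per second on the CPU, 4.2 billion on the 3 card GPU setup) and the same 77 character alphabet. The constant and function names are mine.

```python
ALPHABET_SIZE = 77
CPU_RATE = 200_000           # sha256 hashes per second (John the Ripper figure)
GPU_RATE = 4_200_000_000     # sha256 hashes per second (3 card setup)


def exhaust_seconds(length, rate):
    """Seconds needed to try every password of exactly this length."""
    return ALPHABET_SIZE ** length / rate


for length in range(4, 9):
    cpu = exhaust_seconds(length, CPU_RATE)
    gpu = exhaust_seconds(length, GPU_RATE)
    print(f"{length} characters: CPU {cpu:,.0f} s, GPU {gpu:,.4f} s")

print(f"8 characters on the GPUs: {exhaust_seconds(8, GPU_RATE) / 86400:.1f} days")
```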
The Other Benefit To Brute Forcing
The other benefit to brute forcing is that you invest practically nothing up front. For a rainbow table you need to provide both CPU time to generate it (a lot of it) and storage space (a lot of it). Not to mention things like disk seek time. An average high end hard drive has a seek time of around 4ms. So for a 4 character password, a single seek in the database file already costs about half the time the GPU needs to brute force the entire space. Then, the computer needs to do a full scan of all of the data to search for the hash value. So in the end, for a 4 character password, it's likely cheaper on all counts just to brute force it on a GPU than it is to generate a rainbow table.
A Word On Entropy
All of the numbers that I've used in this article are based on the assumption that password choice is fully random. That's the worst case situation for an attacker. That means that given n bits of data, it would take on average 2^(n-1) tries to have a 50% chance of guessing it. So for a pure random 8 character password (printable characters), you'd need on average about 1.7 days on a GPU to brute force it. Each character in our pure random password has about 6.27 bits of entropy (due to the 77 possible characters, instead of 256). So an 8 character password has about 50 bits of entropy (and this checks out, since 2^50 is about 10^15, which is what we calculated above).
But that's not the way of the world. The vast majority of passwords are user generated. And user generated passwords tend to have significantly less entropy. In fact, according to NIST (Appendix A), an 8 character password with symbols and numbers would only have about 18 bits of entropy. It could be 24 bits if there existed both upper-case and lower-case characters. But 2^24 is only about 16 million. So notice that our 4 character random password (about 35 million possibilities) is actually on average twice as strong as a user-selected 8 character password. In the worst case, it would take the full 2^50 tries to guess a user selected 8 character password, so that's the same. But the 50% chance occurs much sooner, at 2^23 tries, than for the random password at 2^49.
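The entropy figures can be checked with a few lines of Python. This is only a sketch of the arithmetic: the 24 bit figure is the NIST estimate quoted above, taken as given rather than computed, and the variable names are mine.

```python
import math

ALPHABET_SIZE = 77

# Entropy of one uniformly random character, and of 8 of them.
bits_per_char = math.log2(ALPHABET_SIZE)
random_8_bits = 8 * bits_per_char

# NIST's estimate for a user-chosen 8 character password (quoted above).
user_8_bits = 24

random_4_space = ALPHABET_SIZE ** 4     # 4 random characters
user_8_space = 2 ** user_8_bits         # effective space of the user's choice

print(f"bits per random character: {bits_per_char:.2f}")
print(f"8 random characters: {random_8_bits:.1f} bits")
print(f"4 random characters vs user-chosen 8: "
      f"{random_4_space / user_8_space:.1f}x larger search space")
print(f"tries for a 50% chance: 2^{user_8_bits - 1} (user) vs "
      f"about 2^{random_8_bits - 1:.0f} (random)")
```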
Speaking of entropy, we're going to revisit the concept in another post soon (specifically about what a recent web-comic pontificated)...
The overall point is simple. A rainbow table is a useful tool. But it's also an outdated tool that doesn't mean nearly as much as it used to. In the era of the cheap GPU, brute forcing is more than a possibility, it's a fact. Using an algorithm because it's resistant to a rainbow table is not only obsolete, it bypasses the bigger problem. You need to hash your passwords so that they are hard to brute force. If they are hard to brute force, they will be hard to rainbow table as well.
Presently, there are about three algorithms for PHP that will provide adequate defense against brute forcing: BCrypt (called Blowfish in PHP's docs), PBKDF2 and PHPASS's internal function (in order from strongest to weakest). It's worth noting that projects such as Drupal, phpBB and WordPress have all implemented either PHPASS or a derivative thereof. All of the algorithms accept a "work factor" which controls how much CPU time the algorithm takes. By artificially slowing down the hash, brute forcing is made significantly harder (but not impossible).
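To show what a work factor looks like in practice, here is a minimal sketch of PBKDF2 using Python's standard library (`hashlib.pbkdf2_hmac`) rather than PHP. The function names, the 16 byte salt and the iteration count of 100,000 are illustrative assumptions, not recommendations; the point is only that the iteration count is a dial you can turn up as hardware gets faster.

```python
import hashlib
import hmac
import os


def hash_password(password, iterations=100_000):
    """Return (salt, iterations, digest) for storage alongside the user."""
    salt = os.urandom(16)  # unique random salt per password
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    return salt, iterations, digest


def verify_password(password, salt, iterations, expected):
    """Recompute with the stored salt and work factor, compare safely."""
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    return hmac.compare_digest(candidate, expected)


salt, iterations, digest = hash_password("correct horse")
print(verify_password("correct horse", salt, iterations, digest))
print(verify_password("wrong guess", salt, iterations, digest))
```

Every guess an attacker makes now costs the full number of iterations instead of a single hash, which is exactly the slowdown that makes brute forcing expensive.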
Use an algorithm that has protections against brute forcing, as protecting against rainbow tables alone is a lost battle...