This is a private cached copy of the original Thomas Ptacek article on "rainbow tables". We reference it often in ##PHP, and I got tired of keeping up with their changes to their URL scheme. I'm sure you can find the current, original article with some effort at matasano.com, and many thanks to Mr. Ptacek for writing this! -- TML
Enough With The Rainbow Tables: What You Need To Know About Secure Password Schemes
The socialbookmarkosphere is abuzz with talk of “rainbow tables”, what they mean for password security, and why they prove that Microsoft did a shoddy job of securing Windows for Workgroups 15 years ago. This really freaks me out. If the “advanced” pole of your threat model is “rainbow tables”, stop working on your social shopping cart calendar application right now: I can’t trust you with my Reddit karma score, let alone my credit card number.
To begin, password storage 101: servers don’t usually store actual passwords. Instead, they hash the password, store the hash, and discard the password. The hash can verify a password from a login page, but can’t be reversed back to the text of the password. So when you inevitably lose your SQL password table, you haven’t exposed all the passwords; just the crappy ones.
Now let’s re-explain rainbow tables:
take a “dictionary” —- say, of all combinations of alphanumerics less than 15 characters
hash all of them
burn the results onto a DVD.
You now have several hundred billion hash values that you can reverse back to text —- a “rainbow table”. To use,
take your stolen table of hashes
for each hash
find it in the rainbow table.
If it’s there, you cracked it.
Here’s what you need to know about rainbow tables: no modern password scheme is vulnerable to them.
Rainbow tables are easy to beat. For each password, generate a random number (a nonce). Hash the password with the nonce, and store both the hash and the nonce. The server has enough information to verify passwords (the nonce is stored in the clear). But even with a small random value, say, 16 bits, rainbow tables are infeasible: there are now 65,536 “variants” of each hash, and instead of 300 billion rainbow table entries, you need quadrillions. The nonce in this scheme is called a “salt”.
Cool, huh? Yeah, and Unix crypt —- almost the lowest common denominator in security systems —- has had this feature since 1976. If this is news to you, you shouldn’t be designing password systems. Use someone else’s good one.
No, really. Use someone else’s password system. Don’t build your own.
Most of the industry’s worst security problems (like the famously bad LANMAN hash) happened because smart developers approached security code the same way they did the rest of their code. The difference between security code and application code is, when application code fails, you find out right away. When security code fails, you find out 4 years from now, when a DVD with all your customer’s credit card and CVV2 information starts circulating in Estonia.
Here’s a “state of the art” scheme from a recent blog post on rainbow tables and salts:
hash = md5('deliciously-salty-' + password)
There are at least two problems with this code. Yeah, the author doesn’t know what a salt is; “deliciously-salty-” is not a nonce (also, Jeff, your computer really doesn’t care if you seperate the password from the nonce with a dash; it’s a computer, not a 2nd grade teacher).
But there’s a much bigger problem with this code: the letters “md5”.
You’re expecting me to go off on a rant about how there is no redeeming quality to justify using MD5 in 2007. That’s true (MD5 is broken; it’s too slow to use as a general purpose hash; etc). But that’s not the problem.
The problem is that MD5 is fast. So are its modern competitors, like SHA1 and SHA256. Speed is a design goal of a modern secure hash, because hashes are a building block of almost every cryptosystem, and usually get demand-executed on a per-packet or per-message basis.
Speed is exactly what you don’t want in a password hash function.
Modern password schemes are attacked with incremental password crackers.
Incremental crackers don’t precalculate all possible cracked passwords. They consider each password hash individually, and they feed their dictionary through the password hash function the same way your PHP login page would. Rainbow table crackers like Ophcrack use space to attack passwords; incremental crackers like John the Ripper, Crack, and LC5 work with time: statistics and compute.
The password attack game is scored in time taken to crack password X. With rainbow tables, that time depends on how big your table needs to be and how fast you can search it. With incremental crackers, the time depends on how fast you can make the password hash function run.
The better you can optimize your password hash function, the faster your password hash function gets, the weaker your scheme is. MD5 and SHA1, even conventional block ciphers like DES, are designed to be fast. MD5, SHA1, and DES are weak password hashes. On modern CPUs, raw crypto building blocks like DES and MD5 can be bitsliced, vectorized, and parallelized to make password searches lightning fast. Game-over FPGA implementations cost only hundreds of dollars.
Using raw hash functions to authenticate passwords is as naive as using unsalted hash functions. Don’t.
What is the state of the art here?
First, what your operating system already gives you: a password scheme “optimized” to be computationally expensive. The most famous of these is PHK’s FreeBSD MD5 scheme.
The difference between PHK’s scheme and the one you were about to use for your social shopping cart 2.0 application is simple. You were just going to run MD5 on a salt and a password and store the hash. PHK runs MD5 for thousands of iterations. That’s called “stretching”.
PHK’s MD5 scheme is straightforward to code and comes with Linux and BSD operating systems. If you have to choose between the PHP code you have now and PHK’s scheme, you choose PHK’s scheme or you fail your PCI audit. [∗]
The best simple answer is “adaptive hashing”, which Neils Provos and David Mazieres invented for OpenBSD in 1999. Their original scheme is called “bcrypt”, but the idea is more important than the algorithm.
There are three big differences between Provos-Mazieres and PHK’s scheme:
Bcrypt was invented by two smart guys and PHK’s was only
invented by one smart guy. That's literally twice the smart.
Bcrypt uses Blowfish instead of MD5. Blowfish is a block cipher with a notoriously expensive setup time. To optimize Blowfish to run much faster, you’d have to contribute a major advance to cryptography. We security practioners are all “betting people”, and we usually like to place our bets on the side that “demands major advances in cryptography”.
Provos and Mazieres extended Blowfish. They call theirs
"Eksblowfish". Eksblowfish is pessimized: the setup timetakes even longer than Blowfish. How long? Your call. You can make a single password trial take milliseconds, or you can make it take hours.
Why is bcrypt such a huge win? Think of the problem from two perspectives: the server, and the attacker.
First, the server: you get tens of thousands of logins per hour, or tens per second. Compared to the database hits and page refreshes and IO, the password check is negligable. You don’t care if password tests take twice as long, or even ten times as long, because password hashes aren’t in the 80/20 hot spot.
Now the attacker. This is easy. The attacker cares a lot if password tests take twice as long. If one password test takes twice as long, the total password cracking time takes twice as long.
The major advantage of adaptive hashing is that you get to tune it. As computers get faster, the same block of code continues to produce passwords that are hard to crack.
Finally, as your attorney in this matter, I am required to inform you about SRP.
SRP is the Stanford Secure Remote Password protocol. It is a public key cryptosystem designed to securely store and validate passwords without storing them in the clear or transmitting them in the clear.
That design goal is cooler than it sounds, because there’s usually a tradeoff in designing password systems:
You can store a hash of the password. Now if you lose the password database, you haven’t exposed the good passwords. However, you also don’t know the password cleartext, which means that to validate passwords, your customers need to send them to you in the clear.
You can use a challenge-response scheme, where both sides use a math problem to prove to each other that they know the password, but neither side sends the password over the wire. These schemes are great, but they don’t work unless both sides have access to the cleartext password —- in other words, the server has to store them in the clear.
Most practitioners will select the hashing scheme. Both attacks —- stolen databases and phished passwords —- happen all the time. But stolen databases compromise more passwords.
SRP resolves the tradeoff. It’s an extension of Diffie-Hellman. The salient detail for this post: instead of storing a salted password hash, you store a “verifier”, which is a number raised to the (obviously very large) power of the password hash modulo N.
If you understand DH, SRP is just going to make sense to you. If you don’t, the Wikipedia will do a better job explaining it than I will. For the test next Wednesday, you need to know:
SRP is related to Diffie-Hellman.
SRP is a challenge-response protocol that lets a server prove you know your password without your password ever hitting the wire.
SRP doesn’t require you to store plaintext passwords; you store non-reversable cryptographic verifiers.
“Cracking” SRP verifiers quickly would involve a significant advancement to cryptography.
Awesome! Why aren’t you using SRP right now? I’ll give you three reasons:
SRP is patented.
To make it work securely in a browser, you have to feed the login page over SSL; otherwise, like Meebo, you wind up with a scheme that can be beaten by anyone who can phish a web page.
SRP is easy to fuck up, so the first N mainstream Rails or PHP or Pylons SRP implementations are going to be trivially bypassable for at least the first year after they’re deployed.
We learned that if it’s 1975, you can set the ARPANet on fire with rainbow table attacks. If it’s 2007, and rainbow table attacks set you on fire, we learned that you should go back to 1975 and wait 30 years before trying to design a password hashing scheme.
We learned that if we had learned anything from this blog post, we should be consulting our friends and neighbors in the security field for help with our password schemes, because nobody is going to find the game-over bugs in our MD5 schemes until after my Mom’s credit card number is being traded out of a curbside stall in Tallinn, Estonia.
We learned that in a password hashing scheme, speed is the enemy. We learned that MD5 was designed for speed. So, we learned that MD5 is the enemy. Also Jeff Atwood and Richard Skrenta.
Finally, we learned that if we want to store passwords securely we have three reasonable options: PHK’s MD5 scheme, Provos-Maziere’s Bcrypt scheme, and SRP. We learned that the correct choice is Bcrypt.
This blog post was brought to you in part by a grant from the Jon M. Olin Foundation. Major underwriting for Matasano Chargen is provided by Archer Daniel Midland Company. ADM: Feeding A Hungry World. And of course, readers like you!
[∗] Disclaimer: I cannot actually flunk your PCI audit.