Hashing passwords

By | April 27, 2011

If there’s one thing that really annoys me, it’s when I forget my password at some random website, ask for a password reminder and get my old password sent back to me in clear text in an email. This shows so many levels of ignorance in the people who developed the system that I immediately feel like deleting my profile and never coming back.

I started thinking about this again today when I read Sony’s announcement about the PlayStation Network compromise, and got a bit surprised when they also listed passwords as one of the pieces of information that might have been stolen. Surely Sony can’t be that unprofessional, storing passwords in clear text!?

I’m not sure we’ll ever find out, but anyway, if you ever find yourself developing a system that needs to store passwords, please continue reading.

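So what’s the deal with storing passwords as clear text? Well, first of all, it’s highly insecure: anyone who gets hold of your database, whether an attacker or a curious employee, gets every user’s password, and many of those passwords are reused on other sites. There’s also generally no need to store a password as clear text.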

Wondering how you’re supposed to check if the user enters the right password if you can’t store it as clear text? That’s when hashing enters the picture.

Hashing

Simply put, a hash function takes input data and outputs a new value based on this input data. It’s stable, so every time you give it the same input data, you get the same output data. Typically the output data is of fixed length, no matter how much or how little input data you give it. One typical example usage is to test the integrity of a file you downloaded. Say the file was downloaded from a mirror site; you can’t be 100% sure no one has changed this file while it was stored at the mirror site. If you have the hash of the file content, and this hash value was given to you by someone you trust (like the source site), you can compare it with the hash value of your local file. If the two match, the file has not been changed.
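
The download check described above could be sketched like this in Python. It is a minimal illustration and not a complete verification tool: the function names are made up for this example, and SHA-256 is an assumption (use whichever algorithm the source site publishes its hashes with).

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of a file, reading it in chunks."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read until an empty bytes object signals end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches_published_digest(path, published_hex, algorithm="sha256"):
    """Compare the local file's digest with the one from the trusted source."""
    return file_digest(path, algorithm) == published_hex.strip().lower()
```

The published hash must come from a place you trust more than the mirror itself, otherwise whoever changed the file could simply change the hash too.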

Typically we use hashing functions like SHA-1, MD5, or something similar when calculating these hash values.

So how does this relate to passwords? Well, what you should do, instead of storing passwords as clear text, is to hash them using something like SHA-1. Then the next time your user logs in, you rehash the newly entered password and check if it’s equal to the stored hash.
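
As a rough sketch, the store-and-compare step just described might look like this in Python. The function names are hypothetical, and this is deliberately the plain, unsalted version from the paragraph above, so treat it as an illustration of the idea and not as something to deploy.

```python
import hashlib
import hmac


def hash_password(password):
    """Hash the password with SHA-1, as in the text (no salt yet)."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest()


def check_password(entered_password, stored_hash):
    """Rehash what the user typed and compare it with the stored hash."""
    candidate = hash_password(entered_password)
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(candidate, stored_hash)


# At registration you would store only hash_password(password),
# and at login call check_password(entered, stored_hash).
```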

Put some salt on it!

But, please, don’t stop here. Hashing alone isn’t enough! There’s something called rainbow tables. A rainbow table is basically a huge precomputed table that maps hash values back to the clear text that produced them. As computing power and storage space continue to fall in price, it’s become trivial to gain access to these types of utilities for cracking hashed passwords.
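
To see why unsalted hashes are weak, here is a toy version of the idea in Python. It is a plain precomputed lookup table, which is a simplification: real rainbow tables use a more compact chain-based structure and cover vastly more inputs. The tiny password list is made up for the example.

```python
import hashlib

# A made-up, tiny list; real tables cover millions or billions of inputs.
COMMON_PASSWORDS = ["123456", "password", "qwerty", "letmein"]


def sha1_hex(text):
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


# Computed once by the attacker, then reused against every stolen database.
LOOKUP = {sha1_hex(candidate): candidate for candidate in COMMON_PASSWORDS}


def crack(stolen_hash):
    """Return the clear text for a stolen hash, or None if it is not in the table."""
    return LOOKUP.get(stolen_hash)
```

The expensive part, computing the table, is done only once, and after that every lookup is practically free.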

So how do you fix that? It’s all about increasing the cost of cracking your users’ passwords. As we can’t trust users to pick a good password, the first thing you should do is enforce some minimum password standard, say, lower and upper case letters, at least one number, a minimum length, etc.
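
A minimum standard like the one just listed could be checked with something like the following Python sketch. The function name and the minimum length of 10 are assumptions chosen for the example, not recommendations from any standard.

```python
import re

MIN_LENGTH = 10  # hypothetical value; pick what fits your system


def meets_minimum_standard(password, min_length=MIN_LENGTH):
    """Check length plus at least one lower case letter, upper case letter and digit."""
    return (
        len(password) >= min_length
        and re.search(r"[a-z]", password) is not None
        and re.search(r"[A-Z]", password) is not None
        and re.search(r"[0-9]", password) is not None
    )
```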

Secondly, you should salt your input. That means that, in addition to the password itself, you feed something else into the hash function. There are many ways to pick this additional part, but it needs to be something you know the next time you need to hash the same password. It could be a fixed value you store in your code, a configuration file, or a database. The downside of this is that if someone steals your password hashes, they might also get the salt value. Still, this makes it a bit more costly to crack, as they would basically need to create a new rainbow table to look for weak passwords. However, in addition to this salt value, you could further increase the cost of cracking a password by adding something unique to each user’s hashed password. That could be their user ID, login name or email.

Then you would, say, have the following input to your hashing function: userID + email + secret salt value + user password => hashed password, which might look like: 33cde15ec0621256153199ccab601e7d320195bf
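
In Python, that recipe might be sketched as follows. The function name, the placeholder secret and the separator between the parts are assumptions added for this example; the separator is there so that, for instance, user ID 1 with email "2a@x" and user ID 12 with email "a@x" do not produce the same input.

```python
import hashlib

# Hypothetical placeholder; in practice a long random value kept out of the database.
SECRET_SALT = "replace-with-a-long-random-value"


def salted_hash(user_id, email, password, secret_salt=SECRET_SALT):
    """Hash userID + email + secret salt + password, in that order."""
    # A NUL separator (an addition to the recipe) keeps the parts unambiguous.
    material = "\x00".join([str(user_id), email, secret_salt, password])
    return hashlib.sha1(material.encode("utf-8")).hexdigest()
```

With this in place, two users with the same password get different hashes, so a precomputed table built for one user is useless against the other.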

You could take it further, but even this is way better than storing passwords in clear text. And it’s trivial to accomplish in all major programming languages.

PS: And to make it clear, as some of the comments point out: Whenever you deal with cryptography and random numbers it’s usually better to rely on ready-made libraries of good quality.
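
As one possible way of following that advice, here is a sketch that uses only Python’s standard library: `hashlib.pbkdf2_hmac` with a random per-password salt from `os.urandom`. PBKDF2 is a different technique from the hand-rolled SHA-1 recipe above, chosen here because it ships with Python. The function names and the iteration count are assumptions for the example, so check current guidance before settling on a value.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # assumption; tune to your hardware and current guidance


def hash_password(password, iterations=ITERATIONS):
    """Return (salt, iterations, digest); store all three next to the user record."""
    salt = os.urandom(16)  # fresh random salt for every password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, iterations, digest


def verify_password(password, salt, iterations, expected_digest):
    """Recompute the digest with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(candidate, expected_digest)
```

Because the salt is random and stored alongside the hash, nothing breaks when a user changes their email or login name, and the many iterations make each guess expensive for an attacker.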

7 thoughts on “Hashing passwords”

  1. Hans PUFAL

    Be aware that the French government is trying to put in place a law which, among other things, will REQUIRE passwords to be stored in plain-text. A more stupid idea I cannot imagine for a whole host of reasons!

    1. Christian Felde

      Wow, that really is stupid! Do you have any more info on this? Any risk it might become/proposed as EU law?

  2. SeanJA

    I like to look through The On-Line Encyclopedia of Integer Sequences and pick out one to use as the salt, makes it more interesting.

  3. Richard Clark

    Don’t do this yourself. Don’t. Real, actual cryptographers who understand the weaknesses inherent in just running a hash function over some plaintext you made up on the spot have already done all the work for you.

    Libraries such as bcrypt and scrypt have solved this problem completely: they use the algorithm in the correct form to prevent extension attacks, they run the algorithm in multiple rounds to enhance resistance against brute force assaults with a dictionary, and they generate random salts for every single password and store them in a standard format to defeat rainbow tables.

    Doing this shit yourself is almost as bad as just stuffing the password in the database without hashing it at all.

  4. Bram

    Just use an HMAC. It never ceases to amaze me why people find the need to come up with their own bizarre hashing techniques. Every sensible programming language has HMAC libraries, so there’s really no excuse not to use them.

    That being said, if you are going to be doing it yourself, you should have the salt *AFTER* the string, not before. You’re setting yourself up for length extension attacks otherwise.

  5. Carlos

    I believe the salting should be done with variables that are sure to remain unchanged. Using ‘userID + e-mail + …’ would not be such a good idea because if the user updates his e-mail address, the final hashing result will no longer match, unless the hash is updated anytime the e-mail address is updated… There can be another debate about in which situations this would be good and when not.
