Adding a Pinch of Salt

Following the recent LinkedIn breach, the company has stated that their current production database contains salted passwords. Obviously this was not the case at the time of the breach (SHA1, unsalted), so a salt value must have been added to improve security. But how can you add a salt value to a password hash, if you don't know the password?

Firstly, let's consider the difference between a salted and unsalted password hash:

salted_password_hash = hash_function(salt_value + password)
unsalted_password_hash = hash_function(password)

In other words, you cannot normally produce a salted password hash without the password itself, and when migrating from unsalted hashes you do not have it: the database holds only the result of hash_function(password). There is a simple workaround, however. Add an extra hash function into the mix:

hash_function(salt_value + hash_function(password)) = new_salted_password_hash

As you already know the result of hash_function(password) for each user - it's stored in the database of course - you can simply add a salt to the existing password hash, and then hash again.
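
As a rough illustration, the migration can be sketched in Python as below. This is a minimal sketch under assumptions: SHA-256 stands in for hash_function, the users dictionary is a hypothetical stand-in for the database table, and migrate_user is an illustrative name. In a real migration the inner hash is whatever algorithm produced the existing stored hashes.

```python
import hashlib
import secrets


def hash_function(data: str) -> str:
    # Stand-in for the article's hash_function; SHA-256 is an assumption here.
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def migrate_user(unsalted_password_hash: str) -> tuple:
    # Generate a fresh random salt per user, then hash the salt together with
    # the hash that is already stored. The password itself is never needed.
    salt_value = secrets.token_hex(16)
    new_salted_password_hash = hash_function(salt_value + unsalted_password_hash)
    return salt_value, new_salted_password_hash


# Hypothetical "table" of existing unsalted hashes, keyed by username.
users = {"alice": hash_function("correct horse")}

# After migration each row holds (salt_value, new_salted_password_hash).
migrated = {name: migrate_user(stored) for name, stored in users.items()}
```

Once every row has been migrated, the original unsalted column can be dropped, as the new value is all the login check needs.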

The last step in the migration is to re-engineer the login mechanism. Personally, I like to perform the first hashing function client-side, rather than performing both on the server. Whilst protecting the password with a hash between client and server will not help secure your application (anyone intercepting the hash will still be able to authenticate with that hash), it does help ensure that the attacker cannot take the password and use it against other on-line services where the victim has reused their password. For example:

Client-side hashing of password - unsalted password hash sent to server

hash_function(password) = unsalted_password_hash

Server receives unsalted hash from client - adds salt, hashes and compares with database stored hash

hash_function(salt_value + unsalted_password_hash) = salted_hash

It isn't critical that the first hash is performed client-side (you should be protecting the authentication exchange with appropriate encryption anyway), and there are a number of reasons why you might not want to do this: reliance on client-side scripting languages and performance, to name two. If you don't want to hash client-side for these reasons, re-engineer your login function to perform the following check (pseudocode):

if (hash_function(salt_value + hash_function(password)) == salted_password_hash_from_database) then

Alternatively, add further logic to the whole authentication process, whereby the application handles client-side hashing if supported, or falls back to server-side hashing when a clear-text password is received.
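
A login check with this fallback might look something like the following Python sketch. The verify_login function and its client_hashed flag are hypothetical names used for illustration (the article does not say how the server learns which form it received), and SHA-256 again stands in for hash_function.

```python
import hashlib
import hmac


def hash_function(data: str) -> str:
    # Stand-in for the article's hash_function; SHA-256 is an assumption here.
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def verify_login(credential: str, client_hashed: bool,
                 salt_value: str, stored_hash: str) -> bool:
    # If the client already applied the first hash, use it as-is;
    # otherwise a clear-text password was received, so hash it server-side.
    if client_hashed:
        unsalted_password_hash = credential
    else:
        unsalted_password_hash = hash_function(credential)

    # Second step is always performed on the server: add the salt, hash again.
    candidate = hash_function(salt_value + unsalted_password_hash)

    # Constant-time comparison against the value stored in the database.
    return hmac.compare_digest(candidate, stored_hash)
```

Both paths end in the same comparison, so the stored value does not depend on whether a given client supports scripting.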

Adding a Secret

Now this all seems very secure, but if your database is compromised, dictionary attacks are always going to be possible as long as the inputs to the hashing function are known or can be guessed. If your database contains a column called 'salt', chances are an attacker is going to be able to guess the make-up of the hash function. Likewise, simply using the username as a salt isn't going to add much security, as an attacker will try to guess such basic concatenation. However, if you add an extra input to the hashing function that isn't stored in the database, the attacker's job becomes much harder. Consider the following:

hash_function(secret + username + unsalted_password_hash) = salted_password_hash_in_database

The secret can be hardcoded in the application, in a configuration file, or anywhere on the file system. Wherever it is, make sure it is not readable by the database user account. Effectively the attacker must obtain server-side code execution or similar to have all of the constituent parts of the password hash needed to conduct an offline dictionary attack. Just make sure it's suitably long (so it cannot be guessed/brute-forced) - I would recommend generating a random 128-bit secret for each application.
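
To make this concrete, here is a minimal Python sketch of generating such a secret and mixing it into the hash. The names generate_secret and secret_salted_hash are hypothetical, SHA-256 is an assumed stand-in for hash_function, and how the application loads the secret at run time (configuration file, environment, and so on) is deliberately left out.

```python
import hashlib
import secrets


def hash_function(data: str) -> str:
    # Stand-in for the article's hash_function; SHA-256 is an assumption here.
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def generate_secret() -> str:
    # 16 random bytes = 128 bits, returned as 32 hex characters.
    # Generate once per application and store it outside the database.
    return secrets.token_hex(16)


def secret_salted_hash(secret: str, username: str,
                       unsalted_password_hash: str) -> str:
    # hash_function(secret + username + unsalted_password_hash)
    return hash_function(secret + username + unsalted_password_hash)


application_secret = generate_secret()
example_hash = secret_salted_hash(
    application_secret, "alice", hash_function("correct horse")
)
```

Without application_secret, the username and the stored hash from the database are not enough to test password guesses offline.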

The Result

Using all of these options, we have taken our unsalted passwords and added a salt. More importantly, we have significantly lowered the impact of full database compromise (at least with regards to passwords). An attacker holding only the database cannot attempt a rainbow table attack, as the passwords are salted, and cannot perform an offline dictionary attack, as the secret is stored outside of the database. The salted password hash is useless on its own: it can't be used to authenticate against this or any other site.

Note that I haven't mentioned any specific hashing algorithms up to this point, since on a very basic view any hash can be cracked given enough time. However, without the required 'secret' an attacker cannot perform an offline attack. As an on-line brute-force attack against a web application is simply not feasible due to speed, the choice of algorithm comes down to how many CPU cycles you want an attacker (one that has fully compromised both your application and database) to use.

That said, please don't use known weak algorithms (MD5 and SHA1). The bcrypt algorithm is fantastic at slowing down password guessing attacks, and adds a further salt value into the mix (however it is stored with the password, and therefore only the database needs to be compromised if used in isolation). If you don't want to implement your own bcrypt function, or use an external library, make sure you pick at least SHA256 or greater.