smorrey wrote:Each MD5 hash is comprised of exactly 16 bytes stored as a 32 plain text character string.

Therefore the hash can only contain 0-9 and a-f

This is just a choice of representation. There is absolutely nothing fundamental about 0-9, a-f. It is popular to represent hashes as hexadecimal values, but if you wanted you could express them as octal, decimal, binary and anything else that strikes your fancy.

This produces an MD5 hash of

9cc2ae8a1ba7a93da39b46fc1019c481

So your password , no matter how long is actually converted to something 16 bytes long and won't have any "special" characters, the hash itself is not case sensitive either (however the algorithm will produce different output with different letter cases).

*Sigh*... Hexadecimal is just a choice of number system. An MD5 has 16 bytes of *DATA*. By your line of reasoning I could express the hash as binary and then claim a hash doesn't contain anything other than 0 or 1 as if that meant anything. Incidentally, 16 bytes is 128 bits, which is much more space than any of the passwords that have been discussed in this thread.

But don't forget, you don't actually need the original password, since it's now been converted to a fixed length string all you need is something that computes to that same fixed length string. For those who don't know, this is called a hash collision.

This is not what a collision is. Finding a text that corresponds to a given hash is a "pre-image". There are three main properties that a cryptographic hash should have: Pre-image resistance, Second pre-image resistance and Collision resistance. These are different things. In particular, finding collisions is much easier than finding a pre-image.

Since the string is a fixed length of 16 bytes stored as 32 characters, this leaves us with 16^32 possible combinations without a collision.

This is nonsense and it gives the correct value entirely by chance. The only reason 16^32 is correct is because each hexadecimal digit has 16 options (from "0" to "f"), not because the hash has 16 bytes. Just try doing the same calculation for a different hash, like SHA-1, which has 20 bytes and a hexadecimal representation of 40 characters.

What you have done in your calculation is take the length of the hash in one unit (bytes) and raise it to the length of the hash in a different unit (hex).

That may sound like a lot and on a single core computer generating 1 hash per microsecond (pretty slow by today's standards) you are talking 584,554.531 years before you are guaranteed a collision.

Forgetting for a moment that storing all possible MD5 hashes ought to consume a minimum of 16^32 bytes of storage;

How do you figure that MD5 hashes consume 16^32 bytes of storage? Each hash requires 16 bytes to store (32 if you store it as ASCII hex) and there are 2^160 possible MD5 hashes. That works out to 68.7 billion times more storage than you suggested. Anyway, all you have done here is describe a brute force attack. I think everyone here is familiar with brute force attacks. Whether you have access to the hashes or not, a brute force attack is the slowest type of attack possible.

Therefore if you were a person with sinister motives in control of a small botnet with say 16,000 cores and you maxed out all cores you should be able to generate a password that will match any MD5 hash within just a few minutes.

Provided that they have access to the hash so they can attack offsite, and provided that the company used MD5 instead of something like PBKDF2.