When I used the formula given in Wikipedia (http://en.wikipedia.org/wiki/Password_strength), I came up with the following:
Formula: H = log2(N^L) = L * log2(N)
Where H is the entropy in bits, N is the number of possible symbols (for example, the typable characters on a keyboard), and L is the length of the password
Tr0ub4dor&3 (N=94, L=11)
H=72 (not 28)
correcthorsebatterystaple (N=94, L=25)
H=164 (not 44)
correcthorsebatterystaple (using Diceware)
12.9+12.9+12.9+12.9=51.6 (not 44)
Am I missing something? It doesn't seem logical that you can add up the entropy for each individual symbol within its own subset of all symbols (i.e. T is 1 of 26 vs. 1 of 94) to get the total entropy of the password. Doesn't the password have to be considered as an atomic entity for entropy calculation?
You're missing two things. The first is that Randall's calculations here are quite approximate: he's willing to drop decimal places in favour of lower bounds that are easier to calculate (and to draw as little boxes). The second is more important: it concerns what the underlying values in the calculation actually are.
The entropy of a password depends on the system used to choose it. So if you got, say, correcthorsebatterystaple from a program that spits out random strings of lower case letters, then the size of its "keyspace" is 26 and its length is 25, giving an entropy of about 117.5 bits. If the keyspace includes upper case letters as well, that doubles its size to 52, giving an entropy of about 142.5 bits. Including the full set of printable ASCII characters to get the N=94 you're using results in the 164 you had, but if you can produce "correcthorsebatterystaple" from a uniformly random selection of 25 ASCII characters then can I please get you to try for Shakespeare next, because that's just not going to happen. This password wasn't produced that way, though: it was produced from a keyspace of about 2000 dictionary words, from which we've picked 4, so N=2000 and L=4, giving H=44 or so as per the comic.
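To make the dependence on the generating system concrete, here is a minimal Python sketch that applies H = L * log2(N) to the same 25-character string under each assumed keyspace. The function name entropy_bits and the scenario labels are illustrative only, and the formula assumes every symbol is picked uniformly and independently.

```python
import math

def entropy_bits(keyspace_size: int, length: int) -> float:
    """H = log2(N^L) = L * log2(N), assuming uniform, independent choices."""
    return length * math.log2(keyspace_size)

password = "correcthorsebatterystaple"

# (N, L) for each hypothetical way the same string could have been generated.
scenarios = {
    "random lower-case letters": (26, len(password)),
    "random upper/lower letters": (52, len(password)),
    "random printable ASCII": (94, len(password)),
    "4 words from a 2000-word list": (2000, 4),
}

for label, (n, l) in scenarios.items():
    print(f"{label}: N={n}, L={l}, H={entropy_bits(n, l):.1f} bits")
```

The string is identical in all four rows; only the assumed generator changes, and that is what moves the result from roughly 44 bits to roughly 164.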
Now let's look at "Tr0ub4dor&3". The fact is, hackers *know* what systems people use to produce passwords these days. So yes, again, if the means of producing this password was from a random selection of characters, then your entropy value is just fine. But it's not. Here's how people select their supposedly strong passwords, and how many bits of entropy each step gives:
1. Pick a word from the dictionary. For fairness, let's use the same dictionary we got correcthorsebatterystaple from, so it's got 2000 words or about 11 bits of entropy.
2. Decide whether to capitalise the first letter or not. That's a simple "yes/no" choice, so it adds 1 bit of entropy (practically no-one capitalises a random letter, and in some cases you can probably remove this bit because the password is required to have a capital letter in it).
3. Make some l33t-style substitutions. In the comic, Randall seems to suggest that this adds about 3 bits of entropy. The real figure depends on which letters in the word can be substituted and whether there are multiple choices for each substitution, so I'd probably add an extra bit there myself, but that doesn't make a massive difference to the final result.
4. Add a symbol and a digit. Where do you add them? Almost certainly at the end. The digit adds another 3 bits of entropy, and the symbol 4, plus 1 more bit for deciding which one to put first.
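The per-step bits listed above simply add up, because each step is an independent choice. A rough Python tally is sketched below; the step labels are paraphrases, and the 16-bit figure for the base word is my reading of the comic's own boxes (it allows for a larger pool of uncommon words than the 2000-word list used in step 1), so treat it as an assumption.

```python
import math

# Bits contributed by each step, using the 2000-word dictionary from step 1.
steps = [
    ("dictionary word (1 of 2000)", math.log2(2000)),
    ("capitalise first letter or not", 1),
    ("l33t substitutions (comic's estimate)", 3),
    ("digit", 3),
    ("symbol", 4),
    ("digit or symbol first", 1),
]

total_bits = sum(bits for _, bits in steps)

# Assumption: the comic allots about 16 bits to the base word instead of 11.
comic_word_bits = 16
comic_total_bits = comic_word_bits + sum(bits for _, bits in steps[1:])

print(f"with a 2000-word dictionary: {total_bits:.1f} bits")
print(f"with the comic's 16-bit word: {comic_total_bits} bits")
```

Either way the total is far below the 72 bits you get by treating the password as 11 uniformly random characters.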
As I said, hackers these days have a deep understanding of how people choose passwords, and they know that the above system covers probably about 90-95% of all passwords, even those that allegedly meet the requirements for a "strong" password. Again, if they were attacking the password by trying every possible string of 11 characters, then they'd have to try 2^72 different combinations, a pretty crazy amount. But by only trying passwords generated from the above 4 steps, their search drops down to about 2^28 (the comic's figure, which allows roughly 16 bits for the base word; with the 2000-word dictionary used above it would be nearer 2^23), which as Randall points out is crazily easy for modern cracking methods.
As for the Diceware entropy, they're using a bigger dictionary than Randall: the standard Diceware list has 7,776 words (6^5, one word per five dice rolls), which gives about 12.9 bits per word and hence their bigger value for the entropy.
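As a quick check on the Diceware figure, the sketch below compares four words drawn from a 7,776-word list (6^5, the size of the standard Diceware list) with four words drawn from the roughly 2000-word list assumed earlier. The variable names are illustrative.

```python
import math

WORDS_IN_PASSPHRASE = 4

diceware_list_size = 6 ** 5   # five dice rolls select one word: 7776 entries
comic_list_size = 2000        # approximate dictionary size assumed earlier

diceware_bits = WORDS_IN_PASSPHRASE * math.log2(diceware_list_size)
comic_bits = WORDS_IN_PASSPHRASE * math.log2(comic_list_size)

print(f"Diceware:   {diceware_bits:.1f} bits")
print(f"2000 words: {comic_bits:.1f} bits")
```

The Diceware total comes out at about 51.7 bits, matching the 12.9 x 4 sum in the question up to rounding, so the gap to 44 is entirely down to the size of the word list.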