## Complicated questions about cryptanalysis

A place to discuss the science of computers and programs, from algorithms to computability.

Formal proofs preferred.

Moderators: phlip, Moderators General, Prelates

drunken
Posts: 181
Joined: Tue May 27, 2008 5:14 am UTC

### Complicated questions about cryptanalysis

Firstly forgive me if there is already a thread that answers these questions, but it is not easy to find as the discussion of the comic "password strength" spams up the search results for any related queries. Secondly I hope this is the right forum for this, I also considered the coding and mathematics forums. I assume the moderators will move it if it needs to be moved.

I have been thinking recently about password strength and website security, mostly as I am not administering my own website, and while I am not programming the security algorithms myself, I still try to at least know the basic principles of things other people do for me so I can check if it is done right.

I have searched a lot on the internet for answers but I seem to be faced with a choice between oversimplified and overcomplicated write ups. I can either read 'how not to be an idiot when creating a password' or scientific articles on the detailed math involved. What I want is to clarify some assumptions I have made about how passwords are hacked, many of which don't fit with what I am being told in the oversimplified articles.

I am ignoring social engineering and phishing attacks, and that leaves to my knowledge only various forms of brute force, either simple brute force, or dictionary attacks which are simply a brute force on a more likely subset of potential password space. If there is another method this might explain my confusion.

Assumptions that may be false:
1) The password '1' has the same length cryptographic hash as the password '11111111111111111111111' and also the same as the password 'dFSDsa&\$124asGIF78WE#E\$hbf'
2) Someone attempting to crack your password generally has access to your hash, which is hopefully salted
3) There is no way to incrementally attack a password, ie. if my password is 'I am god' there is no way to ascertain that the first letter is a capital I, unlike in hollywood where all password/pin cracking is done character by character.

These assumptions lead me to a number of corollaries which may also be false. These directly dispute the information provided in the passwords for idiots guides.
For example 'use both capital and lower case letters and numbers and punctuation'. For a four character password, the only reason the password '9843' is weaker than '9tJ\$' is that numbers are likely to be bruteforced first under the assumptions that the hackers will start with every character as 0x00 (or the first printable character 0x20) and count up, leading them to hit any combination of only numbers before any combination of only letters. But if we assume that then the strongest password is '~~~~'. On the other hand hackers might start at the top and work down, in which case the strongest password is ' '.* Given that we don't know if they will start at the top or the bottom a good compromise would be '@@@@' being in the middle of the table. It seems it makes no difference at all if you use a combination of upper and lower case, a computer sees all characters as hex values.

I like number only passwords, I generate them randomly and then memorise the sequence. I don't know why I find it easier to memorise random sequences of numbers than letters but there it is. People seem to think 'Oh me yarm** number only passwords, are you mad?!'. Why is this wrong? If I have a number only password and the hacker knows this, then the brute force space is greatly reduced. But how could they possibly know that? It is not as though the hash has a header that explains what character ranges were used. If we assume that the password space is traversed in a random order then you are just as likely to get cracked per attempt with the password '1' as with the password 'dFSDsa&\$124asGIF78WE#E\$hbf'. I know the space is not traversed in random order (except in the case of bitcoin mining) so some passwords are inherently weaker to brute forcing than others, eg shorter passwords. It stands to reason that you are not going to start brute forcing at the upper limit of password length because there is none, well on some sites there is, but in cryptography in general there isn't. So is my 8 digit password really that much weaker than someone else's 8 letter password?

Of course I am assuming random sequences as passwords here. The easiest explanation for the articles I am reading is that they assume the user will use dictionary words, and therefore as they are vulnerable to a dictionary attack, adding numbers and punctuation make them significantly stronger.

*the forum seems to automatically remove multiple spaces, there are meant to be four there, ie. [0x20][0x20][0x20][0x20]
**ok now the forum is censoring and substituting o-m-g but not the word fuck. There is something weird about that.
***This post is my own opinion and no claim is being made that it is in any way scientific nor intended to be construed as such by any reader***

EvanED
Posts: 4331
Joined: Mon Aug 07, 2006 6:28 am UTC
Contact:

### Re: Complicated questions about cryptanalysis

drunken wrote:Assumptions that may be false:
1) The password '1' has the same length cryptographic hash as the password '11111111111111111111111' and also the same as the password 'dFSDsa&\$124asGIF78WE#E\$hbf'
2) Someone attempting to crack your password generally has access to your hash, which is hopefully salted
3) There is no way to incrementally attack a password, ie. if my password is 'I am god' there is no way to ascertain that the first letter is a capital I, unlike in hollywood where all password/pin cracking is done character by character.

Those are all correct

For a four character password, the only reason the password '9843' is weaker than '9tJ\$' is that numbers are likely to be bruteforced first under the assumptions that the hackers will start with every character as 0x00 (or the first printable character 0x20) and count up, leading them to hit any combination of only numbers before any combination of only letters. But if we assume that then the strongest password is '~~~~'. On the other hand hackers might start at the top and work down, in which case the strongest password is ' '.* Given that we don't know if they will start at the top or the bottom a good compromise would be '@@@@' being in the middle of the table. It seems it makes no difference at all if you use a combination of upper and lower case, a computer sees all characters as hex values.

Ah, this is not, at least sort of.

There are two reasons why 9843 is weaker.

First, as you say: "If I have a number only password and the hacker knows this, then the brute force space is greatly reduced."

There are 1000 number-only, four-digit passwords. Even if you say "okay, I'll use 15 digits", there are 10^15 of those: a GPU can find the MD5 of every one of those in a time reasonably measured in hours. (4x HD 5970 hits 33 billion MD5 passwords/second Edit: okay, that site is a year and a half old. There's a comment that says another project has measured around 50 billion passwords/second, and even that is from about 10 months ago). Things are better with SHA1, but not significantly: you really need hash techniques that are designed to be GPU-resistant before things look much less scary.

But let's say I tell you I'm using all lowercase letters. Instead of 1000 four-character passwords, there are now over 400,000. At 11 characters, there are now more passwords to check than your 15 digit password. (In technical terms: 11 randomly generated lowercase letters has more entropy than 15 randomly generated digits.) If I tell you I'm using any of upper and lowercase letters or numbers, now I just need 9 characters. Toss in the possibility of 14 symbols (there are 32 easily-typeable symbols on a standard US keyboard) and you need only 8 characters to match the entropy of 15 digits. Which is easier to remember now? And that's not even all that strong of a password!

But then you say "But how could they possibly know that?" and the answer is "he won't, but he'll start looking at weak passwords anyway", which is the second point.

Most people pick weak passwords (in the sense that they are mostly lowercase letters), and thus the attacker will search those passwords first. If I were trying to crack someone's password using brute force, I'd try things in about this order:

1. One or two English words, lowercase
2. #1 with one number somewhere
3. #1 with with one number or one uppercase letter somewhere
4. #2 with common replacements like ! for l, or z for s

If I had a big database of passwords, I'd probably even move onto higher lengths before moving onto other categories.

(I'm not sure where I'd put numbers only. It definitely should be there somewhere, but I'm not gonna do the math to find out where.) Basically I'd invent some categories of passwords like the above which I think are likely going to be things people think of, compute the entropy of each category at each length (e.g. "how many 7-character passwords are in category #2"), and pick whatever category comes first. If that means that "10-character English words" comes before "6-character completely random from 94 characters" (it may or may not), then I'm going to search 10 character English words first.

Said another way: weak passwords are weak precisely because people have trouble remembering randomness. Or said another way: hackers have passwords too, and know the kinds of things people do when coming up with passwords, then use that to their advantage.

Yakk
Poster with most posts but no title.
Posts: 11129
Joined: Sat Jan 27, 2007 7:27 pm UTC
Location: E pur si muove

### Re: Complicated questions about cryptanalysis

The real killer about "searching weak passwords first" is that there are so few weak passwords, that searching all of them ends up not significantly delaying searching through the space of all passwords.

This is because the general method of encoding how "powerful" a password scheme is is by measuring it in bits (sometimes, bits/length). To figure out how many actual passwords are there, you raise 2 to the power of the number of bits.

A pure-digit password has a strength of 3.32 bits/character. A alpha-numeric password has a strength of ~5.17 bits/character. Upper case, lower case, digits and the 10 symbols you get from shift number is 6.17 bits/character (the addition of shift adds 1 bit/character).

This presumes you have a truly random string.

At 10 characters long, this comes to 33 bits for pure-numbers, 51 bits for alpha-numeric, and 61 bits for alpha-numeric with shift. Searching every digit only password of length 10 takes 1/262,144 the time it takes to search alpha-numeric, and 1/2,684,35,456 the time it takes to search alpha-numeric with shift.

The same holds for searching every English word with common substitutions (0 for o, 1 for i, etc) -- the amount of time you'd spend searching those is so tiny, you might as well check them all first before you go on to truly random strings. And if you think of some clever pattern, well, if your only protection against this kind of search is that "nobody else will think of it", that isn't all that strong: they can afford to think up clever patterns that would be easy to hack and search them too, before they bother searching truly strong passwords.

---

As for the bit about "checking passwords 1 character at a time", I have a story to tell you.

Once upon a time, there was a computer operating system. And in this computer operating system, you could have your own routines for catching what is known as "page faults". A page fault is when memory has been "swapped to disk" and is accessed, and the page fault handler loads the page from disk and provides it to the code that asked for the contents of the memory. Sort of as if you sent a mail message to a street address, and as the postman approached he noticed that the house had been torn down, so they go out and get a builder to build the house before the mail is delivered.

In any case, the user could install the 'page fault handler'.

When you asked to get access to some other user's stuff, you'd have to provide a password. What you'd do is provide a pointer to the first character of the password, and its lenght. The operating system would then read the password, and check if it was acceptable.

Here is the problem. When it found that the password failed to match, it would immediately stop and tell you that the password was wrong.

So what you did was arrange it so that the password was written to memory over a page boundary, so that the first character was "before" the boundary, and the rest was after. Then you'd go off and use a bunch of memory, forcing the tail end of the password to be "swapped out" of memory. Next, you'd provide the password to the OS.

It would proceed to read the first character. If it found it didn't match, it would go and tell you. If it did match, it would read the next character. And your page fault callback would be called. And you'd know that you got the first character right.

So the next time, you'd put the page boundary after the second character, and always write the correct first character.

The result? You could crack passwords one character at a time. And given the speed at which computers of that era ran, it would possibly be slow enough that you could visually see the password characters appearing (remember, you'd have to thrash memory to ensure your password page was swapped out after each attempt...)

I heard this story back in undergrad. I haven't confirmed it.

Sometimes, security systems have flaws. And the people who design the system don't know about the flaws.

In a similar sense, attacks on the German Enigma machine where often done character at a time, where you'd generate guesses about what the various rotors where doing, and you'd progress for a bit until you couldn't figure out what the next router did... So you'd see messages getting decoded starting at the front, and working their way along. (This is based off a vague memory).
One of the painful things about our time is that those who feel certainty are stupid, and those with any imagination and understanding are filled with doubt and indecision - BR

Last edited by JHVH on Fri Oct 23, 4004 BCE 6:17 pm, edited 6 times in total.

jareds
Posts: 436
Joined: Wed Jan 03, 2007 3:56 pm UTC

### Re: Complicated questions about cryptanalysis

That was presumably Tenex. (I'm not that old, I just remember those sorts of stories.)

What is noteworthy is that the part about it being slow enough to watch is very likely true, at it's reported by both Yakk and alt.folklore.computers, albeit for different reasons.

drunken
Posts: 181
Joined: Tue May 27, 2008 5:14 am UTC

### Re: Complicated questions about cryptanalysis

Yakk wrote:This is because the general method of encoding how "powerful" a password scheme is is by measuring it in bits (sometimes, bits/length). To figure out how many actual passwords are there, you raise 2 to the power of the number of bits.

A pure-digit password has a strength of 3.32 bits/character. A alpha-numeric password has a strength of ~5.17 bits/character. Upper case, lower case, digits and the 10 symbols you get from shift number is 6.17 bits/character (the addition of shift adds 1 bit/character).

This was actually the main reason for my confusion. A bit strength analysis seems to often be used as a measure for password strength, but it is actually incorrect in that it is a measure of the password scheme, as you put it, not the individual passwords. The xkcd comic about passwords is actually guilty of this too to a degree. I think I mostly have a handle on this now, thanks to you and the other posters for clarifying. It seems the bit strength of a password is the size of a subset of all passwords in the total password space that have equal or less 'predictability'. This term predictability is probably not the best, and it is difficult to quantify and somewhat subjective, but basically corresponds to the likelihood of someone guessing the password. It can be quantified comparatively, for example 'god' is easier to guess than 'dg34'but it is subjective and varies between cultures and languages. The person trying to crack the password would be the subjective yardstick, so whatever they are more likely to guess. I guess is is mathematically fallacious to assign an objective bit strength to a password. Is this mostly right?

Yakk wrote:As for the bit about "checking passwords 1 character at a time", I have a story to tell you.

You mentioned the enigma machine which reminded me of a funny story too. Apparently mechanical encryption machines were used during the cold war too, and at one point the russian embassy in london was using an encryption machine with different sized dials to set the cipher. Because each dial was a different size it made a slightly different sound when it was turned, and each one made a click as the mechanism adjusted. So MI5 set up a hidden pinhole microphone in the wall to record the machine's use. From the recording, given a single known cipher and the date it was set, they could extrapolate the current cipher at any time. Sneaky.
***This post is my own opinion and no claim is being made that it is in any way scientific nor intended to be construed as such by any reader***

EvanED
Posts: 4331
Joined: Mon Aug 07, 2006 6:28 am UTC
Contact:

### Re: Complicated questions about cryptanalysis

drunken wrote:This term predictability is probably not the best, and it is difficult to quantify and somewhat subjective, but basically corresponds to the likelihood of someone guessing the password. It can be quantified comparatively, for example 'god' is easier to guess than 'dg34'but it is subjective and varies between cultures and languages. The person trying to crack the password would be the subjective yardstick, so whatever they are more likely to guess. I guess is is mathematically fallacious to assign an objective bit strength to a password. Is this mostly right?

I would say it's mostly right; the first three sentences are pretty much spot-on. However, looking at the mathematics of entropy is still a useful measure, you just have to be careful what you're looking at. E.g. the XKCD comic actually does a pretty decent job I'd say (except for something which I'll point out in a moment), in that it doesn't just say "Tr0ub4dor&3 has 28 bits of entropy" but shows you where it comes from, e.g. you lose 4 bits if you drop the punctuation symbol. The divisions he picks are very natural -- it's almost another way of giving my list of "this is what I'd look at". And while an attacker could pick different categories and you'd get a different strength measure, it seems unlikely that they'd differ all that much as long as both ways of looking are "reasonable" ways of attacking. (Unlike, say, just saying "I'm going to look at all passwords of length n regardless of their character sets, so 1234 is as strong as ob%2.")

The one objection I have is the comment that online attacks are what you should be worried about: I mostly disagree here, and think that offline hash attacks are also important to consider. Perhaps even more important, considering that even a competently run website could be hacked and hashes retrieved (though they'll be salted so no rainbow table) but a competently run website can't be brute forced because they'll block or at least slow attempts after a few attempts. (Note that this is my own definition of "competent." )

Yakk
Poster with most posts but no title.
Posts: 11129
Joined: Sat Jan 27, 2007 7:27 pm UTC
Location: E pur si muove

### Re: Complicated questions about cryptanalysis

drunken wrote:This was actually the main reason for my confusion. A bit strength analysis seems to often be used as a measure for password strength, but it is actually incorrect in that it is a measure of the password scheme, as you put it, not the individual passwords.
Well, sort of. The information content of any given signal is a function of how you encode it. There is no encoding that makes all messages take up less information than just a naive raw copy.

In practice, data isn't random. And given that it isn't random, there are ways to generate it that require fewer bits than the actual message.

In some senses, all passwords are weak. If you knew the password, you can just type it in. There is a trivial password guessing algorithm that will get it on the first try. In another sense, we can remove the knowledge of the specific password, and provide the hacker with the same knowledge that the person creating the password had a moment before the password was created. And then we can measure the password strength based on how hard it would be for the hacker to defeat the password.
The xkcd comic about passwords is actually guilty of this too to a degree.
There really isn't a concept of "password strength" that isn't a function of how the password was generated, or how people would attempt to crack it.
I think I mostly have a handle on this now, thanks to you and the other posters for clarifying. It seems the bit strength of a password is the size of a subset of all passwords in the total password space that have equal or less 'predictability'.
Sure. Except the goal is to produce a lower bound on how strong the password is, not an accurate estimate. We don't mind if the password is "too strong" as much as we want to make sure it is "secure enough".

So we provide the attacker with lots of resources that they might not be able to match, such as the algorithm we used to generate the password. Anything we cannot quantify (your creative ability to come up with puns, for example) we presume the attacker could defeat for free, or nearly free. Experience backs this up, because people's intuitive attempts to munge passwords tended to be much weaker than they think they are.
This term predictability is probably not the best, and it is difficult to quantify and somewhat subjective, but basically corresponds to the likelihood of someone guessing the password.

So a formal way to approach it is that we have a (category of) reversible compression scheme on passwords, and the strength of the password is the number of bits that the compression scheme would require to generate this password. This is why you measure password strength in terms of bits of entropy, which is the standard unit of information.
I guess is is mathematically fallacious to assign an objective bit strength to a password.

No. All passwords have a password strength of 1 bit under the appropriate compression scheme. (1 represents "use the password X", 0 represents "use the remaining bits in this value as the password). That is an objective bit strength for any password X.

So yes, in order to be really specific, you'd want to say "the entropic content of passwords generated by this password generating scheme", where the attacker is reduced to not knowing the random variables in the scheme. This allows a smart attacker to say "but wait, while the scheme flipped 20 coins, there are only 2^10 possible results, so the entropic strength is merely 10 not 20". Similarly if there is a bias where some passwords are far more likely to be generated than others, the attacker can encode those using fewer bits and use more bits for rarer passwords, which would reduce the (average) bit strength of the passwords produced.

...

Offline hash attempts run into the same barriers to guess passwords, assuming a competent hashing function.
One of the painful things about our time is that those who feel certainty are stupid, and those with any imagination and understanding are filled with doubt and indecision - BR

Last edited by JHVH on Fri Oct 23, 4004 BCE 6:17 pm, edited 6 times in total.

EvanED
Posts: 4331
Joined: Mon Aug 07, 2006 6:28 am UTC
Contact:

### Re: Complicated questions about cryptanalysis

Yakk wrote:Offline hash attempts run into the same barriers to guess passwords, assuming a competent hashing function.

Same barriers as... what? As online attacks?

No they don't. Websites should* deliberately slow multiple unsuccessful attempts to the same account (in addition to doing other crap like throwing up CAPTCHAs). Say a maximum of 1 per second, and 5 per minute. Attempts beyond those are rejected summarily without even checking. What actually-reasonable hashing scheme takes an average of 12 seconds to compute a hash? How would Google afford the processing power to do that for everyone logging in? Not only that, but if (time_now - time_then < 12 seconds) printf("go away"); doesn't get any faster with better computers.

* (my assertion is "must if you want to be called competent" -- isn't it nice being able to define your own terms? I'm not actually sure how often this sort of thing is done in practice to be honest, but it really ought to be)

Shivahn
Posts: 2200
Joined: Tue Jan 06, 2009 6:17 am UTC

### Re: Complicated questions about cryptanalysis

There was another good hack (or, bad password scheme) that allowed hacking of a password in parts. The full article is here. The summary:

To make a password, ask the user for a string of characters less than fourteen in length. Then, convert them to upper case, pad the end with 0x00s, and split it into two, encrypting each part separately.

So, someone gains access to the system, and finds that the second half of the password is the hash of ZEE0000. The first half is pretty obviously CHIMPAN. Or they find MER0000. It's going to be PROGRAMMER or ASTRONOMER.

So, there are other schemes where a partial match is identifiable.

Yakk
Poster with most posts but no title.
Posts: 11129
Joined: Sat Jan 27, 2007 7:27 pm UTC
Location: E pur si muove

### Re: Complicated questions about cryptanalysis

Lockouts are used, but are relatively ineffective. Attacks are rarely against one person: instead you have a bot net trying passwords for a bajillion different users. No one IP attacks multiple times, no one account attacks multiple times. And if you lock out based on the existence of these attacks, then nobody can use your service.

These attacks scale with the bandwidth and number of bots.

And one bot can attempt to attack a myriad of different servers at once. Any delay on log in just reduces the resources, and bothers legitimate users as much as attack bots.

Similarly, attacks on hashing functions scale with the number of bots, and their speed.

There are order of magnitude speedups for cracking a password file. But the fundamental measure of security -- number of entropic bits in your password -- is still the only serious barrier. The only fundamental difference is that targeted attacks are easier with a downloaded file of hashed passwords.
One of the painful things about our time is that those who feel certainty are stupid, and those with any imagination and understanding are filled with doubt and indecision - BR

Last edited by JHVH on Fri Oct 23, 4004 BCE 6:17 pm, edited 6 times in total.

alfie275
Posts: 1
Joined: Sat Aug 27, 2011 12:00 am UTC

### Re: Complicated questions about cryptanalysis

How secure is my password generation method? Method is:

1. Come up with a password, usually based on what site I'm on.
2. Imagine each key of the keyboard is a pixel.
3. Type out password by pressing the key "pixels" of the letters.

You only need to remember the scan method/what each letter looks like, and you can easily make your passwords 5 or 6 times as long.

Xanthir
My HERO!!!
Posts: 5423
Joined: Tue Feb 20, 2007 12:49 am UTC
Contact:

### Re: Complicated questions about cryptanalysis

Assuming you press all the pixels of each letter, so that each "letter" in your original phrase translates to several characters in the actual password, it's plenty secure as long as no one knows your sequence and actually looks for that. The method is potentially, though that's unlikely, and without it being guess the entropy is very high.
(defun fibs (n &optional (a 1) (b 1)) (take n (unfold '+ a b)))

walkerm930
Posts: 69
Joined: Wed Apr 07, 2010 3:53 am UTC
Location: Canada: Ontario: Toronto

### Re: Complicated questions about cryptanalysis

My password generation method lately is to think of two relatively long words one around 7-9 and the other 10+. How secure is this? If the password contains an English word, but it's 14 letters long, is that considered insecure?
In the gospel according to trig there are 3 primary rules: sin θ = x/h , cos θ = y/h and tan θ = x/y. These rules are not open to interpretation and are to be treated as law.

Carnildo
Posts: 2023
Joined: Fri Jul 18, 2008 8:43 am UTC

### Re: Complicated questions about cryptanalysis

walkerm930 wrote:My password generation method lately is to think of two relatively long words one around 7-9 and the other 10+. How secure is this? If the password contains an English word, but it's 14 letters long, is that considered insecure?

Two random English words? About 33 bits of entropy, or a five-typeable-character password.

Xanthir
My HERO!!!
Posts: 5423
Joined: Tue Feb 20, 2007 12:49 am UTC
Contact:

### Re: Complicated questions about cryptanalysis

walkerm930 wrote:My password generation method lately is to think of two relatively long words one around 7-9 and the other 10+. How secure is this? If the password contains an English word, but it's 14 letters long, is that considered insecure?

Based on this comic, that's about 22 bits of entropy, which isn't nearly enough. Use more words.
(defun fibs (n &optional (a 1) (b 1)) (take n (unfold '+ a b)))

walkerm930
Posts: 69
Joined: Wed Apr 07, 2010 3:53 am UTC
Location: Canada: Ontario: Toronto

### Re: Complicated questions about cryptanalysis

Yeah, after reading this thread, I was kind of thinking that my passwords weren't secure enough. And now I'm wondering why I thought they were in the first place, especially given that recent comic. Come to think of it that is probably where I got the idea in the first place, without realizing it! Darn correct horse battery staple, still remember that...
In the gospel according to trig there are 3 primary rules: sin θ = x/h , cos θ = y/h and tan θ = x/y. These rules are not open to interpretation and are to be treated as law.

Carnildo
Posts: 2023
Joined: Fri Jul 18, 2008 8:43 am UTC

### Re: Complicated questions about cryptanalysis

Xanthir wrote:
walkerm930 wrote:My password generation method lately is to think of two relatively long words one around 7-9 and the other 10+. How secure is this? If the password contains an English word, but it's 14 letters long, is that considered insecure?

Based on this comic, that's about 22 bits of entropy, which isn't nearly enough. Use more words.

Depends on how the words are selected. 22 bits is correct if you're selecting from the 4000 or so most common words; my 33 bits is correct if you're selecting random dictionary words (of which there are about 100,000).