## Seven in random numbers vs. actual data

alterant
Posts: 32
Joined: Fri Feb 15, 2008 1:34 am UTC

### Seven in random numbers vs. actual data

So I was watching "numb3rs" a while ago (yes, yes, I know! not a good source for math wisdom) and there was something mentioned about how a string of random numbers can be distinguished from actual data by the fact that actual data has fewer sevens (or is it more?). Is this complete & total garbage? Merely a mangled version of a real phenomenon? Or is it true? Google & Wikipedia, alas, seem to have nothing to offer up.
Cheers,
-Ian

masher
Posts: 821
Joined: Tue Oct 23, 2007 11:07 pm UTC
Location: Melbourne, Australia

### Re: Seven in random numbers vs. actual data

I seem to recall something that was (is) used in forensic accounting: the leading digits of numbers in real data follow a reliable distribution.

I think one person was caught out in a fraud when someone tried to test it on a set of accounts that turned out to be fraudulent. The numbers were waaay skewed, and, therefore, showed evidence of fraud.

.

Found it: Benford's Law
Last edited by masher on Tue Jul 01, 2008 4:48 am UTC, edited 1 time in total.

jmorgan3
Posts: 710
Joined: Sat Jan 26, 2008 12:22 am UTC

### Re: Seven in random numbers vs. actual data

The closest thing I can think of is Benford's Law, which states that the first digit of a non-random number is most likely to be 1, with the other digits' likelihood decreasing in order.
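For reference, Benford's Law is usually stated as P(d) = log10(1 + 1/d) for a leading digit d from 1 to 9. A minimal Python sketch of that formula (the function name `benford_probability` is just made up for illustration):

```python
import math

def benford_probability(digit):
    """Probability that the leading digit equals `digit` under Benford's Law."""
    if not 1 <= digit <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / digit)

# Print the expected share of each leading digit, largest (1) to smallest (9).
for d in range(1, 10):
    print(d, round(benford_probability(d), 3))
```

The probabilities fall steadily from 1 down to 9, which is the "decreasing in order" part.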

Edit: Ninja'd?
This signature is Y2K compliant.
Last updated 6/29/108

masher
Posts: 821
Joined: Tue Oct 23, 2007 11:07 pm UTC
Location: Melbourne, Australia

### Re: Seven in random numbers vs. actual data

Also, which episode on numb3rs was it? "The Running Man" is mentioned in the Wikipedia article on Benford's Law...

jmorgan3 wrote: Edit: Ninja'd?

I must have been editing whilst you were posting...

phlip
Restorer of Worlds
Posts: 7573
Joined: Sat Sep 23, 2006 3:56 am UTC
Location: Australia

### Re: Seven in random numbers vs. actual data

It's not random data in general that this will find... rather, it's human-generated random data.

If you ask someone to pick a random number from 1 to 10, you will get 7s and 3s a lot more often than any other number... because those numbers seem "more random". Basically, when you see a round multiple of 10, you think the number's been rounded off, while a number that's not a multiple of 10 looks more precise... so even though precise numbers in real data will still end with a 0 from time to time, people tend not to pick them when making up random numbers (and when they do, it's far less often than they should). This also happens (to a slightly lesser extent) with multiples of 5, and with even numbers. The random-looking-ness of primes is amped way up, and numbers ending in 7 (and, coming in second, 3) are often overrepresented. The most random-seeming one-digit number is 7; the most random-seeming two-digit number is 37.

Basically, you know that if the data is all nice neat round numbers, then the fraud will be obvious... and then overcorrect.
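One rough way to check for this kind of overcorrection is a chi-square statistic on the last digits, assuming (and it is only an assumption) that last digits in genuine data are roughly uniform. A minimal Python sketch for whole-number values; the function name `last_digit_chi_square` is hypothetical:

```python
from collections import Counter

def last_digit_chi_square(values):
    """Chi-square statistic of last-digit counts against a uniform expectation.

    Assumes `values` are integers and that honest data has uniform last digits.
    """
    digits = [abs(v) % 10 for v in values]
    counts = Counter(digits)
    expected = len(digits) / 10  # each digit 0-9 expected equally often
    return sum((counts.get(d, 0) - expected) ** 2 / expected for d in range(10))
```

With 9 degrees of freedom, a statistic above roughly 16.9 would be unusual at the 5% level, so a pile-up of 7s and 3s (or a shortage of 0s) shows up as a large value.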

As for the Numb3rs ep... I think I know the one... it's not the one that mentioned Benford's Law (which was only mentioned as an analogy, not used directly), but a different episode... one of the ones with that Fantasy Baseball guy in it. Can't remember the title, or what it was about, though...

`enum ಠ_ಠ {°□°╰=1, °Д°╰, ಠ益ಠ╰};void ┻━┻︵​╰(ಠ_ಠ ⚠) {exit((int)⚠);}`
[he/him/his]

Tac-Tics
Posts: 536
Joined: Thu Sep 13, 2007 7:58 pm UTC

### Re: Seven in random numbers vs. actual data

alterant wrote: So I was watching "numb3rs" a while ago (yes, yes, I know! not a good source for math wisdom) and there was something mentioned about how a string of random numbers can be distinguished from actual data by the fact that actual data has fewer sevens (or is it more?). Is this complete & total garbage? Merely a mangled version of a real phenomenon? Or is it true? Google & Wikipedia, alas, seem to have nothing to offer up.
Cheers,
-Ian

Most likely it's nonsense created for the show. Or worse, nonsense created by a psychologist.

However, it is still true that human-created data is much more regular than random noise. For example, consider a human banging on a computer keyboard. Under most circumstances, you will find that the case of the characters (A vs a) fluctuates much more slowly than it would if you randomly chose a sequence of ASCII characters.

Posts: 809
Joined: Sat Oct 27, 2007 5:51 pm UTC

### Re: Seven in random numbers vs. actual data

Maybe it's a nod to that pi thing that happened a hundred or two hundred years ago? I don't remember the specific names, but for a long time the longest known expansion of pi was about 700 places, which were computed by hand. However, a mistake was made around the 500th digit, which caused all the rest of them to be wrong. This was discovered when someone noticed that, in the last 200 digits, there was a curious lack (or was it abundance?) of 7's. The digits of pi tend to pass all the "randomness" tests we can throw at them, so this weirdness pointed at a computational mistake.
Let's have a fervent argument, mostly over semantics, where we all claim the burden of proof is on the other side!

phlip
Restorer of Worlds
Posts: 7573
Joined: Sat Sep 23, 2006 3:56 am UTC
Location: Australia

### Re: Seven in random numbers vs. actual data

Some website wrote:If you ask someone to think of any number from 1 to 50, one people out of three answers 37. Strangely, if you ask someone else to think of any number from 1 to 5, the probability that he/she will answer 3 is one out of three; and if you ask to think of any number from 5 to 10 instead, the probability that he/she will answer 7 is also one out of three! -- G. Sarcone

37 is a 'psychologically random number' - that is, a number which is chosen more often when someone is asked to pick a number (usually this kind of number is odd and doesn't end in 5, because there is a natural psychological bias to think that even numbers and numbers that end in 5 are 'less random'). 37 appears disproportionately in television and movies.

There's also some data here (but it's a small sample, from an online poll... take with a grain of salt). There, the results from the poll are compared to the results of a computer PRNG with the same number of samples... a meaningless comparison, at least directly, but it gives an indication of what the distribution from a fair RNG would look like... and the spikes at 7 and 17 are much higher than the noise in the fair RNG. The author claims in the comments that he did that instead of error bars to make it easier to understand for the layperson.
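For what it's worth, the error bars for a fair RNG can be worked out directly: each number's count is binomial, so its standard deviation is sqrt(n * p * (1 - p)). A minimal Python sketch; the function name and the sample sizes in the usage lines are made up for illustration and are not the poll's actual figures:

```python
import math

def fair_bin_noise(n_samples, n_choices):
    """Expected count and standard deviation for one number under a fair RNG."""
    p = 1.0 / n_choices
    expected = n_samples * p
    std_dev = math.sqrt(n_samples * p * (1.0 - p))  # binomial standard deviation
    return expected, std_dev

# Hypothetical example: 2000 responses spread over the numbers 1 to 20.
expected, std_dev = fair_bin_noise(2000, 20)
print(expected, std_dev)
```

A spike that sits many standard deviations above the expected count is very unlikely to be plain noise.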

imatrendytotebag
Posts: 152
Joined: Thu Nov 29, 2007 1:16 am UTC

### Re: Seven in random numbers vs. actual data

It's also an interesting (related) point that if you ask somebody to flip 100 coins and record the results, you can usually tell whether that person actually flipped the coins or not. That is because a made-up list won't have long strings of heads or tails (since they don't seem 'random' enough).

However, it's pretty easy to show that in 100 flips a string of 3 heads has a better than 99% chance of occurring, and a string of 4 heads a better than 90% chance. Even 5 heads in a row has a pretty significant chance of happening.
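Those figures can be checked with a small dynamic program that tracks the current streak of heads. A minimal Python sketch, assuming a fair coin; the function name `prob_run_of_heads` is just for illustration:

```python
def prob_run_of_heads(n_flips, run_length):
    """Probability of at least one run of `run_length` heads in `n_flips` fair flips."""
    # state[j] = probability that no full run has occurred yet
    # and the sequence currently ends in exactly j heads.
    state = [0.0] * run_length
    state[0] = 1.0
    for _ in range(n_flips):
        new_state = [0.0] * run_length
        new_state[0] = 0.5 * sum(state)  # tails resets the streak
        for j in range(run_length - 1):
            new_state[j + 1] = 0.5 * state[j]  # heads extends the streak
        # A streak that would reach run_length is dropped: the run has happened.
        state = new_state
    return 1.0 - sum(state)

for k in (3, 4, 5):
    print(k, prob_run_of_heads(100, k))
```

The same function works for other sequence lengths, so you can see how quickly long runs become likely as the number of flips grows.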
Hey baby, I'm proving love at nth sight by induction and you're my base case.