Pi, Monkeys, and Shakespeare

For the discussion of math. Duh.

Moderators: gmalivuk, Moderators General, Prelates

User avatar
Velifer
Posts: 1132
Joined: Wed Dec 26, 2007 4:05 pm UTC
Location: 40ºN, 83ºW

Pi, Monkeys, and Shakespeare

Postby Velifer » Fri Mar 14, 2008 6:36 pm UTC

So I saw this comment on the BBC news website, in reference to a story on pi and the book Contact:
Funny about Carl Sagan expecting to find encoded messages in Pi. It's an infinite series, which means that any message you want to find will be there somewhere.

This is similar to the "monkeys typing Shakespeare" thought experiment, but is it necessarily true? I can think of several examples of non-random infinite sets [0,0,0,0,0...] that would violate this rule (an infinite number of them, in fact).

If all digits have an equal probability of being next in the series, does the submitter make a valid point? What about pi specifically, as it is not random? I can handle probability, and I can cope with the infinite in a rather small-minded and well-bounded way, but when the two ideas start crashing into each other, I get a little muddled.

Would any (or every) message eventually be coded? In Esperanto? As properly formed LaTeX with raster graphics of bunnies and rainbows?

User avatar
Mathmagic
It's not as cool as that Criss Angel stuff.
Posts: 2926
Joined: Thu Nov 30, 2006 12:48 am UTC
Location: In ur fora posting in teh threads

Re: Pi, Monkeys, and Shakespeare

Postby Mathmagic » Fri Mar 14, 2008 6:39 pm UTC

This topic, about 4 topics below this one, might start to address your question. :)
Axman: That, and have you played DX 10 games? It's like having your corneas swabbed with clits made out of morphine.
Pathway: cocks cocks cocks

User avatar
skeptical scientist
closed-minded spiritualist
Posts: 6142
Joined: Tue Nov 28, 2006 6:09 am UTC
Location: San Francisco

Re: Pi, Monkeys, and Shakespeare

Postby skeptical scientist » Fri Mar 14, 2008 8:36 pm UTC

As I said in the other post, it's unknown whether every digit sequence will eventually appear. I think the answer is believed to be yes. Probably every message is to be found somewhere in pi, although we don't actually know that this is the case.

However, the scientists in Carl Sagan's book were looking for messages that were probably put there intentionally rather than by chance. If you find a sequence of 100 consecutive 0s and 1s in the first 10100 digits of pi, that's not terribly surprising, since if pi were purely random one would expect to find such sequences of that length in that my digits of pi. However, if you found 10,000 consecutive 0s and 1s, that would be extremely surprising, since probabilistically, that wouldn't be expected to appear until much later. If those 10,000 digits turned out to encode something meaningful, that would be that much more surprising, to the point that you would begin to guess that something fishy was going on and it wasn't just chance. That's what the scientists were doing in Contact:

Carl Sagan, Contact wrote:"The program races through the digits of pi and pauses even to think about only when there's some anomalous sequence of zeros and ones. You know what I'm saying? Something nonrandom. By chance, there'll be some zeros and ones of course. Ten percent of the digits will be zeros, and another ten percent will be ones. On average. The more digits we race through, the longer the sequences of pure zeros and ones that we should get by accident. The program knows what's expected statistically and only pays attention to unexpectedly long sequences of zeros and ones."

"I don't understand. If you look at enough random numbers, won't you get any pattern you want simply by chance?"

"Sure. But you can calculate how likely that is. If you get a very complex message very early on, you know it can't be by chance."


P.S. As you said, that quote is not actually true, since there are certainly infinite sequences that don't contain every message you might want to find. Your example was a good one. In fact, I have a friend who wrote a letter to the BBC to correct that very mistake. :P
I'm looking forward to the day when the SNES emulator on my computer works by emulating the elementary particles in an actual, physical box with Nintendo stamped on the side.

"With math, all things are possible." —Rebecca Watson

User avatar
Mathmagic
It's not as cool as that Criss Angel stuff.
Posts: 2926
Joined: Thu Nov 30, 2006 12:48 am UTC
Location: In ur fora posting in teh threads

Re: Pi, Monkeys, and Shakespeare

Postby Mathmagic » Fri Mar 14, 2008 8:46 pm UTC

Why would a complex sequence early on be any less-by-chance than if it occurred later on? If a complex sequence of n digits has a probability P of occurring in an infinite sequence of digits, what does it matter where it occurs?

For example, let's take rolling a 6-sided die:

If you roll the die enough times such that every combination and permutation of 6 numbers occurs exactly once (ie. there's a 100% chance that a combination of 6 numbers (1 through 6) of a certain permuation (i.e. 1-2-3-4-5-6 vs. 1-3-2-4-5-6) will occur), why would it be less-by-chance if a specific combination-permutation occurs earlier on?
Axman: That, and have you played DX 10 games? It's like having your corneas swabbed with clits made out of morphine.
Pathway: cocks cocks cocks

User avatar
pkuky
Posts: 612
Joined: Sun Oct 07, 2007 2:16 pm UTC
Location: jerusalem
Contact:

Re: Pi, Monkeys, and Shakespeare

Postby pkuky » Fri Mar 14, 2008 8:48 pm UTC

but wouldn't you inevitably get an unlikely sieries at some point? (by unlikely I mean as unlikely as you want).
It rains on the enemy too!

User avatar
skeptical scientist
closed-minded spiritualist
Posts: 6142
Joined: Tue Nov 28, 2006 6:09 am UTC
Location: San Francisco

Re: Pi, Monkeys, and Shakespeare

Postby skeptical scientist » Fri Mar 14, 2008 9:25 pm UTC

pkuky wrote:but wouldn't you inevitably get an unlikely sieries at some point? (by unlikely I mean as unlikely as you want).

Well, yes, of course. The chance that an infinite digit sequence would happen to contain 1415926535 very early on is every bit as low as the chance it would contain 0000000000 very early on. However, one is much more surprising than the other because the string itself appears nonrandom. An extremely long nonrandom string appearing early on in a purportedly random sequence of digits is very good evidence that the sequence of digits is not, in fact, random.

For example, suppose I were to claim that the following 5 20-digit sequences were obtained by rolling a 10-sided die:

10110001010001011101
33053054882046652138
71828182845904523536
18446744073709551616
14159265358979323846

For which of them do you believe me?
Spoiler:
I'm too lazy to roll a 10-sided die 20 times and record the results. If you are like me, I'd believe the 2nd and 4th were, but not the 1st, 3rd, or 5th, since the first consists only of 0s and 1s, the 3rd is part of the decimal expansion of e, and the 5th is part of the decimal expansion of pi. If you were a computer programmer, you might recognize the 4th as being equal to 264. Probably very few people would recognize the 2nd as the 364th through 383rd digits of pi.


So if one imagines pi as being random (which is not actually true, but probably not such a bad way to reach conclusions if it's normal) and a highly nonrandom sequences (such as a long string consisting of only 0s and 1s) occur very early on in the decimal expansion for pi, then this would be good reason to think that it did not occur merely by chance, but rather was put there by some entity capable of doing so (if one imagines that there are entities capable of doing so).
I'm looking forward to the day when the SNES emulator on my computer works by emulating the elementary particles in an actual, physical box with Nintendo stamped on the side.

"With math, all things are possible." —Rebecca Watson

User avatar
Mathmagic
It's not as cool as that Criss Angel stuff.
Posts: 2926
Joined: Thu Nov 30, 2006 12:48 am UTC
Location: In ur fora posting in teh threads

Re: Pi, Monkeys, and Shakespeare

Postby Mathmagic » Fri Mar 14, 2008 9:29 pm UTC

So basically, it's not so much "mathematically" unlikely as it's more *intuitively* unlikely for such a string to occur early on?

EDIT: And the example you gave of the 20 rolls with a 10-sided die isn't really analogous to the decimal representation of pi, or the example that I gave.
Axman: That, and have you played DX 10 games? It's like having your corneas swabbed with clits made out of morphine.
Pathway: cocks cocks cocks

User avatar
skeptical scientist
closed-minded spiritualist
Posts: 6142
Joined: Tue Nov 28, 2006 6:09 am UTC
Location: San Francisco

Re: Pi, Monkeys, and Shakespeare

Postby skeptical scientist » Fri Mar 14, 2008 9:49 pm UTC

Well you can't really talk about the probability of such and such a string appearing in such and such a place in pi, since pi is what it is, and isn't random. So the probability is 1 if it's there, and 0 if it's not. For this whole line of argument to make any sense at all, you have to imagine that before the universe was created, pi could really have been anything, and it was only when the universe actually started up that pi was given a fixed value. Furthermore, you have to imagine that if the universe exists by chance, the value of pi is essentially random, but if the universe was designed, then the value of pi might have been part of that design. So it's all rather implausible, but since this is all going on in a sci-fi novel, I think it's within an acceptable realm of implausibility.

So yes, given these assumptions, the act of finding such a long, complicated, and regular string early on in pi indicates a high probability of a designed universe, and raises the possibility that the string might encode a message of some kind from that designer. Personally I'm of the opinion that pi is what it has to be, and not even someone capable of redesigning the universe could change it (although they could probably change the cosmological constant, the forces present in our universe, the speed of light, and so forth), so I would predict we never find such messages. (If we ever found something that looked like one, I'd have to reevaluate my assumptions.) But I'd still put this in the realm of "cool idea".
I'm looking forward to the day when the SNES emulator on my computer works by emulating the elementary particles in an actual, physical box with Nintendo stamped on the side.

"With math, all things are possible." —Rebecca Watson

User avatar
Mathmagic
It's not as cool as that Criss Angel stuff.
Posts: 2926
Joined: Thu Nov 30, 2006 12:48 am UTC
Location: In ur fora posting in teh threads

Re: Pi, Monkeys, and Shakespeare

Postby Mathmagic » Fri Mar 14, 2008 10:04 pm UTC

Well, the value of pi is what it is... the ratio of a circle's circumference to its diameter. The circle itself is an "idealized" object in the first place, so like you said, you wouldn't be able to change the value of pi unless you changed the definition of a circle. Even if you did that you would still get 'pi' if you "imagined" a shape such that each point around the perimeter is the same distance from a central point. If you look at it that way, then no, pi is not "random".

You can extend such an argument to other irrationals like the Golden Ratio; it has an evaluative mathematical definition, and would exist regardless of what the natural laws of the universe were. The difference between the Golden Ratio and pi is that the Golden Ratio actually occurs in nature, but in my opinion, these instances of the Golden Ratio may as well have been a result of paradolia.
Axman: That, and have you played DX 10 games? It's like having your corneas swabbed with clits made out of morphine.
Pathway: cocks cocks cocks

User avatar
skeptical scientist
closed-minded spiritualist
Posts: 6142
Joined: Tue Nov 28, 2006 6:09 am UTC
Location: San Francisco

Re: Pi, Monkeys, and Shakespeare

Postby skeptical scientist » Fri Mar 14, 2008 10:15 pm UTC

Wait, are you saying that pi doesn't occur in nature?

...

In any case - yes, it's sci-fi. Treat it as such. If you prefer, suppose that there is some physical constant (rather than mathematical constant) whose value we can somehow compute precisely. If that constant had a very surprising digit sequence, wouldn't we start scratching our heads?
I'm looking forward to the day when the SNES emulator on my computer works by emulating the elementary particles in an actual, physical box with Nintendo stamped on the side.

"With math, all things are possible." —Rebecca Watson

Sunsnail
Posts: 166
Joined: Sun Oct 21, 2007 6:44 pm UTC

Re: Pi, Monkeys, and Shakespeare

Postby Sunsnail » Fri Mar 14, 2008 10:26 pm UTC

What kind of practical joke that would be! Supernatural beings making the circumferences of circles pi larger than their diameters :lol:

Not sure how it's possible

User avatar
Mathmagic
It's not as cool as that Criss Angel stuff.
Posts: 2926
Joined: Thu Nov 30, 2006 12:48 am UTC
Location: In ur fora posting in teh threads

Re: Pi, Monkeys, and Shakespeare

Postby Mathmagic » Fri Mar 14, 2008 10:30 pm UTC

Haha, I guess reading my post again, it sounds like I'm saying that... what I meant was that pi is known as "that irrational, non-repeating number that is really useful in math", whereas the Golden Ratio is known more as "that number that occurs in nature and ancient architecture".

If you disregard mathematical constants, then yes, I would be suspicious if it had a "surprising" (however you want to define that) digit sequence. However, any empirical measurements/readings of such a constant would always be subject to standard sources of error and precision constraints.

EDIT: I see you've added in examples of pi occurring in nature. That's just a consequence of almost-perfect circles/spheres occurring in nature, and is kind of an x => x proof. :razz:

And sure you've got Schroedinger's equation, but these kinds of equations are derived from mathematical concepts that are used as approximations, don't you think?
Axman: That, and have you played DX 10 games? It's like having your corneas swabbed with clits made out of morphine.
Pathway: cocks cocks cocks

Yesila
Posts: 221
Joined: Sun Dec 16, 2007 11:38 am UTC

Re: Pi, Monkeys, and Shakespeare

Postby Yesila » Sat Mar 15, 2008 5:58 pm UTC

Pi seems pretty regular to me...

Spoiler:
Image


I think there are lots of messages you can't find in there :)

On a related note does anybody find it odd that the golden ratio is very very poorly approximated by rationals and yet has one of the nicest (simple) continued fraction representations there is?

If there are hidden messages encoded in pi, phi or whatever is base 10 decimal representations or these numbers really the "right" way to be looking at them?

User avatar
gmalivuk
GNU Terry Pratchett
Posts: 26836
Joined: Wed Feb 28, 2007 6:02 pm UTC
Location: Here and There
Contact:

Re: Pi, Monkeys, and Shakespeare

Postby gmalivuk » Sat Mar 15, 2008 6:10 pm UTC

mathmagic wrote:So basically, it's not so much "mathematically" unlikely as it's more *intuitively* unlikely for such a string to occur early on?

In a sense, yes. What we're talking about is the likelihood of "unlikely-looking" sequences early on in one or another expansion of pi. So it's somewhat ill-defined, in the same way as asking *after* the fact of some surprising occurrence, "what are the chances of that?" If you define "that" as being precisely the thing that happened, perhaps the chances of it happening *again* are minute, while the chance that it already happened is one. However, if you define "that" as "something that would surprise me in some way", the chances of another such event go up quite a bit.

However, there are other, purely mathematical reasons to be surprised at a sequence like the one discovered in Contact. Just like there are purely mathematical reasons to think 10110001010001011101 is a far less likely d10 sequence than any of 33053054882046652138, 71828182845904523536, 18446744073709551616, or 14159265358979323846. The others happen to also be specific sequences, that if predicted *beforehand* would really, really surprise us if they resulted from a sequence of die rolls. But 10110001010001011101 should surprise us right off the bat. No one needs to predict beforehand "a sequence of 0's and 1's" for us to be surprised. There are a number of randomness tests one could perform, and while some of the other sequences given might pass at least a few of them, this one almost certainly wouldn't.

What this means is that we have very good mathematical reasons to suspect that 10110001010001011101 was not the result of anyone's *first* several consecutive rolls of a completely fair d10.
Unless stated otherwise, I do not care whether a statement, by itself, constitutes a persuasive political argument. I care whether it's true.
---
If this post has math that doesn't work for you, use TeX the World for Firefox or Chrome

(he/him/his)

User avatar
skeptical scientist
closed-minded spiritualist
Posts: 6142
Joined: Tue Nov 28, 2006 6:09 am UTC
Location: San Francisco

Re: Pi, Monkeys, and Shakespeare

Postby skeptical scientist » Sat Mar 15, 2008 6:24 pm UTC

mathmagic wrote:And sure you've got Schroedinger's equation, but these kinds of equations are derived from mathematical concepts that are used as approximations, don't you think?

No, I think that they really express the true laws of the universe (or will when we find the right ones) and the only approximation is due to limitations of measurement.

gmalivuk wrote:However, there are other, purely mathematical reasons to be surprised at a sequence like the one discovered in Contact. Just like there are purely mathematical reasons to think 10110001010001011101 is a far less likely d10 sequence than any of 33053054882046652138, 71828182845904523536, 18446744073709551616, or 14159265358979323846. The others happen to also be specific sequences, that if predicted *beforehand* would really, really surprise us if they resulted from a sequence of die rolls. But 10110001010001011101 should surprise us right off the bat. No one needs to predict beforehand "a sequence of 0's and 1's" for us to be surprised. There are a number of randomness tests one could perform, and while some of the other sequences given might pass at least a few of them, this one almost certainly wouldn't.

Er, gmalivuk, I would find this argument a lot more plausible if the other numbers you describe as "less surprising" didn't happen to coincide with the decimal expansion of pi, the decimal expansion of e, and 264. Seeing any of those come up from a d10 would strike me as very surprising, as would any other sequence which gets more than 200 Google hits. If the numbers were something like 000966121856649775389 (which really was generated by a d10) I would agree with you.
I'm looking forward to the day when the SNES emulator on my computer works by emulating the elementary particles in an actual, physical box with Nintendo stamped on the side.

"With math, all things are possible." —Rebecca Watson

User avatar
Owehn
Posts: 479
Joined: Tue Oct 09, 2007 12:49 pm UTC
Location: Cambridge, UK

Re: Pi, Monkeys, and Shakespeare

Postby Owehn » Sat Mar 15, 2008 7:00 pm UTC

It just comes down to what we think the salient features of a sequence are. You're just as likely to roll 110100000101110 on a d10 as you are to roll 314159265358979. However, you're far less likely to roll a sequence "like" 110100000101110 than you are to roll one "like" 314159265358979 if the only important features you're considering are the number of distinct numerals encountered (only two, versus 9 out of 10). On the other hand, you're far more likely to roll a sequence of fifteen ones and zeroes than you are to roll the first 15 digits of pi.
[This space intentionally left blank.]

User avatar
skeptical scientist
closed-minded spiritualist
Posts: 6142
Joined: Tue Nov 28, 2006 6:09 am UTC
Location: San Francisco

Re: Pi, Monkeys, and Shakespeare

Postby skeptical scientist » Sat Mar 15, 2008 7:28 pm UTC

Well, to me, a surprising sequence is one which satisfies some "natural"* criterion which relatively few sequences satisfy. The string "110100000101110" is surprising, because it satisfies the natural criterion of consisting of only 0s and 1s, and only 1/515 of all sequences of length 15 satisfy that criterion. The string "314159265358979" is surprising because it satisfies the natural criterion of consisting of an initial segment of pi, and only 1/1015 of all sequences of length 15 satisfy that criterion. The string "121856649775389" is not surprising, since I can't think of a natural criteria which is sufficiently rare that this string satisfies.

*No, I don't have a good definition of what it means for some criterion to be "natural".
I'm looking forward to the day when the SNES emulator on my computer works by emulating the elementary particles in an actual, physical box with Nintendo stamped on the side.

"With math, all things are possible." —Rebecca Watson

User avatar
Cosmologicon
Posts: 1806
Joined: Sat Nov 25, 2006 9:47 am UTC
Location: Cambridge MA USA
Contact:

Re: Pi, Monkeys, and Shakespeare

Postby Cosmologicon » Sat Mar 15, 2008 8:05 pm UTC

One potential criterion for an interesting sequence is if you can write a program that produces the sequence (and only the sequence) as output, but which is smaller in size that the output itself. This doesn't work so well for sequences of length 100, but it works well for sequences of length, say, 10,000 or more.

User avatar
parallax
Posts: 157
Joined: Wed Jan 31, 2007 5:06 pm UTC
Location: The Emergency Intelligence Incinerator

Re: Pi, Monkeys, and Shakespeare

Postby parallax » Sat Mar 15, 2008 8:26 pm UTC

The usual definition of "natural" or "unnatural" sequences or sets is to imagine a computer program (in some programming language) that would print the result. Or you could use a natural language like English.

For example, "110100000101110" falls into the category "sequences consisting entirely of ones and zeros", whereas "121856649775389" falls into the category "sequences whose first and third numbers are the same, with two instances of consecutive equal numbers, and ending with three-eight-nine". And the second category, although requiring a much longer phrase to describe, still contains more sequences than the first. "sequences containing the first fifteen digits of pi" is even shorter and contains fewer sequences.

I would rank the "naturalness" or "surprisingness" of those numbers in the following way.
"the first fifteen digits of pi" -- most surprising
"the number two-six-six-seven-zero, written in base two"
"the sequence one-two-one-eight-five-six-six-four-nine-seven-seven-five-three-eight-nine" -- least surprising
Cake and grief counseling will be available at the conclusion of the test.

User avatar
skeptical scientist
closed-minded spiritualist
Posts: 6142
Joined: Tue Nov 28, 2006 6:09 am UTC
Location: San Francisco

Re: Pi, Monkeys, and Shakespeare

Postby skeptical scientist » Sat Mar 15, 2008 11:15 pm UTC

Cosmologicon wrote:One potential criterion for an interesting sequence is if you can write a program that produces the sequence (and only the sequence) as output, but which is smaller in size that the output itself. This doesn't work so well for sequences of length 100, but it works well for sequences of length, say, 10,000 or more.

That's true, but for this to make sense, you first need to fix a programming language, and a string will be "interesting" or "not interesting" depending on your programming language. This definition lets you say some very powerful and nonintuitive things about the set of noninteresting strings (which are called random strings), which makes it very useful, and an active area of research, especially when you move from the realm of finite strings to the realm of infinite sequences (since fortunately, the set of infinite sequences which are random does not depend on your choice of programming language). However it has some issues as a definition for interesting properties of finite strings, particularly - as you say - when they are short.
I'm looking forward to the day when the SNES emulator on my computer works by emulating the elementary particles in an actual, physical box with Nintendo stamped on the side.

"With math, all things are possible." —Rebecca Watson

zenten
Posts: 3799
Joined: Fri Jun 22, 2007 7:42 am UTC
Location: Ottawa, Canada

Re: Pi, Monkeys, and Shakespeare

Postby zenten » Sun Mar 16, 2008 1:47 am UTC

skeptical scientist wrote:
Cosmologicon wrote:One potential criterion for an interesting sequence is if you can write a program that produces the sequence (and only the sequence) as output, but which is smaller in size that the output itself. This doesn't work so well for sequences of length 100, but it works well for sequences of length, say, 10,000 or more.

That's true, but for this to make sense, you first need to fix a programming language, and a string will be "interesting" or "not interesting" depending on your programming language. This definition lets you say some very powerful and nonintuitive things about the set of noninteresting strings (which are called random strings), which makes it very useful, and an active area of research, especially when you move from the realm of finite strings to the realm of infinite sequences (since fortunately, the set of infinite sequences which are random does not depend on your choice of programming language). However it has some issues as a definition for interesting properties of finite strings, particularly - as you say - when they are short.


I don't see how the programming language would make a difference, as long as we're talking about Turing complete languages.

User avatar
skeptical scientist
closed-minded spiritualist
Posts: 6142
Joined: Tue Nov 28, 2006 6:09 am UTC
Location: San Francisco

Re: Pi, Monkeys, and Shakespeare

Postby skeptical scientist » Sun Mar 16, 2008 2:14 am UTC

Well, different programming languages have different ways of doing things, so the length of the shortest program to output some string will vary from language to language. In some languages the program might be shorter than the string, and in other languages the program might be longer, so the string will be "interesting"/"nonrandom" (have a short program) or "uninteresting"/"random" (have no short program) depending on the length of the program. Of course, since we're talking about Turing complete languages, programs will exist in both, but there's no reason why the lengths should be the same.

But it gets worse than this. Programming languages can have built in functions; for example "printf" is built into the C programming language (and several others). If I want, I can define my programming language to contain the function "print_interesting_string" which always outputs the string 78034652909874115638945062348009516209834. Now in this language, that particular string happens to have a very short program, and is therefore nonrandom. I could, of course, have defined a different programming language which replaced the above string by any other string, and in that language, that other string would be nonrandom. So no matter what string I come up with, there is some programming language relative to which that string is nonrandom*. A similar trick can be used to show that no matter what string I come up with, there is some programming language relative to which that string is random. All of these programming languages are still Turing complete.

*Excluding the string of length 0, since of course no program can have length shorter than 0. Length 1 strings can be made nonrandom by defining a programming language that adopts the standard that if the compiler is fed an empty file, it outputs an executable which prints the string "3" (or whatever).
I'm looking forward to the day when the SNES emulator on my computer works by emulating the elementary particles in an actual, physical box with Nintendo stamped on the side.

"With math, all things are possible." —Rebecca Watson

bray
Posts: 67
Joined: Wed Aug 15, 2007 4:17 pm UTC

Re: Pi, Monkeys, and Shakespeare

Postby bray » Sun Mar 16, 2008 7:16 am UTC

skeptical scientist wrote:The chance that an infinite digit sequence would happen to contain 1415926535 very early on is every bit as low as the chance it would contain 0000000000 very early on.


I know this is very far beside the points that people are actually interested in here but I can't help pointing out that this isn't actually quite true. E.g., the chance that the first 11 digits of a (uniformly) random digit sequence would contain 1415926535 is 20/10^10 while the chance that it would contain 0000000000 is only 19/10^10. This actually turns out to be important in some practical string matching situations.

User avatar
Morpheus
Posts: 18
Joined: Sun Feb 17, 2008 12:43 pm UTC
Location: Netherlands
Contact:

Re: Pi, Monkeys, and Shakespeare

Postby Morpheus » Sun Mar 16, 2008 4:11 pm UTC

Owehn wrote:It just comes down to what we think the salient features of a sequence are. You're just as likely to roll 110100000101110 on a d10 as you are to roll 314159265358979. However, you're far less likely to roll a sequence "like" 110100000101110 than you are to roll one "like" 314159265358979 if the only important features you're considering are the number of distinct numerals encountered (only two, versus 9 out of 10). On the other hand, you're far more likely to roll a sequence of fifteen ones and zeroes than you are to roll the first 15 digits of pi.


This reminds me much of the 'ugly duckling theorem' in machine learning. If you get an input of numbers and your program has to find some interesting patterns, you have to say what categories of patterns you find interesting. If you consider everything being equal you can find any pattern you would like.

parallax wrote:The usual definition of "natural" or "unnatural" sequences or sets is to imagine a computer program (in some programming language) that would print the result. Or you could use a natural language like English.

For example, "110100000101110" falls into the category "sequences consisting entirely of ones and zeros", whereas "121856649775389" falls into the category "sequences whose first and third numbers are the same, with two instances of consecutive equal numbers, and ending with three-eight-nine". And the second category, although requiring a much longer phrase to describe, still contains more sequences than the first. "sequences containing the first fifteen digits of pi" is even shorter and contains fewer sequences.

I would rank the "naturalness" or "surprisingness" of those numbers in the following way.
"the first fifteen digits of pi" -- most surprising
"the number two-six-six-seven-zero, written in base two"
"the sequence one-two-one-eight-five-six-six-four-nine-seven-seven-five-three-eight-nine" -- least surprising


Have you got some more reading on this? It does sound like an interesting way to define 'interestingness' of a certain sequence. But as skeptical scientist pointed out, I do think the programing language you pick is important.

-Daniel

User avatar
skeptical scientist
closed-minded spiritualist
Posts: 6142
Joined: Tue Nov 28, 2006 6:09 am UTC
Location: San Francisco

Re: Pi, Monkeys, and Shakespeare

Postby skeptical scientist » Sun Mar 16, 2008 5:48 pm UTC

Morpheus wrote:Have you got some more reading on this? It does sound like an interesting way to define 'interestingness' of a certain sequence. But as skeptical scientist pointed out, I do think the programing language you pick is important.

-Daniel

http://en.wikipedia.org/wiki/Kolmogorov_complexity
http://en.wikipedia.org/wiki/Algorithmically_random_sequence
Li & Vitanyi, An Introduction to Kolmogorov Complexity and Its Applications (amazon.com)
Online version of Li & Vitanyi, Ch. 1
I'm looking forward to the day when the SNES emulator on my computer works by emulating the elementary particles in an actual, physical box with Nintendo stamped on the side.

"With math, all things are possible." —Rebecca Watson

User avatar
mudge
Posts: 292
Joined: Sun Sep 30, 2007 10:14 pm UTC
Location: Chicago
Contact:

Re: Pi, Monkeys, and Shakespeare

Postby mudge » Sun Mar 16, 2008 10:35 pm UTC

Yesila wrote:On a related note does anybody find it odd that the golden ratio is very very poorly approximated by rationals and yet has one of the nicest (simple) continued fraction representations there is?


actually, the fact that it has the continued fraction that it does is why it is so poorly approximated by rationals
http://www.cdbaby.com/cd/mudge <-- buy my CD (Now back in stock!)

GMontag
Posts: 209
Joined: Mon Apr 16, 2007 5:47 am UTC
Location: Bellingham, Washington

Re: Pi, Monkeys, and Shakespeare

Postby GMontag » Mon Mar 17, 2008 1:14 am UTC

skeptical scientist wrote:
Cosmologicon wrote:One potential criterion for an interesting sequence is if you can write a program that produces the sequence (and only the sequence) as output, but which is smaller in size that the output itself. This doesn't work so well for sequences of length 100, but it works well for sequences of length, say, 10,000 or more.

That's true, but for this to make sense, you first need to fix a programming language, and a string will be "interesting" or "not interesting" depending on your programming language. This definition lets you say some very powerful and nonintuitive things about the set of noninteresting strings (which are called random strings), which makes it very useful, and an active area of research, especially when you move from the realm of finite strings to the realm of infinite sequences (since fortunately, the set of infinite sequences which are random does not depend on your choice of programming language). However it has some issues as a definition for interesting properties of finite strings, particularly - as you say - when they are short.


I don't see why infinite sequences would depend any less on the programming language than finite sequences. Your argument about including a specialized function for the sequence applies equally well to infinite sequences (assuming of course that the sequence is computable in the first place).

User avatar
skeptical scientist
closed-minded spiritualist
Posts: 6142
Joined: Tue Nov 28, 2006 6:09 am UTC
Location: San Francisco

Re: Pi, Monkeys, and Shakespeare

Postby skeptical scientist » Mon Mar 17, 2008 1:42 am UTC

GMontag wrote:I don't see why infinite sequences would depend any less on the programming language than finite sequences. Your argument about including a specialized function for the sequence applies equally well to infinite sequences (assuming of course that the sequence is computable in the first place).

I didn't actually explain the definition for infinite sequences, so you probably came up with a different one which is why you don't think it would apply. The real definition can be found in one of the wikipedia articles I linked above. Here is a rather informal* explanation:

*Informal means I'm talking in terms of "programming languages", "interpreters", and other commonly understood notions from computer science rather than in terms of universal prefix-free Turing machines.

For finite sequences, we defined the sequence as random if the length of the shortest program producing it is at least as long as the sequence itself, and nonrandom if the program is shorter. From now on, we'll assume that our programs are composed entirely of 0s and 1s, and so our are strings, we'll write |x| for the length of the string x, and we'll define the (Kolmogorov) complexity C(x) as the length of the shortest program producing x. Now in our new notation, x is nonrandom if C(x)<|x|.

In order to define randomness for infinite strings, we need to modify our definition of complexity C and replace it by what is called the "prefix-free" Kolmogorov complexity K. We'll replace our language by what is called a "prefix-free" language, which means that one valid program is never an initial segment of another. For example, if 0010100110 is a valid program, then 00101 cannot be, nor can 00101001101. Now we define K(x) as complexity relative to our new prefix-free language. (Why we do this is a bit mysterious at first, but it turns out that the definition we'll give of random strings is equivalent to two other definitions which on the surface appear to be natural definitions but quite different from this one, and if we tried to give the same definition using C instead of K we'd end up with no random sequences at all. See Algorithmically random sequence (Wikipedia).)

Now that we have K, we define randomness for infinite sequences as follows:
An infinite sequence X is called random if there is some constant c so that for all natural numbers n, K(x)>n-c if x is the string comprised of the first n bits of X.

Why doesn't this depend on our choice of programming language? If we have any programming languages L1 and L2, with corresponding complexities K1 and K2, we make the following observation: we can write, in L1, an interpreter for L2. The length of this program will be c. So if for some string x, K2(x)=n, there is a program P of length n written in L2 which outputs x. So there is a program written in L1 that says "interpret P as an L2 program and output whatever it outputs"; this program has length c+n (c for the interpreter, n for the length of P). So for any x, K1(x) is at most K2(x)+c. This argument works in the other direction too, since we can write an interpreter for L1 in L2. So the complexities relative to different languages may be different, but their difference is bounded by some constant. If we look back at the definition of a random infinite sequence, we see that if we alter K by some amount bounded by a constant, we won't change which strings are random.

Also as one consequence of this definition no computable sequences are random, which answers another part of your question.

Exercise: prove that if X is computable, (i) then there's a constant c so that the prefix-free complexity of the first n bits of X is at most 2log2 n + c, (ii) that this is eventually as much smaller than n as you want, and therefore (iii) that X is nonrandom.
I'm looking forward to the day when the SNES emulator on my computer works by emulating the elementary particles in an actual, physical box with Nintendo stamped on the side.

"With math, all things are possible." —Rebecca Watson


Return to “Mathematics”

Who is online

Users browsing this forum: No registered users and 16 guests