irrational numbers as typewriting monkeys
Moderators: gmalivuk, Moderators General, Prelates
irrational numbers as typewriting monkeys
The decimal expansion of irrational numbers contains an infinite sequence of random digits; therefore it stands to reason that given any sequence of digits, that sequence appears at some position within, say, pi (or phi, or e, or root 2, etc, but I'll just say pi for the sake of argument, and let's not consider things like .1010010001... because they aren't really random.) This is just like the monkeys with typewriters thought experiment; wait long enough and they'll produce the works of Shakespeare. The difference is that irrational numbers give the same random sequence of digits every time you expand them, so all you need to know is the number of places after the decimal that the particular string is. You could look at the decimal expansion of pi as a pseudorandom function that gives integers from 0 to 9 based on a whole number seed.
This seems like a paradoxically efficient way to encode information. As a simple example, suppose your phone number were 141 5926; all you would have to do to keep track of it would be to say "It starts in the 10^1 place of pi." Instead of making copies of a Shakespearian sonnet for the class, an English teacher could tell everyone at what place in pi the ascii version of the poem appears. You could back up an entire computer by having it tell you where to find the contents of the hard drive in pi. This is all too good to be true, so I have a feeling that the place value reference number will almost always be bigger than the actual data, with the exception of a few lucky people who want to encode messages that happen to appear very early.
Even if this doesn't compress the data, it could still be used for encoding data with a very small key. You could pick any expression with an irrational decimal expansion,something as simple as root 2 or something more complicated like square_root(e)^cube_root(pi) / (3^.456)find your message within the infinite sequence of digits, and keep track of the (probably gigantic) number that refers to that decimal place. That reference number is now your encoded message, and the expression is the key. It will probably take a really long time to encode, but if you have an algorithm for finding the nth digit and skip all the ones between it and the decimal the decoding should go much faster since you know exactly where to look. You probably wouldn't actually do this in decimal; it would be more convenient to use a base that fits the data: binary or hexadecimal for computer files... actually one of those probably covers it since so many kinds of data can be represented in a computer.
This probably isn't practical, but it's funny to think that any data a human has ever created or will ever createliterature, art, sculpture, music, architecture, highly detailed blueprints for cars, gourmet recipes, the proof for Fermat's last theorem, the Krabby Patty's secret formula, tech specs for computers, the entire source code of operating systems, computer programs, video games, this forum post, the procedures I have described in it, and hopefully the analytic solution to the nbody problem, to name a fewhave always existed in irrational numbers, but probably require more information to refer to than they consist of. Although I suppose this brings up the debate over whether the information is in the record or the record player, so to speak.
This seems like a paradoxically efficient way to encode information. As a simple example, suppose your phone number were 141 5926; all you would have to do to keep track of it would be to say "It starts in the 10^1 place of pi." Instead of making copies of a Shakespearian sonnet for the class, an English teacher could tell everyone at what place in pi the ascii version of the poem appears. You could back up an entire computer by having it tell you where to find the contents of the hard drive in pi. This is all too good to be true, so I have a feeling that the place value reference number will almost always be bigger than the actual data, with the exception of a few lucky people who want to encode messages that happen to appear very early.
Even if this doesn't compress the data, it could still be used for encoding data with a very small key. You could pick any expression with an irrational decimal expansion,something as simple as root 2 or something more complicated like square_root(e)^cube_root(pi) / (3^.456)find your message within the infinite sequence of digits, and keep track of the (probably gigantic) number that refers to that decimal place. That reference number is now your encoded message, and the expression is the key. It will probably take a really long time to encode, but if you have an algorithm for finding the nth digit and skip all the ones between it and the decimal the decoding should go much faster since you know exactly where to look. You probably wouldn't actually do this in decimal; it would be more convenient to use a base that fits the data: binary or hexadecimal for computer files... actually one of those probably covers it since so many kinds of data can be represented in a computer.
This probably isn't practical, but it's funny to think that any data a human has ever created or will ever createliterature, art, sculpture, music, architecture, highly detailed blueprints for cars, gourmet recipes, the proof for Fermat's last theorem, the Krabby Patty's secret formula, tech specs for computers, the entire source code of operating systems, computer programs, video games, this forum post, the procedures I have described in it, and hopefully the analytic solution to the nbody problem, to name a fewhave always existed in irrational numbers, but probably require more information to refer to than they consist of. Although I suppose this brings up the debate over whether the information is in the record or the record player, so to speak.
 phlip
 Restorer of Worlds
 Posts: 7453
 Joined: Sat Sep 23, 2006 3:56 am UTC
 Location: Australia
 Contact:
Re: irrational numbers as typewriting monkeys
Well, it's not going to be that useful as compression... Take, say, pi. Now, we can compress 9 (which takes 4 bits to store) down to 5 (which takes only 3 bits), since 9 is the fifth digit of pi... but the first 7 is in position 13, and the first 0 isn't until position 32...
Of course, this is nothing unique to this particular compression scheme... by the pigeonhole principle, all compression schemes make some strings shorter, but must some other strings longer. However, a good compression scheme will rely on the fact that the string you want to compress isn't random... and will make common strings shorter, and uncommon strings longer. Unless you tend to talk about long early strings of pi a lot, your compression scheme doesn't really do that.
Incidentally, the mathematical term you're looking for (to describe numbers like pi, but exclude numbers like 0.1010010001...) is "disjunctive sequence". Technically, with your mention of randomness, you're describing normal numbers (every normal number is disjunctive, but not viceversa)... but normality isn't necessary here, just being a disjunctive sequence. Incidentally, while pi is highly suspected to be normal, it's not even known if it is disjunctive.
Of course, this is nothing unique to this particular compression scheme... by the pigeonhole principle, all compression schemes make some strings shorter, but must some other strings longer. However, a good compression scheme will rely on the fact that the string you want to compress isn't random... and will make common strings shorter, and uncommon strings longer. Unless you tend to talk about long early strings of pi a lot, your compression scheme doesn't really do that.
Incidentally, the mathematical term you're looking for (to describe numbers like pi, but exclude numbers like 0.1010010001...) is "disjunctive sequence". Technically, with your mention of randomness, you're describing normal numbers (every normal number is disjunctive, but not viceversa)... but normality isn't necessary here, just being a disjunctive sequence. Incidentally, while pi is highly suspected to be normal, it's not even known if it is disjunctive.
Code: Select all
enum ಠ_ಠ {°□°╰=1, °Д°╰, ಠ益ಠ╰};
void ┻━┻︵╰(ಠ_ಠ ⚠) {exit((int)⚠);}

 Posts: 226
 Joined: Wed Aug 20, 2008 7:01 pm UTC
Re: irrational numbers as typewriting monkeys
I'm not sure that they've proved Pi is normal, if it is, what you say would indeed be true. (as far as I can tell).
However, assuming normal distribution of an infinite string of random numbers, the chances of finding, say, Moby Dick encoded within the first googol digits is probably infinitesimally small (I have no idea the actual probabilities, but it's small). Since we only have calculated upwards of 1 trillion digits thus far, we are not even close to being able to find the encoding of various paragraphs, let alone novels, in Pi.
However, assuming normal distribution of an infinite string of random numbers, the chances of finding, say, Moby Dick encoded within the first googol digits is probably infinitesimally small (I have no idea the actual probabilities, but it's small). Since we only have calculated upwards of 1 trillion digits thus far, we are not even close to being able to find the encoding of various paragraphs, let alone novels, in Pi.
 NathanielJ
 Posts: 882
 Joined: Sun Jan 13, 2008 9:04 pm UTC
Re: irrational numbers as typewriting monkeys
Well, phlip explained things a bit, but since there still seems to be some confusion:
1) No, it is not known whether or not pi is normal. For all we know, there are a finite number of 4's in its decimal representation. Very few numbers are known to be normal, unless they were purposefully constructed to be normal.
2) Suppose you want to encode k baseb digits in this way. Then your reference number will, on average, be b^{k}. Since there are exactly b^{k} possible strings of this length, this tells us two things. First, it does not matter what base you look at pi or e or whatever in; the reference number will still (on average) be the same. Second, there is no compression gain from doing this. If you want to encode a 500bit string via a reference number pointing to digits of pi, then expect your reference number to (on average) be about 500 bits long.
3)
If we consider Moby Dick as a base26 expression and we ignore spaces then we should expect that the reference number for Moby Dick will be about 26^{8} = 208827064576 = 2 * 10^{11}, which is nowhere near a googol. Since about 1.2 trillion decimal digits of pi are known, one should actually expect that "Moby Dick" can be encoded within the known digits of pi.
1) No, it is not known whether or not pi is normal. For all we know, there are a finite number of 4's in its decimal representation. Very few numbers are known to be normal, unless they were purposefully constructed to be normal.
2) Suppose you want to encode k baseb digits in this way. Then your reference number will, on average, be b^{k}. Since there are exactly b^{k} possible strings of this length, this tells us two things. First, it does not matter what base you look at pi or e or whatever in; the reference number will still (on average) be the same. Second, there is no compression gain from doing this. If you want to encode a 500bit string via a reference number pointing to digits of pi, then expect your reference number to (on average) be about 500 bits long.
3)
the chances of finding, say, Moby Dick encoded within the first googol digits is probably infinitesimally small
If we consider Moby Dick as a base26 expression and we ignore spaces then we should expect that the reference number for Moby Dick will be about 26^{8} = 208827064576 = 2 * 10^{11}, which is nowhere near a googol. Since about 1.2 trillion decimal digits of pi are known, one should actually expect that "Moby Dick" can be encoded within the known digits of pi.
 phlip
 Restorer of Worlds
 Posts: 7453
 Joined: Sat Sep 23, 2006 3:56 am UTC
 Location: Australia
 Contact:
Re: irrational numbers as typewriting monkeys
I think the discussion was less about the words "Moby Dick", and more about the entirety of the book called Moby Dick.
Code: Select all
enum ಠ_ಠ {°□°╰=1, °Д°╰, ಠ益ಠ╰};
void ┻━┻︵╰(ಠ_ಠ ⚠) {exit((int)⚠);}
Re: irrational numbers as typewriting monkeys
Unless I'm totally confused here, the odds of a one digit number appearing somewhere in the first 10 numbers (ie needing only one number to define the position) is just higher than one in three, meaning more often than not, it'll take MORE information to define where a single number is in a random normal number than to simply give the original number right away.

 Posts: 226
 Joined: Wed Aug 20, 2008 7:01 pm UTC
Re: irrational numbers as typewriting monkeys
Yes, that's why I italicized it, I was talking about the novel. Go with 26^200,000+
Just chapter one has around 10,000 characters not including spaces.
Just chapter one has around 10,000 characters not including spaces.

 Posts: 30
 Joined: Thu Sep 11, 2008 2:49 am UTC
Re: irrational numbers as typewriting monkeys
what if I want to encode the number 6666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666
Will that be somewhere in the infinite expansion of pi? If it is truly random, we can expect this to eventually happen, but does irrationality imply randomness? Is it possible to get 100 sixes (or any digit) in a row?
Will that be somewhere in the infinite expansion of pi? If it is truly random, we can expect this to eventually happen, but does irrationality imply randomness? Is it possible to get 100 sixes (or any digit) in a row?
 phlip
 Restorer of Worlds
 Posts: 7453
 Joined: Sat Sep 23, 2006 3:56 am UTC
 Location: Australia
 Contact:
Re: irrational numbers as typewriting monkeys
angel.white wrote:Will that be somewhere in the infinite expansion of pi?
It is likely so  as mentioned, pi is expected to be "normal", though this hasn't been proven... which means that any string of n digits will not only appear in the expansion of pi, but will appear (on average) once per 10^{n} digits (which is what you'd expect if it was random).
angel.white wrote:If it is truly random, we can expect this to eventually happen, but does irrationality imply randomness?
Irrationality does not imply randomness. 0.101001000100001... is irrational, but most certainly not random. Even 0.123456789101112131415..., which is normal (indeed, I think it was the first number ever proven to be normal), couldn't really be called "random". However, normality means that, if you look for a string in the digits, it'll appear around the same number of times you'd expect from a random sequence (just not necessarily with the same distribution).
angel.white wrote:Is it possible to get 100 sixes (or any digit) in a row?
Probably. But see above re: pi not being proven normal.
Code: Select all
enum ಠ_ಠ {°□°╰=1, °Д°╰, ಠ益ಠ╰};
void ┻━┻︵╰(ಠ_ಠ ⚠) {exit((int)⚠);}
Re: irrational numbers as typewriting monkeys
I think the use of the word "random" is a bit deceiving, too. Any given real number is not "random", it is a fixed, deterministic quantity, as are the digits of its decimal expansion. However, the number can be said to be "normal", which means that if you fed its digits into a computer it would not be able to distinguish between that sequence of digits and a "true" sequence of random numbers from a uniform distribution. In that sense, you can't predict the next digit in the decimal expansion without calculating it any more than you could predict the roll of a "perfect" d10, and you can't predict the next pair of digits better than you could predict a roll of a "perfect" d100 (ungodly things that they are), and so forth. As said, pi is suspected, but not proven, to be normal, although studies of the existing known decimal expansion have shown reasonably consistent results (but of course, studying the first n decimal places and finding them to match a uniform random distribution does nothing to prevent the expansion from the n+1th place being completely different).
pollywog wrote:I want to learn this smile, perfect it, and then go around smiling at lesbians and freaking them out.Wikihow wrote:* Smile a lot! Give a gay girl a knowing "Hey, I'm a lesbian too!" smile.
 Xanthir
 My HERO!!!
 Posts: 4860
 Joined: Tue Feb 20, 2007 12:49 am UTC
 Location: The Googleplex
 Contact:
Re: irrational numbers as typewriting monkeys
ConMan wrote:However, the number can be said to be "normal", which means that if you fed its digits into a computer it would not be able to distinguish between that sequence of digits and a "true" sequence of random numbers from a uniform distribution. In that sense, you can't predict the next digit in the decimal expansion without calculating it any more than you could predict the roll of a "perfect" d10, and you can't predict the next pair of digits better than you could predict a roll of a "perfect" d100 (ungodly things that they are), and so forth. As said, pi is suspected, but not proven, to be normal, although studies of the existing known decimal expansion have shown reasonably consistent results (but of course, studying the first n decimal places and finding them to match a uniform random distribution does nothing to prevent the expansion from the n+1th place being completely different).
That's not normal.
Normality has nothing to do with randomness at all. Rather, normality is a property that says that, for a given base, all possible combinations of digits in that base occur with the expected "equal" frequency. Frex, for a normal number in base 10, we'd expect the strings "0""9" to all show up 10% of the time, "00""99" to all show up 1% of the time, etc. "Simply normal" numbers only need this property to hold for onedigit strings, while "absolutely normal" numbers have this hold in any base.
It is easy to show that normality has nothing to do with randomness by studying the few numbers that we know for a fact to be normal, such as .01234567891011121314151617181920212223242526.... This is certainly *far* from random, but it's perfectly 10normal.
Interestingly, it's not just pi that we have trouble with proving normality for. While it has been proven that almost all real numbers are normal, actually constructing a normal number (or proving a number is normal) is pretty extraordinarily difficult. Apart from the handful of trivial constants that are designed specifically to be normal, we've only explicitly proven a handful of number classes to be normal.
(defun fibs (n &optional (a 1) (b 1)) (take n (unfold '+ a b)))
Re: irrational numbers as typewriting monkeys
Wow. It's funny that we can calculate the digits of pi without knowing all of their properties. Incidentally, how do they calculate pi? Is there some nice formula like there is for e like
\lim_{n\rightarrow \infty}(1+\frac{1}{n})^n
that they can keep reevaluating for higher and higher precision? Also, given that we're unsure whether pi is normal, can we at least find the probability that a given string exists, or does this fall within the realm of uncertainty as opposed to probability?Re: irrational numbers as typewriting monkeys
davetp425 wrote:Wow. It's funny that we can calculate the digits of pi without knowing all of their properties. Incidentally, how do they calculate pi? Is there some nice formula like there is for e like\lim_{n\rightarrow \infty}(1+\frac{1}{n})^nthat they can keep reevaluating for higher and higher precision? Also, given that we're unsure whether pi is normal, can we at least find the probability that a given string exists, or does this fall within the realm of uncertainty as opposed to probability?
That isn't how we calculate e. The usual approach is to use the series:
e = 1/0! + 1/1! + 1/2! + 1/3! + 1/4! + ...
See http://mathworld.wolfram.com/PiFormulas.html (warning, slow to load) for a long list of formulas for pi. See http://www.cs.uwaterloo.ca/~alopezo/ma ... ode12.html and http://www.cecm.sfu.ca/organics/papers/ ... index.html for two different explanations of methods that can be used to compute large numbers of digits of pi on a computer.
Some of us exist to find out what can and can't be done.
Others exist to hold the beer.
 jestingrabbit
 Factoids are just Datas that haven't grown up yet
 Posts: 5942
 Joined: Tue Nov 28, 2006 9:50 pm UTC
 Location: Sydney
Re: irrational numbers as typewriting monkeys
btilly wrote:davetp425 wrote:Wow. It's funny that we can calculate the digits of pi without knowing all of their properties. Incidentally, how do they calculate pi? Is there some nice formula like there is for e like\lim_{n\rightarrow \infty}(1+\frac{1}{n})^nthat they can keep reevaluating for higher and higher precision? Also, given that we're unsure whether pi is normal, can we at least find the probability that a given string exists, or does this fall within the realm of uncertainty as opposed to probability?
That isn't how we calculate e. The usual approach is to use the series:e = 1/0! + 1/1! + 1/2! + 1/3! + 1/4! + ...
davetp425's limit does evaluate to e and, historically, its how the constant first came to be studied (by one of the Bernoullis). There is no way at present to determine whether a given string occurs. We don't even know if there are an infinite number of 1's in the decimal representation of pi.
See http://mathworld.wolfram.com/PiFormulas.html (warning, slow to load) for a long list of formulas for pi. See http://www.cs.uwaterloo.ca/~alopezo/ma ... ode12.html and http://www.cecm.sfu.ca/organics/papers/ ... index.html for two different explanations of methods that can be used to compute large numbers of digits of pi on a computer.
Perhaps the most interesting formula for pi is the BBP formula, which allows you to work out individual digits in a base 16 represenation of pi directly. It will most likely be these digit extraction algorithms that end up resolving whether pi, and other constants, are normal or not. If not, we'll be waiting a long while for resolution.

As for normal vs random stuff... The term normal arose in ergodic theory to describe those points in the phase space whose time averages equalled the space average for the indicator functions of some significant countable collection of sets ie points where the erogodic theorem "works". A lot of ergodic theory is about teasing out the difference between the statement "the system is random" and "the system has behaviours that can be described well with straightforward statistical techniques".
The way that normal is usually used these days is a bit different from this. You take a number, look at its representations in every base from binary on up, and if its a normal point in the old sense wrt the shift operation and the countable collection of sets called cylinder sets (which are sets consisting of points which have the same starting block ie "all the points which start with .141" is a cylinder set), then its a normal number. The old definition is what allows us to directly conclude that almost all points are normal wrt the uniform Lebesgue measure (though we could pretty quickly get this from the Kolmogorov 01 law as well, though we'd have to work a bit harder to show that they form a set with positive measure, as opposed to a null set).
What Conman was probably thinking about is that there are some numbers whose representations have the most entropy in any base, and whose representations are therefore able to be confused with random sequences, which have been shown to be normal. However, we of course can't conclude from this that the converse is true ie whilst highest entropy (which can be dumbed down to "is mistakable for random") implies normal, normal doesn't imply highest entropy.
@OP: its a bad idea. The representation of a string is going to take as much information to express as the string itself on average, and many strings won't be something that you'd ever want to communicate, like people already said.
ameretrifle wrote:Magic space feudalism is therefore a viable idea.
Re: irrational numbers as typewriting monkeys
jestingrabbit wrote:btilly wrote:davetp425 wrote:Wow. It's funny that we can calculate the digits of pi without knowing all of their properties. Incidentally, how do they calculate pi? Is there some nice formula like there is for e like\lim_{n\rightarrow \infty}(1+\frac{1}{n})^nthat they can keep reevaluating for higher and higher precision? Also, given that we're unsure whether pi is normal, can we at least find the probability that a given string exists, or does this fall within the realm of uncertainty as opposed to probability?
That isn't how we calculate e. The usual approach is to use the series:e = 1/0! + 1/1! + 1/2! + 1/3! + 1/4! + ...
davetp425's limit does evaluate to e and, historically, its how the constant first came to be studied (by one of the Bernoullis). There is no way at present to determine whether a given string occurs. We don't even know if there are an infinite number of 1's in the decimal representation of pi.
I was aware of the first fact. As for the second, I'd always heard Euler as being the key person, though I'll grant I could be wrong. But either way, what I said is true. That limit is one of the reasons that e shows up in so many places, but is not of practical utility in calculating it.
Some of us exist to find out what can and can't be done.
Others exist to hold the beer.
Re: irrational numbers as typewriting monkeys
I've never even heard of this...interesting.
I'm still confused by the normal/random connection.
I think it's been established that even a normal number would not be a good way to generate pseudorandom digits, but it still seems to me that these normal numbers are eventually random.
It seems to me intuitively that this is, eventually, random. Obviously it isn't at the start, or any finite distance in, but then at those places the "normality condition" doesn't hold either.
If you chose a point infinitely far in, so to speak, you'd end up in the middle of a 'random', infinitely long number, whose digits would presumably be random. The (almost) repetition would have an infinite period
...if you get what I mean.
This would probably be my first thought for specifying whether an infinite sequence of digits is random.
Am I wrong about this? If so, can someone give me an example of a (preferably finite) set of digits with equallylikely occurrence of any given sequence of any given length, which is not random? [edit: poorly worded, but YKWIM]
I'm not getting the distinction.
I'm still confused by the normal/random connection.
I think it's been established that even a normal number would not be a good way to generate pseudorandom digits, but it still seems to me that these normal numbers are eventually random.
phlip wrote:Even 0.123456789101112131415..., which is normal (indeed, I think it was the first number ever proven to be normal), couldn't really be called "random". However, normality means that, if you look for a string in the digits, it'll appear around the same number of times you'd expect from a random sequence (just not necessarily with the same distribution).
It seems to me intuitively that this is, eventually, random. Obviously it isn't at the start, or any finite distance in, but then at those places the "normality condition" doesn't hold either.
If you chose a point infinitely far in, so to speak, you'd end up in the middle of a 'random', infinitely long number, whose digits would presumably be random. The (almost) repetition would have an infinite period
...if you get what I mean.
Xanthir wrote:Rather, normality is a property that says that, for a given base, all possible combinations of digits in that base occur with the expected "equal" frequency.
This would probably be my first thought for specifying whether an infinite sequence of digits is random.
Am I wrong about this? If so, can someone give me an example of a (preferably finite) set of digits with equallylikely occurrence of any given sequence of any given length, which is not random? [edit: poorly worded, but YKWIM]
I'm not getting the distinction.
 gmalivuk
 GNU Terry Pratchett
 Posts: 24331
 Joined: Wed Feb 28, 2007 6:02 pm UTC
 Location: Here and There
 Contact:
Re: irrational numbers as typewriting monkeys
troyp wrote:If you chose a point infinitely far in, so to speak
This is your problem. There's no such thing as "infinitely far in". Wherever you end up, you scroll to the right a large enough finite distance and you'll see what number you're on. Then you can predict the rest of the expansion without any problem.
troyp wrote:can someone give me an example of a (preferably finite) set of digits with equallylikely occurrence of any given sequence of any given length
No finite such set of digits exists. Because of your any given length criterion, we could just pick a length that's one more than the finite length of the string, and be out of luck. And for one that is infinite, Champernowne's number, which is the one you talked about first (0.1234...), is clearly normal in base10. This is because any string of base10 digits will show up eventually, because if it doesn't start with a zero it'll be one of the concatenated integers itself, and if it does (for example 00341), you can simply drop the first digit of an integer (finding it just after the start of 100341, in other words).
The reason this is not "random" is because I can ask you for any sequence of digits and, without actually computing the whole thing one digit at a time, you can tell me where that sequence appears. Or I can give you a number, and you can tell me the sequence of digits starting that many places beyond the decimal.
 phlip
 Restorer of Worlds
 Posts: 7453
 Joined: Sat Sep 23, 2006 3:56 am UTC
 Location: Australia
 Contact:
Re: irrational numbers as typewriting monkeys
The problem with this thread is that noone's defined what they mean when they say "random" (and I'm guilty of this too).
The term "random" has more distinct mathematical definitions than can be entirely healthy, as well as a bucketload of colloquial definitions... and any discussion that's not careful (eg: this one) will tend to have people talking at crosspurposes about different definitions of the word (or even one person conflating two or more definitions in the same argument). It's rather like "infinity" in that way... because of its colloquial use, it tends to get thrown around carelessly, when it's actually quite a complicated subject if you want to take it seriously. The fact that normality involves asymptotic frequencies, and thus both randomness and infinity, doesn't help much.
For instance: assume pi is normal for now. Would you call it random?
With how much number theory is known at the moment, we can't say anything useful about the, say, 10^{100}th digit of pi. It could be anything fro 0 to 9, we don't know. Would you call it random? And once we calculate it, that alone won't tell us a whole lot about the 10^{100}+1st digit... it could still be anything. Certainly seems random to me.
But it's only about as random as this... in that once we've calculated it, we know what it is... and if we calculate it again, we'll get the same result.
But then again, if all we knew about getRandomNumber() was that the result was chosen from a dice roll, then it would be random the first time we called it... but calling it repeatedly wouldn't give us a random sequence. But now I'm drifting away from randomness and towards Bayesian probability...
The usual definition of a "random sequence" is a sequence of numbers that, even knowing how it's generated, and all the numbers up to this point, it's impossible to predict the next value in the sequence. The digits of pi clearly fail at this test, since knowing how it's generated is theoretically enough to calculate all the remaining digits.
There are weakenings of this definition, like "cryptographically random" and "pseudorandom", which replace "impossible" with "computationally hard" and "unlikely to be noticed by the average observer", respectively... and the digits of pi probably fall under one or the other of these, depending on the specifics. The digits of champernowne's constant wouldn't, however.
Note that it doesn't make sense to ask if a specific number is "random"... only a certain way of generating numbers. Asking if pi is random is meaningless... asking if the digits of pi are random could be meaningful, but you have to be careful about how you're defining "random".
The term "random" has more distinct mathematical definitions than can be entirely healthy, as well as a bucketload of colloquial definitions... and any discussion that's not careful (eg: this one) will tend to have people talking at crosspurposes about different definitions of the word (or even one person conflating two or more definitions in the same argument). It's rather like "infinity" in that way... because of its colloquial use, it tends to get thrown around carelessly, when it's actually quite a complicated subject if you want to take it seriously. The fact that normality involves asymptotic frequencies, and thus both randomness and infinity, doesn't help much.
For instance: assume pi is normal for now. Would you call it random?
With how much number theory is known at the moment, we can't say anything useful about the, say, 10^{100}th digit of pi. It could be anything fro 0 to 9, we don't know. Would you call it random? And once we calculate it, that alone won't tell us a whole lot about the 10^{100}+1st digit... it could still be anything. Certainly seems random to me.
But it's only about as random as this... in that once we've calculated it, we know what it is... and if we calculate it again, we'll get the same result.
But then again, if all we knew about getRandomNumber() was that the result was chosen from a dice roll, then it would be random the first time we called it... but calling it repeatedly wouldn't give us a random sequence. But now I'm drifting away from randomness and towards Bayesian probability...
The usual definition of a "random sequence" is a sequence of numbers that, even knowing how it's generated, and all the numbers up to this point, it's impossible to predict the next value in the sequence. The digits of pi clearly fail at this test, since knowing how it's generated is theoretically enough to calculate all the remaining digits.
There are weakenings of this definition, like "cryptographically random" and "pseudorandom", which replace "impossible" with "computationally hard" and "unlikely to be noticed by the average observer", respectively... and the digits of pi probably fall under one or the other of these, depending on the specifics. The digits of champernowne's constant wouldn't, however.
Note that it doesn't make sense to ask if a specific number is "random"... only a certain way of generating numbers. Asking if pi is random is meaningless... asking if the digits of pi are random could be meaningful, but you have to be careful about how you're defining "random".
Code: Select all
enum ಠ_ಠ {°□°╰=1, °Д°╰, ಠ益ಠ╰};
void ┻━┻︵╰(ಠ_ಠ ⚠) {exit((int)⚠);}
Re: irrational numbers as typewriting monkeys
gmalivuk wrote:This is your problem. There's no such thing as "infinitely far in". Wherever you end up, you scroll to the right a large enough finite distance and you'll see what number you're on. Then you can predict the rest of the expansion without any problem.
Well, put it this way...imagine you pick a point a large finite distance in, without knowing the exact distance, and start reading back digits in a stream. It seems it would take a long time (the length of the 'incrementing numbers' at that point, before you could start predicting future digits (and even then, not with certainty). Of course, the 'period' is growing to infinity slower than the 'number of digits in', but that doesn't change the fact that it should be eventually "random".
gmalivuk wrote:No finite such set of digits exists. Because of your any given length criterion, we could just pick a length that's one more than the finite length of the string, and be out of luck.troyp wrote:can someone give me an example of a (preferably finite) set of digits with equallylikely occurrence of any given sequence of any given length
If you had 5 digits and you looked at all substrings of length 6 (empty set), every substring would be equally likely. That's what I meant by that part. Attempting brevity last night, I worded this request very badly. I thought people would know what I meant. Looking at it now, even I don't know what I meant. Well, I meant, "can someone show me a finite string of digits obeying this property which is clearly not random, so I can see what difference there would be", but I'm not sure exactly how someone would do that with a finite string.
phlip wrote:The problem with this thread is that noone's defined what they mean when they say "random" (and I'm guilty of this too).
...and me, in particular. I knew I was doing it, but I figured since I was only trying to convey my intuitive reasoning to others, it was okay  but it's not.
phlip wrote:The usual definition of a "random sequence" is a sequence of numbers that, even knowing how it's generated, and all the numbers up to this point, it's impossible to predict the next value in the sequence.
gmalivuk wrote:Or I can give you a number, and you can tell me the sequence of digits starting that many places beyond the decimal.
Yes. Maybe "locally random" would have been a better phrase. I was thinking of looking at a substring of digits, without knowing the index it began at.
Re: irrational numbers as typewriting monkeys
troyp wrote:If you had 5 digits and you looked at all substrings of length 6 (empty set), every substring would be equally likely. That's what I meant by that part. Attempting brevity last night, I worded this request very badly. I thought people would know what I meant. Looking at it now, even I don't know what I meant. Well, I meant, "can someone show me a finite string of digits obeying this property which is clearly not random, so I can see what difference there would be", but I'm not sure exactly how someone would do that with a finite string.
The definition of normal would include what you want as a special case.
troyp wrote:phlip wrote:The problem with this thread is that noone's defined what they mean when they say "random" (and I'm guilty of this too).
...and me, in particular. I knew I was doing it, but I figured since I was only trying to convey my intuitive reasoning to others, it was okay  but it's not.
The problem is that most people's intuitions of "random" and "specific number" are mutually exclusive. That is why you have to be precise.
A good general way to think about random things is that you can't talk about randomness without having a probability distribution in mind at least implicitly. So you can use the decimal expansion of pi to define a probability distribution for finite strings of digits of any length. Our belief is that the resulting distribution is welldefined, and all strings of n digits are equally likely. But we lack proofs of both assertions. Furthermore this is not a useful practical technique to use. (Not that practicality has been important in this discussion.)
Some of us exist to find out what can and can't be done.
Others exist to hold the beer.
Re: irrational numbers as typewriting monkeys
I was going to respond to btilly, but I've decided not to. All I seem to be doing here is making a farce of this thread. I was trying to communicate my intuitive reasoning, hoping people would 'get what I mean' and be able to help me get a grip on the issue  but it's very difficult to convey this. A big part of the problem is that I don't even have a clear understanding of some of the concepts I'm using intuitively: randomness is a deep mystery and something I've never really understood.
I'm still interested in the normal/random connection, though. jestingrabbit's post earlier seemed to discuss the kind of thing I wanted to know, but I know nothing about ergodic theory and I could only understand half of what he said.
So let me replace the vague, general questions with a more specific one.
* Suppose we have a black box containing a megadegacomputer which calculates the digits of Champernowne's number very quickly. It takes a very large number N (something made with a bunch of Knuth Arrows or Busy Beaver functions, that kind of large number), and indexes the sequence of digits with it. The black box ouputs the digits from this point on as a stream.
* N is assumed to be so large that noone would ever be able to use log_{10}N digits of the output stream.
* The person receiving this output knows the digits are from Champernowne's number, but doesn't know the index.
* Question: Can we make any kind of predictions about future digits of the output?
* If so, what kind of information can we obtain about their distribution and how many digits would we have to analyse to do this?
I'm still interested in the normal/random connection, though. jestingrabbit's post earlier seemed to discuss the kind of thing I wanted to know, but I know nothing about ergodic theory and I could only understand half of what he said.
So let me replace the vague, general questions with a more specific one.
* Suppose we have a black box containing a megadegacomputer which calculates the digits of Champernowne's number very quickly. It takes a very large number N (something made with a bunch of Knuth Arrows or Busy Beaver functions, that kind of large number), and indexes the sequence of digits with it. The black box ouputs the digits from this point on as a stream.
* N is assumed to be so large that noone would ever be able to use log_{10}N digits of the output stream.
* The person receiving this output knows the digits are from Champernowne's number, but doesn't know the index.
* Question: Can we make any kind of predictions about future digits of the output?
* If so, what kind of information can we obtain about their distribution and how many digits would we have to analyse to do this?
 skeptical scientist
 closedminded spiritualist
 Posts: 6139
 Joined: Tue Nov 28, 2006 6:09 am UTC
 Location: San Francisco
Re: irrational numbers as typewriting monkeys
Sure we can, we just have to wait a while. Even if we know nothing about N other than the fact that it is very large, once the string starts repeating itself from the beginning for a large period, we can predict, with a high degree of certainty, all subsequent digits. Of course, to get to that point, we will have to have already seen more than N many digits, which presumably will never happen. So essentially, this isn't terribly different from your black box just being a random number generator.
However, if we knew something about N, we could make stronger predictions. For example, suppose we knew that N was a very large number that happened to be of some specific type (e.g. a power of 2). Depending on our normal number, we could make accurate predictions based on this information, because if you take a normal number and select just the bits that fall on powers of two, the result isn't necessarily normal. That's one reason why it's a bad idea to conflate "normal" with "random"  random numbers are all normal (at least in the sense that the nonnormal numbers have measure 0), but not all normal numbers should be considered random.
However, if we knew something about N, we could make stronger predictions. For example, suppose we knew that N was a very large number that happened to be of some specific type (e.g. a power of 2). Depending on our normal number, we could make accurate predictions based on this information, because if you take a normal number and select just the bits that fall on powers of two, the result isn't necessarily normal. That's one reason why it's a bad idea to conflate "normal" with "random"  random numbers are all normal (at least in the sense that the nonnormal numbers have measure 0), but not all normal numbers should be considered random.
I'm looking forward to the day when the SNES emulator on my computer works by emulating the elementary particles in an actual, physical box with Nintendo stamped on the side.
"With math, all things are possible." —Rebecca Watson
"With math, all things are possible." —Rebecca Watson
Re: irrational numbers as typewriting monkeys
skeptical scientist wrote:Sure we can, we just have to wait a while. Even if we know nothing about N other than the fact that it is very large, once the string starts repeating itself from the beginning for a large period, we can predict, with a high degree of certainty, all subsequent digits. Of course, to get to that point, we will have to have already seen more than N many digits, which presumably will never happen. So essentially, this isn't terribly different from your black box just being a random number generator.
(I think the second N here should be the period)
Yes, we're assuming the period is so large we'll never exceed it (I'm trying to imagine the eventual behaviour).
Whether the black box is basically a random number generator is exactly what I want to know. It seems to me it should be, but I wanted to check with others.
skeptical scientist wrote:However, if we knew something about N, we could make stronger predictions.
I can see that knowledge of N will screw things up.
(One of the things I failed to explain earlier was that I was thinking of the digits at a 'local' level, without associating them with an absolute index).
 Xanthir
 My HERO!!!
 Posts: 4860
 Joined: Tue Feb 20, 2007 12:49 am UTC
 Location: The Googleplex
 Contact:
Re: irrational numbers as typewriting monkeys
skeptical scientist wrote:That's one reason why it's a bad idea to conflate "normal" with "random"  random numbers are all normal (at least in the sense that the nonnormal numbers have measure 0), but not all normal numbers should be considered random.
I still want to correct you with "random numbers are almost all normal". That phrasing accurately conveys what you expressed parenthetically.
I can easily express random numbers that are trivially not normal. For example, take any base10 number that consists of a random sequence of 1s and 0s. This number is entirely "random", in that it cannot be compressed further. However, it is obviously not normal, because it doesn't contain an equal number of all ten digits 09, even in the limit.
(defun fibs (n &optional (a 1) (b 1)) (take n (unfold '+ a b)))
 skeptical scientist
 closedminded spiritualist
 Posts: 6139
 Joined: Tue Nov 28, 2006 6:09 am UTC
 Location: San Francisco
Re: irrational numbers as typewriting monkeys
Xanthir wrote:skeptical scientist wrote:That's one reason why it's a bad idea to conflate "normal" with "random"  random numbers are all normal (at least in the sense that the nonnormal numbers have measure 0), but not all normal numbers should be considered random.
I still want to correct you with "random numbers are almost all normal". That phrasing accurately conveys what you expressed parenthetically.
No, I wrote what I meant. "Random" numbers should not have measure 0 properties of interest. So the random numbers are all normal, since the nonnormal numbers are nonrandom.
I can easily express random numbers that are trivially not normal. For example, take any base10 number that consists of a random sequence of 1s and 0s. This number is entirely "random", in that it cannot be compressed further.
It can certainly be compressed further, since the amount of bits of information contained in the first N base 10 digits is N, whereas the amount of information contained in the first N digits of a random number is N*log_{2}(10). If you want to use "random" to mean "incompressible", then it's certainly the case that all random numbers are normal.
I'm looking forward to the day when the SNES emulator on my computer works by emulating the elementary particles in an actual, physical box with Nintendo stamped on the side.
"With math, all things are possible." —Rebecca Watson
"With math, all things are possible." —Rebecca Watson
 Xanthir
 My HERO!!!
 Posts: 4860
 Joined: Tue Feb 20, 2007 12:49 am UTC
 Location: The Googleplex
 Contact:
Re: irrational numbers as typewriting monkeys
skeptical scientist wrote:I can easily express random numbers that are trivially not normal. For example, take any base10 number that consists of a random sequence of 1s and 0s. This number is entirely "random", in that it cannot be compressed further.
It can certainly be compressed further, since the amount of bits of information contained in the first N base 10 digits is N, whereas the amount of information contained in the first N digits of a random number is N*log_{2}(10). If you want to use "random" to mean "incompressible", then it's certainly the case that all random numbers are normal.
I see what you mean, and am stupid. Just read the 'random' number I provided as a Base 2 number, convert it to base 10, and you've achieved compression.
I'm not entirely certain about your last sentence, but I suspect it to be true. It seems like any departure from normality would introduce a bias that could be used for compression.
(defun fibs (n &optional (a 1) (b 1)) (take n (unfold '+ a b)))
 skeptical scientist
 closedminded spiritualist
 Posts: 6139
 Joined: Tue Nov 28, 2006 6:09 am UTC
 Location: San Francisco
Re: irrational numbers as typewriting monkeys
To prove it, you use the fact that the incompressible reals are the same as the MartinLof random reals, which are defined in terms of "tests". A test is a measure 0 set presented in a certain way, and a MartinLof random real is one which is not a member of any test. So an incompressible real is not a member of any test by the above equivalence, and nonnormal numbers form a test. (They are a measure 0 set, and you can present the set of nonnormal numbers in the needed way to make it a test.) Therefore incompressible numbers are all normal.
The proof that incompressible reals are the same as MartinLof random reals goes basically as you would imagine. One direction is proved by showing that if a real is part of a test, you can use that fact to compress it. The other direction is proved by showing that "compressibility" itself works as a test.
The proof that incompressible reals are the same as MartinLof random reals goes basically as you would imagine. One direction is proved by showing that if a real is part of a test, you can use that fact to compress it. The other direction is proved by showing that "compressibility" itself works as a test.
I'm looking forward to the day when the SNES emulator on my computer works by emulating the elementary particles in an actual, physical box with Nintendo stamped on the side.
"With math, all things are possible." —Rebecca Watson
"With math, all things are possible." —Rebecca Watson
 chickendude
 Posts: 33
 Joined: Fri Jun 15, 2007 12:48 am UTC
 Location: Massachusetts
 Contact:
Re: irrational numbers as typewriting monkeys
http://www.drmikesmaths.com/pisearch.html
This will give you an idea of how effective your strategy is.
Over the first 32 million digits of pi, 7 letter words have just about a 0% chance of showing up.
You would need to store a shitload of pi.
Nipunn
This will give you an idea of how effective your strategy is.
Over the first 32 million digits of pi, 7 letter words have just about a 0% chance of showing up.
You would need to store a shitload of pi.
Nipunn
Re: irrational numbers as typewriting monkeys
skeptical scientist wrote:Xanthir wrote:skeptical scientist wrote:That's one reason why it's a bad idea to conflate "normal" with "random"  random numbers are all normal (at least in the sense that the nonnormal numbers have measure 0), but not all normal numbers should be considered random.
I still want to correct you with "random numbers are almost all normal". That phrasing accurately conveys what you expressed parenthetically.
No, I wrote what I meant. "Random" numbers should not have measure 0 properties of interest. So the random numbers are all normal, since the nonnormal numbers are nonrandom.
Then what you meant is wrong. Any "random" number will necessarily be in a set of measure 0  namely in the set consisting of just that number. Therefore it is impossible for a random number to not have some properties of measure 0.
This is, of course, sidestepping the fact that we cannot algorithmically actually pick a random real number. Which is why in probability theory we sidestep that question and use the language of measure theory. Which quickly leads to drawing the important distinction between "all" and "almost all".
skeptical scientist wrote:I can easily express random numbers that are trivially not normal. For example, take any base10 number that consists of a random sequence of 1s and 0s. This number is entirely "random", in that it cannot be compressed further.
It can certainly be compressed further, since the amount of bits of information contained in the first N base 10 digits is N, whereas the amount of information contained in the first N digits of a random number is N*log_{2}(10). If you want to use "random" to mean "incompressible", then it's certainly the case that all random numbers are normal.
Nothing is incompressible. It all depends on your algorithm. For the classic example, I can compress The Mona Lisa to one bit  just have a compression algorithm that uses its first bit to say whether you've got the Mona Lisa, and then uses the rest to encode the picture in a jpeg format.
If this example seems contrived then you need to remember that lossless compression is entirely a game of encoding messages you expect to see efficiently at the cost of making messages you don't expect take more room. The amount of compression you get is therefore always the result of the interaction between the compression algorithm and the message to be compressed.
skeptical scientist wrote:To prove it, you use the fact that the incompressible reals are the same as the MartinLof random reals, which are defined in terms of "tests". A test is a measure 0 set presented in a certain way, and a MartinLof random real is one which is not a member of any test. So an incompressible real is not a member of any test by the above equivalence, and nonnormal numbers form a test. (They are a measure 0 set, and you can present the set of nonnormal numbers in the needed way to make it a test.) Therefore incompressible numbers are all normal.
Now I understand you better. For those who don't understand what MartinLöf random means, here is a definition. We say that a set of real numbers X is of constructive size 0 iff there is a computer algorithm A such that for every n, A will produce a covering of X of total length less than 2^{n}. A number is MartinLöf random if it is in no set of constructive size 0. But constructive size 0 sets are all of measure 0, and there are only a countable number of algorithms, so the set of numbers that are not MartinLöf random has measure 0. Therefore almost all numbers are MartinLöf random.
Therefore when you are saying "incompressible reals" you are talking about ones which are incompressible by any algorithm expressible on a computer.
Of course as a closet constructivist I have serious qualms about the "existence" of such numbers. And if you do take them seriously, then I reiterate that an idealized algorithm for picking a random number will be overwhelmingly likely to pick a MartinLöf random number, but it must be possible to pick one that isn't. Hence again the distinction probability theorists draw between "impossible" and "of probability 0".
Some of us exist to find out what can and can't be done.
Others exist to hold the beer.
 Xanthir
 My HERO!!!
 Posts: 4860
 Joined: Tue Feb 20, 2007 12:49 am UTC
 Location: The Googleplex
 Contact:
Re: irrational numbers as typewriting monkeys
btilly wrote:skeptical scientist wrote:I can easily express random numbers that are trivially not normal. For example, take any base10 number that consists of a random sequence of 1s and 0s. This number is entirely "random", in that it cannot be compressed further.
It can certainly be compressed further, since the amount of bits of information contained in the first N base 10 digits is N, whereas the amount of information contained in the first N digits of a random number is N*log_{2}(10). If you want to use "random" to mean "incompressible", then it's certainly the case that all random numbers are normal.
Nothing is incompressible. It all depends on your algorithm. For the classic example, I can compress The Mona Lisa to one bit  just have a compression algorithm that uses its first bit to say whether you've got the Mona Lisa, and then uses the rest to encode the picture in a jpeg format.
If this example seems contrived then you need to remember that lossless compression is entirely a game of encoding messages you expect to see efficiently at the cost of making messages you don't expect take more room. The amount of compression you get is therefore always the result of the interaction between the compression algorithm and the message to be compressed.
This is incorrect, because of a slight abuse of terminology. We speak of a given message being compressed into a smaller one, but in actuality the 'compressed' message must be thought of as the message + algorithm. This is why your "Mona Lisa Compressor" isn't actually an efficient compression algorithm  feeding it the Mona Lisa results in a (final message + algorithm) length larger than the Mona Lisa itself. After all, that "one bit" you have at the end *cannot* be decompressed into the Mona Lisa without the compression algorithm that produced it.
(Of course, we can amortize the cost of a common compression algorithm to effectively zero by using it for all compressions, but that's an irrelevant detail here.)
So yes, things are incompressible. In fact, almost all things are incompressible. The only 'compression algorithm' that will shorten the length of the message is one that encodes the message itself within it (or at least parts of it), thus making the (final message + algorithm) larger than the message itself.
(defun fibs (n &optional (a 1) (b 1)) (take n (unfold '+ a b)))
Re: irrational numbers as typewriting monkeys
I would really like to see a room full of monkeys and typewriters.
Wanna hear the truth? Life is downright ok.
 skeptical scientist
 closedminded spiritualist
 Posts: 6139
 Joined: Tue Nov 28, 2006 6:09 am UTC
 Location: San Francisco
Re: irrational numbers as typewriting monkeys
btilly wrote:Then what you meant is wrong. Any "random" number will necessarily be in a set of measure 0  namely in the set consisting of just that number. Therefore it is impossible for a random number to not have some properties of measure 0.
That's why I referred the properties "of interest". It's true that any number is in some measure 0 set, but usually that set won't even be definable. The set of nonnormal numbers, on the other hand, is easily described.
Nothing is incompressible. It all depends on your algorithm. For the classic example, I can compress The Mona Lisa to one bit  just have a compression algorithm that uses its first bit to say whether you've got the Mona Lisa, and then uses the rest to encode the picture in a jpeg format.
If this example seems contrived then you need to remember that lossless compression is entirely a game of encoding messages you expect to see efficiently at the cost of making messages you don't expect take more room. The amount of compression you get is therefore always the result of the interaction between the compression algorithm and the message to be compressed.
I was assuming a fixed computable compression scheme to begin with. In this case, most strings are incompressible. Obviously if you allow the compression scheme to vary, you can compress everything, but that's not very interesting or relevant.
Anyways, once you read the rest of my post it became clear that I think there are "random" and "nonrandom" strings, and that you can distinguish one from the other by their regularities. While not the probabilistic viewpoint, it captures the intuition that a string like "01010101010101010101..." is somehow different from "01100101110101100101..." and has been made rigorous by Per MartinLof among others.
I'm looking forward to the day when the SNES emulator on my computer works by emulating the elementary particles in an actual, physical box with Nintendo stamped on the side.
"With math, all things are possible." —Rebecca Watson
"With math, all things are possible." —Rebecca Watson
 Yakk
 Poster with most posts but no title.
 Posts: 10879
 Joined: Sat Jan 27, 2007 7:27 pm UTC
 Location: E pur si muove
Re: irrational numbers as typewriting monkeys
I was assuming a fixed computable compression scheme to begin with. In this case, most strings are incompressible. Obviously if you allow the compression scheme to vary, you can compress everything, but that's not very interesting or relevant.
Or, to look at it another way  the number of bits required to say "which compression algorithm am I using" ends up growing fast enough that even if you have algorithms that are fast for any one given image, the description that says "oh, and by the way, we compressed it using algorithm 7" ends up pushing you into the 'fail' territory.
Now, most information theory doesn't talk about a given string, but rather families of strings. And the impact of choosing a given compression algorithm is at most constant  you define your compression algorithm to be a complete turing machine, encode the description of your rival machine at the start of the compressed contents, then stick the data at the end of it. The rival compression algorithm is fixed in size, regardless of the size of the thing you are trying to compress...
Now, as reals are not finite strings, an incompressable (up to a finite factor) real .. is incompressable in any scheme. There is, in fact, no way to say which real it is you are talking about in a finite amount of bits, at least not in a way that lets you generate the decimal representation of that real (or recognize, in a particular sense). Which makes the existence proof very nonconstructive!
So yes, things are incompressible. In fact, almost all things are incompressible. The only 'compression algorithm' that will shorten the length of the message is one that encodes the message itself within it (or at least parts of it), thus making the (final message + algorithm) larger than the message itself.
Well, if you restrict "things" to "finite strings", then to be fair, the algorithm:
If there is a 00 at the start of the data, start the result with a 0.
If there is not, start the result with a 1, and then write out the data.
is a really simple compression algorithm that compresses 25% of all data by 1 bit, and extends the other 75% by 1 bit.
75% isn't "almost all".
You can demonstrate that, for a given fixed algorithm, at most half of all strings up to a particular length can be compressed (pigeonhole), but I don't think you can give a better bound on how good compression can be?
One of the painful things about our time is that those who feel certainty are stupid, and those with any imagination and understanding are filled with doubt and indecision  BR
Last edited by JHVH on Fri Oct 23, 4004 BCE 6:17 pm, edited 6 times in total.
Last edited by JHVH on Fri Oct 23, 4004 BCE 6:17 pm, edited 6 times in total.
 skeptical scientist
 closedminded spiritualist
 Posts: 6139
 Joined: Tue Nov 28, 2006 6:09 am UTC
 Location: San Francisco
Re: irrational numbers as typewriting monkeys
To be more precise, I meant the set \{x \in 2^\omega : K(n)K(x _n) \text { is unbounded}\} has measure 0. (Here x_{n} denotes the string consisting of the first n bits of x, and K is prefixfree Kolmogorov complexity.)
I'm looking forward to the day when the SNES emulator on my computer works by emulating the elementary particles in an actual, physical box with Nintendo stamped on the side.
"With math, all things are possible." —Rebecca Watson
"With math, all things are possible." —Rebecca Watson
 Yakk
 Poster with most posts but no title.
 Posts: 10879
 Joined: Sat Jan 27, 2007 7:27 pm UTC
 Location: E pur si muove
Re: irrational numbers as typewriting monkeys
skeptical scientist wrote:To be more precise, I meant the set \{x \in 2^\omega : K(n)K(x _n) \text { is unbounded}\} has measure 0. (Here x_{n} denotes the string consisting of the first n bits of x, and K is prefixfree Kolmogorov complexity.)
So the complexity of the first n bits of x is bounded within a some distance of the complexity of the number n is required for something to be rejected.
Simple elements, such as "all zeros", fall in your set. In short, you need to have no more 'information' in the element than "the length of the sequence requested". So that would roughly be the set of elements that you can make a Turing machine T such that it is fed the number n, and output the first n digits of the number. Can I prove that? Hmm.
By the way, what measure? Ah, you want x to be a real number, so probably the standard measure on the reals.
One of the painful things about our time is that those who feel certainty are stupid, and those with any imagination and understanding are filled with doubt and indecision  BR
Last edited by JHVH on Fri Oct 23, 4004 BCE 6:17 pm, edited 6 times in total.
Last edited by JHVH on Fri Oct 23, 4004 BCE 6:17 pm, edited 6 times in total.
 Xanthir
 My HERO!!!
 Posts: 4860
 Joined: Tue Feb 20, 2007 12:49 am UTC
 Location: The Googleplex
 Contact:
Re: irrational numbers as typewriting monkeys
Yakk wrote:So yes, things are incompressible. In fact, almost all things are incompressible. The only 'compression algorithm' that will shorten the length of the message is one that encodes the message itself within it (or at least parts of it), thus making the (final message + algorithm) larger than the message itself.
Well, if you restrict "things" to "finite strings", then to be fair, the algorithm:
If there is a 00 at the start of the data, start the result with a 0.
If there is not, start the result with a 1, and then write out the data.
is a really simple compression algorithm that compresses 25% of all data by 1 bit, and extends the other 75% by 1 bit.
75% isn't "almost all".
You can demonstrate that, for a given fixed algorithm, at most half of all strings up to a particular length can be compressed (pigeonhole), but I don't think you can give a better bound on how good compression can be?
For a single fixed algorithm, sure, but that's hardly interesting, nor does it prove my point wrong. A single fixed algorithm can have its length amortized across the number of messages, which means in practice that it's "free". I made this point in my previous post. If you don't assume a single fixed algorithm, though, then your "compression algorithm" is quite poor indeed, as it doesn't compress a single message! (The algorithm obviously takes greater than 1 bit to express.)
(defun fibs (n &optional (a 1) (b 1)) (take n (unfold '+ a b)))
 Yakk
 Poster with most posts but no title.
 Posts: 10879
 Joined: Sat Jan 27, 2007 7:27 pm UTC
 Location: E pur si muove
Re: irrational numbers as typewriting monkeys
Xanthir wrote:For a single fixed algorithm, sure, but that's hardly interesting, nor does it prove my point wrong. A single fixed algorithm can have its length amortized across the number of messages, which means in practice that it's "free". I made this point in my previous post. If you don't assume a single fixed algorithm, though, then your "compression algorithm" is quite poor indeed, as it doesn't compress a single message! (The algorithm obviously takes greater than 1 bit to express.)
I can easily define my turing complete language such that any single fixed algorithm takes up 1 bit to describe. Heck, with a bit of work (heh), I think you might be able to do better than that. It gets annoying, however, as you have to define your turing complete language that expresses what is going on in strange ways.
This isn't me cheating. When you go and start defining things like "an algorithm", you either end up picking one special encoding of that algorithm for arbitrary reasons, or you write your poofs that allow any encoding that a machine that is turing complete can interpret.
Here:
http://en.wikipedia.org/wiki/Kolmogorov_complexity
You'll note that the theory is full of choosing representations, and it works regardless of what representation you pick.
One of the painful things about our time is that those who feel certainty are stupid, and those with any imagination and understanding are filled with doubt and indecision  BR
Last edited by JHVH on Fri Oct 23, 4004 BCE 6:17 pm, edited 6 times in total.
Last edited by JHVH on Fri Oct 23, 4004 BCE 6:17 pm, edited 6 times in total.
Re: irrational numbers as typewriting monkeys
skeptical scientist wrote:To be more precise, I meant the set \{x \in 2^\omega : K(n)K(x _n) \text { is unbounded}\} has measure 0. (Here x_{n} denotes the string consisting of the first n bits of x, and K is prefixfree Kolmogorov complexity.)
If that is meant to be an easy to understand restatement of what it means to not be MartinLöf random, then I think it is wrong. For example no point in the Cantor set is MartinLöf random because there is a simple program that, given n, will create a set that covers the entire Cantor set of length under 2^{n}. However there are points in the Cantor set that are not in your set.
Perhaps you have an extra K in there. Suppose you meant to say \{x \in 2^\omega : nK(x _n) \text {is unbounded}\}, but that is still not quite right because almost all real numbers will have strings of 0 bits of unlimited length, and when you hit one of those then you can write an algorithm that prints out the start and inserts in the easily computed tail. Therefore you expect there to be lengths at which the number can be compressed slightly.
(UPDATE: You can get rid of the "almost". If a number has a limit to how many 0's appear in a row in its binary expansion, you can use that fact to compress it. So for any k > 0 and any sequence of bits, there is a length n at which that sequence is compressible by at least k bits.)
What I think you really wanted to say something like \{x \in 2^\omega : \lim_{n \rightarrow \infty} n  K(x _n) = \infty\} That gets at the idea of a number that can be compressed by some amount from some point on. And it is a set of measure 0. However I don't know whether this is equivalent to not being MartinLöf random.
Some of us exist to find out what can and can't be done.
Others exist to hold the beer.
 skeptical scientist
 closedminded spiritualist
 Posts: 6139
 Joined: Tue Nov 28, 2006 6:09 am UTC
 Location: San Francisco
Re: irrational numbers as typewriting monkeys
Yakk wrote:skeptical scientist wrote:To be more precise, I meant the set \{x \in 2^\omega : K(n)K(x _n) \text { is unbounded}\} has measure 0. (Here x_{n} denotes the string consisting of the first n bits of x, and K is prefixfree Kolmogorov complexity.)
So the complexity of the first n bits of x is bounded within a some distance of the complexity of the number n is required for something to be rejected.
Simple elements, such as "all zeros", fall in your set. In short, you need to have no more 'information' in the element than "the length of the sequence requested". So that would roughly be the set of elements that you can make a Turing machine T such that it is fed the number n, and output the first n digits of the number. Can I prove that? Hmm.
By the way, what measure? Ah, you want x to be a real number, so probably the standard measure on the reals.
Oops, that was supposed to be \{x \in 2^\omega : nK(x _n) \text { is unbounded}\}. And it is equivalent to not being MartinLof random.
btilly, your idea for showing that all strings have nK(x_{n}) unbounded doesn't work for prefixfree complexity, because you can't just concatenate two programs; you have to specify where one program ends and the next begins. Since the first program has length roughly n, in general this takes something like log n bits, which is roughly the same as the length of the expected length of the 0 sequence, so it washes out your savings for strings which have zero sequences of about the length you would expect if they were generated randomly. However, the idea does work if you substitute the plain complexity C for the prefixfree complexity K. In fact nC(x_{n}) is unbounded for every infinite sequence x.
I'm looking forward to the day when the SNES emulator on my computer works by emulating the elementary particles in an actual, physical box with Nintendo stamped on the side.
"With math, all things are possible." —Rebecca Watson
"With math, all things are possible." —Rebecca Watson

 Posts: 3
 Joined: Sat Nov 08, 2008 5:52 pm UTC
Re: irrational numbers as typewriting monkeys
This reminds me of The Library of Babel by Jorge Luis Borges. The story is about a library (the universe) that contains books containing exactly the same number of characters with a space defined as a character. The Library contains books with every conceivable combination of n letters (a set value in the story, I can't remember the value off hand).
Who is online
Users browsing this forum: Exabot [Bot] and 7 guests