## Randomness Mining


untitled
Posts: 39
Joined: Wed Jan 29, 2014 9:23 pm UTC

### Randomness Mining

I considered putting this topic in the linguistics forum (it would be the most appropriate, but it didn't seem to be the place), mathematics (but we are not concerned with the algorithms or parameters... yet), computer science (doesn't need to halt), or coding (it's implementation independent... or is it?). So, here we are. If a mod has an idea for a more fruitful location for the topic - please move it.

The inspiration for this idea was the Global Consciousness Project, which is one of those kinds of "we must be stupid not to try it" ideas, like SETI. People set up random number generators called EGGs all over the planet to see if major human events (witnessed en masse, provoking emotions and most often tragic) alter the random numbers generated by one or more of the EGGs.

The idea, in a nutshell: if we take the output of a random number generator (true or pseudo, doesn't matter at this point) and look for matches from a dictionary of words - what would we find?
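A minimal sketch of the idea in Python, assuming the simplest possible encoding (one uniformly random letter per draw) and a tiny hypothetical word list, `TOY_WORDS`, standing in for a real dictionary. The function names are illustrative only; nothing here comes from the Global Consciousness Project.

```python
import random
import string

def random_letters(n, seed=0):
    """Return n uniformly random uppercase letters from a seeded PRNG."""
    rng = random.Random(seed)
    return "".join(rng.choice(string.ascii_uppercase) for _ in range(n))

def find_words(stream, words):
    """Return sorted (position, word) pairs for every occurrence of a word."""
    hits = []
    for word in words:
        start = stream.find(word)
        while start != -1:
            hits.append((start, word))
            # Step by one so overlapping occurrences are also reported.
            start = stream.find(word, start + 1)
    return sorted(hits)

# Hypothetical toy dictionary; a real run would load a full word list.
TOY_WORDS = {"CAT", "DOG", "SUN", "POTATO"}

stream = random_letters(100_000, seed=42)
hits = find_words(stream, TOY_WORDS)
print(len(hits), hits[:5])
```

Swapping `random_letters` for a hardware source, or `TOY_WORDS` for another language's word list, leaves the scanning step unchanged, which is why the encoding question comes first.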

Applications are many: find the optimal parameters for the most lyrical of poetry; spiritism; divination; unbiased source of words for psychological experiments; unbiased source of words for public art; searching for long sequences of stuff and then describing it as pseudorandom generator algorithm + parameters + seed (useful for compression)... a whole smorgasbord of applications, each one more useless than the next - which would also mean it can have deep educational utility.
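The "algorithm + parameters + seed" application can be sketched as a brute-force seed search. This is only an illustration under stated assumptions: Python's `random.Random` stands in for whatever generator would actually be used, and `letters_from_seed` and `find_seed` are hypothetical helper names.

```python
import random
import string

def letters_from_seed(seed: int, n: int) -> str:
    """Regenerate the first n letters produced by a given seed."""
    rng = random.Random(seed)
    return "".join(rng.choice(string.ascii_uppercase) for _ in range(n))

def find_seed(target: str, max_seed: int = 1_000_000):
    """Return the first seed whose output starts with target, or None."""
    for seed in range(max_seed):
        if letters_from_seed(seed, len(target)) == target:
            return seed
    return None

seed = find_seed("HI")
print(seed, letters_from_seed(seed, 2) if seed is not None else None)
```

The catch for compression is that the expected number of seeds to try grows as 26 to the power of the target length, so the seed that reproduces a long text will usually take about as many bits to write down as the text itself.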

The issue which requires scientific rigor and I need your assistance with is one of linguistics/semantics/semiotics, namely what should we look for. Of course, we are looking for words. Words which are composed of a variable length sequence of letters (are you sure we should not be looking for syllables?). But how are those letters encoded? Would Cthulhu use ASCII, Unicode, Morse alphabet or Baudot code? Or just use 5 bits to store the characters of the Latin alphabet? But how would those be ordered - is the first one A or E (most frequent letter in most languages)? Maybe Hebrew Kabbalah, where letters and words are supposed to have a tighter connection? Should we really be searching for letters? Why not just encode each word with a number and search for meaningful phrases? But which should be the First Word? "Word"? Oh dear.
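To make the encoding question concrete, here is a small sketch of the "just use 5 bits" option with two letter orderings. The frequency ordering `BY_FREQUENCY` is an assumed, commonly quoted approximation for English, and dropping the six unused 5-bit values is one arbitrary choice among several.

```python
import string

ALPHABETICAL = string.ascii_uppercase
# Assumed approximate English letter-frequency order (an illustration only).
BY_FREQUENCY = "ETAOINSHRDLCUMWFGYPBVKJXQZ"

def decode_5bit(data: bytes, alphabet: str) -> str:
    """Read data 5 bits at a time and map each value to a letter.

    Values 26..31 have no letter assigned and are silently dropped.
    """
    bits = "".join(f"{byte:08b}" for byte in data)
    out = []
    for i in range(0, len(bits) - 4, 5):
        value = int(bits[i:i + 5], 2)
        if value < len(alphabet):
            out.append(alphabet[value])
    return "".join(out)

sample = bytes([0b00000000, 0b01000100])
print(decode_5bit(sample, ALPHABETICAL))  # ABC
print(decode_5bit(sample, BY_FREQUENCY))  # ETA
```

The same bits yield different "words" under each ordering, so any hit rate measured against a dictionary is a property of the encoding as much as of the source.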

We are thinking in terms of words and letters because we use them to describe (translate?) the world: objects, actions and characteristics. We are not even touching the subject of languages - before that we need to ask: do rules exist for what a word is? Some languages have words with more than 5 consonants next to each other - while others have words formed only by vowels (do diphthongs and triphthongs even merit a separate notation?). What about new words and arbitrary words (like proper names)? Language depends on the human body - both for emitting and perceiving - but what about abstract ideas? Are they built upon the body (like some ancient ambitions) or are they independent of human context? Does it have anything to do with entropy (article: url)? Would it be better if we searched for patterns (something independent of "body") and then brute-forced that?

I wish to wiggle the whip efficiently, and therefore want to solve this issue of "general encoding" (for lack of a better term) before taking up the issues of generator source (cosmic X-ray sources, cosmic background radiation, 'classical' radiation, weather, cascade noise, Brownian motion, jitter, news feeds, twitter, many types of pseudorandom generators etc.) and other technicalities (randomness extraction/whitening). Sure, we could look at "all" the possible encodings with "all" the possible preprocessings of "all" the available sources - and I will most certainly attempt that - but only after we settle on what "all the possible encodings" means... till then, as long as your idea pertains to randomness mining, you are more than welcome to post it in this topic.

PS: Saw a forgettable documentary once - a US secretary of state asked the Dalai Lama if random number generators had consciousness. The Dalai Lama answered that if you believe that it is conscious, it is conscious.

PS2: Sorry for the grammar.

Soupspoon
You have done something you shouldn't. Or are about to.
Posts: 2484
Joined: Thu Jan 28, 2016 7:00 pm UTC
Location: 53-1

### Re: Randomness Mining

Interesting. Not (from what I've read, so far) based in reality, but interesting nonetheless.

The project makes me think of Paul the Octopus. Though the scale of the problem would likely need many more EGGs to be reasonably likely to get something like a 'Paul' from their many EGGs.

(Pondering your further points. If egosearch worked for new things, this would perhaps be best termed a PTW, but at the moment it's more an expression of interest.)

ucim
Posts: 5568
Joined: Fri Sep 28, 2012 3:23 pm UTC
Location: The One True Thread

### Re: Randomness Mining

untitled wrote:[re: the original inspiration,] which is one of those kinds of "we must be stupid not to try it" ideas...
Uh.... no. There is no reason to believe the underlying hypothesis (that human attention affects random processes coherently). If a correlation is found, the most likely explanation would be "this thing you thought was 'random', isn't". It's an example of something that would look like the No True Scotsman fallacy not being a fallacy, since it literally begs the question of what "random" means.

As to the (OP) proposed experiment:
untitled wrote:The idea, in a nutshell: if we take the output of a random number generator (true or pseudo, doesn't matter at this point) and look for matches from a dictionary of words - what would we find?
The issue which requires scientific rigor and I need your assistance with is one of linguistics/semantics/semiotics, namely what should we look for. Of course, we are looking for words

What is the question you are attempting to answer? What is the theory which you are testing? What is the purpose of this exercise?

Since you are talking about words, what are words? Well, consider how they came about - as differentiated grunts associated with meaning (that is usually context-sensitive). Is your focus to be on the set of grunts or on the set of encapsulated meanings? And are you focusing on random selections, or on the (putative) difference between true randomness and the output of these RNGs? (If the latter, how will you be able to tell the difference if "true randomness" isn't random? And if the former, why do you suppose that actual randomness has any correlation with (for example) lyric poetry?)

If you are looking for an unbiased source of words for {whatever}, you need to pick from a subset, and usually your picks will need to be weighted based on the actual usage of that subset in the context that is important for a particular experiment or subject group. The hard part is the weighting, not the picking.
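A short sketch of what weighting the picks could look like, using `random.choices` from the Python standard library. The counts in `USAGE_COUNTS` are made up for illustration; real weights would have to come from usage statistics for the relevant subject group, which is the hard part mentioned above.

```python
import random

# Hypothetical usage counts; real weights would come from a corpus.
USAGE_COUNTS = {"the": 5000, "potato": 40, "smorgasbord": 1}

def pick_words(counts, k, seed=None):
    """Draw k words with probability proportional to their usage count."""
    rng = random.Random(seed)
    words = list(counts)
    weights = [counts[w] for w in words]
    return rng.choices(words, weights=weights, k=k)

print(pick_words(USAGE_COUNTS, 10, seed=1))
```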

untitled wrote:[I] want to solve this issue of "general encoding" (for lack of a better term) before taking up the issues of generator source...
I suspect you'll find that there is no "general encoding". Especially if you don't specify exactly what it is that is being encoded, and why that is the thing you are looking for.

If you list words at random, you will find random words. This will (usually) be a representative sample of the corpus. You may be able to infer from this that the corpus contains, say, more "happiness" words, or more "complexity" words... or something like that. You could then try to deduce the "mood" of the populace from this, but unless you filter it through actual usage stats rather than simple existence stats, you could easily be misled. And there's no reason not to examine the entire corpus anyway.

Google is already doing this kind of thing for the stuff it indexes.

If you want to create the "set of all possible words" and compare it with the existing set of words, you'll find that {population} favors {this kind of grunt}. But I don't see that this tells us something we don't already know (except perhaps that, over the entire history of humanity, nobody has used {that kind of grunt}, which is surprisingly easy to make, as a word). I suppose that could be an interesting result.

But if you look for patterns in randomness, you'll only find that either there are none (definition of randomness), or what you thought was random, isn't. In this case, either the RNGs are flawed, or the corpus of words itself is not random. We already know the latter is true; we can study this without adding noise.

Jose
Order of the Sillies, Honoris Causam - bestowed by charlie_grumbles on NP 859 * OTTscar winner: Wordsmith - bestowed by yappobiscuts and the OTT on NP 1832 * Ecclesiastical Calendar of the Order of the Holy Contradiction * Please help addams if you can. She needs all of us.

mcd001
Posts: 137
Joined: Tue Sep 09, 2014 7:27 pm UTC

### Re: Randomness Mining

I went to the linked website (http://www.global-mind.org) and read this:

"Our purpose is to examine subtle correlations that may reflect the presence and activity of consciousness in the world. We hypothesize that there will be structure in what should be random data, associated with major global events that engage our minds and hearts."

Regarding finding "structure in what should be random data": If you search random data looking for subtle correlations, you *will* find them.

Randomness being what it is, the truly astonishing thing would be NOT finding correlations.

ucim
Posts: 5568
Joined: Fri Sep 28, 2012 3:23 pm UTC
Location: The One True Thread

### Re: Randomness Mining

mcd001 wrote: If you search random data looking for subtle correlations, you *will* find them.
...but will you find enough of them to be convincing? Or to at least make a fortune in click-bait?

Jose

untitled
Posts: 39
Joined: Wed Jan 29, 2014 9:23 pm UTC

### Re: Randomness Mining

ucim wrote:Jose

Well, considering your obnoxious post, the issue is not one of corpus/dictionary (which, btw, does not even exist in reality as such - people invent and redefine things all the time) but of how to link the output of the RNG to any dictionary, in a manner that "loses" the least amount of bits of data. The dictionary is just a valid set of "structures." For six letter words YJNEES is not valid but POTATO is. Or is it? Could words even exist outside sentences (especially if we consider single words as being degenerate sentences)? That's what I am asking.

As I said, the grunts have a link to the encapsulated meaning: the body (or, more broadly, the context i.e. words explained by examples... examples ultimately pertaining to the body nonetheless). There is also the possibility of examples not pertaining to the body* but which can be applied (translated) to the body by brute force (e.g. checking all the language variants - the relationships would be more fitting for some alternatives than for others).

The fact that random words come out of more or less random generators should not be a problem yet because we are not at that stage. They would still be just "as random" as the generator is.** If you encounter the contours of a naked lady on a piece of rock lying on some lifeless planet - you couldn't say it's random or not random, you could only say that you can see it because you are a living human being with a working brain and sensory apparatus. Same with this "experiment." If you cannot understand this, please abstain from adding noise.

*see that entropy paper - that's why I linked it...
**maybe they would be more sparse (rare) than the output itself - one word or one sentence per X hours of running (in case of natural phenomena) or Z cycles of running (in case of PRNGs)

ijuin
Posts: 799
Joined: Fri Jan 09, 2009 6:02 pm UTC

### Re: Randomness Mining

Something like this will quickly run into the Library of Babel problem. Nearly all of it will be complete jumbles of characters that do not form legible sentences, and of the one-in-billions that do, nearly all will be nonsensical or non sequitur. Artificial intelligence is not yet at a stage where we could automate the filtering sufficiently to get rid of the garbage before sending it to human proofreaders, so you will still need infinite editorial staff to handle the output of infinite monkeys.
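A back-of-the-envelope sketch of how sparse the hits get, assuming independent, uniformly distributed letters over a 26-letter alphabet (an assumption, not a property of any particular source). The helper name `expected_hits` is illustrative.

```python
def expected_hits(word_length, stream_length, alphabet_size=26):
    """Expected occurrences of one specific word in a uniform random stream.

    Each of the (stream_length - word_length + 1) starting positions matches
    with probability 1 / alphabet_size ** word_length.
    """
    positions = max(stream_length - word_length + 1, 0)
    return positions / alphabet_size ** word_length

for k in (3, 6, 12):
    print(k, expected_hits(k, 10**9))
```

Under these assumptions a billion random letters contain a given six-letter string such as POTATO only about three times on average, and the rate falls by a further factor of 26 for every extra letter, which is why whole sentences are effectively out of reach.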

ucim
Posts: 5568
Joined: Fri Sep 28, 2012 3:23 pm UTC
Location: The One True Thread

### Re: Randomness Mining

untitled wrote:Well, considering your obnoxious post
What was obnoxious about it? I'm trying to figure out what it is you are looking for, because what you said doesn't really get there. For instance:
untitled wrote:...how to link the output of the RNG to any dictionary, in a manner that "loses" the least amount of bits of data
I have no clue as to what this even means.
untitled wrote:For six letter words YJNEES is not valid but POTATO is. Or is it?
There are no six letter words. There are however words that can be spelled with six letters. This is an important difference which I think your proposal fails to capture. And this isn't a gratuitous criticism - you were asking.

untitled wrote:If you encounter the contours of a naked lady on a piece of rock lying on some lifeless planet - you couldn't say it's random or not random, you could only say that you can see it because you are a living human being with a working brain and sensory apparatus.
It's not the seeing of it that's important, it's the recognition of it as resembling a naked lady, no? It could be argued that we are evolutionarily biased to recognize such shapes, it could be argued that those shapes are more common than random shapes due to erosion, cohesion, and such. Both are probably true. A question that would arise is "How often would random processes yield a shape that we would recognize as a naked lady?" Or, equally valid and more insightful, "How often....that we would find remarkable?" (avoiding the fallacy of coincidence).

untitled wrote:If you cannot understand this, please abstain from adding noise.
Now that is obnoxious. But as you request, I will exit.

Jose

untitled
Posts: 39
Joined: Wed Jan 29, 2014 9:23 pm UTC

### Re: Randomness Mining

ijuin wrote:Something like this will quickly run into the Library of Babel problem. Nearly all of it will be complete jumbles of characters that do not form legible sentences, and of the one-in-billions that do, nearly all will be nonsensical or non sequitur. Artificial intelligence is not yet at a stage where we could automate the filtering sufficiently to get rid of the garbage before sending it to human proofreaders, so you will still need infinite editorial staff to handle the output of infinite monkeys.

https://libraryofbabel.info/bookmark.cg ... k.xrws,197

(look around the middle of the "page")

ucim wrote:A question that would arise is "How often would random processes yield a shape that we would recognize as a naked lady?" Or, equally valid and more insightful, "How often....that we would find remarkable?" (avoiding the fallacy of coincidence).

Yes, that is the ultimate goal (e.g. find the random source/parameters for pseudorandom source that has the highest concentration of awe... or, at least, the highest concentration of words or coherent sentences). But, as I mentioned twice and you failed to notice, we are not there - at the moment we have to figure out what the optimal form to spell words is, preferably without relying on a human body (which would imply consonant-vowel balance, language etc.)

Thank you for abstaining in the future.
Last edited by untitled on Sun Jun 12, 2016 5:21 pm UTC, edited 1 time in total.

morriswalters
Posts: 6904
Joined: Thu Jun 03, 2010 12:21 am UTC

### Re: Randomness Mining

untitled wrote:The idea, in a nutshell: if we take the output of a random number generator (true or pseudo, doesn't matter at this point) and look for matches from a dictionary of words - what would we find?
Random strings of text. Which randomly produce words in those strings. If the idea is to test for global consciousness then why would you expect the RNGs to output words from any given language, since multiple languages are spoken globally? And if that isn't the point, what is?

Soupspoon
You have done something you shouldn't. Or are about to.
Posts: 2484
Joined: Thu Jan 28, 2016 7:00 pm UTC
Location: 53-1

### Re: Randomness Mining

untitled wrote:https://libraryofbabel.info/bookmark.cgi?kglkywcohks.c.rmk.xrws,197

(look around the middle of the "page")
Apart from the distinctly non-random bit, it's amazingly patterned. So many repetitions.

untitled
Posts: 39
Joined: Wed Jan 29, 2014 9:23 pm UTC

### Re: Randomness Mining

morriswalters wrote:
untitled wrote:The idea, in a nutshell: if we take the output of a random number generator (true or pseudo, doesn't matter at this point) and look for matches from a dictionary of words - what would we find?
Random strings of text. Which randomly produce words in those strings. If the idea is to test for global consciousness then why would you expect the RNGs to output words from any given language, since multiple languages are spoken globally? And if that isn't the point, what is?

Yes, that is exactly the point. The stage we are at (we, in this topic, as I don't have anybody to bug with this kind of stuff in real life...) is thinking about the most general way in which meaning could be expressed - preferably without relation to any language (try to understand what I wrote above about body, context and "examples").

If you find a pattern in meaning, you can brute force it (i.e. check whether the elements making up the sequence considered can be found in dictionaries for multiple languages - this is the easy part). The linked article is a good starting point, as it figures that "results indicate that despite the differences in the structure and vocabulary of the languages analyzed, the impact of word ordering in the structure of language is a statistical linguistic universal".* This is just one idea, one model. I am curious if anybody can think up something else while I am working on this.

*direct quote from the abstract... ffs

Soupspoon wrote:
untitled wrote:https://libraryofbabel.info/bookmark.cgi?kglkywcohks.c.rmk.xrws,197

(look around the middle of the "page")
Apart from the distinctly non-random bit, it's amazingly patterned. So many repetitions.

Yes, we can't say that the Library of Babel is a random source, as it contains all variations of everything. Still, I find it a fun and elegant solution worth mentioning.

Soupspoon
You have done something you shouldn't. Or are about to.
Posts: 2484
Joined: Thu Jan 28, 2016 7:00 pm UTC
Location: 53-1

### Re: Randomness Mining

untitled wrote:Yes, we can't say that the Library of Babel is a random source, as it contains all variations of everything. Still, I find it a fun and elegant solution worth mentioning.
So far as I can tell, what you have there is a partly-reversible hash function, somewhat like a rainbow table generator with an encoded 'hint' as to what desired reveal ought to actually emerge from the one-to-many. But that's without poking and prodding too much, so I could just be overthinking it. Something similar could be done using a PRNG character 'stream', index-referenceable, combining something as simple as an XORing reference to whichever 'page' produces the most compressible 'fit', not quite so much an uncomputable task if time limits are a problem.

(Reminds me of some things I've done myself, regarding simultaneously compressing and encrypting plaintext, though not presented in that manner. I'll read more of the bumf later, perhaps try to discern what might be parody and what might be misdirection.)