The Shortest String Containing all Permutations of n Symbols

For the discussion of math. Duh.

Moderators: gmalivuk, Moderators General, Prelates

User avatar
NathanielJ
Posts: 882
Joined: Sun Jan 13, 2008 9:04 pm UTC

The Shortest String Containing all Permutations of n Symbols

Postby NathanielJ » Thu Feb 17, 2011 6:14 am UTC

Here's an interesting (to me, anyway) combinatorics problem that has a reasonably intuitive answer that I can't for the life of me prove is correct or find a proof of.

What is the length of the shortest string that contains every permutation of the digits 1,2,...,n as substrings?

For example, when n = 2 a minimal string is 121, which contains each of "12" and "21" as substrings. When n = 3, a minimal string is 123121321, which contains each of "123", "132", "213", "231", "312", and "321" as substrings. The length of the shortest such string has been conjectured to be equal to[math]\sum_{k=1}^n k![/math]See the OEIS, MathExchange, StackOverflow and this PDF for similar posts by people asking about this question. Strangely, these are the only sources that I can find that mention this problem, and they all are from last year. Those posts all go into how to construct a string that by all rights seems optimal, and all of those sources conjecture that the given string is indeed of minimal length, but not one of them actually proves it. And I can't seem to prove it either, despite the fact that it looks like it should fall to a simple induction argument.

So my question is -- is the conjecture true? Is the conjectured "optimal" string of length [imath]\sum_{k=1}^n k![/imath] actually optimal?

I believe the reason the conjecture is more difficult to prove than would be expected is because there seems to be some sort of shift in behaviour of the optimal strings around n = 5 or n = 6, at which point it becomes infeasible to brute force what the optimal strings are, and the strings are so long as to be difficult to work with.

For example, via brute-force you can show that for n = 1,2,3,4 the optimal string has length 1,3,9,33, follows the construction given in the posts linked above, and furthermore is unique (up to requiring that the string starts with "123...n"). However, when n = 5 there are two optimal strings that start with "12345", and when n = 6 there are at least 96 distinct strings of the conjectured minimal length.

So does anyone have any insights on this problem or any other references that I'm not aware of? Cheers.
Homepage: http://www.njohnston.ca
Conway's Game of Life: http://www.conwaylife.com

User avatar
Charlie!
Posts: 2033
Joined: Sat Jan 12, 2008 8:20 pm UTC

Re: The Shortest String Containing all Permutations of n Sym

Postby Charlie! » Thu Feb 17, 2011 7:52 pm UTC

Hm. What properties could we expect this minimal string to have? Should we expect it to have no repetitions of length n? It *cannot* satisfy the rule that each substring of length n has every character, since then it would just be repetitions of one template. Can we find some rule governing how many "invalid" substrings the shortst string must have?
Some people tell me I laugh too much. To them I say, "ha ha ha!"

User avatar
NathanielJ
Posts: 882
Joined: Sun Jan 13, 2008 9:04 pm UTC

Re: The Shortest String Containing all Permutations of n Sym

Postby NathanielJ » Thu Feb 17, 2011 8:10 pm UTC

Charlie! wrote:Hm. What properties could we expect this minimal string to have? Should we expect it to have no repetitions of length n? It *cannot* satisfy the rule that each substring of length n has every character, since then it would just be repetitions of one template. Can we find some rule governing how many "invalid" substrings the shortst string must have?


The shortest string seems to be the one obtained in this way (I will demonstrate the construction for n = 4, but it extends naturally for higher n):

Start with "1234" and notice that if we tack a "1" on the end then we have added another permutation with only one extra symbol. Then adding "2" gives another permutation at the cost of only one more symbol, and so on until we get "1234123". But now we can't add another permutation to the string by adding only one symbol -- we have to add two symbols. So we get "123412314". And now we can go back to adding just one symbol 3 times to get "123412314231", which contains three more permutations. And then again we have to add 2 more symbols to add another permutation. And so on.

If we continue in this way naively using this "greedy" algorithm where we just add as few symbols at a time as possible, we end up with the string "123412314231243121342132413214321", which is in fact known to be the unique optimal string for n = 4.

For n = 4, the number of symbols that you need to add to the string to get one more permutation are as follows:

1 1 1 2 1 1 1 2 1 1 1 3 1 1 1 2 1 1 1 2 1 1 1

There is a clear pattern here that is not difficult to generalize, but the difficulty is proving that this pattern is indeed optimal in general -- it could be the case (for large n) that if we increase one of the early numbers (i.e., we don't use the greedy algorithm but rather add some extra length early on), it allows us to decrease later numbers by a larger factor, thus shortening the overall string.

We can simplify this a bit even. Notice that the optimal string for n = 4 can be broken into chunks of length 2n-1 = 7 as follows:

Code: Select all

123412314231243121342132413214321
1234123
     2314231
          3124312
                2134213
                     1324132
                          3214321


Each chunk has the following two properties:
(1) The middle digit of each chunk is n (=4 in this case).
(2) The first n-1 symbols of the chunk are the same as the last n-1 symbols of the chuck (including order) and are a permutation of the symbols 1,2,...,n-1.

If we can prove that the optimal string can always be broken down into chunks of length 2n-1 that satisfy those properties, then we can prove many nice things (such as the length of the optimal strings, and even a formula for how many optimal strings there are).
Homepage: http://www.njohnston.ca
Conway's Game of Life: http://www.conwaylife.com

User avatar
undecim
Posts: 289
Joined: Tue Jan 19, 2010 7:09 pm UTC
Contact:

Re: The Shortest String Containing all Permutations of n Sym

Postby undecim » Fri Feb 18, 2011 2:40 am UTC

Blue, blue, blue

skullturf
Posts: 556
Joined: Thu Dec 07, 2006 8:37 pm UTC
Location: Chicago
Contact:

Re: The Shortest String Containing all Permutations of n Sym

Postby skullturf » Fri Feb 18, 2011 3:10 am UTC

undecim wrote:http://en.wikipedia.org/wiki/De_Bruijn_sequence


Note that this includes all words of specified length, including those where symbols are repeated.

User avatar
NathanielJ
Posts: 882
Joined: Sun Jan 13, 2008 9:04 pm UTC

Re: The Shortest String Containing all Permutations of n Sym

Postby NathanielJ » Fri Feb 18, 2011 3:41 am UTC

How did I know that one of the first replies would be to the Wikipedia article on de Bruijn sequences?

The de Bruijn diagram approach doesn't work because there is no way to concatenate all of the permutations next to each other with overlap n-1 each. After you use n of the permutations, you run out of options for what the next symbol in the string should be.
Homepage: http://www.njohnston.ca
Conway's Game of Life: http://www.conwaylife.com

++$_
Mo' Money
Posts: 2370
Joined: Thu Nov 01, 2007 4:06 am UTC

Re: The Shortest String Containing all Permutations of n Sym

Postby ++$_ » Fri Feb 18, 2011 7:26 am UTC

The more general term for this kind of object is "universal sequence". (However, universal sequences are often forbidden from having repeats -- that is, each structure would have to appear exactly once in the sequence.) In general, not much is known about them, so any solution to this problem is probably worth publishing.

In a related problem, it has been proven that when n > 2, a universal cycle of length n! (that is, a universal sequence that eats its own tail) for permutations in [imath]S_n[/imath] must have at least n+1 distinct characters in its alphabet, and n+1 is achievable for all n.

User avatar
silverhammermba
Posts: 178
Joined: Fri Oct 13, 2006 1:16 am UTC

Re: The Shortest String Containing all Permutations of n Sym

Postby silverhammermba » Sun Feb 27, 2011 9:39 pm UTC

NathanielJ wrote:There is a clear pattern here that is not difficult to generalize, but the difficulty is proving that this pattern is indeed optimal in general -- it could be the case (for large n) that if we increase one of the early numbers (i.e., we don't use the greedy algorithm but rather add some extra length early on), it allows us to decrease later numbers by a larger factor, thus shortening the overall string.

Perhaps if we think about the number of "uses" of each character i.e. the number of unique permutations it is part of in the sequence. The greedy algorithm tries to maximize the use of each character, whereas the naive algorithm (concatenating all permutations) minimizes the use of each character. Some usage assignments would be impossible and perhaps we could find a restriction on the assignments that would force the string to be of the desired length.

For your n=3 example (123121321), the uses we get are
123222321
which also implies a pattern. If we can find restrictions on the usage patterns, it could provide a way to show that the uses cannot be more efficiently packed.

Here are the simple restrictions I've come up with so far:
1. The number of a character's uses is less than or equal to the number of length n substrings it is contained in
2. The total uses of a string must sum to n*n!

With those restrictions alone, for n=3 we would hope to get a string with the following uses
12333321
which is one character shorter than optimal, so hopefully we could find more restrictions on uses that would force the string to meet the hypothesized optimal length.

jtillots
Posts: 2
Joined: Wed Apr 20, 2011 1:59 pm UTC

Re: The Shortest String Containing all Permutations of n Sym

Postby jtillots » Wed Apr 20, 2011 2:09 pm UTC

I along with my math professor Dan Ashlock wrote a paper about this very topic back in 1993. We called the paper Construction of Small Superpermutations and Minimal Injective Superstrings. We found the length of the shortest strings to be the sum(k!) as k goes from 1 to n. Note there are also n! such strings since you can replace 1 with 2 and 2 with 1, etc. and still have a valid shortest string. We also found algorithms for deriving the shortest strings, but we couldn't prove that our strings were shortest. We used an exhaustive computer search to prove that the algorithm generated the shortest string up to 11, but that's as far as we went.

I've been trying for years to get a mathematician to take up this problem. I'm a computer scientist and cognitive scientist and don't know enough math to prove this. I've been told by many mathematicians that this problem is easy and they've generated the proof before, but no one can produce the proof. I think the problem is deceptively simple which causes mathematicians to blow it off.

I periodically check to see if anyone has done any research on this topic. Thanks for the post here! It made my day. I'd also be happy to send more information. I'm new to this forum, but ping me on here if you can or leave a message in this comment.

Jenett

User avatar
NathanielJ
Posts: 882
Joined: Sun Jan 13, 2008 9:04 pm UTC

Re: The Shortest String Containing all Permutations of n Sym

Postby NathanielJ » Thu Apr 21, 2011 6:08 am UTC

jtillots wrote:I along with my math professor Dan Ashlock...


Oh wow that's weird -- what a small world, Dan is a professor at my university (U of Guelph). I had no idea that he had worked on this problem!

jtillots wrote:We also found algorithms for deriving the shortest strings, but we couldn't prove that our strings were shortest. We used an exhaustive computer search to prove that the algorithm generated the shortest string up to 11, but that's as far as we went.


I would be very interested in seeing the algorithm for searching the space, as I've only been able to prove optimality up to n = 6; by n = 11 things are quite huge.

jtillots wrote:I've been told by many mathematicians that this problem is easy and they've generated the proof before, but no one can produce the proof. I think the problem is deceptively simple which causes mathematicians to blow it off.


Yep, that's been my experience with the problem too. I even found one mathematics competition that contained the problem, and then the "solutions" just hand-waved it by showing that a string of that length always exists, but they didn't prove optimality.

Anyway, I would love to see any information about this problem that you have (including the paper that you co-authored with Dan :)). You can e-mail me at njohns01(AT)uoguelph.ca if you don't want to post them here. Thanks, and all the best. :)
Homepage: http://www.njohnston.ca
Conway's Game of Life: http://www.conwaylife.com

jtillots
Posts: 2
Joined: Wed Apr 20, 2011 1:59 pm UTC

Re: The Shortest String Containing all Permutations of n Sym

Postby jtillots » Fri Jul 27, 2012 5:02 pm UTC

I think I heard about that mathematics competition. I was talking to someone at a party about this problem and they told me about a math competition where this was one of the problems. I got super excited and asked what the proof was, he said, "Oh, I don't remember. But it was simple." *sigh*

I'm sorry it's taken me so long to reply to you. I don't frequent these forums very often - only when I'm looking for more info about super permutations. :)

I've sent you some email. I'd love to see this problem move forward.

Jenett

tomtom2357
Posts: 563
Joined: Tue Jul 27, 2010 8:48 am UTC

Re: The Shortest String Containing all Permutations of n Sym

Postby tomtom2357 » Sat Jul 28, 2012 3:50 pm UTC

If these results are true, then you have an upper bound on the string, and you're conjecturing that the correct value is that upper bound. An easy lower bound is n!+n-1, but this can be improved slightly to n!+n for n>2, because after the first n symbols (which without loss of generality we can define them to be 123...n) one of the next n symbols must not make a new permutation. I am going to try to extend this in some way to make the lower bound something of the form n!+O((n-1)!).

Edit: I have found a better lower bound on the length of the optimal string. I can prove that at least n!+(n-1)!+n-2 symbols are needed, by using the argument that for every n permutations in the string, there is at least one symbol after them in the string which make the sub-string of all n symbols behind that one, including itself, not be a new permutation. The n-1 comes from the fact that the first n-1 symbols do not add any new permutations without the later symbols. The -1 comes from the fact the last permutation does not have any symbols after it, therefore there is no non-permutation sub-string ending in a symbol after that. This lower bound is exact only for n<=3, so I might have to change my line of reasoning from here.

Edit 2: I realized how I can improve my bound on the number of symbols needed. If the number of symbols used is exactly the lower bound, then there are exactly n permutations for every symbol that does not make a new permutation (not including the first n-1 symbols). Therefore, assuming that the string starts with 123...n, then we can generate the sequence. Setting n=4, and generating the sequence the way mentioned above gives 1234123142312431234..., which is clearly repetitive, and does not give every permutation. Therefore, there must be a corrective symbol in the string that makes the string non-repetitive, so the shortest string for n=4 is 33. I am working on generalizing this to give a lower bound of the form n!+(n-1)!+O((n-2)!). I can see that if I keep thinking, I can probably come up with bounds that contain arbitrarily many factorials in it, so all I need is an inductive proof to prove that the lower bound is the upper bound.
I have discovered a truly marvelous proof of this, which this margin is too narrow to contain.

User avatar
PM 2Ring
Posts: 3615
Joined: Mon Jan 26, 2009 3:19 pm UTC
Location: Mid north coast, NSW, Australia

Re: The Shortest String Containing all Permutations of n Sym

Postby PM 2Ring » Sun Jul 29, 2012 4:16 am UTC

FWIW, Martin Gardner wrote about a closely related problem that used circular lists rather than linear ones. In his article, the puzzle was described in terms of different coloured beads on a bracelet. Sorry I can't remember any details, I read the article several decades ago. :)

EDIT: After a bit of Googling, it appears that the bracelet in Gardner's puzzle only used beads of two colours, and it contains all possible bit strings upto a certain length, so that's probably not very helpful. Oh well.

User avatar
dudiobugtron
Posts: 1098
Joined: Mon Jul 30, 2012 9:14 am UTC
Location: The Outlier

Re: The Shortest String Containing all Permutations of n Sym

Postby dudiobugtron » Thu Aug 02, 2012 1:55 am UTC

NathanielJ wrote:it looks like it should fall to a simple induction argument.


I imagine the induction step would be something like:

Since there are (n+1)! permutations, there must be at least [size of min string for n] + (n+1)! characters in a minimal string for n+1, otherwise we could do some fancy trick and remove (n+1)! characters and have a minimal string for the digits 1,...,n, which would be a contradiction.

I haven't thought about it too much though but if I had more time that is the way I'd try to go about it.
Image

User avatar
dudiobugtron
Posts: 1098
Joined: Mon Jul 30, 2012 9:14 am UTC
Location: The Outlier

Re: The Shortest String Containing all Permutations of n Sym

Postby dudiobugtron » Thu Aug 02, 2012 12:13 pm UTC

Yup. I think this is the outline of a viable proof, although I have made more than a few appeals to 'obviousness' which people are free to dispute!!! I've spoiler tagged it because it's a bit long and has a few sum symbols.:

Spoiler:
Base case: done, by others above.

Background:
Consider the permutations of the digits 1,...,n+1. There are (n+1)! of these. If you remove all of the digits 'n+1' from each permutation, you will end up with n+1 copies of each of the n! permutations of the digits of 1,...,n. (One copy for each of the n+1 positions the 'n+1' digit could occupy.)

Consider a string, S_(n+1), containing all (n+1)! permutations of the digits 1,...,n+1.
Each digit 'n+1' in the string can be a part of at most n+1 of these permutations (since the permutations are of length n+1), so the minimum number of 'n+1' digits in the string is (n+1)! / (n+1) = n!
If you remove all of the digits 'n+1', the string will then (as above) contain at least n+1 copies of each of the n! permutations of the digits 1,...,n.
It seems apparent that we can safely remove n.n! digits (corresponding to the n.n! extra copies) and still have all n! of the permutations of the digits 1,...,n contained in the string.
Removing these n.n! digits, along with the (at least) n! 'n+1' digits gives a total of n! + n.n! = (n+1).n! = (n+1)!

Therefore:

Lemma: Given a string S_(n+1) containing all permutations of the digits 1,...,n+1 as substrings, it is possible to remove (n+1)! digits and be left with a string which contains all of the permutations of the digits 1,...,n as substrings.

Induction step: Assume that for every string, S_n, containing every permutation of the digits 1,...,n as substrings, the length of S_n is at least:
[math]\sum_{k=1}^n k![/math]
Now, assume that there is a string, S(n+1), which contains every permutation of the digits 1,...,n+1 as substrings, and that the length of S_(n+1) is less than:
[math]\sum_{k=1}^{n+1} k! = \sum_{k=1}^n k! + (n+1)![/math]
Using the above Lemma, we can remove (n+1)! of the digits and still have a string which contains all of the permutations of the digits 1,...,n as substrings. But the length of this string is less than:
[math]\sum_{k=1}^n k![/math]
Which is a contradiction. Therefore any string, S(n+1), which contains every permutation of the digits 1,...,n+1 as substrings, must be of length at least:
[math]\sum_{k=1}^{n+1} k![/math]

/proof
Image

CCC
Posts: 46
Joined: Thu Jun 14, 2012 12:29 pm UTC

Re: The Shortest String Containing all Permutations of n Sym

Postby CCC » Thu Aug 02, 2012 2:55 pm UTC

dudiobugtron wrote:If you remove all of the digits 'n+1', the string will then (as above) contain at least n+1 copies of each of the n! permutations of the digits 1,...,n.


I think that's not quite right. Consider the sequence of digits 41234. This contains two unique length-four sequences (4123 and 1234), but removing the 4s leaves us with only one copy of the base length-three sequence.

This only applies to the case where (n+1) is before and after the same sequence of every other digit; so the string as a whole will still contain at least n copies of each of the n! permutations.

tomtom2357
Posts: 563
Joined: Tue Jul 27, 2010 8:48 am UTC

Re: The Shortest String Containing all Permutations of n Sym

Postby tomtom2357 » Thu Aug 02, 2012 3:02 pm UTC

CCC wrote:
dudiobugtron wrote:If you remove all of the digits 'n+1', the string will then (as above) contain at least n+1 copies of each of the n! permutations of the digits 1,...,n.


I think that's not quite right. Consider the sequence of digits 41234. This contains two unique length-four sequences (4123 and 1234), but removing the 4s leaves us with only one copy of the base length-three sequence.

This only applies to the case where (n+1) is before and after the same sequence of every other digit; so the string as a whole will still contain at least n copies of each of the n! permutations.

I think that given his assumption that the string contains all (n+1)! permutations of the digits 1 to n+1, then if you remove all the n+1's there is at least n+1 of each permutation of the digits from 1 to n. I agree with everything except:
dudiobugtron wrote:It seems apparent that we can safely remove n.n! digits (corresponding to the n.n! extra copies) and still have all n! of the permutations of the digits 1,...,n contained in the string.
Removing these n.n! digits, along with the (at least) n! 'n+1' digits gives a total of n! + n.n! = (n+1).n! = (n+1)!

If you could prove this, I would be happy with your proof.
I have discovered a truly marvelous proof of this, which this margin is too narrow to contain.

Deedlit
Posts: 91
Joined: Sun Mar 08, 2009 2:55 am UTC

Re: The Shortest String Containing all Permutations of n Sym

Postby Deedlit » Thu Aug 02, 2012 11:10 pm UTC

tomtom2357 wrote:
CCC wrote:
dudiobugtron wrote:If you remove all of the digits 'n+1', the string will then (as above) contain at least n+1 copies of each of the n! permutations of the digits 1,...,n.


I think that's not quite right. Consider the sequence of digits 41234. This contains two unique length-four sequences (4123 and 1234), but removing the 4s leaves us with only one copy of the base length-three sequence.

This only applies to the case where (n+1) is before and after the same sequence of every other digit; so the string as a whole will still contain at least n copies of each of the n! permutations.

I think that given his assumption that the string contains all (n+1)! permutations of the digits 1 to n+1, then if you remove all the n+1's there is at least n+1 of each permutation of the digits from 1 to n.


But I think, because of CCC's objection, that statement remains unproven. There are n+1 permutations of length n+1 that collapse to the same permutation of length n, but as CCC said, two such permutations could be the same permutation when you remove n+1.

I agree with everything except:
dudiobugtron wrote:It seems apparent that we can safely remove n.n! digits (corresponding to the n.n! extra copies) and still have all n! of the permutations of the digits 1,...,n contained in the string.
Removing these n.n! digits, along with the (at least) n! 'n+1' digits gives a total of n! + n.n! = (n+1).n! = (n+1)!

If you could prove this, I would be happy with your proof.


I agree that this is a problem.

User avatar
dudiobugtron
Posts: 1098
Joined: Mon Jul 30, 2012 9:14 am UTC
Location: The Outlier

Re: The Shortest String Containing all Permutations of n Sym

Postby dudiobugtron » Thu Aug 02, 2012 11:41 pm UTC

Deedlit wrote:
tomtom2357 wrote:
CCC wrote:
dudiobugtron wrote:If you remove all of the digits 'n+1', the string will then (as above) contain at least n+1 copies of each of the n! permutations of the digits 1,...,n.


I think that's not quite right. Consider the sequence of digits 41234. This contains two unique length-four sequences (4123 and 1234), but removing the 4s leaves us with only one copy of the base length-three sequence.

This only applies to the case where (n+1) is before and after the same sequence of every other digit; so the string as a whole will still contain at least n copies of each of the n! permutations.

I think that given his assumption that the string contains all (n+1)! permutations of the digits 1 to n+1, then if you remove all the n+1's there is at least n+1 of each permutation of the digits from 1 to n.


But I think, because of CCC's objection, that statement remains unproven. There are n+1 permutations of length n+1 that collapse to the same permutation of length n, but as CCC said, two such permutations could be the same permutation when you remove n+1.


It's obvious that any two permutations which share the same n+1 digit will 'contract' to two distinct permutations like I want them to. I had been assuming that all the interesting cases were of this sort.
However, I also think CCC is right, that the issue is when you have two permutations that share all their digits except n+1. I'm thinking that these situations would be the result of 'excess' n+1 digits (ie: more than the 'n!' minimum I assumed in my 'proof'), which can be removed to compensate for the lost permutations. Also, you can really remove all occurrences of any digit you like (it doesn't have to be n+1, since the digits are all interchangible). If all of the digits had 'excess' occurrences like this, then the string probably isn't minimal. I know that's not a proof though!


tomtom2357 wrote:I agree with everything except:
dudiobugtron wrote:It seems apparent that we can safely remove n.n! digits (corresponding to the n.n! extra copies) and still have all n! of the permutations of the digits 1,...,n contained in the string.
Removing these n.n! digits, along with the (at least) n! 'n+1' digits gives a total of n! + n.n! = (n+1).n! = (n+1)!

If you could prove this, I would be happy with your proof.


Yeah I wasn't happy with that part either.
I realised after posting the proof that you don't actually have to prove that you can remove the digits, you merely have to prove that the *size* of a string which contains p more permutations than a 'minimum string' is atleast p greater. That's still not that easy either (even though it seems intuitively correct), so I'll think about it some more (unless someone beats me to it!)
Image

CCC
Posts: 46
Joined: Thu Jun 14, 2012 12:29 pm UTC

Re: The Shortest String Containing all Permutations of n Sym

Postby CCC » Fri Aug 03, 2012 7:39 am UTC

dudiobugtron wrote:It's obvious that any two permutations which share the same n+1 digit will 'contract' to two distinct permutations like I want them to. I had been assuming that all the interesting cases were of this sort.
However, I also think CCC is right, that the issue is when you have two permutations that share all their digits except n+1. I'm thinking that these situations would be the result of 'excess' n+1 digits (ie: more than the 'n!' minimum I assumed in my 'proof'), which can be removed to compensate for the lost permutations.


Hmmm. Let's take a look at the earlier-provided length-4 sequence:

123412314231243121342132413214321

In this case, (n+1)=4.

Now, there's no case where a pair of sequences share every number but 4 (the 4s are all spaced just a little too widely for that). And every number except 4 does indeed occur enough times to make up for the lost permutations. But let us consider briefly the consequence of removing the threes:

123412314231243121342132413214321

Note the sequence "34123" near the start. Let's look at the 412's after removing all the 3's:

1241214212412142124121421

There are only three such sequences, less than (n+1). (In fact, since there are only six fours, there's only three of each possible three-unique-digit sequence).

Hmmm. That sequence is... usefully repetitive. After the first eight digits, it simply repeats itself. Since it has the minimum possible number of 4s (i.e. six of them), I don't think there's any option; it does have to include at least n of every sequence, n=3, and the sequences 412 and 421 cannot use the same 4. (Similarly for the pairs (241; 142) and (214; 124)).

User avatar
NathanielJ
Posts: 882
Joined: Sun Jan 13, 2008 9:04 pm UTC

Re: The Shortest String Containing all Permutations of n Sym

Postby NathanielJ » Thu Aug 09, 2012 11:03 am UTC

Be careful when assuming that the optimal string *must* have a certain form based on the optimal strings in the n = 1, 2, 3, 4 cases. While there are indeed exactly (n-1)! instances of the digit n for those cases, and there is tons of symmetry to be found, that breaks down for n >= 5. When n = 5, I have two fundamentally different optimal strings: one that has 24 '5's and is symmetric like we would hope... and then one that has 27 '5's and isn't symmetric.
Homepage: http://www.njohnston.ca
Conway's Game of Life: http://www.conwaylife.com

User avatar
dudiobugtron
Posts: 1098
Joined: Mon Jul 30, 2012 9:14 am UTC
Location: The Outlier

Re: The Shortest String Containing all Permutations of n Sym

Postby dudiobugtron » Tue Sep 04, 2012 8:23 pm UTC

I've been thinking about this for a while (on-and-off!), and although I haven't gotten anywhere, I thought I'd at least share my approaches in case they help anyone else do something more useful.

The main idea I was trying was to look at the underlying string of *permutations* of n symbols, namely p_1, p_2, ..., p_(n!) , ordered by their first (and only?) appearance in the main string.

I was looking at the case where the underlying string could be partitioned into (n-1)! sets of n permutations, {p_(in),...,p_((i+1)n-1)}, where each p_(jn) differed from p_((j-1)n) by merely moving the starting symbol to the end (and of course similarly for each p_(in) and p_((i+1)n-1)). I was trying to prove that finding the 'best' way for ordering these sets of partitions was suitably analogous to finding the best way for ordering the permutations of (n-1) symbols. And also of course that such a process gave a best possible ordering for the partitions of n symbols.


Another related idea I tried, based on this idea as well:
NathanielJ wrote:For n = 4, the number of symbols that you need to add to the string to get one more permutation are as follows:

1 1 1 2 1 1 1 2 1 1 1 3 1 1 1 2 1 1 1 2 1 1 1

was to give a value to each of those permutations based on how many symbols needed to be added to put that permutation in the string. For the example above, this would be:

4 1 1 1 2 1 1 1 2 1 1 1 3 1 1 1 2 1 1 1 2 1 1 1

For n=4 and below, it is easily observable that the number of permutations where this number is 1 or greater is 4!, the number of permutations where it is 2 or greater is 3!, etc etc... ending up with 4! + 3! + 2! + 1! symbols needed.
So I guess if you could prove that a solution with this property would always be as short as possible, maybe by tying it in with the ideas above, then you'd be set.

(You can also see how this illustrates the above model (6 sets of 4 'nearby' permutations, each preceded by a bigger gap).)



Like I said, nothing concrete or necessarily useful, but on the chance that it may help someone I thought it was worth posting.
Image

Jorel
Posts: 1
Joined: Wed Aug 27, 2014 5:01 pm UTC

Re: The Shortest String Containing all Permutations of n Sym

Postby Jorel » Wed Aug 27, 2014 6:29 pm UTC

I couldn't help noting that each of the optimal strings posted here is palindromic. That is to say, each can be read either from left to right or from right to left with the same result:

121
123121321
123412314231243121342132413214321

Many years of recreational mathematics lead me to suspect that this will be the case for all optimal strings of this type. Does anybody happen to know whether my hunch is correct?

Meanwhile, I have been working (on and off for the last fifteen years or so) on a similar problem, which is that of producing every n-digit permutation in base b. For example, there are four optimal strings in base two containing every possible two-digit permutation exactly once:

00110
11001
01100
10011

The main difference between my problem and the one being discussed here is that my optimal strings contain repeating digits. As base b increases, the number of possible optimal strings increases hyper-exponentially. In base 3 there are 216 possible solutions. In base ten, one of many, many solutions is the following 101-digit string:

00110221203323130443424140554535251506656463626160776757473727170887868584838281809989796959493929190

This string contains every single two-digit permutation from 00 to 99 in base ten, without repetition. (It is not pallindromic, of course, as no palindromic solution could possibly exist, for reasons that are not too difficult to work out.)

Moole
Posts: 93
Joined: Sun Jan 08, 2012 9:52 pm UTC

Re: The Shortest String Containing all Permutations of n Sym

Postby Moole » Fri Aug 29, 2014 2:16 am UTC

Well, I was googling about this a little, and I found out that the value conjectured in this thread (and elsewhere) is false. So, I guess that makes this problem even harder, since we don't even have a bound to try to prove.

Prior to knowing this, I spent some time thinking and came up with a novel way to look at the problem. I couldn't think of what to do next, but...

Spoiler:
What I'd do is to look at every occurrence of a given element. We will call this element n. For every permutation P of n-1, there must be at least one position where P immediately precedes n and one where n immediately precedes n. So, it makes sense to take the nth element and consider the string containing the n-1 preceding and succeeding elements. i.e. in 123121321, one has the substrings 12312 and 21321. Removing the middle "3" from each of these yields the tableaux

Code: Select all

1212
2121

Or, if we, in 123121321 swapped 2s and 3s

Code: Select all

 121
2112
121

What we should note here is that, any two consecutive columns of the tableaux contain every permutation of 2, representing permutations of the form xx3, x3x, and 3xx. More generally, in any tableaux, any string of n-1 columns must contain every permutation of n-1 elements. We will refer to the right side of the tableaux as the rightmost n-1 elements (which succeeded the instance of the character n) and the left side as the leftmost n-1 elements (which preceded it).

Given the tableaux (in order), we can reconstruct the relevant string (less any elements not used in any permutation) just by reinserting the n into the middle of each row, and putting successive rows together (with as much overlap as possible), which means we can work with them instead of strings. Further, removal of any cell from the tableaux always decreases the length of the string - so every row must be necessary in an optimal solution.

We start by looking at a bit of a less trivial tableaux than the above. In particular, take the optimal string 412341243124132414231421342143214, make the tableaux

Code: Select all

   123
123124
412312
312132
132142
241231
231213
213214
421321
321   

Since we only care about permutations of 3 appear in the tableaux, we can safely remove any references to "4" from the tableaux, along with any elements that would use the four in any substring of length 3 (i.e. the row "132142" can have the last two elements removed):

Code: Select all

   123
12312
 12312
312132
1321 
  1231
231213
21321
 21321
321   

Notice that, if a row ends with spaces, the next row starts with the same amount. This would be true of any tableaux reduced like so. More strongly, in fact, if the right side of a row is string A followed by k spaces, the left side of the next row is k spaces followed by A (since, for instance, the pair of rows "1231/1231" represents "123414231", so the "1" on the right of the first row is the same one as on the left of the second row).
Mathematical hangover (n.): The feeling one gets in the morning when they realize that that short, elementary proof of the Riemann hypothesis that they came up with at midnight the night before is, in fact, nonsense.

robinhouston
Posts: 3
Joined: Thu Feb 27, 2014 9:28 am UTC

Re: The Shortest String Containing all Permutations of n Sym

Postby robinhouston » Tue Sep 09, 2014 5:33 pm UTC

Hello! Funny to see a link to my paper here while idly browsing the forum. It seems that lurkers like me are not allowed to post links, so you’ll have to Google for the things I wanted to link to.

The first interesting development since the main burst of activity on this thread was the discovery by Benjamin Chaffin of eight different length-153 superpermutations on five symbols, with a (machine) proof that there are no shorter ones. Nathaniel Johnston – the OP of this thread –has details on his blog.

That emboldened me to try to disprove the minimal-length conjecture for n=6, which I succeeded in doing.

I think in a way this makes our quest easier, since we can stop wasting effort trying to prove something that is not true; but it means the true picture is more complicated than some people had hoped.

I think the most obvious low-hanging fruit now is to prove (or disprove!) that length 872 is minimal for n=6. If anyone has access to a decent compute cluster, it should be pretty easy to hit it with Concorde (the TSP solver) till it breaks. The input file you need is 6.tsp in the list of ancillary files I included with the paper.

User avatar
PM 2Ring
Posts: 3615
Joined: Mon Jan 26, 2009 3:19 pm UTC
Location: Mid north coast, NSW, Australia

Re: The Shortest String Containing all Permutations of n Sym

Postby PM 2Ring » Wed Sep 10, 2014 3:26 pm UTC

Here's the link to the article on Robin Houston's blog.
Tackling the Minimal Superpermutation Problem
Last edited by PM 2Ring on Thu Sep 18, 2014 12:22 am UTC, edited 1 time in total.

nadando
Posts: 33
Joined: Wed Apr 21, 2010 8:33 pm UTC

Re: The Shortest String Containing all Permutations of n Sym

Postby nadando » Fri Sep 12, 2014 5:20 am UTC

Doesn't finding a short length-5 sequence automatically imply the existence of shorter sequences of length 6 and up?

robinhouston
Posts: 3
Joined: Thu Feb 27, 2014 9:28 am UTC

Re: The Shortest String Containing all Permutations of n Sym

Postby robinhouston » Fri Sep 12, 2014 12:25 pm UTC

Doesn't finding a short length-5 sequence automatically imply the existence of shorter sequences of length 6 and up?

Indeed it would, but Ben Chaffin showed that there is no short superpermutation on 5 symbols: all the minimal ones have length 153, which is the same length as the recursively constructed one, i.e. 1! + 2! + 3! + 4! + 5!.

The existence of a shorter-than-conjectured superpermutation on 6 symbols implies that there are shorter-than-conjectured superpermutations for all n>6 too.

korona
Posts: 495
Joined: Sun Jul 04, 2010 8:40 pm UTC

Re: The Shortest String Containing all Permutations of n Sym

Postby korona » Mon Sep 15, 2014 8:15 pm UTC

I wrote a short conversion of the Hamiltonian path problem (see Robin Houston's paper on arXiv) to (Max-)SAT. The encoding uses roughly (n!)^2 variables, so it won't scale to large n but it may be possible to tackle the n = 6 case with this approach. However in order to do that we need to encode more domain specific knowledge. One starting point would be the following question: Can we remove edges with weight n from the graph without changing the weight of the minimal Hamiltonian path? Does any solution for n = 5 use such an edge?

robinhouston
Posts: 3
Joined: Thu Feb 27, 2014 9:28 am UTC

Re: The Shortest String Containing all Permutations of n Sym

Postby robinhouston » Mon Sep 15, 2014 9:09 pm UTC

Interesting! It would be great if we could dispose of n=6, and going via SAT is a reasonable idea.

I haven’t proved it, but it seems plausible that it’s safe to remove edges of weight n. There are no known minimal superpermutations that use these. If removing them works, it would be worth trying to prove.

There is a minimal superpermutation at n=5 that uses an edge of weight 4, namely the “obvious” recursive one. The structure is like this (read down the columns):

Code: Select all

12345   25341   ...     31542   51243   ...     21534   51324   13254   35214
23451   53412   31425   15423   ...     21345   15342   ...     32541   52143
34512   ...     14253   54231   24315   13452   53421   32415   25413   ...
45123   41235   42531   ...     43152   34521   ...     24153   54132   14325
51234   12354   25314   ...     31524   45213   42135   41532   ...     43251
...     23541   53142   31245   15243   52134   21354   15324   ...     32514
23415   35412   ...     12453   52431   ...     13542   53241   32145   25143
34152   54123   14235   24531   ...     13425   35421   ...     21453   51432
41523   ...     42351   45312   43125   34251   54213   24135   14532   ...
15234   ...     23514   53124   31254   42513   ...     41352   45321   43215
52341   23145   35142   ...     12543   25134   ...     13524   53214   32154
...     31452   51423   12435   25431   51342   13245   35241   ...     21543
34125   14523   ...     24351   54312   ...     32451   52413   21435   15432
41253   45231   42315   43512   ...     34215   24513   ...     14352   54321
12534   52314   23154   35124   ...     42153   45132   41325   43521

In the middle there’s an edge from 54312 to 21345, which has weight 4.

On the other hand, the heaviest edge in the shortest-known superpermutation for n=6 has weight 3. If you could establish a bound even assuming all edges have weight ≤3, that would still be pretty interesting I think!


Return to “Mathematics”

Who is online

Users browsing this forum: No registered users and 12 guests