The Shortest String Containing all Permutations of n Symbols
Moderators: gmalivuk, Moderators General, Prelates
 NathanielJ
 Posts: 882
 Joined: Sun Jan 13, 2008 9:04 pm UTC
The Shortest String Containing all Permutations of n Symbols
Here's an interesting (to me, anyway) combinatorics problem that has a reasonably intuitive answer that I can't for the life of me prove is correct or find a proof of.
What is the length of the shortest string that contains every permutation of the digits 1,2,...,n as substrings?
For example, when n = 2 a minimal string is 121, which contains each of "12" and "21" as substrings. When n = 3, a minimal string is 123121321, which contains each of "123", "132", "213", "231", "312", and "321" as substrings. The length of the shortest such string has been conjectured to be equal to[math]\sum_{k=1}^n k![/math]See the OEIS, MathExchange, StackOverflow and this PDF for similar posts by people asking about this question. Strangely, these are the only sources that I can find that mention this problem, and they all are from last year. Those posts all go into how to construct a string that by all rights seems optimal, and all of those sources conjecture that the given string is indeed of minimal length, but not one of them actually proves it. And I can't seem to prove it either, despite the fact that it looks like it should fall to a simple induction argument.
So my question is  is the conjecture true? Is the conjectured "optimal" string of length [imath]\sum_{k=1}^n k![/imath] actually optimal?
I believe the reason the conjecture is more difficult to prove than would be expected is because there seems to be some sort of shift in behaviour of the optimal strings around n = 5 or n = 6, at which point it becomes infeasible to brute force what the optimal strings are, and the strings are so long as to be difficult to work with.
For example, via bruteforce you can show that for n = 1,2,3,4 the optimal string has length 1,3,9,33, follows the construction given in the posts linked above, and furthermore is unique (up to requiring that the string starts with "123...n"). However, when n = 5 there are two optimal strings that start with "12345", and when n = 6 there are at least 96 distinct strings of the conjectured minimal length.
So does anyone have any insights on this problem or any other references that I'm not aware of? Cheers.
What is the length of the shortest string that contains every permutation of the digits 1,2,...,n as substrings?
For example, when n = 2 a minimal string is 121, which contains each of "12" and "21" as substrings. When n = 3, a minimal string is 123121321, which contains each of "123", "132", "213", "231", "312", and "321" as substrings. The length of the shortest such string has been conjectured to be equal to[math]\sum_{k=1}^n k![/math]See the OEIS, MathExchange, StackOverflow and this PDF for similar posts by people asking about this question. Strangely, these are the only sources that I can find that mention this problem, and they all are from last year. Those posts all go into how to construct a string that by all rights seems optimal, and all of those sources conjecture that the given string is indeed of minimal length, but not one of them actually proves it. And I can't seem to prove it either, despite the fact that it looks like it should fall to a simple induction argument.
So my question is  is the conjecture true? Is the conjectured "optimal" string of length [imath]\sum_{k=1}^n k![/imath] actually optimal?
I believe the reason the conjecture is more difficult to prove than would be expected is because there seems to be some sort of shift in behaviour of the optimal strings around n = 5 or n = 6, at which point it becomes infeasible to brute force what the optimal strings are, and the strings are so long as to be difficult to work with.
For example, via bruteforce you can show that for n = 1,2,3,4 the optimal string has length 1,3,9,33, follows the construction given in the posts linked above, and furthermore is unique (up to requiring that the string starts with "123...n"). However, when n = 5 there are two optimal strings that start with "12345", and when n = 6 there are at least 96 distinct strings of the conjectured minimal length.
So does anyone have any insights on this problem or any other references that I'm not aware of? Cheers.
Re: The Shortest String Containing all Permutations of n Sym
Hm. What properties could we expect this minimal string to have? Should we expect it to have no repetitions of length n? It *cannot* satisfy the rule that each substring of length n has every character, since then it would just be repetitions of one template. Can we find some rule governing how many "invalid" substrings the shortst string must have?
Some people tell me I laugh too much. To them I say, "ha ha ha!"
 NathanielJ
 Posts: 882
 Joined: Sun Jan 13, 2008 9:04 pm UTC
Re: The Shortest String Containing all Permutations of n Sym
Charlie! wrote:Hm. What properties could we expect this minimal string to have? Should we expect it to have no repetitions of length n? It *cannot* satisfy the rule that each substring of length n has every character, since then it would just be repetitions of one template. Can we find some rule governing how many "invalid" substrings the shortst string must have?
The shortest string seems to be the one obtained in this way (I will demonstrate the construction for n = 4, but it extends naturally for higher n):
Start with "1234" and notice that if we tack a "1" on the end then we have added another permutation with only one extra symbol. Then adding "2" gives another permutation at the cost of only one more symbol, and so on until we get "1234123". But now we can't add another permutation to the string by adding only one symbol  we have to add two symbols. So we get "123412314". And now we can go back to adding just one symbol 3 times to get "123412314231", which contains three more permutations. And then again we have to add 2 more symbols to add another permutation. And so on.
If we continue in this way naively using this "greedy" algorithm where we just add as few symbols at a time as possible, we end up with the string "123412314231243121342132413214321", which is in fact known to be the unique optimal string for n = 4.
For n = 4, the number of symbols that you need to add to the string to get one more permutation are as follows:
1 1 1 2 1 1 1 2 1 1 1 3 1 1 1 2 1 1 1 2 1 1 1
There is a clear pattern here that is not difficult to generalize, but the difficulty is proving that this pattern is indeed optimal in general  it could be the case (for large n) that if we increase one of the early numbers (i.e., we don't use the greedy algorithm but rather add some extra length early on), it allows us to decrease later numbers by a larger factor, thus shortening the overall string.
We can simplify this a bit even. Notice that the optimal string for n = 4 can be broken into chunks of length 2n1 = 7 as follows:
Code: Select all
123412314231243121342132413214321
1234123
2314231
3124312
2134213
1324132
3214321
Each chunk has the following two properties:
(1) The middle digit of each chunk is n (=4 in this case).
(2) The first n1 symbols of the chunk are the same as the last n1 symbols of the chuck (including order) and are a permutation of the symbols 1,2,...,n1.
If we can prove that the optimal string can always be broken down into chunks of length 2n1 that satisfy those properties, then we can prove many nice things (such as the length of the optimal strings, and even a formula for how many optimal strings there are).
Re: The Shortest String Containing all Permutations of n Sym
undecim wrote:http://en.wikipedia.org/wiki/De_Bruijn_sequence
Note that this includes all words of specified length, including those where symbols are repeated.
 NathanielJ
 Posts: 882
 Joined: Sun Jan 13, 2008 9:04 pm UTC
Re: The Shortest String Containing all Permutations of n Sym
How did I know that one of the first replies would be to the Wikipedia article on de Bruijn sequences?
The de Bruijn diagram approach doesn't work because there is no way to concatenate all of the permutations next to each other with overlap n1 each. After you use n of the permutations, you run out of options for what the next symbol in the string should be.
The de Bruijn diagram approach doesn't work because there is no way to concatenate all of the permutations next to each other with overlap n1 each. After you use n of the permutations, you run out of options for what the next symbol in the string should be.
Re: The Shortest String Containing all Permutations of n Sym
The more general term for this kind of object is "universal sequence". (However, universal sequences are often forbidden from having repeats  that is, each structure would have to appear exactly once in the sequence.) In general, not much is known about them, so any solution to this problem is probably worth publishing.
In a related problem, it has been proven that when n > 2, a universal cycle of length n! (that is, a universal sequence that eats its own tail) for permutations in [imath]S_n[/imath] must have at least n+1 distinct characters in its alphabet, and n+1 is achievable for all n.
In a related problem, it has been proven that when n > 2, a universal cycle of length n! (that is, a universal sequence that eats its own tail) for permutations in [imath]S_n[/imath] must have at least n+1 distinct characters in its alphabet, and n+1 is achievable for all n.
 silverhammermba
 Posts: 178
 Joined: Fri Oct 13, 2006 1:16 am UTC
Re: The Shortest String Containing all Permutations of n Sym
NathanielJ wrote:There is a clear pattern here that is not difficult to generalize, but the difficulty is proving that this pattern is indeed optimal in general  it could be the case (for large n) that if we increase one of the early numbers (i.e., we don't use the greedy algorithm but rather add some extra length early on), it allows us to decrease later numbers by a larger factor, thus shortening the overall string.
Perhaps if we think about the number of "uses" of each character i.e. the number of unique permutations it is part of in the sequence. The greedy algorithm tries to maximize the use of each character, whereas the naive algorithm (concatenating all permutations) minimizes the use of each character. Some usage assignments would be impossible and perhaps we could find a restriction on the assignments that would force the string to be of the desired length.
For your n=3 example (123121321), the uses we get are
123222321
which also implies a pattern. If we can find restrictions on the usage patterns, it could provide a way to show that the uses cannot be more efficiently packed.
Here are the simple restrictions I've come up with so far:
1. The number of a character's uses is less than or equal to the number of length n substrings it is contained in
2. The total uses of a string must sum to n*n!
With those restrictions alone, for n=3 we would hope to get a string with the following uses
12333321
which is one character shorter than optimal, so hopefully we could find more restrictions on uses that would force the string to meet the hypothesized optimal length.
Re: The Shortest String Containing all Permutations of n Sym
I along with my math professor Dan Ashlock wrote a paper about this very topic back in 1993. We called the paper Construction of Small Superpermutations and Minimal Injective Superstrings. We found the length of the shortest strings to be the sum(k!) as k goes from 1 to n. Note there are also n! such strings since you can replace 1 with 2 and 2 with 1, etc. and still have a valid shortest string. We also found algorithms for deriving the shortest strings, but we couldn't prove that our strings were shortest. We used an exhaustive computer search to prove that the algorithm generated the shortest string up to 11, but that's as far as we went.
I've been trying for years to get a mathematician to take up this problem. I'm a computer scientist and cognitive scientist and don't know enough math to prove this. I've been told by many mathematicians that this problem is easy and they've generated the proof before, but no one can produce the proof. I think the problem is deceptively simple which causes mathematicians to blow it off.
I periodically check to see if anyone has done any research on this topic. Thanks for the post here! It made my day. I'd also be happy to send more information. I'm new to this forum, but ping me on here if you can or leave a message in this comment.
Jenett
I've been trying for years to get a mathematician to take up this problem. I'm a computer scientist and cognitive scientist and don't know enough math to prove this. I've been told by many mathematicians that this problem is easy and they've generated the proof before, but no one can produce the proof. I think the problem is deceptively simple which causes mathematicians to blow it off.
I periodically check to see if anyone has done any research on this topic. Thanks for the post here! It made my day. I'd also be happy to send more information. I'm new to this forum, but ping me on here if you can or leave a message in this comment.
Jenett
 NathanielJ
 Posts: 882
 Joined: Sun Jan 13, 2008 9:04 pm UTC
Re: The Shortest String Containing all Permutations of n Sym
jtillots wrote:I along with my math professor Dan Ashlock...
Oh wow that's weird  what a small world, Dan is a professor at my university (U of Guelph). I had no idea that he had worked on this problem!
jtillots wrote:We also found algorithms for deriving the shortest strings, but we couldn't prove that our strings were shortest. We used an exhaustive computer search to prove that the algorithm generated the shortest string up to 11, but that's as far as we went.
I would be very interested in seeing the algorithm for searching the space, as I've only been able to prove optimality up to n = 6; by n = 11 things are quite huge.
jtillots wrote:I've been told by many mathematicians that this problem is easy and they've generated the proof before, but no one can produce the proof. I think the problem is deceptively simple which causes mathematicians to blow it off.
Yep, that's been my experience with the problem too. I even found one mathematics competition that contained the problem, and then the "solutions" just handwaved it by showing that a string of that length always exists, but they didn't prove optimality.
Anyway, I would love to see any information about this problem that you have (including the paper that you coauthored with Dan ). You can email me at njohns01(AT)uoguelph.ca if you don't want to post them here. Thanks, and all the best.
Re: The Shortest String Containing all Permutations of n Sym
I think I heard about that mathematics competition. I was talking to someone at a party about this problem and they told me about a math competition where this was one of the problems. I got super excited and asked what the proof was, he said, "Oh, I don't remember. But it was simple." *sigh*
I'm sorry it's taken me so long to reply to you. I don't frequent these forums very often  only when I'm looking for more info about super permutations.
I've sent you some email. I'd love to see this problem move forward.
Jenett
I'm sorry it's taken me so long to reply to you. I don't frequent these forums very often  only when I'm looking for more info about super permutations.
I've sent you some email. I'd love to see this problem move forward.
Jenett

 Posts: 563
 Joined: Tue Jul 27, 2010 8:48 am UTC
Re: The Shortest String Containing all Permutations of n Sym
If these results are true, then you have an upper bound on the string, and you're conjecturing that the correct value is that upper bound. An easy lower bound is n!+n1, but this can be improved slightly to n!+n for n>2, because after the first n symbols (which without loss of generality we can define them to be 123...n) one of the next n symbols must not make a new permutation. I am going to try to extend this in some way to make the lower bound something of the form n!+O((n1)!).
Edit: I have found a better lower bound on the length of the optimal string. I can prove that at least n!+(n1)!+n2 symbols are needed, by using the argument that for every n permutations in the string, there is at least one symbol after them in the string which make the substring of all n symbols behind that one, including itself, not be a new permutation. The n1 comes from the fact that the first n1 symbols do not add any new permutations without the later symbols. The 1 comes from the fact the last permutation does not have any symbols after it, therefore there is no nonpermutation substring ending in a symbol after that. This lower bound is exact only for n<=3, so I might have to change my line of reasoning from here.
Edit 2: I realized how I can improve my bound on the number of symbols needed. If the number of symbols used is exactly the lower bound, then there are exactly n permutations for every symbol that does not make a new permutation (not including the first n1 symbols). Therefore, assuming that the string starts with 123...n, then we can generate the sequence. Setting n=4, and generating the sequence the way mentioned above gives 1234123142312431234..., which is clearly repetitive, and does not give every permutation. Therefore, there must be a corrective symbol in the string that makes the string nonrepetitive, so the shortest string for n=4 is 33. I am working on generalizing this to give a lower bound of the form n!+(n1)!+O((n2)!). I can see that if I keep thinking, I can probably come up with bounds that contain arbitrarily many factorials in it, so all I need is an inductive proof to prove that the lower bound is the upper bound.
Edit: I have found a better lower bound on the length of the optimal string. I can prove that at least n!+(n1)!+n2 symbols are needed, by using the argument that for every n permutations in the string, there is at least one symbol after them in the string which make the substring of all n symbols behind that one, including itself, not be a new permutation. The n1 comes from the fact that the first n1 symbols do not add any new permutations without the later symbols. The 1 comes from the fact the last permutation does not have any symbols after it, therefore there is no nonpermutation substring ending in a symbol after that. This lower bound is exact only for n<=3, so I might have to change my line of reasoning from here.
Edit 2: I realized how I can improve my bound on the number of symbols needed. If the number of symbols used is exactly the lower bound, then there are exactly n permutations for every symbol that does not make a new permutation (not including the first n1 symbols). Therefore, assuming that the string starts with 123...n, then we can generate the sequence. Setting n=4, and generating the sequence the way mentioned above gives 1234123142312431234..., which is clearly repetitive, and does not give every permutation. Therefore, there must be a corrective symbol in the string that makes the string nonrepetitive, so the shortest string for n=4 is 33. I am working on generalizing this to give a lower bound of the form n!+(n1)!+O((n2)!). I can see that if I keep thinking, I can probably come up with bounds that contain arbitrarily many factorials in it, so all I need is an inductive proof to prove that the lower bound is the upper bound.
I have discovered a truly marvelous proof of this, which this margin is too narrow to contain.
Re: The Shortest String Containing all Permutations of n Sym
FWIW, Martin Gardner wrote about a closely related problem that used circular lists rather than linear ones. In his article, the puzzle was described in terms of different coloured beads on a bracelet. Sorry I can't remember any details, I read the article several decades ago.
EDIT: After a bit of Googling, it appears that the bracelet in Gardner's puzzle only used beads of two colours, and it contains all possible bit strings upto a certain length, so that's probably not very helpful. Oh well.
EDIT: After a bit of Googling, it appears that the bracelet in Gardner's puzzle only used beads of two colours, and it contains all possible bit strings upto a certain length, so that's probably not very helpful. Oh well.
 dudiobugtron
 Posts: 1098
 Joined: Mon Jul 30, 2012 9:14 am UTC
 Location: The Outlier
Re: The Shortest String Containing all Permutations of n Sym
NathanielJ wrote:it looks like it should fall to a simple induction argument.
I imagine the induction step would be something like:
Since there are (n+1)! permutations, there must be at least [size of min string for n] + (n+1)! characters in a minimal string for n+1, otherwise we could do some fancy trick and remove (n+1)! characters and have a minimal string for the digits 1,...,n, which would be a contradiction.
I haven't thought about it too much though but if I had more time that is the way I'd try to go about it.
 dudiobugtron
 Posts: 1098
 Joined: Mon Jul 30, 2012 9:14 am UTC
 Location: The Outlier
Re: The Shortest String Containing all Permutations of n Sym
Yup. I think this is the outline of a viable proof, although I have made more than a few appeals to 'obviousness' which people are free to dispute!!! I've spoiler tagged it because it's a bit long and has a few sum symbols.:
Spoiler:
Re: The Shortest String Containing all Permutations of n Sym
dudiobugtron wrote:If you remove all of the digits 'n+1', the string will then (as above) contain at least n+1 copies of each of the n! permutations of the digits 1,...,n.
I think that's not quite right. Consider the sequence of digits 41234. This contains two unique lengthfour sequences (4123 and 1234), but removing the 4s leaves us with only one copy of the base lengththree sequence.
This only applies to the case where (n+1) is before and after the same sequence of every other digit; so the string as a whole will still contain at least n copies of each of the n! permutations.

 Posts: 563
 Joined: Tue Jul 27, 2010 8:48 am UTC
Re: The Shortest String Containing all Permutations of n Sym
CCC wrote:dudiobugtron wrote:If you remove all of the digits 'n+1', the string will then (as above) contain at least n+1 copies of each of the n! permutations of the digits 1,...,n.
I think that's not quite right. Consider the sequence of digits 41234. This contains two unique lengthfour sequences (4123 and 1234), but removing the 4s leaves us with only one copy of the base lengththree sequence.
This only applies to the case where (n+1) is before and after the same sequence of every other digit; so the string as a whole will still contain at least n copies of each of the n! permutations.
I think that given his assumption that the string contains all (n+1)! permutations of the digits 1 to n+1, then if you remove all the n+1's there is at least n+1 of each permutation of the digits from 1 to n. I agree with everything except:
dudiobugtron wrote:It seems apparent that we can safely remove n.n! digits (corresponding to the n.n! extra copies) and still have all n! of the permutations of the digits 1,...,n contained in the string.
Removing these n.n! digits, along with the (at least) n! 'n+1' digits gives a total of n! + n.n! = (n+1).n! = (n+1)!
If you could prove this, I would be happy with your proof.
I have discovered a truly marvelous proof of this, which this margin is too narrow to contain.
Re: The Shortest String Containing all Permutations of n Sym
tomtom2357 wrote:CCC wrote:dudiobugtron wrote:If you remove all of the digits 'n+1', the string will then (as above) contain at least n+1 copies of each of the n! permutations of the digits 1,...,n.
I think that's not quite right. Consider the sequence of digits 41234. This contains two unique lengthfour sequences (4123 and 1234), but removing the 4s leaves us with only one copy of the base lengththree sequence.
This only applies to the case where (n+1) is before and after the same sequence of every other digit; so the string as a whole will still contain at least n copies of each of the n! permutations.
I think that given his assumption that the string contains all (n+1)! permutations of the digits 1 to n+1, then if you remove all the n+1's there is at least n+1 of each permutation of the digits from 1 to n.
But I think, because of CCC's objection, that statement remains unproven. There are n+1 permutations of length n+1 that collapse to the same permutation of length n, but as CCC said, two such permutations could be the same permutation when you remove n+1.
I agree with everything except:dudiobugtron wrote:It seems apparent that we can safely remove n.n! digits (corresponding to the n.n! extra copies) and still have all n! of the permutations of the digits 1,...,n contained in the string.
Removing these n.n! digits, along with the (at least) n! 'n+1' digits gives a total of n! + n.n! = (n+1).n! = (n+1)!
If you could prove this, I would be happy with your proof.
I agree that this is a problem.
 dudiobugtron
 Posts: 1098
 Joined: Mon Jul 30, 2012 9:14 am UTC
 Location: The Outlier
Re: The Shortest String Containing all Permutations of n Sym
Deedlit wrote:tomtom2357 wrote:CCC wrote:dudiobugtron wrote:If you remove all of the digits 'n+1', the string will then (as above) contain at least n+1 copies of each of the n! permutations of the digits 1,...,n.
I think that's not quite right. Consider the sequence of digits 41234. This contains two unique lengthfour sequences (4123 and 1234), but removing the 4s leaves us with only one copy of the base lengththree sequence.
This only applies to the case where (n+1) is before and after the same sequence of every other digit; so the string as a whole will still contain at least n copies of each of the n! permutations.
I think that given his assumption that the string contains all (n+1)! permutations of the digits 1 to n+1, then if you remove all the n+1's there is at least n+1 of each permutation of the digits from 1 to n.
But I think, because of CCC's objection, that statement remains unproven. There are n+1 permutations of length n+1 that collapse to the same permutation of length n, but as CCC said, two such permutations could be the same permutation when you remove n+1.
It's obvious that any two permutations which share the same n+1 digit will 'contract' to two distinct permutations like I want them to. I had been assuming that all the interesting cases were of this sort.
However, I also think CCC is right, that the issue is when you have two permutations that share all their digits except n+1. I'm thinking that these situations would be the result of 'excess' n+1 digits (ie: more than the 'n!' minimum I assumed in my 'proof'), which can be removed to compensate for the lost permutations. Also, you can really remove all occurrences of any digit you like (it doesn't have to be n+1, since the digits are all interchangible). If all of the digits had 'excess' occurrences like this, then the string probably isn't minimal. I know that's not a proof though!
tomtom2357 wrote:I agree with everything except:dudiobugtron wrote:It seems apparent that we can safely remove n.n! digits (corresponding to the n.n! extra copies) and still have all n! of the permutations of the digits 1,...,n contained in the string.
Removing these n.n! digits, along with the (at least) n! 'n+1' digits gives a total of n! + n.n! = (n+1).n! = (n+1)!
If you could prove this, I would be happy with your proof.
Yeah I wasn't happy with that part either.
I realised after posting the proof that you don't actually have to prove that you can remove the digits, you merely have to prove that the *size* of a string which contains p more permutations than a 'minimum string' is atleast p greater. That's still not that easy either (even though it seems intuitively correct), so I'll think about it some more (unless someone beats me to it!)
Re: The Shortest String Containing all Permutations of n Sym
dudiobugtron wrote:It's obvious that any two permutations which share the same n+1 digit will 'contract' to two distinct permutations like I want them to. I had been assuming that all the interesting cases were of this sort.
However, I also think CCC is right, that the issue is when you have two permutations that share all their digits except n+1. I'm thinking that these situations would be the result of 'excess' n+1 digits (ie: more than the 'n!' minimum I assumed in my 'proof'), which can be removed to compensate for the lost permutations.
Hmmm. Let's take a look at the earlierprovided length4 sequence:
123412314231243121342132413214321
In this case, (n+1)=4.
Now, there's no case where a pair of sequences share every number but 4 (the 4s are all spaced just a little too widely for that). And every number except 4 does indeed occur enough times to make up for the lost permutations. But let us consider briefly the consequence of removing the threes:
123412314231243121342132413214321
Note the sequence "34123" near the start. Let's look at the 412's after removing all the 3's:
1241214212412142124121421
There are only three such sequences, less than (n+1). (In fact, since there are only six fours, there's only three of each possible threeuniquedigit sequence).
Hmmm. That sequence is... usefully repetitive. After the first eight digits, it simply repeats itself. Since it has the minimum possible number of 4s (i.e. six of them), I don't think there's any option; it does have to include at least n of every sequence, n=3, and the sequences 412 and 421 cannot use the same 4. (Similarly for the pairs (241; 142) and (214; 124)).
 NathanielJ
 Posts: 882
 Joined: Sun Jan 13, 2008 9:04 pm UTC
Re: The Shortest String Containing all Permutations of n Sym
Be careful when assuming that the optimal string *must* have a certain form based on the optimal strings in the n = 1, 2, 3, 4 cases. While there are indeed exactly (n1)! instances of the digit n for those cases, and there is tons of symmetry to be found, that breaks down for n >= 5. When n = 5, I have two fundamentally different optimal strings: one that has 24 '5's and is symmetric like we would hope... and then one that has 27 '5's and isn't symmetric.
 dudiobugtron
 Posts: 1098
 Joined: Mon Jul 30, 2012 9:14 am UTC
 Location: The Outlier
Re: The Shortest String Containing all Permutations of n Sym
I've been thinking about this for a while (onandoff!), and although I haven't gotten anywhere, I thought I'd at least share my approaches in case they help anyone else do something more useful.
The main idea I was trying was to look at the underlying string of *permutations* of n symbols, namely p_1, p_2, ..., p_(n!) , ordered by their first (and only?) appearance in the main string.
I was looking at the case where the underlying string could be partitioned into (n1)! sets of n permutations, {p_(in),...,p_((i+1)n1)}, where each p_(jn) differed from p_((j1)n) by merely moving the starting symbol to the end (and of course similarly for each p_(in) and p_((i+1)n1)). I was trying to prove that finding the 'best' way for ordering these sets of partitions was suitably analogous to finding the best way for ordering the permutations of (n1) symbols. And also of course that such a process gave a best possible ordering for the partitions of n symbols.
Another related idea I tried, based on this idea as well:
was to give a value to each of those permutations based on how many symbols needed to be added to put that permutation in the string. For the example above, this would be:
4 1 1 1 2 1 1 1 2 1 1 1 3 1 1 1 2 1 1 1 2 1 1 1
For n=4 and below, it is easily observable that the number of permutations where this number is 1 or greater is 4!, the number of permutations where it is 2 or greater is 3!, etc etc... ending up with 4! + 3! + 2! + 1! symbols needed.
So I guess if you could prove that a solution with this property would always be as short as possible, maybe by tying it in with the ideas above, then you'd be set.
(You can also see how this illustrates the above model (6 sets of 4 'nearby' permutations, each preceded by a bigger gap).)
Like I said, nothing concrete or necessarily useful, but on the chance that it may help someone I thought it was worth posting.
The main idea I was trying was to look at the underlying string of *permutations* of n symbols, namely p_1, p_2, ..., p_(n!) , ordered by their first (and only?) appearance in the main string.
I was looking at the case where the underlying string could be partitioned into (n1)! sets of n permutations, {p_(in),...,p_((i+1)n1)}, where each p_(jn) differed from p_((j1)n) by merely moving the starting symbol to the end (and of course similarly for each p_(in) and p_((i+1)n1)). I was trying to prove that finding the 'best' way for ordering these sets of partitions was suitably analogous to finding the best way for ordering the permutations of (n1) symbols. And also of course that such a process gave a best possible ordering for the partitions of n symbols.
Another related idea I tried, based on this idea as well:
NathanielJ wrote:For n = 4, the number of symbols that you need to add to the string to get one more permutation are as follows:
1 1 1 2 1 1 1 2 1 1 1 3 1 1 1 2 1 1 1 2 1 1 1
was to give a value to each of those permutations based on how many symbols needed to be added to put that permutation in the string. For the example above, this would be:
4 1 1 1 2 1 1 1 2 1 1 1 3 1 1 1 2 1 1 1 2 1 1 1
For n=4 and below, it is easily observable that the number of permutations where this number is 1 or greater is 4!, the number of permutations where it is 2 or greater is 3!, etc etc... ending up with 4! + 3! + 2! + 1! symbols needed.
So I guess if you could prove that a solution with this property would always be as short as possible, maybe by tying it in with the ideas above, then you'd be set.
(You can also see how this illustrates the above model (6 sets of 4 'nearby' permutations, each preceded by a bigger gap).)
Like I said, nothing concrete or necessarily useful, but on the chance that it may help someone I thought it was worth posting.
Re: The Shortest String Containing all Permutations of n Sym
I couldn't help noting that each of the optimal strings posted here is palindromic. That is to say, each can be read either from left to right or from right to left with the same result:
121
123121321
123412314231243121342132413214321
Many years of recreational mathematics lead me to suspect that this will be the case for all optimal strings of this type. Does anybody happen to know whether my hunch is correct?
Meanwhile, I have been working (on and off for the last fifteen years or so) on a similar problem, which is that of producing every ndigit permutation in base b. For example, there are four optimal strings in base two containing every possible twodigit permutation exactly once:
00110
11001
01100
10011
The main difference between my problem and the one being discussed here is that my optimal strings contain repeating digits. As base b increases, the number of possible optimal strings increases hyperexponentially. In base 3 there are 216 possible solutions. In base ten, one of many, many solutions is the following 101digit string:
00110221203323130443424140554535251506656463626160776757473727170887868584838281809989796959493929190
This string contains every single twodigit permutation from 00 to 99 in base ten, without repetition. (It is not pallindromic, of course, as no palindromic solution could possibly exist, for reasons that are not too difficult to work out.)
121
123121321
123412314231243121342132413214321
Many years of recreational mathematics lead me to suspect that this will be the case for all optimal strings of this type. Does anybody happen to know whether my hunch is correct?
Meanwhile, I have been working (on and off for the last fifteen years or so) on a similar problem, which is that of producing every ndigit permutation in base b. For example, there are four optimal strings in base two containing every possible twodigit permutation exactly once:
00110
11001
01100
10011
The main difference between my problem and the one being discussed here is that my optimal strings contain repeating digits. As base b increases, the number of possible optimal strings increases hyperexponentially. In base 3 there are 216 possible solutions. In base ten, one of many, many solutions is the following 101digit string:
00110221203323130443424140554535251506656463626160776757473727170887868584838281809989796959493929190
This string contains every single twodigit permutation from 00 to 99 in base ten, without repetition. (It is not pallindromic, of course, as no palindromic solution could possibly exist, for reasons that are not too difficult to work out.)
Re: The Shortest String Containing all Permutations of n Sym
Well, I was googling about this a little, and I found out that the value conjectured in this thread (and elsewhere) is false. So, I guess that makes this problem even harder, since we don't even have a bound to try to prove.
Prior to knowing this, I spent some time thinking and came up with a novel way to look at the problem. I couldn't think of what to do next, but...
Prior to knowing this, I spent some time thinking and came up with a novel way to look at the problem. I couldn't think of what to do next, but...
Spoiler:
Mathematical hangover (n.): The feeling one gets in the morning when they realize that that short, elementary proof of the Riemann hypothesis that they came up with at midnight the night before is, in fact, nonsense.

 Posts: 3
 Joined: Thu Feb 27, 2014 9:28 am UTC
Re: The Shortest String Containing all Permutations of n Sym
Hello! Funny to see a link to my paper here while idly browsing the forum. It seems that lurkers like me are not allowed to post links, so you’ll have to Google for the things I wanted to link to.
The first interesting development since the main burst of activity on this thread was the discovery by Benjamin Chaffin of eight different length153 superpermutations on five symbols, with a (machine) proof that there are no shorter ones. Nathaniel Johnston – the OP of this thread –has details on his blog.
That emboldened me to try to disprove the minimallength conjecture for n=6, which I succeeded in doing.
I think in a way this makes our quest easier, since we can stop wasting effort trying to prove something that is not true; but it means the true picture is more complicated than some people had hoped.
I think the most obvious lowhanging fruit now is to prove (or disprove!) that length 872 is minimal for n=6. If anyone has access to a decent compute cluster, it should be pretty easy to hit it with Concorde (the TSP solver) till it breaks. The input file you need is 6.tsp in the list of ancillary files I included with the paper.
The first interesting development since the main burst of activity on this thread was the discovery by Benjamin Chaffin of eight different length153 superpermutations on five symbols, with a (machine) proof that there are no shorter ones. Nathaniel Johnston – the OP of this thread –has details on his blog.
That emboldened me to try to disprove the minimallength conjecture for n=6, which I succeeded in doing.
I think in a way this makes our quest easier, since we can stop wasting effort trying to prove something that is not true; but it means the true picture is more complicated than some people had hoped.
I think the most obvious lowhanging fruit now is to prove (or disprove!) that length 872 is minimal for n=6. If anyone has access to a decent compute cluster, it should be pretty easy to hit it with Concorde (the TSP solver) till it breaks. The input file you need is 6.tsp in the list of ancillary files I included with the paper.
Re: The Shortest String Containing all Permutations of n Sym
Here's the link to the article on Robin Houston's blog.
Tackling the Minimal Superpermutation Problem
Tackling the Minimal Superpermutation Problem
Last edited by PM 2Ring on Thu Sep 18, 2014 12:22 am UTC, edited 1 time in total.
Re: The Shortest String Containing all Permutations of n Sym
Doesn't finding a short length5 sequence automatically imply the existence of shorter sequences of length 6 and up?

 Posts: 3
 Joined: Thu Feb 27, 2014 9:28 am UTC
Re: The Shortest String Containing all Permutations of n Sym
Doesn't finding a short length5 sequence automatically imply the existence of shorter sequences of length 6 and up?
Indeed it would, but Ben Chaffin showed that there is no short superpermutation on 5 symbols: all the minimal ones have length 153, which is the same length as the recursively constructed one, i.e. 1! + 2! + 3! + 4! + 5!.
The existence of a shorterthanconjectured superpermutation on 6 symbols implies that there are shorterthanconjectured superpermutations for all n>6 too.
Re: The Shortest String Containing all Permutations of n Sym
I wrote a short conversion of the Hamiltonian path problem (see Robin Houston's paper on arXiv) to (Max)SAT. The encoding uses roughly (n!)^2 variables, so it won't scale to large n but it may be possible to tackle the n = 6 case with this approach. However in order to do that we need to encode more domain specific knowledge. One starting point would be the following question: Can we remove edges with weight n from the graph without changing the weight of the minimal Hamiltonian path? Does any solution for n = 5 use such an edge?

 Posts: 3
 Joined: Thu Feb 27, 2014 9:28 am UTC
Re: The Shortest String Containing all Permutations of n Sym
Interesting! It would be great if we could dispose of n=6, and going via SAT is a reasonable idea.
I haven’t proved it, but it seems plausible that it’s safe to remove edges of weight n. There are no known minimal superpermutations that use these. If removing them works, it would be worth trying to prove.
There is a minimal superpermutation at n=5 that uses an edge of weight 4, namely the “obvious” recursive one. The structure is like this (read down the columns):
In the middle there’s an edge from 54312 to 21345, which has weight 4.
On the other hand, the heaviest edge in the shortestknown superpermutation for n=6 has weight 3. If you could establish a bound even assuming all edges have weight ≤3, that would still be pretty interesting I think!
I haven’t proved it, but it seems plausible that it’s safe to remove edges of weight n. There are no known minimal superpermutations that use these. If removing them works, it would be worth trying to prove.
There is a minimal superpermutation at n=5 that uses an edge of weight 4, namely the “obvious” recursive one. The structure is like this (read down the columns):
Code: Select all
12345 25341 ... 31542 51243 ... 21534 51324 13254 35214
23451 53412 31425 15423 ... 21345 15342 ... 32541 52143
34512 ... 14253 54231 24315 13452 53421 32415 25413 ...
45123 41235 42531 ... 43152 34521 ... 24153 54132 14325
51234 12354 25314 ... 31524 45213 42135 41532 ... 43251
... 23541 53142 31245 15243 52134 21354 15324 ... 32514
23415 35412 ... 12453 52431 ... 13542 53241 32145 25143
34152 54123 14235 24531 ... 13425 35421 ... 21453 51432
41523 ... 42351 45312 43125 34251 54213 24135 14532 ...
15234 ... 23514 53124 31254 42513 ... 41352 45321 43215
52341 23145 35142 ... 12543 25134 ... 13524 53214 32154
... 31452 51423 12435 25431 51342 13245 35241 ... 21543
34125 14523 ... 24351 54312 ... 32451 52413 21435 15432
41253 45231 42315 43512 ... 34215 24513 ... 14352 54321
12534 52314 23154 35124 ... 42153 45132 41325 43521
In the middle there’s an edge from 54312 to 21345, which has weight 4.
On the other hand, the heaviest edge in the shortestknown superpermutation for n=6 has weight 3. If you could establish a bound even assuming all edges have weight ≤3, that would still be pretty interesting I think!
Who is online
Users browsing this forum: No registered users and 7 guests