Merging lists in place
Moderators: phlip, Moderators General, Prelates
 LucasBrown
 Posts: 294
 Joined: Thu Apr 15, 2010 2:57 am UTC
 Location: Poway, CA
Merging lists in place
I'm told that there is a stable way to merge two sorted lists in place that runs in time better than O(n^{2})but I haven't been able to find or derive anything about it. Can anyone help?
 Xanthir
 My HERO!!!
 Posts: 5250
 Joined: Tue Feb 20, 2007 12:49 am UTC
 Location: The Googleplex
 Contact:
Re: Merging lists in place
You can't merge two lists in place, unless you're splitting the final list across the two input lists. Merging two lists necessarily creates a new list, of size equal to the sum of the input sizes.
Stable merging of two lists in linear time is easy, though. Just set up two list pointers, each pointing to the start of one of the inputs. Every iteration, check which of the two pointers has a smaller value. Copy that value into the result and move that pointer. Repeat until one of the lists is empty, at which point you can just append the entries in the remaining list to the end of the output list. That's a single linear pass through each list.
Stable merging of two lists in linear time is easy, though. Just set up two list pointers, each pointing to the start of one of the inputs. Every iteration, check which of the two pointers has a smaller value. Copy that value into the result and move that pointer. Repeat until one of the lists is empty, at which point you can just append the entries in the remaining list to the end of the output list. That's a single linear pass through each list.
(defun fibs (n &optional (a 1) (b 1)) (take n (unfold '+ a b)))
 scarecrovv
 It's pronounced 'double u'
 Posts: 674
 Joined: Wed Jul 30, 2008 4:09 pm UTC
 Location: California
Re: Merging lists in place
Xanthir wrote:You can't merge two lists in place, unless you're splitting the final list across the two input lists. Merging two lists necessarily creates a new list, of size equal to the sum of the input sizes.
Stable merging of two lists in linear time is easy, though. Just set up two list pointers, each pointing to the start of one of the inputs. Every iteration, check which of the two pointers has a smaller value. Copy that value into the result and move that pointer. Repeat until one of the lists is empty, at which point you can just append the entries in the remaining list to the end of the output list. That's a single linear pass through each list.
Well now, it depends on how your lists are represented. If by "list" you mean contiguous block of memory, like an array in C, then this is correct. However, if you mean a Lispy (linked) list, then it's quite easy to merge two lists. Just change where all the cdrs (links) point in a similar fashion to your description and you've just merged two sorted lists in place in O(n) time.
LucasBrown: could you specify how your lists are represented?
 LucasBrown
 Posts: 294
 Joined: Thu Apr 15, 2010 2:57 am UTC
 Location: Poway, CA
Re: Merging lists in place
My program isn't really merging two lists... it's all in one contiguous block of memory, and my program is written in such a way that the first half might as well be a completely seperate list from the second half. My question is this: is there a way to rearrange the items in the list in a stable, subquadratic manner that uses a quantity of additional space that is independent of the size of the lists?
P.S. The only reason I'm posting this is because the piece of hardware my program's on has miniscule memory and the data I'm working with almost fills the memoryI would gladly do the copyintonewlist thing if it was possible on my hardware.
P.S. The only reason I'm posting this is because the piece of hardware my program's on has miniscule memory and the data I'm working with almost fills the memoryI would gladly do the copyintonewlist thing if it was possible on my hardware.
 Meteorswarm
 Posts: 979
 Joined: Sun Dec 27, 2009 12:28 am UTC
 Location: Ithaca, NY
Re: Merging lists in place
LucasBrown wrote:My program isn't really merging two lists... it's all in one contiguous block of memory, and my program is written in such a way that the first half might as well be a completely seperate list from the second half. My question is this: is there a way to rearrange the items in the list in a stable, subquadratic manner that uses a quantity of additional space that is independent of the size of the lists?
P.S. The only reason I'm posting this is because the piece of hardware my program's on has miniscule memory and the data I'm working with almost fills the memoryI would gladly do the copyintonewlist thing if it was possible on my hardware.
I think it might be doable.
Consider that you have two lists, adjacent, sorted leasttomost, as in your example. If we don't mind moving items between lists, I think we can merge, and it should have good runtime IF your lists are of similar magnitude. Have a pointer to the head of each list. As normal, choose the minimum from each to put onto the output list. If this is the list that you're growing the output list into, great, just move the pointer. Otherwise, swap whatever item you'd need to move out of list 1 into the spot in list 2 from which you took the item you put into the output list.
Since this could break the sorting invariant of list two, swap the item down until it is in the correct place. This algorithm has the potential for a quadratic runtime, but if your lists are similar, should only require very few swaps per iteration.
I can't think of a better way.
The same as the old Meteorswarm, now with fewer posts!
Re: Merging lists in place
If you've got two sorted arrays in a row, you could "merge" them inplace using heapsort. O(n log n).
 Berengal
 Superabacus Mystic of the First Rank
 Posts: 2707
 Joined: Thu May 24, 2007 5:51 am UTC
 Location: Bergen, Norway
 Contact:
Re: Merging lists in place
You can get O(n) time with O(n) memory using a merge, or O(n log n) time with O(1) memory by sorting the whole thing. Or, if you switch to linked lists, you can get O(n) time with O(1) memory using a merge.
It is practically impossible to teach good programming to students who are motivated by money: As potential programmers they are mentally mutilated beyond hope of regeneration.
 Meteorswarm
 Posts: 979
 Joined: Sun Dec 27, 2009 12:28 am UTC
 Location: Ithaca, NY
Re: Merging lists in place
Berengal wrote:You can get O(n) time with O(n) memory using a merge, or O(n log n) time with O(1) memory by sorting the whole thing. Or, if you switch to linked lists, you can get O(n) time with O(1) memory using a merge.
The problem is that in this case he has strictly n+m memory. Converting to linked lists would take something more than that, depending on how many bits he uses for addressing, and conventional merging we've already talked about.
If this merge is the result of you trying to do mergesort, a better option is to just perform heapsort or quicksort and enjoy the better memory usage.
The same as the old Meteorswarm, now with fewer posts!
Re: Merging lists in place
Most of you seem to forget that LucasBrown needs a stable sort. Maybe in that case insertion sort is the best option. It takes O(n^{2}) comparisons and swaps, but it has low overhead so it will generally perform better than strictly inplace merge sort.
There is a more efficient way to use "inplace" merge sort with additional storage, though. Assuming that you have a memory region that consists of two serially aligned subarrays that are to be merged, you can copy just the first subarray to another region. After that, you merge the copy of the first subarray with the original second subarray into the original memory region.
So it really depends on your possibilities. I don't know of any way to merge inplace with only constant additional storage in better than O(n^{2}) time. If you can really only stand constant additional storage, go for insertion sort. If you can afford to copy the first array, take the inplace merge sort with "modest linear additional storage".
If you think the sort doesn't need to be stable afterall, I agree to the above posters that options like quicksort and heapsort are better. If speed is very important and you have time to write twice as much code, or if you're coincidentally using C++, you might be interested in introsort. If you're sorting data with few unique keys you might be interested in 3way quicksort.
There is a more efficient way to use "inplace" merge sort with additional storage, though. Assuming that you have a memory region that consists of two serially aligned subarrays that are to be merged, you can copy just the first subarray to another region. After that, you merge the copy of the first subarray with the original second subarray into the original memory region.
So it really depends on your possibilities. I don't know of any way to merge inplace with only constant additional storage in better than O(n^{2}) time. If you can really only stand constant additional storage, go for insertion sort. If you can afford to copy the first array, take the inplace merge sort with "modest linear additional storage".
If you think the sort doesn't need to be stable afterall, I agree to the above posters that options like quicksort and heapsort are better. If speed is very important and you have time to write twice as much code, or if you're coincidentally using C++, you might be interested in introsort. If you're sorting data with few unique keys you might be interested in 3way quicksort.
"There are only two hard problems in computer science: cache coherence, naming things, and offbyone errors." (Phil Karlton and Leon Bambrick)
coding and xkcd combined
(Julian/Julian's)
coding and xkcd combined
(Julian/Julian's)
 Meteorswarm
 Posts: 979
 Joined: Sun Dec 27, 2009 12:28 am UTC
 Location: Ithaca, NY
Re: Merging lists in place
You can also mimic a stable sort (assuming that you're trying to use it to have things be subsorted by other keys) by having your comparison function be sophisticated enough to consider all the keys simultaneously.
The same as the old Meteorswarm, now with fewer posts!
Re: Merging lists in place
Meteorswarm wrote:You can also mimic a stable sort (assuming that you're trying to use it to have things be subsorted by other keys) by having your comparison function be sophisticated enough to consider all the keys simultaneously.
Certainly. You just have to watch that the comparisons are fast enough, or your n log n unstable sort gets slower than a stable n² algorithm. Of course, for large data sets n log n will beat n², but I get the feeling that these lists aren't that big.
I suspect that we need more info to discover the best solution here. Why does the program generate 2 lists, why not merge them as they're being created? Are the 2 lists roughly of equal size, and roughly how big are they? How many secondary keys are there, and how hard are they to compare?
Re: Merging lists in place
I stumbled across another option: quicksort with a stable partitioning algorithm. The C++ STL provides a stable partitioning algorithm. Their implementation is adaptive; if it can allocate some temporary storage it can run in O(n) just like the normal unstable partition of normal quicksort, and otherwise it can run O(n log n) inplace. So a quicksort built upon stable_partition would run O(n log n log n) if it has to be inplace, which is a lot better than the O(n^{2}) of insertion sort or inplace merge sort. If you're not using C++ you can still try and peek at their code here (search for "stable_partition").
"There are only two hard problems in computer science: cache coherence, naming things, and offbyone errors." (Phil Karlton and Leon Bambrick)
coding and xkcd combined
(Julian/Julian's)
coding and xkcd combined
(Julian/Julian's)
 Meteorswarm
 Posts: 979
 Joined: Sun Dec 27, 2009 12:28 am UTC
 Location: Ithaca, NY
Re: Merging lists in place
Right, but remember that for any given size of inputs, the constant factors in our analysis may be more important than the actual big O terms, so really the best thing to do would be to try various methods and time them all.
The same as the old Meteorswarm, now with fewer posts!
Re: Merging lists in place
Meteorswarm wrote:Right, but remember that for any given size of inputs, the constant factors in our analysis may be more important than the actual big O terms, so really the best thing to do would be to try various methods and time them all.
I've always felt that this isn't emphasized enough. In my CS classes it seemed like all we cared about was the big O of the algorithm but a lot of our practice type situations weren't very large so we were actually taking more time by using a more sophisticated algorithm.
double epsilon = .0000001;
 Meteorswarm
 Posts: 979
 Joined: Sun Dec 27, 2009 12:28 am UTC
 Location: Ithaca, NY
Re: Merging lists in place
Dason wrote:Meteorswarm wrote:Right, but remember that for any given size of inputs, the constant factors in our analysis may be more important than the actual big O terms, so really the best thing to do would be to try various methods and time them all.
I've always felt that this isn't emphasized enough. In my CS classes it seemed like all we cared about was the big O of the algorithm but a lot of our practice type situations weren't very large so we were actually taking more time by using a more sophisticated algorithm.
For pedagogical reasons, it's useful because it's simpler analytically to talk about the big$character complexity than all the little vagaries that go into computing the actual run time (how does it behave with a cache of various sizes? for example). There are realworld applications where it matters. If you're Google, you probably don't care a whole lot about those constants when you're sure that your n is so huge they're tiny by comparison.
The same as the old Meteorswarm, now with fewer posts!
Re: Merging lists in place
Oh definitely. It's A LOT easier to just talk about the large n case. I just wish they would have made a slight effort to say something like "Hey guys, you know sometimes those constants matter so you gotta think about if that additional overhead makes sense in your particular problem or not". I mean hopefully everybody realizes these kinds of things... but with some of the kids I was taking those classes with I highly doubt they would think about these issues.
double epsilon = .0000001;
 Xanthir
 My HERO!!!
 Posts: 5250
 Joined: Tue Feb 20, 2007 12:49 am UTC
 Location: The Googleplex
 Contact:
Re: Merging lists in place
You don't really have to worry. Most people don't pay attention to data structures and algorithms in the first place, so ignoring constants is the least of your problems  they're still doing 100k linear searches over large linked lists rather than using hash tables.
(defun fibs (n &optional (a 1) (b 1)) (take n (unfold '+ a b)))

 Posts: 213
 Joined: Wed Sep 10, 2008 3:54 am UTC
Re: Merging lists in place
Xanthir wrote:You don't really have to worry. Most people don't pay attention to data structures and algorithms in the first place, so ignoring constants is the least of your problems  they're still doing 100k linear searches over large linked lists rather than using hash tables.
This.
My university keeps buying flashy software that doesn't scale at all, because the test demos all have 100 students. Asymptotic analysis can sometimes mean you're doing overkill on small problems, but it will keep you from doing something dumb later. (Hopefully).
 Meteorswarm
 Posts: 979
 Joined: Sun Dec 27, 2009 12:28 am UTC
 Location: Ithaca, NY
Re: Merging lists in place
Agent_Irons wrote:Xanthir wrote:You don't really have to worry. Most people don't pay attention to data structures and algorithms in the first place, so ignoring constants is the least of your problems  they're still doing 100k linear searches over large linked lists rather than using hash tables.
This.
My university keeps buying flashy software that doesn't scale at all, because the test demos all have 100 students. Asymptotic analysis can sometimes mean you're doing overkill on small problems, but it will keep you from doing something dumb later. (Hopefully).
It's all about matching your problem to your constraints. If you just need a oneoff analysis of 10,000 data items, who cares if it's O(n^4) if you can run it overnight. If you're writing something that scales to Google's data, you'd better optimize the heck out of it.
The same as the old Meteorswarm, now with fewer posts!

 Posts: 3
 Joined: Mon Mar 28, 2011 10:49 am UTC
Re: Merging lists in place
I recommend one of the inplace versions of quicksort: such as the following. If the call stack on this gets too big then you may be stuck using bubble sort
Code: Select all
void quicksort(ref int[] b, int begin, int end) {
int pi = end;
int ci = begin;
bool homogenous = true;
if(begin >= end) {
return;
}
for(int i = begin; i <= end; i++) {
if(homogenous && i > begin && b[i] != b[begin]) {
homogenous = false;
}
}
if(homogenous) {
return;
}
while(b[pi  1] == b[pi]) {
pi;
}
while(ci != pi) {
if(pi > ci) {
if(b[ci] < b[pi]) {
int tb = b[ci];
b[ci] = b[pi];
b[pi] = tb;
int ti = ci;
ci = pi;
pi = ti;
} else {
ci++;
}
} else {
if(b[ci] > b[pi]) {
int tb = b[ci];
b[ci] = b[pi];
b[pi] = tb;
int ti = ci;
ci = pi;
pi = ti;
} else {
ci;
}
}
}
quicksort(ref b, begin, pi  1);
quicksort(ref b, pi + 1, end);
}
Re: Merging lists in place
Jplus wrote:I stumbled across another option: quicksort with a stable partitioning algorithm. The C++ STL provides a stable partitioning algorithm. Their implementation is adaptive; if it can allocate some temporary storage it can run in O(n) just like the normal unstable partition of normal quicksort, and otherwise it can run O(n log n) inplace. So a quicksort built upon stable_partition would run O(n log n log n) if it has to be inplace, which is a lot better than the O(n^{2}) of insertion sort or inplace merge sort. If you're not using C++ you can still try and peek at their code here (search for "stable_partition").
It might not solve the OP's problem, I can't help but point out that the C++ STL has std::inplace_merge(). However, I don't think it's required to be stable (also, it's not required to be O(n), library implementors can choose O(n) with a buffer, or O(n*log(n)) without). If you have C++, it might be worth looking at the implementation provided. There's a lot of handy stuff like that in <alogrithm> that most C++ coders are unaware of.
"In no set of physics laws do you get two cats."  doogly
Re: Merging lists in place
It is possible to merge contiguous chunks of memory (input) inplace in O(n) time. You can read about it here: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.22.8523
Re: Merging lists in place
dhruvbird wrote:It is possible to merge contiguous chunks of memory (input) inplace in O(n) time. You can read about it here: http://citeseerx.ist.psu.edu/viewdoc/su ... .1.22.8523
That paper discusses a merge operation that takes O(n log n) operations, just like the algorithm mentioned by lgw. I do think that std::inplace_merge is the solution to the OP's problem.
"There are only two hard problems in computer science: cache coherence, naming things, and offbyone errors." (Phil Karlton and Leon Bambrick)
coding and xkcd combined
(Julian/Julian's)
coding and xkcd combined
(Julian/Julian's)
Re: Merging lists in place
Jplus wrote:dhruvbird wrote:It is possible to merge contiguous chunks of memory (input) inplace in O(n) time. You can read about it here: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.22.8523
That paper discusses a merge operation that takes O(n log n) operations, just like the algorithm mentioned by lgw. I do think that std::inplace_merge is the solution to the OP's problem.
Sorry, my bad. Here is the actual link: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.88.1155
Re: Merging lists in place
They specify "a fixed amount of additional space" which is the same thing the std::inplace_merge requires.
All Shadow priest spells that deal Fire damage now appear green.
Big freaky cereal boxes of death.
Re: Merging lists in place
dhruvbird is right: this merge algorithm takes linear time while std::inplace_merge takes loglinear time. However there is a catch: this algorithm isn't stable!
The authors do claim to have a sorting algorithm (but not a merge algorithm) that is stable, inplace and loglinear though... I'm going to look at the paper they refer to, as I'm rather sceptical about that possibility.
The authors do claim to have a sorting algorithm (but not a merge algorithm) that is stable, inplace and loglinear though... I'm going to look at the paper they refer to, as I'm rather sceptical about that possibility.
"There are only two hard problems in computer science: cache coherence, naming things, and offbyone errors." (Phil Karlton and Leon Bambrick)
coding and xkcd combined
(Julian/Julian's)
coding and xkcd combined
(Julian/Julian's)
 Yakk
 Poster with most posts but no title.
 Posts: 11063
 Joined: Sat Jan 27, 2007 7:27 pm UTC
 Location: E pur si muove
Re: Merging lists in place
Keep track of 4 lists:
1: source1
2: source2
3: circle  lives at the front of source2: circular buffer near the front of source2 that stores extra stuff from destination when we want to copy from source2
4: dest (which eats up the front of source1)
While source1.front() <= source2.front(), we just shove things into dest. Easy.
When source2.front() is smaller than the front of source1 or circle, we pop the front off source 2, give that space to circle, and stuff the front of source1 into circle, and create space for dest. If circle is smaller than or equal to the front of source2, we take source1's head, and put it in circle, while taking circle's head out, and putting it on the tail of dest.
When we run out of source1, we now have the circle buffer and source2 that we need to merge inplace. If circle is either sane enough, or can be made sane with a sufficiently cheap number of operations, we could argue from recursion here.
Hard part: Keeping circle sane. It is poping elements off, and getting room and new elements inserted, in a pretty insane way.
What properties does circle have to have to make this easy?
1) We want to be able to remove the front element from circle, and stop using that memory slot, while keeping it in a valid state. (this is for the 2nd merge pass)
2) We want to be able to insert a new element, higher than every element in the circle, and end up using 1 more unit of memory at the far end of the circle. (when source2.front() is smaller than circle.front(), we take source1.front() and stuff it into circle, and place source2.front() where it came from, and from there to dest).
3) We want to be able to remove the front element of the circle, and add in a new element that is than larger any element in circle, without changing our memory footprint at all. (when circle.front() is less than source2.front(), we want to take source1.front(), put it in circle while circle.front() goes to source1.front(), and from there to dest).
And we need to do 1 2 and 3 in a memory stable way.

The reason you often care more about O notation than practical tests is that practical tests on small n are easy to do, and are done all the time. Programs often break down when they work well on "typical" cases, but blow up when fed "extreme" cases. By paying attention to O notation, and making things fast and workable on small cases, you get code that is not always optimal for typical cases, but doesn't break down as often.
And it is really common that an algorithm originally intended to handle X sized cases ends up being used for Z sized cases 510 years down the road, as the power of computers makes the other bottlenecks melt away. (Why optimize for more than 10 textures, nobody has that much ram!)
Then again, the code that lives is the code that sucked just badly enough to still work in many cases.
1: source1
2: source2
3: circle  lives at the front of source2: circular buffer near the front of source2 that stores extra stuff from destination when we want to copy from source2
4: dest (which eats up the front of source1)
While source1.front() <= source2.front(), we just shove things into dest. Easy.
When source2.front() is smaller than the front of source1 or circle, we pop the front off source 2, give that space to circle, and stuff the front of source1 into circle, and create space for dest. If circle is smaller than or equal to the front of source2, we take source1's head, and put it in circle, while taking circle's head out, and putting it on the tail of dest.
When we run out of source1, we now have the circle buffer and source2 that we need to merge inplace. If circle is either sane enough, or can be made sane with a sufficiently cheap number of operations, we could argue from recursion here.
Hard part: Keeping circle sane. It is poping elements off, and getting room and new elements inserted, in a pretty insane way.
What properties does circle have to have to make this easy?
1) We want to be able to remove the front element from circle, and stop using that memory slot, while keeping it in a valid state. (this is for the 2nd merge pass)
2) We want to be able to insert a new element, higher than every element in the circle, and end up using 1 more unit of memory at the far end of the circle. (when source2.front() is smaller than circle.front(), we take source1.front() and stuff it into circle, and place source2.front() where it came from, and from there to dest).
3) We want to be able to remove the front element of the circle, and add in a new element that is than larger any element in circle, without changing our memory footprint at all. (when circle.front() is less than source2.front(), we want to take source1.front(), put it in circle while circle.front() goes to source1.front(), and from there to dest).
And we need to do 1 2 and 3 in a memory stable way.

The reason you often care more about O notation than practical tests is that practical tests on small n are easy to do, and are done all the time. Programs often break down when they work well on "typical" cases, but blow up when fed "extreme" cases. By paying attention to O notation, and making things fast and workable on small cases, you get code that is not always optimal for typical cases, but doesn't break down as often.
And it is really common that an algorithm originally intended to handle X sized cases ends up being used for Z sized cases 510 years down the road, as the power of computers makes the other bottlenecks melt away. (Why optimize for more than 10 textures, nobody has that much ram!)
Then again, the code that lives is the code that sucked just badly enough to still work in many cases.
One of the painful things about our time is that those who feel certainty are stupid, and those with any imagination and understanding are filled with doubt and indecision  BR
Last edited by JHVH on Fri Oct 23, 4004 BCE 6:17 pm, edited 6 times in total.
Last edited by JHVH on Fri Oct 23, 4004 BCE 6:17 pm, edited 6 times in total.
Re: Merging lists in place
Jplus wrote:dhruvbird is right: this merge algorithm takes linear time while std::inplace_merge takes loglinear time. However there is a catch: this algorithm isn't stable!
The authors do claim to have a sorting algorithm (but not a merge algorithm) that is stable, inplace and loglinear though... I'm going to look at the paper they refer to, as I'm rather sceptical about that possibility.
The earlier link was to a merge sort idea that does inplace sorting in time [imath]O(n\log{n})[/imath]. It is not entirely clear to me if the inplace mergesort is stable in nature.
However, it feels as if the inplace merge can be made stable by a bit a twerking.
The inplace merge is actually relatively straightforward. I'll try to explain a simpler version here.
 1. Given an array of size [imath]2n[/imath], sort 2 parts such that each one is as close to the other and is a multiple of [imath]\frac{2n}{\log{2n}}[/imath] (it works even if this is not true). Basically, we want a close to equal split. As long as they are a constant fraction of each other, we are fine.
 2. Chunk it up into blocks of size [imath]\frac{2n}{\log{2n}}[/imath]. You have [imath]\log{2n}[/imath] such chunks.
 3. Now, move the last [imath]\frac{2n}{\log{2n}}[/imath] elements of the total sorted sequence to the beginning of the buffer.
 4. This can be done by first finding the indexes of the boundary elements in each of the last chunks of size [imath]\frac{2n}{\log{2n}}[/imath] (there are only 2 such chunks). This takes time [imath]O(\frac{n}{\log{n}})[/imath].
 5. Now, using tripple reversal, move these elements to the beginning of the list. This takes time [imath]O(n)[/imath].
 6. Use the same merging technique as shown in the paper. You will be left with an unsorted buffer of size [imath]\frac{2n}{\log{2n}}[/imath] towards the end of the list.
 7. Use any inplace sorting algorithm that takes time [imath]O(\log{n})[/imath] to sort this buffer. It till be done in time [imath]O(n)[/imath] since the input is only of size [imath]O(\frac{n}{\log{n}})[/imath].
 8. We are done!
Re: Merging lists in place
Yes, the merge in isolation is linear and the sorting algorithm overall is inplace, O(n log n) and unstable. They state that very clearly in section 5. In section 5 they also claim to have found a nonmerging sorting algorithm that is inplace, O(n log n) and stable, but that one is discussed in another paper. They do consider the latter algorithm to be related to the former.
To be honest, I found the explanation from the paper easier to digest than yours. In fact, you seem to describe a different algorithm since the paper chopped the array in O(sqrt(n)) blocks while you're chopping it in O(log n) blocks.
@Yakk: it appears that you're explaining a different merge algorithm. Is it a stable one? Where did you get it from? Was it supposed to be an answer to a question, and if so, to what question?
To be honest, I found the explanation from the paper easier to digest than yours. In fact, you seem to describe a different algorithm since the paper chopped the array in O(sqrt(n)) blocks while you're chopping it in O(log n) blocks.
@Yakk: it appears that you're explaining a different merge algorithm. Is it a stable one? Where did you get it from? Was it supposed to be an answer to a question, and if so, to what question?
"There are only two hard problems in computer science: cache coherence, naming things, and offbyone errors." (Phil Karlton and Leon Bambrick)
coding and xkcd combined
(Julian/Julian's)
coding and xkcd combined
(Julian/Julian's)
 Yakk
 Poster with most posts but no title.
 Posts: 11063
 Joined: Sat Jan 27, 2007 7:27 pm UTC
 Location: E pur si muove
Re: Merging lists in place
I was trying to write a sorting algorithm from my head.
You walk over the first and second list in parallel, with the sorted elements being stored in the part of the first list you have already iterated over. The hard part is storing the excess elements in the first half of the second list whenever we take an element from the second list.
So I abstracted out that problem, and described what properties we'd need for the "container" stored there (which I called circle, because the naive nonworking solution is a circular buffer) in order for the problem to be solved using this naive method. Probably won't work, because those are some strange requirements. On the other hand, so long as we can do those operations in lg(n) time, we are good, so that gives some space to solve the problem...
Hmm. I might be able to argue that, if we can turn our structure stably into a list in something nice like O(n) or O(n lg n) time, requirement (1) can be done away with, because we only switch from needing (2) and (3) to needing (1) lg(n) times?
You walk over the first and second list in parallel, with the sorted elements being stored in the part of the first list you have already iterated over. The hard part is storing the excess elements in the first half of the second list whenever we take an element from the second list.
So I abstracted out that problem, and described what properties we'd need for the "container" stored there (which I called circle, because the naive nonworking solution is a circular buffer) in order for the problem to be solved using this naive method. Probably won't work, because those are some strange requirements. On the other hand, so long as we can do those operations in lg(n) time, we are good, so that gives some space to solve the problem...
Hmm. I might be able to argue that, if we can turn our structure stably into a list in something nice like O(n) or O(n lg n) time, requirement (1) can be done away with, because we only switch from needing (2) and (3) to needing (1) lg(n) times?
One of the painful things about our time is that those who feel certainty are stupid, and those with any imagination and understanding are filled with doubt and indecision  BR
Last edited by JHVH on Fri Oct 23, 4004 BCE 6:17 pm, edited 6 times in total.
Last edited by JHVH on Fri Oct 23, 4004 BCE 6:17 pm, edited 6 times in total.
Re: Merging lists in place
Hm. Seems like this algorithm is either going to take additional storage or lots of rotations in the circle. Interesting idea though.
Semirelated: I'm going to look up that other algorithm right now.
EDIT: So I dived headlong into it, and I found out that it is a merging sort after all (so I was wrong about that). Again there is a catch: the algorithm only works if no key appears more than approximately sqrt(n) times in the array. It's basically a faster version of an algorithm that was already published in 1977 by Luis Trabb Pardo and which has more relaxed requirements on the key distribution. The older algorithm however still requires that the array has at least about sqrt(n) distinct keys in order to keep its loglinear promise, and even when that works out it has a high constant factor.
Links to the papers (neither should require any special subscriptions):
Huang and Langston 1992
Trabb Pardo 1977
Semirelated: I'm going to look up that other algorithm right now.
EDIT: So I dived headlong into it, and I found out that it is a merging sort after all (so I was wrong about that). Again there is a catch: the algorithm only works if no key appears more than approximately sqrt(n) times in the array. It's basically a faster version of an algorithm that was already published in 1977 by Luis Trabb Pardo and which has more relaxed requirements on the key distribution. The older algorithm however still requires that the array has at least about sqrt(n) distinct keys in order to keep its loglinear promise, and even when that works out it has a high constant factor.
Links to the papers (neither should require any special subscriptions):
Huang and Langston 1992
Trabb Pardo 1977
"There are only two hard problems in computer science: cache coherence, naming things, and offbyone errors." (Phil Karlton and Leon Bambrick)
coding and xkcd combined
(Julian/Julian's)
coding and xkcd combined
(Julian/Julian's)
Re: Merging lists in place
Jplus wrote:Yes, the merge in isolation is linear and the sorting algorithm overall is inplace, O(n log n) and unstable. They state that very clearly in section 5. In section 5 they also claim to have found a nonmerging sorting algorithm that is inplace, O(n log n) and stable, but that one is discussed in another paper. They do consider the latter algorithm to be related to the former.
I will need to think about this more. Haven't read the other paper that you are referring to though.
Jplus wrote:To be honest, I found the explanation from the paper easier to digest than yours. In fact, you seem to describe a different algorithm since the paper chopped the array in O(sqrt(n)) blocks while you're chopping it in O(log n) blocks.
The algorithm I describe is similar to the one in the paper. Just the choice of partitions is different. Trying to drive home the point that any partitioning scheme that partitions into blocks of size [imath][\sqrt{n} \ldots{} \frac{n}{\log{n}}][/imath] will work.
The reason being that you get partitions that are between [imath]\log{n}\ \&\ \sqrt{n}[/imath] in number, both of which can be selected in [imath]O(n)[/imath] time. Further, you can use any [imath]O(n\log{n})[/imath] sort for the blocks and get away with a [imath]O(n)[/imath] overall running time.
Re: Merging lists in place
Jplus wrote:Links to the papers (neither should require any special subscriptions):
http://comjnl.oxfordjournals.org/content/35/6/643]Huang and Langston 1992
ftp://db.stanford.edu/pub/public_html/cstr.old/reports/cs/tr/74/470/CSTR74470.pdf]Trabb Pardo 1977
Thanks for the links! I shall be reading these over the week. The first one is not free for all though.
Re: Merging lists in place
There's a very conceptually simple O(n(logn)^{0.5}) version.
Break the lists into chunks of sqrt(logn), sort by the largest element. Perform a basic insertion sort starting from the end. No piece can have to move more than 2(sqrt(logn)) pieces, as there are only two monotonically increasing lists to choose blocks from, even adversarially I'm pretty sure, but certainly on average. Any given item cannot need to go back past items of the list it itself is from, and from the other list, you cannot have more than 1 chunk that you need to go past, because the chunks are sorted by the largest element. I'm not sure if its stable.
Break the lists into chunks of sqrt(logn), sort by the largest element. Perform a basic insertion sort starting from the end. No piece can have to move more than 2(sqrt(logn)) pieces, as there are only two monotonically increasing lists to choose blocks from, even adversarially I'm pretty sure, but certainly on average. Any given item cannot need to go back past items of the list it itself is from, and from the other list, you cannot have more than 1 chunk that you need to go past, because the chunks are sorted by the largest element. I'm not sure if its stable.
All Shadow priest spells that deal Fire damage now appear green.
Big freaky cereal boxes of death.
Re: Merging lists in place
Jplus wrote:EDIT: So I dived headlong into it, and I found out that it is a merging sort after all (so I was wrong about that). Again there is a catch: the algorithm only works if no key appears more than approximately sqrt(n) times in the array. It's basically a faster version of an algorithm that was already published in 1977 by Luis Trabb Pardo and which has more relaxed requirements on the key distribution. The older algorithm however still requires that the array has at least about sqrt(n) distinct keys in order to keep its loglinear promise, and even when that works out it has a high constant factor.
Links to the papers (neither should require any special subscriptions):
Huang and Langston 1992
Trabb Pardo 1977
Actually, there is one more catch. The elements of the buffer get randomly permuted, and that is harder than getting around the [imath]\sqrt{n}[/imath] restriction on the number of unique elements. I haven't been able to understand how Langston ensures that a final stable sorting of the buffer will ensure overall stability. i.e. How the buffer is manipulated throughout so that the buffer's elements can be reconstructed stably. It seems hard at first blush.
Re: Merging lists in place
The trick is in the composition of the buffer. All keys in the buffer are distinct, and each of the keys was the first of its rank (or the last, depending on which algorithm we're talking about). So when you sort the buffer in the end you don't need to worry about the relative order of the keys within, and when you merge the sorted buffer back into the rest of the array you know that ties should be broken in favour of the buffer (or the other way round, depending on which algorithm we're talking about).
So this is not really a catch, it's just a relatively complex way to achieve something relatively simple.
So this is not really a catch, it's just a relatively complex way to achieve something relatively simple.
"There are only two hard problems in computer science: cache coherence, naming things, and offbyone errors." (Phil Karlton and Leon Bambrick)
coding and xkcd combined
(Julian/Julian's)
coding and xkcd combined
(Julian/Julian's)
Re: Merging lists in place
Jplus wrote:The trick is in the composition of the buffer. All keys in the buffer are distinct, and each of the keys was the first of its rank (or the last, depending on which algorithm we're talking about). So when you sort the buffer in the end you don't need to worry about the relative order of the keys within, and when you merge the sorted buffer back into the rest of the array you know that ties should be broken in favour of the buffer (or the other way round, depending on which algorithm we're talking about).
So this is not really a catch, it's just a relatively complex way to achieve something relatively simple.
So, I am not able to understand how can you ensure that the buffer contains distinct keys if all keys are the same?
Even if they aren't, what strategy does the paper suggest in case (say) the largest, 2nd largest, 3rd largest, etc... keys are repeated 3 times each. Let's assume that the buffer holds the last key of it's rank for the purpose of this discussion.
Re: Merging lists in place
Well, the paper doesn't suggest any strategy for the case where all keys have the same value. That's why there is a lower bound of sqrt(n) on the number of distinct key values. There is no way to make it work for fewer distinct key values, AFAIK. That is the catch I was talking about.
I know, it makes these algorithms of rather limited use.
I know, it makes these algorithms of rather limited use.
"There are only two hard problems in computer science: cache coherence, naming things, and offbyone errors." (Phil Karlton and Leon Bambrick)
coding and xkcd combined
(Julian/Julian's)
coding and xkcd combined
(Julian/Julian's)
Re: Merging lists in place
Jplus wrote:Well, the paper doesn't suggest any strategy for the case where all keys have the same value. That's why there is a lower bound of sqrt(n) on the number of distinct key values. There is no way to make it work for fewer distinct key values, AFAIK. That is the catch I was talking about.
It seems that the paper does mention (in section 4.4: Other Relevant Details):
In the event that we exhaust the first sublist without filling the internal buffer, we must employ fewer but larger blocks. Specifically, if we obtain only [imath]s < \sqrt{n}[/imath] buffer elements, then we use s blocks, each of size at most [imath]\left\lceil \dfrac{n}{\sqrt{s}} \right\rceil[/imath].
This makes me think that any arbitrary input is supported by this scheme. Do you think so?
Re: Merging lists in place
Oh right, I forgot about that. Yes, you're right that any arbitrary input is supported, but note that the algorithms become less efficient as the block size differs more from sqrt(n). If you deviate much from sqrt(n), the logloglinear inplace merge sort will be more efficient.
As an aside: I notice that you're quite fond of jsMath, but others do not. AFAICT you don't really need it here.
As an aside: I notice that you're quite fond of jsMath, but others do not. AFAICT you don't really need it here.
"There are only two hard problems in computer science: cache coherence, naming things, and offbyone errors." (Phil Karlton and Leon Bambrick)
coding and xkcd combined
(Julian/Julian's)
coding and xkcd combined
(Julian/Julian's)
Who is online
Users browsing this forum: No registered users and 2 guests