Moderators: phlip, Prelates, Moderators General
Xanthir wrote:You can't merge two lists in place, unless you're splitting the final list across the two input lists. Merging two lists necessarily creates a new list, of size equal to the sum of the input sizes.
Stable merging of two lists in linear time is easy, though. Just set up two list pointers, each pointing to the start of one of the inputs. Every iteration, check which of the two pointers has a smaller value. Copy that value into the result and move that pointer. Repeat until one of the lists is empty, at which point you can just append the entries in the remaining list to the end of the output list. That's a single linear pass through each list.
LucasBrown wrote:My program isn't really merging two lists... it's all in one contiguous block of memory, and my program is written in such a way that the first half might as well be a completely seperate list from the second half. My question is this: is there a way to rearrange the items in the list in a stable, sub-quadratic manner that uses a quantity of additional space that is independent of the size of the lists?
P.S. The only reason I'm posting this is because the piece of hardware my program's on has miniscule memory and the data I'm working with almost fills the memory--I would gladly do the copy-into-new-list thing if it was possible on my hardware.
Berengal wrote:You can get O(n) time with O(n) memory using a merge, or O(n log n) time with O(1) memory by sorting the whole thing. Or, if you switch to linked lists, you can get O(n) time with O(1) memory using a merge.
Meteorswarm wrote:You can also mimic a stable sort (assuming that you're trying to use it to have things be sub-sorted by other keys) by having your comparison function be sophisticated enough to consider all the keys simultaneously.
Meteorswarm wrote:Right, but remember that for any given size of inputs, the constant factors in our analysis may be more important than the actual big O terms, so really the best thing to do would be to try various methods and time them all.
Dason wrote:Meteorswarm wrote:Right, but remember that for any given size of inputs, the constant factors in our analysis may be more important than the actual big O terms, so really the best thing to do would be to try various methods and time them all.
I've always felt that this isn't emphasized enough. In my CS classes it seemed like all we cared about was the big O of the algorithm but a lot of our practice type situations weren't very large so we were actually taking more time by using a more sophisticated algorithm.
Xanthir wrote:You don't really have to worry. Most people don't pay attention to data structures and algorithms in the first place, so ignoring constants is the least of your problems - they're still doing 100k linear searches over large linked lists rather than using hash tables.
Agent_Irons wrote:Xanthir wrote:You don't really have to worry. Most people don't pay attention to data structures and algorithms in the first place, so ignoring constants is the least of your problems - they're still doing 100k linear searches over large linked lists rather than using hash tables.
This.
My university keeps buying flashy software that doesn't scale at all, because the test demos all have 100 students. Asymptotic analysis can sometimes mean you're doing overkill on small problems, but it will keep you from doing something dumb later. (Hopefully).
void quicksort(ref int[] b, int begin, int end) {
int pi = end;
int ci = begin;
bool homogenous = true;
if(begin >= end) {
return;
}
for(int i = begin; i <= end; i++) {
if(homogenous && i > begin && b[i] != b[begin]) {
homogenous = false;
}
}
if(homogenous) {
return;
}
while(b[pi - 1] == b[pi]) {
pi--;
}
while(ci != pi) {
if(pi > ci) {
if(b[ci] < b[pi]) {
int tb = b[ci];
b[ci] = b[pi];
b[pi] = tb;
int ti = ci;
ci = pi;
pi = ti;
} else {
ci++;
}
} else {
if(b[ci] > b[pi]) {
int tb = b[ci];
b[ci] = b[pi];
b[pi] = tb;
int ti = ci;
ci = pi;
pi = ti;
} else {
ci--;
}
}
}
quicksort(ref b, begin, pi - 1);
quicksort(ref b, pi + 1, end);
}
Jplus wrote:I stumbled across another option: quicksort with a stable partitioning algorithm. The C++ STL provides a stable partitioning algorithm. Their implementation is adaptive; if it can allocate some temporary storage it can run in O(n) just like the normal unstable partition of normal quicksort, and otherwise it can run O(n log n) in-place. So a quicksort built upon stable_partition would run O(n log n log n) if it has to be in-place, which is a lot better than the O(n^{2}) of insertion sort or in-place merge sort. If you're not using C++ you can still try and peek at their code here (search for "stable_partition").
dhruvbird wrote:It is possible to merge contiguous chunks of memory (input) in-place in O(n) time. You can read about it here: http://citeseerx.ist.psu.edu/viewdoc/su ... .1.22.8523
Jplus wrote:dhruvbird wrote:It is possible to merge contiguous chunks of memory (input) in-place in O(n) time. You can read about it here: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.22.8523
That paper discusses a merge operation that takes O(n log n) operations, just like the algorithm mentioned by lgw. I do think that std::inplace_merge is the solution to the OP's problem.
All Shadow priest spells that deal Fire damage now appear green.
Big freaky cereal boxes of death.
Jplus wrote:dhruvbird is right: this merge algorithm takes linear time while std::inplace_merge takes log-linear time. However there is a catch: this algorithm isn't stable!
The authors do claim to have a sorting algorithm (but not a merge algorithm) that is stable, in-place and log-linear though... I'm going to look at the paper they refer to, as I'm rather sceptical about that possibility.
Jplus wrote:Yes, the merge in isolation is linear and the sorting algorithm overall is in-place, O(n log n) and unstable. They state that very clearly in section 5. In section 5 they also claim to have found a non-merging sorting algorithm that is in-place, O(n log n) and stable, but that one is discussed in another paper. They do consider the latter algorithm to be related to the former.
Jplus wrote:To be honest, I found the explanation from the paper easier to digest than yours. In fact, you seem to describe a different algorithm since the paper chopped the array in O(sqrt(n)) blocks while you're chopping it in O(log n) blocks.
Jplus wrote:Links to the papers (neither should require any special subscriptions):
http://comjnl.oxfordjournals.org/content/35/6/643]Huang and Langston 1992
ftp://db.stanford.edu/pub/public_html/cstr.old/reports/cs/tr/74/470/CS-TR-74-470.pdf]Trabb Pardo 1977
All Shadow priest spells that deal Fire damage now appear green.
Big freaky cereal boxes of death.
Jplus wrote:EDIT: So I dived headlong into it, and I found out that it is a merging sort after all (so I was wrong about that). Again there is a catch: the algorithm only works if no key appears more than approximately sqrt(n) times in the array. It's basically a faster version of an algorithm that was already published in 1977 by Luis Trabb Pardo and which has more relaxed requirements on the key distribution. The older algorithm however still requires that the array has at least about sqrt(n) distinct keys in order to keep its log-linear promise, and even when that works out it has a high constant factor.
Links to the papers (neither should require any special subscriptions):
Huang and Langston 1992
Trabb Pardo 1977
Jplus wrote:The trick is in the composition of the buffer. All keys in the buffer are distinct, and each of the keys was the first of its rank (or the last, depending on which algorithm we're talking about). So when you sort the buffer in the end you don't need to worry about the relative order of the keys within, and when you merge the sorted buffer back into the rest of the array you know that ties should be broken in favour of the buffer (or the other way round, depending on which algorithm we're talking about).
So this is not really a catch, it's just a relatively complex way to achieve something relatively simple.
Jplus wrote:Well, the paper doesn't suggest any strategy for the case where all keys have the same value. That's why there is a lower bound of sqrt(n) on the number of distinct key values. There is no way to make it work for fewer distinct key values, AFAIK. That is the catch I was talking about.
In the event that we exhaust the first sublist without filling the internal buffer, we must employ fewer but larger blocks. Specifically, if we obtain only s < \sqrt{n} buffer elements, then we use s blocks, each of size at most \left\lceil \dfrac{n}{\sqrt{s}} \right\rceil.
Users browsing this forum: No registered users and 5 guests