So first, let me define insertion sort to make sure we're on the same page. I picture the algorithm in pseudocode like this:
Sort(UnsortedList) = InsertionSort(empty list, UnsortedList)
InsertionSort(SortedList, UnsortedList) =
    If UnsortedList == empty list,
    then return SortedList
    Else return InsertionSort( Insert(Head of UnsortedList, SortedList), Tail of UnsortedList )
Here the head of a list is its first element and its tail is the rest of the list; Insert(element, List) is a function that inserts the element at the correct position in List and returns the new sorted list.
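For concreteness, here is a runnable Python transcription of that pseudocode (a sketch of mine, not part of the original post), with Insert done as a naive linear scan:

import sys
sys.setrecursionlimit(10_000)  # Python doesn't eliminate tail calls

def insertion_sort(unsorted):
    # Sort(UnsortedList) = InsertionSort(empty list, UnsortedList)
    return _insertion_sort([], list(unsorted))

def _insertion_sort(sorted_list, unsorted):
    # N tail-recursive calls, one per element of the unsorted list.
    if not unsorted:
        return sorted_list
    return _insertion_sort(insert(unsorted[0], sorted_list), unsorted[1:])

def insert(element, sorted_list):
    # Naive linear-scan Insert: O(N) worst case, hence O(N^2) overall.
    for i, x in enumerate(sorted_list):
        if element < x:
            return sorted_list[:i] + [element] + sorted_list[i:]
    return sorted_list + [element]

# insertion_sort([3, 1, 2]) == [1, 2, 3]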
This leads to N tail-recursive calls. As I understand it, the argument that the algorithm has O(N^2) complexity comes from the assumption that the Insert function runs in O(N) worst case.
But shouldn't you be able to write an insert function that runs in O(log N) time, since Insert is only called with a sorted list? For example:
Insert(element, List) =
    if List is empty then
        return [element]
    else if element > Last(List) then
        return List ++ [element]
    else if element > Middle(List) then
        return FirstHalf(List) ++ Insert(element, LastHalf(List))
    else
        return Insert(element, FirstHalf(List)) ++ LastHalf(List)
This should run in O(log N) time if I'm not mistaken, with not much overhead. I'm sure you could do even better with a nonrecursive function that works directly with list indices.
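Here is that Insert transcribed to Python (my own sketch). The number of comparisons really is O(log N); whether the splitting and ++ are cheap is a separate question, and with Python lists each slice below copies O(N) elements:

def insert(element, lst):
    # Base case: inserting into an empty list.
    if not lst:
        return [element]
    # Fast path from the pseudocode: element goes at the very end.
    if element > lst[-1]:
        return lst + [element]
    mid = len(lst) // 2
    if element > lst[mid]:
        # Recurse into the last half; lst[:mid + 1] is FirstHalf plus Middle.
        return lst[:mid + 1] + insert(element, lst[mid + 1:])
    # Recurse into the first half.
    return insert(element, lst[:mid]) + lst[mid:]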
So why is InsertionSort usually given as an O(N^2) time algorithm? Is there something that I've missed?
Why is insertion sort O(N^2) ?
Re: Why is insertion sort O(N^2) ?
The insertion cannot be implemented in O(log n). How do you represent the list? Sorting algorithms are usually designed to work on arrays. If you do that, you can use binary search to find the correct position of the element, but in order to insert it you have to shift O(n) elements in the worst case. If you choose to represent the list as a linked list, you can insert an element in the middle of the list in O(1), but you cannot use binary search to find the correct position.
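To make the array case concrete, here is a sketch of mine of the standard "binary insertion sort" (not code from the thread):

import bisect

def binary_insertion_sort(a):
    # Sorts the list in place.
    for i in range(1, len(a)):
        x = a[i]
        # Binary search: O(log n) comparisons to locate the position.
        pos = bisect.bisect_right(a, x, 0, i)
        # Shifting: up to i element moves to open the slot; this is
        # the O(n) step that keeps the whole sort at O(n^2).
        a[pos + 1:i + 1] = a[pos:i]
        a[pos] = x
    return a

# binary_insertion_sort([3, 1, 2]) == [1, 2, 3]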
Re: Why is insertion sort O(N^2) ?
Ah OK, so the explanation is that even if you only need O(N log N) comparisons, other operations on common data structures grow faster and become dominant once comparisons aren't the bottleneck. That makes sense. I guess it also explains why insertion sort works so well for non-computing applications like sorting playing cards, since shifting costs nothing there.
Re: Why is insertion sort O(N^2) ?
Not exactly: you're saying other operations are what keep the whole algorithm from being O(n log n). You can achieve that bound using the right data structures, such as self-balancing binary trees (e.g. red-black trees). That kind of sort is nevertheless inferior to other approaches because of the overhead of such structures. This kind of sorting is sometimes also called binary insertion sort. Normal insertion sort is traditionally implemented over an array, which is why O(N^2) comes up when discussing it.
Re: Why is insertion sort O(N^2) ?
Boson Collider wrote: This leads to N tail-recursive calls. As I understand it, the argument that the algorithm has O(N^2) complexity comes from the assumption that the Insert function runs in O(N) worst case.
But shouldn't you be able to write an insert function that runs in O(log N) time, since Insert is only called with a sorted list? For example:
Alternative answer: Yes you can. That's called HeapSort. A "Heap" is a partially ordered structure that supports O(log N) insertions and O(log N) removals of the minimum.
If you have an "insertion-sort" that uses a Heap, it's called HeapSort.
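A quick illustration of that idea using Python's heapq (my sketch, not code from the thread):

import heapq

def heap_sort(items):
    heap = []
    # "Insert" each element into the heap: O(log n) per push.
    for x in items:
        heapq.heappush(heap, x)
    # Repeatedly remove the minimum: O(log n) per pop, O(n log n) total.
    return [heapq.heappop(heap) for _ in range(len(heap))]

# heap_sort([3, 1, 4, 1, 5]) == [1, 1, 3, 4, 5]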
Re: Why is insertion sort O(N^2) ?
korona wrote: If you choose to represent the list as a linked list, you can insert an element in the middle of the list in O(1), but you cannot use binary search to find the correct position.
Suppose you partially index the list? For example, you store (in an array) the address and value of every (say) tenth element. You can use a binary search on this array to jump into the linked list, so you don't have to traverse the whole thing, just the last mile. Yes, there's some overhead in maintaining the array, but the array needn't be perfect in order to speed things up. Would this speedup not be enough to reduce the O factor? What if it were done recursively (that is, index the index as n gets bigger)?
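As a rough sketch of that idea (mine, not ucim's code; a plain Python list stands in for the linked list, and the index is rebuilt from scratch for simplicity, although maintaining it under insertions is where the real cost hides):

import bisect

K = 10  # index every K-th element, as in the post

def build_index(sorted_list):
    # Value and position of every K-th element.
    positions = list(range(0, len(sorted_list), K))
    return [sorted_list[p] for p in positions], positions

def find_position(sorted_list, x):
    keys, positions = build_index(sorted_list)
    # Binary search the small index: O(log(n/K)) comparisons...
    i = bisect.bisect_right(keys, x)
    pos = positions[i - 1] if i > 0 else 0
    # ...then traverse only "the last mile": at most K steps.
    while pos < len(sorted_list) and sorted_list[pos] <= x:
        pos += 1
    return pos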
Re: Why is insertion sort O(N^2) ?
I believe there are four characteristics of insertion sort:
a) you're taking the input elements one by one.
b) you're inserting the elements directly into the output data structure.
c) after each insert, the output data structure is sorted.
d) the memory overhead is O(1).
If you push the elements into a heap, then empty the heap into the output structure, it's not insertion sort because of b). As soon as you add an index on top of the output data structure, it's not insertion sort because of d).
And I think that's an important point to make: adding an index is not improving insertion sort. It's making a different tradeoff, memory vs. speed.
About your actual suggestion, the index you're thinking of is called a skip list. If done correctly, it has O(log(n)) insertion costs on average, but the worst case insertion costs are O(n). So - unless you can prove otherwise via amortization - the worst case complexity for the whole sort must still be assumed O(n^2).
It's possible to reduce this to O(n*log(n)) with a different index structure (a B+ tree comes to mind, since it keeps the leaf nodes in a linked list anyway), but again, this requires additional memory, so it's not insertion sort.
Also, these things only work if your output structure is a list. On an array, I'm not aware of any way to get insertion sort below O(n^2) while keeping characteristics a, b and c.
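For reference, here is a compact skip list in Python (my sketch of the standard structure, not code from the thread): the randomized levels give O(log n) expected insertion, but a run of bad coin flips degrades the structure, which is the O(n) worst case mentioned above.

import random

class SkipNode:
    def __init__(self, value, level):
        self.value = value
        self.next = [None] * level

class SkipList:
    MAX_LEVEL = 16  # enough for roughly 2^16 elements

    def __init__(self):
        self.head = SkipNode(None, self.MAX_LEVEL)  # sentinel
        self.level = 1

    def insert(self, value):
        # Walk down from the top level, recording where we step down.
        update = [self.head] * self.MAX_LEVEL
        node = self.head
        for i in range(self.level - 1, -1, -1):
            while node.next[i] is not None and node.next[i].value < value:
                node = node.next[i]
            update[i] = node
        # Choose the new node's level by coin flips: P(level >= k) = 2^-k.
        lvl = 1
        while lvl < self.MAX_LEVEL and random.random() < 0.5:
            lvl += 1
        self.level = max(self.level, lvl)
        # Splice the new node into every level it participates in.
        new = SkipNode(value, lvl)
        for i in range(lvl):
            new.next[i] = update[i].next[i]
            update[i].next[i] = new

def skip_list_sort(items):
    sl = SkipList()
    for x in items:
        sl.insert(x)
    out, node = [], sl.head.next[0]
    while node is not None:
        out.append(node.value)
        node = node.next[0]
    return out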
Re: Why is insertion sort O(N^2) ?
KnightExemplar wrote: Boson Collider wrote: This leads to N tail-recursive calls. As I understand it, the argument that the algorithm has O(N^2) complexity comes from the assumption that the Insert function runs in O(N) worst case.
But shouldn't you be able to write an insert function that runs in O(log N) time, since Insert is only called with a sorted list? For example:
Alternative answer: Yes you can. That's called HeapSort. A "Heap" is a partially ordered structure that supports O(log N) insertions and O(log N) removals of the minimum.
If you have an "insertion-sort" that uses a Heap, it's called HeapSort.
Alternatively, you can insert the elements into a binary search tree. This is called Tree Sort.
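A bare-bones sketch of that (mine): with a plain unbalanced BST the worst case is still O(n^2) on already-sorted input; the self-balancing trees mentioned earlier bring it down to O(n log n).

class Node:
    def __init__(self, value):
        self.value, self.left, self.right = value, None, None

def bst_insert(root, value):
    # O(height) per insert: O(log n) when balanced, O(n) when degenerate.
    if root is None:
        return Node(value)
    if value < root.value:
        root.left = bst_insert(root.left, value)
    else:
        root.right = bst_insert(root.right, value)
    return root

def tree_sort(items):
    root = None
    for x in items:
        root = bst_insert(root, x)
    out = []
    def walk(node):
        # In-order traversal yields the values in sorted order.
        if node is not None:
            walk(node.left)
            out.append(node.value)
            walk(node.right)
    walk(root)
    return out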