Read Latex

Monday, July 20, 2015

A Random Note on Sorting



The Verb View

In the landmark 1981 film “Sorting out Sorting” from the University of Toronto Computer Systems Research Group, we see computational sorts placed in three categories by the nature of the operation (the verb) they execute to accomplish the sort. These three categories are:
  • Insertion sorts
  • Exchange sorts
  • Selection sorts
Insertion sorts find an out-of-order member, put it in the correct position, and then reorders the remaining members to account for this. When an array is used to store the entries in the list to be sorted, this reordering requires a quantity of operations whose number is proportional to the original size of the list. In computational complexity speak we say that this reordering requires order n operations, abbreviated O(n).

There are three  parts to every sort:
  • Traversal
  • Comparison
  • Movement
If the comparison step requires a fixed constant number of steps but the movement requires O(n) steps, the complexity of the sort will be at least the complexity of the slowest step, O(n) in the case here.

Sorts that require O(n) steps to compare and O(n) steps to move tend to run in O(n2) time. For large lists we don't like this as it creates a practical upper bound on the size of the list we can sort. It should be noted that when we are sorting just a few items, an O(n^2) sort can be just as fast, both in execution and implementation as a linear time sort. But it is always poor form to use an O(n2) sort on a large list, especially since there are so many excellent and free implementations that run faster.

The Noun View

The complexity analysis enumerated in Sorting out Sorting assumed that the elements of the sort were, at some point, stored in an array. For some sorts, this array is converted into a tree.

Let’s go back to the insertion sort for a moment. Consider a reordered element inserted between Array[i] and Array[i+1]. If the index [j]of the reordered element was greater than [i+1], then Array[i+1]…Array[j] elements must each be scooted over until their place in the ordered list can be determined. On average some linear fraction of the list elements will have to be moved, and if we do this n times for each element in the list we are left with a O(n^2) time complexity, a famous example of which is bubblesort.

But now instead of storing our list as an array, lets say that it is stored as a doubly-linked list. Inserting an item can now be done in constant time, because we do not have to scoot all the remaining items down the line. So an insertion step that requires linear time complexity O(n) using an array data type, requires only constant O(c) time complexity in the linked list case.

The astute reader will note that this speedup has not been obtained for free. A doubly-linked list requires that we not only store the item itself, but we must also store the links to the previous and next items. If our business plan involved sorting integers, we have tripled our storage costs by using a doubly-linked list.

The mental exercise that comes from this redux is to consider the cost of traversal, comparison and movement of list entries using the following structures:

  • Arrays: See Sorting out Sorting
  • Doubly-Linked List: See development above
  • Stacks (Last In First Out = LIFO)
  • Queues (or Lines, First In First Out = FIFO)
  • Bags (non-indexed, non-linked, you reach in and see what you get)
  • Trees (many kinds of trees each with their own properties)
This blog came from a question I asked myself this morning, "Does bubblesort on a stack give you the Towers of Hanoi?"



Cool Demonstrations

This site shows a nice comparison of sorting methods including their sonic signature.



This is an aid to feeling and learning what is happening during the sort. It is also handy if you have ADD because 15 sorting methods are compared in just six minutes.

The site sorting-algorithms.com enables a comparison of various methods and does the sort while you watch.


It does the best job of showing the performance of sorting algorithms on pathological cases, like nearly-sorted lists, reversed-lists and lists whose pivots are pathological.  One sorting algorithm that I like, even though it is not optimal is shellsort. If you had to stop the sort midway, the list is very close to being in the correct order. One might imagine this shellsort being best in situations where the number of items was very large, or where there was some real-time deadline in force that prevented finishing the sort.

If you want to move past the basics, MIT open courseware has some excellent lectures on the computer science fundamentals of sorting, including the mathematics of proofs briefly touched above.



Finally consider if we could use computing machinery to sort hazardous waste into its constituent components. Wouldn't that be useful!








No comments: