2015-10-31

Sorting

Comparison based sorting algorithms

  • Bubble sort
    • O(n^2), in-place, stable
  • Insertion sort
    • O(n^2), in-place, stable
    • in practical, it is faster for small set of input
    • cache friendly
  • Selection sort
    • O(n^2), in-place, stable
  • Merge sort     
    • O(nlgn) time, extra space, stable
    • This algorithm could be used for big chunk of data which is unable to fit all of them into memory, because we can load part of the data into memory and sort them, at the last, merge each partly sorted pieces.
  • Quick sort      
    • O(nlgn) time, in-place, constant space, unstable 
    • 3 ways to choose pivot
      • always pick first one: easy implement but no good performance for sorted input.
      • With randomized algorithm running time analysis, which is choosing the pivot number randomly from the input, the worst case running time O(n^2) can be improved to O(nlgn) even for sorted input.
      • however, generate random number is not cheap. that comes strategy 3: median-of-3 (left, right, middle) which is more practical and also has good performance.
  • Heap sort       
    • O(nlgn) time, in-place, stable
    • Heap is very good way for extracting max/min although heap sort is a nice application of heap
  • BST sort
    • O(nlgn) time if we keep the tree balanced

Non-comparison based sorting algorithms

  • Counting sort
    • has to know the range of input data, for instance, grading could be within 100, age within 200 etc. This is also a limitation of this algorithm.
    • need extra linear space to count each element of input
  • Bucket sort
    • chop data chunk into small pieces, then use other algorithm to sort each piece. In this sense, one could argue it is comparison based sorting algorithm probably.
    • assuming the data has uniform distribution, otherwise this method does make much sense.
    • divide-and-conquer methodology
    • degenerate to counting sort if each element has a bucket
  • Radix sort
    • sorting based on each digit and put them into queue
    • easier way to go through digit from right to left.
    • running time: O(kn), k is the digit length of input
    • if input has different length of digits, this algorithm also works. Just output those elements with shorter digits, because they were sorted earlier than with longer digits.
    • stable

 Why are there so many sorting algorithm?

  • depends on size of input (insertion sorting may not that bad for small amount of input, for N <= 20, faster than quicksort)
  • depends on data distribution (non-comparison sorting algorithm)
  • depends on memory limits (merge sort could be very useful for huge mount of input)

Reference

No comments:

Post a Comment