Sorting

To sort an array of n items, we may require n - 1 exchanges.

An algorithm is said to be stable on a set of keyed data if elements with the same key are ordered in the same way after sorting as before.

Selection sort

Repeatedly find the smallest item and add it into the invariant list.

This is \Theta (n). The advantage is that the number of comparisons is independent of the original organization.

Binary insertion sort

A binary search on the invariant array can be performed in \left \lceil \lg(k) \right \rceil comparisons. The complete sorting process therefore makes \sum_{i = 1}^{n - 1} \left \lceil \lg(i) \right \rceil. This is bounded by \lg((n - 1)!) + n. This effectively attains the lower bound for general sorting, but it has high (quadratic) data movement costs.

Insertion sort

We swap an item down the invariant array until an item is reached which is less than it.

This clearly has worst case cost \Theta (n^2) in both comparisons and data movement. However, insertion sort is optimal for sorted lists, with n - 1 comparisons. It is practical for very small lists, or ones which are nearly sorted.

Insertion sort is stable.

Bubble sort

We repeatedly pass through the array, comparing adjacent elements. If they are out of order, we swap and continue. The algorithm is terminated when no more swaps are required.

At each stage we must move the largest item to the end of the array. Therefore, there can be at most n passes.

Mergesort

Merging data from two lists of length n/2 takes at most n steps. Therefore, we have the recurrence f(n) = 2f(n/2) + k n, which can be solved by dividing by n = 2^m. We find that the runtime is \Theta(n \lg n).

However, mergesort requires extra storage space. n/2 can be used by copying one half away and then running the algorithm as normal directly into the source array. (Another variant reverses one list and the decrements one pointer, but this still uses n units of extra space.)

A fast implementation might use insertion sort for small arrays.

Data Structures

Hash tables

Duplicate entries: h(k_i) = h(k_j), i \ne j
Chaining
Open addressing