Sorting
To sort an array of
items, we may require
exchanges.
An algorithm is said to be stable on a set of keyed data if elements with the same key are ordered in the same way after sorting as before.
Selection sort
Repeatedly find the smallest item and add it into the invariant list.
This is
. The advantage is that the number of comparisons is independent of the original organization.
Binary insertion sort
A binary search on the invariant array can be performed in
comparisons. The complete sorting process therefore makes
. This is bounded by
. This effectively attains the lower bound for general sorting, but it has high (quadratic) data movement costs.
Insertion sort
We swap an item down the invariant array until an item is reached which is less than it.
This clearly has worst case cost
in both comparisons and data movement. However, insertion sort is optimal for sorted lists, with
comparisons. It is practical for very small lists, or ones which are nearly sorted.
Insertion sort is stable.
Bubble sort
We repeatedly pass through the array, comparing adjacent elements. If they are out of order, we swap and continue. The algorithm is terminated when no more swaps are required.
At each stage we must move the largest item to the end of the array. Therefore, there can be at most
passes.
Mergesort
Merging data from two lists of length
takes at most
steps. Therefore, we have the recurrence
, which can be solved by dividing by
. We find that the runtime is
.
- Simple
- Optimal cost
- Low time overheads
However, mergesort requires extra storage space.
can be used by copying one half away and then running the algorithm as normal directly into the source array. (Another variant reverses one list and the decrements one pointer, but this still uses
units of extra space.)
A fast implementation might use insertion sort for small arrays.
Data Structures
Hash tables
Duplicate entries:
- We can deal with duplicates using either chaining or open addressing.
Chaining
- Each entry in the hash table contains a linked list. Inserting into the linked list occurs into the pointer in the table.
Open addressing
- We extend the hashing function into a two parameter function; the second parameter is the probe count, an integer.
- Insertion involves calculating
. If this index is non-empty, we calculate
and so on. The hash function must satisfy the condition that the probe sequence
must be a permutation of
. - Deleting from an open address hash table is difficult. We cannot simply mark the slot we're deleting from as empty, as if we do this, values added after the value under deletion will be inaccessible.