A better implementation of bead-sort
Bead sort (a.k.a. abacus sort, or gravity sort) is a sorting algorithm that can sort a list of positive integers.
Belonging to a class of natural algorithms, it uses (or rather, simulates) gravity to sort an input list.
The sort algorithm accomplishes this in three acts –
1. Represent each number in the input list as a row of beads on vertical rods, as on an abacus.
2. Let the beads fall under gravity, so they pile up at the bottom of each rod.
3. Read the bead counts off the rows; they are now in sorted order.
As simple as the 3-step description makes it seem, the algorithm doesn’t translate as easily to software.
It is usually implemented as a 2D matrix representing the abacus/beads state and beads are dropped level-by-level.
Here’s a reference implementation in python –
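A minimal sketch of that usual matrix-based approach (the function name and grid layout here are illustrative choices):

```python
def bead_sort(a):
    """Sort a list of positive integers by simulating falling beads."""
    n = len(a)
    if n == 0:
        return a
    m = max(a)
    # grid[i][j] == 1 means row i has a bead on rod j
    grid = [[1 if a[i] > j else 0 for j in range(m)] for i in range(n)]
    # drop the beads in every column down to the bottom rows
    for j in range(m):
        beads = sum(grid[i][j] for i in range(n))
        for i in range(n):
            grid[i][j] = 1 if i >= n - beads else 0
    # each row's bead count is now an element of the sorted (ascending) output
    return [sum(row) for row in grid]
```

The n×max grid and the level-by-level drop are exactly where the O(n*max) space and time costs come from.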
The above implementation uses O(n*max) storage⁽¹⁾, and has a time complexity of O(n*max).
(n: size of the input list; max: maximum element in the input list)
An O(n*max) storage requirement is terrible for a sorting algorithm.
The time efficiency isn’t as terrible, though: it can potentially outperform O(nlogn) comparison-based sort algorithms for very small numbers (i.e. max << n).
For instance, if all numbers in the input list are below 100, then the algorithm runs in almost linear time for large n.
But log(n) grows far too slowly for there to be a meaningful ‘small numbers set’ in which the above algorithm outperforms O(nlogn) sorts.
For context, log₂(10⁹) ≈ 30.
However, the algorithm does have one last redeeming quality.
It yields well to parallelism.
In theory, with ‘max’ concurrent threads, each thread can independently drop the beads in their respective columns in linear time.
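A sketch of that idea, with one task per column (in CPython the GIL means threads here illustrate the structure rather than deliver a real speedup):

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_bead_drop(a):
    n, m = len(a), max(a)
    # grid[i][j] == 1 means row i has a bead on rod j
    grid = [[1 if a[i] > j else 0 for j in range(m)] for i in range(n)]

    def drop_column(j):
        # columns share no state, so each one can drop its beads independently
        beads = sum(grid[i][j] for i in range(n))
        for i in range(n):
            grid[i][j] = 1 if i >= n - beads else 0

    # one concurrent task per column; no locking needed
    with ThreadPoolExecutor(max_workers=m) as pool:
        list(pool.map(drop_column, range(m)))
    return [sum(row) for row in grid]
```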
A better implementation
In this article, I outline an implementation with O(n) space, and O(n*(max-min)) time efficiency.
(n: size of the input list; max: maximum element in the input list; min: minimum element in the input list)
Full disclosure: I didn’t set out to improve the efficiency of bead-sort.
A random bug in a completely unrelated program pointed me towards an alternate, more efficient implementation for bead-sort.
Specifically, I was trying to print a histogram for a given set of numbers using ‘*’s for building blocks.
An input of [2,6,1,4,3] is to print the following pattern –
  *
  *
  *   *
  *   * *
* *   * *
* * * * *
The solution is simple enough –
1. There are going to be 'max' rows in the output. So iterate over max rows down to 0
2. At each row i,
   2.1 Print a '*' for each a[j] (0 <= j < n) if a[j] is tall enough to reach row i (a[j] > i)
   2.2 Add adequate spaces otherwise.
Below is the code for the same –
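Something along these lines (the helper name is mine):

```python
import sys

def histogram(a):
    # 'max' rows, from the tallest level down to level 0
    for i in range(max(a) - 1, -1, -1):
        for j in range(len(a)):
            # step 2.1: a '*' if a[j] reaches row i; step 2.2: spaces otherwise
            sys.stdout.write('* ' if a[j] > i else '  ')
        sys.stdout.write('\n')
```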
However, I missed step 2.2 in my initial solution, and ended up coding something like below –
sys.stdout.write('* ') if a[j] > i else None
and so when I ran it, the algorithm promptly printed
*
*
* *
* * *
* * * *
* * * * *
While it was immediately obvious why this wasn’t quite the histogram output I was looking for, something else caught my eye.
The number of ‘*’s in each column was sorted in descending order.
The absence of spaces when a[j] wasn’t big enough for row ‘i’ caused the ‘*’s to gravitate towards the left and fill the gaps, just as in the abacus sort illustration above.
I went ahead and implemented a bead-sort solution inspired by the bug in my histogram code.
0. Initialize a temporary list, temp, filled with 0s.
1. Iterate 'max' rows down to 0
2. At each row i, increment temp[0..] by 1 for all a[j] greater than i.
3. After 'max' iterations, the temporary list is sorted in descending order.
   3.1 Copy elements from the temporary list back into the original list in reverse order to sort the input list in ascending order.
A reference python implementation of the aforementioned algorithm follows –
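A sketch of that algorithm, where temp holds the per-column bead counts:

```python
def bead_sort2(a):
    n = len(a)
    if n == 0:
        return a
    temp = [0] * n
    # iterate 'max' rows, from the top (max - 1) down to 0
    for i in range(max(a) - 1, -1, -1):
        k = 0
        for j in range(n):
            # every a[j] tall enough to reach row i drops a bead into
            # the leftmost free column -- exactly the histogram bug
            if a[j] > i:
                temp[k] += 1
                k += 1
    # temp is sorted descending; copy back in reverse for ascending order
    for j in range(n):
        a[j] = temp[n - 1 - j]
    return a
```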
The above algorithm has a space efficiency of O(n) and a time efficiency of O(n*max).
While we got the space efficiency down to O(n), the time efficiency isn’t any better off.
Can we do better?
Notice how the condition on the following line is always true for all i < ‘minimum’?
(minimum: smallest element in the input list)
if a[j] > i:
This means that the check can be avoided altogether by stopping the loop once i == minimum, and later incrementing all the elements in the list by ‘minimum’.
Equivalently, initialize the list to [minimum] instead of 0s, and iterate over the rows from maximum down to minimum.
The code below takes advantage of this optimization –
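A sketch, with the rows clipped to the [minimum, maximum) range:

```python
def bead_sort3(a):
    n = len(a)
    if n == 0:
        return a
    lo, hi = min(a), max(a)
    # every column holds at least 'lo' beads, so start the counts there
    temp = [lo] * n
    # only the rows from hi - 1 down to lo need checking
    for i in range(hi - 1, lo - 1, -1):
        k = 0
        for j in range(n):
            if a[j] > i:
                temp[k] += 1
                k += 1
    # temp is sorted descending; copy back in reverse for ascending order
    for j in range(n):
        a[j] = temp[n - 1 - j]
    return a
```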
This optimization yields a time efficiency of O(n*(max-min)) while retaining the space efficiency of O(n) from the previous approach. Much better.
Better time and space efficiency.
While algorithm #2 and #3 bring the space efficiency to O(n), Bead-sort still remains a terrible algorithm for sorting even with the O(n*(max-min)) time optimization at algorithm #3.
However, a time efficiency of O(n*(max-min)) means it can approach linear time for numbers in a narrow range, irrespective of how big the numbers themselves are.
If all numbers in the input list are between 10⁹+1 to 10⁹+100, algorithm #3 has a runtime complexity of O(n * 100); With a big enough ‘n’ this will be closer to O(n).
In contrast, algorithms #1 and #2 would have a time efficiency of O(n*10⁹) for the same input.
To be fair, algorithm #1 can potentially work with negative numbers.
One way this can be accomplished is by –
1. Bring all the numbers to a baseline zero by adding a number x
2. Sort the modified list
3. Subsequently decrement all the numbers by x.
An input of [2, -2, 3] can be modified to [4, 0, 5] by adding +2 to all input numbers;
running the sort algorithm on the modified list would yield [0, 4, 5];
and finally, subtracting 2 from every element restores a sorted order for the original input: [-2, 2, 3]
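Putting the three steps together (using the algorithm #3 style sort as the inner sort; the wrapper name is mine):

```python
def sort_with_negatives(a):
    if not a:
        return a
    shift = -min(a)                   # the 'x' that baselines the list to zero
    b = [v + shift for v in a]        # step 1: all values are now >= 0
    # step 2: bead-sort the shifted list (algorithm #3 style)
    n, lo, hi = len(b), min(b), max(b)
    temp = [lo] * n
    for i in range(hi - 1, lo - 1, -1):
        k = 0
        for j in range(n):
            if b[j] > i:
                temp[k] += 1
                k += 1
    # step 3: undo the shift while copying back in ascending order
    return [temp[n - 1 - j] - shift for j in range(n)]
```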
Optimizing the step while iterating between rows
Algorithm #3 iterates between ‘rows of beads’ from maximum down to minimum in steps of 1.
If the nature of input numbers is known in advance, the step count can be optimized to move faster.
For example, if the input numbers are all even, the step count can be increased to 2, reducing the runtime by half.
Or, if every number in the input list, a[i] ≡ x mod 500 for some x (imagine a worker thread handling every 500th request), then a step of ‘500’ can be used.
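A sketch of algorithm #3 with a configurable step (an assumption here: the caller picks a step that evenly divides all the gaps between elements):

```python
def bead_sort4(a, step=1):
    n = len(a)
    if n == 0:
        return a
    lo, hi = min(a), max(a)
    temp = [lo] * n
    # walk the rows from hi - step down to lo, 'step' at a time
    i = hi - step
    while i >= lo:
        k = 0
        for j in range(n):
            if a[j] > i:
                temp[k] += step   # each row now represents 'step' beads
                k += 1
        i -= step
    # temp is sorted descending; return it reversed for ascending order
    return [temp[n - 1 - j] for j in range(n)]
```

A fractional step such as 0.5 works the same way, within the limits of floating point arithmetic.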
Floating point numbers
With the step optimization outlined above, algorithm #4 can potentially work with floating point numbers when a proper step can be chosen.
An input of [2.0, 1.5, 3.0, -0.5] can be sorted with a step of 0.5 in algorithm #4.
However, finding a meaningful step might not always be viable, or the step might be too small (e.g., 0.0005) to be of any use for sorting.
While at first glance algorithms #2, #3, and #4 appear to retain the parallel properties of #1, a closer look at the following lines reveals that updates to the temporary array need to be synchronized, so these algorithms might not yield well to parallelism.
temp[k] += 1
k += 1
And perhaps most importantly: not all bugs are bad, it would seem.
⁽¹⁾A note at the end of Wolfram’s page does mention an optimization for algorithm #1 using O(n+max) space, but it doesn’t improve the time efficiency beyond O(n*max).