Radix Sort is a non-comparison-based sorting algorithm that exploits a
space–time tradeoff to efficiently sort integers.
It processes keys digit by digit, from the least significant digit (LSD) to the most significant.
At each digit position, it uses a stable sort—usually
Counting Sort—to reorder
elements by that digit while preserving the order of elements with equal digits.
Each pass is essentially a call to Counting Sort, applied not to the whole key, but to a single digit of each number.
If you are not familiar with Counting Sort, you should read that section before continuing.
Let \( n \) be the number of items to sort, \( k \) the radix (e.g., 10 for decimal), and \( d \) the number of digits of the largest key when written in base \( k \). The algorithm performs \( d \) passes—one for each digit position—by looping through \( p = 0 \) to \( d - 1 \).
Each pass uses three arrays:
Input Array \( A[0 \ldots n-1] \): the original array of keys
Count Array \( C[0 \ldots k-1] \): counts occurrences of each digit
Output Array \( B[0 \ldots n-1] \): stores the result after the current pass
The algorithm works as follows:
For each digit position \( p = 0 \) to \( d - 1 \), do the following:
Count Digit Occurrences:
Initialize \( C[0 \ldots k-1] = 0 \).
For each index \( i \) from \( 0 \) to \( n - 1 \):
Compute the digit at position \( p \):
\( d_i = \left\lfloor \frac{A[i]}{k^p} \right\rfloor \bmod k \)
Compute Prefix Sums:
Transform \( C \) into running totals, so that \( C[j] \) holds the number of keys whose digit is at most \( j \); each total marks where that digit's block ends in the output.
Stable Placement:
Traverse \( A \) from right to left; for each key, recompute its digit \( d_i \), decrement \( C[d_i] \), and write the key into \( B[C[d_i]] \). Scanning backwards while filling from the end positions keeps the pass stable.
Copy Back:
Copy \( B \) into \( A \) so the next pass starts from the partially sorted order.
After all \( d \) passes, the array is fully sorted. Because each pass is stable, the order established by earlier (less significant) digits is preserved whenever later digits tie.
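The digit-extraction formula above can be checked in a few lines of Python (the values here are arbitrary examples):

```python
# Extract the digit of `value` at position p in base k:
# digit = (value // k**p) % k
def digit_at(value, p, k=10):
    return (value // k ** p) % k

# For 472 in base 10: position 0 is 2, position 1 is 7, position 2 is 4.
print(digit_at(472, 0))  # 2
print(digit_at(472, 1))  # 7
print(digit_at(472, 2))  # 4
```

The same function works for any radix, e.g. `digit_at(x, p, k=2)` extracts individual bits.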
Implementation in Java, C++, Python
Here are implementations in Java, C++, and Python, all using radix \(k = 10\).
public void radixSort(int[] A) {
    int n = A.length;
    if (n == 0) return;  // guard: the max scan below assumes at least one key
    // Find maximum value
    int max = A[0];
    for (int i = 1; i < n; i++) {
        if (A[i] > max) max = A[i];
    }
    int[] B = new int[n];
    // Process each digit using exp = 1, 10, 100, ...
    for (int exp = 1; max / exp > 0; exp *= 10) {
        int[] C = new int[10];
        // Phase 1: Count digit occurrences
        for (int i = 0; i < n; i++) {
            int digit = (A[i] / exp) % 10;
            C[digit]++;
        }
        // Phase 2: Compute prefix sums
        for (int i = 1; i < 10; i++) {
            C[i] += C[i - 1];
        }
        // Phase 3: Stable placement into B
        for (int i = n - 1; i >= 0; i--) {
            int digit = (A[i] / exp) % 10;
            B[--C[digit]] = A[i];
        }
        // Copy back for next pass
        System.arraycopy(B, 0, A, 0, n);
    }
}
void radixSort(int A[], int n) {
    if (n <= 0) return;  // guard: the max scan below assumes at least one key
    // Find maximum value
    int maxv = A[0];
    for (int i = 1; i < n; i++) {
        if (A[i] > maxv) maxv = A[i];
    }
    int* B = new int[n];
    // Process each digit using exp = 1, 10, 100, ...
    for (int exp = 1; maxv / exp > 0; exp *= 10) {
        int C[10] = {0};
        // Phase 1: Count digit occurrences
        for (int i = 0; i < n; i++) {
            int digit = (A[i] / exp) % 10;
            C[digit]++;
        }
        // Phase 2: Compute prefix sums
        for (int i = 1; i < 10; i++) {
            C[i] += C[i - 1];
        }
        // Phase 3: Stable placement into B
        for (int i = n - 1; i >= 0; i--) {
            int digit = (A[i] / exp) % 10;
            B[--C[digit]] = A[i];
        }
        // Copy back for next pass
        for (int i = 0; i < n; i++) {
            A[i] = B[i];
        }
    }
    delete[] B;
}
def radix_sort(A):
    """Return the keys in sorted order; the input list is not modified."""
    if not A:
        return A
    n = len(A)
    # Find maximum value
    maxv = A[0]
    for x in A:
        if x > maxv:
            maxv = x
    B = [0] * n
    exp = 1
    # Process each digit using exp = 1, 10, 100, ...
    while maxv // exp > 0:
        C = [0] * 10
        # Phase 1: Count digit occurrences
        for x in A:
            digit = (x // exp) % 10
            C[digit] += 1
        # Phase 2: Compute prefix sums
        for i in range(1, 10):
            C[i] += C[i - 1]
        # Phase 3: Stable placement into B
        for i in range(n - 1, -1, -1):
            digit = (A[i] // exp) % 10
            C[digit] -= 1
            B[C[digit]] = A[i]
        # Copy for the next pass (rebinds A; the caller's list is untouched)
        A = B[:]
        exp *= 10
    return A
Time and Space Analysis
Before analyzing Radix Sort’s performance, we introduce three key parameters:
Let \(n\) be the number of keys to sort, \(k\) the radix (number of buckets per digit, e.g. 10 for decimal or 256 for byte-values),
and \(d\) the number of digit positions in the largest key (for a maximum key of value \(M\), roughly \(\lfloor \log_{k} M\rfloor + 1\)).
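The relationship between \(M\), \(k\), and \(d\) can be computed directly; a small Python helper (the name and example values below are illustrative) counts base-\(k\) digits by repeated division:

```python
# Number of base-k digits needed for a maximum key M (M >= 1):
# d = floor(log_k M) + 1
def num_digits(M, k):
    d = 0
    while M > 0:
        M //= k
        d += 1
    return d

print(num_digits(999, 10))   # 3 passes for decimal keys up to 999
print(num_digits(999, 256))  # 2 passes with a byte-sized radix
```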
Time Complexity:
Each pass of Radix Sort processes one digit position. It consists of four phases, each touching either the input array \(A\) or the count array \(C\):
Counting Phase:
Scan \(A[0\ldots n-1]\). For each key, extract the current digit in \(O(1)\) and increment the corresponding bucket in \(C\).
Cost: \(\Theta(n)\).
Prefix-Sum Phase:
Transform \(C[0\ldots k-1]\) into cumulative counts so each entry indicates its endpoint in the output array.
Cost: \(\Theta(k)\).
Placement Phase:
Traverse \(A\) backwards, recompute each digit, decrement its count, and write the key into the output buffer \(B\), preserving stability.
Cost: \(\Theta(n)\).
Copy-Back Phase:
Copy \(B[0\ldots n-1]\) back into \(A\) to prepare for the next digit pass.
Cost: \(\Theta(n)\).
Summing these four phases gives \(\Theta(3n + k)\) work per pass, which simplifies to \(\Theta(n + k)\).
Radix Sort makes one such digit pass for each of the \(d\) digits in the largest key. Therefore, the total running time is
\(\Theta\bigl(d \times (n + k)\bigr)\).
Space Complexity:
Aside from the input array, Radix Sort requires:
An output buffer \(B\) of size \(n\), for \(\Theta(n)\) extra space.
A count array \(C\) of size \(k\), for \(\Theta(k)\) extra space.
A constant number of scalar variables, for \(O(1)\) extra space.
So the total auxiliary space needed is \(\Theta(n + k)\).
Variations/Improvements
Trade-Offs and Practical Notes
Radix size vs. passes:
A larger \(k\) (e.g. 256) reduces the number of passes \(d\) but increases the size of the count array \(C\) and may hurt cache performance.
Small-radix benefits:
A small \(k\) (e.g. 10) often fits entirely in cache, improving real-world speed despite requiring more passes.
When \(k\) is much larger than \(n\):
If the number of buckets \(k\) exceeds the number of keys \(n\), then the cost to initialize and scan the count array—\(\Theta(k)\)—can outweigh the work on the data itself (\(\Theta(n)\)). In that case, Radix Sort may perform worse than comparison-based algorithms like Quick Sort.
Copy-back optimization:
Alternating between two buffers or swapping pointers can eliminate the \(\Theta(n)\) copy-back cost per pass at the expense of more complex buffer management.
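The buffer-swapping idea can be sketched as follows. Instead of copying \(B\) back into \(A\) each pass, the source and destination buffers trade roles, and the final result is whichever buffer was written last. This is a sketch under the same base-10 setup as the earlier implementations, not a drop-in replacement (the function name is illustrative):

```python
def radix_sort_swap(A):
    """Radix sort that swaps buffers between passes instead of copying back."""
    if not A:
        return A
    n, maxv = len(A), max(A)
    src, dst = A[:], [0] * n  # work on a copy; A itself is left unchanged
    exp = 1
    while maxv // exp > 0:
        C = [0] * 10
        for x in src:                    # count digit occurrences
            C[(x // exp) % 10] += 1
        for i in range(1, 10):           # prefix sums
            C[i] += C[i - 1]
        for x in reversed(src):          # stable placement into dst
            d = (x // exp) % 10
            C[d] -= 1
            dst[C[d]] = x
        src, dst = dst, src              # swap roles: no Theta(n) copy here
        exp *= 10
    return src                           # last-written buffer holds the result

print(radix_sort_swap([170, 45, 75, 90, 802, 24, 2, 66]))
# [2, 24, 45, 66, 75, 90, 170, 802]
```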
Stability requirement:
Each pass relies on a stable sort of one digit (e.g.\ Counting Sort) to preserve the order of equal-digit keys and ensure correctness.
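A single stable digit pass can be isolated to see the requirement at work. The sketch below (helper name is illustrative) runs one Counting Sort pass on the digit selected by `exp`; keys sharing that digit keep their input order, which is exactly what lets a tens-digit pass preserve an earlier units-digit pass:

```python
# One stable Counting Sort pass on the digit at weight exp (k = 10).
def stable_pass(A, exp, k=10):
    C = [0] * k
    for x in A:                       # count occurrences of each digit
        C[(x // exp) % k] += 1
    for i in range(1, k):             # prefix sums: block end positions
        C[i] += C[i - 1]
    B = [0] * len(A)
    for x in reversed(A):             # backwards scan preserves stability
        d = (x // exp) % k
        C[d] -= 1
        B[C[d]] = x
    return B

# After a units-digit pass, a tens-digit pass must keep 21 before 25
# and 13 before 17; stability guarantees it:
after_units = [21, 13, 25, 17]        # already ordered by units digit
print(stable_pass(after_units, 10))   # [13, 17, 21, 25]
```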
MSD (Most Significant Digit) Variant:
Instead of processing from least to most significant digit, the MSD approach
partitions the array by the most significant digit first, then recursively sorts
each bucket on the next digit. This naturally handles variable-length keys and
can terminate early on buckets whose elements share identical prefixes.
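A minimal recursive MSD sketch for non-negative decimal keys might look like this (the function name is illustrative; note how singleton buckets return immediately, which is the early-termination benefit mentioned above):

```python
def msd_radix_sort(A, exp):
    """Sort A on the digit at weight exp, then recurse into each bucket."""
    if len(A) <= 1 or exp == 0:       # singleton bucket or digits exhausted
        return A
    buckets = [[] for _ in range(10)]
    for x in A:                       # partition by the current digit
        buckets[(x // exp) % 10].append(x)
    out = []
    for b in buckets:                 # recurse on the next digit down
        out.extend(msd_radix_sort(b, exp // 10))
    return out

# Start from the weight of the most significant digit of the maximum key:
data = [329, 457, 657, 839, 436, 720, 355]
print(msd_radix_sort(data, 100))  # [329, 355, 436, 457, 657, 720, 839]
```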
Radix Selection:
Choosing the radix \(k\) involves a space–time trade-off:
using a power-of-two radix (e.g. \(2^8\) or \(2^{16}\)) lets you replace
division and modulus operations with fast bit-shifts and masks, but it also
increases the memory footprint of the count array.
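With a power-of-two radix \(k = 2^a\), the division and modulus in digit extraction become a shift and a mask. A sketch for \(k = 2^8\) (the names here are illustrative):

```python
RADIX_BITS = 8
MASK = (1 << RADIX_BITS) - 1     # 0xFF for an 8-bit radix

def digit(value, pass_no):
    # Equivalent to (value // k**pass_no) % k for k = 256,
    # but uses a shift and a mask instead of division and modulus.
    return (value >> (RADIX_BITS * pass_no)) & MASK

print(digit(0x12345678, 0))  # 0x78 == 120
print(digit(0x12345678, 1))  # 0x56 == 86
print(digit(0x12345678, 3))  # 0x12 == 18
```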
In-Place Cycle-Leader Permutation:
To eliminate the \(\Theta(n)\) copy-back step, you can compute each element’s
final index and rotate cycles of elements in place. This reduces auxiliary
space but requires careful index bookkeeping.
Parallel Implementation:
Parallelize counting and placement phases across threads or distributed nodes to
improve throughput on large datasets.
Review Questions
Stability: Why must the intermediate sort in Radix Sort be stable, and what could go wrong if it isn’t?
Non-Comparison Basis: Explain in your own words why Radix Sort is non-comparison-based and how it leverages Counting Sort for each digit.
Complexity Derivation: Show how one pass costs \(\Theta(n + k)\) and derive the overall time bound \(\Theta(d\,(n + k))\).
Radix Choice: If you double the radix \(k\), how does that affect (a) the number of passes \(d\), and (b) the cost per pass?
Space vs Time: Describe the space complexity \(\Theta(n + k)\) and discuss the trade-off between extra memory and per-pass speed.
LSD vs MSD: Contrast the Least Significant Digit (LSD) and Most Significant Digit (MSD) variants in terms of recursion, handling of variable-length keys, and average work.
Bit-Grouping: For a 32-bit key, compare the effects of using radix \(k=2^8\) (byte-by-byte) versus \(k=2^{16}\) (half-word).
Input Distribution: How do “already sorted” or “reverse sorted” inputs affect the behavior of each pass’s count array?
Answer:
The intermediate sort must be stable because Radix Sort relies on preserving the relative order of keys that share the same digit at each pass.
If the sort on one digit were unstable, keys that compared equal at that digit could be reordered arbitrarily, and the information imposed by earlier
(less significant) passes would be lost—producing an incorrect final ordering.
Answer:
Radix Sort is non-comparison-based because it never compares two keys directly; instead, it extracts a digit from each key and uses Counting Sort
to bucket them by value of that digit. By repeating this process for each digit position, the algorithm achieves a full sort without any element-to-element comparisons.
Answer:
One digit pass has four phases:
Counting Phase: \(\Theta(n)\)
Prefix-Sum Phase: \(\Theta(k)\)
Placement Phase: \(\Theta(n)\)
Copy-Back Phase: \(\Theta(n)\)
Summing gives \(\Theta(3n + k) = \Theta(n + k)\). With \(d\) passes, the total cost is
\(\Theta\bigl(d \,(n + k)\bigr)\).
Answer:
Doubling the radix \(k\) reduces the number of digits \(d\) because
\[
d \approx \bigl\lfloor \log_{k} M \bigr\rfloor + 1
\]
decreases as \(k\) grows.
decreases when \(k\) grows. However, each pass now costs \(\Theta(n + 2k)\)
instead of \(\Theta(n + k)\). Thus you trade fewer passes for more work per pass, and the optimal choice balances these two effects.
Answer:
Radix Sort uses auxiliary arrays of size \(n\) (output buffer) and \(k\) (count array), so extra space is \(\Theta(n + k)\). Larger \(k\)
can speed up sorting by reducing \(d\), but at the cost of more memory and potential cache misses; smaller \(k\) uses less memory but requires more passes.
Answer:
The LSD variant processes digits from least to most significant in a fixed loop of \(d\) passes and is simple and stable for fixed-length keys.
The MSD variant recurses on buckets sorted by the most significant digit first, naturally handling variable-length keys and allowing early
termination on uniform buckets—but incurs recursion overhead and dynamic bucket management.
Answer:
For 32-bit keys, using radix \(k = 2^8\) (bytes) gives 4 passes of cost \(\Theta(n + 256)\) each, while \(k = 2^{16}\) (half-words) gives 2 passes of cost \(\Theta(n + 65536)\).
The byte-based choice often wins in practice because a 256-entry count array fits in cache, whereas a 65,536-entry array may not.
Answer:
Whether the input is already sorted or reverse sorted has no effect on the number of operations—each pass still touches all \(n\) keys and all \(k\) buckets.
However, the count array’s values will reflect the particular digit distributions (e.g. skewed toward high or low buckets), which does not change the asymptotic cost.
In-Class Activities
Card-and-Bucket Simulation: Use index cards labeled with numbers and buckets labeled 0–9 to perform two full passes of Radix Sort by hand.
Radix Comparison: In small groups, run the demo with base-2, base-10, and base-256 on the same data and record the number of operations and memory used.
Stability Experiment: Replace the stable Counting Sort step with an unstable sort (e.g. swapping on ties) and observe how the final result changes.
Code Workshop: Pair up to implement the digit-extraction and count-array phases in your language of choice; test on various ranges of keys.
Pseudocode Review: Write pseudocode for the MSD variant and discuss how early termination on singleton buckets can save work.
Memory Trade-Off Debate: Discuss when it’s worth using a larger radix (more memory) versus more passes (more time)—cite cache behavior and real hardware.
In-Place Variant Design: Sketch on the whiteboard an in-place cycle-leader version, including how you’d detect and close each cycle.
Logging & Analysis: Augment the demo code to log the contents of \(C\) and \(B\) after each phase; swap logs with another group and explain their run.
Parameterization Exercise: Modify the provided code so that the radix \(k\) is passed as a function parameter rather than hard-coded.
Extension Challenge: Propose how you might parallelize the Counting and Placement phases on a multi-core processor, and outline synchronization points.
Negative Keys: How would you adapt the algorithm to sort an array containing both positive and negative integers?
Problems
Negative and Positive Integers: Implement Radix Sort to handle both negative and positive values in the same pass, and analyze its complexity.
In-Place Cycle-Leader Sort: Code an in-place variant using cycle-leader permutation. Prove its correctness and compare its memory use.
Empirical Radix Study: For \(n=10^5\), benchmark radix bases \(k=2\), \(10\), \(256\), and \(2^{16}\). Plot running time versus \(k\) and explain the results.
Hybrid Design: Design a hybrid Radix-QuickSort algorithm that switches to Quick Sort when subarrays become small. Argue where the crossover point should be.
MSD Recursion Depth: Analyze the worst-case recursion depth of the MSD variant on random strings of length \(d\) drawn from an alphabet of size \(k\).
Floating-Point Keys: Extend Radix Sort to IEEE-754 floats by handling sign, exponent, and mantissa correctly. Include both pseudocode and proof of total order preservation.
Parallel Radix Sort: Implement a multi-threaded version that divides Counting and Placement work across threads. Measure speedup on a multi-core machine.
Stability Proof: Formally prove that using a stable Counting Sort at each digit pass guarantees a correct overall sort.
Adaptive Radix Selection: Given input size \(n\) and maximum key \(M\), propose a heuristic to choose the optimal radix \(k\) that balances \(d\) and \(k\). Justify your heuristic experimentally.
Leveraging Bit Operations: Give the code/pseudocode for a version that uses \(k=2^a\) for some choice of \(a\).
Your implementation should take advantage of the fact that a power-of-two base allows a more efficient implementation of some of the steps.