A space-time tradeoff in algorithm design arises when we allocate extra memory to store useful information (e.g. lookup/memoization tables, caches of partial results, or precomputed results) in order to accelerate computation. By doing so, we can replace repeated or costly operations with simple retrievals, achieving faster overall running time at the cost of higher space usage.
Many classical algorithms and data structures apply these ideas, including counting sort, hash tables (using chaining or open-addressing), B-trees, and string searching algorithms such as Horspool's and Boyer-Moore. If you have already read about dynamic programming, it might have occurred to you that it is a special case of the space-time tradeoff (see the Dynamic Programming page).
Here we will present two straightforward examples: Counting Sort and Hashing with Chaining.
You have likely already seen many algorithms to solve the Sorting Problem. Counting sort is probably very different from those you have seen. The other algorithms you have seen are probably all what are called comparison sorts—they accomplish the job of sorting a list by comparing elements with each other. At first glance, that term may seem like nonsense—of course we have to compare elements with each other in order to sort them! Except that it is not true. For certain situations and types of data, we can sort more efficiently by doing something else.
The idea behind Counting Sort is to count the number of occurrences of each item in an array, and then copy the numbers back out in order. For instance, if \(A=[6, 3, 2, 7, 1, 2, 7, 8, 9, 2, 3, 7]\), we can go through the list once and see that there is one 1, three 2s, two 3s, zero 4s, zero 5s, one 6, three 7s, one 8, and one 9. Then we can just output them in order by going through that list, repeating each number the number of times it occurs: \([1, 2, 2, 2, 3, 3, 6, 7, 7, 7, 8, 9]\) (by writing down one 1, three 2s, etc.).
Hopefully it occurred to you that there might be a problem with applying this idea in general. For instance, how would you apply this to strings? And what if the numbers on the list can be large? This technique is best applied when the numbers in the list come from a small range of values. When we analyze the algorithm later, we will discuss the effect of the range of numbers on the time and space complexity.
Since it is ideally suited for dealing with integers, we will restrict our attention to describing the details of counting sort on them. We leave it to the reader to figure out what other sorts of data it can and cannot work on, and what adjustments need to be made.
At a high level, the algorithm is easy to describe: determine the range of integers in the list, allocate space (the counting array) to store the number of occurrences of each number in the range, go through the input array incrementing the appropriate counts, and finally go through the counting array, copying the appropriate values into the output array according to the counts. For simplicity, we will assume the numbers in the list range from 0 to \(m\) for some positive integer \(m\).
Here is a more formal version of the algorithm:
CountingSort(A, n):
    m = A[0]
    for i = 1 to n-1:               // 1. Find maximum value m in A
        if A[i] > m:
            m = A[i]
    for j = 0 to m:                 // 2. Initialize count array C[0..m] to 0
        C[j] = 0
    for i = 0 to n-1:               // 3. Count occurrences of each number
        C[A[i]] = C[A[i]] + 1
    B = empty list                  // 4. Construct output list B
    for k = 0 to m:                 // 5. Append each value k, C[k] times
        do C[k] times:
            append k to B
    return B
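The pseudocode translates almost line-for-line into a runnable sketch. Here is one possible Python version (the function name is our own choice, and it assumes the input contains non-negative integers):

```python
def counting_sort(A):
    """Sort a list of non-negative integers by counting occurrences."""
    if not A:
        return []
    m = max(A)                      # 1. find the maximum value m in A
    C = [0] * (m + 1)               # 2. counting array C[0..m], all zeros
    for x in A:                     # 3. count occurrences of each number
        C[x] += 1
    B = []                          # 4. construct output list B
    for k in range(m + 1):          # 5. append each value k, C[k] times
        B.extend([k] * C[k])
    return B

print(counting_sort([6, 3, 2, 7, 1, 2, 7, 8, 9, 2, 3, 7]))
# [1, 2, 2, 2, 3, 3, 6, 7, 7, 7, 8, 9]
```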
Below is a demonstration of the algorithm. You will notice that in addition to determining the maximum value of the array, it also determines the minimum. This allows the algorithm to create a smaller counting array. By default, the minimum is ignored and the algorithm proceeds exactly as described above. If you want to see the space-saving version in action, check the "Offset by min" button. You will be asked later to give the pseudocode for that version.
Time Complexity: If \(m\) is the largest value in the array, the algorithm executes two for loops of length \(n\) and two for loops of length \(m\), doing a constant amount of work in each iteration, for a total of \(O(n + m)\) time.
Space Complexity: It uses an extra counting array of size \(m\), where \(m\) is the maximum value in the array, and an extra output array (although in some cases you can copy back onto the original array) of size \(n\), plus a few local variables, for a total of \(O(n + m)\) space.
Depending on the value of \(m\), this can be significantly better than the \(O(n\log n)\) comparison sorting algorithms you are familiar with. Often it is used when \(m\) is a fixed and relatively small value, yielding a linear-time algorithm. This beats the best possible comparison sorting algorithm. (If you have not already learned about this, you should know that it is impossible to devise a sorting algorithm based on comparing elements whose worst case is better than \(O(n\log n)\). Hopefully you will learn about it, because the proof uses binary trees in an interesting way.) This is a key example of the space-time tradeoff: by using an extra \(O(m)\) space, you can potentially save a factor of \(\log n\) in time.
For a more detailed coverage of the algorithm, see the Counting Sort page. That page presents a slightly different version of the algorithm that is better suited for general use, particularly if the keys being sorted have data associated with them. That one uses a prefix-sum of the counting array and walks through the input array backwards, copying values into their proper location in the output array.
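That prefix-sum variant can be sketched as follows (the details may differ slightly from the Counting Sort page); walking through the input backwards keeps the sort stable, which is what matters when keys carry associated data:

```python
def counting_sort_stable(A):
    """Counting sort using prefix sums and a backwards walk (stable)."""
    if not A:
        return []
    m = max(A)
    C = [0] * (m + 1)
    for x in A:                     # count occurrences of each number
        C[x] += 1
    for k in range(1, m + 1):       # prefix sums: C[k] = count of elements <= k
        C[k] += C[k - 1]
    B = [None] * len(A)
    for x in reversed(A):           # backwards walk preserves stability
        C[x] -= 1                   # C[x] is now x's index in the output
        B[C[x]] = x
    return B
```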
A hash table is a data structure designed to support Insert and Search operations in constant expected time. Note that this example illustrates a data structure and its operations, rather than a standalone algorithm. This data structure assists with the Searching Problem.
Hash tables are implemented using an array. There are two important numbers related to hash tables: the size of the array (\(m\)) and the number of elements stored in it (\(n\)). The ratio \(\alpha=n/m\), called the load factor, is a measure of how full the table is, and will be important when we discuss the complexity of the hash table operations.
Elements (often referred to as keys) are placed into a hash table based on a hash function, which is simply a function that maps arbitrary values to values in a specific range, in this case to values between \(0\) and \(m-1\).
In general, hash functions can map any type of data into the desired range.
This is typically done by first mapping the value to an integer,
and then mapping that integer into the desired range using the hash function.
If you are familiar with Java's hashCode, it exists for exactly this purpose.
When discussing the implementation of hash tables, we usually assume the data are integers because it simplifies the discussion, and converting other data types to integers is a separate issue anyway.
Hash functions can be implemented in many ways, and are used for various purposes (e.g. cryptography, load balancing, data storage and retrieval). Here we will use the simplest imaginable hash function: \(h(x)=x\bmod m\).
Putting this all together, if we want to put the value \(x\) into our hash table, we place it at index \(h(x)\). Sounds good so far. But the astute reader will have identified a potential problem: what if two different values want to go to the same index? This is called a collision. The main complexity of implementing hash tables is handling collisions. There are two primary strategies: chaining and open addressing.
For the remainder of this page, we focus on chaining. If we wish to insert or search for a value \(x\) in a hash table \(T\), we first compute \(i=h(x)\) to determine the index. We then go to entry \(T[i]\) in the table to find \(L_i\), a linked list. If we are inserting the value, we simply append it to the tail of \(L_i\). If we are searching for the value, we perform a linear search on \(L_i\).
Below is a demonstration of chaining in action:
The following pseudocode assumes we have an array \(T\) of size \(m\).
It also assumes that the linked list has operations insertAtTail that takes \(O(1)\) time
(assuming a tail pointer) and search that takes \(O(k)\) time when the list has \(k\) elements.
h(x):                    // Hash function
    return x mod m

insert(x):
    i = h(x)
    L = T[i]
    L.insertAtTail(x)

search(x):
    L = T[h(x)]
    return L.search(x)
To demonstrate alternative ways of coding things, we employed a shortcut in the search function: computing \(T[h(x)]\) directly rather than storing the index in a variable first. Typically you would choose one of these two styles and use it consistently.
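The pseudocode above can be fleshed out into a minimal Python sketch; here, ordinary Python lists stand in for the linked lists \(L_i\) (the class and method names are our own, and `append` plays the role of insertAtTail):

```python
class ChainedHashTable:
    """Hash table with chaining; Python lists stand in for linked lists."""

    def __init__(self, m):
        self.m = m
        self.T = [[] for _ in range(m)]   # one (initially empty) chain per slot

    def h(self, x):
        return x % self.m                 # hash function: h(x) = x mod m

    def insert(self, x):
        self.T[self.h(x)].append(x)       # append to the tail of the chain

    def search(self, x):
        return x in self.T[self.h(x)]     # linear search within one chain
```

For example, with \(m=7\), inserting 10 and 24 places both in chain 3 (since \(10 \bmod 7 = 24 \bmod 7 = 3\)), illustrating how chaining resolves a collision.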
Time Complexity: If we assume the keys are uniformly distributed, each linked list has about \(\alpha=n/m\) entries.
insert: Computing \(i=h(x)\) and then \(T[i]\) to find \(L\) both take constant time, as does L.insertAtTail. Thus, insertion takes \(O(1)\) time.
search: Again, finding \(L\) takes constant time. The load factor (\(\alpha\)) gives the average linked-list size, so on average searching takes \(O(\alpha)\) time. Thus, the total time on average is \(O(1+\alpha)\). If \(\alpha\) is small, this is essentially constant time. However, in the worst case, searching can take \(O(n)\) time (why?).
Space Complexity: Hashing with chaining uses \(O(n + m)\) space to achieve (almost) constant time per operation. Since this is much better than linear search \(\left(O(n)\right)\), binary search \(\left(O(\log n)\right)\), or using a binary search tree \(\left(O(\log n)\right)\), this is a clear example of the space-time tradeoff.
Many other algorithms and data structures leverage space-time tradeoffs by investing extra memory (e.g. lookup tables, auxiliary arrays, caching structures) to gain faster running times.
Space-time tradeoff techniques leverage extra memory (tables, caches, indexes, or auxiliary structures) to accelerate computation by reducing per-operation cost. They shine when you have repeated or latency-sensitive operations on a bounded domain and sufficient RAM, but carry upfront preprocessing and maintenance overhead and can introduce cache, fragmentation, or concurrency challenges.