Quickselect

Problem Solved

Quickselect solves the K-th Order Statistic problem.

Design and Strategy

Quickselect is a decrease-and-conquer algorithm for finding the k-th smallest element in an unordered array. Arrays are indexed from 0, so the element of rank k is the one that would appear at index k-1 if the array were sorted.

Like Quicksort, Quickselect partitions the array around a chosen pivot, placing smaller elements on one side and larger ones on the other. But instead of sorting both sides, it keeps only the side that might contain the desired element. Each partition step narrows the active portion of the array until the pivot lands exactly where the k-th smallest element belongs.

Conceptually, Quickselect works within a subarray A[lo..hi]. After each partition, the pivot’s final index p tells us how many elements are smaller than it. If p matches the desired position, we are done. If p is too high, we recurse on the left half; if too low, on the right. The problem size decreases each step until only one element remains.

Here is a high-level description:

  1. If lo == hi, return A[lo].
  2. Partition A[lo..hi] around a pivot (typically A[lo]).
  3. Let p be the pivot’s final index.
  4. If p == k-1, return A[p] — the k-th smallest element.
  5. If p > k-1, recursively select in A[lo..p-1].
  6. If p < k-1, recursively select in A[p+1..hi].

Here is a simple example of the idea, where the details of how the partition step are left unspecified.

Example:

Find the 4th smallest in [5, 2, 8, 6, 1, 4, 3, 7]. At each step, we pick the first element of the current window as the pivot. Indices are 0-based; target index is k-1 = 3.

Legend: discarded in window pivot k-1 (index 3)
Step 0 — original (lo=0, hi=7, k-1=3)
5
2
8
6
1
4
3
7
Step 1 — partition by 5 → recurse left (0..3)
2
1
4
3
5
8
6
7
Step 2 — partition 0..3 by 2 → recurse right (2..3)
1
2
4
3
5
8
6
7
Step 3 — partition 2..3 by 4 → found at p=3answer: 4
1
2
3
4
5
8
6
7

The interactive demo below illustrates how each partition step narrows the interval until the algorithm locates the requested element. This implementation uses the Hoare Partition algorithm.

Implementation in Java, C++, Python

The following implementations make use of a partition algorithm. This can be implemented as Hoare partition, Lomuto partition, or any other partioning algorithm, and we leave it to the reader to plug in their preferred algorithm. The initial call is typically quickselect(A, 0, n-1, k), where \(n\) is the size of array \(A\). It is assumed that lo ≤ k ≤ hi for each call. Also, don't forget that the k-th element is at index k-1.
int quickselect(int[] A, int lo, int hi, int k) {
    if (lo == hi) 
        return A[lo];

    int target = k - 1;
    int p = partition(A, lo, hi);
    if (p == target) 
        return A[p];
    if (p > target)  
        return quickselect(A, lo, p - 1, k);
    else             
        return quickselect(A, p + 1, hi, k);
}
int quickselect(int A[], int lo, int hi, int k) {
    if (lo == hi) 
        return A[lo];

    int target = k - 1;
    int p = partition(A, lo, hi);
    if (p == target) 
        return A[p];
    if (p > target)  
        return quickselect(A, lo, p - 1, k);
    else             
        return quickselect(A, p + 1, hi, k);
}
def quickselect(A, lo, hi, k):
    if lo == hi:
        return A[lo]

    target = k - 1
    p = partition(A, lo, hi)
    if p == target:
        return A[p]
    if p > target:
        return quickselect(A, lo, p - 1, k)
    else:
        return quickselect(A, p + 1, hi, k)

Time/Space Analysis

Time Complexity: Quickselect’s running time depends on how balanced each partition is. Each partition step looks at every element in the current portion of the array once, so a single partition costs \(O(n)\). The total time depends on how many times we have to partition before finding the element of rank \(k\).

Average case. On average, the pivot splits the array into reasonably balanced parts. Since we only recurse on one of those parts, the amount of work quickly shrinks. The average running time turns out to be \(O(n)\)—about the same reasoning used to show that Quicksort averages \(O(n \log n)\), except that Quickselect keeps only one branch instead of two.

Best case. If the first pivot happens to land exactly where it belongs (for example, the pivot is the \(k\)-th smallest), then the algorithm performs one partition and stops. Even this "best case" still takes \(O(n)\) time, since the initial partition must examine every element once.

Worst case. If the pivot is always the smallest or largest element, each partition only removes one item, leaving a subproblem of size \(n-1\). This gives the recurrence \(T(n) = T(n-1) + O(n)\), leading to \(O(n^2)\) time. The situation is the same kind of bad luck that makes Quicksort's worst case quadratic when the pivot choices are poor (for example, picking the first element in an already sorted array).

Improving pivot choice. Randomly choosing the pivot or using a heuristic such as "median of three" keeps partitions roughly balanced most of the time, so the expected running time remains linear in practice. There are even special deterministic methods (like the median-of-medians algorithm) that guarantee \(O(n)\) time in every case, though they are slower in constant factors and less common in real programs.

Space Complexity: The recursive version of Quickselect uses very little extra memory beyond the array itself, since partitioning is done in place. However, each recursive call adds a frame to the call stack. The recursion depth is proportional to how many partitions occur: about \(O(\log n)\) levels on average, and up to \(O(n)\) in the worst case if the pivot choices are poor. The space use can be reduced to \(O(1)\) by rewriting Quickselect iteratively, since only one subarray is active at a time.

Summary: Quickselect runs in \(O(n)\) time on average and in the best case, and \(O(n^2)\) in the worst case. It uses constant extra space aside from the recursion stack. In practice, with a decent pivot rule, it is extremely fast for selecting a single element such as the median.

Variations/Improvements

Reading Comprehension Questions

  1. Goal & Rank vs. Index: What problem does Quickselect solve, and where would the \(k\)-th smallest appear in a sorted array?
  2. Working Window: In the description, what does the subarray \(A[\text{lo}..\text{hi}]\) represent during the algorithm?
  3. Pivot Test: After partitioning, how does comparing the pivot's final index \(p\) with \(k-1\) determine whether we return, go left, or go right?
  4. Relation to Quicksort: How is Quickselect similar to Quicksort, and what key difference makes its work smaller on average?
  5. Running Time Intuition: Briefly explain why the average time is \(O(n)\) but the worst case is \(O(n^2)\). What does the best case look like?
  6. Pivot Strategy: How do randomized pivots (or median-of-three) affect performance in practice, and why?
  7. Space Usage: For the recursive version, what is the extra space on average and in the worst case? How can it be reduced?
  8. Off-by-One Check: If you call quickselect(A, 0, n-1, 4), which rank are you asking for and which index should contain the answer?

In-Class Activities

  1. Trace Quickselect: Work through Quickselect by hand on a small array (e.g., [7, 2, 5, 9, 1, 8]) to find the 3rd and 6th smallest elements. Track lo, hi, and pivot index p at each step.
  2. Compare with Sorting: Count the number of comparisons needed to find the \(k\)-th smallest element versus fully sorting the array. Discuss why Quickselect is faster when you only need one element.
  3. Try a Worst Case: Run Quickselect on a sorted array while always choosing the first element as the pivot. Record how many partitions and recursive calls are made.
  4. Pivot Strategies: Repeat the same experiment using a random pivot and a "median-of-three" pivot rule. Compare how balanced the partitions are and how the total work changes.
  5. Handling Duplicates (Trace): Apply Quickselect to an array with many duplicate values, such as [4, 2, 4, 5, 4, 3, 2, 4]. Observe what happens when the pivot value appears multiple times. Does the algorithm still return the correct element? Which side does it recurse on?
  6. Handling Duplicates: Discuss what it means to solve the \(k\)-th order statistic problem when an array has duplicate values. Does anything about the definition change? Are there multiple valid ways of looking at it? Do we need different algorithms depending on how we look at it?
  7. Handling Duplicates (Three-Way Partition): Modify Quickselect to use a three-way partition that separates elements into "less than," "equal to," and "greater than" groups. Try it on several inputs and determine if it seems to be helpful or not.
  8. Rewriting Iteratively: Convert the recursive Quickselect into an iterative version. How much extra space is required by this version? How does it compare with the recursive version? Which version is easier to understand/implement?
  9. Beyond Two Parts: Try to imagine a version of Quickselect that divides the array into three equal parts (like a "ternary" search). Why does't this improve performance for this problem?

Homework Problems

  1. Basic Trace (first-pivot rule): Using Quickselect with first element of the current window as the pivot and 1-based rank \(k\), trace the steps to find the 4th smallest in [12, 5, 19, 2, 22, 9, 4, 14, 25, 6, 7]. For each partition, report lo, hi, the pivot value, and the pivot's final index p, and indicate the next window.
  2. Off-by-one sanity check: For array [8, 4, 6, 2, 9, 3], list the (index, value) pair that corresponds to the 1st, 3rd, and 6th smallest elements after sorting. Then state which k you would pass to quickselect(A, 0, n-1, k) for each.
  3. Worst-case behavior: Run Quickselect on the sorted array [1, 2, 3, 4, 5, 6, 7, 8] with the first-pivot rule to find the 7th smallest. Count the number of partitions and total comparisons (assume a standard single pass partition that inspects each element in the current window once). Generalize: for size \(n\), how many partitions and comparisons in this worst-case pattern?
  4. Randomized pivot (two trials): On [34, 4, 14, 8, 21, 2, 43, 9, 20, 7, 3, 29, 6, 11] find the 5th smallest twice, each time choosing the pivot uniformly at random from the current window. Record the pivot chosen, window size, and comparisons per partition.
  5. Duplicates: Using the first-pivot rule and a two-way partition, run Quickselect to find the 5th smallest in [4, 2, 4, 1, 5, 4, 3, 2, 2, 4, 3, 1, 5]. Show each step and indicate which side you recurse on when pivot value appears multiple times.
  6. Iterative Quickselect: Give pseudocode for an iterative version of Quickselect (replace recursion with a while loop that updates lo and hi). Compare/contrast it with the recursive version in terms of time, space, and simplicity of the code.
  7. Median-of-three heuristic: Modify the pivot rule to pick the median of the first, middle, and last elements of the current window. Run on [1,2,3,4,5,6,7,8,9,10] to find the 7th smallest and compare the number of partitions and comparisons to the first-pivot rule.
  8. Duplicates and Three-Way Partition: Quickselect with a two-way partition (only "< pivot" and ">= pivot") can perform poorly when the pivot value occurs many times. In this problem you will trace that behavior and then improve it with a three-way partition.
    1. Warm-up trace (two-way): Run Quickselect with the first element of the current window as pivot and a standard two-way partition to find the 5th smallest on A = [5, 2, 5, 5, 9, 5, 1, 5, 3, 5, 7, 5, 5]. For each partition show lo, hi, pivot value, pivot's final index p, and which side you recurse on. Count comparisons for each partition (assume one pass over the current window).
    2. Observation: Comment on how efficient the algorithm was on that particular data set. What are your thoughts on whether or not we can do better when repeated values are present?
    3. Three-way partition: Modify partition to produce three regions in one pass: "< pivot", "= pivot", "> pivot". Then Quickselect can decide in O(1) time whether:
      • the target rank lies in the "= pivot" block (done),
      • or it lies in the left block (recurse left),
      • or it lies in the right block (recurse right).
      Give complete pseudocode for your new partition method.
    4. Complexity of three-way partition: What is the computational complexity of your new three-way partition algorithm? Justify your answer.
    5. Run it again: Repeat (a) on the same array using your three-way partition. Show the new sequence of windows and counts of comparisons/partitions. Comment on whether or not the new partition seems to be doing its job well.
    6. Edge-case check: Consider B = [4,4,4,4,4,4,4,4]. For each rank \(k\in\{1,4,8\}\), state what Quickselect should return and how many partitions your three-way version performs. Does it perform as well as you expect? Explain.
    7. Brief conclusion: Is three-way partition worth doing when arrays have many repeated values? Explain. Is three-way partition also worth doing when array do not have repeated values? In other words, is it at least as good as the regular paritition in this case or is there too much overhead? In general, if you had to pick one, which version do you think you should use?