Quickselect

Design and Strategy

Quickselect is a decrease-and-conquer algorithm for finding the k-th smallest element in an unordered array. Arrays are indexed from 0, so the element of rank k is the one that would appear at index k-1 if the array were sorted.

Like Quicksort, Quickselect partitions the array around a chosen pivot, placing smaller elements on one side and larger ones on the other. But instead of sorting both sides, it keeps only the side that might contain the desired element. Each partition step narrows the active portion of the array until the pivot lands exactly where the k-th smallest element belongs.

Conceptually, Quickselect works within a subarray A[lo..hi]. After each partition, the pivot’s final index p tells us how many elements are smaller than it. If p matches the desired position, we are done. If p is too high, we recurse on the left half; if too low, on the right. The problem size decreases each step until only one element remains.

Here is a high-level description:

If lo == hi, return A[lo].
Partition A[lo..hi] around a pivot (typically A[lo]).
Let p be the pivot’s final index.
If p == k-1, return A[p] — the k-th smallest element.
If p > k-1, recursively select in A[lo..p-1].
If p < k-1, recursively select in A[p+1..hi].

Here is a simple example of the idea, where the details of how the partition step are left unspecified.

Example:

Find the 4th smallest in [5, 2, 8, 6, 1, 4, 3, 7]. At each step, we pick the first element of the current window as the pivot. Indices are 0-based; target index is k-1 = 3.

Legend: discarded in window pivot k-1 (index 3)

Step 0 — original (lo=0, hi=7, k-1=3)

Step 1 — partition by 5 → recurse left (0..3)

Step 2 — partition 0..3 by 2 → recurse right (2..3)

Step 3 — partition 2..3 by 4 → found at p=3answer: 4

Implementation in Java, C++, Python

The following implementations make use of a partition algorithm. This can be implemented as Hoare partition, Lomuto partition, or any other partioning algorithm, and we leave it to the reader to plug in their preferred algorithm. The initial call is typically quickselect(A, 0, n-1, k), where \(n\) is the size of array \(A\). It is assumed that lo ≤ k ≤ hi for each call. Also, don't forget that the k-th element is at index k-1.

int quickselect(int[] A, int lo, int hi, int k) {
    if (lo == hi) 
        return A[lo];

    int target = k - 1;
    int p = partition(A, lo, hi);
    if (p == target) 
        return A[p];
    if (p > target)  
        return quickselect(A, lo, p - 1, k);
    else             
        return quickselect(A, p + 1, hi, k);
}

int quickselect(int A[], int lo, int hi, int k) {
    if (lo == hi) 
        return A[lo];

    int target = k - 1;
    int p = partition(A, lo, hi);
    if (p == target) 
        return A[p];
    if (p > target)  
        return quickselect(A, lo, p - 1, k);
    else             
        return quickselect(A, p + 1, hi, k);
}

def quickselect(A, lo, hi, k):
    if lo == hi:
        return A[lo]

    target = k - 1
    p = partition(A, lo, hi)
    if p == target:
        return A[p]
    if p > target:
        return quickselect(A, lo, p - 1, k)
    else:
        return quickselect(A, p + 1, hi, k)

Time/Space Analysis

Time Complexity: Quickselect’s running time depends on how balanced each partition is. Each partition step looks at every element in the current portion of the array once, so a single partition costs \(O(n)\). The total time depends on how many times we have to partition before finding the element of rank \(k\).

Average case. On average, the pivot splits the array into reasonably balanced parts. Since we only recurse on one of those parts, the amount of work quickly shrinks. The average running time turns out to be \(O(n)\)—about the same reasoning used to show that Quicksort averages \(O(n \log n)\), except that Quickselect keeps only one branch instead of two.

Best case. If the first pivot happens to land exactly where it belongs (for example, the pivot is the \(k\)-th smallest), then the algorithm performs one partition and stops. Even this "best case" still takes \(O(n)\) time, since the initial partition must examine every element once.

Worst case. If the pivot is always the smallest or largest element, each partition only removes one item, leaving a subproblem of size \(n-1\). This gives the recurrence \(T(n) = T(n-1) + O(n)\), leading to \(O(n^2)\) time. The situation is the same kind of bad luck that makes Quicksort's worst case quadratic when the pivot choices are poor (for example, picking the first element in an already sorted array).

Improving pivot choice. Randomly choosing the pivot or using a heuristic such as "median of three" keeps partitions roughly balanced most of the time, so the expected running time remains linear in practice. There are even special deterministic methods (like the median-of-medians algorithm) that guarantee \(O(n)\) time in every case, though they are slower in constant factors and less common in real programs.

Space Complexity: The recursive version of Quickselect uses very little extra memory beyond the array itself, since partitioning is done in place. However, each recursive call adds a frame to the call stack. The recursion depth is proportional to how many partitions occur: about \(O(\log n)\) levels on average, and up to \(O(n)\) in the worst case if the pivot choices are poor. The space use can be reduced to \(O(1)\) by rewriting Quickselect iteratively, since only one subarray is active at a time.

Summary: Quickselect runs in \(O(n)\) time on average and in the best case, and \(O(n^2)\) in the worst case. It uses constant extra space aside from the recursion stack. In practice, with a decent pivot rule, it is extremely fast for selecting a single element such as the median.

Reading Comprehension Questions

Goal & Rank vs. Index: What problem does Quickselect solve, and where would the \(k\)-th smallest appear in a sorted array?
Working Window: In the description, what does the subarray \(A[\text{lo}..\text{hi}]\) represent during the algorithm?
Pivot Test: After partitioning, how does comparing the pivot's final index \(p\) with \(k-1\) determine whether we return, go left, or go right?
Relation to Quicksort: How is Quickselect similar to Quicksort, and what key difference makes its work smaller on average?
Running Time Intuition: Briefly explain why the average time is \(O(n)\) but the worst case is \(O(n^2)\). What does the best case look like?
Pivot Strategy: How do randomized pivots (or median-of-three) affect performance in practice, and why?
Space Usage: For the recursive version, what is the extra space on average and in the worst case? How can it be reduced?
Off-by-One Check: If you call quickselect(A, 0, n-1, 4), which rank are you asking for and which index should contain the answer?

Answer: It finds the \(k\)-th smallest element. In a sorted array it would be at index \(k-1\).
Answer: The current search window—the portion of the array that might still contain the answer. Everything outside it is already known to be irrelevant.
Answer: After the partition step, the pivot’s final index p tells how many elements are smaller than it. If p == k-1, the pivot is exactly the element of rank k. If p > k-1, too many elements are smaller, so the desired one must be in the left subarray A[lo..p-1]. If p < k-1, there are too few smaller elements, so the algorithm continues in the right subarray A[p+1..hi]. This comparison tells Quickselect which part of the array can still contain the target, and it discards the rest.
Answer: Both use a partition step. Quicksort recurses on both sides to fully sort; Quickselect recurses on only the side that can contain the \(k\)-th element.
Answer: Average \(O(n)\): partitions are reasonably balanced and we follow just one branch. Thus, a large number of elements are discarded each time so the number of operations done at each level is shrinking rapidly, leading to just a linear number of operations on average. Worst \(O(n^2)\): consistently terrible pivots shrink the window by 1 each time. Best \(O(n)\): the first pivot lands at \(k-1\), but we still pay for one linear partition.
Answer: They make consistently bad splits unlikely, keeping partitions roughly balanced on typical inputs, so expected time stays linear in practice.
Answer: Extra space is about \(O(\log n)\) on average and \(O(n)\) in the worst case since that is the depth of the call stack, and each call uses only \(O(1)\) space. An iterative version reduces the total extra space to just \(O(1)\).
Answer: You're asking for the 4th smallest (rank 4), which should end up at index \(3\) in the array.

In-Class Activities

Trace Quickselect: Work through Quickselect by hand on a small array (e.g., [7, 2, 5, 9, 1, 8]) to find the 3rd and 6th smallest elements. Track lo, hi, and pivot index p at each step.
Compare with Sorting: Count the number of comparisons needed to find the \(k\)-th smallest element versus fully sorting the array. Discuss why Quickselect is faster when you only need one element.
Try a Worst Case: Run Quickselect on a sorted array while always choosing the first element as the pivot. Record how many partitions and recursive calls are made.
Pivot Strategies: Repeat the same experiment using a random pivot and a "median-of-three" pivot rule. Compare how balanced the partitions are and how the total work changes.
Handling Duplicates (Trace): Apply Quickselect to an array with many duplicate values, such as [4, 2, 4, 5, 4, 3, 2, 4]. Observe what happens when the pivot value appears multiple times. Does the algorithm still return the correct element? Which side does it recurse on?
Handling Duplicates: Discuss what it means to solve the \(k\)-th order statistic problem when an array has duplicate values. Does anything about the definition change? Are there multiple valid ways of looking at it? Do we need different algorithms depending on how we look at it?
Handling Duplicates (Three-Way Partition): Modify Quickselect to use a three-way partition that separates elements into "less than," "equal to," and "greater than" groups. Try it on several inputs and determine if it seems to be helpful or not.
Rewriting Iteratively: Convert the recursive Quickselect into an iterative version. How much extra space is required by this version? How does it compare with the recursive version? Which version is easier to understand/implement?
Beyond Two Parts: Try to imagine a version of Quickselect that divides the array into three equal parts (like a "ternary" search). Why does't this improve performance for this problem?

Homework Problems

Basic Trace (first-pivot rule): Using Quickselect with first element of the current window as the pivot and 1-based rank \(k\), trace the steps to find the 4th smallest in [12, 5, 19, 2, 22, 9, 4, 14, 25, 6, 7]. For each partition, report lo, hi, the pivot value, and the pivot's final index p, and indicate the next window.
Off-by-one sanity check: For array [8, 4, 6, 2, 9, 3], list the (index, value) pair that corresponds to the 1st, 3rd, and 6th smallest elements after sorting. Then state which k you would pass to quickselect(A, 0, n-1, k) for each.
Worst-case behavior: Run Quickselect on the sorted array [1, 2, 3, 4, 5, 6, 7, 8] with the first-pivot rule to find the 7th smallest. Count the number of partitions and total comparisons (assume a standard single pass partition that inspects each element in the current window once). Generalize: for size \(n\), how many partitions and comparisons in this worst-case pattern?
Randomized pivot (two trials): On [34, 4, 14, 8, 21, 2, 43, 9, 20, 7, 3, 29, 6, 11] find the 5th smallest twice, each time choosing the pivot uniformly at random from the current window. Record the pivot chosen, window size, and comparisons per partition.
Duplicates: Using the first-pivot rule and a two-way partition, run Quickselect to find the 5th smallest in [4, 2, 4, 1, 5, 4, 3, 2, 2, 4, 3, 1, 5]. Show each step and indicate which side you recurse on when pivot value appears multiple times.
Iterative Quickselect: Give pseudocode for an iterative version of Quickselect (replace recursion with a while loop that updates lo and hi). Compare/contrast it with the recursive version in terms of time, space, and simplicity of the code.
Median-of-three heuristic: Modify the pivot rule to pick the median of the first, middle, and last elements of the current window. Run on [1,2,3,4,5,6,7,8,9,10] to find the 7th smallest and compare the number of partitions and comparisons to the first-pivot rule.
Duplicates and Three-Way Partition: Quickselect with a two-way partition (only "< pivot" and ">= pivot") can perform poorly when the pivot value occurs many times. In this problem you will trace that behavior and then improve it with a three-way partition.
1. Warm-up trace (two-way): Run Quickselect with the first element of the current window as pivot and a standard two-way partition to find the 5th smallest on A = [5, 2, 5, 5, 9, 5, 1, 5, 3, 5, 7, 5, 5]. For each partition show lo, hi, pivot value, pivot's final index p, and which side you recurse on. Count comparisons for each partition (assume one pass over the current window).
2. Observation: Comment on how efficient the algorithm was on that particular data set. What are your thoughts on whether or not we can do better when repeated values are present?
3. Three-way partition: Modify partition to produce three regions in one pass: "< pivot", "= pivot", "> pivot". Then Quickselect can decide in O(1) time whether:
  - the target rank lies in the "= pivot" block (done),
  - or it lies in the left block (recurse left),
  - or it lies in the right block (recurse right).
  Give complete pseudocode for your new partition method.
4. Complexity of three-way partition: What is the computational complexity of your new three-way partition algorithm? Justify your answer.
5. Run it again: Repeat (a) on the same array using your three-way partition. Show the new sequence of windows and counts of comparisons/partitions. Comment on whether or not the new partition seems to be doing its job well.
6. Edge-case check: Consider B = [4,4,4,4,4,4,4,4]. For each rank \(k\in\{1,4,8\}\), state what Quickselect should return and how many partitions your three-way version performs. Does it perform as well as you expect? Explain.
7. Brief conclusion: Is three-way partition worth doing when arrays have many repeated values? Explain. Is three-way partition also worth doing when array do not have repeated values? In other words, is it at least as good as the regular paritition in this case or is there too much overhead? In general, if you had to pick one, which version do you think you should use?