Variable-Size-Decrease

Introduction

In the variable-size-decrease strategy, each step reduces the problem size by an amount that is not fixed in advance. That amount is often determined by a selection, partitioning, or transformation process that depends on the input itself. Unlike decrease-by-a-constant or decrease-by-a-constant-factor reductions, the size of the next subproblem cannot be predicted from the current size alone. This variability makes analysis more nuanced, but many powerful algorithms, such as the Euclidean algorithm and Quickselect, rely on this approach to achieve optimal or near-optimal performance in practice.

Many of these algorithms, Quickselect among them, spend \(O(n)\) time on a partitioning or scanning step and then make a single recursive call on a subproblem of size \(k\), where \(k\) may range from 0 to \(n - 1\). When the reduction is large, the algorithm converges quickly. When it is small, progress is slow, and the overall time complexity can be poor unless additional strategies (like randomization or median-of-medians) are applied.

Example 5: Euclidean Algorithm (Variable-Size-Decrease)

The Greatest Common Divisor (GCD) problem asks for the largest integer that divides two positive integers \(a\) and \(b\) without leaving a remainder. If you are unfamiliar with GCD, make sure you read the GCD page. While a brute-force approach would check all numbers from \(\min(a,b)\) down to 1, there is a much faster method based on a variable-size-decrease strategy.

The Euclidean Algorithm for computing \(\gcd(a, b)\) works as follows:

If \(b = 0\), return \(a\). (We are done.)
Otherwise, recursively compute \(\gcd(b, a \bmod b)\).

To understand why this works, write \(a = bq + r\), where \(q\) is the quotient and \(r = a \bmod b\). Any number that divides both \(a\) and \(b\) also divides \(r = a - bq\). Conversely, any number that divides both \(b\) and \(r\) also divides \(a = bq + r\). So the set of common divisors of \((a, b)\) is the same as that of \((b, a \bmod b)\), and thus their greatest common divisor is the same.

Intuitively, the Euclidean Algorithm works by replacing the original problem with a "smaller but equivalent" one: instead of asking "what divides both \(a\) and \(b\)?", we ask "what divides both \(b\) and the leftover part of \(a\) after removing as many full \(b\)'s as possible?"

At each step, the size of the problem decreases from \((a,b)\) to \((b, a \bmod b)\), but the amount it decreases depends on the values of \(a\) and \(b\). In the worst case, it may only shrink slightly; in the best case, it decreases rapidly. This makes it a clear example of a variable-size-decrease algorithm.
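For example, here is the computation of \(\gcd(100, 36)\). The second argument goes \(36 \to 28 \to 8 \to 4 \to 0\), shrinking by 8, then 20, then 4, then 4, so the amount of decrease really does vary from step to step:

\[
\begin{aligned}
100 &= 2 \cdot 36 + 28 &&\Longrightarrow\; \gcd(100, 36) = \gcd(36, 28)\\
36 &= 1 \cdot 28 + 8 &&\Longrightarrow\; \gcd(36, 28) = \gcd(28, 8)\\
28 &= 3 \cdot 8 + 4 &&\Longrightarrow\; \gcd(28, 8) = \gcd(8, 4)\\
8 &= 2 \cdot 4 + 0 &&\Longrightarrow\; \gcd(8, 4) = \gcd(4, 0) = 4
\end{aligned}
\]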

Here is the same recursive algorithm written more formally (it is also valid Python):

def gcd(a, b):
    if b == 0:
        return a
    else:
        return gcd(b, a % b)
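Because the recursive call is the last thing the function does, a loop can replace it, which avoids deep recursion entirely. The following is a minimal iterative sketch in Python that also counts how many modulus steps it performs; the name `gcd_iterative` and the step counter are illustrative additions, not part of the algorithm as stated above.

```python
from typing import Tuple


def gcd_iterative(a: int, b: int) -> Tuple[int, int]:
    """Return (gcd(a, b), number of modulus steps) for non-negative a and b."""
    steps = 0
    while b != 0:
        # Replace (a, b) with (b, a mod b); the gcd does not change.
        a, b = b, a % b
        steps += 1
    return a, steps


print(gcd_iterative(48, 18))  # (6, 3)
```

Note that `gcd_iterative(18, 48)` takes one extra step: since \(18 \bmod 48 = 18\), the first iteration simply swaps the arguments.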


Time Complexity: Let \(T(a,b)\) be the time to compute \(\gcd(a,b)\). Each recursive call performs one modulus operation (constant time for machine-sized integers) and then recurses on \(\bigl(b,\;a \bmod b\bigr)\). The worst case occurs when \((a,b)\) are consecutive Fibonacci numbers, and even then the recursion makes only \(O(\log b)\) calls, so \[ T(a,b) \;=\; O(\log b) \;=\; O\bigl(\log \min(a,b)\bigr). \] Thus the number of steps the Euclidean Algorithm takes is logarithmic in the smaller of its two inputs, making it extremely efficient even for very large integers. (We realize we skimped on the details of this analysis. If you are really interested in understanding the details, you can find them in various other places, including Euclidean Algorithm (Wikipedia).)
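As a sketch of where the logarithm comes from (not a full proof), one standard route uses a halving fact:

\[
a \ge b > 0 \;\Longrightarrow\; a \bmod b < \frac{a}{2}.
\]

If \(b \le a/2\), then \(a \bmod b < b \le a/2\); if \(b > a/2\), then \(a \bmod b = a - b < a/2\). Applying this to two consecutive calls \((a, b) \to (b, r) \to (r, s)\) with \(r > 0\) gives \(s = b \bmod r < b/2\). So the second argument at least halves every two calls, and the recursion makes at most about \(2\log_2 b + O(1)\) calls.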

Example 6: Quickselect (Variable-Size-Decrease)

Quickselect solves the \(k\)-th order statistic problem: find the \(k\)-th smallest element in a (presumably unsorted) array. It works as follows:

Choose a pivot and partition the array into \([\, <\text{pivot} \mid \text{pivot} \mid >\text{pivot} \,]\).
If the pivot lands in position \(k\), return it; otherwise, recurse only on the group that contains the \(k\)-th element.


Once the algorithm has finished, the element at the desired index is in its proper sorted position, smaller elements are to its left, and larger elements are to its right, but the array as a whole is not sorted. Quickselect orders the array just enough to put the \(k\)-th element in place.
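The two steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the version on the Quickselect page: it assumes a random pivot, Lomuto-style partitioning, and a 1-based \(k\). Because only one side is ever kept, the recursion is replaced by a loop that narrows the index range `lo..hi` in place, with no slicing.

```python
import random
from typing import List


def quickselect(arr: List[int], k: int) -> int:
    """Return the k-th smallest element of arr (k is 1-based).

    Rearranges arr in place; uses a random pivot and Lomuto partitioning.
    """
    if not 1 <= k <= len(arr):
        raise ValueError("k must be between 1 and len(arr)")
    target = k - 1            # 0-based index where the answer must end up
    lo, hi = 0, len(arr) - 1  # current range arr[lo..hi], inclusive
    while lo < hi:
        # Pick a random pivot and park it at the end of the range.
        p = random.randint(lo, hi)
        arr[p], arr[hi] = arr[hi], arr[p]
        pivot = arr[hi]
        # Partition: move every element smaller than pivot to the front.
        store = lo
        for i in range(lo, hi):
            if arr[i] < pivot:
                arr[i], arr[store] = arr[store], arr[i]
                store += 1
        arr[store], arr[hi] = arr[hi], arr[store]  # pivot to its final slot
        if store == target:
            return arr[store]
        # Keep only the side that contains the target index (no recursion).
        if target < store:
            hi = store - 1
        else:
            lo = store + 1
    return arr[lo]  # one element left, and it is at index target


data = [7, 2, 9, 4, 1, 8]
print(quickselect(data, 3))  # 4
```

After the call, `data[2]` is 4, everything to its left is smaller, and everything to its right is larger, matching the partial ordering described above.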

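Time Complexity: With a randomly chosen pivot, Quickselect runs in \(O(n)\) time on average, but consistently bad pivots drive the worst case to \(O(n^2)\). Choosing the pivot with median-of-medians guarantees \(O(n)\) time even in the worst case.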
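To see where these bounds come from, charge each partitioning step time proportional to the length of the subarray it scans. If every pivot is the smallest or largest remaining element, each step removes only the pivot:

\[
n + (n-1) + (n-2) + \cdots + 1 \;=\; \frac{n(n+1)}{2} \;=\; O(n^2).
\]

If every pivot splits the subarray roughly in half, the lengths form a geometric series:

\[
n + \frac{n}{2} + \frac{n}{4} + \cdots \;\le\; 2n \;=\; O(n).
\]

The average-case argument for random pivots is more involved, but it leads to the same linear bound.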

This algorithm is a lot like Quicksort, except it only recurses on one part instead of both. Because there are subtleties that make this algorithm a little more complicated than the other examples in this section, we defer full details, including pseudocode, to the Quickselect page.

Algorithms Using This Technique

Quickselect: partitions around a pivot to select the \(k\)th element, average-case \(O(n)\) and worst-case \(O(n^2)\) time.
Euclidean Algorithm: computes \(\gcd(a,b)\) via \(\gcd(b,\,a \bmod b)\), running in \(O(\log \min(a,b))\) time.
Binary Search Tree Operations (unbalanced tree): Each step down the tree discards the subtree that cannot contain the key, and the size of that subtree depends on the tree's shape. With an unbalanced tree, the worst-case running time of search, insert, delete, etc. is \(O(h)\), where \(h\) is the height of the tree. Since \(\log n\leq h \leq n\), performance can vary widely between trees, and even between calls on the same tree, since some root-to-leaf paths might be as long as \(n\) while others are as short as \(1\) or \(2\).
Interpolation Search: estimates the next position to examine from the key's value relative to the values at the ends of the current range, giving average-case \(O(\log \log n)\) time on uniformly distributed data.
Deterministic Median-of-Medians Selection: picks a pivot guaranteeing worst-case \(O(n)\) time for order statistics.
Binary GCD (Stein's Algorithm): uses bit shifts and subtraction to reduce values, running in \(O(\log a)\) time (where \(a\) is the larger value).
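Of the algorithms listed, interpolation search shows the variable decrease especially clearly: the probe position, and therefore how much of the array is discarded, depends on the key and on the data. Here is a minimal Python sketch, assuming a list of integers sorted in ascending order; the function name is illustrative.

```python
from typing import List


def interpolation_search(arr: List[int], key: int) -> int:
    """Return an index of key in the sorted list arr, or -1 if key is absent."""
    lo, hi = 0, len(arr) - 1
    # Only keep probing while key could lie within arr[lo..hi].
    while lo <= hi and arr[lo] <= key <= arr[hi]:
        if arr[hi] == arr[lo]:
            # Every value in the range equals key; also avoids division by zero.
            return lo
        # Linear-interpolation estimate of where key should sit in arr[lo..hi].
        pos = lo + (key - arr[lo]) * (hi - lo) // (arr[hi] - arr[lo])
        if arr[pos] == key:
            return pos
        if arr[pos] < key:
            lo = pos + 1   # discard arr[lo..pos]
        else:
            hi = pos - 1   # discard arr[pos..hi]
    return -1


evenly_spaced = [10, 20, 30, 40, 50, 60, 70, 80, 90]
print(interpolation_search(evenly_spaced, 70))  # 6 (found on the first probe)
```

On evenly spaced data like this, the first probe lands directly on the key. On badly skewed data, each probe may discard only one element, so the worst case is \(O(n)\) probes.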

When to Use

Single subproblem per call: When you do not need to branch into multiple subinstances (as in divide-and-conquer) but can solve the entire task by a sequence of dependent steps—e.g., gcd via the Euclidean algorithm or Quickselect.
Data-dependent reduction: When the amount you remove or partition varies with the input, leading to good average-case performance (e.g., Quickselect's pivoting yields average \(O(n)\) time).
Low per-step overhead: When the work outside the recursive call is \(O(1)\) (as in the Euclidean algorithm) or shrinks along with the subproblem (as in Quickselect's partitioning), so that even a long chain of reductions stays within \(O(\log n)\) or \(O(n)\) total time.
Simplicity over branching: When the problem's structure does not naturally split into independent subproblems, and a linear or logarithmic "peel-away" approach is clearer and easier to maintain.

Limitations

Variable-Size-Decrease may be less suitable in these scenarios:

Worst-case degradation: Some variable-size algorithms (e.g., naive Quickselect) can fall to \(O(n^2)\) time if the reductions are poorly balanced.
Stack depth: When each reduction is small (so the algorithm behaves like decrease-by-a-constant), the chain can have \(O(n)\) steps, and a recursive implementation then reaches depth \(O(n)\), risking stack overflow for large \(n\).
High constant overhead: When each reduction step entails significant work—such as choosing a precise median pivot—the extra constant factors can outweigh the depth advantage on moderate inputs.
Independent subproblems: If the problem naturally splits into multiple independent pieces, a divide-and-conquer strategy often enables better parallelism and overall efficiency.

Implementation Tips

Clear base case: Define and test your termination condition (e.g., \(n \le 1\) or a small threshold) to avoid infinite recursion or loops.
In-place partitioning: In variable-size algorithms like Quickselect, perform partitioning with two-pointer swaps to maintain \(O(1)\) extra space.
Robust pivot choice: Use a randomized pivot or median-of-medians to guard against worst-case performance (e.g. \(O(n^2)\) in Quickselect).
Hybrid small-case handling: In variable-size decrease algorithms (such as Quickselect), when the remaining subarray length falls below a small cutoff (e.g. \(n \le 16\)), finish with a simple iterative method (such as insertion sort or a direct scan) to avoid extra partitioning/pivoting and recursive-call overhead.
Index-based parameters: Pass start/end indices instead of slicing arrays to avoid \(O(n)\) copy overhead.
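The hybrid small-case tip can be sketched as a helper that a selection loop calls once its range is small. This is a minimal sketch assuming a loop (such as an in-place Quickselect) that tracks an inclusive range `arr[lo..hi]` and a 0-based `target` index; the name `finish_small` and the cutoff of 16 are illustrative assumptions, not tuned values.

```python
from typing import List


def finish_small(arr: List[int], lo: int, hi: int, target: int) -> int:
    """Insertion-sort arr[lo..hi] (inclusive) in place and return arr[target].

    Hypothetical helper; assumes lo <= target <= hi.
    """
    for i in range(lo + 1, hi + 1):
        value = arr[i]
        j = i - 1
        # Shift larger elements one slot right to make room for value.
        while j >= lo and arr[j] > value:
            arr[j + 1] = arr[j]
            j -= 1
        arr[j + 1] = value
    return arr[target]


# Hypothetical use inside a selection loop:
#     if hi - lo + 1 <= 16:
#         return finish_small(arr, lo, hi, target)
```

Insertion sort has little overhead on a handful of elements, so finishing this way skips the last few rounds of pivot selection and partitioning.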

Common Pitfalls

Missing or incorrect base case: Failing to stop when \(n \le 1\) (or your chosen threshold) can lead to infinite recursion or loops.
Off-by-one in size reduction: Mixing \(\lfloor n/b\rfloor\) and \(\lceil n/b\rceil\) (or \(n-1\) vs. \(n-c\)) inconsistently may leave subproblems unchanged or skip elements.
Unintended array copies: Using slicing or subarray creation inside each call adds \(O(n)\) work per step, potentially turning an \(O(n)\) or \(O(n\log n)\) scheme into \(O(n^2)\).
Excessive recursion depth: A long chain of recursive calls—especially without converting tail calls to iteration—can exhaust the call stack on large inputs.
Poor pivot choice: In Quickselect or similar, consistently picking bad pivots can degrade average \(O(n)\) time to worst-case \(O(n^2)\).
Boundary mismanagement: Off-by-one errors in start/end indices (inclusive vs. exclusive) can omit or duplicate elements in subproblems.

Summary & Key Takeaways

Approach: Variable-size-decrease algorithms typically do some work per step (often an \(O(n)\) selection or partition around a pivot or key) and then recurse on a single subproblem of size \(k\), where \(0 \le k < n\); the exact shrinkage depends on the input.
When to Reach For It: Ideal when you can "peel off" a single subproblem—by a fixed amount, a fixed fraction, or adaptively—and funnel all work through one chain of reductions rather than branching into multiple independent calls (in which case divide-and-conquer might be more appropriate).
Recursion depth: Determined by the sequence of chosen \(k\) values:
- Best case: Technically, the best case can be \(O(1)\) levels (for example, when Quickselect's first pivot happens to land in position \(k\)), but for most problems this rarely occurs. More typically, the best case is \(O(\log n)\) levels, which occurs when \(k\) is roughly \(n/2\) (or some other constant fraction of \(n\)) at each step. However, given the nature of this technique, this is difficult to guarantee.
- Average case: With proper pivot/partition selection, often \(O(\log n)\) levels. For instance, using random pivot selection in Quickselect.
- Worst case (poor pivots): \(O(n)\) levels if \(k\approx n-1\) each time.
Time Complexity: Because of their unpredictable nature, these are harder to generalize. On average, they perform like decrease-by-a-constant-factor algorithms (assuming the splits are generally balanced), and in the worst case, they perform like decrease-by-a-constant algorithms (when the splits are poor). For instance:
- Euclidean Algorithm: Always \(O\left(\log \left(\min(a,b)\right)\right)\).
- Quickselect: Average \(O(n)\), worst-case \(O(n^2)\); improved to worst-case \(O(n)\) with median-of-medians.
Space complexity: As with the other variations of this technique, space usage depends on the recursion stack depth plus any additional storage the algorithm needs; the difference is that here the depth varies. A recursive implementation often uses \(O(\log n)\) extra space on average but \(O(n)\) in the worst case, while an iterative version (possible when the single recursive call is a tail call) avoids the stack growth altogether.
Limitations: Beware of worst-case degradation (e.g. bad pivots), deep recursion stacks, and hidden \(O(n)\) costs from slicing or inefficient subproblem handling.
Real-world Impact: Found in high-speed selection (Quickselect), number theory (Euclidean algorithm), etc.
Implementation Considerations:
- Define and test a clear base case (e.g. stop when \(n \le 1\) or at a small threshold) to prevent infinite recursion or loops.
- Use iteration to avoid recursion overhead.
- Use robust pivots (e.g. randomized or median-of-medians) to avoid unbalanced splits.
- Switch to an iterative or closed-form method when subproblem size drops below a cutoff (e.g. \(n \le 16\)) to reduce overhead.
- Prefer in-place operations and avoid slicing to maintain \(O(1)\) extra space and preserve \(O(n)\) time.

Reading Comprehension Questions

Euclidean Algorithm Type:
Why is the Euclidean Algorithm classified as variable-size-decrease, and what is its worst-case time complexity in terms of its inputs?
Definition Insight:
What distinguishes a variable-size-decrease algorithm from decrease-by-a-constant or decrease-by-a-constant-factor?
Quickselect Behavior:
In Quickselect, why do we only recurse on one partition after choosing the pivot? What is the benefit of this?
GCD Efficiency:
Why is the Euclidean algorithm so efficient despite the variability in how much the input decreases each step?
BST Height Impact:
How does the height of a binary search tree impact the performance of search or insert operations in an unbalanced tree?
Performance Spectrum:
Why are variable-size-decrease algorithms sometimes closer to constant-decrease in the worst case but closer to constant-factor in the average case?

In-Class Activities

Euclidean Algorithm Walkthrough:
Compute \(\gcd(1071,462)\) by hand using the Euclidean algorithm. Write down each remainder step, and count how many recursive calls occur.
Quickselect Simulation:
Given the array [17, 32, 8, 24, 13, 29, 41, 19, 5, 11, 27] and \(k = 6\), simulate Quickselect with the first element as the pivot. Show each partitioning step, updated value of \(k\), and which subarray is recursed on.
GCD Variability:
Compare the number of Euclidean algorithm steps when computing \(\gcd(987, 610)\) and \(\gcd(996, 249)\). Explain why one takes more steps than the other.
Height of BST:
Construct a binary search tree by inserting [50, 25, 75, 10, 30, 60, 90, 5, 20, 27, 65, 85, 95] in order. Then construct one using [5, 10, 20, 25, 27, 30, 50, 60, 65, 75, 85, 90, 95]. Compare max depth in each.
Pivot Choice Effects:
Try Quickselect on the array [62, 71, 38, 54, 85, 23, 91, 33, 47, 19, 68, 75, 99] to find the 7th smallest element. Do this twice: once using the first element as the pivot, once using the true median. How many steps does each take?
Pivot Behavior Exploration:
Consider the array [42, 17, 63, 5, 89, 28, 34, 57, 71, 21, 46, 79]. Apply Quickselect with three pivot strategies to find the 6th smallest element:
- (a) First element as pivot
- (b) Last element as pivot
- (c) Median-of-three (choose median of first, middle, last)
For each case, record the partitions created and number of steps. Which strategy gives the most balanced partitioning?

Homework Problems

Euclidean Algorithm Walkthrough:
Compute \(\gcd(119, 34)\) by hand using the Euclidean algorithm. List each remainder step and determine how many recursive calls occur.
Quickselect by Hand:
Use Quickselect to find the 8th smallest number in the array [48, 15, 67, 34, 92, 10, 56, 74, 23, 29, 88, 12, 61]. Use the first element as the pivot at each step. Show the updated array, partition index, and range for the next recursive call at each step.
Euclidean Edge Case:
Compute \(\gcd(1597, 987)\) by hand. Count the number of recursive calls. (Hint: These are consecutive Fibonacci numbers.)
Tree Performance:
Insert the following numbers into a binary search tree in the given order: [500, 200, 800, 100, 300, 700, 900, 50, 150, 250, 350, 600, 750, 850, 950]. Draw the tree and compute the maximum depth, as well as the depth of each leaf node.
Worst-Case Quickselect:
Construct an array of 9 distinct elements such that Quickselect (using pivot-at-start) takes the maximum number of steps to find the 5th smallest. Explain why this array causes the worst-case behavior.