Greedy algorithms build a solution iteratively by making the locally optimal choice available at each step. This approach is guaranteed to produce a globally optimal solution when the problem exhibits the following two properties:

- Greedy-choice property: a globally optimal solution can be reached by making the locally optimal (greedy) choice at each step.
- Optimal substructure: an optimal solution to the problem contains within it optimal solutions to its subproblems.

Bringing these together: the greedy-choice property lets us make the best local decision at each step, and optimal substructure ensures that each decision leaves a smaller problem that can still be solved optimally. Together, they guarantee the greedy method yields a global optimum.

Proving each property follows a standard pattern: the greedy-choice property is typically shown with an exchange argument (any optimal solution can be transformed, without loss, so that it begins with the greedy choice), while optimal substructure is shown with a cut-and-paste argument (if the remaining subproblem were solved suboptimally, substituting a better sub-solution would improve the whole solution, a contradiction).
Because they never backtrack, greedy methods are often simple to implement and run in near-linear time (or \(O(n\log n)\) when sorting is involved, which is common). Even when the greedy-choice property fails, they can quickly yield pretty good approximations; how close depends on the particular problem.
Although they are not all identical, many greedy algorithms follow the same basic pattern:
greedyAlgorithm(items):
    solution = empty
    sort or prioritize items
    for each item in items:
        if item can be added to solution feasibly:
            add item to solution
    return solution
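To make the skeleton concrete, here is one way to express it in Python. This is only a sketch: the priority and feasible parameters are placeholders you supply per problem, not part of any library.

def greedy(items, priority, feasible):
    # Generic greedy driver: `priority` orders the items, and `feasible`
    # decides whether an item can extend the partial solution. Both are
    # problem-specific functions invented here for illustration.
    solution = []
    for item in sorted(items, key=priority):
        if feasible(solution, item):  # can this item be added feasibly?
            solution.append(item)
    return solution

The two algorithms below are instances of this pattern: interval scheduling sorts by finish time and checks for overlap, while fractional knapsack sorts by value density (and additionally allows taking a fraction of the last item, so it does not fit the boolean feasibility test exactly).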
Given a collection of \(n\) intervals (requests), each with a start time \(s_i\) and finish time \(f_i\), the goal is to select the largest possible subset of non-overlapping intervals. For full details, see the Interval Scheduling problem page.
This problem is easily solved by a greedy algorithm: always pick the interval with the earliest finish time that does not overlap with the ones you have already selected. By always choosing the job that finishes first, you leave maximum room for the remaining jobs. The easiest way to accomplish this is to sort the intervals in increasing order of finish time, then scan the list, choosing an interval if its start time is not before the previous finish time, and skipping it otherwise. This leads to the following pseudocode.
intervalScheduling(intervals):
    sort intervals ascending by f_i
    selected = empty list
    lastFinish = -infinity
    for each interval in intervals:
        if interval.start >= lastFinish:
            add interval to selected
            lastFinish = interval.finish
    return selected
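A direct Python translation of this pseudocode might look like the following (a sketch, assuming each interval is a (start, finish) tuple):

def interval_scheduling(intervals):
    selected = []
    last_finish = float("-inf")
    # Scan the intervals in increasing order of finish time.
    for start, finish in sorted(intervals, key=lambda iv: iv[1]):
        if start >= last_finish:  # does not overlap the last selected interval
            selected.append((start, finish))
            last_finish = finish
    return selected

# Example: selects [(1, 3), (3, 5)]
print(interval_scheduling([(1, 3), (2, 4), (3, 5), (4, 6)]))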
We can prove that the greedy algorithm produces an optimal solution by proving the problem has the two required properties. Briefly: the greedy-choice property holds by an exchange argument, since swapping the first interval of any optimal solution for the earliest-finishing interval cannot create an overlap (it finishes no later); and optimal substructure holds because, after committing to that interval, what remains is the same problem restricted to the intervals that start after it finishes.
Let's see the algorithm in action. Given the intervals \((1,3), (2,4), (3,5), (4,6)\) (already in order of finish time), the algorithm selects \((1,3)\), skips \((2,4)\) because it starts before time \(3\), selects \((3,5)\), and skips \((4,6)\) because it starts before time \(5\), returning two intervals.
Time Complexity: This one is simple to analyze: sorting takes \(O(n\log n)\) time and the single scan takes \(O(n)\) time, so the overall complexity is \(O(n\log n + n) = O(n\log n)\).
Space Complexity: The algorithm requires a constant amount of extra space plus up to \(O(n)\) space to store the output list.
The only thing that makes this algorithm a little tricky to implement is that we are sorting intervals rather than plain numbers. Sorting records that contain multiple fields on one of those fields is necessary in many algorithms, though, and hopefully you are (or will become) comfortable implementing this idea. Some languages have mechanisms that make this easier in practice (e.g., Java's Comparable interface).
Given \(n\) items, where item \(i\) has value \(v_i\) and weight \(w_i\), and a knapsack with capacity \(W\), the fractional knapsack problem asks us to maximize total value by selecting whole or fractional items whose total weight does not exceed \(W\). If you are not familiar with this problem, see the Fractional Knapsack problem page before continuing.
The greedy solution works by computing the density \(p_i = v_i / w_i\) for each item, sorting the items in descending order of \(p_i\), and then repeatedly taking as much of the highest-density remaining item as will fit, until the knapsack is full. Intuitively, filling each unit of capacity with the greatest available value per unit weight can never do worse than any other feasible combination, so the greedy strategy maximizes the total value. We will give a more formal justification that the algorithm is correct a bit later.
Here is pseudocode for the algorithm based on the description:
# Input:
#   Items: list of records with fields
#     .value:  the value v_i of the item
#     .weight: the weight w_i of the item
#   W: the capacity of the knapsack, that is, the maximum total weight

fractionalKnapsack(Items, W):
    # 1. Compute the value-per-weight ratio for each item
    for each item in Items do
        item.ratio = item.value / item.weight
    end for

    # 2. Sort items by descending ratio
    sort Items so that
        Items[0].ratio >= Items[1].ratio >= ... >= Items[n-1].ratio

    remaining = W
    totalValue = 0

    # 3. Greedily fill the knapsack
    for each item in Items do
        if item.weight <= remaining
            # take the whole item
            totalValue = totalValue + item.value
            remaining = remaining - item.weight
        else
            # take only the fraction that fits
            fraction = remaining / item.weight
            totalValue = totalValue + item.value * fraction
            break  # knapsack is full
        end if
    end for

    return totalValue
This pseudocode computes and returns only the maximum obtainable value. It can easily be modified to also return the list of items to take, although there is a subtlety: because we sorted the items, we must make sure that the items we report are correctly associated with the original input items.
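One way to handle that subtlety in Python is to sort indices rather than the items themselves, so each taken entry refers back to its original position. This is a sketch; the input format (parallel value and weight lists) is an assumption for illustration.

def fractional_knapsack(values, weights, W):
    # Sort the original indices by descending value-per-weight ratio, so the
    # results can be reported in terms of the original item positions.
    order = sorted(range(len(values)),
                   key=lambda i: values[i] / weights[i], reverse=True)
    remaining = W
    total_value = 0.0
    taken = []  # (original index, fraction taken) pairs
    for i in order:
        if remaining == 0:                 # knapsack already full
            break
        if weights[i] <= remaining:        # take the whole item
            total_value += values[i]
            remaining -= weights[i]
            taken.append((i, 1.0))
        else:                              # take only the fraction that fits
            fraction = remaining / weights[i]
            total_value += values[i] * fraction
            taken.append((i, fraction))
            break                          # knapsack is full
    return total_value, taken

# Example: capacity 50, values [60, 100, 120], weights [10, 20, 30]
# returns (240.0, [(0, 1.0), (1, 1.0), (2, 0.666...)])
print(fractional_knapsack([60, 100, 120], [10, 20, 30], 50))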
Here is a demonstration of the algorithm in action. Suppose the capacity is \(W = 50\) and there are three items with values \(60, 100, 120\) and weights \(10, 20, 30\), so the densities are \(6, 5, 4\). The algorithm takes all of the first item (remaining capacity \(40\)), all of the second (remaining capacity \(20\)), and then the fraction \(20/30\) of the third for value \(80\), giving a total value of \(60 + 100 + 80 = 240\).
To prove that the algorithm computes an optimal solution, we will show that the problem exhibits optimal substructure and the greedy-choice property.
Time Complexity: \(O(n\log n)\) for sorting, plus \(O(n)\) time for selecting, for a total of \(O(n\log n + n)= O(n\log n)\) time.
Space Complexity: Requires \(O(n)\) extra space to store the computed densities, plus a constant amount of additional space for bookkeeping variables such as the remaining capacity and running total.
It should be noted that the greedy algorithm does not guarantee an optimal solution to the 0-1 Knapsack problem, in which each item must be taken entirely or not at all. On the instance above, for example, taking items by density yields the first two items for a value of \(160\), while the optimal 0-1 solution takes the second and third items for a value of \(220\). For more details about this algorithm, see the full Fractional Knapsack Algorithm page.
Greedy algorithms build a solution by making the best local choice at each step, without revisiting prior decisions. When the greedy-choice property and optimal substructure hold, this yields an optimal result; otherwise it may serve as a fast approximation.