Greedy Algorithms

Introduction

Greedy algorithms build a solution iteratively by making the best (locally optimal) choice available at each step. When the following two properties hold, this approach is guaranteed to produce an optimal solution:

  1. Greedy-choice property: a globally optimal solution can be reached by making the locally optimal (greedy) choice at each step; that is, some optimal solution is consistent with the first greedy choice.
  2. Optimal substructure: an optimal solution to the problem contains within it optimal solutions to the subproblems that remain after each choice.

Bringing these together: the greedy-choice property lets us make the best local decision at each step, and optimal substructure ensures that each decision leaves a smaller problem that can still be solved optimally. In combination, they guarantee that the greedy method yields a global optimum.

To guarantee that a greedy algorithm produces an optimal solution, then, we must prove that the problem exhibits both properties. The proof of each property typically follows a standard pattern:

  1. Greedy-choice property: use an exchange argument. Start from an arbitrary optimal solution and show that it can be transformed, step by step, to agree with the greedy choices without ever becoming worse.
  2. Optimal substructure: use a cut-and-paste argument. Show that if the subproblem left after a greedy choice were solved suboptimally, pasting in a better sub-solution would improve the overall solution, contradicting its optimality.

Because they never backtrack, greedy methods are often simple to implement and run in near-linear time (or \(O(n\log n)\) when sorting is involved, which is quite common). Even when the greedy-choice property fails, they can quickly yield good approximate solutions; how close they come to optimal depends on the particular problem.

Pseudocode Skeleton

Although they are not all identical, many greedy algorithms follow the same basic pattern:

greedyAlgorithm(items):
    solution = empty
    sort or prioritize items
    for each item in items:
        if item can be added to solution feasibly:
            add item to solution
    return solution
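
To make the pattern concrete, here is a minimal Python rendering of the skeleton. The names greedy_select, priority, and compatible are illustrative choices, not standard names; the examples below fill in these pieces for specific problems.

def greedy_select(items, priority, compatible):
    """A direct rendering of the skeleton above.

    priority:   key function used to order the items (most promising first).
    compatible: predicate telling whether an item can feasibly join solution.
    """
    solution = []
    for item in sorted(items, key=priority):
        if compatible(solution, item):
            solution.append(item)
    return solution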

Example 1: Interval Scheduling

Given a collection of \(n\) intervals (requests), each with a start time \(s_i\) and finish time \(f_i\), the goal is to select the largest possible subset of non-overlapping intervals. For full details, see the Interval Scheduling problem page.

This problem is easily solved by a greedy algorithm: always pick the interval with the earliest finish time that does not overlap with the ones you have already selected. By choosing the earliest-finishing compatible interval, you leave the maximum room for the rest. The easiest way to accomplish this is to sort the intervals in increasing order of finish time, then scan the list, choosing an interval if its start time is not before the previous finish time and skipping it otherwise. This leads to the following pseudocode.

intervalScheduling(intervals):
    sort intervals ascending by f_i
    selected = empty list
    lastFinish = -infinity
    for each interval in intervals:
        if interval.start >= lastFinish:
            add interval to selected
            lastFinish = interval.finish
    return selected

We can prove that the greedy algorithm produces an optimal solution by proving that the problem has the two required properties. In brief: the greedy-choice property follows from an exchange argument (in any optimal solution, the first interval can be swapped for the earliest-finishing interval without creating an overlap or shrinking the solution), and optimal substructure holds because once an interval is chosen, what remains is a smaller instance of the same problem over the intervals that start after it finishes.

Let's see the algorithm in action.
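
Here is a minimal, runnable Python sketch of the algorithm on a small hypothetical instance (the intervals are made up for illustration):

def interval_scheduling(intervals):
    """Greedily select a maximum-size set of non-overlapping intervals.

    intervals: a list of (start, finish) pairs.
    """
    selected = []
    last_finish = float("-inf")
    # Sort by finish time, then make a single greedy pass.
    for start, finish in sorted(intervals, key=lambda iv: iv[1]):
        if start >= last_finish:        # compatible with the selection so far
            selected.append((start, finish))
            last_finish = finish
    return selected

# Hypothetical instance: prints [(1, 3), (4, 7), (8, 10)]
print(interval_scheduling([(1, 3), (2, 5), (4, 7), (6, 9), (8, 10)]))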

Time Complexity: This one is simple to analyze: sorting takes \(O(n\log n)\) time and the single scan takes \(O(n)\) time, so the overall complexity is \(O(n\log n + n) = O(n\log n)\).

Space Complexity: The algorithm requires a constant amount of extra space plus up to \(O(n)\) space to store the output list.

The only thing that makes this algorithm a little tricky to implement is that we are sorting intervals rather than plain numbers. But sorting records by one of their fields is needed in many algorithms, and hopefully you are (or will become) comfortable implementing this idea. Some languages have mechanisms that make it easier in practice (e.g., Java's Comparable interface).
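
In Python, for instance, a key function plays the role that Comparable plays in Java; here is a minimal sketch using a hypothetical Interval record:

from dataclasses import dataclass

@dataclass
class Interval:
    start: int
    finish: int

intervals = [Interval(2, 5), Interval(1, 3)]
# Sort records on one of their fields by supplying a key function.
intervals.sort(key=lambda iv: iv.finish)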

Example 2: Fractional Knapsack

Given \(n\) items, where item \(i\) has value \(v_i\) and weight \(w_i\), and a knapsack with capacity \(W\), the fractional knapsack problem asks us to maximize total value by selecting whole or fractional items whose total weight does not exceed \(W\). If you are not familiar with this problem, see the Fractional Knapsack problem page before continuing.

The greedy solution works by computing the density \(p_i = v_i / w_i\) for each item, sorting all items in descending order of \(p_i\), and then iteratively taking as much of the current highest-density item as will fit until the knapsack is full. Because taking more of the item with the greatest value per unit weight can never reduce the total value achieved by any other feasible combination, the greedy strategy maximizes the final value. We will give a more formal/complete justification that the algorithm is correct a bit later.

Here is pseudocode for the algorithm based on the description:

# Input:
#   Items: list of records with fields
#     .value:  the value v_i of the item
#     .weight: the weight w_i of the item
#   W: the capacity of the knapsack, i.e., the maximum total weight
fractionalKnapsack(Items, W):
    # 1. Compute the value-per-weight ratio (density) for each item
    for each item in Items:
        item.ratio = item.value / item.weight

    # 2. Sort items by descending ratio
    sort Items so that
        Items[0].ratio >= Items[1].ratio >= ... >= Items[n-1].ratio

    remaining = W
    totalValue = 0

    # 3. Greedily fill the knapsack
    for each item in Items:
        if item.weight <= remaining:
            # take the whole item
            totalValue = totalValue + item.value
            remaining = remaining - item.weight
        else:
            # take only the fraction that fits
            fraction = remaining / item.weight
            totalValue = totalValue + item.value * fraction
            break    # knapsack is full

    return totalValue

This pseudocode only computes and returns the maximum value obtainable. It can easily be modified to return the list of items to take, although there is a subtlety: we sorted the items, so we need to make sure the returned choices are correctly associated with the original items (the sketch below handles this by sorting indices rather than the items themselves).

Here is a demonstration of the algorithm in action.
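
Here is a minimal Python sketch on a small hypothetical instance. It also returns which items were taken, recording each item's original index together with the fraction taken, which addresses the subtlety mentioned above:

def fractional_knapsack(values, weights, capacity):
    """Return (total value, list of (original index, fraction taken))."""
    # Sort indices rather than the items themselves, so that every
    # choice stays tied to its original item.
    order = sorted(range(len(values)),
                   key=lambda i: values[i] / weights[i], reverse=True)
    remaining = capacity
    total_value = 0.0
    taken = []
    for i in order:
        if remaining <= 0:
            break                       # knapsack is full
        if weights[i] <= remaining:
            # take the whole item
            taken.append((i, 1.0))
            total_value += values[i]
            remaining -= weights[i]
        else:
            # take only the fraction that fits, then stop
            fraction = remaining / weights[i]
            taken.append((i, fraction))
            total_value += values[i] * fraction
            break
    return total_value, taken

# Hypothetical instance: densities are 6, 5, and 4, so the items are
# taken in order 0, 1, 2, with two thirds of item 2; the total value
# is 240 (up to floating-point rounding).
print(fractional_knapsack([60, 100, 120], [10, 20, 30], 50))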

To prove that the algorithm computes an optimal solution, we will show that the problem exhibits optimal substructure and the greedy-choice property. The heart of the argument is again an exchange argument: if an optimal solution takes less of the highest-density item \(i\) than the greedy solution does, we can shift \(\delta\) units of weight to it from some lower-density item \(j\); the total value changes by \(\delta(p_i - p_j) \ge 0\), so the exchange never decreases the value.

Time Complexity: \(O(n\log n)\) for sorting, plus \(O(n)\) time for selecting, for a total of \(O(n\log n + n)= O(n\log n)\) time.

Space Complexity: Requires \(O(n)\) extra space for the density array, plus a constant amount of additional space for indexing variables and such.

It should be noted that the greedy algorithm does not guarantee an optimal solution to the 0-1 Knapsack problem, in which each item must be taken in its entirety or not at all. For example, with capacity \(W=50\) and (value, weight) pairs \((60,10)\), \((100,20)\), \((120,30)\), taking whole items in density order yields value 160, whereas the optimal choice of the last two items yields 220. For more details about this algorithm, see the full Fractional Knapsack Algorithm page.
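
As a concrete check, here is a small Python sketch that compares greedy-by-density for the 0-1 variant against a brute-force search over all subsets on the instance above:

from itertools import combinations

values, weights, W = [60, 100, 120], [10, 20, 30], 50

# Greedy for the 0-1 variant: take whole items in density order.
order = sorted(range(len(values)),
               key=lambda i: values[i] / weights[i], reverse=True)
greedy_value, remaining = 0, W
for i in order:
    if weights[i] <= remaining:
        greedy_value += values[i]
        remaining -= weights[i]

# Brute force: the best value over all subsets of items that fit.
best = max(
    sum(values[i] for i in subset)
    for r in range(len(values) + 1)
    for subset in combinations(range(len(values)), r)
    if sum(weights[i] for i in subset) <= W
)
print(greedy_value, best)   # prints: 160 220, so greedy is suboptimal here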

Algorithms Using This Technique

When to Use

Limitations

Implementation Tips

Common Pitfalls

Real-World Applications

Summary & Key Takeaways

Greedy algorithms build a solution by making the best local choice at each step, without revisiting prior decisions. When the greedy-choice property and optimal substructure hold, this yields an optimal result; otherwise it may serve as a fast approximation.

Reading Comprehension Questions

  1. What is the greedy-choice property?
  2. What does optimal substructure mean?
  3. Why does the interval scheduling algorithm select the interval with the earliest finish time?
  4. In fractional knapsack, how is the item density defined and why is it important?
  5. What is an exchange argument and how does it prove correctness for greedy algorithms?
  6. Give one example of a problem where a greedy algorithm fails to produce an optimal solution.
  7. What is the typical time complexity for greedy algorithms and what causes it?
  8. How can tie-breaking rules affect the outcome of a greedy algorithm?

In-Class Activities

  1. Counterexample/Proof: Pick either Interval Scheduling or Fractional Knapsack and try to find a counterexample that shows that the greedy algorithm doesn't always work. Alternatively, convince yourself that the greedy algorithm will indeed always produce an optimal solution. Have one or more students give clear arguments for their case.
  2. Exchange Argument Workshop: In small groups, use colored cards to represent two schedules—one greedy (interval scheduling) and one 'optimal'. Step through the exchange argument by swapping later-finishing intervals for earlier ones until both match.
  3. Design and Test a Greedy Strategy: Give each pair a coin system that is not canonical (for example {1, 3, 4}). Have them propose and simulate a greedy coin-change algorithm, then find a counterexample where it fails.
  4. Fractional Knapsack Simulation: Each student receives a set of 5 items with given values and weights and a capacity. Manually compute densities \(p_i = v_i / w_i\), sort the items by density, and fill the knapsack with fractions. Compare with a brute-force check for the same instance.
  5. Greedy versus Non-Greedy Debate: Pick a problem and split the class: half argue that a greedy strategy works for the problem, half argue that it does not. Each side presents either a proof sketch or a counterexample.
  6. Minimum Stabbing Points: You are given a set of \(n\) closed intervals \([s_i, f_i]\) on the real line. Design a greedy algorithm that selects the smallest possible set of points \(P\) such that every interval contains at least one point in \(P\). Provide pseudocode, analyze its time complexity, and give a brief proof of correctness.
  7. Optimal Rope Merging: You have \(n\) ropes with lengths \(l_1, l_2, \dots, l_n\). Merging any two ropes of lengths \(a\) and \(b\) costs \(a + b\) and produces a new rope of length \(a + b\). Design a greedy strategy to merge all ropes into one at minimum total cost. Provide pseudocode, analyze its time complexity (hint: consider an appropriate data structure), and justify why your greedy choice yields an optimal solution.

Homework Problems

Basic

  1. Canonical Coin Change:
    For U.S. coin denominations \(\{1,5,10,25\}\), design a greedy algorithm to make change for any amount \(C\). Clearly define what the greedy choice is and prove that the greedy choice minimizes the number of coins.
  2. Non-Canonical Coin Change:
    For the coin system \(\{1,3,4\}\), design a greedy algorithm to make change and find an explicit amount where it fails to be optimal. Then propose a modification that fixes the counterexample. Give a clear justification for why the modified algorithm is correct.
  3. Minimum Stabbing Points:
    Given \(n\) closed intervals on the line, devise a greedy algorithm that chooses the smallest set of points so each interval contains at least one point. Explain your choice and prove correctness.
  4. Rope Merging:
    You have \(n\) ropes of lengths \(l_1, \dots, l_n\). Merging two ropes of lengths \(a\) and \(b\) costs \(a + b\). Give a greedy strategy to minimize total merge cost, analyze its complexity, and argue why it is optimal.

Advanced

  1. Minimizing Maximum Lateness:
    Each job \(i\) has processing time \(t_i\) and deadline \(d_i\). Design a greedy algorithm to schedule all jobs so as to minimize the maximum lateness \(\max_i(C_i - d_i)\), where \(C_i\) is completion time. Justify why sorting by earliest deadline works.
  2. Job Sequencing with Deadlines and Profit:
    You have jobs with deadlines \(d_i\) and profits \(p_i\). Each job takes unit time. Devise a greedy strategy to maximize total profit by scheduling at most one job per time slot. Analyze your algorithm and prove why it works (clearly define what the greedy choice is and prove that the greedy-choice property holds).
  3. Greedy Set Cover Approximation:
    Given a universe \(U\) and a collection of subsets \(S_1, \dots, S_m\), design a greedy algorithm that attempts to cover \(U\) with the fewest possible subsets. Clearly explain what your greedy choice is. Determine the computational complexity of your algorithm. If you algorithm is correct (i.e. it always give an optimal solution), give a proof. If it is not, give a counterexample.