Greedy algorithms build a solution iteratively by making the locally optimal choice available at each step. This approach is guaranteed to produce a globally optimal solution when the problem exhibits the following two properties:

- Greedy-choice property: a globally optimal solution can be reached by making the locally optimal (greedy) choice at each step.
- Optimal substructure: an optimal solution to the problem contains within it optimal solutions to its subproblems.

Bringing these together: the greedy-choice property lets us make the best local decision at each step, and optimal substructure ensures that each decision leaves a smaller problem that can still be solved optimally. Together, they guarantee the greedy method yields a global optimum.

Proving each property follows a standard pattern: the greedy-choice property is typically shown with an exchange argument (any optimal solution can be transformed, without loss, so that it begins with the greedy choice), while optimal substructure is shown with a cut-and-paste argument (if the remaining subproblem were solved suboptimally, substituting a better sub-solution would improve the whole solution, a contradiction).
Because they never backtrack, greedy methods are often simple to implement and run in near-linear time (or \(O(n\log n)\) when sorting is involved, which is common). Even when the greedy-choice property fails, they can quickly yield pretty good approximations; how close depends on the particular problem.
Although they are not all identical, many greedy algorithms follow the same basic pattern:
greedyAlgorithm(items):
    solution = empty
    sort or prioritize items
    for each item in items:
        if item can be added to solution feasibly:
            add item to solution
    return solution
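To make the skeleton concrete, here is one way to express it in Python. This is only a sketch: the priority and feasible parameters are placeholders you supply per problem, not part of any library.

def greedy(items, priority, feasible):
    # Generic greedy driver: `priority` orders the items, and `feasible`
    # decides whether an item can extend the partial solution. Both are
    # problem-specific functions invented here for illustration.
    solution = []
    for item in sorted(items, key=priority):
        if feasible(solution, item):  # can this item be added feasibly?
            solution.append(item)
    return solution

The two algorithms below are instances of this pattern: interval scheduling sorts by finish time and checks for overlap, while fractional knapsack sorts by value density (and additionally allows taking a fraction of the last item, so it does not fit the boolean feasibility test exactly).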
Given a collection of \(n\) intervals (requests), each with a start time \(s_i\) and finish time \(f_i\), the goal is to select the largest possible subset of non-overlapping intervals. For full details, see the Interval Scheduling problem page.
This problem is easily solved by a greedy algorithm: always pick the interval with the earliest finish time that does not overlap with the ones you have already selected. By always choosing the job that finishes first, you leave maximum room for the remaining jobs. The easiest way to accomplish this is to sort the intervals in increasing order of finish time, then scan the list, choosing an interval if its start time is not before the previous finish time, and skipping it otherwise. This leads to the following pseudocode.
intervalScheduling(intervals):
    sort intervals ascending by f_i
    selected = empty list
    lastFinish = -infinity
    for each interval in intervals:
        if interval.start >= lastFinish:
            add interval to selected
            lastFinish = interval.finish
    return selected
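A direct Python translation of this pseudocode might look like the following (a sketch, assuming each interval is a (start, finish) tuple):

def interval_scheduling(intervals):
    selected = []
    last_finish = float("-inf")
    # Scan the intervals in increasing order of finish time.
    for start, finish in sorted(intervals, key=lambda iv: iv[1]):
        if start >= last_finish:  # does not overlap the last selected interval
            selected.append((start, finish))
            last_finish = finish
    return selected

# Example: selects [(1, 3), (3, 5)]
print(interval_scheduling([(1, 3), (2, 4), (3, 5), (4, 6)]))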
We can prove that the greedy algorithm produces an optimal solution by proving the problem has the two required properties. Briefly: the greedy-choice property holds by an exchange argument, since swapping the first interval of any optimal solution for the earliest-finishing interval cannot create an overlap (it finishes no later); and optimal substructure holds because, after committing to that interval, what remains is the same problem restricted to the intervals that start after it finishes.
Let's see the algorithm in action. Given the intervals \((1,3), (2,4), (3,5), (4,6)\) (already in order of finish time), the algorithm selects \((1,3)\), skips \((2,4)\) because it starts before time \(3\), selects \((3,5)\), and skips \((4,6)\) because it starts before time \(5\), returning two intervals.
Time Complexity: This one is simple to analyze: sorting takes \(O(n\log n)\) time and the single scan takes \(O(n)\) time, so the overall complexity is \(O(n\log n + n) = O(n\log n)\).
Space Complexity: The algorithm requires a constant amount of extra space plus up to \(O(n)\) space to store the output list.
The only thing that makes this algorithm a little tricky to implement is that we are sorting intervals rather than plain numbers. Sorting records that contain multiple fields on one of those fields is necessary in many algorithms, though, and hopefully you are (or will become) comfortable implementing this idea. Some languages have mechanisms that make this easier in practice (e.g., Java's Comparable interface).
Given \(n\) items, where item \(i\) has value \(v_i\) and weight \(w_i\), and a knapsack with capacity \(W\), the fractional knapsack problem asks us to maximize total value by selecting whole or fractional items whose total weight does not exceed \(W\). If you are not familiar with this problem, see the Fractional Knapsack problem page before continuing.
The greedy solution works by computing the density \(p_i = v_i / w_i\) for each item, sorting the items in descending order of \(p_i\), and then repeatedly taking as much of the highest-density remaining item as will fit, until the knapsack is full. Intuitively, filling each unit of capacity with the greatest available value per unit weight can never do worse than any other feasible combination, so the greedy strategy maximizes the total value. We will give a more formal justification that the algorithm is correct a bit later.
Here is pseudocode for the algorithm based on the description:
# Input:
#   Items: list of records with fields
#     .value:  the value v_i of the item
#     .weight: the weight w_i of the item
#   W: the capacity of the knapsack, that is, the maximum total weight

fractionalKnapsack(Items, W):
    # 1. Compute the value-per-weight ratio for each item
    for each item in Items do
        item.ratio = item.value / item.weight
    end for

    # 2. Sort items by descending ratio
    sort Items so that
        Items[0].ratio >= Items[1].ratio >= ... >= Items[n-1].ratio

    remaining = W
    totalValue = 0

    # 3. Greedily fill the knapsack
    for each item in Items do
        if item.weight <= remaining
            # take the whole item
            totalValue = totalValue + item.value
            remaining = remaining - item.weight
        else
            # take only the fraction that fits
            fraction = remaining / item.weight
            totalValue = totalValue + item.value * fraction
            break  # knapsack is full
        end if
    end for

    return totalValue
This pseudocode computes and returns only the maximum obtainable value. It can easily be modified to also return the list of items to take, although there is a subtlety: because we sorted the items, we must make sure that the items we report are correctly associated with the original input items.
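One way to handle that subtlety in Python is to sort indices rather than the items themselves, so each taken entry refers back to its original position. This is a sketch; the input format (parallel value and weight lists) is an assumption for illustration.

def fractional_knapsack(values, weights, W):
    # Sort the original indices by descending value-per-weight ratio, so the
    # results can be reported in terms of the original item positions.
    order = sorted(range(len(values)),
                   key=lambda i: values[i] / weights[i], reverse=True)
    remaining = W
    total_value = 0.0
    taken = []  # (original index, fraction taken) pairs
    for i in order:
        if remaining == 0:                 # knapsack already full
            break
        if weights[i] <= remaining:        # take the whole item
            total_value += values[i]
            remaining -= weights[i]
            taken.append((i, 1.0))
        else:                              # take only the fraction that fits
            fraction = remaining / weights[i]
            total_value += values[i] * fraction
            taken.append((i, fraction))
            break                          # knapsack is full
    return total_value, taken

# Example: capacity 50, values [60, 100, 120], weights [10, 20, 30]
# returns (240.0, [(0, 1.0), (1, 1.0), (2, 0.666...)])
print(fractional_knapsack([60, 100, 120], [10, 20, 30], 50))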
Here is a demonstration of the algorithm in action. Suppose the capacity is \(W = 50\) and there are three items with values \(60, 100, 120\) and weights \(10, 20, 30\), so the densities are \(6, 5, 4\). The algorithm takes all of the first item (remaining capacity \(40\)), all of the second (remaining capacity \(20\)), and then the fraction \(20/30\) of the third for value \(80\), giving a total value of \(60 + 100 + 80 = 240\).
To prove that the algorithm computes an optimal solution, we will show that the problem exhibits optimal substructure and the greedy-choice property.
Time Complexity: \(O(n\log n)\) for sorting, plus \(O(n)\) time for selecting, for a total of \(O(n\log n + n)= O(n\log n)\) time.
Space Complexity: Requires \(O(n)\) extra space to store the computed densities, plus a constant amount of additional space for bookkeeping variables such as the remaining capacity and running total.
It should be noted that the greedy algorithm does not guarantee an optimal solution to the 0-1 Knapsack problem, in which each item must be taken entirely or not at all. On the instance above, for example, taking items by density yields the first two items for a value of \(160\), while the optimal 0-1 solution takes the second and third items for a value of \(220\). For more details about this algorithm, see the full Fractional Knapsack Algorithm page.
Greedy algorithms build a solution by making the best local choice at each step, without revisiting prior decisions. When the greedy-choice property and optimal substructure hold, this yields an optimal result; otherwise it may serve as a fast approximation.