0-1 Knapsack (Dynamic Programming)

Problem Solved

In this section we discuss the dynamic programming algorithm for the 0-1 Knapsack Problem: given \(n\) items, where item \(i\) has value \(v_i\) and weight \(w_i\), and a knapsack of capacity \(W\), choose a subset of the items of maximum total value whose total weight is at most \(W\). Each item is either taken whole or left out (hence "0-1"); no fractions are allowed.

Design and Strategy

The 0-1 Knapsack problem is a classic use of Dynamic Programming (DP): a technique that breaks a problem into smaller, overlapping subproblems and stores their results to avoid recomputation. A naive recursive solution considers each item in turn and branches on two choices: include it or exclude it. This yields a binary recursion tree with up to \(2^n\) leaves and repeats many subproblems. For example, different decision paths may both end up considering the same situation (say, "items 4 through 6 with 10 units of capacity left") even though they reached it in different ways. DP recognizes these repeated subproblems and reuses their results, turning the exponential search into a table computation with only \(n(W+1)\) entries.
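The exponential blow-up is easy to see in a direct recursive sketch of the include/exclude branching (a minimal illustration only, not the algorithm we develop below; the function name `knapsack_naive` is ours):

```python
def knapsack_naive(values, weights, capacity, i=0):
    """Best value achievable with items i..n-1 and the given capacity.

    Every call branches on "skip item i" vs "take item i", so the
    recursion tree has up to 2^n leaves and recomputes many states.
    """
    if i == len(values):
        return 0
    # Option 1: skip item i; capacity is unchanged.
    best = knapsack_naive(values, weights, capacity, i + 1)
    # Option 2: take item i, if it fits; gain its value, lose its weight.
    if weights[i] <= capacity:
        best = max(best,
                   values[i] + knapsack_naive(values, weights,
                                              capacity - weights[i], i + 1))
    return best
```

Running it on a small instance already visits the same (item, capacity) states many times, which is exactly the redundancy the DP table eliminates.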

The key idea is to build the solution gradually, adding one item at a time. Suppose we already know the best way to fill a knapsack of any smaller capacity using the first \(i-1\) items. When we now consider item \(i\), there are two possibilities: take it (if it fits) or do not take it. If we skip it, the total value remains whatever was best for the first \(i-1\) items and the same capacity. If we take it, we gain its value \(v_i\) but reduce our remaining capacity by \(w_i\), so we must add \(v_i\) to the best value achievable with capacity \(w - w_i\) from the previous items.

This reasoning leads to a subproblem definition \(V(i, w)\): the maximum value achievable using items \(0\) through \(i\) with total capacity \(w\). We fill the table one item at a time, beginning with item 0.

Putting it all together, we get the following recursive formula to compute the optimal solution for the first \(i\) items and capacity \(w\):

\(V(i, w) = \begin{cases} 0, & \text{if } i = 0 \text{ and } w_0 > w,\\[6pt] v_0, & \text{if } i = 0 \text{ and } w_0 \le w,\\[6pt] V(i-1, w), & \text{if } w_i > w,\\[6pt] \max\big(V(i-1, w),\ v_i + V(i-1, w - w_i)\big), & \text{otherwise.} \end{cases}\)

A bottom-up DP implementation fills a table of size \(n \times (W+1)\), where each cell \(V(i, w)\) represents the best total value achievable using items \(0\) through \(i\) with capacity \(w\). The final entry \(V(n-1, W)\) gives the maximum total value.

  1. For the first item \(i = 0\):
    1. For each capacity \(w = 0..W\):
      • If \(w_0 \le w\), set \(V(0, w) = v_0\).
      • Otherwise, \(V(0, w) = 0\).
  2. For each remaining item \(i = 1..n-1\):
    1. For each capacity \(w = 0..W\):
      • If \(w_i \le w\), set \(V(i, w) = \max(V(i-1, w),\ v_i + V(i-1, w - w_i))\).
      • Otherwise, \(V(i, w) = V(i-1, w)\).
  3. Return \(V(n-1, W)\) as the maximum total value.

After the table is filled, \(V(n-1, W)\) gives the best total value. To recover the items, trace back from \((n-1, W)\). At each state \((i, w)\) with \(i > 0\), compare \(V(i, w)\) with \(V(i-1, w)\):

  • If \(V(i, w) > V(i-1, w)\), item \(i\) must be in the solution: record it and continue from \((i-1, w - w_i)\).
  • If \(V(i, w) = V(i-1, w)\), the optimum is achievable without item \(i\): continue from \((i-1, w)\). (If, in addition, \(v_i + V(i-1, w - w_i) = V(i, w)\), taking item \(i\) leads to a different, equally good solution; exploring both branches enumerates all optimal solutions.)

Stop when \(i = 0\) or \(w = 0\). For \(i = 0\), include item 0 iff \(V(0, w) = v_0\) (i.e., it fits), then finish.
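As a sketch, the traceback can be written as follows (assuming a filled table as produced by the implementations below; the name `recover_items` and the parameter `K`, holding \(V(i, w)\) in `K[i][w]`, are ours):

```python
def recover_items(K, values, weights, W):
    """Trace back through a filled DP table K to list one optimal item set."""
    items = []
    w = W
    for i in range(len(values) - 1, 0, -1):
        if K[i][w] != K[i - 1][w]:
            # The value changed, so item i must be included:
            # record it and give up its weight.
            items.append(i)
            w -= weights[i]
        # Otherwise V(i-1, w) already achieves the optimum; skip item i.
    # Base row: item 0 is included iff it fits and contributes its value.
    if weights[0] <= w and K[0][w] == values[0]:
        items.append(0)
    return sorted(items)
```

This recovers one optimal subset; enumerating all of them requires also branching on ties, as described above.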

If this still feels abstract, the demonstration below will make the process more concrete. It shows both how the table of values is filled and how the optimal solution is then reconstructed by tracing back through the entries. Step through it slowly to see how each decision affects later values, and how the final path reveals which items make up the best combination.

Implementation in Java, C++, Python

The following implementations use a bottom-up dynamic programming approach to compute the value of the optimal solution, but not the items. Extending the code to include the list of items is left to the reader.

A few details are worth noting: row 0 is initialized separately because items are indexed from 0; capacities run from \(0\) to \(W\) inclusive, so each row has \(W+1\) entries; and when item \(i\) does not fit at capacity \(c\), its cell is simply copied from the row above.

Java:

public static int knapsack(int[] values, int[] weights, int W) {
    int n = values.length;
    int[][] K = new int[n][W + 1];

    // Row 0 initialization
    for (int c = 0; c <= W; c++) {
        K[0][c] = (weights[0] <= c) ? values[0] : 0;
    }

    // Fill remaining rows
    for (int i = 1; i < n; i++) {
        for (int c = 0; c <= W; c++) {
            if (weights[i] <= c) {
                K[i][c] = Math.max(K[i - 1][c], values[i] + K[i - 1][c - weights[i]]);
            } else {
                K[i][c] = K[i - 1][c];
            }
        }
    }
    return K[n - 1][W];
}

C++:

#include <algorithm>
#include <vector>

int knapsack(const int values[], const int weights[], int n, int W) {
    // Variable-length arrays are not standard C++; use a vector-backed table.
    std::vector<std::vector<int>> K(n, std::vector<int>(W + 1, 0));

    // Row 0 initialization
    for (int c = 0; c <= W; c++) {
        K[0][c] = (weights[0] <= c) ? values[0] : 0;
    }

    // Fill remaining rows
    for (int i = 1; i < n; i++) {
        for (int c = 0; c <= W; c++) {
            if (weights[i] <= c) {
                K[i][c] = std::max(K[i - 1][c], values[i] + K[i - 1][c - weights[i]]);
            } else {
                K[i][c] = K[i - 1][c];
            }
        }
    }
    return K[n - 1][W];
}

Python:

def knapsack(values, weights, W):
    n = len(values)
    if n == 0 or W <= 0:
        return 0

    K = [[0] * (W + 1) for _ in range(n)]

    # Row 0 initialization
    for c in range(W + 1):
        K[0][c] = values[0] if weights[0] <= c else 0

    # Fill remaining rows
    for i in range(1, n):
        for c in range(W + 1):
            if weights[i] <= c:
                K[i][c] = max(K[i - 1][c], values[i] + K[i - 1][c - weights[i]])
            else:
                K[i][c] = K[i - 1][c]

    return K[n - 1][W]

Time/Space Analysis

Time Complexity. The table has \(n\) rows (items \(0..n-1\)) and \(W+1\) columns (capacities \(0..W\)). Each cell \(V(i, w)\) is computed in \(O(1)\) time from at most two earlier cells. Therefore the total time is \(O(nW)\).

Pseudo-polynomial note. The algorithm is considered pseudo-polynomial because its running time grows with the numeric value of \(W\) rather than the size of its representation. In practice, it runs efficiently when \(W\) is moderate but can become impractical when \(W\) is very large (for example, \(W = 2^n\)). The subtle point is that while the input size is proportional to \(\log W\)—the number of bits needed to write down \(W\)—the algorithm's time depends directly on \(W\) itself. Thus, even though the algorithm is polynomial in \(n\) and \(W\), it is not polynomial in the total input size. Since 0-1 Knapsack is NP-hard, this limitation is expected: a truly polynomial-time algorithm is unlikely to exist.

Space Complexity. The 2D table stores \(n(W+1)\) integers, so space is \(O(nW)\) beyond the input arrays. If we only need the optimal value (not reconstruction), we can compress to a 1D array of size \(W+1\), updating capacities in descending order, for \(O(W)\) extra space.
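The 1D compression can be sketched as follows (a minimal illustration; the function name `knapsack_1d` is ours). The descending capacity loop is the crucial detail: it guarantees that `best[c - weights[i]]` still refers to the previous row, so each item is counted at most once.

```python
def knapsack_1d(values, weights, W):
    """Optimal 0-1 knapsack value using O(W) extra space."""
    best = [0] * (W + 1)  # best[c] = optimal value for capacity c so far
    for i in range(len(values)):
        # Descending order: best[c - weights[i]] has not yet been updated
        # for item i, so it still holds the value for items 0..i-1 only.
        for c in range(W, weights[i] - 1, -1):
            best[c] = max(best[c], values[i] + best[c - weights[i]])
    return best[W]
```

Updating in ascending order would instead allow an item to be taken repeatedly, which solves the unbounded knapsack variant, not the 0-1 problem.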

Reconstruction cost. If we keep the full table, recovering one optimal set of items by tracing back takes \(O(n)\) time and no extra asymptotic space (just an output list).

Variations/Improvements

Reading Comprehension Questions

  1. Problem framing: In one sentence, what does the 0-1 Knapsack problem ask you to compute?
  2. Why DP here? What makes the problem suitable for dynamic programming rather than brute force?
  3. Subproblem meaning: Precisely define \(V(i, w)\) as used in this chapter.
  4. Base row (item 0): How is the first row of the table filled when we start indexing at item 0?
  5. Recurrence logic: State the recurrence for \(V(i, w)\), and briefly explain the roles of the two terms compared in the max.
  6. Backtracking: larger vs equal: While reconstructing, what does each of the following comparisons imply about item \(i\)?
    1. \(V(i, w) > V(i-1, w)\)
    2. \(V(i, w) = V(i-1, w)\)
  7. All optimal solutions: In the equal case above, how do you decide whether to explore both branches or just one?
  8. Time complexity: What is the time complexity of the bottom-up algorithm and why?
  9. Pseudo-polynomial: Why is the algorithm called pseudo-polynomial, and how does \(\log W\) factor into that description?
  10. Space tradeoffs: When is it valid to compress the table to \(O(W)\) space, and when must you keep the full \(n \times (W+1)\) table?

In-Class Activities

  1. Trace a small example: Consider the instance of 0-1 knapsack with values \([7, 10, 2, 3, 2, 6]\), weights \([3, 2, 6, 1, 7, 4]\), and \(W=10\). Fill in the DP table by hand. Highlight which cells are compared at each step and identify the final value. Then work backward through the table to find the items in the optimal solution. Verify that the sum of their weights is at most 10 and that their values add up to the expected value. Time permitting, find all optimal solutions.
  2. Top-down recursion walkthrough: Write out the recursive definition of \(V(i, w)\) as a function that returns the optimal value for items \(0..i\) and capacity \(w\). Consider the instance of 0-1 knapsack with values \([7, 10, 2, 3]\), weights \([2, 3, 1, 2]\), and \(W=5\). Draw a recursion tree that shows the computation of \(V(3,5)\) (the optimal value for items 0 through 3 and a knapsack with \(W=5\)). Highlight the values in the tree that are computed more than once. Do you spot any concerns?
  3. Adding memoization: Modify the top-down recursion so that before solving \(V(i, w)\), the function checks whether the value has already been computed. What must the algorithm do before the first call that does not have to be done with the bottom-up version (Hint: It has something to do with the array)? Write out the algorithm. Run this and the top-down recursive algorithm on a smallish example and compare the number of values that are computed. Which one seems to be better?
  4. Bottom-up vs. top-down comparison: Discuss as a group how the order of computation differs between top-down with memoization and bottom-up tabulation. Why do both yield the same final value even though one fills a table iteratively and the other computes values on demand? Is one of them clearly better than the other one? Justify your answer.
  5. Space reduction experiment: Take the bottom-up version and modify it to use only one array of length \(W+1\). Verify that the results match the full 2D version for several test cases.
  6. Algorithm intuition check: Discuss why the greedy rule “take items in decreasing value-to-weight ratio” works for the fractional version but not for the 0-1 case. Construct a small counterexample to illustrate the failure.
  7. Growth reflection: Try doubling the capacity \(W\) while keeping the number of items fixed. How does the number of subproblems change? Explain how this behavior supports the claim that the algorithm’s running time depends on the numeric value of \(W\), not its bit-length.

Homework Problems

  1. Basic table trace: For values \([9, 14, 6, 7, 12, 8, 5]\), weights \([4, 6, 2, 3, 7, 5, 2]\), and \(W=15\):
    1. Fill the DP table \(V(i,w)\) by hand (items \(i=0..6\), capacities \(w=0..15\)).
    2. Report the final value \(V(6,15)\).
    3. Backtrack to list one optimal subset of items and verify its total weight \(\le 15\) and total value equals \(V(6,15)\).
  2. Trace and enumerate ties: For values \([8, 7, 6, 5, 5]\), weights \([5, 4, 3, 2, 2]\), and \(W=9\):
    1. Compute the full DP table \(V(i,w)\) and report the final value \(V(4,9)\).
    2. Backtrack to find all optimal solutions. Show each tie where \(V(i,w)=V(i-1,w)\) and check whether \(v_i + V(i-1, w - w_i) = V(i,w)\) to justify exploring both branches.
    3. List each distinct optimal subset and verify its total weight \(\le 9\) and total value equals \(V(4,9)\).
  3. Top-down with memoization: Write clear pseudocode for a top-down recursive algorithm with memoization for \(V(i,w)\). Then for the values from Problem 1:
    1. Use the top-down algorithm to fill in the values of the table.
      1. Go backwards through the table marking the cells that the recursive calls need to compute.
      2. Fill in the values of the table as the recursion is unwinding.
      3. Clearly indicate which cells are not computed by leaving them blank.
      4. Clearly mark all of the cells that are referred to multiple times (this indicates that a lookup occurred instead of a recomputation!).
    2. Rank the algorithms by efficiency: bottom-up DP, top-down DP, or naive top-down. Justify/discuss the ranking.
  4. Space-optimized DP sanity check: For values \([9, 4, 7, 3, 6, 9]\), weights \([6, 3, 4, 2, 4, 7]\), and \(W=12\):
    1. Compute the optimal value using a full \(n \times (W+1)\) table.
    2. Now compute the optimal value using a 1D array of length \(W+1\), updating capacities in descending order.
    3. Verify the results match and briefly justify why descending order is required.
  5. Greedy pitfall (0-1 vs fractional): Consider values \([8, 6, 5]\), weights \([6, 4, 3]\), and \(W=7\).
    1. Apply the greedy rule that sorts by value-to-weight ratio and takes items while they fit. What set do you get?
    2. Compute the true 0-1 optimal set and value via DP. Explain why greedy failed here.
    3. What would the fractional knapsack solution achieve for the same data?
  6. All optimal solutions count: For values \([5, 5, 5, 5]\), weights \([2, 2, 2, 2]\), and \(W=4\):
    1. Compute the optimal value.
    2. How many distinct optimal subsets exist? Justify your count using the table and the tie-handling logic.
  7. Variation: exact-fill knapsack: Modify the DP to return the maximum value achievable using items \(0..i\) with capacity exactly \(w\) (solutions that do not fill capacity exactly are disallowed).
    1. Give the new recurrence and base cases precisely.
    2. Apply it to values \([9, 4, 8, 5, 6]\), weights \([4, 3, 5, 2, 6]\), with \(W=10\). Report the exact-fill optimum or state that none exists.
  8. Variation: value-based DP: When the knapsack capacity \(W\) is very large but the total of all item values is relatively small, it can be more efficient to build a table based on total value rather than capacity.
    • Define \(M(i, v)\) as the minimum total weight needed to achieve value \(v\) using the first \(i\) items (items \(0..i-1\)).
    • To set up the table, first compute \[ V_{\max} = \sum_{j=0}^{n-1} v_j, \] so the table will have columns for all possible total values \(v = 0..V_{\max}\).
    • Base cases. Use \(i = 0..n\) to mean "first \(i\) items." Set \[ M(0,0) = 0, \quad M(0,v) = \infty \text{ for } v > 0. \]
    • Recurrence. For \(i \ge 1\): \[ M(i,v) = \begin{cases} M(i-1,v), & v < v_{i-1},\\[4pt] \min\big(M(i-1,v),\, w_{i-1} + M(i-1,\, v - v_{i-1})\big), & v \ge v_{i-1}. \end{cases} \]
    • Recovering the optimal value. The optimal value for capacity \(W\) is \[ \max\{\,v \mid 0 \le v \le V_{\max} \text{ and } M(n,v) \le W\,\}. \]
    • Recovering items. Backtracking through \(M\) can be used to reconstruct one or more optimal sets of items. Work out the details on your own.

    Use this technique to solve the instance of 0-1 knapsack with values \([9, 14, 6, 7, 12, 8, 5]\), weights \([4, 6, 2, 3, 7, 5, 2]\), and \(W=15\). Report the optimal value and one corresponding set of items.
  9. When does 1D space fail for reconstruction? Briefly explain why the 1D space-optimized approach is fine for computing the value but insufficient for reconstructing items in general. Propose one additional data structure or technique that enables reconstruction without storing the full \(n \times (W+1)\) table.
  10. Pseudo-polynomial growth: Give an example of a 0-1 knapsack instance with \(n\) items for which the dynamic programming algorithm takes time exponential in the input size.
    Hint: Think about how you could choose the item weights and the capacity \(W\) so that the input can be written using only about \(n\) numbers or bits, yet the algorithm would still have to fill an enormous \(n \times (W+1)\) table, taking on the order of \(O(2^n)\) or \(O(n2^n)\) operations. What is the role of \(W\) in making the algorithm slow? Be as precise as possible in your reasoning. (Reminder: we call an algorithm exponential if its running time is exponential in the size of the input, not necessarily in \(n\) itself.)