Breadth-First Search

Problem Solved

Breadth-First Search solves the Graph Traversal problem. Although the algorithm can work for both directed and undirected graphs, we will focus our attention on undirected graphs, leaving it to the reader to determine what changes when the graph is directed.

Design and Strategy

Breadth-First Search (BFS) is a fundamental graph traversal algorithm whose goal is to explore every vertex reachable from a given source vertex in the most systematic way possible. Specifically, it discovers each reachable node and determines a shortest path to it—measured in the fewest number of edges—in an unweighted graph. BFS is especially useful when we want to answer questions like: “Which vertices can be reached from here?” or “What is the shortest path to each one?”

The key idea behind BFS is to explore the graph in layers. It begins at the source and visits all neighboring vertices (distance 1), then all vertices that are two edges away (distance 2), and so on. This ensures that the first time a vertex is discovered, it is along a path with the fewest possible edges from the source.

To manage this layer-by-layer process, BFS uses a queue to hold vertices that have been discovered but not yet fully explored. At each step, the algorithm dequeues a vertex, examines its neighbors, and enqueues any that haven’t been visited yet. To prevent revisiting the same vertex (which is especially important in graphs with cycles), BFS marks each vertex as visited the moment it is enqueued.

As a result, BFS explores every node reachable from the source exactly once, and computes the shortest-path distance from the source to each of them. If desired, it can also be extended to record the full path taken to reach each vertex.

Examples

Example 1: BFS on an Acyclic Graph

On the graph below, a BFS from source \( A \) visits nodes in the order: \( A \) (distance 0), then \( B \), \( C \) (distance 1), and finally \( D \), \( E \) (distance 2).

A B C D E
Example 2: BFS on a Cyclic Graph

In this graph, BFS from \( A \) visits \( A \) (distance 0), then \( B \) and \( C \) (distance 1), followed by \( D \) and \( E \) (distance 2). When visiting \( B \), the algorithm examines neighbors \( A \), \( C \), and \( E \). Since \( A \) and \( C \) have already been discovered, only \( E \) is enqueued. Similarly, when visiting \( C \), it checks \( A \), \( B \), and \( D \), and enqueues only \( D \). This behavior—skipping neighbors that were already visited—ensures that each vertex is discovered exactly once and at the earliest possible moment, which guarantees that its recorded distance from the source is correct.

A B C D E
Example 3: BFS on a Disconnected Graph

In this graph, a BFS starting at \( A \) explores only the left component: \( A \) (distance 0), \( B \), \( C \) (distance 1), \( D \), \( E \) (distance 2). Nodes \( F \) and \( G \) are not reachable from \( A \), so they are not discovered.

A B C D E F G

Breadth-First Search not only explores all vertices reachable from a source vertex \( s \), but also computes the shortest distance (in number of edges) from \( s \) to every other reachable vertex. To support this, the algorithm maintains a dist array (or map), where dist[v] records the number of edges from \( s \) to vertex \( v \). Optionally, BFS can also maintain a pred array (or map), where pred[v] stores the predecessor of \( v \) on the shortest path from \( s \), which allows reconstructing the full path later.

Given a source vertex s, BFS proceeds as follows:

  1. Initialize all vertices as unvisited.
  2. Initialize arrays dist[] and pred[] with default values.
  3. Mark s as visited, set dist[s] = 0, pred[s] = null, and enqueue s.
  4. While the queue is not empty:
    1. Dequeue vertex u.
    2. For each neighbor v of u:
      1. If v is unvisited:
        1. Mark v as visited.
        2. Set dist[v] = dist[u] + 1.
        3. Set pred[v] = u.
        4. Enqueue v.

This procedure visits vertices in increasing order of their distance from s. The interactive demo below illustrates BFS on a sample graph, showing the layer-by-layer exploration. In the demo, vertex color indicates visit status: blue nodes are unvisited, orange nodes have been discovered but not fully processed, and green nodes have had all their neighbors explored (traditionally the colors used are white, gray, and black). While the pseudocode uses a simple boolean visited array, implementations like the demo may use multiple states to track the progress of each vertex more precisely.

This three-color model (unvisited, discovered, processed) is not only helpful for visualization — it plays an essential role in many algorithms built on top of BFS. For example, detecting whether a graph is bipartite involves assigning alternating colors to layers and checking for cross-layer conflicts, and certain cycle detection algorithms rely on distinguishing whether a neighbor has been fully processed or is still in the queue. While these ideas can be implemented using additional logic with a boolean visited array, the color/state distinction provides a more expressive and modular framework for reasoning about vertex status during traversal.

Implementation in Java, C++, Python

To implement BFS effectively, you need several supporting data structures: a visited[] array (or equivalent state labels) to track which vertices have been discovered, a queue to manage the frontier of the search, a dist[] array to record the shortest distance from the source to each vertex, and optionally a pred[] array to store the predecessor of each vertex on its shortest path. The dist array is updated as soon as a vertex is first discovered, guaranteeing that it reflects the correct shortest-path length, and the pred array can be used after traversal to reconstruct the shortest path by walking backward from any vertex to the source.

Each implementation below uses idiomatic data structures for its language:

These implementations do not include predecessor tracking or multi-state coloring, since the basic version of BFS does not require them for computing distances alone.

void bfs(List<Integer>[] graph, int s, int[] dist) {
    int n = graph.length;
    boolean[] visited = new boolean[n];
    Arrays.fill(dist, Integer.MAX_VALUE);
    Queue<Integer> q = new LinkedList<>();

    visited[s] = true;
    dist[s] = 0;
    q.add(s);

    while (!q.isEmpty()) {
        int u = q.poll();
        for (int v : graph[u]) {
            if (!visited[v]) {
                visited[v] = true;
                dist[v] = dist[u] + 1;
                q.add(v);
            }
        }
    }
}
void bfs(const vector<vector<int>>& graph, int s, vector<int>& dist) {
    int n = graph.size();
    vector<bool> visited(n, false);
    dist.assign(n, INT_MAX);
    queue<int> q;

    visited[s] = true;
    dist[s] = 0;
    q.push(s);

    while (!q.empty()) {
        int u = q.front();
        q.pop();
        for (int v : graph[u]) {
            if (!visited[v]) {
                visited[v] = true;
                dist[v] = dist[u] + 1;
                q.push(v);
            }
        }
    }
}
def bfs(graph, s):
    n = len(graph)
    visited = [False] * n
    dist = [float('inf')] * n
    q = deque()

    visited[s] = True
    dist[s] = 0
    q.append(s)

    while q:
        u = q.popleft()
        for v in graph[u]:
            if not visited[v]:
                visited[v] = True
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

Time/Space Analysis

Time Complexity: Breadth-First Search runs in \( O(V + E) \) time when using an adjacency list representation. The outer while loop runs as long as the queue is nonempty, and since each vertex is enqueued at most once, it executes at most \( V \) times. However, we cannot simply multiply this by the size of the inner for loop to get the total work—doing so would overcount, leading to an overestimate of \( O(V \cdot d) \), where \( d \) is the maximum degree.

Instead, we analyze more carefully:

Thus, the total time is proportional to the number of vertices plus the number of adjacency-list entries: \( O(V + E) \).

Space Complexity: BFS uses \( O(V) \) additional space beyond the graph itself:

The graph is stored as an adjacency list, which occupies \( O(V + E) \) space and is considered part of the input. So the total extra space used by BFS is \( O(V) \).

Edge Classifications After BFS

Once BFS has completed, we can classify each edge in the graph based on how it relates to the traversal tree that was constructed:

These classifications are useful in applications such as detecting cycles, checking graph structure, or understanding redundancy in connections. In BFS, edges are examined when their endpoints are discovered during traversal, so classifications can be determined from the visited and pred arrays after the search. Note that edges in components not reachable from the source will not be explored at all.

Variations and Improvements

Reading Comprehension Questions

  1. Discovery Order: Why does BFS always discover each vertex using a shortest path (fewest number of edges)?
  2. Queue Role: How does the queue ensure that vertices are explored in increasing distance from the source?
  3. Color/Visited Status: Why do we mark a vertex as visited when it is enqueued rather than when it is dequeued?
  4. Edge Handling: How many times is each edge examined during BFS on an undirected graph, and why?
  5. Disconnected Graphs: What happens if the graph is disconnected and BFS is run from a single source?
  6. State Coloring: What is the purpose of using three colors (or states) instead of a simple visited flag?
  7. Distance Array: What is stored in dist[v] and when is it updated?
  8. Time Complexity: Why is the time complexity of BFS \( O(V + E) \) instead of something like \( O(V \cdot d) \)?

In-Class Activities

  1. Manual BFS Trace: Hand-trace BFS on a small graph (directed or undirected, maybe provided by your instructor), recording the discovery order, queue contents, and distance values at each step. Try at least one graph with cycles.
  2. Distance Identification: Have two people each draw a graph on the board: one directed and one undirected, with 6-10 nodes, and more than just a few edges. Pick a starting vertex for each. Label all reachable vertices with their distance from the source as computed by BFS. Discuss the differences in directed vs. undirected graphs.
  3. Explore the Demo: Use the embedded BFS visualization on several graphs, including disconnected and cyclic ones. Predict the next queue contents before each step and verify during playback.
  4. BFS vs. DFS: Perform both BFS and DFS from the same source vertex on the same graph. Compare and contrast the order in which nodes are visited. Discuss which algorithm is better suited for tasks like shortest path or cycle detection.
  5. Using pred[] to Reconstruct a Path: Run the demo for a graph with 9 or 10 nodes. Then for each vertex, use the pred[] array (called just p in the demo) to construct a shortest path from the source to the given vertex. Does each have the correct number of edges? When you are convinced you understand how it is done, give code/pseudocode for an algorithm to do this in general.
  6. Directed Graph Reachability: Run BFS on a directed graph and observe how only outward paths from the source are followed. Discuss what changes in the algorithm and whether time/space complexity changes. Also discuss: If there is and edge from \(A\) to \(B\), does that guarantee that there is a path from \(B\) to \(A\)?
  7. Bipartiteness Test: Modify BFS to determine whether or not a graph is bipartite. Give clear pseudocode or clearly describe how to modify BFS. Test your algorithm on both bipartite and non-bipartite graphs. Give the computational complexity of your modified algorithm. Hint: Use colors.
  8. Cycle Detection (Undirected Graph): Modify BFS to determine whether or not a graph is acyclic. Give clear pseudocode or clearly describe how to modify BFS. Give the computational complexity of your modified algorithm.
  9. Connected Components: Use repeated BFS runs (starting from each unvisited vertex) to identify and label the connected components of a disconnected graph. Try to visualize each component separately. Give clear pseudocode for your algorithm, including describing how you need to modify BFS (if necessary). Give the computational complexity of your algorithm.
  10. Shortest Path in a Grid Maze: Model a small maze as a grid graph. Mark walls as missing edges, and use BFS on paper to find the shortest path from entrance to exit. Record the discovered path using pred[].

Homework Problems

  1. BFS Tree Construction: Consider the graph shown below. Run BFS starting from vertex A. For each vertex v, record its parent pred[v] in the resulting BFS tree (use None or - for the root). Submit both your completed pred[] array and a drawing of the BFS tree that shows how the vertices are connected.
    A B C D E F G H
  2. Distance Labels: Using the BFS tree from Problem 1, list the distance (number of edges from the root) for each vertex. Explain how these distances relate to the BFS traversal order.
  3. Disconnected Graph Strategy: Given a graph with multiple components, describe and implement a BFS-based strategy to ensure every vertex is visited. How do you know when to restart BFS on a new component? Give clear pseudocode, and give te computational complexity of your algorithm.
  4. Compare BFS and DFS: Use the graph below and simulate both BFS and DFS starting from vertex A. Assume adjacency lists are stored in alphabetical order. For BFS, list the shortest distance to A for each vertex and clearly indicate the tree edges. For DFS, list the timestamps and clearly indicate the tree edges. How do the traversal patterns differ, and why?

    A B C D E F G H I J K L
  5. Shortest Path Reconstruction Using pred[]: Use the BFS demo on a graph with 9–10 vertices and observe how the p array (which stores predecessors) is updated during the traversal.
    1. Include the graph and table in your homework (take a screenshot or redraw it).
    2. For at least five different vertices (reachable from the source), use the values in p[] to manually reconstruct the path from the source to that vertex. Write out each path as a sequence of vertices.
    3. Confirm that the number of edges in each path matches the corresponding value in the dist[] array shown in the demo.
    4. Once you're confident in how the paths are formed, write pseudocode (or real code) for an algorithm that takes the source vertex \(s\), the pred[] array, and a target vertex \( t \), and returns the path from the \(s\) to \( t \) as a list of vertices in order from \(s\) to \(t\).
  6. Testing Bipartiteness with BFS: Modify the BFS algorithm to test whether a graph is bipartite (i.e., 2-colorable with no adjacent vertices sharing the same color).
    1. Write clear pseudocode or describe precisely how to add coloring logic to the BFS traversal.
    2. Implement and test your algorithm on at least one bipartite and one non-bipartite graph. Clearly indicate which are which and justify your answer.
    3. What is the time complexity of your modified algorithm? Justify it in terms of \( V \) and \( E \).
  7. Cycle Detection in Undirected Graphs with BFS: Modify the BFS algorithm to detect whether an undirected graph contains a cycle.
    1. Write clear pseudocode or describe precisely how to extend BFS to detect a cycle during traversal.
    2. Explain the condition that signals a cycle when examining neighbors of a vertex. Why is it important to keep track of each vertex’s predecessor?
    3. Apply your modified algorithm to two small graphs: one that contains a cycle and one that does not. Show when and how the cycle is (or is not) detected.
    4. Analyze the time complexity of your algorithm and explain why it is correct.
  8. Shortest Path in a Grid with Obstacles: Suppose you are given an \( n \times m \) grid, where each cell is either empty or blocked (represented by 0 and 1, respectively). Use BFS to compute the shortest path from a given source cell to a destination cell, moving only up/down/left/right into empty cells.
    1. Explain the idea behind your algorithm.
    2. Write pseudocode (or real code) for your modified BFS that handles grid boundaries and obstacles.
    3. Apply your code to the following grid and give the shortest path length or indicate that no path exists. Show the shortest path length at each cell in the grid!
      S
      G
    4. Explain the time and space complexity of your algorithm in terms of \( n \) and \( m \).
  9. All Vertices at Distance \( k \): Modify BFS so that it returns all vertices that are exactly \( k \) edges away from a given source vertex.
    1. Give clear pseudocode or code for your modification.
    2. Test your algorithm on a small graph (7–10 vertices), and report the vertices found at distance \( k \) for a few different values of \( k \).
    3. Discuss the runtime and what happens if \( k \) is larger than the graph's diameter.
  10. Even-Length Cycle Detection via BFS: Suppose you are given an undirected graph. Modify BFS to detect whether it contains an even-length cycle.
    1. Describe how BFS can be used to assign each vertex a parity level (even/odd distance from the source).
    2. Explain what condition during traversal indicates the presence of an even-length cycle.
    3. Explain what else needs to change in BFS to complete the algorithm.
    4. Test your method on two graphs: one with an even-length cycle and one without. Show what your algorithm reports.
    5. Provide a brief explanation of your algorithm's time complexity.