Depth-First Search

Problem Solved

Depth-First Search (DFS) solves the Graph Traversal problem.

Design and Strategy

DFS explores a graph by diving as deep as possible along each branch before backtracking. It builds a traversal tree that reveals the structure of the graph, including which vertices are reachable from a given starting point and the order in which they are visited. This depth-first approach often uncovers recursive or hierarchical structures in the graph, making it well-suited for applications such as detecting cycles, exploring connected components, and performing topological sorts (in directed acyclic graphs).

Core Algorithm

The algorithm is typically implemented recursively using a helper function, though an iterative version using an explicit stack is also possible. The overall DFS process consists of two parts. The main function iterates through all vertices in the graph, starting a DFS traversal at each unexplored vertex. This ensures that the algorithm fully explores every component, even in disconnected graphs. The recursive helper function, when called on a vertex, explores all of its neighbors that have not yet been visited—diving deeper into the graph until it reaches a dead end, at which point it backtracks.

To understand the process more concretely, imagine starting at some vertex \( v \). We mark \( v \) as visited and record its discovery time. Then, for each of its neighbors \( u \), if \( u \) has not yet been visited, we record that \( u \)'s parent is \( v \) and recursively explore \( u \) in the same way. This exploration continues down a single path as far as possible. Once we reach a vertex whose neighbors are all visited, we record its finish time and backtrack to continue exploring other unexplored neighbors of its parent.

Here is the basic algorithm in pseudocode:

  1. dfs()
    1. Let n be the number of vertices in the graph.
    2. Create arrays of size n:
      • visited[0..n−1] = false
      • parent[0..n−1] = −1
      • discovery[0..n−1] = −1
      • finish[0..n−1] = −1
    3. Set time = 0
    4. For each vertex v from 0 to n−1:
      • If visited[v] = false, call dfsVisit(v)
  2. dfsVisit(v)
    1. Set visited[v] = true
    2. Increment time; set discovery[v] = time
    3. For each neighbor u of v:
      • If visited[u] = false:
        • Set parent[u] = v
        • Call dfsVisit(u)
    4. Increment time; set finish[v] = time

Timestamps and Their Purpose

As each vertex is visited, we record the time it was first discovered (called the discovery time) and the time it was fully explored (called the finish time). These timestamps are assigned using a single global counter and provide valuable insight into the traversal's structure.

These timestamps reflect the structure of the DFS traversal and help characterize the relationships between vertices—for example, whether one vertex is a descendant of another in the DFS tree. The timestamps allow us to reason about the order of visits and the relationship between vertices, and help classify edges into categories. In directed graphs, they are especially useful for identifying structural relationships like ancestry and edge directionality, which are key in classifying edges and analyzing connectivity.

Vertex States and Color Scheme

To better understand edge classification during DFS, it's helpful to think of vertices as having three possible states or "colors": white (unvisited), gray (discovered but not finished), and black (completely finished). This coloring scheme makes it easier to identify edge types as we encounter them. Note that the color array effectively replaces the simple boolean visited array, providing more detailed state information.

Based on these vertex states, DFS can classify edges into different types as it encounters them. In undirected graphs, there are only two types: tree edges (leading to unvisited vertices) and back edges (leading to ancestors, indicating cycles). Directed graphs have two additional types: forward edges (leading to descendants already finished) and cross edges (leading to vertices in separate subtrees or components). We'll explore each of these in detail after seeing the algorithm in action.

Here is the enhanced pseudocode that explicitly tracks vertex colors and classifies edge types during traversal:

  1. dfs()
    1. Let n be the number of vertices in the graph.
    2. Create arrays of size n:
      • color[0..n−1] = WHITE
      • parent[0..n−1] = −1
      • discovery[0..n−1] = −1
      • finish[0..n−1] = −1
    3. Set time = 0
    4. For each vertex v from 0 to n−1:
      • If color[v] = WHITE, call dfsVisit(v)
  2. dfsVisit(v)
    1. Set color[v] = GRAY
    2. Increment time; set discovery[v] = time
    3. For each neighbor u of v:
      • If color[u] = WHITE:
        • Set parent[u] = v
        • Edge \((v, u)\) is a tree edge
        • Call dfsVisit(u)
      • Else if color[u] = GRAY:
        • If parent[v] ≠ u: Edge \((v, u)\) is a back edge
      • Else if color[u] = BLACK:
        • Edge \((v, u)\) is a forward or cross edge
    4. Set color[v] = BLACK
    5. Increment time; set finish[v] = time

Note: The edge classification comments in the pseudocode above (the "else if" clauses) are not always included in basic DFS implementations—they are shown here to illustrate how edge types can be identified during traversal. Many DFS-based algorithms (such as cycle detection, topological sorting, or finding strongly connected components) include this classification logic to perform their specific tasks, while a simple graph traversal might omit these checks.

Important for Undirected Graphs: In undirected graphs, the back edge detection condition should exclude the parent vertex to avoid false cycle detection. When examining edge \((v, u)\) from vertex \(v\), if \(u\) is gray but \(u = \text{parent}[v]\), this represents the edge we just traversed to reach \(v\), not a back edge indicating a cycle.

Demonstration: Undirected Graphs

Let's first see DFS in action on an undirected graph. (The demo below uses blue/orange/green for white/gray/black because it's more visually appealing!)

Edge Types in Undirected Graphs

In undirected graphs, DFS encounters only two types of edges during traversal:

Note that forward and cross edges do not exist in undirected graphs because every edge \((u,v)\) appears as both \((u,v)\) and \((v,u)\). If an edge leads to a descendant or unrelated vertex, it would have been encountered as a tree edge when first discovered.

Directed Graphs: Added Complexity

Directed graphs introduce additional complexity in edge classification. While the core DFS algorithm remains the same, directed graphs can exhibit four different types of edges, making the color-based vertex states especially useful for understanding the traversal structure. Next we will see DFS in action on a directed graph.

All Four Edge Types in Directed Graphs

When DFS encounters an edge \((u, v)\) while exploring vertex \(u\) in a directed graph, the edge type can be determined immediately based on the current state of vertex \(v\):

Note that during traversal, we cannot immediately distinguish between forward and cross edges—both appear as edges to finished vertices. This distinction requires timestamp analysis after the traversal completes.

Post-Traversal Classification Using Timestamps

After DFS of a directe graph completes, we can precisely classify all edges using the discovery and finish times. For an edge \((u, v):\)

Applications of Edge Classification

The timestamps and edge relationships play a central role in many graph algorithms. The specific applications depend on whether the graph is directed or undirected:

Applications for Both Directed and Undirected Graphs

Applications Specific to Directed Graphs

Applications Specific to Undirected Graphs

Important note on cycles: In directed graphs, only back edges guarantee the presence of a cycle. Forward and cross edges do not indicate cycles—they simply represent different structural relationships in the graph. A directed graph is acyclic (a DAG) if and only if its DFS produces no back edges.

In short, discovery and finish times provide a timestamped record of the traversal that reveals the structural and temporal relationships between vertices—information that is crucial to many graph-based algorithms and analyses.

Summary: Edge Classification Table

The following table summarizes how DFS classifies edges based on the relationship between two vertices \( u \) and \( v \), where the edge \((u, v)\) is explored during the traversal of vertex \( u \). Note: Forward and cross edges only exist in directed graphs—undirected graphs have only tree and back edges.

Edge Type Graph Type Structural Meaning During Traversal After Traversal (Timestamps)
Tree Both \(v\) was first discovered via edge \((u, v)\)
\(v\) becomes child of \(u\
  • color[v] = WHITE
  • Check: parent[v] = u
Back Both \(v\) is an ancestor of \(u\) in DFS tree
  • color[v] = GRAY
  • Indicates cycle
  • Undirected: Must check parent[u] ≠ v
  • Timestamps: \(d[v] < d[u] < f[u] < f[v]\)
Forward Directed only \(v\) is a descendant of \(u\), but already finished
  • color[v] = BLACK
  • Cannot distinguish from cross edge yet
  • Timestamps: \(d[u] < d[v] < f[v] < f[u]\)
  • Verify: parent[v] ≠ u
Cross Directed only \(v\) is in another subtree or component (no DFS tree relationship)
  • color[v] = BLACK
  • Cannot distinguish from forward edge yet
  • Timestamps: \(f[v] < d[u]\)

Implementation in Java, C++, Python

Each implementation uses an adjacency list representation and tracks discovery and finish times. For simplicity, these implementations use a boolean visited array rather than the three-color scheme discussed earlier—the color approach is more useful when explicitly classifying edge types during traversal.

In these implementations, time is shown as a global variable for clarity and to match the pseudocode structure. In practice, you might prefer to encapsulate this within a class or pass it as a parameter to maintain better code organization, but the global approach keeps the focus on the algorithm itself.

The adjacency list representation varies by language:

int time = 0;

void dfs(List<Integer>[] adj, int n) {
    boolean[] visited = new boolean[n];
    int[] discovery = new int[n];
    int[] finish = new int[n];
    int[] parent = new int[n];
    Arrays.fill(parent, -1);
    time = 0;
    
    for (int v = 0; v < n; v++) {
        if (!visited[v]) {
            dfsVisit(v, adj, visited, discovery, finish, parent);
        }
    }
}

void dfsVisit(int v, List<Integer>[] adj, boolean[] visited,
              int[] discovery, int[] finish, int[] parent) {
    visited[v] = true;
    time = time + 1;
    discovery[v] = time;
    
    for (int u : adj[v]) {
        if (!visited[u]) {
            parent[u] = v;
            dfsVisit(u, adj, visited, discovery, finish, parent);
        }
    }
    
    time = time + 1;
    finish[v] = time;
}
int time = 0;

void dfs(vector<int> adj[], int n) {
    vector<bool> visited(n, false);
    vector<int> discovery(n), finish(n), parent(n, -1);
    time = 0;
    
    for (int v = 0; v < n; v++) {
        if (!visited[v]) {
            dfsVisit(v, adj, visited, discovery, finish, parent);
        }
    }
}

void dfsVisit(int v, vector<int> adj[], vector<bool>& visited,
              vector<int>& discovery, vector<int>& finish, vector<int>& parent) {
    visited[v] = true;
    time = time + 1;
    discovery[v] = time;
    
    for (int u : adj[v]) {
        if (!visited[u]) {
            parent[u] = v;
            dfsVisit(u, adj, visited, discovery, finish, parent);
        }
    }
    
    time = time + 1;
    finish[v] = time;
}
def dfs(adj, n):
    visited = [False] * n
    discovery = [0] * n
    finish = [0] * n
    parent = [-1] * n
    
    def dfsVisit(v):
        nonlocal time
        visited[v] = True
        time = time + 1
        discovery[v] = time
        
        for u in adj[v]:
            if not visited[u]:
                parent[u] = v
                dfsVisit(u)
        
        time = time + 1
        finish[v] = time

    time = 0
    for v in range(n):
        if not visited[v]:
            dfsVisit(v)

Time/Space Analysis

Time Complexity: Overall time complexity is \(O(V + E)\), broken down as follows:

Since we perform \(O(1)\) work for each vertex and edge, the total time complexity is \(O(V + E)\). This is optimal for graph traversal since we must examine every vertex and edge at least once to ensure complete coverage.

Space Complexity: Auxiliary space used is \(O(V)\), including:

The total auxiliary space complexity is \(O(V)\).

Comparison with BFS: Both DFS and BFS have the same time and space complexity (\(O(V + E)\) time, \(O(V)\) space), but their space usage patterns differ:

For very deep graphs, DFS might exceed stack limits, while for very wide graphs, BFS might use more queue space.

Variations/Improvements

Algorithmic Variations

Graph Analysis Applications

Specialized DFS Variants

Performance Optimizations

Extensions for Weighted Graphs

Implementation Note: Many of these variations require modifications to the basic DFS structure, such as additional data structures, different termination conditions, or specialized processing during vertex/edge discovery.

Reading Comprehension Questions

  1. Core Algorithm Strategy: What is the fundamental difference between how DFS and BFS explore a graph?
  2. Component Coverage: How does the outer loop in the main DFS function ensure that all vertices are visited, even in disconnected graphs?
  3. Discovery vs Finish Times: What events in the DFS algorithm correspond to recording a vertex's discovery time versus its finish time?
  4. Undirected vs Directed Complexity: Why do undirected graphs have only two edge types while directed graphs have four? What prevents forward and cross edges from existing in undirected graphs?
  5. Color-Based Classification: Why is the three-color scheme (white/gray/black) more useful than a simple visited boolean for edge classification during traversal?
  6. Edge Types and Cycles: In a directed graph, which edge type(s) guarantee the presence of a cycle, and why don't the other edge types indicate cycles?
  7. Timestamp Analysis: Given timestamps for vertices u and v, how would you determine if edge (u,v) is a forward edge versus a cross edge?
  8. Applications by Graph Type: Why are cross edges particularly important for directed graph algorithms like topological sorting, while back edges are crucial for cycle detection in both directed and undirected graphs?
  9. Space Complexity Trade-offs: Under what graph conditions might DFS be preferable to BFS in terms of space usage, and when might the opposite be true?

In-Class Activities

  1. Manual DFS Trace: Hand-trace DFS on a small graph (directed or undirected, maybe provided by your instructor), recording the discovery and finish times, as well as the order vertices are visited. Try at least one graph with cycles and observe how back edges are identified.
  2. Discovery vs Finish Times Analysis: Have two people each draw a graph on the board: one directed and one undirected, with 10-12 nodes. Run DFS and record discovery/finish times for all vertices. Mark each edge as tree, back, forward, or cross based on the colors and timestamps. Discuss how the edge types differ between directed and undirected graphs.
  3. Explore the Demo: Use the embedded DFS visualization on several graphs, including disconnected and cyclic ones. Focus on how the color changes (white→gray→black) correspond to vertex states, and predict which edges will be classified as tree vs back edges.
  4. DFS vs BFS Comparison: Perform both DFS and BFS from the same source vertex on the same graph. Compare and contrast the order in which nodes are visited and the resulting tree structures. Discuss which algorithm is better suited for different tasks like cycle detection vs shortest paths.
  5. Edge Classification Practice: Run the DFS demo on a directed graph with 8-10 vertices and observe how edges are classified during traversal. For each edge type (tree, back, forward, cross), identify at least one example and explain why it has that classification based on vertex colors and timestamps.
  6. Directed Graph Edge Analysis: Run DFS on a directed graph and classify all edges as tree, back, forward, or cross edges. Use both the color-based method (during traversal) and timestamp analysis (after completion) to verify your classifications match.
  7. Cycle Detection Modification: Modify DFS to detect cycles in both directed and undirected graphs. Give clear pseudocode for each case and explain the difference in detection methods. Test your algorithms on graphs with and without cycles.
  8. Connected Components with DFS: Use repeated DFS runs (starting from each unvisited vertex) to identify and label the connected components of a disconnected graph. Give clear pseudocode for your algorithm and determine its computational complexity. Compare this approach to using BFS for the same task.
  9. Topological Sorting Application: Use DFS on a directed acyclic graph (DAG) to produce a topological ordering by sorting vertices in decreasing finish time order. Verify that your ordering satisfies the topological property: for every directed edge (u,v), vertex u appears before v in the ordering.
  10. Tree Traversal Connection: Apply DFS to a tree structure and observe how the discovery times correspond to preorder traversal while finish times correspond to postorder traversal. Explain why DFS naturally produces these classic tree traversal orders.
  11. Iterative vs Recursive DFS: Implement both recursive DFS (using the call stack) and iterative DFS (using an explicit stack). Compare their behavior on the same graph and discuss when you might prefer one approach over the other, especially considering stack overflow issues for deep graphs.

Homework Problems

  1. DFS Tree Construction with Timestamps: Consider the graph shown below. Run DFS starting from vertex A (Process all vertex lists in alphabetic order). For each vertex v, record its discovery time d[v], finish time f[v], and parent parent[v] in the resulting DFS tree (use None or - for the root). Submit both your completed arrays and a drawing of the DFS tree that shows how the vertices are connected.
    A B C D E F G H
  2. Edge Classification in Directed Graphs: Using the DFS demo on a directed graph with 8-10 vertices, complete a full edge classification analysis.
    1. Include the graph and timestamp table in your homework (take a screenshot or redraw it).
    2. For each edge in the graph, classify it as tree, back, forward, or cross using both the color-based method (during traversal) and timestamp analysis (after completion).
    3. Verify that your classifications are consistent between the two methods.
    4. Identify any cycles in the graph and explain how the back edges reveal them.
  3. Parenthesis Structure Verification: The parenthesis structure property states that for any two vertices u and v in a DFS traversal, their time intervals [d[u], f[u]] and [d[v], f[v]] have one of two relationships: they are either completely disjoint (non-overlapping) or one is completely nested within the other. This creates a structure similar to properly matched parentheses, where "((" must be followed by "))" rather than ")(".

    Using the DFS tree from Problem 1, verify the parenthesis structure property for all pairs of vertices.
    1. For each pair of vertices u and v, determine whether their time intervals [d[u], f[u]] and [d[v], f[v]] are disjoint or nested.
    2. When intervals are nested, identify the ancestor-descendant relationship and verify it matches the DFS tree structure.
    3. Explain why disjoint intervals indicate that neither vertex is an ancestor of the other.
  4. Compare DFS and BFS Tree Structures: Use the graph below and simulate both DFS and BFS starting from vertex A (Process all vertex lists in alphabetic order). For each traversal, construct the resulting tree and discuss the differences.

    A B C D E F G H I J K L
  5. Cycle Detection Using DFS: Modify the DFS algorithm to detect cycles in both directed and undirected graphs.
    1. Write clear pseudocode for cycle detection in undirected graphs using DFS. Explain how back edges indicate cycles.
    2. Write clear pseudocode for directed cycle detection in directed graphs using DFS. Explain why only back edges (not forward or cross edges) guarantee cycles.
    3. Test both algorithms on small examples: create graphs with and without cycles for each case.
    4. Analyze the time complexity of your modified algorithms.
  6. Topological Sorting with DFS: Use DFS to implement topological sorting for directed acyclic graphs (DAGs).
    1. Write pseudocode for DFS-based topological sorting using finish times.
    2. Apply your algorithm to a small DAG representing task dependencies (6-8 tasks).
    3. Verify that your topological order satisfies the property: for every directed edge (u,v), vertex u appears before v in the ordering.
    4. Explain what happens if you try to run your algorithm on a directed graph with cycles.
  7. Connected Components with DFS: Use repeated DFS runs to identify connected components in disconnected graphs.
    1. Write clear pseudocode for your algorithm that labels each vertex with its component number.
    2. Apply your algorithm to a disconnected graph with 3-4 components (10-12 vertices total).
    3. Determine the computational complexity of your algorithm in terms of V and E.
    4. Compare the efficiency of using DFS vs BFS for this task.
  8. Iterative DFS Implementation: Implement DFS using an explicit stack instead of recursion to avoid stack overflow issues.
    1. Write pseudocode or real code for iterative DFS that produces the same traversal order as recursive DFS.
    2. Test your implementation on a small graph and compare the visit order with recursive DFS.
    3. Explain when you might prefer the iterative version over the recursive version.
    4. Discuss any differences in space complexity between the two approaches.
  9. DFS Tree Traversal Applications: Explore the connection between DFS and classic tree traversals.
    1. Apply DFS to a tree structure and show how discovery times correspond to preorder traversal.
    2. Show how finish times correspond to postorder traversal of the same tree.
    3. Write pseudocode for extracting preorder and postorder sequences from DFS timestamps.
    4. Explain why DFS naturally produces these traversal orders and discuss practical applications.
  10. Strongly Connected Components (Advanced): Research and implement one of the classic SCC algorithms that uses DFS as a foundation.
    1. Choose either Kosaraju's algorithm or Tarjan's algorithm and write detailed pseudocode.
    2. Apply your chosen algorithm to a small directed graph (8-10 vertices) with multiple SCCs.
    3. Verify your results by checking that vertices in the same SCC can reach each other.
    4. Analyze the time complexity and explain why DFS is essential to the algorithm's correctness.