Depth-First Search (DFS) solves the Graph Traversal problem.
DFS explores a graph by diving as deep as possible along each branch before backtracking. It builds a traversal tree that reveals the structure of the graph, including which vertices are reachable from a given starting point and the order in which they are visited. This depth-first approach often uncovers recursive or hierarchical structures in the graph, making it well-suited for applications such as detecting cycles, exploring connected components, and performing topological sorts (in directed acyclic graphs).
The algorithm is typically implemented recursively using a helper function, though an iterative version using an explicit stack is also possible. The overall DFS process consists of two parts. The main function iterates through all vertices in the graph, starting a DFS traversal at each unexplored vertex. This ensures that the algorithm fully explores every component, even in disconnected graphs. The recursive helper function, when called on a vertex, explores all of its neighbors that have not yet been visited—diving deeper into the graph until it reaches a dead end, at which point it backtracks.
To understand the process more concretely, imagine starting at some vertex \( v \). We mark \( v \) as visited and record its discovery time. Then, for each of its neighbors \( u \), if \( u \) has not yet been visited, we record that \( u \)'s parent is \( v \) and recursively explore \( u \) in the same way. This exploration continues down a single path as far as possible. Once we reach a vertex whose neighbors are all visited, we record its finish time and backtrack to continue exploring other unexplored neighbors of its parent.
Here is the basic algorithm in pseudocode:
dfs()
n
be the number of vertices in the graph.n
:
visited[0..n−1] = false
parent[0..n−1] = −1
discovery[0..n−1] = −1
finish[0..n−1] = −1
time = 0
v
from 0
to n−1
:
visited[v] = false
, call dfsVisit(v)
dfsVisit(v)
visited[v] = true
time
; set discovery[v] = time
u
of v
:
visited[u] = false
:
parent[u] = v
dfsVisit(u)
time
; set finish[v] = time
As each vertex is visited, we record the time it was first discovered (called the discovery time) and the time it was fully explored (called the finish time). These timestamps are assigned using a single global counter and provide valuable insight into the traversal's structure.
These timestamps reflect the structure of the DFS traversal and help characterize the relationships between vertices—for example, whether one vertex is a descendant of another in the DFS tree. The timestamps allow us to reason about the order of visits and the relationship between vertices, and help classify edges into categories. In directed graphs, they are especially useful for identifying structural relationships like ancestry and edge directionality, which are key in classifying edges and analyzing connectivity.
To better understand edge classification during DFS, it's helpful to think of vertices as having three possible states or "colors": white (unvisited), gray (discovered but not finished), and black (completely finished). This coloring scheme makes it easier to identify edge types as we encounter them. Note that the color array effectively replaces the simple boolean visited array, providing more detailed state information.
Based on these vertex states, DFS can classify edges into different types as it encounters them. In undirected graphs, there are only two types: tree edges (leading to unvisited vertices) and back edges (leading to ancestors, indicating cycles). Directed graphs have two additional types: forward edges (leading to descendants already finished) and cross edges (leading to vertices in separate subtrees or components). We'll explore each of these in detail after seeing the algorithm in action.
Here is the enhanced pseudocode that explicitly tracks vertex colors and classifies edge types during traversal:
dfs()
n
be the number of vertices in the graph.n
:
color[0..n−1] = WHITE
parent[0..n−1] = −1
discovery[0..n−1] = −1
finish[0..n−1] = −1
time = 0
v
from 0
to n−1
:
color[v] = WHITE
, call dfsVisit(v)
dfsVisit(v)
color[v] = GRAY
time
; set discovery[v] = time
u
of v
:
color[u] = WHITE
:
parent[u] = v
dfsVisit(u)
color[u] = GRAY
:
parent[v] ≠ u
: Edge \((v, u)\) is a back edgecolor[u] = BLACK
:
color[v] = BLACK
time
; set finish[v] = time
Note: The edge classification comments in the pseudocode above (the "else if" clauses) are not always included in basic DFS implementations—they are shown here to illustrate how edge types can be identified during traversal. Many DFS-based algorithms (such as cycle detection, topological sorting, or finding strongly connected components) include this classification logic to perform their specific tasks, while a simple graph traversal might omit these checks.
Important for Undirected Graphs: In undirected graphs, the back edge detection condition should exclude the parent vertex to avoid false cycle detection. When examining edge \((v, u)\) from vertex \(v\), if \(u\) is gray but \(u = \text{parent}[v]\), this represents the edge we just traversed to reach \(v\), not a back edge indicating a cycle.
Let's first see DFS in action on an undirected graph. (The demo below uses blue/orange/green for white/gray/black because it's more visually appealing!)
In undirected graphs, DFS encounters only two types of edges during traversal:
parent
array.Note that forward and cross edges do not exist in undirected graphs because every edge \((u,v)\) appears as both \((u,v)\) and \((v,u)\). If an edge leads to a descendant or unrelated vertex, it would have been encountered as a tree edge when first discovered.
Directed graphs introduce additional complexity in edge classification. While the core DFS algorithm remains the same, directed graphs can exhibit four different types of edges, making the color-based vertex states especially useful for understanding the traversal structure. Next we will see DFS in action on a directed graph.
When DFS encounters an edge \((u, v)\) while exploring vertex \(u\) in a directed graph, the edge type can be determined immediately based on the current state of vertex \(v\):
parent[v] = u
and continue recursion.Note that during traversal, we cannot immediately distinguish between forward and cross edges—both appear as edges to finished vertices. This distinction requires timestamp analysis after the traversal completes.
After DFS of a directe graph completes, we can precisely classify all edges using the discovery and finish times. For an edge \((u, v):\)
The timestamps and edge relationships play a central role in many graph algorithms. The specific applications depend on whether the graph is directed or undirected:
Important note on cycles: In directed graphs, only back edges guarantee the presence of a cycle. Forward and cross edges do not indicate cycles—they simply represent different structural relationships in the graph. A directed graph is acyclic (a DAG) if and only if its DFS produces no back edges.
In short, discovery and finish times provide a timestamped record of the traversal that reveals the structural and temporal relationships between vertices—information that is crucial to many graph-based algorithms and analyses.
The following table summarizes how DFS classifies edges based on the relationship between two vertices \( u \) and \( v \), where the edge \((u, v)\) is explored during the traversal of vertex \( u \). Note: Forward and cross edges only exist in directed graphs—undirected graphs have only tree and back edges.
Edge Type | Graph Type | Structural Meaning | During Traversal | After Traversal (Timestamps) |
---|---|---|---|---|
Tree | Both | \(v\) was first discovered via edge \((u, v)\) \(v\) becomes child of \(u\ |
|
|
Back | Both | \(v\) is an ancestor of \(u\) in DFS tree |
|
|
Forward | Directed only | \(v\) is a descendant of \(u\), but already finished |
|
|
Cross | Directed only | \(v\) is in another subtree or component (no DFS tree relationship) |
|
|
Each implementation uses an adjacency list representation and tracks discovery and finish times. For simplicity, these implementations use a boolean visited array rather than the three-color scheme discussed earlier—the color approach is more useful when explicitly classifying edge types during traversal.
In these implementations, time
is shown as a global variable for clarity and to match the pseudocode structure. In practice, you might prefer to encapsulate this within a class or pass it as a parameter to maintain better code organization, but the global approach keeps the focus on the algorithm itself.
The adjacency list representation varies by language:
List<Integer>
objectsvector<int>
objectsadj[0] = [1, 2]
means vertex 0 connects to vertices 1 and 2)int time = 0;
void dfs(List<Integer>[] adj, int n) {
boolean[] visited = new boolean[n];
int[] discovery = new int[n];
int[] finish = new int[n];
int[] parent = new int[n];
Arrays.fill(parent, -1);
time = 0;
for (int v = 0; v < n; v++) {
if (!visited[v]) {
dfsVisit(v, adj, visited, discovery, finish, parent);
}
}
}
void dfsVisit(int v, List<Integer>[] adj, boolean[] visited,
int[] discovery, int[] finish, int[] parent) {
visited[v] = true;
time = time + 1;
discovery[v] = time;
for (int u : adj[v]) {
if (!visited[u]) {
parent[u] = v;
dfsVisit(u, adj, visited, discovery, finish, parent);
}
}
time = time + 1;
finish[v] = time;
}
int time = 0;
void dfs(vector<int> adj[], int n) {
vector<bool> visited(n, false);
vector<int> discovery(n), finish(n), parent(n, -1);
time = 0;
for (int v = 0; v < n; v++) {
if (!visited[v]) {
dfsVisit(v, adj, visited, discovery, finish, parent);
}
}
}
void dfsVisit(int v, vector<int> adj[], vector<bool>& visited,
vector<int>& discovery, vector<int>& finish, vector<int>& parent) {
visited[v] = true;
time = time + 1;
discovery[v] = time;
for (int u : adj[v]) {
if (!visited[u]) {
parent[u] = v;
dfsVisit(u, adj, visited, discovery, finish, parent);
}
}
time = time + 1;
finish[v] = time;
}
def dfs(adj, n):
visited = [False] * n
discovery = [0] * n
finish = [0] * n
parent = [-1] * n
def dfsVisit(v):
nonlocal time
visited[v] = True
time = time + 1
discovery[v] = time
for u in adj[v]:
if not visited[u]:
parent[u] = v
dfsVisit(u)
time = time + 1
finish[v] = time
time = 0
for v in range(n):
if not visited[v]:
dfsVisit(v)
Time Complexity: Overall time complexity is \(O(V + E)\), broken down as follows:
Since we perform \(O(1)\) work for each vertex and edge, the total time complexity is \(O(V + E)\). This is optimal for graph traversal since we must examine every vertex and edge at least once to ensure complete coverage.
Space Complexity: Auxiliary space used is \(O(V)\), including:
visited
, discovery
, finish
, and parent
arrays each require \(O(V)\) space.The total auxiliary space complexity is \(O(V)\).
Comparison with BFS: Both DFS and BFS have the same time and space complexity (\(O(V + E)\) time, \(O(V)\) space), but their space usage patterns differ:
For very deep graphs, DFS might exceed stack limits, while for very wide graphs, BFS might use more queue space.
Implementation Note: Many of these variations require modifications to the basic DFS structure, such as additional data structures, different termination conditions, or specialized processing during vertex/edge discovery.
A
(Process all vertex lists in alphabetic order).
For each vertex v
, record its discovery time d[v]
, finish time f[v]
, and parent parent[v]
in the resulting DFS tree (use None
or -
for the root).
Submit both your completed arrays and a drawing of the DFS tree that shows how the vertices are connected.
A
(Process all vertex lists in alphabetic order).
For each traversal, construct the resulting tree and discuss the differences.