String Matching (Brute Force)

Problem Solved

Here we present the Brute Force algorithm that solves the Exact String Matching problem.

Design and Strategy

The brute force approach to string matching is the most direct and intuitive method for finding a pattern \( P \) of length \( m \) within a text \( T \) of length \( n \). It systematically checks every possible starting position in the text where the pattern could begin and compares characters one-by-one.

At each shift position \( i \), the algorithm checks whether \( T[i..i+m-1] = P[0..m-1] \). That is, if characters \(i\) through \(i+m-1\) in the text all match, character by character, the characters of the pattern. If all characters match, then a match is found. Otherwise, it shifts the pattern by one position and tries again.

Here is a formal description of the algorithm:

  1. Let \( n \) be the length of text \( T \), and \( m \) be the length of pattern \( P \).
  2. For \( i = 0 \) to \( n - m \):
    1. For \( j = 0 \) to \( m - 1 \):
      • If \( T[i + j] \neq P[j] \), break and try the next \( i \).
    2. If inner loop completed without break, report match at position \( i \).
Example:

Let \( T = \text{ABCADABCAAB} \) (length \( n = 11 \)) and \( P = \text{ABC} \) (length \( m = 3 \)).
The algorithm checks each possible alignment of the pattern within the text, from position \( i = 0 \) to \( i = n - m = 8 \).
At each position \( i \), it compares \( P[0] \) to \( T[i] \), then \( P[1] \) to \( T[i+1] \), and so on, stopping early if a mismatch is found.

Green characters indicate matches.
Red indicates the first mismatch (which stops further comparisons).
Black characters are not checked due to early termination.

Matches occur at positions \( i = 0 \) and \( i = 5 \).
Text:  A B C A D A B C A A B
i=0:   A B C
i=1:     A B C
i=2:       A B C
i=3:         A B C
i=4:           A B C
i=5:             A B C
i=6:               A B C
i=7:                 A B C
i=8:                   A B C

The total number of character comparisons for this example is 15.

This algorithm exemplifies the brute force technique because it tries every possibility without using any preprocessing or insight about the pattern or text to eliminate unnecessary comparisons.

Try the following demo with several different patterns and texts to make sure you understand exactly what the algorithm is doing.

Implementation in Java, C++, Python

This is the straightforward version of brute-force string matching in three languages. The Java version assumes the text and pattern are stored as Strings, C++ is using std:string, and Python is using the str type. All three implementations find all matches.

void bruteForceMatch(String text, String pattern) {
    int n = text.length();
    int m = pattern.length();
    for (int i = 0; i <= n - m; i++) {
        int j = 0;
        while (j < m && text.charAt(i + j) == pattern.charAt(j)) {
            j++;
        }
        if (j == m) {
            System.out.println("Match at index " + i);
        }
    }
}
void bruteForceMatch(string text, string pattern) {
    int n = text.size();
    int m = pattern.size();
    for (int i = 0; i <= n - m; i++) {
        int j = 0;
        while (j < m && text[i + j] == pattern[j]) {
            j++;
        }
        if (j == m) {
            cout << "Match at index " << i << endl;
        }
    }
}
def brute_force_match(text, pattern):
    n, m = len(text), len(pattern)
    for i in range(n - m + 1):
        j = 0
        while j < m and text[i + j] == pattern[j]:
            j += 1
        if j == m:
            print(f"Match at index {i}")

Time/Space Analysis

Time Complexity: In the worst case, for each of the \( n - m + 1 \) positions, we may perform up to \( m \) comparisons. Therefore, the worst-case time is \( O((n - m + 1)m) = O(nm) \). This occurs, for example, when both the pattern and text contain repeated characters and nearly match at each position.

The best case occurs when an early mismatch happens at each position, leading to \( O(n) \) total comparisons.

Space Complexity: The algorithm uses \( O(1) \) extra space. It does not use any auxiliary arrays or recursion.

Variations/Improvements

Reading Comprehension Questions

  1. Algorithm Goal: What is the brute force string matching algorithm trying to do at each position?
  2. Character Comparisons: What is the role of the inner loop?
  3. Loop Boundaries: Why does the outer loop go from \( i = 0 \) to \( i = n - m \)?
  4. Complexity: What are the best-case and worst-case time complexities of this algorithm, and when do they occur?
  5. Alternatives: What are some improvements or alternatives to the brute force method?

In-Class Activities

  1. Trace by Hand: Given \( T = \text{ABCDABCD} \) and \( P = \text{BCD} \), trace the brute force algorithm step-by-step by hand. Show which characters are compared and how many comparisons are made in total. Compare your results with the demo.
  2. Best/Worst Case Examples: Come up with an example that would cause the algorithm to stop early at every shift (best case), and one that would result in the maximum number of comparisons (worst case). Justify your choices and compute the exact number of comparisons for each case.
  3. Build a Similar Algorithm (Strings): Write a brute-force algorithm to check whether a string \( P \) is a palindrome (reads the same forwards and backwards). Then compare its structure to the brute force string matching algorithm.
  4. Build a Similar Algorithm (Arrays): Write a brute-force algorithm that checks whether an array \( A \) of integers contains a contiguous subarray that matches a given pattern array \( B \). Compare your algorithm with the one given here.
  5. Subarray Sum Match: Given an array of positive integers \( A[0 \dots n-1] \) and an integer target value \( k \), determine whether there exists a contiguous subarray whose elements sum to exactly \( k \).
    1. Design and write a brute-force algorithm using a nested loop to solve this problem.
    2. What are the best-case and worst-case time complexities?
    3. Apply your algorithm to the example \( A = [2, 4, 1, 3, 6, 2, 5] \) with \( k = 10 \). Show all subarrays considered.
  6. Subarray Sum Match Version 2: Repeat the previous problem, but in this version negative numbers are allowed. Is this version easier or harder? Explain.
  7. Compare Loop Bounds: For several different values of \( n \) and \( m \), compute how many times the outer loop will run. Explain why the condition is \( i \leq n - m \) and what would go wrong if the loop ran shorter or longer.
  8. Visual Demo Exploration: Use the interactive demo to explore various combinations of text and pattern. Identify input cases that produce:
    1. A match on the first try
    2. No matches at all
    3. Multiple matches
    4. Worst-case comparison count
    Record how many character comparisons occur in each scenario.
  9. Efficiency Reflection: After completing several traces, discuss with a partner how the algorithm might be improved. Which comparisons seem redundant? Could we avoid them?
  10. Loop Inversion Challenge: Rewrite the algorithm so that the inner loop runs in reverse (from \( m - 1 \) to 0). Does this affect correctness or efficiency? Why or why not?

Homework Problems

  1. Trace a Match: Let \( T = \text{AABAACAADAABAABA} \) and \( P = \text{AABA} \). Trace the brute-force algorithm step-by-step and list all positions where matches occur. Count the total number of character comparisons.
  2. Best and Worst Case Exploration: For each of the following
    1. Construct a pattern \( P \) of length \( m \) and a text \( T \) of length \( n \) where the brute-force algorithm performs the fewest number of comparisons. Clearly state how many comparisons it is (based on \(n\) and \(m\)).
    2. Construct a pattern \( P \) of length \( m \) and a text \( T \) of length \( n \) where the brute-force algorithm performs the maximum number of comparisons. Clearly state how many comparisons it is (based on \(n\) and \(m\)).
  3. Pattern with Gaps: Suppose you want to find all occurrences of a 3-letter pattern in a text where only the first and last letters of the pattern must match the text. (Ignore the middle letter.)
    Write a brute-force algorithm (code or pseudocode) to solve this variation.
  4. Subarray Sum Match: Given an array of positive integers \( A[0 \dots n-1] \) and an integer target value \( k \), determine whether there exists a contiguous subarray whose elements sum to exactly \( k \).
    1. Design and write a brute-force algorithm using a nested loop to solve this problem.
    2. What are the best-case and worst-case time complexities?
    3. Apply your algorithm to the example \( A = [1, 5, 2, 7, 4, 2, 5] \) with \( k = 10 \). Show all subarrays considered.
  5. Substring of Repeated Characters: Write an algorithm to find the length of the longest contiguous substring in a given string that consists of the same character repeated. (For example, in "aaabbbaaac", the answer is 3.)
    Give your algorithm in pseudocode and analyze the time complexity.
  6. Find the First Match Only: Modify the brute-force string matching algorithm to stop after the first match is found. Write out the updated code/pseudocode and explain what changes you made. Clearly explain how the best- and worst-case complexities of this algorithm compare with the original version.
  7. Reverse Match: Write a brute-force algorithm that finds all occurrences of a pattern \( P \) in a text \( T \), but matching from right to left (i.e., compare the last character first, then move backward). Give clear pseudocode and explain whether this changes the number of comparisons.
  8. Character Histogram Match: Suppose you want to find all substrings of the text of length \( m \) that contain the same multiset of characters as the pattern (e.g., “cab” matches “abc”).
    Give a brute-force algorithm for this problem. Assume the alphabet is small and fixed (e.g., lowercase letters). Provide clear pseudocode and analyze the time and space complexity of your algorithm.