String Algorithms

String Problems in CP

String algorithms unlock pattern matching, palindrome detection, and text processing problems. The key is choosing the right tool.

Pattern Recognition Signals:

“Find pattern in text” → KMP, Z-function, or Hashing
“Count occurrences of pattern” → KMP or Hashing
“Longest palindrome” → Manacher’s algorithm
“Prefix queries on dictionary” → Trie
“Compare substrings” → Hashing with binary search

Algorithm Selection Guide

Problem Type	Best Algorithm	Complexity
Single pattern search	KMP / Z-function	O(n + m)
Multiple pattern search	Aho-Corasick	O(n + m + k)
Substring comparison	Rolling Hash	O(1) per query
Longest palindromic substring	Manacher	O(n)
Prefix/suffix queries	Trie	O(length)
Lexicographically smallest rotation	Duval / Booth	O(n)

String Hashing

Convert strings to numbers for fast comparison. The Problem: Comparing two strings of length n takes O(n). For many comparisons, this is too slow. The Insight: Map each string to a number (hash). If hashes differ, strings differ. If hashes match, strings are probably equal. How it works: Treat the string as a number in base B:

H(s) = s[0] \cdot B^{n-1} + s[1] \cdot B^{n-2} + ... + s[n-1] \cdot B^0 \pmod{MOD}

Example: “abc” with B=31:

H = 1 \cdot 31^2 + 2 \cdot 31^1 + 3 \cdot 31^0 = 961 + 62 + 3 = 1026

Substring Hash in O(1): Using prefix hashes:

H(s[l..r]) = H(s[0..r]) - H(s[0..l-1]) \cdot B^{r-l+1}

Collision Risk: Two different strings can have the same hash. Use:

Large prime MOD (10^9 + 7)
Double hashing: Two different (BASE, MOD) pairs—collision probability becomes ~1/10^18

class StringHash {
public:
    const long long MOD = 1e9 + 7;
    const long long BASE = 31;
    
    int n;
    vector<long long> hash_, power;
    
    StringHash(const string& s) : n(s.size()), hash_(n + 1), power(n + 1) {
        power[0] = 1;
        for (int i = 0; i < n; i++) {
            hash_[i + 1] = (hash_[i] * BASE + (s[i] - 'a' + 1)) % MOD;
            power[i + 1] = (power[i] * BASE) % MOD;
        }
    }
    
    // Get hash of s[l..r] (0-indexed, inclusive)
    long long getHash(int l, int r) {
        long long h = (hash_[r + 1] - hash_[l] * power[r - l + 1] % MOD + MOD) % MOD;
        return h;
    }
};

// Usage
string s = "abcabc";
StringHash sh(s);
cout << (sh.getHash(0, 2) == sh.getHash(3, 5));  // true: "abc" == "abc"

Double Hashing (Reduces Collision)

class DoubleHash {
public:
    const long long MOD1 = 1e9 + 7, MOD2 = 1e9 + 9;
    const long long BASE1 = 31, BASE2 = 37;
    
    int n;
    vector<long long> h1, h2, p1, p2;
    
    DoubleHash(const string& s) : n(s.size()), 
        h1(n + 1), h2(n + 1), p1(n + 1), p2(n + 1) {
        p1[0] = p2[0] = 1;
        for (int i = 0; i < n; i++) {
            int c = s[i] - 'a' + 1;
            h1[i + 1] = (h1[i] * BASE1 + c) % MOD1;
            h2[i + 1] = (h2[i] * BASE2 + c) % MOD2;
            p1[i + 1] = p1[i] * BASE1 % MOD1;
            p2[i + 1] = p2[i] * BASE2 % MOD2;
        }
    }
    
    pair<long long, long long> getHash(int l, int r) {
        long long hash1 = (h1[r + 1] - h1[l] * p1[r - l + 1] % MOD1 + MOD1) % MOD1;
        long long hash2 = (h2[r + 1] - h2[l] * p2[r - l + 1] % MOD2 + MOD2) % MOD2;
        return {hash1, hash2};
    }
};

KMP Algorithm

Find all occurrences of pattern in text in O(n + m). The Problem: Naive search is O(n·m)—for each position in text, compare entire pattern. KMP’s Insight: When a mismatch occurs, we’ve already matched some characters. Use this information to skip ahead instead of starting over.

Computing Failure Function (LPS Array)

LPS[i] = length of the Longest Proper Prefix of pattern[0..i] which is also a Suffix. Example: pattern = “ABABAC”

i=0: "A"      → No proper prefix → LPS[0] = 0
i=1: "AB"     → No match → LPS[1] = 0  
i=2: "ABA"    → "A" = "A" → LPS[2] = 1
i=3: "ABAB"   → "AB" = "AB" → LPS[3] = 2
i=4: "ABABA"  → "ABA" = "ABA" → LPS[4] = 3
i=5: "ABABAC" → No suffix matches prefix ending with C → LPS[5] = 0

LPS = [0, 0, 1, 2, 3, 0]

Why this helps: When mismatch at position j, we know pattern[0..j-1] matched. The LPS tells us the longest prefix we don’t need to re-check.

vector<int> computeLPS(const string& pattern) {
    int m = pattern.size();
    vector<int> lps(m, 0);  // Longest Proper Prefix which is also Suffix
    
    int len = 0;
    int i = 1;
    
    while (i < m) {
        if (pattern[i] == pattern[len]) {
            len++;
            lps[i] = len;
            i++;
        } else {
            if (len != 0) {
                len = lps[len - 1];
            } else {
                lps[i] = 0;
                i++;
            }
        }
    }
    
    return lps;
}

Pattern Matching

vector<int> kmpSearch(const string& text, const string& pattern) {
    int n = text.size(), m = pattern.size();
    vector<int> lps = computeLPS(pattern);
    vector<int> matches;
    
    int i = 0, j = 0;
    
    while (i < n) {
        if (pattern[j] == text[i]) {
            i++;
            j++;
        }
        
        if (j == m) {
            matches.push_back(i - j);  // Match found at index i-j
            j = lps[j - 1];
        } else if (i < n && pattern[j] != text[i]) {
            if (j != 0) {
                j = lps[j - 1];
            } else {
                i++;
            }
        }
    }
    
    return matches;
}

Z-Function

z[i] = length of longest substring starting from i that matches a prefix of s. Example: s = “aabxaab”

i=0: (undefined, by convention z[0]=0 or n)
i=1: s[1..] = "abxaab" vs s = "aabxaab" → "a" matches → z[1] = 1
i=2: s[2..] = "bxaab" vs s → No match → z[2] = 0
i=3: s[3..] = "xaab" vs s → No match → z[3] = 0
i=4: s[4..] = "aab" vs s → "aab" matches → z[4] = 3
i=5: s[5..] = "ab" vs s → "a" matches → z[5] = 1
i=6: s[6..] = "b" vs s → No match → z[6] = 0

Z = [0, 1, 0, 0, 3, 1, 0]

The Optimization (Z-box technique): Maintain [l, r)—the rightmost segment that matches a prefix. If i < r, we can reuse previously computed values! Pattern Matching Trick: To find pattern P in text T:

Create combined = P + ” $" + T ($ is a separator not in P or T)
Compute Z-function
Wherever z[i] == len(P), we found a match at position i - len(P) - 1 in T

vector<int> zFunction(const string& s) {
    int n = s.size();
    vector<int> z(n, 0);
    
    int l = 0, r = 0;
    for (int i = 1; i < n; i++) {
        if (i < r) {
            z[i] = min(r - i, z[i - l]);
        }
        
        while (i + z[i] < n && s[z[i]] == s[i + z[i]]) {
            z[i]++;
        }
        
        if (i + z[i] > r) {
            l = i;
            r = i + z[i];
        }
    }
    
    return z;
}

// Pattern matching using Z-function
vector<int> findPattern(const string& text, const string& pattern) {
    string combined = pattern + "$" + text;
    vector<int> z = zFunction(combined);
    
    vector<int> matches;
    int m = pattern.size();
    
    for (int i = m + 1; i < combined.size(); i++) {
        if (z[i] == m) {
            matches.push_back(i - m - 1);
        }
    }
    
    return matches;
}

Trie (Prefix Tree)

Efficient for prefix-based queries on a dictionary. Structure: A tree where:

Each edge is labeled with a character
Each node represents a prefix (path from root)
Words sharing a prefix share that path

Visual Example with words [“cat”, “car”, “card”, “dog”]:

        (root)
        /    \
       c      d
       |      |
       a      o
      / \     |
     t   r    g*
     *   *
         |
         d*

* = word ends here

Operations:

Insert/Search: O(length of word)
Count words with prefix: O(length of prefix)
Autocomplete: O(prefix length + number of suggestions)

When to use:

Dictionary with prefix queries
XOR maximization (binary trie)
Counting distinct prefixes

class Trie {
public:
    struct Node {
        int children[26] = {};
        int count = 0;        // Words ending here
        int prefixCount = 0;  // Words with this prefix
    };
    
    vector<Node> nodes;
    
    Trie() {
        nodes.push_back(Node());  // Root
    }
    
    void insert(const string& s) {
        int cur = 0;
        for (char c : s) {
            int idx = c - 'a';
            if (!nodes[cur].children[idx]) {
                nodes[cur].children[idx] = nodes.size();
                nodes.push_back(Node());
            }
            cur = nodes[cur].children[idx];
            nodes[cur].prefixCount++;
        }
        nodes[cur].count++;
    }
    
    bool search(const string& s) {
        int cur = 0;
        for (char c : s) {
            int idx = c - 'a';
            if (!nodes[cur].children[idx]) return false;
            cur = nodes[cur].children[idx];
        }
        return nodes[cur].count > 0;
    }
    
    int countPrefix(const string& prefix) {
        int cur = 0;
        for (char c : prefix) {
            int idx = c - 'a';
            if (!nodes[cur].children[idx]) return 0;
            cur = nodes[cur].children[idx];
        }
        return nodes[cur].prefixCount;
    }
};

Pattern 1: Longest Palindromic Substring

Using Hashing + Binary Search

int longestPalindrome(const string& s) {
    int n = s.size();
    string rev = s;
    reverse(rev.begin(), rev.end());
    
    StringHash sh(s);
    StringHash rh(rev);
    
    auto isPalindrome = [&](int l, int r) {
        // Compare s[l..r] with reverse
        return sh.getHash(l, r) == rh.getHash(n - 1 - r, n - 1 - l);
    };
    
    int ans = 1;
    
    // Odd length palindromes
    for (int i = 0; i < n; i++) {
        int lo = 0, hi = min(i, n - 1 - i);
        while (lo < hi) {
            int mid = (lo + hi + 1) / 2;
            if (isPalindrome(i - mid, i + mid)) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        ans = max(ans, 2 * lo + 1);
    }
    
    // Even length palindromes (similar)
    
    return ans;
}

Manacher’s Algorithm (O(n))

The Problem: Find all palindromes in O(n) time. The Trick: Insert ’#’ between characters to handle even-length palindromes uniformly.

“abba” becomes “#a#b#b#a#”
Now all palindromes have odd length with a center

The Optimization: Similar to Z-function, maintain the rightmost palindrome [c-r, c+r]. If i is inside this palindrome, we can use the mirror position 2c-i to get a starting value for p[i]. Visual:

String: #a#b#a#
p[i]:   0 1 0 3 0 1 0
            ^
           center (full palindrome "#a#b#a#" has radius 3)

Radius 3 in transformed string = palindrome length 3 in original "aba"

Key Insight: p[i] in the transformed string equals the length of the palindrome centered at position i/2 in the original string.

vector<int> manacher(const string& s) {
    string t = "#";
    for (char c : s) {
        t += c;
        t += '#';
    }
    
    int n = t.size();
    vector<int> p(n, 0);  // p[i] = radius of palindrome centered at i
    
    int c = 0, r = 0;  // Center and right boundary of rightmost palindrome
    
    for (int i = 0; i < n; i++) {
        if (i < r) {
            p[i] = min(r - i, p[2 * c - i]);
        }
        
        while (i - p[i] - 1 >= 0 && i + p[i] + 1 < n 
               && t[i - p[i] - 1] == t[i + p[i] + 1]) {
            p[i]++;
        }
        
        if (i + p[i] > r) {
            c = i;
            r = i + p[i];
        }
    }
    
    return p;  // p[i] is the length of palindrome in original string
}

Pattern 2: Distinct Substrings

Problem: Count number of distinct substrings.

long long countDistinct(const string& s) {
    int n = s.size();
    set<pair<long long, long long>> seen;  // Use double hash
    
    DoubleHash dh(s);
    
    for (int i = 0; i < n; i++) {
        for (int j = i; j < n; j++) {
            seen.insert(dh.getHash(i, j));
        }
    }
    
    return seen.size();
}

// More efficient: Use suffix array
// Number of distinct substrings = n*(n+1)/2 - sum of LCP values

Pattern 3: Longest Common Substring

int longestCommonSubstring(const string& s, const string& t) {
    DoubleHash hs(s), ht(t);
    
    auto check = [&](int len) -> bool {
        set<pair<long long, long long>> hashes;
        
        for (int i = 0; i + len <= s.size(); i++) {
            hashes.insert(hs.getHash(i, i + len - 1));
        }
        
        for (int i = 0; i + len <= t.size(); i++) {
            if (hashes.count(ht.getHash(i, i + len - 1))) {
                return true;
            }
        }
        
        return false;
    };
    
    int lo = 0, hi = min(s.size(), t.size());
    while (lo < hi) {
        int mid = (lo + hi + 1) / 2;
        if (check(mid)) {
            lo = mid;
        } else {
            hi = mid - 1;
        }
    }
    
    return lo;
}

Pattern 4: Aho-Corasick (Multiple Pattern Matching)

For searching multiple patterns simultaneously.

class AhoCorasick {
public:
    struct Node {
        int children[26] = {};
        int fail = 0;
        int output = -1;  // Index of pattern ending here
    };
    
    vector<Node> nodes;
    
    AhoCorasick() {
        nodes.push_back(Node());
    }
    
    void insert(const string& s, int idx) {
        int cur = 0;
        for (char c : s) {
            int i = c - 'a';
            if (!nodes[cur].children[i]) {
                nodes[cur].children[i] = nodes.size();
                nodes.push_back(Node());
            }
            cur = nodes[cur].children[i];
        }
        nodes[cur].output = idx;
    }
    
    void build() {
        queue<int> q;
        for (int i = 0; i < 26; i++) {
            if (nodes[0].children[i]) {
                q.push(nodes[0].children[i]);
            }
        }
        
        while (!q.empty()) {
            int u = q.front();
            q.pop();
            
            for (int i = 0; i < 26; i++) {
                if (nodes[u].children[i]) {
                    int v = nodes[u].children[i];
                    int f = nodes[u].fail;
                    
                    while (f && !nodes[f].children[i]) {
                        f = nodes[f].fail;
                    }
                    
                    nodes[v].fail = nodes[f].children[i];
                    if (nodes[v].fail == v) nodes[v].fail = 0;
                    
                    q.push(v);
                }
            }
        }
    }
    
    vector<pair<int, int>> search(const string& text) {
        vector<pair<int, int>> matches;  // {position, pattern_idx}
        
        int cur = 0;
        for (int i = 0; i < text.size(); i++) {
            int c = text[i] - 'a';
            
            while (cur && !nodes[cur].children[c]) {
                cur = nodes[cur].fail;
            }
            cur = nodes[cur].children[c];
            
            int temp = cur;
            while (temp) {
                if (nodes[temp].output != -1) {
                    matches.push_back({i, nodes[temp].output});
                }
                temp = nodes[temp].fail;
            }
        }
        
        return matches;
    }
};

Common Mistakes

Mistake 1: Hash collision Use double hashing for important problems. Single hash can collide.

Mistake 2: Wrong base/mod

Base should be > alphabet size
Mod should be large prime
Common: base = 31 or 37, mod = 1e9 + 7

Mistake 3: Negative hash Always add MOD before taking modulo:

(hash - something % MOD + MOD) % MOD

Practice Problems

Beginner (1000-1300)

Problem	Pattern	Link
String Matching	KMP/Z	CSES
Finding Borders	KMP LPS	CSES

Intermediate (1300-1600)

Problem	Pattern	Link
Finding Periods	Z-function	CSES
Minimal Rotation	Duval	CSES
Longest Palindrome	Manacher	CSES

Advanced (1600-1900)

Problem	Pattern	Link
Pattern Positions	KMP	CSES
Distinct Substrings	Suffix Array	CSES

Key Takeaways

Hashing

O(1) substring comparison after O(n) preprocessing.

KMP/Z-function

O(n + m) pattern matching without false positives.

Trie

Efficient prefix queries and autocomplete.

Manacher

O(n) for all palindromic substrings.

Next Up

Chapter 19: Bit Manipulation

Master bitwise operations, bitmasks, and XOR tricks.

Getting Started

Fundamental Patterns

Greedy & Divide and Conquer

Dynamic Programming

Graph Algorithms

Advanced Topics

Contest Mastery

String Algorithms

String Algorithms

String Problems in CP

Algorithm Selection Guide

String Hashing

Double Hashing (Reduces Collision)

KMP Algorithm

Computing Failure Function (LPS Array)

Pattern Matching

Z-Function

Trie (Prefix Tree)

Pattern 1: Longest Palindromic Substring

Using Hashing + Binary Search

Manacher’s Algorithm (O(n))

Pattern 2: Distinct Substrings

Pattern 3: Longest Common Substring

Pattern 4: Aho-Corasick (Multiple Pattern Matching)

Common Mistakes

Practice Problems

Beginner (1000-1300)

Intermediate (1300-1600)

Advanced (1600-1900)

Key Takeaways

Hashing

KMP/Z-function

Trie

Manacher

Next Up

Chapter 19: Bit Manipulation

Getting Started

Fundamental Patterns

Greedy & Divide and Conquer

Dynamic Programming

Graph Algorithms

Advanced Topics

Contest Mastery

​String Algorithms

​String Problems in CP

​Algorithm Selection Guide

​String Hashing

​Double Hashing (Reduces Collision)

​KMP Algorithm

​Computing Failure Function (LPS Array)

​Pattern Matching

​Z-Function

​Trie (Prefix Tree)

​Pattern 1: Longest Palindromic Substring

​Using Hashing + Binary Search

​Manacher’s Algorithm (O(n))

​Pattern 2: Distinct Substrings

​Pattern 3: Longest Common Substring

​Pattern 4: Aho-Corasick (Multiple Pattern Matching)

​Common Mistakes

​Practice Problems

​Beginner (1000-1300)

​Intermediate (1300-1600)

​Advanced (1600-1900)

​Key Takeaways

Hashing

KMP/Z-function

Trie

Manacher

​Next Up

Chapter 19: Bit Manipulation

String Algorithms

String Problems in CP

Algorithm Selection Guide

String Hashing

Double Hashing (Reduces Collision)

KMP Algorithm

Computing Failure Function (LPS Array)

Pattern Matching

Z-Function

Trie (Prefix Tree)

Pattern 1: Longest Palindromic Substring

Using Hashing + Binary Search

Manacher’s Algorithm (O(n))

Pattern 2: Distinct Substrings

Pattern 3: Longest Common Substring

Pattern 4: Aho-Corasick (Multiple Pattern Matching)

Common Mistakes

Practice Problems

Beginner (1000-1300)

Intermediate (1300-1600)

Advanced (1600-1900)

Key Takeaways

Next Up