> ## Documentation Index
> Fetch the complete documentation index at: https://resources.devweekends.com/llms.txt
> Use this file to discover all available pages before exploring further.

# String Algorithms

> Master hashing, KMP, Z-function, and Trie for pattern matching

# String Algorithms

## String Problems in CP

String algorithms unlock pattern matching, palindrome detection, and text processing problems. The key is choosing the right tool.

**Why specialized algorithms?** Naive string matching (check every position) is O(n\*m). For n = m = 10^6, that is 10^12 operations--far too slow. KMP, Z-function, and hashing each bring this down to O(n + m) by cleverly reusing work from previous comparisons. The intuition behind all of them: when a mismatch occurs, you have already matched some characters, and that partial match tells you where to look next without starting over.

<Note>
  **Pattern Recognition Signals**:

  * "Find pattern in text" → KMP, Z-function, or Hashing
  * "Count occurrences of pattern" → KMP or Hashing
  * "Longest palindrome" → Manacher's algorithm
  * "Prefix queries on dictionary" → Trie
  * "Compare substrings" → Hashing with binary search
</Note>

***

## Algorithm Selection Guide

| Problem Type                        | Best Algorithm   | Complexity     |
| ----------------------------------- | ---------------- | -------------- |
| Single pattern search               | KMP / Z-function | O(n + m)       |
| Multiple pattern search             | Aho-Corasick     | O(n + m + k)   |
| Substring comparison                | Rolling Hash     | O(1) per query |
| Longest palindromic substring       | Manacher         | O(n)           |
| Prefix/suffix queries               | Trie             | O(length)      |
| Lexicographically smallest rotation | Duval / Booth    | O(n)           |

***

## String Hashing

Convert strings to numbers for fast comparison.

**The Problem**: Comparing two strings of length n takes O(n). For many comparisons, this is too slow.

**The Insight**: Map each string to a number (hash). If hashes differ, strings differ. If hashes match, strings are *probably* equal.

**How it works**: Treat the string as a number in base B:
$H(s) = s[0] \cdot B^{n-1} + s[1] \cdot B^{n-2} + ... + s[n-1] \cdot B^0 \pmod{MOD}$

**Example**: "abc" with B=31:
$H = 1 \cdot 31^2 + 2 \cdot 31^1 + 3 \cdot 31^0 = 961 + 62 + 3 = 1026$

**Substring Hash in O(1)**: Using prefix hashes:
$H(s[l..r]) = H(s[0..r]) - H(s[0..l-1]) \cdot B^{r-l+1}$

**Collision Risk**: Two different strings can have the same hash. Use:

* Large prime MOD (10^9 + 7)
* **Double hashing**: Two different (BASE, MOD) pairs--collision probability becomes \~1/10^18

<Warning>
  **Contest edge case**: On Codeforces, some problem setters specifically add anti-hash tests for single-hash solutions. If your hashing solution gets WA on a string problem, switching to double hashing (or using a different base/mod pair) often fixes it. Always prefer double hashing for problems with large test cases.
</Warning>

```cpp theme={null}
class StringHash {
public:
    const long long MOD = 1e9 + 7;
    const long long BASE = 31;
    
    int n;
    vector<long long> hash_, power;
    
    StringHash(const string& s) : n(s.size()), hash_(n + 1), power(n + 1) {
        power[0] = 1;
        for (int i = 0; i < n; i++) {
            hash_[i + 1] = (hash_[i] * BASE + (s[i] - 'a' + 1)) % MOD;
            power[i + 1] = (power[i] * BASE) % MOD;
        }
    }
    
    // Get hash of s[l..r] (0-indexed, inclusive)
    long long getHash(int l, int r) {
        long long h = (hash_[r + 1] - hash_[l] * power[r - l + 1] % MOD + MOD) % MOD;
        return h;
    }
};

// Usage
string s = "abcabc";
StringHash sh(s);
cout << (sh.getHash(0, 2) == sh.getHash(3, 5));  // true: "abc" == "abc"
```

### Double Hashing (Reduces Collision)

```cpp theme={null}
class DoubleHash {
public:
    const long long MOD1 = 1e9 + 7, MOD2 = 1e9 + 9;
    const long long BASE1 = 31, BASE2 = 37;
    
    int n;
    vector<long long> h1, h2, p1, p2;
    
    DoubleHash(const string& s) : n(s.size()), 
        h1(n + 1), h2(n + 1), p1(n + 1), p2(n + 1) {
        p1[0] = p2[0] = 1;
        for (int i = 0; i < n; i++) {
            int c = s[i] - 'a' + 1;
            h1[i + 1] = (h1[i] * BASE1 + c) % MOD1;
            h2[i + 1] = (h2[i] * BASE2 + c) % MOD2;
            p1[i + 1] = p1[i] * BASE1 % MOD1;
            p2[i + 1] = p2[i] * BASE2 % MOD2;
        }
    }
    
    pair<long long, long long> getHash(int l, int r) {
        long long hash1 = (h1[r + 1] - h1[l] * p1[r - l + 1] % MOD1 + MOD1) % MOD1;
        long long hash2 = (h2[r + 1] - h2[l] * p2[r - l + 1] % MOD2 + MOD2) % MOD2;
        return {hash1, hash2};
    }
};
```

***

## KMP Algorithm

Find all occurrences of pattern in text in O(n + m).

**The Problem**: Naive search is O(n·m)—for each position in text, compare entire pattern.

**KMP's Insight**: When a mismatch occurs, we've already matched some characters. Use this information to skip ahead instead of starting over.

### Computing Failure Function (LPS Array)

**LPS\[i]** = length of the **L**ongest **P**roper **P**refix of pattern\[0..i] which is also a **S**uffix.

**Example**: pattern = "ABABAC"

```
i=0: "A"      → No proper prefix → LPS[0] = 0
i=1: "AB"     → No match → LPS[1] = 0  
i=2: "ABA"    → "A" = "A" → LPS[2] = 1
i=3: "ABAB"   → "AB" = "AB" → LPS[3] = 2
i=4: "ABABA"  → "ABA" = "ABA" → LPS[4] = 3
i=5: "ABABAC" → No suffix matches prefix ending with C → LPS[5] = 0

LPS = [0, 0, 1, 2, 3, 0]
```

**Why this helps**: When mismatch at position j, we know pattern\[0..j-1] matched. The LPS tells us the longest prefix we don't need to re-check.

```cpp theme={null}
vector<int> computeLPS(const string& pattern) {
    int m = pattern.size();
    vector<int> lps(m, 0);  // Longest Proper Prefix which is also Suffix
    
    int len = 0;
    int i = 1;
    
    while (i < m) {
        if (pattern[i] == pattern[len]) {
            len++;
            lps[i] = len;
            i++;
        } else {
            if (len != 0) {
                len = lps[len - 1];
            } else {
                lps[i] = 0;
                i++;
            }
        }
    }
    
    return lps;
}
```

### Pattern Matching

```cpp theme={null}
vector<int> kmpSearch(const string& text, const string& pattern) {
    int n = text.size(), m = pattern.size();
    vector<int> lps = computeLPS(pattern);
    vector<int> matches;
    
    int i = 0, j = 0;
    
    while (i < n) {
        if (pattern[j] == text[i]) {
            i++;
            j++;
        }
        
        if (j == m) {
            matches.push_back(i - j);  // Match found at index i-j
            j = lps[j - 1];
        } else if (i < n && pattern[j] != text[i]) {
            if (j != 0) {
                j = lps[j - 1];
            } else {
                i++;
            }
        }
    }
    
    return matches;
}
```

***

## Z-Function

**z\[i]** = length of longest substring starting from i that matches a prefix of s.

**Example**: s = "aabxaab"

```
i=0: (undefined, by convention z[0]=0 or n)
i=1: s[1..] = "abxaab" vs s = "aabxaab" → "a" matches → z[1] = 1
i=2: s[2..] = "bxaab" vs s → No match → z[2] = 0
i=3: s[3..] = "xaab" vs s → No match → z[3] = 0
i=4: s[4..] = "aab" vs s → "aab" matches → z[4] = 3
i=5: s[5..] = "ab" vs s → "a" matches → z[5] = 1
i=6: s[6..] = "b" vs s → No match → z[6] = 0

Z = [0, 1, 0, 0, 3, 1, 0]
```

**The Optimization** (Z-box technique): Maintain \[l, r)—the rightmost segment that matches a prefix. If i \< r, we can reuse previously computed values!

**Pattern Matching Trick**: To find pattern P in text T:

1. Create combined = P + "$" + T ($ is a separator not in P or T)
2. Compute Z-function
3. Wherever z\[i] == len(P), we found a match at position i - len(P) - 1 in T

```cpp theme={null}
vector<int> zFunction(const string& s) {
    int n = s.size();
    vector<int> z(n, 0);
    
    int l = 0, r = 0;
    for (int i = 1; i < n; i++) {
        if (i < r) {
            z[i] = min(r - i, z[i - l]);
        }
        
        while (i + z[i] < n && s[z[i]] == s[i + z[i]]) {
            z[i]++;
        }
        
        if (i + z[i] > r) {
            l = i;
            r = i + z[i];
        }
    }
    
    return z;
}

// Pattern matching using Z-function
vector<int> findPattern(const string& text, const string& pattern) {
    string combined = pattern + "$" + text;
    vector<int> z = zFunction(combined);
    
    vector<int> matches;
    int m = pattern.size();
    
    for (int i = m + 1; i < combined.size(); i++) {
        if (z[i] == m) {
            matches.push_back(i - m - 1);
        }
    }
    
    return matches;
}
```

***

## Trie (Prefix Tree)

Efficient for prefix-based queries on a dictionary.

**Structure**: A tree where:

* Each edge is labeled with a character
* Each node represents a prefix (path from root)
* Words sharing a prefix share that path

**Visual Example** with words \["cat", "car", "card", "dog"]:

```
        (root)
        /    \
       c      d
       |      |
       a      o
      / \     |
     t   r    g*
     *   *
         |
         d*

* = word ends here
```

**Operations**:

* Insert/Search: O(length of word)
* Count words with prefix: O(length of prefix)
* Autocomplete: O(prefix length + number of suggestions)

**When to use**:

* Dictionary with prefix queries
* XOR maximization (binary trie)
* Counting distinct prefixes

```cpp theme={null}
class Trie {
public:
    struct Node {
        int children[26] = {};
        int count = 0;        // Words ending here
        int prefixCount = 0;  // Words with this prefix
    };
    
    vector<Node> nodes;
    
    Trie() {
        nodes.push_back(Node());  // Root
    }
    
    void insert(const string& s) {
        int cur = 0;
        for (char c : s) {
            int idx = c - 'a';
            if (!nodes[cur].children[idx]) {
                nodes[cur].children[idx] = nodes.size();
                nodes.push_back(Node());
            }
            cur = nodes[cur].children[idx];
            nodes[cur].prefixCount++;
        }
        nodes[cur].count++;
    }
    
    bool search(const string& s) {
        int cur = 0;
        for (char c : s) {
            int idx = c - 'a';
            if (!nodes[cur].children[idx]) return false;
            cur = nodes[cur].children[idx];
        }
        return nodes[cur].count > 0;
    }
    
    int countPrefix(const string& prefix) {
        int cur = 0;
        for (char c : prefix) {
            int idx = c - 'a';
            if (!nodes[cur].children[idx]) return 0;
            cur = nodes[cur].children[idx];
        }
        return nodes[cur].prefixCount;
    }
};
```

***

## Pattern 1: Longest Palindromic Substring

### Using Hashing + Binary Search

```cpp theme={null}
int longestPalindrome(const string& s) {
    int n = s.size();
    string rev = s;
    reverse(rev.begin(), rev.end());
    
    StringHash sh(s);
    StringHash rh(rev);
    
    auto isPalindrome = [&](int l, int r) {
        // Compare s[l..r] with reverse
        return sh.getHash(l, r) == rh.getHash(n - 1 - r, n - 1 - l);
    };
    
    int ans = 1;
    
    // Odd length palindromes
    for (int i = 0; i < n; i++) {
        int lo = 0, hi = min(i, n - 1 - i);
        while (lo < hi) {
            int mid = (lo + hi + 1) / 2;
            if (isPalindrome(i - mid, i + mid)) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        ans = max(ans, 2 * lo + 1);
    }
    
    // Even length palindromes (similar)
    
    return ans;
}
```

### Manacher's Algorithm (O(n))

**The Problem**: Find all palindromes in O(n) time.

**The Trick**: Insert '#' between characters to handle even-length palindromes uniformly.

* "abba" becomes "#a#b#b#a#"
* Now all palindromes have odd length with a center

**The Optimization**: Similar to Z-function, maintain the rightmost palindrome \[c-r, c+r]. If i is inside this palindrome, we can use the mirror position 2c-i to get a starting value for p\[i].

**Visual**:

```
String: #a#b#a#
p[i]:   0 1 0 3 0 1 0
            ^
           center (full palindrome "#a#b#a#" has radius 3)

Radius 3 in transformed string = palindrome length 3 in original "aba"
```

**Key Insight**: p\[i] in the transformed string equals the length of the palindrome centered at position i/2 in the original string.

```cpp theme={null}
vector<int> manacher(const string& s) {
    string t = "#";
    for (char c : s) {
        t += c;
        t += '#';
    }
    
    int n = t.size();
    vector<int> p(n, 0);  // p[i] = radius of palindrome centered at i
    
    int c = 0, r = 0;  // Center and right boundary of rightmost palindrome
    
    for (int i = 0; i < n; i++) {
        if (i < r) {
            p[i] = min(r - i, p[2 * c - i]);
        }
        
        while (i - p[i] - 1 >= 0 && i + p[i] + 1 < n 
               && t[i - p[i] - 1] == t[i + p[i] + 1]) {
            p[i]++;
        }
        
        if (i + p[i] > r) {
            c = i;
            r = i + p[i];
        }
    }
    
    return p;  // p[i] is the length of palindrome in original string
}
```

***

## Pattern 2: Distinct Substrings

**Problem**: Count number of distinct substrings.

```cpp theme={null}
long long countDistinct(const string& s) {
    int n = s.size();
    set<pair<long long, long long>> seen;  // Use double hash
    
    DoubleHash dh(s);
    
    for (int i = 0; i < n; i++) {
        for (int j = i; j < n; j++) {
            seen.insert(dh.getHash(i, j));
        }
    }
    
    return seen.size();
}

// More efficient: Use suffix array
// Number of distinct substrings = n*(n+1)/2 - sum of LCP values
```

***

## Pattern 3: Longest Common Substring

```cpp theme={null}
int longestCommonSubstring(const string& s, const string& t) {
    DoubleHash hs(s), ht(t);
    
    auto check = [&](int len) -> bool {
        set<pair<long long, long long>> hashes;
        
        for (int i = 0; i + len <= s.size(); i++) {
            hashes.insert(hs.getHash(i, i + len - 1));
        }
        
        for (int i = 0; i + len <= t.size(); i++) {
            if (hashes.count(ht.getHash(i, i + len - 1))) {
                return true;
            }
        }
        
        return false;
    };
    
    int lo = 0, hi = min(s.size(), t.size());
    while (lo < hi) {
        int mid = (lo + hi + 1) / 2;
        if (check(mid)) {
            lo = mid;
        } else {
            hi = mid - 1;
        }
    }
    
    return lo;
}
```

***

## Pattern 4: Aho-Corasick (Multiple Pattern Matching)

For searching multiple patterns simultaneously.

```cpp theme={null}
class AhoCorasick {
public:
    struct Node {
        int children[26] = {};
        int fail = 0;
        int output = -1;  // Index of pattern ending here
    };
    
    vector<Node> nodes;
    
    AhoCorasick() {
        nodes.push_back(Node());
    }
    
    void insert(const string& s, int idx) {
        int cur = 0;
        for (char c : s) {
            int i = c - 'a';
            if (!nodes[cur].children[i]) {
                nodes[cur].children[i] = nodes.size();
                nodes.push_back(Node());
            }
            cur = nodes[cur].children[i];
        }
        nodes[cur].output = idx;
    }
    
    void build() {
        queue<int> q;
        for (int i = 0; i < 26; i++) {
            if (nodes[0].children[i]) {
                q.push(nodes[0].children[i]);
            }
        }
        
        while (!q.empty()) {
            int u = q.front();
            q.pop();
            
            for (int i = 0; i < 26; i++) {
                if (nodes[u].children[i]) {
                    int v = nodes[u].children[i];
                    int f = nodes[u].fail;
                    
                    while (f && !nodes[f].children[i]) {
                        f = nodes[f].fail;
                    }
                    
                    nodes[v].fail = nodes[f].children[i];
                    if (nodes[v].fail == v) nodes[v].fail = 0;
                    
                    q.push(v);
                }
            }
        }
    }
    
    vector<pair<int, int>> search(const string& text) {
        vector<pair<int, int>> matches;  // {position, pattern_idx}
        
        int cur = 0;
        for (int i = 0; i < text.size(); i++) {
            int c = text[i] - 'a';
            
            while (cur && !nodes[cur].children[c]) {
                cur = nodes[cur].fail;
            }
            cur = nodes[cur].children[c];
            
            int temp = cur;
            while (temp) {
                if (nodes[temp].output != -1) {
                    matches.push_back({i, nodes[temp].output});
                }
                temp = nodes[temp].fail;
            }
        }
        
        return matches;
    }
};
```

***

## Common Mistakes

<Warning>
  **Mistake 1: Hash collision**
  Use double hashing for important problems. Single hash can collide. On Codeforces, problem setters often add anti-hash tests that specifically target common single-hash parameters.
</Warning>

<Warning>
  **Mistake 2: Wrong base/mod**

  * Base should be > alphabet size (26 for lowercase letters, so use 31+)
  * Mod should be large prime (10^9 + 7 or 10^9 + 9)
  * Using base = 26 and mapping 'a' to 0 is dangerous: all strings starting with 'a' have hash 0 for the first character, increasing collision risk
  * Safe mapping: `s[i] - 'a' + 1` (so 'a' maps to 1, not 0)
</Warning>

<Warning>
  **Mistake 3: Negative hash values**
  In C++, the `%` operator can return negative values for negative operands. Always add MOD before taking modulo:

  ```cpp theme={null}
  // WRONG -- can produce negative hash
  long long h = (hash_[r+1] - hash_[l] * power[r-l+1]) % MOD;

  // CORRECT -- add MOD to ensure non-negative
  long long h = (hash_[r+1] - hash_[l] * power[r-l+1] % MOD + MOD) % MOD;
  ```
</Warning>

<Warning>
  **Mistake 4: Off-by-one in substring hashing**
  When computing hash of s\[l..r] (inclusive), the formula uses `power[r - l + 1]` (the length of the substring). Double-check whether your getHash function uses inclusive or exclusive bounds -- mismatches cause subtle bugs.
</Warning>

***

## Practice Problems

### Beginner (1000-1300)

| Problem         | Pattern | Link                                         |
| --------------- | ------- | -------------------------------------------- |
| String Matching | KMP/Z   | [CSES](https://cses.fi/problemset/task/1753) |
| Finding Borders | KMP LPS | [CSES](https://cses.fi/problemset/task/1732) |

### Intermediate (1300-1600)

| Problem            | Pattern    | Link                                         |
| ------------------ | ---------- | -------------------------------------------- |
| Finding Periods    | Z-function | [CSES](https://cses.fi/problemset/task/1733) |
| Minimal Rotation   | Duval      | [CSES](https://cses.fi/problemset/task/1110) |
| Longest Palindrome | Manacher   | [CSES](https://cses.fi/problemset/task/1111) |

### Advanced (1600-1900)

| Problem             | Pattern      | Link                                         |
| ------------------- | ------------ | -------------------------------------------- |
| Pattern Positions   | KMP          | [CSES](https://cses.fi/problemset/task/2102) |
| Distinct Substrings | Suffix Array | [CSES](https://cses.fi/problemset/task/2105) |

***

## Key Takeaways

<CardGroup cols={2}>
  <Card title="Hashing" icon="hashtag">
    O(1) substring comparison after O(n) preprocessing.
  </Card>

  <Card title="KMP/Z-function" icon="magnifying-glass">
    O(n + m) pattern matching without false positives.
  </Card>

  <Card title="Trie" icon="diagram-project">
    Efficient prefix queries and autocomplete.
  </Card>

  <Card title="Manacher" icon="reflect-horizontal">
    O(n) for all palindromic substrings.
  </Card>
</CardGroup>

***

## Next Up

<Card title="Chapter 19: Bit Manipulation" icon="arrow-right" href="/courses/competitive-programming/19-bit-manipulation">
  Master bitwise operations, bitmasks, and XOR tricks.
</Card>
