Week 14 — Text Processing & Graph Algorithms
Pattern matching, tries, dynamic programming on strings; graph ADTs, traversals, shortest paths, and minimum spanning trees.
Text Processing
Pattern-matching algorithms (brute force, KMP, Boyer–Moore), tries, and dynamic-programming approaches like edit distance and LCS.
Theory
Brute-force pattern matching is O(n·m); KMP uses a failure function for O(n + m).
A trie indexes a set of strings by character with O(L) lookup, where L is pattern length — used in spell-checkers and autocomplete.
Edit distance and LCS are textbook dynamic-programming problems with O(n·m) tables.
Key points
- Suffix tries / suffix arrays answer substring queries in O(m).
- DP recurrences become straightforward once you define the table dimensions and base cases.
Code examples
def edit_distance(a, b):
n, m = len(a), len(b)
dp = [[0] * (m + 1) for _ in range(n + 1)]
for i in range(n + 1): dp[i][0] = i
for j in range(m + 1): dp[0][j] = j
for i in range(1, n + 1):
for j in range(1, m + 1):
cost = 0 if a[i-1] == b[j-1] else 1
dp[i][j] = min(
dp[i-1][j] + 1, # delete
dp[i][j-1] + 1, # insert
dp[i-1][j-1] + cost, # replace
)
return dp[n][m]Graph Algorithms
Graph ADTs and representations, BFS/DFS, shortest paths (Dijkstra), and minimum spanning trees (Prim, Kruskal).
Theory
A graph is a pair (V, E). Adjacency lists use O(V + E) space and are best for sparse graphs; adjacency matrices use O(V²) and are best for dense graphs.
BFS computes shortest paths in unweighted graphs in O(V + E); DFS classifies edges and discovers topological order on DAGs.
Dijkstra's algorithm computes single-source shortest paths in non-negative weighted graphs in O((V + E) log V) using a min-heap.
Prim and Kruskal each compute a minimum spanning tree in O(E log V).
Key points
- Dijkstra requires non-negative weights — use Bellman–Ford for negative edges.
- Kruskal needs a disjoint-set (union–find) structure with path compression.
- BFS shortest path on an unweighted graph is asymptotically optimal.
Code examples
from collections import deque
def bfs_distance(adj, src):
dist = {src: 0}
q = deque([src])
while q:
u = q.popleft()
for v in adj[u]:
if v not in dist:
dist[v] = dist[u] + 1
q.append(v)
return distimport heapq
def dijkstra(adj, src):
dist = {src: 0}
pq = [(0, src)]
while pq:
d, u = heapq.heappop(pq)
if d > dist[u]: continue
for v, w in adj[u]:
nd = d + w
if nd < dist.get(v, float("inf")):
dist[v] = nd
heapq.heappush(pq, (nd, v))
return dist