15Module · Goodrich Ch. 15
Week 15 — Memory Management & B-Trees
Memory hierarchy, external-memory algorithms, B-trees, and a synthesis review of the course.
Memory Management & B-Trees
How CPU caches, RAM, and disk drive data-structure design; external-memory algorithms; and B-trees as the workhorse of databases and filesystems.
Theory
The memory hierarchy spans registers → L1/L2/L3 cache → RAM → SSD/disk, each ~10× slower and ~10× larger than the level above.
External-memory algorithms minimize block (page) transfers because I/O dominates runtime when data spills beyond RAM.
A B-tree of order d keeps all leaves at the same depth and packs d–2d keys per node, giving O(log_d n) I/O cost — typically 3–4 disk accesses for billion-key indexes.
B-trees and their variants (B+, B*) power PostgreSQL, SQLite, NTFS, ext4, and most databases.
Key points
- Cache-aware algorithms align data to cache-line boundaries to minimize evictions.
- B-tree height grows logarithmically with branching factor — wider nodes mean shallower trees.
- External merge-sort processes data in runs sized to RAM, then merges from disk.
Code examples
B-tree node skeleton
B-tree node skeleton
class BTreeNode:
def __init__(self, t, leaf=False):
self.t = t # minimum degree
self.leaf = leaf
self.keys = [] # at most 2t - 1
self.children = [] # at most 2t
def search(self, k):
i = 0
while i < len(self.keys) and k > self.keys[i]:
i += 1
if i < len(self.keys) and self.keys[i] == k:
return (self, i)
if self.leaf:
return None
return self.children[i].search(k)External merge-sort sketch
External merge-sort sketch
# 1. Read M-sized chunks of the file, sort each in RAM,
# write back as a sorted "run".
# 2. K-way merge the runs using a min-heap that holds the
# current smallest unread element from each run.
# 3. Total I/O: O(N * log_M (N/M)) block transfers.