15Module · Goodrich Ch. 15

Week 15 — Memory Management & B-Trees

Memory hierarchy, external-memory algorithms, B-trees, and a synthesis review of the course.

Memory Management & B-Trees

How CPU caches, RAM, and disk drive data-structure design; external-memory algorithms; and B-trees as the workhorse of databases and filesystems.

Theory

The memory hierarchy spans registers → L1/L2/L3 cache → RAM → SSD/disk, each ~10× slower and ~10× larger than the level above.

External-memory algorithms minimize block (page) transfers because I/O dominates runtime when data spills beyond RAM.

A B-tree of order d keeps all leaves at the same depth and packs d–2d keys per node, giving O(log_d n) I/O cost — typically 3–4 disk accesses for billion-key indexes.

B-trees and their variants (B+, B*) power PostgreSQL, SQLite, NTFS, ext4, and most databases.

Key points

  • Cache-aware algorithms align data to cache-line boundaries to minimize evictions.
  • B-tree height grows logarithmically with branching factor — wider nodes mean shallower trees.
  • External merge-sort processes data in runs sized to RAM, then merges from disk.

Code examples

B-tree node skeleton
B-tree node skeleton
class BTreeNode:
    def __init__(self, t, leaf=False):
        self.t = t                # minimum degree
        self.leaf = leaf
        self.keys = []            # at most 2t - 1
        self.children = []        # at most 2t

    def search(self, k):
        i = 0
        while i < len(self.keys) and k > self.keys[i]:
            i += 1
        if i < len(self.keys) and self.keys[i] == k:
            return (self, i)
        if self.leaf:
            return None
        return self.children[i].search(k)
External merge-sort sketch
External merge-sort sketch
# 1. Read M-sized chunks of the file, sort each in RAM,
#    write back as a sorted "run".
# 2. K-way merge the runs using a min-heap that holds the
#    current smallest unread element from each run.
# 3. Total I/O: O(N * log_M (N/M)) block transfers.