Writing a Huffman Compressor: Prefix Codes, Bit Packing, and the Header Problem
Compression is one of those topics that sits in a peculiar spot: everyone uses it constantly, almost nobody writes one. The algorithms behind gzip, zip, zstd get treated as a black box you pipe bytes through. Huffman coding is the smallest interesting member of that family, and writing one by hand is the cleanest way to surface the parts of compression that aren’t really about compression at all: how you pack variable-length codes into fixed-size bytes, how the decoder finds the table, and what “the end of the file” even means when your last byte is half real data and half padding. ...