Writing a Huffman Compressor: Prefix Codes, Bit Packing, and the Header Problem

Compression is one of those topics that sits in a peculiar spot: everyone uses it constantly, almost nobody writes one. The algorithms behind gzip, zip, zstd get treated as a black box you pipe bytes through. Huffman coding is the smallest interesting member of that family, and writing one by hand is the cleanest way to surface the parts of compression that aren’t really about compression at all: how you pack variable-length codes into fixed-size bytes, how the decoder finds the table, and what “the end of the file” even means when your last byte is half real data and half padding. ...

May 12, 2026 · Giulia

Writing a JSON Parser From Scratch: Lexer, Recursive Descent, and the Cases You Forget

I’ve been reading JSON for years without ever writing the code that parses it. That’s a strange gap: it’s a format we touch most days, and the one whose internals I’d thought about least. So I sat down one evening and wrote one from scratch in Python, no json.loads, no regex shortcuts, just the spec and a blank file. The structure is the textbook one: a lexer that turns characters into tokens, and a recursive-descent parser that turns tokens into a tree. What I want to write about is less the structure itself (that part is well-trodden) and more the places where the spec is sharper than it looks, and where a reasonable-seeming shortcut quietly produces a parser that accepts invalid input. ...

May 10, 2026 · Giulia

Reimplementing wc: Where Text Abstractions Leak

Reimplementing wc is less about counting and more about confronting the places where “text” stops being a clean abstraction. The interface is trivial; the semantics are not. Most of the interesting behavior lives at the boundary between bytes, encodings, and Unix I/O conventions. Below is a minimal clone. It’s deliberately scoped: correct along a few dimensions, incomplete along others. #!/usr/bin/env python3 import argparse, sys def get_bytes(raw): return len(raw) def get_lines(text): return text.count('\n') def get_words(text): return len(text.split()) def get_chars(text): return len(text) def main(): parser = argparse.ArgumentParser() parser.add_argument('file_path', nargs='?') parser.add_argument('-c', action="store_true") parser.add_argument('-l', action="store_true") parser.add_argument('-w', action="store_true") parser.add_argument('-m', action="store_true") args = parser.parse_args() if args.file_path: with open(args.file_path, "rb") as f: raw = f.read() label = args.file_path elif not sys.stdin.isatty(): raw = sys.stdin.buffer.read() label = None else: return text = raw.decode('utf-8') if not any([args.c, args.l, args.w, args.m]): args.c = args.l = args.w = True results = [] if args.l: results.append(str(get_lines(text))) if args.w: results.append(str(get_words(text))) if args.c: results.append(str(get_bytes(raw))) if args.m: results.append(str(get_chars(text))) parts = ' '.join(results) print(f"{parts} {label}" if label else parts) if __name__ == "__main__": main() Bytes vs characters is an interface boundary, not trivia The -c / -m distinction is where most implementations quietly diverge from spec. ...

May 4, 2026 · Giulia

C: The Memory Model and Stack Frames

This is part two of two. Part one covers variables, symbolic constants, and the preprocessor. C always evaluates expressions before executing C evaluates every expression before running it. If main is missing, the linker signals an error, that’s the entry point contract. Parameters to main (argc, argv) behave like local variables. No garbage collector In C there’s no garbage collector. Local variables are not “objects” managed by a runtime; they’re temporary allocations on registers or the stack. When the function returns, they’re gone. There’s no tracing, no finalization, no safety net. This is by design. ...

March 28, 2026 · Giulia

C: Variables, Symbolic Constants, and the Preprocessor

This is part one of two. Part two covers the C memory model and stack frames. -Wall If I use the -Wall flag, the compiler emits warnings. Simple rule: always use it. 0 means more than zero The value 0 is equivalent to false in C boolean expressions. It also shows up in shell short-circuit evaluation. In an expression like ./a.out && ls, ls only executes if ./a.out returns 0. That’s because 0 is the success exit code, and && short-circuits on failure. So ./a.out returning 0 makes ls run. The 0 in the exit code sense is “success / true” from the shell’s perspective, opposite of the C boolean convention, which trips people up. ...

March 22, 2026 · Giulia

What Runs Before Your Program Does: The Dynamic Linker, PLT, and GOT

In the previous post, we saw that dynamically linked binaries don’t include library code, they just reference it. Which raises an obvious question: who actually loads that code at runtime, and how does the program find it? The answer is ld.so, the dynamic linker. And it runs before your main() does. The Bootstrap Problem When you execute a binary, the kernel reads its ELF header. One of the fields in that header (stored in the .interp section) contains a path like: ...

March 14, 2026 · Giulia

What Actually Happens When You Compile a C Program

Most developers treat compilation as a black box: you put source code in, you get a binary out. But there are actually three distinct stages happening, each with its own job and its own output. Understanding them has made me significantly better at debugging linker errors, understanding dependencies, and reasoning about what ends up in a binary. Let’s walk through what happens when you run gcc hello.c -o hello. Stage 1: The Preprocessor Before the compiler sees a single line of your code, the preprocessor runs. Its job is mechanical: it handles all the # directives. ...

March 8, 2026 · Giulia