
Windows Registry internals: a field guide to the regf format


Most people interact with the registry as a tree of keys and values in RegEdit, or as a list of artifacts in a forensic tool. Underneath that tidy abstraction is a binary database format — regf — that has accreted features for thirty years while staying backward-compatible. The day you hit a corrupted hive, a tool that silently drops half the keys, or a value that comes back truncated, the abstraction stops helping and you need to know what is actually in the file.

This is a field guide to those internals. Each section links a focused deep-dive that explains one structure, what it means, and what it implies for anyone parsing, recovering, or trusting a hive. We parse these structures in the browser; if you want a gentler on-ramp, start with the existing regf hive format overview.

A note on sources: the single best public research on registry internals is Google Project Zero's Windows Registry Adventure series by Mateusz Jurczyk. The posts below are our own explanations — written from a parser's point of view — and cite that series and other primary sources as authoritative further reading.

The on-disk format (regf)

A hive is a base block followed by a sequence of allocation containers full of typed records. Get the format right and everything else follows; get it wrong and you corrupt your analysis silently.
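As a rough illustration of that layout, here is a minimal Python sketch that skips the 4 KB base block and walks each hbin's cells, using the sign of each cell's size field to tell allocated cells (negative) from free ones (positive). It assumes the commonly documented layout, which this article does not spell out: a 32-byte hbin header that starts with `hbin`, the bin's size at offset 8 of that header, and 8-byte-aligned cells whose size includes the 4-byte size field. `iter_cells` is a hypothetical name, and a real parser should report a malformed bin rather than quietly stopping.

```python
import struct
from typing import Iterator, Tuple

BASE_BLOCK_SIZE = 4096  # the regf base block fills the first 4 KB of the file
HBIN_HEADER_SIZE = 32   # assumed hbin header size, per public format write-ups


def iter_cells(hive: bytes) -> Iterator[Tuple[int, int, bool]]:
    """Yield (file_offset, cell_size, allocated) for every cell in every hbin."""
    offset = BASE_BLOCK_SIZE
    while offset + HBIN_HEADER_SIZE <= len(hive):
        if hive[offset:offset + 4] != b"hbin":
            break  # end of bins, or damage; a real parser should say which
        bin_size = struct.unpack_from("<I", hive, offset + 8)[0]
        if bin_size < HBIN_HEADER_SIZE or offset + bin_size > len(hive):
            break  # implausible bin size
        end = offset + bin_size
        cell = offset + HBIN_HEADER_SIZE
        while cell + 4 <= end:
            raw = struct.unpack_from("<i", hive, cell)[0]  # signed 32-bit size
            size = abs(raw)
            if size < 8 or cell + size > end:
                break  # cells are 8-byte aligned and must not cross the bin
            # Negative size means allocated; positive means free (recovery candidates).
            yield cell, size, raw < 0
            cell += size
        offset += bin_size
```

Filtering the output for `allocated == False` gives a first list of free cells, which is where deleted keys and values may still be found.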

  • The regf base block — the hive header: signature, sequence numbers, root cell, checksum, and why you validate it first.
  • Hive bins and cells — how a hive allocates space in 4 KB hbins and variable cells, and why the signed cell-size field marks free space (the basis of recovery).
  • The nk record — how a key is stored, including the LastWritten timestamp DFIR relies on.
  • The vk record — how a value is stored, and the inline small-data trick.
  • Subkey lists: lf, lh, li, ri — the four cell types that index a key's children.
  • Big-data records — how values larger than ~16 KB are split across segments, and how parsers that mishandle them silently truncate the data.
  • The sk record — the shared, reference-counted security descriptors that hold each key's ACL.
  • Registry value types on disk — how each REG_* type is encoded, and why the stored type can't be trusted to match the data.
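To make the nk, vk, and value-type bullets concrete, here is a minimal Python sketch that reads a key's LastWritten timestamp and a value's name, type, and inline data, then decodes the data leniently. The offsets it uses (a FILETIME at offset 4 of an nk; name length, data size, data offset, type, and flags starting at offset 2 of a vk; the high bit of the data size marking data stored inline) come from public format write-ups rather than this article, and every function name here is hypothetical. Both parsers expect the cell payload with the 4-byte size field already stripped.

```python
import struct
from datetime import datetime, timedelta, timezone

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def nk_last_written(nk: bytes) -> datetime:
    """Return an nk record's LastWritten time (FILETIME assumed at offset 4)."""
    if nk[:2] != b"nk":
        raise ValueError("not an nk record")
    ticks = struct.unpack_from("<Q", nk, 4)[0]  # 100-ns intervals since 1601-01-01 UTC
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)


def parse_vk(vk: bytes) -> dict:
    """Parse a vk record; resolves inline data, leaves other data offsets unresolved."""
    if vk[:2] != b"vk":
        raise ValueError("not a vk record")
    name_len, data_size, data_off, data_type, flags = struct.unpack_from("<HIIIH", vk, 2)
    raw_name = vk[20:20 + name_len]
    # Flag 0x0001 (assumed): name stored as 8-bit characters instead of UTF-16LE.
    name = raw_name.decode("latin-1") if flags & 0x0001 else raw_name.decode("utf-16-le")
    inline = bool(data_size & 0x80000000)
    size = data_size & 0x7FFFFFFF
    # Inline trick: data of 4 bytes or less lives in the data-offset field itself.
    data = struct.pack("<I", data_off)[:size] if inline else None
    return {"name": name, "type": data_type, "size": size, "inline": inline,
            "data_offset": None if inline else data_off, "data": data}


def decode_value(value_type: int, data: bytes):
    """Decode by declared REG_* type, but return raw bytes when the data doesn't fit it."""
    if value_type in (1, 2):  # REG_SZ, REG_EXPAND_SZ
        if len(data) % 2 == 0:
            return data.decode("utf-16-le", errors="replace").split("\x00", 1)[0]
    elif value_type == 4 and len(data) == 4:  # REG_DWORD
        return struct.unpack("<I", data)[0]
    elif value_type == 5 and len(data) == 4:  # REG_DWORD_BIG_ENDIAN
        return struct.unpack(">I", data)[0]
    elif value_type == 11 and len(data) == 8:  # REG_QWORD
        return struct.unpack("<Q", data)[0]
    elif value_type == 7 and len(data) % 2 == 0:  # REG_MULTI_SZ
        return [s for s in data.decode("utf-16-le", errors="replace").split("\x00") if s]
    return data  # REG_BINARY, unknown types, or data that contradicts its declared type
```

Falling back to raw bytes, instead of raising, is a deliberate choice: a REG_DWORD with two bytes of data is itself a finding worth surfacing, not a reason to drop the value.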

The logical layout

The file format is only half the story. The live registry is assembled from several hives — some of which never touch disk.

The runtime

How the kernel turns on-disk cells into the live registry — and why none of it survives in a static file.

Integrity and recovery

The registry is a database, with the crash-consistency machinery to match.
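The first visible piece of that machinery sits in the base block. Below is a minimal Python sketch of the validation pass the base-block bullet recommends: check the `regf` signature, compare the primary and secondary sequence numbers (a mismatch suggests an interrupted write, so the transaction logs may hold newer data), read the root cell offset, and verify the checksum. The field offsets (sequence numbers at 4 and 8, root cell at 0x24, checksum at 0x1FC) and the checksum rule (XOR of the first 508 bytes as 127 little-endian dwords, with 0 and 0xFFFFFFFF remapped) are taken from public format write-ups, not from this article; `check_base_block` and `base_block_checksum` are hypothetical names.

```python
import struct


def base_block_checksum(block: bytes) -> int:
    """XOR of the first 508 bytes, read as 127 little-endian 32-bit words."""
    csum = 0
    for (dword,) in struct.iter_unpack("<I", block[:508]):
        csum ^= dword
    # Remapping as described in public regf specifications (assumption).
    if csum == 0xFFFFFFFF:
        csum = 0xFFFFFFFE
    elif csum == 0:
        csum = 1
    return csum


def check_base_block(block: bytes) -> dict:
    if len(block) < 4096 or block[:4] != b"regf":
        raise ValueError("not a regf base block")
    primary, secondary = struct.unpack_from("<II", block, 4)
    root_cell = struct.unpack_from("<I", block, 0x24)[0]
    stored = struct.unpack_from("<I", block, 0x1FC)[0]
    return {
        # Unequal sequence numbers: a write did not complete; consult the logs.
        "dirty": primary != secondary,
        # Offset relative to the first hbin, i.e. file offset 4096 + this value.
        "root_cell_offset": root_cell,
        "checksum_ok": stored == base_block_checksum(block),
    }
```

A dirty flag or a checksum mismatch does not make the hive unreadable; it tells you the primary file alone may not reflect the last state Windows saw.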

Security and history

Why this matters for analysis

Every forensic artifact you pull from the registry — the RegRipper-style plugins, the timelines, the findings — rests on correctly reading these structures. A parser that misreads a cell size, ignores transaction logs, or truncates a big-data value doesn't error out; it quietly gives you less than the hive contains. Understanding the format is how you know whether to trust the output. You can load a hive and inspect it yourself — entirely client-side, with nothing uploaded.
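As one example of refusing to "quietly give you less", here is a minimal sketch of reassembling a big-data value that raises an error instead of returning a short buffer. It assumes the layout described in public regf documentation rather than in this article: a `db` cell holding a segment count at offset 2 and the offset of a segment-list cell at offset 4, with each segment carrying up to 16,344 bytes of data. `read_big_data` and the `read_cell` callback are hypothetical names standing in for whatever cell lookup your parser already has.

```python
import struct
from typing import Callable

MAX_SEGMENT = 16344  # data bytes per big-data segment (assumed, per public docs)


def read_big_data(db: bytes, value_size: int, read_cell: Callable[[int], bytes]) -> bytes:
    """Reassemble a value stored as a db record, or fail loudly.

    read_cell (hypothetical) maps a cell offset, relative to the first hbin,
    to that cell's payload without its 4-byte size field.
    """
    if db[:2] != b"db":
        raise ValueError("not a db record")
    count, list_offset = struct.unpack_from("<HI", db, 2)
    segment_offsets = struct.unpack_from(f"<{count}I", read_cell(list_offset))
    out = bytearray()
    for off in segment_offsets:
        remaining = value_size - len(out)
        if remaining <= 0:
            break
        # Only the first min(MAX_SEGMENT, remaining) bytes of a segment are value data.
        out += read_cell(off)[:min(MAX_SEGMENT, remaining)]
    if len(out) != value_size:
        raise ValueError(f"big-data value short: {len(out)} of {value_size} bytes")
    return bytes(out)
```

Raising on a short read is the point: the size recorded in the vk record is the ground truth, and a mismatch is evidence of damage or a parser bug, not something to paper over.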