Subkey lists: the lf, lh, li and ri cells that index the registry

A registry key does not store its children inline. The nk record that represents the key carries a single offset to a separate cell — the subkey list — and that cell is where the actual index of children lives. There is not one subkey list format. There are four: lf, lh, li and ri. They are the registry index, and a parser that handles only the common one will silently lose keys in exactly the hives you most want to read correctly. This post covers what each of those four cell types is, why there are four, the name-hash trick lh uses, and why a parser has to walk an index root recursively or quietly drop half a key's children.

If you have not read the regf hive format overview or the nk key node record breakdown, start there — this is a zoom-in on one field of the nk record, and it assumes you know what a cell is.

Where the list pointer comes from

Each nk record has a subkey count and a subkey-list offset. Dereference that offset and you land on a cell whose first two bytes are an ASCII signature telling you which of the four shapes you have: lf (fast leaf), lh (hash leaf), li (index leaf), or ri (index root, a list of lists).

All four share the same two-field header: a 2-byte signature at offset 0x00, and a 2-byte element count at offset 0x02. After that the layout diverges — that shared prefix is the only thing you can assume before branching on the signature.

li — the plain list

li is the simplest and the oldest. After the count, it is nothing but an array of 4-byte cell offsets, one per child key. Each offset points directly at a child nk. No hint, no hash, no extra metadata — four bytes per entry.

That compactness is also its weakness. To find a subkey by name in an li, you dereference each offset, read the nk it points to, pull the name out, and compare. The names you are searching live scattered across the hive in separate cells, so every comparison is a pointer chase to a different region of the file — brutal on a cold cache. li minimizes storage at the cost of lookup locality. It has been valid since the first regf versions, and you still see it in modern hives.

lf — a list with a 4-character name hint

lf keeps the offsets but interleaves a hint. Each element is 8 bytes: a 4-byte cell offset to the child nk, followed by a 4-byte hint that packs the first four characters of the child's name into the 32 bits — first character in the low byte, and so on, stopping early if a character does not fit in a single byte.

The point is locality. With lf, the first four characters of every sibling name sit right in the list cell, so a lookup compares against the hints without touching a single child nk; only when the prefix matches do you dereference and read the full name to disambiguate. For a key with hundreds of children, that turns a wall of pointer chases into a tight scan over one contiguous cell. lf is the Windows NT 4.0-era answer to li's cache behavior.

lh — a list with a full-name hash

lh is structurally identical to lf — 8-byte elements, a 4-byte offset followed by a 4-byte value — but the second field is a hash of the entire uppercased name, not a four-character prefix. It is the modern default; it arrived with Windows XP (hive version 1.5) and is what you will see on the overwhelming majority of contemporary keys.

The hash is a plain multiplicative rolling hash over the name's characters, uppercased for case-insensitivity:

hash = 0
for each character c in name:
    hash = (hash * 37 + uppercase(c)) mod 2^32

That is the whole algorithm: multiply the running hash by 37, add the uppercased character, keep it inside 32 bits. The uppercasing follows the registry's case-insensitive comparison rules; for the ASCII names that dominate real hives, plain ASCII uppercasing reproduces it.

Why a hash over the full name instead of a prefix? The lf hint only helps when names diverge in their first four characters. A key full of children named Microsoft.Something.A, Microsoft.Something.B, and so on shares its entire prefix, so the hint is dead weight — every lookup falls through to the full-name comparison anyway. The lh hash distributes across the whole name, so two long names differing only in their last character still get distinct hashes, and it handles non-ASCII names cleanly.

For parser authors: the value in an lh element is a hash, not an order key. The list is still sorted by name (see below), so the hash is not for searching — it lets you reject non-matches cheaply before reading any child record. Reproduce it exactly, uppercasing and 32-bit truncation included, and you can validate a hive's index against its key names — a genuinely useful corruption check.

ri — an index of indexes

The leaf types (li, lf, lh) each live in a single cell, and a cell lives inside one hbin allocation. The kernel caps a single leaf at around a thousand entries, so a key with more children than that needs the format to go two-level.

ri is the root index. Its element array, after the count, is a list of 4-byte offsets — but each offset points at another subkey list (an li, lf, or lh leaf), not at an nk. The children are partitioned across several leaves and the ri stitches them back in order. This is the escape hatch for keys with many thousands of subkeys: the Services enumeration, large SOFTWARE vendor trees, anything that has accumulated a lot of children over time.

The hierarchy is two levels deep: an ri points at leaves, the leaves point at keys. In well-formed hives an ri does not point at another ri. That bound matters for the parser — more on that next.

Sorted order, and why it is not decoration

Across all of these, the subkey list is kept in case-insensitive lexicographic order: each entry's name is strictly greater than the one before it. Strictly — duplicates are not allowed, which is how the format enforces that two sibling keys cannot share a name.

The sort order is what makes lookups fast. The kernel resolves a path like HKLM\SYSTEM\CurrentControlSet\Services\Tcpip one component at a time, searching a sorted list at each level, so it binary-searches rather than scans. The lf/lh hints make each comparison cheap; the sort order makes the number of comparisons logarithmic. For an ri, the leaves are themselves ordered, so the search picks the right leaf first, then searches within it.

For a forensic parser, the sort order is both a convenience and a tripwire. Reconstructing children, you can rely on them coming out ordered. But a hive that has been tampered with — or recovered from unallocated space — may have a list whose ordering is broken, and a parser that assumes order and binary-searches will then fail to find keys that are physically present. When dumping the full tree, never binary-search: walk the entire list linearly and trust the offsets, not the ordering. Save the binary search for when you are deliberately emulating kernel lookup behavior.

Walking it, defensively

The parser obligation is straightforward to state and easy to get wrong: read the signature, and dispatch.

li — read N offsets, each is a child nk.
lf / lh — read N elements of 8 bytes, take the first 4 as the child offset, ignore or validate the hint/hash.
ri — read N offsets, recurse into each, because each one is another list, and concatenate the results in order.

That recursion is the step homegrown parsers skip. Treat an ri element as a child nk offset and feed it to your record reader, and you parse the leaf cell's bytes as if they were a key node — garbage, then either a crash or, worse, a phantom key. The symptom in the field is a key that reports thousands of subkeys in its nk count but renders only a handful, because the tool read the first leaf the ri points at and stopped. Cross-validating subkey counts between two parsers, as covered in the regf format post, catches exactly this class of bug.

A few more robustness rules earn their keep on real-world and adversarial hives:

Bound the recursion. A corrupt or crafted hive can nest ri inside ri, or point an ri element back at itself, turning an unbounded recursive walk into a stack overflow or an infinite loop. Track visited offsets and cap the depth.
Validate the count against the cell size. The element count must fit inside the cell the list lives in. A count of 65535 in a 16-byte cell is corruption or a deliberate parser bomb; reject it rather than read off the end of the allocation.
Branch on the signature, and treat the unexpected as an error. A cell whose signature is none of li/lf/lh/ri where a subkey list was expected means the offset is wrong or the hive is damaged. Surface that against the parent key, not as a silent empty child set.
Reconcile the nk subkey count with what you walked. If the key claims 1,500 children and your walk produced 12, you stopped early — almost always at an unhandled ri. That mismatch is your single best automated signal that the index walk went wrong.

Mature libraries get all of this right. libregf and the hivex library both implement the full li/lf/lh/ri dispatch with the recursion and the bounds checks; reading their subkey-list code confirms your own handling. The freshly written script that "parses the registry" in a blog post usually handles lh and falls over on ri.

Why four types, in one paragraph

The four types are an accretion of the same idea — index a key's children for fast, ordered lookup — re-solved as constraints changed. li came first and optimizes for space. lf (NT 4.0) added a name-prefix hint to fix li's cache locality. lh (XP) replaced the prefix with a full-name hash, because prefixes are useless when sibling names share a stem and a hash handles the whole character cleanly. ri is orthogonal: the scaling mechanism that lets any leaf type serve a key with far more children than one cell can hold. The kernel still accepts all of them, so any given hive is a mix — which is why "handles lh" is not the same as "handles subkey lists." The registry index is all four, or it is incomplete.