The vk record: how the registry stores a value
A registry value is the leaf of the tree — the thing analysts actually quote in reports: the Run entry pointing at a dropper, the service ImagePath, the AppInit_DLLs string. All of it bottoms out in one cell type, the vk record, short for value key. To read a registry value out of a hive by hand, or to trust the parser that does it for you, you need to know what a value key cell looks like on disk, because the layout has two traps that catch most homegrown parsers: the name encoding flag, and the inline small-data optimization. This post walks the structure field by field.
This is part of a registry-internals series. If you have not read the regf hive format overview, start there — it covers the base block, HBIN allocation, and the signed cell-size convention everything below assumes. The companion piece on the nk key node record covers the keys that own these values, and Windows registry internals gives the broader picture.
Where a vk lives
A vk does not float free. Each nk (key node) carries a value count and a pointer to a value list — a cell holding a flat array of 4-byte offsets, one per value, each pointing at a vk cell. Reading a key's values means: read the nk, follow the value-list offset, dereference each entry. The vk records for a single key are not required to be adjacent in the file, and rarely are after a hive has churned. A deleted value leaves its vk cell marked free (positive cell size) but the bytes survive until the slot is reused, so carving for orphaned vk records is a standard recovery move.
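As a minimal sketch of that walk, assuming you have already pulled the value count and the value-list offset out of the nk (the nk layout is covered in the companion post), the list can be read like this. The names `hive_data`, `cell_payload`, and `value_offsets` are illustrative, and `hive_data` is assumed to hold the bytes that follow the base block, so that cell offsets index into it directly.

```python
import struct


def cell_payload(hive_data: bytes, offset: int) -> bytes:
    """Return the bytes of the cell at `offset`, minus its 4-byte size prefix."""
    (size,) = struct.unpack_from("<i", hive_data, offset)
    length = abs(size)  # negative size = allocated, positive = free
    return hive_data[offset + 4 : offset + length]


def value_offsets(hive_data: bytes, list_offset: int, count: int) -> list[int]:
    """Read `count` 4-byte vk offsets from the value-list cell."""
    payload = cell_payload(hive_data, list_offset)
    return list(struct.unpack_from(f"<{count}I", payload, 0))
```

Each returned offset points at the cell-size field of a vk cell; the vk signature starts four bytes later.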
The vk layout
The fixed portion of a vk cell — after the 4-byte signed cell size that prefixes every cell — is 20 bytes, followed by the variable-length value name. The libregf and yarp reverse-engineered specs agree; the offsets below are relative to the start of the vk record (the signature), not the cell-size field.
| Offset | Size | Field |
|---|---|---|
| 0x00 | 2 | Signature vk (0x76 0x6B) |
| 0x02 | 2 | Value name size (bytes) |
| 0x04 | 4 | Data size |
| 0x08 | 4 | Data offset (or inline data) |
| 0x0C | 4 | Data type |
| 0x10 | 2 | Flags |
| 0x12 | 2 | Padding / unused |
| 0x14 | variable | Value name |
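The table maps directly onto a single `struct` format string. The sketch below is one way to unpack the fixed 20 bytes in Python; the `VkHeader` and `read_vk_header` names are illustrative, not taken from any existing parser.

```python
import struct
from typing import NamedTuple

# Little-endian, no padding: 2 + 2 + 4 + 4 + 4 + 2 + 2 = 20 bytes.
VK_HEADER = struct.Struct("<2sHIIIHH")


class VkHeader(NamedTuple):
    signature: bytes   # 0x00
    name_size: int     # 0x02, in bytes
    data_size: int     # 0x04, high bit may be set (inline data)
    data_offset: int   # 0x08, pointer or inline data
    data_type: int     # 0x0C, REG_* constant
    flags: int         # 0x10
    spare: int         # 0x12, padding / unused


def read_vk_header(buf: bytes, offset: int) -> VkHeader:
    """Unpack the fixed vk header; `offset` points at the signature."""
    hdr = VkHeader(*VK_HEADER.unpack_from(buf, offset))
    if hdr.signature != b"vk":
        raise ValueError(f"no vk signature at offset {offset:#x}")
    return hdr
```

The value name follows immediately at `offset + VK_HEADER.size`, that is, at 0x14.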
Five of those fields do real work.
Signature
Two bytes, ASCII vk, the same way nk, lh, and sk jump out in a hex editor. It is the first sanity check when carving: a candidate offset that does not begin with vk is not a value record.
Value name size and the name flag
The name size at offset 0x02 is the length of the name in bytes, not characters. The name begins at offset 0x14 and runs for that many bytes; there is no terminator, so the count is authoritative.
The encoding is governed by the Flags field at offset 0x10. Bit 0x0001 (VALUE_COMP_NAME, "compressed name") means the name is a single-byte ASCII string; if clear, it is UTF-16 little-endian. This is the same distinction the nk record uses for key names, and it is the field homegrown parsers most often ignore. A parser that always assumes ASCII garbles every name stored as UTF-16, which is common in localized hives; one that always assumes UTF-16 decodes pairs of ASCII bytes as single characters, or, if it treats the size as a character count, reads garbage past the end of the name. Because the size is in bytes, a UTF-16 name of n characters reports a name size of 2n. Read the flag.
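A minimal sketch of the decision, assuming the raw name bytes have already been sliced out using the byte count. The function name is illustrative, and decoding compressed names as Latin-1 rather than strict ASCII is an assumption made here so that a stray high byte does not raise.

```python
VALUE_COMP_NAME = 0x0001  # flag bit: name stored one byte per character


def decode_value_name(raw: bytes, flags: int) -> str:
    """Decode a vk name according to the compressed-name flag."""
    if flags & VALUE_COMP_NAME:
        # Assumption: Latin-1 maps every byte to a character, so it never fails.
        return raw.decode("latin-1")
    # Flag clear: UTF-16LE, two bytes per code unit.
    return raw.decode("utf-16-le", errors="replace")
```

An empty `raw` decodes to an empty string in both branches, which is how the default value shows up.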
The default value
A vk with a name size of zero is the default value of its key — what regedit displays as (Default). There is no name string; the region is empty. It is not a special record type, just a vk with no name, and most tooling renders it as a friendly label rather than as "this value has no name." When matching against a path, the default value is addressed by the key path alone, with no trailing value component.
Data type
Four bytes at offset 0x0C, holding the REG_* type constant — the standard Win32 registry value types the API exposes:
- REG_SZ (1): a null-terminated UTF-16 string.
- REG_EXPAND_SZ (2): a string with unexpanded environment references such as %SystemRoot%.
- REG_BINARY (3): raw bytes, no interpretation.
- REG_DWORD (4): a 32-bit little-endian integer.
- REG_DWORD_BIG_ENDIAN (5): rare, but it exists.
- REG_MULTI_SZ (7): a sequence of UTF-16 strings, double-null terminated.
- REG_QWORD (11 / 0x0B): a 64-bit little-endian integer.
The type is a hint about how to interpret the bytes; the hive does not enforce it. A value typed REG_SZ can hold bytes that are not valid UTF-16, and malware occasionally exploits exactly that — storing a binary payload under a string type so string-oriented tools truncate it at the first embedded null. Surface the declared type, but never let it stop you inspecting the raw bytes.
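One way to honor both halves of that advice is to return the interpreted value and the raw bytes together, and to give up on interpretation rather than on the value when the bytes do not fit the declared type. The sketch below is illustrative only; the `interpret` name and its return shape are assumptions of this example.

```python
REG_SZ, REG_EXPAND_SZ, REG_BINARY, REG_DWORD = 1, 2, 3, 4
REG_DWORD_BIG_ENDIAN, REG_MULTI_SZ, REG_QWORD = 5, 7, 11


def interpret(data_type: int, data: bytes):
    """Return (decoded, raw); decoded is None when the bytes do not fit the type."""
    decoded = None
    try:
        if data_type in (REG_SZ, REG_EXPAND_SZ):
            # Strip only trailing nulls so embedded nulls stay visible.
            decoded = data.decode("utf-16-le").rstrip("\x00")
        elif data_type == REG_MULTI_SZ:
            decoded = [s for s in data.decode("utf-16-le").split("\x00") if s]
        elif data_type == REG_DWORD and len(data) == 4:
            decoded = int.from_bytes(data, "little")
        elif data_type == REG_DWORD_BIG_ENDIAN and len(data) == 4:
            decoded = int.from_bytes(data, "big")
        elif data_type == REG_QWORD and len(data) == 8:
            decoded = int.from_bytes(data, "little")
    except UnicodeDecodeError:
        decoded = None  # declared as a string, but not valid UTF-16
    return decoded, data
```

A caller can then show `decoded` when it exists and fall back to a hex dump of `raw` when it does not, without ever discarding bytes.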
Data size, and the inline-data trick
The data size at offset 0x04 is the length of the value's data, in bytes. The data offset at 0x08 normally points to a separate cell holding those bytes — the same relative-to-hive-data convention used everywhere in the format.
The trap is the high bit. If the most significant bit of the data size (0x80000000) is set, the data is stored inline in the data-offset field itself, and the low 31 bits give the real length. Since that field is only 4 bytes wide, inline data is at most 4 bytes — exactly enough to hold a REG_DWORD without paying for a second cell. It is a space optimization, and it is everywhere: a large fraction of values in a real hive are small DWORDs stored this way.
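The byte placement within the field is the subtle part. Data shorter than four bytes occupies the low-order end of the little-endian field, which is the start of the field in file order:
- A data size of 4 uses all four bytes of the data-offset field.
- A data size of 2 uses the two low-order bytes, the first two bytes of the field in file order.
- A data size of 1 uses the low-order byte, the first byte of the field in file order.
- A data size of 0 means no data; the offset is meaningless.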
So you mask off 0x80000000 first. If it was set, the remaining length says how many bytes to take from the data-offset field in place. If it was clear, the field is a real pointer to a data cell. A parser that forgets the mask treats the inline length (a number like 0x80000004) as an allocation of more than two billion bytes and crashes or refuses the value; one that forgets to follow the offset for non-inline values emits the offset integer as if it were data. Both bugs are common, and both are caught instantly by cross-checking against a mature tool.
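The masking step is short enough to show in full. This is a sketch under stated assumptions: `vk` is a bytes object starting at the vk signature, the function names are illustrative, and short inline data is taken from the first bytes of the field in file order (the low-order bytes of the little-endian integer).

```python
import struct
from typing import Optional

INLINE_FLAG = 0x80000000


def split_data_size(raw_size: int) -> tuple[bool, int]:
    """Split the raw data-size field into (is_inline, real_length)."""
    return bool(raw_size & INLINE_FLAG), raw_size & 0x7FFFFFFF


def inline_data(vk: bytes) -> Optional[bytes]:
    """Return the inline bytes, or None when the data lives in another cell."""
    (raw_size,) = struct.unpack_from("<I", vk, 0x04)
    is_inline, length = split_data_size(raw_size)
    if not is_inline:
        return None  # the field at 0x08 is a real offset; follow it instead
    if length > 4:
        raise ValueError(f"inline length {length} exceeds the 4-byte field")
    # Assumption: short data starts at the first byte of the field.
    return vk[0x08 : 0x08 + length]
```

Returning `None` for the non-inline case keeps the two bugs described above apart: the caller cannot mistake an offset for data, because it never receives the offset as bytes.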
Larger values: the data cell
For data longer than 4 bytes and up to roughly 16 KB, the high bit is clear and the data-offset field points at an ordinary data cell — an allocated cell with no signature of its own, whose payload is the raw value bytes. The vk's data-size field is authoritative for how many bytes are valid; the cell may be larger due to allocation granularity, so trust the size field, not the cell length.
This is where a classic edge case lives: a vk claiming a data size larger than the cell that holds it. The convention is to truncate to what the cell supplies, but tools differ on whether they truncate silently, warn, or reject. If two parsers disagree on a value's contents, a size-versus-cell mismatch is the first thing to check.
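A sketch of the truncating convention, which also reports the mismatch instead of hiding it. The `read_data_cell` name and the returned flag are illustrative; `hive_data` is assumed to start at the first byte of hive data so that cell offsets index into it directly.

```python
import struct


def read_data_cell(hive_data: bytes, data_offset: int, data_size: int) -> tuple[bytes, bool]:
    """Return (data, truncated) for a value stored in a single data cell."""
    (cell_size,) = struct.unpack_from("<i", hive_data, data_offset)
    capacity = abs(cell_size) - 4          # the cell size includes its own 4-byte prefix
    truncated = data_size > capacity       # the vk claims more than the cell holds
    length = min(data_size, capacity)      # trust the size field, bounded by the cell
    start = data_offset + 4
    return hive_data[start : start + length], truncated
```

Surfacing `truncated` to the caller makes the choice between warning and rejecting a policy decision rather than a parser accident.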
Values over 16 KB: big-data records
Once a value's data exceeds the threshold a single cell can hold (about 16 KB in modern hives; Maxim Suhanov's specification gives the figure as 16,344 bytes), the format stops storing it contiguously. The data-offset field instead points at a big-data (db) record, an indirection layer holding a count of segments and a pointer to a list of cell offsets, each pointing at a chunk. Reassembling the value means walking that list and concatenating the chunks in order, using the vk's data-size field to know where the real data ends in the final chunk.
This matters because large REG_BINARY and REG_MULTI_SZ values — certificate blobs, Shimcache-style caches, serialized structures — routinely cross the threshold. A parser that reads the data-offset field as a flat data cell hands you the first 16 KB and silently drops the rest, or reads the db record's segment pointers as if they were value bytes. The full mechanics are covered in the post on registry big-data records.
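For orientation, here is a rough sketch of the reassembly. The db layout used here is an assumption not spelled out in this post: a two-byte db signature, a two-byte segment count, then a four-byte offset to the segment list, with each segment contributing at most 16,344 bytes. Treat the function names and the `SEGMENT_MAX` constant as illustrative and check them against the big-data post before relying on them.

```python
import struct

SEGMENT_MAX = 16344  # assumption: maximum data bytes carried by one segment


def payload(hive_data: bytes, offset: int) -> bytes:
    """Bytes of the cell at `offset`, minus its 4-byte size prefix."""
    (size,) = struct.unpack_from("<i", hive_data, offset)
    return hive_data[offset + 4 : offset + abs(size)]


def read_big_data(hive_data: bytes, db_offset: int, data_size: int) -> bytes:
    """Concatenate db segments in order, stopping at the vk's data size."""
    db = payload(hive_data, db_offset)
    signature, count, list_offset = struct.unpack_from("<2sHI", db, 0)
    if signature != b"db":
        raise ValueError("data offset does not point at a db record")
    segment_list = payload(hive_data, list_offset)
    out = bytearray()
    for seg_offset in struct.unpack_from(f"<{count}I", segment_list, 0):
        remaining = data_size - len(out)
        if remaining <= 0:
            break
        chunk = payload(hive_data, seg_offset)[:SEGMENT_MAX]
        out += chunk[:remaining]  # the final chunk is cut at the real data end
    return bytes(out)
```

If the returned length is shorter than `data_size`, the segments did not supply what the vk promised, which is the same size-versus-cell mismatch described for single data cells.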
So a vk's data lives in one of three places, and the header tells you which: inline in the offset field (high bit set, ≤ 4 bytes), in a single data cell (offset points at raw bytes), or split across a db chain (offset points at a big-data record). Reading a value correctly means deciding which case you are in before you touch the data.
What a value tells an investigator
The vk record carries no timestamp of its own. Values have no LastWrite; only keys do, on the nk. And as the regf overview notes, that LastWrite is not reliably updated when a value's data changes in place without the value being added or removed. So you cannot, in general, date a value change from the hive alone; assuming you can is a frequent and consequential misunderstanding. A suspicious value tells you what and where, but rarely when; for timing you pivot to transaction logs, VSS snapshots, or correlated artifacts.
What the vk does give you — the declared type and the exact byte length — catches a surprising amount of tradecraft: a REG_SZ whose length is odd (impossible for genuine UTF-16), a tiny REG_BINARY masquerading as a flag, a REG_MULTI_SZ with data after its terminator.
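Two of those checks are mechanical enough to sketch. The function below is an illustration, not a detection rule set: the names and the finding strings are invented for this example, and the terminator search assumes the double null falls on a two-byte boundary.

```python
REG_SZ, REG_EXPAND_SZ, REG_MULTI_SZ = 1, 2, 7


def terminator_end(data: bytes) -> int:
    """Offset just past the first aligned pair of zero UTF-16 code units, or -1."""
    for i in range(0, len(data) - 3, 2):
        if data[i : i + 4] == b"\x00\x00\x00\x00":
            return i + 4
    return -1


def size_anomalies(data_type: int, data: bytes) -> list[str]:
    """Flag byte-length oddities that the declared type makes suspicious."""
    findings = []
    if data_type in (REG_SZ, REG_EXPAND_SZ, REG_MULTI_SZ) and len(data) % 2:
        findings.append("odd byte length for a UTF-16 type")
    if data_type == REG_MULTI_SZ:
        end = terminator_end(data)
        # Non-zero bytes after the terminator are invisible to string-oriented tools.
        if end != -1 and any(data[end:]):
            findings.append("data after MULTI_SZ terminator")
    return findings
```

An empty result means only that these two checks found nothing; the raw bytes still deserve a look.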
Reading the vk in practice — value key cell, end to end
The algorithm for a single vk record, given its offset:
- Confirm the vk signature at 0x00.
- Read the Flags at 0x10; bit 0x0001 means the name is ASCII, else UTF-16. Read name_size bytes from 0x14 accordingly. A zero name size is the default value.
- Read the data type at 0x0C and the data size at 0x04.
- Test the high bit of the data size. If set, mask it off and read the low bits as the length of a 1-to-4-byte inline value, taken from the low-order bytes of the data-offset field (the first bytes in file order). If clear, follow the data-offset field: read data_size raw bytes from the data cell, or, if the size exceeds the single-cell threshold, walk the segments of the db big-data record it points at.
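Put together, the steps above can be sketched as one function. This is a compact illustration under several assumptions: `hive_data` starts at the first byte of hive data, `cell_offset` points at the cell-size prefix of a vk cell, compressed names are decoded as Latin-1, the big-data threshold is taken as 16,344 bytes, and big-data values are only classified, not reassembled. The `Value` class and `parse_vk` are names invented for this example.

```python
import struct
from dataclasses import dataclass

BIG_DATA_THRESHOLD = 16344  # assumption: largest size held by a single data cell


@dataclass
class Value:
    name: str        # empty string = the default value
    data_type: int
    data_size: int
    storage: str     # "inline", "cell", or "big-data"
    data: bytes      # left empty for "big-data" in this sketch


def parse_vk(hive_data: bytes, cell_offset: int) -> Value:
    rec = cell_offset + 4  # skip the cell-size prefix to reach the signature
    sig, name_size, raw_size, data_field, data_type, flags, _ = struct.unpack_from(
        "<2sHIIIHH", hive_data, rec
    )
    if sig != b"vk":
        raise ValueError("not a vk record")
    raw_name = hive_data[rec + 0x14 : rec + 0x14 + name_size]
    encoding = "latin-1" if flags & 0x0001 else "utf-16-le"
    name = raw_name.decode(encoding, errors="replace")
    size = raw_size & 0x7FFFFFFF
    if raw_size & 0x80000000:
        # Inline: take the bytes from the data-offset field itself.
        inline = hive_data[rec + 0x08 : rec + 0x08 + min(size, 4)]
        return Value(name, data_type, size, "inline", inline)
    if size > BIG_DATA_THRESHOLD:
        return Value(name, data_type, size, "big-data", b"")
    (cell_size,) = struct.unpack_from("<i", hive_data, data_field)
    length = min(size, abs(cell_size) - 4)  # truncate to what the cell supplies
    start = data_field + 4
    return Value(name, data_type, size, "cell", hive_data[start : start + length])
```

The order of the checks mirrors the list: signature, name, type and size, then the storage decision before any data is read.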
That is the whole life of a registry value: five fields and three storage cases. You can parse a hive in your browser and watch the parser make exactly these decisions on real data — every vk record in the tree resolved to its value, inline or indirect, with nothing uploaded.
Frequently asked questions
What is a vk record? The regf cell that stores a single registry value: its name, declared type, data length, and either the data inline or an offset to where the data lives. Every value in regedit is one vk cell on disk.
How does the registry store small values inline? When the data is 4 bytes or fewer, the vk sets the high bit (0x80000000) of the data-size field and stores the data directly in the 4-byte data-offset field instead of allocating a separate cell. A REG_DWORD is the canonical case.
What is the default value? A vk record with a name size of zero — the value regedit shows as (Default), addressed by its key path with no value name.
Further reading
- Google Project Zero, The Windows Registry Adventure #5: The regf file format: the inspiration for this series, with the kernel-side _CM_KEY_VALUE perspective.
- Joachim Metz, libregf: the format documentation closest to a parser-implementor's reference, with the exact vk field offsets used above.
- Maxim Suhanov, Windows registry file format specification: the canonical reverse-engineered spec; cross-validate any homegrown parser against yarp.
- hivex by Richard Jones: a compact, readable C implementation whose ntreg_vk_record handling makes the inline-data masking concrete.
A vk is small, but it is where every registry investigation ends up. Get the name flag and the inline-data high bit right and you read values correctly; get either wrong and you produce confident, plausible, incorrect output. Cross-check against a mature tool the first time, and you will not have to wonder which one you are doing.