Registry value types on disk: REG_SZ, REG_DWORD and the parsing traps

Every analyst learns the registry value types as a tidy table: REG_SZ is a string, REG_DWORD is a number, REG_BINARY is "whatever". That table is fine until you are reading raw bytes out of a vk record and the bytes do not agree with the type the record claims to hold. Registry value types are a hint the writer attached to the data, not a contract the kernel enforces. Understanding how each type is encoded on disk — and where registry data encoding goes sideways — is the difference between trusting your parser's output and being quietly misled by it.

This post sits in the registry internals series. If you have not read the regf format overview, start there: it covers the base block, HBINs, cells, and the record zoo. Here we zoom into one field of one record — the data type — and everything that field does and does not tell you.

Where the type lives

A value is a vk record. Among its fields are a name, a data length, a pointer to the data cell (or inline data), and a 32-bit data type. That type is the integer you see surfaced as REG_SZ, REG_DWORD, and so on. The type IDs are stable on-disk constants, identical to the winnt.h definitions:

ID  Name                  On-disk encoding
0   REG_NONE              no defined format; treat as opaque bytes
1   REG_SZ                UTF-16LE string
2   REG_EXPAND_SZ         UTF-16LE string with %VAR% references
3   REG_BINARY            raw bytes, any form
4   REG_DWORD             32-bit little-endian
5   REG_DWORD_BIG_ENDIAN  32-bit big-endian
6   REG_LINK              UTF-16LE symlink target
7   REG_MULTI_SZ          NUL-separated UTF-16LE strings
11  REG_QWORD             64-bit little-endian

A few more exist — REG_RESOURCE_LIST (8), REG_FULL_RESOURCE_DESCRIPTOR (9), REG_RESOURCE_REQUIREMENTS_LIST (10) — used almost exclusively under HKLM\HARDWARE. They are structured binary; treat them as REG_BINARY unless you need hardware resource maps. This codebase keeps the full table in lib/registry/types.ts as the RegType enum and REG_TYPE_NAME map, and every value carries both the decoded value and the original raw bytes precisely because the two can disagree.
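
As a rough illustration of what such a table can look like, here is a minimal sketch. The names RegType and REG_TYPE_NAME come from the paragraph above, but the bodies below are an assumption for illustration and not the contents of lib/registry/types.ts; typeName is a hypothetical helper.

```typescript
// Sketch only: the real lib/registry/types.ts may be shaped differently.
enum RegType {
  REG_NONE = 0,
  REG_SZ = 1,
  REG_EXPAND_SZ = 2,
  REG_BINARY = 3,
  REG_DWORD = 4,
  REG_DWORD_BIG_ENDIAN = 5,
  REG_LINK = 6,
  REG_MULTI_SZ = 7,
  REG_RESOURCE_LIST = 8,
  REG_FULL_RESOURCE_DESCRIPTOR = 9,
  REG_RESOURCE_REQUIREMENTS_LIST = 10,
  REG_QWORD = 11,
}

const REG_TYPE_NAME = new Map<number, string>([
  [RegType.REG_NONE, "REG_NONE"],
  [RegType.REG_SZ, "REG_SZ"],
  [RegType.REG_EXPAND_SZ, "REG_EXPAND_SZ"],
  [RegType.REG_BINARY, "REG_BINARY"],
  [RegType.REG_DWORD, "REG_DWORD"],
  [RegType.REG_DWORD_BIG_ENDIAN, "REG_DWORD_BIG_ENDIAN"],
  [RegType.REG_LINK, "REG_LINK"],
  [RegType.REG_MULTI_SZ, "REG_MULTI_SZ"],
  [RegType.REG_RESOURCE_LIST, "REG_RESOURCE_LIST"],
  [RegType.REG_FULL_RESOURCE_DESCRIPTOR, "REG_FULL_RESOURCE_DESCRIPTOR"],
  [RegType.REG_RESOURCE_REQUIREMENTS_LIST, "REG_RESOURCE_REQUIREMENTS_LIST"],
  [RegType.REG_QWORD, "REG_QWORD"],
]);

// Hypothetical helper: the type field is an arbitrary 32-bit integer on disk,
// so an unknown ID should render as something readable rather than throw.
function typeName(id: number): string {
  return REG_TYPE_NAME.get(id) ?? `UNKNOWN(0x${(id >>> 0).toString(16)})`;
}
```

Falling back to a hex label keeps an unexpected type ID visible in output instead of hiding it behind an exception.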

The fixed-width numeric types

Start with the easy ones, because they set up the trap.

REG_DWORD (4) is four bytes, little-endian. The value 0x0000002A is stored as 2A 00 00 00. REG_QWORD (11) is eight bytes, little-endian. REG_DWORD_BIG_ENDIAN (5) is the same 32-bit integer with byte order reversed: 00 00 00 2A for 0x2A. Because Windows runs on little-endian hardware, REG_DWORD and REG_DWORD_LITTLE_ENDIAN are the same constant; the big-endian variant exists for portability and is rare. If you do hit type 5, read it big-endian: decoded little-endian, those same four bytes give 0x2A000000 (704,643,072) instead of 42.

DWORD-sized values are small enough that they are usually stored inline in the vk record rather than in a separate data cell. That is worth knowing when walking cells by hand, but transparent once your parser resolves it. The decoded result is a number; QWORD comes back as a bigint because a 64-bit value can exceed 2^53 - 1, the largest integer a JavaScript number represents exactly.
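
To make the inline case concrete, here is a hedged sketch of resolving it by hand. It assumes the commonly documented vk layout (offsets measured from the "vk" signature, after the 4-byte cell size: data size at +4, data offset at +8, data type at +12) and the convention that the top bit of the data size marks inline data held in the data offset field itself. The names readVkData and VkData are hypothetical, not this tool's API.

```typescript
// Assumed layout, offsets from the "vk" signature:
//   +4  u32 data size (top bit set = data is inline)
//   +8  u32 data offset (or the inline data itself)
//   +12 u32 data type
const INLINE_FLAG = 0x80000000;

interface VkData {
  type: number;
  length: number;           // declared length with the inline bit masked off
  inline: boolean;
  inlineBytes?: Uint8Array; // present when inline is true
  cellOffset?: number;      // present when inline is false
}

function readVkData(vk: Uint8Array): VkData {
  if (vk.length < 16 || vk[0] !== 0x76 || vk[1] !== 0x6b) {
    throw new Error("not a vk record");
  }
  const dv = new DataView(vk.buffer, vk.byteOffset, vk.byteLength);
  const rawSize = dv.getUint32(4, true);
  const type = dv.getUint32(12, true);
  const inline = (rawSize & INLINE_FLAG) !== 0;
  const length = rawSize & 0x7fffffff;
  if (inline) {
    // The data offset field is only 4 bytes wide, so clamp the declared length.
    const n = Math.min(length, 4);
    return { type, length, inline, inlineBytes: vk.subarray(8, 8 + n) };
  }
  return { type, length, inline, cellOffset: dv.getUint32(8, true) };
}
```

The sketch stops at telling you where the bytes are; following cellOffset into the hive is a separate step.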

The trap: a REG_DWORD is only a number because the writer said so. If a value is tagged REG_DWORD but the data length is 7 bytes, you do not have a DWORD. A robust parser reads the declared length, not a hardcoded 4, and truncates, pads, or flags the mismatch. Do not assume.
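
One way to honor that rule is to check the length before decoding and report a mismatch instead of guessing. The sketch below is an illustration under that assumption; decodeFixedWidth and NumericDecode are hypothetical names, and flagging (rather than truncating or padding) is just one of the options named above.

```typescript
type NumericDecode =
  | { ok: true; value: number | bigint }
  | { ok: false; reason: string };

// `raw` is the value data at its declared length, not a hardcoded 4 or 8 bytes.
function decodeFixedWidth(type: number, raw: Uint8Array): NumericDecode {
  const dv = new DataView(raw.buffer, raw.byteOffset, raw.byteLength);
  switch (type) {
    case 4: // REG_DWORD, little-endian
    case 5: // REG_DWORD_BIG_ENDIAN
      if (raw.length !== 4) {
        return { ok: false, reason: `type ${type} claims a DWORD but holds ${raw.length} bytes` };
      }
      return { ok: true, value: dv.getUint32(0, type === 4) };
    case 11: // REG_QWORD, little-endian
      if (raw.length !== 8) {
        return { ok: false, reason: `type 11 claims a QWORD but holds ${raw.length} bytes` };
      }
      return { ok: true, value: dv.getBigUint64(0, true) };
    default:
      return { ok: false, reason: `type ${type} is not a fixed-width numeric type` };
  }
}
```

When the result is not ok, the caller still has the raw bytes and can show them as hex alongside the reason.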

The string types and the NUL-termination problem

REG_SZ (1), REG_EXPAND_SZ (2), and REG_LINK (6) are UTF-16LE strings. REG_EXPAND_SZ carries unexpanded environment references like %SystemRoot%\system32 and is expanded by the consumer, not the registry; on disk it is just a string. REG_LINK holds a symlink target path.

Two bytes per character, little-endian. So far so good. The problem is termination. The Win32 write API, RegSetValueEx, requires that the byte count you pass for a string type include the terminating NUL. In practice, callers get this wrong constantly, and the kernel does not police it. Microsoft's own documentation says the quiet part out loud:

If data has the REG_SZ, REG_MULTI_SZ, or REG_EXPAND_SZ type, then the string might not have been stored with the proper terminating null characters. So when reading a string from the registry, you must ensure that the string is properly terminated before using it.

That is a vendor admission that registry strings in the wild may be non-terminated, or carry the terminator inside or outside the declared length. The Project Zero "Windows Registry Adventure" makes the same point about key names — they "don't use a terminator, and may contain all sorts of non-printable characters" — and value data inherits the same lawlessness.

A parser has to handle all of these for a single REG_SZ:

  • C:\Windows\0 — clean, terminator included in the length.
  • C:\Windows — no terminator at all; the length is the exact string length.
  • C:\Windows\0\0\0\0 — trailing NULs padding to a cell boundary.
  • C:\Win\0dows — an embedded NUL mid-string, whether by accident or design.

The wrong move is to read up to the first NUL and stop — that silently drops the tail of the fourth case. The defensive move: decode the entire declared length as UTF-16LE, then trim trailing NULs for display while preserving the raw bytes. That is exactly what utf16leAt in lib/plugins/helpers.ts does:

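// Decode `len` bytes at `off` as UTF-16LE, clamped to the buffer, trailing NULs trimmed.
export function utf16leAt(raw: Uint8Array, off: number, len: number): string {
  // Clamp: a declared length may run past the bytes actually present.
  const end = Math.min(off + len, raw.length);
  if (off >= end) return "";
  return new TextDecoder("utf-16le")
    .decode(raw.subarray(off, end))
    // Strip trailing NULs only; embedded NULs stay visible.
    .replace(/\0+$/, "");
}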

Note the Math.min against raw.length. A vk record can claim a data length longer than the cell that holds it; the guidance is to truncate to what is actually present rather than read off the end into the next cell. The \0+$ strip removes only trailing NULs, leaving embedded ones intact so an investigator can see them. Embedded NULs are a tell: a viewer that stops at the first NUL shows a complete-looking path, while the bytes after it differ — exactly the asymmetry some hiding techniques exploit.
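
To see how the four REG_SZ cases listed earlier come out, here is a small self-contained check. utf16leAt is copied from the snippet above; encodeUtf16le and hasEmbeddedNul are hypothetical helpers added only for this illustration.

```typescript
// Copied from the snippet above so the example runs on its own.
function utf16leAt(raw: Uint8Array, off: number, len: number): string {
  const end = Math.min(off + len, raw.length);
  if (off >= end) return "";
  return new TextDecoder("utf-16le")
    .decode(raw.subarray(off, end))
    .replace(/\0+$/, "");
}

// Hypothetical helper: encode a JS string as UTF-16LE bytes.
function encodeUtf16le(s: string): Uint8Array {
  const out = new Uint8Array(s.length * 2);
  for (let i = 0; i < s.length; i++) {
    const cu = s.charCodeAt(i);
    out[2 * i] = cu & 0xff;
    out[2 * i + 1] = cu >> 8;
  }
  return out;
}

// Hypothetical helper: any NUL that survives the trailing trim is embedded.
function hasEmbeddedNul(decoded: string): boolean {
  return decoded.includes("\0");
}

const cases = [
  "C:\\Windows\0",       // terminator included in the length
  "C:\\Windows",         // no terminator at all
  "C:\\Windows\0\0\0\0", // trailing NUL padding
  "C:\\Win\0dows",       // embedded NUL mid-string
];

const decoded = cases.map((s) => {
  const raw = encodeUtf16le(s);
  return utf16leAt(raw, 0, raw.length);
});
const suspicious = decoded.map(hasEmbeddedNul);
```

The first three cases decode to the same string, while the fourth keeps its embedded NUL and is the only one hasEmbeddedNul reports.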

REG_MULTI_SZ: the double-NUL list

REG_MULTI_SZ (7) is a sequence of UTF-16LE strings, each NUL-terminated, with the whole sequence terminated by an additional empty string — so it ends in two NUL characters. Microsoft's canonical example:

String1\0String2\0String3\0LastString\0\0

The first \0 ends string one; the second-from-last \0 ends the last string; the final \0 terminates the list. A consequence that surprises people: because a zero-length string is the terminator, you cannot store an empty string as a list element. An empty REG_MULTI_SZ is just \0.

Parsing traps here compound the single-string ones. Two naive strategies fail. Splitting on every NUL and dropping all empty pieces erases any empty piece a producer did write mid-list, which shifts the position of everything after it. Stopping at the first double NUL discards whatever follows it. And producers do not always write a well-formed terminator: some write only one trailing NUL instead of two, others pad with extras. The robust approach: split on the NUL boundary, drop only the trailing empty string(s) that represent the terminator, and do not assume the terminator is well-formed. This tool decodes REG_MULTI_SZ to a string array; valueToString joins it with "; " for flat display, but the array is what you should reason over. PendingFileRenameOperations, service DependOnService, and assorted *List values are all multi-strings where element boundaries carry meaning.
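
The robust approach described above can be sketched as follows. This is an illustration under stated assumptions, not this tool's decoder: parseMultiSz and MultiSz are hypothetical names, and the terminated flag only records whether the data ends in the expected double NUL.

```typescript
interface MultiSz {
  items: string[];
  terminated: boolean; // true when the data ends with the expected double NUL
}

function parseMultiSz(raw: Uint8Array): MultiSz {
  // Ignore a stray odd byte rather than let it corrupt the last code unit.
  const evenLen = raw.length - (raw.length % 2);
  const text = new TextDecoder("utf-16le").decode(raw.subarray(0, evenLen));
  const terminated =
    raw.length % 2 === 0 && (text === "\0" || text.endsWith("\0\0"));

  const parts = text.split("\0");
  // Drop only trailing empties: they stand for the terminator and any padding.
  // Empty pieces in the middle are kept so element positions stay intact.
  // A final element that really was empty cannot be told apart from the
  // terminator and is dropped with it.
  while (parts.length > 0 && parts[parts.length - 1] === "") {
    parts.pop();
  }
  return { items: parts, terminated };
}
```

Keeping terminated separate from items lets a caller display the list normally and still report a malformed terminator as a finding.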

REG_NONE, REG_BINARY, and the type that is not a type

REG_NONE (0) means the writer declared no format; REG_BINARY (3) means bytes in any form. Neither tells you anything about structure — and a great deal of forensically interesting data lives in REG_BINARY values whose layout you must know independently: UserAssist run counters, ShellBags, AppCompatCache. The type ID will not help you; you decode by context. When a value is REG_BINARY, REG_NONE, or any type this tool does not model, the decoded value is left undefined and you work from raw, rendered as hex by the hex helper.
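
For readers without the tool at hand, a hex rendering of raw can be as small as the sketch below. hexDump is a hypothetical stand-in and makes no claim about how this tool's own hex helper formats its output.

```typescript
// Hypothetical stand-in for a hex helper: offset, hex bytes, printable ASCII.
function hexDump(raw: Uint8Array, width = 16): string {
  const lines: string[] = [];
  for (let off = 0; off < raw.length; off += width) {
    const row = raw.subarray(off, off + width);
    const hex = Array.from(row, (b) => b.toString(16).padStart(2, "0")).join(" ");
    const ascii = Array.from(row, (b) =>
      b >= 0x20 && b < 0x7f ? String.fromCharCode(b) : ".",
    ).join("");
    // Pad the hex column so the ASCII column lines up on a short final row.
    lines.push(`${off.toString(16).padStart(8, "0")}  ${hex.padEnd(width * 3 - 1)}  ${ascii}`);
  }
  return lines.join("\n");
}
```

The ASCII column is what makes UTF-16 text inside a REG_BINARY value stand out: it shows up as letters alternating with dots.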

The type does not have to match the data

This is the point the whole post builds to, and the one that matters most for anti-forensics. The registry stores type and data as independent fields. Nothing in the kernel requires that a value tagged REG_DWORD actually contains four bytes, or that a REG_SZ contains valid UTF-16. The type is metadata supplied by whoever wrote the value.

The defensive implications:

  • A REG_SZ value can hold arbitrary binary — a packed config blob, shellcode, a second stage — while presenting as a harmless string. A text viewer shows mojibake or a truncated fragment; the bytes are all there in raw. Look at the raw length and hex when a "string" looks short or garbled relative to its declared length.
  • A REG_BINARY value can hold readable UTF-16 that a tool refuses to show as text because the type says binary. Malware uses this to keep configuration illegible to casual triage.
  • A declared length can disagree with the data cell size, exploiting truncation behavior to make tools that trust the length read garbage or crash.

The correct posture is to treat the type as a claim verified against the bytes, never as ground truth. This is why every RegValue keeps raw alongside the decoded value: the decode is a convenience, the bytes are the evidence, and when the two disagree the disagreement is itself a finding. The mechanics of the vk record that carries all this — the flags, the inline-data bit, the data-cell pointer — are covered in the vk value record deep dive.
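
A check along those lines might look like the sketch below. It is an assumption-laden illustration: auditValue and printableRatio are hypothetical names, and the "looks like text" test is a crude ASCII-biased heuristic with an arbitrary 0.9 threshold, so legitimate non-Latin strings will trip it. Treat each finding as a prompt to look at the bytes, not a verdict.

```typescript
// Share of UTF-16LE code units (after trimming trailing NULs) that are
// printable ASCII or common whitespace. Returns null when nothing remains.
function printableRatio(raw: Uint8Array): number | null {
  const dv = new DataView(raw.buffer, raw.byteOffset, raw.byteLength);
  let units = Math.floor(raw.length / 2);
  while (units > 0 && dv.getUint16((units - 1) * 2, true) === 0) units--;
  if (units === 0) return null;
  let printable = 0;
  for (let i = 0; i < units; i++) {
    const cu = dv.getUint16(i * 2, true);
    if (cu === 9 || cu === 10 || cu === 13 || (cu >= 0x20 && cu < 0x7f)) printable++;
  }
  return printable / units;
}

// Compare the declared type and length against the bytes actually present.
function auditValue(type: number, declaredLen: number, raw: Uint8Array): string[] {
  const findings: string[] = [];
  if (declaredLen !== raw.length) {
    findings.push(`declared length ${declaredLen} but ${raw.length} bytes present`);
  }
  if ((type === 4 || type === 5) && raw.length !== 4) {
    findings.push("DWORD type with data that is not 4 bytes");
  }
  if (type === 11 && raw.length !== 8) {
    findings.push("QWORD type with data that is not 8 bytes");
  }
  const ratio = printableRatio(raw);
  const isStringType = type === 1 || type === 2 || type === 6 || type === 7;
  if (isStringType) {
    if (raw.length % 2 !== 0) findings.push("string type with an odd byte count");
    if (ratio !== null && ratio < 0.9) {
      findings.push("string type but bytes do not look like UTF-16LE text");
    }
  }
  if ((type === 0 || type === 3) && raw.length >= 8 && ratio !== null && ratio >= 0.9) {
    findings.push("binary type but bytes look like UTF-16LE text");
  }
  return findings;
}
```

An empty findings array means only that the claim and the bytes are consistent under these checks; the raw bytes remain the evidence either way.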

Reading values defensively

Registry value types, REG_SZ versus REG_DWORD and the rest, are a starting point, not an answer. When you read a value in a forensic context, registry data encoding has to be checked at three levels: the declared type, the declared length, and the actual bytes. A clean tool surfaces all three so the analyst can see when they diverge. You can parse a hive in your browser and inspect the decoded value, its type, and its raw bytes side by side — nothing is uploaded, and the hex view is always one click away when the decode looks wrong.

The table you learned was never wrong. It was just incomplete. The byte stream under each REG_* type carries edge cases the type field will not warn you about, and the registry's refusal to enforce that the type matches the data is not a bug — it is the reason you read the raw bytes.