Recovering deleted registry keys and surviving log replay

The first time you replay a hive's transaction log and watch a registry value that should not exist materialize, you understand why this is a non-optional step. The live registry as Windows shows you is one view of the data. The hive file is another view. Replaying the logs gives a third. Comparing against VSS gives a fourth. Anti-forensic attackers count on most analysts looking at only the first.

This post is the recovery workflow that catches the rest.

What "deleted" means in registry terms

A registry hive is a tree of cells stored inside a regf file. When a key or value is "deleted", Windows does not zero the bytes. It flips the sign of the cell's size field from negative (allocated) to positive (free). The hive bin (hbin) continues to hold the old bytes until a later allocation reuses the space.

This applies to deleted values, deleted keys (whose nk and child cells are all marked free), and cells freed during reallocation when a value grows beyond its original size. A regf parser that walks free cells the same way it walks allocated cells will surface deleted records with names, types, and data intact.
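To make the sign flip concrete, here is a minimal sketch of reading a cell header. The function name and the 16-byte sample cell are made up for illustration; the only format facts it relies on are that the size field is a signed 32-bit little-endian integer and that negative means allocated.

```python
import struct
from typing import Tuple


def read_cell_header(buf: bytes, offset: int) -> Tuple[int, bool]:
    """Return (cell_length_in_bytes, is_allocated) for the cell at offset."""
    (raw_size,) = struct.unpack_from("<i", buf, offset)  # signed 32-bit, little-endian
    return abs(raw_size), raw_size < 0


# Hypothetical 16-byte cell holding the start of a vk record.
allocated = struct.pack("<i", -16) + b"vk" + bytes(10)
# "Deleting" it changes only the size field; the body is left alone.
freed = struct.pack("<i", 16) + allocated[4:]

print(read_cell_header(allocated, 0))
print(read_cell_header(freed, 0))
print(freed[4:] == allocated[4:])
```

The last comparison is the whole point: after the flip, everything past the first four bytes is still sitting there.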

The unallocated-cell recovery pass

Mature parsers do this automatically:

Walk every HBIN block.
For each free cell, attempt to parse it as a typed record (nk, vk, sk, lh, db, ...).
Validate against format constraints: sensible name length, sane child counts, plausible timestamps, embedded offsets pointing at targets of the expected type.
Surface any record that validates.

Validation separates a useful pass from a junk-spewing one. Record signatures are only two bytes long, so random bytes occasionally pass the magic check. Implementations worth knowing: yarp (strict, conservative), regipy (looser, occasional false positives, easier to script), and RegRipper's del plugin (walks the hive and reconstructs paths where possible).
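A rough sketch of that pass, restricted to vk records, looks like the following. It is not how any of the tools above are implemented. The function names, the 255-byte name cap, and the decision to accept only the standard REG_* type numbers (0 through 11) are assumptions chosen to keep the example strict and short. It also only looks at the start of each free cell; a deleted record can sit in the middle of a larger coalesced free cell, which this sketch does not scan for.

```python
import struct
from typing import Iterator, Optional, Tuple

BASE_BLOCK = 4096    # regf header; hive bins start right after it
HBIN_HEADER = 32     # "hbin" signature, offset, size, reserved fields, timestamp
MAX_NAME = 255       # assumption: arbitrary "sensible name length" cap
MAX_REG_TYPE = 11    # assumption: accept REG_NONE (0) through REG_QWORD (11) only


def iter_free_cells(hive: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (file_offset, cell_body) for every free cell in every hive bin."""
    pos = BASE_BLOCK
    while pos + HBIN_HEADER <= len(hive) and hive[pos:pos + 4] == b"hbin":
        (bin_size,) = struct.unpack_from("<I", hive, pos + 8)
        if bin_size == 0 or bin_size % 4096:
            break  # corrupt bin header: stop rather than guess
        end = min(pos + bin_size, len(hive))
        cell = pos + HBIN_HEADER
        while cell + 4 <= end:
            (raw,) = struct.unpack_from("<i", hive, cell)
            length = abs(raw)
            if length < 8 or cell + length > end:
                break  # malformed cell: give up on this bin
            if raw > 0:  # positive size means the cell is free
                yield cell, hive[cell + 4:cell + length]
            cell += length
        pos += bin_size


def parse_free_vk(body: bytes) -> Optional[Tuple[str, int]]:
    """Return (value_name, data_type) if the cell body validates as a vk record."""
    if len(body) < 20 or body[:2] != b"vk":
        return None
    name_len, _data_size, _data_off, data_type, flags = struct.unpack_from(
        "<HIIIH", body, 2
    )
    # Unnamed (default) values are rejected here to keep the check strict.
    if name_len == 0 or name_len > MAX_NAME or 20 + name_len > len(body):
        return None
    if data_type > MAX_REG_TYPE:
        return None
    # Flag bit 0 marks a name stored as single-byte characters.
    encoding = "latin-1" if flags & 1 else "utf-16-le"
    name = body[20:20 + name_len].decode(encoding, errors="replace")
    if not name.isprintable():
        return None
    return name, data_type
```

Feeding every body from iter_free_cells through parse_free_vk and keeping the non-None results gives the "surface any record that validates" step for values. A real pass would add the nk, sk, and list record types and check that embedded offsets land on cells of the expected kind.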

Output typically includes recovered values with names, types, and data; recovered keys with LastWrite times; and "orphan" cells that validate but cannot be tied to a path because the parent nk is gone. Orphans are not useless: a vk cell for a value named Debugger containing C:\Windows\Temp\evil.exe is informative even without knowing which IFEO subkey it sat under.

Transaction log replay, properly done

Every primary hive on modern Windows ships with .LOG1 and .LOG2. The dual-log scheme guards against power loss during writes. Replaying them is non-optional.

The logs contain dirty-page entries describing modifications committed to the log but not yet flushed to the primary hive. Each entry says "at offset N, write these M bytes". Replay applies them in order.
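The "apply in order" idea can be sketched in a few lines. The DirtyEntry type below is a hypothetical, flattened stand-in, not the on-disk log layout: real log entries carry checksums and can describe several dirty pages each. The sketch only shows the ordering rule, which is to apply entries by ascending sequence number, skip stale ones, and stop at the first gap.

```python
from typing import Iterable, NamedTuple


class DirtyEntry(NamedTuple):
    """Hypothetical simplified log entry: one write per entry."""
    sequence: int
    offset: int   # where in the hive data the bytes go
    data: bytes   # the bytes to write


def replay(primary: bytes, entries: Iterable[DirtyEntry], start_seq: int) -> bytes:
    """Apply entries in sequence order, stopping at the first gap."""
    hive = bytearray(primary)
    expected = start_seq
    for entry in sorted(entries, key=lambda e: e.sequence):
        if entry.sequence < expected:
            continue  # stale: already reflected in the primary
        if entry.sequence != expected:
            break     # gap in the chain: later entries cannot be trusted
        end = entry.offset + len(entry.data)
        if end > len(hive):
            hive.extend(bytes(end - len(hive)))  # the hive grew after the last flush
        hive[entry.offset:end] = entry.data
        expected += 1
    return bytes(hive)


entries = [
    DirtyEntry(8, 0, b"ZZ"),    # follows a gap (no entry 7), never applied
    DirtyEntry(6, 6, b"CC"),
    DirtyEntry(4, 0, b"XX"),    # stale
    DirtyEntry(5, 4, b"BBBB"),
]
print(replay(b"A" * 16, entries, start_seq=5))
```

Stopping at the gap is the conservative choice. Entries after a break in the sequence may be leftovers from an earlier log generation, and they are worth reporting separately rather than applying.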

What replay surfaces:

In-flight writes. Changes logged but never persisted to the primary file. These appear in the replayed hive and nowhere else.

Recently committed sequences. Even when the primary has been flushed, the log may still contain the most recent writes. Comparing the post-replay hive against the unreplayed primary tells you what changed in the last sync window.

Crash residue. If the system crashed between log write and primary flush, the logs may contain records the live registry has never seen. Exactly the records attackers do not expect you to find.
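One cheap way to see what replay changed is to compare the unreplayed primary against the replayed copy page by page. This is a sketch with a made-up function name. It assumes both versions are already in memory as bytes and uses 4096-byte pages because that is the granularity hive bins are built on.

```python
from typing import List


def changed_pages(before: bytes, after: bytes, page: int = 4096) -> List[int]:
    """Return the offsets of every page that differs between two hive images."""
    length = max(len(before), len(after))
    return [
        off
        for off in range(0, length, page)
        if before[off:off + page] != after[off:off + page]
    ]
```

Each offset returned is a page to parse on both sides. The cells that differ there are the writes the log was still holding.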

Sequence numbers, dirty-page bitmaps, per-entry validation: get the replay right or use a library that already does. Tools that handle it: yarp (RegistryHive.recover_auto in its Python API), regipy (apply_transaction_logs in regipy.recovery), RegRipper when logs are in the same directory, and the parser on this site. Check each tool's current documentation for the exact invocation. Verify the tool actually reports it replayed the logs. Some silently skip replay if the logs look clean.
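When a tool says the hive "looks clean", it is usually reporting the result of a check like this one. The regf base block stores a primary and a secondary sequence number at offsets 4 and 8, and a mismatch means a write was started and never completed. The function name is mine, and this is only the first-pass check, not a full validation (it ignores the header checksum, for one).

```python
import struct


def needs_replay(base_block: bytes) -> bool:
    """True if the regf header's two sequence numbers disagree (dirty hive)."""
    if base_block[:4] != b"regf":
        raise ValueError("not a regf base block")
    primary_seq, secondary_seq = struct.unpack_from("<II", base_block, 4)
    return primary_seq != secondary_seq
```

Matching numbers do not prove the log files are empty of interesting data, which is why a "skipped replay" message deserves a second look with another tool.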

VSS snapshots: registry diffs across time

Volume Shadow Copy Service stores periodic snapshots of the entire volume, including the hives. A default install keeps a few; a well-managed system keeps a week or two. The move: pull each snapshot, extract the hives, run recovery on each, diff.

What the diff surfaces:

Persistence that appears in the live hive but not in yesterday's snapshot. If yesterday's SOFTWARE had no value at Run\evil and today's does, the persistence was installed between those two times.

Persistence that appeared and disappeared. A snapshot from three days ago has a Run value; yesterday's does not; the live hive does not. The attacker added persistence, ran the payload, and cleaned up. The snapshot remembers.

Modifications to existing values. Service ImagePath values that changed. IFEO Debugger values added and removed.

Permission changes. Security descriptors modified to hide content. The pre-modification descriptor lives in the snapshot.
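The diff logic itself is simple once each snapshot has been reduced to a flat mapping. The sketch below assumes you have already dumped each hive to a dict of "key path\value name" to data as text. That representation, the function names, and the snapshot labels are all hypothetical.

```python
from typing import Dict, List, Tuple

Snapshot = Dict[str, str]  # hypothetical: "key path\\value name" -> data as text


def diff_snapshots(old: Snapshot, new: Snapshot) -> Dict[str, List[str]]:
    """Classify every entry as added, removed, or changed between two states."""
    return {
        "added": sorted(k for k in new if k not in old),
        "removed": sorted(k for k in old if k not in new),
        "changed": sorted(k for k in old if k in new and old[k] != new[k]),
    }


def remembered_only_by_snapshots(
    timeline: List[Tuple[str, Snapshot]],
) -> Dict[str, List[str]]:
    """Map each entry missing from the final (live) state to the snapshots holding it.

    timeline is ordered oldest to newest; the last element is the live hive.
    """
    live = timeline[-1][1]
    seen: Dict[str, List[str]] = {}
    for label, snapshot in timeline[:-1]:
        for key in snapshot:
            if key not in live:
                seen.setdefault(key, []).append(label)
    return seen


three_days_ago = {"Run\\evil": "C:\\Temp\\evil.exe", "Run\\OneDrive": "onedrive.exe"}
yesterday = {"Run\\OneDrive": "onedrive.exe"}
live = {"Run\\OneDrive": "onedrive.exe /background", "Run\\new": "x.exe"}

print(diff_snapshots(yesterday, live))
print(remembered_only_by_snapshots(
    [("3-days-ago", three_days_ago), ("yesterday", yesterday), ("live", live)]
))
```

The second function is the "appeared and disappeared" case: anything it returns exists only in history, and the list of labels brackets when it was present.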

VSS is the single best defense against the "edit and revert" anti-forensic pattern. vssadmin list shadows and Copy-Item against each snapshot's GLOBALROOT path will get you the hives. NirSoft's shadowcopyview gives a friendlier UI for the same.

What attackers can and cannot clean

The honest answer is: a determined attacker with admin can clean almost anything. The dishonest answer is the comforting one. Let's stick with honest.

What attackers commonly do:

Delete the value, hope for the best. Most do this. Recovery from unallocated cells gets it back.

Delete the value and force a flush. A call to RegFlushKey does it (reg.exe has no flush subcommand). Subsequent writes to the same hive bin may overwrite the freed cell. Recovery rate drops but is not zero.

Delete the value, flush, and wipe the shadow copies. vssadmin delete shadows /all. Wipes snapshots that hold the original state. The wipe itself leaves traces when process-creation logging is enabled: Security event 4688 with command-line auditing, or Sysmon event 1, records the vssadmin invocation. The signal is "VSS was wiped at time T", and what happened just before T is now your investigation.

Overwrite cell content deliberately. Write a large junk value, forcing reallocation, then delete it. Cleans the freed cell. Rare because it requires understanding the regf format.

Kernel-mode wipe. Possible but exotic. If your attacker has that capability, you have bigger problems.

What attackers cannot easily clean: VSS snapshots taken before they got admin, transaction logs already flushed and undisturbed, Sysmon/EVTX records of the registry edits, the MFT entry for the hive file, the contents of a RAM dump taken between edit and cleanup, and registry pages that hit swap in the pagefile.

A practical recovery workflow

For any "did the attacker modify and revert" question:

Acquire everything. Primary hive, .LOG1, .LOG2, every VSS copy of the same, hives from every available profile (including stale ones).
Replay logs on each hive. Use a tool that explicitly reports replay status. If the tool says it skipped replay because the sequence numbers matched, verify by trying a different tool with looser checks.
Run unallocated-cell recovery on each. yarp, regipy, RegRipper del. Capture orphan records as well as path-reconstructable ones.
Diff across VSS. Compare the persistence keys, services, IFEO, scheduled tasks across every snapshot. Purpose-built hive diff tools work (regipy ships one); so does a custom script that hashes each subkey's contents.
Cross-correlate with EVTX. Look for Registry value set events (Sysmon event 13) and Registry object added or deleted events (Sysmon event 12). If Sysmon was logging registry, you get the change events with timestamps.
Cross-correlate with hive file timestamps. The MFT record for the hive file has the file's modification timestamp. Sudden modification clusters in the timeline are leads.
Pivot to memory and pagefile. If you have a memory image from the relevant window, the registry pages in memory may carry the pre-cleanup state. Volatility's registry plugins extract this. The pagefile parser handles swapped hive pages.
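For the "custom script that hashes each subkey's contents" option in the diff step, a minimal sketch follows. It assumes each hive has already been dumped into a dict of key path to a list of (name, type, raw data) tuples; that shape and both function names are hypothetical. Names are lowercased before hashing because registry value names are case-insensitive.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple

Value = Tuple[str, int, bytes]  # hypothetical: (value name, REG_* type, raw data)


def subkey_digest(values: Iterable[Value]) -> str:
    """Hash a subkey's values independent of order and of name case."""
    h = hashlib.sha256()
    for name, vtype, data in sorted(
        values, key=lambda v: (v[0].lower(), v[1], v[2])
    ):
        for part in (name.lower().encode("utf-8"), vtype.to_bytes(4, "little"), data):
            h.update(len(part).to_bytes(4, "little"))  # length prefix avoids ambiguity
            h.update(part)
    return h.hexdigest()


def changed_subkeys(a: Dict[str, List[Value]], b: Dict[str, List[Value]]) -> List[str]:
    """Return key paths whose contents differ, including keys present on one side only."""
    digests_a = {path: subkey_digest(vals) for path, vals in a.items()}
    digests_b = {path: subkey_digest(vals) for path, vals in b.items()}
    paths = set(digests_a) | set(digests_b)
    return sorted(p for p in paths if digests_a.get(p) != digests_b.get(p))
```

Comparing digests rather than raw dumps keeps the per-snapshot output small enough to store for every shadow copy and every profile hive you acquired.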

The combination of log replay plus VSS diff plus unallocated recovery catches the vast majority of attempted cleanups. A cleanup that defeats all three points to a sophisticated adversary, and your investigation needs to escalate accordingly.

A common pitfall

Tools that "recover" deleted keys sometimes show the same record from multiple sources without telling you. A value present in the live hive, in the transaction log replay, and in the VSS snapshot is one value, not three findings. Triage with deduplication in mind.

The reverse matters too. A finding that appears in exactly one source is sometimes the most interesting one, precisely because every other source has lost it. A deleted IFEO Debugger value visible only in the LOG1 replay is the kind of evidence that closes a case.
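Both halves of this pitfall fall out of the same grouping step. The sketch below assumes each tool's output has been normalized into a simple record; the Finding type, its fields, and the source labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Set, Tuple


class Finding(NamedTuple):
    """Hypothetical normalized record from any recovery source."""
    source: str  # e.g. "live", "log-replay", "vss-1"
    path: str
    name: str
    data: str


Record = Tuple[str, str, str]


def group_by_record(findings: Iterable[Finding]) -> Dict[Record, Set[str]]:
    """Collapse duplicates: one entry per distinct record, with its set of sources."""
    grouped: Dict[Record, Set[str]] = defaultdict(set)
    for f in findings:
        grouped[(f.path.lower(), f.name.lower(), f.data)].add(f.source)
    return dict(grouped)


def single_source(grouped: Dict[Record, Set[str]]) -> Dict[Record, str]:
    """Return only the records that exactly one source reported."""
    return {rec: next(iter(srcs)) for rec, srcs in grouped.items() if len(srcs) == 1}


findings = [
    Finding("live", "Run", "Updater", "u.exe"),
    Finding("vss-1", "Run", "Updater", "u.exe"),
    Finding("log-replay", "run", "updater", "u.exe"),
    Finding("log-replay", "IFEO\\notepad.exe", "Debugger", "C:\\Windows\\Temp\\evil.exe"),
]
grouped = group_by_record(findings)
print(len(grouped))
print(single_source(grouped))
```

The first number is how many findings you actually have. The second dict is where to start reading.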