Skip to content

core/state, core: exact-writer validation for v2 destruction-resolved reads#2290

Open
cffls wants to merge 3 commits into
developfrom
cffls/v2-exact-writer-validation
Open

core/state, core: exact-writer validation for v2 destruction-resolved reads#2290
cffls wants to merge 3 commits into
developfrom
cffls/v2-exact-writer-validation

Conversation

@cffls

@cffls cffls commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Summary

BlockSTM v2's read-validation (storeReadMatches) accepted a recorded read whenever the current MVStore value equaled the recorded value, even when the winning writer's block-order position changed. For reads resolved relative to a prior SELFDESTRUCT (the writerIdx > suicideIdx pattern) and for the CreatePath/SuicidePath existence markers, the writer's position — not its value — determines the result. A reordered metamorphic CREATE2 + same-tx SELFDESTRUCT could therefore move a code/storage/marker writer across the destruction boundary while its value stayed equal; validation wrongly skipped re-execution and v2 settled a state root that differed from serial.

Fix (first commit): a per-read ExactWriter flag on StoreReadDesc that disables the value-equality fallback for

  • CreatePath/SuicidePath markers — always (their value is always true, so value-equality can only ever mask a reorder), and
  • Nonce/Code/Storage reads when a prior destruction is in play (suicideIdx >= 0).

The common no-destruction path keeps the value-equality optimization, so the cost is bounded to the rare destruction-interleaved case. Marker reads now record the real writer incarnation. This covers all six destruction-resolved read sites (Exist, GetNonce, GetCode, GetCodeHash, GetState, GetCommittedState), so the storage- and nonce-slot twins of the code-path symptom are closed by construction rather than chased individually.

Test coverage (second commit): the old differential vocabulary could not express CREATE2, metamorphic re-create at a destroyed address, or a reorderable deploy decision, so the divergence was unreachable by the fuzzer. The grammar now adds CREATE2 deploy / deploy+destroy-in-one-tx (EIP-6780) / separate-tx destroy / re-create / EXTCODEHASH·EXTCODESIZE·EXTCODECOPY·BALANCE·SELFBALANCE·SLOAD reads of the target, a flag-conditional deployer (reorderable salt), TLOAD/TSTORE, DELEGATECALL, SSTORE clear-refund, contract LOG, a GAS canary, 7702 delegation to the metamorphic target, a reverting tx (journal rollback), and legacy / EIP-2930 / EIP-1559 envelopes.

Executed tests

Check Result
TestAllBlocksConsistency (BOR_BLOCKSTM_TEST=1, 241 real mainnet blocks, v2-parallel vs serial) 241/241 consistent, 0 failures
FuzzV2ExecutorVsSerial over the expanded grammar (post-fix) 1,016,614 execs, 0 divergences (pre-fix it finds the divergence in ~19s)
TestV2SerialParity_MetamorphicCreate2 (new deterministic regression) catches pre-fix (5/5), passes post-fix
TestMetamorphicHarnessSemantics, TestV2ExecutorVsSerial_SeedCorpus (worker grid {1,2,3,4,6,8}), core/state unit suite, existing v2 selfdestruct/balance/gas-determinism tests green

Standard CI gates (lint, unit/integration, diffguard w/ mutation, govulncheck, kurtosis e2e) still to run on this PR.

Rollout notes

  • Consensus-affecting for BlockSTM v2 only. v2 is not present in any released version (absent from v2.8.3); the default configuration runs v2 non-enforced with serial as the source of truth and validates the v2 root against the block header, falling back to serial on mismatch. There is therefore no behavior change for current production nodes and no coordinated upgrade required — this hardens v2 before it ships as a primary execution path.
  • Not operator-facing. No config, genesis, fork-activation, or precompile changes; backward compatible.
  • Cross-client parity (Erigon) is unaffected: serial execution output is unchanged; this only changes when v2 re-executes a tx internally.

🤖 Generated with Claude Code

cffls and others added 3 commits June 25, 2026 13:23
The V2 read-validation value-equality fallback accepted a read whenever the
current MVStore value equaled the recorded value, even when the winning
writer's block-order position changed. For reads resolved relative to a prior
SELFDESTRUCT (writerIdx vs suicideIdx) and for the CreatePath/SuicidePath
existence markers, the writer position — not the value — determines the result.
A reordered metamorphic CREATE2 + same-tx SELFDESTRUCT could move the winning
code/storage writer across the destruction boundary while its value stayed
equal, so re-validation skipped re-execution and V2 settled a state root that
differed from serial.

Add a per-read ExactWriter flag that disables the value-equality fallback for
these reads: always for the CreatePath/SuicidePath markers (their value is
always true, so value equality only masks a reorder), and for Nonce/Code/
Storage reads when a prior destruction is in play (suicideIdx >= 0). The marker
reads now record the real writer incarnation instead of 0. The common
no-destruction path keeps the value-equality optimization.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…sion tests

Broaden the V2-vs-serial differential fuzzer to reach the metamorphic
CREATE2 + SELFDESTRUCT lineage and the remaining state/gas-affecting opcodes
the old vocabulary (transfers, plain creates, storage calls, selfdestruct
payouts, 7702 auth) could not exercise:

- CREATE2 deploy, deploy+destroy-in-one-tx (EIP-6780 deletion), separate-tx
  destroy, re-create at the same address, and EXTCODEHASH / EXTCODESIZE /
  EXTCODECOPY / BALANCE / SELFBALANCE / SLOAD reads of the target
- a flag-conditional deployer whose CREATE2 salt is the balance of a flag
  account, so whether the target exists depends on a value another tx writes
  (a reorderable input the executor must invalidate on)
- TLOAD / TSTORE (transient per-tx/per-incarnation isolation), DELEGATECALL,
  SSTORE clear-refund, contract LOG, a GAS canary, and legacy / EIP-2930 /
  EIP-1559 tx envelopes
- 7702 delegation to the metamorphic target (delegated-code resolution across
  a destroy/re-create) and a reverting tx (journal rollback of a nested
  CREATE2 + SSTORE)

Add TestV2SerialParity_MetamorphicCreate2, a deterministic regression test:
the light-tx fuzz seeds reproduce the divergence only when the scheduler
interleaves the speculative re-create ahead of the conditional flag write
(empirically a few percent of runs), so this test forces that interleaving
with a heavy gas-burn flag tx and loops to defeat residual scheduling
nondeterminism. Also add TestMetamorphicHarnessSemantics, which validates the
hand-built harness bytecode on the serial path so a harness bug can't hide a
divergence, and broaden the seed-corpus worker grid to {1,2,3,4,6,8}.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…dary

Add TestParallelExactWriterAcrossDestruction: for each destruction-resolved
read (GetState / GetCommittedState / GetCode / GetCodeHash / GetNonce) and the
existence markers, stage the metamorphic shape at the MVStore level (create +
SELFDESTRUCT at tx 0, equal-valued re-create at tx 2), record the read, then
withdraw tx 2 so the winner falls back to the wiped tx-0 write, and assert the
read no longer validates. This pins the ExactWriter requirement and — with the
destructor at tx index 0 — the suicideIdx>=0 boundary.

Restores diffguard mutation coverage to 100%: the `>=` -> `>` conditional-
boundary mutation on the GetCommittedState read site previously survived
because no core/state test distinguished suicideIdx>=0 from >0 (they differ
only when the destructor is the first tx).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@sonarqubecloud

Copy link
Copy Markdown

@cffls

cffls commented Jun 25, 2026

Copy link
Copy Markdown
Contributor Author

@claude review

@codecov

codecov Bot commented Jun 25, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 53.69%. Comparing base (5df879f) to head (f745469).
⚠️ Report is 1 commits behind head on develop.

Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #2290      +/-   ##
===========================================
+ Coverage    53.44%   53.69%   +0.24%     
===========================================
  Files          896      897       +1     
  Lines       159770   160371     +601     
===========================================
+ Hits         85397    86108     +711     
+ Misses       69054    68921     -133     
- Partials      5319     5342      +23     
Files with missing lines Coverage Δ
core/state/parallel_statedb.go 96.06% <100.00%> (+0.01%) ⬆️
core/state/parallel_statedb_validate.go 90.51% <100.00%> (ø)

... and 40 files with indirect coverage changes

Files with missing lines Coverage Δ
core/state/parallel_statedb.go 96.06% <100.00%> (+0.01%) ⬆️
core/state/parallel_statedb_validate.go 90.51% <100.00%> (ø)

... and 40 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@claude claude Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consensus-affecting fix to BlockSTM v2 read validation; the analytical reasoning behind the selective ExactWriter placement and the claim of completeness across all six destruction-resolved read sites deserves a domain-expert look despite the strong test coverage.

Extended reasoning...

Overview

This PR fixes a real V2-vs-serial divergence: when a metamorphic CREATE2/SELFDESTRUCT reorders a code/storage/marker writer across the destruction boundary while the value stays equal, storeReadMatches' value-equality fallback wrongly accepts the stale read. The fix introduces a per-read ExactWriter flag that disables that fallback for (a) CreatePath/SuicidePath existence markers unconditionally and (b) Nonce/Code/Storage reads when suicideIdx >= 0. Six destruction-resolved read sites in parallel_statedb.go are updated via a new recordStoreReadEx helper, and storeReadMatches in parallel_statedb_validate.go gains the !rd.ExactWriter guard. The bulk of the diff is test infrastructure: a metamorphic harness (deployer/destroyer/reader/flag/reverter/transient/delegate/refund/logger contracts), an expanded fuzz vocabulary covering CREATE2/EIP-6780/7702-to-metamorphic/legacy+2930+1559 envelopes, harness-semantics sanity tests, and a deterministic regression test.

Security risks

No direct security risks (no auth, crypto, or permission changes). The change is consensus-affecting only for BlockSTM v2, which the PR description states is absent from released versions and runs non-enforced with serial as source of truth and root validation against the block header — so a v2 settlement that diverged from serial would be caught by the existing root check and fall back to serial on mismatch. There is no operator-facing surface.

Level of scrutiny

High. This is consensus-critical parallel-execution validation logic, and the correctness of the fix rests on a completeness argument: that the six listed read sites cover every read where the winning writer's position is load-bearing. The PR's analytical reasoning (markers always need ExactWriter because their value is constant; non-marker reads need it iff a prior destruction is in play) is sound on its face, but a domain expert should confirm no other path resolves reads relative to suicideIdx (e.g. in settlement, in self-read paths, or in any helper that bypasses recordStoreReadEx). The selective placement is also a forward-compat hazard: any future read site that consults priorDestructedAt must remember to use recordStoreReadEx with the right flag.

Other factors

Test coverage is strong: TestParallelExactWriterAcrossDestruction pins the invariant at the MVStore level for all six sites with a deliberately-placed destructor at idx 0 (the boundary case), TestV2SerialParity_MetamorphicCreate2 is a deterministic regression that catches pre-fix 5/5, the expanded fuzz grammar finds the pre-fix divergence in ~19s, and the author reports 241/241 mainnet blocks consistent under BOR_BLOCKSTM_TEST=1. SonarQube quality gate passed. The CI gates (lint, unit/integration, diffguard, govulncheck, kurtosis) are still to run per the PR description. Bug hunter found no issues.

This is not the kind of mechanical change that warrants auto-approval; cffls explicitly asked for review, so deferring to them rather than staying silent.

@cffls cffls marked this pull request as ready for review June 25, 2026 23:15

@claude claude Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.

Tip: disable this comment in your organization's Code Review settings.

@cffls cffls requested a review from a team June 26, 2026 18:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants