feat: merge-train/spartan-v5#24053
Merged
Merged
Conversation
## Motivation
`L2BlockStream.work()` reconciled the four chain tips through
independent code paths — a reorg walk, two
checkpoint loops with a shared cursor, a block loop, and a final
proven/finalized diff — with optimizations
(`startingBlock`, `skipFinalized`) that muted some paths but not others.
Nothing enforced that what one poll
emitted was tier-consistent, which produced a family of bugs (detailed
below). The root enabler was the fat
`chain-checkpointed` event: because it carried a full
`PublishedCheckpoint`, the stream had to fetch, order,
and replay every checkpoint individually, and every special case had to
reason about when that replay could be
skipped.
This PR makes `chain-checkpointed` a thin tip event like
`chain-proven`/`chain-finalized` and collapses
`work()` into three symmetric steps: reorg detection, one block-download
loop, and unconditional end-of-pass
tier reconciliation from a single source snapshot. The two consumers
that relied on per-checkpoint payload
delivery are first decoupled: the sentinel stops needing a block stream
at all, and the prover-node drives its
own checkpoint catch-up from a cursor.
## Bugs and races fixed
- **Checkpointed cursor stuck at genesis under `startingBlock`
(A-1061).** The `startingBlock` fast-forward
arm suppressed all checkpoint emission while the end-of-pass
`chain-proven` still fired, leaving the local
store with `proven > checkpointed` and the checkpointed cursor at
genesis until the source's checkpointed
tip passed `startingBlock`. In p2p this degraded `isEpochPrune` (epoch
prunes misclassified as ordinary
prunes, so `txPoolDeleteTxsAfterReorg` was not honored during the
window); the same pattern affected the
sentinel and prover-node streams. Tier reconciliation is now
unconditional against the source snapshot, so
the first pass converges. Regression: *emits the source checkpointed tip
on the FIRST pass even when
startingBlock is past it*.
- **Duplicate / out-of-order checkpoint events after a reorg
(stale-snapshot guard).** Loop 1's "blocks
already local" guard compared against the `localTips` snapshot taken
before the prune handler rewrote the
store, so a node lagging a checkpoint that detected a reorg in the same
pass could emit `chain-checkpointed`
for a new-fork checkpoint before its blocks, then re-emit checkpoints
from the prune target upward. Loop 1
is deleted; reconciliation re-reads the local tips after a prune so it
compares against the just-clamped
cursors. Regression: *does not re-emit the checkpointed tip after
pruning to a block ahead of it*.
- **Same-height stale cursors never refreshed (number-only
comparisons).** The proven/finalized gates compared
block numbers only, so a same-number/different-hash tip after a reorg
kept a stale (hash, checkpoint-id)
pair until the tip next advanced. Gates now compare (number, hash) —
skipping the hash comparison when the
local hash is `undefined`, which world-state legitimately reports for
tips ahead of its synced range
(comparing against it would re-emit the event on every poll).
Regressions: *re-emits the proven tip when
numbers match but the known local hash differs* and *does not re-emit
when the local hash is undefined*.
- **Sentinel credited attestors on reorged-out checkpoints.** Its
`slotNumberToCheckpoint` map had no
`chain-pruned` handling, so a reorged-out checkpoint's attestation entry
lingered and `getSlotActivity`
could classify validators against a non-canonical checkpoint. The map
(and the stream feeding it) is gone;
the sentinel fetches the checkpoint for a slot on demand, so the answer
is always canonical.
- **Read-skew race between `getL2Tips` and `getCheckpoints`.** The old
checkpoint loops fetched payloads in
separate reads from the tips snapshot; a source-side reorg between the
two reads could plant inconsistent
state in the local store (the reason catch-up code needed
validation-and-abort machinery). Eliminated
structurally: every tier event is now built from the same `getL2Tips`
snapshot, with no second fetch to
skew against.
- **Prover-node restart could have skipped unproven checkpoints
(prevented by design here).** With catch-up
driven by a cursor, seeding it from a checkpointed tip would silently
skip the unproven checkpoints of a
partially-proven epoch on restart. The cursor seeds from the last
checkpoint of the last fully-proven epoch
(or 0), and advances only after both `checkpointStore.addOrUpdate` and
`sessionManager.onCheckpointAdded`
succeed, preserving the existing at-least-once retry semantics (A-1041).
Epoch expiry additionally gets a
periodic tick, since it previously piggybacked on per-checkpoint event
volume.
## Approach
- **Sentinel (decoupled first):** deletes its `L2BlockStream`,
`L2TipsMemoryStore`, slot→checkpoint map, and
the manual stream sync in its work loop. `getSlotActivity` fetches
`archiver.getCheckpoint({ slot })` on
demand (the by-slot query already existed) and derives the same
attestation data; the p2p-sync gate reads
the archiver tips directly instead of a stale local mirror.
- **Prover-node (decoupled second):** on a thin `chain-checkpointed` tip
event, walks every checkpoint between
its cursor and the tip: light `getCheckpointsData` metadata first,
whole-epoch relevance filtering (an epoch
is skipped only if fully proven or past its proof-submission window —
never individual checkpoints inside a
provable epoch, which the SessionManager's full-coverage contract
requires), then a heavy `getCheckpoint`
fetch only for checkpoints it will actually register. This is strictly
cheaper than the old stream replay,
which transferred every full checkpoint payload before the prover-node
could decide to skip it.
- **Stream rewrite:** `chain-checkpointed` becomes `{ block: L2BlockId,
checkpoint: CheckpointId }`, emitted at
most once per pass — symmetric with `chain-proven`/`chain-finalized`.
`work()` is now: reorg walk-back +
`chain-pruned`; one `getBlocks` download loop (start incorporates
`startingBlock`/`skipFinalized`);
end-of-pass reconciliation checkpointed → proven → finalized from one
snapshot. PXE, the one remaining
payload consumer, fetches its anchor header by hash (reorg-safe) and
skips the update if the block vanished.
## API changes
Internal API only (no RPC schema changes):
- `L2BlockStreamEvent`: `chain-checkpointed` carries `{ block,
checkpoint }` ids instead of a
`PublishedCheckpoint`, and fires at most once per sync pass instead of
once per checkpoint. Consumers
needing payloads fetch them from the block source.
- `L2BlockStream` source narrows to `Pick<L2BlockSource, 'getBlocks' |
'getBlockData' | 'getL2Tips'>`; the
`checkpointPrefetchLimit` option and `CHECKPOINT_PREFETCH_LIMIT` export
are gone.
- `LocalChainTips.checkpointed` widens to `{ block, checkpoint }` so the
checkpointed tier can hash-gate like
proven/finalized (still structurally assignable from `LocalL2Tips`).
## Simplifications
- `L2BlockStream.work()`: ~220 → ~115 lines. Deleted: Loop 1
(already-local checkpoint backfill), Loop 2 +
prefetch buffer (checkpoint-transport block download),
`nextCheckpointToEmit`, both `startingBlock`
checkpoint fast-forward arms, and the checkpoint payload fetching
altogether.
- Catch-up emits at most one checkpointed event per pass regardless of
lag — no per-checkpoint replay, no
multi-emission warning path, no anti-spam special cases.
- Sentinel: net ~50 lines removed plus a whole subsystem dependency
(stream + tips store) — replaced by one
~25-line on-demand fetch.
- PXE node adapter: the ~25-line `getCheckpoints` implementation is
deleted; the telemetry stream wrapper
narrows accordingly.
- Tips stores: `handleChainCheckpointed` reads ids straight off the
event instead of recomputing the
checkpoint hash from the payload.
- Stream test suite: rewritten from 1,828 to ~640 lines while adding
four regression tests for the bugs above.
- Net across the branch: **−1,228 lines** over 15 files.
## Changes
- **stdlib**: thin `chain-checkpointed` event; collapsed `work()`;
hash-aware tier gates; narrowed stream
source type; tips-store handler reads event ids directly.
- **stdlib (tests)**: stream suite rewritten around the new event
semantics + regression tests for A-1061,
post-prune reconciliation, and both hash-gate behaviors.
- **prover-node**: cursor-driven checkpoint catch-up
(`processCheckpointJump`/`registerCheckpoint`/
`computeStartingCheckpoint`), whole-epoch relevance filtering, prune
clamping, periodic epoch-expiry tick.
- **aztec-node (sentinel)**: block stream, tips store, and
slot→checkpoint map deleted; on-demand canonical
checkpoint fetch; direct archiver read for the p2p-sync gate.
- **pxe**: anchor header fetched by hash on checkpointed-tip events;
node adapter loses `getCheckpoints`.
- **kv-store / telemetry-client**: test-suite and wrapper-type
adjustments to the new event shape.
Fixes A-1061
---------
Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
## Motivation The `proposedCheckpoint` tip on `L2Tips` was the checkpoint frontier including L1-submitted-but-unconfirmed proposals, sitting between `proposed` (blocks) and `checkpointed` (confirmed). It existed only for proposer pipelining and had outlived its usefulness as a tip: - It was internal-only — never carried on the public RPC surface (`ChainTips` already excluded it), so no schema or storage migration is involved in removing it. - It was degenerate everywhere except the archiver: local stores (world-state, l2-tips-store) could never represent it (no stream event carries it; the cursor was always equal to `checkpointed`), which is why earlier work had already narrowed `LocalL2Tips` to omit it. - It was fabricated by shims purely to satisfy the type: the PXE node adapter and the TXE archiver both invented a `proposedCheckpoint: checkpointed` value. - Most importantly, the tip and its payload were read from two independent archiver snapshots — a JS-side tips cache for the tip vs. a direct store read on `#proposedCheckpoints` for the payload. A concurrent archiver write could be observed split, which forced explicit race-detection/refuse-to-proceed code in the sequencer's `checkSync`. ## Approach Drop the `proposedCheckpoint` tip from `L2Tips` entirely. The few consumers that needed the proposed-checkpoint frontier (the sequencer pipelining path, `automine`, and the node's `simulatePublicCalls`) read the payload directly from the existing archiver method: ```ts // L2BlockSource / archiver getProposedCheckpointData(): Promise<ProposedCheckpointData | undefined>; ``` With no argument this returns the latest proposed (not-yet-L1-confirmed) checkpoint payload, which is always the leading one: `addProposedCheckpoint` only stores a checkpoint number beyond the confirmed frontier (sequentiality is enforced), and every confirmation path (`addCheckpoints`, `promoteProposedToCheckpointed`) deletes the matching proposed entry in the same transaction. Consumers derive the proposed tip from the payload — checkpoint number, and last block = `startBlock + blockCount - 1` — so there is no separate tip to read and no tip-vs-payload split to reconcile. This removes the snapshot-race reconciliation code that previously lived in the sequencer's `checkSync`. The proposed-but-unconfirmed checkpoint frontier is archiver-internal. It is **not** exposed on the public node RPC surface, so the `'proposed'` checkpoint tag is removed from `getCheckpointNumber` and `getCheckpoint` (see breaking change below). ## Breaking change (RPC) The `'proposed'` checkpoint tag is no longer accepted by the node's `getCheckpointNumber(tip?)` and `getCheckpoint(param)` (and is dropped from `CheckpointParameter`). The proposed-but-unconfirmed checkpoint frontier was never a public concept; checkpoint-tip selectors are now `'checkpointed' | 'proven' | 'finalized'`. The block-level `'proposed'` tag (latest proposed block) on `getBlockNumber` / `getChainTips` is unchanged. ## API changes (internal) - `proposedCheckpoint` removed from `L2Tips` and `L2TipsSchema`; `'proposedCheckpoint'` removed from `L2BlockTag`. - `L2BlockSource.getProposedCheckpointData(query?)` is unchanged; its no-arg form returns the leading proposed-checkpoint payload (used by the sequencer / `simulatePublicCalls`), and its by-number/by-slot forms back the node's confirmed→proposed `getCheckpoint` fallback. - The `ChainTip` / `ChainTips` / `ChainTipSchema` / `ChainTipsSchema` aliases are removed. Block-level call sites use `L2BlockTag` / `L2Tips` / `L2TipsSchema` / `BlockTagWithoutLatestSchema` directly; checkpoint-tip selectors use the new `CheckpointTag` / `CheckpointTagSchema` (`'checkpointed' | 'proven' | 'finalized'`). `LocalL2Tips` continues to alias `L2Tips`. ## Changes - **stdlib**: removed the tip from `L2Tips`/`L2TipsSchema`/`L2BlockTag`; replaced the `ChainTip(s)` aliases with `CheckpointTag`/`CheckpointTagSchema` and direct `L2*` usage; narrowed `getCheckpointNumber`/`CheckpointParameter` to checkpoint tags; updated interface tests. - **archiver**: removed the `proposedCheckpoint` cursor from `getL2TipsData` (tips cache); rewrote `pruneOrphanProposedBlocks` to derive the proposed-checkpoint frontier block number from the payload store; cleaned the trace log and the mocks. - **sequencer-client**: `checkSync` consumes the proposed payload via `getProposedCheckpointData()`; **deleted the snapshot-race reconciliation block** (the inconsistent-proposed-checkpoint refuse-to-proceed guard) that only existed because tip and payload were read separately; `hasProposedCheckpoint` is derived from the payload. Migrated `automine_sequencer` next-checkpoint-number computation. - **aztec-node**: `getCheckpointNumber` and `getCheckpoint`/`resolveCheckpointParameter` no longer resolve the `'proposed'` checkpoint tag; `simulatePublicCalls` derives the proposed-checkpoint frontier from the payload (no extra block fetch in the leading case). `getChainTips` simplified. - **pxe / txe**: deleted the `proposedCheckpoint: tips.checkpointed` fabrication shims in the PXE node adapter and the TXE archiver. Fixes A-978
Wait after the scheduled send time until the previous L1 block is observed before submitting. Alternative to #24035. Fixes A-1207
## What `e2e_fees/account_init` → *"account pays its own fee › pays natively in the Fee Juice by bridging funds themselves"* flakes while preparing L1 fee juice (before the account deploy), failing on the bridge write: ``` ContractFunctionExecutionError: ... contract function "depositToAztecPublic" from: 0xf39Fd6e51aad88F6F4ce6aB8827279cffFb92266 Details: replacement transaction underpriced ``` CI: http://ci.aztec-labs.com/60b00f39a7a0e3cc ## Why The fee-juice bridge test harness does direct L1 writes (`mint` → `approve` → `depositToAztecPublic`) through a plain viem client with **no nonce manager**, so each write picks its nonce via `getTransactionCount('pending')`. That client uses Anvil account `0xf39f…92266` (mnemonic index 0) — which is **also the sequencer publisher account**. The publisher tracks its own nonce in `l1-tx-utils`, independently of the test client. When a test write and a publisher checkpoint tx land on the same nonce concurrently, the higher-priced publisher tx wins and the test's lower-priced tx is rejected. In the failed run the publisher was sending on account #0 at ~95–120 gwei: ``` node:l1-tx-utils Sent L1 transaction 0x9652… {nonce: 67, account: 0xf39f…92266, maxFeePerGas: "95.396…"} ``` while the test's `depositToAztecPublic` went out on the same account at nonce **60** / ~26 gwei → `replacement transaction underpriced`. The shared-account race is pre-existing but rarely hit; #24037 (changing publisher send timing) makes the overlap more likely, which surfaced it. ## Fix Send the harness's direct L1 writes from a dedicated mnemonic account (`L1_DIRECT_WRITE_ACCOUNT_INDEX = 1`) instead of the deployer/publisher account #0. Index 1 is unused by any L1-tx sender in these setups (publisher = 0, prover = 2, validators = 3+), so the test's writes no longer share a nonce stream with the publisher. Affects `FeesTest` and `ClientFlowsBenchmark`.
…PublicCalls (#24031) ## Motivation `AztecNodeService.simulatePublicCalls` built globals for a fictional next block using "the L2 slot at the next L1 slot" with no pipelining offset and no L1 state overrides. Under proposer pipelining this simulates one slot too early and computes the mana min fee against the current L1 state, ignoring the gossiped proposed checkpoint that will land before the simulated block does — producing wrong L1-to-L2 message sets and wrong fees for wallet-side estimation. ## Approach The simulation now mirrors how the sequencer picks the build slot and fee inputs (`sequencer.ts` / `checkpoint_proposal_job.ts` / the checkpoint simulation overrides builder), split into two cases: - **Mid-checkpoint continuation** (proposed blocks past the last checkpoint proposal): every block in a checkpoint shares the same checkpoint-wide globals, so the next block's globals are copied verbatim from the latest proposed block header with only the block number bumped. No L1 reads, no L1-to-L2 message insertion. A missing header fails the request rather than falling through, since the fork already contains the ongoing checkpoint's messages and a fall-through would insert them twice. - **Opening a new checkpoint**: the target slot is the sequencer's formula (`next-L1-slot L2 slot + PROPOSER_PIPELINING_SLOT_OFFSET`), maxed with `proposedCheckpointSlot + 1` when a checkpoint proposal is pending L1. The same `SimulationOverridesPlan` the sequencer uses is always applied: parent archive/temp-log/fee-header overrides derived from the proposed checkpoint when pipelining, tips pinned to the rollback target when the pending chain is invalid, and tips pinned to the checkpointed tip otherwise (neutralizing prunes in fee computation). Both the target slot and the plan derive from a single `ProposedCheckpointData` read so they cannot disagree about the parent. The simulator trusts archiver tips verbatim — no staleness guards. Stale proposed blocks and checkpoints are the archiver's responsibility (orphan pruning and L1-authoritative eviction), so a stale tip mis-simulates only transiently. The logic moves out of `AztecNodeService` into a new `NodePublicCallsSimulator` class so the slot/globals selection is unit-testable without standing up a node. Supersedes #23389. Caching of the L1 fee queries introduced here is tracked separately in A-1208. **Structured for commit-by-commit review:** 1. `refactor(aztec-node)`: pure move of the existing `simulatePublicCalls` code (and its tests) into `NodePublicCallsSimulator` — no behavior change. 2. `fix(aztec-node)`: the actual behavior changes described above. 3. `refactor`: moves the checkpoint simulation overrides builder from sequencer-client to `@aztec/stdlib/checkpoint` so the node consumes it from a shared home. ## API changes - `GlobalVariableBuilder.buildGlobalVariables` is removed from the stdlib interface and its implementations (sequencer-client, TXE); slot selection is now the caller's job and only `buildCheckpointGlobalVariables` remains. - `buildCheckpointSimulationOverridesPlan` and `computePipelinedParentFeeHeader` move from sequencer-client internals to `@aztec/stdlib/checkpoint`, shared by the sequencer and the node simulator. - `AztecNodeService` constructor takes an optional `RollupContract`; environments that never see a proposed checkpoint or an invalid pending chain (TXE) may omit it, and the simulator fails loudly if those paths are reached without it. ## Changes - **aztec-node**: new `NodePublicCallsSimulator` with the two-case globals selection and always-on overrides plan; `simulatePublicCalls` reduced to a thin delegate. - **aztec-node (tests)**: 12 unit tests covering verbatim mid-checkpoint continuation, the missing-header guard, the pipelining offset, parent-slot/override derivation from the proposed checkpoint data, torn-snapshot fallback, invalid-pending-chain tips pinning, message insertion semantics, and the rollup-contract invariant. - **stdlib**: gains `checkpoint/simulation_overrides.ts` (moved from sequencer-client, with its unit tests); `buildGlobalVariables` removed from the `GlobalVariableBuilder` interface. - **sequencer-client**: imports the overrides builder from stdlib; `buildGlobalVariables` removed from the implementation. - **txe**: `buildGlobalVariables` removed; TXE node construction passes no rollup contract. Fixes A-1063
Collaborator
Author
|
🤖 Auto-merge enabled after 4 hours of inactivity. This PR will be merged automatically once all checks pass. |
…PI (#24066) Fixes A-1220 ## Why `node.registerContractFunctionSignatures` was an always-on, unauthenticated write on the main `AztecNode` JSON-RPC surface. It populates a per-node, in-memory, non-gossiped selector→name map used only to name public-execution stack traces, so it has no business on a prod node's public API (anyone can spam it up to the cap, and you'd have to keep talking to the same node for it to mean anything). Decoding traces should be a client concern. ## What Relocate the method from the always-on `AztecNode` interface to the existing, gated `AztecNodeDebug` interface (served under the `nodeDebug_` namespace, only mounted when the node runs with debug endpoints enabled — always on in the in-process sandbox, off by default on standalone nodes). The node-side read path (`getDebugFunctionName` → AVM trace naming) is unchanged; only the RPC write moves. The PXE now calls it through an **optional** `nodeDebug` client and skips registration entirely when the node does not expose the debug API. PXE-side simulation errors keep named traces regardless, via the existing ABI-based enrichment in `pxe/src/error_enriching.ts`. - **stdlib**: removed `registerContractFunctionSignatures` from `AztecNode` + `AztecNodeApiSchema`; added it (and the length/count caps) to `AztecNodeDebug` + `AztecNodeDebugApiSchema`. - **aztec-node**: server impl unchanged — `AztecNodeService` already implements `AztecNodeDebug`, so the method now satisfies that interface and is served under `nodeDebug_`. - **pxe**: optional `nodeDebug?: AztecNodeDebug` on `PXECreateArgs`/constructor; both registration call sites become `this.nodeDebug?.registerContractFunctionSignatures(...)`. Threaded `options.nodeDebug` through all three `createPXE` entrypoints (option A — explicit injection). - **e2e**: wired `nodeDebug` in `end-to-end/src/fixtures/setup.ts` for both PXE-creation paths — `setup()` passes the in-process node service directly; `setupPXEAndGetWallet()` passes it via an `isAztecNodeDebug` type guard (one caller types its node loosely). - **tests**: moved the schema round-trip case from the node interface test to the debug interface test; retargeted the PXE registration tests to the `nodeDebug` mock; updated the `node_rpc_perf` benchmark. - **docs**: dropped the moved method from the generated node API reference (the generator only documents `node_`/`nodeAdmin_`), and added a migration note. ## Breaking change `node_registerContractFunctionSignatures` is removed from the main node RPC. Callers must use `nodeDebug_registerContractFunctionSignatures` against a debug-enabled node. See the migration note. ## Verification Local build could not be run here: the workspace is a cold checkout (noir submodule packages absent, no Rust/Docker), so `yarn install`/`yarn build` cannot bootstrap. The change is a contained TypeScript interface relocation + optional-client wiring; relying on CI for the full build/lint/test. A follow-up could wire `createAztecNodeDebugClient(nodeUrl)` into the URL-based PXE creators (bot/embedded wallet) if named traces are wanted against a debug-enabled remote node. --- *Created by [claudebox](https://claudebox.work/v2/sessions/b8d3c4fab6c0225a) · group: `slackbot`*
## Conflict resolution — review this PR Stacked on top of the raw merge PR #24071 (`cb/merge-train-spartan-v5-raw`). This PR is the reviewable diff that resolves the one conflict from merging `v5-next` into `merge-train/spartan-v5`. Land order when green: this PR → raw PR → `merge-train/spartan-v5`. ### Conflicts resolved (1) **`yarn-project/end-to-end/src/bench/client_flows/client_flows_benchmark.ts`** — import-list divergence: - Train side imported `L1_DIRECT_WRITE_ACCOUNT_INDEX` (from `fixtures.js`) and `deployAccounts` (from `setup.js`); v5-next had neither. - **Kept** `L1_DIRECT_WRITE_ACCOUNT_INDEX` — it is exported by `fixtures.ts` (`export const L1_DIRECT_WRITE_ACCOUNT_INDEX = 1`) and used in the file body (passed to the L1 client at the deployer-account index, ~line 255). - **Dropped** `deployAccounts` — it is no longer exported by the merged `setup.ts` (v5-next refactored it away) and is unused anywhere in the file body, so importing it would break the build. No fixtures/generated constants were involved; this is a pure import resolution. Full TS build is validated by CI on this PR. ### Parents - train `merge-train/spartan-v5` @ `d979a26` - base `v5-next` @ `5fd6187` --- *Created by [claudebox](https://claudebox.work/v2/sessions/0e582efb7d4723ed) · group: `slackbot`*
…rs) (#24071) ## Raw merge — do not merge alone Raw `git merge --no-ff` of `v5-next` into `merge-train/spartan-v5` with conflict markers **committed**. This PR is red by design and exists only as the base of the stacked resolution PR. Review it together with the fix PR `cb/merge-train-spartan-v5-fix`. ### Parents - train `merge-train/spartan-v5` @ `d979a26bc060068ecc33c0b1d88a8dac7e941bef` - base `v5-next` @ `5fd61870609b88b14dfcd2924453bd81f77ef0b5` - merge-base `ab5413c72dc5377107943b8614130ec8050bf06c` ### Conflicted files (1) - `yarn-project/end-to-end/src/bench/client_flows/client_flows_benchmark.ts` Resolution + any regeneration lands in the stacked fix PR. --- *Created by [claudebox](https://claudebox.work/v2/sessions/0e582efb7d4723ed) · group: `slackbot`*
## Summary - Keep the shared cross-chain rollup/inbox/outbox client on the deployer mnemonic account so L2->L1 message consumption signs with the expected recipient. - Let L1-writing cross-chain suites opt their harness into `L1_DIRECT_WRITE_ACCOUNT_INDEX` to avoid nonce races with the sequencer publisher under pipelining. - Apply that dedicated harness account to token bridge and L1->L2 message tests. - Fix Prettier formatting reported by CI. ## Verification - `git diff --check` - `./bootstrap.sh ci` (blocked locally: bootstrap reports `Unknown command: ci`; the container also has no Docker/Redis access) - `yarn-project/./bootstrap.sh format --check` (blocked locally: missing `yarn-project/node_modules/.bin/prettier`)
…nistic (#24106) ## Why the merge train was dequeued PR #24053 (`merge-train/spartan-v5` → `v5-next`) was dequeued from the merge queue because its `merge-queue-heavy` CI run failed. Tracing the failure: - CI3 merge_group run `27568575541` → shard `x4-full` (`ci-full-no-test-cache`) failed. - The failing test was `yarn-project/world-state/src/native/native_world_state.test.ts` (`code: 1`), specifically: > `NativeWorldState › Pending and Proven chain › does not fail when a delayed-close fork is destroyed by a reorg before its close fires` - Failing assertion (`native_world_state.test.ts:966`): ``` expect((ws as any).instance.queues.has(forkId)).toBe(false); // Expected: false, Received: true ``` The `warnSpy` assertion on the preceding line passed; only the queue-cleanup assertion failed. This test and the world-state instance/facade code it exercises are **not modified by this train** — the test already exists unchanged in `v5-next`. So this is a **pre-existing timing flake** that happened to fail in this grind and blocked the train. ## Root cause The delayed-close path is correct and always eventually cleans up: - `asyncDispose()` schedules `sleep(closeDelayMs).then(() => close())` (fire-and-forget). - `close()` → `doClose()` → `instance.call(DELETE_FORK)`, and the per-fork queue entry is deleted in `call()`'s `finally` regardless of whether the native side rejects with `Fork not found` (which it does here, because the reorg/unwind already destroyed the native fork). The test then waited a **fixed** `sleep(closeDelayMs * 3)` (3s) before asserting the queue was gone. Under a loaded merge-queue grind, the scheduled close (fired at +`closeDelayMs`) plus its async `DELETE_FORK` round-trip and `queue.stop()` can take longer than that fixed margin, so the queue entry was still present when the assertion ran — Received `true`. ## Fix Replace the racy fixed sleep with a deterministic wait on the exact asserted condition: ```ts await retryUntil(() => !(ws as any).instance.queues.has(forkId), 'delayed-close fork queue cleanup', 30, 0.1); ``` This polls until the queue entry is removed (the cleanup runs inside the same close path), with a 30s timeout. It preserves the regression intent: if cleanup genuinely stopped happening, `retryUntil` throws a clear `Timeout awaiting delayed-close fork queue cleanup` instead of a bare `toBe(false)`. The production code is unchanged; no test is skipped or weakened. The now-unused `sleep` import is swapped for `retryUntil`. ## Verification This is a test-only, single-statement timing change. I was unable to run `./bootstrap.sh ci` in this environment — the workspace has no `node_modules` and no barretenberg/native `world_state_napi` build artifacts, so exercising this native test would require a multi-hour C++ + TS build that isn't proportionate to a wait-deterministically change. The change is type-checked by inspection: `retryUntil(fn, name, timeoutSec, intervalSec)` matches the foundation signature, and the import path (`@aztec/foundation/retry`) is the one used across yarn-project. CI on this PR will validate it. --- *Created by [claudebox](https://claudebox.work/v2/sessions/6fb41d03fbbd7aaa) · group: `slackbot`*
## Summary - Investigated the `merge-train/spartan-v5` dequeue for train PR #24053. - Latest dequeue was from merge-group CI, not a merge conflict: #24053 is currently mergeable/clean on `merge-train/spartan-v5`. - The failing merge-group run was `27576008840` on `gh-readonly-queue/v5-next/pr-24053-fc2322d6...`, failing `ci/x3-full` via `yarn-project/kv-store/scripts/run_test.sh src/bench/sqlite-opfs-encrypted/map_bench.test.ts`. - Root cause: skipped-by-default browser benchmarks still imported SQLite/OPFS and shared benchmark dependencies at module load, so normal CI test discovery could fail before the `describe.skip` path. - Fix: lazy-load benchmark dependencies only when `VITE_BENCH=1` for the IndexedDB, SQLite-OPFS, and encrypted SQLite-OPFS map benchmarks. Failed dashboard log: http://ci.aztec-labs.com/dbe70cb905f2df69 ## Verification - `CI=1 PATH="$HOME/.cargo/bin:$PATH" scripts/run_test.sh src/bench/sqlite-opfs-encrypted/map_bench.test.ts` from `yarn-project/kv-store` passes with the benchmark skipped. - `CI=1 PATH="$HOME/.cargo/bin:$PATH" scripts/run_test.sh src/bench/sqlite-opfs/map_bench.test.ts` passes with the benchmark skipped. - `CI=1 PATH="$HOME/.cargo/bin:$PATH" scripts/run_test.sh src/bench/indexeddb/map_bench.test.ts` passes with the benchmark skipped. - `yarn format kv-store --check` passes. Notes: the requested root `./bootstrap.sh ci` is not a valid bootstrap command in this checkout (`Unknown command: ci`). I also tried the apparent CI equivalent, `./bootstrap.sh ci-full-no-test-cache`; it did not complete in this container because the local CI engine/tooling failed before reaching these tests. --- *Created by [claudebox](https://claudebox.work/v2/sessions/72862d04194777fa) · group: `slackbot`*
…6) (#24041) ## Motivation When a node receives a block proposal whose index within its checkpoint exceeds the per-checkpoint block limit, it rejects the gossip message and penalizes the relaying peer. But an oversized checkpoint is produced by the proposer, not the relay: the proposer should be slashed, and the honest relay left alone. Because the proposal was rejected at ingress, it was never stored nor re-broadcast, so no other node could observe the evidence and a slash for this offense could never fire (slashing is social-consensus voting and needs enough validators to independently see the offense, as with equivocation). ## Approach Mirrors how equivocation slashing works today. - At gossip validation, only indices at or beyond the structural ceiling `MAX_ATTESTABLE_BLOCKS_PER_CHECKPOINT` (72) are rejected with a peer penalty. Indices in `[maxBlocksPerCheckpoint, 72)` are structurally valid proposer misbehavior and pass validation. Proposer/signature/slot checks still run first, so only the slot's legitimate proposer can get a proposal this far. - Oversized proposals ride the regular attestation-pool store path, so they are retained and re-broadcast as slashing evidence with storage and amplification bounded by the existing per-position and per-slot pool caps, and equivocation detection keeps working at oversized indices. The gossip handlers never process or attest to an oversized proposal: `isOversized` is computed once during validation and threaded through the validation-result metadata alongside `isEquivocated`. Pool-cap violations keep their relay penalty regardless of oversizedness — relays must respect the caps even when forwarding slashing evidence. - When the p2p service stores an oversized proposal it invokes a new oversized-proposal callback (the same pattern as the duplicate-proposal callback for equivocation, covering both the standalone block and the checkpoint-embedded gossip paths). The validator client handles it by emitting `BROADCASTED_INVALID_BLOCK_PROPOSAL` with the existing `slashBroadcastedInvalidBlockPenalty`, deduped per (proposer, slot): a signed block proposal at an illegal index is invalid in itself, no checkpoint proposal needed. The validator-client checkpoint-level `too_many_blocks_in_checkpoint` check stays as defense-in-depth, though it cannot fire for the gossip case since oversized proposals are never processed. - When `maxBlocksPerCheckpoint` is undefined (local/test setups), the oversized paths no-op and only the structural ceiling applies. ## Changes - **p2p**: `ProposalValidator.validateBlockIndexWithinCheckpoint` rejects only at the structural ceiling; `LibP2PService` threads `isOversized` through validation metadata, skips processing/attestation for oversized block and checkpoint proposals, and reports stored oversized proposals through a new `registerOversizedProposalCallback`. - **validator-client**: registers the oversized-proposal callback and emits the `BROADCASTED_INVALID_BLOCK_PROPOSAL` offense for the proposer, deduped per (proposer, slot). - **tests**: cover the new accept window, store/re-broadcast/no-process invariants on both gossip paths, duplicate and equivocating oversized proposals, pool-cap rejections, callback firing per stored proposal, offense emission and dedup, and undefined-config no-op. An e2e exercising the full slash flow is left as a follow-up. Fixes A-1166
## Human-written summary Fixes three potential issues in the block-stream: 1. We snapshot the source tips, then download blocks until the proposed tip and pass them to the consumer, and then emit the checkpointed and proven tips. If there's a prune while we're downloading blocks, we may not be able to fetch blobs until the proposed tip. And worse, we may be then emitting checkpointed and proven events that reference blocks _we could never pass onto the consumer_. This PR fixes it by aborting all events if we fail to collect all the blocks we expected, and waiting until the next iteration to heal. 2. When we detect a prune, we walk back to find the common ancestor. But that walk back may take time. During that walk back, the checkpointed and proven tips may have shifted, so we re-fetch them to advertise the proper latest tips. 3. When walking back to find the common ancestor during a prune, we compare block hashes of source vs consumer. If the consumer didn't see a block for whatever reason, but that block was also removed from the source, then both block hashes compared are undefined, and we stop the walk-back, instead of continuing until finding an actual existing common ancestor. 4. When walking back, if the consumer for some reason did not have the block hashes available, a walk back to genesis would be triggered. We now cap the walk back to the local finalized chain tip. ## Motivation This PR hardens the block-mode `L2BlockStream` against three pre-existing hazards. It is split out of #24034 (the `chain-proposed`/`tips-only` feature work) so the correctness fixes can be reviewed on their own — #24034's `chain-proposed` emission depends on the pass-completion gate introduced here, so this PR sits below it in the stack. The three hazards, all latent in block mode today: - **Tier cursors advance from an incomplete download pass.** The download loop broke on an empty `getBlocks` result and end-of-pass reconciliation still ran, so `chain-checkpointed`/`chain-proven`/ `chain-finalized` could reference blocks the consumer never received via `blocks-added` — exactly the transient-inconsistency class #24007 set out to remove. - **`chain-pruned` carries stale clamp tips.** The prune event's `checkpointed`/`proven` payload came from the pass snapshot, which on a stale pass describes the chain *before* the prune. p2p feeds `event.checkpointed.checkpoint` into `isEpochPrune`, whose contract is "the checkpoint id **after** the prune", and the result drives the irreversible `deleteAllTxs` mempool wipe — a misclassification risk. - **The walk-back stops above the true divergence.** `areBlockHashesEqualAt` ended with `localHash === sourceHash`, so a height missing on both sides (`undefined === undefined`) counted as agreement. With a store that has gaps in its hash index (p2p keeps none below its `startingBlock`), a gap meeting a source-pruned height stopped the walk above the divergence — an under-deep prune that left old-fork state in place with no later re-detection. ## Approach - **Pass atomicity.** Extract the block download loop into `downloadBlocks()`, which returns whether the download plan completed; tier reconciliation is skipped for the pass when it did not. Two staleness terminations: an empty `getBlocks` below the target, and a delivered proposed block whose hash differs from the snapshot's (a mid-pass same-height fork swap). Intentional skips — the loop never running because `startingBlock`/caught-up put the cursor past the proposed tip — are trivially complete and still reconcile, preserving the A-1061 fix. - **Fresh reorg payload.** On walk-back divergence, re-read `getL2Tips` and drive the `chain-pruned` clamp tips, the #13471 prune-target clamp, the download target, and tier reconciliation from the re-read rather than the pre-prune snapshot. The pass does not abort if the re-read's proposed tip moved on (a busy chain advances every pass and prune emission would otherwise starve). - **Walk-back missing-local-hash rule.** A missing **local** hash now compares unequal regardless of the source side, so the walk continues to a recorded matching height or genesis (block 0 always resolves via the store's `initialBlockHash`). Prunes become over-deep-only, which consumers tolerate. The `skipFinalized` AbortError path is unchanged. - The walk-back hash cache is seeded with the **proposed tip only**. Seeding the tier tips would let a snapshot entry that went stale (a reorg between the snapshot and the walk) fake agreement at a reorged height and stop the walk early; the re-read appends only its proposed tip after the walk is over. **Behavior change worth calling out:** a *permanently* missing source range previously advanced tier cursors past undelivered blocks (masking the source bug). With pass atomicity it now wedges loudly — a warn each pass, p2p stays SYNCHING — instead of silently diverging. Loud failure is the better behavior. ## Changes - `stdlib/src/block/l2_block_stream/l2_block_stream.ts`: `downloadBlocks()` extraction + completion gate; post-divergence `getL2Tips` re-read driving the prune payload/target/download/reconciliation; missing local-hash → unequal; proposed-only cache seeding. - `stdlib/src/block/l2_block_stream/l2_block_stream.test.ts`: pass-atomicity terminations (empty `getBlocks`, delivered-hash mismatch, A-1061 fast-forward still reconciles), prune-payload freshness, re-read download target + reconciliation, and the walk-back both-undefined / stale tier-seed regressions.
…24130) ## Summary - Fixes the merge-queue failure on PR #24053 by making `CheckpointStore`'s pruning test use explicit slots for its synthetic checkpoints. - The failed CI run was merge-group run `27616214374`; `CI3` job `81652669415` failed in `x9-full` (`http://ci.aztec-labs.com/11cc1d3bfd3986cf`). - The failing test was `yarn-project/scripts/run_test.sh prover-node/src/checkpoint-store.test.ts`, log `http://ci.aztec-labs.com/0e7cfabe067f8bed`, where `markPrunedAboveBlock marks every prover holding a block above the target and returns them` hit the stricter same-slot canonical checkpoint guard while building random test data. ## Verification - `git diff --check` - `./bootstrap.sh ci` attempted, but this branch's root bootstrap has no `ci` target and the container has no Docker socket. - `./bootstrap.sh ci-full-no-test-cache` attempted to match the failed merge-queue leg, but local bootstrap failed before project tests because the test engine exited and interrupted the parallel build. - Focused Yarn/Jest verification is blocked in this checkout because `yarn-project` portal dependencies under `noir/packages/*` are not generated; `noir/bootstrap.sh` is blocked by a crates.io DNS/proxy failure while installing `just`. --- *Created by [claudebox](https://claudebox.work/v2/sessions/46e781f113c5bc56) · group: `slackbot`*
Collaborator
Author
Flakey Tests🤖 says: This CI run detected 1 tests that failed, but were tolerated due to a .test_patterns.yml entry. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
BEGIN_COMMIT_OVERRIDE
refactor(stdlib)!: thin chain-checkpointed event, collapse sync (#24007)
refactor!: remove proposedCheckpoint tip (#24008)
fix(sequencer): wait for previous L1 block before publishing (#24037)
test(e2e_fees): bridge fee juice from a dedicated L1 account (#24054)
fix(aztec-node): pipelining-aware slot and fee simulation in simulatePublicCalls (#24031)
refactor: move registerContractFunctionSignatures to the node debug API (#24066)
fix: resolve v5-next → merge-train/spartan-v5 conflict (#24072)
chore: merge v5-next into merge-train/spartan-v5 (raw, conflict markers) (#24071)
test(ci): mark e2e_epochs/epochs_mbps_redistribution as flaky (#24098)
fix: isolate cross-chain L1 writes from publisher nonce (#24104)
test(world-state): make delayed-close fork queue-cleanup wait deterministic (#24106)
fix(kv-store): lazy-load skipped browser benchmark deps (#24108)
feat(p2p): slash proposers exceeding max blocks per checkpoint (A-1166) (#24041)
fix(stdlib): fix race conditions in L2BlockStream (#24042)
test(prover-node): make checkpoint store pruning test deterministic (#24130)
END_COMMIT_OVERRIDE