test(devnet): network-level coverage for the 10.0.2 merged PRs#1397
Conversation
Add 7 network-level devnet suites (one per merged 10.0.2 PR), a shared devnet test harness, a cross-backend term-canon oracle, and a health-guarded stability sweep runner. All 7 suites validated green on a live 6-node devnet. Suites (devnet/prNNNN-*/): - pr1386-term-canon: diverse typed literals round-trip to the protocol canonical form via the live store + /api/query, and an RS proof lands (pipeline consistency; honest scope: all-oxigraph, not cross-backend) - pr1385-subgraph-rs: sub-graph KA publish + scoped materialization + RS liveness - pr1388-okf-integration: OKF import -> WM -> --share SWM -> verify -> export - pr1366-pca-deposit-waiver: activates the (dormant) deposit, proves waive / dust-charge / quota / authority wiring, resets the param after - pr1387-bulk-register-agents: bulk registerAgents + atomicity + owner-only - pr1384-preserve-agents-on-transfer: agents survive transfer + clearAgents - pr1370-admin-op-wallet: op-wallet add/remove + removeOperationalWallet guard Infra: - devnet/_bootstrap/harness.ts: shared harness (fetchRetry-hardened HTTP), smoke-validated; _bootstrap + each suite are pnpm workspace packages - packages/storage/test/term-canon-blazegraph-oracle.test.ts: asserts an oxigraph node and a blazegraph node compute the SAME V10 merkle leaf for the same typed literal; wired into CI's tornado-blazegraph lane - scripts/devnet-1002-coverage-sweep.sh: health-guarded stability sweep (loop-first, destructive orchestrator opt-in/last) Findings for follow-up: pr1385 `shared-memory publish --sub-graph-name` exits 1 on a cosmetic post-success error despite confirming; the CG registration deposit is dormant (0) on devnet. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
| log " $(printf '%-38s' "${TALLY_NAMES[$i]}") ${p}/${tot} pass$( [ "$f" -gt 0 ] && echo " ⚠ ${f} FAIL" )$( [ "$k" -gt 0 ] && echo " (${k} skip)" )" | ||
| done | ||
| log "TOTAL elapsed $(elapsed)s. aborted=$ABORTED. Results: $RESULTS" | ||
| log "10.0.2 coverage sweep complete." |
There was a problem hiding this comment.
🔴 Bug: The coverage sweep exits successfully even when suites fail
What's wrong
This script is the top-level validator for the new devnet coverage, but it only records failures in the summary log. Automation that invokes it cannot distinguish a clean sweep from one with failed suites or an aborted health loop, so regressions can pass the gate as long as the script itself reaches the end.
Example
Run the sweep with one suite that fails. The summary will include 0/1 pass ⚠ 1 FAIL, but because the script never exits non-zero after logging the summary, any caller/CI job treats the sweep as successful.
Suggested direction
Track a failure flag across baseline, stability rounds, health aborts, and the optional orchestrator, then return a non-zero process status after printing the aggregate summary.
For Agents
Update scripts/devnet-1002-coverage-sweep.sh final aggregation to compute whether any suite failed, the health loop aborted, or the optional orchestrator returned non-zero; preserve summary logging, then exit 1 for validation failures and a distinct non-zero code if desired for infrastructure aborts. Add a small shell-level check using a stub failing suite/command to prove the script returns non-zero while still writing the summary.
Coverage sweep reports success even when suites fail
What's wrong
The new sweep is itself validation evidence for the PR coverage suites, but it only logs failures instead of failing the process. That can make a broken devnet suite look green to automation or anyone checking the command status.
Example
If devnet/pr1385-subgraph-rs fails in the baseline, run_suite returns fail, the summary prints a failure count, and the script still reaches the final log and exits with status 0. A caller or CI job running this sweep would report success despite failed validation.
Suggested direction
Return a non-zero exit code whenever any tallied suite has failures or the health-guarded loop aborts unexpectedly, while still writing the aggregate summary.
For Agents
In scripts/devnet-1002-coverage-sweep.sh, track whether any suite failed, whether the stability loop aborted, and whether the optional orchestrator exited non-zero. After printing the summary, exit 1 for failed validation and reserve exit 0 for all pass/allowed skips.
There was a problem hiding this comment.
🔴 Bug: The sweep ignores failures from the optional orchestrator phase
What's wrong
The script advertises the comprehensive orchestrator as phase 3 of the sweep, but its exit status is never used to determine the sweep result. A failed destructive/orchestrator phase can therefore be reported as a successful overall sweep, which breaks CI or release gating that relies on this script.
Example
Run with RUN_ORCHESTRATOR=1; if every Vitest suite passes but scripts/devnet-comprehensive.sh exits 1, the sweep logs that exit at line 129/133, leaves TOTAL_FAILS=0, and still exits 0 at line 155.
Suggested direction
Store the orchestrator command status immediately after it runs and fold it into the final exit-code decision instead of only logging it.
For Agents
In scripts/devnet-1002-coverage-sweep.sh, capture the orchestrator return code in a variable for both timeout and non-timeout branches, include it in the aggregate summary, and make the final exit non-zero when the orchestrator phase fails. Add a shell-level check or documented dry run proving RUN_ORCHESTRATOR=1 propagates a failing orchestrator exit.
The coverage sweep ignores the opt-in orchestrator failure when deciding success
What's wrong
The script advertises itself as a gateable coverage sweep, but an enabled destructive orchestrator can fail without affecting the script exit code. That makes validation evidence misleading for anyone running the full sweep.
Example
Run RUN_ORCHESTRATOR=1 scripts/devnet-1002-coverage-sweep.sh with all Vitest suites passing but devnet-comprehensive.sh exiting 1. The script logs the orchestrator failure, leaves TOTAL_FAILS=0 and ABORTED=0, then reaches EXIT 0 — all pass.
Suggested direction
Treat a non-zero Phase 3 orchestrator exit as a failing validation result when RUN_ORCHESTRATOR=1.
For Agents
In scripts/devnet-1002-coverage-sweep.sh, capture the orchestrator status in an ORCH_RC variable and include it in the aggregate exit logic and summary. Preserve the existing distinction between suite failures and infra aborts, but do not allow a requested Phase 3 failure to return 0.
There was a problem hiding this comment.
🔴 Bug: Orchestrator failures are logged but do not fail the sweep
What's wrong
The new sweep script advertises itself as a CI-gatable validation run, but the optional destructive orchestrator phase can fail without affecting the script’s final status. That gives callers a false green result for a failed validation phase.
Example
Run with RUN_ORCHESTRATOR=1, all vitest suites passing, and scripts/devnet-comprehensive.sh exiting 1. The sweep logs PHASE 3 (orchestrator) exit=1, then TOTAL_FAILS=0 and ABORTED=0, reaches EXIT 0 — all pass., and exits successfully.
Suggested direction
Capture the phase-3 exit status and fold it into the final exit-code decision, instead of only logging it.
For Agents
Update scripts/devnet-1002-coverage-sweep.sh: initialize an orchestrator status variable, assign it immediately after the timeout/bash command, log that saved value, include it in the summary, and return non-zero when RUN_ORCHESTRATOR=1 and the orchestrator failed. Preserve the existing suite failure and health-abort precedence. Prove it with a small shell test or local substitution where the phase-3 command exits 1 and the sweep exits non-zero.
There was a problem hiding this comment.
🔴 Bug: The sweep can turn failed Vitest runs into skips
What's wrong
The release gate can undercount failures because it reclassifies some non-zero Vitest exits as benign skips using a loose text match.
Example
A suite with it.skip(...) plus a beforeAll failure can exit non-zero with a summary containing skipped tests but no individual failed tests. run_suite returns skip, TOTAL_FAILS remains 0, and the sweep can exit 0 despite a failed suite file.
Suggested direction
Treat any non-zero Vitest exit as fail unless the log conclusively proves there were no failed test files and the suite intentionally skipped all tests.
Confidence note
This depends on Vitest's summary shape, but Vitest commonly reports hook/import failures as failed test files without individual failed tests; if any tests are skipped in that file, this regex branch can match.
For Agents
In scripts/devnet-1002-coverage-sweep.sh, classify skips only when Vitest exits 0 with all tests skipped, or parse a machine-readable reporter. At minimum, do not return skip when the log contains failed test files, unhandled errors, hook failures, or any non-zero exit that is not a known all-skipped condition.
There was a problem hiding this comment.
🔴 Bug: The sweep can classify errored Vitest runs as skips
What's wrong
The sweep is the validation harness for these new suites, but its skip heuristic can hide real Vitest errors whenever any skipped test appears in the log. That can make a broken suite look benign in the aggregate pass-rate report.
Example
A suite log with Tests 1 skipped plus Errors 1 error and no Tests ... failed line would be classified as skip, so the sweep tally would not increment failures and could still exit 0.
Suggested direction
Only return skip for a clean all-skipped Vitest result; otherwise preserve the non-zero run as a failure.
Confidence note
This depends on Vitest output shape, but Vitest commonly reports setup/import failures under an Errors section rather than as failed tests.
For Agents
In scripts/devnet-1002-coverage-sweep.sh, make non-zero Vitest exits fail unless the log conclusively represents an all-skipped run with no errors. Prefer vitest --reporter=json or at least check for Errors, failed, and failed test files before returning skip.
There was a problem hiding this comment.
🟡 Issue: Sweep health check ignores devnet port configuration
What's wrong
The coverage sweep can abort before running any suite even when the devnet is healthy, because its health check assumes default ports while the devnet launcher explicitly supports overriding them. This makes the integration script incompatible with supported devnet configurations and can produce a misleading infrastructure failure.
Example
Start the devnet with HARDHAT_PORT=18545 API_PORT_BASE=9301 scripts/devnet.sh start 6. The harness tests can read node API ports from .devnet/node*/config.json, but scripts/devnet-1002-coverage-sweep.sh will fail preflight health because it still probes 8545 and 9201-9206.
Suggested direction
Derive probe URLs from the same configuration source as scripts/devnet.sh instead of hardcoding the default ports.
For Agents
Update scripts/devnet-1002-coverage-sweep.sh health probing to honor HARDHAT_PORT and API_PORT_BASE, or read .devnet/node*/config.json for each node API port. Preserve the retry behavior, and prove the fix by running the sweep preflight against a devnet started with non-default Hardhat/API ports.
| // (A) on-chain solved | ||
| try { | ||
| const ch = await rss.getNodeChallenge(id); | ||
| if (ch[6] === true) { |
There was a problem hiding this comment.
🔴 Bug: RS liveness can pass on a pre-existing solved challenge
What's wrong
The test claims to prove that Random Sampling still lands a proof with the new sub-graph KA resident, but one success path can be satisfied by chain state that existed before the test exercised the changed behavior. That gives false confidence for the extractor regression this suite is meant to cover.
Example
On a warm devnet where node1 already has getNodeChallenge(id).solved === true before this suite starts, the poll returns immediately via via: 'onChainSolved' even if the newly published sub-graph KA would still make the RS extractor stall. No new proof or new sampled challenge is required.
Suggested direction
Make the liveness assertion compare against a pre-test baseline or require a new proof signal after the suite publishes its KA, rather than accepting any current solved challenge.
For Agents
In the RS liveness tests, capture a baseline challenge tuple and proof/submission state after publishing, then require evidence created after that point: a strict submittedCount/last tx increase, a new period/challenge becoming solved, or a solved challenge that samples the suite’s KA. Preserve the deterministic sub-graph query assertions; tighten only the liveness gate.
There was a problem hiding this comment.
🔴 Bug: The sub-graph RS test can pass on an unrelated proof
What's wrong
The test is meant to guard the regression where RS cannot extract named sub-graph KAs, but the success condition is generic RS liveness. On a warm shared devnet, unrelated KAs can satisfy the assertion while the sub-graph-specific path remains broken.
Example
If the sub-graph extractor still throws KCDataMissingError for s.kaId, but the prover samples an older root-CG KA during the polling window, submittedCount increases and line 400 returns success. The test passes without proving the sub-graph KA was extractable by RS.
Suggested direction
Require evidence that RS actually processed the sub-graph KA, not just that the prover made progress somewhere on the shared devnet.
For Agents
In devnet/pr1385-subgraph-rs/automated.test.ts, make the RS assertion target the published sub-graph KA. Prefer an isolated/forced sampling fixture, a direct extractor-level integration call for s.kaId, or gate the on-chain solved challenge on knowledgeAssetId === s.kaId instead of treating unrelated liveness as success.
There was a problem hiding this comment.
🔴 Bug: RS liveness can pass without sampling the sub-graph KA
What's wrong
The test claims to verify that Random Sampling can prove a named sub-graph KA, but it accepts unrelated RS progress from any core and any sampled KA. On a warm shared devnet, that can pass even if the sub-graph extractor still fails with KCDataMissingError for the asset published by this suite.
Example
A broken extractor could still skip the new sub-graph KA, while another already-existing KA in devnet-test is sampled and submitted. submittedCount increases, the test returns success at line 411, and the sub-graph extraction regression remains untested.
Suggested direction
Make the load-bearing assertion target the sub-graph KA itself, or move this to a lower-level extractor regression test where the KA id/root is controlled.
For Agents
In devnet/pr1385-subgraph-rs/automated.test.ts, add a deterministic check that exercises extractV10KCFromStore for the published sub-graph KA, or isolate/force the RS draw so the solved challenge must reference s.kaId. Keep unrelated RS liveness as diagnostic logging, not the pass condition for the sub-graph regression.
There was a problem hiding this comment.
🔴 Bug: RS liveness checks can pass without exercising the fixture KA
What's wrong
The new verification suites claim RS coverage for sub-graph extraction and term canonicalization, but the success condition accepts unrelated or stale RS progress. That gives green validation even when the changed read/hash path for the published fixture is never used.
Example
A devnet that samples and proves an older parent-CG KA after the baseline would increment submittedCount, so the #1385 test passes even if sub-graph KA extraction is still broken with KCDataMissingError. Likewise, #1386 can pass immediately on a solved challenge that existed before the diverse-literal KA was published.
Suggested direction
Tighten the liveness gates so they validate the specific regression path, not generic RS activity on a populated shared devnet.
For Agents
In the RS-focused assertions, require proof that the published fixture KA was actually sampled and solved, or isolate the sampling set so the only eligible KA is the fixture. At minimum, baseline the challenge period and reject success unless a newer solved challenge references the expected kaId, or add a targeted extractor/prover test that fails on KCDataMissingError for the fixture.
| `unrecognised /api/query response shape on node${node.num}: ${JSON.stringify(json).slice(0, 300)}`, | ||
| ); | ||
| } | ||
| return bindings as Array<Record<string, string>>; |
There was a problem hiding this comment.
🟡 Issue: The query helper lies about the binding shape and pushes ad-hoc normalization into every suite
What's wrong
The shared harness is supposed to reduce duplicated bootstrap/test plumbing, but this type boundary does the opposite: it erases the real response shape with a cast, then each suite compensates with its own slightly different binding coercion. That makes the tests harder to scan and makes future query-shape changes risky because the policy for datatype/lang/IRI handling is scattered rather than canonical.
Example
A future query response shaped like { value, datatype, 'xml:lang' } is typed as string by the harness, so every suite has to rediscover whether it needs typeof x === 'string' checks and how to preserve datatype/lang suffixes.
Suggested direction
Make the harness own the SPARQL binding contract instead of casting it away. This is a good place for a small typed model and shared conversion helpers, which would delete the repeated valueOf/cellStr/normTerm variants from the suites.
For Agents
Refactor devnet/_bootstrap/harness.ts around the /api/query boundary. Preserve current accepted response shapes, but expose either a typed SparqlBindingCell union plus shared normalizers (bindingValue, bindingIri, bindingObjectTerm) or normalize bindings to one canonical term shape before returning. Then replace the duplicated normalizers in the PR suites and run the affected devnet tests enough to prove their assertions still compare the same values.
There was a problem hiding this comment.
🟡 Issue: The shared harness exports an untyped JSON boundary that pushes any back into every suite
What's wrong
The harness is supposed to centralize devnet plumbing, but its HTTP helpers erase response types at the boundary. That keeps the suite code cast-heavy and makes every test re-interpret daemon payloads ad hoc. The result is less maintainable than the abstraction suggests, because the shared layer does not capture the contracts it is wrapping.
Example
getJson(node, '/api/operational-wallets') returns any, so json.wallets.find((w: any) => w.isPrimary) in the new operational-wallet suite has to rediscover the response shape locally instead of relying on a typed boundary.
Suggested direction
Make the harness own the boundary instead of exposing raw any: use requestJson<T>(), typed route result interfaces for repeated daemon responses, and a single normalized query binding type that matches the actual response variants.
For Agents
Start in devnet/_bootstrap/harness.ts. Preserve the current HTTP behavior, but replace getJson/postJson with a generic requestJson<T> plus typed helpers for common daemon shapes where possible. Update the new suites to consume typed response models instead of local any casts. A typecheck should prove callers no longer need w: any for the shared route shapes.
There was a problem hiding this comment.
🟡 Issue: Make the query binding boundary explicit instead of leaking ad-hoc shapes to every suite
What's wrong
The harness currently claims one binding shape while the new suites defensively handle several shapes. That makes the most reused API in this harness less trustworthy and spreads type/normalization logic across unrelated test files.
Example
A new suite reading query bindings has to rediscover whether a binding is a raw N-Triples string or a SPARQL-JSON { value, datatype, lang } object, then write another local normalizer even though the harness owns the /api/query boundary.
Suggested direction
Centralize binding normalization/type modelling in the harness so suite code consumes one stable shape.
For Agents
In devnet/_bootstrap/harness.ts, define a real BindingCell union and export canonical helpers such as bindingValue, termString, and iriString, or normalize inside queryNode and type the return accordingly. Update the PR suites to delete their local normalizers while preserving their assertions.
There was a problem hiding this comment.
🟡 Issue: Do not make the shared HTTP harness an any boundary
What's wrong
The PR introduces a shared harness but leaves its HTTP API as json: any. That pushes route-shape knowledge back into each suite and normalizes inline casts. Over time this makes the tests harder to refactor because field-name and response-shape changes are not localized to the harness.
Example
The operational-wallet suite has to write json.wallets.find((w: any) => w.isPrimary) and json.wallets.find((w: any) => w.isAdmin). The shared boundary knows this is JSON, but not whether json.wallets exists, whether entries have address/isPrimary/isAdmin, or whether status payloads have loop/submittedCount.
Suggested direction
Keep the low-level fetch wrapper generic, but expose typed route helpers at the harness boundary. That gives the suites direct, readable code without spreading unchecked object shapes and casts across every assertion.
For Agents
In devnet/_bootstrap/harness.ts, replace any-returning HTTP helpers with generic JsonResponse<T = unknown> helpers, plus small endpoint-specific decoders for the repeated route surfaces used by these suites: operational wallets, random-sampling status, and query bindings. Preserve tolerant non-JSON handling for error responses. Update suites to consume typed helpers and remove inline any casts; TypeScript should catch wrong field names.
There was a problem hiding this comment.
🔴 Bug: Structured SPARQL cells are not serialized as valid N-Triples terms
What's wrong
When /api/query returns SPARQL-JSON objects, the shared harness can corrupt the term representation before tests compare or hash it. URI objects lose their angle brackets, and literal objects are not escaped. That means any suite using this helper for URI objects or literals with special characters can validate the wrong value or fail for the wrong reason.
Example
normTerm({ type: 'uri', value: 'urn:okf:hub' }) currently returns urn:okf:hub; callers that feed this to V10 canonicalization/hash logic need <urn:okf:hub>. normTerm({ type: 'literal', value: 'A "quote"' }) returns an invalid quoted term instead of escaping the embedded quote.
Suggested direction
Use a real RDF/N-Triples term serializer or a small escaping helper at this boundary so every structured binding cell becomes the same term string shape that the V10 leaf canonicalizer expects.
For Agents
Fix devnet/_bootstrap/harness.ts term serialization. Preserve string inputs as-is, serialize structured uri as <value>, preserve valid blank-node labels, and N-Triples-escape literal lexical values before adding datatype/lang suffixes. Add a small unit/smoke assertion for structured URI and quote-containing literal cells.
There was a problem hiding this comment.
🟡 Issue: Split the harness before it becomes the devnet god module
What's wrong
This centralizes useful bootstrap code, but it does so by putting unrelated concerns into one shared module. At 700 lines on introduction, it is already the place every future suite will be tempted to modify, so it creates a maintainability bottleneck even though it is still under the 1k-line hard smell threshold.
Example
A future suite that only needs a new contract helper has to edit the same file that owns HTTP/query parsing and CLI spawning, increasing merge conflicts and making the harness harder to scan as more devnet suites are added.
Suggested direction
Keep the shared harness idea, but decompose it now while the boundary is still simple. The code-judo move is to make harness.ts a barrel export and move each concern into a small file with a clear owner instead of growing one all-purpose test utility module.
For Agents
Split devnet/_bootstrap/harness.ts into focused modules such as node.ts, cli.ts, files.ts, contracts.ts, funding.ts, http.ts, query.ts, and wait.ts. Keep a small index/barrel preserving current imports from ../_bootstrap/harness.js so suite behavior does not change. Add/keep the smoke test to prove the re-exported public surface still works.
There was a problem hiding this comment.
🟡 Issue: Make HTTP response shapes explicit instead of exporting any
What's wrong
The harness creates a shared boundary but leaves daemon JSON untyped. That pushes loosely shaped objects into every suite and normalizes ad-hoc property access instead of making the API contract explicit in one place.
Example
The operational-wallet suite has to remember that GET /api/operational-wallets returns identityId, hasProfile, canManage, and wallets with isPrimary/isAdmin/registered flags. None of that contract is modeled at the helper boundary, so each assertion re-discovers the shape dynamically.
Suggested direction
Use the harness as the typed boundary for daemon responses. Even lightweight interfaces per endpoint would remove the scattered casts and make route-shape changes visible at compile time.
For Agents
Change the HTTP helpers to be generic, for example getJson()/postJson<TReq,TRes>(), returning JsonResponse. Add small route response interfaces or typed route helpers for repeated endpoints such as operational wallets and random-sampling status. Preserve the tolerant JSON parsing behavior, but stop exporting any as the normal API.
There was a problem hiding this comment.
🟡 Issue: The shared harness is already a kitchen-sink module
What's wrong
The PR successfully removes repeated bootstrap code, but concentrates too many unrelated responsibilities into one 700-line module. That makes future harness changes harder to reason about and creates a natural dumping ground for every devnet convenience helper.
Example
A suite that only needs queryNode imports from the same module that owns hardhat storage writes, PCA minting, CLI process spawning, and auth-token parsing. That increases the amount of unrelated context a reader has to load for every new helper added here.
Suggested direction
Decompose the harness by responsibility before adding more suites. A barrel export can preserve the current call sites while making each domain independently readable and maintainable.
For Agents
Split devnet/_bootstrap/harness.ts into focused modules such as state.ts, cli.ts, files.ts, chain.ts, http.ts, sparql.ts, and polling.ts, then keep harness.ts as a compatibility barrel exporting the same public helpers. Preserve existing imports and run the bootstrap smoke/manifest tests.
| const HUB_OWNER_PK = | ||
| '0xac0974bec39a17e36ba4a6b4d238ff944bacb478cbed5efcae784d7bf4f2ff80'; | ||
|
|
||
| const WAIVER_ABI = [ |
There was a problem hiding this comment.
🟡 Issue: Contract ABI strings are scattered instead of living behind one devnet contract boundary
What's wrong
The PR creates a shared harness, but the contract boundary remains fragmented across the individual suites. Each suite redefines ABI strings, event fragments, and contract construction rules locally, so the reader has to verify the same chain surface repeatedly and the tests can drift from the canonical deployed ABI.
Example
agentToAccountId and wrapper/logic PublishingConviction surfaces are now represented by independent string arrays across the harness and multiple suites. A signature or event rename would require finding all of these copies manually, and one stale fragment can compile until a live devnet run hits it.
Suggested direction
Move contract resolution and ABI ownership into the bootstrap package. A contracts.ts helper that exposes typed/minimal handles for Token, ContextGraphs, PublishingConviction, IdentityStorage, etc. would delete most of these local fragments and the repeated new ethers.Interface(...) setup.
For Agents
Look in devnet/_bootstrap/harness.ts and the PR suite ABI constants. Preserve the same contract calls and event parsing, but introduce a shared contract factory that loads canonical JSON ABIs from packages/evm-module/abi or centralizes minimal fragments in one devnet/_bootstrap/contracts.ts. Convert the suites to request named handles/fragments from that layer and keep per-suite code focused on scenario setup/assertions.
There was a problem hiding this comment.
🟡 Issue: Move repeated ABI fragments and event parsing into a devnet contract adapter
What's wrong
The PR adds several local contract mini-clients with duplicated string ABIs and hand-written receipt scans. That is brittle and obscures the actual scenario logic: readers must verify ABI strings and parser loops in every suite instead of trusting one canonical test adapter.
Example
agentToAccountId is in the harness NFT ABI, the preserve-agents suite's NFT ABI, and the bulk-register suite's assertions. A contract signature change would require hunting through test files rather than updating one devnet contract adapter.
Suggested direction
Centralize Hub contract resolution, ABI fragments, and receipt event lookup behind focused harness helpers. The tests should read as scenario flows, not as local mini SDKs for each contract.
For Agents
Create a small contract adapter module under devnet/_bootstrap, ideally loading ABI JSON artifacts where they exist and exposing helpers like contracts.identityStorage(state), contracts.publishingConviction(state), and findEvent(receipt, iface, name). Migrate repeated ABI/event parsing from the new suites first; preserve the existing assertions and contract addresses.
There was a problem hiding this comment.
🟡 Issue: Use canonical ABI artifacts instead of suite-local ABI strings
What's wrong
The tests duplicate contract interfaces as ad-hoc string fragments even though the repository already has generated ABI JSON. That makes the test harness a second source of truth for contract surfaces and increases drift risk as contracts evolve.
Example
If an event signature such as ContextGraphRegistrationDeposited changes in the Solidity artifact, the local fragment at this line can drift independently from the canonical ABI. The test code will still compile because the fragment is just a string.
Suggested direction
Move ABI ownership back to the EVM module artifacts and let the harness construct contracts from those artifacts. If a test only needs a few fragments, derive them from the artifact rather than rewriting signatures by hand.
For Agents
Add a harness helper such as loadAbiArtifact(name) / contractFromArtifact(state, name) that reads packages/evm-module/abi/${name}.json. Refactor suite-local ABI constants to use artifacts or a small shared contract registry, while preserving any special address resolution such as ContextGraphStorage via the facade.
There was a problem hiding this comment.
🟡 Issue: Reuse canonical ABI artifacts instead of scattering hand-written ABI fragments
What's wrong
The PR duplicates contract interface knowledge in each suite even though the repo already has generated ABI artifacts. These hand-maintained string fragments are another layer that can drift from the deployed contracts and make otherwise simple contract setup noisy.
Example
devnet/pr1366-pca-deposit-waiver/automated.test.ts defines four local ABI arrays before it can create contract handles; pr1384 and pr1387 repeat the same pattern for PublishingConviction/NFT surfaces.
Suggested direction
Move ABI loading behind the harness and let suites request contract handles by contract/artifact name.
For Agents
Add a harness helper that builds contracts/interfaces from packages/evm-module/abi/<Contract>.json, with optional fragment selection if startup cost matters. Replace suite-local ABI string arrays with canonical artifact loads, preserving the same contract calls and event parsing.
| SOAK_RS_SECONDS="${SOAK_RS_SECONDS:-1800}" | ||
| if [ "$RUN_ORCHESTRATOR" = "1" ]; then ORCH_RESERVE_SECONDS="${ORCH_RESERVE_SECONDS:-5400}"; else ORCH_RESERVE_SECONDS=0; fi | ||
|
|
||
| NEW_SUITES=(pr1386-term-canon pr1385-subgraph-rs pr1388-okf-integration pr1366-pca-deposit-waiver pr1387-bulk-register-agents pr1384-preserve-agents-on-transfer pr1370-admin-op-wallet) |
There was a problem hiding this comment.
🟡 Issue: The new devnet suites are registered in three separate places
What's wrong
This adds another explicit list of the same seven suite names on top of the root scripts and workspace entries. That is configuration spaghetti: the suite inventory is now maintained by copy/paste rather than by a single model, which makes future additions and removals easy to partially apply.
Example
Adding devnet/pr1390-new-suite would require touching at least three independent lists. If one is missed, the suite may install but not run in the sweep, or have a root script but not be a workspace importer.
Suggested direction
Collapse suite discovery/registration to one manifest or a convention-based discovery path. The sweep can still keep ordering metadata, but the suite names should not be copied across package.json, pnpm-workspace.yaml, and the shell script by hand.
Confidence note
This assumes these PR-coverage suites are expected to be maintained after this PR, not treated as one-off local scripts. If they are intentionally disposable, the impact is lower.
For Agents
Create one source of truth for devnet suite metadata. Options: a small devnet/suites.json consumed by the sweep and script generation, or derive sweep candidates from devnet/*/package.json plus optional metadata for ordering/time-warp/destructive behavior. Preserve the existing run order for the current suites and verify pnpm can still run each suite by config.
There was a problem hiding this comment.
🟡 Issue: The PR-suite registry is duplicated across several files
What's wrong
This adds a new maintenance pattern where suite identity and ordering are hand-copied across multiple files. The structure is easy to drift as more devnet suites are added, and the boilerplate obscures the actual coverage intent.
Example
Renaming or removing pr1385-subgraph-rs now requires coordinated edits in the sweep script, root package.json, pnpm-workspace.yaml, the suite package name, and lockfile. Missing one leaves a stale command or silently drops a suite from the sweep.
Suggested direction
Create one canonical suite manifest or discover devnet/pr*/vitest.config.ts with explicit order metadata where order matters. Then have the sweep and root commands consume that source instead of maintaining parallel lists.
For Agents
Look at scripts/devnet-1002-coverage-sweep.sh, package.json, pnpm-workspace.yaml, and the new devnet/pr*/vitest.config.ts files. Preserve the current suite order, but derive the runnable suite list from one manifest or filesystem discovery, and centralize the shared Vitest config.
| const kcMatch = /KC ID:\s*(\d+)/i.exec(result.stdout); | ||
| const txMatch = /TX hash:\s*(0x[0-9a-fA-F]+)/i.exec(result.stdout); | ||
| const kaId = kcMatch ? BigInt(kcMatch[1]!) : undefined; | ||
| expect( |
There was a problem hiding this comment.
🟡 Issue: The shared harness mixes execution helpers with hidden Vitest assertions
What's wrong
The harness is becoming the canonical layer for devnet operations, but it imports Vitest and embeds assertion policy inside what reads like a CLI wrapper. That makes the abstraction less reusable and obscures which behavior belongs to the operation versus the test assertion.
Example
A suite that wants to run publish and assert a specific non-success status cannot use publishViaCli; the helper name sounds like an execution helper, but it is also an assertion helper that throws through Vitest expectations.
Suggested direction
Keep the bootstrap layer mostly pure and make assertions explicit at call sites or behind clearly named expect... helpers. This would also let negative-path suites reuse the same parser/executor without fighting an embedded success policy.
For Agents
Split pure helpers from assertion helpers in devnet/_bootstrap/harness.ts. Keep runDkgCli and a parser like parsePublishResult(stdout) side-effect-free, then add expectPublishSuccess(...) or make publishViaCli clearly named as an asserting helper. Preserve current successful publish behavior in existing suites.
There was a problem hiding this comment.
🟡 Issue: Keep Vitest assertions out of the shared harness boundary
What's wrong
The harness mixes orchestration/parsing with test-runner assertions. That turns a shared infrastructure module into a Vitest-specific assertion layer and hides suite policy inside helpers, which makes reuse and future variants harder to reason about.
Example
A suite that wants to accept tentative only under one condition or wants to inspect a malformed CLI output cannot reuse publishViaCli as a parser; the helper asserts before returning.
Suggested direction
Make the harness return structured facts or throw plain infrastructure errors; keep policy assertions in the individual suites. That keeps the harness reusable by scripts and makes suite-specific expectations explicit at the call site.
For Agents
Refactor devnet/_bootstrap/harness.ts. Keep runDkgCli as process orchestration, add a plain parser such as parsePublishResult(stdout): PublishResult, and let individual tests assert allowed statuses / required IDs. Remove the Vitest import from the harness. Preserve current publish behavior in smoke and term-canon suites with local assertions.
There was a problem hiding this comment.
🟡 Issue: The shared harness is starting as a catch-all module
What's wrong
Centralizing repeated bootstrap code is the right direction, but this implementation collapses too many concerns into one file. The module boundary is broad enough that it will become hard to scan, test, and extend without adding unrelated helpers to the same namespace.
Example
A suite that only needs queryNode imports a module that also contains hardhat storage mutation helpers like fundTrac and PCA creation. Future devnet helpers have no clear home except appending to the same file.
Suggested direction
Keep the behavior and public import path, but decompose the implementation by responsibility before more suites build on it. This would make ownership clearer and keep future helpers from turning _bootstrap/harness.ts into a devnet utility junk drawer.
Confidence note
The harness is still under 1k lines, so this is not a file-size threshold violation. The concern is the breadth of unrelated responsibilities introduced in a new shared module.
For Agents
Split devnet/_bootstrap/harness.ts into focused modules, for example nodes.ts, http.ts, cli.ts, fixtures.ts, contracts.ts, and wait.ts, with an index.ts barrel preserving existing imports. Keep assertion-free helpers throwing errors, and let suites perform Vitest expectations.
There was a problem hiding this comment.
🟡 Issue: Keep the shared harness independent of Vitest assertions
What's wrong
The new harness is presented as shared bootstrap infrastructure, but it imports the test framework and embeds assertions in a utility function. That muddies the boundary between reusable devnet primitives and suite assertions, makes the helper harder to reuse from scripts or other runners, and forces every caller into one publish-status policy.
Example
A future devnet setup script or non-Vitest smoke runner that wants to reuse publishViaCli now pulls in Vitest and gets matcher-style assertion failures rather than a plain CLI/domain error it can handle.
Suggested direction
Split reusable mechanics from test assertions: let publishViaCli be a pure harness primitive with typed return values and explicit errors, and keep expect(...) calls in the suites or a Vitest-only assertion helper.
For Agents
In devnet/_bootstrap/harness.ts, remove the Vitest import from the reusable harness. Preserve the current publish parsing behavior by returning {status, kaId, txHash, raw} and throwing explicit errors for invalid publish output, or move assertion-only checks into a small test-side expectPublishOk helper. Existing devnet suites and the harness smoke test should still pass unchanged behaviorally.
…EAL cross-backend divergence; sweep exits non-zero on failure The blazegraph cross-backend term-canon oracle DETECTED a genuine consensus bug (which is its purpose): Blazegraph normalizes xsd:dateTime / time / date / some doubles / escaped strings into a different lexical form than oxigraph (e.g. timezone-less "…T12:00:00" is stored as "…T12:00:00.000Z"), and #1386's oxigraph-tuned canonicalizeObjectTermForHash does NOT reconcile it. Because the RS extractor hashes store-emitted terms (ka-extractor.ts: triples.map(hashTripleV10)), a Blazegraph-backed node computes a DIFFERENT V10 merkle leaf → RandomSampling fork once a non-oxigraph node joins. Fixing the canon is a coordinated dkg-core consensus change (out of scope here), so the 6 divergent cases are marked `it.fails` — CI stays green while the divergence is TRACKED (they flip to failing if the canon is ever made cross-backend safe). Agreeing datatypes still assert real cross-backend leaf agreement. Also: devnet-1002-coverage-sweep.sh now exits non-zero (1 = suite failure, 2 = health-guarded infra abort) so callers/CI can gate on it (otReviewAgent). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
| // date/gregorian, some xsd:double/float, some escaped string content. The | ||
| // AGREEING datatypes above assert real cross-backend agreement. | ||
| // ─────────────────────────────────────────────────────────────────────────── | ||
| it.fails('xsd:dateTime fractional-seconds + timezone [KNOWN #1386 cross-backend divergence]', async () => { |
There was a problem hiding this comment.
🔴 Bug: The Blazegraph oracle passes CI for known cross-backend leaf divergence
What's wrong
This new CI test is wired as a cross-backend consensus oracle, but several cases that are explicitly documented to fork RandomSampling are marked with it.fails. In Vitest that turns the divergence into a passing test, so the job can be green even though oxigraph and Blazegraph do not compute the same V10 leaf for supported typed literals.
Example
For "2026-06-29T12:00:00"^^xsd:dateTime, the test comments say Blazegraph canonicalizes to a different leaf input than oxigraph. Because the case is wrapped in it.fails, CI passes while the exact fork condition the workflow claims to prevent still exists.
Suggested direction
Remove the expected-failure wrapping from the consensus oracle, or split known unsupported divergences into a non-gating tracker with the CI job no longer claiming full leaf agreement.
For Agents
In packages/storage/test/term-canon-blazegraph-oracle.test.ts, do not include known consensus divergences as passing it.fails cases in the CI oracle that is described as proving cross-backend agreement. Either make these cases fail the gate until the canonicalizer is fixed, or move them to a clearly non-gating tracking suite and update the workflow comment/contract accordingly. The fix should prove that a divergent Blazegraph round-trip causes the CI job to fail.
The cross-backend oracle expects known consensus divergences to fail, so CI can pass while agreement is broken
What's wrong
This gives false validation for the highest-risk part of the change. The new CI lane says it proves Oxigraph and Blazegraph compute the same V10 leaf, but several literal families are encoded as expected failures, so the lane remains green when the agreement is known not to hold.
Example
For "2026-06-29T12:00:00"^^<xsd:dateTime>, the test body throws when Blazegraph and Oxigraph canonical forms diverge, but it.fails makes that throw count as green. CI can therefore pass while a Blazegraph node would compute a different leaf for that literal family.
Suggested direction
Do not gate cross-backend agreement with it.fails for the same behavior the CI step claims to prove; either make these assertions red until fixed, or move them to an explicitly non-gating known-fail suite and narrow the CI claim.
For Agents
In packages/storage/test/term-canon-blazegraph-oracle.test.ts, separate known-divergence documentation from the CI oracle. The CI test named as cross-backend agreement should fail on any production-relevant divergence, or the CI workflow/comment should narrow its claim and leave the expected failures out of the green gate.
There was a problem hiding this comment.
🔴 Bug: Known divergences are encoded as passing it.fails cases inside an agreement oracle
What's wrong
it.fails is a clever mechanism, but here it makes the suite harder to reason about because success has two opposite meanings in one file. Some tests pass because the invariant holds; others pass because the invariant is known not to hold. That is brittle institutional knowledge encoded in test runner semantics and a long comment rather than in a clear model.
Example
A reader scanning term-canon cross-backend oracle: oxigraph ⇄ blazegraph leaf agreement sees passing CI, but some of those passes mean expectCrossBackendLeafAgreement(...) failed as expected. That inverts the meaning of a green test within the same agreement suite.
Suggested direction
Separate the concepts: one suite should assert agreement and fail on divergence; another should document known divergence cases explicitly. Avoid mixing inverted tests into the green agreement oracle.
Confidence note
This is a maintainability finding about the test implementation style, not a claim that the expected-failing cases are behaviorally wrong.
For Agents
In packages/storage/test/term-canon-blazegraph-oracle.test.ts, split agreement checks from known-divergence documentation. Keep true agreement cases under the current oracle helper. Move known divergences into a clearly named describe('known cross-backend divergences') using it.skip, it.todo, or a purpose-built expectKnownDivergence helper that asserts divergence explicitly without masquerading as agreement. Preserve the documented cases and make the CI signal unambiguous.
Cross-backend divergence is marked as a passing test
What's wrong
The added CI lane is presented as proof of cross-backend V10 leaf agreement, but several high-risk literal families are expected to diverge and still pass because they are wrapped in it.fails. That gives green CI while preserving the exact RandomSampling fork risk the oracle is supposed to catch.
Example
If Blazegraph returns a timezone-normalized dateTime that canonicalizes differently, expectCrossBackendLeafAgreement throws, but it.fails(...) turns that failure into a passing CI result.
Suggested direction
Do not use it.fails for cases that the CI comment/release evidence treats as proven agreement. Make the unsupported cases explicit non-gating documentation, or fix the canonicalizer and assert agreement normally.
For Agents
In packages/storage/test/term-canon-blazegraph-oracle.test.ts, separate known-broken documentation from pass/fail verification. Either remove the CI claim until the canon is fixed, or gate only supported datatypes as agreement tests and create a non-gating TODO/skip for unsupported ones. The CI lane should fail for any datatype the release claims is cross-backend safe.
| import { defineConfig } from 'vitest/config'; | ||
| import { resolve } from 'node:path'; | ||
| export default defineConfig({ | ||
| test: { include: [resolve(import.meta.dirname, 'automated.test.ts')], testTimeout: 600_000, hookTimeout: 240_000, pool: 'forks', sequence: { concurrent: false }, globals: false }, |
There was a problem hiding this comment.
🟡 Issue: Collapse the repeated devnet suite scaffolding into one shared suite definition
What's wrong
The PR creates a linear-maintenance pattern around each new devnet suite. The actual suite-specific logic is in automated.test.ts, but the repository now has multiple copies of identical runner configuration and package metadata. That makes future runner changes noisy and easy to apply inconsistently.
Example
Changing the fork pool, timeout, or module resolution for these devnet suites now requires touching seven nearly identical configs plus lock/workspace metadata instead of one shared devnet-suite definition.
Suggested direction
Keep the suites separate at the test-file level, but move the common Vitest config and dependency assumptions into devnet/_bootstrap or root tooling. Ideally adding a future PR suite should require adding the test file and one script entry, not a package, config, workspace importer, and lockfile stanza.
Confidence note
Existing devnet suites already use per-suite packages/configs, but this PR adds enough identical copies that the pattern should be collapsed before it spreads further.
For Agents
Look at the new devnet/pr*/vitest.config.ts and package.json files plus the root scripts/workspace additions. Preserve each suite's current test entry point and script name, but introduce a shared defineDevnetSuiteConfig(suiteDir, overrides?) or a root devnet project runner so common config/dependency declarations live in one place. Verify one suite still runs by invoking its root test:devnet:* script.
There was a problem hiding this comment.
🟡 Issue: Collapse the repeated devnet-suite scaffolding into one shared config/registry
What's wrong
This PR adds a lot of administrative duplication around every suite. The tests themselves already share a harness, but the package/config/runner layer did not get the same treatment, so the next maintenance change has several places to drift.
Example
Adding or changing a devnet timeout/module-resolution rule now requires touching every devnet/pr*/vitest.config.ts, plus keeping the root scripts, workspace entries, lockfile importers, and sweep list aligned.
Suggested direction
Move the invariant config and suite registry into a single helper/manifest so a new PR suite adds only its actual test code and exceptional dependencies.
For Agents
Introduce a shared devnet Vitest config factory or one common config in devnet/_bootstrap, then have each suite only declare its test file/name. Preserve existing root script names if needed, but drive the sweep/package registration from one suite manifest or a narrow workspace glob. A smoke check should prove each existing suite still runs through the shared config.
| // self-contained, zero imports) so the assertion is grounded in the exact code the | ||
| // network uses, not a re-implementation. Relative-path import because this suite is | ||
| // not a pnpm workspace member and so has no `@origintrail-official/*` resolution. | ||
| import { canonicalizeObjectTermForHash } from '../../packages/core/dist/crypto/term-canon.js'; |
There was a problem hiding this comment.
🟡 Issue: Avoid reaching into built dist from a devnet suite
What's wrong
The test couples itself to the package's build layout instead of its module boundary. That makes the suite more fragile and contradicts the workspace packaging added by this same PR.
Example
A source move, build-output cleanup, or package export change can break this devnet suite even though the canonicalizer still exists as a package API.
Suggested direction
Use the package export (@origintrail-official/dkg-core) or a deliberate test helper boundary instead of a relative path into packages/core/dist. If the package does not export this symbol cleanly, that export boundary should be fixed rather than bypassed here.
Confidence note
This is anchored as a maintainability/boundary concern; if this repository intentionally requires prebuilt dist for all devnet suites, the author can justify keeping the import, but the new workspace entry makes the workaround look unnecessary.
For Agents
In devnet/pr1386-term-canon, add the appropriate workspace dependency if needed and import the canonicalizer through the package boundary, matching packages/storage/test/term-canon-blazegraph-oracle.test.ts. Keep the test's behavior unchanged and verify it still resolves after a normal repo build/install.
…viewAgent) The best-effort RS-liveness gate could pass on a pre-existing solved challenge (from before the suite published its sub-graph KA), giving false confidence for the #1385 extractor regression. Now it baselines each core's challenge period up front and accepts liveness only via a post-publish signal: a solved challenge in a NEWER period, or a strict submittedCount increase. Validated on a live devnet (passes via submittedCount). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Thanks — triaged all findings. 🔴 Fixed:
🟡 Acknowledged — valid design polish, deferred as follow-ups (this is a test-coverage PR; these are non-correctness refactors I'd rather not fold in mid-review): I'll file these as a tracked follow-up rather than expand this PR's surface. Note on the blazegraph oracle: its |
| "test:devnet:v10-rs-prune": "vitest run --config devnet/v10-rs-prune/vitest.config.ts", | ||
| "test:all": "pnpm test && pnpm test:evm" | ||
| "test:all": "pnpm test && pnpm test:evm", | ||
| "test:devnet:pr1386-term-canon": "vitest run --config devnet/pr1386-term-canon/vitest.config.ts", |
There was a problem hiding this comment.
🟡 Issue: The new devnet suites are registered through duplicated scaffolding instead of one suite model
What's wrong
This PR does add a shared harness, but the suite-level structure still grows by copy-paste. The same metadata is spread across root scripts, workspace entries, package manifests, lockfile importers, per-suite configs, and the sweep script. That makes future additions noisy and easy to desynchronize, and it turns suite discovery into maintenance work rather than a property of the directory layout.
Example
Adding one more devnet PR suite now requires editing a package file, a vitest config, the workspace list, the root scripts, the lockfile importer list, and the sweep's NEW_SUITES list. Those are all describing the same concept: a suite directory containing automated.test.ts.
Suggested direction
Introduce one canonical devnet-suite abstraction: for example a shared devnet/vitest.config.ts factory plus a manifest or convention-based discovery in the sweep. Then root scripts can call a single runner with the suite name, and each suite directory can contain only the actual test and fixtures.
For Agents
Look at the new devnet/pr*/package.json, devnet/pr*/vitest.config.ts, root package.json scripts, pnpm-workspace.yaml, and scripts/devnet-1002-coverage-sweep.sh. Preserve the ability to run individual suites and the sweep, but replace the repeated registration with a shared devnet suite manifest or a convention-based runner/config factory. Add a small smoke check that discovers the new suite directories and runs one generated config path.
There was a problem hiding this comment.
🟡 Issue: The suite registry is duplicated instead of being source-of-truth driven
What's wrong
This PR adds a manifest and a drift guard, but it still requires the same suite inventory to be edited in several places. That is configuration sprawl: the tests may catch inconsistency, but the implementation still makes adding or renaming a suite unnecessarily error-prone and noisy.
Example
Adding devnet/pr1390-example requires editing its directory, devnet/suites.json, pnpm-workspace.yaml, root package.json, and a copied vitest.config.ts; the manifest guard catches drift only after the duplication has already been introduced.
Suggested direction
Collapse the repeated registry into a single ownership point: use globs or generated metadata for workspace/package exposure, and keep the explicit ordered list only where ordering matters.
For Agents
Make devnet/suites.json an actual source of truth. Prefer a workspace glob where possible, a parameterized root script such as test:devnet:suite <name>, and have the sweep read the manifest directly. If individual scripts are required, generate them from the manifest rather than hand-maintaining parallel lists. Preserve the existing suite order in prCoverage.
There was a problem hiding this comment.
🟡 Issue: The suite manifest is not actually single-sourced
What's wrong
The PR adds a manifest, but then duplicates that same registry across root scripts, workspace entries, and repeated suite boilerplate. That makes the manifest test a drift alarm rather than a simplification, so the next suite still requires coordinated manual edits in several files.
Example
Adding pr1390 would require touching at least devnet/suites.json, pnpm-workspace.yaml, package.json, devnet/pr1390/package.json, and devnet/pr1390/vitest.config.ts. The manifest guard detects drift after the fact, but the design still preserves the drift-prone workflow.
Suggested direction
Delete the duplicated registry surface rather than guarding it. Let suites.json drive the sweep, expose one generic devnet test command, and use globs/shared config for package discovery and Vitest defaults.
For Agents
Replace the per-suite root scripts with a small runner that accepts a suite name and reads devnet/suites.json. Use a workspace glob for devnet packages where possible, and introduce a shared vitest config factory so new suites only need their automated.test.ts plus package dependencies when they genuinely differ. Preserve existing script names only as thin compatibility aliases if needed.
| * helper, so issue it directly with the node's bearer token (the route gates | ||
| * writes on a node-admin token; the harness reads exactly that from auth.token). | ||
| */ | ||
| async function postDelete( |
There was a problem hiding this comment.
🟡 Issue: DELETE support is implemented as a one-off suite helper instead of extending the harness
What's wrong
This is exactly the kind of plumbing the new harness exists to remove, but the first missing method gets reimplemented inside a suite. That creates two subtly different HTTP paths: shared GET/POST use fetchRetry, while this DELETE path uses raw fetch. Even aside from retry behavior, future suites will likely copy this helper again unless the harness owns the full request abstraction.
Example
The new postDelete is effectively a third copy of getJson/postJson: build auth headers, call fetch, try to parse JSON, return { status, json }. The only difference is the HTTP method and URL encoding.
Suggested direction
Collapse the HTTP helpers into one method-parameterized harness function and keep all auth, retry, URL, and JSON parsing behavior in that one place.
For Agents
Move the method/path/body/auth/JSON parsing logic into devnet/_bootstrap/harness.ts as requestJson<T>(node, method, path, body?), then expose getJson, postJson, and deleteJson as thin typed wrappers. Update devnet/pr1370-admin-op-wallet/automated.test.ts to use the shared DELETE helper while preserving the same route calls and assertions.
…es on orchestrator, pr1386 imports package boundary - Move the cross-backend blazegraph term-canon oracle + its CI wiring OUT of this devnet-coverage PR: it's a CONSENSUS artifact and now lives with the canon fix (OT-RFC-57 / fix/backend-independent-leaf-canon), where its it.fails -> it flips as each datatype is fixed. Removes the "it.fails in an agreement oracle" concern from this PR. - Sweep (devnet-1002-coverage-sweep.sh) now also gates on the phase-3 orchestrator: a real orchestrator failure exits 3 (the expected time-box cap 124/137 stays non-fatal). Previously only suite failures were counted. - pr1386 imports canonicalizeObjectTermForHash from the package boundary (@origintrail-official/dkg-core) instead of reaching into ../../packages/ core/dist; adds the workspace dep. Suite still green (2/2). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
| // anomaly. If stdout shows no confirmation, THEN treat the non-zero exit as | ||
| // a real failure. | ||
| const status = (/Status:\s*(\w+)/i.exec(publish.stdout)?.[1] ?? 'unknown').toLowerCase(); | ||
| if (publish.code !== 0) { |
There was a problem hiding this comment.
🔴 Bug: Sub-graph publish failures are accepted as passing coverage
What's wrong
The coverage suite is supposed to validate the real dkg shared-memory publish --sub-graph-name flow, but the new logic treats a failed CLI process as success based on partial stdout. That masks an integration/API contract break.
Example
If publishFromFinalizedAssertion returns a response without kas, the CLI prints Status and KC ID, then throws on result.kas.length and exits 1. This test still passes, so the sweep can report green while the CLI command is broken for users.
Suggested direction
Make the suite fail on a non-zero publish exit, or call a lower-level API only if the goal is to bypass the CLI and then add a separate check that the CLI exits 0.
For Agents
In devnet/pr1385-subgraph-rs/automated.test.ts, keep the sub-graph KA verification, but do not classify a non-zero CLI exit as success. Either require publish.code === 0 and fix the CLI/API response shape, or move this known CLI failure into an explicit expected-failure test that cannot satisfy the PR coverage gate.
There was a problem hiding this comment.
🟡 Issue: The sub-graph publish test treats a failed CLI command as success
What's wrong
The sweep is meant to validate network-observable behavior through the real CLI, but this branch converts a non-zero CLI exit into a passing test whenever stdout looks successful. That masks API/CLI contract breakage and lets the coverage sweep pass even though the command users run failed.
Example
If dkg shared-memory publish --sub-graph-name ... prints Status: confirmed and KC ID: 123, then throws a post-success TypeError and exits 1, this suite still passes. A user or automation running the same CLI command would see a failed command, but the sweep reports the PR coverage as green.
Suggested direction
Keep parsing the KA id if needed for cleanup/debugging, but make the suite fail on non-zero exit or move the known CLI bug into an explicit quarantined check that cannot make the coverage sweep green.
Confidence note
The test comment says this is a known current CLI behavior, but the changed suite is part of the sweep gate, so treating a non-zero CLI exit as success changes what the sweep can catch.
For Agents
Look at devnet/pr1385-subgraph-rs/automated.test.ts around the shared-memory publish step. Preserve the deterministic KA publication/setup for the extractor assertions, but do not let a non-zero public CLI exit count as a passing integration result. Either fix the CLI post-success error and require publish.code === 0, or bypass the CLI for this setup and add a separate explicit failing/xfail-style check for the known CLI bug.
|
|
||
| // Snapshot the dormant deposit, then activate it for the suite. | ||
| depositSnapshot = await params.contextGraphRegistrationDeposit(); | ||
| await setDeposit(DEPOSIT); |
There was a problem hiding this comment.
🟡 Issue: Deposit cleanup is disabled during the riskiest activation window
What's wrong
This suite mutates a shared global chain parameter, but its restoration guard is enabled only after the mutation helper fully succeeds. A partial failure can leave the devnet in a non-default state and contaminate the rest of the sweep.
Example
If setContextGraphRegistrationDeposit(DEPOSIT) is mined but the follow-up read/verification in setDeposit throws, depositActivated is still false. afterAll skips restoration, leaving the shared devnet registration deposit non-zero for later suites.
Suggested direction
Mark cleanup as required before mutating the shared parameter, and make restoration depend on having a snapshot plus attempted mutation rather than only on a post-mutation success flag.
Confidence note
The failure window is narrow but real because the global parameter is changed before the restoration guard is enabled.
For Agents
In devnet/pr1366-pca-deposit-waiver/automated.test.ts, set a cleanup-needed flag before submitting the deposit-changing transaction, or wrap activation in try/finally so the snapshot is restored whenever the setter may have been sent. Add a failure-path check that later suites see the original deposit.
| `#1385: sub-graph 'shared-memory publish' exited ${publish.code} despite Status=${status} — ` + | ||
| `cosmetic CLI post-success error (worth a follow-up). stderr: ${publish.stderr.slice(0, 200)}`, | ||
| ); | ||
| expect( |
There was a problem hiding this comment.
🟡 Issue: Sub-graph publish test accepts a failed CLI exit
What's wrong
The test treats stdout confirmation as sufficient even when the CLI exits with failure. That weakens validation for the actual command path being used to publish the sub-graph KA.
Example
If dkg shared-memory publish --sub-graph-name completes the on-chain write but exits 1 because a post-success response handling regression breaks the CLI, this test still passes. A user-facing publish command regression remains unverified.
Suggested direction
Do not count the publish command as verified when the process exits non-zero; keep any temporary workaround visibly separate from the coverage claim.
For Agents
In devnet/pr1385-subgraph-rs/automated.test.ts, separate the known CLI exit-code bug from the RS extractor test. Either require publish.code === 0 once the CLI bug is fixed, or mark the workaround as an explicit expected failure with a dedicated assertion/tracker so this suite does not report full publish-flow success on exit 1.
…le, not just RS-live (otReviewAgent) The old best-effort RS-liveness test proved RS is alive (a proof lands) but NOT that RS sampled/proved OUR sub-graph KA — the network draw samples across every weighted CG, so the landed proof almost always covers a DIFFERENT KA (observed every run: matchesSubGraphKa=false). That gave false confidence for the #1385 extractor regression. Add a fully-deterministic it 3 that runs the REAL production extractV10KCFromStore (the #1385 code path) against node1's live oxigraph over the SPARQL-HTTP adapter, with the per-cgId data-graph read blanked so the #1385 sub-graph cascade is FORCED to fire (the publisher's per-cgId promotion otherwise copies the sub-graph KA into .../context/<cgId> and path (a) satisfies it without exercising #1385 — a plain root check would stay green even if #1385 were reverted). It asserts BOTH: - the extractor resolved OUR subGraphName and read every triple from OUR sub-graph home (proves the cascade fired), and - the leaves recompute — exactly as the prover does, structuredKARootV10(triples.map(hashTripleV10), privateRoots) — to the on-chain-anchored merkleRoot (getLatestMerkleRoot) for OUR kaId. This is independent of the network sampling draw and directly proves OUR sub-graph KA is extractable AND provable. it 4 (old it 3) is kept as a no-stall liveness regression guard with its misleading OUR-KA-sampled framing scrubbed. Validated green x3 against the live 6-node devnet. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
| ): Promise<{ status: number; json: any }> { | ||
| const headers: Record<string, string> = { 'Content-Type': 'application/json' }; | ||
| if (node.authToken) headers.Authorization = `Bearer ${node.authToken}`; | ||
| const res = await fetchRetry(`http://127.0.0.1:${node.apiPort}${path}`, { |
There was a problem hiding this comment.
🔴 Bug: postJson retries non-idempotent mutations and can duplicate shared-devnet writes
What's wrong
A thrown fetch error does not prove the daemon did not process the request. Retrying all POSTs can submit the same mutation twice, causing false failures and, worse, leaving shared state changed when the suite's cleanup flag was never set.
Example
If POST /api/operational-wallets adds the throwaway key on-chain but the connection resets before the response, fetchRetry sends the POST again. The second call can fail as already registered; the test throws before setting added=true, so the finally cleanup does not remove the key and node1 is left mutated.
Suggested direction
Restrict fetchRetry to GET/status/query-style calls, or add an option so mutation endpoints use a single attempt unless the caller can safely dedupe/reconcile the side effect.
For Agents
In devnet/_bootstrap/harness.ts, make retries opt-in or idempotent-method-only. For mutation helpers, either do not retry after dispatch, or provide operation-specific recovery that checks final state and still performs cleanup. Add a regression around a simulated post-commit connection reset for the operational-wallet add flow.
There was a problem hiding this comment.
🔴 Bug: Mutating HTTP requests are retried and can duplicate side effects
What's wrong
The shared retry helper is safe for status polls and queries, but postJson and httpDelete use it for state-changing requests. A network error after the daemon has already processed a write is an ambiguous outcome; replaying the request can create duplicate side effects, produce a misleading conflict, and skip cleanup logic that assumes the first call did not land.
Example
In pr1370, POST /api/operational-wallets can successfully add the throwaway key on the node, then the client connection can reset before the response is read. fetchRetry sends the same POST again; the retry can return 409 because the key is already registered. The test then fails before setting added = true, so the finally cleanup does not remove the key and node1 is left mutated for later suites.
Suggested direction
Make retry behavior method-aware. Default to no retry for POST/DELETE, or require callers to mark specific idempotent endpoints as retryable and make cleanup code robust to ambiguous write outcomes.
For Agents
Update devnet/_bootstrap/harness.ts so mutating helpers do not automatically retry non-idempotent POST/DELETE calls. Keep retries for safe reads or add an explicit retry opt-in for known-idempotent operations. Add a regression test or small harness test that simulates a POST processed once then connection-reset, and proves the helper does not replay the mutation or that callers can still clean up based on observed on-chain state.
| try { | ||
| const ch = await s.rss.getNodeChallenge(id); | ||
| const isSolved: boolean = ch[6]; | ||
| if (isSolved) { |
There was a problem hiding this comment.
🔴 Bug: RS proof assertion can pass on a stale solved challenge
What's wrong
The assertion claims a Random Sampling proof lands on-chain during this suite, but it accepts pre-existing solved chain state from earlier devnet activity. That makes the test a false-positive liveness signal instead of evidence that RS progressed with the newly published KA present.
Example
On a warm devnet where node2 already has a solved challenge before this test starts, this waitFor returns immediately at line 314. The suite reports that an RS proof landed even if no proof was submitted after the diverse-literal KA was published, so a regression in the publish→RS path can be missed.
Suggested direction
Baseline the challenge period or submitted counter before polling, then require progress after the publish step rather than accepting any already-solved challenge.
For Agents
In devnet/pr1386-term-canon/automated.test.ts, record each core's current activeProofPeriodStartBlock and/or submittedCount before waiting, then accept only a solved challenge from a newer period or a strict submittedCount increase after the KA publication. Preserve the existing current-challenge observability logs.
RS proof assertion can pass on a pre-existing solved challenge
What's wrong
The test claims validation that an RS proof lands while the CG carries the new diverse-literal KA, but it can pass using an already-solved challenge from before the test behavior occurred. That gives false confidence for the changed canonicalization pipeline.
Example
On a warm devnet where node1 already solved its current RS challenge before this suite starts, this test returns at line 314 immediately. A regression that prevents proofs after publishing the diverse-literal KA would still pass because the solved challenge is stale evidence.
Suggested direction
Take a post-publish baseline and only accept fresh RS evidence, similar to the newer-period guard used in the subgraph RS suite.
For Agents
In devnet/pr1386-term-canon/automated.test.ts, carry the published kaId or at least snapshot each core's activeProofPeriodStartBlock/submittedCount after the publish, then require a solved challenge in a newer period or a strict submitted proof after that snapshot. Preserve the generic liveness intent if sampling is nondeterministic, but prove the liveness happened after the KA was introduced.
There was a problem hiding this comment.
🔴 Bug: RS proof assertion can pass on a pre-existing solved challenge
What's wrong
The test claims that an RS proof lands while the diverse-literal KA is present, but it accepts any core whose current challenge is already solved. That gives false confidence: the changed canonicalization path could be broken for the newly published literals and this test would still pass on a warmed devnet.
Example
On a warm devnet where node2 already has getNodeChallenge(id)[6] === true before this test starts, this assertion passes immediately. If the diverse-literal KA published earlier in the suite cannot be proven by Random Sampling, the test still stays green because it accepted the old solved challenge.
Suggested direction
Make the RS assertion prove fresh progress after this suite's publish, and separate generic liveness from diverse-literal-specific evidence.
For Agents
In devnet/pr1386-term-canon/automated.test.ts, snapshot each core's challenge period/submittedCount before publishing or before this RS assertion, then require a newer solved period or a strict submittedCount increase after the diverse-literal KA is resident. If the test is meant to verify this KA specifically, add deterministic root recomputation for pub.kaId or gate on a sampled knowledgeAssetId match when claiming KA-specific coverage.
There was a problem hiding this comment.
🔴 Bug: The RS liveness test can pass on a stale solved challenge
What's wrong
The assertion gives false confidence because it accepts pre-existing solved challenges from earlier devnet activity. It does not prove a proof landed while the new diverse-literal KA was resident.
Example
On a warm devnet, if node2 already has getNodeChallenge(id)[6] === true before this test starts, the callback returns immediately at line 287. The suite passes even if no RS proof was submitted after the term-canon KA was published.
Suggested direction
Baseline RS state before polling and require proof progress after this suite's publish step.
For Agents
In devnet/pr1386-term-canon/automated.test.ts, snapshot each core's activeProofPeriodStartBlock and/or submittedCount after the publish test, then require a solved challenge in a newer period or a strict submitted-count increase during this test. The pr1385-subgraph-rs liveness test has the right shape.
There was a problem hiding this comment.
🔴 Bug: Term-canon suite uses the canonicalizer as its own oracle, so it can pass without proving the network used it
What's wrong
The changed test claims to validate the live publish/query/RS canonicalization pipeline, but it normalizes the store read-back inside the test before comparing. That proves the helper can canonicalize the returned value, not that the publisher/prover actually used the canonicalized term when anchoring or proving the KA.
Example
If tripleContentV10 regressed to hashing raw object terms while canonicalizeObjectTermForHash remained exported and correct, this suite could still pass: the query can return a raw term like "1.50"^^<...decimal>, line 224 canonicalizes it to the expected value, and the RS test can pass on another already-solved KA.
Suggested direction
Tie the assertion to the published KA’s on-chain root or a proof for that KA, so a regression that stops publisher/prover leaf hashing from using canonicalization fails.
For Agents
In devnet/pr1386-term-canon/automated.test.ts, add a network-anchored assertion for the newly published diverse-literal KA: recompute the KA root from the actual queried/store terms using the production leaf path and compare it to the on-chain merkle root for pub.kaId, or otherwise make the RS proof target this KA. Keep the existing convergence checks as supporting evidence, but do not let them be the only integration proof.
| * closing `>` so the sibling `…/context/<cgId>/_meta` graph (which the extractor | ||
| * MUST still read for the UAL/rootEntity/subGraphName resolution) is untouched. | ||
| */ | ||
| function makeReplicaView(inner: TripleStore, perCgIdDataGraph: string): TripleStore { |
There was a problem hiding this comment.
🟡 Issue: Avoid the string-matching Proxy store shim for the extractor regression path
What's wrong
This is a fragile, magical abstraction in the highest-complexity suite. The test's core premise depends on an incidental query string shape rather than a stable graph/store boundary, which makes future extractor refactors harder to reason about.
Example
A harmless formatter change in extractV10KCFromStore that emits GRAPH<...>, lowercases/rewrites the query, or routes through a different TripleStore method would bypass this shim without making the test intent obvious.
Suggested direction
Model the replica/unpromoted-store view explicitly instead of keying behavior off incidental SPARQL serialization.
Confidence note
The intent of forcing the extractor fallback is clear, but the maintainability concern is about how the behavior is modeled, not whether this currently passes.
For Agents
Replace the generic Proxy + raw-string matching with a small named TripleStore test double/decorator that models the scenario at a clearer boundary, or extract an explicit helper from the extractor/store layer that can disable the per-CG data graph by graph URI rather than by SPARQL text. Keep the behavior of blanking only the per-CG data graph.
There was a problem hiding this comment.
🟡 Issue: Avoid the Proxy-based TripleStore monkey patch
What's wrong
The proxy is a magical wrapper around a complex storage boundary. It intercepts arbitrary properties, recognizes behavior by searching raw SPARQL text, and then hides the mismatch with as unknown as TripleStore. That makes a delicate regression test harder to reason about than the production path it is trying to validate.
Example
If extractV10KCFromStore changes the generated query formatting while keeping the same read semantics, this proxy can silently stop exercising the intended path or blank a different CONSTRUCT query, and the type system will not help because the wrapper is cast back to TripleStore.
Suggested direction
Use an explicit, typed test seam instead of a catch-all proxy with substring matching and casts.
For Agents
Replace makeReplicaView in devnet/pr1385-subgraph-rs/automated.test.ts with an explicit TripleStore test decorator/class that implements query directly and documents the exact predicate it blanks. Better yet, expose a narrow extractor/store-read seam and inject the per-CG graph response there. Preserve the test's intent: force the sub-graph fallback path and still compare the recomputed root to chain state.
There was a problem hiding this comment.
🟡 Issue: Avoid the magic Proxy seam for the extractor-path test
What's wrong
This is a high-leverage test, but the implementation relies on a generic Proxy, substring matching of SPARQL text, and cast-heavy type escape hatches. That hides the real invariant and makes future extractor/store refactors harder to reason about.
Example
A harmless extractor refactor that formats the same graph read differently, wraps the graph IRI with different whitespace, or routes through another store method can stop the blanking from applying. The test would then exercise the easier promoted path while still looking structurally similar.
Suggested direction
Make the test seam explicit and observable. A small typed decorator with a counter for blanked queries would preserve the behavior while making the forced read path much easier to reason about.
Confidence note
This may be intentionally test-only, but the implementation is still a brittle seam for a very important regression assertion.
For Agents
Replace the Proxy with an explicit test store/decorator class that implements the TripleStore query method and has a named shouldBlankPerCgDataGraph(query) policy. Prefer a parser/structured graph matcher if one exists; otherwise isolate the string matching in that policy and assert that the blanking branch was actually hit before accepting the extraction result.
There was a problem hiding this comment.
🟡 Issue: The replica-store override is too magical for a core regression test
What's wrong
The test is validating an important path, but it does so through a cast-heavy Proxy with hidden behavior and string-based query matching. That is brittle and hard to audit in a 600+ line suite.
Example
A harmless formatter change in extractV10KCFromStore's generated SPARQL could stop including the exact GRAPH <...> token this proxy searches for. The test would then exercise a different path, while the indirection makes that hard to see from the assertion body.
Suggested direction
Make this a typed, explicit test double or extractor fixture helper instead of a generic Proxy that rewrites behavior by SPARQL substring inspection.
For Agents
Replace makeReplicaView with an explicit test double, for example a small class implementing the TripleStore query surface with a named shouldBlankConstructGraph predicate and optional call recording. Keep the behavior of blanking only the per-cgId CONSTRUCT, and assert the blanking path was actually invoked.
| expect(reimport.code, `re-import of exported bundle failed:\n${reimport.stdout}\n${reimport.stderr}`) | ||
| .toBe(0); | ||
| const reSummary = lastJsonBlock(reimport.stdout); | ||
| expect(reSummary.conformant).toBe(true); |
There was a problem hiding this comment.
🟡 Issue: OKF export round-trip does not verify that links survive export
What's wrong
The changed export behavior is supposed to round-trip an equivalent OKF graph, including cross-concept links. The current validation would still pass if export dropped those links because the re-import checks only concept identity and basic conformance.
Example
If dkg okf export wrote four Markdown files with valid frontmatter but omitted every reconstructed [link](...), okf import outDir --dry-run could still report conformant: true, concepts: 4, and the same urn:okf: IRIs, so lines 336-340 would stay green while the cross-concept link graph was lost.
Suggested direction
Assert the exported bundle preserves the semantic link graph, not just that the concept files remain conformant with the same IRIs.
For Agents
In devnet/pr1388-okf-integration/automated.test.ts, extend the round-trip test after reSummary to assert the exported bundle re-import resolves the expected link graph, for example linksResolved > 0 plus the same hub-to-alpha/beta/gamma schema:mentions expectations used for SWM.
…urce suite manifest (otReviewAgent #1397) Three harness-quality fixes flagged by the re-review: 1. Typed SPARQL binding boundary. harness gains `SparqlBindingCell` + normTerm / valueOf / lexical / unwrapIri, and queryNode's return widens from Record<string,string> to Record<string,SparqlBindingCell> (its cells genuinely can be structured SPARQL-JSON). The three canon-comparing suites (pr1386, pr1385, pr1388) delete their near-identical local copies and import the shared ones — so a future canon-form change can't silently drift one suite's normalizer. (The self-contained v10-core-flows and the ASK-boolean core-peers helper are left as deliberate boundaries: non-canon-sensitive, no drift risk, and folding them in would add harness coupling to standalone suites for no correctness gain.) 2. DELETE as a harness primitive. pr1370's inline postDelete becomes a one-line domain wrapper over new harness `httpDelete(node, path)` (auth + tolerant JSON, via fetchRetry). 3. Single-source suite manifest. devnet/suites.json is now the one list; the coverage sweep reads prCoverage from it, and a pure-fs drift guard (_bootstrap/suite-manifest.test.ts, its own devnet-free config + test:devnet:manifest script) asserts suites.json stays in sync with the on-disk dirs, pnpm-workspace, and the package.json test:devnet:* scripts. Validated live on the running devnet: pr1370 5/5, pr1388 4/4, pr1386 2/2, pr1385 4/4; drift guard 6/6. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The sweep now reads NEW_SUITES from suites.json; guard against a stale manifest silently under-testing by running the pure-fs drift check (fail-fast, exit 2) before Phase 1, alongside the devnet-health gate. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
| ProfileStorage: await get('ProfileStorage'), | ||
| }; | ||
| // Optional contracts (present only after the relevant PR's deploy script ran). | ||
| for (const opt of [ |
There was a problem hiding this comment.
🟡 Issue: Make contract lookup lazy instead of maintaining an optional-contract allow-list
What's wrong
The harness centralizes devnet state, but contract address ownership is still split between a hard-coded optional list and ad-hoc suite lookups. That makes the abstraction leaky: every new suite author has to know which contracts were preloaded, which ones need direct Hub resolution, and whether asset-storage contracts are special.
Example
Adding the next Hub-registered optional contract requires knowing to edit the harness allow-list first; otherwise some suites will use contractAt while others will hand-roll state.hub.getContractAddress. That is exactly the kind of drift the harness was meant to eliminate.
Suggested direction
Delete the fixed optional-contract list and make the harness own contract resolution uniformly. A small cached resolver would remove the allow-list and the suite-level direct Hub lookups.
For Agents
Refactor devnet/_bootstrap/harness.ts so contractAt or a new resolveContractAddress lazily resolves any Hub-registered contract by name, caches it on DevnetState, and handles asset-storage names explicitly where needed. Preserve existing suite behavior, then update pr1384 to use the same canonical helper as pr1387. A smoke test should cover resolving a required contract and an optional PR contract.
| @@ -0,0 +1,32 @@ | |||
| { | |||
| "_comment": "Single source of truth for the devnet suite set. `prCoverage` = the 10.0.2 PR-coverage suites the coverage sweep runs; `all` = every devnet/<dir> that has a vitest.config.ts (excluding _bootstrap). scripts/devnet-1002-coverage-sweep.sh reads prCoverage; devnet/_bootstrap/suite-manifest.test.ts guards this file against package.json / pnpm-workspace.yaml / on-disk drift.", | |||
There was a problem hiding this comment.
🟡 Issue: The new devnet suite registry is still not a real single source of truth
What's wrong
The PR adds a manifest and a guard against drift, but the suite list still has to be repeated in root scripts, workspace membership, per-suite package files, and identical Vitest configs. That is structural duplication, not just boilerplate. The guard detects drift, but the cleaner design is to remove most of the drift surface.
Example
To add one more PR-coverage suite, a contributor must update the directory, devnet/suites.json, pnpm-workspace.yaml, root package.json, the suite package.json, the suite vitest.config.ts, and the lockfile. The manifest test catches missed edits after the fact, but it does not remove the ceremony or the review surface.
Suggested direction
Use the manifest to drive the sweep and local runner directly, and factor the repeated Vitest config through a shared helper. If individual package scripts are needed, generate them or route them through one canonical script instead of hand-maintaining parallel lists.
Confidence note
pnpm-workspace.yaml may need explicit package entries depending on the repository's release tooling, but the current PR already adds a manifest and a drift guard, so the duplication is visible enough to simplify.
For Agents
Make devnet/suites.json genuinely canonical for suite discovery. Prefer one runner script that accepts a suite name or reads the manifest, a shared defineDevnetSuiteConfig helper for per-suite vitest configs, and a workspace/package registration pattern that does not require listing every suite in multiple files where possible. Preserve existing script names only if external tooling depends on them; otherwise route through the canonical runner. Validate with the manifest test and one representative suite command.
There was a problem hiding this comment.
🟡 Issue: The suite manifest still codifies a multi-registry drift problem
What's wrong
This adds a drift guard around duplicated registries instead of deleting the duplication. That keeps suite registration fragile and makes future devnet coverage growth depend on remembering several unrelated edits.
Example
Adding one more devnet suite now means touching at least the suite directory, devnet/suites.json, pnpm-workspace.yaml, package.json scripts, and usually a copied vitest config. The manifest guard catches missed edits, but it does not remove the duplicated registration model.
Suggested direction
Make the manifest or the filesystem the actual source of truth, then have the sweep, root scripts, workspace registration, and config resolution derive from it instead of manually mirroring the same list.
For Agents
Refactor devnet suite registration so only one source owns the suite set. Prefer deriving workspace inclusion with a devnet glob where feasible, using one root script that accepts a suite name and reads devnet/suites.json, and replacing per-suite copied vitest configs with a shared config factory or runner. Preserve the current named scripts only if generated or truly needed. Verify with pnpm test:devnet:manifest and at least one PR-coverage suite.
| join(import.meta.dirname, `.okf-manifest-${cgId}.json`); | ||
|
|
||
| /** Extract the final JSON object the okf CLI prints to stdout (after per-line logs). */ | ||
| function lastJsonBlock(stdout: string): Record<string, unknown> { |
There was a problem hiding this comment.
🟡 Issue: Make CLI JSON output a typed contract instead of scraping stdout
What's wrong
The suite locally reverse-engineers the CLI output stream with ad-hoc string scanning, then carries loosely typed objects through the assertions. That is brittle infrastructure code in the middle of a feature test and makes future CLI output changes harder to evaluate.
Example
Every OKF CLI assertion has to treat the result as Record<string, unknown> and cast fields like iris or predicates; if another CLI command starts logging a JSON-looking block before the summary, the parser logic becomes part of the test contract rather than the CLI contract.
Suggested direction
Move JSON extraction to a shared helper backed by a stable CLI output convention, and type the expected summaries at the boundary.
For Agents
Add a shared runDkgCliJson<T>() helper or a CLI --json/machine-readable mode that emits one stable JSON payload. Use typed summary interfaces for OKF import/verify/export in this suite. Preserve the same CLI coverage while deleting the stdout-scanning parser.
| ).wait(); | ||
| } | ||
| const cg = new ethers.Contract(cgAddr, CG_ABI, owner); | ||
| const tx = await cg.createContextGraph([], 0n, 0, 0, owner.address, accountId, freshNameHash(), { |
There was a problem hiding this comment.
🟡 Issue: The PCA waiver suite never exercises the registered-agent waiver path
What's wrong
The changed behavior includes waiving deposits for creators who are registered agents of the PCA, but the devnet suite only verifies PCA-owner creators. That leaves an important authorization branch unverified.
Example
A regression that deleted the agentToAccountId(creator) branch from ContextGraphWaiverStorage.tryConsumeWaiver would still pass this suite, because no test registers a separate agent address and creates the CG from that agent without approving the deposit.
Suggested direction
Add an end-to-end waiver case where a non-owner registered PCA agent is the CG creator.
For Agents
In devnet/pr1366-pca-deposit-waiver/automated.test.ts, add a well-funded PCA case that registers a fresh agent on the PCA, funds that agent for gas, then calls createContextGraph signed by the agent with publishAuthority still set to the PCA owner. Assert no deposit approval is needed, the waiver event uses the agent as creator, and waivedCgCount increments.
|
|
||
| // ───────────────────────────── HTTP / query ───────────────────────────── | ||
|
|
||
| export async function getJson( |
There was a problem hiding this comment.
💡 Suggestion: Collapse the repeated HTTP helpers into one request primitive
Why it matters
This is a small code-judo cleanup: the harness is now the shared place for devnet HTTP behavior, so keeping the request mechanics in one function reduces copy-paste and makes future methods, headers, timeouts, or response typing changes much cheaper.
Suggestion
Introduce a private requestJson(node, method, path, body?) that owns auth headers, JSON encoding, fetchRetry, and tolerant response parsing. Keep getJson, postJson, and httpDelete as tiny wrappers if the call sites read better.
…tion + mixed-backend devnet support Validated live on a MIXED devnet (oxigraph cores + Blazegraph cores) that the OT-RFC-57 / #1399 value-canon makes an oxigraph node and a Blazegraph node produce IDENTICAL V10 leaves — RS cannot fork across the backend boundary. - pr1386 round-trip assertion: identity -> CONVERGENCE. It compared the store read-back term to canon(input) directly, which held only under the old oxigraph-tuned canon. The #1399 value-canon makes the store's lexical form differ from the canonical form (oxigraph keeps "+02:00", Blazegraph forces "Z", sub-ms truncates), so we now assert canon(store_readback) === canon(input) — exactly what the RS prover computes. REQUIRED before #1399 merges or this test breaks. Holds under both canons (canon is idempotent on its own output). - New cross-backend test: publishes the affected-literal battery (dateTime tz / sub-ms / trailing-zero, date tz, leading-zero gYear, signed-zero double) on an oxigraph node AND a Blazegraph node and asserts canon(oxi_readback) === canon(bg_readback) === canon(input). Skips cleanly if the devnet provisioned no Blazegraph node. (It exercises real divergence only when BOTH a Blazegraph node and the #1399 canon are present; on a pre-#1399 mixed devnet it would fail by design — that is the bug it guards.) - devnet.sh: detect an EXTERNAL Blazegraph already serving on the port (the amd64 Docker image only runs under glacial qemu on Apple silicon; a native arm64 JAR is the workaround), and add DEVNET_BLAZEGRAPH_CTX (/bigdata vs /blazegraph webapp context) + DEVNET_BLAZEGRAPH_NS overrides. Defaults unchanged. Live result on the mixed devnet: pr1386 3/3 (incl. 8/8 identical cross-backend leaves), pr1385 4/4, pr1388 4/4. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
| const isOxi = (b: string) => /oxigraph|sparql-http/.test(b); | ||
| const oxiNum = [1, 2, 5, 6].find((n) => s.nodes[n] && isOxi(backendOf(n))); | ||
| const bgNum = [3, 4].find((n) => s.nodes[n] && backendOf(n) === 'blazegraph'); | ||
| if (!oxiNum || !bgNum) { |
There was a problem hiding this comment.
🔴 Bug: Cross-backend coverage silently passes when no Blazegraph node is present
What's wrong
The cross-backend test is the only live evidence that Oxigraph and Blazegraph produce identical V10 leaf terms, but the no-Blazegraph path just returns from the test. That creates a green result while omitting the behavior the test is named to verify.
Example
On a devnet where Docker is unavailable or Blazegraph is not provisioned, nodes 3-4 fall back to Oxigraph. bgNum is undefined, the test logs #1386 cross-backend SKIP, returns, and the sweep can still report pr1386-term-canon as passing without exercising the cross-backend case.
Suggested direction
Make absence of Blazegraph a visible skipped or failed precondition for the coverage sweep, not a green test.
For Agents
In devnet/pr1386-term-canon/automated.test.ts, make the coverage-gating cross-backend test require a mixed Oxigraph/Blazegraph devnet, or use a real Vitest skip mechanism that the sweep records as skipped. If this suite is expected to validate OT-RFC-57 in the 10.0.2 sweep, fail the precondition instead of returning success.
There was a problem hiding this comment.
🔴 Bug: Cross-backend canonicalization coverage silently passes on the default all-oxigraph sweep
What's wrong
The test headline claims live oxigraph-vs-Blazegraph verification, but the default sweep explicitly does not provide Blazegraph. Because the test just logs and returns, regressions that only appear at the backend boundary still produce a green PR-coverage run.
Example
Run the 10.0.2 sweep on the default devnet described by the script: nodes 3-4 are not Blazegraph, the test logs #1386 cross-backend SKIP..., returns from the it, and the suite is counted green without ever comparing oxigraph leaves to Blazegraph leaves.
Suggested direction
Treat the absence of Blazegraph as a verification failure for this coverage path, or wire the sweep to provision and require a mixed-backend devnet before running the cross-backend assertions.
For Agents
In devnet/pr1386-term-canon/automated.test.ts, make the cross-backend test fail or use an explicit Vitest skip that the sweep treats as missing required coverage when the PR-coverage sweep is supposed to validate OT-RFC-57. Alternatively split it into a separate required mixed-backend job and update scripts/devnet-1002-coverage-sweep.sh to run that job with a Blazegraph-backed devnet. The test should prove that at least one oxigraph node and one Blazegraph node were exercised before reporting success.
| for (const row of rows) { | ||
| const pred = valueOf(row.p).replace(/^<|>$/g, ''); | ||
| // canon(store read-back) = exactly the leaf term the RS prover hashes. | ||
| leafByPred.set(pred, canonicalizeObjectTermForHash(normTerm(row.o))); |
There was a problem hiding this comment.
🟡 Issue: Cross-backend suite does not compare the V10 leaves it claims to validate
What's wrong
The test is described as proving identical V10 leaves across oxigraph and Blazegraph, but it only compares canonicalized object terms. Because each backend publication uses a different generated subject, the actual leaf hashes would differ even in the healthy case, and a full-leaf regression could pass unnoticed.
Example
The oxigraph publication subject is generated from xbackend-oxi, while the Blazegraph publication subject is generated from xbackend-bg. Even with identical objects, the real V10 leaves for those two triples cannot be identical because V10 leaf content includes subject and predicate as well as object. The current assertion would still pass because it only compares the object term.
Suggested direction
Compute and compare the real V10 leaf hash over subject, predicate, and canonicalized object, or rename/narrow the assertion so it does not claim leaf equality.
For Agents
In devnet/pr1386-term-canon/automated.test.ts, either make the test assert object canonicalization explicitly, or publish/read the same subject and compare actual hashTripleV10(subject, predicate, canonicalObject) values across backends. The test should fail if the full V10 leaf bytes differ across stores.
|
|
||
| // Wrapper NFT ABI fragments this suite needs (harness NFT_ABI lacks transferFrom | ||
| // / getRegisteredAgents / isAgent). The wrapper name IS in state.addrs. | ||
| const NFT_ABI = [ |
There was a problem hiding this comment.
🟡 Issue: Use canonical contract factories instead of hand-copying ABI fragments per suite
What's wrong
The PR repeats low-level ABI strings across the shared harness and individual suites. That is brittle test infrastructure: it scatters contract knowledge and makes each suite responsible for keeping fragments in sync with the deployed artifacts.
Example
If DKGPublishingConvictionNFT changes a signature or event shape, the update has to be found across the harness and multiple suites. One suite can silently keep an obsolete fragment while another uses a newer one.
Suggested direction
Move ABI ownership into the harness/contracts layer and have suites ask for named contract handles. This removes duplicated string fragments and makes the existing ABI artifacts the source of truth.
For Agents
Add a contracts helper that loads canonical ABI JSON from packages/evm-module/abi or exports named contract factories for Token, DKGPublishingConvictionNFT, PublishingConviction, ContextGraphs, IdentityStorage, etc. Keep suite-specific minimal ABIs only when the contract is genuinely absent from the canonical artifacts.
…ative-Blazegraph script - pr1386 astral test: publishes 😀 (U+1F600) on an oxigraph node AND a Blazegraph node. Asserts oxigraph preserves it (leaf == canon(input)); DOCUMENTS (never hard-fails) whether Blazegraph corrupts it. LIVE FINDING on the mixed devnet: Blazegraph PRESERVED the astral char via the daemon's raw-UTF-8 SPARQL-update path — the §7.7 corruption did NOT manifest, unlike the CI unit oracle which triggered it via a `\U0001F600` ESCAPE insert. So the hazard is escape-ingestion- specific and narrower in practice than the oracle implied; the daemon decodes to raw UTF-8 before writing, which Blazegraph stores correctly. - scripts/devnet-blazegraph-native.sh: reproducible + lifecycle-owned native Blazegraph (the amd64 Docker image is dead under qemu on Apple silicon). start downloads the 2.1.6 JAR, boots it on the host arm64 JVM (~6s), and seeds the node3/node4 quads namespaces (exact canonical DOCTYPE on /blazegraph/namespace); stop/status manage it (devnet.sh does NOT own this process). Pairs with DEVNET_BLAZEGRAPH_CTX=blazegraph. Live mixed-devnet result (nodes 1-2 oxigraph, 3-4 native Blazegraph, 5-6 external oxigraph): pr1386 4/4, pr1385 4/4, pr1388 4/4. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
| 'cross-backend: an oxigraph node and a Blazegraph node produce IDENTICAL V10 leaves (OT-RFC-57)', | ||
| async () => { | ||
| const s = state.v!; | ||
| const backendOf = (num: number): string => { |
There was a problem hiding this comment.
🟡 Issue: Backend topology knowledge is leaking into individual suites
What's wrong
The new suites duplicate low-level config parsing and encode devnet topology assumptions locally. That couples tests to scripts/devnet.sh internals and increases drift risk as the mixed-backend setup evolves.
Example
The Blazegraph/native workaround changes devnet topology in scripts/devnet.sh, while suites separately hardcode node groups such as [1,2,5,6] for oxigraph and [3,4] for Blazegraph. The next topology tweak has to be rediscovered in several tests.
Suggested direction
Centralize devnet backend discovery in the shared bootstrap API so suites ask for capabilities instead of parsing config files and hardcoding node-number conventions themselves.
Confidence note
This is a maintainability concern, not a claim that the current topology detection is behaviorally wrong.
For Agents
Move backend discovery into the bootstrap layer, e.g. getNodeStoreConfig(node), getNodeBackend(node), findNodesByBackend(state, predicate), and oxigraphQueryEndpoint(node). Update pr1385/pr1386 to use those helpers and keep devnet.sh as the only place that defines topology policy.
…OT-RFC-57) The V10 merkle leaf is keccak256(canonicalizeObjectTermForHash(term)). The publisher builds leaves from the in-memory input; the RS prover rebuilds them from the triple-store read-back. The canon reproduced oxigraph 0.5.5's stored form, but Blazegraph (mainnet core nodes) and Neptune normalize temporal literals to a different value form (force UTC, truncate sub-ms), so canon(input) != canon(store-readback) for xsd:dateTime/xsd:time -> the publisher and a Blazegraph prover compute DIFFERENT leaves for the same triple -> RandomSampling fork (and a publisher/prover mismatch even on one backend). Root cause of the OKF->VM MERKLE_MISMATCH; OriginTrail#1386 matched oxigraph only. This makes the canon a backend-INDEPENDENT value canon for xsd:dateTime and xsd:time: normalize to UTC (subtract the tz offset, rolling the date across midnight via civilFromDays), truncate the fraction to milliseconds, always emit Z. The publisher's input AND every backend's read-back (oxigraph, Blazegraph, Neptune) then converge to one leaf. Blazegraph's form is a fixed point => ~zero mainnet migration; oxigraph/devnet leaves converge up (coordinated release, spec §9.0.2). Validation: - oxigraph oracle (packages/publisher/test/term-canon-oracle.test.ts) reframed from identity to CONVERGENCE (canon(oxigraph-readback) == canon(input)); 34/34 green locally (in-process oxigraph). - Blazegraph oracle (packages/storage/test/term-canon-blazegraph-oracle.test.ts) brought in + wired into the tornado-blazegraph CI lane; dateTime/time flipped from it.fails to it (CI validates against a live Blazegraph — local blazegraph is amd64-under-qemu, unrunnable on arm64). SCOPE: this commit fixes xsd:dateTime + xsd:time only. date/gregorian, some xsd:double/float, and some escaped strings still diverge and remain it.fails, pending the rest of the backend-independent canon (tracked in OT-RFC-57). NOT for merge until the Blazegraph oracle is green in CI and reviewed. Refs: OT-RFC-57 (dkgv10-spec#136). The oracle + CI wiring overlap OriginTrail#1397 (they originate there); resolve on merge by taking this branch's version. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
What
Network-level (devnet) integration coverage for the 7 merged 10.0.2 PRs — one suite per PR, each exercising the PR's network-observable behavior end-to-end on a live 6-node devnet. Plus a shared harness, a cross-backend consensus oracle, and a health-guarded stability-sweep runner.
All 7 suites validated green on a live devnet, and a health-guarded stability loop ran 35 rounds at 100% on 7/8 suites (the one blemish was a transient
ECONNRESETon a best-effort RS poll — since hardened viafetchRetry, not a logic bug).Suites (
devnet/prNNNN-*/)pr1386-term-canoncanonicalizeObjectTermForHash(input)via the live store +/api/query, and an RS proof lands. Honest scope: all-oxigraph ⇒ pipeline consistency, not cross-backend consensus.pr1385-subgraph-rscreate-sub-graph→shared-memory write/publish --sub-graph-name), scoped materialization in<cg>/<sub>, RS liveness with a sub-graph KA residentpr1388-okf-integrationokf import→ WM →--shareSWM query →okf verify→okf exportround-trippr1366-pca-deposit-waiverpr1387-bulk-register-agentsregisterAgents(uint256,address[])on the PublishingConviction logic contract — bulk + all-or-nothing atomicity + owner-onlypr1384-preserve-agents-on-transfertransferFrom;clearAgentsresets + emits; old owner can't clear (NotAccountOwner)pr1370-admin-op-walletremoveOperationalWalletguard (throwaway wallets only)Infra
devnet/_bootstrap/harness.ts— shared harness (detect/identity/publish/query/fresh-PCA/contract/HTTP helpers),fetchRetry-hardened, smoke-validated._bootstrap+ each suite are pnpm workspace packages.packages/storage/test/term-canon-blazegraph-oracle.test.ts— asserts an oxigraph node and a blazegraph node compute the same V10 merkle leaf for the same typed literal (canon(bzRoundTrip(x)) == canon(x) == canon(oxiRoundTrip(x))) — the genuine oxigraph-vs-blazegraph agreement the all-oxigraph devnet can't cover. Gated onBLAZEGRAPH_TEST_URL; wired into CI'stornado-blazegraphlane.scripts/devnet-1002-coverage-sweep.sh— health-guarded stability sweep (stability loop first, destructive orchestrator opt-in/last).Runtime artifacts (
turns/,.okf-manifest-*,.okf-export-*) are gitignored.Findings surfaced (worth follow-ups)
dkg shared-memory publish --sub-graph-nameexits 1 on a cosmetic post-success error (Cannot read properties of undefined (reading 'length')) even though the publish confirms.0) on devnet, so the waiver only activates when it's set.How to run
pnpm test:devnet:pr1386-term-canon(etc.), against a devnet started with./scripts/devnet.sh start 6.🤖 Generated with Claude Code