fix(core): backend-independent V10 leaf canon — xsd:dateTime/time (OT-RFC-57, Tactical) #1399
…OT-RFC-57)

The V10 merkle leaf is keccak256(canonicalizeObjectTermForHash(term)). The publisher builds leaves from the in-memory input; the RS prover rebuilds them from the triple-store read-back. The canon reproduced oxigraph 0.5.5's stored form, but Blazegraph (mainnet core nodes) and Neptune normalize temporal literals to a different value form (force UTC, truncate sub-ms), so canon(input) != canon(store-readback) for xsd:dateTime/xsd:time -> the publisher and a Blazegraph prover compute DIFFERENT leaves for the same triple -> RandomSampling fork (and a publisher/prover mismatch even on one backend). Root cause of the OKF->VM MERKLE_MISMATCH; #1386 matched oxigraph only.

This makes the canon a backend-INDEPENDENT value canon for xsd:dateTime and xsd:time: normalize to UTC (subtract the tz offset, rolling the date across midnight via civilFromDays), truncate the fraction to milliseconds, always emit Z. The publisher's input AND every backend's read-back (oxigraph, Blazegraph, Neptune) then converge to one leaf. Blazegraph's form is a fixed point => ~zero mainnet migration; oxigraph/devnet leaves converge up (coordinated release, spec §9.0.2).

Validation:
- oxigraph oracle (packages/publisher/test/term-canon-oracle.test.ts) reframed from identity to CONVERGENCE (canon(oxigraph-readback) == canon(input)); 34/34 green locally (in-process oxigraph).
- Blazegraph oracle (packages/storage/test/term-canon-blazegraph-oracle.test.ts) brought in + wired into the tornado-blazegraph CI lane; dateTime/time flipped from it.fails to it (CI validates against a live Blazegraph; local blazegraph is amd64-under-qemu, unrunnable on arm64).

SCOPE: this commit fixes xsd:dateTime + xsd:time only. date/gregorian, some xsd:double/float, and some escaped strings still diverge and remain it.fails, pending the rest of the backend-independent canon (tracked in OT-RFC-57).

NOT for merge until the Blazegraph oracle is green in CI and reviewed.

Refs: OT-RFC-57 (dkgv10-spec#136).

The oracle + CI wiring overlap #1397 (they originate there); resolve on merge by taking this branch's version.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
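For readers following the commit message, here is a minimal sketch of the UTC fold it describes (subtract the offset, roll the date, truncate to milliseconds, emit Z). It is not the PR's canonDateTime: `toUtcDateTimeSketch` and `daysFromCivil` are hypothetical names, the `civilFromDays` below is a stand-in written with the well-known days-to-civil arithmetic rather than the PR's implementation, and it assumes years >= 1, no hour-24 handling, a missing timezone treated as UTC, and trailing fraction zeros dropped.

```typescript
/** Days since 1970-01-01 for a proleptic Gregorian date. */
function daysFromCivil(y: number, m: number, d: number): number {
  const yy = m <= 2 ? y - 1 : y;
  const era = Math.floor(yy / 400);
  const yoe = yy - era * 400; // year of era, 0..399
  const doy = Math.floor((153 * (m > 2 ? m - 3 : m + 9) + 2) / 5) + d - 1;
  const doe = yoe * 365 + Math.floor(yoe / 4) - Math.floor(yoe / 100) + doy;
  return era * 146097 + doe - 719468;
}

/** Inverse of daysFromCivil (stand-in, not the PR's civilFromDays). */
function civilFromDays(days: number): { y: number; m: number; d: number } {
  const z = days + 719468;
  const era = Math.floor(z / 146097);
  const doe = z - era * 146097;
  const yoe = Math.floor(
    (doe - Math.floor(doe / 1460) + Math.floor(doe / 36524) - Math.floor(doe / 146096)) / 365,
  );
  const doy = doe - (365 * yoe + Math.floor(yoe / 4) - Math.floor(yoe / 100));
  const mp = Math.floor((5 * doy + 2) / 153);
  const d = doy - Math.floor((153 * mp + 2) / 5) + 1;
  const m = mp < 10 ? mp + 3 : mp - 9;
  return { y: yoe + era * 400 + (m <= 2 ? 1 : 0), m, d };
}

const pad = (n: number, w: number): string => String(n).padStart(w, "0");

/** Fold an xsd:dateTime lexical form to a UTC "Z" form; unparsable input stays verbatim. */
function toUtcDateTimeSketch(lex: string): string {
  const re = /^(\d{4,})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})(?:\.(\d+))?(Z|[+-]\d{2}:\d{2})?$/;
  const m = re.exec(lex);
  if (!m) return lex;
  const tz = m[8];
  const offsetMin =
    !tz || tz === "Z"
      ? 0
      : (tz[0] === "-" ? -1 : 1) * (Number(tz.slice(1, 3)) * 60 + Number(tz.slice(4, 6)));
  // Subtract the offset in whole minutes, then roll the date by the day overflow.
  const totalMin = Number(m[4]) * 60 + Number(m[5]) - offsetMin;
  const dayShift = Math.floor(totalMin / 1440);
  const minOfDay = totalMin - dayShift * 1440;
  const date = civilFromDays(daysFromCivil(Number(m[1]), Number(m[2]), Number(m[3])) + dayShift);
  // Truncate (not round) the fraction to milliseconds.
  const ms = (m[7] ?? "").slice(0, 3).replace(/0+$/, "");
  const clock = `${pad(Math.floor(minOfDay / 60), 2)}:${pad(minOfDay % 60, 2)}:${m[6]}`;
  return `${pad(date.y, 4)}-${pad(date.m, 2)}-${pad(date.d, 2)}T${clock}${ms ? "." + ms : ""}Z`;
}
```

Under these assumptions an input with an offset and the same instant already stored in UTC fold to one string, which is the convergence property the oracles assert.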
…RFC-57)

The exhaustive oracle asserted core(input)==oxigraph AND core(oxi)==oxi (identity/no-migration). The backend-independent canon emits the UTC value form for dateTime/time, so those no longer hold on the oxigraph side. Assert CONVERGENCE (canon(oxi_readback)==canon(input)) + true idempotence (canon(canon(x))==canon(x)). 40/40 green locally (both publisher oracles).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…es on orchestrator, pr1386 imports package boundary

- Move the cross-backend blazegraph term-canon oracle + its CI wiring OUT of this devnet-coverage PR: it's a CONSENSUS artifact and now lives with the canon fix (OT-RFC-57 / fix/backend-independent-leaf-canon), where its it.fails -> it flips as each datatype is fixed. Removes the "it.fails in an agreement oracle" concern from this PR.
- Sweep (devnet-1002-coverage-sweep.sh) now also gates on the phase-3 orchestrator: a real orchestrator failure exits 3 (the expected time-box cap 124/137 stays non-fatal). Previously only suite failures were counted.
- pr1386 imports canonicalizeObjectTermForHash from the package boundary (@origintrail-official/dkg-core) instead of reaching into ../../packages/core/dist; adds the workspace dep. Suite still green (2/2).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…nth/gMonthDay/gMonth/gDay (OT-RFC-57)

Continues the value-canon: gregorian types now emit Blazegraph's value form. date/gYear/gYearMonth normalize to the UTC date of midnight-in-tz (positive offset rolls the date back a day) via a new civilFromDays + utcDateFromMidnight, emitting NO timezone and dropping leading zeros. gMonthDay/gMonth/gDay have no year to convert, so the timezone is stripped (oracle battery exercises Z/+00:00; a non-UTC offset on these bare types is undefined across backends, see OT-RFC-57 §7.8). Removed the now-dead splitTz/normYear (fmtYear subsumes year formatting).

Oxigraph oracles reframed-to-convergence stay green (40/40 local); the blazegraph oracle's date/gregorian case flipped it.fails -> it (CI validates cross-backend). Remaining it.fails: xsd:double/float + literal-content escaping.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
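The "UTC date of midnight-in-tz" rule for xsd:date can be illustrated with a small sketch. This is not the PR's utcDateFromMidnight: `utcDateOfLocalMidnight` is a hypothetical name, it leans on the built-in `Date` instead of civilFromDays, and it assumes four-digit years 1000..9999, no calendar validation, and a missing timezone treated as UTC.

```typescript
/** Map an xsd:date lexical form to the UTC date on which its local midnight falls. */
function utcDateOfLocalMidnight(lex: string): string {
  const m = /^([1-9]\d{3})-(\d{2})-(\d{2})(Z|[+-]\d{2}:\d{2})?$/.exec(lex);
  if (!m) return lex; // outside the sketch's scope: keep verbatim
  const tz = m[4];
  const offsetMin =
    !tz || tz === "Z"
      ? 0
      : (tz[0] === "-" ? -1 : 1) * (Number(tz.slice(1, 3)) * 60 + Number(tz.slice(4, 6)));
  // Local midnight expressed in UTC milliseconds: subtract the offset.
  const utcMs = Date.UTC(Number(m[1]), Number(m[2]) - 1, Number(m[3])) - offsetMin * 60000;
  // Emit the date only, with no timezone suffix.
  return new Date(utcMs).toISOString().slice(0, 10);
}
```

A positive offset places local midnight late on the previous UTC day, which is why `2026-06-29+02:00` folds to `2026-06-28`, while a negative offset stays on the same UTC date.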
…s them)
The date/gregorian CI blazegraph oracle caught one residual divergence:
"02026"^^gYear — Blazegraph normalizes a leading-zero year to its value ("2026")
on write, while the strict XSD YEAR pattern rejected it and kept the literal
verbatim (oxigraph's behavior). Loosen YEAR to any 4+-digit year; the existing
BigInt+fmtYear year normalization strips the leading zero, matching Blazegraph.
Convergence oracle holds either way (canon(input) and canon(store-readback) fold
to the same value). Oxigraph oracles stay green (40/40).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…C-57)

Flip the double/float + escaping oracle cases from it.fails to it so CI surfaces Blazegraph's exact stored + canon'd forms for the divergent cases. This commit is expected RED on those two cases; the next commit implements the canon fix and turns them green. (date/gregorian confirmation rides along in the same run.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…MP (OT-RFC-57)
CI-reveal surfaced the last two cross-backend divergences:
- xsd:double/float: only "-0.0" diverged — Blazegraph drops the sign on write
("0.0"→value 0) while oxigraph keeps "-0". canonDouble now emits "0" for both
signed zeros; oracle case flipped green (16/16 double + 7/7 float converge).
- literal escaping: 7/9 (all BMP) already converge. The 2 astral cases (😀,
U+1F600) diverge because Blazegraph CORRUPTS supplementary-plane codepoints on
write, truncating to the low 16 bits (U+1F600 → U+F600). That is stored-byte
corruption, not a representation difference — no leaf canon can reconcile two
backends physically holding different strings. Split the oracle: BMP escaping is
an asserted `it` (green); astral is a documented `it.fails` tracking the
Blazegraph limitation (OT-RFC-57 §7.7). Publishing astral content is a
pre-existing cross-backend consensus hazard, independent of this change.
All six datatype families that CAN converge now do (dateTime/time, date/gregorian,
numeric incl. double, duration, escaping-BMP, verbatim). Oxigraph oracles 40/40.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
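The two CI-revealed divergences above can be reproduced in isolation. The sketch below is illustrative only: `foldSignedZero` covers just the signed-zero rule (it is not canonDouble and leaves every non-zero value verbatim), and `keepLow16Bits` merely mimics the described truncation of supplementary-plane codepoints; neither name comes from the PR.

```typescript
/** Fold any numeric lexical form whose value is zero (including -0.0) to "0". */
function foldSignedZero(lex: string): string {
  const isNumeric = /^[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?$/.test(lex);
  return isNumeric && Number(lex) === 0 ? "0" : lex;
}

/** Mimic a store that keeps only the low 16 bits of each codepoint. */
function keepLow16Bits(s: string): string {
  // Array.from iterates by codepoint, so an astral character arrives as one element.
  return Array.from(s)
    .map((ch) => String.fromCharCode((ch.codePointAt(0) as number) & 0xffff))
    .join("");
}
```

BMP text passes through `keepLow16Bits` unchanged, while U+1F600 comes back as U+F600. Because the two stores then hold different strings, no function applied after read-back can recover agreement, which is the point the commit message makes about astral content.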
§7.6 is Migration; the signed-zero fold rule lives in §7.5 (Protocol value canonicalization). Matches the RFC #136 update pinning the CI-revealed rules. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…tion + mixed-backend devnet support

Validated live on a MIXED devnet (oxigraph cores + Blazegraph cores) that the OT-RFC-57 / #1399 value-canon makes an oxigraph node and a Blazegraph node produce IDENTICAL V10 leaves, so RS cannot fork across the backend boundary.

- pr1386 round-trip assertion: identity -> CONVERGENCE. It compared the store read-back term to canon(input) directly, which held only under the old oxigraph-tuned canon. The #1399 value-canon makes the store's lexical form differ from the canonical form (oxigraph keeps "+02:00", Blazegraph forces "Z", sub-ms truncates), so we now assert canon(store_readback) === canon(input), exactly what the RS prover computes. REQUIRED before #1399 merges or this test breaks. Holds under both canons (canon is idempotent on its own output).
- New cross-backend test: publishes the affected-literal battery (dateTime tz / sub-ms / trailing-zero, date tz, leading-zero gYear, signed-zero double) on an oxigraph node AND a Blazegraph node and asserts canon(oxi_readback) === canon(bg_readback) === canon(input). Skips cleanly if the devnet provisioned no Blazegraph node. (It exercises real divergence only when BOTH a Blazegraph node and the #1399 canon are present; on a pre-#1399 mixed devnet it would fail by design; that is the bug it guards.)
- devnet.sh: detect an EXTERNAL Blazegraph already serving on the port (the amd64 Docker image only runs under glacial qemu on Apple silicon; a native arm64 JAR is the workaround), and add DEVNET_BLAZEGRAPH_CTX (/bigdata vs /blazegraph webapp context) + DEVNET_BLAZEGRAPH_NS overrides. Defaults unchanged.

Live result on the mixed devnet: pr1386 3/3 (incl. 8/8 identical cross-backend leaves), pr1385 4/4, pr1388 4/4.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
    const hourN = rolls ? 0 : +hh;
    if (rolls) days += 1n;
    // UTC: subtract the offset (whole minutes); roll the date across midnight.
    const totalMin = hourN * 60 + +mi - offsetMin;
🔴 Bug: Timezone folding happens after the overflow guard
What's wrong
The overflow guard is meant to prevent canonicalizing temporal values that the store cannot parse stably. Because the new UTC conversion runs after that guard, boundary literals can pass validation and then be shifted outside the supported range, producing a protocol leaf for a value the storage backend may reject or preserve differently.
Example
"5391559471919-03-30T14:00:00-14:00"^^<http://www.w3.org/2001/XMLSchema#dateTime> is still within the checked local i128 seconds range, but subtracting -14:00 emits 5391559471919-03-31T04:00:00Z, which is past the max representable second. Expected behavior is to leave an overflowed temporal literal verbatim rather than normalize it into an unrepresentable UTC value.
Suggested direction
Apply timezone/T24 normalization before the i128 range decision, or include the offset and roll in the range calculation; if the normalized value is outside the supported store range, fall back to verbatim.
Confidence note
This follows from the code's own i128 range invariant; the exact store behavior at the far boundary should be confirmed, but the canonicalizer now emits a UTC value outside the range it just validated against.
For Agents
In packages/core/src/crypto/term-canon.ts, update canonDateTime and the date/gYear/gYearMonth paths to validate the normalized UTC instant/date, not only the original lexical components. Preserve normal timezone folding, and add max/min boundary tests where the offset or T24 roll crosses the i128 limit.
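A minimal sketch of the suggested ordering, where the range decision is made on the normalized UTC instant rather than on the lexical components. `normalizeOrVerbatim`, its parameters, and the symmetric `maxSeconds` bound are all hypothetical; the real limit and the real formatting live in term-canon.ts and are not reproduced here.

```typescript
/**
 * Decide range on the UTC instant. If the offset pushes the value outside
 * [-maxSeconds, maxSeconds], return the original lexical form untouched.
 */
function normalizeOrVerbatim(
  lex: string,
  localSeconds: bigint, // seconds computed from the lexical date/time fields
  offsetMin: number, // signed timezone offset in minutes, e.g. -840 for -14:00
  maxSeconds: bigint, // assumed symmetric store limit (hypothetical)
  format: (utcSeconds: bigint) => string,
): string {
  // Same arithmetic as the UTC fold: subtract the offset first...
  const utcSeconds = localSeconds - BigInt(offsetMin) * BigInt(60);
  // ...and only then compare against the representable range.
  if (utcSeconds > maxSeconds || utcSeconds < -maxSeconds) return lex;
  return format(utcSeconds);
}
```

With this shape a literal that sits just inside the limit locally but crosses it after a -14:00 fold is left verbatim instead of being emitted as an unrepresentable UTC value.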
🔴 Bug: Sub-millisecond fractions can make invalid hour-24 times roll into valid leaves
What's wrong
The new millisecond truncation runs before validateClock. For hour 24, validity depends on whether the original seconds value including fraction is zero. A non-zero sub-millisecond fraction is truncated away, so invalid literals are treated as valid and normalized into a UTC leaf, collapsing distinct invalid inputs into the same hashable value instead of preserving them verbatim.
Example
"2026-06-29T24:12:00.0005"^^<http://www.w3.org/2001/XMLSchema#dateTime> has a non-zero seconds fraction with a non-zero minute, so the hour-24 form should be kept verbatim/rejected by the temporal validator. With the new code, .0005 is truncated to an empty millisecond fraction before validateClock, so it rolls and hashes as "2026-06-30T00:12:00Z"^^<...#dateTime>. The same applies to "24:12:00.0005"^^<...#time>.
Suggested direction
Separate lexical validity from output precision: decide whether hour 24 is rollable using the original fractional seconds, then truncate only after the value has passed validation.
For Agents
In packages/core/src/crypto/term-canon.ts, validate the hour-24 rule against the raw fractional seconds value, or preserve a boolean for whether the original fraction was numerically non-zero, before applying millisecond truncation. Add dateTime and time cases with 24:MM:00.0005 where MM != 00 proving they stay verbatim while valid millisecond truncation still works for ordinary times.
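The separation of lexical validity from output precision can be sketched as two independent steps. The helper names are hypothetical, and the hour-24 rule shown is the plain XSD one (24 is only the end-of-day form 24:00:00 with a zero fraction); the PR's validateClock may encode a different rule, so treat this as an illustration of the ordering only.

```typescript
/** True when the raw fraction digits are absent or numerically zero. */
function isZeroFraction(rawFrac?: string): boolean {
  return rawFrac === undefined || /^0*$/.test(rawFrac);
}

/** Decide hour-24 rollability from the ORIGINAL fields, before any truncation. */
function canRollHour24(hh: number, mi: number, ss: number, rawFrac?: string): boolean {
  return hh === 24 && mi === 0 && ss === 0 && isZeroFraction(rawFrac);
}

/** Output precision: truncate to milliseconds and drop trailing zeros. */
function truncateToMillis(rawFrac?: string): string {
  const ms = (rawFrac ?? "").slice(0, 3).replace(/0+$/, "");
  return ms ? "." + ms : "";
}
```

The failure mode in the finding corresponds to calling `truncateToMillis` first and validating its result: `.0005` becomes an empty fraction and the hour-24 check then sees a zero it should never have seen.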
      // and not consensus-verified — see OT-RFC-57 §7.8.
      function canonGMonthDay(lex: string): string {
    -   const { body, tz } = splitTz(lex);
    +   const { body } = splitTzToOffset(lex);
🔴 Bug: Non-UTC offsets are silently stripped from bare Gregorian types
What's wrong
For gMonthDay, gMonth, and gDay, the code parses a valid timezone and then returns only body, so every non-zero offset is lost. Since these types lack enough calendar context to roll across a date, stripping the offset can conflate distinct literals and create consensus assumptions the code comment says are not verified.
Example
"--06-29+14:00"^^<http://www.w3.org/2001/XMLSchema#gMonthDay> and "--06-29-14:00"^^<http://www.w3.org/2001/XMLSchema#gMonthDay> both canonicalize to "--06-29"^^<...#gMonthDay>. Those inputs carry different timezone offsets, but the leaf drops that distinction without a date context to convert it safely.
Suggested direction
Only strip absent/zero timezones for these no-year/no-date types unless OT-RFC-57 defines a safe value-space mapping for non-zero offsets; otherwise keep the original lexical form so different values do not collapse to the same leaf.
For Agents
In packages/core/src/crypto/term-canon.ts, look at canonGMonthDay, canonGMonth, and canonGDay. Preserve the existing Z/+00:00 behavior, but leave non-zero offsets verbatim or define an explicit reject/normalization rule; add cases for +14:00 and -14:00 proving distinct or rejected behavior.
Separate the bare-gregorian timezone policy from UTC normalization
What's wrong
The PR introduces a helper whose contract is UTC offset normalization, then uses it in callers that intentionally ignore the offset. That is a boundary smell: future maintainers have to infer from comments that some datatypes normalize to UTC while others strip timezone syntax, including cases the comment says are not consensus-verified.
Example
canonGMonthDay('--06-29+02:00') goes through a UTC-offset parser, then silently drops +02:00 because the caller ignores offsetMin. That makes the timezone policy for partial Gregorian types implicit and easy to accidentally expand.
Suggested direction
Do not reuse splitTzToOffset as a generic stripper. A small typed dispatcher or separate helper per temporal policy would make the invariants visible and remove the current “parse then ignore” coupling.
For Agents
Look in packages/core/src/crypto/term-canon.ts around the date/time family. Preserve the current canonical outputs, but split the timezone handling into explicit policy helpers: one for UTC-normalized types that consumes offsetMin, and one for bare gMonth/gDay/gMonthDay that intentionally strips only the supported timezone forms or names the unsupported policy directly. Add/keep cases proving non-UTC partial-gregorian behavior is deliberate.
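One way to make the bare-gregorian policy explicit is a dedicated helper that never computes an offset at all. This is a sketch under assumptions: `foldBareGregorianTz` is a hypothetical name, it performs no validation of the month/day body, and it treats Z, +00:00 and -00:00 as the only foldable zones.

```typescript
/**
 * Timezone policy for gMonthDay / gMonth / gDay:
 * strip a UTC-equivalent zone, keep any other offset verbatim.
 */
function foldBareGregorianTz(lex: string): string {
  const m = /^(.*?)(Z|[+-]\d{2}:\d{2})?$/.exec(lex);
  if (!m || m[2] === undefined) return lex; // no timezone present
  const utcEquivalent = m[2] === "Z" || m[2] === "+00:00" || m[2] === "-00:00";
  return utcEquivalent ? m[1] : lex;
}
```

Because the helper returns either the stripped body or the untouched input, `--06-29+14:00` and `--06-29-14:00` remain distinct, and there is no parsed offset left for a caller to ignore.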
🟡 Issue: Timezone normalization is split across several hand-rolled paths
What's wrong
The PR introduces multiple local implementations of the same temporal normalization concept. For a consensus-critical canonicalizer, this is a structural maintenance risk: future changes to timezone, T24, or range semantics must be mirrored across several branches, and the reader has to re-prove each formula independently.
Example
dateTime, time, date, gYear, and gYearMonth all subtract offsetMin, roll across day/year boundaries, then format and range-check. Today those rules are spread across utcDateFromMidnight, inline dateTime arithmetic, and a time-only modulo path.
Suggested direction
Collapse the duplicated UTC-roll logic into one helper or typed value model, then have each datatype parser feed that helper and format its own lexical shape. That would make the invariant auditable in one place instead of relying on several similar formulas staying aligned.
For Agents
In term-canon.ts, extract a small temporal normalization model/helper that accepts date fields when present, clock fields, T24 rollover state, and offset minutes, then returns normalized date/time fields. Preserve current output for dateTime/time/date/gYear/gYearMonth and keep the existing oracle coverage green.
      await expectCrossBackendLeafAgreement(['1', '0', 'true', 'false'].map((v) => lit(v, 'boolean')));
    });

    // ───────────────────────────────────────────────────────────────────────────
🟡 Issue: Remove stale “known divergence” narrative from the passing oracle tests
What's wrong
This new test file is meant to be a consensus oracle, but its central comment block now contradicts the actual tests. That makes the test suite harder to maintain because the prose says several categories are expected failures while the code treats them as required guarantees.
Example
A maintainer reading lines 144-166 will think dateTime/time/date/double/escaping are intentionally failing tracked divergences, but lines 168, 177, 185, 190, 208, and 233 are passing agreement tests.
Suggested direction
Collapse this block into a current-purpose comment, or move the divergence explanation down to the single remaining it.fails case. The oracle should describe what is now guaranteed, not the intermediate state from earlier commits.
For Agents
Rewrite the narrative in packages/storage/test/term-canon-blazegraph-oracle.test.ts to match the current structure: passing cross-backend agreement cases first, with a narrow note only around the remaining astral it.fails. Do not change asserted behavior unless the comments reveal a genuinely intended failing case.
    /** Round-trip `objects` through a store; return, per input index, the object
     * term string the store emits on CONSTRUCT (its canonical stored form). */
    async function roundTrip(
      store: { insert(q: Quad[]): Promise<unknown>; query(s: string): Promise<any>; dropGraph?(g: string): Promise<unknown> },
🟡 Issue: Keep the cross-backend oracle on the typed TripleStore boundary
What's wrong
The test sits inside the storage package and is specifically validating backend behavior, but it weakens the package’s own query contract with any and an alternate result shape. That adds unnecessary ambiguity at exactly the boundary the oracle is supposed to make trustworthy.
Example
If an adapter returned { quads: [...] } without the required type: 'quads', this oracle would still pass its round-trip extraction instead of surfacing the interface drift.
Suggested direction
Replace the bespoke structural type and any with the existing storage contracts, then remove the loose fallback branch. The helper can stay local, but it should strengthen the canonical adapter interface rather than bypass it.
For Agents
Import TripleStore or QueryResult from ../src/triple-store.js, type the helper as a Pick<TripleStore, 'insert' | 'query'> plus optional cleanup where needed, and handle only res.type === 'quads' for the CONSTRUCT query. Throw on any other result shape so the test reinforces the storage boundary.
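The narrowing the finding asks for might look like the sketch below. The types here are local stand-ins declared only so the example is self-contained; the real `QueryResult` in ../src/triple-store.js may have different members, and `expectQuads` is a hypothetical helper name.

```typescript
// Stand-in shapes (assumptions), NOT the storage package's actual definitions.
type QuadLike = { subject: string; predicate: string; object: string; graph: string };
type QueryResultLike =
  | { type: "quads"; quads: QuadLike[] }
  | { type: "bindings"; bindings: Record<string, string>[] }
  | { type: "boolean"; value: boolean };

/** Accept only a tagged quads result; anything else is interface drift. */
function expectQuads(res: QueryResultLike): QuadLike[] {
  if (res.type !== "quads") {
    throw new Error(`CONSTRUCT must return a quads result, got "${String(res.type)}"`);
  }
  return res.quads;
}
```

An adapter that returns `{ quads: [...] }` without the `type` tag now fails loudly in the oracle instead of passing through a loose fallback branch.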
🟡 Issue: The Blazegraph oracle duplicates the canonicalization corpus instead of sharing it
What's wrong
The new oracle is valuable, but it copies the same broad datatype batteries already maintained in the publisher oracle. That makes the test suite harder to evolve because the canonicalization contract is spread across hand-synchronized arrays rather than one reusable corpus.
Example
Adding a new canonicalization rule now requires remembering to update the publisher oracle corpus, the exhaustive corpus when applicable, and the Blazegraph oracle corpus. Missing one leaves two tests that appear authoritative but cover different contracts.
Suggested direction
Move the literal case groups and convergence assertion scaffolding into a shared test utility, then have the oxigraph and cross-backend suites compose the same cases with different round-trip backends.
For Agents
Extract shared term-canon fixture groups and a backend-parameterized oracle helper into a test-support module that both publisher and storage tests can import. Keep backend setup local, but share the literal batteries and convergence assertion shape.
    });

    it('xsd:boolean', async () => {
      await expectCrossBackendLeafAgreement(['1', '0', 'true', 'false'].map((v) => lit(v, 'boolean')));
🟡 Issue: Negative timezone normalization is not covered by the new oracle
What's wrong
The production change now normalizes signed timezone offsets by subtracting offsetMin, but the new cross-backend tests only exercise non-zero positive offsets. A sign regression for -HH:MM could still leave the added oracle green while producing different leaf values for literals stored with negative timezones.
Example
A focused regression case would include a value such as "2026-06-29T23:30:00-02:00"^^xsd:dateTime, where the expected UTC canonical date is 2026-06-30T01:30:00Z. If the sign were accidentally applied the wrong way, the current added cases would not catch the negative-offset path.
Suggested direction
Extend the cross-backend oracle battery with non-zero negative timezone offsets, preferably including one that rolls into the next UTC day.
Confidence note
The supplied diff shows new cross-backend temporal coverage for +02:00 and zero offsets, but no non-zero negative offset case in the added tests.
For Agents
Look at packages/storage/test/term-canon-blazegraph-oracle.test.ts near the dateTime/time/date timezone batteries. Add at least one non-zero negative timezone case for xsd:dateTime that crosses midnight, plus a time and/or date case if those contracts matter. The test should prove canon(input), canon(blazegraph readback), and canon(oxigraph readback) converge for negative offsets.
🔴 Bug: The convergence oracle does not pin the canonical leaf bytes
What's wrong
The new tests prove that input and backend readbacks converge under the current canonicalizer, but they do not independently verify the exact canonical form this PR changes. For protocol hash code, the exact output bytes are the changed behavior; comparing several values after running the same function on every side can give false confidence.
Example
If normFrac accidentally truncated to two digits instead of milliseconds, both canon(input .123456) and canon(Blazegraph readback .123Z) would become the same wrong .12Z value, so this oracle would still pass while the protocol leaf bytes are wrong.
Suggested direction
Keep the backend round-trip agreement test, but add pure canonicalization assertions with literal expected outputs for representative OT-RFC-57 cases so a consistently wrong canonicalizer cannot pass green.
For Agents
Add independent expected-output table tests near the term-canon oracle tests for the changed canonical byte forms: dateTime/time UTC Z emission, timezone date rollovers, millisecond truncation, signed double zero, and date/gYear/gYearMonth UTC-midnight forms. Preserve the cross-backend convergence oracle, but do not use the canonicalizer itself as the only oracle for the expected leaf bytes.
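The gap between a convergence check and a pinned-output check can be shown with a toy canonicalizer. Everything below is illustrative: `truncFrac` is a stand-in that only truncates the fraction of an already-UTC dateTime, and the "buggy" variant reproduces the two-digit truncation from the example in this finding.

```typescript
type Canon = (lex: string) => string;

/** Toy canon: keep at most `digits` fraction digits of a "...Z" dateTime. */
const truncFrac =
  (digits: number): Canon =>
  (lex) =>
    lex.replace(/\.(\d+)Z$/, (_m: string, f: string) => "." + f.slice(0, digits) + "Z");

const good = truncFrac(3); // milliseconds, as intended
const buggy = truncFrac(2); // consistently wrong

const input = "2026-06-29T12:00:00.123456Z";
const readback = "2026-06-29T12:00:00.123Z"; // what a ms-truncating store would return

/** Convergence: the same function runs on both sides of the comparison. */
const converges = (canon: Canon): boolean => canon(input) === canon(readback);

/** Pinned bytes: compare against a literal written down independently. */
const matchesPinned = (canon: Canon): boolean => canon(input) === "2026-06-29T12:00:00.123Z";
```

Both `good` and `buggy` satisfy `converges`, but only `good` satisfies `matchesPinned`, so a table of literal expected outputs catches a mistake that the round-trip oracle alone would let through.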
🟡 Issue: The astral cross-backend case is documented but not verified
What's wrong
The new CI oracle is presented as cross-backend leaf-agreement coverage, but it still allows a known Blazegraph/Oxigraph divergence for astral literal content to pass CI. Expected-failure tests are useful documentation, but they are not validation evidence for the changed consensus surface.
Example
A PR that still lets users publish "smile\\U0001F600"^^<http://www.w3.org/2001/XMLSchema#string> can pass the new Blazegraph oracle because the only test covering that RandomSampling fork is expected to fail.
Suggested direction
Do not leave the only executable evidence for a known consensus-divergent input as it.fails; pair it with a blocking mitigation test or make the unsupported-input boundary explicit with a passing rejection test.
Confidence note
This is a verification finding about the added CI evidence. If astral-plane literals are intentionally out of scope for this PR and rejected elsewhere before storage, that mitigation should be tied to a passing regression test instead.
For Agents
Decide whether astral literals are supported. If not, add a passing publish/storage validation test proving they are rejected before leaf hashing. If they are supported, replace the expected-failure marker with a passing cross-backend agreement test after implementing the mitigation.
… + don't collapse bare-gregorian offsets
Two consensus bugs on the value-canon:
🔴 Range guard ran on the LEXICAL components, before the UTC/T24 normalization —
so a boundary literal could pass validation and then be shifted OUTSIDE the i128
seconds range, emitting a leaf for a value the store can't represent stably
(e.g. "5391559471919-03-30T14:00:00-14:00" → past max second). Now canonDateTime
/ canonDate / canonGYear / canonGYearMonth range-check the NORMALIZED UTC instant;
out of range → verbatim.
🔴 Bare gregorian types (gMonthDay/gMonth/gDay) stripped ANY timezone, silently
COLLAPSING distinct values ("--06-29+14:00" and "--06-29-14:00" → same leaf).
Now fold only a UTC-equivalent zone (Z/+00:00/-00:00 → no-tz value form); a
non-UTC offset is kept VERBATIM so distinct literals stay distinct (factored into
a shared bareGregorian helper; the "parse-then-ignore" coupling is gone).
Plus otReviewAgent 🟡s: rewrote the blazegraph-oracle's stale "known divergence /
it.fails" narrative (the divergence is RESOLVED — cases are asserted `it` now, only
2.1.5-escape astral remains it.fails), and added negative-offset (-05:00) coverage
to the dateTime/time/date batteries.
Pure-canon unit tests for both bugs added (overflow→verbatim; non-UTC offsets
distinct + verbatim; UTC-equivalent folds). oxigraph oracles 42/42; blazegraph
oracle validated locally against a live server (15/16; the 1 is the astral it.fails
inverting on 2.1.6, which fixed the escape bug CI's 2.1.5 still exercises).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Thanks, triaged all otReviewAgent findings; fixes in
🔴 Fixed:
Both have pure-canon unit tests (overflow→verbatim; non-UTC offsets distinct+verbatim; UTC-equivalent folds).
🟡 Fixed:
🟡 Already satisfied:
Validation: oxigraph oracles 42/42; blazegraph oracle validated against a live server (15/16; the 1 is the astral
Draft / WIP: not for merge until the Blazegraph oracle is green in CI and reviewed. Implements the Tactical fix from OT-RFC-57.

Problem

A V10 leaf is `keccak256(canonicalizeObjectTermForHash(term))`. The publisher builds leaves from the in-memory input; the RS prover rebuilds them from the triple-store read-back. The canon reproduced oxigraph 0.5.5's stored form, but mainnet core nodes run Blazegraph (and Neptune is Blazegraph-derived), which normalizes `xsd:dateTime`/`time` to a different value form (forces UTC, truncates sub-ms). So `canon(input) ≠ canon(store-readback)` → the publisher and a Blazegraph prover compute different leaves for the same triple → RandomSampling fork (and a publisher⇄prover mismatch even on one backend). This is the root cause of the observed OKF→VM `MERKLE_MISMATCH_IN_SWM`; #1386 matched oxigraph only.

This change (scoped to xsd:dateTime + xsd:time)

Makes the canon a backend-independent value canon for `xsd:dateTime`/`xsd:time`: normalize to UTC (subtract the tz offset, rolling the date across midnight via a new `civilFromDays`), truncate the fraction to ms, always emit `Z`. The publisher's input and every backend's read-back (oxigraph, Blazegraph, Neptune) then converge to one leaf. Blazegraph's form is a fixed point ⇒ ~zero mainnet migration; oxigraph/devnet leaves converge up (coordinated release, spec §9.0.2).

Validation

- oxigraph oracle (`packages/publisher/test/term-canon-oracle.test.ts`) reframed from identity → convergence (`canon(oxigraph-readback) == canon(input)`), the property consensus actually needs. 34/34 green locally (in-process oxigraph).
- Blazegraph oracle (`packages/storage/test/term-canon-blazegraph-oracle.test.ts`) brought in + wired into the `tornado-blazegraph` CI lane; `dateTime`/`time` flipped `it.fails` → `it`. CI validates against a live Blazegraph (local blazegraph is amd64-under-qemu on arm64, unrunnable), so this PR's `tornado-blazegraph` job is the gate.

Remaining (still `it.fails`, follow-up commits)

`xsd:date`/gregorian, some `xsd:double`/`float`, and some escaped strings still diverge. CI reveals Blazegraph's exact forms for those; I did not guess on anything unverifiable.

Note

The oracle + its CI wiring overlap #1397 (they originate there as documentation of the divergence). Resolve on merge by taking this branch's version (it flips the fixed cases).

🤖 Generated with Claude Code