feat(okf): Google OKF → DKG integration — import OKF bundles as deterministic, provenance-bearing Knowledge Assets#1331

Open
Zigoljube wants to merge 13 commits into OriginTrail:main from Zigoljube:feat/okf-integration

@Zigoljube commented Jun 25, 2026

What this is

A first-class Google Open Knowledge Format (OKF) → DKG integration: ingest a portable OKF bundle (Markdown + YAML frontmatter + untyped cross-links) into the DKG as deterministic, owned, provenance-bearing Knowledge Assets, reconstructing the bundle's cross-concept link graph — and a clean inverse to serialise a Context Graph back into a conformant OKF bundle.

OKF standardises how knowledge is written/exchanged but deliberately ships no verification, provenance or ownership layer (OKF SPEC §1, §10). This makes the DKG the trust-and-permanence backend for OKF: the same portable Markdown, now owned, provenance-bearing and shareable across agents.

On "verifiable": import lands in Working Memory and (with --share) Shared Working Memory — sealed (Merkle root + EIP-712 author attestation) and verification-ready, but self-attested, not on-chain-verified. On-chain verification exists only after the gated Verifiable-Memory publish, which this PR never performs automatically.

What's added

  • packages/okf (@origintrail-official/dkg-okf) — a pure, LLM-free OKF→RDF mapper (two-pass: index → extract+link), the §9 conformance validator, byte-stable N-Quads serialisation, and a graph-faithful export inverse. Mirrors the EPCIS package shape; runtime-standalone so the mapper is unit-testable in isolation.
  • dkg okf CLI (packages/cli/src/commands/okf.ts, wired into cli.ts) — import (WM default; --share → SWM; --private bulk SWM for private corpora; --dry-run offline; resumable manifest; opt-in --relate edge typing), export, and a verify completeness gate.
  • packages/ip-oracle (@origintrail-official/dkg-ip-oracle) + dkg ip-oracle generate — a deterministic, synthetic Google-Patents-shaped OKF corpus generator (no BigQuery dependency) used to exercise the --private bulk-import path at scale. Data is SIMULATED (every concept stamps source: … [SIMULATED] + CC BY 4.0). Disclosed explicitly; depends on dkg-okf (no type re-declaration).
  • Registry entry draft (packages/okf/integration.okf.json, validated against integrations/schema.ts).
  • Docs — ADR docs/adr/0005-okf-rdf-mapping.md, the full-lifecycle article docs/integrations/okf.md, packages/okf/CONTEXT.md, README.md, and the mainnet runbook packages/okf/DEMO.md.

Design: reuse the vocabulary, add a real Markdown AST

The node's markdown-extractor.ts is regex-based and resolves only [[wikilinks]], not OKF's [text](path.md) links; importing it into packages/okf would also cycle cli → okf → cli. So we converge on its predicate vocabulary (pinned by a test) but use a real Markdown AST (mdast-util-from-markdown) for link/section/citation extraction — which OKF §2 requires and which lets us honour CommonMark for links inside inline code spans.
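A dependency-free sketch of the code-span rule, to show why a link inside backticks produces no edge. extractLinks and collectTargets are hypothetical names, and this scanner is a simplification of what mdast-util-from-markdown does (it ignores block structure, escapes and link titles). It separates [text](path.md) targets in ordinary text from targets inside a backtick code span, which CommonMark renders as literal text.

```typescript
/** Targets found in one Markdown body, split by whether CommonMark treats them as links. */
interface LinkScan {
  links: string[]; // [text](target) outside code spans: these become edges
  codeSpanLinks: string[]; // inside a backtick code span: literal text, only warned about
}

const LINK_RE = /\[[^\]\n]*\]\(([^)\s]+)\)/g;

function collectTargets(text: string, into: string[]): void {
  for (const m of text.matchAll(LINK_RE)) {
    if (m[1]) into.push(m[1]);
  }
}

/**
 * Simplified scanner: a run of N backticks opens a code span that closes at the next run
 * of exactly N backticks; an unmatched run is literal text.
 */
function extractLinks(md: string): LinkScan {
  const scan: LinkScan = { links: [], codeSpanLinks: [] };
  let plain = "";
  let i = 0;
  while (i < md.length) {
    if (md.charAt(i) !== "`") {
      plain += md.charAt(i);
      i++;
      continue;
    }
    let j = i;
    while (md.charAt(j) === "`") j++;
    const run = j - i;
    // Find a closing run of exactly the same length.
    let close = -1;
    let k = j;
    while (k < md.length) {
      if (md.charAt(k) !== "`") { k++; continue; }
      let e = k;
      while (md.charAt(e) === "`") e++;
      if (e - k === run) { close = k; break; }
      k = e;
    }
    if (close === -1) {
      plain += md.slice(i, j); // unmatched backticks are literal
      i = j;
      continue;
    }
    collectTargets(md.slice(j, close), scan.codeSpanLinks);
    plain += " "; // keep the text on either side of the span apart
    i = close + run;
  }
  collectTargets(plain, scan.links);
  return scan;
}
```

A real link and a backticked one in the same body land in different buckets, which is the outputs → none behaviour described for crypto_bitcoin.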

Locked OKF → RDF mapping (ADR 0005)

| OKF | RDF predicate | Notes |
|---|---|---|
| type (required) | rdf:type | full IRI as-is, else schema:<PascalCase> (BigQuery Dataset → schema:BigQueryDataset) |
| title / description | schema:name / schema:description | |
| tags[] | schema:keywords | one per tag |
| timestamp | schema:dateModified | typed xsd:dateTime (OKF last-modified) |
| resource | schema:url | |
| producer keys | schema:<camelCase> | preserved, never dropped (§4.1/§9) |
| concept link (§5) | schema:mentions | one untyped directed edge by default; no FK/join inference (§5.3) |
| # Citations URL (§8) | schema:citation | both numbered and bare-bullet styles |
| body headings | dkg:hasSection | skolemized to deterministic .well-known/genid/ IRIs (no blank-node objects) |
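Both citation styles can be recognised in a single pass over the # Citations section. The sketch below is a line-based simplification rather than the AST walk the package uses; extractCitations is a hypothetical name, and ending the section at the next heading of any level is an assumption.

```typescript
/** Collect citation URLs from a "# Citations" section, numbered or bare-bullet style. */
function extractCitations(md: string): string[] {
  const urls: string[] = [];
  let inSection = false;
  for (const raw of md.split(/\r?\n/)) {
    const line = raw.trim();
    if (/^#{1,6}\s/.test(line)) {
      // Assumption: the section runs until the next heading of any level.
      inSection = /^#\s+Citations\s*$/i.test(line);
      continue;
    }
    if (!inSection) continue;
    // "1. https://..." (numbered) or "- https://..." / "* https://..." (bare bullet)
    const m = /^(?:\d+[.)]|[-*+])\s+<?(https?:\/\/[^\s>]+)>?/.exec(line);
    if (m && m[1]) urls.push(m[1]);
  }
  return urls;
}
```

Each returned URL would become one schema:citation object on the concept's subject.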

Concept IDs become deterministic IRIs (tables/blocks → urn:okf:tables/blocks); the on-chain UAL is a distinct identifier assigned only at VM publish.
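A minimal sketch of the type, producer-key and concept-ID rules from the mapping table. It assumes schema: expands to https://schema.org/, that "full IRI" means scheme:rest with no whitespace, and that a concept ID is the bundle-relative path without .md; typeIri, producerKeyIri and conceptIri are hypothetical names, not the package's exports.

```typescript
// Assumption: the schema: prefix expands to https://schema.org/ in this sketch.
const SCHEMA = "https://schema.org/";
const OKF_BASE = "urn:okf:";

/** Split a free-text label into alphanumeric words. */
function words(label: string): string[] {
  return label.split(/[^A-Za-z0-9]+/).filter((w) => w.length > 0);
}

/** True for something that already looks like an absolute IRI (scheme:rest, no spaces). */
function isAbsoluteIri(value: string): boolean {
  return /^[A-Za-z][A-Za-z0-9+.-]*:[^\s]+$/.test(value);
}

/** OKF type → rdf:type object: a full IRI is kept as-is, otherwise schema:<PascalCase>. */
function typeIri(type: string): string {
  if (isAbsoluteIri(type)) return type;
  // Capitalise each word but keep its remaining letters, so "BigQuery" stays intact.
  const pascal = words(type).map((w) => w.charAt(0).toUpperCase() + w.slice(1)).join("");
  return SCHEMA + pascal;
}

/** Producer frontmatter key → schema:<camelCase> predicate. */
function producerKeyIri(key: string): string {
  const camel = words(key)
    .map((w, i) => (i === 0 ? w.charAt(0).toLowerCase() : w.charAt(0).toUpperCase()) + w.slice(1))
    .join("");
  return SCHEMA + camel;
}

/** Concept ID (bundle-relative path without .md) → deterministic subject IRI. */
function conceptIri(conceptId: string, base: string = OKF_BASE): string {
  return base + conceptId;
}
```

Because every rule is a pure string function, the same bundle always yields the same IRIs.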

Opt-in typed edges (--relate / typeRelations). Default is one untyped schema:mentions edge per link (faithful to §5.3). A caller may supply deterministic (fromType,toType) → predicate rules to type edges by their endpoints' OKF type — e.g. BigQuery Dataset>BigQuery Table=hasPart (containment) while Table>Table stays mentions. Byte-stable, no LLM, off by default.
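The rule lookup can be as small as a map keyed by the endpoint types. This sketch assumes the FromType>ToType=predicate string form used in the example (the real --relate syntax may differ); parseRelateRules and edgePredicate are hypothetical names. Any type pair without a rule falls back to mentions, so the default graph is unchanged.

```typescript
/** Opt-in rules keyed by "FromType>ToType", valued by a predicate local name. */
type RelateRules = Map<string, string>;

const DEFAULT_PREDICATE = "mentions";

/** Parse hypothetical "FromType>ToType=predicate" strings into a lookup table. */
function parseRelateRules(specs: string[]): RelateRules {
  const rules: RelateRules = new Map();
  for (const spec of specs) {
    const m = /^(.+?)>(.+?)=(\w+)$/.exec(spec.trim());
    if (!m || !m[1] || !m[2] || !m[3]) throw new Error(`bad relate rule: ${spec}`);
    rules.set(`${m[1].trim()}>${m[2].trim()}`, m[3]);
  }
  return rules;
}

/** Predicate for one link, decided only by the OKF types of its two endpoints. */
function edgePredicate(rules: RelateRules, fromType: string, toType: string): string {
  return rules.get(`${fromType}>${toType}`) ?? DEFAULT_PREDICATE;
}
```

No text is inspected, only the endpoint types, which is what keeps the typed output byte-stable.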

Memory-layer behaviour

  • Import defaults to Working Memory (free, private, reversible).
  • --share finalizes and advances to Shared Working Memory (free, team-visible).
  • --private bulk-writes loose quads into a private CG's SWM (no per-concept finalize) for large private corpora — batched (in-memory map + 5k-quad chunks, resumable), practical to ~100k concepts/run; not yet streaming.
  • Never publishes to Verifiable Memory. VM promotion (real TRAC, irreversible) is an explicitly-gated operator step, documented separately in DEMO.md §8. WM/SWM data never implies on-chain verification.
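One way to keep --private batching resumable: cut the quad list into fixed 5k-quad chunks and record finished chunk indexes in the manifest, so a rerun skips them. chunk and writeChunks are hypothetical helpers, and the writer is synchronous only to keep the sketch short. Because the mapper's output order is byte-stable, chunk i covers the same quads on every run.

```typescript
const CHUNK_SIZE = 5000; // the PR description's 5k-quad chunks

/** Split items into fixed-size chunks; the same input always yields the same chunks. */
function chunk<T>(items: readonly T[], size: number = CHUNK_SIZE): T[][] {
  if (size <= 0) throw new Error("size must be positive");
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

/** Write every chunk not yet recorded as done; returns how many items were written this run. */
function writeChunks<T>(chunks: T[][], done: Set<number>, write: (batch: T[]) => void): number {
  let written = 0;
  chunks.forEach((batch, index) => {
    if (done.has(index)) return; // resume: finished on an earlier run
    write(batch);
    done.add(index); // a real runner would persist the manifest here
    written += batch.length;
  });
  return written;
}
```

Recording the index only after the write succeeds means a crash mid-chunk replays that chunk rather than skipping it.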

crypto_bitcoin results

Offline correctness gate (vitest, no node, byte-stable), over the vendored Google crypto_bitcoin bundle (Apache-2.0, attributed):

  • Exactly 5 Knowledge Assets; none for the three reserved index.md files.
  • Reconstructed untyped edge graph: dataset → 4 tables; transactions → {dataset, blocks, inputs, outputs}; inputs → {dataset, transactions, outputs}; blocks → none.
  • outputs → none by default: its only two links sit inside backticks → CommonMark literal text (recorded + warned; --include-code-span-links opts in). The one documented divergence from a naive link count.
  • Citations as schema:citation (both numbered and bare-bullet styles).
  • Determinism: identical N-Quads across runs. Round-trip is graph-faithful (not byte-faithful — prose isn't recoverable from triples; documented in export.ts/CONTEXT.md).

Tested

93 vitest tests across three suites — okf 77 (golden / edge-case / round-trip / internals / loader / injective-name / type-relations), ip-oracle 7, and CLI 9 driving the compiled dkg okf against an in-process daemon stub (dry-run-never-connects, WM vs --share advance, --private chunk/manifest, export skolem-filter + path-traversal guard, --relate, --replace, symlink-skip). Wired into turbo test; coverage ≥ thresholds. tsc --noEmit and the runtime build pass.

Review fixes addressed on this branch

  • Live-import blocker (strict v10.0.1): dkg:hasSection no longer emits blank-node objects — skolemized to deterministic IRIs. Export filters those genid nodes out.
  • import --share advances WM→SWM via a stage-aware manifest (previously it skipped the promotion while falsely reporting SWM).
  • Round-trip fidelity: typed producer-key scalars (int/decimal/bool) keep their datatype.
  • Security/correctness: loader doesn't follow symlinks; export refuses path-traversal subjects; conceptIdToKaName is injective; verify scopes counts to the IRI base; --replace clears stale triples on re-import.
  • SPARQL result parsing reads bindings structurally (daemon omits type).

Deferred

  • Verifiable Memory promotion — held as the gated, money-spending capstone (DEMO.md §8).
  • Live mainnet demo — runbook only; operator-run.
  • --private true streaming (currently batched/in-memory; fine to ~100k/run).

Ziga Drev and others added 3 commits June 25, 2026 13:48
Add `packages/okf` (`@origintrail-official/dkg-okf`): a pure, LLM-free
mapper that turns a Google Open Knowledge Format (OKF) bundle into owned,
verifiable RDF Knowledge Assets, reconstructing the bundle's cross-concept
link graph. Same bundle ⇒ identical triples and IRIs.

- two-pass bundle import (index → extract+link), §9 conformance validator,
  byte-stable N-Quads, and a clean graph-faithful `export` inverse
- real Markdown AST (mdast) for link/section/citation extraction; converges
  on the node extractor's predicate vocabulary for joinable graphs
- vendored Google `crypto_bitcoin` fixture (Apache-2.0, attributed) + synthetic
  link-form and edge-case bundles; 63 golden/edge/round-trip vitest tests
- registry entry draft (`integration.okf.json`), CONTEXT.md, README.md, DEMO.md

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01MtRTg9G7j9fzP1pG8o843a
Thin node-facing wrapper over `@origintrail-official/dkg-okf`, mirroring the
EPCIS command structure (commander subcommands, exit-code mapping).

- `dkg okf import <bundleDir>`: ingest an OKF bundle into a Context Graph.
  Defaults to private Working Memory; `--share` finalizes + advances to
  Shared Working Memory. Never publishes to Verifiable Memory. `--dry-run`
  runs the deterministic mapping offline (no node). Resumable via a manifest.
- `dkg okf export <contextGraphId> <outDir>`: SPARQL-query a Context Graph and
  serialise it back into a conformant OKF bundle (clean inverse of import).
- wired into cli.ts next to `registerEpcisCommand`; importer skill note added.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01MtRTg9G7j9fzP1pG8o843a
- docs/adr/0005-okf-rdf-mapping.md: the locked OKF→RDF mapping table, the
  reuse-vs-fork decision for the Markdown extractor, IRI derivation, and the
  documented judgement calls (in-code-span links, broken links, citations).
- docs/integrations/okf.md: publishable article walking the entire lifecycle
  (WM → SWM → join invitation → Hermes agent → rendered graph; VM deferred).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01MtRTg9G7j9fzP1pG8o843a
Comment thread packages/cli/src/commands/okf.ts (outdated)
let written = 0;
for (const concept of imported.concepts) {
const name = conceptKaName(concept.conceptId);
if (done.has(concept.conceptId)) continue;

🔴 Bug: This skip uses the same manifest entries for WM-only imports and shared imports. The documented flow imports once into WM, then reruns the command with --share, but the first run records every concept in .okf-import-manifest.json; the second run then hits this continue and never calls knowledgeAssetFinalize/knowledgeAssetShare, while still reporting memoryLayer: "SWM". Track per-concept lifecycle state (for example wmWritten vs swmShared) or still run finalize/share for already-written concepts when --share is requested.

🔴 Bug: The resumability manifest records concepts as done after the WM write, but the same default manifest is reused for a later --share run. That means the documented flow import ... followed by import ... --share skips every concept here and never reaches knowledgeAssetFinalize / knowledgeAssetShare, leaving the assets private in WM while reporting an SWM import. Track write/share phases separately or do not skip the share phase for already-written concepts.

🔴 Bug: The default manifest is reused across WM imports and later --share runs, but it only records done concept IDs. After dkg okf import ... writes the WM draft, a follow-up dkg okf import ... --share sees every concept in done and skips the finalize/share block entirely, so the command reports SWM while leaving the assets private in WM. Track the completed phase in the manifest or still finalize/share already-written concepts when --share is requested.

🔴 Bug: The manifest records only contextGraphId and done concepts, so the documented flow import followed by import --share reuses the WM manifest and skips every concept before knowledgeAssetFinalize/knowledgeAssetShare runs. The command then reports memoryLayer: "SWM" with triplesWritten: 0, but nothing was actually shared. Track the completed stage in the manifest, or do not honor WM done entries when --share still needs to finalize/promote them.

🔴 Bug: The resumability manifest is reused for both plain WM import and --share, so the documented flow import followed by import --share will skip every concept already marked done and never call knowledgeAssetFinalize/knowledgeAssetShare. Include the target mode/share state in the manifest or treat --share as needing to process already-written WM assets so it can finalize and promote them.

🔴 Bug: The resumability manifest only records that a concept was processed, not which lifecycle phase completed. If an operator follows the documented flow and first imports to WM, the manifest marks every concept done; a later dkg okf import ... --share reuses that manifest, skips the loop here, and never calls wm/finalize or swm/share while still reporting memoryLayer: "SWM". Include the target mode/phase in the manifest key, or allow --share to finalize/share concepts already imported to WM.

🔴 Bug: The resume manifest makes the documented WM -> SWM promotion a no-op

What's wrong
A successful WM import marks concepts as done before they have ever been shared. Re-running the same command with --share later reuses that done set and skips the only block that finalizes and shares assets, so the lifecycle advertised by the CLI and docs silently fails.

Example
Run dkg okf import ./bundle --context-graph-id cg --create-context-graph first. It writes every concept to WM and records them as done. Then run dkg okf import ./bundle --context-graph-id cg --share, as the demo/docs instruct. The second run loads the manifest, skips every concept at line 394, writes triplesWritten: 0, and never calls knowledgeAssetFinalize/knowledgeAssetShare, even though the summary says memoryLayer: "SWM".

Suggested direction
Track import phases separately, or make --share process already-written concepts through finalize/share instead of treating them as fully complete.

For Agents
In packages/cli/src/commands/okf.ts, separate resumability state for WM write completion from SWM share completion, or ignore/replay the done set when --share needs to promote existing WM drafts. Preserve idempotent resume for partially written concepts. Add a scenario proving WM import followed by --share finalizes and shares all concepts already marked written.
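A sketch of the per-concept stage tracking these comments ask for, using hypothetical names (Stage, ImportManifest, ImportClient, runImport) and a synchronous fake client in place of the real ApiClient. A concept written to WM in an earlier run is still finalized and shared when --share is set; only concepts already at swm are skipped. It assumes finalize is safe to repeat if a run dies between finalize and share; if it is not, a separate finalized stage is needed.

```typescript
/** Furthest lifecycle stage a concept has completed; "swm" implies "wm". */
type Stage = "wm" | "swm";

interface ImportManifest {
  contextGraphId: string;
  stages: Record<string, Stage>; // conceptId → furthest completed stage
}

/** Hypothetical node-facing operations, mirroring wm/write, wm/finalize and swm/share. */
interface ImportClient {
  write(conceptId: string): void;
  finalize(conceptId: string): void;
  share(conceptId: string): void;
}

function runImport(conceptIds: string[], manifest: ImportManifest, share: boolean, client: ImportClient): void {
  for (const id of conceptIds) {
    if (manifest.stages[id] === undefined) {
      client.write(id);
      manifest.stages[id] = "wm"; // persist after each phase so a crash resumes mid-lifecycle
    }
    if (share && manifest.stages[id] !== "swm") {
      client.finalize(id);
      client.share(id);
      manifest.stages[id] = "swm";
    }
  }
}
```

Run once without --share and again with it, the fake client sees every write first, then a finalize/share pair per concept, and a third run is a no-op.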

.option('--manifest <path>', 'Resumability manifest path (default <bundleDir>/.okf-import-manifest.json)')
.option('--dry-run', 'Run the deterministic mapping offline and print the summary; never touch the node')
.option('--print-nquads', 'With --dry-run, also print the canonical N-Quads')
.action(async (bundleDir: string, opts: ActionOpts) => {

🟡 Issue: The import action now mixes option parsing, conformance handling, context graph creation, manifest resume logic, KA writes, sharing, and summary output in one large closure. This makes the command hard to unit-test or safely extend. Extract the import/export runners and manifest read/write into small helpers so registerOkfCommand only wires Commander options to those routines.

The new dkg okf command is the user-facing path that actually creates assets, chunks wm/write, finalizes/shares with --share, resumes from a manifest, and exports via SPARQL, but the added tests only exercise the pure packages/okf mapper/exporter. Add CLI-level tests with a mocked ApiClient for at least import, --share, manifest resume/skip behavior, dry-run, and export; otherwise regressions in the node API sequence or option plumbing can ship green. This same coverage gap recurs in the export action below.

🟡 Issue: The new user-facing dkg okf import/export command has no tests around the node-facing paths: creating/finding context graphs, chunked knowledgeAssetWrite, manifest resume, and the --share finalize/share sequence. The OKF package tests cover the pure mapper, but a regression in this CLI wrapper or its ApiClient calls would not be caught. Add CLI-level tests with a mocked ApiClient for import, resume, share, and export query/write behavior.

🟡 Issue: The new user-facing dkg okf command path is not covered by CLI tests. The added tests exercise the pure @origintrail-official/dkg-okf mapper/exporter, but nothing under packages/cli/test invokes this command or stubs the daemon to verify the wrapper behavior: required --context-graph-id, context graph creation, KA name translation, 5,000-quad chunking, manifest resume, --share finalize/share calls, export SPARQL query, and file writes. Add a CLI smoke test similar to the existing EPCIS command tests with a stub daemon so regressions in the node-facing contract fail in CI.

🟡 Issue: This single import action now owns option normalization, conformance handling, context-graph creation, peer invites, two different import strategies, manifest resume, chunk retry, and reporting. The mode model is especially hard to reason about because --private, --share, and default import are overlapping booleans checked throughout the action. Please split this into an explicit OkfImportPlan normalization step plus focused runners like runBulkPrivateSwmImport and runPerConceptImport; that would make incompatible modes and shared manifest/chunking behavior obvious instead of embedded in one long control path.

The new dkg okf import command is not covered by CLI/action tests. The package tests verify the pure mapper, but nothing mocks ApiClient and asserts the user-facing node workflow: context graph creation, per-concept wm/write, optional finalize/share, --private sharedMemoryWrite, manifest resume, and 413 split/retry. Add commander-level tests with a stubbed client so regressions in the wrapper do not ship green while mapper tests still pass.
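The OkfImportPlan normalization step suggested in this comment could look like the following sketch. The mode names are hypothetical, and rejecting --private together with --share is an assumption (both end in SWM), not documented behaviour. The action would then dispatch on plan.mode to focused runners such as the runBulkPrivateSwmImport and runPerConceptImport the review proposes.

```typescript
/** Raw Commander booleans, as the flags appear on the command line. */
interface ImportFlags {
  dryRun?: boolean;
  share?: boolean;
  private?: boolean;
}

/** Exactly one explicit mode per run, instead of overlapping booleans. */
type OkfImportPlan =
  | { mode: "dry-run" }
  | { mode: "wm" }
  | { mode: "wm+share" }
  | { mode: "private-swm" };

function planImport(flags: ImportFlags): OkfImportPlan {
  if (flags.dryRun) return { mode: "dry-run" }; // offline: never touches the node
  // Assumption: --private already writes to SWM, so combining it with --share is rejected.
  if (flags.private && flags.share) throw new Error("--private and --share cannot be combined");
  if (flags.private) return { mode: "private-swm" };
  return flags.share ? { mode: "wm+share" } : { mode: "wm" };
}
```

Incompatible combinations then fail in one place before any node call, and each runner only has to handle its own mode.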

🟡 Issue: The new dkg okf import command is the user-facing path that creates graphs, writes KAs, finalizes/shares assets, streams private SWM chunks, resumes manifests, and handles dry-run, but the added tests only exercise the pure @origintrail-official/dkg-okf mapper. A regression in this wrapper, such as --dry-run connecting to the node, --share skipping finalize/share, or --private not resuming/splitting 413 batches, would still pass. Add CLI-level tests with a mocked ApiClient for at least dry-run, WM import, share import, private chunk/manifest behavior, and export query/write flow; this gap recurs for the new ip-oracle generate command as well.

🟡 Issue: registerOkfCommand is doing too much: this one nested action owns option normalization, conformance handling, CG creation/invites, dry-run output, private bulk import, resumability manifests, per-concept KA writes, sharing, and progress reporting, with export/verify flows also embedded later in the same registration function. That makes the command hard to scan and hard to unit-test without driving Commander/process exits. Please keep this file as declarative command wiring and extract runOkfImport, runOkfExport, runOkfVerify, plus small manifest/chunk-writing helpers into focused modules; the behavior can stay identical while deleting most of the nested control flow.

🟡 Issue: The new --private import path does the actual SWM write, graph assignment, resumable manifest handling, and 413 split/retry, but the added tests only cover the pure OKF mapper/generator. Add a CLI-level test with a mocked ApiClient that runs dkg okf import --private and asserts the mapped quads are written with the expected context graph, manifest resume skips completed chunks, and a 413 response splits and retries the batch; otherwise this user-facing bulk import could silently drop or mis-route corpus triples while all current tests stay green.
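The 413 path can be isolated into a small helper that is easy to test with a fake writer. In this sketch (writeWithSplit and isPayloadTooLarge are hypothetical, and an error carrying a numeric status field is an assumption about the client), a batch that comes back 413 is halved and each half retried in order. A single quad that still does not fit is rethrown instead of being dropped.

```typescript
/** True when a rejected write looks like HTTP 413 Payload Too Large (assumed error shape). */
function isPayloadTooLarge(err: unknown): boolean {
  return typeof err === "object" && err !== null && (err as { status?: unknown }).status === 413;
}

/**
 * Write a batch; on 413, split it in half and retry each half in order.
 * Resolves to the number of successful write calls.
 */
async function writeWithSplit<T>(batch: T[], write: (b: T[]) => Promise<void>): Promise<number> {
  try {
    await write(batch);
    return 1;
  } catch (err) {
    // Other errors, or a single item that is still too large, must not be swallowed.
    if (!isPayloadTooLarge(err) || batch.length <= 1) throw err;
    const mid = Math.ceil(batch.length / 2);
    const left = await writeWithSplit(batch.slice(0, mid), write);
    const right = await writeWithSplit(batch.slice(mid), write);
    return left + right;
  }
}
```

A command-level test can then assert both the split sizes and that no quad was lost.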

🟡 Issue: registerOkfCommand is becoming the implementation, not just the command registration: this action now owns bundle loading, conformance policy, context-graph creation, private-vs-per-concept orchestration, manifest persistence, chunk retry, progress formatting, and JSON reporting. That makes the flow hard to test or change without adding more nested branches. Please extract the import/export/verify handlers plus shared manifest/chunk-writer helpers into focused functions or modules, leaving this file to wire Commander options to those units.

🟡 Issue: The new --private import path is the high-volume write mode, but the added tests only cover the pure mapper/generator and never exercise this CLI branch against a mocked ApiClient. Please add a command-level test that verifies the quads sent to sharedMemoryWrite, chunk/resume manifest behavior, and the 413 split-and-retry path; otherwise the bulk private import can regress while all current tests stay green.

🟡 Issue: The new OKF CLI wrapper is not verified at the command/API boundary

What's wrong
The PR verifies the pure OKF mapping well, but the user-facing command layer is where the changed behavior actually talks to the node, creates context graphs, writes KAs or loose SWM quads, finalizes/shares, exports files, and reports verification shortfalls. None of those command/API contracts are covered, so this change can ship green even if the CLI is wired to the wrong daemon route, drops options like --sub-graph-name, skips share/finalize, mishandles manifest resume, or reports verify success incorrectly.

Example
A command-level test could run dkg okf import <fixture> --context-graph-id cg --create-context-graph --share against a stub daemon and assert the CLI calls context-graph existence/create, creates one KA per concept, writes quads with the expected graph, then finalizes and shares only when --share is set. Without that, a regression in option parsing, API path wiring, graph injection, or manifest resume can pass all mapper tests while the actual user command is broken.

Suggested direction
Keep the package-level mapper tests, but add a small set of CLI smoke tests that execute the compiled dkg okf commands against a fake daemon so the node-facing behavior is validated.

For Agents
Add CLI-level tests under packages/cli/test, using the existing EPCIS-style stub-daemon pattern. Cover at least dry-run no-network behavior, per-concept import with and without --share, --private SWM chunking/413 split or resume, export query-to-files behavior, and verify complete vs shortfall exit codes. Preserve the existing mapper package tests; these new tests should prove the command wrapper sends the right daemon requests and handles exits/output correctly.

iriByConceptId[doc.conceptId] = conceptIdToIri(doc.conceptId, iriBase);
const type = doc.frontmatter.type;
if (type === undefined || type === null || String(type).trim() === '') {
warnings.push({

🟡 Issue: This diagnostic labels a missing type as missing-optional and says it is mapped as a generic concept, while validation treats type as a hard conformance requirement and the mapper does not emit a generic rdf:type. Rename the warning code/message to reflect the actual missing required type, or centralize this with the conformance errors so callers do not get conflicting guidance.

/**
* OKF §9 conformance validation — deliberately permissive.
*
* A bundle is conformant iff:

🔵 Nit: The header comment says a bundle is conformant iff reserved files follow §6/§7, but the implementation intentionally reports reserved-file structure issues as warnings and only hard-fails concept parse/type errors. Update this comment, and the matching ConformanceReport.errors comment in types.ts, so future maintainers do not reintroduce stricter behavior based on stale docs.

DKG asset/assertion names cannot contain '/', but OKF concept IDs are
path-based (`tables/transactions`), so `dkg okf import` failed at the node
with `Invalid "name": Assertion name cannot contain "/"`.

Add `conceptIdToKaName(conceptId)` which maps path separators to '__'
deterministically (`tables/transactions` → `tables__transactions`) and use it
for the node-facing asset name. The RDF subject IRI is unchanged — it keeps
the original '/'-bearing concept ID (`urn:okf:tables/transactions`); only the
node handle is sanitised, so export (which keys on the IRI) is unaffected.

Adds regression tests asserting the generated name never contains '/'.

Found during a live mainnet import on a Hermes-operated node (PR OriginTrail#1331).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01MtRTg9G7j9fzP1pG8o843a
@Zigoljube (author)

Validated live on Base mainnet (and one bug found + fixed)

This was run end-to-end on a real Base mainnet DKG node (mainnet-base, chain base:8453, edge role) operated by a Hermes agent:

dkg okf import packages/okf/test/fixtures/crypto_bitcoin \
  --context-graph-id okf-crypto-bitcoin --create-context-graph --share

Result: 5 Knowledge Assets, 101 triples, 11 cross-table edges, 0 broken links, 10 citations, imported and shared to Shared Working Memory (free, off-chain). Verified live via authenticated /api/query: 101 SWM triples / 19 distinct subjects, with urn:okf:tables/transactions mentioning all four expected targets (datasets/crypto_bitcoin, tables/blocks, tables/inputs, tables/outputs).

The run surfaced one real bug, now fixed in fix(okf): sanitize '/' out of Knowledge Asset names. dkg okf import used the raw path-based concept ID as the node asset name, which the daemon rejects (Assertion name cannot contain "/"). The fix adds conceptIdToKaName(), mapping / → __ (the RDF subject IRI keeps the original /), with regression tests. The offline gate is now 65 tests; build and typecheck are green.

No TRAC spent — the demo stops at free SWM; VM promotion remains the deferred, gated capstone.

for (const b of bindings) {
if (!b.s || !b.p || !b.o) continue;
const subject = unwrapIri(b.s);
const quad: Quad = { subject, predicate: unwrapIri(b.p), object: b.o };

🔴 Bug: dkg okf export builds quads from /api/query bindings without normalizing binding cells or unwrapping object IRIs. The daemon can return SPARQL-JSON cell objects, which makes unwrapIri(b.s) throw, and it can also return IRI strings as <...>; because b.o is stored untouched, schema:mentions objects like <urn:okf:...> will not match iriBase in exportBundle, dropping the exported relationship graph. Normalize s/p/o with the existing binding-value logic and unwrap IRI terms before constructing the Quad.
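A sketch of the normalisation this comment asks for. It assumes a cell arrives either as a plain string (IRIs possibly wrapped in <...>) or as a SPARQL-JSON term object with type and value; termValue and bindingToQuad are hypothetical names. Literal datatypes are not carried through here, and a plain-string literal that happens to look like <...> would be misread, so the real code should keep whatever datatype information the daemon returns.

```typescript
/** A binding cell as the daemon may return it: a plain string or a SPARQL-JSON term object. */
type BindingCell = string | { type?: string; value: string };

interface Binding { s?: BindingCell; p?: BindingCell; o?: BindingCell }

interface Quad { subject: string; predicate: string; object: string }

/** Strip N-Triples-style angle brackets from an IRI string, if present. */
function unwrapIriTerm(value: string): string {
  return value.startsWith("<") && value.endsWith(">") ? value.slice(1, -1) : value;
}

/** Bare value of a cell; SPARQL-JSON literals are returned untouched. */
function termValue(cell: BindingCell): string {
  if (typeof cell === "string") return unwrapIriTerm(cell);
  return cell.type === "literal" ? cell.value : unwrapIriTerm(cell.value);
}

/** Build a quad with every position normalised, so object IRIs can match the IRI base. */
function bindingToQuad(b: Binding): Quad | undefined {
  if (!b.s || !b.p || !b.o) return undefined;
  return { subject: termValue(b.s), predicate: termValue(b.p), object: termValue(b.o) };
}
```

With the object unwrapped, a schema:mentions target such as urn:okf:tables/blocks passes the iriBase check in exportBundle instead of being dropped.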

Comment thread packages/okf/src/paths.ts (outdated)
* node-facing name is sanitised.
*/
export function conceptIdToKaName(conceptId: string): string {
return conceptId.replace(/\//g, '__');

🔴 Bug: This KA-name mapping is not injective for valid OKF concept IDs: a/b.md and a__b.md both map to the same node asset name a__b because underscores are valid path characters. A bundle containing both concepts will write two distinct subject IRIs into one assertion/KA name or fail on create, corrupting the concept-to-asset mapping. Use a reversible escaping scheme or include a collision-resistant suffix.

🔴 Bug: Concept IDs can collide when converted to Knowledge Asset names

What's wrong
Two valid OKF concept paths can resolve to the same Knowledge Asset name. That loses the one-concept-to-one-KA invariant and can corrupt imports by writing distinct concepts into the same lifecycle handle.

Example
Bundle contains a/b.md and a__b.md. The mapper derives distinct subjects urn:okf:a/b and urn:okf:a__b, but the CLI creates/writes both through the same node asset name a__b, mixing or overwriting two concepts under one Knowledge Asset.

Suggested direction
Use a reversible escaping scheme, percent/base64url encoding, or append a stable hash so every valid OKF concept ID maps to a unique node-facing KA name.

For Agents
Look at conceptIdToKaName and the CLI import loop. Replace the slash-to-double-underscore encoding with a reversible or collision-resistant node asset name encoding, and add a test with both a/b.md and a__b.md proving they create separate KAs while preserving their RDF subject IRIs.
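One reversible encoding that satisfies this, sketched with hypothetical names: escape _ as _u and / as _s. Every _ in the output begins a two-character escape, so decoding is unambiguous and no two concept IDs can share a name. It assumes the node accepts _ and ASCII letters in asset names (the current __ mapping suggests it does), and it would rename assets already created under the __ scheme.

```typescript
/** Reversible KA-name encoding: "_" → "_u", "/" → "_s". The output never contains "/". */
function encodeKaName(conceptId: string): string {
  return conceptId.replace(/[_/]/g, (c) => (c === "_" ? "_u" : "_s"));
}

/** Inverse of encodeKaName; rejects escapes the encoder never produces. */
function decodeKaName(name: string): string {
  return name.replace(/_(.)/g, (_match: string, c: string) => {
    if (c === "u") return "_";
    if (c === "s") return "/";
    throw new Error(`invalid escape _${c} in ${name}`);
  });
}
```

Under this scheme a/b becomes a_sb and a__b becomes a_u_ub, so the two concepts from the example get separate Knowledge Assets.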

).toBe(true);
});

it('produces byte-identical N-Quads across two runs (determinism)', () => {

🟡 Issue: This determinism test only imports the same files array twice, so it would still pass if importBundle depended on caller/filesystem enumeration order. Since byte-stable output is a core contract of the new mapper, add a regression case that shuffles the same BundleFile[] before import and asserts the canonical N-Quads stay identical.
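A sketch of both halves: a canonical serialiser that dedupes and sorts N-Quads lines, so input order cannot reach the output bytes, and a seeded shuffle for the regression test. The suggested test would apply shuffled to the BundleFile[] before importBundle; here it is applied to quads only, because importBundle is not reproduced. canonicalNQuads, shuffled and this Quad shape are hypothetical, and sorting only helps if every IRI is itself deterministic (no blank nodes).

```typescript
interface Quad { subject: string; predicate: string; object: string; graph?: string }

/** One N-Quads line; the object is assumed to already be a serialised term. */
function toLine(q: Quad): string {
  const g = q.graph ? ` <${q.graph}>` : "";
  return `<${q.subject}> <${q.predicate}> ${q.object}${g} .`;
}

/** Canonical output: dedupe and sort lines by UTF-16 code unit, then join with newlines. */
function canonicalNQuads(quads: readonly Quad[]): string {
  const lines = Array.from(new Set(quads.map(toLine)));
  lines.sort((x, y) => (x < y ? -1 : x > y ? 1 : 0));
  return lines.join("\n") + (lines.length > 0 ? "\n" : "");
}

/** Seeded Fisher-Yates shuffle (LCG), so a failing test can be reproduced exactly. */
function shuffled<T>(items: readonly T[], seed: number): T[] {
  const out = items.slice();
  let s = seed >>> 0;
  for (let i = out.length - 1; i > 0; i--) {
    s = (Math.imul(s, 1664525) + 1013904223) >>> 0;
    const j = s % (i + 1);
    const tmp = out[i] as T;
    out[i] = out[j] as T;
    out[j] = tmp;
  }
  return out;
}
```

Looping over several seeds and comparing against the unshuffled output makes an order dependency fail loudly instead of passing by accident.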

Comment thread packages/okf/src/types.ts
/** A non-fatal diagnostic surfaced during a bundle import. */
export interface OkfWarning {
conceptId?: string;
code: 'broken-link' | 'code-span-link' | 'missing-optional' | 'reserved-skip' | 'parse';

🔵 Nit: missing-optional is used for a missing type, but type is described elsewhere in this package as the only required consumer field and the CLI treats it as a hard conformance error. Rename this diagnostic to something like missing-type (and update the warning emission in bundle.ts) so downstream callers do not have to remember that this “optional” warning is actually about a required field.

Comment thread packages/okf/src/types.ts
/** §9 conformance report. */
export interface ConformanceReport {
conformant: boolean;
/** Hard violations (only the three §9 rules can make a bundle non-conformant). */

🔵 Nit: This comment says only “the three §9 rules” can produce hard conformance errors, but validation.ts intentionally reports reserved-file structure issues as warnings and only rules 1 and 2 populate errors. Please align the public type documentation with the implementation so future validators do not reintroduce the stricter interpretation by following this comment.

A naive `STRSTARTS(STR(?s),"urn:okf:")` subject count returns ~19, not 5: the
daemon skolemises each concept's `dkg:hasSection` blank nodes into
`urn:okf:.../.well-known/genid/...` subjects that also match the prefix. Count
subjects with an `rdf:type` (only the 5 concepts have one) to get exactly 5.

Surfaced by the live Hermes-operated mainnet verification on PR OriginTrail#1331.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01MtRTg9G7j9fzP1pG8o843a
Comment thread packages/okf/src/loader.ts Outdated
const walk = (current: string): void => {
for (const entry of readdirSync(current).sort()) {
const full = join(current, entry);
const st = statSync(full);

🔴 Bug: statSync follows symlinks, so an OKF bundle can include a symlink like secret.md -> ~/.dkg/auth.token or a symlinked directory outside the bundle and this loader will read that target and import/share it as bundle content. For bundles received from elsewhere, dkg okf import ... --share can leak arbitrary local files. Use lstatSync to reject symlinks, or resolve realpath and require every file/descended directory to stay under the bundle root before reading.
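Here is a minimal sketch of an lstat-based walk. It assumes the loader only needs bundle-relative paths of regular files; the real loader's extension filtering and file reading are omitted. `walkBundleFiles` is a hypothetical name. The root is resolved once with `realpathSync`, and every symlink below it is rejected, so nothing outside the root can be reached.

```typescript
import { lstatSync, readdirSync, realpathSync } from 'node:fs';
import { join, relative, sep } from 'node:path';

/** Hypothetical symlink-safe walk: returns '/'-separated bundle-relative paths. */
export function walkBundleFiles(bundleRoot: string): string[] {
  const root = realpathSync(bundleRoot);
  const out: string[] = [];
  const walk = (current: string): void => {
    for (const entry of readdirSync(current).sort()) {
      const full = join(current, entry);
      const st = lstatSync(full); // inspects the link itself, never its target
      if (st.isSymbolicLink()) {
        throw new Error(`refusing symlink in OKF bundle: ${relative(root, full)}`);
      }
      if (st.isDirectory()) walk(full);
      else if (st.isFile()) out.push(relative(root, full).split(sep).join('/'));
    }
  };
  walk(root);
  return out;
}
```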

Comment thread packages/cli/src/commands/okf.ts Outdated
graph,
}));

await client.createKnowledgeAsset(contextGraphId, name, { subGraphName });

🔴 Bug: Re-opening an existing KA here does not clear its data graph: the underlying assertionCreate only recreates lifecycle metadata and the storage createGraph calls are no-ops for existing graphs. If the same bundle is re-imported after a concept removes a title/link/citation, or after the manifest is missing/corrupt, the old quads remain and the subsequent writes only add the new ones, so finalize/share can publish stale relationships. Clear or replace the WM draft before writing each concept, or use an atomic create/write path that guarantees the previous graph content is discarded.

const includeCodeSpanLinks = Boolean(opts.includeCodeSpanLinks);

const files = loadBundleDir(bundleDir);
const conformance = validateBundle(files);

🟡 Issue: validateBundle(files) and importBundle(files, ...) both sort, classify, and parse the bundle, with related conformance decisions split across validation.ts and bundle.ts. That duplication makes it easy for the CLI gate and the actual import path to drift as OKF rules change. Consider sharing a single parse/index pass, or have importBundle return the conformance report used by the CLI.

// Bulk-import chunking contract (ADR 0002): ≤5,000 quads per wm/write.
const CHUNK = 5000;

const OKF_EXIT_CODES = {

🔵 Nit: The OKF exit-code mapping and reportOkfError helper duplicate the same HTTP-status/error-reporting pattern already present in the EPCIS command. Adding another local copy increases drift risk for CLI exit semantics. Consider extracting a shared CLI HTTP error reporter and reusing it from both commands.

}

/** Parse a concept body with a real Markdown AST. */
export function parseBody(body: string): ParsedBody {

🔵 Nit: parseBody is exported from the package barrel, but its return type ParsedBody is private to this module. That makes the public API harder to consume and document. Either export ParsedBody as a public type or keep parseBody internal instead of re-exporting it.

… import

Adds the engineering harness for an IP / Patent Context Oracle on top of the
OKF→DKG integration:

- packages/ip-oracle (@origintrail-official/dkg-ip-oracle): deterministic,
  synthetic Google-Patents-shaped OKF corpus generator (no BigQuery dep).
  Same seed ⇒ byte-identical corpus (mulberry32, index-scoped streams; no
  Math.random/Date.now). Per-patent concepts carry CPC class, jurisdiction,
  assignee, family, a backward-citation DAG, counts, and substance fields
  (claim_chart_ref/essentiality_note). Every concept is stamped
  `source: … [SIMULATED]` + `license: CC BY 4.0`. `writePatentBundle` streams
  to disk for 100k–10M scale.
- `dkg ip-oracle generate` CLI command.
- `dkg okf import --private --allowed-peer`: bulk-streams reconstructed triples
  (citation + family + mention edges) into a PRIVATE Context Graph's Shared
  Working Memory as loose quads via /api/shared-memory/write — no per-concept
  finalize, no TRAC, nothing on-chain. 5,000-quad chunks, chunk-indexed
  resumable manifest, 413 → halve-and-retry, throughput + dataset-pointer
  reporting. Substance stays private (invariant).

Tests: 7 vitest (determinism, OKF-conformance round-trip through the importer,
citation/family edge reconstruction, provenance/substance stamps, streamed
write). Coverage above threshold. Builds + typechecks green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01MtRTg9G7j9fzP1pG8o843a
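The commit above describes 5,000-quad chunks with 413 → halve-and-retry. What follows is a minimal sketch of that loop. It assumes the writer rejects with an error that carries the HTTP status. `writeChunked`, `HttpStatusError` and the `write` callback are hypothetical stand-ins for the CLI's call to /api/shared-memory/write. Manifest bookkeeping is left out.

```typescript
// Hypothetical error shape: the real client may expose the status differently.
export class HttpStatusError extends Error {
  readonly status: number;
  constructor(status: number) {
    super(`HTTP ${status}`);
    this.status = status;
  }
}

/**
 * Send `quads` in chunks of at most `maxChunk`. If a chunk is rejected with
 * 413 (payload too large), split it in half and send each half, recursing as
 * needed. Any other error, or a 413 on a single quad, propagates.
 * Returns the number of successful write calls.
 */
export async function writeChunked<Q>(
  quads: readonly Q[],
  write: (chunk: readonly Q[]) => Promise<void>,
  maxChunk = 5000,
): Promise<number> {
  let written = 0;
  const send = async (chunk: readonly Q[]): Promise<void> => {
    try {
      await write(chunk);
      written++;
    } catch (err) {
      if (err instanceof HttpStatusError && err.status === 413 && chunk.length > 1) {
        const mid = Math.ceil(chunk.length / 2);
        await send(chunk.slice(0, mid)); // halves are sent in order
        await send(chunk.slice(mid));
        return;
      }
      throw err;
    }
  };
  for (let i = 0; i < quads.length; i += maxChunk) {
    await send(quads.slice(i, i + maxChunk));
  }
  return written;
}
```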
Comment thread packages/cli/src/commands/okf.ts Outdated

// Rebuild a minimal BundleImport for the exporter.
const concepts = [...quadsBySubject.entries()]
.filter(([iri]) => iri.startsWith(iriBase))

🔴 Bug: Filtering export subjects only by iriBase treats node-generated section subjects as OKF concepts. The node skolemizes section blank nodes under the concept IRI prefix, so an imported bundle with headings can export extra bogus .well-known/genid/... concepts/files. Restrict this to real concept subjects, for example by requiring an rdf:type on the subject and validating the derived concept ID segments.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Bug: Filtering export subjects with only iri.startsWith(iriBase) also matches skolemized blank-node descendants such as urn:okf:tables/blocks/.well-known/genid/..., which the node creates for section blank nodes. Those rows are then rebuilt as standalone OKF concepts, producing extra files/index entries that were never Knowledge Assets. Exclude /.well-known/genid/ descendants or select only real concept roots, e.g. subjects with the OKF concept rdf:type.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Bug: This export query selects every subject whose IRI starts with iriBase, but imported blank-node section subjects are skolemized under <concept>/.well-known/genid/..., which also has the same prefix. Export will therefore treat section nodes as OKF concepts and write bogus .well-known/genid markdown files. Restrict exported concepts to root subjects, e.g. subjects with rdf:type, or explicitly exclude /.well-known/genid/ descendants while still keeping their triples attached to the parent if needed.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Bug: Filtering exported concepts only by iri.startsWith(iriBase) also includes skolemized section subjects such as urn:okf:tables/blocks/.well-known/genid/... after WM/SWM storage. Those subjects are not OKF concepts, but this code turns each into a generated .md concept with empty frontmatter/index entries, so dkg okf export from the node is not graph-faithful. Filter to real concept roots, e.g. subjects with rdf:type, and exclude /.well-known/genid/ descendants.
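Here is a minimal sketch of the filter these four comments converge on. It assumes quads arrive as plain `{ subject, predicate, object }` strings; the CLI's actual binding shape is not shown here. `conceptRootIris` is hypothetical. It keeps only subjects that carry an rdf:type, sit under `iriBase`, and are not skolemised `/.well-known/genid/` descendants.

```typescript
const RDF_TYPE = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type';
const GENID = '/.well-known/genid/';

// Assumed quad shape for this sketch.
interface Quad {
  subject: string;
  predicate: string;
  object: string;
}

/** Hypothetical: IRIs of real OKF concept roots, sorted for deterministic export. */
export function conceptRootIris(quads: readonly Quad[], iriBase: string): string[] {
  const roots = new Set<string>();
  for (const q of quads) {
    if (q.predicate !== RDF_TYPE) continue;       // roots carry an rdf:type
    if (!q.subject.startsWith(iriBase)) continue; // only this bundle's namespace
    if (q.subject.includes(GENID)) continue;      // never a skolemised section node
    roots.add(q.subject);
  }
  return [...roots].sort();
}
```

Section triples under `<root>/.well-known/genid/...` can then be attached to the root whose IRI prefixes them, rather than becoming concepts of their own.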

Comment thread packages/cli/src/commands/okf.ts Outdated

const outFiles = exportBundle(imported, { iriBase });
for (const f of outFiles) {
const full = join(outDir, f.path);

🔴 Bug: f.path comes from graph subjects via iri.slice(iriBase.length) and is joined directly with outDir. A malicious or corrupted context graph subject such as urn:okf:../../outside can make dkg okf export write outside the requested output directory. Validate exported concept IDs against the OKF path rules and ensure the resolved output path stays under outDir before writing.

🔴 Bug: Export can write outside the requested output directory

What's wrong
The export path trusts RDF subjects from the context graph to become filesystem paths. A shared or otherwise untrusted graph can supply an OKF-looking subject whose suffix contains path traversal, causing the CLI to create files outside the user-selected export directory.

Example
If a context graph contains <urn:okf:../outside> rdf:type <http://schema.org/Thing>, dkg okf export cg ./out rebuilds conceptId as ../outside; exportBundle emits ../outside.md; line 527 joins that path and writes outside ./out.

Suggested direction
Treat graph-derived concept IDs as untrusted input and enforce both OKF concept-id validation and an output-directory containment check before mkdir/writeFile.

For Agents
In packages/cli/src/commands/okf.ts and/or packages/okf/src/export.ts, validate graph-derived concept IDs with the OKF segment rules before creating files, reject ./../absolute segments, and after resolving each output path assert it is contained by outDir. Add a malicious subject export test proving traversal paths are rejected and valid nested concept IDs still export.
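Here is a minimal sketch of both checks. It assumes `exportBundle` emits `/`-separated relative paths. The segment regex is a deliberately strict, hypothetical stand-in, not the OKF spec's exact segment rule, and `safeExportPath` is also hypothetical. The containment test uses `path.relative`, so it still holds if the segment rule is later loosened.

```typescript
import { isAbsolute, relative, resolve, sep } from 'node:path';

// Hypothetical strict stand-in for the OKF segment rules: must start with an
// alphanumeric, so '', '.', '..' and hidden names are all rejected.
const SEGMENT = /^[A-Za-z0-9][A-Za-z0-9._-]*$/;

export function safeExportPath(outDir: string, relPath: string): string {
  const segments = relPath.split('/');
  for (const s of segments) {
    // Empty segments catch absolute paths ('/etc/x') and doubled slashes.
    if (!SEGMENT.test(s)) {
      throw new Error(`unsafe concept path: ${JSON.stringify(relPath)}`);
    }
  }
  const root = resolve(outDir);
  const full = resolve(root, ...segments);
  const rel = relative(root, full);
  // Defence in depth: the resolved file must be strictly inside outDir.
  if (rel === '' || rel === '..' || rel.startsWith(`..${sep}`) || isAbsolute(rel)) {
    throw new Error(`path escapes output directory: ${JSON.stringify(relPath)}`);
  }
  return full;
}
```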


for (const q of c.quads) {
if (q.subject !== c.iri) continue; // skip section blank-node triples
switch (q.predicate) {

🟡 Issue: This switch manually mirrors the RDF mapping already encoded in frontmatterQuads, so the locked OKF mapping now has to be maintained in two separate imperative branches. A cleaner structure would make the mapping table executable: shared field descriptors for import/export of known frontmatter predicates, with one fallback for producer-defined keys. That removes a drift-prone inverse implementation and makes future predicate additions a one-place change.
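Below is a minimal sketch of an executable mapping table. It assumes frontmatter values are plain strings or string lists, and it ignores datatypes and the producer-defined fallback. The keys and predicate IRIs in `FIELDS` are illustrative only, not the package's locked OKF mapping. What matters is that `toTriples` and `fromTriples` are both driven by the same descriptors, so adding a predicate becomes a one-line change.

```typescript
// Illustrative descriptors only: keys and predicates are NOT the locked OKF mapping.
interface FieldDescriptor {
  key: string;       // frontmatter key
  predicate: string; // RDF predicate IRI
  multi: boolean;    // true if the key holds a YAML list
}

const SCHEMA_NS = 'http://schema.org/';
const FIELDS: readonly FieldDescriptor[] = [
  { key: 'title', predicate: `${SCHEMA_NS}name`, multi: false },
  { key: 'license', predicate: `${SCHEMA_NS}license`, multi: false },
  { key: 'citation', predicate: `${SCHEMA_NS}citation`, multi: true },
];

type Frontmatter = Record<string, string | string[]>;
interface Triple { predicate: string; object: string }

// Import direction: frontmatter -> triples, driven by the table.
export function toTriples(fm: Frontmatter): Triple[] {
  const out: Triple[] = [];
  for (const f of FIELDS) {
    const v = fm[f.key];
    if (v === undefined) continue;
    for (const item of Array.isArray(v) ? v : [v]) {
      out.push({ predicate: f.predicate, object: item });
    }
  }
  return out;
}

// Export direction: triples -> frontmatter, driven by the same table.
export function fromTriples(triples: readonly Triple[]): Frontmatter {
  const byPredicate = new Map<string, FieldDescriptor>();
  for (const f of FIELDS) byPredicate.set(f.predicate, f);
  const fm: Frontmatter = {};
  for (const t of triples) {
    const f = byPredicate.get(t.predicate);
    if (!f) continue; // producer-defined keys would go to a single fallback here
    if (f.multi) {
      const prev = fm[f.key];
      fm[f.key] = [...(Array.isArray(prev) ? prev : []), t.object];
    } else {
      fm[f.key] = t.object;
    }
  }
  return fm;
}
```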

}

/** Pure: publication number for an index — a valid OKF concept-ID segment. */
export function pubNumber(o: ResolvedOpts, i: number): string {

🟡 Issue: pubNumber is exported as a public helper, but its parameter type is the private fully-resolved ResolvedOpts; consumers cannot call it naturally with PatentGenOptions, and the test has to avoid the contract instead of exercising it. Please either make pubNumber accept PatentGenOptions and resolve internally, or export a named ResolvedPatentGenOptions plus resolver so the recompute boundary is explicit and usable.

🟡 Issue: pubNumber is exported as public API, but its parameter is the private ResolvedOpts shape rather than the exported PatentGenOptions. That exposes an internal normalization boundary to consumers and forces callers to either reconstruct defaulted options themselves or use casts, which is a brittle contract for a helper advertised in CONTEXT.md. Can this either accept PatentGenOptions and call resolve internally, or export an explicit resolved-options type/factory so the boundary is named and reusable?

🟡 Issue: pubNumber is exported as part of the public package surface, but its first parameter is the private ResolvedOpts type. That leaks an internal normalization boundary and already forces the test to use as never just to reference the helper. Please either export a real ResolvedPatentGenOptions/resolvePatentGenOptions pair, or make pubNumber accept PatentGenOptions and resolve defaults internally so consumers do not need casts or knowledge of hidden defaults.

🟡 Issue: Exported pubNumber leaks an internal resolved-options shape

What's wrong
The helper is presented as a downstream recompute surface, but its signature is coupled to an unexported, fully-defaulted internal type. That is a muddy type boundary: callers see a public function whose usable input model is different from the package's main options model, and even the test has to route around the signature instead of exercising it.

Example
A consumer with normal generator options like { cpcClass: 'H04L', count: 100, seed: 42 } cannot use the exported recompute helper directly without also supplying all defaulted fields such as jurisdictions, owners, yearFrom, yearTo, citationsPerPatent, familySize, and retrievalDate.

Suggested direction
Make the public boundary honest: either expose resolvePatentGenOptions/ResolvedPatentGenOptions, or change pubNumber to accept the same PatentGenOptions users already pass to generatePatentBundle. If recomputation is not intended as public API, stop exporting it.

For Agents
Look at packages/ip-oracle/src/patent-generator.ts and the barrel export in src/index.ts. Preserve deterministic publication-number generation, but either keep pubNumber private, export a real resolved-options type plus resolver, or make pubNumber accept PatentGenOptions and call resolve internally. Replace the smoke-reference test with an actual call through the public API.
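Here is a sketch of the third option (accept `PatentGenOptions` and resolve internally), with heavy assumptions. Only two of the optional fields are shown, the defaults are invented, and the publication-number format is a placeholder, not the generator's real scheme. The point is the boundary: `ResolvedPatentGenOptions` is exported and named, and `pubNumber` accepts the same options callers already pass. Resolving is idempotent, so already-resolved options also work, and a test can call `pubNumber` directly without casts.

```typescript
// Hypothetical subset of the generator options; real defaults live in the package.
export interface PatentGenOptions {
  cpcClass: string;
  count: number;
  seed: number;
  jurisdictions?: readonly string[];
  yearFrom?: number;
}

// Named, exported resolved shape: every optional field is filled in.
export type ResolvedPatentGenOptions = Required<PatentGenOptions>;

export function resolvePatentGenOptions(o: PatentGenOptions): ResolvedPatentGenOptions {
  const jurisdictions = o.jurisdictions ?? ['US']; // placeholder default
  if (jurisdictions.length === 0) throw new RangeError('jurisdictions must be non-empty');
  return {
    cpcClass: o.cpcClass,
    count: o.count,
    seed: o.seed,
    jurisdictions,
    yearFrom: o.yearFrom ?? 2000, // placeholder default
  };
}

// Accepts the public options type directly.
export function pubNumber(o: PatentGenOptions, i: number): string {
  const r = resolvePatentGenOptions(o);
  const jurisdiction = r.jurisdictions[i % r.jurisdictions.length];
  // Placeholder format: a single OKF segment with no slashes.
  return `${jurisdiction}${r.yearFrom}${String(i).padStart(7, '0')}`;
}
```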

.option('--sub-graph-name <name>', 'Sub-graph within the Context Graph')
.option('--iri-base <base>', `IRI namespace concept subjects were minted under (default ${DEFAULT_IRI_BASE})`)
.option('--view <view>', 'working-memory | shared-working-memory | verified-memory', 'shared-working-memory')
.action(async (contextGraphId: string, outDir: string, opts: ActionOpts) => {

🟡 Issue: The new dkg okf export wrapper has no test for the SPARQL query result path that rebuilds BundleImport from daemon bindings and writes files. Existing round-trip tests call exportBundle directly, so a regression in binding normalization, empty-result handling, view/subGraph forwarding, or output file creation would not be caught. Add a CLI/action test with a mocked client.query response and assert the generated files and client options.

… mode

The live 100k run surfaced a superlinear slowdown as the single CG's SWM store
grows (6631 → 635 → 160 triples/s). The previous log line showed only the
cumulative average, which hides the within-run degradation. Emit the
instantaneous per-chunk rate alongside the average so the cost gate can read the
throughput-vs-graph-size curve directly from the import logs.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01MtRTg9G7j9fzP1pG8o843a
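Below is a minimal sketch of logging both rates. It assumes the importer can timestamp each completed chunk. `ThroughputMeter` is hypothetical. Timestamps in milliseconds are passed in, not read from the clock, so the arithmetic stays testable. In the CLI this would look like `meter.record(chunk.length, Date.now())`.

```typescript
// Hypothetical meter: cumulative average plus the rate of the latest chunk.
export class ThroughputMeter {
  private total = 0;
  private readonly startMs: number;
  private lastMs: number;

  constructor(nowMs: number) {
    this.startMs = nowMs;
    this.lastMs = nowMs;
  }

  /** Record a finished chunk; returns triples/s overall and for this chunk. */
  record(triples: number, nowMs: number): { avg: number; inst: number } {
    this.total += triples;
    // Guard against a zero interval when two chunks share a timestamp.
    const avg = this.total / Math.max((nowMs - this.startMs) / 1000, 1e-3);
    const inst = triples / Math.max((nowMs - this.lastMs) / 1000, 1e-3);
    this.lastMs = nowMs;
    return { avg, inst };
  }
}
```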
Comment thread packages/okf/src/export.ts Outdated
default: {
if (q.predicate.startsWith(SCHEMA_NS) && !KNOWN_PREDICATES.has(q.predicate)) {
const key = q.predicate.slice(SCHEMA_NS.length);
const v = literalValue(q.object) ?? q.object;

🔴 Bug: For producer-defined frontmatter fields, typed literals are exported by keeping only their lexical value. A numeric or boolean key imported as "3"^^xsd:integer or "true"^^xsd:boolean becomes YAML string data, and re-importing emits plain string literals instead of the original typed terms. Preserve/parse the datatype for known scalar types when reconstructing extras so export/import remains graph-faithful.
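Here is a minimal sketch. It assumes the exporter can see each literal's lexical form and datatype IRI separately; the `literalValue` call above returns only the lexical form. `typedExtraValue` is hypothetical and only recovers types YAML can carry natively. Round-tripping is faithful only if the importer maps YAML integers and booleans back to xsd:integer and xsd:boolean. Integers outside the JS safe range are left as strings.

```typescript
const XSD = 'http://www.w3.org/2001/XMLSchema#';

/**
 * Hypothetical: turn a producer-defined literal into a YAML-friendly value,
 * keeping xsd:integer and xsd:boolean as native YAML scalars. Anything else
 * (including unsafe integers) stays a string.
 */
export function typedExtraValue(lexical: string, datatype?: string): string | number | boolean {
  if (datatype === `${XSD}integer`) {
    const n = Number(lexical);
    return /^[+-]?\d+$/.test(lexical.trim()) && Number.isSafeInteger(n) ? n : lexical;
  }
  if (datatype === `${XSD}boolean`) {
    // xsd:boolean's lexical space is exactly: true, false, 1, 0.
    if (lexical === 'true' || lexical === '1') return true;
    if (lexical === 'false' || lexical === '0') return false;
  }
  return lexical;
}
```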

}
});

it('pubNumber is a valid single OKF segment (no slashes)', () => {

🔴 Bug: This test is named as coverage for pubNumber, but it never calls the exported helper; it only inspects a path produced by generatePatentBundle and then voids pubNumber. That gives false confidence for the documented downstream recompute API. Call pubNumber(resolvedOpts, i) directly and assert the exact publication number/segment shape, ideally also comparing it to the generated file path.

// Substance lives ONLY here; the public discoverability signal is a
// separate, gated step. No TRAC, no on-chain anything.
if (isPrivate) {
const allQuads = imported.concepts.flatMap((c) =>

🟡 Issue: This private-import path is framed as the large-corpus mode, but it still builds BundleFile[], a full BundleImport, and then a second allQuads array before chunking. That makes the chunking loop an implementation detail after the memory spike, and it keeps the CLI coupled to the mapper's whole-bundle representation. Can we move this behind a streaming/iterable OKF import abstraction that yields per-concept or per-chunk quads directly to the writer? That would let the large/private mode be the simple path instead of a special branch that duplicates the entire graph in memory.

🟡 Issue: The --private path is described as suitable for 100k-10M corpora, but it still materializes the whole bundle and graph before writing: loadBundleDir/importBundle build all files/concepts/quads, then this line builds a second allQuads array. That makes the new streaming/resumable mode depend on a non-streaming mapper boundary and will push future scale fixes into more CLI special cases. A cleaner structure would expose a two-pass import API that indexes concept ids first, then yields mapped concepts or graph-qualified chunks one at a time, so the CLI can write/resume without holding the full corpus twice.

🟡 Issue: The private bulk path is documented as suitable for 100k-10M corpora, but the implementation eagerly materializes the whole bundle (loadBundleDir), the full BundleImport, and later a second allQuads array before it can write the first chunk. Structurally this fights the streaming ip-oracle generator and makes the importer scale by holding the entire corpus in memory. A cleaner design would split OKF import into an index pass plus an iterable per-concept mapping/writer so --private can stream chunks after indexing instead of flattening everything up front.

🟡 Issue: Bulk private import is bolted onto an in-memory mapper instead of a streaming import abstraction

What's wrong
The implementation advertises private bulk import as the tractable path for very large corpora, but the structure still routes it through the package's all-in-memory bundle API and then duplicates the quads into a second CLI-owned array. That couples the scalability story to a special branch in the command and makes the next scale fix much harder because the wrong layer owns chunking and traversal.

Example
A future large private corpus path has to pass through BundleFile[], BundleImport, concepts[], quads[], and then allQuads[] before the first sharedMemoryWrite, even though the surrounding code presents this mode as the scalable 100k-10M path.

Suggested direction
Move the scaling boundary into @origintrail-official/dkg-okf: expose a deterministic concept iterator or import-plan API, then have both per-concept and private-SWM modes consume that same stream through a shared chunk writer. That deletes the special allQuads staging layer and makes the large-corpus story structurally true.

For Agents
Look at packages/cli/src/commands/okf.ts import handling and packages/okf/src/bundle.ts. Preserve the existing dry-run, per-concept, and private-SWM behavior, but introduce an OKF import plan/iterator that can yield per-concept mapped quads and chunked graph quads without requiring the CLI to materialize the entire corpus. Prove the refactor with existing golden tests plus a focused test/fake writer that observes chunk emission order.

🟡 Issue: Make private bulk import chunk-native instead of rematerializing the full graph

What's wrong
The implementation advertises and scaffolds large private corpus workflows, but the orchestration is still an eager in-memory batch. That is a structural mismatch: retry, manifest, progress, and memory behavior all depend on one big array instead of a chunk-producing abstraction.

Example
For a large generated patent corpus, the generator can write concepts incrementally, but dkg okf import --private still needs the entire BundleFile[], BundleImport, and allQuads array resident before chunk 1 is sent. The chunk writer and manifest cannot become truly streaming while they are keyed to indexes in this derived array.

Suggested direction
Move the chunking boundary down into the OKF import layer: index the bundle once, then yield mapped concept quads or fixed-size quad chunks to the CLI writer. That would delete allQuads, keep manifest state closer to chunks, and align the importer with the large-corpus API this PR introduces.

For Agents
Look at packages/okf/src/loader.ts, packages/okf/src/bundle.ts, and the --private branch in packages/cli/src/commands/okf.ts. Introduce an iterator/streaming import surface or a chunk-producing mapper that still preserves the two-pass link-index invariant, then have the private writer consume chunks directly. Existing dry-run and per-concept behavior should remain unchanged.
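Below is a minimal sketch of a chunk-producing surface that keeps the two-pass invariant. It assumes the index pass can cheaply collect every concept ID first, and that concepts can then be loaded and mapped one at a time. `mapConceptsLazily` and `chunked` are hypothetical, not the package's current API. With them, the private writer could consume `chunked(mapConceptsLazily(index, load, map), 5000)` and key its manifest on chunk number.

```typescript
/** Yield fixed-size arrays from any iterable; the last chunk may be shorter. */
export function* chunked<T>(source: Iterable<T>, size: number): Generator<T[]> {
  if (!Number.isInteger(size) || size < 1) throw new RangeError('size must be a positive integer');
  let buf: T[] = [];
  for (const item of source) {
    buf.push(item);
    if (buf.length === size) {
      yield buf;
      buf = [];
    }
  }
  if (buf.length > 0) yield buf;
}

/**
 * Hypothetical second pass: the concept-ID index from pass one is complete
 * before any concept is mapped, so link resolution still sees every target,
 * but concepts are loaded and mapped one at a time. Set iteration follows
 * insertion order, so building the index in sorted order keeps output stable.
 */
export function* mapConceptsLazily<C, Q>(
  index: ReadonlySet<string>,
  load: (conceptId: string) => C,
  map: (concept: C, index: ReadonlySet<string>) => readonly Q[],
): Generator<Q> {
  for (const conceptId of index) yield* map(load(conceptId), index);
}
```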

}

// Rebuild a minimal BundleImport for the exporter.
const concepts = [...quadsBySubject.entries()]

🟡 Issue: The export command is rebuilding a fake BundleImport in the CLI by grouping raw SPARQL bindings by subject prefix and filling the rest of ConceptMapping with empty arrays. That leaks the mapper's internal model into the command layer and duplicates concept-identification rules outside @origintrail-official/dkg-okf. A cleaner boundary would be an OKF package helper such as bundleImportFromQuads(quads, { iriBase }) / conceptsFromQuads(...) that owns how graph quads become exportable concepts, leaving the CLI as transport only.

🟡 Issue: CLI export rebuilds OKF internals instead of asking the OKF package to own the boundary

What's wrong
This leaks mapper internals across the CLI/package boundary. The command now knows about section genid encoding, concept-root filtering, and the exact BundleImport shape, but those are implementation details of @origintrail-official/dkg-okf. That makes export brittle and forces future mapper changes to be coordinated in two places.

Example
If the OKF package changes section skolemization or adds another non-concept subject shape, the CLI export path must be updated in lockstep even though packages/okf is the package that owns those invariants.

Suggested direction
Collapse this into a package-owned adapter: the CLI should fetch quads and pass them to an OKF export API, while packages/okf decides which subjects are concepts and which internal section nodes are presentational.

For Agents
Look at packages/cli/src/commands/okf.ts export handling and packages/okf/src/export.ts. Preserve current SPARQL query behavior, but move the Quad[] -> export model reconstruction into @origintrail-official/dkg-okf, e.g. exportBundleFromQuads(quads, { iriBase }) or bundleImportFromConceptQuads. Add a unit test in packages/okf for excluding section genid subjects.

…ndle

The live 10k run surfaced a persistent, predicate-dependent shortfall in the
node's SWM bulk loose-write/index (e.g. rdf:type 9943 vs 10000) that does not
self-heal on settle — while the deterministic mapping provably emits exactly
10000/10000/10000. Without a check, an oracle would silently undercount.

`dkg okf verify <bundleDir> --context-graph-id <cg>` maps the bundle offline and
re-queries the CG, comparing actual vs expected triple counts per integrity
predicate (rdf:type/source/license/citation/mentions), lists concept IRIs
missing their rdf:type (--list-missing), and exits non-zero on any shortfall so
it can gate a pipeline. Remediation: re-run the idempotent import (second
loose-write pass; the store dedupes).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01MtRTg9G7j9fzP1pG8o843a
`Created ${isPrivate ? 'private (invite-only) ' : ''}Context Graph "${contextGraphId}"` +
(isPrivate && allowedPeers.length ? ` with ${allowedPeers.length} allowlisted peer(s).` : '.'),
);
} else if (isPrivate && allowedPeers.length) {

🔴 Bug: For an existing context graph, --private does not verify that the graph is actually invite-only/private before writing the corpus to SWM. If the user reuses a public CG with --private, the command still bulk-writes the private patent/OKF substance into that graph, which defeats the privacy contract of the option. Refuse existing non-private graphs or fetch/validate the access policy before entering private mode.

'http://schema.org/mentions',
];

okfCmd

🟡 Issue: dkg okf verify is introduced as the completeness gate for private bulk imports and controls success vs non-zero exit, but there is no test exercising its predicate counts, shortfall detection, count parsing, or --list-missing behavior. Because this command is itself the validation evidence operators would rely on after import, add mocked-client CLI tests for both complete and incomplete graphs so a regression cannot report complete: true for an undercounted Context Graph.

🟡 Issue: The new okf verify gate lacks tests proving it catches missing triples

What's wrong
This command is itself the validation evidence operators are expected to trust after a private bulk import. If its count parsing, daemon result-shape handling, or exit-code path regresses, it can report an incomplete graph as complete or fail to gate a pipeline, and the current tests would not catch that.

Example
Mock client.query so rdf:type returns { bindings: [{ c: '"1"^^<http://www.w3.org/2001/XMLSchema#integer>' }] } while the local bundle expects two concepts. The command should print complete: false, report one missing triple, and exit with the client-error code.

Suggested direction
Cover the verifier as a behavior test rather than relying on mapper tests: mock query responses and assert both the parsed counts and the non-zero exit path on shortfall.

For Agents
Add CLI tests for okf verify with mocked ApiClient.query: one complete graph, one shortfall, one daemon { bindings: [...] } result without a type discriminator, and --list-missing returning bracketed and unbracketed IRIs. Assert JSON output and exit code behavior.

🟡 Issue: okf verify is added without CLI coverage

What's wrong
The PR introduces a user-facing validation gate intended to catch incomplete private bulk imports, but the added CLI tests do not execute it. That leaves the most validation-oriented command in this change unverified, including its SPARQL count parsing, missing-triple calculation, and failure exit behavior.

Example
A stub daemon could return rdf:type count 1 when the bundle expects 2; the command should print complete: false, report the missing count, and exit with code 2. No current test would fail if this path instead reported success or parsed the daemon binding shape incorrectly.

Suggested direction
Add end-to-end CLI tests against the existing in-process stub so the completeness gate’s count queries, JSON report, and exit codes are verified.

For Agents
Extend packages/cli/test/okf-subcommands.test.ts with daemon /api/query responses for okf verify: at least one complete case, one shortfall case asserting non-zero exit and totalMissingTriples, and preferably one --list-missing case using bracketed IRI bindings. Preserve the existing structural bindingsOf daemon response shape.

🟡 Issue: okf verify lacks regression coverage

What's wrong
This PR adds a user-facing completeness gate for bulk OKF imports, but the changed behavior is not tested. A regression in the SPARQL count parsing or failure exit path could make an incomplete graph look verified, which is exactly the validation evidence this command is meant to provide.

Example
Failing-test sketch: stub /api/query to return daemon-shaped { result: { bindings: [{ c: '"0"^^<http://www.w3.org/2001/XMLSchema#integer>' }] } } for a bundle that expects rdf:type rows, then assert dkg okf verify <bundle> --context-graph-id cg exits non-zero and reports the shortfall. Add a complete-count case that exits 0 using the same daemon-shaped bindings response.

Suggested direction
Extend the existing in-process daemon CLI tests to exercise the verify subcommand’s count queries, result parsing, JSON output, and non-zero failure path.

For Agents
Add CLI tests in packages/cli/test/okf-subcommands.test.ts for okf verify: complete counts, shortfall exit, structural { bindings: [...] } parsing, and optionally --list-missing. Keep the stub response shaped like the daemon rather than the typed union discriminator.

The daemon's /api/query returns SELECT results as { bindings: [...] } without
the `type: 'bindings'` discriminator the QueryResult union models. Gating on
`result.type === 'bindings'` silently yielded zero rows — which made `okf verify`
report actual:0 for every predicate (a false "everything is missing") against a
graph that in fact held the data, and would likewise make `okf export` reconstruct
an empty bundle. Add a structural `bindingsOf()` reader and use it in both verify
and export. The live data persisted fine (no SWM eviction); this was the bug.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01MtRTg9G7j9fzP1pG8o843a
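Here is a sketch of such a structural reader. It assumes rows arrive as string-valued records. It accepts both the typed `{ type: 'bindings', bindings }` shape and the daemon's bare `{ bindings }`. Unwrapping an outer `{ result: ... }` envelope (the stub shape quoted in the review comment) is an added assumption, not confirmed ApiClient behaviour.

```typescript
type Bindings = Array<Record<string, string>>;

// True for a plain object whose values are all strings (one SELECT row).
function isStringRow(row: unknown): row is Record<string, string> {
  return (
    typeof row === 'object' &&
    row !== null &&
    Object.values(row).every((v) => typeof v === 'string')
  );
}

// Reads SELECT bindings by structure instead of gating on `type === 'bindings'`.
// Non-SELECT or unrecognised shapes yield an empty array.
function bindingsOf(result: unknown): Bindings {
  if (typeof result !== 'object' || result === null) return [];
  const obj = result as Record<string, unknown>;
  const rows = obj.bindings;
  if (Array.isArray(rows)) return rows.filter(isStringRow);
  // Assumption: tolerate a raw HTTP envelope of the form { result: { bindings } }.
  if ('result' in obj) return bindingsOf(obj.result);
  return [];
}
```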
Comment thread packages/okf/src/mapping.ts Outdated

// Sections: every body heading (OKF titles live in frontmatter, so H1s count).
parsed.headings.forEach((text, i) => {
const blank = `_:okfsec_${sanitizeForBlank(doc.conceptId)}_${i}`;

🔴 Bug: sanitizeForBlank is lossy, so valid concept IDs like a/b and a_b both produce the same section blank node label _:okfsec_a_b_0. In one bundle that merges distinct section nodes; after node skolemization the name triple can attach to the wrong root or be lost. Use a collision-free encoding or hash of the full concept ID in the blank-node label.
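Here is one collision-free option, sketched with a hypothetical `encodeForLabel` helper that is not part of dkg-okf. ASCII letters and digits pass through unchanged. Every other UTF-8 byte, including `_` itself, is escaped as `_` plus two hex digits. Because a literal `_` can no longer come from the ID, the encoding is injective.

```typescript
// Injective label encoding: [A-Za-z0-9] pass through; every other UTF-8 byte
// (including '_') becomes '_' + two lowercase hex digits.
function encodeForLabel(id: string): string {
  let out = '';
  for (const byte of new TextEncoder().encode(id)) {
    const isAlnum =
      (byte >= 0x30 && byte <= 0x39) || // 0-9
      (byte >= 0x41 && byte <= 0x5a) || // A-Z
      (byte >= 0x61 && byte <= 0x7a); // a-z
    out += isAlnum ? String.fromCharCode(byte) : '_' + byte.toString(16).padStart(2, '0');
  }
  return out;
}

// The index follows the last '_', so label -> (id, index) stays unambiguous.
function sectionLabel(conceptId: string, index: number): string {
  return `okfsec_${encodeForLabel(conceptId)}_${index}`;
}
```

With this encoding, `a/b` maps to `a_2fb` and `a_b` maps to `a_5fb`. Hashing the full ID would also avoid collisions, but it gives up readable labels.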

Comment thread packages/cli/src/commands/okf.ts Outdated
const client = await ApiClient.connect();
const countFor = async (predicate: string): Promise<number> => {
const sparql =
`SELECT (COUNT(*) AS ?c) WHERE { GRAPH ?g { ?s <${predicate}> ?o } }`;

🔴 Bug: The verify query counts every matching predicate in the Context Graph, regardless of whether the subject belongs to this OKF bundle. Any pre-existing rdf:type, schema:mentions, schema:citation, etc. triples in the same graph can offset missing OKF triples and make complete true on an incomplete import. Constrain the count to subjects under iriBase (and the relevant root entities) when comparing against expectedByPred.

🔴 Bug: Verify can report completeness even when OKF triples are missing

What's wrong
The completeness gate is meant to prove the imported corpus matches the bundle, but graph-wide aggregate counts can be satisfied by unrelated data. That can mask dropped or missing OKF triples and produce a false successful verification result.

Example
A bundle expects one schema:mentions triple. If that triple was dropped during private import but the same context graph already contains one unrelated schema:mentions triple from another corpus, actual still equals expected and complete remains true.

Suggested direction
Restrict verification queries to the bundle’s expected subjects/triples instead of using graph-wide predicate totals.

For Agents
In packages/cli/src/commands/okf.ts verify, scope counts to expected OKF subjects under the chosen iriBase, or preferably compare the expected (s,p,o) triples exactly against the graph. Preserve the current summary shape, but add a test where unrelated same-predicate triples cannot mask a missing OKF triple.

🔴 Bug: Verification can pass by counting unrelated triples

What's wrong
The completeness gate can be fooled by other data in the same Context Graph that uses the same predicates, masking missing OKF import data.

Example
If the target Context Graph already has 100 unrelated rdf:type and schema:mentions triples, and the OKF private import dropped some or all OKF triples, verify can still report complete: true because the unrelated predicate counts satisfy the expected totals.

Suggested direction
Constrain verification queries to the expected OKF subject namespace or exact expected subject set before comparing counts.

For Agents
In okf verify, scope actual counts to OKF concept subjects, for example FILTER(STRSTARTS(STR(?s), iriBase)), and ideally compare per expected concept/predicate. Add a test where unrelated triples with the same predicates are present but an OKF concept triple is missing, and verify must fail.
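Here is a sketch of the namespace-scoped count proposed above, with the `STRSTARTS` filter placed inside the `GRAPH` pattern. `scopedCountQuery` is hypothetical. It rejects predicate IRIs that could break out of `<...>`, and it escapes the IRI base for use in a quoted SPARQL string.

```typescript
// Per-predicate COUNT restricted to subjects under the bundle's IRI base, so
// same-predicate triples from other corpora in the Context Graph are ignored.
function scopedCountQuery(predicate: string, iriBase: string): string {
  // Characters that are not allowed in an IRIREF and could inject SPARQL.
  if (/[<>"{}|^`\\\s]/.test(predicate)) {
    throw new Error(`unsafe predicate IRI: ${predicate}`);
  }
  // Escape backslashes and quotes for a double-quoted SPARQL string literal.
  const base = iriBase.replace(/\\/g, '\\\\').replace(/"/g, '\\"');
  return (
    `SELECT (COUNT(*) AS ?c) WHERE { GRAPH ?g { ` +
    `?s <${predicate}> ?o . FILTER(STRSTARTS(STR(?s), "${base}")) } }`
  );
}
```

A prefix filter still cannot separate two bundles imported under the same `iriBase`. Comparing exact expected triples is the stricter check.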

🔴 Bug: okf verify can pass by counting unrelated triples

What's wrong
The completeness gate compares expected bundle counts against graph-wide predicate counts. In a mixed Context Graph, unrelated data with the same predicates can mask missing OKF triples, producing a false successful verification.

Example
If the bundle expects 10 schema:citation triples but this OKF import dropped all 10, and the same Context Graph already has 10 schema:citation triples from another corpus, verify reports actual: 10 and complete: true even though every expected citation is missing.

Suggested direction
Restrict verification to the expected OKF concept IRIs/quads instead of graph-wide predicate totals.

For Agents
In packages/cli/src/commands/okf.ts verify, scope counts to the imported OKF subjects, preferably by comparing expected quads directly or by filtering subjects to iriBase and checking per-subject/predicate counts. Add a test where unrelated same-predicate triples exist and expected OKF triples are missing.
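For the stricter option of comparing expected triples directly, a set difference is enough once both sides use the same term syntax. `Triple` and `missingTriples` are hypothetical. The sketch leaves out how the actual triples are fetched, for example with a SELECT over the bundle's subjects.

```typescript
// One triple in a shared term syntax, e.g. '<urn:okf:a>' or '"3"^^<...#integer>'.
interface Triple {
  s: string;
  p: string;
  o: string;
}

// Stable key for set membership; JSON avoids ambiguity from separators in terms.
const keyOf = (t: Triple): string => JSON.stringify([t.s, t.p, t.o]);

// Expected triples absent from the graph. Unrelated triples in `actual` are
// simply never looked up, so they cannot mask a missing OKF triple.
function missingTriples(expected: Triple[], actual: Iterable<Triple>): Triple[] {
  const present = new Set<string>();
  for (const t of actual) present.add(keyOf(t));
  return expected.filter((t) => !present.has(keyOf(t)));
}
```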

// Gating on `result.type === 'bindings'` therefore silently yields no rows
// (the bug that made `okf verify` report 0 for everything). Read `bindings`
// structurally instead.
function bindingsOf(result: unknown): Array<Record<string, string>> {

🟡 Issue: This local bindingsOf workaround papers over a query-result boundary problem inside the OKF command. The comment says /api/query returns { bindings } without the discriminator the CLI type expects, but fixing that here leaves every other query caller to rediscover the same structural cast. Please normalize SELECT results in ApiClient.query or a shared query-result helper and have okf export / okf verify use that canonical boundary instead of adding an OKF-specific escape hatch.

🟡 Issue: Normalize query bindings at the API client boundary

What's wrong
The OKF command now owns a low-level daemon response workaround using unknown and casts. That is a boundary leak: query result normalization belongs with the API client/type contract, not in one feature command. Keeping it here invites copy-paste and keeps the real invariant unclear.

Example
dkg okf export calls bindingsOf(result) after client.query, and dkg okf verify repeats the same pattern for count and missing-concept queries. The next query-based command will need to rediscover the same daemon/client mismatch unless the client boundary is fixed.

Suggested direction
Make ApiClient.query return a canonical QueryResult, or add a shared queryBindings helper, instead of parsing daemon result shapes inside the OKF command.

For Agents
Move the response-shape normalization into ApiClient.query or an exported shared query helper near QueryResult. Preserve support for both {type:'bindings', bindings} and {bindings} daemon responses, then update OKF export/verify to consume the typed helper.

// loose-write pass; the store dedupes, so only the dropped triples land).
const RDF_TYPE = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type';
// Predicates we treat as integrity signals (order = report order).
const INTEGRITY_PREDICATES = [

🟡 Issue: verify hard-codes a small, IP-oracle-flavored predicate allowlist in the CLI (source, license, citation, mentions) even though the OKF package already owns the mapping and can compute expected predicates from imported.quads. This leaks mapping policy into the command layer and will drift as soon as OKF adds or changes frontmatter predicates. Please derive the verification rows from the imported graph, or expose an explicit verification policy from @origintrail-official/dkg-okf if only a subset should be checked.
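Here is a minimal sketch of deriving the verification rows from the mapper output instead of a CLI allowlist. The `Quad` shape is an assumption about dkg-okf's output. Each (s, p, o) is counted once, which mirrors the store-side deduplication noted in the diff context above.

```typescript
// Assumed minimal quad shape; the real dkg-okf type may differ.
interface Quad {
  subject: string;
  predicate: string;
  object: string;
  graph?: string;
}

// Expected per-predicate counts computed from whatever the mapping emits,
// so new frontmatter predicates are verified without touching the CLI.
function expectedByPredicate(quads: Iterable<Quad>): Map<string, number> {
  const seen = new Set<string>();
  const counts = new Map<string, number>();
  for (const q of quads) {
    const key = JSON.stringify([q.subject, q.predicate, q.object]);
    if (seen.has(key)) continue; // duplicate triple: the store keeps one copy
    seen.add(key);
    counts.set(q.predicate, (counts.get(q.predicate) ?? 0) + 1);
  }
  return counts;
}
```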

import { join } from 'node:path';
import { streamFor, pick, intBetween } from './prng.js';

export interface BundleFile {

🟡 Issue: ip-oracle redefines BundleFile even though it already depends on @origintrail-official/dkg-okf, where BundleFile is the canonical mapper input contract. This duplicate structural type is small today, but it creates a second boundary that can drift while tests keep passing structurally. Import/re-export type BundleFile from dkg-okf instead so the generator advertises exactly the contract the OKF importer consumes.

@TomazOT

TomazOT commented Jun 27, 2026

I tested this PR against a live DKG v10.0.1 edge node using the bundled crypto_bitcoin fixture.

Offline path works as expected:

  • @origintrail-official/dkg-okf tests pass
  • dkg okf import ... --dry-run --print-nquads reports 5 concepts, 101 triples, 11 resolved links, 10 citations, 0 broken links

Live import currently fails on the first concept write because the mapped section triples use blank nodes as RDF terms:

Invalid "quads[11].object": RDF object must be a quoted literal term or absolute IRI

The failing term is the dkg:hasSection object, e.g. _:okfsec_datasets_crypto_bitcoin_0.

A local compatibility patch that skolemizes OKF section blank nodes into deterministic concept-scoped IRIs before sending them to the daemon allowed the import to complete successfully into SWM. Example shape:

urn:okf:datasets/crypto_bitcoin/.well-known/genid/okfsec_datasets_crypto_bitcoin_0

With that local patch, the live SWM import completed with 5 concepts / 101 triples and SPARQL confirmed the expected 11 schema:mentions edges.

One related export concern: after skolemization/storage, dkg okf export treated section-node IRIs under urn:okf: as OKF concepts and exported extra .well-known/genid/*.md files. Export probably needs to filter to real concept subjects, for example subjects with OKF concept rdf:type, and exclude generated section/skolem nodes.
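The suggested filter can be sketched as follows. `conceptTypes` stands in for the OKF concept class IRI or IRIs, which this thread does not spell out. `urn:example:Concept` in the test is a placeholder. In the CLI itself, `RDF_TYPE` would be imported from `@origintrail-official/dkg-okf` rather than redeclared.

```typescript
const RDF_TYPE = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type';
const GENID_SEGMENT = '/.well-known/genid/';

interface Quad {
  subject: string;
  predicate: string;
  object: string;
}

// Real concept roots only: typed with a concept class, under the IRI base,
// and not a skolemized genid descendant (e.g. a section node).
function conceptRoots(
  quads: Iterable<Quad>,
  iriBase: string,
  conceptTypes: ReadonlySet<string>,
): Set<string> {
  const roots = new Set<string>();
  for (const q of quads) {
    if (
      q.predicate === RDF_TYPE &&
      conceptTypes.has(q.object) &&
      q.subject.startsWith(iriBase) &&
      !q.subject.includes(GENID_SEGMENT)
    ) {
      roots.add(q.subject);
    }
  }
  return roots;
}
```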

…ict daemons

Maintainer review (TomazOT, live v10.0.1 edge node) found the live import fails on
the first concept write because `dkg:hasSection` used RDF blank nodes as objects:

    Invalid "quads[11].object": RDF object must be a quoted literal term or absolute IRI

The daemon requires object terms to be quoted literals or absolute IRIs. Skolemize
section nodes deterministically into concept-scoped IRIs
`<conceptIri>/.well-known/genid/okfsec_<id>_<n>` (the node's own genid scheme, so
the stored graph is identical) instead of emitting `_:` blank nodes. The mapper now
emits zero blank-node terms, so live import succeeds without the reviewer's local
patch. Counts are unchanged (crypto_bitcoin still 5 concepts / 101 triples).

Coupled export fix: the CLI export reconstructed concepts by `iri.startsWith(iriBase)`
alone, which also matched skolemized `.well-known/genid/...` section subjects and
rebuilt them as standalone `.well-known/genid/*.md` files that were never concepts.
Keep only real concept roots — subjects under the IRI base that carry an OKF concept
rdf:type, excluding genid descendants.

Tests: new mapping regression (no blank-node terms; deterministic, stable section
IRIs) and updated round-trip projection. 68 okf tests pass; build + typecheck green.

Closes the live-import blocker in OriginTrail#1331 (review by @TomazOT).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01MtRTg9G7j9fzP1pG8o843a
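The skolemization and its regression check can be sketched as below. `sectionIri` and `hasBlankNode` are hypothetical helpers. The IRI shape is the one quoted in the commit message. The blank-node check assumes terms are bare IRIs or quoted literals, so only blank nodes start with `_:`.

```typescript
interface Quad {
  subject: string;
  predicate: string;
  object: string;
  graph?: string;
}

// Deterministic, concept-scoped section IRI:
// <conceptIri>/.well-known/genid/okfsec_<id>_<n>
function sectionIri(conceptIri: string, labelId: string, index: number): string {
  return `${conceptIri}/.well-known/genid/okfsec_${labelId}_${index}`;
}

// True if any term is still an RDF blank node, which the strict daemon rejects.
function hasBlankNode(q: Quad): boolean {
  return [q.subject, q.object, q.graph ?? ''].some((t) => t.startsWith('_:'));
}
```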
@Zigoljube

Author

Thanks @TomazOT — excellent catch, and thanks for taking it all the way through a live v10.0.1 import. Your diagnosis was exactly right on both counts. Fixed in 469eba2:

Blank-node section objects → deterministic skolem IRIs. The mapper no longer emits _: blank nodes for dkg:hasSection. Section nodes are skolemized into concept-scoped IRIs using the node's own scheme — <conceptIri>/.well-known/genid/okfsec_<id>_<n> — so the stored graph is identical to what the node produces, and the first concept write no longer fails on a strict daemon. The mapper now emits zero blank-node terms (verified on the crypto_bitcoin fixture), so the local skolemization patch is no longer needed.

Export re-creating .well-known/genid/*.md. Export now selects only real concept roots — subjects under the IRI base that carry an OKF concept rdf:type, excluding /.well-known/genid/ descendants — so skolemized section nodes are no longer rebuilt as standalone concept files.

Counts are unchanged (crypto_bitcoin still 5 concepts / 101 triples / 11 schema:mentions / 10 citations / 0 broken), with a new regression test asserting no blank-node terms plus stable, deterministic section IRIs (68 okf tests pass).

Working through the rest of your review next: the import → import --share manifest reuse (so --share doesn't skip finalize/share) and typed-literal fidelity.

Two correctness bugs from the automated review on OriginTrail#1331:

1. import → import --share skipped finalize/share. The resumability manifest
   recorded a flat "done" set, so the documented WM-then-share flow saw every
   concept as done and continued past it before finalize/share ran — reporting
   memoryLayer:"SWM" with nothing actually shared. Track the furthest STAGE per
   concept ('wm' = created+written, 'swm' = finalized+shared); a later --share now
   advances WM concepts to SWM instead of skipping them. Legacy "done" manifests
   are read as stage 'wm'; the per-concept manifest is namespaced (mode:
   "per-concept") so it can't be confused with the bulk-private-swm manifest. The
   summary now reports assetsCreated and assetsShared.

2. Typed producer-key scalars lost their datatype on export. `count: 3` imports as
   "3"^^xsd:integer but export kept only the lexical "3", so re-import produced a
   plain string literal. Export now recovers native integer/decimal/boolean values
   so the datatype round-trips. New test asserts xsd:integer/decimal/boolean
   survive import → export → import.

69 okf tests pass; CLI build + typecheck green; crypto_bitcoin dry-run unchanged
(5 concepts / 101 triples / 11 links / 10 citations).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01MtRTg9G7j9fzP1pG8o843a
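Here is a sketch of the export-side recovery in item 2. It assumes object terms reach the exporter in the `"lexical"^^<datatype>` form seen in the daemon bindings quoted earlier. `nativeValue` is hypothetical, and any other term is returned unchanged.

```typescript
const XSD = 'http://www.w3.org/2001/XMLSchema#';

// A typed literal term: "lexical"^^<datatypeIri>
const TYPED_LITERAL = /^"((?:[^"\\]|\\.)*)"\^\^<([^>]+)>$/;

// Native JS value for xsd:integer / xsd:decimal / xsd:boolean, so frontmatter
// can be written as `count: 3` rather than the string "3".
function nativeValue(term: string): string | number | boolean {
  const m = TYPED_LITERAL.exec(term);
  if (!m) return term;
  const lexical = m[1] ?? '';
  const datatype = m[2] ?? '';
  switch (datatype) {
    case `${XSD}integer`: {
      const n = Number(lexical);
      // Keep integers outside the safe range as text rather than lose precision.
      return Number.isSafeInteger(n) ? n : lexical;
    }
    case `${XSD}decimal`: {
      const n = Number(lexical);
      return Number.isFinite(n) ? n : lexical;
    }
    case `${XSD}boolean`:
      if (lexical === 'true' || lexical === '1') return true;
      if (lexical === 'false' || lexical === '0') return false;
      return lexical;
    default:
      return term;
  }
}
```

One caveat applies to whole-number decimals. A value such as `"1.0"` becomes the JS number 1, which a YAML writer may emit as an integer. Preserving `xsd:decimal` in that case needs extra handling that is not shown here.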
// down to a floor, so a too-big batch degrades instead of failing.
const writeSlice = async (slice: typeof allQuads): Promise<void> => {
try {
const res = await client.sharedMemoryWrite(contextGraphId, slice);

🔴 Bug: Private imports ignore the requested subgraph

What's wrong
The user-facing option is accepted but silently ignored in bulk private mode, so data lands in a different graph partition than requested.

Example
Run dkg okf import ./bundle --context-graph-id cg --private --sub-graph-name patents. The command reports success, but it writes to the default SWM graph. A later query/export scoped to subgraph patents will not find the imported corpus.

Suggested direction
Thread subGraphName through sharedMemoryWrite just like the non-private KA write/finalize/share path does.

For Agents
Update the private import path and ApiClient.sharedMemoryWrite to carry subGraphName, and prove with a CLI/client test that --private --sub-graph-name x writes to the subgraph-specific shared-memory graph and is queryable with the same subgraph option.

🔴 Bug: --private can drop triples when one concept spans chunks

What's wrong
The private import path slices a flat quad stream every 5,000 quads. Shared-memory writes are root-entity upserts, so a later slice containing the same concept can replace the earlier slice rather than append to it. The command can report a completed import while SWM has silently lost part of a large concept.

Example
A single OKF concept that maps to 5,200 quads is split across chunk 1 and chunk 2. Chunk 1 writes the first 5,000 triples for urn:okf:large; chunk 2 contains the remaining 200 triples for the same subject. The second /api/shared-memory/write replaces the existing root entity, so SWM ends up with only the tail of the concept instead of all 5,200 triples.

Suggested direction
Chunk by complete concept/root-entity groups, and handle an oversized single concept with a payload-splitting mechanism that preserves append semantics instead of replacing prior triples.

For Agents
Fix packages/cli/src/commands/okf.ts private import chunking so a root entity/concept is never split across separate replacement writes, or use an append/update path that does not delete earlier triples for the same root. Add a regression with one concept larger than CHUNK and prove all its triples remain after --private import.
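Here is a sketch of root-preserving chunking. It assumes the caller has already grouped quads by root entity, for example one group per OKF concept including its genid section nodes. `chunkByRoot` is hypothetical. `limit` plays the role of the 5,000-quad `CHUNK`. An oversized group is flagged rather than split, because it needs an append-style write.

```typescript
interface Quad {
  subject: string;
  predicate: string;
  object: string;
}

interface Chunk {
  quads: Quad[];
  oversized: boolean; // one root larger than `limit`; needs an append-style write
}

// Packs whole root-entity groups into chunks of at most `limit` quads, so no
// root is ever split across two replacement writes.
function chunkByRoot(groups: Map<string, Quad[]>, limit: number): Chunk[] {
  const chunks: Chunk[] = [];
  let current: Quad[] = [];
  for (const quads of groups.values()) {
    if (quads.length > limit) {
      // Emit an oversized root alone instead of splitting it.
      chunks.push({ quads, oversized: true });
      continue;
    }
    if (current.length + quads.length > limit) {
      // Close the current chunk before this group would overflow it.
      chunks.push({ quads: current, oversized: false });
      current = [];
    }
    current = current.concat(quads);
  }
  if (current.length > 0) chunks.push({ quads: current, oversized: false });
  return chunks;
}
```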

🔴 Bug: OKF sub-graph imports/exports target the wrong API path

What's wrong
The command advertises sub-graph support, but private imports silently land in the root shared-memory graph and exports with a sub-graph cannot run through the current query API shape. That can make data appear missing or publish/share the wrong graph partition.

Example
dkg okf import ./bundle --context-graph-id cg --sub-graph-name research --private reports a private SWM import, but writes to did:dkg:context-graph:cg/_shared_memory rather than did:dkg:context-graph:cg/research/_shared_memory. Then dkg okf export cg ./out --sub-graph-name research is rejected by the query layer because subGraphName is combined with view: 'shared-working-memory'.

Suggested direction
Thread subGraphName through the SWM write API and adjust export routing, or fail fast when sub-graphs are unsupported for those modes.

For Agents
In packages/cli/src/commands/okf.ts, either reject --sub-graph-name for private import/export modes or wire it through correctly: extend/use sharedMemoryWrite with subGraphName, and use a query path compatible with sub-graph SWM instead of view+subGraphName. Add CLI tests against the real ApiClient request body and query-engine behavior, not only the permissive stub.
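Both options can be sketched as below. The graph IRI shapes are copied from the example in the comment. That the daemon derives them exactly this way is only an inference from that example. `sharedMemoryGraphIri`, `assertSubGraphSupported` and the error text are all hypothetical.

```typescript
// Shared-memory graph IRI for a Context Graph, optionally scoped to a sub-graph,
// matching the two shapes quoted in the review comment.
function sharedMemoryGraphIri(contextGraphId: string, subGraphName?: string): string {
  const root = `did:dkg:context-graph:${contextGraphId}`;
  return subGraphName ? `${root}/${subGraphName}/_shared_memory` : `${root}/_shared_memory`;
}

type ImportMode = 'per-concept' | 'bulk-private-swm';

// Fail-fast alternative: refuse --sub-graph-name in a mode that cannot honour it.
function assertSubGraphSupported(mode: ImportMode, subGraphName?: string): void {
  if (subGraphName && mode === 'bulk-private-swm') {
    throw new Error('--sub-graph-name is not supported with --private');
  }
}
```

A CLI test can then assert that the captured request targets the sub-graph IRI rather than the root `_shared_memory` graph.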

🔴 Bug: Resume manifests can skip imports for a different target or mapping

What's wrong
The resume logic treats a previous import into the same context graph as equivalent even when the current run targets a different sub-graph or produces different quads. The command can therefore report a successful import while writing nothing to the requested target.

Example
Run dkg okf import ./bundle --context-graph-id cg --sub-graph-name a, then run dkg okf import ./bundle --context-graph-id cg --sub-graph-name b. The second run loads the first manifest, sees every concept at wm, skips all writes, and reports success with zero assets created in sub-graph b. The same stale-skip pattern applies after changing mapping options such as --include-code-span-links or --relate.

Suggested direction
Include the destination and mapping options in the manifest identity, or ignore the manifest when those values differ.

For Agents
Scope OKF import manifests to all inputs that affect the destination or generated quads: at minimum contextGraphId, subGraphName, import mode, iriBase, code-span policy, type relations, and ideally a bundle/content fingerprint. Tests should prove that importing the same bundle into two sub-graphs writes both, while import followed by import --share for the same target still advances WM to SWM.
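One way to scope the manifest is to store a fingerprint of every input listed above and ignore any manifest whose fingerprint differs. `ImportIdentity`, its field names and `manifestFingerprint` are hypothetical. `--share` is deliberately left out, so that import followed by import --share into the same target still resumes and advances WM to SWM.

```typescript
import { createHash } from 'node:crypto';

// Inputs that change the destination or the generated quads.
interface ImportIdentity {
  contextGraphId: string;
  subGraphName?: string;
  mode: 'per-concept' | 'bulk-private-swm';
  iriBase: string;
  includeCodeSpanLinks: boolean;
  relate: boolean;
  bundleDigest: string; // e.g. a hash over the bundle's file paths and contents
}

// Fixed field order keeps the serialisation, and therefore the hash, stable.
function manifestFingerprint(id: ImportIdentity): string {
  const canonical = JSON.stringify([
    id.contextGraphId,
    id.subGraphName ?? null,
    id.mode,
    id.iriBase,
    id.includeCodeSpanLinks,
    id.relate,
    id.bundleDigest,
  ]);
  return createHash('sha256').update(canonical).digest('hex');
}
```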

* `dkg okf` — ingest a Google Open Knowledge Format (OKF) bundle into the DKG as
* verifiable, owned Knowledge Assets (reconstructing the cross-concept link
* graph), and serialise a Context Graph back into a conformant OKF bundle.
*

🟡 Issue: Split the monolithic OKF command before it becomes the CLI catch-all

What's wrong
This new command grows as one 690-line function with many nested helpers and three substantial workflows. That makes the implementation harder to scan, harder to test without invoking commander, and easier to accidentally couple: private import manifest logic, per-concept sharing, export reconstruction, and verify all now share one scope. The code quality problem is structural rather than stylistic: the command layer is carrying orchestration and reusable policies that should be named and isolated.

Example
A reader trying to change the manifest format has to reason about commander option wiring, node API calls, chunk retry policy, SWM sharing, export SPARQL, and verify predicates in the same lexical scope.

Suggested direction
Keep registerOkfCommand as wiring only and move the three subcommand implementations plus shared manifest/chunk/query helpers behind named functions. The code already has natural seams: offline mapping, context graph setup, private bulk write, per-concept promotion, export query/rebuild, and verify counts.

For Agents
Split packages/cli/src/commands/okf.ts into a thin commander registration plus focused helpers/modules, e.g. okf-import.ts, okf-export.ts, okf-verify.ts, and a small manifest/chunk writer helper. Preserve all existing CLI flags and output shapes; smoke-test dkg okf import --dry-run, import path registration, export wiring, and verify wiring.

🟡 Issue: Keep OKF command registration from becoming the orchestration layer

What's wrong
This command is described as a thin wrapper, but the new handler centralizes multiple responsibilities: bundle mapping, context-graph creation, private bulk streaming, per-concept KA lifecycle, resumability, export query normalization, and verification. That makes the command file a growing coordination knot instead of a stable boundary, and future modes will add more branching to the same closure.

Example
A small future change such as adding another import mode or changing resumability now has to touch option parsing, manifest shape checks, node orchestration, progress output, and summary generation inside one large nested handler.

Suggested direction
Extract the import/export/verify workflows and their helpers into separate modules or pure functions, leaving this file as a thin CLI surface.

For Agents
Refactor packages/cli/src/commands/okf.ts so registerOkfCommand only declares Commander commands. Move implementation into focused units such as runOkfImport, runOkfExport, runOkfVerify, and small helpers for graph-bound quads and summaries. Preserve CLI output and exit codes; cover with existing okf-subcommands tests.

🟡 Issue: Decompose the okf command before it becomes the implementation layer

What's wrong
The PR says the CLI is a thin wrapper over @origintrail-official/dkg-okf, but the wrapper now owns several independent abstractions. That makes future changes harder because unrelated concepts share one large lexical scope and helper set.

Example
A change to the manifest format or retry policy now requires editing the same Commander callback that also owns option declarations and user-facing output. The export and verify paths add their own SPARQL/result-shaping logic in the same closure, so the command file is becoming the implementation layer rather than a thin CLI wrapper.

Suggested direction
Keep registerOkfCommand mostly declarative: parse options, call a typed runner, print the runner result. Move manifest/state-machine logic and node API orchestration into small modules that can be read and tested independently.

For Agents
Split packages/cli/src/commands/okf.ts into command registration plus focused runners such as okf/import-runner.ts, okf/export-runner.ts, okf/verify-runner.ts, and shared helpers for manifest I/O, query bindings, and exit/error handling. Preserve the existing CLI behavior and keep current CLI subcommand tests passing against the same daemon stub.

for (const f of outFiles) {
const full = join(outDir, f.path);
await mkdir(dirname(full), { recursive: true });
await writeFile(full, f.content, 'utf-8');

🟡 Issue: Do not redeclare a canonical RDF constant inside the command

What's wrong
This shadows the package-level RDF_TYPE import with a second same-named constant inside an already large function. The value happens to match today, but it creates two sources of truth for a vocabulary term this PR otherwise tries to centralize in packages/okf/src/constants.ts. That undercuts the boundary cleanliness of the new OKF package and makes later vocabulary changes harder to reason about.

Example
The export action's concept filter uses RDF_TYPE, while verify later uses the shadowing local constant with the same value. A future change to the package constant or one call site can silently diverge because the names look canonical but are not the same binding.

Suggested direction
Use the imported RDF_TYPE from @origintrail-official/dkg-okf for verify as well. If verify needs its own integrity predicate list, compose it from package constants rather than string literals.

For Agents
Remove the local RDF_TYPE declaration in packages/cli/src/commands/okf.ts and reuse the imported constant everywhere. Check the verify SPARQL and export concept filtering still compile and produce the same predicate string.

await client.createContextGraph(
contextGraphId,
contextGraphId,
undefined,

🟡 Issue: Reframe the two import modes around a shared import plan

What's wrong
The importer currently forks into two large orchestration paths with different manifest models and separate quad preparation. The private path is not just a different sink; it introduces a parallel import model in the middle of the command. That makes the command harder to extend because shared concepts like graph decoration, chunking, resumability, and reporting are scattered across mode-specific branches instead of owned by one import abstraction.

Example
A future option that should apply to both import modes, such as changing graph assignment, retry reporting, or resumability metadata, has to be implemented twice across two different state models: concept stages for normal import and chunk indexes for private import.

Suggested direction
Build a single typed import plan once, then have small executors consume it for per-concept KA writes or bulk private SWM writes. That would keep mode-specific node calls isolated without duplicating mapping, graph decoration, manifest policy, and progress accounting in the command body.

Confidence note
This is a design concern from the diff shape; whether private loose-quad import must intentionally diverge from KA import may need author confirmation.

For Agents
Extract a shared import plan from imported.concepts first, with typed steps for conceptId, kaName, quads, and target graph. Then implement separate executors only for the real transport difference: KA writes versus loose SWM writes. Preserve --private output and resumability behavior; verify dry-run summaries and both manifest formats continue to work or migrate explicitly.

🟡 Issue: Model the OKF import manifest instead of parsing two ad-hoc shapes inline

What's wrong
Resumability is part of the import state model, but the command encodes it as two unrelated JSON casts against the same file. This spreads mode strings, legacy migration, corruption handling, and persistence formatting through the orchestration path, making the resume behavior harder to reason about than it needs to be.

Example
The same default manifest path can hold either { mode: 'bulk-private-swm', chunksDone } or { mode: 'per-concept', stages }; each branch hand-rolls JSON parsing, filtering, and migration rules separately.

Suggested direction
Extract manifest loading/saving into a small typed store with an explicit discriminated union for bulk-private-swm and per-concept.

For Agents
Introduce a typed OkfImportManifest discriminated union plus loadOkfManifest / saveOkfManifest helpers. Keep legacy done migration in that helper. Both import modes should receive a typed progress object instead of parsing and validating JSON inline.
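Here is a minimal sketch of that store's parsing half. It uses the two manifest shapes quoted in the comment plus the legacy `done` array. `OkfImportManifest` follows the comment's suggested name. `parseOkfManifest`, and the choice to return `null` for unreadable files, are assumptions.

```typescript
type Stage = 'wm' | 'swm';

type OkfImportManifest =
  | { mode: 'bulk-private-swm'; contextGraphId: string; chunksDone: number[] }
  | { mode: 'per-concept'; contextGraphId: string; stages: Record<string, Stage> };

// Parses raw JSON into the union. Legacy { contextGraphId, done: [...] } files
// (no mode) migrate to per-concept stage 'wm'. Anything else yields null, so
// the caller starts fresh instead of trusting a corrupt manifest.
function parseOkfManifest(raw: string): OkfImportManifest | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== 'object' || data === null) return null;
  const d = data as Record<string, unknown>;
  const contextGraphId = d.contextGraphId;
  if (typeof contextGraphId !== 'string') return null;

  const rawChunks = d.chunksDone;
  if (d.mode === 'bulk-private-swm' && Array.isArray(rawChunks)) {
    const chunksDone = rawChunks.filter((n): n is number => Number.isInteger(n));
    return { mode: 'bulk-private-swm', contextGraphId, chunksDone };
  }

  const rawStages = d.stages;
  if (d.mode === 'per-concept' && typeof rawStages === 'object' && rawStages !== null) {
    const stages: Record<string, Stage> = {};
    for (const [id, s] of Object.entries(rawStages)) {
      if (s === 'wm' || s === 'swm') stages[id] = s;
    }
    return { mode: 'per-concept', contextGraphId, stages };
  }

  const done = d.done;
  if (d.mode === undefined && Array.isArray(done)) {
    const stages: Record<string, Stage> = {};
    for (const id of done) if (typeof id === 'string') stages[id] = 'wm';
    return { mode: 'per-concept', contextGraphId, stages };
  }
  return null;
}
```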

stages?: Record<string, Stage>;
done?: string[]; // legacy format → treat as reached 'wm'
};
if (prev.contextGraphId === contextGraphId && prev.mode !== 'bulk-private-swm') {

🟡 Issue: The resumable import/share state machine is not verified at the CLI boundary

What's wrong
The mapper has good unit coverage, but the risky behavior here is in the CLI wrapper: it decides whether to create/write assets, finalize/share them, and persist stage state. Without a test, a regression can make the documented import then import --share workflow silently skip sharing or re-create assets while the mapper tests still pass.

Example
A CLI test could seed .okf-import-manifest.json with { "contextGraphId": "cg", "done": ["a"] }, run okf import <bundle> --context-graph-id cg --share, and assert that knowledgeAssetFinalize and knowledgeAssetShare are called for concept a while createKnowledgeAsset/knowledgeAssetWrite are not.

Suggested direction
Add command-level tests for fresh import, import followed by --share, legacy done manifest upgrade, and already-shared skip behavior. These tests do not need a live node; a mocked ApiClient is enough to prove the API call sequence and manifest updates.

For Agents
Add focused CLI tests around registerOkfCommand in packages/cli/test, mocking ApiClient.connect and using a tiny OKF bundle. Preserve the intended stage semantics: prior WM imports and legacy done manifests must advance to SWM on --share, while already-swm stages are skipped.
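Before mocking `ApiClient`, the stage semantics can be pinned down as a pure function, which also spells out the call sequence the CLI tests should expect. This is a hypothetical sketch (`plannedCalls` is not in the PR). The call names are only labels mirroring the ones the review lists.

```typescript
// Hypothetical resume decision: given a concept's recorded stage and whether
// --share was passed, which API calls are needed and what stage results.
type Stage = "wm" | "swm";
type ApiCall =
  | "createKnowledgeAsset"
  | "knowledgeAssetWrite"
  | "knowledgeAssetFinalize"
  | "knowledgeAssetShare";

function plannedCalls(current: Stage | undefined, share: boolean): { calls: ApiCall[]; next: Stage | undefined } {
  const calls: ApiCall[] = [];
  let next = current;
  if (current === undefined) {
    // Fresh concept: create + write puts it in Working Memory.
    calls.push("createKnowledgeAsset", "knowledgeAssetWrite");
    next = "wm";
  }
  if (share && next === "wm") {
    // WM (including legacy `done` entries) advances to SWM instead of being skipped.
    calls.push("knowledgeAssetFinalize", "knowledgeAssetShare");
    next = "swm";
  }
  // Already at swm: nothing to do.
  return { calls, next };
}
```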

Addresses the remaining automated-review items on OriginTrail#1331 (API/test
quality):

- pubNumber no longer exposes the private ResolvedOpts shape. The public helper
  now takes PatentGenOptions and resolves defaults internally (internal callers
  use a private pubNumberAt). The previously-fake test (which void'd pubNumber and
  only inspected a generated path) now calls pubNumber(OPTS, i) directly and
  asserts it recomputes the exact publication number for every index.

- New CLI-level tests (test/okf-subcommands.test.ts) drive the compiled CLI against
  an in-process daemon stub (no hardhat; added to the fast unit lane). They lock
  down behaviours the pure mapper tests can't: --dry-run NEVER contacts the node;
  WM import creates KAs + writes without finalize/share; import then import --share
  ADVANCES WM→SWM instead of skipping (regression for the manifest-stage fix);
  --private bulk-streams via /api/shared-memory/write with a chunked manifest; and
  export filters skolemized .well-known/genid section nodes (regression for the
  export fix).

All green: okf 69, ip-oracle 7, okf CLI 5.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01MtRTg9G7j9fzP1pG8o843a
Comment thread packages/cli/src/cli.ts
registerPublisherCommand(program);
registerEpcisCommand(program);
registerOkfCommand(program);
registerIpOracleCommand(program);


🟡 Issue: Do not promote the synthetic IP oracle harness into the core CLI by default

What's wrong
The PR adds a domain-specific synthetic data generator as a first-class runtime command next to core DKG commands. That widens the production CLI surface with demo/harness logic and couples a specific patent-oracle workflow to the generic OKF integration, which is architectural drift rather than a natural extension of the core command set.

Example
A user installing the production CLI now gets a top-level ip-oracle command and a runtime package for synthetic patent corpus generation, even though the generic reusable surface is the OKF importer.

Suggested direction
Keep the reusable OKF importer in the core CLI, but move synthetic corpus generation to an examples/dev package or integration plugin unless it is a deliberately supported top-level product surface.

Confidence note
This assumes ip-oracle is intended as a demo/engineering harness, based on the new command and package documentation. If the project explicitly wants it as a supported production CLI surface, the concern is weaker but the boundary should still be documented.

For Agents
Decide whether ip-oracle is a supported product command or a harness. If it is a harness, move it under examples/dev tooling or an integration/plugin entry point instead of registering it in packages/cli/src/cli.ts and the runtime build. If it remains top-level, add a clear ownership boundary and avoid coupling future demo generators directly into the core CLI.

.description('IP / Patent Context Oracle tooling (synthetic OKF patent corpora)');

cmd
.command('generate <outDir>')


🟡 Issue: ip-oracle generate has only library tests, not command tests

What's wrong
The generator library is tested, but the new user-facing command that parses options, validates --count, calls the generator, and emits the JSON summary is not. A broken CLI registration, bad commander parser, or wrong exit code could ship while the package unit tests still pass.

Example
dkg ip-oracle generate /tmp/out --count 2 --seed 7 should exit 0, print JSON with mode: "generate", and create the OKF files. dkg ip-oracle generate /tmp/out --count 0 should exit 2. Those user-facing behaviors are not currently asserted.

Suggested direction
Cover the registered CLI path, not just the underlying generator library, so option parsing and exit behavior are locked down.

For Agents
Add a CLI-level test in packages/cli/test that runs the compiled CLI with a temp output directory. Assert success output and generated files for a small count, and assert the invalid-count exit path. This can run without a daemon because the command only writes files.

@Zigoljube

Author

Follow-up @TomazOT — the rest of the review is now addressed too, on top of the blank-node/export fix in 469eba2:

3ba21b880

  • import then import --share no longer skips finalize/share. The resumability manifest tracked a flat "done" set, so the documented WM-then-share flow saw every concept as done and reported memoryLayer: "SWM" with nothing actually shared. It now records the furthest stage per concept (wm = created+written, swm = finalized+shared); a later --share advances WM concepts to SWM instead of skipping them (legacy done manifests read as stage wm, and the per-concept manifest is namespaced so it can't collide with the --private bulk manifest). The summary now reports assetsCreated / assetsShared.
  • Typed producer-key scalars survive export. count: 3 imported as "3"^^xsd:integer but export kept only the lexical "3", so re-import produced a plain string literal. Export now recovers native integer/decimal/boolean values so the datatype round-trips.
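The datatype recovery described above could look roughly like this. It is a minimal sketch, assuming export sees each literal as a lexical form plus a datatype IRI. `literalToScalar` is a hypothetical name, not the helper in export.ts.

```typescript
// Hypothetical helper: map an RDF literal back to a native frontmatter scalar so
// that "3"^^xsd:integer is emitted as 3 (re-imported as an integer), not "3".
const XSD = "http://www.w3.org/2001/XMLSchema#";

type Scalar = string | number | boolean;

function literalToScalar(lexical: string, datatype: string): Scalar {
  switch (datatype) {
    case `${XSD}integer`: {
      const n = Number(lexical);
      // Keep the lexical form if JS cannot represent the integer exactly.
      return lexical.trim() !== "" && Number.isSafeInteger(n) ? n : lexical;
    }
    case `${XSD}decimal`: {
      const n = Number(lexical);
      return lexical.trim() !== "" && Number.isFinite(n) ? n : lexical;
    }
    case `${XSD}boolean`:
      // xsd:boolean's lexical space is "true" | "false" | "1" | "0".
      if (lexical === "true" || lexical === "1") return true;
      if (lexical === "false" || lexical === "0") return false;
      return lexical;
    default:
      return lexical; // xsd:string and unknown datatypes stay strings
  }
}
```

`false` is a real value here, so a caller deciding whether to emit a frontmatter key should test for `undefined` rather than truthiness. Converting long xsd:decimal values through `Number` can lose precision; the sketch accepts that trade-off.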

1ef65ff4b

  • pubNumber no longer exposes the private resolved-options shape — it takes the public PatentGenOptions and resolves internally; the previously-no-op test now calls it directly and asserts the exact publication number for every index.
  • Added CLI-level tests driving the compiled CLI against an in-process daemon stub (no hardhat): --dry-run never contacts the node, WM import vs --share advance, --private bulk SWM + manifest, and the export skolem-node filter — so the two fixes above can't silently regress.

Totals now: okf 69 tests, ip-oracle 7, okf CLI 5 — all green; build + typecheck clean. I believe that covers every item in the review (the blocker plus the bot's bugs/issues); happy to adjust anything if I've missed a case.

Addresses the substantive review findings (the compile/CLI-test/share-manifest/
typed-literal/export-genid items were already fixed in 469eba2..1ef65ff):

Security / correctness:
- Loader no longer follows symlinks (lstat, not stat): a bundle could ship
  `secret.md -> ~/.dkg/auth.token` and get a local file slurped into the graph
  and gossiped. Symlinked entries are skipped + reported; the CLI warns.
- Export path-traversal guard: concept IDs come from (untrusted) graph subjects,
  so `urn:okf:../../escape` could write outside the output dir. Export now
  refuses any file that doesn't resolve under outDir.
- conceptIdToKaName is now injective (`_`->`_5f`, `/`->`_2f`): the old `/`->`__`
  collapsed `a/b` and the literal concept `a__b` onto one node KA name.
- `okf verify` scopes counts to subjects under the IRI base, so unrelated
  pre-existing triples can't mask a real shortfall as "complete".
- `--replace` discards each concept's existing WM draft before re-writing, so a
  changed re-import doesn't accumulate stale triples.

Feature / honesty:
- Opt-in `--relate "<FromType>><ToType>=<predicate>"` (typeRelations): type
  cross-concept edges by endpoint types deterministically — e.g. dataset->table
  = schema:hasPart, table->table stays schema:mentions. Default unchanged (all
  mentions), preserving the "untyped links per SPEC §5.3" guarantee. hasPart
  round-trips through export as a plain OKF link.
- ip-oracle imports BundleFile from dkg-okf instead of re-declaring it.
- `--private` help no longer oversells "10k-10M": it's batched (in-memory map +
  5k-quad chunks), practical to ~100k/run, not yet streaming.
- CONTEXT.md documents the opt-in typing, the deliberate graph-faithful (not
  byte-faithful) round-trip, and the security guards.

Tests: okf 77 (+ injective-name, symlink-skip, type-relations, false-boolean),
CLI 9 (+ export-traversal, --relate, --replace, symlink-warn). Coverage ≥
thresholds; build + typecheck green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01MtRTg9G7j9fzP1pG8o843a
@Zigoljube Zigoljube changed the title feat(okf): Google OKF → DKG integration — import OKF bundles as verifiable Knowledge Assets feat(okf): Google OKF → DKG integration — import OKF bundles as deterministic, provenance-bearing Knowledge Assets Jun 27, 2026
@Zigoljube

Author

Worked through the latest round of review feedback. Pushed 441455271 (on top of 469eba299..1ef65ff4b). Quick notes, including where a couple of points had already been addressed:

Already fixed before this round (so a few "straw man" items are stale):

  • "It doesn't compile" (--view verified vs verifiable-memory, TS2322) — not reproducible on current HEAD: both sides use verified-memory and a clean tsc --noEmit exits 0. CI should be green now.
  • The CLI wrapper now has tests (it didn't when first reviewed): 9 CLI tests drive the compiled dkg okf against an in-process daemon stub.
  • import then import --share no longer no-ops (stage-aware manifest, 3ba21b880); typed literals round-trip (3ba21b880); export no longer rebuilds .well-known/genid concepts (469eba299).

Fixed in 441455271:

  • Loader no longer follows symlinks (lstat) — a secret.md -> ~/.dkg/auth.token symlink could otherwise be slurped into the graph and gossiped. Skipped + warned.
  • Export path-traversal guard — concept IDs come from untrusted graph subjects, so urn:okf:../../escape could write outside outDir. Export now refuses it.
  • conceptIdToKaName is injective (_→_5f, /→_2f): the old /→__ mapping collapsed a/b and the literal a__b onto one node name.
  • verify scopes counts to the IRI base — graph-wide counts could mask a real shortfall.
  • --replace discards the WM draft before re-writing (stale triples on re-import).
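For readers checking the naming and file-system guards above, here is a compact sketch of three of them. The names are hypothetical. Only the `_`→`_5f` and `/`→`_2f` escapes, the refuse-outside-outDir rule, and the use of lstat come from the description.

```typescript
import { lstatSync } from "node:fs";
import { isAbsolute, relative, resolve, sep } from "node:path";

// Injective KA naming: every '_' in the output starts an escape, so
// a/b ("a_2fb") can no longer collide with a__b ("a_5f_5fb").
// Assumption: all other characters pass through unchanged.
function conceptIdToKaName(conceptId: string): string {
  return conceptId.replace(/[_\/]/g, (ch) => (ch === "_" ? "_5f" : "_2f"));
}

// Export guard: concept IDs come from untrusted graph subjects, so resolve the
// target file and refuse anything that does not land strictly inside outDir.
function safeOutputPath(outDir: string, conceptId: string): string {
  const root = resolve(outDir);
  const target = resolve(root, `${conceptId}.md`);
  const rel = relative(root, target);
  if (rel === "" || rel === ".." || rel.startsWith(`..${sep}`) || isAbsolute(rel)) {
    throw new Error(`refusing to write outside ${root}: ${conceptId}`);
  }
  return target;
}

// Loader guard: lstat describes the directory entry itself, so a symlink such as
// secret.md -> ~/.dkg/auth.token is reported as a link and its target is never read.
// (isFile() is already false for a link under lstat; the explicit check documents intent.)
function isLoadableEntry(path: string): boolean {
  const st = lstatSync(path);
  return st.isFile() && !st.isSymbolicLink();
}
```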

On the conceptual point (mentions vs containment): good call — added an opt-in, deterministic --relate "<FromType>><ToType>=<predicate>" (e.g. BigQuery Dataset>BigQuery Table=hasPart, while Table>Table stays mentions). It types edges purely from endpoint type (byte-stable, no LLM), and stays off by default so the untyped-link guarantee (§5.3) holds unless opted in. hasPart round-trips through export as a plain OKF link, since OKF can't express the relation type.
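For illustration, parsing and applying a `--relate` spec deterministically might look like this. The parser, the first-match-wins rule, and the `http://schema.org/mentions` IRI used as the untyped default are assumptions for the sketch, not the PR's implementation.

```typescript
// Hypothetical parser for --relate "<FromType>><ToType>=<predicate>".
// Assumption: the schema.org namespace form (http vs https) used by the importer.
const SCHEMA_MENTIONS = "http://schema.org/mentions";

interface TypeRelation {
  fromType: string;
  toType: string;
  predicate: string;
}

function parseRelate(spec: string): TypeRelation {
  const gt = spec.indexOf(">");
  const eq = spec.indexOf("=", gt + 1);
  // Reject a missing '>' or '=', or an empty FromType, ToType, or predicate.
  if (gt <= 0 || eq <= gt + 1 || eq === spec.length - 1) {
    throw new Error(`invalid --relate "${spec}", expected "<FromType>><ToType>=<predicate>"`);
  }
  return {
    fromType: spec.slice(0, gt).trim(),
    toType: spec.slice(gt + 1, eq).trim(),
    predicate: spec.slice(eq + 1).trim(), // everything after the first '=' (IRIs may contain '=')
  };
}

/** Deterministic: the first matching rule wins, otherwise the untyped default. */
function predicateFor(relations: TypeRelation[], fromType: string | undefined, toType: string | undefined): string {
  const hit = relations.find((r) => r.fromType === fromType && r.toType === toType);
  return hit ? hit.predicate : SCHEMA_MENTIONS;
}
```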

Disclosure / honesty: packages/ip-oracle is now described in the PR body (and imports BundleFile from dkg-okf rather than re-declaring it); the --private mode is described honestly as batched/in-memory (~100k/run, not yet streaming); and the title/body now say "provenance-bearing / verification-ready" rather than implying on-chain verification on the default path.

Round-trip remains graph-faithful, not byte-faithful by design (prose isn't recoverable from triples) — documented in export.ts and CONTEXT.md; a future enhancement could have export add provenance (UAL/seal) when serialising a published graph.

Totals: 93 tests (okf 77 / ip-oracle 7 / CLI 9), coverage ≥ thresholds, build + typecheck green. Happy to keep iterating.

break;
}
case SCHEMA_MENTIONS:
case SCHEMA_HAS_PART: {


🔴 Bug: Export drops custom typed concept edges

What's wrong
The importer allows callers to type OKF links with custom predicates, but the exporter only knows how to turn schema:mentions and schema:hasPart back into Markdown links. Custom typed relationships can disappear from an exported OKF bundle, breaking graph-faithful round trips for an advertised import mode.

Example
Import a bundle with --relate 'Dataset>Table=https://example.org/contains'. The graph contains <dataset> <https://example.org/contains> <table>, but exportBundle does not treat that object-under-iriBase triple as an OKF link, so the exported bundle loses the dataset-to-table link graph on re-import.

Suggested direction
Treat concept-object relation triples from supported custom predicates as exportable OKF links, or constrain the CLI contract to the built-in predicates that round-trip.

Confidence note
This affects custom --relate predicates beyond the built-in schema:hasPart case; if the product intends only hasPart to round-trip, the CLI help and docs should narrow that contract.

For Agents
In packages/okf/src/export.ts, preserve the relationship graph for any configured/custom concept-to-concept edge predicate, or explicitly restrict --relate to predicates that export supports. A focused test should import with a custom relation, export, re-import, and assert the target concept link remains present as an OKF link.
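A sketch of the first option: collect a link from any predicate whose subject and object are both concept IRIs under `iriBase`, while skipping `rdf:type` and skolemized `.well-known/genid` nodes. `Triple` and `exportableLinks` are hypothetical, and the sketch assumes concept IRIs are exactly `iriBase + conceptId`.

```typescript
// Hypothetical link collector for export: predicate-agnostic, so custom
// --relate predicates survive as plain OKF links.
interface Triple {
  subject: string;
  predicate: string;
  object: string;
  objectIsIri: boolean; // false for literals
}

const RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";

function exportableLinks(triples: Triple[], iriBase: string): Map<string, Set<string>> {
  const links = new Map<string, Set<string>>();
  for (const t of triples) {
    if (!t.objectIsIri || t.predicate === RDF_TYPE) continue; // literals and types are not links
    if (!t.subject.startsWith(iriBase) || !t.object.startsWith(iriBase)) continue;
    if (t.object.includes("/.well-known/genid/")) continue; // skolemized section nodes
    const from = t.subject.slice(iriBase.length);
    const to = t.object.slice(iriBase.length);
    if (from === to) continue; // ignore self-links
    if (!links.has(from)) links.set(from, new Set());
    links.get(from)!.add(to);
  }
  return links;
}
```

The relation type itself is still lost on export (OKF links are untyped), so re-import yields `schema:mentions` unless the same `--relate` rule is passed again.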

const quads: Quad[] = [];
for (const doc of docs) {
const mapping = mapConcept(doc, iriByConceptId[doc.conceptId], exists, opts);
if (typeRelations.length > 0) {


🟡 Issue: Push type-pair edge selection into the edge mapper instead of mutating quads afterward

What's wrong
This is a small but telling abstraction leak. Link typing is modeled as a policy over concepts, yet the implementation treats it as a late RDF string rewrite. That makes the edge model harder to extend and obscures where edge semantics actually live.

Example
A future edge predicate option would have to remember that the canonical edge builder is not the only place predicates are chosen. It would need to coordinate mapConcept's addEdge, ConceptMapping.resolvedLinks, and the later mutation loop in bundle.ts.

Suggested direction
Let the code that creates an edge choose its predicate once, from typed concept IDs, instead of rewriting raw RDF quads after the fact. That removes the second scan, the q.object.startsWith(iriBase) coupling, and the in-place mutation.

For Agents
Move relation resolution into the edge creation path. For example, build a typed ConceptIndex in importBundle and pass an edgePredicate(fromConceptId, targetConceptId) callback into mapConcept, or make mapConcept accept indexed endpoint metadata. Preserve the default schema:mentions behavior and the existing --relate output.
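Roughly, the edge builder would receive a policy callback and pick each predicate once. `makeEdgePredicate`, `buildEdges` and the `typeByConceptId` index are hypothetical names. The fallback predicate is passed in, so the default `schema:mentions` behaviour stays unchanged.

```typescript
// Hypothetical shape: the edge builder asks a policy for the predicate once,
// from typed endpoint concept IDs, instead of rewriting quads afterwards.
type EdgePredicate = (fromConceptId: string, targetConceptId: string) => string;

interface Edge {
  subject: string;
  predicate: string;
  object: string;
}

function makeEdgePredicate(
  typeByConceptId: ReadonlyMap<string, string>,
  rules: ReadonlyArray<{ fromType: string; toType: string; predicate: string }>,
  fallback: string,
): EdgePredicate {
  // Precompute "From>To" keys; the first rule for a pair wins, keeping output deterministic.
  const table = new Map<string, string>();
  for (const r of rules) {
    const key = `${r.fromType}>${r.toType}`;
    if (!table.has(key)) table.set(key, r.predicate);
  }
  return (from, to) => {
    const key = `${typeByConceptId.get(from) ?? ""}>${typeByConceptId.get(to) ?? ""}`;
    return table.get(key) ?? fallback;
  };
}

function buildEdges(
  fromConceptId: string,
  linkTargets: string[],
  iriOf: (conceptId: string) => string,
  edgePredicate: EdgePredicate,
): Edge[] {
  return linkTargets.map((to) => ({
    subject: iriOf(fromConceptId),
    predicate: edgePredicate(fromConceptId, to),
    object: iriOf(to),
  }));
}
```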

expect(manifest.stages).toEqual({ a: 'swm', b: 'swm' });
});

it('--private bulk-streams quads via /api/shared-memory/write with a chunked manifest', async () => {


🟡 Issue: The private-import test does not actually verify chunking or 413 retry

What's wrong
The test name and assertions give coverage for the happy private-write path, but not for the high-risk behavior added for bulk imports: multi-chunk progress, payload-too-large splitting, and resumability. A regression in those paths would still leave this test green.

Example
Failing-test sketch: build a fixture with one concept containing more than 5,000 emitted quads, or enough concepts to exceed the chunk size. Have the stub return HTTP 413 for the first oversized /api/shared-memory/write, then assert the CLI retries with smaller slices, records the final manifest, and exits successfully. A second test can prepopulate chunksDone: 1 and assert resume skips the first chunk.

Suggested direction
Add a targeted large-fixture or generated-fixture CLI test that forces multiple chunks and a 413 split, plus a resume case using an existing manifest.

For Agents
Update the OKF CLI private-import tests to cover the CHUNK boundary, 413 halve-and-retry path, and manifest resume behavior. Preserve the assertion that the private path does not create per-concept knowledge assets.
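The behaviour those tests should pin down can be sketched as below. The sketch assumes `writeChunk` wraps the POST to /api/shared-memory/write and rejects with an error carrying `status: 413` when the payload is too large. `bulkWrite`, `writeWithSplit` and the `onChunkDone` callback are illustrative. Resuming by chunk index is only sound if quad order is deterministic across runs.

```typescript
// Hypothetical chunked private write with 413 halve-and-retry and resume.
function isPayloadTooLarge(err: unknown): boolean {
  return typeof err === "object" && err !== null && (err as { status?: unknown }).status === 413;
}

async function writeWithSplit(quads: string[], writeChunk: (q: string[]) => Promise<void>): Promise<void> {
  try {
    await writeChunk(quads);
  } catch (err) {
    if (isPayloadTooLarge(err) && quads.length > 1) {
      // Too large: split in half and retry each half (recursing as needed).
      const mid = Math.ceil(quads.length / 2);
      await writeWithSplit(quads.slice(0, mid), writeChunk);
      await writeWithSplit(quads.slice(mid), writeChunk);
      return;
    }
    throw err;
  }
}

async function bulkWrite(
  quads: string[],
  chunkSize: number,
  chunksDone: number, // from the manifest; 0 on a fresh run
  writeChunk: (q: string[]) => Promise<void>,
  onChunkDone: (chunksDone: number) => void, // persist progress after each chunk
): Promise<number> {
  const total = Math.ceil(quads.length / chunkSize);
  for (let i = chunksDone; i < total; i++) {
    await writeWithSplit(quads.slice(i * chunkSize, (i + 1) * chunkSize), writeChunk);
    onChunkDone(i + 1);
  }
  return total;
}
```

A test can drive this with a stub that rejects any chunk above a size limit, then assert the written slices and the persisted `chunksDone` values.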


Labels

None yet

Projects

None yet


3 participants