Skip to content

feat(okf): Google OKF → DKG integration (OKF-only, carved from #1331)#1388

Merged
branarakic merged 3 commits into
mainfrom
feat/okf-integration-only
Jun 30, 2026
Merged

feat(okf): Google OKF → DKG integration (OKF-only, carved from #1331)#1388
branarakic merged 3 commits into
mainfrom
feat/okf-integration-only

Conversation

@branarakic

Copy link
Copy Markdown
Contributor

OKF-only carve-out of #1331 (@Zigoljube). The @origintrail-official/dkg-ip-oracle package and its CLI command are intentionally excluded here so the OKF integration can ship on its own; ip-oracle will follow in a separate PR.

What this ships

The @origintrail-official/dkg-okf package + the dkg okf CLI command — import a Google OKF bundle into a Context Graph as deterministic, provenance-bearing Knowledge Assets:

  • deterministic offline mapping (bundle → N-Quads), with a dkg okf verify completeness gate;
  • --relate typed cross-concept edges (mirrored in verify);
  • --private bulk SWM write with a resumable chunked manifest + invite-only CG guard;
  • dkg okf export (round-trips the graph back to a bundle).

What's different vs #1331

Exactly the carve-out: the packages/ip-oracle/* package and commands/ip-oracle.ts are dropped, and the cli wiring (cli.ts registration, cli/package.json dep, vitest.coverage.ts tier, lockfile) no longer references ip-oracle. No OKF logic changedgrep ip-oracle across the tree is clean and OKF has no dependency on it.

This branch is cut fresh off main (so the diff is OKF-only with no ip-oracle add/remove churn) and includes the okf verify --relate fix + hardened privacy/scoping tests from the #1331 review.

Validation

Coordination

Private OKF→VM mainnet publishing depends on the V10 leaf-canonicalization fix (#1386). The import tooling ships here; hold mainnet private OKF→VM publishes until #1386 is released.

Supersedes the OKF half of #1331 — that PR can be reduced to ip-oracle (or closed once this lands and ip-oracle is reopened separately).

🤖 Generated with Claude Code

…inistic, provenance-bearing KAs)

OKF-only carve-out of #1331 (@Zigoljube). The `@origintrail-official/dkg-ip-oracle`
package and its CLI command are intentionally EXCLUDED here so OKF can ship on its own
and ip-oracle can follow later — this PR is the OKF half only.

Imports a Google OKF bundle into a Context Graph as deterministic, provenance-bearing
Knowledge Assets: a deterministic offline mapping, a `verify` completeness gate,
`--relate` typed edges, `--private` bulk SWM with a resumable manifest, and export.
Adds the `@origintrail-official/dkg-okf` package + the `dkg okf` CLI command; removes
the ip-oracle dep/registration/coverage tier from the cli wiring.

Validated live on Base mainnet and against a v10.0.1 edge node during #1331 review.
okf 77 / okf-subcommands 12 green; 21/21 build.

Note: private OKF→VM mainnet publishing depends on the V10 leaf-canonicalization fix
(#1386); the import tooling ships here, but hold mainnet private OKF→VM publishes
until #1386 is released.

Co-Authored-By: Zigoljube <257808628+Zigoljube@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread packages/okf/test/bundle.test.ts Fixed
Comment thread packages/cli/src/cli.ts
import { registerQueryCatalogCommand } from './commands/query-catalog.js';
import { registerMaintenanceCommands } from './commands/maintenance.js';
import { registerRandomSamplingCommand } from './commands/random-sampling.js';
import { registerOkfCommand } from './commands/okf.js';

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Bug: The top-level OKF import is not covered by the runtime build graph

What's wrong
This adds a required startup dependency for the whole CLI, but the existing runtime build path still omits the new workspace package. That can make the packaged or auto-updated CLI fail to build or fail to start, even for unrelated commands.

Example
On a fresh source/runtime build that runs the existing build:runtime:packages path, packages/okf/dist/index.js is not produced. Starting node packages/cli/dist/cli.js status can fail before command dispatch because commands/okf.js imports @origintrail-official/dkg-okf.

Suggested direction
Wire @origintrail-official/dkg-okf into the same build paths as dkg-epcis before loading the command from the CLI entrypoint.

For Agents
Add the new OKF package to the CLI build graph: include it in packages/cli/tsconfig.json references/prebuild as needed and in the root runtime package build filter used by auto-update/release builds. Prove a clean runtime build can execute a non-OKF command and dkg okf --help.

Comment thread packages/cli/src/commands/okf.ts Outdated
// down to a floor, so a too-big batch degrades instead of failing.
const writeSlice = async (slice: typeof allQuads): Promise<void> => {
try {
const res = await client.sharedMemoryWrite(contextGraphId, slice);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Bug: Private bulk imports ignore --sub-graph-name

What's wrong
The per-concept import path honors subGraphName, and export can read from a sub-graph, but the private bulk path silently drops the option. Data lands in the root context graph instead of the requested sub-graph, which breaks isolation and makes scoped reads/export verify the wrong location.

Example
dkg okf import ./bundle --context-graph-id cg --sub-graph-name team --private --create-context-graph should write to did:dkg:context-graph:cg/team/_shared_memory, but the request body only contains { contextGraphId, quads }, so the daemon writes the bundle into the root cg SWM bucket. A later dkg okf export cg ./out --sub-graph-name team finds nothing.

Suggested direction
Forward the sub-graph option through the shared-memory write API and make the resumability manifest distinguish root vs sub-graph imports.

For Agents
Extend ApiClient.sharedMemoryWrite to accept and serialize subGraphName, pass the import option from the OKF private path, and include sub-graph identity in the private manifest. Add an end-to-end stub assertion that --private --sub-graph-name x sends subGraphName: "x".

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Bug: Private OKF import ignores --sub-graph-name

What's wrong
The per-concept import path passes subGraphName into KA create/write/finalize/share, but the private bulk path drops it. Because the shared-memory route stores into the graph selected by request metadata, not the caller-supplied quad graph, private imports targeting a subgraph silently land in the root SWM graph.

Example
Run dkg okf import ./bundle --context-graph-id cg --private --sub-graph-name notes, then dkg okf export cg ./out --sub-graph-name notes. The import reports success, but the export/query for notes will not see the triples because they were written to the root shared-memory graph.

Suggested direction
Thread subGraphName through the shared-memory write API instead of relying on the quad graph field.

For Agents
Update ApiClient.sharedMemoryWrite to accept/pass subGraphName, call it from the OKF private path with the parsed option, and include the subgraph in the private manifest identity. Add a CLI stub test that asserts /api/shared-memory/write receives subGraphName and that a subgraph export can read the private import.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Bug: Per-concept OKF import reuses manifests across sub-graphs

What's wrong
A resume manifest from one graph partition can make a later import into a different sub-graph skip all concepts or try to share assets that were never created in that sub-graph. That silently writes to the wrong logical target and loses the intended import.

Example
Run dkg okf import ./bundle --context-graph-id cg first. The manifest records { a: "wm", b: "wm" }. Then run dkg okf import ./bundle --context-graph-id cg --sub-graph-name team; the loop sees each concept already at target stage wm and skips every create/write, so cg/team receives no OKF assets while the command reports success.

Suggested direction
Include the target sub-graph in the manifest identity before reusing stages; ignore or reset manifests whose sub-graph does not match the current run.

For Agents
In packages/cli/src/commands/okf.ts, update the per-concept manifest read/write path to persist and compare subGraphName the same way the bulk-private manifest does. Preserve root-graph legacy manifests only for root runs, and add a CLI subcommand test that root and named sub-graph imports do not share stages.

const iriByConceptId: Record<string, string> = {};
const typeByConceptId: Record<string, string | undefined> = {};
for (const f of ordered) {
if (!isConceptFile(f.path)) continue;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Bug: Invalid concept file paths are minted into RDF subjects

What's wrong
The mapper validates resolved link targets but not the bundle files that define concepts. Non-OKF path segments can become concept IDs and are concatenated into IRIs, producing malformed RDF terms and write-time failures instead of a clear conformance error.

Example
A bundle containing bad name.md with valid frontmatter passes validation, gets concept ID bad name, and mints subject IRI urn:okf:bad name. That IRI contains a space, so dry-run can emit invalid N-Quads and node writes can fail instead of reporting the bundle as invalid up front.

Suggested direction
Validate every concept path segment before parsing/mapping and refuse or skip invalid concept IDs before deriving IRIs and KA names.

Confidence note
This depends on the intended OKF path contract, but the package itself documents the segment regex and uses it for link targets, while actual concept file paths bypass it.

For Agents
Apply segment validation to concept file paths during validation and import. validateBundle should hard-error invalid concept IDs, and importBundle should skip or reject them consistently. Add a fixture with bad name.md and -bad.md proving no invalid subject IRIs are minted.

} from '@origintrail-official/dkg-okf';
import { ApiClient } from '../api-client.js';
import type { ActionOpts } from '../cli-helpers.js';

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Issue: The OKF CLI command is doing too many jobs in one monolithic function

What's wrong
The new command already mixes several layers that change for different reasons. That makes it harder to scan, harder to test below the process level, and likely to accumulate more special-case branches. The package docs call the CLI a thin wrapper, but the wrapper now owns substantial policy and graph-shaping logic.

Example
A future change to SPARQL binding shape, manifest format, or import stage semantics has to edit the same nested command function rather than a focused importer/exporter/verify module.

Suggested direction
Split this into a small registerOkfCommand, separate import/export/verify runners, and shared CLI utilities for manifests, binding cells, and relate parsing.

Confidence note
The diff shows the full command implementation is new and centralized here; behavior does not need to be wrong for this to be a structural risk.

For Agents
Keep Commander registration thin. Extract import/export/verify runners and shared helpers from packages/cli/src/commands/okf.ts, preserving current flags and output JSON. Existing CLI tests should pass unchanged.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Issue: The OKF command is doing too much in one Commander callback

What's wrong
This is a structural regression: the new command is not a thin wrapper over @origintrail-official/dkg-okf; it owns several independent workflows and state machines inside one closure. That makes the control flow hard to scan, hides reusable pure logic, and encourages future OKF-specific branches to be bolted into an already busy path.

Example
Adding another import mode or changing the manifest format requires editing the same callback that also owns graph creation, peer invitations, chunk retry behavior, and final JSON output. That makes local changes harder to review and forces behavior to be validated through the large end-to-end CLI stub.

Suggested direction
Keep registerOkfCommand as a registration shell, then move pure option parsing, graph setup, per-concept import, private bulk import, export query adaptation, and verify counting into focused functions with explicit inputs/outputs.

For Agents
In packages/cli/src/commands/okf.ts, extract the import/export/verify action bodies into named workflow functions or a small okf-command support module. Preserve the current CLI flags and outputs; keep Commander registration as wiring only. Add focused unit coverage around manifest stage transitions and mode selection, leaving only thin smoke tests for Commander wiring.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Issue: Split the OKF command before it becomes a monolithic CLI controller

What's wrong
This is described as a thin node-facing wrapper, but the new command file now owns several independent state machines and transport adapters. Even though it is still under 1k lines, it is close enough that the next feature will likely push it over the threshold, and the current structure already makes unrelated import modes share local variables and helper behavior.

Example
A reader entering at okf import has to follow the dry-run path, Context Graph existence branch, --private bulk branch, manifest parsing, 413 chunk splitting, --replace, and --share state progression inside one closure before reaching the final summary.

Suggested direction
Keep registerOkfCommand as registration glue and move the import/export/verify workflows into named functions with small typed option objects. The private bulk writer, per-concept WM/SWM importer, manifest store, and query result adapter are natural seams that would make the command easier to scan and safer to extend.

For Agents
Split packages/cli/src/commands/okf.ts into thin Commander registration plus focused action modules such as okf/import.ts, okf/export.ts, okf/verify.ts, and shared helpers for manifests, query binding normalization, and graph-quad conversion. Preserve CLI flags/output shape; run the existing OKF CLI subcommand tests after refactoring.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Issue: Keep query response normalization in the API boundary, not the OKF command

What's wrong
These helpers are transport-boundary logic, but they live as local OKF command utilities. That leaks API quirks into a feature command and makes the CLI more likely to accumulate one-off query adapters as other commands hit the same shape mismatch.

Example
okf export and okf verify both depend on bindingsOf(result) plus cell(...); the next CLI command that consumes /api/query can easily reintroduce a different local interpretation of the same daemon response shape.

Suggested direction
Make ApiClient own the runtime response normalization so command modules receive one canonical binding shape. The OKF command should not need to know the daemon can omit the discriminator or return SPARQL-JSON cells.

For Agents
Move query result normalization into ApiClient.query or a shared CLI query helper, returning a stable typed binding model. Update okf export and okf verify to consume that helper and keep tests that cover bare-string and SPARQL-JSON object cells.

okfCmd
.command('export <contextGraphId> <outDir>')
.description('Serialise a Context Graph back into a conformant OKF bundle (clean inverse of import)')
.option('--sub-graph-name <name>', 'Sub-graph within the Context Graph')

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Issue: Export reconstruction leaks OKF graph semantics into the CLI layer

What's wrong
This couples the CLI to the mapper's internal graph model and duplicates part of the exporter's responsibility outside the OKF package. If section nodes or concept-root rules change, the CLI export path becomes a second place to update.

Example
The CLI currently has to know that subjects containing /.well-known/genid/ are section nodes and that concept roots are subjects under iriBase with an rdf:type.

Suggested direction
Push Context Graph quads-to-BundleImport conversion behind an OKF package API so the package owns concept/root/section invariants.

For Agents
Move the quadsBySubject, concept filtering, and synthetic BundleImport construction into packages/okf/src/export.ts or a new pure helper such as bundleImportFromQuads(quads, { iriBase }). The CLI should only query, call the helper, and write files.

citations,
};
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Issue: The --relate rule grammar is owned by the wrong layer

What's wrong
The PR introduces TypeRelation as a package-level model, but the accepted textual representation lives inside the CLI command. That splits the model from its parser and makes the contract hard to reuse safely.

Example
Adding a programmatic caller or config-file import path would require reimplementing the same Dataset>Table=hasPart grammar to avoid import/verify drift.

Suggested direction
Make relation-rule parsing a canonical OKF package helper and call it from import and verify.

For Agents
Move relation-rule parsing and schema.org predicate expansion into packages/okf as parseTypeRelationRule(s) / parseTypeRelationRules(list). Keep Commander-specific presentation in the CLI.

const list = await client.listContextGraphs().catch(() => null);
const policy = list?.contextGraphs?.find((c: { id: string }) => c.id === contextGraphId)
?.accessPolicy;
if (policy === 'public' && !opts.allowPublicContextGraph) {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Issue: Existing-public Context Graph refusal is untested

What's wrong
This is the privacy guard for the riskiest private-import case: bulk-writing supposedly private OKF substance into a Context Graph that already exists and is publicly readable. The current tests only prove that newly created private Context Graphs are created with private flags, so a regression in this refusal branch would not be caught.

Example
A regression that removed the policy === 'public' exit, or changed it to warn-and-continue, would still pass the current --private bulk-streams quads test because that test creates a fresh private CG. Add a test where contextGraphExists returns true, /api/context-graph/list returns { contextGraphs: [{ id: 'cg-public', accessPolicy: 'public' }] }, and dkg okf import ... --private exits non-zero without calling /api/shared-memory/write; also assert the override flag allows the write.

Suggested direction
Add an end-to-end CLI test for --private against an already-existing public Context Graph, including the no-write assertion and the explicit override case.

For Agents
In packages/cli/test/okf-subcommands.test.ts, extend the stub or install a one-off handler for an existing public Context Graph. Preserve the new --private bulk import behavior, but prove that public existing CGs are refused unless --allow-public-context-graph is present.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Issue: The privacy refusal path for existing public graphs is not verified

What's wrong
This is the highest-risk branch in the new command: it prevents a bulk private import from writing sensitive bundle contents into a public Context Graph. Without a regression test, the suite gives confidence for private graph creation but not for the equally important existing-graph safety check.

Example
A regression that removed the policy === 'public' check, ignored listContextGraphs(), or always proceeded when the graph already exists would still pass the current --private test suite, because the tested path only creates cg-priv from scratch.

Suggested direction
Add a regression test for --private against an existing public Context Graph, plus the explicit override case, so the privacy guard cannot silently disappear.

For Agents
Add CLI tests in packages/cli/test/okf-subcommands.test.ts with the stub reporting context-graph/exists=true and /api/context-graphs returning an existing graph with accessPolicy: 'public'. Assert dkg okf import --private exits non-zero, does not call /api/shared-memory/write, and that --allow-public-context-graph is the only path that proceeds.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Issue: The verify --list-missing diagnostic has no regression coverage

What's wrong
The new option is meant to turn an aggregate shortfall into actionable concept-level evidence, but the tests never execute that branch. Count verification can stay green while this diagnostic is broken or empty.

Example
A regression that stopped adding missingConcepts, failed to unwrap <urn:okf:x> values, or ignored the requested limit would not be caught; verify would still pass the current count-based tests.

Suggested direction
Add an end-to-end CLI test for verify --list-missing that proves missing concept IRIs are listed and the numeric limit is honored.

For Agents
Extend packages/cli/test/okf-subcommands.test.ts with a verify --list-missing 1 case. Stub count queries to produce a shortfall and the distinct-subject query to return only one expected concept, then assert non-zero exit, a bounded missingConcepts array, and normalized IRI comparison for both bare and angle-bracket cells.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Bug: Private import can proceed when public-policy verification fails

What's wrong
The command’s privacy guard is fail-open. A private bulk import is supposed to avoid exposing substance in public graphs, but policy lookup failures are downgraded to warnings and the write still happens.

Example
For an existing public Context Graph cg-pub, if /api/context-graph/list temporarily returns 500 or omits accessPolicy, dkg okf import ./bundle --context-graph-id cg-pub --private continues and bulk-writes the bundle into public SWM without requiring --allow-public-context-graph.

Suggested direction
Treat an unknown access policy as unsafe for --private writes, or require the explicit override for any unconfirmed policy.

Confidence note
This depends on listContextGraphs() failing or returning no policy for an existing public graph, but the code explicitly catches that failure and proceeds.

For Agents
In packages/cli/src/commands/okf.ts, make the existing-CG --private preflight fail closed when policy lookup fails or does not prove allowList/ownerOnly, unless --allow-public-context-graph is present. Add a stubbed CLI test where exists=true and list fails/omits policy and assert no shared-memory write occurs.

iriByConceptId: Object.fromEntries(concepts.map((c) => [c.conceptId, c.iri])),
concepts,
reservedSkipped: [],
quads: concepts.flatMap((c) => c.quads),

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 Nit: The command shadows an imported RDF constant with a duplicate local constant

Why it matters
This weakens the single-source-of-truth story for vocabulary constants in a file that already imports the canonical value.

Suggestion
Delete the local RDF_TYPE declaration and use the imported constant consistently.

…url-sanitization)

CodeQL js/incomplete-url-substring-sanitization (high) flagged
`q.object.startsWith('https://bigquery.googleapis.com')` in bundle.test.ts — a
host-prefix check an arbitrary host could follow. Assert the full citation URL
instead; CodeQL-clean and a stronger assertion. No production code involved.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
// count would let unrelated pre-existing triples mask a real shortfall
// (or inflate it) and report "complete" when concepts are actually missing.
const sparql =
`SELECT (COUNT(*) AS ?c) WHERE { GRAPH ?g { ?s <${predicate}> ?o ` +

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Bug: okf verify can pass using unrelated same-prefix triples

What's wrong
The verifier only scopes actual counts to the IRI prefix. The default prefix urn:okf: is shared across bundles, and stale or unrelated OKF data in the same Context Graph can satisfy the aggregate counts. That makes the completeness gate report success while the requested bundle is missing.

Example
Bundle A expects two rdf:type and two schema:name triples under the default urn:okf: base. If the Context Graph contains only Bundle B with two other urn:okf:* concepts carrying those same predicates, verify returns actual: 2 for each predicate and complete: true even though none of Bundle A was imported.

Suggested direction
Scope verification to the current bundle’s expected subjects/triples rather than the whole iriBase prefix.

For Agents
Change verify to compare exact expected triples or at least count per expected concept IRI/predicate using a VALUES set from imported.concepts. The optional missing-concepts logic should either feed the completeness decision or be replaced by exact triple checks. Add a test with unrelated same-prefix OKF triples masking a missing bundle.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Issue: Verify counts duplicate expected quads that RDF storage deduplicates

What's wrong
okf verify compares multiset mapper output to set-like RDF query counts. Duplicate source values can produce false shortfalls and fail pipelines even when no imported RDF triple is missing.

Example
A conformant concept with tags: [bitcoin, bitcoin] maps to two identical schema:keywords "bitcoin" quads. After import the RDF store has one triple, but okf verify expects two and reports a missing triple even though the graph contains the complete RDF content.

Suggested direction
Deduplicate imported quads by full RDF term identity before computing expected per-predicate counts.

For Agents
In packages/cli/src/commands/okf.ts verify mode, build expected counts from a Set keyed by subject/predicate/object/graph before grouping by predicate. Add a verify test with duplicate frontmatter values proving complete graphs do not fail the gate.

};

let lastTick = started;
for (let ci = chunksDone; ci < totalChunks; ci++) {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Bug: Completed private manifests block the documented repair rerun

What's wrong
The private import’s resumability manifest is treated as authoritative even after a full run. That conflicts with the verifier’s documented recovery path, which relies on rerunning the same idempotent import to rewrite dropped triples. With a complete manifest present, the rerun skips every chunk and cannot repair the graph.

Example
A private import writes one chunk and records { chunksDone: 1, totalChunks: 1 }. If the node later loses or failed to persist one triple, okf verify reports a shortfall and tells the user to rerun import. The rerun loads chunksDone = 1, skips the loop, writes 0 triples, and the shortfall remains unless the operator manually deletes the manifest.

Suggested direction
Do not let a completed manifest suppress an intentional idempotent replay, or make the repair path explicit.

For Agents
In the private import manifest flow, distinguish resume-after-interruption from repair/replay. For example, remove or mark the manifest complete after success, add an explicit --resume/--repair behavior, or reset chunksDone when the user reruns a completed import. Add a regression test where a completed manifest exists and a repair rerun still writes chunks or the CLI clearly refuses with instructions.

codeSpanLinks: [],
citations: [],
}));
const imported: BundleImport = {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Issue: Do not manufacture a fake BundleImport in the CLI

What's wrong
This leaks the OKF package's internal import model into the CLI and weakens the type boundary. BundleImport represents the result of parsing an OKF bundle, but export from a Context Graph has a different source and cannot honestly populate all of that state.

Example
The object created at lines 644-651 looks like a real import result, but fields such as resolvedLinks: [] and warnings: [] are placeholders. A future maintainer could reasonably rely on those fields during export and accidentally read manufactured state instead of data from the graph.

Suggested direction
Move the graph-quads-to-export-model adapter into @origintrail-official/dkg-okf, or narrow exportBundle to accept the concept/quads projection it actually needs. The CLI should pass raw queried quads or a truthful export-specific model, not construct a partial import result by hand.

For Agents
Look at packages/okf/src/export.ts and packages/cli/src/commands/okf.ts. Introduce a narrower export input, e.g. exportConceptQuads(concepts, opts) or bundleImportFromQuads(...) inside the OKF package. Preserve exported file content and path traversal checks; prove with the existing export tests plus one test that the CLI no longer fabricates diagnostic fields.

const quads: Quad[] = [];
for (const doc of docs) {
const mapping = mapConcept(doc, iriByConceptId[doc.conceptId], exists, opts);
if (typeRelations.length > 0) {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Issue: Model typed edge predicates instead of post-processing mentions quads

What's wrong
The typeRelations abstraction is only partially integrated. Import presents relation predicates as an arbitrary open set, but the mapper implements them as a mutation pass over quads, and export recognizes only one typed predicate. That split makes the relation model brittle and easy to extend incorrectly.

Example
A user can import with --relate Dataset>Table=https://example.org/contains. The mapper accepts that arbitrary predicate, but export's edge reconstruction only knows schema:mentions and schema:hasPart, so the relation model is open in one direction and hard-coded in another.

Suggested direction
Make typed edges a first-class part of the mapping/export model rather than rewriting generic mentions quads and hard-coding one special predicate in export. The same edge-predicate policy should drive import, verify expectations, and export reconstruction.

Confidence note
This is a design/abstraction concern visible from the diff; the exact desired export behavior for arbitrary custom relation predicates may need product confirmation.

For Agents
Unify relation handling in packages/okf/src/bundle.ts and packages/okf/src/export.ts. Model edge predicates explicitly, such as an edgePredicateResolver during import plus an edgePredicates set during export, or treat any IRI object under the OKF IRI base as a concept edge after excluding known frontmatter predicates. Preserve default schema:mentions behavior and existing hasPart tests; add a custom-predicate round-trip test.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Issue: Emit typed link predicates directly instead of rewriting quads after mapping

What's wrong
The current implementation spreads one concept across two phases: mapping.ts owns link discovery and edge emission, while bundle.ts later searches for a specific default predicate and mutates the result. That creates a non-obvious invariant that every typeable edge must first look like schema:mentions and have an object under iriBase.

Example
The typed-edge path first creates a schema:mentions quad in mapConcept, then importBundle rewrites that same quad to schema:hasPart if the source/target type pair matches. Any future edge source must remember this second pass instead of having edge predicate selection live where the edge is created.

Suggested direction
Give mapConcept the information needed to choose the edge predicate at addEdge time, or return structured edge records that are rendered to quads once. That removes the hidden post-processing pass and keeps link semantics in one place.

For Agents
Move typed edge selection into the mapping boundary: build the concept/type index in bundle.ts, pass an edgePredicate(fromConceptId, targetConceptId) resolver into mapConcept, and have addEdge emit the final predicate directly. Preserve default schema:mentions behavior and existing --relate tests.

…oncept-path validation

Addresses the otReviewAgent findings on #1388:

🔴 dkg-okf was missing from the runtime build graph. The CLI imports it at startup, so
   a release/auto-update build (build:runtime:packages) would omit packages/okf/dist and
   the CLI could fail to start for ANY command. Added dkg-okf to build:runtime:packages
   and to the cli tsconfig references.

🔴 --private bulk import silently dropped --sub-graph-name — data landed in the root CG
   SWM instead of the requested sub-graph, breaking isolation + scoped export/verify.
   ApiClient.sharedMemoryWrite now serializes subGraphName; the private path forwards it;
   the resumability manifest records it (so a resume can't mix root/sub-graph). +test.

🔴 Invalid concept file paths were minted into RDF subjects — `bad name.md` →
   `urn:okf:bad name` (a space-bearing IRI → invalid N-Quads / write failures).
   validateBundle now hard-errors invalid concept path segments, and importBundle skips
   them with an `invalid-path` warning instead of emitting malformed RDF. +test.

🟡 Added a test for the existing-PUBLIC-CG --private refusal (+ --allow-public-context-graph
   override) — previously only new-private-CG creation was covered.

🔵 Removed the local RDF_TYPE that shadowed the imported constant.

okf 78 / okf-subcommands 14 green; build:packages 21/21; build:runtime:packages includes okf.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@branarakic

Copy link
Copy Markdown
Contributor Author

Review feedback addressed — head 0b997e4

Real findings fixed (with tests):

  • 🔴 Runtime build graphdkg-okf was missing from build:runtime:packages (and the cli tsconfig references). Since the CLI imports it at startup, a release/auto-update build would omit packages/okf/dist and the CLI could fail to start for any command. Wired it in.
  • 🔴 --private dropped --sub-graph-name — bulk writes landed in the root CG SWM instead of the requested sub-graph. ApiClient.sharedMemoryWrite now serializes subGraphName, the private path forwards it, and the resumability manifest records it (so a resume can't mix root/sub-graph). New test asserts the write body + manifest carry subGraphName.
  • 🔴 Unvalidated concept pathsbad name.mdurn:okf:bad name (malformed IRI → invalid N-Quads / write failures). validateBundle now hard-errors invalid path segments; importBundle skips them with an invalid-path warning instead of emitting bad RDF. New fixture/test.
  • 🟡 Existing-public-CG refusal — added the missing test (refuses --private into an existing public CG; --allow-public-context-graph overrides; asserts no shared-memory/write on refusal).
  • 🔵 Removed the shadowed local RDF_TYPE.
  • (CodeQL incomplete-url-substring-sanitization in bundle.test.ts already fixed in b0373a9fd.)

okf 78 / okf-subcommands 14 green; build:packages 21/21; build:runtime:packages now builds okf.

Deferred — structural refactors (behavior unchanged + covered by the suite; lower-risk as a focused follow-up than mid-merge):

  • 🟡 Split the okf command into a thin registerOkfCommand + import/export/verify runners.
  • 🟡 Move the export quads→BundleImport reconstruction into the okf package (bundleImportFromQuads).
  • 🟡 Move --relate rule parsing into the okf package so import + verify share one grammar.

🤖 Generated with Claude Code

expect(okOut.complete).toBe(true);
expect(okOut.totalMissingTriples).toBe(0);

// SCOPING CONTRACT: every COUNT must be filtered to subjects under the bundle's

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Bug: Verify coverage can pass while same-prefix triples mask missing bundle data

What's wrong
The test gives false confidence for the completeness gate: it validates that counts are scoped by IRI prefix, but not that they are scoped to the actual concepts in the bundle. Same-prefix unrelated data is the realistic masking case for the default OKF IRI base.

Example
Bundle expects names for urn:okf:x and urn:okf:y; the graph has urn:okf:x plus unrelated urn:okf:z, but is missing urn:okf:y. A stubbed COUNT of 2 makes complete:true, so the current test would not catch this masking case.

Suggested direction
Add a regression case with extra urn:okf: triples that are not part of the bundle and assert verify still fails when an expected concept or predicate is missing.

Confidence note
This is based on the test and the verify command shape in the diff; it should be confirmed against the intended semantics for Context Graphs that contain multiple OKF imports sharing the default urn:okf: base.

For Agents
In packages/cli/test/okf-subcommands.test.ts, extend the verify coverage to simulate an unrelated same-prefix concept in the graph while one expected concept is absent. The test should prove dkg okf verify reports complete:false and, with --list-missing, identifies the absent expected IRI rather than relying only on prefix-scoped aggregate counts.

const citations: string[] = [];
const extras: Record<string, unknown> = {};

for (const q of c.quads) {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Issue: Collapse the duplicated OKF frontmatter mapping into one bidirectional model

What's wrong
The importer/exporter boundary relies on duplicated predicate knowledge. This makes the mapping look simple locally, but the actual contract is scattered, so future mapping changes have to be remembered in multiple places.

Example
Adding a first-class field like license would require coordinated edits in at least frontmatterQuads, reconstructConcept, and KNOWN_PREDICATES; missing one silently changes round-trip behavior or causes the field to fall into the producer-extra path.

Suggested direction
Define the canonical OKF fields once, then have import and export use that shared definition. This would make the inverse mapping explicit and remove the need to keep KNOWN_PREDICATES manually synchronized with two switches.

Confidence note
This is a structural maintainability issue rather than a behavior defect; the current tests can still pass because the duplicated tables are aligned today.

For Agents
Introduce a shared mapping table for canonical frontmatter fields with encode/decode functions and multiplicity rules. Generate the import switch/export switch/known-predicate set from that table or replace the switches with table dispatch. Preserve producer-defined extra key behavior and round-trip tests.

await rm(outDir, { recursive: true, force: true });
});

it('--relate types dataset→table edges as hasPart (dry-run, deterministic)', async () => {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Issue: --relate tests miss the documented full-IRI predicate path

What's wrong
The changed CLI exposes a user-facing contract for custom relation predicates, but the tests only cover the shorthand schema.org term. That leaves the custom-vocabulary path unverified, even though it is the risky part of the parser contract.

Example
A regression where --relate Dataset>Table=urn:rel:contains serializes as <http://schema.org/urn:rel:contains> instead of <urn:rel:contains> would still pass the current tests because they only assert the shorthand hasPart path.

Suggested direction
Add a CLI-level regression test for a custom full IRI predicate, preferably one without ://, so the parser contract is verified rather than only the schema.org shorthand case.

For Agents
Look at packages/cli/src/commands/okf.ts parseRelateRules and the CLI dry-run tests in packages/cli/test/okf-subcommands.test.ts. Add a test that runs dkg okf import --dry-run --print-nquads --relate Dataset>Table=urn:rel:contains and proves the N-Quads use the exact custom predicate, while preserving the existing schema.org shorthand behavior.

@branarakic branarakic merged commit c6cf49a into main Jun 30, 2026
46 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants