feat(okf): Google OKF → DKG integration (OKF-only, carved from #1331) #1388
Merged
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,123 @@ | ||
| # OKF | ||
|
|
||
| Deterministic Google Open Knowledge Format (OKF) → DKG mapper. Turns a portable | ||
| OKF bundle (Markdown + YAML frontmatter + untyped cross-links) into owned, | ||
| verifiable RDF Knowledge Assets, reconstructing the bundle's cross-concept link | ||
| graph. Pure, no LLM, no network: the same bundle always yields identical triples | ||
| and IRIs. The `dkg okf` CLI command is a thin wrapper over this package. | ||
|
|
||
| The framing: OKF standardises *how* knowledge is written and exchanged but ships | ||
| **no** verification, provenance or ownership layer (OKF SPEC §1, §10). The DKG | ||
| supplies exactly that. This package is the bridge — the trust-and-permanence | ||
| backend for OKF. | ||
|
|
||
| ## Language | ||
|
|
||
| **Bundle**: | ||
| A directory tree of UTF-8 Markdown files; the unit of distribution (OKF §3). Fed | ||
| to the mapper as an in-memory `BundleFile[]` (`{ path, content }`, POSIX paths). | ||
| `loadBundleDir` is the only filesystem surface; the mapper itself is I/O-free. | ||
|
|
||
| **Concept**: | ||
| One non-reserved `.md` file = YAML frontmatter + Markdown body (OKF §4). Each | ||
| concept becomes exactly one Knowledge Asset. Reserved `index.md` / `log.md` | ||
| files are **not** concepts and are never minted as KAs (OKF §3.1, §6, §7). | ||
|
|
||
| **Concept ID**: | ||
| The file's bundle-relative path with `.md` removed (OKF §2) — e.g. | ||
| `tables/blocks`. The path *is* the concept's identity. Segment validation agrees | ||
| byte-for-byte with the reference agent's `paths.py` (`[A-Za-z0-9_][A-Za-z0-9_.\-]*`). | ||
|
|
||
| **IRI**: | ||
| The Knowledge Asset subject IRI, derived deterministically from the concept ID: | ||
| `urn:okf:<conceptId>` (configurable base). Same bundle ⇒ same IRIs. This is the | ||
| RDF subject; the on-chain UAL is assigned by the node at publish time (it is not | ||
| the same thing — see Flagged ambiguities). | ||
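The Concept ID and IRI rules above can be sketched in a few lines. This is a minimal illustration, not the package's actual code: the function names `conceptIdFromPath` and `conceptIri` are hypothetical, and it assumes that reserved files are matched by file name at any depth and that the segment pattern is applied to each path segment after `.md` is removed. Only the regex and the `urn:okf:` base come from the text.

```typescript
// Segment pattern quoted in the Concept ID definition above.
const SEGMENT = /^[A-Za-z0-9_][A-Za-z0-9_.\-]*$/;
// Assumption: reserved files are recognised by file name at any depth.
const RESERVED = new Set(["index.md", "log.md"]);

/** Returns the concept ID for a bundle-relative POSIX path, or null if the file is not a concept. */
function conceptIdFromPath(path: string): string | null {
  if (!path.endsWith(".md")) return null;
  const fileName = path.slice(path.lastIndexOf("/") + 1);
  if (RESERVED.has(fileName)) return null; // reserved files are never minted as KAs
  const id = path.slice(0, -".md".length);
  // Assumption: every segment of the ID must match the segment pattern.
  return id.split("/").every((segment) => SEGMENT.test(segment)) ? id : null;
}

/** Derives the subject IRI; the base is configurable, as in the text. */
function conceptIri(conceptId: string, base: string = "urn:okf:"): string {
  return base + conceptId;
}
```

Because both functions are pure string operations, the same path always yields the same IRI, which is the property the determinism claim rests on.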
|
|
||
| **Link**: | ||
| A standard Markdown link `[text](path)` from one concept to another (OKF §5). | ||
| Resolved against the bundle (absolute `/abs`, relative `./`, parent `../`, | ||
| bare-sibling, extension-less forms) into an **untyped directed edge** | ||
| (`schema:mentions`). The kind of relationship lives in prose, not the link | ||
| (OKF §5.3) — the mapper never infers FK/join types. Broken links are warnings, | ||
| never errors (OKF §5.3, §9). | ||
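As a rough sketch of the link resolution described above, the function below resolves one link target against a set of known concept IDs. The names `resolveLink` and `normalise` and the result shape are hypothetical; it assumes that a leading `/` means bundle-root-relative, that fragments are dropped, and that anything with a URI scheme is external. The real mapper may differ in its edge cases.

```typescript
type LinkResult =
  | { kind: "edge"; target: string } // resolved to an existing concept
  | { kind: "broken"; target: string } // resolved, but no such concept (a warning, not an error)
  | { kind: "ignored" }; // external URL or same-file anchor: not an edge

/** Resolves "." and ".." in a POSIX path; returns null if it climbs above the bundle root. */
function normalise(path: string): string | null {
  const out: string[] = [];
  for (const part of path.split("/")) {
    if (part === "" || part === ".") continue;
    if (part === "..") {
      if (out.length === 0) return null;
      out.pop();
    } else {
      out.push(part);
    }
  }
  return out.join("/");
}

function resolveLink(fromPath: string, href: string, concepts: Set<string>): LinkResult {
  if (/^[A-Za-z][A-Za-z0-9+.\-]*:/.test(href)) return { kind: "ignored" }; // has a URI scheme
  const target = href.split("#")[0];
  if (target === "") return { kind: "ignored" }; // "#anchor" within the same file
  const slash = fromPath.lastIndexOf("/");
  const dir = slash >= 0 ? fromPath.slice(0, slash) : "";
  // Absolute links resolve from the bundle root; all other forms resolve from the source file's directory.
  const norm = normalise(target.startsWith("/") ? target : `${dir}/${target}`);
  if (norm === null) return { kind: "broken", target: href };
  // Extension-less and ".md" forms name the same concept.
  const id = norm.endsWith(".md") ? norm.slice(0, -".md".length) : norm;
  return concepts.has(id) ? { kind: "edge", target: id } : { kind: "broken", target: id };
}
```

For example, from `tables/transactions.md` the targets `blocks.md`, `./blocks`, `/tables/blocks` and `../tables/blocks.md` would all resolve to the concept `tables/blocks` under these assumptions.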
|
|
||
| **Citation**: | ||
| A link (usually an external URL) under a `# Citations` heading (OKF §8), backing | ||
| a claim. Mapped to `schema:citation`, semantically distinct from concept edges. | ||
|
|
||
| **Memory layers** (where imported assets live): | ||
| - **WM** (Working Memory): private to one agent, free, reversible. The import | ||
| default. | ||
| - **SWM** (Shared Working Memory): team-visible, gossip-replicated, free, | ||
| TTL-bounded. Reached with `--share` (finalize + advance). | ||
| - **VM** (Verifiable Memory): on-chain, permanent, costs TRAC. **Never** written | ||
| by this package; promotion is a separate, explicitly-gated operator step. | ||
|
|
||
| ## Relationships | ||
|
|
||
| - Bundle → many Concepts (+ reserved files, skipped). Pass 1 indexes the bundle | ||
| and builds the `conceptId → IRI` map; Pass 2 maps each concept and resolves its | ||
| links against that map (so an edge only forms to a concept that exists). | ||
| - Concept → one Knowledge Asset (one subject IRI) → many quads (frontmatter | ||
| triples + body sections + untyped edges + citations). | ||
| - Frontmatter key → RDF predicate via the locked table (ADR 0005). `type` is the | ||
| only required key (OKF §9); everything else degrades gracefully when absent. | ||
| - Link → `schema:mentions` edge **iff** its resolved target is a concept in the | ||
| bundle; otherwise it is a broken-link warning (target may be not-yet-written | ||
| knowledge) or, for external URLs, simply ignored as a non-edge. | ||
| - **Opt-in typed edges (`typeRelations` / `--relate`).** By default every edge is | ||
| `schema:mentions` (zero interpretation, faithful to OKF §5.3's untyped links). | ||
| A caller may supply deterministic `(fromType, toType) → predicate` rules to | ||
| type edges by their endpoints' OKF `type` — e.g. `BigQuery Dataset → BigQuery | ||
| Table = schema:hasPart` (containment) while `Table → Table` stays `mentions`. | ||
| This is byte-stable (types come straight from frontmatter, no prose, no LLM) | ||
| and **off by default** so the purity guarantee holds unless explicitly opted in. | ||
| Caveat: the rule is endpoint-type-based, so it cannot distinguish a same-dataset | ||
| containment link from a cross-dataset reference of the same type pair — use it | ||
| where that distinction doesn't apply, or leave the default. | ||
| - **Round-trip is graph-faithful, not byte-faithful, by design.** `import → | ||
| export → import` reproduces an equivalent *semantic graph*, not the original | ||
| bytes: free-form prose isn't recoverable from triples, so export regenerates | ||
| bodies structurally, and a typed edge (e.g. `hasPart`) exports as a plain | ||
| (untyped) OKF link because OKF can't express the relation type. This is a | ||
| deliberate choice, not a defect; a future enhancement may have export *add* | ||
| provenance (UAL / seal) when serialising from a published graph. | ||
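The two-pass structure and the opt-in typed edges can be illustrated together. The sketch below is an assumption-laden simplification: the `Concept`, `TypeRelation` and `Edge` shapes and the `buildEdges` name are hypothetical, links are assumed to be already resolved to concept IDs, and types are compared as raw frontmatter strings.

```typescript
interface Concept { id: string; type: string; links: string[] } // links: resolved target concept IDs
interface TypeRelation { fromType: string; toType: string; predicate: string }
interface Edge { subject: string; predicate: string; object: string }

const MENTIONS = "http://schema.org/mentions";

function buildEdges(
  concepts: Concept[],
  typeRelations: TypeRelation[] = [], // empty by default, so every edge stays schema:mentions
): { edges: Edge[]; warnings: string[] } {
  // Pass 1: index the bundle into conceptId -> IRI (and conceptId -> type) maps.
  const iriOf = new Map<string, string>();
  const typeOf = new Map<string, string>();
  for (const c of concepts) {
    iriOf.set(c.id, `urn:okf:${c.id}`);
    typeOf.set(c.id, c.type);
  }

  // Pass 2: an edge forms only when the target exists in the map built in pass 1.
  const edges: Edge[] = [];
  const warnings: string[] = [];
  for (const c of concepts) {
    for (const target of c.links) {
      const object = iriOf.get(target);
      if (object === undefined) {
        warnings.push(`broken link: ${c.id} -> ${target}`);
        continue;
      }
      // The rule looks only at endpoint types, never at prose.
      const rule = typeRelations.find(
        (r) => r.fromType === c.type && r.toType === typeOf.get(target),
      );
      edges.push({ subject: `urn:okf:${c.id}`, predicate: rule ? rule.predicate : MENTIONS, object });
    }
  }
  return { edges, warnings };
}
```

The caveat from the list above is visible in the code: the rule lookup sees only `(fromType, toType)`, so two links with the same type pair always receive the same predicate regardless of which dataset they belong to.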
|
|
||
| ## Flagged ambiguities | ||
|
|
||
| - **Reuse vs. fork of the Markdown extractor.** The node's `markdown-extractor.ts` | ||
| is regex-based and resolves only `[[wikilinks]]`, not OKF's `[text](path)` | ||
| links; importing it from `packages/cli` would also create a `cli → okf → cli` | ||
| dependency cycle. So we **converge on its predicate vocabulary** (same | ||
| `schema:*` / `dkg:hasSection` IRIs, pinned by a test) but use a **real Markdown | ||
| AST** (`mdast-util-from-markdown`) for link/section/citation extraction — which | ||
| OKF §2 mandates and which is what lets us honour the in-code-span rule below. | ||
| - **Links inside inline code spans.** `outputs.md` writes its only two concept | ||
| links inside backticks: `` `[transactions](transactions.md)` ``. CommonMark | ||
| treats code-span content as literal text, so **by default these are NOT edges** | ||
| (the mechanism-first answer). They are recorded as `codeSpanLinks` and surfaced | ||
| as warnings. `--include-code-span-links` flips the policy; both behaviours are | ||
| tested. | ||
| - **IRI derivation / UAL.** Concept subject IRIs are `urn:okf:<conceptId>`, a pure | ||
| function of the concept ID. The on-chain UAL (`did:dkg:<chain>/<ka>/<n>`) is | ||
| assigned by the node at VM publish (RFC-43 pre-knowable UALs are still draft) — | ||
| do not conflate the two. WM/SWM data carries no on-chain verification. | ||
| - **`type` normalisation.** A bare `type` value is PascalCased into the schema.org | ||
| namespace (`BigQuery Dataset` → `http://schema.org/BigQueryDataset`); a full IRI | ||
| `type` is used unchanged. Round-trips losslessly because PascalCase of the local | ||
| name is idempotent. | ||
| - **`timestamp` → `schema:dateModified`.** OKF defines `timestamp` as last-modified | ||
| time, so we map it to `schema:dateModified` (typed `xsd:dateTime`) rather than | ||
| the extractor's naive `schema:timestamp` slug — a deliberate semantic choice. | ||
| - **`resource` → `schema:url`.** Chosen over `dcterms:source`; documented in ADR 0005. | ||
| - **Citations, two styles.** Both numbered (`[1] [text](url)`) and bare-bullet | ||
| (`- https://…`) forms are parsed leniently; deduplicated by URL. | ||
| - **Folder hierarchy.** `schema:isPartOf` from directory structure is **off by | ||
| default** — directories are not concepts, and minting them as graph nodes would | ||
| muddy the concept graph. Available via `emitFolderHierarchy`. | ||
| - **Producer-defined keys** are always preserved (camelCased into schema.org), | ||
| never dropped or rejected (OKF §4.1, §9). | ||
| - **Conformance is permissive.** Only two rules make a bundle non-conformant | ||
| (unparseable frontmatter; missing non-empty `type`). Missing optionals, unknown | ||
| types/keys, broken links and missing `index.md` are tolerated (OKF §9). |
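The `type` normalisation rule lends itself to a short worked example. The sketch below is one plausible reading, not the package's implementation: `normaliseType` is a hypothetical name, a "full IRI" is assumed to be any value that starts with a URI scheme, and words are assumed to be split on non-alphanumeric characters.

```typescript
const SCHEMA_ORG = "http://schema.org/";

function normaliseType(type: string): string {
  // Assumption: a value that starts with a URI scheme is a full IRI and is used unchanged.
  if (/^[A-Za-z][A-Za-z0-9+.\-]*:/.test(type)) return type;
  // PascalCase: upper-case the first character of each word and keep the rest as written.
  const local = type
    .split(/[^A-Za-z0-9]+/)
    .filter((word) => word.length > 0)
    .map((word) => word.charAt(0).toUpperCase() + word.slice(1))
    .join("");
  return SCHEMA_ORG + local;
}
```

Since only the first character of each word is changed, an already PascalCased local name such as `BigQueryDataset` passes through untouched, which is the idempotence the round-trip argument relies on.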
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,226 @@ | ||
| # OKF ↔ DKG live demo runbook | ||
|
|
||
| End-to-end demonstration: the same portable Bitcoin Markdown, turned into | ||
| **owned, shareable** Knowledge Assets on a DKG **mainnet** node, imported into a | ||
| Context Graph, **shared through Shared Working Memory**, verified by a second | ||
| peer, and reasoned over by a Hermes agent. | ||
|
|
||
| **Cost model — read this first.** | ||
| - Steps 1–7 are the default demo and **spend nothing**. Working Memory (WM) and | ||
| Shared Working Memory (SWM) are free. | ||
| - Step 8 — **Verifiable Memory (VM) promotion — is deferred.** It spends real | ||
| **TRAC + native gas, irreversibly**, on mainnet (no faucet). It is *not* part | ||
| of the default run; the operator triggers it deliberately, after confirming | ||
| funds. It is documented here, clearly marked, and only its UALs/txHashes are | ||
| recorded *if and when* it is actually run. | ||
| - This runbook is **operator-run, never CI.** The free offline correctness gate | ||
| (`pnpm --filter @origintrail-official/dkg-okf test`) is the CI gate; it never | ||
| touches a node and is green before any of this is attempted. | ||
|
|
||
| Throughout: **never imply on-chain verification for WM/SWM data.** State which | ||
| memory layer each piece of evidence lives in. | ||
|
|
||
| Conventions: the commands below assume the shell variables `CG=okf-crypto-bitcoin` and `BUNDLE=packages/okf/test/fixtures/crypto_bitcoin` are set. | ||
|
|
||
| --- | ||
|
|
||
| ## 0. Offline correctness gate (free, no node) | ||
|
|
||
| ```bash | ||
| pnpm --filter @origintrail-official/dkg-okf test # 60+ golden/edge/round-trip tests | ||
| dkg okf import $BUNDLE --dry-run --print-nquads # deterministic mapping, no node | ||
| ``` | ||
|
|
||
| Expect: 5 Knowledge Assets, 3 reserved `index.md` skipped, the reconstructed edge | ||
| graph, both citation styles, byte-stable N-Quads. Run it twice — identical output. | ||
|
|
||
| ## 1. Launch the node on mainnet | ||
|
|
||
| ```bash | ||
| dkg init # choose a mainnet blockchain (e.g. mainnet-base / mainnet-gnosis / mainnet-neuroweb) | ||
| dkg start | ||
| dkg status # daemon PID, version, listening port | ||
| dkg doctor # health checks | ||
| ``` | ||
|
|
||
| **Verify the node is actually on mainnet, not testnet/devnet, before going | ||
| further** — confirm the active network/chain in the printed config / `dkg doctor` | ||
| output. `edge` is the default role. | ||
|
|
||
| ## 2. Attach a Hermes agent | ||
|
|
||
| ```bash | ||
| dkg hermes setup # configure the Hermes-runtime agent bound to this node | ||
| dkg hermes # run it | ||
| ``` | ||
|
|
||
| Confirm the acting agent identity (this is the agent the shared bundle is | ||
| *proposed to* in step 6): | ||
|
|
||
| ```bash | ||
| curl -s localhost:<apiPort>/api/agent/identity # → { agentAddress, agentDid, name, framework, peerId } | ||
| ``` | ||
|
|
||
| Record the `agentAddress`. | ||
|
|
||
| ## 3. Import the bundle into a Context Graph (Working Memory — free) | ||
|
|
||
| ```bash | ||
| dkg okf import $BUNDLE --context-graph-id $CG --create-context-graph | ||
| ``` | ||
|
|
||
| Import defaults to **Working Memory** — free, private, reversible. Expect the | ||
| summary: `5 concepts, 3 reserved skipped, 101 triples, 11 links resolved, 0 | ||
| broken, 10 citations`, plus the deterministic `urn:okf:*` IRIs and the | ||
| `memoryLayer: "WM"` note. | ||
|
|
||
| Confirm the 5 assets and the reconstructed edges are present **in WM** via SPARQL | ||
| (`/api/query`, `view: working-memory`, `agentAddress` required for WM): | ||
|
|
||
| ```bash | ||
| curl -s localhost:<apiPort>/api/query -H 'content-type: application/json' -d '{ | ||
| "contextGraphId":"okf-crypto-bitcoin", | ||
| "view":"working-memory", | ||
| "agentAddress":"<agentAddress>", | ||
| "sparql":"SELECT ?s ?o WHERE { ?s <http://schema.org/mentions> ?o }" | ||
| }' | ||
| ``` | ||
|
|
||
| Expect the 11 `schema:mentions` edges (dataset→4 tables, transactions→4, | ||
| inputs→3). `tables/outputs` has none — its only links sit inside backticks | ||
| (CommonMark: literal text). All evidence here is **WM (private, free, no on-chain | ||
| verification).** | ||
|
|
||
| ## 4. Finalize and share to Shared Working Memory (free) | ||
|
|
||
| ```bash | ||
| dkg okf import $BUNDLE --context-graph-id $CG --share | ||
| ``` | ||
|
|
||
| `--share` seals each asset (`wm/finalize`) and advances it (`swm/share`, | ||
| `entities: "all"`). SWM is free, gossip-replicated and team-visible — this is the | ||
| moment the Bitcoin bundle becomes a **shared Context Graph** other agents can | ||
| reach. The assets are now sealed and *publish-ready*, but **publishing waits** | ||
| (step 8). | ||
|
|
||
| Confirm the same assets/edges in the `shared-working-memory` view: | ||
|
|
||
| ```bash | ||
| # 11 cross-table edges | ||
| dkg query $CG -q 'SELECT ?s ?o WHERE { ?s <http://schema.org/mentions> ?o }' --include-shared-memory | ||
|
|
||
| # Exactly 5 concepts. Count subjects that have an rdf:type — only the 5 concepts | ||
| # do. A naive `STRSTARTS(STR(?s),"urn:okf:")` count returns ~19 because the | ||
| # daemon skolemises each concept's dkg:hasSection blank nodes into | ||
| # `urn:okf:.../.well-known/genid/...` subjects, which also match the prefix. | ||
| dkg query $CG -q 'SELECT (COUNT(DISTINCT ?s) AS ?n) WHERE { ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?t FILTER(STRSTARTS(STR(?s), "urn:okf:")) }' --include-shared-memory | ||
| ``` | ||
|
|
||
| Evidence here is **SWM (shared, free, TTL-bounded, no on-chain verification).** | ||
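The difference between the two counts can be reproduced on a toy quad list. Everything in this sketch is illustrative: the quads are made up, the `hasSection` predicate IRI is a placeholder rather than the real `dkg:hasSection` IRI, and the skolemised subject only imitates the `urn:okf:.../.well-known/genid/...` shape mentioned in the comment above.

```typescript
interface Quad { subject: string; predicate: string; object: string }

const RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";
const HAS_SECTION = "https://example.org/placeholder#hasSection"; // placeholder IRI, not the real one

/** Naive count: every distinct subject under the prefix, including skolemised section nodes. */
function countByPrefix(quads: Quad[], prefix: string): number {
  return new Set(quads.filter((q) => q.subject.startsWith(prefix)).map((q) => q.subject)).size;
}

/** Concept count: only distinct subjects under the prefix that carry an rdf:type. */
function countTyped(quads: Quad[], prefix: string): number {
  return new Set(
    quads
      .filter((q) => q.predicate === RDF_TYPE && q.subject.startsWith(prefix))
      .map((q) => q.subject),
  ).size;
}

const quads: Quad[] = [
  { subject: "urn:okf:tables/blocks", predicate: RDF_TYPE, object: "http://schema.org/BigQueryTable" },
  { subject: "urn:okf:tables/blocks", predicate: HAS_SECTION, object: "urn:okf:tables/blocks/.well-known/genid/b0" },
  { subject: "urn:okf:tables/blocks/.well-known/genid/b0", predicate: "http://schema.org/name", object: "Schema" },
  { subject: "urn:okf:tables/transactions", predicate: RDF_TYPE, object: "http://schema.org/BigQueryTable" },
];

console.log(countByPrefix(quads, "urn:okf:")); // 3: two concepts plus one section node
console.log(countTyped(quads, "urn:okf:")); // 2: the concepts only
```

The typed count mirrors the second `dkg query` above; the prefix-only count mirrors the naive `STRSTARTS` filter that over-counts.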
|
|
||
| ## 5. Issue a join invitation and have a second peer verify it | ||
|
|
||
| ```bash | ||
| # Curator side — invite a peer (the V10 invite is the pair <contextGraphId>\n<curatorPeerId>): | ||
| dkg context-graph invite $CG <joiningPeerId> | ||
| # For a curated graph, allow the joining agent: | ||
| dkg context-graph add-agent $CG <joiningAgentAddress> | ||
| # …or have the joiner request-join and the curator approve-join. | ||
| ``` | ||
|
|
||
| From a **second node/agent**: | ||
|
|
||
| ```bash | ||
| dkg subscribe okf-crypto-bitcoin # subscribe + catch up | ||
| dkg query okf-crypto-bitcoin -q 'SELECT ?s ?o WHERE { ?s <http://schema.org/mentions> ?o }' --include-shared-memory | ||
| ``` | ||
|
|
||
| Record the invite and the second peer's query result — the shared Context Graph | ||
| is independently checkable by another peer, all in **free SWM**. | ||
|
|
||
| ## 6. Hermes agent reasons over the shared knowledge | ||
|
|
||
| Have the Hermes agent answer a natural-language question through its `dkg_*` | ||
| tools over the `shared-working-memory` view, e.g. *"what does the `transactions` | ||
| table reference?"*: | ||
|
|
||
| ```sparql | ||
| SELECT ?o WHERE { <urn:okf:tables/transactions> <http://schema.org/mentions> ?o } | ||
| ``` | ||
|
|
||
| Expect the four targets: `urn:okf:datasets/crypto_bitcoin`, `urn:okf:tables/blocks`, | ||
| `urn:okf:tables/inputs`, `urn:okf:tables/outputs`. Capture the transcript — an | ||
| agent consuming OKF-derived, provenance-bearing knowledge from the shared graph. | ||
|
|
||
| ## 7. Recreated, visibly | ||
|
|
||
| Regenerate the graph from the shared Context Graph and compare it to Google's own | ||
| `viz.html`: | ||
|
|
||
| ```bash | ||
| dkg okf export okf-crypto-bitcoin ./out --view shared-working-memory | ||
| ``` | ||
|
|
||
| `export` is the clean inverse of `import` (graph-faithful). Confirm the | ||
| regenerated bundle's `schema:mentions` structure matches the dataset→tables and | ||
| cross-table edges in the bundle's own | ||
| `okf/bundles/crypto_bitcoin/viz.html`. (`packages/graph-viz` can render the graph | ||
| view directly.) | ||
|
|
||
| **At this point the deliverable is complete: a shared, peer-verified, | ||
| agent-queried Context Graph in SWM. Nothing has been spent.** | ||
|
|
||
| --- | ||
|
|
||
| ## 8. VM promotion — staged, but it waits (DEFERRED; real TRAC + gas) | ||
|
|
||
| > **Do not run this as part of the demo.** The assets are sealed and | ||
| > publish-ready in SWM, so promotion to Verifiable Memory is one step away — held | ||
| > until the operator deliberately chooses to spend. | ||
|
|
||
| When (and only when) the operator chooses to promote: | ||
|
|
||
| 1. **Confirm funding first** — on mainnet there is **no faucet**: | ||
| ```bash | ||
| dkg wallet # or: curl -s localhost:<apiPort>/api/wallets/balances | ||
| ``` | ||
| Abort if TRAC + native gas are insufficient. | ||
| 2. **Publish ONE asset first** to observe real cost and validate the on-chain | ||
| path (the dataset). The first publish transparently registers the Context | ||
| Graph on-chain — expect gas/TRAC: | ||
| ```bash | ||
| # vm/publish for a single KA (gate behind explicit confirmation in your runbook). | ||
| # The KA name is the concept ID with '/' mapped to '__' (asset names can't contain '/'). | ||
| curl -s localhost:<apiPort>/api/knowledge-assets/datasets__crypto_bitcoin/vm/publish \ | ||
| -H 'content-type: application/json' -d '{"contextGraphId":"okf-crypto-bitcoin"}' | ||
| ``` | ||
| Record the returned UAL (`did:dkg:<chainId>/<kasAddress>/<number>`) and | ||
| `txHash`. | ||
| 3. **Then publish the rest** and re-verify via the `verifiable-memory` view. | ||
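The asset-name mapping noted in the publish comment above (concept ID with `/` mapped to `__`) is simple enough to sketch. The helper names are hypothetical, and the functions only build strings; they do not call the node or spend anything.

```typescript
/** Maps a concept ID to a Knowledge Asset name by replacing every "/" with "__". */
function kaNameFromConceptId(conceptId: string): string {
  return conceptId.split("/").join("__");
}

/** Builds the publish path used in the curl example above; it does not perform the request. */
function vmPublishPath(conceptId: string): string {
  return `/api/knowledge-assets/${kaNameFromConceptId(conceptId)}/vm/publish`;
}
```

One consequence of this mapping as described is that it is not reversible for concept IDs that already contain `__` (for instance, `a/b` and `a__b` would produce the same name), so such IDs are worth checking for before promotion.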
|
|
||
| | Asset | UAL | txHash | | ||
| |---|---|---| | ||
| | `datasets/crypto_bitcoin` | _(record if run)_ | _(record if run)_ | | ||
| | `tables/blocks` | | | | ||
| | `tables/transactions` | | | | ||
| | `tables/inputs` | | | | ||
| | `tables/outputs` | | | | ||
|
|
||
| Until promoted, the demo's deliverable is the **shared, peer-verified, | ||
| agent-queried** Context Graph in SWM. Only VM data carries on-chain verification; | ||
| WM/SWM data never does. | ||
|
|
||
| --- | ||
|
|
||
| ## Evidence log (fill in during the run) | ||
|
|
||
| - Node network/chain confirmed mainnet: ______ | ||
| - Hermes `agentAddress`: ______ | ||
| - Import summary (WM): 5 concepts / 101 triples / 11 edges / 0 broken / 10 citations | ||
| - WM SPARQL edge count: ______ | ||
| - SWM SPARQL edge count: ______ | ||
| - Join invitation: ______ | ||
| - Second peer query result: ______ | ||
| - Hermes agent transcript: ______ | ||
| - Regenerated graph vs `viz.html`: ______ | ||
| - (Deferred) VM UALs / txHashes: _(only if step 8 was run)_ |
🔴 Bug: The top-level OKF import is not covered by the runtime build graph
What's wrong
This adds a required startup dependency for the whole CLI, but the existing runtime build path still omits the new workspace package. That can make the packaged or auto-updated CLI fail to build or fail to start, even for unrelated commands.
Example
On a fresh source/runtime build that runs the existing `build:runtime:packages` path, `packages/okf/dist/index.js` is not produced. Starting `node packages/cli/dist/cli.js status` can fail before command dispatch because `commands/okf.js` imports `@origintrail-official/dkg-okf`.
Suggested direction
Wire `@origintrail-official/dkg-okf` into the same build paths as `dkg-epcis` before loading the command from the CLI entrypoint.
For Agents
Add the new OKF package to the CLI build graph: include it in `packages/cli/tsconfig.json` references/prebuild as needed and in the root runtime package build filter used by auto-update/release builds. Prove a clean runtime build can execute a non-OKF command and `dkg okf --help`.