OriginTrail · Zigoljube · Jun 25, 2026
diff --git a/docs/adr/0005-okf-rdf-mapping.md b/docs/adr/0005-okf-rdf-mapping.md
@@ -0,0 +1,112 @@
+# ADR 0005 - OKF → RDF mapping and extractor reuse-vs-fork
+
+- **Status**: Accepted
+- **Date**: 2026-06-25
+- **Version**: v10
+- **Package**: `packages/okf` (`@origintrail-official/dkg-okf`)
+- **Related**: ADR 0002 (importer chunking contract), ADR 0003 (code-graph
+  ontology convergence), `packages/cli/src/extraction/markdown-extractor.ts`,
+  OKF SPEC v0.1 (`GoogleCloudPlatform/knowledge-catalog`, commit
+  `d44368c15e38e7c92481c5992e4f9b5b421a801d`)
+
+## Context
+
+Google's Open Knowledge Format (OKF) standardises *how* knowledge is written and
+exchanged — portable UTF-8 Markdown with YAML frontmatter and cross-links that
+form a graph — but deliberately ships **no** verification, provenance or
+ownership layer (OKF SPEC §1, §10). The DKG supplies exactly that. We want to
+ingest an OKF bundle as deterministic, owned, verifiable Knowledge Assets,
+reconstructing the bundle's cross-concept link graph, and surface it as a
+first-class integration.
+
+Two questions had to be settled before writing code:
+
+1. **Reuse or fork the existing Markdown extractor?** The node already ingests
+   Markdown deterministically via `markdown-extractor.ts`. Convergence — an OKF
+   import and a natively-ingested Markdown corpus yielding joinable graphs — is a
+   goal.
+2. **What is the exact OKF → RDF predicate mapping?** It must be deterministic
+   (no LLM), reconcile with the extractor's predicates, and honour OKF's
+   permissive consumer rules (§9).
+
+## Decision
+
+### Reuse the vocabulary, not the code; add a real Markdown AST
+
+The existing extractor is **regex-based** and resolves only `[[wikilinks]]` →
+`schema:mentions`. OKF cross-links are standard Markdown links `[text](path.md)`,
+which that extractor does not handle. Importing `extractFromMarkdown` from
+`packages/cli` into `packages/okf` would also create a `cli → okf → cli`
+dependency cycle (the CLI `okf` command depends on this package).
+
+Therefore:
+
+- **Converge on the extractor's predicate vocabulary** (`rdf:type`,
+  `schema:name`, `schema:description`, `schema:keywords`, `schema:mentions`,
+  `http://dkg.io/ontology/hasSection`), re-declaring the same literal IRIs in
+  `constants.ts` and pinning the convergence with a test. The object encoding
+  (raw IRIs without angle brackets; literals as `"…"` / `"…"^^<dt>`; blanks as
+  `_:…`) matches the node's `Quad` term form so output drops straight into
+  `/api/knowledge-assets/.../wm/write`.
+- **Use a real Markdown AST** (`mdast-util-from-markdown`) for link, section and
+  citation extraction, as OKF §2 requires — this is what lets the importer honour
+  CommonMark semantics for links inside inline code spans.
+- Keep `packages/okf` **runtime-standalone** (local `isSafeIri`, local `Quad`
+  type, no `dkg-core` runtime import) so the pure mapper is unit-testable in
+  isolation.
+
+### Locked OKF → RDF mapping
+
+| OKF frontmatter / construct | RDF predicate | Object | Notes |
+|---|---|---|---|
+| `type` (required) | `rdf:type` | full IRI as-is, else `http://schema.org/<PascalCase>` | `BigQuery Dataset` → `schema:BigQueryDataset` |
+| `title` | `schema:name` | literal | |
+| `description` | `schema:description` | literal | |
+| `tags` (list) | `schema:keywords` | one literal per tag | converges with the extractor's hashtag handling |
+| `timestamp` (ISO 8601) | `schema:dateModified` | literal `^^xsd:dateTime` | OKF defines it as last-modified |
+| `resource` (canonical URI) | `schema:url` | IRI | chosen over `dcterms:source`; absent for abstract concepts |
+| producer-defined keys | `http://schema.org/<camelCase>` | typed literal / IRI | **preserved, never dropped** (§4.1, §9) |
+| concept link (§5) | `schema:mentions` | target concept IRI | **one untyped directed edge** — no FK/join inference (§5.3) |
+| `# Citations` URL (§8) | `schema:citation` | URL IRI | numbered + bare-bullet styles; distinct from edges |
+| body headings (`# …`) | `dkg:hasSection` | section blank node + `schema:name` | OKF titles live in frontmatter, so body H1s are genuine sections |
+| folder hierarchy | `schema:isPartOf` | parent IRI | **off by default** (directories are not concepts) |
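
A minimal TypeScript sketch of the `type` and producer-key rows in the table above. The helper names (`typeToIri`, `producerKeyToIri`), the scheme check, and the choice of word separators are assumptions made for illustration; the casing rules actually used in `packages/okf` may differ.

```typescript
// Illustrative sketch only: names and casing rules are assumptions, not the package API.
const SCHEMA = "http://schema.org/";

// Treat a value as a full IRI when it starts with a URI scheme such as "http:" or "urn:".
function isFullIri(value: string): boolean {
  return /^[A-Za-z][A-Za-z0-9+.-]*:\S+$/.test(value);
}

// Split on whitespace, hyphens and underscores, dropping empty pieces.
function words(value: string): string[] {
  return value.split(/[\s_-]+/).filter((w) => w.length > 0);
}

function upperFirst(word: string): string {
  return word.charAt(0).toUpperCase() + word.slice(1);
}

function lowerFirst(word: string): string {
  return word.charAt(0).toLowerCase() + word.slice(1);
}

// `type` row: full IRI as-is, else http://schema.org/<PascalCase>.
function typeToIri(type: string): string {
  const trimmed = type.trim();
  if (isFullIri(trimmed)) return trimmed;
  return SCHEMA + words(trimmed).map(upperFirst).join("");
}

// Producer-defined key row: http://schema.org/<camelCase>.
function producerKeyToIri(key: string): string {
  const parts = words(key.trim()).map(upperFirst);
  return SCHEMA + lowerFirst(parts.join(""));
}
```

Under these assumptions, `typeToIri("BigQuery Dataset")` gives `http://schema.org/BigQueryDataset`, matching the example in the table, and a value that is already a full IRI passes through unchanged.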
+
+### IRI derivation
+
+Concept subject IRI = `urn:okf:<conceptId>` (configurable base), a pure function
+of the concept ID. Same bundle ⇒ same IRIs. This is the RDF subject; the on-chain
+UAL is assigned by the node at VM publish (RFC-43 pre-knowable UALs are draft) and
+is a distinct identifier.
+
+### Edge cases (the judgement calls)
+
+- **Links inside inline code spans** are **not** edges by default (CommonMark:
+  code-span content is literal text). Recorded as `codeSpanLinks` + warned.
+  `--include-code-span-links` opts in. This is the only place the prompt's
+  illustrative acceptance list (`outputs → transactions, inputs`) and the
+  implementation diverge — by design, and documented here and in `CONTEXT.md`.
+- **Broken links** (resolved target not in the bundle) → warning, never fatal
+  (§5.3, §9); the target may be not-yet-written knowledge.
+- **Reserved files** (`index.md`, `log.md`) are never minted as KAs; `okf_version`
+  is read only from a root `index.md` (§11).
+- **Conformance** is permissive: only unparseable frontmatter or a missing or
+  empty `type` makes a bundle non-conformant (§9); all other irregularities
+  are tolerated.
+
+### Determinism
+
+The mapper is pure and LLM-free. Quads are serialised to canonical N-Quads
+(deduped + lexically sorted), so the same bundle yields byte-identical output and
+independent importers converge on the same graph (consistent with ADR 0002).
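
The canonicalisation step can be sketched as follows. The `Quad` shape mirrors the term form described earlier (raw IRIs without angle brackets, literals and blank nodes already in N-Quads form), but the interface and the function names are assumptions for illustration, not the package's actual types.

```typescript
// Illustrative sketch: this `Quad` shape and `toCanonicalNQuads` are assumptions.
interface Quad {
  subject: string;   // raw IRI or "_:label"
  predicate: string; // raw IRI
  object: string;    // raw IRI, "_:label", or a literal already in N-Quads form
  graph?: string;    // optional raw IRI
}

// Literals and blank nodes are emitted as-is; raw IRIs get angle brackets.
function term(value: string): string {
  if (value.charAt(0) === '"' || value.indexOf("_:") === 0) return value;
  return "<" + value + ">";
}

function toLine(q: Quad): string {
  const parts = [term(q.subject), term(q.predicate), term(q.object)];
  if (q.graph !== undefined) parts.push(term(q.graph));
  return parts.join(" ") + " .";
}

// Dedupe, then sort with the default Array.prototype.sort comparison
// (UTF-16 code unit order), which does not depend on the host locale.
function toCanonicalNQuads(quads: Quad[]): string {
  const unique = Array.from(new Set(quads.map(toLine)));
  unique.sort();
  return unique.map((line) => line + "\n").join("");
}
```

Input order and duplicates do not affect the result. Byte-identical output additionally requires that blank-node labels are assigned deterministically, since they take part in the sort.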
+
+## Consequences
+
+- An OKF import and a native Markdown import share predicates and therefore join
+  naturally in SPARQL.
+- The mapper is a clean, isolated, unit-tested unit; the CLI command and the node
+  are thin wrappers.
+- `export.ts` is the clean inverse (graph-faithful, not byte-faithful): bodies are
+  regenerated structurally, so round-trip equivalence is asserted over the
+  semantic (non-`hasSection`) quad set.
+- We do **not** infer typed FK/join semantics from untyped OKF links; surfacing
+  that would require a separate, clearly-labelled Layer-2 enrichment pass.
diff --git a/docs/integrations/okf.md b/docs/integrations/okf.md
@@ -0,0 +1,227 @@
+---
+status: current
+version: v10
+title: "OKF on the DKG: a trust-and-permanence backend for Google's Open Knowledge Format"
+---
+
+# OKF on the DKG: a trust-and-permanence backend for Google's Open Knowledge Format
+
+*The same portable OKF Markdown — now owned, verifiable, and shareable across agents.*
+
+## 1. The thesis: complementary, not competing
+
+Google's **Open Knowledge Format (OKF)** is a beautifully simple way to write and
+exchange knowledge: a bundle is just a directory tree of UTF-8 Markdown files,
+each with a little YAML frontmatter and ordinary Markdown links between them. The
+links form a graph; the format is human- and agent-readable; it travels as a git
+repo or a tarball. Crucially, OKF **deliberately ships no verification, provenance
+or ownership layer** (OKF SPEC §1, §10). That is a feature — it keeps the format
+portable — but it leaves a gap: nothing about an OKF bundle tells you *who* wrote
+it, whether it has been tampered with, or who owns it.
+
+The **OriginTrail Decentralized Knowledge Graph (DKG)** supplies exactly that
+missing half: cryptographically provenanced, owned, shareable Knowledge Assets,
+with a three-layer memory model (private → shared → on-chain). OKF and the DKG
+solve adjacent halves of one problem.
+
+This integration makes the DKG the **trust-and-permanence backend for OKF**. The
+one-line claim, made literally true in the code: *the same portable OKF Markdown,
+now owned, verifiable, and shareable across agents.*
+
+## 2. What the integration does, mechanically
+
+`dkg okf import <bundleDir>` ingests an OKF bundle into a DKG Context Graph as
+Knowledge Assets, reconstructing the bundle's cross-concept link graph. It is
+**pure and deterministic — no LLM** — so the same bundle always yields identical
+triples and IRIs, and independent importers converge on the same graph.
+
+It works in **two passes** (`packages/okf`):
+
+- **Pass 1 — index the bundle.** Walk the tree; separate concept files from
+  reserved `index.md` / `log.md` (which are *never* minted as Knowledge Assets,
+  OKF §3.1/§6/§7); parse each concept's frontmatter; derive a stable subject IRI
+  per concept **from its concept ID** (`tables/blocks` → `urn:okf:tables/blocks`);
+  build a `conceptId → IRI` map. Read `okf_version` from the root `index.md` if
+  present (§11).
+- **Pass 2 — extract + link.** For each concept, map frontmatter + body to RDF,
+  then resolve its Markdown links against the Pass-1 map into untyped directed
+  edges.
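
The two passes can be sketched over an in-memory list of bundle-relative paths. Everything named here (`indexBundle`, `resolveLinks`, `resolveHref`, the `Edge` shape) is hypothetical, and the sketch assumes a concept ID is the bundle-relative path without its `.md` suffix. Frontmatter parsing and Markdown link extraction are left out; the link targets are passed in as plain strings.

```typescript
// Illustrative sketch of the two passes; all names are hypothetical.
const MENTIONS = "http://schema.org/mentions";
const RESERVED = ["index.md", "log.md"];

interface Edge { subject: string; predicate: string; object: string; }

function baseName(path: string): string {
  return path.slice(path.lastIndexOf("/") + 1);
}

// Assumption: concept ID = bundle-relative path without the ".md" suffix.
function conceptId(path: string): string {
  return path.replace(/\.md$/, "");
}

// Pass 1: map every non-reserved file to a stable IRI.
function indexBundle(paths: string[], base: string = "urn:okf:"): Map<string, string> {
  const index = new Map<string, string>();
  for (const path of paths) {
    if (RESERVED.indexOf(baseName(path)) !== -1) continue; // never minted as KAs
    index.set(conceptId(path), base + conceptId(path));
  }
  return index;
}

// Resolve a relative href against the directory of the linking file.
function resolveHref(fromPath: string, href: string): string {
  const target = href.split("#")[0];
  const segments = fromPath.split("/").slice(0, -1);
  for (const part of target.split("/")) {
    if (part === "" || part === ".") continue;
    if (part === "..") segments.pop();
    else segments.push(part);
  }
  return segments.join("/");
}

// Pass 2: resolvable links become untyped edges; unresolved ones become warnings.
function resolveLinks(
  fromPath: string,
  hrefs: string[],
  index: Map<string, string>,
): { edges: Edge[]; warnings: string[] } {
  const edges: Edge[] = [];
  const warnings: string[] = [];
  const subject = index.get(conceptId(fromPath));
  if (subject === undefined) return { edges, warnings };
  for (const href of hrefs) {
    if (/^[a-z][a-z0-9+.-]*:/i.test(href)) continue; // external URL, not a concept link
    const object = index.get(conceptId(resolveHref(fromPath, href)));
    if (object === undefined) warnings.push("broken link: " + href);
    else edges.push({ subject, predicate: MENTIONS, object });
  }
  return { edges, warnings };
}
```

Building the full index before resolving any link is what allows a link to point at a concept that appears later in the directory walk, and a target missing from the index produces a warning instead of a failure.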
+
+A key design choice: rather than invent a second vocabulary, we **converge on
+the predicate vocabulary** the node's existing deterministic Markdown extractor
+already uses, so an OKF import and a natively-ingested Markdown corpus produce
+joinable graphs. The extractor's code is not reused: OKF cross-links are standard
+Markdown links `[text](path.md)`, and the existing extractor only handles
+`[[wikilinks]]`, so link resolution is new work. It uses a **real Markdown AST**,
+not a regex, which is what lets it honour CommonMark semantics (see the
+`outputs.md` edge case below).
+
+### The locked OKF → RDF mapping
+
+| OKF frontmatter / construct | RDF predicate | Notes |
+|---|---|---|
+| `type` (required) | `rdf:type` | full IRI as-is, else `http://schema.org/<PascalCase>` (`BigQuery Dataset` → `schema:BigQueryDataset`) |
+| `title` | `schema:name` | |
+| `description` | `schema:description` | |
+| `tags` (list) | `schema:keywords` | one triple per tag |
+| `timestamp` (ISO 8601) | `schema:dateModified` | typed `xsd:dateTime`; OKF defines it as last-modified |
+| `resource` (canonical URI) | `schema:url` | the underlying asset's URI; absent for abstract concepts |
+| producer-defined keys | `http://schema.org/<camelCase>` | **preserved, never dropped** (§4.1, §9) |
+| concept link (§5) | `schema:mentions` | **one untyped directed edge** |
+| `# Citations` URL (§8) | `schema:citation` | distinct from concept edges |
+| body headings | `dkg:hasSection` | |
+
+**Concept IDs become Knowledge Asset IRIs** deterministically: `urn:okf:<conceptId>`.
+This is the RDF subject. The on-chain UAL (`did:dkg:<chainId>/<kasAddress>/<number>`) is a
+*different* identifier, assigned by the node only when an asset is published to
+Verifiable Memory.
+
+**OKF's untyped cross-links become the relationship graph.** A link from concept A
+to concept B asserts *a* relationship; OKF §5.3 says the *kind* (foreign key,
+joins-with, depends-on) lives in the surrounding prose, not the link, and that
+graph consumers "treat all links as directed edges of an untyped relationship." So
+every resolved concept link maps to one `schema:mentions` edge. **We do not invent
+typed FK/join semantics** — that would be fabrication. What you get is exactly what
+OKF asserts: a directed, untyped relationship graph.
+
+To be concrete about what is *not* inferred: the integration reconstructs
+the link graph, not its semantics. If the prose says "join `inputs` to
+`transactions` on `transaction_hash`", the importer records that `inputs` mentions
+`transactions` — it does not synthesise a typed `joinsOn` predicate. Lifting
+untyped edges into typed relationships is a separate, clearly-labelled Layer-2
+concern, kept out of the deterministic importer.
+
+## 3. The lifecycle, end to end, with the Bitcoin bundle
+
+The proof artifact is Google's `crypto_bitcoin` OKF bundle — the public
+`bigquery-public-data.crypto_bitcoin` dataset (blocks, transactions, inputs,
+outputs) produced by the open-source `bitcoin-etl` pipeline. Its value is precisely
+the **cross-table foreign-key relationships expressed in prose** — the inter-concept
+link graph the importer must reconstruct.
+
+The three memory layers are the backbone of the story:
+
+- **Working Memory (WM)** — private, free, reversible.
+- **Shared Working Memory (SWM)** — shared, free, gossip-replicated, TTL-bounded.
+- **Verifiable Memory (VM)** — on-chain, permanent, costs real TRAC.
+
+### Launch a mainnet node and attach a Hermes agent
+
+A node operator runs `dkg init` (targeting a mainnet chain), `dkg start`,
+`dkg status` / `dkg doctor`, then stands up a **Hermes agent** bound to the node and
+records its `agentAddress` (`GET /api/agent/identity`). This is the agent that will
+later reason over the shared knowledge.
+
+### Import into a Context Graph (Working Memory)
+
+```bash
+dkg okf import packages/okf/test/fixtures/crypto_bitcoin \
+  --context-graph-id okf-crypto-bitcoin --create-context-graph
+```
+
+The reconstructed graph, all in **WM (private, free)**:
+
+- **5 Knowledge Assets** — the dataset (`type: BigQuery Dataset`) and four tables
+  (`type: BigQuery Table`). Zero assets for the three reserved `index.md` files.
+- **The dataset → four tables**, and the **cross-table references that were only
+  prose** now first-class edges:
+  - `transactions` → `crypto_bitcoin`, `blocks`, `inputs`, `outputs`
+  - `inputs` → `crypto_bitcoin`, `transactions`, `outputs`
+  - `blocks` → *(no concept edges; external citations only)*
+- **Citations** captured as `schema:citation` (the `bitcoin-etl` repo, the GCP blog
+  post, the BigQuery API URIs), parsed from **both** the numbered `[1] [text](url)`
+  style and the bare-bullet `- https://…` style present in the bundle.
+
+A nice, honest detail: `outputs.md`'s only two concept links are written *inside
+backticks* — `` `[transactions](transactions.md)` ``. CommonMark treats code-span
+content as literal text, so **by default these are not edges** (the importer warns
+and records them as `codeSpanLinks`; `--include-code-span-links` opts in). This is
+the mechanism-first reading, and it is the one place where the implementation
+deliberately diverges from a naive link count.
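
Why an AST makes this distinction natural can be sketched over a hand-built tree shaped like mdast output. The importer is described as using `mdast-util-from-markdown`; the tree below is written by hand only to keep the example self-contained. The trimmed `Node` type, `collectLinks`, and the regex used to spot link-shaped text inside a code span are all assumptions for illustration.

```typescript
// Illustrative sketch over a hand-built, mdast-shaped tree; names are hypothetical.
type Node =
  | { type: "paragraph"; children: Node[] }
  | { type: "link"; url: string; children: Node[] }
  | { type: "inlineCode"; value: string }
  | { type: "text"; value: string };

interface LinkScan { links: string[]; codeSpanLinks: string[]; }

// Matches link-shaped text such as "[transactions](transactions.md)".
const LINK_SHAPED = /\[[^\]]*\]\(([^)\s]+)\)/g;

function collectLinks(node: Node, out: LinkScan = { links: [], codeSpanLinks: [] }): LinkScan {
  if (node.type === "link") out.links.push(node.url);
  if (node.type === "inlineCode") {
    // A code span is a leaf holding literal text, so no `link` node exists for it.
    // Link-shaped text is recorded separately and never becomes an edge here.
    let match: RegExpExecArray | null;
    LINK_SHAPED.lastIndex = 0;
    while ((match = LINK_SHAPED.exec(node.value)) !== null) out.codeSpanLinks.push(match[1]);
  }
  if (node.type === "paragraph" || node.type === "link") {
    for (const child of node.children) collectLinks(child, out);
  }
  return out;
}

// Hand-built example: one real link and one link written inside backticks.
const paragraph: Node = {
  type: "paragraph",
  children: [
    { type: "text", value: "See " },
    { type: "link", url: "inputs.md", children: [{ type: "text", value: "inputs" }] },
    { type: "text", value: " and " },
    { type: "inlineCode", value: "[transactions](transactions.md)" },
  ],
};
const scan = collectLinks(paragraph);
```

Here `scan.links` contains only `inputs.md`, while `transactions.md` lands in `scan.codeSpanLinks`. A regex run over the raw Markdown would see both as links, which is the naive count the text above refers to.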
+
+Confirm it via SPARQL over `view: working-memory`:
+
+```sparql
+SELECT ?s ?o WHERE { ?s <http://schema.org/mentions> ?o }
+```
+
+### Seal and share to Shared Working Memory
+
+```bash
+dkg okf import … --share     # wm/finalize, then swm/share entities:"all"
+```
+
+The bundle becomes a **shared Context Graph** other agents can reach — still
+**free**, still carrying **no on-chain verification**, but now sealed and
+*publish-ready*.
+
+### Issue a join invitation; a second peer verifies
+
+The curator issues a join invitation (`dkg context-graph invite`), allows the
+joining agent (`dkg context-graph add-agent`), and a **second node** subscribes
+(`dkg subscribe`) and runs the same SPARQL over `shared-working-memory` —
+independently reading the Bitcoin knowledge and traversing the cross-table
+relationships. This is the "shared memory for multi-agent AI" claim made concrete,
+entirely in free SWM.
+
+### The deferred Verifiable Memory promotion
+
+Because the assets are sealed and publish-ready, promotion to VM is one step away —
+but it **waits for explicit operator go-ahead**. It costs real TRAC + native gas
+and is irreversible; on mainnet there is no faucet. The operator confirms wallet
+balances first (`dkg wallet` / `/api/wallets/balances`), publishes a **single**
+asset first to observe real cost and validate the on-chain path, records the
+returned UAL (`did:dkg:<chainId>/<kasAddress>/<number>`) and `txHash`, then
+publishes the rest. UALs and txHashes are recorded **only if this step is actually
+run.** Until then, the deliverable is the shared, peer-verified, agent-queried
+Context Graph in SWM.
+
+### The Hermes agent reasons over the shared knowledge
+
+The Hermes agent answers a natural-language question over the shared graph — *"what
+does the `transactions` table reference?"* — and returns the four expected targets,
+consuming OKF-derived, provenance-bearing knowledge. And `dkg okf export
+okf-crypto-bitcoin ./out` regenerates an OKF bundle from the shared Context Graph
+whose `schema:mentions` structure matches Google's own `viz.html` — the literal
+"recreated with the integration" artifact, both ways.
+
+## 4. Why this matters
+
+Verifiable, owned, **shared** memory for multi-agent AI:
+
+- Any agent on the network can subscribe to the shared Context Graph and reason
+  over provenance-bearing knowledge — not a private copy, but a shared one.
+- Published assets are permanent and ownable; provenance and authorship are
+  cryptographic, not conventional.
+- An OKF bundle authored *anywhere* (by a human, by Google's reference agent, by
+  another LLM) gains a trust-and-permanence backend without changing the format.
+  The same portable Markdown is now also a verifiable, owned graph.
+
+## 5. Honest limitations
+
+- **OKF v0.1 is structural-only.** Links are untyped; the integration reconstructs
+  the relationship graph but **does not invent semantic FK types**. Typed
+  relationship lifting would be a separate Layer-2 enrichment pass.
+- **VM publishing costs real TRAC** and is irreversible; it is deferred and gated,
+  never automatic.
+- **SWM is TTL-bounded**: shared, free and gossip-replicated, but not permanent;
+  peers that join late may not see old SWM content. Only VM is permanent.
+- **Export is graph-faithful, not byte-faithful.** Free-form prose is not
+  recoverable from triples, so `export` regenerates bodies structurally; round-trip
+  equivalence is asserted over the semantic graph.
+- **Deeper schema-level extraction is out of scope.** The importer captures
+  `# Schema` sections as sections, not as typed column models.
+
+## 6. Reproduce it
+
+The full offline gate and the live runbook:
+
+```bash
+pnpm --filter @origintrail-official/dkg-okf test     # 60+ deterministic golden/edge/round-trip tests
+dkg okf import <bundleDir> --dry-run --print-nquads  # the mapping, offline, byte-stable
+```
+
+The live lifecycle (mainnet node → Hermes agent → WM → SWM → join invitation →
+second-peer verification → rendered graph, with VM promotion held as the deferred
+capstone) is in **`packages/okf/DEMO.md`**. The mapping and the reuse-vs-fork
+decision are in **`docs/adr/0005-okf-rdf-mapping.md`**; the package vocabulary and
+every judgement call are in **`packages/okf/CONTEXT.md`**.
@@ -9,7 +9,7 @@
   "scripts": {
     "build": "node scripts/build.mjs",
     "build:packages": "turbo build",
-    "build:runtime:packages": "pnpm -r --filter @origintrail-official/dkg-core --filter @origintrail-official/dkg-storage --filter @origintrail-official/dkg-query --filter @origintrail-official/dkg-publisher --filter @origintrail-official/dkg-chain --filter @origintrail-official/dkg-epcis --filter @origintrail-official/dkg-random-sampling --filter @origintrail-official/dkg-agent --filter @origintrail-official/dkg-graph-viz --filter @origintrail-official/dkg-node-ui --filter @origintrail-official/dkg-adapter-openclaw --filter @origintrail-official/dkg-adapter-hermes --filter @origintrail-official/kafka-plugin --filter @origintrail-official/dkg run build",
+    "build:runtime:packages": "pnpm -r --filter @origintrail-official/dkg-core --filter @origintrail-official/dkg-storage --filter @origintrail-official/dkg-query --filter @origintrail-official/dkg-publisher --filter @origintrail-official/dkg-chain --filter @origintrail-official/dkg-epcis --filter @origintrail-official/dkg-okf --filter @origintrail-official/dkg-ip-oracle --filter @origintrail-official/dkg-random-sampling --filter @origintrail-official/dkg-agent --filter @origintrail-official/dkg-graph-viz --filter @origintrail-official/dkg-node-ui --filter @origintrail-official/dkg-adapter-openclaw --filter @origintrail-official/dkg-adapter-hermes --filter @origintrail-official/kafka-plugin --filter @origintrail-official/dkg run build",
     "build:runtime": "pnpm run build:runtime:packages && pnpm --filter @origintrail-official/dkg-node-ui run build:ui",
     "test": "turbo test",
     "test:watch": "vitest --config vitest.config.ts",