1 change: 1 addition & 0 deletions packages/cli/package.json
@@ -38,6 +38,7 @@
"@origintrail-official/dkg-core": "workspace:*",
"@origintrail-official/dkg-mcp": "workspace:*",
"@origintrail-official/dkg-epcis": "workspace:*",
"@origintrail-official/dkg-okf": "workspace:*",
"@origintrail-official/dkg-node-ui": "workspace:*",
"@origintrail-official/dkg-publisher": "workspace:*",
"@origintrail-official/dkg-storage": "workspace:*",
2 changes: 2 additions & 0 deletions packages/cli/src/cli.ts
@@ -26,6 +26,7 @@ import { registerNodeOpsCommands } from './commands/node-ops.js';
import { registerQueryCatalogCommand } from './commands/query-catalog.js';
import { registerMaintenanceCommands } from './commands/maintenance.js';
import { registerRandomSamplingCommand } from './commands/random-sampling.js';
import { registerOkfCommand } from './commands/okf.js';

🔴 Bug: The top-level OKF import is not covered by the runtime build graph

What's wrong
This adds a required startup dependency for the whole CLI, but the existing runtime build path still omits the new workspace package. That can make the packaged or auto-updated CLI fail to build or fail to start, even for unrelated commands.

Example
On a fresh source/runtime build that runs the existing `build:runtime:packages` path, `packages/okf/dist/index.js` is not produced. Starting `node packages/cli/dist/cli.js status` can fail before command dispatch because `commands/okf.js` imports `@origintrail-official/dkg-okf`.

Suggested direction
Wire `@origintrail-official/dkg-okf` into the same build paths as `dkg-epcis` before loading the command from the CLI entrypoint.

For Agents
Add the new OKF package to the CLI build graph: include it in `packages/cli/tsconfig.json` references/prebuild as needed, and in the root runtime package build filter used by auto-update/release builds. Prove that a clean runtime build can execute a non-OKF command and `dkg okf --help`.
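A minimal TypeScript sketch of the smoke check requested above. It assumes the CLI is built to `packages/cli/dist/cli.js` and treats `status` as passing as long as it gets past module loading (it may still exit non-zero when no daemon is running). `Runner`, `nodeRunner` and `smokeCheck` are hypothetical helpers, not part of this PR.

```typescript
import { spawnSync } from 'node:child_process';

// Hypothetical abstraction over "run the built CLI with these args".
export type Runner = (args: string[]) => { status: number | null; stdout: string; stderr: string };

// Runs `node <cliPath> ...args` with the current Node binary.
export const nodeRunner =
  (cliPath: string): Runner =>
  (args) => {
    const r = spawnSync(process.execPath, [cliPath, ...args], { encoding: 'utf8' });
    return { status: r.status, stdout: r.stdout ?? '', stderr: r.stderr ?? '' };
  };

const MODULE_LOAD_FAILURE = /ERR_MODULE_NOT_FOUND|Cannot find (module|package)/;

export function smokeCheck(run: Runner): string[] {
  const failures: string[] = [];
  // An unrelated command may exit non-zero (e.g. no daemon), but must not die while loading modules.
  const status = run(['status']);
  if (MODULE_LOAD_FAILURE.test(status.stderr)) failures.push('`status` failed during module loading');
  // The OKF command must be registered and print its help.
  const help = run(['okf', '--help']);
  if (help.status !== 0 || !help.stdout.includes('okf')) failures.push('`okf --help` did not succeed');
  return failures;
}
```

After a clean runtime build, `smokeCheck(nodeRunner('packages/cli/dist/cli.js'))` should return an empty array; injecting the runner keeps the check unit-testable without a build.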


const program = new Command();
program
@@ -56,6 +57,7 @@ registerNodeOpsCommands(program);
registerQueryCatalogCommand(program);
registerMaintenanceCommands(program);
registerRandomSamplingCommand(program);
registerOkfCommand(program);

// ─── dkg integration ─────────────────────────────────────────────────

823 changes: 823 additions & 0 deletions packages/cli/src/commands/okf.ts

514 changes: 514 additions & 0 deletions packages/cli/test/okf-subcommands.test.ts

123 changes: 123 additions & 0 deletions packages/okf/CONTEXT.md
@@ -0,0 +1,123 @@
# OKF

Deterministic Google Open Knowledge Format (OKF) → DKG mapper. Turns a portable
OKF bundle (Markdown + YAML frontmatter + untyped cross-links) into owned,
verifiable RDF Knowledge Assets, reconstructing the bundle's cross-concept link
graph. Pure, no LLM, no network: the same bundle always yields identical triples
and IRIs. The `dkg okf` CLI command is a thin wrapper over this package.

The framing: OKF standardises *how* knowledge is written and exchanged but ships
**no** verification, provenance or ownership layer (OKF SPEC §1, §10). The DKG
supplies exactly that. This package is the bridge — the trust-and-permanence
backend for OKF.

## Language

**Bundle**:
A directory tree of UTF-8 Markdown files; the unit of distribution (OKF §3). Fed
to the mapper as an in-memory `BundleFile[]` (`{ path, content }`, POSIX paths).
`loadBundleDir` is the only filesystem surface; the mapper itself is I/O-free.

**Concept**:
One non-reserved `.md` file = YAML frontmatter + Markdown body (OKF §4). Each
concept becomes exactly one Knowledge Asset. Reserved `index.md` / `log.md`
files are **not** concepts and are never minted as KAs (OKF §3.1, §6, §7).
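A minimal sketch of the `BundleFile[]` input and the concept/reserved split described above. The `BundleFile` shape follows the text; `isConceptFile` and `partitionBundle` are hypothetical names, and matching reserved files by basename at any depth is an assumption (it agrees with the demo's three skipped `index.md` files).

```typescript
import { posix } from 'node:path';

// In-memory bundle entry: POSIX, bundle-relative path plus UTF-8 text.
export interface BundleFile {
  path: string;
  content: string;
}

// Reserved OKF files that are never minted as Knowledge Assets.
const RESERVED = new Set(['index.md', 'log.md']);

// A concept is any `.md` file whose basename is not reserved.
export function isConceptFile(file: BundleFile): boolean {
  const base = posix.basename(file.path);
  return base.endsWith('.md') && !RESERVED.has(base);
}

export function partitionBundle(files: BundleFile[]): { concepts: BundleFile[]; reserved: BundleFile[] } {
  const concepts: BundleFile[] = [];
  const reserved: BundleFile[] = [];
  for (const f of files) {
    if (isConceptFile(f)) concepts.push(f);
    else if (RESERVED.has(posix.basename(f.path))) reserved.push(f);
    // Non-Markdown files are ignored in this sketch.
  }
  return { concepts, reserved };
}
```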

**Concept ID**:
The file's bundle-relative path with `.md` removed (OKF §2) — e.g.
`tables/blocks`. The path *is* the concept's identity. Each path segment must match
`[A-Za-z0-9_][A-Za-z0-9_.\-]*`, exactly the rule in the reference agent's `paths.py`.

**IRI**:
The Knowledge Asset subject IRI, derived deterministically from the concept ID:
`urn:okf:<conceptId>` (configurable base). Same bundle ⇒ same IRIs. This is the
RDF subject; the on-chain UAL is assigned by the node at publish time (it is not
the same thing — see Flagged ambiguities).
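A sketch of concept-ID and IRI derivation under the rules above. The segment regex is the one quoted in the text; `conceptIdFromPath` and `conceptIri` are hypothetical helpers, and throwing on an invalid segment is an assumption about how the real mapper reports it.

```typescript
// Per-segment pattern quoted in the Concept ID definition.
const SEGMENT = /^[A-Za-z0-9_][A-Za-z0-9_.\-]*$/;

// "tables/blocks.md" -> "tables/blocks"; rejects "..", empty or dot-leading segments.
export function conceptIdFromPath(path: string): string {
  if (!path.endsWith('.md')) throw new Error(`not a Markdown file: ${path}`);
  const id = path.slice(0, -'.md'.length);
  for (const seg of id.split('/')) {
    if (!SEGMENT.test(seg)) throw new Error(`invalid path segment "${seg}" in ${path}`);
  }
  return id;
}

// Pure function of the concept ID; the base is configurable.
export function conceptIri(conceptId: string, base = 'urn:okf:'): string {
  return base + conceptId;
}
```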

**Link**:
A standard Markdown link `[text](path)` from one concept to another (OKF §5).
Resolved against the bundle (absolute `/abs`, relative `./`, parent `../`,
bare-sibling, extension-less forms) into an **untyped directed edge**
(`schema:mentions`). The kind of relationship lives in prose, not the link
(OKF §5.3) — the mapper never infers FK/join types. Broken links are warnings,
never errors (OKF §5.3, §9).
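A sketch of link resolution covering the forms listed above, using Node's `path.posix`. `resolveLink` and the `LinkResolution` union are hypothetical; stripping `#fragment`s and treating any `scheme:` href as a non-edge are assumptions, and percent-encoded paths are not handled.

```typescript
import { posix } from 'node:path';

export type LinkResolution =
  | { kind: 'edge'; target: string } // resolves to a concept in the bundle
  | { kind: 'broken'; target: string } // bundle-relative, but no such concept (warning only)
  | { kind: 'non-edge' }; // external URL or same-page anchor

export function resolveLink(
  fromConceptId: string,
  href: string,
  conceptIds: ReadonlySet<string>,
): LinkResolution {
  // Anything with a URI scheme (https:, mailto:, ...) is never a concept edge.
  if (/^[A-Za-z][A-Za-z0-9+.-]*:/.test(href)) return { kind: 'non-edge' };
  const pathPart = href.split('#')[0];
  if (pathPart === '') return { kind: 'non-edge' };

  // '/abs' is relative to the bundle root; './x', '../x' and bare 'x' to the source's folder.
  const joined = pathPart.startsWith('/')
    ? posix.normalize(pathPart.slice(1))
    : posix.join(posix.dirname(fromConceptId), pathPart);

  // Extension-less and '.md' forms map to the same concept ID.
  const target = joined.endsWith('.md') ? joined.slice(0, -3) : joined;
  return conceptIds.has(target) ? { kind: 'edge', target } : { kind: 'broken', target };
}
```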

**Citation**:
A link (usually an external URL) under a `# Citations` heading (OKF §8), backing
a claim. Mapped to `schema:citation`, semantically distinct from concept edges.
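A simplified, line-based sketch of citation extraction for the numbered (`[1] [text](url)`) and bare-bullet (`- https://…`) styles, deduplicated by URL. The package itself parses a Markdown AST (`mdast-util-from-markdown`); this regex version and the name `extractCitationUrls` are illustrative only, and ending the section at the next heading is an assumption.

```typescript
// Extracts citation URLs from the section under a "Citations" heading, in order, deduplicated.
export function extractCitationUrls(markdown: string): string[] {
  const urls: string[] = [];
  const seen = new Set<string>();
  let inCitations = false;
  for (const line of markdown.split(/\r?\n/)) {
    const heading = /^#{1,6}\s+(.*?)\s*#*\s*$/.exec(line);
    if (heading) {
      // Any heading starts or ends the citations section.
      inCitations = heading[1].toLowerCase() === 'citations';
      continue;
    }
    if (!inCitations) continue;
    // Numbered style: "[1] [text](url)"; bare-bullet style: "- https://...".
    const m =
      /\[[^\]]*\]\((https?:\/\/[^)\s]+)\)/.exec(line) ?? /^\s*[-*]\s+(https?:\/\/\S+)/.exec(line);
    if (m && !seen.has(m[1])) {
      seen.add(m[1]);
      urls.push(m[1]);
    }
  }
  return urls;
}
```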

**Memory layers** (where imported assets live):
- **WM** (Working Memory): private to one agent, free, reversible. The import
default.
- **SWM** (Shared Working Memory): team-visible, gossip-replicated, free,
TTL-bounded. Reached with `--share` (finalize + advance).
- **VM** (Verifiable Memory): on-chain, permanent, costs TRAC. **Never** written
by this package; promotion is a separate, explicitly-gated operator step.

## Relationships

- Bundle → many Concepts (+ reserved files, skipped). Pass 1 indexes the bundle
and builds the `conceptId → IRI` map; Pass 2 maps each concept and resolves its
links against that map (so an edge only forms to a concept that exists).
- Concept → one Knowledge Asset (one subject IRI) → many quads (frontmatter
triples + body sections + untyped edges + citations).
- Frontmatter key → RDF predicate via the locked table (ADR 0005). `type` is the
only required key (OKF §9); everything else degrades gracefully when absent.
- Link → `schema:mentions` edge **iff** its resolved target is a concept in the
bundle; otherwise it is a broken-link warning (target may be not-yet-written
knowledge) or, for external URLs, simply ignored as a non-edge.
- **Opt-in typed edges (`typeRelations` / `--relate`).** By default every edge is
`schema:mentions` (zero interpretation, faithful to OKF §5.3's untyped links).
A caller may supply deterministic `(fromType, toType) → predicate` rules to
type edges by their endpoints' OKF `type` — e.g. `BigQuery Dataset → BigQuery
Table = schema:hasPart` (containment) while `Table → Table` stays `mentions`.
This is byte-stable (types come straight from frontmatter, no prose, no LLM)
and **off by default** so the purity guarantee holds unless explicitly opted in.
Caveat: the rule is endpoint-type-based, so it cannot distinguish a same-dataset
containment link from a cross-dataset reference of the same type pair — use it
where that distinction doesn't apply, or leave the default.
- **Round-trip is graph-faithful, not byte-faithful, by design.** `import →
export → import` reproduces an equivalent *semantic graph*, not the original
bytes: free-form prose isn't recoverable from triples, so export regenerates
bodies structurally, and a typed edge (e.g. `hasPart`) exports as a plain
(untyped) OKF link because OKF can't express the relation type. This is a
deliberate choice, not a defect; a future enhancement may have export *add*
provenance (UAL / seal) when serialising from a published graph.
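A sketch of the two-pass mapping with optional typed edges. It assumes links were already resolved to candidate concept IDs; `ParsedConcept`, `TypeRelation` and `mapEdges` are hypothetical names, and sorting concepts by ID is one assumed way to keep output order byte-stable.

```typescript
const MENTIONS = 'http://schema.org/mentions';

// Hypothetical input: one concept whose links are already resolved to candidate concept IDs.
export interface ParsedConcept {
  id: string; // concept ID, e.g. "tables/blocks"
  type: string; // OKF frontmatter `type`, e.g. "BigQuery Table"
  linkTargets: string[];
}

// Opt-in rule: edges from `fromType` to `toType` use `predicate` instead of schema:mentions.
export interface TypeRelation {
  fromType: string;
  toType: string;
  predicate: string;
}

export type Triple = [subject: string, predicate: string, object: string];

export function mapEdges(
  concepts: ParsedConcept[],
  typeRelations: TypeRelation[] = [],
  base = 'urn:okf:',
): { triples: Triple[]; warnings: string[] } {
  // Stable, locale-independent order so the same bundle yields the same output.
  const sorted = [...concepts].sort((a, b) => (a.id < b.id ? -1 : a.id > b.id ? 1 : 0));

  // Pass 1: index every concept ID -> { IRI, type }.
  const index = new Map<string, { iri: string; type: string }>();
  for (const c of sorted) index.set(c.id, { iri: base + c.id, type: c.type });

  // Pass 2: an edge only forms when the target concept exists in the index.
  const triples: Triple[] = [];
  const warnings: string[] = [];
  for (const c of sorted) {
    const fromIri = base + c.id;
    for (const targetId of c.linkTargets) {
      const to = index.get(targetId);
      if (!to) {
        warnings.push(`broken link: ${c.id} -> ${targetId}`);
        continue;
      }
      const rule = typeRelations.find((r) => r.fromType === c.type && r.toType === to.type);
      triples.push([fromIri, rule ? rule.predicate : MENTIONS, to.iri]);
    }
  }
  return { triples, warnings };
}
```

With no rules every edge is `schema:mentions`; a rule only changes an edge's predicate, never whether the edge exists.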

## Flagged ambiguities

- **Reuse vs. fork of the Markdown extractor.** The node's `markdown-extractor.ts`
is regex-based and resolves only `[[wikilinks]]`, not OKF's `[text](path)`
links; importing it from `packages/cli` would also create a `cli → okf → cli`
dependency cycle. So we **converge on its predicate vocabulary** (same
`schema:*` / `dkg:hasSection` IRIs, pinned by a test) but use a **real Markdown
AST** (`mdast-util-from-markdown`) for link/section/citation extraction — which
OKF §2 mandates and which is what lets us honour the in-code-span rule below.
- **Links inside inline code spans.** `outputs.md` writes its only two concept
links inside backticks: `` `[transactions](transactions.md)` ``. CommonMark
treats code-span content as literal text, so **by default these are NOT edges**
(the mechanism-first answer). They are recorded as `codeSpanLinks` and surfaced
as warnings. `--include-code-span-links` flips the policy; both behaviours are
tested.
- **IRI derivation / UAL.** Concept subject IRIs are `urn:okf:<conceptId>`, a pure
function of the concept ID. The on-chain UAL (`did:dkg:<chain>/<ka>/<n>`) is
assigned by the node at VM publish (RFC-43 pre-knowable UALs are still draft) —
do not conflate the two. WM/SWM data carries no on-chain verification.
- **`type` normalisation.** A bare `type` value is PascalCased into the schema.org
namespace (`BigQuery Dataset` → `http://schema.org/BigQueryDataset`); a full IRI
`type` is used unchanged. Round-trips losslessly because PascalCase of the local
name is idempotent.
- **`timestamp` → `schema:dateModified`.** OKF defines `timestamp` as last-modified
time, so we map it to `schema:dateModified` (typed `xsd:dateTime`) rather than
the extractor's naive `schema:timestamp` slug — a deliberate semantic choice.
- **`resource` → `schema:url`.** Chosen over `dcterms:source`; documented in ADR 0005.
- **Citations, two styles.** Both numbered (`[1] [text](url)`) and bare-bullet
(`- https://…`) forms are parsed leniently; deduplicated by URL.
- **Folder hierarchy.** `schema:isPartOf` from directory structure is **off by
default** — directories are not concepts, and minting them as graph nodes would
muddy the concept graph. Available via `emitFolderHierarchy`.
- **Producer-defined keys** are always preserved (camelCased into schema.org),
never dropped or rejected (OKF §4.1, §9).
- **Conformance is permissive.** Only two rules make a bundle non-conformant
(unparseable frontmatter; missing non-empty `type`). Missing optionals, unknown
types/keys, broken links and missing `index.md` are tolerated (OKF §9).
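A sketch of the frontmatter rules flagged above: `type` normalisation, camelCased producer keys, and the two conformance checks. The word-splitting rule and IRI detection (`http(s)://` or `urn:`) are assumptions; keys covered by the locked ADR 0005 table would not go through `keyPredicate`. All function names are hypothetical.

```typescript
const SCHEMA = 'http://schema.org/';

// "BigQuery Dataset" -> "BigQueryDataset": split on non-alphanumerics and upper-case each
// word's first character. Re-applying it to its own output changes nothing (idempotent).
function pascalCase(s: string): string {
  return s
    .split(/[^A-Za-z0-9]+/)
    .filter((w) => w.length > 0)
    .map((w) => w[0].toUpperCase() + w.slice(1))
    .join('');
}

// Assumption: a value starting with http(s):// or urn: is already a full IRI.
const isFullIri = (v: string): boolean => /^(https?:\/\/|urn:)/i.test(v);

export function typeIri(type: string): string {
  return isFullIri(type) ? type : SCHEMA + pascalCase(type);
}

// Producer-defined keys are kept, camelCased into schema.org: "row_count" -> schema:rowCount.
export function keyPredicate(key: string): string {
  const p = pascalCase(key);
  if (p === '') throw new Error(`cannot derive a predicate from key "${key}"`);
  return SCHEMA + p[0].toLowerCase() + p.slice(1);
}

// The only two non-conformance rules; null stands for frontmatter that failed to parse.
export function conformanceErrors(frontmatter: Record<string, unknown> | null): string[] {
  if (frontmatter === null) return ['unparseable frontmatter'];
  const t = frontmatter['type'];
  return typeof t === 'string' && t.trim() !== '' ? [] : ['missing non-empty `type`'];
}
```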
226 changes: 226 additions & 0 deletions packages/okf/DEMO.md
@@ -0,0 +1,226 @@
# OKF ↔ DKG live demo runbook

End-to-end demonstration: the same portable Bitcoin Markdown, turned into
**owned, shareable** Knowledge Assets on a DKG **mainnet** node, imported into a
Context Graph, **shared through Shared Working Memory**, verified by a second
peer, and reasoned over by a Hermes agent.

**Cost model — read this first.**
- Steps 1–7 are the default demo and **spend nothing**. Working Memory (WM) and
Shared Working Memory (SWM) are free.
- Step 8 — **Verifiable Memory (VM) promotion — is deferred.** It spends real
**TRAC + native gas, irreversibly**, on mainnet (no faucet). It is *not* part
of the default run; the operator triggers it deliberately, after confirming
funds. It is documented here, clearly marked, and only its UALs/txHashes are
recorded *if and when* it is actually run.
- This runbook is **operator-run, never CI.** The free offline correctness gate
(`pnpm --filter @origintrail-official/dkg-okf test`) is the CI gate; it never
touches a node and is green before any of this is attempted.

Throughout: **never imply on-chain verification for WM/SWM data.** State which
memory layer each piece of evidence lives in.

Conventions: `CG=okf-crypto-bitcoin` and `BUNDLE=packages/okf/test/fixtures/crypto_bitcoin`, referenced below as `$CG` and `$BUNDLE`.

---

## 0. Offline correctness gate (free, no node)

```bash
pnpm --filter @origintrail-official/dkg-okf test # 60+ golden/edge/round-trip tests
dkg okf import $BUNDLE --dry-run --print-nquads # deterministic mapping, no node
```

Expect: 5 Knowledge Assets, 3 reserved `index.md` skipped, the reconstructed edge
graph, both citation styles, byte-stable N-Quads. Run it twice — identical output.
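A sketch of the "run it twice" check in TypeScript: hash two dry-run outputs and compare. `checkDeterministic` and `dryRunNquads` are hypothetical helpers; it assumes `dkg` is on `PATH` and that `--print-nquads` writes only N-Quads (no timings or other varying text) to stdout.

```typescript
import { createHash } from 'node:crypto';
import { execFileSync } from 'node:child_process';

export const sha256 = (text: string): string =>
  createHash('sha256').update(text, 'utf8').digest('hex');

// Produces the output twice and compares digests; byte-stable output gives identical hashes.
export function checkDeterministic(produce: () => string): { identical: boolean; digests: [string, string] } {
  const first = sha256(produce());
  const second = sha256(produce());
  return { identical: first === second, digests: [first, second] };
}

// Assumes `dkg` is on PATH and that --print-nquads writes only N-Quads to stdout.
export function dryRunNquads(bundleDir: string): string {
  return execFileSync('dkg', ['okf', 'import', bundleDir, '--dry-run', '--print-nquads'], {
    encoding: 'utf8',
  });
}
```

Against a checkout, `checkDeterministic(() => dryRunNquads('packages/okf/test/fixtures/crypto_bitcoin')).identical` is expected to be `true`.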

## 1. Launch the node on mainnet

```bash
dkg init # choose a mainnet blockchain (e.g. mainnet-base / mainnet-gnosis / mainnet-neuroweb)
dkg start
dkg status # daemon PID, version, listening port
dkg doctor # health checks
```

**Verify the node is actually on mainnet, not testnet/devnet, before going
further** — confirm the active network/chain in the printed config / `dkg doctor`
output. `edge` is the default role.

## 2. Attach a Hermes agent

```bash
dkg hermes setup # configure the Hermes-runtime agent bound to this node
dkg hermes # run it
```

Confirm the acting agent identity (this is the agent the shared bundle is
*proposed to* in step 6):

```bash
curl -s localhost:<apiPort>/api/agent/identity # → { agentAddress, agentDid, name, framework, peerId }
```

Record the `agentAddress`.

## 3. Import the bundle into a Context Graph (Working Memory — free)

```bash
dkg okf import $BUNDLE --context-graph-id $CG --create-context-graph
```

Import defaults to **Working Memory** — free, private, reversible. Expect the
summary: `5 concepts, 3 reserved skipped, 101 triples, 11 links resolved, 0
broken, 10 citations`, plus the deterministic `urn:okf:*` IRIs and the
`memoryLayer: "WM"` note.

Confirm the 5 assets and the reconstructed edges are present **in WM** via SPARQL
(`/api/query`, `view: working-memory`, `agentAddress` required for WM):

```bash
curl -s localhost:<apiPort>/api/query -H 'content-type: application/json' -d '{
"contextGraphId":"okf-crypto-bitcoin",
"view":"working-memory",
"agentAddress":"<agentAddress>",
"sparql":"SELECT ?s ?o WHERE { ?s <http://schema.org/mentions> ?o }"
}'
```

Expect the 11 `schema:mentions` edges (dataset→4 tables, transactions→4,
inputs→3). `tables/outputs` has none — its only links sit inside backticks
(CommonMark: literal text). All evidence here is **WM (private, free, no on-chain
verification).**
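The same WM query can be issued from TypeScript. This sketch only builds the request (the body fields mirror the `curl` call above) and tallies edges per subject, so the 4/4/3 split can be checked. `buildQuery` and `edgesBySubject` are hypothetical, and the flattened `{ s, o }` row shape is an assumption about how you unpack the `/api/query` response.

```typescript
// Body of POST /api/query, mirroring the curl example.
export interface QueryBody {
  contextGraphId: string;
  view: 'working-memory' | 'shared-working-memory' | 'verifiable-memory';
  agentAddress?: string; // required for the working-memory view
  sparql: string;
}

export interface HttpRequest {
  url: string;
  init: { method: 'POST'; headers: Record<string, string>; body: string };
}

export function buildQuery(apiPort: number, body: QueryBody): HttpRequest {
  if (body.view === 'working-memory' && !body.agentAddress) {
    throw new Error('agentAddress is required for the working-memory view');
  }
  return {
    url: `http://localhost:${apiPort}/api/query`,
    init: { method: 'POST', headers: { 'content-type': 'application/json' }, body: JSON.stringify(body) },
  };
}

// Count edges per subject IRI, given rows already flattened to { s, o } strings.
export function edgesBySubject(rows: ReadonlyArray<{ s: string; o: string }>): Map<string, number> {
  const counts = new Map<string, number>();
  for (const { s } of rows) counts.set(s, (counts.get(s) ?? 0) + 1);
  return counts;
}
```

On Node 18+, `const { url, init } = buildQuery(apiPort, body); const res = await fetch(url, init);` sends it.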

## 4. Finalize and share to Shared Working Memory (free)

```bash
dkg okf import $BUNDLE --context-graph-id $CG --share
```

`--share` seals each asset (`wm/finalize`) and advances it (`swm/share`,
`entities: "all"`). SWM is free, gossip-replicated and team-visible — this is the
moment the Bitcoin bundle becomes a **shared Context Graph** other agents can
reach. The assets are now sealed and *publish-ready*, but **publishing waits**
(step 8).

Confirm the same assets/edges in the `shared-working-memory` view:

```bash
# 11 cross-table edges
dkg query $CG -q 'SELECT ?s ?o WHERE { ?s <http://schema.org/mentions> ?o }' --include-shared-memory

# Exactly 5 concepts. Count subjects that have an rdf:type — only the 5 concepts
# do. A naive `STRSTARTS(STR(?s),"urn:okf:")` count returns ~19 because the
# daemon skolemises each concept's dkg:hasSection blank nodes into
# `urn:okf:.../.well-known/genid/...` subjects, which also match the prefix.
dkg query $CG -q 'SELECT (COUNT(DISTINCT ?s) AS ?n) WHERE { ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?t FILTER(STRSTARTS(STR(?s), "urn:okf:")) }' --include-shared-memory
```

Evidence here is **SWM (shared, free, TTL-bounded, no on-chain verification).**

## 5. Issue a join invitation and have a second peer verify it

```bash
# Curator side — invite a peer (the V10 invite is the pair <contextGraphId>\n<curatorPeerId>):
dkg context-graph invite $CG <joiningPeerId>
# For a curated graph, allow the joining agent:
dkg context-graph add-agent $CG <joiningAgentAddress>
# …or have the joiner request-join and the curator approve-join.
```

From a **second node/agent**:

```bash
dkg subscribe okf-crypto-bitcoin # subscribe + catch up
dkg query okf-crypto-bitcoin -q 'SELECT ?s ?o WHERE { ?s <http://schema.org/mentions> ?o }' --include-shared-memory
```

Record the invite and the second peer's query result — the shared Context Graph
is independently checkable by another peer, all in **free SWM**.

## 6. Hermes agent reasons over the shared knowledge

Have the Hermes agent answer a natural-language question through its `dkg_*`
tools over the `shared-working-memory` view, e.g. *"what does the `transactions`
table reference?"*:

```sparql
SELECT ?o WHERE { <urn:okf:tables/transactions> <http://schema.org/mentions> ?o }
```

Expect the four targets: `urn:okf:datasets/crypto_bitcoin`, `urn:okf:tables/blocks`,
`urn:okf:tables/inputs`, `urn:okf:tables/outputs`. Capture the transcript — an
agent consuming OKF-derived, provenance-bearing knowledge from the shared graph.

## 7. Recreated, visibly

Regenerate the graph from the shared Context Graph and compare it to Google's own
`viz.html`:

```bash
dkg okf export okf-crypto-bitcoin ./out --view shared-working-memory
```

`export` inverts `import` at the graph level (graph-faithful, not byte-faithful:
prose bodies are regenerated structurally). Confirm that the regenerated bundle's
`schema:mentions` structure matches the dataset→tables and cross-table edges in
the bundle's own `okf/bundles/crypto_bitcoin/viz.html`. (`packages/graph-viz` can
render the graph view directly.)

**At this point the deliverable is complete: a shared, peer-verified,
agent-queried Context Graph in SWM. Nothing has been spent.**

---

## 8. VM promotion — staged, but it waits (DEFERRED; real TRAC + gas)

> **Do not run this as part of the demo.** The assets are sealed and
> publish-ready in SWM, so promotion to Verifiable Memory is one step away — held
> until the operator deliberately chooses to spend.

When (and only when) the operator chooses to promote:

1. **Confirm funding first**: on mainnet there is **no faucet**.
   ```bash
   dkg wallet   # or: curl -s localhost:<apiPort>/api/wallets/balances
   ```
   Abort if TRAC + native gas are insufficient.
2. **Publish ONE asset first** (the dataset) to observe the real cost and
   validate the on-chain path. The first publish automatically registers the
   Context Graph on-chain, so expect gas/TRAC for that as well:
   ```bash
   # vm/publish for a single KA (gate behind explicit confirmation in your runbook).
   # The KA name is the concept ID with '/' mapped to '__' (asset names can't contain '/').
   curl -s localhost:<apiPort>/api/knowledge-assets/datasets__crypto_bitcoin/vm/publish \
     -H 'content-type: application/json' -d '{"contextGraphId":"okf-crypto-bitcoin"}'
   ```
   Record the returned UAL (`did:dkg:<chainId>/<kasAddress>/<number>`) and
   `txHash`.
3. **Then publish the rest** and re-verify via the `verifiable-memory` view.

| Asset | UAL | txHash |
|---|---|---|
| `datasets/crypto_bitcoin` | _(record if run)_ | _(record if run)_ |
| `tables/blocks` | | |
| `tables/transactions` | | |
| `tables/inputs` | | |
| `tables/outputs` | | |
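A sketch of the step-8 helpers: the `'/'` → `'__'` KA-name mapping, a publish request that refuses to build without explicit confirmation, and a parser for the `did:dkg:<chainId>/<kasAddress>/<number>` UAL format used in the table. All names are hypothetical; the endpoint path and body mirror the `curl` example, and the UAL regex assumes none of the three parts contains `/`.

```typescript
// Asset names can't contain '/', so each '/' in the concept ID becomes '__'.
export const kaName = (conceptId: string): string => conceptId.split('/').join('__');

// Refuses to build a VM publish request unless the operator has explicitly confirmed the spend.
export function publishRequest(apiPort: number, conceptId: string, contextGraphId: string, confirmed: boolean) {
  if (!confirmed) throw new Error('VM promotion spends real TRAC + gas; explicit confirmation required');
  return {
    url: `http://localhost:${apiPort}/api/knowledge-assets/${encodeURIComponent(kaName(conceptId))}/vm/publish`,
    init: {
      method: 'POST',
      headers: { 'content-type': 'application/json' },
      body: JSON.stringify({ contextGraphId }),
    },
  };
}

export interface Ual {
  chainId: string;
  kasAddress: string;
  number: number;
}

// Parses did:dkg:<chainId>/<kasAddress>/<number>; returns null for anything else.
export function parseUal(ual: string): Ual | null {
  const m = /^did:dkg:([^/]+)\/([^/]+)\/(\d+)$/.exec(ual);
  return m ? { chainId: m[1], kasAddress: m[2], number: Number(m[3]) } : null;
}
```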

Until promoted, the demo's deliverable is the **shared, peer-verified,
agent-queried** Context Graph in SWM. Only VM data carries on-chain verification;
WM/SWM data never does.

---

## Evidence log (fill in during the run)

- Node network/chain confirmed mainnet: ______
- Hermes `agentAddress`: ______
- Import summary (WM): 5 concepts / 101 triples / 11 edges / 0 broken / 10 citations
- WM SPARQL edge count: ______
- SWM SPARQL edge count: ______
- Join invitation: ______
- Second peer query result: ______
- Hermes agent transcript: ______
- Regenerated graph vs `viz.html`: ______
- (Deferred) VM UALs / txHashes: _(only if step 8 was run)_