DO NOT MERGE feat(okf): Google OKF → DKG integration — import OKF bundles as deterministic, provenance-bearing Knowledge Assets #1331
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,107 @@ | ||
| import { Command } from 'commander'; | ||
| import { toErrorMessage } from '@origintrail-official/dkg-core'; | ||
| import { | ||
| writePatentBundle, | ||
| ingestPatentExport, | ||
| type PatentGenOptions, | ||
| } from '@origintrail-official/dkg-ip-oracle'; | ||
|
|
||
| /** | ||
| * `dkg ip-oracle` — engineering harness for the IP / Patent Context Oracle. | ||
| * | ||
| * `generate` emits a deterministic, **synthetic** Google-Patents-shaped OKF | ||
| * bundle to disk (no BigQuery / GCP dependency), which is then ingested into a | ||
| * PRIVATE Context Graph via `dkg okf import --private`. The data is SIMULATED — | ||
| * every concept stamps `source: … [SIMULATED]` and a CC BY 4.0 licence so the | ||
| * downstream redaction guard and the public article stay honest about what is | ||
| * real vs. generated. | ||
| * | ||
| * This command writes files only; it never touches the node and spends nothing. | ||
| */ | ||
| export function registerIpOracleCommand(program: Command): void { | ||
| const cmd = program | ||
| .command('ip-oracle') | ||
| .description('IP / Patent Context Oracle tooling (synthetic OKF patent corpora)'); | ||
|
|
||
| cmd | ||
| .command('generate <outDir>') | ||
|
|
||
| .description('Generate a deterministic synthetic patent OKF bundle (no BigQuery needed)') | ||
| .requiredOption('--count <n>', 'Number of patent concepts to emit', (v: string) => parseInt(v, 10)) | ||
| .option('--cpc-class <class>', 'CPC subclass tag, e.g. H04L', 'H04L') | ||
| .option('--seed <n>', 'PRNG seed (same seed ⇒ identical corpus)', (v: string) => parseInt(v, 10), 42) | ||
| .option('--citations-per-patent <n>', 'Max backward citations per patent', (v: string) => parseInt(v, 10)) | ||
| .option('--family-size <n>', 'Patents per simulated family', (v: string) => parseInt(v, 10)) | ||
| .option('--retrieval-date <iso>', 'Stamped retrieval / modified date (YYYY-MM-DD)') | ||
| .action((outDir: string, opts: Record<string, unknown>) => { | ||
| try { | ||
| const count = Number(opts.count); | ||
| if (!Number.isInteger(count) || count <= 0) { | ||
| console.error('--count must be a positive integer.'); | ||
| process.exit(2); | ||
| } | ||
| const genOpts: PatentGenOptions = { | ||
| cpcClass: String(opts.cpcClass ?? 'H04L'), | ||
| count, | ||
| seed: Number(opts.seed ?? 42), | ||
| ...(opts.citationsPerPatent != null | ||
| ? { citationsPerPatent: Number(opts.citationsPerPatent) } | ||
| : {}), | ||
| ...(opts.familySize != null ? { familySize: Number(opts.familySize) } : {}), | ||
| ...(opts.retrievalDate ? { retrievalDate: String(opts.retrievalDate) } : {}), | ||
| }; | ||
| const summary = writePatentBundle(genOpts, outDir); | ||
| console.log( | ||
| JSON.stringify( | ||
| { | ||
| mode: 'generate', | ||
| ...summary, | ||
| files: summary.conceptCount + 3, // patents + patents/index + index + log | ||
| synthetic: true, | ||
| note: | ||
| 'Synthetic SIMULATED corpus. Next: dkg okf import <outDir> ' + | ||
| '--context-graph-id <cg> --private --create-context-graph', | ||
| }, | ||
| null, | ||
| 2, | ||
| ), | ||
| ); | ||
| } catch (err) { | ||
| console.error(toErrorMessage(err)); | ||
| process.exit(1); | ||
| } | ||
| }); | ||
|
|
||
| cmd | ||
| .command('ingest <exportFile> <outDir>') | ||
| .description( | ||
| 'Map a Google Patents Public Data NDJSON export (real data, run the BigQuery ' + | ||
| 'query yourself) into OKF bundle(s). Offline, deterministic, no GCP SDK.', | ||
| ) | ||
| .option('--shard-by-cpc', 'Write one self-contained OKF bundle per CPC subclass (recommended at scale)') | ||
| .option('--retrieval-date <iso>', 'Stamped retrieval / modified date (YYYY-MM-DD)') | ||
| .action(async (exportFile: string, outDir: string, opts: Record<string, unknown>) => { | ||
| try { | ||
| const summary = await ingestPatentExport(exportFile, outDir, { | ||
| shardByCpc: Boolean(opts.shardByCpc), | ||
| ...(opts.retrievalDate ? { retrievalDate: String(opts.retrievalDate) } : {}), | ||
| }); | ||
| console.log( | ||
| JSON.stringify( | ||
| { | ||
| mode: 'ingest', | ||
| ...summary, | ||
| synthetic: false, | ||
| note: | ||
| 'Real Google Patents Public Data (CC BY 4.0). Next, per shard: dkg okf ' + | ||
| 'import <shardDir> --context-graph-id <cg> --private --create-context-graph', | ||
| }, | ||
| null, | ||
| 2, | ||
| ), | ||
| ); | ||
| } catch (err) { | ||
| console.error(toErrorMessage(err)); | ||
| process.exit(1); | ||
| } | ||
| }); | ||
| } | ||
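The numeric option parsers in the diff use `parseInt(v, 10)`, which reads leading digits and ignores the rest: `'12abc'` parses to 12 and `'2.5'` to 2, so `--count 2.5` would pass the later `Number.isInteger` check as 2. A minimal sketch of a stricter parser follows, assuming a hypothetical helper name `positiveInt` that is not part of this PR; it throws a plain `Error` so that it stays dependency-free.

```typescript
// Hypothetical helper (not in the PR): strict positive-integer parsing for
// CLI option values. Returns a parser bound to the flag name for messages.
function positiveInt(flag: string): (raw: string) => number {
  return (raw: string): number => {
    const text = raw.trim();
    // Accept only plain decimal digits, so '2.5', '12abc', '-3' and '' fail.
    const n = /^\d+$/.test(text) ? Number(text) : NaN;
    // Reject zero and values too large to represent exactly.
    if (!Number.isSafeInteger(n) || n <= 0) {
      throw new Error(`${flag} must be a positive integer, got "${raw}".`);
    }
    return n;
  };
}
```

A parser like `positiveInt('--count')` could be passed where the diff currently passes `(v: string) => parseInt(v, 10)`. Whether `--seed` should also reject 0 is a separate decision this sketch does not make.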
🟡 Issue: The new IP Oracle CLI surface has no end-to-end verification
What's wrong
The PR adds a new user-facing CLI command, but the added tests only verify the underlying `@origintrail-official/dkg-ip-oracle` library. That leaves the behavior users actually run through `dkg ip-oracle` unverified, including whether the command is registered and whether Commander parses and forwards the options correctly.
Example
A regression that removes `registerIpOracleCommand(program)`, renames `ip-oracle generate`, or stops forwarding `--shard-by-cpc` / `--retrieval-date` would leave the new library tests green, because they call `generatePatentBundle`, `writePatentBundle`, and `ingestPatentExport` directly instead of the CLI entrypoint.
Suggested direction
Cover the `dkg ip-oracle` command through the same compiled-CLI path used for OKF subcommands so command registration, option parsing, validation, and output contracts are verified.
For Agents
Add CLI-level tests under `packages/cli/test` that invoke the compiled `dist/cli.js` for `ip-oracle generate <outDir> --count 2 ...` and `ip-oracle ingest <ndjson> <outDir> --shard-by-cpc --retrieval-date ...`; assert exit code, JSON summary fields, produced files, and no node/API calls. Include a negative `--count 0` case to prove the CLI validation and exit code.
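The "JSON summary fields" assertion suggested above could be factored into a small checker. The sketch below covers only the fields visible in the diff (`mode`, `conceptCount`, `files`, `synthetic`, `note`). The helper name `checkGenerateOutput` is hypothetical, and the expectation that `conceptCount` equals the requested `--count` is an assumption about `writePatentBundle`, not something the diff confirms.

```typescript
// Fields of the `ip-oracle generate` JSON output that the diff makes visible.
// Anything else spread in from `summary` is unknown here and ignored.
interface GenerateSummary {
  mode: string;
  conceptCount: number;
  files: number;
  synthetic: boolean;
  note: string;
}

// Hypothetical test helper: returns a list of contract violations.
// An empty list means the captured stdout matches the expected shape.
function checkGenerateOutput(stdout: string, requestedCount: number): string[] {
  let raw: unknown;
  try {
    raw = JSON.parse(stdout);
  } catch {
    return ['stdout is not valid JSON'];
  }
  if (raw === null || typeof raw !== 'object') {
    return ['stdout is not a JSON object'];
  }
  const out = raw as Partial<GenerateSummary>;
  const problems: string[] = [];
  if (out.mode !== 'generate') problems.push('mode should be "generate"');
  if (out.synthetic !== true) problems.push('synthetic should be true');
  // Assumption: the library emits exactly --count concepts.
  if (out.conceptCount !== requestedCount) {
    problems.push(`conceptCount should be ${requestedCount}`);
  }
  // From the diff: files = conceptCount + 3 (patents/index, index, log).
  if (out.files !== requestedCount + 3) {
    problems.push(`files should be ${requestedCount + 3}`);
  }
  if (typeof out.note !== 'string' || !out.note.includes('dkg okf import')) {
    problems.push('note should point at "dkg okf import"');
  }
  return problems;
}
```

In a CLI-level test, the stdout captured from spawning the compiled `dist/cli.js` with `ip-oracle generate <outDir> --count 2` would be passed to this checker, with exit code and produced files asserted separately. For the negative `--count 0` case, the diff shows that the expected exit code is 2.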