Codex/blazegraph safe rdf literals by lupuszr · Pull Request #1322 · OriginTrail/dkg

lupuszr · 2026-06-24T15:00:55Z

Summary

Fixes Blazegraph ingestion/sync failures caused by oversized RDF literals, specifically large schema.org/text values that exceed Java/Blazegraph modified UTF-8 limits.

Changes:

Adds shared RDF literal size validation using Java modified UTF-8 byte counting.
Rewrites oversized schema.org/text literals into ordered chunk resources below a conservative 60,000-byte ceiling.
Adds hash, byte-size, chunk-count, and source-predicate metadata for rewritten text bodies.
Rejects oversized non-text RDF literals before publishing with RDF_LITERAL_TOO_LARGE.
Wires validation/chunking into producer paths: direct publish, shared memory writes, conditional writes, assertion writes, and agent publish attestation generation.
Keeps Blazegraph adapter-side oversized-literal rejection as defense in depth.
Maps producer literal-size failures to HTTP 400 in daemon routes.
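
For illustration, the byte counting and chunking described above could look roughly like the sketch below. It is not the PR's implementation: the function names `modifiedUtf8ByteLength` and `chunkByModifiedUtf8` are hypothetical, and the splitting strategy (greedy, never splitting a surrogate pair) is an assumption. The counting rule itself follows Java's modified UTF-8 as written by `DataOutputStream.writeUTF`: U+0000 takes two bytes, and a supplementary character is encoded as two three-byte surrogates.

```typescript
// Sketch only: names and splitting strategy are assumptions, not the PR's code.
const MAX_CHUNK_BYTES = 60_000; // conservative ceiling quoted in the summary

/** Bytes a string occupies in Java's modified UTF-8, counted per UTF-16 code unit. */
function modifiedUtf8ByteLength(value: string): number {
  let bytes = 0;
  for (let i = 0; i < value.length; i++) {
    const unit = value.charCodeAt(i);
    if (unit >= 0x0001 && unit <= 0x007f) bytes += 1;
    else if (unit <= 0x07ff) bytes += 2; // includes U+0000, which takes two bytes
    else bytes += 3; // includes each half of a surrogate pair
  }
  return bytes;
}

/** Greedily splits a string into ordered chunks of at most maxBytes each. */
function chunkByModifiedUtf8(value: string, maxBytes: number = MAX_CHUNK_BYTES): string[] {
  if (maxBytes < 6) throw new RangeError('maxBytes must fit at least one surrogate pair');
  const chunks: string[] = [];
  let start = 0; // index where the current chunk begins
  let bytes = 0; // modified UTF-8 size of the current chunk so far
  for (let i = 0; i < value.length; ) {
    const unit = value.charCodeAt(i);
    const isHigh = unit >= 0xd800 && unit <= 0xdbff && i + 1 < value.length;
    const next = isHigh ? value.charCodeAt(i + 1) : 0;
    const isPair = isHigh && next >= 0xdc00 && next <= 0xdfff;
    const width = isPair ? 2 : 1; // keep surrogate pairs in the same chunk
    const cost = modifiedUtf8ByteLength(value.slice(i, i + width));
    if (bytes + cost > maxBytes) {
      chunks.push(value.slice(start, i));
      start = i;
      bytes = 0;
    }
    bytes += cost;
    i += width;
  }
  if (start < value.length) chunks.push(value.slice(start));
  return chunks;
}
```

Joining the chunks in order reproduces the original string, which is what makes ordered chunk resources reversible for readers.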

Testing

pnpm --filter @origintrail-official/dkg-core exec vitest run test/rdf-literal-limits.test.ts
pnpm --filter @origintrail-official/dkg-storage exec vitest run test/blazegraph.unit.test.ts
pnpm --filter @origintrail-official/dkg-publisher exec vitest run --config vitest.unit.config.ts test/dkg-publisher-compat.test.ts
git diff --check

Live Blazegraph integration was attempted, but this local environment has no BLAZEGRAPH_TEST_URL and Docker daemon access was unavailable.

otReviewAgent

Operational Notice: Review Agent could not complete this review.

Business logic reviewer failed: retry_exhausted

otReviewAgent · 2026-06-25T04:06:29Z

+    }
+
+    const sha256 = createHash('sha256').update(parsed.lexical, 'utf8').digest('hex');
+    const bodySubject = `${quad.subject}/.well-known/genid/text-${sha256.slice(0, 16)}`;


🔴 Bug: This derives the replacement resource from quad.subject even when the source subject is a blank node. For an oversized schema:text on _:b0, bodySubject becomes _:b0/.well-known/...; callers such as assertionWrite persist the normalized quads without skolemizing, and the storage serializers treat that string as a blank node label, producing invalid N-Quads and rejecting a write that previously worked on non-Blazegraph backends. Skolemize before rewriting, or generate a valid blank node/IRI for blank-node subjects before emitting the chunk quads.
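
One way the second suggestion might look is sketched below, assuming subjects arrive as plain strings and blank nodes carry the `_:` prefix. `textBodySubject` is a hypothetical helper, not code from the PR, and whether to stay in blank-node space or skolemize first depends on what the callers expect.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical helper: derives the chunk-body subject without ever
// appending an IRI path to a blank node label.
function textBodySubject(subject: string, lexical: string): string {
  const digest = createHash('sha256').update(lexical, 'utf8').digest('hex').slice(0, 16);
  if (subject.startsWith('_:')) {
    // Stay in blank-node space: "-" and hex digits are legal inside an
    // N-Quads blank node label, whereas "/" is not.
    return `${subject}-text-${digest}`;
  }
  // IRI subjects keep the path-style identifier used in the diff above.
  return `${subject}/.well-known/genid/text-${digest}`;
}
```

With this shape, `_:b0` yields a label such as `_:b0-text-<16 hex digits>`, which serializers can emit as a valid blank node.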

otReviewAgent · 2026-06-25T04:06:29Z

+  const shaped = err as { name?: unknown; code?: unknown };
+  return shaped.name === 'RdfLiteralSizeError' || shaped.code === 'RDF_LITERAL_TOO_LARGE';
+}
+


🟡 Issue: String(err ?? ...) turns any non-Error object that matches isRdfLiteralSizeError into an unhelpful "[object Object]" response. Since the predicate explicitly accepts shaped objects by name or code, this helper should also read a string message field before falling back, so daemon responses stay actionable for cross-package/plain-object errors.
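
A possible shape for such a helper is sketched below. The name `rdfLiteralSizeErrorMessage` and the fallback text are assumptions made for illustration; the PR's actual helper is not shown in this excerpt.

```typescript
// Hypothetical helper: prefers a usable string message over String(err).
function rdfLiteralSizeErrorMessage(
  err: unknown,
  fallback: string = 'RDF literal exceeds the supported size', // assumed wording
): string {
  if (err instanceof Error && err.message) return err.message;
  if (typeof err === 'object' && err !== null) {
    // Plain objects from other packages may still carry a string message.
    const message = (err as { message?: unknown }).message;
    if (typeof message === 'string' && message.length > 0) return message;
  }
  return fallback;
}
```

A plain object such as `{ code: 'RDF_LITERAL_TOO_LARGE', message: 'literal too large' }` then surfaces its own message, and an object with no message falls back to fixed text rather than "[object Object]".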

otReviewAgent · 2026-06-25T04:06:29Z

-    // or subject-level validation error downstream.
-    rejectUserAuthoredProtocolMetadata(quads);
-    const subjects = [...new Set(quads.map(q => q.subject))];
+    // Normalize oversized text literals and reject reserved/protocol


🟡 Issue: The added test only exercises DKGPublisher.publish(), but this new normalization also changes the SWM share() write path and recurs in conditional/append paths. Add a regression test that calls share() or conditionalShare() with an oversized schema:text literal and asserts the stored quads are chunked; otherwise this path could regress to sending the original >65KB literal while tests stay green.

otReviewAgent · 2026-06-25T04:06:29Z

      privateQuads?: Quad[];
    },
  ): Promise<PublishOptions['precomputedAttestation']> {
+    quads = normalizeLargeRdfLiteralsForBlazegraph(quads).quads as Quad[];


🟡 Issue: This normalization feeds _buildPrecomputedAttestationForSelection, but the added coverage does not exercise an agent/on-chain publish where the precomputed attestation is built from oversized public or private quads. Add a regression test through the agent selection/precomputed-attestation path so a raw-vs-normalized seal mismatch is caught.

otReviewAgent · 2026-06-25T04:06:29Z

      if (isPayloadTooLargeError(e)) {
        return jsonResponse(res, 413, payloadTooLargeResponseBody(e));
      }
+      if (isRdfLiteralSizeError(e)) {


🟡 Issue: The daemon now maps RdfLiteralSizeError to a public 400 response, but there is no route-level test that makes the underlying publish/share call throw this error and asserts the HTTP status/body. Please add coverage for this route and the shared-memory handlers that recur below, so a regression to the generic 500 path or missing RDF_LITERAL_TOO_LARGE code is caught.
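
The contract such a route test would pin down can be sketched as a pure mapping function. Everything here is an assumption for illustration: `toErrorResponse`, the `ErrorResponse` shape, and the response body fields are hypothetical and are not the daemon's real handler code.

```typescript
// Hypothetical stand-in for the daemon's error mapping.
interface ErrorResponse {
  status: number;
  body: { error: string; code?: string };
}

function toErrorResponse(err: unknown): ErrorResponse {
  const shaped = (typeof err === 'object' && err !== null ? err : {}) as {
    name?: unknown;
    code?: unknown;
    message?: unknown;
  };
  const message = typeof shaped.message === 'string' ? shaped.message : 'Unexpected error';
  // Same name/code check as the predicate shown earlier in the review.
  if (shaped.name === 'RdfLiteralSizeError' || shaped.code === 'RDF_LITERAL_TOO_LARGE') {
    return { status: 400, body: { error: message, code: 'RDF_LITERAL_TOO_LARGE' } };
  }
  return { status: 500, body: { error: message } };
}
```

A route-level test would make the underlying publish or share call throw and then assert on both the status and the `code` field, which are the two things this comment says could regress.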

otReviewAgent · 2026-06-25T04:06:29Z

@@ -98,6 +98,7 @@ import {
  sharedMemoryReadBothFilter,
  partitionCatalogQuads,


🔵 Nit: normalizeLargeRdfLiteralsForBlazegraph is imported from @origintrail-official/dkg-core in its own statement immediately after an existing import from the same package. Consolidating it into the existing import matches the local pattern and avoids needless import drift as more core symbols are added.

lupuszr added 2 commits June 24, 2026 15:11

Log StorageACK declines and validation failures

557d5a4

Prevent oversized RDF literals from poisoning Blazegraph

8203528

otReviewAgent reviewed Jun 24, 2026


otReviewAgent reviewed Jun 25, 2026


lupuszr wants to merge 2 commits into main from codex/blazegraph-safe-rdf-literals

2 participants
