Codex/blazegraph safe rdf literals#1322
Conversation
otReviewAgent
left a comment
There was a problem hiding this comment.
Operational Notice: Review Agent could not complete this review.
Business logic reviewer failed: retry_exhausted
| } | ||
|
|
||
| const sha256 = createHash('sha256').update(parsed.lexical, 'utf8').digest('hex'); | ||
| const bodySubject = `${quad.subject}/.well-known/genid/text-${sha256.slice(0, 16)}`; |
There was a problem hiding this comment.
🔴 Bug: This derives the replacement resource from quad.subject even when the source subject is a blank node. For an oversized schema:text on _:b0, bodySubject becomes _:b0/.well-known/...; callers such as assertionWrite persist the normalized quads without skolemizing, and the storage serializers treat that string as a blank node label, producing invalid N-Quads and rejecting a write that previously worked on non-Blazegraph backends. Skolemize before rewriting, or generate a valid blank node/IRI for blank-node subjects before emitting the chunk quads.
| const shaped = err as { name?: unknown; code?: unknown }; | ||
| return shaped.name === 'RdfLiteralSizeError' || shaped.code === 'RDF_LITERAL_TOO_LARGE'; | ||
| } | ||
|
|
There was a problem hiding this comment.
🟡 Issue: String(err ?? ...) turns any non-Error object that matches isRdfLiteralSizeError into an unhelpful "[object Object]" response. Since the predicate explicitly accepts shaped objects by name or code, this helper should also read a string message field before falling back, so daemon responses stay actionable for cross-package/plain-object errors.
| // or subject-level validation error downstream. | ||
| rejectUserAuthoredProtocolMetadata(quads); | ||
| const subjects = [...new Set(quads.map(q => q.subject))]; | ||
| // Normalize oversized text literals and reject reserved/protocol |
There was a problem hiding this comment.
🟡 Issue: The added test only exercises DKGPublisher.publish(), but this new normalization also changes the SWM share() write path and recurs in conditional/append paths. Add a regression test that calls share() or conditionalShare() with an oversized schema:text literal and asserts the stored quads are chunked; otherwise this path could regress to sending the original >65KB literal while tests stay green.
| privateQuads?: Quad[]; | ||
| }, | ||
| ): Promise<PublishOptions['precomputedAttestation']> { | ||
| quads = normalizeLargeRdfLiteralsForBlazegraph(quads).quads as Quad[]; |
There was a problem hiding this comment.
🟡 Issue: This normalization feeds _buildPrecomputedAttestationForSelection, but the added coverage does not exercise an agent/on-chain publish where the precomputed attestation is built from oversized public or private quads. Add a regression test through the agent selection/precomputed-attestation path so a raw-vs-normalized seal mismatch is caught.
| if (isPayloadTooLargeError(e)) { | ||
| return jsonResponse(res, 413, payloadTooLargeResponseBody(e)); | ||
| } | ||
| if (isRdfLiteralSizeError(e)) { |
There was a problem hiding this comment.
🟡 Issue: The daemon now maps RdfLiteralSizeError to a public 400 response, but there is no route-level test that makes the underlying publish/share call throw this error and asserts the HTTP status/body. Please add coverage for this route and the shared-memory handlers that recur below, so a regression to the generic 500 path or missing RDF_LITERAL_TOO_LARGE code is caught.
| @@ -98,6 +98,7 @@ import { | |||
| sharedMemoryReadBothFilter, | |||
| partitionCatalogQuads, | |||
There was a problem hiding this comment.
🔵 Nit: normalizeLargeRdfLiteralsForBlazegraph is imported from @origintrail-official/dkg-core in its own statement immediately after an existing import from the same package. Consolidating it into the existing import matches the local pattern and avoids needless import drift as more core symbols are added.
Summary
Fixes Blazegraph ingestion/sync failures caused by oversized RDF literals, specifically large
schema.org/textvalues that exceed Java/Blazegraph modified UTF-8 limits.Changes:
schema.org/textliterals into ordered chunk resources below a conservative 60,000-byte ceiling.RDF_LITERAL_TOO_LARGE.Testing
pnpm --filter @origintrail-official/dkg-core exec vitest run test/rdf-literal-limits.test.tspnpm --filter @origintrail-official/dkg-storage exec vitest run test/blazegraph.unit.test.tspnpm --filter @origintrail-official/dkg-publisher exec vitest run --config vitest.unit.config.ts test/dkg-publisher-compat.test.tsgit diff --checkLive Blazegraph integration was attempted, but this local environment has no
BLAZEGRAPH_TEST_URLand Docker daemon access was unavailable.