
fix: prevent GraphQL scalar/union from inheriting a later type's fields and line range#477

Open
JOhnsonKC201 wants to merge 1 commit into Egonex-AI:main from JOhnsonKC201:fix/graphql-scalar-union-definitions


@JOhnsonKC201


Summary

GraphQLParser.extractDefinitions mis-parses scalar and union definitions. Both are matched by the definition regex but have no { ... } body, yet the parser unconditionally calls extractFields() and searches for the next }. For a body-less definition this scans forward into the next braced definition, so the scalar/union steals that type's fields and its closing-brace line range.

This corrupts the knowledge graph for almost any real GraphQL schema, since a custom scalar above object types (scalar DateTime, scalar JSON, scalar Upload, …) is ubiquitous.

Steps to reproduce

```graphql
scalar DateTime

type User {
  id: ID!
  name: String!
}
```

```typescript
new GraphQLParser().analyzeFile("schema.graphql", content).definitions
```

Expected: DateTime → { kind: "scalar", lineRange: [1, 1], fields: [] }
Actual (before fix): DateTime → { kind: "scalar", lineRange: [1, 6], fields: ["id", "name"] }

The scalar wrongly spans lines 1–6 and claims User's id/name fields. A union declared above a braced type behaves identically.
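To see why the scalar ends on line 6, here is a minimal model of the pre-fix end-line computation. It is a hypothetical simplification, not the project's code: it assumes, as the summary describes, that every definition takes its end line from the first `}` after the match, and that a line number is one more than the count of newlines before an offset. `lineAt` and `preFixLineRange` are illustrative names.

```typescript
// Hypothetical model of the pre-fix logic; NOT the project's implementation.

// 1-based line number of a character offset.
function lineAt(content: string, index: number): number {
  return content.slice(0, index).split("\n").length;
}

// Pre-fix behaviour: every definition searches forward for the next "}",
// even when it has no body of its own.
function preFixLineRange(content: string, index: number): [number, number] {
  const startLine = lineAt(content, index);
  const closeBrace = content.slice(index).indexOf("}");
  const endLine = closeBrace === -1 ? startLine : lineAt(content, index + closeBrace);
  return [startLine, endLine];
}

const schema = ["scalar DateTime", "", "type User {", "  id: ID!", "  name: String!", "}"].join("\n");

// The scalar starts at offset 0 but borrows User's closing brace on line 6.
console.log(preFixLineRange(schema, schema.indexOf("scalar DateTime"))); // [ 1, 6 ]
```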

Fix

Guard field/brace extraction behind a hasBody check (kind is not scalar or union). Body-less definitions now report no fields and a single-line range. Braced definitions (type / input / interface / enum) are unchanged. The change is only ~10 lines, scoped to extractDefinitions.

```typescript
const hasBody = kind !== "scalar" && kind !== "union";
const fields = hasBody ? this.extractFields(content, match.index) : [];
const afterMatch = content.slice(match.index);
const closeBrace = hasBody ? afterMatch.indexOf("}") : -1;
```
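The guard can be exercised end to end with a self-contained sketch. The regex, `extractFields`, `lineAt`, and the `Definition` shape below are simplified stand-ins assumed for illustration (the real `GraphQLParser` internals are not shown in this PR); only the `hasBody` / `fields` / `closeBrace` lines mirror the patch.

```typescript
// Simplified stand-ins for the parser internals (assumptions, not the real code).
type Kind = "type" | "input" | "enum" | "interface" | "union" | "scalar";

interface Definition {
  name: string;
  kind: Kind;
  lineRange: [number, number];
  fields: string[];
}

const DEFINITION_RE = /\b(type|input|enum|interface|union|scalar)\s+(\w+)/g;

// 1-based line number of a character offset.
function lineAt(content: string, index: number): number {
  return content.slice(0, index).split("\n").length;
}

// Collects `name:` entries between the first "{" after `from` and the next "}".
function extractFields(content: string, from: number): string[] {
  const open = content.indexOf("{", from);
  if (open === -1) return [];
  const close = content.indexOf("}", open);
  const body = content.slice(open + 1, close === -1 ? content.length : close);
  return Array.from(body.matchAll(/^\s*(\w+)\s*:/gm), (m) => m[1]);
}

function extractDefinitions(content: string): Definition[] {
  const definitions: Definition[] = [];
  for (const match of content.matchAll(DEFINITION_RE)) {
    const index = match.index ?? 0;
    const kind = match[1] as Kind;
    // The patch: scalar and union have no { ... } body, so never scan for one.
    const hasBody = kind !== "scalar" && kind !== "union";
    const fields = hasBody ? extractFields(content, index) : [];
    const afterMatch = content.slice(index);
    const closeBrace = hasBody ? afterMatch.indexOf("}") : -1;
    const startLine = lineAt(content, index);
    const endLine = closeBrace === -1 ? startLine : lineAt(content, index + closeBrace);
    definitions.push({ name: match[2], kind, lineRange: [startLine, endLine], fields });
  }
  return definitions;
}

const schema = ["scalar DateTime", "", "type User {", "  id: ID!", "  name: String!", "}"].join("\n");
for (const d of extractDefinitions(schema)) {
  console.log(d.name, d.kind, d.lineRange, d.fields);
}
// DateTime scalar [ 1, 1 ] []
// User type [ 3, 6 ] [ 'id', 'name' ]
```

Because the guard keys off `kind` alone, braced definitions take exactly the same path as before, which is why type / input / interface / enum output is unchanged.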

Tests

Adds two regression tests (a scalar and a union each declared above a type) asserting the body-less definition gets no fields and a single-line range, and that the following type keeps its own fields/range.

  • pnpm --filter @understand-anything/core test → 755 passed (was 753; +2 new). Both new tests fail on main and pass with this change.
  • pnpm lint → clean.

PR checklist

  • Code follows the project's style guidelines
  • All tests pass (pnpm test)
  • New code has test coverage
  • Commit messages follow convention
  • No console.log or debug code left behind
  • Scope limited to the bug (no unrelated changes)

Environment

Found while auditing the parser layer. Repro is platform-independent; verified on Node v24 / pnpm 10.6.2.

…ds and line range

scalar and union definitions have no brace-delimited body, but
extractDefinitions unconditionally called extractFields() and searched
for the next }. For a body-less definition that scanned forward into
the next braced definition, so e.g. `scalar DateTime` declared above a
`type` inherited that type's fields and closing-brace line range.

Guard field/brace extraction behind a hasBody check (kind is not scalar
or union). Body-less definitions now report no fields and a single-line
range; braced definitions (type/input/interface/enum) are unchanged.

Adds regression tests for a scalar and a union declared above a type.
@tirth8205

Contributor

Reviewed for correctness of the scalar/union body-less guard. The fix is correct and well-scoped.

Verification I ran

  • Reproduced the bug on the pre-fix logic: scalar DateTime above type User yields { kind: "scalar", lineRange: [1, 6], fields: ["id", "name"] } — confirms the description exactly.
  • Ran the patched extractDefinitions against several schemas: scalar above type, union above type, scalar between two types, scalar at EOF, multiple consecutive scalars, and an interface above a type. All produce correct body-less results for scalar/union and leave braced definitions untouched.
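A table-driven check along these lines reproduces most of the layouts listed above. This sketch keeps only the line-range part of the patched logic and rests on the same assumptions as a simplified model (a hypothetical definition regex and a 1-based `lineAt` helper); `ranges` and the case labels are illustrative names, and each definition is printed as `name:start-end`.

```typescript
// Hypothetical reduced model of the patched extractDefinitions: line ranges only.
const DEFINITION_RE = /\b(type|input|enum|interface|union|scalar)\s+(\w+)/g;

// 1-based line number of a character offset.
function lineAt(content: string, index: number): number {
  return content.slice(0, index).split("\n").length;
}

function ranges(content: string): string[] {
  return Array.from(content.matchAll(DEFINITION_RE), (match) => {
    const index = match.index ?? 0;
    const hasBody = match[1] !== "scalar" && match[1] !== "union";
    const closeBrace = hasBody ? content.slice(index).indexOf("}") : -1;
    const start = lineAt(content, index);
    const end = closeBrace === -1 ? start : lineAt(content, index + closeBrace);
    return `${match[2]}:${start}-${end}`;
  });
}

const cases: Record<string, string> = {
  scalarBetweenTypes: "type A {\n  x: Int\n}\nscalar JSON\ntype B {\n  y: Int\n}",
  scalarAtEof: "type A {\n  x: Int\n}\nscalar Upload",
  consecutiveScalars: "scalar DateTime\nscalar JSON\ntype A {\n  x: Int\n}",
  interfaceAboveType: "interface Node {\n  id: ID!\n}\ntype User {\n  id: ID!\n}",
};

for (const [label, schema] of Object.entries(cases)) {
  console.log(label, ranges(schema).join(", "));
}
// scalarBetweenTypes A:1-3, JSON:4-4, B:5-7
// scalarAtEof A:1-3, Upload:4-4
// consecutiveScalars DateTime:1-1, JSON:2-2, A:3-5
// interfaceAboveType Node:1-3, User:4-6
```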

Correctness notes

  • hasBody = kind !== "scalar" && kind !== "union" is complete: the definition regex only matches type|input|enum|interface|union|scalar, and scalar/union are the only body-less members. Guarding both extractFields and the indexOf("}") search is exactly the right pair of fixes, since both independently scanned forward into the next definition.
  • No regression for type/input/interface/enum definitions: those keep hasBody = true and run the unchanged path.
  • Line math checks out: for a definition at index 0, startLine = 1 and endLine = startLine, matching the test's [1, 1].
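The completeness argument can also be pushed into the type system. In this sketch, `DefinitionKind` is an assumed name for a union of the six keywords the regex accepts, and the exhaustive `switch` (with a `never` check in `default`) makes the compiler reject any newly added keyword until it is classified as braced or body-less. `lineAt` is a hypothetical helper showing the 1-based line math from the last bullet.

```typescript
// Hypothetical: the six keywords accepted by the definition regex.
type DefinitionKind = "type" | "input" | "enum" | "interface" | "union" | "scalar";

// Exhaustive classification; adding a kind without a case is a compile error.
function hasBody(kind: DefinitionKind): boolean {
  switch (kind) {
    case "type":
    case "input":
    case "enum":
    case "interface":
      return true;
    case "union":
    case "scalar":
      return false;
    default: {
      const unreachable: never = kind;
      throw new Error(`unhandled kind: ${String(unreachable)}`);
    }
  }
}

// 1-based line of a character offset: one more than the newlines before it.
function lineAt(content: string, index: number): number {
  return content.slice(0, index).split("\n").length;
}

// A definition at index 0 starts on line 1; a body-less one also ends there.
console.log(lineAt("scalar DateTime\n", 0), hasBody("scalar"), hasBody("type")); // 1 false true
```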

Minor

  • A multi-line union (union X =\n A |\n B) is now reported as lineRange: [start, start] rather than its true multi-line span. This is cosmetic and still strictly better than the prior behavior (which absorbed a later type's brace line), so it is not blocking; I'm just flagging that multi-line union ranges remain approximate.
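If exact union ranges are wanted later, one option (not part of this PR) is to extend a union's end line across continuation lines. The sketch below is a heuristic under stated assumptions: a line continues the union if the previous line ends in `=` or `|`, or if it starts with `|` or `=`; a blank line stops the scan; comments and directives are ignored. `unionEndLine` is a hypothetical name.

```typescript
// Hypothetical follow-up, not in this PR: approximate a union's last line (1-based).
function unionEndLine(content: string, startLine: number): number {
  const lines = content.split("\n");
  let end = startLine;
  while (end < lines.length) {
    const current = lines[end - 1].trimEnd();
    const next = lines[end].trim();
    const continues =
      current.endsWith("=") || current.endsWith("|") || next.startsWith("|") || next.startsWith("=");
    // Stop at a blank line or at the first line that is not a continuation.
    if (!continues || next === "") break;
    end++;
  }
  return end;
}

const schema = ["union X =", "  A |", "  B", "", "type A {", "  id: ID!", "}"].join("\n");
console.log(unionEndLine(schema, 1)); // 3
```

Single-line unions (`union S = A | B`) return their start line unchanged, so this would only widen ranges that the current guard collapses.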

Overall: correct fix for a real, high-impact parsing bug (custom scalars above object types are ubiquitous). Tests are meaningful and assert both the body-less definition and the unaffected following type. Good to merge.
