fix(canonical): prevent field names being substituted with type names in CanonicalSchema #308

Open
jwilliams-vc wants to merge 2 commits into linkedin:master from jwilliams-vc:fix/canonical-schema-field-name-nondeterminism

jwilliams-vc commented Apr 28, 2026

Fixes #307.

Problem

CanonicalSchema() non-deterministically returns one of two different strings for the same schema (roughly a 14%/86% split) when a field name and a record type share the same local name within the same inherited namespace.

Minimal reproduction:

const schema = `{
  "type": "record", "name": "Parent", "namespace": "com.example",
  "fields": [{
    "name": "items",
    "type": ["null", {"type": "array", "items": {"type": "record", "name": "items", "fields": [{"name": "id", "type": "int"}]}}],
    "default": null
  }]
}`

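// Count how many distinct canonical forms the same schema produces.
seen := map[string]int{}
for i := 0; i < 1000; i++ {
    codec, err := goavro.NewCodec(schema)
    if err != nil {
        panic(err) // the schema is valid, so this is not expected
    }
    seen[codec.CanonicalSchema()]++
}
// len(seen) == 2  (should always be 1)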

Correct (~86%): ..."fields":[{"name":"items","type":...
Wrong (~14%): ..."fields":[{"name":"com.example.items","type":...

Per the Avro Parsing Canonical Form (PCF) spec, field names must never be namespace-qualified; only the names of named types are replaced with their fullnames.

Root cause

pcfObject iterates over the JSON map, and Go visits map keys in random order. If the "type" value is visited before the "name" value, the inner record type "items" is registered in typeLookup as "com.example.items". When pcfString later processes the field's "name" value, it finds "items" in typeLookup and substitutes the fully-qualified type name, incorrectly namespace-qualifying the field name.
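
The order dependence can be illustrated with a minimal, self-contained sketch. This is not the goavro code: `emitFieldName` and `resolve` are hypothetical stand-ins, and the key order is passed in explicitly so that both outcomes can be reproduced on demand instead of relying on random map iteration.

```go
package main

import "fmt"

// resolve is a hypothetical stand-in for a pcfString-style lookup: it returns
// the fully-qualified name if s is a known type, otherwise s unchanged.
func resolve(s string, typeLookup map[string]string) string {
	if full, ok := typeLookup[s]; ok {
		return full
	}
	return s
}

// emitFieldName is a hypothetical, stripped-down walk over one field object
// whose name is "items" and whose type is a record also named "items".
// keyOrder simulates the order in which the map keys happen to be visited.
func emitFieldName(keyOrder []string) string {
	typeLookup := map[string]string{}
	emitted := ""
	for _, k := range keyOrder {
		switch k {
		case "type":
			// Visiting the nested record registers its fully-qualified name.
			typeLookup["items"] = "com.example.items"
		case "name":
			// Buggy behaviour: the field name is resolved like a type reference.
			emitted = resolve("items", typeLookup)
		}
	}
	return emitted
}

func main() {
	fmt.Println(emitFieldName([]string{"name", "type"})) // items
	fmt.Println(emitFieldName([]string{"type", "name"})) // com.example.items
}
```

The only difference between the two calls is the visiting order, which is what Go's map iteration randomizes in the real code path.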

Fix

typeLookup exists to resolve type references, not identifiers. A "name" value is either a type's own name (already namespace-qualified by the block above) or a field/symbol name that must be emitted verbatim — neither should go through typeLookup resolution.

Rather than special-casing the string value inside the k == "name" branch, we pass nil as typeLookup when recursing over the value of a "name" key. This expresses the constraint at the call site: the entire subtree below a "name" key sees no type lookup table. Reading from a nil map is safe in Go (the lookup simply finds nothing), so pcfString returns the value verbatim as intended.
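
As a rough illustration of why passing nil is enough, the sketch below uses a hypothetical helper, `lookupOrVerbatim`, that mimics a lookup-then-fallback step. It is an assumption about the general shape of such a function, not the actual pcfString implementation.

```go
package main

import "fmt"

// lookupOrVerbatim is a hypothetical helper: it substitutes s with its
// fully-qualified name when s is present in typeLookup, and otherwise
// returns s unchanged. Indexing a nil map is legal in Go and reports ok == false.
func lookupOrVerbatim(s string, typeLookup map[string]string) string {
	if full, ok := typeLookup[s]; ok {
		return full
	}
	return s
}

func main() {
	typeLookup := map[string]string{"items": "com.example.items"}

	// A type reference is resolved through the table.
	fmt.Println(lookupOrVerbatim("items", typeLookup)) // com.example.items

	// The value of a "name" key is processed with a nil table,
	// so it is always emitted verbatim.
	fmt.Println(lookupOrVerbatim("items", nil)) // items
}
```

Only reads are safe on a nil map; writing to one panics, so this approach relies on nothing below a "name" key trying to register a type in the table it was given.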

… in CanonicalSchema

When a field name and a record type share the same local name within the
same inherited namespace, CanonicalSchema() produced two different strings
non-deterministically (~14%/~86% split), depending on Go map iteration
order.

Root cause: pcfObject processes map keys in random order. When the "type"
value of a field is visited before the "name" value, the inner record type
gets registered in typeLookup first. Then when pcfString processes the
"name" value it finds the field name in typeLookup and substitutes the
fully-qualified type name, incorrectly namespace-qualifying the field name.

Fix: bypass typeLookup when emitting the value of a "name" key. Name
values are either a type's own name (already namespace-qualified by the
block above) or a field/symbol name that must be emitted verbatim. Neither
case should go through pcfString's typeLookup resolution.

Adds a 1000-iteration determinism test to catch any regression.

Fixes linkedin#307
… bypass

typeLookup is for resolving type references, not identifiers. Rather than
special-casing the string value when k == "name", pass nil so that pcfString
never substitutes an identifier with a fully-qualified type name at any level
of recursion. This makes the constraint explicit at the call site.

No behaviour change — the determinism test (1000 iterations) still passes.

Successfully merging this pull request may close these issues.

CanonicalSchema() is non-deterministic when a field name matches a record type name in the same namespace
