fix(canonical): prevent field names being substituted with type names in CanonicalSchema#308
Open
jwilliams-vc wants to merge 2 commits into
Conversation
… in CanonicalSchema When a field name and a record type share the same local name within the same inherited namespace, CanonicalSchema() produced two different strings non-deterministically (~14%/~86% split), depending on Go map iteration order. Root cause: pcfObject processes map keys in random order. When the "type" value of a field is visited before the "name" value, the inner record type gets registered in typeLookup first. Then when pcfString processes the "name" value it finds the field name in typeLookup and substitutes the fully-qualified type name, incorrectly namespace-qualifying the field name. Fix: bypass typeLookup when emitting the value of a "name" key. Name values are either a type's own name (already namespace-qualified by the block above) or a field/symbol name that must be emitted verbatim. Neither case should go through pcfString's typeLookup resolution. Adds a 1000-iteration determinism test to catch any regression. Fixes linkedin#307
… bypass typeLookup is for resolving type references, not identifiers. Rather than special-casing the string value when k == "name", pass nil so that pcfString never substitutes an identifier with a fully-qualified type name at any level of recursion. This makes the constraint explicit at the call site. No behaviour change — the determinism test (1000 iterations) still passes.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #307.
Problem
CanonicalSchema()produces two different strings from the same schema non-deterministically (~14% vs ~86% split) when a field name and a record type share the same local name within the same inherited namespace.Minimal reproduction:
Correct (~86%):
..."fields":[{"name":"items","type":...Wrong (~14%):
..."fields":[{"name":"com.example.items","type":...Field names must never be namespace-qualified per the Avro PCF spec.
Root cause
pcfObjectiterates over the JSON map (random Go order). If the"type"value is visited before the"name"value, the inner record type"items"gets registered intypeLookupas"com.example.items". WhenpcfStringlater processes the field's"name"value, it finds"items"intypeLookupand substitutes the fully-qualified type name — incorrectly namespace-qualifying the field name.Fix
typeLookupexists to resolve type references, not identifiers. A"name"value is either a type's own name (already namespace-qualified by the block above) or a field/symbol name that must be emitted verbatim — neither should go throughtypeLookupresolution.Rather than special-casing the string value inside the
k == "name"branch, we passnilastypeLookupwhen recursing over the value of a"name"key. This expresses the constraint at the call site: the entire subtree below a"name"key sees no type lookup table.nilmap reads are safe in Go, sopcfStringreturns the value verbatim as intended.