Also emit canonical Arrow JSON extension metadata keys #112

Merged: adriangb merged 3 commits into main from arrow-json-extension-metadata on Apr 25, 2026

Conversation

@adriangb
Collaborator

Summary

  • Extend json_field_metadata() to emit Arrow's canonical JSON extension type keys (ARROW:extension:name = arrow.json, ARROW:extension:metadata = {}) alongside the existing is_json = true key.
  • Re-export json_field_metadata at the crate root so external users (and our own tests) can stop duplicating the literal.
  • Extend the two metadata-assertion tests added in "Mark JSON-bearing string fields with is_json metadata" (#111) to verify all three keys, and route the test_json_get_utf8/test_json_get_large_utf8 fixtures through the helper.
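
As a rough illustration of the first bullet, the helper could look something like the sketch below. The function name matches the PR, but the signature and the `HashMap<String, String>` return type are assumptions (inferred only from the test helper quoted in the review further down); the real body in src/common_union.rs is not shown in this conversation.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the crate's `json_field_metadata()`.
/// The real signature and return type are assumed, not confirmed by the PR.
pub fn json_field_metadata() -> HashMap<String, String> {
    HashMap::from([
        // Legacy, crate-private marker kept for backward compatibility.
        ("is_json".to_string(), "true".to_string()),
        // Canonical Arrow JSON extension type keys.
        ("ARROW:extension:name".to_string(), "arrow.json".to_string()),
        ("ARROW:extension:metadata".to_string(), "{}".to_string()),
    ])
}
```

Keeping all three keys in one helper means every write site that calls it picks up the canonical keys without further changes.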

Why

Marking JSON-bearing Utf8 fields with is_json is a convention private to this crate — no other Arrow tool recognizes it. Emitting the canonical keys as well makes those columns interoperable with the broader Arrow ecosystem (arrow-rs's Json extension type, pyarrow, DuckDB, Polars, …). The legacy is_json key is preserved so any downstream consumer that already keys off it keeps working.
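
To make the back-compat argument concrete, here is a minimal sketch of how a downstream consumer might detect a JSON column during a migration period. The function `is_json_field` is hypothetical and not part of this crate; it only assumes that field metadata is available as a string-to-string map.

```rust
use std::collections::HashMap;

/// Hypothetical consumer-side check (not part of this crate): treat a field
/// as JSON if it carries the canonical Arrow extension name, or the legacy
/// crate-private `is_json = true` marker.
pub fn is_json_field(metadata: &HashMap<String, String>) -> bool {
    let canonical =
        metadata.get("ARROW:extension:name").map(String::as_str) == Some("arrow.json");
    let legacy = metadata.get("is_json").map(String::as_str) == Some("true");
    canonical || legacy
}
```

A consumer written this way keeps working against both older output (legacy key only) and output produced after this change.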

The change is contained to one helper; all four production write sites (union_fields()'s array/object children, json_get_array's list item field, json_get_json's return field) already route through it. Detection via is_json_union is purely structural — comparing UnionFields against the OnceLock-cached union_fields() — so adding metadata keys to the cached value preserves equality on both sides.
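
The equality argument can be sketched with simplified stand-in types. Everything below is hypothetical: `SketchField`, `union_fields_sketch`, and `is_json_union_sketch` only mimic the shape described above (a OnceLock-cached field list and a structural comparison against it), and the three child names are illustrative rather than the crate's actual union layout.

```rust
use std::collections::BTreeMap;
use std::sync::OnceLock;

/// Simplified stand-in for an Arrow field: a name plus string metadata.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SketchField {
    pub name: &'static str,
    pub metadata: BTreeMap<&'static str, &'static str>,
}

/// Hypothetical analogue of `union_fields()`: built once, then cached.
pub fn union_fields_sketch() -> &'static Vec<SketchField> {
    static FIELDS: OnceLock<Vec<SketchField>> = OnceLock::new();
    FIELDS.get_or_init(|| {
        let json_meta = BTreeMap::from([
            ("is_json", "true"),
            ("ARROW:extension:name", "arrow.json"),
            ("ARROW:extension:metadata", "{}"),
        ]);
        vec![
            SketchField { name: "str", metadata: BTreeMap::new() },
            SketchField { name: "array", metadata: json_meta.clone() },
            SketchField { name: "object", metadata: json_meta },
        ]
    })
}

/// Hypothetical analogue of `is_json_union`: a purely structural comparison
/// against the cached value, metadata included.
pub fn is_json_union_sketch(candidate: &[SketchField]) -> bool {
    candidate == union_fields_sketch().as_slice()
}
```

Because both the producer and the detector read the same cached value, adding keys to the metadata changes both sides of the comparison together. In this sketch, only a value built elsewhere with different metadata would stop matching.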

Test plan

  • cargo build
  • cargo test — 154 passed, 0 failed (covers extended test_json_get_array_inner_field_json_metadata, test_json_get_json_json_metadata, fixture-only test_json_get_utf8 / test_json_get_large_utf8, and all is_json_union-gated paths)
  • cargo clippy --all-targets -- -D warnings

🤖 Generated with Claude Code

Arrow defines a canonical extension type for JSON
(https://arrow.apache.org/docs/format/CanonicalExtensions.html#json):

  ARROW:extension:name     = arrow.json
  ARROW:extension:metadata = {}

Emit those alongside the existing is_json=true key so JSON-bearing string
fields are recognized by the broader Arrow ecosystem (arrow-rs's Json
extension type, pyarrow, DuckDB, Polars, etc.) while remaining
back-compatible with consumers keyed on is_json.

The change is contained to json_field_metadata(); all four production
write sites already route through that helper. Detection via
is_json_union remains purely structural and is unaffected. Tests are
extended to assert all three keys, and the helper is re-exported at the
crate root so test fixtures can use it instead of duplicating literals.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@codecov-commenter

codecov-commenter commented Apr 25, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 83.05%. Comparing base (a3d9f62) to head (0b2e5e7).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #112      +/-   ##
==========================================
+ Coverage   82.99%   83.05%   +0.05%     
==========================================
  Files          17       17              
  Lines        1482     1487       +5     
  Branches     1482     1487       +5     
==========================================
+ Hits         1230     1235       +5     
  Misses        183      183              
  Partials       69       69              

Mark the is_json=true emission in json_field_metadata() as a legacy,
non-standard key kept only for back-compat, to be removed in a future
release. Drop the corresponding assertion from the test helper so the
remaining checks only enforce the canonical Arrow extension keys
(ARROW:extension:name, ARROW:extension:metadata) — the legacy key
continues to be emitted but is no longer part of the contract under
test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@adriangb adriangb requested a review from Copilot April 25, 2026 18:57
Contributor

Copilot AI left a comment

Pull request overview

This PR updates the crate’s JSON-field metadata helper to emit Arrow’s canonical JSON extension metadata (for better interoperability across the Arrow ecosystem), while keeping the existing legacy is_json=true marker for backward compatibility. It also exposes the helper at the crate root and updates tests/fixtures to rely on it.

Changes:

  • Extend json_field_metadata() to include ARROW:extension:name=arrow.json and ARROW:extension:metadata={} alongside legacy is_json=true.
  • Re-export json_field_metadata from the crate root for external/test reuse.
  • Update metadata assertion tests and route return-field fixtures through json_field_metadata().

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| src/common_union.rs | Expands JSON field metadata to include canonical Arrow extension keys while preserving is_json. |
| src/lib.rs | Re-exports json_field_metadata at the crate root. |
| tests/main.rs | Updates tests/fixtures to use and validate JSON field metadata. |
Comments suppressed due to low confidence (1)

tests/main.rs:196

  • assert_json_field_metadata currently only asserts the canonical Arrow extension keys, but this PR’s stated back-compat goal is to also preserve the legacy is_json = true key. As written, the tests would still pass if is_json were accidentally removed later. Please extend this helper (or compare against json_field_metadata()) so the tests verify all expected keys, including is_json.
fn assert_json_field_metadata(metadata: &HashMap<String, String>) {
    assert_eq!(
        metadata.get("ARROW:extension:name").map(String::as_str),
        Some("arrow.json")
    );
    assert_eq!(metadata.get("ARROW:extension:metadata").map(String::as_str), Some("{}"));
}

#[tokio::test]
async fn test_json_get_equals() {
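
One way to act on the suppressed suggestion, should the legacy key ever need to be pinned by tests again, is sketched below. This was not adopted in the PR: the second commit deliberately drops the is_json assertion. The names `EXPECTED_JSON_METADATA`, `missing_json_metadata_keys`, and `assert_json_field_metadata_strict` are hypothetical, and the expected values are taken from the PR description rather than from the crate source.

```rust
use std::collections::HashMap;

/// Keys and values the PR description says `json_field_metadata()` emits.
const EXPECTED_JSON_METADATA: [(&'static str, &'static str); 3] = [
    ("is_json", "true"),
    ("ARROW:extension:name", "arrow.json"),
    ("ARROW:extension:metadata", "{}"),
];

/// Returns every expected key that is absent or has an unexpected value.
/// Extra keys in `metadata` are tolerated.
pub fn missing_json_metadata_keys(metadata: &HashMap<String, String>) -> Vec<&'static str> {
    EXPECTED_JSON_METADATA
        .iter()
        .filter(|(key, value)| metadata.get(*key).map(String::as_str) != Some(*value))
        .map(|(key, _)| *key)
        .collect()
}

/// Hypothetical stricter variant of the test helper quoted above.
pub fn assert_json_field_metadata_strict(metadata: &HashMap<String, String>) {
    let missing = missing_json_metadata_keys(metadata);
    assert!(
        missing.is_empty(),
        "missing or mismatched JSON metadata keys: {missing:?}"
    );
}
```

Reporting the offending keys, rather than comparing whole maps, keeps the failure message short and lets the helper tolerate unrelated metadata on the same field.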

@adriangb adriangb merged commit b2c3dc3 into main Apr 25, 2026
7 checks passed
@adriangb adriangb deleted the arrow-json-extension-metadata branch April 25, 2026 19:00