Also emit canonical Arrow JSON extension metadata keys #112
Arrow defines a canonical extension type for JSON (https://arrow.apache.org/docs/format/CanonicalExtensions.html#json):

  ARROW:extension:name = arrow.json
  ARROW:extension:metadata = {}

Emit those alongside the existing is_json=true key so JSON-bearing string fields are recognized by the broader Arrow ecosystem (arrow-rs's Json extension type, pyarrow, DuckDB, Polars, etc.) while remaining back-compatible with consumers keyed on is_json.

The change is contained to json_field_metadata(); all four production write sites already route through that helper. Detection via is_json_union remains purely structural and is unaffected.

Tests are extended to assert all three keys, and the helper is re-exported at the crate root so test fixtures can use it instead of duplicating literals.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
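For reviewers, a minimal std-only sketch of the two claims above: the helper emits all three keys, and structural detection keeps working because producers and `is_json_union` compare against the same cached value. `ChildField`, the child names, and the `HashMap<String, String>` return type are assumptions for illustration; the real code compares arrow-rs `UnionFields`.

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

// Hypothetical stand-in for the crate's helper; the real return type may differ.
fn json_field_metadata() -> HashMap<String, String> {
    HashMap::from([
        // Canonical Arrow JSON extension type keys.
        ("ARROW:extension:name".to_string(), "arrow.json".to_string()),
        ("ARROW:extension:metadata".to_string(), "{}".to_string()),
        // Legacy, crate-private marker kept for back-compat.
        ("is_json".to_string(), "true".to_string()),
    ])
}

// Simplified model of one union child: a name plus its field metadata.
#[derive(Debug, Clone, PartialEq)]
struct ChildField {
    name: &'static str,
    metadata: HashMap<String, String>,
}

// Built once and cached, mirroring the OnceLock-backed union_fields().
// The child names here are illustrative, not the crate's actual layout.
fn union_fields() -> &'static Vec<ChildField> {
    static FIELDS: OnceLock<Vec<ChildField>> = OnceLock::new();
    FIELDS.get_or_init(|| {
        vec![
            ChildField { name: "str", metadata: HashMap::new() },
            ChildField { name: "array", metadata: json_field_metadata() },
            ChildField { name: "object", metadata: json_field_metadata() },
        ]
    })
}

// Structural detection: equal fields (metadata included) means "our" union.
fn is_json_union(fields: &[ChildField]) -> bool {
    fields == union_fields().as_slice()
}

fn main() {
    // A producer that builds its output type from the cached value...
    let produced = union_fields().clone();
    // ...is still recognized, because the new keys were added on both sides at once.
    assert!(is_json_union(&produced));

    // A union assembled by hand without the new keys would no longer match.
    let mut stale = union_fields().clone();
    stale[1].metadata.remove("ARROW:extension:name");
    assert!(!is_json_union(&stale));
}
```

The `stale` case is why routing every write site through one helper matters: equality includes metadata, so a write site that built its own union without the helper would stop being detected.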
Codecov Report
✅ All modified and coverable lines are covered by tests.

| | main | #112 | +/- |
|---|---|---|---|
| Coverage | 82.99% | 83.05% | +0.05% |
| Files | 17 | 17 | |
| Lines | 1482 | 1487 | +5 |
| Branches | 1482 | 1487 | +5 |
| Hits | 1230 | 1235 | +5 |
| Misses | 183 | 183 | |
| Partials | 69 | 69 | |
Mark the is_json=true emission in json_field_metadata() as a legacy, non-standard key kept only for back-compat, to be removed in a future release.

Drop the corresponding assertion from the test helper so the remaining checks only enforce the canonical Arrow extension keys (ARROW:extension:name, ARROW:extension:metadata). The legacy key continues to be emitted but is no longer part of the contract under test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
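Downstream consumers that currently key on `is_json` can migrate ahead of its removal. A minimal sketch, assuming field metadata is available as a `HashMap<String, String>`; `is_json_field` is a hypothetical name, not part of this crate's API:

```rust
use std::collections::HashMap;

// Hypothetical consumer-side check: prefer the canonical Arrow extension name,
// and fall back to the legacy `is_json` marker only when no extension name is set.
fn is_json_field(metadata: &HashMap<String, String>) -> bool {
    match metadata.get("ARROW:extension:name") {
        Some(name) => name.as_str() == "arrow.json",
        None => metadata.get("is_json").map(String::as_str) == Some("true"),
    }
}

fn main() {
    let canonical = HashMap::from([
        ("ARROW:extension:name".to_string(), "arrow.json".to_string()),
        ("ARROW:extension:metadata".to_string(), "{}".to_string()),
    ]);
    let legacy_only = HashMap::from([("is_json".to_string(), "true".to_string())]);
    let plain: HashMap<String, String> = HashMap::new();

    assert!(is_json_field(&canonical));
    assert!(is_json_field(&legacy_only));
    assert!(!is_json_field(&plain));
}
```

Letting the canonical name take precedence means a field tagged with some other extension type is not misread as JSON just because a stale `is_json` key is also present.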
Pull request overview
This PR updates the crate’s JSON-field metadata helper to emit Arrow’s canonical JSON extension metadata (for better interoperability across the Arrow ecosystem), while keeping the existing legacy is_json=true marker for backward compatibility. It also exposes the helper at the crate root and updates tests/fixtures to rely on it.
Changes:
- Extend `json_field_metadata()` to include `ARROW:extension:name=arrow.json` and `ARROW:extension:metadata={}` alongside the legacy `is_json=true`.
- Re-export `json_field_metadata` from the crate root for external/test reuse.
- Update metadata assertion tests and route return-field fixtures through `json_field_metadata()`.
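A minimal single-file sketch of the re-export, with an inline `mod common_union` standing in for `src/common_union.rs` and the `pub use` playing the role of `src/lib.rs`. The helper body and its `HashMap<String, String>` return type are assumptions:

```rust
// Stand-in for src/common_union.rs (assumed shape of the helper).
mod common_union {
    use std::collections::HashMap;

    pub fn json_field_metadata() -> HashMap<String, String> {
        HashMap::from([
            ("ARROW:extension:name".to_string(), "arrow.json".to_string()),
            ("ARROW:extension:metadata".to_string(), "{}".to_string()),
            ("is_json".to_string(), "true".to_string()),
        ])
    }
}

// Stand-in for src/lib.rs: re-export at the crate root so tests and
// external users can call the helper without reaching into the module.
pub use common_union::json_field_metadata;

fn main() {
    // Fixtures call the helper instead of repeating the literal key/value pairs.
    let expected = json_field_metadata();
    assert_eq!(
        expected.get("ARROW:extension:metadata").map(String::as_str),
        Some("{}")
    );
}
```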
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| src/common_union.rs | Expands JSON field metadata to include canonical Arrow extension keys while preserving is_json. |
| src/lib.rs | Re-exports json_field_metadata at the crate root. |
| tests/main.rs | Updates tests/fixtures to use and validate JSON field metadata. |
Comments suppressed due to low confidence (1)
tests/main.rs:196
`assert_json_field_metadata` currently only asserts the canonical Arrow extension keys, but this PR's stated back-compat goal is to also preserve the legacy `is_json=true` key. As written, the tests would still pass if `is_json` were accidentally removed later. Please extend this helper (or compare against `json_field_metadata()`) so the tests verify all expected keys, including `is_json`.
```rust
use std::collections::HashMap;

fn assert_json_field_metadata(metadata: &HashMap<String, String>) {
    assert_eq!(
        metadata.get("ARROW:extension:name").map(String::as_str),
        Some("arrow.json")
    );
    assert_eq!(
        metadata.get("ARROW:extension:metadata").map(String::as_str),
        Some("{}")
    );
}
```
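One way to address the suggestion above, sketched as a standalone helper that also pins the legacy key. The metadata type is assumed to be `HashMap<String, String>`, as in the existing helper:

```rust
use std::collections::HashMap;

// Stricter variant of the test helper: canonical keys plus the legacy marker.
fn assert_json_field_metadata(metadata: &HashMap<String, String>) {
    assert_eq!(
        metadata.get("ARROW:extension:name").map(String::as_str),
        Some("arrow.json")
    );
    assert_eq!(
        metadata.get("ARROW:extension:metadata").map(String::as_str),
        Some("{}")
    );
    // Guard the back-compat key so its accidental removal fails the tests.
    assert_eq!(metadata.get("is_json").map(String::as_str), Some("true"));
}

fn main() {
    let full = HashMap::from([
        ("ARROW:extension:name".to_string(), "arrow.json".to_string()),
        ("ARROW:extension:metadata".to_string(), "{}".to_string()),
        ("is_json".to_string(), "true".to_string()),
    ]);
    assert_json_field_metadata(&full);
}
```

Comparing the whole map against the helper (`assert_eq!(metadata, &json_field_metadata())`) is the stricter alternative; it also fails if unexpected keys appear.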
Summary
- Extend `json_field_metadata()` to emit Arrow's canonical JSON extension type keys (`ARROW:extension:name=arrow.json`, `ARROW:extension:metadata={}`) alongside the existing `is_json=true` key.
- Re-export `json_field_metadata` at the crate root so external users (and our own tests) can stop duplicating the literal.
- Route the `test_json_get_utf8` / `test_json_get_large_utf8` fixtures through the helper.

Why
Marking JSON-bearing `Utf8` fields with `is_json` is a convention private to this crate; no other Arrow tool recognizes it. Emitting the canonical keys as well makes those columns interoperable with the broader Arrow ecosystem (arrow-rs's `Json` extension type, pyarrow, DuckDB, Polars, …). The legacy `is_json` key is preserved so any downstream consumer that already keys off it keeps working.

The change is contained to one helper; all four production write sites (the `array`/`object` children of `union_fields()`, `json_get_array`'s list item field, and `json_get_json`'s return field) already route through it. Detection via `is_json_union` is purely structural, comparing `UnionFields` against the `OnceLock`-cached `union_fields()`, so adding metadata keys to the cached value preserves equality on both sides.

Test plan
- `cargo build`
- `cargo test`: 154 passed, 0 failed (covers the extended `test_json_get_array_inner_field_json_metadata` and `test_json_get_json_json_metadata`, the fixture-only `test_json_get_utf8` / `test_json_get_large_utf8`, and all `is_json_union`-gated paths)
- `cargo clippy --all-targets -- -D warnings`

🤖 Generated with Claude Code