fix: [iceberg] Keep deep copy for Iceberg Java integration scan path #9

Closed
andygrove wants to merge 43 commits into main from fix-iceberg-java-ffi-safety
Conversation

@andygrove
Owner

Summary

Test plan

  • Iceberg integration tests pass (triggered by [iceberg] in title)
  • Existing scan tests continue to pass

🤖 Generated with Claude Code

andygrove and others added 30 commits February 4, 2026 09:20
…pache#3392)

This fixes test failures when `native_datafusion` is enabled (issue apache#3315):

1. CometNativeScanExec now preserves the original outputPartitioning for
   bucketed scans, matching the pattern used by CometScanExec. Previously
   it always returned UnknownPartitioning, causing BroadcastJoinSuite tests
   to fail when they expected PartitioningCollection.

2. Updated diff files to accept CometNativeScanExec in the
   FileDataSourceV2FallBackSuite "Fallback Parquet V2 to V1" test, which
   checks for FileSourceScanExec or CometScanExec in the plan.

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
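The partitioning fix above can be illustrated with a minimal sketch. All names here (`NativeScanNode`, `HashPartitioning`, `UnknownPartitioning`, `bucket_spec`) are illustrative stand-ins for Spark's Scala classes, not Comet's actual API:

```python
# Sketch of the outputPartitioning fix: a bucketed scan should report
# its bucket-based hash partitioning instead of UnknownPartitioning,
# so joins on the bucket columns can avoid a shuffle.

class UnknownPartitioning:
    def __init__(self, num_partitions):
        self.num_partitions = num_partitions

class HashPartitioning:
    def __init__(self, bucket_columns, num_buckets):
        self.bucket_columns = bucket_columns
        self.num_buckets = num_buckets

class NativeScanNode:
    def __init__(self, num_partitions, bucket_spec=None):
        self.num_partitions = num_partitions
        self.bucket_spec = bucket_spec  # (columns, num_buckets) or None

    def output_partitioning(self):
        # Before the fix this always returned UnknownPartitioning,
        # discarding the bucket info that BroadcastJoinSuite expects.
        if self.bucket_spec is not None:
            cols, n = self.bucket_spec
            return HashPartitioning(cols, n)
        return UnknownPartitioning(self.num_partitions)
```

A planner that sees `HashPartitioning` on both sides of a join on the bucket columns can then build the `PartitioningCollection` the failing tests expected.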
…erse-n1 (apache#3368)

* Adjust target-cpus to modern baselines. Remove the release-linux flag `-Ctarget-feature=-prefer-256-bit`, which doesn't make sense for Skylake; on native it's unclear what effect it's supposed to have.

* Undo inadvertent .PHONY target change.

* Remove -prefer-256-bit flag.

* Add docs.
Bumps [bytes](https://github.com/tokio-rs/bytes) from 1.11.0 to 1.11.1.
- [Release notes](https://github.com/tokio-rs/bytes/releases)
- [Changelog](https://github.com/tokio-rs/bytes/blob/master/CHANGELOG.md)
- [Commits](tokio-rs/bytes@v1.11.0...v1.11.1)

---
updated-dependencies:
- dependency-name: bytes
  dependency-version: 1.11.1
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…3370)

* docs: remove -pl from mvn test commands and unnecessary mvn install steps

Avoid using -pl spark when running tests since it can cause Maven to
pick up stale artifacts from the local repository. Without -pl, Maven
builds all modules from source, eliminating the need for a separate
mvn install step before running tests or regenerating golden files.

Also documents how to run individual SQL file tests.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* address feedback

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Bumps [time](https://github.com/time-rs/time) from 0.3.45 to 0.3.47.
- [Release notes](https://github.com/time-rs/time/releases)
- [Changelog](https://github.com/time-rs/time/blob/main/CHANGELOG.md)
- [Commits](time-rs/time@v0.3.45...v0.3.47)

---
updated-dependencies:
- dependency-name: time
  dependency-version: 0.3.47
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…#3414)

The native_datafusion scan already falls back to Spark when row index
metadata columns are requested, so these tests should pass.

Closes apache#3317

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…alid input (apache#3377)

This PR adds:

1. Framework support for `query expect_error(<pattern>)` mode in the SQL test
   framework, which verifies both Spark and Comet throw exceptions containing
   the given pattern.

2. New ANSI mode test files:
   - `math/abs_ansi.sql` - Tests abs overflow on INT_MIN, LONG_MIN, etc.
   - `math/arithmetic_ansi.sql` - Tests arithmetic overflow and divide-by-zero
   - `array/get_array_item_ansi.sql` - Tests out-of-bounds array access (ignored pending apache#3375)
   - `array/element_at_ansi.sql` - Tests out-of-bounds element_at (ignored pending apache#3375)

3. Documentation for the new `expect_error` query mode.

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
…ative scan serialization, add DPP for Iceberg scans (apache#3349)
Bumps [cc](https://github.com/rust-lang/cc-rs) from 1.2.54 to 1.2.55.
- [Release notes](https://github.com/rust-lang/cc-rs/releases)
- [Changelog](https://github.com/rust-lang/cc-rs/blob/main/CHANGELOG.md)
- [Commits](rust-lang/cc-rs@cc-v1.2.54...cc-v1.2.55)

---
updated-dependencies:
- dependency-name: cc
  dependency-version: 1.2.55
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* [WIP] Add Iceberg TPC-H benchmarking scripts

Add scripts to benchmark TPC-H queries against Iceberg tables using
Comet's native iceberg-rust integration:

- create-iceberg-tpch.py: Convert Parquet TPC-H data to Iceberg tables
- tpcbench-iceberg.py: Run TPC-H queries against Iceberg catalog tables
- comet-tpch-iceberg.sh: Shell script to run the benchmark with Comet

Also updates README.md with Iceberg benchmarking documentation.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix

* fix

* Consolidate Parquet and Iceberg benchmark scripts into single tpcbench.py

Merge tpcbench-iceberg.py into tpcbench.py using mutually exclusive args:
- --data for Parquet files
- --catalog/--database for Iceberg tables

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: address review comments on README consistency

- Use --packages instead of --jars for table creation to match
  create-iceberg-tpch.py usage
- Use $ICEBERG_CATALOG variable instead of hardcoding 'local' in
  spark.sql.catalog config to be consistent with comet-tpch-iceberg.sh
- Clarify that JAR download is only needed for benchmark execution

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
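The mutually exclusive argument handling described for the consolidated tpcbench.py could look roughly like this. The flag names come from the commit message; everything else (the help strings, the builder function) is assumed:

```python
import argparse

def build_parser():
    # Sketch of the consolidated tpcbench.py arguments: --data (Parquet)
    # and --catalog (Iceberg) cannot be combined, and exactly one is
    # required; --database accompanies --catalog.
    parser = argparse.ArgumentParser(description="TPC-H benchmark (sketch)")
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--data", help="path to Parquet TPC-H data")
    source.add_argument("--catalog", help="Iceberg catalog name")
    parser.add_argument("--database",
                        help="Iceberg database, used with --catalog")
    return parser
```

With `add_mutually_exclusive_group(required=True)`, argparse itself rejects `--data` together with `--catalog` and rejects an invocation that supplies neither, so the script doesn't need hand-rolled validation.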
Bumps [arrow](https://github.com/apache/arrow-rs) from 57.2.0 to 57.3.0.
- [Release notes](https://github.com/apache/arrow-rs/releases)
- [Changelog](https://github.com/apache/arrow-rs/blob/main/CHANGELOG.md)
- [Commits](apache/arrow-rs@57.2.0...57.3.0)

---
updated-dependencies:
- dependency-name: arrow
  dependency-version: 57.3.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…#3450)

Bumps [aws-config](https://github.com/smithy-lang/smithy-rs) from 1.8.12 to 1.8.13.
- [Release notes](https://github.com/smithy-lang/smithy-rs/releases)
- [Changelog](https://github.com/smithy-lang/smithy-rs/blob/main/CHANGELOG.md)
- [Commits](https://github.com/smithy-lang/smithy-rs/commits)

---
updated-dependencies:
- dependency-name: aws-config
  dependency-version: 1.8.13
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [regex](https://github.com/rust-lang/regex) from 1.12.2 to 1.12.3.
- [Release notes](https://github.com/rust-lang/regex/releases)
- [Changelog](https://github.com/rust-lang/regex/blob/master/CHANGELOG.md)
- [Commits](rust-lang/regex@1.12.2...1.12.3)

---
updated-dependencies:
- dependency-name: regex
  dependency-version: 1.12.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Shekharrajak and others added 13 commits February 9, 2026 15:33
The Cargo cache key only hashed Cargo.lock and Cargo.toml, not the actual .rs source files. This meant changes to Rust code without dependency changes would restore a stale cache, potentially using an old libcomet.so built from different source.

Add hashFiles('native/**/*.rs') to the cache key and update restore-keys to use the dependency hash as a prefix, allowing proper incremental builds when only source changes.
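The cache-key construction above resembles the following sketch, a rough Python analogue of GitHub Actions' `hashFiles()` expression. Paths, glob patterns, and function names here are illustrative, not the workflow's actual configuration:

```python
import hashlib
from pathlib import Path

def hash_files(root, patterns):
    """Rough analogue of hashFiles(): a stable digest over the sorted
    contents of every file matching the given glob patterns."""
    h = hashlib.sha256()
    for pattern in patterns:
        for path in sorted(Path(root).glob(pattern)):
            h.update(path.read_bytes())
    return h.hexdigest()[:16]

def cargo_cache_key(root):
    # Dependency hash comes first so it can double as a restore-key
    # prefix: a change to .rs sources misses the exact key but still
    # restores the dependency layer, giving an incremental rebuild.
    deps = hash_files(root, ["**/Cargo.lock", "**/Cargo.toml"])
    srcs = hash_files(root, ["**/*.rs"])
    return f"cargo-{deps}-{srcs}"
```

Editing a `.rs` file changes only the suffix of the key, while the dependency prefix stays stable, which is exactly the restore-keys behavior the commit describes.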
CometScanWrapper unconditionally set isFfiSafe=true, which told native
ScanExec to skip deep copies for all scans. This is correct for
CometScanExec (native_iceberg_compat) which now uses immutable Arrow
readers, but incorrect for CometBatchScanExec (Iceberg Java integration
via SupportsComet) which still uses mutable buffers.

Make isFfiSafe conditional on the scan type: true for CometScanExec,
false for CometBatchScanExec. Also remove the stale
hasScanUsingMutableBuffers check for CometScanExec since PR apache#3411
replaced mutable buffers with immutable Arrow readers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
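The decision this PR makes can be sketched as follows. The class and function names approximate the Scala/JVM side for illustration only; this is not Comet's actual code:

```python
# Sketch of the isFfiSafe decision: a batch whose buffers the producer
# may mutate or reuse must be deep-copied before crossing the FFI
# boundary, while an immutable batch can be handed over zero-copy.

class CometScanExec:       # native_iceberg_compat: immutable Arrow readers
    reuses_buffers = False

class CometBatchScanExec:  # Iceberg Java integration: mutable buffers
    reuses_buffers = True

def is_ffi_safe(scan):
    # Skipping the deep copy is only safe when the producer never
    # touches the buffers again after handing the batch to native code.
    return not scan.reuses_buffers

def prepare_batch(scan, batch):
    # Stand-in for native ScanExec: copy defensively unless FFI-safe.
    return batch if is_ffi_safe(scan) else list(batch)
```

Setting the flag unconditionally, as CometScanWrapper did, made the mutable-buffer path share buffers it was still rewriting, which is the unsoundness this change reverts for the Iceberg Java path.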
@andygrove andygrove closed this Feb 11, 2026