fix: [iceberg] Keep deep copy for Iceberg Java integration scan path #9

Closed
andygrove wants to merge 43 commits into main from fix-iceberg-java-ffi-safety
Conversation

@andygrove
Owner

Summary

Test plan

  • Iceberg integration tests pass (triggered by [iceberg] in title)
  • Existing scan tests continue to pass

🤖 Generated with Claude Code

andygrove and others added 30 commits February 4, 2026 09:20
…pache#3392)

This fixes test failures when `native_datafusion` is enabled (issue apache#3315):

1. CometNativeScanExec now preserves the original outputPartitioning for
   bucketed scans, matching the pattern used by CometScanExec. Previously
   it always returned UnknownPartitioning, causing BroadcastJoinSuite tests
   to fail when they expected PartitioningCollection.

2. Updated diff files to accept CometNativeScanExec in the
   FileDataSourceV2FallBackSuite "Fallback Parquet V2 to V1" test, which
   checks for FileSourceScanExec or CometScanExec in the plan.

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
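The partitioning fix above can be illustrated with a minimal sketch. All names here (`NativeScanNode`, `HashPartitioning`, `UnknownPartitioning`, `bucket_spec`) are illustrative stand-ins for Spark's Scala classes, not Comet's actual API:

```python
# Sketch of the outputPartitioning fix: a bucketed scan should report
# its bucket-based hash partitioning instead of UnknownPartitioning,
# so joins on the bucket columns can avoid a shuffle.

class UnknownPartitioning:
    def __init__(self, num_partitions):
        self.num_partitions = num_partitions

class HashPartitioning:
    def __init__(self, bucket_columns, num_buckets):
        self.bucket_columns = bucket_columns
        self.num_buckets = num_buckets

class NativeScanNode:
    def __init__(self, num_partitions, bucket_spec=None):
        self.num_partitions = num_partitions
        self.bucket_spec = bucket_spec  # (columns, num_buckets) or None

    def output_partitioning(self):
        # Before the fix this always returned UnknownPartitioning,
        # discarding the bucket info that BroadcastJoinSuite expects.
        if self.bucket_spec is not None:
            cols, n = self.bucket_spec
            return HashPartitioning(cols, n)
        return UnknownPartitioning(self.num_partitions)
```

A planner that sees `HashPartitioning` on both sides of a join on the bucket columns can then build the `PartitioningCollection` the failing tests expected.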
…erse-n1 (apache#3368)

* Adjust target-cpus to modern baselines. Remove the release-linux flag `-Ctarget-feature=-prefer-256-bit`, which doesn't make sense for Skylake; on native it's unclear what effect it's supposed to have.

* Undo inadvertent .PHONY target change.

* Remove -prefer-256-bit flag.

* Add docs.
Bumps [bytes](https://github.com/tokio-rs/bytes) from 1.11.0 to 1.11.1.
- [Release notes](https://github.com/tokio-rs/bytes/releases)
- [Changelog](https://github.com/tokio-rs/bytes/blob/master/CHANGELOG.md)
- [Commits](tokio-rs/bytes@v1.11.0...v1.11.1)

---
updated-dependencies:
- dependency-name: bytes
  dependency-version: 1.11.1
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…3370)

* docs: remove -pl from mvn test commands and unnecessary mvn install steps

Avoid using -pl spark when running tests since it can cause Maven to
pick up stale artifacts from the local repository. Without -pl, Maven
builds all modules from source, eliminating the need for a separate
mvn install step before running tests or regenerating golden files.

Also documents how to run individual SQL file tests.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* address feedback

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Bumps [time](https://github.com/time-rs/time) from 0.3.45 to 0.3.47.
- [Release notes](https://github.com/time-rs/time/releases)
- [Changelog](https://github.com/time-rs/time/blob/main/CHANGELOG.md)
- [Commits](time-rs/time@v0.3.45...v0.3.47)

---
updated-dependencies:
- dependency-name: time
  dependency-version: 0.3.47
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…#3414)

The native_datafusion scan already falls back to Spark when row index
metadata columns are requested, so these tests should pass.

Closes apache#3317

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…alid input (apache#3377)

This PR adds:

1. Framework support for `query expect_error(<pattern>)` mode in the SQL test
   framework, which verifies both Spark and Comet throw exceptions containing
   the given pattern.

2. New ANSI mode test files:
   - `math/abs_ansi.sql` - Tests abs overflow on INT_MIN, LONG_MIN, etc.
   - `math/arithmetic_ansi.sql` - Tests arithmetic overflow and divide-by-zero
   - `array/get_array_item_ansi.sql` - Tests out-of-bounds array access (ignored pending apache#3375)
   - `array/element_at_ansi.sql` - Tests out-of-bounds element_at (ignored pending apache#3375)

3. Documentation for the new `expect_error` query mode.

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
…ative scan serialization, add DPP for Iceberg scans (apache#3349)
Bumps [cc](https://github.com/rust-lang/cc-rs) from 1.2.54 to 1.2.55.
- [Release notes](https://github.com/rust-lang/cc-rs/releases)
- [Changelog](https://github.com/rust-lang/cc-rs/blob/main/CHANGELOG.md)
- [Commits](rust-lang/cc-rs@cc-v1.2.54...cc-v1.2.55)

---
updated-dependencies:
- dependency-name: cc
  dependency-version: 1.2.55
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* [WIP] Add Iceberg TPC-H benchmarking scripts

Add scripts to benchmark TPC-H queries against Iceberg tables using
Comet's native iceberg-rust integration:

- create-iceberg-tpch.py: Convert Parquet TPC-H data to Iceberg tables
- tpcbench-iceberg.py: Run TPC-H queries against Iceberg catalog tables
- comet-tpch-iceberg.sh: Shell script to run the benchmark with Comet

Also updates README.md with Iceberg benchmarking documentation.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix

* fix

* Consolidate Parquet and Iceberg benchmark scripts into single tpcbench.py

Merge tpcbench-iceberg.py into tpcbench.py using mutually exclusive args:
- --data for Parquet files
- --catalog/--database for Iceberg tables

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: address review comments on README consistency

- Use --packages instead of --jars for table creation to match
  create-iceberg-tpch.py usage
- Use $ICEBERG_CATALOG variable instead of hardcoding 'local' in
  spark.sql.catalog config to be consistent with comet-tpch-iceberg.sh
- Clarify that JAR download is only needed for benchmark execution

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
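The mutually exclusive argument handling described for the consolidated tpcbench.py could look roughly like this. The flag names come from the commit message; everything else (the help strings, the builder function) is assumed:

```python
import argparse

def build_parser():
    # Sketch of the consolidated tpcbench.py arguments: --data (Parquet)
    # and --catalog (Iceberg) cannot be combined, and exactly one is
    # required; --database accompanies --catalog.
    parser = argparse.ArgumentParser(description="TPC-H benchmark (sketch)")
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--data", help="path to Parquet TPC-H data")
    source.add_argument("--catalog", help="Iceberg catalog name")
    parser.add_argument("--database",
                        help="Iceberg database, used with --catalog")
    return parser
```

With `add_mutually_exclusive_group(required=True)`, argparse itself rejects `--data` together with `--catalog` and rejects an invocation that supplies neither, so the script doesn't need hand-rolled validation.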
Bumps [arrow](https://github.com/apache/arrow-rs) from 57.2.0 to 57.3.0.
- [Release notes](https://github.com/apache/arrow-rs/releases)
- [Changelog](https://github.com/apache/arrow-rs/blob/main/CHANGELOG.md)
- [Commits](apache/arrow-rs@57.2.0...57.3.0)

---
updated-dependencies:
- dependency-name: arrow
  dependency-version: 57.3.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…#3450)

Bumps [aws-config](https://github.com/smithy-lang/smithy-rs) from 1.8.12 to 1.8.13.
- [Release notes](https://github.com/smithy-lang/smithy-rs/releases)
- [Changelog](https://github.com/smithy-lang/smithy-rs/blob/main/CHANGELOG.md)
- [Commits](https://github.com/smithy-lang/smithy-rs/commits)

---
updated-dependencies:
- dependency-name: aws-config
  dependency-version: 1.8.13
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [regex](https://github.com/rust-lang/regex) from 1.12.2 to 1.12.3.
- [Release notes](https://github.com/rust-lang/regex/releases)
- [Changelog](https://github.com/rust-lang/regex/blob/master/CHANGELOG.md)
- [Commits](rust-lang/regex@1.12.2...1.12.3)

---
updated-dependencies:
- dependency-name: regex
  dependency-version: 1.12.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Shekharrajak and others added 13 commits February 9, 2026 15:33
The Cargo cache key only hashed Cargo.lock and Cargo.toml, not the actual .rs source files. This meant changes to Rust code without dependency changes would restore a stale cache, potentially using an old libcomet.so built from different source.

Add hashFiles('native/**/*.rs') to the cache key and update restore-keys to use the dependency hash as a prefix, allowing proper incremental builds when only source changes.
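The cache-key construction above resembles the following sketch, a rough Python analogue of GitHub Actions' `hashFiles()` expression. Paths, glob patterns, and function names here are illustrative, not the workflow's actual configuration:

```python
import hashlib
from pathlib import Path

def hash_files(root, patterns):
    """Rough analogue of hashFiles(): a stable digest over the sorted
    contents of every file matching the given glob patterns."""
    h = hashlib.sha256()
    for pattern in patterns:
        for path in sorted(Path(root).glob(pattern)):
            h.update(path.read_bytes())
    return h.hexdigest()[:16]

def cargo_cache_key(root):
    # Dependency hash comes first so it can double as a restore-key
    # prefix: a change to .rs sources misses the exact key but still
    # restores the dependency layer, giving an incremental rebuild.
    deps = hash_files(root, ["**/Cargo.lock", "**/Cargo.toml"])
    srcs = hash_files(root, ["**/*.rs"])
    return f"cargo-{deps}-{srcs}"
```

Editing a `.rs` file changes only the suffix of the key, while the dependency prefix stays stable, which is exactly the restore-keys behavior the commit describes.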
CometScanWrapper unconditionally set isFfiSafe=true, which told native
ScanExec to skip deep copies for all scans. This is correct for
CometScanExec (native_iceberg_compat) which now uses immutable Arrow
readers, but incorrect for CometBatchScanExec (Iceberg Java integration
via SupportsComet) which still uses mutable buffers.

Make isFfiSafe conditional on the scan type: true for CometScanExec,
false for CometBatchScanExec. Also remove the stale
hasScanUsingMutableBuffers check for CometScanExec since PR apache#3411
replaced mutable buffers with immutable Arrow readers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
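The decision this PR makes can be sketched as follows. The class and function names approximate the Scala/JVM side for illustration only; this is not Comet's actual code:

```python
# Sketch of the isFfiSafe decision: a batch whose buffers the producer
# may mutate or reuse must be deep-copied before crossing the FFI
# boundary, while an immutable batch can be handed over zero-copy.

class CometScanExec:       # native_iceberg_compat: immutable Arrow readers
    reuses_buffers = False

class CometBatchScanExec:  # Iceberg Java integration: mutable buffers
    reuses_buffers = True

def is_ffi_safe(scan):
    # Skipping the deep copy is only safe when the producer never
    # touches the buffers again after handing the batch to native code.
    return not scan.reuses_buffers

def prepare_batch(scan, batch):
    # Stand-in for native ScanExec: copy defensively unless FFI-safe.
    return batch if is_ffi_safe(scan) else list(batch)
```

Setting the flag unconditionally, as CometScanWrapper did, made the mutable-buffer path share buffers it was still rewriting, which is the unsoundness this change reverts for the Iceberg Java path.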
@andygrove andygrove closed this Feb 11, 2026