fix: [iceberg] Keep deep copy for Iceberg Java integration scan path#9
Closed
Conversation
…pache#3392) This fixes test failures when `native_datafusion` is enabled (issue apache#3315):

1. CometNativeScanExec now preserves the original outputPartitioning for bucketed scans, matching the pattern used by CometScanExec. Previously it always returned UnknownPartitioning, causing BroadcastJoinSuite tests to fail when they expected PartitioningCollection.
2. Updated diff files to accept CometNativeScanExec in the FileDataSourceV2FallBackSuite "Fallback Parquet V2 to V1" test, which checks for FileSourceScanExec or CometScanExec in the plan.

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
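The partitioning fix above can be sketched as follows. This is a minimal, hypothetical Python illustration: the real classes (`UnknownPartitioning`, `HashPartitioning`, the scan node) live in Spark/Comet Scala code, and only the dispatch shape mirrors the described change.

```python
# Hypothetical placeholders; the real types are Spark Catalyst classes.
class UnknownPartitioning:
    """Stand-in for Spark's UnknownPartitioning."""

class HashPartitioning:
    """Stand-in for a bucketed scan's concrete partitioning."""

def output_partitioning(bucketed: bool, wrapped_partitioning):
    # Before the fix this always returned UnknownPartitioning, so tests
    # expecting the bucketed layout (e.g. PartitioningCollection in
    # BroadcastJoinSuite) failed under native_datafusion.
    if bucketed:
        return wrapped_partitioning  # preserve the original partitioning
    return UnknownPartitioning()
```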
…erse-n1 (apache#3368)

* Adjust target-cpus to modern baselines. Remove the release-linux flag `-Ctarget-feature=-prefer-256-bit`, which doesn't make sense for Skylake; on native it's unclear what effect it's supposed to have.
* Undo inadvertent .PHONY target change.
* Remove -prefer-256-bit flag.
* Add docs.
Bumps [bytes](https://github.com/tokio-rs/bytes) from 1.11.0 to 1.11.1. - [Release notes](https://github.com/tokio-rs/bytes/releases) - [Changelog](https://github.com/tokio-rs/bytes/blob/master/CHANGELOG.md) - [Commits](tokio-rs/bytes@v1.11.0...v1.11.1) --- updated-dependencies: - dependency-name: bytes dependency-version: 1.11.1 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…3370)

* docs: remove -pl from mvn test commands and unnecessary mvn install steps

  Avoid using `-pl spark` when running tests, since it can cause Maven to pick up stale artifacts from the local repository. Without -pl, Maven builds all modules from source, eliminating the need for a separate mvn install step before running tests or regenerating golden files. Also documents how to run individual SQL file tests.

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* address feedback

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Bumps [time](https://github.com/time-rs/time) from 0.3.45 to 0.3.47. - [Release notes](https://github.com/time-rs/time/releases) - [Changelog](https://github.com/time-rs/time/blob/main/CHANGELOG.md) - [Commits](time-rs/time@v0.3.45...v0.3.47) --- updated-dependencies: - dependency-name: time dependency-version: 0.3.47 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…#3414) The native_datafusion scan already falls back to Spark when row index metadata columns are requested, so these tests should pass. Closes apache#3317 Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…alid input (apache#3377) This PR adds:

1. Framework support for `query expect_error(<pattern>)` mode in the SQL test framework, which verifies both Spark and Comet throw exceptions containing the given pattern.
2. New ANSI mode test files:
   - `math/abs_ansi.sql` - Tests abs overflow on INT_MIN, LONG_MIN, etc.
   - `math/arithmetic_ansi.sql` - Tests arithmetic overflow and divide-by-zero
   - `array/get_array_item_ansi.sql` - Tests out-of-bounds array access (ignored pending apache#3375)
   - `array/element_at_ansi.sql` - Tests out-of-bounds element_at (ignored pending apache#3375)
3. Documentation for the new `expect_error` query mode.

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
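A test file in this mode might look like the fragment below. This is a hypothetical excerpt: the exact directive syntax and the error-pattern strings are assumptions inferred from the description of `query expect_error(<pattern>)`, not copied from the repository.

```sql
-- Hypothetical excerpt from a file like math/abs_ansi.sql.
-- The framework runs each query on both Spark and Comet and asserts that
-- both throw an exception whose message contains the given pattern.

-- query expect_error(ARITHMETIC_OVERFLOW)
SELECT abs(CAST(-2147483648 AS int));

-- query expect_error(DIVIDE_BY_ZERO)
SELECT 1 / 0;
```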
…native_datafusion (apache#3415)
…ative scan serialization, add DPP for Iceberg scans (apache#3349)
Bumps [cc](https://github.com/rust-lang/cc-rs) from 1.2.54 to 1.2.55. - [Release notes](https://github.com/rust-lang/cc-rs/releases) - [Changelog](https://github.com/rust-lang/cc-rs/blob/main/CHANGELOG.md) - [Commits](rust-lang/cc-rs@cc-v1.2.54...cc-v1.2.55) --- updated-dependencies: - dependency-name: cc dependency-version: 1.2.55 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* [WIP] Add Iceberg TPC-H benchmarking scripts

  Add scripts to benchmark TPC-H queries against Iceberg tables using Comet's native iceberg-rust integration:
  - create-iceberg-tpch.py: Convert Parquet TPC-H data to Iceberg tables
  - tpcbench-iceberg.py: Run TPC-H queries against Iceberg catalog tables
  - comet-tpch-iceberg.sh: Shell script to run the benchmark with Comet

  Also updates README.md with Iceberg benchmarking documentation.

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix
* fix
* Consolidate Parquet and Iceberg benchmark scripts into a single tpcbench.py

  Merge tpcbench-iceberg.py into tpcbench.py using mutually exclusive args:
  - --data for Parquet files
  - --catalog/--database for Iceberg tables

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: address review comments on README consistency

  - Use --packages instead of --jars for table creation to match create-iceberg-tpch.py usage
  - Use the $ICEBERG_CATALOG variable instead of hardcoding 'local' in the spark.sql.catalog config, to be consistent with comet-tpch-iceberg.sh
  - Clarify that the JAR download is only needed for benchmark execution

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Bumps [arrow](https://github.com/apache/arrow-rs) from 57.2.0 to 57.3.0. - [Release notes](https://github.com/apache/arrow-rs/releases) - [Changelog](https://github.com/apache/arrow-rs/blob/main/CHANGELOG.md) - [Commits](apache/arrow-rs@57.2.0...57.3.0) --- updated-dependencies: - dependency-name: arrow dependency-version: 57.3.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…#3450) Bumps [aws-config](https://github.com/smithy-lang/smithy-rs) from 1.8.12 to 1.8.13. - [Release notes](https://github.com/smithy-lang/smithy-rs/releases) - [Changelog](https://github.com/smithy-lang/smithy-rs/blob/main/CHANGELOG.md) - [Commits](https://github.com/smithy-lang/smithy-rs/commits) --- updated-dependencies: - dependency-name: aws-config dependency-version: 1.8.13 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [regex](https://github.com/rust-lang/regex) from 1.12.2 to 1.12.3. - [Release notes](https://github.com/rust-lang/regex/releases) - [Changelog](https://github.com/rust-lang/regex/blob/master/CHANGELOG.md) - [Commits](rust-lang/regex@1.12.2...1.12.3) --- updated-dependencies: - dependency-name: regex dependency-version: 1.12.3 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
The Cargo cache key only hashed Cargo.lock and Cargo.toml, not the actual .rs source files. This meant changes to Rust code without dependency changes would restore a stale cache, potentially using an old libcomet.so built from different source.
Add hashFiles('native/**/*.rs') to the cache key and update restore-keys to use the dependency hash as a prefix, allowing proper incremental builds when only source changes.
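A GitHub Actions cache step following this scheme might look like the fragment below. This is an illustrative sketch, not the repository's actual workflow: the step name, cache paths, and key prefix are placeholders; `hashFiles`, `key`, and `restore-keys` are standard `actions/cache` features.

```yaml
- name: Cache Cargo build
  uses: actions/cache@v4
  with:
    path: |
      ~/.cargo/registry
      native/target
    # Hash the Rust sources too, so a source-only change misses the
    # stale cache instead of restoring an old libcomet.so.
    key: cargo-${{ hashFiles('**/Cargo.lock', '**/Cargo.toml') }}-${{ hashFiles('native/**/*.rs') }}
    # Fall back to the dependency hash as a prefix for incremental builds.
    restore-keys: |
      cargo-${{ hashFiles('**/Cargo.lock', '**/Cargo.toml') }}-
```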
…umns (apache#3411)" (apache#3486) This reverts commit 4fe6452.
CometScanWrapper unconditionally set isFfiSafe=true, which told native ScanExec to skip deep copies for all scans. This is correct for CometScanExec (native_iceberg_compat) which now uses immutable Arrow readers, but incorrect for CometBatchScanExec (Iceberg Java integration via SupportsComet) which still uses mutable buffers. Make isFfiSafe conditional on the scan type: true for CometScanExec, false for CometBatchScanExec. Also remove the stale hasScanUsingMutableBuffers check for CometScanExec since PR apache#3411 replaced mutable buffers with immutable Arrow readers. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
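The conditional dispatch described above can be sketched as follows. This is a hypothetical Python illustration: the real `CometScanExec` and `CometBatchScanExec` are Spark plan nodes, and only the per-scan-type dispatch mirrors the change.

```python
# Hypothetical stand-ins for the real Spark plan nodes.
class CometScanExec:
    """native_iceberg_compat scan: immutable Arrow readers."""

class CometBatchScanExec:
    """Iceberg Java integration (SupportsComet): still uses mutable buffers."""

def is_ffi_safe(scan) -> bool:
    # Native ScanExec may skip deep copies only for the immutable-reader
    # path; the mutable-buffer path must keep them.
    return isinstance(scan, CometScanExec)
```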
Summary
- `CometSink.isFfiSafe` now takes the operator as a parameter so FFI safety can be determined per scan
- `CometScanWrapper.isFfiSafe` now returns `true` only for `CometScanExec` (native_iceberg_compat, which uses immutable Arrow readers after "perf: Remove mutable buffers from scan partition/missing columns" apache/datafusion-comet#3411) and `false` for `CometBatchScanExec` (Iceberg Java integration via SupportsComet, which still uses mutable buffers)
- Removed the stale `hasScanUsingMutableBuffers` check for `CometScanExec`, since apache/datafusion-comet#3411 replaced mutable buffers with `ArrowConstantColumnReader`

Test plan
[iceberg] in title)

🤖 Generated with Claude Code