Closed
43 commits
9ecf53f
fix: preserve partitioning in CometNativeScanExec for bucketed scans …
andygrove Feb 4, 2026
c8b5fce
chore: update target-cpus in published binaries to x86-64-v3 and neov…
mbutrovich Feb 4, 2026
3b18e1a
chore: show line of error sql (#3390)
peterxcli Feb 4, 2026
0b3329c
chore: Move writer-related logic to "writers" module (#3385)
EmilyMatt Feb 4, 2026
14cd6c9
chore(deps): bump bytes from 1.11.0 to 1.11.1 in /native (#3380)
dependabot[bot] Feb 4, 2026
a2f8e54
feat: Drop `native_comet` as a valid option for `COMET_NATIVE_SCAN_IM…
andygrove Feb 4, 2026
2e24695
chore: Clean up and split shuffle module (#3395)
EmilyMatt Feb 5, 2026
48ebd28
docs: Improve documentation on maven usage for running tests (#3370)
andygrove Feb 5, 2026
aa5afd6
Make PR workflows match target-cpu flags in published jars. (#3402)
mbutrovich Feb 5, 2026
d28d0e0
chore(deps): bump time from 0.3.45 to 0.3.47 in /native (#3412)
dependabot[bot] Feb 5, 2026
2f64b60
chore: Run Spark SQL tests with `native_datafusion` in CI (#3393)
andygrove Feb 5, 2026
d804b8f
fix: unignore row index Spark SQL tests for native_datafusion (#3414)
andygrove Feb 5, 2026
2c6a8ac
chore: Add ANSI mode SQL test files for expressions that throw on inv…
andygrove Feb 6, 2026
58cf6e1
fix: fall back to Spark when Parquet field ID matching is enabled in …
andygrove Feb 6, 2026
d89e50a
feat: Support date to timestamp cast (#3383)
coderfender Feb 6, 2026
454ca68
refactor: Split read benchmarks and add addParquetScanCases helper (#…
andygrove Feb 6, 2026
9b05dfe
chore: 4.5x reduction in number of golden files (#3399)
andygrove Feb 6, 2026
e9dafd0
Feat: to_csv (#3004)
kazantsev-maksim Feb 7, 2026
4cab60d
fix: Expose bucketing information from CometNativeScanExec (#3319) (#…
andygrove Feb 7, 2026
4f1fa4b
minor: map_from_entries sql tests (#3394)
kazantsev-maksim Feb 7, 2026
1d01b7d
fix: support scalar processing for `space` function (#3408)
kazantsev-maksim Feb 7, 2026
28e13dd
feat: CometExecRDD supports per-partition plan data, reduce Iceberg n…
mbutrovich Feb 8, 2026
d4fce7e
chore: add confirmation before tarball is released (#3439)
milenkovicm Feb 9, 2026
515abb8
chore(deps): bump cc from 1.2.54 to 1.2.55 in /native (#3451)
dependabot[bot] Feb 9, 2026
a2a9467
chore: Add Iceberg TPC-H benchmarking scripts (#3294)
andygrove Feb 9, 2026
5a9d066
chore: Remove dead code paths for deprecated native_comet scan (#3396)
andygrove Feb 9, 2026
006cacd
chore(deps): bump arrow from 57.2.0 to 57.3.0 in /native (#3449)
dependabot[bot] Feb 9, 2026
ef26329
chore(deps): bump aws-config from 1.8.12 to 1.8.13 in /native (#3450)
dependabot[bot] Feb 9, 2026
599af33
chore(deps): bump regex from 1.12.2 to 1.12.3 in /native (#3453)
dependabot[bot] Feb 9, 2026
7943199
perf: Optimize contains expression with SIMD-based scalar pattern sea…
Shekharrajak Feb 9, 2026
7a07db2
feat: Support right expression (#3207)
Shekharrajak Feb 9, 2026
8724b76
Add batch coalescing in BufBatchWriter to reduce IPC schema overhead …
andygrove Feb 10, 2026
025e2a6
chore: Use native_datafusion scan in benchmark scripts (#3460)
andygrove Feb 10, 2026
eb43699
feat: support map_contains_key expression (#3369)
peterxcli Feb 10, 2026
aa067e9
chore(deps): bump rand from 0.9.2 to 0.10.0 in /native (#3465)
manuzhang Feb 10, 2026
020d982
perf: Remove mutable buffers from scan partition/missing columns (#3411)
andygrove Feb 10, 2026
1ccfa14
perf: [iceberg] Single-pass FileScanTask validation (#3443)
mbutrovich Feb 10, 2026
7c78fef
test: Add additional contains expression tests (#3462)
andygrove Feb 10, 2026
1e1b88d
chore: Adjust native artifact caching key in CI (#3476)
mbutrovich Feb 10, 2026
eccf237
feat: add support for make_date expression (#3147)
andygrove Feb 11, 2026
4fe6452
Revert "perf: Remove mutable buffers from scan partition/missing colu…
mbutrovich Feb 11, 2026
c80896e
Reapply "perf: Remove mutable buffers from scan partition/missing col…
andygrove Feb 11, 2026
9a4de4b
fix: [iceberg] Keep deep copy for Iceberg Java integration scan path
andygrove Feb 11, 2026
2 changes: 1 addition & 1 deletion .github/actions/java-test/action.yaml
@@ -32,7 +32,7 @@ inputs:
scan_impl:
description: 'The default Parquet scan implementation'
required: false
default: 'native_comet'
default: 'auto'
upload-test-reports:
description: 'Whether to upload test results including coverage to GitHub'
required: false
8 changes: 5 additions & 3 deletions .github/workflows/iceberg_spark_test.yml
@@ -69,14 +69,16 @@ jobs:
~/.cargo/registry
~/.cargo/git
native/target
key: ${{ runner.os }}-cargo-ci-${{ hashFiles('native/**/Cargo.lock', 'native/**/Cargo.toml') }}
key: ${{ runner.os }}-cargo-ci-${{ hashFiles('native/**/Cargo.lock', 'native/**/Cargo.toml') }}-${{ hashFiles('native/**/*.rs') }}
restore-keys: |
${{ runner.os }}-cargo-ci-
${{ runner.os }}-cargo-ci-${{ hashFiles('native/**/Cargo.lock', 'native/**/Cargo.toml') }}-

- name: Build native library
# Use CI profile for faster builds (no LTO) and to share cache with pr_build_linux.yml.
run: |
cd native && cargo build --profile ci
env:
RUSTFLAGS: "-Ctarget-cpu=x86-64-v3"

- name: Save Cargo cache
uses: actions/cache/save@v5
@@ -86,7 +88,7 @@ jobs:
~/.cargo/registry
~/.cargo/git
native/target
key: ${{ runner.os }}-cargo-ci-${{ hashFiles('native/**/Cargo.lock', 'native/**/Cargo.toml') }}
key: ${{ runner.os }}-cargo-ci-${{ hashFiles('native/**/Cargo.lock', 'native/**/Cargo.toml') }}-${{ hashFiles('native/**/*.rs') }}

- name: Upload native library
uses: actions/upload-artifact@v6
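
The cache change above appends a hash of the Rust sources to the key and narrows `restore-keys` to the prefix that shares the same `Cargo.lock`/`Cargo.toml` hash. A minimal Python sketch of that lookup behavior follows, assuming the usual `actions/cache` semantics (exact key first, then the most recent key matching a restore prefix). The names `hash_files`, `cache_key`, `restore_prefix`, and `restore` are hypothetical, and `hash_files` is a simplified stand-in for `hashFiles()`, not its real algorithm.

```python
import hashlib


def hash_files(files):
    """Simplified stand-in for hashFiles(): one SHA-256 over sorted paths and contents."""
    digest = hashlib.sha256()
    for path in sorted(files):
        digest.update(path.encode())
        digest.update(files[path].encode())
    return digest.hexdigest()[:12]


def cache_key(os_name, manifests, sources):
    """New-style key: <os>-cargo-ci-<manifest hash>-<source hash>."""
    return f"{os_name}-cargo-ci-{hash_files(manifests)}-{hash_files(sources)}"


def restore_prefix(os_name, manifests):
    """New-style restore key: only caches built from the same manifests qualify."""
    return f"{os_name}-cargo-ci-{hash_files(manifests)}-"


def restore(saved_keys, key, restore_keys):
    """Return the saved key that would be restored, or None on a full miss.

    saved_keys is ordered oldest to newest (an assumption of this sketch).
    """
    if key in saved_keys:
        return key
    for prefix in restore_keys:
        for saved in reversed(saved_keys):
            if saved.startswith(prefix):
                return saved
    return None


if __name__ == "__main__":
    manifests = {"native/Cargo.lock": "lock-v1"}
    old_sources = {"native/core/src/lib.rs": "fn a() {}"}
    new_sources = {"native/core/src/lib.rs": "fn b() {}"}
    saved = [cache_key("Linux", manifests, old_sources)]
    wanted = cache_key("Linux", manifests, new_sources)
    # A source-only change misses the exact key but falls back to the
    # cache that shares the same dependency manifests.
    print(restore(saved, wanted, [restore_prefix("Linux", manifests)]))
```

Under these assumptions, a source-only change still restores a cache with matching dependencies, while a dependency bump starts cold instead of restoring a `native/target` directory built against different crates.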
19 changes: 11 additions & 8 deletions .github/workflows/pr_build_linux.yml
@@ -84,16 +84,18 @@ jobs:
~/.cargo/registry
~/.cargo/git
native/target
key: ${{ runner.os }}-cargo-ci-${{ hashFiles('native/**/Cargo.lock', 'native/**/Cargo.toml') }}
key: ${{ runner.os }}-cargo-ci-${{ hashFiles('native/**/Cargo.lock', 'native/**/Cargo.toml') }}-${{ hashFiles('native/**/*.rs') }}
restore-keys: |
${{ runner.os }}-cargo-ci-
${{ runner.os }}-cargo-ci-${{ hashFiles('native/**/Cargo.lock', 'native/**/Cargo.toml') }}-

- name: Build native library (CI profile)
run: |
cd native
# CI profile: same overflow behavior as release, but faster compilation
# (no LTO, parallel codegen)
cargo build --profile ci
env:
RUSTFLAGS: "-Ctarget-cpu=x86-64-v3"

- name: Upload native library
uses: actions/upload-artifact@v6
@@ -110,7 +112,7 @@ jobs:
~/.cargo/registry
~/.cargo/git
native/target
key: ${{ runner.os }}-cargo-ci-${{ hashFiles('native/**/Cargo.lock', 'native/**/Cargo.toml') }}
key: ${{ runner.os }}-cargo-ci-${{ hashFiles('native/**/Cargo.lock', 'native/**/Cargo.toml') }}-${{ hashFiles('native/**/*.rs') }}

# Run Rust tests (runs in parallel with build-native, uses debug builds)
linux-test-rust:
@@ -136,9 +138,9 @@ jobs:
~/.cargo/git
native/target
# Note: Java version intentionally excluded - Rust target is JDK-independent
key: ${{ runner.os }}-cargo-debug-${{ hashFiles('native/**/Cargo.lock', 'native/**/Cargo.toml') }}
key: ${{ runner.os }}-cargo-debug-${{ hashFiles('native/**/Cargo.lock', 'native/**/Cargo.toml') }}-${{ hashFiles('native/**/*.rs') }}
restore-keys: |
${{ runner.os }}-cargo-debug-
${{ runner.os }}-cargo-debug-${{ hashFiles('native/**/Cargo.lock', 'native/**/Cargo.toml') }}-

- name: Rust test steps
uses: ./.github/actions/rust-test
@@ -151,7 +153,7 @@ jobs:
~/.cargo/registry
~/.cargo/git
native/target
key: ${{ runner.os }}-cargo-debug-${{ hashFiles('native/**/Cargo.lock', 'native/**/Cargo.toml') }}
key: ${{ runner.os }}-cargo-debug-${{ hashFiles('native/**/Cargo.lock', 'native/**/Cargo.toml') }}-${{ hashFiles('native/**/*.rs') }}

linux-test:
needs: build-native
@@ -164,7 +166,7 @@ jobs:
- name: "Spark 3.4, JDK 11, Scala 2.12"
java_version: "11"
maven_opts: "-Pspark-3.4 -Pscala-2.12"
scan_impl: "native_comet"
scan_impl: "auto"

- name: "Spark 3.5.5, JDK 17, Scala 2.13"
java_version: "17"
@@ -174,7 +176,7 @@ jobs:
- name: "Spark 3.5.6, JDK 17, Scala 2.13"
java_version: "17"
maven_opts: "-Pspark-3.5 -Dspark.version=3.5.6 -Pscala-2.13"
scan_impl: "native_comet"
scan_impl: "auto"

- name: "Spark 3.5, JDK 17, Scala 2.12"
java_version: "17"
@@ -260,6 +262,7 @@ jobs:
org.apache.comet.CometStringExpressionSuite
org.apache.comet.CometBitwiseExpressionSuite
org.apache.comet.CometMapExpressionSuite
org.apache.comet.CometCsvExpressionSuite
org.apache.comet.CometJsonExpressionSuite
org.apache.comet.expressions.conditional.CometIfSuite
org.apache.comet.expressions.conditional.CometCoalesceSuite
9 changes: 6 additions & 3 deletions .github/workflows/pr_build_macos.yml
@@ -84,16 +84,18 @@ jobs:
~/.cargo/registry
~/.cargo/git
native/target
key: ${{ runner.os }}-cargo-ci-${{ hashFiles('native/**/Cargo.lock', 'native/**/Cargo.toml') }}
key: ${{ runner.os }}-cargo-ci-${{ hashFiles('native/**/Cargo.lock', 'native/**/Cargo.toml') }}-${{ hashFiles('native/**/*.rs') }}
restore-keys: |
${{ runner.os }}-cargo-ci-
${{ runner.os }}-cargo-ci-${{ hashFiles('native/**/Cargo.lock', 'native/**/Cargo.toml') }}-

- name: Build native library (CI profile)
run: |
cd native
# CI profile: same overflow behavior as release, but faster compilation
# (no LTO, parallel codegen)
cargo build --profile ci
env:
RUSTFLAGS: "-Ctarget-cpu=apple-m1"

- name: Upload native library
uses: actions/upload-artifact@v6
@@ -110,7 +112,7 @@ jobs:
~/.cargo/registry
~/.cargo/git
native/target
key: ${{ runner.os }}-cargo-ci-${{ hashFiles('native/**/Cargo.lock', 'native/**/Cargo.toml') }}
key: ${{ runner.os }}-cargo-ci-${{ hashFiles('native/**/Cargo.lock', 'native/**/Cargo.toml') }}-${{ hashFiles('native/**/*.rs') }}

macos-aarch64-test:
needs: build-native
@@ -204,6 +206,7 @@ jobs:
org.apache.comet.CometBitwiseExpressionSuite
org.apache.comet.CometMapExpressionSuite
org.apache.comet.CometJsonExpressionSuite
org.apache.comet.CometCsvExpressionSuite
org.apache.comet.expressions.conditional.CometIfSuite
org.apache.comet.expressions.conditional.CometCoalesceSuite
org.apache.comet.expressions.conditional.CometCaseWhenSuite
23 changes: 11 additions & 12 deletions .github/workflows/spark_sql_test.yml
@@ -75,14 +75,16 @@ jobs:
~/.cargo/registry
~/.cargo/git
native/target
key: ${{ runner.os }}-cargo-ci-${{ hashFiles('native/**/Cargo.lock', 'native/**/Cargo.toml') }}
key: ${{ runner.os }}-cargo-ci-${{ hashFiles('native/**/Cargo.lock', 'native/**/Cargo.toml') }}-${{ hashFiles('native/**/*.rs') }}
restore-keys: |
${{ runner.os }}-cargo-ci-
${{ runner.os }}-cargo-ci-${{ hashFiles('native/**/Cargo.lock', 'native/**/Cargo.toml') }}-

- name: Build native library (CI profile)
run: |
cd native
cargo build --profile ci
env:
RUSTFLAGS: "-Ctarget-cpu=x86-64-v3"

- name: Upload native library
uses: actions/upload-artifact@v6
@@ -99,7 +101,7 @@ jobs:
~/.cargo/registry
~/.cargo/git
native/target
key: ${{ runner.os }}-cargo-ci-${{ hashFiles('native/**/Cargo.lock', 'native/**/Cargo.toml') }}
key: ${{ runner.os }}-cargo-ci-${{ hashFiles('native/**/Cargo.lock', 'native/**/Cargo.toml') }}-${{ hashFiles('native/**/*.rs') }}

spark-sql-test:
needs: build-native
@@ -116,18 +118,15 @@ jobs:
- {name: "sql_hive-3", args1: "", args2: "hive/testOnly * -- -n org.apache.spark.tags.SlowHiveTest"}
# Test combinations:
# - auto scan: all Spark versions (3.4, 3.5, 4.0)
# - native_comet: Spark 3.4, 3.5
# - native_iceberg_compat: Spark 3.5 only
config:
- {spark-short: '3.4', spark-full: '3.4.3', java: 11, scan-impl: 'auto', scan-env: ''}
- {spark-short: '3.5', spark-full: '3.5.8', java: 11, scan-impl: 'auto', scan-env: ''}
- {spark-short: '4.0', spark-full: '4.0.1', java: 17, scan-impl: 'auto', scan-env: ''}
- {spark-short: '3.4', spark-full: '3.4.3', java: 11, scan-impl: 'native_comet', scan-env: 'COMET_PARQUET_SCAN_IMPL=native_comet'}
- {spark-short: '3.5', spark-full: '3.5.8', java: 11, scan-impl: 'native_comet', scan-env: 'COMET_PARQUET_SCAN_IMPL=native_comet'}
- {spark-short: '3.5', spark-full: '3.5.8', java: 11, scan-impl: 'native_iceberg_compat', scan-env: 'COMET_PARQUET_SCAN_IMPL=native_iceberg_compat'}
- {spark-short: '3.4', spark-full: '3.4.3', java: 11, scan-impl: 'auto'}
- {spark-short: '3.5', spark-full: '3.5.8', java: 11, scan-impl: 'auto'}
- {spark-short: '3.5', spark-full: '3.5.8', java: 11, scan-impl: 'native_datafusion'}
- {spark-short: '4.0', spark-full: '4.0.1', java: 17, scan-impl: 'auto'}
# Skip sql_hive-1 for Spark 4.0 due to https://github.com/apache/datafusion-comet/issues/2946
exclude:
- config: {spark-short: '4.0', spark-full: '4.0.1', java: 17, scan-impl: 'auto', scan-env: ''}
- config: {spark-short: '4.0', spark-full: '4.0.1', java: 17, scan-impl: 'auto'}
module: {name: "sql_hive-1", args1: "", args2: "hive/testOnly * -- -l org.apache.spark.tags.ExtendedHiveTest -l org.apache.spark.tags.SlowHiveTest"}
fail-fast: false
name: spark-sql-${{ matrix.config.scan-impl }}-${{ matrix.module.name }}/spark-${{ matrix.config.spark-full }}
@@ -156,7 +155,7 @@ jobs:
run: |
cd apache-spark
rm -rf /root/.m2/repository/org/apache/parquet # somehow parquet cache requires cleanups
ENABLE_COMET=true ENABLE_COMET_ONHEAP=true ${{ matrix.config.scan-env }} ENABLE_COMET_LOG_FALLBACK_REASONS=${{ github.event.inputs.collect-fallback-logs || 'false' }} \
NOLINT_ON_COMPILE=true ENABLE_COMET=true ENABLE_COMET_ONHEAP=true COMET_PARQUET_SCAN_IMPL=${{ matrix.config.scan-impl }} ENABLE_COMET_LOG_FALLBACK_REASONS=${{ github.event.inputs.collect-fallback-logs || 'false' }} \
build/sbt -Dsbt.log.noformat=true ${{ matrix.module.args1 }} "${{ matrix.module.args2 }}"
if [ "${{ github.event.inputs.collect-fallback-logs }}" = "true" ]; then
find . -type f -name "unit-tests.log" -print0 | xargs -0 grep -h "Comet cannot accelerate" | sed 's/.*Comet cannot accelerate/Comet cannot accelerate/' | sort -u > fallback.log
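
The matrix change above drops the separate `scan-env` field and derives `COMET_PARQUET_SCAN_IMPL` directly from `scan-impl`. A rough Python sketch of the resulting job expansion follows. It assumes each `exclude` entry removes the combination whose `config` and `module` both match. `MODULES` lists only the two module names visible in this diff and represents each module by name alone, which is a simplification of the real `{name, args1, args2}` objects. `expand` is a hypothetical helper.

```python
from itertools import product

# Only the module names visible in the diff; the real workflow has more.
MODULES = ["sql_hive-1", "sql_hive-3"]

CONFIGS = [
    {"spark-short": "3.4", "spark-full": "3.4.3", "java": 11, "scan-impl": "auto"},
    {"spark-short": "3.5", "spark-full": "3.5.8", "java": 11, "scan-impl": "auto"},
    {"spark-short": "3.5", "spark-full": "3.5.8", "java": 11, "scan-impl": "native_datafusion"},
    {"spark-short": "4.0", "spark-full": "4.0.1", "java": 17, "scan-impl": "auto"},
]

# sql_hive-1 is skipped for Spark 4.0, mirroring the exclude entry in the diff.
EXCLUDE = [{"config": CONFIGS[3], "module": "sql_hive-1"}]


def expand(modules, configs, exclude):
    """Return (job name, scan env assignment) for every non-excluded combination."""
    jobs = []
    for module, config in product(modules, configs):
        if any(e["module"] == module and e["config"] == config for e in exclude):
            continue
        name = f"spark-sql-{config['scan-impl']}-{module}/spark-{config['spark-full']}"
        env = f"COMET_PARQUET_SCAN_IMPL={config['scan-impl']}"
        jobs.append((name, env))
    return jobs


if __name__ == "__main__":
    for name, env in expand(MODULES, CONFIGS, EXCLUDE):
        print(name, env)
```

Deriving the variable from `scan-impl` means the job name and the scan implementation under test cannot drift apart, which the old free-form `scan-env` string allowed.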
14 changes: 7 additions & 7 deletions Makefile
@@ -51,26 +51,26 @@ format:

# build native libs for amd64 architecture Linux/MacOS on a Linux/amd64 machine/container
core-amd64-libs:
cd native && cargo build -j 2 --release $(FEATURES_ARG)
cd native && RUSTFLAGS="-Ctarget-cpu=x86-64-v3" cargo build -j 2 --release $(FEATURES_ARG)
ifdef HAS_OSXCROSS
rustup target add x86_64-apple-darwin
cd native && cargo build -j 2 --target x86_64-apple-darwin --release $(FEATURES_ARG)
endif

# build native libs for arm64 architecture Linux/MacOS on a Linux/arm64 machine/container
core-arm64-libs:
cd native && cargo build -j 2 --release $(FEATURES_ARG)
cd native && RUSTFLAGS="-Ctarget-cpu=neoverse-n1" cargo build -j 2 --release $(FEATURES_ARG)
ifdef HAS_OSXCROSS
rustup target add aarch64-apple-darwin
cd native && cargo build -j 2 --target aarch64-apple-darwin --release $(FEATURES_ARG)
endif

core-amd64:
rustup target add x86_64-apple-darwin
cd native && RUSTFLAGS="-Ctarget-cpu=skylake -Ctarget-feature=-prefer-256-bit" CC=o64-clang CXX=o64-clang++ cargo build --target x86_64-apple-darwin --release $(FEATURES_ARG)
cd native && RUSTFLAGS="-Ctarget-cpu=skylake" CC=o64-clang CXX=o64-clang++ cargo build --target x86_64-apple-darwin --release $(FEATURES_ARG)
mkdir -p common/target/classes/org/apache/comet/darwin/x86_64
cp native/target/x86_64-apple-darwin/release/libcomet.dylib common/target/classes/org/apache/comet/darwin/x86_64
cd native && RUSTFLAGS="-Ctarget-cpu=haswell -Ctarget-feature=-prefer-256-bit" cargo build --release $(FEATURES_ARG)
cd native && RUSTFLAGS="-Ctarget-cpu=x86-64-v3" cargo build --release $(FEATURES_ARG)
mkdir -p common/target/classes/org/apache/comet/linux/amd64
cp native/target/release/libcomet.so common/target/classes/org/apache/comet/linux/amd64
jar -cf common/target/comet-native-x86_64.jar \
@@ -83,7 +83,7 @@ core-arm64:
cd native && RUSTFLAGS="-Ctarget-cpu=apple-m1" CC=arm64-apple-darwin21.4-clang CXX=arm64-apple-darwin21.4-clang++ CARGO_FEATURE_NEON=1 cargo build --target aarch64-apple-darwin --release $(FEATURES_ARG)
mkdir -p common/target/classes/org/apache/comet/darwin/aarch64
cp native/target/aarch64-apple-darwin/release/libcomet.dylib common/target/classes/org/apache/comet/darwin/aarch64
cd native && RUSTFLAGS="-Ctarget-cpu=native" cargo build --release $(FEATURES_ARG)
cd native && RUSTFLAGS="-Ctarget-cpu=neoverse-n1" cargo build --release $(FEATURES_ARG)
mkdir -p common/target/classes/org/apache/comet/linux/aarch64
cp native/target/release/libcomet.so common/target/classes/org/apache/comet/linux/aarch64
jar -cf common/target/comet-native-aarch64.jar \
@@ -94,8 +94,8 @@ core-arm64:
release-linux: clean
rustup target add aarch64-apple-darwin x86_64-apple-darwin
cd native && RUSTFLAGS="-Ctarget-cpu=apple-m1" CC=arm64-apple-darwin21.4-clang CXX=arm64-apple-darwin21.4-clang++ CARGO_FEATURE_NEON=1 cargo build --target aarch64-apple-darwin --release $(FEATURES_ARG)
cd native && RUSTFLAGS="-Ctarget-cpu=skylake -Ctarget-feature=-prefer-256-bit" CC=o64-clang CXX=o64-clang++ cargo build --target x86_64-apple-darwin --release $(FEATURES_ARG)
cd native && RUSTFLAGS="-Ctarget-cpu=native -Ctarget-feature=-prefer-256-bit" cargo build --release $(FEATURES_ARG)
cd native && RUSTFLAGS="-Ctarget-cpu=skylake" CC=o64-clang CXX=o64-clang++ cargo build --target x86_64-apple-darwin --release $(FEATURES_ARG)
cd native && RUSTFLAGS="-Ctarget-cpu=native" cargo build --release $(FEATURES_ARG)
./mvnw install -Prelease -DskipTests $(PROFILES)
release:
cd native && RUSTFLAGS="$(RUSTFLAGS) -Ctarget-cpu=native" cargo build --release $(FEATURES_ARG)
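
Because the Linux amd64 builds above now pass `-Ctarget-cpu=x86-64-v3`, the resulting library may use instructions that older x86-64 CPUs lack. The following Python sketch shows one way to check a host for this, assuming Linux `/proc/cpuinfo` flag names. It is not part of the PR: the flag sets are this sketch's own mapping of the x86-64-v2 and v3 feature levels onto cpuinfo names (for example `abm` for LZCNT and `pni` for SSE3), and `missing_x86_64_v3_flags` is a hypothetical helper.

```python
# Assumed /proc/cpuinfo names for the features added at each level.
V2_FLAGS = {"cx16", "lahf_lm", "popcnt", "pni", "sse4_1", "sse4_2", "ssse3"}
V3_FLAGS = {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm", "movbe", "xsave"}


def missing_x86_64_v3_flags(cpuinfo_text):
    """Return the sorted flags required by this sketch that the CPU does not report."""
    required = V2_FLAGS | V3_FLAGS
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            present = set(line.split(":", 1)[1].split())
            return sorted(required - present)
    # No flags line found: treat every required flag as missing.
    return sorted(required)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            missing = missing_x86_64_v3_flags(handle.read())
    except OSError:
        missing = None  # Not Linux, or /proc is unavailable.
    print("could not read /proc/cpuinfo" if missing is None else missing)
```

An empty list suggests the host can likely run an `x86-64-v3` build. A non-empty list names the features that this sketch could not find.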