
Commit 0d481f4

Merge branch 'main' into fix_comet_sum_compatbility_level
2 parents 54a83cb + 13e5f8c commit 0d481f4

53 files changed

Lines changed: 955 additions & 77 deletions

.claude/skills/audit-comet-expression/SKILL.md

Lines changed: 10 additions & 10 deletions
@@ -13,7 +13,7 @@ This audit covers:
 1. Spark implementation across versions 3.4.3, 3.5.8, and 4.0.1
 2. Comet Scala serde implementation
 3. Comet Rust / DataFusion implementation
-4. Existing test coverage (SQL file tests and Scala tests)
+4. Existing test coverage (Comet SQL Tests and Comet Scala Tests)
 5. Gap analysis and test recommendations

 ---
@@ -161,7 +161,7 @@ Read the Rust implementation and check:

 ## Step 4: Locate Existing Comet Tests

-### SQL file tests
+### Comet SQL Tests

 ```bash
 # Find SQL test files for this expression
@@ -179,13 +179,13 @@ Read every SQL test file found and list:
 - Query modes used (`query`, `spark_answer_only`, `tolerance`, `ignore`, `expect_error`)
 - Any ConfigMatrix directives

-### Scala tests
+### Comet Scala Tests

 ```bash
 grep -r "$ARGUMENTS" spark/src/test/scala/ --include="*.scala" -l
 ```

-Read the relevant Scala test files and list:
+Read the relevant Comet Scala Tests and list:

 - Input types covered
 - Edge cases exercised
@@ -201,7 +201,7 @@ Compare the Spark test coverage (Step 2) against the Comet test coverage (Step 4

 For each of the following dimensions, note whether it is covered in Comet tests or missing:

-| Dimension | Spark tests it | Comet SQL test | Comet Scala test | Gap? |
+| Dimension | Spark tests it | Comet SQL Test | Comet Scala Test | Gap? |
 | ------------------------------------------------------------------------------------------------------ | -------------- | -------------- | ---------------- | ---- |
 | Column reference argument(s) | | | | |
 | Literal argument(s) | | | | |
@@ -256,13 +256,13 @@ After presenting the gap analysis, ask the user:
 >
 > - [list each missing test case]
 >
-> I can add them as SQL file tests in `spark/src/test/resources/sql-tests/expressions/<category>/$ARGUMENTS.sql`
-> (or as Scala tests in `CometExpressionSuite` for cases that require programmatic setup).
+> I can add them as Comet SQL Tests in `spark/src/test/resources/sql-tests/expressions/<category>/$ARGUMENTS.sql`
+> (or as Comet Scala Tests in `CometExpressionSuite` for cases that require programmatic setup).

-If the user says yes, implement the missing tests following the SQL file test format described in
-`docs/source/contributor-guide/sql-file-tests.md`. Prefer SQL file tests over Scala tests.
+If the user says yes, implement the missing tests following the Comet SQL Tests format described in
+`docs/source/contributor-guide/sql-file-tests.md`. Prefer Comet SQL Tests over Comet Scala Tests.

-### SQL file test template
+### Comet SQL Tests template

 ```sql
 -- Licensed to the Apache Software Foundation (ASF) under one

.claude/skills/review-comet-pr/SKILL.md

Lines changed: 8 additions & 8 deletions
@@ -138,15 +138,15 @@ Location: `native/spark-expr/src/`
 - [ ] No panics. Use `Result` types.
 - [ ] Efficient array operations (avoid row-by-row)

-#### Tests - Prefer SQL File-Based Framework
+#### Tests - Prefer Comet SQL Tests

-**Expression tests should use the SQL file-based framework (`CometSqlFileTestSuite`) where possible.** This framework automatically runs each query through both Spark and Comet and compares results. No Scala code is needed. Only fall back to Scala tests in `CometExpressionSuite` when the SQL framework cannot express the test. Examples include complex `DataFrame` setup, programmatic data generation, or non-expression tests.
+**Expression tests should use Comet SQL Tests (`CometSqlFileTestSuite`) where possible.** This framework automatically runs each query through both Spark and Comet and compares results. No Scala code is needed. Only fall back to Comet Scala Tests in `CometExpressionSuite` when Comet SQL Tests cannot express the test. Examples include complex `DataFrame` setup, programmatic data generation, or non-expression tests.

-**SQL file test location:** `spark/src/test/resources/sql-tests/expressions/<category>/`
+**Comet SQL Test location:** `spark/src/test/resources/sql-tests/expressions/<category>/`

 Categories include: `aggregate/`, `array/`, `string/`, `math/`, `struct/`, `map/`, `datetime/`, `hash/`, etc.

-**SQL file structure:**
+**Comet SQL Test structure:**

 ```sql
 -- Create test data
@@ -181,10 +181,10 @@ query ignore(https://github.com/apache/datafusion-comet/issues/NNNN)
 SELECT known_buggy_expr(v) FROM test_table
 ```

-**Running SQL file tests:**
+**Running Comet SQL Tests:**

 ```bash
-# All SQL file tests
+# All Comet SQL Tests
 ./mvnw test -Dsuites="org.apache.comet.CometSqlFileTestSuite" -Dtest=none

 # Specific test file (substring match)
@@ -199,7 +199,7 @@ SELECT known_buggy_expr(v) FROM test_table
 - [ ] Both literal values and column references tested (they use different code paths)
 - [ ] For timestamp/datetime expressions, timezone handling is tested (e.g., UTC, non-UTC session timezone, timestamps with and without timezone)
 - [ ] One expression per SQL file for easier debugging
-- [ ] If using Scala tests instead, literal tests MUST disable constant folding:
+- [ ] If using Comet Scala Tests instead, literal tests MUST disable constant folding:
   ```scala
   withSQLConf(SQLConf.OPTIMIZER_EXCLUDED_RULES.key ->
     "org.apache.spark.sql.catalyst.optimizer.ConstantFolding") {
@@ -256,7 +256,7 @@ If the PR adds a new expression or operator but does not update the relevant doc
 1. **Incomplete type support**: Spark expression supports types not handled in PR
 2. **Missing edge cases**: Null, overflow, empty string, negative values
 3. **Wrong return type**: Return type must match Spark exactly
-4. **Tests in wrong framework**: Expression tests should use the SQL file-based framework (`CometSqlFileTestSuite`) rather than adding to Scala test suites like `CometExpressionSuite`. Suggest migration if the PR adds Scala tests for expressions that could use SQL files instead.
+4. **Tests in wrong framework**: Expression tests should use Comet SQL Tests (`CometSqlFileTestSuite`) rather than adding to Comet Scala Tests like `CometExpressionSuite`. Suggest migration if the PR adds Comet Scala Tests for expressions that could use Comet SQL Tests instead.
 5. **Stale native code**: PR might need `./mvnw install -pl common -DskipTests`
 6. **Missing `getSupportLevel`**: Edge cases should be marked as `Incompatible`
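
The `withSQLConf` block in the constant-folding checklist item above is cut off at the hunk boundary. For context, a minimal sketch of the full pattern follows (editor's illustration, not part of this commit; the suite name, the `my_expr` example, and the `checkSparkAnswerAndOperator` helper are assumptions about Comet's `CometTestBase`):

```scala
import org.apache.comet.CometTestBase
import org.apache.spark.sql.internal.SQLConf

// Sketch only: not code from this commit.
class MyExprLiteralSuite extends CometTestBase {
  test("literal arguments reach the native expression") {
    // Exclude ConstantFolding so Spark does not pre-evaluate the literal-only expression
    // during optimization, which would otherwise bypass Comet's native code path.
    withSQLConf(SQLConf.OPTIMIZER_EXCLUDED_RULES.key ->
        "org.apache.spark.sql.catalyst.optimizer.ConstantFolding") {
      // Assumed CometTestBase helper: compares Spark vs. Comet results and checks the plan.
      checkSparkAnswerAndOperator("SELECT my_expr(1, 'abc')")
    }
  }
}
```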

.github/workflows/pr_build_linux.yml

Lines changed: 34 additions & 12 deletions
@@ -140,6 +140,38 @@ jobs:
         run: |
           ./dev/ci/check-working-tree-clean.sh

+  # Compile-only verification for Spark 4.1. Tests are intentionally skipped: the spark-4.1
+  # profile is currently a build target only, and several runtime/test failures are tracked
+  # in follow-up PRs. Excluded from lint-java because semanticdb-scalac_2.13.17 is not yet
+  # published and the lint job activates -Psemanticdb.
+  build-spark-4-1:
+    needs: lint
+    name: Build Spark 4.1, JDK 17
+    runs-on: ubuntu-latest
+    container:
+      image: amd64/rust
+    steps:
+      - uses: actions/checkout@v6
+
+      - name: Setup Rust & Java toolchain
+        uses: ./.github/actions/setup-builder
+        with:
+          rust-version: ${{ env.RUST_VERSION }}
+          jdk-version: 17
+
+      - name: Cache Maven dependencies
+        uses: actions/cache@v5
+        with:
+          path: |
+            ~/.m2/repository
+            /root/.m2/repository
+          key: ${{ runner.os }}-java-maven-${{ hashFiles('**/pom.xml') }}-spark-4.1-build
+          restore-keys: |
+            ${{ runner.os }}-java-maven-
+
+      - name: Compile (skip tests)
+        run: ./mvnw -B install -DskipTests -Dmaven.test.skip=true -Pspark-4.1
+
   # Build native library once and share with all test jobs
   build-native:
     needs: lint
@@ -248,19 +280,9 @@ jobs:
           maven_opts: "-Pspark-3.4 -Pscala-2.12"
           scan_impl: "auto"

-        - name: "Spark 3.5.5, JDK 17, Scala 2.13"
+        - name: "Spark 3.5, JDK 17, Scala 2.13"
           java_version: "17"
-          maven_opts: "-Pspark-3.5 -Dspark.version=3.5.5 -Pscala-2.13"
-          scan_impl: "auto"
-
-        - name: "Spark 3.5.6, JDK 17, Scala 2.13"
-          java_version: "17"
-          maven_opts: "-Pspark-3.5 -Dspark.version=3.5.6 -Pscala-2.13"
-          scan_impl: "auto"
-
-        - name: "Spark 3.5, JDK 17, Scala 2.12"
-          java_version: "17"
-          maven_opts: "-Pspark-3.5 -Pscala-2.12"
+          maven_opts: "-Pspark-3.5 -Pscala-2.13"
           scan_impl: "native_iceberg_compat"

         - name: "Spark 4.0, JDK 17"

.github/workflows/spark_sql_test.yml

Lines changed: 1 addition & 1 deletion
@@ -140,7 +140,7 @@ jobs:
         config:
           - {spark-short: '3.4', spark-full: '3.4.3', java: 11, scan-impl: 'auto'}
           - {spark-short: '3.5', spark-full: '3.5.8', java: 11, scan-impl: 'auto'}
-          - {spark-short: '4.0', spark-full: '4.0.1', java: 17, scan-impl: 'auto'}
+          - {spark-short: '4.0', spark-full: '4.0.2', java: 17, scan-impl: 'auto'}
       fail-fast: false
     name: spark-sql-${{ matrix.config.scan-impl }}-${{ matrix.module.name }}/spark-${{ matrix.config.spark-full }}
     runs-on: ${{ matrix.os }}

common/src/main/spark-4.0/org/apache/comet/shims/CometTypeShim.scala renamed to common/src/main/spark-4.x/org/apache/comet/shims/CometTypeShim.scala

File renamed without changes.

common/src/main/spark-4.0/org/apache/comet/shims/ShimBatchReader.scala renamed to common/src/main/spark-4.x/org/apache/comet/shims/ShimBatchReader.scala

File renamed without changes.

common/src/main/spark-4.0/org/apache/comet/shims/ShimCometConf.scala renamed to common/src/main/spark-4.x/org/apache/comet/shims/ShimCometConf.scala

File renamed without changes.

common/src/main/spark-4.0/org/apache/comet/shims/ShimFileFormat.scala renamed to common/src/main/spark-4.x/org/apache/comet/shims/ShimFileFormat.scala

File renamed without changes.

common/src/main/spark-4.0/org/apache/spark/sql/comet/shims/ShimTaskMetrics.scala renamed to common/src/main/spark-4.x/org/apache/spark/sql/comet/shims/ShimTaskMetrics.scala

File renamed without changes.
Lines changed: 2 additions & 2 deletions
@@ -3774,7 +3774,7 @@ index 4b27082e188..057b2430872 100644
 -class HiveUDFDynamicLoadSuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
 +// Comet: mix in IgnoreCometSuite so these tests are reported as ignored when Comet is enabled
 +// (ENABLE_COMET=true). The jar these tests depend on (`hive-test-udfs.jar`) is stripped from the
-+// Spark 4.0.1 release source tag per the ASF binary-artifact policy, so the tests cannot run in
++// Spark 4.0.2 release source tag per the ASF binary-artifact policy, so the tests cannot run in
 +// Comet's CI. Ignoring keeps the suite passing without masking real regressions; the upstream
 +// tests still run in non-Comet Spark builds that ship the jar on branch-4.0.
 +class HiveUDFDynamicLoadSuite extends QueryTest with SQLTestUtils with TestHiveSingleton
@@ -3788,7 +3788,7 @@ index 4b27082e188..057b2430872 100644
      val jarPath = "src/test/noclasspath/hive-test-udfs.jar"
 -    assume(new java.io.File(jarPath).exists)
 +    // Comet: the upstream `assume(...)` runs here in the suite constructor (inside this foreach,
-+    // before `test(...)` registers a case). When the jar is missing - as it is on the v4.0.1
++    // before `test(...)` registers a case). When the jar is missing - as it is on the v4.0.2
 +    // release tag - `assume` throws TestCanceledException out of `<init>`, which ScalaTest
 +    // reports as a suite abort (not a per-test cancel) and fails the whole job. The
 +    // IgnoreCometSuite mixin above already reroutes these tests to `ignore` under Comet, so
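
For readers unfamiliar with the ScalaTest behaviour the patched comment relies on, a minimal sketch follows (editor's illustration, not part of this commit): an `assume` that fails while the suite constructor runs throws `TestCanceledException` out of `<init>`, which ScalaTest reports as a suite abort rather than a per-test cancellation.

```scala
import org.scalatest.funsuite.AnyFunSuite

// Sketch only (not from this commit): demonstrates why a failing assume in the
// constructor aborts the whole suite instead of cancelling a single test.
class ConstructorAssumeExample extends AnyFunSuite {
  Seq("src/test/noclasspath/hive-test-udfs.jar").foreach { jarPath =>
    // Runs during <init>, before test(...) below has registered anything; if the jar is
    // missing, TestCanceledException escapes the constructor and the suite is aborted.
    assume(new java.io.File(jarPath).exists)

    test(s"load UDF from $jarPath") {
      // Never registered when the jar is missing.
    }
  }
}
```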
