build: Enable Spark SQL tests for Spark 4.1.1 #4093
Draft: andygrove wants to merge 31 commits into apache:main from andygrove:spark-4.1.1
Changes shown are from 14 of the 31 commits.

Commits (31, all by andygrove):
- 0118dda build: add spark-4.1 profile and enable Spark 4.1.1 SQL tests
- 7a0dd7e save current progress
- 203b88a fix: support Spark 4.1 IndexShuffleBlockResolver constructor and docu…
- b17726e docs: reflow spark-sql-tests.md with prettier
- 626c966 build: add spark-4.1 test shim sources
- 8521e41 test: add spark-4.1 plan-stability golden files
- 7205d4d Merge remote-tracking branch 'apache/main' into spark-4.1.1
- 330e400 fix: add isVariantStruct shim for spark-4.1 profile
- 50008a6 ci: purge partial pom-only entries from local Maven cache before sbt
- 5daf943 ci: enable spark-4.1 PR builds in Linux and macOS matrices
- e93e67d fix: pin spark-4.1 profile Scala to 2.13.16 for semanticdb compatibility
- 58bd76a fix: keep spark-4.1 on Scala 2.13.17 and skip semanticdb lint
- 05cd6c4 fix: drop -Pscala-2.13 from macOS spark-4.1 matrix entry
- 5a60be2 fix: Spark 4.1 newTaskTempFile + REMAINDER_BY_ZERO error class
- 9a154b7 Merge remote-tracking branch 'apache/main' into spark-4.1.1
- 84379ec test: drop spark-4.1 plan-stability golden files
- dce8dfa test: revert CometExpressionSuite spark-4.1 changes
- cf81dea ci: revert spark-4.1 entries from pr_build workflows
- 1190b5a fix(spark-4.1): unblock Spark 4.1.1 SQL tests in CI
- fc8e8e3 fix(spark-4.1): skip failing tests in 4.1.1 SQL test diff
- 98a178c ci: raise SBT and forked JVM heap for Spark SQL tests
- ebbc249 ci: drop _JAVA_OPTIONS that broke SBT startup
- f5edafa ci: run Spark SQL tests on runs-on.com 16-cpu runners
- 2c79a49 ci: pin Hive tests to ubuntu-24.04 and skip flaky/incompatible 4.1 SQ…
- 59494d1 test: skip Spark 4.1 plan-shape tests that introspect Spark-only types
- f50b4a7 test: link IgnoreComet tags to tracking issue #4098
- fff9158 ci: TEMP disable non-essential workflows on spark-4.1.1 branch
- cb639a3 test: drop unused ShuffleExchangeExec import in StreamingQuerySuite
- e91c669 Revert "ci: TEMP disable non-essential workflows on spark-4.1.1 branch"
- c5a0b81 test: skip RocksDBStateStoreIntegrationSuite under Comet on Spark 4.1
- 9f4ec9c Merge remote-tracking branch 'apache/main' into spark-4.1.1
common/src/main/spark-4.1/org/apache/comet/shims/CometTypeShim.scala (new file: 42 additions, 0 deletions)
```scala
/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 *   http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied. See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */

package org.apache.comet.shims

import org.apache.spark.sql.execution.datasources.VariantMetadata
import org.apache.spark.sql.types.{DataType, StringType, StructType}

trait CometTypeShim {
  // A `StringType` carries collation metadata in Spark 4.0. Only non-default (non-UTF8_BINARY)
  // collations have semantics Comet's byte-level hashing/sorting/equality cannot honor. The
  // default `StringType` object is `StringType(UTF8_BINARY_COLLATION_ID)`, so comparing
  // `collationId` against that instance's id picks out non-default collations without needing
  // `private[sql]` helpers on `StringType`.
  def isStringCollationType(dt: DataType): Boolean = dt match {
    case st: StringType => st.collationId != StringType.collationId
    case _ => false
  }

  // Spark 4.0's `PushVariantIntoScan` rewrites `VariantType` columns into a `StructType` whose
  // fields each carry `__VARIANT_METADATA_KEY` metadata, then pushes `variant_get` paths down as
  // ordinary struct field accesses. Comet's native scans don't understand the on-disk Parquet
  // variant shredding layout, so reading such a struct natively returns nulls. Detect the marker
  // and force scan fallback.
  def isVariantStruct(s: StructType): Boolean = VariantMetadata.isVariantStruct(s)
}
```
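The collation check above avoids Spark's `private[sql]` helpers by comparing collation ids directly against the default instance's id. The following self-contained sketch models that idea; `StringTypeSim` and `Utf8BinaryId` are illustrative stand-ins, not Spark's real `StringType` API:

```scala
// Miniature model of the collation-id comparison in isStringCollationType.
// StringTypeSim / Utf8BinaryId are hypothetical stand-ins, not Spark classes.
object CollationCheckSketch {
  final case class StringTypeSim(collationId: Int)

  // In this sketch, id 0 plays the role of the default UTF8_BINARY collation.
  val Utf8BinaryId: Int = 0

  // True only for string types whose collation differs from the default,
  // mirroring `st.collationId != StringType.collationId` in the shim.
  def isStringCollationType(dt: Any): Boolean = dt match {
    case StringTypeSim(id) => id != Utf8BinaryId
    case _ => false
  }
}
```

Only a string type with a non-default id is flagged; any other `DataType` falls through to `false`, so non-string columns never trigger fallback.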
common/src/main/spark-4.1/org/apache/comet/shims/ShimBatchReader.scala (new file: 36 additions, 0 deletions)
```scala
/* Apache License, Version 2.0 header (identical to CometTypeShim.scala) */

package org.apache.comet.shims

import org.apache.spark.paths.SparkPath
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.execution.datasources.PartitionedFile

object ShimBatchReader {
  def newPartitionedFile(partitionValues: InternalRow, file: String): PartitionedFile =
    PartitionedFile(
      partitionValues,
      SparkPath.fromUrlString(file),
      -1, // -1 means we read the entire file
      -1,
      Array.empty[String],
      0,
      0)
}
```
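The `-1` start/length arguments above are a sentinel meaning the whole file should be read rather than a byte range. A hedged sketch of how a reader could normalize that sentinel into a concrete offset/length pair (`effectiveRange` is a hypothetical helper, not Comet or Spark code):

```scala
// Hypothetical helper illustrating the -1 sentinel used by newPartitionedFile:
// a length of -1 means "read the entire file starting at offset 0".
object SplitRangeSketch {
  def effectiveRange(start: Long, length: Long, fileSize: Long): (Long, Long) =
    if (length == -1L) (0L, fileSize) else (start, length)
}
```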
common/src/main/spark-4.1/org/apache/comet/shims/ShimCometConf.scala (new file: 24 additions, 0 deletions)
```scala
/* Apache License, Version 2.0 header (identical to CometTypeShim.scala) */

package org.apache.comet.shims

trait ShimCometConf {
  protected val COMET_SCHEMA_EVOLUTION_ENABLED_DEFAULT = true
}
```
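A trait like this exists once per Spark profile, so a version-specific default is chosen at compile time by whichever source directory the build puts on the classpath. A minimal sketch of the mixin pattern, assuming hypothetical names (`ShimConfSketch`, `ConfConsumer` are not Comet's):

```scala
// Sketch of the per-version shim pattern: the profile-specific trait supplies
// a constant, and shared code mixes it in without knowing the Spark version.
trait ShimConfSketch {
  protected val schemaEvolutionDefault: Boolean = true // the spark-4.1 value
}

object ConfConsumer extends ShimConfSketch {
  def schemaEvolutionEnabled: Boolean = schemaEvolutionDefault
}
```

A spark-3.x source tree would provide the same trait name with a different constant, leaving `ConfConsumer` unchanged.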
common/src/main/spark-4.1/org/apache/comet/shims/ShimFileFormat.scala (new file: 33 additions, 0 deletions)
```scala
/* Apache License, Version 2.0 header (identical to CometTypeShim.scala) */

package org.apache.comet.shims

import org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat
import org.apache.spark.sql.execution.datasources.parquet.ParquetRowIndexUtil
import org.apache.spark.sql.types.StructType

object ShimFileFormat {
  // A name for a temporary column that holds row indexes computed by the file format reader
  // until they can be placed in the _metadata struct.
  val ROW_INDEX_TEMPORARY_COLUMN_NAME = ParquetFileFormat.ROW_INDEX_TEMPORARY_COLUMN_NAME

  def findRowIndexColumnIndexInSchema(sparkSchema: StructType): Int =
    ParquetRowIndexUtil.findRowIndexColumnIndexInSchema(sparkSchema)
}
```
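`findRowIndexColumnIndexInSchema` delegates to Spark's `ParquetRowIndexUtil`; conceptually it returns the ordinal of the temporary row-index column, or -1 when the schema does not contain one. A simplified stand-in over plain field names (not Spark's actual implementation, which works on `StructType`):

```scala
// Simplified stand-in: locate a row-index column by name in a flat schema,
// returning its ordinal position or -1 when it is absent.
object RowIndexLookupSketch {
  def findRowIndexColumn(fieldNames: Seq[String], rowIndexName: String): Int =
    fieldNames.indexOf(rowIndexName)
}
```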
common/src/main/spark-4.1/org/apache/spark/sql/comet/shims/ShimTaskMetrics.scala (new file: 29 additions, 0 deletions)
```scala
/* Apache License, Version 2.0 header (identical to CometTypeShim.scala) */

package org.apache.spark.sql.comet.shims

import org.apache.spark.executor.TaskMetrics
import org.apache.spark.util.AccumulatorV2

object ShimTaskMetrics {

  def getTaskAccumulator(taskMetrics: TaskMetrics): Option[AccumulatorV2[_, _]] =
    taskMetrics._externalAccums.lastOption
}
```
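`getTaskAccumulator` returns the most recently registered external accumulator, if any; using `lastOption` keeps the empty case total (it yields `None` rather than throwing, as `last` would). A small illustration of that standard-library behavior:

```scala
// lastOption returns Some(last element) for a non-empty collection and None
// for an empty one, which is why the shim is safe when a task has registered
// no external accumulators.
object LastOptionSketch {
  def latest[A](xs: Seq[A]): Option[A] = xs.lastOption
}
```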
Review comments:

If this is common for Spark 4.0 and Spark 4.1, we can move it from `spark-4.0` to `spark-4.x`.

It looks like most shims in Spark 4.1 are identical to Spark 4.0 except for CometExprShim. I added a `CometSumShim` for Spark 4.1 in #2829 and moved the other shims from `spark-4.0` to `spark-4.x`.