Skip to content

test: re-enable sql_hive-1 for Spark 4.0 and fix two small failures#4047

Open
andygrove wants to merge 4 commits intoapache:mainfrom
andygrove:test-hive1-spark4-retry
Open

test: re-enable sql_hive-1 for Spark 4.0 and fix two small failures#4047
andygrove wants to merge 4 commits intoapache:mainfrom
andygrove:test-hive1-spark4-retry

Conversation

@andygrove
Copy link
Copy Markdown
Member

@andygrove andygrove commented Apr 23, 2026

Which issue does this PR close?

Closes #2946.
Closes #4049.

Rationale for this change

The sql_hive-1 job for Spark 4.0 has been excluded from CI since #2946 was filed (hang/timeout on the Hive suite with JDK 17). A re-run on current main shows the hang no longer reproduces: the job completes in about 63 minutes. It does surface two small issues, which this PR also fixes, so the job can be re-enabled and kept green.

What changes are included in this PR?

  1. .github/workflows/spark_sql_test.yml: remove the exclude matrix entry that skipped sql_hive-1 for spark-4.0.1 / java 17 / auto.
  2. spark/src/main/spark-4.0/org/apache/spark/sql/comet/shims/ShimSparkErrorConverter.scala: the FileNotFound case was producing a SparkFileNotFoundException with error class _LEGACY_ERROR_TEMP_2055, which was removed in Spark 4.0. Delegate to QueryExecutionErrors.fileNotExistError(path, cause) so the error carries FAILED_READ_FILE.FILE_NOT_EXIST, which is what HiveMetadataCacheSuite (and similar checkError assertions) expect.
  3. dev/diffs/4.0.1.diff: drop the hunk that commented out assume(new java.io.File(jarPath).exists) in HiveUDFDynamicLoadSuite. Spark 4.0 no longer ships hive-test-udfs.jar, so restoring the upstream assume cancels the five UDF tests cleanly instead of running them without the required classes and failing with CANNOT_LOAD_FUNCTION_CLASS.

How are these changes tested?

The CI run on this PR exercises the full sql_hive-1 job for Spark 4.0. The previously failing suites behave as follows:

  • HiveMetadataCacheSuite (4 failures): now pass against the corrected error class.
  • HiveUDFDynamicLoadSuite (5 failures): now cancelled on Spark 4.0 (jar absent); still run on Spark 3.4/3.5 (jar present).

The Spark 4.0 `ShimSparkErrorConverter` was converting native FileNotFound
errors into a `SparkFileNotFoundException` with error class
`_LEGACY_ERROR_TEMP_2055`, but that error class was removed in Spark 4.0.
Throwing it triggers an internal error ("Cannot find main error class")
and fails tests such as `HiveMetadataCacheSuite` that assert on
`FAILED_READ_FILE.FILE_NOT_EXIST`.

Delegate to `QueryExecutionErrors.fileNotExistError`, which is the 4.0
replacement for `readCurrentFileNotFoundError` and produces the expected
error class and `path` parameter.
The 4.0.1 Spark diff commented out the `assume(new java.io.File(jarPath).exists)`
guard in HiveUDFDynamicLoadSuite. The jar (hive-test-udfs.jar) is not
shipped with Spark 4.0, so without the guard the five UDF/UDAF/UDTF tests
run without the required class and fail with CANNOT_LOAD_FUNCTION_CLASS.

Drop the hunk so upstream's assume() is preserved. The tests are cancelled
on Spark 4.0 (jar missing) and continue to run on Spark 3.4/3.5 (jar
present), matching upstream behavior.

Closes apache#4049.
@andygrove andygrove changed the title test: re-enable sql_hive-1 for Spark 4.0 to check if #2946 still reproduces test: re-enable sql_hive-1 for Spark 4.0 and fix surfaced failures Apr 23, 2026
@andygrove andygrove changed the title test: re-enable sql_hive-1 for Spark 4.0 and fix surfaced failures test: re-enable sql_hive-1 for Spark 4.0 and fix two small failures Apr 23, 2026
@andygrove andygrove marked this pull request as ready for review April 23, 2026 15:24
@andygrove andygrove marked this pull request as draft April 23, 2026 17:38
The upstream assume(jarPath.exists) runs during class construction
(inside udfTestInfos.foreach, before test() registers a case), so when
hive-test-udfs.jar is absent - as it is on the v4.0.1 release tag - the
TestCanceledException propagates out of <init> and ScalaTest marks the
whole suite as aborted, failing the job. Mix in IgnoreCometSuite so the
five tests are reported as ignored under Comet, and comment out the
constructor-time assume so it no longer aborts the suite.
@andygrove andygrove marked this pull request as ready for review April 24, 2026 00:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

1 participant