fix: use FAILED_READ_FILE.FILE_NOT_EXIST in Spark 4.0 FileNotFound shim#4048

Closed
andygrove wants to merge 1 commit into apache:main from andygrove:fix/spark4-file-not-exist-error-class

Conversation

Member

@andygrove andygrove commented Apr 23, 2026

Which issue does this PR close?

Related to #2946.

Rationale for this change

The Spark 4.0 ShimSparkErrorConverter translates native FileNotFound errors into a SparkFileNotFoundException with error class _LEGACY_ERROR_TEMP_2055. That error class exists in Spark 3.4 and 3.5 but was removed in Spark 4.0, so Spark's ErrorClassesJsonReader throws an INTERNAL_ERROR when it tries to look up the message template ("Cannot find main error class '_LEGACY_ERROR_TEMP_2055'").

This manifests as test failures on org.apache.spark.sql.hive.HiveMetadataCacheSuite (and any other suite that uses checkError/checkErrorMatchPVals against FAILED_READ_FILE.FILE_NOT_EXIST) when run against Spark 4.0 with Comet enabled:

null did not equal "FAILED_READ_FILE.FILE_NOT_EXIST"

with underlying cause:

SparkException: [INTERNAL_ERROR] Cannot find main error class '_LEGACY_ERROR_TEMP_2055'

The previously ignored sql_hive-1 CI job for Spark 4.0 (issue #2946) surfaces these failures once the job is re-enabled.

What changes are included in this PR?

In the Spark 4.0 shim, delegate to QueryExecutionErrors.fileNotExistError(path, cause), which is the 4.0 replacement for readCurrentFileNotFoundError. It produces a SparkException with the expected error class FAILED_READ_FILE.FILE_NOT_EXIST and path parameter that tests assert on. The 3.4 and 3.5 shims are unchanged.
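The change can be sketched roughly as follows. The wrapper object and method shown here are illustrative (the actual shim structure in the Comet codebase may differ), and only the `QueryExecutionErrors.fileNotExistError(path, cause)` call is taken from this PR; the snippet needs a Spark 4.0 dependency to compile:

```scala
import java.io.FileNotFoundException

import org.apache.spark.sql.errors.QueryExecutionErrors

// Hypothetical sketch of the Spark 4.0 shim fix described above.
object ShimSparkErrorConverter {
  // Translate a native FileNotFound error into the Spark 4.0 exception
  // carrying error class FAILED_READ_FILE.FILE_NOT_EXIST.
  def convertFileNotFound(path: String, cause: FileNotFoundException): Throwable = {
    // Before this fix: a SparkFileNotFoundException with error class
    // _LEGACY_ERROR_TEMP_2055, which no longer exists in Spark 4.0's
    // error-classes JSON and therefore fails message lookup with
    // [INTERNAL_ERROR].
    // After: delegate to the 4.0 replacement for
    // readCurrentFileNotFoundError, which sets the expected error class
    // and `path` message parameter.
    QueryExecutionErrors.fileNotExistError(path, cause)
  }
}
```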

How are these changes tested?

Covered by the existing Spark test org.apache.spark.sql.hive.HiveMetadataCacheSuite (run in the sql_hive-1 CI job for Spark 4.0), which previously failed on:

  • SPARK-16337 temporary view refresh
  • view refresh
  • partitioned table is cached when partition pruning is true
  • partitioned table is cached when partition pruning is false

Re-enabling that job will be handled separately.

The Spark 4.0 `ShimSparkErrorConverter` was converting native FileNotFound
errors into a `SparkFileNotFoundException` with error class
`_LEGACY_ERROR_TEMP_2055`, but that error class was removed in Spark 4.0.
Throwing it triggers an internal error ("Cannot find main error class")
and fails tests such as `HiveMetadataCacheSuite` that assert on
`FAILED_READ_FILE.FILE_NOT_EXIST`.

Delegate to `QueryExecutionErrors.fileNotExistError`, which is the 4.0
replacement for `readCurrentFileNotFoundError` and produces the expected
error class and `path` parameter.
@andygrove
Member Author

This is included in #4047, so there is no need for a separate PR.

@andygrove andygrove closed this Apr 23, 2026
