fix: use FAILED_READ_FILE.FILE_NOT_EXIST in Spark 4.0 FileNotFound shim#4048

Closed
andygrove wants to merge 1 commit into apache:main from andygrove:fix/spark4-file-not-exist-error-class

Conversation

Member

@andygrove andygrove commented Apr 23, 2026

Which issue does this PR close?

Related to #2946.

Rationale for this change

The Spark 4.0 ShimSparkErrorConverter translates native FileNotFound errors into a SparkFileNotFoundException with error class _LEGACY_ERROR_TEMP_2055. That error class exists in Spark 3.4 and 3.5 but was removed in Spark 4.0, so Spark's ErrorClassesJsonReader throws an INTERNAL_ERROR when it tries to look up the message template ("Cannot find main error class '_LEGACY_ERROR_TEMP_2055'").

This manifests as test failures on org.apache.spark.sql.hive.HiveMetadataCacheSuite (and any other suite that uses checkError/checkErrorMatchPVals against FAILED_READ_FILE.FILE_NOT_EXIST) when run against Spark 4.0 with Comet enabled:

null did not equal "FAILED_READ_FILE.FILE_NOT_EXIST"

with underlying cause:

SparkException: [INTERNAL_ERROR] Cannot find main error class '_LEGACY_ERROR_TEMP_2055'

The previously ignored sql_hive-1 CI job for Spark 4.0 (issue #2946) surfaces these failures once the job is re-enabled.

What changes are included in this PR?

In the Spark 4.0 shim, delegate to QueryExecutionErrors.fileNotExistError(path, cause), which is the 4.0 replacement for readCurrentFileNotFoundError. It produces a SparkException with the expected error class FAILED_READ_FILE.FILE_NOT_EXIST and path parameter that tests assert on. The 3.4 and 3.5 shims are unchanged.
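The change can be sketched roughly as follows. The wrapper object and method shown here are illustrative (the actual shim structure in the Comet codebase may differ), and only the `QueryExecutionErrors.fileNotExistError(path, cause)` call is taken from this PR; the snippet needs a Spark 4.0 dependency to compile:

```scala
import java.io.FileNotFoundException

import org.apache.spark.sql.errors.QueryExecutionErrors

// Hypothetical sketch of the Spark 4.0 shim fix described above.
object ShimSparkErrorConverter {
  // Translate a native FileNotFound error into the Spark 4.0 exception
  // carrying error class FAILED_READ_FILE.FILE_NOT_EXIST.
  def convertFileNotFound(path: String, cause: FileNotFoundException): Throwable = {
    // Before this fix: a SparkFileNotFoundException with error class
    // _LEGACY_ERROR_TEMP_2055, which no longer exists in Spark 4.0's
    // error-classes JSON and therefore fails message lookup with
    // [INTERNAL_ERROR].
    // After: delegate to the 4.0 replacement for
    // readCurrentFileNotFoundError, which sets the expected error class
    // and `path` message parameter.
    QueryExecutionErrors.fileNotExistError(path, cause)
  }
}
```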

How are these changes tested?

Covered by the existing Spark test org.apache.spark.sql.hive.HiveMetadataCacheSuite (run in the sql_hive-1 CI job for Spark 4.0), which previously failed on:

  • SPARK-16337 temporary view refresh
  • view refresh
  • partitioned table is cached when partition pruning is true
  • partitioned table is cached when partition pruning is false

Re-enabling that job will be handled separately.

The Spark 4.0 `ShimSparkErrorConverter` was converting native FileNotFound
errors into a `SparkFileNotFoundException` with error class
`_LEGACY_ERROR_TEMP_2055`, but that error class was removed in Spark 4.0.
Throwing it triggers an internal error ("Cannot find main error class")
and fails tests such as `HiveMetadataCacheSuite` that assert on
`FAILED_READ_FILE.FILE_NOT_EXIST`.

Delegate to `QueryExecutionErrors.fileNotExistError`, which is the 4.0
replacement for `readCurrentFileNotFoundError` and produces the expected
error class and `path` parameter.
@andygrove
Member Author

This is included in #4047, so there is no need for a separate PR.

@andygrove andygrove closed this Apr 23, 2026
