[SPARK-57263][SQL] Support Hive 4.2 metastore #56337

Closed
LuciferYang wants to merge 2 commits into apache:master from LuciferYang:worktree-SPARK-hive42-metastore
@LuciferYang
Contributor

What changes were proposed in this pull request?

This PR adds Hive 4.2.0 as a supported metastore client version, following 4.0 (SPARK-45265) and 4.1 (SPARK-53095).

  • Add hive.v4_2 with the extraDeps taken from the Hive 4.2 POM. A few datanucleus/jdo deps are actually lower than 4.1 (datanucleus-api-jdo 6.0.3 vs 6.0.5, datanucleus-core 6.0.10 vs 6.0.11, javax.jdo 3.2.0 vs 3.2.1), while Derby is bumped to 10.17.1.0 for Java 21. There is a note in package.scala so these don't get "fixed" upward later.
  • Shim_v4_2 extends Shim_v4_1. The shimmed method signatures are unchanged between 4.1 and 4.2, so the body is empty.
  • Hive 4.2 is compiled with maven.compiler.target=21, so its jars cannot load on an older JVM. When a 4.2 client is constructed on Java < 21, it now fails with UNSUPPORTED_HIVE_METASTORE_VERSION_FOR_JAVA instead of a raw UnsupportedClassVersionError. The check lives on the client-construction path rather than in hiveVersion(), so config validation still resolves 4.2.0 normally.
  • HiveClientVersions includes 4.2 in the test sweep only on Java 21+.
  • Update the supported metastore version range in the docs.
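
The split between version resolution and the Java check can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this PR: the object name, the exception class, the `minJava` table, and both method names are hypothetical stand-ins. It assumes only that resolution stays a pure lookup while the Java check runs where the client is constructed.

```scala
// Illustrative sketch only: every name here is hypothetical, not Spark's actual code.
object MetastoreJavaGuardSketch {
  // Assumed table: minimum Java feature release required by a client version.
  private val minJava: Map[String, Int] = Map("4.2" -> 21)

  // Stand-in for the UNSUPPORTED_HIVE_METASTORE_VERSION_FOR_JAVA error condition.
  final class UnsupportedMetastoreVersionForJava(version: String, required: Int, actual: Int)
    extends RuntimeException(
      s"Hive metastore client $version requires Java $required or later, " +
        s"but the current JVM is Java $actual.")

  // Pure resolver: maps a configured version to its major.minor key and never
  // inspects the JVM, so config validation can accept "4.2.0" on any Java.
  def resolve(version: String): Option[String] =
    version.split('.').toList match {
      case "4" :: minor :: _ if Set("0", "1", "2").contains(minor) => Some(s"4.$minor")
      case _ => None
    }

  // Guard meant to be called on the client-construction path only.
  def requireJavaFor(
      resolved: String,
      javaFeature: Int = Runtime.version().feature()): Unit =
    minJava.get(resolved).foreach { required =>
      if (javaFeature < required) {
        throw new UnsupportedMetastoreVersionForJava(resolved, required, javaFeature)
      }
    }
}
```

With this shape, `resolve("4.2.0")` succeeds on any JVM, while `requireJavaFor("4.2", 17)` throws before any Hive 4.2 class is loaded.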

Why are the changes needed?

Hive 4.2.0 is released and supports JDK 21. Users on Java 21 should be able to connect Spark to a Hive 4.2 metastore via spark.sql.hive.metastore.version=4.2.0.

Does this PR introduce any user-facing change?

Yes. 4.2.0 is now a valid value for spark.sql.hive.metastore.version (Java 21+). On an older JVM, constructing the 4.2 client fails fast with UNSUPPORTED_HIVE_METASTORE_VERSION_FOR_JAVA instead of a raw UnsupportedClassVersionError.

How was this patch tested?

  • Adding 4.2 to HiveClientVersions runs it through HiveClientSuites, which loads the client via the isolated classloader (requires network access to download the 4.2 jars).
  • Checked the shimmed methods in Hive.java and IMetaStoreClient.java against Hive 4.2 source; no differences from 4.1.
  • build/sbt 'core/testOnly *SparkThrowableSuite' for the new error condition.

Was this patch authored or co-authored using generative AI tooling?

No.

### What changes were proposed in this pull request?

Add Hive 4.2.0 as a supported metastore client version via
`IsolatedClientLoader`, following the same pattern as SPARK-45265 (4.0)
and SPARK-53095 (4.1).

Changes:
- Add `hive.v4_2` with `extraDeps` verified against the Hive 4.2 POM:
  `datanucleus-api-jdo:6.0.3` (down from 6.0.5 in 4.1), `core:6.0.10`,
  `javax.jdo:3.2.0-release`, `derby:10.17.1.0` (Java 21 compatible)
- Register `(4, 2, _)` in `IsolatedClientLoader.hiveVersion()` as a pure
  resolver so that config validation accepts `4.2.0`
- Add `Shim_v4_2 extends Shim_v4_1` -- no API changes between 4.1 and
  4.2; all shimmed method signatures verified against Hive 4.2 source
- Add a Java 21 guard on the client-construction path (the
  `IsolatedClientLoader` constructor). Hive 4.2 is compiled with
  `maven.compiler.target=21`, so loading its JARs on Java < 21 throws
  `UnsupportedClassVersionError`. The guard raises the new
  `UNSUPPORTED_HIVE_METASTORE_VERSION_FOR_JAVA` error condition with an
  actionable message. Keeping it off `hiveVersion()` (which is also used
  for config validation) ensures the message reaches the user instead of
  being swallowed.
- Add "4.2" to `HiveClientVersions` for the version-sweep test suite,
  gated to Java 21+
- Update doc strings in `HiveUtils`, `sql-data-sources-hive-tables.md`,
  and `sql-migration-guide.md`
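
For readers unfamiliar with the shim pattern, the empty `Shim_v4_2` can be pictured with the simplified sketch below. Only the class names `Shim_v4_1` and `Shim_v4_2` come from this PR; the `describe` method, the `shimFor` dispatcher, and the enclosing object are hypothetical and exist only to make the example runnable.

```scala
// Illustrative sketch: simplified stand-ins, not Spark's real shim classes.
object ShimSketch {
  class Shim_v4_1 {
    // Hypothetical shimmed call; the real shims wrap Hive client methods.
    def describe(): String = "4.1-compatible call"
  }

  // Empty body: every shimmed signature is inherited unchanged from 4.1.
  class Shim_v4_2 extends Shim_v4_1

  // Hypothetical dispatch from a (major, minor) pair to a shim instance.
  def shimFor(major: Int, minor: Int): Shim_v4_1 = (major, minor) match {
    case (4, 2) => new Shim_v4_2
    case (4, 1) => new Shim_v4_1
    case other => throw new IllegalArgumentException(s"Unsupported version: $other")
  }
}
```

Because `Shim_v4_2` overrides nothing, a later Hive release that does change a signature only needs an override in the matching subclass.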

### Why are the changes needed?

Hive 4.2.0 has been released with JDK 21 support, expanded Iceberg v3
integration, REST Catalog, and HMS improvements. Users running Spark on
Java 21 should be able to connect to a Hive 4.2 metastore via
`spark.sql.hive.metastore.version=4.2.0`.

### Does this PR introduce _any_ user-facing change?

Yes: `spark.sql.hive.metastore.version=4.2.0` is now a valid option
(requires Java 21 or later at the Spark JVM level).

### How was this patch tested?

- `build/sbt 'hive/compile'` and
  `build/sbt 'core/testOnly *SparkThrowableSuite'` pass
- All shimmed method signatures in `Hive.java` and
  `IMetaStoreClient.java` verified against Hive 4.2.0 source -- no
  differences from 4.1, confirming the empty `Shim_v4_2` body
- The "4.2" entry in `HiveClientVersions` feeds `HiveClientSuites`,
  which exercises the shim via isolated classloader (requires Maven
  access to download Hive 4.2 jars at test runtime)
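
The Java 21 gating of the version sweep might look something like the sketch below. It is an assumption-laden illustration: the object name, the abbreviated version list, and the `versions` method are hypothetical, and the real `HiveClientVersions` covers more versions than shown here.

```scala
// Illustrative sketch only; not the actual HiveClientVersions definition.
object HiveClientVersionsSketch {
  // Abbreviated, assumed list; the real sweep covers more versions.
  private val always: Seq[String] = Seq("4.0", "4.1")

  // Append "4.2" only when the test JVM can load Java 21 class files.
  def versions(javaFeature: Int = Runtime.version().feature()): Seq[String] =
    if (javaFeature >= 21) always :+ "4.2" else always
}
```

On a Java 17 test JVM this yields only the older versions, so the suite never tries to load the Hive 4.2 jars there.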

### Was this patch authored or co-authored using generative AI tooling?

No.
@LuciferYang
Contributor Author

cc @yaooqinn and @dongjoon-hyun

@dongjoon-hyun
Member

Thank you for pinging me, @LuciferYang.

Member

@dongjoon-hyun left a comment

As a side note, I didn't do this because of the Java 21 requirement of Apache Hive 4.2.x, but I agree with this PR as preparation for Apache Spark 5.0.0. We may want to drop Java 17 completely in 2027.

For the record, before this PR there already exist a few Spark features that are enabled only with Java 21+, so this approach is reasonable. For example:

private[spark] val MASTER_REST_SERVER_VIRTUAL_THREADS =
  ConfigBuilder("spark.master.rest.virtualThread.enabled")
    .doc("If true, Spark master tries to use Java 21 virtual thread for REST API.")
    .version("4.0.0")
    .booleanConf
    .createWithDefault(true)

@LuciferYang
Contributor Author

Thanks @dongjoon-hyun.

Member

@dongjoon-hyun left a comment

+1, LGTM.

@LuciferYang
Contributor Author

Merged into master for Apache Spark 5.0.0. Thanks, @dongjoon-hyun and @yaooqinn.

@LuciferYang LuciferYang deleted the worktree-SPARK-hive42-metastore branch June 5, 2026 06:34