[SPARK-57263][SQL] Support Hive 4.2 metastore #56337
Closed
LuciferYang wants to merge 2 commits into
### What changes were proposed in this pull request?

Add Hive 4.2.0 as a supported metastore client version via `IsolatedClientLoader`, following the same pattern as SPARK-45265 (4.0) and SPARK-53095 (4.1).

Changes:

- Add `hive.v4_2` with `extraDeps` verified against the Hive 4.2 POM: `datanucleus-api-jdo:6.0.3` (down from 6.0.5 in 4.1), `core:6.0.10`, `javax.jdo:3.2.0-release`, `derby:10.17.1.0` (Java 21 compatible)
- Register `(4, 2, _)` in `IsolatedClientLoader.hiveVersion()` as a pure resolver so that config validation accepts `4.2.0`
- Add `Shim_v4_2 extends Shim_v4_1`: no API changes between 4.1 and 4.2; all shimmed method signatures verified against Hive 4.2 source
- Add a Java 21 guard on the client-construction path (the `IsolatedClientLoader` constructor). Hive 4.2 is compiled with `maven.compiler.target=21`, so loading its JARs on Java < 21 throws `UnsupportedClassVersionError`. The guard raises the new `UNSUPPORTED_HIVE_METASTORE_VERSION_FOR_JAVA` error condition with an actionable message. Keeping it off `hiveVersion()` (which is also used for config validation) ensures the message reaches the user instead of being swallowed.
- Add "4.2" to `HiveClientVersions` for the version-sweep test suite, gated to Java 21+
- Update doc strings in `HiveUtils`, `sql-data-sources-hive-tables.md`, and `sql-migration-guide.md`

### Why are the changes needed?

Hive 4.2.0 has been released with JDK 21 support, expanded Iceberg v3 integration, REST Catalog, and HMS improvements. Users running Spark on Java 21 should be able to connect to a Hive 4.2 metastore via `spark.sql.hive.metastore.version=4.2.0`.

### Does this PR introduce _any_ user-facing change?

Yes: `spark.sql.hive.metastore.version=4.2.0` is now a valid option (requires Java 21 or later at the Spark JVM level).

### How was this patch tested?

- `build/sbt 'hive/compile'` and `build/sbt 'core/testOnly *SparkThrowableSuite'` pass
- All shimmed method signatures in `Hive.java` and `IMetaStoreClient.java` verified against Hive 4.2.0 source: no differences from 4.1, confirming the empty `Shim_v4_2` body
- The "4.2" entry in `HiveClientVersions` feeds `HiveClientSuites`, which exercises the shim via isolated classloader (requires Maven access to download Hive 4.2 jars at test runtime)

### Was this patch authored or co-authored using generative AI tooling?

No.
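For readers who want to see why the guard sits on the client-construction path rather than in the version resolver, here is a minimal sketch. It is not the code from this PR: `HiveVersionGuardSketch`, `HiveVersion`, `checkJavaCompatibility`, and `newClient` are hypothetical names, the exception is a plain `UnsupportedOperationException` standing in for the `UNSUPPORTED_HIVE_METASTORE_VERSION_FOR_JAVA` error condition, and only the 4.2 Java requirement is modeled.

```scala
// Hypothetical, simplified model of the two code paths; not Spark's actual classes.
object HiveVersionGuardSketch {
  // Stand-in for a resolved metastore version; minJavaFeature is None when this
  // sketch does not model any Java requirement for that version.
  final case class HiveVersion(fullVersion: String, minJavaFeature: Option[Int])

  private val VersionPattern = """(\d+)\.(\d+)(?:\.(\d+))?""".r

  // Pure resolver: maps a version string to a known version and never inspects
  // the running JVM, so config validation can call it on any Java version.
  def hiveVersion(version: String): HiveVersion = version match {
    case VersionPattern("4", "1", _) => HiveVersion("4.1.0", None)
    case VersionPattern("4", "2", _) => HiveVersion("4.2.0", Some(21))
    case other =>
      throw new IllegalArgumentException(s"Unsupported Hive metastore version: $other")
  }

  // Guard: fails with a readable message before any Hive class is loaded.
  def checkJavaCompatibility(v: HiveVersion, javaFeature: Int): Unit =
    v.minJavaFeature match {
      case Some(required) if javaFeature < required =>
        throw new UnsupportedOperationException(
          s"Hive metastore ${v.fullVersion} requires Java $required or later, " +
            s"but the current JVM is Java $javaFeature")
      case _ => ()
    }

  // Construction path: resolve first, then apply the guard.
  def newClient(
      version: String,
      javaFeature: Int = Runtime.version().feature()): String = {
    val v = hiveVersion(version)
    checkJavaCompatibility(v, javaFeature)
    s"client for Hive ${v.fullVersion}"
  }
}
```

In this sketch, `hiveVersion("4.2.0")` succeeds on any JVM, while `newClient("4.2.0", javaFeature = 17)` throws with the descriptive message, which mirrors the separation of resolution and construction described in the change list.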
Contributor
Author
cc @yaooqinn and @dongjoon-hyun
Member
Thank you for pinging me, @LuciferYang.
Member
As a side note, I didn't do this because of the Java 21 requirement of Apache Hive 4.2.x, but I agree with this PR as preparation for Apache Spark 5.0.0. We may want to drop Java 17 completely in 2027.
For the record, before this PR, there already exist a few Spark features that are enabled only with Java 21+. So, this approach is reasonable.
spark/core/src/main/scala/org/apache/spark/internal/config/package.scala
Lines 2074 to 2079 in 4915340
Contributor
Author
Thanks @dongjoon-hyun.
yaooqinn
approved these changes
Jun 5, 2026
Contributor
Author
Merged into master for Apache Spark 5.0.0. Thanks @dongjoon-hyun and @yaooqinn
### What changes were proposed in this pull request?

This PR adds Hive `4.2.0` as a supported metastore client version, following 4.0 (SPARK-45265) and 4.1 (SPARK-53095).

- Adds `hive.v4_2` with the `extraDeps` taken from the Hive 4.2 POM. A few datanucleus/jdo deps are actually lower than in 4.1 (`datanucleus-api-jdo` 6.0.3 vs 6.0.5, `datanucleus-core` 6.0.10 vs 6.0.11, `javax.jdo` 3.2.0 vs 3.2.1), while Derby is bumped to `10.17.1.0` for Java 21. There is a note in `package.scala` so these don't get "fixed" upward later.
- `Shim_v4_2` extends `Shim_v4_1`. The shimmed method signatures are unchanged between 4.1 and 4.2, so the body is empty.
- Hive 4.2 is compiled with `maven.compiler.target=21`, so its jars cannot load on an older JVM. When a 4.2 client is constructed on Java < 21, it now fails with `UNSUPPORTED_HIVE_METASTORE_VERSION_FOR_JAVA` instead of a raw `UnsupportedClassVersionError`. The check lives on the client-construction path rather than in `hiveVersion()`, so config validation still resolves `4.2.0` normally.
- `HiveClientVersions` includes `4.2` in the test sweep only on Java 21+.

### Why are the changes needed?

Hive 4.2.0 is released and supports JDK 21. Users on Java 21 should be able to connect Spark to a Hive 4.2 metastore via `spark.sql.hive.metastore.version=4.2.0`.

### Does this PR introduce any user-facing change?

Yes. `4.2.0` is now a valid value for `spark.sql.hive.metastore.version` (Java 21+). On an older JVM, setting it fails fast with a clear message.

### How was this patch tested?

- Adding `4.2` to `HiveClientVersions` runs it through `HiveClientSuites`, which loads the client via the isolated classloader (requires network access to download the 4.2 jars).
- Verified the shimmed method signatures in `Hive.java` and `IMetaStoreClient.java` against Hive 4.2 source; no differences from 4.1.
- Ran `build/sbt 'core/testOnly *SparkThrowableSuite'` for the new error condition.

### Was this patch authored or co-authored using generative AI tooling?

No.
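As an illustration of the empty-shim pattern and the Java-gated test sweep, here is a small self-contained sketch. It is only a model of the idea, not Spark's source: the `ShimInheritanceSketch` object, the `describeTable` method, and the `clientVersions` helper are hypothetical, and the class names merely echo those in the description.

```scala
// Hypothetical model of shim inheritance and a Java-gated version sweep.
object ShimInheritanceSketch {
  // Illustrative base shim with one made-up method; the real shims wrap Hive
  // client calls whose signatures can differ between Hive releases.
  class Shim_v4_1 {
    def describeTable(db: String, table: String): String = s"$db.$table"
  }

  // Empty body: nothing is overridden because, per the description, no shimmed
  // method signature changed between 4.1 and 4.2.
  class Shim_v4_2 extends Shim_v4_1

  // Hypothetical version list for a test sweep: "4.2" is included only when the
  // given Java feature version is 21 or later.
  def clientVersions(javaFeature: Int): Seq[String] = {
    val base = Seq("4.0", "4.1")
    if (javaFeature >= 21) base :+ "4.2" else base
  }
}
```

Calling `new ShimInheritanceSketch.Shim_v4_2().describeTable("db", "t")` goes through the inherited 4.1 implementation, which is why an empty subclass is enough when the underlying API is unchanged. Passing `Runtime.version().feature()` to `clientVersions` would gate the sweep on the running JVM.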