[SPARK-57261][SQL] Allow to disable HashAggregateExec by config by pan3793 · Pull Request #56323 · apache/spark

pan3793 · 2026-06-04T13:37:23Z

What changes were proposed in this pull request?

Currently, Spark always prefers HashAggregateExec over SortAggregateExec when possible. This PR adds a config, spark.sql.execution.useHashAggregateExec, that lets users explicitly disable HashAggregateExec.
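Here is a minimal sketch of how such a switch could fit into physical aggregate selection. It is a simplified, hypothetical model, not Spark's actual AggUtils code. The AggPlanInfo and AggSelection names and the operator objects are illustrative. It also assumes the new flag only gates the HashAggregateExec branch, while the existing spark.sql.execution.useObjectHashAggregateExec config keeps its current meaning.

```scala
// Hypothetical, simplified model of physical aggregate selection.
// Not Spark's real planner code; all names here are illustrative.
sealed trait AggOperator
case object HashAggregate extends AggOperator
case object ObjectHashAggregate extends AggOperator
case object SortAggregate extends AggOperator

final case class AggPlanInfo(
    bufferSupportsHashAgg: Boolean, // assumption: buffer types can be updated in place by HashAggregateExec
    supportsObjectHashAgg: Boolean  // assumption: e.g. the query uses TypedImperativeAggregate functions
)

object AggSelection {
  def choose(
      info: AggPlanInfo,
      useHashAggregateExec: Boolean,    // models spark.sql.execution.useHashAggregateExec (this PR)
      useObjectHashAggregation: Boolean // models spark.sql.execution.useObjectHashAggregateExec
  ): AggOperator =
    if (useHashAggregateExec && info.bufferSupportsHashAgg) HashAggregate
    else if (useObjectHashAggregation && info.supportsObjectHashAgg) ObjectHashAggregate
    else SortAggregate // fallback when neither hash-based operator is allowed or applicable
}
```

Under that assumption, a user whose job hits OOM in HashAggregateExec would presumably turn it off for a session with `spark.conf.set("spark.sql.execution.useHashAggregateExec", "false")` (or `SET spark.sql.execution.useHashAggregateExec=false` in SQL). Aggregates that HashAggregateExec could otherwise have handled would then be planned as SortAggregateExec.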

Why are the changes needed?

We found that some jobs fail with OOM when using HashAggregateExec (the automatic fallback logic does not work well for them), while the same jobs run fine with SortAggregateExec. For example:

26/06/04 18:47:30 ERROR [SIGTERM handler] CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM
26/06/04 18:47:30 WARN [Executor task launch worker for task 9749.0 in stage 14.0 (TID 61758)] TaskMemoryManager: Failed to allocate a page (2147483648 bytes) for 0 times, try again.
java.lang.OutOfMemoryError: Java heap space
	at org.apache.spark.unsafe.memory.HeapMemoryAllocator.allocate(HeapMemoryAllocator.java:72)
	at org.apache.spark.memory.TaskMemoryManager.allocatePage(TaskMemoryManager.java:398)
	at org.apache.spark.memory.TaskMemoryManager.allocatePage(TaskMemoryManager.java:359)
	at org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:96)
	at org.apache.spark.unsafe.map.BytesToBytesMap.allocate(BytesToBytesMap.java:868)
	at org.apache.spark.unsafe.map.BytesToBytesMap.growAndRehash(BytesToBytesMap.java:991)
	at org.apache.spark.unsafe.map.BytesToBytesMap$Location.append(BytesToBytesMap.java:817)
	at org.apache.spark.sql.execution.UnsafeFixedWidthAggregationMap.getAggregationBufferFromUnsafeRow(UnsafeFixedWidthAggregationMap.java:135)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage11.hashAgg_doConsume_0$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage11.hashAgg_doAggregateWithKeys_0$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage11.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:44)
	at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:50)
	at scala.collection.Iterator$$anon$9.hasNext(Iterator.scala:593)
	at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:195)
	at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:57)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:111)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54)
	at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:206)
	at org.apache.spark.scheduler.Task.run(Task.scala:147)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:900)
	at org.apache.spark.executor.Executor$TaskRunner$$Lambda$709/0x00007f84474fd558.apply(Unknown Source)
	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:86)
	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:83)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:99)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:903)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:833)
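The log shows a single page request of 2147483648 bytes (2 GiB) issued from BytesToBytesMap.growAndRehash. The worked calculation below rests on recent Spark sources, not on this PR. It assumes the map's pointer array holds two 8-byte longs per slot, that MemoryConsumer.allocateArray(n) requests n * 8 bytes, and that capacity doubles on each grow. The PageSizeMath object and its methods are hypothetical names.

```scala
object PageSizeMath {
  // Assumptions (from recent Spark sources, not from this PR):
  // each slot stores 2 longs (record address + hash code), 8 bytes per long,
  // and the slot count doubles on every growAndRehash.
  val BytesPerLong: Long = 8L
  val LongsPerSlot: Long = 2L

  /** Bytes needed for a pointer array with `capacity` slots. */
  def arrayBytes(capacity: Long): Long = capacity * LongsPerSlot * BytesPerLong

  /** Slot count implied by a pointer-array request of `bytes`. */
  def capacityFor(bytes: Long): Long = bytes / (LongsPerSlot * BytesPerLong)

  def main(args: Array[String]): Unit = {
    val requested = 2147483648L              // from the log: "Failed to allocate a page (2147483648 bytes)"
    val newCapacity = capacityFor(requested) // 134217728 slots, i.e. 2^27
    val oldCapacity = newCapacity / 2        // assuming doubling growth
    // During the rehash the old and new pointer arrays are both held.
    val peakBytes = arrayBytes(oldCapacity) + arrayBytes(newCapacity)
    println(s"new capacity = $newCapacity slots")
    println(s"old array = ${arrayBytes(oldCapacity)} bytes, new array = $requested bytes")
    println(s"peak pointer-array footprint = $peakBytes bytes")
  }
}
```

Under these assumptions, the failing request is for a 2^27-slot pointer array, and the previous 1 GiB array is still live while rehashing. That means roughly 3 GiB of heap for the slot arrays alone, before counting the pages that hold the aggregation records. SortAggregateExec presumably avoids this growth pattern because it processes sorted input one group at a time instead of building an in-memory hash table.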

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing unit tests were adjusted.

Also verified with a production case, comparing HashAggregateExec against SortAggregateExec.

Was this patch authored or co-authored using generative AI tooling?

No.

pan3793 · 2026-06-05T09:47:37Z

cc @cloud-fan @LuciferYang

pan3793 added 3 commits June 5, 2026 13:34

3dc6f11 [SPARK-57261][SQL] Allow to disable HashAggregateExec by config
31a601f fix
d61d0be fix

pan3793 force-pushed the SPARK-57261 branch from 95b4a33 to d61d0be on June 5, 2026 05:34

peter-toth approved these changes Jun 5, 2026

pan3793 wants to merge 3 commits into apache:master from pan3793:SPARK-57261
