Description
Currently, the MODE aggregate function is not supported natively in Comet and falls back to Spark execution.
MODE is a commonly used statistical aggregate that returns the most frequent value in a group. The lack of native support prevents queries using MODE from benefiting from Comet’s execution pipeline.
Motivation
- Enables full native execution for queries involving MODE
- Avoids fallback to Spark, improving performance and consistency
- Brings Comet closer to feature parity with Spark SQL
Proposed Change
Add native support for the MODE aggregate function with:
- Full correctness across all supported data types
- Proper handling of null and edge cases
- Support for partial and final aggregation (mergeable state)
- Integration with Comet’s GroupsAccumulator for grouped queries
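To make the "mergeable state" requirement concrete, here is a minimal sketch of what the intermediate state could look like. It is not Comet's or DataFusion's actual API: `ModeState` and its methods are hypothetical names, the value type is fixed to `i64` for brevity, and the tie rule (smallest value wins) is an assumption chosen only to make the output deterministic. The point it illustrates is that partial aggregation has to carry full value counts, because merging only the per-partition modes can give the wrong answer.

```rust
use std::collections::HashMap;

/// Hypothetical intermediate state for MODE over i64 values.
/// It keeps a count per distinct value so that partial states can be merged.
#[derive(Debug, Default, Clone)]
pub struct ModeState {
    counts: HashMap<i64, u64>,
}

impl ModeState {
    /// Partial aggregation: fold one input row into the state.
    /// Null inputs (None) are skipped and never counted.
    pub fn update(&mut self, value: Option<i64>) {
        if let Some(v) = value {
            *self.counts.entry(v).or_insert(0) += 1;
        }
    }

    /// Final aggregation: add the counts of another partial state to this one.
    pub fn merge(&mut self, other: &ModeState) {
        for (value, count) in &other.counts {
            *self.counts.entry(*value).or_insert(0) += *count;
        }
    }

    /// Returns the most frequent value, or None if no non-null value was seen.
    /// Tie rule (assumption for this sketch): the smallest value wins.
    pub fn evaluate(&self) -> Option<i64> {
        self.counts
            .iter()
            .max_by(|a, b| a.1.cmp(b.1).then_with(|| b.0.cmp(a.0)))
            .map(|(value, _)| *value)
    }
}
```

For example, if one partition sees `1, 1, 1, 2, 2` and another sees `2, 2, 3, 3, 3`, their local modes are `1` and `3`, but after `merge` the combined counts give `2` as the mode. A real implementation would also need to serialize the counts into the state columns exchanged between the partial and final stages.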
Expected Behavior
Queries using MODE should:
- Execute fully within Comet (no fallback)
- Return correct results across all supported types
- Handle edge cases such as:
  - all-null inputs
  - ties (consistent with Spark behavior)
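The edge cases above can be sketched for the grouped case as follows. This is an illustration only: `grouped_mode` is a hypothetical helper, not part of Comet or of DataFusion's GroupsAccumulator trait, it handles only `i64` values, and it assumes that an all-null group yields a null result and that ties resolve to the smallest value. The actual tie rule should be taken from Spark's behavior, which this sketch does not claim to reproduce.

```rust
use std::collections::HashMap;

/// Hypothetical helper: computes MODE per group.
/// `group_indices[i]` is the group that row `i` belongs to,
/// `values[i]` is the row's value (None represents SQL NULL).
pub fn grouped_mode(
    group_indices: &[usize],
    values: &[Option<i64>],
    num_groups: usize,
) -> Vec<Option<i64>> {
    assert_eq!(group_indices.len(), values.len());

    // One count map per group, indexed by group index.
    let mut counts: Vec<HashMap<i64, u64>> = vec![HashMap::new(); num_groups];
    for (&group, value) in group_indices.iter().zip(values) {
        // Nulls are ignored, so an all-null group keeps an empty map.
        if let Some(v) = value {
            *counts[group].entry(*v).or_insert(0) += 1;
        }
    }

    counts
        .iter()
        .map(|group_counts| {
            group_counts
                .iter()
                // Highest count wins; on a tie the smallest value wins
                // (an assumption made here for determinism).
                .max_by(|a, b| a.1.cmp(b.1).then_with(|| b.0.cmp(a.0)))
                .map(|(value, _)| *value)
        })
        .collect()
}
```

With three groups where group 0 holds `5, 5, 7`, group 1 holds only nulls, and group 2 holds `9, 4, 9, 4`, the function returns `5` for group 0, a null (`None`) for group 1, and `4` for group 2 under the assumed tie rule.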