Add native support for MODE aggregate function #3970

@pchintar

Description

Currently, the MODE aggregate function is not supported natively in Comet and falls back to Spark execution.

MODE is a commonly used statistical aggregate that returns the most frequent value in a group. The lack of native support prevents queries using MODE from benefiting from Comet’s execution pipeline.


Motivation

  • Enables full native execution for queries involving MODE
  • Avoids fallback to Spark, improving performance and consistency
  • Brings Comet closer to feature parity with Spark SQL

Proposed Change

Add native support for the MODE aggregate function with:

  • Full correctness across all supported data types
  • Proper handling of null and edge cases
  • Support for partial and final aggregation (mergeable state)
  • Integration with Comet’s GroupsAccumulator for grouped queries
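The mergeable-state requirement above can be sketched as a value-to-count map that is updated per batch and merged by summing counts. This is only an illustrative sketch: `ModeState` and `update_groups` are hypothetical names, the input type is fixed to `i64` for brevity, and the grouped helper only loosely mirrors the group-index style of a groups accumulator rather than implementing any real trait. Resolving ties to the smallest value is an assumption made here for determinism and would need to be checked against Spark's actual behavior.

```rust
use std::collections::HashMap;

/// Hypothetical per-group state for MODE: value -> occurrence count.
/// Not Comet's or DataFusion's API; for illustration only.
#[derive(Debug, Default, Clone)]
struct ModeState {
    counts: HashMap<i64, u64>,
}

impl ModeState {
    /// Partial aggregation: fold one batch of nullable inputs into the state.
    /// Nulls are skipped and never counted.
    fn update(&mut self, values: &[Option<i64>]) {
        for v in values.iter().flatten() {
            *self.counts.entry(*v).or_insert(0) += 1;
        }
    }

    /// Final aggregation: combine another partial state by summing counts.
    fn merge(&mut self, other: &ModeState) {
        for (value, count) in &other.counts {
            *self.counts.entry(*value).or_insert(0) += count;
        }
    }

    /// Most frequent value, or None when no non-null input was seen.
    /// ASSUMPTION: ties resolve to the smallest value.
    fn evaluate(&self) -> Option<i64> {
        self.counts
            .iter()
            .max_by(|a, b| a.1.cmp(b.1).then_with(|| b.0.cmp(a.0)))
            .map(|(value, _)| *value)
    }
}

/// Hypothetical grouped update: one state per dense group index.
fn update_groups(
    states: &mut Vec<ModeState>,
    values: &[Option<i64>],
    group_indices: &[usize],
    total_groups: usize,
) {
    // Grow the state vector so every group index is addressable.
    if states.len() < total_groups {
        states.resize_with(total_groups, ModeState::default);
    }
    for (v, &g) in values.iter().zip(group_indices) {
        if let Some(v) = v {
            *states[g].counts.entry(*v).or_insert(0) += 1;
        }
    }
}
```

Because the state is a plain count map, merging is associative and commutative, so partial results can be combined in any order. The trade-off is memory: the state grows with the number of distinct values per group.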

Expected Behavior

Queries using MODE should:

  • Execute fully within Comet (no fallback)
  • Return correct results across all supported types
  • Handle edge cases such as:
    • all-null inputs
    • ties (consistent with Spark behavior)
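To make the edge cases concrete, a small reference function could serve as a test oracle for a native implementation. `reference_mode` is a hypothetical helper, not part of Comet or Spark. It returns `None` for empty or all-null input and, as an assumption, picks the smallest value among ties; the rule Spark actually applies should be confirmed before relying on this.

```rust
use std::collections::BTreeMap;

/// Hypothetical test oracle: the most frequent non-null value.
/// Returns None for empty or all-null input.
/// ASSUMPTION: ties resolve to the smallest value.
fn reference_mode<T: Ord + Clone>(values: &[Option<T>]) -> Option<T> {
    // A BTreeMap iterates keys in ascending order, so results are deterministic.
    let mut counts: BTreeMap<&T, usize> = BTreeMap::new();
    for v in values.iter().flatten() {
        *counts.entry(v).or_insert(0) += 1;
    }

    // Ascending key order plus a strict `>` keeps the smallest tied value.
    let mut best: Option<(&T, usize)> = None;
    for (value, count) in counts {
        if best.map_or(true, |(_, c)| count > c) {
            best = Some((value, count));
        }
    }
    best.map(|(value, _)| value.clone())
}
```

Comparing native results with such an oracle on all-null columns and deliberately tied inputs is one way to check the behaviors listed above.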

Metadata

Labels: area:aggregation (Hash aggregates, aggregate expressions)