The `CometExpressionSerde` trait provides several methods you can override:

- `convert(expr: T, inputs: Seq[Attribute], binding: Boolean): Option[Expr]` - **Required**. Converts the Spark expression to protobuf. Return `None` if the expression cannot be converted. A minimal implementation is sketched after this list.
- `getSupportLevel(expr: T): SupportLevel` - Optional. Returns the level of support for the expression at planning time, based on a specific expression instance. See "Using getSupportLevel" section below for details.
- `getIncompatibleReasons(): Seq[String]` - Optional. Returns reasons why this expression may produce different results than Spark. Used to generate the Compatibility Guide. See "Documenting Incompatible and Unsupported Reasons" below.
- `getUnsupportedReasons(): Seq[String]` - Optional. Returns reasons why this expression may not be supported by Comet (for example, unsupported data types or format strings). Used to generate the Compatibility Guide. See "Documenting Incompatible and Unsupported Reasons" below.
- `getExprConfigName(expr: T): String` - Optional. Returns a short name for configuration keys. Defaults to the Spark class name.
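To make the shape of an implementation concrete, here is a minimal sketch of a serde for Spark's `Sqrt`. The object name, import paths, and the helpers `exprToProtoInternal` and `scalarFunctionExprToProto` are illustrative of the pattern used in `QueryPlanSerde`; consult the existing serde objects for the exact APIs.

```scala
import org.apache.spark.sql.catalyst.expressions.{Attribute, Sqrt}

import org.apache.comet.serde.ExprOuterClass.Expr

// Illustrative serde for Spark's Sqrt expression (not the real Comet implementation).
object CometSqrtExample extends CometExpressionSerde[Sqrt] {
  override def convert(expr: Sqrt, inputs: Seq[Attribute], binding: Boolean): Option[Expr] = {
    // Convert the child first; None propagates and the planner falls back to Spark.
    val childProto = exprToProtoInternal(expr.child, inputs, binding)
    // Delegate to the equivalent DataFusion scalar function by name.
    scalarFunctionExprToProto("sqrt", childProto)
  }
}
```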
For simple scalar functions that map directly to a DataFusion function, you can use the built-in `CometScalarFunction` implementation:
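For example, a registration entry along these lines (the exact registration site is in `QueryPlanSerde`; this sketch assumes `CometScalarFunction` takes the DataFusion function name):

```scala
import org.apache.spark.sql.catalyst.expressions.Sqrt

// One entry delegates Spark's Sqrt to DataFusion's `sqrt` scalar function,
// with no custom serde object required.
val entry = classOf[Sqrt] -> CometScalarFunction[Sqrt]("sqrt")
```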
Any notes provided via `Compatible`, `Incompatible`, or `Unsupported` will be logged to help with debugging and understanding why an expression was not used.
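As a sketch, a serde that attaches such a note might look like the following. The object, the `isSupportedFormat` helper, and the single supported format are hypothetical; `Compatible` and `Incompatible` are assumed to default to no notes, and the conversion logic is elided.

```scala
import org.apache.spark.sql.catalyst.expressions.{Attribute, DateFormatClass, Literal}
import org.apache.spark.unsafe.types.UTF8String

import org.apache.comet.serde.ExprOuterClass.Expr

// Hypothetical serde showing a per-instance support check with a logged note.
object CometDateFormatExample extends CometExpressionSerde[DateFormatClass] {

  // Hypothetical helper: the format strings this serde has been verified against.
  private def isSupportedFormat(fmt: String): Boolean = fmt == "yyyy-MM-dd"

  override def getSupportLevel(expr: DateFormatClass): SupportLevel =
    expr.right match {
      // A literal format string can be inspected at planning time.
      case Literal(fmt: UTF8String, _) if isSupportedFormat(fmt.toString) => Compatible()
      case _ => Incompatible(Some("only a subset of datetime format strings is supported"))
    }

  override def convert(expr: DateFormatClass, inputs: Seq[Attribute], binding: Boolean): Option[Expr] =
    None // conversion elided in this sketch
}
```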
#### Documenting Incompatible and Unsupported Reasons
In addition to `getSupportLevel`, which governs runtime planning decisions, the serde trait exposes two static documentation methods:
- `getIncompatibleReasons(): Seq[String]` - Reasons the expression may produce different results than Spark.
- `getUnsupportedReasons(): Seq[String]` - Reasons the expression, or certain usages of it, may not be supported by Comet.
These methods do not affect runtime behavior. They are called by `GenerateDocs` (`spark/src/main/scala/org/apache/comet/GenerateDocs.scala`) when building the user-facing Compatibility Guide pages under `docs/source/user-guide/latest/compatibility/expressions/` (for example, `math.md`, `datetime.md`, `array.md`, `aggregate.md`, `struct.md`). Each reason is rendered as a bullet in the corresponding page.
Key differences from `getSupportLevel` (a short sketch of the overrides follows this list):
- **No expression instance.** Both methods take no arguments, so they describe the expression in general rather than a specific call site. Use `getSupportLevel` for checks that depend on data types, argument values, or other per-instance details.
- **Markdown-friendly.** Each returned string is written to a Markdown document, so you can embed backticks, links, and line breaks. Keep each reason self-contained, since they are rendered as separate bullets.
- **Regenerated by CI.** The lists are collected by `GenerateDocs` and published by CI on every merge to `main`. The generated Markdown is not committed to the repo, so you do not need to regenerate or commit it yourself. The reasons do not have to match the `notes` passed to `Compatible`, `Incompatible`, or `Unsupported`, but keeping them consistent avoids confusing users.
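As a sketch, the pair of overrides inside a serde object might read as follows; the wording is illustrative (loosely modeled on the `CollectSet` notes in the aggregate compatibility page):

```scala
// Each string becomes one bullet in the generated Compatibility Guide page,
// so Markdown such as backticks renders as expected.
override def getIncompatibleReasons(): Seq[String] = Seq(
  "Comet deduplicates `NaN` values (treats `NaN == NaN`) while Spark treats each `NaN` as distinct")

override def getUnsupportedReasons(): Seq[String] = Seq(
  "Decimal input types are not supported and fall back to Spark")
```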
#### Adding Spark-side Tests for the New Expression
It is important to verify that the new expression is correctly recognized by the native execution engine and matches the expected Spark behavior. The preferred way to add test coverage is to write a SQL test file using the SQL file test framework. This approach is simpler than writing Scala test code and makes it easy to cover many input combinations and edge cases.
The PR also updates `docs/source/user-guide/latest/compatibility/expressions/aggregate.md` (2 additions, 13 deletions). The diff removes the hand-written "Incompatible Aggregates" and "ANSI Mode" sections shown below, consistent with these pages now being generated by `GenerateDocs`:
# Aggregate Expressions
## Incompatible Aggregates
**CollectSet**: Comet deduplicates NaN values (treats `NaN == NaN`) while Spark treats each NaN as a distinct value. When `spark.comet.exec.strictFloatingPoint=true`, `collect_set` on floating-point types falls back to Spark unless `spark.comet.expression.CollectSet.allowIncompatible=true` is set.
## ANSI Mode
Comet will fall back to Spark for the following aggregate expressions when ANSI mode is enabled. Comet's implementations can still be enabled by setting `spark.comet.expression.EXPRNAME.allowIncompatible=true`, where `EXPRNAME` is the Spark expression class name (see the example after the list). See the [Comet Supported Expressions Guide](../../expressions.md) for more information on this configuration setting.
- Average (supports all numeric inputs except decimal types)
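For example, to opt in to Comet's `Average` implementation for a session using the standard Spark configuration API:

```scala
// Allow Comet's ANSI-incompatible Average implementation instead of falling back to Spark.
spark.conf.set("spark.comet.expression.Average.allowIncompatible", "true")
```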
There is an [epic](https://github.com/apache/datafusion-comet/issues/313) where we are tracking the work to fully implement ANSI support.