-
Notifications
You must be signed in to change notification settings - Fork 304
docs: Generate expression compatibility docs from code #4057
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
371741d
9f0cdd5
587adcf
03c7156
6fff3d9
654391a
3413595
7cc25ff
55c05dc
c4a670b
2900e1f
c948feb
a013aed
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -70,10 +70,12 @@ object CometUnhex extends CometExpressionSerde[Unhex] { | |
| } | ||
| ``` | ||
|
|
||
| The `CometExpressionSerde` trait provides three methods you can override: | ||
| The `CometExpressionSerde` trait provides several methods you can override: | ||
|
|
||
| - `convert(expr: T, inputs: Seq[Attribute], binding: Boolean): Option[Expr]` - **Required**. Converts the Spark expression to protobuf. Return `None` if the expression cannot be converted. | ||
| - `getSupportLevel(expr: T): SupportLevel` - Optional. Returns the level of support for the expression. See "Using getSupportLevel" section below for details. | ||
| - `getSupportLevel(expr: T): SupportLevel` - Optional. Returns the level of support for the expression at planning time, based on a specific expression instance. See "Using getSupportLevel" section below for details. | ||
| - `getIncompatibleReasons(): Seq[String]` - Optional. Returns reasons why this expression may produce different results than Spark. Used to generate the Compatibility Guide. See "Documenting Incompatible and Unsupported Reasons" below. | ||
| - `getUnsupportedReasons(): Seq[String]` - Optional. Returns reasons why this expression may not be supported by Comet (for example, unsupported data types or format strings). Used to generate the Compatibility Guide. See "Documenting Incompatible and Unsupported Reasons" below. | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. lets make section reference as links
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks will address these points in next PR
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. addressed in #4067 |
||
| - `getExprConfigName(expr: T): String` - Optional. Returns a short name for configuration keys. Defaults to the Spark class name. | ||
|
|
||
| For simple scalar functions that map directly to a DataFusion function, you can use the built-in `CometScalarFunction` implementation: | ||
|
|
@@ -208,6 +210,65 @@ When the query planner encounters an expression: | |
|
|
||
| Any notes provided will be logged to help with debugging and understanding why an expression was not used. | ||
|
|
||
| #### Documenting Incompatible and Unsupported Reasons | ||
|
|
||
| In addition to `getSupportLevel`, which governs runtime planning decisions, the serde trait exposes two static documentation methods: | ||
|
|
||
| - `getIncompatibleReasons(): Seq[String]` - Reasons the expression may produce different results than Spark. | ||
| - `getUnsupportedReasons(): Seq[String]` - Reasons the expression, or certain usages of it, may not be supported by Comet. | ||
|
|
||
| These methods do not affect runtime behavior. They are called by `GenerateDocs` (`spark/src/main/scala/org/apache/comet/GenerateDocs.scala`) when building the user-facing Compatibility Guide pages under `docs/source/user-guide/latest/compatibility/expressions/` (for example, `math.md`, `datetime.md`, `array.md`, `aggregate.md`, `struct.md`). Each reason is rendered as a bullet in the corresponding page. | ||
|
|
||
| Key differences from `getSupportLevel`: | ||
|
|
||
| - **No expression instance.** Both methods take no arguments, so they describe the expression in general rather than a specific call site. Use `getSupportLevel` for checks that depend on data types, argument values, or other per-instance details. | ||
| - **Markdown-friendly.** Each returned string is written to a Markdown document, so you can embed backticks, links, and line breaks. Keep each reason self-contained, since they are rendered as separate bullets. | ||
| - **Regenerated by CI.** The lists are collected by `GenerateDocs` and published by CI on every merge to `main`. The generated Markdown is not committed to the repo, so you do not need to regenerate or commit it yourself. The reasons do not have to match the `notes` passed to `Compatible`, `Incompatible`, or `Unsupported`, but keeping them consistent avoids confusing users. | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I dont see
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No CI changes needed - GenerateDocs is already called by a GitHub action |
||
|
|
||
| ##### Example: Incompatibility note | ||
|
|
||
| ```scala | ||
| object CometStructsToJson extends CometExpressionSerde[StructsToJson] { | ||
|
|
||
| override def getIncompatibleReasons(): Seq[String] = Seq( | ||
| "Does not support `+Infinity` and `-Infinity` for numeric types (float, double)." + | ||
| " (https://github.com/apache/datafusion-comet/issues/3016)") | ||
|
|
||
| override def getSupportLevel(expr: StructsToJson): SupportLevel = | ||
| Incompatible( | ||
| Some( | ||
| "Does not support Infinity/-Infinity for numeric types" + | ||
| " (https://github.com/apache/datafusion-comet/issues/3016)")) | ||
|
|
||
| // ... convert ... | ||
| } | ||
| ``` | ||
|
|
||
| ##### Example: Unsupported note | ||
|
|
||
| ```scala | ||
| object CometSortArray extends CometExpressionSerde[SortArray] { | ||
|
|
||
| override def getUnsupportedReasons(): Seq[String] = Seq( | ||
| "Nested arrays with `Struct` or `Null` child values are not supported natively and will" + | ||
| " fall back to Spark.") | ||
|
|
||
| // ... convert ... | ||
| } | ||
| ``` | ||
|
|
||
| ##### Example: Enumerating supported values | ||
|
|
||
| When the expression only supports a known set of argument values, build the reason from that set so the documentation stays in sync with the code: | ||
|
|
||
| ```scala | ||
| override def getUnsupportedReasons(): Seq[String] = Seq( | ||
| "Only the following formats are supported:" + | ||
| supportedFormats.keys.toSeq.sorted | ||
| .map(k => s"`$k`") | ||
| .mkString("\n - ", "\n - ", "")) | ||
| ``` | ||
|
|
||
| #### Adding Spark-side Tests for the New Expression | ||
|
|
||
| It is important to verify that the new expression is correctly recognized by the native execution engine and matches the expected Spark behavior. The preferred way to add test coverage is to write a SQL test file using the SQL file test framework. This approach is simpler than writing Scala test code and makes it easy to cover many input combinations and edge cases. | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -31,6 +31,7 @@ Compatibility notes are grouped by expression category: | |
| aggregate | ||
| array | ||
| datetime | ||
| math | ||
| struct | ||
| cast | ||
| ``` | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,23 @@ | ||
| <!--- | ||
| Licensed to the Apache Software Foundation (ASF) under one | ||
| or more contributor license agreements. See the NOTICE file | ||
| distributed with this work for additional information | ||
| regarding copyright ownership. The ASF licenses this file | ||
| to you under the Apache License, Version 2.0 (the | ||
| "License"); you may not use this file except in compliance | ||
| with the License. You may obtain a copy of the License at | ||
|
|
||
| http://www.apache.org/licenses/LICENSE-2.0 | ||
|
|
||
| Unless required by applicable law or agreed to in writing, | ||
| software distributed under the License is distributed on an | ||
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
| KIND, either express or implied. See the License for the | ||
| specific language governing permissions and limitations | ||
| under the License. | ||
| --> | ||
|
|
||
| # Math Expressions | ||
|
|
||
| <!--BEGIN:EXPR_COMPAT[math]--> | ||
| <!--END:EXPR_COMPAT--> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would be great to show what are levels of support?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
addressed in #4067