feat: add review-datafusion-pr Claude Code skill #3974
andygrove wants to merge 3 commits into apache:main from
Would the DataFusion repo be a better place for this? (I have no objection to having it in the Comet repo too.)
It's a question worth discussing. The skill reviews specifically for any impact on the Comet project.
My point exactly. DF reviewers may not always remember to consider the impact on Comet, so this would certainly be useful for them and invaluable for us.
- [ ] Both scalar and array inputs are exercised (the README requires this)
- [ ] All accepted Spark input types are tested with explicit casts (`0::INT`, `0::BIGINT`, etc.) — DataFusion and Spark do not infer types the same way
- [ ] Null input is tested
- [ ] Edge cases: empty string, boundary values (e.g., `INT_MIN`), `NaN`, `Infinity`, `-0.0`, negative values for numeric functions
Suggested change:
- [ ] Edge cases: empty string, boundary values (e.g., `INT_MIN`), `NaN`, `Infinity`, `-0.0`, negative values for numeric functions
- [ ] Edge cases: empty string, boundary values (e.g., `INT_MIN`), `NaN`, `Infinity`, `-0.0`, `-Infinity`, `+0.0`, negative values for numeric functions
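To illustrate why the suggestion lists `-Infinity` and `+0.0` separately, here is a minimal Python sketch (not part of the skill; the `describe` helper is hypothetical). It shows that plain equality cannot tell the two zeros apart and that `NaN` never equals itself, so a test that covers only one of each pair can miss a sign-handling difference.

```python
import math

# Floating-point edge cases named in the suggested checklist line.
EDGE_CASES = [float("nan"), float("inf"), float("-inf"), -0.0, 0.0]


def describe(x: float) -> str:
    """Return a label that separates values which `==` treats as equal or unordered."""
    if math.isnan(x):
        return "NaN"
    if math.isinf(x):
        return "Infinity" if x > 0 else "-Infinity"
    if x == 0.0:
        # copysign exposes the sign bit that `==` ignores for zeros.
        return "-0.0" if math.copysign(1.0, x) < 0 else "+0.0"
    return repr(x)


labels = [describe(x) for x in EDGE_CASES]
print(labels)
```

Whether a given function should preserve the sign of zero is a question to settle against Spark's actual output, not against this sketch.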
- [ ] Edge cases: empty string, boundary values (e.g., `INT_MIN`), `NaN`, `Infinity`, `-0.0`, negative values for numeric functions
- [ ] ANSI mode behavior is wrapped in `set datafusion.execution.enable_ansi_mode = true/false` pairs where Spark differs between modes
- [ ] Test only contains `SELECT` statements for the function under test, with no unrelated setup
- [ ] Header comments cite the upstream source if ported (the existing files show the pattern)
For nested types, empty values also need to be tested where applicable: an empty array, an empty map, and a mix of empty and non-empty entries.
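The ANSI-mode checklist item quoted above could in principle be checked mechanically. The following is a rough Python sketch, not something the skill ships: the function name `unbalanced_ansi_toggles` is hypothetical, and it assumes that each `set ... = true` should be followed by a matching `= false` before the file ends so the setting does not leak into later tests.

```python
import re

# Matches `set datafusion.execution.enable_ansi_mode = true|false`, optional `;`.
ANSI_SET = re.compile(
    r"^\s*set\s+datafusion\.execution\.enable_ansi_mode\s*=\s*(true|false)\s*;?\s*$",
    re.IGNORECASE,
)


def unbalanced_ansi_toggles(slt_text: str) -> list:
    """Return 1-based line numbers of `= true` statements never reset to `= false`."""
    open_line = None  # line of the most recent unmatched `= true`
    problems = []
    for lineno, line in enumerate(slt_text.splitlines(), start=1):
        match = ANSI_SET.match(line)
        if not match:
            continue
        if match.group(1).lower() == "true":
            if open_line is None:
                open_line = lineno
        else:
            open_line = None  # a `= false` closes the pair
    if open_line is not None:
        problems.append(open_line)
    return problems


balanced = """statement ok
set datafusion.execution.enable_ansi_mode = true;

query I
SELECT abs(-1::INT);
----
1

statement ok
set datafusion.execution.enable_ansi_mode = false;
"""

leaky = """statement ok
set datafusion.execution.enable_ansi_mode = true;

query I
SELECT abs(-1::INT);
----
1
"""

print(unbalanced_ansi_toggles(balanced), unbalanced_ansi_toggles(leaky))
```

A check like this only confirms that the toggles are paired; deciding which queries belong inside the pair still requires knowing where Spark's behavior differs between modes.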
Which issue does this PR close?
Closes #.
Rationale for this change
We already have a `review-comet-pr` skill that helps reviewers check PRs in this repo for Spark compatibility and implementation correctness. A similar workflow applies when reviewing PRs in the upstream `apache/datafusion` repository, particularly for the `datafusion-spark` compatible function library and for core DataFusion changes that may affect Comet.

The upstream repo has a different test approach. It uses `.slt` (sqllogictest) files written in DataFusion SQL syntax, so the tests cannot be run directly in Spark. A reviewer needs to manually run equivalent queries in Spark to verify that the DataFusion implementation produces the same result.

This skill packages that workflow so it is consistent across reviews and so new reviewers have a concrete checklist to follow.
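As a rough illustration of the manual step described above, the Python sketch below pulls the `query` records out of sqllogictest text and rewrites simple `literal::TYPE` casts as `CAST(literal AS TYPE)`, so the statements can be pasted into a Spark SQL session. It is not part of the skill. The helper names are hypothetical, it assumes records are separated by blank lines and that only simple numeric or string literals use the `::` shorthand, and it assumes the Spark version used for verification may not accept that shorthand.

```python
import re

# `literal::TYPE`, where literal is a number or a single-quoted string.
# Assumption: the tests under review only cast simple literals this way.
CAST = re.compile(r"(?<![\w.])(-?\d+(?:\.\d+)?|'[^']*')::([A-Za-z_]+)")


def extract_queries(slt_text):
    """Yield (sql, expected_lines) for each `query` record in sqllogictest text."""
    for record in re.split(r"\n\s*\n", slt_text.strip()):
        # Drop comment lines; the first remaining line is the record header.
        lines = [ln for ln in record.splitlines() if not ln.startswith("#")]
        if not lines or not lines[0].startswith("query"):
            continue
        body = lines[1:]
        if "----" in body:
            sep = body.index("----")
            sql, expected = body[:sep], body[sep + 1:]
        else:
            sql, expected = body, []
        yield " ".join(part.strip() for part in sql), expected


def to_cast_syntax(sql):
    """Rewrite `x::TYPE` shorthand as `CAST(x AS TYPE)`."""
    return CAST.sub(r"CAST(\1 AS \2)", sql)


sample = """# hypothetical test content
query I
SELECT abs(-1::INT);
----
1

query R
SELECT abs(-0.5::DOUBLE);
----
0.5
"""

for_spark = [(to_cast_syntax(sql), expected) for sql, expected in extract_queries(sample)]
for sql, expected in for_spark:
    print(sql, "-- DataFusion expects:", expected)
```

The expected values come from the `.slt` file, so they show what DataFusion returns; the reviewer still has to run each rewritten statement in Spark and compare the results by hand.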
What changes are included in this PR?
Adds a new Claude Code skill at `.claude/skills/review-datafusion-pr/SKILL.md`. The skill covers:
- […] `datafusion/spark/src/function/`
- […] `.slt` test file against the testing guide in `datafusion/sqllogictest/test_files/spark/README.md`
- […] `.slt` tests cannot prove Spark equivalence on their own
- […] (`datafusion`, `datafusion-datasource`, `datafusion-physical-expr-adapter`, `datafusion-spark`)
How are these changes tested?
Manual review of the skill content. The skill is guidance for human reviewers and is not executed by CI.