feat: add review-datafusion-pr Claude Code skill#3974

Open
andygrove wants to merge 3 commits into apache:main from andygrove:feat/review-datafusion-pr-skill

Conversation

@andygrove
Member

Which issue does this PR close?

Closes #.

Rationale for this change

We already have a review-comet-pr skill that helps reviewers check PRs in this repo for Spark compatibility and implementation correctness. A similar workflow applies when reviewing PRs in the upstream apache/datafusion repository, particularly for the datafusion-spark compatible function library and for core DataFusion changes that may affect Comet.

The upstream repo has a different test approach. It uses .slt (sqllogictest) files written in DataFusion SQL syntax, so the tests cannot be run directly in Spark. A reviewer needs to manually run equivalent queries in Spark to verify that the DataFusion implementation produces the same result.
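For context, a sqllogictest case pairs each query with its expected output after a `----` separator. A minimal sketch, using `expm1` as an illustrative function (the exact result formatting should be verified against the existing files under datafusion/sqllogictest/test_files/spark/):

```
# sqllogictest (.slt) syntax: result type signature, query, `----`, expected rows
query R
SELECT expm1(1::DOUBLE);
----
1.718281828459045
```

Because this runs only against DataFusion, nothing in the file itself proves Spark produces the same value — hence the manual cross-check step below.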

This skill packages that workflow so it is consistent across reviews and so new reviewers have a concrete checklist to follow.

What changes are included in this PR?

Adds a new Claude Code skill at .claude/skills/review-datafusion-pr/SKILL.md. The skill covers:

  • PR classification into a Spark expression track, a Comet API impact track, or both
  • Reading the Spark source and Spark tests as the canonical reference for expression behavior
  • Reviewing the Rust implementation under datafusion/spark/src/function/
  • Reviewing the .slt test file against the testing guide in datafusion/sqllogictest/test_files/spark/README.md
  • A manual Spark cross-check step with translation notes from DataFusion SQL to Spark SQL, since .slt tests cannot prove Spark equivalence on their own
  • A checklist for breaking API changes in the DataFusion crates that Comet depends on (datafusion, datafusion-datasource, datafusion-physical-expr-adapter, datafusion-spark)
  • CI status, documentation, and common review issues
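As one concrete example of the translation step, the same check expressed in each dialect might look like this (the function choice is illustrative; Spark SQL accepts `CAST(... AS ...)` everywhere, while the `.slt` files typically use the `::` shorthand):

```sql
-- DataFusion SQL, as it would appear in an .slt file
SELECT expm1(1::DOUBLE);

-- Spark SQL equivalent, run manually in spark-sql or spark-shell to cross-check
SELECT expm1(CAST(1 AS DOUBLE));
```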

How are these changes tested?

Manual review of the skill content. The skill is guidance for human reviewers and is not executed by CI.

@andygrove andygrove marked this pull request as ready for review April 17, 2026 13:23
@parthchandra
Contributor

Would the DataFusion repo be a better place for this? (I've no objection to having it in the Comet repo too).

@andygrove
Member Author

> Would the DataFusion repo be a better place for this? (I've no objection to having it in the Comet repo too).

It's a question worth discussing. The skill is reviewing specifically for any impact on the Comet project.

@parthchandra
Contributor

> > Would the DataFusion repo be a better place for this? (I've no objection to having it in the Comet repo too).
>
> It's a question worth discussing. The skill is reviewing specifically for any impact on the Comet project.

My point exactly. DF reviewers may not always remember to consider the impact on Comet, and this would certainly be useful for them and invaluable for us.

- [ ] Both scalar and array inputs are exercised (the README requires this)
- [ ] All accepted Spark input types are tested with explicit casts (`0::INT`, `0::BIGINT`, etc.) — DataFusion and Spark do not infer types the same way
- [ ] Null input is tested
- [ ] Edge cases: empty string, boundary values (e.g., `INT_MIN`), `NaN`, `Infinity`, `-0.0`, negative values for numeric functions
Contributor


Suggested change

Before:
- [ ] Edge cases: empty string, boundary values (e.g., `INT_MIN`), `NaN`, `Infinity`, `-0.0`, negative values for numeric functions

After:
- [ ] Edge cases: empty string, boundary values (e.g., `INT_MIN`), `NaN`, `Infinity`, `-0.0`, `-Infinity`, `+0.0`, negative values for numeric functions
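A sketch of what the cast and edge-case coverage could look like in practice, again using `expm1` as an illustrative function (the expected values follow IEEE 754 semantics; whether integer inputs are accepted, and the exact result formatting, should be checked against the existing spark test files):

```
# explicit cast per accepted input type
query R
SELECT expm1(0::INT);
----
0

# null propagation
query R
SELECT expm1(NULL);
----
NULL

# an IEEE special value: expm1(-Infinity) = -1
query R
SELECT expm1('-Infinity'::DOUBLE);
----
-1
```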

- [ ] Edge cases: empty string, boundary values (e.g., `INT_MIN`), `NaN`, `Infinity`, `-0.0`, negative values for numeric functions
- [ ] ANSI mode behavior is wrapped in `set datafusion.execution.enable_ansi_mode = true/false` pairs where Spark differs between modes
- [ ] Test only contains `SELECT` statements for the function under test, with no unrelated setup
- [ ] Header comments cite the upstream source if ported (the existing files show the pattern)
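The ANSI-mode pairing in the checklist might be structured along these lines (a sketch only; the setting name comes from the checklist item, and the queries themselves depend on the function under review):

```
statement ok
set datafusion.execution.enable_ansi_mode = true;

# queries exercising the ANSI-on behavior go here

statement ok
set datafusion.execution.enable_ansi_mode = false;

# the same queries, with the ANSI-off expectations, go here
```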
Contributor


For nested types, empty values should also be tested where applicable, e.g. an empty array, an empty map, and a mix of empty and non-empty entries.
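To make that concrete, the extra rows could look something like this (the function names are placeholders; whichever nested-type function is under review would be substituted):

```sql
-- empty array and empty map inputs
SELECT some_array_fn(array());
SELECT some_map_fn(map());

-- a mix of empty and non-empty entries
SELECT some_array_fn(x) FROM (VALUES (array()), (array(1, 2))) AS t(x);
```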

