Skip to content

test: [Spark 4.1.1] unignore SPARK-52921 union partitioning tests#4195

Merged
mbutrovich merged 2 commits intoapache:mainfrom
andygrove:issue-4191
May 3, 2026
Merged

test: [Spark 4.1.1] unignore SPARK-52921 union partitioning tests#4195
mbutrovich merged 2 commits intoapache:mainfrom
andygrove:issue-4191

Conversation

@andygrove
Copy link
Copy Markdown
Member

Which issue does this PR close?

Closes #4191 (sub-issue of #4098).

Rationale for this change

Three Spark 4.1 tests in `DataFrameSetOperationsSuite` were ignored under Comet:

  • `SPARK-52921: union partitioning - reused shuffle`
  • `SPARK-52921: union partitioning - semantic equality`
  • `SPARK-52921: union partitioning - range partitioning`

The tests inspect the executed plan with strict pattern matches:

```scala
case u: UnionExec => u
case s: ShuffleExchangeExec => s
```

Under Comet, `UnionExec` is replaced by `CometUnionExec` (extends `CometExec`, not `UnionExec`) and `ShuffleExchangeExec` is replaced by `CometShuffleExchangeExec` (extends `ShuffleExchangeLike`, the trait both implementations share). The collectors found zero operators, the `size == 1` assertions failed, and the IgnoreComet was added pointing at the umbrella tracking issue #4098.

What changes are included in this PR?

Patch the matchers in `dev/diffs/4.1.1.diff` so the tests recognize Comet's wrappers:

  • `case s: ShuffleExchangeExec` → `case s: ShuffleExchangeLike` (one trait, matches both impls).
  • `case u: UnionExec` → also match `case u: CometUnionExec` (no shared parent, so two cases).

Both are valid for vanilla Spark too: `ShuffleExchangeExec` extends `ShuffleExchangeLike`, and the additional `CometUnionExec` case is simply unreachable when Comet is disabled.

How are these changes tested?

The fix is test-side only (no production code change). The partitioning equality assertions still hold under Comet because:

  • `CometShuffleExchangeExec.apply` (in `ShimCometShuffleExchangeExec`) sets `outputPartitioning = wrapped.outputPartitioning`, preserving the original shuffle's partitioning.
  • `CometUnionExec.outputPartitioning` delegates to `originalPlan.outputPartitioning` (the wrapped `UnionExec`), which honors `UNION_OUTPUT_PARTITIONING` and computes against its original Spark children — so the SPARK-52921 semantics are preserved end-to-end.

The Spark SQL CI workflow will exercise the un-ignored tests on Spark 4.1.1.

Closes apache#4191.

The three SPARK-52921 union output partitioning tests in
DataFrameSetOperationsSuite were tagged with IgnoreComet because they
collect plan operators with strict pattern matches:

  case u: UnionExec => u
  case s: ShuffleExchangeExec => s

Comet replaces UnionExec with CometUnionExec (extends CometExec, not
UnionExec) and ShuffleExchangeExec with CometShuffleExchangeExec
(extends ShuffleExchangeLike, the trait both impls share). The matchers
found zero operators and the size assertions failed.

Update the matchers in dev/diffs/4.1.1.diff:

- ShuffleExchangeExec -> ShuffleExchangeLike (matches both Spark's
  ShuffleExchangeExec and Comet's CometShuffleExchangeExec).
- UnionExec -> match UnionExec OR CometUnionExec; both expose
  outputPartitioning via SparkPlan.

CometShuffleExchangeExec.apply (ShimCometShuffleExchangeExec) sets
outputPartitioning = wrapped.outputPartitioning, and CometUnionExec
delegates outputPartitioning to its originalPlan (the wrapped
UnionExec), so the partitioning equality assertions in the SPARK-52921
tests still hold under Comet.
@andygrove andygrove changed the title test: unignore SPARK-52921 union partitioning tests test: [Spark 4.1.1] unignore SPARK-52921 union partitioning tests May 3, 2026
@andygrove andygrove marked this pull request as draft May 3, 2026 17:59
Spark's strict-mode scalac (-Wconf cat=unused-imports as fatal warning) failed
the build because two of the imports added in the previous commit are no
longer referenced after switching all matchers to ShuffleExchangeLike:

- CometShuffleExchangeExec: unused (no `case _: CometShuffleExchangeExec`)
- ShuffleExchangeExec: unused (only ShuffleExchangeLike is matched)

Drop both from the diff so the import block matches what the test actually
references.
@andygrove andygrove marked this pull request as ready for review May 3, 2026 23:29
Copy link
Copy Markdown
Contributor

@mbutrovich mbutrovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to see more tests running, thanks @andygrove!

@mbutrovich mbutrovich merged commit 1a6cd98 into apache:main May 3, 2026
192 of 193 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Spark 4.1: SPARK-52921 union output partitioning tests fail because Comet replaces UnionExec/ShuffleExchangeExec

2 participants