Need for Customizable Plan Annotation and Network Boundary Logic in DFD #347

@gabrielkerr

Summary:
This issue aims to start a discussion around improving extensibility in datafusion-distributed, especially for custom plan annotations and network boundaries. I would appreciate insights from the DataFusion community on potential design directions and best practices.


Challenge and Motivation

I believe there is significant value in expanding the extensibility of datafusion-distributed (DFD). The project’s core strengths—plan annotation, insertion of network boundaries, and distribution of sub‑plans to workers—make it a natural place for more flexible customization.

My colleagues and I have been working toward implementing custom network boundaries and plan annotations in a fork of DFD. The use case involves inserting multiple ExecutionPlan nodes instead of relying solely on NetworkShuffleExec, NetworkCoalesceExec, or NetworkBroadcastExec. In practice, this requires a mechanism to introduce custom plan annotations and network boundaries beyond what DFD currently supports.

An initial attempt at introducing this extensibility can be found in this draft PR by @kurtvolmar: kurtvolmar#1.

However, given that DataFusion itself already provides many extension points, it may not be ideal for DFD to introduce a separate, parallel extensibility framework. Instead, it seems likely that a DataFusion‑native mechanism could be leveraged to support more flexible behavior within DFD.


Proposal and Call for Collaboration

The existing DistributedPhysicalOptimizerRule encapsulates a substantial amount of logic. One possible direction would be to decompose this rule into smaller, more focused components—such as a PlanAnnotationRule and NetworkBoundaryRule—and expose configuration or hooks that allow users to implement custom logic where needed.
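
To make the idea concrete, here is a minimal sketch of the split using only the Rust standard library and a toy plan tree. Everything in it is hypothetical and for discussion only: `Node` stands in for DataFusion's `ExecutionPlan`, the `PlanAnnotator` and `BoundaryBuilder` traits are illustrative names rather than existing DFD or DataFusion APIs, and `CustomEncodeExec` / `CustomNetworkExec` are made-up node names. It only shows the shape of the decomposition: one component decides where a boundary belongs, and another decides which node(s) to insert there.

```rust
/// Toy stand-in for a physical plan node (NOT DataFusion's `ExecutionPlan`).
#[derive(Debug, Clone, PartialEq)]
struct Node {
    name: String,
    children: Vec<Node>,
}

impl Node {
    fn new(name: impl Into<String>, children: Vec<Node>) -> Self {
        Node { name: name.into(), children }
    }
}

/// Hypothetical hook 1: decides whether a boundary belongs above `node`,
/// and if so returns a label describing it.
trait PlanAnnotator {
    fn annotate(&self, node: &Node) -> Option<String>;
}

/// Hypothetical hook 2: turns a label into concrete plan nodes.
/// It may wrap `input` in more than one node.
trait BoundaryBuilder {
    fn build(&self, label: &str, input: Node) -> Node;
}

/// Example annotator: mark every repartition as a shuffle boundary.
struct RepartitionAnnotator;

impl PlanAnnotator for RepartitionAnnotator {
    fn annotate(&self, node: &Node) -> Option<String> {
        (node.name == "RepartitionExec").then(|| "shuffle".to_string())
    }
}

/// Example custom builder: inserts two nodes per boundary instead of one.
struct TwoStageBoundary;

impl BoundaryBuilder for TwoStageBoundary {
    fn build(&self, label: &str, input: Node) -> Node {
        let encode = Node::new("CustomEncodeExec", vec![input]);
        Node::new(format!("CustomNetworkExec({label})"), vec![encode])
    }
}

/// Bottom-up rewrite: children first, then ask the annotator about the
/// rebuilt node and let the builder wrap it if a label comes back.
fn distribute(plan: Node, annotator: &dyn PlanAnnotator, builder: &dyn BoundaryBuilder) -> Node {
    let children = plan
        .children
        .into_iter()
        .map(|child| distribute(child, annotator, builder))
        .collect();
    let node = Node { name: plan.name, children };
    match annotator.annotate(&node) {
        Some(label) => builder.build(&label, node),
        None => node,
    }
}

/// Pre-order list of node names, handy for inspecting the result.
fn names(plan: &Node) -> Vec<String> {
    let mut out = vec![plan.name.clone()];
    for child in &plan.children {
        out.extend(names(child));
    }
    out
}
```

Calling `distribute` on an `AggregateExec -> RepartitionExec -> ScanExec` chain with these two implementations wraps the repartition in `CustomNetworkExec(shuffle)` and `CustomEncodeExec`. In a real design the two hooks could just as well be separate DataFusion physical optimizer rules that communicate through the plan itself; the trait pair here is only one possible shape.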

Community input would be highly valuable, particularly around:

  • Whether splitting DistributedPhysicalOptimizerRule into smaller, pluggable rules aligns with the project’s direction.
  • Alternative approaches in DataFusion that could enable the desired extensibility without modifying DFD directly.
  • Prior art or patterns in the DataFusion ecosystem that could help inform a clean design.

Feedback, suggestions, or discussion from maintainers and contributors would be greatly appreciated. The goal is to collaborate on a design that increases flexibility without adding unnecessary complexity to DFD.

cc: @gabotechs
