Comet Roadmap

Comet is an open-source project and contributors are welcome to work on any issues at any time, but we find it helpful to have a roadmap for some of the major items that require coordination between contributors.

Major Initiatives

Iceberg Integration

Reads of Iceberg tables with Parquet data files are fully native and enabled by default, powered by a scan operator backed by Iceberg-rust (#2528). We anticipate major improvements in the next few releases, including bringing Iceberg table format V3 features (e.g., encryption) to the reader.

Spark 4.0 Support

Comet fully supports Spark 4.0. There is ongoing work to fully implement ANSI support (#313) for all supported expressions and to address remaining Spark 4.0-specific limitations.

Dynamic Partition Pruning

Iceberg table scans support Dynamic Partition Pruning (DPP) filters generated by Spark's PlanDynamicPruningFilters optimizer rule (#3349). However, we still need to bring this functionality to our Parquet reader. Furthermore, Spark's PlanAdaptiveDynamicPruningFilters optimizer rule runs after Comet's rules, so supporting DPP with Adaptive Query Execution requires a redesign of Comet's plan translation. We are focused on implementing DPP to keep Comet competitive on benchmarks that benefit from this feature, such as TPC-DS. This effort can be tracked at #3510.
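
For readers unfamiliar with DPP, the idea can be illustrated with a small conceptual sketch. This is not Comet's or Spark's implementation: the tables, column names, and helper functions (`collect_pruning_keys`, `scan_fact`) are hypothetical and exist only to show why pruning partitions with join keys gathered at runtime reduces the amount of data scanned.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical fact table, partitioned by date_key.
# Maps partition value -> rows of (date_key, amount).
FACT_PARTITIONS: Dict[int, List[Tuple[int, float]]] = {
    20240101: [(20240101, 10.0), (20240101, 5.0)],
    20240102: [(20240102, 7.5)],
    20240103: [(20240103, 3.0), (20240103, 1.0)],
}

# Hypothetical dimension table: rows of (date_key, is_holiday).
DIM_DATES: List[Tuple[int, bool]] = [
    (20240101, True),
    (20240102, False),
    (20240103, False),
]


def collect_pruning_keys(dim_rows: List[Tuple[int, bool]]) -> Set[int]:
    """Evaluate the dimension-side filter first and collect the join keys
    that survive it. These keys act as the dynamic pruning filter."""
    return {key for key, is_holiday in dim_rows if is_holiday}


def scan_fact(
    partitions: Dict[int, List[Tuple[int, float]]],
    keep_keys: Optional[Set[int]] = None,
) -> Tuple[List[int], List[Tuple[int, float]]]:
    """Scan the fact table, skipping partitions whose key is not in
    keep_keys. With keep_keys=None, every partition is read."""
    scanned: List[int] = []
    rows: List[Tuple[int, float]] = []
    for key, part_rows in partitions.items():
        if keep_keys is not None and key not in keep_keys:
            continue  # partition pruned: its files are never opened
        scanned.append(key)
        rows.extend(part_rows)
    return scanned, rows


def join_total(rows: List[Tuple[int, float]], keys: Set[int]) -> float:
    """Sum the amounts of fact rows that match a surviving dimension key."""
    return sum(amount for key, amount in rows if key in keys)


keys = collect_pruning_keys(DIM_DATES)
pruned_scan, pruned_rows = scan_fact(FACT_PARTITIONS, keys)
full_scan, full_rows = scan_fact(FACT_PARTITIONS)
```

In this sketch both scans produce the same join result, but the pruned scan reads one partition instead of three. The pruning keys are only known once the dimension side has been evaluated, which is why the order in which optimizer rules run matters for this feature.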

Ongoing Improvements

In addition to the major initiatives above, we have the following ongoing areas of work:

  • Adding support for more Spark expressions
  • Moving more expressions to the datafusion-spark crate in the core DataFusion repository
  • Performance tuning
  • Nested type support improvements