Comet replaces Spark operators and expressions with native Rust implementations that run on Apache DataFusion.
It uses Apache Arrow for zero-copy data transfer between the JVM and native code.
- **Parquet scans**: native Parquet reader integrated with Spark's query planner
- **Apache Iceberg**: accelerated Parquet scans when reading Iceberg tables from Spark
  (see the [Iceberg guide](https://datafusion.apache.org/comet/user-guide/iceberg.html))
- **Shuffle**: native columnar shuffle with support for hash and range partitioning
- **Expressions**: hundreds of supported Spark expressions across math, string, datetime, array,
  map, JSON, hash, and predicate categories
- **Aggregations**: hash aggregate with support for `FILTER (WHERE ...)` clauses
- **Joins**: hash join, sort-merge join, and broadcast join
For the authoritative lists, see the [supported expressions](https://datafusion.apache.org/comet/user-guide/expressions.html)
and [supported operators](https://datafusion.apache.org/comet/user-guide/operators.html) pages.
## Drop-In Integration
Comet is designed as a drop-in accelerator for Apache Spark, allowing you to integrate Comet into your existing
Spark deployments and workflows seamlessly. With no code changes required, you can immediately harness the
benefits of Comet's acceleration capabilities without disrupting your Spark applications.
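One way to confirm that the plugin is actually rewriting your query plans is to run `EXPLAIN` with the plugin
enabled. The sketch below is illustrative rather than a documented workflow: the jar path and the `orders`
table are placeholders, and the exact `Comet*` operator names in the plan vary by version.

```shell
# Illustrative check that Comet rewrote the physical plan; assumes Comet's jar
# is available (see the Getting Started section below for configuration).
$SPARK_HOME/bin/spark-sql \
  --jars /path/to/comet.jar \
  --conf spark.plugins=org.apache.spark.CometPlugin \
  --conf spark.comet.enabled=true \
  -e "EXPLAIN SELECT o_orderstatus, count(*) FROM orders GROUP BY o_orderstatus"
# Accelerated parts of the plan appear as Comet-prefixed nodes
# (e.g. CometScan, CometHashAggregate) in the printed output.
```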
## Getting Started
Comet supports Apache Spark 3.4 and 3.5, and provides experimental support for Spark 4.0. See the
[installation guide](https://datafusion.apache.org/comet/user-guide/installation.html) for the detailed
version, Java, and Scala compatibility matrix.
Install Comet by adding the jar for your Spark and Scala version to the Spark classpath and enabling the plugin.
A typical configuration looks like:
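The sketch below assumes Spark 3.5 with Scala 2.12; the jar name, version, off-heap memory size, and
application jar are placeholders. Check the installation guide for the exact settings recommended for your
Spark version.

```shell
# Sketch of a spark-submit invocation with Comet enabled. The jar path is a
# placeholder -- use the artifact matching your Spark and Scala versions.
spark-submit \
  --jars /path/to/comet-spark-spark3.5_2.12-x.y.z.jar \
  --conf spark.plugins=org.apache.spark.CometPlugin \
  --conf spark.comet.enabled=true \
  --conf spark.comet.exec.enabled=true \
  --conf spark.shuffle.manager=org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager \
  --conf spark.comet.exec.shuffle.enabled=true \
  --conf spark.memory.offHeap.enabled=true \
  --conf spark.memory.offHeap.size=16g \
  my-app.jar
```

Comet's native execution uses off-heap memory, which is why the off-heap settings are enabled alongside the
plugin itself.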