CometSqlFileTestSuite is a test suite that automatically discovers .sql test files and
runs each query through both Spark and Comet, comparing results. This provides a lightweight
way to add expression and operator test coverage without writing Scala test code.
Run all SQL file tests:

```shell
./mvnw test -Dsuites="org.apache.comet.CometSqlFileTestSuite" -Dtest=none
```

Run a single test file by adding the file name (without the `.sql` extension) after the suite name:

```shell
./mvnw test -Dsuites="org.apache.comet.CometSqlFileTestSuite create_named_struct" -Dtest=none
```

This uses ScalaTest's substring matching, so the argument must match part of the test name.
Test names follow the pattern `sql-file: expressions/<category>/<file>.sql [<config>]`.
SQL test files live under:

```
spark/src/test/resources/sql-tests/expressions/
```
Files are organized into category subdirectories:

```
expressions/
  aggregate/    -- avg, sum, count, min_max, ...
  array/        -- array_contains, array_append, get_array_item, ...
  bitwise/
  cast/
  conditional/  -- case_when, coalesce, if_expr, ...
  datetime/     -- date_add, date_diff, unix_timestamp, ...
  decimal/
  hash/
  map/          -- get_map_value, map_keys, map_values, ...
  math/         -- abs, ceil, floor, round, sqrt, ...
  misc/         -- width_bucket, scalar_subquery, ...
  string/       -- concat, like, substring, lower, upper, ...
  struct/       -- create_named_struct, get_struct_field, ...
```
The test suite recursively discovers all .sql files in these directories. Each file becomes
one or more ScalaTest test cases.
A test file consists of SQL comments, directives, statements, and queries separated by blank lines. Here is a minimal example:

```sql
statement
CREATE TABLE test_abs(v double) USING parquet

statement
INSERT INTO test_abs VALUES (1.5), (-2.5), (0.0), (NULL)

query
SELECT abs(v) FROM test_abs
```

Directives are SQL comments at the top of the file that configure how the test runs.
The `Config` directive sets a Spark SQL config for all queries in the file:

```sql
-- Config: spark.sql.ansi.enabled=true
```

The `ConfigMatrix` directive runs the entire file once per combination of values. Multiple `ConfigMatrix` lines produce a cross product of all combinations.

```sql
-- ConfigMatrix: spark.sql.optimizer.inSetConversionThreshold=100,0
```

This generates two test cases:

```
sql-file: expressions/conditional/in_set.sql [spark.sql.optimizer.inSetConversionThreshold=100]
sql-file: expressions/conditional/in_set.sql [spark.sql.optimizer.inSetConversionThreshold=0]
```
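With more than one `ConfigMatrix` line the combinations multiply. For example, the following sketch (reusing config keys that appear elsewhere in this guide) would generate four test cases, one per combination of the two values for each key:

```sql
-- ConfigMatrix: spark.sql.ansi.enabled=true,false
-- ConfigMatrix: spark.sql.optimizer.inSetConversionThreshold=100,0
```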
Only add a ConfigMatrix directive when there is a real reason to run the test under
multiple configurations. Do not add ConfigMatrix directives speculatively.
The `MinSparkVersion` directive skips the file when running on a Spark version older than the specified version:

```sql
-- MinSparkVersion: 3.5
```

A `statement` block executes DDL or DML and does not check results. Use this for CREATE TABLE and INSERT commands. Table names are automatically extracted for cleanup after the test.
```sql
statement
CREATE TABLE my_table(x int, y double) USING parquet

statement
INSERT INTO my_table VALUES (1, 2.0), (3, 4.0), (NULL, NULL)
```

A `query` block executes a SELECT and compares results between Spark and Comet. The query mode controls how results are validated.
The default `query` mode checks that the query runs natively on Comet (not falling back to Spark) and that results match Spark exactly.

```sql
query
SELECT abs(v) FROM test_abs
```

The `spark_answer_only` mode only checks that Comet results match Spark. It does not assert that the query runs natively. Use this for expressions that Comet may not fully support yet but should still produce correct results.
```sql
query spark_answer_only
SELECT some_expression(v) FROM test_table
```

The `tolerance` mode checks results with a numeric tolerance. This is useful for floating-point functions where small differences are acceptable.
```sql
query tolerance=0.0001
SELECT cos(v) FROM test_trig
```

The `expect_fallback` mode asserts that the query falls back to Spark and verifies that the fallback reason contains the given string.
```sql
query expect_fallback(unsupported expression)
SELECT unsupported_func(v) FROM test_table
```

The `ignore` mode skips the query entirely. Use this for queries that hit known bugs. The reason should be a link to the tracking GitHub issue.
```sql
-- Comet bug: space(-1) causes native crash
query ignore(https://github.com/apache/datafusion-comet/issues/3326)
SELECT space(n) FROM test_space WHERE n < 0
```

The `expect_error` mode asserts that both Spark and Comet throw an exception containing the given pattern. Use this for ANSI mode tests where invalid operations should throw errors.
```sql
-- Config: spark.sql.ansi.enabled=true

-- integer overflow should throw in ANSI mode
query expect_error(ARITHMETIC_OVERFLOW)
SELECT 2147483647 + 1

-- division by zero should throw in ANSI mode
query expect_error(DIVIDE_BY_ZERO)
SELECT 1 / 0

-- array out of bounds should throw in ANSI mode
query expect_error(INVALID_ARRAY_INDEX)
SELECT array(1, 2, 3)[10]
```

To add a new test file:
- Create a `.sql` file under the appropriate subdirectory in `spark/src/test/resources/sql-tests/expressions/`. Create a new subdirectory if no existing category fits.
- Add the Apache license header as a SQL comment.
- Add a `ConfigMatrix` directive only if the test needs to run under multiple configurations (e.g., testing behavior that varies with a specific Spark config). Do not add `ConfigMatrix` directives speculatively.
- Create tables and insert test data using `statement` blocks. Include edge cases such as `NULL`, boundary values, and negative numbers.
- Add `query` blocks for each expression or behavior to test. Use the default `query` mode when you expect Comet to run the expression natively. Use `query spark_answer_only` when native execution is not yet expected.
- Run the tests to verify:

  ```shell
  ./mvnw test -Dsuites="org.apache.comet.CometSqlFileTestSuite" -Dtest=none
  ```
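Putting these steps together, a minimal new test file might look like the following sketch (the `reverse` function and the `test_reverse` table are illustrative choices, not an existing file):

```sql
-- (Apache license header goes here as SQL comments)

statement
CREATE TABLE test_reverse(s string) USING parquet

statement
INSERT INTO test_reverse VALUES ('hello'), (''), (NULL), ('中文')

query
SELECT reverse(s) FROM test_reverse
```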
Comet often uses different code paths for literal values versus column references. Tests should exercise both. For a function with multiple arguments, test every useful combination.
For a single-argument function, test both a column reference and a literal:
```sql
-- column argument (reads from Parquet, goes through columnar evaluation)
query
SELECT ascii(s) FROM test_ascii

-- literal arguments
query
SELECT ascii('A'), ascii(''), ascii(NULL)
```

For a multi-argument function like `concat_ws(sep, str1, str2, ...)`, test with the separator as a column versus a literal, and similarly for the other arguments:
```sql
-- all columns
query
SELECT concat_ws(sep, a, b) FROM test_table

-- literal separator, column values
query
SELECT concat_ws(',', a, b) FROM test_table

-- all literals
query
SELECT concat_ws(',', 'hello', 'world')
```

Note on constant folding: Normally Spark constant-folds all-literal expressions during
planning, so Comet would never see them. However, CometSqlFileTestSuite automatically
disables constant folding (by excluding ConstantFolding from the optimizer rules), so
all-literal queries are evaluated by Comet's native engine. This means you can use the
default query mode for all-literal cases and they will be tested natively just like
column-based queries.
Include edge-case values in your test data. The exact cases depend on the function, but common ones include:
- NULL values -- every test should include NULLs
- Empty strings -- for string functions
- Zero, negative, and very large numbers -- for numeric functions
- Boundary values -- `INT_MIN`, `INT_MAX`, `NaN`, `Infinity`, `-Infinity` for numeric types
- Special characters and multibyte UTF-8 -- for string functions (e.g. `'é'`, `'中文'`, `'\t'`)
- Empty arrays/maps -- for collection functions
- Single-element and multi-element collections -- for aggregate and collection functions
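As a sketch, a `statement` block seeding edge-case data for a numeric function might look like this (the `test_edge` table name and the specific values are illustrative):

```sql
statement
CREATE TABLE test_edge(v int, d double) USING parquet

-- include zero, negatives, INT_MIN/INT_MAX, NaN, and NULLs
statement
INSERT INTO test_edge VALUES
  (0, 0.0),
  (-1, -1.5),
  (2147483647, 1e308),
  (CAST(-2147483648 AS int), -1e308),
  (42, CAST('NaN' AS double)),
  (NULL, NULL)
```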
Keep each .sql file focused on a single expression or a small group of closely related
expressions. This makes failures easy to locate and keeps files readable.
Add SQL comments before query blocks to describe what aspect of the expression is being tested. This helps reviewers and future maintainers understand the intent:
```sql
-- literal separator with NULL values
query
SELECT concat_ws(',', NULL, 'b', 'c')

-- empty separator
query
SELECT concat_ws('', a, b, c) FROM test_table
```

Writing thorough SQL test files is a task well suited to agentic coding tools such as
Claude Code. You can point the tool at an existing test file as an example, describe the
expression you want to test, and ask it to generate a complete .sql file covering all
argument combinations and edge cases. This is significantly faster than writing the
combinatorial test cases by hand and helps ensure nothing is missed.
For example:
```
Read the test file spark/src/test/resources/sql-tests/expressions/string/ascii.sql
and the documentation in docs/source/contributor-guide/sql-file-tests.md.
Then write a similar test file for the `reverse` function, covering column arguments,
literal arguments, NULLs, empty strings, and multibyte characters.
```
Set the system property `comet.sqlFileTest.verboseOutput` to a file path to have the suite append a markdown-formatted record of every query it runs, including the SQL, mode, source file and line, and Spark's actual result or error. This is useful for documenting Spark behavior across versions or for cross-checking results against another engine.
```shell
rm -f /tmp/sql-file-tests.md

ARGS="-ea -Xmx4g -Xss4m -XX:+IgnoreUnrecognizedVMOptions \
  --add-opens=java.base/java.lang=ALL-UNNAMED \
  --add-opens=java.base/java.lang.invoke=ALL-UNNAMED \
  --add-opens=java.base/java.lang.reflect=ALL-UNNAMED \
  --add-opens=java.base/java.io=ALL-UNNAMED \
  --add-opens=java.base/java.net=ALL-UNNAMED \
  --add-opens=java.base/java.nio=ALL-UNNAMED \
  --add-opens=java.base/java.util=ALL-UNNAMED \
  --add-opens=java.base/java.util.concurrent=ALL-UNNAMED \
  --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED \
  --add-opens=java.base/jdk.internal.ref=ALL-UNNAMED \
  --add-opens=java.base/sun.nio.ch=ALL-UNNAMED \
  --add-opens=java.base/sun.nio.cs=ALL-UNNAMED \
  --add-opens=java.base/sun.security.action=ALL-UNNAMED \
  --add-opens=java.base/sun.util.calendar=ALL-UNNAMED \
  -Djdk.reflect.useDirectMethodHandle=false \
  -Dcomet.sqlFileTest.verboseOutput=/tmp/sql-file-tests.md"

./mvnw test -Dsuites="org.apache.comet.CometSqlFileTestSuite encode" \
  -Dtest=none -DargLine="$ARGS"
```

The JDK 17 `--add-opens` flags are copied from the pom's default `extraJavaTestArgs`. Overriding `argLine` replaces that default entirely, so all the module-opens entries must be reproduced alongside the new property.
Skipped files (e.g. those gated by `MinSparkVersion`) do not appear in the output. Queries marked `ignore(...)` are not executed, so they are also absent.
When a query fails due to a known Comet bug:
- File a GitHub issue describing the problem.
- Change the query mode to `ignore(...)` with a link to the issue.
- Optionally add a SQL comment above the query explaining the problem.

```sql
-- GetArrayItem returns incorrect results with dynamic index
query ignore(https://github.com/apache/datafusion-comet/issues/3332)
SELECT arr[idx] FROM test_get_array_item
```

When the bug is fixed, remove the `ignore(...)` and restore the original query mode.
The test infrastructure consists of two Scala files:
- `SqlFileTestParser` (`spark/src/test/scala/org/apache/comet/SqlFileTestParser.scala`) -- Parses `.sql` files into a `SqlTestFile` data structure containing directives, statements, and queries.
- `CometSqlFileTestSuite` (`spark/src/test/scala/org/apache/comet/CometSqlFileTestSuite.scala`) -- Discovers test files at suite initialization time, generates ScalaTest test cases for each file and config combination, and executes them using `CometTestBase` assertion methods.
Tables created in test files are automatically cleaned up after each test.