Cast operations in Comet fall into three levels of support:

- C (Compatible): The results match Apache Spark.
- I (Incompatible): The results may match Apache Spark for some inputs, but there are known issues where some inputs
  will result in incorrect results or exceptions. The query stage will fall back to Spark by default. Setting
  `spark.comet.expression.Cast.allowIncompatible=true` will allow all incompatible casts to run natively in Comet,
  but this is not recommended for production use.
- U (Unsupported): Comet does not provide a native version of this cast expression and the query stage will fall
  back to Spark.
- N/A: Spark does not support this cast.
Cast will fall back to Spark in some cases when ANSI mode is enabled. Native execution of these casts can still be
enabled by setting `spark.comet.expression.Cast.allowIncompatible=true`. See the Comet Supported Expressions Guide
for more information on this configuration setting. There is an epic where we are tracking the work to fully
implement ANSI support.
Comet's native CAST(string AS DECIMAL) implementation matches Apache Spark's behavior,
including:
- Leading and trailing ASCII whitespace is trimmed before parsing.
- Null bytes (`\u0000`) at the start or end of a string are trimmed, matching Spark's `UTF8String` behavior. Null
  bytes embedded in the middle of a string produce `NULL`.
- Fullwidth Unicode digits (U+FF10–U+FF19, e.g. `１２３.４５`) are treated as their ASCII equivalents, so
  `CAST('１２３.４５' AS DECIMAL(10,2))` returns `123.45`.
- Scientific notation (e.g. `1.23E+5`) is supported.
- Special values (`inf`, `infinity`, `nan`) produce `NULL`.
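The trimming and normalization rules above can be sketched as a small Python helper. This is an illustrative model of the described behavior, not Comet's actual code; the function name is made up.

```python
def normalize_decimal_input(s: str):
    """Model of the input normalization described above (illustrative only)."""
    # Trim leading/trailing ASCII whitespace and null bytes, matching
    # the UTF8String trim behavior described above.
    s = s.strip(" \t\r\n\f\x0b\x00")
    # A null byte embedded in the middle of the string produces NULL.
    if "\x00" in s:
        return None
    # Map fullwidth digits (U+FF10-U+FF19) to their ASCII equivalents.
    s = s.translate({c: c - 0xFF10 + ord("0") for c in range(0xFF10, 0xFF1A)})
    # Special floating-point values are not valid decimals.
    if s.lower() in ("inf", "infinity", "nan", "-inf", "-infinity"):
        return None
    return s
```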
Comet's native CAST(string AS DATE) implementation matches Apache Spark's behavior for years
between 262143 BC and 262142 AD. This range limitation comes from the underlying chrono library's
NaiveDate type. Spark itself supports a wider range. All three eval modes (Legacy, ANSI, Try)
are supported.
Supported input formats match Spark exactly:

- `yyyy`, `yyyy-[m]m`, `yyyy-[m]m-[d]d`
- Optional `T` suffix with arbitrary trailing text (e.g. `2020-01-01T12:34:56`)
- Leading/trailing whitespace and control characters are trimmed
- Optional sign prefix (`-` for negative years)
- Leading zeros (e.g. `0002020-01-01` is year 2020)
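The accepted shapes can be approximated with a single regular expression. This is a sketch of the format rules listed above, not Comet's parser; the regex and function name are assumptions.

```python
import re

# One regex covering yyyy, yyyy-[m]m, yyyy-[m]m-[d]d, an optional sign,
# leading zeros, and an optional "T" suffix with arbitrary trailing text.
DATE_RE = re.compile(r"^(-?\d{4,})(?:-(\d{1,2})(?:-(\d{1,2}))?)?(?:T.*)?$")

def parse_date_string(s: str):
    s = s.strip()  # leading/trailing whitespace is trimmed
    m = DATE_RE.match(s)
    if not m:
        return None
    year = int(m.group(1))        # leading zeros ignored: 0002020 -> 2020
    month = int(m.group(2) or 1)  # missing month/day default to 1
    day = int(m.group(3) or 1)
    return (year, month, day)
```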
Comet's native CAST(date AS TIMESTAMP) is compatible with Spark. The cast interprets each
date as midnight in the session timezone and converts to a UTC epoch value. DST transitions
are handled correctly, including spring-forward gaps (where midnight may not exist) and
fall-back ambiguity (where Comet picks the earlier/DST occurrence, matching Spark's
LocalDate.atStartOfDay(zoneId) behavior).
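The conversion can be sketched in Python: resolve midnight in the session timezone, then take the UTC epoch in microseconds. The function name and example timezones are illustrative; note that Python's PEP 495 `fold=0` semantics happen to produce the same instants as `LocalDate.atStartOfDay(zoneId)` for both gaps and ambiguities.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

def date_to_timestamp_micros(year: int, month: int, day: int, session_tz: str) -> int:
    """Interpret the date as midnight in the session timezone; return UTC epoch micros."""
    local_midnight = datetime(year, month, day, tzinfo=ZoneInfo(session_tz))
    return int(local_midnight.timestamp() * 1_000_000)
```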
Comet's native CAST(date AS TIMESTAMP_NTZ) is compatible with Spark. The cast is
timezone-independent: each date is converted to midnight as pure arithmetic
(days * 86,400,000,000 microseconds) with no session timezone offset applied. The result
is the same regardless of the session timezone setting.
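Because no timezone is involved, the whole cast reduces to one multiplication on the days-since-epoch value (names below are illustrative):

```python
MICROS_PER_DAY = 86_400_000_000

def date_to_timestamp_ntz_micros(days_since_epoch: int) -> int:
    """Pure arithmetic: midnight of the given day, no timezone offset applied."""
    return days_since_epoch * MICROS_PER_DAY
```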
In Legacy mode, CAST(date AS INT), CAST(date AS LONG), and casts to all other numeric
types (Boolean, Byte, Short, Float, Double, Decimal) always return NULL. Comet handles
this by short-circuiting to a null literal during query planning, so no native execution
is needed. In ANSI and Try modes, Spark rejects these casts at analysis time (before
execution reaches Comet).
Comet's native CAST(string AS TIMESTAMP) implementation supports all timestamp formats accepted
by Apache Spark, including ISO 8601 date-time strings, date-only strings, time-only strings
(HH:MM:SS), embedded timezone offsets (e.g. +07:30, GMT-01:00, UTC), named timezone
suffixes (e.g. Europe/Moscow), and the full Spark timestamp year range
(-290308 to 294247).
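For the embedded-offset case, the semantics can be illustrated with Python's `datetime.fromisoformat` standing in for Comet's own parser (which accepts many more shapes than `fromisoformat` does): the offset is honored and the result is a UTC epoch value.

```python
from datetime import datetime

def iso_string_to_timestamp_micros(s: str) -> int:
    """Resolve an offset-bearing ISO 8601 string to UTC epoch microseconds.

    Only offset-bearing inputs are shown; naive strings would depend on
    the session timezone, which this sketch does not model.
    """
    dt = datetime.fromisoformat(s)  # e.g. "2020-01-01T12:34:56+07:30"
    return int(dt.timestamp() * 1_000_000)
```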
Comet's native CAST(string AS TIMESTAMP_NTZ) implementation matches Apache Spark's behavior.
Unlike CAST(string AS TIMESTAMP), this cast is timezone-independent: any timezone offset in
the input string (e.g. +08:00, Z, UTC) is silently discarded, and the local date-time
components are preserved as-is. Time-only strings (e.g. T12:34:56, 12:34) produce NULL.
The result is always a wall-clock timestamp with no timezone conversion or DST adjustment.
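The "discard the offset, keep the wall-clock fields" rule can be modeled in a couple of lines (again with `fromisoformat` standing in for the real parser):

```python
from datetime import datetime

def string_to_timestamp_ntz(s: str) -> datetime:
    """Drop any embedded offset and keep the local date-time fields as-is."""
    dt = datetime.fromisoformat(s)
    return dt.replace(tzinfo=None)  # the +08:00 / +00:00 suffix is silently discarded
```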
Comet supports the following TIMESTAMP_NTZ casts natively:
| Cast | Compatible | Notes |
|---|---|---|
| `CAST(timestamp_ntz AS STRING)` | Yes | Formats local time as-is, timezone-independent |
| `CAST(timestamp_ntz AS DATE)` | Yes | Extracts the date component, timezone-independent |
| `CAST(timestamp_ntz AS TIMESTAMP)` | Yes | Interprets NTZ as local time in session TZ, converts to UTC epoch |
| `CAST(date AS TIMESTAMP_NTZ)` | Yes | Pure arithmetic, timezone-independent |
| `CAST(timestamp AS TIMESTAMP_NTZ)` | Yes | Shifts UTC epoch to local time in session TZ |
| `CAST(string AS TIMESTAMP_NTZ)` | Yes | See String to TimestampNTZ above |
The NTZ-to-Timestamp and Timestamp-to-NTZ casts are session-timezone-dependent (the session timezone determines the UTC offset). All other NTZ casts are timezone-independent and produce the same result regardless of the session timezone.
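The two session-timezone-dependent casts can be sketched as a pair of inverse shifts through the session timezone (function names and timezone handling are illustrative, not Comet's implementation):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

MICROS = 1_000_000

def ntz_to_timestamp_micros(ntz: datetime, session_tz: str) -> int:
    """Interpret the NTZ wall-clock value as local time in the session TZ."""
    return int(ntz.replace(tzinfo=ZoneInfo(session_tz)).timestamp() * MICROS)

def timestamp_to_ntz(epoch_micros: int, session_tz: str) -> datetime:
    """Shift the UTC instant to wall-clock time in the session TZ."""
    utc = datetime.fromtimestamp(epoch_micros / MICROS, tz=timezone.utc)
    return utc.astimezone(ZoneInfo(session_tz)).replace(tzinfo=None)
```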
Comet's native CAST(date AS STRING) is compatible with Spark. Years below 1000 are
zero-padded to four digits (e.g. year 999 renders as 0999-01-01). Years above 9999 are
rendered without truncation. The cast is timezone-independent.
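The padding rule can be shown with plain string formatting. This is a sketch of the year-rendering rule only; the function name is made up, and the handling of negative years is an assumption not covered by the text above.

```python
def format_date(year: int, month: int, day: int) -> str:
    """Zero-pad years below 1000 to four digits; print larger years in full."""
    sign = "-" if year < 0 else ""  # negative-year handling is an assumption
    return f"{sign}{abs(year):04d}-{month:02d}-{day:02d}"
```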
Casting a DecimalType with a negative scale to StringType is marked as incompatible when
spark.sql.legacy.allowNegativeScaleOfDecimal is false (the default). When that config is
disabled, Spark cannot create negative-scale decimals, so Comet falls back to avoid running
native execution on unexpected inputs.
When spark.sql.legacy.allowNegativeScaleOfDecimal=true, the cast is compatible. Comet matches
Spark's behavior of using Java BigDecimal.toString() semantics, which produces scientific
notation (e.g. a value of 12300 stored as Decimal(7,-2) with unscaled value 123 is rendered
as "1.23E+4").
See the tracking issue for more details.