Cast

Cast operations in Comet fall into three levels of support:

  • C (Compatible): The results match Apache Spark
  • I (Incompatible): The results may match Apache Spark for some inputs, but there are known issues where some inputs will result in incorrect results or exceptions. The query stage will fall back to Spark by default. Setting spark.comet.expression.Cast.allowIncompatible=true will allow all incompatible casts to run natively in Comet, but this is not recommended for production use.
  • U (Unsupported): Comet does not provide a native version of this cast expression and the query stage will fall back to Spark.
  • N/A: Spark does not support this cast.

ANSI Mode Fallback

Cast will fall back to Spark in some cases when ANSI mode is enabled. Native execution of these casts can still be enabled by setting spark.comet.expression.Cast.allowIncompatible=true. See the Comet Supported Expressions Guide for more information on this configuration setting.

There is an epic where we are tracking the work to fully implement ANSI support.

String to Decimal

Comet's native CAST(string AS DECIMAL) implementation matches Apache Spark's behavior, including:

  • Leading and trailing ASCII whitespace is trimmed before parsing.
  • Null bytes (\u0000) at the start or end of a string are trimmed, matching Spark's UTF8String behavior. Null bytes embedded in the middle of a string produce NULL.
  • Fullwidth Unicode digits (U+FF10–U+FF19, e.g. １２３) are treated as their ASCII equivalents, so CAST('１２３.４５' AS DECIMAL(10,2)) returns 123.45.
  • Scientific notation (e.g. 1.23E+5) is supported.
  • Special values (inf, infinity, nan) produce NULL.
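The trimming and normalization rules above can be sketched in Python. This is an illustrative model, not Comet's actual Rust implementation, and the function name is hypothetical:

```python
from decimal import Decimal, InvalidOperation

# Offset between fullwidth digits (U+FF10..U+FF19) and ASCII digits.
FULLWIDTH_OFFSET = ord("０") - ord("0")

def cast_string_to_decimal(s):
    # Trim leading/trailing ASCII whitespace and null bytes (\u0000),
    # matching Spark's UTF8String trimming.
    s = s.strip(" \t\r\n\f\v\x00")
    # Null bytes embedded in the middle of the string produce NULL.
    if "\x00" in s:
        return None
    # Treat fullwidth digits as their ASCII equivalents.
    s = "".join(
        chr(ord(c) - FULLWIDTH_OFFSET) if "０" <= c <= "９" else c for c in s
    )
    # Special values produce NULL rather than an infinity/NaN value.
    if s.lower() in {"inf", "+inf", "-inf", "infinity", "+infinity",
                     "-infinity", "nan"}:
        return None
    try:
        return Decimal(s)  # scientific notation such as 1.23E+5 is accepted
    except InvalidOperation:
        return None
```

A real implementation would also round to the target precision and scale; that step is omitted here.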

String to Date

Comet's native CAST(string AS DATE) implementation matches Apache Spark's behavior for years between 262143 BC and 262142 AD. This range limitation comes from the underlying chrono library's NaiveDate type. Spark itself supports a wider range. All three eval modes (Legacy, ANSI, Try) are supported.

Supported input formats match Spark exactly:

  • yyyy, yyyy-[m]m, yyyy-[m]m-[d]d
  • Optional T suffix with arbitrary trailing text (e.g. 2020-01-01T12:34:56)
  • Leading/trailing whitespace and control characters are trimmed
  • Optional sign prefix (- for negative years)
  • Leading zeros (e.g. 0002020-01-01 is year 2020)
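The accepted formats above can be modeled with a small Python sketch (illustrative only, not Comet's parser; Python's date type cannot represent negative years, so the sign handling is shown but limited here):

```python
import re
from datetime import date

# Pattern for yyyy, yyyy-[m]m, yyyy-[m]m-[d]d with an optional sign,
# leading zeros, and an optional 'T' suffix with arbitrary trailing text.
DATE_RE = re.compile(r"^([+-]?\d{4,})(?:-(\d{1,2})(?:-(\d{1,2})(?:T.*)?)?)?$")

# Characters with code points <= 0x20 (whitespace and control characters).
TRIM_CHARS = "".join(map(chr, range(0x21)))

def cast_string_to_date(s):
    s = s.strip(TRIM_CHARS)
    m = DATE_RE.match(s)
    if not m:
        return None
    year = int(m.group(1))
    month = int(m.group(2) or 1)  # missing components default to 1
    day = int(m.group(3) or 1)
    try:
        return date(year, month, day)  # rejects out-of-range month/day
    except ValueError:
        return None
```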

Date to Timestamp

Comet's native CAST(date AS TIMESTAMP) is compatible with Spark. The cast interprets each date as midnight in the session timezone and converts to a UTC epoch value. DST transitions are handled correctly, including spring-forward gaps (where midnight may not exist) and fall-back ambiguity (where Comet picks the earlier/DST occurrence, matching Spark's LocalDate.atStartOfDay(zoneId) behavior).
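The same semantics can be sketched in Python with zoneinfo, where fold=0 mirrors Java's choice of the earlier offset (illustrative only; the function name is hypothetical):

```python
from datetime import date, datetime, timezone
from zoneinfo import ZoneInfo

def cast_date_to_timestamp(d, session_tz):
    # Midnight local time in the session timezone. fold=0 selects the
    # earlier (DST) occurrence when midnight is ambiguous, and a
    # nonexistent midnight in a spring-forward gap resolves to the same
    # instant as LocalDate.atStartOfDay(zoneId).
    local_midnight = datetime(d.year, d.month, d.day,
                              tzinfo=ZoneInfo(session_tz), fold=0)
    return local_midnight.astimezone(timezone.utc)
```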

Date to TimestampNTZ

Comet's native CAST(date AS TIMESTAMP_NTZ) is compatible with Spark. The cast is timezone-independent: each date is converted to midnight as pure arithmetic (days * 86,400,000,000 microseconds) with no session timezone offset applied. The result is the same regardless of the session timezone setting.
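The arithmetic is simple enough to show directly (a sketch; the function name is illustrative):

```python
from datetime import date

EPOCH = date(1970, 1, 1)
MICROS_PER_DAY = 86_400_000_000

def cast_date_to_timestamp_ntz(d):
    # Days since the Unix epoch times microseconds per day; no session
    # timezone offset is applied, so the result is timezone-independent.
    return (d - EPOCH).days * MICROS_PER_DAY
```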

Date to Numeric Types

In Legacy mode, CAST(date AS INT), CAST(date AS LONG), and casts to all other numeric types (Boolean, Byte, Short, Float, Double, Decimal) always return NULL. Comet handles this by short-circuiting to a null literal during query planning, so no native execution is needed. In ANSI and Try modes, Spark rejects these casts at analysis time (before execution reaches Comet).
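A minimal sketch of the planning-time behavior described above (the function is hypothetical and only models the outcome per eval mode):

```python
def plan_date_to_numeric_cast(eval_mode):
    # Hypothetical planner helper, not Comet's actual API.
    if eval_mode == "LEGACY":
        # Legacy mode: the cast always yields NULL, so the planner can
        # substitute a null literal and skip native execution entirely.
        return None  # stands in for Literal(null)
    # ANSI/Try modes: Spark rejects the cast at analysis time, before
    # execution ever reaches Comet.
    raise TypeError("cannot cast DATE to a numeric type")
```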

String to Timestamp

Comet's native CAST(string AS TIMESTAMP) implementation supports all timestamp formats accepted by Apache Spark, including ISO 8601 date-time strings, date-only strings, time-only strings (HH:MM:SS), embedded timezone offsets (e.g. +07:30, GMT-01:00, UTC), named timezone suffixes (e.g. Europe/Moscow), and the full Spark timestamp year range (-290308 to 294247).
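A simplified Python sketch of a subset of this behavior, covering ISO 8601 strings with optional offsets (named timezones and time-only strings are omitted for brevity; this is not Comet's parser):

```python
from datetime import datetime, timezone

def cast_string_to_timestamp(s, session_tz=timezone.utc):
    # Normalize the 'Z' suffix, which older fromisoformat versions reject.
    s = s.strip().replace("Z", "+00:00")
    try:
        dt = datetime.fromisoformat(s)
    except ValueError:
        return None
    if dt.tzinfo is None:
        # No embedded offset: interpret in the session timezone.
        dt = dt.replace(tzinfo=session_tz)
    return dt.astimezone(timezone.utc)
```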

String to TimestampNTZ

Comet's native CAST(string AS TIMESTAMP_NTZ) implementation matches Apache Spark's behavior. Unlike CAST(string AS TIMESTAMP), this cast is timezone-independent: any timezone offset in the input string (e.g. +08:00, Z, UTC) is silently discarded, and the local date-time components are preserved as-is. Time-only strings (e.g. T12:34:56, 12:34) produce NULL. The result is always a wall-clock timestamp with no timezone conversion or DST adjustment.
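The offset-discarding behavior can be sketched in Python (an illustrative model handling a few common suffix forms, not Comet's implementation):

```python
import re
from datetime import datetime

# Timezone suffixes are stripped rather than applied: the local
# date-time components are preserved as-is.
TZ_SUFFIX = re.compile(r"(Z|UTC|[+-]\d{2}:\d{2})$")

def cast_string_to_timestamp_ntz(s):
    s = TZ_SUFFIX.sub("", s.strip())
    try:
        # Requires a date part, so time-only strings produce NULL.
        return datetime.fromisoformat(s)
    except ValueError:
        return None
```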

TimestampNTZ Casts

Comet supports the following TIMESTAMP_NTZ casts natively:

| Cast | Compatible | Notes |
| --- | --- | --- |
| CAST(timestamp_ntz AS STRING) | Yes | Formats local time as-is, timezone-independent |
| CAST(timestamp_ntz AS DATE) | Yes | Extracts the date component, timezone-independent |
| CAST(timestamp_ntz AS TIMESTAMP) | Yes | Interprets NTZ as local time in session TZ, converts to UTC epoch |
| CAST(date AS TIMESTAMP_NTZ) | Yes | Pure arithmetic, timezone-independent |
| CAST(timestamp AS TIMESTAMP_NTZ) | Yes | Shifts UTC epoch to local time in session TZ |
| CAST(string AS TIMESTAMP_NTZ) | Yes | See String to TimestampNTZ above |

The NTZ-to-Timestamp and Timestamp-to-NTZ casts are session-timezone-dependent (the session timezone determines the UTC offset). All other NTZ casts are timezone-independent and produce the same result regardless of the session timezone.
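The two session-timezone-dependent conversions can be sketched as inverse shifts (illustrative Python, not Comet's implementation):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

def ntz_to_timestamp(ntz, session_tz):
    # Interpret the NTZ wall-clock value as local time in the session
    # timezone, then convert to a UTC instant.
    return ntz.replace(tzinfo=ZoneInfo(session_tz)).astimezone(timezone.utc)

def timestamp_to_ntz(ts, session_tz):
    # Shift the UTC instant to session-local wall-clock time, then drop
    # the timezone.
    return ts.astimezone(ZoneInfo(session_tz)).replace(tzinfo=None)
```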

Date to String

Comet's native CAST(date AS STRING) is compatible with Spark. Years below 1000 are zero-padded to four digits (e.g. year 999 renders as 0999-01-01). Years above 9999 are rendered without truncation. The cast is timezone-independent.
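The formatting rule amounts to zero-padding the year (a sketch; Python's date caps out at year 9999, so the wider-year case is not shown):

```python
from datetime import date

def cast_date_to_string(d):
    # Zero-pad the year to four digits; wider years keep all digits.
    return f"{d.year:04d}-{d.month:02d}-{d.day:02d}"
```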

Decimal with Negative Scale to String

Casting a DecimalType with a negative scale to StringType is marked as incompatible when spark.sql.legacy.allowNegativeScaleOfDecimal is false (the default). When that config is disabled, Spark cannot create negative-scale decimals, so Comet falls back to avoid running native execution on unexpected inputs.

When spark.sql.legacy.allowNegativeScaleOfDecimal=true, the cast is compatible. Comet matches Spark's behavior of using Java BigDecimal.toString() semantics, which produces scientific notation (e.g. a value of 12300 stored as Decimal(7,-2) with unscaled value 123 is rendered as "1.23E+4").
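Python's decimal module follows the same General Decimal Arithmetic string-conversion rules as Java's BigDecimal.toString(), so it can illustrate the rendering described above:

```python
from decimal import Decimal

# A Decimal(7, -2) value with unscaled value 123 represents 123 x 10^2,
# i.e. 12300, and renders in scientific notation.
value = Decimal("123E+2")
rendered = str(value)
```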

Legacy Mode

Try Mode

ANSI Mode

See the tracking issue for more details.