Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
72 changes: 70 additions & 2 deletions docs/source/user-guide/latest/compatibility/expressions/cast.md
Original file line number Diff line number Diff line change
Expand Up @@ -49,14 +49,82 @@ including:
- Scientific notation (e.g. `1.23E+5`) is supported.
- Special values (`inf`, `infinity`, `nan`) produce `NULL`.

## String to Date

Comet's native `CAST(string AS DATE)` implementation matches Apache Spark's behavior for years
between 262143 BC and 262142 AD. This range limitation comes from the underlying chrono library's
`NaiveDate` type. Spark itself supports a wider range. All three eval modes (Legacy, ANSI, Try)
are supported.

Supported input formats match Spark exactly:

- `yyyy`, `yyyy-[m]m`, `yyyy-[m]m-[d]d`
- Optional `T` suffix with arbitrary trailing text (e.g. `2020-01-01T12:34:56`)
- Leading/trailing whitespace and control characters are trimmed
- Optional sign prefix (`-` for negative years)
- Leading zeros (e.g. `0002020-01-01` is year 2020)

## Date to Timestamp

Comet's native `CAST(date AS TIMESTAMP)` is compatible with Spark. The cast interprets each
date as midnight in the session timezone and converts to a UTC epoch value. DST transitions
are handled correctly, including spring-forward gaps (where midnight may not exist) and
fall-back ambiguity (where Comet picks the earlier/DST occurrence, matching Spark's
`LocalDate.atStartOfDay(zoneId)` behavior).

## Date to TimestampNTZ

Comet's native `CAST(date AS TIMESTAMP_NTZ)` is compatible with Spark. The cast is
timezone-independent: each date is converted to midnight as pure arithmetic
(`days * 86,400,000,000` microseconds) with no session timezone offset applied. The result
is the same regardless of the session timezone setting.

## Date to Numeric Types

In Legacy mode, `CAST(date AS INT)`, `CAST(date AS LONG)`, and casts to all other numeric
types (Boolean, Byte, Short, Float, Double, Decimal) always return `NULL`. Comet handles
this by short-circuiting to a null literal during query planning, so no native execution
is needed. In ANSI and Try modes, Spark rejects these casts at analysis time (before
execution reaches Comet).

## String to Timestamp

Comet's native `CAST(string AS TIMESTAMP)` implementation supports all timestamp formats accepted
by Apache Spark, including ISO 8601 date-time strings, date-only strings, time-only strings
(`HH:MM:SS`), embedded timezone offsets (e.g. `+07:30`, `GMT-01:00`, `UTC`), named timezone
suffixes (e.g. `Europe/Moscow`), and the full Spark timestamp year range
(-290308 to 294247). Note that `CAST(string AS DATE)` is only compatible for years between
262143 BC and 262142 AD due to an underlying library limitation.
(-290308 to 294247).

## String to TimestampNTZ

Comet's native `CAST(string AS TIMESTAMP_NTZ)` implementation matches Apache Spark's behavior.
Unlike `CAST(string AS TIMESTAMP)`, this cast is timezone-independent: any timezone offset in
the input string (e.g. `+08:00`, `Z`, `UTC`) is silently discarded, and the local date-time
components are preserved as-is. Time-only strings (e.g. `T12:34:56`, `12:34`) produce `NULL`.
The result is always a wall-clock timestamp with no timezone conversion or DST adjustment.

## TimestampNTZ Casts

Comet supports the following `TIMESTAMP_NTZ` casts natively:

| Cast | Compatible | Notes |
| ---------------------------------- | ---------- | ----------------------------------------------------------------- |
| `CAST(timestamp_ntz AS STRING)` | Yes | Formats local time as-is, timezone-independent |
| `CAST(timestamp_ntz AS DATE)` | Yes | Extracts the date component, timezone-independent |
| `CAST(timestamp_ntz AS TIMESTAMP)` | Yes | Interprets NTZ as local time in session TZ, converts to UTC epoch |
| `CAST(date AS TIMESTAMP_NTZ)` | Yes | Pure arithmetic, timezone-independent |
| `CAST(timestamp AS TIMESTAMP_NTZ)` | Yes | Shifts UTC epoch to local time in session TZ |
| `CAST(string AS TIMESTAMP_NTZ)` | Yes | See [String to TimestampNTZ](#string-to-timestampntz) above |

The NTZ-to-Timestamp and Timestamp-to-NTZ casts are session-timezone-dependent (the session
timezone determines the UTC offset). All other NTZ casts are timezone-independent and produce
the same result regardless of the session timezone.

## Date to String

Comet's native `CAST(date AS STRING)` is compatible with Spark. Years below 1000 are
zero-padded to four digits (e.g. year 999 renders as `0999-01-01`). Years above 9999 are
rendered without truncation. The cast is timezone-independent.

## String to TimestampNTZ

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,7 @@ under the License.
time without timezone, so no conversion should be applied. These expressions work correctly with Timestamp inputs.
[#3180](https://github.com/apache/datafusion-comet/issues/3180)
- **TruncTimestamp (date_trunc)**: Produces incorrect results when used with non-UTC timezones. Compatible when
timezone is UTC.
timezone is UTC. TimestampNTZ inputs are handled correctly (timezone-independent truncation).
[#2649](https://github.com/apache/datafusion-comet/issues/2649)

## Date and Time Functions
Expand All @@ -41,5 +41,7 @@ If you need to process dates far in the future with accurate timezone handling,

- Using timezone-naive types (`timestamp_ntz`) when timezone conversion is not required
- Falling back to Spark for these specific operations
<!--BEGIN:EXPR_COMPAT[datetime]-->
<!--END:EXPR_COMPAT-->

<!--BEGIN:EXPR_COMPAT[datetime]-->

<!--END:EXPR_COMPAT-->
Loading