
Commit 3be3a34

doc: update documentation for cast and datetime functions (#4058)
1 parent 5333d09 commit 3be3a34

2 files changed

Lines changed: 75 additions & 5 deletions


docs/source/user-guide/latest/compatibility/expressions/cast.md

Lines changed: 70 additions & 2 deletions
@@ -49,14 +49,82 @@ including:
4949
- Scientific notation (e.g. `1.23E+5`) is supported.
5050
- Special values (`inf`, `infinity`, `nan`) produce `NULL`.
5151

52+
## String to Date
53+
54+
Comet's native `CAST(string AS DATE)` implementation matches Apache Spark's behavior for years
55+
between 262143 BC and 262142 AD. This range limitation comes from the underlying chrono library's
56+
`NaiveDate` type. Spark itself supports a wider range. All three eval modes (Legacy, ANSI, Try)
57+
are supported.
58+
59+
Supported input formats match Spark exactly:
60+
61+
- `yyyy`, `yyyy-[m]m`, `yyyy-[m]m-[d]d`
62+
- Optional `T` suffix with arbitrary trailing text (e.g. `2020-01-01T12:34:56`)
63+
- Leading/trailing whitespace and control characters are trimmed
64+
- Optional sign prefix (`-` for negative years)
65+
- Leading zeros (e.g. `0002020-01-01` is year 2020)
66+
67+
## Date to Timestamp
68+
69+
Comet's native `CAST(date AS TIMESTAMP)` is compatible with Spark. The cast interprets each
70+
date as midnight in the session timezone and converts to a UTC epoch value. DST transitions
71+
are handled correctly, including spring-forward gaps (where midnight may not exist) and
72+
fall-back ambiguity (where Comet picks the earlier/DST occurrence, matching Spark's
73+
`LocalDate.atStartOfDay(zoneId)` behavior).
74+
75+
## Date to TimestampNTZ
76+
77+
Comet's native `CAST(date AS TIMESTAMP_NTZ)` is compatible with Spark. The cast is
78+
timezone-independent: each date is converted to midnight as pure arithmetic
79+
(`days * 86,400,000,000` microseconds) with no session timezone offset applied. The result
80+
is the same regardless of the session timezone setting.
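
The arithmetic described above can be sketched in a few lines of Python. This is only an illustration of the `days * 86,400,000,000` rule, not Comet's native code; the helper name `date_to_timestamp_ntz_micros` is hypothetical.

```python
from datetime import date

# Microseconds in one day: 24 * 60 * 60 * 1_000_000.
MICROS_PER_DAY = 86_400_000_000

# Hypothetical helper, for illustration only (not part of Comet or Spark).
def date_to_timestamp_ntz_micros(d: date) -> int:
    """Convert a date to microseconds since 1970-01-01T00:00:00, with no timezone offset."""
    days_since_epoch = (d - date(1970, 1, 1)).days
    return days_since_epoch * MICROS_PER_DAY
```

Because no timezone enters the calculation, the same input date always yields the same value, and dates before 1970 simply yield negative values.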
81+
82+
## Date to Numeric Types
83+
84+
In Legacy mode, `CAST(date AS INT)`, `CAST(date AS LONG)`, and casts to all other numeric
85+
types (Boolean, Byte, Short, Float, Double, Decimal) always return `NULL`. Comet handles
86+
this by short-circuiting to a null literal during query planning, so no native execution
87+
is needed. In ANSI and Try modes, Spark rejects these casts at analysis time (before
88+
execution reaches Comet).
89+
5290
## String to Timestamp
5391

5492
Comet's native `CAST(string AS TIMESTAMP)` implementation supports all timestamp formats accepted
5593
by Apache Spark, including ISO 8601 date-time strings, date-only strings, time-only strings
5694
(`HH:MM:SS`), embedded timezone offsets (e.g. `+07:30`, `GMT-01:00`, `UTC`), named timezone
5795
suffixes (e.g. `Europe/Moscow`), and the full Spark timestamp year range
58-
(-290308 to 294247). Note that `CAST(string AS DATE)` is only compatible for years between
59-
262143 BC and 262142 AD due to an underlying library limitation.
96+
(-290308 to 294247).
97+
98+
## String to TimestampNTZ
99+
100+
Comet's native `CAST(string AS TIMESTAMP_NTZ)` implementation matches Apache Spark's behavior.
101+
Unlike `CAST(string AS TIMESTAMP)`, this cast is timezone-independent: any timezone offset in
102+
the input string (e.g. `+08:00`, `Z`, `UTC`) is silently discarded, and the local date-time
103+
components are preserved as-is. Time-only strings (e.g. `T12:34:56`, `12:34`) produce `NULL`.
104+
The result is always a wall-clock timestamp with no timezone conversion or DST adjustment.
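
A rough Python sketch of the "discard the offset, keep the wall-clock fields" semantics described above. It uses Python's `datetime.fromisoformat`, which accepts far fewer input formats than Spark or Comet, so it only illustrates the intended result; the helper name `parse_as_ntz` is hypothetical.

```python
from datetime import datetime

# Hypothetical helper, for illustration only (not Comet's parser).
def parse_as_ntz(text: str) -> datetime:
    """Parse an ISO 8601 string and drop any offset without shifting the time."""
    parsed = datetime.fromisoformat(text)
    # replace(tzinfo=None) keeps the local date-time fields unchanged,
    # unlike astimezone(), which would convert to another zone first.
    return parsed.replace(tzinfo=None)
```

Under this sketch, `parse_as_ntz("2020-01-01T12:34:56+08:00")` and `parse_as_ntz("2020-01-01T12:34:56")` return the same naive value.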
105+
106+
## TimestampNTZ Casts
107+
108+
Comet supports the following `TIMESTAMP_NTZ` casts natively:
109+
110+
| Cast | Compatible | Notes |
111+
| ---------------------------------- | ---------- | ----------------------------------------------------------------- |
112+
| `CAST(timestamp_ntz AS STRING)` | Yes | Formats local time as-is, timezone-independent |
113+
| `CAST(timestamp_ntz AS DATE)` | Yes | Extracts the date component, timezone-independent |
114+
| `CAST(timestamp_ntz AS TIMESTAMP)` | Yes | Interprets NTZ as local time in session TZ, converts to UTC epoch |
115+
| `CAST(date AS TIMESTAMP_NTZ)` | Yes | Pure arithmetic, timezone-independent |
116+
| `CAST(timestamp AS TIMESTAMP_NTZ)` | Yes | Shifts UTC epoch to local time in session TZ |
117+
| `CAST(string AS TIMESTAMP_NTZ)` | Yes | See [String to TimestampNTZ](#string-to-timestampntz) above |
118+
119+
The NTZ-to-Timestamp and Timestamp-to-NTZ casts are session-timezone-dependent (the session
120+
timezone determines the UTC offset). All other NTZ casts are timezone-independent and produce
121+
the same result regardless of the session timezone.
122+
123+
## Date to String
124+
125+
Comet's native `CAST(date AS STRING)` is compatible with Spark. Years below 1000 are
126+
zero-padded to four digits (e.g. year 999 renders as `0999-01-01`). Years above 9999 are
127+
rendered without truncation. The cast is timezone-independent.
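
The padding rule above can be sketched with a minimum-width format specifier. This is an illustrative Python snippet covering non-negative years only, not Comet's implementation; `format_date` is a hypothetical name.

```python
# Hypothetical helper, for illustration only; handles non-negative years.
def format_date(year: int, month: int, day: int) -> str:
    """Render a date with the year padded to at least four digits."""
    # :04d sets a minimum width of 4, so wider years are never truncated.
    return f"{year:04d}-{month:02d}-{day:02d}"
```

With this sketch, year 999 is padded and year 12345 is printed in full.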
60128

61129
## String to TimestampNTZ
62130

docs/source/user-guide/latest/compatibility/expressions/datetime.md

Lines changed: 5 additions & 3 deletions
@@ -23,7 +23,7 @@ under the License.
2323
time without timezone, so no conversion should be applied. These expressions work correctly with Timestamp inputs.
2424
[#3180](https://github.com/apache/datafusion-comet/issues/3180)
2525
- **TruncTimestamp (date_trunc)**: Produces incorrect results when used with non-UTC timezones. Compatible when
26-
timezone is UTC.
26+
timezone is UTC. TimestampNTZ inputs are handled correctly (timezone-independent truncation).
2727
[#2649](https://github.com/apache/datafusion-comet/issues/2649)
2828

2929
## Date and Time Functions
@@ -41,5 +41,7 @@ If you need to process dates far in the future with accurate timezone handling,
4141

4242
- Using timezone-naive types (`timestamp_ntz`) when timezone conversion is not required
4343
- Falling back to Spark for these specific operations
44-
<!--BEGIN:EXPR_COMPAT[datetime]-->
45-
<!--END:EXPR_COMPAT-->
44+
45+
<!--BEGIN:EXPR_COMPAT[datetime]-->
46+
47+
<!--END:EXPR_COMPAT-->
