Skip to content

[SPARK-57250][SQL] Construct sub-microsecond timestamp typed literals with precision derived from fractional digits#56306

Open
MaxGekk wants to merge 4 commits into
apache:masterfrom
MaxGekk:spark-57250-typed-literals
Open

[SPARK-57250][SQL] Construct sub-microsecond timestamp typed literals with precision derived from fractional digits#56306
MaxGekk wants to merge 4 commits into
apache:masterfrom
MaxGekk:spark-57250-typed-literals

Conversation

@MaxGekk
Copy link
Copy Markdown
Member

@MaxGekk MaxGekk commented Jun 3, 2026

What changes were proposed in this pull request?

Per the ANSI SQL standard (ISO/IEC 9075-2, Subclause 5.3, Syntax Rule 27), the fractional-seconds precision of a typed timestamp literal is the number of digits in its <seconds fraction>. This PR makes the typed timestamp literals TIMESTAMP '...', TIMESTAMP_NTZ '...', and TIMESTAMP_LTZ '...' construct nanosecond-capable values, deriving the precision p from the fractional-digit count, when the SQL config spark.sql.timestampNanosTypes.enabled is enabled:

  • 7-9 fractional digits -> TimestampNTZNanosType(p) / TimestampLTZNanosType(p)
  • <= 6 fractional digits -> existing microsecond behavior (unchanged)
  • > 9 fractional digits -> new INVALID_TIMESTAMP_LITERAL_PRECISION parse error

Concretely:

  • Add SparkDateTimeUtils.fractionalSecondsDigits(String): Int to count the digits of the seconds fraction in a timestamp string.
  • Route AstBuilder.visitTypeConstructor for TIMESTAMP/TIMESTAMP_NTZ/TIMESTAMP_LTZ to the nanosecond parse helpers when the preview flag is on and the literal carries 7-9 fractional digits, preserving the existing NTZ/LTZ resolution rules for the bare TIMESTAMP keyword.
  • Add the INVALID_TIMESTAMP_LITERAL_PRECISION error condition and a QueryParsingErrors.timestampLiteralPrecisionExceedsMaxError factory for literals with more than 9 fractional-second digits.

Why are the changes needed?

To support standard-compliant sub-microsecond (nanosecond) timestamp typed literals as part of the nanosecond timestamp preview umbrella (SPARK-56822). The ANSI SQL literal grammar carries no explicit precision argument; the precision is implied by the number of fractional-second digits, so the parser must derive it from the literal text.

Does this PR introduce any user-facing change?

Yes, but only when the preview flag spark.sql.timestampNanosTypes.enabled is set to true. In that case a typed timestamp literal with 7-9 fractional-second digits now produces a nanosecond-precision value instead of being truncated to microseconds, and a literal with more than 9 fractional digits raises INVALID_TIMESTAMP_LITERAL_PRECISION. With the flag off (the default), behavior is unchanged: literals are parsed at microsecond precision as before.

How was this patch tested?

Added cases to ExpressionParserSuite ("SPARK-57250: nanosecond timestamp typed literals") covering:

  • 7/8/9-digit fractional precision for TIMESTAMP_NTZ, TIMESTAMP_LTZ, and bare TIMESTAMP;
  • NTZ/LTZ resolution based on the keyword and the presence of a time-zone part;
  • boundary values (pre-epoch, max year, nanosWithinMicro 0 and 999);
  • the 6-digit-stays-microsecond regression;
  • the >9 fractional-digit error;
  • the flag-off regression (microsecond behavior preserved).

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor (Claude Opus 4.8)

MaxGekk added 3 commits June 3, 2026 18:24
… with precision derived from fractional digits

### What changes were proposed in this pull request?

Per ANSI SQL (ISO/IEC 9075-2, Subclause 5.3, Syntax Rule 27), the fractional-seconds
precision of a typed timestamp literal is the number of digits in its `<seconds fraction>`.
This PR makes `TIMESTAMP`, `TIMESTAMP_NTZ`, and `TIMESTAMP_LTZ` typed literals construct
nanosecond-capable values with precision `p` derived from the fractional-digit count when
the nanosecond preview (`spark.sql.timestampNanosTypes.enabled`) is on:

- 7-9 fractional digits -> `TimestampNTZNanosType(p)` / `TimestampLTZNanosType(p)`
- <= 6 fractional digits -> existing microsecond behavior (unchanged)
- > 9 fractional digits -> `INVALID_TIMESTAMP_LITERAL_PRECISION` parse error

A new `fractionalSecondsDigits` helper counts the digits of the seconds fraction, and
`AstBuilder.visitTypeConstructor` routes to the nanos parse helpers accordingly.

### Why are the changes needed?

To support standard-compliant sub-microsecond timestamp literals as part of the
nanosecond timestamp preview (SPARK-56822).

### Does this PR introduce any user-facing change?

Yes, but only when the nanosecond preview flag is enabled. With the flag off, behavior
is unchanged.

### How was this patch tested?

Added cases to `ExpressionParserSuite` covering 7/8/9-digit NTZ/LTZ/TIMESTAMP literals,
zone resolution, boundary values, the 6-digit-stays-micro regression, the `>9`-digit
error, and the flag-off regression.
…actional-digit literals

Co-authored-by: Isaac
@MaxGekk
Copy link
Copy Markdown
Member Author

MaxGekk commented Jun 3, 2026

@srielau May I ask you to review this PR when you have time.

@MaxGekk
Copy link
Copy Markdown
Member Author

MaxGekk commented Jun 4, 2026

@stevomitric @uros-b Could you review this PR, please.

* `<seconds fraction>`). Digits beyond the fractional run (e.g. a trailing time zone) are not
* counted.
*/
def fractionalSecondsDigits(s: String): Int = {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if this gets something like abcd.1234?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. fractionalSecondsDigits is intentionally a lightweight digit counter that runs before the full timestamp parse, so it doesn't validate the rest of the string itself. For abcd.1234 it returns 4, which is < 7, so nanosLiteralOpt returns None and we fall through to the existing microsecond path -- which then fails to parse abcd.1234 and throws the usual "cannot parse value" error, exactly as on master. The digit count only decides which parse path (micro vs. nanos vs. >9 error) we take, and every path re-parses and validates the whole string.

The one nuance: a malformed value with more than 9 digits after the first . (e.g. abcd.1234567890) short-circuits to INVALID_TIMESTAMP_LITERAL_PRECISION instead of the generic parse error. It's still an error, and the value does literally carry >9 fractional digits, so I kept the current behavior -- happy to gate the precision check behind a successful parse if you'd prefer the generic error there.

I've clarified the doc comment to state this "pre-parse counter, validated downstream" contract and added direct unit tests for fractionalSecondsDigits (no fraction, trailing time zone, non-digit stop, the abcd.1234 case, and the digit-count boundaries).

Copy link
Copy Markdown
Contributor

@stevomitric stevomitric left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Two small coverage suggestions, neither blocking:

  1. Add a flag-off regression for >9 fractional digits, since the implementation intentionally keeps legacy truncation when the preview flag is disabled.
  2. Consider direct unit coverage for fractionalSecondsDigits, especially no fraction, timezone suffix, and non-digit stop cases.

…digit regression

Address review feedback on apache#56306:

- Add direct unit coverage for `fractionalSecondsDigits` in `DateTimeUtilsSuite`
  (no fraction, trailing dot, 1/6/7/9/10-digit boundaries, non-digit stop cases
  such as a trailing time zone/whitespace, multi-dot, and malformed inputs).
- Add a flag-off regression in `ExpressionParserSuite` asserting that a literal
  with more than 9 fractional digits narrows to microseconds (instead of raising
  INVALID_TIMESTAMP_LITERAL_PRECISION) when the nanosecond preview is disabled.
- Clarify the `fractionalSecondsDigits` doc comment to state that it is a
  lightweight pre-parse digit counter and that malformed inputs are rejected
  downstream by the chosen parse path.
@MaxGekk
Copy link
Copy Markdown
Member Author

MaxGekk commented Jun 5, 2026

Thanks @stevomitric! Both made sense and neither was covered, so I added them:

  1. A flag-off regression with >9 fractional digits asserting the literal narrows to microseconds (...00.1234567890 -> ...00.123456) instead of raising INVALID_TIMESTAMP_LITERAL_PRECISION, locking in the intentional flag-gated validation.
  2. Direct unit tests for fractionalSecondsDigits in DateTimeUtilsSuite, covering no fraction, a trailing time-zone suffix, a non-digit stop (e.g. abcd.1234), the multi-dot case, and the 0/6/7/9/10-digit boundaries.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants