[SPARK-57250][SQL] Construct sub-microsecond timestamp typed literals with precision derived from fractional digits#56306
Conversation
… with precision derived from fractional digits ### What changes were proposed in this pull request? Per ANSI SQL (ISO/IEC 9075-2, Subclause 5.3, Syntax Rule 27), the fractional-seconds precision of a typed timestamp literal is the number of digits in its `<seconds fraction>`. This PR makes `TIMESTAMP`, `TIMESTAMP_NTZ`, and `TIMESTAMP_LTZ` typed literals construct nanosecond-capable values with precision `p` derived from the fractional-digit count when the nanosecond preview (`spark.sql.timestampNanosTypes.enabled`) is on: - 7-9 fractional digits -> `TimestampNTZNanosType(p)` / `TimestampLTZNanosType(p)` - <= 6 fractional digits -> existing microsecond behavior (unchanged) - > 9 fractional digits -> `INVALID_TIMESTAMP_LITERAL_PRECISION` parse error A new `fractionalSecondsDigits` helper counts the digits of the seconds fraction, and `AstBuilder.visitTypeConstructor` routes to the nanos parse helpers accordingly. ### Why are the changes needed? To support standard-compliant sub-microsecond timestamp literals as part of the nanosecond timestamp preview (SPARK-56822). ### Does this PR introduce any user-facing change? Yes, but only when the nanosecond preview flag is enabled. With the flag off, behavior is unchanged. ### How was this patch tested? Added cases to `ExpressionParserSuite` covering 7/8/9-digit NTZ/LTZ/TIMESTAMP literals, zone resolution, boundary values, the 6-digit-stays-micro regression, the `>9`-digit error, and the flag-off regression.
…ll-through test cases Co-authored-by: Isaac
…actional-digit literals Co-authored-by: Isaac
|
@srielau May I ask you to review this PR when you have time. |
|
@stevomitric @uros-b Could you review this PR, please. |
| * `<seconds fraction>`). Digits beyond the fractional run (e.g. a trailing time zone) are not | ||
| * counted. | ||
| */ | ||
| def fractionalSecondsDigits(s: String): Int = { |
There was a problem hiding this comment.
What if this gets something like abcd.1234?
There was a problem hiding this comment.
Good question. fractionalSecondsDigits is intentionally a lightweight digit counter that runs before the full timestamp parse, so it doesn't validate the rest of the string itself. For abcd.1234 it returns 4, which is < 7, so nanosLiteralOpt returns None and we fall through to the existing microsecond path -- which then fails to parse abcd.1234 and throws the usual "cannot parse value" error, exactly as on master. The digit count only decides which parse path (micro vs. nanos vs. >9 error) we take, and every path re-parses and validates the whole string.
The one nuance: a malformed value with more than 9 digits after the first . (e.g. abcd.1234567890) short-circuits to INVALID_TIMESTAMP_LITERAL_PRECISION instead of the generic parse error. It's still an error, and the value does literally carry >9 fractional digits, so I kept the current behavior -- happy to gate the precision check behind a successful parse if you'd prefer the generic error there.
I've clarified the doc comment to state this "pre-parse counter, validated downstream" contract and added direct unit tests for fractionalSecondsDigits (no fraction, trailing time zone, non-digit stop, the abcd.1234 case, and the digit-count boundaries).
stevomitric
left a comment
There was a problem hiding this comment.
LGTM.
Two small coverage suggestions, neither blocking:
- Add a flag-off regression for >9 fractional digits, since the implementation intentionally keeps legacy truncation when the preview flag is disabled.
- Consider direct unit coverage for
fractionalSecondsDigits, especially no fraction, timezone suffix, and non-digit stop cases.
…digit regression Address review feedback on apache#56306: - Add direct unit coverage for `fractionalSecondsDigits` in `DateTimeUtilsSuite` (no fraction, trailing dot, 1/6/7/9/10-digit boundaries, non-digit stop cases such as a trailing time zone/whitespace, multi-dot, and malformed inputs). - Add a flag-off regression in `ExpressionParserSuite` asserting that a literal with more than 9 fractional digits narrows to microseconds (instead of raising INVALID_TIMESTAMP_LITERAL_PRECISION) when the nanosecond preview is disabled. - Clarify the `fractionalSecondsDigits` doc comment to state that it is a lightweight pre-parse digit counter and that malformed inputs are rejected downstream by the chosen parse path.
|
Thanks @stevomitric! Both made sense and neither was covered, so I added them:
|
What changes were proposed in this pull request?
Per the ANSI SQL standard (ISO/IEC 9075-2, Subclause 5.3, Syntax Rule 27), the fractional-seconds precision of a typed timestamp literal is the number of digits in its
<seconds fraction>. This PR makes the typed timestamp literalsTIMESTAMP '...',TIMESTAMP_NTZ '...', andTIMESTAMP_LTZ '...'construct nanosecond-capable values, deriving the precisionpfrom the fractional-digit count, when the SQL configspark.sql.timestampNanosTypes.enabledis enabled:TimestampNTZNanosType(p)/TimestampLTZNanosType(p)<=6 fractional digits -> existing microsecond behavior (unchanged)>9 fractional digits -> newINVALID_TIMESTAMP_LITERAL_PRECISIONparse errorConcretely:
SparkDateTimeUtils.fractionalSecondsDigits(String): Intto count the digits of the seconds fraction in a timestamp string.AstBuilder.visitTypeConstructorforTIMESTAMP/TIMESTAMP_NTZ/TIMESTAMP_LTZto the nanosecond parse helpers when the preview flag is on and the literal carries 7-9 fractional digits, preserving the existing NTZ/LTZ resolution rules for the bareTIMESTAMPkeyword.INVALID_TIMESTAMP_LITERAL_PRECISIONerror condition and aQueryParsingErrors.timestampLiteralPrecisionExceedsMaxErrorfactory for literals with more than 9 fractional-second digits.Why are the changes needed?
To support standard-compliant sub-microsecond (nanosecond) timestamp typed literals as part of the nanosecond timestamp preview umbrella (SPARK-56822). The ANSI SQL literal grammar carries no explicit precision argument; the precision is implied by the number of fractional-second digits, so the parser must derive it from the literal text.
Does this PR introduce any user-facing change?
Yes, but only when the preview flag
spark.sql.timestampNanosTypes.enabledis set totrue. In that case a typed timestamp literal with 7-9 fractional-second digits now produces a nanosecond-precision value instead of being truncated to microseconds, and a literal with more than 9 fractional digits raisesINVALID_TIMESTAMP_LITERAL_PRECISION. With the flag off (the default), behavior is unchanged: literals are parsed at microsecond precision as before.How was this patch tested?
Added cases to
ExpressionParserSuite("SPARK-57250: nanosecond timestamp typed literals") covering:TIMESTAMP_NTZ,TIMESTAMP_LTZ, and bareTIMESTAMP;nanosWithinMicro0 and 999);>9fractional-digit error;Was this patch authored or co-authored using generative AI tooling?
Generated-by: Cursor (Claude Opus 4.8)