Commit f3b08bc
fix: [df52] timestamp nanos precision loss with nanosAsLong (#3502)
When Spark's `LEGACY_PARQUET_NANOS_AS_LONG=true` setting maps Parquet
TIMESTAMP(NANOS) columns to LongType, the PhysicalExprAdapter detects a type
mismatch between the file's Timestamp(Nanosecond) and the logical Int64. The
DefaultAdapter creates a CastColumnExpr, which SparkPhysicalExprAdapter then
replaces with Spark's Cast expression. Spark's Cast postprocess step for
Timestamp→Int64 unconditionally divides by MICROS_PER_SECOND (10^6), assuming
microsecond precision. The values are nanoseconds, however, so the raw value
1668537129123534758 becomes 1668537129123 (effectively milliseconds), losing
sub-millisecond precision.
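The arithmetic behind the precision loss can be reproduced in a few lines. This is a standalone sketch, not Comet's code: `spark_style_timestamp_to_long` and `reinterpret_timestamp_as_long` are hypothetical names, and modelling the Spark step as a floor division is an assumption (for positive inputs it is identical to plain integer division).

```rust
/// Spark's MICROS_PER_SECOND constant: 10^6.
const MICROS_PER_SECOND: i64 = 1_000_000;

/// Hypothetical model of the old Timestamp -> Int64 path: the input is assumed
/// to be microseconds and is floor-divided down to whole seconds.
fn spark_style_timestamp_to_long(raw: i64) -> i64 {
    raw.div_euclid(MICROS_PER_SECOND)
}

/// What nanosAsLong should produce: the raw i64 stored in the file, untouched.
fn reinterpret_timestamp_as_long(raw: i64) -> i64 {
    raw
}

fn main() {
    // TIMESTAMP(NANOS) value quoted in the commit message.
    let nanos: i64 = 1_668_537_129_123_534_758;

    let wrong = spark_style_timestamp_to_long(nanos);
    let right = reinterpret_timestamp_as_long(nanos);

    println!("Spark Cast path:  {wrong}"); // 1668537129123
    println!("reinterpret path: {right}"); // 1668537129123534758
    // The six low-order digits (sub-millisecond nanos) are discarded.
    println!("discarded:        {}", nanos - wrong * MICROS_PER_SECOND); // 534758
}
```

Because the input is really nanoseconds, dividing by 10^6 yields milliseconds rather than seconds, which is why the result still looks like a plausible epoch value and the bug is easy to miss.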
Fix: route Timestamp→Int64 casts through CometCastColumnExpr (which uses
spark_parquet_convert → Arrow cast) instead of Spark Cast. Arrow's cast
correctly reinterprets the raw i64 value without any division.
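The routing change can be pictured as a single match on the (physical, logical) type pair. The sketch below is illustrative only: `DataType`, `TimeUnit`, `CastPath`, `choose_cast_path` and `timestamp_to_long` are simplified, hypothetical stand-ins rather than the Arrow or Comet APIs, and it assumes every Timestamp→Int64 pair takes the new path while all other pairs keep going to Spark Cast.

```rust
/// Simplified, hypothetical stand-ins for the Arrow types involved.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TimeUnit { Second, Millisecond, Microsecond, Nanosecond }

#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DataType { Int64, Utf8, Timestamp(TimeUnit) }

/// Which expression the adapter emits for a physical -> logical cast.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CastPath {
    /// CometCastColumnExpr: spark_parquet_convert, then an Arrow cast.
    CometCastColumn,
    /// Spark's Cast expression.
    SparkCast,
}

/// Hypothetical routing rule after the fix: Timestamp -> Int64 bypasses
/// Spark Cast; every other pair is routed as before.
fn choose_cast_path(physical: DataType, logical: DataType) -> CastPath {
    match (physical, logical) {
        (DataType::Timestamp(_), DataType::Int64) => CastPath::CometCastColumn,
        _ => CastPath::SparkCast,
    }
}

/// Effect of each path on a single raw timestamp value (illustrative only).
fn timestamp_to_long(path: CastPath, raw: i64) -> i64 {
    match path {
        // Arrow's Timestamp -> Int64 cast reinterprets the stored i64.
        CastPath::CometCastColumn => raw,
        // Spark Cast assumes microseconds and floors to whole seconds.
        CastPath::SparkCast => raw.div_euclid(1_000_000),
    }
}

fn main() {
    let path = choose_cast_path(DataType::Timestamp(TimeUnit::Nanosecond), DataType::Int64);
    println!("{path:?} -> {}", timestamp_to_long(path, 1_668_537_129_123_534_758));
    // CometCastColumn -> 1668537129123534758
}
```

Matching on the type pair keeps the special case narrow: only the combination that Spark Cast mishandles for nanosAsLong is diverted, so other casts retain Spark-compatible semantics.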
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

1 parent dc2b9a4
1 file changed
Lines changed: 11 additions & 4 deletions
Diff content not captured. One hunk (old lines 302 to 324, new lines 302 to 331): old lines 305 to 308 replaced by new lines 305 to 309, new lines 317 to 321 added, and new line 328 added.