[VL] Add lazy per-column deserialization for Columnar Table Cache #12211

Open
jackylee-ch wants to merge 1 commit into apache:main from jackylee-ch:table-cache-lazy-deserialization

@jackylee-ch
Contributor

@jackylee-ch jackylee-ch commented Jun 1, 2026

What changes

This PR makes the Velox table cache write V3 per-column framed bytes by default. Lazy materialization becomes a base table-cache capability; spark.gluten.sql.columnar.tableCache.partitionStats.enabled now controls only the optional stats/pruning payload.

  • Removes spark.gluten.sql.columnar.tableCache.lazy.deserialization.enabled.
  • Adds V3 no-stats serialization (statsLen=0) for the default lazy path.
  • Keeps V3 with stats for partition pruning when partition stats are enabled.
  • Keeps V2 stats and legacy raw bytes as native-capability / backward-read fallback paths.
  • Routes V3 cached bytes through projected native deserialization.
  • Adds JVM/native golden, lazy serde, and GHA benchmark coverage.
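
To make the idea of per-column framing with an optional stats payload concrete, here is a minimal sketch. It is not the actual V3 wire format: the magic bytes, field order, field widths, and the function names `encode_frame` and `decode_projected` are all assumptions made for illustration. It only shows why a length-prefixed column layout lets a reader copy the projected columns and skip the rest, and how `statsLen=0` can stand for "no stats".

```python
import struct

# Hypothetical 4-byte marker; the real V3 magic is not given in this PR text.
MAGIC = b"GLC3"


def encode_frame(columns, stats=b""):
    """Frame each column separately so a reader can skip columns it does not need."""
    out = bytearray(MAGIC)
    out += struct.pack("<I", len(stats))  # statsLen == 0 means "no stats payload"
    out += stats
    out += struct.pack("<I", len(columns))
    for col in columns:
        out += struct.pack("<I", len(col))  # per-column length prefix
        out += col
    return bytes(out)


def decode_projected(frame, wanted):
    """Return (stats, {index: bytes}), copying only the requested column payloads."""
    if frame[:4] != MAGIC:
        raise ValueError("not a framed payload")
    (stats_len,) = struct.unpack_from("<I", frame, 4)
    pos = 8
    stats = frame[pos:pos + stats_len]
    pos += stats_len
    (num_cols,) = struct.unpack_from("<I", frame, pos)
    pos += 4
    picked = {}
    for idx in range(num_cols):
        (col_len,) = struct.unpack_from("<I", frame, pos)
        pos += 4
        if idx in wanted:
            picked[idx] = frame[pos:pos + col_len]
        pos += col_len  # advance past the payload whether or not it was copied
    return stats, picked
```

In this sketch, reading 1 of 16 columns touches 16 length prefixes but copies only one payload, which is the kind of saving the lazy path is aiming for.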

Performance

No local benchmark result is quoted in this PR. Results below are from GitHub Actions.

  • Workflow: Velox Backend (x86), run 26822472145.
  • Artifact: table-cache-lazy-deserialization-benchmark.
  • Benchmark: ColumnarTableCacheLazyDeserBenchmark.
  • Environment: Linux x86_64, AMD EPYC 7763, JDK 8.
  • Dataset: 5M rows, 32 partitions, 3 iterations.
  • Benchmark step duration: 03:08 min.

| Case | V3 lazy no stats | V3 lazy + stats | Relative |
|---|---|---|---|
| Cache footprint | 493.59 MiB | 493.58 MiB | ~1.0x |
| Read 1/16 columns, best time | 219 ms | 204 ms | 1.1x |
| Read all 16 columns, best time | 2228 ms | 2222 ms | 1.0x |
| Filter + 2/16 columns, best time | 104 ms | 81 ms | 1.3x |
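
The Relative figures appear to be the no-stats value divided by the with-stats value, rounded to one decimal place. That reading is an assumption, but it is consistent with every row of the results above, as this small check shows (the dictionary keys are just labels chosen here):

```python
# (no-stats value, with-stats value) taken from the GHA results above.
rows = {
    "Cache footprint (MiB)": (493.59, 493.58),
    "Read 1/16 columns (ms)": (219, 204),
    "Read all 16 columns (ms)": (2228, 2222),
    "Filter + 2/16 columns (ms)": (104, 81),
}


def relative(no_stats, with_stats):
    """Ratio of the no-stats figure to the with-stats figure, one decimal place."""
    return round(no_stats / with_stats, 1)


ratios = {name: relative(a, b) for name, (a, b) in rows.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio}x")
```

Only the filter case shows a gap beyond rounding, which fits the stats payload being used for partition pruning.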

How was this patch tested?

  • ./dev/format-scala-code.sh
  • PATH="/opt/homebrew/opt/llvm@15/bin:$PATH" ./dev/format-cpp-code.sh
  • git diff --check
  • ruby -e 'require "yaml"; YAML.load_file(".github/workflows/velox_backend_x86.yml"); puts "yaml ok"'
  • env CCACHE_DIR=/private/tmp/gluten-ccache cmake --build cpp/build --target libgluten.dylib velox_operators_test -j 8
  • env DYLD_LIBRARY_PATH=cpp/build/releases:/Users/lijunqing/Code/stczwd/gluten/ep/build-velox/build/velox_ep/deps-install/lib:/opt/homebrew/lib cpp/build/velox/tests/velox_operators_test --gtest_filter=VeloxColumnarBatchSerializerTest.framedSerializeWithStatsV3EmptyGolden:VeloxColumnarBatchSerializerTest.framedSerializeV3NoStatsEmptyGolden
  • env DYLD_LIBRARY_PATH=cpp/build/releases:/Users/lijunqing/Code/stczwd/gluten/ep/build-velox/build/velox_ep/deps-install/lib:/opt/homebrew/lib ./build/mvn test-compile scalatest:test -pl backends-velox -Pbackends-velox -Pspark-3.5 -Pscala-2.12 -DwildcardSuites=org.apache.spark.sql.execution.ColumnarCachedBatchFramedBytesSuite,org.apache.spark.sql.execution.ColumnarCachedBatchLazySerdeTest -DfailIfNoTests=false
  • Local benchmark smoke only, not used as PR performance data: ColumnarTableCacheLazyDeserBenchmark with 10000 rows, 4 partitions, 1 iteration, phases read1,readAll,filter.
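
A one-iteration smoke run like the last item gives a best time but no estimate of spread, which is one reason to keep it out of the performance data. The sketch below illustrates that with a hypothetical `summarize` helper; the timings passed to it in the usage note are placeholders, not measurements from this PR.

```python
import statistics


def summarize(samples_ms):
    """Return (best time, sample standard deviation or None when undefined)."""
    best = min(samples_ms)
    # The sample standard deviation needs at least two data points;
    # statistics.stdev raises StatisticsError for a single sample.
    spread = statistics.stdev(samples_ms) if len(samples_ms) >= 2 else None
    return best, spread
```

Calling `summarize([104.0])` reports no spread at all, while `summarize([100.0, 104.0, 108.0])` reports a best time of 100.0 and a spread of 4.0.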

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Codex GPT-5

@github-actions github-actions Bot added CORE works for Gluten Core VELOX DOCS labels Jun 1, 2026
@jackylee-ch jackylee-ch marked this pull request as draft June 1, 2026 04:58
@github-actions

github-actions Bot commented Jun 1, 2026

Run Gluten Clickhouse CI on x86

@jackylee-ch jackylee-ch force-pushed the table-cache-lazy-deserialization branch from 58bd451 to d5a0502 Compare June 1, 2026 08:59
@github-actions

github-actions Bot commented Jun 1, 2026

Run Gluten Clickhouse CI on x86

@jackylee-ch jackylee-ch force-pushed the table-cache-lazy-deserialization branch from d5a0502 to 8e374db Compare June 1, 2026 09:05
@github-actions

github-actions Bot commented Jun 1, 2026

Run Gluten Clickhouse CI on x86

@jackylee-ch jackylee-ch force-pushed the table-cache-lazy-deserialization branch from 8e374db to 0f0ccd2 Compare June 1, 2026 09:08
@github-actions

github-actions Bot commented Jun 1, 2026

Run Gluten Clickhouse CI on x86

@jackylee-ch jackylee-ch force-pushed the table-cache-lazy-deserialization branch from 0f0ccd2 to 8b09d6b Compare June 1, 2026 11:21
@github-actions

github-actions Bot commented Jun 1, 2026

Run Gluten Clickhouse CI on x86

@jackylee-ch jackylee-ch marked this pull request as ready for review June 1, 2026 14:20
@jackylee-ch
Contributor Author

@yaooqinn PTAL

@yaooqinn
Member

yaooqinn commented Jun 2, 2026

Thanks @jackylee-ch, V3 layout is a sensible extension of the cache-stats wire we landed in #12092 / #12196. Several things to discuss before this lands:

1. Benchmark needs to be re-run. The checked-in -results.txt is 10K rows / 4 partitions / 1 iteration on an Apple M5 Pro, so Stdev=0 across the board simply because there is only one sample. Differences in the 1-3 ms range (e.g. "1.1X" at all-16-cols read, where lazy mode physically cannot be faster than eager) are noise. The 1.9X build result is also surprising, because V3 does N serializeSingleColumn calls vs V2's single-pass batchSerialize, so the ordering legacy > V2 > V3 doesn't match the physical work done. This needs reruns on a server / GHA-equivalent runner with iter≥3 and 100M rows / 32 partitions (matching the code defaults). Please also add a cache memory footprint column: V3 per-col framing + getFlattenedRowVector() flattening Dictionary/Constant encodings could regress cache size significantly for dict-encoded payloads, and that's currently unmeasured.

2. Do we really need a new SQLConf? V3 functionally supersedes V2 (V3 frames also carry statsBlob), so this isn't a new behavioral feature — it's a wire-format upgrade. Adding a dedicated lazy.deserialization.enabled boolean commits Gluten to maintaining three cache paths (legacy / V2-stats / V3-lazy-and-stats) and a three-level fallback chain. Once we trust V3, we'd want to deprecate V2-stats, which means another deprecation cycle. Could we either (a) skip the conf and gate V3 behind partitionStats.enabled once it's stable, or (b) turn partitionStats.enabled into a string conf with off | v2 | v3 values? Configuration.md already warns "V3 is NOT backward compatible with V2 readers" + default=false — operationally nobody is going to flip this, so the conf risks being long-lived dead code.

3. Cross-language test parity vs #12196. V3 has no cpp-side byte-equal golden test; JVM-side tests synthesize their own frames via craftV3Framed. We just established the cpp-golden ↔ JVM-parser round-trip pattern in #12196 specifically because layout drift between halves is a correctness hazard. V3 needs the same: a framedSerializeWithStatsV3Golden cpp test pinning a byte-stable literal + a JVM parser round-trip over that same literal.

4. Smaller items.

  • All-null column case not covered (we hit the PrestoSerde uninit-values bug during development of #12092, "[VL] Add min/max partition stats to columnar InMemoryRelation cache for partition pruning"; the per-col path is in the same risk class).
  • getFlattenedRowVector() side effect on Dictionary/Constant encoding not documented.
  • The // JNI pin outlives comment in deserializeV3 describes a non-issue (copies are made synchronously in step 6, the lazy loader doesn't depend on the pin) — please trim.
  • Two near-identical magic checks (parseFramedBytes byte[3] dispatch vs isV3Format 4-byte compare) — please consolidate.
  • Consider folding statsExtV3AvailableFlag and statsExtAvailableFlag into a single capability enum (Unknown | V2 | V3 | Unavailable) — two independent one-shot latches double the operational diagnosis surface.

Happy to file any of these as separate issues if it helps.
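
As an illustration of the single capability enum suggested in the last bullet above, the following sketch replaces two independent one-shot flags with one latched state. Every name in it (`StatsCapability`, `CapabilityLatch`, the two probe callables) is hypothetical and stands in for whatever native-capability checks the real code performs.

```python
from enum import Enum


class StatsCapability(Enum):
    UNKNOWN = "unknown"
    V2 = "v2"
    V3 = "v3"
    UNAVAILABLE = "unavailable"


class CapabilityLatch:
    """Probe the native library once, then remember the single resulting state."""

    def __init__(self, probe_v3, probe_v2):
        # probe_v3 / probe_v2 are hypothetical zero-argument callables returning bool.
        self._probe_v3 = probe_v3
        self._probe_v2 = probe_v2
        self.state = StatsCapability.UNKNOWN

    def resolve(self):
        # Only the first call probes; later calls return the latched state.
        if self.state is StatsCapability.UNKNOWN:
            if self._probe_v3():
                self.state = StatsCapability.V3
            elif self._probe_v2():
                self.state = StatsCapability.V2
            else:
                self.state = StatsCapability.UNAVAILABLE
        return self.state
```

With one state there is a single value to log and inspect when diagnosing which fallback path a session ended up on.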

@jackylee-ch jackylee-ch force-pushed the table-cache-lazy-deserialization branch from 8b09d6b to 09679ee Compare June 2, 2026 06:24
@github-actions

github-actions Bot commented Jun 2, 2026

Run Gluten Clickhouse CI on x86

@jackylee-ch jackylee-ch force-pushed the table-cache-lazy-deserialization branch from 09679ee to ab9e0f7 Compare June 2, 2026 06:30
@github-actions

github-actions Bot commented Jun 2, 2026

Run Gluten Clickhouse CI on x86

@jackylee-ch jackylee-ch force-pushed the table-cache-lazy-deserialization branch from ab9e0f7 to 144e816 Compare June 2, 2026 06:47
@github-actions github-actions Bot removed the CORE works for Gluten Core label Jun 2, 2026
@jackylee-ch jackylee-ch force-pushed the table-cache-lazy-deserialization branch 2 times, most recently from b77f4ab to 9a0f96a Compare June 2, 2026 07:28
@github-actions github-actions Bot removed the DOCS label Jun 2, 2026
@jackylee-ch jackylee-ch force-pushed the table-cache-lazy-deserialization branch from 9a0f96a to b5b1906 Compare June 2, 2026 09:01
@github-actions github-actions Bot added the INFRA label Jun 2, 2026
@jackylee-ch jackylee-ch force-pushed the table-cache-lazy-deserialization branch 2 times, most recently from 2fad6fb to 2b96545 Compare June 2, 2026 13:20
Write V3 per-column cache bytes by default for Velox table cache. Partition stats now only controls the optional stats/pruning payload: stats off writes a no-stats V3 frame, stats on writes V3 with stats, and older native libraries still fall back to V2 stats or legacy bytes.

Add the V3 no-stats JNI/native serializer, JVM parsing for statsLen=0, cross-language golden coverage, and GitHub Actions benchmark execution without committing local benchmark results.

Change-Id: I2a8582f901fafd436cac1a1d16e0367e9330b336
@jackylee-ch jackylee-ch force-pushed the table-cache-lazy-deserialization branch from 2b96545 to c3cc1bd Compare June 2, 2026 15:28
@github-actions

github-actions Bot commented Jun 2, 2026

Run Gluten Clickhouse CI on x86

@github-actions github-actions Bot added the CORE works for Gluten Core label Jun 2, 2026
Labels

CORE works for Gluten Core INFRA VELOX

2 participants