[VL][Delta] Add native Delta bitmap aggregation support #12214

Open
malinjawi wants to merge 4 commits into apache:main from malinjawi:split/delta-dv-native-bitmap-pr

malinjawi (Contributor) commented Jun 1, 2026

What changes are proposed in this pull request?

This PR is the next split for Delta deletion-vector MoR support. It adds the native bitmap primitive needed by later DELETE DV work, without changing DELETE routing or enabling native bitmap construction in the command path yet.

Main changes:

  • extend RoaringBitmapArray for Delta Portable-format deletion-vector payloads
  • add bounded deserialization that sizes the payload with CRoaring's portable deserialize sizing before calling readSafe
  • add native bitmapaggregator support for Delta row-index aggregation
  • wire the aggregate name through Gluten expression/substrait planning
  • add focused native tests for bitmap serialization/deserialization and aggregate behavior
  • add delta_bitmap_benchmark with construction, partial-merge, and deserialize/probe cases
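
To make the aggregation shape concrete, here is a minimal sketch in Python rather than the native C++. It assumes the usual RoaringBitmapArray idea of splitting a 64-bit row index into a high 32-bit bucket key and a low 32-bit value, and a two-phase aggregate in which partial bitmaps are built independently and then merged by union. `BitmapSketch`, `split_row_index`, and `aggregate` are hypothetical names, and plain Python sets stand in for CRoaring's compressed containers, so this illustrates the behavior only, not the implementation in this PR.

```python
from collections import defaultdict

MASK32 = 0xFFFFFFFF


def split_row_index(row_index):
    """Split a 64-bit row index into (high 32 bits, low 32 bits)."""
    if not 0 <= row_index < 1 << 64:
        raise ValueError("row index out of unsigned 64-bit range")
    return row_index >> 32, row_index & MASK32


class BitmapSketch:
    """Stand-in for a bitmap array: one set of low words per high word."""

    def __init__(self):
        self.buckets = defaultdict(set)

    def add(self, row_index):
        high, low = split_row_index(row_index)
        self.buckets[high].add(low)

    def merge(self, other):
        # Merging two partial results is a per-bucket set union.
        for high, lows in other.buckets.items():
            self.buckets[high] |= lows

    def cardinality(self):
        return sum(len(lows) for lows in self.buckets.values())

    def contains(self, row_index):
        high, low = split_row_index(row_index)
        return low in self.buckets.get(high, ())


def aggregate(row_indexes, num_partials):
    """Build partial bitmaps round-robin, then merge them into one result."""
    partials = [BitmapSketch() for _ in range(num_partials)]
    for i, row in enumerate(row_indexes):
        partials[i % num_partials].add(row)
    final = BitmapSketch()
    for partial in partials:
        final.merge(partial)
    return final


# Duplicates collapse, and the value above 2**32 lands in its own bucket.
rows = list(range(1000)) + [5, 5, (1 << 33) + 3]
result = aggregate(rows, num_partials=4)
print(result.cardinality(), sorted(result.buckets))
```

Because union is associative and commutative, the merged result does not depend on how rows were distributed across partials. That is the property an aggregate with a partial-merge step relies on.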

This PR is intentionally primitive-only:

  • no DELETE command routing changes
  • no DML row-index scan planning changes
  • no plain Parquet target scan optimization
  • no native bitmap aggregation enabled as the default DELETE path

Those pieces remain in follow-up split PRs after the primitive and benchmark shape are reviewed.

How was this patch tested?

Post-rebase validation on top of current upstream/main (33be6fb8bf703ac16eae3c75efa919a97d9cdf5a):

  • git diff --check upstream/main...HEAD
  • env JAVA_HOME=/opt/homebrew/opt/openjdk@17/libexec/openjdk.jdk/Contents/Home PATH=/opt/homebrew/opt/openjdk@17/bin:$PATH ./build/mvn test-compile -pl backends-velox -am -Pjava-17,spark-3.5,backends-velox,hadoop-3.3,spark-ut,delta -DskipTests
  • env JAVA_HOME=/opt/homebrew/opt/openjdk@17/libexec/openjdk.jdk/Contents/Home PATH=/opt/homebrew/opt/openjdk@17/bin:$PATH ./build/mvn test-compile -pl backends-velox -am -Pjava-17,spark-4.0,scala-2.13,backends-velox,hadoop-3.3,spark-ut,delta -DskipTests

Focused standalone native validation from the same diff before the final rebase:

  • standalone RoaringBitmapArrayTest: passed all 9 focused tests
  • Delta JVM compatibility, checked in both directions: native code reads a JVM-generated sparse-gap portable fixture for values 1, 7, and 1 << 33; a Delta 3.3.2 JVM helper reads the native compact portable payload for the same values and reports cardinality 3, passes all expected contains checks, and returns last value 8589934592
  • standalone delta_bitmap_benchmark construction/merge output: /tmp/delta_bitmap_benchmark_delete_construction.json
  • standalone delta_bitmap_benchmark read/probe output: /tmp/delta_bitmap_benchmark_read_probe.json
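
As a quick check on why the fixture values 1, 7, and 1 << 33 are a useful compatibility case, the sketch below decomposes them into high and low 32-bit words. It assumes the high word selects the bitmap bucket; reading "sparse-gap" as "an unpopulated bucket sits between populated ones" is an interpretation, not something the fixture description states. `high_low` is a hypothetical helper.

```python
FIXTURE_VALUES = [1, 7, 1 << 33]


def high_low(value):
    """Return (high 32 bits, low 32 bits) of a non-negative 64-bit value."""
    return value >> 32, value & 0xFFFFFFFF


parts = [high_low(v) for v in FIXTURE_VALUES]
populated = sorted({high for high, _ in parts})
# Bucket keys below the largest populated key that hold no values.
gaps = [h for h in range(populated[-1] + 1) if h not in populated]

print("parts:", parts)
print("populated buckets:", populated, "gap buckets:", gaps)
print("cardinality:", len(set(FIXTURE_VALUES)), "last value:", max(FIXTURE_VALUES))
```

Under that assumption the three values occupy buckets 0 and 2 and leave bucket 1 empty, so a reader has to handle both the 64-bit high word and a gap between buckets.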

Benchmark highlights from the standalone run:

  • contiguous 1M build+serialize: 7.91 ms, 132.5M rows/s
  • sparse 1M build+serialize: 9.99 ms, 105.0M rows/s
  • clustered 1M build+serialize: 10.10 ms, 103.9M rows/s
  • multi-bucket 256K build+serialize: 2.28 ms, 114.9M rows/s
  • sparse 1M merge from 64 partials: 1.12 ms
  • contiguous round-robin merge from 64 partials: 1.32 ms
  • sparse deserialize+probe: 487 us for an 8,192-probe sample
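
The throughput figures can be checked against the timings with a few lines of arithmetic. The sketch below assumes "1M" and "256K" mean binary sizes (2^20 and 2^18 rows). The benchmark description does not state this; it is inferred from the numbers, which line up with the reported rates only under that reading. The timings are the rounded values listed above, so small differences in the last digit are expected.

```python
ROWS_1M = 1 << 20    # assumption: "1M" means 1,048,576 rows
ROWS_256K = 1 << 18  # assumption: "256K" means 262,144 rows

# name: (rows, reported milliseconds, reported millions of rows per second)
CASES = {
    "contiguous 1M": (ROWS_1M, 7.91, 132.5),
    "sparse 1M": (ROWS_1M, 9.99, 105.0),
    "clustered 1M": (ROWS_1M, 10.10, 103.9),
    "multi-bucket 256K": (ROWS_256K, 2.28, 114.9),
}


def millions_of_rows_per_second(rows, millis):
    """Convert a row count and a duration in milliseconds to M rows/s."""
    return rows / (millis / 1000.0) / 1e6


def relative_error(computed, reported):
    return abs(computed - reported) / reported


for name, (rows, millis, reported) in CASES.items():
    computed = millions_of_rows_per_second(rows, millis)
    print(f"{name}: computed {computed:.1f}M rows/s, reported {reported}M rows/s")
```

Read the same way, 487 us for an 8,192-probe sample works out to roughly 59 ns per probe if the figure covers the whole sample, with deserialization cost amortized into that number.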

CI status:

  • Superseding license, C++, and Scala format checks are green.
  • Native build/test lanes are green, including Velox Backend x86/ARM native build and UDF test coverage.
  • Spark matrix and TPC/check lanes are green in the latest completed snapshot.

Notes:

  • Normal local Gluten C++ target validation is still limited by local Velox/build-tree setup, so the regular project CI is the authoritative native validation for this PR.
  • This PR is ready for review as the primitive-only bitmap foundation. End-to-end DELETE bitmap construction performance remains in the follow-up native DELETE benchmark branch.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: IBM BOB

github-actions bot added the CORE (works for Gluten Core) and VELOX labels Jun 1, 2026

github-actions bot commented Jun 1, 2026

Run Gluten Clickhouse CI on x86

malinjawi marked this pull request as ready for review June 1, 2026 18:39