[parquet] Add map shredding for hot keys#7877

Open
Aitozi wants to merge 1 commit into apache:master from Aitozi:mwj-map-shredding

Aitozi commented May 17, 2026

Purpose

Add Parquet map shredding support for MAP<STRING, T> columns.

This allows selected map columns to extract hot keys into independent physical Parquet columns while preserving the original logical map schema for readers. The feature is controlled by map.shredding.* options, aligned with the existing variant.shredding.* naming style. It also adds a focused round-trip test and a storage benchmark to validate the storage benefit.

Tests

  • mvn -pl paimon-api,paimon-format -Pfast-build -DskipTests compile
  • mvn -pl paimon-format -am -Pfast-build -DfailIfNoTests=false -Dtest=ParquetFormatReadWriteTest#testMapShreddingRoundTrip,MapShreddingStorageBenchmark test
  • git diff --check

Physical Layout

This change does not introduce a new Parquet logical type and does not modify the standard Parquet MAP encoding. A shredded map is still written with the regular Parquet map group as the residual map. Hot keys are promoted into additional sibling sidecar columns in the parent Parquet group.

For example, a logical field:

headers MAP<STRING, STRING>

is normally written as:

message paimon_schema {
  optional group headers (MAP) {
    repeated group key_value {
      required binary key (STRING);
      optional binary value (STRING);
    }
  }
}

With map shredding enabled, if user-agent and host are selected as hot keys, the physical Parquet schema becomes:

message paimon_schema {
  optional group headers (MAP) {
    repeated group key_value {
      required binary key (STRING);
      optional binary value (STRING);
    }
  }

  optional binary dynamic_column_headers_value_0 (STRING);
  optional binary dynamic_column_headers_value_1 (STRING);
}

The footer metadata records the mapping from sidecar columns to map keys:

parquet.meta.dynamic.column.map.keys.of.headers = user-agent,host

During writing, entries for promoted hot keys are omitted from the residual map when their values are non-null, and their values are written into the corresponding sidecar columns. During reading, Paimon reads both the residual map and the sidecar columns, then reconstructs the original logical MAP<STRING, T> value.
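The write and read rules above can be sketched in plain Java. This is a minimal illustration, not the PR's writer or reader code: the class `MapShreddingSketch`, the `Shredded` holder, and the methods `shred` and `reconstruct` are hypothetical names, and values are modeled as `String` for brevity. It assumes that a hot key whose value is null stays in the residual map, which is how the "when their values are non-null" condition reads.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only; all names here are hypothetical. */
public class MapShreddingSketch {

    /** One shredded row: the residual map plus one sidecar slot per hot key. */
    public static final class Shredded {
        public final Map<String, String> residual;
        /** sidecars.get(i) holds the value for hotKeys.get(i), or null if not promoted. */
        public final List<String> sidecars;

        Shredded(Map<String, String> residual, List<String> sidecars) {
            this.residual = residual;
            this.sidecars = sidecars;
        }
    }

    /** Write side: move non-null hot-key values out of the map into sidecar slots. */
    public static Shredded shred(Map<String, String> row, List<String> hotKeys) {
        Map<String, String> residual = new LinkedHashMap<>(row);
        List<String> sidecars = new ArrayList<>();
        for (String key : hotKeys) {
            String value = row.get(key);
            sidecars.add(value); // null when the key is absent or its value is null
            if (value != null) {
                residual.remove(key); // only non-null entries leave the residual map
            }
        }
        return new Shredded(residual, sidecars);
    }

    /** Read side: merge non-null sidecar values back into the residual map. */
    public static Map<String, String> reconstruct(Shredded shredded, List<String> hotKeys) {
        Map<String, String> logical = new LinkedHashMap<>(shredded.residual);
        for (int i = 0; i < hotKeys.size(); i++) {
            String value = shredded.sidecars.get(i);
            if (value != null) {
                logical.put(hotKeys.get(i), value);
            }
        }
        return logical;
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("host", "example.org");
        row.put("user-agent", null); // null value: stays in the residual map
        row.put("x-trace", "abc");   // not a hot key: stays in the residual map
        List<String> hotKeys = Arrays.asList("user-agent", "host");

        Shredded shredded = shred(row, hotKeys);
        System.out.println(shredded.residual.keySet());                 // [user-agent, x-trace]
        System.out.println(shredded.sidecars);                          // [null, example.org]
        System.out.println(reconstruct(shredded, hotKeys).equals(row)); // true
    }
}
```

The round trip preserves the set of entries. The sketch makes no claim about entry order after reconstruction.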

For nested maps, the same rule applies within the containing group. For example, for payload.headers, sidecar columns are added as siblings of the headers map inside the payload group, and the footer metadata uses the full logical path:

parquet.meta.dynamic.column.map.keys.of.payload.headers = user-agent,host
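As a rough sketch of how a reader could turn such a footer entry back into a sidecar-to-key mapping, the Java below parses the entry for one map field. The class `ShreddingFooterSketch` and its methods are hypothetical. It also assumes three things the description does not state outright: that sidecar index i corresponds to the i-th key in the comma-separated list (suggested by the `user-agent,host` example), that nested sidecar names use only the last path segment of the map field, and that keys contain no commas.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative sketch only; all names and the nested naming rule are assumptions. */
public class ShreddingFooterSketch {

    /** Footer key prefix as shown in the examples above. */
    static final String KEY_PREFIX = "parquet.meta.dynamic.column.map.keys.of.";

    /** Footer metadata key for a map field, using its full logical path. */
    public static String metadataKey(String logicalPath) {
        return KEY_PREFIX + logicalPath;
    }

    /** Sidecar column name following the dynamic_column_<field>_value_<i> pattern. */
    public static String sidecarName(String mapFieldName, int index) {
        return "dynamic_column_" + mapFieldName + "_value_" + index;
    }

    /** Returns sidecar column name -> promoted map key for one map field. */
    public static Map<String, String> sidecarToKey(String logicalPath, Map<String, String> footer) {
        Map<String, String> result = new LinkedHashMap<>();
        String joined = footer.get(metadataKey(logicalPath));
        if (joined == null || joined.isEmpty()) {
            return result; // the map field is not shredded in this file
        }
        // Assumption: the sidecar name uses the last segment of the logical path.
        String fieldName = logicalPath.substring(logicalPath.lastIndexOf('.') + 1);
        String[] keys = joined.split(",");
        for (int i = 0; i < keys.length; i++) {
            result.put(sidecarName(fieldName, i), keys[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> footer = new LinkedHashMap<>();
        footer.put(metadataKey("payload.headers"), "user-agent,host");
        System.out.println(sidecarToKey("payload.headers", footer));
        // {dynamic_column_headers_value_0=user-agent, dynamic_column_headers_value_1=host}
    }
}
```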

#7876

Aitozi force-pushed the mwj-map-shredding branch from 61967d4 to 5a5b5a5 on May 17, 2026 04:36
Aitozi force-pushed the mwj-map-shredding branch from 5a5b5a5 to 5f397f8 on May 17, 2026 04:38

Aitozi commented May 17, 2026

Benchmark command:

mvn -s ~/.m2/apache-community.xml -pl paimon-format -am -Pfast-build \
  -DfailIfNoTests=false -Dtest=MapShreddingStorageBenchmark test

Benchmark file: [MapShreddingStorageBenchmark.java]

Common Setup

  • Schema: id INT, headers MAP<STRING, STRING>
  • Rows: 100,000
  • Hot keys: 32
  • Value length: 16
  • Compression: snappy
  • Compared layouts:
    • regular: normal Parquet map encoding
    • mapShredding: promotes 32 hot keys from headers into sidecar columns
  • Map shredding options:
    • map.shredding.columns=headers
    • map.shredding.maxKeys=32
    • map.shredding.maxInferBufferRow=10000
    • map.shredding.maxInferBufferMemory=64 mb

Results

| Scenario | Regular | Map Shredding | Saved | Saving |
| --- | --- | --- | --- | --- |
| Columnar value storage | 708,012 bytes | 431,637 bytes | 276,375 bytes | 39.04% |
| Long hot key storage | 40,845,943 bytes | 16,365,106 bytes | 24,480,837 bytes | 59.93% |
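
The Saved and Saving columns follow from the two file sizes: saved = regular - shredded, and saving = saved / regular. A small Java check of that arithmetic, using only the byte counts from the table (the class `SavingCheck` and its method names are illustrative, not part of the benchmark):

```java
import java.util.Locale;

/** Illustrative check of the table arithmetic; not part of the benchmark code. */
public class SavingCheck {

    /** Bytes saved by the shredded layout relative to the regular layout. */
    public static long savedBytes(long regular, long shredded) {
        return regular - shredded;
    }

    /** Saving as a percentage of the regular file size, rounded to two decimals. */
    public static String savingPercent(long regular, long shredded) {
        double percent = 100.0 * savedBytes(regular, shredded) / regular;
        return String.format(Locale.ROOT, "%.2f%%", percent);
    }

    public static void main(String[] args) {
        // Columnar value storage
        System.out.println(savedBytes(708_012L, 431_637L));          // 276375
        System.out.println(savingPercent(708_012L, 431_637L));       // 39.04%
        // Long hot key storage
        System.out.println(savedBytes(40_845_943L, 16_365_106L));    // 24480837
        System.out.println(savingPercent(40_845_943L, 16_365_106L)); // 59.93%
    }
}
```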

Scenario Details

  • Columnar value storage: key names are short, values follow a repeated pattern with valueRunLength=128 and valueCardinality=4, dictionary encoding enabled. This measures whether promoted hot-key values benefit from columnar and dictionary encoding.
  • Long hot key storage: hot key names include 128 bytes of padding, dictionary encoding disabled. This measures the benefit of avoiding repeated long map-key strings in every row.

Conclusion: in this synthetic storage benchmark, map shredding reduces file size in both cases. The biggest gain appears when hot map keys are long and repeated across many rows, saving about 59.93%.

JingsongLi commented

This looks like a good fit for Variant. Why not use Variant here?
