[core] Optimize Flink BTree index topology by leaves12138 · Pull Request #7852 · apache/paimon

leaves12138 · 2026-05-14T07:13:17Z

What changed

Reworked Flink BTree global index building to use one task-driven topology for all contiguous row ranges instead of building one topology per range.
Added an internal build task id to the sort key so each range keeps its own row-range metadata while sharing the same Flink source/read/sort/write chain.
Added coverage for parallelism calculation, many small ranges, and a single large range split across multiple writer subtasks.
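The sketch below shows why a prepended build task id lets many row ranges share one sort chain without mixing. It is a minimal illustration under stated assumptions: TaggedRow, SORT_ORDER and sortAll are hypothetical names, not Paimon classes, and the indexed column is assumed to be a long. The real operators sort binary rows after a range shuffle.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch, not Paimon code: rows from several row ranges share one
// sort. Each row carries a synthetic build task id that is compared before the
// indexed column, so rows belonging to different ranges never interleave.
public class BuildTaskSortKeySketch {

    /** A row tagged with the build task (contiguous row range) it belongs to. */
    static final class TaggedRow {
        final int buildTaskId; // synthetic sort prefix, one value per row range
        final long indexKey;   // value of the indexed column (assumed to be a long)

        TaggedRow(int buildTaskId, long indexKey) {
            this.buildTaskId = buildTaskId;
            this.indexKey = indexKey;
        }

        @Override
        public String toString() {
            return "(" + buildTaskId + "," + indexKey + ")";
        }
    }

    /** Orders by buildTaskId first, then by the indexed column. */
    static final Comparator<TaggedRow> SORT_ORDER =
            Comparator.comparingInt((TaggedRow r) -> r.buildTaskId)
                    .thenComparingLong(r -> r.indexKey);

    /** Returns a sorted copy; stands in for the shared shuffle + sort chain. */
    static List<TaggedRow> sortAll(List<TaggedRow> rows) {
        List<TaggedRow> sorted = new ArrayList<>(rows);
        sorted.sort(SORT_ORDER);
        return sorted;
    }

    public static void main(String[] args) {
        List<TaggedRow> rows = new ArrayList<>();
        rows.add(new TaggedRow(1, 5));
        rows.add(new TaggedRow(0, 9));
        rows.add(new TaggedRow(1, 2));
        rows.add(new TaggedRow(0, 3));
        // Prints [(0,3), (0,9), (1,2), (1,5)]: range 0 first, each range sorted.
        System.out.println(sortAll(rows));
    }
}
```

Because the task id is the primary key, each range stays contiguous in the sorted output, which is what allows one topology to serve every range while each range keeps its own row-range metadata.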

Why

When row ranges are highly fragmented, the old implementation created a separate Flink topology for each range. The create-index procedure could then spend a long time constructing the JobGraph, and the resulting topology could grow very large.

Validation

mvn -pl paimon-flink/paimon-flink-common -DfailIfNoTests=false -Dtest=BTreeIndexTopoBuilderTest test
mvn -pl paimon-flink/paimon-flink-common -Pfast-build -DfailIfNoTests=false -Dtest=BTreeGlobalIndexITCase#testBTreeIndexWithManyPartitions test
mvn -pl paimon-flink/paimon-flink-common -Pfast-build -DfailIfNoTests=false -Dtest=BTreeGlobalIndexITCase#testBTreeIndexWithSingleRangeAndParallelWriters test

JingsongLi

Review: [core] Optimize Flink BTree index topology

Nice optimization. Replacing N separate Flink topologies (one per row range) with a single unified topology keyed by a synthetic buildTaskId sort prefix is a clean approach to reducing JobGraph construction overhead.

Correctness

The overall design is sound:

The buildTaskId field is prepended as the primary sort key, so after the range shuffle and local sort, data within each writer subtask is guaranteed to be ordered by (taskId, indexColumn). Task transitions are therefore one-directional within each subtask: the task id never decreases.
flushCurrentWriter() correctly handles both task-boundary flush and within-task overflow flush.
The ReadDataOperator output type matches sortReadType (with the taskId column), and the taskId column survives the sort.
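To make the two flush cases concrete, here is a minimal sketch of a writer subtask consuming rows already sorted by (taskId, indexColumn). It is not the PR's writer: TaskAwareWriterSketch, processRow and FlushedFile are hypothetical names, and maxRowsPerFile is an assumed overflow threshold. Only the name flushCurrentWriter() comes from the review.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Paimon code: a writer subtask that receives rows
// sorted by (buildTaskId, indexKey) and closes the current file either when
// the build task changes or when the file reaches an assumed size limit.
public class TaskAwareWriterSketch {

    /** Summary of one flushed index file. */
    static final class FlushedFile {
        final int buildTaskId;
        final int rowCount;

        FlushedFile(int buildTaskId, int rowCount) {
            this.buildTaskId = buildTaskId;
            this.rowCount = rowCount;
        }

        @Override
        public String toString() {
            return "task" + buildTaskId + ":" + rowCount;
        }
    }

    private final int maxRowsPerFile; // assumed overflow threshold
    private final List<FlushedFile> flushed = new ArrayList<>();
    private int currentTaskId = -1;   // -1 means no writer has been opened yet
    private int currentRows = 0;

    TaskAwareWriterSketch(int maxRowsPerFile) {
        this.maxRowsPerFile = maxRowsPerFile;
    }

    /** Accepts the build task id of the next row in sorted order. */
    void processRow(int buildTaskId) {
        if (buildTaskId < currentTaskId) {
            // Input is sorted by task id first, so task ids never go backwards.
            throw new IllegalStateException("unsorted input");
        }
        if (buildTaskId != currentTaskId) {
            flushCurrentWriter(); // task-boundary flush
            currentTaskId = buildTaskId;
        } else if (currentRows >= maxRowsPerFile) {
            flushCurrentWriter(); // within-task overflow flush
        }
        currentRows++;
    }

    /** Closes the open file, if any, and records which task it belongs to. */
    void flushCurrentWriter() {
        if (currentRows > 0) {
            flushed.add(new FlushedFile(currentTaskId, currentRows));
            currentRows = 0;
        }
    }

    /** Flushes the last open file and returns every file written. */
    List<FlushedFile> finish() {
        flushCurrentWriter();
        return flushed;
    }

    public static void main(String[] args) {
        TaskAwareWriterSketch writer = new TaskAwareWriterSketch(2);
        for (int id : new int[] {0, 0, 0, 1, 2, 2}) {
            writer.processRow(id);
        }
        // Prints [task0:2, task0:1, task1:1, task2:2]
        System.out.println(writer.finish());
    }
}
```

Since task ids only move forward within a subtask, a single "current task" variable is enough to attribute every file to the right row range; no file ever mixes rows from two tasks.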

Suggestions

BUILD_TASK_ID_FIELD_ID = -1 -- The choice of a negative field ID avoids collision with real schema field IDs. A short comment documenting this invariant would help future readers.
buildTasksById HashMap rebuilt in every parallel writer subtask -- Each subtask independently reconstructs the full map. For a small number of tasks this is negligible, but could be restricted to only relevant tasks in the future.
Parallelism calculation -- Integer division (totalRecords / recordsPerRange) means 1500 records with recordsPerRange=1000 yields parallelism=1. This matches the old behavior but is worth noting.
BTreeSplitTask.split field relies on Java serialization (Serializable) -- This is slightly tighter coupling than the previous Flink TypeInformation-based serializer.
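The parallelism note comes down to floor versus ceiling division. The sketch below is hypothetical and is not the PR's calculateParallelism: the method names are invented, inputs are assumed non-negative with a positive recordsPerRange, and the clamp to [1, maxParallelism] is an assumption added so that tiny inputs still get one writer.

```java
// Hypothetical sketch, not the PR's calculateParallelism: compares floor and
// ceiling division when deriving writer parallelism from a record count.
// The clamp to [1, maxParallelism] is an assumption made for illustration.
public class ParallelismSketch {

    /** Floor division, the behavior noted in the review: 1500 / 1000 -> 1. */
    static int floorParallelism(long totalRecords, long recordsPerRange, int maxParallelism) {
        return clamp(totalRecords / recordsPerRange, maxParallelism);
    }

    /** Ceiling division: any partial block of recordsPerRange gets its own writer. */
    static int ceilParallelism(long totalRecords, long recordsPerRange, int maxParallelism) {
        long blocks = totalRecords / recordsPerRange
                + (totalRecords % recordsPerRange == 0 ? 0 : 1); // avoids overflow of a + b - 1
        return clamp(blocks, maxParallelism);
    }

    /** Keeps parallelism at least 1 and at most maxParallelism (assumed bounds). */
    private static int clamp(long parallelism, int maxParallelism) {
        return (int) Math.max(1L, Math.min(parallelism, (long) maxParallelism));
    }

    public static void main(String[] args) {
        System.out.println(floorParallelism(1500, 1000, 16)); // 1
        System.out.println(ceilParallelism(1500, 1000, 16));  // 2
    }
}
```

With floor division, the extra 500 records are absorbed by the single writer; with ceiling division they get a second, half-full writer. Keeping the floor version preserves the old behavior, which is why the review only flags it rather than requesting a change.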

Tests

Good coverage: unit tests for calculateParallelism and IT tests for the end-to-end flow. Overall a well-structured change. LGTM with minor suggestions.

JingsongLi · 2026-05-23T14:48:05Z

+1

Commit 52ff891: Optimize Flink BTree index topology

leaves12138 changed the title from "[codex] Optimize Flink BTree index topology" to "[core] Optimize Flink BTree index topology" May 18, 2026

leaves12138 marked this pull request as ready for review May 20, 2026 14:06

JingsongLi reviewed May 23, 2026


JingsongLi merged commit f840232 into apache:master May 23, 2026
12 checks passed


JingsongLi merged 1 commit into apache:master from leaves12138:codex/flink-btree-single-topology
