Decouple bulk load from SHARD_ENCODE_LOCATION_METADATA dependency #13162

Open
saintstack wants to merge 4 commits into apple:main from saintstack:decouple

@saintstack
Contributor

Bulk load previously required SHARD_ENCODE_LOCATION_METADATA to be enabled, which pulled in the entire DataMoveMetaData persistence machinery. This was unnecessary overhead — the BulkLoadTaskState is already independently persisted at bulkLoadTaskKeys (keyed by range), and the storage server already knows its range when fetching keys.

The core change: the storage server (SS) now looks up bulk load task metadata directly by range from bulkLoadTaskKeys, instead of going through the old indirection path:
dataMoveId → DataMoveMetaData → BulkLoadTaskState
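
To make the difference between the two lookup paths concrete, here is a minimal sketch using plain standard-library containers. Everything in it is a hypothetical stand-in (`KeyRange`, `TaskState`, `lookupViaDataMove`, `lookupByRange`, and the assumption that a task matches when its range covers the range being fetched); it is not the FoundationDB implementation and does not use krmGetRanges or any real FDB API.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-ins for illustration only; not FoundationDB types.
struct KeyRange {
    std::string begin; // inclusive
    std::string end;   // exclusive
    bool contains(const KeyRange& r) const { return begin <= r.begin && r.end <= end; }
};

struct TaskState {
    KeyRange range;
    std::string taskId;
};

using DataMoveId = unsigned long long;

// Old path (sketch): dataMoveId -> data move metadata -> task state.
// It finds nothing if the data move metadata was never persisted.
std::optional<TaskState> lookupViaDataMove(const std::map<DataMoveId, TaskState>& dataMoveMeta,
                                           DataMoveId id) {
    auto it = dataMoveMeta.find(id);
    if (it == dataMoveMeta.end())
        return std::nullopt;
    return it->second;
}

// New path (sketch): look the task up by the range being fetched.
// tasksByBegin is keyed by each task range's begin key.
std::optional<TaskState> lookupByRange(const std::map<std::string, TaskState>& tasksByBegin,
                                       const KeyRange& fetchRange) {
    assert(fetchRange.begin < fetchRange.end);
    // Find the last task whose begin key is <= the fetch range's begin key.
    auto it = tasksByBegin.upper_bound(fetchRange.begin);
    if (it == tasksByBegin.begin())
        return std::nullopt;
    --it;
    // Assumption: the task applies only if it covers the whole fetch range.
    if (!it->second.range.contains(fetchRange))
        return std::nullopt;
    return it->second;
}
```

In this toy model the by-range lookup needs only the task map and the range the server is already fetching, which is the property the description relies on.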

Changes by component:

Storage Server (storageserver.actor.cpp):

  • Replace getBulkLoadTaskStateFromDataMove() with getBulkLoadTaskStateByRange() in both fetchKeys and fetchShard paths
  • Remove the gate that disabled conductBulkLoad when dataMoveId was anonymousShardId or invalid
  • Remove assertions that cross-checked dataMoveId against the task state

Data Distributor (DDRelocationQueue.actor.cpp, DataDistribution.cpp/.h):

  • When SHARD_ENCODE_LOCATION_METADATA is off, generate a proper dataMoveId with LOGICAL_BULKLOAD encoded for bulk load tasks (instead of anonymousShardId)
  • Add bulk load task validation and dataMoveId assignment path that runs without SHARD_ENCODE_LOCATION_METADATA
  • Remove the bulkLoadIsEnabled() gate that required SHARD_ENCODE_LOCATION_METADATA
  • Remove the DDBulkLoadModeMonitorSkipped early return

MoveKeys (MoveKeys.cpp):

  • Add optional dataMoveId parameter to startMoveKeys and finishMoveKeys
  • When a valid dataMoveId is present, write serverKeysValue(dataMoveId) instead of serverKeysTrue — this encodes the LOGICAL_BULKLOAD type so SS can decode it
  • Add bulk load phase transitions (Triggered→Running in startMoveKeys, Running→Complete in finishMoveKeys) that were previously only in startMoveShards
  • Pass dataMoveId through rawStartMovement and rawFinishMovement
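
The phase transitions listed above can be pictured as a small state machine. The sketch below is an assumption-laden illustration: `Phase`, `onStartMove`, and `onFinishMove` are hypothetical names, and the real startMoveKeys/finishMoveKeys run inside transactions with retry handling that is not modelled here.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical enum mirroring the phase names used in the description.
enum class Phase { Triggered, Running, Complete };

// Sketch of the start-move step: Triggered -> Running.
Phase onStartMove(Phase current) {
    if (current != Phase::Triggered)
        throw std::logic_error("start move expects a Triggered task");
    return Phase::Running;
}

// Sketch of the finish-move step: Running -> Complete.
Phase onFinishMove(Phase current) {
    if (current != Phase::Running)
        throw std::logic_error("finish move expects a Running task");
    return Phase::Complete;
}
```

Rejecting out-of-order transitions is a choice made for this sketch; whether the actual code tolerates an already-advanced phase (for example on a retried transaction) is not stated in the description.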

BulkLoadUtil (BulkLoadUtil.cpp/.h):

  • New getBulkLoadTaskStateByRange() function that reads bulkLoadTaskKeys directly using krmGetRanges, with the same retry/version logic as the old function

Test (BulkLoading.toml):

  • Changed shard_encode_location_metadata from true to false to exercise the new decoupled path

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang-arm on Linux CentOS 7

  • Commit ID: 509fca4
  • Duration 0:03:32
  • Result: ❌ FAILED
  • Error: Error while executing command: if [[ $(git diff --shortstat 2> /dev/null | tail -n1) == "" ]]; then echo "CODE FORMAT CLEAN"; else echo "CODE FORMAT NOT CLEAN"; echo; echo "THE FOLLOWING FILES NEED TO BE FORMATTED"; echo; git ls-files -m; echo; if [[ $FDB_VERSION =~ 7\.\3. ]]; then echo skip; else exit 1; fi; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang-ide on Linux RHEL 9

  • Commit ID: 509fca4
  • Duration 0:03:56
  • Result: ❌ FAILED
  • Error: Error while executing command: if [[ $(git diff --shortstat 2> /dev/null | tail -n1) == "" ]]; then echo "CODE FORMAT CLEAN"; else echo "CODE FORMAT NOT CLEAN"; echo; echo "THE FOLLOWING FILES NEED TO BE FORMATTED"; echo; git ls-files -m; echo; if [[ $FDB_VERSION =~ 7\.\3. ]]; then echo skip; else exit 1; fi; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr on Linux RHEL 9

  • Commit ID: 509fca4
  • Duration 0:04:02
  • Result: ❌ FAILED
  • Error: Error while executing command: if [[ $(git diff --shortstat 2> /dev/null | tail -n1) == "" ]]; then echo "CODE FORMAT CLEAN"; else echo "CODE FORMAT NOT CLEAN"; echo; echo "THE FOLLOWING FILES NEED TO BE FORMATTED"; echo; git ls-files -m; echo; if [[ $FDB_VERSION =~ 7\.\3. ]]; then echo skip; else exit 1; fi; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-cluster-tests on Linux RHEL 9

  • Commit ID: 509fca4
  • Duration 0:04:01
  • Result: ❌ FAILED
  • Error: Error while executing command: if [[ $(git diff --shortstat 2> /dev/null | tail -n1) == "" ]]; then echo "CODE FORMAT CLEAN"; else echo "CODE FORMAT NOT CLEAN"; echo; echo "THE FOLLOWING FILES NEED TO BE FORMATTED"; echo; git ls-files -m; echo; if [[ $FDB_VERSION =~ 7\.\3. ]]; then echo skip; else exit 1; fi; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang on Linux RHEL 9

  • Commit ID: 509fca4
  • Duration 0:04:06
  • Result: ❌ FAILED
  • Error: Error while executing command: if [[ $(git diff --shortstat 2> /dev/null | tail -n1) == "" ]]; then echo "CODE FORMAT CLEAN"; else echo "CODE FORMAT NOT CLEAN"; echo; echo "THE FOLLOWING FILES NEED TO BE FORMATTED"; echo; git ls-files -m; echo; if [[ $FDB_VERSION =~ 7\.\3. ]]; then echo skip; else exit 1; fi; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-macos on macOS Ventura 13.x

  • Commit ID: 509fca4
  • Duration 1:08:11
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

  • Commit ID: 509fca4
  • Duration 2:35:08
  • Result: ❌ FAILED
  • Error: Error while executing command: ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ${HOME}/.ssh_key ec2-user@${MAC_EC2_HOST} /opt/homebrew/bin/bash --login ./build_pr_macos.sh. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang-ide on Linux RHEL 9

  • Commit ID: 8a86172
  • Duration 0:24:26
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang-arm on Linux CentOS 7

  • Commit ID: 8a86172
  • Duration 0:44:12
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr on Linux RHEL 9

  • Commit ID: 8a86172
  • Duration 0:46:39
  • Result: ❌ FAILED
  • Error: Error while executing command: ctest -j ${NPROC} --no-compress-output -T test --output-on-failure. Reason: exit status 8
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang on Linux RHEL 9

  • Commit ID: 8a86172
  • Duration 1:09:27
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-macos on macOS Ventura 13.x

  • Commit ID: 8a86172
  • Duration 1:10:13
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-cluster-tests on Linux RHEL 9

  • Commit ID: 8a86172
  • Duration 1:10:16
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

  • Commit ID: 8a86172
  • Duration 1:42:40
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

michael stack added 3 commits May 6, 2026 22:33
- Assert ranges.size()==1 in fetchShard bulk load path (makes single-range
  invariant explicit)
- Document range mismatch as expected (shard splits within task range are
  normal; SS only loads keys within its assigned sub-range)
- Add detailed comment explaining cancellable=false before prevCleanup
  (DDQueueValidateError13 race condition)
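
As a rough illustration of why a range mismatch is expected, the sketch below clamps a task range to the sub-range a storage server was assigned. `KeyRange` and `intersect` are hypothetical helpers written for this note, not code from the PR, and the half-open range convention is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <string>

// Hypothetical half-open key range [begin, end); not a FoundationDB type.
struct KeyRange {
    std::string begin;
    std::string end;
};

// Returns the part of the task range that falls inside the shard range,
// or nullopt when the two ranges do not overlap.
std::optional<KeyRange> intersect(const KeyRange& task, const KeyRange& shard) {
    std::string b = std::max(task.begin, shard.begin);
    std::string e = std::min(task.end, shard.end);
    if (b >= e)
        return std::nullopt;
    return KeyRange{ b, e };
}
```

With a task covering ["a", "m") and a shard split down to ["c", "f"), the overlap is the shard's own range, so the server in this model would load only those keys.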
@foundationdb-ci
Contributor

Result of foundationdb-pr-clang-ide on Linux RHEL 9

  • Commit ID: 37947cb
  • Duration 0:24:01
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-macos on macOS Ventura 13.x

  • Commit ID: 37947cb
  • Duration 0:36:04
  • Result: ❌ FAILED
  • Error: Error while executing command: ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ${HOME}/.ssh_key ec2-user@${MAC_EC2_HOST} /usr/local/bin/bash --login ./build_pr_macos.sh. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang-arm on Linux CentOS 7

  • Commit ID: 37947cb
  • Duration 0:45:48
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang on Linux RHEL 9

  • Commit ID: 37947cb
  • Duration 0:59:25
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

  • Commit ID: 37947cb
  • Duration 0:59:48
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr on Linux RHEL 9

  • Commit ID: 37947cb
  • Duration 1:00:36
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-cluster-tests on Linux RHEL 9

  • Commit ID: 37947cb
  • Duration 1:09:07
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)
