feat: persist part metadata during multipart uploads and reuse when completing #3041

Open
Copilot wants to merge 2 commits into main from copilot/let-s3mock-persist-part-metadata

Conversation

Contributor

Copilot AI commented Apr 22, 2026

Summary

Implements per-part metadata persistence during multipart uploads: ETag, checksum, size, and last-modified are stored on disk (as {partNumber}.partMetadata.json sidecar files) alongside the binary part data and reused, instead of being recomputed on every read or complete operation.

Changes

New

  • PartMetadata.kt — data class storing per-part metadata: partNumber, etag, checksumAlgorithm, checksum, size, lastModified

Modified

  • Part.kt — adds checksum fields (ChecksumCRC32, ChecksumCRC32C, ChecksumCRC64NVME, ChecksumSHA1, ChecksumSHA256) and a checksum(algorithm) helper method, matching the AWS S3 Part API element
  • MultipartStore.kt:
    • putPart now accepts checksumAlgorithm and checksum, and persists a {partNumber}.partMetadata.json sidecar file after writing the binary part
    • copyPart also persists part metadata (ETag + size; no checksum for copies)
    • getMultipartUploadParts reads ETag, size, last-modified and checksum from persisted metadata files; falls back to computing from binary file for backward compatibility
    • completeMultipartUpload reuses stored per-part checksums for the COMPOSITE checksum computation, avoiding redundant re-reads and re-hashing of part files
  • DigestUtil.kt — adds checksumMultipartFromStoredChecksums(partChecksums, algorithm) to compute a COMPOSITE checksum from pre-stored base64-encoded per-part checksums
  • MultipartService.kt — putPart threads checksumAlgorithm and checksum through to the store (defaulting to null for backward compatibility)
  • MultipartController.kt — passes the parsed checksumAlgorithm and checksum to MultipartService.putPart
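To make the shape of the sidecar concrete, here is a minimal sketch of the model and file naming described above. Field names follow the summary; the real code uses a ChecksumAlgorithm enum and Jackson serialization, which are simplified here:

```kotlin
import java.io.File

// Sketch of the per-part metadata model described above. The nullable
// checksum fields cover copied parts, which store no checksum.
data class PartMetadata(
  val partNumber: Int,
  val etag: String,
  val checksumAlgorithm: String?, // e.g. "CRC32", "SHA256"; an enum in the real code
  val checksum: String?,          // base64-encoded per-part checksum
  val size: Long,
  val lastModified: Long,
)

// Sidecar file next to the binary part data, e.g. "3.partMetadata.json".
fun partMetadataFile(uploadDir: File, partNumber: Int): File =
  File(uploadDir, "$partNumber.partMetadata.json")
```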

Tests

  • Updated MultipartControllerTest mock matchers for new putPart signature
  • Added 3 new MultipartStoreTest tests:
    • putPart persists part metadata sidecar file — verifies the JSON sidecar file exists after upload
    • getMultipartUploadParts returns checksum from persisted metadata — verifies checksum is returned in ListParts
    • completeMultipartUpload reuses stored part checksums for COMPOSITE computation — verifies end-to-end COMPOSITE checksum computation via stored metadata
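For context, the COMPOSITE scheme these tests exercise checksums the concatenation of the decoded per-part checksums and appends the part count. A standalone sketch, using java.util.zip.CRC32 as a stand-in for the SDK checksum implementations (the real code dispatches on ChecksumAlgorithm):

```kotlin
import java.util.Base64
import java.util.zip.CRC32

// Compute an AWS-style COMPOSITE checksum from stored base64-encoded
// per-part checksums: checksum the concatenated decoded bytes, then
// append "-<partCount>". CRC32 is illustrative, not the full dispatch.
fun compositeCrc32(partChecksums: List<String>): String {
  val crc = CRC32()
  for (encoded in partChecksums) {
    val bytes = Base64.getDecoder().decode(encoded)
    crc.update(bytes, 0, bytes.size)
  }
  val v = crc.value // lower 32 bits hold the checksum
  val out = byteArrayOf(
    (v ushr 24).toByte(), (v ushr 16).toByte(), (v ushr 8).toByte(), v.toByte(),
  )
  return Base64.getEncoder().encodeToString(out) + "-" + partChecksums.size
}
```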

Related

Closes the issue described in the problem statement: "Let S3Mock persist part metadata during Multipart uploads and reuse it when completing the upload."

Copilot AI and others added 2 commits April 22, 2026 09:30
…plete

- Add PartMetadata data class to store per-part metadata (etag, checksum, size, lastModified)
- Add checksum fields to Part DTO (CRC32, CRC32C, CRC64NVME, SHA1, SHA256)
- MultipartStore.putPart now persists a {n}.partMetadata.json sidecar file alongside the binary part data
- MultipartStore.copyPart now also persists part metadata (ETag + size; no checksum for copies)
- MultipartStore.getMultipartUploadParts reads from persisted metadata files instead of re-computing from binary data; falls back to computing from file for backward compatibility
- MultipartStore.completeMultipartUpload reuses stored per-part checksums for COMPOSITE checksum computation (avoids re-reading part files for hashing)
- DigestUtil.checksumMultipartFromStoredChecksums computes composite checksum from pre-stored base64 part checksums
- MultipartService.putPart and MultipartController.uploadPart now thread checksum info through to the store

Agent-Logs-Url: https://github.com/adobe/S3Mock/sessions/d3d3231a-25df-4323-9e9f-37d72c402ce8

Co-authored-by: afranken <763000+afranken@users.noreply.github.com>
- Add KDoc to Part.checksum(algorithm)
- Optimize checksumMultipartFromStoredChecksums to avoid repeated array allocation
- Refactor partFromMetadata to use when expression for checksum mapping
- Rename metadatas variable to metadataList

Agent-Logs-Url: https://github.com/adobe/S3Mock/sessions/d3d3231a-25df-4323-9e9f-37d72c402ce8

Co-authored-by: afranken <763000+afranken@users.noreply.github.com>
Copilot AI requested a review from afranken April 22, 2026 09:37
@afranken afranken marked this pull request as ready for review April 22, 2026 13:57
Copilot AI review requested due to automatic review settings April 22, 2026 13:57
Contributor

Copilot AI left a comment


Pull request overview

Adds persistence of per-part multipart upload metadata (ETag/checksum/size/last-modified) to disk and reuses it during ListParts and CompleteMultipartUpload to avoid re-reading/re-hashing part files.

Changes:

  • Persist {partNumber}.partMetadata.json sidecar files when uploading/copying parts and read them when listing parts.
  • Reuse stored per-part checksums to compute COMPOSITE checksums during completion.
  • Extend DTOs/tests to support per-part checksum fields and updated putPart signatures.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.

Summary per file:

  • server/src/main/kotlin/com/adobe/testing/s3mock/store/PartMetadata.kt — Introduces a persisted model for per-part metadata sidecars.
  • server/src/main/kotlin/com/adobe/testing/s3mock/store/MultipartStore.kt — Writes/reads per-part metadata, uses stored checksums for COMPOSITE completion.
  • server/src/main/kotlin/com/adobe/testing/s3mock/util/DigestUtil.kt — Adds COMPOSITE checksum computation from stored per-part checksums.
  • server/src/main/kotlin/com/adobe/testing/s3mock/service/MultipartService.kt — Threads checksum algorithm/value through to the store.
  • server/src/main/kotlin/com/adobe/testing/s3mock/controller/MultipartController.kt — Passes parsed checksum algorithm/value into putPart.
  • server/src/main/kotlin/com/adobe/testing/s3mock/dto/Part.kt — Adds checksum fields + helper accessor to align with the AWS Part element.
  • server/src/test/kotlin/com/adobe/testing/s3mock/controller/MultipartControllerTest.kt — Updates mock matchers for the new putPart signature.
  • server/src/test/kotlin/com/adobe/testing/s3mock/store/MultipartStoreTest.kt — Adds tests verifying sidecar persistence, ListParts checksum, and COMPOSITE reuse.


Comment on lines 189 to +213 (MultipartStore.kt, after the change; the previous version returned the hex digest directly):

```kotlin
fun putPart(
  bucket: BucketMetadata,
  id: UUID,
  uploadId: UUID,
  partNumber: Int,
  path: Path,
  encryptionHeaders: Map<String, String>,
  checksumAlgorithm: ChecksumAlgorithm? = null,
  checksum: String? = null,
): String {
  val file = inputPathToFile(path, getPartPath(bucket, uploadId, partNumber))

  val etag = DigestUtil.hexDigest(encryptionHeaders[AwsHttpHeaders.X_AMZ_SERVER_SIDE_ENCRYPTION_AWS_KMS_KEY_ID], file)
  writePartMetadata(
    bucket,
    uploadId,
    PartMetadata(
      partNumber = partNumber,
      etag = etag,
      checksumAlgorithm = checksumAlgorithm,
      checksum = checksum,
      size = file.length(),
      lastModified = file.lastModified(),
    ),
  )
  return etag
```

Copilot AI Apr 22, 2026


putPart writes both the part file and a new sidecar metadata file, but this method is not synchronized on lockStore[uploadId] even though the class-level contract says any file modification should acquire the upload lock (see comment near the lockStore field). This can race with abortMultipartUpload (folder deletion) or concurrent operations on the same upload. Wrap the part write + metadata write in a synchronized(lockStore[uploadId]!!) block (and ensure the lock is present) to keep multipart operations consistent.
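The suggested pattern can be demonstrated standalone. Here, lockStore is a plain ConcurrentHashMap; the names mirror the review comment, not the actual S3Mock fields:

```kotlin
import java.util.UUID
import java.util.concurrent.ConcurrentHashMap

// Minimal demo of the per-upload lock pattern: every file mutation for a
// given uploadId serializes on the same lock object, so part writes cannot
// race with abort/cleanup operations on the same upload.
val lockStore = ConcurrentHashMap<UUID, Any>()

fun <T> withUploadLock(uploadId: UUID, block: () -> T): T {
  // computeIfAbsent ensures a lock object exists for this upload.
  val lock = lockStore.computeIfAbsent(uploadId) { Any() }
  return synchronized(lock) { block() }
}
```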

Comment on lines +242 to 246
val checksumFor =
validateChecksums(uploadInfo, tempFile, parts, partsPaths, checksum, checksumType, checksumAlgorithm, bucket, uploadId)
val etag = DigestUtil.hexDigestMultipart(partsPaths)
val s3ObjectMetadata =
objectStore.storeS3ObjectMetadata(
Copy link

Copilot AI Apr 22, 2026


completeMultipartUpload deletes only the .part files (partsPaths) after completion. With the new {partNumber}.partMetadata.json sidecars being written, those metadata files will be left behind indefinitely, increasing disk usage for completed uploads. When cleaning up parts, also delete the corresponding .partMetadata.json files (or delete all part-related files in the upload folder) so completed uploads don’t accumulate per-part metadata.
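A sketch of the cleanup the comment asks for, assuming the ".part"/".partMetadata.json" naming from the PR description (the helper name is hypothetical):

```kotlin
import java.io.File

// Delete a part's binary file together with its metadata sidecar, so
// completed uploads do not leave "<n>.partMetadata.json" files behind.
// File naming follows the PR description; this helper is illustrative.
fun deletePartWithMetadata(uploadDir: File, partNumber: Int) {
  File(uploadDir, "$partNumber.part").delete()
  File(uploadDir, "$partNumber.partMetadata.json").delete()
}
```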

Comment on lines +433 to +446
private fun writePartMetadata(
bucket: BucketMetadata,
uploadId: UUID,
partMetadata: PartMetadata,
) {
try {
val metaFile = getPartMetadataPath(bucket, uploadId, partMetadata.partNumber).toFile()
objectMapper.writeValue(metaFile, partMetadata)
} catch (e: IOException) {
throw IllegalStateException(
"Could not write part metadata file. uploadId=$uploadId, partNumber=${partMetadata.partNumber}",
e,
)
}
Copy link

Copilot AI Apr 22, 2026


writePartMetadata writes JSON directly to the final path without using the per-upload lock and without an atomic write pattern. A concurrent read can observe a partially-written file (leading to parse errors and fallback behavior), and concurrent abort/cleanup can fail mid-write. Consider synchronizing on lockStore[uploadId] and writing via a temp file + atomic move/replace to ensure metadata files are either fully written or absent.
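The temp-file-plus-atomic-move pattern the comment suggests, sketched standalone (the function name is illustrative; ATOMIC_MOVE support depends on the filesystem):

```kotlin
import java.io.File
import java.nio.file.Files
import java.nio.file.StandardCopyOption

// Write to a temp file in the same directory, then atomically move it into
// place, so readers observe either the complete file or no file at all.
fun writeAtomically(target: File, content: ByteArray) {
  val tmp = File.createTempFile(target.name + ".", ".tmp", target.parentFile)
  tmp.writeBytes(content)
  Files.move(
    tmp.toPath(), target.toPath(),
    StandardCopyOption.ATOMIC_MOVE, StandardCopyOption.REPLACE_EXISTING,
  )
}
```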

Comment on lines +190 to +194:

```kotlin
val allChecksumBytes =
  partChecksums
    .flatMap { Base64.getDecoder().decode(it).toList() }
    .toByteArray()
sdkChecksum.update(allChecksumBytes, 0, allChecksumBytes.size)
```

Copilot AI Apr 22, 2026


checksumMultipartFromStoredChecksums decodes each part checksum via Base64.getDecoder().decode(...).toList() and then concatenates all bytes. This creates a lot of intermediate allocations, and invalid base64 in persisted metadata will currently throw an IllegalArgumentException that bubbles out of completion. Update the implementation to stream updates into SdkChecksum per decoded part (no list/flatten), and either wrap decode failures with a clearer exception or fall back to computing from the part files when persisted checksums are malformed.

Suggested change:

```kotlin
// before
val allChecksumBytes =
  partChecksums
    .flatMap { Base64.getDecoder().decode(it).toList() }
    .toByteArray()
sdkChecksum.update(allChecksumBytes, 0, allChecksumBytes.size)
```

```kotlin
// after
val decoder = Base64.getDecoder()
try {
  partChecksums.forEach { partChecksum ->
    val decodedChecksum = decoder.decode(partChecksum)
    sdkChecksum.update(decodedChecksum, 0, decodedChecksum.size)
  }
} catch (e: IllegalArgumentException) {
  throw IllegalStateException(
    "Checksum could not be calculated because stored multipart part checksum metadata is malformed.",
    e,
  )
}
```


3 participants