feat: persist part metadata during multipart uploads and reuse when completing#3041
feat: persist part metadata during multipart uploads and reuse when completing#3041
Conversation
…plete
- Add PartMetadata data class to store per-part metadata (etag, checksum, size, lastModified)
- Add checksum fields to Part DTO (CRC32, CRC32C, CRC64NVME, SHA1, SHA256)
- MultipartStore.putPart now persists a {n}.partMetadata.json sidecar file alongside the binary part data
- MultipartStore.copyPart now also persists part metadata (ETag + size; no checksum for copies)
- MultipartStore.getMultipartUploadParts reads from persisted metadata files instead of re-computing from binary data; falls back to computing from file for backward compatibility
- MultipartStore.completeMultipartUpload reuses stored per-part checksums for COMPOSITE checksum computation (avoids re-reading part files for hashing)
- DigestUtil.checksumMultipartFromStoredChecksums computes composite checksum from pre-stored base64 part checksums
- MultipartService.putPart and MultipartController.uploadPart now thread checksum info through to the store
Agent-Logs-Url: https://github.com/adobe/S3Mock/sessions/d3d3231a-25df-4323-9e9f-37d72c402ce8
Co-authored-by: afranken <763000+afranken@users.noreply.github.com>
- Add KDoc to Part.checksum(algorithm) - Optimize checksumMultipartFromStoredChecksums to avoid repeated array allocation - Refactor partFromMetadata to use when expression for checksum mapping - Rename metadatas variable to metadataList Agent-Logs-Url: https://github.com/adobe/S3Mock/sessions/d3d3231a-25df-4323-9e9f-37d72c402ce8 Co-authored-by: afranken <763000+afranken@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Adds persistence of per-part multipart upload metadata (ETag/checksum/size/last-modified) to disk and reuses it during ListParts and CompleteMultipartUpload to avoid re-reading/re-hashing part files.
Changes:
- Persist
{partNumber}.partMetadata.jsonsidecar files when uploading/copying parts and read them when listing parts. - Reuse stored per-part checksums to compute COMPOSITE checksums during completion.
- Extend DTOs/tests to support per-part checksum fields and updated
putPartsignatures.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| server/src/main/kotlin/com/adobe/testing/s3mock/store/PartMetadata.kt | Introduces a persisted model for per-part metadata sidecars. |
| server/src/main/kotlin/com/adobe/testing/s3mock/store/MultipartStore.kt | Writes/reads per-part metadata, uses stored checksums for COMPOSITE completion. |
| server/src/main/kotlin/com/adobe/testing/s3mock/util/DigestUtil.kt | Adds COMPOSITE checksum computation from stored per-part checksums. |
| server/src/main/kotlin/com/adobe/testing/s3mock/service/MultipartService.kt | Threads checksum algorithm/value through to the store. |
| server/src/main/kotlin/com/adobe/testing/s3mock/controller/MultipartController.kt | Passes parsed checksum algorithm/value into putPart. |
| server/src/main/kotlin/com/adobe/testing/s3mock/dto/Part.kt | Adds checksum fields + helper accessor to align with AWS Part element. |
| server/src/test/kotlin/com/adobe/testing/s3mock/controller/MultipartControllerTest.kt | Updates mock matchers for new putPart signature. |
| server/src/test/kotlin/com/adobe/testing/s3mock/store/MultipartStoreTest.kt | Adds tests verifying sidecar persistence, ListParts checksum, and COMPOSITE reuse. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| fun putPart( | ||
| bucket: BucketMetadata, | ||
| id: UUID, | ||
| uploadId: UUID, | ||
| partNumber: Int, | ||
| path: Path, | ||
| encryptionHeaders: Map<String, String>, | ||
| checksumAlgorithm: ChecksumAlgorithm? = null, | ||
| checksum: String? = null, | ||
| ): String { | ||
| val file = inputPathToFile(path, getPartPath(bucket, uploadId, partNumber)) | ||
|
|
||
| return DigestUtil.hexDigest(encryptionHeaders[AwsHttpHeaders.X_AMZ_SERVER_SIDE_ENCRYPTION_AWS_KMS_KEY_ID], file) | ||
| val etag = DigestUtil.hexDigest(encryptionHeaders[AwsHttpHeaders.X_AMZ_SERVER_SIDE_ENCRYPTION_AWS_KMS_KEY_ID], file) | ||
| writePartMetadata( | ||
| bucket, | ||
| uploadId, | ||
| PartMetadata( | ||
| partNumber = partNumber, | ||
| etag = etag, | ||
| checksumAlgorithm = checksumAlgorithm, | ||
| checksum = checksum, | ||
| size = file.length(), | ||
| lastModified = file.lastModified(), | ||
| ), | ||
| ) | ||
| return etag |
There was a problem hiding this comment.
putPart writes both the part file and a new sidecar metadata file, but this method is not synchronized on lockStore[uploadId] even though the class-level contract says any file modification should acquire the upload lock (see comment near the lockStore field). This can race with abortMultipartUpload (folder deletion) or concurrent operations on the same upload. Wrap the part write + metadata write in a synchronized(lockStore[uploadId]!!) block (and ensure the lock is present) to keep multipart operations consistent.
| val checksumFor = | ||
| validateChecksums(uploadInfo, tempFile, parts, partsPaths, checksum, checksumType, checksumAlgorithm, bucket, uploadId) | ||
| val etag = DigestUtil.hexDigestMultipart(partsPaths) | ||
| val s3ObjectMetadata = | ||
| objectStore.storeS3ObjectMetadata( |
There was a problem hiding this comment.
completeMultipartUpload deletes only the .part files (partsPaths) after completion. With the new {partNumber}.partMetadata.json sidecars being written, those metadata files will be left behind indefinitely, increasing disk usage for completed uploads. When cleaning up parts, also delete the corresponding .partMetadata.json files (or delete all part-related files in the upload folder) so completed uploads don’t accumulate per-part metadata.
| private fun writePartMetadata( | ||
| bucket: BucketMetadata, | ||
| uploadId: UUID, | ||
| partMetadata: PartMetadata, | ||
| ) { | ||
| try { | ||
| val metaFile = getPartMetadataPath(bucket, uploadId, partMetadata.partNumber).toFile() | ||
| objectMapper.writeValue(metaFile, partMetadata) | ||
| } catch (e: IOException) { | ||
| throw IllegalStateException( | ||
| "Could not write part metadata file. uploadId=$uploadId, partNumber=${partMetadata.partNumber}", | ||
| e, | ||
| ) | ||
| } |
There was a problem hiding this comment.
writePartMetadata writes JSON directly to the final path without using the per-upload lock and without an atomic write pattern. A concurrent read can observe a partially-written file (leading to parse errors and fallback behavior), and concurrent abort/cleanup can fail mid-write. Consider synchronizing on lockStore[uploadId] and writing via a temp file + atomic move/replace to ensure metadata files are either fully written or absent.
| val allChecksumBytes = | ||
| partChecksums | ||
| .flatMap { Base64.getDecoder().decode(it).toList() } | ||
| .toByteArray() | ||
| sdkChecksum.update(allChecksumBytes, 0, allChecksumBytes.size) |
There was a problem hiding this comment.
checksumMultipartFromStoredChecksums decodes each part checksum via Base64.getDecoder().decode(...).toList() and then concatenates all bytes. This creates a lot of intermediate allocations, and invalid base64 in persisted metadata will currently throw an IllegalArgumentException that bubbles out of completion. Update the implementation to stream updates into SdkChecksum per decoded part (no list/flatten), and either wrap decode failures with a clearer exception or fall back to computing from the part files when persisted checksums are malformed.
| val allChecksumBytes = | |
| partChecksums | |
| .flatMap { Base64.getDecoder().decode(it).toList() } | |
| .toByteArray() | |
| sdkChecksum.update(allChecksumBytes, 0, allChecksumBytes.size) | |
| val decoder = Base64.getDecoder() | |
| try { | |
| partChecksums.forEach { partChecksum -> | |
| val decodedChecksum = decoder.decode(partChecksum) | |
| sdkChecksum.update(decodedChecksum, 0, decodedChecksum.size) | |
| } | |
| } catch (e: IllegalArgumentException) { | |
| throw IllegalStateException( | |
| "Checksum could not be calculated because stored multipart part checksum metadata is malformed.", | |
| e, | |
| ) | |
| } |
Summary
Implements per-part metadata persistence during multipart uploads, so that ETag, checksum, size and last-modified are stored on disk (as
{partNumber}.partMetadata.jsonsidecar files) alongside the binary part data and reused instead of being recomputed on every read or complete operation.Changes
New
PartMetadata.kt— data class storing per-part metadata:partNumber,etag,checksumAlgorithm,checksum,size,lastModifiedModified
Part.kt— adds checksum fields (ChecksumCRC32,ChecksumCRC32C,ChecksumCRC64NVME,ChecksumSHA1,ChecksumSHA256) and achecksum(algorithm)helper method, matching the AWS S3PartAPI elementMultipartStore.kt:putPartnow acceptschecksumAlgorithmandchecksum, and persists a{partNumber}.partMetadata.jsonsidecar file after writing the binary partcopyPartalso persists part metadata (ETag + size; no checksum for copies)getMultipartUploadPartsreads ETag, size, last-modified and checksum from persisted metadata files; falls back to computing from binary file for backward compatibilitycompleteMultipartUploadreuses stored per-part checksums for the COMPOSITE checksum computation, avoiding redundant re-reads and re-hashing of part filesDigestUtil.kt— addschecksumMultipartFromStoredChecksums(partChecksums, algorithm)to compute a COMPOSITE checksum from pre-stored base64-encoded per-part checksumsMultipartService.kt—putPartthreadschecksumAlgorithmandchecksumthrough to the store (defaultnullfor backward compatibility)MultipartController.kt— passes the parsedchecksumAlgorithmandchecksumtoMultipartService.putPartTests
MultipartControllerTestmock matchers for newputPartsignatureMultipartStoreTesttests:putPart persists part metadata sidecar file— verifies the JSON sidecar file exists after uploadgetMultipartUploadParts returns checksum from persisted metadata— verifies checksum is returned inListPartscompleteMultipartUpload reuses stored part checksums for COMPOSITE computation— verifies end-to-end COMPOSITE checksum computation via stored metadataRelated
Closes the issue described in the problem statement: "Let S3Mock persist part metadata during Multipart uploads and reuse it when completing the upload."