perf(castore): generate metadata from bytes in in-mem cache#614
perf(castore): generate metadata from bytes in in-mem cache#614sambhav-jain-16 wants to merge 4 commits into
Conversation
There was a problem hiding this comment.
Pull request overview
This PR optimizes metainfo generation for blobs already resident in the in-memory CA store by adding a byte-slice-specific path in core and switching the memory-cache codepath to use it. It fits into the existing torrent/metainfo pipeline by preserving the same serialized MetaInfo shape while reducing allocation overhead during cache population.
Changes:
- Add
core.NewMetaInfoFromBytesandcalcPieceSumsFromBytesto compute piece checksums directly from[]byte. - Update
lib/store/ca_store.goto use the new byte-slice path when generating metainfo for in-memory cache entries. - Add correctness tests comparing reader-based vs byte-slice metainfo generation, plus a benchmark for the new path.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
lib/store/ca_store.go |
Switches in-memory metainfo generation from bytes.NewReader to the new byte-slice API. |
core/metainfo.go |
Introduces the new []byte metainfo constructor and byte-slice piece checksum helper. |
core/metainfo_test.go |
Adds equivalence tests for the new constructor and benchmarks its performance. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
0db0de4 to
0a384be
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| infoHash: h, | ||
| digest: d, | ||
| }, nil | ||
| return assembleMetaInfo(d, length, pieceSums, pieceLength) |
There was a problem hiding this comment.
Is a 0.5% performance improvement worth the extra complexity we are introducing from extra code?
There was a problem hiding this comment.
I think given the code addition is very less and it is in the critical part of downloading and generating the metainfo, it should benefit us.
0a384be to
db525b7
Compare
db525b7 to
02aaf00
Compare
02aaf00 to
e0b5045
Compare
e0b5045 to
8547447
Compare
| bReader, err := miReader.Serialize() | ||
| require.NoError(err) | ||
| bBytes, err := miBytes.Serialize() | ||
| require.NoError(err) | ||
| require.Equal(bReader, bBytes) |
There was a problem hiding this comment.
What is the reason for these serialisation-based checks when we also manually assert the fields?
| b.SetBytes(int64(tc.blobSize)) | ||
| for b.Loop() { | ||
| mi, err := NewMetaInfoFromBytes(d, data, pl) | ||
| require.NoError(b, err) |
There was a problem hiding this comment.
nit: If we revert to calling b.Fatal directly, we mitigate any concurrency issues from the benchmarking loop, and we improve consistency with the other assertion.
There was a problem hiding this comment.
Just curious, what concurrency issues can we face with this?
There was a problem hiding this comment.
"Concurrency issues" was a bit of an exaggeration, but require.NoError ends up calling runtime.Goexit if the assertion fails, meaning the cleanups may be skipped.
What?
While downloading the blobs in the in-memory cache, we use the same method as for disk cache to generate metadata for the blob downloaded.

core.NewMetaInfocreates a 32KB scratch buffer for each piece of the blob, resulting in unnecessary allocs even though the blob is already in memory. This is shown in theio.CopyN, which internally copies the buffer.Although this is just 0.5% of the total allocation, it provides a quick win, with further improvements already planned.
This change adds
core.NewMetaInfoFromBytes, a byte-slice variant ofcore.NewMetaInfothat avoids the per-piece 32KB allocation.castore.generateMetadataFromBytesis updated to use itTesting
Added a new benchmark
BenchmarkNewMetaInfo, which was first executed oncore.NewMetaInfoand then oncore.NewMetaInfoFromBytes.