fix(vector-index): preserve byteOffset/byteLength in base64 round-trip (closes #587, #584) #683
Buffer.from(b64, 'base64') returns a slice of Node's shared 8KB pool, and new Float32Array(buf.buffer) ignores byteOffset/byteLength, minting a 2048-element view over the entire pool. The same hazard exists on the encode side when the source Float32Array is itself a sub-view (e.g. .subarray() or a typed array set into a larger buffer).

The encode path now passes byteOffset/byteLength explicitly; decode mints the view at the correct offset, with the element count computed as byteLength / Float32Array.BYTES_PER_ELEMENT.

Reported as 'dimensions seen on disk: 2048' index-startup crashes in #455 / #469 / #584 / #587.

Two regression tests added:
- 384-dim x 5 vectors round-trip (within pool threshold, hits the decode bug)
- subarray sub-view encode (hits the encode bug)
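
To make the decode hazard concrete, here is a minimal sketch that reproduces the 2048-element phantom view deterministically. It does not touch Node's real pool: the `pool` ArrayBuffer below is a hand-built stand-in (an assumption for illustration), and the variable names are hypothetical rather than taken from the PR.

```typescript
import { strict as assert } from "node:assert";

// Stand-in for Node's shared pool (an assumption for illustration):
// one 8 KiB ArrayBuffer that backs many small Buffers.
const pool = new ArrayBuffer(8192);

// Four floats stored at byte offset 64 inside the pool.
new Float32Array(pool, 64, 4).set([1.5, 2.5, 3.5, 4.5]);

// A Buffer over just those 16 bytes, similar in shape to a pooled
// Buffer.from(b64, "base64") result.
const buf = Buffer.from(pool, 64, 16);

// Buggy decode: views the whole backing ArrayBuffer.
const wrong = new Float32Array(buf.buffer);

// Fixed decode: same memory, restricted to the Buffer's own window.
const right = new Float32Array(
  buf.buffer,
  buf.byteOffset,
  buf.byteLength / Float32Array.BYTES_PER_ELEMENT,
);

assert.equal(wrong.length, 2048); // 8192 bytes / 4 bytes per float
assert.deepEqual(Array.from(right), [1.5, 2.5, 3.5, 4.5]);
```

The fixed view only works when `buf.byteOffset` is a multiple of 4; a Buffer sliced at an odd offset would need a copy instead of a view.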
Walkthrough
This PR fixes a bug in Float32Array serialization where encoding/decoding the entire underlying buffer could create incorrectly sized "phantom" views from shared Buffer pool slices. The fix explicitly preserves byteOffset and byteLength boundaries during base64 round-trips. Two new regression tests validate the fix for pooled and sliced Float32Array scenarios.
Changes: Float32Array Serialization Fix
Estimated code review effort: 2 (Simple), ~10 minutes
Nitpick comments (1)
src/state/vector-index.ts (1)
1-7: Quick win: trim WHAT-level implementation narration in `src` code.
Please reduce this to a minimal WHY note (or remove it) to align with the project’s `src` comment style.
As per the coding guidelines, code comments should not explain WHAT; clear naming should be used instead.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/state/vector-index.ts` around lines 1 - 7, The comment block in src/state/vector-index.ts contains WHAT-level implementation details about Node's Buffer pool and Float32Array slicing; replace it with a minimal WHY-only note (e.g., "Pass byteOffset and byteLength to preserve correct slice bounds and avoid Buffer pool/Float32Array view issues") or remove the comment entirely so it follows src comment style; locate the comment near the Buffer.from(...) / new Float32Array(...) usage and update that comment accordingly.
Files selected for processing (2)
- src/state/vector-index.ts
- test/vector-index.test.ts
Closes #587. Also resolves #584 (same root cause), and unblocks #455 / #469 which reported the same 2048-element phantom view.
What was happening
`Buffer.from(b64, "base64")` returns a slice of Node's internal Buffer pool (8KB). `new Float32Array(buf.buffer)` ignores `buf.byteOffset` and `buf.byteLength`: it mints a Float32 view over the entire pool, not the slice. For embeddings whose serialised form is small enough to be allocated from the pool (Node pools allocations smaller than half of `Buffer.poolSize`, i.e. under 4KB of raw bytes, or fewer than 1024 floats, with the default 8KB pool), restart-time deserialisation produces a 2048-dim vector. The live index then refuses to start with `dimensions seen on disk: 2048`.
Same risk on the encode side: `Buffer.from(arr.buffer)` drops slice metadata when `arr` is a sub-view (e.g. `.subarray()`).
Fix
Both helpers in `src/state/vector-index.ts` now pass `byteOffset` + `byteLength` explicitly. Decode computes the element count as `byteLength / Float32Array.BYTES_PER_ELEMENT`, using the constant so the divisor is self-documenting.
Tests
Two new regression cases:
- 384-dim x 5 vectors round-trip (within pool threshold, hits the decode bug)
- `subarray` sub-view encode (hits the encode bug)
Full `vector-index.test.ts` suite: 11/11 pass.
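
The fix and the two regression cases described in this PR can be sketched roughly as follows. This is not the code from `src/state/vector-index.ts`: the helper names `encodeVector` and `decodeVector` are hypothetical, and the checks use plain `node:assert` rather than the project's test runner, which the PR does not name.

```typescript
import { strict as assert } from "node:assert";

// Hypothetical helper names; the real helpers in vector-index.ts may differ.
function encodeVector(vec: Float32Array): string {
  // Pass offset and length so a sub-view encodes only its own bytes.
  return Buffer.from(vec.buffer, vec.byteOffset, vec.byteLength).toString("base64");
}

function decodeVector(b64: string): Float32Array {
  const buf = Buffer.from(b64, "base64");
  // View only this Buffer's bytes, not the whole backing ArrayBuffer.
  return new Float32Array(
    buf.buffer,
    buf.byteOffset,
    buf.byteLength / Float32Array.BYTES_PER_ELEMENT,
  );
}

// Case 1: five 384-dim vectors survive a round trip with their dimension intact.
const dims = 384;
for (let v = 0; v < 5; v++) {
  const vec = new Float32Array(dims);
  for (let i = 0; i < dims; i++) vec[i] = v + i / dims;
  const back = decodeVector(encodeVector(vec));
  assert.equal(back.length, dims);
  assert.deepEqual(Array.from(back), Array.from(vec));
}

// Case 2: a subarray encodes only its own four elements, not the parent's sixteen.
const parent = new Float32Array(16).map((_, i) => i);
const sub = parent.subarray(4, 8);
const subBack = decodeVector(encodeVector(sub));
assert.deepEqual(Array.from(subBack), [4, 5, 6, 7]);
```

The decoded array is a view over the decoded Buffer's memory, not a copy. Callers that need an independent array can call `.slice()` on the result.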