Skip to content

Support dictionary-compressed virtual chunk locations#22

Merged
Shane98c merged 10 commits into
mainfrom
Shane98c/vendor-fzstd-compressed-locations
Jun 23, 2026
Merged

Support dictionary-compressed virtual chunk locations#22
Shane98c merged 10 commits into
mainfrom
Shane98c/vendor-fzstd-compressed-locations

Conversation

@Shane98c

@Shane98c Shane98c commented Jun 18, 2026

Copy link
Copy Markdown
Collaborator

Closes #3. Closes #21.

Decodes icechunk's dictionary-compressed virtual chunk locations (Manifest.location_dictionary + ChunkRef.compressed_location, from earth-mover/icechunk#1776).

Why a vendored, patched fzstd?

compressed_location needs zstd dictionary decompression, which released fzstd@0.1.1 lacks. Tested against all 244 real metadata fixtures:

Decoder Result
fzstd@0.1.1 byte-exact on all 244, but no dictionary API
zstdify (1.4.0 and main) silently corrupts 77/244 (~32%) — wrong bytes, no error (vs zstd CLI)
fzstd PR#18 correct on all 244 + decodes real dictionary frames

zstdify is disqualified (silent corruption), and dictionary support exists only in the unmerged, unreleased fzstd#18 (can't be cleanly npm installed; upstream dormant). So we vendor its source under src/vendor/fzstd/ (MIT, attributed — see its README.md for provenance, re-vendor steps, and drop criteria).

Testing

New manifest-dictionary.test.ts covers end-to-end decode, missing-dictionary error, the size bound, the aliasing-fix regression, and malformed-frame rejection. Fixtures are zstd-CLI-generated (no upstream fixture uses this feature yet).

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request vendors the fzstd library to support zstd dictionary decompression for virtual chunk locations in manifests, resolving dependency limitations and data corruption issues. It updates the manifest parser to decode compressed locations using the manifest's zstd dictionary, implements safety checks to bound decompression memory allocation, and aligns payload extraction priority with the Rust implementation. Comprehensive test coverage is also added. Feedback on the changes points out a potential integer overflow in zstdFrameAllocSize due to 32-bit bitwise shift operations in JavaScript, suggesting standard arithmetic exponentiation and division as a safer alternative.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +60 to +61
const base = 1 << (10 + (wd >> 3));
windowSize = base + (base >> 3) * (wd & 7);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

In JavaScript, bitwise operators (like << and >>) operate on 32-bit signed integers. If the window descriptor wd has a large exponent (e.g., wd >> 3 >= 21), 10 + (wd >> 3) will be >= 31, causing 1 << (10 + (wd >> 3)) to overflow or wrap around to a negative or incorrect value. Similarly, base >> 3 will overflow if base is >= 2^31.

To prevent this and ensure correct handling of large window sizes (especially when validating untrusted input), use standard arithmetic exponentiation (2 ** ...) and division (/ 8) instead of bitwise shifts.

Suggested change
const base = 1 << (10 + (wd >> 3));
windowSize = base + (base >> 3) * (wd & 7);
const base = 2 ** (10 + (wd >> 3));
windowSize = base + (base / 8) * (wd & 7);

@Shane98c Shane98c merged commit a6de5bd into main Jun 23, 2026
12 checks passed
@Shane98c Shane98c deleted the Shane98c/vendor-fzstd-compressed-locations branch June 23, 2026 20:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

location being interpreted as null Support compressed virtual chunk locations (needs fzstd dictionary support)

1 participant