Add hash → block number index + lookup API #72
Draft
mo4islona wants to merge 2 commits into
New CF_BLOCK_HASHES maps dataset_id||hash to a block number for sub-ms
lookups, updated on ingest/fork/retention/delete (EVM-only), untouched by
compaction. delete_dataset now uses per-chunk txs to bound memory. Adds
find_block_by_hash and GET /datasets/{id}/hashes/{hash}/block. No backfill:
old chunks 404 until they age out.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
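The key layout described in this commit (dataset_id||hash mapping to a block number) can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: a `BTreeMap` stands in for the `CF_BLOCK_HASHES` column family, and the 8-byte big-endian dataset id, the 32-byte hash width, the `HashIndex` type, and the `find_block_by_hash` signature are all assumptions made for the example.

```rust
use std::collections::BTreeMap;

// Assumed width of an EVM block hash.
const HASH_LEN: usize = 32;

/// Builds the composite key `dataset_id || hash`.
/// Assumption: the dataset id is a fixed-width 8-byte big-endian value, so
/// every key of one dataset shares the same 8-byte prefix.
fn index_key(dataset_id: u64, hash: &[u8; HASH_LEN]) -> Vec<u8> {
    let mut key = Vec::with_capacity(8 + HASH_LEN);
    key.extend_from_slice(&dataset_id.to_be_bytes());
    key.extend_from_slice(hash);
    key
}

/// Hypothetical in-memory stand-in for the hash index.
#[derive(Default)]
struct HashIndex {
    entries: BTreeMap<Vec<u8>, u64>,
}

impl HashIndex {
    /// Records `hash -> number`, as would happen when a chunk is ingested.
    fn insert(&mut self, dataset_id: u64, hash: &[u8; HASH_LEN], number: u64) {
        self.entries.insert(index_key(dataset_id, hash), number);
    }

    /// Point lookup by exact key; `None` corresponds to the 404 case.
    fn find_block_by_hash(&self, dataset_id: u64, hash: &[u8; HASH_LEN]) -> Option<u64> {
        self.entries.get(&index_key(dataset_id, hash)).copied()
    }
}
```

Because the dataset id leads the key, the same hash stored under two datasets yields two distinct entries, and a lookup never crosses dataset boundaries.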
79ec945 to ae9883d
Clear the dataset's whole hash index with one range tombstone over its CF_BLOCK_HASHES prefix, then delete all chunk metadata and the label in a single transaction. The range-delete stays outside the tx (RocksDB forbids delete_range inside a transaction); a crash between the two leaves chunks without index entries, which 404 until re-indexed, not corruption. Keeps memory bounded (no delete_cf per block) while making the metadata deletion all-or-nothing again, and re-takes the optimistic label lock. Adds BlockHashIndexKey::dataset_range for the prefix bounds.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
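The prefix bounds for such a range delete can be sketched as below. This is an illustration under assumptions, not the PR's `BlockHashIndexKey::dataset_range`: it assumes keys start with an 8-byte big-endian dataset id, the free functions `dataset_range` and `clear_dataset` are hypothetical, and a `BTreeMap` stands in for the column family. A real range tombstone would cover the span without visiting each key, whereas this stand-in filters the map entry by entry.

```rust
use std::collections::BTreeMap;

/// Returns `[start, end)` bounds covering every key that begins with the
/// dataset prefix. `end` is `None` when the dataset id is `u64::MAX`,
/// meaning the range is unbounded above.
fn dataset_range(dataset_id: u64) -> (Vec<u8>, Option<Vec<u8>>) {
    let start = dataset_id.to_be_bytes().to_vec();
    // Exclusive upper bound: the prefix of the next dataset id.
    let end = dataset_id
        .checked_add(1)
        .map(|next| next.to_be_bytes().to_vec());
    (start, end)
}

/// Removes every index entry of one dataset and returns how many were removed.
fn clear_dataset(index: &mut BTreeMap<Vec<u8>, u64>, dataset_id: u64) -> usize {
    let (start, end) = dataset_range(dataset_id);
    let before = index.len();
    index.retain(|key, _| {
        let in_range = key.as_slice() >= start.as_slice()
            && end.as_deref().map_or(true, |e| key.as_slice() < e);
        // Keep only the entries that fall outside the dataset's range.
        !in_range
    });
    before - index.len()
}
```

Big-endian encoding matters here: it makes the byte order of prefixes match the numeric order of dataset ids, so the next id's prefix is a valid exclusive upper bound.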
Summary
Lets consumers resolve a bare block hash to its block number. Today the hash is only buried inside each chunk's Arrow hash column, so the only way to look it up is a linear scan (~0.1–0.6s, and worse as retention grows). This adds a proper index for an O(1) point lookup, plus an HTTP endpoint to use it.

The motivation: services that receive a raw hash (from logs, fork events, external sources) need to map it back to a block number to keep working with the standard /stream APIs.

What's new
- A new index, hash → block number, per dataset, kept up to date automatically as chunks are ingested, forked, and pruned. Enabled for EVM datasets for now (easy to extend to other chains later).
- A new endpoint, GET /datasets/{id}/hashes/{hash}/block, returning {number, hash}, or 404 if the hash isn't found.

Rollout notes
- No backfill: hashes from chunks ingested before this change return 404 until those chunks roll off via retention. Worth calling out in release notes.

Testing
Unit tests in sqd-storage cover ingest, fork, retention, and dataset deletion, and confirm compaction leaves the index intact. A manual HTTP smoke test against a live hotblocks instance is still recommended.

🤖 Generated with Claude Code
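As a rough sketch of the endpoint contract described under "What's new", the mapping from a lookup result to an HTTP response could look like this. The `Response` type and `hash_lookup_response` function are hypothetical, and the exact JSON encoding (block number as an integer, hash as a string, empty body on 404) is an assumption rather than something this PR states.

```rust
/// Hypothetical minimal response type used only for this sketch.
struct Response {
    status: u16,
    body: String,
}

/// Maps an index lookup result to the `{number, hash}` or 404 contract.
/// Assumption: `hash_hex` has already been validated as a hex string, so it
/// can be embedded in the JSON body without escaping.
fn hash_lookup_response(hash_hex: &str, found: Option<u64>) -> Response {
    match found {
        Some(number) => Response {
            status: 200,
            body: format!("{{\"number\":{},\"hash\":\"{}\"}}", number, hash_hex),
        },
        // Unknown hash, including hashes from chunks ingested before the index existed.
        None => Response {
            status: 404,
            body: String::new(),
        },
    }
}
```

Under the no-backfill rollout, callers cannot distinguish "hash never existed" from "hash predates the index", since both take the 404 branch.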