Add hash → block number index + lookup API by mo4islona · Pull Request #72 · subsquid/data

mo4islona · 2026-05-29T18:08:35Z

Summary

Lets consumers resolve a bare block hash to its block number. Today the hash is only buried inside each chunk's Arrow hash column, so the only way to look it up is a linear scan (~0.1–0.6s, and worse as retention grows). This adds a proper index for an O(1) point lookup, plus an HTTP endpoint to use it.

The motivation: services that receive a raw hash (from logs, fork events, external sources) need to map it back to a block number to keep working with the standard /stream APIs.

What's new

Storage — a RocksDB index mapping hash → block number per dataset, kept up to date automatically as chunks are ingested, forked, and pruned. Enabled for EVM datasets for now (easy to extend to other chains later).
Hotblocks — GET /datasets/{id}/hashes/{hash}/block, returning {number, hash} or 404 if the hash isn't found.
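
To make the storage side concrete, here is a minimal in-memory sketch of a `dataset_id || hash` → block number index. It is an illustration under stated assumptions, not the PR's code: a `BTreeMap` stands in for the RocksDB column family (so lookups here are O(log n) rather than a RocksDB point get), the dataset id is assumed to be a fixed-width `u64`, and `BlockHashIndex`, `index_key`, `insert`, and `remove` are hypothetical names. Only `find_block_by_hash` is a name taken from the commit message, and its signature here is a guess.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the hash index column family.
/// A BTreeMap keeps keys ordered by bytes, much like keys inside a RocksDB column family.
#[derive(Default)]
pub struct BlockHashIndex {
    entries: BTreeMap<Vec<u8>, u64>,
}

/// Builds the composite key `dataset_id || hash`.
/// Assumption: a fixed-width big-endian id, so all keys of one dataset share a prefix.
fn index_key(dataset_id: u64, hash: &[u8]) -> Vec<u8> {
    let mut key = Vec::with_capacity(8 + hash.len());
    key.extend_from_slice(&dataset_id.to_be_bytes());
    key.extend_from_slice(hash);
    key
}

impl BlockHashIndex {
    /// Would be called for every block of a newly ingested chunk.
    pub fn insert(&mut self, dataset_id: u64, hash: &[u8], number: u64) {
        self.entries.insert(index_key(dataset_id, hash), number);
    }

    /// Would be called for blocks dropped by a fork or pruned by retention.
    pub fn remove(&mut self, dataset_id: u64, hash: &[u8]) {
        self.entries.remove(&index_key(dataset_id, hash));
    }

    /// Point lookup: returns the block number, or None (the HTTP layer's 404 case).
    pub fn find_block_by_hash(&self, dataset_id: u64, hash: &[u8]) -> Option<u64> {
        self.entries.get(&index_key(dataset_id, hash)).copied()
    }
}
```

Prefixing every key with the dataset id keeps each dataset's entries contiguous in key order, which is what makes a per-dataset range delete possible.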

Rollout notes

No backfill: the index covers only data ingested after this ships. Hashes from chunks that already exist at upgrade time resolve to 404 until those chunks roll off via retention. Worth calling out in release notes.
Safe to roll back: older binaries simply ignore the new column family.
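
Because there is no backfill, a client cannot tell from a 404 alone whether a hash is unknown or simply belongs to a chunk ingested before the upgrade. The sketch below shows one way a consumer might model that. It assumes a successful lookup returns HTTP 200 (the description only says the endpoint returns `{number, hash}` or 404), and `HashLookup`, `lookup_path`, and `classify` are hypothetical helper names, not part of the PR.

```rust
/// Outcome of a hash lookup as seen by a client (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum HashLookup {
    /// Assumed 200: the body carries `{number, hash}`.
    Found,
    /// 404: the hash is unknown, or it lives in a chunk ingested before the index shipped.
    NotIndexed,
    /// Any other status is passed through for the caller to handle.
    Error(u16),
}

/// Builds the request path from the PR description's route template.
pub fn lookup_path(dataset: &str, hash: &str) -> String {
    format!("/datasets/{}/hashes/{}/block", dataset, hash)
}

/// Maps an HTTP status code to a lookup outcome.
pub fn classify(status: u16) -> HashLookup {
    match status {
        200 => HashLookup::Found,
        404 => HashLookup::NotIndexed,
        other => HashLookup::Error(other),
    }
}
```

A consumer that must resolve pre-upgrade hashes would need its own fallback for the `NotIndexed` case until those chunks roll off via retention.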

Testing

Unit tests in sqd-storage cover ingest, fork, retention, and dataset deletion, and confirm that compaction leaves the index intact. A manual HTTP smoke test against a live hotblocks instance is still recommended.

🤖 Generated with Claude Code

New CF_BLOCK_HASHES maps dataset_id||hash to a block number for sub-ms lookups, updated on ingest/fork/retention/delete (EVM-only), untouched by compaction. delete_dataset now uses per-chunk txs to bound memory. Adds find_block_by_hash and GET /datasets/{id}/hashes/{hash}/block. No backfill: old chunks 404 until they age out.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Clear the dataset's whole hash index with one range tombstone over its CF_BLOCK_HASHES prefix, then delete all chunk metadata and the label in a single transaction. The range-delete stays outside the tx (RocksDB forbids delete_range inside a transaction); a crash between the two leaves chunks without index entries, so their hashes 404 until re-indexed, but nothing is corrupted. Keeps memory bounded (no delete_cf per block) while making the metadata deletion all-or-nothing again, and re-takes the optimistic label lock. Adds BlockHashIndexKey::dataset_range for the prefix bounds.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
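
The prefix-bounds idea can be sketched in memory as follows. This is an illustration only: a `BTreeMap` stands in for CF_BLOCK_HASHES, the dataset id is assumed to be a big-endian `u64` prefix, and this free function `dataset_range` is a hypothetical analogue of `BlockHashIndexKey::dataset_range`, whose real signature and key encoding are not shown in the PR text. `clear_dataset` is likewise a made-up helper.

```rust
use std::collections::BTreeMap;

/// Half-open bounds [start, end) covering every key that begins with the dataset prefix.
/// `end` is None when the id is u64::MAX, meaning "no upper bound".
pub fn dataset_range(dataset_id: u64) -> (Vec<u8>, Option<Vec<u8>>) {
    let start = dataset_id.to_be_bytes().to_vec();
    // The exclusive upper bound is simply the next dataset's prefix.
    let end = dataset_id.checked_add(1).map(|n| n.to_be_bytes().to_vec());
    (start, end)
}

/// Drops a dataset's whole key range in one step instead of deleting key by key.
/// Returns how many entries were removed.
pub fn clear_dataset(index: &mut BTreeMap<Vec<u8>, u64>, dataset_id: u64) -> usize {
    let (start, end) = dataset_range(dataset_id);
    // `doomed` now holds every key >= start; `index` keeps the keys below it.
    let mut doomed = index.split_off(&start);
    if let Some(end) = end {
        // Keys >= end belong to later datasets, so move them back.
        let mut survivors = doomed.split_off(&end);
        index.append(&mut survivors);
    }
    doomed.len()
}
```

In RocksDB the equivalent step is a single range tombstone over the same bounds, which avoids materializing one delete per block.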

mo4islona force-pushed the block-hash-index branch from 79ec945 to ae9883d on May 29, 2026 19:01

mo4islona wants to merge 2 commits into master from block-hash-index
