Add hash → block number index + lookup API#72

Draft
mo4islona wants to merge 2 commits into master from block-hash-index

mo4islona (Contributor) commented May 29, 2026

Summary

Lets consumers resolve a bare block hash to its block number. Today the hash lives only in each chunk's Arrow hash column, so the only way to look it up is a linear scan (~0.1–0.6s, and worse as retention grows). This adds a dedicated index that turns the lookup into an O(1) point read, plus an HTTP endpoint to use it.

The motivation: services that receive a raw hash (from logs, fork events, external sources) need to map it back to a block number to keep working with the standard /stream APIs.

What's new

  • Storage: a RocksDB index mapping hash → block number per dataset, kept up to date automatically as chunks are ingested, forked, and pruned. Enabled for EVM datasets for now (easy to extend to other chains later).
  • Hotblocks: GET /datasets/{id}/hashes/{hash}/block, returning {number, hash} or 404 if the hash isn't found.
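
To make the two bullets above concrete, here is a minimal in-memory sketch of the index lifecycle and the endpoint's response mapping. It is not the code in this PR: `HashIndex`, `rollback_from`, `prune_below`, and `respond` are hypothetical names, a `HashMap` stands in for the RocksDB column family, and the exact fork and retention semantics (drop everything at or above the fork point, drop everything below the retention boundary) are assumptions. The JSON body shape is inferred from the `{number, hash}` description.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for the per-dataset hash index.
#[derive(Default)]
struct HashIndex {
    // (dataset id, block hash) -> block number
    entries: HashMap<(String, String), u64>,
}

impl HashIndex {
    /// Ingest: record every block of a newly written chunk.
    fn ingest(&mut self, dataset: &str, blocks: &[(u64, &str)]) {
        for &(number, hash) in blocks {
            self.entries
                .insert((dataset.to_string(), hash.to_string()), number);
        }
    }

    /// Fork (assumed semantics): forget blocks at or above the fork point;
    /// the replacement blocks are then added through `ingest`.
    fn rollback_from(&mut self, dataset: &str, first_removed: u64) {
        self.entries
            .retain(|(ds, _), number| ds.as_str() != dataset || *number < first_removed);
    }

    /// Retention (assumed semantics): forget blocks below the first kept block.
    fn prune_below(&mut self, dataset: &str, first_kept: u64) {
        self.entries
            .retain(|(ds, _), number| ds.as_str() != dataset || *number >= first_kept);
    }

    /// Point lookup: a single keyed read instead of a scan over chunks.
    fn find_block_by_hash(&self, dataset: &str, hash: &str) -> Option<u64> {
        self.entries
            .get(&(dataset.to_string(), hash.to_string()))
            .copied()
    }
}

/// Maps a lookup result to an (HTTP status, body) pair:
/// 200 with `{number, hash}` on a hit, 404 with an empty body on a miss.
fn respond(index: &HashIndex, dataset: &str, hash: &str) -> (u16, String) {
    match index.find_block_by_hash(dataset, hash) {
        Some(number) => (200, format!(r#"{{"number":{number},"hash":"{hash}"}}"#)),
        None => (404, String::new()),
    }
}
```

Under these assumptions, a hash that was never indexed and a hash that was indexed and later pruned both produce the same 404, which is also what the no-backfill rollout note implies for chunks that predate the upgrade.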

Rollout notes

  • No backfill: the index covers only data ingested after this ships. Hashes from chunks that already exist at upgrade time resolve to 404 until those chunks roll off via retention. Worth calling out in release notes.
  • Safe to roll back: older binaries simply ignore the new column family.

Testing

Unit tests in sqd-storage cover ingest, fork, retention, and dataset deletion, and confirm that compaction leaves the index intact. A manual HTTP smoke test against a live hotblocks instance is still recommended.

🤖 Generated with Claude Code

New CF_BLOCK_HASHES maps dataset_id||hash to a block number for sub-ms
lookups, updated on ingest/fork/retention/delete (EVM-only), untouched by
compaction. delete_dataset now uses per-chunk txs to bound memory. Adds
find_block_by_hash and GET /datasets/{id}/hashes/{hash}/block. No backfill:
old chunks 404 until they age out.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
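
The `dataset_id||hash` key layout from the commit message above could look roughly like the sketch below. The widths are assumptions made for illustration: a fixed 16-byte dataset id (fixed width keeps one dataset's keys from being a prefix of another's), a 32-byte hash, and the block number stored as 8 big-endian bytes. A `BTreeMap` stands in for the ordered `CF_BLOCK_HASHES` column family, and this `find_block_by_hash` is a hypothetical free function, not the signature added in the PR.

```rust
use std::collections::BTreeMap;

// Assumed widths, chosen for illustration only.
const DATASET_ID_LEN: usize = 16;
type DatasetId = [u8; DATASET_ID_LEN];
type BlockHash = [u8; 32];

/// Builds the index key as `dataset_id || hash`.
fn index_key(dataset: &DatasetId, hash: &BlockHash) -> Vec<u8> {
    let mut key = Vec::with_capacity(DATASET_ID_LEN + hash.len());
    key.extend_from_slice(dataset);
    key.extend_from_slice(hash);
    key
}

/// Encodes the block number as 8 big-endian bytes (assumed value format).
fn encode_number(number: u64) -> [u8; 8] {
    number.to_be_bytes()
}

/// Decodes a stored value; returns None if it is not exactly 8 bytes.
fn decode_number(bytes: &[u8]) -> Option<u64> {
    if bytes.len() != 8 {
        return None;
    }
    let mut buf = [0u8; 8];
    buf.copy_from_slice(bytes);
    Some(u64::from_be_bytes(buf))
}

/// Point lookup against the ordered key-value stand-in.
fn find_block_by_hash(
    cf: &BTreeMap<Vec<u8>, Vec<u8>>,
    dataset: &DatasetId,
    hash: &BlockHash,
) -> Option<u64> {
    cf.get(&index_key(dataset, hash))
        .and_then(|value| decode_number(value))
}
```

Putting the dataset id first means all entries of one dataset sit in one contiguous key range, which is what makes a per-dataset range delete possible.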
Clear the dataset's whole hash index with one range tombstone over its CF_BLOCK_HASHES prefix, then delete all chunk metadata and the label in a single transaction. The range delete stays outside the transaction (RocksDB forbids delete_range inside a transaction); a crash between the two steps leaves chunks without index entries, so their hashes return 404 until re-indexed. That is a lookup miss, not corruption.

Keeps memory bounded (no delete_cf per block) while making the metadata deletion all-or-nothing again, and re-takes the optimistic label lock. Adds BlockHashIndexKey::dataset_range for the prefix bounds.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
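
The prefix bounds behind the range tombstone in the second commit can be sketched as follows. This is an illustration under assumptions, not the PR's `BlockHashIndexKey::dataset_range`: the free functions `prefix_upper_bound`, `dataset_range`, and `delete_dataset_hashes` are hypothetical, and a `BTreeMap` replaces the column family, so the "range delete" here is two `split_off` calls rather than a RocksDB tombstone.

```rust
use std::collections::BTreeMap;

/// Smallest byte string strictly greater than every key that starts with
/// `prefix`. Returns None when no such bound exists (empty or all-0xFF prefix).
fn prefix_upper_bound(prefix: &[u8]) -> Option<Vec<u8>> {
    let mut end = prefix.to_vec();
    while let Some(last) = end.pop() {
        if last < 0xFF {
            end.push(last + 1);
            return Some(end);
        }
        // Trailing 0xFF bytes cannot be incremented; drop them and carry on.
    }
    None
}

/// Half-open key range `[start, end)` covering one dataset's index entries.
/// `end == None` means "through the end of the keyspace".
fn dataset_range(dataset_id: &[u8]) -> (Vec<u8>, Option<Vec<u8>>) {
    (dataset_id.to_vec(), prefix_upper_bound(dataset_id))
}

/// Removes every entry in the dataset's range in one operation, without
/// visiting the dataset's blocks one by one.
fn delete_dataset_hashes(cf: &mut BTreeMap<Vec<u8>, u64>, dataset_id: &[u8]) {
    let (start, end) = dataset_range(dataset_id);
    // Everything at or after `start` moves into `tail`.
    let mut tail = cf.split_off(&start);
    if let Some(end) = end {
        // Keys at or after `end` belong to other datasets; put them back.
        let mut rest = tail.split_off(&end);
        cf.append(&mut rest);
    }
    // `tail` now holds only the deleted dataset's entries and is dropped here.
}
```

In this sketch the deletion cost does not depend on building a per-block delete batch, which mirrors the memory argument in the commit message; the chunk metadata and label would then be removed in a separate, single transaction.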