feat(duckdb,sqlite): add invalidate_instance for snapshot-driven cache invalidation #635
Merged
phillipleblanc merged 2 commits into spiceai-52 on May 3, 2026
Conversation
phillipleblanc
added a commit
to spiceai/spiceai
that referenced
this pull request
Apr 28, 2026
…eshes

Address Copilot review comments and bugs found via manual testing against an S3 snapshot bucket:

- runtime-acceleration: download_to_file now uses temp-file + atomic rename, so concurrent readers in refresh_mode: snapshot never observe a partially written file (the download is also fsync'd before the rename for crash safety).
- runtime-acceleration: download_if_newer uses strict > and never regresses to an older remote snapshot id; it warns when the remote is older than the local copy.
- runtime: SnapshotRefreshState::current_snapshot_id is now Arc<Mutex<Option<u64>>> rather than an AtomicU64 with 0 as a sentinel, so snapshot id 0 (a valid first-snapshot id) is no longer indistinguishable from "no snapshot loaded".
- build_snapshot_refresh_state: bootstrap_loaded_id is passed through directly, without sentinel collapsing.
- swappable: doc comments now describe the poison-recovery behavior accurately.
- New test: download_if_newer rejects regression to an older remote id.

Bugs surfaced via manual S3 testing:

- build_snapshot_refresh_state constructed SnapshotManager without a checkpointer factory, so refresh-time download_latest_snapshot failed with "Dataset checkpointer factory not set". It now wires in the same factory used by the bootstrap path.
- refresh_from_snapshot validated the snapshot schema against the federated schema. In snapshot mode the federated source is intentionally not consulted at refresh time and may diverge (e.g. an empty stub source). It now validates against the SwappableTableProvider's cached schema.
- Schema validation used strict arrow Schema equality, which rejected logically identical schemas that differ only in engine-attached metadata or nullability flags. Replaced with schemas_compatible(), which compares field count, ordered names, and data types only.

Pool cache invalidation:

- DuckDB and SQLite reload_from_snapshot now call the new upstream invalidate_file_instance / invalidate_instance APIs on the TableProviderFactory before rebuilding. Without this eviction, the newly built provider would reuse the prior pool's open connections (which keep observing the previously opened file inode), and queries would return stale data even after the file is replaced on disk.
- Cargo.toml bumped to consume the upstream PR (datafusion-contrib/datafusion-table-providers#635).

Original architecture: a new refresh mode that, when paired with snapshots, drives accelerator data exclusively from snapshots in object storage. The federated source is never queried for refreshes; instead, the runtime polls the snapshot store on a configurable cadence and reloads the accelerator only when a strictly newer snapshot is available.

- spicepod / runtime: new RefreshMode::Snapshot enum variant.
- runtime-acceleration: SnapshotDownloadInfo carries snapshot_id; new remote_current_snapshot_id() and download_if_newer().
- DataAccelerator trait: new supports_snapshot_reload() and reload_from_snapshot(source, previous_provider, factory). Implemented for DuckDB, SQLite, Cayenne, Turso.
- New SwappableTableProvider wrapper for atomic provider swaps.
- AcceleratedTable / Refresher / RefreshTaskRunner / RefreshTask carry an optional SnapshotRefreshState. RefreshTask::refresh_from_snapshot() short-circuits the federated fetch path.
- create_accelerated_table validates the configuration and auto-applies a default refresh_check_interval (1m) for snapshot mode.

Manual end-to-end testing against an S3 snapshot bucket confirms the full flow: the writer creates snapshots; the reader polls, detects newer ids, downloads atomically, evicts the cached pool, rebuilds the provider, swaps under SwappableTableProvider, and serves the new data on subsequent queries.
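The temp-file + fsync + atomic-rename pattern described above can be sketched as follows. This is a minimal illustration, not the actual runtime-acceleration API: `write_atomic`, its signature, and the file names are hypothetical stand-ins.

```rust
use std::fs::{self, File};
use std::io::Write;
use std::path::Path;

// Hypothetical sketch of atomic file replacement: write to a temp file,
// fsync it, then rename over the destination.
fn write_atomic(dest: &Path, bytes: &[u8]) -> std::io::Result<()> {
    // The temp file lives next to the destination so the rename stays on
    // one filesystem (rename is only atomic within a single filesystem).
    let tmp = dest.with_extension("partial");
    {
        let mut f = File::create(&tmp)?;
        f.write_all(bytes)?;
        // fsync before rename: after a crash, readers see either the old
        // file or the fully written new one, never a truncated download.
        f.sync_all()?;
    }
    // Atomic replace: concurrent readers never observe a partial file.
    fs::rename(&tmp, dest)
}

fn main() -> std::io::Result<()> {
    let dest = std::env::temp_dir().join("snapshot_sketch.duckdb");
    write_atomic(&dest, b"old snapshot")?;
    write_atomic(&dest, b"new snapshot")?;
    assert_eq!(fs::read(&dest)?, b"new snapshot");
    println!("ok");
    Ok(())
}
```

Note that `std::fs::rename` replaces an existing destination on Unix; Windows behavior differs across configurations, which is what the later AlreadyExists fallback in this PR series addresses.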
Force-pushed from 6bf50e6 to 577d85f.
phillipleblanc
added a commit
to spiceai/spiceai
that referenced
this pull request
Apr 28, 2026
…eshes

Address Copilot review comments and bugs found via manual testing against an S3 snapshot bucket:

- runtime-acceleration: download_to_file now uses temp-file + atomic rename, so concurrent readers in refresh_mode: snapshot never observe a partially written file (the download is also fsync'd before the rename for crash safety). Adds an explicit AlreadyExists fallback for older Windows configurations where MoveFileExW does not honor the REPLACE_EXISTING flag.
- runtime-acceleration: download_if_newer uses strict > and never regresses to an older remote snapshot id; it warns when the remote is older than the local copy.
- runtime: SnapshotRefreshState::current_snapshot_id is now Arc<Mutex<Option<u64>>> rather than an AtomicU64 with 0 as a sentinel, so snapshot id 0 (a valid first-snapshot id) is no longer indistinguishable from "no snapshot loaded".
- build_snapshot_refresh_state: bootstrap_loaded_id is passed through directly, without sentinel collapsing.
- swappable: doc comments now describe the poison-recovery behavior accurately.
- New test: download_if_newer rejects regression to an older remote id.

Bugs surfaced via manual S3 testing:

- build_snapshot_refresh_state constructed SnapshotManager without a checkpointer factory, so refresh-time download_latest_snapshot failed with "Dataset checkpointer factory not set". It now wires in the same factory used by the bootstrap path.
- refresh_from_snapshot validated the snapshot schema against the federated schema. In snapshot mode the federated source is intentionally not consulted at refresh time and may diverge (e.g. an empty stub source). It now validates against the SwappableTableProvider's cached schema.
- Schema validation used strict arrow Schema equality, which rejected logically identical schemas that differ only in engine-attached metadata or nullability flags. Replaced with schemas_compatible(), which compares field count, ordered names, and data types only.

Pool cache invalidation:

- DuckDB and SQLite reload_from_snapshot now call the new upstream invalidate_file_instance / invalidate_instance APIs on the TableProviderFactory before rebuilding. Without this eviction, the newly built provider would reuse the prior pool's open connections (which keep observing the previously opened file inode), and queries would return stale data even after the file is replaced on disk.
- Cargo.toml bumped to consume the upstream PR (datafusion-contrib/datafusion-table-providers#635).

Error context preservation:

- build_snapshot_refresh_state no longer swallows FilePathError from get_acceleration_layout; a new SnapshotRefreshModeLayoutUnavailable variant carries the original error, so snapshot-mode configuration issues surface with their root cause (which engine, which path).

Original architecture: a new refresh mode that, when paired with snapshots, drives accelerator data exclusively from snapshots in object storage. The federated source is never queried for refreshes; instead, the runtime polls the snapshot store on a configurable cadence and reloads the accelerator only when a strictly newer snapshot is available.

- spicepod / runtime: new RefreshMode::Snapshot enum variant.
- runtime-acceleration: SnapshotDownloadInfo carries snapshot_id; new remote_current_snapshot_id() and download_if_newer().
- DataAccelerator trait: new supports_snapshot_reload() and reload_from_snapshot(source, previous_provider, factory). Implemented for DuckDB, SQLite, Cayenne, Turso. Engines that opt out (e.g. TableModePartitionedDuckDB) are explicitly rejected at config validation with a clear error.
- New SwappableTableProvider wrapper for atomic provider swaps.
- AcceleratedTable / Refresher / RefreshTaskRunner / RefreshTask carry an optional SnapshotRefreshState. RefreshTask::refresh_from_snapshot() short-circuits the federated fetch path.
- create_accelerated_table validates the configuration and auto-applies a default refresh_check_interval (1m) for snapshot mode.

Manual end-to-end testing against an S3 snapshot bucket confirms the full flow: the writer creates snapshots; the reader polls, detects newer ids, downloads atomically, evicts the cached pool, rebuilds the provider, swaps under SwappableTableProvider, and serves the new data on subsequent queries.
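The relaxed schema check can be sketched with plain structs standing in for arrow fields. The real schemas_compatible() operates on arrow Schema values; this Field type, its string-typed data_type, and the metadata representation are hypothetical simplifications used only to show which properties the comparison ignores.

```rust
// Hypothetical stand-in for an arrow field: name and data type, plus the
// nullability flag and engine-attached metadata that strict equality
// would also compare.
#[derive(Clone, PartialEq)]
struct Field {
    name: String,
    data_type: String,
    nullable: bool,                  // ignored by the relaxed check
    metadata: Vec<(String, String)>, // ignored by the relaxed check
}

// Relaxed compatibility: field count, ordered names, and data types only.
fn schemas_compatible(a: &[Field], b: &[Field]) -> bool {
    a.len() == b.len()
        && a.iter()
            .zip(b)
            .all(|(x, y)| x.name == y.name && x.data_type == y.data_type)
}

fn main() {
    // Same logical schema, differing only in nullability and metadata.
    let snapshot = [Field {
        name: "id".into(),
        data_type: "Int64".into(),
        nullable: true,
        metadata: vec![("engine".into(), "duckdb".into())],
    }];
    let cached = [Field {
        name: "id".into(),
        data_type: "Int64".into(),
        nullable: false,
        metadata: vec![],
    }];
    assert!(snapshot != cached);                     // strict equality rejects
    assert!(schemas_compatible(&snapshot, &cached)); // relaxed check accepts
    println!("ok");
}
```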
Force-pushed from 577d85f to e2ecafa.
Force-pushed from e2ecafa to 346381c.
…e invalidation

DuckDBTableProviderFactory and SqliteTableProviderFactory cache pool instances by DbInstanceKey (file path or memory). Without an invalidation API, callers that legitimately need to swap the underlying database file out of band (for example, after restoring it from a snapshot in object storage) have no way to force the next provider build to open a fresh pool. Subsequent providers reuse the prior pool's already-open connections, which keep observing the previously opened file inode rather than the replacement on disk.

Add:

- DuckDBTableProviderFactory::invalidate_instance(&DbInstanceKey)
- DuckDBTableProviderFactory::invalidate_file_instance(path)
- SqliteTableProviderFactory::invalidate_instance(&DbInstanceKey)
- SqliteTableProviderFactory::invalidate_file_instance(path)

Each removes the cached pool entry and returns it (so the caller can hold the prior pool alive while in-flight queries drain) without disturbing other entries. Invalidating an absent key is a no-op.

Tests cover insertion, idempotent reuse, eviction, the missing-key no-op, and re-population.
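The cache-plus-invalidation shape described above can be sketched as follows. This is an illustrative stand-in, not the upstream implementation: the Pool and Factory types are hypothetical, while DbInstanceKey and the invalidate method names mirror the commit text.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Cache key: a database file path, or the in-memory instance.
#[derive(Clone, PartialEq, Eq, Hash, Debug)]
enum DbInstanceKey {
    File(String),
    Memory,
}

// Stand-in for a connection pool holding open file handles.
#[derive(Debug)]
struct Pool {
    #[allow(dead_code)]
    path: String,
}

#[derive(Default)]
struct Factory {
    instances: HashMap<DbInstanceKey, Arc<Pool>>,
}

impl Factory {
    // Build-or-reuse: without invalidation, this reuse is exactly what
    // kept serving the old file inode after an out-of-band swap.
    fn get_or_create(&mut self, key: DbInstanceKey) -> Arc<Pool> {
        let path = match &key {
            DbInstanceKey::File(p) => p.clone(),
            DbInstanceKey::Memory => ":memory:".into(),
        };
        Arc::clone(
            self.instances
                .entry(key)
                .or_insert_with(|| Arc::new(Pool { path })),
        )
    }

    // Remove the cached entry and return it, so the caller can keep the
    // prior pool alive while in-flight queries drain. Absent key: no-op.
    fn invalidate_instance(&mut self, key: &DbInstanceKey) -> Option<Arc<Pool>> {
        self.instances.remove(key)
    }

    fn invalidate_file_instance(&mut self, path: &str) -> Option<Arc<Pool>> {
        self.invalidate_instance(&DbInstanceKey::File(path.to_string()))
    }
}

fn main() {
    let mut f = Factory::default();
    let key = DbInstanceKey::File("/data/accel.db".into());
    let p1 = f.get_or_create(key.clone());
    let p2 = f.get_or_create(key.clone());
    assert!(Arc::ptr_eq(&p1, &p2)); // idempotent reuse of the cached pool
    let evicted = f.invalidate_file_instance("/data/accel.db");
    assert!(evicted.is_some()); // caller holds the old pool to drain it
    assert!(f.invalidate_instance(&key).is_none()); // missing key: no-op
    let p3 = f.get_or_create(key);
    assert!(!Arc::ptr_eq(&p1, &p3)); // next build opens a fresh pool
    println!("ok");
}
```

Returning the evicted `Arc<Pool>` rather than dropping it is the key design choice: eviction only unlinks the pool from the cache, and the pool itself closes when the last in-flight query releases its reference.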
Force-pushed from 346381c to 96482d2.
phillipleblanc
added a commit
to spiceai/spiceai
that referenced
this pull request
Apr 30, 2026
…eshes Address Copilot review comments and bugs found via manual testing against an S3 snapshot bucket: - runtime-acceleration: download_to_file now uses temp-file + atomic rename so concurrent readers in refresh_mode: snapshot never observe a partially written file (download is also fsync'd before rename for crash safety). Adds an explicit AlreadyExists fallback for older Windows configurations where MoveFileExW does not honor the REPLACE_EXISTING flag. - runtime-acceleration: download_if_newer uses strict >, never regresses to an older remote snapshot id; warns when remote is older than local. - runtime: SnapshotRefreshState::current_snapshot_id is now Arc<Mutex<Option<u64>>> rather than AtomicU64+sentinel-0, so snapshot id 0 (a valid first-snapshot id) is no longer indistinguishable from "no snapshot loaded". - build_snapshot_refresh_state: bootstrap_loaded_id is passed through directly without sentinel collapsing. - swappable: doc comments now describe poison-recovery behavior accurately. - New test: download_if_newer rejects regression to older remote id. Bugs surfaced via manual S3 testing: - build_snapshot_refresh_state was constructing SnapshotManager without a checkpointer factory, so refresh-time download_latest_snapshot failed with "Dataset checkpointer factory not set". Now wires the same factory used by the bootstrap path. - refresh_from_snapshot validated snapshot schema against the federated schema. For snapshot mode the federated source is intentionally not consulted at refresh time and may diverge (e.g. empty stub source). Now validates against the SwappableTableProvider's cached schema. - Schema validation was using strict arrow Schema equality, which rejected logically-identical schemas that differ only in engine-attached metadata or nullability flags. Replaced with schemas_compatible() that compares field count, ordered names, and data types only. 
Pool cache invalidation:

- DuckDB and SQLite reload_from_snapshot now call the new upstream invalidate_file_instance / invalidate_instance APIs on the TableProviderFactory before rebuilding. Without this eviction the newly-built provider would reuse the prior pool's open connections (which keep observing the previously-opened file inode) and queries would return stale data even after the file is replaced on disk.
- Cargo.toml bumped to consume the upstream PR (datafusion-contrib/datafusion-table-providers#635).

Error context preservation:

- build_snapshot_refresh_state no longer swallows FilePathError from get_acceleration_layout; a new SnapshotRefreshModeLayoutUnavailable variant carries the original error so snapshot-mode config issues surface with their root cause (which engine, which path).

Original architecture: A new refresh mode that, when paired with snapshots, drives accelerator data exclusively from snapshots in object storage. The federated source is never queried for refreshes; instead the runtime polls the snapshot store on a configurable cadence and reloads the accelerator only when a strictly newer snapshot is available.

- spicepod / runtime: new RefreshMode::Snapshot enum variant.
- runtime-acceleration: SnapshotDownloadInfo carries snapshot_id; new remote_current_snapshot_id() and download_if_newer().
- DataAccelerator trait: new supports_snapshot_reload() and reload_from_snapshot(source, previous_provider, factory). Implemented for DuckDB, SQLite, Cayenne, Turso. Engines that opt out (e.g. TableModePartitionedDuckDB) are explicitly rejected at config validation with a clear error.
- New SwappableTableProvider wrapper for atomic provider swap.
- AcceleratedTable / Refresher / RefreshTaskRunner / RefreshTask carry optional SnapshotRefreshState. RefreshTask::refresh_from_snapshot() short-circuits the federated fetch path.
- create_accelerated_table validates configuration and auto-applies a default refresh_check_interval (1m) for snapshot mode.
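The eviction semantics that the invalidate_file_instance / invalidate_instance calls rely on can be illustrated with a minimal, self-contained registry. This is a sketch only: the real factories key pools by DbInstanceKey and hold actual connection pools, whereas `PoolRegistry` and its `Arc<String>` stand-ins below are invented for illustration.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Simplified stand-in for a file-path-keyed pool cache.
#[derive(Default)]
struct PoolRegistry {
    // Arc<String> stands in for a shared connection pool.
    pools: Mutex<HashMap<String, Arc<String>>>,
}

impl PoolRegistry {
    /// Return the cached pool for `path`, creating one on first use.
    fn get_or_create(&self, path: &str) -> Arc<String> {
        self.pools
            .lock()
            .unwrap()
            .entry(path.to_string())
            .or_insert_with(|| Arc::new(format!("pool for {path}")))
            .clone()
    }

    /// Evict the entry for `path`, returning the evicted pool if present.
    /// The next `get_or_create` call opens a fresh pool.
    fn invalidate_file_instance(&self, path: &str) -> Option<Arc<String>> {
        self.pools.lock().unwrap().remove(path)
    }
}
```

Eviction does not close in-flight users of the old pool (they still hold their own `Arc`); it only guarantees the next provider build starts fresh.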
Manual end-to-end test against an S3 snapshot bucket confirms the full flow: writer creates snapshots, reader polls, detects newer ids, downloads atomically, evicts the cached pool, rebuilds the provider, swaps under SwappableTableProvider, and serves the new data on subsequent queries.
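The inode-pinning behavior that motivates the pool eviction above can be demonstrated with the standard library on a POSIX filesystem (an illustrative sketch, not runtime code; file names are made up):

```rust
use std::fs::{self, File};
use std::io::Read;
use std::path::Path;

/// A handle opened before the file is replaced keeps reading the
/// original inode, not the replacement, which is exactly why cached
/// pool connections serve stale data after a snapshot restore.
fn read_via_stale_handle(dir: &Path) -> (String, String) {
    let db = dir.join("acc.db");
    fs::write(&db, "old-snapshot").unwrap();

    // A "cached pool" connection: opened against the original inode.
    let mut stale = File::open(&db).unwrap();

    // Out-of-band replacement, as a snapshot restore would do.
    let tmp = dir.join("acc.db.tmp");
    fs::write(&tmp, "new-snapshot").unwrap();
    fs::rename(&tmp, &db).unwrap();

    // The old handle still reads the replaced (now-unlinked) inode...
    let mut via_stale = String::new();
    stale.read_to_string(&mut via_stale).unwrap();

    // ...while a fresh open (what pool eviction forces) sees the new file.
    let via_fresh = fs::read_to_string(&db).unwrap();
    (via_stale, via_fresh)
}
```

On Windows the failure mode differs in detail, but the conclusion is the same: only a freshly opened handle is guaranteed to observe the replacement.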
phillipleblanc
added a commit
to spiceai/spiceai
that referenced
this pull request
May 2, 2026
sgrebnov
approved these changes
May 3, 2026
pull Bot
pushed a commit
to TheRakeshPurohit/spiceai
that referenced
this pull request
May 4, 2026
…tore slice (stacked) (spiceai#10651)

* fix(snapshot): flush SQLite/Turso WAL before snapshotting

SQLite and Turso accelerators run with WAL journal mode. The pre-existing snapshot path performed `fs::copy(live_db, temp_copy)` directly, which captured only the durable pages — every uncheckpointed write (typically all writes since the last automatic checkpoint) lived only in the `-wal` sidecar and was silently lost in the upload. This bug was invisible under `refresh_mode: full|append|changes` because the federated source repopulates on bootstrap. It surfaces as data loss with `refresh_mode: snapshot` (separate change), where the snapshot is the only data source.

Fixes by:

1. Adding `SnapshotEngine::checkpoint_live` (default no-op) — invoked by `SnapshotManager::create_file_snapshot` *before* `fs::copy` while the accelerator write lock is held.
2. New `SqliteSnapshotEngine` overrides `checkpoint_live` to run `PRAGMA wal_checkpoint(TRUNCATE)` against the live file via a short-lived rusqlite connection. WAL mode allows multiple connections; the existing accelerator write lock guarantees no concurrent writers race the checkpoint.
3. `SqliteSnapshotEngine::prepare_for_upload` switches the *copied* file to `journal_mode=DELETE` as defense-in-depth: even if the copy ever ended up with stale `-wal`/`-shm` sidecars next to it, the uploaded file is guaranteed to be self-contained.
4. New `TursoSnapshotEngine` reuses the SqliteSnapshotEngine logic (libsql is on-disk-compatible with SQLite). The `turso` feature now pulls in rusqlite for this purpose.
5. `create_snapshot_engine` now returns the engine-specific impls for `AccelerationEngine::Sqlite` and `::Turso` instead of `DefaultSnapshotEngine`.

Closes spiceai#10643

* feat(acceleration): add refresh_mode: snapshot for snapshot-only refreshes

Adds a new refresh mode that loads accelerated tables only by polling the snapshot store for newer snapshots, never querying the federated source.
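The core of that polling contract — reload only when the remote snapshot id is strictly newer than the locally loaded one — can be sketched as follows. The function name and the `remote_ids` slice are illustrative; `Option<u64>` mirrors the "no snapshot loaded yet" state described earlier, so snapshot id 0 remains a valid first id.

```rust
/// Run the strictly-newer decision over a sequence of poll results.
/// Returns the final loaded id and how many reloads were performed.
fn run_poll_cycles(mut local: Option<u64>, remote_ids: &[u64]) -> (Option<u64>, u32) {
    let mut reloads = 0;
    for &remote in remote_ids {
        let strictly_newer = match local {
            None => true,              // bootstrap: take whatever is remote
            Some(cur) => remote > cur, // strict >: never regress to an older id
        };
        if strictly_newer {
            // here the runtime would download atomically, evict the cached
            // pool, rebuild the provider, and swap it in
            local = Some(remote);
            reloads += 1;
        }
    }
    (local, reloads)
}
```

Equal or older remote ids are ignored, so a lagging replica of the snapshot store can never roll an accelerator backwards.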
Use cases: read replicas, expensive/slow sources, air-gapped propagation.

Behavior:

* `RefreshMode::Snapshot` is a new variant accepted in spicepod `acceleration.refresh_mode` (and is rejected by the validator unless `acceleration.snapshots` is enabled).
* On startup, the dataset bootstraps from the snapshot store as today.
* The refresh task polls the snapshot store on a configurable interval (default 60 s) via the new `SnapshotManager::download_if_newer(current_local_id, validate_schema)` API. The schema validator runs against the snapshot's `metadata.json` *before* download so a mismatched snapshot can never overwrite the primary file.
* Per-engine reload is performed via a new `Accelerator::reload_from_snapshot` trait method. DuckDB/SQLite/Turso impls evict pool connections and reopen at the same primary path; the live federated query path serves the prior inode until the swap completes via `SwappableTableProvider` (an RwLock-backed `Arc<dyn TableProvider>` that caches schema/constraints/table_type).
* `INSERT INTO <dataset>` is rejected with a clear error when the accelerator's refresh_mode is Snapshot.
* Reload + swap is serialized against concurrent accelerator writes via the existing accelerator write mutex.
* Atomic file write on download (temp + fsync + rename + parent fsync).

Integration tests:

* `snapshot_refresh::duckdb` — bootstrap → writer change → reader picks up new snapshot → swap → query reflects change.
* `snapshot_refresh::sqlite` — same shape; passes thanks to the SQLite WAL flush in the parent commit.
* `snapshot_refresh::turso` — same shape; same WAL flush dependency.

CI: new `Snapshot Refresh Mode Integration Tests` job in integration.yml runs DuckDB/SQLite/Turso variants against MinIO.

Cayenne is intentionally not covered in this commit: portable Cayenne snapshots require a per-dataset metastore export/import refactor (addressed in the next commit). The snapshot_refresh test directory omits Cayenne until then.
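The RwLock-backed swap described for SwappableTableProvider can be sketched with a generic wrapper. This is a minimal sketch, not the real provider (which also caches schema/constraints/table_type and handles lock poisoning); the type and method names below are illustrative.

```rust
use std::sync::{Arc, RwLock};

/// Queries clone the current Arc under a read lock and keep using that
/// provider even if a reload swaps in a replacement mid-flight.
struct Swappable<T> {
    inner: RwLock<Arc<T>>,
}

impl<T> Swappable<T> {
    fn new(provider: Arc<T>) -> Self {
        Self { inner: RwLock::new(provider) }
    }

    /// Cheap snapshot of the current provider for a query to hold.
    fn current(&self) -> Arc<T> {
        self.inner.read().unwrap().clone()
    }

    /// Atomically install a rebuilt provider, returning the old one.
    fn swap(&self, next: Arc<T>) -> Arc<T> {
        std::mem::replace(&mut *self.inner.write().unwrap(), next)
    }
}
```

Because readers only hold the lock long enough to clone the `Arc`, a swap never blocks behind a long-running query, and in-flight queries finish against the provider (and inode) they started with.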
Depends on the SQLite/Turso WAL flush from the previous commit; without it the SQLite/Turso integration tests cannot pass. Pulls in the upstream pool-eviction APIs from datafusion-contrib/datafusion-table-providers#635 (`invalidate_instance`, `invalidate_file_instance`).

* feat(cayenne): add per-dataset metastore export/import (snapshot foundation)

Adds the foundation for portable, per-dataset Cayenne metastore snapshots: a versioned-JSON 'slice' format that captures only one dataset's rows from cayenne_table and its dependent tables, with path columns rewritten to be relative to a configurable anchor.

Why: The legacy Cayenne snapshot format archived the entire cayenne.db SQLite file. That approach forced a one-dataset-per-metadata-directory limit (see validate_cayenne_snapshot_consistency) because two snapshots would clobber each other's metastore rows on extract, and produced snapshots that were not portable across nodes whose data directories did not match the writer's absolute paths (spiceai#10642). It also raced with the reader process's eager metastore initialization (spiceai#10649): SQLite journal sidecars created during init triggered checksum mismatches against the archive on extract.

This commit lays the groundwork to fix all three issues by introducing:

* `crates/cayenne/src/metastore/snapshot.rs` — the slice format and export/import logic:
  - `DatasetMetastoreSlice` (versioned JSON, format_version: 1) with one entry per metastore table from EXPECTED_TABLES.
  - `SliceValue` mirrors MetastoreValue but JSON-friendly (blobs base64-encoded under the 'x' tag).
  - `export_dataset(metastore, dataset_name, anchor)` selects every row that belongs to the dataset (the cayenne_table row plus all rows in dependent tables matching the same table_id) and emits a slice. Path columns (cayenne_table.path, cayenne_delete_file.path, cayenne_partition.path) are rewritten relative to `anchor` so the slice contains no absolute filesystem paths.
  - `import_dataset(metastore, slice, anchor)` runs inside a single transaction: deletes any existing rows for the same dataset_name (FK ON DELETE CASCADE removes all dependent rows), then inserts the slice's rows with paths rewritten back to absolute form anchored at `anchor`. Unsupported format_version or engine mismatch are rejected up front.
* `MetastoreRow::get_value(index) -> MetastoreValue` — a new trait method that returns the raw column value without type coercion, so generic export logic can serialize columns without knowing each column's expected type at compile time. SQLite and Turso impls add a one-line implementation that clones from their existing values Vec.

Tests in `crates/cayenne/src/metastore/snapshot.rs::tests`:

* `round_trip_preserves_rows_and_relocates_paths` — exports a dataset from one metastore, imports into a fresh metastore at a different anchor, verifies all partitions resolve to the new anchor.
* `import_replaces_prior_dataset_rows_wholesale` — confirms the DuckDB-style snapshot-refresh semantic that import wipes any prior rows for the same dataset_name before inserting (no leftover partitions).
* `import_leaves_other_datasets_untouched` — confirms slicing is correctly scoped: importing dataset A does not affect dataset B's rows in the same shared metastore.
* `rejects_unsupported_format_version` — both wrong format_version and wrong engine identifier are rejected with clear error messages *before* any DML runs.
* `json_round_trip` — slice serializes and parses back to an equivalent slice.

Follow-up (separate commit / PR): wire `CayenneSnapshotEngine` into `SnapshotManager`'s directory create/extract paths so the new format is actually used by snapshot upload/download. With that follow-up:

* the cayenne.db file leaves the archive (replaces spiceai#10649),
* paths become portable (closes spiceai#10642),
* the single-dataset-per-metadata-dir validation can be lifted,
* Cayenne refresh_mode: snapshot becomes a passing integration test.
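The anchor-relative path rewriting at the heart of the slice format can be sketched with `std::path`. The function names are illustrative; the real export/import applies this per path column inside the transaction described above.

```rust
use std::path::{Path, PathBuf};

/// Export: store a path relative to the writer's anchor so the slice
/// carries no absolute filesystem paths. Returns None for paths outside
/// the anchor (which a real exporter would treat as an error).
fn relativize(abs: &Path, anchor: &Path) -> Option<PathBuf> {
    abs.strip_prefix(anchor).ok().map(PathBuf::from)
}

/// Import: re-anchor the stored relative path under the reader's
/// (possibly different) data directory.
fn absolutize(rel: &Path, anchor: &Path) -> PathBuf {
    anchor.join(rel)
}
```

This is what makes a slice portable: the writer's and reader's data directories never have to match, only their layouts beneath the anchor.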
* feat(cayenne): wire CayenneSnapshotEngine into snapshot pipeline

This is Commit 4 of the refresh_mode: snapshot stack. Commit 3 added the metastore export/import API; this commit wires it into the SnapshotManager upload/download pipeline so Cayenne snapshots actually use the per-dataset slice format instead of shipping raw cayenne.db.

Closes:
- spiceai#10642 (Cayenne snapshots not portable across data dirs)
- spiceai#10649 (Cayenne metastore-init races with checksum)

Architecture:

* Trait extension in runtime-acceleration::snapshot::engine: Adds two new SnapshotEngine methods, each with a no-op default so DuckDB/SQLite/Turso continue to behave exactly as before:
  - prepare_directory_snapshot(dirs, dataset_name) -> DirectorySnapshotPlan — returns (a) a list of file paths *relative to each source dir* that should be excluded from the tar, and (b) extra in-memory entries to append at the end of the tar.
  - finalize_directory_snapshot(dirs, dataset_name, extras) — an engine-specific post-extract hook.
  Plus a new SnapshotEngineError::Custom { message } variant and a SnapshotEngineError::from_display() constructor so engines defined outside runtime-acceleration (CayenneSnapshotEngine in the runtime crate) can surface their own rich errors without needing a feature-gated variant in the trait crate.
* directory_archive: new archive_directories_with_plan() that takes skip_relative_paths + extras. The original archive_directories() is preserved as a thin wrapper. The new add_directory_to_archive_filtered helper handles the recursive walk while honoring the skip set. The module remains engine-agnostic — the skip predicate is just a HashSet passed in.
* SnapshotManager:
  - create_directory_snapshot now calls prepare_directory_snapshot first, then archive_directories_with_plan with the engine's skip list and extras.
  - download_to_directories now calls finalize_directory_snapshot after extract.
  - New with_snapshot_engine(self, Arc<dyn SnapshotEngine>) builder so accelerators can inject a custom engine.
* download_snapshot_if_needed and snapshot_before_recreate gain an Option<Arc<dyn SnapshotEngine>> parameter. DuckDB/SQLite/Turso pass None (default behavior). Cayenne builds and passes a CayenneSnapshotEngine.
* Cayenne accelerator (crates/runtime/src/dataaccelerator/cayenne):
  - New module snapshot_engine.rs implements CayenneSnapshotEngine. Holds an Arc<dyn MetadataCatalog>, dataset_name, and data_dir_anchor. prepare_directory_snapshot exports the per-dataset slice via MetadataCatalog::export_dataset_slice and instructs the archiver to skip cayenne.db / -wal / -shm. finalize_directory_snapshot reads the slice JSON back from the extracted directory and calls MetadataCatalog::import_dataset_slice.
  - The cayenne accelerator's bootstrap path constructs a CayenneSnapshotEngine using its existing get_or_create_catalog machinery and threads it through download_snapshot_if_needed.
* cayenne::MetadataCatalog: new export_dataset_slice and import_dataset_slice trait methods (default impls return InvalidOperationNoSource); CayenneCatalog overrides both with dispatch to the underlying SQLite or libsql metastore.
* Lifted validate_cayenne_snapshot_consistency's SharedAcceleration restriction. With per-dataset slices, multiple Cayenne datasets sharing a metastore directory can each ship their own snapshot without clobbering each other's metastore rows on extract. The InconsistentSnapshotSettings (mixed enabled/disabled) check stays.
* Re-enabled the Cayenne snapshot_refresh integration test (snapshot_refresh::cayenne::snapshot_refresh_cayenne_bootstrap_then_refresh).
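The skip-list + extras plan that prepare_directory_snapshot hands to the archiver can be sketched as a pure function over relative paths. This is illustrative only: the function name is invented, and the real archiver walks the filesystem recursively and writes tar entries rather than returning names.

```rust
use std::collections::HashSet;

/// Compute the tar entry list for a directory snapshot: drop paths on
/// the engine's skip list, then append the engine's extra in-memory
/// entries (e.g. the exported metastore slice) at the end.
fn plan_archive_entries(
    files: Vec<String>,
    skip_relative_paths: &HashSet<String>,
    extras: &[(String, Vec<u8>)],
) -> Vec<String> {
    let mut entries: Vec<String> = files
        .into_iter()
        .filter(|f| !skip_relative_paths.contains(f))
        .collect();
    entries.extend(extras.iter().map(|(name, _)| name.clone()));
    entries
}
```

For Cayenne, the skip set holds cayenne.db and its `-wal`/`-shm` sidecars while the extras carry the per-dataset slice JSON, so the archive never ships the shared metastore file.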
Tests:

- 3 new unit tests in cayenne::snapshot_engine::tests:
  * create_directory_snapshot_skips_cayenne_db_and_emits_slice
  * refuses_mismatched_dataset
  * finalize_missing_slice_returns_clear_error
- validate_snapshots tests updated; the test_cayenne_shared_acceleration_with_snapshots_errors test is renamed to ..._now_supported and asserts Ok.
- Existing 91 runtime-acceleration::snapshot lib tests still pass.
- Cayenne integration test runs against real S3 in CI (same harness as the DuckDB/SQLite/Turso refresh-mode tests).

* fix(snapshot): bootstrap dataset checkpoint from snapshot metadata

When a downloaded snapshot doesn't carry a populated _dataset_checkpoint row (the steady state for Cayenne, where the on-disk archive ships the per-dataset metastore slice JSON instead of the raw cayenne.db), the post-extract Checkpointer::get_schema() returns None and bootstrap was failing with MissingSchema. The metadata.json carries the dataset's verified schema; after import we now materialize that schema into the local checkpoint via Checkpointer::checkpoint(metadata_schema, None) so the spice_sys side matches the snapshot.

For DuckDB / SQLite / Turso this branch is unreachable in steady state (their archives ship _dataset_checkpoint already); on a corrupted snapshot the self-heal is harmless because the metadata schema is the same one we'd otherwise have validated against.

This unblocks refresh_mode: snapshot for Cayenne and re-enables the snapshot_refresh::cayenne integration test by default. Closes spiceai#10658.
Summary
`DuckDBTableProviderFactory` and `SqliteTableProviderFactory` cache pool instances by `DbInstanceKey` (file path or memory). There is currently no way for a caller to evict an entry from that registry, so a caller that legitimately needs to swap the underlying database file out-of-band — for example, after restoring it from a snapshot in object storage — has no way to force the next provider build to open a fresh pool. Subsequent providers reuse the prior pool's already-open connections, which keep observing the previously-opened file inode rather than the replacement on disk.

This PR adds a small, targeted invalidation API on both factories:
- `DuckDBTableProviderFactory::invalidate_instance(&DbInstanceKey) -> Option<DuckDbConnectionPool>`
- `DuckDBTableProviderFactory::invalidate_file_instance(impl Into<Arc<str>>) -> Option<DuckDbConnectionPool>`
- `SqliteTableProviderFactory::invalidate_instance(&DbInstanceKey) -> Option<SqliteConnectionPool>`
- `SqliteTableProviderFactory::invalidate_file_instance(impl Into<Arc<str>>) -> Option<SqliteConnectionPool>`

Semantics: each call removes the cached entry for the key and returns the evicted pool if one was present (otherwise `None`); the next provider build for that key opens a fresh pool.

Motivation
Used by spiceai/spiceai#10565 (`refresh_mode: snapshot`) to allow accelerator providers to be cleanly reloaded after a snapshot is restored to the local file. Manual end-to-end testing without this API confirmed that the swap path produces a fresh provider, but queries continue to read stale data from the cached connection pool.

Tests
- `duckdb::tests::invalidate_instance_drops_cached_pool` — insertion, idempotent reuse, eviction, no-op on a missing key, repopulation.
- `sqlite::tests::invalidate_instance_drops_cached_pool` — same coverage, including the file-path convenience wrapper.

Both pass: `cargo test --lib --features duckdb,sqlite,duckdb-federation invalidate_instance`. Clippy clean.

Note on backporting
Branched from the `spiceai-52` rev currently consumed by Spice (d7fae5f0). Targeting `main` here as the canonical upstream landing; will request a backport to `spiceai-52` once accepted.