Fix worktree index seeding by Ismael · Pull Request #170 · ory/lumen

Ismael · 2026-05-30T14:35:31Z

Working in a fresh git worktree of an already-indexed repo re-indexed everything from scratch instead of reusing the parent's embeddings. While fixing that, a cross-process race in the seeding path also surfaced. This PR addresses both.

Fixes #169

1. lumen index never copied a sibling worktree's index

Seeding an existing sibling-worktree index into a fresh worktree only happened in the MCP server's getOrCreate. But lumen index created the DB and re-indexed every file from scratch, never consulting FindDonorIndex/SeedFromDonor. Since that background process almost always wins the race to create the DB, getOrCreate's own seeding was then skipped (the DB already existed), so a worktree whose parent was fully indexed still paid the entire embedding cost again.

Fix: runIndexer now seeds from a donor when the DB doesn't yet exist — under the index lock it already holds, so only one indexer seeds — then lets the normal incremental EnsureFresh re-index only the changed files.

2. MCP seeding raced the background indexer

getOrCreate seeded and created a new index DB without holding the index flock that lumen index acquires. So the MCP server and the SessionStart-spawned background indexer could create the same fresh worktree DB concurrently: both run SeedFromDonor through a shared temp path and rename it over the DB (corrupting the SQLite file), or one creates an empty DB that makes the other skip seeding and re-index from scratch. The window is narrow — only the first index of a worktree — but the SessionStart directive that tells the agent to search first makes it reachable, and fix #1 (making lumen index seed too) widened the seed-vs-seed collision.

Fix: getOrCreate now acquires the same flock before seeding. When it wins, it seeds under the lock; when a peer holds it, it waits briefly (bounded by createWaitTimeout) for the peer to publish the DB so NewIndexer opens the seeded copy instead of creating an empty one that clobbers the seed.

Testing

TestSeedFromDonorIfNew_SeedsWorktreeFromSibling — a fresh worktree with an already-indexed sibling gets seeded rather than rebuilt.
TestGetOrCreate_SkipsSeedWhenIndexLockHeld — with the index lock held (simulating the background indexer), getOrCreate does not run SeedFromDonor.

Summary by CodeRabbit

Release Notes

New Features
- Index databases can now be seeded from existing sibling worktrees to accelerate initialization.
- Added cross-process coordination to prevent database corruption during concurrent index creation.
Tests
- Added tests for worktree index seeding behavior.
- Added test to verify race condition prevention during simultaneous index creation.

The lumen index command (the background indexer spawned at SessionStart) created the DB and re-indexed every file from scratch instead of copying an existing sibling-worktree index — only the MCP getOrCreate seeded, and it lost the race to create the DB. runIndexer now seeds from a donor, under the index lock it already holds, before indexing. Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>

getOrCreate seeded and created a new DB without the index flock that lumen index holds, so both could run SeedFromDonor against the same fresh worktree DB concurrently and corrupt it. getOrCreate now takes the same lock to seed; when a peer holds it, it waits briefly for the peer to publish the DB instead of creating an empty one that clobbers the seed. Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>

CLAassistant · 2026-05-30T14:35:39Z

All committers have signed the CLA.

coderabbitai · 2026-05-30T14:35:42Z

📝 Walkthrough

Walkthrough

The PR adds cross-process coordination for creating index databases in sibling worktrees, preventing race conditions between main and background indexer processes. Background indexer seeds from a donor worktree under lock; main process acquires the same lock to seed or defers by waiting for peer.

Changes

Cross-process Seeding Coordination

Layer / File(s)	Summary
Donor-based seeding and background indexer integration `cmd/seed.go`, `cmd/index.go`, `cmd/seed_test.go`	New `seedFromDonorIfNew` helper detects missing index databases and seeds them from a donor worktree via `config.FindDonorIndex` and `index.SeedFromDonor`, with warnings on failure. Background indexer in `runIndexer` invokes this helper under the index lock before DB initialization. Test validates seeding of sibling worktrees and metadata propagation from donor.
Main process lock-based coordination with deconfliction `cmd/stdio.go`, `cmd/stdio_seedrace_test.go`	Lock-based coordination refactor in `getOrCreate`: lock owner seeds from donor (via extracted `seedFromDonor` helper), non-owner waits for peer via new `waitForDB` polling (bounded by configurable timeout). New `defaultCreateWaitTimeout`, `createWaitPollInterval` constants and `getCreateWaitTimeout()` accessor control deconfliction timeouts. Test verifies seeding is skipped when index lock is held by peer, preventing concurrent SQLite operations.

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Linked Issues check	⚠️ Warning	The PR addresses two focused fixes (seeding under index lock in runIndexer and getOrCreate), but the linked issue `#1` requests major refactoring of high-complexity functions across multiple files to reduce cyclomatic complexity, which is not implemented in these changes.	The PR should include the extraction of helper functions from high-complexity functions in cmd/stdio.go, cmd/index.go, internal/index/split.go, and internal/chunker/goast.go as specified in issue `#1` to satisfy the linked requirements.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title "Fix worktree index seeding" directly summarizes the main change - addressing two critical issues in how worktree index databases are seeded from donor worktrees, which is the primary focus of this PR.
Out of Scope Changes check	✅ Passed	All changes in this PR are narrowly scoped to fixing worktree index seeding issues via new seeding functions and cross-process coordination logic, which aligns with the two specific problems described in the PR objectives.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

🧹 Nitpick comments (1)

cmd/stdio.go (1)
536-542: ⚡ Quick win

Distinguish "lock held by peer" from a TryAcquire error.

The default branch fires both when the peer holds the lock (lockErr == nil && lk == nil) and when TryAcquire actually errored (lockErr != nil). In the error case the code silently waits the full createWaitTimeout for a DB that may never be published, then creates an empty one — and the underlying lock error is dropped. At minimum log lockErr so flock failures are diagnosable.
♻️ Log the lock-acquire error before falling back to waiting
 		default:
+			if lockErr != nil {
+				ic.logger().Warn("acquire index lock for seeding failed; waiting for peer DB",
+					"db_path", dbPath, "error", lockErr)
+			}
 			// A background indexer holds the lock and is creating + seeding the
 			// DB. Wait briefly for it to publish the file so NewIndexer opens
 			// the seeded copy instead of creating an empty DB that would clobber
 			// the seed.
 			ic.waitForDB(dbPath)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/stdio.go` around lines 536 - 542, The default branch conflates "lock held
by peer" and genuine TryAcquire failures; before calling ic.waitForDB(dbPath),
detect if lockErr != nil and log the error (including dbPath and lockErr) using
the component's logger so TryAcquire failures are visible; keep the existing
fallback wait behavior (ic.waitForDB) for the peer-held case but ensure
TryAcquire errors are not silently dropped by emitting a clear log entry
referencing TryAcquire, lockErr, dbPath, and the index controller (ic) context.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@cmd/stdio.go`:
- Around line 536-542: The default branch conflates "lock held by peer" and
genuine TryAcquire failures; before calling ic.waitForDB(dbPath), detect if
lockErr != nil and log the error (including dbPath and lockErr) using the
component's logger so TryAcquire failures are visible; keep the existing
fallback wait behavior (ic.waitForDB) for the peer-held case but ensure
TryAcquire errors are not silently dropped by emitting a clear log entry
referencing TryAcquire, lockErr, dbPath, and the index controller (ic) context.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 741905fc-0864-4ecb-ac6d-cff7d6747063

📥 Commits

Reviewing files that changed from the base of the PR and between d0dee0e and 3e258e6.

📒 Files selected for processing (5)

cmd/index.go
cmd/seed.go
cmd/seed_test.go
cmd/stdio.go
cmd/stdio_seedrace_test.go

Ismael · 2026-05-30T15:07:54Z

@aeneasr Do you want me to address the refactoring coderabbitai pointed out? Seems too broad a change for this PR.

Ismael and others added 2 commits May 30, 2026 11:27

coderabbitai Bot reviewed May 30, 2026

View reviewed changes

Ismael mentioned this pull request May 30, 2026

Feat add status cli command #171

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix worktree index seeding#170

Fix worktree index seeding#170
Ismael wants to merge 2 commits into
ory:mainfrom
Ismael:fix-worktree

Ismael commented May 30, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

CLAassistant commented May 30, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented May 30, 2026 •

edited

Loading

Walkthrough

Changes

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Ismael commented May 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Ismael commented May 30, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

1. lumen index never copied a sibling worktree's index

2. MCP seeding raced the background indexer

Testing

Summary by CodeRabbit

Release Notes

Uh oh!

CLAassistant commented May 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot commented May 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Ismael commented May 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Ismael commented May 30, 2026 •

edited by coderabbitai Bot

Loading

CLAassistant commented May 30, 2026 •

edited

Loading

coderabbitai Bot commented May 30, 2026 •

edited

Loading