fix(runners): critical correctness bugs in runner registration flow#3916
Draft
cursor[bot] wants to merge 3 commits into
Draft
fix(runners): critical correctness bugs in runner registration flow#3916cursor[bot] wants to merge 3 commits into
cursor[bot] wants to merge 3 commits into
Conversation
ResetRunnerRegistration in the Bolt store cleared the auth token but left Active=true, unlike the SQL implementation. The task scheduler kept assigning work to a runner that could no longer authenticate, leaving tasks stuck in starting status indefinitely. Co-authored-by: Denis Gukov <fiftin@outlook.com>
The unregister client omitted the X-Runner-Token header required by RunnerMiddleware, so every DELETE /api/internal/runners request returned 401 and the runner was never removed from the server. Co-authored-by: Denis Gukov <fiftin@outlook.com>
RegisterRunner used a read-then-update pattern without guarding the UPDATE, so concurrent registration requests with the same one-time token could both succeed and leave one registrant with an invalid auth token. Co-authored-by: Denis Gukov <fiftin@outlook.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Automated bug scan of recent runner registration commits (
b5dc8c7b) found three critical correctness issues. This PR applies minimal, targeted fixes with tests.Bug 1: Bolt runner stays active after registration reset
Impact: Tasks stuck in
startingstatus indefinitely on Bolt/SQLite deployments.Root cause:
ResetRunnerRegistrationindb/bolt/global_runner.gocleared the auth token but did not setActive=false, unlike the SQL implementation. The task scheduler continued assigning work to a runner that could no longer authenticate.Fix: Set
runner.Active = falsein the Bolt reset path. AddedTest_ResetRunnerRegistration_DeactivatesRunnerand enabled the previously commented Active assertion inrunner_svc_test.go.Trigger: Admin regenerates a registration token for an already-registered, active runner while using Bolt storage.
Bug 2: Runner unregister always fails with 401
Impact:
semaphore runner unregisternever removes the runner from the server; local token file is not cleaned up on auth failure.Root cause:
JobPool.Unregister()sentDELETE /api/internal/runnerswithout theX-Runner-Tokenheader required byRunnerMiddleware.Fix: Add
req.Header.Set("X-Runner-Token", util.Config.Runner.Token)before the DELETE request.Bug 3: SQL one-time registration token race
Impact: Concurrent registration with the same one-time token could leave one registrant with an invalid auth token.
Root cause:
RegisterRunnerused SELECT → validate → UPDATE without an atomic consume guard on the registration token column.Fix: Include
registration_token=?in the UPDATE WHERE clause and fail if zero rows are affected.Validation
go test ./db/bolt/... ./services/server/...— all pass, including newTest_ResetRunnerRegistration_DeactivatesRunner