Skip to content

fix(state/txclient): close estimator conn on failure and wait for Ready#5073

Open
neyy91 wants to merge 1 commit into
celestiaorg:mainfrom
neyy91:fix/txclient-estimator-conn-leak
Open

fix(state/txclient): close estimator conn on failure and wait for Ready#5073
neyy91 wants to merge 1 commit into
celestiaorg:mainfrom
neyy91:fix/txclient-estimator-conn-leak

Conversation

@neyy91

@neyy91 neyy91 commented Jun 21, 2026

Copy link
Copy Markdown
Contributor

Overview

Fixes two related defects in setupEstimatorConnection:

  1. gRPC connection leak - the conn was created and Connect()-ed but returned without Close() on the failure path. Since setupClient retries on each submit while the client is uninitialized, an unreachable estimator leaked one conn (resolver/balancer goroutines + transport) per attempt.

  2. Ready was never awaited - WaitForStateChange(ctx, Ready) waits for transition away from the given state. A fresh conn is in Idle/Connecting, so the call returned immediately and the function handed back a non-connected conn with a nil error, ignoring the deadline.

Changes

  • Close the conn on the failure path.
  • Add waitForReady, which loops over the current state until Ready or the context is done.
  • Tests: happy path, error on unreachable endpoint (honors deadline), and a goroutine-count guard against the conn leak.

Closes #5072

Testing

  • go test ./state/txclient/ -race - green.
  • Verified the new tests fail on the pre-fix logic (they catch the bug).

setupEstimatorConnection created a *grpc.ClientConn, called Connect, then
returned an error without closing it when the endpoint was unreachable.
setupClient retries on each submit while the client is uninitialized, so an
unreachable estimator leaked one conn (resolver/balancer goroutines and
transport) per attempt.

Additionally, WaitForStateChange(ctx, Ready) waits for the connection to
transition away from the given state, not into it. A fresh conn starts in
Idle/Connecting, so the call returned immediately and never actually waited
for Ready, returning a non-connected conn with a nil error and ignoring the
context deadline.

Close the conn on the failure path and add waitForReady, which loops over the
current state until Ready or the context is done.
@neyy91 neyy91 requested a review from a team as a code owner June 21, 2026 03:04
@neyy91 neyy91 requested a review from evan-forbes June 21, 2026 03:04
@github-actions github-actions Bot added the external Issues created by non node team members label Jun 21, 2026
@codecov-commenter

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 36.62%. Comparing base (2469e7a) to head (55b8db1).
⚠️ Report is 826 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5073      +/-   ##
==========================================
- Coverage   44.83%   36.62%   -8.21%     
==========================================
  Files         265      311      +46     
  Lines       14620    21506    +6886     
==========================================
+ Hits         6555     7877    +1322     
- Misses       7313    12622    +5309     
- Partials      752     1007     +255     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

external Issues created by non node team members

Projects

None yet

Development

Successfully merging this pull request may close these issues.

state/txclient: estimator gRPC connection leaks on failure and Ready is never awaited

2 participants