Skip to content

Recognize mcp-go authorization-required sentinels as auth#5225

Open
lorr1 wants to merge 3 commits intomainfrom
auth-health-check
Open

Recognize mcp-go authorization-required sentinels as auth#5225
lorr1 wants to merge 3 commits intomainfrom
auth-health-check

Conversation

@lorr1
Copy link
Copy Markdown
Contributor

@lorr1 lorr1 commented May 7, 2026

Closes #5223

Summary

The vMCP gateway emits a backend remains unhealthy WARN every 30s for
upstreamInject backends that are functionally fine. North hit 2780+
consecutive "failures" on github-copilot-entry — pure log noise, since
the backend is reachable and only fails because the background probe
carries no user credentials.

Root cause: mcp-go v0.49.0 added two sentinels for 401 responses
(transport.ErrAuthorizationRequired and
transport.ErrOAuthAuthorizationRequired). wrapBackendError did not
check for either, so the error fell through to ErrBackendUnavailable
and the health monitor categorized the probe as BackendUnhealthy. The
auth-aware fix from #4935 ("treat 401/403 from auth-configured backends
as healthy") therefore never engaged.

  • Add typed errors.Is checks for both new mcp-go sentinels in
    wrapBackendError so production probes through BackendClient are
    correctly mapped to ErrAuthenticationFailed.
  • Add "authorization required" to IsAuthenticationError as a
    fallback for paths where the typed sentinel is not preserved through
    the error chain (covers the integration-test path where a mock
    bypasses wrapBackendError, and any future Unwrap breakage).

With both in place, North’s upstream_inject backend reports
BackendHealthy (probe gets a nil error), the consecutive_failures
counter stays at 0, and the WARN spam stops. Misconfigured backends
(no outgoing auth strategy) continue to surface as
BackendUnauthenticated so operators see the diagnostic signal.

Type of change

  • Bug fix

Test plan

  • Unit tests (task test)
  • Linting (task lint-fix)

Five new test sub-cases were added in a separate red commit before the
fix to lock in the regression:

  • TestWrapBackendError: three sub-cases — ErrAuthorizationRequired,
    the wrapped production chain
    (transport.Error*AuthorizationRequiredError), and
    ErrOAuthAuthorizationRequired — each must map to
    ErrAuthenticationFailed.
  • TestHealthChecker_CheckHealth_AuthErrorsCategorizesAsUnauthenticated:
    the raw mcp-go chain reaching the checker classifies as
    BackendUnauthenticated when no outgoing auth strategy is configured.
  • TestHealthChecker_CheckHealth_AuthErrorWithOutgoingAuthIsHealthy:
    upstream_inject + the raw mcp-go chain (Norths exact reproducer)
    classifies as BackendHealthy with nil error so the monitor
    records a successful check.

All five sub-cases were red on main (verified before commit 1) and
green after the fix.

Does this introduce a user-facing change?

WARN spam (backend remains unhealthy) for upstreamInject backends
that respond with 401/authorization required to no-credential probes
will stop. Backends previously misclassified as BackendUnhealthy
under these conditions will now be reported as BackendHealthy, and
the consecutive_failures counter will reset. No CRD or API changes.

Generated with Claude Code

lorr1 added 2 commits May 7, 2026 16:54
Add five test sub-cases that exercise the mcp-go authorization-required
sentinel chain. These fail on main: wrapBackendError does not recognize
the new transport.ErrAuthorizationRequired or ErrOAuthAuthorizationRequired
sentinels, so the error falls through to ErrBackendUnavailable and the
health monitor categorizes the probe as BackendUnhealthy.

Cases added:
- TestWrapBackendError: ErrAuthorizationRequired, the wrapped production
  chain (transport.Error + AuthorizationRequiredError), and
  ErrOAuthAuthorizationRequired must each map to ErrAuthenticationFailed.
- TestHealthChecker_CheckHealth_AuthErrorsCategorizesAsUnauthenticated:
  the raw mcp-go chain reaching the checker must classify as
  BackendUnauthenticated when no outgoing auth strategy is configured.
- TestHealthChecker_CheckHealth_AuthErrorWithOutgoingAuthIsHealthy:
  upstream_inject + the raw mcp-go chain (North's exact reproducer)
  must classify as BackendHealthy with nil error so the monitor
  records a successful check and the WARN spam stops.
mcp-go v0.49.0 introduced transport.ErrAuthorizationRequired (and the
companion transport.ErrOAuthAuthorizationRequired from the OAuth-handler
path) as sentinels for 401 responses. wrapBackendError did not check for
either, so the error fell through to ErrBackendUnavailable and the vMCP
health monitor categorized auth-protected backends (e.g. GitHub Copilot
behind upstreamInject) as BackendUnhealthy on every 30s probe, producing
WARN log spam and an ever-climbing consecutive_failures counter.

Two-part fix:

1. wrapBackendError now matches both new sentinels via errors.Is and
   wraps them as ErrAuthenticationFailed. This is the principled fix
   for the production code path through BackendClient.

2. IsAuthenticationError now matches the substring "authorization
   required" as a fallback for paths where the typed sentinel is not
   preserved through the error chain (e.g. tests using a mock
   BackendClient, or future error-chain breakage in mcp-go's Unwrap).

With both in place, the existing auth-aware classification logic (#4935)
engages: backends with an outgoing auth strategy configured report
BackendHealthy on auth challenge (the expected response to a
no-credential probe), and misconfigured backends correctly surface
BackendUnauthenticated.

Closes #5223
@github-actions github-actions Bot added the size/XS Extra small PR: < 100 lines changed label May 7, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented May 8, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 67.88%. Comparing base (8d9eb28) to head (a785fad).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #5225   +/-   ##
=======================================
  Coverage   67.87%   67.88%           
=======================================
  Files         610      610           
  Lines       62383    62389    +6     
=======================================
+ Hits        42340    42350   +10     
+ Misses      16871    16864    -7     
- Partials     3172     3175    +3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

R3 + R4 (test matrix gap on outgoing-auth strategies):
TestHealthChecker_CheckHealth_AuthErrorWithOutgoingAuthIsHealthy
previously covered the new mcp-go authorization-required chain only
against upstream_inject. Add token_exchange and header_injection rows
so the matrix pins all three strategies that route through the same
authErrorStatus branch.

R5 (negative test for substring over-classification):
TestIsAuthenticationError gains a row for a benign validation message
containing the words 'authorization' and 'required' separated by
punctuation ("field 'authorization' required"). This pins that the
current matcher's contiguous "authorization required" substring does
not over-classify, so a future loosening of the matcher (e.g.
whitespace-tolerant) would not silently regress.

R7 (test names should describe assertions, not tickets):
Renamed the issue-numbered cases to describe the chain shape
("transport.Error wrapping AuthorizationRequiredError"). The issue
reference remains in the case comment.

Note on R4 (StrategyTypeUnauthenticated against the new chain):
TestCategorizeError already pins the misconfig branch at the unit
level for sentinel-shaped errors. Extending the integration table
would require restructuring it to vary auth_config per row, which is
out of scope for this PR; the unit-level coverage is sufficient.
@github-actions github-actions Bot added size/XS Extra small PR: < 100 lines changed and removed size/XS Extra small PR: < 100 lines changed labels May 8, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/XS Extra small PR: < 100 lines changed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

vmcp/health: "authorization required" from upstream MCP not categorized as auth error → WARN spam

1 participant