Wire identityFromToken into the OAuth2 upstream provider#5222
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #5222 +/- ##
==========================================
- Coverage 68.02% 67.97% -0.05%
==========================================
Files 615 615
Lines 62960 62998 +38
==========================================
- Hits 42829 42824 -5
- Misses 16929 16973 +44
+ Partials 3202 3201 -1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
tgrunnagle
left a comment
There was a problem hiding this comment.
Multi-agent review — /dev:pr-review
Reviewed by 6 specialist agents (security, concurrency, error-handling, test-coverage, docs-and-api, code-quality). Codex cross-review skipped (CLI not installed).
Recommendation: COMMENT — no blocking issues. The PR's design is sound, the priority chain is correct and well-tested, and the architecture doc is updated alongside the code. Findings below are quality improvements (doc accuracy, test rigor, log consistency) rather than correctness blockers.
Consensus findings (11 surviving from ~80 raw)
| # | Finding | Severity | Score |
|---|---|---|---|
| F1 | Rewriter mutable state lacks single-use / no-concurrent-use doc | MEDIUM | 9 |
| F2 | CRD type forward-reference points to v1alpha1 but sibling types live in v1beta1 |
MEDIUM | 9 |
| F3 | Refresh-path test reconstructs the helper call instead of observing RefreshTokens behavior |
MEDIUM | 8 |
| F4 | TestTokenResponseRewriter_IdentityFromRawBody doesn't exercise the ordering invariant it claims |
MEDIUM | 8 |
| F5 | Constructor WARN "using synthesis mode" still fires when IdentityFromToken is configured |
MEDIUM | 8 |
| F6 | OIDC defensive WARN is currently unreachable and would be misleading if it fired | MEDIUM | 8 |
| F7 | @upstreamjwt trust-boundary deserves an explicit unit test pinning unsigned-payload behavior |
LOW | 7 |
| F8 | "Unreachable in practice" comment overstates an oauth2-library-dependent invariant | LOW | 7 |
| F9 | INFO→DEBUG downgrade is asymmetric: RefreshTokens still uses slog.Info |
LOW | 7 |
| F10 | Happy-path test does not assert identity.Synthetic == false |
LOW | 7 |
| F11 | TestWrapHTTPClientWithMapping_NilMapping retains stale name after helper rename |
LOW | 7 |
Holistic assessment
Strengths verified across multiple agents:
- Pre-rewrite extraction ordering means
SubjectPathalways resolves against the original provider response, immune to futurerewriteTokenResponsechanges. - Information-disclosure guard is real and tested: extraction errors carry path name + static type description only, never body bytes (regression test in place).
sha256("")collision protection preserved through the new priority chain.- No downgrade attack on the refresh path: identity is anchored at auth-code time and
RefreshTokensconsumes the cachedUserIDfrom storage. - Error wrapping chain holds through
errors.Is(err, ErrIdentityResolutionFailed)for both extraction-failure and userInfo-failure branches. Synthetic = falsedefaults correctly for the priority-1 identity, routing throughUserResolver.ResolveUser.- The userinfo-tripwire test pins the early-return invariant.
Architecture. This is phase 3 of a 6-PR stack (#5156). Forward references to the CRD type (lands in #5155) and operator translation (#5157) are appropriately gated. The SyntheticIdentityUpstreams() predicate gap is correctly flagged as a known forward-looking limitation. The additive OAuth2Config.IdentityFromToken field is non-breaking.
Cross-cutting clusters. The MEDIUM findings cluster around two themes:
- Doc-code synchronization (F2): a one-line factual error in two locations that future readers will be confused by.
- Concurrency-invariant documentation (F1): the rewriter has grown mutable per-call state without a doc comment stating the single-use lifecycle. The current caller pattern is correct, but the contract is implicit.
The operator-visible log inconsistencies (F5, F6, F9) form a smaller cluster — none affect runtime correctness, but they undermine the architecture doc's claim that operator visibility is a designed property of the new path.
Dropped findings (< 7 consensus, no action expected)
Validate() doesn't check NamePath/EmailPath gjson syntax; missing OIDC IdentityFromToken WARN test (branch unreachable); Validate() table missing field combinations; missing edge-case test for misconfigured EmailPath at integration layer (covered at unit layer); no concurrent-use test for rewriter (subsumed by F1); missing rewriter-level negative test (subsumed by F4); tokenExchangeResult doc missing mutual-exclusion invariant (related to F1); extraction failure may double-log; RefreshTokens silently drops identity fields (acceptable trade-off per PR description); fmt.Errorf %w + %s loses validator chain depth (errors.Is still works); test helpers duplicated; identity vs extractedIdentity naming; awkward refresh-path comment; SPDX header format (pre-existing); rewriter mapping-time error not surfaced on refresh; rewriter retained longer than needed.
Generated by /dev:pr-review (multi-agent consensus review). Inline comments follow.
tgrunnagle
left a comment
There was a problem hiding this comment.
The implementation LGTM. The agent-raised issues could be addressed in a separate PR, so approving in case you want to do that.
Thank you, since there will be more PRs in this stacked series, I am going to include the fixes in the next PR. |
Wire identityFromToken into the embedded auth server's OAuth2 upstream
provider. Extension point: the existing tokenResponseRewriter (which
already reads and parses every successful token-endpoint response to
normalise non-standard envelopes) gains a parallel responsibility —
extract user identity claims from the same body when the operator
configures IdentityFromTokenConfig with gjson dot-notation paths.
Identity extraction runs on the RAW pre-rewrite body, so paths are
resolved against the original provider response even when
TokenResponseMapping is also configured. The rewriter passes the
extracted *partialIdentity back to exchangeCodeForTokens via a
returned reference; RefreshTokens passes nil and the rewriter is
either omitted entirely or runs with identityCfg=nil because
providers like Snowflake omit username on refresh and identity is
cached at auth-code time in session storage.
The new priority chain in BaseOAuth2Provider.ExchangeCodeForIdentity:
1. IdentityFromToken — when configured, return the extracted
identity. If extraction failed (path didn't resolve), return
ErrIdentityResolutionFailed without consulting userInfo or
synthesising — the operator's "identity is in the token" claim
is explicit and we surface its failure rather than silently
fall through.
2. UserInfo — existing fetchUserInfo path, unchanged.
3. Synthesis — existing synthesizeIdentity path (PR 5094),
unchanged.
OIDC providers always have ID-token-derived identity, so the OIDC
provider's ExchangeCodeForIdentity discards the rewriter's
identityFromToken return value with a defensive WARN if a future
config-loader bug ever sets IdentityFromToken on an OIDC base config
(structurally absent on the OIDC CRD type today).
The tripwire test asserts userinfo HTTP is never contacted when
identityFromToken is configured, including on extraction failure.
Other new tests cover the happy path, userInfo-only regression, the
refresh path (no identity extraction), and information disclosure
(raw body never appears in error messages or logs above DEBUG).
Two existing slog.Info calls in exchangeCodeForTokens are downgraded
to slog.Debug to comply with the silent-success-at-INFO rule.
Closes: #5156
Post-review polish on top of the IdentityFromToken provider wiring:
- Correct the CRD type reference from v1alpha1 to v1beta1 in the
doc comments and the architecture doc. The CRD lands on v1beta1.
- Suppress the synthesis-mode WARN in NewOAuth2Provider when
IdentityFromToken is configured. Previously the warn fired
whenever userInfo was nil, including upstreams that resolve
identity from the token response.
- Downgrade two logs that are too chatty at INFO:
- RefreshTokens "refreshing tokens" -> Debug (matches the
sibling success log and the silent-success-at-INFO rule).
- OIDC defensive ignore-IdentityFromToken log -> Debug. The
OIDC CRD type has no IdentityFromToken field today, so this
log is operationally irrelevant outside hand-built configs.
- Document the single-use contract on tokenResponseRewriter so
future contributors do not share an instance across goroutines
or reuse it for multiple exchanges.
- Strengthen TestTokenResponseRewriter_IdentityFromRawBody so it
actually proves the ordering invariant. The previous fixture
set the subject path to "authed_user.id", which the rewriter
never touches, so the test would pass regardless of ordering.
The new fixture uses "token_type", which rewriteTokenResponse
unconditionally overwrites to "Bearer" -- the assertion can
only hold if extraction ran on the pre-rewrite body.
- Rename TestWrapHTTPClientWithMapping_NilMapping to match the
renamed wrapHTTPClientForTokenExchange function.
- Remove a duplicate RegisterModifiers call in the @upstreamjwt
subtest; TestMain registers modifiers once for the package,
and a second call races with concurrent reads in parallel
subtests.
- Add a forged-JWT subtest pinning the no-signature-verification
trust-model behavior of @upstreamjwt. A future contributor who
adds verification must update the trust model docs and this test.
- Rewrite the "unreachable in practice" comment in the priority-1
fallback as defense-in-depth justification.
- Add an assert.False(t, identity.Synthetic) check on the
IdentityFromToken happy path so a future regression that
accidentally sets Synthetic=true here is caught.
- Rename "refresh path does not trigger identity extraction"
subtest to a name that matches what it actually asserts.
9d9f68b to
eb6ee4b
Compare
Summary
userInfowas configured, otherwise a synthesizedtk-...subject (PR Allow OAuth2 upstreams to omit userInfo config #5094). Upstreams that expose identity directly in the token-endpoint response (Snowflake'susername, Slack v2'sauthed_user.id) had no good answer — they would either go through synthesis (losing real subject identity forusersrow creation) or require a fakeuserInfoURL.IdentityFromTokenintoBaseOAuth2Provider.ExchangeCodeForIdentityas a third resolution mode that runs ahead ofuserInfo. The existingtokenResponseRewriteralready reads and parses every successful token-endpoint response to normalise non-standard envelopes; it now extracts identity claims from the same body using gjson dot-notation paths when an operator configures it.IdentityFromTokenextracts identity from the raw pre-rewrite body, (2)UserInfofetches via HTTP (existing), (3)synthesizeIdentityfalls back (existing). When extraction is configured but fails, the operator-actionable extractor error (path name + type description, never body content) is surfaced throughErrIdentityResolutionFailedrather than silently falling through to userinfo or synthesis — "identityFromToken set" is an explicit operator claim.RefreshTokensdeliberately passesnilfor the identity config: providers like Snowflake omit username on refresh; identity is cached at auth-code time and read from session storage on subsequent requests.IdentityFromTokenon an OIDC base config (structurally absent on the OIDC CRD type today).slog.Infocalls inexchangeCodeForTokenswere downgraded toslog.Debugfor silent-success-at-INFO compliance.Closes #5156
Type of change
Test plan
task test)task lint-fix)New tests cover the happy path, the userinfo-bypass tripwire (asserts no userinfo HTTP call when identityFromToken is configured), the userInfo-only regression, the synthesis fallback, the refresh path (no identity extraction), the
@upstreamjwtmodifier path through the provider, and an information-disclosure assertion that the raw token body never appears in error messages.API Compatibility
v1beta1API, OR theapi-break-allowedlabel is applied and the migration guidance is described above.This PR adds an additive optional field to the runtime
OAuth2Configonly; the matching CRD type lands in #5155.Does this introduce a user-facing change?
No direct CRD-level user-facing change in this PR. End-to-end functionality is gated on the runtime translation in #5157 (the next PR in the stack), which wires the v1beta1 CRD field into this runtime config.
Implementation plan
Approved implementation plan
This is phase 3 of the Snowflake /
identityFromTokenstory (#5150):extractIdentityFromTokenResponseand the@upstreamjwtgjson modifier.IdentityFromTokenConfigCRD type onMCPExternalAuthConfigv1beta1.BaseOAuth2Provider.ExchangeCodeForIdentityvia the existingtokenResponseRewriter. New priority chain. Identity-source set on the rewriter, consumed by the caller afterExchangereturns.RegisterModifiers()so@upstreamjwtpaths actually work in production.thv llm setupdoesn't append provider path prefix for Envoy AI Gateway #5158, Recognise identityFromToken in synthesis-mode detection #5159): Examples + synthesis-detection refinement so theIdentitySynthesizedcontroller condition no longer fires whenIdentityFromTokenis configured.Design constraints surfaced and addressed in this PR:
rewriteTokenResponseonly relocates standard OAuth fields today and never touches identity-shaped fields, but extracting first makes the ordering invariant independent of that assumption.wrapHTTPClientForTokenExchangecreates a fresh one per call); no state shared across goroutines.IdentityFromTokenare claiming identity is in the token response; we surface the failure rather than mask it.subjectPath %q %swhere%sis a static validator description ("path not found in token response", etc.). Never body bytes. The ExchangeCodeForIdentity caller wraps this directly so operators see the misconfigured path name in the returned error.Special notes for reviewers
pkg/authserver/upstream/oauth2.go:160-164andpkg/authserver/upstream/identity_from_token.go:31-32referencecmd/thv-operator/api/v1alpha1.IdentityFromTokenConfig(the v1beta1 CRD type). That type lands in Add identityFromToken to MCPExternalAuthConfig CRD #5155. Until Add identityFromToken to MCPExternalAuthConfig CRD #5155 merges, those references dangle on this branch in isolation. End-to-end functionality requires Translate identityFromToken from CRD to runtime config #5157 too, which adds the operator-to-runner translation and theRegisterModifiers()bootstrap call.docs/arch/11-auth-server-storage.mdupdate reframes the previous "Synthesis-mode subjects" section as a 3-priority chain. It also flags a known gap: the controller predicateSyntheticIdentityUpstreams()checks onlyuserInfo == niland does not yet account forIdentityFromToken, so an upstream with onlyIdentityFromTokenconfigured will still triggerIdentitySynthesizedActiveuntil Recognise identityFromToken in synthesis-mode detection #5159 lands.synthesize fallbacksubtest inoauth2_test.gowas rewritten to useNewOAuth2Providerinstead of hand-building a struct literal, aligning it with the rest of the suite.🤖 Generated with Claude Code