This change uses separate http clients/transports in TestValidateToken
subtests. Previously parallel subtests of TestValidateToken shared
a http.DefaultTransport. When one subtest's httptest.Server.Close() ran in
t.Cleanup, it called http.DefaultTransport.CloseIdleConnections, which
could interrupt connection(s) used in another subtest.
<!--
If you have used AI to produce some or all of this PR, please ensure you
have read our [AI Contribution
guidelines](https://coder.com/docs/about/contributing/AI_CONTRIBUTING)
before submitting.
-->
Fixes https://github.com/coder/coder/issues/17069
Builds on #24332 and #24334 which addressed token persistence and rate
limit handling.
## Problem
When multiple concurrent requests race to refresh an expiring external
auth token, providers with single-use refresh tokens (e.g., GitHub Apps)
reject all but the first refresh attempt with `bad_refresh_token`. The
losing request caches this transient error in the
`oauth_refresh_failure_reason` database column and clears the refresh
token, blocking all subsequent refresh attempts until the user manually
re-authenticates.
This is common for users with multiple terminals, IDE connections, or
workspaces open, all of which poll the external auth endpoint and
trigger concurrent refreshes when the token nears expiry. Database
analysis showed 5 of 7 affected users failed within 5-10 seconds of
token expiry, matching the Go oauth2 library's `expiryDelta` window.
## Fix
Before caching a `bad_refresh_token` failure, re-read the external auth
link from the database. If the refresh token has changed (indicating a
concurrent caller already refreshed successfully), return the winner's
updated link instead of writing a failure. An empty-string guard ensures
a token cleared by another loser isn't mistaken for a winner's
successful refresh.
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Garrett Delfosse <garrett@coder.com>
ValidateToken treated all 403 responses as "token invalid," including
GitHub rate limits. isFailedRefresh included 403 in the status code
fallthrough, destroying tokens on rate-limited refresh attempts.
Split the combined 401/403 check in ValidateToken into a switch on
status code. On 403, inspect X-RateLimit-Remaining and Retry-After
headers; if either indicates a rate limit, return optimistically valid.
Handle 429 the same way. Plain 403 without rate-limit headers preserves
the existing invalid-token behavior.
Add incorrect_client_credentials and invalid_client to isFailedRefresh
error code switch. Remove 403 from the status code fallthrough since no
known provider returns 403 from the token endpoint.
GitHub rotates refresh tokens on use, invalidating the old token
immediately. If post-refresh validation fails (e.g. rate-limited
403 from /user), the new token was silently discarded because the
DB save only happened after successful validation. The next refresh
attempt would use the stale refresh token, fail permanently, and
destroy the token.
Move the UpdateExternalAuthLink call to immediately after
TokenSource.Token() succeeds. The post-validation save block is
removed (dead code after the early save). The DB write uses a
detached context (context.WithoutCancel) so a canceled request
cannot prevent persistence of the already-consumed refresh token.
- Adds `_API_BASE_URL` to `CODER_EXTERNAL_AUTH_CONFIG_`
- Extracts and refactors existing GitHub PR sync logic to new packages
`coderd/gitsync` and `coderd/externalauth/gitprovider`
- Associated wiring and tests
Created using Opus 4.6
## Problem
When multiple concurrent callers (e.g., parallel workspace builds) read
the same single-use OAuth2 refresh token from the database and race to
exchange it with the provider, the first caller succeeds but subsequent
callers get `bad_refresh_token`. The losing caller then **clears the
valid new token** from the database, permanently breaking the auth link
until the user manually re-authenticates.
This is reliably reproducible when launching multiple workspaces
simultaneously with GitHub App external auth and user-to-server token
expiration enabled.
## Solution
Two layers of protection:
### 1. Singleflight deduplication (`Config.RefreshToken` +
`ObtainOIDCAccessToken`)
Concurrent callers for the same user/provider share a single refresh
call via `golang.org/x/sync/singleflight`, keyed by `userID`. The
singleflight callback re-reads the link from the database to pick up any
token already refreshed by a prior in-flight call, avoiding redundant
IDP round-trips entirely.
### 2. Optimistic locking on `UpdateExternalAuthLinkRefreshToken`
The SQL `WHERE` clause now includes `AND oauth_refresh_token =
@old_oauth_refresh_token`, so if two replicas (HA) race past
singleflight, the loser's destructive UPDATE is a harmless no-op rather
than overwriting the winner's valid token.
## Changes
| File | Change |
|------|--------|
| `coderd/externalauth/externalauth.go` | Added `singleflight.Group` to
`Config`; split `RefreshToken` into public wrapper +
`refreshTokenInner`; pass `OldOauthRefreshToken` to DB update |
| `coderd/provisionerdserver/provisionerdserver.go` | Wrapped OIDC
refresh in `ObtainOIDCAccessToken` with package-level singleflight |
| `coderd/database/queries/externalauth.sql` | Added optimistic lock
(`WHERE ... AND oauth_refresh_token = @old_oauth_refresh_token`) |
| `coderd/database/queries.sql.go` | Regenerated |
| `coderd/database/querier.go` | Regenerated |
| `coderd/database/dbauthz/dbauthz_test.go` | Updated test params for
new field |
| `coderd/externalauth/externalauth_test.go` | Added
`ConcurrentRefreshDedup` test; updated existing tests for singleflight
DB re-read |
## Testing
- **New test `ConcurrentRefreshDedup`**: 5 goroutines call
`RefreshToken` concurrently, asserts IDP refresh called exactly once,
all callers get same token.
- All existing `TestRefreshToken/*` subtests updated and passing.
- `TestObtainOIDCAccessToken` passing.
- `dbauthz` tests passing.
**Breaking Change:** Existing oauth apps might now use PKCE. If an
unknown IdP type was being used, and it does not support PKCE, it will
break.
To fix, set the PKCE methods on the external auth to `none`
```
export CODER_EXTERNAL_AUTH_1_PKCE_METHODS=none
```
Solves #15575
Adds OAuth access token revocation when unlinking external auth
provider. Due to revocation not being consistently implemented by
providers this is only best effort attempt. Unsuccessful revocation
won't influence link removal.
External auth refresh errors lose the original error thrown on the first
refresh. This PR saves that error to the database to be raised on
subsequent refresh attempts
https://github.com/coder/coder/pull/15608 introduced a buggy behaviour
with dbcrypt enabled.
When clearing an oauth refresh token, we had been setting the value to
the empty string.
The database encryption package considers decrypting an empty string to
be an error, as an empty encrypted string value will still have a nonce
associated with it and thus not actually be empty when stored at rest.
Instead of 'deleting' the refresh token, 'update' it to be the empty
string.
This plays nicely with dbcrypt.
It also adds a 'utility test' in the dbcrypt package to help encrypt a
value. This was useful when manually fixing users affected by this bug
on our dogfood instance.
Once a token refresh fails, we remove the `oauth_refresh_token` from the
database. This will prevent the token from hitting the IDP for
subsequent refresh attempts.
Without this change, a bad script can cause a failing token to hit a
remote IDP repeatedly with each `git` operation. With this change, after
the first hit, subsequent hits will fail locally, and never contact the
IDP.
The solution in both cases is to authenticate the external auth link. So
the resolution is the same as before.
* chore: verify pass through external auth query params
Unit test added to verify behavior of query params set in the
auth url for external apps. This behavior is intended to specifically
support Auth0 audience query param.
* feat: allow external services to be authable
* Refactor external auth config structure for defaults
* Add support for new config properties
* Change the name of external auth
* Move externalauth -> external-auth
* Run gen
* Fix tests
* Fix MW tests
* Fix git auth redirect
* Fix lint
* Fix name
* Allow any ID
* Fix invalid type test
* Fix e2e tests
* Fix comments
* Fix colors
* Allow accepting any type as string
* Run gen
* Fix href