Error messages in agent chat now expose the actual error detail
instead of hiding it entirely. Also captures API response detail
for generic errors that previously dropped it.
(cherry picked from commit 78d556fffc)
(NOTE: Depends on https://github.com/coder/coder/pull/25837)
Adds a new `provider_disabled` error classification in `chatd` with the
corresponding plumbing to classify it as non-retryable. Also adds a
story for how this particular error kind is displayed in the UI.
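A minimal sketch of how a terminal error kind can be marked non-retryable, assuming a string-typed kind and a lookup table. `ErrorKind`, `KindOverloaded`, and `Retryable` are hypothetical names; only the `provider_disabled` value comes from this change.

```go
package main

import "fmt"

// ErrorKind is a hypothetical stand-in for chatd's error classification.
type ErrorKind string

const (
	KindOverloaded       ErrorKind = "overloaded"        // hypothetical transient kind
	KindProviderDisabled ErrorKind = "provider_disabled" // kind added by this change
)

// nonRetryable lists kinds that retrying cannot fix: while the provider
// remains disabled, every retry would fail the same way.
var nonRetryable = map[ErrorKind]bool{
	KindProviderDisabled: true,
}

// Retryable reports whether a chat turn that failed with kind k should be retried.
func Retryable(k ErrorKind) bool {
	return !nonRetryable[k]
}

func main() {
	fmt.Println(Retryable(KindOverloaded))       // true
	fmt.Println(Retryable(KindProviderDisabled)) // false
}
```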
(cherry picked from commit d0a51da0a9)
<!--
If you have used AI to produce some or all of this PR, please ensure you
have read our [AI Contribution
guidelines](https://coder.com/docs/about/contributing/AI_CONTRIBUTING)
before submitting.
-->
Refs CODAGT-486
- `codersdk/chats.go`: New `ChatErrorKindMissingKey` constant and
`AllChatErrorKinds` entry
- `coderd/x/chatd/chaterror/message.go`: `terminalMessage` and
`retryMessage` cases
- `coderd/x/chatd/model_routing_aibridge.go`: Pre-classify error with
`WithClassification`
- `coderd/x/chatd/model_routing_internal_test.go`: Classification
assertion on production path (CRF-2)
- `chatStatusHelpers.ts`: Frontend title "Chat interrupted"
- `LiveStreamTail.stories.tsx`: Storybook story with `detail` assertion
- `docs/ai-coder/ai-gateway/clients/coder-agents.md`: Troubleshooting
entry
- Tests: classification round-trip, terminal message, metrics kind
enumeration
> Generated with [Coder Agents](https://coder.com/agents) on behalf of
@johnstcn
(cherry picked from commit 6df1536256)
- An empty string is valid for `apiKeyID` in paths that genuinely lack a
caller key (e.g. agent-initiated context injection in
`workspaceAgentAddChatContext`). AI Gateway fail-closed check remains
the runtime safety net.
- Context injection paths (`persistInstructionFiles`, compaction) read
the key from `aibridge.DelegatedAPIKeyIDFromContext(ctx)`, set upstream
by `contextWithActiveTurnAPIKeyID`.
- Subagent context copy branches on `copiedRole ==
database.ChatMessageRoleUser` to choose the right append function.
> Generated by Coder Agents
(cherry picked from commit b278be7361)
Add metrics for `aibridged` and `aibridgeproxyd`'s provider statuses. AI
providers can be modified, and possibly misconfigured, at runtime. These
metrics help operators understand the state of these provider
definitions in case unexpected behaviour is observed.
(cherry picked from commit 12520ee964)
Cherry-pick of #25746 to `release/2.34`.
Bumps bundled Terraform from `1.15.2` to `1.15.5`. Terraform 1.15.5 is
built with Go 1.25.10 (vs Go 1.25.8 in 1.15.2), addressing Go stdlib
CVEs flagged by security scanners.
Files changed:
- `.github/actions/setup-tf/action.yaml`
- `scripts/Dockerfile.base`
- `install.sh`
- `flake.nix` (+ updated SRI hash for the linux_amd64 zip)
- `mise.toml`
- `mise.lock` (+ updated per-platform SHA256 checksums)
- `provisioner/terraform/testdata/version.txt`
- `provisioner/terraform/testdata/resources/ai-tasks-disabled/ai-tasks-disabled.tfplan.json`
Release notes:
https://github.com/hashicorp/terraform/releases/tag/v1.15.5
(cherry picked from commit bcc6cca040 —
will be updated to the merged SHA from #25746)
Created on behalf of @Shelnutt2
Co-authored-by: blink-so[bot] <211532188+blink-so[bot]@users.noreply.github.com>
The Go test jobs in `ci.yaml` each had ~30 lines of inline shell that
wrapped `gotestsum` with a PATH shim to capture JSON, then ran
`gotestsummary` and `upload-artifact` to publish a failure report. Three
jobs carried three near-identical copies.
This change replaces the three inline blocks with a single composite
action at `.github/actions/go-test-failure-report/` that runs the same
`gotestsummary` invocation, writes the same markdown to
`GITHUB_STEP_SUMMARY`, and uploads the same NDJSON artifact. The PATH
shim is gone; gotestsum's native `GOTESTSUM_JSONFILE` env variable is
used instead, plumbed through the `test-go-pg` composite.
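The failure report itself comes from `gotestsummary`, whose internals are not described here. As a rough sketch under that caveat, a failure list can be derived from the NDJSON that `go test -json` emits (the stream gotestsum writes to `GOTESTSUM_JSONFILE`); `failedTests` is a hypothetical helper, not the `gotestsummary` code.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// testEvent holds the subset of `go test -json` (test2json) fields we need.
type testEvent struct {
	Action  string
	Package string
	Test    string
}

// failedTests scans NDJSON test events and returns "pkg.Test" for every
// test-level "fail" event. Package-level fail events (empty Test) are skipped.
func failedTests(ndjson string) ([]string, error) {
	var failed []string
	sc := bufio.NewScanner(strings.NewReader(ndjson))
	// Output events can exceed bufio's 64 KiB default token size.
	sc.Buffer(make([]byte, 0, 64*1024), 16*1024*1024)
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		var ev testEvent
		if err := json.Unmarshal([]byte(line), &ev); err != nil {
			return nil, err
		}
		if ev.Action == "fail" && ev.Test != "" {
			failed = append(failed, ev.Package+"."+ev.Test)
		}
	}
	return failed, sc.Err()
}

func main() {
	input := `{"Action":"run","Package":"example/pkg","Test":"TestA"}
{"Action":"fail","Package":"example/pkg","Test":"TestA"}
{"Action":"pass","Package":"example/pkg","Test":"TestB"}
{"Action":"fail","Package":"example/pkg"}`
	failed, err := failedTests(input)
	if err != nil {
		panic(err)
	}
	fmt.Println(failed) // [example/pkg.TestA]
}
```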
`test-go-pg` gains three optional inputs:
- `gotestsum-json-file` — explicit JSON file path (or `default` for
`${RUNNER_TEMP}/go-test.json`)
- `run-regex` — passed to `go test -run`
- `test-shuffle` — passed to `go test -shuffle`
All three have safe defaults so existing callers are unaffected.
No observable change in CI behavior: the three existing test-go-pg jobs
continue to emit the same JSON, render the same failure summary, and
upload the same artifact.
Stacked under #25667, which uses the new composite and inputs to power a
new flake-detector workflow.
_Disclosure: produced with Claude Opus 4.7_
AI Gateway currently supports only Anthropic (including Bedrock), OpenAI, and Copilot providers. All other types (Vercel, Gemini, etc.) are mapped to OpenAI, since they expose OpenAI-compatible endpoints.
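The mapping rule can be sketched as a pass-through for natively supported types with an OpenAI fallback for everything else. `gatewayProviderType` and `nativeProviders` are hypothetical names, and treating Bedrock as configured under the `anthropic` type is an assumption.

```go
package main

import "fmt"

// nativeProviders are the types AI Gateway serves directly. Bedrock is
// assumed here to be configured under the "anthropic" type.
var nativeProviders = map[string]bool{
	"anthropic": true,
	"openai":    true,
	"copilot":   true,
}

// gatewayProviderType is a hypothetical helper: native types pass through,
// and everything else falls back to the OpenAI-compatible implementation.
func gatewayProviderType(configured string) string {
	if nativeProviders[configured] {
		return configured
	}
	return "openai"
}

func main() {
	for _, t := range []string{"anthropic", "copilot", "vercel", "gemini"} {
		fmt.Printf("%s -> %s\n", t, gatewayProviderType(t))
	}
}
```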
Exports `MsgQueue` so it can be used in pubsub implementations outside `PGPubsub`.
## Summary
Three small docs fixes:
- **`docs/admin/integrations/oauth2-provider.md`**: Replace broken
relative link to `scripts/oauth2/README.md` with an absolute GitHub URL.
The previous link escaped the `docs/` tree
(`../../../scripts/oauth2/README.md`) and does not resolve in the
published docs site.
- **`docs/install/releases/feature-stages.md`**: Point the "Coder
documentation" link to `docs/about/contributing/documentation.md`. The
previous `../../README.md` target does not exist under `docs/`.
- **`docs/manifest.json`**: Add the missing `users oidc-claims` entry
alongside the other `users` CLI subcommands so the generated reference
page (`docs/reference/cli/users_oidc-claims.md`) is reachable from the
sidebar.
## Validation
- Confirmed each new link target exists on `main`
(`docs/about/contributing/documentation.md`, `scripts/oauth2/README.md`,
`docs/reference/cli/users_oidc-claims.md`).
- Pre-commit hooks pass (`fmt/markdown`, `lint/markdown`, `lint/emdash`,
`lint/typos`, etc.).
---
_This PR was prepared by a [Coder Agents](https://coder.com/) session on
behalf of @nickvigilante. Human review requested since this is a
docs-only change._
Fixes CODAGT-503
- Add failing-first coverage for manual title generation with missing
message `api_key_id`, with both context fallback and fail-closed cases.
- Set `aibridge.WithDelegatedAPIKeyID(ctx, apiKey.ID)` in
`regenerateChatTitle` and `proposeChatTitle`.
- In `generateManualTitleCandidate`, fall back to
`aibridge.DelegatedAPIKeyIDFromContext(ctx)` only when
`modelBuildOptionsFromMessages` yields an empty `ActiveAPIKeyID`.
- Keep `modelBuildOptionsFromMessages` pure and leave automatic title
generation unchanged.
For `vercel`, `openrouter`, and `openai-compat`, the
`<provider>/<model>` slash is part of the upstream model ID rather than
a hint. `ResolveModelWithProviderHint` was running
`parseCanonicalModelRef` before honoring `providerHint`, so a config
like `(provider=vercel, model=anthropic/claude-4-5-sonnet)` resolved to
`provider=anthropic, model=claude-4-5-sonnet` and the prefix-less model
name was forwarded to Vercel, which returned `Model 'claude-4-5-sonnet'
not found`.
Honor an explicit gateway provider hint before attempting canonical-ref
parsing. Non-gateway hints (anthropic, openai, etc.) keep the existing
canonical-ref-first behavior so `anthropic/claude-...` still has its
prefix stripped when routed directly to Anthropic.
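A simplified sketch of the fixed ordering, assuming a naive split on the first slash in place of `parseCanonicalModelRef` (which presumably validates the prefix). `resolveModel` and `gatewayProviders` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// gatewayProviders treat "<provider>/<model>" as an opaque upstream model ID.
var gatewayProviders = map[string]bool{
	"vercel":        true,
	"openrouter":    true,
	"openai-compat": true,
}

// resolveModel is a simplified, hypothetical model of the fixed ordering.
func resolveModel(providerHint, model string) (provider, modelID string) {
	// 1. An explicit gateway hint wins: keep the slash in the model ID.
	if gatewayProviders[providerHint] {
		return providerHint, model
	}
	// 2. Otherwise try canonical-ref parsing, stripping the prefix.
	if p, m, ok := strings.Cut(model, "/"); ok && p != "" && m != "" {
		return p, m
	}
	// 3. Fall back to the hint as-is.
	return providerHint, model
}

func main() {
	fmt.Println(resolveModel("vercel", "anthropic/claude-4-5-sonnet"))    // vercel anthropic/claude-4-5-sonnet
	fmt.Println(resolveModel("anthropic", "anthropic/claude-4-5-sonnet")) // anthropic claude-4-5-sonnet
}
```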
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
## Problem
Follow-on to:
- https://github.com/coder/coder/pull/25089
`coder exp sync start` still printed a generic success message when the
unit was ready on the first status check. That hid whether the unit had
no dependencies or had dependencies that were already satisfied before
`sync start` ran.
Before:
```text
Success
```
## Solution
Print explicit startup output for both ready-at-first-check cases.
After, dependencies already satisfied:
```text
Unit "test-unit" started immediately, dependencies already satisfied: [dep-unit, dep-unit-2]
```
After, no dependencies:
```text
Unit "test-unit" started with no dependencies
```
The existing waiting path is unchanged and still reports the
dependencies while waiting and after waiting finishes.
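The two ready-at-first-check outputs can be produced by one small formatter. `startedMessage` is a hypothetical helper that reproduces the strings shown above; it is not the CLI's actual function.

```go
package main

import (
	"fmt"
	"strings"
)

// startedMessage builds the ready-at-first-check output. satisfied lists
// dependencies that were already satisfied; empty means none were declared.
func startedMessage(unit string, satisfied []string) string {
	if len(satisfied) == 0 {
		return fmt.Sprintf("Unit %q started with no dependencies", unit)
	}
	return fmt.Sprintf("Unit %q started immediately, dependencies already satisfied: [%s]",
		unit, strings.Join(satisfied, ", "))
}

func main() {
	fmt.Println(startedMessage("test-unit", []string{"dep-unit", "dep-unit-2"}))
	fmt.Println(startedMessage("test-unit", nil))
}
```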
Co-authored-by: Sas Swart <sas.swart.cdk@gmail.com>
Previously the in-process aibridge daemon and the enterprise aibridgeproxy daemon both snapshotted their provider routing once at boot. Any `ai_providers` or `ai_provider_keys` mutation required a restart for either to pick it up.
Add an `ai_providers_changed` pubsub channel that the CRUD handlers publish on after Create / Update / Delete. Both daemons subscribe:
- **aibridged** rebuilds its `[]aibridge.Provider` snapshot via `BuildProviders` and swaps it into the pool atomically. In-flight requests keep serving against the bridge they already acquired; new acquires build against the new snapshot. Per-provider construction errors stay scoped to the offending row.
- **aibridgeproxyd** rebuilds its routing snapshot from `GetAIProviders` and swaps the host→provider map atomically. The MITM listener picks up new providers without restart.
The DB read for aibridgeproxyd uses the existing `AsAIProviderMetadataReader` subject for routing-only access.
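The atomic swap can be sketched with an immutable snapshot behind `atomic.Pointer` (Go 1.19+): readers load the pointer once and keep that snapshot, so a rebuild never disturbs work already in flight. `router`, `routingSnapshot`, and `rebuild` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// routingSnapshot is immutable once published, so readers need no lock.
type routingSnapshot struct {
	byHost map[string]string // host -> provider name
}

// router holds the current snapshot behind an atomic pointer.
type router struct {
	current atomic.Pointer[routingSnapshot]
}

func newRouter(rows map[string]string) *router {
	r := &router{}
	r.rebuild(rows) // ensure Load never returns nil
	return r
}

// rebuild copies rows into a fresh snapshot and swaps it in atomically.
// In the design described above this would run from the
// ai_providers_changed subscription.
func (r *router) rebuild(rows map[string]string) {
	m := make(map[string]string, len(rows))
	for host, provider := range rows {
		m[host] = provider
	}
	r.current.Store(&routingSnapshot{byHost: m})
}

func (r *router) lookup(host string) (string, bool) {
	provider, ok := r.current.Load().byHost[host]
	return provider, ok
}

func main() {
	r := newRouter(map[string]string{"api.openai.com": "openai"})
	inflight := r.current.Load() // a request that already picked its snapshot

	r.rebuild(map[string]string{"api.openai.com": "openai", "api.anthropic.com": "anthropic"})

	_, ok := r.lookup("api.anthropic.com")
	_, oldOK := inflight.byHost["api.anthropic.com"]
	fmt.Println(ok, oldOK) // true false
}
```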
Fixes CODAGT-484.
- Removed "quota", "billing", "insufficient_quota", "payment required"
from `authStrongPatterns`
- Added `usageLimitPatterns` slice with those patterns
- Added `usageLimitMatch` signal and rule between overloaded and
authStrong in priority
- Added terminal/retry messages for `ChatErrorKindUsageLimit`
- Simplified auth message (removed billing reference)
- Frontend: conditional `!usageLimitStatus.provider` guard on the "View
Usage" Alert
- Added `TestClassify_UsageLimitBeatsAuth` with 5 cases including real
production OpenAI error
- Added `ProviderQuotaExceeded` story asserting no "View Usage" link and
correct `ChatStatusCallout` rendering
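The priority change can be sketched as an ordered rule list: usage-limit patterns are checked before strong-auth patterns, so a message mentioning both a quota and an auth failure classifies as a usage limit. Only `usageLimitPatterns` reflects this change; the overloaded and auth lists, the `classify` function, and the kind strings are illustrative assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// Illustrative pattern lists. usageLimitPatterns matches the patterns moved
// out of authStrongPatterns; the other lists are hypothetical examples.
var (
	overloadedPatterns = []string{"overloaded"}
	usageLimitPatterns = []string{"quota", "billing", "insufficient_quota", "payment required"}
	authStrongPatterns = []string{"invalid api key", "unauthorized"}
)

func containsAny(s string, patterns []string) bool {
	for _, p := range patterns {
		if strings.Contains(s, p) {
			return true
		}
	}
	return false
}

// classify checks rules in priority order: overloaded, then usage limit,
// then strong auth signals.
func classify(msg string) string {
	lower := strings.ToLower(msg)
	switch {
	case containsAny(lower, overloadedPatterns):
		return "overloaded"
	case containsAny(lower, usageLimitPatterns):
		return "usage_limit"
	case containsAny(lower, authStrongPatterns):
		return "auth"
	default:
		return "generic"
	}
}

func main() {
	fmt.Println(classify("You exceeded your current quota, please check your plan and billing details.")) // usage_limit
	fmt.Println(classify("401 Unauthorized: insufficient_quota"))                                        // usage_limit
	fmt.Println(classify("Invalid API key provided"))                                                    // auth
}
```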
> Generated with [Coder Agents](https://coder.com/agents)
> [!WARNING]
> The investigation and solution in this PR were done with
[Mux](https://mux.coder.com/). I've reviewed the investigation
methodology, evidence and solution, and it all appears sound.
## Summary
PR #25570 (`refactor: move aibridged out of enterprise to AGPL`, merged
2026-05-22) added an in-memory aibridge DRPC server in
`coderd/aibridged.go` that does `api.WebsocketWaitGroup.Add(1)` and only
releases `Done()` when its client session is closed. PR #25575 then
flipped `CODER_AI_GATEWAY_ENABLED` to default to `true`, so every
`cli.Server()` invocation now spins up that goroutine.
In `cli/server.go`, the only call to `aibridgeDaemon.Close()` was a
`defer` scheduled at function return. During graceful shutdown the code
first calls `coderAPICloser.Close()`, which waits on
`api.WebsocketWaitGroup`. That wait sits for the full 10s timeout in
`coderd/coderd.go` (`websocket shutdown timed out after 10 seconds`),
then returns, then the function unwinds, and only then does the deferred
`aibridgeDaemon.Close()` fire and let the goroutine call `Done()`.
The 10s tax was previously latent (aibridged was enterprise-only and
opt-in). After the two May 22 PRs it hit every `cli.Server()` test. On
Linux/macOS CI it just makes the suite slower; on the Depot Windows
runner, the ramdisk reservation leaves only ~17 GiB of headroom and the
~10s shutdown tails of multiple concurrent package binaries overlap into
an OOM, presenting as `test-go-pg (windows-2022)` jobs that die silently
at the ~600s watchdog with an empty `steps` array.
See Slack:
https://codercom.slack.com/archives/C05AE94121Z/p1779807717764189
## Fix
Close `aibridgeDaemon` explicitly during graceful shutdown, **before**
`coderAPICloser.Close()` waits on the WebSocket wait group. This matches
the existing ordered-shutdown pattern used for `tunnel` and
`notificationsManager`. The deferred `aibridgeDaemon.Close()` is
retained as a safety net for early-return paths, and is safe to
double-call because `aibridged.Server.Close()` is already idempotent via
`shutdownOnce` in `coderd/aibridged/aibridged.go`.
## Regression test
`TestServer_AIGatewayShutdownOrdering` boots a real `coder server` with
`--ai-gateway-enabled=true`, cancels its context, and asserts graceful
shutdown finishes in under 8s. With the fix the test runs in ~0.1s;
without the fix it fails deterministically at ~10.0s. The flag is passed
explicitly so the test continues to guard the ordering even if the
deployment default is ever flipped back.
## Evidence this fixes the OOM
On Linux the patched `cli` test package drops from 114 s back to its
pre-regression 30 s wall time at the same single-process peak RSS (~7.6
GiB), and the `websocket shutdown timed out after 10 seconds` log line
disappears from every server-test run. Since the Windows OOM is the sum
of multiple concurrent 10 s shutdown tails overlapping past the runner's
~17 GiB headroom, removing those tails returns the concurrent-RSS budget
to its pre-regression level. The Windows OOM was intermittent (a handful
of hits across many runs since May 22), so a single green `test-go-pg
(windows-2022)` job on this PR is not by itself proof; confirmation will
come from watching Windows runs on `main` over the next several days and
seeing the ~600 s silent-kill fingerprint stop recurring.
Relates to ENG-2771
Replaces the linear progress bars and text labels in the sidebar footer
usage trigger with SVG donut ring charts that show the section icon
centered inside each ring.
## Changes
- **`SvgRingProgress`**: shared SVG component used by both
`UsageIndicator` and `ContextUsageIndicator`
- Ring colors follow the existing severity system
(normal/warning/exceeded)
- Hover tooltips show "Spend $12.50" and "Workspaces 30/100"
- Dropdown menu content unchanged; full usage details still appear on
click
- Removed dead `summaryValue` field and `size="compact"` variant
- Updated stories to cover ring trigger rendering and dropdown usage
details
> Generated by Coder Agents on behalf of @tracyjohnsonux
Replaces the blocking Dialog modal setup notice with a context-aware
inline banner above the chat input, with different messaging for admins
and members.
## Inline notice banner
The `AgentSetupNotice` component now renders as a `bg-surface-tertiary`
inline box instead of an unclosable `Dialog` modal. The notice sits
above the chat composer using negative margin overlap, and the composer
is forced opaque (`bg-surface-secondary`) when the notice is present so
the banner doesn't bleed through the semi-transparent desktop
background.
Four states based on role and configuration:
- **Admin, no providers or models**: links to both provider and model
setup
- **Admin, missing provider only**: link to provider setup
- **Admin, has providers but no models**: link to model setup only
- **Member, no models available**: generic "your admin is still getting
things set up" message
The admin/member distinction is determined via
`permissions.editDeploymentConfig` and applied in both `AgentChatPage`
and `AgentCreatePage`.
## Conflict resolution notes
During merge with main, the following were adapted:
- Sidebar filter props updated to main's
`sidebarFilters`/`onSidebarFiltersChange` pattern (replacing old
`archivedFilter`)
- Accepted `Sidebar/` -> `ChatsSidebar/` directory refactor from main
- Dropped `hasArchivedChats` query (its sidebar consumer was removed in
the refactor)
- Provider link updated to `/ai/settings` (new AI settings page)
> Generated with the assistance of Coder Agents on behalf of
@tracyjohnsonux
---------
Co-authored-by: jaaydenh <jaaydenh@users.noreply.github.com>
Move docs linting into the required CI umbrella and reuse the existing
`changes` job so docs lint runs when docs or CI files change, plus on
`main` as a backstop.
This is motivated by the docs lint failures on #25601. That PR touched
`.claude/docs/TESTING.md`; the standalone `Docs CI` workflow picked it
up because `docs-ci.yaml` used broad `**.md` matching, but local `pnpm
lint-docs` and `make lint` did not catch the same file because they only
scanned `docs/**` plus root `*.md`. The first failed Docs CI run
reported markdownlint errors in `.claude/docs/TESTING.md` (`MD040` and
`MD031`), and the next run reported a markdown table formatter failure
in the same file.
That mismatch is why this PR exists: prevent unrelated PRs from being
surprised by stale `.claude/docs/**` lint drift only after they happen
to touch one of those files. The local docs scripts now include
`.claude/docs/**`, and the old standalone `Docs CI` workflow is removed
so we do not maintain separate path-filter logic outside the required CI
workflow.
> Generated by mux, but reviewed by a human
Update the user secrets user guide, the admin security secrets
reference, and the docs manifest to label the feature as Beta instead of
Early Access, and link to the beta section of the feature stages doc.
Add a Postgres trigger and matching codersdk constants that cap each
user's secrets in four dimensions: count (50), total stored value bytes
(200 KiB), env-injected stored value bytes (24 KiB), and env name length
(256 bytes). Without these caps a user could overflow the 4 MiB DRPC
agent manifest, the ~32 KiB Windows process env
block, or Linux/macOS ARG_MAX at workspace start. The trigger is the
source of truth on aggregates; the handler maps its check_violation
error into a 400 that names the per-user budget in stored
(post-encryption) bytes. A handler test exercises the off-by-one boundary at each cap
across POST and PATCH, plus per-user budget isolation.
Generated with help from Coder Agents.
Adds an end-to-end enterprise CLI test to ensure legacy AI provider keys seeded at server startup are encrypted at rest when DBCrypt external token encryption is enabled, preventing regressions related to #25699.
> Partially implemented by Coder Agents, and massaged afterwards by me.
## Summary
Wraps external auth token refresh in an exponential-backoff retry so a
brief upstream hiccup (5xx, network timeout, rate-limited 429) no longer
surfaces as an `InvalidTokenError` and forces users to re-authenticate.
GitHub in particular has been flaky enough lately that this is hitting
real users.
## Behavior
- `(*Config).RefreshToken` now calls a helper that retries the
`TokenSource.Token()` exchange with exponential backoff (250ms → 2s),
bounded by a 10s total budget.
- Errors classified as permanent by `isFailedRefresh` (e.g.
`bad_refresh_token`, `invalid_grant`, `unauthorized_client`, ...) skip
the retry loop. Retrying a permanent failure wastes the refresh quota
and, on providers with single-use refresh tokens, can mask a legitimate
concurrent winner with repeated `bad_refresh_token` responses.
- Refreshes with an empty refresh token still short-circuit without
making an API call.
- The existing concurrent-refresh-race detection and optimistic-lock
paths are unchanged.
## Tunables
Three new `time.Duration` fields on `externalauth.Config`
(`RefreshRetryInitialBackoff`, `RefreshRetryMaxBackoff`,
`RefreshRetryTimeout`) let callers override the defaults. They default
to zero, which falls back to the package defaults, so existing call
sites are unaffected. The fields exist primarily so tests can dial the
timing way down without touching package globals (and therefore without
serializing parallel tests).
## Tests
- `TestRefreshToken/RefreshRetries` now disables internal retries via
`RefreshRetryTimeout = time.Nanosecond` so its existing "1 IDP call per
`RefreshToken` invocation" assertion still holds. Otherwise its
assertions are unchanged.
- New `TestRefreshToken/RefreshTokenWithBackoff` simulates 3 transient
5xx failures followed by success and verifies the refresh ultimately
succeeds with 4 total IDP attempts.
- New `TestRefreshToken/RefreshTokenBackoffPermanentError` returns
`bad_refresh_token` and verifies the refresh is **not** retried even
with a generous 1s budget.
<details>
<summary>Why the explicit <code>retryCtx.Err()</code> guard?</summary>
`retry.Retrier.Wait` `select`s between `time.After(delay)` and
`ctx.Done()`. The first call has `delay == 0`, so `time.After(0)` and an
already-cancelled context both fire immediately and Go picks the case
nondeterministically. Without the guard, a near-zero retry budget would
still trigger an unwanted extra refresh attempt roughly half the time,
which would have made the `RefreshRetries` test flaky.
</details>
This PR was opened by a Coder agent on behalf of @kylecarbs.
## Summary
Routes chatd model calls backed by concrete AI Provider rows through the
in-process aibridge transport by default, with deployment options to use
direct provider routing when AI Gateway is disabled or chat AI Gateway
routing is disabled.
- Splits model routing into common, direct provider, and AI Gateway
paths behind a single deployment-mode entry point.
- Builds chatd models through explicit request, route, and options data.
Active API key attribution is passed explicitly instead of being hidden
inside generic model construction.
- For AI Gateway BYOK routes, resolves the user's provider key in chatd,
forwards it through provider-specific auth headers, and sets
`X-Coder-AI-Governance-Token` to the `delegated` marker so aibridge
preserves those headers while still stripping Coder-specific metadata.
- Keeps central provider credentials and deployment fallback credentials
out of forwarded provider auth headers, so AI Gateway central policy
remains authoritative.
- Redacts delegated provider auth from default string formatting to
avoid accidental plaintext logging of user BYOK credentials.
- Covers selected chat models, advisor overrides, title and quickgen
paths, subagent overrides, computer use model selection, and an
integration-style chat turn through the aibridge transport path.
- Persists initiating API key IDs on chat and queued user messages,
including subagent child messages, and fails closed for AI
Gateway-routed model builds without an active key.
- Removes unused `api_key_id` indexes while keeping the persistence
columns and foreign keys.
- Keeps the deployment option available through config and env parsing,
but hides it from CLI help and generated docs.
- Stabilizes the subagent poll fallback test so background CreateChat
processing cannot win the state transition under slower CI environments.
## Tests
- `go test ./coderd/x/chatd -run
'TestAIGatewayProviderAuthForUser|TestAIGatewayProviderAuthRedactsFormatting|TestResolveModelRouteForConfigAIGatewayProviderAuth|TestAIGatewayModelForwardsProviderAuth|TestProcessChat_AIGatewayRoutingUsesDelegatedAPIKey|TestAwaitSubagentCompletion'
-count=1`
- `go test ./coderd/aibridged -run
'TestServeHTTP_DelegatedAPIKey|TestServeHTTP_StripCoderToken' -count=1`
- `git diff --check HEAD~1..HEAD`
- `make lint`
> Mux working on behalf of Mike.
## Problem
Two related symptoms of the same architectural issue: the `dbcrypt`
wrapper is installed inside `enterprise/coderd.New`, so any access to
`options.Database` that happens before `newAPI` runs bypasses
encryption.
**Symptom 1 (reads):** Provider keys added via the admin UI are
encrypted at rest. `BuildProviders` was running *before* `newAPI`,
against the unwrapped store, so the ciphertext was read as-is and shoved
into the keypool as the upstream credential. Anthropic/OpenAI reject it,
and the interception log shows:
```
coderd.aibridged.pool: interception failed ... error="all configured keys failed authentication"
credential_kind=centralized credential_hint=PaPb...4A== credential_length=184
```
**Symptom 2 (writes):** `SeedAIProvidersFromEnv` was also running before
`newAPI`, against the unwrapped store, so env-derived keys
(`CODER_AIBRIDGE_OPENAI_KEY`, indexed `CODER_AIBRIDGE_PROVIDER_<N>_KEY`,
etc.) landed in `ai_provider_keys` as plaintext with `ApiKeyKeyID =
null` even when `CODER_EXTERNAL_TOKEN_ENCRYPTION_KEYS` was set.
## Fix
Move both `SeedAIProvidersFromEnv` and `BuildProviders` to after
`newAPI`, where `options.Database` is the dbcrypt-wrapped store. Writes
encrypt correctly; reads decrypt correctly.
The enterprise closure (`enterprise/cli/server.go`) runs *inside*
`newAPI` and calls `BuildProviders` for the aibridgeproxyd at that
point. Once the agpl seed moves to after `newAPI`, the proxy on first
boot would see no env-seeded providers. Add a matching seed call inside
the enterprise closure before its `BuildProviders` to cover that case.
Seeding is idempotent, so the agpl-side seed running again post-`newAPI`
is a no-op when the rows already exist.
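The ordering problem is about where a decorator is installed: any consumer holding the unwrapped store sees exactly the bytes at rest. The toy sketch below uses hypothetical `keyStore`, `memStore`, and `cryptStore` types, with base64 standing in for real encryption (it provides no secrecy), to show why reading through the raw store yields "ciphertext" while the wrapped store round-trips.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// keyStore is a hypothetical, minimal store interface.
type keyStore interface {
	PutKey(id, value string) error
	GetKey(id string) (string, error)
}

// memStore is the "raw" store: it persists whatever bytes it is given.
type memStore struct{ rows map[string]string }

func (m *memStore) PutKey(id, value string) error {
	m.rows[id] = value
	return nil
}

func (m *memStore) GetKey(id string) (string, error) {
	v, ok := m.rows[id]
	if !ok {
		return "", fmt.Errorf("key %q not found", id)
	}
	return v, nil
}

// cryptStore wraps another store, encoding on write and decoding on read.
type cryptStore struct{ inner keyStore }

func (c cryptStore) PutKey(id, value string) error {
	return c.inner.PutKey(id, base64.StdEncoding.EncodeToString([]byte(value)))
}

func (c cryptStore) GetKey(id string) (string, error) {
	raw, err := c.inner.GetKey(id)
	if err != nil {
		return "", err
	}
	plain, err := base64.StdEncoding.DecodeString(raw)
	if err != nil {
		return "", fmt.Errorf("decode key %q: %w", id, err)
	}
	return string(plain), nil
}

func main() {
	raw := &memStore{rows: map[string]string{}}
	wrapped := cryptStore{inner: raw}

	_ = wrapped.PutKey("openai", "sk-live")
	fromRaw, _ := raw.GetKey("openai")         // bypasses the wrapper: encoded bytes
	fromWrapped, _ := wrapped.GetKey("openai") // decoded: sk-live
	fmt.Println(fromRaw, fromWrapped)
}
```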
## Known shortcomings
The clean version of this fix would just inherit `ctx` like every other
startup step and place these calls naturally. It can't, for two reasons
that are both about the surrounding handler architecture rather than
this change:
1. **`dbcrypt` wrapping is positioned inside `newAPI`, not around
`options.Database` at creation.** That's why both seed and build have to
wait until after `newAPI` in the first place. The principled fix is to
install the wrapper at the point the store is created (behind a hook the
enterprise build supplies), so every consumer sees a single
authoritative view and the ordering stops mattering. This would also
collapse the duplicated seed call back to a single site.
2. **The handler's shutdown sequence is not deferred.**
`coderAPICloser.Close()` and the other teardown steps run only if
control reaches the `select` at the bottom of the handler. An early
`return` from anywhere in Phase 1 (e.g. seed/build returning
`context.Canceled` when the user hits ctrl-c during startup) skips that
block and orphans all the goroutines `newAPI` spawned — tailnet workers,
gitsync, telemetry batcher, etc. `goleak` then catches them at package
teardown and `TestServer_TelemetryDisabled_FinalReport` fails. Moving
the shutdown into deferred closers (with a `sync.Once`-guarded close to
avoid double-close from the explicit Phase 2 call) is the principled
fix.
For this PR I took the smallest change that fixes the reported bugs: a
detached context (`context.WithoutCancel(ctx)` + a 30s timeout) at the
seed and build call sites in both the agpl and enterprise paths. It lets
the calls complete even if the user cancels during startup, after which
the handler reaches its shutdown select naturally and tears down through
Phase 2. Both shortcomings above are worth addressing separately.
## Test plan
- `make test RUN=TestServer_TelemetryDisabled_FinalReport` with `-race`;
passes locally with `-count=3`.
- Manually verified on a deployment with
`CODER_EXTERNAL_TOKEN_ENCRYPTION_KEYS` set and env-configured providers:
`ai_provider_keys.api_key_key_id` is populated, `api_key` is base64
ciphertext, and upstream auth succeeds.
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
## Problem
When visiting `/ai/settings/governance`, both **AI Governance** and
**Providers** items in the AI settings subnav appear highlighted as
active.
## Cause
`SettingsSidebarNavItem` is built on react-router's `<NavLink>`, which
by default treats a link as active when the current URL **starts with**
the link's `to` path. Since `/ai/settings/governance` starts with
`/ai/settings`, the Providers item is also marked active.
## Fix
Pass `end` on the Providers nav item so it only matches when the path is
exactly `/ai/settings` (the index route). The `SettingsSidebarNavItem`
component already supports this prop for exactly this case.
Co-authored-by: blink-so[bot] <211532188+blink-so[bot]@users.noreply.github.com>
> 🤖 Generated with [Coder Agents](https://coder.com/agents) on behalf of
@tracyjohnsonux
Updates the providers page description to explain that providers power
Coder Agents, AI Gateway, and other LLM features. Adds a "Manage
deployment-wide BYOK" link to the docs.
Uses `<Link>` component and `docs()` helper per project conventions.