When GITHUB_BASE_REF is set, the emdash lint compared against the tip
of main instead of the merge-base. For PRs behind main, this produced
a diff covering all divergent files, flagging pre-existing emdashes the
PR never touched.
Query the PR commit count via gh, deepen HEAD by that amount, and
resolve HEAD~N as the merge-base. Falls back to the branch tip when
the merge-base cannot be determined.
OpenAI-compatible chat paths hit two provider compatibility issues. Some
compatible endpoints reject a named `tool_choice` when there is only one
tool, and Gemini's OpenAI-compatible endpoint requires thought
signatures on current-turn tool calls.
Centralize OpenAI-compatible request patches in the chat provider:
rewrite single named tool choices to `"required"`, and add the
documented dummy Google thought signature to the first tool call in each
current-turn tool step for Gemini routes. Vercel OpenAI-compatible
requests are left unchanged for the thought-signature patch.
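
As a rough illustration of the `tool_choice` rewrite, here is a minimal sketch. The `Request` and `ToolChoice` types and the `patchToolChoice` helper are hypothetical stand-ins, not the chat provider's real types, and the check that the named tool matches the single tool is an assumption.

```go
package main

import "fmt"

// ToolChoice is a hypothetical stand-in for the provider request field.
// Mode is "auto", "required", or "named"; Name is set only for "named".
type ToolChoice struct {
	Mode string
	Name string
}

// Request is a hypothetical, trimmed-down chat request.
type Request struct {
	Tools      []string
	ToolChoice ToolChoice
}

// patchToolChoice rewrites a named tool choice to "required" when the
// request carries exactly one tool and the choice names that tool.
// With a single tool, "required" selects the same tool.
func patchToolChoice(req Request) Request {
	single := len(req.Tools) == 1
	if req.ToolChoice.Mode == "named" && single && req.Tools[0] == req.ToolChoice.Name {
		req.ToolChoice = ToolChoice{Mode: "required"}
	}
	return req
}

func main() {
	req := Request{
		Tools:      []string{"get_weather"},
		ToolChoice: ToolChoice{Mode: "named", Name: "get_weather"},
	}
	fmt.Println(patchToolChoice(req).ToolChoice.Mode) // prints "required"
}
```

Requests with more than one tool keep their named choice, since `"required"` would no longer pin a specific tool.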
> Mux created this PR on behalf of Mike.
Changes the "Manage deployment-wide BYOK" link on the AI Providers
settings page (`/ai/settings`) to "View docs", matching the pattern used
on the provisioner keys page (`/organizations/{org}/provisioner-keys`).
### Changes
- Swapped `Link` from `react-router` to `#/components/Link/Link` (uses
`href` instead of `to`)
- Removed `target="_blank"` and `rel="noreferrer"`: the link now
navigates in the same tab, matching the provisioner keys page convention
- Changed link text from "Manage deployment-wide BYOK" to "View docs"
> Generated by Coder Agents on behalf of @tracyjohnsonux
Adds a `flake-go` workflow that hunts for ordering-dependent and racy Go
tests on pull requests. The workflow runs only on PRs (cancelling
earlier runs on new commits) and skips test execution when no Go test
files changed.
A single `flake_go` job uses
[coder/whichtests](https://github.com/coder/whichtests) with
`--coalesce` to compute the directly-modified `Test*` functions from the
PR diff and emit them as one target row. The same job then runs those
selected tests on a deliberately resource-constrained 4-vCPU runner with
4x parallelism oversubscription, `-count=25`, and `-shuffle=on` to
amplify contention and surface flakes.
Pinned at
[coder/whichtests@ec33bab](https://github.com/coder/whichtests/commit/ec33bab1ec04cd86beb7a61a069db4463dba63f5).
Reuses the `test-go-pg` composite (with its new `run-regex`,
`test-shuffle`, and `gotestsum-json-file` inputs) and the
`go-test-failure-report` composite, both introduced on the base branch
(#25670), so this workflow shares one implementation of the gotestsum +
failure-report path with the existing CI jobs.
`Makefile` adds `TEST_SHUFFLE` support and single-quotes `RUN` so
whichtests' regex survives shell parsing.
Stacked on top of #25670.
Demo @
https://github.com/coder/coder/actions/runs/26494322649/job/78018779381?pr=25667
Closes CODAGT-381
Fake agents now fetch their manifest, spawn a single per-agent metadata
goroutine, and emit batched BatchUpdateMetadata calls with 3072-byte
base64 payloads so scaletest runs mirror the load shape of real agents.
This matches what the current scaletest workspace template does for
metadata. In the future we can extend the harness here to take in a
config option for the metadata payload size.
---------
Signed-off-by: Callum Styan <callumstyan@gmail.com>
Co-authored-by: Mux <mux@coder.com>
Bumps bundled Terraform from `1.15.2` to `1.15.5` across all pinned
locations:
- `.github/actions/setup-tf/action.yaml`
- `scripts/Dockerfile.base`
- `install.sh`
- `flake.nix` (+ updated SRI hash for the linux_amd64 zip)
- `mise.toml`
- `mise.lock` (+ updated per-platform SHA256 checksums)
- `provisioner/terraform/testdata/version.txt`
- `provisioner/terraform/testdata/resources/ai-tasks-disabled/ai-tasks-disabled.tfplan.json`
## Why
Terraform 1.15.5 is built with Go 1.25.10, while the 1.15.2 we currently
ship was built with Go 1.25.8. The newer Go runtime addresses recent
stdlib CVEs flagged by security scanners.
Releases included: 1.15.3 (provider install crash fix, nested-module
stack migration fix), 1.15.4 (Linux s390x builds, symlinked provider dir
fix), 1.15.5.
Release notes:
https://github.com/hashicorp/terraform/releases/tag/v1.15.5
## Cherry-pick
#25747 mirrors this PR against `release/2.34`.
Created on behalf of @Shelnutt2
Co-authored-by: blink-so[bot] <211532188+blink-so[bot]@users.noreply.github.com>
GitHub Actions does not reliably trigger the push-based CI workflow when
a new branch is created at a commit that already has a workflow run from
another branch (e.g. `main`). This meant cutting a release branch
produced no CI run on it, so `should_deploy.sh` never got to approve the
deploy from the release branch.
Adds the `create` event trigger to `ci.yaml` with a condition on the
`changes` job to only proceed for release branch creations. All other
jobs depend on `changes`, so non-release branch creations are a no-op.
> Generated with [Coder Agents](https://coder.com/agents) by @f0ssel
`NewDataBuilder` allocated `make([]byte, 0, req.FileSize)` using the
client-supplied `int64` with no upper-bound check. The DRPC 4 MiB wire
cap limits message size but not the integer value, so a crafted message
with `FileSize = 1<<40` forces a 1 TiB allocation, triggering an
unrecoverable `runtime.throw` that kills the entire `coderd` process.
Add a `MaxFileSize` constant (100 MiB, matching `HTTPFileMaxBytes` in
`coderd/files.go`) and reject negative or oversized `FileSize`, plus
negative or excessive `Chunks`, before the allocation.
`BytesToDataUpload` also returns an error for oversized data to preserve
the encode/decode round-trip contract. Fix a pre-existing reversed
subtraction in the `Add()` overflow error message.
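
The shape of the check, as a minimal sketch rather than the actual `NewDataBuilder` code: `newUploadBuffer`, `errInvalidHeader`, and the `maxChunks` bound are hypothetical names and values chosen for illustration; only the 100 MiB cap comes from this PR.

```go
package main

import (
	"errors"
	"fmt"
)

// maxFileSize mirrors the 100 MiB cap described in this PR.
const maxFileSize int64 = 100 << 20

// maxChunks is a hypothetical bound; the real limit is not stated here.
const maxChunks int32 = 1 << 16

var errInvalidHeader = errors.New("invalid upload header")

// newUploadBuffer validates the client-supplied sizes before allocating,
// so a crafted header cannot force an oversized make().
func newUploadBuffer(fileSize int64, chunks int32) ([]byte, error) {
	if fileSize < 0 || fileSize > maxFileSize {
		return nil, fmt.Errorf("%w: file size %d outside [0, %d]", errInvalidHeader, fileSize, maxFileSize)
	}
	if chunks < 0 || chunks > maxChunks {
		return nil, fmt.Errorf("%w: chunk count %d outside [0, %d]", errInvalidHeader, chunks, maxChunks)
	}
	return make([]byte, 0, fileSize), nil
}

func main() {
	// A 1 TiB request is rejected before any allocation happens.
	_, err := newUploadBuffer(1<<40, 1)
	fmt.Println(err)

	buf, _ := newUploadBuffer(4096, 1)
	fmt.Println(cap(buf)) // prints 4096
}
```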
Closes https://linear.app/codercom/issue/PLAT-231
<details>
<summary>Implementation details</summary>
- `provisionersdk/proto/dataupload.go`: New exported `MaxFileSize`
constant; validation in `NewDataBuilder` and `BytesToDataUpload`. Fixed
reversed subtraction in `Add()` error.
- `provisionersdk/proto/dataupload_test.go`: New
`TestNewDataBuilderValidation` with 7 subtests.
- Updated all 5 callers of `BytesToDataUpload` for new error return.
- Audited all `make([]byte, ...)` in provisioner paths; no other
client-supplied sizes.
</details>
> Generated by Coder Agents on behalf of @f0ssel
Other-user agent chats showed a banner that implied prompts would run as
the owner, but submitting from that view is forbidden.
This updates the banner to identify the chat owner and makes chats owned
by another user read-only in the UI by disabling the composer and hiding
inline send or edit follow-up actions.
> Mux working on behalf of Mike.
- An empty string is valid for `apiKeyID` in paths that genuinely lack a
caller key (e.g. agent-initiated context injection in
`workspaceAgentAddChatContext`). The AI Gateway fail-closed check remains
the runtime safety net.
- Context injection paths (`persistInstructionFiles`, compaction) read
the key from `aibridge.DelegatedAPIKeyIDFromContext(ctx)`, set upstream
by `contextWithActiveTurnAPIKeyID`.
- Subagent context copy branches on `copiedRole ==
database.ChatMessageRoleUser` to choose the right append function.
> Generated by Coder Agents
The Go test jobs in `ci.yaml` each had ~30 lines of inline shell that
wrapped `gotestsum` with a PATH shim to capture JSON, then ran
`gotestsummary` and `upload-artifact` to publish a failure report. Three
jobs carried three near-identical copies.
This change replaces the three inline blocks with a single composite
action at `.github/actions/go-test-failure-report/` that runs the same
`gotestsummary` invocation, writes the same markdown to
`GITHUB_STEP_SUMMARY`, and uploads the same NDJSON artifact. The PATH
shim is gone; gotestsum's native `GOTESTSUM_JSONFILE` env variable is
used instead, plumbed through the `test-go-pg` composite.
`test-go-pg` gains three optional inputs:
- `gotestsum-json-file` — explicit JSON file path (or `default` for
`${RUNNER_TEMP}/go-test.json`)
- `run-regex` — passed to `go test -run`
- `test-shuffle` — passed to `go test -shuffle`
All three have safe defaults so existing callers are unaffected.
No observable change in CI behavior: the three existing test-go-pg jobs
continue to emit the same JSON, render the same failure summary, and
upload the same artifact.
Stacked under #25667, which uses the new composite and inputs to power a
new flake-detector workflow.
_Disclosure: produced with Claude Opus 4.7_
AI Gateway only supports Anthropic (+Bedrock), OpenAI, and Copilot providers at present. All other types (Vercel, Gemini, etc) will be mapped to OpenAI since they support OpenAI-compatible endpoints.
Makes `MsgQueue` exported, so it can be used in pubsub implementations outside PGPubsub.
## Summary
Three small docs fixes:
- **`docs/admin/integrations/oauth2-provider.md`**: Replace broken
relative link to `scripts/oauth2/README.md` with an absolute GitHub URL.
The previous link escaped the `docs/` tree
(`../../../scripts/oauth2/README.md`) and does not resolve in the
published docs site.
- **`docs/install/releases/feature-stages.md`**: Point the "Coder
documentation" link to `docs/about/contributing/documentation.md`. The
previous `../../README.md` target does not exist under `docs/`.
- **`docs/manifest.json`**: Add the missing `users oidc-claims` entry
alongside the other `users` CLI subcommands so the generated reference
page (`docs/reference/cli/users_oidc-claims.md`) is reachable from the
sidebar.
## Validation
- Confirmed each new link target exists on `main`
(`docs/about/contributing/documentation.md`, `scripts/oauth2/README.md`,
`docs/reference/cli/users_oidc-claims.md`).
- Pre-commit hooks pass (`fmt/markdown`, `lint/markdown`, `lint/emdash`,
`lint/typos`, etc.).
---
_This PR was prepared by a [Coder Agents](https://coder.com/) session on
behalf of @nickvigilante. Human review requested since this is a
docs-only change._
Fixes CODAGT-503
- Add failing-first coverage for manual title generation with missing
message `api_key_id`, with both context fallback and fail-closed cases.
- Set `aibridge.WithDelegatedAPIKeyID(ctx, apiKey.ID)` in
`regenerateChatTitle` and `proposeChatTitle`.
- In `generateManualTitleCandidate`, fall back to
`aibridge.DelegatedAPIKeyIDFromContext(ctx)` only when
`modelBuildOptionsFromMessages` yields an empty `ActiveAPIKeyID`.
- Keep `modelBuildOptionsFromMessages` pure and leave automatic title
generation unchanged.
For `vercel`, `openrouter`, and `openai-compat`, the
`<provider>/<model>` slash is part of the upstream model ID rather than
a hint. `ResolveModelWithProviderHint` was running
`parseCanonicalModelRef` before honoring `providerHint`, so a config
like `(provider=vercel, model=anthropic/claude-4-5-sonnet)` resolved to
`provider=anthropic, model=claude-4-5-sonnet` and the prefix-less model
name was forwarded to Vercel, which returned `Model 'claude-4-5-sonnet'
not found`.
Honor an explicit gateway provider hint before attempting canonical-ref
parsing. Non-gateway hints (anthropic, openai, etc.) keep the existing
canonical-ref-first behavior so `anthropic/claude-...` still has its
prefix stripped when routed directly to Anthropic.
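
The ordering can be sketched as follows. This is a simplified illustration, not the real `ResolveModelWithProviderHint`: `resolveModel` is a hypothetical helper, and the canonical-ref parse is reduced to a plain split on the first slash without validating the provider name.

```go
package main

import (
	"fmt"
	"strings"
)

// gatewayProviders lists the hints for which the slash is part of the
// upstream model ID.
var gatewayProviders = map[string]bool{
	"vercel":        true,
	"openrouter":    true,
	"openai-compat": true,
}

// resolveModel honors a gateway hint first and only then falls back to
// canonical "<provider>/<model>" parsing.
func resolveModel(hint, model string) (provider, upstream string) {
	if gatewayProviders[hint] {
		// Gateway route: forward the model ID untouched.
		return hint, model
	}
	// Non-gateway route: canonical-ref-first, so the prefix is stripped.
	if prefix, rest, ok := strings.Cut(model, "/"); ok && prefix != "" && rest != "" {
		return prefix, rest
	}
	return hint, model
}

func main() {
	fmt.Println(resolveModel("vercel", "anthropic/claude-4-5-sonnet"))
	// prints "vercel anthropic/claude-4-5-sonnet"
	fmt.Println(resolveModel("anthropic", "anthropic/claude-4-5-sonnet"))
	// prints "anthropic claude-4-5-sonnet"
}
```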
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
## Problem
Follow-on to:
- https://github.com/coder/coder/pull/25089
`coder exp sync start` still printed a generic success message when the
unit was ready on the first status check. That hid whether the unit had
no dependencies or had dependencies that were already satisfied before
`sync start` ran.
Before:
```text
Success
```
## Solution
Print explicit startup output for both ready-at-first-check cases.
After, dependencies already satisfied:
```text
Unit "test-unit" started immediately, dependencies already satisfied: [dep-unit, dep-unit-2]
```
After, no dependencies:
```text
Unit "test-unit" started with no dependencies
```
The existing waiting path is unchanged and still reports the
dependencies while waiting and after waiting finishes.
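
For reference, a small sketch of how the two ready-at-first-check messages could be produced. `startMessage` is a hypothetical helper written for this description, not the function in the CLI.

```go
package main

import (
	"fmt"
	"strings"
)

// startMessage picks the output for a unit that is ready on the first
// status check, depending on whether it declared any dependencies.
func startMessage(unit string, deps []string) string {
	if len(deps) == 0 {
		return fmt.Sprintf("Unit %q started with no dependencies", unit)
	}
	return fmt.Sprintf(
		"Unit %q started immediately, dependencies already satisfied: [%s]",
		unit, strings.Join(deps, ", "),
	)
}

func main() {
	fmt.Println(startMessage("test-unit", nil))
	fmt.Println(startMessage("test-unit", []string{"dep-unit", "dep-unit-2"}))
}
```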
Co-authored-by: Sas Swart <sas.swart.cdk@gmail.com>
Previously the in-process aibridge daemon and the enterprise aibridgeproxy daemon both snapshotted their provider routing once at boot. Any `ai_providers` or `ai_provider_keys` mutation required a restart for either to pick it up.
Add an `ai_providers_changed` pubsub channel that the CRUD handlers publish on after Create / Update / Delete. Both daemons subscribe:
- **aibridged** rebuilds its `[]aibridge.Provider` snapshot via `BuildProviders` and swaps it into the pool atomically. Inflight requests keep serving against the bridge they already acquired; new acquires build against the new snapshot. Per-provider construction errors stay scoped to the offending row.
- **aibridgeproxyd** rebuilds its routing snapshot from `GetAIProviders` and swaps the host→provider map atomically. The MITM listener picks up new providers without restart.
DB read for aibridgeproxyd uses the existing `AsAIProviderMetadataReader` subject for routing-only access.
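
A minimal sketch of the atomic swap, assuming a plain host-to-provider map. The `router` and `snapshot` types are hypothetical and much simpler than the real daemons; the point is that readers holding an old snapshot are unaffected by a reload.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// snapshot is an immutable routing table; it is never mutated after
// being published.
type snapshot struct {
	hosts map[string]string // host -> provider name
}

// router publishes snapshots through an atomic pointer so lookups never
// take a lock and never observe a half-built table.
type router struct {
	current atomic.Pointer[snapshot]
}

// reload builds a fresh snapshot and swaps it in. In the real daemons
// this would run from the pubsub subscription callback.
func (r *router) reload(hosts map[string]string) {
	copied := make(map[string]string, len(hosts))
	for host, provider := range hosts {
		copied[host] = provider
	}
	r.current.Store(&snapshot{hosts: copied})
}

// lookup resolves a host against whichever snapshot is current.
func (r *router) lookup(host string) (string, bool) {
	s := r.current.Load()
	if s == nil {
		return "", false
	}
	provider, ok := s.hosts[host]
	return provider, ok
}

func main() {
	r := &router{}
	r.reload(map[string]string{"api.openai.com": "openai"})

	inflight := r.current.Load() // an in-flight request keeps its snapshot
	r.reload(map[string]string{"api.anthropic.com": "anthropic"})

	_, stillThere := inflight.hosts["api.openai.com"]
	_, visible := r.lookup("api.openai.com")
	fmt.Println(stillThere, visible) // prints "true false"
}
```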
Fixes CODAGT-484.
- Removed "quota", "billing", "insufficient_quota", "payment required"
from `authStrongPatterns`
- Added `usageLimitPatterns` slice with those patterns
- Added `usageLimitMatch` signal and rule between overloaded and
authStrong in priority
- Added terminal/retry messages for `ChatErrorKindUsageLimit`
- Simplified auth message (removed billing reference)
- Frontend: conditional `!usageLimitStatus.provider` guard on the "View
Usage" Alert
- Added `TestClassify_UsageLimitBeatsAuth` with 5 cases including a real
production OpenAI error
- Added `ProviderQuotaExceeded` story asserting no "View Usage" link and
correct `ChatStatusCallout` rendering
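
The priority change can be illustrated with a small sketch. Only the four usage-limit patterns come from this PR; the overloaded and auth patterns, the returned kind strings, and the `classify` helper are hypothetical simplifications of the real classifier.

```go
package main

import (
	"fmt"
	"strings"
)

var (
	// Hypothetical example pattern for the overloaded rule.
	overloadedPatterns = []string{"overloaded"}
	// Patterns moved out of authStrongPatterns by this PR.
	usageLimitPatterns = []string{"quota", "billing", "insufficient_quota", "payment required"}
	// Hypothetical example patterns for the strong-auth rule.
	authStrongPatterns = []string{"unauthorized", "invalid api key"}
)

func containsAny(s string, patterns []string) bool {
	for _, p := range patterns {
		if strings.Contains(s, p) {
			return true
		}
	}
	return false
}

// classify applies the rules in priority order: overloaded first, then
// usage limit, then strong auth.
func classify(msg string) string {
	lower := strings.ToLower(msg)
	switch {
	case containsAny(lower, overloadedPatterns):
		return "overloaded"
	case containsAny(lower, usageLimitPatterns):
		return "usage_limit"
	case containsAny(lower, authStrongPatterns):
		return "auth"
	default:
		return "generic"
	}
}

func main() {
	// A message matching both auth and quota patterns is a usage limit.
	fmt.Println(classify("401 Unauthorized: insufficient_quota")) // prints "usage_limit"
}
```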
> Generated with [Coder Agents](https://coder.com/agents)
> [!WARNING]
> The investigation and solution in this PR were done with
[Mux](https://mux.coder.com/). I've reviewed the investigation
methodology, evidence and solution, and it all appears sound.
## Summary
PR #25570 (`refactor: move aibridged out of enterprise to AGPL`, merged
2026-05-22) added an in-memory aibridge DRPC server in
`coderd/aibridged.go` that does `api.WebsocketWaitGroup.Add(1)` and only
releases `Done()` when its client session is closed. PR #25575 then
flipped `CODER_AI_GATEWAY_ENABLED` to default to `true`, so every
`cli.Server()` invocation now spins up that goroutine.
In `cli/server.go`, the only call to `aibridgeDaemon.Close()` was a
`defer` scheduled at function return. During graceful shutdown the code
first calls `coderAPICloser.Close()`, which waits on
`api.WebsocketWaitGroup`. That wait sits for the full 10s timeout in
`coderd/coderd.go` (`websocket shutdown timed out after 10 seconds`),
then returns, then the function unwinds, and only then does the deferred
`aibridgeDaemon.Close()` fire and let the goroutine call `Done()`.
The 10s tax was previously latent (aibridged was enterprise-only and
opt-in). After the two May 22 PRs it hit every `cli.Server()` test. On
Linux/macOS CI it just makes the suite slower; on the Depot Windows
runner, the ramdisk reservation leaves only ~17 GiB of headroom and the
~10s shutdown tails of multiple concurrent package binaries overlap into
an OOM, presenting as `test-go-pg (windows-2022)` jobs that die silently
at the ~600s watchdog with an empty `steps` array.
See Slack:
https://codercom.slack.com/archives/C05AE94121Z/p1779807717764189
## Fix
Close `aibridgeDaemon` explicitly during graceful shutdown, **before**
`coderAPICloser.Close()` waits on the WebSocket wait group. This matches
the existing ordered-shutdown pattern used for `tunnel` and
`notificationsManager`. The deferred `aibridgeDaemon.Close()` is
retained as a safety net for early-return paths, and is safe to
double-call because `aibridged.Server.Close()` is already idempotent via
`shutdownOnce` in `coderd/aibridged/aibridged.go`.
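
A stripped-down model of the ordering and the idempotent close, under the assumption that the daemon holds one slot in the API's wait group. The `api` and `daemon` types here are hypothetical and only mimic the shape of the real code.

```go
package main

import (
	"fmt"
	"sync"
)

// api stands in for the coderd API and its WebSocket wait group.
type api struct {
	wg sync.WaitGroup
}

// daemon stands in for the in-memory aibridge daemon; it holds one slot
// in the API's wait group until it is closed.
type daemon struct {
	api    *api
	once   sync.Once
	closes int
}

func (a *api) startDaemon() *daemon {
	a.wg.Add(1)
	return &daemon{api: a}
}

// Close is idempotent: only the first call releases the wait group.
func (d *daemon) Close() {
	d.once.Do(func() {
		d.closes++
		d.api.wg.Done()
	})
}

// Close waits for every registered goroutine, like the wait on the
// WebSocket wait group during graceful shutdown.
func (a *api) Close() {
	a.wg.Wait()
}

func main() {
	a := &api{}
	d := a.startDaemon()
	defer d.Close() // safety net for early returns; a no-op here

	d.Close() // explicit close first, so the wait below returns at once
	a.Close()
	fmt.Println("daemon released", d.closes, "time(s)") // prints "daemon released 1 time(s)"
}
```

Calling `a.Close()` before `d.Close()` in this sketch would block forever; in the real server the same inversion only ends at the 10s timeout.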
## Regression test
`TestServer_AIGatewayShutdownOrdering` boots a real `coder server` with
`--ai-gateway-enabled=true`, cancels its context, and asserts graceful
shutdown finishes in under 8s. With the fix the test runs in ~0.1s;
without the fix it fails deterministically at ~10.0s. The flag is passed
explicitly so the test continues to guard the ordering even if the
deployment default is ever flipped back.
## Evidence this fixes the OOM
On Linux the patched `cli` test package drops from 114 s back to its
pre-regression 30 s wall time at the same single-process peak RSS (~7.6
GiB), and the `websocket shutdown timed out after 10 seconds` log line
disappears from every server-test run. Since the Windows OOM is the sum
of multiple concurrent 10 s shutdown tails overlapping past the runner's
~17 GiB headroom, removing those tails returns the concurrent-RSS budget
to its pre-regression level. The Windows OOM was intermittent (a handful
of hits across many runs since May 22), so a single green `test-go-pg
(windows-2022)` job on this PR is not by itself proof; confirmation will
come from watching Windows runs on `main` over the next several days and
seeing the ~600 s silent-kill fingerprint stop recurring.
Relates to ENG-2771
Replaces the linear progress bars and text labels in the sidebar footer
usage trigger with SVG donut ring charts that show the section icon
centered inside each ring.
## Changes
- **`SvgRingProgress`**: shared SVG component used by both
`UsageIndicator` and `ContextUsageIndicator`
- Ring colors follow the existing severity system
(normal/warning/exceeded)
- Hover tooltips show "Spend $12.50" and "Workspaces 30/100"
- Dropdown menu content unchanged; full usage details still appear on
click
- Removed dead `summaryValue` field and `size="compact"` variant
- Updated stories to cover ring trigger rendering and dropdown usage
details
> Generated by Coder Agents on behalf of @tracyjohnsonux
Replaces the blocking Dialog modal setup notice with a context-aware
inline banner above the chat input, with different messaging for admins
and members.
## Inline notice banner
The `AgentSetupNotice` component now renders as a `bg-surface-tertiary`
inline box instead of an unclosable `Dialog` modal. The notice sits
above the chat composer using negative margin overlap, and the composer
is forced opaque (`bg-surface-secondary`) when the notice is present so
the banner doesn't bleed through the semi-transparent desktop
background.
Four states based on role and configuration:
- **Admin, no providers or models**: links to both provider and model
setup
- **Admin, missing provider only**: link to provider setup
- **Admin, has providers but no models**: link to model setup only
- **Member, no models available**: generic "your admin is still getting
things set up" message
The admin/member distinction is determined via
`permissions.editDeploymentConfig` and applied in both `AgentChatPage`
and `AgentCreatePage`.
## Conflict resolution notes
During merge with main, the following were adapted:
- Sidebar filter props updated to main's
`sidebarFilters`/`onSidebarFiltersChange` pattern (replacing old
`archivedFilter`)
- Accepted `Sidebar/` -> `ChatsSidebar/` directory refactor from main
- Dropped `hasArchivedChats` query (its sidebar consumer was removed in
the refactor)
- Provider link updated to `/ai/settings` (new AI settings page)
> Generated with the assistance of Coder Agents on behalf of
@tracyjohnsonux
---------
Co-authored-by: jaaydenh <jaaydenh@users.noreply.github.com>
Move docs linting into the required CI umbrella and reuse the existing
`changes` job so docs lint runs when docs or CI files change, plus on
`main` as a backstop.
This is motivated by the docs lint failures on #25601. That PR touched
`.claude/docs/TESTING.md`; the standalone `Docs CI` workflow picked it
up because `docs-ci.yaml` used broad `**.md` matching, but local `pnpm
lint-docs` and `make lint` did not catch the same file because they only
scanned `docs/**` plus root `*.md`. The first failed Docs CI run
reported markdownlint errors in `.claude/docs/TESTING.md` (`MD040` and
`MD031`), and the next run reported a markdown table formatter failure
in the same file.
That mismatch is why this PR exists: prevent unrelated PRs from being
surprised by stale `.claude/docs/**` lint drift only after they happen
to touch one of those files. The local docs scripts now include
`.claude/docs/**`, and the old standalone `Docs CI` workflow is removed
so we do not maintain separate path-filter logic outside the required CI
workflow.
> Generated by mux, but reviewed by a human
Update the user secrets user guide, the admin security secrets
reference, and the docs manifest to label the feature as Beta instead of
Early Access, and link to the beta section of the feature stages doc.
Add a Postgres trigger and matching codersdk constants that cap each
user's secrets in four dimensions: count (50), total stored value bytes
(200 KiB), env-injected stored value bytes (24 KiB), and env name length
(256 bytes). Without these caps a user could overflow the 4 MiB DRPC
agent manifest, the ~32 KiB Windows process env
block, or Linux/macOS ARG_MAX at workspace start. The trigger is the
source of truth on aggregates; the handler maps its check_violation
error into a 400 that names the per-user budget in stored
(post-encryption) bytes. A handler test exercises off-by-one at each cap
across POST and PATCH, plus per-user budget isolation.
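
To make the four caps concrete, here is an illustrative Go sketch of the same aggregate checks. It is not the trigger or the handler: the constant names, the `secret` type, and `checkBudget` are hypothetical, and treating each cap as inclusive is an assumption.

```go
package main

import "fmt"

// Caps as described in this PR; the names are hypothetical.
const (
	maxSecretsPerUser  = 50
	maxTotalValueBytes = 200 << 10 // 200 KiB
	maxEnvValueBytes   = 24 << 10  // 24 KiB
	maxEnvNameBytes    = 256
)

// secret is a hypothetical, trimmed-down row.
type secret struct {
	EnvName string // empty when the secret is not injected as an env var
	Value   []byte // stored (post-encryption) value
}

// checkBudget enforces the per-user caps over the full set of secrets.
func checkBudget(secrets []secret) error {
	if len(secrets) > maxSecretsPerUser {
		return fmt.Errorf("too many secrets: %d > %d", len(secrets), maxSecretsPerUser)
	}
	total, env := 0, 0
	for _, s := range secrets {
		if len(s.EnvName) > maxEnvNameBytes {
			return fmt.Errorf("env name too long: %d > %d bytes", len(s.EnvName), maxEnvNameBytes)
		}
		total += len(s.Value)
		if s.EnvName != "" {
			env += len(s.Value)
		}
	}
	if total > maxTotalValueBytes {
		return fmt.Errorf("stored values too large: %d > %d bytes", total, maxTotalValueBytes)
	}
	if env > maxEnvValueBytes {
		return fmt.Errorf("env-injected values too large: %d > %d bytes", env, maxEnvValueBytes)
	}
	return nil
}

func main() {
	over := []secret{{EnvName: "TOKEN", Value: make([]byte, maxEnvValueBytes+1)}}
	fmt.Println(checkBudget(over))
}
```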
Generated with help from Coder Agents.
Adds an end-to-end enterprise CLI test to ensure legacy AI provider keys seeded at server startup are encrypted at rest when DBCrypt external token encryption is enabled, preventing regressions related to #25699.
> Partially implemented by Coder Agents, and massaged afterwards by me.
## Summary
Wraps external auth token refresh in an exponential-backoff retry so a
brief upstream hiccup (5xx, network timeout, rate-limited 429) no longer
surfaces as an `InvalidTokenError` and forces users to re-authenticate.
GitHub in particular has been flaky enough lately that this is hitting
real users.
## Behavior
- `(*Config).RefreshToken` now calls a helper that retries the
`TokenSource.Token()` exchange with exponential backoff (250ms → 2s),
bounded by a 10s total budget.
- Errors classified as permanent by `isFailedRefresh` (e.g.
`bad_refresh_token`, `invalid_grant`, `unauthorized_client`, ...) skip
the retry loop. Retrying a permanent failure wastes the refresh quota
and, on providers with single-use refresh tokens, can mask a legitimate
concurrent winner with repeated `bad_refresh_token` responses.
- Refreshes with an empty refresh token still short-circuit without
making an API call.
- The existing concurrent-refresh-race detection and optimistic-lock
paths are unchanged.
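
The retry shape described above, as a self-contained sketch. It does not use the actual helper or the retry library from this PR: `refreshWithBackoff`, `simulate`, and the two sentinel errors are hypothetical, and the millisecond delays are chosen only to keep the example fast.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

var (
	errPermanent = errors.New("bad_refresh_token") // never retried
	errTransient = errors.New("upstream 503")      // retried with backoff
)

// refreshWithBackoff retries exchange with doubling delays capped at
// maxDelay, stops when the total budget expires, and returns
// immediately on a permanent error.
func refreshWithBackoff(
	ctx context.Context,
	initial, maxDelay, budget time.Duration,
	exchange func() (string, error),
) (token string, attempts int, err error) {
	ctx, cancel := context.WithTimeout(ctx, budget)
	defer cancel()

	delay := initial
	for {
		attempts++
		token, err = exchange()
		if err == nil || errors.Is(err, errPermanent) {
			return token, attempts, err
		}
		select {
		case <-ctx.Done():
			return "", attempts, err // budget exhausted; report last error
		case <-time.After(delay):
		}
		delay *= 2
		if delay > maxDelay {
			delay = maxDelay
		}
	}
}

// simulate runs a refresh against a fake IDP that fails transiently a
// fixed number of times, or permanently on every call.
func simulate(transientFailures int, permanent bool) (int, error) {
	calls := 0
	exchange := func() (string, error) {
		calls++
		switch {
		case permanent:
			return "", errPermanent
		case calls <= transientFailures:
			return "", errTransient
		}
		return "token", nil
	}
	_, attempts, err := refreshWithBackoff(
		context.Background(), time.Millisecond, 4*time.Millisecond, time.Second, exchange)
	return attempts, err
}

func main() {
	fmt.Println(simulate(3, false)) // prints "4 <nil>"
	fmt.Println(simulate(0, true))  // prints "1 bad_refresh_token"
}
```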
## Tunables
Three new `time.Duration` fields on `externalauth.Config`
(`RefreshRetryInitialBackoff`, `RefreshRetryMaxBackoff`,
`RefreshRetryTimeout`) let callers override the defaults. They default
to zero, which falls back to the package defaults, so existing call
sites are unaffected. The fields exist primarily so tests can dial the
timing way down without touching package globals (and therefore without
serializing parallel tests).
## Tests
- `TestRefreshToken/RefreshRetries` now disables internal retries via
`RefreshRetryTimeout = time.Nanosecond` so its existing "1 IDP call per
`RefreshToken` invocation" assertion still holds. Otherwise its
assertions are unchanged.
- New `TestRefreshToken/RefreshTokenWithBackoff` simulates 3 transient
5xx failures followed by success and verifies the refresh ultimately
succeeds with 4 total IDP attempts.
- New `TestRefreshToken/RefreshTokenBackoffPermanentError` returns
`bad_refresh_token` and verifies the refresh is **not** retried even
with a generous 1s budget.
<details>
<summary>Why the explicit <code>retryCtx.Err()</code> guard?</summary>
`retry.Retrier.Wait` `select`s between `time.After(delay)` and
`ctx.Done()`. The first call has `delay == 0`, so `time.After(0)` and an
already-cancelled context both fire immediately and Go picks the case
nondeterministically. Without the guard, a near-zero retry budget would
still trigger an unwanted extra refresh attempt roughly half the time,
which would have made the `RefreshRetries` test flaky.
</details>
This PR was opened by a Coder agent on behalf of @kylecarbs.
## Summary
Routes chatd model calls backed by concrete AI Provider rows through the
in-process aibridge transport by default, with deployment options to use
direct provider routing when AI Gateway is disabled or chat AI Gateway
routing is disabled.
- Splits model routing into common, direct provider, and AI Gateway
paths behind a single deployment-mode entry point.
- Builds chatd models through explicit request, route, and options data.
Active API key attribution is passed explicitly instead of being hidden
inside generic model construction.
- For AI Gateway BYOK routes, resolves the user's provider key in chatd,
forwards it through provider-specific auth headers, and sets
`X-Coder-AI-Governance-Token` to the `delegated` marker so aibridge
preserves those headers while still stripping Coder-specific metadata.
- Keeps central provider credentials and deployment fallback credentials
out of forwarded provider auth headers, so AI Gateway central policy
remains authoritative.
- Redacts delegated provider auth from default string formatting to
avoid accidental plaintext logging of user BYOK credentials.
- Covers selected chat models, advisor overrides, title and quickgen
paths, subagent overrides, computer use model selection, and an
integration-style chat turn through the aibridge transport path.
- Persists initiating API key IDs on chat and queued user messages,
including subagent child messages, and fails closed for AI
Gateway-routed model builds without an active key.
- Removes unused `api_key_id` indexes while keeping the persistence
columns and foreign keys.
- Keeps the deployment option available through config and env parsing,
but hides it from CLI help and generated docs.
- Stabilizes the subagent poll fallback test so background CreateChat
processing cannot win the state transition under slower CI environments.
## Tests
- `go test ./coderd/x/chatd -run
'TestAIGatewayProviderAuthForUser|TestAIGatewayProviderAuthRedactsFormatting|TestResolveModelRouteForConfigAIGatewayProviderAuth|TestAIGatewayModelForwardsProviderAuth|TestProcessChat_AIGatewayRoutingUsesDelegatedAPIKey|TestAwaitSubagentCompletion'
-count=1`
- `go test ./coderd/aibridged -run
'TestServeHTTP_DelegatedAPIKey|TestServeHTTP_StripCoderToken' -count=1`
- `git diff --check HEAD~1..HEAD`
- `make lint`
> Mux working on behalf of Mike.
## Problem
Two related symptoms of the same architectural issue: the `dbcrypt`
wrapper is installed inside `enterprise/coderd.New`, so any access to
`options.Database` that happens before `newAPI` runs bypasses
encryption.
**Symptom 1 (reads):** Provider keys added via the admin UI are
encrypted at rest. `BuildProviders` was running *before* `newAPI`,
against the unwrapped store, so the ciphertext was read as-is and shoved
into the keypool as the upstream credential. Anthropic/OpenAI reject it,
and the interception log shows:
```
coderd.aibridged.pool: interception failed ... error="all configured keys failed authentication"
credential_kind=centralized credential_hint=PaPb...4A== credential_length=184
```
**Symptom 2 (writes):** `SeedAIProvidersFromEnv` was also running before
`newAPI`, against the unwrapped store, so env-derived keys
(`CODER_AIBRIDGE_OPENAI_KEY`, indexed `CODER_AIBRIDGE_PROVIDER_<N>_KEY`,
etc.) landed in `ai_provider_keys` as plaintext with `ApiKeyKeyID =
null` even when `CODER_EXTERNAL_TOKEN_ENCRYPTION_KEYS` was set.
## Fix
Move both `SeedAIProvidersFromEnv` and `BuildProviders` to after
`newAPI`, where `options.Database` is the dbcrypt-wrapped store. Writes
encrypt correctly; reads decrypt correctly.
The enterprise closure (`enterprise/cli/server.go`) runs *inside*
`newAPI` and calls `BuildProviders` for the aibridgeproxyd at that
point. Once the agpl seed moves to after `newAPI`, the proxy on first
boot would see no env-seeded providers. Add a matching seed call inside
the enterprise closure before its `BuildProviders` to cover that case.
Seeding is idempotent, so the agpl-side seed running again post-`newAPI`
is a no-op when the rows already exist.
## Known shortcomings
The clean version of this fix would just inherit `ctx` like every other
startup step and place these calls naturally. It can't, for two reasons
that are both about the surrounding handler architecture rather than
this change:
1. **`dbcrypt` wrapping is positioned inside `newAPI`, not around
`options.Database` at creation.** That's why both seed and build have to
wait until after `newAPI` in the first place. The principled fix is to
install the wrapper at the point the store is created (behind a hook the
enterprise build supplies), so every consumer sees a single
authoritative view and the ordering stops mattering. This would also
collapse the duplicated seed call back to a single site.
2. **The handler's shutdown sequence is not deferred.**
`coderAPICloser.Close()` and the other teardown steps run only if
control reaches the `select` at the bottom of the handler. An early
`return` from anywhere in Phase 1 (e.g. seed/build returning
`context.Canceled` when the user hits ctrl-c during startup) skips that
block and orphans all the goroutines `newAPI` spawned — tailnet workers,
gitsync, telemetry batcher, etc. `goleak` then catches them at package
teardown and `TestServer_TelemetryDisabled_FinalReport` fails. Moving
the shutdown into deferred closers (with a `sync.Once`-guarded close to
avoid double-close from the explicit Phase 2 call) is the principled
fix.
For this PR I took the smallest change that fixes the reported bugs: a
detached context (`context.WithoutCancel(ctx)` + a 30s timeout) at the
seed and build call sites in both the agpl and enterprise paths. It lets
the calls complete even if the user cancels during startup, after which
the handler reaches its shutdown select naturally and tears down through
Phase 2. Both shortcomings above are worth addressing separately.
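
The detached-context pattern looks roughly like this. The `detach` helper and the two demo functions are hypothetical names used for illustration; `context.WithoutCancel` requires Go 1.21 or newer.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// detach returns a context that keeps the parent's values but ignores
// its cancellation, bounded by its own timeout instead.
func detach(ctx context.Context, timeout time.Duration) (context.Context, context.CancelFunc) {
	return context.WithTimeout(context.WithoutCancel(ctx), timeout)
}

// survivesParentCancel shows that cancelling the parent (e.g. ctrl-c
// during startup) does not cancel the detached context.
func survivesParentCancel() bool {
	parent, cancelParent := context.WithCancel(context.Background())
	child, cancelChild := detach(parent, 30*time.Second)
	defer cancelChild()

	cancelParent()
	return parent.Err() != nil && child.Err() == nil
}

// expiresOnTimeout shows that the detached context is still bounded by
// its own deadline.
func expiresOnTimeout() bool {
	child, cancel := detach(context.Background(), time.Millisecond)
	defer cancel()

	<-child.Done()
	return errors.Is(child.Err(), context.DeadlineExceeded)
}

func main() {
	fmt.Println(survivesParentCancel(), expiresOnTimeout()) // prints "true true"
}
```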
## Test plan
- `make test RUN=TestServer_TelemetryDisabled_FinalReport` with `-race`;
passes locally with `-count=3`.
- Manually verified on a deployment with
`CODER_EXTERNAL_TOKEN_ENCRYPTION_KEYS` set and env-configured providers:
`ai_provider_keys.api_key_key_id` is populated, `api_key` is base64
ciphertext, and upstream auth succeeds.
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
## Problem
When visiting `/ai/settings/governance`, both **AI Governance** and
**Providers** items in the AI settings subnav appear highlighted as
active.
## Cause
`SettingsSidebarNavItem` is built on react-router's `<NavLink>`, which
by default treats a link as active when the current URL **starts with**
the link's `to` path. Since `/ai/settings/governance` starts with
`/ai/settings`, the Providers item is also marked active.
## Fix
Pass `end` on the Providers nav item so it only matches when the path is
exactly `/ai/settings` (the index route). The `SettingsSidebarNavItem`
component already supports this prop for exactly this case.
Co-authored-by: blink-so[bot] <211532188+blink-so[bot]@users.noreply.github.com>
> 🤖 Generated with [Coder Agents](https://coder.com/agents) on behalf of
@tracyjohnsonux
Updates the providers page description to explain that providers power
Coder Agents, AI Gateway, and other LLM features. Adds a "Manage
deployment-wide BYOK" link to the docs.
Uses `<Link>` component and `docs()` helper per project conventions.
Adds the account settings UI for managing user secrets, including the
table, add/edit/delete dialog, Storybook coverage, and route/sidebar
entry.
Also updates the shared `FeatureStageBadge` beta variant with
dedicated beta styling, sizing, and label casing for the Secrets
page.
Stacked on #25370.
_This PR was generated by Coder Agents._
The `multi-select` form type description in the dynamic parameters docs
incorrectly stated it renders checkboxes. The actual UI is a searchable
dropdown combobox (`MultiSelectCombobox`) with selected items shown as
removable chips.
> This PR was authored by Coder Agents on behalf of @uzair-coder07.
Previously we were only extracting the API key when _not_ delegating
auth; this is incorrect.
We need to extract the key _always_ when BYOK is intended.
---------
Signed-off-by: Danny Kopping <danny@coder.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
> 🤖 This PR was written by Coder Agents on behalf of Jake Howell.
Linear: [DEVEX-355](https://linear.app/coder/issue/DEVEX-355)
Fifth and final PR in a 5-PR stack splitting #25328. Surfaces the AI
settings section in the dashboard chrome and moves the existing AI
Governance page out of `/deployment`.
- `Navbar` / `NavbarView` / `DeploymentDropdown` gain a
`canViewAISettings` prop sourced from the `viewAnyAIProvider` permission
added in PR 2. The deployment dropdown gets a new AI entry that links to
`/ai/settings`.
- `DeploymentSidebarView` drops the AI-related entries that now live
under `/ai/settings`.
- `AISettingsSidebarView` expands to include AI Governance and a
cross-section link to Manage Coder Agents.
- `router.tsx` removes the `/deployment/ai-governance` route and mounts
the matching `/ai/settings/governance` child route under the new AI
settings layout.
- `ChatsSidebar` settings panel repoints the Providers link from
`/deployment/ai-providers` to `/ai/settings`.
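
As a rough illustration of the permission gating described above, the sketch
below adds the AI entry only when `canViewAISettings` is true. The
`Permissions` and `DropdownEntry` shapes, the `buildDeploymentDropdown`
helper, and the entry labels are assumptions for this example; only
`viewAnyAIProvider`, `canViewAISettings`, and the `/ai/settings` target come
from this PR.

```typescript
// Hypothetical shapes: only `viewAnyAIProvider`, `canViewAISettings`, and
// the `/ai/settings` link target are taken from the PR description.
interface Permissions {
  viewAnyAIProvider: boolean;
}

interface DropdownEntry {
  label: string;
  href: string;
}

function buildDeploymentDropdown(
  baseEntries: DropdownEntry[],
  canViewAISettings: boolean,
): DropdownEntry[] {
  // Copy so the caller's array is never mutated.
  const entries = baseEntries.slice();
  if (canViewAISettings) {
    // Assumed label; the description only says "a new AI entry".
    entries.push({ label: "AI", href: "/ai/settings" });
  }
  return entries;
}

// The prop is sourced from the permission check.
const examplePermissions: Permissions = { viewAnyAIProvider: true };
const exampleEntries = buildDeploymentDropdown(
  [{ label: "Deployment", href: "/deployment" }],
  examplePermissions.viewAnyAIProvider,
);
```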
<details>
<summary>Stack</summary>
1. #25579 jakehwll/DEVEX-355/01-primitives, primitives
2. #25580 jakehwll/DEVEX-355/02-api, API client and query layer
3. #25581 jakehwll/DEVEX-355/03-components, provider form components
4. #25583 jakehwll/DEVEX-355/04-pages, pages and routes
5. **jakehwll/DEVEX-355/05-section, section reshuffle (this PR)**
Replaces #25328 once the stack lands.
</details>
> 🤖 This PR was written by Coder Agents on behalf of Jake Howell.
Linear: [DEVEX-355](https://linear.app/coder/issue/DEVEX-355)
Fourth PR in a 5-PR stack splitting #25328. Wires the new `/ai/settings`
provider management UI.
- `AISettingsLayout` hosts the section under `/ai/settings` with a
sidebar outlet.
- `AISettingsSidebar(View)` shows a single "Providers" nav entry. The
remaining sidebar entries arrive with the broader AI settings section
reshuffle in the next PR.
- `ProvidersPage` lists configured AI providers via the queries added in
PR 2.
- `AddProviderPage` walks through provider-type selection and form
submission, with type-specific credential fields.
- `UpdateProviderPage` edits an existing provider with the same form
components.
- Storybook stories cover each view's loading, empty, populated, error,
and form states using the mock providers from `testHelpers/entities.ts`.
- `router.tsx` mounts the new `/ai/settings` layout with index, `add`,
and `:providerId` child routes. The `governance` child route lands
together with the dashboard navigation changes in the next PR.
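
The route layout from the last bullet can be sketched as a plain data
structure. `RouteSketch` and `listLeafPaths` are hypothetical names for this
example, and the pairing of each child route with a page component is an
assumption inferred from the bullets above; the real `router.tsx` is not
reproduced here.

```typescript
// Hypothetical, simplified route shape. The real router.tsx uses
// react-router routes, which this description does not show.
interface RouteSketch {
  path?: string;
  index?: boolean;
  page: string;
  children?: RouteSketch[];
}

// Assumed page-to-route pairing: index, `add`, and `:providerId`.
const aiSettingsRoutes: RouteSketch = {
  path: "/ai/settings",
  page: "AISettingsLayout",
  children: [
    { index: true, page: "ProvidersPage" },
    { path: "add", page: "AddProviderPage" },
    { path: ":providerId", page: "UpdateProviderPage" },
  ],
};

// Resolve the full URL pattern of every leaf route in the tree.
function listLeafPaths(route: RouteSketch, parentPath: string = ""): string[] {
  let ownPath = parentPath;
  if (!route.index && route.path) {
    // Absolute paths replace the parent; relative paths are appended.
    ownPath =
      route.path.charAt(0) === "/" ? route.path : `${parentPath}/${route.path}`;
  }
  if (!route.children || route.children.length === 0) {
    return [ownPath];
  }
  const paths: string[] = [];
  for (const child of route.children) {
    for (const childPath of listLeafPaths(child, ownPath)) {
      paths.push(childPath);
    }
  }
  return paths;
}

const aiSettingsPaths = listLeafPaths(aiSettingsRoutes);
```

With this tree, `aiSettingsPaths` contains `/ai/settings`,
`/ai/settings/add`, and `/ai/settings/:providerId`, in that order.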
Removes the now-unused knip ignore entries for
`src/api/queries/aiProviders.ts` and
`src/pages/AISettingsPage/ProvidersPage/components/addableProviderTypes.ts`,
and drops the matching `@lintignore` tags on `getProviderIcon` and
`MockAIProviders` since the pages and page stories now consume them.
<details>
<summary>Stack</summary>
1. #25579 jakehwll/DEVEX-355/01-primitives, primitives
2. #25580 jakehwll/DEVEX-355/02-api, API client and query layer
3. #25581 jakehwll/DEVEX-355/03-components, provider form components
4. **jakehwll/DEVEX-355/04-pages, pages and routes (this PR)**
5. jakehwll/DEVEX-355/05-section, section reshuffle
Replaces #25328 once the stack lands.
</details>
> 🤖 This PR was written by Coder Agents on behalf of Jake Howell.
Linear: [DEVEX-355](https://linear.app/coder/issue/DEVEX-355)
Third PR in a 5-PR stack splitting #25328. Adds the component-level
pieces used by the provider management pages landing in the next PR of
the stack.
- `ProviderForm` + `CredentialField` + a provider type-to-form mapping
for reading and editing the per-type credential and config fields, with
the form API map covered by unit tests.
- `ProviderIcon` resolves the bundled per-provider SVG icons and falls
back to a building glyph for unknown types.
- `ProviderRow` renders a single provider entry for the list view.
- `useUnsavedChangesPrompt` hook intercepts unsaved-form navigation.
- Storybook stories for `ProviderForm`, `ProviderIcon`, and
`ProviderRow` exercise each provider type and form state and consume the
mock providers from PR 2.
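
The icon fallback described for `ProviderIcon` can be pictured as a lookup
with a default. This is a sketch under stated assumptions:
`resolveProviderIcon`, the `providerIcons` table, its asset paths, and the
`"building"` glyph name are hypothetical stand-ins, not the component's
actual code.

```typescript
// Hypothetical icon table; keys and asset paths are illustrative only.
const providerIcons: Record<string, string> = {
  openai: "/icon/openai.svg",
  anthropic: "/icon/anthropic.svg",
  bedrock: "/icon/bedrock.svg",
};

// Assumed name for the generic building glyph used as the fallback.
const FALLBACK_GLYPH = "building";

type ResolvedIcon =
  | { kind: "svg"; src: string }
  | { kind: "glyph"; name: string };

function resolveProviderIcon(providerType: string): ResolvedIcon {
  // Own-property check so inherited keys like "toString" never match.
  if (Object.prototype.hasOwnProperty.call(providerIcons, providerType)) {
    return { kind: "svg", src: providerIcons[providerType] };
  }
  // Unknown provider types fall back to the generic glyph.
  return { kind: "glyph", name: FALLBACK_GLYPH };
}

const knownIcon = resolveProviderIcon("anthropic");
const unknownIcon = resolveProviderIcon("some-new-provider");
```

Keeping the fallback inside the resolver means callers such as a list row
never have to handle a missing icon themselves.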
Stories now consume `MockAIProviderOpenAI` / `Anthropic` / `Bedrock` so
their per-mock `@lintignore` tags are removed; the `MockAIProviders`
aggregate and the `addableProviderTypes` / `aiProviders` query modules
keep their exclusions for the page stories in the next PR.
<details>
<summary>Stack</summary>
1. #25579 jakehwll/DEVEX-355/01-primitives, primitives
2. #25580 jakehwll/DEVEX-355/02-api, API client and query layer
3. **jakehwll/DEVEX-355/03-components, provider form components (this
PR)**
4. jakehwll/DEVEX-355/04-pages, pages and routes
5. jakehwll/DEVEX-355/05-section, section reshuffle
Replaces #25328 once the stack lands.
</details>