Commit Graph

14542 Commits

Author SHA1 Message Date
Cian Johnston 0a73ec6a50 feat(site/src/pages/AgentsPage): show error details for generic errors (#25803)
Error messages in agent chat now expose the actual error detail
instead of hiding it entirely. Also captures API response detail
for generic errors that previously dropped it.

(cherry picked from commit 78d556fffc)
2026-06-02 12:23:59 +01:00
github-actions[bot] 26c035d742 fix(site): show condensed count for multi-provider in sessions list (#25705) (#25932)
Cherry-pick of https://github.com/coder/coder/pull/25705

Original PR: #25705 — fix(site): show condensed count for multi-provider
in sessions list
Merge commit: fc01aeeb0f
Requested by: @tracyjohnsonux

Co-authored-by: TJ <tracy@coder.com>
2026-06-01 14:09:56 -04:00
github-actions[bot] 01766e9694 docs: document chat sharing (#25592) (#25927)
Cherry-pick of https://github.com/coder/coder/pull/25592

Original PR: #25592 — docs: document chat sharing
Merge commit: 372265a0b5
Requested by: @david-fraley

Co-authored-by: Danielle Maywood <danielle@themaywoods.com>
2026-06-01 13:42:21 -04:00
github-actions[bot] f4bf286deb docs: document AI providers seeding mechanism & support for new types (#25855) (#25906)
Cherry-pick of https://github.com/coder/coder/pull/25855

Original PR: #25855 — docs: document AI providers seeding mechanism &
support for new types
Merge commit: f9937a8931
Requested by: @dannykopping

---------

Co-authored-by: Danny Kopping <danny@coder.com>
Co-authored-by: Susana Ferreira <susana@coder.com>
2026-06-01 13:41:19 -04:00
github-actions[bot] ec2d20a7f1 feat: support adding GitHub Copilot AI provider via UI (#25888) (#25902)
Cherry-pick of https://github.com/coder/coder/pull/25888

Original PR: #25888 — feat: support adding GitHub Copilot AI provider
via UI
Merge commit: a85462bd49
Requested by: @dannykopping

Co-authored-by: Danny Kopping <danny@coder.com>
2026-06-01 13:40:25 -04:00
github-actions[bot] ea971d54f3 fix: deprecate ai provider seeding env config (#25854) (#25900)
Cherry-pick of https://github.com/coder/coder/pull/25854

Original PR: #25854 — fix: deprecate ai provider seeding env config
Merge commit: c8555e2163
Requested by: @dannykopping

Co-authored-by: Danny Kopping <danny@coder.com>
2026-06-01 13:40:09 -04:00
Dean Sheather f7369502bf chore: disable release freezing on dev.coder.com (#25881) (#25912)
(cherry picked from commit 9c111a2be2)
2026-06-01 17:01:43 +02:00
github-actions[bot] 32882aee95 fix: recreate ai_provider_type instead of ADD VALUE (#25895) (#25904)
Cherry-pick of https://github.com/coder/coder/pull/25895

Original PR: #25895 — fix: recreate `ai_provider_type` instead of ADD
VALUE
Merge commit: 85f56e4944
Requested by: @dannykopping

Signed-off-by: Danny Kopping <danny@coder.com>
Co-authored-by: Danny Kopping <danny@coder.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-01 10:32:44 -04:00
github-actions[bot] eb918f9ad5 chore: Style fixes and nits across the AI Governance docs (#25793) (#25897)
Backport of https://github.com/coder/coder/pull/25793

Original PR: #25793 — chore: Style fixes and nits across the AI
Governance docs
Merge commit: 61a9c4a61d
Requested by: @nickvigilante

Co-authored-by: Nick Vigilante <nickvigilante@users.noreply.github.com>
Co-authored-by: Danny Kopping <danny@coder.com>
2026-06-01 10:06:34 -04:00
github-actions[bot] 295d2de5d7 feat(site): add Opus 4.8 known model (#25839) (#25853)
Cherry-pick of https://github.com/coder/coder/pull/25839

Original PR: #25839 — feat(site): add Opus 4.8 known model
Merge commit: 9448624d2d
Requested by: @ibetitsmike

Co-authored-by: Thomas Kosiewski <tk@coder.com>
2026-05-29 19:59:47 -04:00
Cian Johnston 2d640eaf76 feat: classify provider_disabled 503 as non-retryable (#25800) (#25860)
(NOTE: Depends on https://github.com/coder/coder/pull/25837)

Adds a new `provider_disabled` error classification in `chatd` with the
corresponding plumbing to classify it as non-retryable. Also adds a
story for how this particular error kind is displayed in the UI.
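Here is a minimal Go sketch of the classification idea. It assumes a hypothetical `ChatErrorKind` string type and a boolean that records whether the 503 response carried the disabled-provider sentinel. The real `chatd` classifier and its constant names differ. Treating a plain 503 as `overloaded` is also an illustrative assumption.

```go
package main

import "fmt"

// ChatErrorKind is a hypothetical stand-in for the codersdk error-kind enum.
type ChatErrorKind string

const (
	KindProviderDisabled ChatErrorKind = "provider_disabled"
	KindOverloaded       ChatErrorKind = "overloaded"
	KindGeneric          ChatErrorKind = "generic"
)

// nonRetryable lists kinds that end the turn instead of scheduling a retry.
var nonRetryable = map[ChatErrorKind]bool{
	KindProviderDisabled: true,
}

// classifyStatus maps an upstream status code, plus whether the response
// carried the disabled-provider sentinel, to an error kind.
func classifyStatus(status int, disabledSentinel bool) ChatErrorKind {
	switch {
	case status == 503 && disabledSentinel:
		return KindProviderDisabled
	case status == 503:
		return KindOverloaded
	default:
		return KindGeneric
	}
}

// retryable reports whether the chat loop should try the request again.
func retryable(kind ChatErrorKind) bool { return !nonRetryable[kind] }

func main() {
	kind := classifyStatus(503, true)
	fmt.Println(kind, retryable(kind)) // provider_disabled false
}
```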

(cherry picked from commit d0a51da0a9)
2026-05-29 16:54:20 -04:00
Cian Johnston 359a39f58a fix: add missing_key error kind for missing chat api_key_id (#25783) (#25798)
Refs CODAGT-486

- `codersdk/chats.go`: New `ChatErrorKindMissingKey` constant and
`AllChatErrorKinds` entry
- `coderd/x/chatd/chaterror/message.go`: `terminalMessage` and
`retryMessage` cases
- `coderd/x/chatd/model_routing_aibridge.go`: Pre-classify error with
`WithClassification`
- `coderd/x/chatd/model_routing_internal_test.go`: Classification
assertion on production path (CRF-2)
- `chatStatusHelpers.ts`: Frontend title "Chat interrupted"
- `LiveStreamTail.stories.tsx`: Storybook story with `detail` assertion
- `docs/ai-coder/ai-gateway/clients/coder-agents.md`: Troubleshooting
entry
- Tests: classification round-trip, terminal message, metrics kind
enumeration

> Generated with [Coder Agents](https://coder.com/agents) on behalf of
@johnstcn

(cherry picked from commit 6df1536256)
2026-05-29 13:07:19 -04:00
Cian Johnston 804bb3c0cf fix(coderd): enforce api_key_id on user messages at type level (#25729) (#25797)
- Empty string is valid for `apiKeyID` in paths that genuinely lack a
caller key (e.g. agent-initiated context injection in
`workspaceAgentAddChatContext`). AI Gateway fail-closed check remains
the runtime safety net.
- Context injection paths (`persistInstructionFiles`, compaction) read
the key from `aibridge.DelegatedAPIKeyIDFromContext(ctx)`, set upstream
by `contextWithActiveTurnAPIKeyID`.
- Subagent context copy branches on `copiedRole ==
database.ChatMessageRoleUser` to choose the right append function.

> Generated by Coder Agents

(cherry picked from commit b278be7361)
2026-05-29 13:06:06 -04:00
github-actions[bot] 476ed480d1 fix(coderd): block ai provider env key drift (#25849) (#25851)
Cherry-pick of https://github.com/coder/coder/pull/25849

Original PR: #25849 — fix(coderd): block ai provider env key drift
Merge commit: 110210d7c9
Requested by: @dannykopping

Co-authored-by: Danny Kopping <danny@coder.com>
2026-05-29 13:00:45 -04:00
github-actions[bot] 663f1ee834 fix: track credential hint across key failover attempts in aibridge (#25735) (#25847)
Cherry-pick of https://github.com/coder/coder/pull/25735

Original PR: #25735 — fix: track credential hint across key failover
attempts in aibridge
Merge commit: 7b903cad73
Requested by: @ssncferreira

Co-authored-by: Susana Ferreira <susana@coder.com>
2026-05-29 12:59:39 -04:00
github-actions[bot] cccf436db2 feat: serve 503 sentinel for disabled providers (#25794) (#25837)
Cherry-pick of https://github.com/coder/coder/pull/25794

Original PR: #25794 — feat: serve 503 sentinel for disabled providers
Merge commit: 5b10268827
Requested by: @dannykopping

Signed-off-by: Danny Kopping <danny@coder.com>
Co-authored-by: Danny Kopping <danny@coder.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-29 12:58:13 -04:00
github-actions[bot] cf6311b9e0 fix(coderd/x/chatd): harden openai-compatible chat calls (#25737) (#25796)
Cherry-pick of https://github.com/coder/coder/pull/25737

Original PR: #25737 — fix(coderd/x/chatd): harden openai-compatible chat
calls
Merge commit: f529577bee
Requested by: @ibetitsmike

Co-authored-by: Michael Suchacz <203725896+ibetitsmike@users.noreply.github.com>
2026-05-29 12:53:24 -04:00
github-actions[bot] c350e98a6e fix(site): update models settings page description text (#25830) (#25831)
Cherry-pick of https://github.com/coder/coder/pull/25830

Original PR: #25830 — fix(site): update models settings page description
text
Merge commit: a801d996e7
Requested by: @tracyjohnsonux

Co-authored-by: TJ <tracy@coder.com>
2026-05-29 12:49:46 -04:00
Danny Kopping 7e5e8eb9d2 fix: add ai provider status and reload freshness metrics (#25770) (#25795)
Add metrics for `aibridged` and `aibridgeproxyd`'s provider statuses. AI
providers can be modified, and possibly misconfigured, at runtime. These
metrics help operators understand the state of these provider
definitions in case unexpected behaviour is observed.

(cherry picked from commit 12520ee964)
2026-05-28 18:54:02 +02:00
github-actions[bot] 85d39b3dbe fix(coderd/x/chatd/chatloop): use stream silence timeout (#25782) (#25786)
Cherry-pick of https://github.com/coder/coder/pull/25782

Original PR: #25782 — fix(coderd/x/chatd/chatloop): use stream silence
timeout
Merge commit: 7e2f7198dd
Requested by: @ethanndickson

Co-authored-by: Ethan <ethanndickson@gmail.com>
2026-05-28 11:29:14 -04:00
github-actions[bot] eb8b062b1d fix: re-validate provider per request and classify reloads (#25766) (#25788)
Cherry-pick of https://github.com/coder/coder/pull/25766

Original PR: #25766 — fix: re-validate provider per request and classify
reloads
Merge commit: a9f5ed7644
Requested by: @dannykopping

Co-authored-by: Danny Kopping <danny@coder.com>
2026-05-28 09:29:30 -04:00
github-actions[bot] 570b193ed7 refactor(site): update BYOK link to use "View docs" on AI settings page (#25743) (#25764)
Cherry-pick of https://github.com/coder/coder/pull/25743

Original PR: #25743 — refactor(site): update BYOK link to use "View
docs" on AI settings page
Merge commit: cfa343e456
Requested by: @dannykopping

Co-authored-by: TJ <tracy@coder.com>
2026-05-28 09:29:02 -04:00
blinkagent[bot] 75f51532f3 chore: update terraform to v1.15.5 (#25747)
Cherry-pick of #25746 to `release/2.34`.

Bumps bundled Terraform from `1.15.2` to `1.15.5`. Terraform 1.15.5 is
built with Go 1.25.10 (vs Go 1.25.8 in 1.15.2), addressing Go stdlib
CVEs flagged by security scanners.

Files changed:
- `.github/actions/setup-tf/action.yaml`
- `scripts/Dockerfile.base`
- `install.sh`
- `flake.nix` (+ updated SRI hash for the linux_amd64 zip)
- `mise.toml`
- `mise.lock` (+ updated per-platform SHA256 checksums)
- `provisioner/terraform/testdata/version.txt`
- `provisioner/terraform/testdata/resources/ai-tasks-disabled/ai-tasks-disabled.tfplan.json`

Release notes:
https://github.com/hashicorp/terraform/releases/tag/v1.15.5

(cherry picked from commit bcc6cca040 —
will be updated to the merged SHA from #25746)

Created on behalf of @Shelnutt2

Co-authored-by: blink-so[bot] <211532188+blink-so[bot]@users.noreply.github.com>
2026-05-27 16:46:09 -04:00
github-actions[bot] c457a62d41 ci: trigger CI on release branch creation (#25744) (#25752)
Cherry-pick of https://github.com/coder/coder/pull/25744

Original PR: #25744 — ci: trigger CI on release branch creation
Merge commit: 5991a2c8b0
Requested by: @f0ssel

Co-authored-by: Garrett Delfosse <garrett@coder.com>
2026-05-27 14:47:49 -04:00
Ethan f422ac89cc ci: extract go-test-failure-report composite action (#25670)
The Go test jobs in `ci.yaml` each had ~30 lines of inline shell that
wrapped `gotestsum` with a PATH shim to capture JSON, then ran
`gotestsummary` and `upload-artifact` to publish a failure report. Three
jobs carried three near-identical copies.

This change replaces the three inline blocks with a single composite
action at `.github/actions/go-test-failure-report/` that runs the same
`gotestsummary` invocation, writes the same markdown to
`GITHUB_STEP_SUMMARY`, and uploads the same NDJSON artifact. The PATH
shim is gone; gotestsum's native `GOTESTSUM_JSONFILE` env variable is
used instead, plumbed through the `test-go-pg` composite.

`test-go-pg` gains three optional inputs:

- `gotestsum-json-file` — explicit JSON file path (or `default` for
`${RUNNER_TEMP}/go-test.json`)
- `run-regex` — passed to `go test -run`
- `test-shuffle` — passed to `go test -shuffle`

All three have safe defaults so existing callers are unaffected.

No observable change in CI behavior: the three existing test-go-pg jobs
continue to emit the same JSON, render the same failure summary, and
upload the same artifact.

Stacked under #25667, which uses the new composite and inputs to power a
new flake-detector workflow.
2026-05-28 00:16:46 +10:00
Danny Kopping 2770bdc9d1 feat: route extra ai_provider_types through OpenAI and Anthropic providers (#25722)
_Disclosure: produced with Claude Opus 4.7_

AI Gateway currently supports only Anthropic (+Bedrock), OpenAI, and Copilot providers. All other types (Vercel, Gemini, etc.) are mapped to OpenAI, since they expose OpenAI-compatible endpoints.
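The fallback routing can be sketched in Go as a minimal switch. The type strings and the `upstreamFor` helper are hypothetical. The only part taken from the message above is the split between Anthropic (+Bedrock), OpenAI, and Copilot, with everything else going to OpenAI. Routing Bedrock through the Anthropic provider is an assumption based on the "+Bedrock" wording.

```go
package main

import "fmt"

// upstreamFor maps an ai_provider_type value to the provider implementation
// that serves it. The type strings here are illustrative.
func upstreamFor(providerType string) string {
	switch providerType {
	case "anthropic", "bedrock":
		return "anthropic"
	case "openai":
		return "openai"
	case "copilot":
		return "copilot"
	default:
		// Vercel, Gemini, and other types that expose OpenAI-compatible endpoints.
		return "openai"
	}
}

func main() {
	for _, t := range []string{"bedrock", "vercel", "gemini", "copilot"} {
		fmt.Printf("%s -> %s\n", t, upstreamFor(t))
	}
}
```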
2026-05-27 16:16:05 +02:00
Spike Curtis 6f06ace949 chore: export MsgQueue from pubsub package (#25707)
Makes `MsgQueue` exported, so it can be used in pubsub implementations outside PGPubsub.
2026-05-27 10:11:51 -04:00
Danielle Maywood d1e27889eb fix(site): improve chat sharing mobile layout (#25687) 2026-05-27 15:03:29 +01:00
Danielle Maywood 5603be19cc feat(site): add transcript tool icons (#25724) 2026-05-27 14:43:14 +01:00
Nick Vigilante ecaf5e022b docs: fix broken references and add users oidc-claims to manifest (#25706)
## Summary

Three small docs fixes:

- **`docs/admin/integrations/oauth2-provider.md`**: Replace broken
relative link to `scripts/oauth2/README.md` with an absolute GitHub URL.
The previous link escaped the `docs/` tree
(`../../../scripts/oauth2/README.md`) and does not resolve in the
published docs site.
- **`docs/install/releases/feature-stages.md`**: Point the "Coder
documentation" link to `docs/about/contributing/documentation.md`. The
previous `../../README.md` target does not exist under `docs/`.
- **`docs/manifest.json`**: Add the missing `users oidc-claims` entry
alongside the other `users` CLI subcommands so the generated reference
page (`docs/reference/cli/users_oidc-claims.md`) is reachable from the
sidebar.

## Validation

- Confirmed each new link target exists on `main`
(`docs/about/contributing/documentation.md`, `scripts/oauth2/README.md`,
`docs/reference/cli/users_oidc-claims.md`).
- Pre-commit hooks pass (`fmt/markdown`, `lint/markdown`, `lint/emdash`,
`lint/typos`, etc.).

---

_This PR was prepared by a [Coder Agents](https://coder.com/) session on
behalf of @nickvigilante. Human review requested since this is a
docs-only change._
2026-05-27 09:29:16 -04:00
Cian Johnston 0c27224fc2 fix(coderd): pass title API key context (#25723)
Fixes CODAGT-503

- Add failing-first coverage for manual title generation with missing
message `api_key_id`, with both context fallback and fail-closed cases.
- Set `aibridge.WithDelegatedAPIKeyID(ctx, apiKey.ID)` in
`regenerateChatTitle` and `proposeChatTitle`.
- In `generateManualTitleCandidate`, fall back to
`aibridge.DelegatedAPIKeyIDFromContext(ctx)` only when
`modelBuildOptionsFromMessages` yields an empty `ActiveAPIKeyID`.
- Keep `modelBuildOptionsFromMessages` pure and leave automatic title
generation unchanged.
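The fallback order described above can be sketched in Go with `context.WithValue`. The helpers `withDelegatedAPIKeyID` and `delegatedAPIKeyIDFromContext` are hypothetical local stand-ins for the `aibridge` functions of similar names, not their real implementations. `titleRequest` is likewise a hypothetical model of a title handler.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

type delegatedKeyIDKey struct{}

// withDelegatedAPIKeyID and delegatedAPIKeyIDFromContext are hypothetical
// local stand-ins for the aibridge helpers named above.
func withDelegatedAPIKeyID(ctx context.Context, id string) context.Context {
	return context.WithValue(ctx, delegatedKeyIDKey{}, id)
}

func delegatedAPIKeyIDFromContext(ctx context.Context) (string, bool) {
	id, ok := ctx.Value(delegatedKeyIDKey{}).(string)
	return id, ok && id != ""
}

var errMissingAPIKeyID = errors.New("missing api_key_id")

// activeAPIKeyID prefers the ID derived from messages, falls back to the
// context only when that is empty, and fails closed when neither is set.
func activeAPIKeyID(ctx context.Context, fromMessages string) (string, error) {
	if fromMessages != "" {
		return fromMessages, nil
	}
	if id, ok := delegatedAPIKeyIDFromContext(ctx); ok {
		return id, nil
	}
	return "", errMissingAPIKeyID
}

// titleRequest models a title handler that attaches the caller's key ID to ctx.
func titleRequest(callerKeyID, messageKeyID string) (string, error) {
	ctx := context.Background()
	if callerKeyID != "" {
		ctx = withDelegatedAPIKeyID(ctx, callerKeyID)
	}
	return activeAPIKeyID(ctx, messageKeyID)
}

func main() {
	id, err := titleRequest("caller-key", "")
	fmt.Println(id, err) // caller-key <nil>
}
```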
2026-05-27 13:20:36 +01:00
Danny Kopping 10f37db35d fix(coderd/x/chatd/chatprovider): keep gateway model prefix in ResolveModelWithProviderHint (#25725)
For `vercel`, `openrouter`, and `openai-compat`, the
`<provider>/<model>` slash is part of the upstream model ID rather than
a hint. `ResolveModelWithProviderHint` was running
`parseCanonicalModelRef` before honoring `providerHint`, so a config
like `(provider=vercel, model=anthropic/claude-4-5-sonnet)` resolved to
`provider=anthropic, model=claude-4-5-sonnet` and the prefix-less model
name was forwarded to Vercel, which returned `Model 'claude-4-5-sonnet'
not found`.

Honor an explicit gateway provider hint before attempting canonical-ref
parsing. Non-gateway hints (anthropic, openai, etc.) keep the existing
canonical-ref-first behavior so `anthropic/claude-...` still has its
prefix stripped when routed directly to Anthropic.
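Here is a simplified Go sketch of the corrected ordering: an explicit gateway hint wins before any canonical-ref parsing. The `resolveModel` function and the `gatewayProviders` set are hypothetical. The real `ResolveModelWithProviderHint` and `parseCanonicalModelRef` handle more cases than a single `strings.Cut`.

```go
package main

import (
	"fmt"
	"strings"
)

// gatewayProviders are providers whose upstream model IDs may contain a slash.
var gatewayProviders = map[string]bool{
	"vercel":        true,
	"openrouter":    true,
	"openai-compat": true,
}

// resolveModel honors an explicit gateway hint before canonical
// "<provider>/<model>" parsing.
func resolveModel(providerHint, model string) (provider, modelID string) {
	if gatewayProviders[providerHint] {
		return providerHint, model // keep "anthropic/..." intact for the gateway
	}
	if p, m, ok := strings.Cut(model, "/"); ok {
		return p, m // canonical ref: strip the provider prefix
	}
	return providerHint, model
}

func main() {
	p, m := resolveModel("vercel", "anthropic/claude-4-5-sonnet")
	fmt.Println(p, m) // vercel anthropic/claude-4-5-sonnet
	p, m = resolveModel("anthropic", "anthropic/claude-4-5-sonnet")
	fmt.Println(p, m) // anthropic claude-4-5-sonnet
}
```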

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 11:13:39 +00:00
Max Schwenk ae492495ee fix(cli): show ready sync start dependencies (#25546)
## Problem

Follow-on to:

- https://github.com/coder/coder/pull/25089

`coder exp sync start` still printed a generic success message when the
unit was ready on the first status check. That hid whether the unit had
no dependencies or had dependencies that were already satisfied before
`sync start` ran.

Before:

```text
Success
```

## Solution
Print explicit startup output for both ready-at-first-check cases.

After, dependencies already satisfied:

```text
Unit "test-unit" started immediately, dependencies already satisfied: [dep-unit, dep-unit-2]
```

After, no dependencies:

```text
Unit "test-unit" started with no dependencies
```

The existing waiting path is unchanged and still reports the
dependencies while waiting and after waiting finishes.

Co-authored-by: Sas Swart <sas.swart.cdk@gmail.com>
2026-05-27 12:33:39 +02:00
Danny Kopping 79e007cf30 feat: hot-reload aibridged and aibridgeproxyd providers on DB changes (#25673)
Previously the in-process aibridge daemon and the enterprise aibridgeproxy daemon both snapshotted their provider routing once at boot. Any `ai_providers` or `ai_provider_keys` mutation required a restart for either to pick it up.

Add an `ai_providers_changed` pubsub channel that the CRUD handlers publish on after Create / Update / Delete. Both daemons subscribe:

- **aibridged** rebuilds its `[]aibridge.Provider` snapshot via `BuildProviders` and swaps it into the pool atomically. In-flight requests keep serving against the bridge they already acquired; new acquires build against the new snapshot. Per-provider construction errors stay scoped to the offending row.
- **aibridgeproxyd** rebuilds its routing snapshot from `GetAIProviders` and swaps the host→provider map atomically. The MITM listener picks up new providers without restart.

DB read for aibridgeproxyd uses the existing `AsAIProviderMetadataReader` subject for routing-only access.
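The snapshot-swap pattern can be sketched in Go with the standard library's `sync/atomic.Pointer`. The `router` and `routingSnapshot` types are hypothetical. In the real daemons, the reload would be triggered by an `ai_providers_changed` notification and would read rows via `GetAIProviders`. Readers hold on to whichever snapshot they loaded, which matches the in-flight behaviour described above.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// routingSnapshot is an immutable host -> provider view built from DB rows.
type routingSnapshot struct {
	hostToProvider map[string]string
}

// router serves lookups from the current snapshot; a reload builds a new
// snapshot and swaps the pointer instead of mutating the live map.
type router struct {
	current atomic.Pointer[routingSnapshot]
}

func (r *router) lookup(host string) (string, bool) {
	snap := r.current.Load() // callers keep using the snapshot they loaded
	if snap == nil {
		return "", false
	}
	provider, ok := snap.hostToProvider[host]
	return provider, ok
}

// reload would run when an ai_providers_changed notification arrives.
func (r *router) reload(rows map[string]string) {
	next := &routingSnapshot{hostToProvider: make(map[string]string, len(rows))}
	for host, provider := range rows {
		next.hostToProvider[host] = provider
	}
	r.current.Store(next)
}

func main() {
	var r router
	r.reload(map[string]string{"api.openai.com": "openai"})
	old := r.current.Load() // e.g. held by an in-flight request
	r.reload(map[string]string{"api.anthropic.com": "anthropic"})

	_, oldStillRoutes := old.hostToProvider["api.openai.com"]
	provider, ok := r.lookup("api.anthropic.com")
	fmt.Println(oldStillRoutes, provider, ok) // true anthropic true
}
```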
2026-05-27 11:58:43 +02:00
Cian Johnston 6acfe6c835 fix: classify quota errors as usage_limit instead of auth (#25676)
Fixes CODAGT-484.

- Removed "quota", "billing", "insufficient_quota", "payment required"
from `authStrongPatterns`
- Added `usageLimitPatterns` slice with those patterns
- Added `usageLimitMatch` signal and rule between overloaded and
authStrong in priority
- Added terminal/retry messages for `ChatErrorKindUsageLimit`
- Simplified auth message (removed billing reference)
- Frontend: conditional `!usageLimitStatus.provider` guard on the "View
Usage" Alert
- Added `TestClassify_UsageLimitBeatsAuth` with 5 cases including real
production OpenAI error
- Added `ProviderQuotaExceeded` story asserting no "View Usage" link and
correct `ChatStatusCallout` rendering
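The priority ordering can be sketched in Go as a `switch` evaluated top to bottom. The pattern slices below are illustrative. Only the four usage-limit patterns are taken from the list above, and the real `chatd` rules carry more signals than a substring match.

```go
package main

import (
	"fmt"
	"strings"
)

// Illustrative pattern lists; the real slices in chatd contain more entries.
var (
	overloadedPatterns = []string{"overloaded"}
	usageLimitPatterns = []string{"quota", "billing", "insufficient_quota", "payment required"}
	authStrongPatterns = []string{"invalid api key", "unauthorized"}
)

func matchesAny(msg string, patterns []string) bool {
	for _, p := range patterns {
		if strings.Contains(msg, p) {
			return true
		}
	}
	return false
}

// classify evaluates rules in priority order, so a quota message that also
// looks like an auth failure is still reported as usage_limit.
func classify(errMsg string) string {
	msg := strings.ToLower(errMsg)
	switch {
	case matchesAny(msg, overloadedPatterns):
		return "overloaded"
	case matchesAny(msg, usageLimitPatterns):
		return "usage_limit"
	case matchesAny(msg, authStrongPatterns):
		return "auth"
	default:
		return "generic"
	}
}

func main() {
	fmt.Println(classify("401 Unauthorized: insufficient_quota")) // usage_limit
}
```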

> Generated with [Coder Agents](https://coder.com/agents)
2026-05-27 09:45:36 +01:00
Thomas Kosiewski e32be68687 fix(dogfood/coder): verify Homebrew installer (#25721) 2026-05-27 10:45:21 +02:00
Jake Howell 9c10ec2ca7 fix: resolve mui <TimelineDateRow /> regression (#25716) 2026-05-27 18:36:55 +10:00
Thomas Kosiewski bfa17c315e fix(dogfood/coder): persist mise user installs (#25720) 2026-05-27 09:54:09 +02:00
Ethan e91bec8574 fix(cli): close aibridge daemon before WebSocket shutdown wait (#25719)
> [!WARNING]
> The investigation and solution in this PR were done with
[Mux](https://mux.coder.com/). I've reviewed the investigation
methodology, evidence and solution, and it all appears sound.

## Summary

PR #25570 (`refactor: move aibridged out of enterprise to AGPL`, merged
2026-05-22) added an in-memory aibridge DRPC server in
`coderd/aibridged.go` that does `api.WebsocketWaitGroup.Add(1)` and only
releases `Done()` when its client session is closed. PR #25575 then
flipped `CODER_AI_GATEWAY_ENABLED` to default to `true`, so every
`cli.Server()` invocation now spins up that goroutine.

In `cli/server.go`, the only call to `aibridgeDaemon.Close()` was a
`defer` scheduled at function return. During graceful shutdown the code
first calls `coderAPICloser.Close()`, which waits on
`api.WebsocketWaitGroup`. That wait sits for the full 10s timeout in
`coderd/coderd.go` (`websocket shutdown timed out after 10 seconds`),
then returns, then the function unwinds, and only then does the deferred
`aibridgeDaemon.Close()` fire and let the goroutine call `Done()`.

The 10s tax was previously latent (aibridged was enterprise-only and
opt-in). After the two May 22 PRs it hit every `cli.Server()` test. On
Linux/macOS CI it just makes the suite slower; on the Depot Windows
runner, the ramdisk reservation leaves only ~17 GiB of headroom and the
~10s shutdown tails of multiple concurrent package binaries overlap into
an OOM, presenting as `test-go-pg (windows-2022)` jobs that die silently
at the ~600s watchdog with an empty `steps` array.

See Slack:
https://codercom.slack.com/archives/C05AE94121Z/p1779807717764189

## Fix

Close `aibridgeDaemon` explicitly during graceful shutdown, **before**
`coderAPICloser.Close()` waits on the WebSocket wait group. This matches
the existing ordered-shutdown pattern used for `tunnel` and
`notificationsManager`. The deferred `aibridgeDaemon.Close()` is
retained as a safety net for early-return paths, and is safe to
double-call because `aibridged.Server.Close()` is already idempotent via
`shutdownOnce` in `coderd/aibridged/aibridged.go`.
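The ordering problem can be reproduced with a small Go model. A component holds a `sync.WaitGroup` slot until it is closed, and `Close` is made idempotent with `sync.Once`. The `daemon` type, `gracefulShutdown`, and the 200ms timeout are hypothetical and only stand in for `aibridged.Server`, `coderAPICloser.Close`, and the real 10s wait.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// daemon is a hypothetical stand-in for the in-memory aibridge server: it
// holds one WebSocket wait-group slot until it is closed.
type daemon struct {
	once sync.Once
	wg   *sync.WaitGroup
}

// Close is idempotent, so an explicit Close plus a deferred Close is safe.
func (d *daemon) Close() error {
	d.once.Do(d.wg.Done)
	return nil
}

// waitTimeout stands in for the bounded WebSocket wait during API shutdown.
func waitTimeout(wg *sync.WaitGroup, timeout time.Duration) bool {
	done := make(chan struct{})
	go func() {
		wg.Wait()
		close(done)
	}()
	select {
	case <-done:
		return true
	case <-time.After(timeout):
		return false
	}
}

// gracefulShutdown reports whether the wait finished before the timeout.
func gracefulShutdown(closeDaemonFirst bool) bool {
	var wsWG sync.WaitGroup
	wsWG.Add(1)
	d := &daemon{wg: &wsWG}
	defer d.Close() // safety net for early-return paths

	if closeDaemonFirst {
		_ = d.Close() // the fix: release the slot before waiting
	}
	return waitTimeout(&wsWG, 200*time.Millisecond)
}

func main() {
	fmt.Println(gracefulShutdown(true), gracefulShutdown(false)) // true false
}
```

Without the explicit `Close`, the wait always runs to its timeout, and only then does the deferred `Close` release the slot. This mirrors the 10s tail described above.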

## Regression test

`TestServer_AIGatewayShutdownOrdering` boots a real `coder server` with
`--ai-gateway-enabled=true`, cancels its context, and asserts graceful
shutdown finishes in under 8s. With the fix the test runs in ~0.1s;
without the fix it fails deterministically at ~10.0s. The flag is passed
explicitly so the test continues to guard the ordering even if the
deployment default is ever flipped back.

## Evidence this fixes the OOM

On Linux the patched `cli` test package drops from 114 s back to its
pre-regression 30 s wall time at the same single-process peak RSS (~7.6
GiB), and the `websocket shutdown timed out after 10 seconds` log line
disappears from every server-test run. Since the Windows OOM is the sum
of multiple concurrent 10 s shutdown tails overlapping past the runner's
~17 GiB headroom, removing those tails returns the concurrent-RSS budget
to its pre-regression level. The Windows OOM was intermittent (a handful
of hits across many runs since May 22), so a single green `test-go-pg
(windows-2022)` job on this PR is not by itself proof; confirmation will
come from watching Windows runs on `main` over the next several days and
seeing the ~600 s silent-kill fingerprint stop recurring.

Relates to ENG-2771
2026-05-27 17:33:14 +10:00
TJ 916094c71c feat(site): replace usage bars with ring indicators (#25708)
Replaces the linear progress bars and text labels in the sidebar footer
usage trigger with SVG donut ring charts that show the section icon
centered inside each ring.

## Changes

- **`SvgRingProgress`**: shared SVG component used by both
`UsageIndicator` and `ContextUsageIndicator`
- Ring colors follow the existing severity system
(normal/warning/exceeded)
- Hover tooltips show "Spend $12.50" and "Workspaces 30/100"
- Dropdown menu content unchanged; full usage details still appear on
click
- Removed dead `summaryValue` field and `size="compact"` variant
- Updated stories to cover ring trigger rendering and dropdown usage
details

> Generated by Coder Agents on behalf of @tracyjohnsonux
2026-05-26 22:01:31 -07:00
TJ 2afb33ac5e feat(site/src/pages/AgentsPage): inline setup notice banner with admin/member distinction (#25518)
Replaces the blocking Dialog modal setup notice with a context-aware
inline banner above the chat input, with different messaging for admins
and members.

## Inline notice banner

The `AgentSetupNotice` component now renders as a `bg-surface-tertiary`
inline box instead of an unclosable `Dialog` modal. The notice sits
above the chat composer using negative margin overlap, and the composer
is forced opaque (`bg-surface-secondary`) when the notice is present so
the banner doesn't bleed through the semi-transparent desktop
background.

Four states based on role and configuration:
- **Admin, no providers or models**: links to both provider and model
setup
- **Admin, missing provider only**: link to provider setup
- **Admin, has providers but no models**: link to model setup only
- **Member, no models available**: generic "your admin is still getting
things set up" message

The admin/member distinction is determined via
`permissions.editDeploymentConfig` and applied in both `AgentChatPage`
and `AgentCreatePage`.

## Conflict resolution notes

During merge with main, the following were adapted:
- Sidebar filter props updated to main's
`sidebarFilters`/`onSidebarFiltersChange` pattern (replacing old
`archivedFilter`)
- Accepted `Sidebar/` -> `ChatsSidebar/` directory refactor from main
- Dropped `hasArchivedChats` query (its sidebar consumer was removed in
the refactor)
- Provider link updated to `/ai/settings` (new AI settings page)

> Generated with the assistance of Coder Agents on behalf of
@tracyjohnsonux

---------

Co-authored-by: jaaydenh <jaaydenh@users.noreply.github.com>
2026-05-26 21:00:53 -07:00
Ethan e99f7171e4 ci: require docs lint when docs change (#25608)
Move docs linting into the required CI umbrella and reuse the existing
`changes` job so docs lint runs when docs or CI files change, plus on
`main` as a backstop.

This is motivated by the docs lint failures on #25601. That PR touched
`.claude/docs/TESTING.md`; the standalone `Docs CI` workflow picked it
up because `docs-ci.yaml` used broad `**.md` matching, but local `pnpm
lint-docs` and `make lint` did not catch the same file because they only
scanned `docs/**` plus root `*.md`. The first failed Docs CI run
reported markdownlint errors in `.claude/docs/TESTING.md` (`MD040` and
`MD031`), and the next run reported a markdown table formatter failure
in the same file.

That mismatch is why this PR exists: prevent unrelated PRs from being
surprised by stale `.claude/docs/**` lint drift only after they happen
to touch one of those files. The local docs scripts now include
`.claude/docs/**`, and the old standalone `Docs CI` workflow is removed
so we do not maintain separate path-filter logic outside the required CI
workflow.

> Generated by mux, but reviewed by a human
2026-05-27 12:30:05 +10:00
Zach 20b50dd4b8 docs: mark user secrets as beta (#25704)
Update the user secrets user guide, the admin security secrets
reference, and the docs manifest to label the feature as Beta instead of
Early Access, and link to the beta section of the feature stages doc.
2026-05-26 15:22:17 -06:00
Zach 47ac4b309a feat: enforce per-user limits on user_secrets (#25588)
Add a Postgres trigger and matching codersdk constants that cap each
user's secrets in four dimensions: count (50), total stored value bytes
(200 KiB), env-injected stored value bytes (24 KiB), and env name length
(256 bytes). Without these caps a user could overflow the 4 MiB DRPC
agent manifest, the ~32 KiB Windows process env
block, or Linux/macOS ARG_MAX at workspace start. The trigger is the
source of truth on aggregates; the handler maps its check_violation
error into a 400 that names the per-user budget in stored
(post-encryption) bytes. A handler test exercises off-by-one at each cap
across POST and PATCH, plus per-user budget isolation.
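The real enforcement is a Postgres trigger. The Go sketch below only mirrors the arithmetic of the four caps, using the values stated above. The constant names, the `userSecret` type, and the choice to accept a value exactly at a cap are assumptions. Byte counts are taken over stored (post-encryption) values, as the message describes.

```go
package main

import "fmt"

// Hypothetical names; the limit values are the ones stated in the commit.
const (
	maxSecretsPerUser  = 50
	maxTotalValueBytes = 200 * 1024 // 200 KiB across all secrets
	maxEnvValueBytes   = 24 * 1024  // 24 KiB across env-injected secrets
	maxEnvNameBytes    = 256
)

// userSecret holds a stored (post-encryption) value and an optional env name.
type userSecret struct {
	EnvName string // empty when the secret is not injected into the environment
	Value   []byte
}

// checkSecretBudget returns an error naming the first per-user cap exceeded.
// A value exactly at a cap is accepted.
func checkSecretBudget(secrets []userSecret) error {
	if len(secrets) > maxSecretsPerUser {
		return fmt.Errorf("secret count %d exceeds limit %d", len(secrets), maxSecretsPerUser)
	}
	total, env := 0, 0
	for _, s := range secrets {
		if len(s.EnvName) > maxEnvNameBytes {
			return fmt.Errorf("env name is %d bytes, limit %d", len(s.EnvName), maxEnvNameBytes)
		}
		total += len(s.Value)
		if s.EnvName != "" {
			env += len(s.Value)
		}
	}
	if total > maxTotalValueBytes {
		return fmt.Errorf("stored value bytes %d exceed limit %d", total, maxTotalValueBytes)
	}
	if env > maxEnvValueBytes {
		return fmt.Errorf("env-injected stored bytes %d exceed limit %d", env, maxEnvValueBytes)
	}
	return nil
}

func main() {
	atCap := []userSecret{{EnvName: "API_TOKEN", Value: make([]byte, maxEnvValueBytes)}}
	overCap := []userSecret{{EnvName: "API_TOKEN", Value: make([]byte, maxEnvValueBytes+1)}}
	fmt.Println(checkSecretBudget(atCap) == nil, checkSecretBudget(overCap) == nil) // true false
}
```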

Generated with help from Coder Agents.
2026-05-26 14:42:31 -06:00
Cian Johnston d3155e1cab test(enterprise/cli): add test to prove fix for #25699 (#25701)
Adds an end-to-end enterprise CLI test to ensure legacy AI provider keys seeded at server startup are encrypted at rest when DBCrypt external token encryption is enabled, preventing regressions related to #25699.

> Partially implemented by Coder Agents, and massaged afterwards by me.
2026-05-26 20:08:07 +00:00
Kyle Carberry 58f6b9c4d0 fix(coderd/externalauth): retry transient refresh failures with backoff (#25686)
## Summary

Wraps external auth token refresh in an exponential-backoff retry so a
brief upstream hiccup (5xx, network timeout, rate-limited 429) no longer
surfaces as an `InvalidTokenError` and forces users to re-authenticate.
GitHub in particular has been flaky enough lately that this is hitting
real users.

## Behavior

- `(*Config).RefreshToken` now calls a helper that retries the
`TokenSource.Token()` exchange with exponential backoff (250ms → 2s),
bounded by a 10s total budget.
- Errors classified as permanent by `isFailedRefresh` (e.g.
`bad_refresh_token`, `invalid_grant`, `unauthorized_client`, ...) skip
the retry loop. Retrying a permanent failure wastes the refresh quota
and, on providers with single-use refresh tokens, can mask a legitimate
concurrent winner with repeated `bad_refresh_token` responses.
- Refreshes with an empty refresh token still short-circuit without
making an API call.
- The existing concurrent-refresh-race detection and optimistic-lock
paths are unchanged.
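Here is a minimal Go sketch of that retry loop, including the explicit budget check discussed later in the description. It makes three substitutions. Hypothetical `errTransient`/`errPermanent` sentinels replace `isFailedRefresh`. A plain `time.After` timer replaces `retry.Retrier`. A hypothetical `backoffConfig` struct stands in for the three `Config` fields. The defaults match the stated values (250ms initial, 2s max, 10s total).

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Hypothetical sentinels; the real code decides permanence with isFailedRefresh.
var (
	errTransient = errors.New("upstream 502")
	errPermanent = errors.New("bad_refresh_token")
)

// backoffConfig mirrors the three tunables: a zero field means "use the default".
type backoffConfig struct {
	Initial, Max, Budget time.Duration
}

func (c backoffConfig) withDefaults() backoffConfig {
	if c.Initial == 0 {
		c.Initial = 250 * time.Millisecond
	}
	if c.Max == 0 {
		c.Max = 2 * time.Second
	}
	if c.Budget == 0 {
		c.Budget = 10 * time.Second
	}
	return c
}

// refreshWithBackoff calls refresh until it succeeds, fails permanently, or the
// budget is spent, and returns the attempt count and the last error.
func refreshWithBackoff(refresh func() error, cfg backoffConfig) (int, error) {
	cfg = cfg.withDefaults()
	ctx, cancel := context.WithTimeout(context.Background(), cfg.Budget)
	defer cancel()

	delay := cfg.Initial
	for attempt := 1; ; attempt++ {
		err := refresh()
		if err == nil || errors.Is(err, errPermanent) {
			return attempt, err // success, or a failure that retrying cannot fix
		}
		// Explicit budget check: if the context and the timer are both ready,
		// select picks one at random and could allow an extra attempt.
		if ctx.Err() != nil {
			return attempt, err
		}
		select {
		case <-ctx.Done():
			return attempt, err
		case <-time.After(delay):
		}
		delay *= 2
		if delay > cfg.Max {
			delay = cfg.Max
		}
	}
}

func main() {
	failures := 0
	attempts, err := refreshWithBackoff(func() error {
		if failures < 3 {
			failures++
			return errTransient
		}
		return nil
	}, backoffConfig{Initial: time.Millisecond, Max: 4 * time.Millisecond, Budget: time.Second})
	fmt.Println(attempts, err) // 4 <nil>
}
```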

## Tunables

Three new `time.Duration` fields on `externalauth.Config`
(`RefreshRetryInitialBackoff`, `RefreshRetryMaxBackoff`,
`RefreshRetryTimeout`) let callers override the defaults. They default
to zero, which falls back to the package defaults, so existing call
sites are unaffected. The fields exist primarily so tests can dial the
timing way down without touching package globals (and therefore without
serializing parallel tests).

## Tests

- `TestRefreshToken/RefreshRetries` now disables internal retries via
`RefreshRetryTimeout = time.Nanosecond` so its existing "1 IDP call per
`RefreshToken` invocation" assertion still holds. Otherwise its
assertions are unchanged.
- New `TestRefreshToken/RefreshTokenWithBackoff` simulates 3 transient
5xx failures followed by success and verifies the refresh ultimately
succeeds with 4 total IDP attempts.
- New `TestRefreshToken/RefreshTokenBackoffPermanentError` returns
`bad_refresh_token` and verifies the refresh is **not** retried even
with a generous 1s budget.

<details>
<summary>Why the explicit <code>retryCtx.Err()</code> guard?</summary>

`retry.Retrier.Wait` `select`s between `time.After(delay)` and
`ctx.Done()`. The first call has `delay == 0`, so `time.After(0)` and an
already-cancelled context both fire immediately and Go picks the case
nondeterministically. Without the guard, a near-zero retry budget would
still trigger an unwanted extra refresh attempt roughly half the time,
which would have made the `RefreshRetries` test flaky.
</details>

This PR was opened by a Coder agent on behalf of @kylecarbs.
2026-05-26 15:35:22 -04:00
Michael Suchacz 8b1705eb65 feat: route chatd provider traffic through aibridge (#25629)
## Summary

Routes chatd model calls backed by concrete AI Provider rows through the
in-process aibridge transport by default, with deployment options to use
direct provider routing when AI Gateway is disabled or chat AI Gateway
routing is disabled.

- Splits model routing into common, direct provider, and AI Gateway
paths behind a single deployment-mode entry point.
- Builds chatd models through explicit request, route, and options data.
Active API key attribution is passed explicitly instead of being hidden
inside generic model construction.
- For AI Gateway BYOK routes, resolves the user's provider key in chatd,
forwards it through provider-specific auth headers, and sets
`X-Coder-AI-Governance-Token` to the `delegated` marker so aibridge
preserves those headers while still stripping Coder-specific metadata.
- Keeps central provider credentials and deployment fallback credentials
out of forwarded provider auth headers, so AI Gateway central policy
remains authoritative.
- Redacts delegated provider auth from default string formatting to
avoid accidental plaintext logging of user BYOK credentials.
- Covers selected chat models, advisor overrides, title and quickgen
paths, subagent overrides, computer use model selection, and an
integration-style chat turn through the aibridge transport path.
- Persists initiating API key IDs on chat and queued user messages,
including subagent child messages, and fails closed for AI
Gateway-routed model builds without an active key.
- Removes unused `api_key_id` indexes while keeping the persistence
columns and foreign keys.
- Keeps the deployment option available through config and env parsing,
but hides it from CLI help and generated docs.
- Stabilizes the subagent poll fallback test so that background CreateChat
processing cannot win the race for the state transition in slower CI environments.

## Tests

- `go test ./coderd/x/chatd -run
'TestAIGatewayProviderAuthForUser|TestAIGatewayProviderAuthRedactsFormatting|TestResolveModelRouteForConfigAIGatewayProviderAuth|TestAIGatewayModelForwardsProviderAuth|TestProcessChat_AIGatewayRoutingUsesDelegatedAPIKey|TestAwaitSubagentCompletion'
-count=1`
- `go test ./coderd/aibridged -run
'TestServeHTTP_DelegatedAPIKey|TestServeHTTP_StripCoderToken' -count=1`
- `git diff --check HEAD~1..HEAD`
- `make lint`

> Mux working on behalf of Mike.
2026-05-26 19:31:52 +00:00
Danny Kopping a56c88a0cc fix: run AI provider seed and build after newAPI so dbcrypt applies (#25699)
## Problem

Two related symptoms of the same architectural issue: the `dbcrypt`
wrapper is installed inside `enterprise/coderd.New`, so any access to
`options.Database` that happens before `newAPI` runs bypasses
encryption.

**Symptom 1 (reads):** Provider keys added via the admin UI are
encrypted at rest. `BuildProviders` was running *before* `newAPI`,
against the unwrapped store, so the ciphertext was read as-is and shoved
into the keypool as the upstream credential. Anthropic/OpenAI reject it,
and the interception log shows:

```
coderd.aibridged.pool: interception failed  ... error="all configured keys failed authentication"
  credential_kind=centralized  credential_hint=PaPb...4A==  credential_length=184
```

**Symptom 2 (writes):** `SeedAIProvidersFromEnv` was also running before
`newAPI`, against the unwrapped store, so env-derived keys
(`CODER_AIBRIDGE_OPENAI_KEY`, indexed `CODER_AIBRIDGE_PROVIDER_<N>_KEY`,
etc.) landed in `ai_provider_keys` as plaintext with `ApiKeyKeyID =
null` even when `CODER_EXTERNAL_TOKEN_ENCRYPTION_KEYS` was set.

## Fix

Move both `SeedAIProvidersFromEnv` and `BuildProviders` to after
`newAPI`, where `options.Database` is the dbcrypt-wrapped store. Writes
encrypt correctly; reads decrypt correctly.
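
The ordering problem can be modelled with a much-simplified store. A caller holding the raw store, as `SeedAIProvidersFromEnv` and `BuildProviders` did before `newAPI`, writes plaintext and reads back ciphertext. Only calls through the wrapper encrypt and decrypt. `keyStore`, `memStore` and `cryptStore` are hypothetical. AES-GCM here is an illustrative choice, not a claim about how `dbcrypt` is implemented.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"encoding/base64"
	"errors"
)

// keyStore is a hypothetical, minimal stand-in for the database store.
type keyStore interface {
	PutKey(name, value string) error
	GetKey(name string) (string, error)
}

// memStore persists exactly what it is given (the "unwrapped" store).
type memStore struct{ rows map[string]string }

func (m *memStore) PutKey(name, value string) error { m.rows[name] = value; return nil }

func (m *memStore) GetKey(name string) (string, error) {
	v, ok := m.rows[name]
	if !ok {
		return "", errors.New("not found")
	}
	return v, nil
}

// cryptStore encrypts on write and decrypts on read around an inner store.
type cryptStore struct {
	inner keyStore
	aead  cipher.AEAD
}

func newCryptStore(inner keyStore, key []byte) (*cryptStore, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	aead, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	return &cryptStore{inner: inner, aead: aead}, nil
}

func (c *cryptStore) PutKey(name, value string) error {
	nonce := make([]byte, c.aead.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return err
	}
	// Store nonce||ciphertext, base64-encoded.
	sealed := c.aead.Seal(nonce, nonce, []byte(value), nil)
	return c.inner.PutKey(name, base64.StdEncoding.EncodeToString(sealed))
}

func (c *cryptStore) GetKey(name string) (string, error) {
	stored, err := c.inner.GetKey(name)
	if err != nil {
		return "", err
	}
	sealed, err := base64.StdEncoding.DecodeString(stored)
	if err != nil {
		return "", err
	}
	n := c.aead.NonceSize()
	if len(sealed) < n {
		return "", errors.New("ciphertext too short")
	}
	plain, err := c.aead.Open(nil, sealed[:n], sealed[n:], nil)
	if err != nil {
		return "", err
	}
	return string(plain), nil
}
```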

The enterprise closure (`enterprise/cli/server.go`) runs *inside*
`newAPI` and calls `BuildProviders` for the aibridgeproxyd at that
point. Once the agpl seed moves to after `newAPI`, the proxy on first
boot would see no env-seeded providers. Add a matching seed call inside
the enterprise closure before its `BuildProviders` to cover that case.
Seeding is idempotent, so the agpl-side seed running again post-`newAPI`
is a no-op when the rows already exist.
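
The idempotency that makes the duplicated seed call safe amounts to "insert only if absent". The sketch below models it with a hypothetical in-memory `providerTable` and `seedFromEnv`. It mirrors only the no-op-when-present behavior described above and does not model how the real seed handles a changed key.

```go
package main

import "sync"

// providerTable is a hypothetical in-memory stand-in for the AI provider rows.
type providerTable struct {
	mu   sync.Mutex
	rows map[string]string // provider name -> key
}

// seedFromEnv inserts each env-derived provider only when no row with that
// name exists yet, and returns how many rows it created.
func (t *providerTable) seedFromEnv(env map[string]string) int {
	t.mu.Lock()
	defer t.mu.Unlock()
	created := 0
	for name, key := range env {
		if _, exists := t.rows[name]; exists {
			continue // already seeded: leave the existing row alone
		}
		t.rows[name] = key
		created++
	}
	return created
}
```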

## Known shortcomings

The clean version of this fix would just inherit `ctx` like every other
startup step and place these calls naturally. It can't, for two reasons
that are both about the surrounding handler architecture rather than
this change:

1. **`dbcrypt` wrapping is positioned inside `newAPI`, not around
`options.Database` at creation.** That's why both seed and build have to
wait until after `newAPI` in the first place. The principled fix is to
install the wrapper at the point the store is created (behind a hook the
enterprise build supplies), so every consumer sees a single
authoritative view and the ordering stops mattering. This would also
collapse the duplicated seed call back to a single site.

2. **The handler's shutdown sequence is not deferred.**
`coderAPICloser.Close()` and the other teardown steps run only if
control reaches the `select` at the bottom of the handler. An early
`return` from anywhere in Phase 1 (e.g. seed/build returning
`context.Canceled` when the user hits ctrl-c during startup) skips that
block and orphans all the goroutines `newAPI` spawned — tailnet workers,
gitsync, telemetry batcher, etc. `goleak` then catches them at package
teardown and `TestServer_TelemetryDisabled_FinalReport` fails. Moving
the shutdown into deferred closers (with a `sync.Once`-guarded close to
avoid double-close from the explicit Phase 2 call) is the principled
fix.
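
The deferred-closer pattern from the second shortcoming can be sketched as follows. Cleanup is deferred immediately, so early returns still tear down. A `sync.Once` guard turns the explicit shutdown-phase call and the deferred call into a single real close. `onceCloser`, `countingCloser` and `runHandler` are hypothetical names for illustration.

```go
package main

import (
	"context"
	"io"
	"sync"
)

// onceCloser runs the wrapped Close at most once; later calls return the
// first call's error.
type onceCloser struct {
	once sync.Once
	c    io.Closer
	err  error
}

func (o *onceCloser) Close() error {
	o.once.Do(func() { o.err = o.c.Close() })
	return o.err
}

// countingCloser is a hypothetical resource that records how often it was closed.
type countingCloser struct{ closes int }

func (c *countingCloser) Close() error { c.closes++; return nil }

// runHandler defers cleanup first, then either returns early (startup was
// cancelled) or reaches its explicit shutdown step.
func runHandler(res io.Closer, failEarly bool) error {
	closer := &onceCloser{c: res}
	defer closer.Close() // covers every early return

	if failEarly {
		return context.Canceled
	}
	// Explicit shutdown; the deferred Close above becomes a no-op.
	return closer.Close()
}
```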

For this PR I took the smallest change that fixes the reported bugs: a
detached context (`context.WithoutCancel(ctx)` + a 30s timeout) at the
seed and build call sites in both the agpl and enterprise paths. It lets
the calls complete even if the user cancels during startup, after which
the handler reaches its shutdown select naturally and tears down through
Phase 2. Both shortcomings above are worth addressing separately.

## Test plan

- `make test RUN=TestServer_TelemetryDisabled_FinalReport` with `-race`;
passes locally with `-count=3`.
- Manually verified on a deployment with
`CODER_EXTERNAL_TOKEN_ENCRYPTION_KEYS` set and env-configured providers:
`ai_provider_keys.api_key_key_id` is populated, `api_key` is base64
ciphertext, and upstream auth succeeds.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 21:27:02 +02:00
blinkagent[bot] dd741bd188 fix(site): only highlight Providers item on exact match in AI settings sidebar (#25700)
## Problem

When visiting `/ai/settings/governance`, both **AI Governance** and
**Providers** items in the AI settings subnav appear highlighted as
active.

## Cause

`SettingsSidebarNavItem` is built on react-router's `<NavLink>`, which
by default treats a link as active when the current URL **starts with**
the link's `to` path. Since `/ai/settings/governance` starts with
`/ai/settings`, the Providers item is also marked active.

## Fix

Pass `end` on the Providers nav item so it only matches when the path is
exactly `/ai/settings` (the index route). The `SettingsSidebarNavItem`
component already supports this prop for exactly this case.
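
For illustration only, the matching rule can be approximated in Go. `navLinkActive` is a hypothetical re-expression of react-router's `<NavLink>` logic, not its source. It ignores case sensitivity and trailing-slash handling, and it reflects that the prefix match happens on a path-segment boundary. Without `end`, a link is active for its own path and anything nested beneath it. With `end`, only an exact match counts.

```go
package main

import "strings"

// navLinkActive approximates NavLink's active check: exact match always
// counts; a nested-path match counts only when end is false.
func navLinkActive(current, to string, end bool) bool {
	if current == to {
		return true
	}
	return !end && strings.HasPrefix(current, to+"/")
}
```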

Co-authored-by: blink-so[bot] <211532188+blink-so[bot]@users.noreply.github.com>
2026-05-26 19:23:13 +00:00
TJ be184a0591 fix(site): update providers description with BYOK docs link (#25680)
> 🤖 Generated with [Coder Agents](https://coder.com/agents) on behalf of
@tracyjohnsonux

Updates the providers page description to explain that providers power
Coder Agents, AI Gateway, and other LLM features. Adds a "Manage
deployment-wide BYOK" link to the docs.

Uses `<Link>` component and `docs()` helper per project conventions.
2026-05-26 12:03:29 -07:00