coder

mirror of https://github.com/coder/coder.git synced 2026-06-05 05:58:20 +00:00

Author	SHA1	Message	Date
Hugo Dutka	658a04d28f	pr 3	2026-06-04 18:51:22 +00:00
Hugo Dutka	ef9bcfc335	pr 2 implementation	2026-06-04 18:51:22 +00:00
Hugo Dutka	f2c32c2cc9	post gen	2026-06-04 18:49:39 +00:00
Hugo Dutka	eaf2e65297	source files post gen	2026-06-04 18:49:39 +00:00
Hugo Dutka	72bc79c91d	make gen tmp files	2026-06-04 18:49:39 +00:00
Hugo Dutka	a13db0c2d9	make gen	2026-06-04 18:49:39 +00:00
Hugo Dutka	4110034209	source files	2026-06-04 18:42:59 +00:00
Michael Suchacz	502c5acca8	fix(coderd): preserve gateway model names (#26039 ) OpenAI-compatible gateway providers such as OpenRouter require slash-namespaced model IDs to reach the intended upstream model, but native OpenAI routing strips those prefixes. Preserve full model IDs for gateway provider types, reject OpenRouter-like providers configured as native `openai` when a slash model would be stripped, and validate chat model config changes under the provider reference lock while still allowing unrelated edits to existing configs. Split from #26005. > Mux created this PR on behalf of Mike.	2026-06-04 15:33:00 +02:00
Sas Swart	c5631a853a	feat(coderd/aibridged): add boundary correlation fields to RecordInterceptionRequest (#25884 ) Add `optional string boundary_session_id` (field 15) and `optional int64 boundary_sequence_number` (field 16) to `RecordInterceptionRequest` in the AI Bridge proto definition. Regenerate Go bindings. No behavior change. ## Context The [Gateway and Firewall Correlation RFC](https://www.notion.so/coderhq/Gateway-and-Firewall-Correlation-RFC-31ad579be592803aa8b3d48348ccdde9) defines a system for linking Agent Firewall (boundary) audit events with AI Bridge interceptions so that admins can trace an LLM request back to the exact network activity that produced it. The correlation mechanism works as follows: 1. Each boundary process generates a session UUID on startup and assigns a monotonically increasing sequence number to every audit event it records. 2. When boundary proxies a request to AI Bridge, it injects `X-Coder-Agent-Firewall-Session-Id` and `X-Coder-Agent-Firewall-Sequence-Number` headers. 3. AI Bridge reads these headers, records them on the interception, and strips them before forwarding to the upstream LLM provider. 4. The persisted session ID and sequence number allow the frontend to discover which boundary session an interception belongs to, and to fetch only the boundary audit events that occurred between any two interceptions by filtering on the sequence number range. This PR implements the first step: adding the proto fields that carry the correlation data from AI Bridge to coderd's recording service. ## How these fields will be used The two immediate downstream issues depend on these fields: AIGOV-260 adds `boundary_session_id UUID NULL` and `boundary_sequence_number BIGINT NULL` columns to the `aibridge_interceptions` database table, with a partial index on `boundary_session_id`. 
The `RecordInterception` server handler (`coderd/aibridgedserver/aibridgedserver.go`) will read the new proto fields via `GetBoundarySessionId()` and `GetBoundarySequenceNumber()` and pass them through to the database insert query. AIGOV-259 adds the capture-and-strip logic in the AI Bridge interception processor (`aibridge/bridge.go`). It reads the `X-Coder-Agent-Firewall-Session-Id` and `X-Coder-Agent-Firewall-Sequence-Number` headers from the incoming request, adds `BoundarySessionID string` and `BoundarySequenceNumber int64` fields to the `InterceptionRecord` struct (`aibridge/recorder/types.go`), and strips the headers before forwarding upstream. The translator (`coderd/aibridged/translator.go`) will then map these struct fields onto the proto fields added here. Fixes https://linear.app/codercom/issue/AIGOV-252 > [!NOTE] > This PR was generated by [Coder Agents](https://coder.com).	2026-06-04 11:19:57 +02:00
Ethan	3ab1323bc9	fix!: rename chat stream silence timeout error (#25973 ) Renames the Agents chat stream-silence error from `startup_timeout` to `stream_silence_timeout` now that the timeout applies to any gap between provider stream parts, not just first-token startup. Updates the SDK enum, generated API docs/types, chat error copy, and Agents UI stories/status labels so the user-facing wording describes a stalled provider response instead of startup delay. > Breaking change: This is a very minor breaking change for the Coder Agents API: the public chat error kind enum no longer includes `startup_timeout`, so clients matching that specific value should handle `stream_silence_timeout` instead.	2026-06-04 18:36:02 +10:00
Ethan	becc858fa8	fix(coderd/x/chatd): retry provider stream cancellations (#26010 ) Closes CODAGT-541. ## Problem An Agents chat stream could die with a terminal `context cancelled` error and surface to the user as a permanent chat failure, even when no context in our process had actually been canceled. The cancellation was a provider-returned error value (HTTP/2 RST_STREAM mid-body surfacing as `context.Canceled` from Go's net/http2), not a real caller cancel. The chain that produced the bug: - fantasy passed the provider's `context.Canceled` through unchanged. - `chaterror.Classify` short-circuited any `errors.Is(err, context.Canceled)` (or `"context canceled"` text) as terminal generic, before checking HTTP status codes or other retry signals. - `chatretry.Retry` did not retry. - The frontend rendered `type:"error"` and the chat was dead. The same short-circuit also masked retryable 5xx responses whose underlying transport error happened to wrap `context.Canceled`. ## Approach `context.Canceled` has no inherent intent. The same error value can mean a user pressing Stop, a server shutdown, the silence guard firing, or a provider-side stream reset. The only layer that can disambiguate is the one holding both the returned error and the caller context. That is `chatretry`. This PR centralizes the policy there and keeps `chaterror` context-free. ## Changes `coderd/x/chatd/chaterror/classify.go` - Add `ErrProviderTransportReset` sentinel to explicitly mark provider-side stream cancellations. - Remove the broad `context.Canceled` / `"context canceled"` short-circuit so status codes and other retry signals can win. - Classify `ErrProviderTransportReset` (with no status code) as a retryable timeout. - Keep a fallback that classifies bare `context.Canceled` as terminal-generic when no other signal is present, so legitimate caller cancels still terminate cleanly. 
`coderd/x/chatd/chatretry/chatretry.go` - Add `contextError(ctx)` that returns `context.Cause(ctx)` when set, falling back to `ctx.Err()`, so caller-owned cancel causes (`ErrInterrupted`, `errStreamSilenceTimeout`, server shutdown sentinels) propagate cleanly out of the retry loop. - Add `classifyProviderAttemptError(err)` that wraps a bare `context.Canceled` in `ErrProviderTransportReset` and reclassifies. Errors that already classify as retryable or carry a status code are left alone. - Restructure `Retry` so the policy is explicit and readable: check caller cancellation before attempting, run the attempt, check caller cancellation again before normalizing the provider error, then classify and retry. ## End-to-end behavior - Provider returns `context.Canceled` while caller context is healthy: classified as a retryable timeout, retried, the user sees a brief `type:"retry"` event and the chat continues. - User presses Stop: `contextError(ctx)` returns `ErrInterrupted`. Retry stops. `chatloop` flushes partial content and persists. - Stream-silence guard fires: `attemptCtx` is canceled with `errStreamSilenceTimeout`, `guardedStream` produces a classified retryable error, retry proceeds normally on the still-alive parent. - Server shutdown: parent context's cause propagates out, retry stops.	2026-06-04 12:52:37 +10:00
Jon Ayers	167ac7b879	feat: add nats experiment (#25703)	2026-06-03 15:37:19 -05:00
Steven Masley	f1ebc42859	refactor(coderd/rbac): enumerate org-member and org-service-account perms (#25928 ) `organization-member` was created from `allPermsExcept(...)`. This is changed to an explicit enumeration of capabilities. - New resources no longer auto-grant to org members or service accounts. - Adding one now requires an explicit decision in `coderd/rbac/roles.go`.	2026-06-03 08:14:11 -05:00
Paweł Banaszewski	96e3a64b12	feat: add AI Gateway coderd key CRUD endpoints (#25565) Adds create, list, and delete endpoints for AI Gateway keys, which are used to authenticate to coderd. All endpoints require Owner permission.	2026-06-03 13:50:33 +02:00
Mathias Fredriksson	7a84a851ce	fix(coderd): subscribe to pubsub before accepting websocket in watchChats (#25663 ) The watchChats handler called SubscribeWithErr after websocket.Accept, creating a window where clients could trigger events before the subscription was active. Move the subscription before the accept so events accumulate in the pubsub internal queue and drain naturally once the encoder is ready. Fixes CODAGT-480	2026-06-03 13:18:57 +03:00
Mathias Fredriksson	faf0add985	test(coderd/coderdtest/oidctest): scope IDP NotFound errors to IDP paths (#25892 ) The FakeIDP mux.NotFound handler called t.Errorf for any unrecognized HTTP request, failing the owning test. It also never wrote an HTTP response, so the stale caller got a 200 with an empty body, hiding the problem on the caller side. When the IDP runs as a real HTTP server (WithServing), OS port reuse across concurrent test binaries can route stale connections to the IDP port. The source is enterprise provisionerd reconnects and DERP clients from parallel tests whose coderd servers have shut down. Check whether the NotFound request path starts with a known IDP route prefix (/oauth2/, /.well-known/, /login/, /external-auth-validate/). IDP paths: t.Errorf, logger.Error, and 404 response. Non-IDP paths: t.Logf, logger.Warn, and 421 Misdirected Request response. Both branches now return a proper HTTP error so the offending caller can be traced.	2026-06-03 13:06:46 +03:00
Cian Johnston	8b058dc949	feat: add coderd_api_websocket_probes_total metric (#25012 ) Relates to CODAGT-115 Adds metric `coderd_api_websocket_probes_total`. Every successful heartbeat for a given path will increment the metric. Comparing this with `coderd_api_concurrent_websockets` will give an indication of how many websocket connections are open but in a 'wedged' state (when heartbeats stopped versus when we closed the connection).	2026-06-03 10:46:07 +01:00
Michael Suchacz	7703e7a26e	fix: preserve AI provider preset types (#25925 ) > Mux created this PR on behalf of Mike. AI provider creation previously collapsed OpenAI-compatible presets like Google and generic OpenAI-compatible providers to `openai`, which lost the backend provider discriminator. Preserve selected provider types in the create payload, keep explicit stored types authoritative when reconstructing edit form values, and add frontend plus backend regressions for the supported preset types.	2026-06-03 09:24:08 +02:00
Jon Ayers	ec19bc41d8	fix: escape appearance values in HTML output (#25804)	2026-06-02 13:19:16 -05:00
George K	2f011fd2a3	fix: reject oversized and invalid zip uploads (#25877 ) Enforce aggregate limits when converting uploaded ZIP archives to tar so compressed inputs cannot expand without bound in memory. Also treat malformed ZIP entry metadata and content mismatches as client errors during conversion, returning 400 for invalid archives and 413 when expanded tar output exceeds the upload limit. Ref: https://linear.app/codercom/issue/PLAT-274/zip-upload-decompressed-without-aggregate-size-limit-sec-103	2026-06-02 10:11:49 -07:00
Zach	170c33a475	feat: encrypt gitsshkeys.private_key at rest via dbcrypt (#25872 ) Adds an optional dbcrypt wrapper around gitsshkeys.private_key. The column is encrypted on insert and update through enterprise/dbcrypt when external token encryption is configured, and decrypted on read. A new private_key_key_id column references dbcrypt_keys(active_key_digest) so revocation safety is enforced by the existing foreign key. Rows with a NULL key_id stay plaintext and remain readable. Existing plaintext rows can be backfilled by running `coder server dbcrypt rotate`. Generated with assistance from Coder Agents.	2026-06-02 08:36:01 -06:00
Ethan	9fe75587ae	fix: forward user-uploaded PDFs to Anthropic and Bedrock (#25946 ) Previously, user-uploaded PDFs were silently dropped by fantasy's Anthropic provider adapter, so Claude (direct or via Bedrock) only saw the user's text and replied as if no document had been attached. Other providers (OpenAI, Gemini, OpenRouter, Vercel) were unaffected. Bumps `coder/fantasy` past [coder/fantasy#37](https://github.com/coder/fantasy/pull/37) (cherry-pick of upstream [charmbracelet/fantasy#197](https://github.com/charmbracelet/fantasy/pull/197)), which emits an Anthropic `document` content block with a base64 PDF source for `fantasy.FilePart{MediaType: "application/pdf"}` and counts `OfDocument` as user-visible so a PDF-only user message is no longer culled as empty. Adds a regression test (`TestModelFromConfig_AnthropicPDFFilePartReachesProvider`) that drives a `fantasy.FilePart` through the real Anthropic provider against a `chattest.NewAnthropic` stub and asserts the outbound request contains a base64 document block. The test was verified to fail on the previous fantasy pin (the request leaves with zero messages and `Generate` returns EOF) and pass on the new one. Manually verified end-to-end with `./scripts/develop.sh`: uploading a PDF to a Claude-backed Coder Agents chat now lets the model read it. Closes CODAGT-540	2026-06-03 00:16:01 +10:00
Steven Masley	d2697dc5b0	test: data race for TestAIGatewayKeysTableConstraints - shadowed error (#25980) Closes https://github.com/coder/coder/issues/25979 The error variable is shadowed and shared by parallel subtests.	2026-06-02 14:02:29 +00:00
Paweł Banaszewski	32aee9ea4c	feat: add DB queries for ai_gateway_coderd_keys (#25564) Adds Insert, List, and Delete queries for the `ai_gateway_coderd_keys` table.	2026-06-02 13:25:44 +02:00
Michael Suchacz	4d3bfa5fab	fix(coderd/x/chatd): stabilize advisor stream test (#25781 ) `TestAdvisorHappyPath_RootChat` could subscribe after the active test server had already processed the chat and published transient advisor deltas, leaving the live delta collector empty. Use a passive chatd test server until the live subscriber and collector are registered, then start processing and wait for the expected advisor deltas before canceling the stream. Closes coder/internal#1548 Generated by Coder Agents. <details> <summary>Implementation notes</summary> The failing assertion covered stream-only advisor `ResultDelta` events. `CreateChat` signals the processor, so an already-started server can publish those deltas before `Subscribe` registers its local stream subscriber. The test now creates the chat on a passive server, subscribes, starts the collector, then calls `Start()`. </details>	2026-06-02 12:44:45 +02:00
Michael Suchacz	dd22086734	fix(coderd/x/chatd): preserve chat API key after compaction (#25930 ) > Mux updated this PR on behalf of Mike. AI Gateway chat retries after context compaction could lose active turn API key routing metadata because the prompt query keeps the compressed model-only summary but omits the original visible user turn. Persist the active API key ID onto compaction summaries explicitly. Model construction now uses one active-turn lookup helper for visible user turns and compressed summary boundaries, so prompt model construction can recover the key when no later visible user turn exists. Added unit and DB-backed coverage for the compacted prompt path.	2026-06-02 12:19:06 +02:00
Paweł Banaszewski	f22d4e2cbb	feat: add ai_gateway_keys table and related RBAC (#25563) Adds a table to store the keys that standalone AI Gateway replicas will use to authenticate to coderd. Also adds RBAC and audit boilerplate.	2026-06-02 09:28:43 +02:00
Ethan	d0fa9ff986	fix(coderd/x/chatd/chattool): retry workspace name conflicts (#25668 ) Retry Coder Agents workspace creation once with a generated random suffix when the requested workspace name already exists. This preserves structured errors for other conflicts and avoids surfacing avoidable name collisions. Closes CODAGT-386	2026-06-01 13:31:25 +00:00
Danny Kopping	85f56e4944	fix: recreate `ai_provider_type` instead of ADD VALUE (#25895 ) Coder runs all migrations in a single transaction (`pgTxnDriver`). Postgres forbids using an enum value added by `ALTER TYPE ... ADD VALUE` within the same transaction that added it. Migration `000499` widened `ai_provider_type` with `ADD VALUE`, and `000504` casts existing `chat_providers` rows to that enum in the same transaction. On deployments with a legacy provider using one of the new values (for example `openai-compat`), the batch failed with `unsafe use of new value` and the server could not start. Recreate the type (create a new enum, alter the column, drop and rename) instead of using `ADD VALUE`, matching the existing precedent in `000144_user_status_dormant`. A freshly created enum's values are usable immediately in the same transaction, so the cast in `000504` succeeds. The resulting schema is identical, so `make gen` produces no `dump.sql` diff and databases that already applied these migrations see no drift. Added a regression test that seeds an `openai-compat` provider and applies `000499` through `000504` in a single transaction, reproducing the production path. The per-step `Stepper` used by the other migration tests commits each migration separately and cannot surface this class of bug. 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Signed-off-by: Danny Kopping <danny@coder.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-01 13:30:45 +00:00
Danny Kopping	a85462bd49	feat: support adding GitHub Copilot AI provider via UI (#25888 ) Copilot is the only AI provider type that could not be added through the `/ai/settings` UI. The aibridge runtime and the env-var seeding path already supported it, but the runtime CRUD API rejected `type=copilot` and the UI omitted it entirely. The root cause is that Copilot's auth model (a per-request GitHub OAuth token, with no pre-shared key) does not fit the credential-centric add-provider flow that every other provider uses. ## Backend Allow `type=copilot` in `CreateAIProviderRequest.Validate()`, and reject `api_keys` for Copilot on both create (validation) and update (handler sentinel), mirroring the existing Bedrock guards. Copilot carries no stored credential. ## Frontend Add Copilot to the provider type picker (with the `github-copilot.svg` icon) and give the form a credential-free branch: name, display name, and a free-text endpoint defaulting to `https://api.business.githubcopilot.com`, with copy explaining that authentication happens via the user's GitHub token at request time. Copilot maps to the distinct `copilot` wire type rather than collapsing to `openai`, and the edit flow recovers it correctly. The endpoint stays required with a business-tier default; users on the individual or enterprise endpoints edit the field. 🤖 Generated with [Claude Code](https://claude.com/claude-code)	2026-06-01 15:26:37 +02:00
Mathias Fredriksson	82752844bc	fix: isolate MCP HTTP transports from DefaultTransport in tests (#25821 ) Use testing.Testing() inside createTransport to automatically clone http.DefaultTransport when running in tests. In production, DefaultTransport is used as-is (efficient connection pooling). This fixes the CloseIdleConnections flake class: httptest.Server.Close() calls http.DefaultTransport.CloseIdleConnections(), which disrupts any MCP client sharing that transport. The testing.Testing() check means every MCP transport created during tests gets isolation automatically, with no caller changes needed. Closes coder/internal#1016 Closes PLAT-291	2026-06-01 16:17:29 +03:00
Mathias Fredriksson	8b7e040105	fix(coderd/x/chatd/chatloop): discourage doctrine in compaction summaries (#25850 ) Two additions to the compaction summary prompt: 1. Error specificity: the "errors encountered" bullet now instructs the model to keep error notes specific (name the file, the error, the fix) and not generalize from a specific failure to a blanket tool-avoidance rule. This addresses the doctrine crystallization pattern where a single tool failure gets promoted to a standing "avoid tool X" rule that persists across compactions and model swaps. 2. Reproducibility: a new closing sentence instructs the model to reference reproducible content by path, command, or URL rather than inlining it. Content without a stable reproducer is still preserved inline with a brief summary. This targets summary bloat from inlined code blocks (worst case: 34k chars, 76 code blocks reproducing repo content verbatim). Refs CODAGT-331	2026-06-01 12:42:09 +03:00
dylanhuff-at-coder	0401ed3af5	fix(coderd/notifications): serialize pending updates gauge writes (#25495 ) Fixes a race where concurrent notification dispatch goroutines could overwrite `coderd_notifications_pending_updates` with an older buffer-length snapshot. Pending update snapshots now serialize count evaluation with the gauge write, and inhibited dispatch results refresh the metric when buffered.	2026-05-29 11:02:13 -07:00
Jon Ayers	5cdc9e28a9	feat: add nats cluster peer support (#25632)	2026-05-29 11:35:59 -05:00
Mathias Fredriksson	98d5e7948d	fix(coderd/autobuild): handle concurrent build number race in lifecycle executor (#25824 ) The lifecycle executor did not handle unique-violation errors from InsertWorkspaceBuild. When a concurrent actor (API handler, another lifecycle executor, or prebuilds reconciler) inserts a workspace build with the same build number, PostgreSQL returns a unique constraint violation on workspace_builds_workspace_id_build_number_key. The lifecycle executor treated this as a hard error, logging it and storing it in stats.Errors. The per-workspace advisory lock (pg_try_advisory_xact_lock) prevents two lifecycle executors from racing, but does not protect against races with the CreateWorkspaceBuild API handler or the prebuilds reconciler, which use different (or no) locking. Catch the specific unique-violation error after InTx returns (where the transaction is already rolled back) and clear it. The concurrent actor's build takes effect; the lifecycle executor treats the workspace as a no-op for this tick. Closes coder/internal#455 Closes PLAT-290	2026-05-29 17:12:31 +03:00
Yevhenii Shcherbina	1a91d31793	feat: add user AI budget override endpoints (#25439 ) Implements https://linear.app/codercom/issue/AIGOV-285 Follow the structure established in https://github.com/coder/coder/pull/25203 ## Summary Adds the `user_ai_budget_overrides` table and CRUD API at `/api/v2/users/{user}/ai/budget`. An override sets a custom per-user spend cap that supersedes group-budget resolution, attributing spend to a specific group. ## Schema ```sql CREATE TABLE user_ai_budget_overrides ( user_id UUID PRIMARY KEY REFERENCES users(id) ON DELETE CASCADE, group_id UUID NOT NULL REFERENCES groups(id) ON DELETE CASCADE, spend_limit_micros BIGINT NOT NULL CHECK (spend_limit_micros >= 0), created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW() ); ``` ## Membership lifecycle The membership invariant — a user must be a member of the attributed group, including when that group is "Everyone" — would naturally be expressed as a composite FK on `(user_id, group_id) → group_members_expanded(user_id, group_id)`. PostgreSQL doesn't allow foreign keys to reference views, so enforcement is split across two mechanisms: - Write-time check. A CHECK constraint on the table (`user_ai_budget_overrides_must_be_group_member`) calls a `STABLE` function `is_group_member(user_id, group_id)` that queries `group_members_expanded`. The view surfaces both regular group memberships and the implicit "Everyone" group memberships from `organization_members`. Any INSERT or UPDATE that violates the predicate is rejected with a Postgres `check_violation`, which the handler maps to a 400. `is_group_member` is defined as a general predicate, reusable by any future table that needs the same check. - Cascade on removal. Two `BEFORE DELETE` triggers handle membership loss: - `trigger_delete_user_ai_budget_overrides_on_group_member_delete` on `group_members` — covers regular group removals (admin action, OIDC sync). 
- `trigger_delete_user_ai_budget_overrides_on_org_member_delete` on `organization_members` — covers the "Everyone" group, whose membership lives in `organization_members`. The single-column FKs on `users(id)` and `groups(id)` remain to cascade on user or group deletion (those paths don't pass through `group_members`). ## Authorization The dbauthz layer gates each operation against the `User` and (for writes) `Group` resources: \| Operation \| User resource \| Group resource \| \|-----------\|----------------\|----------------\| \| `GET` \| `ActionRead` \| — \| \| `PUT` \| `ActionUpdate` \| `ActionUpdate` \| \| `DELETE` \| `ActionUpdate` \| `ActionUpdate` \| For `DELETE`, the dbauthz layer fetches the existing override first to learn the attributed `group_id`, then runs both checks. ### Role matrix \| Role \| GET \| PUT \| DELETE \| \|--------------\|-----\|-----\|--------\| \| Owner \| ✅ \| ✅ \| ✅ \| \| UserAdmin \| ✅ \| ✅ \| ✅ \| \| OrgAdmin \| ✅ \| ❌ \| ❌ \| \| OrgUserAdmin \| ✅ \| ❌ \| ❌ \| Internal discussion: https://codercom.slack.com/archives/C096PFVBZKN/p1779392747885359 ## Audit logs Audit logs will be addressed in a follow-up PR.	2026-05-29 10:08:25 -04:00
Danny Kopping	110210d7c9	fix(coderd): block ai provider env key drift (#25849 ) Previously, `SeedAIProvidersFromEnv` only hashed provider-level fields, so env var key changes were silently ignored once a provider already existed in the database. Include bearer keys and Bedrock credentials in the canonical drift hash, and cover multi-key, multi-provider cases so restarts now fail loudly when the configured credentials no longer match what is stored. When changing a key, you'll now see this in the server startup logs: ``` 2026-05-29 12:29:02.674 [info] api: Encountered an error running "coder server", see "coder server --help" for more information 2026-05-29 12:29:02.674 [info] api: error: create coder API: 2026-05-29 12:29:02.674 [info] api: github.com/coder/coder/v2/cli.(RootCmd).Server.func2 2026-05-29 12:29:02.674 [info] api: /home/coder/coder/cli/server.go:1015 2026-05-29 12:29:02.674 [info] api: - seed ai providers from env: 2026-05-29 12:29:02.674 [info] api: github.com/coder/coder/v2/enterprise/cli.(RootCmd).Server.func1 2026-05-29 12:29:02.674 [info] api: /home/coder/coder/enterprise/cli/server.go:187 2026-05-29 12:29:02.674 [info] api: - execute transaction: 2026-05-29 12:29:02.674 [info] api: github.com/coder/coder/v2/coderd/database.(sqlQuerier).runTx 2026-05-29 12:29:02.674 [info] api: /home/coder/coder/coderd/database/db.go:212 ---> 2026-05-29 12:29:02.674 [info] api: - AI provider "vercel" already exists in the database and differs from the current environment configuration; update the provider through the API or remove the CODER_AIBRIDGE_ env vars to stop seeding it: 2026-05-29 12:29:02.674 [info] api: github.com/coder/coder/v2/coderd.SeedAIProvidersFromEnv.func1 2026-05-29 12:29:02.674 [info] api: /home/coder/coder/coderd/ai_providers_migrate.go:139 2026-05-29 12:29:02.674 [info] api: slogjson: failed to write entry: io: read/write on closed pipe 2026-05-29 12:29:02.700 [info] dlv: Stop reason: exited 2026-05-29 12:29:02.825 [info] site: ELIFECYCLE 
Command failed. error: running command "develop": server did not become ready in 1m0s: main.waitForHealthy /home/coder/coder/scripts/develop/main.go:877 - context canceled ``` _This PR was generated with Coder Agents._	2026-05-29 13:14:55 +00:00
Cian Johnston	d0a51da0a9	feat: classify provider_disabled 503 as non-retryable (#25800 ) Builds on top of https://github.com/coder/coder/pull/25794 Adds a new `provider_disabled` error classification in `chatd` with the corresponding plumbing to classify it as non-retryable. Also adds a story for how this particular error kind is displayed in the UI.	2026-05-29 13:14:04 +01:00
Susana Ferreira	7b903cad73	fix: track credential hint across key failover attempts in aibridge (#25735) ## Problem Centralized requests recorded the first available key from the pool at `CreateInterceptor` time as `credential_hint`, so the interception could be persisted in the database with a hint that didn't match the key that actually served the request. The fix is to store, at the end of the interception, the hint of the key that succeeded, or of the last attempted key if all keys are unavailable. ## Changes - Add `Key.Hint()` and update `credential_hint` on every failover attempt so it reflects the actually-used key. - Stop pre-populating `credential_hint` at `CreateInterceptor`. Centralized starts empty and is updated by the key failover loop. - Persist the final hint via `RecordInterceptionEnded`; SQL updates `credential_hint` only when `credential_kind = 'centralized'` so BYOK keeps its start-time value. - Log the actually-used hint on interception end/failure; start log uses a `<keypool-pending>` placeholder for centralized. > [!NOTE] > Initially generated by Claude Opus 4.7, modified and reviewed by @ssncferreira	2026-05-29 12:01:37 +01:00
Sas Swart	a586b7e5e0	feat: add `boundary_log` rbac resource (#24810 ) RFC: [Bridge ↔ Boundaries Correlation RFC](https://www.notion.so/coderhq/Gateway-and-Firewall-Correlation-RFC-31ad579be592803aa8b3d48348ccdde9) Register a dedicated `boundary_log` RBAC resource type with `create`, `read`, and `delete` actions, replacing the placeholder `rbac.ResourceAuditLog` and `rbac.ResourceSystem` references previously used in the dbauthz layer. Create is granted at user-level so workspace agents can only write logs owned by their workspace owner, preventing cross-workspace log fabrication. Delete is restricted to `DBPurge` only; no human role (including owner) can delete boundary logs. \| Subject \| Create (own) \| Create (other) \| Read (all) \| Delete \| \|---\|---\|---\|---\|---\| \| Workspace agent \| yes \| no \| no \| no \| \| Owner (site admin) \| yes (via member) \| no \| yes \| no \| \| Auditor \| no \| no \| yes \| no \| \| DBPurge \| no \| no \| no \| yes \| ### Changes - RBAC policy & resource definition: add `boundary_log` to `policy.go` and generate `ResourceBoundaryLog` object, scope constants, and codersdk/TypeScript types. - dbauthz authorization: replace all `ResourceAuditLog`/`ResourceSystem` placeholders with `ResourceBoundaryLog`. `InsertBoundaryLog` and `InsertBoundarySession` derive the workspace owner from the agent and authorize with `.WithOwner()` for user-scoped create. - Role assignments: - Owner (site): read only. Excluded from `allPermsExcept` wildcard; create is inherited from member at user-level. - Member (user-level): create. User-scoped so agents can only write logs they own. - Auditor (site): read. - `boundary_log` is excluded from org-admin, org-member, and org-service-account `allPermsExcept` calls for consistency with `ResourceBoundaryUsage`. - System subjects: - DB Purge (`SubjectTypeDBPurge`): delete. The only subject that can remove boundary logs. 
- Workspace agent scope: `ResourceBoundaryLog` with wildcard ID in the agent scope allow-list (necessary for creation since no pre-existing ID exists). User-level role scoping prevents deployment-wide access. - DB migration (`000510_boundary_log_scopes`): add `boundary_log:`, `boundary_log:create`, `boundary_log:delete`, `boundary_log:read` enum values to `api_key_scope`. - Test coverage: `BoundaryLogCreate` (user-scoped, only matching owner succeeds), `BoundaryLogDelete` (all human roles denied), `BoundaryLogRead` (owner + auditor). dbauthz mock tests set up workspace agent lookups for owner derivation. - Generated docs: update OpenAPI specs, API reference docs, and frontend type definitions. --------- Co-authored-by: Muhammad Danish <mdanishkhdev@gmail.com> Co-authored-by: Coder Agents <coder-agents-review[bot]@users.noreply.github.com>	2026-05-29 12:50:39 +02:00
Danny Kopping	5b10268827	feat: serve 503 sentinel for disabled providers (#25794) _Disclosure: created with Coder Agents._ When providers are disabled, we should serve a sentinel error so the requesting client (Claude Code, Coder Agents, etc.) is informed. Coder Agents can also condition its display on this sentinel to show a helpful error message. --------- Signed-off-by: Danny Kopping <danny@coder.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-29 10:24:16 +02:00
Ethan	eb2c2799ca	fix: strip deleted MCP IDs from chats on delete (#25763) Adds a database migration that reconciles existing stale chat MCP server IDs, then installs a `BEFORE DELETE` trigger on `mcp_server_configs` to remove the deleted ID from `chats.mcp_server_ids`. This keeps chat continuation from failing with `400 One or more MCP server IDs are invalid` after an MCP server config is deleted. This matches the existing repo precedent in `coderd/database/migrations/000241_delete_user_roles.up.sql`, where deleting a custom role cleans `organization_members.roles`, a similarly structured array of references that cannot be protected by a normal foreign key. Closes CODAGT-505	2026-05-29 16:49:25 +10:00
Jon Ayers	bb11946bd4	fix: require update permission to recreate devcontainers (#25812) The httpmw upstream of this endpoint only checks for read permission on the workspace agent. Recreating a dev container should require `update` permission since it mutates state. This also matches the behavior of the `DELETE` endpoint.	2026-05-28 15:34:36 -05:00
Cian Johnston	7ea0eff94e	fix: improve chat audit log descriptions and diff rendering (#25728) Chat ACL audit diffs rendered as `[object Object]` because the diff viewer called `.toString()` on object values. Common chat operations (archive, share) showed generic "updated chat" descriptions instead of semantic ones. Add `chatAuditLogDescription` to derive semantic descriptions from the audit diff for successful chat writes: "archived/unarchived chat" for archive toggles, "updated sharing for chat" for ACL-only changes. Extract diff value formatting into `formatAuditDiffValue`, which renders object values as deterministic compact JSON with sorted keys, fixing the `[object Object]` rendering for chat ACLs and any other object-valued fields. The previous `determineIdPSyncMappingDiff` workaround for IdP sync mappings was removed because the generic formatting handles it. Closes CODAGT-513 > Generated by Coder Agents on behalf of @johnstcn	2026-05-28 18:37:57 +01:00
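
The "deterministic compact JSON with sorted keys" rendering described in the commit above can be illustrated with a short sketch. The real `formatAuditDiffValue` lives in the TypeScript frontend and is not shown on this page; the Python function below, `format_audit_diff_value`, is a hypothetical stand-in that demonstrates only the idea of sorting keys and dropping whitespace so that equal objects always render identically.

```python
import json


def format_audit_diff_value(value) -> str:
    """Render an audit diff value as display text.

    Strings pass through unchanged. Everything else (objects, arrays,
    numbers, booleans, null) is rendered as compact JSON with sorted keys,
    so the output does not depend on key insertion order.
    """
    if isinstance(value, str):
        return value
    return json.dumps(value, sort_keys=True, separators=(",", ":"))
```

Because keys are sorted at every nesting level, two ACL objects with the same entries in a different order produce the same string, which keeps diffs stable.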
Danielle Maywood	0d1340a430	fix: collapse agent command output by default (#25748)	2026-05-28 16:54:52 +01:00
Steven Masley	4591212482	feat: implement SCIM handler for SCIM 2.0 compliance (#25572) Rewrites the SCIM 2.0 user provisioning handler to be RFC 7644 compliant. Verified against an external IdP (Okta). The new behavior is opt-in.	2026-05-28 10:00:37 -05:00
Cian Johnston	6df1536256	fix: add missing_key error kind for missing chat api_key_id (#25783) Refs CODAGT-486 - `codersdk/chats.go`: New `ChatErrorKindMissingKey` constant and `AllChatErrorKinds` entry - `coderd/x/chatd/chaterror/message.go`: `terminalMessage` and `retryMessage` cases - `coderd/x/chatd/model_routing_aibridge.go`: Pre-classify error with `WithClassification` - `coderd/x/chatd/model_routing_internal_test.go`: Classification assertion on production path (CRF-2) - `chatStatusHelpers.ts`: Frontend title "Chat interrupted" - `LiveStreamTail.stories.tsx`: Storybook story with `detail` assertion - `docs/ai-coder/ai-gateway/clients/coder-agents.md`: Troubleshooting entry - Tests: classification round-trip, terminal message, metrics kind enumeration > Generated with [Coder Agents](https://coder.com/agents) on behalf of @johnstcn	2026-05-28 15:50:52 +01:00
Danny Kopping	12520ee964	feat: add ai provider status and reload freshness metrics (#25770) Add metrics for `aibridged` and `aibridgeproxyd`'s provider statuses. AI providers can be modified, and possibly misconfigured, at runtime. These metrics help operators understand the state of these provider definitions in case unexpected behaviour is observed.	2026-05-28 14:57:33 +02:00
Ethan	7e2f7198dd	fix(coderd/x/chatd/chatloop): use stream silence timeout (#25782) Replaces the 60 second first-token timeout in the chat loop with a 10 minute stream-silence timeout. Previously, the guard bounded only the gap before the first stream part. Once any part arrived the attempt could hang indefinitely if the provider stopped streaming without closing the connection, and even normal long-running responses could be killed after 60 seconds if the provider was slow to emit the first token. The guard now arms when a model attempt opens its stream, resets on every received stream part, and fires after 10 minutes of complete silence. The existing retry path still handles the timeout, and the public `startup_timeout` error kind is preserved to avoid API and frontend churn. 10 minutes matches the default request timeout used by the Anthropic and OpenAI Python SDKs. Closes CODAGT-493	2026-05-28 21:02:40 +10:00
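
The stream-silence guard described in the commit above (arm when the stream opens, reset on every part, fire after a period of complete silence) can be sketched as follows. This is an illustrative Python sketch, not the Go code in `coderd/x/chatd/chatloop`; `StreamSilenceTimeout` and `read_with_silence_timeout` are hypothetical names, and the timeout is a parameter rather than the fixed 10 minutes from the commit.

```python
import asyncio


class StreamSilenceTimeout(Exception):
    """Raised when the stream produces no part within the silence window."""


async def read_with_silence_timeout(stream, silence_timeout: float):
    """Yield parts from an async iterable, bounding the gap between parts.

    Each wait for the next part gets a fresh timeout, so the guard is
    effectively re-armed on every received part, including the first one.
    """
    iterator = stream.__aiter__()
    while True:
        try:
            part = await asyncio.wait_for(
                iterator.__anext__(), timeout=silence_timeout
            )
        except StopAsyncIteration:
            return  # the provider closed the stream normally
        except asyncio.TimeoutError:
            raise StreamSilenceTimeout(
                f"no stream part received for {silence_timeout} seconds"
            ) from None
        yield part
```

Unlike a first-token timeout, this bounds every gap in the stream, so a slow but steadily streaming response is never cut off while a stalled connection is still detected.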
Michael Suchacz	f529577bee	fix(coderd/x/chatd): harden openai-compatible chat calls (#25737) OpenAI-compatible chat paths hit two provider compatibility issues. Some compatible endpoints reject a named `tool_choice` when there is only one tool, and Gemini's OpenAI-compatible endpoint requires thought signatures on current-turn tool calls. Centralize OpenAI-compatible request patches in the chat provider: rewrite single named tool choices to `"required"`, and add the documented dummy Google thought signature to the first tool call in each current-turn tool step for Gemini routes. Vercel OpenAI-compatible requests are left unchanged for the thought-signature patch. > Mux created this PR on behalf of Mike.	2026-05-28 10:27:32 +02:00
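
The first patch in the commit above, rewriting a single named tool choice to `"required"`, can be sketched on a plain request dictionary. This is a hedged Python illustration rather than the Go implementation in `coderd/x/chatd`; the function name `patch_tool_choice` is hypothetical, and the check that the named function matches the only declared tool is an assumption of this sketch. The thought-signature patch is not shown because the commit message does not give its value.

```python
import copy


def patch_tool_choice(request: dict) -> dict:
    """Return a copy of an OpenAI-style chat request with tool_choice patched.

    When exactly one tool is declared and tool_choice names that tool,
    the named choice is replaced by "required", which forces the same
    tool without using the object form some endpoints reject.
    """
    patched = copy.deepcopy(request)
    tools = patched.get("tools") or []
    choice = patched.get("tool_choice")
    if isinstance(choice, dict) and choice.get("type") == "function" and len(tools) == 1:
        only_name = tools[0].get("function", {}).get("name")
        if choice.get("function", {}).get("name") == only_name:
            patched["tool_choice"] = "required"
    return patched
```

Requests with several tools, or with a string `tool_choice` such as `"auto"`, pass through unchanged, and the input dictionary is never mutated.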

3965 Commits