coder

mirror of https://github.com/coder/coder.git synced 2026-06-03 04:58:23 +00:00

Author	SHA1	Message	Date
Cian Johnston	581f3bdd14	fix(coderd/httpapi): stop writing websocket frames to ResponseRecorder in test (#25284) The `mockEventSenderWrite` function in `newOneWayWriter()` wrote WebSocket frame data to both the `net.Pipe` and the `httptest.ResponseRecorder`. After `websocket.Accept()` calls `WriteHeader(101)`, the recorder rejects body writes with `"response status code does not allow body"`. When `HeartbeatClose` sends a ping, the control frame flush routes through the recorder, producing an ERROR-level log that `slogtest` catches as a test failure. Removed the `recorder.Write(b)` call from the write function. The recorder is only needed for header/status inspection; WebSocket frame data should only go through the `net.Pipe`. Closes https://github.com/coder/internal/issues/1521 > 🤖 Generated by Coder Agents	2026-05-14 09:15:14 +01:00
Jaayden Halko	024132e8a4	feat: add theme_mode, theme_light, theme_dark to UserAppearanceSettings (#25076) Part 1: Backend portion of a change broken into 2 PRs. Part 2: #25077 Adds three new UserAppearanceSettings fields (theme_mode, theme_light, theme_dark) on top of the existing theme_preference and terminal_font. Replaces GetUserThemePreference and GetUserTerminalFont with a single GetUserAppearanceSettings aggregate query. The PUT handler is wrapped in db.InTx so sync-mode's mode + slot writes can never half-apply.	2026-05-14 05:44:05 +01:00
Ethan	a35f71cd8a	fix(coderd/x/chatd): retry HTTP/2 stream resets (#25170) Mid-stream HTTP/2 peer resets from LLM providers can arrive after a 200 streaming response has already emitted provisional parts. Previously those resets fell through as generic non-retryable errors because `stream ID` messages did not match retryable transport signals, and stream IDs could be misread as HTTP statuses. Classify retryable HTTP/2 RST_STREAM codes as transient timeout failures, ignore stream IDs during status extraction, and keep the existing `retry` event as the rollback boundary for provisional message parts so replacement attempts do not replay failed-attempt output. Closes CODAGT-382	2026-05-14 11:40:43 +10:00
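The following Go sketch shows the two classification rules described above. It assumes the error text follows the shape of `golang.org/x/net/http2`'s `StreamError` message (`stream error: stream ID N; CODE...`). The helper names (`isRetryableStreamReset`, `extractHTTPStatus`) are hypothetical, and so is the set of retryable codes, because the commit does not say which RST_STREAM codes chatd retries.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// retryableRSTCodes is an ASSUMED set of HTTP/2 RST_STREAM codes treated as
// transient; the commit does not enumerate the real list.
var retryableRSTCodes = map[string]bool{
	"INTERNAL_ERROR":    true,
	"REFUSED_STREAM":    true,
	"CANCEL":            true,
	"ENHANCE_YOUR_CALM": true,
}

// rstStreamRe captures the error code from text shaped like
// "stream error: stream ID 7; INTERNAL_ERROR; received from peer".
var rstStreamRe = regexp.MustCompile(`stream error: stream ID \d+; ([A-Z_0-9]+)`)

// streamIDRe matches "stream ID <n>" so the number can be dropped before
// scanning for an HTTP status code.
var streamIDRe = regexp.MustCompile(`stream ID \d+`)

// statusRe matches a standalone three-digit number in the 1xx-5xx range.
var statusRe = regexp.MustCompile(`\b([1-5]\d{2})\b`)

// isRetryableStreamReset reports whether msg describes an RST_STREAM whose
// code should be retried as a transient (timeout-class) failure.
func isRetryableStreamReset(msg string) bool {
	m := rstStreamRe.FindStringSubmatch(msg)
	return m != nil && retryableRSTCodes[m[1]]
}

// extractHTTPStatus returns the first status-like number in msg after
// removing stream IDs, or 0 when none is present.
func extractHTTPStatus(msg string) int {
	cleaned := streamIDRe.ReplaceAllString(msg, "stream ID")
	m := statusRe.FindStringSubmatch(cleaned)
	if m == nil {
		return 0
	}
	code, _ := strconv.Atoi(m[1])
	return code
}

func main() {
	msg := "stream error: stream ID 503; INTERNAL_ERROR; received from peer"
	fmt.Println(isRetryableStreamReset(msg))                            // true
	fmt.Println(extractHTTPStatus(msg))                                 // 0: the stream ID is not a status
	fmt.Println(extractHTTPStatus("unexpected status 502 Bad Gateway")) // 502
}
```

Stripping `stream ID <n>` before scanning for a status is what stops a stream numbered 503 from being read as an HTTP 503.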
Michael Suchacz	d1a471e29e	fix(coderd/x/chatd): retune subagent selection guidance (#25311) > Mux working on behalf of Mike. ## Summary - retune chatd subagent guidance to prefer `general` for substantial delegated work, including read-only synthesis and planning support - narrow `explore` guidance to repository-local code lookup and bounded tracing - add regression tests for planning, spawn tool, and Plan Mode guidance text ## Tests - `go test ./coderd/x/chatd -run 'Test(DefaultSystemPromptPlanningGuidance_SteersSubagentSelection|SpawnAgent_DescriptionSteersGeneralForSubstantialResearch|SpawnAgent_PlanModeDescriptionOmitsComputerUse|PlanningOverlaySubagentGuidance_UsesPlanModeSafeDescriptions|ExploreSubagentIsReadOnly)$'` - `make lint` - `make test TEST_PACKAGES=./coderd/x/chatd RUN=Guidance && make test TEST_PACKAGES=./coderd/x/chatd RUN=Description` - pre-commit hook during `git commit`	2026-05-13 23:10:21 +02:00
Kayla はな	341051ceee	fix: exclude service accounts from license seat count (#24401)	2026-05-13 13:55:53 -07:00
Zach	e0be9bf213	feat: surface missing coder_secret requirements on resolve-autostart (#25081) Adds `dynamicparameters.EvaluateSecretMismatch` as a shared helper on top of the existing renderer, then wires it into the resolve-autostart handler so the UI can surface unsatisfied `coder_secret` requirements in a template alongside parameter mismatches for autostart. The lifecycle executor changes that depend on this helper will land in a follow-up. The UI changes that consume the new `secret_mismatch` field are also a follow-up. Generated with assistance from Coder Agents.	2026-05-13 14:20:02 -06:00
George K	49c6191bbe	fix(coderd/azureidentity): add Azure IMDS G2 chain certificates (#25243) Azure IMDS attested data signatures can now chain through Microsoft TLS G2 RSA CA OCSP intermediates, then through the cross-signed Microsoft TLS RSA Root G2 certificate, before reaching DigiCert Global Root G2. coderd did not bundle the new G2 OCSP intermediates or the cross-signed Microsoft TLS RSA Root G2 bridge certificate, so it could fail to build a trusted chain for affected IMDS signatures. Related to: https://linear.app/codercom/issue/PLAT-205/bug-azure-instance-identity-verification-is-broken	2026-05-13 09:07:44 -07:00
Kyle Carberry	5040ab6fca	feat: filter chats by diff URL via the q search parameter (#24970) Adds a `diff_url:` term to the `q` search parameter on `GET /api/experimental/chats` so callers can look up the chat associated with a particular pull request, merge request, or any other URL persisted on the chat's diff status.
```
q=diff_url:"https://github.com/coder/coder/pull/123"
```
Match is case-insensitive. When the URL lives on a delegated sub-agent's diff status, the parent chat is returned so the relationship surfaces from a single lookup. <details> <summary>Design notes</summary> - Forge-agnostic. Reuses the existing `chat_diff_statuses.url` column rather than introducing a `pr:` vocabulary, since the SDK already documents the URL as "may point to a pull request or a branch page depending on whether a PR has been opened." Works for GitHub PRs, GitLab MRs, branch pages, etc. - Composes with `archived:`. The two terms can be combined: `q=archived:true diff_url:"..."`. - Case handling. The parser used to lowercase the entire `q` string up front, which would mangle URL path segments. Switched to lowercasing only the field key inside `searchTerms` (already happens there) and keeping the value as the caller typed it. The SQL comparison lowercases on both sides. - Validation. `diff_url` must be a syntactically valid HTTP(S) URL with a non-empty host. No forge-specific validation. - Index. Adds `idx_chat_diff_statuses_url_lower` on `LOWER(url)` so the lookup is cheap even on large datasets. - Sub-agent fan-in. The `EXISTS` clause matches when the URL lives on the chat itself or any chat with `root_chat_id` equal to the chat's id, so a delegated sub-agent's PR pulls in its parent. - Deferred. Sentinels like `pr:any` / `pr:none` and a forge-agnostic state filter (`diff_state:open|merged|closed`) were intentionally left out of this change. They couple cleanly to a second forge or a clearer product call, and shipping them now would lock in vocabulary we may want to revisit.
</details> ## Tests - `coderd/searchquery`: parser tests for valid URLs, case handling (key insensitive, value preserved), composition with `archived:`, and validation errors (non-HTTP scheme, missing host, malformed URL). - `coderd/exp_chats_test.go`: end-to-end coverage hitting `ListChats`. Verifies a root chat matches its own URL, a parent chat surfaces when only a sub-agent has the URL, lookups are case-insensitive, non-matching URLs return empty, and invalid URLs return `400`. --- _This PR was authored by a Coder Agent on behalf of @kylecarbs._	2026-05-13 11:06:42 -04:00
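The parsing and validation rules in the design notes above can be sketched in Go as follows. This is a minimal, dependency-free sketch. `parseTerm`, `validateDiffURL`, and `diffURLMatches` are hypothetical helpers, not the real `searchTerms` code in `coderd/searchquery`. `diffURLMatches` only approximates the SQL `LOWER(...) = LOWER(...)` comparison.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// parseTerm splits one "key:value" term (hypothetical helper). Only the key
// is lowercased; the value keeps the caller's casing so URL paths survive.
func parseTerm(term string) (key, value string, ok bool) {
	k, v, found := strings.Cut(term, ":")
	if !found {
		return "", "", false
	}
	return strings.ToLower(k), strings.Trim(v, `"`), true
}

// validateDiffURL applies the stated rule: a syntactically valid HTTP(S)
// URL with a non-empty host, and no forge-specific checks.
func validateDiffURL(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return fmt.Errorf("malformed diff_url: %w", err)
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return errors.New("diff_url must use http or https")
	}
	if u.Host == "" {
		return errors.New("diff_url must include a host")
	}
	return nil
}

// diffURLMatches mimics the SQL side, which lowercases both operands.
func diffURLMatches(stored, query string) bool {
	return strings.ToLower(stored) == strings.ToLower(query)
}

func main() {
	key, value, _ := parseTerm(`DIFF_URL:"https://github.com/coder/coder/pull/123"`)
	fmt.Println(key, value, validateDiffURL(value))
	fmt.Println(validateDiffURL("ftp://example.com/x"))
	fmt.Println(diffURLMatches("https://GitHub.com/coder/coder/pull/123", value))
}
```

`strings.Cut` splits only on the first colon, so the `https:` inside the value is left intact.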
Jakub Domeracki	1a1f06aa79	fix: verify PKCS7 signature on Azure instance identity tokens (#25286) Migrates Azure instance identity verification from `go.mozilla.org/pkcs7` and `github.com/fullsailor/pkcs7` to `github.com/smallstep/pkcs7`, using `VerifyWithChainAtTime` to validate both the PKCS7 signature and the certificate chain in one call. The previous code only verified the signer certificate against a set of intermediates/roots but did not verify that the PKCS7 signature itself covered the content, meaning tampered payloads could be accepted. The `Options` struct is restructured to accept `Roots`, `Intermediates`, and `CurrentTime` as explicit fields instead of embedding `x509.VerifyOptions`. The test helper `NewAzureInstanceIdentity` now builds a realistic 3-level certificate chain (Root CA -> Intermediate CA -> Signing Cert) matching the real Azure trust hierarchy. New tests (`TestValidate_TamperedContent`, `TestValidate_UntrustedCertWithValidSignature`) confirm tampered and untrusted envelopes are rejected. Addresses GHSA-6x44-w3xg-hqqf. > [!NOTE] > This PR was authored by Coder Agents.
<details> <summary>Implementation Plan</summary>

### Files Changed

| File | Summary |
|------|---------|
| `coderd/azureidentity/azureidentity.go` | Replace `signer.Verify()` with `VerifyWithChainAtTime`; restructure `Options` struct; add `ParseCertificates()` helper |
| `coderd/azureidentity/azureidentity_test.go` | Add `testCertChain` builder, tampered-content and untrusted-cert tests; update existing tests for new `Options` API |
| `coderd/coderd.go` | Change `AzureCertificates` field from `x509.VerifyOptions` to `azureidentity.Options` |
| `coderd/workspaceresourceauth.go` | Pass `api.AzureCertificates` directly instead of wrapping |
| `coderd/coderdtest/coderdtest.go` | Migrate to `smallstep/pkcs7`; build 3-level cert chain in test helper |
| `go.mod` / `go.sum` | Add `github.com/smallstep/pkcs7`; remove `fullsailor/pkcs7` and `go.mozilla.org/pkcs7` |

</details>	2026-05-13 14:14:07 +00:00
Jakub Domeracki	57b11d405f	fix(coderd): harden Azure identity certificate fetch (#25274) Security improvements: - Restrict cert fetches to a host+port allowlist (Microsoft and DigiCert on 80/443). - Route requests through a dedicated `http.Client` that resolves the host once and dials the validated IP directly, preventing DNS rebinding. - Reject loopback, private (RFC 1918 / IPv6 ULA), link-local, multicast, unspecified, CGNAT, benchmarking, and IPv4-mapped IPv6 addresses. - Cap the certificate response body at 1 MiB. - Log the underlying error via slog and return a generic detail to the caller to prevent information disclosure.	2026-05-13 12:51:44 +02:00
Jakub Domeracki	9400eaa957	revert(coderd): "Merge commit from fork" (#25273) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 12:10:27 +02:00
Jakub Domeracki	fb3aef1883	Merge commit from fork * fix(coderd): Harden Azure identity certificate fetch - Restrict cert fetches to a host+port allowlist (Microsoft and DigiCert on 80/443). - Route requests through a dedicated `http.Client` that resolves the host once and dials the validated IP directly. - Reject loopback, private (RFC 1918 / IPv6 ULA), link-local, multicast, unspecified, CGNAT, benchmarking, and IPv4-mapped IPv6 addresses. - Cap the certificate response body at 1 MiB. - Log the underlying error via slog and return a generic detail to the caller. - Add unit tests for the URL allowlist, IP classification, and dialer. * fix(coderd/azureidentity): add IPv6 special-use ranges to SSRF blocklist The extraBlockedNetworks list only contained IPv4 CIDRs. Add IPv6 equivalents that Go's stdlib classification methods do not cover: - 64:ff9b:1::/48 RFC 8215 NAT64 translation - 100::/64 RFC 6666 discard-only - 2001:2::/48 RFC 5180 benchmarking - 2001:db8::/32 RFC 3849 documentation IPv6 ranges already handled by stdlib (unchanged): - ::1/128 (IsLoopback) - fc00::/7 (IsPrivate, ULA) - fe80::/10 (IsLinkLocalUnicast) - ff00::/8 (IsMulticast) - ::/128 (IsUnspecified)	2026-05-13 11:55:41 +02:00
Ethan	8955599bd0	fix: bump sqlc fork to v1.31.1 merge, strip pg_dump meta-commands (#25105) Closes https://github.com/coder/internal/issues/965 Recent `pg_dump` patch releases (13.22+ / 14.19+ / 15.14+ / 16.10+ / 17.6+) emit `\restrict` / `\unrestrict` psql meta-commands at the head and tail of schema dumps. These broke both `sqlc` and our `scripts/migrate-test` schema-equality check. PR #19696 worked around it by pinning `pg_dump` to a Docker image. This change unpins the workaround now that `sqlc` handles the meta-commands: * Bumps the coder/sqlc fork pin to [`337309b` on coder/sqlc:main](https://github.com/coder/sqlc/commit/337309bfb9524f38466a5090e310040fc7af0203), the merge of upstream v1.31.1 (coder/sqlc#6). v1.31.1 includes [sqlc-dev/sqlc#4390](https://github.com/sqlc-dev/sqlc/pull/4390), the upstream `\restrict` / `\unrestrict` parser fix. Updated in three places that pin the fork SHA: `flake.nix` (`sqlc-custom`), `.github/actions/setup-sqlc/action.yaml`, and the `dogfood/coder/ubuntu-{22,26}.04` Dockerfiles. The flake's `sha256` / `vendorHash` are reset to `pkgs.lib.fakeSha256`; Nix will surface the real hashes on first build, per the existing comment block. * Reverts #19696's Docker pin in `coderd/database/dbtestutil/db.go`. Local `pg_dump` (13+) and the `postgres:13` Docker fallback both work again. * Strips `\restrict` / `\unrestrict` lines in `normalizeDump` so `scripts/migrate-test`'s schema comparison is stable across `pg_dump` versions (the token in those lines is randomized per run). `TestNormalizeDumpStripsRestrict` locks the behavior in. * Regenerates with v1.31.1, picking up the version stamp and one upstream correctness fix in `DeleteLicense` ([sqlc-dev/sqlc#4383](https://github.com/sqlc-dev/sqlc/pull/4383): don't shadow the input parameter when scanning a single-column return).	2026-05-13 18:55:24 +10:00
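The following Go sketch shows only the line-stripping step that the commit adds to `normalizeDump`. Whatever other normalization `normalizeDump` performs is not shown, and `stripRestrictLines` is a hypothetical name. Dropping these lines is enough to make two dumps of the same schema compare equal even though their random tokens differ.

```go
package main

import (
	"fmt"
	"strings"
)

// stripRestrictLines removes psql `\restrict <token>` and
// `\unrestrict <token>` meta-command lines, whose token changes every run.
func stripRestrictLines(dump string) string {
	lines := strings.Split(dump, "\n")
	kept := lines[:0] // filter in place; reads always stay ahead of writes
	for _, line := range lines {
		fields := strings.Fields(line)
		if len(fields) > 0 && (fields[0] == `\restrict` || fields[0] == `\unrestrict`) {
			continue
		}
		kept = append(kept, line)
	}
	return strings.Join(kept, "\n")
}

func main() {
	a := "\\restrict Ab12\nCREATE TABLE t (id int);\n\\unrestrict Ab12\n"
	b := "\\restrict Zz99\nCREATE TABLE t (id int);\n\\unrestrict Zz99\n"
	fmt.Println(stripRestrictLines(a) == stripRestrictLines(b)) // true
	fmt.Printf("%q\n", stripRestrictLines(a))                   // "CREATE TABLE t (id int);\n"
}
```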
Seth Shelnutt	f355e010e8	fix(coderd/database): clean up org memberships when user is soft-deleted (#25149) The soft-delete cleanup trigger (`delete_deleted_user_resources`) removed `api_keys`, `user_links`, and `user_secrets` but left `organization_members` rows intact. When a new user was created with a previously deleted user's email, both user IDs had org membership rows in the same organization, producing duplicate-email members. Extend the trigger to also delete `organization_members` for the soft-deleted user. This cascades through the existing `trigger_delete_group_members_on_org_member_delete`, which cleans up group memberships automatically. The migration backfills by removing zombie rows for already-deleted users. Fixes ENG-831 > [!NOTE] > 🤖 Generated by Coder Agents <details> <summary>Implementation notes</summary> Root cause: `GetOrganizationIDsByMemberIDs` does not join on `users.deleted = false`, so stale org membership rows for soft-deleted users were visible to internal queries. Even the filtered queries (`OrganizationMembers`, `PaginatedOrganizationMembers`) could surface duplicate emails when a new active user reused a deleted user's email. What changed: - Migration 000491 extends `delete_deleted_user_resources()` to `DELETE FROM organization_members WHERE user_id = OLD.id` - Backfill removes existing zombie org memberships for soft-deleted users - `TestOrgMembersSoftDeleteTrigger` covers org membership removal, raw row cleanup, and cascading group membership cleanup </details>	2026-05-12 16:20:25 -04:00
Yevhenii Shcherbina	b5e1ea33d8	feat: add AI budget policy and period deployment config (#25122) Closes https://linear.app/codercom/issue/AIGOV-283/add-deployment-config-for-ai-budget-policy-and-period Adds `CODER_AI_BUDGET_POLICY` and `CODER_AI_BUDGET_PERIOD` deployment options for AI Governance cost controls.	2026-05-12 10:48:36 -04:00
Ethan	fabf7d31fc	test: use default provider in TestPatchChatMessage/ChangesModel (#25189) `TestPatchChatMessage/ChangesModel` hardcoded `"openai"` as the provider for the override model config. After #25171, the shared chat test harness registers a single `"openai-compat"` provider by default, so calling `createAdditionalChatModelConfig(..., "openai", ...)` fails with HTTP 400 `Chat provider is not configured` before the test can exercise the model-change path. The subtest was added in #25084 after #25171 was reviewed, so the harness change and the new hardcoded provider only met on `main`. Use `defaultModel.Provider` so the override always matches whatever provider the harness registered. This mirrors every other call site of `createAdditionalChatModelConfig` in the file. Closes https://github.com/coder/internal/issues/1530	2026-05-12 14:05:08 +00:00
Michael Suchacz	96333acda3	fix(coderd): filter build instance agents in SQL (#25031) Replaces the per-agent Go-side template-version filter in `handleAuthInstanceID` with a purpose-built SQL query. `GetWorkspaceBuildAgentsByInstanceID` joins `workspace_agents -> workspace_resources -> workspace_builds -> provisioner_jobs -> workspaces` and excludes: - non-`workspace_build` provisioner jobs (template-version-import, dry-run) - deleted agents and sub-agents - deleted workspaces The handler: - drops the per-candidate `GetWorkspaceResourceByID` / `GetProvisionerJobByID` lookups - drops the `provisioner_jobs.input` JSON parsing and the follow-up `GetWorkspaceBuildByID` call - compares `latestHistory.ID` against `selected.WorkspaceBuildID` returned directly from the query - preserves the existing recycled-instance safety check and matching response codes One intentional behavior tightening: agents whose workspace is deleted now return 404 (previously they could reach the recycled-instance check and return 400, or 200 if the stale build was still latest). This matches the existing token-auth path, which already refuses to authenticate against deleted workspaces. The original `GetWorkspaceAgentsByInstanceID` query is intentionally untouched. It remains the generic raw lookup used elsewhere in tests and helpers. The dbauthz wrapper for the new query uses the system-read fast path with `fetchWithPostFilter` for non-system reads, with `RBACObject()` delegating to the embedded `WorkspaceTable`.
Tests: - new `TestGetWorkspaceBuildAgentsByInstanceID` covering newest-first ordering, exclusion of deleted/sub agents, exclusion of template-import and dry-run jobs, and exclusion of deleted workspaces - new dbauthz mock test for `GetWorkspaceBuildAgentsByInstanceID` - new `TestPostWorkspaceAuthAWSInstanceIdentity/RecycledInstanceID` exercising the recycled-instance rejection branch (HTTP 400 when the agent's build is no longer latest) - existing `TestPostWorkspaceAuth{AWS,Azure,Google}InstanceIdentity` continue to cover the handler end to end (including the template-version + workspace-build same-instance-ID scenario via `setupInstanceIDWorkspace`) > Mux is acting on Mike's behalf.	2026-05-12 14:55:56 +02:00
Kyle Carberry	b0b07536fc	feat: add opt-in Coder identity headers for MCP servers (#25153)	2026-05-12 08:54:53 -04:00
Michael Suchacz	f1d160c7f4	fix: allow changing model when editing earlier chat message (#25084) Editing a previous user message and selecting a different model in the picker silently kept using the original model: the selection was dropped on the frontend, in the SDK, and in the backend, so both the replacement user message and the assistant turn that followed ran against the old model. Plumb the selected model through all three layers (`AgentChatPage`, `codersdk.EditChatMessageRequest`, `chatd.EditMessageOptions` / `Server.EditMessage`), defaulting to the original message's model when the client does not specify one. The existing `InsertChatMessages` CTE already advances `chats.last_model_config_id` when the inserted message's model differs, so the assistant turn picks up the new selection without further changes. The new model is validated inside the transaction, so an unknown ID rolls the edit back and returns a 400 `Invalid model config ID.`, mirroring the `SendMessage` path. Refs: CODAGT-345 This change was generated by a Coder agent. <details> <summary>Implementation plan</summary> # CODAGT-345: Editing an earlier message cannot change model ## Problem When editing a previous user message in a chat, the user can change the model in the model picker, but the backend keeps using the original message's model. The model selection is dropped at three layers: 1. Frontend: `AgentChatPage.tsx`'s edit branch builds an `EditChatMessageRequest` that omits `model_config_id`. The new-message branch (a few lines below) does include it. 2. SDK: `codersdk.EditChatMessageRequest` has no `ModelConfigID` field at all. 3. Backend: `chatd.EditMessageOptions` has no model field, and `Server.EditMessage` always copies the original message's `ModelConfigID` into the replacement message.
Once the replacement user message is inserted with the original model, the `InsertChatMessages` CTE leaves `chats.last_model_config_id` unchanged, so the assistant turn that follows runs against the old model. ## Fix Plumb the selected model through all three layers, defaulting to the original message's model when the client doesn't override it. This mirrors the `SendMessage` path, which already accepts a `model_config_id` and validates it via `resolveSendMessageModelConfigID`. ### Backend - `codersdk/chats.go`: add `ModelConfigID *uuid.UUID` to `EditChatMessageRequest`. - `coderd/x/chatd/chatd.go`: - Add `ModelConfigID uuid.UUID` to `EditMessageOptions`. - In `EditMessage`, after fetching the edited message, resolve the model: if `opts.ModelConfigID != uuid.Nil`, validate it exists with `tx.GetChatModelConfigByID` (using `chatdModelConfigLookupContext`), otherwise keep `editedMsg.ModelConfigID.UUID`. Pass the resolved ID into `newChatMessage(...)`. - Reuse the existing `ErrInvalidModelConfigID` sentinel. - `coderd/exp_chats.go` (`patchChatMessage`): - Read `req.ModelConfigID` (nil-safe), pass into `chatd.EditMessageOptions`. - Add a `case xerrors.Is(editErr, chatd.ErrInvalidModelConfigID)` arm returning 400 `Invalid model config ID.`, matching the `postChatMessages` handler. ### Frontend - `site/src/pages/AgentsPage/AgentChatPage.tsx`: - In the edit branch, set `model_config_id: effectiveSelectedModel || undefined` on the `EditChatMessageRequest`. - On success, persist the chosen model to `lastModelConfigIDStorageKey` so the next chat from this browser keeps the same default. Mirrors the new-message branch. ### Generated - `make site/src/api/typesGenerated.ts` and `make coderd/apidoc/swagger.json` produce the updated `EditChatMessageRequest` schema in `typesGenerated.ts`, `coderd/apidoc/{docs.go,swagger.json}`, and `docs/reference/api/{chats.md,schemas.md}`.
## Tests - `coderd/x/chatd/chatd_test.go`: - `TestEditMessageWithModelConfigOverride`: edit with a different model -> replacement message and `chats.LastModelConfigID` use the new model. - `TestEditMessagePreservesModelConfigByDefault`: edit without `ModelConfigID` -> original model preserved. - `TestEditMessageRejectsUnknownModelConfig`: passes a random UUID -> `ErrInvalidModelConfigID`, original message still present, `LastModelConfigID` unchanged (rollback). - `coderd/exp_chats_test.go` (under `TestPatchChatMessage`): - `ChangesModel`: end-to-end via SDK; `edited.Message.ModelConfigID` and `chat.LastModelConfigID` both match the new model. - `InvalidModelConfigID`: random UUID -> 400 `Invalid model config ID.`. </details>	2026-05-12 14:51:55 +02:00
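The model-resolution rule in `EditMessage` described above is sketched below in dependency-free Go. `modelID` stands in for `uuid.UUID`, and its zero value plays the role of `uuid.Nil`. The `exists` callback is a hypothetical stand-in for `tx.GetChatModelConfigByID`. In the real change, the lookup runs inside the transaction, so an error also rolls the edit back.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrInvalidModelConfigID stands in for chatd's sentinel of the same name.
var ErrInvalidModelConfigID = errors.New("invalid model config ID")

// modelID replaces uuid.UUID; the zero value means "client did not choose".
type modelID [16]byte

// resolveEditModel picks the model for the replacement message.
func resolveEditModel(override, original modelID, exists func(modelID) bool) (modelID, error) {
	if override == (modelID{}) {
		return original, nil // default: keep the edited message's model
	}
	if !exists(override) {
		return modelID{}, ErrInvalidModelConfigID // handler maps this to HTTP 400
	}
	return override, nil
}

func main() {
	original, other := modelID{1}, modelID{2}
	known := func(id modelID) bool { return id == original || id == other }

	got, _ := resolveEditModel(modelID{}, original, known)
	fmt.Println(got == original) // true
	got, _ = resolveEditModel(other, original, known)
	fmt.Println(got == other) // true
	_, err := resolveEditModel(modelID{9}, original, known)
	fmt.Println(errors.Is(err, ErrInvalidModelConfigID)) // true
}
```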
Michael Suchacz	f847ff3731	test(coderd/x/chatd): skip stale notification flakes (#25177) Skip the chatd tests that currently flake because the control notification flow cannot distinguish stale wake/status NOTIFY payloads from real interrupt requests. Each skipped test includes a TODO to re-enable it after the chatd notification flow refactor handles stale notifications correctly. Supersedes #25133, #25134, #25135, and #25139. Refs [CODAGT-353](https://linear.app/coder/issue/CODAGT-353), [CODAGT-356](https://linear.app/coder/issue/CODAGT-356), [CODAGT-360](https://linear.app/coder/issue/CODAGT-360), and [CODAGT-361](https://linear.app/coder/issue/CODAGT-361). > Mux working on behalf of Mike.	2026-05-12 14:50:30 +02:00
Ethan	4e08543ace	test(coderd): centralize chat test harness and stabilize flakes (#25171) Chat tests previously constructed a real `openai` provider with a fake API key and no `BaseURL`, so background title generation hit `api.openai.com` and timed out under `-race`. The same root cause produced several distinct flakes: title regeneration races with synchronous `UpdateChat`/`ProposeChatTitle`, and pagination races against `updated_at` bumps from real-network processing. This moves the fake OpenAI-compatible provider and the chat-settle wait into first-class `coderdtest` capabilities. `coderd.Options.ChatProviderAPIKeys` is the new seam tests use to redirect chat traffic to a local `httptest.Server`. `coderdtest.WaitForChatSettled` replaces per-test waiters and drains tracked chat-daemon work after the chat row leaves `pending`/`running`. The `newChatClient*` constructors funnel through one options builder that installs the fake provider before the coderd test server so cleanup ordering is deterministic. Closes https://github.com/coder/internal/issues/1528 & Closes ENG-2659 Closes https://github.com/coder/internal/issues/1480 & Closes CODAGT-359 Closes https://github.com/coder/internal/issues/1507 & Closes CODAGT-368 Relates to https://github.com/coder/internal/issues/1397 & Relates to CODAGT-374	2026-05-12 22:13:55 +10:00
Thomas Kosiewski	5c3b59151e	feat: add Cmd/Ctrl+Enter send setting (#25062) Adds an Agents General setting to require Cmd/Ctrl+Enter before sending chat messages. When enabled, plain Enter inserts a newline in agent chat inputs while the send button remains available. The preference is now persisted server-side through `/api/v2/users/{user}/preferences`, alongside the existing user preference settings, and is applied to both the create-agent input and the existing chat composer. Storybook and API coverage verify the setting, keyboard behavior, validation, and persistence. <details> <summary>Coder Agents notes</summary> Generated by Coder Agents from a Slack request. Dogfooded with agent-browser against the Storybook settings and chat input stories. </details>	2026-05-12 10:09:34 +02:00
Kyle Carberry	376fc80451	fix(coderd/x/chatd): discover workspace MCP tools mid-turn after create_workspace (#25169) ## Problem In `coderd/x/chatd/chatd.go` `runChat`, workspace MCP discovery is gated on `chat.WorkspaceID.Valid` at the start of each turn. New chats that bind their workspace mid-turn (via `create_workspace` or `start_workspace`) get an empty workspace tool list on the first step, and the model falls back to `execute` (bash) because no workspace MCP tools are advertised. Repro: new chat → "create a workspace and use MCP tools". No `/api/v0/mcp/tools` request hits the agent on turn 1; turn 2 in the same chat works fine. ## Fix - Add a `PrepareTools` callback to `chatloop.RunOptions`, analogous to `PrepareMessages`. It is invoked once before each LLM step with the current tool list. When it returns non-nil, the chatloop replaces `opts.Tools`, rebuilds the per-step tool definitions, and appends new tool names to `opts.ActiveTools` so newly injected tools are callable immediately. - Wire `PrepareTools` in `runChat` to trigger workspace MCP discovery the first time the chat snapshot reports a valid `WorkspaceID`. The previous top-of-turn discovery path is unchanged for chats that start with a workspace. - Extract the discovery logic into `Server.discoverWorkspaceMCPTools` so the top-of-turn and mid-turn paths share identical behavior (cache, agent resolution, `ListMCPTools` timeout, invalidation). Mid-turn discovery stays disabled in plan-mode turns and Explore subagents, matching the existing top-of-turn gate. The `workspaceMCPDiscovered` flag prevents redundant dials after the first successful discovery. ## Tests - `coderd/x/chatd/chatloop/chatloop_test.go`: two new `TestRun_PrepareTools*` cases covering injection on the next step and active-set merging when `ActiveTools` is non-empty.
- `coderd/x/chatd/chatd_test.go`: `TestRunChat_WorkspaceMCPDiscoveryAfterMidTurnCreateWorkspace` drives `runChat` through a `create_workspace` tool call against a real Postgres + mocked agent conn and asserts the second streamed LLM request advertises the workspace MCP tool. Verified that the test fails (and pinpoints the missing tool) when the `PrepareTools` wiring is disabled. ## Validation
```
go test ./coderd/x/chatd/chatloop/... -count=1
go test ./coderd/x/chatd/... -count=1
make lint/emdash
```
<details> <summary>Decision log</summary> - Chose a per-step `PrepareTools` callback over mutating `opts.Tools` in place because `chatloop.Run` builds the `fantasy.Tool` definitions once at start; a hook is required to let the LLM see new tools on the next step. - Returned `[]fantasy.AgentTool` (not also active-tool-names) and let the chatloop derive name merges via `mergeNewToolNames`. This avoids leaking plan-mode gating decisions into the callback contract. - Kept the existing top-of-turn discovery path so chats that already have a workspace at turn start pay no extra latency. - Skipped reusing `ReloadMessages` (history reload) since this is purely a tool-availability concern; coupling it to a history reload would defeat the chatloop cache prefix optimizations. </details> --- _This pull request was generated by Coder Agents._	2026-05-12 00:30:56 -04:00
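The following Go sketch illustrates the per-step `PrepareTools` idea and the active-set merge described above. It uses tool names only, with no `fantasy` types. The commit names `mergeNewToolNames` but not its signature, so both functions here are hypothetical reconstructions. Treating an empty active list as "every tool is callable" is also an assumption. Discovery is retried on failure and stops after the first success, mirroring the `workspaceMCPDiscovered` flag.

```go
package main

import "fmt"

// mergeNewToolNames returns active extended with names from next that were
// in neither prev nor active. An empty active set is returned unchanged
// (ASSUMED to mean "all tools active").
func mergeNewToolNames(active, prev, next []string) []string {
	if len(active) == 0 {
		return active
	}
	out := append([]string(nil), active...) // never mutate the caller's slice
	seen := make(map[string]bool, len(prev)+len(active))
	for _, n := range prev {
		seen[n] = true
	}
	for _, n := range active {
		seen[n] = true
	}
	for _, n := range next {
		if !seen[n] {
			seen[n] = true
			out = append(out, n)
		}
	}
	return out
}

// newPrepareTools returns a PrepareTools-style callback that runs discover
// once the workspace is bound, stopping after the first success. A nil
// result means "keep the current tool list".
func newPrepareTools(workspaceBound func() bool, discover func() ([]string, error)) func(current []string) []string {
	discovered := false
	return func(current []string) []string {
		if discovered || !workspaceBound() {
			return nil
		}
		tools, err := discover()
		if err != nil {
			return nil // retry before the next step
		}
		discovered = true
		return append(append([]string(nil), current...), tools...)
	}
}

func main() {
	prev := []string{"execute", "create_workspace"}
	active := []string{"execute", "create_workspace"}
	bound := false
	prepare := newPrepareTools(
		func() bool { return bound },
		func() ([]string, error) { return []string{"mcp_workspace_search"}, nil },
	)
	fmt.Println(prepare(prev) == nil) // true: no workspace yet
	bound = true                      // e.g. create_workspace just succeeded mid-turn
	next := prepare(prev)
	fmt.Println(next, mergeNewToolNames(active, prev, next))
}
```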
Kyle Carberry	5a5cd79c4c	fix: drop buffered chat parts after their durable message commits (#25164)	2026-05-12 00:30:38 -04:00
Kyle Carberry	07ff3b3f90	fix(coderd/exp_chats_test.go): stabilize TestListChats/Pagination by inserting chats directly (#25137)	2026-05-12 00:26:22 -04:00
Kyle Carberry	0ed57ee343	fix(coderd/x/chatd): checkpoint buffered message_parts to avoid stale replay (#25145)	2026-05-11 17:27:03 -04:00
J. Scott Miller	3e46c7986f	feat: event driven agent connection metric (#24355) Moves the `coderd_agents_first_connection_seconds` histogram from the polling-based `prometheusmetrics.Agents()` loop to the event-driven `agentConnectionMonitor.init()` path. The metric is now recorded exactly once when an agent first connects over the RPC websocket, instead of being retroactively computed each polling tick. The `username` and `workspace_name` labels are removed to reduce cardinality; only `template_name` and `agent_name` are retained. Adds unit tests covering both the happy path (first connection recorded) and the negative-duration guard (clock skew logs a warning, no sample emitted).	2026-05-11 14:27:40 -05:00
Thomas Kosiewski	e56381eb61	feat: stream advisor tool output (#25032) Stream advisor output into the advisor tool card while the nested advisor call is still running. This keeps the advisor implementation intentionally advisor-specific: the parent model still receives the same final structured tool result, while the frontend receives transient `tool-result.result_delta` parts to render partial advisor text in the expanded card. The final persisted chat history remains unchanged. Refs CODAGT-322. Generated by Coder Agents. <details> <summary>Implementation plan</summary> - Publish advisor text deltas from the nested `chatloop.Run` via `RunAdvisorOptions.OnAdviceDelta`. - Forward those deltas through `chatadvisor.Tool` with the parent advisor tool call ID. - Emit transient `ChatMessagePartTypeToolResult` websocket parts with `ResultDelta` from `chatd`. - Add `result_delta` to the generated tool-result TypeScript variant. - Accumulate tool result deltas in frontend stream state and keep the tool running until the final result arrives. - Render streamed advisor advice in the existing advisor card using streaming markdown mode, while retaining the updated advisor UI. </details>	2026-05-11 20:18:49 +02:00
Michael Suchacz	6bb88775ab	test(coderd/x/chatd): pin TestGetWorkspaceConn_StatusCheck to mock clock (#25130) The `TimedOutAgentCacheHit`, `CacheHitHealthyAgent`, and `CacheHitDBError` subtests of `TestGetWorkspaceConn_StatusCheck` built their `WorkspaceAgent` timestamps with `time.Now()` in the parent test's slice literal and then ran the actual check against the server's real wall clock (`quartz.NewReal()`). On slow Windows CI runners, more than `agentInactiveDisconnectTimeout` (30s) of wall time can elapse between slice construction and the parallel subtest body. In that window, the cached "healthy" agent gets reclassified as disconnected by `agentDisconnectedFor`, and `CacheHitHealthyAgent` fails with `errChatAgentDisconnected` instead of returning the cached connection. Build each agent inside the subtest with `quartz.NewMock(t)` and feed the same clock into the `Server` so the agent timestamps and the status math share a single frozen `now`. This matches the pattern already used by `TestGetWorkspaceConn_DialTimeoutDisconnectedRecoveryThreshold` in the same file. Closes https://github.com/coder/internal/issues/1522 <details> <summary>Verification</summary> Inserting `time.Sleep(35 * time.Second)` at the top of each subtest's body reliably reproduces the original failure (`errChatAgentDisconnected` on `CacheHitHealthyAgent`) on the parent commit and passes with this change. After removing the synthetic sleep, `go test ./coderd/x/chatd -run TestGetWorkspaceConn_StatusCheck -count=50` passes cleanly. </details> > Generated by Coder Agents on behalf of the assignee. Co-authored-by: Coder Agents <noreply@coder.com>	2026-05-11 19:53:58 +02:00
Kyle Carberry	e3db203011	fix(coderd/azureidentity): set explicit roots to avoid macOS system verifier (#25136 ) Fixes [CODAGT-372](https://linear.app/codercom/issue/CODAGT-372/coderdazureidentity-testvalidateregular-fails-on-macos). Closes coder/internal#101. ## Problem `coderd/azureidentity TestValidate/regular` fails on macOS with: ``` verify signature: github.com/coder/coder/v2/coderd/azureidentity.Validate /Users/runner/work/coder/coder/coderd/azureidentity/azureidentity.go:75 - x509: “metadata.azure.com” certificate is not standards compliant ``` When `crypto/x509.VerifyOptions.Roots` is `nil`, Go's verifier on macOS/iOS falls back to the system verifier (`systemVerify` in `crypto/x509/root_darwin.go`), which delegates to Apple's `SecTrustEvaluateWithError`. Apple's framework enforces stricter standards-compliance checks than Go's pure-Go verifier and rejects some otherwise valid Azure instance-identity leaf certificates with `errSecCertificateIsNotStandardsCompliant`, surfaced as the `not standards compliant` error. The test had been skipped on darwin since #12979 (April 2024) as a workaround. ## Fix - Embed the three root CAs that Azure instance-identity certificates ultimately chain to: - DigiCert Global Root G2 - DigiCert Global Root G3 - Baltimore CyberTrust Root (kept for historical chains via `Microsoft RSA TLS CA 01/02`) - In `Validate`, populate `options.Roots` from those embedded roots when the caller does not supply its own pool. Because `Roots != nil`, Go no longer takes the `systemVerify` path on darwin and uses the pure-Go verifier on all platforms. - Remove the `runtime.GOOS == "darwin"` skip from `TestValidate`. - Add `TestEmbeddedRoots` to guard against future regressions in the embedded root list (parses each PEM, asserts self-signed, requires all three named roots). The caller's existing `Intermediates` handling is unchanged. Tests that pass their own `Roots` (e.g. `coderdtest.NewAzureInstanceIdentity`) are unaffected. 
## Verification On Linux: ``` $ go test ./coderd/azureidentity/ -race -count=1 -v === RUN TestValidate === RUN TestValidate/regular === RUN TestValidate/govcloud === RUN TestValidate/rsa --- PASS: TestValidate (0.00s) --- PASS: TestValidate/regular (0.00s) --- PASS: TestValidate/rsa (0.00s) --- PASS: TestValidate/govcloud (0.00s) === RUN TestEmbeddedRoots --- PASS: TestEmbeddedRoots (0.00s) === RUN TestExpiresSoon --- SKIP: TestExpiresSoon (0.00s) PASS ok github.com/coder/coder/v2/coderd/azureidentity 1.020s ``` The `test-go-pg` job on `macos-latest` in CI is the authoritative confirmation of the fix on macOS; previously it would have failed `TestValidate/regular` had the skip been removed. <details> <summary>Why this is the correct fix</summary> From `/usr/local/go/src/crypto/x509/verify.go`: ```go // Use platform verifiers, where available, if Roots is from SystemCertPool. if runtime.GOOS == "windows" || runtime.GOOS == "darwin" || runtime.GOOS == "ios" { systemPool := systemRootsPool() if opts.Roots == nil && (systemPool == nil || systemPool.systemPool) { return c.systemVerify(&opts) } ... } ``` Setting `opts.Roots` to any non-nil, non-system pool deterministically routes verification through Go's pure-Go verifier, bypassing Apple's stricter compliance checks. The embedded roots are sufficient to validate every chain we currently care about, because every intermediate in `Certificates` ultimately chains to one of the three embedded roots. </details> > Generated by Coder Agents. Reviewed manually.	2026-05-11 13:53:33 -04:00
Michael Suchacz	60779ad2ec	test(coderd/x/chatd): stop waking acquireLoop in TestResolveExploreToolSnapshot (#25129 ) Fixes [CODAGT-367](https://linear.app/codercom/issue/CODAGT-367). `TestResolveExploreToolSnapshot/` flaked on CI (Linux and Windows) with `context deadline exceeded` on the `GetMCPServerConfigsByIDs` call inside `resolveExploreToolSnapshot`. Each test setup called `server.CreateChat` twice with `MCPServerIDs` set to fake `.example.com` URLs. `CreateChat` marks the chat pending and calls `signalWake`, which causes the chatd background `acquireLoop` to pick the chat up. That goroutine then dialed the fake MCP URLs (NXDOMAIN, slower on Windows) and made an OpenAI request with the dbgen default test key (401). Under CI load, that activity racing the 4 parallel subtests' `GetMCPServerConfigsByIDs` calls was enough to exceed the 25s test context deadline. The failure logs in the issue showed both side effects firing in the same job. `resolveExploreToolSnapshot` only reads `ID`, `MCPServerIDs`, `PlanMode`, `ParentChatID`, and `Mode` off the parent argument, so the chats do not need to be persisted. Build them as in-memory `database.Chat` values instead. The MCP server configs remain in the DB because the function still queries them via `GetMCPServerConfigsByIDs`. Verified locally with `go test ./coderd/x/chatd -run TestResolveExploreToolSnapshot -count=100 -race` (passes, ~5s total) and the surrounding `TestResolve` / `TestCreateChildSubagentChat` / `TestSpawnAgent_Explore` tests. --- _Made by Coder Agents on behalf of @ibetitsmike. [Linear session](https://linear.app/codercom/issue/CODAGT-367/flake-testresolveexploretoolsnapshot#agent-session-0730f3fe)._	2026-05-11 19:46:59 +02:00
Steven Masley	19573e8aee	feat!: patchTemplateMeta to use optional fields (#24984 ) Closes https://github.com/coder/coder/issues/13112 Breaking change: the endpoint no longer returns `StatusNotModified` when a patch contains no changes. The patch is now always applied, and the template is always returned.	2026-05-11 12:43:52 -05:00
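Optional fields in a PATCH body are commonly modelled with pointer fields. That lets the server tell an omitted field (`nil`) apart from one explicitly set to its zero value. The sketch below assumes that approach. `templateMeta`, `patchTemplateMetaRequest` and `applyPatch` are hypothetical and do not mirror the real `codersdk` request type.

```go
package main

import "fmt"

// templateMeta is a hypothetical subset of template metadata.
type templateMeta struct {
	Name        string
	DisplayName string
	Description string
}

// patchTemplateMetaRequest uses pointers so "omitted" (nil) differs from
// "set to the zero value" (a pointer to "").
type patchTemplateMetaRequest struct {
	DisplayName *string
	Description *string
}

// applyPatch copies only the fields that were provided and always returns the result.
func applyPatch(cur templateMeta, req patchTemplateMetaRequest) templateMeta {
	if req.DisplayName != nil {
		cur.DisplayName = *req.DisplayName
	}
	if req.Description != nil {
		cur.Description = *req.Description
	}
	return cur
}

func main() {
	cur := templateMeta{Name: "docker", DisplayName: "Docker", Description: "old"}
	empty := ""
	out := applyPatch(cur, patchTemplateMetaRequest{Description: &empty})
	fmt.Printf("%q %q\n", out.DisplayName, out.Description) // "Docker" ""
}
```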
Michael Suchacz	645b8cc63d	fix(coderd/x/chatd/chaterror): deflake TestClassify_ParsesRetryAfterHTTPDate (#25128 ) The test built a `Retry-After` HTTP-date with `time.Now().Add(3*time.Second).UTC().Format(http.TimeFormat)`, then asserted that the parsed `RetryAfter` was `>= 2s`. `http.TimeFormat` has second precision, so `Format()` truncates up to ~1s. Combined with the small elapsed time between formatting in the test and `time.Until()` in production, the value could land just under `offset-1s` (1.997s observed in CI), failing the lower bound. Round the formatted target up to the next whole second so the parsed deadline is never earlier than `now+offset`, and assert against a symmetric `[offset-1s, offset+1s]` window. Closes [CODAGT-365](https://linear.app/codercom/issue/CODAGT-365/flake-testclassify-parsesretryafterhttpdate) Refs https://github.com/coder/internal/issues/1512 <sub>Created by [Coder Agents](https://coder.com/docs/agent).</sub> Co-authored-by: Coder Agents <coderagents@coder.com>	2026-05-11 19:09:51 +02:00
Cian Johnston	e8508b2d90	fix: recover chatd from poisoned chain anchor on retry (#25097 ) When OpenAI's Responses API returns `Previous response with id ... not found` for a chained turn, classify it as a `ChainBroken` retry, clear `previous_response_id`, exit chain mode, reload full history, and let `chatretry` retry. Self-heals chats whose anchor was poisoned before #25074 stopped truncated streams from being persisted as a successful turn with a stored response id. The new state is exposed via the existing `coderd_chatd_stream_retries_total` counter as a `chain_broken="true"|"false"` label. Aggregating queries (`sum`, `rate` over `provider`/`model`/`kind`) keep working without changes; raw-series matchers without aggregation will now see two series per `(provider, model, kind)` where they previously saw one. The metric is internal-only so the blast radius should be small, but if you have dashboards that index by exact label matchers without aggregation they will need an extra `sum` or an explicit `chain_broken` selector. > 🤖 This PR was created with the help of Coder Agents, and was reviewed by a human 🧑‍💻	2026-05-11 17:43:40 +01:00
Michael Suchacz	915956460a	feat(coderd/x/chatd): add compact turn status labels (#25043 ) > Mux is acting on Mike's behalf. Changes chat turn-end summaries into compact status labels for the cached `last_turn_summary` and successful web push body. Uses a structured-output model call for successful turns, requiring a 2-5 word `label` and validating it to reject agent-centric phrasing. Pending and requires-action states keep deterministic status labels. Removes the earlier deterministic tool-signal pipeline in favor of the smaller structured-output path.	2026-05-11 17:09:42 +02:00
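The sketch below shows one way to validate the structured-output `label`. The 2-5 word bound comes from the description above. The agent-centric deny-list (`I ...`, `The agent ...`) is a hypothetical stand-in for whatever rules the real validator applies.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// agentCentricPrefixes is a hypothetical deny-list; the real rules may differ.
var agentCentricPrefixes = []string{"i ", "i'm ", "i've ", "the agent ", "agent "}

// validateTurnLabel accepts 2-5 word labels that describe the work, not the agent.
func validateTurnLabel(label string) error {
	words := strings.Fields(label)
	if len(words) < 2 || len(words) > 5 {
		return fmt.Errorf("label must be 2-5 words, got %d", len(words))
	}
	// Normalize spacing and case; the trailing space makes "i " match only the whole word "I".
	normalized := strings.ToLower(strings.Join(words, " ")) + " "
	for _, p := range agentCentricPrefixes {
		if strings.HasPrefix(normalized, p) {
			return errors.New("label must not be agent-centric")
		}
	}
	return nil
}

func main() {
	for _, l := range []string{"Fixed flaky login test", "Done", "I fixed the bug"} {
		fmt.Printf("%q: %v\n", l, validateTurnLabel(l))
	}
}
```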
Zach	b221632615	fix: wipe user secrets when user is soft-deleted (#24985 ) Extend the delete_deleted_user_resources() trigger so that secrets belonging to a soft-deleted user are removed in the same transaction as the existing api_keys and user_links cleanup. user_secrets.user_id has ON DELETE CASCADE, but Coder soft-deletes users by flipping users.deleted rather than removing the row, so the foreign key cascade never fires and secrets would otherwise survive deletion. Assisted by Coder Agents.	2026-05-11 09:07:30 -06:00
Zach	81e2be69e9	test: use typed atomics in test files (#25071 ) Use typed atomics (atomic.Int64, atomic.Int32, etc.) in test files to prevent mixing atomic and non-atomic access on the same value, guarantee 64-bit alignment on 32-bit platforms, and provide a cleaner API.	2026-05-11 08:41:17 -06:00
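The following is a small illustration of the typed-atomic style, using only the standard library. `atomic.Int64` needs Go 1.19 or later. The helper name is hypothetical.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// countConcurrently increments a typed atomic counter from n goroutines.
// Unlike a plain int64 used with atomic.AddInt64, an atomic.Int64 can only be
// accessed through atomic methods, and it is 64-bit aligned even on 32-bit platforms.
func countConcurrently(n int) int64 {
	var calls atomic.Int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			calls.Add(1)
		}()
	}
	wg.Wait()
	return calls.Load()
}

func main() {
	fmt.Println(countConcurrently(100)) // 100
}
```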
Jeremy Ruppel	a1dbd758bc	feat: add template builder deployment config and telemetry types (#25082 )	2026-05-11 09:48:55 -04:00
Thomas Kosiewski	4a6756a3e8	fix: isolate test HTTP clients (#25038 )	2026-05-11 11:03:38 +02:00
Marcin Tojek	febabfb8b2	feat: add request/response dump support to aibridgeproxyd (#24837 ) Closes https://github.com/coder/coder/issues/24335	2026-05-11 10:59:26 +02:00
Mathias Fredriksson	fb60bb0c08	chore(coderd/x/chatd): instrument PromoteQueued + stream subscriber for ENG-2645 (#25085 ) TestPromoteQueuedWhileRequiresActionMixedTools has flaked three times across Windows and Ubuntu CI runners since 2026-05-06; local repro on the dev workspace has not surfaced it. The May 8 Ubuntu log shows all four PromoteQueued post-TX pubsub publishes reaching pg_notify, yet the test still times out 25s later, so the failure is downstream between the subscriber's listener and the test's events channel. Adds three Debug-level markers in chatd.go (no logic change) plus two t.Logf markers in the test's reader so the next CI occurrence pins down exactly which step failed. Closes ENG-2645 Closes coder/internal#1523	2026-05-11 08:33:46 +00:00
Ethan	063c06ca5f	test: prevent expired contexts in chatd parallel subtests (#25107 ) Parallel subtests in `coderd/x/chatd` reused a parent test context with a `testutil.WaitLong` deadline, so the context could expire before a subtest was scheduled under load. That made the subagent lifecycle tools return plain-text context errors instead of the expected JSON payload, causing flaky JSON unmarshal failures. Create fresh `chatdTestContext` values inside the affected parallel subtests and add `chatdTestContext` to the `paralleltestctx` custom function list so this pattern is caught by `make lint`. Closes https://github.com/coder/internal/issues/1494	2026-05-11 17:48:27 +10:00
Ethan	bd6cc1aaf2	feat(coderd): add stop_workspace chatd tool and recovery classification (#24997 ) ## Summary Adds a `stop_workspace` tool to chatd so the model can recover from the "workspace running but agent dead" failure mode (e.g. an OOM that leaves the workspace running but the agent unreachable) by stopping and then starting the workspace. ## What changed New `stop_workspace` chatd tool (`coderd/x/chatd/chattool/stopworkspace.go`). Mirrors `start_workspace`: shares `WorkspaceMu` to serialize with create/start, waits for any in-progress build before issuing a stop, and is idempotent only after a successful Stop transition. Failed stop builds re-attempt rather than reporting success. New `chatStopWorkspace` coderd hook (`coderd/exp_chats.go`). Mirrors `chatStartWorkspace` minus the `RequireActiveVersion` gate. Stop should not be blocked by template version policy. Differentiated recovery sentinels (`coderd/x/chatd/chatd.go`). `errChatAgentDisconnected` instructs the model to call `stop_workspace` then `start_workspace`. `errChatDialTimeout` instructs a single retry, then user escalation if it repeats. The previous single message conflated transient and persistent failures. Two-signal recovery gate. Recovery is only surfaced when a tool call times out and a fresh DB read of the latest workspace agent says `Disconnected`. The previous draft escalated on the DB read alone, which would fire on a 30-second heartbeat blip (e.g. agent respawn) and prompt a destructive stop/start unnecessarily. Cache-hit disconnected handling now clears the cache and retries a fresh dial before escalating, rather than returning the recovery sentinel immediately. Latest-agent classification uses `GetWorkspaceAgentsInLatestBuildByWorkspaceID` instead of the chat's bound `AgentID`, so stale bindings after a rebuild don't misclassify. 
Shared chattool helpers in `coderd/x/chatd/chattool/chattool.go`: `latestWorkspaceBuildAndJob`, `publishBuildBinding`, `provisionerJobTerminal`. Applied to both `start_workspace` and `stop_workspace`. ## Notes - Reverts an earlier draft that widened `ask_user_question` to root standard turns. Plan-mode-only behavior is restored. - The `stop_workspace` tool currently renders via the generic chat tool-call UI. A follow-up frontend PR will prettify the `stop_workspace` tool and style it like the `start_workspace` tool. - Never-connected (`Timeout` status) agents are intentionally excluded from recovery. They indicate template or startup failure, not the running-but-dead case this PR targets. Closes CODAGT-315	2026-05-11 16:23:07 +10:00
Kyle Carberry	aaa0dacdb3	fix: infer workspace claim time from build history for /agents delete dialog (#25057 ) Closes [CODAGT-317](https://linear.app/codercom/issue/CODAGT-317/pr-workspaces-sometimes-require-name-confirmation-to-delete). ## Problem The `/agents` archive-and-delete molly-guard (typing the workspace name) was firing for chats that had clearly created their own workspace. The heuristic in `resolveArchiveAndDeleteAction` decides whether confirmation is needed by comparing the workspace's `created_at` against the chat's `created_at`: ```ts return new Date(workspaceCreatedAt) >= new Date(chatCreatedAt); ``` That assumption breaks for prebuilt workspaces. `ClaimPrebuiltWorkspace` rewrites `owner_id`, `name`, `updated_at`, `last_used_at`, etc., but never touches `created_at`, which still reflects when the prebuild was provisioned by the reconciler, often hours before the chat exists. Result: every prebuild-claimed workspace looks pre-existing, so the molly-guard fires. Concrete example from a real chat: `chat.created_at` = `2026-05-07T15:12:23Z`; `workspace.created_at` (provision) = `2026-05-07T14:22:24Z`; `latest_build.created_at` (claim) = `2026-05-07T15:19:09Z`. Because `14:22:24 < 15:12:23`, `isWorkspaceAutoCreated` returned false even though the chat issued the claim. ## Fix (frontend-only) Derive the moment a workspace was acquired from existing build history rather than relying on `workspace.created_at`: - Build #1 initiator = prebuilds system user → workspace was a prebuild → use `build_2.created_at` (the claim build) as the acquisition time. - Build #1 initiator = real user → workspace was created from scratch → use `workspace.created_at` (unchanged behavior). - Unclaimed prebuild or no build history → return `null` (force confirmation; safe degradation for a destructive flow). The resolver fetches the build list via the existing `getWorkspaceBuilds` endpoint when the dialog might fire. 
No new column, no migration, no schema change. Works retroactively for all existing claimed prebuilds; no backfill needed. The prebuilds system user UUID is exposed via `codersdk.PrebuildsSystemUserID` and typegen'd to `typesGenerated.ts`. `coderd/database.PrebuildsSystemUserID` parses that constant via `uuid.MustParse` so the two cannot drift; if the codersdk literal ever changes, package init fails fast. ## History The first draft of this PR added a `workspaces.claimed_at` column populated by `ClaimPrebuiltWorkspace`. After review feedback from @johnstcn pointing out that the same fact is already implicit in build history, I pivoted to the frontend-only approach. Subsequent review notes consolidated the prebuilds system user UUID into a single typegen'd constant. ## Why not the other open PRs - #25055 (`chatKey` cache fallback) only fixes a different cache-miss path; it explicitly notes it does not address `created_at < chat.created_at`. - #25053 (`chats.workspace_auto_created` boolean) puts the truth on the wrong side of the schema: "this workspace was claimed at time T" is a property of the workspace, not the chat. The MCP plumbing it adds is also unnecessary now that the same answer is available from build history. ## Test plan - `pnpm vitest run --project=unit src/pages/AgentsPage/utils/agentWorkspaceUtils.test.ts` — 40/40 pass; new cases cover prebuild claim before/after chat, unclaimed prebuild, missing-build-history fallback, and the fetch-skip when the chat is not in cache. - `pnpm lint:types`, `pnpm check`, `make pre-commit`. <details> <summary>Disclosure</summary> Opened on behalf of @kylecarbs by [Coder Agents](https://coder.com/coder-agents). </details>	2026-05-10 11:04:55 -04:00
Yevhenii Shcherbina	4124d1137d	feat: add ai_model_prices table (#24932 ) # Summary Implements https://linear.app/codercom/issue/AIGOV-282/add-ai-model-price-table-and-seed-generator This PR lays the groundwork for AI Bridge cost controls (per the AI Governance RFC). It adds the foundation needed for future cost tracking: a place to store per-model token prices, a way to keep those prices in sync with upstream pricing data, and a startup mechanism that ensures every deployment has prices loaded before AI Bridge starts processing requests. The price data comes from [models.dev](https://models.dev/), a community-maintained catalogue of AI provider pricing. A generator script fetches the latest prices, filters to Anthropic and OpenAI for now, and produces a seed file checked into the repository. On every server startup the seed is applied to the database, so new releases automatically pick up any price corrections that landed since the previous one. Existing rows are overwritten with the latest prices; rows for models no longer in the seed are left untouched. # Batching the AI model price seed: three approaches Context: at server startup we seed the `ai_model_prices` table from an embedded JSON price book (~70 rows today, will grow as we add providers, potentially 4000+). Each row is: ```text (provider, model, input_price, output_price, cache_read_price, cache_write_price) ``` Any of the four price columns can be: - `NULL` → “price unknown for this dimension” - explicit `0` → “free” The batch must be an UPSERT so re-running is idempotent and existing rows pick up new prices. We considered three implementations. --- ## Approach 1 — Per-row UPSERT in a Go loop ```go for _, row := range rows { if err := db.UpsertAIModelPrice(ctx, database.UpsertAIModelPriceParams{ Provider: row.Provider, Model: row.Model, InputPrice: nullInt64(row.InputPrice), // ... }); err != nil { return err } } ``` ### Pros - Trivial. - NULL handling falls out naturally from `sql.NullInt64`. 
### Cons - `N` round-trips per seed. - With ~70 rows that means ~70 statement executions on every startup, even inside a transaction. - Doesn't scale gracefully as the price book grows, potentially 4000+. --- ## Approach 2 — `UNNEST` with parallel arrays Pass each column as a separate Go slice. Postgres unnests them in parallel into a virtual table, then `INSERT ... SELECT`. ```sql INSERT INTO ai_model_prices ( provider, model, input_price, output_price, cache_read_price, cache_write_price ) SELECT UNNEST(@providers::text[]), UNNEST(@models::text[]), NULLIF(UNNEST(@input_prices::bigint[]), -1), NULLIF(UNNEST(@output_prices::bigint[]), -1), NULLIF(UNNEST(@cache_read_prices::bigint[]), -1), NULLIF(UNNEST(@cache_write_prices::bigint[]), -1) ON CONFLICT (provider, model) DO UPDATE SET input_price = EXCLUDED.input_price, output_price = EXCLUDED.output_price, cache_read_price = EXCLUDED.cache_read_price, cache_write_price = EXCLUDED.cache_write_price, updated_at = NOW(); ``` Go side: flatten rows into six parallel slices. Use a sentinel (`-1`) for “missing”, since `lib/pq` can't encode `NULL` into a `bigint[]` element. ```go providers := make([]string, len(rows)) models := make([]string, len(rows)) inputs := make([]int64, len(rows)) outputs := make([]int64, len(rows)) cacheR := make([]int64, len(rows)) cacheW := make([]int64, len(rows)) for i, r := range rows { providers[i] = r.Provider models[i] = r.Model inputs[i] = -1 if r.InputPrice != nil { inputs[i] = *r.InputPrice } outputs[i] = -1 if r.OutputPrice != nil { outputs[i] = *r.OutputPrice } cacheR[i] = -1 if r.CacheReadPrice != nil { cacheR[i] = *r.CacheReadPrice } cacheW[i] = -1 if r.CacheWritePrice != nil { cacheW[i] = *r.CacheWritePrice } } return db.UpsertAIModelPrices(ctx, database.UpsertAIModelPricesParams{ Providers: providers, Models: models, InputPrices: inputs, OutputPrices: outputs, CacheReadPrices: cacheR, CacheWritePrices: cacheW, }) ``` ### Pros - Single round-trip. 
### Cons - The generated `sqlc` params become plain `[]int64`, which can't represent `NULL`. --- ## Approach 3 — `jsonb_array_elements` over a single `@seed::jsonb` (chosen) Pass the raw seed JSON as one parameter; let Postgres expand and parse it. ```sql INSERT INTO ai_model_prices ( provider, model, input_price, output_price, cache_read_price, cache_write_price ) SELECT elem->>'provider', elem->>'model', (elem->>'input_price')::bigint, (elem->>'output_price')::bigint, (elem->>'cache_read_price')::bigint, (elem->>'cache_write_price')::bigint FROM jsonb_array_elements(@seed::jsonb) AS elem ON CONFLICT (provider, model) DO UPDATE SET input_price = EXCLUDED.input_price, output_price = EXCLUDED.output_price, cache_read_price = EXCLUDED.cache_read_price, cache_write_price = EXCLUDED.cache_write_price, updated_at = NOW(); ``` Go side reduces to: ```go return db.UpsertAIModelPrices(ctx, seedJSON) ``` ### Pros - Single round-trip. - NULLs fall out naturally: a JSON `null` (or a missing key) makes `(elem->>'cache_write_price')::bigint` evaluate to `NULL`, so no sentinels are needed. - The seed is already JSON, so it is passed through without conversion. - Existing precedent: `jsonb_array_elements` is already used elsewhere in the codebase. ### Cons - Less type-safe at the SQL boundary than `UNNEST`. - Slightly less standard than `UNNEST`. - Readers need familiarity with `jsonb_array_elements` and the `->>` extraction syntax. - Postgres pays the JSON parse cost, which is negligible at our scale. --- # Decision We picked Approach 3. It collapses the round-trips like `UNNEST` does, but without: - nullable-array workarounds - sentinel values	2026-05-08 16:45:14 -04:00
Mathias Fredriksson	3925d3941b	fix(coderd/x/chatd): wait long enough for cold-start workspace MCP discovery (#25035 ) The 5s timeout cancelled cold-start ListMCPTools calls before the agent's 30s connectTimeout could settle, so workspace MCP tools never reached the LLM. Bump to 35s and scope to ListMCPTools only.	2026-05-08 17:49:10 +03:00
Ethan	b6dbc5614c	fix(coderd/x/chatd): handle truncated provider streams (#25074 ) coder/fantasy now fails closed when Anthropic or OpenAI Responses streams close before their provider terminal events instead of yielding a successful finish. This bumps the fantasy replacement to coder/fantasy#33 and teaches chat error classification to treat those failures as retryable timeout errors with explicit stream-closed messages.	2026-05-08 15:52:42 +10:00
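Failing closed on a truncated stream can be sketched as a reader that reports success only after it sees a terminal event. This is an illustration with hypothetical types (`event`, `collect`, `errStreamTruncated`), not the coder/fantasy implementation.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// event is a hypothetical provider stream event.
type event struct {
	Kind string // "delta" or "finish"
	Text string
}

// errStreamTruncated is a hypothetical sentinel for a stream that closed early.
var errStreamTruncated = errors.New("provider stream closed before terminal event")

// collect reads events until the terminal "finish" event. If the channel closes
// first, it fails closed instead of reporting a successful finish.
func collect(events <-chan event) (string, error) {
	var b strings.Builder
	for ev := range events {
		switch ev.Kind {
		case "delta":
			b.WriteString(ev.Text)
		case "finish":
			return b.String(), nil
		}
	}
	return b.String(), errStreamTruncated
}

// retryable treats a truncated stream as transient, like a timeout.
func retryable(err error) bool { return errors.Is(err, errStreamTruncated) }

func main() {
	ch := make(chan event, 1)
	ch <- event{Kind: "delta", Text: "partial"}
	close(ch)
	text, err := collect(ch)
	fmt.Printf("%q %v %v\n", text, err, retryable(err))
}
```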
Ethan	de9cdca77e	fix(coderd): handle external-agent workspaces honestly in chat (#24969 ) ## Summary Make Coder's chat agent honest about workspaces that use `coder_external_agent`. Three behaviors change so the chat stops pretending it can drive an external workspace through to a usable state on its own. ## Problem External agents are not started by Coder. The user has to run `coder agent` on their own host with a token Coder generates. Before this change, the chat agent treated those workspaces like any other: - `create_workspace` would enqueue a build for an external-agent template and then wait for minutes (up to ~22 in the worst case) for an agent that was never going to come up. - When mid-turn tool calls dialed an external agent that was not connected, the chat burned the full 30-second dial timeout and returned generic "the workspace may need to be restarted from the Coder dashboard" guidance, which is not an action the user can take. - Nothing told the chat (or the user, through the chat) that the next action lives outside Coder. ## Fix Three changes scoped to `coderd/x/chatd/`: 1. `create_workspace` blocks templates with external agents. The tool reads `template_versions.has_external_agent` for the template's active version and refuses external-agent templates with a message instructing the chat to pick a different template, or to have the user create and start the workspace themselves and then attach it. 2. Attaching an existing external workspace stays open. No selection-time gate on attachment; users can still bind a working external workspace to a chat. 3. External-agent-aware error handling on connection. Two complementary changes both predicated on proven connectivity failures rather than every dial error: - `getWorkspaceConn` preflight and timeout handling. 
Before opening a connection, the cache-miss path reads the agent's status from the already-loaded row. If the selected agent is external and clearly offline according to the existing `isAgentUnreachable` helper (`Disconnected` or `Timeout`, never `Connecting`), it returns an external-agent-specific error immediately instead of waiting out the 30-second dial timeout. `Connecting` external agents fall through to the dial so a user who just started the agent on their host can still succeed in the same turn. The preflight only fires when the agent is still the latest selected agent for the workspace, so stale-binding recovery via `dialWithLazyValidation` is unaffected. The post-dial rewrite is limited to the dial timeout sentinel; stale/no-agent bindings and non-timeout dial failures preserve their original errors. - `waitForAgentReady` timeout-branch rewrite. The 2-minute retry loop used by `create_workspace` and `start_workspace` runs unchanged for all agents. When the loop's outer deadline elapses, the timeout branch substitutes the external-agent message in place of the raw dial error if the agent belongs to an external resource. This applies the same pattern that the cache-hit path of `getWorkspaceConn` already used (`isAgentUnreachable` returning `errChatAgentDisconnected`), extended to the cache-miss path and to the readiness helper, with the external-agent-aware error rewrite layered only on confirmed offline or timeout paths. Closes CODAGT-314	2026-05-08 13:51:13 +10:00
Ethan	3a9080fff6	feat: tag chat-originating agent logs with chat_id (#25019 ) Workspace-agent logs emitted while serving chatd-driven requests were not correlated with the originating chat, which made them hard to attribute. This adds agent-side chat context middleware that parses `Coder-Chat-Id` once, enriches agent access logs and structured handler/background logs, and adds a chatd bridge log when chat headers are attached to an agent connection. Closes CODAGT-324	2026-05-08 13:25:30 +10:00
Cian Johnston	9581f76e07	fix: add /api prefix to chat swagger annotations (#25051 ) Adds the missing `/api` prefix to the swagger annotations for the endpoints in `exp_chats.go` so that they show up correctly. > 🤖	2026-05-07 20:45:28 +01:00


3828 Commits