coder

mirror of https://github.com/coder/coder.git synced 2026-06-02 20:48:20 +00:00

Author	SHA1	Message	Date
github-actions[bot]	b98577cb91	fix: drop N+1 db query on template ACL available (#25465 ) (#25635 ) Backport of https://github.com/coder/coder/pull/25465 Original PR: #25465 — fix: drop N+1 db query on template ACL available Merge commit: `9b6eadab77` Requested by: @f0ssel Co-authored-by: Steven Masley <Emyrk@users.noreply.github.com>	2026-05-22 21:40:31 -04:00
Cian Johnston	ec03b1bba0	fix(coderd/taskname): parse task name JSON with trailing text (#25005) (#25299) Anthropic task name responses can include valid JSON followed by a closing fence or extra text, which made `json.Unmarshal` fail with trailing-character errors and forced fallback naming. This updates task name JSON extraction to accept the first JSON value after optional fences and adds regression coverage for fenced and bare JSON with trailing content. (cherry picked from commit `87d580d3fe`) Co-authored-by: Max Schwenk <maschwenk@gmail.com>	2026-05-18 12:31:33 -04:00
Garrett Delfosse	7fc8a0829a	fix(coderd): skip stale agents from prior builds in instance-identity auth (#25443 ) Fixes the HTTP 409 ambiguity errors that occur during instance-identity auth when stale workspace agents from prior builds accumulate with the same `auth_instance_id`. ## Problem #24325 changed the instance-identity auth path from a `:one` lookup (which silently picked the newest agent) to a `:many` lookup with ambiguity rejection. This caused HTTP 409 errors for workspaces whose EC2/Azure/GCP instances had been through multiple builds, because old agents from prior builds (sharing the same instance ID) were still returned by the query. ## Solution Inside the existing per-candidate loop in `handleAuthInstanceID` (which already does per-candidate DB calls for resource and job lookups), add a latest-build check: parse the provisioner job input to get the workspace build, compare against the latest build for that workspace, and `continue` past candidates whose build is not current. 1 file changed, no SQL/migration/schema changes. > Generated by Coder Agents on behalf of @f0ssel	2026-05-18 12:23:47 -04:00
github-actions[bot]	abe1c85c69	fix(coderd/azureidentity): add Azure IMDS G2 chain certificates (#25243 ) (#25345 ) Cherry-pick of https://github.com/coder/coder/pull/25243 Original PR: #25243 — fix(coderd/azureidentity): add Azure IMDS G2 chain certificates Merge commit: `49c6191bbe` Requested by: @geokat Co-authored-by: George K <george@coder.com>	2026-05-14 13:00:33 -07:00
Spike Curtis	2b778f292c	fix: verify PKCS7 signature on Azure instance identity tokens (2.33 cherry-pick) (#25302) Cherry-pick of #25286. The Azure instance-identity authentication endpoint parsed the PKCS7 envelope and verified the certificate chain, but never verified the PKCS7 signature itself. An attacker could forge a PKCS7 envelope with a legitimate, publicly obtainable Azure certificate and arbitrary vmId content to obtain any agent auth token. Add verifyPKCS7Signature(), a custom PKCS7 signature verification that handles Azure's non-standard use of sha256WithRSAEncryption (OID 1.2.840.113549.1.1.11) as the DigestAlgorithm. The upstream go.mozilla.org/pkcs7 library's Verify() rejects this combination. The verification checks: 1. Content digest matches the signed message-digest attribute. 2. Signature over the authenticated attributes is valid. Tests added: - TestValidate_TamperedContent: forges a PKCS7 with modified vmId, confirms rejection - TestValidate_UntrustedCertWithValidSignature: valid PKCS7 signature with untrusted cert chain, confirms rejection Co-authored-by: Jakub Domeracki <jakub@coder.com>	2026-05-13 13:45:37 -04:00
Jakub Domeracki	844c1e0467	fix(coderd): harden Azure identity certificate fetch (cherry-pick v2.33) (#25276 ) Cherry-pick of https://github.com/coder/coder/commit/57b11d405f17492aa789d4b9ff33366f961a37f8 to `release/2.33`. Backport of #25274. > [!NOTE] > This PR was created by Coder Agents on behalf of a human.	2026-05-13 17:34:52 +02:00
david-fraley	d622e86fa0	fix: backport 11 Coder Agents docs PRs to release/2.33 (#25047 )	2026-05-07 12:54:47 -05:00
Dean Sheather	3e34ba7bf0	chore: remove agents experiment flag and mark feature as beta (#24432 ) (#25003 )	2026-05-07 03:30:35 +10:00
Garrett Delfosse	f009c17217	fix(coderd): cut DB fan-out on agent instance-identity auth (backport #24973 ) (#24982 ) Backport of #24973 to `release/2.33`. ## Summary Restores `v2.33.0-rc.2`-equivalent query cost for agent instance-identity auth, which currently saturates the pgx pool when multiple agents share an instance ID. Customer report against rc.3 traced 233x `Internal error fetching provisioner job resource` 500s during a 50-minute incident window to this path. ## Changes 1. System fast-path on `authorizeProvisionerJob` (`coderd/database/dbauthz/dbauthz.go`): Short-circuits the per-job RBAC fan-out through `GetWorkspaceBuildByJobID` -> `GetWorkspaceByID` for `AsSystemRestricted` callers. 2. Drop survivor re-fetch in `handleAuthInstanceID` (`coderd/workspaceresourceauth.go`): Captures the provisioner job alongside each candidate during the filter loop so the post-selection code reads it directly instead of re-querying. ## Conflict resolution One conflict in `coderd/database/dbauthz/dbauthz_test.go`: the `TestAsAutostart` test function (from an unrelated commit on `main`) was brought in as surrounding context during the cherry-pick. It was removed since it tests functionality (`ResourceUserSecret.Read` for the Autostart role) not present on the release branch. ## Tests - `TestAuthorizeProvisionerJob_SystemFastPath` (3 sub-tests): all pass - `TestPostWorkspaceAuthAWSInstanceIdentity/Ambiguous/*` (7 sub-tests): all pass > Generated by Coder Agents Co-authored-by: Dean Sheather <dean@deansheather.com>	2026-05-05 21:54:04 +02:00
Jon Ayers	17635dde5c	chore: include pgcoordinator schema changes in 2.33 (#24931 ) Includes https://github.com/coder/coder/pull/24613 since it landed prior to the pgcoordinator migration --------- Co-authored-by: Marcin Tojek <mtojek@users.noreply.github.com>	2026-05-04 15:42:34 -05:00
github-actions[bot]	e67d027786	fix(coderd/externalauth): detect concurrent refresh race to prevent cache poisoning (#24228 ) (#24938 ) Cherry-pick of https://github.com/coder/coder/pull/24228 Original PR: #24228 — fix(coderd/externalauth): detect concurrent refresh race to prevent cache poisoning Merge commit: `da6e708bd2` Requested by: @f0ssel Co-authored-by: Jason Barnett <J@sonBarnett.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Garrett Delfosse <garrett@coder.com>	2026-05-04 14:03:39 -04:00
Cian Johnston	eabb68d89e	fix: add preset support to MCP tools (#24694) (#24889) The chat tools (`read_template`, `create_workspace`) did not surface or respect template version presets. Presets were invisible to the LLM and preset parameter defaults were never applied at workspace creation. The `toolsdk` MCP surface had the same gap (ref #24695, now subsumed here). ## What this changes - `read_template` returns presets with `id`, `name`, `default`, `description`, `icon`, `parameters`, and `desired_prebuild_instances` (when set), so the LLM can pick the right preset and prefer prebuilt-backed ones. - `create_workspace` accepts a `preset_id`. The wsbuilder applies preset parameter defaults and may claim a prebuilt workspace. - `start_workspace` does not accept a preset. Presets are a creation-time choice; subsequent starts use the workspace's existing version and parameters. Users who need a specific preset or version on an existing chat can create the workspace out-of-band (CLI / UI / API) with the desired configuration and attach the chat to it. - `toolsdk` gains `GetTemplate` (with presets including `desired_prebuild_instances`), preset support on `CreateWorkspace`, and preset + `rich_parameters` support on `CreateWorkspaceBuild`. The `template_version_preset_id` description warns about preset/version affinity. > 🤖 Generated with [Coder Agents](https://coder.com/agents) and reviewed by a human. (cherry picked from commit `04cc983833`) Co-authored-by: Max schwenk <maschwenk@gmail.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 16:26:47 +01:00
Cian Johnston	df1bfe6479	feat: audit user secret create, update, and delete (#24756) (#24849) Emit user secret audit log entries for create/update/delete operations. Reads stay un-audited, matching every other resource. Audit log entries record changes in user secret name, environment variable name, file path, and value. The secret value column is marked `ActionSecret` so the diff records the change without showing the ciphertext or plaintext. Closes a TOCTOU window on delete to ensure no phantom audit logs for a delete of a non-existent secret. Secret update accepts a small TOCTOU window matching the other audited resources (templates, workspaces, chats). The two-query pattern is wrapped in a transaction so audit state can't leak from a failed mutation. (cherry picked from commit `1c30d52b2b`) Co-authored-by: Zach <3724288+zedkipp@users.noreply.github.com>	2026-04-30 21:01:27 +01:00
George K	9538390107	fix(coderd/healthcheck/derphealth): avoid data races in DERP report (#24795 ) Fixes two data races, one introduced in #24544 and one pre-existing. Related to: https://github.com/coder/internal/issues/1505	2026-04-28 13:06:45 -07:00
Michael Suchacz	1d8e29815e	fix(coderd/x/chatd/chatdebug): restore request body after capture (#24784 ) > Mux working on behalf of Mike. Debug recording could consume request bodies when a provider SDK returned the active body from `GetBody`, which left the upstream request with an empty body after capture. Reset the request body after debug capture and add coverage for shared `GetBody` readers so debug logging does not alter the bytes sent upstream.	2026-04-28 19:09:27 +02:00
Mathias Fredriksson	881df9a5b0	feat: reload MCP config on change via lazy stat-on-request (#24700 ) The MCP manager previously read .mcp.json exactly once at agent startup. Editing the file had no effect until workspace rebuild or agent restart. handleListTools now stats config file mtimes on every tool-list request and triggers a differential reload when any file changed. Unchanged servers keep their client pointer so in-flight tool calls survive. Concurrent reload requests coalesce via singleflight. MCP stdio subprocesses use the agent's execer for resource limits and receive the same enriched environment as SSH sessions via updateEnv. On the chatd side, WorkspaceMCPTool.Run detects 404 responses from CallMCPTool (indicating the server was removed) and drops the chat's cached tool list so the next turn refetches from the agent.	2026-04-28 19:47:14 +03:00
George K	3f0e015fe5	fix: allow coderd to start with an empty DERP map when built-in DERP is disabled (#24544 ) Allow coderd to start with an empty base DERP map when built-in DERP is disabled and no static DERP map is configured, so DERP can come from workspace proxies after startup. Also add a DERP healthcheck warning when no DERP servers are currently available at runtime. Related to: https://linear.app/codercom/issue/PLAT-43/bug-coderd-unable-to-be-started-if-built-in-derp-server-disabled-and Related to: https://github.com/coder/coder/issues/22324	2026-04-28 09:17:08 -07:00
Mathias Fredriksson	1926b7e658	fix(coderd/externalauth): detect rate-limit 403/429 and narrow isFailedRefresh (#24334 ) ValidateToken treated all 403 responses as "token invalid," including GitHub rate limits. isFailedRefresh included 403 in the status code fallthrough, destroying tokens on rate-limited refresh attempts. Split the combined 401/403 check in ValidateToken into a switch on status code. On 403, inspect X-RateLimit-Remaining and Retry-After headers; if either indicates a rate limit, return optimistically valid. Handle 429 the same way. Plain 403 without rate-limit headers preserves the existing invalid-token behavior. Add incorrect_client_credentials and invalid_client to isFailedRefresh error code switch. Remove 403 from the status code fallthrough since no known provider returns 403 from the token endpoint.	2026-04-28 18:03:35 +03:00
Mathias Fredriksson	3c450899ea	fix: pass agent context config explicitly instead of reading env (#24759 ) The CODER_AGENT_EXP_* env vars are agent-internal options. When set in the workspace environment they leak to MCP subprocesses and user shells. ReadEnvConfig() captures the values and ClearEnvVars() strips them before the reinit loop, so config survives agent restarts. NewAPI and ReadEnvConfig both use applyDefaults() to fill zero fields. The chatd test passes config via agenttest.WithContextConfigFromEnv().	2026-04-28 17:58:28 +03:00
Cian Johnston	1666bff1f9	fix(coderd/x/chatd): block chain mode when provider missing tool results (#24782 ) When `StopAfterTool` fires (e.g., `propose_plan`), the LLM response containing a `function_call` is stored at OpenAI via `store=true`, but the tool result is only persisted locally. On the next user message, `resolveChainMode` sees the tool result in the local DB and concludes all calls are resolved. Chain mode activates with `previous_response_id`, but OpenAI rejects because its stored chain has an unresolved `function_call`. This adds a `providerMissingToolResults` check to `resolveChainMode` that detects the `assistant(tool-call) → tool(result) → user` pattern with no follow-up assistant message. The absence of a follow-up assistant proves the tool results were never round-tripped to the provider. When detected, chain mode is blocked and the system falls back to full history replay, which includes both the tool call and its result. Deploying this fix un-bricks existing affected chats with no DB migration needed. > Generated by Coder Agents.	2026-04-28 15:30:04 +01:00
david-fraley	5222db86c7	feat: add after_id pagination for chat messages (#24531 )	2026-04-28 08:31:33 -05:00
Michael Suchacz	8fe11e9b14	fix: match Bedrock streaming accept headers (#24781 ) > Mux is working on behalf of Mike. ## Summary - Bump `github.com/coder/anthropic-sdk-go` to the corrected Bedrock streaming header fix from coder/anthropic-sdk-go#14. - Match botocore's `InvokeModelWithResponseStream` request shape by using `X-Amzn-Bedrock-Accept` and omitting the HTTP `Accept` header. - Update chatd regression coverage for the corrected header shape. ## Context The previous fix set `Accept: application/vnd.amazon.eventstream`. Real boto3/botocore streaming requests do not send that header. They send `X-Amzn-Bedrock-Accept: application/json`, which is the modeled Bedrock request header for the desired model response MIME type. ## Validation - `go test ./coderd/x/chatd/chatprovider -run 'TestModelFromConfig_Bedrock(StreamingHeaders\|StripsAnthropicHeaders)?$' -count=1` - `go mod tidy -diff` - `git diff --check` - pre-commit hook during `git commit`	2026-04-28 14:39:10 +02:00
Michael Suchacz	dec3e98e54	fix: set Bedrock streaming accept headers (#24776 ) > Mux is working on behalf of Mike. ## Summary - Bump `github.com/coder/anthropic-sdk-go` to the clean Bedrock streaming header fix from coder/anthropic-sdk-go#10. - Add chatd regression coverage that verifies Bedrock streaming requests use AWS event stream headers and include `X-Amzn-Bedrock-Accept` in the SigV4 signed headers. ## SDK follow-up - Reverted the bad coder/anthropic-sdk-go#8 merge with coder/anthropic-sdk-go#9. - Re-applied only the intended Bedrock streaming header change in coder/anthropic-sdk-go#10. ## Validation - `go test ./coderd/x/chatd/chatprovider -run 'TestModelFromConfig_Bedrock(StreamingHeaders\|StripsAnthropicHeaders)?$' -count=1` - `go test ./coderd/x/chatd/chatprovider -count=1` - `go mod tidy -diff` - `make lint` - pre-commit hook during `git commit`	2026-04-28 11:28:20 +00:00
Michael Suchacz	99eb46dac1	fix(coderd/x/chatd): repair Anthropic provider tool history (#24744) ## Problem Anthropic returns HTTP 400 when an assistant message contains a `web_search_tool_result` block whose `tool_use_id` has no matching earlier `server_tool_use` block in the same assistant message. A previous fix (#24706) sanitized provider-executed tool calls without matching results, but the opposite direction, orphaned or misordered provider-executed results, could still slip through both the prompt sanitizer and the persistence path. ## Fix Tighten Anthropic provider-executed tool history handling while preserving the useful result payload as normal assistant text when the provider-tool metadata is unsafe. 1. Extract Anthropic provider-tool sanitization into `coderd/x/chatd/chatsanitize` so provider-specific repair logic is no longer spread through `chatprompt` and `chatloop`. 2. `chatsanitize.SanitizeAnthropicProviderToolHistory` removes invalid provider-executed tool structure for Anthropic prompts: orphans in either direction, result-before-call, duplicate IDs, invalid JSON inputs, empty IDs and tool names, unsupported tool names, mismatched `ProviderExecuted` flags, provider-executed blocks outside assistant messages, and web-search results without serializable Anthropic result metadata. Provider-executed result payloads are textified instead of being discarded when there is text to preserve. 3. `chatsanitize.SanitizeAnthropicProviderToolContent` mirrors the same rule at the streamed step content level. Persisted history no longer carries invalid provider-tool blocks forward, but it keeps the result text for future turns. 4. `chatsanitize.ApplyAnthropicProviderToolGuard` only repairs structurally invalid Anthropic provider-tool history. It no longer strips otherwise-valid historical `web_search` blocks just because web search is disabled for the current request. The fail-closed fallback also textifies provider results before removing provider-tool metadata. Tests cover prompt sanitization, validation reason strings, result payload textification, content-level persistence sanitization, disabled web-search history preservation, direct pre-request guard behavior, and the fallback strip path. > Mux is acting on Mike's behalf.	2026-04-28 12:45:23 +02:00
Cian Johnston	70d6efa311	feat: chat auto-archive owner digest notifications (#24643 ) Depends on #24642 Adds per-owner digest notifications onto the chat auto-archive subsystem. Each tick's archived rows are grouped by owner, the top 25 titles per owner are rendered into a new `Chats Auto-Archived` notification template, and any remainder surfaces as `and N more`. Each digest is per-tick, so users with large amounts of purgeable data may get multiple notifications in sequence (one per user per tick). The template body branches on `retention_days`: when retention is disabled (`retention_days=0`), users are told archived chats are kept indefinitely rather than falsely claiming imminent deletion. ### Changes - migration `000XXX_chat_auto_archive_notification_template` adds new notification template - `dbpurge`: threads `notifications.Enqueuer` through `New`; and enqueues notification message. - `cli/server.go`: passes `options.NotificationsEnqueuer` into `dbpurge.New`. - `coderd/notifications/events.go`: new `TemplateChatAutoArchiveDigest` UUID. - `coderd/inboxnotifications.go`: inbox registration. - Docs: adds a `Notifications` section to `chat-auto-archive.md`. > 🤖	2026-04-28 08:56:36 +01:00
Faur Ioan-Aurel	a8e7f329ac	fix: redirect OAuth2 authorization page to dashboard (#24499) Currently, when a user clicks either the Cancel or Allow button on the authorization page, the client app URI is executed but the page does not navigate to the main dashboard page, leaving the two buttons available for repeated clicks. Aside from the potential problems caused by invoking the callback URI multiple times, the page also provides poor UX because users usually expect the authorization tab to return to the dashboard. The consent page now executes the OAuth2 callback (auth code on Allow, `access_denied` on Cancel), hides the two buttons, and updates the existing description with an instruction to close the window. The initial implementation relied on a pop-up window executing the callback while the main window was redirected to the dashboard main page. - resolves https://github.com/coder/coder/issues/20323	2026-04-27 23:26:17 +03:00
Zach	79735f2d45	feat: plumb user secrets through provisioner chain to terraform (#24542) This change passes user secrets from coderd to the Terraform process at workspace build time so the `data.coder_secret` data source in terraform-provider-coder can resolve values at plan time. Secrets traverse two proto hops: `provisionerdserver` fetches them via `ListUserSecretsWithValues` and attaches them to `AcquiredJob.WorkspaceBuild.user_secrets` on `provisionerd.proto`; `runner.go` forwards them into `PlanRequest.user_secrets` on `provisioner.proto`; the Terraform provisioner encodes each as `CODER_SECRET_ENV_<name>` or `CODER_SECRET_FILE_<hex(path)>` before invoking `terraform plan`. Only plan requests carry secrets; apply runs with `nil` because values are baked into plan state. Fetch is gated on a workspace transitioning to start. Stop and delete transitions never carry secrets, so revoking or deleting a stored secret cannot make a workspace unstoppable. DB errors on the fetch fail the job outright rather than silently continuing with an empty secret set. Note that user secrets will be stored in the workspace_builds table in provisioner_state with other Terraform state (including other sensitive data).	2026-04-27 08:26:07 -06:00
Cian Johnston	2f26903af9	feat: add admin UI control for chat auto-archive days (#24704) Relates to #24642 Adds admin UI controls for managing chat auto-archive (days) under "Lifecycle". Also adds a "Days" label to the right of the pre-existing unitless numeric input for consistency. More screens available in Storybook.	2026-04-27 09:54:22 +01:00
Kyle Carberry	069223ae26	fix: recover web push subscriptions after PWA reinstall (#24720 )	2026-04-26 14:49:10 -07:00
Michael Suchacz	99a83a2702	fix: clean Bedrock headers (#24718 ) Bedrock chat provider requests can inherit Anthropic public API headers from the process environment, which causes mixed Anthropic and Bedrock auth headers on signed requests. Update the Anthropic SDK fork so its Bedrock middleware strips Anthropic-only headers before signing requests, and keep a chatprovider regression test for the production request shape. > Mux is acting on Mike's behalf.	2026-04-26 21:50:29 +02:00
Michael Suchacz	62e9752acd	fix: prevent malformed OpenAI Responses continuations (#24725 ) > Worked on by Mux on Mike's behalf. ## Summary - Disable OpenAI Responses `previous_response_id` chain mode when the prior assistant response has unresolved local tool calls, so the next request can include paired tool outputs instead of sending an incomplete continuation. - Update the fantasy pin to a Responses replay fix that preserves stored reasoning references, only replays web search references when paired with reasoning, and validates local function-call output pairing before send. - Add fake OpenAI Responses input validation for the two production 400 shapes and integration coverage for full-history reasoning plus web search replay. - Add sanitized diagnostics for the OpenAI Responses continuity errors. ## Tests - `go test ./providers/openai -run 'TestResponsesToPrompt_(ReasoningWithStore\|ReasoningWithWebSearchCombined\|WebSearchRequiresReasoningReference\|ReasoningWithFunctionCallCombined\|WebSearchProviderExecutedToolResults)\|TestPrepareParams_(SkipsProviderExecutedToolReferences\|ValidatesFunctionCallOutputPairing)\|TestValidateResponsesInput_WebSearchReferenceRequiresReasoning' -count=1` - `go test ./providers/openai -count=1` - `GOWORK=off go test ./coderd/x/chatd/chattest -run TestValidateResponsesAPIInput -count=1` - `GOWORK=off go test ./coderd/x/chatd -run 'TestOpenAIResponses(NoStaleWebSearchReplay\|FullReplayPairsReasoningAndWebSearch\|ChainModeSkipsWhenLocalCallPending\|ChainModeStillFiresForProviderExecutedOnly)$\|TestResolveChainMode_' -count=1` - `GOWORK=off go test ./coderd/x/chatd/chatprompt -run 'TestInjectMissingToolResults_' -count=1` - `GOWORK=off go test ./coderd/x/chatd/chaterror -run TestClassify_OpenAIResponsesAPIDiagnostics -count=1` - `GOWORK=off go test ./coderd/x/chatd/... -count=1` - `git diff --check` - `git commit` pre-commit hook	2026-04-26 21:23:06 +02:00
Michael Suchacz	ed33e28b13	fix(coderd/x/chatd): wake after auto-promoting queued message (#24714) `tryAutoPromoteQueuedMessage` in `processChat`'s deferred cleanup could set a chat back to `pending` without waking the processor. The processor only noticed on the next 10ms poll, so under load, tests like `TestAutoPromoteQueuedMessageFallsBackForInvalidQueuedModelConfigID` could time out waiting for the second streaming request (#1500). Call `p.signalWake()` after the promoted-message publishes when `promotedMessage != nil`, matching the pattern used by `CreateChat`, `SendMessage`, `EditMessage`, `PromoteQueued`, and `InterruptChat`. Make the regression helper `testAutoPromoteQueuedMessageFallback` deterministic by setting `PendingChatAcquireInterval = time.Hour` and synchronizing on a `secondRunStarted` channel instead of polling `requestCount`, so the test fails without the wake instead of relying on the 10ms ticker. Closes https://github.com/coder/internal/issues/1500 > Mux is acting on Mike's behalf.	2026-04-26 11:08:32 +02:00
Michael Suchacz	0211448d09	fix(coderd): sanitize Anthropic provider tool history (#24706 ) Anthropic can reject replayed chat histories when a provider-executed tool call, such as `web_search`, is present without its matching provider result block. This sanitizes unpaired Anthropic provider-executed tool calls during prompt reconstruction, before Anthropic requests, and before persistence so existing poisoned histories can continue and new malformed turns are not stored. Resolves: CODAGT-259 > Mux is acting on Mike's behalf.	2026-04-24 23:57:30 +02:00
Cian Johnston	0ccfd575d0	fix(coderd/database/migrations): rename duplicate migration 477 (#24707 )	2026-04-24 14:49:11 +00:00
Michael Suchacz	c7cac9debe	fix: persist per-turn model on chats and queued messages (#24688) Previously, `chats.last_model_config_id` was not updated when a user sent a mid-chat message with a different model, and queued messages did not store their own per-turn model, so promotion ran against whatever the chat row said at promote time. Chat watch events also did not merge `last_model_config_id` into the site's root, child, and per-chat caches, so sidebar labels stayed stale after direct sends and queued promotions. - Add nullable `chat_queued_messages.model_config_id`, backfilled from `chats.last_model_config_id`. Queued inserts round-trip the effective model id at enqueue time. - In `coderd/x/chatd`, direct sends update `chats.last_model_config_id` inside the same transaction that inserts the admitted user message. Manual promotion and auto-promotion use the queued row's stored `model_config_id`, with a fallback to `chats.last_model_config_id` for legacy NULL rows during rollout. `PromoteQueuedOptions.ModelConfigID` is now ignored. - On the site, extract `mergeWatchedChatSummary` and `mergeWatchedChatIntoCaches` in `site/src/api/queries/chats.ts` so status-change watch events merge `last_model_config_id` into the root infinite chat list, the parent-embedded child entry, and the per-chat `chatKey(chatId)` cache. `updated_at` guards against stale watch payloads clobbering newer cached state, while diff status events still merge their PR metadata because they are timestamped outside the chat row. Watch timestamps are compared as instants so variable fractional precision does not make fresh events look stale. - Queued promotion validates stored model config IDs before admission. Invalid legacy queued IDs fall back to the chat's current model config instead of dropping the queued message during auto-promotion. - Backend and frontend regression coverage added for admission, queue promotion (including FIFO across mixed models, legacy NULL fallback, and invalid queued model IDs), and chat watch cache merging. > Mux is acting on Mike's behalf.	2026-04-24 15:36:08 +02:00
Cian Johnston	a876287d36	feat: auto-archive inactive chats with audit trail (#24642 ) Adds a background job in `dbpurge` that periodically archives chats inactive beyond a configurable threshold. Each archived root chat gets a background audit entry tagged `chat_auto_archive`. Disabled by default. * New `AutoArchiveInactiveChats` SQL query with LATERAL last-activity subquery and partial index on archive candidates * `site_configs`-backed `auto_archive_days` setting with admin-only PUT, any-authenticated-user GET * Cascade archive via `root_chat_id`; pinned chats and active threads exempt * Root-only audit dispatch on detached context, matching manual archive (`patchChat`) behavior * 11 subtests covering disabled no-op, boundary, deleted messages, child activity, pinned exemption, multi-owner, idempotency, and batch pagination PR #24643 adds per-owner digest notifications. PR #24704 adds the requisite UI controls. > 🤖	2026-04-24 14:18:28 +01:00
Danielle Maywood	3a9a60dff8	feat: add collapsible thinking blocks with configurable display mode (#24635 )	2026-04-24 11:29:08 +00:00
Michael Suchacz	3d90546aae	feat: add general subagent model override (#24610 ) Adds a deployment-wide admin override for general delegated subagents. ## What changed - store the general override in `site_configs` and expose it through the shared `agent-model-override/{context}` API - apply the general override when spawning delegated general subagents, while preserving the existing Explore override behavior - reuse a shared Agents settings form for the general and Explore override sections ## Validation - `make gen` - `go test ./coderd -run 'TestChatModelOverrides'` - `go test ./coderd/x/chatd -run 'TestSpawnAgent_(GeneralUsesConfiguredModelOverride\|GeneralOverrideLogsAndFallsBackWhenCredentialsUnavailable\|GeneralOverrideLogsAndFallsBackWhenProviderDisabled)'` - `pnpm -C site lint:types` - `pnpm -C site test:storybook -- AgentSettingsAgentsPageView.stories.tsx` - `make lint` - `make pre-commit` > Mux is acting on Mike's behalf.	2026-04-24 12:37:20 +02:00
Cian Johnston	a02339c66a	fix(coderd/x/chatd): prevent invalid tool results from poisoning chat history (#24663 ) - computeruse.go: Decode base64 screenshot data before storing in `ToolResponse.Data` (was casting base64 string to bytes without decoding) - chatloop.go: Re-encode `ToolResponse.Data` to base64 via `base64.StdEncoding.EncodeToString` instead of `string()` cast - mcpclient.go: UTF-8 validate all text from MCP responses in `convertCallResult()` using `strings.ToValidUTF8` - chatprompt.go (persist): Defense-in-depth UTF-8 sanitization of text and media Text fields before database storage - chatprompt.go (replay): Antivenom layer that validates base64 and UTF-8 at read time, auto-healing already-poisoned chats without requiring a migration - `TestToolResultAntivenom`: 4 subtests covering poisoned text, poisoned media, valid media round-trip, and media with invalid UTF-8 text - Adds `TestConvertCallResult_UTF8Sanitization`: 4 subtests covering invalid UTF-8 in TextContent, EmbeddedResource, valid passthrough, and multi-part - Adds `TestComputerUseTool_Run_ScreenshotDataIsDecodedBinary`: Verifies no double-encode in the computer-use path - Updated existing computer-use tests for the new decoded-binary contract > 🤖	2026-04-23 19:58:38 +01:00
Cian Johnston	c602a31856	fix(coderd): reject pinning child chats in patchChat handler (#24669 ) The UI already prevents child (delegated/subagent) chats from being pinned, but the `PATCH /api/experimental/chats/{chat}` endpoint did not enforce this. A direct API call could pin a child chat. - Add a `400 Bad Request` guard in `patchChat` when `pinOrder > 0` and the chat has a `ParentChatID` - Add `TestChatPinOrder/RejectsChildChat` test > 🤖	2026-04-23 18:36:20 +01:00
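A minimal sketch of the guard described above, assuming a simplified `chat` type with a nullable `ParentChatID` pointer. The type, the function name, and the `200` success status are hypothetical; only the condition (`pinOrder > 0` on a chat with a parent yields `400 Bad Request`) comes from the commit message.

```go
package main

import (
	"fmt"
	"net/http"
)

// chat is a hypothetical, trimmed-down stand-in for the real chat record.
type chat struct {
	ID           string
	ParentChatID *string // nil for root chats
}

// patchPinStatus returns the HTTP status the guard would produce:
// pinning (pinOrder > 0) a chat that has a parent is rejected.
func patchPinStatus(c chat, pinOrder int) int {
	if pinOrder > 0 && c.ParentChatID != nil {
		return http.StatusBadRequest
	}
	return http.StatusOK
}

func main() {
	parent := "root-chat"
	child := chat{ID: "child-chat", ParentChatID: &parent}
	root := chat{ID: "root-chat"}

	fmt.Println(patchPinStatus(child, 1)) // pinning a child chat
	fmt.Println(patchPinStatus(root, 1))  // pinning a root chat
	fmt.Println(patchPinStatus(child, 0)) // unpinning a child chat
}
```

Note that unpinning (`pinOrder` of 0) is still allowed for child chats under this reading of the condition.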
Michael Suchacz	dbcc654d28	feat: snapshot explore subagent tool entitlements (#24638 ) Explore sub-agents previously could not use `web_search` or external MCP tools. `runChat` hard-skipped both for Explore. Lifting those guards naively would over-grant tools, because a child chat could outlive the spawning turn's plan-mode filter. This change persists the spawning parent turn's filtered external MCP server IDs onto the child Explore chat, and simplifies the Explore provider-tool filter in `runChat`: - New `resolveExploreToolSnapshot` helper: computes the child's inherited external MCP subset by running the parent's configs through `filterExternalMCPConfigsForTurn` (plan-mode policy) and, if the parent is itself an Explore child, further narrowing to the parent's own persisted `MCPServerIDs`. The result is written to the child's `MCPServerIDs` column at spawn time. - The existing `mcp_server_ids` column is the sole durable snapshot. No new chat column is added. - `runChat` for Explore children: loads MCP tools from the persisted snapshot, and keeps only `web_search` from provider-native tools (to block computer-use and other write-style tools, since Explore is read-only). Whether `web_search` is actually available is a per-model decision, determined by the current model config, just like a main chat. - Built-in Explore allowlist is unchanged. Workspace-local MCP remains excluded for Explore. Verification: `go build ./...`, `go test ./coderd/x/chatd/... -count=1`, `make gen` (clean tree), `make lint/emdash`, `go vet`. Deep-review ran 12 reviewers on the feature and 5 on the clarity refactor; CAR reviewed and approved; a subsequent scope reduction dropped a temporary `allow_web_search` column in favor of per-model handling. > Mux is acting on Mike's behalf.	2026-04-23 19:07:38 +02:00
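The narrowing rule for the inherited MCP subset can be illustrated with a small sketch. It assumes server IDs are plain strings and that the plan-mode policy can be modelled as a predicate; the function below is a hypothetical simplification, not the actual `resolveExploreToolSnapshot` signature.

```go
package main

import "fmt"

// resolveSnapshot (hypothetical simplification) keeps the parent's external
// MCP server IDs that pass the current turn's policy and, when the parent is
// itself an Explore child, only those already in the parent's own snapshot.
func resolveSnapshot(parentConfigs []string, allowedThisTurn func(id string) bool,
	parentIsExplore bool, parentSnapshot []string) []string {
	persisted := make(map[string]bool, len(parentSnapshot))
	for _, id := range parentSnapshot {
		persisted[id] = true
	}
	out := []string{}
	for _, id := range parentConfigs {
		if !allowedThisTurn(id) {
			continue // filtered out by the turn's plan-mode policy
		}
		if parentIsExplore && !persisted[id] {
			continue // a child never gains servers its Explore parent lacked
		}
		out = append(out, id)
	}
	return out
}

func main() {
	configs := []string{"docs-mcp", "deploy-mcp", "search-mcp"}
	planMode := func(id string) bool { return id != "deploy-mcp" }

	fmt.Println(resolveSnapshot(configs, planMode, false, nil))
	fmt.Println(resolveSnapshot(configs, planMode, true, []string{"docs-mcp"}))
}
```

Because the result is computed once at spawn time and persisted, a child that outlives the spawning turn cannot pick up tools the turn's filter would have excluded.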
Cian Johnston	b5a625549e	feat: migrate agents-access to org-scoped system role for proper chat RBAC (#24438 ) The agents-access role previously granted chat permissions at user scope, but chats are org-scoped objects. Rego skips user-level perms when org_owner is set, making the grants invisible. Handler-level band-aids used synthetic non-org-scoped objects as a workaround. - Migrates agents-access from users.rbac_roles (site-level) to organization_members.roles (org-scoped) via DB migration - Redefines agents-access as a predefined org-scoped builtin role alongside organization-admin, organization-auditor, etc., with Member permissions granting chat create/read/update - Excludes ResourceChat from OrgMemberPermissions so org membership alone no longer grants chat access - Fixes handler Authorize checks to use org-scoped objects with semantically correct actions (ActionUpdate for message/tool operations) - Grants org admins the ability to assign agents-access Closes #24250 Fixes CODAGT-174 Note: this does not update the "Usage" endpoints. Tracked by CODAGT-161. > 🤖	2026-04-23 17:59:42 +01:00
Mathias Fredriksson	f8fe5d680b	fix(coderd): reject API operations on archived chats (#24633 ) Archived chats accept mutations (messages, edits, queued-message promotions, tool-result submissions) via the API, causing them to re-enter the processing pipeline. This violates the hard-stop design intent from PR #23758. Add archived checks at three layers: - HTTP handlers (postChatMessages, patchChatMessage, promoteChatQueuedMessage, postChatToolResults): return 400 after auth so callers get a clear error. - Daemon functions (SendMessage, EditMessage, PromoteQueued, SubmitToolResults): return ErrChatArchived after row lock, guarding against future callers that bypass the handler. - AcquireChats SQL: filter out archived chats so they are never acquired for processing. Fixes CODAGT-245	2026-04-23 19:03:33 +03:00
Danny Kopping	a8613b2209	chore: deprecate /api/v2/aibridge/interceptions endpoint (#24670 ) Disclaimer: implemented by a Coder Agent using Claude Opus 4.6 Marks the `GET /api/v2/aibridge/interceptions` endpoint as deprecated in favor of `/aibridge/sessions`, which provides richer session-level aggregation including threads and agentic actions. Changes: - Add `@Deprecated` Swagger annotation to the endpoint handler - Add deprecation notice to the `codersdk.Client.AIBridgeListInterceptions` method - Regenerated OpenAPI spec with `"deprecated": true` flag The endpoint remains fully functional. Fixes https://github.com/coder/internal/issues/1339	2026-04-23 15:33:40 +02:00
Cian Johnston	2e5c7d99c2	fix(coderd/x/chatd): fix flaky TestSpawnComputerUseAgentInheritsContext (#24666 ) Fixes flaky `TestSpawnComputerUseAgentInheritsContext`. - The test inserts an Anthropic provider directly into the DB after `CreateChat` has already been called - The server's background goroutine may have already cached the provider list (OpenAI only) via `configCache.EnabledProviders()` with a 10s TTL - The direct DB insert bypasses the pubsub event that production uses to invalidate the cache - `isAnthropicConfigured()` returns the stale cached result, making `computer_use` appear unavailable - Fix: call `server.configCache.InvalidateProviders()` after the insert, mirroring what production does via pubsub CI failure: https://github.com/coder/coder/actions/runs/24829197096/job/72673070101?pr=24648 > 🤖	2026-04-23 13:18:18 +01:00
Jake Howell	4caa52844d	chore!: remove `api.ts` unnecessary calls (#22168 ) > [!WARNING] > The change of the status code from `404` to `204` could break people's code downstream. Adding this as a breaking change just in case. There's a whole ton of noise around failed requests; these are all unrelated to the actual thing that is broken at hand (and are confusing). * Change `/api/v2/organizations/.../templates/.../versions/.../previous` to return `204` instead of `404` (actually makes more sense because the content doesn't exist, but the route is found). * Remove unnecessary calls to `/api/v2/users/me/appearance` when the user isn't logged in. * Remove unnecessary calls to `/api/v2/deployment/stats` when the deployment stats aren't allowed to be seen. * Various changes to `workspace-sharing` so we don't make unnecessary calls. What's left: * `/api/v2/users/me` still `401`s on the login page. This persists because a logged-in user who tries to reach the sign-in page should be redirected to the app, not asked to sign in again. * `monaco-editor` is still upset... we theoretically could inject an environment that can serve workers... but eh. #### Old ```sh % pnpm playwright:test -g "create workspace with default and required parameters" > coder-v2@ playwright:test /home/coder/coder/site > playwright test --config=e2e/playwright.config.ts -g 'create workspace with default and required parameters' ... Running 2 tests using 1 worker ✓ 1 …e/setup/addUsersAndLicense.spec.ts:7:5 › setup deployment (8.2s) 2 ….ts:79:5 › create workspace with default and required parameters [console][error] Failed to load resource: the server responded with a status of 401 (Unauthorized) [console][error] Failed to load resource: the server responded with a status of 401 (Unauthorized) [response] url=http://localhost:3111/api/v2/users/me/appearance status=401 body={"message":"You are signed out or your session has expired. 
Please sign in again to continue.","detail":"Cookie \"coder_session_token\" or query parameter must be provided."} [response] url=http://localhost:3111/api/v2/users/me status=401 body={"message":"You are signed out or your session has expired. Please sign in again to continue.","detail":"Cookie \"coder_session_token\" or query parameter must be provided."} [console][error] Failed to load resource: the server responded with a status of 403 (Forbidden) [response] url=http://localhost:3111/api/v2/deployment/stats status=403 body={"message":"Forbidden.","detail":"You don't have permission to view this content. If you believe this is a mistake, please contact your administrator or try signing in with different credentials."} [console][error] Failed to load resource: the server responded with a status of 403 (Forbidden) [response] url=http://localhost:3111/api/v2/deployment/stats status=403 body={"message":"Forbidden.","detail":"You don't have permission to view this content. If you believe this is a mistake, please contact your administrator or try signing in with different credentials."} [console][error] Failed to load resource: the server responded with a status of 404 (Not Found) [response] url=http://localhost:3111/api/v2/organizations//provisionerdaemons status=404 body={"message":"Resource not found or you do not have access to this resource"} [console][error] Failed to load resource: the server responded with a status of 404 (Not Found) [response] url=http://localhost:3111/api/v2/organizations/default/templates/a4e8096d/versions/agreeable_glenn33/previous status=404 body={"message":"No previous template version found for \"agreeable_glenn33\"."} [console][warning] Could not create web worker(s). Falling back to loading web worker code in main thread, which might cause UI freezes. 
Please see https://github.com/microsoft/monaco-editor#faq [console][warning] You must define a function MonacoEnvironment.getWorkerUrl or MonacoEnvironment.getWorker [console][error] Failed to load resource: the server responded with a status of 401 (Unauthorized) [console][error] Failed to load resource: the server responded with a status of 401 (Unauthorized) [response] url=http://localhost:3111/api/v2/users/me/appearance status=401 body={"message":"You are signed out or your session has expired. Please sign in again to continue.","detail":"Cookie \"coder_session_token\" or query parameter must be provided."} [response] url=http://localhost:3111/api/v2/users/me status=401 body={"message":"You are signed out or your session has expired. Please sign in again to continue.","detail":"Cookie \"coder_session_token\" or query parameter must be provided."} [console][error] Failed to load resource: the server responded with a status of 403 (Forbidden) [response] url=http://localhost:3111/api/v2/deployment/stats status=403 body={"message":"Forbidden.","detail":"You don't have permission to view this content. If you believe this is a mistake, please contact your administrator or try signing in with different credentials."} ✓ 2 …5 › create workspace with default and required parameters (7.0s)atus of 403 (Forbidden) [response] url=http://localhost:3111/api/v2/deployment/stats status=403 body={"message":"Forbidden.","detail":"You don't have permission to view this content. If you believe this is a mistake, please contact your administrator or try signing in with different credentials."} [console][error] Failed to load resource: the server responded with a status of 403 (Forbidden) [response] url=http://localhost:3111/api/v2/deployment/stats status=403 body={"message":"Forbidden.","detail":"You don't have permission to view this content. 
If you believe this is a mistake, please contact your administrator or try signing in with different credentials."} 2 passed (56.1s) ``` `23 LOL` (Lines of logs) #### New ```sh % pnpm playwright:test -g "create workspace with default and required parameters" > coder-v2@ playwright:test /home/coder/coder/site > playwright test --config=e2e/playwright.config.ts -g 'create workspace with default and required parameters' ... Running 2 tests using 1 worker ✓ 1 …e/setup/addUsersAndLicense.spec.ts:7:5 › setup deployment (8.7s) 2 ….ts:79:5 › create workspace with default and required parameters [console][error] Failed to load resource: the server responded with a status of 401 (Unauthorized) [console][error] Failed to load resource: the server responded with a status of 401 (Unauthorized) [response] url=http://localhost:3111/api/v2/users/me/appearance status=401 body={"message":"You are signed out or your session has expired. Please sign in again to continue.","detail":"Cookie \"coder_session_token\" or query parameter must be provided."} [response] url=http://localhost:3111/api/v2/users/me status=401 body={"message":"You are signed out or your session has expired. Please sign in again to continue.","detail":"Cookie \"coder_session_token\" or query parameter must be provided."} [console][warning] Could not create web worker(s). Falling back to loading web worker code in main thread, which might cause UI freezes. Please see https://github.com/microsoft/monaco-editor#faq [console][warning] You must define a function MonacoEnvironment.getWorkerUrl or MonacoEnvironment.getWorker ✓ 2 …5 › create workspace with default and required parameters (7.1s)atus of 401 (Unauthorized) [console][error] Failed to load resource: the server responded with a status of 401 (Unauthorized) [response] url=http://localhost:3111/api/v2/users/me/appearance status=401 body={"message":"You are signed out or your session has expired. 
Please sign in again to continue.","detail":"Cookie \"coder_session_token\" or query parameter must be provided."} [response] url=http://localhost:3111/api/v2/users/me status=401 body={"message":"You are signed out or your session has expired. Please sign in again to continue.","detail":"Cookie \"coder_session_token\" or query parameter must be provided."} 2 passed (32.0s) ``` `9 LOL` (Lines of logs)	2026-04-23 06:20:35 +10:00
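For downstream callers worried about the `404` to `204` change on the `.../previous` route, one tolerant approach is to treat both statuses as "no previous version". The helper below is a hypothetical client-side sketch, not part of `codersdk`; it only interprets a status code and makes no request itself.

```go
package main

import (
	"fmt"
	"net/http"
)

// hasPreviousVersion (hypothetical helper) interprets the status returned by
// the ".../previous" template version route. It accepts the new 204 and the
// old 404 so the same client works against servers on either side of the change.
func hasPreviousVersion(status int) (bool, error) {
	switch status {
	case http.StatusOK:
		return true, nil
	case http.StatusNoContent, http.StatusNotFound:
		return false, nil
	default:
		return false, fmt.Errorf("unexpected status %d", status)
	}
}

func main() {
	for _, s := range []int{http.StatusOK, http.StatusNoContent, http.StatusNotFound, http.StatusForbidden} {
		ok, err := hasPreviousVersion(s)
		fmt.Println(s, ok, err)
	}
}
```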
Cian Johnston	be1256c418	fix(coderd): fix TestListChats/PinnedOnFirstPage race timeout (#24641 ) - Insert filler chats directly into the database with `completed` status instead of creating them via the API - Removes the `testutil.Eventually` polling loop that waited for all 52 chats to reach terminal status - Avoids spawning 52 background chat processors that each time out on title generation under `-race`, exceeding the 25s `WaitLong` timeout - Test now completes in ~1s instead of timing out at 30s+ Flake: https://github.com/coder/coder/actions/runs/24789695935/job/72543519963?pr=24438 > 🤖	2026-04-22 20:37:06 +01:00
Mathias Fredriksson	1ace519c6e	fix(coderd/x/chatd): remove cache-miss check blocking agent recovery (#24634 ) The cache-miss isAgentUnreachable check added in #24336 runs before dialWithLazyValidation, preventing the existing switch mechanism from discovering the new agent after a workspace rebuild. The chat's stale agent binding is never repaired, causing an infinite loop of 'agent is disconnected' errors. Remove the cache-miss check. The cache-hit check remains (it verifies the agent behind an established connection). The dial timeout and dialWithLazyValidation already bound the cache-miss failure path. Closes CODAGT-248	2026-04-22 21:49:10 +03:00
Cian Johnston	72e3ae9c5f	feat: add chatd tool call error metrics and logging (#24559 ) - Add `coderd_chatd_tool_errors_total` prometheus counter (labels: provider, model, tool_name) - Log tool call errors at warn level with correlation fields: chat_id, owner_id, organization_id, workspace_id, agent_id, parent_chat_id, trigger_message_id, tool_name, tool_call_id, provider, model - Thread enriched logger from chatd.go into chatloop via `RunOptions.Logger` - Remove squashing of all MCP tool calls to the `mcp` bucket > 🤖	2026-04-22 16:19:56 +00:00
Michael Suchacz	7904bed947	fix: fall back to local git watcher for chat diff drawer (#24512 ) The Ctrl+D diff drawer in `coder exp agents` only rendered PR-backed diffs returned by `/api/experimental/chats/{id}/diff`. Local working tree changes in a chat's workspace returned an empty diff, so the drawer showed "No diff contents" with no file summary. Centralise diff loading behind a single `fetchChatDiffContents` helper that first hits `/diff`, then falls back to the chat git watcher WebSocket (`/stream/git`) when the remote diff is empty. Aggregate the agent's `WorkspaceAgentRepoChanges` into a `ChatDiffContents` value so the drawer can derive the file summary and styled body from the local unified diff. Missing workspaces, missing agents, and watcher timeouts are treated as graceful fallbacks that render the empty-diff placeholder instead of a hard error. > Mux is opening this PR on Mike's behalf.	2026-04-22 18:08:02 +02:00


3731 Commits