coder

mirror of https://github.com/coder/coder.git synced 2026-06-03 04:58:23 +00:00

Author	SHA1	Message	Date
Danny Kopping	a8613b2209	chore: deprecate /api/v2/aibridge/interceptions endpoint (#24670 ) Disclaimer: implemented by a Coder Agent using Claude Opus 4.6. Marks the `GET /api/v2/aibridge/interceptions` endpoint as deprecated in favor of `/aibridge/sessions`, which provides richer session-level aggregation including threads and agentic actions. Changes: - Add `@Deprecated` Swagger annotation to the endpoint handler - Add deprecation notice to the `codersdk.Client.AIBridgeListInterceptions` method - Regenerate the OpenAPI spec with the `"deprecated": true` flag The endpoint remains fully functional. Fixes https://github.com/coder/internal/issues/1339	2026-04-23 15:33:40 +02:00
Cian Johnston	2e5c7d99c2	fix(coderd/x/chatd): fix flaky TestSpawnComputerUseAgentInheritsContext (#24666 ) Fixes flaky `TestSpawnComputerUseAgentInheritsContext`. - The test inserts an Anthropic provider directly into the DB after `CreateChat` has already been called - The server's background goroutine may have already cached the provider list (OpenAI only) via `configCache.EnabledProviders()` with a 10s TTL - The direct DB insert bypasses the pubsub event that production uses to invalidate the cache - `isAnthropicConfigured()` returns the stale cached result, making `computer_use` appear unavailable - Fix: call `server.configCache.InvalidateProviders()` after the insert, mirroring what production does via pubsub CI failure: https://github.com/coder/coder/actions/runs/24829197096/job/72673070101?pr=24648 > 🤖	2026-04-23 13:18:18 +01:00
Jake Howell	4caa52844d	chore!: remove `api.ts` unnecessary calls (#22168 ) > [!WARNING] > Changing the status code from `404` to `204` could break people's code downstream, so this is marked as a breaking change just in case. There's a whole ton of noise around failed requests; these are all unrelated to the actual thing that is broken (and are confusing). * Change `/api/v2/organizations/.../templates/.../versions/.../previous` to return `204` instead of `404` (this actually makes more sense: the route is found, but the content doesn't exist). * Remove unnecessary calls to `/api/v2/users/me/appearance` when the user isn't logged in. * Remove unnecessary calls to `/api/v2/deployment/stats` when the user isn't allowed to see deployment stats. * Various changes to `workspace-sharing` so we don't make unnecessary calls. What's left: * `/api/v2/users/me` still `401`s on the login page. This call remains because a user who is already logged in and reaches the sign-in page should be redirected to the app rather than asked to sign in again. * `monaco-editor` is still upset... we could theoretically inject an environment that can serve workers... but eh. #### Old ```sh % pnpm playwright:test -g "create workspace with default and required parameters" > coder-v2@ playwright:test /home/coder/coder/site > playwright test --config=e2e/playwright.config.ts -g 'create workspace with default and required parameters' ... Running 2 tests using 1 worker ✓ 1 …e/setup/addUsersAndLicense.spec.ts:7:5 › setup deployment (8.2s) 2 ….ts:79:5 › create workspace with default and required parameters [console][error] Failed to load resource: the server responded with a status of 401 (Unauthorized) [console][error] Failed to load resource: the server responded with a status of 401 (Unauthorized) [response] url=http://localhost:3111/api/v2/users/me/appearance status=401 body={"message":"You are signed out or your session has expired. 
Please sign in again to continue.","detail":"Cookie \"coder_session_token\" or query parameter must be provided."} [response] url=http://localhost:3111/api/v2/users/me status=401 body={"message":"You are signed out or your session has expired. Please sign in again to continue.","detail":"Cookie \"coder_session_token\" or query parameter must be provided."} [console][error] Failed to load resource: the server responded with a status of 403 (Forbidden) [response] url=http://localhost:3111/api/v2/deployment/stats status=403 body={"message":"Forbidden.","detail":"You don't have permission to view this content. If you believe this is a mistake, please contact your administrator or try signing in with different credentials."} [console][error] Failed to load resource: the server responded with a status of 403 (Forbidden) [response] url=http://localhost:3111/api/v2/deployment/stats status=403 body={"message":"Forbidden.","detail":"You don't have permission to view this content. If you believe this is a mistake, please contact your administrator or try signing in with different credentials."} [console][error] Failed to load resource: the server responded with a status of 404 (Not Found) [response] url=http://localhost:3111/api/v2/organizations//provisionerdaemons status=404 body={"message":"Resource not found or you do not have access to this resource"} [console][error] Failed to load resource: the server responded with a status of 404 (Not Found) [response] url=http://localhost:3111/api/v2/organizations/default/templates/a4e8096d/versions/agreeable_glenn33/previous status=404 body={"message":"No previous template version found for \"agreeable_glenn33\"."} [console][warning] Could not create web worker(s). Falling back to loading web worker code in main thread, which might cause UI freezes. 
Please see https://github.com/microsoft/monaco-editor#faq [console][warning] You must define a function MonacoEnvironment.getWorkerUrl or MonacoEnvironment.getWorker [console][error] Failed to load resource: the server responded with a status of 401 (Unauthorized) [console][error] Failed to load resource: the server responded with a status of 401 (Unauthorized) [response] url=http://localhost:3111/api/v2/users/me/appearance status=401 body={"message":"You are signed out or your session has expired. Please sign in again to continue.","detail":"Cookie \"coder_session_token\" or query parameter must be provided."} [response] url=http://localhost:3111/api/v2/users/me status=401 body={"message":"You are signed out or your session has expired. Please sign in again to continue.","detail":"Cookie \"coder_session_token\" or query parameter must be provided."} [console][error] Failed to load resource: the server responded with a status of 403 (Forbidden) [response] url=http://localhost:3111/api/v2/deployment/stats status=403 body={"message":"Forbidden.","detail":"You don't have permission to view this content. If you believe this is a mistake, please contact your administrator or try signing in with different credentials."} ✓ 2 …5 › create workspace with default and required parameters (7.0s)atus of 403 (Forbidden) [response] url=http://localhost:3111/api/v2/deployment/stats status=403 body={"message":"Forbidden.","detail":"You don't have permission to view this content. If you believe this is a mistake, please contact your administrator or try signing in with different credentials."} [console][error] Failed to load resource: the server responded with a status of 403 (Forbidden) [response] url=http://localhost:3111/api/v2/deployment/stats status=403 body={"message":"Forbidden.","detail":"You don't have permission to view this content. 
If you believe this is a mistake, please contact your administrator or try signing in with different credentials."} 2 passed (56.1s) ``` `23 LOL` (Lines of logs) #### New ```sh % pnpm playwright:test -g "create workspace with default and required parameters" > coder-v2@ playwright:test /home/coder/coder/site > playwright test --config=e2e/playwright.config.ts -g 'create workspace with default and required parameters' ... Running 2 tests using 1 worker ✓ 1 …e/setup/addUsersAndLicense.spec.ts:7:5 › setup deployment (8.7s) 2 ….ts:79:5 › create workspace with default and required parameters [console][error] Failed to load resource: the server responded with a status of 401 (Unauthorized) [console][error] Failed to load resource: the server responded with a status of 401 (Unauthorized) [response] url=http://localhost:3111/api/v2/users/me/appearance status=401 body={"message":"You are signed out or your session has expired. Please sign in again to continue.","detail":"Cookie \"coder_session_token\" or query parameter must be provided."} [response] url=http://localhost:3111/api/v2/users/me status=401 body={"message":"You are signed out or your session has expired. Please sign in again to continue.","detail":"Cookie \"coder_session_token\" or query parameter must be provided."} [console][warning] Could not create web worker(s). Falling back to loading web worker code in main thread, which might cause UI freezes. Please see https://github.com/microsoft/monaco-editor#faq [console][warning] You must define a function MonacoEnvironment.getWorkerUrl or MonacoEnvironment.getWorker ✓ 2 …5 › create workspace with default and required parameters (7.1s)atus of 401 (Unauthorized) [console][error] Failed to load resource: the server responded with a status of 401 (Unauthorized) [response] url=http://localhost:3111/api/v2/users/me/appearance status=401 body={"message":"You are signed out or your session has expired. 
Please sign in again to continue.","detail":"Cookie \"coder_session_token\" or query parameter must be provided."} [response] url=http://localhost:3111/api/v2/users/me status=401 body={"message":"You are signed out or your session has expired. Please sign in again to continue.","detail":"Cookie \"coder_session_token\" or query parameter must be provided."} 2 passed (32.0s) ``` `9 LOL` (Lines of logs)	2026-04-23 06:20:35 +10:00
Cian Johnston	be1256c418	fix(coderd): fix TestListChats/PinnedOnFirstPage race timeout (#24641 ) - Insert filler chats directly into the database with `completed` status instead of creating them via the API - Removes the `testutil.Eventually` polling loop that waited for all 52 chats to reach terminal status - Avoids spawning 52 background chat processors that each time out on title generation under `-race`, exceeding the 25s `WaitLong` timeout - Test now completes in ~1s instead of timing out at 30s+ Flake: https://github.com/coder/coder/actions/runs/24789695935/job/72543519963?pr=24438 > 🤖	2026-04-22 20:37:06 +01:00
Mathias Fredriksson	1ace519c6e	fix(coderd/x/chatd): remove cache-miss check blocking agent recovery (#24634 ) The cache-miss isAgentUnreachable check added in #24336 runs before dialWithLazyValidation, preventing the existing switch mechanism from discovering the new agent after a workspace rebuild. The chat's stale agent binding is never repaired, causing an infinite loop of 'agent is disconnected' errors. Remove the cache-miss check. The cache-hit check remains (it verifies the agent behind an established connection). The dial timeout and dialWithLazyValidation already bound the cache-miss failure path. Closes CODAGT-248	2026-04-22 21:49:10 +03:00
Cian Johnston	72e3ae9c5f	feat: add chatd tool call error metrics and logging (#24559 ) - Add `coderd_chatd_tool_errors_total` prometheus counter (labels: provider, model, tool_name) - Log tool call errors at warn level with correlation fields: chat_id, owner_id, organization_id, workspace_id, agent_id, parent_chat_id, trigger_message_id, tool_name, tool_call_id, provider, model - Thread enriched logger from chatd.go into chatloop via `RunOptions.Logger` - Remove squashing of all MCP tool calls to the `mcp` bucket > 🤖	2026-04-22 16:19:56 +00:00
Michael Suchacz	7904bed947	fix: fall back to local git watcher for chat diff drawer (#24512 ) The Ctrl+D diff drawer in `coder exp agents` only rendered PR-backed diffs returned by `/api/experimental/chats/{id}/diff`. Local working tree changes in a chat's workspace returned an empty diff, so the drawer showed "No diff contents" with no file summary. Centralise diff loading behind a single `fetchChatDiffContents` helper that first hits `/diff`, then falls back to the chat git watcher WebSocket (`/stream/git`) when the remote diff is empty. Aggregate the agent's `WorkspaceAgentRepoChanges` into a `ChatDiffContents` value so the drawer can derive the file summary and styled body from the local unified diff. Missing workspaces, missing agents, and watcher timeouts are treated as graceful fallbacks that render the empty-diff placeholder instead of a hard error. > Mux is opening this PR on Mike's behalf.	2026-04-22 18:08:02 +02:00
Jeremy Ruppel	c23abc691f	feat: sort AI sessions by last prompt time (#24440 ) Previously, the sessions list sorted by `MIN(started_at)` across interceptions, so sessions with old start times but recent activity would sink to the bottom of the list regardless of how recently they were used. `ListAIBridgeSessions` now sorts by `COALESCE(MAX(prompt.created_at), MIN(started_at)) DESC`, exposed as the non-nullable `last_active_at` field. Sessions with prompts surface by last activity; sessions with no prompts fall back to their start time. The original implementation used two separate columns (`last_active_at` as a nullable prompt timestamp and `sort_at` as the non-nullable cursor key). This revision collapses them into a single `last_active_at` that is always set — simplifying the SQL, the Go conversion, the API type, and the frontend. 🤖 Generated with [Claude Code](https://claude.ai/claude-code) --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-22 12:06:49 -04:00
Marcin Tojek	ec91ac5427	fix: grant AsAIBridged ResourceSystem.ActionCreate for UpsertAISeatState (#24603 ) Related coder/internal#1444	2026-04-22 16:38:57 +02:00
Michael Suchacz	9b5d09ebdc	test(coderd/x/chatd): seed anthropic provider for computer_use tests (#24611 ) `TestSubagentLifecycleToolsIncludePersistedSubagentTypeAcrossVariants/ComputerUse` and two adjacent positive tests passed a static Anthropic key into `newInternalTestServer`, but `seedInternalChatDeps` only inserts an OpenAI provider. At runtime, `Server.resolveUserProviderAPIKeys` calls `chatprovider.PruneDisabledProviderKeys`, which clears `keys.Anthropic` because Anthropic is not in the enabled DB provider set, so the `computer_use` execution path loses its key. Add a focused test helper `seedEnabledAnthropicProvider` and use it only in the positive tests that actually drive a `computer_use` spawn through the runtime key-resolution path (the `computer_use` branch of `TestSubagentLifecycleToolsIncludePersistedSubagentTypeAcrossVariants`, `TestSpawnAgent_ComputerUseUsesComputerUseModelNotParent`, and `TestSpawnAgent_ComputerUseInheritsMCPServerIDs`). `seedInternalChatDeps` stays unchanged, so the negative availability tests continue to model the "Anthropic unavailable" fixture. No production code is modified. Closes https://github.com/coder/internal/issues/1486 > This PR was opened by Mux working on Mike's behalf.	2026-04-22 15:54:17 +02:00
Thomas Kosiewski	b7c2c59931	fix(coderd/x/chatd/chatdebug): allow Anthropic per-modality ratelimit headers (#24592 ) Previously, Anthropic's per-modality, Priority Tier, and fast-mode rate-limit headers (`Anthropic-Ratelimit-Input-Tokens-`, `Anthropic-Ratelimit-Output-Tokens-`, `Anthropic-Priority-Input-Tokens-`, `Anthropic-Priority-Output-Tokens-`, `Anthropic-Fast-Input-Tokens-`, and `Anthropic-Fast-Output-Tokens-`) were shown as `[REDACTED]` in the Debug panel because they contain `"token"` in the name and fell through the generic credential filter. Add them to the allowlist in `coderd/x/chatd/chatdebug/redaction.go` alongside the existing `Anthropic-Ratelimit-Tokens-*` entries so the limits/remaining/reset values surface in the raw response view.	2026-04-22 15:14:31 +02:00
Thomas Kosiewski	26b64fa523	fix(coderd/x/chatd/chatdebug): record SSE attempts on EOF (#24565 ) `chat_turn` debug steps persist with `attempts: []` even when the streaming call to Anthropic completes successfully. Fantasy's Anthropic SSE adapter iterates the response to EOF via `for stream.Next()` and abandons the body without calling `Close()`, so `RecordingTransport`'s Close-only recording path never fires and the attempt is lost. Non-streaming runs (`quickgen`, `title_generation`) go through `model.Generate(...)` and are unaffected. Record on `io.EOF` for `text/event-stream` bodies specifically. Non-SSE responses stay on the Close-only path so JSON integrity, content-length validation, and inner-`Close()` error semantics are preserved. `record()` is already `sync.Once`-guarded, so a later `Close()` is a no-op for recording.	2026-04-22 15:02:02 +02:00
Michael Suchacz	9634739aed	fix: support Bedrock ambient AWS credentials for Agents providers (#24397 ) > This PR was authored by Mux on behalf of Mike. Adds AWS Bedrock ambient credential support to the Agents provider path. Bedrock providers can now be saved without a stored API key and authenticated via the standard AWS SDK credential chain on the Coder server (IAM roles, `AWS_ACCESS_KEY_ID`, etc.). Also fixes missing `Base URL` forwarding for Bedrock. ## Changes Backend runtime (`coderd/x/chatd/chatprovider/chatprovider.go`): - New `ProviderAllowsAmbientCredentials(provider)` helper. Currently returns true only for Bedrock. - `ModelFromConfig` no longer errors on an empty API key when the provider is in the ambient-allowed set AND was explicitly resolved via `ByProvider`. This preserves the policy gate: unresolvable providers (disabled central key, user-key-required without a user key) still error. - `setResolvedProviderAPIKey` internalizes the ambient-credentials contract via `ProviderAllowsAmbientCredentials`, so a resolved-but-keyless Bedrock provider is represented as an empty `ByProvider` entry rather than a post-hoc sentinel patch in the caller. - `WithAPIKey` is only appended when a token is present. - `WithBaseURL(baseURL)` is now forwarded for Bedrock (was previously missing). Backend admin API (`coderd/exp_chats.go`): - `validateChatProviderCentralAPIKey` exempts Bedrock from requiring a stored API key when central credentials are enabled. - AI Gateway separation (`ChatProviderAPIKeysFromDeploymentValues`) is unchanged. No silent reuse of `CODER_AIBRIDGE_BEDROCK_*` flags. Frontend (`site/src/pages/AgentsPage/components/ChatModelAdminPanel/`): - API Key field is optional for Bedrock when central credentials are enabled. - Bedrock-specific descriptions on API Key and Base URL fields (bearer-token vs ambient modes, `AWS_REGION` guidance). - Right-aligned "Clear stored token" action switches an existing Bedrock provider back to ambient mode. 
- `hasEffectiveAPIKey` treats Bedrock with central credentials enabled as configured, so the provider list shows the correct status icon. - Three new stories: `ProviderFormBedrockAmbientCredentials`, `ProviderFormBedrockBearerToken`, `ProviderFormBedrockClearBearerToken`. Docs (`docs/ai-coder/agents/models.md`, `docs/ai-coder/ai-gateway/setup.md`): - New "Configuring AWS Bedrock" section covering both credential modes, region resolution, and the Base URL override. - Explicit note that the `us-east-1` region fallback only applies to bearer-token mode; ambient credentials require a region from the standard AWS SDK chain. - Cross-reference in AI Gateway docs clarifying that `CODER_AIBRIDGE_BEDROCK_*` flags are a separate configuration path from Agents. ## Not in scope - Reusing AI Gateway Bedrock flags as an implicit Agents fallback. - Per-provider AWS access key, secret, or region fields (would need a migration and audit-table review). - IMDS or network-backed credential probes in admin/listing request paths. ## Related Dogfood deployment integration: https://github.com/coder/dogfood/pull/324	2026-04-22 14:20:23 +02:00
Mathias Fredriksson	78d9a220cf	fix(coderd/x/chatd): detect disconnected agents in getWorkspaceConn (#24336 ) Add agent status check and dial timeout to getWorkspaceConn to prevent tool calls from hanging when a workspace agent disconnects. Status check: call isAgentUnreachable on every getWorkspaceConn call. On cache miss, check the freshly fetched agent row. On cache hit, re-fetch the agent row by PK for a fresh heartbeat timestamp. Disconnected and timed-out agents return a sentinel immediately; connecting agents proceed to dial. Dial timeout: wrap dialWithLazyValidation in a 30s context.WithTimeoutCause (matching 8 other server-side AgentConn callers). Parent context cancellation propagates unchanged so the chatloop can detect ErrInterrupted. Both sentinels tell the LLM the agent is unreachable and the workspace may need restarting from the dashboard. Closes CODAGT-149	2026-04-22 12:10:32 +00:00
Cian Johnston	38f5d3f0b2	test: add regression guard for chat title masking (#24584 ) Follow-up to #24564 addressing unresolved review findings. - DEREM-1: Add `Test_diff/Chat/TitleMasked` to `enterprise/audit/diff_internal_test.go` so flipping `title` back to `ActionTrack` fails loudly. Verified: the case passes today, fails with a clear diff after flipping to `ActionTrack`, passes again after reverting. - DEREM-4: Inline comment at `coderd/audit/request.go:138` explaining why `ResourceTarget` for `database.Chat` returns a UUID prefix instead of the title. - DEREM-5: Trailing comment on `enterprise/audit/table.go` `title` entry, matching the surrounding `ActionSecret` comment style. Won't-fix, with rationale (per user): - DEREM-2 (8-char prefix collision risk): `resource_target` is a display hint, not an identifier; the full UUID lives in `resource_id`. - DEREM-3 (named constant for `[:8]`): single call site; extracting would be ceremony. - DEREM-6 (PR title misleading): merged PR title is immutable. - DEREM-7 (historical log redaction): the offending version only shipped to dogfood for a couple of hours and not to customers. > 🤖	2026-04-22 10:52:52 +00:00
Jakub Domeracki	86b2db60b2	fix(coderd): enforce ActionSSH in MCP HTTP agent connection path (#24607 )	2026-04-22 12:34:17 +02:00
Ethan	cc4e04afde	feat(site): display file attachments in chat UI (#24281 ) Renders the durable file attachments introduced in #24280 in the chat interface. Without this, attachments were stored and served correctly but the UI showed raw file parts with no previews or download UX. Every attachment gets a download affordance, split into three rendering tiers: - Images: thumbnail with a hover/focus overlay containing a download link. `onFocusCapture`/`onBlurCapture` with `contains(relatedTarget)` keeps the overlay open while tabbing between the image and its download link. - Text-like files (`text/`, `application/json`): expandable preview button with loading + error-with-retry states and the same download overlay. Preview fetches throw a typed `FetchTextAttachmentError` with a `.status` field instead of a stringly-typed error. - Everything else: compact `FileCard` with extension badge, filename, and download link. User-side and assistant-side rendering now share `AttachmentBlocks.tsx` (`AttachmentPreviewFrame`, `TextAttachmentButton`, `ImageAttachmentButton`, `FileCard`, plus `getAttachmentHref`/`getAttachmentName`) instead of two near-duplicate implementations. The text-attachment overlay anchors to the preview surface so the download button stays pinned even when a loading/error status line widens the row below. `ComputerRenderer` detects when a screenshot was stored as a durable attachment (`attachment_file_id`) and suppresses the stale base64 rendering; the screenshot appears as a proper file part instead. `ToolLabel` shows the attached filename for `attach_file` tool calls. Storybook coverage in `ConversationTimeline.stories.tsx` was expanded to cover every tier (single/multiple images, inline + file-id text, JSON, download-only files, fetch-failure retry, mixed attachments + file references) with play-function assertions. 
Screenshot: https://github.com/user-attachments/assets/27c71081-3502-4e80-92a7-d8adf1ff9323 ## Cleanup Per Mathias' post-merge suggestion on #24280, this PR also relocates `coderd/chatfiles` → `coderd/x/chatfiles` so the durable-attachment helpers live beside the rest of the `chatd` experimental surface. Closes CODAGT-91	2026-04-22 20:11:53 +10:00
Ethan	ad1906589d	fix(coderd): allow deleting chat providers used in historical chats (#24568 ) Drop the `chat_model_configs.provider -> chat_providers.provider` foreign key and soft-delete model configs when their provider is removed. The provider row is now hard-deleted inside a transaction that also tombstones its model configs and promotes a replacement default when needed. Historical chats and messages keep pointing at the soft-deleted model config rows, which are hidden from live/admin queries but still resolve for read. The runtime chat path already falls back to the default model config when a soft-deleted config is looked up. Replaces the lost FK validation in the create/update model-config handlers with an explicit provider lookup that returns the existing `Chat provider is not configured.` 400. ## UX Admin deleting a chat provider that has historical usage - Before: blocked with 400 `Provider models are still referenced by existing chats.` Admins had no in-product way to remove a provider that had ever been used. - After: delete succeeds (204). Any model configs under that provider are soft-deleted. If the removed provider owned the default model config, one of the remaining live configs is auto-promoted to the new default. The promotion is deterministic (`ensureDefaultChatModelConfig` picks the first live config by `provider ASC, model ASC, updated_at DESC, id DESC`); there is no picker, and no toast or response detail names which config became the new default. End users with chats that used a deleted provider's model - Old chats still open and their history still renders unchanged. - Sending a new turn in such a chat silently falls back to the current default model. No banner or warning tells the user the original model is gone. - The model picker no longer lists the deleted model. - If no default model config exists at all after the delete, sending a new turn fails with `no default chat model config is available`. 
Admin creating or updating a model config against a provider that is not configured - Same as before: 400 `Chat provider is not configured.` Only the detection mechanism changed (explicit `FOR UPDATE` lookup inside the transaction, which also serializes against a concurrent provider delete). Admin updating a model config whose row disappears mid-transaction - Now returns the standard 404 `Resource not found or you do not have access to this resource` instead of the previous 500 that leaked `sql: no rows in result set` in the detail. Unrelated internal races (for example a race on the promoted default candidate) are still reported as 500 so they are not misclassified as "your target is gone". Closes CODAGT-23	2026-04-22 19:34:34 +10:00
Cian Johnston	360e119b43	fix(coderd): use waitChatSettled in remaining title tests (#24585 ) - Replace inline `require.Eventually` blocks in `PreservesUpdatedAt` and `NoOpWhenTitleUnchanged` with the shared `waitChatSettled` helper - These were the last two title subtests still using direct DB polling instead of the API-based helper > 🤖	2026-04-22 09:14:25 +01:00
Ethan	353e522614	fix: handle expired chat file attachments in replay and UI (#24518 ) Closes CODAGT-216 ## Problem `dbpurge` deletes `chat_files` rows after the deployment's configured retention window, but `chat_messages.content` can still contain `file_id` references to those files. On replay, that left the Anthropic provider with an empty file payload and a `400 image cannot be empty` error. In the UI, the same missing file showed up as a broken image. ## Fix - Backend: when replay hits a `file_id` whose bytes are gone, replace it with a short text placeholder instead of emitting an empty file part. We could also drop the missing attachment entirely, but that would silently remove context from the replay and make the conversation harder for the model to interpret. The placeholder keeps the request valid while still telling the model that a file used to be there and is no longer available. - Frontend: classify chat image failures instead of treating every broken image the same. - `404` file fetches render `Image expired`, with a tooltip explaining that chat attachments are deleted after the retention window set for the deployment. - Other remote failures render `Image failed to load`, with a tooltip that surfaces server/network detail when available. - Invalid inline image data still renders `Image failed to load` without a probe.	2026-04-22 14:10:51 +10:00
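The backend side of the fix can be sketched as a pass over message parts that swaps unresolved file references for a text placeholder instead of emitting an empty file. The `part` type, `resolveParts`, and the placeholder wording are hypothetical:

```go
package main

import "fmt"

// part is a hypothetical, simplified message part; not Coder's type.
type part struct {
	Type   string // "file" or "text"
	FileID string
	Data   []byte
	Text   string
}

// resolveParts loads file bytes via lookup and replaces any file whose bytes
// are gone (e.g. purged after the retention window) with a short text
// placeholder, so the replayed request never carries an empty file payload.
func resolveParts(parts []part, lookup func(id string) ([]byte, bool)) []part {
	out := make([]part, 0, len(parts))
	for _, p := range parts {
		if p.Type != "file" || p.FileID == "" {
			out = append(out, p)
			continue
		}
		data, ok := lookup(p.FileID)
		if !ok || len(data) == 0 {
			// Keep a trace of the attachment so the model still knows a
			// file was there, rather than silently dropping it.
			out = append(out, part{
				Type: "text",
				Text: fmt.Sprintf("[attachment %s is no longer available]", p.FileID),
			})
			continue
		}
		p.Data = data
		out = append(out, p)
	}
	return out
}
```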
blinkagent[bot]	79a9f437d7	feat(coderd/x/chatd/chattool): add description tags to tool parameter structs (#24394 )	2026-04-21 11:37:29 -07:00
Jaayden Halko	148e56b5d9	fix(coderd): fix TestPatchChat/Title flake by waiting for chat to settle (#24572 ) ## Problem `TestPatchChat/Title/Rename` and `TestPatchChat/Title/TrimsWhitespace` fail intermittently on `test-go-pg` with: ``` PATCH .../api/experimental/chats/<id>: unexpected status code 409: Title regeneration already in progress for this chat. ``` `createChat` persists a chat with `ChatStatusPending` and signals the daemon wake loop. If the `UpdateChat` PATCH arrives before the daemon transitions the chat past `Pending`/`Running`, the handler's `acquireManualTitleLock` returns a 409. Whether the PATCH wins the race is timing-dependent under PG + `-parallel` load. Sibling subtests `PreservesUpdatedAt` and `NoOpWhenTitleUnchanged` already wait for the chat to leave `Pending`/`Running` before renaming, which is why they do not flake. ## Fix Add a `waitChatSettled` helper closure in `TestPatchChat` that polls `client.GetChat` until the chat status leaves `Pending`/`Running`. Call it in the 4 subtests that issue a valid rename immediately after `createChat`: - `Title/Rename` (originally reported flake) - `Title/TrimsWhitespace` (originally reported flake) - `Title/LengthBoundaries` (latent flake in valid-rename cases) - `Title/PublishesWatchEvent` (latent flake, goroutine silently 409s) No handler, daemon, or SDK changes. The 409 is intentional production behavior; this is a pure test-side timing fix. Refs coder/internal#1480	2026-04-21 17:10:00 +01:00
Ethan	c1421b4ead	test(coderd/x/chatd): deflake stale control notification test (#24545 ) Previously, `TestProcessChat_IgnoresStaleControlNotification` could return as soon as `UpdateChatStatus` ran, even though `processChat` still re-read chat state and finished deferred cleanup afterward. That let gomock and quartz teardown race the tail of cleanup and intermittently fail the test. Wait for `processChat` itself to return before asserting the final status, while keeping the existing strict mock expectations intact. Closes https://github.com/coder/internal/issues/1479	2026-04-22 00:08:34 +10:00
Ethan	2295e9d5be	feat: surface upstream provider error details in chat callout (#24546 ) Anthropic HTTP 400 responses (e.g. "image exceeds 5 MB maximum") were collapsed in the chat UI to the generic headline "Anthropic returned an unexpected error (HTTP 400)." with no actionable detail: the upstream message survived to the processor log but was dropped before reaching the client. Add a new optional `Detail` field on `codersdk.ChatStreamError` that carries the upstream provider message alongside the existing normalized headline. The backend extracts `error.message` from `fantasy.ProviderError.ResponseBody` (the JSON envelope shared by Anthropic and OpenAI), falls back to the trimmed provider message when the body is absent or unparseable, and caps the result at 500 runes. The frontend threads `Detail` through `useChatStore`, `liveStatusModel`, and `ChatStatusCallout`, rendering it as a muted secondary line inside the existing `AlertDescription`. Before (screenshot): https://github.com/user-attachments/assets/524b588e-3cee-4fad-bc15-6bf3aec0899d After (screenshot): https://github.com/user-attachments/assets/eae82a89-3ac1-4a33-8d18-ef9f77263d89 ## Persistence `Detail` is not persisted; it disappears on refresh. Persisting it would require a DB change (today `chats.last_error` is a single nullable `TEXT` column), and the shape of persisted chat errors is worth a more deliberate rethink, e.g. promoting `last_error` to `JSONB` so we can also retain structured fields like `kind`, `statusCode`, `provider`, and `retryable` instead of only the normalized headline string. That's a bigger design discussion than this PR should carry. In the meantime, seeing the upstream error reason immediately on failure is already a large UX improvement over the status quo, and this PR gets us there without prejudicing the eventual persistence design. Tracking persistence in CODAGT-239. 
Closes CODAGT-235	2026-04-22 00:05:27 +10:00
Cian Johnston	4d45b69b03	fix: stop tracking chat title in audit logs (#24564 ) Chat titles can contain sensitive information (secrets, internal project names, etc.) and should not be visible in audit logs. - Use truncated chat UUID (first 8 chars) as `resource_target` instead of the title - Mark the `title` field as `ActionSecret` so diffs render as `••••••••` Implementation notes: Two changes: 1. `coderd/audit/request.go`: `ResourceTarget` for Chat returns `typed.ID.String()[:8]` instead of `typed.Title` 2. `enterprise/audit/table.go`: Chat `title` field tracking changed from `ActionTrack` to `ActionSecret` No frontend changes needed. The frontend already handles `secret: true` fields. > 🤖	2026-04-21 14:26:22 +01:00
Michael Suchacz	f073323c89	refactor: unify subagent spawn behind spawn_subagent (#24535 ) Unify the three subagent spawn tools (`spawn_agent`, `spawn_explore_agent`, `spawn_computer_use_agent`) behind a single `spawn_subagent` tool keyed by a `subagent_type` discriminant (`general`, `explore`, `computer_use`). Mirrors the single-entry-point pattern already used by `task` in mux while keeping `wait_agent`, `message_agent`, and `close_agent` as separate lifecycle tools. A new backend subagent definition catalog (`coderd/x/chatd/subagent_catalog.go`) is the source of truth for tool description, prompt guidance, availability rules (plan mode, desktop/Anthropic gating), and child-chat option building. `spawn_subagent` advertises only the types available in the current context and validates `subagent_type` server-side; context inheritance still flows through the existing `createChildSubagentChatWithOptions` path. `wait_agent`, `message_agent`, and `close_agent` responses now include a server-derived `subagent_type` so the UI stops inferring lifecycle state from tool names. The frontend gets a shared normalization helper (`site/src/pages/AgentsPage/components/ChatElements/tools/subagentDescriptor.ts`) that maps either legacy tool names or new `spawn_subagent` args into a common descriptor (action, variant, icon, fallback copy). Legacy transcripts still render identically; `Tool.tsx`, `SubagentTool.tsx`, `ToolLabel.tsx`, `ToolIcon.tsx`, and `messageParsing.ts` now key off the descriptor instead of hard-coded names. Existing UI copy is preserved (`Spawning Explore agent...`, `Using the computer...`, computer-use monitor icon and Open Desktop affordance). > This PR was opened by Mux working on Mike's behalf.	2026-04-21 14:01:32 +02:00
Michael Suchacz	cb67e71835	fix(coderd/database): renumber duplicate MCP migration (#24552 ) ## Summary - rename the `allow_in_plan_mode` migration pair from `000472` to `000473` - rename the matching fixture file and update its comment - remove the duplicate migration version that broke containerized database startup ## Testing - `go test ./coderd/database/migrations -run '^TestMigrate$' -count=1 -timeout 15m` - validated `iofs.New` for `coderd/database/migrations` and `coderd/database/migrations/testdata/fixtures` Closes coder/internal#1483 > Mux opened this PR on Mike's behalf.	2026-04-21 11:10:17 +00:00
Michael Suchacz	9d0469fc4c	feat: allow approved external MCP tools in root plan mode (#24509 ) ## Summary Allow root plan-mode chats to use MCP tools from external servers that an admin has explicitly approved for plan mode. Workspace MCP and plan-mode subagents remain blocked. ## Problem `chatd.go` excluded every MCP tool when `isPlanModeTurn` was true, so planning had no access to tools like docs search, ticketing, etc. Lifting that guard wholesale was unsafe: `mcp_server_configs` already has centralized admin governance, but workspace-local MCP (discovered from agent `.mcp.json`) does not, and subagents use a narrower trust boundary. ## Fix Add an admin-controlled per-server `allow_in_plan_mode` flag (default `false`) and gate plan-mode MCP access on it. ### Backend / schema - New migration `000472_mcp_server_allow_in_plan_mode.{up,down}.sql` and matching fixture update. - `mcpserverconfigs.sql` + generated code: persist and read the new column. - `codersdk/mcp.go`: thread the field through `MCPServerConfig`, `Create`, and `Update` request types. - `coderd/mcp.go`: validate, persist, and return the flag in get/list/create/update handlers. ### chatd - `coderd/x/chatd/chatd.go`: pre-filter selected external MCP configs by `AllowInPlanMode` before calling `mcpclient.ConnectAll` on plan-mode root turns. Workspace MCP discovery is skipped entirely on plan-mode turns. - Single helper decides whether a tool is available in plan mode, used both at construction and for active-tool filtering (defense in depth). Plan-mode subagents, dynamic tools, provider-native tools, computer-use, and workspace MCP stay unchanged. - `coderd/x/chatd/prompt.go`: update the root plan-mode overlay text to match the new boundary. ### UI - `MCPServerAdminPanel.tsx`: add an explicit toggle ("Allow all tools from this MCP server in root plan mode") next to the existing governance controls. - Regenerated `site/src/api/typesGenerated.ts`. 
### Docs - `docs/ai-coder/agents/architecture.md`: replace the blanket "MCP is unavailable in plan mode" note with the new root-only, external-only, admin-approved policy. Explicitly call out that workspace MCP and plan-mode subagents are still excluded. ### Tests - Plan-mode visibility (approved vs non-approved external server). - Plan-mode invocation of an approved external MCP tool. - End-to-end plan-mode workflow that uses an approved MCP tool and then reaches `propose_plan`. - Regressions: workspace MCP still excluded in plan mode; plan-mode subagents still on the restricted tool boundary; existing tool allow/deny list filtering still applies. ## Policy precedence `allow_in_plan_mode` is an additional requirement on top of existing `enabled`, availability, chat-selected / forced server IDs, and tool allow/deny lists. It approves all tools on that server for root plan mode; a per-tool plan allowlist is deliberately deferred. ## Follow-ups (explicitly out of scope) - Whether plan-mode subagents should inherit approved external MCP tools. - Workspace-local MCP safety model (agent-side `.mcp.json` schema vs. a coderd-managed workspace MCP config). ## Validation - `go vet ./coderd/x/chatd/...` - `go test ./coderd/x/chatd -run 'TestPlan.|TestMCP.' -count=1` - `go test ./coderd/x/chatd -count=1 -timeout 5m` (full chatd suite) - `make fmt` (no diff) > Mux opened this PR on Mike's behalf.	2026-04-21 12:26:12 +02:00
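A rough sketch of the plan-mode gating described in #24509, using simplified types. `mcpServerConfig`, `externalServersForTurn`, and `mcpToolAllowedInPlanMode` are hypothetical names, not chatd's real API; availability checks and tool allow/deny lists are omitted. It shows the two layers the commit mentions: pre-filtering external servers by `AllowInPlanMode` before connecting, and a per-tool check reused for active-tool filtering.

```go
package main

import "fmt"

// mcpServerConfig is a hypothetical subset of an external MCP server config.
type mcpServerConfig struct {
	ID              string
	Enabled         bool
	AllowInPlanMode bool
}

// externalServersForTurn keeps enabled, chat-selected servers and, on a root
// plan-mode turn, additionally requires AllowInPlanMode.
func externalServersForTurn(configs []mcpServerConfig, selected map[string]bool, planMode bool) []mcpServerConfig {
	var out []mcpServerConfig
	for _, c := range configs {
		if !c.Enabled || !selected[c.ID] {
			continue
		}
		if planMode && !c.AllowInPlanMode {
			continue // admin has not approved this server for plan mode
		}
		out = append(out, c)
	}
	return out
}

// mcpToolAllowedInPlanMode is the per-tool check (defense in depth): workspace
// MCP and plan-mode subagents stay excluded; external tools need approval.
func mcpToolAllowedInPlanMode(fromWorkspace, isSubagent bool, server mcpServerConfig) bool {
	if fromWorkspace || isSubagent {
		return false
	}
	return server.AllowInPlanMode
}

func main() {
	configs := []mcpServerConfig{
		{ID: "docs", Enabled: true, AllowInPlanMode: true},
		{ID: "tickets", Enabled: true},
		{ID: "disabled", Enabled: false, AllowInPlanMode: true},
	}
	selected := map[string]bool{"docs": true, "tickets": true, "disabled": true}
	for _, c := range externalServersForTurn(configs, selected, true) {
		fmt.Println(c.ID)
	}
}
```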
Cian Johnston	c968a1f3a3	feat: make database.Chat auditable (#24485 ) Wire database.Chat into the audit system so chat lifecycle events (creation, patches, etc.) produce audit log entries. Part of CODAGT-200. > 🤖	2026-04-21 11:11:56 +01:00
Cian Johnston	5f3effd839	fix(coderd/x/chatd): add chattest.OpenAI() default fake server (#24540 ) - Add `chattest.OpenAI(t)` convenience wrapper around `NewOpenAI` with sensible defaults (JSON title response for non-streaming, text chunk for streaming) - Update `seedChatDependencies` to use it instead of an empty base URL, preventing title generation from hitting real `api.openai.com` with a fake key: ``` t.go:111: 2026-04-20 19:23:31.885 [debu] coderd.chatd.processor: title model candidate failed chat_id=edb43454-f23d-4163-9974-d101b8091de6 chat_id=edb43454-f23d-4163-9974-d101b8091de6 ... error= generate structured title: github.com/coder/coder/v2/coderd/x/chatd.generateStructuredTitleWithUsage /home/coder/src/coder/coder/coderd/x/chatd/quickgen.go:443 - unauthorized: Incorrect API key provided: test-api-key. You can find your API key at https://platform.openai.com/account/api-keys. ``` > 🤖	2026-04-21 10:26:20 +01:00
Ethan	181e103201	fix: reuse shared tailnet for coderd-hosted MCP workspace tools (#24460 ) ## Problem Coderd can expose an MCP server at `/api/experimental/mcp/http` (we have this enabled on dogfood). Its workspace tools dialed agents through a per-call client-side tailnet stack. Every tool call re-created a WireGuard device, netstack, magicsock + UDP sockets, DERP connection, coordinator websocket, and their goroutines — in a process that already runs a long-lived shared tailnet. The duplicate stacks drove up resource usage under load. ## Fix Route this server's tool calls through the existing shared tailnet, so none of those transports are reconstructed per call. Closing an `AgentConn` now releases a tunnel reference instead of tearing down a transport. ## Potential follow-up `coder exp mcp server` still builds a fresh tailnet per call. It pays per-call latency and causes coordinator/DERP churn. A shared CLI tailnet is more involved — unlike coderd, the CLI has no existing shared tailnet to reuse, so it would need a new long-lived client-side tailnet with reconnect, sleep/wake, and idle-destination handling. There's less motivation to optimize this, given the client-side MCP does not compete for resources with coderd. Closes CODAGT-199 > Generated by mux, but reviewed by a human	2026-04-21 11:37:10 +10:00
Ethan	1203f625b7	feat(coderd): accept parameters in start_workspace tool (#24434 ) When the chat `start_workspace` tool triggers an active-version upgrade that introduces new required parameters, the build fails with a parameter validation error. Previously this returned a message telling the user to update from the UI — a dead end for the model. This PR lets the model recover inside the chat by: 1. Accepting an optional `parameters` map on `start_workspace` (same schema as `create_workspace`), forwarded as `RichParameterValues`. 2. Returning structured JSON error responses that preserve validation details and the workspace's `template_id`, so the model can call `read_template` to discover what changed. 3. Replacing the UI-only guidance in `exp_chats.go` with model-actionable retry instructions. The expected model flow on an active-version parameter failure is now: ``` start_workspace → fails (structured error with template_id + validations) read_template → discovers new required parameters start_workspace → retries with parameters map → workspace starts ```	2026-04-21 11:36:20 +10:00
Jakub Domeracki	411ed21059	fix(coderd): omit frame-ancestors CSP for embed routes (#24529 )	2026-04-20 15:38:52 +02:00
Jaayden Halko	410f9a5e19	feat: allow renaming of agent chat title (#24489 ) Co-authored-by: Coder Agents <noreply@coder.com>	2026-04-20 14:00:46 +01:00
Thomas Kosiewski	18a30a7a10	feat: add chat debug HTTP handlers and API docs (#23918 )	2026-04-20 13:34:41 +02:00
Dean Sheather	ea00d2d396	fix(coderd): enforce workspace authz on watchChatGit (#24477 ) `watchChatGit` proxies a live websocket to the workspace agent's git watcher (`/api/v0/git/watch`), streaming repository diffs back through the chat stream. Before this change it only enforced `chat:read` (via `ExtractChatParam`) plus an implicit `workspace:read` from the dbauthz wrapper on `GetWorkspaceAgentsInLatestBuildByWorkspaceID`. The sibling `watchChatDesktop` handler already fetches the workspace and requires `policy.ActionApplicationConnect` or `policy.ActionSSH` before dialing. Built-in roles like Template Admin and Org Admin grant `workspace:read` without SSH/ApplicationConnect, and Owner also loses both under `DisableOwnerWorkspaceExec`. A chat owner whose exec-level workspace access was revoked after the chat was bound could therefore keep streaming repository content from the workspace agent through the chat's git-watch endpoint. Mirror `watchChatDesktop`: fetch the workspace and require `ApplicationConnect || SSH` before any agent-tunnel activity. Adds one real-coderdtest regression test (`TestWatchChatGitAuthz`) that demotes the chat's owner to template-admin after binding and asserts the git-watch endpoint returns 403; the mock-based `TestWatchChatGit` in `coderd/workspaceagents_internal_test.go` continues to cover the no-workspace / disconnected-agent / websocket-proxy paths. Fixes CODAGT-184.	2026-04-20 21:33:35 +10:00
Jakub Domeracki	615be176b8	fix(coderd): add frame-ancestors CSP directive to prevent clickjacking (#24474 )	2026-04-20 13:01:46 +02:00
Mathias Fredriksson	467430d8fa	fix: sort child chats newest-first and prepend on creation (#24524 ) GetChildChatsByParentIDs sorted created_at ASC, but the cache helper appended new children to the end. On refetch the API and cache agreed on oldest-first, putting the just-created child at the bottom. Users expect newest first, matching the root-chat sidebar convention. - SQL: change child sort to created_at DESC, id DESC. - Cache: prepend instead of append in addChildToParentInCache (renamed from appendChildToParentInCache to avoid leaking position semantics). - Test: update ordering assertion to expect newest-first. Refs #24404	2026-04-20 10:43:31 +00:00
Thomas Kosiewski	df7e838c21	feat(coderd): wire debug logging into chat lifecycle (#23917 )	2026-04-20 12:27:16 +02:00
Mathias Fredriksson	fc2493780f	fix: exclude subagent chats from sidebar pagination (#24404 ) GetChats now returns only root chats (parent_chat_id IS NULL). A new GetChildChatsByParentIDs query fetches children for visible roots and embeds them in each parent's Children field. The singular getChat endpoint does the same. Archive invariant is one-way: parent archived implies child archived. Parent archive/unarchive cascades via root_chat_id. Individual child archive is permitted; child unarchive while the parent is archived is rejected atomically (row lock on child, re-read parent inside the transaction). Embedded children are filtered by the caller's archive state so individually-archived children stay hidden from active-parent views. Gitsync MarkStale uses GetChatsByWorkspaceIDs directly; MarkStaleParams.OwnerID removed (dead after the switch). Frontend: buildChatTree reads from the embedded children field, WebSocket handlers route child events into the parent's children array, and archiving a child strips it from the parent cache.	2026-04-20 13:19:59 +03:00
Cian Johnston	df429b7f60	fix: classify HTTP/2 transport failures as retryable timeouts (#24502 ) Modifies chatloop error classification behaviour to treat the following as retryable: * HTTP/2 `force closed` * GOAWAY * use of closed network connection. Also modifies the user-facing retry banner to show "<provider> is temporarily unavailable." Relates to CODAGT-212. > 🤖	2026-04-20 11:09:47 +01:00
Ethan	ef6969dd70	feat(coderd/x/chatd): agent-created file attachments in chat (#24280 ) Agents can already see workspace files and take screenshots, but users could not download those artifacts from chat. This PR adds durable chat attachments to chatd. `attach_file`, explicit `computer` screenshot actions (not the automatic post-action screenshots), and `propose_plan` now fetch bytes over the agent connection, store them in `chat_files`, link them to the chat, and carry attachment metadata in tool responses so `buildAssistantPartsForPersist` can materialize ordinary `type:"file"` assistant parts that the chat file APIs serve. The same storage helpers are reused for other artifact-producing paths. `wait_agent` recordings and thumbnails are stored as chat files and linked back to the parent chat, with best-effort relinking so parent chats retain those artifacts without leaving orphaned rows when chat-file caps reject links. `storeChatAttachment` wraps insert + link in one transaction, files are capped at 10 MB each and 20 per chat, and serving defaults to `Content-Disposition: attachment` with an explicit inline-safe allowlist. This PR also consolidates chat-file media policy in `coderd/chatfiles`. Uploads and tool-generated attachments share byte-based MIME detection, SVG blocking, inline-safety rules, and compatible `text/plain` refinement for JSON, CSV, and Markdown. Prompt construction still only inlines synthetic pasted text for model consumption; assistant-created attachments are persisted for the user and intentionally not replayed into later LLM turns. UI follow-up lives in #24281. Relates to CODAGT-91	2026-04-20 18:04:35 +10:00
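A sketch of the storage caps and serving default described in #24280. The names are hypothetical, treating "10 MB" as 10 MiB is an assumption, and the inline-safe allowlist below is illustrative only (the real policy lives in `coderd/chatfiles`). `mime.FormatMediaType` from the standard library builds the `Content-Disposition` value and quotes or encodes the filename when needed.

```go
package main

import (
	"errors"
	"fmt"
	"mime"
)

// Limits from the commit message; treating "10 MB" as 10 MiB is an assumption.
const (
	maxAttachmentBytes    = 10 << 20
	maxAttachmentsPerChat = 20
)

// inlineSafeTypes is a hypothetical allowlist, not the real coderd/chatfiles list.
var inlineSafeTypes = map[string]bool{
	"image/png":  true,
	"image/jpeg": true,
	"image/gif":  true,
	"image/webp": true,
	"text/plain": true,
}

var (
	errAttachmentTooLarge = errors.New("attachment exceeds per-file size cap")
	errTooManyAttachments = errors.New("chat already has the maximum number of attachments")
)

// checkAttachmentLimits enforces the per-file and per-chat caps before storing.
func checkAttachmentLimits(size int64, existing int) error {
	if size > maxAttachmentBytes {
		return errAttachmentTooLarge
	}
	if existing >= maxAttachmentsPerChat {
		return errTooManyAttachments
	}
	return nil
}

// contentDisposition defaults to "attachment" and only uses "inline" for
// allowlisted media types.
func contentDisposition(mediaType, filename string) string {
	disposition := "attachment"
	if inlineSafeTypes[mediaType] {
		disposition = "inline"
	}
	return mime.FormatMediaType(disposition, map[string]string{"filename": filename})
}

func main() {
	fmt.Println(checkAttachmentLimits(11<<20, 0))
	fmt.Println(contentDisposition("image/svg+xml", "diagram.svg"))
	fmt.Println(contentDisposition("image/png", "screenshot.png"))
}
```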
Mathias Fredriksson	6b0bb02e5d	fix: server-side diffs and stricter fuzzy splicing for edit_files (#24454 ) Fixes three classes of edit_files bugs and adds structured per-file diff output for tool callers: - New IncludeDiff flag on FileEditRequest; when set, the agent returns FileEditResponse.Files[]{Path, Diff} with unified diffs computed via go-udiff v0.4.1 Lines + ToUnified (not Unified, which calls log.Fatalf on internal error). - Fuzzy match comparators split each line into leading whitespace, body, trailing whitespace, and ending. The splice substitutes at each position: on agreement between search and replace the file's bytes win; on disagreement the replacement's bytes are spliced verbatim. Carve-outs for empty-body lines, multi-line EOF splices, and level-aware indent translation for inserted lines. - Indent-unit detection (GCD for spaces, tab-priority) lets a 4sp LLM search insert correctly into tab or 2sp files. Falls back to the previous cLead-inheritance path when units can't be detected cleanly. - Empty search is rejected with "search string must not be empty". - Duplicate file paths in one request are rejected; symlink aliases resolved via api.resolvePath before the dedup check. - Frontend EditFilesRenderer consumes the structured files array by explicit path (no label munging) with per-file synthetic fallback for older agents or mismatched paths. On error, no diff is rendered so the synthetic fallback doesn't misrepresent a rejected edit as applied. Breaking change: AgentConn.EditFiles changes from (ctx, req) error to (ctx, req) (FileEditResponse, error) in codersdk/workspacesdk. Source-breaking for external Go consumers; no compat shim per plan owner. Out of scope (tracked in CODAGT-214): level-aware indent for middle-substituted splice lines. Locked in TestEditFiles_FuzzyIndent_InsertionLevelAware's Lock_* cases plus TestEditFiles_ReplaceAll_FuzzyIndentGap.	2026-04-18 16:39:34 +03:00
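A rough sketch of indent-unit detection as summarized in #24454 ("GCD for spaces, tab-priority"), plus translating a search string's 4-space levels into the file's unit. Both functions are hypothetical; reading "tab-priority" as "any tab-indented line means tabs" is an assumption, and the real splice logic handles many more cases.

```go
package main

import (
	"fmt"
	"strings"
)

// indentUnit (hypothetical) guesses a file's indentation unit: "\t" if any
// line is tab-indented, otherwise the GCD of leading-space widths. ok is false
// when no unit can be detected.
func indentUnit(lines []string) (unit string, ok bool) {
	g := 0
	for _, line := range lines {
		if strings.TrimSpace(line) == "" {
			continue // blank lines carry no indentation signal
		}
		if strings.HasPrefix(line, "\t") {
			return "\t", true // tabs take priority over spaces
		}
		n := len(line) - len(strings.TrimLeft(line, " "))
		if n > 0 {
			g = gcd(g, n)
		}
	}
	if g == 0 {
		return "", false
	}
	return strings.Repeat(" ", g), true
}

func gcd(a, b int) int {
	for b != 0 {
		a, b = b, a%b
	}
	return a
}

// translateIndent re-expresses line's leading indentation, written in fromUnit,
// using toUnit at the same nesting level.
func translateIndent(line, fromUnit, toUnit string) string {
	if fromUnit == "" {
		return line
	}
	level := 0
	rest := line
	for strings.HasPrefix(rest, fromUnit) {
		level++
		rest = rest[len(fromUnit):]
	}
	return strings.Repeat(toUnit, level) + rest
}

func main() {
	file := []string{"func f() {", "\tif x {", "\t\treturn", "\t}", "}"}
	unit, _ := indentUnit(file)
	fmt.Printf("%q\n", translateIndent("        return y", "    ", unit))
}
```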
Mathias Fredriksson	2a1984f0e8	fix(coderd/externalauth): save refreshed token before validation (#24332 ) GitHub rotates refresh tokens on use, invalidating the old token immediately. If post-refresh validation fails (e.g. rate-limited 403 from /user), the new token was silently discarded because the DB save only happened after successful validation. The next refresh attempt would use the stale refresh token, fail permanently, and destroy the token. Move the UpdateExternalAuthLink call to immediately after TokenSource.Token() succeeds. The post-validation save block is removed (dead code after the early save). The DB write uses a detached context (context.WithoutCancel) so a canceled request cannot prevent persistence of the already-consumed refresh token.	2026-04-18 14:28:29 +03:00
Spike Curtis	2ea27e897b	chore: split Pubsub interface into Publisher and Subscriber (#24442 ) Splits the Pubsub into Publisher and Subscriber interfaces. Allows components to scope down their needs if they only publish or only subscribe. This allows smaller fakes/mocks and generally better encapsulation.	2026-04-17 22:58:33 -04:00
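A simplified sketch of the interface split in #24442 using Go interface embedding. The method signatures here are assumptions modeled loosely on a typical pubsub and may not match coder's exact definitions; `recordingPublisher` and `notifyChatUpdated` are hypothetical. It shows the payoff the commit describes: a component that only publishes can take a `Publisher`, and its test fake only needs one method.

```go
package main

import (
	"context"
	"fmt"
)

// Listener is a simplified callback type; coder's real signatures may differ.
type Listener func(ctx context.Context, message []byte)

// Publisher is the narrow interface for components that only send events.
type Publisher interface {
	Publish(event string, message []byte) error
}

// Subscriber is the narrow interface for components that only receive events.
type Subscriber interface {
	Subscribe(event string, listener Listener) (cancel func(), err error)
}

// Pubsub keeps the combined surface by embedding both halves.
type Pubsub interface {
	Publisher
	Subscriber
}

// recordingPublisher is the kind of small fake the split enables: it only
// implements Publish, not the full Pubsub.
type recordingPublisher struct {
	events []string
}

func (r *recordingPublisher) Publish(event string, _ []byte) error {
	r.events = append(r.events, event)
	return nil
}

// notifyChatUpdated (hypothetical) depends only on Publisher.
func notifyChatUpdated(p Publisher, chatID string) error {
	return p.Publish("chat:"+chatID, []byte("updated"))
}

func main() {
	fake := &recordingPublisher{}
	if err := notifyChatUpdated(fake, "1234"); err != nil {
		fmt.Println("publish failed:", err)
	}
	fmt.Println(fake.events)
}
```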
Spike Curtis	e19b21b7d5	chore: add GetLatestWorkspaceBuildWithStatusByWorkspaceID query (#24441 ) Relates to GRU-18. Adds a new database query supporting the Agent Connection Watch we will add.	2026-04-17 22:47:08 -04:00
Zach	72f35e1cd3	feat: runtime user secrets injection into workspaces (#24313 ) Injects user secrets into workspace agents at runtime via the agent manifest. Secrets with an environment variable name are set as environment variables in every agent session and startup script. Secrets with a file path are written to disk before startup scripts run. - Fetch user secrets in GetManifest and convert to proto - Defensively strip secrets from manifests received by the agent to avoid accidental leakage - Add WorkspaceSecret type and proto conversion helpers to agentsdk - Write secret files eagerly on manifest fetch (0600 perms, 0700 dirs) - Inject secret env vars per-session in updateCommandEnv - Expand `~/` paths using caller-resolved home directory - Log file write errors without blocking workspace startup	2026-04-17 16:55:24 -06:00
Cian Johnston	3f6b40a833	fix: reap idle chatd stream states on a timer (#24476 ) * Adds `streamJanitorLoop` to clean up stale streams every 30s * Zeroes dropped slots so they become eligible for garbage collection * Adds regression tests in coderd/x/chatd and enterprise/coderd/x/chatd > 🤖	2026-04-17 19:22:00 +01:00
Cian Johnston	4b585465b8	feat: label chatd metrics by model, add stream-state diagnostics (#24475 ) Adds production-observability metrics to coderd/x/chatd/ for model-level correlation and a chatStreams memory-leak investigation. - Label per-request chatd metrics (steps_total, message_count, prompt_size_bytes, tool_result_size_bytes, ttft_seconds, compaction_total) with `model` and enrich the per-turn logger with provider/model. - Add `coderd_chatd_stream_retries_total{provider, model, kind}` counter incremented in chatloop before OnRetry. - Register a prometheus.Collector exposing `streams_active`, `stream_buffer_size_max`, `stream_buffer_events`, `stream_subscribers` from p.chatStreams. - Add `coderd_chatd_stream_buffer_dropped_total` counter, incremented per publishToStream drop independently of the existing log-rate-limited bufferDropCount. - Snapshot logger/model before the title-generation goroutine to avoid a data race with the logger/model rebind below it. > 🤖	2026-04-17 16:16:30 +01:00
Thomas Kosiewski	91f9de27a1	feat(coderd): add chat debug service and summary aggregation (#23916 )	2026-04-17 16:27:53 +02:00


3688 Commits