coder

mirror of https://github.com/coder/coder.git synced 2026-06-03 21:18:24 +00:00

Author	SHA1	Message	Date
Jeremy Ruppel	c23abc691f	feat: sort AI sessions by last prompt time (#24440 ) Previously, the sessions list sorted by `MIN(started_at)` across interceptions, so sessions with old start times but recent activity would sink to the bottom of the list regardless of how recently they were used. `ListAIBridgeSessions` now sorts by `COALESCE(MAX(prompt.created_at), MIN(started_at)) DESC`, exposed as the non-nullable `last_active_at` field. Sessions with prompts surface by last activity; sessions with no prompts fall back to their start time. The original implementation used two separate columns (`last_active_at` as a nullable prompt timestamp and `sort_at` as the non-nullable cursor key). This revision collapses them into a single `last_active_at` that is always set — simplifying the SQL, the Go conversion, the API type, and the frontend. 🤖 Generated with [Claude Code](https://claude.ai/claude-code) --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-22 12:06:49 -04:00
Marcin Tojek	ec91ac5427	fix: grant AsAIBridged ResourceSystem.ActionCreate for UpsertAISeatState (#24603 ) Related coder/internal#1444	2026-04-22 16:38:57 +02:00
Ethan	ad1906589d	fix(coderd): allow deleting chat providers used in historical chats (#24568 ) Drop the `chat_model_configs.provider -> chat_providers.provider` foreign key and soft-delete model configs when their provider is removed. The provider row is now hard-deleted inside a transaction that also tombstones its model configs and promotes a replacement default when needed. Historical chats and messages keep pointing at the soft-deleted model config rows, which are hidden from live/admin queries but still resolve for read. The runtime chat path already falls back to the default model config when a soft-deleted config is looked up. Replaces the lost FK validation in the create/update model-config handlers with an explicit provider lookup that returns the existing `Chat provider is not configured.` 400. ## UX Admin deleting a chat provider that has historical usage - Before: blocked with 400 `Provider models are still referenced by existing chats.` Admins had no in-product way to remove a provider that had ever been used. - After: delete succeeds (204). Any model configs under that provider are soft-deleted. If the removed provider owned the default model config, one of the remaining live configs is auto-promoted to the new default. The promotion is deterministic (`ensureDefaultChatModelConfig` picks the first live config by `provider ASC, model ASC, updated_at DESC, id DESC`); there is no picker, and no toast or response detail names which config became the new default. End users with chats that used a deleted provider's model - Old chats still open and their history still renders unchanged. - Sending a new turn in such a chat silently falls back to the current default model. No banner or warning tells the user the original model is gone. - The model picker no longer lists the deleted model. - If no default model config exists at all after the delete, sending a new turn fails with `no default chat model config is available`. Admin creating or updating a model config against a provider that is not configured - Same as before: 400 `Chat provider is not configured.` Only the detection mechanism changed (explicit `FOR UPDATE` lookup inside the transaction, which also serializes against a concurrent provider delete). Admin updating a model config whose row disappears mid-transaction - Now returns the standard 404 `Resource not found or you do not have access to this resource` instead of the previous 500 that leaked `sql: no rows in result set` in the detail. Unrelated internal races (for example a race on the promoted default candidate) are still reported as 500 so they are not misclassified as "your target is gone". Closes CODAGT-23	2026-04-22 19:34:34 +10:00
Michael Suchacz	cb67e71835	fix(coderd/database): renumber duplicate MCP migration (#24552 ) ## Summary - rename the `allow_in_plan_mode` migration pair from `000472` to `000473` - rename the matching fixture file and update its comment - remove the duplicate migration version that broke containerized database startup ## Testing - `go test ./coderd/database/migrations -run '^TestMigrate$' -count=1 -timeout 15m` - validated `iofs.New` for `coderd/database/migrations` and `coderd/database/migrations/testdata/fixtures` Closes coder/internal#1483 > Mux opened this PR on Mike's behalf.	2026-04-21 11:10:17 +00:00
Michael Suchacz	9d0469fc4c	feat: allow approved external MCP tools in root plan mode (#24509 ) ## Summary Allow root plan-mode chats to use MCP tools from external servers that an admin has explicitly approved for plan mode. Workspace MCP and plan-mode subagents remain blocked. ## Problem `chatd.go` excluded every MCP tool when `isPlanModeTurn` was true, so planning had no access to tools like docs search, ticketing, etc. Lifting that guard wholesale was unsafe: `mcp_server_configs` already has centralized admin governance, but workspace-local MCP (discovered from agent `.mcp.json`) does not, and subagents use a narrower trust boundary. ## Fix Add an admin-controlled per-server `allow_in_plan_mode` flag (default `false`) and gate plan-mode MCP access on it. ### Backend / schema - New migration `000472_mcp_server_allow_in_plan_mode.{up,down}.sql` and matching fixture update. - `mcpserverconfigs.sql` + generated code: persist and read the new column. - `codersdk/mcp.go`: thread the field through `MCPServerConfig`, `Create`, and `Update` request types. - `coderd/mcp.go`: validate, persist, and return the flag in get/list/create/update handlers. ### chatd - `coderd/x/chatd/chatd.go`: pre-filter selected external MCP configs by `AllowInPlanMode` before calling `mcpclient.ConnectAll` on plan-mode root turns. Workspace MCP discovery is skipped entirely on plan-mode turns. - Single helper decides whether a tool is available in plan mode, used both at construction and for active-tool filtering (defense in depth). Plan-mode subagents, dynamic tools, provider-native tools, computer-use, and workspace MCP stay unchanged. - `coderd/x/chatd/prompt.go`: update the root plan-mode overlay text to match the new boundary. ### UI - `MCPServerAdminPanel.tsx`: add an explicit toggle ("Allow all tools from this MCP server in root plan mode") next to the existing governance controls. - Regenerated `site/src/api/typesGenerated.ts`. ### Docs - `docs/ai-coder/agents/architecture.md`: replace the blanket "MCP is unavailable in plan mode" note with the new root-only, external-only, admin-approved policy. Explicitly call out that workspace MCP and plan-mode subagents are still excluded. ### Tests - Plan-mode visibility (approved vs non-approved external server). - Plan-mode invocation of an approved external MCP tool. - End-to-end plan-mode workflow that uses an approved MCP tool and then reaches `propose_plan`. - Regressions: workspace MCP still excluded in plan mode; plan-mode subagents still on the restricted tool boundary; existing tool allow/deny list filtering still applies. ## Policy precedence `allow_in_plan_mode` is an additional requirement on top of existing `enabled`, availability, chat-selected / forced server IDs, and tool allow/deny lists. It approves all tools on that server for root plan mode; a per-tool plan allowlist is deliberately deferred. ## Follow-ups (explicitly out of scope) - Whether plan-mode subagents should inherit approved external MCP tools. - Workspace-local MCP safety model (agent-side `.mcp.json` schema vs. a coderd-managed workspace MCP config). ## Validation - `go vet ./coderd/x/chatd/...` - `go test ./coderd/x/chatd -run 'TestPlan.\|TestMCP.' -count=1` - `go test ./coderd/x/chatd -count=1 -timeout 5m` (full chatd suite) - `make fmt` (no diff) > Mux opened this PR on Mike's behalf.	2026-04-21 12:26:12 +02:00
Cian Johnston	c968a1f3a3	feat: make database.Chat auditable (#24485 ) Wire database.Chat into the audit system so chat lifecycle events (creation, patches, etc.) produce audit log entries. Part of CODAGT-200. > 🤖	2026-04-21 11:11:56 +01:00
Jaayden Halko	410f9a5e19	feat: allow renaming of agent chat title (#24489 ) Co-authored-by: Coder Agents <noreply@coder.com>	2026-04-20 14:00:46 +01:00
Thomas Kosiewski	18a30a7a10	feat: add chat debug HTTP handlers and API docs (#23918 )	2026-04-20 13:34:41 +02:00
Mathias Fredriksson	467430d8fa	fix: sort child chats newest-first and prepend on creation (#24524 ) GetChildChatsByParentIDs sorted created_at ASC, but the cache helper appended new children to the end. On refetch the API and cache agreed on oldest-first, putting the just-created child at the bottom. Users expect newest first, matching the root-chat sidebar convention. - SQL: change child sort to created_at DESC, id DESC. - Cache: prepend instead of append in addChildToParentInCache (renamed from appendChildToParentInCache to avoid leaking position semantics). - Test: update ordering assertion to expect newest-first. Refs #24404	2026-04-20 10:43:31 +00:00
Thomas Kosiewski	df7e838c21	feat(coderd): wire debug logging into chat lifecycle (#23917 )	2026-04-20 12:27:16 +02:00
Mathias Fredriksson	fc2493780f	fix: exclude subagent chats from sidebar pagination (#24404 ) GetChats now returns only root chats (parent_chat_id IS NULL). A new GetChildChatsByParentIDs query fetches children for visible roots and embeds them in each parent's Children field. The singular getChat endpoint does the same. Archive invariant is one-way: parent archived implies child archived. Parent archive/unarchive cascades via root_chat_id. Individual child archive is permitted; child unarchive while the parent is archived is rejected atomically (row lock on child, re-read parent inside the transaction). Embedded children are filtered by the caller's archive state so individually-archived children stay hidden from active-parent views. Gitsync MarkStale uses GetChatsByWorkspaceIDs directly; MarkStaleParams.OwnerID removed (dead after the switch). Frontend: buildChatTree reads from the embedded children field, WebSocket handlers route child events into the parent's children array, and archiving a child strips it from the parent cache.	2026-04-20 13:19:59 +03:00
Spike Curtis	2ea27e897b	chore: split Pubsub interface into Publisher and Subscriber (#24442 ) <!-- If you have used AI to produce some or all of this PR, please ensure you have read our [AI Contribution guidelines](https://coder.com/docs/about/contributing/AI_CONTRIBUTING) before submitting. --> Splits the Pubsub into Publisher and Subscriber interfaces. Allows components to scope down their needs if they only publish or only subscribe. This allows smaller fakes/mocks and generally better encapsulation.	2026-04-17 22:58:33 -04:00
Spike Curtis	e19b21b7d5	chore: add GetLatestWorkspaceBuildWithStatusByWorkspaceID query (#24441 ) <!-- If you have used AI to produce some or all of this PR, please ensure you have read our [AI Contribution guidelines](https://coder.com/docs/about/contributing/AI_CONTRIBUTING) before submitting. --> relates to GRU-18 Adds new database query supporting the Agent Connection Watch we will add.	2026-04-17 22:47:08 -04:00
Thomas Kosiewski	91f9de27a1	feat(coderd): add chat debug service and summary aggregation (#23916 )	2026-04-17 16:27:53 +02:00
Dean Sheather	4ba74dcdc8	feat(coderd): add PR status summary to telemetry snapshots (#24379 ) Adds aggregate PR counts (total, open, merged, closed) from `chat_diff_statuses` to telemetry snapshots, giving visibility into AI agent PR outcomes across deployments. The existing telemetry system reports `Chats`, `ChatMessageSummaries`, and `ChatModelConfigs`, but had no PR-level data. This adds a `ChatDiffStatusSummary` field to the `Snapshot` struct with four all-time counts derived from a single aggregate query. <details> <summary>Implementation details</summary> - New SQL query `GetChatDiffStatusSummary` counts `chat_diff_statuses` rows with non-NULL `pull_request_state`, grouped by state (open/merged/closed). - `ChatDiffStatusSummary` struct added to telemetry `Snapshot`, collected via a parallel `eg.Go()` block in `createSnapshot()`. - `dbauthz` wrapper uses `rbac.ResourceSystem` (telemetry-only pattern). - Test covers both empty state (zero counts) and populated state (mixed states + NULL-state exclusion). </details> > 🤖 Generated by Coder Agents	2026-04-17 21:56:11 +10:00
Michael Suchacz	73b5058923	feat: add Explore mode as subagent-only modality (#24448 ) > This PR was authored by Mux on behalf of Mike. Introduce Explore mode, a read-only subagent modality for delegated discovery and code investigation. ## What Adds a `spawn_explore_agent` tool that creates child chats restricted to read-only operations. An admin can optionally configure a deployment-wide model override so Explore subagents use a model optimized for large context or reasoning without changing the root chat's model. ### Backend - New `ChatModeExplore` enum value (migration 000471). - `spawn_explore_agent` tool definition with read-only allowlist: `read_file`, `execute`, `process_output`, `read_skill`, `read_skill_file`. Write tools, file editors, and nested subagent spawning are blocked. - Deployment config storage for the Explore model override (`agents_chat_explore_model_override` in `site_configs`). - Model resolution hierarchy: configured override, then current turn model, then global default. Silent fallback with warning log when the override becomes unavailable. - RBAC: `AsChatd` for daemon reads, `ActionRead` and `ActionUpdate` on `ResourceDeploymentConfig` for admin API calls. - Plan mode root chats can use `spawn_explore_agent` for read-only research, matching the planning prompt guidance. - The Explore override config API now reports malformed saved overrides as "treated as unset" so admins can clear them explicitly. ### Frontend - `ExploreModelOverrideSettings` component in admin agent behavior settings. Uses `ModelSelector`, handles unavailable model warnings, and supports explicit Save and Clear actions. - Malformed saved overrides show a warning and require an explicit Save to clear, instead of Clear auto-submitting behind the scenes. ### Tests - Integration: `TestExploreSubagentIsReadOnly` (full spawn flow, tool verification, prompt overlay, DB state). - Unit: tool allowlist tests for explore, plan, and default modes. - Internal: model override resolution with valid, invalid UUID, disabled, and unconfigured override scenarios. - RBAC: `dbauthz_test.go` for `GetChatExploreModelOverride` and `UpsertChatExploreModelOverride`. - API: admin set and clear, malformed stored override reporting, disabled model rejection, non-admin denial.	2026-04-17 13:40:17 +02:00
Michael Suchacz	1092093e98	feat: add internal subagent model override wiring (#24399 ) > Mux working on behalf of Mike. ## Summary - add an enabled chat model config lookup by ID for internal callers - keep `spawn_agent` unchanged while threading an internal model override through child subagent chat creation - extend chatd coverage for inherited bindings, plan mode, and internal override behavior ## Validation - `go test ./coderd/x/chatd ./coderd/database/dbauthz` - `make lint`	2026-04-16 17:08:02 +02:00
Ethan	55e525fc28	ci: add InTx linter replacing ruleguard rule (#24422 ) Replace the old `InTx` ruleguard rule in `scripts/rules.go` with a custom in-tree `go/analysis` analyzer under `scripts/intxcheck/`. The new analyzer catches the same direct and pass-through misuse classes as before, plus two new classes the pattern-matcher couldn't reach: - Indirect same-package helper misuse — flags `p.someHelper(ctx)` inside `InTx` when the helper body uses the outer store (the PR #24369 bug class). - Nested dangerous closures — descends into `go func() { ... }()`, `defer func() { ... }()`, and immediately-invoked function literals. The analyzer uses semantic `types.Object` identity instead of raw expression string comparison, which avoids false positives from closure-local shadowing and catches simple aliases like `outer := s.db` and `alias := s`. This PR also fixes three real outer-store-inside-transaction bugs the new analyzer surfaced: - `coderd/wsbuilder/wsbuilder.go`: `FindMatchingPresetID` and `getWorkspaceTask` now use the inner transaction store instead of `b.store`. - `enterprise/dbcrypt/dbcrypt.go`: `ensureEncrypted` now calls `s.InsertDBCryptKey` (the tx-wrapped store) instead of `db.InsertDBCryptKey`. The `dbCrypt.InTx` method wraps the raw tx in a new `*dbCrypt`, so `s.InsertDBCryptKey` still dispatches through the encryption layer. Two call sites need `// intxcheck:ignore` suppressions. Both are one-off patterns that only look like misuse because the analyzer doesn't track assignments — proving them safe would require full dataflow analysis, which is well beyond what a targeted lint like this should attempt: - `coderd/database/dbfake/dbfake.go` — `b.db` is reassigned to `tx` on the preceding line, so `b.doInTX()` actually uses the transaction. The analyzer sees the original `b.db` identity and flags it. - `coderd/database/db_test.go` — test intentionally passes the outer store to `require.Equal` to assert that nested `InTx` returns the same handle. Suppressions use `// intxcheck:ignore` instead of `//nolint:intxcheck` because `intxcheck` runs as a standalone `go/analysis` tool outside golangci-lint. golangci-lint's `nolintlint` checker flags `//nolint` directives for linters it doesn't control, so we use a custom comment prefix to avoid that conflict.	2026-04-17 00:07:30 +10:00
Dean Sheather	3452ab3166	chore: add client_type field to chats and telemetry (#24342 ) Add a `chat_client_type` enum (`ui` \| `api`) and `client_type` column to the `chats` table. The column defaults to `api` for new rows so API callers don't need to set it explicitly. Existing rows are backfilled to `ui`. The field flows through `CreateChatRequest`, `chatd.CreateOptions`, `InsertChat`, and is returned in the `Chat` response via `db2sdk`. <details> <summary>Implementation notes (Coder Agents generated)</summary> ### Changes Database migration (000469) - New enum `chat_client_type` with values `ui`, `api`. - New `client_type` column, `NOT NULL DEFAULT 'api'`. - Backfill: `UPDATE chats SET client_type = 'ui'`. SQL query — `InsertChat` now includes `client_type`. SDK — `ChatClientType` type added; `ClientType` field added to both `CreateChatRequest` (optional, defaults server-side to `api`) and `Chat` response. Handler — `postChats` maps the request field (defaulting to `api`) and passes it through `chatd.CreateOptions`. Sub-agent — Child chats inherit their parent's `client_type`. db2sdk — Maps the database value to the SDK type. ### Decision log - Default is `api` (not `ui`) so existing API integrations get the correct value without code changes. - Backfill sets existing rows to `ui` per requirement. - Child chats inherit `client_type` from parent rather than defaulting. </details>	2026-04-16 23:57:05 +10:00
Michael Suchacz	e5707a13d6	feat: support multiple agents with shared instance-identity auth (#24325 ) > This PR was authored by Mux on behalf of Mike. ## Summary Adds support for multiple peer root workspace agents sharing the same `auth_instance_id`, so AWS, Azure, and GCP instance-identity auth can issue the correct session token for a selected agent instead of assuming a single root agent per instance. ## Problem When a Terraform template attaches two or more `coder_agent` resources (with `auth = "aws-instance-identity"`) to a single compute instance, every agent shares the same cloud instance ID. The existing singular lookup picks whichever agent was created most recently, silently ignoring the others. ## Solution Introduce an optional pre-auth agent selector (`CODER_AGENT_NAME`) and make the server-side lookup ambiguity-aware. Database layer: - `GetWorkspaceAgentsByInstanceID` (`:many`): returns all matching root agents for an instance ID. - `GetWorkspaceAgentByInstanceIDAndName` (`:one`): returns the named root agent for disambiguation. SDK and CLI: - `agent_name` field added to AWS, Azure, and GCP request structs (`omitempty` for backward compatibility). - `CODER_AGENT_NAME` env var and `--agent-name` flag wired into the agent bootstrap before instance-identity auth runs. Server handler (`handleAuthInstanceID`): - When `agent_name` is present: direct lookup by (instance ID, name). - When absent: legacy lookup, then resource-scoped ambiguity check. Returns 409 with available agent names if multiple root agents match. - Whitespace-only names are trimmed and treated as unspecified. - Sub-agents remain excluded (`parent_id IS NULL` filter). Verification template: - `examples/templates/aws-multi-agent/` provisions one EC2 instance with two agents (`main` and `dev`), both using instance-identity auth with `CODER_AGENT_NAME` set in the cloud-init user data. ## Backward compatibility Existing single-agent deployments work unchanged. The `agent_name` field is optional with `omitempty`, and the unnamed path preserves today's behavior when only one root agent matches.	2026-04-16 13:59:09 +02:00
Michael Suchacz	1cf0354f72	feat: add plan mode with restricted tool boundary (#24236 ) > This PR was authored by Mux on behalf of Mike. ## Summary - add persistent plan mode for chats and the chat-specific plan file flow - add structured planning tools such as `ask_user_question` and `propose_plan` - keep `write_file` and `edit_files` constrained to the chat-specific plan file during plan turns - allow shell exploration in plan mode, including subagents, via `execute` and `process_output` - block implementation-oriented, provider-native, MCP, dynamic, and computer-use tools during plan turns - update the chat UI, tests, and docs for the new planning flow	2026-04-16 11:12:01 +02:00
Cian Johnston	6194bd6f57	fix: address post-merge review findings for chat org scoping (#24297 ) Addresses review findings from #23827 that were added post-merge: - Persisted attachments now store `organizationId`; mismatched orgs pruned on restore - Workspace selection reconciliation: stale IDs from previous orgs dropped via derived `effectiveWorkspaceId` - Org picker uses `permittedOrganizations()` for RBAC-aware filtering - Org picker hidden when user belongs to only one org - Ref-sync `useEffect` replaced with `useEffectEvent` - `CreateWorkspace()` and `ListTemplates()` take `organizationID` and `db` as required function parameters instead of optional struct fields — compiler enforces them, removes scattered nil guards - Cross-org template check in `CreateWorkspace` is now unconditional - `ListTemplates` org-scoping filter now has test coverage - `setupChatInfra` comment fixed; test helpers use params structs instead of positional UUIDs - Enterprise test documents that org admin only sees own chats (handler hardcodes `OwnerID` — future work needs sidebar UI before lifting that restriction) > 🤖	2026-04-15 11:39:05 +01:00
Callum Styan	730edba87a	fix: fix false positive disconnected agent metric reporting (#24225 ) We noticed during higher active workspace counts that the agent connection metric, generated via a query to the database, would report a relatively high amount of agents as disconnected. Somewhere between 5 and 20%. However, other metrics such as # of websocket connections would suggest that all agent connections are healthy. Looking at the `Agents` function in prometheus metrics, plus the query execution time (not accounting for actual database RT time) revealed that this reporting of agents as disconnected was almost certainly false positives due to clock drift in the way we're generating the metric values. At 10k metrics, with a p50 of 2ms and p99 of 5ms, the entire `agents` function could take upwards of 50s to execute. Because we were doing a query/database RT to query th apps for each agent individually, and grabbing a `time.Now` value on each iteration of that loop, it's likely the portion of agents that were reported as disconnected were those that had last heartbeat the furthest in the past. The fix here is to set a consistent `now` before fetching agent data to avoid clock drift inflating the inactive timeout comparison, and replace the per-agent app query N+1 with a single batched lookup to prevent loop execution time from pushing agents over the disconnected threshold. Signed-off-by: Callum Styan <callumstyan@gmail.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-14 22:23:06 -07:00
Cian Johnston	c552f9f281	fix: stop group spend limits from leaking across org boundaries (#24294 ) Three SQL queries (`GetUserGroupSpendLimit`, `ResolveUserChatSpendLimit`, `GetUserChatSpendInPeriod`) aggregated chat spend limits and usage globally across all organizations. A restrictive group limit in org A would bleed into org B. ## Changes - Add `organization_id` parameter to all three SQL queries in `coderd/database/queries/chats.sql` - When nil UUID is passed, queries fall back to global behavior (backward compat for HTTP dashboard endpoints) - When real org ID is passed, limits and spend are scoped to that organization - Thread `organizationID` through `ResolveUsageLimitStatus` → `checkUsageLimit` → all chatd call sites - Update dbauthz wrappers for new param structs - HTTP endpoints (`chatCostSummary`, `getMyChatUsageLimitStatus`) pass `uuid.Nil` with TODO for future org-scoped UI - Add `TestResolveUsageLimitStatus_OrgScoped` with 5 test cases covering org isolation, nil-UUID fallback, spend scoping, and user override priority Closes coder/internal#1466 > 🤖	2026-04-14 16:56:17 +01:00
Yevhenii Shcherbina	b78eba9f9d	feat: make sure creds are always masked (#24241 ) ## Summary Adds a `sanitizeCredentialHint` safety check in the db-to-SDK conversion layer to ensure credential hints are always masked before being exposed in the API. Also adds `credential_kind` and `credential_hint` assertions to the session threads API test.	2026-04-13 10:14:38 -04:00
Thomas Kosiewski	6ab30123bf	feat: add chat debug log tables, queries, and SDK types (#23913 )	2026-04-13 15:06:06 +02:00
Cian Johnston	22062ec52e	feat: add organization scoping to chats (#23827 ) Fixes https://github.com/coder/internal/issues/1436 * Adds organization_id to chats with backfill (workspace org → user org membership → default org) * No support yet for ACLs (follow-up issue) - Cross-org workspace binding rejected (both in `CreateChatRequest` and in `create_workspace` tool - Adds `OrganizationAutocomplete` to `AgentCreateForm` - Docs updated with `organization_id` in chats-api.md > 🤖 Written by a Coder Agent. Reviewed by many humans and many agents. --------- Co-authored-by: Mathias Fredriksson <mafredri@gmail.com>	2026-04-13 12:31:25 +01:00
Mathias Fredriksson	a62ead8588	fix(coderd): sort pinned chats first in GetChats pagination (#24222 ) The GetChats SQL query ordered by (updated_at, id) DESC with no pin_order awareness. A pinned chat with an old updated_at could land on page 2+ and be invisible in the sidebar's Pinned section. Add a 4-column ORDER BY: pinned-first flag DESC, negated pin_order DESC, updated_at DESC, id DESC. The negation trick keeps all sort columns DESC so the cursor tuple < comparison still works. Update the after_id cursor clause to match the expanded sort key. Fix the false handler comment claiming PinChatByID bumps updated_at.	2026-04-10 17:13:19 +00:00
J. Scott Miller	7bde763b66	feat: add workspace build transition to provisioner job list (#24131 ) Closes #16332 Previously `coder provisioner jobs list` showed no indication of what a workspace build job was doing (i.e., start, stop, or delete). This adds `workspace_build_transition` to the provisioner job metadata, exposed in both the REST API and CLI. Template and workspace name columns were also added, both available via `-c`. ``` $ coder provisioner jobs list -c id,type,status,"workspace build transition" ID TYPE STATUS WORKSPACE BUILD TRANSITION 95f35545-a59f-4900-813d-80b8c8fd7a33 template_version_import succeeded 0a903bbe-cef5-4e72-9e62-f7e7b4dfbb7a workspace_build succeeded start ```	2026-04-10 09:50:11 -05:00
Matt Vollmer	36141fafad	feat: stack insights tables vertically and paginate Pull requests table (#24198 ) The "By model" and "Pull requests" tables on the PR Insights page (`/agents/settings/insights`) were side-by-side at `lg` breakpoints, and the Pull requests table was hard-capped at 20 rows by the backend. - Replaced `lg:grid-cols-2` with a single-column stacked layout so both tables span the full content width. - Removed the `LIMIT 20` from the `GetPRInsightsRecentPRs` SQL query so all PRs in the selected time range are returned. - Can add this back if we need it. If we do, we should add a little subheader above this table to indicate that we're not showing all PRs within the selected timeframe. - Added client-side pagination to the Pull requests table using `PaginationWidgetBase` (page size 10), matching the existing pattern in `ChatCostSummaryView`. - Renamed the section heading from "Recent" to "Pull requests" since it now shows the full set for the time range. <img width="1481" height="1817" alt="image" src="https://github.com/user-attachments/assets/0066c42f-4d7b-4cee-b64b-6680848edc68" /> > 🤖 PR generated with Coder Agents	2026-04-10 10:48:54 -04:00
Garrett Delfosse	3462c31f43	fix: update directory for terraform-managed subagents (#24220 ) When a devcontainer subagent is terraform-managed, the provisioner sets its directory to the host-side `workspace_folder` path at build time. At runtime, the agent injection code determines the correct container-internal path from `devcontainer read-configuration` and sends it via `CreateSubAgent`. However, the `CreateSubAgent` handler only updated `display_apps` for pre-existing agents, ignoring the `Directory` field. This caused SSH/terminal sessions to land in `~` instead of the workspace folder (e.g. `/workspaces/foo`). Add `UpdateWorkspaceAgentDirectoryByID` query and call it in the terraform-managed subagent update path to also persist the directory. Fixes PLAT-118 <details><summary>Root cause analysis</summary> Two code paths set the subagent `Directory` field: 1. Provisioner (build time): `insertDevcontainerSubagent` in `provisionerdserver.go` stores `dc.GetWorkspaceFolder()` — the host-side path from the `coder_devcontainer` Terraform resource (e.g. `/home/coder/project`). 2. Agent injection (runtime): `maybeInjectSubAgentIntoContainerLocked` in `api.go` reads the devcontainer config and gets the correct container-internal path (e.g. `/workspaces/project`), then calls `client.Create(ctx, subAgentConfig)`. For terraform-managed subagents (those with `req.Id != nil`), `CreateSubAgent` in `coderd/agentapi/subagent.go` recognized the pre-existing agent and entered the update path — but only called `UpdateWorkspaceAgentDisplayAppsByID`, discarding the `Directory` field from the request. The agent kept the stale host-side path, which doesn't exist inside the container, causing `expandPathToAbs` to fall back to `~`. </details> > [!NOTE] > Generated by Coder Agents	2026-04-10 10:11:22 -04:00
Zach	95cff8c5fb	feat: add REST API handlers and client methods for user secrets (#24107 ) Add the five REST endpoints for managing user secrets, SDK client methods, and handler tests. Endpoints: - `POST /api/v2/users/{user}/secrets` - `GET /api/v2/users/{user}/secrets` - `GET /api/v2/users/{user}/secrets/{name}` - `PATCH /api/v2/users/{user}/secrets/{name}` - `DELETE /api/v2/users/{user}/secrets/{name}` Routes are registered under the existing `/{user}` group with `ExtractUserParam`. The delete query was changed from `:exec` to `:execrows` so the handler can distinguish "not found" from success (DELETE with `:exec` silently returns nil for zero affected rows).	2026-04-09 12:12:55 -06:00
Yevhenii Shcherbina	8237822441	feat: byok observability api (#24207 ) ## Summary Exposes `credential_kind` and `credential_hint` on AI Bridge session threads, making credential metadata visible in the session detail API. Each thread in the `/api/v2/aibridge/sessions/{session_id}` response now includes: - `credential_kind`: `centralized` or `byok` - `credential_hint`: masked credential (e.g. `sk-a...pgAA`) Values are taken from the thread's root interception. ## Changes - `codersdk/aibridge.go`: Added `CredentialKind` and `CredentialHint` fields to `AIBridgeThread` - `coderd/database/db2sdk/db2sdk.go`: Populated from root interception in `buildAIBridgeThread` - `SessionTimeline.stories.tsx`: Added fields to mock thread data	2026-04-09 11:41:17 -04:00
Kyle Carberry	391b22aef7	feat: add CLI commands for managing chat context from workspaces (#24105 ) Adds `coder exp chat context add` and `coder exp chat context clear` commands that run inside a workspace to manage chat context files via the agent token. `add` reads instruction and skill files from a directory (defaulting to cwd) and inserts them as context-file messages into an active chat. Multiple calls are additive — `instructionFromContextFiles` already accumulates all context-file parts across messages. `clear` soft-deletes all context-file messages, causing `contextFileAgentID()` to return `!found` on the next turn, which triggers `needsInstructionPersist=true` and re-fetches defaults from the agent. Both commands auto-detect the target chat via `CODER_CHAT_ID` (already set by `agentproc` on chat-spawned processes), or fall back to single-active-chat resolution for the agent. The `--chat` flag overrides both. Also adds sub-agent context inheritance: `createChildSubagentChat` now copies parent context-file messages to child chats at spawn time, so delegated sub-agents share the same instruction context without independently re-fetching from the workspace agent. <details><summary>Implementation details</summary> New files: - `cli/exp_chat.go` — CLI command tree under `coder exp chat context` Modified files: - `agent/agentcontextconfig/api.go` — `ConfigFromDir()` reads context from an arbitrary directory without env vars - `codersdk/agentsdk/agentsdk.go` — `AddChatContext`/`ClearChatContext` SDK methods - `coderd/workspaceagents.go` — POST/DELETE handlers on `/workspaceagents/me/chat-context` - `coderd/coderd.go` — Route registration - `coderd/database/queries/chats.sql` — `GetActiveChatsByAgentID`, `SoftDeleteContextFileMessages` - `coderd/database/dbauthz/dbauthz.go` — RBAC implementations for new queries - `coderd/x/chatd/subagent.go` — `copyParentContextFiles` for sub-agent inheritance - `cli/root.go` — Register `chatCommand()` in `AGPLExperimental()` Auth pattern: Uses `AgentAuth` (same as `coder external-auth`) — agent token via `CODER_AGENT_TOKEN` + `CODER_AGENT_URL` env vars. </details> > 🤖 Generated by Coder Agents --------- Co-authored-by: Michael Suchacz <203725896+ibetitsmike@users.noreply.github.com>	2026-04-09 16:33:00 +02:00
Atif Ali	584c61acb5	fix: mark connecting agents as unhealthy instead of healthy (#24044 ) ## Problem Workspaces showed as "Healthy" immediately after creation while the agent was still downloading, starting, or connecting. If the agent never connected, the workspace stayed "Healthy" for the entire connection timeout (~120s), then abruptly flipped to "Unhealthy". ## Root cause In `db2sdk.WorkspaceAgent`, the health switch had no case for `WorkspaceAgentConnecting`. Agents in `connecting` status with a non-`off` lifecycle (e.g. `created` after a fresh build) fell through to the `default` case and were marked `Healthy = true`. ## Fix Add an explicit case for `WorkspaceAgentConnecting` that sets `Healthy = false` with reason `"agent has not yet connected"`. The case is placed after the existing `!connected + off` case (which correctly catches stopped agents as "not running") and before the `timeout`/`disconnected` cases. ``` Status + Lifecycle → Health reason ────────────────────────────────────────────────────── any !connected + off → "agent is not running" connecting + created/starting → "agent has not yet connected" ← NEW timeout + any → "agent is taking too long to connect" disconnected + any → "agent has lost connection" connected + start_error → "agent startup script exited with an error" connected + shutting_down → "agent is shutting down" connected + ready/starting → healthy ``` The frontend already handles this case — `getAgentHealthIssue()` returns "Workspace agent is still connecting" with `severity: "info"` for unhealthy workspaces with connecting agents. ## Test changes - Healthy test: now actually connects the agent via `agenttest.New` before asserting health (previously passed due to the bug). - New Connecting test: verifies a never-connected agent is correctly marked unhealthy. - Mixed health test: connects a1 and waits for the mixed state (`a1.Healthy && !workspace.Healthy`) to avoid a race where both agents are initially connecting. - Sub-agent excluded test: connects the parent agent and waits for it to be healthy before creating the sub-agent. - TestWorkspaceAgent/Connect: flipped assertion to `Health.Healthy == false` for a `dbfake` agent that never connects. <details> <summary>Review notes</summary> ### Known follow-up The `healthy:false` workspace search filter maps to `[disconnected, timeout]` and does not include `connecting`. This is a pre-existing gap that is now more consequential — a workspace unhealthy solely due to a connecting agent won't appear in `healthy:false` results. Worth a follow-up issue. ### Deep review findings addressed \| Finding \| Severity \| Status \| \|---------\|----------\|--------\| \| Mixed health test race (all 3 reviewers) \| P2 \| Fixed — tightened `Eventually` condition \| \| `TestWorkspaceAgent/Connect` assertion break \| P1 \| Fixed — flipped assertion \| \| CLI renders red for connecting agents \| Obs \| Acknowledged — design trade-off, accurate but visually strong for transient state \| \| Switch case ordering overlap \| Obs \| Documented with inline comment \| </details> > 🤖 This PR was created with the help of Coder Agents, and needs a human review. 🧑💻	2026-04-09 13:21:28 +05:00
Zach	9b91af8ab7	feat: add user secrets SDK types and db2sdk converters (#24102 ) Adds the SDK types and database-to-SDK conversion helpers for the user secrets feature.	2026-04-08 16:48:41 -06:00
Yevhenii Shcherbina	7f496c2f18	feat: byok-observability for aibridge (#23808 ) ## Summary Adds `credential_kind` and `credential_hint` columns to `aibridge_interceptions` to record how each LLM request was authenticated and provide a masked credential identifier for audit purposes. This enables admins to distinguish between centralized API keys, personal API keys, and subscription-based credentials in the interceptions audit log. ## Changes - New migration adding `credential_kind`and `credential_hint` to `aibridge_interceptions` - Updated `InsertAIBridgeInterception` query and proto definition to carry the new fields - Wired proto fields through `translator.go` and `aibridgedserver.go` to the database Depends on https://github.com/coder/aibridge/pull/239	2026-04-08 13:24:28 -04:00
Kyle Carberry	b969d66978	feat: add dynamic tools support for chat API (#24036 ) Adds client-executed dynamic tools to the chat API. Dynamic tools are declared by the client at chat creation time, presented to the LLM alongside built-in tools, but executed by the client rather than chatd. This enables external systems (Slack bots, IDE extensions, Discord bots, CI/CD integrations) to plug custom tools into the LLM chat loop without modifying chatd's built-in tool set. Modeled after OpenAI's Assistants API: the chat pauses with `requires_action` status when the LLM calls a dynamic tool, the client POSTs results back via `POST /chats/{id}/tool-results`, and the chat resumes. See [this example](https://github.com/coder/coder-slackbot-poc) as a reference for how this is used. It's highly-configurable, which would enable creating chats from webhooks, periodically polling, or running as a Slackbot. <details> <summary>Design context</summary> ### Architecture The chatloop exits when it encounters dynamic tools and re-enters when results arrive. No blocking channels, no pubsub for tool results, no in-memory registry. The DB is the only coordination mechanism. ``` Phase 1 (chatloop): LLM response → execute built-in tools only → Persist(assistant + built-in results) → status = requires_action → chatloop exits Phase 2 (POST /tool-results): Persist(dynamic tool results) → status = pending → wakeCh → chatloop re-enters ``` ### Validation (POST /tool-results) 1. Chat status must be `requires_action` (409 if not) 2. Read chat's `dynamic_tools` → set of dynamic tool names 3. Read last assistant message → extract tool-call parts matching dynamic tool names 4. Submitted tool_call_ids must match exactly (400 for missing/extra) 5. Persist tool-result message parts, set status to `pending`, signal wake ### Idempotency Tool call IDs scoped per LLM step. State machine (`requires_action` → `pending`) is the guard. First POST wins, subsequent get 409. ### Mixed tool calls When the LLM calls both built-in and dynamic tools in one step, built-in tools execute immediately. Their results are persisted in phase 1. Dynamic tool results arrive via POST in phase 2. The LLM sees all results when the chatloop resumes. </details> > 🤖 Generated by Coder Agents	2026-04-08 11:54:44 -04:00
Kyle Carberry	c5d720f73d	feat(coderd): add telemetry for agents chats and messages (#24068 ) Adds telemetry collection for the agents chat system (`/agents`) to the existing telemetry snapshot pipeline. Three new snapshot fields: - `Chats` — per-chat metadata (id, owner, status, mode, workspace_id, root_chat_id, has_parent, archived, model config) collected time-windowed via `createdAfter` - `ChatMessageSummaries` — per-chat aggregated message metrics (counts by role, token sums by type, cost, runtime, model count, compression count) collected time-windowed - `ChatModelConfigs` — model configuration metadata (provider, model, context limit, enabled, default) collected as full dump No PII is included — titles, message content, and URLs are excluded at the SQL level. Only structural metadata flows through telemetry. <details><summary>Implementation plan</summary> ### SQL Queries (`coderd/database/queries/chats.sql`) - `GetChatsCreatedAfter` — time-windowed chat metadata - `GetChatMessageSummariesPerChat` — per-chat message aggregates via `GROUP BY` - `GetChatModelConfigsForTelemetry` — full dump of model configs ### Telemetry (`coderd/telemetry/telemetry.go`) - `Chat`, `ChatMessageSummary`, `ChatModelConfig` structs - `ConvertChat`, `ConvertChatMessageSummary`, `ConvertChatModelConfig` conversion functions - Three `eg.Go()` blocks in `createSnapshot()` following the existing collection pattern ### Authorization (`coderd/database/dbauthz/dbauthz.go`) - System-only access for all three queries via `rbac.ResourceSystem` ### Tests - `TestChatsTelemetry` in `coderd/telemetry/telemetry_test.go` — creates chats (root + child), messages with token/cost data, model configs; verifies all snapshot fields - dbauthz test entries for all three queries in `coderd/database/dbauthz/dbauthz_test.go` </details> > 🤖 Generated by Coder Agents	2026-04-08 09:47:44 -04:00
Cian Johnston	233343c010	feat: add chat and chat_files cleanup to dbpurge (#23833 ) Fixes https://github.com/coder/coder/issues/23910 Adds periodic cleanup of chats and chat files to the dbpurge background goroutine, with a configurable retention period exposed in the Agent settings UI. > 🤖 Written by a Coder Agent. Reviewed by a human.	2026-04-08 11:08:09 +01:00
Zach	565a15bc9b	feat: update user secrets queries for REST API and injection (#23998 ) Update queries as prep work for user secrets API development: - Switch all lookups and mutations from ID-based to user_id + name - Split list query into metadata-only (for API responses) and with-values (for provisioner/agent) - Add partial update support using CASE WHEN pattern for write-only value fields - Include value_key_id in create for dbcrypt encryption support - Update dbauthz wrappers and remove stale methods from dbmetrics	2026-04-07 09:03:28 -06:00
Kyle Carberry	684f21740d	perf(coderd): batch chat heartbeat queries into single UPDATE per interval (#24037 ) ## Summary Replaces N per-chat heartbeat goroutines with a single centralized heartbeat loop that issues one `UPDATE` per 30s interval for all running chats on a worker. ## Problem Each running chat spawned a dedicated goroutine that issued an individual `UPDATE chats SET heartbeat_at = NOW() WHERE id = $1 AND worker_id = $2 AND status = 'running'` query every 30 seconds. At 10,000 concurrent chats this produces ~333 DB queries/second just for heartbeats, plus ~333 `ActivityBumpWorkspace` CTE queries/second from `trackWorkspaceUsage`. ## Solution New `UpdateChatHeartbeats` (plural) SQL query replaces the old singular `UpdateChatHeartbeat`: ```sql UPDATE chats SET heartbeat_at = @now::timestamptz WHERE worker_id = @worker_id::uuid AND status = 'running'::chat_status RETURNING id; ``` A single `heartbeatLoop` goroutine on the `Server`: 1. Ticks every `chatHeartbeatInterval` (30s) 2. Issues one batch UPDATE for all registered chats 3. Detects stolen/completed chats via set-difference (equivalent of old `rows == 0`) 4. Calls `trackWorkspaceUsage` for surviving chats `processChat` registers an entry in the heartbeat registry instead of spawning a goroutine. ## Impact \| Metric \| Before (10K chats) \| After (10K chats) \| \|---\|---\|---\| \| Heartbeat queries/sec \| ~333 \| ~0.03 (1 per 30s per replica) \| \| Heartbeat goroutines \| 10,000 \| 1 \| \| Self-interrupt detection \| Per-chat `rows==0` \| Batch set-difference \| --- > 🤖 Generated by Coder Agents <details><summary>Implementation notes</summary> - Uses `@now` parameter instead of `NOW()` so tests with `quartz.Mock` can control timestamps. - `heartbeatEntry` stores `context.CancelCauseFunc` + workspace state for the centralized loop. - `recoverStaleChats` is unaffected — it reads `heartbeat_at` which is still updated. - The old singular `UpdateChatHeartbeat` is removed entirely. - `dbauthz` wrapper uses system-level `rbac.ResourceChat` authorization (same pattern as `AcquireChats`). </details>	2026-04-07 10:25:46 -04:00
George K	86ca61d6ca	perf: cap count queries and emit native UUID comparisons for audit/connection logs (#23835 ) Audit and connection log pages were timing out due to expensive COUNT(*) queries over large tables. This commit adds opt-in count capping: requests can return a `count_cap` field signaling that the count was truncated at a threshold, avoiding full table scans that caused page timeouts. Text-cast UUID comparisons in regosql-generated authorization queries also contributed to the slowdown by preventing index usage for connection and audit log queries. These now emit native UUID operators. Frontend changes handle the capped state in usePaginatedQuery and PaginationWidget, optionally displaying a capped count in the pagination UI (e.g. "Showing 2,076 to 2,100 of 2,000+ logs") Related to: https://linear.app/codercom/issue/PLAT-31/connectionaudit-log-performance-issue	2026-04-07 07:24:53 -07:00
Cian Johnston	d5a1792f07	feat: track chat file associations with chat_file_links on chats (#23537 ) Needed by #23833 Adds a `chat_file_links` association table to track which files are associated with each chat. - `AppendChatFileIDs` query links a file to a chat with deduplication - `GetChatFileMetadataByIDs` query returns lightweight file metadata by IDs - Tool-created files (e.g. `propose_plan`) are linked to the chat after insert - User-uploaded files are linked to the chat when the referencing message is sent - Single-chat GET endpoint hydrates `files: ChatFileMetadata[]` on the response > 🤖 Created by Coder Agents and massaged into shape by a human.	2026-04-07 12:05:29 +01:00
Kyle Carberry	a2ce74f398	feat: add total_runtime_ms to chat cost analytics endpoints (#24050 ) Surface the aggregated `runtime_ms` from `chat_messages` through all four cost analytics queries (summary, per-model, per-chat, per-user). This is the key billing metric for agent compute time. The per-chat breakdown already groups by `root_chat_id`, so subagent runtime is automatically rolled up under the parent chat — no additional query changes needed. <details> <summary>Implementation details</summary> SQL (`coderd/database/queries/chats.sql`): Added `COALESCE(SUM(cm.runtime_ms), 0)::bigint AS total_runtime_ms` to `GetChatCostSummary`, `GetChatCostPerModel`, `GetChatCostPerChat`, and `GetChatCostPerUser`. Go SDK (`codersdk/chats.go`): Added `TotalRuntimeMs int64` to `ChatCostSummary`, `ChatCostModelBreakdown`, `ChatCostChatBreakdown`, and `ChatCostUserRollup`. Handler (`coderd/exp_chats.go`): Wired the new field through all converter functions and the response assembly. Tests (`coderd/exp_chats_test.go`): Updated fixture to seed non-zero `runtime_ms` values and added assertions for the new field at summary, per-model, and per-chat levels. </details> > 🤖 Generated by Coder Agents	2026-04-06 12:10:57 -04:00
Jon Ayers	a1d51f0dab	feat: batch connection logs to avoid DB lock contention (#23727 ) - Running 30k connections was generating a ton of lock contention in the DB	2026-04-03 15:47:26 -05:00
Jon Ayers	333503f74e	feat: improve coordinator peer mapping performance (#23696 ) - Skipping DB querying entirely for peers that aren't actually connected to our coordinator - Opportunistically batching the queries for peers	2026-04-03 14:22:58 -05:00
Paweł Banaszewski	8369fa88fd	feat: add columns for cached tokens from aibridge (#23832 ) Two new columns added to aibridge_token_usages: - cache_read_input_tokens (BIGINT, default 0) - cache_write_input_tokens (BIGINT, default 0) Migration backfills existing rows by extracting values from the metadata JSONB column (cache_read_input, input_cached, prompt_cached for reads (max value selected since only 1 should be set), cache_creation_input for writes). All references to data from metadata were updated to reference new columns. No other changes then changing where data is extracted from. Requires aibridge library version bump to include: https://github.com/coder/aibridge/pull/229 Fixes: https://github.com/coder/aibridge/issues/150	2026-04-03 16:27:31 +02:00
Zach	990c006f28	feat(coderd/database): add value_key_id column to user_secrets for encryption (#23997 ) Add a nullable `value_key_id` column to the `user_secrets` table with a foreign key to `dbcrypt_keys`. This is the column dbcrypt uses to track which encryption key encrypted a given secret's value. This is required for encryption of user secret values. The column was missing from the original migration (000357).	2026-04-02 15:40:32 -06:00
Michael Suchacz	7d0a0c6495	feat: provider key policies and user provider settings (#23751 )	2026-04-02 19:46:42 +02:00

1 2 3 4 5 ...

1462 Commits