Commit Graph

520 Commits

Author SHA1 Message Date
Garrett Delfosse f009c17217 fix(coderd): cut DB fan-out on agent instance-identity auth (backport #24973) (#24982)
Backport of #24973 to `release/2.33`.

## Summary

Restores `v2.33.0-rc.2`-equivalent query cost for agent
instance-identity auth, which currently saturates the pgx pool when
multiple agents share an instance ID. Customer report against rc.3
traced 233x `Internal error fetching provisioner job resource` 500s
during a 50-minute incident window to this path.

## Changes

1. **System fast-path on `authorizeProvisionerJob`**
(`coderd/database/dbauthz/dbauthz.go`): Short-circuits the per-job RBAC
fan-out through `GetWorkspaceBuildByJobID` -> `GetWorkspaceByID` for
`AsSystemRestricted` callers.
2. **Drop survivor re-fetch in `handleAuthInstanceID`**
(`coderd/workspaceresourceauth.go`): Captures the provisioner job
alongside each candidate during the filter loop so the post-selection
code reads it directly instead of re-querying.

## Conflict resolution

One conflict in `coderd/database/dbauthz/dbauthz_test.go`: the
`TestAsAutostart` test function (from an unrelated commit on `main`) was
brought in as surrounding context during the cherry-pick. It was removed
since it tests functionality (`ResourceUserSecret.Read` for the
Autostart role) not present on the release branch.

## Tests

- `TestAuthorizeProvisionerJob_SystemFastPath` (3 sub-tests): all pass
- `TestPostWorkspaceAuthAWSInstanceIdentity/Ambiguous/*` (7 sub-tests):
all pass

> Generated by Coder Agents

Co-authored-by: Dean Sheather <dean@deansheather.com>
2026-05-05 21:54:04 +02:00
Jon Ayers 17635dde5c chore: include pgcoordinator schema changes in 2.33 (#24931)
Includes https://github.com/coder/coder/pull/24613 since it landed prior
to the pgcoordinator migration

---------

Co-authored-by: Marcin Tojek <mtojek@users.noreply.github.com>
2026-05-04 15:42:34 -05:00
Cian Johnston df1bfe6479 feat: audit user secret create, update, and delete (#24756) (#24849)
Emit user secret audit log entries for create/update/delete operations.
Reads stay un-audited, matching every other resource.

Audit log entries record changes in user secret name, environment
variable name, file path, and value. The secret value column is marked
`ActionSecret` so the diff records the change without showing the
ciphertext or plaintext.

Closes a TOCTOU window on delete to ensure no phantom audit logs for a
delete of a non-existent secret. Secret update accepts a small TOCTOU
window matching the other audited resources (templates, workspaces,
chats). The two-query pattern is wrapped in a transaction so audit state
can't leak from a failed mutation.

(cherry picked from commit 1c30d52b2b)

<!--

If you have used AI to produce some or all of this PR, please ensure you
have read our [AI Contribution
guidelines](https://coder.com/docs/about/contributing/AI_CONTRIBUTING)
before submitting.

-->

Co-authored-by: Zach <3724288+zedkipp@users.noreply.github.com>
2026-04-30 21:01:27 +01:00
Zach 79735f2d45 feat: plumb user secrets through provisioner chain to terraform (#24542)
This change passes user secrets from coderd to the Terraform process at
workspace build time so the `data.coder_secret` data source in
terraform-provider-coder can resolve values at plan time.

Secrets traverse two proto hops: `provisionerdserver` fetches them
via`ListUserSecretsWithValues`, attaches them to
`AcquiredJob.WorkspaceBuild.user_secrets` on `provisionerd.proto`;
`runner.go` forwards into `PlanRequest.user_secrets` on
`provisioner.proto`; the Terraform provisioner encodes each as
`CODER_SECRET_ENV_<name>` or `CODER_SECRET_FILE_<hex(path)>` before
invoking `terraform plan`. Only plan requests carry secrets; apply runs
with `nil` because values are baked into plan state.

Fetch is gated on a workspace transitioning to start. stop and delete
transitions never carry secrets, so revoking or deleting a stored secret
cannot make a workspace unstoppable. DB errors on the fetch fail the job
outright rather than silently continuing with an empty secret set.

Note that user secrets will be stored in the workspace_builds table in
provisioner_state with other Terraform state (including other sensitive data).
2026-04-27 08:26:07 -06:00
Cian Johnston a876287d36 feat: auto-archive inactive chats with audit trail (#24642)
Adds a background job in `dbpurge` that periodically archives chats
inactive beyond a configurable threshold. Each archived root chat gets a
background audit entry tagged `chat_auto_archive`. Disabled by default.

* New `AutoArchiveInactiveChats` SQL query with LATERAL last-activity
subquery and partial index on archive candidates
* `site_configs`-backed `auto_archive_days` setting with admin-only PUT,
any-authenticated-user GET
* Cascade archive via `root_chat_id`; pinned chats and active threads
exempt
* Root-only audit dispatch on detached context, matching manual archive
(`patchChat`) behavior
* 11 subtests covering disabled no-op, boundary, deleted messages, child
activity, pinned exemption, multi-owner, idempotency, and batch
pagination

PR #24643 adds per-owner digest notifications.
PR #24704 adds the requisite UI controls.

> 🤖
2026-04-24 14:18:28 +01:00
Danielle Maywood 3a9a60dff8 feat: add collapsible thinking blocks with configurable display mode (#24635) 2026-04-24 11:29:08 +00:00
Michael Suchacz 3d90546aae feat: add general subagent model override (#24610)
Adds a deployment-wide admin override for general delegated subagents.

## What changed
- store the general override in `site_configs` and expose it through the
shared `agent-model-override/{context}` API
- apply the general override when spawning delegated general subagents,
while preserving the existing Explore override behavior
- reuse a shared Agents settings form for the general and Explore
override sections

## Validation
- `make gen`
- `go test ./coderd -run 'TestChatModelOverrides'`
- `go test ./coderd/x/chatd -run
'TestSpawnAgent_(GeneralUsesConfiguredModelOverride|GeneralOverrideLogsAndFallsBackWhenCredentialsUnavailable|GeneralOverrideLogsAndFallsBackWhenProviderDisabled)'`
- `pnpm -C site lint:types`
- `pnpm -C site test:storybook --
AgentSettingsAgentsPageView.stories.tsx`
- `make lint`
- `make pre-commit`

> Mux is acting on Mike's behalf.
2026-04-24 12:37:20 +02:00
Cian Johnston b5a625549e feat: migrate agents-access to org-scoped system role for proper chat RBAC (#24438)
The agents-access role previously granted chat permissions at user
scope, but chats are org-scoped objects. Rego skips user-level perms
when org_owner is set, making the grants invisible. Handler-level
band-aids used synthetic non-org-scoped objects as a workaround.

  - Migrates agents-access from users.rbac_roles (site-level) to
    organization_members.roles (org-scoped) via DB migration
  - Redefines agents-access as a predefined org-scoped builtin role
    alongside organization-admin, organization-auditor, etc., with
    Member permissions granting chat create/read/update
  - Excludes ResourceChat from OrgMemberPermissions so org membership
    alone no longer grants chat access
  - Fixes handler Authorize checks to use org-scoped objects with
semantically correct actions (ActionUpdate for message/tool operations)
  - Grants org admins the ability to assign agents-access

Closes #24250
Fixes CODAGT-174

Note: this does not update the "Usage" endpoints. Tracked by CODAGT-161.
> 🤖
2026-04-23 17:59:42 +01:00
Marcin Tojek ec91ac5427 fix: grant AsAIBridged ResourceSystem.ActionCreate for UpsertAISeatState (#24603)
Related coder/internal#1444
2026-04-22 16:38:57 +02:00
Ethan ad1906589d fix(coderd): allow deleting chat providers used in historical chats (#24568)
Drop the `chat_model_configs.provider -> chat_providers.provider`
foreign key and soft-delete model configs when their provider is
removed. The provider row is now hard-deleted inside a transaction that
also tombstones its model configs and promotes a replacement default
when needed.

Historical chats and messages keep pointing at the soft-deleted model
config rows, which are hidden from live/admin queries but still resolve
for read. The runtime chat path already falls back to the default model
config when a soft-deleted config is looked up.

Replaces the lost FK validation in the create/update model-config
handlers with an explicit provider lookup that returns the existing
`Chat provider is not configured.` 400.

## UX

**Admin deleting a chat provider that has historical usage**

- Before: blocked with 400 `Provider models are still referenced by
existing chats.` Admins had no in-product way to remove a provider that
had ever been used.
- After: delete succeeds (204). Any model configs under that provider
are soft-deleted. If the removed provider owned the default model
config, one of the remaining live configs is auto-promoted to the new
default. The promotion is deterministic (`ensureDefaultChatModelConfig`
picks the first live config by `provider ASC, model ASC, updated_at
DESC, id DESC`); there is no picker, and no toast or response detail
names which config became the new default.

**End users with chats that used a deleted provider's model**

- Old chats still open and their history still renders unchanged.
- Sending a new turn in such a chat silently falls back to the current
default model. No banner or warning tells the user the original model is
gone.
- The model picker no longer lists the deleted model.
- If no default model config exists at all after the delete, sending a
new turn fails with `no default chat model config is available`.

**Admin creating or updating a model config against a provider that is
not configured**

- Same as before: 400 `Chat provider is not configured.` Only the
detection mechanism changed (explicit `FOR UPDATE` lookup inside the
transaction, which also serializes against a concurrent provider
delete).

**Admin updating a model config whose row disappears mid-transaction**

- Now returns the standard 404 `Resource not found or you do not have
access to this resource` instead of the previous 500 that leaked `sql:
no rows in result set` in the detail. Unrelated internal races (for
example a race on the promoted default candidate) are still reported as
500 so they are not misclassified as "your target is gone".

Closes CODAGT-23
2026-04-22 19:34:34 +10:00
Jaayden Halko 410f9a5e19 feat: allow renaming of agent chat title (#24489)
Co-authored-by: Coder Agents <noreply@coder.com>
2026-04-20 14:00:46 +01:00
Thomas Kosiewski df7e838c21 feat(coderd): wire debug logging into chat lifecycle (#23917) 2026-04-20 12:27:16 +02:00
Mathias Fredriksson fc2493780f fix: exclude subagent chats from sidebar pagination (#24404)
GetChats now returns only root chats (parent_chat_id IS NULL).
A new GetChildChatsByParentIDs query fetches children for visible
roots and embeds them in each parent's Children field. The
singular getChat endpoint does the same.

Archive invariant is one-way: parent archived implies child
archived. Parent archive/unarchive cascades via root_chat_id.
Individual child archive is permitted; child unarchive while the
parent is archived is rejected atomically (row lock on child,
re-read parent inside the transaction). Embedded children are
filtered by the caller's archive state so individually-archived
children stay hidden from active-parent views.

Gitsync MarkStale uses GetChatsByWorkspaceIDs directly;
MarkStaleParams.OwnerID removed (dead after the switch).

Frontend: buildChatTree reads from the embedded children field,
WebSocket handlers route child events into the parent's children
array, and archiving a child strips it from the parent cache.
2026-04-20 13:19:59 +03:00
Spike Curtis e19b21b7d5 chore: add GetLatestWorkspaceBuildWithStatusByWorkspaceID query (#24441)
<!--

If you have used AI to produce some or all of this PR, please ensure you have read our [AI Contribution guidelines](https://coder.com/docs/about/contributing/AI_CONTRIBUTING) before submitting.

-->

relates to GRU-18  
  
Adds new database query supporting the Agent Connection Watch we will add.
2026-04-17 22:47:08 -04:00
Thomas Kosiewski 91f9de27a1 feat(coderd): add chat debug service and summary aggregation (#23916) 2026-04-17 16:27:53 +02:00
Dean Sheather 4ba74dcdc8 feat(coderd): add PR status summary to telemetry snapshots (#24379)
Adds aggregate PR counts (total, open, merged, closed) from
`chat_diff_statuses` to telemetry snapshots, giving visibility into AI
agent PR outcomes across deployments.

The existing telemetry system reports `Chats`, `ChatMessageSummaries`,
and `ChatModelConfigs`, but had no PR-level data. This adds a
`ChatDiffStatusSummary` field to the `Snapshot` struct with four
all-time counts derived from a single aggregate query.

<details>
<summary>Implementation details</summary>

- New SQL query `GetChatDiffStatusSummary` counts `chat_diff_statuses`
rows with non-NULL `pull_request_state`, grouped by state
(open/merged/closed).
- `ChatDiffStatusSummary` struct added to telemetry `Snapshot`,
collected via a parallel `eg.Go()` block in `createSnapshot()`.
- `dbauthz` wrapper uses `rbac.ResourceSystem` (telemetry-only pattern).
- Test covers both empty state (zero counts) and populated state (mixed
states + NULL-state exclusion).

</details>

> 🤖 Generated by Coder Agents
2026-04-17 21:56:11 +10:00
Michael Suchacz 73b5058923 feat: add Explore mode as subagent-only modality (#24448)
> This PR was authored by Mux on behalf of Mike.

Introduce Explore mode, a read-only subagent modality for delegated
discovery and code investigation.

## What

Adds a `spawn_explore_agent` tool that creates child chats restricted to
read-only operations. An admin can optionally configure a
deployment-wide
model override so Explore subagents use a model optimized for large
context
or reasoning without changing the root chat's model.

### Backend

- New `ChatModeExplore` enum value (migration 000471).
- `spawn_explore_agent` tool definition with read-only allowlist:
`read_file`, `execute`, `process_output`, `read_skill`,
`read_skill_file`.
  Write tools, file editors, and nested subagent spawning are blocked.
- Deployment config storage for the Explore model override
  (`agents_chat_explore_model_override` in `site_configs`).
- Model resolution hierarchy: configured override, then current turn
model,
then global default. Silent fallback with warning log when the override
  becomes unavailable.
- RBAC: `AsChatd` for daemon reads, `ActionRead` and `ActionUpdate` on
  `ResourceDeploymentConfig` for admin API calls.
- Plan mode root chats can use `spawn_explore_agent` for read-only
research,
  matching the planning prompt guidance.
- The Explore override config API now reports malformed saved overrides
as
  "treated as unset" so admins can clear them explicitly.

### Frontend

- `ExploreModelOverrideSettings` component in admin agent behavior
settings.
  Uses `ModelSelector`, handles unavailable model warnings, and supports
  explicit Save and Clear actions.
- Malformed saved overrides show a warning and require an explicit Save
to
  clear, instead of Clear auto-submitting behind the scenes.

### Tests

- Integration: `TestExploreSubagentIsReadOnly` (full spawn flow, tool
  verification, prompt overlay, DB state).
- Unit: tool allowlist tests for explore, plan, and default modes.
- Internal: model override resolution with valid, invalid UUID,
disabled, and
  unconfigured override scenarios.
- RBAC: `dbauthz_test.go` for `GetChatExploreModelOverride` and
  `UpsertChatExploreModelOverride`.
- API: admin set and clear, malformed stored override reporting,
disabled
  model rejection, non-admin denial.
2026-04-17 13:40:17 +02:00
Michael Suchacz 1092093e98 feat: add internal subagent model override wiring (#24399)
> Mux working on behalf of Mike.

## Summary
- add an enabled chat model config lookup by ID for internal callers
- keep `spawn_agent` unchanged while threading an internal model
override through child subagent chat creation
- extend chatd coverage for inherited bindings, plan mode, and internal
override behavior

## Validation
- `go test ./coderd/x/chatd ./coderd/database/dbauthz`
- `make lint`
2026-04-16 17:08:02 +02:00
Dean Sheather 3452ab3166 chore: add client_type field to chats and telemetry (#24342)
Add a `chat_client_type` enum (`ui` | `api`) and `client_type` column to
the `chats` table. The column defaults to `api` for new rows so API
callers don't need to set it explicitly. Existing rows are backfilled to
`ui`.

The field flows through `CreateChatRequest`, `chatd.CreateOptions`,
`InsertChat`, and is returned in the `Chat` response via `db2sdk`.

<details>
<summary>Implementation notes (Coder Agents generated)</summary>

### Changes

**Database migration (000469)**
- New enum `chat_client_type` with values `ui`, `api`.
- New `client_type` column, `NOT NULL DEFAULT 'api'`.
- Backfill: `UPDATE chats SET client_type = 'ui'`.

**SQL query** — `InsertChat` now includes `client_type`.

**SDK** — `ChatClientType` type added; `ClientType` field added to both
`CreateChatRequest` (optional, defaults server-side to `api`) and `Chat`
response.

**Handler** — `postChats` maps the request field (defaulting to `api`)
and passes it through `chatd.CreateOptions`.

**Sub-agent** — Child chats inherit their parent's `client_type`.

**db2sdk** — Maps the database value to the SDK type.

### Decision log
- Default is `api` (not `ui`) so existing API integrations get the
correct value without code changes.
- Backfill sets existing rows to `ui` per requirement.
- Child chats inherit `client_type` from parent rather than defaulting.
</details>
2026-04-16 23:57:05 +10:00
Michael Suchacz e5707a13d6 feat: support multiple agents with shared instance-identity auth (#24325)
> This PR was authored by Mux on behalf of Mike.

## Summary

Adds support for multiple peer root workspace agents sharing the same
`auth_instance_id`, so AWS, Azure, and GCP instance-identity auth can
issue the correct session token for a selected agent instead of assuming
a
single root agent per instance.

## Problem

When a Terraform template attaches two or more `coder_agent` resources
(with `auth = "aws-instance-identity"`) to a single compute instance,
every agent shares the same cloud instance ID. The existing singular
lookup picks whichever agent was created most recently, silently
ignoring
the others.

## Solution

Introduce an optional pre-auth agent selector (`CODER_AGENT_NAME`) and
make the server-side lookup ambiguity-aware.

**Database layer:**
- `GetWorkspaceAgentsByInstanceID` (`:many`): returns all matching root
  agents for an instance ID.
- `GetWorkspaceAgentByInstanceIDAndName` (`:one`): returns the named
root
  agent for disambiguation.

**SDK and CLI:**
- `agent_name` field added to AWS, Azure, and GCP request structs
  (`omitempty` for backward compatibility).
- `CODER_AGENT_NAME` env var and `--agent-name` flag wired into the
agent
  bootstrap before instance-identity auth runs.

**Server handler (`handleAuthInstanceID`):**
- When `agent_name` is present: direct lookup by (instance ID, name).
- When absent: legacy lookup, then resource-scoped ambiguity check.
  Returns 409 with available agent names if multiple root agents match.
- Whitespace-only names are trimmed and treated as unspecified.
- Sub-agents remain excluded (`parent_id IS NULL` filter).

**Verification template:**
- `examples/templates/aws-multi-agent/` provisions one EC2 instance with
  two agents (`main` and `dev`), both using instance-identity auth with
  `CODER_AGENT_NAME` set in the cloud-init user data.

## Backward compatibility

Existing single-agent deployments work unchanged. The `agent_name` field
is optional with `omitempty`, and the unnamed path preserves today's
behavior when only one root agent matches.
2026-04-16 13:59:09 +02:00
Michael Suchacz 1cf0354f72 feat: add plan mode with restricted tool boundary (#24236)
> This PR was authored by Mux on behalf of Mike.

## Summary
- add persistent plan mode for chats and the chat-specific plan file
flow
- add structured planning tools such as `ask_user_question` and
`propose_plan`
- keep `write_file` and `edit_files` constrained to the chat-specific
plan file during plan turns
- allow shell exploration in plan mode, including subagents, via
`execute` and `process_output`
- block implementation-oriented, provider-native, MCP, dynamic, and
computer-use tools during plan turns
- update the chat UI, tests, and docs for the new planning flow
2026-04-16 11:12:01 +02:00
Cian Johnston c552f9f281 fix: stop group spend limits from leaking across org boundaries (#24294)
Three SQL queries (`GetUserGroupSpendLimit`,
`ResolveUserChatSpendLimit`, `GetUserChatSpendInPeriod`) aggregated chat
spend limits and usage globally across all organizations. A restrictive
group limit in org A would bleed into org B.

## Changes

- Add `organization_id` parameter to all three SQL queries in
`coderd/database/queries/chats.sql`
- When nil UUID is passed, queries fall back to global behavior
(backward compat for HTTP dashboard endpoints)
- When real org ID is passed, limits and spend are scoped to that
organization
- Thread `organizationID` through `ResolveUsageLimitStatus` →
`checkUsageLimit` → all chatd call sites
- Update dbauthz wrappers for new param structs
- HTTP endpoints (`chatCostSummary`, `getMyChatUsageLimitStatus`) pass
`uuid.Nil` with TODO for future org-scoped UI
- Add `TestResolveUsageLimitStatus_OrgScoped` with 5 test cases covering
org isolation, nil-UUID fallback, spend scoping, and user override
priority

Closes coder/internal#1466

> 🤖
2026-04-14 16:56:17 +01:00
Thomas Kosiewski 6ab30123bf feat: add chat debug log tables, queries, and SDK types (#23913) 2026-04-13 15:06:06 +02:00
Cian Johnston 22062ec52e feat: add organization scoping to chats (#23827)
Fixes https://github.com/coder/internal/issues/1436

* Adds organization_id to chats with backfill (workspace org → user org membership → default org)
* No support yet for ACLs (follow-up issue)
- Cross-org workspace binding rejected (both in `CreateChatRequest` and in `create_workspace` tool
- Adds `OrganizationAutocomplete` to `AgentCreateForm`
- Docs updated with `organization_id` in chats-api.md

> 🤖 Written by a Coder Agent. Reviewed by many humans and many agents.

---------

Co-authored-by: Mathias Fredriksson <mafredri@gmail.com>
2026-04-13 12:31:25 +01:00
Matt Vollmer 36141fafad feat: stack insights tables vertically and paginate Pull requests table (#24198)
The "By model" and "Pull requests" tables on the PR Insights page
(`/agents/settings/insights`) were side-by-side at `lg` breakpoints, and
the Pull requests table was hard-capped at 20 rows by the backend.

- Replaced `lg:grid-cols-2` with a single-column stacked layout so both
tables span the full content width.
- Removed the `LIMIT 20` from the `GetPRInsightsRecentPRs` SQL query so
all PRs in the selected time range are returned.
- Can add this back if we need it. If we do, we should add a little
subheader above this table to indicate that we're not showing all PRs
within the selected timeframe.
- Added client-side pagination to the Pull requests table using
`PaginationWidgetBase` (page size 10), matching the existing pattern in
`ChatCostSummaryView`.
- Renamed the section heading from "Recent" to "Pull requests" since it
now shows the full set for the time range.
<img width="1481" height="1817" alt="image"
src="https://github.com/user-attachments/assets/0066c42f-4d7b-4cee-b64b-6680848edc68"
/>


> 🤖 PR generated with Coder Agents
2026-04-10 10:48:54 -04:00
Garrett Delfosse 3462c31f43 fix: update directory for terraform-managed subagents (#24220)
When a devcontainer subagent is terraform-managed, the provisioner sets
its directory to the host-side `workspace_folder` path at build time. At
runtime, the agent injection code determines the correct
container-internal
path from `devcontainer read-configuration` and sends it via
`CreateSubAgent`.

However, the `CreateSubAgent` handler only updated `display_apps` for
pre-existing agents, ignoring the `Directory` field. This caused
SSH/terminal
sessions to land in `~` instead of the workspace folder (e.g.
`/workspaces/foo`).

Add `UpdateWorkspaceAgentDirectoryByID` query and call it in the
terraform-managed subagent update path to also persist the directory.

Fixes PLAT-118

<details><summary>Root cause analysis</summary>

Two code paths set the subagent `Directory` field:

1. **Provisioner (build time):** `insertDevcontainerSubagent` in
`provisionerdserver.go`
   stores `dc.GetWorkspaceFolder()` — the **host-side** path from the
   `coder_devcontainer` Terraform resource (e.g. `/home/coder/project`).

2. **Agent injection (runtime):**
`maybeInjectSubAgentIntoContainerLocked` in
`api.go` reads the devcontainer config and gets the correct
**container-internal**
path (e.g. `/workspaces/project`), then calls `client.Create(ctx,
subAgentConfig)`.

For terraform-managed subagents (those with `req.Id != nil`),
`CreateSubAgent`
in `coderd/agentapi/subagent.go` recognized the pre-existing agent and
entered
the update path — but only called `UpdateWorkspaceAgentDisplayAppsByID`,
discarding the `Directory` field from the request. The agent kept the
stale
host-side path, which doesn't exist inside the container, causing
`expandPathToAbs` to fall back to `~`.

</details>

> [!NOTE]
> Generated by Coder Agents
2026-04-10 10:11:22 -04:00
Zach 95cff8c5fb feat: add REST API handlers and client methods for user secrets (#24107)
Add the five REST endpoints for managing user secrets, SDK client
methods, and handler tests.

Endpoints:
- `POST /api/v2/users/{user}/secrets`
- `GET /api/v2/users/{user}/secrets`
- `GET /api/v2/users/{user}/secrets/{name}`
- `PATCH /api/v2/users/{user}/secrets/{name}`
- `DELETE /api/v2/users/{user}/secrets/{name}`

Routes are registered under the existing `/{user}` group with
`ExtractUserParam`. The delete query was changed from `:exec` to
`:execrows` so the handler can distinguish "not found" from success
(DELETE with `:exec` silently returns nil for zero affected rows).
2026-04-09 12:12:55 -06:00
Kyle Carberry 391b22aef7 feat: add CLI commands for managing chat context from workspaces (#24105)
Adds `coder exp chat context add` and `coder exp chat context clear`
commands that run inside a workspace to manage chat context files via
the agent token.

`add` reads instruction and skill files from a directory (defaulting to
cwd) and inserts them as context-file messages into an active chat.
Multiple calls are additive — `instructionFromContextFiles` already
accumulates all context-file parts across messages.

`clear` soft-deletes all context-file messages, causing
`contextFileAgentID()` to return `!found` on the next turn, which
triggers `needsInstructionPersist=true` and re-fetches defaults from the
agent.

Both commands auto-detect the target chat via `CODER_CHAT_ID` (already
set by `agentproc` on chat-spawned processes), or fall back to
single-active-chat resolution for the agent. The `--chat` flag overrides
both.

Also adds sub-agent context inheritance: `createChildSubagentChat` now
copies parent context-file messages to child chats at spawn time, so
delegated sub-agents share the same instruction context without
independently re-fetching from the workspace agent.

<details><summary>Implementation details</summary>

**New files:**
- `cli/exp_chat.go` — CLI command tree under `coder exp chat context`

**Modified files:**
- `agent/agentcontextconfig/api.go` — `ConfigFromDir()` reads context
from an arbitrary directory without env vars
- `codersdk/agentsdk/agentsdk.go` — `AddChatContext`/`ClearChatContext`
SDK methods
- `coderd/workspaceagents.go` — POST/DELETE handlers on
`/workspaceagents/me/chat-context`
- `coderd/coderd.go` — Route registration
- `coderd/database/queries/chats.sql` — `GetActiveChatsByAgentID`,
`SoftDeleteContextFileMessages`
- `coderd/database/dbauthz/dbauthz.go` — RBAC implementations for new
queries
- `coderd/x/chatd/subagent.go` — `copyParentContextFiles` for sub-agent
inheritance
- `cli/root.go` — Register `chatCommand()` in `AGPLExperimental()`

**Auth pattern:** Uses `AgentAuth` (same as `coder external-auth`) —
agent token via `CODER_AGENT_TOKEN` + `CODER_AGENT_URL` env vars.

</details>

> 🤖 Generated by Coder Agents

---------

Co-authored-by: Michael Suchacz <203725896+ibetitsmike@users.noreply.github.com>
2026-04-09 16:33:00 +02:00
Kyle Carberry c5d720f73d feat(coderd): add telemetry for agents chats and messages (#24068)
Adds telemetry collection for the agents chat system (`/agents`) to the
existing telemetry snapshot pipeline.

Three new snapshot fields:
- **`Chats`** — per-chat metadata (id, owner, status, mode,
workspace_id, root_chat_id, has_parent, archived, model config)
collected time-windowed via `createdAfter`
- **`ChatMessageSummaries`** — per-chat aggregated message metrics
(counts by role, token sums by type, cost, runtime, model count,
compression count) collected time-windowed
- **`ChatModelConfigs`** — model configuration metadata (provider,
model, context limit, enabled, default) collected as full dump

No PII is included — titles, message content, and URLs are excluded at
the SQL level. Only structural metadata flows through telemetry.

<details><summary>Implementation plan</summary>

### SQL Queries (`coderd/database/queries/chats.sql`)
- `GetChatsCreatedAfter` — time-windowed chat metadata
- `GetChatMessageSummariesPerChat` — per-chat message aggregates via
`GROUP BY`
- `GetChatModelConfigsForTelemetry` — full dump of model configs

### Telemetry (`coderd/telemetry/telemetry.go`)
- `Chat`, `ChatMessageSummary`, `ChatModelConfig` structs
- `ConvertChat`, `ConvertChatMessageSummary`, `ConvertChatModelConfig`
conversion functions
- Three `eg.Go()` blocks in `createSnapshot()` following the existing
collection pattern

### Authorization (`coderd/database/dbauthz/dbauthz.go`)
- System-only access for all three queries via `rbac.ResourceSystem`

### Tests
- `TestChatsTelemetry` in `coderd/telemetry/telemetry_test.go` — creates
chats (root + child), messages with token/cost data, model configs;
verifies all snapshot fields
- dbauthz test entries for all three queries in
`coderd/database/dbauthz/dbauthz_test.go`

</details>

> 🤖 Generated by Coder Agents
2026-04-08 09:47:44 -04:00
Cian Johnston 233343c010 feat: add chat and chat_files cleanup to dbpurge (#23833)
Fixes https://github.com/coder/coder/issues/23910

Adds periodic cleanup of chats and chat files to the dbpurge background
goroutine, with a configurable retention period exposed in the Agent
settings UI.

> 🤖 Written by a Coder Agent. Reviewed by a human.
2026-04-08 11:08:09 +01:00
Zach 565a15bc9b feat: update user secrets queries for REST API and injection (#23998)
Update queries as prep work for user secrets API development:
- Switch all lookups and mutations from ID-based to user_id + name
- Split list query into metadata-only (for API responses) and
with-values (for provisioner/agent)
- Add partial update support using CASE WHEN pattern for write-only
value fields
- Include value_key_id in create for dbcrypt encryption support
- Update dbauthz wrappers and remove stale methods from dbmetrics
2026-04-07 09:03:28 -06:00
Kyle Carberry 684f21740d perf(coderd): batch chat heartbeat queries into single UPDATE per interval (#24037)
## Summary

Replaces N per-chat heartbeat goroutines with a single centralized
heartbeat loop that issues one `UPDATE` per 30s interval for all running
chats on a worker.

## Problem

Each running chat spawned a dedicated goroutine that issued an
individual `UPDATE chats SET heartbeat_at = NOW() WHERE id = $1 AND
worker_id = $2 AND status = 'running'` query every 30 seconds. At 10,000
concurrent chats this produces **~333 DB queries/second** just for
heartbeats, plus ~333 `ActivityBumpWorkspace` CTE queries/second from
`trackWorkspaceUsage`.

## Solution

New `UpdateChatHeartbeats` (plural) SQL query replaces the old singular
`UpdateChatHeartbeat`:

```sql
UPDATE chats
SET    heartbeat_at = @now::timestamptz
WHERE  worker_id = @worker_id::uuid
  AND  status = 'running'::chat_status
RETURNING id;
```

A single `heartbeatLoop` goroutine on the `Server`:
1. Ticks every `chatHeartbeatInterval` (30s)
2. Issues one batch UPDATE for all registered chats
3. Detects stolen/completed chats via set-difference (equivalent of old
`rows == 0`)
4. Calls `trackWorkspaceUsage` for surviving chats

`processChat` registers an entry in the heartbeat registry instead of
spawning a goroutine.

## Impact

| Metric | Before (10K chats) | After (10K chats) |
|---|---|---|
| Heartbeat queries/sec | ~333 | ~0.03 (1 per 30s per replica) |
| Heartbeat goroutines | 10,000 | 1 |
| Self-interrupt detection | Per-chat `rows==0` | Batch set-difference |

---

> 🤖 Generated by Coder Agents

<details><summary>Implementation notes</summary>

- Uses `@now` parameter instead of `NOW()` so tests with `quartz.Mock`
can control timestamps.
- `heartbeatEntry` stores `context.CancelCauseFunc` + workspace state
for the centralized loop.
- `recoverStaleChats` is unaffected — it reads `heartbeat_at` which is
still updated.
- The old singular `UpdateChatHeartbeat` is removed entirely.
- `dbauthz` wrapper uses system-level `rbac.ResourceChat` authorization
(same pattern as `AcquireChats`).

</details>
2026-04-07 10:25:46 -04:00
Cian Johnston d5a1792f07 feat: track chat file associations with chat_file_links on chats (#23537)
Needed by #23833

Adds a `chat_file_links` association table to track which files are
associated with each chat.

- `AppendChatFileIDs` query links a file to a chat with deduplication
- `GetChatFileMetadataByIDs` query returns lightweight file metadata by
IDs
- Tool-created files (e.g. `propose_plan`) are linked to the chat after
insert
- User-uploaded files are linked to the chat when the referencing
message is sent
- Single-chat GET endpoint hydrates `files: ChatFileMetadata[]` on the
response

> 🤖 Created by Coder Agents and massaged into shape by a human.
2026-04-07 12:05:29 +01:00
Jon Ayers a1d51f0dab feat: batch connection logs to avoid DB lock contention (#23727)
- Running 30k connections was generating a ton of lock contention in the
DB
2026-04-03 15:47:26 -05:00
Jon Ayers 333503f74e feat: improve coordinator peer mapping performance (#23696)
- Skipping DB querying entirely for peers that aren't actually connected
to our coordinator
- Opportunistically batching the queries for peers
2026-04-03 14:22:58 -05:00
Michael Suchacz 7d0a0c6495 feat: provider key policies and user provider settings (#23751) 2026-04-02 19:46:42 +02:00
Ethan 7757cd8e08 refactor(coderd/x/chatd): insert chats directly as pending on creation (#23888)
Previously, `CreateChat` inserted the `chats` row with the DB default
status (`waiting`), then updated it to `pending` in the same transaction
via `setChatPendingWithStore`. This wasted two extra queries per chat
creation (`GetChatByID` + `UpdateChatStatus`) and rewrote the same row
immediately after inserting it.

Now `CreateChat` passes the status directly to `InsertChat`, so the row
is written once in its final create-time state. The
`setChatPendingWithStore` helper is removed entirely. `InsertChat` now
requires an explicit `status` parameter at all callsites instead of
relying on a DB column default.

## Motivation

On an experimental branch we're trialing firing all chatd notifications
from plpgsql triggers. The old two-step insert made that awkward: in an
`AFTER INSERT` trigger, `NEW` only contained the insert-time row
(`waiting`), not the final committed state (`pending`). To emit the
correct event payload the trigger had to be deferred and re-read the row
from `chats` at commit time.

With this change, `NEW` already contains the correct row to publish — no
deferred trigger, no extra `SELECT`, simpler and cheaper trigger logic.

That said, this seems like a worthwhile change regardless of the trigger
experiment: writing the final row state once removes unnecessary DB work
on every chat creation and makes the create path easier to reason about.
2026-04-02 14:13:51 +11:00
Ethan 5cba59af79 fix(coderd): unarchive child chats with parents (#23761)
Unarchiving a root chat now restores descendant chats in the database
and emits lifecycle events for every affected chat so passive sessions
converge without a full refetch.

This keeps archive and unarchive symmetric at both the data and
watch-stream layers by returning the affected chat family from the
database, using those post-update rows for chatd pubsub fanout, and
covering descendant lifecycle delivery with a watch-level regression
test.

Closes #23666
2026-04-01 15:30:25 +11:00
Cian Johnston 3ce82bb885 feat: add chat-access site-wide role to gate chat creation (#23724)
- Add `chat-access` built-in role granting chat CRUD at User scope
- Exclude `ResourceChat` from member, org member, and org service
account `allPermsExcept` calls
- Allow system, owner, and user-admin to assign the new role
- Migration auto-assigns role to users who have ever created a chat
- Update RBAC test matrix: `memberMe` denied, `chatAccessUser` allowed

**Breaking change**: Members without `chat-access` lose chat creation
ability. Migration covers existing chat creators. Members who have never
created a chat do not get this role automatically applied.

> 🤖 This PR was created by a Coder Agent and reviewed by me.
2026-03-31 10:07:21 +01:00
Kyle Carberry a5cc579453 feat: add last_injected_context column to chats table (#23798)
Adds a nullable JSONB column `last_injected_context` to the `chats`
table that stores the most recently persisted injected context parts
(AGENTS.md context-file and skill message parts). The column is updated
only when `persistInstructionFiles()` runs — on first workspace attach
or when the agent changes — so there are no redundant writes on
subsequent turns.

Internal fields (`ContextFileContent`, `ContextFileOS`,
`ContextFileDirectory`, `SkillDir`) are stripped at write time so the
column only holds small metadata. No stripping needed on the read path.

<details>
<summary>Implementation notes</summary>

- New migration `000456` adds nullable `last_injected_context JSONB`
column.
- New SQL query `UpdateChatLastInjectedContext` writes the column
without touching `updated_at`.
- `persistInstructionFiles()` strips internal fields from parts via
`StripInternal()` before persisting.
- Sentinel path (no AGENTS.md) persists skill-only parts when skills
exist.
- `codersdk.Chat` exposes `LastInjectedContext []ChatMessagePart`
(omitempty).
- `db2sdk.Chat()` passes through the already-clean data.

</details>
2026-03-30 14:11:30 -04:00
Jake Howell 71a492a374 feat: implement <ClientFilter /> to AI Bridge request logs (#22694)
Closes #22136

This pull-request implements a `<ClientFilter />` to our `Request Logs`
page for AI Bridge. This will allow the user to select a client which
they wish to filter against. Technically the backend is able to actually
filter against multiple clients at once however the frontend doesn't
currently have a nice way of supporting this (future improvement).

<img width="1447" height="831" alt="image"
src="https://github.com/user-attachments/assets/0be234e2-25f2-4a89-b971-d74817395da1"
/>

---------

Co-authored-by: Jeremy Ruppel <jeremy.ruppel@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 17:18:28 -04:00
Kyle Carberry bcdc35ee3e feat: add chat read/unread indicator to sidebar (#23129)
## Summary

Adds read/unread tracking for chats so users can see which agent
conversations have new assistant messages they haven't viewed.

## Backend Changes

- Adds `last_read_message_id` column to the `chats` table (migration
000439).
- Computes `has_unread` as a virtual column in `GetChatsByOwnerID` using
an `EXISTS` subquery checking for assistant messages beyond the read
cursor.
- Exposes `has_unread` on the `codersdk.Chat` struct and auto-generated
TypeScript types.
- Updates `last_read_message_id` on stream connect/disconnect in
`streamChat`, avoiding per-message API calls during active streaming.
- Uses `context.WithoutCancel` for the deferred disconnect write so the
DB update succeeds even after the client disconnects.

## Frontend Changes

- Bold title (`font-semibold`) for unread chats in the sidebar.
- Small blue dot indicator next to the relative timestamp.
- Suppresses unread indicator for the currently active chat via
`isActive` from NavLink.

## Design Decisions

- Only `assistant` messages count as unread — the user's own messages
don't trigger the indicator.
- No foreign key on `last_read_message_id` since messages can be deleted
(via rollback/truncation) and the column is just a high-water mark.
- Zero API calls during streaming: exactly 2 DB writes per stream
session (connect + disconnect).
- Unread state refreshes on chat list load and window focus. The
`watchChats` WebSocket optimistically marks non-active chats as unread
on `status_change` events, but does not carry a server-computed
`has_unread` field. Navigating to a chat optimistically clears its
unread indicator in the cache.
2026-03-27 12:15:04 -04:00
Michael Suchacz 2312e5c428 feat: add manual chat title regeneration (#23633)
## Summary

Adds a "Generate new title" action that lets users manually regenerate a
chat's title using richer conversation context than the automatic
first-message title path.

## Changes

### Backend
- **New endpoint:** `POST
/api/experimental/chats/{chatID}/title/regenerate` returns the updated
Chat with a regenerated title
- **Manual title algorithm:** Extracts useful user/assistant text turns
→ selects first user turn + last 3 turns → builds context with gap
markers → renders prompt with anti-recency guidance → calls lightweight
model → normalizes output
- **Helpers:** `extractManualTitleTurns`,
`selectManualTitleTurnIndexes`, `buildManualTitleContext`,
`renderManualTitlePrompt`, `generateManualTitle` — all private, with the
public `Server.RegenerateChatTitle` method
- **SDK:** `ExperimentalClient.RegenerateChatTitle(ctx, chatID) (Chat,
error)`
- Persists title via existing `UpdateChatByID` and broadcasts
`ChatEventKindTitleChange`

### Frontend
- API client method + React Query mutation with cache invalidation
- "Generate new title" menu item (with wand icon) in both TopBar and
Sidebar dropdown menus
- Loading/disabled state while regeneration is in-flight
- Error toast on failure
- Stories updated for both menus

### Tests
- `quickgen_test.go`: Table-driven tests for all 4 helper functions
(turn extraction, index selection, context building, prompt rendering)
- `exp_chats_test.go`: Handler tests (ChatNotFound,
NotFoundForDifferentUser, NoDaemon)

## Design notes
- The existing auto-title path (`maybeGenerateChatTitle`, `titleInput`)
is completely unchanged
- Manual regeneration uses richer context (first user turn + last 3
turns + gap markers) vs the auto path's single first message
- Endpoint is experimental and marked with `@x-apidocgen {"skip": true}`
2026-03-27 01:47:19 +01:00
Matt Vollmer 113aaa79a0 feat: add pinned chats with drag-to-reorder (#23615)
https://github.com/user-attachments/assets/bd5d12a1-61b3-4b7d-83b6-317bdfb60b3c

## Summary

Adds pinned chats to the agents page sidebar with server-side
persistence and drag-to-reorder. Users can pin/unpin chats via the
context menu, and pinned chats appear in a dedicated "Pinned" section
above the time-grouped list.

## Database

Migration `000453_chat_pin_order`: adds `pin_order integer DEFAULT 0 NOT
NULL` column on `chats` (0 = unpinned, 1+ = pinned in display order).
Three SQL queries handle pin operations server-side using CTEs with
`ROW_NUMBER()`:

- `PinChatByID`: normalizes existing orders and appends to end
- `UnpinChatByID`: sets target to 0 and compacts remaining pins
- `UpdateChatPinOrder`: shifts neighbors, clamps to `[1, pinned_count]`

All queries exclude archived chats. `ArchiveChatByID` clears `pin_order`
on archive. The handler rejects pinning archived chats with 400.

## Backend

Pin/unpin/reorder go through the existing `PATCH
/api/experimental/chats/{chat}` via the `pin_order` field on
`UpdateChatRequest`. The handler routes based on current pin state:
`pin_order == 0` unpins, `> 0` on an already-pinned chat reorders, `> 0`
on an unpinned chat appends to end.

## Frontend

- `pinChat` / `unpinChat` / `reorderPinnedChat` optimistic mutations
using shared `isChatListQuery` predicate
- Sidebar renders Pinned section above time groups, excludes pinned
chats from time groups
- Pin/Unpin context menu items (hidden for child/delegated chats)
- `@dnd-kit/core` + `@dnd-kit/sortable` for drag-to-reorder with
`MouseSensor`, `TouchSensor`, and `KeyboardSensor`
- Local pin-order override prevents flash on drop; click blocker
prevents NavLink navigation after drag

---
*PR generated with Coder Agents*
2026-03-26 16:52:02 -04:00
Danny Kopping 801e57d430 feat: session detail API (#23203) 2026-03-26 18:09:53 +02:00
Michael Suchacz 4f063cdc47 feat: separate default and additional Coder Agents system prompts (#23616)
Admins can now control whether the built-in Coder Agents default system
prompt is prepended to their custom instructions, rather than having the
custom prompt silently replace the default.

**Changes:**
- New `include_default_system_prompt` boolean toggle (defaults to `true`
for existing deployments) stored as a site config key — no migration
needed.
- GET `/api/experimental/chats/config/system-prompt` returns the toggle
state, the custom prompt, and a preview of the built-in default.
- PUT persists both the toggle and custom prompt atomically in a single
transaction.
- `resolvedChatSystemPrompt()` composes `[default?, custom?]` joined by
`\n\n`, falling back to the built-in default on DB errors.
- Settings UI adds a Switch toggle with conditional helper text and a
"Preview" button that shows the built-in default prompt via the existing
`TextPreviewDialog`.
- Comprehensive test coverage: 15 subtests covering toggle behavior,
prompt composition matrix, auth boundaries, and integration with chat
creation.
2026-03-26 13:32:41 +01:00
Cian Johnston d175e799da feat: show agent badge on workspace list (#23453)
- Adds `GET /api/experimental/chats/by-workspace` endpoint that returns
workspace_id → latest chat_id mapping
- Modifies FE to fetch this alongside the workspace list, gated on
`agents` experiment and render an "Agent" badge similar to the existing
"Task" badge in `WorkspacesTable`
- Badge links to the "latest chat" linked to the given workspace.

Notes:
- Intentionally uses `fetchWithPostFilter` for RBAC to decouple from
workspaces API — will migrate to `workspaces_expanded` view later.
- If users have multiple chats linked to the same workspace, the badge
will link to the most recently updated one.

> 🤖 This PR was created with the help of Coder Agents, and has been
reviewed by my human. 🧑‍💻
2026-03-26 11:30:12 +00:00
Jaayden Halko 3fb7c6264f feat: display the AI add-on column in the UI on the Users and Organization Members tables (#23291)
## Summary

Adds an entitlement-gated **AI add-on** column to both the **Users**
table and the **Organization Members** table. When
`ai_governance_user_limit` is entitled, each row shows whether the user
is consuming an AI seat.

## Background

The AI governance add-on tracks which users are consuming AI seats.
Admins need visibility into per-user seat consumption directly from the
user management tables. This change surfaces that information through
both the site-wide Users table and the per-organization Members table,
gated behind the `ai_governance_user_limit` entitlement so the column
only appears when the feature is licensed.

## Implementation

### Backend
- **New SQL query** `GetUserAISeatStates`
(`coderd/database/queries/aiseatstate.sql`) — returns user IDs consuming
an AI seat, derived from:
  - Users with entries in `aibridge_interceptions` (AI Bridge usage)
- Users who own workspaces with `has_ai_task = true` builds (AI Tasks
usage)
- **SDK types** — added `has_ai_seat: boolean` to `codersdk.User` and
`codersdk.OrganizationMemberWithUserData`
- **Handler wiring** — both the Users list endpoint (`coderd/users.go`)
and all Members endpoints (`coderd/members.go`) query AI seat state per
page of user IDs and populate the response field
- **dbauthz** — per-user `ActionRead` checks on `ResourceUserObject`

### Frontend
- **Shared `AISeatCell` component**
(`site/src/modules/users/AISeatCell.tsx`) — green `CircleCheck` for
consuming, gray `X` for non-consuming
- **`TableColumnHelpTooltip`** — extended with `ai_addon` variant with
tooltip: *"Users with access to AI features like AI Bridge, Boundary, or
Tasks who are actively consuming a seat."*
- **Column visibility** gated behind
`useFeatureVisibility().ai_governance_user_limit`

## Validation

- Backend: dbauthz full method suite (`TestMethodTestSuite`) passes
including new `GetUserAISeatStates` test
- Backend: `TestGetUsers`, `TestUsersFilter`, CLI golden file tests pass
- Frontend: 7/7 tests pass across `UsersPage.test.tsx` and
`OrganizationMembersPage.test.tsx` (column visibility gating both
directions)
- `go build ./coderd/...` compiles clean
- `pnpm --dir site run lint:types` passes
- `make gen` clean

## Risks

- **Pagination performance**: The AI seat query is scoped to the current
page's user IDs (not a full table scan), keeping it efficient for
paginated views.
- **Semantic scope**: The workspace-side AI seat derivation uses "any
build with `has_ai_task = true`" rather than "latest build only". If the
product intent is latest-build-only, this can be tightened in a
follow-up.

---

_Generated with `mux` • Model: `anthropic:claude-opus-4-6` • Thinking:
`xhigh` • Cost: `$27.25`_

<!-- mux-attribution: model=anthropic:claude-opus-4-6 thinking=xhigh
costs=27.25 -->
2026-03-26 10:36:40 +00:00
Ethan 61e31ec5cc perf(coderd/x/chatd): persist workspace agent binding across chat turns (#23274)
## Summary

This change removes the steady-state "resolve the latest workspace
agent" query from chat execution.

Instead of asking the database for the latest build's agent on every
turn, a chat now persists the workspace/build/agent binding it actually
uses and reuses that binding across subsequent turns. The common path
becomes "load the bound agent by ID and dial it", with fallback paths to
repair the binding when it is missing, stale, or intentionally changed.

## What changes

- add `workspace_id`, `build_id`, and `agent_id` binding fields to
`chats`
- expose those fields through the chat API / SDK so the execution
context is explicit
- load the persisted binding first in chatd, instead of always resolving
the latest build's agent
- persist a refreshed binding when chatd has to re-resolve the workspace
agent
- keep child / subagent chats on the same bound workspace context by
inheriting the parent binding
- leave `build_id` / `agent_id` unset for flows like `create_workspace`,
then bind them lazily on the next agent-backed turn

## Runtime behavior

The binding is treated as an optimistic cache of the agent a chat should
use:

- if the bound agent still exists and dials successfully, we use it
without a latest-build lookup
- if the bound agent is missing or no longer reachable, chatd
re-resolves against the latest build and persists the new binding
- if a workspace mutation changes the chat's target workspace, the
binding is updated as part of that mutation

To avoid reintroducing a hot-path query, dialing uses lazy validation:

- start dialing the cached agent immediately
- only validate against the latest build if the dial is still pending
after a short delay
- if validation finds a different agent, cancel the stale dial, switch
to the current agent, and persist the repaired binding

## Result

The hot path stops issuing
`GetWorkspaceAgentsInLatestBuildByWorkspaceID` for every user message,
which is the source of the DB pressure this PR is addressing. At the
same time, chats still converge to the correct workspace agent when the
binding becomes stale due to rebuilds or explicit workspace changes.
2026-03-26 17:22:38 +11:00
Kyle Carberry d4660d8a69 feat: add labels to chats (#23594)
## Summary

Adds a general-purpose `map[string]string` label system to chats, stored
as jsonb with a GIN index for efficient containment queries.

This is a standalone foundational feature that will be used by the
upcoming Automations feature for session identity (matching webhook
events to existing chats), replacing the need for bespoke session-key
tables.

## Changes

### Database
- **Migration 000451**: Adds `labels jsonb NOT NULL DEFAULT '{}'` column
to `chats` table with a GIN index (`idx_chats_labels`)
- **`InsertChat`**: Accepts labels on creation via `COALESCE(@labels,
'{}')`
- **`UpdateChatByID`**: Supports partial update —
`COALESCE(sqlc.narg('labels'), labels)` preserves existing labels when
NULL is passed
- **`GetChats`**: New `has_labels` filter using PostgreSQL `@>`
containment operator
- **`GetAuthorizedChats`**: Synced with generated `GetChats` (new column
scan + query param)

### API
- **Create chat** (`POST /chats`): Accepts optional `labels` field,
validated before creation
- **Update chat** (`PATCH /chats/{chat}`): Supports `labels` field for
atomic label replacement
- **List chats** (`GET /chats`): Supports `?label=key:value` query
parameters (multiple are AND-ed)

### SDK
- `Chat`, `CreateChatRequest`, `UpdateChatRequest`, `ListChatsOptions`
all gain `Labels` fields
- `UpdateChatRequest.Labels` is a pointer (`*map[string]string`) so
`nil` means "don't change" vs empty map means "clear all"

### Validation (`coderd/httpapi/labels.go`)
- Max 50 labels per chat
- Key: 1–64 chars, must match `[a-zA-Z0-9][a-zA-Z0-9._/-]*` (supports
namespaced keys like `github.repo`, `automation/pr-number`)
- Value: 1–256 chars
- 13 test cases covering all edge cases

### Chat runtime
- `chatd.CreateOptions` gains `Labels` field, threaded through to
`InsertChat`
- Existing `UpdateChatByID` callers (e.g., quickgen title updates) are
unaffected — NULL labels preserve existing values via COALESCE
2026-03-25 17:26:26 +00:00