Commit Graph

353 Commits

Author SHA1 Message Date
Michael Suchacz c3b6284955 feat: add chat cost analytics backend (#23036)
Add cost tracking for LLM chat interactions with microdollar precision.

## Changes
- Add `chatcost` package for per-message cost calculation using
`shopspring/decimal` for intermediate arithmetic
- **Ceil rounding policy**: fractional micros round UP to next whole
micro (applied once after summing all components)
- Database migration: `total_cost_micros` BIGINT column with historical
backfill and `created_at` index
- API endpoints: per-user cost summary and admin rollup under
`/api/experimental/chats/cost/`
- SDK types: `ChatCostSummary`, `ChatCostModelBreakdown`,
`ChatCostUserRollup`
- Fix `modeloptionsgen` to handle `decimal.Decimal` as opaque numeric
type
- Update frontend pricing test fixtures for string decimal types

## Design decisions
- `NULL` = unpriced (no matching model config), `0` = free
- Reasoning tokens included in output tokens (no double-counting)
- Integer microdollars (BIGINT) for storage and API responses
- Price config uses `decimal.Decimal` for exact parsing; totals use
`int64`

Frontend: #23037
2026-03-13 18:30:49 +01:00
Cian Johnston e9025f91e8 chore(db): remove 23 unused database methods (#22999)
Removes 22 database query methods with no callers outside generated code
and the dbauthz wrapper layer (~1,600 lines).

**Security keys (6)** — superseded by `cryptokeys` package:
`GetAppSecurityKey`, `UpsertAppSecurityKey`, `GetOAuthSigningKey`,
`UpsertOAuthSigningKey`, `GetCoordinatorResumeTokenSigningKey`,
`UpsertCoordinatorResumeTokenSigningKey`

**Superseded queries (4):**
- `GetProvisionerJobsByIDs` → `GetProvisionerJobsByIDsWithQueuePosition`
- `GetDeploymentDAUs` / `GetTemplateDAUs` →
`GetTemplateInsightsByInterval`
- `GetWorkspaceBuildParametersByBuildIDs` + its `GetAuthorized...`
variant → unused

**OAuth2 (2):**
`GetOAuth2ProviderAppByRegistrationToken`,
`UpdateOAuth2ProviderAppSecretByID`

**Chat (4)** — pre-wired with no callers:
`GetChatModelConfigByProviderAndModel`, `DeleteChatMessagesByChatID`,
`ListChatsByRootID`, `ListChildChatsByParentID`

**Other (6):**
`DeleteGitSSHKey`, `UpdateUserLinkedID`, `GetFileIDByTemplateVersionID`,
`GetTemplateVersionHasAITask`, `InsertUserGroupsByName`,
`RemoveUserFromAllGroups`
2026-03-12 21:32:57 +00:00
Kyle Carberry 58f295059c fix: grant chatd ActionReadPersonal on User and parallelize runChat DB calls (#22970)
## Problem

1. **Personal behavior prompt not applied**: The chatd background worker
was missing `ActionReadPersonal` on `ResourceUser` in its RBAC subject.
When `resolveUserPrompt` calls `GetUserChatCustomPrompt`, the dbauthz
layer checks `ActionReadPersonal` on the user — which the chatd role
didn't have. The error was silently swallowed (returns `""`), so the
user's custom prompt was never injected into the system messages.

2. **Sequential DB calls on chat startup**: Several independent database
queries in `runChat` and `resolveChatModel` were running sequentially,
adding unnecessary latency before the LLM stream begins.

## Changes

### RBAC fix (`dbauthz.go`)
- Add `rbac.ResourceUser.Type: {policy.ActionReadPersonal}` to
`subjectChatd` site permissions
- This is the minimal permission needed — `ActionRead` on User remains
denied

### Parallelization (`chatd.go`)
Three parallelization points using `errgroup.Group`:

1. **`resolveChatModel`**: `resolveModelConfig` and
`GetEnabledChatProviders` run concurrently (both needed for
`ModelFromConfig`, which stays sequential after the wait)

2. **`runChat` startup**: `resolveChatModel` and
`GetChatMessagesForPromptByChatID` run concurrently (completely
independent)

3. **`runChat` prompt assembly**: `resolveInstructions` and
`resolveUserPrompt` run concurrently (both produce strings;
`InsertSystem` calls maintain correct order after the wait)

Same pattern applied to the `ReloadMessages` callback.

### Test (`dbauthz_test.go`)
- Add assertion in `TestAsChatd/AllowedActions` that
`ActionReadPersonal` on `ResourceUser` is permitted
2026-03-11 22:07:46 +00:00
Kyle Carberry 1f37df4db3 perf(chatd): fix six scale bottlenecks identified by benchmarking (#22957)
## Summary

Scale-tested the `chatd` package with mock-based benchmarks to identify
performance bottlenecks. This PR fixes 6 of the 8 identified issues,
ranked by severity.

## Changes

### 1. Parallel tool execution (HIGH) — `chatloop.go`
`executeTools` ran tool calls sequentially. Now dispatches all calls
concurrently via goroutines with `sync.WaitGroup`. Results are
pre-allocated by index (no mutex needed). `onResult` callbacks fire as
each tool completes.

### 2. Pubsub-backed subagent await (HIGH) — `subagent.go`
`awaitSubagentCompletion` polled the DB every 200ms. Now subscribes to
the child chat's `ChatStreamNotifyChannel` via pubsub for near-instant
notifications. Fallback poll reduced to 5s. Falls back to 200ms only
when `pubsub == nil` (single-instance / in-memory).

### 3. Per-chat stream locking (MEDIUM) — `chatd.go`
Replaced single global `streamMu` + `map[uuid.UUID]*chatStreamState`
with `sync.Map` where each `chatStreamState` has its own `sync.Mutex`.
Zero cross-chat contention.

### 4. Batch chat acquisition (MEDIUM) — `chatd.go`
`processOnce` acquired 1 chat per tick. Now loops up to
`maxChatsPerAcquire = 10` per tick, avoiding idle time when many chats
are pending.

### 5. Reduced heartbeat frequency (LOW-MEDIUM) — `chatd.go`
`chatHeartbeatInterval` changed from 30s to 60s. Safe given the 5-minute
`DefaultInFlightChatStaleAfter`.

### 6. O(depth) descendant check (LOW) — `subagent.go`
Replaced top-down BFS (`O(total_descendants)` queries) with bottom-up
parent-chain walk (`O(depth)` queries). Includes cycle protection.

## Not addressed (intentionally)
- Message serialization overhead
- Buffer eviction (`buffer[1:]` pattern)
2026-03-11 14:00:08 -04:00
Cian Johnston bc27274aba feat(coderd): refactors github pr sync functionality (#22715)
- Adds `_API_BASE_URL` to `CODER_EXTERNAL_AUTH_CONFIG_`
- Extracts and refactors existing GitHub PR sync logic to new packages
`coderd/gitsync` and `coderd/externalauth/gitprovider`
- Associated wiring and tests

Created using Opus 4.6
2026-03-10 18:46:01 +00:00
Kyle Carberry b6d1a11c58 feat(chatd): add user-level custom prompt for agent chats (#22896)
Adds a user-level custom prompt to the database.

I'll be doing a follow-up for the UI, as we currently do not have
user-level settings (it's just admin). I'll also make it very obvious
for chats where there is a user-level prompt, but I don't know how yet.
2026-03-10 11:17:52 -04:00
Danielle Maywood 6489d6f714 feat(chatd): use last assistant message as push notification summary (#22671)
Instead of the static 'Agent has finished running.' text, extract a
summary from the last assistant message to give users meaningful context
about what the agent accomplished. Falls back to the static text if no
suitable message is found.

Co-authored-by: Kyle Carberry <kyle@carberry.com>
2026-03-10 15:14:15 +00:00
Cian Johnston c933ddcffd fix(agents): persist system prompt server-side instead of localStorage (#22857)
## Problem

The Admin → Agents → System Prompt textarea saved only to the browser's
`localStorage`. The value was never sent to the backend, never stored in
the database, and never injected into chats. Entering text, clicking
Save, and refreshing the page showed no changes — the prompt was
effectively a no-op.

## Root Cause

Three disconnected layers:
1. **Frontend** wrote to `localStorage`, never called an API.
2. **`handleCreateChat`** never read `savedSystemPrompt`.
3. **Backend** hardcoded `chatd.DefaultSystemPrompt` on every chat
creation — no field in `CreateChatRequest` accepted a custom prompt.

## Changes

### Database
- Added `GetChatSystemPrompt` / `UpsertChatSystemPrompt` queries on the
existing `site_configs` table (no migration needed).

### API
- `GET /api/experimental/chats/system-prompt` — returns the configured
prompt (any authenticated user).
- `PUT /api/experimental/chats/system-prompt` — sets the prompt
(admin-only, `rbac: deployment_config update`).
- Input validation: max 32 KiB prompt length.

### Backend
- `resolvedChatSystemPrompt(ctx)` checks for a custom prompt in the DB,
falls back to `chatd.DefaultSystemPrompt` when empty/unset.
- Logs a warning on DB errors instead of silently swallowing them.
- Replaced the hardcoded `defaultChatSystemPrompt()` call in chat
creation.

### Frontend
- Replaced `localStorage` read/write with React Query
`useQuery`/`useMutation` backed by the new endpoints.
- Fixed `useEffect` draft sync to avoid clobbering in-progress user
edits on refetch.
- Added `try/catch` error handling on save (draft stays dirty for
retry).
- Save button disabled during mutation (`isSavingSystemPrompt`).
- Query key follows kebab-case convention (`chat-system-prompt`).

### UX
- Added hint: "When empty, the built-in default prompt is used."

### Tests
- `TestChatSystemPrompt`: GET returns empty when unset, admin can set,
non-admin gets 403.
- dbauthz `TestMethodTestSuite` coverage for both new querier methods.
2026-03-10 11:46:53 +00:00
Mathias Fredriksson a104d608a3 feat: add file/image attachment support to chat input (#22604)
This change adds support for image attachments to chat via add button
and clipboard paste. Files are stored in a new `chat_files` table and
referenced by ID in message content. File data is resolved from storage
at LLM dispatch time, keeping the message content column small.

Upload validates MIME types via content type or content sniffing against
an allowlist (png, jpeg, gif, webp). The retrieval endpoint serves files
with immutable caching headers. On the frontend, uploads start eagerly
on attach with a background fetch to pre-warm the browser HTTP cache so
the timeline renders instantly after send.
2026-03-06 21:05:26 +02:00
Kayla はな 56bdea73b8 feat: add workspace acls to task rbac objects (#22311)
To allow tasks to be shareable, we need to share both the `task`
resource and the `workspace` resource, and their sharing state needs to
be kept in sync. We've already implemented all of the necessary ACL
functionality for workspaces, so we can just sort of proxy those ACLs
back to the task as well.
2026-03-05 13:40:53 -07:00
Danielle Maywood d2d956edb1 fix: add archived query parameter to chat list endpoint (#22562)
Despite the SDK type having an `Archived` field for chats, this data was
never fetched from the database — the `GetChatsByOwnerID` query
hardcoded `AND archived = false`, and the `convertChat` function never
mapped the field.

This PR adds an optional `archived` query parameter to `GET
/api/experimental/chats`:

| Value | Behavior |
|-------|----------|
| *(not provided)* | Returns all chats (active and archived) |
| `archived=false` | Returns only non-archived chats |
| `archived=true` | Returns only archived chats |

This follows the same pattern used by template versions
(`sqlc.narg('archived')` nullable boolean).

Also fixes `convertChat` to populate the `Archived` field in API
responses, which was never being set despite existing on the SDK type.
2026-03-03 20:39:19 +00:00
Danny Kopping 1b08bc76a6 feat: store tool call IDs to determine interception lineage (#22246)
Adds database columns and server-side logic to track interception lineage via tool call IDs. When an interception ends, the server resolves the correlating tool call ID to find the parent interception and links them via `parent_id`.

New `provider_tool_call_id` column on `aibridge_tool_usages` and `parent_id` column on `aibridge_interceptions`, with indexes for lookup. `findParentInterceptionID` queries by tool call ID and filters out the current interception to find the parent.

Adapted from the [coder/coder `dk/prompt_provenance_poc`](https://github.com/coder/coder/compare/main...dk/prompt_provenance_poc) branch.
Depends on [coder/aibridge#188](https://github.com/coder/aibridge/pull/188).  
  
Closes https://github.com/coder/internal/issues/1334
2026-03-03 21:04:41 +02:00
Kyle Carberry 5eebd3829f fix: use cursor-based query for chat stream notifications (#22510)
## Problem

The pubsub notification handler in `chatd` re-fetched **all** messages
from the DB on every new message notification, then filtered in Go with
`msg.ID > lastMessageID`. This grows linearly with conversation length —
every new message triggers a full table scan of that chat's history.

The `AfterMessageID` field in the pubsub notification payload was
clearly designed for cursor-based fetching, but no matching query
existed.

## Fix

- Add `GetChatMessagesByChatIDAfter` SQL query with `WHERE id >
@after_id`, so the database does the filtering instead of Go.
- Use it in the pubsub notification handler in `chatd.go`, passing
`lastMessageID` as the cursor.
- Implement the dbauthz wrapper (was a `panic("not implemented")` stub
from codegen) with the same read-check-on-parent-chat pattern as
adjacent methods.
- Add dbauthz test coverage for the new method.

**Not changed:** The initial snapshot in `Subscribe()` still loads all
messages — that's correct, since a newly-connecting client needs the
full conversation state. The waste was only in the ongoing notification
path.
2026-03-02 16:31:04 -05:00
Cian Johnston a62f2fbfc4 feat(rbac): add AsChatd subject to replace AsSystemRestricted in chatd (#22487)
Add a new SubjectTypeChatd RBAC subject with minimal permissions:
- Chat: CRUD
- Workspace: Read
- DeploymentConfig: Read

Replace all 10 AsSystemRestricted calls in coderd/chatd/chatd.go:
- Line 890: Use AsChatd instead of AsSystemRestricted for the background
processor context.
- Subscribe() path (5 calls): Remove system escalation entirely; these
run under the authenticated user's context from the HTTP handler.
- processChat path (4 calls): Remove redundant per-call wraps; the
context already carries AsChatd from the processor start.

Add TestAsChatd verifying allowed and denied actions.

Created using Mux (Opus 4.6)
2026-03-02 15:57:04 +00:00
Kyle Carberry 12083441e0 feat(chats): archive chats instead of hard-deleting them (#22406)
## Summary

The UI has always labeled the action as "Archive agent" but the backend
was performing a hard `DELETE`, permanently destroying chats and all
their messages.

This change replaces the hard delete with a soft archive, consistent
with the pattern used by template versions.

## Changes

### Database
- **Migration 000423**: Add `archived boolean DEFAULT false NOT NULL`
column to `chats` table
- Replace `DeleteChatByID` query with `ArchiveChatByID` (`UPDATE SET
archived = true`)
- Add `UnarchiveChatByID` query (`UPDATE SET archived = false`)
- Filter archived chats from `GetChatsByOwnerID` (`WHERE archived =
false`)

### API
- Remove `DELETE /api/experimental/chats/{chat}`
- Add `POST /api/experimental/chats/{chat}/archive` — archives a chat
and all its descendants
- Add `POST /api/experimental/chats/{chat}/unarchive` — unarchives a
single chat (API only, no UI yet)

### Backend
- `archiveChatTree()` recursively archives child chats (replaces
`deleteChatTree()` which hard-deleted)
- Chat daemon's `ArchiveChat()` archives the full chat tree in a
transaction
- Authorization uses `ActionUpdate` instead of `ActionDelete`

### SDK
- Replace `DeleteChat()` with `ArchiveChat()` and `UnarchiveChat()`
- Add `Archived` field to `Chat` struct

### Frontend
- `archiveChat` API call uses `POST .../archive` instead of `DELETE`
- No UI changes — the "Archive agent" button now actually archives
instead of deleting

## Design Decision

This follows the **template version archive pattern** (Pattern B in the
codebase):
- `archived boolean` column (not `deleted boolean`)
- Dedicated `POST .../archive` and `POST .../unarchive` routes (not
repurposing `DELETE`)
- Reversible — users can unarchive via the API (UI for this will come
later)
2026-02-27 16:46:19 -05:00
Kyle Carberry edee917d88 feat: add experimental agents support (#22290)
feat: add AI chat system with agent tools and chat UI

Introduce the chatd subsystem and Agents UI for AI-powered chat
within Coder workspaces.

- Add chatd package with chat loop, message compaction, prompt
  management, and LLM provider integration (OpenAI, Anthropic)
- Add agent tools: create workspace, list/read templates, read/write/
  edit files, execute commands
- Add chat API endpoints with streaming, message editing, and
  durable reconnection
- Add database schema and migrations for chats, chat messages, chat
  providers, and chat model configs
- Add RBAC policies and dbauthz enforcement for chat resources
- Add Agents UI pages with conversation timeline, queued messages
  list, diff viewer, and model configuration panel
- Add comprehensive test coverage including coderd integration tests,
  chatd unit tests, and Storybook stories
- Gate feature behind experiments flag

---------

Co-authored-by: Cian Johnston <cian@coder.com>
Co-authored-by: Danielle Maywood <danielle@themaywoods.com>
Co-authored-by: Jeremy Ruppel <jeremy@coder.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-27 16:50:56 +00:00
Jake Howell d2787df442 feat: add AI Bridge request logs model filter (#22230)
This pull-request implements a simple filtering logic so that we're able
to pick which model the user actually used when logs were sent to AI
Bridge.

- Add `GET /aibridge/models` API endpoint that returns distinct model
names from AI Bridge interceptions, with pagination and search support
- New `ListAIBridgeModels` SQL query using case-sensitive prefix
matching (`LIKE model || '%'`) to allow B-tree index usage
- Hand-written `ListAuthorizedAIBridgeModels` in `modelqueries.go` for
RBAC authorization filter injection
- `AIBridgeModels` search query parser in searchquery/search.go
(defaults bare terms to the `model` field)
- dbauthz wrappers, dbmetrics, and dbmock implementations for the new
query

<img width="292" height="185" alt="image"
src="https://github.com/user-attachments/assets/134771df-2d26-4c54-acc4-27f58128b351"
/>
2026-02-26 02:40:45 +11:00
Cian Johnston 6336fee3a7 feat: add telemetry for task lifecycle events (#21922)
Relates to https://github.com/coder/internal/issues/1259

Adds new database queries and telemetry collection functions to gather
task lifecycle events (pause/resume cycles, idle time) for analytics.
    
Task events track pause/resume activity, idle duration before pausing,
paused duration, and time from resume to first app status, filtered to
recent activity based on the telemetry snapshot interval.

🤖 Created with Mux (Opus 4.6).
2026-02-24 17:04:42 +00:00
Kacper Sawicki 1e274063d4 feat(coderd): filter expired API tokens server-side (#22263)
## Summary

Moves expired token filtering from client-side to server-side by adding
an `include_expired` parameter to the `GetAPIKeysByLoginType` and
`GetAPIKeysByUserID` database queries. This is more efficient for large
deployments with many expired/short-lived tokens.

## Changes

- Add `include_expired` parameter to SQL queries using `OR`
short-circuit
- Add `include_expired` query parameter to `GET
/users/{user}/keys/tokens`
- Add `IncludeExpired` field to `codersdk.TokensFilter`
- Remove client-side filtering from CLI `tokens list` command
- Add `TestTokensFilterExpired` test

Fixes coder/internal#1357
2026-02-24 15:27:03 +00:00
Jon Ayers 0a7a3da178 fix: exclude provisioner_state from workspace_build_with_user view (#22159)
The provisioner state for a workspace build was being loaded for every
long-lived agent rpc connection. Since this state can be anywhere from
kilobytes to megabytes this can gradually cause the `coderd` memory
footprint to grow over time. It's also a lot of unnecessary allocations
for every query that fetches a workspace build since only a few callers
ever actually reference the provisioner state.

This PR removes it from the returned workspace build and adds a query to
fetch the provisioner state explicitly.
2026-02-23 22:46:17 -06:00
Jon Ayers 6035e45cb8 feat: add e2e workspace build duration metric (#21739)
Adds coderd_template_workspace_build_duration_seconds histogram that
tracks the full duration from workspace build creation to agent ready.
This captures the complete user-perceived build time including
provisioning and agent startup.

The metric is emitted when the agent reports ready/error/timeout via the
lifecycle API, ensuring each build is counted exactly once per replica.
2026-02-06 16:26:02 -06:00
Zach a31e476623 fix: make boundary usage telemetry collection atomic (#21907)
Previously, UpsertBoundaryUsageStats (INSERT...ON CONFLICT DO UPDATE) and
GetAndResetBoundaryUsageSummary (DELETE...RETURNING) could race during
telemetry period cutover. Without serialization, an upsert concurrent with the
delete could lose data (deleted right after being written) or commit after the
delete (miscounted in the next period). Both operations now acquire
LockIDBoundaryUsageStats within a transaction to ensure a clean cutover.
2026-02-06 09:52:17 -07:00
Mathias Fredriksson c60c373bc9 fix(coderd): clean up task snapshots on task deletion (#21949)
Task snapshots were orphaned when tasks were soft-deleted. The
`task_snapshots` table has an `ON DELETE CASCADE` foreign key, but
that only fires on hard deletes.

Modified DeleteTask to use a CTE that atomically soft-deletes the
task and removes its snapshot in a single transaction. The query now
returns just the task UUID instead of the full row.

Closes coder/internal#1283
2026-02-06 11:55:33 +02:00
Danielle Maywood af0e171595 feat(coderd/agentapi): support terraform-defined subagent ids (#21837)
Update `coderd/agentapi` to handle pre-created sub agents
2026-02-04 15:33:48 +00:00
Zach 7dfa33b410 feat: add boundary usage tracking database schema and tracker skeleton (#21670)
feat: add boundary usage telemetry database schema and RBAC

Adds the foundation for tracking boundary usage telemetry across Coder
replicas. This includes:

  - Database schema: `boundary_usage_stats` table with per-replica stats
    (unique workspaces, unique users, allowed/denied request counts)
  - Database queries: upsert stats, get aggregated summary, reset stats,
    delete by replica ID
  - RBAC: `boundary_usage` resource type with read/update/delete actions,
    accessible only via system `BoundaryUsageTracker` subject (not regular
    user roles)
  - Tracker skeleton + docs: stub implementation in `coderd/boundaryusage/`

The tracker accumulates stats in memory and periodically flushes to the
database. Stats are aggregated across replicas for telemetry reporting,
then reset when a new reporting period begins. The tracker implementation
and plumbing will be done in a subsequent commit/PR.

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-27 13:29:21 -07:00
George K c352a51b22 fix(coderd): authorize workspace start/stop/delete by transition action (#21691)
Use transition-specific actions when authorizing workspace build
parameter inserts in the database layer so start/stop/delete do not
require workspace.update.

Related to: https://github.com/coder/internal/issues/1299
2026-01-27 09:08:12 -08:00
Mathias Fredriksson 25d7f27cdb feat(coderd): add task log snapshot storage endpoint (#21644)
This change adds a POST /workspaceagents/me/tasks/{task}/log-snapshot
endpoint for agents to upload task conversation history during
workspace shutdown. This allows users to view task logs even when the
workspace is stopped.

The endpoint accepts agentapi format payloads (typically last 10
messages, max 64KB), wraps them in a format envelope, and upserts to the
task_snapshots table. Uses agent token auth and validates the task
belongs to the agent's workspace.

Closes coder/internal#1253
2026-01-27 11:09:24 +02:00
Spike Curtis f47f89d997 chore: remove unused tailnet v1 tables and queries (#21646)
Removes the legacy tailnet v1 API tables (`tailnet_clients`, `tailnet_agents`, `tailnet_client_subscriptions`) and their associated queries, triggers, and functions. These were superseded by the v2 tables (`tailnet_peers`, `tailnet_tunnels`) in migration 000168, and the v1 API code was removed in commit d6154c4310, but the database artifacts were never cleaned up.

**Changes:**
- New migration `000410_remove_tailnet_v1_tables` to drop the unused tables
- Removed 11 unused queries from `tailnet.sql`
- Removed associated manual wrapper methods in `dbauthz` and `dbmetrics`
- ~930 lines deleted across 11 files
2026-01-26 14:27:17 +04:00
Callum Styan e195856c43 perf: reduce pg_notify call volume by batching together agent metadata updates (#21330)
---------

Signed-off-by: Callum Styan <callumstyan@gmail.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-22 22:47:49 -08:00
Mathias Fredriksson 97e8a5b093 fix(coderd): allow agent auth during workspace shutdown (#21538)
Agents were losing authentication during workspace shutdown, causing
shutdown scripts to fail. The auth query required agents to belong to
the latest build, but during shutdown a `stop` build becomes latest while
the `start` build's agents are still running.

Modified the auth query to allow `start` build agents to authenticate
temporarily during `stop` execution. The query allows auth when:

- Agent's `start` build job succeeded
- Latest build is `stop` with `pending`/`running` job status
- Builds are adjacent (`stop` is `build_number + 1`)
- Template versions match

Auth closes once `stop` completes.

Renamed `GetWorkspaceAgentAndLatestBuildByAuthToken` to
`GetAuthenticatedWorkspaceAgentAndBuildByAuthToken` since it returns the
agent's build (not always latest) during shutdown.

Closes coder/internal#1249
Fixes #19467
2026-01-21 13:18:43 +00:00
Cian Johnston 08343a7a9f perf: reduce number of queries made by /api/v2/workspaceagents/{id} (#21522)
Relates to https://github.com/coder/internal/issues/1214

The `ExtractWorkspaceAgentParam` middleware ends up making 4 database
queries to follow the chain of `WorkspaceAgent` -> `WorkspaceResource`
-> `ProvisionerJob` -> `WorkspaceBuild` -- but then dropping all that
hard work on the floor. The `api.workspaceAgent` handler that references
this middleware then has to do all of that work again, plus one more
query to get the related `User` so we can get the username. This pattern
is also mirrored in `getDatabaseTerminal` but without the middleware.

This PR:
* Adds a new query `GetWorkspaceAgentAndWorkspaceByID` to fetch all
this information at once to avoid the multiple round-trips,
* Updates the existing usage of `GetWorkspaceAgentByID` to this new
query instead,
* Updates `ExtractWorkspaceAgentParam` to also store the workspace in
the request context

Dalibo: [0.63ms](https://explain.dalibo.com/plan/40bb597f3539gc6c)
2026-01-19 12:36:33 +00:00
George K 0712faef4f feat(enterprise): implement organization "disable workspace sharing" option (#21376)
Adds a per-organization setting to disable workspace sharing. When enabled,
all existing workspace ACLs in the organization are cleared and the workspace
ACL mutation API endpoints return `403 Forbidden`.

This complements the existing site-wide `--disable-workspace-sharing` flag by
providing more granular control at the organization level.

Closes https://github.com/coder/internal/issues/1073 (part 2)

---------

Co-authored-by: Steven Masley <Emyrk@users.noreply.github.com>
2026-01-14 09:47:50 -08:00
George K cc2efe9e1f feat(coderd/rbac): make organization-member a per-org system custom role (#21359)
Migrated the built-in organization-member role to DB storage so it can be customized per org.

Closes https://github.com/coder/internal/issues/1073 (part 1)
2026-01-12 18:19:19 -08:00
Spike Curtis 49b34a716a fix: fix slog to always use array of Fields (#21426)
Upgrades to slog v3 which includes a small, but backward incompatible API change to the acceptible call arguments when logging. This change allows us to verify via compile time type checking that arguments are correct and won't cause a panic, as was possible in slog v1, which this replaces (v2 was tagged but never used in coder/coder).

It also updates dependencies that also use slog and were updated.

I've left the `aibridge` dependency as a commit SHA, under the assumption that the team there (cc @pawbana @dannykopping ) will tag and update the dependency soon and on their own schedule.

Other dependencies, I pushed new tags.
2026-01-08 10:29:41 +04:00
Jake Howell ea00e72063 feat: add rbac specificity for dbpurge (#21088)
Related to
[`internal#1139`](https://github.com/coder/internal/issues/1139)

Continuation of #21074 

This implements some RBAC role specificity for `dbpurge`, ensuring that
we follow the least-privileged model for removing data from the
database. It is specified as following.

```go
Site: rbac.Permissions(map[string][]policy.Action{
	// DeleteOldWorkspaceAgentLogs
	// DeleteOldWorkspaceAgentStats
	// DeleteOldProvisionerDaemons
	// DeleteOldTelemetryLocks
	// DeleteOldAuditLogConnectionEvents
	// DeleteOldConnectionLogs
	rbac.ResourceSystem.Type: {policy.ActionDelete},
	// DeleteOldNotificationMessages
	rbac.ResourceNotificationMessage.Type: {policy.ActionDelete},
	// ExpirePrebuildsAPIKeys
	// DeleteExpiredAPIKeys
	rbac.ResourceApiKey.Type: {policy.ActionDelete},
	// DeleteOldAIBridgeRecords
	rbac.ResourceAibridgeInterception.Type: {policy.ActionDelete},
}),
```

| Position | Pull-request |
| -------- | ------------ |
| | [feat: add prometheus observability metrics for
`dbpurge`](https://github.com/coder/coder/pull/21074) |
|  | [feat: add rbac specificity for
`dbpurge`](https://github.com/coder/coder/pull/21088) |
2025-12-20 01:02:39 +11:00
Callum Styan 8ed1c1d372 perf: reduce calls to GetWorkspaceByAgentID in GetWorkspaceAgentByID (#21046)
This PR piggy backs on the agent API cached workspace added in an earlier PR to provide a fast path for avoiding `GetWorkspaceByAgentID` calls in dbauthz's `GetWorkspaceAgentByID`. This query is not the most expensive, but has a significant call volume at ~16 million calls per week.

Signed-off-by: Callum Styan <callumstyan@gmail.com>
2025-12-10 14:03:24 -08:00
Callum Styan 27c3ec072e perf: support fastpath in dbauthz GetLatestWorkspaceBuildByWorkspaceID (#21047)
This PR piggy backs on the agent API cached workspace added in earlier PRs to provide a fast path for avoiding `GetWorkspaceByID` calls in `GetLatestWorkspaceBuildByWorkspaceID` via injection of the workspaces RBAC object into the context. We can do this from the `agentConnectionMonitor` easily since we already cache the workspace.

---------

Signed-off-by: Callum Styan <callumstyan@gmail.com>
2025-12-09 15:53:52 -08:00
Mathias Fredriksson ad93262d07 fix(coderd/database/dbpurge): allow disabling AI Bridge retention with 0 (#21062)
Previously setting AI Bridge retention to 0 would cause records to be
deleted immediately since we didn't check for the zero value before
calculating the deletion threshold.

This adds a check for aibridgeRetention > 0 to skip deletion when
retention is disabled, matching the pattern used for other retention
settings (connection logs, audit logs, etc.).

Also fixes the return type of DeleteOldAIBridgeRecords from int32 to
int64 since COUNT(*) returns bigint in PostgreSQL.

Refs #21055
2025-12-03 09:37:18 +00:00
Mathias Fredriksson ff46917e62 feat: add retention config for workspace_agent_logs (#21039)
Replace hardcoded 7-day retention for workspace agent logs with
configurable retention from deployment settings. Defaults to 7d to
preserve existing behavior.

Depends on #21038
Updates #20743
2025-12-02 16:01:33 +00:00
Mathias Fredriksson c85d79bcdb feat(coderd/database/dbpurge): add retention for audit logs (#21025)
Add configurable retention policy for audit logs. The DeleteOldAuditLogs
query excludes deprecated connection events (connect, disconnect, open,
close) which are handled separately by DeleteOldAuditLogConnectionEvents.

Disabled (0) by default.

Depends on #21021
Updates #20743
2025-12-02 16:50:09 +02:00
Mathias Fredriksson 9ebcca5b0d feat(coderd/database/dbpurge): add retention for connection logs (#21022)
Add `DeleteOldConnectionLogs` query and integrate it into the `dbpurge`
routine. Retention is controlled by `--retention-connection-logs` flag.
Disabled (0) by default.

Depends on #21021
Updates #20743
2025-12-02 14:17:52 +00:00
Susana Ferreira f8d9a8046f feat: add notification warning alert to Tasks page (#20900)
## Problem

Users may not realize that task notifications are disabled by default.
To improve awareness, we show a warning alert on the Tasks page when all
task notifications are disabled.

**Alert visibility logic:**
- Shows when **all** task notification templates (Task Working, Task
Idle, Task Completed, Task Failed) are disabled
- Can be dismissed by the user, which stores the dismissal in the user
preferences API
- If the user later enables any task notification in Account Settings,
the dismissal state is cleared so the alert will show again if they
disable all notifications in the future

<img width="2980" height="1588" alt="Screenshot 2025-11-25 at 17 48 17"
src="https://github.com/user-attachments/assets/316bf097-d9d2-4489-bc16-2987ba45f45c"
/>

## Changes

- Added a warning alert to the Tasks page when all task notifications
are disabled
- Introduced new `/users/{user}/preferences` endpoint to manage user
preferences (stored in `user_configs` table)
- Alert is dismissible and stores the dismissal state via the new user
preferences API endpoint
- Enabling any task notification in Account Settings clears the
dismissal state via the preferences API
- Added comprehensive Storybook stories for both TasksPage and
NotificationsPage to test all alert visibility states and interactions

Closes: https://github.com/coder/internal/issues/1089
2025-11-28 16:50:59 +00:00
Callum Styan b0e8384b82 perf: reduce DB calls to GetWorkspaceByAgentID via caching workspace info (#20662)
---------

Signed-off-by: Callum Styan <callumstyan@gmail.com>
2025-11-25 14:45:05 -08:00
Mathias Fredriksson 37fc6646ad perf(coderd/database): limit GetLatestWorkspaceAppStatusByAppID to 1 row (#20917)
## Description

This PR fixes an issue where `GetLatestWorkspaceAppStatusesByAppID`
returned an unbounded number of rows for a given app ID, which could
cause performance issues for noisy or long-running AI tasks.

## Impact

This change reduces database query overhead for workspace app status
updates, particularly for busy AI tasks that update their status
frequently. Previously, fetching the latest status would return all
historical statuses, now it returns only the most recent one.

Fixes #20862

---

🤖 This change was written by Claude Sonnet 4.5 Thinking using [mux](https://github.com/coder/mux) and reviewed by a human 🏄🏻‍♂️
2025-11-25 16:56:42 +02:00
Danielle Maywood 82f525baf3 feat(coderd): add task prompt modification endpoint (#20811)
This PR adds the backend implementation for modifying task prompts. Part
of https://github.com/coder/internal/issues/1084

## Changes

- New `UpdateTaskPrompt` database query to update task prompts
- New PATCH `/api/v2/tasks/{task}/prompt` endpoint

## Notes

This is part 1 of a 2-part PR stack. The frontend UI will be added in a
follow-up PR based on this branch
(https://github.com/coder/coder/pull/20812).

---

🤖 PR was written by Claude Sonnet 4.5 Thinking using [Coder
Mux](https://github.com/coder/cmux) and reviewed by a human 👩
2025-11-25 11:13:32 +00:00
Danielle Maywood c12303f0b2 fix: allow agents to be created on dormant workspaces (#20909)
Closes https://github.com/coder/coder/issues/20711

We now allow agents to be created on dormant workspaces.

I've ran the test with and without the change. I've confirmed that -
without the fix - it triggers the "rbac: unauthorized" error.
2025-11-25 06:24:33 +00:00
Steven Masley cefe07d074 feat: purge expired api keys in dbpurge (#20863)
closes https://github.com/coder/coder/issues/19889

This is in response to a migration in v2.27 that takes very long on deployments with large `api_key` tables.
2025-11-24 10:24:32 -06:00
Atif Ali 636408906f chore(docs): standardize "AIBridge" to "AI Bridge" in documentation (#20831) 2025-11-24 18:09:04 +05:00
Danny Kopping 5a7d4f69f6 feat: add configurable retention for aibridge (#20828)
Closes https://github.com/coder/internal/issues/1134

---------

Signed-off-by: Danny Kopping <danny@coder.com>
2025-11-21 11:35:36 +02:00
Marcin Tojek d004710a74 feat: add prebuild invalidation via last_invalidated_at timestamp (#20582)
Updates #17917
2025-11-20 17:12:25 +01:00