Commit Graph

677 Commits

Author SHA1 Message Date
Zach 7dfa33b410 feat: add boundary usage tracking database schema and tracker skeleton (#21670)
feat: add boundary usage telemetry database schema and RBAC

Adds the foundation for tracking boundary usage telemetry across Coder
replicas. This includes:

  - Database schema: `boundary_usage_stats` table with per-replica stats
    (unique workspaces, unique users, allowed/denied request counts)
  - Database queries: upsert stats, get aggregated summary, reset stats,
    delete by replica ID
  - RBAC: `boundary_usage` resource type with read/update/delete actions,
    accessible only via system `BoundaryUsageTracker` subject (not regular
    user roles)
  - Tracker skeleton + docs: stub implementation in `coderd/boundaryusage/`

The tracker accumulates stats in memory and periodically flushes to the
database. Stats are aggregated across replicas for telemetry reporting,
then reset when a new reporting period begins. The tracker implementation
and plumbing will be done in a subsequent commit/PR.

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-27 13:29:21 -07:00
Mathias Fredriksson 25d7f27cdb feat(coderd): add task log snapshot storage endpoint (#21644)
This change adds a POST /workspaceagents/me/tasks/{task}/log-snapshot
endpoint for agents to upload task conversation history during
workspace shutdown. This allows users to view task logs even when the
workspace is stopped.

The endpoint accepts agentapi format payloads (typically last 10
messages, max 64KB), wraps them in a format envelope, and upserts to the
task_snapshots table. Uses agent token auth and validates the task
belongs to the agent's workspace.

Closes coder/internal#1253
2026-01-27 11:09:24 +02:00
Spike Curtis f47f89d997 chore: remove unused tailnet v1 tables and queries (#21646)
Removes the legacy tailnet v1 API tables (`tailnet_clients`, `tailnet_agents`, `tailnet_client_subscriptions`) and their associated queries, triggers, and functions. These were superseded by the v2 tables (`tailnet_peers`, `tailnet_tunnels`) in migration 000168, and the v1 API code was removed in commit d6154c4310, but the database artifacts were never cleaned up.

**Changes:**
- New migration `000410_remove_tailnet_v1_tables` to drop the unused tables
- Removed 11 unused queries from `tailnet.sql`
- Removed associated manual wrapper methods in `dbauthz` and `dbmetrics`
- ~930 lines deleted across 11 files
2026-01-26 14:27:17 +04:00
Callum Styan e195856c43 perf: reduce pg_notify call volume by batching together agent metadata updates (#21330)
---------

Signed-off-by: Callum Styan <callumstyan@gmail.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-22 22:47:49 -08:00
Mathias Fredriksson 97e8a5b093 fix(coderd): allow agent auth during workspace shutdown (#21538)
Agents were losing authentication during workspace shutdown, causing
shutdown scripts to fail. The auth query required agents to belong to
the latest build, but during shutdown a `stop` build becomes latest while
the `start` build's agents are still running.

Modified the auth query to allow `start` build agents to authenticate
temporarily during `stop` execution. The query allows auth when:

- Agent's `start` build job succeeded
- Latest build is `stop` with `pending`/`running` job status
- Builds are adjacent (`stop` is `build_number + 1`)
- Template versions match

Auth closes once `stop` completes.

Renamed `GetWorkspaceAgentAndLatestBuildByAuthToken` to
`GetAuthenticatedWorkspaceAgentAndBuildByAuthToken` since it returns the
agent's build (not always latest) during shutdown.

Closes coder/internal#1249
Fixes #19467
2026-01-21 13:18:43 +00:00
Cian Johnston 08343a7a9f perf: reduce number of queries made by /api/v2/workspaceagents/{id} (#21522)
Relates to https://github.com/coder/internal/issues/1214

The `ExtractWorkspaceAgentParam` middleware ends up making 4 database
queries to follow the chain of `WorkspaceAgent` -> `WorkspaceResource`
-> `ProvisionerJob` -> `WorkspaceBuild` -- but then dropping all that
hard work on the floor. The `api.workspaceAgent` handler that references
this middleware then has to do all of that work again, plus one more
query to get the related `User` so we can get the username. This pattern
is also mirrored in `getDatabaseTerminal` but without the middleware.

This PR:
* Adds a new query `GetWorkspaceAgentAndWorkspaceByID` to fetch all
this information at once to avoid the multiple round-trips,
* Updates the existing usage of `GetWorkspaceAgentByID` to this new
query instead,
* Updates `ExtractWorkspaceAgentParam` to also store the workspace in
the request context

Dalibo: [0.63ms](https://explain.dalibo.com/plan/40bb597f3539gc6c)
2026-01-19 12:36:33 +00:00
George K 0712faef4f feat(enterprise): implement organization "disable workspace sharing" option (#21376)
Adds a per-organization setting to disable workspace sharing. When enabled,
all existing workspace ACLs in the organization are cleared and the workspace
ACL mutation API endpoints return `403 Forbidden`.

This complements the existing site-wide `--disable-workspace-sharing` flag by
providing more granular control at the organization level.

Closes https://github.com/coder/internal/issues/1073 (part 2)

---------

Co-authored-by: Steven Masley <Emyrk@users.noreply.github.com>
2026-01-14 09:47:50 -08:00
George K cc2efe9e1f feat(coderd/rbac): make organization-member a per-org system custom role (#21359)
Migrated the built-in organization-member role to DB storage so it can be customized per org.

Closes https://github.com/coder/internal/issues/1073 (part 1)
2026-01-12 18:19:19 -08:00
Steven Masley 89f4d60e7b chore: remove experiment "terraform-directory-reuse" (#21397)
Experiment is no longer required, the new method will be released without an experiment and without a toggle

Main PR is: https://github.com/coder/coder/pull/21398
2026-01-09 11:13:16 -06:00
Spike Curtis bd753d9cb9 fix: mark users seen when activating on login (#21305)
fixes #21303

Update user last_seen_at when we mark them active on login. This prevents a narrow race where they can be re-marked dormant and fail to log in.
2025-12-17 16:49:40 +04:00
George K 103967ed02 feat: add sharing info to /workspaces endpoint (#21049)
closes: https://github.com/coder/internal/issues/858

Similar to https://github.com/coder/coder/pull/19375, this one uses
system permissions for fetching actual user and group data.

Modifies the `workspaces_expanded` view to fetch the required data; this way it's made available to all code paths that make use of it.  

Also fixes a bug in a test helper function that can result in `null` being saved to the DB for `user_acl` or `group_acl` and break tests; a defensive check constraint that prevents this is worth a PR, e.g:

`ALTER TABLE workspaces
   ADD CONSTRAINT group_acl_is_object CHECK (jsonb_typeof(group_acl) = 'object');`

Also adds missing  `OwnerName` in `ConvertWorkspaceRows`.
2025-12-15 08:42:08 -08:00
Mathias Fredriksson 761dd55ee8 fix(coderd/database): sort template version variables and fix test flake (#21233)
Previously the GetTemplateVersionVariables query did not sort output,
relying on PostgreSQL on-disk ordering which is undeterministic.

Variables are now sorted by name because there is no alternative for
ordering.

Tests were adjusted to accommodate the new ordering, previously they
relied on data being written to disk in insert order.
2025-12-12 11:41:46 +00:00
Callum Styan a59a84b2a7 perf: optimize GetTemplateAppInsightsByTemplate by pre-filtering on start/end times (#20669)
In this PR we're optimizing the `GetTemplateAppInsightsByTemplate` query
by pre-filtering out apps which do not have an active session during the
start/end time window.

---------

Signed-off-by: Callum Styan <callumstyan@gmail.com>
2025-12-09 15:21:16 -08:00
Callum Styan 6abb889fab perf: optimize GetDeploymentWorkspaceAgentStats by eliminating 2nd select (#21112)
Tracking issue here: https://github.com/coder/internal/issues/1009

To summarize, the current version of this query selects from
`workspace_agent_stats` twice. The expensive portion of this query is
the bitmap heap scan we have to do for each of these selects. We can
easily cut the cost of this query by 40-50% by cutting this down to a
single select, and using those rows for both sets of calculations.

Eliminating the heap scan itself would require a follow up PR to
introduce a new index. Blink helped with the rewrite of the query.

The current plan looks like this:
```
 Nested Loop  (cost=6101.64..6101.69 rows=1 width=64) (actual time=11.782..11.787 rows=1 loops=1)
   ->  Aggregate  (cost=2996.17..2996.19 rows=1 width=32) (actual time=3.356..3.357 rows=1 loops=1)
         ->  Bitmap Heap Scan on workspace_agent_stats  (cost=54.80..2992.86 rows=440 width=24) (actu
al time=0.346..2.927 rows=818 loops=1)
               Recheck Cond: (created_at > (now() - '00:15:00'::interval))
               Filter: (connection_median_latency_ms > '0'::double precision)
               Rows Removed by Filter: 1070
               Heap Blocks: exact=486
               ->  Bitmap Index Scan on idx_agent_stats_created_at  (cost=0.00..54.69 rows=1368 width
=0) (actual time=0.241..0.241 rows=1888 loops=1)
                     Index Cond: (created_at > (now() - '00:15:00'::interval))
   ->  Aggregate  (cost=3105.47..3105.49 rows=1 width=32) (actual time=8.418..8.420 rows=1 loops=1)
         ->  Subquery Scan on a  (cost=3060.95..3105.39 rows=7 width=32) (actual time=7.851..8.394 ro
ws=63 loops=1)
               Filter: (a.rn = 1)
               ->  WindowAgg  (cost=3060.95..3088.29 rows=1368 width=209) (actual time=7.850..8.382 r
ows=63 loops=1)
                     Run Condition: (row_number() OVER (?) <= 1)
                     ->  Sort  (cost=3060.93..3064.35 rows=1368 width=56) (actual time=7.836..8.036 r
ows=1888 loops=1)
                           Sort Key: workspace_agent_stats_1.agent_id, workspace_agent_stats_1.create
d_at DESC
                           Sort Method: quicksort  Memory: 181kB
                           ->  Bitmap Heap Scan on workspace_agent_stats workspace_agent_stats_1  (co
st=55.03..2989.67 rows=1368 width=56) (actual time=0.388..2.096 rows=1888 loops=1)
                                 Recheck Cond: (created_at > (now() - '00:15:00'::interval))
                                 Heap Blocks: exact=486
                                 ->  Bitmap Index Scan on idx_agent_stats_created_at  (cost=0.00..54.
69 rows=1368 width=0) (actual time=0.295..0.295 rows=1888 loops=1)
                                       Index Cond: (created_at > (now() - '00:15:00'::interval))
 Planning Time: 2.350 ms
 Execution Time: 13.152 ms
(24 rows)
```

The new plan looks like this
```
 Aggregate  (cost=2966.96..2966.98 rows=1 width=64) (actual time=3.812..3.814 rows=1 loops=1)
   ->  WindowAgg  (cost=2891.96..2916.94 rows=1250 width=88) (actual time=2.696..3.412 rows=1890 loop
s=1)
         ->  Sort  (cost=2891.94..2895.06 rows=1250 width=80) (actual time=2.686..2.780 rows=1890 loo
ps=1)
               Sort Key: workspace_agent_stats.agent_id, workspace_agent_stats.created_at DESC
               Sort Method: quicksort  Memory: 226kB
               ->  Bitmap Heap Scan on workspace_agent_stats  (cost=50.11..2827.64 rows=1250 width=80
) (actual time=0.218..1.551 rows=1890 loops=1)
                     Recheck Cond: (created_at > (now() - '00:15:00'::interval))
                     Heap Blocks: exact=474
                     ->  Bitmap Index Scan on idx_agent_stats_created_at  (cost=0.00..49.80 rows=1250
 width=0) (actual time=0.146..0.147 rows=1890 loops=1)
                           Index Cond: (created_at > (now() - '00:15:00'::interval))
 Planning Time: 0.534 ms
 Execution Time: 3.969 ms
(12 rows)
```

If we compare the results of the query they're similar enough that any
differences can be attributed to slightly different timestamps for
`now()` in the version of the query I am using to generate results for
comparison:
```
 workspace_rx_bytes | workspace_tx_bytes | workspace_connection_latency_50 | workspace_connection_latency_95 | session_count_vscode | session_count_ssh | session_count_jetbrains | session_count_reconnecting_pty 
--------------------+--------------------+---------------------------------+---------------------------------+----------------------+-------------------+-------------------------+--------------------------------
           15263563 |           74555854 |                          47.933 |                        250.5522 |                  239 |                59 |                       3 |                              3
(1 row)

 workspace_rx_bytes | workspace_tx_bytes | workspace_connection_latency_50 | workspace_connection_latency_95 | session_count_vscode | session_count_ssh | session_count_jetbrains | session_count_reconnecting_pty 
--------------------+--------------------+---------------------------------+---------------------------------+----------------------+-------------------+-------------------------+--------------------------------
           15295819 |           74598410 |                          47.933 |                        250.5522 |                  239 |                59 |                       3 |                              3           
```

---------

Signed-off-by: Callum Styan <callumstyan@gmail.com>
2025-12-09 15:19:55 -08:00
Mathias Fredriksson cfdd4a9b88 perf(coderd/database): add index on workspace_app_statuses.app_id (#21099) 2025-12-04 17:56:13 +02:00
Mathias Fredriksson ad93262d07 fix(coderd/database/dbpurge): allow disabling AI Bridge retention with 0 (#21062)
Previously setting AI Bridge retention to 0 would cause records to be
deleted immediately since we didn't check for the zero value before
calculating the deletion threshold.

This adds a check for aibridgeRetention > 0 to skip deletion when
retention is disabled, matching the pattern used for other retention
settings (connection logs, audit logs, etc.).

Also fixes the return type of DeleteOldAIBridgeRecords from int32 to
int64 since COUNT(*) returns bigint in PostgreSQL.

Refs #21055
2025-12-03 09:37:18 +00:00
Mathias Fredriksson ff46917e62 feat: add retention config for workspace_agent_logs (#21039)
Replace hardcoded 7-day retention for workspace agent logs with
configurable retention from deployment settings. Defaults to 7d to
preserve existing behavior.

Depends on #21038
Updates #20743
2025-12-02 16:01:33 +00:00
Mathias Fredriksson 9ec90cf2e7 feat(coderd/database/dbpurge): make API keys retention configurable (#21037)
Replace hardcoded 7-day retention for expired API keys with configurable
retention from deployment settings. Skips deletion entirely when effective
retention is 0.

Depends on #21021
Updates #20743
2025-12-02 15:41:38 +00:00
Mathias Fredriksson c85d79bcdb feat(coderd/database/dbpurge): add retention for audit logs (#21025)
Add configurable retention policy for audit logs. The DeleteOldAuditLogs
query excludes deprecated connection events (connect, disconnect, open,
close) which are handled separately by DeleteOldAuditLogConnectionEvents.

Disabled (0) by default.

Depends on #21021
Updates #20743
2025-12-02 16:50:09 +02:00
Mathias Fredriksson 9ebcca5b0d feat(coderd/database/dbpurge): add retention for connection logs (#21022)
Add `DeleteOldConnectionLogs` query and integrate it into the `dbpurge`
routine. Retention is controlled by `--retention-connection-logs` flag.
Disabled (0) by default.

Depends on #21021
Updates #20743
2025-12-02 14:17:52 +00:00
Susana Ferreira f8d9a8046f feat: add notification warning alert to Tasks page (#20900)
## Problem

Users may not realize that task notifications are disabled by default.
To improve awareness, we show a warning alert on the Tasks page when all
task notifications are disabled.

**Alert visibility logic:**
- Shows when **all** task notification templates (Task Working, Task
Idle, Task Completed, Task Failed) are disabled
- Can be dismissed by the user, which stores the dismissal in the user
preferences API
- If the user later enables any task notification in Account Settings,
the dismissal state is cleared so the alert will show again if they
disable all notifications in the future

<img width="2980" height="1588" alt="Screenshot 2025-11-25 at 17 48 17"
src="https://github.com/user-attachments/assets/316bf097-d9d2-4489-bc16-2987ba45f45c"
/>

## Changes

- Added a warning alert to the Tasks page when all task notifications
are disabled
- Introduced new `/users/{user}/preferences` endpoint to manage user
preferences (stored in `user_configs` table)
- Alert is dismissible and stores the dismissal state via the new user
preferences API endpoint
- Enabling any task notification in Account Settings clears the
dismissal state via the preferences API
- Added comprehensive Storybook stories for both TasksPage and
NotificationsPage to test all alert visibility states and interactions

Closes: https://github.com/coder/internal/issues/1089
2025-11-28 16:50:59 +00:00
Danielle Maywood e7dbbcde87 fix: do not notify marked for deletion for deleted workspaces (#20937)
Closes https://github.com/coder/coder/issues/20913

I've ran the test without the fix, verified the test caught the issue,
then applied the fix, and confirmed the issue no longer happens.

---

🤖 PR was initially written by Claude Opus 4.5 Thinking using Claude Code
and then review by a human 👩
2025-11-26 09:23:16 +00:00
Mathias Fredriksson 37fc6646ad perf(coderd/database): limit GetLatestWorkspaceAppStatusByAppID to 1 row (#20917)
## Description

This PR fixes an issue where `GetLatestWorkspaceAppStatusesByAppID`
returned an unbounded number of rows for a given app ID, which could
cause performance issues for noisy or long-running AI tasks.

## Impact

This change reduces database query overhead for workspace app status
updates, particularly for busy AI tasks that update their status
frequently. Previously, fetching the latest status would return all
historical statuses, now it returns only the most recent one.

Fixes #20862

---

🤖 This change was written by Claude Sonnet 4.5 Thinking using [mux](https://github.com/coder/mux) and reviewed by a human 🏄🏻‍♂️
2025-11-25 16:56:42 +02:00
Susana Ferreira 3011207519 feat: add display name field for tasks (#20856)
## Problem

Tasks currently only expose a machine-friendly name field (e.g.
`task-python-debug-a1b2`), but this value is primarily an identifier
rather than a clean, descriptive label. We need a separate
display-friendly name for use in the UI.

This PR introduces a new `display_name` field and updates the task-name
generation flow. The Claude system prompt was updated to return valid
JSON with both `name` and `display_name`. The name generation logic
follows a fallback chain (Anthropic > prompt sanitization > random
fallback). To make task names more closely resemble their display names,
the legacy `task-` prefix has been removed. For context, PR
https://github.com/coder/coder/pull/20834 introduced a small Task icon
to the workspace list to help identify workspaces associated to tasks.

## Changes

- Database migration: Added `display_name` column to tasks table
- Updated system prompt to generate both task name and display name as
valid JSON
- Task name generation now follows a fallback chain: Anthropic > prompt
sanitization > random fallback
- Removed `task-` prefix from task names to allow more descriptive names
- Note: PR https://github.com/coder/coder/pull/20834 adds a Task icon to
workspaces in the workspace list to distinguish task-created workspaces

**Note:** UI changes will be addressed in a follow-up PR

Related to: https://github.com/coder/coder/issues/20801
2025-11-25 13:00:59 +00:00
Danielle Maywood 82f525baf3 feat(coderd): add task prompt modification endpoint (#20811)
This PR adds the backend implementation for modifying task prompts. Part
of https://github.com/coder/internal/issues/1084

## Changes

- New `UpdateTaskPrompt` database query to update task prompts
- New PATCH `/api/v2/tasks/{task}/prompt` endpoint

## Notes

This is part 1 of a 2-part PR stack. The frontend UI will be added in a
follow-up PR based on this branch
(https://github.com/coder/coder/pull/20812).

---

🤖 PR was written by Claude Sonnet 4.5 Thinking using [Coder
Mux](https://github.com/coder/cmux) and reviewed by a human 👩
2025-11-25 11:13:32 +00:00
Jake Howell ca560d36ce fix: remove inflight interceptions from aibridge returned values (#20852)
Addresses [`aibridge#54`](https://github.com/coder/aibridge/issues/54)

When querying against the values in the database for
`/api/experimental/aibridge/interceptions` we found strange behaviour
wherein there was interceptions that lacked prompting and other various
fields we want. Generally this was as a result of the data not actually
existing for these values (as they were inflight).

The simple solution to this was to hide them if they didn't exist. This
PR addresses that.

---------

Co-authored-by: Danny Kopping <danny@coder.com>
2025-11-25 10:23:39 +11:00
Steven Masley cefe07d074 feat: purge expired api keys in dbpurge (#20863)
closes https://github.com/coder/coder/issues/19889

This is in response to a migration in v2.27 that takes very long on deployments with large `api_key` tables.
2025-11-24 10:24:32 -06:00
Atif Ali 636408906f chore(docs): standardize "AIBridge" to "AI Bridge" in documentation (#20831) 2025-11-24 18:09:04 +05:00
Danny Kopping 5a7d4f69f6 feat: add configurable retention for aibridge (#20828)
Closes https://github.com/coder/internal/issues/1134

---------

Signed-off-by: Danny Kopping <danny@coder.com>
2025-11-21 11:35:36 +02:00
Marcin Tojek d004710a74 feat: add prebuild invalidation via last_invalidated_at timestamp (#20582)
Updates #17917
2025-11-20 17:12:25 +01:00
Steven Masley fe3b825b86 chore: per template opt into cached terraform directories (#20609)
For experimental and dogfood purposes, this adds the ability to opt in a single template. 
Leaving the rest of the templates as is. 

For GA, this setting might be removed or changed.
2025-11-13 14:04:12 -06:00
Paweł Banaszewski 991831b1dd chore: add API key ID to interceptions (#20513)
Adds APIKeyID to interceptions.
Needed for tracking API key usage with bridge.
fixes https://github.com/coder/coder/issues/20001
2025-11-10 13:46:41 +01:00
Mathias Fredriksson ce04f6cc5d fix(coderd): remove deprecated AITaskSidebarApp column (#20680)
This column was no longer used in `v2.28` and the codersdk field
deprecated. Both can now be dropped in `v2.29`.

Closes coder/internal#974
2025-11-07 12:45:45 +02:00
Cian Johnston 34f6e72879 feat(coderd): add lookup task by name in httpmw.TaskParam (#20647)
* Adds a `GetTaskByOwnerIDAndName` query
* Updates `httpmw.TaskParam` to fall back to task name if no task by
UUID found.
* Updates the `TaskByIdentifier` used in `cli/` to use direct lookup instead of searching.
2025-11-05 14:28:34 +00:00
Mathias Fredriksson 7ae3fdc749 refactor: use task data model for notifications (#20590)
Updates coder/internal#973
Updates coder/internal#974
2025-10-31 15:53:27 +02:00
Mathias Fredriksson 859e94d67a fix: deprecate codersdk.AITaskPromptParameterName and reduce usage (#20501)
Depends on coder/sqlc#1
Fixes coder/internal#979
Updates coder/internal#973
2025-10-29 18:59:12 +00:00
Cian Johnston 1ebc217624 fix: update task link AppStatus using task_id (#20543)
Fixes https://github.com/coder/coder/issues/20515

Alternative to https://github.com/coder/coder/pull/20519

Adds `task_id` to `workspaces_expanded` view and updates the "View Task"
link in `AppStatuses` component.

NOTE: this contains a migration
2025-10-29 15:45:45 +00:00
Susana Ferreira 7e8fcb4b0f perf: optimize prebuilds membership reconciliation to check orgs not presets (#20493)
## Description

The membership reconciliation ensures the prebuilds system user is a
member of all organizations with prebuilds configured. To support
prebuilds quota management, each organization must have a prebuilds
group that the system user belongs to.

## Problem

Previously, membership reconciliation iterated over all presets to check
and update membership status. This meant database queries
`GetGroupByOrgAndName` and `InsertGroupMember` were executed for each
preset. Since presets are unique combinations of `(organization,
template, template version, preset)`, this resulted in several redundant
checks for the same organization.

In dogfood, `InsertGroupMember` was called thousands of times per day,
even though memberships were already configured ([internal Grafana
dashboard link](https://grafana.dev.coder.com/goto/46MZ1UgDg?orgId=1))

<img width="5382" height="1788" alt="Screenshot 2025-10-28 at 16 01 36"
src="https://github.com/user-attachments/assets/757b7253-106f-4f72-8586-8e2ede9f18db"
/>

## Solution

This PR introduces `GetOrganizationsWithPrebuildStatus`, a single query
that returns:
* All unique organizations with prebuilds configured
* Whether the prebuilds user is a member of each organization
* Whether the prebuilds group exists in each organization
* Whether the prebuilds user is in the prebuilds group

The membership reconciliation logic now:
* Fetches status for all organizations in one query
* Only performs inserts for organizations missing required memberships
or groups
* Safely handles concurrent operations via unique constraint violations
* This reduces database load from `O(presets)` to `O(organizations)` per
reconciliation loop, with a single read query when everything is
configured.

## Changes

* Add `GetOrganizationsWithPrebuildStatus` SQL query
* Update `membership.ReconcileAll` to use organization-based
reconciliation instead of preset-based
* Update tests to reflect new behavior

Related to internal thread:
https://codercom.slack.com/archives/C07GRNNRW03/p1760535570381369
2025-10-29 14:24:29 +00:00
Susana Ferreira c3e3bb58f2 feat: delete pending canceled prebuilds (#20499)
## Description

PR https://github.com/coder/coder/pull/20387 introduced canceling
pending prebuild jobs from inactive template versions to avoid
provisioning obsolete workspaces. However, the associated prebuilds
remained in the database with "Canceled" status, visible in the UI.

This PR now orphan-deletes these canceled prebuilt workspaces. Since the
canceled jobs were never processed by a provisioner, no Terraform
resources were created, making orphan deletion safe.

Orphan deletion always creates a provisioner job, but behaves
differently based on provisioner availability:
- If no provisioner daemon is available, the job is immediately marked
as completed and the workspace is marked as deleted without any
provisioner processing
- If a provisioner daemon is available, it processes the delete job with
empty Terraform state (no actual resources to destroy)

The job cancellation and workspace deletion occur atomically in the same
transaction. We don't split this into two separate reconciliation runs
because there's no way to distinguish between system-canceled prebuilds
and user-canceled workspaces. If we deleted canceled workspaces in a
later run, we'd delete user-canceled workspaces that users may want to
keep for troubleshooting.

Note: This only applies to system-generated prebuilds from inactive
template versions.

## Changes

* Update `UpdatePrebuildProvisionerJobWithCancel` query to return job
ID, workspace ID, template ID, and template version preset ID
* Add `DeprovisionMode` enum to support orphan deletion in the provision
flow
* Update `ActionTypeCancelPending` handler to cancel jobs and
orphan-delete associated workspaces atomically
2025-10-29 10:37:28 +00:00
Mathias Fredriksson a1fa58ac17 fix: update dbgen and dbfake task creation and toolsdk test fixtures (#20508)
Depends on #20506
Fixes coder/internal#1103
2025-10-28 14:15:58 +02:00
Dean Sheather 5a3ceb38f0 chore: add aibridge data to telemetry (#20449)
- Adds a new table to keep track of which payloads have already been
reported since we only report for the last clock hour
- Adds a query to gather and aggregate all the data by
provider/model/client

Relates to https://github.com/coder/coder-telemetry-server/issues/27
2025-10-28 03:16:41 +11:00
Paweł Banaszewski 50ba223aa1 feat: add db query for setting interception ended_at field (#20437)
Adds UpdateAIBridgeInterceptionEnded query to mark interceptions as
done.
Needed for https://github.com/coder/internal/issues/1051
2025-10-27 09:51:37 +01:00
Susana Ferreira f6e86c6fdb feat: cancel pending prebuilds from non-active template versions (#20387)
## Description

This PR introduces an optimization to automatically cancel pending
prebuild-related jobs from non-active template versions in the
reconciliation loop.

## Problem

Currently, when a template is configured with more prebuild instances
than available provisioners, the provisioner queue can become flooded
with pending prebuild jobs. This issue is worsened when
provisioning/deprovisioning operations take a long time.

When the prebuild reconciliation loop generates jobs faster than
provisioners can process them, pending jobs accumulate in the queue.
Since prebuilt workspaces should always run the latest active template
version, pending prebuild jobs from non-active versions become obsolete
once a new version is promoted.

## Solution

The reconciliation loop cancels pending prebuild-related jobs from
non-active template versions that match the following criteria:

* Build number: 1 (initial build created by the reconciliation loop)
* Job status: `pending`
* Not yet picked up by a provisioner (`worker_id` is `NULL`)
* Owned by the prebuilds system user
* Workspace transition: `start`

This prevents the queue from being cluttered with stale prebuild jobs
that would provision workspaces on an outdated template version that
would consequently need to be deprovisioned.

## Changes

* Added new SQL query `CountPendingNonActivePrebuilds` to identify
presets with pending jobs from non-active versions
* Added new SQL query `UpdatePrebuildProvisionerJobWithCancel` to cancel
jobs for a specific preset
* New reconciliation action type `ActionTypeCancelPending` handles the
cancellation logic
* Cancellation is non-blocking: failures to cancel prebuild jobs are
logged as errors and don't prevent other reconciliation actions

## Follow-up PR

Canceling pending prebuild jobs leaves workspaces in a Canceled state.
While no Terraform resources need to be destroyed (since jobs were
canceled before provisioning started), these database records should
still be cleaned up. This will be addressed in a follow-up PR.

Closes: https://github.com/coder/coder/issues/20242
2025-10-24 15:27:49 +01:00
Mathias Fredriksson a106d67c07 feat(coderd): use task data model for list (#20394)
Updates coder/internal#976
2025-10-23 20:22:51 +03:00
Mathias Fredriksson 9855460524 feat(coderd): use new data model for task delete (#20334)
Updates coder/internal#976
2025-10-23 19:45:18 +03:00
Mathias Fredriksson 5c802c2627 feat(coderd): use task data model when creating a new task (#20275)
Updates coder/internal#976
2025-10-23 19:12:09 +03:00
Dean Sheather 69c2c40512 chore: add user details to aibridge interception list endpoint (#20397)
- Adds FK from `aibridge_interceptions.initiator_id` to `users.id`
- This is enforced by deleting any rows that don't have any users. Since
this is an experimental feature AND coder never deletes user rows I
think this is acceptable.
- Adds `name` as a property on `codersdk.MinimalUser`
- This matches the `visible_users` view in the database. I'm unsure why
`name` wasn't already included given that `username` is.
- Adds a new `initiator` field to `codersdk.AIBridgeInterception` which
contains `codersdk.MinimalUser` (ID, username, name, avatar URL)
- Removes `initiator_id` from `codersdk.AIBridgeInterception`
    - Should be fine since we're still in early access
2025-10-22 16:18:31 +11:00
Dean Sheather ea261a1f7c chore: add offset-based pagination support to aibridge list endpoint (#20393)
Necessary for the frontend to be able to paginate easily. Cursor
pagination is good for fetching all events, but doesn't play very well
when a pagination component gets involved.

Adds support for `?offset=x` to the existing endpoint. The cursor-based
pagination (`?after_id=x`) is still supported. The two pagination modes
are mutually exclusive, and are documented as such. If both are
supplied, the request will be rejected.

Also adds a `total` property to the response that contains the full
count of items matching the filter. We already have indices in place so
I don't think this will impact performance (or we can revisit it before
GA).
2025-10-21 11:50:00 +00:00
Callum Styan 141ef23c81 fix: introduce dedicated queries for workspaces and workspace agents metrics (#19786)
aid in differentiation between sources of calls to `GetWorkspaces` but introducing new queries for metrics specific use cases

---------

Signed-off-by: Callum Styan <callumstyan@gmail.com>
2025-10-17 13:40:10 -07:00
Cian Johnston 9f229370e7 feat(coderd/database): add ListTasks query (#20282)
Relates to https://github.com/coder/internal/issues/981

Adds a `ListTasks` query that allows filtering by OwnerID and OrganizationID.
2025-10-14 17:33:30 +01:00