coder

mirror of https://github.com/coder/coder.git synced 2026-06-03 13:08:25 +00:00

Author	SHA1	Message	Date
Jon Ayers	6035e45cb8	feat: add e2e workspace build duration metric (#21739 ) Adds coderd_template_workspace_build_duration_seconds histogram that tracks the full duration from workspace build creation to agent ready. This captures the complete user-perceived build time including provisioning and agent startup. The metric is emitted when the agent reports ready/error/timeout via the lifecycle API, ensuring each build is counted exactly once per replica.	2026-02-06 16:26:02 -06:00
Zach	a31e476623	fix: make boundary usage telemetry collection atomic (#21907 ) Previously, UpsertBoundaryUsageStats (INSERT...ON CONFLICT DO UPDATE) and GetAndResetBoundaryUsageSummary (DELETE...RETURNING) could race during telemetry period cutover. Without serialization, an upsert concurrent with the delete could lose data (deleted right after being written) or commit after the delete (miscounted in the next period). Both operations now acquire LockIDBoundaryUsageStats within a transaction to ensure a clean cutover.	2026-02-06 09:52:17 -07:00
Mathias Fredriksson	c60c373bc9	fix(coderd): clean up task snapshots on task deletion (#21949 ) Task snapshots were orphaned when tasks were soft-deleted. The `task_snapshots` table has an `ON DELETE CASCADE` foreign key, but that only fires on hard deletes. Modified DeleteTask to use a CTE that atomically soft-deletes the task and removes its snapshot in a single transaction. The query now returns just the task UUID instead of the full row. Closes coder/internal#1283	2026-02-06 11:55:33 +02:00
Danielle Maywood	af0e171595	feat(coderd/agentapi): support terraform-defined subagent ids (#21837 ) Update `coderd/agentapi` to handle pre-created sub agents	2026-02-04 15:33:48 +00:00
Zach	90aeea5649	fix: handle boundary usage across snapshots and flush races (#21805 ) Previously there were two issues that could cause incorrect boundary usage telemetry data. 1. Bad handling across snapshot intervals: After telemetry snapshot deleted the DB row, the next flush would INSERT the stale cumulative data (which included already-reported usage). This would then be overwritten by subsequent UPDATE flushes, causing the delta between the last snapshot and the reset to be lost (under-reporting usage). Additionally, if there was no new usage after the reset, the tracker would carry over all usage from the previous period into the next period (over-reporting usage). 2. Missed usage from a race condition: Track() calls between the first mutex unlock and second mutex lock in FlushToDB() were lost. The data wasn't included in the current flush (already snapshotted) and was wiped by the subsequent reset. This is likely low impact to overall usage numbers in the real world. Fix by tracking unique workspace/user deltas separately from cumulative values and always tracking delta allowed/denied requests. Deltas are used for INSERT (fresh start after reset), cumulative for UPDATE (accurate unique counts within a period). All counters reset atomically before the DB operation so Track() calls during the operation are preserved for the next flush.	2026-02-02 09:11:54 -07:00
Zach	7dfa33b410	feat: add boundary usage tracking database schema and tracker skeleton (#21670 ) feat: add boundary usage telemetry database schema and RBAC Adds the foundation for tracking boundary usage telemetry across Coder replicas. This includes: - Database schema: `boundary_usage_stats` table with per-replica stats (unique workspaces, unique users, allowed/denied request counts) - Database queries: upsert stats, get aggregated summary, reset stats, delete by replica ID - RBAC: `boundary_usage` resource type with read/update/delete actions, accessible only via system `BoundaryUsageTracker` subject (not regular user roles) - Tracker skeleton + docs: stub implementation in `coderd/boundaryusage/` The tracker accumulates stats in memory and periodically flushes to the database. Stats are aggregated across replicas for telemetry reporting, then reset when a new reporting period begins. The tracker implementation and plumbing will be done in a subsequent commit/PR. --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-27 13:29:21 -07:00
Mathias Fredriksson	25d7f27cdb	feat(coderd): add task log snapshot storage endpoint (#21644 ) This change adds a POST /workspaceagents/me/tasks/{task}/log-snapshot endpoint for agents to upload task conversation history during workspace shutdown. This allows users to view task logs even when the workspace is stopped. The endpoint accepts agentapi format payloads (typically last 10 messages, max 64KB), wraps them in a format envelope, and upserts to the task_snapshots table. Uses agent token auth and validates the task belongs to the agent's workspace. Closes coder/internal#1253	2026-01-27 11:09:24 +02:00
Spike Curtis	f47f89d997	chore: remove unused tailnet v1 tables and queries (#21646 ) Removes the legacy tailnet v1 API tables (`tailnet_clients`, `tailnet_agents`, `tailnet_client_subscriptions`) and their associated queries, triggers, and functions. These were superseded by the v2 tables (`tailnet_peers`, `tailnet_tunnels`) in migration 000168, and the v1 API code was removed in commit `d6154c4310`, but the database artifacts were never cleaned up. Changes: - New migration `000410_remove_tailnet_v1_tables` to drop the unused tables - Removed 11 unused queries from `tailnet.sql` - Removed associated manual wrapper methods in `dbauthz` and `dbmetrics` - ~930 lines deleted across 11 files	2026-01-26 14:27:17 +04:00
Callum Styan	e195856c43	perf: reduce pg_notify call volume by batching together agent metadata updates (#21330 ) --------- Signed-off-by: Callum Styan <callumstyan@gmail.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-01-22 22:47:49 -08:00
Mathias Fredriksson	97e8a5b093	fix(coderd): allow agent auth during workspace shutdown (#21538 ) Agents were losing authentication during workspace shutdown, causing shutdown scripts to fail. The auth query required agents to belong to the latest build, but during shutdown a `stop` build becomes latest while the `start` build's agents are still running. Modified the auth query to allow `start` build agents to authenticate temporarily during `stop` execution. The query allows auth when: - Agent's `start` build job succeeded - Latest build is `stop` with `pending`/`running` job status - Builds are adjacent (`stop` is `build_number + 1`) - Template versions match Auth closes once `stop` completes. Renamed `GetWorkspaceAgentAndLatestBuildByAuthToken` to `GetAuthenticatedWorkspaceAgentAndBuildByAuthToken` since it returns the agent's build (not always latest) during shutdown. Closes coder/internal#1249 Fixes #19467	2026-01-21 13:18:43 +00:00
Cian Johnston	08343a7a9f	perf: reduce number of queries made by /api/v2/workspaceagents/{id} (#21522 ) Relates to https://github.com/coder/internal/issues/1214 The `ExtractWorkspaceAgentParam` middleware ends up making 4 database queries to follow the chain of `WorkspaceAgent` -> `WorkspaceResource` -> `ProvisionerJob` -> `WorkspaceBuild` -- but then dropping all that hard work on the floor. The `api.workspaceAgent` handler that references this middleware then has to do all of that work again, plus one more query to get the related `User` so we can get the username. This pattern is also mirrored in `getDatabaseTerminal` but without the middleware. This PR: * Adds a new query `GetWorkspaceAgentAndWorkspaceByID` to fetch all this information at once to avoid the multiple round-trips, * Updates the existing usage of `GetWorkspaceAgentByID` to this new query instead, * Updates `ExtractWorkspaceAgentParam` to also store the workspace in the request context Dalibo: [0.63ms](https://explain.dalibo.com/plan/40bb597f3539gc6c)	2026-01-19 12:36:33 +00:00
George K	0712faef4f	feat(enterprise): implement organization "disable workspace sharing" option (#21376 ) Adds a per-organization setting to disable workspace sharing. When enabled, all existing workspace ACLs in the organization are cleared and the workspace ACL mutation API endpoints return `403 Forbidden`. This complements the existing site-wide `--disable-workspace-sharing` flag by providing more granular control at the organization level. Closes https://github.com/coder/internal/issues/1073 (part 2) --------- Co-authored-by: Steven Masley <Emyrk@users.noreply.github.com>	2026-01-14 09:47:50 -08:00
Mathias Fredriksson	ad93262d07	fix(coderd/database/dbpurge): allow disabling AI Bridge retention with 0 (#21062 ) Previously setting AI Bridge retention to 0 would cause records to be deleted immediately since we didn't check for the zero value before calculating the deletion threshold. This adds a check for aibridgeRetention > 0 to skip deletion when retention is disabled, matching the pattern used for other retention settings (connection logs, audit logs, etc.). Also fixes the return type of DeleteOldAIBridgeRecords from int32 to int64 since COUNT(*) returns bigint in PostgreSQL. Refs #21055	2025-12-03 09:37:18 +00:00
Mathias Fredriksson	ff46917e62	feat: add retention config for `workspace_agent_logs` (#21039 ) Replace hardcoded 7-day retention for workspace agent logs with configurable retention from deployment settings. Defaults to 7d to preserve existing behavior. Depends on #21038 Updates #20743	2025-12-02 16:01:33 +00:00
Mathias Fredriksson	c85d79bcdb	feat(coderd/database/dbpurge): add retention for audit logs (#21025 ) Add configurable retention policy for audit logs. The DeleteOldAuditLogs query excludes deprecated connection events (connect, disconnect, open, close) which are handled separately by DeleteOldAuditLogConnectionEvents. Disabled (0) by default. Depends on #21021 Updates #20743	2025-12-02 16:50:09 +02:00
Mathias Fredriksson	9ebcca5b0d	feat(coderd/database/dbpurge): add retention for connection logs (#21022 ) Add `DeleteOldConnectionLogs` query and integrate it into the `dbpurge` routine. Retention is controlled by `--retention-connection-logs` flag. Disabled (0) by default. Depends on #21021 Updates #20743	2025-12-02 14:17:52 +00:00
Susana Ferreira	f8d9a8046f	feat: add notification warning alert to Tasks page (#20900 ) ## Problem Users may not realize that task notifications are disabled by default. To improve awareness, we show a warning alert on the Tasks page when all task notifications are disabled. Alert visibility logic: - Shows when all task notification templates (Task Working, Task Idle, Task Completed, Task Failed) are disabled - Can be dismissed by the user, which stores the dismissal in the user preferences API - If the user later enables any task notification in Account Settings, the dismissal state is cleared so the alert will show again if they disable all notifications in the future ## Changes - Added a warning alert to the Tasks page when all task notifications are disabled - Introduced new `/users/{user}/preferences` endpoint to manage user preferences (stored in `user_configs` table) - Alert is dismissible and stores the dismissal state via the new user preferences API endpoint - Enabling any task notification in Account Settings clears the dismissal state via the preferences API - Added comprehensive Storybook stories for both TasksPage and NotificationsPage to test all alert visibility states and interactions Closes: https://github.com/coder/internal/issues/1089	2025-11-28 16:50:59 +00:00
Mathias Fredriksson	37fc6646ad	perf(coderd/database): limit `GetLatestWorkspaceAppStatusByAppID` to 1 row (#20917 ) ## Description This PR fixes an issue where `GetLatestWorkspaceAppStatusesByAppID` returned an unbounded number of rows for a given app ID, which could cause performance issues for noisy or long-running AI tasks. ## Impact This change reduces database query overhead for workspace app status updates, particularly for busy AI tasks that update their status frequently. Previously, fetching the latest status would return all historical statuses, now it returns only the most recent one. Fixes #20862 --- 🤖 This change was written by Claude Sonnet 4.5 Thinking using [mux](https://github.com/coder/mux) and reviewed by a human 🏄🏻‍♂️	2025-11-25 16:56:42 +02:00
Danielle Maywood	82f525baf3	feat(coderd): add task prompt modification endpoint (#20811 ) This PR adds the backend implementation for modifying task prompts. Part of https://github.com/coder/internal/issues/1084 ## Changes - New `UpdateTaskPrompt` database query to update task prompts - New PATCH `/api/v2/tasks/{task}/prompt` endpoint ## Notes This is part 1 of a 2-part PR stack. The frontend UI will be added in a follow-up PR based on this branch (https://github.com/coder/coder/pull/20812). --- 🤖 PR was written by Claude Sonnet 4.5 Thinking using [Coder Mux](https://github.com/coder/cmux) and reviewed by a human 👩	2025-11-25 11:13:32 +00:00
Steven Masley	cefe07d074	feat: purge expired api keys in dbpurge (#20863 ) closes https://github.com/coder/coder/issues/19889 This is in response to a migration in v2.27 that takes very long on deployments with large `api_key` tables.	2025-11-24 10:24:32 -06:00
Atif Ali	636408906f	chore(docs): standardize "AIBridge" to "AI Bridge" in documentation (#20831 )	2025-11-24 18:09:04 +05:00
Danny Kopping	5a7d4f69f6	feat: add configurable retention for aibridge (#20828 ) Closes https://github.com/coder/internal/issues/1134 --------- Signed-off-by: Danny Kopping <danny@coder.com>	2025-11-21 11:35:36 +02:00
Marcin Tojek	d004710a74	feat: add prebuild invalidation via last_invalidated_at timestamp (#20582 ) Updates #17917	2025-11-20 17:12:25 +01:00
Cian Johnston	34f6e72879	feat(coderd): add lookup task by name in httpmw.TaskParam (#20647 ) * Adds a `GetTaskByOwnerIDAndName` query * Updates `httpmw.TaskParam` to fall back to task name if no task by UUID found. * Updates the `TaskByIdentifier` used in `cli/` to use direct lookup instead of searching.	2025-11-05 14:28:34 +00:00
Mathias Fredriksson	303e9ef7de	fix: switch to coder/sqlc fork (#20536 ) Refs https://github.com/coder/sqlc/pull/1 Unblocks https://github.com/coder/coder/pull/20501 Upstream https://github.com/sqlc-dev/sqlc/pull/4159	2025-10-29 18:45:56 +02:00
Susana Ferreira	7e8fcb4b0f	perf: optimize prebuilds membership reconciliation to check orgs not presets (#20493 ) ## Description The membership reconciliation ensures the prebuilds system user is a member of all organizations with prebuilds configured. To support prebuilds quota management, each organization must have a prebuilds group that the system user belongs to. ## Problem Previously, membership reconciliation iterated over all presets to check and update membership status. This meant database queries `GetGroupByOrgAndName` and `InsertGroupMember` were executed for each preset. Since presets are unique combinations of `(organization, template, template version, preset)`, this resulted in several redundant checks for the same organization. In dogfood, `InsertGroupMember` was called thousands of times per day, even though memberships were already configured ([internal Grafana dashboard link](https://grafana.dev.coder.com/goto/46MZ1UgDg?orgId=1)) ## Solution This PR introduces `GetOrganizationsWithPrebuildStatus`, a single query that returns: * All unique organizations with prebuilds configured * Whether the prebuilds user is a member of each organization * Whether the prebuilds group exists in each organization * Whether the prebuilds user is in the prebuilds group The membership reconciliation logic now: * Fetches status for all organizations in one query * Only performs inserts for organizations missing required memberships or groups * Safely handles concurrent operations via unique constraint violations * This reduces database load from `O(presets)` to `O(organizations)` per reconciliation loop, with a single read query when everything is configured.
## Changes * Add `GetOrganizationsWithPrebuildStatus` SQL query * Update `membership.ReconcileAll` to use organization-based reconciliation instead of preset-based * Update tests to reflect new behavior Related to internal thread: https://codercom.slack.com/archives/C07GRNNRW03/p1760535570381369	2025-10-29 14:24:29 +00:00
Susana Ferreira	c3e3bb58f2	feat: delete pending canceled prebuilds (#20499 ) ## Description PR https://github.com/coder/coder/pull/20387 introduced canceling pending prebuild jobs from inactive template versions to avoid provisioning obsolete workspaces. However, the associated prebuilds remained in the database with "Canceled" status, visible in the UI. This PR now orphan-deletes these canceled prebuilt workspaces. Since the canceled jobs were never processed by a provisioner, no Terraform resources were created, making orphan deletion safe. Orphan deletion always creates a provisioner job, but behaves differently based on provisioner availability: - If no provisioner daemon is available, the job is immediately marked as completed and the workspace is marked as deleted without any provisioner processing - If a provisioner daemon is available, it processes the delete job with empty Terraform state (no actual resources to destroy) The job cancellation and workspace deletion occur atomically in the same transaction. We don't split this into two separate reconciliation runs because there's no way to distinguish between system-canceled prebuilds and user-canceled workspaces. If we deleted canceled workspaces in a later run, we'd delete user-canceled workspaces that users may want to keep for troubleshooting. Note: This only applies to system-generated prebuilds from inactive template versions. ## Changes * Update `UpdatePrebuildProvisionerJobWithCancel` query to return job ID, workspace ID, template ID, and template version preset ID * Add `DeprovisionMode` enum to support orphan deletion in the provision flow * Update `ActionTypeCancelPending` handler to cancel jobs and orphan-delete associated workspaces atomically	2025-10-29 10:37:28 +00:00
Dean Sheather	5a3ceb38f0	chore: add aibridge data to telemetry (#20449 ) - Adds a new table to keep track of which payloads have already been reported since we only report for the last clock hour - Adds a query to gather and aggregate all the data by provider/model/client Relates to https://github.com/coder/coder-telemetry-server/issues/27	2025-10-28 03:16:41 +11:00
Paweł Banaszewski	50ba223aa1	feat: add db query for setting interception ended_at field (#20437 ) Adds UpdateAIBridgeInterceptionEnded query to mark interceptions as done. Needed for https://github.com/coder/internal/issues/1051	2025-10-27 09:51:37 +01:00
Susana Ferreira	f6e86c6fdb	feat: cancel pending prebuilds from non-active template versions (#20387 ) ## Description This PR introduces an optimization to automatically cancel pending prebuild-related jobs from non-active template versions in the reconciliation loop. ## Problem Currently, when a template is configured with more prebuild instances than available provisioners, the provisioner queue can become flooded with pending prebuild jobs. This issue is worsened when provisioning/deprovisioning operations take a long time. When the prebuild reconciliation loop generates jobs faster than provisioners can process them, pending jobs accumulate in the queue. Since prebuilt workspaces should always run the latest active template version, pending prebuild jobs from non-active versions become obsolete once a new version is promoted. ## Solution The reconciliation loop cancels pending prebuild-related jobs from non-active template versions that match the following criteria: * Build number: 1 (initial build created by the reconciliation loop) * Job status: `pending` * Not yet picked up by a provisioner (`worker_id` is `NULL`) * Owned by the prebuilds system user * Workspace transition: `start` This prevents the queue from being cluttered with stale prebuild jobs that would provision workspaces on an outdated template version that would consequently need to be deprovisioned. ## Changes * Added new SQL query `CountPendingNonActivePrebuilds` to identify presets with pending jobs from non-active versions * Added new SQL query `UpdatePrebuildProvisionerJobWithCancel` to cancel jobs for a specific preset * New reconciliation action type `ActionTypeCancelPending` handles the cancellation logic * Cancellation is non-blocking: failures to cancel prebuild jobs are logged as errors and don't prevent other reconciliation actions ## Follow-up PR Canceling pending prebuild jobs leaves workspaces in a Canceled state. 
While no Terraform resources need to be destroyed (since jobs were canceled before provisioning started), these database records should still be cleaned up. This will be addressed in a follow-up PR. Closes: https://github.com/coder/coder/issues/20242	2025-10-24 15:27:49 +01:00
Steven Masley	13ca9ead3a	chore!: ensure consistent secret token generation and hashing (#20388 ) This PR uses the same sha256 hashing technique as we use for APIKeys. So now all randomly generated secrets will be hashed with sha256 for consistency. This is a breaking change for the oauth tokens. Since oauth is only allowed for dev builds and experimental, this is ok.	2025-10-23 15:38:49 -05:00
Mathias Fredriksson	9855460524	feat(coderd): use new data model for task delete (#20334 ) Updates coder/internal#976	2025-10-23 19:45:18 +03:00
Mathias Fredriksson	5c802c2627	feat(coderd): use task data model when creating a new task (#20275 ) Updates coder/internal#976	2025-10-23 19:12:09 +03:00
Dean Sheather	69c2c40512	chore: add user details to aibridge interception list endpoint (#20397 ) - Adds FK from `aibridge_interceptions.initiator_id` to `users.id` - This is enforced by deleting any rows that don't have any users. Since this is an experimental feature AND coder never deletes user rows I think this is acceptable. - Adds `name` as a property on `codersdk.MinimalUser` - This matches the `visible_users` view in the database. I'm unsure why `name` wasn't already included given that `username` is. - Adds a new `initiator` field to `codersdk.AIBridgeInterception` which contains `codersdk.MinimalUser` (ID, username, name, avatar URL) - Removes `initiator_id` from `codersdk.AIBridgeInterception` - Should be fine since we're still in early access	2025-10-22 16:18:31 +11:00
Dean Sheather	ea261a1f7c	chore: add offset-based pagination support to aibridge list endpoint (#20393 ) Necessary for the frontend to be able to paginate easily. Cursor pagination is good for fetching all events, but doesn't play very well when a pagination component gets involved. Adds support for `?offset=x` to the existing endpoint. The cursor-based pagination (`?after_id=x`) is still supported. The two pagination modes are mutually exclusive, and are documented as such. If both are supplied, the request will be rejected. Also adds a `total` property to the response that contains the full count of items matching the filter. We already have indices in place so I don't think this will impact performance (or we can revisit it before GA).	2025-10-21 11:50:00 +00:00
Callum Styan	141ef23c81	fix: introduce dedicated queries for workspaces and workspace agents metrics (#19786 ) Aids in differentiating between sources of calls to `GetWorkspaces` by introducing new queries for metrics-specific use cases --------- Signed-off-by: Callum Styan <callumstyan@gmail.com>	2025-10-17 13:40:10 -07:00
Cian Johnston	9f229370e7	feat(coderd/database): add ListTasks query (#20282 ) Relates to https://github.com/coder/internal/issues/981 Adds a `ListTasks` query that allows filtering by OwnerID and OrganizationID.	2025-10-14 17:33:30 +01:00
Mathias Fredriksson	952c69f412	feat(coderd/database): add task status and status view (#20235 ) This change updates the `task_workspace_apps` table structure for improved linking to workspace builds and adds queries to manage tasks and a view to expose task status. Updates coder/internal#948 Supersedes coder/coder#20212 Supersedes coder/coder#19773	2025-10-13 12:25:58 +03:00
Susana Ferreira	fdb0267e5d	feat: add notification for task status (#19965 ) ## Description Send a notification to the workspace owner when an AI task’s app state becomes `Working` or `Idle`. An AI task is identified by a workspace build with `HasAITask = true` and `AITaskSidebarAppID` matching the agent app’s ID. ## Changes * Add `TemplateTaskWorking` notification template. * Add `TemplateTaskIdle` notification template. * Add `GetLatestWorkspaceAppStatusesByAppID` SQL query to get the workspace app statuses ordered by latest first. * Update `PATCH /workspaceagents/me/app-status` to enqueue: * `TemplateTaskWorking` when state transitions to `working` * `TemplateTaskIdle` when state transitions to `idle` * Notification labels include: * `task`: task initial prompt * `workspace`: workspace name * Notification dedupe: include a minute-bucketed timestamp (UTC truncated to the minute) in the enqueue data to allow identical content to resend within the same day (but not more than once per minute). Closes: https://github.com/coder/coder/issues/19776	2025-09-29 16:44:53 +01:00
Paweł Banaszewski	0a6ba5d51a	feat: add endpoint to list aibridge interceptions (#19929 ) Co-authored-by: Dean Sheather <dean@deansheather.com>	2025-09-27 00:20:33 +10:00
Danny Kopping	0a79817050	feat: initialize `aibridged` & mount API handler (#19798 ) Addresses https://github.com/coder/internal/issues/987	2025-09-25 16:37:28 +02:00
Danny Kopping	422bba44d9	chore: add aibridge database resources & define RBAC policies (#19796 ) Closes https://github.com/coder/internal/issues/986	2025-09-16 21:31:17 +02:00
Brett Kolodny	854f3c0187	feat: add workspaces/acl [delete] endpoint (#19772 ) Closes [coder/internal#971](https://github.com/coder/internal/issues/971)	2025-09-12 12:21:01 -04:00
Kacper Sawicki	776231d025	fix(coderd): add blocking GetProvisionerJobByIDWithLock for workspace build cancellation (#19737 ) Closes https://github.com/coder/internal/issues/885 Adds a new database method GetProvisionerJobByIDWithLock that uses FOR UPDATE without SKIP LOCKED to fix workspace build cancellation returning 500 errors when jobs are locked.	2025-09-08 15:40:14 +02:00
Cian Johnston	06cbb2890f	fix: expire token for prebuilds user when regenerating session token (#19667 ) * provisionerdserver: Expires prebuild user token for workspace, if it exists, when regenerating session token. * dbauthz: disallow prebuilds user from creating api keys * dbpurge: added functionality to expire stale api keys owned by the prebuilds user	2025-09-02 09:38:43 +01:00
Callum Styan	4fab14b40b	fix: limit the scope of the template average build time query to the last 100 (#19648 ) This PR should resolve https://github.com/coder/internal/issues/719 by limiting the `workspace_builds` rows selected by the query to the most recent 100 builds of a template, as opposed to all builds in the last 30d. For our own internal templates with the most builds (1700-2000 in a 30d period) this should cut the query execution time by about 80%. Unless we have some restriction on keeping the 30d period, contract related or otherwise, this seems like a safe change to make. In addition to the execution speed improvements it also means the memory for the query is bounded as well. If we want to keep a 30d time period for the avg build time value I think it's worth exploring a purpose built solution such as histogram structures where the build times could be bucketized by template ID as they're observed. --------- Signed-off-by: Callum Styan <callumstyan@gmail.com>	2025-09-01 09:31:21 -07:00
Dean Sheather	39bf3ba628	chore: replace GetManagedAgentCount query with aggregate table (#19636 ) - Removes GetManagedAgentCount query - Adds new table `usage_events_daily` which stores aggregated usage events by the type and UTC day - Adds trigger to update the values in this table when a new row is inserted into `usage_events` - Adds a migration that adds `usage_events_daily` rows for existing data in `usage_events` - Adds tests for the trigger - Adds tests for the backfill query in the migration Since the `usage_events` table is unreleased currently, this migration will do nothing on real deployments and will only affect preview deployments such as dogfood. Closes https://github.com/coder/internal/issues/943	2025-08-30 03:39:37 +10:00
Susana Ferreira	0ab345ca84	feat: add prebuild timing metrics to Prometheus (#19503 ) ## Description This PR introduces one counter and two histograms related to workspace creation and claiming. The goal is to provide clearer observability into how workspaces are created (regular vs prebuild) and the time cost of those operations. ### `coderd_workspace_creation_total` * Metric type: Counter * Name: `coderd_workspace_creation_total` * Labels: `organization_name`, `template_name`, `preset_name` This counter tracks whether a regular workspace (not created from a prebuild pool) was created using a preset or not. Currently, we already expose `coderd_prebuilt_workspaces_claimed_total` for claimed prebuilt workspaces, but we lack a comparable metric for regular workspace creations. This metric fills that gap, making it possible to compare regular creations against claims. Implementation notes: * Exposed as a `coderd_` metric, consistent with other workspace-related metrics (e.g. `coderd_api_workspace_latest_build`: https://github.com/coder/coder/blob/main/coderd/prometheusmetrics/prometheusmetrics.go#L149). * Every `defaultRefreshRate` (1 minute ), DB query `GetRegularWorkspaceCreateMetrics` is executed to fetch all regular workspaces (not created from a prebuild pool). * The counter is updated with the total from all time (not just since metric introduction). This differs from the histograms below, which only accumulate from their introduction forward. 
### `coderd_workspace_creation_duration_seconds` & `coderd_prebuilt_workspace_claim_duration_seconds` * Metric types: Histogram * Names: * `coderd_workspace_creation_duration_seconds` * Labels: `organization_name`, `template_name`, `preset_name`, `type` (`regular`, `prebuild`) * `coderd_prebuilt_workspace_claim_duration_seconds` * Labels: `organization_name`, `template_name`, `preset_name` We already have `coderd_provisionerd_workspace_build_timings_seconds`, which tracks build run times for all workspace builds handled by the provisioner daemon. However, in the context of this issue, we are only interested in creation and claim build times, not all transitions; additionally, this metric does not include `preset_name`, and adding it there would significantly increase cardinality. Therefore, separate more focused metrics are introduced here: * `coderd_workspace_creation_duration_seconds`: Build time to create a workspace (either a regular workspace or the build into a prebuild pool, for prebuild initial provisioning build). * `coderd_prebuilt_workspace_claim_duration_seconds`: Time to claim a prebuilt workspace from the pool. The reason for two separate histograms is that: * Creation (regular or prebuild): provisioning builds with similar time magnitude, generally expected to take longer than a claim operation. * Claim: expected to be a much faster provisioning build. #### Native histogram usage Provisioning times vary widely between projects. Using static buckets risks unbalanced or poorly informative histograms. 
To address this, these metrics use [Prometheus native histograms](https://prometheus.io/docs/specs/native_histograms/): * First introduced in Prometheus v2.40.0 * Recommended stable usage from v2.45+ * Requires Go client `prometheus/client_golang` v1.15.0+ * Experimental and must be explicitly enabled on the server (`--enable-feature=native-histograms`) For compatibility, we also retain a classic bucket definition (aligned with the existing provisioner metric: https://github.com/coder/coder/blob/main/provisionerd/provisionerd.go#L182-L189). * If native histograms are enabled, Prometheus ingests the high-resolution histogram. * If not, it falls back to the predefined buckets. Implementation notes: * Unlike the counter, these histograms are updated in real-time at workspace build job completion. * They reflect data only from the point of introduction forward (no historical backfill). ## Relates to Closes: https://github.com/coder/coder/issues/19528 Native histograms tested in observability stack: https://github.com/coder/observability/pull/50	2025-08-28 15:00:26 +01:00
ケイラ	d7ee1019c0	feat: add endpoint for retrieving workspace acl (#19375 ) Implements `/acl [get]` for workspaces, with tests. Blocked by experiment enablement	2025-08-25 07:11:18 -05:00
Sas Swart	f9a6adc704	feat: claim prebuilds based on workspace parameters instead of preset id (#19279 ) Closes https://github.com/coder/coder/issues/18356. This change finds and selects a matching preset if one was not chosen during workspace creation. This solidifies the relationship between presets and parameters. When a workspace is created without an explicitly chosen preset, it will now still be eligible to claim a prebuilt workspace if one is available.	2025-08-20 11:02:53 +02:00

381 Commits