coder

mirror of https://github.com/coder/coder.git synced 2026-06-03 04:58:23 +00:00

Author	SHA1	Message	Date
Danielle Maywood	e7dbbcde87	fix: do not notify marked for deletion for deleted workspaces (#20937 ) Closes https://github.com/coder/coder/issues/20913 I ran the test without the fix, verified the test caught the issue, then applied the fix, and confirmed the issue no longer happens. --- 🤖 PR was initially written by Claude Opus 4.5 Thinking using Claude Code and then reviewed by a human 👩	2025-11-26 09:23:16 +00:00
Mathias Fredriksson	37fc6646ad	perf(coderd/database): limit `GetLatestWorkspaceAppStatusByAppID` to 1 row (#20917 ) ## Description This PR fixes an issue where `GetLatestWorkspaceAppStatusesByAppID` returned an unbounded number of rows for a given app ID, which could cause performance issues for noisy or long-running AI tasks. ## Impact This change reduces database query overhead for workspace app status updates, particularly for busy AI tasks that update their status frequently. Previously, fetching the latest status returned all historical statuses; now it returns only the most recent one. Fixes #20862 --- 🤖 This change was written by Claude Sonnet 4.5 Thinking using [mux](https://github.com/coder/mux) and reviewed by a human 🏄🏻‍♂️	2025-11-25 16:56:42 +02:00
Susana Ferreira	3011207519	feat: add display name field for tasks (#20856 ) ## Problem Tasks currently only expose a machine-friendly name field (e.g. `task-python-debug-a1b2`), but this value is primarily an identifier rather than a clean, descriptive label. We need a separate display-friendly name for use in the UI. This PR introduces a new `display_name` field and updates the task-name generation flow. The Claude system prompt was updated to return valid JSON with both `name` and `display_name`. The name generation logic follows a fallback chain (Anthropic > prompt sanitization > random fallback). To make task names more closely resemble their display names, the legacy `task-` prefix has been removed. For context, PR https://github.com/coder/coder/pull/20834 introduced a small Task icon to the workspace list to help identify workspaces associated to tasks. ## Changes - Database migration: Added `display_name` column to tasks table - Updated system prompt to generate both task name and display name as valid JSON - Task name generation now follows a fallback chain: Anthropic > prompt sanitization > random fallback - Removed `task-` prefix from task names to allow more descriptive names - Note: PR https://github.com/coder/coder/pull/20834 adds a Task icon to workspaces in the workspace list to distinguish task-created workspaces Note: UI changes will be addressed in a follow-up PR Related to: https://github.com/coder/coder/issues/20801	2025-11-25 13:00:59 +00:00
Danielle Maywood	82f525baf3	feat(coderd): add task prompt modification endpoint (#20811 ) This PR adds the backend implementation for modifying task prompts. Part of https://github.com/coder/internal/issues/1084 ## Changes - New `UpdateTaskPrompt` database query to update task prompts - New PATCH `/api/v2/tasks/{task}/prompt` endpoint ## Notes This is part 1 of a 2-part PR stack. The frontend UI will be added in a follow-up PR based on this branch (https://github.com/coder/coder/pull/20812). --- 🤖 PR was written by Claude Sonnet 4.5 Thinking using [Coder Mux](https://github.com/coder/cmux) and reviewed by a human 👩	2025-11-25 11:13:32 +00:00
Jake Howell	ca560d36ce	fix: remove inflight interceptions from aibridge returned values (#20852 ) Addresses [`aibridge#54`](https://github.com/coder/aibridge/issues/54) When querying the database values behind `/api/experimental/aibridge/interceptions`, we found strange behaviour: some interceptions lacked prompts and various other fields we want. Generally this was because the data did not yet exist for these interceptions (they were still inflight). The simple solution was to hide interceptions whose data does not exist yet. This PR addresses that. --------- Co-authored-by: Danny Kopping <danny@coder.com>	2025-11-25 10:23:39 +11:00
Steven Masley	cefe07d074	feat: purge expired api keys in dbpurge (#20863 ) closes https://github.com/coder/coder/issues/19889 This is in response to a migration in v2.27 that takes very long on deployments with large `api_key` tables.	2025-11-24 10:24:32 -06:00
Atif Ali	636408906f	chore(docs): standardize "AIBridge" to "AI Bridge" in documentation (#20831 )	2025-11-24 18:09:04 +05:00
Danny Kopping	5a7d4f69f6	feat: add configurable retention for aibridge (#20828 ) Closes https://github.com/coder/internal/issues/1134 --------- Signed-off-by: Danny Kopping <danny@coder.com>	2025-11-21 11:35:36 +02:00
Marcin Tojek	d004710a74	feat: add prebuild invalidation via last_invalidated_at timestamp (#20582 ) Updates #17917	2025-11-20 17:12:25 +01:00
Steven Masley	fe3b825b86	chore: per template opt into cached terraform directories (#20609 ) For experimental and dogfood purposes, this adds the ability to opt in a single template. Leaving the rest of the templates as is. For GA, this setting might be removed or changed.	2025-11-13 14:04:12 -06:00
Paweł Banaszewski	991831b1dd	chore: add API key ID to interceptions (#20513 ) Adds APIKeyID to interceptions. Needed for tracking API key usage with bridge. fixes https://github.com/coder/coder/issues/20001	2025-11-10 13:46:41 +01:00
Mathias Fredriksson	ce04f6cc5d	fix(coderd): remove deprecated AITaskSidebarApp column (#20680 ) This column was no longer used in `v2.28` and the codersdk field deprecated. Both can now be dropped in `v2.29`. Closes coder/internal#974	2025-11-07 12:45:45 +02:00
Cian Johnston	34f6e72879	feat(coderd): add lookup task by name in httpmw.TaskParam (#20647 ) * Adds a `GetTaskByOwnerIDAndName` query * Updates `httpmw.TaskParam` to fall back to task name if no task by UUID found. * Updates the `TaskByIdentifier` used in `cli/` to use direct lookup instead of searching.	2025-11-05 14:28:34 +00:00
Mathias Fredriksson	7ae3fdc749	refactor: use task data model for notifications (#20590 ) Updates coder/internal#973 Updates coder/internal#974	2025-10-31 15:53:27 +02:00
Mathias Fredriksson	859e94d67a	fix: deprecate codersdk.AITaskPromptParameterName and reduce usage (#20501 ) Depends on coder/sqlc#1 Fixes coder/internal#979 Updates coder/internal#973	2025-10-29 18:59:12 +00:00
Cian Johnston	1ebc217624	fix: update task link AppStatus using task_id (#20543 ) Fixes https://github.com/coder/coder/issues/20515 Alternative to https://github.com/coder/coder/pull/20519 Adds `task_id` to `workspaces_expanded` view and updates the "View Task" link in `AppStatuses` component. NOTE: this contains a migration	2025-10-29 15:45:45 +00:00
Susana Ferreira	7e8fcb4b0f	perf: optimize prebuilds membership reconciliation to check orgs not presets (#20493 ) ## Description The membership reconciliation ensures the prebuilds system user is a member of all organizations with prebuilds configured. To support prebuilds quota management, each organization must have a prebuilds group that the system user belongs to. ## Problem Previously, membership reconciliation iterated over all presets to check and update membership status. This meant database queries `GetGroupByOrgAndName` and `InsertGroupMember` were executed for each preset. Since presets are unique combinations of `(organization, template, template version, preset)`, this resulted in several redundant checks for the same organization. In dogfood, `InsertGroupMember` was called thousands of times per day, even though memberships were already configured ([internal Grafana dashboard link](https://grafana.dev.coder.com/goto/46MZ1UgDg?orgId=1)) <img width="5382" height="1788" alt="Screenshot 2025-10-28 at 16 01 36" src="https://github.com/user-attachments/assets/757b7253-106f-4f72-8586-8e2ede9f18db" /> ## Solution This PR introduces `GetOrganizationsWithPrebuildStatus`, a single query that returns: * All unique organizations with prebuilds configured * Whether the prebuilds user is a member of each organization * Whether the prebuilds group exists in each organization * Whether the prebuilds user is in the prebuilds group The membership reconciliation logic now: * Fetches status for all organizations in one query * Only performs inserts for organizations missing required memberships or groups * Safely handles concurrent operations via unique constraint violations * This reduces database load from `O(presets)` to `O(organizations)` per reconciliation loop, with a single read query when everything is configured. 
## Changes * Add `GetOrganizationsWithPrebuildStatus` SQL query * Update `membership.ReconcileAll` to use organization-based reconciliation instead of preset-based * Update tests to reflect new behavior Related to internal thread: https://codercom.slack.com/archives/C07GRNNRW03/p1760535570381369	2025-10-29 14:24:29 +00:00
Susana Ferreira	c3e3bb58f2	feat: delete pending canceled prebuilds (#20499 ) ## Description PR https://github.com/coder/coder/pull/20387 introduced canceling pending prebuild jobs from inactive template versions to avoid provisioning obsolete workspaces. However, the associated prebuilds remained in the database with "Canceled" status, visible in the UI. This PR now orphan-deletes these canceled prebuilt workspaces. Since the canceled jobs were never processed by a provisioner, no Terraform resources were created, making orphan deletion safe. Orphan deletion always creates a provisioner job, but behaves differently based on provisioner availability: - If no provisioner daemon is available, the job is immediately marked as completed and the workspace is marked as deleted without any provisioner processing - If a provisioner daemon is available, it processes the delete job with empty Terraform state (no actual resources to destroy) The job cancellation and workspace deletion occur atomically in the same transaction. We don't split this into two separate reconciliation runs because there's no way to distinguish between system-canceled prebuilds and user-canceled workspaces. If we deleted canceled workspaces in a later run, we'd delete user-canceled workspaces that users may want to keep for troubleshooting. Note: This only applies to system-generated prebuilds from inactive template versions. ## Changes * Update `UpdatePrebuildProvisionerJobWithCancel` query to return job ID, workspace ID, template ID, and template version preset ID * Add `DeprovisionMode` enum to support orphan deletion in the provision flow * Update `ActionTypeCancelPending` handler to cancel jobs and orphan-delete associated workspaces atomically	2025-10-29 10:37:28 +00:00
Mathias Fredriksson	a1fa58ac17	fix: update dbgen and dbfake task creation and toolsdk test fixtures (#20508 ) Depends on #20506 Fixes coder/internal#1103	2025-10-28 14:15:58 +02:00
Dean Sheather	5a3ceb38f0	chore: add aibridge data to telemetry (#20449 ) - Adds a new table to keep track of which payloads have already been reported since we only report for the last clock hour - Adds a query to gather and aggregate all the data by provider/model/client Relates to https://github.com/coder/coder-telemetry-server/issues/27	2025-10-28 03:16:41 +11:00
Paweł Banaszewski	50ba223aa1	feat: add db query for setting interception ended_at field (#20437 ) Adds UpdateAIBridgeInterceptionEnded query to mark interceptions as done. Needed for https://github.com/coder/internal/issues/1051	2025-10-27 09:51:37 +01:00
Susana Ferreira	f6e86c6fdb	feat: cancel pending prebuilds from non-active template versions (#20387 ) ## Description This PR introduces an optimization to automatically cancel pending prebuild-related jobs from non-active template versions in the reconciliation loop. ## Problem Currently, when a template is configured with more prebuild instances than available provisioners, the provisioner queue can become flooded with pending prebuild jobs. This issue is worsened when provisioning/deprovisioning operations take a long time. When the prebuild reconciliation loop generates jobs faster than provisioners can process them, pending jobs accumulate in the queue. Since prebuilt workspaces should always run the latest active template version, pending prebuild jobs from non-active versions become obsolete once a new version is promoted. ## Solution The reconciliation loop cancels pending prebuild-related jobs from non-active template versions that match the following criteria: * Build number: 1 (initial build created by the reconciliation loop) * Job status: `pending` * Not yet picked up by a provisioner (`worker_id` is `NULL`) * Owned by the prebuilds system user * Workspace transition: `start` This prevents the queue from being cluttered with stale prebuild jobs that would provision workspaces on an outdated template version that would consequently need to be deprovisioned. ## Changes * Added new SQL query `CountPendingNonActivePrebuilds` to identify presets with pending jobs from non-active versions * Added new SQL query `UpdatePrebuildProvisionerJobWithCancel` to cancel jobs for a specific preset * New reconciliation action type `ActionTypeCancelPending` handles the cancellation logic * Cancellation is non-blocking: failures to cancel prebuild jobs are logged as errors and don't prevent other reconciliation actions ## Follow-up PR Canceling pending prebuild jobs leaves workspaces in a Canceled state. 
While no Terraform resources need to be destroyed (since jobs were canceled before provisioning started), these database records should still be cleaned up. This will be addressed in a follow-up PR. Closes: https://github.com/coder/coder/issues/20242	2025-10-24 15:27:49 +01:00
Mathias Fredriksson	a106d67c07	feat(coderd): use task data model for list (#20394 ) Updates coder/internal#976	2025-10-23 20:22:51 +03:00
Mathias Fredriksson	9855460524	feat(coderd): use new data model for task delete (#20334 ) Updates coder/internal#976	2025-10-23 19:45:18 +03:00
Mathias Fredriksson	5c802c2627	feat(coderd): use task data model when creating a new task (#20275 ) Updates coder/internal#976	2025-10-23 19:12:09 +03:00
Dean Sheather	69c2c40512	chore: add user details to aibridge interception list endpoint (#20397 ) - Adds FK from `aibridge_interceptions.initiator_id` to `users.id` - This is enforced by deleting any rows that don't have any users. Since this is an experimental feature AND coder never deletes user rows I think this is acceptable. - Adds `name` as a property on `codersdk.MinimalUser` - This matches the `visible_users` view in the database. I'm unsure why `name` wasn't already included given that `username` is. - Adds a new `initiator` field to `codersdk.AIBridgeInterception` which contains `codersdk.MinimalUser` (ID, username, name, avatar URL) - Removes `initiator_id` from `codersdk.AIBridgeInterception` - Should be fine since we're still in early access	2025-10-22 16:18:31 +11:00
Dean Sheather	ea261a1f7c	chore: add offset-based pagination support to aibridge list endpoint (#20393 ) Necessary for the frontend to be able to paginate easily. Cursor pagination is good for fetching all events, but doesn't play very well when a pagination component gets involved. Adds support for `?offset=x` to the existing endpoint. The cursor-based pagination (`?after_id=x`) is still supported. The two pagination modes are mutually exclusive, and are documented as such. If both are supplied, the request will be rejected. Also adds a `total` property to the response that contains the full count of items matching the filter. We already have indices in place so I don't think this will impact performance (or we can revisit it before GA).	2025-10-21 11:50:00 +00:00
Callum Styan	141ef23c81	fix: introduce dedicated queries for workspaces and workspace agents metrics (#19786 ) Aids in differentiating between sources of calls to `GetWorkspaces` by introducing new queries for metrics-specific use cases --------- Signed-off-by: Callum Styan <callumstyan@gmail.com>	2025-10-17 13:40:10 -07:00
Cian Johnston	9f229370e7	feat(coderd/database): add ListTasks query (#20282 ) Relates to https://github.com/coder/internal/issues/981 Adds a `ListTasks` query that allows filtering by OwnerID and OrganizationID.	2025-10-14 17:33:30 +01:00
Mathias Fredriksson	952c69f412	feat(coderd/database): add task status and status view (#20235 ) This change updates the `task_workspace_apps` table structure for improved linking to workspace builds and adds queries to manage tasks and a view to expose task status. Updates coder/internal#948 Supersedes coder/coder#20212 Supersedes coder/coder#19773	2025-10-13 12:25:58 +03:00
Mathias Fredriksson	057d7dacdc	chore(coderd/database/queries): remove trailing whitespace (#20192 )	2025-10-07 13:10:38 +00:00
Sas Swart	d17dd5d787	feat: add filtering by initiator to provisioner job listing in the CLI (#20137 ) Relates to https://github.com/coder/internal/issues/934 This PR provides a mechanism to filter provisioner jobs according to who initiated the job. This will be used to find pending prebuild jobs when prebuilds have overwhelmed the provisioner job queue. They can then be canceled. If prebuilds are overwhelming provisioners, the following steps will be taken: ```bash # pause prebuild reconciliation to limit provisioner queue pollution: coder prebuilds pause # cancel pending provisioner jobs to clear the queue coder provisioner jobs list --initiator="prebuilds" --status="pending" | jq ... | xargs -n1 -I{} coder provisioner jobs cancel {} # push a fixed template and wait for the import to complete coder templates push ... # push a fixed template # resume prebuild reconciliation coder prebuilds resume ``` This interface differs somewhat from what was specified in the issue, but still provides a mechanism that addresses the issue. I made the original proposal myself, and this simpler implementation makes sense. I might add a `--search` parameter in a follow-up if there is appetite for it. Potential follow-ups: * Support for this usage: `coder provisioner jobs list --search "initiator:prebuilds status:pending"` * Adding the same parameters to `coder provisioner jobs cancel` as a convenience feature so that operators don't have to pipe through `jq` and `xargs`	2025-10-06 08:56:43 +00:00
Cian Johnston	ff930ad4f3	feat(coderd): add ability to search org members by user_id, is_system, github_user_id (#20048 ) Adds the ability to search org members by query. Supported fields: `user_id`, `is_system`, `github_user_id`.	2025-09-30 23:54:21 +01:00
Susana Ferreira	fdb0267e5d	feat: add notification for task status (#19965 ) ## Description Send a notification to the workspace owner when an AI task’s app state becomes `Working` or `Idle`. An AI task is identified by a workspace build with `HasAITask = true` and `AITaskSidebarAppID` matching the agent app’s ID. ## Changes * Add `TemplateTaskWorking` notification template. * Add `TemplateTaskIdle` notification template. * Add `GetLatestWorkspaceAppStatusesByAppID` SQL query to get the workspace app statuses ordered by latest first. * Update `PATCH /workspaceagents/me/app-status` to enqueue: * `TemplateTaskWorking` when state transitions to `working` * `TemplateTaskIdle` when state transitions to `idle` * Notification labels include: * `task`: task initial prompt * `workspace`: workspace name * Notification dedupe: include a minute-bucketed timestamp (UTC truncated to the minute) in the enqueue data to allow identical content to resend within the same day (but not more than once per minute). Closes: https://github.com/coder/coder/issues/19776	2025-09-29 16:44:53 +01:00
Paweł Banaszewski	0a6ba5d51a	feat: add endpoint to list aibridge interceptions (#19929 ) Co-authored-by: Dean Sheather <dean@deansheather.com>	2025-09-27 00:20:33 +10:00
Thomas Kosiewski	d0db9ec88f	feat: add multi-scope support to API keys (#19917 ) # Canonicalize API Key Scopes This PR introduces canonical API key scopes with a `coder:` namespace prefix to avoid collisions with low-level resource:action names. It: 1. Renames special API key scopes in the database: - `all` → `coder:all` - `application_connect` → `coder:application_connect` 2. Adds support for a new `scopes` field in the API key creation request, allowing multiple scopes to be specified while maintaining backward compatibility with the singular `scope` field. 3. Updates the API documentation to reflect these changes, including the new endpoint for listing public API key scopes. 4. Ensures backward compatibility by mapping between legacy and canonical scope names in relevant code paths.	2025-09-26 11:56:34 +02:00
Danny Kopping	0a79817050	feat: initialize `aibridged` & mount API handler (#19798 ) Addresses https://github.com/coder/internal/issues/987	2025-09-25 16:37:28 +02:00
Danny Kopping	615585d5d1	feat: add `aibridgedserver` pkg (#19902 )	2025-09-25 13:32:16 +02:00
Thomas Kosiewski	fb0ce389a6	feat: implement API key scopes database migration (#19861 ) Added database migration for API key scopes. Fixes #19845	2025-09-22 19:26:51 +02:00
Brett Kolodny	38ca98745b	feat: add shared_with_group: and shared_with_user: filters to /workspaces endpoint (#19875 ) Adds shared_with_user and shared_with_group filters to the /workspaces endpoint. - `shared_with_user`: filters workspaces shared with a specific user. Accepts a user UUID or username. - `shared_with_group`: filters workspaces shared with a specific group. Accepts: - a group UUID, or - `<organization name>/<group name>`, or - `<group name>` (resolved in the default organization). Closes [coder/internal#1004](https://github.com/coder/internal/issues/1004)	2025-09-19 16:05:27 -04:00
Danny Kopping	422bba44d9	chore: add aibridge database resources & define RBAC policies (#19796 ) Closes https://github.com/coder/internal/issues/986	2025-09-16 21:31:17 +02:00
Brett Kolodny	e6b04d1918	feat: add shared filter to workspaces query (#19807 ) Adds a `shared:<boolean>` search query to the `/workspaces [get]` endpoint https://github.com/user-attachments/assets/ccf84bd9-c1fd-4085-825b-2e3176a2d488 Closes [coder/internal#972](https://github.com/coder/internal/issues/972)	2025-09-16 12:37:39 -04:00
Brett Kolodny	854f3c0187	feat: add workspaces/acl [delete] endpoint (#19772 ) Closes [coder/internal#971](https://github.com/coder/internal/issues/971)	2025-09-12 12:21:01 -04:00
Rafael Rodriguez	e53bc247e9	feat: add tooltip field to workspace app that renders as markdown (#19651 ) In this pull request we're adding an optional `tooltip` field. The `tooltip` field is a string field (with markdown support) that will be used to display tooltips on hover over app buttons in a workspace dashboard. Tooltip screenshot <img width="816" height="275" alt="Screenshot 2025-08-29 at 4 11 56 PM" src="https://github.com/user-attachments/assets/52c736a1-f632-465b-89a0-35ca99bd367b" /> Tooltip video https://github.com/user-attachments/assets/21806337-accc-4acf-b8c6-450c031d98f1 Issue: https://github.com/coder/coder/issues/18431 Related provider PR: https://github.com/coder/terraform-provider-coder/pull/435 ### Changes - Added migration to add `tooltip` column to `workspace_apps` table - Updated queries to get/set the new `tooltip` column - Updated frontend to render tooltip as markdown (primary tool tip takes precedence over template tooltip) ### Testing - Added storybook test for `Applink` markdown rendering	2025-09-10 11:01:54 -05:00
Rafael Rodriguez	1677a30a1d	fix: add support for spaces in search & enable searching by display name in templates (#19552 ) ## Summary In this pull request we're updating search to support queries with spaces in addition to the `field:value` pattern that is currently supported. Additionally templates search now defaults to `display_name` (since `display_name` is optional the search will fallback to `name`) when searching without the `field:value` pattern Closes: https://github.com/coder/coder/issues/14384 ### Downsides with searching on `name` and `display_name` Because the `name` field cannot include spaces, we end up in a situation where including a space in the query will result in no results since the query searches on both `name` AND `display_name`. In the following example, we can see the results of searching by both `name` and `display_name` on these templates: \| Name \| Display Name \| \| ------ \| ------------- \| \| docker \| Docker Template \| \| faketemplate \| A Fake Template \| \| azure \| Fake Azure Template \| \| anotherfake \| Another Fake Template \| \| azurefake \| Another Fake Fake Azure Template \| https://github.com/user-attachments/assets/b0e0793e-e77d-46bc-9a42-d7cf4f8bd910 ### Proposal: Search on `display_name` by default and allow for `name` using the `field:value` pattern If we remove `name` from the default template search, we're now able to search with spaces on template `display_names`. Since `display_names` are what users see in the templates list they might expect the search to work this way. Below is an example of `name` being removed from the default template search. https://github.com/user-attachments/assets/9aba5911-4960-4384-befb-08ea1acaa3ab With this approach users would still be able to search on template names by specifying `exact_name:foo`. ### Testing Added additional test cases to ensure spaces were handled as expected in combination with `field:value` patterns.	2025-09-08 17:13:27 -05:00
Kacper Sawicki	776231d025	fix(coderd): add blocking GetProvisionerJobByIDWithLock for workspace build cancellation (#19737 ) Closes https://github.com/coder/internal/issues/885 Adds a new database method GetProvisionerJobByIDWithLock that uses FOR UPDATE without SKIP LOCKED to fix workspace build cancellation returning 500 errors when jobs are locked.	2025-09-08 15:40:14 +02:00
Cian Johnston	06cbb2890f	fix: expire token for prebuilds user when regenerating session token (#19667 ) * provisionerdserver: Expires prebuild user token for workspace, if it exists, when regenerating session token. * dbauthz: disallow prebuilds user from creating api keys * dbpurge: added functionality to expire stale api keys owned by the prebuilds user	2025-09-02 09:38:43 +01:00
Callum Styan	4fab14b40b	fix: limit the scope of the template average build time query to the last 100 (#19648 ) This PR should resolve https://github.com/coder/internal/issues/719 by limiting the `workspace_builds` rows selected by the query to the most recent 100 builds of a template, as opposed to all builds in the last 30d. For our own internal templates with the most builds (1700-2000 in a 30d period) this should cut the query execution time by about 80%. Unless we have some restriction on keeping the 30d period, contract related or otherwise, this seems like a safe change to make. In addition to the execution speed improvements it also means the memory for the query is bounded as well. If we want to keep a 30d time period for the avg build time value I think it's worth exploring a purpose built solution such as histogram structures where the build times could be bucketized by template ID as they're observed. --------- Signed-off-by: Callum Styan <callumstyan@gmail.com>	2025-09-01 09:31:21 -07:00
Dean Sheather	39bf3ba628	chore: replace GetManagedAgentCount query with aggregate table (#19636 ) - Removes GetManagedAgentCount query - Adds new table `usage_events_daily` which stores aggregated usage events by the type and UTC day - Adds trigger to update the values in this table when a new row is inserted into `usage_events` - Adds a migration that adds `usage_events_daily` rows for existing data in `usage_events` - Adds tests for the trigger - Adds tests for the backfill query in the migration Since the `usage_events` table is unreleased currently, this migration will do nothing on real deployments and will only affect preview deployments such as dogfood. Closes https://github.com/coder/internal/issues/943	2025-08-30 03:39:37 +10:00
Susana Ferreira	0ab345ca84	feat: add prebuild timing metrics to Prometheus (#19503 ) ## Description This PR introduces one counter and two histograms related to workspace creation and claiming. The goal is to provide clearer observability into how workspaces are created (regular vs prebuild) and the time cost of those operations. ### `coderd_workspace_creation_total` * Metric type: Counter * Name: `coderd_workspace_creation_total` * Labels: `organization_name`, `template_name`, `preset_name` This counter tracks whether a regular workspace (not created from a prebuild pool) was created using a preset or not. Currently, we already expose `coderd_prebuilt_workspaces_claimed_total` for claimed prebuilt workspaces, but we lack a comparable metric for regular workspace creations. This metric fills that gap, making it possible to compare regular creations against claims. Implementation notes: * Exposed as a `coderd_` metric, consistent with other workspace-related metrics (e.g. `coderd_api_workspace_latest_build`: https://github.com/coder/coder/blob/main/coderd/prometheusmetrics/prometheusmetrics.go#L149). * Every `defaultRefreshRate` (1 minute ), DB query `GetRegularWorkspaceCreateMetrics` is executed to fetch all regular workspaces (not created from a prebuild pool). * The counter is updated with the total from all time (not just since metric introduction). This differs from the histograms below, which only accumulate from their introduction forward. 
### `coderd_workspace_creation_duration_seconds` & `coderd_prebuilt_workspace_claim_duration_seconds` * Metric types: Histogram * Names: * `coderd_workspace_creation_duration_seconds` * Labels: `organization_name`, `template_name`, `preset_name`, `type` (`regular`, `prebuild`) * `coderd_prebuilt_workspace_claim_duration_seconds` * Labels: `organization_name`, `template_name`, `preset_name` We already have `coderd_provisionerd_workspace_build_timings_seconds`, which tracks build run times for all workspace builds handled by the provisioner daemon. However, in the context of this issue, we are only interested in creation and claim build times, not all transitions; additionally, this metric does not include `preset_name`, and adding it there would significantly increase cardinality. Therefore, separate more focused metrics are introduced here: * `coderd_workspace_creation_duration_seconds`: Build time to create a workspace (either a regular workspace or the build into a prebuild pool, for prebuild initial provisioning build). * `coderd_prebuilt_workspace_claim_duration_seconds`: Time to claim a prebuilt workspace from the pool. The reason for two separate histograms is that: * Creation (regular or prebuild): provisioning builds with similar time magnitude, generally expected to take longer than a claim operation. * Claim: expected to be a much faster provisioning build. #### Native histogram usage Provisioning times vary widely between projects. Using static buckets risks unbalanced or poorly informative histograms. 
To address this, these metrics use [Prometheus native histograms](https://prometheus.io/docs/specs/native_histograms/): * First introduced in Prometheus v2.40.0 * Recommended stable usage from v2.45+ * Requires Go client `prometheus/client_golang` v1.15.0+ * Experimental and must be explicitly enabled on the server (`--enable-feature=native-histograms`) For compatibility, we also retain a classic bucket definition (aligned with the existing provisioner metric: https://github.com/coder/coder/blob/main/provisionerd/provisionerd.go#L182-L189). * If native histograms are enabled, Prometheus ingests the high-resolution histogram. * If not, it falls back to the predefined buckets. Implementation notes: * Unlike the counter, these histograms are updated in real-time at workspace build job completion. * They reflect data only from the point of introduction forward (no historical backfill). ## Relates to Closes: https://github.com/coder/coder/issues/19528 Native histograms tested in observability stack: https://github.com/coder/observability/pull/50	2025-08-28 15:00:26 +01:00
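
Commit fdb0267e5d (#19965) above describes its notification dedupe as including "a minute-bucketed timestamp (UTC truncated to the minute)" in the enqueue data. The following is only an illustrative sketch of that idea in Python, not the project's implementation: the function names `minute_bucket` and `dedupe_key`, and the shape of the key, are hypothetical.

```python
from datetime import datetime, timezone


def minute_bucket(ts: datetime) -> str:
    """Convert a timezone-aware timestamp to UTC and truncate it to the minute."""
    utc = ts.astimezone(timezone.utc)
    return utc.replace(second=0, microsecond=0).isoformat()


def dedupe_key(template: str, task: str, workspace: str, ts: datetime) -> tuple:
    """Hypothetical dedupe key: identical content collides only within one UTC minute."""
    return (template, task, workspace, minute_bucket(ts))


# Two events 48 seconds apart in the same minute share a key;
# an event in the next minute gets a new key and can be sent again.
first = datetime(2025, 9, 29, 16, 44, 5, tzinfo=timezone.utc)
second = datetime(2025, 9, 29, 16, 44, 53, tzinfo=timezone.utc)
later = datetime(2025, 9, 29, 16, 45, 0, tzinfo=timezone.utc)

same = dedupe_key("TemplateTaskIdle", "fix bug", "ws-1", first) == dedupe_key(
    "TemplateTaskIdle", "fix bug", "ws-1", second
)
different = dedupe_key("TemplateTaskIdle", "fix bug", "ws-1", first) != dedupe_key(
    "TemplateTaskIdle", "fix bug", "ws-1", later
)
```

Under this assumption, putting the bucket in the key is what lets identical content be resent later the same day while limiting it to at most once per minute.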


656 Commits