coder

mirror of https://github.com/coder/coder.git synced 2026-06-02 20:48:20 +00:00

Author	SHA1	Message	Date
Susana Ferreira	16b8e6072f	fix: set codersdk.Task current_state during task initialization (#20692 ) ## Problem With the new tasks data model, a task starts with an `initializing` status. However, the API returns `current_state: null` to represent the agent state, causing the frontend to display "No message available". This PR updates `codersdk.Task` to return a `current_state` when the task is initializing, with meaningful messages about what is happening during task initialization. ## Changes - Populate `current_state` with descriptive initialization messages when task status is `initializing` and no valid app status exists for the current build - dbfake: Fix `WorkspaceBuild` builder to properly handle pending/running jobs by linking tasks without requiring agent/app resources Note: UI Storybook changes to reflect these new messages will be addressed in a follow-up PR. Closes: https://github.com/coder/internal/issues/1063	2025-11-17 13:24:12 +00:00
Mathias Fredriksson	fa314fe7e5	fix(coderd/database): rename duplicate migration 397 to 398 (#20783 ) Fix duplicate migration from #20683.	2025-11-14 18:05:29 +00:00
Mathias Fredriksson	1483fd11ff	fix(coderd/database): improve task status in tasks_with_status view (#20683 ) This change restructures the `tasks_with_status` view query to: - Improve debuggability by adding a `status_debug` column to better understand the outcome - Reduce clutter from `bool_or`, `bool_and`, which are aggregate functions that did not actually serve a purpose (each join is 0-1 rows) - Improve agent lifecycle state coverage; `start_timeout` and `start_error` were omitted - These states are easy to trigger even in a perfectly functioning workspace/task, so we now rely on app health to report whether or not there was an issue - Mark canceling and canceled workspace build jobs as error state - Agent stop states were implicitly `unknown`, now they are explicit (I initially considered `error`, could go either way)	2025-11-14 19:52:26 +02:00
Steven Masley	f23836d426	chore: add more scopes to the curated catalog (#20746 ) Just noticed when writing docs. These are probably obvious scopes to allow	2025-11-14 08:30:10 -06:00
Susana Ferreira	79d46769fe	chore: remove warning for non-trackable workspace builds in metrics (#20775 ) Previously, `UpdateWorkspaceTimingsMetrics` would log a warning for workspace builds that aren't tracked (restarts, stops, subsequent builds after creation). This was noisy since these are legitimate operations, not errors. `UpdateWorkspaceTimingsMetrics` is specifically designed to track only workspace creation, prebuild creation, and prebuild claim timings. Related with: https://github.com/coder/coder/pull/20772	2025-11-14 12:26:32 +00:00
Danny Kopping	86c4948445	chore: add timing flag context to warn message (#20772 ) `prometheus.provisionerd_server_metrics: unsupported workspace timing flags` appears in the logs, but without knowledge of the available flags it's not possible to troubleshoot this. Signed-off-by: Danny Kopping <danny@coder.com>	2025-11-14 10:10:53 +00:00
Steven Masley	fe3b825b86	chore: per template opt into cached terraform directories (#20609 ) For experimental and dogfood purposes, this adds the ability to opt in a single template. Leaving the rest of the templates as is. For GA, this setting might be removed or changed.	2025-11-13 14:04:12 -06:00
Steven Masley	9ca5b44b56	chore: implement persistent terraform directories (experimental) (#20563 ) Prior to this, every workspace build ran `terraform init` in a fresh directory. This would mean the `modules` are downloaded fresh. If the module is not pinned, subsequent workspace builds would have different modules.	2025-11-13 07:50:17 -06:00
Steven Masley	04727c06e8	chore: add experiment toggle for terraform workspace caching (#20559 ) Experiments passed to provisioners to determine behavior. This adds `--experiments` flag to provisioner daemons. Prior to this, provisioners had no method to turn on/off experiments.	2025-11-12 14:26:15 -06:00
Steven Masley	9149c1e9f2	chore: append template metadata to protobuf config (#20558 ) Adds some extra metadata sent to provisioners. Also adds a field `reuse_terraform_workspace` to tell the provisioner whether or not to use the caching experiment.	2025-11-12 12:46:39 -06:00
Mathias Fredriksson	e61b0fcf42	chore(codersdk): deprecate HasAITask on WorkspaceBuild (#20732 ) Closes coder/internal#973	2025-11-12 10:27:06 +00:00
Danny Kopping	04f809f2d0	chore!: allow coder MCP tools to not be injected (#20713 ) Currently, when AI Bridge is enabled AND the `oauth2` and `mcp-server-http` experiments are enabled we inject Coder's MCP tools into all intercepted AI Bridge requests. This PR introduces a config to control this behaviour. NOTE: this is a backwards-incompatible change; previously these tools would be injected automatically, now this setting will need to be explicitly enabled. --------- Signed-off-by: Danny Kopping <danny@coder.com>	2025-11-12 11:23:01 +02:00
Ethan	e49c917bb0	perf: use a single query for notification target lookups (#20574 ) Somewhat minor inefficiency in notifications I discovered during a scaletest where I was creating many users. Our `GetUsers` query filter for rbac roles uses the `&&` operator on arrays, which checks whether the two arrays overlap (have any element in common). Despite that, we were making separate DB queries for each role, and then collating the results. I didn't see any other instances of this. The test changes are required as the order of outgoing notifications is now non-deterministic.	2025-11-11 21:23:23 -05:00
Danielle Maywood	f2a1a7e8c3	fix(coderd): gate AI task notifications on agent ready state (#20690 ) Relates to https://github.com/coder/internal/issues/1098 Currently AgentAPI waits for only 2 seconds worth of identical terminal screen snapshots before deciding a task has entered a "stable" state. We interpret this as becoming "idle", resulting in a notification being triggered. This behavior is not ideal and is ultimately the root cause of our spammy notifications. Unfortunately, until we move AgentAPI to either use the Claude Code SDK (or ACP wrapper around it), we are unable to easily fix the root cause. This PR instead waits until the agent is ready before it will send state change notifications. This will at least resolve _some_ of the complaints about task state notifications being too spammy. --- 🤖 PR was written by Claude Sonnet 4.5 using [Coder Mux](https://github.com/coder/cmux) and reviewed by a human 👩	2025-11-10 16:00:13 +00:00
Paweł Banaszewski	991831b1dd	chore: add API key ID to interceptions (#20513 ) Adds APIKeyID to interceptions. Needed for tracking API key usage with bridge. fixes https://github.com/coder/coder/issues/20001	2025-11-10 13:46:41 +01:00
Mathias Fredriksson	ce04f6cc5d	fix(coderd): remove deprecated AITaskSidebarApp column (#20680 ) This column was no longer used in `v2.28` and the codersdk field deprecated. Both can now be dropped in `v2.29`. Closes coder/internal#974	2025-11-07 12:45:45 +02:00
Cian Johnston	34f6e72879	feat(coderd): add lookup task by name in httpmw.TaskParam (#20647 ) * Adds a `GetTaskByOwnerIDAndName` query * Updates `httpmw.TaskParam` to fall back to task name if no task by UUID found. * Updates the `TaskByIdentifier` used in `cli/` to use direct lookup instead of searching.	2025-11-05 14:28:34 +00:00
Mathias Fredriksson	daad93967a	fix(coderd): fix template ai task check error message (#20651 ) Create task was still mentioning magic prompt parameter when checking template task validity. This change updates it to only mention validity of `coder_ai_task` resource.	2025-11-03 12:54:43 +00:00
Mathias Fredriksson	a6b0eae38d	refactor(coderd): drop sidebar app constraint and simplify provisionerdserver for tasks (#20591 ) Updates coder/internal#973 Updates coder/internal#974	2025-11-03 13:46:38 +02:00
Cian Johnston	1961252918	chore(coderd/provisionerdserver): address flake in TestServer_ExpirePrebuildsSessionToken (#20648 ) Addresses a flake seen locally by @mafredri: ``` panic: interface conversion: proto.isAcquiredJob_Type is nil, not proto.AcquiredJob_WorkspaceBuild_ [recovered] panic: interface conversion: proto.isAcquiredJob_Type is nil, not proto.AcquiredJob_WorkspaceBuild_ goroutine 77 [running]: testing.tRunner.func1.2({0x35ba440, 0xc000f15620}) /usr/local/go/src/testing/testing.go:1734 +0x21c testing.tRunner.func1() /usr/local/go/src/testing/testing.go:1737 +0x35e panic({0x35ba440?, 0xc000f15620?}) /usr/local/go/src/runtime/panic.go:792 +0x132 github.com/coder/coder/v2/coderd/provisionerdserver_test.TestServer_ExpirePrebuildsSessionToken(0xc00010d500) /home/coder/coder/coderd/provisionerdserver/provisionerdserver_test.go:4128 +0xc4b testing.tRunner(0xc00010d500, 0x4bd8450) /usr/local/go/src/testing/testing.go:1792 +0xf4 created by testing.(*T).Run in goroutine 1 /usr/local/go/src/testing/testing.go:1851 +0x413 FAIL github.com/coder/coder/v2/coderd/provisionerdserver 20.830s FAIL ``` It's unclear why this would happen in the first place.	2025-11-03 11:39:02 +00:00
Mathias Fredriksson	7ae3fdc749	refactor: use task data model for notifications (#20590 ) Updates coder/internal#973 Updates coder/internal#974	2025-10-31 15:53:27 +02:00
Asher	d306a2d7e5	chore: log with %s on unexpected non-sdk err (#20570 ) With `%w` it prints an address instead of the error, like `<op> <url> 0xc001329370` instead of `<op> <url>: some error`, honestly idk why you even can log with `%w` it seems like it makes no sense to use `%w` outside of `fmt.Errorf`. This is to help debug https://github.com/coder/internal/issues/1010.	2025-10-30 10:23:52 -08:00
Danielle Maywood	d80b5fc8ed	refactor!: remove TaskAppID from codersdk.WorkspaceBuild (#20583 ) Remove the `TaskAppID` field from `codersdk.WorkspaceBuild`. Consumers can instead use the new `codersdk.Task` data model for this information.	2025-10-30 16:45:51 +00:00
Cian Johnston	38017010ce	fix(coderd): disallow POSTing a workspace build on a deleted workspace (#20584 ) - Adds a check on /api/v2/workspacebuilds to disallow creating a START or STOP build if the workspace is deleted. - DELETEs are still allowed.	2025-10-30 13:32:18 +00:00
Cian Johnston	73dedcc765	fix: delete related task when deleting workspace (#20567 ) * Instead of prompting the user to start a deleted workspace (which is silly), prompt them to create a new task instead. * Adds a warning dialog when deleting a workspace * Updates provisionerdserver to delete the related task if a workspace is related to a task	2025-10-30 10:37:51 +00:00
Steven Masley	54497f4f6b	chore: add revocation endpoint to oauth well-known (#20561 ) Was added to apps endpoints, but not the wider site ones. This is a site wide oauth route	2025-10-29 16:44:53 -05:00
Mathias Fredriksson	859e94d67a	fix: deprecate codersdk.AITaskPromptParameterName and reduce usage (#20501 ) Depends on coder/sqlc#1 Fixes coder/internal#979 Updates coder/internal#973	2025-10-29 18:59:12 +00:00
Mathias Fredriksson	303e9ef7de	fix: switch to coder/sqlc fork (#20536 ) Refs https://github.com/coder/sqlc/pull/1 Unblocks https://github.com/coder/coder/pull/20501 Upstream https://github.com/sqlc-dev/sqlc/pull/4159	2025-10-29 18:45:56 +02:00
Cian Johnston	1ebc217624	fix: update task link AppStatus using task_id (#20543 ) Fixes https://github.com/coder/coder/issues/20515 Alternative to https://github.com/coder/coder/pull/20519 Adds `task_id` to `workspaces_expanded` view and updates the "View Task" link in `AppStatuses` component. NOTE: this contains a migration	2025-10-29 15:45:45 +00:00
Danielle Maywood	06dbadab11	fix(coderd): ensure lifecycle executor has sufficient task permissions (#20539 ) We recently made a change to the `wsbuilder` to handle task related logic. Our test coverage for the lifecycle executor didn't handle this scenario and so we missed that it had insufficient permissions. This PR adds `Update` and `Read` permissions for `Task`s in the lifecycle executor, as well as an autostart/autostop test tailored to task workspaces to verify the change. --- Anthropic's Claude Sonnet 4.5 Thinking was involved in writing the tests	2025-10-29 15:44:35 +00:00
Cian Johnston	566146af72	fix(coderd): fix audit log resource link for tasks (#20545 ) Existing task audit log links were incorrect. As audit log links are generated on-the-fly, this does not require backfill.	2025-10-29 15:31:41 +00:00
Susana Ferreira	7e8fcb4b0f	perf: optimize prebuilds membership reconciliation to check orgs not presets (#20493 ) ## Description The membership reconciliation ensures the prebuilds system user is a member of all organizations with prebuilds configured. To support prebuilds quota management, each organization must have a prebuilds group that the system user belongs to. ## Problem Previously, membership reconciliation iterated over all presets to check and update membership status. This meant database queries `GetGroupByOrgAndName` and `InsertGroupMember` were executed for each preset. Since presets are unique combinations of `(organization, template, template version, preset)`, this resulted in several redundant checks for the same organization. In dogfood, `InsertGroupMember` was called thousands of times per day, even though memberships were already configured ([internal Grafana dashboard link](https://grafana.dev.coder.com/goto/46MZ1UgDg?orgId=1)) ## Solution This PR introduces `GetOrganizationsWithPrebuildStatus`, a single query that returns: * All unique organizations with prebuilds configured * Whether the prebuilds user is a member of each organization * Whether the prebuilds group exists in each organization * Whether the prebuilds user is in the prebuilds group The membership reconciliation logic now: * Fetches status for all organizations in one query * Only performs inserts for organizations missing required memberships or groups * Safely handles concurrent operations via unique constraint violations * This reduces database load from `O(presets)` to `O(organizations)` per reconciliation loop, with a single read query when everything is configured. ## Changes * Add `GetOrganizationsWithPrebuildStatus` SQL query * Update `membership.ReconcileAll` to use organization-based reconciliation instead of preset-based * Update tests to reflect new behavior Related to internal thread: https://codercom.slack.com/archives/C07GRNNRW03/p1760535570381369	2025-10-29 14:24:29 +00:00
Danny Kopping	b20fd6f2c1	chore: graduate aibridge API out of experimental (#20523 )	2025-10-29 07:18:54 -06:00
Susana Ferreira	aad1b401c1	feat: add prebuilds reconciliation duration metric (#20535 ) ## Description Adds `coderd_prebuilds_reconciliation_duration_seconds` histogram metric to track the duration of each prebuilds reconciliation cycle. This metric helps operators monitor reconciliation performance and identify potential bottlenecks. ## Changes - Added `ReconcileStats` struct to capture reconciliation cycle statistics - Updated `ReconcileAll()` to return stats including elapsed time - Added histogram metric `coderd_prebuilds_reconciliation_duration_seconds`	2025-10-29 12:52:30 +00:00
Danny Kopping	95a1ca898f	chore: remove aibridge experiment (#20520 ) Removes the experiment and all references to it	2025-10-29 06:18:38 -06:00
Susana Ferreira	c3e3bb58f2	feat: delete pending canceled prebuilds (#20499 ) ## Description PR https://github.com/coder/coder/pull/20387 introduced canceling pending prebuild jobs from inactive template versions to avoid provisioning obsolete workspaces. However, the associated prebuilds remained in the database with "Canceled" status, visible in the UI. This PR now orphan-deletes these canceled prebuilt workspaces. Since the canceled jobs were never processed by a provisioner, no Terraform resources were created, making orphan deletion safe. Orphan deletion always creates a provisioner job, but behaves differently based on provisioner availability: - If no provisioner daemon is available, the job is immediately marked as completed and the workspace is marked as deleted without any provisioner processing - If a provisioner daemon is available, it processes the delete job with empty Terraform state (no actual resources to destroy) The job cancellation and workspace deletion occur atomically in the same transaction. We don't split this into two separate reconciliation runs because there's no way to distinguish between system-canceled prebuilds and user-canceled workspaces. If we deleted canceled workspaces in a later run, we'd delete user-canceled workspaces that users may want to keep for troubleshooting. Note: This only applies to system-generated prebuilds from inactive template versions. ## Changes * Update `UpdatePrebuildProvisionerJobWithCancel` query to return job ID, workspace ID, template ID, and template version preset ID * Add `DeprovisionMode` enum to support orphan deletion in the provision flow * Update `ActionTypeCancelPending` handler to cancel jobs and orphan-delete associated workspaces atomically	2025-10-29 10:37:28 +00:00
Callum Styan	45c43d4ec4	fix: refactor agent resource monitoring API to avoid excessive calls to DB (#20430 ) This should resolve https://github.com/coder/internal/issues/728 by refactoring the ResourceMonitorAPI struct to only require querying the resource monitor once for memory and once for volumes, then using the stored monitors on the API struct from that point on. This should eliminate the vast majority of calls to `GetWorkspaceByAgentID` and `FetchVolumesResourceMonitorsUpdatedAfter`/`FetchMemoryResourceMonitorsUpdatedAfter` (millions of calls per week). Tests passed, and I ran an instance of coder via a workspace with a template that added resource monitoring every 10s. Note that this is the default docker container, so there are other sources of `GetWorkspaceByAgentID` db queries. Note that this workspace was running for ~15 minutes at the time I gathered this data. Over 30s for the `ResourceMonitor` calls: ``` coder@callum-coder-2:~/coder$ curl localhost:19090/metrics | grep ResourceMonitor | grep count coderd_db_query_latencies_seconds_count{query="FetchMemoryResourceMonitorsByAgentID"} 2 coderd_db_query_latencies_seconds_count{query="FetchMemoryResourceMonitorsUpdatedAfter"} 2 coderd_db_query_latencies_seconds_count{query="FetchVolumesResourceMonitorsByAgentID"} 2 coderd_db_query_latencies_seconds_count{query="FetchVolumesResourceMonitorsUpdatedAfter"} 2 coderd_db_query_latencies_seconds_count{query="UpdateMemoryResourceMonitor"} 155 coderd_db_query_latencies_seconds_count{query="UpdateVolumeResourceMonitor"} 155 coder@callum-coder-2:~/coder$ curl localhost:19090/metrics | grep ResourceMonitor | grep count coderd_db_query_latencies_seconds_count{query="FetchMemoryResourceMonitorsByAgentID"} 2 coderd_db_query_latencies_seconds_count{query="FetchMemoryResourceMonitorsUpdatedAfter"} 2 coderd_db_query_latencies_seconds_count{query="FetchVolumesResourceMonitorsByAgentID"} 2 coderd_db_query_latencies_seconds_count{query="FetchVolumesResourceMonitorsUpdatedAfter"} 2 coderd_db_query_latencies_seconds_count{query="UpdateMemoryResourceMonitor"} 158 coderd_db_query_latencies_seconds_count{query="UpdateVolumeResourceMonitor"} 158 ``` And over 1m for the `GetWorkspaceByAgentID` calls, the majority are from the workspace metadata stats updates: ``` coder@callum-coder-2:~/coder$ curl localhost:19090/metrics | grep GetWorkspaceByAgentID | grep count coderd_db_query_latencies_seconds_count{query="GetWorkspaceByAgentID"} 876 coder@callum-coder-2:~/coder$ curl localhost:19090/metrics | grep GetWorkspaceByAgentID | grep count coderd_db_query_latencies_seconds_count{query="GetWorkspaceByAgentID"} 918 ``` --------- Signed-off-by: Callum Styan <callumstyan@gmail.com>	2025-10-28 13:38:16 -07:00
Danielle Maywood	a1e7e105a4	chore: disable task notifications by default (#20518 ) Relates to https://github.com/coder/internal/issues/1098 Currently task notifications are incredibly noisy. We should disable them by default for the upcoming release whilst we iron them out.	2025-10-28 17:21:23 +00:00
Cian Johnston	659f89e079	feat(coderd): add owner-related fields to tasks_with_status view (#20471 ) Relates to https://github.com/coder/coder/pull/20431/files#diff-9cfc826a6ce7e77d977b2025482474dd263d12965b2a94479a74c7f1d872b782 If the workspace relating to a task was deleted, most of the workspace-related fields in `taskFromDBTaskAndWorkspace` will be zero-valued. However, we can still get information relating to the owner so that "created by" shows up correctly in the UI. Updates the `tasks_with_status` view with a join on `visible_users` to get owner-related info.	2025-10-28 14:29:29 +00:00
Mathias Fredriksson	a1fa58ac17	fix: update dbgen and dbfake task creation and toolsdk test fixtures (#20508 ) Depends on #20506 Fixes coder/internal#1103	2025-10-28 14:15:58 +02:00
Danny Kopping	d18441debe	feat: add AWS Bedrock support (#20507 ) Depends on https://github.com/coder/aibridge/pull/44 Closes https://github.com/coder/aibridge/issues/28 --------- Signed-off-by: Danny Kopping <danny@coder.com>	2025-10-28 03:38:14 +00:00
ケイラ	4f7b279fd8	feat: add an organization member permission level (#19953 )	2025-10-27 17:14:16 -06:00
Mathias Fredriksson	c3cbd977f1	fix(coderd/database/dbfake): use transaction for workspace builder (#20506 ) While investigating a flake I noticed that the dbfake workspace builder executes all database inserts without a transaction. Since our real wsbuilder implementation utilizes one, it makes sense to do so here as well. For example, our normal workspace <-> build relationship is such that a workspace cannot exist without at least one build. However, our GetWorkspaces query left joins workspace builds but has types that are non-nullable, leading to flakes like coder/internal#1103.	2025-10-28 01:06:52 +02:00
Dean Sheather	5a3ceb38f0	chore: add aibridge data to telemetry (#20449 ) - Adds a new table to keep track of which payloads have already been reported since we only report for the last clock hour - Adds a query to gather and aggregate all the data by provider/model/client Relates to https://github.com/coder/coder-telemetry-server/issues/27	2025-10-28 03:16:41 +11:00
ケイラ	d9c40d61c2	refactor: clean up policy.rego (#20366 )	2025-10-27 10:01:30 -06:00
Spike Curtis	af3ff825a1	test: track postgres database creation by package and test name (#20492 ) Adds columns to track package and test name to test_databases table, and populates them as databases are created using the Broker. In order to seamlessly work with existing `coder_database` databases with the old schema, the SQL that creates the table and columns is additive and idempotent, so we run it every time we initialize the Broker (once per test binary execution). We include a transaction-level advisory lock to prevent deadlocks before attempting to alter the schema. I was seeing deadlocks without this.	2025-10-27 14:31:32 +04:00
Paweł Banaszewski	50ba223aa1	feat: add db query for setting interception ended_at field (#20437 ) Adds UpdateAIBridgeInterceptionEnded query to mark interceptions as done. Needed for https://github.com/coder/internal/issues/1051	2025-10-27 09:51:37 +01:00
Cian Johnston	b8a0f97cab	chore(coderd): add test for deleting task with no workspace (#20466 )	2025-10-24 18:19:05 +01:00
Susana Ferreira	f6e86c6fdb	feat: cancel pending prebuilds from non-active template versions (#20387 ) ## Description This PR introduces an optimization to automatically cancel pending prebuild-related jobs from non-active template versions in the reconciliation loop. ## Problem Currently, when a template is configured with more prebuild instances than available provisioners, the provisioner queue can become flooded with pending prebuild jobs. This issue is worsened when provisioning/deprovisioning operations take a long time. When the prebuild reconciliation loop generates jobs faster than provisioners can process them, pending jobs accumulate in the queue. Since prebuilt workspaces should always run the latest active template version, pending prebuild jobs from non-active versions become obsolete once a new version is promoted. ## Solution The reconciliation loop cancels pending prebuild-related jobs from non-active template versions that match the following criteria: * Build number: 1 (initial build created by the reconciliation loop) * Job status: `pending` * Not yet picked up by a provisioner (`worker_id` is `NULL`) * Owned by the prebuilds system user * Workspace transition: `start` This prevents the queue from being cluttered with stale prebuild jobs that would provision workspaces on an outdated template version that would consequently need to be deprovisioned. ## Changes * Added new SQL query `CountPendingNonActivePrebuilds` to identify presets with pending jobs from non-active versions * Added new SQL query `UpdatePrebuildProvisionerJobWithCancel` to cancel jobs for a specific preset * New reconciliation action type `ActionTypeCancelPending` handles the cancellation logic * Cancellation is non-blocking: failures to cancel prebuild jobs are logged as errors and don't prevent other reconciliation actions ## Follow-up PR Canceling pending prebuild jobs leaves workspaces in a Canceled state. While no Terraform resources need to be destroyed (since jobs were canceled before provisioning started), these database records should still be cleaned up. This will be addressed in a follow-up PR. Closes: https://github.com/coder/coder/issues/20242	2025-10-24 15:27:49 +01:00
Mathias Fredriksson	51d3abb904	feat(site): use new task data model and endpoints (#20431 ) Updates the UI to use the new API endpoints for tasks and use its new data model. Disclaimer: Since the base data model for tasks changed, we had to do a quite large refactor and I'm sorry for that 🙏, but you'll notice most of the changes are to adjust the types. Closes coder/internal#976 --------- Co-authored-by: Bruno Quaresma <bruno_nonato_quaresma@hotmail.com>	2025-10-24 10:45:19 -03:00
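
For readers following commit 7e8fcb4b0f in the list above, the move from per-preset to per-organization membership checks can be pictured with a small Go sketch. It is only an illustration under stated assumptions: the `Preset` type, its fields, and `uniqueOrgs` are hypothetical names that do not come from the coder codebase, and the actual change relies on a SQL query (`GetOrganizationsWithPrebuildStatus`) rather than in-memory deduplication.

```go
package main

import "fmt"

// Preset is a hypothetical stand-in for a prebuild preset row.
// Several presets can belong to the same organization.
type Preset struct {
	OrgID string
	Name  string
}

// uniqueOrgs returns each organization ID once, in first-seen order,
// so that later work is proportional to organizations, not presets.
func uniqueOrgs(presets []Preset) []string {
	seen := make(map[string]struct{}, len(presets))
	orgs := make([]string, 0, len(presets))
	for _, p := range presets {
		if _, ok := seen[p.OrgID]; ok {
			continue // this organization is already scheduled for a check
		}
		seen[p.OrgID] = struct{}{}
		orgs = append(orgs, p.OrgID)
	}
	return orgs
}

func main() {
	presets := []Preset{
		{OrgID: "org-a", Name: "small"},
		{OrgID: "org-a", Name: "large"},
		{OrgID: "org-b", Name: "small"},
	}
	orgs := uniqueOrgs(presets)
	fmt.Printf("%d presets, %d membership checks\n", len(presets), len(orgs))
}
```

With many presets per organization, the number of membership checks tracks the organization count, which is the effect the commit message describes as going from `O(presets)` to `O(organizations)`.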


3005 Commits