coder

mirror of https://github.com/coder/coder.git synced 2026-06-03 04:58:23 +00:00

Author	SHA1	Message	Date
Zach	ddc0e99c69	chore: remove coder_secret Terraform integration (#25512 ) Removes the coder_secret Terraform integration: the data.coder_secret consumption path through provisionerdserver → provisioner.proto → provisioner/terraform, the dynamic-parameter secret-requirement validation, and the workspace-update / resolve-autostart surfaces that depended on it. This is being done due to a product/feature direction change (see PLAT-243). User-secret CRUD (DB, REST, CLI, UI, telemetry, audit) and the agent-manifest secret-injection path are untouched. The provisionerd API is bumped from v1.17 to v1.18 rather than rolled back: v1.17 shipped in v2.33.x, so user_secrets field numbers are reserved and the changelog documents both versions. Generated with assistance from Coder Agents.	2026-05-21 09:19:29 -06:00
Garrett Delfosse	78d4cf9e47	fix: soft-delete stale workspace agents on new build (#25207 )	2026-05-18 08:33:29 -04:00
Zach	79735f2d45	feat: plumb user secrets through provisioner chain to terraform (#24542) This change passes user secrets from coderd to the Terraform process at workspace build time so the `data.coder_secret` data source in terraform-provider-coder can resolve values at plan time. Secrets traverse two proto hops: `provisionerdserver` fetches them via `ListUserSecretsWithValues` and attaches them to `AcquiredJob.WorkspaceBuild.user_secrets` on `provisionerd.proto`; `runner.go` forwards them into `PlanRequest.user_secrets` on `provisioner.proto`; the Terraform provisioner encodes each as `CODER_SECRET_ENV_<name>` or `CODER_SECRET_FILE_<hex(path)>` before invoking `terraform plan`. Only plan requests carry secrets; apply runs with `nil` because values are baked into plan state. The fetch is gated on the workspace transitioning to start. Stop and delete transitions never carry secrets, so revoking or deleting a stored secret cannot make a workspace unstoppable. DB errors on the fetch fail the job outright rather than silently continuing with an empty secret set. Note that user secrets will be stored in the workspace_builds table in provisioner_state with other Terraform state (including other sensitive data).	2026-04-27 08:26:07 -06:00
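The environment-variable encoding described in #24542 can be sketched roughly as follows. This is an illustration only: the `userSecret` type, its field names, and the choice of lowercase hex are assumptions, not the provisioner's actual code.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// userSecret is a hypothetical stand-in for the secret message;
// the real proto fields are not shown in the commit log.
type userSecret struct {
	EnvName  string // set when the secret is exposed as an env var
	FilePath string // set when the secret is exposed as a file
	Value    string
}

// secretEnvKey builds the variable name handed to `terraform plan`.
// It returns false when the secret has neither an env name nor a path.
func secretEnvKey(s userSecret) (string, bool) {
	switch {
	case s.EnvName != "":
		return "CODER_SECRET_ENV_" + s.EnvName, true
	case s.FilePath != "":
		// Assumption: the path is hex-encoded byte for byte (lowercase).
		return "CODER_SECRET_FILE_" + hex.EncodeToString([]byte(s.FilePath)), true
	default:
		return "", false
	}
}

func main() {
	key, ok := secretEnvKey(userSecret{FilePath: "/a", Value: "s3cr3t"})
	fmt.Println(key, ok)
}
```

Hex-encoding the path keeps characters such as `/` and `.` out of the variable name, which is presumably why the commit uses `hex(path)` rather than the raw path.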
Sas Swart	5b6b7719df	fix: make prebuild claiming durable and idempotent (#23108 ) ## Problem When a prebuilt workspace is claimed, the agent reinitializes via a single fire-and-forget pubsub event over SSE. If the agent's SSE connection is interrupted at claim time, the event is permanently lost — the workspace is stuck with no self-healing path. Additionally, regular (non-prebuild) workspaces had no way to opt out of the `/reinit` polling loop — agents would reconnect indefinitely to an endpoint that would never send them anything useful. ## Root Cause `workspaceAgentReinit` fetches the workspace (with its current `owner_id`) via `GetWorkspaceByAgentID`, but never checked whether a claim already happened. It only subscribed to pubsub for future events. The database already has durable claim state (`owner_id` changes from `PrebuildsSystemUserID` to the real user), but no layer ever consulted it on reconnection. ## Solution ### Server-side durable check with first-build-initiator gating TOCTOU-safe ordering: Subscribe to pubsub claim events before any durable checks, so a claim that fires during the check is buffered in the channel rather than lost. First-build-initiator gating: When `!workspace.IsPrebuild()` (owner is no longer the system user), look up the first build's `InitiatorID`. The prebuild reconciler always uses `PrebuildsSystemUserID` as the initiator. This distinguishes claimed prebuilds from regular workspaces without any SQL schema changes. 
- Regular workspace (first build initiator ≠ system user) → 409 Conflict, agent stops reconnecting - Claimed prebuild, build completed → pre-seed channel with reinit event and close it, transmitter delivers one-shot then exits - Claimed prebuild, build in-progress → fall through to pubsub subscription, agent waits for completion event - Unclaimed prebuild → pubsub subscription (existing happy path) ### Declarative reinit events (defense-in-depth) - Added `UserID` field to `ReinitializationEvent` with JSON tags - Switched pubsub serialization from raw string to JSON (with backward-compat fallback for rolling upgrades) - Populated `UserID` at both the publish site and the durable check ### Agent SDK: 409 handling `WaitForReinitLoop` detects 409 Conflict from the server and closes the `reinitEvents` channel, cleanly exiting the retry goroutine. ### Agent CLI: fixed two bugs + added reinitCtx - Closed channel (`!ok`): now blocks on `<-ctx.Done()` instead of `continue`, keeping the current agent running. Previously this would leak agents by skipping `agnt.Close()` and re-entering the loop. - Duplicate owner reinit: cancels `reinitCtx` (stops the reinit goroutine), then blocks on `<-ctx.Done()`. Previously `continue` would skip cleanup and create a new agent on the next loop iteration. - `reinitCtx`: a cancellable child of `ctx` passed to `WaitForReinitLoop`, allowing the agent to stop the reinit HTTP polling after reinit completes. ### Agent-side idempotency Tracks `lastOwnerID` in the agent reinit loop — duplicate events for the same owner are skipped. 
## Testing - "unclaimed prebuild receives reinit via pubsub": prebuild owned by system user, pubsub event triggers reinit - "claimed prebuild receives one-shot reinit on reconnect": first build by system user, owner changed, build completed → immediate reinit (no pubsub needed) - "claimed prebuild waits during in-progress claim build": claimed but build still running → no reinit until build completes - "regular workspace gets 409": first build by real user → 409 Conflict, agent stops polling - Updated claim publisher/listener tests: verify `UserID` survives JSON round-trip + backward compat with raw string payloads - Updated SSE round-trip test: verify `UserID` survives transmit → receive cycle Fixes #22359 ## Rolling upgrade note During a rolling deploy where old coderd instances coexist with new ones, the pubsub `ReinitializationEvent` has a new `workspace_id` field (JSON key `workspace_id`). Old publishers send a raw reason string instead of JSON; the new listener gracefully falls back by treating the entire payload as the reason and filling in `WorkspaceID` from context. The only visible effect during the upgrade window is that `WorkspaceID` may be the zero UUID in agent-side logs — this is cosmetic and resolves once all instances are updated.	2026-04-02 23:51:02 +02:00
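The backward-compatible listener fallback described in the rolling upgrade note of #23108 might look like the sketch below. The struct, the `user_id` and `reason` JSON keys, and the function name are hypothetical; only the `workspace_id` key and the "treat the whole payload as the reason" behavior come from the commit message.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// reinitEvent is a hypothetical mirror of ReinitializationEvent.
type reinitEvent struct {
	WorkspaceID string `json:"workspace_id"`
	UserID      string `json:"user_id"` // assumed key name
	Reason      string `json:"reason"`  // assumed key name
}

// parseReinitPayload accepts both the new JSON payload and the old
// raw-string payload sent by not-yet-upgraded publishers.
func parseReinitPayload(payload []byte, workspaceIDFromContext string) reinitEvent {
	var ev reinitEvent
	if err := json.Unmarshal(payload, &ev); err == nil && ev.Reason != "" {
		return ev
	}
	// Old publisher: the entire payload is the reason string, and the
	// workspace ID has to come from the subscription context.
	return reinitEvent{
		WorkspaceID: workspaceIDFromContext,
		Reason:      string(payload),
	}
}

func main() {
	fmt.Printf("%+v\n", parseReinitPayload([]byte("prebuild_claimed"), "ws-from-context"))
	fmt.Printf("%+v\n", parseReinitPayload(
		[]byte(`{"workspace_id":"ws-1","user_id":"u-1","reason":"prebuild_claimed"}`),
		"ws-from-context"))
}
```

The fallback returns a fresh struct rather than the partially decoded one, so a failed decode cannot leak half-populated fields into the event.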
Cian Johnston	3f55b35f68	refactor: replace AsSystemRestricted with narrower actors (#23712 ) Replace overly-broad `AsSystemRestricted` with purpose-built actors: - OAuth2 provider paths → `AsSystemOAuth2` (13 call sites across `tokens.go`, `registration.go`, `apikey.go`) - Provisioner daemon health read → `AsSystemReadProvisionerDaemons` (1 site in `healthcheck/provisioner.go`) - Provisionerd file cache paths → `AsProvisionerd` (2 sites in `provisionerdserver.go`, matching existing usage nearby) <details> <summary>Implementation notes</summary> Each replacement actor is a strict subset of `AsSystemRestricted`. Every DB method at each call site is already covered by the narrower actor's permissions: - `subjectSystemOAuth2`: OAuth2App/Secret/CodeToken (all), ApiKey (Read, Delete), User (Read), Organization (Read) - `subjectSystemReadProvisionerDaemons`: ProvisionerDaemon (Read) - `subjectProvisionerd`: File (Create, Read) plus provisionerd-scoped resources No new permissions added. `nolint:gocritic` comments updated to reflect the new actors. </details> > 🤖 Created by a Coder Agent, reviewed by me.	2026-03-27 15:08:30 +00:00
Kacper Sawicki	1e07ec49a6	feat: add merge_strategy support for coder_env resources (#23107 ) ## Description Implements the server-side merge logic for the `merge_strategy` attribute added to `coder_env` in [terraform-provider-coder v2.15.0](https://github.com/coder/terraform-provider-coder/pull/489). This allows template authors to control how duplicate environment variable names are combined across multiple `coder_env` resources. Relates to https://github.com/coder/coder/issues/21885 ## Supported strategies \| Strategy \| Behavior \| \|----------\|----------\| \| `replace` (default) \| Last value wins — backward compatible \| \| `append` \| Joins values with `:` separator (e.g. PATH additions) \| \| `prepend` \| Prepends value with `:` separator \| \| `error` \| Fails the build if the variable is already defined \| ## Example ```hcl resource "coder_env" "path_tools" { agent_id = coder_agent.dev.id name = "PATH" value = "/home/coder/tools/bin" merge_strategy = "append" } ``` ## Changes - Proto: Added `merge_strategy` field to `Env` message in `provisioner.proto` - State reader: Updated `agentEnvAttributes` struct and proto construction in `resources.go` - Merge logic: Added `mergeExtraEnvs()` function in `provisionerdserver.go` with strategy-aware merging for both agent envs and devcontainer subagent envs - Tests: 15 unit tests covering all strategies, edge cases (empty values, mixed strategies, multiple appends) - Dependency: Bumped `terraform-provider-coder` v2.14.0 → v2.15.0 - Fixtures: Updated `duplicate-env-keys` test fixtures and golden files ## Ordering When multiple resources `append` or `prepend` to the same key, they are processed in alphabetical order by Terraform resource address (per the determinism fix in #22706).	2026-03-18 15:43:28 +01:00
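A minimal sketch of the four strategies from #23107, assuming the inputs are already sorted by Terraform resource address. The `env` type and `mergeEnvs` function are illustrative names and are not the `mergeExtraEnvs()` implementation. How empty values are handled here (joined without a stray `:`) is also an assumption.

```go
package main

import "fmt"

// env is a hypothetical stand-in for the proto Env message.
type env struct {
	Name          string
	Value         string
	MergeStrategy string // "", "replace", "append", "prepend", or "error"
}

// joinPath joins two values with ":" and skips empty sides.
func joinPath(first, second string) string {
	switch {
	case first == "":
		return second
	case second == "":
		return first
	default:
		return first + ":" + second
	}
}

// mergeEnvs folds the envs in order into a single name -> value map.
func mergeEnvs(envs []env) (map[string]string, error) {
	out := make(map[string]string, len(envs))
	for _, e := range envs {
		prev, exists := out[e.Name]
		switch e.MergeStrategy {
		case "", "replace": // default: last value wins
			out[e.Name] = e.Value
		case "append":
			out[e.Name] = joinPath(prev, e.Value)
		case "prepend":
			out[e.Name] = joinPath(e.Value, prev)
		case "error":
			if exists {
				return nil, fmt.Errorf("env %q is already defined", e.Name)
			}
			out[e.Name] = e.Value
		default:
			return nil, fmt.Errorf("unknown merge strategy %q", e.MergeStrategy)
		}
	}
	return out, nil
}

func main() {
	merged, err := mergeEnvs([]env{
		{Name: "PATH", Value: "/usr/bin"},
		{Name: "PATH", Value: "/home/coder/tools/bin", MergeStrategy: "append"},
	})
	fmt.Println(merged["PATH"], err)
}
```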
Steven Masley	abf59ee7a6	feat: track ai seat usage (#22682) When a user uses an AI feature, we record them in `ai_seat_state` as consuming a seat. Added debouncing to prevent excessive database writes for this feature, since frequent updates are not needed.	2026-03-16 12:36:26 -05:00
Mathias Fredriksson	703b974757	fix(coderd): remove false devcontainers early access warning (#23056 ) The script source claimed Dev Containers are early access and told users to set CODER_AGENT_DEVCONTAINERS_ENABLE=true, which already defaults to true. Clear the script source and set RunOnStart to false since there is nothing to run.	2026-03-16 10:16:14 +02:00
Callum Styan	36665e17b2	feat: add WatchAllWorkspaceBuilds endpoint for autostart scaletests (#22057 ) This PR adds a `WatchAllWorkspaces` function with `watch-all-workspaces` endpoint, which can be used to listen on a single global pubsub channel for _all_ workspace build updates, and makes use of it in the autostart scaletest. This negates the need to use a workspace watch pubsub channel _per_ workspace, which has auth overhead associated with each call. This is especially relevant in situations such as the autostart scaletest, where we need to start/stop a set of workspaces before we can configure their autostart config. The overhead associated with all the watch requests skews the scaletest results and makes it harder to reason about the performance of the autostart feature itself. The autostart scaletest also no longer generates its own metrics nor does it wait for all the workspaces to actually start via autostart. We should update the scaletest dashboard after both PRs are merged to measure autostart performance via the new metrics. The new function/endpoint and its usage in the autostart scaletest are gated behind an experiment feature flag, this is something we should discuss whether we want to enable the endpoint in prod by default or not. If so, we can remove the experiment. --------- Signed-off-by: Callum Styan <callumstyan@gmail.com> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> Co-authored-by: Callum Styan <callum@coder.com>	2026-03-13 20:37:41 -07:00
Mathias Fredriksson	9d33c340ec	fix(coderd): handle ignored errors across coderd packages (#22851 ) Handle previously ignored error return values in coderd: - coderd/chats.go: check sendEvent errors, log on failure - coderd/chatd/chattest: thread testing.TB through server structs, replace log.Printf with t.Logf, check writeSSEEvent errors - coderd/chatd/chattool/createworkspace.go: log UpdateChatWorkspace failure instead of discarding both return values - coderd/chatd/chattool/execute.go: surface ProcessOutput error in the timeout message returned to the caller - coderd/provisionerdserver: log stream.Send failure in the DownloadFile error helper	2026-03-13 19:53:20 +02:00
Kyle Carberry	d39f69f4c2	fix: avoid mutating proto App.Healthcheck in insertAgentApp (#22954 ) ## Problem `insertAgentApp` mutated its input by writing to `app.Healthcheck` when it was nil (line 3525): ```go if app.Healthcheck == nil { app.Healthcheck = &sdkproto.Healthcheck{} // mutation! } ``` The Devcontainers subtests share the same `tt.resource` pointer across two parallel goroutines (`WithProtoIDs` and `WithoutProtoIDs`), causing a data race on the `Healthcheck` field (and its sub-fields `Url`, `Interval`, `Threshold`). ## Fix Replace the in-place mutation with a local variable: ```go healthcheck := app.GetHealthcheck() if healthcheck == nil { healthcheck = &sdkproto.Healthcheck{} } ``` This avoids writing back to the shared proto message. All downstream reads now use the local `healthcheck` variable.	2026-03-11 16:29:10 +00:00
Steven Masley	537260aa22	fix: early oidc refresh with fake idp tests (#22712 ) Wrote unit tests that implement a fake idp to verify the oauth package actually refreshes the token	2026-03-06 16:51:27 +00:00
Steven Masley	c805c8c02c	chore: setting time forward for expiration math (#22687) It was set backwards, which allowed invalid refresh tokens and made things worse.	2026-03-06 12:29:54 +00:00
Steven Masley	f49dea683c	chore: prematurely refresh oidc token near expiry during workspace build (#22502 ) Closes https://github.com/coder/coder/issues/22429	2026-03-03 18:13:00 +00:00
Jon Ayers	0a7a3da178	fix: exclude provisioner_state from workspace_build_with_user view (#22159 ) The provisioner state for a workspace build was being loaded for every long-lived agent rpc connection. Since this state can be anywhere from kilobytes to megabytes this can gradually cause the `coderd` memory footprint to grow over time. It's also a lot of unnecessary allocations for every query that fetches a workspace build since only a few callers ever actually reference the provisioner state. This PR removes it from the returned workspace build and adds a query to fetch the provisioner state explicitly.	2026-02-23 22:46:17 -06:00
Zach	6a783fc5c7	fix: floor provisioner job queue wait metric (#22184 ) After a PostgreSQL round-trip, job timestamps lose their monotonic clock component, making the subtraction susceptible to wall-clock adjustments producing a small negative delta. Floor at 1ms since a zero or negative queue wait is meaningless. Fixes TestProvisionerJobQueueWaitMetric flakes where small negative values (~ -2ms) are observed.	2026-02-20 16:12:17 -07:00
Danielle Maywood	911d734df9	fix: avoid re-using `AuthInstanceID` for sub agents (#22196 ) Parent agents were re-using AuthInstanceID when spawning child agents. This caused GetWorkspaceAgentByInstanceID to return the most recently created sub agent instead of the parent when the parent tried to refetch its own manifest. Fix by not reusing AuthInstanceID for sub agents, and updating GetWorkspaceAgentByInstanceID to filter them out entirely.	2026-02-19 16:56:29 +00:00
Callum Styan	5f3be6b288	feat: add provisioner job queue wait time histogram and jobs enqueued counter (#21869 ) This PR adds some metrics to help identify job enqueue rates and latencies. This work was initiated as a way to help reduce the cost of the observation/measurement itself for autostart scaletests, which impacts our ability to identify/reason about the load caused by autostart. See: https://github.com/coder/internal/issues/1209 I've extended the metrics here to account for regular user initiated builds, prebuilds, autostarts, etc. IMO there is still the question here of whether we want to include or need the `transition` label, which is only present on workspace builds. Including it does lead to an increase in cardinality, and in the case of the histogram (when not using native histograms) that's at least a few extra series for every bucket. We could remove the transition label there but keep it on the counter. Additionally, the histogram is currently observing latencies for other jobs, such as template builds/version imports, those do not have a transition type associated with them. Tested briefly in a workspace, can see metric values like the following: - `coderd_workspace_builds_enqueued_total{build_reason="autostart",provisioner_type="terraform",status="success",transition="start"} 1` - `coderd_provisioner_job_queue_wait_seconds_bucket{build_reason="autostart",job_type="workspace_build",provisioner_type="terraform",transition="start",le="0.025"} 1` --------- Signed-off-by: Callum Styan <callumstyan@gmail.com> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-12 13:40:47 -08:00
Steven Masley	efd98bd93a	chore: add template toggle to disable module caching (#21931) There are use cases for disabling the new module caching behavior of workspace builds. Disabling it restores the legacy behavior.	2026-02-05 14:38:55 -06:00
Danielle Maywood	37aecda165	feat(coderd/provisionerdserver): insert sub agent resource (#21699 ) Update provisionerdserver to handle the changes introduced to provisionerd in https://github.com/coder/coder/pull/21602 We now create a relationship between `workspace_agent_devcontainers` and `workspace_agents` with the newly created `subagent_id`.	2026-01-30 17:19:19 +00:00
Steven Masley	e13f2a9869	chore: remove extra `stop_modules` from provisionerd proto (#21706 ) Was a duplicate of start_modules Closes https://github.com/coder/coder/issues/21206	2026-01-28 09:25:47 -06:00
Cian Johnston	7b44976618	fix(coderd/provisionerdserver): correct managed agent tracking (#21696 ) Relates to https://github.com/coder/internal/issues/1282 Updates tracking of managed agents to be predicated instead on the presence of a related `task_id` instead of the presence of a `coder_ai_task` resource.	2026-01-27 12:14:52 +00:00
Steven Masley	60b3fd0783	chore!: send modules archive over the proto messages (#21398) # What this does Dynamic parameters caches the `./terraform/modules` directory for parameter usage. This PR sends that archive to the provisioner when building workspaces. This allows terraform to skip downloading modules from their registries, a step that takes seconds. <img width="1223" height="429" alt="Screenshot From 2025-12-29 12-57-52" src="https://github.com/user-attachments/assets/16066e0a-ac79-4296-819d-924f4b0418dc" /> # Wire protocol The wire protocol reuses the same mechanism used to download the modules `provisioner -> coder`. It splits large archives into multiple protobuf messages so they can be sent under the message size limit. # 🚨 Behavior Change (Breaking Change) 🚨 Before this PR, modules were downloaded on every workspace build. This means unpinned modules always fetched the latest version. After this PR, modules are cached at template import time, and their versions are effectively pinned for all subsequent workspace builds.	2026-01-09 11:33:34 -06:00
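The chunked transfer described under "Wire protocol" in #21398 can be pictured as below. The field names follow the `ChunkPiece` message quoted in the companion protobuf change (#21447). The SHA-256 hash, the `int32` index type, and both helper functions are assumptions made for illustration.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"errors"
	"fmt"
)

// chunkPiece is a hypothetical plain-Go mirror of the ChunkPiece message.
type chunkPiece struct {
	Data         []byte
	FullDataHash []byte // hash of the complete archive (SHA-256 assumed)
	PieceIndex   int32
}

// splitIntoPieces cuts data into pieces of at most maxPiece bytes.
func splitIntoPieces(data []byte, maxPiece int) []chunkPiece {
	if maxPiece <= 0 {
		return nil
	}
	sum := sha256.Sum256(data)
	var pieces []chunkPiece
	for start := 0; start < len(data); start += maxPiece {
		end := start + maxPiece
		if end > len(data) {
			end = len(data)
		}
		pieces = append(pieces, chunkPiece{
			Data:         data[start:end],
			FullDataHash: sum[:],
			PieceIndex:   int32(len(pieces)),
		})
	}
	return pieces
}

// reassemble concatenates in-order pieces and verifies the full hash.
func reassemble(pieces []chunkPiece) ([]byte, error) {
	if len(pieces) == 0 {
		return nil, errors.New("no pieces")
	}
	var buf bytes.Buffer
	for i, p := range pieces {
		if int(p.PieceIndex) != i {
			return nil, fmt.Errorf("piece %d out of order", p.PieceIndex)
		}
		buf.Write(p.Data)
	}
	sum := sha256.Sum256(buf.Bytes())
	if !bytes.Equal(sum[:], pieces[0].FullDataHash) {
		return nil, errors.New("hash mismatch")
	}
	return buf.Bytes(), nil
}

func main() {
	pieces := splitIntoPieces([]byte("Hello World!"), 5)
	out, err := reassemble(pieces)
	fmt.Println(len(pieces), string(out), err)
}
```

Carrying the full-archive hash on each piece lets the receiver detect a corrupted or incomplete transfer before unpacking anything.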
Steven Masley	d2044c2ee9	chore: update protobuf to reuse file request (#21447 ) This is just the protobuf changes for the PR https://github.com/coder/coder/pull/21398 Moved `UploadFileRequest` from `provisionerd.proto` -> `provisioner.proto`. Renamed to `FileUpload` because it is now bi-directional. This is backwards compatible. I tested it to confirm the payloads are identical. Types were just renamed and moved around. ```golang func TestTypeUpgrade(t *testing.T) { t.Parallel() x := &proto2.UploadFileRequest{ Type: &proto2.UploadFileRequest_ChunkPiece{ ChunkPiece: &proto.ChunkPiece{ Data: []byte("Hello World!"), FullDataHash: []byte("Foobar"), PieceIndex: 42, }, }, } data, err := protobuf.Marshal(x) require.NoError(t, err) // Exactly the same output // EhgKDEhlbGxvIFdvcmxkIRIGRm9vYmFyGCo= on `main` // EhgKDEhlbGxvIFdvcmxkIRIGRm9vYmFyGCo= on this branch fmt.Println(base64.StdEncoding.EncodeToString(data)) } ``` # What this does This allows provisioner daemons to download files from `coderd`'s `files` table. This is used to send over cached module files and prevent the need of downloading these modules on each workspace build.	2026-01-09 11:23:32 -06:00
Steven Masley	89f4d60e7b	chore: remove experiment "terraform-directory-reuse" (#21397 ) Experiment is no longer required, the new method will be released without an experiment and without a toggle Main PR is: https://github.com/coder/coder/pull/21398	2026-01-09 11:13:16 -06:00
Spike Curtis	bddb808b25	chore: arrange imports in a standard way (#21452) Fixes all our Go file imports to match the preferred spec that we've _mostly_ been using. For example: ``` import ( "context" "time" "github.com/prometheus/client_golang/prometheus" "golang.org/x/xerrors" "gopkg.in/natefinch/lumberjack.v2" "cdr.dev/slog/v3" "github.com/coder/coder/v2/codersdk/agentsdk" "github.com/coder/serpent" ) ``` 3 groups: standard library, third-party libs, Coder libs. This PR makes the change across the codebase. The PR in the stack above modifies our formatting to maintain this state of affairs, and is a separate PR so it's possible to review that one in detail.	2026-01-08 15:24:11 +04:00
Spike Curtis	49b34a716a	fix: fix slog to always use array of Fields (#21426) Upgrades to slog v3, which includes a small but backward-incompatible API change to the acceptable call arguments when logging. This change allows us to verify via compile-time type checking that arguments are correct and won't cause a panic, as was possible in slog v1, which this replaces (v2 was tagged but never used in coder/coder). It also updates dependencies that also use slog and were updated. I've left the `aibridge` dependency as a commit SHA, under the assumption that the team there (cc @pawbana @dannykopping) will tag and update the dependency soon and on their own schedule. For the other dependencies, I pushed new tags.	2026-01-08 10:29:41 +04:00
Danielle Maywood	c3224b793e	fix: handle scenario where provisionerdserver deletes task before coderd (#21220 )	2025-12-11 13:04:13 +00:00
Marcin Tojek	d004710a74	feat: add prebuild invalidation via last_invalidated_at timestamp (#20582 ) Updates #17917	2025-11-20 17:12:25 +01:00
Steven Masley	a10c5ff381	chore: protect build timings insert for invalid enums (#20821 ) Database insert errors will fail the transaction. So this error is fatal. Properly return it for a better error call stack, and not just hiding the error in the logs.	2025-11-19 09:34:19 -06:00
Susana Ferreira	79d46769fe	chore: remove warning for non-trackable workspace builds in metrics (#20775) Previously, `UpdateWorkspaceTimingsMetrics` would log a warning for workspace builds that aren't tracked (restarts, stops, subsequent builds after creation). This was noisy since these are legitimate operations, not errors. `UpdateWorkspaceTimingsMetrics` is specifically designed to track only workspace creation, prebuild creation, and prebuild claim timings. Related to: https://github.com/coder/coder/pull/20772	2025-11-14 12:26:32 +00:00
Steven Masley	fe3b825b86	chore: per template opt into cached terraform directories (#20609 ) For experimental and dogfood purposes, this adds the ability to opt in a single template. Leaving the rest of the templates as is. For GA, this setting might be removed or changed.	2025-11-13 14:04:12 -06:00
Steven Masley	9ca5b44b56	chore: implement persistent terraform directories (experimental) (#20563 ) Prior to this, every workspace build ran `terraform init` in a fresh directory. This would mean the `modules` are downloaded fresh. If the module is not pinned, subsequent workspace builds would have different modules.	2025-11-13 07:50:17 -06:00
Steven Masley	04727c06e8	chore: add experiment toggle for terraform workspace caching (#20559 ) Experiments passed to provisioners to determine behavior. This adds `--experiments` flag to provisioner daemons. Prior to this, provisioners had no method to turn on/off experiments.	2025-11-12 14:26:15 -06:00
Steven Masley	9149c1e9f2	chore: append template metadata to protobuf config (#20558) Adds some extra metadata sent to provisioners. Also adds a field `reuse_terraform_workspace` to tell the provisioner whether or not to use the caching experiment.	2025-11-12 12:46:39 -06:00
Mathias Fredriksson	ce04f6cc5d	fix(coderd): remove deprecated AITaskSidebarApp column (#20680 ) This column was no longer used in `v2.28` and the codersdk field deprecated. Both can now be dropped in `v2.29`. Closes coder/internal#974	2025-11-07 12:45:45 +02:00
Mathias Fredriksson	a6b0eae38d	refactor(coderd): drop sidebar app constraint and simplify provisionerdserver for tasks (#20591 ) Updates coder/internal#973 Updates coder/internal#974	2025-11-03 13:46:38 +02:00
Cian Johnston	73dedcc765	fix: delete related task when deleting workspace (#20567 ) * Instead of prompting the user to start a deleted workspace (which is silly), prompt them to create a new task instead. * Adds a warning dialog when deleting a workspace * Updates provisionerdserver to delete the related task if a workspace is related to a task	2025-10-30 10:37:51 +00:00
Danielle Maywood	5a31c590e6	fix(coderd/provisionerdserver): pipe through task id and prompt (#20408 ) Pipes through the Task's ID and prompt into the provisioner. This is required to support the new `coder_ai_task.prompt` field and modified `coder_ai_task.id` field.	2025-10-24 09:43:48 +01:00
Mathias Fredriksson	a8f87c2625	feat(coderd): implement task to app linking (#20237 ) This change adds workspace build/agent/app linking to tasks and wires it into `wsbuilder` and `provisionerdserver`. Closes coder/internal#948 Closes coder/coder#20212 Closes coder/coder#19773	2025-10-13 12:57:06 +03:00
Danielle Maywood	f31e6e09ba	chore(provisioner): support updated coder_ai_task resource (#20160 ) Closes https://github.com/coder/internal/issues/978 - Introduce `CODER_TASK_ID` and `CODER_TASK_PROMPT` to the provisioner environment - Make use of new `app_id` field in provider, with a fallback to `sidebar_app.id` for backwards compatibility For now I've left the `taskPrompt` and `taskID` as a TODO as we do not yet create these values.	2025-10-09 10:42:01 +01:00
Rafael Rodriguez	e53bc247e9	feat: add tooltip field to workspace app that renders as markdown (#19651 ) In this pull request we're adding an optional `tooltip` field. The `tooltip` field is a string field (with markdown support) that will be used to display tooltips on hover over app buttons in a workspace dashboard. Tooltip screenshot <img width="816" height="275" alt="Screenshot 2025-08-29 at 4 11 56 PM" src="https://github.com/user-attachments/assets/52c736a1-f632-465b-89a0-35ca99bd367b" /> Tooltip video https://github.com/user-attachments/assets/21806337-accc-4acf-b8c6-450c031d98f1 Issue: https://github.com/coder/coder/issues/18431 Related provider PR: https://github.com/coder/terraform-provider-coder/pull/435 ### Changes - Added migration to add `tooltip` column to `workspace_apps` table - Updated queries to get/set the new `tooltip` column - Updated frontend to render tooltip as markdown (primary tool tip takes precedence over template tooltip) ### Testing - Added storybook test for `Applink` markdown rendering	2025-09-10 11:01:54 -05:00
Cian Johnston	06cbb2890f	fix: expire token for prebuilds user when regenerating session token (#19667 ) * provisionerdserver: Expires prebuild user token for workspace, if it exists, when regenerating session token. * dbauthz: disallow prebuilds user from creating api keys * dbpurge: added functionality to expire stale api keys owned by the prebuilds user	2025-09-02 09:38:43 +01:00
Susana Ferreira	0ab345ca84	feat: add prebuild timing metrics to Prometheus (#19503 ) ## Description This PR introduces one counter and two histograms related to workspace creation and claiming. The goal is to provide clearer observability into how workspaces are created (regular vs prebuild) and the time cost of those operations. ### `coderd_workspace_creation_total` * Metric type: Counter * Name: `coderd_workspace_creation_total` * Labels: `organization_name`, `template_name`, `preset_name` This counter tracks whether a regular workspace (not created from a prebuild pool) was created using a preset or not. Currently, we already expose `coderd_prebuilt_workspaces_claimed_total` for claimed prebuilt workspaces, but we lack a comparable metric for regular workspace creations. This metric fills that gap, making it possible to compare regular creations against claims. Implementation notes: * Exposed as a `coderd_` metric, consistent with other workspace-related metrics (e.g. `coderd_api_workspace_latest_build`: https://github.com/coder/coder/blob/main/coderd/prometheusmetrics/prometheusmetrics.go#L149). * Every `defaultRefreshRate` (1 minute ), DB query `GetRegularWorkspaceCreateMetrics` is executed to fetch all regular workspaces (not created from a prebuild pool). * The counter is updated with the total from all time (not just since metric introduction). This differs from the histograms below, which only accumulate from their introduction forward. 
### `coderd_workspace_creation_duration_seconds` & `coderd_prebuilt_workspace_claim_duration_seconds` * Metric types: Histogram * Names: * `coderd_workspace_creation_duration_seconds` * Labels: `organization_name`, `template_name`, `preset_name`, `type` (`regular`, `prebuild`) * `coderd_prebuilt_workspace_claim_duration_seconds` * Labels: `organization_name`, `template_name`, `preset_name` We already have `coderd_provisionerd_workspace_build_timings_seconds`, which tracks build run times for all workspace builds handled by the provisioner daemon. However, in the context of this issue, we are only interested in creation and claim build times, not all transitions; additionally, this metric does not include `preset_name`, and adding it there would significantly increase cardinality. Therefore, separate more focused metrics are introduced here: * `coderd_workspace_creation_duration_seconds`: Build time to create a workspace (either a regular workspace or the build into a prebuild pool, for prebuild initial provisioning build). * `coderd_prebuilt_workspace_claim_duration_seconds`: Time to claim a prebuilt workspace from the pool. The reason for two separate histograms is that: * Creation (regular or prebuild): provisioning builds with similar time magnitude, generally expected to take longer than a claim operation. * Claim: expected to be a much faster provisioning build. #### Native histogram usage Provisioning times vary widely between projects. Using static buckets risks unbalanced or poorly informative histograms. 
To address this, these metrics use [Prometheus native histograms](https://prometheus.io/docs/specs/native_histograms/): * First introduced in Prometheus v2.40.0 * Recommended stable usage from v2.45+ * Requires Go client `prometheus/client_golang` v1.15.0+ * Experimental and must be explicitly enabled on the server (`--enable-feature=native-histograms`) For compatibility, we also retain a classic bucket definition (aligned with the existing provisioner metric: https://github.com/coder/coder/blob/main/provisionerd/provisionerd.go#L182-L189). * If native histograms are enabled, Prometheus ingests the high-resolution histogram. * If not, it falls back to the predefined buckets. Implementation notes: * Unlike the counter, these histograms are updated in real-time at workspace build job completion. * They reflect data only from the point of introduction forward (no historical backfill). ## Relates to Closes: https://github.com/coder/coder/issues/19528 Native histograms tested in observability stack: https://github.com/coder/observability/pull/50	2025-08-28 15:00:26 +01:00
Cian Johnston	bd139f3a43	fix(coderd/provisionerdserver): workaround lack of coder_ai_task resource on stop transition (#19560 ) This works around the issue where a task may "disappear" on stop. Re-using the previous value of `has_ai_task` and `sidebar_app_id` on a stop transition. --------- Co-authored-by: Mathias Fredriksson <mafredri@gmail.com>	2025-08-27 10:33:17 +01:00
Dean Sheather	1a601c30ad	chore: move usage types to new package (#19103 )	2025-08-20 23:48:38 +10:00
Dean Sheather	6eb02d1c2a	chore: wire up usage tracking for managed agents (#19096 ) Wires up the usage collector and publisher to coderd. Relates to coder/internal#814	2025-08-20 23:38:09 +10:00
Kacper Sawicki	9edceef0bf	feat(coderd): add support for external agents to API's and provisioner (#19286) This pull request introduces support for external workspace management, allowing users to register and manage workspaces that are provisioned and managed outside of Coder. Depends on: https://github.com/coder/terraform-provider-coder/pull/424 * GET /api/v2/init-script - Gets the agent initialization script * By default, it returns a script for Linux (amd64), but with query parameters (os and arch) you can get the init script for different platforms * GET /api/v2/workspaces/{workspace}/external-agent/{agent}/credentials - Gets credentials for an external agent (enterprise) * Updated queries to filter workspaces/templates by the has_external_agent field	2025-08-19 10:41:33 +02:00
Kacper Sawicki	5e4aa79a9d	feat(coderd): add `has_external_agent` flag to template_versions and workspace_builds (#19285) This pull request introduces support for external workspace management, allowing users to register and manage workspaces that are provisioned and managed outside of Coder. * Added has_external_agent field to workspace builds and template versions	2025-08-19 10:29:51 +02:00
Susana Ferreira	8567ecbe52	fix: set prebuilds lifecycle parameters on creation and claim (#19252 ) ## Description This PR ensures that prebuilt workspaces are properly excluded from the lifecycle executor and treated as a separate class of workspaces, fully managed by the prebuild reconciliation loop. It introduces two lifecycle guarantees: * When a prebuilt workspace is created (i.e., when the workspace build completes), all lifecycle-related fields are unset, ensuring the workspace does not participate in TTL, autostop, autostart, dormancy, or auto-deletion logic. * When a prebuilt workspace is claimed, it transitions into a regular user workspace. At this point, all lifecycle fields are correctly populated according to template-level configurations, allowing the workspace to be managed by the lifecycle executor as expected. ## Changes * Prebuilt workspaces now have all lifecycle-relevant fields unset during creation * When a prebuild is claimed: * Lifecycle fields are set based on template and workspace level configurations. This ensures a clean transition into the standard workspace lifecycle flow. * Updated lifecycle-related SQL update queries to explicitly exclude prebuilt workspaces. ## Relates Related issue: https://github.com/coder/coder/issues/18898 To reduce the scope of this PR and make the review process more manageable, the original implementation has been split into the following focused PRs: * https://github.com/coder/coder/pull/19259 * https://github.com/coder/coder/pull/19263 * https://github.com/coder/coder/pull/19264 * https://github.com/coder/coder/pull/19265 These PRs should be considered in conjunction with this one to understand the complete set of lifecycle separation changes for prebuilt workspaces.	2025-08-13 12:45:46 +01:00

231 Commits