coder

mirror of https://github.com/coder/coder.git synced 2026-06-03 13:08:25 +00:00

Author	SHA1	Message	Date
Danielle Maywood	5deab9f721	test: wait for devcontainer readiness (#25567 )	2026-05-22 13:55:21 +01:00
Michael Suchacz	ca1f6b19a2	feat: remove legacy chat provider tables (#25416 )	2026-05-22 09:50:01 +02:00
Michael Suchacz	06526a5822	feat: use AI provider chat APIs (#25415 )	2026-05-22 07:53:23 +02:00
Spike Curtis	8dc4d76890	chore: add agent-connection-watch for workspaces (#24507 ) relates to GRU-18 Adds basic implementation for Workspace Agent Connection Watch and tests. Handling of logs is still missing.	2026-05-20 13:09:11 -04:00
Jakub Domeracki	1a1f06aa79	fix: verify PKCS7 signature on Azure instance identity tokens (#25286 ) Migrates Azure instance identity verification from `go.mozilla.org/pkcs7` and `github.com/fullsailor/pkcs7` to `github.com/smallstep/pkcs7`, using `VerifyWithChainAtTime` to validate both the PKCS7 signature and the certificate chain in one call. The previous code only verified the signer certificate against a set of intermediates/roots but did not verify that the PKCS7 signature itself covered the content, meaning tampered payloads could be accepted. The `Options` struct is restructured to accept `Roots`, `Intermediates`, and `CurrentTime` as explicit fields instead of embedding `x509.VerifyOptions`. The test helper `NewAzureInstanceIdentity` now builds a realistic 3-level certificate chain (Root CA -> Intermediate CA -> Signing Cert) matching real Azure trust hierarchy. New tests (`TestValidate_TamperedContent`, `TestValidate_UntrustedCertWithValidSignature`) confirm tampered and untrusted envelopes are rejected. Addresses GHSA-6x44-w3xg-hqqf. > [!NOTE] > This PR was authored by Coder Agents. 
<details> <summary>Implementation Plan</summary> ### Files Changed \| File \| Summary \| \|------\|---------\| \| `coderd/azureidentity/azureidentity.go` \| Replace `signer.Verify()` with `VerifyWithChainAtTime`; restructure `Options` struct; add `ParseCertificates()` helper \| \| `coderd/azureidentity/azureidentity_test.go` \| Add `testCertChain` builder, tampered-content and untrusted-cert tests; update existing tests for new `Options` API \| \| `coderd/coderd.go` \| Change `AzureCertificates` field from `x509.VerifyOptions` to `azureidentity.Options` \| \| `coderd/workspaceresourceauth.go` \| Pass `api.AzureCertificates` directly instead of wrapping \| \| `coderd/coderdtest/coderdtest.go` \| Migrate to `smallstep/pkcs7`; build 3-level cert chain in test helper \| \| `go.mod` / `go.sum` \| Add `github.com/smallstep/pkcs7`; remove `fullsailor/pkcs7` and `go.mozilla.org/pkcs7` \| </details>	2026-05-13 14:14:07 +00:00
Ethan	4e08543ace	test(coderd): centralize chat test harness and stabilize flakes (#25171 ) Chat tests previously constructed a real `openai` provider with a fake API key and no `BaseURL`, so background title generation hit `api.openai.com` and timed out under `-race`. The same root cause produced several distinct flakes: title regeneration races with synchronous `UpdateChat`/`ProposeChatTitle`, and pagination races against `updated_at` bumps from real-network processing. This moves the fake OpenAI-compatible provider and the chat-settle wait into first-class `coderdtest` capabilities. `coderd.Options.ChatProviderAPIKeys` is the new seam tests use to redirect chat traffic to a local `httptest.Server`. `coderdtest.WaitForChatSettled` replaces per-test waiters and drains tracked chat-daemon work after the chat row leaves `pending`/`running`. The `newChatClient*` constructors funnel through one options builder that installs the fake provider before the coderd test server so cleanup ordering is deterministic. Closes https://github.com/coder/internal/issues/1528 & Closes ENG-2659 Closes https://github.com/coder/internal/issues/1480 & Closes CODAGT-359 Closes https://github.com/coder/internal/issues/1507 & Closes CODAGT-368 Relates to https://github.com/coder/internal/issues/1397 & Relates to CODAGT-374	2026-05-12 22:13:55 +10:00
Thomas Kosiewski	4a6756a3e8	fix: isolate test HTTP clients (#25038 )	2026-05-11 11:03:38 +02:00
Atif Ali	fad69df710	fix: correct SCIM Swagger try it out URLs (#24779 )	2026-05-05 02:54:03 +05:00
Asher	70d46943db	fix: match on ID instead of username (#24797 ) The username suffix could put the name past the 32 character limit, causing the test to flake. Instead of using a suffix, match on the expected ID instead.	2026-04-29 12:24:52 -08:00
George K	3f0e015fe5	fix: allow coderd to start with an empty DERP map when built-in DERP is disabled (#24544 ) Allow coderd to start with an empty base DERP map when built-in DERP is disabled and no static DERP map is configured, so DERP can come from workspace proxies after startup. Also add a DERP healthcheck warning when no DERP servers are currently available at runtime. Related to: https://linear.app/codercom/issue/PLAT-43/bug-coderd-unable-to-be-started-if-built-in-derp-server-disabled-and Related to: https://github.com/coder/coder/issues/22324	2026-04-28 09:17:08 -07:00
Kyle Carberry	391b22aef7	feat: add CLI commands for managing chat context from workspaces (#24105 ) Adds `coder exp chat context add` and `coder exp chat context clear` commands that run inside a workspace to manage chat context files via the agent token. `add` reads instruction and skill files from a directory (defaulting to cwd) and inserts them as context-file messages into an active chat. Multiple calls are additive — `instructionFromContextFiles` already accumulates all context-file parts across messages. `clear` soft-deletes all context-file messages, causing `contextFileAgentID()` to return `!found` on the next turn, which triggers `needsInstructionPersist=true` and re-fetches defaults from the agent. Both commands auto-detect the target chat via `CODER_CHAT_ID` (already set by `agentproc` on chat-spawned processes), or fall back to single-active-chat resolution for the agent. The `--chat` flag overrides both. Also adds sub-agent context inheritance: `createChildSubagentChat` now copies parent context-file messages to child chats at spawn time, so delegated sub-agents share the same instruction context without independently re-fetching from the workspace agent. 
<details><summary>Implementation details</summary> New files: - `cli/exp_chat.go` — CLI command tree under `coder exp chat context` Modified files: - `agent/agentcontextconfig/api.go` — `ConfigFromDir()` reads context from an arbitrary directory without env vars - `codersdk/agentsdk/agentsdk.go` — `AddChatContext`/`ClearChatContext` SDK methods - `coderd/workspaceagents.go` — POST/DELETE handlers on `/workspaceagents/me/chat-context` - `coderd/coderd.go` — Route registration - `coderd/database/queries/chats.sql` — `GetActiveChatsByAgentID`, `SoftDeleteContextFileMessages` - `coderd/database/dbauthz/dbauthz.go` — RBAC implementations for new queries - `coderd/x/chatd/subagent.go` — `copyParentContextFiles` for sub-agent inheritance - `cli/root.go` — Register `chatCommand()` in `AGPLExperimental()` Auth pattern: Uses `AgentAuth` (same as `coder external-auth`) — agent token via `CODER_AGENT_TOKEN` + `CODER_AGENT_URL` env vars. </details> > 🤖 Generated by Coder Agents --------- Co-authored-by: Michael Suchacz <203725896+ibetitsmike@users.noreply.github.com>	2026-04-09 16:33:00 +02:00
Kayla はな	c5f1a2fccf	feat: make service accounts a Premium feature (#24020 )	2026-04-07 12:25:32 -06:00
Kyle Carberry	919dc299fc	feat: agent reads context files and discovers skills locally (#23935 ) Piggybacks on #23878. Moves instruction file reading and skill discovery from `chatd` (server-side, via multiple `LS`/`ReadFile` round-trips through the agent connection) to the agent itself (local filesystem access). This intentionally drops backward compatibility with older agents that don't support the context-config endpoint. Agents and server are deployed together; there is no rolling-update contract to maintain here. ## What changed The agent's `GET /api/v0/context-config` response now returns `[]ChatMessagePart` directly — the same types chatd persists. This eliminates intermediate type conversions and makes the protocol extensible. \| Field \| Type \| Description \| \|---\|---\|---\| \| `parts` \| `[]ChatMessagePart` \| Context-file and skill parts, ready to persist \| \| `working_dir` \| `string` \| Agent's resolved working directory \| Removed from the response: `instructions_dirs`, `instructions_file`, `skills_dirs`, `skill_meta_file`, `mcp_config_files` — the agent reads files locally and returns their content as parts. Removed from chatd: all legacy `LS`/`ReadFile` fallback code (`readHomeInstructionFile`, `readInstructionDirFile`, `DiscoverSkills` via LS, etc). ## Why The previous architecture had the agent resolve paths, serve them over HTTP, then `chatd` make N+1 round-trips back through the agent connection to read files. The agent has direct filesystem access and should just read the files. ## Key design decisions - Agent returns `ChatMessagePart` directly — same types chatd persists. No intermediate `InstructionFileEntry`/`SkillEntry` types needed. - `SkillMeta.MetaFile` — persisted via `ContextFileSkillMetaFile` on the skill part, so custom meta file names (`CODER_AGENT_EXP_SKILL_META_FILE`) survive across chat turns. - No pre-read body — `read_skill` always dials the workspace to fetch the skill body on demand. 
Simpler than caching the body in the response. - MCP config paths kept agent-internal — `MCPConfigFiles()` getter, not sent over the wire. - No backward compat fallback — old agents that don't support context-config get no instruction files. This is acceptable since agent and server deploy together.	2026-04-04 12:45:46 -04:00
Asher	81188b9ac9	feat: add filtering by service account (#23468 ) You can now filter by/out service accounts using `service_account:true/false` or using the filter dropdown.	2026-03-24 10:13:25 -08:00
Asher	24ab216dd1	feat: add new group members endpoint with filtering and pagination (#23067 ) Partially addresses #21813 (still need to make changes to the "add user" button to be complete) Since there are a lot of user tests already, I moved them into `coderdtest` to be shared.	2026-03-20 12:43:03 -08:00
Steven Masley	84de391f26	chore: add tallyman events for ai seat tracking (#22689 ) AI seat tracking inserted as heartbeat into usage table.	2026-03-18 09:30:22 -05:00
George K	91ec0f1484	feat: add service_accounts workspace sharing mode (#23093 ) Introduce a three-way workspace sharing setting (none, everyone, service_accounts) replacing the boolean workspace_sharing_disabled. In service_accounts mode, only service account-owned workspaces can be shared while regular members' share permissions are removed. Adds a new organization-service-account system role with per-org permissions reconciled alongside the existing organization-member system role. Related to: https://linear.app/codercom/issue/PLAT-28/feat-service-accounts-sharing-mode-and-rbac-role --------- Co-authored-by: Steven Masley <Emyrk@users.noreply.github.com> Co-authored-by: Kayla はな <mckayla@hey.com>	2026-03-17 12:16:43 -07:00
Kyle Carberry	10a33ebc75	test: reduce Await* polling interval from 250ms to 25ms (#22536 ) ## Summary Change the four main `coderdtest` Await helper functions to poll at `IntervalFast` (25ms) instead of `IntervalMedium` (250ms): - `AwaitTemplateVersionJobCompleted` - `AwaitWorkspaceBuildJobCompleted` - `WorkspaceAgentWaiter.WaitFor` - `WorkspaceAgentWaiter.Wait` These are called ~855 times across the test suite. Each call previously wasted ~125ms on average waiting for the next poll tick. `AwaitTemplateVersionJobRunning` already used `IntervalFast` — this makes all Await helpers consistent. ## Measured Impact Local benchmarks (postgres, `-short -count=1 -p 8 -parallel 8 -tags=testsmallbatch`): \| Package \| Before \| After \| Delta \| \|---\|---\|---\|---\| \| enterprise/coderd \| 90.8s \| 76.0s \| -16.3% \| \| coderd \| 65.6s \| 56.5s \| -13.8% \| \| cli \| 57.9s \| 37.8s \| -34.7% \| \| enterprise (root) \| 41.1s \| 39.9s \| -2.9% \| \| Sum of all packages \| 623s \| 543s \| -12.8% \| Zero test failures across all 199 packages.	2026-03-03 13:48:58 +00:00
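The "~125ms on average" figure in the commit above follows from simple arithmetic, sketched here. This is an illustration, not code from the repository: `expectedWaste` is a hypothetical helper, and it assumes each awaited condition becomes true at a uniformly random moment within a polling interval, so half an interval is lost per call on average.

```go
package main

import (
	"fmt"
	"time"
)

// expectedWaste is a hypothetical helper (not part of coderdtest). It estimates
// the total time spent waiting for the next poll tick, assuming half a polling
// interval is lost per call on average.
func expectedWaste(interval time.Duration, calls int) time.Duration {
	return time.Duration(calls) * interval / 2
}

func main() {
	const calls = 855 // approximate call count quoted in the commit message

	fmt.Println(expectedWaste(250*time.Millisecond, calls)) // 1m46.875s
	fmt.Println(expectedWaste(25*time.Millisecond, calls))  // 10.6875s
}
```

These totals are serial sums. Tests run in parallel, so the wall-clock saving is expected to be smaller, which is in line with the measured 623s to 543s reduction reported in the commit.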
Dean Sheather	bef7eb9dcc	fix: avoid derp-related panic during wsproxy registration (#22322 )	2026-02-27 00:07:14 +11:00
Callum Styan	5f3be6b288	feat: add provisioner job queue wait time histogram and jobs enqueued counter (#21869 ) This PR adds some metrics to help identify job enqueue rates and latencies. This work was initiated as a way to help reduce the cost of the observation/measurement itself for autostart scaletests, which impacts our ability to identify/reason about the load caused by autostart. See: https://github.com/coder/internal/issues/1209 I've extended the metrics here to account for regular user initiated builds, prebuilds, autostarts, etc. IMO there is still the question here of whether we want to include or need the `transition` label, which is only present on workspace builds. Including it does lead to an increase in cardinality, and in the case of the histogram (when not using native histograms) that's at least a few extra series for every bucket. We could remove the transition label there but keep it on the counter. Additionally, the histogram is currently observing latencies for other jobs, such as template builds/version imports, those do not have a transition type associated with them. Tested briefly in a workspace, can see metric values like the following: - `coderd_workspace_builds_enqueued_total{build_reason="autostart",provisioner_type="terraform",status="success",transition="start"} 1` - `coderd_provisioner_job_queue_wait_seconds_bucket{build_reason="autostart",job_type="workspace_build",provisioner_type="terraform",transition="start",le="0.025"} 1` --------- Signed-off-by: Callum Styan <callumstyan@gmail.com> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-12 13:40:47 -08:00
Cian Johnston	91be688e39	chore(coderd/database): remove deprecated db2sdk.List(Lazy)? methods (#21902 ) Removes deprecated methods db2sdk.List and db2sdk.ListLazy.	2026-02-03 17:52:07 +00:00
Steven Masley	799b190dee	fix: do not enforce managed agent limit for non-task workspaces (#21689 ) Only task workspaces have the checks in wsbuilder for violating the managed agent caps in the license. Stopped tasks that are resumed with a regular workspace start still count as usage.	2026-01-27 19:01:17 -06:00
Callum Styan	e195856c43	perf: reduce pg_notify call volume by batching together agent metadata updates (#21330 ) --------- Signed-off-by: Callum Styan <callumstyan@gmail.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-01-22 22:47:49 -08:00
Cian Johnston	3a62a8e70e	chore: improve healthcheck timeout message (#21520 ) Relates to https://github.com/coder/internal/issues/272 This flake has been persisting for a while, and unfortunately there's no detail on which healthcheck in particular is holding things up. This PR adds a concurrency-safe `healthcheck.Progress` and wires it through `healthcheck.Run`. If the healthcheck times out, it will provide information on which healthchecks are completed / running, and how long they took / are still taking. 🤖 Claude Opus 4.5 completed the first round of this implementation, which I then refactored.	2026-01-15 16:37:05 +00:00
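A concurrency-safe progress tracker of the kind the commit above describes could look roughly like the following. This is a sketch under assumptions, not the real `healthcheck.Progress`: the `progress` type, its method names, and the summary format are all hypothetical, and timestamps are passed in explicitly to keep the example deterministic.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
	"time"
)

// progress is a hypothetical sketch; it is not the real healthcheck.Progress.
// A mutex guards both maps so checks running in separate goroutines can report
// safely.
type progress struct {
	mu      sync.Mutex
	started map[string]time.Time     // checks that are still running
	took    map[string]time.Duration // checks that have completed
}

func newProgress() *progress {
	return &progress{
		started: map[string]time.Time{},
		took:    map[string]time.Duration{},
	}
}

// Start records that the named check began at now.
func (p *progress) Start(name string, now time.Time) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.started[name] = now
}

// Complete records how long the named check took and stops tracking it as running.
func (p *progress) Complete(name string, now time.Time) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if s, ok := p.started[name]; ok {
		p.took[name] = now.Sub(s)
		delete(p.started, name)
	}
}

// Summary lists completed and still-running checks in a stable (sorted) order.
func (p *progress) Summary(now time.Time) string {
	p.mu.Lock()
	defer p.mu.Unlock()
	var lines []string
	for name, d := range p.took {
		lines = append(lines, fmt.Sprintf("%s: completed in %s", name, d))
	}
	for name, s := range p.started {
		lines = append(lines, fmt.Sprintf("%s: still running after %s", name, now.Sub(s)))
	}
	sort.Strings(lines)
	return strings.Join(lines, "; ")
}

// exampleSummary simulates a timeout after 5s with one check still running.
func exampleSummary() string {
	t0 := time.Date(2026, 1, 15, 0, 0, 0, 0, time.UTC)
	p := newProgress()
	p.Start("database", t0)
	p.Start("derp", t0)
	p.Complete("database", t0.Add(2*time.Second))
	return p.Summary(t0.Add(5 * time.Second))
}

func main() {
	fmt.Println(exampleSummary())
	// database: completed in 2s; derp: still running after 5s
}
```

On a timeout, a message like this names the check that is holding things up, which is the detail the commit says was previously missing.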
George K	cc2efe9e1f	feat(coderd/rbac): make organization-member a per-org system custom role (#21359 ) Migrated the built-in organization-member role to DB storage so it can be customized per org. Closes https://github.com/coder/internal/issues/1073 (part 1)	2026-01-12 18:19:19 -08:00
Zach	091d31224d	fix: replace moby/moby namesgenerator with internal implementation (#21377 ) Replace the external moby/moby/pkg/namesgenerator dependency with an internal implementation using gofakeit/v7. The moby package has ~25k unique name combinations, and with its retry parameter only adds a random digit 0-9, giving ~250k possibilities. In parallel tests, this has led to collisions (flakes). The new internal API at coderd/util/namesgenerator eliminates the external dependency and offers functions with explicit uniqueness guarantees. This PR also consolidates fragmented name generation in a few places to use the new package. \| Old (moby/moby) \| New \| \|-------------------------------------\|------------------------\| \| namesgenerator.GetRandomName(0) \| NameWith("_") \| \| namesgenerator.GetRandomName(>0) \| NameDigitWith("_") \| \| testutil.GetRandomName(t) \| UniqueName() \| \| testutil.GetRandomNameHyphenated(t) \| UniqueNameWith("-") \| namesgenerator package API: - NameWith(delim): random name, not unique - NameDigitWith(delim): random name with 1-9 suffix, not unique - UniqueName(): guaranteed unique via atomic counter - UniqueNameWith(delim): unique with custom delimiter Names continue to be docker style `[adjective][delim][surname]`. Unique names are truncated to 32 characters (preserving the numeric suffix) to fit common name length limits in Coder. Related test flakes: https://github.com/coder/internal/issues/1212 https://github.com/coder/internal/issues/118 https://github.com/coder/internal/issues/1068	2026-01-09 15:40:26 -07:00
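The "unique via atomic counter, truncated to 32 characters while preserving the numeric suffix" behaviour described above can be sketched as follows. This is not the real `coderd/util/namesgenerator` code: `uniqueNameFrom` and `maxNameLen` are hypothetical names, the base name is supplied by the caller rather than generated, and the sketch assumes ASCII base names that do not themselves end in a digit.

```go
package main

import (
	"fmt"
	"strconv"
	"sync/atomic"
)

// maxNameLen mirrors the 32-character limit cited in the commit message.
const maxNameLen = 32

// counter is shared by all callers; each Add returns a distinct value, even
// under concurrent use.
var counter atomic.Int64

// uniqueNameFrom is a hypothetical helper, not the real namesgenerator API.
// It appends the next counter value to base and, if needed, truncates base
// (never the suffix) so the result fits in maxNameLen bytes.
// Assumption: base is ASCII and does not end in a digit, so the suffix stays
// unambiguous.
func uniqueNameFrom(base string) string {
	suffix := strconv.FormatInt(counter.Add(1), 10)
	if keep := maxNameLen - len(suffix); len(base) > keep {
		base = base[:keep]
	}
	return base + suffix
}

func main() {
	fmt.Println(uniqueNameFrom("quirky_lovelace_"))
	fmt.Println(uniqueNameFrom("an_extremely_long_adjective_and_surname_"))
}
```

Truncating the base rather than the suffix matters: cutting the suffix could make two long names identical again, which is the kind of collision the commit set out to remove.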
Spike Curtis	bddb808b25	chore: arrange imports in a standard way (#21452 ) Fixes all our Go file imports to match the preferred spec that we've _mostly_ been using. For example: ``` import ( "context" "time" "github.com/prometheus/client_golang/prometheus" "golang.org/x/xerrors" "gopkg.in/natefinch/lumberjack.v2" "cdr.dev/slog/v3" "github.com/coder/coder/v2/codersdk/agentsdk" "github.com/coder/serpent" ) ``` 3 groups: standard library, 3rd party libs, Coder libs. This PR makes the change across the codebase. The PR in the stack above modifies our formatting to maintain this state of affairs, and is a separate PR so it's possible to review that one in detail.	2026-01-08 15:24:11 +04:00
Spike Curtis	49b34a716a	fix: fix slog to always use array of Fields (#21426 ) Upgrades to slog v3 which includes a small, but backward incompatible API change to the acceptable call arguments when logging. This change allows us to verify via compile time type checking that arguments are correct and won't cause a panic, as was possible in slog v1, which this replaces (v2 was tagged but never used in coder/coder). It also updates dependencies that also use slog and were updated. I've left the `aibridge` dependency as a commit SHA, under the assumption that the team there (cc @pawbana @dannykopping ) will tag and update the dependency soon and on their own schedule. Other dependencies, I pushed new tags.	2026-01-08 10:29:41 +04:00
Danny Kopping	733b6b7db9	feat: add API to serve proxy certificate (#21391 ) Closes https://github.com/coder/internal/issues/1184	2025-12-29 18:00:06 +00:00
Steven Masley	8fefd91e4a	feat!: support PKCE in the oauth2 client's auth/exchange flow (#21215 ) Breaking Change: Existing oauth apps might now use PKCE. If an unknown IdP type was being used, and it does not support PKCE, it will break. To fix, set the PKCE methods on the external auth to `none` ``` export CODER_EXTERNAL_AUTH_1_PKCE_METHODS=none ```	2025-12-15 17:41:47 +00:00
Steven Masley	3194bcfc9e	chore: distinct operations for provisioner's 'parse', 'init', 'plan', 'apply', 'graph' (#21064 ) Provisioner steps broken into smaller granular actions. Changes: - `ExtractArchive` moved to `init` request (was in `configure`) - Writing `tfstate` moved to `plan` (was in `configure`) - Moved most plan/apply outputs to `GraphComplete`	2025-12-15 11:26:41 -06:00
Asher	d306a2d7e5	chore: log with %s on unexpected non-sdk err (#20570 ) With `%w` it prints an address instead of the error, like `<op> <url> 0xc001329370` instead of `<op> <url>: some error`, honestly idk why you even can log with `%w` it seems like it makes no sense to use `%w` outside of `fmt.Errorf`. This is to help debug https://github.com/coder/internal/issues/1010.	2025-10-30 10:23:52 -08:00
Dean Sheather	0652b18ebc	feat: mount pprof and metrics to /api/v2/debug for admins (#20353 ) Adds the following debug routes for people with the `debug_info:read` permission: - `/api/v2/debug/pprof` for `net/http/pprof` - `/` - `/cmdline` - `/profile` - `/symbol` - `/trace` - `/*` - `/api/v2/debug/metrics` for Prometheus metrics	2025-10-21 03:13:11 +00:00
Cian Johnston	1a104751e4	chore(coderd): use WorkspaceAgentWaiter.WithContext in aitasks_test.go (#20360 ) Fixes https://github.com/coder/internal/issues/1067 - Adds `WorkspaceAgentWaiter.WithContext()` - Updates usage of `WorkspaceAgentWaiter` in `aitasks_test.go` with context bumped to `testutil.WaitMedium` Authored by Claude with manual review and updates.	2025-10-17 13:45:11 +01:00
Thomas Kosiewski	ed90ecf00e	feat: add allow_list to resource-scoped API tokens (#19964 ) # Add API key allow_list for resource-scoped tokens This PR adds support for API key allow lists, enabling tokens to be scoped to specific resources. The implementation: 1. Adds a new `allow_list` field to the `CreateTokenRequest` struct, allowing clients to specify resource-specific scopes when creating API tokens 2. Implements `APIAllowListTarget` type to represent resource targets in the format `<type>:<id>` with support for wildcards 3. Adds validation and normalization logic for allow lists to handle wildcards and deduplication 4. Integrates with RBAC by creating an `APIKeyEffectiveScope` that merges API key scopes with allow list restrictions 5. Updates API documentation and TypeScript types to reflect the new functionality This feature enables creating tokens that are limited to specific resources (like workspaces or templates) by ID, making it possible to create more granular API tokens with limited access.	2025-10-09 14:53:08 +02:00
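The `<type>:<id>` parsing, wildcard handling, and deduplication mentioned above might look roughly like this. Everything here is an assumption for illustration: `target`, `parseTarget`, and `normalize` are hypothetical names, and the collapsing rules (`*:*` covers everything, `<type>:*` covers every ID of that type) are a guess at reasonable semantics, not the behaviour of the real `APIAllowListTarget` code.

```go
package main

import (
	"fmt"
	"strings"
)

// target is a hypothetical stand-in for the APIAllowListTarget type named in
// the commit message.
type target struct {
	Type string
	ID   string
}

// parseTarget splits an entry of the form "<type>:<id>".
func parseTarget(s string) (target, error) {
	typ, id, ok := strings.Cut(s, ":")
	if !ok || typ == "" || id == "" {
		return target{}, fmt.Errorf("invalid allow list entry %q: want <type>:<id>", s)
	}
	return target{Type: typ, ID: id}, nil
}

// normalize validates every entry, removes duplicates, and drops entries that
// are already covered by a wildcard (assumed semantics, see lead-in).
func normalize(entries []string) ([]target, error) {
	seen := map[target]bool{}
	var parsed []target
	for _, e := range entries {
		t, err := parseTarget(e)
		if err != nil {
			return nil, err
		}
		if !seen[t] {
			seen[t] = true
			parsed = append(parsed, t)
		}
	}
	// A full wildcard makes every other entry redundant.
	if seen[target{Type: "*", ID: "*"}] {
		return []target{{Type: "*", ID: "*"}}, nil
	}
	var out []target
	for _, t := range parsed {
		// Skip specific IDs when the whole type is already allowed.
		if t.ID != "*" && seen[target{Type: t.Type, ID: "*"}] {
			continue
		}
		out = append(out, t)
	}
	return out, nil
}

func main() {
	got, err := normalize([]string{"workspace:abc", "workspace:abc", "workspace:*", "template:t1"})
	fmt.Println(got, err) // [{workspace *} {template t1}] <nil>
}
```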
Thomas Kosiewski	4bda39585d	feat: add external API key scopes (#19916 ) # Add support for low-level API key scopes This PR adds support for fine-grained API key scopes based on RBAC resource:action pairs. It includes: 1. A new endpoint `/api/v2/auth/scopes` to list all public low-level API key scopes 2. Generated constants in the SDK for all public scopes 3. Tests to verify scope validation during token creation 4. Updated API documentation to reflect the expanded scope options The implementation allows users to create API keys with specific permissions like `workspace:read` or `template:use` instead of only the legacy `all` or `application_connect` scopes. Fixes #19847	2025-09-26 11:43:32 +02:00
Spike Curtis	289f0217c7	feat: add scaletest Runner for dynamicparameters load gen (#19890 ) relates to https://github.com/coder/internal/issues/912 Adds a new scaletest Runner to generate dynamic parameters load. A later PR will add the CLI command, including creating the template & version.	2025-09-25 16:18:37 +04:00
Thomas Kosiewski	fb0ce389a6	feat: implement API key scopes database migration (#19861 ) Added database migration for API key scopes. Fixes #19845	2025-09-22 19:26:51 +02:00
Spike Curtis	a30c30724b	chore: refactor cli and coderd to use ClientOptions (#19763 ) Refactors CLI and coderd to use the ClientBuilder pattern rather than directly instantiating the Client.	2025-09-22 21:02:56 +04:00
Paweł Banaszewski	439b041780	feat: add best effort attempt to revoke oauth access token in external auth provider (#19775 ) Solves #15575 Adds OAuth access token revocation when unlinking an external auth provider. Because revocation is not consistently implemented by providers, this is only a best-effort attempt. An unsuccessful revocation won't affect link removal.	2025-09-19 16:27:02 +02:00
Steven Masley	3df9d8e902	test: set test flags from within an init to limit maximum test parallelism (#19575 )	2025-09-17 08:24:19 -05:00
Asher	6d39077087	chore: log error when checking if codersdk.Err (#19784 )	2025-09-11 13:17:09 -08:00
Kacper Sawicki	b4e4173347	test: improve workspace build job completion logging (#19740 ) Closes https://github.com/coder/internal/issues/935 This PR enhances the AwaitWorkspaceBuildJobCompleted func in coderdtest pkg to provide better visibility into test failures and debugging information.	2025-09-09 08:38:32 +02:00
Spike Curtis	192c81e8f9	chore: refactor codersdk to use SessionTokenProvider (#19565 ) Refactors `codersdk.Client`'s use of session tokens to use a `SessionTokenProvider`, which abstracts the obtaining and storing of the session token. The main motivation is to unify Agent authentication in an upstack PR, which can use cloud instance identity via token exchange, rather than a fixed session token. However, the abstraction could also allow functionality like obtaining the session token from other external sources like the OS credential manager, or an external secret/key management system like Vault.	2025-08-29 10:41:32 +02:00
Callum Styan	321c2b8fce	fix: fix flake in TestExecutorAutostartSkipsWhenNoProvisionersAvailable (#19478 ) The flake here had two causes: 1. related to usage of time.Now() in MustWaitForProvisionersAvailable and 2. the fact that UpdateProvisionerLastSeenAt can not use a time that is further in the past than the current LastSeenAt time Previously the test here was calling `coderdtest.MustWaitForProvisionersAvailable` which was using `time.Now` rather than the next tick time like the real `hasProvisionersAvailable` function does. Additionally, when using `UpdateProvisionerLastSeenAt` the underlying db query enforces that the time we're trying to set `LastSeenAt` to cannot be older than the current value. I was able to reliably reproduce the flake by executing both the `UpdateProvisionerLastSeenAt` call and `tickCh <- next` in their own goroutines, the former with a small sleep to reliably ensure we'd trigger the autobuild before we set the `LastSeenAt` time. That's when I also noticed that `coderdtest.MustWaitForProvisionersAvailable` was using `time.Now` instead of the tick time. When I updated that function to take in a tick time + added a 2nd call to `UpdateProvisionerLastSeenAt` to set an original non-stale time, we could then never get the test to pass because the later call to set the stale time would not actually modify `LastSeenAt`. On top of that, calling the provisioner daemons closer in the middle of the function doesn't really do anything of value in this test. The fix for the flake is to keep the go routines, ensuring there would be a flake if there was not a relevant fix, but to include the fix which is to ensure that we explicitly wait for the provisioner to be stale before passing the time to `tickCh`. --------- Signed-off-by: Callum Styan <callumstyan@gmail.com>	2025-08-28 12:07:50 -07:00
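The constraint at the heart of this flake, that `LastSeenAt` can never be moved backwards, can be modelled in a few lines. This is an in-memory sketch only: the `provisioner` type and `updateLastSeenAt` method are hypothetical, whereas the real check lives in a database query.

```go
package main

import (
	"fmt"
	"time"
)

// provisioner is a hypothetical in-memory stand-in for a provisioner daemon row.
type provisioner struct {
	lastSeenAt time.Time
}

// updateLastSeenAt mimics the rule described in the commit: an update is applied
// only if it does not move lastSeenAt into the past. It reports whether the
// value was changed.
func (p *provisioner) updateLastSeenAt(t time.Time) bool {
	if t.Before(p.lastSeenAt) {
		return false
	}
	p.lastSeenAt = t
	return true
}

func main() {
	now := time.Date(2025, 8, 28, 12, 0, 0, 0, time.UTC)
	p := &provisioner{}

	fmt.Println(p.updateLastSeenAt(now))                       // true: fresh timestamp accepted
	fmt.Println(p.updateLastSeenAt(now.Add(-5 * time.Minute))) // false: stale timestamp ignored
	fmt.Println(p.lastSeenAt.Equal(now))                       // true: value is unchanged
}
```

This is why a test cannot make a provisioner look stale by writing an older timestamp after a fresh one has been recorded; per the commit, the fix is to wait explicitly for the provisioner to be stale before sending the time to `tickCh`.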
Susana Ferreira	0ab345ca84	feat: add prebuild timing metrics to Prometheus (#19503 ) ## Description This PR introduces one counter and two histograms related to workspace creation and claiming. The goal is to provide clearer observability into how workspaces are created (regular vs prebuild) and the time cost of those operations. ### `coderd_workspace_creation_total` * Metric type: Counter * Name: `coderd_workspace_creation_total` * Labels: `organization_name`, `template_name`, `preset_name` This counter tracks whether a regular workspace (not created from a prebuild pool) was created using a preset or not. Currently, we already expose `coderd_prebuilt_workspaces_claimed_total` for claimed prebuilt workspaces, but we lack a comparable metric for regular workspace creations. This metric fills that gap, making it possible to compare regular creations against claims. Implementation notes: * Exposed as a `coderd_` metric, consistent with other workspace-related metrics (e.g. `coderd_api_workspace_latest_build`: https://github.com/coder/coder/blob/main/coderd/prometheusmetrics/prometheusmetrics.go#L149). * Every `defaultRefreshRate` (1 minute ), DB query `GetRegularWorkspaceCreateMetrics` is executed to fetch all regular workspaces (not created from a prebuild pool). * The counter is updated with the total from all time (not just since metric introduction). This differs from the histograms below, which only accumulate from their introduction forward. 
### `coderd_workspace_creation_duration_seconds` & `coderd_prebuilt_workspace_claim_duration_seconds` * Metric types: Histogram * Names: * `coderd_workspace_creation_duration_seconds` * Labels: `organization_name`, `template_name`, `preset_name`, `type` (`regular`, `prebuild`) * `coderd_prebuilt_workspace_claim_duration_seconds` * Labels: `organization_name`, `template_name`, `preset_name` We already have `coderd_provisionerd_workspace_build_timings_seconds`, which tracks build run times for all workspace builds handled by the provisioner daemon. However, in the context of this issue, we are only interested in creation and claim build times, not all transitions; additionally, this metric does not include `preset_name`, and adding it there would significantly increase cardinality. Therefore, separate more focused metrics are introduced here: * `coderd_workspace_creation_duration_seconds`: Build time to create a workspace (either a regular workspace or the build into a prebuild pool, for prebuild initial provisioning build). * `coderd_prebuilt_workspace_claim_duration_seconds`: Time to claim a prebuilt workspace from the pool. The reason for two separate histograms is that: * Creation (regular or prebuild): provisioning builds with similar time magnitude, generally expected to take longer than a claim operation. * Claim: expected to be a much faster provisioning build. #### Native histogram usage Provisioning times vary widely between projects. Using static buckets risks unbalanced or poorly informative histograms. 
To address this, these metrics use [Prometheus native histograms](https://prometheus.io/docs/specs/native_histograms/): * First introduced in Prometheus v2.40.0 * Recommended stable usage from v2.45+ * Requires Go client `prometheus/client_golang` v1.15.0+ * Experimental and must be explicitly enabled on the server (`--enable-feature=native-histograms`) For compatibility, we also retain a classic bucket definition (aligned with the existing provisioner metric: https://github.com/coder/coder/blob/main/provisionerd/provisionerd.go#L182-L189). * If native histograms are enabled, Prometheus ingests the high-resolution histogram. * If not, it falls back to the predefined buckets. Implementation notes: * Unlike the counter, these histograms are updated in real-time at workspace build job completion. * They reflect data only from the point of introduction forward (no historical backfill). ## Relates to Closes: https://github.com/coder/coder/issues/19528 Native histograms tested in observability stack: https://github.com/coder/observability/pull/50	2025-08-28 15:00:26 +01:00
Kacper Sawicki	9edceef0bf	feat(coderd): add support for external agents to API's and provisioner (#19286 ) This pull request introduces support for external workspace management, allowing users to register and manage workspaces that are provisioned and managed outside of Coder. Depends on: https://github.com/coder/terraform-provider-coder/pull/424 * GET /api/v2/init-script - Gets the agent initialization script * By default, it returns a script for Linux (amd64), but with query parameters (os and arch) you can get the init script for different platforms * GET /api/v2/workspaces/{workspace}/external-agent/{agent}/credentials - Gets credentials for an external agent (enterprise) * Updated queries to filter workspaces/templates by the has_external_agent field	2025-08-19 10:41:33 +02:00
Dean Sheather	8f9f0cda11	chore: avoid DNS lookups for DERP in tests (#19385 ) Closes https://github.com/coder/internal/issues/886	2025-08-18 03:44:37 +00:00
Callum Styan	6c902a7410	fix: don't create autostart workspace builds with no available provisioners (#19067 ) This should fix https://github.com/coder/coder/issues/17941 by introducing a check for whether there are any valid (non-stale) provisioners for a job in the autobuild executor code path. --------- Signed-off-by: Callum Styan <callumstyan@gmail.com>	2025-08-15 08:50:51 -07:00
ケイラ	1cffd11619	feat: add workspace sharing page (#19107 )	2025-07-31 15:05:09 +00:00

418 Commits