coder

mirror of https://github.com/coder/coder.git synced 2026-06-02 20:48:20 +00:00

Author	SHA1	Message	Date
Spike Curtis	bddb808b25	chore: arrange imports in a standard way (#21452 ) Fixes all our Go file imports to match the preferred spec that we've _mostly_ been using. For example: ``` import ( "context" "time" "github.com/prometheus/client_golang/prometheus" "golang.org/x/xerrors" "gopkg.in/natefinch/lumberjack.v2" "cdr.dev/slog/v3" "github.com/coder/coder/v2/codersdk/agentsdk" "github.com/coder/serpent" ) ``` 3 groups: standard library, 3rd party libs, Coder libs. This PR makes the change across the codebase. The PR in the stack above modifies our formatting to maintain this state of affairs, and is a separate PR so it's possible to review that one in detail.	2026-01-08 15:24:11 +04:00
Cian Johnston	0f446f99dd	feat(cli): add logs cmd (#21430 ) This PR adds a command to view the provisioner and agent logs for a given workspace. Note: I did investigate using the existing `cliui` methods to tail the logs but they are tailored to a very specific use-case. Other changes: - Adds `Agents` to `dbfake.WorkspaceResponse` - Adds methods to generate provisioner and agent logs in `dbgen` --------- Co-authored-by: Steven Masley <Emyrk@users.noreply.github.com>	2026-01-08 09:58:10 +00:00
Spike Curtis	49b34a716a	fix: fix slog to always use array of Fields (#21426 ) Upgrades to slog v3 which includes a small, but backward incompatible API change to the acceptable call arguments when logging. This change allows us to verify via compile time type checking that arguments are correct and won't cause a panic, as was possible in slog v1, which this replaces (v2 was tagged but never used in coder/coder). It also updates dependencies that also use slog and were updated. I've left the `aibridge` dependency as a commit SHA, under the assumption that the team there (cc @pawbana @dannykopping ) will tag and update the dependency soon and on their own schedule. Other dependencies, I pushed new tags.	2026-01-08 10:29:41 +04:00
Danielle Maywood	c77c0fce52	fix(cli/open): wait for agent to be created (#21448 ) Fix https://github.com/coder/internal/issues/596 --- 🤖 Claude Code with Claude Opus 4.5	2026-01-07 16:06:00 +00:00
Cian Johnston	6bd2d1c85f	chore(cli): seed healthcheck cache in TestSupportBundle (#21436 ) Fixes https://github.com/coder/internal/issues/272 This test periodically fails due to the healthcheck timing out. The problem is compounded due to the fact that we stand up a new coderdtest instance for each test. This PR does the following: * Updates the subtests to share a single `coderdtest` instance. * Hits the `/debug/health` endpoint before completing the setup phase so that the result is cached. This will not completely remove the issue, as the healthcheck could still fail due to test-infrastructure-related issues. In this case we may decide to add a retry in this 'seed' function.	2026-01-07 08:47:31 +00:00
Asher	4a97df3768	chore: rename flag to disable template insights (#21329 ) Because this affects more than just the template insights page (specifically it also affects the deployment stats endpoint which is shown on bottom bar and Prometheus), the group is being renamed generically to just "stats collection". In the future if we need to affect the other stats we can put those options here. Then, because this change only affects a portion of stats, specifically usage stats like connection and application time, bytes sent, etc, add a new sub-group called "usage stats". Then finally add back the "enable" flag. This also gives us a place to one day place an "anonymize" flag if we need to go that route.	2026-01-05 11:44:06 -09:00
Zach	07924037e7	feat: add boundary log forwarding from agent to coderd (#21345 ) Add agent forwarding of boundary audit logs from workspaces to coderd via agent API, and re-emission of boundary logs to coderd stderr. This change adds a server to the workspace agent that always listens on a unix socket for boundary to connect and send audit logs. coderd log format example: ``` [API] 2025-12-23 18:31:46.755 [info] coderd.agentrpc: boundary_request owner=.. workspace_name=.. agent_name=.. decision=.. workspace_id=.. http_method=.. http_url=.. event_time=.. request_id=.. ``` Corresponding boundary PR: https://github.com/coder/boundary/pull/124 RFC: https://www.notion.so/coderhq/Agent-Boundary-Logs-2afd579be59280f29629fc9823ac41ba https://github.com/coder/coder/issues/21280	2025-12-31 16:38:19 -07:00
Susana Ferreira	b97572285a	feat: add core AI MITM proxy daemon (#21296 ) ## Description Adds the core AI Bridge MITM proxy daemon. This proxy intercepts HTTPS traffic, decrypts it using a configured CA certificate, and forwards requests to AIBridge for processing. ## Changes * Added `aibridgeproxyd` package with the core proxy server implementation * Added configuration options: `CODER_AIBRIDGE_PROXY_ENABLED`, `CODER_AIBRIDGE_PROXY_LISTEN_ADDR`, `CODER_AIBRIDGE_PROXY_CERT_FILE`, `CODER_AIBRIDGE_PROXY_KEY_FILE` * Added tests for server initialization and MITM functionality Closes https://github.com/coder/internal/issues/1180	2025-12-29 15:31:51 +00:00
Danielle Maywood	44a46db487	feat(agent): support deleting dev containers (#21247 ) Add logic to the agent, and an endpoint, to allow requesting and then deleting a Dev Container and its related agent.	2025-12-22 11:28:31 +00:00
Rowan Smith	81cbf03a52	chore: fix typo in organization roles create help text (#21352 ) A simple typo fix to the help text stidin > stdin ``` ➜ coder git:(org_role_fix) ✗ coder organizations roles create -h coder v2.29.1+59cdd7e USAGE: coder organizations roles create [flags] <role_name> Create a new organization custom role - Run with an input.json file: $ coder organization -O <organization_name> roles create --stidin < role.json ```	2025-12-22 11:24:00 +11:00
Jake Howell	00793cc0b5	feat: add prometheus observability metrics for `dbpurge` (#21074 ) Related to [`internal#1139`](https://github.com/coder/internal/issues/1139) This implements some prometheus metrics for records being removed from the database. Currently we're tracking the following fields being removed from the DB by this. They're viewable in the `/api/v2/debug/metrics` endpoint. * `expired_api_keys` * `aibridge_records` * `connection_logs` * `duration` ``` # HELP coderd_dbpurge_iteration_duration_seconds Duration of each dbpurge iteration in seconds. # TYPE coderd_dbpurge_iteration_duration_seconds histogram coderd_dbpurge_iteration_duration_seconds_bucket{success="true",le="1"} 1 coderd_dbpurge_iteration_duration_seconds_bucket{success="true",le="5"} 1 coderd_dbpurge_iteration_duration_seconds_bucket{success="true",le="10"} 1 coderd_dbpurge_iteration_duration_seconds_bucket{success="true",le="30"} 1 coderd_dbpurge_iteration_duration_seconds_bucket{success="true",le="60"} 1 coderd_dbpurge_iteration_duration_seconds_bucket{success="true",le="300"} 1 coderd_dbpurge_iteration_duration_seconds_bucket{success="true",le="600"} 1 coderd_dbpurge_iteration_duration_seconds_bucket{success="true",le="+Inf"} 1 coderd_dbpurge_iteration_duration_seconds_sum{success="true"} 0.014787814 coderd_dbpurge_iteration_duration_seconds_count{success="true"} 1 # HELP coderd_dbpurge_records_purged_total Total number of records purged by type. 
# TYPE coderd_dbpurge_records_purged_total counter coderd_dbpurge_records_purged_total{record_type="aibridge_records"} 0 coderd_dbpurge_records_purged_total{record_type="audit_logs"} 0 coderd_dbpurge_records_purged_total{record_type="connection_logs"} 0 coderd_dbpurge_records_purged_total{record_type="expired_api_keys"} 0 coderd_dbpurge_records_purged_total{record_type="workspace_agent_logs"} 0 ``` \| Position \| Pull-request \| \| -------- \| ------------ \| \| ✅ \| [feat: add prometheus observability metrics for `dbpurge`](https://github.com/coder/coder/pull/21074) \| \| \| [feat: add rbac specificity for `dbpurge`](https://github.com/coder/coder/pull/21088) \|	2025-12-20 00:20:57 +11:00
Spike Curtis	73253df6bf	fix: use separate HTTP clients in scale test load generators (#21288 ) While scale testing, I noticed that our load generators send basically all requests to a single Coderd instance. e.g. ![image.png](https://app.graphite.com/user-attachments/assets/e259862a-adf1-47e7-a37b-fd14e420058e.png) This is because our scale test commands create all `Runner`s using the same codersdk Client, which means they share an underlying HTTP client. With HTTP/2 a single TCP session can multiplex many different HTTP requests (including websockets). So, it creates a single TCP connection to a single coderd, and then sends all the requests down the one TCP connections. This PR modifies the `exp scaletest` load generator commands to create an independent HTTP client per `Runner`. This means that each runner will create its own TCP connection. This should help spread the load and make a more realistic test, because in a real deployment, scaled out load will be coming over different TCP connections.	2025-12-19 12:22:49 +04:00
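The per-`Runner` client idea described in the row above can be sketched as follows. This is a minimal illustration, not the code from the PR: the `runner` type and the `newIsolatedClient` and `newRunners` helpers are hypothetical names, and it assumes that giving each client its own cloned `http.Transport` is enough to keep connections from being shared.

```go
package main

import (
	"fmt"
	"net/http"
)

// runner is a hypothetical stand-in for a scale test Runner.
type runner struct {
	id     int
	client *http.Client
}

// newIsolatedClient returns a client backed by its own Transport, and
// therefore its own connection pool.
func newIsolatedClient() *http.Client {
	transport := http.DefaultTransport.(*http.Transport).Clone()
	return &http.Client{Transport: transport}
}

// newRunners builds n runners that do not share connections.
func newRunners(n int) []*runner {
	runners := make([]*runner, 0, n)
	for i := 0; i < n; i++ {
		runners = append(runners, &runner{id: i, client: newIsolatedClient()})
	}
	return runners
}

func main() {
	runners := newRunners(3)
	shared := runners[0].client.Transport == runners[1].client.Transport
	fmt.Println("transports shared:", shared)
}
```

Because each transport keeps a private pool, every runner opens its own TCP connection, which a load balancer can then route to a different backend.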
Spike Curtis	cac6d4ce98	feat: add --max-failures to coder exp scaletest create-workspaces (#21315 ) Adds `--max-failures` flag to `coder exp scaletest create-workspaces` so that we can tolerate a few failures without failing the command. When running our scale test infra, we create Kubernetes Jobs to create the initial cluster workspaces, then we have load-generation jobs that depend on them. At high scale, it's kind of expected that some of the requests will fail: even with 99.9% success, you still expect one failure per 1000. It's useful to be able to carry on with the scale test anyway and proceed to traffic generation.	2025-12-18 11:21:35 +04:00
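A failure budget of this kind can be sketched as below. The function names and the exact semantics (the run fails only when failures exceed the budget) are assumptions made for illustration; the actual flag handling in `coder exp scaletest create-workspaces` may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// runWithMaxFailures runs every job and reports an error only when the
// number of failed jobs exceeds maxFailures (assumed semantics).
func runWithMaxFailures(jobs []func() error, maxFailures int) (int, error) {
	failures := 0
	for _, job := range jobs {
		if err := job(); err != nil {
			failures++
		}
	}
	if failures > maxFailures {
		return failures, fmt.Errorf("%d failures exceed the allowed maximum of %d", failures, maxFailures)
	}
	return failures, nil
}

// sampleJobs returns total jobs, of which the first failing ones error out.
func sampleJobs(total, failing int) []func() error {
	jobs := make([]func() error, 0, total)
	for i := 0; i < total; i++ {
		fail := i < failing
		jobs = append(jobs, func() error {
			if fail {
				return errors.New("simulated failure")
			}
			return nil
		})
	}
	return jobs
}

func main() {
	_, err := runWithMaxFailures(sampleJobs(1000, 1), 2)
	fmt.Println("tolerated:", err == nil)
	_, err = runWithMaxFailures(sampleJobs(1000, 3), 2)
	fmt.Println("tolerated:", err == nil)
}
```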
Zach	174a6192fa	refactor: consolidate darwin unix socket test helpers (#21283 )	2025-12-16 09:11:54 -07:00
Steven Masley	8fefd91e4a	feat!: support PKCE in the oauth2 client's auth/exchange flow (#21215 ) Breaking Change: Existing oauth apps might now use PKCE. If an unknown IdP type was being used, and it does not support PKCE, it will break. To fix, set the PKCE methods on the external auth to `none` ``` export CODER_EXTERNAL_AUTH_1_PKCE_METHODS=none ```	2025-12-15 17:41:47 +00:00
Steven Masley	3194bcfc9e	chore: distinct operations for provisioner's 'parse', 'init', 'plan', 'apply', 'graph' (#21064 ) Provisioner steps broken into smaller granular actions. Changes: - `ExtractArchive` moved to `init` request (was in `configure`) - Writing `tfstate` moved to `plan` (was in `configure`) - Moved most plan/apply outputs to `GraphComplete`	2025-12-15 11:26:41 -06:00
Zach	7ecfd1aa07	fix: isolate keyring usage by parallel test processes (#21256 ) This change ensures keyring tests that utilize the real OS keyring use credentials that are isolated by process ID so that parallel test processes do not access the same credentials. https://github.com/coder/internal/issues/1192	2025-12-15 09:40:59 -07:00
Asher	27f0413347	feat: add flag to disable template insights (#20940 ) Closes #20399 To summarize the original commit messages: - Do not log stats to the database. - Return errors on the insight endpoints. - Update the frontend to show those errors. - Also fixes an issue with getting the user status count via codersdk, since I added a test to ensure it was not disabled by this flag and it was sending the wrong payload.	2025-12-14 03:00:03 +00:00
Mathias Fredriksson	761dd55ee8	fix(coderd/database): sort template version variables and fix test flake (#21233 ) Previously the GetTemplateVersionVariables query did not sort output, relying on PostgreSQL on-disk ordering which is nondeterministic. Variables are now sorted by name because there is no alternative for ordering. Tests were adjusted to accommodate the new ordering; previously they relied on data being written to disk in insert order.	2025-12-12 11:41:46 +00:00
Mathias Fredriksson	3d38cd568e	test(cli): attempt to fix TestGitSSH flake (#21230 ) Since the failing test logs are gone, we can only guess at what went wrong. Given our parallel test-suite, and that tests typically run slow on Windows, it seems reasonable that the context timed out due to a single context being responsible for setup and two command executions. This change fixes the issue by updating the context usage; if this flake ever resurfaces, we can re-investigate. Fixes coder/internal#770	2025-12-11 18:41:45 +00:00
Mathias Fredriksson	2e4aa729be	test(cli): fix flaky TestProvisioners_Golden (#21228 ) Use a single base time with consistent offsets and ensure CreatedAt is set on all dbgen-created resources. Fixes coder/internal#449	2025-12-11 18:19:18 +00:00
Kacper Sawicki	6f86f67754	feat(coderd): add overload protection with rate limiting and concurrency control (#21161 ) ## Summary This adds configurable overload protection to the AI Bridge daemon to prevent the server from being overwhelmed during periods of high load. Partially addresses coder/internal#1153 (rate limits and concurrency control; circuit breakers are deferred to a follow-up). ## New Configuration Options \| Option \| Environment Variable \| Description \| Default \| \|--------\|---------------------\|-------------\|---------\| \| `--aibridge-max-concurrency` \| `CODER_AIBRIDGE_MAX_CONCURRENCY` \| Maximum number of concurrent AI Bridge requests. Set to 0 to disable (unlimited). \| `0` \| \| `--aibridge-rate-limit` \| `CODER_AIBRIDGE_RATE_LIMIT` \| Maximum number of AI Bridge requests per second. Set to 0 to disable rate limiting. \| `0` \| ## Behavior When limits are exceeded: - Concurrency limit: Returns HTTP `503 Service Unavailable` with message "AI Bridge is currently at capacity. Please try again later." - Rate limit: Returns HTTP `429 Too Many Requests` with `Retry-After` header. Both protections are optional and disabled by default (0 values). ## Implementation The overload protection is implemented as reusable middleware in `coderd/httpmw/ratelimit.go`: 1. `RateLimitByAuthToken`: Per-user rate limiting that uses `APITokenFromRequest` to extract the authentication token, with fallback to `X-Api-Key` header for AI provider compatibility (e.g., Anthropic). Falls back to IP-based rate limiting if no token is present. Includes `Retry-After` header for backpressure signaling. 2. `ConcurrencyLimit`: Uses an atomic counter to track in-flight requests and reject when at capacity. The middleware is applied in `enterprise/coderd/aibridge.go` via `r.Group` in the following order: 1. Concurrency check (faster rejection for load shedding) 2. Rate limit check Note: Rate limiting currently applies to all AI Bridge requests, including pass-through requests. 
Ideally only actual interceptions should count, but this would require changes in the aibridge library. ## Testing Added comprehensive tests for: - Rate limiting by auth token (Bearer token, X-Api-Key, no token fallback to IP) - Different tokens not rate limited against each other - Disabled when limit is zero - Retry-After header is set on 429 responses - Concurrency limiting (allows within limit, rejects over limit, disabled when zero)	2025-12-11 16:38:54 +01:00
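The concurrency half of the overload protection described above can be sketched with an atomic in-flight counter. This is an illustrative stand-in, not the contents of `coderd/httpmw/ratelimit.go`: the `concurrencyLimit` signature, the error text, and the `secondRequestStatus` helper are assumptions, and the rate limiting half is omitted.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// concurrencyLimit rejects requests with 503 once max requests are in
// flight. A max of 0 (or less) disables the limit.
func concurrencyLimit(max int64, next http.Handler) http.Handler {
	if max <= 0 {
		return next
	}
	var inFlight atomic.Int64
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if inFlight.Add(1) > max {
			inFlight.Add(-1)
			http.Error(w, "at capacity, please try again later", http.StatusServiceUnavailable)
			return
		}
		defer inFlight.Add(-1)
		next.ServeHTTP(w, r)
	})
}

// secondRequestStatus sends a second request while a first one is still
// being served and returns the status code of the second.
func secondRequestStatus(max int64) int {
	started := make(chan struct{})
	release := make(chan struct{})
	h := concurrencyLimit(max, http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path == "/slow" {
			close(started)
			<-release // hold the request open
		}
	}))
	done := make(chan struct{})
	go func() {
		defer close(done)
		h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(http.MethodGet, "/slow", nil))
	}()
	<-started
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/fast", nil))
	close(release)
	<-done
	return rec.Code
}

func main() {
	fmt.Println(secondRequestStatus(1), secondRequestStatus(0))
}
```

Checking concurrency before rate limits, as the commit message describes, means an overloaded server sheds load with a single atomic operation per rejected request.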
George K	4379230a27	feat: add deployment-wide option to disable workspace sharing (#21172 ) Adds `--disable-workspace-sharing` option. Workspace sharing is disabled by not including user and group ACLs in the workspace RBAC object, which prevents ACL-based authz. Closes https://github.com/coder/internal/issues/1072 The commit also adds saving of workspace user/group ACLs in the test DB data generator.	2025-12-09 08:13:09 -08:00
Mathias Fredriksson	7fc8ee4c60	test(cli/cliui): add test for context cancellation during log streaming (#21125 ) Verifies that streamLogs properly returns ctx.Err() when the context is cancelled while waiting for logs. This covers the case where a user interrupts an SSH connection (e.g., Ctrl+C) during startup script execution. Refs #21104	2025-12-08 14:17:25 +00:00
Mathias Fredriksson	d351821ec3	fix(cli/cliui): skip startup script logs when Wait=false (#21105 ) When users pass --wait=no or set CODER_SSH_WAIT=no, startup logs are no longer dumped to stderr. The stage indicator is still shown, just not the log content. Fixes #13580	2025-12-08 14:11:47 +00:00
Mathias Fredriksson	0c453d7f8e	refactor(cli/cliui): extract agentWaiter struct from agent connection state machine (#21104 ) The Agent function had complex nested control flow and cross-case state sharing via the showStartupLogs flag. This made the code hard to follow and maintain. This change extracts an agentWaiter struct with self-contained methods: - wait: main state machine loop - waitForConnection: handles Connecting/Timeout states - handleConnected: handles Connected state and startup scripts - streamLogs: handles log streaming/fetching - waitForReconnection: handles Disconnected state - pollWhile: helper to consolidate polling loops Each handler is now self-contained with no cross-method state sharing and the showStartupLogs flag is replaced by return values and the waitedForConnection tracking variable.	2025-12-08 14:00:25 +00:00
Danny Kopping	259dee2ea8	fix: move contexts to appropriate locations (#21121 ) Closes https://github.com/coder/internal/issues/1173, https://github.com/coder/internal/issues/1174 Currently these two tests are flaky because the contexts were created before a potentially long-running process. By the time the context was actually used, it may have timed out - leading to confusion. Additionally, the `ExpectMatch` calls were not using the test context - but rather a background context. I've marked that func as deprecated because we should always tie these to the test context. Special thanks to @mafredri for the brain probe 🧠 --------- Signed-off-by: Danny Kopping <danny@coder.com>	2025-12-05 13:14:35 +00:00
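The failure mode described above, where a timeout context is created before a slow step and has already expired by the time it is used, can be reproduced in a few lines. The function names and durations below are hypothetical and chosen only to make the effect visible.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// slowSetup stands in for a potentially long-running setup step.
func slowSetup(d time.Duration) { time.Sleep(d) }

// contextBeforeSetup creates the context first, so the setup time eats
// into the deadline.
func contextBeforeSetup(timeout, setup time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	slowSetup(setup)
	return ctx.Err()
}

// contextAfterSetup creates the context only once it is needed.
func contextAfterSetup(timeout, setup time.Duration) error {
	slowSetup(setup)
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	return ctx.Err()
}

// demo runs both variants with a setup step longer than the timeout.
func demo() (early, late error) {
	const timeout, setup = 10 * time.Millisecond, 100 * time.Millisecond
	return contextBeforeSetup(timeout, setup), contextAfterSetup(timeout, setup)
}

func main() {
	early, late := demo()
	fmt.Println("context created before setup:", early)
	fmt.Println("context created after setup:", late)
}
```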
Mathias Fredriksson	c750695d83	feat(cli/cliui): output empty string for empty table (#20967 ) This change makes it so that we output the empty string for Format when there is no data. It turns out there are many places in the code where we have such handling, but in a way that would break the JSON formatter (since we'd output nothing on stdout or text rather than `[]`/`null`).	2025-12-03 11:32:59 +02:00
Mathias Fredriksson	ff46917e62	feat: add retention config for `workspace_agent_logs` (#21039 ) Replace hardcoded 7-day retention for workspace agent logs with configurable retention from deployment settings. Defaults to 7d to preserve existing behavior. Depends on #21038 Updates #20743	2025-12-02 16:01:33 +00:00
Mathias Fredriksson	56e7858570	feat(coderd): add retention policy configuration (#21021 ) Add `RetentionConfig` with server flags for configuring data retention: - `--audit-logs-retention`: retention for audit log entries - `--connection-logs-retention`: retention for connection logs - `--api-keys-retention`: retention for expired API keys (default 7d) Updates #20743	2025-12-02 16:04:06 +02:00
Ethan	bf40d678ec	fix(cli): close prebuild runner prometheus server last (#21053 ) ## Description Fixes the prebuilds scaletest command where the prometheus server was being shut down before waiting for metrics to be scraped. The issue was the defer order - since defers execute in LIFO (last-in, first-out) order: Before (broken): 1. Register tracing defer (includes wait for prometheus scrape) 2. Register prometheus server defer Execution order: prometheus closes first, then wait happens (server already gone!) After (fixed): 1. Register prometheus server defer 2. Register tracing defer (includes wait for prometheus scrape) Execution order: wait happens first (server still up), then prometheus closes. This matches the pattern used in other scaletest commands. ## Impact The `coderd_scaletest_prebuild_deletion_jobs_completed` metric (and potentially others) was always showing 0 because the server shut down before Prometheus could scrape the final values. _This PR was generated by [`mux`](https://github.com/coder/mux) and reviewed by a human._	2025-12-02 12:10:50 +00:00
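The defer ordering at the heart of this fix is easy to check in isolation. The sketch below records the order in which two deferred steps run; the step labels are illustrative and the function is not taken from the scaletest code.

```go
package main

import "fmt"

// shutdownOrder registers two deferred steps in the "fixed" order and
// returns the order in which they actually ran.
func shutdownOrder() []string {
	var order []string
	func() {
		// Registered first, so it runs last.
		defer func() { order = append(order, "close prometheus server") }()
		// Registered second, so it runs first.
		defer func() { order = append(order, "wait for final scrape") }()
	}()
	return order
}

func main() {
	for i, step := range shutdownOrder() {
		fmt.Printf("%d. %s\n", i+1, step)
	}
}
```

Since deferred calls run last-in, first-out, the resource that other cleanup steps depend on has to be registered first.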
Jake Howell	ab4366f5c6	feat!: implement `AI Bridge` heading to `/deployment/observability` (#20791 ) > [!CAUTION] > In whichever release this lands, we've removed the ability to provide keys via a YAML file (specifically on `openai_key`, `anthropic_key`, `bedrock_access_key` and finally `bedrock_access_key_secret`). This will need to be described in the release notes so as not to break people's AI Bridge integrations when upgrading from older versions. This pull-request ensures that we can see the overview of the settings of the `AI Bridge` feature within the `/deployment/observability` route. This set of options only renders when the `aibridge` feature flag is enabled. ### Preview ![preview-ai-bridge-observability](https://github.com/user-attachments/assets/262d2456-94b4-49b2-9b4e-b14583e70ede)	2025-12-01 21:23:46 +00:00
Sas Swart	ce627bf23f	feat: implement agent socket api, client and cli (#20758 ) closes: https://github.com/coder/coder/issues/10352 closes: https://github.com/coder/internal/issues/1094 closes: https://github.com/coder/internal/issues/1095 In this pull request, we enable a new set of experimental cli commands grouped under `coder exp sync`. These commands allow any process acting within a coder workspace to inform the coder agent of its requirements and execution progress. The coder agent will then relay this information to other processes that have subscribed. These commands are: ``` # Check if this feature is enabled in your environment coder exp sync ping # express that your unit depends on another coder exp sync want <unit> <dependency_unit> # express that your unit intends to start a portion of the script that requires # other units to have completed first. This command blocks until all dependencies have been met coder exp sync start <unit> # express that your unit has completed its work, allowing dependent units to begin their execution coder exp sync complete <unit> ``` Example: In order to automatically run claude code in a new workspace, it must first have a git repository cloned. The scripts responsible for cloning the repository and for running claude code would coordinate in the following way: ```bash # Script A: Claude code # Inform the agent that the claude script wants the git script. # That is, the git script must have completed before the claude script can begin its execution coder exp sync want claude git # Inform the agent that we would now like to begin execution of claude. # This command will block until the git script (and any other defined dependencies) # have completed coder exp sync start claude # Now we run claude code and any other commands we need claude ...
# Once our script has completed, we inform the agent, so that any scripts that depend on this one # may begin their execution coder exp sync complete claude ``` ```bash # Script B: Git # Because the git script does not have any dependencies, we can simply inform the agent that we # intend to start coder exp sync start git git clone ssh://git@github.com/coder/coder # Once the repository has been cloned, we inform the agent that this script is complete, so that # scripts that depend on it may begin their execution. coder exp sync complete git ``` Notes: * Unit names (i.e. `claude` and `git`) given as input to the sync commands are arbitrary strings. You do not have to conform to specific identifiers. We recommend naming your scripts descriptively, but succinctly. * Script unit names should be well documented. Other scripts will need to know the names you've chosen in order to depend on yours. Therefore, you --------- Co-authored-by: Mathias Fredriksson <mafredri@gmail.com>	2025-11-28 08:33:50 +02:00
Zach	bbf7b137da	fix(cli): remove defaulting to keyring when --global-config set (#20943 ) This fixes a regression that caused the VS code extension to be unable to authenticate after making keyring usage on by default. This is because the VS code extension assumes the CLI will always use the session token stored on disk, specifically in the directory specified by --global-config. This fix makes keyring usage enabled when the --global-config directory is not set. This is a bit wonky but necessary to allow the extension to continue working without modification and without backwards compat concerns. In the future we should modify these extensions to either access the credential in the keyring (like Coder Desktop) or some other approach that doesn't rely on the session token being stored on disk. Tests: `coder login dev.coder.com` -> token stored in keyring `coder login --global-config=/tmp/ dev.coder.com` -> token stored in `/tmp/session`	2025-11-26 10:17:31 +01:00
Zach	6238a99275	feat(cli)!: enable keyring usage by default (#20851 ) Make keyring usage for session token storage on by default for supported platforms (Windows and macOS), with the ability to opt-out via --use-keyring=false. This change will be a breaking change for any users depending on the session token being stored on disk, though users can restore file usage via the flag above. This change will also require CLI users to authenticate after updating.	2025-11-25 18:13:00 -07:00
Danielle Maywood	b255827a52	chore: promote tasks to stable from experimental (#20921 ) - Promote tasks from `/api/experimental` to `/api/v2`. - Move sdk from `ExperimentalClient` to `Client`. - Update swagger	2025-11-25 15:24:25 +00:00
Mathias Fredriksson	ad8ba4aac6	feat(cli): promote tasks commands from experimental to GA (#20916 ) ## Overview This change promotes the tasks CLI commands from `coder exp task` to `coder task`, marking them as generally available (GA). ## Migration Users will need to update their scripts from: ```shell coder exp task create "my task" ``` To: ```shell coder task create "my task" ``` --- 🤖 This change was written by Claude Sonnet 4.5 Thinking using [mux](https://github.com/coder/mux) and reviewed by a human 🏄🏻‍♂️	2025-11-25 13:50:22 +00:00
Susana Ferreira	3011207519	feat: add display name field for tasks (#20856 ) ## Problem Tasks currently only expose a machine-friendly name field (e.g. `task-python-debug-a1b2`), but this value is primarily an identifier rather than a clean, descriptive label. We need a separate display-friendly name for use in the UI. This PR introduces a new `display_name` field and updates the task-name generation flow. The Claude system prompt was updated to return valid JSON with both `name` and `display_name`. The name generation logic follows a fallback chain (Anthropic > prompt sanitization > random fallback). To make task names more closely resemble their display names, the legacy `task-` prefix has been removed. For context, PR https://github.com/coder/coder/pull/20834 introduced a small Task icon to the workspace list to help identify workspaces associated to tasks. ## Changes - Database migration: Added `display_name` column to tasks table - Updated system prompt to generate both task name and display name as valid JSON - Task name generation now follows a fallback chain: Anthropic > prompt sanitization > random fallback - Removed `task-` prefix from task names to allow more descriptive names - Note: PR https://github.com/coder/coder/pull/20834 adds a Task icon to workspaces in the workspace list to distinguish task-created workspaces Note: UI changes will be addressed in a follow-up PR Related to: https://github.com/coder/coder/issues/20801	2025-11-25 13:00:59 +00:00
Kacper Sawicki	6d41bfad81	fix: improve http connection pooling for smtp notifications (#20605 ) This change updates how SMTP notifications are polled during scale tests. Before, each of the ~2,000 pollers created its own http.Client, which opened thousands of short-lived TCP connections. Under heavy load, this ran out of available network ports and caused errors like `connect: cannot assign requested address` Now, all pollers share one HTTP connection pool. This prevents port exhaustion and makes polling faster and more stable. If a network error happens, the poller will now retry instead of stopping, so tests keep running until all notifications are received. The `SMTPRequestTimeout` is now applied per request using a context, instead of being set on the `http.Client`.	2025-11-24 14:25:18 +01:00
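The pattern described above, one shared `http.Client` with the timeout applied per request through a context, can be sketched as follows. The `pollOnce` and `demoPoll` names are hypothetical and the local test server stands in for the SMTP notification endpoint; none of this is taken from the scaletest code.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// pollOnce performs one request through the shared client. The timeout is
// bound to this request via its context rather than set on the client.
func pollOnce(ctx context.Context, client *http.Client, url string, timeout time.Duration) (int, error) {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return 0, err
	}
	resp, err := client.Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	// Drain the body so the connection can go back to the pool for reuse.
	_, _ = io.Copy(io.Discard, resp.Body)
	return resp.StatusCode, nil
}

// demoPoll polls a local test server several times through one client.
func demoPoll() (int, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusNoContent)
	}))
	defer srv.Close()

	client := &http.Client{} // shared by all pollers; no client-wide Timeout
	code := 0
	for i := 0; i < 3; i++ {
		var err error
		code, err = pollOnce(context.Background(), client, srv.URL, time.Second)
		if err != nil {
			return 0, err
		}
	}
	return code, nil
}

func main() {
	code, err := demoPoll()
	fmt.Println(code, err)
}
```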
Atif Ali	636408906f	chore(docs): standardize "AIBridge" to "AI Bridge" in documentation (#20831 )	2025-11-24 18:09:04 +05:00
Danny Kopping	443b0c851d	chore: upgrade `coder/serpent` to allow more readable durations (#20886 ) https://github.com/coder/serpent/pull/28 added this capability. https://github.com/coder/serpent/compare/v0.11.0...v0.12.0 --------- Signed-off-by: Danny Kopping <danny@coder.com>	2025-11-24 09:24:06 +00:00
Zach	b4cc982cc2	fix: ensure embedded-postgres state is wiped between retries (#20809 ) Retries were previously added when starting embedded postgres to mitigate port allocation conflicts (we can't use an ephemeral port for tests). Retries alone seemingly did not fix the test flakes. A new failure mode appeared on the retries: timing out connecting to the database. When a port discovery error occurs, embedded-postgres does not create the database. If the data directory exists on the next attempt, embedded-postgres will assume the database has already been created. This seems to cause the timeout error. Wipe all state between retries to ensure attempts execute the same logic that creates the database. [#658](https://github.com/coder/internal/issues/658)	2025-11-21 08:55:01 -07:00
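Wiping state before every attempt can be sketched as below. This does not use the embedded-postgres library: `startWithRetries` takes an arbitrary `start` callback, and `demoRetry` simulates a first attempt that leaves a data directory behind and then fails. All names are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// startWithRetries removes dataDir before every attempt so that each
// attempt runs the same initialization path.
func startWithRetries(dataDir string, attempts int, start func(dir string) error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = os.RemoveAll(dataDir); err != nil {
			return fmt.Errorf("wipe data dir: %w", err)
		}
		if err = start(dataDir); err == nil {
			return nil
		}
	}
	return err
}

// demoRetry fails the first attempt after creating the data directory and
// reports whether any later attempt saw that stale directory.
func demoRetry() (sawStaleDir bool, err error) {
	tmp, err := os.MkdirTemp("", "retry-demo")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(tmp)

	dataDir := filepath.Join(tmp, "data")
	calls := 0
	err = startWithRetries(dataDir, 2, func(dir string) error {
		calls++
		if _, statErr := os.Stat(dir); statErr == nil {
			sawStaleDir = true // leftovers from a previous attempt
		}
		if mkErr := os.MkdirAll(dir, 0o700); mkErr != nil {
			return mkErr
		}
		if calls == 1 {
			return errors.New("simulated port conflict")
		}
		return nil
	})
	return sawStaleDir, err
}

func main() {
	stale, err := demoRetry()
	fmt.Println("stale dir seen:", stale, "error:", err)
}
```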
Danny Kopping	5a7d4f69f6	feat: add configurable retention for aibridge (#20828 ) Closes https://github.com/coder/internal/issues/1134 --------- Signed-off-by: Danny Kopping <danny@coder.com>	2025-11-21 11:35:36 +02:00
Spike Curtis	0bbb7dd0a3	feat: add cleanup to task-status load test runner (#20799 ) Implement Cleanup in the task status Runner, to delete the external workspaces created.	2025-11-19 10:24:30 +04:00
Spike Curtis	5ea1353d46	feat: add exp scaletest task-status command (#20761 ) Adds `coder exp scaletest task-status` subcommand to generate task status update load on the Coder server.	2025-11-19 10:13:32 +04:00
Ethan	6bafbb7bc5	feat(cli): add `prebuilds` scaletest command (#20600 ) Closes https://github.com/coder/internal/issues/914	2025-11-14 18:08:14 +11:00
Dean Sheather	a8f2a8a44d	fix(cli): skip dry-run for workspace start/restart commands (#20754 ) ## Problem The `prepWorkspaceBuild()` function in `cli/create.go` was unconditionally executing dry-runs for all workspace actions. This caused unnecessary delays and "Planning workspace..." messages during `coder start` and `coder restart` commands when they should only happen during `coder create` and `coder update`. ## Root Cause The `prepWorkspaceBuild()` function is shared code called by: - create command - passes `WorkspaceCreate` action ✅ dry-run IS desired - update command - passes `WorkspaceUpdate` action ✅ dry-run IS desired - start command - passes `WorkspaceStart` action (or `WorkspaceUpdate` as fallback) ❌ dry-run NOT desired for `WorkspaceStart` - restart command - passes `WorkspaceRestart` action ❌ dry-run NOT desired - scaletest commands - pass `WorkspaceCreate` action ✅ dry-run IS desired ## Solution Wrapped the dry-run section (lines 580-627) in a conditional that only executes when `args.Action == WorkspaceCreate \|\| args.Action == WorkspaceUpdate`. This skips dry-run for `WorkspaceStart` and `WorkspaceRestart` actions while preserving it for creation and explicit updates. ## Changes - Added conditional check around the entire dry-run logic block - Added clarifying comment explaining the intent - Changed from unconditional execution to: `if args.Action == WorkspaceCreate \|\| args.Action == WorkspaceUpdate { ... 
}` ## Impact \| Command \| Action Type \| Dry-run Before \| Dry-run After \| Status \| \|---------\|-------------\|----------------\|---------------\|--------\| \| `coder create` \| `WorkspaceCreate` \| ✅ Yes \| ✅ Yes \| Unchanged \| \| `coder update` \| `WorkspaceUpdate` \| ✅ Yes \| ✅ Yes \| Unchanged \| \| `coder start` (normal) \| `WorkspaceStart` \| ❌ Yes (bug) \| ✅ No \| Fixed \| \| `coder start` (template changed) \| `WorkspaceUpdate` \| ✅ Yes \| ✅ Yes \| Unchanged (correct behavior) \| \| `coder restart` \| `WorkspaceRestart` \| ❌ Yes (bug) \| ✅ No \| Fixed \| \| scaletest \| `WorkspaceCreate` \| ✅ Yes \| ✅ Yes \| Unchanged \| ## Testing ✅ Code compiles successfully ```bash go build -o /dev/null ./cli/... ``` ✅ All relevant tests pass locally ```bash cd cli && go test -run "TestCreate\|TestStart\|TestRestart\|TestUpdate" -v PASS ok github.com/coder/coder/v2/cli 3.337s ``` ✅ All CI checks pass - test-go-pg (ubuntu, macos, windows) ✅ - test-go-pg-17 ✅ - test-go-race-pg ✅ - test-e2e ✅ - All other checks ✅ ## Behavior Changes Before: - Users running `coder start` would see "Planning workspace..." and wait for unnecessary dry-run completion - Users running `coder restart` would experience unnecessary dry-run overhead After: - `coder start` (simple start) skips dry-run entirely (faster, more intuitive) - `coder start` (with template update) still shows dry-run (correct - user needs to see what's changing) - `coder restart` skips dry-run entirely (faster, more intuitive) - `coder create` maintains existing dry-run behavior (shows "Planning workspace..." and resource preview) - `coder update` maintains existing dry-run behavior (shows "Planning workspace..." and resource preview) ## Verification Manual testing should verify: 1. `coder create` still shows "Planning workspace..." ✅ 2. `coder update` still shows "Planning workspace..." ✅ 3. `coder start` (simple start) does NOT show "Planning workspace..." ✅ 4. `coder restart` does NOT show "Planning workspace..." 
✅	2025-11-13 09:48:28 +11:00
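The conditional from the commit above reduces to a small predicate over the action type. The action names come from the commit message; the `workspaceAction` type and the `shouldDryRun` helper are hypothetical and only illustrate the check, not the actual code in `cli/create.go`.

```go
package main

import "fmt"

// workspaceAction is a hypothetical stand-in for the CLI's action type.
type workspaceAction int

const (
	WorkspaceCreate workspaceAction = iota
	WorkspaceUpdate
	WorkspaceStart
	WorkspaceRestart
)

// shouldDryRun reports whether a dry-run should precede the build: only
// for creation and explicit updates.
func shouldDryRun(action workspaceAction) bool {
	return action == WorkspaceCreate || action == WorkspaceUpdate
}

func main() {
	fmt.Println(shouldDryRun(WorkspaceCreate), shouldDryRun(WorkspaceStart))
}
```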
Steven Masley	04727c06e8	chore: add experiment toggle for terraform workspace caching (#20559 ) Experiments passed to provisioners to determine behavior. This adds `--experiments` flag to provisioner daemons. Prior to this, provisioners had no method to turn on/off experiments.	2025-11-12 14:26:15 -06:00
Steven Masley	9149c1e9f2	chore: append template metadata to protobuf config (#20558 ) Adds some extra meta data sent to provisioners. Also adds a field `reuse_terraform_workspace` to tell the provisioner whether or not to use the caching experiment.	2025-11-12 12:46:39 -06:00
Zach	5e85663ce3	feat(cli): add macOS support for session token keyring storage (#20613 ) Add support for storing the CLI session token in the OS keyring on macOS when the --use-keyring flag is provided. https://github.com/coder/coder/issues/19403 https://www.notion.so/coderhq/CLI-Session-Token-in-OS-Keyring-293d579be592808b8b7fd235304e50d5	2025-11-12 10:48:19 -07:00


1696 Commits