coder

mirror of https://github.com/coder/coder.git synced 2026-06-04 21:48:22 +00:00

Author	SHA1	Message	Date
Jon Ayers	4f1fd82ed7	fix: propagate correct agent exit code (#21718 ) The reaper (PID 1) now returns the child's exit code instead of always exiting 0. Signal termination uses the standard Unix convention of 128 + signal number. fixes #21661	2026-01-28 15:56:04 -06:00
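The 128 + signal number convention mentioned in the commit above can be sketched in a few lines of Go. This is only an illustration, not the code from the PR: the `childStatus` type, the `reaperExitCode` function, and the fallback value of 1 are hypothetical names and choices made for this example.

```go
package main

import "fmt"

// childStatus is a hypothetical summary of how a child process ended.
type childStatus struct {
	Exited   bool // the child terminated by calling exit
	ExitCode int  // valid when Exited is true
	Signaled bool // the child was terminated by a signal
	Signal   int  // signal number, valid when Signaled is true
}

// reaperExitCode maps a child's status to the code the reaper should exit with.
// Signal termination follows the shell convention of 128 + signal number.
func reaperExitCode(s childStatus) int {
	switch {
	case s.Signaled:
		return 128 + s.Signal
	case s.Exited:
		return s.ExitCode
	default:
		// Assumption for this sketch: an unknown state is reported as a generic failure.
		return 1
	}
}

func main() {
	fmt.Println(reaperExitCode(childStatus{Exited: true, ExitCode: 3}))  // 3
	fmt.Println(reaperExitCode(childStatus{Signaled: true, Signal: 15})) // 143 (SIGTERM)
}
```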
Steven Masley	e13f2a9869	chore: remove extra `stop_modules` from provisionerd proto (#21706 ) Was a duplicate of start_modules Closes https://github.com/coder/coder/issues/21206	2026-01-28 09:25:47 -06:00
Mathias Fredriksson	d06b21df45	test(cli): increase timeout in TestGitSSH to reduce flakes (#21725 ) The test occasionally times out at 15s on Windows CI runners. Investigation of CI logs shows the HTTP request to the agent's gitsshkey endpoint never appears in server logs, suggesting it hangs before the request completes (possibly in connection setup, middleware, or database queries). Increase to 60s to reduce flake rate. Fixes coder/internal#770	2026-01-28 14:01:07 +02:00
Callum Styan	d4cd982608	chore: undeprecate the workspace rename flag and clarify potential issues (#21669 ) This undeprecates the `allow-workspace-renames` flag. IIUC, the 'danger' with using this flag is that the workspace name might have been used in the definition of some other terraform resources within template code, so a rename could cause problems such as with persistent disks. for https://github.com/coder/coder/issues/21628 --------- Signed-off-by: Callum Styan <callumstyan@gmail.com>	2026-01-27 10:53:13 -08:00
Susana Ferreira	8f3bb0b0d1	feat: add Copilot provider to aibridge (#21663 ) Adds GitHub Copilot as a supported AI provider in aibridge. Depends on: https://github.com/coder/aibridge/pull/137 Closes: https://github.com/coder/internal/issues/1235	2026-01-27 14:02:35 +00:00
Danny Kopping	7123518baa	feat: conditionally send `aibridge` actor headers (#21643 ) Also passes along the authenticated username as actor metadata. Closes https://github.com/coder/aibridge/issues/135 Depends on https://github.com/coder/aibridge/pull/142 Replace aibridge tag with merge commit once https://github.com/coder/aibridge/pull/142 lands. --------- Signed-off-by: Danny Kopping <danny@coder.com>	2026-01-26 15:08:17 +00:00
Yevhenii Shcherbina	9b14fd3adc	feat: add boundary premium feature (#21589 ) Source code changes: - Added a wrapper for the boundary subcommand that checks feature entitlement before executing the underlying command. - Added a helper that returns the Boundary version using the runtime/debug package, which reads this information from the go.mod file. - Added FeatureBoundary to the corresponding enum. - Move boundary command from AGPL to enterprise. `NOTE`: From now on, the Boundary version will be specified in go.mod instead of being defined in AI modules.	2026-01-23 12:56:36 -05:00
Cian Johnston	365ab0e609	test: bump timeout on TestSSH/StdioExitOnParentDeath (#21630 ) Relates to https://github.com/coder/internal/issues/1289 I was able to reproduce the issue locally -- it appears to sometimes just take 25 seconds to get all of the test dependencies stood up: ``` t.go:111: 2026-01-22 16:39:15.388 [debu] pubsub: pubsub dialing postgres network=tcp address=127.0.0.1:5432 timeout_ms=0 ... t.go:111: 2026-01-22 16:39:38.789 [info] agent.net.tailnet.tcp: accepted connection src=[fd7a:115c:a1e0:44b1:8901:8f09:e605:d019]:55406 dst=[fd7a:115c:a1e0:4cfd:a892:e4e2:8cad:8534]:1 ... ssh_test.go:1208: Error Trace: /Users/cian/src/coder/coder/testutil/chan.go:74 /Users/cian/src/coder/coder/cli/ssh_test.go:1208 Error: SoftTryReceive: context expired Test: TestSSH/StdioExitOnParentDeath ssh_test.go:1212: ``` Hopefully bumping the timeout should fix it.	2026-01-23 10:17:41 +00:00
George K	d29a168785	fix(coderd/rbac): reinstate deployment-wide workspace.share permission for owner role (#21620 ) The removal of that permission from the role broke valid use cases (e.g. a site owner user creating a workspace owned by a system account and then trying to share it with another user). The bulk of the PR is made up of the rollbacks of the previously introduced test updates necessitated by the removal. Related to: https://github.com/coder/internal/issues/1285	2026-01-22 08:12:15 -08:00
Cian Johnston	f799cba395	fix(cli): allow coder ssh --stdio to exit when parent process dies (#21583 ) Relates to https://github.com/coder/internal/issues/1217 Adds a background goroutine in `--stdio` mode to check if the parent PID is still alive and exit if it is no longer present. 🤖 Implemented using Mux + Claude Opus 4.5, reviewed and refactored by me.	2026-01-21 14:14:51 +00:00
Danny Kopping	a14a22eb54	feat: support custom bedrock base url (#21582 ) Closes https://github.com/coder/aibridge/issues/126 Depends on https://github.com/coder/aibridge/pull/131 --------- Signed-off-by: Danny Kopping <danny@coder.com>	2026-01-21 12:48:56 +00:00
Kacper Sawicki	ed679bb3da	feat(codersdk): add circuit breaker configuration support for aibridge (#21546 ) ## Summary Add circuit breaker support for AI Bridge to protect against cascading failures from upstream AI provider rate limits (HTTP 429, 503, and Anthropic's 529 overloaded responses). ## Changes - Add 5 new CLI options for circuit breaker configuration: - `--aibridge-circuit-breaker-enabled` (default: false) - `--aibridge-circuit-breaker-failure-threshold` (default: 5) - `--aibridge-circuit-breaker-interval` (default: 10s) - `--aibridge-circuit-breaker-timeout` (default: 30s) - `--aibridge-circuit-breaker-max-requests` (default: 3) - Update aibridge dependency to include circuit breaker support - Add tests for pool creation with circuit breaker providers ## Notes - Circuit breaker is disabled by default for backward compatibility - When enabled, applies to both OpenAI and Anthropic providers - Uses sony/gobreaker internally via the aibridge library ## Testing ``` make test RUN=TestPoolWithCircuitBreakerProviders ```	2026-01-20 14:59:29 +01:00
Rowan Smith	b163b4c950	feat: support bundle updates to enable pprof and telemetry collection (#21486 ) - Adds pprof collection support now that we have the listeners automatically starting (requires Coder server 2.28.0+, includes a version check). Collects heap, allocs, profile (30s), block, mutex, goroutine, threadcreate, trace (30s), cmdline, symbol. Performs capture for 30 seconds and emits a log line stating as such. Enable capture by supplying the `--pprof` flag or `CODER_SUPPORT_BUNDLE_PPROF` env var. Collection of pprof data from both coderd and the Coder agent occurs. - Adds collection of Prometheus metrics, also requires 2.28.0+ - Adds the ability to include a template in the bundle independently of supplying the details of a running workspace by supplying the `--template` flag or `CODER_SUPPORT_BUNDLE_TEMPLATE` env var - Captures a list of workspaces the user has access to. Defaults to a max of 10, configurable via `--workspaces-total-cap` / `CODER_SUPPORT_BUNDLE_WORKSPACES_TOTAL_CAP` - Collects additional stats from the coderd deployment (aggregated workspace/session metrics), as well as entitlements via license and dismissed health checks. created with help from mux	2026-01-20 10:28:52 +11:00
Susana Ferreira	a002fbbae6	refactor: avoid terminology collision with aibridge by renaming passthrough to tunneled (#21562 ) ## Description Renames "passthrough" to "tunneled" in aiproxy to avoid terminology collision with aibridge, which has its own passthrough concept. Follow-up from: https://github.com/coder/coder/pull/21512#discussion_r2698231778 --------- Co-authored-by: Danny Kopping <danny@coder.com>	2026-01-19 13:23:42 +00:00
Susana Ferreira	a406ed7cc5	feat: add upstream proxy support to aiproxy for passthrough requests (#21512 ) ## Description Adds upstream proxy support for AI Bridge Proxy passthrough requests. This allows aiproxy to forward non-allowlisted requests through an upstream proxy. Currently, the only supported configuration is when aiproxy is the first proxy in the chain (client → aiproxy → upstream proxy). ## Changes * Add `--aibridge-proxy-upstream` option to configure an upstream HTTP/HTTPS proxy URL for passthrough requests * Add `--aibridge-proxy-upstream-ca` option to trust custom CA certificates for HTTPS upstream proxies * Passthrough requests (non-allowlisted domains) are forwarded through the upstream proxy * MITM'd requests (allowlisted domains) continue to go directly to aibridge, not through the upstream proxy * Add tests for upstream proxy configuration and request routing Closes: https://github.com/coder/internal/issues/1204	2026-01-19 08:50:57 +00:00
Asher	4d414a0df7	feat: add --use-parameter-defaults flag (#21119 ) This is like `--yes`, but for parameter prompts.	2026-01-16 17:04:57 -09:00
Cian Johnston	ab126e0f0a	feat: improve usability of coder show (#21539 ) This PR improves the usability of `coder show`: - Adds a header with workspace owner/name, latest build status and time since, and template name / version name. - Updates `namedWorkspace` to allow looking up by UUID - Also improves associated `TestShow` to respect context deadlines.	2026-01-16 15:45:33 +00:00
Sas Swart	0ebe8e57ad	chore: add scaletesting tools for aibridge (#21279 ) This pull request adds scaletesting tools for aibridge. See https://www.notion.so/Scale-tests-2c5d579be5928088b565d15dd8bdea41?source=copy_link for information and instructions. closes: https://github.com/coder/internal/issues/1156 closes: https://github.com/coder/internal/issues/1155 closes: https://github.com/coder/internal/issues/1158	2026-01-15 17:05:46 +02:00
George K	0712faef4f	feat(enterprise): implement organization "disable workspace sharing" option (#21376 ) Adds a per-organization setting to disable workspace sharing. When enabled, all existing workspace ACLs in the organization are cleared and the workspace ACL mutation API endpoints return `403 Forbidden`. This complements the existing site-wide `--disable-workspace-sharing` flag by providing more granular control at the organization level. Closes https://github.com/coder/internal/issues/1073 (part 2) --------- Co-authored-by: Steven Masley <Emyrk@users.noreply.github.com>	2026-01-14 09:47:50 -08:00
Danny Kopping	7d5cd06f83	feat: add `aibridge` structured logging (#21492) Closes https://github.com/coder/internal/issues/1151 Sample: ``` [API] 2026-01-13 15:50:20.795 [info] coderd.aibridgedserver: interception started trace=8bb5a1d8eb10526cc46ad90f191bb468 span=a3e5b5da9546032a record_type=interception_start interception_id=97461880-4a6c-47c1-8292-3588dd715312 initiator_id=360c6167-a93a-4442-9c3e-f87a6d1cfb66 api_key_id=vg1sbUv97d provider=anthropic model=claude-opus-4-5-20251101 started_at="2026-01-13T15:50:20.790690781Z" metadata={} [API] 2026-01-13 15:50:23.741 [info] coderd.aibridgedserver: token usage recorded trace=8bb5a1d8eb10526cc46ad90f191bb468 span=a114f0cc3047296e record_type=token_usage interception_id=97461880-4a6c-47c1-8292-3588dd715312 msg_id=msg_01VJH1rYKspfun8BW29CrYEu input_tokens=10 output_tokens=8 created_at="2026-01-13T15:50:23.731587038Z" metadata={"cache_creation_input":53194,"cache_ephemeral_1h_input":0,"cache_ephemeral_5m_input":53194,"cache_read_input":0,"web_search_requests":0} [API] 2026-01-13 15:50:26.265 [info] coderd.aibridgedserver: token usage recorded trace=8bb5a1d8eb10526cc46ad90f191bb468 span=dbdafb563bff2c9c record_type=token_usage interception_id=97461880-4a6c-47c1-8292-3588dd715312 msg_id=msg_01VJH1rYKspfun8BW29CrYEu input_tokens=0 output_tokens=130 created_at="2026-01-13T15:50:26.254467904Z" metadata={} [API] 2026-01-13 15:50:26.268 [info] coderd.aibridgedserver: prompt usage recorded trace=8bb5a1d8eb10526cc46ad90f191bb468 span=da51887a757226fc record_type=prompt_usage interception_id=97461880-4a6c-47c1-8292-3588dd715312 msg_id=msg_01VJH1rYKspfun8BW29CrYEu prompt="list the jmia share price" created_at="2026-01-13T15:50:26.255299811Z" metadata={} [API] 2026-01-13 15:50:26.268 [info] coderd.aibridgedserver: interception ended trace=8bb5a1d8eb10526cc46ad90f191bb468 span=3fa25397705ee7c9 record_type=interception_end interception_id=97461880-4a6c-47c1-8292-3588dd715312 ended_at="2026-01-13T15:50:26.25555547Z" [API] 2026-01-13 15:50:26.269 [info] coderd.aibridgedserver: tool usage recorded trace=8bb5a1d8eb10526cc46ad90f191bb468 span=b54af90afc604d29 record_type=tool_usage interception_id=97461880-4a6c-47c1-8292-3588dd715312 msg_id=msg_01VJH1rYKspfun8BW29CrYEu tool=mcp__stonks__getStockPriceSnapshot input="{\"ticker\":\"JMIA\"}" server_url="" injected=false invocation_error="" created_at="2026-01-13T15:50:26.255164652Z" metadata={} ``` Structured logging is only enabled when `CODER_AIBRIDGE_STRUCTURED_LOGGING=true`. --------- Signed-off-by: Danny Kopping <danny@coder.com>	2026-01-14 17:26:08 +02:00
Susana Ferreira	74b6d12a8a	feat: implement selective MITM with configurable domain allowlist in aibridgeproxyd (#21473 ) ## Description Implements selective MITM (Man-in-the-Middle) in `aibridgeproxyd` so that only requests to allowlisted domains are intercepted and decrypted. Requests to all other domains are tunneled directly without decryption. ## Changes * New config option: `CODER_AIBRIDGE_PROXY_DOMAIN_ALLOWLIST` (default: `api.anthropic.com`,`api.openai.com`) * Selective MITM: Uses `goproxy.ReqHostIs()` to only intercept `CONNECT` requests to allowlisted hosts * Certificate caching: Now only generates/caches certificates for allowlisted domains * Validation: Startup fails if domain allowlist is empty or contains invalid entries Closes: https://github.com/coder/internal/issues/1182	2026-01-13 11:30:51 +00:00
Danny Kopping	49a42eff5c	feat: make database connection pool size configurable (#21403 ) Closes https://github.com/coder/coder/issues/21360 A few considerations/notes: - I've kept the number of conns to 10 in all other places, except coderd - which uses the config value - I opted to also make idle conns configurable; the greater the delta between max open and max idle, the more connection churn - Postgres maintains a [_process_ per connection](https://www.postgresql.org/docs/current/connect-estab.html), contrary to what the comment said previously - Operators should be able to tune this, since process churn can negatively affect OS scheduling - I've set the value to `"auto"` by default so it's not another knob one _has to_ twiddle, and sets max idle = max conns / 3 --------- Signed-off-by: Danny Kopping <danny@coder.com>	2026-01-13 10:50:57 +02:00
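The `"auto"` default described above (max idle = max conns / 3) could be resolved roughly as follows. This is a sketch under assumptions, not the PR's code: `resolveMaxIdle` is a hypothetical helper, integer division is assumed, and rejecting negative values is a choice made for this example.

```go
package main

import (
	"fmt"
	"strconv"
)

// resolveMaxIdle turns the idle-connection setting into a concrete number.
// "auto" derives the value from the max open connections (maxOpen / 3);
// anything else must parse as a non-negative integer.
func resolveMaxIdle(setting string, maxOpen int) (int, error) {
	if setting == "auto" {
		return maxOpen / 3, nil
	}
	n, err := strconv.Atoi(setting)
	if err != nil {
		return 0, fmt.Errorf("invalid max idle setting %q: %w", setting, err)
	}
	if n < 0 {
		return 0, fmt.Errorf("max idle connections must not be negative, got %d", n)
	}
	return n, nil
}

func main() {
	idle, err := resolveMaxIdle("auto", 10)
	fmt.Println(idle, err) // 3 <nil>
}
```

The resolved values would then typically be passed to `(*sql.DB).SetMaxOpenConns` and `(*sql.DB).SetMaxIdleConns` from `database/sql`.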
George K	cc2efe9e1f	feat(coderd/rbac): make organization-member a per-org system custom role (#21359 ) Migrated the built-in organization-member role to DB storage so it can be customized per org. Closes https://github.com/coder/internal/issues/1073 (part 1)	2026-01-12 18:19:19 -08:00
Cian Johnston	2b448c7178	feat(cli): enrich user-agent header for client requests (#21483 ) Adds the following information to CLI User-Agent headers to aid deployment administrators in troubleshooting where requests are coming from. Before: `Go-http-client/1.1` After: `coder-cli/v2.34.5 (linux/amd64; coder whoami)` 🤖 These changes were generated by Claude Sonnet 4.5 but reviewed and edited manually by me.	2026-01-12 17:46:05 +00:00
Kacper Sawicki	6ca70d3618	feat(cli): add --no-build flag to state push for state-only updates (#21374 ) ## Summary Adds a `--no-build` flag to `coder state push` that updates the Terraform state directly without triggering a workspace build. ## Use Case This enables state-only migrations, such as migrating Kubernetes resources from deprecated types (e.g., `kubernetes_config_map`) to versioned types (e.g., `kubernetes_config_map_v1`): ```bash coder state pull my-workspace > state.json terraform init terraform state rm -state=state.json kubernetes_config_map.example terraform import -state=state.json kubernetes_config_map_v1.example default/example coder state push --no-build my-workspace state.json ``` ## Changes - Add `PUT /api/v2/workspacebuilds/{id}/state` endpoint to update state without triggering a build - Add `UpdateWorkspaceBuildState` SDK method - Add `--no-build`/`-n` flag to `coder state push` - Add confirmation prompt (can be skipped with `--yes`/`-y`) since this is a potentially dangerous operation - Add test for `--no-build` functionality Fixes #21336	2026-01-12 15:16:59 +01:00
Steven Masley	d2044c2ee9	chore: update protobuf to reuse file request (#21447 ) This is just the protobuf changes for the PR https://github.com/coder/coder/pull/21398 Moved `UploadFileRequest` from `provisionerd.proto` -> `provisioner.proto`. Renamed to `FileUpload` because it is now bi-directional. This is backwards compatible. I tested it to confirm the payloads are identical. Types were just renamed and moved around. ```golang func TestTypeUpgrade(t *testing.T) { t.Parallel() x := &proto2.UploadFileRequest{ Type: &proto2.UploadFileRequest_ChunkPiece{ ChunkPiece: &proto.ChunkPiece{ Data: []byte("Hello World!"), FullDataHash: []byte("Foobar"), PieceIndex: 42, }, }, } data, err := protobuf.Marshal(x) require.NoError(t, err) // Exactly the same output // EhgKDEhlbGxvIFdvcmxkIRIGRm9vYmFyGCo= on `main` // EhgKDEhlbGxvIFdvcmxkIRIGRm9vYmFyGCo= on this branch fmt.Println(base64.StdEncoding.EncodeToString(data)) } ``` # What this does This allows provisioner daemons to download files from `coderd`'s `files` table. This is used to send over cached module files and prevent the need of downloading these modules on each workspace build.	2026-01-09 11:23:32 -06:00
Steven Masley	89f4d60e7b	chore: remove experiment "terraform-directory-reuse" (#21397 ) Experiment is no longer required, the new method will be released without an experiment and without a toggle Main PR is: https://github.com/coder/coder/pull/21398	2026-01-09 11:13:16 -06:00
Spike Curtis	bddb808b25	chore: arrange imports in a standard way (#21452) Fixes all our Go file imports to match the preferred spec that we've _mostly_ been using. For example: ``` import ( "context" "time" "github.com/prometheus/client_golang/prometheus" "golang.org/x/xerrors" "gopkg.in/natefinch/lumberjack.v2" "cdr.dev/slog/v3" "github.com/coder/coder/v2/codersdk/agentsdk" "github.com/coder/serpent" ) ``` 3 groups: standard library, 3rd party libs, Coder libs. This PR makes the change across the codebase. The PR in the stack above modifies our formatting to maintain this state of affairs, and is a separate PR so it's possible to review that one in detail.	2026-01-08 15:24:11 +04:00
Cian Johnston	0f446f99dd	feat(cli): add logs cmd (#21430 ) This PR adds a command to view the provisioner and agent logs for a given workspace. Note: I did investigate using the existing `cliui` methods to tail the logs but they are tailored to a very specific use-case. Other changes: - Adds `Agents` to `dbfake.WorkspaceResponse` - Adds methods to generate provisioner and agent logs in `dbgen` --------- Co-authored-by: Steven Masley <Emyrk@users.noreply.github.com>	2026-01-08 09:58:10 +00:00
Spike Curtis	49b34a716a	fix: fix slog to always use array of Fields (#21426) Upgrades to slog v3 which includes a small, but backward incompatible API change to the acceptable call arguments when logging. This change allows us to verify via compile time type checking that arguments are correct and won't cause a panic, as was possible in slog v1, which this replaces (v2 was tagged but never used in coder/coder). It also updates dependencies that also use slog and were updated. I've left the `aibridge` dependency as a commit SHA, under the assumption that the team there (cc @pawbana @dannykopping) will tag and update the dependency soon and on their own schedule. Other dependencies, I pushed new tags.	2026-01-08 10:29:41 +04:00
Danielle Maywood	c77c0fce52	fix(cli/open): wait for agent to be created (#21448 ) Fix https://github.com/coder/internal/issues/596 --- 🤖 Claude Code with Claude Opus 4.5	2026-01-07 16:06:00 +00:00
Cian Johnston	6bd2d1c85f	chore(cli): seed healthcheck cache in TestSupportBundle (#21436 ) Fixes https://github.com/coder/internal/issues/272 This test periodically fails due to the healthcheck timing out. The problem is compounded due to the fact that we stand up a new coderdtest instance for each test. This PR does the following: * Updates the subtests to share a single `coderdtest` instance. * Hits the `/debug/health` endpoint before completing the setup phase so that the result is cached. This will not completely remove the issue, as the healthcheck could still fail due to test-infrastructure-related issues. In this case we may decide to add a retry in this 'seed' function.	2026-01-07 08:47:31 +00:00
Asher	4a97df3768	chore: rename flag to disable template insights (#21329 ) Because this affects more than just the template insights page (specifically it also affects the deployment stats endpoint which is shown on bottom bar and Prometheus), the group is being renamed generically to just "stats collection". In the future if we need to affect the other stats we can put those options here. Then, because this change only affects a portion of stats, specifically usage stats like connection and application time, bytes sent, etc, add a new sub-group called "usage stats". Then finally add back the "enable" flag. This also gives us a place to one day place an "anonymize" flag if we need to go that route.	2026-01-05 11:44:06 -09:00
Zach	07924037e7	feat: add boundary log forwarding from agent to coderd (#21345 ) Add agent forwarding of boundary audit logs from workspaces to coderd via agent API, and re-emission of boundary logs to coderd stderr. This change adds a server to the workspace agent that always listens on a unix socket for boundary to connect and send audit logs. coderd log format example: ``` [API] 2025-12-23 18:31:46.755 [info] coderd.agentrpc: boundary_request owner=.. workspace_name=.. agent_name=.. decision=.. workspace_id=.. http_method=.. http_url=.. event_time=.. request_id=.. ``` Corresponding boundary PR: https://github.com/coder/boundary/pull/124 RFC: https://www.notion.so/coderhq/Agent-Boundary-Logs-2afd579be59280f29629fc9823ac41ba https://github.com/coder/coder/issues/21280	2025-12-31 16:38:19 -07:00
Susana Ferreira	b97572285a	feat: add core AI MITM proxy daemon (#21296 ) ## Description Adds the core AI Bridge MITM proxy daemon. This proxy intercepts HTTPS traffic, decrypts it using a configured CA certificate, and forwards requests to AIBridge for processing. ## Changes * Added `aibridgeproxyd` package with the core proxy server implementation * Added configuration options: `CODER_AIBRIDGE_PROXY_ENABLED`, `CODER_AIBRIDGE_PROXY_LISTEN_ADDR`, `CODER_AIBRIDGE_PROXY_CERT_FILE`, `CODER_AIBRIDGE_PROXY_KEY_FILE` * Added tests for server initialization and MITM functionality Closes https://github.com/coder/internal/issues/1180	2025-12-29 15:31:51 +00:00
Danielle Maywood	44a46db487	feat(agent): support deleting dev containers (#21247 ) Add logic to the agent, and an endpoint, to allow requesting and then deleting a Dev Container and its related agent.	2025-12-22 11:28:31 +00:00
Rowan Smith	81cbf03a52	chore: fix typo in organization roles create help text (#21352 ) A simple typo fix to the help text stidin > stdin ``` ➜ coder git:(org_role_fix) ✗ coder organizations roles create -h coder v2.29.1+59cdd7e USAGE: coder organizations roles create [flags] <role_name> Create a new organization custom role - Run with an input.json file: $ coder organization -O <organization_name> roles create --stidin < role.json ```	2025-12-22 11:24:00 +11:00
Jake Howell	00793cc0b5	feat: add prometheus observability metrics for `dbpurge` (#21074 ) Related to [`internal#1139`](https://github.com/coder/internal/issues/1139) This implements some prometheus metrics for records being removed from the database. Currently we're tracking the following fields being removed from the DB by this. They're viewable in the `/api/v2/debug/metrics` endpoint. * `expired_api_keys` * `aibridge_records` * `connection_logs` * `duration` ``` # HELP coderd_dbpurge_iteration_duration_seconds Duration of each dbpurge iteration in seconds. # TYPE coderd_dbpurge_iteration_duration_seconds histogram coderd_dbpurge_iteration_duration_seconds_bucket{success="true",le="1"} 1 coderd_dbpurge_iteration_duration_seconds_bucket{success="true",le="5"} 1 coderd_dbpurge_iteration_duration_seconds_bucket{success="true",le="10"} 1 coderd_dbpurge_iteration_duration_seconds_bucket{success="true",le="30"} 1 coderd_dbpurge_iteration_duration_seconds_bucket{success="true",le="60"} 1 coderd_dbpurge_iteration_duration_seconds_bucket{success="true",le="300"} 1 coderd_dbpurge_iteration_duration_seconds_bucket{success="true",le="600"} 1 coderd_dbpurge_iteration_duration_seconds_bucket{success="true",le="+Inf"} 1 coderd_dbpurge_iteration_duration_seconds_sum{success="true"} 0.014787814 coderd_dbpurge_iteration_duration_seconds_count{success="true"} 1 # HELP coderd_dbpurge_records_purged_total Total number of records purged by type. 
# TYPE coderd_dbpurge_records_purged_total counter coderd_dbpurge_records_purged_total{record_type="aibridge_records"} 0 coderd_dbpurge_records_purged_total{record_type="audit_logs"} 0 coderd_dbpurge_records_purged_total{record_type="connection_logs"} 0 coderd_dbpurge_records_purged_total{record_type="expired_api_keys"} 0 coderd_dbpurge_records_purged_total{record_type="workspace_agent_logs"} 0 ``` \| Position \| Pull-request \| \| -------- \| ------------ \| \| ✅ \| [feat: add prometheus observability metrics for `dbpurge`](https://github.com/coder/coder/pull/21074) \| \| \| [feat: add rbac specificity for `dbpurge`](https://github.com/coder/coder/pull/21088) \|	2025-12-20 00:20:57 +11:00
Spike Curtis	73253df6bf	fix: use separate HTTP clients in scale test load generators (#21288) While scale testing, I noticed that our load generators send basically all requests to a single Coderd instance. e.g. ![image.png](https://app.graphite.com/user-attachments/assets/e259862a-adf1-47e7-a37b-fd14e420058e.png) This is because our scale test commands create all `Runner`s using the same codersdk Client, which means they share an underlying HTTP client. With HTTP/2 a single TCP session can multiplex many different HTTP requests (including websockets). So, it creates a single TCP connection to a single coderd, and then sends all the requests down the one TCP connection. This PR modifies the `exp scaletest` load generator commands to create an independent HTTP client per `Runner`. This means that each runner will create its own TCP connection. This should help spread the load and make a more realistic test, because in a real deployment, scaled out load will be coming over different TCP connections.	2025-12-19 12:22:49 +04:00
Spike Curtis	cac6d4ce98	feat: add --max-failures to coder exp scaletest create-workspaces (#21315 ) Adds `--max-failures` flag to `coder exp scaletest create-workspaces` so that we can tolerate a few failures without failing the command. When running our scale test infra, we create Kubernetes Jobs to create the initial cluster workspaces, then we have load-generation jobs that depend on them. At high scale, it's kind of expected that some of the requests will fail: even with 99.9% success, you still expect one failure per 1000. It's useful to be able to carry on with the scale test anyway and proceed to traffic generation.	2025-12-18 11:21:35 +04:00
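The arithmetic behind that failure budget is simple enough to sketch. The helpers below are hypothetical and only illustrate the reasoning in the commit message. In particular, the assumption that the command fails only once the failure count exceeds the `--max-failures` value is made for this example and is not confirmed by the source.

```go
package main

import "fmt"

// expectedFailures estimates how many of total requests fail, given a failure
// rate expressed per thousand requests (99.9% success is 1 per thousand).
func expectedFailures(total, failuresPerThousand int) int {
	return total * failuresPerThousand / 1000
}

// shouldFail reports whether the run should be treated as failed.
// Assumption: the run fails only when failures exceed the tolerated maximum.
func shouldFail(failures, maxFailures int) bool {
	return failures > maxFailures
}

func main() {
	failures := expectedFailures(1000, 1)
	fmt.Println(failures)                // 1
	fmt.Println(shouldFail(failures, 0)) // true: no failures tolerated
	fmt.Println(shouldFail(failures, 2)) // false: within the budget
}
```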
Zach	174a6192fa	refactor: consolidate darwin unix socket test helpers (#21283 )	2025-12-16 09:11:54 -07:00
Steven Masley	8fefd91e4a	feat!: support PKCE in the oauth2 client's auth/exchange flow (#21215 ) Breaking Change: Existing oauth apps might now use PKCE. If an unknown IdP type was being used, and it does not support PKCE, it will break. To fix, set the PKCE methods on the external auth to `none` ``` export CODER_EXTERNAL_AUTH_1_PKCE_METHODS=none ```	2025-12-15 17:41:47 +00:00
Steven Masley	3194bcfc9e	chore: distinct operations for provisioner's 'parse', 'init', 'plan', 'apply', 'graph' (#21064 ) Provisioner steps broken into smaller granular actions. Changes: - `ExtractArchive` moved to `init` request (was in `configure`) - Writing `tfstate` moved to `plan` (was in `configure`) - Moved most plan/apply outputs to `GraphComplete`	2025-12-15 11:26:41 -06:00
Zach	7ecfd1aa07	fix: isolate keyring usage by parallel test processes (#21256 ) This change ensures keyring tests that utilize the real OS keyring use credentials that are isolated by process ID so that parallel test processes do not access the same credentials. https://github.com/coder/internal/issues/1192	2025-12-15 09:40:59 -07:00
Asher	27f0413347	feat: add flag to disable template insights (#20940 ) Closes #20399 To summarize the original commit messages: - Do not log stats to the database. - Return errors on the insight endpoints. - Update the frontend to show those errors. - Also fixes an issue with getting the user status count via codersdk, since I added a test to ensure it was not disabled by this flag and it was sending the wrong payload.	2025-12-14 03:00:03 +00:00
Mathias Fredriksson	761dd55ee8	fix(coderd/database): sort template version variables and fix test flake (#21233) Previously the GetTemplateVersionVariables query did not sort output, relying on PostgreSQL on-disk ordering which is nondeterministic. Variables are now sorted by name because there is no alternative for ordering. Tests were adjusted to accommodate the new ordering; previously they relied on data being written to disk in insert order.	2025-12-12 11:41:46 +00:00
Mathias Fredriksson	3d38cd568e	test(cli): attempt to fix TestGitSSH flake (#21230) Since the failing test logs are gone, we can only guess at what went wrong. Given our parallel test-suite, and that tests typically run slow on Windows, it seems reasonable that the context timed out due to a single context being responsible for setup and two command executions. This change fixes the issue by updating the context usage; if this flake ever resurfaces, we can re-investigate. Fixes coder/internal#770	2025-12-11 18:41:45 +00:00
Mathias Fredriksson	2e4aa729be	test(cli): fix flaky TestProvisioners_Golden (#21228 ) Use a single base time with consistent offsets and ensure CreatedAt is set on all dbgen-created resources. Fixes coder/internal#449	2025-12-11 18:19:18 +00:00
Kacper Sawicki	6f86f67754	feat(coderd): add overload protection with rate limiting and concurrency control (#21161 ) ## Summary This adds configurable overload protection to the AI Bridge daemon to prevent the server from being overwhelmed during periods of high load. Partially addresses coder/internal#1153 (rate limits and concurrency control; circuit breakers are deferred to a follow-up). ## New Configuration Options \| Option \| Environment Variable \| Description \| Default \| \|--------\|---------------------\|-------------\|---------\| \| `--aibridge-max-concurrency` \| `CODER_AIBRIDGE_MAX_CONCURRENCY` \| Maximum number of concurrent AI Bridge requests. Set to 0 to disable (unlimited). \| `0` \| \| `--aibridge-rate-limit` \| `CODER_AIBRIDGE_RATE_LIMIT` \| Maximum number of AI Bridge requests per second. Set to 0 to disable rate limiting. \| `0` \| ## Behavior When limits are exceeded: - Concurrency limit: Returns HTTP `503 Service Unavailable` with message "AI Bridge is currently at capacity. Please try again later." - Rate limit: Returns HTTP `429 Too Many Requests` with `Retry-After` header. Both protections are optional and disabled by default (0 values). ## Implementation The overload protection is implemented as reusable middleware in `coderd/httpmw/ratelimit.go`: 1. `RateLimitByAuthToken`: Per-user rate limiting that uses `APITokenFromRequest` to extract the authentication token, with fallback to `X-Api-Key` header for AI provider compatibility (e.g., Anthropic). Falls back to IP-based rate limiting if no token is present. Includes `Retry-After` header for backpressure signaling. 2. `ConcurrencyLimit`: Uses an atomic counter to track in-flight requests and reject when at capacity. The middleware is applied in `enterprise/coderd/aibridge.go` via `r.Group` in the following order: 1. Concurrency check (faster rejection for load shedding) 2. Rate limit check Note: Rate limiting currently applies to all AI Bridge requests, including pass-through requests. 
Ideally only actual interceptions should count, but this would require changes in the aibridge library. ## Testing Added comprehensive tests for: - Rate limiting by auth token (Bearer token, X-Api-Key, no token fallback to IP) - Different tokens not rate limited against each other - Disabled when limit is zero - Retry-After header is set on 429 responses - Concurrency limiting (allows within limit, rejects over limit, disabled when zero)	2025-12-11 16:38:54 +01:00
George K	4379230a27	feat: add deployment-wide option to disable workspace sharing (#21172 ) Adds `--disable-workspace-sharing` option. Workspace sharing is disabled by not including user and group ACLs in the workspace RBAC object, which prevents ACL-based authz. Closes https://github.com/coder/internal/issues/1072 The commit also adds saving of workspace user/group ACLs in the test DB data generator.	2025-12-09 08:13:09 -08:00


1723 Commits