coder

mirror of https://github.com/coder/coder.git synced 2026-06-04 21:48:22 +00:00

Author	SHA1	Message	Date
Steven Masley	60b3fd0783	chore!: send modules archive over the proto messages (#21398 ) # What this does Dynamic parameters caches the `./terraform/modules` directory for parameter usage. What this PR does is send over this archive to the provisioner when building workspaces. This allows terraform to skip downloading modules from their registries, a step that takes seconds. <img width="1223" height="429" alt="Screenshot From 2025-12-29 12-57-52" src="https://github.com/user-attachments/assets/16066e0a-ac79-4296-819d-924f4b0418dc" /> # Wire protocol The wire protocol reuses the same mechanism used to download the modules `provisioner -> coder`. It splits up large archives into multiple protobuf messages so larger archives can be sent under the message size limit. # 🚨 Behavior Change (Breaking Change) 🚨 Before this PR, modules were downloaded on every workspace build. This means unpinned modules always fetched the latest version. After this PR, modules are cached at template import time, and their versions are effectively pinned for all subsequent workspace builds.	2026-01-09 11:33:34 -06:00
Steven Masley	d2044c2ee9	chore: update protobuf to reuse file request (#21447 ) This is just the protobuf changes for the PR https://github.com/coder/coder/pull/21398 Moved `UploadFileRequest` from `provisionerd.proto` -> `provisioner.proto`. Renamed to `FileUpload` because it is now bi-directional. This is backwards compatible. I tested it to confirm the payloads are identical. Types were just renamed and moved around. ```golang func TestTypeUpgrade(t *testing.T) { t.Parallel() x := &proto2.UploadFileRequest{ Type: &proto2.UploadFileRequest_ChunkPiece{ ChunkPiece: &proto.ChunkPiece{ Data: []byte("Hello World!"), FullDataHash: []byte("Foobar"), PieceIndex: 42, }, }, } data, err := protobuf.Marshal(x) require.NoError(t, err) // Exactly the same output // EhgKDEhlbGxvIFdvcmxkIRIGRm9vYmFyGCo= on `main` // EhgKDEhlbGxvIFdvcmxkIRIGRm9vYmFyGCo= on this branch fmt.Println(base64.StdEncoding.EncodeToString(data)) } ``` # What this does This allows provisioner daemons to download files from `coderd`'s `files` table. This is used to send over cached module files and prevent the need of downloading these modules on each workspace build.	2026-01-09 11:23:32 -06:00
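The chunked transfer described in the two commits above can be sketched roughly as follows. This is a minimal illustration, not the code from the PRs: `chunkPiece` is a hypothetical stand-in that only borrows the field names of the `ChunkPiece` message shown in the test above, and `splitArchive`, `reassemble`, the chunk size, and the use of SHA-256 for `FullDataHash` are all assumptions.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"errors"
	"fmt"
)

// chunkPiece is a hypothetical stand-in for the generated protobuf type.
// Only the field names are taken from the commit message.
type chunkPiece struct {
	Data         []byte
	FullDataHash []byte
	PieceIndex   int32
}

// splitArchive cuts data into pieces of at most chunkSize bytes. Every piece
// carries the hash of the full archive so the receiver can verify the result.
// Using SHA-256 here is an assumption made for this sketch.
func splitArchive(data []byte, chunkSize int) []chunkPiece {
	if chunkSize <= 0 {
		return nil
	}
	hash := sha256.Sum256(data)
	var pieces []chunkPiece
	for start := 0; start < len(data); start += chunkSize {
		end := start + chunkSize
		if end > len(data) {
			end = len(data)
		}
		pieces = append(pieces, chunkPiece{
			Data:         data[start:end],
			FullDataHash: hash[:],
			PieceIndex:   int32(len(pieces)),
		})
	}
	return pieces
}

// reassemble concatenates pieces that arrive in index order and checks the
// combined bytes against the hash carried by the first piece.
func reassemble(pieces []chunkPiece) ([]byte, error) {
	var buf bytes.Buffer
	for i, p := range pieces {
		if int(p.PieceIndex) != i {
			return nil, fmt.Errorf("piece %d has unexpected index %d", i, p.PieceIndex)
		}
		buf.Write(p.Data)
	}
	if len(pieces) > 0 {
		sum := sha256.Sum256(buf.Bytes())
		if !bytes.Equal(sum[:], pieces[0].FullDataHash) {
			return nil, errors.New("reassembled archive does not match hash")
		}
	}
	return buf.Bytes(), nil
}

func main() {
	pieces := splitArchive([]byte("Hello World!"), 5)
	out, err := reassemble(pieces)
	fmt.Println(len(pieces), string(out), err)
}
```

Carrying the full-archive hash on each piece lets the receiver detect a corrupted or mismatched transfer without a separate trailer message; whether the real protocol does this in the same way is not stated in the commit messages.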
Steven Masley	89f4d60e7b	chore: remove experiment "terraform-directory-reuse" (#21397 ) Experiment is no longer required, the new method will be released without an experiment and without a toggle Main PR is: https://github.com/coder/coder/pull/21398	2026-01-09 11:13:16 -06:00
Cian Johnston	b116d22c5f	chore: manage tool versions in go.mod (#21455 ) Go 1.24 adds [tool dependencies](https://go.dev/doc/modules/managing-dependencies#tools). This allows us to track versions of tools in our `go.mod` instead of sprinkling various `go run` commands throughout our codebase. NOTE: there are still various hard-coded `go install` commands in our dogfood Dockerfile. As that list is likely severely outdated, will leave that for a separate PR.	2026-01-08 16:25:28 +00:00
Spike Curtis	bddb808b25	chore: arrange imports in a standard way (#21452 ) Fixes all our Go file imports to match the preferred spec that we've _mostly_ been using. For example: ``` import ( "context" "time" "github.com/prometheus/client_golang/prometheus" "golang.org/x/xerrors" "gopkg.in/natefinch/lumberjack.v2" "cdr.dev/slog/v3" "github.com/coder/coder/v2/codersdk/agentsdk" "github.com/coder/serpent" ) ``` 3 groups: standard library, 3rd party libs, Coder libs. This PR makes the change across the codebase. The PR in the stack above modifies our formatting to maintain this state of affairs, and is a separate PR so it's possible to review that one in detail.	2026-01-08 15:24:11 +04:00
Cian Johnston	0f446f99dd	feat(cli): add logs cmd (#21430 ) This PR adds a command to view the provisioner and agent logs for a given workspace. Note: I did investigate using the existing `cliui` methods to tail the logs but they are tailored to a very specific use-case. Other changes: - Adds `Agents` to `dbfake.WorkspaceResponse` - Adds methods to generate provisioner and agent logs in `dbgen` --------- Co-authored-by: Steven Masley <Emyrk@users.noreply.github.com>	2026-01-08 09:58:10 +00:00
Spike Curtis	49b34a716a	fix: fix slog to always use array of Fields (#21426 ) Upgrades to slog v3 which includes a small, but backward incompatible API change to the acceptable call arguments when logging. This change allows us to verify via compile time type checking that arguments are correct and won't cause a panic, as was possible in slog v1, which this replaces (v2 was tagged but never used in coder/coder). It also updates dependencies that also use slog and were updated. I've left the `aibridge` dependency as a commit SHA, under the assumption that the team there (cc @pawbana @dannykopping ) will tag and update the dependency soon and on their own schedule. Other dependencies, I pushed new tags.	2026-01-08 10:29:41 +04:00
Spike Curtis	41a966c284	fix: sort latest key by sequence correctly (#21425 ) Fixes an issue where we will not correctly return the latest key by sequence number if the fetch returns them in an order where the latest key is not last. The db query uses `ORDER BY sequence DESC`, so it is likely we have been operating incorrectly. Adds a second key to one of the test cases which fails without this fix. Also includes some debug logging statements I found helpful while chasing key rotation issues.	2026-01-06 14:01:51 +04:00
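One order-independent way to pick the latest key, in the spirit of the fix above, is to scan for the highest sequence number rather than trusting the position of a row in the result set. This is only a sketch: the `key` type and `latestBySequence` are hypothetical names, not the types used in the repository.

```go
package main

import "fmt"

// key is a hypothetical, simplified key record for this sketch.
type key struct {
	Sequence int32
	Secret   string
}

// latestBySequence returns the key with the highest sequence number,
// regardless of the order in which the keys were fetched.
func latestBySequence(keys []key) (key, bool) {
	if len(keys) == 0 {
		return key{}, false
	}
	latest := keys[0]
	for _, k := range keys[1:] {
		if k.Sequence > latest.Sequence {
			latest = k
		}
	}
	return latest, true
}

func main() {
	// Rows in descending order, as an ORDER BY sequence DESC query returns them.
	keys := []key{{Sequence: 3, Secret: "c"}, {Sequence: 2, Secret: "b"}, {Sequence: 1, Secret: "a"}}
	latest, ok := latestBySequence(keys)
	fmt.Println(latest.Sequence, ok)
}
```

Taking the last element of this slice would return sequence 1, which is the kind of mistake the commit describes.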
Asher	4a97df3768	chore: rename flag to disable template insights (#21329 ) Because this affects more than just the template insights page (specifically it also affects the deployment stats endpoint which is shown on bottom bar and Prometheus), the group is being renamed generically to just "stats collection". In the future if we need to affect the other stats we can put those options here. Then, because this change only affects a portion of stats, specifically usage stats like connection and application time, bytes sent, etc, add a new sub-group called "usage stats". Then finally add back the "enable" flag. This also gives us a place to one day place an "anonymize" flag if we need to go that route.	2026-01-05 11:44:06 -09:00
George K	e10fceb23c	fix(coderd/database): allow same custom role name for different orgs (#21312 ) Previously the `idx_custom_roles_name_lower` index prevented that. A check constraint was also added to ensure the `organization_id` column cannot be set to the all-zero UUID.	2026-01-05 07:43:08 -08:00
Zach	07924037e7	feat: add boundary log forwarding from agent to coderd (#21345 ) Add agent forwarding of boundary audit logs from workspaces to coderd via agent API, and re-emission of boundary logs to coderd stderr. This change adds a server to the workspace agent that always listens on a unix socket for boundary to connect and send audit logs. coderd log format example: ``` [API] 2025-12-23 18:31:46.755 [info] coderd.agentrpc: boundary_request owner=.. workspace_name=.. agent_name=.. decision=.. workspace_id=.. http_method=.. http_url=.. event_time=.. request_id=.. ``` Corresponding boundary PR: https://github.com/coder/boundary/pull/124 RFC: https://www.notion.so/coderhq/Agent-Boundary-Logs-2afd579be59280f29629fc9823ac41ba https://github.com/coder/coder/issues/21280	2025-12-31 16:38:19 -07:00
Danny Kopping	733b6b7db9	feat: add API to serve proxy certificate (#21391 ) Closes https://github.com/coder/internal/issues/1184	2025-12-29 18:00:06 +00:00
Susana Ferreira	b97572285a	feat: add core AI MITM proxy daemon (#21296 ) ## Description Adds the core AI Bridge MITM proxy daemon. This proxy intercepts HTTPS traffic, decrypts it using a configured CA certificate, and forwards requests to AIBridge for processing. ## Changes * Added `aibridgeproxyd` package with the core proxy server implementation * Added configuration options: `CODER_AIBRIDGE_PROXY_ENABLED`, `CODER_AIBRIDGE_PROXY_LISTEN_ADDR`, `CODER_AIBRIDGE_PROXY_CERT_FILE`, `CODER_AIBRIDGE_PROXY_KEY_FILE` * Added tests for server initialization and MITM functionality Closes https://github.com/coder/internal/issues/1180	2025-12-29 15:31:51 +00:00
Danielle Maywood	5655760f1d	test: use deterministic time to avoid time-based flake (#21396 ) Use deterministic time to avoid time-based flake test failure.	2025-12-29 14:25:14 +00:00
Danielle Maywood	05529139bc	feat(coderd): support deleting dev containers (#21248 ) Add an endpoint to coderd to support deleting dev containers	2025-12-24 12:34:39 +00:00
Danielle Maywood	44a46db487	feat(agent): support deleting dev containers (#21247 ) Add logic to the agent, and an endpoint, to allow requesting and then deleting a Dev Container and its related agent.	2025-12-22 11:28:31 +00:00
Spike Curtis	6238065185	test: use not before in TestAgentConnectionMonitor_* (#21332 ) fixes https://github.com/coder/internal/issues/1203 The matcher I wrote for TestAgentConnectionMonitor tested that `last_disconnected_at` was strictly _after_ the start of the test to ensure it was updated. This is too strict of a test because Windows in particular doesn't have high-resolution timers, so it's entirely possible to get the exact same timestamp from subsequent calls to `time.Now()`. This PR switches the test to _not before_ to cover this case. The results are just as valid because we always initialize the `last_disconnected_at` to something well before the test starts.	2025-12-22 10:21:39 +04:00
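The difference between "strictly after" and "not before" in the commit above comes down to how equal timestamps are treated. The sketch below uses only the standard `time` package; the helper names and the fixed start time are made up for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// strictlyAfter is the original, too-strict check: it fails when both
// timestamps are identical, which can happen with a coarse clock.
func strictlyAfter(got, start time.Time) bool {
	return got.After(start)
}

// notBefore accepts equal timestamps and only rejects values that are
// earlier than start.
func notBefore(got, start time.Time) bool {
	return !got.Before(start)
}

// compareAt evaluates both checks for a timestamp that is offset from a
// fixed start time. The start time is arbitrary.
func compareAt(offset time.Duration) (after, notBef bool) {
	start := time.Date(2025, 12, 22, 10, 0, 0, 0, time.UTC)
	got := start.Add(offset)
	return strictlyAfter(got, start), notBefore(got, start)
}

func main() {
	after, notBef := compareAt(0) // identical timestamps
	fmt.Println("strictly after:", after, "not before:", notBef)
}
```

As the commit notes, the looser check is just as valid when the stored value is initialized to something well before the start of the test.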
Zach	9d1493a13a	feat: add initial API for boundary log forwarding to coderd (#21293 ) Add the AgentAPI changes to support the feature that transmits boundary logs from workspaces to coderd via the agent API for eventual re-emission to stderr. The API handlers are stubs for now because I'm trying to land this feature from multiple smaller PRs. High level architecture: - Boundary records resource access in batches and sends proto message to agent - Agent proxies messages to coderd (captured by the API changes in this PR) - coderd re-emits logs to stderr RFC: https://www.notion.so/coderhq/Agent-Boundary-Logs-2afd579be59280f29629fc9823ac41ba	2025-12-19 10:41:39 -07:00
Jake Howell	ea00e72063	feat: add rbac specificity for `dbpurge` (#21088 ) Related to [`internal#1139`](https://github.com/coder/internal/issues/1139) Continuation of #21074 This implements some RBAC role specificity for `dbpurge`, ensuring that we follow the least-privileged model for removing data from the database. It is specified as following. ```go Site: rbac.Permissions(map[string][]policy.Action{ // DeleteOldWorkspaceAgentLogs // DeleteOldWorkspaceAgentStats // DeleteOldProvisionerDaemons // DeleteOldTelemetryLocks // DeleteOldAuditLogConnectionEvents // DeleteOldConnectionLogs rbac.ResourceSystem.Type: {policy.ActionDelete}, // DeleteOldNotificationMessages rbac.ResourceNotificationMessage.Type: {policy.ActionDelete}, // ExpirePrebuildsAPIKeys // DeleteExpiredAPIKeys rbac.ResourceApiKey.Type: {policy.ActionDelete}, // DeleteOldAIBridgeRecords rbac.ResourceAibridgeInterception.Type: {policy.ActionDelete}, }), ``` \| Position \| Pull-request \| \| -------- \| ------------ \| \| \| [feat: add prometheus observability metrics for `dbpurge`](https://github.com/coder/coder/pull/21074) \| \| ✅ \| [feat: add rbac specificity for `dbpurge`](https://github.com/coder/coder/pull/21088) \|	2025-12-20 01:02:39 +11:00
Jake Howell	00793cc0b5	feat: add prometheus observability metrics for `dbpurge` (#21074 ) Related to [`internal#1139`](https://github.com/coder/internal/issues/1139) This implements some prometheus metrics for records being removed from the database. Currently we're tracking the following fields being removed from the DB by this. They're viewable in the `/api/v2/debug/metrics` endpoint. * `expired_api_keys` * `aibridge_records` * `connection_logs` * `duration` ``` # HELP coderd_dbpurge_iteration_duration_seconds Duration of each dbpurge iteration in seconds. # TYPE coderd_dbpurge_iteration_duration_seconds histogram coderd_dbpurge_iteration_duration_seconds_bucket{success="true",le="1"} 1 coderd_dbpurge_iteration_duration_seconds_bucket{success="true",le="5"} 1 coderd_dbpurge_iteration_duration_seconds_bucket{success="true",le="10"} 1 coderd_dbpurge_iteration_duration_seconds_bucket{success="true",le="30"} 1 coderd_dbpurge_iteration_duration_seconds_bucket{success="true",le="60"} 1 coderd_dbpurge_iteration_duration_seconds_bucket{success="true",le="300"} 1 coderd_dbpurge_iteration_duration_seconds_bucket{success="true",le="600"} 1 coderd_dbpurge_iteration_duration_seconds_bucket{success="true",le="+Inf"} 1 coderd_dbpurge_iteration_duration_seconds_sum{success="true"} 0.014787814 coderd_dbpurge_iteration_duration_seconds_count{success="true"} 1 # HELP coderd_dbpurge_records_purged_total Total number of records purged by type. 
# TYPE coderd_dbpurge_records_purged_total counter coderd_dbpurge_records_purged_total{record_type="aibridge_records"} 0 coderd_dbpurge_records_purged_total{record_type="audit_logs"} 0 coderd_dbpurge_records_purged_total{record_type="connection_logs"} 0 coderd_dbpurge_records_purged_total{record_type="expired_api_keys"} 0 coderd_dbpurge_records_purged_total{record_type="workspace_agent_logs"} 0 ``` \| Position \| Pull-request \| \| -------- \| ------------ \| \| ✅ \| [feat: add prometheus observability metrics for `dbpurge`](https://github.com/coder/coder/pull/21074) \| \| \| [feat: add rbac specificity for `dbpurge`](https://github.com/coder/coder/pull/21088) \|	2025-12-20 00:20:57 +11:00
Cian Johnston	8248fa3b84	fix(coderd): wake dormant workspace when attempting to start it (#21306 ) Relates to #20925 This PR modifies the `postWorkspaceBuild` handler to automatically unset dormancy on a workspace when a start transition is requested. Previously, the client was responsible for unsetting the dormancy on the workspace prior to posting a workspace build.	2025-12-18 10:35:04 +00:00
Spike Curtis	c5fc6defb8	fix: report correct request paths from workspace proxy metrics (#21302 ) I noticed while looking at scale test metrics that we don't always report a useful path in the API request metrics. ![image.png](https://app.graphite.com/user-attachments/assets/a5b0dadf-9c2f-46a8-a6c1-3ad5f6201edb.png) There are a lot of requests with path `/*`. I chased this problem to the workspace proxy, where we mount the proxy router as a child of a "root" router to support some high level endpoints like `latency-check`. Because we query the path from the Chi route context in the prometheus middleware _before_ the request is actually handled, we can have a partially resolved pattern match only corresponding to the root router. The fix is to always re-resolve the path, rather than accept a partially resolved path.	2025-12-17 21:08:40 +04:00
Spike Curtis	bd753d9cb9	fix: mark users seen when activating on login (#21305 ) fixes #21303 Update user last_seen_at when we mark them active on login. This prevents a narrow race where they can be re-marked dormant and fail to log in.	2025-12-17 16:49:40 +04:00
Mathias Fredriksson	dac822b7f4	refactor: remove deprecated AITaskPromptParameterName constant (#21023 ) This removes the deprecated AITaskPromptParameterName constant and all backward compatibility code that was added for v2.28. - Remove AITaskPromptParameterName constant from codersdk/aitasks.go - Remove backward compatibility code in coderd/aitasks.go that populated the "AI Prompt" parameter for templates that defined it - Remove the backward compatibility test (OK AIPromptBackCompat) - Update dbfake to no longer set the AI Prompt parameter - Remove AITaskPromptParameterName from frontend TypeScript types - Remove preset prompt read-only feature from TaskPrompt component - Update docs to reflect that pre-2.28 definition is no longer supported Task prompts are now exclusively stored in the tasks.prompt database column, as introduced in the migration that added the tasks table.	2025-12-16 15:14:59 +00:00
Asher	871ed128aa	chore: update azure certs (#21265 )	2025-12-15 13:44:44 -09:00
Steven Masley	8fefd91e4a	feat!: support PKCE in the oauth2 client's auth/exchange flow (#21215 ) Breaking Change: Existing oauth apps might now use PKCE. If an unknown IdP type was being used, and it does not support PKCE, it will break. To fix, set the PKCE methods on the external auth to `none` ``` export CODER_EXTERNAL_AUTH_1_PKCE_METHODS=none ```	2025-12-15 17:41:47 +00:00
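For readers unfamiliar with PKCE, the sketch below shows how a client derives the `S256` code challenge from a code verifier as defined in RFC 7636, using only the standard library. It is not the code from the PR, the commit does not say which methods Coder negotiates, and the function names are hypothetical.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/base64"
	"fmt"
)

// newVerifier returns a random code verifier. 32 random bytes encode to
// 43 URL-safe characters, the minimum length RFC 7636 allows.
func newVerifier() (string, error) {
	buf := make([]byte, 32)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return base64.RawURLEncoding.EncodeToString(buf), nil
}

// challengeS256 computes BASE64URL(SHA256(verifier)) without padding,
// which is the S256 code challenge sent with the authorization request.
func challengeS256(verifier string) string {
	sum := sha256.Sum256([]byte(verifier))
	return base64.RawURLEncoding.EncodeToString(sum[:])
}

func main() {
	verifier, err := newVerifier()
	if err != nil {
		panic(err)
	}
	// The challenge goes in the auth request; the verifier is sent later
	// with the token exchange so the IdP can recompute and compare.
	fmt.Println("code_verifier: ", verifier)
	fmt.Println("code_challenge:", challengeS256(verifier))
}
```

An IdP that does not understand these parameters may reject the flow, which is the breakage the commit warns about and the reason for the `none` escape hatch.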
Steven Masley	3194bcfc9e	chore: distinct operations for provisioner's 'parse', 'init', 'plan', 'apply', 'graph' (#21064 ) Provisioner steps broken into smaller granular actions. Changes: - `ExtractArchive` moved to `init` request (was in `configure`) - Writing `tfstate` moved to `plan` (was in `configure`) - Moved most plan/apply outputs to `GraphComplete`	2025-12-15 11:26:41 -06:00
George K	103967ed02	feat: add sharing info to /workspaces endpoint (#21049 ) closes: https://github.com/coder/internal/issues/858 Similar to https://github.com/coder/coder/pull/19375, this one uses system permissions for fetching actual user and group data. Modifies the `workspaces_expanded` view to fetch the required data; this way it's made available to all code paths that make use of it. Also fixes a bug in a test helper function that can result in `null` being saved to the DB for `user_acl` or `group_acl` and break tests; a defensive check constraint that prevents this is worth a PR, e.g: `ALTER TABLE workspaces ADD CONSTRAINT group_acl_is_object CHECK (jsonb_typeof(group_acl) = 'object');` Also adds missing `OwnerName` in `ConvertWorkspaceRows`.	2025-12-15 08:42:08 -08:00
Spike Curtis	71c6dc4043	fix: stop disconnecting from coderd early and record disconnect correctly (#21250 ) fixes https://github.com/coder/internal/issues/1196 The above issue exposes two different bugs in Coder. In the agent, there is a race where if the agent is closed while starting up networking, it will erroneously disconnect from Coderd, which delays or breaks writing final status and logs. In Coderd, there is a bug where we don't properly record the latest agent disconnection time if the agent had previously disconnected. This causes us to report the agent status as "Connected" even after it has disconnected up until the inactivity timeout fires. This PR fixes both issues. It also slightly reworks when we send workspace updates based on connection and disconnection. Previously we would send two updates when the agent connected in certain circumstances, even though the status would be the same in both (only times changed). Now we universally only send one on connect, and then another on disconnect.	2025-12-15 12:04:01 +04:00
Asher	27f0413347	feat: add flag to disable template insights (#20940 ) Closes #20399 To summarize the original commit messages: - Do not log stats to the database. - Return errors on the insight endpoints. - Update the frontend to show those errors. - Also fixes an issue with getting the user status count via codersdk, since I added a test to ensure it was not disabled by this flag and it was sending the wrong payload.	2025-12-14 03:00:03 +00:00
Mathias Fredriksson	761dd55ee8	fix(coderd/database): sort template version variables and fix test flake (#21233 ) Previously the GetTemplateVersionVariables query did not sort output, relying on PostgreSQL on-disk ordering which is nondeterministic. Variables are now sorted by name because there is no alternative for ordering. Tests were adjusted to accommodate the new ordering; previously they relied on data being written to disk in insert order.	2025-12-12 11:41:46 +00:00
Danielle Maywood	f45a179181	test: move context to after db creation (#21224 ) Closes https://github.com/coder/internal/issues/1040 We move the context to just before it is used to avoid the scenario where NewDB takes a while to spin up and runs up the context to the deadline.	2025-12-11 21:51:16 +00:00
Kacper Sawicki	6f86f67754	feat(coderd): add overload protection with rate limiting and concurrency control (#21161 ) ## Summary This adds configurable overload protection to the AI Bridge daemon to prevent the server from being overwhelmed during periods of high load. Partially addresses coder/internal#1153 (rate limits and concurrency control; circuit breakers are deferred to a follow-up). ## New Configuration Options \| Option \| Environment Variable \| Description \| Default \| \|--------\|---------------------\|-------------\|---------\| \| `--aibridge-max-concurrency` \| `CODER_AIBRIDGE_MAX_CONCURRENCY` \| Maximum number of concurrent AI Bridge requests. Set to 0 to disable (unlimited). \| `0` \| \| `--aibridge-rate-limit` \| `CODER_AIBRIDGE_RATE_LIMIT` \| Maximum number of AI Bridge requests per second. Set to 0 to disable rate limiting. \| `0` \| ## Behavior When limits are exceeded: - Concurrency limit: Returns HTTP `503 Service Unavailable` with message "AI Bridge is currently at capacity. Please try again later." - Rate limit: Returns HTTP `429 Too Many Requests` with `Retry-After` header. Both protections are optional and disabled by default (0 values). ## Implementation The overload protection is implemented as reusable middleware in `coderd/httpmw/ratelimit.go`: 1. `RateLimitByAuthToken`: Per-user rate limiting that uses `APITokenFromRequest` to extract the authentication token, with fallback to `X-Api-Key` header for AI provider compatibility (e.g., Anthropic). Falls back to IP-based rate limiting if no token is present. Includes `Retry-After` header for backpressure signaling. 2. `ConcurrencyLimit`: Uses an atomic counter to track in-flight requests and reject when at capacity. The middleware is applied in `enterprise/coderd/aibridge.go` via `r.Group` in the following order: 1. Concurrency check (faster rejection for load shedding) 2. Rate limit check Note: Rate limiting currently applies to all AI Bridge requests, including pass-through requests. 
Ideally only actual interceptions should count, but this would require changes in the aibridge library. ## Testing Added comprehensive tests for: - Rate limiting by auth token (Bearer token, X-Api-Key, no token fallback to IP) - Different tokens not rate limited against each other - Disabled when limit is zero - Retry-After header is set on 429 responses - Concurrency limiting (allows within limit, rejects over limit, disabled when zero)	2025-12-11 16:38:54 +01:00
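The concurrency limit described above (an atomic counter of in-flight requests, rejecting with `503` at capacity, disabled when the limit is zero) could look roughly like the sketch below. It is written against the standard library only and is not the middleware from `coderd/httpmw/ratelimit.go`; the function names and signature are assumptions.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// concurrencyLimit returns middleware that rejects requests with 503 once
// more than max requests are in flight. A max of 0 disables the limit.
// The name and signature are hypothetical.
func concurrencyLimit(max int64, message string) func(http.Handler) http.Handler {
	var inFlight atomic.Int64
	return func(next http.Handler) http.Handler {
		if max <= 0 {
			return next
		}
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			if inFlight.Add(1) > max {
				inFlight.Add(-1)
				http.Error(w, message, http.StatusServiceUnavailable)
				return
			}
			defer inFlight.Add(-1)
			next.ServeHTTP(w, r)
		})
	}
}

// demo holds one request in flight against a limit of 1, then sends a second
// request. It returns the status codes of the second and first requests.
func demo() (rejected, accepted int) {
	entered := make(chan struct{})
	release := make(chan struct{})
	slow := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		entered <- struct{}{} // signal that the handler is running
		<-release             // hold the slot until released
	})
	h := concurrencyLimit(1, "AI Bridge is currently at capacity. Please try again later.")(slow)

	first := httptest.NewRecorder()
	done := make(chan struct{})
	go func() {
		h.ServeHTTP(first, httptest.NewRequest(http.MethodGet, "/", nil))
		close(done)
	}()
	<-entered

	second := httptest.NewRecorder()
	h.ServeHTTP(second, httptest.NewRequest(http.MethodGet, "/", nil))

	close(release)
	<-done
	return second.Code, first.Code
}

func main() {
	rejected, accepted := demo()
	fmt.Println("second request:", rejected, "first request:", accepted)
}
```

Incrementing first and backing out on rejection keeps the check to a single atomic operation on the fast path, which fits the commit's note that the concurrency check runs before the rate limiter for faster load shedding.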
Danielle Maywood	8ead6f795d	test: close provisioner before creating workspace build (#21219 ) Closes https://github.com/coder/internal/issues/1178 I verified the fix works by adding a `time.Sleep(100 * time.Millisecond)` between the `CreateWorkspaceBuild` and `CancelWorkspaceBuild` calls. Adding this reliably triggered the flake, and when I added the fix the flake stopped happening.	2025-12-11 13:36:33 +00:00
Danielle Maywood	c3224b793e	fix: handle scenario where provisionerdserver deletes task before coderd (#21220 )	2025-12-11 13:04:13 +00:00
Callum Styan	8ed1c1d372	perf: reduce calls to GetWorkspaceByAgentID in GetWorkspaceAgentByID (#21046 ) This PR piggy backs on the agent API cached workspace added in an earlier PR to provide a fast path for avoiding `GetWorkspaceByAgentID` calls in dbauthz's `GetWorkspaceAgentByID`. This query is not the most expensive, but has a significant call volume at ~16 million calls per week. Signed-off-by: Callum Styan <callumstyan@gmail.com>	2025-12-10 14:03:24 -08:00
George K	da71e546bb	chore: fix test errors on newer debian-based systems due to deprecated TZ (#21115 ) It appears on newer Debian systems `Canada/Newfoundland` TZ is not present and `America/St_Johns` should be used instead. Coder tests use a docker PG image where `Canada/Newfoundland` is still supported: ``` $ docker run --rm -it us-docker.pkg.dev/coder-v2-images-public/public/postgres:17 bash root@ca99e82721dc:/# ls -l /usr/share/zoneinfo/Canada/Newfoundland lrwxrwxrwx 1 root root 19 Mar 26 2025 /usr/share/zoneinfo/Canada/Newfoundland -> ../America/St_Johns ``` However, if a local PG instance is running on a Debian Trixie host, coder test will use it and error out due to the zone being unavailable: ``` $ docker run --rm -it debian:trixie bash root@f285092767e4:/# ls -l /usr/share/zoneinfo/Canada/Newfoundland ls: cannot access '/usr/share/zoneinfo/Canada/Newfoundland': No such file or directory root@f285092767e4:/# ls -l /usr/share/zoneinfo/America/St_Johns -rw-r--r-- 1 root root 3655 Aug 24 20:12 /usr/share/zoneinfo/America/St_Johns ``` ... 
which causes the tests to error out: ``` $ go test ./enterprise/coderd --- FAIL: TestWorkspaceTemplateParamsChange (0.13s) workspaces_test.go:3097: TestWorkspaceTagsTerraform: using cached terraform providers workspaces_test.go:3097: Set TF_CLI_CONFIG_FILE=/home/geo/.cache/coderv2-test/terraform_workspace_tags_test/a28ed341dee8/terraform.rc coderdenttest.go:84: Error Trace: /home/geo/coder/coderd/database/dbtestutil/db.go:161 /home/geo/coder/coderd/database/dbtestutil/db.go:122 /home/geo/coder/coderd/coderdtest/coderdtest.go:270 /home/geo/coder/enterprise/coderd/coderdenttest/coderdenttest.go:105 /home/geo/coder/enterprise/coderd/coderdenttest/coderdenttest.go:84 /home/geo/coder/enterprise/coderd/coderdenttest/coderdenttest.go:84 /home/geo/coder/enterprise/coderd/workspaces_test.go:3103 Error: Received unexpected error: pq: invalid value for parameter "TimeZone": "Canada/Newfoundland" Test: TestWorkspaceTemplateParamsChange Messages: failed to set timezone for database ... ``` This commit replaces the problematic TZ with the canonical one.	2025-12-10 08:09:13 -08:00
Callum Styan	27c3ec072e	perf: support fastpath in dbauthz GetLatestWorkspaceBuildByWorkspaceID (#21047 ) This PR piggy backs on the agent API cached workspace added in earlier PRs to provide a fast path for avoiding `GetWorkspaceByID` calls in `GetLatestWorkspaceBuildByWorkspaceID` via injection of the workspaces RBAC object into the context. We can do this from the `agentConnectionMonitor` easily since we already cache the workspace. --------- Signed-off-by: Callum Styan <callumstyan@gmail.com>	2025-12-09 15:53:52 -08:00
Callum Styan	a59a84b2a7	perf: optimize GetTemplateAppInsightsByTemplate by pre-filtering on start/end times (#20669 ) In this PR we're optimizing the `GetTemplateAppInsightsByTemplate` query by pre-filtering out apps which do not have an active session during the start/end time window. --------- Signed-off-by: Callum Styan <callumstyan@gmail.com>	2025-12-09 15:21:16 -08:00
Callum Styan	6abb889fab	perf: optimize GetDeploymentWorkspaceAgentStats by eliminating 2nd select (#21112 ) Tracking issue here: https://github.com/coder/internal/issues/1009 To summarize, the current version of this query selects from `workspace_agent_stats` twice. The expensive portion of this query is the bitmap heap scan we have to do for each of these selects. We can easily cut the cost of this query by 40-50% by cutting this down to a single select, and using those rows for both sets of calculations. Eliminating the heap scan itself would require a follow up PR to introduce a new index. Blink helped with the rewrite of the query. The current plan looks like this: ``` Nested Loop (cost=6101.64..6101.69 rows=1 width=64) (actual time=11.782..11.787 rows=1 loops=1) -> Aggregate (cost=2996.17..2996.19 rows=1 width=32) (actual time=3.356..3.357 rows=1 loops=1) -> Bitmap Heap Scan on workspace_agent_stats (cost=54.80..2992.86 rows=440 width=24) (actu al time=0.346..2.927 rows=818 loops=1) Recheck Cond: (created_at > (now() - '00:15:00'::interval)) Filter: (connection_median_latency_ms > '0'::double precision) Rows Removed by Filter: 1070 Heap Blocks: exact=486 -> Bitmap Index Scan on idx_agent_stats_created_at (cost=0.00..54.69 rows=1368 width =0) (actual time=0.241..0.241 rows=1888 loops=1) Index Cond: (created_at > (now() - '00:15:00'::interval)) -> Aggregate (cost=3105.47..3105.49 rows=1 width=32) (actual time=8.418..8.420 rows=1 loops=1) -> Subquery Scan on a (cost=3060.95..3105.39 rows=7 width=32) (actual time=7.851..8.394 ro ws=63 loops=1) Filter: (a.rn = 1) -> WindowAgg (cost=3060.95..3088.29 rows=1368 width=209) (actual time=7.850..8.382 r ows=63 loops=1) Run Condition: (row_number() OVER (?) 
<= 1) -> Sort (cost=3060.93..3064.35 rows=1368 width=56) (actual time=7.836..8.036 r ows=1888 loops=1) Sort Key: workspace_agent_stats_1.agent_id, workspace_agent_stats_1.create d_at DESC Sort Method: quicksort Memory: 181kB -> Bitmap Heap Scan on workspace_agent_stats workspace_agent_stats_1 (co st=55.03..2989.67 rows=1368 width=56) (actual time=0.388..2.096 rows=1888 loops=1) Recheck Cond: (created_at > (now() - '00:15:00'::interval)) Heap Blocks: exact=486 -> Bitmap Index Scan on idx_agent_stats_created_at (cost=0.00..54. 69 rows=1368 width=0) (actual time=0.295..0.295 rows=1888 loops=1) Index Cond: (created_at > (now() - '00:15:00'::interval)) Planning Time: 2.350 ms Execution Time: 13.152 ms (24 rows) ``` The new plan looks like this ``` Aggregate (cost=2966.96..2966.98 rows=1 width=64) (actual time=3.812..3.814 rows=1 loops=1) -> WindowAgg (cost=2891.96..2916.94 rows=1250 width=88) (actual time=2.696..3.412 rows=1890 loop s=1) -> Sort (cost=2891.94..2895.06 rows=1250 width=80) (actual time=2.686..2.780 rows=1890 loo ps=1) Sort Key: workspace_agent_stats.agent_id, workspace_agent_stats.created_at DESC Sort Method: quicksort Memory: 226kB -> Bitmap Heap Scan on workspace_agent_stats (cost=50.11..2827.64 rows=1250 width=80 ) (actual time=0.218..1.551 rows=1890 loops=1) Recheck Cond: (created_at > (now() - '00:15:00'::interval)) Heap Blocks: exact=474 -> Bitmap Index Scan on idx_agent_stats_created_at (cost=0.00..49.80 rows=1250 width=0) (actual time=0.146..0.147 rows=1890 loops=1) Index Cond: (created_at > (now() - '00:15:00'::interval)) Planning Time: 0.534 ms Execution Time: 3.969 ms (12 rows) ``` If we compare the results of the query they're similar enough that any differences can be attributed to slightly different timestamps for `now()` in the version of the query I am using to generate results for comparison: ``` workspace_rx_bytes \| workspace_tx_bytes \| workspace_connection_latency_50 \| workspace_connection_latency_95 \| session_count_vscode \| 
session_count_ssh \| session_count_jetbrains \| session_count_reconnecting_pty --------------------+--------------------+---------------------------------+---------------------------------+----------------------+-------------------+-------------------------+-------------------------------- 15263563 \| 74555854 \| 47.933 \| 250.5522 \| 239 \| 59 \| 3 \| 3 (1 row) workspace_rx_bytes \| workspace_tx_bytes \| workspace_connection_latency_50 \| workspace_connection_latency_95 \| session_count_vscode \| session_count_ssh \| session_count_jetbrains \| session_count_reconnecting_pty --------------------+--------------------+---------------------------------+---------------------------------+----------------------+-------------------+-------------------------+-------------------------------- 15295819 \| 74598410 \| 47.933 \| 250.5522 \| 239 \| 59 \| 3 \| 3 ``` --------- Signed-off-by: Callum Styan <callumstyan@gmail.com>	2025-12-09 15:19:55 -08:00
George K	4379230a27	feat: add deployment-wide option to disable workspace sharing (#21172 ) Adds `--disable-workspace-sharing` option. Workspace sharing is disabled by not including user and group ACLs in the workspace RBAC object, which prevents ACL-based authz. Closes https://github.com/coder/internal/issues/1072 The commit also adds saving of workspace user/group ACLs in the test DB data generator.	2025-12-09 08:13:09 -08:00
blinkagent[bot]	4844c978d8	fix: improve task naming prompt to avoid URL content guessing (#21151 ) Previously, when a user created a task with a URL-only prompt (e.g., `Let's work on https://github.com/coder/coder/issues/21138`), the LLM would hallucinate what the URL content might be about - generating names like "Fix GitHub Actions workflow issue" when the actual issue was unrelated. Add examples to the task naming system prompt showing expected behavior for GitHub issue and PR URLs, teaching the model to use visible URL parts (repo name, issue/PR number) rather than guessing content. Co-authored-by: blink-so[bot] <211532188+blink-so[bot]@users.noreply.github.com>	2025-12-09 09:10:54 -06:00
Dean Sheather	b199eb1c38	fix: allow stops and deletes after breaching AI limit (#21186 ) Fixes a bug a customer encountered once they breached their limit. Adds a test.	2025-12-09 11:05:12 +00:00
Asher	3a0e8af6e3	feat: add view workspace button to app error page (#20960 ) Closes #19984 As part of this, I refactored the error template to take in a slice of actions rather than using individual booleans and strings to control the behavior. We decided a link resolves the issue for now so that is what I added, although we may want to consider a way to start the workspace and follow the logs dynamically on that page and then show the app when finished (similar to the tasks page), or at least make the link automatically start the workspace instead of only taking you to the dashboard where you have to then start the workspace.	2025-12-08 14:16:00 -09:00
blinkagent[bot]	50d42ab0b9	docs: document 200 OK response for upload file API when file exists (#21071 ) Co-authored-by: blink-so[bot] <211532188+blink-so[bot]@users.noreply.github.com>	2025-12-08 16:04:56 -06:00
blinkagent[bot]	b4be5bcfed	docs: fix swagger tags for license endpoints (#21101 ) ## Summary Change `@Tags` from `Organizations` to `Enterprise` for `POST /licenses` and `POST /licenses/refresh-entitlements` to match the `GET` and `DELETE` license endpoints which are already tagged as `Enterprise`. ## Problem The license API endpoints were inconsistently tagged in the swagger annotations: - `GET /licenses` → `Enterprise` ✓ - `DELETE /licenses/{id}` → `Enterprise` ✓ - `POST /licenses` → `Organizations` ✗ - `POST /licenses/refresh-entitlements` → `Organizations` ✗ This caused the POST endpoints to be documented in the [Organizations API docs](https://coder.com/docs/reference/api/organizations) instead of the [Enterprise API docs](https://coder.com/docs/reference/api/enterprise) where the other license endpoints live. ## Fix Simply updated the `@Tags` annotation from `Organizations` to `Enterprise` for both POST endpoints. This was an oversight from the original swagger docs addition in #5625 (January 2023). Co-authored-by: blink-so[bot] <211532188+blink-so[bot]@users.noreply.github.com>	2025-12-05 15:27:22 +00:00
Callum Styan	83dbf73dde	perf: don't calculate build times for deleted templates (#21072 ) The metrics cache to calculate and expose build time metrics for templates currently calls `GetTemplates`, which returns all templates even if they are deleted. We can use the `GetTemplatesWithFilter` query to easily filter out deleted templates from the results, and thus not call `GetTemplateAverageBuildTime` for those deleted templates. Delete time for workspaces for non-deleted templates is still calculated. Signed-off-by: Callum Styan <callumstyan@gmail.com>	2025-12-04 10:27:56 -08:00
Mathias Fredriksson	cfdd4a9b88	perf(coderd/database): add index on workspace_app_statuses.app_id (#21099 )	2025-12-04 17:56:13 +02:00
Mathias Fredriksson	532a1f3054	fix(coderd): exclude sub-agents from workspace health calculation (#21098 )	2025-12-04 15:38:24 +02:00
Spike Curtis	40df21ed62	fix: fixes use of possibly nil RemoteAddr() and LocalAddr() return values (#21076 ) fixes: https://github.com/coder/internal/issues/1143 Both gVisor and the Go standard library implementations of `net.Conn` can under certain circumstances return `nil` for `RemoteAddr()` and `LocalAddr()` calls. If we call their methods, we segfault. This PR fixes these calls and adds ruleguard rules. Note that `slog.F("remote_addr", conn.RemoteAddr())` is fine because slog detects the `nil` before attempting to stringify the type.	2025-12-03 15:06:00 +04:00
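A defensive pattern for the `nil` address case above is to check the returned `net.Addr` before calling any method on it. The sketch below is illustrative only: `addrString`, `remoteAddrString`, the `<unknown>` placeholder, and the `nilAddrConn` test double are hypothetical and not taken from the PR.

```go
package main

import (
	"fmt"
	"net"
)

// addrString returns a printable form of addr, tolerating a nil address
// instead of calling String() on it.
func addrString(addr net.Addr) string {
	if addr == nil {
		return "<unknown>"
	}
	return addr.String()
}

// remoteAddrString is the safe replacement for conn.RemoteAddr().String().
func remoteAddrString(conn net.Conn) string {
	return addrString(conn.RemoteAddr())
}

// nilAddrConn is a test double whose RemoteAddr returns nil, mimicking the
// situation described in the commit. Its other methods are never called here.
type nilAddrConn struct{ net.Conn }

func (nilAddrConn) RemoteAddr() net.Addr { return nil }

func main() {
	// Calling conn.RemoteAddr().String() on this conn would panic.
	fmt.Println(remoteAddrString(nilAddrConn{}))

	// A regular in-memory connection still reports its address.
	client, server := net.Pipe()
	defer client.Close()
	defer server.Close()
	fmt.Println(remoteAddrString(client))
}
```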


3086 Commits