coder

mirror of https://github.com/coder/coder.git synced 2026-06-03 13:08:25 +00:00

Author	SHA1	Message	Date
Marcin Tojek	04b0253e8a	feat: add Prometheus metrics for license warnings and errors (#21749 ) Fixes: coder/internal#767 Adds two new Prometheus metrics for license health monitoring: - `coderd_license_warnings` - count of active license warnings - `coderd_license_errors` - count of active license errors Metrics endpoint after startup of a deployment with license enabled: ``` ... # HELP coderd_license_errors The number of active license errors. # TYPE coderd_license_errors gauge coderd_license_errors 0 ... # HELP coderd_license_warnings The number of active license warnings. # TYPE coderd_license_warnings gauge coderd_license_warnings 0 ... ```	2026-01-29 13:50:15 +01:00
Steven Masley	dfbd541cee	chore: move List util out of db2sdk to avoid circular imports (#21733 )	2026-01-28 13:07:53 -06:00
Steven Masley	e13f2a9869	chore: remove extra `stop_modules` from provisionerd proto (#21706 ) Was a duplicate of start_modules Closes https://github.com/coder/coder/issues/21206	2026-01-28 09:25:47 -06:00
Spike Curtis	7090a1e205	chore: renumber duplicate migration 000411 (#21720 ) Fixes recent duplicate DB migration in #21607	2026-01-28 08:01:58 +04:00
Spike Curtis	f358a6db11	chore: convert tailnet tables to UNLOGGED for improved write performance (#21607 ) This migration converts all tailnet coordination tables to UNLOGGED: - `tailnet_coordinators` - `tailnet_peers` - `tailnet_tunnels` UNLOGGED tables skip Write-Ahead Log (WAL) writes, significantly improving performance for high-frequency updates like coordinator heartbeats and peer state changes. The trade-off is that UNLOGGED tables are truncated on crash recovery and are not replicated to standby servers. This is acceptable for these tables because the data is ephemeral: 1. Coordinators re-register on startup 2. Peers re-establish connections on reconnect 3. Tunnels are re-created based on current peer state Migration notes: - Child tables must be converted before the parent table because LOGGED child tables cannot reference UNLOGGED parent tables (but the reverse is allowed) - The down migration reverses the order: parent first, then children Fixes https://github.com/coder/coder/issues/21333	2026-01-28 07:12:32 +04:00
Zach	2204731ddb	feat: implement boundary usage tracker and telemetry collection (#21716 ) Implements boundary usage tracking across all Coder replicas and reports the results via telemetry. Changes: - Implement Tracker with Track(), FlushToDB(), and StartFlushLoop() methods - Add telemetry integration via collectBoundaryUsageSummary() - Use telemetry lock to ensure only one replica collects per period The tracker accumulates unique workspaces, unique users, and request counts (allowed/denied) in memory, then flushes to the database periodically. During telemetry collection, stats are aggregated across all replicas and reset for the next period.	2026-01-27 19:11:40 -07:00
Steven Masley	799b190dee	fix: do not enforce managed agent limit for non-task workspaces (#21689 ) Only task workspaces have the checks in wsbuilder for violating the managed agent caps in the license. Stopped tasks that are resumed with a regular workspace start still count as usage.	2026-01-27 19:01:17 -06:00
Zach	7dfa33b410	feat: add boundary usage tracking database schema and tracker skeleton (#21670 ) feat: add boundary usage telemetry database schema and RBAC Adds the foundation for tracking boundary usage telemetry across Coder replicas. This includes: - Database schema: `boundary_usage_stats` table with per-replica stats (unique workspaces, unique users, allowed/denied request counts) - Database queries: upsert stats, get aggregated summary, reset stats, delete by replica ID - RBAC: `boundary_usage` resource type with read/update/delete actions, accessible only via system `BoundaryUsageTracker` subject (not regular user roles) - Tracker skeleton + docs: stub implementation in `coderd/boundaryusage/` The tracker accumulates stats in memory and periodically flushes to the database. Stats are aggregated across replicas for telemetry reporting, then reset when a new reporting period begins. The tracker implementation and plumbing will be done in a subsequent commit/PR. --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-27 13:29:21 -07:00
George K	c352a51b22	fix(coderd): authorize workspace start/stop/delete by transition action (#21691 ) Use transition-specific actions when authorizing workspace build parameter inserts in the database layer so start/stop/delete do not require workspace.update. Related to: https://github.com/coder/internal/issues/1299	2026-01-27 09:08:12 -08:00
Cian Johnston	7b44976618	fix(coderd/provisionerdserver): correct managed agent tracking (#21696 ) Relates to https://github.com/coder/internal/issues/1282 Updates tracking of managed agents to be predicated on the presence of a related `task_id` instead of the presence of a `coder_ai_task` resource.	2026-01-27 12:14:52 +00:00
Mathias Fredriksson	25d7f27cdb	feat(coderd): add task log snapshot storage endpoint (#21644 ) This change adds a POST /workspaceagents/me/tasks/{task}/log-snapshot endpoint for agents to upload task conversation history during workspace shutdown. This allows users to view task logs even when the workspace is stopped. The endpoint accepts agentapi format payloads (typically last 10 messages, max 64KB), wraps them in a format envelope, and upserts to the task_snapshots table. Uses agent token auth and validates the task belongs to the agent's workspace. Closes coder/internal#1253	2026-01-27 11:09:24 +02:00
Danny Kopping	7123518baa	feat: conditionally send `aibridge` actor headers (#21643 ) Also passes along the authenticated username as actor metadata. Closes https://github.com/coder/aibridge/issues/135 Depends on https://github.com/coder/aibridge/pull/142 Replace aibridge tag with merge commit once https://github.com/coder/aibridge/pull/142 lands. --------- Signed-off-by: Danny Kopping <danny@coder.com>	2026-01-26 15:08:17 +00:00
Cian Johnston	612aae2523	chore: replace httpapi.Heartbeat with httpapi.HeartbeatClose (#21676 ) Relates to https://github.com/coder/coder/pull/21676 * Replaces all existing usages of `httpapi.Heartbeat` with `httpapi.HeartbeatClose` * Removes `httpapi.Heartbeat`	2026-01-26 12:11:40 +00:00
Spike Curtis	f47f89d997	chore: remove unused tailnet v1 tables and queries (#21646 ) Removes the legacy tailnet v1 API tables (`tailnet_clients`, `tailnet_agents`, `tailnet_client_subscriptions`) and their associated queries, triggers, and functions. These were superseded by the v2 tables (`tailnet_peers`, `tailnet_tunnels`) in migration 000168, and the v1 API code was removed in commit `d6154c4310`, but the database artifacts were never cleaned up. Changes: - New migration `000410_remove_tailnet_v1_tables` to drop the unused tables - Removed 11 unused queries from `tailnet.sql` - Removed associated manual wrapper methods in `dbauthz` and `dbmetrics` - ~930 lines deleted across 11 files	2026-01-26 14:27:17 +04:00
Danielle Maywood	409360c62d	fix(coderd): ensure inbox WebSocket is closed when client disconnects (#21652 ) Relates to https://github.com/coder/coder/issues/19715 This is similar to https://github.com/coder/coder/pull/19711 This endpoint works as follows: - Subscribes to the database's pubsub - Accepts a WebSocket upgrade - Starts a `httpapi.Heartbeat` - Creates a JSON encoder - Loops indefinitely waiting for notifications until the request context is cancelled The critical issue here is that `httpapi.Heartbeat` silently fails when the client has disconnected. This means we never cancel the request context, leaving the WebSocket alive until we receive a notification from the database and fail to write that down the pipe. By replacing usage of `httpapi.Heartbeat` with `httpapi.HeartbeatClose`, we cancel the context _when the heartbeat fails to write_ due to the client disconnecting. This allows us to clean up without waiting for a notification to come through the pubsub channel.	2026-01-26 09:24:45 +00:00
Cian Johnston	fa7baebdd8	fix(coderd): handle rbac.NotAuthorizedError when deleting template (#21645 ) Relates to https://github.com/coder/aibridge/pull/143/changes#r2720659638 We previously returned the following when an attempt to delete a template failed due to lack of permissions. ``` 500 Internal error deleting template: unauthorized: rbac: forbidden ``` This PR updates the handler to return our usual 403 forbidden response.	2026-01-23 12:02:46 +00:00
Callum Styan	e195856c43	perf: reduce pg_notify call volume by batching together agent metadata updates (#21330 ) --------- Signed-off-by: Callum Styan <callumstyan@gmail.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-01-22 22:47:49 -08:00
Zach	6c49938fca	feat: add template version ID to re-emitted boundary logs (#21636 ) Adds template_version_id to re-emitted boundary audit logs to allow filtering and analysis by specific template versions in addition to the existing template_id field. Since boundary policies are defined in the template, the template version is critical to figuring out which policy was responsible for a boundary decision in a workspace. Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-22 15:06:02 -07:00
George K	d29a168785	fix(coderd/rbac): reinstate deployment-wide workspace.share permission for owner role (#21620 ) The removal of that permission from the role broke valid use cases (e.g. a site owner user creating a workspace owned by a system account and then trying to share it with another user). The bulk of the PR is made up of the rollbacks of the previously introduced test updates necessitated by the removal. Related to: https://github.com/coder/internal/issues/1285	2026-01-22 08:12:15 -08:00
Zach	6d8e6d4830	feat: include template ID in re-emitted boundary logs (#21618 ) Boundary policies are currently defined at the template level, so including the template ID in re-emitted logs by the control plane allows policy creators to filter and observe boundary activity for specific templates. This makes it easier to verify that policies are working as expected and to debug issues with specific template configurations.	2026-01-22 08:37:16 -07:00
Mathias Fredriksson	4c7844ad3d	feat(coderd): bump workspace deadline on AI agent activity (#21584 ) AI agents report status via patchWorkspaceAgentAppStatus, but this wasn't extending workspace deadlines. This prevented proper task auto-pause behavior, causing tasks to pause mid-execution when there were no human connections. Now we call ActivityBumpWorkspace when agents report status, using the same logic as SSH/IDE connections. We bump when transitioning to or from the working state. Closes coder/internal#1251	2026-01-22 13:52:32 +02:00
Susana Ferreira	47b3846bca	feat: use coder specific header for aibridge authentication from AI proxy (#21590 ) ## Description Introduces a new `X-Coder-Token` header for authenticating requests from AI Proxy to AI Bridge. Previously, the proxy overwrote the `Authorization` header with the Coder token, which prevented the original authentication headers from flowing through to upstream providers. With this change, AI Proxy sets the Coder token in a separate header, preserving the original `Authorization` and `X-Api-Key` headers. AI Bridge uses this header for authentication and removes it before forwarding requests to upstream providers. For requests that don't come through AI Proxy, AI Bridge continues to use `Authorization` and `X-Api-Key` for authentication. ## Changes * Add `HeaderCoderAuth` constant and update `ExtractAuthToken` to check headers in the following order: `X-Coder-Token` > `Authorization` > `X-Api-Key` * Update AI Proxy to set `X-Coder-Token` instead of overwriting `Authorization` * Remove `X-Coder-Token` in AI Bridge before forwarding to upstream providers * Add tests for header handling and token extraction priority Related to: https://github.com/coder/internal/issues/1235	2026-01-21 19:06:19 +00:00
Mathias Fredriksson	97e8a5b093	fix(coderd): allow agent auth during workspace shutdown (#21538 ) Agents were losing authentication during workspace shutdown, causing shutdown scripts to fail. The auth query required agents to belong to the latest build, but during shutdown a `stop` build becomes latest while the `start` build's agents are still running. Modified the auth query to allow `start` build agents to authenticate temporarily during `stop` execution. The query allows auth when: - Agent's `start` build job succeeded - Latest build is `stop` with `pending`/`running` job status - Builds are adjacent (`stop` is `build_number + 1`) - Template versions match Auth closes once `stop` completes. Renamed `GetWorkspaceAgentAndLatestBuildByAuthToken` to `GetAuthenticatedWorkspaceAgentAndBuildByAuthToken` since it returns the agent's build (not always latest) during shutdown. Closes coder/internal#1249 Fixes #19467	2026-01-21 13:18:43 +00:00
Danny Kopping	a14a22eb54	feat: support custom bedrock base url (#21582 ) Closes https://github.com/coder/aibridge/issues/126 Depends on https://github.com/coder/aibridge/pull/131 --------- Signed-off-by: Danny Kopping <danny@coder.com>	2026-01-21 12:48:56 +00:00
Susana Ferreira	6ef9670384	fix: limit concurrent database connections in prebuild reconciliation (#20908 ) ## Description This PR addresses database connection pool exhaustion during prebuilds reconciliation by introducing two changes: * `CanSkipReconciliation`: Filters out presets that don't need reconciliation before spawning goroutines. This ensures we only create goroutines for presets that will (_most likely_) perform database operations, avoiding unnecessary connection pool usage. * Dynamic `eg.SetLimit`: Limits concurrent goroutines based on the configured database connection pool size (`CODER_PG_CONN_MAX_OPEN / 2`). This replaces the previous hardcoded limit of 5, ensuring the reconciliation loop scales appropriately with the configured pool size while leaving capacity for other database operations. ## Changes * Add `CanSkipReconciliation()` method to `PresetSnapshot` that returns true for inactive presets with no running workspaces, no pending jobs, or expired prebuilds. * Add `maxDBConnections` parameter to `NewStoreReconciler` and compute `reconciliationConcurrency` as half the pool size (minimum 1). * Add `ReconciliationConcurrency()` getter method to `StoreReconciler`. * Add `eg.SetLimit(c.reconciliationConcurrency)` to bound concurrent reconciliation goroutines. * Add `PresetsTotal` and `PresetsReconciled` to `ReconcileStats` for observability. * Add `TestCanSkipReconciliation` unit tests. * Add `TestReconciliationConcurrency` unit tests. * Add benchmark tests for reconciliation performance. ## Benchmarks * `BenchmarkReconcileAll_NoOps`: Tests presets with no reconciliation actions. All presets are filtered by `CanSkipReconciliation`, resulting in no goroutines spawned and no database connections used. * `BenchmarkReconcileAll_ConnectionContention`: Tests presets where all require reconciliation actions. All presets spawn goroutines, but concurrency is limited by `eg.SetLimit(reconciliationConcurrency)`. 
* `BenchmarkReconcileAll_Mix`: Simulates a realistic scenario with a large subset of inactive presets (filtered by `CanSkipReconciliation`) and a smaller subset requiring reconciliation (limited by `eg.SetLimit`). Closes: https://github.com/coder/coder/issues/20606	2026-01-21 10:56:31 +00:00
Mathias Fredriksson	2132c53f28	feat(coderd/database): add schema for task pause/resume lifecycle (#21557 ) Creates migration 000409 with the database foundation for pausing and resuming task workspaces. The task_snapshots table stores conversation history (AgentAPI messages) so users can view task logs even when the workspace is stopped. Each task gets one snapshot, overwritten on each pause. Three new build_reason values (task_auto_pause, task_manual_pause, task_resume) let us distinguish task lifecycle events in telemetry and audit logs from regular workspace operations. Uses a regular table rather than UNLOGGED for snapshots. While UNLOGGED would be faster, losing snapshots on database crash creates user confusion (logs disappear until next pause). We can switch to UNLOGGED post-GA if write performance becomes a problem. Closes coder/internal#1250	2026-01-21 12:12:12 +02:00
Jake Howell	59b71f296f	feat: implement non-brittle `TestDBPurgeAuthorization` (#21442 ) Closes #21440 The `TestDBPurgeAuthorization` test was overfitting by calling each purge method individually, which reimplemented dbpurge logic in the test and created a maintenance burden. When new purge steps are added, they either need to be reflected in the test or there will be a testing blindspot. This change extracts the `doTick` closure into an exported `PurgeTick` function that returns an error, making the core purge logic testable. The test now calls `PurgeTick` directly to exercise the actual dbpurge behavior rather than reimplementing it. Retention values are configured to ensure all purge operations run, so we test RBAC permissions for all code paths. - Tests actual dbpurge behavior instead of reimplementing it - Automatically covers new purge steps when they're added - Still validates that all operations have proper RBAC permissions The test focuses on authorization (checking for RBAC errors) rather than verifying deletion behavior, which is already covered by other tests like `TestDeleteExpiredAPIKeys` and `TestDeleteOldAuditLogs`.	2026-01-21 11:27:01 +11:00
Kacper Sawicki	ed679bb3da	feat(codersdk): add circuit breaker configuration support for aibridge (#21546 ) ## Summary Add circuit breaker support for AI Bridge to protect against cascading failures from upstream AI provider rate limits (HTTP 429, 503, and Anthropic's 529 overloaded responses). ## Changes - Add 5 new CLI options for circuit breaker configuration: - `--aibridge-circuit-breaker-enabled` (default: false) - `--aibridge-circuit-breaker-failure-threshold` (default: 5) - `--aibridge-circuit-breaker-interval` (default: 10s) - `--aibridge-circuit-breaker-timeout` (default: 30s) - `--aibridge-circuit-breaker-max-requests` (default: 3) - Update aibridge dependency to include circuit breaker support - Add tests for pool creation with circuit breaker providers ## Notes - Circuit breaker is disabled by default for backward compatibility - When enabled, applies to both OpenAI and Anthropic providers - Uses sony/gobreaker internally via the aibridge library ## Testing ``` make test RUN=TestPoolWithCircuitBreakerProviders ```	2026-01-20 14:59:29 +01:00
Rowan Smith	b163b4c950	feat: support bundle updates to enable pprof and telemetry collection (#21486 ) - Adds pprof collection support now that we have the listeners automatically starting (requires Coder server 2.28.0+, includes a version check). Collects heap, allocs, profile (30s), block, mutex, goroutine, threadcreate, trace (30s), cmdline, symbol. Performs capture for 30 seconds and emits a log line stating as such. Enable capture by supplying the `--pprof` flag or `CODER_SUPPORT_BUNDLE_PPROF` env var. Collection of pprof data from both coderd and the Coder agent occurs. - Adds collection of Prometheus metrics, also requires 2.28.0+ - Adds the ability to include a template in the bundle independently of supplying the details of a running workspace by supplying the `--template` flag or `CODER_SUPPORT_BUNDLE_TEMPLATE` env var - Captures a list of workspaces the user has access to. Defaults to a max of 10, configurable via `--workspaces-total-cap` / `CODER_SUPPORT_BUNDLE_WORKSPACES_TOTAL_CAP` - Collects additional stats from the coderd deployment (aggregated workspace/session metrics), as well as entitlements via license and dismissed health checks. created with help from mux	2026-01-20 10:28:52 +11:00
Cian Johnston	9776dc16bd	fix(coderd/database/dbmetrics): fix incorrect query label in GetWorkspaceAgentAndWorkspaceByID (#21576 ) Fixes an incorrect label.	2026-01-19 16:25:36 +00:00
Cian Johnston	08343a7a9f	perf: reduce number of queries made by /api/v2/workspaceagents/{id} (#21522 ) Relates to https://github.com/coder/internal/issues/1214 The `ExtractWorkspaceAgentParam` middleware ends up making 4 database queries to follow the chain of `WorkspaceAgent` -> `WorkspaceResource` -> `ProvisionerJob` -> `WorkspaceBuild` -- but then dropping all that hard work on the floor. The `api.workspaceAgent` handler that references this middleware then has to do all of that work again, plus one more query to get the related `User` so we can get the username. This pattern is also mirrored in `getDatabaseTerminal` but without the middleware. This PR: * Adds a new query `GetWorkspaceAgentAndWorkspaceByID` to fetch all this information at once to avoid the multiple round-trips, * Updates the existing usage of `GetWorkspaceAgentByID` to this new query instead, * Updates `ExtractWorkspaceAgentParam` to also store the workspace in the request context Dalibo: [0.63ms](https://explain.dalibo.com/plan/40bb597f3539gc6c)	2026-01-19 12:36:33 +00:00
Susana Ferreira	a406ed7cc5	feat: add upstream proxy support to aiproxy for passthrough requests (#21512 ) ## Description Adds upstream proxy support for AI Bridge Proxy passthrough requests. This allows aiproxy to forward non-allowlisted requests through an upstream proxy. Currently, the only supported configuration is when aiproxy is the first proxy in the chain (client → aiproxy → upstream proxy). ## Changes * Add `--aibridge-proxy-upstream` option to configure an upstream HTTP/HTTPS proxy URL for passthrough requests * Add `--aibridge-proxy-upstream-ca` option to trust custom CA certificates for HTTPS upstream proxies * Passthrough requests (non-allowlisted domains) are forwarded through the upstream proxy * MITM'd requests (allowlisted domains) continue to go directly to aibridge, not through the upstream proxy * Add tests for upstream proxy configuration and request routing Closes: https://github.com/coder/internal/issues/1204	2026-01-19 08:50:57 +00:00
Cian Johnston	ad23ea3561	chore: remove unused ExtractWorkspaceAndAgentParam (#21537 ) While investigating https://github.com/coder/internal/issues/1214 I noticed that `ExtractWorkspaceAndAgentParam` appeared to be unused outside of tests.	2026-01-16 15:11:10 +00:00
Cian Johnston	3a62a8e70e	chore: improve healthcheck timeout message (#21520 ) Relates to https://github.com/coder/internal/issues/272 This flake has been persisting for a while, and unfortunately there's no detail on which healthcheck in particular is holding things up. This PR adds a concurrency-safe `healthcheck.Progress` and wires it through `healthcheck.Run`. If the healthcheck times out, it will provide information on which healthchecks are completed / running, and how long they took / are still taking. 🤖 Claude Opus 4.5 completed the first round of this implementation, which I then refactored.	2026-01-15 16:37:05 +00:00
blinkagent[bot]	d5296a4855	chore: add lint/migrations to detect hardcoded public schema (#21496 ) ## Problem Migration 000401 introduced a hardcoded `public.` schema qualifier which broke deployments using non-public schemas (see #21493). We need to prevent this from happening again. ## Solution Adds a new `lint/migrations` Make target that validates database migrations do not hardcode the `public` schema qualifier. Migrations should rely on `search_path` instead to support deployments using non-public schemas. ## Changes - Added `scripts/check_migrations_schema.sh` - a linter script that checks for `public.` references in migration files (excluding test fixtures) - Added `lint/migrations` target to the Makefile - Added `lint/migrations` to the main `lint` target so it runs in CI ## Testing - Verified the linter fails on current `main` (which has the hardcoded `public.` in migration 000401) - Verified the linter passes after applying the fix from #21493 ```bash # On main (fails) $ make lint/migrations ERROR: Migrations must not hardcode the 'public' schema. Use unqualified table names instead. # After fix (passes) $ make lint/migrations Migration schema references OK ``` ## Depends on - #21493 must be merged first (or this PR will fail CI until it is) --------- Signed-off-by: Danny Kopping <danny@coder.com> Co-authored-by: blink-so[bot] <211532188+blink-so[bot]@users.noreply.github.com> Co-authored-by: Danny Kopping <danny@coder.com>	2026-01-15 14:17:16 +02:00
Cian Johnston	5073493850	feat(coderd/database/dbmetrics): add query_counts_total metric (#21506 ) Adds a new Prometheus metric `coderd_db_query_counts_total` that tracks the total number of queries by route, method, and query name. This is aimed at helping us track down potential optimization candidates for HTTP handlers that may trigger a number of queries. It is expected to be used alongside `coderd_api_requests_processed_total` for correlation. Depends upon new middleware introduced in https://github.com/coder/coder/pull/21498 Relates to https://github.com/coder/internal/issues/1214	2026-01-15 10:58:56 +00:00
Cian Johnston	32354261d3	chore(coderd/httpmw): extract HTTPRoute middleware (#21498 ) Extracts part of the prometheus middleware that stores the route information in the request context into its own middleware. Also adds request method information to context. Relates to https://github.com/coder/internal/issues/1214	2026-01-15 10:26:50 +00:00
Ehab Younes	6683d807ac	refactor: add RFC-compliant enum types and use SDK as source of truth (#21468 ) Add comprehensive OAuth2 enum types to codersdk following RFC specifications: - OAuth2ProviderGrantType (RFC 6749) - OAuth2ProviderResponseType (RFC 6749) - OAuth2TokenEndpointAuthMethod (RFC 7591) - OAuth2PKCECodeChallengeMethod (RFC 7636) - OAuth2TokenType (RFC 6749, RFC 9449) - OAuth2RevocationTokenTypeHint (RFC 7009) - OAuth2ErrorCode (RFC 6749, RFC 7009, RFC 8707) Add OAuth2TokenRequest, OAuth2TokenResponse, OAuth2TokenRevocationRequest, and OAuth2Error structs to the SDK. Update OAuth2ClientRegistrationRequest, OAuth2ClientRegistrationResponse, OAuth2ClientConfiguration, and OAuth2AuthorizationServerMetadata to use typed enums instead of raw strings. This makes codersdk the single source of truth for OAuth2 types, eliminating duplication between SDK and server-side structs. Closes #21476	2026-01-15 12:41:28 +03:00
George K	0712faef4f	feat(enterprise): implement organization "disable workspace sharing" option (#21376 ) Adds a per-organization setting to disable workspace sharing. When enabled, all existing workspace ACLs in the organization are cleared and the workspace ACL mutation API endpoints return `403 Forbidden`. This complements the existing site-wide `--disable-workspace-sharing` flag by providing more granular control at the organization level. Closes https://github.com/coder/internal/issues/1073 (part 2) --------- Co-authored-by: Steven Masley <Emyrk@users.noreply.github.com>	2026-01-14 09:47:50 -08:00
Danny Kopping	7d5cd06f83	feat: add `aibridge` structured logging (#21492 ) Closes https://github.com/coder/internal/issues/1151 Sample: ``` [API] 2026-01-13 15:50:20.795 [info] coderd.aibridgedserver: interception started trace=8bb5a1d8eb10526cc46ad90f191bb468 span=a3e5b5da9546032a record_type=interception_start interception_id=97461880-4a6c-47c1-8292-3588dd715312 initiator_id=360c6167-a93a-4442-9c3e-f87a6d1cfb66 api_key_id=vg1sbUv97d provider=anthropic model=claude-opus-4-5-20251101 started_at="2026-01-13T15:50:20.790690781Z" metadata={} [API] 2026-01-13 15:50:23.741 [info] coderd.aibridgedserver: token usage recorded trace=8bb5a1d8eb10526cc46ad90f191bb468 span=a114f0cc3047296e record_type=token_usage interception_id=97461880-4a6c-47c1-8292-3588dd715312 msg_id=msg_01VJH1rYKspfun8BW29CrYEu input_tokens=10 output_tokens=8 created_at="2026-01-13T15:50:23.731587038Z" metadata={"cache_creation_input":53194,"cache_ephemeral_1h_input":0,"cache_ephemeral_5m_input":53194,"cache_read_input":0,"web_search_requests":0} [API] 2026-01-13 15:50:26.265 [info] coderd.aibridgedserver: token usage recorded trace=8bb5a1d8eb10526cc46ad90f191bb468 span=dbdafb563bff2c9c record_type=token_usage interception_id=97461880-4a6c-47c1-8292-3588dd715312 msg_id=msg_01VJH1rYKspfun8BW29CrYEu input_tokens=0 output_tokens=130 created_at="2026-01-13T15:50:26.254467904Z" metadata={} [API] 2026-01-13 15:50:26.268 [info] coderd.aibridgedserver: prompt usage recorded trace=8bb5a1d8eb10526cc46ad90f191bb468 span=da51887a757226fc record_type=prompt_usage interception_id=97461880-4a6c-47c1-8292-3588dd715312 msg_id=msg_01VJH1rYKspfun8BW29CrYEu prompt="list the jmia share price" created_at="2026-01-13T15:50:26.255299811Z" metadata={} [API] 2026-01-13 15:50:26.268 [info] coderd.aibridgedserver: interception ended trace=8bb5a1d8eb10526cc46ad90f191bb468 span=3fa25397705ee7c9 record_type=interception_end interception_id=97461880-4a6c-47c1-8292-3588dd715312 ended_at="2026-01-13T15:50:26.25555547Z" [API] 2026-01-13 
15:50:26.269 [info] coderd.aibridgedserver: tool usage recorded trace=8bb5a1d8eb10526cc46ad90f191bb468 span=b54af90afc604d29 record_type=tool_usage interception_id=97461880-4a6c-47c1-8292-3588dd715312 msg_id=msg_01VJH1rYKspfun8BW29CrYEu tool=mcp__stonks__getStockPriceSnapshot input="{\"ticker\":\"JMIA\"}" server_url="" injected=false invocation_error="" created_at="2026-01-13T15:50:26.255164652Z" metadata={} ``` Structured logging is only enabled when `CODER_AIBRIDGE_STRUCTURED_LOGGING=true`. --------- Signed-off-by: Danny Kopping <danny@coder.com>	2026-01-14 17:26:08 +02:00
blinkagent[bot]	b3a81be1aa	fix(coderd/database): remove hardcoded public schema from migration 000401 (#21493 )	2026-01-14 05:40:30 +02:00
Susana Ferreira	74b6d12a8a	feat: implement selective MITM with configurable domain allowlist in aibridgeproxyd (#21473 ) ## Description Implements selective MITM (Man-in-the-Middle) in `aibridgeproxyd` so that only requests to allowlisted domains are intercepted and decrypted. Requests to all other domains are tunneled directly without decryption. ## Changes * New config option: `CODER_AIBRIDGE_PROXY_DOMAIN_ALLOWLIST` (default: `api.anthropic.com`,`api.openai.com`) * Selective MITM: Uses `goproxy.ReqHostIs()` to only intercept `CONNECT` requests to allowlisted hosts * Certificate caching: Now only generates/caches certificates for allowlisted domains * Validation: Startup fails if domain allowlist is empty or contains invalid entries Closes: https://github.com/coder/internal/issues/1182	2026-01-13 11:30:51 +00:00
Cian Johnston	64e7a77983	feat: add user_agent to loggermw (#21485 ) Adds the `user_agent` field to `httpmw/loggermw`.	2026-01-13 10:50:01 +00:00
Danny Kopping	49a42eff5c	feat: make database connection pool size configurable (#21403 ) Closes https://github.com/coder/coder/issues/21360 A few considerations/notes: - I've kept the number of conns to 10 in all other places, except coderd - which uses the config value - I opted to also make idle conns configurable; the greater the delta between max open and max idle, the more connection churn - Postgres maintains a [_process_ per connection](https://www.postgresql.org/docs/current/connect-estab.html), contrary to what the comment said previously - Operators should be able to tune this, since process churn can negatively affect OS scheduling - I've set the value to `"auto"` by default so it's not another knob one _has to_ twiddle, and sets max idle = max conns / 3 --------- Signed-off-by: Danny Kopping <danny@coder.com>	2026-01-13 10:50:57 +02:00
George K	cc2efe9e1f	feat(coderd/rbac): make organization-member a per-org system custom role (#21359 ) Migrated the built-in organization-member role to DB storage so it can be customized per org. Closes https://github.com/coder/internal/issues/1073 (part 1)	2026-01-12 18:19:19 -08:00
Kacper Sawicki	6ca70d3618	feat(cli): add --no-build flag to state push for state-only updates (#21374 ) ## Summary Adds a `--no-build` flag to `coder state push` that updates the Terraform state directly without triggering a workspace build. ## Use Case This enables state-only migrations, such as migrating Kubernetes resources from deprecated types (e.g., `kubernetes_config_map`) to versioned types (e.g., `kubernetes_config_map_v1`): ```bash coder state pull my-workspace > state.json terraform init terraform state rm -state=state.json kubernetes_config_map.example terraform import -state=state.json kubernetes_config_map_v1.example default/example coder state push --no-build my-workspace state.json ``` ## Changes - Add `PUT /api/v2/workspacebuilds/{id}/state` endpoint to update state without triggering a build - Add `UpdateWorkspaceBuildState` SDK method - Add `--no-build`/`-n` flag to `coder state push` - Add confirmation prompt (can be skipped with `--yes`/`-y`) since this is a potentially dangerous operation - Add test for `--no-build` functionality Fixes #21336	2026-01-12 15:16:59 +01:00
Zach	091d31224d	fix: replace moby/moby namesgenerator with internal implementation (#21377 ) Replace the external moby/moby/pkg/namesgenerator dependency with an internal implementation using gofakeit/v7. The moby package has ~25k unique name combinations, and with its retry parameter only adds a random digit 0-9, giving ~250k possibilities. In parallel tests, this has led to collisions (flakes). The new internal API at coderd/util/namesgenerator eliminates the external dependency and offers functions with explicit uniqueness guarantees. This PR also consolidates fragmented name generation in a few places to use the new package. \| Old (moby/moby) \| New \| \|-------------------------------------\|------------------------\| \| namesgenerator.GetRandomName(0) \| NameWith("_") \| \| namesgenerator.GetRandomName(>0) \| NameDigitWith("_") \| \| testutil.GetRandomName(t) \| UniqueName() \| \| testutil.GetRandomNameHyphenated(t) \| UniqueNameWith("-") \| namesgenerator package API: - NameWith(delim): random name, not unique - NameDigitWith(delim): random name with 1-9 suffix, not unique - UniqueName(): guaranteed unique via atomic counter - UniqueNameWith(delim): unique with custom delimiter Names continue to be docker style `[adjective][delim][surname]`. Unique names are truncated to 32 characters (preserving the numeric suffix) to fit common name length limits in Coder. Related test flakes: https://github.com/coder/internal/issues/1212 https://github.com/coder/internal/issues/118 https://github.com/coder/internal/issues/1068	2026-01-09 15:40:26 -07:00
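The uniqueness guarantee in the commit above (atomic counter, truncation to 32 characters that preserves the numeric suffix) might look roughly like the following Go sketch. It is not the `coderd/util/namesgenerator` implementation: `truncateWithSuffix` and `uniqueName` are hypothetical names, the base name is passed in rather than generated with gofakeit, and appending the counter directly to the base (with no delimiter before it) is an assumption.

```go
package main

import (
	"fmt"
	"strconv"
	"sync/atomic"
)

// maxNameLen mirrors the 32-character limit mentioned in the commit message.
const maxNameLen = 32

// counter is a process-wide atomic counter; every call to uniqueName gets a
// distinct value, even when called from parallel tests.
var counter atomic.Int64

// truncateWithSuffix (hypothetical) appends n to base and, if the result
// would exceed maxNameLen, shortens the base so the numeric suffix is kept
// whole. Byte slicing is safe here only because names are assumed ASCII.
func truncateWithSuffix(base string, n int64) string {
	suffix := strconv.FormatInt(n, 10)
	room := maxNameLen - len(suffix)
	if room < 0 {
		room = 0
	}
	if len(base) > room {
		base = base[:room]
	}
	return base + suffix
}

// uniqueName (hypothetical) returns a name that is unique within this process.
func uniqueName(base string) string {
	return truncateWithSuffix(base, counter.Add(1))
}

func main() {
	fmt.Println(uniqueName("happy_turing"))
	fmt.Println(uniqueName("extraordinarily_verbose_adjective_surname"))
}
```

Truncating the base rather than the whole string matters: cutting the suffix would throw away exactly the part that makes the name unique.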
Steven Masley	60b3fd0783	chore!: send modules archive over the proto messages (#21398 ) # What this does Dynamic parameters caches the `./terraform/modules` directory for parameter usage. What this PR does is send over this archive to the provisioner when building workspaces. This allows terraform to skip downloading modules from their registries, a step that takes seconds. # Wire protocol The wire protocol reuses the same mechanism used to download the modules `provisioner -> coder`. It splits up large archives into multiple protobuf messages so larger archives can be sent under the message size limit. # 🚨 Behavior Change (Breaking Change) 🚨 Before this PR modules were downloaded on every workspace build. This means unpinned modules always fetched the latest version. After this PR modules are cached at template import time, and their versions are effectively pinned for all subsequent workspace builds.	2026-01-09 11:33:34 -06:00
Steven Masley	d2044c2ee9	chore: update protobuf to reuse file request (#21447 ) This is just the protobuf changes for the PR https://github.com/coder/coder/pull/21398 Moved `UploadFileRequest` from `provisionerd.proto` -> `provisioner.proto`. Renamed to `FileUpload` because it is now bi-directional. This is backwards compatible. I tested it to confirm the payloads are identical. Types were just renamed and moved around. ```golang func TestTypeUpgrade(t *testing.T) { t.Parallel() x := &proto2.UploadFileRequest{ Type: &proto2.UploadFileRequest_ChunkPiece{ ChunkPiece: &proto.ChunkPiece{ Data: []byte("Hello World!"), FullDataHash: []byte("Foobar"), PieceIndex: 42, }, }, } data, err := protobuf.Marshal(x) require.NoError(t, err) // Exactly the same output // EhgKDEhlbGxvIFdvcmxkIRIGRm9vYmFyGCo= on `main` // EhgKDEhlbGxvIFdvcmxkIRIGRm9vYmFyGCo= on this branch fmt.Println(base64.StdEncoding.EncodeToString(data)) } ``` # What this does This allows provisioner daemons to download files from `coderd`'s `files` table. This is used to send over cached module files and prevent the need of downloading these modules on each workspace build.	2026-01-09 11:23:32 -06:00
Steven Masley	89f4d60e7b	chore: remove experiment "terraform-directory-reuse" (#21397 ) Experiment is no longer required, the new method will be released without an experiment and without a toggle Main PR is: https://github.com/coder/coder/pull/21398	2026-01-09 11:13:16 -06:00

3133 Commits