coder

mirror of https://github.com/coder/coder.git synced 2026-06-07 15:08:20 +00:00

Author	SHA1	Message	Date
Marcin Tojek	04b0253e8a	feat: add Prometheus metrics for license warnings and errors (#21749 ) Fixes: coder/internal#767 Adds two new Prometheus metrics for license health monitoring: - `coderd_license_warnings` - count of active license warnings - `coderd_license_errors` - count of active license errors Metrics endpoint after startup of a deployment with license enabled: ``` ... # HELP coderd_license_errors The number of active license errors. # TYPE coderd_license_errors gauge coderd_license_errors 0 ... # HELP coderd_license_warnings The number of active license warnings. # TYPE coderd_license_warnings gauge coderd_license_warnings 0 ... ```	2026-01-29 13:50:15 +01:00
Cian Johnston	c2c225052a	chore(enterprise/coderd): ensure TestManagedAgentLimit differentiates between tasks and workspaces (#21731 ) My previous change to this test did not create another workspace using the template containing `coder_ai_task` resources, meaning that this test was not actually testing the right thing. This PR addresses this oversight.	2026-01-28 16:30:56 +00:00
Zach	2204731ddb	feat: implement boundary usage tracker and telemetry collection (#21716 ) Implements telemetry for boundary usage tracking across all Coder replicas and reports them via telemetry. Changes: - Implement Tracker with Track(), FlushToDB(), and StartFlushLoop() methods - Add telemetry integration via collectBoundaryUsageSummary() - Use telemetry lock to ensure only one replica collects per period The tracker accumulates unique workspaces, unique users, and request counts (allowed/denied) in memory, then flushes to the database periodically. During telemetry collection, stats are aggregated across all replicas and reset for the next period.	2026-01-27 19:11:40 -07:00
Steven Masley	799b190dee	fix: do not enforce managed agent limit for non-task workspaces (#21689 ) Only task workspaces have the checks in wsbuilder for violating the managed agent caps in the license. Stopped tasks that are resumed with a regular workspace start still count as usage.	2026-01-27 19:01:17 -06:00
Cian Johnston	7b44976618	fix(coderd/provisionerdserver): correct managed agent tracking (#21696 ) Relates to https://github.com/coder/internal/issues/1282 Updates tracking of managed agents to be predicated on the presence of a related `task_id` instead of the presence of a `coder_ai_task` resource.	2026-01-27 12:14:52 +00:00
Jake Howell	6f15b178a4	feat: extend premium license for `aigovernance` (#21499 ) Closes [#1227](https://github.com/coder/internal/issues/1227) Added support for license addons, starting with AI Governance, to enable dynamic feature grouping without requiring license reissuance. ### What changed? - Introduced a new `Addon` type to represent groupings of features that can be added to licenses - Created the first addon `AddonAIGovernance` which includes AI Bridge and Boundary features - Added validation for addon dependencies to ensure required features are present - Added new features: `FeatureBoundary` and `FeatureAIGovernanceUserLimit` - Updated license entitlement logic to handle addons and their features - Added helper methods to check if features belong to addons - Updated tests to verify addon functionality ### Why make this change? This change introduces a more flexible licensing model that allows features to be grouped into addons that can be added to licenses without requiring reissuance when new features are added to an addon. This is particularly useful for specialized feature sets like AI Governance, where related features can be bundled together and sold as a separate SKU. The addon approach allows for better organization of features and more granular control over entitlements.	2026-01-27 22:33:53 +11:00
Kacper Sawicki	78bc5861e0	feat(enterprise/coderd): add soft warning for AI Bridge GA transition (#21675 ) ## Summary AI Bridge is moving to General Availability in v2.30 and will require the AI Governance Add-On license in future versions. This adds a soft warning for deployments using AI Bridge via Premium/Enterprise FeatureSet without an explicit AI Bridge add-on license. Relates to: https://github.com/coder/internal/issues/1226 ## Changes - Track whether AI Bridge was explicitly granted via license Features (add-on) vs inherited from FeatureSet - Show soft warning when AI Bridge is enabled and entitled via FeatureSet but not via explicit add-on - Changed AI Bridge enablement from hardcoded `true` to check `CODER_AIBRIDGE_ENABLED` deployment config ## Behavior Change AI Bridge is now only marked as "enabled" in entitlements when `CODER_AIBRIDGE_ENABLED=true` is set in the deployment config. Previously, it was always enabled for Premium/Enterprise licenses regardless of the config setting. This change ensures that users who do not use AI Bridge will not see the soft warning about the upcoming license requirement. ## Warning Message > AI Bridge is now Generally Available in v2.30. In a future Coder version, your deployment will require the AI Governance Add-On to continue using this feature. Please reach out to your account team or sales@coder.com to learn more. ## Behavior \| Condition \| Warning Shown \| \|-----------\|---------------\| \| AI Bridge disabled \| ❌ No \| \| AI Bridge enabled + explicit add-on license \| ❌ No \| \| AI Bridge enabled + Premium/Enterprise FeatureSet (no add-on) \| ✅ Yes \| ## Screenshots ### 1. No license <img width="1708" height="577" alt="image" src="https://github.com/user-attachments/assets/cbdbfd4d-55de-4d70-8abf-2665f458e96f" /> ### 2. No license + CODER_AIBRIDGE_ENABLED=true <img width="1716" height="513" alt="image" src="https://github.com/user-attachments/assets/344aae76-7703-485f-b568-1f13a1efa48f" /> ### 3. 
Premium license + CODER_AIBRIDGE_ENABLED=false <img width="1687" height="389" alt="image" src="https://github.com/user-attachments/assets/c2be12b0-1c0f-438d-a293-f9ec9fe6a736" /> ### 4. Premium license + CODER_AIBRIDGE_ENABLED=true <img width="1707" height="525" alt="image" src="https://github.com/user-attachments/assets/1a4640e1-e656-4f9b-bed0-9390cb5d6a84" /> ## Notes - TODO comments added to mark code that should be removed when AI Bridge enforcement is added - Feature continues to work - this is just a transitional warning (soft enforcement)	2026-01-26 10:46:45 +01:00
Kacper Sawicki	b82693d4cc	feat(codersdk): revert "remove AI Bridge entitlement from Premium license" (#21653 ) Reverts coder/coder#21540	2026-01-23 15:58:12 +00:00
Susana Ferreira	f5858c8a18	fix: unregister metrics on reconciler stop to prevent panic on restart (#21647 ) ## Description Fixes a panic that occurs when the prebuilds feature is toggled by adding/removing a license. The `StoreReconciler` was not unregistering the `reconciliationDuration` histogram, causing a "duplicate metrics collector registration attempted" panic when a new reconciler was created. ## Changes * Unregister the `reconciliationDuration` histogram in `Stop()` alongside the existing metrics collector * Change log level when stopping the reconciler with a cause, since "entitlements change" is not an error condition * Add `TestReconcilerLifecycle` to verify the reconciler can be stopped and recreated with the same prometheus registry Related to internal slack thread: https://codercom.slack.com/archives/C07GRNNRW03/p1769116582171379	2026-01-23 14:45:27 +00:00
Kacper Sawicki	9843adb8c6	feat(codersdk): remove AI Bridge entitlement from Premium license (#21540 ) ## Summary AI Bridge is moving out of Premium as a separate add-on (GA on Feb 3). Closes https://github.com/coder/internal/issues/1226 ## Changes - Excludes `FeatureAIBridge` from `Enterprise()` and `FeatureSetPremium.Features()` - Adds soft warning for deployments with AI Bridge enabled but not entitled - Warning is displayed to Auditor/Owner roles in UI banner and CLI headers ## Warning Message When AI Bridge is enabled (`CODER_AIBRIDGE_ENABLED=true`) but the license doesn't include the entitlement: > AI Bridge has reached General Availability and your Coder deployment is not entitled to run this feature. Contact your account team (https://coder.com/contact) for information around getting a license with AI Bridge. ## Behavior - The feature remains usable in v2.30 (soft warning only) - Future versions may include hard enforcement	2026-01-23 13:48:27 +01:00
George K	d29a168785	fix(coderd/rbac): reinstate deployment-wide workspace.share permission for owner role (#21620 ) The removal of that permission from the role broke valid use cases (e.g. a site owner user creating a workspace owned by a system account and then trying to share it with another user). The bulk of the PR is made up of the rollbacks of the previously introduced test updates necessitated by the removal. Related to: https://github.com/coder/internal/issues/1285	2026-01-22 08:12:15 -08:00
Mathias Fredriksson	97e8a5b093	fix(coderd): allow agent auth during workspace shutdown (#21538 ) Agents were losing authentication during workspace shutdown, causing shutdown scripts to fail. The auth query required agents to belong to the latest build, but during shutdown a `stop` build becomes latest while the `start` build's agents are still running. Modified the auth query to allow `start` build agents to authenticate temporarily during `stop` execution. The query allows auth when: - Agent's `start` build job succeeded - Latest build is `stop` with `pending`/`running` job status - Builds are adjacent (`stop` is `build_number + 1`) - Template versions match Auth closes once `stop` completes. Renamed `GetWorkspaceAgentAndLatestBuildByAuthToken` to `GetAuthenticatedWorkspaceAgentAndBuildByAuthToken` since it returns the agent's build (not always latest) during shutdown. Closes coder/internal#1249 Fixes #19467	2026-01-21 13:18:43 +00:00
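The four conditions listed in the commit above can be restated as a small Go predicate. This is only an illustrative sketch: the real check lives in a SQL auth query, and the `build` type, its fields, and `agentMayAuthenticate` are hypothetical names invented here, not identifiers from the Coder codebase.

```go
package main

import "fmt"

// build is a hypothetical, simplified view of a workspace build.
// The real schema has many more columns.
type build struct {
	Number            int
	Transition        string // "start" or "stop"
	JobStatus         string // "pending", "running", "succeeded", ...
	TemplateVersionID string
}

// agentMayAuthenticate reports whether an agent that belongs to agentBuild
// may authenticate, given the workspace's latest build.
func agentMayAuthenticate(agentBuild, latest build) bool {
	// Normal case: the agent belongs to the latest build.
	if agentBuild.Number == latest.Number {
		return true
	}
	// Shutdown window: all four conditions from the commit message must hold.
	startSucceeded := agentBuild.Transition == "start" && agentBuild.JobStatus == "succeeded"
	stopInProgress := latest.Transition == "stop" &&
		(latest.JobStatus == "pending" || latest.JobStatus == "running")
	adjacent := latest.Number == agentBuild.Number+1
	sameVersion := latest.TemplateVersionID == agentBuild.TemplateVersionID
	return startSucceeded && stopInProgress && adjacent && sameVersion
}

func main() {
	start := build{Number: 7, Transition: "start", JobStatus: "succeeded", TemplateVersionID: "v1"}
	stop := build{Number: 8, Transition: "stop", JobStatus: "running", TemplateVersionID: "v1"}
	fmt.Println(agentMayAuthenticate(start, stop)) // stop still running: auth allowed

	stop.JobStatus = "succeeded"
	fmt.Println(agentMayAuthenticate(start, stop)) // stop completed: auth closed
}
```

Once the `stop` job leaves `pending`/`running`, the predicate returns false, which matches the statement that auth closes once `stop` completes.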
Susana Ferreira	6ef9670384	fix: limit concurrent database connections in prebuild reconciliation (#20908 ) ## Description This PR addresses database connection pool exhaustion during prebuilds reconciliation by introducing two changes: * `CanSkipReconciliation`: Filters out presets that don't need reconciliation before spawning goroutines. This ensures we only create goroutines for presets that will (_most likely_) perform database operations, avoiding unnecessary connection pool usage. * Dynamic `eg.SetLimit`: Limits concurrent goroutines based on the configured database connection pool size (`CODER_PG_CONN_MAX_OPEN / 2`). This replaces the previous hardcoded limit of 5, ensuring the reconciliation loop scales appropriately with the configured pool size while leaving capacity for other database operations. ## Changes * Add `CanSkipReconciliation()` method to `PresetSnapshot` that returns true for inactive presets with no running workspaces, no pending jobs, or expired prebuilds. * Add `maxDBConnections` parameter to `NewStoreReconciler` and compute `reconciliationConcurrency` as half the pool size (minimum 1). * Add `ReconciliationConcurrency()` getter method to `StoreReconciler`. * Add `eg.SetLimit(c.reconciliationConcurrency)` to bound concurrent reconciliation goroutines. * Add `PresetsTotal` and `PresetsReconciled` to `ReconcileStats` for observability. * Add `TestCanSkipReconciliation` unit tests. * Add `TestReconciliationConcurrency` unit tests. * Add benchmark tests for reconciliation performance. ## Benchmarks * `BenchmarkReconcileAll_NoOps`: Tests presets with no reconciliation actions. All presets are filtered by `CanSkipReconciliation`, resulting in no goroutines spawned and no database connections used. * `BenchmarkReconcileAll_ConnectionContention`: Tests presets where all require reconciliation actions. All presets spawn goroutines, but concurrency is limited by `eg.SetLimit(reconciliationConcurrency)`. 
* `BenchmarkReconcileAll_Mix`: Simulates a realistic scenario with a large subset of inactive presets (filtered by `CanSkipReconciliation`) and a smaller subset requiring reconciliation (limited by `eg.SetLimit`). Closes: https://github.com/coder/coder/issues/20606	2026-01-21 10:56:31 +00:00
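A minimal sketch of the concurrency rule described in the commit above: half the configured pool size, with a minimum of 1, used as an upper bound on concurrent work. The commit itself uses `eg.SetLimit` on an errgroup; to stay within the standard library this sketch substitutes a buffered-channel semaphore, and the function names `reconciliationConcurrency` and `runBounded` are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// reconciliationConcurrency returns half of the configured DB pool size,
// never less than 1, so other database users keep some capacity.
func reconciliationConcurrency(maxDBConnections int) int {
	n := maxDBConnections / 2
	if n < 1 {
		return 1
	}
	return n
}

// runBounded runs every task with at most limit tasks in flight at once
// and returns the peak concurrency that was observed.
func runBounded(limit int, tasks []func()) int64 {
	sem := make(chan struct{}, limit)
	var wg sync.WaitGroup
	var inFlight, peak atomic.Int64
	for _, task := range tasks {
		sem <- struct{}{} // blocks while limit tasks are already running
		wg.Add(1)
		go func(task func()) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot after the task ends
			cur := inFlight.Add(1)
			for { // record the highest in-flight count seen so far
				p := peak.Load()
				if cur <= p || peak.CompareAndSwap(p, cur) {
					break
				}
			}
			task()
			inFlight.Add(-1)
		}(task)
	}
	wg.Wait()
	return peak.Load()
}

func main() {
	limit := reconciliationConcurrency(10)
	tasks := make([]func(), 20)
	for i := range tasks {
		tasks[i] = func() {} // stand-in for one preset's reconciliation
	}
	peak := runBounded(limit, tasks)
	fmt.Println(limit, peak <= int64(limit))
}
```

Filtering presets before spawning goroutines, as `CanSkipReconciliation` does, would simply shorten the `tasks` slice before it reaches `runBounded`.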
George K	0712faef4f	feat(enterprise): implement organization "disable workspace sharing" option (#21376 ) Adds a per-organization setting to disable workspace sharing. When enabled, all existing workspace ACLs in the organization are cleared and the workspace ACL mutation API endpoints return `403 Forbidden`. This complements the existing site-wide `--disable-workspace-sharing` flag by providing more granular control at the organization level. Closes https://github.com/coder/internal/issues/1073 (part 2) --------- Co-authored-by: Steven Masley <Emyrk@users.noreply.github.com>	2026-01-14 09:47:50 -08:00
Susana Ferreira	000bc334c9	fix: reuse reconciliation lock transaction for read operations in prebuilds (#21408 ) ## Description Reuses the reconciliation lock transaction for read operations during prebuilds reconciliation, reducing unnecessary database connections. ## Changes * Use the lock transaction (`db`) for read operations and `c.store` for write operations: * `GetPrebuildsSettings`: now uses `db` * `SnapshotState`: now uses `db` * `MembershipReconciler`: continues to use `c.store` (performs write operations) * Add comments explaining the transaction model and when to use `db` vs `c.store` Related to: https://github.com/coder/coder/pull/20587	2026-01-13 15:04:51 +00:00
George K	cc2efe9e1f	feat(coderd/rbac): make organization-member a per-org system custom role (#21359 ) Migrated the built-in organization-member role to DB storage so it can be customized per org. Closes https://github.com/coder/internal/issues/1073 (part 1)	2026-01-12 18:19:19 -08:00
Zach	091d31224d	fix: replace moby/moby namesgenerator with internal implementation (#21377 ) Replace the external moby/moby/pkg/namesgenerator dependency with an internal implementation using gofakeit/v7. The moby package has ~25k unique name combinations, and with its retry parameter only adds a random digit 0-9, giving ~250k possibilities. In parallel tests, this has led to collisions (flakes). The new internal API at coderd/util/namesgenerator eliminates the external dependency and offers functions with explicit uniqueness guarantees. This PR also consolidates fragmented name generation in a few places to use the new package. \| Old (moby/moby) \| New \| \|-------------------------------------\|------------------------\| \| namesgenerator.GetRandomName(0) \| NameWith("_") \| \| namesgenerator.GetRandomName(>0) \| NameDigitWith("_") \| \| testutil.GetRandomName(t) \| UniqueName() \| \| testutil.GetRandomNameHyphenated(t) \| UniqueNameWith("-") \| namesgenerator package API: - NameWith(delim): random name, not unique - NameDigitWith(delim): random name with 1-9 suffix, not unique - UniqueName(): guaranteed unique via atomic counter - UniqueNameWith(delim): unique with custom delimiter Names continue to be docker style `[adjective][delim][surname]`. Unique names are truncated to 32 characters (preserving the numeric suffix) to fit common name length limits in Coder. Related test flakes: https://github.com/coder/internal/issues/1212 https://github.com/coder/internal/issues/118 https://github.com/coder/internal/issues/1068	2026-01-09 15:40:26 -07:00
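The "guaranteed unique via atomic counter" idea, together with truncation that preserves the numeric suffix, might look roughly like the sketch below. It is not the code in `coderd/util/namesgenerator`: the real package draws words from gofakeit, whereas this stand-in takes the adjective and surname as parameters, and the name `uniqueNameWith`, the constant `maxNameLen`, and the way the counter is appended are all assumptions.

```go
package main

import (
	"fmt"
	"strconv"
	"sync/atomic"
)

// maxNameLen mirrors the 32-character limit mentioned in the commit message.
const maxNameLen = 32

// counter is process-wide, so every call gets a distinct suffix.
var counter atomic.Int64

// uniqueNameWith builds "[adjective][delim][surname]" plus a counter suffix.
// When the result would be too long, the base is shortened and the numeric
// suffix is kept intact, so uniqueness survives truncation.
// Byte slicing assumes ASCII words.
func uniqueNameWith(adjective, surname, delim string) string {
	suffix := strconv.FormatInt(counter.Add(1), 10)
	base := adjective + delim + surname
	if over := len(base) + len(suffix) - maxNameLen; over > 0 {
		if over > len(base) {
			over = len(base)
		}
		base = base[:len(base)-over]
	}
	return base + suffix
}

func main() {
	fmt.Println(uniqueNameWith("happy", "turing", "_"))
	long := uniqueNameWith("extraordinarily", "chandrasekhar-subrahmanyan", "-")
	fmt.Println(long, len(long) <= maxNameLen)
}
```

Because the suffix comes from a monotonically increasing counter rather than a random digit, two parallel tests in the same process cannot receive the same name, which is the property the random 0-9 retry digit lacked.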
Spike Curtis	bddb808b25	chore: arrange imports in a standard way (#21452 ) Fixes all our Go file imports to match the preferred spec that we've _mostly_ been using. For example: ``` import ( "context" "time" "github.com/prometheus/client_golang/prometheus" "golang.org/x/xerrors" "gopkg.in/natefinch/lumberjack.v2" "cdr.dev/slog/v3" "github.com/coder/coder/v2/codersdk/agentsdk" "github.com/coder/serpent" ) ``` 3 groups: standard library, 3rd party libs, Coder libs. This PR makes the change across the codebase. The PR in the stack above modifies our formatting to maintain this state of affairs, and is a separate PR so it's possible to review that one in detail.	2026-01-08 15:24:11 +04:00
Spike Curtis	49b34a716a	fix: fix slog to always use array of Fields (#21426 ) Upgrades to slog v3 which includes a small, but backward incompatible API change to the acceptable call arguments when logging. This change allows us to verify via compile time type checking that arguments are correct and won't cause a panic, as was possible in slog v1, which this replaces (v2 was tagged but never used in coder/coder). It also updates dependencies that also use slog and were updated. I've left the `aibridge` dependency as a commit SHA, under the assumption that the team there (cc @pawbana @dannykopping ) will tag and update the dependency soon and on their own schedule. Other dependencies, I pushed new tags.	2026-01-08 10:29:41 +04:00
Sas Swart	9a0024c45f	chore: add tracing to prebuilds (#21443 ) The implementation for prebuilt workspaces is complex and conversations regarding edge cases and bugs frequently get bogged down by minutiae, because it's hard to reason about the behaviour of the system. To alleviate this, I've introduced otel tracing to the StoreReconciler (see attached). We can now directly observe the behaviour of the prebuilds system under load in order to inform our decisions. Traces are terminated at the boundary between prebuilds and workspace builder, because of prebuilt workspaces' "fire and forget" philosophy and to prevent span explosion. <img width="3024" height="1718" alt="image" src="https://github.com/user-attachments/assets/f9b207be-8f2c-475e-98a8-46ef70bda446" />	2026-01-07 11:04:40 +02:00
Danielle Maywood	467c8bbd6b	fix: prevent notification for dormant delete on dormant-removal (#21427 ) Ensure we do not send "Marked for deletion" notifications when disabling dormancy deletion	2026-01-06 16:26:28 +00:00
Danny Kopping	733b6b7db9	feat: add API to serve proxy certificate (#21391 ) Closes https://github.com/coder/internal/issues/1184	2025-12-29 18:00:06 +00:00
Danny Kopping	a173c38715	chore: remove experimental endpoints (#21390 ) This should've been removed when we cut the Beta release, but we missed it. Adding as a drive-by.	2025-12-29 16:17:46 +00:00
Steven Masley	35f1c44455	test: fix path separator on windows for unit test (#21382 ) Test TestWorkspaceTemplateParamsChange writes a file to disk. Closes https://github.com/coder/internal/issues/1213	2025-12-23 15:13:16 +00:00
Steven Masley	61d7d2983f	fix: remove state information from apply (#21373 ) Delete builds were not deleting resources as the tf state being sent in the apply request was empty. State removed from apply request and read from the session instead.	2025-12-22 16:18:53 +00:00
Cian Johnston	f1b930b190	chore(enterprise): increase coverage of TestWorkspaceBuild (#21304 ) Relates to #20925 This PR expands the test coverage of `enterprise/coderd/TestWorkspaceBuild` to also exercise the `postWorkspaceBuilds` handler. Previously, it only exercised the `createWorkspace` handler.	2025-12-17 17:28:38 +00:00
Spike Curtis	bd753d9cb9	fix: mark users seen when activating on login (#21305 ) fixes #21303 Update user last_seen_at when we mark them active on login. This prevents a narrow race where they can be re-marked dormant and fail to log in.	2025-12-17 16:49:40 +04:00
Steven Masley	3194bcfc9e	chore: distinct operations for provisioner's 'parse', 'init', 'plan', 'apply', 'graph' (#21064 ) Provisioner steps broken into smaller granular actions. Changes: - `ExtractArchive` moved to `init` request (was in `configure`) - Writing `tfstate` moved to `plan` (was in `configure`) - Moved most plan/apply outputs to `GraphComplete`	2025-12-15 11:26:41 -06:00
George K	103967ed02	feat: add sharing info to /workspaces endpoint (#21049 ) closes: https://github.com/coder/internal/issues/858 Similar to https://github.com/coder/coder/pull/19375, this one uses system permissions for fetching actual user and group data. Modifies the `workspaces_expanded` view to fetch the required data; this way it's made available to all code paths that make use of it. Also fixes a bug in a test helper function that can result in `null` being saved to the DB for `user_acl` or `group_acl` and break tests; a defensive check constraint that prevents this is worth a PR, e.g: `ALTER TABLE workspaces ADD CONSTRAINT group_acl_is_object CHECK (jsonb_typeof(group_acl) = 'object');` Also adds missing `OwnerName` in `ConvertWorkspaceRows`.	2025-12-15 08:42:08 -08:00
Kacper Sawicki	6f86f67754	feat(coderd): add overload protection with rate limiting and concurrency control (#21161 ) ## Summary This adds configurable overload protection to the AI Bridge daemon to prevent the server from being overwhelmed during periods of high load. Partially addresses coder/internal#1153 (rate limits and concurrency control; circuit breakers are deferred to a follow-up). ## New Configuration Options \| Option \| Environment Variable \| Description \| Default \| \|--------\|---------------------\|-------------\|---------\| \| `--aibridge-max-concurrency` \| `CODER_AIBRIDGE_MAX_CONCURRENCY` \| Maximum number of concurrent AI Bridge requests. Set to 0 to disable (unlimited). \| `0` \| \| `--aibridge-rate-limit` \| `CODER_AIBRIDGE_RATE_LIMIT` \| Maximum number of AI Bridge requests per second. Set to 0 to disable rate limiting. \| `0` \| ## Behavior When limits are exceeded: - Concurrency limit: Returns HTTP `503 Service Unavailable` with message "AI Bridge is currently at capacity. Please try again later." - Rate limit: Returns HTTP `429 Too Many Requests` with `Retry-After` header. Both protections are optional and disabled by default (0 values). ## Implementation The overload protection is implemented as reusable middleware in `coderd/httpmw/ratelimit.go`: 1. `RateLimitByAuthToken`: Per-user rate limiting that uses `APITokenFromRequest` to extract the authentication token, with fallback to `X-Api-Key` header for AI provider compatibility (e.g., Anthropic). Falls back to IP-based rate limiting if no token is present. Includes `Retry-After` header for backpressure signaling. 2. `ConcurrencyLimit`: Uses an atomic counter to track in-flight requests and reject when at capacity. The middleware is applied in `enterprise/coderd/aibridge.go` via `r.Group` in the following order: 1. Concurrency check (faster rejection for load shedding) 2. Rate limit check Note: Rate limiting currently applies to all AI Bridge requests, including pass-through requests. 
Ideally only actual interceptions should count, but this would require changes in the aibridge library. ## Testing Added comprehensive tests for: - Rate limiting by auth token (Bearer token, X-Api-Key, no token fallback to IP) - Different tokens not rate limited against each other - Disabled when limit is zero - Retry-After header is set on 429 responses - Concurrency limiting (allows within limit, rejects over limit, disabled when zero)	2025-12-11 16:38:54 +01:00
Dean Sheather	b199eb1c38	fix: allow stops and deletes after breaching AI limit (#21186 ) Fixes a bug a customer encountered once they breached their limit. Adds a test.	2025-12-09 11:05:12 +00:00
blinkagent[bot]	b4be5bcfed	docs: fix swagger tags for license endpoints (#21101 ) ## Summary Change `@Tags` from `Organizations` to `Enterprise` for `POST /licenses` and `POST /licenses/refresh-entitlements` to match the `GET` and `DELETE` license endpoints which are already tagged as `Enterprise`. ## Problem The license API endpoints were inconsistently tagged in the swagger annotations: - `GET /licenses` → `Enterprise` ✓ - `DELETE /licenses/{id}` → `Enterprise` ✓ - `POST /licenses` → `Organizations` ✗ - `POST /licenses/refresh-entitlements` → `Organizations` ✗ This caused the POST endpoints to be documented in the [Organizations API docs](https://coder.com/docs/reference/api/organizations) instead of the [Enterprise API docs](https://coder.com/docs/reference/api/enterprise) where the other license endpoints live. ## Fix Simply updated the `@Tags` annotation from `Organizations` to `Enterprise` for both POST endpoints. This was an oversight from the original swagger docs addition in #5625 (January 2023). Co-authored-by: blink-so[bot] <211532188+blink-so[bot]@users.noreply.github.com>	2025-12-05 15:27:22 +00:00
Marcin Tojek	9c7135a61d	chore: add license check for prebuilds (#20947 ) Related: https://github.com/coder/coder/pull/20864	2025-11-26 15:00:07 +01:00
Danielle Maywood	e7dbbcde87	fix: do not notify marked for deletion for deleted workspaces (#20937 ) Closes https://github.com/coder/coder/issues/20913 I ran the test without the fix, verified the test caught the issue, then applied the fix, and confirmed the issue no longer happens. --- 🤖 PR was initially written by Claude Opus 4.5 Thinking using Claude Code and then reviewed by a human 👩	2025-11-26 09:23:16 +00:00
Danielle Maywood	c12303f0b2	fix: allow agents to be created on dormant workspaces (#20909 ) Closes https://github.com/coder/coder/issues/20711 We now allow agents to be created on dormant workspaces. I've run the test with and without the change. I've confirmed that - without the fix - it triggers the "rbac: unauthorized" error.	2025-11-25 06:24:33 +00:00
Jake Howell	ca560d36ce	fix: remove inflight interceptions from aibridge returned values (#20852 ) Addresses [`aibridge#54`](https://github.com/coder/aibridge/issues/54) When querying against the values in the database for `/api/experimental/aibridge/interceptions` we found strange behaviour wherein there were interceptions that lacked prompts and various other fields we want. Generally this was a result of the data not actually existing for these values (as they were inflight). The simple solution to this was to hide them if they didn't exist. This PR addresses that. --------- Co-authored-by: Danny Kopping <danny@coder.com>	2025-11-25 10:23:39 +11:00
Atif Ali	636408906f	chore(docs): standardize "AIBridge" to "AI Bridge" in documentation (#20831 )	2025-11-24 18:09:04 +05:00
Marcin Tojek	d004710a74	feat: add prebuild invalidation via last_invalidated_at timestamp (#20582 ) Updates #17917	2025-11-20 17:12:25 +01:00
blinkagent[bot]	48b8e22502	fix: add Windows stub for CacheTFProviders (#20840 ) Fixes https://github.com/coder/internal/issues/1119 ## Description The `CacheTFProviders` function in `testutil/terraform_cache.go` was only available on Linux and macOS due to the `//go:build linux \|\| darwin` build tag. This caused a compile error on Windows when `enterprise/coderd/workspaces_test.go` tried to call it: ``` enterprise\coderd\workspaces_test.go:3403:28: undefined: testutil.CacheTFProviders ``` ## Changes 1. Added `testutil/terraform_cache_windows.go` with a Windows-specific stub implementation that returns an empty string 2. Updated `downloadProviders` helper in `enterprise/coderd/workspaces_test.go` to handle empty paths gracefully ## Behavior - On Linux/macOS: Terraform providers are cached as before - On Windows: Provider caching is skipped, tests download providers normally during `terraform init` ## Testing This should fix the Windows nightly gauntlet failure. The test will still run on Windows, just without provider caching optimization. Co-authored-by: blink-so[bot] <211532188+blink-so[bot]@users.noreply.github.com>	2025-11-20 07:52:07 +00:00
Steven Masley	04727c06e8	chore: add experiment toggle for terraform workspace caching (#20559 ) Experiments passed to provisioners to determine behavior. This adds `--experiments` flag to provisioner daemons. Prior to this, provisioners had no method to turn on/off experiments.	2025-11-12 14:26:15 -06:00
Susana Ferreira	ca94588bd5	fix: send prebuild job notification after job build db commit (#20693 ) ## Problem Fix race condition in prebuilds reconciler. Previously, a job notification event was sent to a Go channel before the provisioning database transaction completed. The notification is consumed by a separate goroutine that publishes to PostgreSQL's LISTEN/NOTIFY, using a separate database connection. This creates a potential race: if a provisioner daemon receives the notification and queries for the job before the provisioning transaction commits, it won't find the job in the database. This manifested as a flaky test failure in `TestReinitializeAgent`, where provisioners would occasionally miss newly created jobs. The test uses a 25-second timeout context, while the acquirer's backup polling mechanism checks for jobs every 30 seconds. This made the race condition visible in tests, though in production the backup polling would eventually pick up the job. The solution presented here guarantees that a job notification is only sent after the provisioning database transaction commits. ## Changes * The `provision()` and `provisionDelete()` functions now return the provisioner job instead of sending notifications internally. * A new `publishProvisionerJob()` helper centralizes the notification logic and is called after each transaction completes. Closes: https://github.com/coder/internal/issues/963	2025-11-12 10:36:39 +00:00
Danny Kopping	04f809f2d0	chore!: allow coder MCP tools to not be injected (#20713 ) Currently, when AI Bridge is enabled AND the `oauth2` and `mcp-server-http` experiments are enabled we inject Coder's MCP tools into all intercepted AI Bridge requests. This PR introduces a config to control this behaviour. NOTE: this is a backwards-incompatible change; previously these tools would be injected automatically, now this setting will need to be explicitly enabled. --------- Signed-off-by: Danny Kopping <danny@coder.com>	2025-11-12 11:23:01 +02:00
Kacper Sawicki	f543a87b78	chore: cache terraform providers for workspaces terraform tests (#20603 ) Fixes flaky `TestWorkspaceTagsTerraform` and `TestWorkspaceTemplateParamsChange` tests that were failing with `connection reset by peer` errors when downloading the coder/coder provider. This applies the same caching solution which was done in https://github.com/coder/coder/pull/17373 1. Extracts provider caching logic into `testutil/terraform_cache.go` 2. Updates TestProvision to use the shared caching helpers 3. Updates enterprise workspace tests to use the shared caching helpers The cache is persisted at `~/.cache/coderv2-test/` and automatically cached between CI runs via existing GitHub Actions cache setup. Closes https://github.com/coder/internal/issues/607	2025-11-12 08:43:22 +00:00
Paweł Banaszewski	991831b1dd	chore: add API key ID to interceptions (#20513 ) Adds APIKeyID to interceptions. Needed for tracking API key usage with bridge. fixes https://github.com/coder/coder/issues/20001	2025-11-10 13:46:41 +01:00
Dean Sheather	b3f651d62f	chore: change managed agent limit (#20540 )	2025-11-05 00:46:27 +11:00
Danny Kopping	ff532d9bf3	chore: handle deprecated `aibridge` experimental routes (#20565 ) In v2.28 we're [removing the aibridge experiment](https://github.com/coder/coder/pull/20544). We need to handle `/api/experimental/aibridge/*` until Beta (next release). Signed-off-by: Danny Kopping <danny@coder.com>	2025-10-29 19:11:34 -06:00
Susana Ferreira	7e8fcb4b0f	perf: optimize prebuilds membership reconciliation to check orgs not presets (#20493 ) ## Description The membership reconciliation ensures the prebuilds system user is a member of all organizations with prebuilds configured. To support prebuilds quota management, each organization must have a prebuilds group that the system user belongs to. ## Problem Previously, membership reconciliation iterated over all presets to check and update membership status. This meant database queries `GetGroupByOrgAndName` and `InsertGroupMember` were executed for each preset. Since presets are unique combinations of `(organization, template, template version, preset)`, this resulted in several redundant checks for the same organization. In dogfood, `InsertGroupMember` was called thousands of times per day, even though memberships were already configured ([internal Grafana dashboard link](https://grafana.dev.coder.com/goto/46MZ1UgDg?orgId=1)) <img width="5382" height="1788" alt="Screenshot 2025-10-28 at 16 01 36" src="https://github.com/user-attachments/assets/757b7253-106f-4f72-8586-8e2ede9f18db" /> ## Solution This PR introduces `GetOrganizationsWithPrebuildStatus`, a single query that returns: * All unique organizations with prebuilds configured * Whether the prebuilds user is a member of each organization * Whether the prebuilds group exists in each organization * Whether the prebuilds user is in the prebuilds group The membership reconciliation logic now: * Fetches status for all organizations in one query * Only performs inserts for organizations missing required memberships or groups * Safely handles concurrent operations via unique constraint violations * This reduces database load from `O(presets)` to `O(organizations)` per reconciliation loop, with a single read query when everything is configured. 
## Changes * Add `GetOrganizationsWithPrebuildStatus` SQL query * Update `membership.ReconcileAll` to use organization-based reconciliation instead of preset-based * Update tests to reflect new behavior Related to internal thread: https://codercom.slack.com/archives/C07GRNNRW03/p1760535570381369	2025-10-29 14:24:29 +00:00
Danny Kopping	b20fd6f2c1	chore: graduate aibridge API out of experimental (#20523 ) <!-- If you have used AI to produce some or all of this PR, please ensure you have read our [AI Contribution guidelines](https://coder.com/docs/about/contributing/AI_CONTRIBUTING) before submitting. -->	2025-10-29 07:18:54 -06:00
Danny Kopping	2294c55bd9	chore: graduate `aibridged*` packages out of experimental (#20522 ) <!-- If you have used AI to produce some or all of this PR, please ensure you have read our [AI Contribution guidelines](https://coder.com/docs/about/contributing/AI_CONTRIBUTING) before submitting. -->	2025-10-29 07:00:24 -06:00
Susana Ferreira	aad1b401c1	feat: add prebuilds reconciliation duration metric (#20535 ) ## Description Adds `coderd_prebuilds_reconciliation_duration_seconds` histogram metric to track the duration of each prebuilds reconciliation cycle. This metric helps operators monitor reconciliation performance and identify potential bottlenecks. ## Changes - Added `ReconcileStats` struct to capture reconciliation cycle statistics - Updated `ReconcileAll()` to return stats including elapsed time - Added histogram metric `coderd_prebuilds_reconciliation_duration_seconds`	2025-10-29 12:52:30 +00:00


715 Commits