Commit Graph

715 Commits

Author SHA1 Message Date
Marcin Tojek 04b0253e8a feat: add Prometheus metrics for license warnings and errors (#21749)
Fixes: coder/internal#767

Adds two new Prometheus metrics for license health monitoring:

- `coderd_license_warnings` - count of active license warnings
- `coderd_license_errors` - count of active license errors

Metrics endpoint after startup of a deployment with license enabled:

```
...
# HELP coderd_license_errors The number of active license errors.
# TYPE coderd_license_errors gauge
coderd_license_errors 0
...
# HELP coderd_license_warnings The number of active license warnings.
# TYPE coderd_license_warnings gauge
coderd_license_warnings 0
...
```
2026-01-29 13:50:15 +01:00
Cian Johnston c2c225052a chore(enterprise/coderd): ensure TestManagedAgentLimit differentiates between tasks and workspaces (#21731)
My previous change to this test did not create another **workspace**
using the template containing `coder_ai_task` resources, meaning that
this test was not actually testing the right thing. This PR addresses
this oversight.
2026-01-28 16:30:56 +00:00
Zach 2204731ddb feat: implement boundary usage tracker and telemetry collection (#21716)
Implements telemetry for boundary usage tracking across all Coder
replicas and reports them via telemetry.

Changes:
- Implement Tracker with Track(), FlushToDB(), and StartFlushLoop() methods
- Add telemetry integration via collectBoundaryUsageSummary()
- Use telemetry lock to ensure only one replica collects per period

The tracker accumulates unique workspaces, unique users, and request
counts (allowed/denied) in memory, then flushes to the database
periodically. During telemetry collection, stats are aggregated across
all replicas and reset for the next period.
2026-01-27 19:11:40 -07:00
Steven Masley 799b190dee fix: do not enforce managed agent limit for non-task workspaces (#21689)
Only task workspaces have the checks in wsbuilder for violating the
managed agent caps in the license.

Stopped tasks that are resumed with a regular workspace start **still
count as usage**.
2026-01-27 19:01:17 -06:00
Cian Johnston 7b44976618 fix(coderd/provisionerdserver): correct managed agent tracking (#21696)
Relates to https://github.com/coder/internal/issues/1282

Updates tracking of managed agents to be predicated instead on the
presence of a related `task_id` instead of the presence of a
`coder_ai_task` resource.
2026-01-27 12:14:52 +00:00
Jake Howell 6f15b178a4 feat: extend premium license for aigovernance (#21499)
Closes [#1227](https://github.com/coder/internal/issues/1227)

Added support for license addons, starting with AI Governance, to enable
dynamic feature grouping without requiring license reissuance.

### What changed?

- Introduced a new `Addon` type to represent groupings of features that
can be added to licenses
- Created the first addon `AddonAIGovernance` which includes AI Bridge
and Boundary features
- Added validation for addon dependencies to ensure required features
are present
- Added new features: `FeatureBoundary` and
`FeatureAIGovernanceUserLimit`
- Updated license entitlement logic to handle addons and their features
- Added helper methods to check if features belong to addons
- Updated tests to verify addon functionality

### Why make this change?

This change introduces a more flexible licensing model that allows
features to be grouped into addons that can be added to licenses without
requiring reissuance when new features are added to an addon. This is
particularly useful for specialized feature sets like AI Governance,
where related features can be bundled together and sold as a separate
SKU. The addon approach allows for better organization of features and
more granular control over entitlements.
2026-01-27 22:33:53 +11:00
Kacper Sawicki 78bc5861e0 feat(enterprise/coderd): add soft warning for AI Bridge GA transition (#21675)
## Summary

AI Bridge is moving to General Availability in v2.30 and will require
the AI Governance Add-On license in future versions. This adds a soft
warning for deployments using AI Bridge via Premium/Enterprise
FeatureSet without an explicit AI Bridge add-on license.

Relates to: https://github.com/coder/internal/issues/1226

## Changes

- Track whether AI Bridge was explicitly granted via license Features
(add-on) vs inherited from FeatureSet
- Show soft warning when AI Bridge is enabled and entitled via
FeatureSet but not via explicit add-on
- Changed AI Bridge enablement from hardcoded `true` to check
`CODER_AIBRIDGE_ENABLED` deployment config

## Behavior Change

AI Bridge is now only marked as "enabled" in entitlements when
`CODER_AIBRIDGE_ENABLED=true` is set in the deployment config.
Previously, it was always enabled for Premium/Enterprise licenses
regardless of the config setting.

This change ensures that users who do not use AI Bridge will not see the
soft warning about the upcoming license requirement.

## Warning Message

> AI Bridge is now Generally Available in v2.30. In a future Coder
version, your deployment will require the AI Governance Add-On to
continue using this feature. Please reach out to your account team or
sales@coder.com to learn more.

## Behavior

| Condition | Warning Shown |
|-----------|---------------|
| AI Bridge disabled |  No |
| AI Bridge enabled + explicit add-on license |  No |
| AI Bridge enabled + Premium/Enterprise FeatureSet (no add-on) |  Yes
|

## Screenshots

### 1. No license
<img width="1708" height="577" alt="image"
src="https://github.com/user-attachments/assets/cbdbfd4d-55de-4d70-8abf-2665f458e96f"
/>

### 2. No license + CODER_AIBRIDGE_ENABLED=true
<img width="1716" height="513" alt="image"
src="https://github.com/user-attachments/assets/344aae76-7703-485f-b568-1f13a1efa48f"
/>

### 3. Premium license + CODER_AIBRIDGE_ENABLED=false
<img width="1687" height="389" alt="image"
src="https://github.com/user-attachments/assets/c2be12b0-1c0f-438d-a293-f9ec9fe6a736"
/>

### 4. Premium license + CODER_AIBRIDGE_ENABLED=true
<img width="1707" height="525" alt="image"
src="https://github.com/user-attachments/assets/1a4640e1-e656-4f9b-bed0-9390cb5d6a84"
/>

## Notes

- TODO comments added to mark code that should be removed when AI Bridge
enforcement is added
- Feature continues to work - this is just a transitional warning (soft
enforcement)
2026-01-26 10:46:45 +01:00
Kacper Sawicki b82693d4cc feat(codersdk): revert "remove AI Bridge entitlement from Premium license" (#21653)
Reverts coder/coder#21540
2026-01-23 15:58:12 +00:00
Susana Ferreira f5858c8a18 fix: unregister metrics on reconciler stop to prevent panic on restart (#21647)
## Description

Fixes a panic that occurs when the prebuilds feature is toggled by
adding/removing a license. The `StoreReconciler` was not unregistering
the `reconciliationDuration` histogram, causing a "duplicate metrics
collector registration attempted" panic when a new reconciler was
created.

## Changes

* Unregister the `reconciliationDuration` histogram in `Stop()`
alongside the existing metrics collector
* Change log level when stopping the reconciler with a cause, since
"entitlements change" is not an error condition
* Add `TestReconcilerLifecycle` to verify the reconciler can be stopped
and recreated with the same prometheus registry

Related to internal slack thread:
https://codercom.slack.com/archives/C07GRNNRW03/p1769116582171379
2026-01-23 14:45:27 +00:00
Kacper Sawicki 9843adb8c6 feat(codersdk): remove AI Bridge entitlement from Premium license (#21540)
## Summary

AI Bridge is moving out of Premium as a separate add-on (GA in Feb 3).

Closes https://github.com/coder/internal/issues/1226

## Changes

- Excludes `FeatureAIBridge` from `Enterprise()` and
`FeatureSetPremium.Features()`
- Adds soft warning for deployments with AI Bridge enabled but not
entitled
- Warning is displayed to Auditor/Owner roles in UI banner and CLI
headers

## Warning Message

When AI Bridge is enabled (`CODER_AIBRIDGE_ENABLED=true`) but the
license doesn't include the entitlement:

> AI Bridge has reached General Availability and your Coder deployment
is not entitled to run this feature. Contact your account team
(https://coder.com/contact) for information around getting a license
with AI Bridge.

## Behavior

- The feature remains usable in v2.30 (soft warning only)
- Future versions may include hard enforcement
2026-01-23 13:48:27 +01:00
George K d29a168785 fix(coderd/rbac): reinstate deployment-wide workspace.share permission for owner role (#21620)
The removal of that permission from the role broke valid use cases (e.g.
a site owner user creating a workspace owned by a system account and
then trying to share it with another user).

The bulk of the PR is made up of the rollbacks of the previously
introduced test updates necessitated by the removal.

Related to: https://github.com/coder/internal/issues/1285
2026-01-22 08:12:15 -08:00
Mathias Fredriksson 97e8a5b093 fix(coderd): allow agent auth during workspace shutdown (#21538)
Agents were losing authentication during workspace shutdown, causing
shutdown scripts to fail. The auth query required agents to belong to
the latest build, but during shutdown a `stop` build becomes latest while
the `start` build's agents are still running.

Modified the auth query to allow `start` build agents to authenticate
temporarily during `stop` execution. The query allows auth when:

- Agent's `start` build job succeeded
- Latest build is `stop` with `pending`/`running` job status
- Builds are adjacent (`stop` is `build_number + 1`)
- Template versions match

Auth closes once `stop` completes.

Renamed `GetWorkspaceAgentAndLatestBuildByAuthToken` to
`GetAuthenticatedWorkspaceAgentAndBuildByAuthToken` since it returns the
agent's build (not always latest) during shutdown.

Closes coder/internal#1249
Fixes #19467
2026-01-21 13:18:43 +00:00
Susana Ferreira 6ef9670384 fix: limit concurrent database connections in prebuild reconciliation (#20908)
## Description

This PR addresses database connection pool exhaustion during prebuilds
reconciliation by introducing two changes:
* `CanSkipReconciliation`: Filters out presets that don't need
reconciliation before spawning goroutines. This ensures we only create
goroutines for presets that will (_most likely_) perform database
operations, avoiding unnecessary connection pool usage.
* Dynamic `eg.SetLimit`: Limits concurrent goroutines based on the
configured database connection pool size (`CODER_PG_CONN_MAX_OPEN / 2`).
This replaces the previous hardcoded limit of 5, ensuring the
reconciliation loop scales appropriately with the configured pool size
while leaving capacity for other database operations.

## Changes

* Add `CanSkipReconciliation()` method to `PresetSnapshot` that returns
true for inactive presets with no running workspaces, no pending jobs,
or expired prebuilds.
* Add `maxDBConnections` parameter to `NewStoreReconciler` and compute
`reconciliationConcurrency` as half the pool size (minimum 1).
* Add `ReconciliationConcurrency()` getter method to `StoreReconciler`.
* Add `eg.SetLimit(c.reconciliationConcurrency)` to bound concurrent
reconciliation goroutines.
* Add `PresetsTotal` and `PresetsReconciled` to `ReconcileStats` for
observability.
* Add `TestCanSkipReconciliation` unit tests.
* Add `TestReconciliationConcurrency` unit tests.
* Add benchmark tests for reconciliation performance.

## Benchmarks

* `BenchmarkReconcileAll_NoOps`: Tests presets with no reconciliation
actions. All presets are filtered by `CanSkipReconciliation`, resulting
in no goroutines spawned and no database connections used.
* `BenchmarkReconcileAll_ConnectionContention`: Tests presets where all
require reconciliation actions. All presets spawn goroutines, but
concurrency is limited by `eg.SetLimit(reconciliationConcurrency)`.
* `BenchmarkReconcileAll_Mix`: Simulates a realistic scenario with a
large subset of inactive presets (filtered by `CanSkipReconciliation`)
and a smaller subset requiring reconciliation (limited by
`eg.SetLimit`).

Closes: https://github.com/coder/coder/issues/20606
2026-01-21 10:56:31 +00:00
George K 0712faef4f feat(enterprise): implement organization "disable workspace sharing" option (#21376)
Adds a per-organization setting to disable workspace sharing. When enabled,
all existing workspace ACLs in the organization are cleared and the workspace
ACL mutation API endpoints return `403 Forbidden`.

This complements the existing site-wide `--disable-workspace-sharing` flag by
providing more granular control at the organization level.

Closes https://github.com/coder/internal/issues/1073 (part 2)

---------

Co-authored-by: Steven Masley <Emyrk@users.noreply.github.com>
2026-01-14 09:47:50 -08:00
Susana Ferreira 000bc334c9 fix: reuse reconciliation lock transaction for read operations in prebuilds (#21408)
## Description

Reuses the reconciliation lock transaction for read operations during
prebuilds reconciliation, reducing unnecessary database connections.

## Changes

* Use the lock transaction (`db`) for read operations and `c.store` for
write operations:
  * `GetPrebuildsSettings`: now uses `db`
  * `SnapshotState`: now uses `db`
* `MembershipReconciler`: continues to use `c.store` (performs write
operations)
* Add comments explaining the transaction model and when to use `db` vs
`c.store`

Related to: https://github.com/coder/coder/pull/20587
2026-01-13 15:04:51 +00:00
George K cc2efe9e1f feat(coderd/rbac): make organization-member a per-org system custom role (#21359)
Migrated the built-in organization-member role to DB storage so it can be customized per org.

Closes https://github.com/coder/internal/issues/1073 (part 1)
2026-01-12 18:19:19 -08:00
Zach 091d31224d fix: replace moby/moby namesgenerator with internal implementation (#21377)
Replace the external moby/moby/pkg/namesgenerator dependency with an
internal implementation using gofakeit/v7. The moby package has ~25k
unique name combinations, and with its retry parameter only adds a
random digit 0-9, giving ~250k possibilities. In parallel tests, this
has led to collisions (flakes).

The new internal API at coderd/util/namesgenerator eliminates the
external dependnecy and offers functions with explicit uniqueness
guarantees. This PR also consolidates fragmented name generation in a
few places to use the new package.

| Old (moby/moby)                     | New                    |
|-------------------------------------|------------------------|
| namesgenerator.GetRandomName(0)     | NameWith("_")          |
| namesgenerator.GetRandomName(>0)    | NameDigitWith("_")     |
| testutil.GetRandomName(t)           | UniqueName()           |
| testutil.GetRandomNameHyphenated(t) | UniqueNameWith("-")    |

namesgenerator package API:
- NameWith(delim): random name, not unique
- NameDigitWith(delim): random name with 1-9 suffix, not unique
- UniqueName(): guaranteed unique via atomic counter
- UniqueNameWith(delim): unique with custom delimiter

Names continue to be docker style `[adjective][delim][surname]`. Unique
names are truncated to 32 characters (preserving the numeric suffix) to
fit common name length limits in Coder.

Related test flakes:
https://github.com/coder/internal/issues/1212
https://github.com/coder/internal/issues/118
https://github.com/coder/internal/issues/1068
2026-01-09 15:40:26 -07:00
Spike Curtis bddb808b25 chore: arrange imports in a standard way (#21452)
Fixes all our Go file imports to match the preferred spec that we've _mostly_ been using. For example:

```
import (
	"context"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"golang.org/x/xerrors"
	"gopkg.in/natefinch/lumberjack.v2"

	"cdr.dev/slog/v3"
	"github.com/coder/coder/v2/codersdk/agentsdk"
	"github.com/coder/serpent"
)
```

3 groups: standard library, 3rd partly libs, Coder libs.

This PR makes the change across the codebase. The PR in the stack above modifies our formatting to maintain this state of affairs, and is a separate PR so it's possible to review that one in detail.
2026-01-08 15:24:11 +04:00
Spike Curtis 49b34a716a fix: fix slog to always use array of Fields (#21426)
Upgrades to slog v3 which includes a small, but backward incompatible API change to the acceptible call arguments when logging. This change allows us to verify via compile time type checking that arguments are correct and won't cause a panic, as was possible in slog v1, which this replaces (v2 was tagged but never used in coder/coder).

It also updates dependencies that also use slog and were updated.

I've left the `aibridge` dependency as a commit SHA, under the assumption that the team there (cc @pawbana @dannykopping ) will tag and update the dependency soon and on their own schedule.

Other dependencies, I pushed new tags.
2026-01-08 10:29:41 +04:00
Sas Swart 9a0024c45f chore: add tracing to prebuilds (#21443)
The implementation for prebuilt workspaces is complex and conversations
regarding edge cases and bugs frequently get bogged down by minutiae,
because it's hard to reason about the behaviour of the system.

To alleviate this, I've introduced otel tracing to the StoreReconciler
(see attached). We can now directly observe the behaviour of the
prebuilds system under load in order to inform our decisions.

Traces are terminated at the boundary between prebuilds and workspace
builder, because of prebuilt workspaces' "fire and forget" philosophy
and to prevent span explosion.

<img width="3024" height="1718" alt="image"
src="https://github.com/user-attachments/assets/f9b207be-8f2c-475e-98a8-46ef70bda446"
/>
2026-01-07 11:04:40 +02:00
Danielle Maywood 467c8bbd6b fix: prevent notification for dormant delete on dormant-removal (#21427)
Ensure we do not send "Marked for deletion" notifications when disabling dormancy deletion
2026-01-06 16:26:28 +00:00
Danny Kopping 733b6b7db9 feat: add API to serve proxy certificate (#21391)
Closes https://github.com/coder/internal/issues/1184
2025-12-29 18:00:06 +00:00
Danny Kopping a173c38715 chore: remove experimental endpoints (#21390)
This should've been removed when we cut the Beta release, but we missed it. Adding as a drive-by.
2025-12-29 16:17:46 +00:00
Steven Masley 35f1c44455 test: fix path seperator on windows for unit test (#21382)
Test TestWorkspaceTemplateParamsChange writes a file to disk

Closes https://github.com/coder/internal/issues/1213
2025-12-23 15:13:16 +00:00
Steven Masley 61d7d2983f fix: remove state information from apply (#21373)
Delete builds were not deleting resources as the tf state being sent in the apply request was empty.
State removed from apply request and read from the session instead.
2025-12-22 16:18:53 +00:00
Cian Johnston f1b930b190 chore(enterprise): increase coverage of TestWorkspaceBuild (#21304)
Relates to #20925 
This PR expands the test coverage of `enterprise/coderd/TestWorkspaceBuild` to also exercise the `postWorkspaceBuilds` handler. Previously, it only exercised the `createWorkspace` handler.
2025-12-17 17:28:38 +00:00
Spike Curtis bd753d9cb9 fix: mark users seen when activating on login (#21305)
fixes #21303

Update user last_seen_at when we mark them active on login. This prevents a narrow race where they can be re-marked dormant and fail to log in.
2025-12-17 16:49:40 +04:00
Steven Masley 3194bcfc9e chore: distinct operations for provisioner's 'parse', 'init', 'plan', 'apply', 'graph' (#21064)
Provisioner steps broken into smaller granular actions.
Changes:
- `ExtractArchive` moved to `init` request (was in `configure`)
- Writing `tfstate` moved to `plan` (was in `configure`)
- Moved most plan/apply outputs to `GraphComplete`
2025-12-15 11:26:41 -06:00
George K 103967ed02 feat: add sharing info to /workspaces endpoint (#21049)
closes: https://github.com/coder/internal/issues/858

Similar to https://github.com/coder/coder/pull/19375, this one uses
system permissions for fetching actual user and group data.

Modifies the `workspaces_expanded` view to fetch the required data; this way it's made available to all code paths that make use of it.  

Also fixes a bug in a test helper function that can result in `null` being saved to the DB for `user_acl` or `group_acl` and break tests; a defensive check constraint that prevents this is worth a PR, e.g:

`ALTER TABLE workspaces
   ADD CONSTRAINT group_acl_is_object CHECK (jsonb_typeof(group_acl) = 'object');`

Also adds missing  `OwnerName` in `ConvertWorkspaceRows`.
2025-12-15 08:42:08 -08:00
Kacper Sawicki 6f86f67754 feat(coderd): add overload protection with rate limiting and concurrency control (#21161)
## Summary

This adds configurable overload protection to the AI Bridge daemon to
prevent the server from being overwhelmed during periods of high load.

Partially addresses coder/internal#1153 (rate limits and concurrency
control; circuit breakers are deferred to a follow-up).

## New Configuration Options

| Option | Environment Variable | Description | Default |
|--------|---------------------|-------------|---------|
| `--aibridge-max-concurrency` | `CODER_AIBRIDGE_MAX_CONCURRENCY` |
Maximum number of concurrent AI Bridge requests. Set to 0 to disable
(unlimited). | `0` |
| `--aibridge-rate-limit` | `CODER_AIBRIDGE_RATE_LIMIT` | Maximum number
of AI Bridge requests per second. Set to 0 to disable rate limiting. |
`0` |

## Behavior

When limits are exceeded:
- **Concurrency limit**: Returns HTTP `503 Service Unavailable` with
message "AI Bridge is currently at capacity. Please try again later."
- **Rate limit**: Returns HTTP `429 Too Many Requests` with
`Retry-After` header.

Both protections are optional and disabled by default (0 values).

## Implementation

The overload protection is implemented as reusable middleware in
`coderd/httpmw/ratelimit.go`:

1. **`RateLimitByAuthToken`**: Per-user rate limiting that uses
`APITokenFromRequest` to extract the authentication token, with fallback
to `X-Api-Key` header for AI provider compatibility (e.g., Anthropic).
Falls back to IP-based rate limiting if no token is present. Includes
`Retry-After` header for backpressure signaling.
2. **`ConcurrencyLimit`**: Uses an atomic counter to track in-flight
requests and reject when at capacity.

The middleware is applied in `enterprise/coderd/aibridge.go` via
`r.Group` in the following order:
1. Concurrency check (faster rejection for load shedding)
2. Rate limit check

**Note**: Rate limiting currently applies to all AI Bridge requests,
including pass-through requests. Ideally only actual interceptions
should count, but this would require changes in the aibridge library.

## Testing

Added comprehensive tests for:
- Rate limiting by auth token (Bearer token, X-Api-Key, no token
fallback to IP)
- Different tokens not rate limited against each other
- Disabled when limit is zero
- Retry-After header is set on 429 responses
- Concurrency limiting (allows within limit, rejects over limit,
disabled when zero)
2025-12-11 16:38:54 +01:00
Dean Sheather b199eb1c38 fix: allow stops and deletes after breaching AI limit (#21186)
Fixes a bug a customer encountered once they breached their limit. Adds
a test.
2025-12-09 11:05:12 +00:00
blinkagent[bot] b4be5bcfed docs: fix swagger tags for license endpoints (#21101)
## Summary

Change `@Tags` from `Organizations` to `Enterprise` for `POST /licenses`
and `POST /licenses/refresh-entitlements` to match the `GET` and
`DELETE` license endpoints which are already tagged as `Enterprise`.

## Problem

The license API endpoints were inconsistently tagged in the swagger
annotations:
- `GET /licenses` → `Enterprise` ✓
- `DELETE /licenses/{id}` → `Enterprise` ✓
- `POST /licenses` → `Organizations` ✗
- `POST /licenses/refresh-entitlements` → `Organizations` ✗

This caused the POST endpoints to be documented in the [Organizations
API docs](https://coder.com/docs/reference/api/organizations) instead of
the [Enterprise API
docs](https://coder.com/docs/reference/api/enterprise) where the other
license endpoints live.

## Fix

Simply updated the `@Tags` annotation from `Organizations` to
`Enterprise` for both POST endpoints.

This was an oversight from the original swagger docs addition in #5625
(January 2023).

Co-authored-by: blink-so[bot] <211532188+blink-so[bot]@users.noreply.github.com>
2025-12-05 15:27:22 +00:00
Marcin Tojek 9c7135a61d chore: add license check for prebuilds (#20947)
Related: https://github.com/coder/coder/pull/20864
2025-11-26 15:00:07 +01:00
Danielle Maywood e7dbbcde87 fix: do not notify marked for deletion for deleted workspaces (#20937)
Closes https://github.com/coder/coder/issues/20913

I've ran the test without the fix, verified the test caught the issue,
then applied the fix, and confirmed the issue no longer happens.

---

🤖 PR was initially written by Claude Opus 4.5 Thinking using Claude Code
and then review by a human 👩
2025-11-26 09:23:16 +00:00
Danielle Maywood c12303f0b2 fix: allow agents to be created on dormant workspaces (#20909)
Closes https://github.com/coder/coder/issues/20711

We now allow agents to be created on dormant workspaces.

I've ran the test with and without the change. I've confirmed that -
without the fix - it triggers the "rbac: unauthorized" error.
2025-11-25 06:24:33 +00:00
Jake Howell ca560d36ce fix: remove inflight interceptions from aibridge returned values (#20852)
Addresses [`aibridge#54`](https://github.com/coder/aibridge/issues/54)

When querying against the values in the database for
`/api/experimental/aibridge/interceptions` we found strange behaviour
wherein there was interceptions that lacked prompting and other various
fields we want. Generally this was as a result of the data not actually
existing for these values (as they were inflight).

The simple solution to this was to hide them if they didn't exist. This
PR addresses that.

---------

Co-authored-by: Danny Kopping <danny@coder.com>
2025-11-25 10:23:39 +11:00
Atif Ali 636408906f chore(docs): standardize "AIBridge" to "AI Bridge" in documentation (#20831) 2025-11-24 18:09:04 +05:00
Marcin Tojek d004710a74 feat: add prebuild invalidation via last_invalidated_at timestamp (#20582)
Updates #17917
2025-11-20 17:12:25 +01:00
blinkagent[bot] 48b8e22502 fix: add Windows stub for CacheTFProviders (#20840)
Fixes https://github.com/coder/internal/issues/1119

## Description

The `CacheTFProviders` function in `testutil/terraform_cache.go` was
only available on Linux and macOS due to the `//go:build linux ||
darwin` build tag. This caused a compile error on Windows when
`enterprise/coderd/workspaces_test.go` tried to call it:

```
enterprise\coderd\workspaces_test.go:3403:28: undefined: testutil.CacheTFProviders
```

## Changes

1. Added `testutil/terraform_cache_windows.go` with a Windows-specific
stub implementation that returns an empty string
2. Updated `downloadProviders` helper in
`enterprise/coderd/workspaces_test.go` to handle empty paths gracefully

## Behavior

- On Linux/macOS: Terraform providers are cached as before
- On Windows: Provider caching is skipped, tests download providers
normally during `terraform init`

## Testing

This should fix the Windows nightly gauntlet failure. The test will
still run on Windows, just without provider caching optimization.

Co-authored-by: blink-so[bot] <211532188+blink-so[bot]@users.noreply.github.com>
2025-11-20 07:52:07 +00:00
Steven Masley 04727c06e8 chore: add experiment toggle for terraform workspace caching (#20559)
Experiments passed to provisioners to determine behavior. This adds
`--experiments` flag to provisioner daemons. Prior to this, provisioners
had no method to turn on/off experiments.
2025-11-12 14:26:15 -06:00
Susana Ferreira ca94588bd5 fix: send prebuild job notification after job build db commit (#20693)
## Problem

Fix race condition in prebuilds reconciler. Previously, a job
notification event was sent to a Go channel before the provisioning
database transaction completed. The notification is consumed by a
separate goroutine that publishes to PostgreSQL's LISTEN/NOTIFY, using a
separate database connection. This creates a potential race: if a
provisioner daemon receives the notification and queries for the job
before the provisioning transaction commits, it won't find the job in
the database.

This manifested as a flaky test failure in `TestReinitializeAgent`,
where provisioners would occasionally miss newly created jobs. The test
uses a 25-second timeout context, while the acquirer's backup polling
mechanism checks for jobs every 30 seconds. This made the race condition
visible in tests, though in production the backup polling would
eventually pick up the job. The solution presented here guarantees that
a job notification is only sent after the provisioning database
transaction commits.

## Changes

* The `provision()` and `provisionDelete()` functions now return the
provisioner job instead of sending notifications internally.
* A new `publishProvisionerJob()` helper centralizes the notification
logic and is called after each transaction completes.

Closes: https://github.com/coder/internal/issues/963
2025-11-12 10:36:39 +00:00
Danny Kopping 04f809f2d0 chore!: allow coder MCP tools to not be injected (#20713)
Currently, when AI Bridge is enabled AND the `oauth2` and
`mcp-server-http` experiments are enabled we inject Coder's MCP tools
into all intercepted AI Bridge requests.

This PR introduces a config to control this behaviour.

**NOTE:** this is a backwards-incompatible change; previously these
tools would be injected automatically, now this setting will need to be
explicitly enabled.

---------

Signed-off-by: Danny Kopping <danny@coder.com>
2025-11-12 11:23:01 +02:00
Kacper Sawicki f543a87b78 chore: cache terraform providers for workspaces terraform tests (#20603)
Fixes flaky `TestWorkspaceTagsTerraform` and
`TestWorkspaceTemplateParamsChange` tests that were failing with
`connection reset by peer` errors when downloading the coder/coder
provider.

This applies the same caching solution which was done in
https://github.com/coder/coder/pull/17373

1. Extracts provider caching logic into `testutil/terraform_cache.go`
2. Updates TestProvision to use the shared caching helpers
3. Updates enterprise workspace tests to use the shared caching helpers

The cache is persisted at `~/.cache/coderv2-test/` and automatically
cached between CI runs via existing GitHub Actions cache setup.

Closes https://github.com/coder/internal/issues/607
2025-11-12 08:43:22 +00:00
Paweł Banaszewski 991831b1dd chore: add API key ID to interceptions (#20513)
Adds APIKeyID to interceptions.
Needed for tracking API key usage with bridge.
fixes https://github.com/coder/coder/issues/20001
2025-11-10 13:46:41 +01:00
Dean Sheather b3f651d62f chore: change managed agent limit (#20540) 2025-11-05 00:46:27 +11:00
Danny Kopping ff532d9bf3 chore: handle deprecated aibridge experimental routes (#20565)
In v2.28 we're [removing the aibridge
experiment](https://github.com/coder/coder/pull/20544).

We need to handle `/api/experimental/aibridge/*` until Beta (next
release).

Signed-off-by: Danny Kopping <danny@coder.com>
2025-10-29 19:11:34 -06:00
Susana Ferreira 7e8fcb4b0f perf: optimize prebuilds membership reconciliation to check orgs not presets (#20493)
## Description

The membership reconciliation ensures the prebuilds system user is a
member of all organizations with prebuilds configured. To support
prebuilds quota management, each organization must have a prebuilds
group that the system user belongs to.

## Problem

Previously, membership reconciliation iterated over all presets to check
and update membership status. This meant database queries
`GetGroupByOrgAndName` and `InsertGroupMember` were executed for each
preset. Since presets are unique combinations of `(organization,
template, template version, preset)`, this resulted in several redundant
checks for the same organization.

In dogfood, `InsertGroupMember` was called thousands of times per day,
even though memberships were already configured ([internal Grafana
dashboard link](https://grafana.dev.coder.com/goto/46MZ1UgDg?orgId=1))

<img width="5382" height="1788" alt="Screenshot 2025-10-28 at 16 01 36"
src="https://github.com/user-attachments/assets/757b7253-106f-4f72-8586-8e2ede9f18db"
/>

## Solution

This PR introduces `GetOrganizationsWithPrebuildStatus`, a single query
that returns:
* All unique organizations with prebuilds configured
* Whether the prebuilds user is a member of each organization
* Whether the prebuilds group exists in each organization
* Whether the prebuilds user is in the prebuilds group

The membership reconciliation logic now:
* Fetches status for all organizations in one query
* Only performs inserts for organizations missing required memberships
or groups
* Safely handles concurrent operations via unique constraint violations
* This reduces database load from `O(presets)` to `O(organizations)` per
reconciliation loop, with a single read query when everything is
configured.

## Changes

* Add `GetOrganizationsWithPrebuildStatus` SQL query
* Update `membership.ReconcileAll` to use organization-based
reconciliation instead of preset-based
* Update tests to reflect new behavior

Related to internal thread:
https://codercom.slack.com/archives/C07GRNNRW03/p1760535570381369
2025-10-29 14:24:29 +00:00
Danny Kopping b20fd6f2c1 chore: graduate aibridge API out of experimental (#20523)
<!--

If you have used AI to produce some or all of this PR, please ensure you have read our [AI Contribution guidelines](https://coder.com/docs/about/contributing/AI_CONTRIBUTING) before submitting.

-->
2025-10-29 07:18:54 -06:00
Danny Kopping 2294c55bd9 chore: graduate aibridged* packages out of experimental (#20522)
<!--

If you have used AI to produce some or all of this PR, please ensure you have read our [AI Contribution guidelines](https://coder.com/docs/about/contributing/AI_CONTRIBUTING) before submitting.

-->
2025-10-29 07:00:24 -06:00
Susana Ferreira aad1b401c1 feat: add prebuilds reconciliation duration metric (#20535)
## Description

Adds `coderd_prebuilds_reconciliation_duration_seconds` histogram metric
to track the duration of each prebuilds reconciliation cycle.

This metric helps operators monitor reconciliation performance and
identify potential bottlenecks.

## Changes

- Added `ReconcileStats` struct to capture reconciliation cycle
statistics
- Updated `ReconcileAll()` to return stats including elapsed time
- Added histogram metric
`coderd_prebuilds_reconciliation_duration_seconds`
2025-10-29 12:52:30 +00:00