Commit Graph

3133 Commits

Author SHA1 Message Date
Marcin Tojek 04b0253e8a feat: add Prometheus metrics for license warnings and errors (#21749)
Fixes: coder/internal#767

Adds two new Prometheus metrics for license health monitoring:

- `coderd_license_warnings` - count of active license warnings
- `coderd_license_errors` - count of active license errors

Metrics endpoint after startup of a deployment with license enabled:

```
...
# HELP coderd_license_errors The number of active license errors.
# TYPE coderd_license_errors gauge
coderd_license_errors 0
...
# HELP coderd_license_warnings The number of active license warnings.
# TYPE coderd_license_warnings gauge
coderd_license_warnings 0
...
```
2026-01-29 13:50:15 +01:00
Steven Masley dfbd541cee chore: move List util out of db2sdk to avoid circular imports (#21733) 2026-01-28 13:07:53 -06:00
Steven Masley e13f2a9869 chore: remove extra stop_modules from provisionerd proto (#21706)
Was a duplicate of start_modules

Closes https://github.com/coder/coder/issues/21206
2026-01-28 09:25:47 -06:00
Spike Curtis 7090a1e205 chore: renumber duplicate migration 000411 (#21720)
Fixes recent duplicate DB migration in #21607
2026-01-28 08:01:58 +04:00
Spike Curtis f358a6db11 chore: convert tailnet tables to UNLOGGED for improved write performance (#21607)
This migration converts all tailnet coordination tables to UNLOGGED:
- `tailnet_coordinators`
- `tailnet_peers`
- `tailnet_tunnels`

UNLOGGED tables skip Write-Ahead Log (WAL) writes, significantly
improving performance for high-frequency updates like coordinator
heartbeats and peer state changes.

The trade-off is that UNLOGGED tables are truncated on crash recovery
and are not replicated to standby servers. This is acceptable for these
tables because the data is ephemeral:
1. Coordinators re-register on startup
2. Peers re-establish connections on reconnect
3. Tunnels are re-created based on current peer state

**Migration notes:**
- Child tables must be converted before the parent table because LOGGED
child tables cannot reference UNLOGGED parent tables (but the reverse is
allowed)
- The down migration reverses the order: parent first, then children

Fixes https://github.com/coder/coder/issues/21333
2026-01-28 07:12:32 +04:00
Zach 2204731ddb feat: implement boundary usage tracker and telemetry collection (#21716)
Implements telemetry for boundary usage tracking across all Coder
replicas and reports them via telemetry.

Changes:
- Implement Tracker with Track(), FlushToDB(), and StartFlushLoop() methods
- Add telemetry integration via collectBoundaryUsageSummary()
- Use telemetry lock to ensure only one replica collects per period

The tracker accumulates unique workspaces, unique users, and request
counts (allowed/denied) in memory, then flushes to the database
periodically. During telemetry collection, stats are aggregated across
all replicas and reset for the next period.
2026-01-27 19:11:40 -07:00
Steven Masley 799b190dee fix: do not enforce managed agent limit for non-task workspaces (#21689)
Only task workspaces have the checks in wsbuilder for violating the
managed agent caps in the license.

Stopped tasks that are resumed with a regular workspace start **still
count as usage**.
2026-01-27 19:01:17 -06:00
Zach 7dfa33b410 feat: add boundary usage tracking database schema and tracker skeleton (#21670)
feat: add boundary usage telemetry database schema and RBAC

Adds the foundation for tracking boundary usage telemetry across Coder
replicas. This includes:

  - Database schema: `boundary_usage_stats` table with per-replica stats
    (unique workspaces, unique users, allowed/denied request counts)
  - Database queries: upsert stats, get aggregated summary, reset stats,
    delete by replica ID
  - RBAC: `boundary_usage` resource type with read/update/delete actions,
    accessible only via system `BoundaryUsageTracker` subject (not regular
    user roles)
  - Tracker skeleton + docs: stub implementation in `coderd/boundaryusage/`

The tracker accumulates stats in memory and periodically flushes to the
database. Stats are aggregated across replicas for telemetry reporting,
then reset when a new reporting period begins. The tracker implementation
and plumbing will be done in a subsequent commit/PR.

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-27 13:29:21 -07:00
George K c352a51b22 fix(coderd): authorize workspace start/stop/delete by transition action (#21691)
Use transition-specific actions when authorizing workspace build
parameter inserts in the database layer so start/stop/delete do not
require workspace.update.

Related to: https://github.com/coder/internal/issues/1299
2026-01-27 09:08:12 -08:00
Cian Johnston 7b44976618 fix(coderd/provisionerdserver): correct managed agent tracking (#21696)
Relates to https://github.com/coder/internal/issues/1282

Updates tracking of managed agents to be predicated instead on the
presence of a related `task_id` instead of the presence of a
`coder_ai_task` resource.
2026-01-27 12:14:52 +00:00
Mathias Fredriksson 25d7f27cdb feat(coderd): add task log snapshot storage endpoint (#21644)
This change adds a POST /workspaceagents/me/tasks/{task}/log-snapshot
endpoint for agents to upload task conversation history during
workspace shutdown. This allows users to view task logs even when the
workspace is stopped.

The endpoint accepts agentapi format payloads (typically last 10
messages, max 64KB), wraps them in a format envelope, and upserts to the
task_snapshots table. Uses agent token auth and validates the task
belongs to the agent's workspace.

Closes coder/internal#1253
2026-01-27 11:09:24 +02:00
Danny Kopping 7123518baa feat: conditionally send aibridge actor headers (#21643)
Also passes along the authenticated username as actor metadata.

Closes https://github.com/coder/aibridge/issues/135
Depends on https://github.com/coder/aibridge/pull/142

**Replace aibridge tag with merge commit once
https://github.com/coder/aibridge/pull/142 lands.**

---------

Signed-off-by: Danny Kopping <danny@coder.com>
2026-01-26 15:08:17 +00:00
Cian Johnston 612aae2523 chore: replace httpapi.Heartbeat with httpapi.HeartbeatClose (#21676)
Relates to https://github.com/coder/coder/pull/21676

* Replaces all existing usages of `httpapi.Heartbeat` with `httpapi.HeartbeatClose`
* Removes `httpapi.HeartbeatClose`
2026-01-26 12:11:40 +00:00
Spike Curtis f47f89d997 chore: remove unused tailnet v1 tables and queries (#21646)
Removes the legacy tailnet v1 API tables (`tailnet_clients`, `tailnet_agents`, `tailnet_client_subscriptions`) and their associated queries, triggers, and functions. These were superseded by the v2 tables (`tailnet_peers`, `tailnet_tunnels`) in migration 000168, and the v1 API code was removed in commit d6154c4310, but the database artifacts were never cleaned up.

**Changes:**
- New migration `000410_remove_tailnet_v1_tables` to drop the unused tables
- Removed 11 unused queries from `tailnet.sql`
- Removed associated manual wrapper methods in `dbauthz` and `dbmetrics`
- ~930 lines deleted across 11 files
2026-01-26 14:27:17 +04:00
Danielle Maywood 409360c62d fix(coderd): ensure inbox WebSocket is closed when client disconnects (#21652)
Relates to https://github.com/coder/coder/issues/19715

This is similar to https://github.com/coder/coder/pull/19711

This endpoint works by doing the following:
- Subscribing to the database's with pubsub
- Accepts a WebSocket upgrade
- Starts a `httpapi.Heartbeat`
- Creates a json encoder
- **Infinitely loops waiting for notification until request context
cancelled**

The critical issue here is that `httpapi.Heartbeat` silently fails when
the client has disconnected. This means we never cancel the request
context, leaving the WebSocket alive until we receive a notification
from the database and fail to write that down the pipe.

By replacing usage of `httpapi.Heartbeat` with `httpapi.HeartbeatClose`,
we cancel the context _when the heartbeat fails to write_ due to the
client disconnecting. This allows us to cleanup without waiting for a
notification to come through the pubsub channel.
2026-01-26 09:24:45 +00:00
Cian Johnston fa7baebdd8 fix(coderd): handle rbac.NotAuthorizedError when deleting template (#21645)
Relates to
https://github.com/coder/aibridge/pull/143/changes#r2720659638

We previously had been returning the following when attempting to delete
failed due to lack of permissions.

```
500 Internal error deleting template: unauthorized: rbac: forbidden
```

This PR updates the handler to return our usual 403 forbidden response.
2026-01-23 12:02:46 +00:00
Callum Styan e195856c43 perf: reduce pg_notify call volume by batching together agent metadata updates (#21330)
---------

Signed-off-by: Callum Styan <callumstyan@gmail.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-22 22:47:49 -08:00
Zach 6c49938fca feat: add template version ID to re-emitted boundary logs (#21636)
Adds template_version_id to re-emitted boundary audit logs to allow
filtering and analysis by specific template versions iin addition to the
existing template_id field. Since boundary policies are defined in the
template, the template version is critical to figuring out which policy
was responsible for boundaries decision in a workspace.

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-22 15:06:02 -07:00
George K d29a168785 fix(coderd/rbac): reinstate deployment-wide workspace.share permission for owner role (#21620)
The removal of that permission from the role broke valid use cases (e.g.
a site owner user creating a workspace owned by a system account and
then trying to share it with another user).

The bulk of the PR is made up of the rollbacks of the previously
introduced test updates necessitated by the removal.

Related to: https://github.com/coder/internal/issues/1285
2026-01-22 08:12:15 -08:00
Zach 6d8e6d4830 feat: include template ID in re-emitted boundary logs (#21618)
Boundary policies are currently defined at the template level, so
including the template ID in re-emitted logs by the control plane
allows policy creators to filter and observe boundary activity for
specific templates. This makes it easier to verify that policies are
working as expected and to debug issues with specific template
configurations.
2026-01-22 08:37:16 -07:00
Mathias Fredriksson 4c7844ad3d feat(coderd): bump workspace deadline on AI agent activity (#21584)
AI agents report status via patchWorkspaceAgentAppStatus, but this wasn't
extending workspace deadlines. This prevented proper task auto-pause behavior,
causing tasks to pause mid-execution when there were no human connections.

Now we call ActivityBumpWorkspace when agents report status, using the same
logic as SSH/IDE connections. We bump when transitioning to or from the working
state.

Closes coder/internal#1251
2026-01-22 13:52:32 +02:00
Susana Ferreira 47b3846bca feat: use coder specific header for aibridge authentication from AI proxy (#21590)
## Description

Introduces a new `X-Coder-Token` header for authenticating requests from
AI Proxy to AI Bridge. Previously, the proxy overwrote the
`Authorization` header with the Coder token, which prevented the
original authentication headers from flowing through to upstream
providers.

With this change, AI Proxy sets the Coder token in a separate header,
preserving the original `Authorization` and `X-Api-Key` headers. AI
Bridge uses this header for authentication and removes it before
forwarding requests to upstream providers. For requests that don't come
through AI Proxy, AI Bridge continues to use `Authorization` and
`X-Api-Key` for authentication.

## Changes

* Add `HeaderCoderAuth` constant and update `ExtractAuthToken` to check
headers in the following order: `X-Coder-Token` > `Authorization` >
`X-Api-Key`
* Update AI Proxy to set `X-Coder-Token` instead of overwriting
`Authorization`
* Remove `X-Coder-Token` in AI Bridge before forwarding to upstream
providers
* Add tests for header handling and token extraction priority

Related to: https://github.com/coder/internal/issues/1235
2026-01-21 19:06:19 +00:00
Mathias Fredriksson 97e8a5b093 fix(coderd): allow agent auth during workspace shutdown (#21538)
Agents were losing authentication during workspace shutdown, causing
shutdown scripts to fail. The auth query required agents to belong to
the latest build, but during shutdown a `stop` build becomes latest while
the `start` build's agents are still running.

Modified the auth query to allow `start` build agents to authenticate
temporarily during `stop` execution. The query allows auth when:

- Agent's `start` build job succeeded
- Latest build is `stop` with `pending`/`running` job status
- Builds are adjacent (`stop` is `build_number + 1`)
- Template versions match

Auth closes once `stop` completes.

Renamed `GetWorkspaceAgentAndLatestBuildByAuthToken` to
`GetAuthenticatedWorkspaceAgentAndBuildByAuthToken` since it returns the
agent's build (not always latest) during shutdown.

Closes coder/internal#1249
Fixes #19467
2026-01-21 13:18:43 +00:00
Danny Kopping a14a22eb54 feat: support custom bedrock base url (#21582)
Closes https://github.com/coder/aibridge/issues/126
Depends on https://github.com/coder/aibridge/pull/131

---------

Signed-off-by: Danny Kopping <danny@coder.com>
2026-01-21 12:48:56 +00:00
Susana Ferreira 6ef9670384 fix: limit concurrent database connections in prebuild reconciliation (#20908)
## Description

This PR addresses database connection pool exhaustion during prebuilds
reconciliation by introducing two changes:
* `CanSkipReconciliation`: Filters out presets that don't need
reconciliation before spawning goroutines. This ensures we only create
goroutines for presets that will (_most likely_) perform database
operations, avoiding unnecessary connection pool usage.
* Dynamic `eg.SetLimit`: Limits concurrent goroutines based on the
configured database connection pool size (`CODER_PG_CONN_MAX_OPEN / 2`).
This replaces the previous hardcoded limit of 5, ensuring the
reconciliation loop scales appropriately with the configured pool size
while leaving capacity for other database operations.

## Changes

* Add `CanSkipReconciliation()` method to `PresetSnapshot` that returns
true for inactive presets with no running workspaces, no pending jobs,
or expired prebuilds.
* Add `maxDBConnections` parameter to `NewStoreReconciler` and compute
`reconciliationConcurrency` as half the pool size (minimum 1).
* Add `ReconciliationConcurrency()` getter method to `StoreReconciler`.
* Add `eg.SetLimit(c.reconciliationConcurrency)` to bound concurrent
reconciliation goroutines.
* Add `PresetsTotal` and `PresetsReconciled` to `ReconcileStats` for
observability.
* Add `TestCanSkipReconciliation` unit tests.
* Add `TestReconciliationConcurrency` unit tests.
* Add benchmark tests for reconciliation performance.

## Benchmarks

* `BenchmarkReconcileAll_NoOps`: Tests presets with no reconciliation
actions. All presets are filtered by `CanSkipReconciliation`, resulting
in no goroutines spawned and no database connections used.
* `BenchmarkReconcileAll_ConnectionContention`: Tests presets where all
require reconciliation actions. All presets spawn goroutines, but
concurrency is limited by `eg.SetLimit(reconciliationConcurrency)`.
* `BenchmarkReconcileAll_Mix`: Simulates a realistic scenario with a
large subset of inactive presets (filtered by `CanSkipReconciliation`)
and a smaller subset requiring reconciliation (limited by
`eg.SetLimit`).

Closes: https://github.com/coder/coder/issues/20606
2026-01-21 10:56:31 +00:00
Mathias Fredriksson 2132c53f28 feat(coderd/database): add schema for task pause/resume lifecycle (#21557)
Creates migration 000409 with the database foundation for pausing and
resuming task workspaces.

The task_snapshots table stores conversation history (AgentAPI messages)
so users can view task logs even when the workspace is stopped. Each task
gets one snapshot, overwritten on each pause.

Three new build_reason values (task_auto_pause, task_manual_pause,
task_resume) let us distinguish task lifecycle events in telemetry and
audit logs from regular workspace operations.

Uses a regular table rather than UNLOGGED for snapshots. While UNLOGGED
would be faster, losing snapshots on database crash creates user confusion
(logs disappear until next pause). We can switch to UNLOGGED post-GA if
write performance becomes a problem.

Closes coder/internal#1250
2026-01-21 12:12:12 +02:00
Jake Howell 59b71f296f feat: implement non-brittle TestDBPurgeAuthorization (#21442)
Closes #21440 

The `TestDBPurgeAuthorization` test was overfitting by calling each
purge method individually, which reimplemented dbpurge logic in the test
and created a maintenance burden. When new purge steps are added, they
either need to be reflected in the test or there will be a testing
blindspot.

This change extracts the `doTick` closure into an exported `PurgeTick`
function that returns an error, making the core purge logic testable.
The test now calls `PurgeTick` directly to exercise the actual dbpurge
behavior rather than reimplementing it. Retention values are configured
to ensure all purge operations run, so we test RBAC permissions for all
code paths.

- Tests actual dbpurge behavior instead of reimplementing it
- Automatically covers new purge steps when they're added
- Still validates that all operations have proper RBAC permissions

The test focuses on authorization (checking for RBAC errors) rather than
verifying deletion behavior, which is already covered by other tests
like `TestDeleteExpiredAPIKeys` and `TestDeleteOldAuditLogs`.
2026-01-21 11:27:01 +11:00
Kacper Sawicki ed679bb3da feat(codersdk): add circuit breaker configuration support for aibridge (#21546)
## Summary

Add circuit breaker support for AI Bridge to protect against cascading
failures from upstream AI provider rate limits (HTTP 429, 503, and
Anthropic's 529 overloaded responses).

## Changes

- Add 5 new CLI options for circuit breaker configuration:
  - `--aibridge-circuit-breaker-enabled` (default: false)
  - `--aibridge-circuit-breaker-failure-threshold` (default: 5)
  - `--aibridge-circuit-breaker-interval` (default: 10s)
  - `--aibridge-circuit-breaker-timeout` (default: 30s)
  - `--aibridge-circuit-breaker-max-requests` (default: 3)
- Update aibridge dependency to include circuit breaker support
- Add tests for pool creation with circuit breaker providers

## Notes

- Circuit breaker is **disabled by default** for backward compatibility
- When enabled, applies to both OpenAI and Anthropic providers
- Uses sony/gobreaker internally via the aibridge library

## Testing

```
make test RUN=TestPoolWithCircuitBreakerProviders
```
2026-01-20 14:59:29 +01:00
Rowan Smith b163b4c950 feat: support bundle updates to enable pprof and telemetry collection (#21486)
- Adds pprof collection support now that we have the listeners
automatically starting (requires Coder server 2.28.0+, includes a
version check). Collects heap, allocs, profile (30s), block, mutex,
goroutine, threadcreate, trace (30s), cmdline, symbol. Performs capture
for 30 seconds and emits a log line stating as such. Enable capture by
supplying the `--pprof` flag or `CODER_SUPPORT_BUNDLE_PPROF` env var.
Collection of pprof data from both coderd and the Coder agent occurs.
- Adds collection of Prometheus metrics, also requires 2.28.0+
- Adds the ability to include a template in the bundle independently of
supplying the details of a running workspace by supplying the
`--template` flag or `CODER_SUPPORT_BUNDLE_TEMPLATE` env var
- Captures a list of workspaces the user has access to. Defaults to a
max of 10, configurable via `--workspaces-total-cap` /
`CODER_SUPPORT_BUNDLE_WORKSPACES_TOTAL_CAP`
- Collects additional stats from the coderd deployment (aggregated
workspace/session metrics), as well as entitlements via license and
dismissed health checks.

created with help from mux
2026-01-20 10:28:52 +11:00
Cian Johnston 9776dc16bd fix(coderd/database/dbmetrics): fix incorrect query label in GetWorkspaceAgentAndWorkspaceByID (#21576)
Fixes an incorrect label.
2026-01-19 16:25:36 +00:00
Cian Johnston 08343a7a9f perf: reduce number of queries made by /api/v2/workspaceagents/{id} (#21522)
Relates to https://github.com/coder/internal/issues/1214

The `ExtractWorkspaceAgentParam` middleware ends up making 4 database
queries to follow the chain of `WorkspaceAgent` -> `WorkspaceResource`
-> `ProvisionerJob` -> `WorkspaceBuild` -- but then dropping all that
hard work on the floor. The `api.workspaceAgent` handler that references
this middleware then has to do all of that work again, plus one more
query to get the related `User` so we can get the username. This pattern
is also mirrored in `getDatabaseTerminal` but without the middleware.

This PR:
* Adds a new query `GetWorkspaceAgentAndWorkspaceByID` to fetch all
this information at once to avoid the multiple round-trips,
* Updates the existing usage of `GetWorkspaceAgentByID` to this new
query instead,
* Updates `ExtractWorkspaceAgentParam` to also store the workspace in
the request context

Dalibo: [0.63ms](https://explain.dalibo.com/plan/40bb597f3539gc6c)
2026-01-19 12:36:33 +00:00
Susana Ferreira a406ed7cc5 feat: add upstream proxy support to aiproxy for passthrough requests (#21512)
## Description

Adds upstream proxy support for AI Bridge Proxy passthrough requests.
This allows aiproxy to forward non-allowlisted requests through an
upstream proxy. Currently, the only supported configuration is when
aiproxy is the first proxy in the chain (client → aiproxy → upstream
proxy).

## Changes

* Add `--aibridge-proxy-upstream` option to configure an upstream
HTTP/HTTPS proxy URL for passthrough requests
* Add `--aibridge-proxy-upstream-ca` option to trust custom CA
certificates for HTTPS upstream proxies
* Passthrough requests (non-allowlisted domains) are forwarded through
the upstream proxy
* MITM'd requests (allowlisted domains) continue to go directly to
aibridge, not through the upstream proxy
* Add tests for upstream proxy configuration and request routing

Closes: https://github.com/coder/internal/issues/1204
2026-01-19 08:50:57 +00:00
Cian Johnston ad23ea3561 chore: remove unused ExtractWorkspaceAndAgentParam (#21537)
While investigating https://github.com/coder/internal/issues/1214 I
noticed that `ExtractWorkspaceAndAgentParam` appeared to be unused
outside of tests.
2026-01-16 15:11:10 +00:00
Cian Johnston 3a62a8e70e chore: improve healthcheck timeout message (#21520)
Relates to https://github.com/coder/internal/issues/272

This flake has been persisting for a while, and unfortunately there's no
detail on which healthcheck in particular is holding things up.

This PR adds a concurrency-safe `healthcheck.Progress` and wires it
through `healthcheck.Run`. If the healthcheck times out, it will provide
information on which healthchecks are completed / running, and how long
they took / are still taking.

🤖 Claude Opus 4.5 completed the first round of this implementation,
which I then refactored.
2026-01-15 16:37:05 +00:00
blinkagent[bot] d5296a4855 chore: add lint/migrations to detect hardcoded public schema (#21496)
## Problem

Migration 000401 introduced a hardcoded `public.` schema qualifier which
broke deployments using non-public schemas (see #21493). We need to
prevent this from happening again.

## Solution

Adds a new `lint/migrations` Make target that validates database
migrations do not hardcode the `public` schema qualifier. Migrations
should rely on `search_path` instead to support deployments using
non-public schemas.

## Changes

- Added `scripts/check_migrations_schema.sh` - a linter script that
checks for `public.` references in migration files (excluding test
fixtures)
- Added `lint/migrations` target to the Makefile
- Added `lint/migrations` to the main `lint` target so it runs in CI

## Testing

- Verified the linter **fails** on current `main` (which has the
hardcoded `public.` in migration 000401)
- Verified the linter **passes** after applying the fix from #21493

```bash
# On main (fails)
$ make lint/migrations
ERROR: Migrations must not hardcode the 'public' schema. Use unqualified table names instead.

# After fix (passes)
$ make lint/migrations
Migration schema references OK
```

## Depends on

- #21493 must be merged first (or this PR will fail CI until it is)

---------

Signed-off-by: Danny Kopping <danny@coder.com>
Co-authored-by: blink-so[bot] <211532188+blink-so[bot]@users.noreply.github.com>
Co-authored-by: Danny Kopping <danny@coder.com>
2026-01-15 14:17:16 +02:00
Cian Johnston 5073493850 feat(coderd/database/dbmetrics): add query_counts_total metric (#21506)
Adds a new Prometheus metric `coderd_db_query_counts_total` that tracks
the total number of queries by route, method, and query name. This is
aimed at helping us track down potential optimization candidates for
HTTP handlers that may trigger a number of queries. It is expected to be
used alongside `coderd_api_requests_processed_total` for correlation.

Depends upon new middleware introduced in
https://github.com/coder/coder/pull/21498

Relates to https://github.com/coder/internal/issues/1214
2026-01-15 10:58:56 +00:00
Cian Johnston 32354261d3 chore(coderd/httpmw): extract HTTPRoute middleware (#21498)
Extracts part of the prometheus middleware that stores the route
information in the request context into its own middleware. Also adds
request method information to context.

Relates to https://github.com/coder/internal/issues/1214
2026-01-15 10:26:50 +00:00
Ehab Younes 6683d807ac refactor: add RFC-compliant enum types and use SDK as source of truth (#21468)
Add comprehensive OAuth2 enum types to codersdk following RFC specifications:
- OAuth2ProviderGrantType (RFC 6749)
- OAuth2ProviderResponseType (RFC 6749)
- OAuth2TokenEndpointAuthMethod (RFC 7591)
- OAuth2PKCECodeChallengeMethod (RFC 7636)
- OAuth2TokenType (RFC 6749, RFC 9449)
- OAuth2RevocationTokenTypeHint (RFC 7009)
- OAuth2ErrorCode (RFC 6749, RFC 7009, RFC 8707)

Add OAuth2TokenRequest, OAuth2TokenResponse, OAuth2TokenRevocationRequest,
and OAuth2Error structs to the SDK. Update OAuth2ClientRegistrationRequest,
OAuth2ClientRegistrationResponse, OAuth2ClientConfiguration, and
OAuth2AuthorizationServerMetadata to use typed enums instead of raw strings.

This makes codersdk the single source of truth for OAuth2 types, eliminating
duplication between SDK and server-side structs.

Closes #21476
2026-01-15 12:41:28 +03:00
George K 0712faef4f feat(enterprise): implement organization "disable workspace sharing" option (#21376)
Adds a per-organization setting to disable workspace sharing. When enabled,
all existing workspace ACLs in the organization are cleared and the workspace
ACL mutation API endpoints return `403 Forbidden`.

This complements the existing site-wide `--disable-workspace-sharing` flag by
providing more granular control at the organization level.

Closes https://github.com/coder/internal/issues/1073 (part 2)

---------

Co-authored-by: Steven Masley <Emyrk@users.noreply.github.com>
2026-01-14 09:47:50 -08:00
Danny Kopping 7d5cd06f83 feat: add aibridge structured logging (#21492)
Closes https://github.com/coder/internal/issues/1151

Sample:

```
[API] 2026-01-13 15:50:20.795 [info]  coderd.aibridgedserver: interception started  trace=8bb5a1d8eb10526cc46ad90f191bb468  span=a3e5b5da9546032a  record_type=interception_start  interception_id=97461880-4a6c-47c1-8292-3588dd715312  initiator_id=360c6167-a93a-4442-9c3e-f87a6d1cfb66  api_key_id=vg1sbUv97d  provider=anthropic  model=claude-opus-4-5-20251101  started_at="2026-01-13T15:50:20.790690781Z"  metadata={}
[API] 2026-01-13 15:50:23.741 [info]  coderd.aibridgedserver: token usage recorded  trace=8bb5a1d8eb10526cc46ad90f191bb468  span=a114f0cc3047296e  record_type=token_usage  interception_id=97461880-4a6c-47c1-8292-3588dd715312  msg_id=msg_01VJH1rYKspfun8BW29CrYEu  input_tokens=10  output_tokens=8  created_at="2026-01-13T15:50:23.731587038Z"  metadata={"cache_creation_input":53194,"cache_ephemeral_1h_input":0,"cache_ephemeral_5m_input":53194,"cache_read_input":0,"web_search_requests":0}
[API] 2026-01-13 15:50:26.265 [info]  coderd.aibridgedserver: token usage recorded  trace=8bb5a1d8eb10526cc46ad90f191bb468  span=dbdafb563bff2c9c  record_type=token_usage  interception_id=97461880-4a6c-47c1-8292-3588dd715312  msg_id=msg_01VJH1rYKspfun8BW29CrYEu  input_tokens=0  output_tokens=130  created_at="2026-01-13T15:50:26.254467904Z"  metadata={}
[API] 2026-01-13 15:50:26.268 [info]  coderd.aibridgedserver: prompt usage recorded  trace=8bb5a1d8eb10526cc46ad90f191bb468  span=da51887a757226fc  record_type=prompt_usage  interception_id=97461880-4a6c-47c1-8292-3588dd715312  msg_id=msg_01VJH1rYKspfun8BW29CrYEu  prompt="list the jmia share price"  created_at="2026-01-13T15:50:26.255299811Z"  metadata={}
[API] 2026-01-13 15:50:26.268 [info]  coderd.aibridgedserver: interception ended  trace=8bb5a1d8eb10526cc46ad90f191bb468  span=3fa25397705ee7c9  record_type=interception_end  interception_id=97461880-4a6c-47c1-8292-3588dd715312  ended_at="2026-01-13T15:50:26.25555547Z"
[API] 2026-01-13 15:50:26.269 [info]  coderd.aibridgedserver: tool usage recorded  trace=8bb5a1d8eb10526cc46ad90f191bb468  span=b54af90afc604d29  record_type=tool_usage  interception_id=97461880-4a6c-47c1-8292-3588dd715312  msg_id=msg_01VJH1rYKspfun8BW29CrYEu  tool=mcp__stonks__getStockPriceSnapshot  input="{\"ticker\":\"JMIA\"}"  server_url=""  injected=false  invocation_error=""  created_at="2026-01-13T15:50:26.255164652Z"  metadata={}
```

Structured logging is only enabled when
`CODER_AIBRIDGE_STRUCTURED_LOGGING=true`.

---------

Signed-off-by: Danny Kopping <danny@coder.com>
2026-01-14 17:26:08 +02:00
blinkagent[bot] b3a81be1aa fix(coderd/database): remove hardcoded public schema from migration 000401 (#21493) 2026-01-14 05:40:30 +02:00
Susana Ferreira 74b6d12a8a feat: implement selective MITM with configurable domain allowlist in aibridgeproxyd (#21473)
## Description

Implements selective MITM (Man-in-the-Middle) in `aibridgeproxyd` so
that only requests to allowlisted domains are intercepted and decrypted.
Requests to all other domains are tunneled directly without decryption.

## Changes

* New config option: `CODER_AIBRIDGE_PROXY_DOMAIN_ALLOWLIST` (default:
`api.anthropic.com`,`api.openai.com`)
* Selective MITM: Uses `goproxy.ReqHostIs()` to only intercept `CONNECT`
requests to allowlisted hosts
* Certificate caching: Now only generates/caches certificates for
allowlisted domains
* Validation: Startup fails if domain allowlist is empty or contains
invalid entries

Closes: https://github.com/coder/internal/issues/1182
2026-01-13 11:30:51 +00:00
Cian Johnston 64e7a77983 feat: add user_agent to loggermw (#21485)
Adds the `user_agent` field to `httpmw/loggermw`.
2026-01-13 10:50:01 +00:00
Danny Kopping 49a42eff5c feat: make database connection pool size configurable (#21403)
Closes https://github.com/coder/coder/issues/21360

A few considerations/notes:
- I've kept the number of conns to 10 in all other places, except coderd
- which uses the config value
- I opted to also make idle conns configurable; the greater the delta
between max open and max idle, the more connection churn
- Postgres maintains a [_process_ per
connection](https://www.postgresql.org/docs/current/connect-estab.html),
contrary to what the comment said previously
- Operators should be able to tune this, since process churn can
negatively affect OS scheduling
- I've set the value to `"auto"` by default so it's not another knob one
_has to_ twiddle, and sets max idle = max conns / 3

---------

Signed-off-by: Danny Kopping <danny@coder.com>
2026-01-13 10:50:57 +02:00
George K cc2efe9e1f feat(coderd/rbac): make organization-member a per-org system custom role (#21359)
Migrated the built-in organization-member role to DB storage so it can be customized per org.

Closes https://github.com/coder/internal/issues/1073 (part 1)
2026-01-12 18:19:19 -08:00
Kacper Sawicki 6ca70d3618 feat(cli): add --no-build flag to state push for state-only updates (#21374)
## Summary

Adds a `--no-build` flag to `coder state push` that updates the
Terraform state directly without triggering a workspace build.

## Use Case

This enables state-only migrations, such as migrating Kubernetes
resources from deprecated types (e.g., `kubernetes_config_map`) to
versioned types (e.g., `kubernetes_config_map_v1`):

```bash
coder state pull my-workspace > state.json
terraform init
terraform state rm -state=state.json kubernetes_config_map.example
terraform import -state=state.json kubernetes_config_map_v1.example default/example
coder state push --no-build my-workspace state.json
```

## Changes

- Add `PUT /api/v2/workspacebuilds/{id}/state` endpoint to update state
without triggering a build
- Add `UpdateWorkspaceBuildState` SDK method
- Add `--no-build`/`-n` flag to `coder state push`
- Add confirmation prompt (can be skipped with `--yes`/`-y`) since this
is a potentially dangerous operation
- Add test for `--no-build` functionality

Fixes #21336
2026-01-12 15:16:59 +01:00
Zach 091d31224d fix: replace moby/moby namesgenerator with internal implementation (#21377)
Replace the external moby/moby/pkg/namesgenerator dependency with an
internal implementation using gofakeit/v7. The moby package has ~25k
unique name combinations, and with its retry parameter only adds a
random digit 0-9, giving ~250k possibilities. In parallel tests, this
has led to collisions (flakes).

The new internal API at coderd/util/namesgenerator eliminates the
external dependnecy and offers functions with explicit uniqueness
guarantees. This PR also consolidates fragmented name generation in a
few places to use the new package.

| Old (moby/moby)                     | New                    |
|-------------------------------------|------------------------|
| namesgenerator.GetRandomName(0)     | NameWith("_")          |
| namesgenerator.GetRandomName(>0)    | NameDigitWith("_")     |
| testutil.GetRandomName(t)           | UniqueName()           |
| testutil.GetRandomNameHyphenated(t) | UniqueNameWith("-")    |

namesgenerator package API:
- NameWith(delim): random name, not unique
- NameDigitWith(delim): random name with 1-9 suffix, not unique
- UniqueName(): guaranteed unique via atomic counter
- UniqueNameWith(delim): unique with custom delimiter

Names continue to be docker style `[adjective][delim][surname]`. Unique
names are truncated to 32 characters (preserving the numeric suffix) to
fit common name length limits in Coder.

Related test flakes:
https://github.com/coder/internal/issues/1212
https://github.com/coder/internal/issues/118
https://github.com/coder/internal/issues/1068
2026-01-09 15:40:26 -07:00
Steven Masley 60b3fd0783 chore!: send modules archive over the proto messages (#21398)
# What this does

Dynamic parameters caches the `./terraform/modules` directory for parameter usage. What this PR does is send over this archive to the provisioner when building workspaces.

This allow terraform to skip downloading modules from their registries, a step that takes seconds. 

<img width="1223" height="429" alt="Screenshot From 2025-12-29 12-57-52" src="https://github.com/user-attachments/assets/16066e0a-ac79-4296-819d-924f4b0418dc" />


# Wire protocol

The wire protocol reuses the same mechanism used to download the modules `provisoner -> coder`. It splits up large archives into multiple protobuf messages so larger archives can be sent under the message size limit.

# 🚨  Behavior Change (Breaking Change) 🚨 

**Before this PR** modules were downloaded on every workspace build. This means unpinned modules always fetched the latest version

**After this PR** modules are cached at template import time, and their versions are effectively pinned for all subsequent workspace builds.
2026-01-09 11:33:34 -06:00
Steven Masley d2044c2ee9 chore: update protobuf to reuse file request (#21447)
**This is just the protobuf changes for the PR https://github.com/coder/coder/pull/21398**

Moved `UploadFileRequest` from `provisionerd.proto` -> `provisioner.proto`.

Renamed to `FileUpload` because it is now bi-directional.

This **is backwards compatible**. I tested it to confirm the payloads are identical.  Types were just renamed and moved around.

```golang
func TestTypeUpgrade(t *testing.T) {
	t.Parallel()

	x := &proto2.UploadFileRequest{
		Type: &proto2.UploadFileRequest_ChunkPiece{
			ChunkPiece: &proto.ChunkPiece{
				Data:         []byte("Hello World!"),
				FullDataHash: []byte("Foobar"),
				PieceIndex:   42,
			},
		},
	}

	data, err := protobuf.Marshal(x)
	require.NoError(t, err)

	// Exactly the same output
	// EhgKDEhlbGxvIFdvcmxkIRIGRm9vYmFyGCo= on `main`
	// EhgKDEhlbGxvIFdvcmxkIRIGRm9vYmFyGCo= on this branch
	fmt.Println(base64.StdEncoding.EncodeToString(data))
}
```


# What this does

This allows provisioner daemons to download files from `coderd`'s `files` table. This is used to send over cached module files and prevent the need of downloading these modules on each workspace build.
2026-01-09 11:23:32 -06:00
Steven Masley 89f4d60e7b chore: remove experiment "terraform-directory-reuse" (#21397)
Experiment is no longer required, the new method will be released without an experiment and without a toggle

Main PR is: https://github.com/coder/coder/pull/21398
2026-01-09 11:13:16 -06:00