Commit Graph

3211 Commits

Author SHA1 Message Date
Mathias Fredriksson 97e8a5b093 fix(coderd): allow agent auth during workspace shutdown (#21538)
Agents were losing authentication during workspace shutdown, causing
shutdown scripts to fail. The auth query required agents to belong to
the latest build, but during shutdown a `stop` build becomes latest while
the `start` build's agents are still running.

Modified the auth query to allow `start` build agents to authenticate
temporarily during `stop` execution. The query allows auth when:

- Agent's `start` build job succeeded
- Latest build is `stop` with `pending`/`running` job status
- Builds are adjacent (`stop` is `build_number + 1`)
- Template versions match

Auth closes once `stop` completes.

Renamed `GetWorkspaceAgentAndLatestBuildByAuthToken` to
`GetAuthenticatedWorkspaceAgentAndBuildByAuthToken` since it returns the
agent's build (not always latest) during shutdown.

Closes coder/internal#1249
Fixes #19467
2026-01-21 13:18:43 +00:00
Danny Kopping a14a22eb54 feat: support custom bedrock base url (#21582)
Closes https://github.com/coder/aibridge/issues/126
Depends on https://github.com/coder/aibridge/pull/131

---------

Signed-off-by: Danny Kopping <danny@coder.com>
2026-01-21 12:48:56 +00:00
Susana Ferreira 6ef9670384 fix: limit concurrent database connections in prebuild reconciliation (#20908)
## Description

This PR addresses database connection pool exhaustion during prebuilds
reconciliation by introducing two changes:
* `CanSkipReconciliation`: Filters out presets that don't need
reconciliation before spawning goroutines. This ensures we only create
goroutines for presets that will (_most likely_) perform database
operations, avoiding unnecessary connection pool usage.
* Dynamic `eg.SetLimit`: Limits concurrent goroutines based on the
configured database connection pool size (`CODER_PG_CONN_MAX_OPEN / 2`).
This replaces the previous hardcoded limit of 5, ensuring the
reconciliation loop scales appropriately with the configured pool size
while leaving capacity for other database operations.

## Changes

* Add `CanSkipReconciliation()` method to `PresetSnapshot` that returns
true for inactive presets with no running workspaces, no pending jobs,
or expired prebuilds.
* Add `maxDBConnections` parameter to `NewStoreReconciler` and compute
`reconciliationConcurrency` as half the pool size (minimum 1).
* Add `ReconciliationConcurrency()` getter method to `StoreReconciler`.
* Add `eg.SetLimit(c.reconciliationConcurrency)` to bound concurrent
reconciliation goroutines.
* Add `PresetsTotal` and `PresetsReconciled` to `ReconcileStats` for
observability.
* Add `TestCanSkipReconciliation` unit tests.
* Add `TestReconciliationConcurrency` unit tests.
* Add benchmark tests for reconciliation performance.

## Benchmarks

* `BenchmarkReconcileAll_NoOps`: Tests presets with no reconciliation
actions. All presets are filtered by `CanSkipReconciliation`, resulting
in no goroutines spawned and no database connections used.
* `BenchmarkReconcileAll_ConnectionContention`: Tests presets where all
require reconciliation actions. All presets spawn goroutines, but
concurrency is limited by `eg.SetLimit(reconciliationConcurrency)`.
* `BenchmarkReconcileAll_Mix`: Simulates a realistic scenario with a
large subset of inactive presets (filtered by `CanSkipReconciliation`)
and a smaller subset requiring reconciliation (limited by
`eg.SetLimit`).

Closes: https://github.com/coder/coder/issues/20606
2026-01-21 10:56:31 +00:00
Mathias Fredriksson 2132c53f28 feat(coderd/database): add schema for task pause/resume lifecycle (#21557)
Creates migration 000409 with the database foundation for pausing and
resuming task workspaces.

The task_snapshots table stores conversation history (AgentAPI messages)
so users can view task logs even when the workspace is stopped. Each task
gets one snapshot, overwritten on each pause.

Three new build_reason values (task_auto_pause, task_manual_pause,
task_resume) let us distinguish task lifecycle events in telemetry and
audit logs from regular workspace operations.

Uses a regular table rather than UNLOGGED for snapshots. While UNLOGGED
would be faster, losing snapshots on database crash creates user confusion
(logs disappear until next pause). We can switch to UNLOGGED post-GA if
write performance becomes a problem.

Closes coder/internal#1250
2026-01-21 12:12:12 +02:00
Jake Howell 59b71f296f feat: implement non-brittle TestDBPurgeAuthorization (#21442)
Closes #21440 

The `TestDBPurgeAuthorization` test was overfitting by calling each
purge method individually, which reimplemented dbpurge logic in the test
and created a maintenance burden. When new purge steps are added, they
either need to be reflected in the test or there will be a testing
blindspot.

This change extracts the `doTick` closure into an exported `PurgeTick`
function that returns an error, making the core purge logic testable.
The test now calls `PurgeTick` directly to exercise the actual dbpurge
behavior rather than reimplementing it. Retention values are configured
to ensure all purge operations run, so we test RBAC permissions for all
code paths.

- Tests actual dbpurge behavior instead of reimplementing it
- Automatically covers new purge steps when they're added
- Still validates that all operations have proper RBAC permissions

The test focuses on authorization (checking for RBAC errors) rather than
verifying deletion behavior, which is already covered by other tests
like `TestDeleteExpiredAPIKeys` and `TestDeleteOldAuditLogs`.
2026-01-21 11:27:01 +11:00
Kacper Sawicki ed679bb3da feat(codersdk): add circuit breaker configuration support for aibridge (#21546)
## Summary

Add circuit breaker support for AI Bridge to protect against cascading
failures from upstream AI provider rate limits (HTTP 429, 503, and
Anthropic's 529 overloaded responses).

## Changes

- Add 5 new CLI options for circuit breaker configuration:
  - `--aibridge-circuit-breaker-enabled` (default: false)
  - `--aibridge-circuit-breaker-failure-threshold` (default: 5)
  - `--aibridge-circuit-breaker-interval` (default: 10s)
  - `--aibridge-circuit-breaker-timeout` (default: 30s)
  - `--aibridge-circuit-breaker-max-requests` (default: 3)
- Update aibridge dependency to include circuit breaker support
- Add tests for pool creation with circuit breaker providers

## Notes

- Circuit breaker is **disabled by default** for backward compatibility
- When enabled, applies to both OpenAI and Anthropic providers
- Uses sony/gobreaker internally via the aibridge library

## Testing

```
make test RUN=TestPoolWithCircuitBreakerProviders
```
2026-01-20 14:59:29 +01:00
Rowan Smith b163b4c950 feat: support bundle updates to enable pprof and telemetry collection (#21486)
- Adds pprof collection support now that we have the listeners
automatically starting (requires Coder server 2.28.0+, includes a
version check). Collects heap, allocs, profile (30s), block, mutex,
goroutine, threadcreate, trace (30s), cmdline, symbol. Performs capture
for 30 seconds and emits a log line stating as such. Enable capture by
supplying the `--pprof` flag or `CODER_SUPPORT_BUNDLE_PPROF` env var.
Collection of pprof data from both coderd and the Coder agent occurs.
- Adds collection of Prometheus metrics, also requires 2.28.0+
- Adds the ability to include a template in the bundle independently of
supplying the details of a running workspace by supplying the
`--template` flag or `CODER_SUPPORT_BUNDLE_TEMPLATE` env var
- Captures a list of workspaces the user has access to. Defaults to a
max of 10, configurable via `--workspaces-total-cap` /
`CODER_SUPPORT_BUNDLE_WORKSPACES_TOTAL_CAP`
- Collects additional stats from the coderd deployment (aggregated
workspace/session metrics), as well as entitlements via license and
dismissed health checks.

created with help from mux
2026-01-20 10:28:52 +11:00
Cian Johnston 9776dc16bd fix(coderd/database/dbmetrics): fix incorrect query label in GetWorkspaceAgentAndWorkspaceByID (#21576)
Fixes an incorrect label.
2026-01-19 16:25:36 +00:00
Cian Johnston 08343a7a9f perf: reduce number of queries made by /api/v2/workspaceagents/{id} (#21522)
Relates to https://github.com/coder/internal/issues/1214

The `ExtractWorkspaceAgentParam` middleware ends up making 4 database
queries to follow the chain of `WorkspaceAgent` -> `WorkspaceResource`
-> `ProvisionerJob` -> `WorkspaceBuild` -- but then dropping all that
hard work on the floor. The `api.workspaceAgent` handler that references
this middleware then has to do all of that work again, plus one more
query to get the related `User` so we can get the username. This pattern
is also mirrored in `getDatabaseTerminal` but without the middleware.

This PR:
* Adds a new query `GetWorkspaceAgentAndWorkspaceByID` to fetch all
this information at once to avoid the multiple round-trips,
* Updates the existing usage of `GetWorkspaceAgentByID` to this new
query instead,
* Updates `ExtractWorkspaceAgentParam` to also store the workspace in
the request context

Dalibo: [0.63ms](https://explain.dalibo.com/plan/40bb597f3539gc6c)
2026-01-19 12:36:33 +00:00
Susana Ferreira a406ed7cc5 feat: add upstream proxy support to aiproxy for passthrough requests (#21512)
## Description

Adds upstream proxy support for AI Bridge Proxy passthrough requests.
This allows aiproxy to forward non-allowlisted requests through an
upstream proxy. Currently, the only supported configuration is when
aiproxy is the first proxy in the chain (client → aiproxy → upstream
proxy).

## Changes

* Add `--aibridge-proxy-upstream` option to configure an upstream
HTTP/HTTPS proxy URL for passthrough requests
* Add `--aibridge-proxy-upstream-ca` option to trust custom CA
certificates for HTTPS upstream proxies
* Passthrough requests (non-allowlisted domains) are forwarded through
the upstream proxy
* MITM'd requests (allowlisted domains) continue to go directly to
aibridge, not through the upstream proxy
* Add tests for upstream proxy configuration and request routing

Closes: https://github.com/coder/internal/issues/1204
2026-01-19 08:50:57 +00:00
Cian Johnston ad23ea3561 chore: remove unused ExtractWorkspaceAndAgentParam (#21537)
While investigating https://github.com/coder/internal/issues/1214 I
noticed that `ExtractWorkspaceAndAgentParam` appeared to be unused
outside of tests.
2026-01-16 15:11:10 +00:00
Cian Johnston 3a62a8e70e chore: improve healthcheck timeout message (#21520)
Relates to https://github.com/coder/internal/issues/272

This flake has been persisting for a while, and unfortunately there's no
detail on which healthcheck in particular is holding things up.

This PR adds a concurrency-safe `healthcheck.Progress` and wires it
through `healthcheck.Run`. If the healthcheck times out, it will provide
information on which healthchecks are completed / running, and how long
they took / are still taking.

🤖 Claude Opus 4.5 completed the first round of this implementation,
which I then refactored.
2026-01-15 16:37:05 +00:00
blinkagent[bot] d5296a4855 chore: add lint/migrations to detect hardcoded public schema (#21496)
## Problem

Migration 000401 introduced a hardcoded `public.` schema qualifier which
broke deployments using non-public schemas (see #21493). We need to
prevent this from happening again.

## Solution

Adds a new `lint/migrations` Make target that validates database
migrations do not hardcode the `public` schema qualifier. Migrations
should rely on `search_path` instead to support deployments using
non-public schemas.

## Changes

- Added `scripts/check_migrations_schema.sh` - a linter script that
checks for `public.` references in migration files (excluding test
fixtures)
- Added `lint/migrations` target to the Makefile
- Added `lint/migrations` to the main `lint` target so it runs in CI

## Testing

- Verified the linter **fails** on current `main` (which has the
hardcoded `public.` in migration 000401)
- Verified the linter **passes** after applying the fix from #21493

```bash
# On main (fails)
$ make lint/migrations
ERROR: Migrations must not hardcode the 'public' schema. Use unqualified table names instead.

# After fix (passes)
$ make lint/migrations
Migration schema references OK
```

## Depends on

- #21493 must be merged first (or this PR will fail CI until it is)

---------

Signed-off-by: Danny Kopping <danny@coder.com>
Co-authored-by: blink-so[bot] <211532188+blink-so[bot]@users.noreply.github.com>
Co-authored-by: Danny Kopping <danny@coder.com>
2026-01-15 14:17:16 +02:00
Cian Johnston 5073493850 feat(coderd/database/dbmetrics): add query_counts_total metric (#21506)
Adds a new Prometheus metric `coderd_db_query_counts_total` that tracks
the total number of queries by route, method, and query name. This is
aimed at helping us track down potential optimization candidates for
HTTP handlers that may trigger a number of queries. It is expected to be
used alongside `coderd_api_requests_processed_total` for correlation.

Depends upon new middleware introduced in
https://github.com/coder/coder/pull/21498

Relates to https://github.com/coder/internal/issues/1214
2026-01-15 10:58:56 +00:00
Cian Johnston 32354261d3 chore(coderd/httpmw): extract HTTPRoute middleware (#21498)
Extracts part of the prometheus middleware that stores the route
information in the request context into its own middleware. Also adds
request method information to context.

Relates to https://github.com/coder/internal/issues/1214
2026-01-15 10:26:50 +00:00
Ehab Younes 6683d807ac refactor: add RFC-compliant enum types and use SDK as source of truth (#21468)
Add comprehensive OAuth2 enum types to codersdk following RFC specifications:
- OAuth2ProviderGrantType (RFC 6749)
- OAuth2ProviderResponseType (RFC 6749)
- OAuth2TokenEndpointAuthMethod (RFC 7591)
- OAuth2PKCECodeChallengeMethod (RFC 7636)
- OAuth2TokenType (RFC 6749, RFC 9449)
- OAuth2RevocationTokenTypeHint (RFC 7009)
- OAuth2ErrorCode (RFC 6749, RFC 7009, RFC 8707)

Add OAuth2TokenRequest, OAuth2TokenResponse, OAuth2TokenRevocationRequest,
and OAuth2Error structs to the SDK. Update OAuth2ClientRegistrationRequest,
OAuth2ClientRegistrationResponse, OAuth2ClientConfiguration, and
OAuth2AuthorizationServerMetadata to use typed enums instead of raw strings.

This makes codersdk the single source of truth for OAuth2 types, eliminating
duplication between SDK and server-side structs.

Closes #21476
2026-01-15 12:41:28 +03:00
George K 0712faef4f feat(enterprise): implement organization "disable workspace sharing" option (#21376)
Adds a per-organization setting to disable workspace sharing. When enabled,
all existing workspace ACLs in the organization are cleared and the workspace
ACL mutation API endpoints return `403 Forbidden`.

This complements the existing site-wide `--disable-workspace-sharing` flag by
providing more granular control at the organization level.

Closes https://github.com/coder/internal/issues/1073 (part 2)

---------

Co-authored-by: Steven Masley <Emyrk@users.noreply.github.com>
2026-01-14 09:47:50 -08:00
Danny Kopping 7d5cd06f83 feat: add aibridge structured logging (#21492)
Closes https://github.com/coder/internal/issues/1151

Sample:

```
[API] 2026-01-13 15:50:20.795 [info]  coderd.aibridgedserver: interception started  trace=8bb5a1d8eb10526cc46ad90f191bb468  span=a3e5b5da9546032a  record_type=interception_start  interception_id=97461880-4a6c-47c1-8292-3588dd715312  initiator_id=360c6167-a93a-4442-9c3e-f87a6d1cfb66  api_key_id=vg1sbUv97d  provider=anthropic  model=claude-opus-4-5-20251101  started_at="2026-01-13T15:50:20.790690781Z"  metadata={}
[API] 2026-01-13 15:50:23.741 [info]  coderd.aibridgedserver: token usage recorded  trace=8bb5a1d8eb10526cc46ad90f191bb468  span=a114f0cc3047296e  record_type=token_usage  interception_id=97461880-4a6c-47c1-8292-3588dd715312  msg_id=msg_01VJH1rYKspfun8BW29CrYEu  input_tokens=10  output_tokens=8  created_at="2026-01-13T15:50:23.731587038Z"  metadata={"cache_creation_input":53194,"cache_ephemeral_1h_input":0,"cache_ephemeral_5m_input":53194,"cache_read_input":0,"web_search_requests":0}
[API] 2026-01-13 15:50:26.265 [info]  coderd.aibridgedserver: token usage recorded  trace=8bb5a1d8eb10526cc46ad90f191bb468  span=dbdafb563bff2c9c  record_type=token_usage  interception_id=97461880-4a6c-47c1-8292-3588dd715312  msg_id=msg_01VJH1rYKspfun8BW29CrYEu  input_tokens=0  output_tokens=130  created_at="2026-01-13T15:50:26.254467904Z"  metadata={}
[API] 2026-01-13 15:50:26.268 [info]  coderd.aibridgedserver: prompt usage recorded  trace=8bb5a1d8eb10526cc46ad90f191bb468  span=da51887a757226fc  record_type=prompt_usage  interception_id=97461880-4a6c-47c1-8292-3588dd715312  msg_id=msg_01VJH1rYKspfun8BW29CrYEu  prompt="list the jmia share price"  created_at="2026-01-13T15:50:26.255299811Z"  metadata={}
[API] 2026-01-13 15:50:26.268 [info]  coderd.aibridgedserver: interception ended  trace=8bb5a1d8eb10526cc46ad90f191bb468  span=3fa25397705ee7c9  record_type=interception_end  interception_id=97461880-4a6c-47c1-8292-3588dd715312  ended_at="2026-01-13T15:50:26.25555547Z"
[API] 2026-01-13 15:50:26.269 [info]  coderd.aibridgedserver: tool usage recorded  trace=8bb5a1d8eb10526cc46ad90f191bb468  span=b54af90afc604d29  record_type=tool_usage  interception_id=97461880-4a6c-47c1-8292-3588dd715312  msg_id=msg_01VJH1rYKspfun8BW29CrYEu  tool=mcp__stonks__getStockPriceSnapshot  input="{\"ticker\":\"JMIA\"}"  server_url=""  injected=false  invocation_error=""  created_at="2026-01-13T15:50:26.255164652Z"  metadata={}
```

Structured logging is only enabled when
`CODER_AIBRIDGE_STRUCTURED_LOGGING=true`.

---------

Signed-off-by: Danny Kopping <danny@coder.com>
2026-01-14 17:26:08 +02:00
blinkagent[bot] b3a81be1aa fix(coderd/database): remove hardcoded public schema from migration 000401 (#21493) 2026-01-14 05:40:30 +02:00
Susana Ferreira 74b6d12a8a feat: implement selective MITM with configurable domain allowlist in aibridgeproxyd (#21473)
## Description

Implements selective MITM (Man-in-the-Middle) in `aibridgeproxyd` so
that only requests to allowlisted domains are intercepted and decrypted.
Requests to all other domains are tunneled directly without decryption.

## Changes

* New config option: `CODER_AIBRIDGE_PROXY_DOMAIN_ALLOWLIST` (default:
`api.anthropic.com`,`api.openai.com`)
* Selective MITM: Uses `goproxy.ReqHostIs()` to only intercept `CONNECT`
requests to allowlisted hosts
* Certificate caching: Now only generates/caches certificates for
allowlisted domains
* Validation: Startup fails if domain allowlist is empty or contains
invalid entries

Closes: https://github.com/coder/internal/issues/1182
2026-01-13 11:30:51 +00:00
Cian Johnston 64e7a77983 feat: add user_agent to loggermw (#21485)
Adds the `user_agent` field to `httpmw/loggermw`.
2026-01-13 10:50:01 +00:00
Danny Kopping 49a42eff5c feat: make database connection pool size configurable (#21403)
Closes https://github.com/coder/coder/issues/21360

A few considerations/notes:
- I've kept the number of conns to 10 in all other places, except coderd
- which uses the config value
- I opted to also make idle conns configurable; the greater the delta
between max open and max idle, the more connection churn
- Postgres maintains a [_process_ per
connection](https://www.postgresql.org/docs/current/connect-estab.html),
contrary to what the comment said previously
- Operators should be able to tune this, since process churn can
negatively affect OS scheduling
- I've set the value to `"auto"` by default so it's not another knob one
_has to_ twiddle, and sets max idle = max conns / 3

---------

Signed-off-by: Danny Kopping <danny@coder.com>
2026-01-13 10:50:57 +02:00
George K cc2efe9e1f feat(coderd/rbac): make organization-member a per-org system custom role (#21359)
Migrated the built-in organization-member role to DB storage so it can be customized per org.

Closes https://github.com/coder/internal/issues/1073 (part 1)
2026-01-12 18:19:19 -08:00
Kacper Sawicki 6ca70d3618 feat(cli): add --no-build flag to state push for state-only updates (#21374)
## Summary

Adds a `--no-build` flag to `coder state push` that updates the
Terraform state directly without triggering a workspace build.

## Use Case

This enables state-only migrations, such as migrating Kubernetes
resources from deprecated types (e.g., `kubernetes_config_map`) to
versioned types (e.g., `kubernetes_config_map_v1`):

```bash
coder state pull my-workspace > state.json
terraform init
terraform state rm -state=state.json kubernetes_config_map.example
terraform import -state=state.json kubernetes_config_map_v1.example default/example
coder state push --no-build my-workspace state.json
```

## Changes

- Add `PUT /api/v2/workspacebuilds/{id}/state` endpoint to update state
without triggering a build
- Add `UpdateWorkspaceBuildState` SDK method
- Add `--no-build`/`-n` flag to `coder state push`
- Add confirmation prompt (can be skipped with `--yes`/`-y`) since this
is a potentially dangerous operation
- Add test for `--no-build` functionality

Fixes #21336
2026-01-12 15:16:59 +01:00
Zach 091d31224d fix: replace moby/moby namesgenerator with internal implementation (#21377)
Replace the external moby/moby/pkg/namesgenerator dependency with an
internal implementation using gofakeit/v7. The moby package has ~25k
unique name combinations, and with its retry parameter only adds a
random digit 0-9, giving ~250k possibilities. In parallel tests, this
has led to collisions (flakes).

The new internal API at coderd/util/namesgenerator eliminates the
external dependnecy and offers functions with explicit uniqueness
guarantees. This PR also consolidates fragmented name generation in a
few places to use the new package.

| Old (moby/moby)                     | New                    |
|-------------------------------------|------------------------|
| namesgenerator.GetRandomName(0)     | NameWith("_")          |
| namesgenerator.GetRandomName(>0)    | NameDigitWith("_")     |
| testutil.GetRandomName(t)           | UniqueName()           |
| testutil.GetRandomNameHyphenated(t) | UniqueNameWith("-")    |

namesgenerator package API:
- NameWith(delim): random name, not unique
- NameDigitWith(delim): random name with 1-9 suffix, not unique
- UniqueName(): guaranteed unique via atomic counter
- UniqueNameWith(delim): unique with custom delimiter

Names continue to be docker style `[adjective][delim][surname]`. Unique
names are truncated to 32 characters (preserving the numeric suffix) to
fit common name length limits in Coder.

Related test flakes:
https://github.com/coder/internal/issues/1212
https://github.com/coder/internal/issues/118
https://github.com/coder/internal/issues/1068
2026-01-09 15:40:26 -07:00
Steven Masley 60b3fd0783 chore!: send modules archive over the proto messages (#21398)
# What this does

Dynamic parameters caches the `./terraform/modules` directory for parameter usage. What this PR does is send over this archive to the provisioner when building workspaces.

This allow terraform to skip downloading modules from their registries, a step that takes seconds. 

<img width="1223" height="429" alt="Screenshot From 2025-12-29 12-57-52" src="https://github.com/user-attachments/assets/16066e0a-ac79-4296-819d-924f4b0418dc" />


# Wire protocol

The wire protocol reuses the same mechanism used to download the modules `provisoner -> coder`. It splits up large archives into multiple protobuf messages so larger archives can be sent under the message size limit.

# 🚨  Behavior Change (Breaking Change) 🚨 

**Before this PR** modules were downloaded on every workspace build. This means unpinned modules always fetched the latest version

**After this PR** modules are cached at template import time, and their versions are effectively pinned for all subsequent workspace builds.
2026-01-09 11:33:34 -06:00
Steven Masley d2044c2ee9 chore: update protobuf to reuse file request (#21447)
**This is just the protobuf changes for the PR https://github.com/coder/coder/pull/21398**

Moved `UploadFileRequest` from `provisionerd.proto` -> `provisioner.proto`.

Renamed to `FileUpload` because it is now bi-directional.

This **is backwards compatible**. I tested it to confirm the payloads are identical.  Types were just renamed and moved around.

```golang
func TestTypeUpgrade(t *testing.T) {
	t.Parallel()

	x := &proto2.UploadFileRequest{
		Type: &proto2.UploadFileRequest_ChunkPiece{
			ChunkPiece: &proto.ChunkPiece{
				Data:         []byte("Hello World!"),
				FullDataHash: []byte("Foobar"),
				PieceIndex:   42,
			},
		},
	}

	data, err := protobuf.Marshal(x)
	require.NoError(t, err)

	// Exactly the same output
	// EhgKDEhlbGxvIFdvcmxkIRIGRm9vYmFyGCo= on `main`
	// EhgKDEhlbGxvIFdvcmxkIRIGRm9vYmFyGCo= on this branch
	fmt.Println(base64.StdEncoding.EncodeToString(data))
}
```


# What this does

This allows provisioner daemons to download files from `coderd`'s `files` table. This is used to send over cached module files and prevent the need of downloading these modules on each workspace build.
2026-01-09 11:23:32 -06:00
Steven Masley 89f4d60e7b chore: remove experiment "terraform-directory-reuse" (#21397)
Experiment is no longer required, the new method will be released without an experiment and without a toggle

Main PR is: https://github.com/coder/coder/pull/21398
2026-01-09 11:13:16 -06:00
Cian Johnston b116d22c5f chore: manage tool versions in go.mod (#21455)
Go 1.24 adds [tool
dependencies](https://go.dev/doc/modules/managing-dependencies#tools).
This allows us to track versions of tools in our `go.mod` instead of
sprinkling various `go run` commands throughout our codebase.

NOTE: there are still various hard-coded `go install` commands in our
dogfood Dockerfile. As that list is likely severely outdated, will leave
that for a separate PR.
2026-01-08 16:25:28 +00:00
Spike Curtis bddb808b25 chore: arrange imports in a standard way (#21452)
Fixes all our Go file imports to match the preferred spec that we've _mostly_ been using. For example:

```
import (
	"context"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"golang.org/x/xerrors"
	"gopkg.in/natefinch/lumberjack.v2"

	"cdr.dev/slog/v3"
	"github.com/coder/coder/v2/codersdk/agentsdk"
	"github.com/coder/serpent"
)
```

3 groups: standard library, 3rd partly libs, Coder libs.

This PR makes the change across the codebase. The PR in the stack above modifies our formatting to maintain this state of affairs, and is a separate PR so it's possible to review that one in detail.
2026-01-08 15:24:11 +04:00
Cian Johnston 0f446f99dd feat(cli): add logs cmd (#21430)
This PR adds a command to view the provisioner and agent logs for a
given workspace.
Note: I did investigate using the existing `cliui` methods to tail the
logs but they are tailored to a very specific use-case.

Other changes:
- Adds `Agents` to `dbfake.WorkspaceResponse`
- Adds methods to generate provisioner and agent logs in `dbgen`

---------

Co-authored-by: Steven Masley <Emyrk@users.noreply.github.com>
2026-01-08 09:58:10 +00:00
Spike Curtis 49b34a716a fix: fix slog to always use array of Fields (#21426)
Upgrades to slog v3 which includes a small, but backward incompatible API change to the acceptible call arguments when logging. This change allows us to verify via compile time type checking that arguments are correct and won't cause a panic, as was possible in slog v1, which this replaces (v2 was tagged but never used in coder/coder).

It also updates dependencies that also use slog and were updated.

I've left the `aibridge` dependency as a commit SHA, under the assumption that the team there (cc @pawbana @dannykopping ) will tag and update the dependency soon and on their own schedule.

Other dependencies, I pushed new tags.
2026-01-08 10:29:41 +04:00
Spike Curtis 41a966c284 fix: sort latest key by sequence correctly (#21425)
Fixes an issue where we will not correctly return the latest key by sequence number if the fetch returns them in a order where the latest key is not last. The db query uses `ORDER BY sequence DESC` it is likely we have been operating incorrectly.

Adds a second key to one of the test cases which fails without this fix.

Also includes some debug logging statements I found helpful while chasing key rotation issues.
2026-01-06 14:01:51 +04:00
Asher 4a97df3768 chore: rename flag to disable template insights (#21329)
Because this affects more than just the template insights
page (specifically it also affects the deployment stats endpoint which
is shown on bottom bar and Prometheus), the group is being renamed
generically to just "stats collection".  In the future if we need to
affect the other stats we can put those options here.

Then, because this change only affects a portion of stats, specifically
usage stats like connection and application time, bytes sent, etc, add a
new sub-group called "usage stats".

Then finally add back the "enable" flag.  This also gives us a place to
one day place an "anonymize" flag if we need to go that route.
2026-01-05 11:44:06 -09:00
George K e10fceb23c fix(coderd/database): allow same custom role name for different orgs (#21312)
Previously the `idx_custom_roles_name_lower` index prevented that.

A check constraint was also added to ensure the `organization_id` column cannot be set to the all-zero UUID.
2026-01-05 07:43:08 -08:00
Zach 07924037e7 feat: add boundary log forwarding from agent to coderd (#21345)
Add agent forwarding of boundary audit logs from workspaces to coderd
via agent API, and re-emission of boundary logs to coderd stderr. This
change adds a server to the workspace agent that always listens on a
unix socket for boundary to connect and send audit logs.

coderd log format example:
```
[API] 2025-12-23 18:31:46.755 [info] coderd.agentrpc: boundary_request owner=.. workspace_name=.. agent_name=.. decision=.. workspace_id=.. http_method=.. http_url=.. event_time=.. request_id=..
```

Corresponding boundary PR: https://github.com/coder/boundary/pull/124
RFC:
https://www.notion.so/coderhq/Agent-Boundary-Logs-2afd579be59280f29629fc9823ac41ba
https://github.com/coder/coder/issues/21280
2025-12-31 16:38:19 -07:00
Danny Kopping 733b6b7db9 feat: add API to serve proxy certificate (#21391)
Closes https://github.com/coder/internal/issues/1184
2025-12-29 18:00:06 +00:00
Susana Ferreira b97572285a feat: add core AI MITM proxy daemon (#21296)
## Description

Adds the core AI Bridge MITM proxy daemon. This proxy intercepts HTTPS traffic, decrypts it using a configured CA certificate, and forwards requests to AIBridge for processing.

## Changes

* Added `aibridgeproxyd` package with the core proxy server implementation
* Added configuration options: `CODER_AIBRIDGE_PROXY_ENABLED`, `CODER_AIBRIDGE_PROXY_LISTEN_ADDR`, `CODER_AIBRIDGE_PROXY_CERT_FILE`, `CODER_AIBRIDGE_PROXY_KEY_FILE`
* Added tests for server initialization and MITM functionality

Closes https://github.com/coder/internal/issues/1180
2025-12-29 15:31:51 +00:00
Danielle Maywood 5655760f1d test: use deterministic time to avoid time-based flake (#21396)
Use deterministic time to avoid time-based flake test failure.
2025-12-29 14:25:14 +00:00
Danielle Maywood 05529139bc feat(coderd): support deleting dev containers (#21248)
Add an endpoint to coderd to support deleting dev containers
2025-12-24 12:34:39 +00:00
Danielle Maywood 44a46db487 feat(agent): support deleting dev containers (#21247)
Add logic to the agent, and an endpoint, to allow requesting and then deleting a Dev Container and its related agent.
2025-12-22 11:28:31 +00:00
Spike Curtis 6238065185 test: use not before in TestAgentConnectionMonitor_* (#21332)
fixes https://github.com/coder/internal/issues/1203

The matcher I wrote for TestAgentConnectionMonitor tested that `last_disconnected_at` was strictly _after_ the start of the test to ensure it was updated.

This is too strict of a test because Windows in particular doesn't have high-resolution timers, so it's entirely possible to get the exact same timestamp from subsequent calls to `time.Now()`. This PR switches the test to _not before_ to cover this case. The results are just as valid because we always initialize the `last_disconnected_at` to something well before the test starts.
2025-12-22 10:21:39 +04:00
Zach 9d1493a13a feat: add initial API for boundary log forwarding to coderd (#21293)
Add the AgentAPI changes to support the feature that transmits boundary
logs from workspaces to coderd via the agent API for eventual re-emission to
stderr. The API handlers are stubs for now because I'm trying to land
this feature from multiple smaller PRs.

High level architecture:
- Boundary records resource access in batches and sends proto message to
agent
- Agent proxies messages to coderd **(captured by the API changes in
this PR)**
- coderd re-emits logs to stderr

RFC:
https://www.notion.so/coderhq/Agent-Boundary-Logs-2afd579be59280f29629fc9823ac41ba
2025-12-19 10:41:39 -07:00
Jake Howell ea00e72063 feat: add rbac specificity for dbpurge (#21088)
Related to
[`internal#1139`](https://github.com/coder/internal/issues/1139)

Continuation of #21074 

This implements some RBAC role specificity for `dbpurge`, ensuring that
we follow the least-privileged model for removing data from the
database. It is specified as following.

```go
Site: rbac.Permissions(map[string][]policy.Action{
	// DeleteOldWorkspaceAgentLogs
	// DeleteOldWorkspaceAgentStats
	// DeleteOldProvisionerDaemons
	// DeleteOldTelemetryLocks
	// DeleteOldAuditLogConnectionEvents
	// DeleteOldConnectionLogs
	rbac.ResourceSystem.Type: {policy.ActionDelete},
	// DeleteOldNotificationMessages
	rbac.ResourceNotificationMessage.Type: {policy.ActionDelete},
	// ExpirePrebuildsAPIKeys
	// DeleteExpiredAPIKeys
	rbac.ResourceApiKey.Type: {policy.ActionDelete},
	// DeleteOldAIBridgeRecords
	rbac.ResourceAibridgeInterception.Type: {policy.ActionDelete},
}),
```

| Position | Pull-request |
| -------- | ------------ |
| | [feat: add prometheus observability metrics for
`dbpurge`](https://github.com/coder/coder/pull/21074) |
|  | [feat: add rbac specificity for
`dbpurge`](https://github.com/coder/coder/pull/21088) |
2025-12-20 01:02:39 +11:00
Jake Howell 00793cc0b5 feat: add prometheus observability metrics for dbpurge (#21074)
Related to
[`internal#1139`](https://github.com/coder/internal/issues/1139)

This implements some prometheus metrics for records being removed from
the database. Currently we're tracking the following fields being
removed from the DB by this. They're viewable in the
`/api/v2/debug/metrics` endpoint.

* `expired_api_keys`
* `aibridge_records`
* `connection_logs`
* `duration`

```
# HELP coderd_dbpurge_iteration_duration_seconds Duration of each dbpurge iteration in seconds.
# TYPE coderd_dbpurge_iteration_duration_seconds histogram
coderd_dbpurge_iteration_duration_seconds_bucket{success="true",le="1"} 1
coderd_dbpurge_iteration_duration_seconds_bucket{success="true",le="5"} 1
coderd_dbpurge_iteration_duration_seconds_bucket{success="true",le="10"} 1
coderd_dbpurge_iteration_duration_seconds_bucket{success="true",le="30"} 1
coderd_dbpurge_iteration_duration_seconds_bucket{success="true",le="60"} 1
coderd_dbpurge_iteration_duration_seconds_bucket{success="true",le="300"} 1
coderd_dbpurge_iteration_duration_seconds_bucket{success="true",le="600"} 1
coderd_dbpurge_iteration_duration_seconds_bucket{success="true",le="+Inf"} 1
coderd_dbpurge_iteration_duration_seconds_sum{success="true"} 0.014787814
coderd_dbpurge_iteration_duration_seconds_count{success="true"} 1
# HELP coderd_dbpurge_records_purged_total Total number of records purged by type.
# TYPE coderd_dbpurge_records_purged_total counter
coderd_dbpurge_records_purged_total{record_type="aibridge_records"} 0
coderd_dbpurge_records_purged_total{record_type="audit_logs"} 0
coderd_dbpurge_records_purged_total{record_type="connection_logs"} 0
coderd_dbpurge_records_purged_total{record_type="expired_api_keys"} 0
coderd_dbpurge_records_purged_total{record_type="workspace_agent_logs"} 0
```

| Position | Pull-request |
| -------- | ------------ |
|  | [feat: add prometheus observability metrics for
`dbpurge`](https://github.com/coder/coder/pull/21074) |
| | [feat: add rbac specificity for
`dbpurge`](https://github.com/coder/coder/pull/21088) |
2025-12-20 00:20:57 +11:00
Cian Johnston 8248fa3b84 fix(coderd): wake dormant workspace when attempting to start it (#21306)
Relates to #20925

This PR modifies the `postWorkspaceBuild` handler to automatically unset
dormancy on a workspace when a start transition is requested.
Previously, the client was responsible for unsetting the dormancy on the
workspace prior to posting a workspace build.
2025-12-18 10:35:04 +00:00
Spike Curtis c5fc6defb8 fix: report correct request paths from workspace proxy metrics (#21302)
I noticed while looking at scale test metrics that we don't always
report a useful path in the API request metrics.


![image.png](https://app.graphite.com/user-attachments/assets/a5b0dadf-9c2f-46a8-a6c1-3ad5f6201edb.png)

There are a lot of requests with path `/*`. I chased this problem to the
workspace proxy, where we mount a the proxy router as a child of a
"root" router to support some high level endpoints like `latency-check`.

Because we query the path from the Chi route context in the prometheus
middleware _before_ the request is actually handled, we can have a
partially resolved pattern match only corresponding to the root router.
The fix is to always re-resolve the path, rather than accept a partially
resolved path.
2025-12-17 21:08:40 +04:00
Spike Curtis bd753d9cb9 fix: mark users seen when activating on login (#21305)
fixes #21303

Update user last_seen_at when we mark them active on login. This prevents a narrow race where they can be re-marked dormant and fail to log in.
2025-12-17 16:49:40 +04:00
Mathias Fredriksson dac822b7f4 refactor: remove deprecated AITaskPromptParameterName constant (#21023)
This removes the deprecated AITaskPromptParameterName constant and all
backward compatibility code that was added for v2.28.

- Remove AITaskPromptParameterName constant from codersdk/aitasks.go
- Remove backward compatibility code in coderd/aitasks.go that populated
  the "AI Prompt" parameter for templates that defined it
- Remove the backward compatibility test (OK AIPromptBackCompat)
- Update dbfake to no longer set the AI Prompt parameter
- Remove AITaskPromptParameterName from frontend TypeScript types
- Remove preset prompt read-only feature from TaskPrompt component
- Update docs to reflect that pre-2.28 definition is no longer supported

Task prompts are now exclusively stored in the tasks.prompt database
column, as introduced in the migration that added the tasks table.
2025-12-16 15:14:59 +00:00
Asher 871ed128aa chore: update azure certs (#21265) 2025-12-15 13:44:44 -09:00