coder

mirror of https://github.com/coder/coder.git synced 2026-06-03 13:08:25 +00:00

Author	SHA1	Message	Date
Spike Curtis	b49344519b	test: batch 06 of refactoring CLI tests not to use PTY (#25990 ) Part of [coder/internal#1400](https://github.com/coder/internal/issues/1400) Batch of refactored CLI tests to avoid creating PTYs.	2026-06-02 15:44:36 -04:00
Spike Curtis	cbaea49c02	test: batch 04 of refactoring CLI tests not to use PTY (#25937 ) Part of [coder/internal#1400](https://github.com/coder/internal/issues/1400) Batch of refactored CLI tests to avoid creating PTYs.	2026-06-02 10:50:27 -04:00
Zach	170c33a475	feat: encrypt gitsshkeys.private_key at rest via dbcrypt (#25872 ) Adds an optional dbcrypt wrapper around gitsshkeys.private_key. The column is encrypted on insert and update through enterprise/dbcrypt when external token encryption is configured, and decrypted on read. A new private_key_key_id column references dbcrypt_keys(active_key_digest) so revocation safety is enforced by the existing foreign key. Rows with a NULL key_id stay plaintext and remain readable. Existing plaintext rows can be backfilled by running `coder server dbcrypt rotate`. Generated with assistance from Coder Agents.	2026-06-02 08:36:01 -06:00
Spike Curtis	93b067f5f2	test: batch 03 of refactoring CLI tests not to use PTY (#25935 ) Part of [coder/internal#1400](https://github.com/coder/internal/issues/1400) Batch of refactored CLI tests to avoid creating PTYs.	2026-06-02 08:05:26 -04:00
Spike Curtis	bfa6ce32a6	test: batch 02 of refactoring CLI tests not to use PTY (#25931 ) Part of [coder/internal#1400](https://github.com/coder/internal/issues/1400) Batch of refactored CLI tests to avoid creating PTYs.	2026-06-02 07:53:24 -04:00
Thomas Kosiewski	f6a4ed309f	ci: fix Windows runner PATH casing for mise, not in cli (#25972 ) Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-02 10:46:40 +00:00
Mathias Fredriksson	ed4311b2cb	ci: add Git usr/bin to PATH on Windows (#25939 ) ## Summary Fixes all 9 Windows CI test failures caused by the mise CI refactor (`fe257666d7`, PR #25727). ### Root cause `jdx/mise-action` exports `Path` (Windows convention) via `GITHUB_ENV`. Bash on Windows maintains its own `PATH`. When Go's `os.Environ()` returns both, `cmd.exe` subprocesses non-deterministically pick the MSYS-translated `PATH` (forward slashes), causing Windows executables (`printf`, `powershell.exe`, `cmd.exe`) to be unresolvable. These failures only appeared on `main` (where `-count=1` forces real test execution) and were masked on PRs by Go test cache. ### Fixes applied CI (`setup-mise` action): - Write both `Path` and `PATH` to `GITHUB_ENV` with Git usr/bin prepended Code (`cli/root.go`): - Add `appendAndDedupEnv` helper that deduplicates case-insensitive env vars on Windows, preferring native Windows paths (backslashes) over MSYS paths Code (`cli/configssh_windows.go`): - Use absolute paths for `powershell.exe` and `cmd.exe` in the SSH config `Match exec` escape function, avoiding PATH resolution entirely Tests: - Switch `--header-command` tests from `printf` to `echo` (cmd.exe builtin) for reliable cross-platform execution - Add env dedup in `Test_sshConfigMatchExecEscape` for subprocess PATH consistency Fixes coder/internal#1556, coder/internal#1558, coder/internal#1559 > 🤖 Generated by Coder agent, will be reviewed by @mafredri. 🏂🏻	2026-06-02 11:51:16 +10:00
Danny Kopping	c8555e2163	fix: deprecate ai provider seeding env config (#25854) Environment variables used to configure AI Gateway providers are now deprecated and need to be marked as such.	2026-06-01 15:15:47 +02:00
Mathias Fredriksson	6ecf804896	test(cli): eliminate race in PausedDuringWaitForReady test (#25858 ) The PausedDuringWaitForReady and WaitsForWorkingAppState tests flaked because the quartz resetTrap was released immediately after catching ticker.Reset (line 174), allowing client.TaskByID (line 175) to race with the subsequent DB mutation (pauseTask / PatchAppStatus). Fix: keep the resetTrap open across both poll iterations. On the first poll, release the trap so the goroutine sees the initial state and continues. On the second poll, hold the goroutine frozen at ticker.Reset while mutating state. Then release; client.TaskByID deterministically sees the mutated state. No race because the goroutine cannot execute client.TaskByID while trapped. Closes CODAGT-482	2026-06-01 13:58:57 +03:00
Spike Curtis	3a727a9087	test: batch 01 of refactoring CLI tests not to use PTY (#25871 ) Part of https://github.com/coder/internal/issues/1400 Batch of refactored CLI tests to avoid creating PTYs.	2026-05-29 20:12:52 +00:00
Spike Curtis	8a47b7fa14	test: batch 00 of refactoring CLI tests not to use PTY (#25868 ) Part of https://github.com/coder/internal/issues/1400 Batch of refactored CLI tests to avoid creating PTYs.	2026-05-29 15:33:45 -04:00
Mathias Fredriksson	2af037ce02	fix(cli): use quartz mock clock in PausedDuringWaitForReady test (#25811 ) PausedDuringWaitForReady used the real clock, so the 5s poll in waitForTaskIdle could race with an in-flight stop build. The SQL view (tasks_with_status) returns "unknown" for stop builds with job_status != "succeeded" because the build_status CASE has no branch for (stop, pending) or (stop, running). On macOS CI, where the provisioner is slower, the poll fires during this transient window and hits the TaskStatusUnknown case instead of TaskStatusPaused, failing with "task entered unknown state" rather than the expected "was paused". Convert to the same quartz mock clock pattern that PR #25648 applied to WaitsForWorkingAppState: inject a mock clock via NewWithClock, trap ticker creation and reset, then advance time deterministically so the poll fires after the stop build completes. Closes CODAGT-482	2026-05-29 11:25:06 +03:00
Danny Kopping	5b10268827	feat: serve 503 sentinel for disabled providers (#25794 ) _Disclosure: created with Coder Agents._ When providers are disabled, we should serve a sentinel error so the requesting client (Claude Code, Coder Agents, etc) is informed. Coder Agents can also conditionalize its display to show a helpful error message. --------- Signed-off-by: Danny Kopping <danny@coder.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-29 10:24:16 +02:00
Spike Curtis	ee4126e913	test: refactor CLI create tests not to use PTY (#25807) Part of https://github.com/coder/internal/issues/1400 Refactors CLI tests of the `create` command as the first batch of tests refactored to take a PTY out of the loop. One interesting difference I noticed between a PTY and a direct pipe to standard input is that on the PTY we write `\r` to enter some input, but the kernel actually sends `\n` (or maybe `\r\n`) to the process, at least on Unix. (On Windows we sent `\r\n` into the PTY.) This is reflected in the implementation of the `Writer`, which is otherwise mostly inspired by the PTYTest equivalents.	2026-05-28 17:50:37 -04:00
Danny Kopping	12520ee964	feat: add ai provider status and reload freshness metrics (#25770 ) Add metrics for `aibridged` and `aibridgeproxyd`'s provider statuses. AI providers can be modified, and possibly misconfigured, at runtime. These metrics help operators understand the state of these provider definitions in case unexpected behaviour is observed.	2026-05-28 14:57:33 +02:00
Danny Kopping	2770bdc9d1	feat: route extra ai_provider_types through OpenAI and Anthropic providers (#25722) _Disclosure: produced with Claude Opus 4.7_ AI Gateway only supports Anthropic (+Bedrock), OpenAI, and Copilot providers at present. All other types (Vercel, Gemini, etc.) will be mapped to OpenAI since they support OpenAI-compatible endpoints.	2026-05-27 16:16:05 +02:00
Max Schwenk	ae492495ee	fix(cli): show ready sync start dependencies (#25546 ) ## Problem Follow-on to: - https://github.com/coder/coder/pull/25089 `coder exp sync start` still printed a generic success message when the unit was ready on the first status check. That hid whether the unit had no dependencies or had dependencies that were already satisfied before `sync start` ran. Before: ```text Success ``` ## Solution Print explicit startup output for both ready-at-first-check cases. After, dependencies already satisfied: ```text Unit "test-unit" started immediately, dependencies already satisfied: [dep-unit, dep-unit-2] ``` After, no dependencies: ```text Unit "test-unit" started with no dependencies ``` The existing waiting path is unchanged and still reports the dependencies while waiting and after waiting finishes. Co-authored-by: Sas Swart <sas.swart.cdk@gmail.com>	2026-05-27 12:33:39 +02:00
Danny Kopping	79e007cf30	feat: hot-reload aibridged and aibridgeproxyd providers on DB changes (#25673 ) Previously the in-process aibridge daemon and the enterprise aibridgeproxy daemon both snapshotted their provider routing once at boot. Any `ai_providers` or `ai_provider_keys` mutation required a restart for either to pick it up. Add an `ai_providers_changed` pubsub channel that the CRUD handlers publish on after Create / Update / Delete. Both daemons subscribe: - aibridged rebuilds its `[]aibridge.Provider` snapshot via `BuildProviders` and swaps it into the pool atomically. Inflight requests keep serving against the bridge they already acquired; new acquires build against the new snapshot. Per-provider construction errors stay scoped to the offending row. - aibridgeproxyd rebuilds its routing snapshot from `GetAIProviders` and swaps the host→provider map atomically. The MITM listener picks up new providers without restart. DB read for aibridgeproxyd uses the existing `AsAIProviderMetadataReader` subject for routing-only access.	2026-05-27 11:58:43 +02:00
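The atomic snapshot swap described in the commit above can be illustrated with a small sketch. The `provider` and `pool` types and their methods are hypothetical stand-ins, not the `aibridge` API; the point is only that in-flight holders keep the snapshot they acquired while new acquires see the replacement.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// provider is a hypothetical stand-in for a routing entry.
type provider struct {
	Name    string
	BaseURL string
}

// pool holds an immutable snapshot that is replaced wholesale on reload.
type pool struct {
	snapshot atomic.Pointer[[]provider]
}

// Swap publishes a new snapshot. The slice is copied so later mutations
// by the caller cannot leak into readers.
func (p *pool) Swap(next []provider) {
	cp := append([]provider(nil), next...)
	p.snapshot.Store(&cp)
}

// Acquire returns the snapshot that is current at call time. Callers keep
// using it even if Swap runs afterwards.
func (p *pool) Acquire() []provider {
	if s := p.snapshot.Load(); s != nil {
		return *s
	}
	return nil
}

func main() {
	var p pool
	p.Swap([]provider{{Name: "openai", BaseURL: "https://api.openai.com"}})

	inflight := p.Acquire() // a request that started before the reload

	// Simulates a reload triggered by a change notification.
	p.Swap([]provider{{Name: "anthropic", BaseURL: "https://api.anthropic.com"}})

	fmt.Println("in-flight request sees:", inflight[0].Name)
	fmt.Println("new request sees:", p.Acquire()[0].Name)
}
```

Replacing the whole snapshot behind one pointer avoids locking on the read path, at the cost of rebuilding the full list on every change.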
Ethan	e91bec8574	fix(cli): close aibridge daemon before WebSocket shutdown wait (#25719 ) > [!WARNING] > The investigation and solution in this PR were done with [Mux](https://mux.coder.com/). I've reviewed the investigation methodology, evidence and solution, and it all appears sound. ## Summary PR #25570 (`refactor: move aibridged out of enterprise to AGPL`, merged 2026-05-22) added an in-memory aibridge DRPC server in `coderd/aibridged.go` that does `api.WebsocketWaitGroup.Add(1)` and only releases `Done()` when its client session is closed. PR #25575 then flipped `CODER_AI_GATEWAY_ENABLED` to default to `true`, so every `cli.Server()` invocation now spins up that goroutine. In `cli/server.go`, the only call to `aibridgeDaemon.Close()` was a `defer` scheduled at function return. During graceful shutdown the code first calls `coderAPICloser.Close()`, which waits on `api.WebsocketWaitGroup`. That wait sits for the full 10s timeout in `coderd/coderd.go` (`websocket shutdown timed out after 10 seconds`), then returns, then the function unwinds, and only then does the deferred `aibridgeDaemon.Close()` fire and let the goroutine call `Done()`. The 10s tax was previously latent (aibridged was enterprise-only and opt-in). After the two May 22 PRs it hit every `cli.Server()` test. On Linux/macOS CI it just makes the suite slower; on the Depot Windows runner, the ramdisk reservation leaves only ~17 GiB of headroom and the ~10s shutdown tails of multiple concurrent package binaries overlap into an OOM, presenting as `test-go-pg (windows-2022)` jobs that die silently at the ~600s watchdog with an empty `steps` array. See Slack: https://codercom.slack.com/archives/C05AE94121Z/p1779807717764189 ## Fix Close `aibridgeDaemon` explicitly during graceful shutdown, before `coderAPICloser.Close()` waits on the WebSocket wait group. This matches the existing ordered-shutdown pattern used for `tunnel` and `notificationsManager`. 
The deferred `aibridgeDaemon.Close()` is retained as a safety net for early-return paths, and is safe to double-call because `aibridged.Server.Close()` is already idempotent via `shutdownOnce` in `coderd/aibridged/aibridged.go`. ## Regression test `TestServer_AIGatewayShutdownOrdering` boots a real `coder server` with `--ai-gateway-enabled=true`, cancels its context, and asserts graceful shutdown finishes in under 8s. With the fix the test runs in ~0.1s; without the fix it fails deterministically at ~10.0s. The flag is passed explicitly so the test continues to guard the ordering even if the deployment default is ever flipped back. ## Evidence this fixes the OOM On Linux the patched `cli` test package drops from 114 s back to its pre-regression 30 s wall time at the same single-process peak RSS (~7.6 GiB), and the `websocket shutdown timed out after 10 seconds` log line disappears from every server-test run. Since the Windows OOM is the sum of multiple concurrent 10 s shutdown tails overlapping past the runner's ~17 GiB headroom, removing those tails returns the concurrent-RSS budget to its pre-regression level. The Windows OOM was intermittent (a handful of hits across many runs since May 22), so a single green `test-go-pg (windows-2022)` job on this PR is not by itself proof; confirmation will come from watching Windows runs on `main` over the next several days and seeing the ~600 s silent-kill fingerprint stop recurring. Relates to ENG-2771	2026-05-27 17:33:14 +10:00
Michael Suchacz	8b1705eb65	feat: route chatd provider traffic through aibridge (#25629 ) ## Summary Routes chatd model calls backed by concrete AI Provider rows through the in-process aibridge transport by default, with deployment options to use direct provider routing when AI Gateway is disabled or chat AI Gateway routing is disabled. - Splits model routing into common, direct provider, and AI Gateway paths behind a single deployment-mode entry point. - Builds chatd models through explicit request, route, and options data. Active API key attribution is passed explicitly instead of being hidden inside generic model construction. - For AI Gateway BYOK routes, resolves the user's provider key in chatd, forwards it through provider-specific auth headers, and sets `X-Coder-AI-Governance-Token` to the `delegated` marker so aibridge preserves those headers while still stripping Coder-specific metadata. - Keeps central provider credentials and deployment fallback credentials out of forwarded provider auth headers, so AI Gateway central policy remains authoritative. - Redacts delegated provider auth from default string formatting to avoid accidental plaintext logging of user BYOK credentials. - Covers selected chat models, advisor overrides, title and quickgen paths, subagent overrides, computer use model selection, and an integration-style chat turn through the aibridge transport path. - Persists initiating API key IDs on chat and queued user messages, including subagent child messages, and fails closed for AI Gateway-routed model builds without an active key. - Removes unused `api_key_id` indexes while keeping the persistence columns and foreign keys. - Keeps the deployment option available through config and env parsing, but hides it from CLI help and generated docs. - Stabilizes the subagent poll fallback test so background CreateChat processing cannot win the state transition under slower CI environments. 
## Tests - `go test ./coderd/x/chatd -run 'TestAIGatewayProviderAuthForUser\|TestAIGatewayProviderAuthRedactsFormatting\|TestResolveModelRouteForConfigAIGatewayProviderAuth\|TestAIGatewayModelForwardsProviderAuth\|TestProcessChat_AIGatewayRoutingUsesDelegatedAPIKey\|TestAwaitSubagentCompletion' -count=1` - `go test ./coderd/aibridged -run 'TestServeHTTP_DelegatedAPIKey\|TestServeHTTP_StripCoderToken' -count=1` - `git diff --check HEAD~1..HEAD` - `make lint` > Mux working on behalf of Mike.	2026-05-26 19:31:52 +00:00
Danny Kopping	a56c88a0cc	fix: run AI provider seed and build after newAPI so dbcrypt applies (#25699 ) ## Problem Two related symptoms of the same architectural issue: the `dbcrypt` wrapper is installed inside `enterprise/coderd.New`, so any access to `options.Database` that happens before `newAPI` runs bypasses encryption. Symptom 1 (reads): Provider keys added via the admin UI are encrypted at rest. `BuildProviders` was running before `newAPI`, against the unwrapped store, so the ciphertext was read as-is and shoved into the keypool as the upstream credential. Anthropic/OpenAI reject it, and the interception log shows: ``` coderd.aibridged.pool: interception failed ... error="all configured keys failed authentication" credential_kind=centralized credential_hint=PaPb...4A== credential_length=184 ``` Symptom 2 (writes): `SeedAIProvidersFromEnv` was also running before `newAPI`, against the unwrapped store, so env-derived keys (`CODER_AIBRIDGE_OPENAI_KEY`, indexed `CODER_AIBRIDGE_PROVIDER_<N>_KEY`, etc.) landed in `ai_provider_keys` as plaintext with `ApiKeyKeyID = null` even when `CODER_EXTERNAL_TOKEN_ENCRYPTION_KEYS` was set. ## Fix Move both `SeedAIProvidersFromEnv` and `BuildProviders` to after `newAPI`, where `options.Database` is the dbcrypt-wrapped store. Writes encrypt correctly; reads decrypt correctly. The enterprise closure (`enterprise/cli/server.go`) runs inside `newAPI` and calls `BuildProviders` for the aibridgeproxyd at that point. Once the agpl seed moves to after `newAPI`, the proxy on first boot would see no env-seeded providers. Add a matching seed call inside the enterprise closure before its `BuildProviders` to cover that case. Seeding is idempotent, so the agpl-side seed running again post-`newAPI` is a no-op when the rows already exist. ## Known shortcomings The clean version of this fix would just inherit `ctx` like every other startup step and place these calls naturally. 
It can't, for two reasons that are both about the surrounding handler architecture rather than this change: 1. `dbcrypt` wrapping is positioned inside `newAPI`, not around `options.Database` at creation. That's why both seed and build have to wait until after `newAPI` in the first place. The principled fix is to install the wrapper at the point the store is created (behind a hook the enterprise build supplies), so every consumer sees a single authoritative view and the ordering stops mattering. This would also collapse the duplicated seed call back to a single site. 2. The handler's shutdown sequence is not deferred. `coderAPICloser.Close()` and the other teardown steps run only if control reaches the `select` at the bottom of the handler. An early `return` from anywhere in Phase 1 (e.g. seed/build returning `context.Canceled` when the user hits ctrl-c during startup) skips that block and orphans all the goroutines `newAPI` spawned — tailnet workers, gitsync, telemetry batcher, etc. `goleak` then catches them at package teardown and `TestServer_TelemetryDisabled_FinalReport` fails. Moving the shutdown into deferred closers (with a `sync.Once`-guarded close to avoid double-close from the explicit Phase 2 call) is the principled fix. For this PR I took the smallest change that fixes the reported bugs: a detached context (`context.WithoutCancel(ctx)` + a 30s timeout) at the seed and build call sites in both the agpl and enterprise paths. It lets the calls complete even if the user cancels during startup, after which the handler reaches its shutdown select naturally and tears down through Phase 2. Both shortcomings above are worth addressing separately. ## Test plan - `make test RUN=TestServer_TelemetryDisabled_FinalReport` with `-race`; passes locally with `-count=3`. 
- Manually verified on a deployment with `CODER_EXTERNAL_TOKEN_ENCRYPTION_KEYS` set and env-configured providers: `ai_provider_keys.api_key_key_id` is populated, `api_key` is base64 ciphertext, and upstream auth succeeds. --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-26 21:27:02 +02:00
Danny Kopping	282ab7de34	refactor: load AI providers from the database at startup (#25672 ) Replace the env-based `BuildProviders` with a DB-backed loader. The database is now the single source of truth for runtime provider configuration; env config arrives via `SeedAIProvidersFromEnv` (run at boot) and `BuildProviders` reads it back as `aibridge.Provider` instances. `cli/server.go` and `enterprise/cli/server.go` both call the same path, so aibridged and aibridgeproxyd see the same provider set. Per-provider `DumpDir` is replaced by a top-level `CODER_AI_GATEWAY_DUMP_DIR` base; each provider's effective dump path is `<base>/<provider name>`.	2026-05-26 15:57:01 +02:00
Ethan	4f1043a50a	feat(scaletest): add chat scaletest command (#25553 ) Adds `coder exp scaletest chat`, a harness for creating Coder Agents chat load. Start the mock LLM separately, prepare the scaletest workspaces you want to target, then run the chat scaletest against the existing `scaletest-*` fleet selected by the shared workspace targeting flags: ```sh coder exp scaletest llm-mock --address 127.0.0.1:18080 coder exp scaletest chat --llm-mock-url http://127.0.0.1:18080/v1 --chats-per-workspace 10 --turns 1 coder exp scaletest chat --llm-mock-url http://127.0.0.1:18080/v1 --template docker --target-workspaces 0:10 --chats-per-workspace 1 --turns 10 --turn-start-delay 30s ``` This is the same pattern used by the `workspace-traffic` load generator. Keeping the fake LLM as a separate process is intentional so it can be scaled independently from the Coder deployment, which will likely be necessary as we scale up and up. This PR is the starting point: it provides the command, mock provider/model bootstrap, existing workspace selection, chat streaming, follow-up turns, metrics, and cleanup. Follow-up PRs will add multi-step turns via tool calls. I'm still a bit iffy on the mechanism I have for that. It'll likely involve having the runner send some magic strings that the mock will recognise. Relates to CODAGT-307 Relates to GRU-48 Relates to https://github.com/coder/scaletest/issues/124 Generated by Mux, but reviewed by a human	2026-05-26 14:19:36 +10:00
Mathias Fredriksson	7958ad6d04	fix(cli): use quartz clock in waitForTaskIdle for immediate first poll (#25648 ) waitForTaskIdle used time.NewTicker(5s) which delays the first poll by 5 seconds. Debugger tracing proved the failure mechanism: on slow CI (Windows), the first poll at 5s sees "working" (idle patch has not landed due to goroutine scheduling), needs poll #2 at 10s, but the 25s context expires before it fires. Two changes: 1. Use r.clock.NewTicker (quartz) with time.Nanosecond initial interval and Reset(5s) for immediate first poll. Tests inject a mock clock via clitest.NewWithClock for deterministic control. 2. Rewrite WaitsForWorkingAppState test with quartz traps (NewTicker + TickerReset) for deterministic synchronization instead of racing goroutines. Fix PausedDuringWaitForReady sync point. Closes DEVEX-381	2026-05-25 19:14:29 +03:00
Danielle Maywood	5deab9f721	test: wait for devcontainer readiness (#25567 )	2026-05-22 13:55:21 +01:00
Danny Kopping	ef6ee2af68	chore: tolerate empty providers at startup and log env seeds (#25605) AI Gateway is now enabled by default, so if the AI Gateway Proxy is also enabled, the server can start without any configured providers. This would previously block startup, which is unacceptable. In an upstack PR we will handle reloading the providers at runtime, so the server needs to be able to start up even if it can't handle any proxy requests to AI Gateway. This change was necessitated because providers configured in the environment need to be seeded _before_ the proxy starts.	2026-05-22 12:45:14 +02:00
Danny Kopping	ddec110b0e	refactor: move aibridged out of enterprise to AGPL (#25570) In order to allow Coder Agents to use AI Gateway in OSS, we need to rehome the `aibridged`-related code into the AGPL path. The HTTP API is only registered under enterprise, so it will still require the AI Governance Add-on to be present in order to use it, whereas Coder Agents uses an in-memory pipe to the same handlers.	2026-05-22 09:11:37 +02:00
Danny Kopping	c50b0e84b9	feat!: default `CODER_AI_GATEWAY_ENABLED` to true (#25575) `CODER_AI_GATEWAY_ENABLED` / `CODER_AIBRIDGE_ENABLED` now defaults to `true`, since it will be used by Coder Agents. If you previously had this value disabled explicitly, that value will persist.	2026-05-22 08:57:36 +02:00
Danny Kopping	9341efec9f	feat!: seed ai_providers from env on server startup (#24895) _Disclaimer: implemented by a Coder Agent using Claude Opus 4.7_ Part of the implementation of [RFC: Common AI Provider Configs](https://www.notion.so/coderhq/RFC-Common-AI-Provider-Configs-34bd579be59280ed958feffb82024797) (AIGOV-201). ## Note This change can cause a previously working installation to fail to start should a conflict exist between the providers configured in the environment & those now migrated to the database. I'll raise a PR upstack to document this process and workarounds should a startup fail. ## What this PR does Reconciles environment-derived AI provider configuration with the `ai_providers` table at server startup. The seed runs before the aibridged daemon is initialized, so the runtime always reads providers from the database; the legacy `CODER_AIBRIDGE_` environment variables become a one-shot migration source. ### Behavior - Concurrent server starts are serialized through a Postgres advisory lock (`LockIDAIProvidersEnvSeed`). - Missing rows are inserted with an audit entry attributed to the system actor. - Existing rows whose canonical hash matches the env-derived hash are left alone (the common no-op restart path). - Existing rows whose canonical hash does *not* match cause server startup to fail with a descriptive error so the operator can explicitly resolve the conflict in either env or DB. - Soft-deleted rows are NOT resurrected from env; an explicit operator deletion is sticky across restarts. - Indexed providers whose name conflicts with a legacy env var fail startup with a clear remediation message. - Unknown provider types (e.g. `copilot`, until the DB enum is widened) are skipped with a log entry rather than failing startup. 
### Canonical hashing The `canonicalAIProvider` shape captures exactly the fields that determine runtime behavior — `type`, `base_url`, and the Bedrock subset of settings (access key, access key secret, region, model, small fast model) — and is hashed with SHA-256. The hash is computed on demand from the row + env, never persisted, so the database does not need a new column for it. API keys live in the separate `ai_provider_keys` table and are intentionally excluded from the hash so operators can rotate keys via the API without forcing a server restart. <details> <summary>Decision log</summary> - The hash is intentionally not persisted in the database. The RFC discussed this trade-off; computing on demand keeps the schema minimal and lets the canonical shape evolve without a migration. - The lock uses an `iota` slot in `coderd/database/lock.go` rather than `GenLockID` so it's stable, easy to audit, and matches the convention used for every other startup lock. - A bearer-token Anthropic provider whose env vars also set Bedrock metadata but no AWS credentials does NOT store the Bedrock fields. Without credentials the discriminated settings would misrepresent the row as Bedrock auth. - We deliberately do NOT publish to the `ai_providers_changed` pubsub channel from the seed because the seed completes before any subscriber is started; the follow-up PR introduces that channel. </details>	2026-05-22 08:37:27 +02:00
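The on-demand canonical hash comparison described in the commit above can be sketched as follows. The `canonicalProvider` struct, its field set, the JSON encoding, and the `conflicts` helper are assumptions for illustration; the commit message states only that the canonical shape covers `type`, `base_url`, and a Bedrock subset of settings, is hashed with SHA-256, and excludes API keys.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// canonicalProvider is a hypothetical canonical shape. API keys are left out
// on purpose so that rotating a key does not change the hash.
type canonicalProvider struct {
	Type    string `json:"type"`
	BaseURL string `json:"base_url"`
	Region  string `json:"region,omitempty"`
	Model   string `json:"model,omitempty"`
}

// canonicalHash serializes the struct (field order is fixed by the struct
// definition) and returns the hex-encoded SHA-256 digest.
func canonicalHash(p canonicalProvider) (string, error) {
	b, err := json.Marshal(p)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:]), nil
}

// conflicts reports whether an existing row and the env-derived config
// disagree. Nothing is persisted; both hashes are computed on demand.
func conflicts(row, env canonicalProvider) (bool, error) {
	rowHash, err := canonicalHash(row)
	if err != nil {
		return false, err
	}
	envHash, err := canonicalHash(env)
	if err != nil {
		return false, err
	}
	return rowHash != envHash, nil
}

func main() {
	row := canonicalProvider{Type: "openai", BaseURL: "https://api.openai.com/v1"}
	same := canonicalProvider{Type: "openai", BaseURL: "https://api.openai.com/v1"}
	moved := canonicalProvider{Type: "openai", BaseURL: "https://proxy.example.com/v1"}

	c1, _ := conflicts(row, same)
	c2, _ := conflicts(row, moved)
	fmt.Println("unchanged config conflicts:", c1)
	fmt.Println("changed base_url conflicts:", c2)
}
```

In this sketch an unchanged configuration is a no-op on restart, while a changed `base_url` is reported as a conflict for the operator to resolve.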
Zach	ddc0e99c69	chore: remove coder_secret Terraform integration (#25512 ) Removes the coder_secret Terraform integration: the data.coder_secret consumption path through provisionerdserver → provisioner.proto → provisioner/terraform, the dynamic-parameter secret-requirement validation, and the workspace-update / resolve-autostart surfaces that depended on it. This is being done due to a product/feature direction change (see PLAT-243). User-secret CRUD (DB, REST, CLI, UI, telemetry, audit) and the agent-manifest secret-injection path are untouched. The provisionerd API is bumped from v1.17 to v1.18 rather than rolled back: v1.17 shipped in v2.33.x, so user_secrets field numbers are reserved and the changelog documents both versions. Generated with assistance from Coder Agents.	2026-05-21 09:19:29 -06:00
Thomas Kosiewski	26a0805dcd	fix(cli): isolate root HTTP transports (#25430 ) The CLI root client shared `http.DefaultTransport` for normal API requests and for the version-check build-info request. In parallel tests, other clients can close idle connections on that process-global transport, which can fail the Boundary license check before the AGPL 404 handling runs. `TestBoundaryLicenseVerification/AGPLDeployment` configures a proxy that returns `404` from `/api/v2/entitlements`, which `verifyLicense()` maps to the expected AGPL deployment error. However, `clitest.SetupConfig()` only writes the URL and session token to disk. It does not pass the test's isolated `proxyClient.HTTPClient` into the CLI invocation, so `coder boundary` builds a fresh client through `RootCmd.InitClient()`. Before this change, that fresh client used `http.DefaultTransport`; if another parallel test closed idle connections on the shared transport while the entitlement request was in flight, Go returned `http: CloseIdleConnections called` instead of the proxy's `404`. The command then failed with `failed to get entitlements`, and the test never reached the expected AGPL error path. Clone the default transport for each CLI root HTTP client and for the unwrapped build-info client, preserving the configured TLS settings when present. Each CLI invocation now gets its own transport instance, so cleanup from unrelated parallel tests cannot interrupt its entitlement or build-info requests. Closes https://github.com/coder/internal/issues/1538 <details> <summary>Coder Agents notes</summary> Generated by Coder Agents for Linear ENG-2705. 
Local validation: - `go test ./cli -run 'TestNewHTTPTransport\|Test_ensureTLSConfig\|Test_wrapTransportWithVersionCheck' -count=1` - `go test ./enterprise/cli -run TestBoundaryLicenseVerification/AGPLDeployment -count=20 -parallel=16` - `go test ./cli ./enterprise/cli` - `make lint` - `go test ./enterprise/cli -run '^TestBoundaryLicenseVerification$' -count=50 -parallel=16` - pre-commit hook during `git commit` </details>	2026-05-21 16:51:34 +02:00
Paweł Banaszewski	46e93e6325	chore: add ai_gateway options that alias aibridge options (#25061) Adds options matching the new AI Gateway naming. The new options are added as aliases for the old options. The old options still work but carry a deprecation message. No conflict detection was added. Updated the documentation so it mentions only the new options, with a note that the old options still work. > Various AI tools were used to create this PR	2026-05-21 11:14:11 +02:00
Spike Curtis	05e47b9c0f	fix: filter out cross-talk on TestPortForward (#25503) Fixes https://github.com/coder/internal/issues/1539 Protects from port cross-talk by adding a short random prefix to our socket communication and instructing the service on the workspace agent side of the test to ignore any connections that don't use the prefix.	2026-05-20 13:08:57 -04:00
Cian Johnston	85289464b6	fix(cli): remove unnecessary PTY from TestServerCreateAdminUser/Validates (#25444 ) Fixes https://linear.app/codercom/issue/PLAT-224 The Validates subtest only checks that `Run()` returns a validation error and never reads PTY output. We don't need it in this test, so removing. > 🤖 Generated by Coder Agents	2026-05-19 10:35:50 +01:00
Danielle Maywood	170a6e1fe9	feat: add chat sharing foundation (#25041 )	2026-05-18 22:32:05 +01:00
Callum Styan	191dd230ae	feat: add agentfake scaletest subcommand (#25072 ) This PR builds on top of https://github.com/coder/coder/pull/25070 to add a way of running the larger "fake agent" manager via the existing CLI, pulling in the URL/credentials already set. With this, we can run a pod per scaletest region to act as all the workspaces in that region. This is in a new subcommand `scaletest agentfake` currently. --------- Signed-off-by: Callum Styan <callumstyan@gmail.com>	2026-05-15 14:36:54 -07:00
Danny Kopping	841b777ccd	feat: add ai_providers table, queries, dbauthz, audit, RBAC (#24892 )	2026-05-14 16:10:46 +02:00
Max Schwenk	f3e90b334d	fix(cli): show sync wait dependencies (#25089 ) ## Problem `coder exp sync want` and `coder exp sync start` both printed generic success messages, which hid the dependency units involved in startup coordination. Before, declaring dependencies with `sync want` printed: ```text Success ``` Before, `sync start` printed while waiting, then finished with another generic success message: ```text Waiting for dependencies of unit 'test-unit' to be satisfied... Success ``` ## Solution Print the dependency units in both cases, using wording that matches where the command is in the lifecycle. After, `sync want` prints the dependencies it declared for the unit: ```text Unit "test-unit" declared dependencies: [dep-unit] ``` After, `sync start` enumerates the dependencies while it is waiting, then prints the same dependencies after the unit starts executing: ```text Unit "test-unit" is waiting for dependencies to be satisfied: [dep-unit, dep-unit-2] Unit "test-unit" finished waiting for dependencies: [dep-unit, dep-unit-2] ``` The sync golden tests now cover the updated output, including multiple dependencies for `sync start`.	2026-05-14 14:45:20 +02:00
Steven Masley	0f505aa4da	chore: unhide flag to force unix filepaths in config-ssh (#25142 ) Docs now include this flag. The flag is now also visible on Linux and macOS, where it is effectively a no-op. Closes https://github.com/coder/coder/issues/24205	2026-05-13 14:59:33 -05:00
Michael Suchacz	38f586107d	refactor: remove agents TUI (#25190 )	2026-05-13 21:30:11 +02:00
Yevhenii Shcherbina	b5e1ea33d8	feat: add AI budget policy and period deployment config (#25122 ) Closes https://linear.app/codercom/issue/AIGOV-283/add-deployment-config-for-ai-budget-policy-and-period Adds `CODER_AI_BUDGET_POLICY` and `CODER_AI_BUDGET_PERIOD` deployment options for AI Governance cost controls.	2026-05-12 10:48:36 -04:00
Steven Masley	19573e8aee	feat!: patchTemplateMeta to use optional fields (#24984 ) Closes https://github.com/coder/coder/issues/13112 Breaking Change: Removed status code `StatusNotModified` when no diffs occur in a patch. Now the patch is always applied and a template is always returned.	2026-05-11 12:43:52 -05:00
Zach	81e2be69e9	test: use typed atomics in test files (#25071 ) Use typed atomics (atomic.Int64, atomic.Int32, etc.) in test files to prevent mixing atomic and non-atomic access on the same value, guarantee 64-bit alignment on 32-bit platforms, and provide a cleaner API.	2026-05-11 08:41:17 -06:00
Jeremy Ruppel	a1dbd758bc	feat: add template builder deployment config and telemetry types (#25082 )	2026-05-11 09:48:55 -04:00
Thomas Kosiewski	4a6756a3e8	fix: isolate test HTTP clients (#25038 )	2026-05-11 11:03:38 +02:00
Marcin Tojek	febabfb8b2	feat: add request/response dump support to aibridgeproxyd (#24837 ) Closes https://github.com/coder/coder/issues/24335	2026-05-11 10:59:26 +02:00
Nick Vigilante	369a191972	feat: add Quickstart template with language and IDE selection (#24904 ) Add a new Quickstart starter template that lets users pick programming languages, editors, and an optional Git repo to clone. The template uses Docker under the hood but presents a developer-focused experience: pick your tools, start coding. ## What's included - Languages parameter (multi-select): Python, Node.js, Go, Rust, Java, C/C++ - IDEs parameter (multi-select): VS Code (Browser), VS Code Desktop, Cursor, JetBrains, Zed, Windsurf - Git repo parameter: Optional URL to clone on workspace start - JetBrains filtering: Maps selected languages to relevant IDE codes (Python → PyCharm, Go → GoLand, etc.) - Docker precondition check: Uses `data "external"` + `terraform_data` precondition to surface a friendly error when Docker is unavailable, before the Docker provider fails with a cryptic message - 4 presets: Web Development, Backend (Go), Data Science, Full Stack - Single install script: All languages install in one `coder_script` to avoid apt-get lock conflicts (agent scripts run in parallel via `errgroup`) <details><summary>Design decisions</summary> - Docker as invisible backend: Docker is required on the Coder server but never mentioned in the user-facing parameter UI. The experience is entirely "pick languages, pick editors, start coding." - `coder_script` over startup_script: Language installs use a templated script file (`install-languages.sh.tftpl`) driven by the languages parameter. A single script avoids dpkg lock contention since `coder_script` resources execute concurrently. - `data "external"` for Docker check: The external provider probes Docker availability independently of the Docker provider. If Docker is down, the `terraform_data` precondition fails with a human-readable message before any `docker_` resource is evaluated. 
This depends on the Docker provider connecting lazily (at resource eval time, not at provider init), which current behavior confirms. - JetBrains filtering by language: Rather than showing all 9 JetBrains IDEs, the template computes relevant IDE codes from the language selection (e.g. Python → PY, Go → GO) and passes them as `default` to the JetBrains module. - Arch-aware Go install: The install script detects `uname -m` to download the correct Go binary for amd64 or arm64. </details> > [!NOTE] > This PR was authored by Coder Agents.	2026-05-06 13:55:38 +00:00
Kayla はな	f6233e622b	fix(cli): use app slug instead of raw command in terminal URLs (#24827 )	2026-05-05 19:43:08 -06:00
Ethan	4751416b29	fix!: persist structured chat errors (#24919 ) Breaking change for changelog: > `codersdk.Chat.last_error` now returns a structured `ChatError` object (`{message, kind, provider, retryable, status_code, detail}`) instead of a plain string. The chats API is experimental (`/api/experimental/chats`), so this ships without a deprecation cycle; consumers reading `chat.last_error` as a string must update to read `chat.last_error.message`. SDK/generated TypeScript terminal error payloads now use the single `ChatError` type; the live stream error payload type is renamed from `ChatStreamError` to `ChatError`. Persisted chat errors now carry the same provider-specific detail (kind, provider, retryable, HTTP status, optional detail) as the live stream, so refreshing a failed chat rehydrates with the full structured error instead of a one-line headline. Existing rows are migrated in place: legacy text errors are wrapped into `{message, kind: "generic"}` so already-errored chats still render, and rows with `last_error IS NULL` stay NULL. Internally, persisted fallback decoding now reuses the existing `chaterror.KindGeneric` constant, with no JSON value change. Closes CODAGT-239	2026-05-05 12:56:06 +10:00
Thomas Kosiewski	c3794d54ac	fix: avoid PTY for ssh command mode (#24862 )	2026-05-01 15:02:05 +02:00

1906 Commits