coder

mirror of https://github.com/coder/coder.git synced 2026-06-03 13:08:25 +00:00

Author	SHA1	Message	Date
Danny Kopping	c8555e2163	fix: deprecate ai provider seeding env config (#25854 ) Environment variables used to configure AI Gateway providers are now deprecated, and we need to reflect this as such.	2026-06-01 15:15:47 +02:00
Mathias Fredriksson	6ecf804896	test(cli): eliminate race in PausedDuringWaitForReady test (#25858 ) The PausedDuringWaitForReady and WaitsForWorkingAppState tests flaked because the quartz resetTrap was released immediately after catching ticker.Reset (line 174), allowing client.TaskByID (line 175) to race with the subsequent DB mutation (pauseTask / PatchAppStatus). Fix: keep the resetTrap open across both poll iterations. On the first poll, release the trap so the goroutine sees the initial state and continues. On the second poll, hold the goroutine frozen at ticker.Reset while mutating state. Then release; client.TaskByID deterministically sees the mutated state. No race because the goroutine cannot execute client.TaskByID while trapped. Closes CODAGT-482	2026-06-01 13:58:57 +03:00
Spike Curtis	3a727a9087	test: batch 01 of refactoring CLI tests not to use PTY (#25871 ) Part of https://github.com/coder/internal/issues/1400 Batch of refactored CLI tests to avoid creating PTYs.	2026-05-29 20:12:52 +00:00
Spike Curtis	8a47b7fa14	test: batch 00 of refactoring CLI tests not to use PTY (#25868 ) Part of https://github.com/coder/internal/issues/1400 Batch of refactored CLI tests to avoid creating PTYs.	2026-05-29 15:33:45 -04:00
Mathias Fredriksson	2af037ce02	fix(cli): use quartz mock clock in PausedDuringWaitForReady test (#25811 ) PausedDuringWaitForReady used the real clock, so the 5s poll in waitForTaskIdle could race with an in-flight stop build. The SQL view (tasks_with_status) returns "unknown" for stop builds with job_status != "succeeded" because the build_status CASE has no branch for (stop, pending) or (stop, running). On macOS CI, where the provisioner is slower, the poll fires during this transient window and hits the TaskStatusUnknown case instead of TaskStatusPaused, failing with "task entered unknown state" rather than the expected "was paused". Convert to the same quartz mock clock pattern that PR #25648 applied to WaitsForWorkingAppState: inject a mock clock via NewWithClock, trap ticker creation and reset, then advance time deterministically so the poll fires after the stop build completes. Closes CODAGT-482	2026-05-29 11:25:06 +03:00
Danny Kopping	5b10268827	feat: serve 503 sentinel for disabled providers (#25794 ) _Disclosure: created with Coder Agents._ When providers are disabled, we should serve a sentinel error so the requesting client (Claude Code, Coder Agents, etc) is informed. Coder Agents can also conditionalize its display to show a helpful error message. --------- Signed-off-by: Danny Kopping <danny@coder.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-29 10:24:16 +02:00
Spike Curtis	ee4126e913	test: refactor CLI create tests not to use PTY (#25807 ) <!-- If you have used AI to produce some or all of this PR, please ensure you have read our [AI Contribution guidelines](https://coder.com/docs/about/contributing/AI_CONTRIBUTING) before submitting. -->Part of https://github.com/coder/internal/issues/1400 Refactors CLI tests of the `create` command as the first batch of tests refactored to take a PTY out of the loop. One interesting difference I noticed between PTY and a direct pipe to standard in is that on the PTY we write `\r` to enter some input, but the kernel actually sends `\n` (or maybe `\r\n`) to the process, at least on Unix. (On windows we sent `\r\n` into the PTY). This is reflected in the implementation of the `Writer` , otherwise mostly inspired by the PTYTest equivalents.	2026-05-28 17:50:37 -04:00
Danny Kopping	12520ee964	feat: add ai provider status and reload freshness metrics (#25770 ) Add metrics for `aibridged` and `aibridgeproxyd`'s provider statuses. AI providers can be modified, and possibly misconfigured, at runtime. These metrics help operators understand the state of these provider definitions in case unexpected behaviour is observed.	2026-05-28 14:57:33 +02:00
Danny Kopping	2770bdc9d1	feat: route extra ai_provider_types through OpenAI and Anthropic providers (#25722 ) _Disclosure:_ _produced_ _with_ _Claude_ _Opus_ _4\.7_ AI Gateway only supports Anthropic (+Bedrock), OpenAI, and Copilot providers at present. All other types (Vercel, Gemini, etc) will be mapped to OpenAI since they support OpenAI-compatible endpoints.	2026-05-27 16:16:05 +02:00
Max Schwenk	ae492495ee	fix(cli): show ready sync start dependencies (#25546 ) ## Problem Follow-on to: - https://github.com/coder/coder/pull/25089 `coder exp sync start` still printed a generic success message when the unit was ready on the first status check. That hid whether the unit had no dependencies or had dependencies that were already satisfied before `sync start` ran. Before: ```text Success ``` ## Solution Print explicit startup output for both ready-at-first-check cases. After, dependencies already satisfied: ```text Unit "test-unit" started immediately, dependencies already satisfied: [dep-unit, dep-unit-2] ``` After, no dependencies: ```text Unit "test-unit" started with no dependencies ``` The existing waiting path is unchanged and still reports the dependencies while waiting and after waiting finishes. Co-authored-by: Sas Swart <sas.swart.cdk@gmail.com>	2026-05-27 12:33:39 +02:00
Danny Kopping	79e007cf30	feat: hot-reload aibridged and aibridgeproxyd providers on DB changes (#25673 ) Previously the in-process aibridge daemon and the enterprise aibridgeproxy daemon both snapshotted their provider routing once at boot. Any `ai_providers` or `ai_provider_keys` mutation required a restart for either to pick it up. Add an `ai_providers_changed` pubsub channel that the CRUD handlers publish on after Create / Update / Delete. Both daemons subscribe: - aibridged rebuilds its `[]aibridge.Provider` snapshot via `BuildProviders` and swaps it into the pool atomically. Inflight requests keep serving against the bridge they already acquired; new acquires build against the new snapshot. Per-provider construction errors stay scoped to the offending row. - aibridgeproxyd rebuilds its routing snapshot from `GetAIProviders` and swaps the host→provider map atomically. The MITM listener picks up new providers without restart. DB read for aibridgeproxyd uses the existing `AsAIProviderMetadataReader` subject for routing-only access.	2026-05-27 11:58:43 +02:00
Ethan	e91bec8574	fix(cli): close aibridge daemon before WebSocket shutdown wait (#25719 ) > [!WARNING] > The investigation and solution in this PR were done with [Mux](https://mux.coder.com/). I've reviewed the investigation methodology, evidence and solution, and it all appears sound. ## Summary PR #25570 (`refactor: move aibridged out of enterprise to AGPL`, merged 2026-05-22) added an in-memory aibridge DRPC server in `coderd/aibridged.go` that does `api.WebsocketWaitGroup.Add(1)` and only releases `Done()` when its client session is closed. PR #25575 then flipped `CODER_AI_GATEWAY_ENABLED` to default to `true`, so every `cli.Server()` invocation now spins up that goroutine. In `cli/server.go`, the only call to `aibridgeDaemon.Close()` was a `defer` scheduled at function return. During graceful shutdown the code first calls `coderAPICloser.Close()`, which waits on `api.WebsocketWaitGroup`. That wait sits for the full 10s timeout in `coderd/coderd.go` (`websocket shutdown timed out after 10 seconds`), then returns, then the function unwinds, and only then does the deferred `aibridgeDaemon.Close()` fire and let the goroutine call `Done()`. The 10s tax was previously latent (aibridged was enterprise-only and opt-in). After the two May 22 PRs it hit every `cli.Server()` test. On Linux/macOS CI it just makes the suite slower; on the Depot Windows runner, the ramdisk reservation leaves only ~17 GiB of headroom and the ~10s shutdown tails of multiple concurrent package binaries overlap into an OOM, presenting as `test-go-pg (windows-2022)` jobs that die silently at the ~600s watchdog with an empty `steps` array. See Slack: https://codercom.slack.com/archives/C05AE94121Z/p1779807717764189 ## Fix Close `aibridgeDaemon` explicitly during graceful shutdown, before `coderAPICloser.Close()` waits on the WebSocket wait group. This matches the existing ordered-shutdown pattern used for `tunnel` and `notificationsManager`. The deferred `aibridgeDaemon.Close()` is retained as a safety net for early-return paths, and is safe to double-call because `aibridged.Server.Close()` is already idempotent via `shutdownOnce` in `coderd/aibridged/aibridged.go`. ## Regression test `TestServer_AIGatewayShutdownOrdering` boots a real `coder server` with `--ai-gateway-enabled=true`, cancels its context, and asserts graceful shutdown finishes in under 8s. With the fix the test runs in ~0.1s; without the fix it fails deterministically at ~10.0s. The flag is passed explicitly so the test continues to guard the ordering even if the deployment default is ever flipped back. ## Evidence this fixes the OOM On Linux the patched `cli` test package drops from 114 s back to its pre-regression 30 s wall time at the same single-process peak RSS (~7.6 GiB), and the `websocket shutdown timed out after 10 seconds` log line disappears from every server-test run. Since the Windows OOM is the sum of multiple concurrent 10 s shutdown tails overlapping past the runner's ~17 GiB headroom, removing those tails returns the concurrent-RSS budget to its pre-regression level. The Windows OOM was intermittent (a handful of hits across many runs since May 22), so a single green `test-go-pg (windows-2022)` job on this PR is not by itself proof; confirmation will come from watching Windows runs on `main` over the next several days and seeing the ~600 s silent-kill fingerprint stop recurring. Relates to ENG-2771	2026-05-27 17:33:14 +10:00
Michael Suchacz	8b1705eb65	feat: route chatd provider traffic through aibridge (#25629 ) ## Summary Routes chatd model calls backed by concrete AI Provider rows through the in-process aibridge transport by default, with deployment options to use direct provider routing when AI Gateway is disabled or chat AI Gateway routing is disabled. - Splits model routing into common, direct provider, and AI Gateway paths behind a single deployment-mode entry point. - Builds chatd models through explicit request, route, and options data. Active API key attribution is passed explicitly instead of being hidden inside generic model construction. - For AI Gateway BYOK routes, resolves the user's provider key in chatd, forwards it through provider-specific auth headers, and sets `X-Coder-AI-Governance-Token` to the `delegated` marker so aibridge preserves those headers while still stripping Coder-specific metadata. - Keeps central provider credentials and deployment fallback credentials out of forwarded provider auth headers, so AI Gateway central policy remains authoritative. - Redacts delegated provider auth from default string formatting to avoid accidental plaintext logging of user BYOK credentials. - Covers selected chat models, advisor overrides, title and quickgen paths, subagent overrides, computer use model selection, and an integration-style chat turn through the aibridge transport path. - Persists initiating API key IDs on chat and queued user messages, including subagent child messages, and fails closed for AI Gateway-routed model builds without an active key. - Removes unused `api_key_id` indexes while keeping the persistence columns and foreign keys. - Keeps the deployment option available through config and env parsing, but hides it from CLI help and generated docs. - Stabilizes the subagent poll fallback test so background CreateChat processing cannot win the state transition under slower CI environments. ## Tests - `go test ./coderd/x/chatd -run 'TestAIGatewayProviderAuthForUser\|TestAIGatewayProviderAuthRedactsFormatting\|TestResolveModelRouteForConfigAIGatewayProviderAuth\|TestAIGatewayModelForwardsProviderAuth\|TestProcessChat_AIGatewayRoutingUsesDelegatedAPIKey\|TestAwaitSubagentCompletion' -count=1` - `go test ./coderd/aibridged -run 'TestServeHTTP_DelegatedAPIKey\|TestServeHTTP_StripCoderToken' -count=1` - `git diff --check HEAD~1..HEAD` - `make lint` > Mux working on behalf of Mike.	2026-05-26 19:31:52 +00:00
Danny Kopping	a56c88a0cc	fix: run AI provider seed and build after newAPI so dbcrypt applies (#25699 ) ## Problem Two related symptoms of the same architectural issue: the `dbcrypt` wrapper is installed inside `enterprise/coderd.New`, so any access to `options.Database` that happens before `newAPI` runs bypasses encryption. Symptom 1 (reads): Provider keys added via the admin UI are encrypted at rest. `BuildProviders` was running before `newAPI`, against the unwrapped store, so the ciphertext was read as-is and shoved into the keypool as the upstream credential. Anthropic/OpenAI reject it, and the interception log shows: ``` coderd.aibridged.pool: interception failed ... error="all configured keys failed authentication" credential_kind=centralized credential_hint=PaPb...4A== credential_length=184 ``` Symptom 2 (writes): `SeedAIProvidersFromEnv` was also running before `newAPI`, against the unwrapped store, so env-derived keys (`CODER_AIBRIDGE_OPENAI_KEY`, indexed `CODER_AIBRIDGE_PROVIDER_<N>_KEY`, etc.) landed in `ai_provider_keys` as plaintext with `ApiKeyKeyID = null` even when `CODER_EXTERNAL_TOKEN_ENCRYPTION_KEYS` was set. ## Fix Move both `SeedAIProvidersFromEnv` and `BuildProviders` to after `newAPI`, where `options.Database` is the dbcrypt-wrapped store. Writes encrypt correctly; reads decrypt correctly. The enterprise closure (`enterprise/cli/server.go`) runs inside `newAPI` and calls `BuildProviders` for the aibridgeproxyd at that point. Once the agpl seed moves to after `newAPI`, the proxy on first boot would see no env-seeded providers. Add a matching seed call inside the enterprise closure before its `BuildProviders` to cover that case. Seeding is idempotent, so the agpl-side seed running again post-`newAPI` is a no-op when the rows already exist. ## Known shortcomings The clean version of this fix would just inherit `ctx` like every other startup step and place these calls naturally. It can't, for two reasons that are both about the surrounding handler architecture rather than this change: 1. `dbcrypt` wrapping is positioned inside `newAPI`, not around `options.Database` at creation. That's why both seed and build have to wait until after `newAPI` in the first place. The principled fix is to install the wrapper at the point the store is created (behind a hook the enterprise build supplies), so every consumer sees a single authoritative view and the ordering stops mattering. This would also collapse the duplicated seed call back to a single site. 2. The handler's shutdown sequence is not deferred. `coderAPICloser.Close()` and the other teardown steps run only if control reaches the `select` at the bottom of the handler. An early `return` from anywhere in Phase 1 (e.g. seed/build returning `context.Canceled` when the user hits ctrl-c during startup) skips that block and orphans all the goroutines `newAPI` spawned — tailnet workers, gitsync, telemetry batcher, etc. `goleak` then catches them at package teardown and `TestServer_TelemetryDisabled_FinalReport` fails. Moving the shutdown into deferred closers (with a `sync.Once`-guarded close to avoid double-close from the explicit Phase 2 call) is the principled fix. For this PR I took the smallest change that fixes the reported bugs: a detached context (`context.WithoutCancel(ctx)` + a 30s timeout) at the seed and build call sites in both the agpl and enterprise paths. It lets the calls complete even if the user cancels during startup, after which the handler reaches its shutdown select naturally and tears down through Phase 2. Both shortcomings above are worth addressing separately. ## Test plan - `make test RUN=TestServer_TelemetryDisabled_FinalReport` with `-race`; passes locally with `-count=3`. - Manually verified on a deployment with `CODER_EXTERNAL_TOKEN_ENCRYPTION_KEYS` set and env-configured providers: `ai_provider_keys.api_key_key_id` is populated, `api_key` is base64 ciphertext, and upstream auth succeeds. --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-26 21:27:02 +02:00
Danny Kopping	282ab7de34	refactor: load AI providers from the database at startup (#25672 ) Replace the env-based `BuildProviders` with a DB-backed loader. The database is now the single source of truth for runtime provider configuration; env config arrives via `SeedAIProvidersFromEnv` (run at boot) and `BuildProviders` reads it back as `aibridge.Provider` instances. `cli/server.go` and `enterprise/cli/server.go` both call the same path, so aibridged and aibridgeproxyd see the same provider set. Per-provider `DumpDir` is replaced by a top-level `CODER_AI_GATEWAY_DUMP_DIR` base; each provider's effective dump path is `<base>/<provider name>`.	2026-05-26 15:57:01 +02:00
Ethan	4f1043a50a	feat(scaletest): add chat scaletest command (#25553 ) Adds `coder exp scaletest chat`, a harness for creating Coder Agents chat load. Start the mock LLM separately, prepare the scaletest workspaces you want to target, then run the chat scaletest against the existing `scaletest-*` fleet selected by the shared workspace targeting flags: ```sh coder exp scaletest llm-mock --address 127.0.0.1:18080 coder exp scaletest chat --llm-mock-url http://127.0.0.1:18080/v1 --chats-per-workspace 10 --turns 1 coder exp scaletest chat --llm-mock-url http://127.0.0.1:18080/v1 --template docker --target-workspaces 0:10 --chats-per-workspace 1 --turns 10 --turn-start-delay 30s ``` This is the same pattern used by the `workspace-traffic` load generator. Keeping the fake LLM as a separate process is intentional so it can be scaled independently from the Coder deployment, which will likely be necessary as we scale up and up. This PR is the starting point: it provides the command, mock provider/model bootstrap, existing workspace selection, chat streaming, follow-up turns, metrics, and cleanup. Follow-up PRs will add multi-step turns via tool calls. I'm still a bit iffy on the mechanism I have for that. It'll likely involve having the runner send some magic strings that the mock will recognise. Relates to CODAGT-307 Relates to GRU-48 Relates to https://github.com/coder/scaletest/issues/124 Generated by Mux, but reviewed by a human	2026-05-26 14:19:36 +10:00
Mathias Fredriksson	7958ad6d04	fix(cli): use quartz clock in waitForTaskIdle for immediate first poll (#25648 ) waitForTaskIdle used time.NewTicker(5s) which delays the first poll by 5 seconds. Debugger tracing proved the failure mechanism: on slow CI (Windows), the first poll at 5s sees "working" (idle patch has not landed due to goroutine scheduling), needs poll #2 at 10s, but the 25s context expires before it fires. Two changes: 1. Use r.clock.NewTicker (quartz) with time.Nanosecond initial interval and Reset(5s) for immediate first poll. Tests inject a mock clock via clitest.NewWithClock for deterministic control. 2. Rewrite WaitsForWorkingAppState test with quartz traps (NewTicker + TickerReset) for deterministic synchronization instead of racing goroutines. Fix PausedDuringWaitForReady sync point. Closes DEVEX-381	2026-05-25 19:14:29 +03:00
Danielle Maywood	5deab9f721	test: wait for devcontainer readiness (#25567 )	2026-05-22 13:55:21 +01:00
Danny Kopping	ef6ee2af68	chore: tolerate empty providers at startup and log env seeds (#25605 ) Since AI Gateway is now enabled by default, and if the AI Gateway Proxy is enabled too it's possible the server can start without any configured providers. This would previously block startup, which is unacceptable. In an upstack PR we will handle reloading the providers at runtime, so the server needs to be able to start up even if it can't handle any proxy requests to AI Gateway. This change was necessitated because if there are providers configured in the environment they need to be seeded _before_ the proxy starts.	2026-05-22 12:45:14 +02:00
Danny Kopping	ddec110b0e	refactor: move aibridged out of enterprise to AGPL (#25570 ) In order to allow Coder Agents to use AI Gateway in OSS, we need to rehome the `aibridged`\-related code into the AGPL path. The HTTP API is only registered under enterprise so will still require the AI Governance Add-on to be present in order to use it, whereas Coder Agents uses an in-memory pipe to the same handlers.	2026-05-22 09:11:37 +02:00
Danny Kopping	c50b0e84b9	feat!: default `CODER_AI_GATEWAY_ENABLED` to true (#25575 ) `CODER_AI_GATEWAY_ENABLED` / `CODER_AIBRIDGE_ENABLED` is now being defaulted to `true` now that it will be used by Coder Agents. If you previously had this value disabled explicitly, that value will persist.	2026-05-22 08:57:36 +02:00
Danny Kopping	9341efec9f	feat!: seed ai_providers from env on server startup (#24895 ) _Disclaimer: implemented by a Coder Agent using Claude Opus 4.7_ Part of the implementation of [RFC: Common AI Provider Configs](https://www.notion.so/coderhq/RFC-Common-AI-Provider-Configs-34bd579be59280ed958feffb82024797) (AIGOV-201). ## Note This change can cause a previously working installation to fail to start should a conflict exist between the providers configured in the environment & those now migrated to the database. I'll raise a PR upstack to document this process and workarounds should a startup fail. ## What this PR does Reconciles environment-derived AI provider configuration with the `ai_providers` table at server startup. The seed runs before the aibridged daemon is initialized, so the runtime always reads providers from the database; the legacy `CODER_AIBRIDGE_` environment variables become a one-shot migration source. ### Behavior - Concurrent server starts are serialized through a Postgres advisory lock (`LockIDAIProvidersEnvSeed`). - Missing rows are inserted with an audit entry attributed to the system actor. - Existing rows whose canonical hash matches the env-derived hash are left alone (the common no-op restart path). - Existing rows whose canonical hash does not* match cause server startup to fail with a descriptive error so the operator can explicitly resolve the conflict in either env or DB. - Soft-deleted rows are NOT resurrected from env; an explicit operator deletion is sticky across restarts. - Indexed providers whose name conflicts with a legacy env var fail startup with a clear remediation message. - Unknown provider types (e.g. `copilot`, until the DB enum is widened) are skipped with a log entry rather than failing startup. ### Canonical hashing The `canonicalAIProvider` shape captures exactly the fields that determine runtime behavior — `type`, `base_url`, and the Bedrock subset of settings (access key, access key secret, region, model, small fast model) — and is hashed with SHA-256. The hash is computed on demand from the row + env, never persisted, so the database does not need a new column for it. API keys live in the separate `ai_provider_keys` table and are intentionally excluded from the hash so operators can rotate keys via the API without forcing a server restart. <details> <summary>Decision log</summary> - The hash is intentionally not persisted in the database. The RFC discussed this trade-off; computing on demand keeps the schema minimal and lets the canonical shape evolve without a migration. - The lock uses an `iota` slot in `coderd/database/lock.go` rather than `GenLockID` so it's stable, easy to audit, and matches the convention used for every other startup lock. - A bearer-token Anthropic provider whose env vars also set Bedrock metadata but no AWS credentials does NOT store the Bedrock fields. Without credentials the discriminated settings would misrepresent the row as Bedrock auth. - We deliberately do NOT publish to the `ai_providers_changed` pubsub channel from the seed because the seed completes before any subscriber is started; the follow-up PR introduces that channel. </details>	2026-05-22 08:37:27 +02:00
Zach	ddc0e99c69	chore: remove coder_secret Terraform integration (#25512 ) Removes the coder_secret Terraform integration: the data.coder_secret consumption path through provisionerdserver → provisioner.proto → provisioner/terraform, the dynamic-parameter secret-requirement validation, and the workspace-update / resolve-autostart surfaces that depended on it. This is being done due to a product/feature direction change (see PLAT-243). User-secret CRUD (DB, REST, CLI, UI, telemetry, audit) and the agent-manifest secret-injection path are untouched. The provisionerd API is bumped from v1.17 to v1.18 rather than rolled back: v1.17 shipped in v2.33.x, so user_secrets field numbers are reserved and the changelog documents both versions. Generated with assistance from Coder Agents.	2026-05-21 09:19:29 -06:00
Thomas Kosiewski	26a0805dcd	fix(cli): isolate root HTTP transports (#25430 ) The CLI root client shared `http.DefaultTransport` for normal API requests and for the version-check build-info request. In parallel tests, other clients can close idle connections on that process-global transport, which can fail the Boundary license check before the AGPL 404 handling runs. `TestBoundaryLicenseVerification/AGPLDeployment` configures a proxy that returns `404` from `/api/v2/entitlements`, which `verifyLicense()` maps to the expected AGPL deployment error. However, `clitest.SetupConfig()` only writes the URL and session token to disk. It does not pass the test's isolated `proxyClient.HTTPClient` into the CLI invocation, so `coder boundary` builds a fresh client through `RootCmd.InitClient()`. Before this change, that fresh client used `http.DefaultTransport`; if another parallel test closed idle connections on the shared transport while the entitlement request was in flight, Go returned `http: CloseIdleConnections called` instead of the proxy's `404`. The command then failed with `failed to get entitlements`, and the test never reached the expected AGPL error path. Clone the default transport for each CLI root HTTP client and for the unwrapped build-info client, preserving the configured TLS settings when present. Each CLI invocation now gets its own transport instance, so cleanup from unrelated parallel tests cannot interrupt its entitlement or build-info requests. Closes https://github.com/coder/internal/issues/1538 <details> <summary>Coder Agents notes</summary> Generated by Coder Agents for Linear ENG-2705. Local validation: - `go test ./cli -run 'TestNewHTTPTransport\|Test_ensureTLSConfig\|Test_wrapTransportWithVersionCheck' -count=1` - `go test ./enterprise/cli -run TestBoundaryLicenseVerification/AGPLDeployment -count=20 -parallel=16` - `go test ./cli ./enterprise/cli` - `make lint` - `go test ./enterprise/cli -run '^TestBoundaryLicenseVerification$' -count=50 -parallel=16` - pre-commit hook during `git commit` </details>	2026-05-21 16:51:34 +02:00
Paweł Banaszewski	46e93e6325	chore: add ai_gateway options that alias aibridge options (#25061 ) Adds options matching new AI Gateway naming. New options are added as alias for old options. Old options are still working. Old options have deprecated message. No conflict detection was added. Updated documentation so it mentions only new options. Added note about old options still working. > Various AI tools where used to create this PR	2026-05-21 11:14:11 +02:00
Spike Curtis	05e47b9c0f	fix: filter out cross-talk on TestPortForward (#25503 ) <!-- If you have used AI to produce some or all of this PR, please ensure you have read our [AI Contribution guidelines](https://coder.com/docs/about/contributing/AI_CONTRIBUTING) before submitting. --> Fixes https://github.com/coder/internal/issues/1539 Protects from port cross-talk by adding a short random prefix to our socket communication and instructing the service on the workspace agent side of the test to ignore any connections that don't use the prefix.	2026-05-20 13:08:57 -04:00
Cian Johnston	85289464b6	fix(cli): remove unnecessary PTY from TestServerCreateAdminUser/Validates (#25444 ) Fixes https://linear.app/codercom/issue/PLAT-224 The Validates subtest only checks that `Run()` returns a validation error and never reads PTY output. We don't need it in this test, so removing. > 🤖 Generated by Coder Agents	2026-05-19 10:35:50 +01:00
Danielle Maywood	170a6e1fe9	feat: add chat sharing foundation (#25041 )	2026-05-18 22:32:05 +01:00
Callum Styan	191dd230ae	feat: add agentfake scaletest subcommand (#25072 ) This PR builds on top of https://github.com/coder/coder/pull/25070 to add a way of running the larger "fake agent" manager via the existing CLI, pulling in the URL/credentials already set. With this, we can run a pod per scaletest region to act as all the workspaces in that region. This is in a new subcommand `scaletest agentfake` currently. --------- Signed-off-by: Callum Styan <callumstyan@gmail.com>	2026-05-15 14:36:54 -07:00
Danny Kopping	841b777ccd	feat: add ai_providers table, queries, dbauthz, audit, RBAC (#24892 )	2026-05-14 16:10:46 +02:00
Max Schwenk	f3e90b334d	fix(cli): show sync wait dependencies (#25089 ) ## Problem `coder exp sync want` and `coder exp sync start` both printed generic success messages, which hid the dependency units involved in startup coordination. Before, declaring dependencies with `sync want` printed: ```text Success ``` Before, `sync start` printed while waiting, then finished with another generic success message: ```text Waiting for dependencies of unit 'test-unit' to be satisfied... Success ``` ## Solution Print the dependency units in both cases, using wording that matches where the command is in the lifecycle. After, `sync want` prints the dependencies it declared for the unit: ```text Unit "test-unit" declared dependencies: [dep-unit] ``` After, `sync start` enumerates the dependencies while it is waiting, then prints the same dependencies after the unit starts executing: ```text Unit "test-unit" is waiting for dependencies to be satisfied: [dep-unit, dep-unit-2] Unit "test-unit" finished waiting for dependencies: [dep-unit, dep-unit-2] ``` The sync golden tests now cover the updated output, including multiple dependencies for `sync start`.	2026-05-14 14:45:20 +02:00
Steven Masley	0f505aa4da	chore: unhide flag to force unix filepaths in config-ssh (#25142 ) Docs now include this flag. This flag is now also viewable in linux/mac despite it effectively being a `no-op`. Closes https://github.com/coder/coder/issues/24205	2026-05-13 14:59:33 -05:00
Michael Suchacz	38f586107d	refactor: remove agents TUI (#25190 )	2026-05-13 21:30:11 +02:00
Yevhenii Shcherbina	b5e1ea33d8	feat: add AI budget policy and period deployment config (#25122 ) Closes https://linear.app/codercom/issue/AIGOV-283/add-deployment-config-for-ai-budget-policy-and-period Adds `CODER_AI_BUDGET_POLICY` and `CODER_AI_BUDGET_PERIOD` deployment options for AI Governance cost controls.	2026-05-12 10:48:36 -04:00
Steven Masley	19573e8aee	feat!: patchTemplateMeta to use optional fields (#24984 ) Closes https://github.com/coder/coder/issues/13112 Breaking Change: Removed status code `StatusNotModified` when no diffs occur in a patch. Now the patch is always applied and a template is always returned.	2026-05-11 12:43:52 -05:00
Zach	81e2be69e9	test: use typed atomics in test files (#25071 ) Use typed atomics (atomic.Int64, atomic.Int32, etc.) in test files to prevent mixing atomic and non-atomic access on the same value, guarantee 64-bit alignment on 32-bit platforms, and provide a cleaner API.	2026-05-11 08:41:17 -06:00
Jeremy Ruppel	a1dbd758bc	feat: add template builder deployment config and telemetry types (#25082 )	2026-05-11 09:48:55 -04:00
Thomas Kosiewski	4a6756a3e8	fix: isolate test HTTP clients (#25038 )	2026-05-11 11:03:38 +02:00
Marcin Tojek	febabfb8b2	feat: add request/response dump support to aibridgeproxyd (#24837 ) Closes https://github.com/coder/coder/issues/24335	2026-05-11 10:59:26 +02:00
Nick Vigilante	369a191972	feat: add Quickstart template with language and IDE selection (#24904 ) Add a new Quickstart starter template that lets users pick programming languages, editors, and an optional Git repo to clone. The template uses Docker under the hood but presents a developer-focused experience: pick your tools, start coding. ## What's included - Languages parameter (multi-select): Python, Node.js, Go, Rust, Java, C/C++ - IDEs parameter (multi-select): VS Code (Browser), VS Code Desktop, Cursor, JetBrains, Zed, Windsurf - Git repo parameter: Optional URL to clone on workspace start - JetBrains filtering: Maps selected languages to relevant IDE codes (Python → PyCharm, Go → GoLand, etc.) - Docker precondition check: Uses `data "external"` + `terraform_data` precondition to surface a friendly error when Docker is unavailable, before the Docker provider fails with a cryptic message - 4 presets: Web Development, Backend (Go), Data Science, Full Stack - Single install script: All languages install in one `coder_script` to avoid apt-get lock conflicts (agent scripts run in parallel via `errgroup`) <details><summary>Design decisions</summary> - Docker as invisible backend: Docker is required on the Coder server but never mentioned in the user-facing parameter UI. The experience is entirely "pick languages, pick editors, start coding." - `coder_script` over startup_script: Language installs use a templated script file (`install-languages.sh.tftpl`) driven by the languages parameter. A single script avoids dpkg lock contention since `coder_script` resources execute concurrently. - `data "external"` for Docker check: The external provider probes Docker availability independently of the Docker provider. If Docker is down, the `terraform_data` precondition fails with a human-readable message before any `docker_` resource is evaluated. This depends on the Docker provider connecting lazily (at resource eval time, not at provider init), which current behavior confirms. - JetBrains filtering by language: Rather than showing all 9 JetBrains IDEs, the template computes relevant IDE codes from the language selection (e.g. Python → PY, Go → GO) and passes them as `default` to the JetBrains module. - Arch-aware Go install*: The install script detects `uname -m` to download the correct Go binary for amd64 or arm64. </details> <details><summary>Screenshots and recordings from the UI</summary> <p> <img width="1851" height="1471" alt="Screenshot 2026-05-05 at 2 14 20 PM" src="https://github.com/user-attachments/assets/d4c9cdc5-d311-43a5-9e2e-f90b0019eda7" /> <img width="1851" height="1471" alt="Screenshot 2026-05-05 at 2 15 06 PM" src="https://github.com/user-attachments/assets/cf3023fe-b6db-4503-a6c4-eaa0ec0659f8" /> https://github.com/user-attachments/assets/7507fd7d-ddb5-457a-9f7d-cbf89b36eb20 </p> </details> > [!NOTE] > This PR was authored by Coder Agents.	2026-05-06 13:55:38 +00:00
Kayla はな	f6233e622b	fix(cli): use app slug instead of raw command in terminal URLs (#24827 )	2026-05-05 19:43:08 -06:00
Ethan	4751416b29	fix!: persist structured chat errors (#24919 ) Breaking change for changelog: > `codersdk.Chat.last_error` now returns a structured `ChatError` object (`{message, kind, provider, retryable, status_code, detail}`) instead of a plain string. The chats API is experimental (`/api/experimental/chats`), so this ships without a deprecation cycle; consumers reading `chat.last_error` as a string must update to read `chat.last_error.message`. SDK/generated TypeScript terminal error payloads now use the single `ChatError` type; the live stream error payload type is renamed from `ChatStreamError` to `ChatError`. Persisted chat errors now carry the same provider-specific detail (kind, provider, retryable, HTTP status, optional detail) as the live stream, so refreshing a failed chat rehydrates with the full structured error instead of a one-line headline. Existing rows are migrated in place: legacy text errors are wrapped into `{message, kind: "generic"}` so already-errored chats still render, and rows with `last_error IS NULL` stay NULL. Internally, persisted fallback decoding now reuses the existing `chaterror.KindGeneric` constant, with no JSON value change. Closes CODAGT-239	2026-05-05 12:56:06 +10:00
Thomas Kosiewski	c3794d54ac	fix: avoid PTY for ssh command mode (#24862 )	2026-05-01 15:02:05 +02:00
Dean Sheather	e57525002c	chore: remove agents experiment flag and mark feature as beta (#24432 ) Remove the `ExperimentAgents` feature flag so the Agents feature is always available without requiring `--experiments=agents`. The feature is now in beta. Existing deployments that still pass `--experiments=agents` will get a harmless "ignoring unknown experiment" warning on startup. ### Changes Backend: - Remove `RequireExperimentWithDevBypass` middleware from chat and MCP server routes - Always include `AgentsAccessRole` in assignable site roles (later refactored to org-scoped on main; rebase keeps that) - Always set `AgentsTabVisible = true`, then drop the entire dead `AgentsTabVisible` metadata pipeline (Go htmlState field, populateHTMLState goroutine, HTML meta tag, useEmbeddedMetadata registration, mock); no production consumer reads it. `AgentsNavItem` already gates on `permissions.createChat`. - Make `blob:` CSP `img-src` addition unconditional - Remove `ExperimentAgents` constant, `DisplayName` case, and `ExperimentsKnown` entry CLI: - Graduate the agents TUI from `coder exp agents` to `coder agents` (moved from `AGPLExperimental()` to `CoreSubcommands()`) - Drop the `agent` alias so it does not collide with the hidden workspace-agent command - Rename implementation files `cli/exp_agents_.go` -> `cli/agents_.go` and internal identifiers (`expChatsTUIModel` -> `chatsTUIModel`, `newExpChatsTUIModel` -> `newChatsTUIModel`, `setupExpAgentsBackend` -> `setupAgentsBackend`, `startExpAgentsSession` -> `startAgentsSession`, `expAgentsPtr` -> `agentsPtr`, `expAgentsSession` -> `agentsSession`, `TestExpAgents` -> `TestAgents`). `expClient` (the `codersdk.ExperimentalClient` local) is kept; `coderd/exp_chats.go` and other still-experimental `cli/exp_.go` commands are intentionally untouched. Frontend:* - Remove experiment check from `AgentsNavItem` - render when `canCreateChat` is true - Remove `agentsEnabled` experiment check from `WorkspacesPage`, then gate `chatsByWorkspace` on `permissions.createChat` so users without chat access don't trigger the per-page DB query (Copilot review feedback) - Add `FeatureStageBadge` (beta) next to the Coder logo in the Agents sidebar (desktop + mobile) Docs: - Remove experiment flag setup instructions from `early-access.md` and `getting-started.md` (and rename `early-access.md`'s "Enable Coder Agents" heading to "Set up Coder Agents", since there is no enablement step left) - Update `chats-api.md` and `getting-started.md`'s Chats API note to say "beta" instead of "experimental" - `docs/manifest.json`: drop "experimental" from the Chats API sidebar description - `make gen` regenerated `docs/reference/cli/agents.md` and the CLI index - `scripts/check_emdash.sh`: exclude `cli/testdata/.golden` and `enterprise/cli/testdata/.golden` from the new repo-wide emdash lint, since serpent emits emdash borders in every generated `--help` golden file Tests: - Remove `ExperimentAgents` setup from all test files (14 occurrences across 7 files) - Update stale "with the agents experiment" comments in `coderd/x/chatd/integration_test.go` and `coderd/mcp_test.go` <img width="1185" height="900" alt="image" src="https://github.com/user-attachments/assets/b420bc8f-41d6-42c6-abd8-ad572533d651" /> > 🤖 Generated by Coder Agents	2026-05-01 01:49:00 +10:00
Susana Ferreira	dbb50ebaaf	feat: remove 429 from aibridge circuit breaker failure conditions (#24701 ) ## Description Removes 429 (Too Many Requests) from the circuit breaker failure conditions. Rate limiting is now handled by automatic key failover instead of tripping the circuit breaker. ## Changes `DefaultIsFailure` no longer treats 429 as a circuit breaker failure. The circuit breaker now only trips on server overload responses (503, 529). Tests and integration tests updated to use 503 instead of 429 for tripping circuits. Description strings in deployment config updated to reflect the change. Closes https://github.com/coder/internal/issues/1445 > [!NOTE] > Initially generated by Coder Agents, modified and reviewed by @ssncferreira	2026-04-30 09:31:32 +01:00
Susana Ferreira	101a4082dd	feat: support multiple keys per AI Bridge provider (#24683 ) ## Description Adds support for configuring multiple API keys per AI Bridge provider. This PR introduces the configuration parsing and validation only; wiring the key pools into the aibridge providers will happen in upstream PRs. ## Changes Providers now accept a comma-separated list of keys via the `KEYS` env var (or a single key via the existing `KEY` var). The two are mutually exclusive. Bedrock follows the same pattern with `BEDROCK_ACCESS_KEYS` / `BEDROCK_ACCESS_KEY_SECRETS`, with an additional validation that the two slices have matching lengths. Key validation at startup checks for empty values, duplicates, and a maximum of 5 keys per provider. Related to: https://github.com/coder/internal/issues/1445 > [!NOTE] > Initially generated by Coder Agents, modified and reviewed by @ssncferreira	2026-04-30 09:19:32 +01:00
Callum Styan	950660e392	fix(cli): extend `exp scaletest cleanup` to properly clean up prebuilds (#23628 ) Signed-off-by: Callum Styan <callumstyan@gmail.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-29 10:28:27 -07:00
Paweł Banaszewski	a24dc19d49	chore: clean up env var usage in aibridge (#24783 ) > AI tools where used when creating this PR This PR removes environment variable parsing from `/aibridge` directory. Added env variables/flags for dump dir as coder options. Only added to new indexed provider options (`CODER_AIBRIDGE_PROVIDER_<N>_`) not to deprecated legacy env variables (`CODER_AIBRIDGE_ANTHROPIC_` and `CODER_AIBRIDGE_OPENAI_KEY_*`). Reverted adding `MaxRetries` option as it will be removed soon due to key failover work: https://github.com/coder/coder/pull/24783#discussion_r3155544808	2026-04-29 18:28:37 +02:00
George K	3f0e015fe5	fix: allow coderd to start with an empty DERP map when built-in DERP is disabled (#24544 ) Allow coderd to start with an empty base DERP map when built-in DERP is disabled and no static DERP map is configured, so DERP can come from workspace proxies after startup. Also add a DERP healthcheck warning when no DERP servers are currently available at runtime. Related to: https://linear.app/codercom/issue/PLAT-43/bug-coderd-unable-to-be-started-if-built-in-derp-server-disabled-and Related to: https://github.com/coder/coder/issues/22324	2026-04-28 09:17:08 -07:00
Mathias Fredriksson	3c450899ea	fix: pass agent context config explicitly instead of reading env (#24759 ) The CODER_AGENT_EXP_* env vars are agent-internal options. When set in the workspace environment they leak to MCP subprocesses and user shells. ReadEnvConfig() captures the values and ClearEnvVars() strips them before the reinit loop, so config survives agent restarts. NewAPI and ReadEnvConfig both use applyDefaults() to fill zero fields. The chatd test passes config via agenttest.WithContextConfigFromEnv().	2026-04-28 17:58:28 +03:00

1 2 3 4 5 ...

1899 Commits