Commit Graph

407 Commits

Author SHA1 Message Date
Steven Masley 4591212482 feat: implement SCIM handler for SCIM 2.0 compliance (#25572)
Rewrites the SCIM 2.0 user provisioning handler to be RFC 7644
compliant. Verified against an external IdP Okta.

Behavior is OPT IN
2026-05-28 10:00:37 -05:00
Danny Kopping 12520ee964 feat: add ai provider status and reload freshness metrics (#25770)
Add metrics for `aibridged` and `aibridgeproxyd`'s provider statuses. AI providers can be modified, and possibly misconfigured, at runtime. These metrics help operators understand the state of these provider definitions in case unexpected behaviour is observed.
2026-05-28 14:57:33 +02:00
Danny Kopping a9f5ed7644 fix: re-validate provider per request and classify reloads (#25766)
Refactors the `aibridgeproxyd` provider reload mechanism which was unnecessarily complex.

Also ensures that providers are evaluated on each CONNECT request to prevent interception of requests to (newly) disabled providers; in this case the requests will passthrough unencrypted, by design.
2026-05-28 13:22:38 +02:00
Callum Styan 9d37f63fbd feat: report synthetic metadata from fake agents (#25166)
Fake agents now fetch their manifest, spawn a single per-agent metadata
goroutine, and emit batched BatchUpdateMetadata calls with 3072-byte
base64 payloads so scaletest runs mirror the load shape of real agents.
This matches what the current scaletest workspace template does for
metadata. In the future we can extend the harness here to take in a
config option for the metadata payload size.

---------

Signed-off-by: Callum Styan <callumstyan@gmail.com>
Co-authored-by: Mux <mux@coder.com>
2026-05-27 13:49:42 -07:00
Danny Kopping 79e007cf30 feat: hot-reload aibridged and aibridgeproxyd providers on DB changes (#25673)
Previously the in-process aibridge daemon and the enterprise aibridgeproxy daemon both snapshotted their provider routing once at boot. Any `ai_providers` or `ai_provider_keys` mutation required a restart for either to pick it up.

Add an `ai_providers_changed` pubsub channel that the CRUD handlers publish on after Create / Update / Delete. Both daemons subscribe:

- **aibridged** rebuilds its `[]aibridge.Provider` snapshot via `BuildProviders` and swaps it into the pool atomically. Inflight requests keep serving against the bridge they already acquired; new acquires build against the new snapshot. Per-provider construction errors stay scoped to the offending row.
- **aibridgeproxyd** rebuilds its routing snapshot from `GetAIProviders` and swaps the host→provider map atomically. The MITM listener picks up new providers without restart.

DB read for aibridgeproxyd uses the existing `AsAIProviderMetadataReader` subject for routing-only access.
2026-05-27 11:58:43 +02:00
Cian Johnston d3155e1cab test(enterprise/cli): add test to prove fix for #25699 (#25701)
Adds an end-to-end enterprise CLI test to ensure legacy AI provider keys seeded at server startup are encrypted at rest when DBCrypt external token encryption is enabled, preventing regressions related to #25699.

> Partially implemented by Coder Agents, and massaged afterwards by me.
2026-05-26 20:08:07 +00:00
Danny Kopping a56c88a0cc fix: run AI provider seed and build after newAPI so dbcrypt applies (#25699)
## Problem

Two related symptoms of the same architectural issue: the `dbcrypt`
wrapper is installed inside `enterprise/coderd.New`, so any access to
`options.Database` that happens before `newAPI` runs bypasses
encryption.

**Symptom 1 (reads):** Provider keys added via the admin UI are
encrypted at rest. `BuildProviders` was running *before* `newAPI`,
against the unwrapped store, so the ciphertext was read as-is and shoved
into the keypool as the upstream credential. Anthropic/OpenAI reject it,
and the interception log shows:

```
coderd.aibridged.pool: interception failed  ... error="all configured keys failed authentication"
  credential_kind=centralized  credential_hint=PaPb...4A==  credential_length=184
```

**Symptom 2 (writes):** `SeedAIProvidersFromEnv` was also running before
`newAPI`, against the unwrapped store, so env-derived keys
(`CODER_AIBRIDGE_OPENAI_KEY`, indexed `CODER_AIBRIDGE_PROVIDER_<N>_KEY`,
etc.) landed in `ai_provider_keys` as plaintext with `ApiKeyKeyID =
null` even when `CODER_EXTERNAL_TOKEN_ENCRYPTION_KEYS` was set.

## Fix

Move both `SeedAIProvidersFromEnv` and `BuildProviders` to after
`newAPI`, where `options.Database` is the dbcrypt-wrapped store. Writes
encrypt correctly; reads decrypt correctly.

The enterprise closure (`enterprise/cli/server.go`) runs *inside*
`newAPI` and calls `BuildProviders` for the aibridgeproxyd at that
point. Once the agpl seed moves to after `newAPI`, the proxy on first
boot would see no env-seeded providers. Add a matching seed call inside
the enterprise closure before its `BuildProviders` to cover that case.
Seeding is idempotent, so the agpl-side seed running again post-`newAPI`
is a no-op when the rows already exist.

## Known shortcomings

The clean version of this fix would just inherit `ctx` like every other
startup step and place these calls naturally. It can't, for two reasons
that are both about the surrounding handler architecture rather than
this change:

1. **`dbcrypt` wrapping is positioned inside `newAPI`, not around
`options.Database` at creation.** That's why both seed and build have to
wait until after `newAPI` in the first place. The principled fix is to
install the wrapper at the point the store is created (behind a hook the
enterprise build supplies), so every consumer sees a single
authoritative view and the ordering stops mattering. This would also
collapse the duplicated seed call back to a single site.

2. **The handler's shutdown sequence is not deferred.**
`coderAPICloser.Close()` and the other teardown steps run only if
control reaches the `select` at the bottom of the handler. An early
`return` from anywhere in Phase 1 (e.g. seed/build returning
`context.Canceled` when the user hits ctrl-c during startup) skips that
block and orphans all the goroutines `newAPI` spawned — tailnet workers,
gitsync, telemetry batcher, etc. `goleak` then catches them at package
teardown and `TestServer_TelemetryDisabled_FinalReport` fails. Moving
the shutdown into deferred closers (with a `sync.Once`-guarded close to
avoid double-close from the explicit Phase 2 call) is the principled
fix.

For this PR I took the smallest change that fixes the reported bugs: a
detached context (`context.WithoutCancel(ctx)` + a 30s timeout) at the
seed and build call sites in both the agpl and enterprise paths. It lets
the calls complete even if the user cancels during startup, after which
the handler reaches its shutdown select naturally and tears down through
Phase 2. Both shortcomings above are worth addressing separately.

## Test plan

- `make test RUN=TestServer_TelemetryDisabled_FinalReport` with `-race`;
passes locally with `-count=3`.
- Manually verified on a deployment with
`CODER_EXTERNAL_TOKEN_ENCRYPTION_KEYS` set and env-configured providers:
`ai_provider_keys.api_key_key_id` is populated, `api_key` is base64
ciphertext, and upstream auth succeeds.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 21:27:02 +02:00
Danny Kopping 282ab7de34 refactor: load AI providers from the database at startup (#25672)
Replace the env-based `BuildProviders` with a DB-backed loader. The database is now the single source of truth for runtime provider configuration; env config arrives via `SeedAIProvidersFromEnv` (run at boot) and `BuildProviders` reads it back as `aibridge.Provider` instances. `cli/server.go` and `enterprise/cli/server.go` both call the same path, so aibridged and aibridgeproxyd see the same provider set.

Per-provider `DumpDir` is replaced by a top-level `CODER_AI_GATEWAY_DUMP_DIR` base; each provider's effective dump path is `<base>/<provider name>`.
2026-05-26 15:57:01 +02:00
Danny Kopping 4ddda3a9db feat: filter interceptions and sessions by provider name (#25640)
Allows filtering sessions & interceptions by provider name, and adds a test to vaidate that provider name is immutable (at least until #25606 lands).
2026-05-25 16:31:48 +02:00
Danny Kopping ddec110b0e refactor: move aibridged out of enterprise to AGPL (#25570)
In order to allow Coder Agents to use AI Gateway in OSS, we need to rehome the `aibridged`\-related code into the AGPL path.

The HTTP API is only registered under enterprise so will still require the AI Governance Add-on to be present in order to use it, whereas Coder Agents uses an in-memory pipe to the same handlers.
2026-05-22 09:11:37 +02:00
Danny Kopping c50b0e84b9 feat!: default CODER_AI_GATEWAY_ENABLED to true (#25575)
`CODER_AI_GATEWAY_ENABLED` / `CODER_AIBRIDGE_ENABLED` is now being defaulted to `true` now that it will be used by Coder Agents.

If you previously had this value disabled explicitly, that value will persist.
2026-05-22 08:57:36 +02:00
Danny Kopping 9341efec9f feat!: seed ai_providers from env on server startup (#24895)
_Disclaimer: implemented by a Coder Agent using Claude Opus 4.7_

Part of the implementation of [RFC: Common AI Provider Configs](https://www.notion.so/coderhq/RFC-Common-AI-Provider-Configs-34bd579be59280ed958feffb82024797) (AIGOV-201).

## Note

This change can cause a previously working installation to fail to start should a conflict exist between the providers configured in the environment & those now migrated to the database.

I'll raise a PR upstack to document this process and workarounds should a startup fail.

## What this PR does

Reconciles environment-derived AI provider configuration with the `ai_providers` table at server startup. The seed runs **before** the aibridged daemon is initialized, so the runtime always reads providers from the database; the legacy `CODER_AIBRIDGE_*` environment variables become a one-shot migration source.

### Behavior

- Concurrent server starts are serialized through a Postgres advisory lock (`LockIDAIProvidersEnvSeed`).
- Missing rows are inserted with an audit entry attributed to the system actor.
- Existing rows whose canonical hash matches the env-derived hash are left alone (the common no-op restart path).
- Existing rows whose canonical hash does **not** match cause server startup to fail with a descriptive error so the operator can explicitly resolve the conflict in either env or DB.
- Soft-deleted rows are NOT resurrected from env; an explicit operator deletion is sticky across restarts.
- Indexed providers whose name conflicts with a legacy env var fail startup with a clear remediation message.
- Unknown provider types (e.g. `copilot`, until the DB enum is widened) are skipped with a log entry rather than failing startup.

### Canonical hashing

The `canonicalAIProvider` shape captures exactly the fields that determine runtime behavior — `type`, `base_url`, and the Bedrock subset of settings (access key, access key secret, region, model, small fast model) — and is hashed with SHA-256. The hash is **computed on demand from the row + env**, never persisted, so the database does not need a new column for it. API keys live in the separate `ai_provider_keys` table and are intentionally excluded from the hash so operators can rotate keys via the API without forcing a server restart.

<details>
<summary>Decision log</summary>

- The hash is intentionally not persisted in the database. The RFC discussed this trade-off; computing on demand keeps the schema minimal and lets the canonical shape evolve without a migration.
- The lock uses an `iota` slot in `coderd/database/lock.go` rather than `GenLockID` so it's stable, easy to audit, and matches the convention used for every other startup lock.
- A bearer-token Anthropic provider whose env vars also set Bedrock metadata but no AWS credentials does NOT store the Bedrock fields. Without credentials the discriminated settings would misrepresent the row as Bedrock auth.
- We deliberately do NOT publish to the `ai_providers_changed` pubsub channel from the seed because the seed completes before any subscriber is started; the follow-up PR introduces that channel.

</details>
2026-05-22 08:37:27 +02:00
Michael Suchacz 40878eeba4 feat: add AI provider schema expansion (#25412) 2026-05-22 02:16:01 +02:00
Paweł Banaszewski 46e93e6325 chore: add ai_gateway options that alias aibridge options (#25061)
Adds options matching new AI Gateway naming.
New options are added as alias for old options. Old options are still
working.
Old options have deprecated message.
No conflict detection was added.

Updated documentation so it mentions only new options. Added note about
old options still working.

> Various AI tools where used to create this PR
2026-05-21 11:14:11 +02:00
Danielle Maywood 170a6e1fe9 feat: add chat sharing foundation (#25041) 2026-05-18 22:32:05 +01:00
Callum Styan 191dd230ae feat: add agentfake scaletest subcommand (#25072)
This PR builds on top of https://github.com/coder/coder/pull/25070 to
add a way of running the larger "fake agent" manager via the existing
CLI, pulling in the URL/credentials already set.

With this, we can run a pod per scaletest region to act as all the
workspaces in that region.


This is in a new subcommand `scaletest agentfake` currently.

---------

Signed-off-by: Callum Styan <callumstyan@gmail.com>
2026-05-15 14:36:54 -07:00
Danny Kopping 841b777ccd feat: add ai_providers table, queries, dbauthz, audit, RBAC (#24892) 2026-05-14 16:10:46 +02:00
Yevhenii Shcherbina b5e1ea33d8 feat: add AI budget policy and period deployment config (#25122)
Closes
https://linear.app/codercom/issue/AIGOV-283/add-deployment-config-for-ai-budget-policy-and-period

Adds `CODER_AI_BUDGET_POLICY` and `CODER_AI_BUDGET_PERIOD` deployment
options for AI Governance cost controls.
2026-05-12 10:48:36 -04:00
Steven Masley 19573e8aee feat!: patchTemplateMeta to use optional fields (#24984)
Closes https://github.com/coder/coder/issues/13112

**Breaking Change**: Removed status code `StatusNotModified` when no
diffs occur in a patch. Now the patch is always applied and a template
is always returned.
2026-05-11 12:43:52 -05:00
Zach b221632615 fix: wipe user secrets when user is soft-deleted (#24985)
Extend the delete_deleted_user_resources() trigger so that secrets
belonging to a soft-deleted user are removed in the same transaction as
the existing api_keys and user_links cleanup.

user_secrets.user_id has ON DELETE CASCADE, but Coder soft-deletes users
by flipping users.deleted rather than removing the row, so the foreign key
cascade never fires and secrets would otherwise survive deletion.

Assisted by Coder Agents.
2026-05-11 09:07:30 -06:00
Zach 81e2be69e9 test: use typed atomics in test files (#25071)
Use typed atomics (atomic.Int64, atomic.Int32, etc.) in test files to prevent
mixing atomic and non-atomic access on the same value, guarantee 64-bit
alignment on 32-bit platforms, and provide a cleaner API.
2026-05-11 08:41:17 -06:00
Jeremy Ruppel a1dbd758bc feat: add template builder deployment config and telemetry types (#25082) 2026-05-11 09:48:55 -04:00
Thomas Kosiewski 4a6756a3e8 fix: isolate test HTTP clients (#25038) 2026-05-11 11:03:38 +02:00
Marcin Tojek febabfb8b2 feat: add request/response dump support to aibridgeproxyd (#24837)
Closes https://github.com/coder/coder/issues/24335
2026-05-11 10:59:26 +02:00
Susana Ferreira b6dacb4a3c feat: add automatic key failover for AI Bridge OpenAI (#24847)
## Description

Adds automatic key failover for centralized OpenAI provider, covering both chat completions and responses APIs. Same shape as the Anthropic PR: each upstream call walks the configured key pool, keys are marked **temporary** on 429 (with cooldown from `Retry-After`) and **permanent** on 401/403. Each agentic-loop iteration gets its own fresh walker so a tool-call continuation can fail over independently of the initial request.

BYOK is unchanged: BYOK requests run as a single attempt with no failover.

## Changes

- `config.OpenAI` carries a `KeyPool`. `Key` remains for BYOK Authorization Bearer set per interception.
- Chat completions blocking interceptor: walks the pool via `newChatCompletionWithKeyFailover`, marks keys on key-specific failures, returns on first success or non-failover error.
- Chat completions streaming interceptor: per-iteration walker. Pre-stream failures fail over to the next key; mid-stream errors are relayed as SSE events.
- Responses blocking interceptor: extracts `newResponseWithKeyFailover` parallel to chatcompletions.
- Responses streaming interceptor: per-iteration walker, retains the existing buffer-then-forward design.

## Related Issues

Related to: https://github.com/coder/internal/issues/1446
Related to: https://linear.app/codercom/issue/AIGOV-197/aibridge-automatic-key-failover-for-bridged-and-passthrough-routes

## Follow-up PRs

- Bedrock multi-key support.
- Refactor provider vs interceptor config separation.
- Record the actually-used key in the interception credential hint after failover.

> [!NOTE]
> Initially generated by Claude Opus 4.7, modified and reviewed by @ssncferreira
2026-05-07 15:35:46 +01:00
Susana Ferreira f1155ac4d7 feat: add automatic key failover for AI Bridge Anthropic (#24836)
## Description

Adds automatic key failover for centralized Anthropic provider. When a key pool is configured, each upstream call walks the pool and tries keys in order until one succeeds or the pool is exhausted. Keys are marked **temporary** on 429 (with cooldown from `Retry-After`) and **permanent** on 401/403. Errors that aren't key-specific don't trigger failover. Each agentic-loop iteration gets its own fresh walker, so a tool-call continuation can fail over independently of the initial request.

BYOK is unchanged: BYOK requests run as a single attempt with no failover.

## Changes

- `config.Anthropic` carries a `KeyPool`. `Key` remains for BYOK X-Api-Key set per interception.
- Blocking interceptor: walks the pool, marks keys on key-specific failures, returns on first success or non-failover error.
- Streaming interceptor: per-iteration walker. Pre-stream failures fail over to the next key; mid-stream errors are relayed as SSE events.
- New `keypool` error types: `TransientExhaustionError` (carries soonest cooldown) and `ErrPermanentExhaustion`. Replace the prior `ErrAllKeysExhausted`.
- Error responses now consistently include the outer `"type": "error"` field.

## Related Issues

Related to: https://github.com/coder/internal/issues/1446
Related to: https://linear.app/codercom/issue/AIGOV-197/aibridge-automatic-key-failover-for-bridged-and-passthrough-routes

## Follow-up PRs

- Bedrock multi-key support.
- Refactor provider vs interceptor config separation.
- Record the actually-used key in the interception credential hint after failover.

> [!NOTE]
> Initially generated by Claude Opus 4.7, modified and reviewed by @ssncferreira
2026-05-07 14:57:44 +01:00
Susana Ferreira dbb50ebaaf feat: remove 429 from aibridge circuit breaker failure conditions (#24701)
## Description

Removes 429 (Too Many Requests) from the circuit breaker failure conditions. Rate limiting is now handled by automatic key failover instead of tripping the circuit breaker.

## Changes

`DefaultIsFailure` no longer treats 429 as a circuit breaker failure. The circuit breaker now only trips on server overload responses (503, 529).

Tests and integration tests updated to use 503 instead of 429 for tripping circuits. Description strings in deployment config updated to reflect the change.

Closes https://github.com/coder/internal/issues/1445

> [!NOTE]
> Initially generated by Coder Agents, modified and reviewed by @ssncferreira
2026-04-30 09:31:32 +01:00
Susana Ferreira 101a4082dd feat: support multiple keys per AI Bridge provider (#24683)
## Description

Adds support for configuring multiple API keys per AI Bridge provider. This PR introduces the configuration parsing and validation only; wiring the key pools into the aibridge providers will happen in upstream PRs.

## Changes

Providers now accept a comma-separated list of keys via the `KEYS` env var (or a single key via the existing `KEY` var). The two are mutually exclusive. Bedrock follows the same pattern with `BEDROCK_ACCESS_KEYS` / `BEDROCK_ACCESS_KEY_SECRETS`, with an additional validation that the two slices have matching lengths.

Key validation at startup checks for empty values, duplicates, and a maximum of 5 keys per provider. 

Related to: https://github.com/coder/internal/issues/1445

> [!NOTE]
> Initially generated by Coder Agents, modified and reviewed by @ssncferreira
2026-04-30 09:19:32 +01:00
Paweł Banaszewski a24dc19d49 chore: clean up env var usage in aibridge (#24783)
> AI tools where used when creating this PR

This PR removes environment variable parsing from `/aibridge` directory.

Added env variables/flags for dump dir as coder options.
Only added to new indexed provider options
(`CODER_AIBRIDGE_PROVIDER_<N>_*`) not to deprecated legacy env variables
(`CODER_AIBRIDGE_ANTHROPIC_*` and `CODER_AIBRIDGE_OPENAI_KEY_*`).

Reverted adding `MaxRetries` option as it will be removed soon due to
key failover work:
https://github.com/coder/coder/pull/24783#discussion_r3155544808
2026-04-29 18:28:37 +02:00
Paweł Banaszewski e00e85765b chore: move aibridge library code into coder repo (#24190)
This PR merges code from `coder/aibridge` repository into `coder/coder`.
It was split into 4 PRs for easier review but stacked PRs will need to
be merged into this PR so all checks pass.

* https://github.com/coder/coder/pull/24190 -> raw code copy (this PR,
before merging PRs on top of it, it was just 1 commit:
https://github.com/coder/coder/commit/70d33f33200c7e77df910957595715f81f9bec24)
* https://github.com/coder/coder/pull/24570 -> update imports in
`coder/coder` to use copied code
* https://github.com/coder/coder/pull/24586 -> linter fixes and CI
integration (also added README.md)
* https://github.com/coder/coder/pull/24571 -> added exclude to
scripts/check_emdash.sh check

Original PR message (before PR squash):
Moves coder/aibridge code into coder/coder repository.

Omitted files:

- `go.mod`, `go.sum`, `.gitignore`, `.github/workflows/ci.yml,`
`Makefile`, `LICENSE`, `README.md` (modified README.md is added later)
- `.github`, `example`, `buildinfo,` `scripts` directories

Simple verification script (will list omitted files)

```
tmp=$(mktemp -d)
echo "$tmp"
git clone --depth=1 https://github.com/coder/aibridge "$tmp/aibridge"
git clone --depth=1 --branch pb/aibridge-code-move https://github.com/coder/coder "$tmp/coder"
diff -rq --exclude=.git "$tmp/aibridge" "$tmp/coder/aibridge"
# rm -rf "$tmp"
```
2026-04-22 17:01:01 +02:00
Susana Ferreira 522118ab20 feat: support AWS SDK default credential chain for Bedrock authentication (#24346)
## Description

Makes AWS Bedrock credentials optional. When `AccessKey` and
`AccessKeySecret` are not set, AI Bridge falls back to the AWS SDK
default credential chain, which supports IAM Roles (instance profiles,
IRSA, ECS task roles), SSO, shared credentials files, and environment
variables.

This allows AI Bridge to authenticate with AWS Bedrock using:
- Permanent credentials (access key + secret) as before
- IAM Roles, shared config files, environment variables, SSO, etc, via
the SDK default credential chain

Depends on: https://github.com/coder/aibridge/pull/265
Related to: https://github.com/coder/aibridge/issues/144 
Related to: https://linear.app/codercom/issue/AIGOV-67

_Disclaimer: initially produced by Claude Opus 4.6, modified and
reviewed by @ssncferreira ._
2026-04-20 10:00:05 +01:00
Spike Curtis 4c1a32cd7c feat: wire DERPTLSConfig through CLI, SDK, tailnet, VPN, agent, and health checks (#24435)
Wire DERPTLSConfig through the CLI, SDK, tailnet, VPN client, agent, and
health checks to allow custom TLS configuration for DERP connections.
The main use case is to be able to set a custom CA and also present
client certs (mTLS). See https://github.com/coder/tailscale/pull/105 for
related changes.

Adds three new global CLI flags:
- `--client-tls-ca-file` / `CODER_CLIENT_TLS_CA_FILE`
- `--client-tls-cert-file` / `CODER_CLIENT_TLS_CERT_FILE`
- `--client-tls-key-file` / `CODER_CLIENT_TLS_KEY_FILE`

Based on community PR #22695 by @ibdafna, with autogeneration issues
fixed (protobuf version mismatches in .pb.go files, golden file
regeneration, lint fixes).

> [!NOTE]
> This PR was authored by Coder Agents on behalf of a Coder team member.

<details>
<summary>Relationship to #22695</summary>

This is a clean reimplementation of the changes from #22695 on top of
current `main`, with the following differences:
- **Removed**: Accidental protobuf version changes in `.pb.go` files
(contributor had `protoc v6.33.4` vs project's `protoc v4.23.4`)
- **Added**: Properly regenerated golden files and docs via `make gen`
- **Fixed**: Lint issue (`var-declaration` revive warning on explicit
type in `createHTTPClient`)
- All meaningful code changes are identical to the original PR
</details>
2026-04-16 12:46:52 -04:00
Yevhenii Shcherbina dd73ea54bd feat: add allow-byok option for ai-gateway (#24274)
## Summary                  
Adds `--ai-gateway-allow-byok` deployment option to control whether
users can use Bring Your Own Key (BYOK) mode with AI Gateway.
When disabled (`--ai-gateway-allow-byok=false`), BYOK requests are
rejected with a 403 and a message directing the admin to enable the
flag. Centralized key authentication works regardless of this setting.
Defaults to `true` (BYOK allowed).

---------

Co-authored-by: Danny Kopping <danny@coder.com>
2026-04-15 14:16:49 -04:00
Danny Kopping 08045c2aac feat: configure multiple AI Bridge providers of the same type (#23948)
_Disclaimer: produced mostly by Claude Opus 4.6 following detailed
planning._

## Summary
- Support multiple instances of the same AI Bridge provider type via
indexed env vars (`CODER_AIBRIDGE_PROVIDER_<N>_<KEY>`), following the
`CODER_EXTERNAL_AUTH_<N>_<KEY>` pattern
- Existing single-provider env vars (`CODER_AIBRIDGE_OPENAI_KEY`, etc.)
continue to work unchanged
- Setting both a legacy env var and an indexed provider with the same
name errors at startup to prevent silent misconfiguration
- Mark legacy provider fields (`OpenAI`, `Anthropic`, `Bedrock`) as
deprecated in `AIBridgeConfig` in favor of `Providers`
  ## Example
```sh
CODER_AIBRIDGE_PROVIDER_0_TYPE=anthropic
CODER_AIBRIDGE_PROVIDER_0_NAME=anthropic-corp
CODER_AIBRIDGE_PROVIDER_0_KEY=sk-ant-corp-xxx

CODER_AIBRIDGE_PROVIDER_0_BASE_URL=https://llm-proxy.internal.example.com/anthropic

CODER_AIBRIDGE_PROVIDER_1_TYPE=anthropic
CODER_AIBRIDGE_PROVIDER_1_NAME=anthropic-direct
  CODER_AIBRIDGE_PROVIDER_1_KEY=sk-ant-direct-yyy         
  ```
  Each instance is routed by name:
- /api/v2/aibridge/**anthropic-corp**/v1/messages
- /api/v2/aibridge/**anthropic-direct**/v1/messages
Closes
[AIGOV-157](https://linear.app/codercom/issue/AIGOV-157/spike-to-understand-if-there-is-a-simple-way-to-handle-multi-api-key)

---------

Signed-off-by: Danny Kopping <danny@coder.com>
2026-04-15 07:59:37 +00:00
Thomas Kosiewski 6ab30123bf feat: add chat debug log tables, queries, and SDK types (#23913) 2026-04-13 15:06:06 +02:00
Zach 508114d484 feat: user secret database encryption (#24218)
Add dbcrypt support for user secret values. When database encryption is
enabled, secret values are transparently encrypted on write and
decrypted on read through the existing dbcrypt store wrapper.

- Wrap `CreateUserSecret`, `GetUserSecretByUserIDAndName`,
`ListUserSecretsWithValues`, and `UpdateUserSecretByUserIDAndName` in
enterprise/dbcrypt/dbcrypt.go.
- Add rotate and decrypt support for user secrets in
enterprise/dbcrypt/cliutil.go (`server dbcrypt rotate` and `server
dbcrypt decrypt`).
- Add internal tests covering encrypt-on-create, decrypt-on-read,
re-encrypt-on-update, and plaintext passthrough when no cipher is
configured.
2026-04-10 09:34:11 -06:00
J. Scott Miller 7bde763b66 feat: add workspace build transition to provisioner job list (#24131)
Closes #16332

Previously `coder provisioner jobs list` showed no indication of what a workspace
build job was doing (i.e., start, stop, or delete). This adds
`workspace_build_transition` to the provisioner job metadata, exposed in
both the REST API and CLI. Template and workspace name columns were also
added, both available via `-c`.

```
$ coder provisioner jobs list -c id,type,status,"workspace build transition"
ID                                    TYPE                     STATUS     WORKSPACE BUILD TRANSITION
95f35545-a59f-4900-813d-80b8c8fd7a33  template_version_import  succeeded
0a903bbe-cef5-4e72-9e62-f7e7b4dfbb7a  workspace_build          succeeded  start
```
2026-04-10 09:50:11 -05:00
Yevhenii Shcherbina 9440adf435 feat: add chatgpt support for aibridge (#23822)
Registers a new aibridge provider for ChatGPT by reusing the existing
OpenAI provider with a different `Name` and `BaseURL`
(https://chatgpt.com/backend-api/codex). The ChatGPT backend API is
OpenAI-compatible, so no new provider type is needed.
ChatGPT authenticates exclusively via per-user OAuth JWTs (BYOK mode) —
no centralized API key is configured. The OpenAI provider already
handles this: when no key is set, it falls through to the bearer token
from the request's Authorization header.
  
  Depends on #23811
2026-03-31 12:08:45 -04:00
Susana Ferreira b0036af57b feat: register multiple Copilot providers for business and enterprise upstreams (#23811)
## Description

Adds support for multiple Copilot provider instances to route requests to different Copilot upstreams (individual, business, enterprise). Each instance has its own name and base URL, enabling per-upstream metrics, logs, circuit breakers, API dump, and routing.

## Changes

* Add Copilot business and enterprise provider names and host constants
* Register three Copilot provider instances in aibridged (default, business, enterprise)
* Update `defaultAIBridgeProvider` in `aibridgeproxy` to route new Copilot hosts to their corresponding providers

## Related

* Depends on: https://github.com/coder/aibridge/pull/240
* Closes: https://github.com/coder/aibridge/issues/152

Note: documentation changes will be added in a follow-up PR.

_Disclaimer: initially produced by Claude Opus 4.6, heavily modified and reviewed by @ssncferreira ._
2026-03-31 16:00:37 +01:00
Danny Kopping dba9f68b11 chore!: remove members' ability to read their own interceptions; rationalize RBAC requirements (#23320)
_Disclaimer:_ _produced_ _by_ _Claude_ _Opus_ _4\.6,_ _reviewed_ _by_ _me._

**This is a breaking change.** Users who are not have `owner` or sitewide `auditor` roles will no longer be able to view interceptions.  
Regular users should not need to view this information; in fact, it could be used by a malicious insider to see what information we track and don't track to exfiltrate data or perform actions unobserved.

---

Changed authorization for AI Bridge interception-related operations from system-level permissions to resource-specific permissions. The following functions now authorize against `rbac.ResourceAibridgeInterception` instead of `rbac.ResourceSystem`:

- `ListAIBridgeTokenUsagesByInterceptionIDs`
- `ListAIBridgeToolUsagesByInterceptionIDs`
- `ListAIBridgeUserPromptsByInterceptionIDs`

Updated RBAC roles to grant AI Bridge interception permissions:

- **User/Member roles**: Can create and update AI Bridge interceptions but cannot read them back
- **Service accounts**: Same create/update permissions without read access
- **Owners/Auditors**: Retain full read access to all interceptions

Removed system-level authorization bypass in `populatedAndConvertAIBridgeInterceptions` function, allowing proper resource-level authorization checks.

Updated tests to reflect the new permission model where members cannot view AI Bridge interceptions, even their own, while owners and auditors maintain full visibility.
2026-03-24 12:03:20 +02:00
Susana Ferreira 139594a4f4 feat: block CONNECT tunnels to private/reserved IP ranges (#23109)
## Description

Blocks `CONNECT` tunnels to private and reserved IP ranges in
aibridgeproxyd, preventing the proxy from being used to reach internal
networks.

The Coder access URL is always exempt (hostname+port match) so the proxy
can reach its own deployment. It is possible to exempt additional ranges
via `CODER_AIBRIDGE_PROXY_ALLOWED_PRIVATE_CIDRS`.

DNS rebinding is handled differently per path:
* Direct (no upstream proxy): validate the resolved IP right before the
TCP dial, no window between check and connect.
* Upstream proxy: Resolves and checks before forwarding to the upstream
dialer. A small rebinding window exists since the upstream proxy
re-resolves independently.

## Changes

* Add blocked IP denylist covering private, reserved, and
special-purpose ranges
* Add `AllowedPrivateCIDRs` option with CLI flag and env var
* Wire IP checks into `proxy.ConnectDial` for both upstream and direct
paths
* Add tests for blocked/allowed cases across direct dial, upstream
proxy, CIDR exemptions, and CoderAccessURL exemption

Notes: documentation will be handled in a follow-up PR.
Closes: https://github.com/coder/security/issues/124
2026-03-20 09:49:26 +00:00
Steven Masley 84de391f26 chore: add tallyman events for ai seat tracking (#22689)
AI seat tracking inserted as heartbeat into usage table.
2026-03-18 09:30:22 -05:00
Zach 3f76f312e4 feat(cli): add --no-wait flag to coder create (#22867)
Adds a `--no-wait` flag (CODER_CREATE_NO_WAIT) to the create command,
matching the existing pattern in `coder start`. When set, the `coder
create` command returns immediately after the workspace creation API
call succeeds instead of streaming build logs until completion.

This enables fire-and-forget workspace creation in CI/automation
contexts (e.g., GitHub Actions), where waiting for the build to finish
is unnecessary. Combined with other existing flags, users can create a
workspace with no interactivity, assuming the user is already
authenticated.
2026-03-16 11:54:30 -06:00
Danny Kopping 870583224d chore: deprecate injected MCP approach in AI Bridge (#23031)
_Disclaimer: implemented by a Coder Agent using Claude Opus 4.6._

Marks the injected MCP approach in AI Bridge as deprecated across the
codebase.

## Changes

- **`codersdk/deployment.go`**: Deprecated `ExternalAuthConfig.MCPURL`,
`.MCPToolAllowRegex`, `.MCPToolDenyRegex` fields; deprecated and hid the
`--aibridge-inject-coder-mcp-tools` server flag; deprecated
`AIBridgeConfig.InjectCoderMCPTools`.
- **`coderd/externalauth/externalauth.go`**: Deprecated `Config.MCPURL`,
`.MCPToolAllowRegex`, `.MCPToolDenyRegex`.
- **`enterprise/aibridgedserver/aibridgedserver.go`**: Added runtime
deprecation warning when `CODER_AIBRIDGE_INJECT_CODER_MCP_TOOLS` is
enabled; deprecated `getCoderMCPServerConfig`.
- **`enterprise/aibridged/mcp.go`**: Deprecated `MCPProxyBuilder`
interface and `MCPProxyFactory` struct.
- **`docs/ai-coder/ai-bridge/mcp.md`**: Added deprecation warning
banner.
2026-03-13 16:15:33 +02:00
Zach a46336c3ec fix(cli)!: coder groups list -o json returns empty values (#22923)
The groupsToRows function was not setting the Group field on
groupTableRow, causing JSON output to contain zero-value structs. Table
output was unaffected since it uses separate fields.

BREAKING CHANGE: The JSON output structure changes from `{"Group":
{"id": ...}}` to `{"id": ...}` (flat). This is technically a breaking
change, but JSON output never contained real data (all fields were
zero-valued), so no working consumer could exist. We're taking the
opportunity to flatten the structure to match other list commands like
`coder list -o json`.
2026-03-11 09:45:00 -06:00
Mathias Fredriksson 338d30e4c4 fix(enterprise/cli): use :0 for http-address in proxy server tests (#22726)
`Test_ProxyServer_Headers` never passed `--http-address`, so it bound to
the default `127.0.0.1:3000`.
`TestWorkspaceProxy_Server_PrometheusEnabled`
used `RandomPort(t)` for `--http-address` (a drive-by from #14972 which
was
fixing the Prometheus port).

Both now use `--http-address :0`. `ConfigureHTTPServers` calls
`net.Listen("tcp", ":0")` and holds the listener open, so there is no
TOCTOU window. Neither test connects to the HTTP listener, so the
assigned port is irrelevant. This matches `cli/server_test.go` where
`:0` is used throughout.
2026-03-06 17:05:06 -05:00
Susana Ferreira 21c91cebaa feat: add TLS listener support to aibridgeproxyd (#22411)
## Description

Adds optional TLS support for the AI Bridge Proxy listener. When TLS cert and key files are provided, the proxy serves over HTTPS instead of plain HTTP.

## Changes

* New configuration options to enable TLS on the proxy listener 
* Wraps the TCP listener in `tls.NewListener` when configured
* Tests for validation errors, invalid files, and full integration (tunneled + MITM) through a TLS listener

Note: Documentation for TLS listener setup and client configuration will be handled in a follow-up PR.
Related to: https://github.com/coder/internal/issues/1335
2026-03-05 09:19:34 +00:00
Susana Ferreira c79e8f2707 refactor: clarify MITM certificate naming in aibridgeproxyd (#22408)
## Description

Renames internal fields, variables, and comments related to the proxy's certificate/key configuration to explicitly reference their MITM CA purpose.

The AI Bridge Proxy uses a CA certificate to sign dynamically generated leaf certificates during MITM interception of HTTPS traffic from AI clients. With the upcoming introduction of TLS listener certificates (for serving the proxy itself over HTTPS, implemented upstack https://github.com/coder/coder/pull/22411), the previous generic naming would become ambiguous. This refactor makes it clear which certificate is which.

No user-facing flags, environment variables, YAML keys, or JSON fields were changed, this is purely an internal rename to avoid confusion going forward.

Related to https://github.com/coder/internal/issues/1335
2026-03-05 09:06:38 +00:00
Steven Masley 7bc454eed8 chore: version is 2.31 not 1.31 (#22494) 2026-03-02 16:23:09 +00:00
Kyle Carberry edee917d88 feat: add experimental agents support (#22290)
feat: add AI chat system with agent tools and chat UI

Introduce the chatd subsystem and Agents UI for AI-powered chat
within Coder workspaces.

- Add chatd package with chat loop, message compaction, prompt
  management, and LLM provider integration (OpenAI, Anthropic)
- Add agent tools: create workspace, list/read templates, read/write/
  edit files, execute commands
- Add chat API endpoints with streaming, message editing, and
  durable reconnection
- Add database schema and migrations for chats, chat messages, chat
  providers, and chat model configs
- Add RBAC policies and dbauthz enforcement for chat resources
- Add Agents UI pages with conversation timeline, queued messages
  list, diff viewer, and model configuration panel
- Add comprehensive test coverage including coderd integration tests,
  chatd unit tests, and Storybook stories
- Gate feature behind experiments flag

---------

Co-authored-by: Cian Johnston <cian@coder.com>
Co-authored-by: Danielle Maywood <danielle@themaywoods.com>
Co-authored-by: Jeremy Ruppel <jeremy@coder.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-27 16:50:56 +00:00