If a deployment has 2 domains, overriding the oidc url allows the oidc
redirect to differ from the access_url
response to https://github.com/coder/coder/discussions/21500
**This config setting is hidden by default**
## Summary
Add circuit breaker support for AI Bridge to protect against cascading
failures from upstream AI provider rate limits (HTTP 429, 503, and
Anthropic's 529 overloaded responses).
## Changes
- Add 5 new CLI options for circuit breaker configuration:
- `--aibridge-circuit-breaker-enabled` (default: false)
- `--aibridge-circuit-breaker-failure-threshold` (default: 5)
- `--aibridge-circuit-breaker-interval` (default: 10s)
- `--aibridge-circuit-breaker-timeout` (default: 30s)
- `--aibridge-circuit-breaker-max-requests` (default: 3)
- Update aibridge dependency to include circuit breaker support
- Add tests for pool creation with circuit breaker providers
## Notes
- Circuit breaker is **disabled by default** for backward compatibility
- When enabled, applies to both OpenAI and Anthropic providers
- Uses sony/gobreaker internally via the aibridge library
## Testing
```
make test RUN=TestPoolWithCircuitBreakerProviders
```
## Description
Adds upstream proxy support for AI Bridge Proxy passthrough requests.
This allows aiproxy to forward non-allowlisted requests through an
upstream proxy. Currently, the only supported configuration is when
aiproxy is the first proxy in the chain (client → aiproxy → upstream
proxy).
## Changes
* Add `--aibridge-proxy-upstream` option to configure an upstream
HTTP/HTTPS proxy URL for passthrough requests
* Add `--aibridge-proxy-upstream-ca` option to trust custom CA
certificates for HTTPS upstream proxies
* Passthrough requests (non-allowlisted domains) are forwarded through
the upstream proxy
* MITM'd requests (allowlisted domains) continue to go directly to
aibridge, not through the upstream proxy
* Add tests for upstream proxy configuration and request routing
Closes: https://github.com/coder/internal/issues/1204
## Description
Implements selective MITM (Man-in-the-Middle) in `aibridgeproxyd` so
that only requests to allowlisted domains are intercepted and decrypted.
Requests to all other domains are tunneled directly without decryption.
## Changes
* New config option: `CODER_AIBRIDGE_PROXY_DOMAIN_ALLOWLIST` (default:
`api.anthropic.com`,`api.openai.com`)
* Selective MITM: Uses `goproxy.ReqHostIs()` to only intercept `CONNECT`
requests to allowlisted hosts
* Certificate caching: Now only generates/caches certificates for
allowlisted domains
* Validation: Startup fails if domain allowlist is empty or contains
invalid entries
Closes: https://github.com/coder/internal/issues/1182
Closes https://github.com/coder/coder/issues/21360
A few considerations/notes:
- I've kept the number of conns to 10 in all other places, except coderd
- which uses the config value
- I opted to also make idle conns configurable; the greater the delta
between max open and max idle, the more connection churn
- Postgres maintains a [_process_ per
connection](https://www.postgresql.org/docs/current/connect-estab.html),
contrary to what the comment said previously
- Operators should be able to tune this, since process churn can
negatively affect OS scheduling
- I've set the value to `"auto"` by default so it's not another knob one
_has to_ twiddle, and sets max idle = max conns / 3
---------
Signed-off-by: Danny Kopping <danny@coder.com>
Because this affects more than just the template insights
page (specifically it also affects the deployment stats endpoint which
is shown on bottom bar and Prometheus), the group is being renamed
generically to just "stats collection". In the future if we need to
affect the other stats we can put those options here.
Then, because this change only affects a portion of stats, specifically
usage stats like connection and application time, bytes sent, etc, add a
new sub-group called "usage stats".
Then finally add back the "enable" flag. This also gives us a place to
one day place an "anonymize" flag if we need to go that route.
## Description
Adds the core AI Bridge MITM proxy daemon. This proxy intercepts HTTPS traffic, decrypts it using a configured CA certificate, and forwards requests to AIBridge for processing.
## Changes
* Added `aibridgeproxyd` package with the core proxy server implementation
* Added configuration options: `CODER_AIBRIDGE_PROXY_ENABLED`, `CODER_AIBRIDGE_PROXY_LISTEN_ADDR`, `CODER_AIBRIDGE_PROXY_CERT_FILE`, `CODER_AIBRIDGE_PROXY_KEY_FILE`
* Added tests for server initialization and MITM functionality
Closes https://github.com/coder/internal/issues/1180
**Breaking Change:** Existing oauth apps might now use PKCE. If an
unknown IdP type was being used, and it does not support PKCE, it will
break.
To fix, set the PKCE methods on the external auth to `none`
```
export CODER_EXTERNAL_AUTH_1_PKCE_METHODS=none
```
Closes#20399
To summarize the original commit messages:
- Do not log stats to the database.
- Return errors on the insight endpoints.
- Update the frontend to show those errors.
- Also fixes an issue with getting the user status count via codersdk,
since I added a test to ensure it was not disabled by this flag and it
was sending the wrong payload.
## Summary
This adds configurable overload protection to the AI Bridge daemon to
prevent the server from being overwhelmed during periods of high load.
Partially addresses coder/internal#1153 (rate limits and concurrency
control; circuit breakers are deferred to a follow-up).
## New Configuration Options
| Option | Environment Variable | Description | Default |
|--------|---------------------|-------------|---------|
| `--aibridge-max-concurrency` | `CODER_AIBRIDGE_MAX_CONCURRENCY` |
Maximum number of concurrent AI Bridge requests. Set to 0 to disable
(unlimited). | `0` |
| `--aibridge-rate-limit` | `CODER_AIBRIDGE_RATE_LIMIT` | Maximum number
of AI Bridge requests per second. Set to 0 to disable rate limiting. |
`0` |
## Behavior
When limits are exceeded:
- **Concurrency limit**: Returns HTTP `503 Service Unavailable` with
message "AI Bridge is currently at capacity. Please try again later."
- **Rate limit**: Returns HTTP `429 Too Many Requests` with
`Retry-After` header.
Both protections are optional and disabled by default (0 values).
## Implementation
The overload protection is implemented as reusable middleware in
`coderd/httpmw/ratelimit.go`:
1. **`RateLimitByAuthToken`**: Per-user rate limiting that uses
`APITokenFromRequest` to extract the authentication token, with fallback
to `X-Api-Key` header for AI provider compatibility (e.g., Anthropic).
Falls back to IP-based rate limiting if no token is present. Includes
`Retry-After` header for backpressure signaling.
2. **`ConcurrencyLimit`**: Uses an atomic counter to track in-flight
requests and reject when at capacity.
The middleware is applied in `enterprise/coderd/aibridge.go` via
`r.Group` in the following order:
1. Concurrency check (faster rejection for load shedding)
2. Rate limit check
**Note**: Rate limiting currently applies to all AI Bridge requests,
including pass-through requests. Ideally only actual interceptions
should count, but this would require changes in the aibridge library.
## Testing
Added comprehensive tests for:
- Rate limiting by auth token (Bearer token, X-Api-Key, no token
fallback to IP)
- Different tokens not rate limited against each other
- Disabled when limit is zero
- Retry-After header is set on 429 responses
- Concurrency limiting (allows within limit, rejects over limit,
disabled when zero)
Adds `--disable-workspace-sharing` option.
Workspace sharing is disabled by not including user and group ACLs in
the workspace RBAC object, which prevents ACL-based authz.
Closes https://github.com/coder/internal/issues/1072
The commit also adds saving of workspace user/group ACLs in the test DB
data generator.
Replace hardcoded 7-day retention for workspace agent logs with
configurable retention from deployment settings. Defaults to 7d to
preserve existing behavior.
Depends on #21038
Updates #20743
Add `RetentionConfig` with server flags for configuring data retention:
- `--audit-logs-retention`: retention for audit log entries
- `--connection-logs-retention`: retention for connection logs
- `--api-keys-retention`: retention for expired API keys (default 7d)
Updates #20743
Currently, when AI Bridge is enabled AND the `oauth2` and
`mcp-server-http` experiments are enabled we inject Coder's MCP tools
into all intercepted AI Bridge requests.
This PR introduces a config to control this behaviour.
**NOTE:** this is a backwards-incompatible change; previously these
tools would be injected automatically, now this setting will need to be
explicitly enabled.
---------
Signed-off-by: Danny Kopping <danny@coder.com>
The authz recorder is causing a lot of memory to be allocated, and is a
memory leak for websocket connections.
This change makes it opt-in on a per request basis (ontop of `isDev`).
To get the authz headers, use `Copy as cURL` on chrome and append the
header `x-authz-checks=true`.
Solves #15575
Adds OAuth access token revocation when unlinking external auth
provider. Due to revocation not being consistently implemented by
providers this is only best effort attempt. Unsuccessful revocation
won't influence link removal.
# Add separate token lifetime limits for administrators
This PR introduces a new configuration option `--max-admin-token-lifetime` that allows administrators to create API tokens with longer lifetimes than regular users. By default, administrators can create tokens with a lifetime of up to 7 days (168 hours), while the existing `--max-token-lifetime` setting continues to apply to regular users.
The implementation:
- Adds a new `MaximumAdminTokenDuration` field to the session configuration
- Modifies the token validation logic to check the user's role and apply the appropriate lifetime limit
- Updates the token configuration endpoint to return the correct maximum lifetime based on the user's role
- Adds tests to verify that administrators can create tokens with longer and shorter lifetimes
- Updates documentation and help text to reflect the new option
This change allows organizations to grant administrators extended token lifetimes while maintaining tighter security controls for regular users.
Fixes#17395
Relates to https://github.com/coder/coder/issues/17432
### Part 1:
Notes:
- `GetPresetsAtFailureLimit` SQL query is added, which is similar to
`GetPresetsBackoff`, they use same CTEs: `filtered_builds`,
`time_sorted_builds`, but they are still different.
- Query is executed on every loop iteration. We can consider marking
specific preset as permanently failed as an optimization to avoid
executing query on every loop iteration. But I decided don't do it for
now.
- By default `FailureHardLimit` is set to 3.
- `FailureHardLimit` is configurable. Setting it to zero - means that
hard limit is disabled.
### Part 2
Notes:
- `PrebuildFailureLimitReached` notification is added.
- Notification is sent to template admins.
- Notification is sent only the first time, when hard limit is reached.
But it will `log.Warn` on every loop iteration.
- I introduced this enum:
```sql
CREATE TYPE prebuild_status AS ENUM (
'normal', -- Prebuilds are working as expected; this is the default, healthy state.
'hard_limited', -- Prebuilds have failed repeatedly and hit the configured hard failure limit; won't be retried anymore.
'validation_failed' -- Prebuilds failed due to a non-retryable validation error (e.g. template misconfiguration); won't be retried.
);
```
`validation_failed` not used in this PR, but I think it will be used in
next one, so I wanted to save us an extra migration.
- Notification looks like this:
<img width="472" alt="image"
src="https://github.com/user-attachments/assets/e10efea0-1790-4e7f-a65c-f94c40fced27"
/>
### Latest notification views:
<img width="463" alt="image"
src="https://github.com/user-attachments/assets/11310c58-68d1-4075-a497-f76d854633fe"
/>
<img width="725" alt="image"
src="https://github.com/user-attachments/assets/6bbfe21a-91ac-47c3-a9d1-21807bb0c53a"
/>
Adds deployment option `CODER_WORKSPACE_HOSTNAME_SUFFIX`. This will eventually replace `CODER_SSH_HOSTNAME_PREFIX`, but we will do this slowly and support both for `coder ssh` for some time.
Note that the name is changed to "workspace" hostname, since this suffix will also be used for Coder Connect on Coder Desktop, which is not limited to SSH.
* Adds `codersdk.ExperimentWebPush` (`web-push`)
* Adds a `coderd/webpush` package that allows sending native push
notifications via `github.com/SherClockHolmes/webpush-go`
* Adds database tables to store push notification subscriptions.
* Adds an API endpoint that allows users to subscribe/unsubscribe, and
send a test notification (404 without experiment, excluded from API docs)
* Adds server CLI command to regenerate VAPID keys (note: regenerating
the VAPID keypair requires deleting all existing subscriptions)
---------
Co-authored-by: Kyle Carberry <kyle@carberry.com>
Third and final PR to address
https://github.com/coder/coder/issues/16230.
This PR enables GitHub OAuth2 login by default on new deployments.
Combined with https://github.com/coder/coder/pull/16629, this will allow
the first admin user to sign up with GitHub rather than email and
password.
We take care not to enable the default on deployments that would upgrade
to a Coder version with this change.
To disable the default provider an admin can set the
`CODER_OAUTH2_GITHUB_DEFAULT_PROVIDER` env variable to false.
Niche edge case, assumes access_token is jwt.
Some `access_token`s are JWT's with potential useful claims.
These claims would be nearly equivalent to `user_info` claims.
This is not apart of the oauth spec, so this feature should not be
loudly advertised. If using this feature, alternate solutions are preferred.
First PR in a series to address
https://github.com/coder/coder/issues/16230.
Introduces support for logging in via the [GitHub OAuth2 Device
Flow](https://docs.github.com/en/apps/oauth-apps/building-oauth-apps/authorizing-oauth-apps#device-flow).
It's previously been possible to configure external auth with the device
flow, but it's not been possible to use it for logging in. This PR
builds on the existing support we had to extend it to sign ins.
When a user clicks "sign in with GitHub" when device auth is configured,
they are redirected to the new `/login/device` page, which makes the
flow possible from the client's side. The recording below shows the full
flow.
https://github.com/user-attachments/assets/90c06f1f-e42f-43e9-a128-462270c80fdd
I've also manually tested that it works for converting from
password-based auth to oauth.
Device auth can be enabled by a deployment's admin by setting the
`CODER_OAUTH2_GITHUB_DEVICE_FLOW` env variable or a corresponding config
setting.
Another PR to address https://github.com/coder/coder/issues/15109.
Changes:
- Introduces the `--ephemeral` flag, which changes the Coder config
directory to a temporary location. The config directory is where the
built-in PostgreSQL stores its data, so using a new one results in a
deployment with a fresh state.
The `--ephemeral` flag is set to replace the `--in-memory` flag once the
in-memory database is removed.
Allows adding custom static CSP directives to Coder. Niche use case but
makes this easier then creating a reverse proxy that has to replace the
header. We want to preserve our directives, so having an append option
is preferred to a "replace" option via a reverse proxy.
Closes https://github.com/coder/coder/issues/15118
Resolves https://github.com/coder/coder/issues/15513
Disables notifications when both `$CODER_NOTIFICATIONS_WEBHOOK_ENDPOINT` and `$CODER_EMAIL_SMARTHOST` are unset.
Breaking change: `$CODER_EMAIL_SMARTHOST` is no longer set by default as `localhost:587`, meaning any deployments that make use of this default value will need to add it back.
---------
Co-authored-by: Danny Kopping <danny@coder.com>
Co-authored-by: Mathias Fredriksson <mafredri@gmail.com>
* chore: implement filters for the organizations query
* chore: implement organization sync and create idpsync package
Organization sync can now be configured to assign users to an org based on oidc claims.