coder

mirror of https://github.com/coder/coder.git synced 2026-06-05 05:58:20 +00:00

Author	SHA1	Message	Date
Paweł Banaszewski	90c11f3386	feat: add client column to aibridge_interceptions table (#21839 ) Adds a `client` column to the `aibridge_interceptions` table. It is set according to what AI Bridge passes in `RecordInterception`. Adds interception filtering by `client` value. Depends on: https://github.com/coder/aibridge/pull/158 Updates aibridge library to include this change. Fixes: https://github.com/coder/aibridge/issues/31	2026-02-17 15:43:02 +01:00
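The filtering by `client` is presumably applied in the interceptions database query. The in-memory Go sketch below only illustrates the intended semantics. Several pieces are hypothetical: the `Interception` type, the `filterByClient` helper, the example client values, and the rule that an empty filter matches every row.

```go
package main

import "fmt"

// Interception is a hypothetical, trimmed-down stand-in for an
// aibridge_interceptions row; only the fields needed here are modelled.
type Interception struct {
	ID     string
	Client string // value AI Bridge passed to RecordInterception
}

// filterByClient returns rows whose Client equals client.
// An empty client value is treated as "no filter" (an assumption).
func filterByClient(rows []Interception, client string) []Interception {
	if client == "" {
		return rows
	}
	var out []Interception
	for _, r := range rows {
		if r.Client == client {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	rows := []Interception{
		{ID: "a", Client: "claude-code"}, // hypothetical client values
		{ID: "b", Client: "cursor"},
		{ID: "c", Client: "claude-code"},
	}
	for _, r := range filterByClient(rows, "claude-code") {
		fmt.Println(r.ID) // prints a, then c
	}
}
```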
Zach	fa2481c650	test: add synctest-based aibridged cache expiry test (#21984 ) Resolves the TODO in TestPool by adding TestPool_Expiry which uses Go 1.25's testing/synctest to verify TTL-based cache eviction. I wanted to get familiar with the new `synctest` package in Go 1.25 and found this TODO comment, so I decided to take a stab at it 😄	2026-02-09 15:09:40 +02:00
Danny Kopping	303389e75a	fix: correct https://github.com/coder/internal/issues/1167 behaviour (#21692 ) Closes https://github.com/coder/internal/issues/1167 Previously we were checking that start != end time; this was flaking on Windows. On Windows, `time.Now()` has limited resolution (~1ms with Go runtime's `timeBeginPeriod`, or ~15.6ms in default system resolution). When two `time.Now()` calls execute within the same clock tick, they return identical timestamps, causing `StartedAt.Before(EndedAt)` to return `false`. References: - [Go issue #8687](https://github.com/golang/go/issues/8687) - Windows system clock resolution issue - [Go issue #67066](https://github.com/golang/go/issues/67066) - time.Now precision on Windows (still open) Instead, we're changing the assertion to (the more semantically correct) "end not before start". A possible future enhancement could be to plumb coder/quartz through the recording mechanism, but it's unnecessary for now. Signed-off-by: Danny Kopping <danny@coder.com>	2026-01-27 12:36:48 +02:00
Danny Kopping	7123518baa	feat: conditionally send `aibridge` actor headers (#21643 ) Also passes along the authenticated username as actor metadata. Closes https://github.com/coder/aibridge/issues/135 Depends on https://github.com/coder/aibridge/pull/142 Replace aibridge tag with merge commit once https://github.com/coder/aibridge/pull/142 lands. --------- Signed-off-by: Danny Kopping <danny@coder.com>	2026-01-26 15:08:17 +00:00
Susana Ferreira	47b3846bca	feat: use coder specific header for aibridge authentication from AI proxy (#21590 ) ## Description Introduces a new `X-Coder-Token` header for authenticating requests from AI Proxy to AI Bridge. Previously, the proxy overwrote the `Authorization` header with the Coder token, which prevented the original authentication headers from flowing through to upstream providers. With this change, AI Proxy sets the Coder token in a separate header, preserving the original `Authorization` and `X-Api-Key` headers. AI Bridge uses this header for authentication and removes it before forwarding requests to upstream providers. For requests that don't come through AI Proxy, AI Bridge continues to use `Authorization` and `X-Api-Key` for authentication. ## Changes * Add `HeaderCoderAuth` constant and update `ExtractAuthToken` to check headers in the following order: `X-Coder-Token` > `Authorization` > `X-Api-Key` * Update AI Proxy to set `X-Coder-Token` instead of overwriting `Authorization` * Remove `X-Coder-Token` in AI Bridge before forwarding to upstream providers * Add tests for header handling and token extraction priority Related to: https://github.com/coder/internal/issues/1235	2026-01-21 19:06:19 +00:00
Kacper Sawicki	ed679bb3da	feat(codersdk): add circuit breaker configuration support for aibridge (#21546 ) ## Summary Add circuit breaker support for AI Bridge to protect against cascading failures from upstream AI provider rate limits (HTTP 429, 503, and Anthropic's 529 overloaded responses). ## Changes - Add 5 new CLI options for circuit breaker configuration: - `--aibridge-circuit-breaker-enabled` (default: false) - `--aibridge-circuit-breaker-failure-threshold` (default: 5) - `--aibridge-circuit-breaker-interval` (default: 10s) - `--aibridge-circuit-breaker-timeout` (default: 30s) - `--aibridge-circuit-breaker-max-requests` (default: 3) - Update aibridge dependency to include circuit breaker support - Add tests for pool creation with circuit breaker providers ## Notes - Circuit breaker is disabled by default for backward compatibility - When enabled, applies to both OpenAI and Anthropic providers - Uses sony/gobreaker internally via the aibridge library ## Testing ``` make test RUN=TestPoolWithCircuitBreakerProviders ```	2026-01-20 14:59:29 +01:00
Spike Curtis	bddb808b25	chore: arrange imports in a standard way (#21452 ) Fixes all our Go file imports to match the preferred spec that we've _mostly_ been using. For example: ``` import ( "context" "time" "github.com/prometheus/client_golang/prometheus" "golang.org/x/xerrors" "gopkg.in/natefinch/lumberjack.v2" "cdr.dev/slog/v3" "github.com/coder/coder/v2/codersdk/agentsdk" "github.com/coder/serpent" ) ``` 3 groups: standard library, 3rd-party libs, Coder libs. This PR makes the change across the codebase. The PR in the stack above modifies our formatting to maintain this state of affairs, and is a separate PR so it's possible to review that one in detail.	2026-01-08 15:24:11 +04:00
Spike Curtis	49b34a716a	fix: fix slog to always use array of Fields (#21426 ) Upgrades to slog v3, which includes a small but backward-incompatible API change to the acceptable call arguments when logging. This change allows us to verify via compile-time type checking that arguments are correct and won't cause a panic, as was possible in slog v1, which this replaces (v2 was tagged but never used in coder/coder). It also updates dependencies that use slog and were updated for v3. I've left the `aibridge` dependency as a commit SHA, under the assumption that the team there (cc @pawbana @dannykopping ) will tag and update the dependency soon and on their own schedule. For other dependencies, I pushed new tags.	2026-01-08 10:29:41 +04:00
Danny Kopping	39bf9ed18a	perf: increase bridge pool cache size limit (#21399 ) With this low upper bound, the cache thrashes under load (i.e. cache entries are replaced too quickly), leading to audit records not persisting in time before the context is canceled (see `OnEvict` behaviour). The TTL remains 15m because we need to keep MCP connections relatively fresh, but this TTL is irrelevant if injected tools are not used. This was an oversight; the limit should never have been set so low. 5000 is likely so large that the cache will never fill up; in future we should make this configurable if customers run into issues. It's a bit difficult right now to determine how much real memory each element _actually_ uses, but even if it's a crazy number like 100KiB per instance, 5000 entries would only use roughly 500MiB. Signed-off-by: Danny Kopping <danny@coder.com>	2025-12-30 11:44:34 +00:00
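The thrashing described in the commit above can be reproduced with a toy size-bounded LRU that invokes a callback on eviction. This is a hypothetical sketch using `container/list`, not the cache library aibridged actually uses. With three users taking turns against a capacity of two, every access misses. Each miss evicts an entry that is about to be needed again, which is where eviction-time work (such as aibridged's `OnEvict` behaviour) would run for entries still in use.

```go
package main

import (
	"container/list"
	"fmt"
)

// lru is a hypothetical size-bounded cache with an eviction callback.
type lru struct {
	capacity int
	ll       *list.List               // front = most recently used
	items    map[string]*list.Element // key -> element holding the key
	onEvict  func(key string)
}

func newLRU(capacity int, onEvict func(string)) *lru {
	return &lru{capacity: capacity, ll: list.New(), items: map[string]*list.Element{}, onEvict: onEvict}
}

// Get marks key as recently used and reports whether it was cached.
func (c *lru) Get(key string) bool {
	if e, ok := c.items[key]; ok {
		c.ll.MoveToFront(e)
		return true
	}
	return false
}

// Add inserts key, evicting the least recently used entry when full.
func (c *lru) Add(key string) {
	if c.Get(key) {
		return
	}
	if c.ll.Len() >= c.capacity {
		oldest := c.ll.Back()
		c.ll.Remove(oldest)
		k := oldest.Value.(string)
		delete(c.items, k)
		if c.onEvict != nil {
			c.onEvict(k)
		}
	}
	c.items[key] = c.ll.PushFront(key)
}

func main() {
	evictions := 0
	c := newLRU(2, func(string) { evictions++ })
	// Three users cycling through a capacity-2 cache: every Add misses
	// and evicts someone who will be needed again on the next turn.
	for round := 0; round < 3; round++ {
		for _, user := range []string{"alice", "bob", "carol"} {
			c.Add(user)
		}
	}
	fmt.Println(evictions) // 7
}
```

For the memory estimate in the commit, 5000 entries × 100KiB = 500,000KiB, or about 488MiB.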
Kacper Sawicki	6f86f67754	feat(coderd): add overload protection with rate limiting and concurrency control (#21161 ) ## Summary This adds configurable overload protection to the AI Bridge daemon to prevent the server from being overwhelmed during periods of high load. Partially addresses coder/internal#1153 (rate limits and concurrency control; circuit breakers are deferred to a follow-up). ## New Configuration Options - `--aibridge-max-concurrency` (env `CODER_AIBRIDGE_MAX_CONCURRENCY`, default `0`): Maximum number of concurrent AI Bridge requests. Set to 0 to disable (unlimited). - `--aibridge-rate-limit` (env `CODER_AIBRIDGE_RATE_LIMIT`, default `0`): Maximum number of AI Bridge requests per second. Set to 0 to disable rate limiting. ## Behavior When limits are exceeded: - Concurrency limit: Returns HTTP `503 Service Unavailable` with message "AI Bridge is currently at capacity. Please try again later." - Rate limit: Returns HTTP `429 Too Many Requests` with `Retry-After` header. Both protections are optional and disabled by default (0 values). ## Implementation The overload protection is implemented as reusable middleware in `coderd/httpmw/ratelimit.go`: 1. `RateLimitByAuthToken`: Per-user rate limiting that uses `APITokenFromRequest` to extract the authentication token, with fallback to `X-Api-Key` header for AI provider compatibility (e.g., Anthropic). Falls back to IP-based rate limiting if no token is present. Includes `Retry-After` header for backpressure signaling. 2. `ConcurrencyLimit`: Uses an atomic counter to track in-flight requests and reject when at capacity. The middleware is applied in `enterprise/coderd/aibridge.go` via `r.Group` in the following order: 1. Concurrency check (faster rejection for load shedding) 2. Rate limit check Note: Rate limiting currently applies to all AI Bridge requests, including pass-through requests. Ideally only actual interceptions should count, but this would require changes in the aibridge library. ## Testing Added comprehensive tests for: - Rate limiting by auth token (Bearer token, X-Api-Key, no token fallback to IP) - Different tokens not rate limited against each other - Disabled when limit is zero - Retry-After header is set on 429 responses - Concurrency limiting (allows within limit, rejects over limit, disabled when zero)	2025-12-11 16:38:54 +01:00
Paweł Banaszewski	e24cc5e6da	feat: add tracing to aibridge (#21106 ) Adds tracing for AIBridge. Updates github.com/coder/aibridge version from `v0.2.2` to `v0.3.0` Depends on: https://github.com/coder/aibridge/pull/63 Fixes: https://github.com/coder/aibridge/issues/26 --------- Co-authored-by: Danny Kopping <danny@coder.com>	2025-12-05 15:59:52 +01:00
Danny Kopping	e340560164	chore: actually store translated token metadata (#20929 ) Signed-off-by: Danny Kopping <danny@coder.com>	2025-11-25 16:50:19 +00:00
Danny Kopping	c6631e1e50	feat: expose `aibridged` metrics (#20865 ) Upgrades `coder/aibridge` to v0.2.0 which includes https://github.com/coder/aibridge/pull/62. Creates a `prometheus.Registerer` with a prefix `coder_aibridged_` and passes that along to coder/aibridge which actually exposes the metrics. Also includes a side-effect of a change described in https://github.com/coder/aibridge/pull/62#discussion_r2550017470. --------- Signed-off-by: Danny Kopping <danny@coder.com>	2025-11-24 18:16:06 +02:00
Paweł Banaszewski	991831b1dd	chore: add API key ID to interceptions (#20513 ) Adds APIKeyID to interceptions. Needed for tracking API key usage with bridge. fixes https://github.com/coder/coder/issues/20001	2025-11-10 13:46:41 +01:00
Danny Kopping	b20fd6f2c1	chore: graduate aibridge API out of experimental (#20523 )	2025-10-29 07:18:54 -06:00
Danny Kopping	2294c55bd9	chore: graduate `aibridged*` packages out of experimental (#20522 )	2025-10-29 07:00:24 -06:00

16 Commits