Commit Graph

2 Commits

Author SHA1 Message Date
Susana Ferreira 47b3846bca feat: use coder specific header for aibridge authentication from AI proxy (#21590)
## Description

Introduces a new `X-Coder-Token` header for authenticating requests from
AI Proxy to AI Bridge. Previously, the proxy overwrote the
`Authorization` header with the Coder token, which prevented the
original authentication headers from flowing through to upstream
providers.

With this change, AI Proxy sets the Coder token in a separate header,
preserving the original `Authorization` and `X-Api-Key` headers. AI
Bridge uses this header for authentication and removes it before
forwarding requests to upstream providers. For requests that don't come
through AI Proxy, AI Bridge continues to use `Authorization` and
`X-Api-Key` for authentication.

## Changes

* Add `HeaderCoderAuth` constant and update `ExtractAuthToken` to check
headers in the following order: `X-Coder-Token` > `Authorization` >
`X-Api-Key`
* Update AI Proxy to set `X-Coder-Token` instead of overwriting
`Authorization`
* Remove `X-Coder-Token` in AI Bridge before forwarding to upstream
providers
* Add tests for header handling and token extraction priority

Related to: https://github.com/coder/internal/issues/1235
2026-01-21 19:06:19 +00:00
Kacper Sawicki 6f86f67754 feat(coderd): add overload protection with rate limiting and concurrency control (#21161)
## Summary

This adds configurable overload protection to the AI Bridge daemon to
prevent the server from being overwhelmed during periods of high load.

Partially addresses coder/internal#1153 (rate limits and concurrency
control; circuit breakers are deferred to a follow-up).

## New Configuration Options

| Option | Environment Variable | Description | Default |
|--------|---------------------|-------------|---------|
| `--aibridge-max-concurrency` | `CODER_AIBRIDGE_MAX_CONCURRENCY` |
Maximum number of concurrent AI Bridge requests. Set to 0 to disable
(unlimited). | `0` |
| `--aibridge-rate-limit` | `CODER_AIBRIDGE_RATE_LIMIT` | Maximum number
of AI Bridge requests per second. Set to 0 to disable rate limiting. |
`0` |

## Behavior

When limits are exceeded:
- **Concurrency limit**: Returns HTTP `503 Service Unavailable` with
message "AI Bridge is currently at capacity. Please try again later."
- **Rate limit**: Returns HTTP `429 Too Many Requests` with
`Retry-After` header.

Both protections are optional and disabled by default (0 values).

## Implementation

The overload protection is implemented as reusable middleware in
`coderd/httpmw/ratelimit.go`:

1. **`RateLimitByAuthToken`**: Per-user rate limiting that uses
`APITokenFromRequest` to extract the authentication token, with fallback
to `X-Api-Key` header for AI provider compatibility (e.g., Anthropic).
Falls back to IP-based rate limiting if no token is present. Includes
`Retry-After` header for backpressure signaling.
2. **`ConcurrencyLimit`**: Uses an atomic counter to track in-flight
requests and reject when at capacity.

The middleware is applied in `enterprise/coderd/aibridge.go` via
`r.Group` in the following order:
1. Concurrency check (faster rejection for load shedding)
2. Rate limit check

**Note**: Rate limiting currently applies to all AI Bridge requests,
including pass-through requests. Ideally only actual interceptions
should count, but this would require changes in the aibridge library.

## Testing

Added comprehensive tests for:
- Rate limiting by auth token (Bearer token, X-Api-Key, no token
fallback to IP)
- Different tokens not rate limited against each other
- Disabled when limit is zero
- Retry-After header is set on 429 responses
- Concurrency limiting (allows within limit, rejects over limit,
disabled when zero)
2025-12-11 16:38:54 +01:00