mirror of
https://github.com/coder/coder.git
synced 2026-06-06 14:38:23 +00:00
6f86f67754
## Summary

This adds configurable overload protection to the AI Bridge daemon to prevent the server from being overwhelmed during periods of high load.

Partially addresses coder/internal#1153 (rate limits and concurrency control; circuit breakers are deferred to a follow-up).

## New Configuration Options

| Option | Environment Variable | Description | Default |
|--------|---------------------|-------------|---------|
| `--aibridge-max-concurrency` | `CODER_AIBRIDGE_MAX_CONCURRENCY` | Maximum number of concurrent AI Bridge requests. Set to 0 to disable (unlimited). | `0` |
| `--aibridge-rate-limit` | `CODER_AIBRIDGE_RATE_LIMIT` | Maximum number of AI Bridge requests per second. Set to 0 to disable rate limiting. | `0` |

## Behavior

When limits are exceeded:

- **Concurrency limit**: Returns HTTP `503 Service Unavailable` with message "AI Bridge is currently at capacity. Please try again later."
- **Rate limit**: Returns HTTP `429 Too Many Requests` with `Retry-After` header.

Both protections are optional and disabled by default (0 values).

## Implementation

The overload protection is implemented as reusable middleware in `coderd/httpmw/ratelimit.go`:

1. **`RateLimitByAuthToken`**: Per-user rate limiting that uses `APITokenFromRequest` to extract the authentication token, with fallback to `X-Api-Key` header for AI provider compatibility (e.g., Anthropic). Falls back to IP-based rate limiting if no token is present. Includes `Retry-After` header for backpressure signaling.
2. **`ConcurrencyLimit`**: Uses an atomic counter to track in-flight requests and reject when at capacity.

The middleware is applied in `enterprise/coderd/aibridge.go` via `r.Group` in the following order:

1. Concurrency check (faster rejection for load shedding)
2. Rate limit check

**Note**: Rate limiting currently applies to all AI Bridge requests, including pass-through requests. Ideally only actual interceptions should count, but this would require changes in the aibridge library.
## Testing

Added comprehensive tests for:

- Rate limiting by auth token (Bearer token, X-Api-Key, no token fallback to IP)
- Different tokens not rate limited against each other
- Disabled when limit is zero
- Retry-After header is set on 429 responses
- Concurrency limiting (allows within limit, rejects over limit, disabled when zero)
25 lines
757 B
Go
// Package aibridge provides utilities for the AI Bridge feature.
package aibridge

import (
	"net/http"
	"strings"
)

// ExtractAuthToken extracts an authorization token from HTTP headers.
// It checks the Authorization header (Bearer token) and X-Api-Key header,
// which represent the different ways clients authenticate against AI providers.
// If neither are present, an empty string is returned.
func ExtractAuthToken(header http.Header) string {
	if auth := strings.TrimSpace(header.Get("Authorization")); auth != "" {
		fields := strings.Fields(auth)
		if len(fields) == 2 && strings.EqualFold(fields[0], "Bearer") {
			return fields[1]
		}
	}
	if apiKey := strings.TrimSpace(header.Get("X-Api-Key")); apiKey != "" {
		return apiKey
	}
	return ""
}
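A few edge cases of `ExtractAuthToken` are easier to see by running it. The sketch below is a standalone program, so it carries its own unexported copy of the function (`extractAuthToken`) rather than importing the package; the token values are made up for illustration.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// extractAuthToken is a local copy of ExtractAuthToken, included only so
// that this example compiles on its own.
func extractAuthToken(header http.Header) string {
	if auth := strings.TrimSpace(header.Get("Authorization")); auth != "" {
		fields := strings.Fields(auth)
		if len(fields) == 2 && strings.EqualFold(fields[0], "Bearer") {
			return fields[1]
		}
	}
	if apiKey := strings.TrimSpace(header.Get("X-Api-Key")); apiKey != "" {
		return apiKey
	}
	return ""
}

// headers builds an http.Header from the two values; empty strings are skipped.
func headers(authorization, apiKey string) http.Header {
	h := http.Header{}
	if authorization != "" {
		h.Set("Authorization", authorization)
	}
	if apiKey != "" {
		h.Set("X-Api-Key", apiKey)
	}
	return h
}

func main() {
	// The scheme name is matched case-insensitively and extra spaces are ignored.
	fmt.Printf("%q\n", extractAuthToken(headers("bearer   abc", ""))) // "abc"

	// A non-Bearer scheme is skipped, so X-Api-Key is used instead.
	fmt.Printf("%q\n", extractAuthToken(headers("Basic dXNlcg==", "sk-1"))) // "sk-1"

	// A well-formed Bearer token takes precedence over X-Api-Key.
	fmt.Printf("%q\n", extractAuthToken(headers("Bearer abc", "sk-1"))) // "abc"

	// A malformed Bearer value (three fields) with no X-Api-Key yields "".
	fmt.Printf("%q\n", extractAuthToken(headers("Bearer a b", ""))) // ""
}
```

Note that a malformed `Authorization` value does not cause an error; the function simply falls through to `X-Api-Key` and then to the empty string.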