Commit Graph

11 Commits

Author SHA1 Message Date
Kacper Sawicki 6f86f67754 feat(coderd): add overload protection with rate limiting and concurrency control (#21161)
## Summary

This adds configurable overload protection to the AI Bridge daemon to
prevent the server from being overwhelmed during periods of high load.

Partially addresses coder/internal#1153 (rate limits and concurrency
control; circuit breakers are deferred to a follow-up).

## New Configuration Options

| Option | Environment Variable | Description | Default |
|--------|---------------------|-------------|---------|
| `--aibridge-max-concurrency` | `CODER_AIBRIDGE_MAX_CONCURRENCY` |
Maximum number of concurrent AI Bridge requests. Set to 0 to disable
(unlimited). | `0` |
| `--aibridge-rate-limit` | `CODER_AIBRIDGE_RATE_LIMIT` | Maximum number
of AI Bridge requests per second. Set to 0 to disable rate limiting. |
`0` |

## Behavior

When limits are exceeded:
- **Concurrency limit**: Returns HTTP `503 Service Unavailable` with
message "AI Bridge is currently at capacity. Please try again later."
- **Rate limit**: Returns HTTP `429 Too Many Requests` with
`Retry-After` header.

Both protections are optional and disabled by default (0 values).

## Implementation

The overload protection is implemented as reusable middleware in
`coderd/httpmw/ratelimit.go`:

1. **`RateLimitByAuthToken`**: Per-user rate limiting that uses
`APITokenFromRequest` to extract the authentication token, with fallback
to `X-Api-Key` header for AI provider compatibility (e.g., Anthropic).
Falls back to IP-based rate limiting if no token is present. Includes
`Retry-After` header for backpressure signaling.
2. **`ConcurrencyLimit`**: Uses an atomic counter to track in-flight
requests and reject when at capacity.

The middleware is applied in `enterprise/coderd/aibridge.go` via
`r.Group` in the following order:
1. Concurrency check (faster rejection for load shedding)
2. Rate limit check

**Note**: Rate limiting currently applies to all AI Bridge requests,
including pass-through requests. Ideally only actual interceptions
should count, but this would require changes in the aibridge library.

## Testing

Added comprehensive tests for:
- Rate limiting by auth token (Bearer token, X-Api-Key, no token
fallback to IP)
- Different tokens not rate limited against each other
- Disabled when limit is zero
- Retry-After header is set on 429 responses
- Concurrency limiting (allows within limit, rejects over limit,
disabled when zero)
2025-12-11 16:38:54 +01:00
Steven Masley 1d1070d051 chore: ensure proper rbac permissions on 'Acquire' file in the cache (#18348)
The file cache was caching the `Unauthorized` errors if a user without
the right perms opened the file first. So all future opens would fail.

Now the cache always opens with a subject that can read files. And authz
is checked on the Acquire per user.
2025-06-16 13:40:45 +00:00
Steven Masley eeb3d63be6 chore: merge authorization contexts (#12816)
* chore: merge authorization contexts

Instead of 2 auth contexts from apikey and dbauthz, merge them to
just use dbauthz. It is annoying to have two.

* fixup authorization reference
2024-03-29 10:14:27 -05:00
Kyle Carberry 22e781eced chore: add /v2 to import module path (#9072)
* chore: add /v2 to import module path

go mod requires semantic versioning with versions greater than 1.x

This was a mechanical update by running:
```
go install github.com/marwan-at-work/mod/cmd/mod@latest
mod upgrade
```

Migrate generated files to import /v2

* Fix gen
2023-08-18 18:55:43 +00:00
Steven Masley b0a16150a3 chore: Implement standard rbac.Subject to be reused everywhere (#5881)
* chore: Implement standard rbac.Subject to be reused everywhere

An rbac subject is created in multiple spots because of the way we
expand roles, scopes, etc. This difference in use creates a list
of arguments which is unwieldy.

Use of the expander interface lets us conform to a single subject
in every case
2023-01-26 14:42:54 -06:00
Ammar Bandukwala 423ac04156 coderd: tighten /login rate limiting (#4432)
* coderd: tighten /login rate limit

* coderd: add Bypass rate limit header
2022-10-20 17:01:23 +00:00
Colin Adler 5de6f86959 feat: trace httpapi.{Read,Write} (#4134) 2022-09-21 17:07:00 -05:00
Jon Ayers 7e9819f2a8 ref: move httpapi.Reponse into codersdk (#2954) 2022-07-12 19:15:02 -05:00
Steven Masley 548de7d6f3 feat: User pagination using offsets (#1062)
Offset pagination and cursor pagination supported
2022-04-22 15:27:55 -05:00
Garrett Delfosse d9d4599ba9 chore: idea: unify http responses further (#941) 2022-04-12 10:17:33 -05:00
Kyle Carberry 31536186f7 feat: Add rate-limits to the API (#848)
Closes #285.
2022-04-04 17:32:05 -05:00