Files
coder/aibridge/keypool/headers.go
T
Susana Ferreira f1155ac4d7 feat: add automatic key failover for AI Bridge Anthropic (#24836)
## Description

Adds automatic key failover for centralized Anthropic provider. When a key pool is configured, each upstream call walks the pool and tries keys in order until one succeeds or the pool is exhausted. Keys are marked **temporary** on 429 (with cooldown from `Retry-After`) and **permanent** on 401/403. Errors that aren't key-specific don't trigger failover. Each agentic-loop iteration gets its own fresh walker, so a tool-call continuation can fail over independently of the initial request.

BYOK is unchanged: BYOK requests run as a single attempt with no failover.

## Changes

- `config.Anthropic` carries a `KeyPool`. `Key` remains for BYOK X-Api-Key set per interception.
- Blocking interceptor: walks the pool, marks keys on key-specific failures, returns on first success or non-failover error.
- Streaming interceptor: per-iteration walker. Pre-stream failures fail over to the next key; mid-stream errors are relayed as SSE events.
- New `keypool` error types: `TransientExhaustionError` (carries soonest cooldown) and `ErrPermanentExhaustion`. Replace the prior `ErrAllKeysExhausted`.
- Error responses now consistently include the outer `"type": "error"` field.

## Related Issues

Related to: https://github.com/coder/internal/issues/1446
Related to: https://linear.app/codercom/issue/AIGOV-197/aibridge-automatic-key-failover-for-bridged-and-passthrough-routes

## Follow-up PRs

- Bedrock multi-key support.
- Refactor provider vs interceptor config separation.
- Record the actually-used key in the interception credential hint after failover.

> [!NOTE]
> Initially generated by Claude Opus 4.7, modified and reviewed by @ssncferreira
2026-05-07 14:57:44 +01:00

38 lines
970 B
Go

package keypool
import (
"net/http"
"strconv"
"strings"
"time"
)
// ParseRetryAfter extracts the cooldown duration from response
// headers. It prefers the OpenAI-specific "retry-after-ms"
// header (milliseconds) over the standard "Retry-After" header
// (seconds). Returns zero if neither header is present or
// parseable. The HTTP-date form of "Retry-After" is not parsed.
func ParseRetryAfter(resp *http.Response) time.Duration {
if resp == nil {
return 0
}
// OpenAI convention: millisecond precision.
if val := resp.Header.Get("retry-after-ms"); val != "" {
ms, err := strconv.ParseFloat(strings.TrimSpace(val), 64)
if err == nil && ms > 0 {
return time.Duration(ms * float64(time.Millisecond))
}
}
// Standard header: seconds.
if val := resp.Header.Get("Retry-After"); val != "" {
seconds, err := strconv.Atoi(strings.TrimSpace(val))
if err == nil && seconds > 0 {
return time.Duration(seconds) * time.Second
}
}
return 0
}