mirror of
https://github.com/coder/coder.git
synced 2026-06-03 04:58:23 +00:00
b6dacb4a3c
## Description Adds automatic key failover for centralized OpenAI provider, covering both chat completions and responses APIs. Same shape as the Anthropic PR: each upstream call walks the configured key pool, keys are marked **temporary** on 429 (with cooldown from `Retry-After`) and **permanent** on 401/403. Each agentic-loop iteration gets its own fresh walker so a tool-call continuation can fail over independently of the initial request. BYOK is unchanged: BYOK requests run as a single attempt with no failover. ## Changes - `config.OpenAI` carries a `KeyPool`. `Key` remains for BYOK Authorization Bearer set per interception. - Chat completions blocking interceptor: walks the pool via `newChatCompletionWithKeyFailover`, marks keys on key-specific failures, returns on first success or non-failover error. - Chat completions streaming interceptor: per-iteration walker. Pre-stream failures fail over to the next key; mid-stream errors are relayed as SSE events. - Responses blocking interceptor: extracts `newResponseWithKeyFailover` parallel to chatcompletions. - Responses streaming interceptor: per-iteration walker, retains the existing buffer-then-forward design. ## Related Issues Related to: https://github.com/coder/internal/issues/1446 Related to: https://linear.app/codercom/issue/AIGOV-197/aibridge-automatic-key-failover-for-bridged-and-passthrough-routes ## Follow-up PRs - Bedrock multi-key support. - Refactor provider vs interceptor config separation. - Record the actually-used key in the interception credential hint after failover. > [!NOTE] > Initially generated by Claude Opus 4.7, modified and reviewed by @ssncferreira