Susana Ferreira 0766cc3097 feat: add automatic key failover for AI Bridge passthrough (#24920)
## Description

Adds automatic key failover for passthrough routes for the Anthropic and OpenAI providers. A new `keyFailoverTransport` wraps the reverse-proxy transport: centralized requests walk the configured key pool and retry with the next key on key-specific failures (401/403/429), reusing the same key-marking semantics as the bridged routes.

BYOK passthrough requests run as a single attempt with no failover.

## Changes

- New `keypool.KeyFailoverConfig` carrying the `Pool` to walk and the provider-specific closures (`IsBYOK`, `InjectAuthKey`, `MarkKey`, `BuildExhaustedResponse`).
- New `keypool.NewKeyFailoverTransport`: wraps an inner `http.RoundTripper`. Returns `inner` unchanged when `Pool` is nil, otherwise produces a transport that buffers the request body once, walks the pool per request, and replays each attempt with the next key.
- New `Provider.KeyFailoverConfig(logger)` interface method. Anthropic injects `X-Api-Key`; OpenAI injects `Authorization: Bearer ...`; Copilot returns an empty config.
- `passthrough.go` wires `NewKeyFailoverTransport` around the existing apidump middleware, so every retry attempt is recorded.

## Related Issues

Related to: https://github.com/coder/internal/issues/1446
Related to: https://linear.app/codercom/issue/AIGOV-197/aibridge-automatic-key-failover-for-bridged-and-passthrough-routes

## Follow-up PRs

- Remove dead `Provider.InjectAuthHeader` method now that all auth is applied per-attempt by `KeyFailoverTransport`.
- Bedrock multi-key support.
- Refactor provider vs interceptor config separation.
- Record the actually-used key in the interception credential hint after failover.

> [!NOTE]
> Initially generated by Claude Opus 4.7, modified and reviewed by @ssncferreira
2026-05-07 15:46:36 +01:00


package utils_test

import (
	"io"
	"net/http"
	"testing"
	"time"

	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"

	"github.com/coder/coder/v2/aibridge/utils"
)

func TestNewJSONErrorResponse(t *testing.T) {
	t.Parallel()

	tests := []struct {
		name       string
		status     int
		retryAfter time.Duration
		body       []byte
		// Empty string means the header should be absent.
		expectRetryAfter string
	}{
		{
			// Permanent exhaustion: 502 with no Retry-After.
			name:             "permanent_no_retry_after",
			status:           http.StatusBadGateway,
			retryAfter:       0,
			body:             []byte(`{"error":"permanent"}`),
			expectRetryAfter: "",
		},
		{
			// Transient exhaustion with zero retryAfter: no Retry-After.
			name:             "transient_no_retry_after",
			status:           http.StatusTooManyRequests,
			retryAfter:       0,
			body:             []byte(`{"error":"rate"}`),
			expectRetryAfter: "",
		},
		{
			// Transient exhaustion: 429 with Retry-After in seconds.
			name:             "transient_with_retry_after",
			status:           http.StatusTooManyRequests,
			retryAfter:       60 * time.Second,
			body:             []byte(`{"error":"rate"}`),
			expectRetryAfter: "60",
		},
		{
			// Transient exhaustion with negative retryAfter: Retry-After header omitted.
			name:             "transient_negative_retry_after",
			status:           http.StatusTooManyRequests,
			retryAfter:       -1 * time.Second,
			body:             []byte(`{"error":"rate"}`),
			expectRetryAfter: "",
		},
		{
			// Transient exhaustion with 500ms retryAfter rounds up to Retry-After: 1.
			name:             "transient_under_one_second_rounds_up",
			status:           http.StatusTooManyRequests,
			retryAfter:       500 * time.Millisecond,
			body:             []byte(`{"error":"rate"}`),
			expectRetryAfter: "1",
		},
	}

	for _, tc := range tests {
		t.Run(tc.name, func(t *testing.T) {
			t.Parallel()

			resp := utils.NewJSONErrorResponse(tc.status, tc.retryAfter, tc.body)
			require.NotNil(t, resp)

			assert.Equal(t, tc.status, resp.StatusCode)
			assert.Equal(t, "application/json", resp.Header.Get("Content-Type"))
			assert.Equal(t, int64(len(tc.body)), resp.ContentLength)

			if tc.expectRetryAfter == "" {
				assert.Empty(t, resp.Header.Get("Retry-After"))
			} else {
				assert.Equal(t, tc.expectRetryAfter, resp.Header.Get("Retry-After"))
			}

			body, err := io.ReadAll(resp.Body)
			require.NoError(t, err)
			require.NoError(t, resp.Body.Close())
			assert.Equal(t, tc.body, body)
		})
	}
}