mirror of
https://github.com/coder/coder.git
synced 2026-06-02 20:48:20 +00:00
a56c88a0cc
## Problem Two related symptoms of the same architectural issue: the `dbcrypt` wrapper is installed inside `enterprise/coderd.New`, so any access to `options.Database` that happens before `newAPI` runs bypasses encryption. **Symptom 1 (reads):** Provider keys added via the admin UI are encrypted at rest. `BuildProviders` was running *before* `newAPI`, against the unwrapped store, so the ciphertext was read as-is and shoved into the keypool as the upstream credential. Anthropic/OpenAI reject it, and the interception log shows: ``` coderd.aibridged.pool: interception failed ... error="all configured keys failed authentication" credential_kind=centralized credential_hint=PaPb...4A== credential_length=184 ``` **Symptom 2 (writes):** `SeedAIProvidersFromEnv` was also running before `newAPI`, against the unwrapped store, so env-derived keys (`CODER_AIBRIDGE_OPENAI_KEY`, indexed `CODER_AIBRIDGE_PROVIDER_<N>_KEY`, etc.) landed in `ai_provider_keys` as plaintext with `ApiKeyKeyID = null` even when `CODER_EXTERNAL_TOKEN_ENCRYPTION_KEYS` was set. ## Fix Move both `SeedAIProvidersFromEnv` and `BuildProviders` to after `newAPI`, where `options.Database` is the dbcrypt-wrapped store. Writes encrypt correctly; reads decrypt correctly. The enterprise closure (`enterprise/cli/server.go`) runs *inside* `newAPI` and calls `BuildProviders` for the aibridgeproxyd at that point. Once the agpl seed moves to after `newAPI`, the proxy on first boot would see no env-seeded providers. Add a matching seed call inside the enterprise closure before its `BuildProviders` to cover that case. Seeding is idempotent, so the agpl-side seed running again post-`newAPI` is a no-op when the rows already exist. ## Known shortcomings The clean version of this fix would just inherit `ctx` like every other startup step and place these calls naturally. It can't, for two reasons that are both about the surrounding handler architecture rather than this change: 1. **`dbcrypt` wrapping is positioned inside `newAPI`, not around `options.Database` at creation.** That's why both seed and build have to wait until after `newAPI` in the first place. The principled fix is to install the wrapper at the point the store is created (behind a hook the enterprise build supplies), so every consumer sees a single authoritative view and the ordering stops mattering. This would also collapse the duplicated seed call back to a single site. 2. **The handler's shutdown sequence is not deferred.** `coderAPICloser.Close()` and the other teardown steps run only if control reaches the `select` at the bottom of the handler. An early `return` from anywhere in Phase 1 (e.g. seed/build returning `context.Canceled` when the user hits ctrl-c during startup) skips that block and orphans all the goroutines `newAPI` spawned — tailnet workers, gitsync, telemetry batcher, etc. `goleak` then catches them at package teardown and `TestServer_TelemetryDisabled_FinalReport` fails. Moving the shutdown into deferred closers (with a `sync.Once`-guarded close to avoid double-close from the explicit Phase 2 call) is the principled fix. For this PR I took the smallest change that fixes the reported bugs: a detached context (`context.WithoutCancel(ctx)` + a 30s timeout) at the seed and build call sites in both the agpl and enterprise paths. It lets the calls complete even if the user cancels during startup, after which the handler reaches its shutdown select naturally and tears down through Phase 2. Both shortcomings above are worth addressing separately. ## Test plan - `make test RUN=TestServer_TelemetryDisabled_FinalReport` with `-race`; passes locally with `-count=3`. - Manually verified on a deployment with `CODER_EXTERNAL_TOKEN_ENCRYPTION_KEYS` set and env-configured providers: `ai_provider_keys.api_key_key_id` is populated, `api_key` is base64 ciphertext, and upstream auth succeeds. --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>