mirror of
https://github.com/coder/coder.git
synced 2026-06-03 04:58:23 +00:00
e7883d4573
## Problem `resolveDeploymentSystemPrompt` was called inside `InTx` closures in both `CreateChat` (`coderd/x/chatd/chatd.go`) and `createChildSubagentChatWithOptions` (`coderd/x/chatd/subagent.go`). That method uses `p.db` (the root store) internally to call `GetChatSystemPromptConfig`, which requires a second DB pool checkout while the transaction already holds one connection. Under concurrent chat creation load (e.g., the chat scaletest at 4800 chats), this causes pool starvation: every in-flight create holds one connection and blocks waiting for another, leading to `idle in transaction` pileups and cascading timeouts across the entire coderd DB pool — including unrelated background work like prebuild metrics and the chat acquire loop. ## Fix Move the `resolveDeploymentSystemPrompt` call before `p.db.InTx(...)` in both call sites. The system prompt config is a read-only deployment-level setting that does not need transactional consistency with the chat insert, so fetching it before the transaction is both safe and preferable (it also shortens transaction lifetime). ## Backporting The `CreateChat` instance of this bug is also present on `release/2.32` (`coderd/x/chatd/chatd.go` line 907). The `subagent.go` instance is not — the child-subagent-chat creation path with its own `InTx` was added after the branch cut. This should be backported, but because this is only in the chat creation path, and that's not typically hit with a great deal of concurrency in the real world, I don't think an urgent patch for 2.32 is necessary. ## Lint gap The existing `InTx` ruleguard rule in `scripts/rules.go` catches direct outer-store usage (`p.db.GetFoo()`) and passing the outer store as a function argument inside `InTx` closures, but it explicitly cannot catch indirect access through receiver methods like `p.resolveDeploymentSystemPrompt()` — the rule documents this blind spot at line 273. Catching this class of bug would require interprocedural analysis (following the callee's body to see if it touches `p.db`), which is beyond what ruleguard's AST pattern matching can express. We're considering a lightweight custom `go/analysis` analyzer (similar to `paralleltestctx`) that does 1-level same-package callee inspection to detect this pattern. In the meantime, this PR adds guidance to `AGENTS.md` so AI reviewers can flag the pattern during code review.