Files
coder/coderd/x
Ethan e7883d4573 fix(coderd/x/chatd): hoist system prompt fetch out of chat creation transactions (#24369)
## Problem

`resolveDeploymentSystemPrompt` was called inside `InTx` closures in
both `CreateChat` (`coderd/x/chatd/chatd.go`) and
`createChildSubagentChatWithOptions` (`coderd/x/chatd/subagent.go`).
That method uses `p.db` (the root store) internally to call
`GetChatSystemPromptConfig`, which requires a second DB pool checkout
while the transaction already holds one connection.

Under concurrent chat creation load (e.g., the chat scaletest at 4800
chats), this causes pool starvation: every in-flight create holds one
connection and blocks waiting for another, leading to `idle in
transaction` pileups and cascading timeouts across the entire coderd DB
pool — including unrelated background work like prebuild metrics and the
chat acquire loop.

## Fix

Move the `resolveDeploymentSystemPrompt` call before `p.db.InTx(...)` in
both call sites. The system prompt config is a read-only
deployment-level setting that does not need transactional consistency
with the chat insert, so fetching it before the transaction is both safe
and preferable (it also shortens transaction lifetime).

## Backporting

The `CreateChat` instance of this bug is also present on `release/2.32`
(`coderd/x/chatd/chatd.go` line 907). The `subagent.go` instance is not
— the child-subagent-chat creation path with its own `InTx` was added
after the branch cut.

This should be backported, but because this is only in the chat creation
path, and that's not typically hit with a great deal of concurrency in
the real world, I don't think an urgent patch for 2.32 is necessary.

## Lint gap

The existing `InTx` ruleguard rule in `scripts/rules.go` catches direct
outer-store usage (`p.db.GetFoo()`) and passing the outer store as a
function argument inside `InTx` closures, but it explicitly cannot catch
indirect access through receiver methods like
`p.resolveDeploymentSystemPrompt()` — the rule documents this blind spot
at line 273. Catching this class of bug would require interprocedural
analysis (following the callee's body to see if it touches `p.db`),
which is beyond what ruleguard's AST pattern matching can express. We're
considering a lightweight custom `go/analysis` analyzer (similar to
`paralleltestctx`) that does 1-level same-package callee inspection to
detect this pattern. In the meantime, this PR adds guidance to
`AGENTS.md` so AI reviewers can flag the pattern during code review.
2026-04-16 00:13:15 +10:00
..