Files
coder/coderd/httpapi
Kyle Carberry 386b449273 perf(coderd): reduce chat streaming latency with event-driven acquisition (#23745)
Previously, when a user sent a message, there was a 0–1000ms (avg
~500ms) polling delay before processing began.
`SendMessage`/`CreateChat`/`EditMessage` set `status='pending'` in the
DB and returned, but nothing woke the processing loop — it was a blind
1-second ticker.

## Changes

**Event-driven acquisition (main change):** Adds a `wakeCh` channel to
the chatd `Server`. `CreateChat`, `SendMessage`, `EditMessage`, and
`PromoteQueued` call `signalWake()` after committing their transactions,
which wakes the run loop to call `processOnce` immediately. The 1-second
ticker remains as a fallback safety net for edge cases (stale recovery,
missed signals).

**Buffer WebSocket write channel:** Changes the
`OneWayWebSocketEventSender` event channel from unbuffered to buffered
(64), decoupling the event producer from WebSocket write speed. The
existing 10s write timeout guards against stuck connections.

<details><summary>Implementation plan & analysis</summary>

The full latency analysis identified these sources of delay in the
streaming pipeline:

1. **Chat acquisition polling** — 0–1000ms (avg 500ms) dead time per
message. Fixed by wake channel.
2. **Unbuffered WebSocket write channel** — each token blocked on the
previous WS write completing. Fixed by buffering.
3. **PersistStep DB transaction per step** — `FOR UPDATE` lock + batch
insert. Not addressed in this PR (medium risk, would overlap DB write
with next provider TTFB).
4. **Multi-hop channel pipeline** — 4 channel hops per token. Not
addressed (medium complexity).

</details>

<details><summary>Test stabilization notes</summary>

`signalWake()` causes the chatd daemon to process chats immediately
after creation/send/edit, which exposed timing assumptions in several
tests that expected chats to remain in `pending` status long enough to
assert on. These tests were updated with `require.Eventually` +
`WaitUntilIdleForTest` patterns to wait for processing to settle before
asserting.

The race detector (`test-go-race-pg`) shows failures in
`TestCreateWorkspaceTool_EndToEnd` and `TestAwaitSubagentCompletion` —
these appear to be pre-existing races in the end-to-end chat flow that
are now exercised more aggressively because processing starts
immediately instead of after a 1s delay. Main branch CI (race detector)
passes without these changes.

</details>
2026-03-28 15:26:42 -04:00
..
2023-02-18 18:32:09 -06:00