mirror of
https://github.com/coder/coder.git
synced 2026-06-03 04:58:23 +00:00
386b449273
Previously, when a user sent a message, there was a 0–1000ms (avg ~500ms) polling delay before processing began. `SendMessage`/`CreateChat`/`EditMessage` set `status='pending'` in the DB and returned, but nothing woke the processing loop: it was a blind 1-second ticker.

## Changes

**Event-driven acquisition (main change):** Adds a `wakeCh` channel to the chatd `Server`. `CreateChat`, `SendMessage`, `EditMessage`, and `PromoteQueued` call `signalWake()` after committing their transactions, which wakes the run loop to call `processOnce` immediately. The 1-second ticker remains as a fallback safety net for edge cases (stale recovery, missed signals).

**Buffer WebSocket write channel:** Changes the `OneWayWebSocketEventSender` event channel from unbuffered to buffered (64), decoupling the event producer from WebSocket write speed. The existing 10s write timeout guards against stuck connections.

<details><summary>Implementation plan & analysis</summary>

The full latency analysis identified these sources of delay in the streaming pipeline:

1. **Chat acquisition polling:** 0–1000ms (avg 500ms) dead time per message. Fixed by wake channel.
2. **Unbuffered WebSocket write channel:** each token blocked on the previous WS write completing. Fixed by buffering.
3. **PersistStep DB transaction per step:** `FOR UPDATE` lock + batch insert. Not addressed in this PR (medium risk, would overlap DB write with next provider TTFB).
4. **Multi-hop channel pipeline:** 4 channel hops per token. Not addressed (medium complexity).

</details>

<details><summary>Test stabilization notes</summary>

`signalWake()` causes the chatd daemon to process chats immediately after creation/send/edit, which exposed timing assumptions in several tests that expected chats to remain in `pending` status long enough to assert on. These tests were updated with `require.Eventually` + `WaitUntilIdleForTest` patterns to wait for processing to settle before asserting.

The race detector (`test-go-race-pg`) shows failures in `TestCreateWorkspaceTool_EndToEnd` and `TestAwaitSubagentCompletion`. These appear to be pre-existing races in the end-to-end chat flow that are now exercised more aggressively because processing starts immediately instead of after a 1s delay. Main branch CI (race detector) passes without these changes.

</details>
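
For reviewers who want the shape of the event-driven acquisition change at a glance, here is a minimal, self-contained sketch of the wake-channel-plus-fallback-ticker pattern. It is not the chatd code: only the names `wakeCh`, `signalWake`, and `processOnce` come from the description above, while the `Server` fields, the channel capacity of 1, the non-blocking send, and the `demo` helper are assumptions made for illustration.

```go
package main

import (
	"context"
	"fmt"
	"sync/atomic"
	"time"
)

// Server is a hypothetical stand-in for the chatd Server.
type Server struct {
	// wakeCh has capacity 1 (an assumption) so that a burst of signals
	// coalesces into a single pending wake-up.
	wakeCh    chan struct{}
	processed atomic.Int64
}

func NewServer() *Server {
	return &Server{wakeCh: make(chan struct{}, 1)}
}

// signalWake requests an immediate processing pass. It never blocks:
// if a wake-up is already pending, the extra signal is dropped.
func (s *Server) signalWake() {
	select {
	case s.wakeCh <- struct{}{}:
	default:
	}
}

// processOnce stands in for acquiring and processing pending chats.
func (s *Server) processOnce() {
	s.processed.Add(1)
}

// run calls processOnce on every wake signal, and also on every tick of
// the fallback ticker so that a missed signal only delays work.
func (s *Server) run(ctx context.Context, fallback time.Duration) {
	ticker := time.NewTicker(fallback)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-s.wakeCh:
		case <-ticker.C:
		}
		s.processOnce()
	}
}

// demo starts the loop with a very long fallback interval, signals a wake,
// and reports whether processOnce ran within two seconds, that is, long
// before the ticker could have fired.
func demo() bool {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	s := NewServer()
	go s.run(ctx, time.Hour)

	s.signalWake()
	s.signalWake() // coalesces with the first if still pending; never blocks

	deadline := time.After(2 * time.Second)
	for s.processed.Load() == 0 {
		select {
		case <-deadline:
			return false
		default:
			time.Sleep(time.Millisecond)
		}
	}
	return true
}

func main() {
	fmt.Println("woken by signal:", demo())
}
```

In this sketch a dropped signal is harmless because a pending wake-up already guarantees another `processOnce` pass, and the ticker bounds the delay if a signal is missed entirely. Whether the real `signalWake()` uses the same capacity and non-blocking send is not stated in this description.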