mirror of
https://github.com/coder/coder.git
synced 2026-06-04 21:48:22 +00:00
63b6868113
## Problem

In multi-replica Coder deployments, the chat relay WebSocket between replicas fails with HTTP 401 (or TLS handshake errors). The subscriber replica cannot relay `message_part` events from the worker replica.

**Root cause:** `codersdk.Client.Dial()` does not pass `c.HTTPClient` to `websocket.DialOptions.HTTPClient`. The websocket library (`github.com/coder/websocket`) falls back to `http.DefaultClient`, which lacks the mesh TLS configuration needed for inter-replica communication.

The relay code in `enterprise/coderd/chatd/chatd.go` correctly sets `sdkClient.HTTPClient = cfg.ReplicaHTTPClient` (which has mesh TLS certs), but that client was never used for the actual WebSocket handshake.

## Fix

One-line fix in `codersdk/client.go`: propagate `c.HTTPClient` to `opts.HTTPClient` when the caller hasn't already set one.

## Test

Added `TestChatStreamRelay/RelayWithTLSAndCookieAuth` which:

- Sets up two replicas with TLS certificates (simulating mesh TLS in production)
- Authenticates via cookies (simulating browser WebSocket behavior)
- Verifies `message_part` events relay across replicas over TLS

This test times out without the fix because the WebSocket handshake fails with `x509: certificate signed by unknown authority` (`http.DefaultClient` rejects self-signed certs).

## Related

Follow-up to #22635 which fixed the `redirectToAccessURL` middleware bypassing 307 redirects for relay requests. That fix changed the error from HTTP 200 to HTTP 401, exposing this deeper issue.