## Summary Fixes a bug where interrupting a streaming chat and sending a new message left the relay connected to the wrong replica. Expanded into a broader refactor that cleanly separates concerns: - **OSS** owns pubsub subscription, message catch-up, queue updates, status forwarding, and local parts merging. - **Enterprise** (`enterprise/coderd/chatd`) only manages relay dialing, reconnection, and stale-dial discarding for cross-replica streaming. ## Architecture ### OSS `coderd/chatd/chatd.go` `Subscribe()` builds the initial snapshot then runs a single merge goroutine that handles: - Pubsub subscription for durable events (status, messages, queue, errors) - Message catch-up via `AfterMessageID` - Local `message_part` forwarding - Relay events from enterprise (when `SubscribeFn` is set) - Sends `StatusNotification` to enterprise so it can manage relay lifecycle Key types: - `SubscribeFn` — enterprise hook, returns relay-only events channel - `SubscribeFnParams` — `ChatID`, `Chat`, `WorkerID`, `StatusNotifications`, `RequestHeader`, `DB`, `Logger` - `StatusNotification` — `Status` + `WorkerID`, sent to enterprise on pubsub status changes ### Enterprise `enterprise/coderd/chatd/chatd.go` `NewMultiReplicaSubscribeFn(cfg MultiReplicaSubscribeConfig)` returns a `SubscribeFn` that: - Opens an initial synchronous relay if the chat is running on a remote worker - Reads `StatusNotifications` from OSS to open/close relay connections - Handles async dial, reconnect timers, stale-dial discarding - Returns only relay `message_part` events ## Bug fixes ### Original bug: stale relay dial after interrupt `openRelayAsync` goroutines used `mergedCtx` (subscription-level), not a per-dial context. `closeRelay()` could not cancel in-flight dials. When the user interrupts and a new replica picks up the chat, the old dial goroutine could complete after the new one and deliver a stale `relayResult`. **Fix**: per-dial `dialCtx`/`dialCancel`, `expectedWorkerID` tracking, `workerID` on `relayResult`. `closeRelay()` cancels the dial context and drains `relayReadyCh`. Merge loop rejects mismatched worker IDs. ### Additional fixes - `statusNotifications` send-on-closed-channel race — goroutine now owns `close()` via defer - Enterprise spin-loop on `StatusNotifications` close — two-value receive with nil-out - `hasPubsub` set from `p.pubsub != nil` instead of subscription success — now tracks actual subscription result - `lastMessageID` not initialized from `afterMessageID` — caused duplicate messages on catch-up - `wrappedParts` goroutine leaked remote connection on `dialCtx` cancel - `closeRelay()` did not drain `relayReadyCh` - `setChatWaiting` race with `SendMessage(Interrupt)` — wrapped in `InTx` - `processChat` post-TX side effects fired when chat was taken by another worker — added `errChatTakenByOtherWorker` sentinel - Cancel closure data race on `reconnectTimer` - Bare blocking send on pubsub error path - `localParts` hot-spin after channel close - No-pubsub branch dropped relay events and initial snapshot - Failed relay dial caused permanent stall (no reconnect retry) - DB error during reconnect timer caused permanent stall - `time.NewTimer` replaced with `quartz.Clock` for testable timing ## Tests 9 enterprise tests covering: - Relay reconnect on drop (mock clock) - Async dial does not block merge loop - Relay snapshot delivery - Stale dial discarded after interrupt - Cancel during in-flight dial - Running-to-running worker switch - Failed dial retries (mock clock) - Local worker closes relay - Multiple consecutive reconnects (mock clock) All pass with `-race`.
Coder enables organizations to set up development environments in their public or private cloud infrastructure. Cloud development environments are defined with Terraform, connected through a secure high-speed Wireguard® tunnel, and automatically shut down when not used to save on costs. Coder gives engineering teams the flexibility to use the cloud for workloads most beneficial to them.
- Define cloud development environments in Terraform
- EC2 VMs, Kubernetes Pods, Docker Containers, etc.
- Automatically shutdown idle resources to save on costs
- Onboard developers in seconds instead of days
Quickstart
The most convenient way to try Coder is to install it on your local machine and experiment with provisioning cloud development environments using Docker (works on Linux, macOS, and Windows).
# First, install Coder
curl -L https://coder.com/install.sh | sh
# Start the Coder server (caches data in ~/.cache/coder)
coder server
# Navigate to http://localhost:3000 to create your initial user,
# create a Docker template and provision a workspace
Install
The easiest way to install Coder is to use our
install script for Linux
and macOS. For Windows, use the latest ..._installer.exe file from GitHub
Releases.
curl -L https://coder.com/install.sh | sh
You can run the install script with --dry-run to see the commands that will be used to install without executing them. Run the install script with --help for additional flags.
See install for additional methods.
Once installed, you can start a production deployment with a single command:
# Automatically sets up an external access URL on *.try.coder.app
coder server
# Requires a PostgreSQL instance (version 13 or higher) and external access URL
coder server --postgres-url <url> --access-url <url>
Use coder --help to get a list of flags and environment variables. Use our install guides for a complete walkthrough.
Documentation
Browse our docs here or visit a specific section below:
- Templates: Templates are written in Terraform and describe the infrastructure for workspaces
- Workspaces: Workspaces contain the IDEs, dependencies, and configuration information needed for software development
- IDEs: Connect your existing editor to a workspace
- Administration: Learn how to operate Coder
- Premium: Learn about our paid features built for large teams
Support
Feel free to open an issue if you have questions, run into bugs, or have a feature request.
Join our Discord to provide feedback on in-progress features and chat with the community using Coder!
Integrations
We are always working on new integrations. Please feel free to open an issue and ask for an integration. Contributions are welcome in any official or community repositories.
Official
- VS Code Extension: Open any Coder workspace in VS Code with a single click
- JetBrains Toolbox Plugin: Open any Coder workspace from JetBrains Toolbox with a single click
- JetBrains Gateway Plugin: Open any Coder workspace in JetBrains Gateway with a single click
- Dev Container Builder: Build development environments using
devcontainer.jsonon Docker, Kubernetes, and OpenShift - Coder Registry: Build and extend development environments with common use-cases
- Kubernetes Log Stream: Stream Kubernetes Pod events to the Coder startup logs
- Self-Hosted VS Code Extension Marketplace: A private extension marketplace that works in restricted or airgapped networks integrating with code-server.
- Setup Coder: An action to setup coder CLI in GitHub workflows.
Community
- Provision Coder with Terraform: Provision Coder on Google GKE, Azure AKS, AWS EKS, DigitalOcean DOKS, IBMCloud K8s, OVHCloud K8s, and Scaleway K8s Kapsule with Terraform
- Coder Template GitHub Action: A GitHub Action that updates Coder templates
Contributing
We are always happy to see new contributors to Coder. If you are new to the Coder codebase, we have a guide on how to get started. We'd love to see your contributions!
Hiring
Apply here if you're interested in joining our team.
