Garrett Delfosse 54d650ea79 fix(tailnet): preserve DNS hosts across control plane reconnections (#24253)
When the control plane connection drops and reconnects, a new
`tunnelUpdater` is created with empty workspace state. This causes the
in-memory DNS resolver to lose all host records, breaking `.coder` name
resolution until the server sends a fresh workspace snapshot.

If the API is unreachable (e.g., the route goes through a VPN that is
also reconnecting), the snapshot never arrives and DNS stays broken
indefinitely — requiring a full Coder Desktop restart.

Fix: carry workspace state from the previous `tunnelUpdater` to the new
one on reconnect, and immediately re-apply DNS hosts so the resolver
stays populated during the reconnection window.

Fixes https://linear.app/codercom/issue/PLAT-110

<details><summary>Investigation & decision log</summary>

### Root cause analysis

Customer diagnostic data from Roblox (March 31) showed:
- NRPT rule present (`.coder` → `fd60:627a:a42b::53`) — routing is
correct
- DNS resolver returns NXDOMAIN for everything including the sentinel
`is.coder--connect--enabled--right--now.coder` — resolver is running but
has zero host records
- Coder Connect UI shows "connected" — the WireGuard data plane is up

The resolver is empty because
`TunnelAllWorkspaceUpdatesController.New()` creates a fresh
`tunnelUpdater` with `workspaces: make(map[uuid.UUID]*Workspace)`
(empty). The previous updater's workspace data is discarded. If the
server's workspace snapshot is delayed or the API is unreachable, the
resolver has no records to serve.

This is compounded by GlobalProtect VPN reconnects: the Coder API is
behind the VPN, so when GP reconnects, the API route is temporarily lost
and the snapshot can't arrive.

### What this PR changes

- `TunnelAllWorkspaceUpdatesController.New()` now clones workspace state
from the previous updater before creating the new one
- Immediately re-applies DNS hosts with the inherited state (log:
`re-applying DNS hosts from previous session`)
- When the server's snapshot arrives, it replaces the inherited data
normally
- If `SetDNSHosts` fails during re-apply, the failure is logged as a
warning rather than treated as fatal; the recvLoop will program DNS when
the snapshot arrives
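
The carry-over described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `controller`, `updater`, `Workspace`, `DNSHostsSetter`, and `fakeDNS` types are hypothetical stand-ins, string IDs replace `uuid.UUID` to keep the example dependency-free, and the host name and address are made up.

```go
package main

import (
	"errors"
	"fmt"
	"maps"
	"sync"
)

// Workspace is a simplified, hypothetical stand-in for the real record.
type Workspace struct {
	Name  string
	Hosts map[string]string // DNS name -> IP address
}

// clone returns a deep copy so old and new updaters never share maps.
func (w *Workspace) clone() *Workspace {
	return &Workspace{Name: w.Name, Hosts: maps.Clone(w.Hosts)}
}

// DNSHostsSetter is an assumed interface for programming the resolver.
type DNSHostsSetter interface {
	SetDNSHosts(hosts map[string]string) error
}

// updater holds per-connection workspace state (hypothetical shape).
type updater struct {
	mu         sync.Mutex
	workspaces map[string]*Workspace
}

// snapshot deep-copies the workspace map under the lock.
func (u *updater) snapshot() map[string]*Workspace {
	u.mu.Lock()
	defer u.mu.Unlock()
	out := make(map[string]*Workspace, len(u.workspaces))
	for id, w := range u.workspaces {
		out[id] = w.clone()
	}
	return out
}

// allHosts flattens every workspace's host records into one map.
func (u *updater) allHosts() map[string]string {
	u.mu.Lock()
	defer u.mu.Unlock()
	hosts := make(map[string]string)
	for _, w := range u.workspaces {
		maps.Copy(hosts, w.Hosts)
	}
	return hosts
}

type controller struct {
	mu   sync.Mutex
	dns  DNSHostsSetter
	prev *updater
}

// New is called on every (re)connection to the control plane.
func (c *controller) New() *updater {
	c.mu.Lock()
	defer c.mu.Unlock()
	next := &updater{workspaces: map[string]*Workspace{}}
	if c.prev != nil {
		next.workspaces = c.prev.snapshot() // inherit instead of starting empty
	}
	if len(next.workspaces) > 0 {
		if err := c.dns.SetDNSHosts(next.allHosts()); err != nil {
			// Non-fatal: the next server snapshot programs DNS anyway.
			fmt.Println("warn: re-applying DNS hosts failed:", err)
		}
	}
	c.prev = next
	return next
}

// fakeDNS records the last hosts it was given; used only for the demo.
type fakeDNS struct {
	hosts map[string]string
	fail  bool
}

func (f *fakeDNS) SetDNSHosts(h map[string]string) error {
	if f.fail {
		return errors.New("resolver unavailable")
	}
	f.hosts = maps.Clone(h)
	return nil
}

func main() {
	dns := &fakeDNS{}
	c := &controller{dns: dns}
	first := c.New()
	first.workspaces["ws-1"] = &Workspace{
		Name:  "dev",
		Hosts: map[string]string{"dev.coder.": "fd60:627a:a42b::1"},
	}
	second := c.New() // simulated reconnect
	fmt.Println(len(second.workspaces), dns.hosts["dev.coder."])
}
```

Deep-copying the state, rather than sharing the map, means the old updater can be torn down without racing against the new one. The trade-off is a short window in which the resolver may serve slightly stale records until the server's snapshot replaces them.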

### What this PR does NOT fix (future work)

- **Tunnel binary restart**: when the tunnel process itself is killed
and relaunched, all in-memory state is lost. A DNS host cache on disk
would be needed for this case.
- **NRPT rule cleanup on startup**: the Tailscale fork's
`nrptRuleDatabase` constructor unconditionally deletes all NRPT rules on
engine creation. Deferring cleanup to the first successful `SetDNS` call
would reduce the DNS gap.
- **Hosts file retry**: the `setHosts()` retry in the Tailscale fork
(5×10ms) is too short for environments where endpoint security locks the
file.

These are tracked as follow-up items in the `coder/tailscale` fork.

</details>

> 🤖 Generated by Coder Agents
2026-04-29 12:29:44 -04:00
2022-04-04 11:55:06 -05:00

Self-Hosted Cloud Development Environments

Quickstart | Docs | Why Coder | Premium

Coder enables organizations to set up development environments in their public or private cloud infrastructure. Cloud development environments are defined with Terraform, connected through a secure high-speed WireGuard® tunnel, and automatically shut down when not in use to save on costs. Coder gives engineering teams the flexibility to use the cloud for the workloads that benefit most from it.

  • Define cloud development environments in Terraform
    • EC2 VMs, Kubernetes Pods, Docker Containers, etc.
  • Automatically shut down idle resources to save on costs
  • Onboard developers in seconds instead of days

Quickstart

The most convenient way to try Coder is to install it on your local machine and experiment with provisioning cloud development environments using Docker (works on Linux, macOS, and Windows).

# First, install Coder
curl -L https://coder.com/install.sh | sh

# Start the Coder server (caches data in ~/.cache/coder)
coder server

# Navigate to http://localhost:3000 to create your initial user,
# create a Docker template and provision a workspace

Install

The easiest way to install Coder is to use our install script for Linux and macOS. For Windows, use the latest ..._installer.exe file from GitHub Releases.

curl -L https://coder.com/install.sh | sh

You can run the install script with --dry-run to see the commands that will be used to install without executing them. Run the install script with --help for additional flags.

See install for additional methods.

Once installed, you can start a production deployment with a single command:

# Automatically sets up an external access URL on *.try.coder.app
coder server

# Requires a PostgreSQL instance (version 13 or higher) and external access URL
coder server --postgres-url <url> --access-url <url>

Use coder --help to get a list of flags and environment variables. Use our install guides for a complete walkthrough.

Documentation

Browse our docs here or visit a specific section below:

  • Templates: Templates are written in Terraform and describe the infrastructure for workspaces
  • Workspaces: Workspaces contain the IDEs, dependencies, and configuration information needed for software development
  • IDEs: Connect your existing editor to a workspace
  • Administration: Learn how to operate Coder
  • Premium: Learn about our paid features built for large teams

Support

Feel free to open an issue if you have questions, run into bugs, or have a feature request.

Join our Discord to provide feedback on in-progress features and chat with the community using Coder!

Integrations

We are always working on new integrations. Please feel free to open an issue and ask for an integration. Contributions are welcome in any official or community repositories.

Official

Community

Contributing

We are always happy to see new contributors to Coder. If you are new to the Coder codebase, we have a guide on how to get started. We'd love to see your contributions!

Hiring

Apply here if you're interested in joining our team.
