When users pass --wait=no or set CODER_SSH_WAIT=no, startup logs are no
longer dumped to stderr. The stage indicator is still shown, just not
the log content.
Fixes#13580
The Agent function had complex nested control flow and cross-case state sharing
via the showStartupLogs flag. This made the code hard to follow and maintain.
This change extract an agentWaiter struct with self-contained methods:
- wait: main state machine loop
- waitForConnection: handles Connecting/Timeout states
- handleConnected: handles Connected state and startup scripts
- streamLogs: handles log streaming/fetching
- waitForReconnection: handles Disconnected state
- pollWhile: helper to consolidate polling loops
Each handler is now self-contained with no cross-method state sharing and the
showStartupLogs flag is replaced by return values and the waitedForConnection
tracking variable.
## Summary
In this pull request we're adding a simple backoff to the workspace
agent polling. This backoff is being added to address seemingly random
cases of elevated number of calls that we've seen to the
`api/v2/workspaceagent/{agent_id}` endpoint.
For more information on the investigation, see:
https://github.com/coder/internal/issues/725
### Changes
- Updated the polling to use predefined progressive intervals for
polling instead of continuously polling every 500ms
### Testing
- Added a test for the function used to calculate the progressive
polling interval
Co-authored-by: Spike Curtis <spike@coder.com>
Whilst the `networking-troubleshooting` docs page already mentions that
a direct connection can be established over a private network, even if
there are no STUN servers, it's worth this is the case at the end of the
ping output.
This also removes a print statement that was dirtying up the diagnostic
output, and corrects the name of the `--disable-direct-connections`
flag.
First PR to address #14244.
Adds common potential reasons as to why a direct connection to the workspace agent couldn't be established to `coder ping`:
- If the Coder deployment administrator has blocked direction connections (`CODER_BLOCK_DIRECT`).
- If the client has no STUN servers within it's DERP map.
- If the client or agent appears to be behind a hard NAT, as per Tailscale `netInfo.MappingVariesByDestIP`
Also adds a warning if the client or agent has a network interface below the 'safe' MTU for tailnet. This warning is always displayed at the end of a `coder ping`.
Beginnings of a solution to #12297
Doesn't cover disco or definitively display whether we successfully connected to DERP, but shows some checklist diagnostics for connecting to an agent.
For this first PR, I just added it to `coder ping` to see how we like it, but could be incorporated into `coder ssh` _et al._ after a timeout.
```
$ coder ping dogfood2
p2p connection established in 147ms
pong from dogfood2 p2p via 95.217.xxx.yyy:42631 in 147ms
pong from dogfood2 p2p via 95.217.xxx.yyy:42631 in 140ms
pong from dogfood2 p2p via 95.217.xxx.yyy:42631 in 140ms
✔ preferred DERP region 999 (Council Bluffs, Iowa)
✔ sent local data to Coder networking coodinator
✔ received remote agent data from Coder networking coordinator
preferred DERP 10013 (Europe Fly.io (Paris))
endpoints: 95.217.xxx.yyy:42631, 95.217.xxx.yyy:37576, 172.17.0.1:37576, 172.20.0.10:37576
✔ Wireguard handshake 11s ago
```
* chore: add /v2 to import module path
go mod requires semantic versioning with versions greater than 1.x
This was a mechanical update by running:
```
go install github.com/marwan-at-work/mod/cmd/mod@latest
mod upgrade
```
Migrate generated files to import /v2
* Fix gen
* chore: rename startup logs to agent logs
This also adds a `source` property to every agent log. It
should allow us to group logs and display them nicer in
the UI as they stream in.
* Fix migration order
* Fix naming
* Rename the frontend
* Fix tests
* Fix down migration
* Match enums for workspace agent logs
* Fix inserting log source
* Fix migration order
* Fix logs tests
* Fix psql insert
* fix: remove startup logs eof for streaming
We have external utilities like logstream-kube that may send
logs after an agent shuts down unexpectedly to report additional
information. In a recent change we stopped accepting these logs,
which broke these utilities.
In the future we'll rename startup logs to agent logs or something
more generalized so this is less confusing in the future.
* fix(cli/cliui): handle never ending startup log stream in Agent
---------
Co-authored-by: Mathias Fredriksson <mafredri@gmail.com>
* feat(api): Add agent shutdown lifecycle states
* feat(agent): Add shutdown_script support
* feat(agent): Add shutdown_script timeout
* feat(site): Support new agent lifecycle states
---
Co-authored-by: Marcin Tojek <marcin@coder.com>
* fix: Fix ssh message/spinner in VSCode integrated terminal
The messages never show up in VSCode integrated terminal due to the
defer `fmt.Fprintf`. There could be a race in VSCode in handling the
terminal codes but ultimately, we can simplify our logic by just
stopping the spinner for the duration of the update.
* Avoid race in starting spinner after exit
* feat: Add connection_timeout and troubleshooting_url to agent
This commit adds the connection timeout and troubleshooting url fields
to coder agents.
If an initial connection cannot be established within connection timeout
seconds, then the agent status will be marked as `"timeout"`.
The troubleshooting URL will be present, if configured in the Terraform
template, it can be presented to the user when the agent state is either
`"timeout"` or `"disconnected"`.
Fixes#4678
This changes all "coder workspace *" commands to root.
A few of these were already at the root, like SSH. The
inconsistency made for a confusing experience.
This enables a "kubernetes_pod" to attach multiple agents that
could be for multiple services. Each agent is required to have
a unique name, so SSH syntax is:
`coder ssh <workspace>.<agent>`
A resource can have zero agents too, they aren't required.
* feat: Add stage to build logs
This adds a stage property to logs, and refactors the job logs
cliui.
It also adds tests to the cliui for build logs!
* feat: Add stage to build logs
This adds a stage property to logs, and refactors the job logs
cliui.
It also adds tests to the cliui for build logs!
* feat: Add config-ssh and tests for resiliency
* Rename "Echo" test to "ImmediateExit"
* Fix Terraform resource agent association
* Fix logs post-cancel
* Fix select on Windows
* Remove terraform init logs
* Move timer into it's own loop
* Fix race condition in provisioner jobs
* Fix requested changes