Commit Graph

568 Commits

Author SHA1 Message Date
Jon Ayers 9f4972901c fix: avoid logging no such process errors for process priority (#14655) 2024-09-12 18:00:42 +00:00
Spike Curtis fb3523b37f chore: remove legacy AgentIP address (#14640)
Removes the support for the Agent's "legacy IP" which was a hardcoded IP address all agents used to use, before we introduced "single tailnet". Single tailnet went GA in 2.7.0.
2024-09-12 07:40:19 +04:00
Ethan c8580a415a feat: expose current agent connections by type via prometheus (#14612) 2024-09-11 14:13:30 +10:00
Danielle Maywood 25f1ddbf5e feat: add 'hidden' option to 'coder_app' to hide app from UI (#14570)
Add 'hidden' property to 'coder_app' resource to allow hiding apps from the UI.
2024-09-09 14:39:32 +01:00
Mathias Fredriksson 8f07d3357e feat(agent/agentssh): use tcp for X11 forwarding (#14560)
Fixes #14198
2024-09-04 20:06:08 +03:00
Danielle Maywood 839918c5e7 chore(docs): document agent api debug endpoints (#14454)
* chore(docs): add agent api debug docs

* chore(docs): add sections to agent api readme

* chore(docs): link debug manifest to agentsdk.Manifest schema

* chore(docs): add high level overview of agent api debug docs

* chore(docs): link to agent api docs from reference

* chore(docs): fix invalid paths

* chore(docs): use env variable for coder agent debug address
2024-08-28 09:47:14 +01:00
Ethan 8c15192433 feat(cli): add p2p diagnostics to ping (#14426)
First PR to address #14244.

Adds common potential reasons as to why a direct connection to the workspace agent couldn't be established to `coder ping`:
- If the Coder deployment administrator has blocked direction connections (`CODER_BLOCK_DIRECT`).
- If the client has no STUN servers within it's DERP map.
- If the client or agent appears to be behind a hard NAT, as per Tailscale `netInfo.MappingVariesByDestIP`

Also adds a warning if the client or agent has a network interface below the 'safe' MTU for tailnet. This warning is always displayed at the end of a `coder ping`.
2024-08-28 15:39:01 +10:00
Mathias Fredriksson 8c0565177e chore(agent): remove err=<nil> log for batch update metadata complete (#14179)
Co-authored-by: Steven Masley <Emyrk@users.noreply.github.com>
2024-08-07 11:31:47 +00:00
Spike Curtis 70c5c47efd fix: stop blocking fake Agent API channel writes after context expires (#13908) 2024-07-16 23:22:13 +04:00
Muhammad Atif Ali 0787de88a9 chore: update documentation links to the new format (#13797) 2024-07-10 21:31:37 +03:00
Spike Curtis f6639b788f feat: add ConnectRPC variants for older Agent API versions (#13778) 2024-07-03 16:11:05 +04:00
Spike Curtis e5268e4551 chore: spin clock library out to coder/quartz repo (#13777)
Code that was in `/clock` has been moved to github.com/coder/quartz.  This PR refactors our use of the clock library to point to the external Quartz repo.
2024-07-03 15:02:54 +04:00
Spike Curtis 8923ce5216 fix: fix flake in TestAppHealth_Healthy (#13607) 2024-06-20 12:02:31 +04:00
Spike Curtis c01d6fdf46 chore: refactor apphealth and tests to use clock testing library (#13576)
Refactors the apphealth subsystem and unit tests to use `clock.Clock`.

Also slightly simplifies the implementation, which wrapped a function that never returned an error in a `retry.Retry`.  The retry is entirely superfluous in that case, so removed.

UTs used to take a few seconds to run, and now run in milliseconds or better.  No sleeps, `Eventually`, or polling.

Dropped the "no spamming" test since we can directly assert the number of handler calls on the mainline test case.
2024-06-14 15:06:33 +04:00
Marcin Tojek e96652ebbc feat: block file transfers for security (#13501) 2024-06-10 12:12:23 +00:00
Kayla Washburn-Love b248f125e1 chore: rename notification banners to announcement banners (#13419) 2024-05-31 10:59:28 -06:00
Kayla Washburn-Love d8e0be6ee6 feat: add support for multiple banners (#13081) 2024-05-08 15:40:43 -06:00
Spike Curtis d51c6912a7 fix: make handleManifest always signal dependents (#13141)
Fixes #13139

Using a bare channel to signal dependent goroutines means that we can only signal success, not failure, which leads to deadlock if we fail in a way that doesn't cause the whole `apiConnRoutineManager` to tear down routines.

Instead, we use a new object called a `checkpoint` that signals success or failure, so that dependent routines get unblocked if the routine they depend on fails.
2024-05-06 14:47:41 +04:00
Spike Curtis 2efb46a10e chore: remove superfluous context.Canceled handling (#13140)
Removes a check for `context.Canceled` inside the `handleManifest` routine.  This checking is handled in the `apiConnRoutineManager`, so checking inside the handler is redundant.
2024-05-06 14:33:16 +04:00
Cian Johnston 99dda4a43a fix(agent): keep track of lastReportIndex between invocations of reportLifecycle() (#13075) 2024-04-25 16:54:51 +01:00
Jon Ayers 426e9f2b96 feat: support adjusting child proc oom scores (#12655) 2024-04-03 09:42:03 -05:00
Colin Adler 4d5a7b2d56 chore(codersdk): move all tailscale imports out of codersdk (#12735)
Currently, importing `codersdk` just to interact with the API requires
importing tailscale, which causes builds to fail unless manually using
our fork.
2024-03-26 12:44:31 -05:00
Cian Johnston b0c4e7504c feat(support): add client magicsock and agent prometheus metrics to support bundle (#12604)
* feat(codersdk): add ability to fetch prometheus metrics directly from agent
* feat(support): add client magicsock and agent prometheus metrics to support bundle
* refactor(support): simplify AgentInfo control flow

Co-authored-by: Mathias Fredriksson <mafredri@gmail.com>
2024-03-15 15:33:49 +00:00
Cian Johnston 653ddccd8e fix(agent): remove unused token debug handler (#12602) 2024-03-15 09:43:36 +00:00
Cian Johnston 63696d762f feat(codersdk): add debug handlers for logs, manifest, and token to agent (#12593)
* feat(codersdk): add debug handlers for logs, manifest, and token to agent

* add more logging

* use io.LimitReader instead of seeking
2024-03-14 15:36:12 +00:00
Cian Johnston 3b406878e0 feat(agent): expose HTTP debug server over tailnet API (#12582) 2024-03-14 10:02:01 +00:00
Colin Adler e5d911462f fix(tailnet): enforce valid agent and client addresses (#12197)
This adds the ability for `TunnelAuth` to also authorize incoming wireguard node IPs, preventing agents from reporting anything other than their static IP generated from the agent ID.
2024-03-01 09:02:33 -06:00
Cian Johnston eba8cd7c07 chore: consolidate various randomPort() implementations (#12362)
Consolidates our existing randomPort() implementations to package testutil
2024-02-29 12:51:44 +00:00
Mathias Fredriksson 32691e67e6 test(agent/agentscripts): fix test flake in TestEnv (#12326) 2024-02-27 17:58:10 +00:00
Spike Curtis b0afffbafb feat: use v2 API for agent metadata updates (#12281)
Switches the agent to report metadata over the v2 API.

Fixes #10534
2024-02-26 09:50:19 +04:00
Spike Curtis aa7a9f5cc4 feat: use v2 API for agent lifecycle updates (#12278)
Agent uses the v2 API to post lifecycle updates.

Part of #10534
2024-02-23 15:24:28 +04:00
Spike Curtis 4cc132cea0 feat: switch agent to use v2 API for sending logs (#12068)
Changes the agent to use the new v2 API for sending logs, via the logSender component.

We keep the PatchLogs function around, but deprecate it so that we can test the v1 endpoint.
2024-02-23 11:27:15 +04:00
Spike Curtis af3fdc68c3 chore: refactor agent routines that use the v2 API (#12223)
In anticipation of needing the `LogSender` to run on a context that doesn't get immediately canceled when you `Close()` the agent, I've undertaken a little refactor to manage the goroutines that get run against the Tailnet and Agent API connection.

This handles controlling two contexts, one that gets canceled right away at the start of graceful shutdown, and another that stays up to allow graceful shutdown to complete.
2024-02-23 11:04:23 +04:00
Mathias Fredriksson b1c0b39d88 feat(agent): add script data dir for binaries and files (#12205)
The agent is extended with a `--script-data-dir` flag, defaulting to the
OS temp dir. This dir is used for storing `coder-script-data/bin` and
`coder-script/[script uuid]`. The former is a place for all scripts to
place executable binaries that will be available by other scripts, SSH
sessions, etc. The latter is a place for the script to store files.

Since we default to OS temp dir, files are ephemeral by default. In the
future, we may consider adding new env vars or changing the default
storage location. Workspace startup speed could potentially benefit from
scripts being able to skip steps that require downloading software. We
may also extend this with more env variables (e.g. persistent storage in
HOME).

Fixes #11131
2024-02-20 13:26:18 +02:00
Spike Curtis 081e37d7d9 chore: move LogSender to agentsdk (#12158)
Moves the LogSender to agentsdk and deprecates LogsSender based on the v1 API.
2024-02-20 10:44:20 +04:00
Mathias Fredriksson c63f569174 refactor(agent/agentssh): move envs to agent and add agentssh config struct (#12204)
This commit refactors where custom environment variables are set in the
workspace and decouples agent specific configs from the `agentssh.Server`.
To reproduce all functionality, `agentssh.Config` is introduced.

The custom environment variables are now configured in `agent/agent.go`
and the agent retains control of the final state. This will allow for
easier extension in the future and keep other modules decoupled.
2024-02-19 16:30:00 +02:00
Mathias Fredriksson 0442ee5fa8 fix(agent/reconnectingpty): fix screen startup speed by disabling messages (#12190) 2024-02-16 22:37:02 +02:00
Spike Curtis 2aff014e5d feat: add logSender for sending logs on agent v2 API (#12046)
Adds a new subcomponent of the agent for queueing up logs until they can be sent over the Agent API.

Subsequent PR will change the agent to use this instead of the HTTP API for posting logs.

Relates to #10534
2024-02-15 16:57:17 +04:00
Dean Sheather 2fc3064653 chore: add tests for app ID copy in app healths (#12088) 2024-02-12 05:49:48 +00:00
Spike Curtis 92b2e26a48 feat: send log limit exceeded in response, not error (#12078)
When we exceed the db-imposed limit of logs, we need to communicate that back to the agent.  In v1 we did it with a 4xx-level HTTP status, but with dRPC, the errors are delivered as strings, which feels fragile to me for something we want to gracefully handle.

So, this PR adds the log limit exceeded as a field on the response message, and fixes the API handler to set it as appropriate instead of an error.
2024-02-09 16:17:20 +04:00
Colin Adler ec8e41f516 chore: add logging around agent app health reporting (#12071) 2024-02-08 23:37:44 -06:00
Spike Curtis 1cf4b62867 feat: change agent to use v2 API for reporting stats (#12024)
Modifies the agent to use the v2 API to report its statistics, using the `statsReporter` subcomponent.
2024-02-07 15:26:41 +04:00
Mathias Fredriksson f2aef0726b fix(agent/agentssh): allow scp to exit with zero status (#12028)
Fixes #11786
2024-02-07 10:22:31 +02:00
Spike Curtis 1aa117b9ec chore: rename client Listen to ConnectRPC (#11916)
ConnectRPC seems more appropriate for this function
2024-02-01 14:44:11 +04:00
Spike Curtis eb03e4490a feat: add statsReporter for reporting stats on agent v2 API (#11920)
Adds a new statsReporter subcomponent of the agent, which in a later PR will be used to report stats over the v2 API.

Refactors the logic a bit so that we can handle starting and stopping stats reporting if the agent API connection drops and reconnects.
2024-02-01 08:21:01 +04:00
Spike Curtis 0fc177203e feat: use agent v2 API to update app health (#11889)
Use the Agent v2 API to update App Health
2024-01-30 11:35:12 +04:00
Spike Curtis 2599850e54 feat: use agent v2 API to post startup (#11877)
Uses the v2 Agent API to post startup information.
2024-01-30 11:23:28 +04:00
Spike Curtis da8bb1c198 feat: use agent v2 API to fetch manifest (#11832)
Agent uses the v2 API to obtain the manifest, instead of the HTTP API.
2024-01-30 10:11:28 +04:00
Spike Curtis 9cf4e7f15a fix: prevent agent_test.go from failing on error logs (#11909)
We're failing tests on error logs like this: https://github.com/coder/coder/actions/runs/7706053882/job/21000984583

Unfortunately, the error we hit, when the underlying connection is closed, is unexported, so we can't specifically ignore it.

Part of the issue is that agent.Close() doesn't wait for these goroutines to complete before returning, so the test harness proceeds to close the connection. This looks to our product code like the network connection failing.  It would be possible to fix this, but just doesn't seem worth it for the extra insurance of catching other error logs in these tests.
2024-01-30 10:04:01 +04:00
Spike Curtis 0eff646c31 chore: move proto to sdk conversion to agentsdk (#11831)
`agentsdk` depends on `agent/proto` because it needs to get the version to dial.

Therefore, the conversion routines need to live in `agentsdk` so that we can convert to and from the Manifest.

I briefly considered refactoring the agent to only reference `proto.Manifest`, but decided against it because we might have multiple protocol versions in the future, its useful to have a protocol-independent data structure.
2024-01-30 09:04:56 +04:00