Commit Graph

69 Commits

Author SHA1 Message Date
Steven Masley 60b3fd0783 chore!: send modules archive over the proto messages (#21398)
# What this does

Dynamic parameters caches the `./terraform/modules` directory for parameter usage. What this PR does is send over this archive to the provisioner when building workspaces.

This allow terraform to skip downloading modules from their registries, a step that takes seconds. 

<img width="1223" height="429" alt="Screenshot From 2025-12-29 12-57-52" src="https://github.com/user-attachments/assets/16066e0a-ac79-4296-819d-924f4b0418dc" />


# Wire protocol

The wire protocol reuses the same mechanism used to download the modules `provisoner -> coder`. It splits up large archives into multiple protobuf messages so larger archives can be sent under the message size limit.

# 🚨  Behavior Change (Breaking Change) 🚨 

**Before this PR** modules were downloaded on every workspace build. This means unpinned modules always fetched the latest version

**After this PR** modules are cached at template import time, and their versions are effectively pinned for all subsequent workspace builds.
2026-01-09 11:33:34 -06:00
Steven Masley d2044c2ee9 chore: update protobuf to reuse file request (#21447)
**This is just the protobuf changes for the PR https://github.com/coder/coder/pull/21398**

Moved `UploadFileRequest` from `provisionerd.proto` -> `provisioner.proto`.

Renamed to `FileUpload` because it is now bi-directional.

This **is backwards compatible**. I tested it to confirm the payloads are identical.  Types were just renamed and moved around.

```golang
func TestTypeUpgrade(t *testing.T) {
	t.Parallel()

	x := &proto2.UploadFileRequest{
		Type: &proto2.UploadFileRequest_ChunkPiece{
			ChunkPiece: &proto.ChunkPiece{
				Data:         []byte("Hello World!"),
				FullDataHash: []byte("Foobar"),
				PieceIndex:   42,
			},
		},
	}

	data, err := protobuf.Marshal(x)
	require.NoError(t, err)

	// Exactly the same output
	// EhgKDEhlbGxvIFdvcmxkIRIGRm9vYmFyGCo= on `main`
	// EhgKDEhlbGxvIFdvcmxkIRIGRm9vYmFyGCo= on this branch
	fmt.Println(base64.StdEncoding.EncodeToString(data))
}
```


# What this does

This allows provisioner daemons to download files from `coderd`'s `files` table. This is used to send over cached module files and prevent the need of downloading these modules on each workspace build.
2026-01-09 11:23:32 -06:00
Spike Curtis bddb808b25 chore: arrange imports in a standard way (#21452)
Fixes all our Go file imports to match the preferred spec that we've _mostly_ been using. For example:

```
import (
	"context"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"golang.org/x/xerrors"
	"gopkg.in/natefinch/lumberjack.v2"

	"cdr.dev/slog/v3"
	"github.com/coder/coder/v2/codersdk/agentsdk"
	"github.com/coder/serpent"
)
```

3 groups: standard library, 3rd partly libs, Coder libs.

This PR makes the change across the codebase. The PR in the stack above modifies our formatting to maintain this state of affairs, and is a separate PR so it's possible to review that one in detail.
2026-01-08 15:24:11 +04:00
Spike Curtis 49b34a716a fix: fix slog to always use array of Fields (#21426)
Upgrades to slog v3 which includes a small, but backward incompatible API change to the acceptible call arguments when logging. This change allows us to verify via compile time type checking that arguments are correct and won't cause a panic, as was possible in slog v1, which this replaces (v2 was tagged but never used in coder/coder).

It also updates dependencies that also use slog and were updated.

I've left the `aibridge` dependency as a commit SHA, under the assumption that the team there (cc @pawbana @dannykopping ) will tag and update the dependency soon and on their own schedule.

Other dependencies, I pushed new tags.
2026-01-08 10:29:41 +04:00
Steven Masley 3194bcfc9e chore: distinct operations for provisioner's 'parse', 'init', 'plan', 'apply', 'graph' (#21064)
Provisioner steps broken into smaller granular actions.
Changes:
- `ExtractArchive` moved to `init` request (was in `configure`)
- Writing `tfstate` moved to `plan` (was in `configure`)
- Moved most plan/apply outputs to `GraphComplete`
2025-12-15 11:26:41 -06:00
Steven Masley 9ca5b44b56 chore: implement persistent terraform directories (experimental) (#20563)
Prior to this, every workspace build ran `terraform init` in a fresh
directory. This would mean the `modules` are downloaded fresh. If the
module is not pinned, subsequent workspace builds would have different
modules.
2025-11-13 07:50:17 -06:00
Steven Masley 7c8deaf0d6 chore: refactor terraform paths to a central structure (#20566)
Refactors all Terraform file path logic into a centralized tfpath package. This consolidates all path construction into a single, testable Layout type. 

Instead of passing around `string` for directories, pass around the `Layout` which has the file location methods on it.
2025-11-12 09:24:07 -06:00
Steven Masley c1341cccdd feat: use proto streams to increase maximum module files payload (#18268)
This PR implements protobuf streaming to handle large module files by:
1. **Streaming large payloads**: When module files exceed the 4MB limit,
they're streamed in chunks using a new UploadFile RPC method
2. **Database storage**: Streamed files are stored in the database and
referenced by hash for deduplication
3. **Backward compatibility**: Small module files continue using the
existing direct payload method
2025-06-13 12:46:26 -05:00
Steven Masley 64807e1d61 chore: apply the 4mb max limit on drpc protocol message size (#17771)
Respect the 4mb max limit on proto messages
2025-05-13 11:24:51 -05:00
Steven Masley 37832413ba chore: resolve internal drpc package conflict (#17770)
Our internal drpc package name conflicts with the external one in usage. 
`drpc.*` == external
`drpcsdk.*` == internal
2025-05-12 10:31:38 -05:00
Spike Curtis b3aba6dab7 test: ignore context.Canceled in acquireWithCancel (#17448)
fixes https://github.com/coder/internal/issues/584

Ignore canceled error when sending an acquired job, since dRPC is racy and will sometimes return this error even after successfully sending the job, if the test is quickly finished.
2025-04-17 16:17:19 +04:00
Danny Kopping 95f03c561f fix: increase context timeout in TestProvisionerd/MaliciousTar to avoid flake (#17400)
Fixing a flake seen here:
https://github.com/coder/coder/actions/runs/14465389766/job/40566518088

Signed-off-by: Danny Kopping <dannykopping@gmail.com>
2025-04-15 09:39:23 +00:00
Cian Johnston 7b88776403 chore(testutil): add testutil.GoleakOptions (#16070)
- Adds `testutil.GoleakOptions` and consolidates existing options to
this location
- Pre-emptively adds required ignore for this Dependabot PR to pass CI
https://github.com/coder/coder/pull/16066
2025-01-08 15:38:37 +00:00
Cian Johnston 82070944f1 chore(provisionerd): close completeChan exactly once (#16011)
Fixes https://github.com/coder/internal/issues/263
2025-01-02 15:14:06 +00:00
Spike Curtis 5861e516b9 chore: add standard test logger ignoring db canceled (#15556)
Refactors our use of `slogtest` to instantiate a "standard logger" across most of our tests.  This standard logger incorporates https://github.com/coder/slog/pull/217 to also ignore database query canceled errors by default, which are a source of low-severity flakes.

Any test that has set non-default `slogtest.Options` is left alone. In particular, `coderdtest` defaults to ignoring all errors. We might consider revisiting that decision now that we have better tools to target the really common flaky Error logs on shutdown.
2024-11-18 14:09:22 +04:00
Cian Johnston 4719d2406f chore(testutil): extract testutil.CreateZip and testutil.CreateTar helpers (#15540)
Extracts `testutil.CreateTar` and `testutil.CreateZip` test helpers.
2024-11-18 09:17:04 +00:00
Colin Adler 421c0d1242 chore: add nginx topology to tailnet tests (#13188) 2024-05-07 18:17:38 +10:00
Steven Masley 09f00c08df chore: shutdown provisioner should stop waiting on client (#13118)
* chore: shutdown provisioner should stop waiting on client
* chore: add unit test that replicates failed client conn
2024-05-03 10:15:17 -05:00
Kyle Carberry 895df54051 fix: separate signals for passive, active, and forced shutdown (#12358)
* fix: separate signals for passive, active, and forced shutdown

`SIGTERM`: Passive shutdown stopping provisioner daemons from accepting new
jobs but waiting for existing jobs to successfully complete.

`SIGINT` (old existing behavior): Notify provisioner daemons to cancel in-flight jobs, wait 5s for jobs to be exited, then force quit.

`SIGKILL`: Untouched from before, will force-quit.

* Revert dramatic signal changes

* Rename

* Fix shutdown behavior for provisioner daemons

* Add test for graceful shutdown
2024-03-15 13:16:36 +00:00
Kyle Carberry 30afe43f8a fix: create tempdir prior to cleanup (#11394)
See https://github.com/coder/coder/actions/runs/7399827933/job/20132407700

Seems like this happened because the test was being cleaned up
while the tempdir was being made.
2024-01-03 19:18:15 +00:00
Spike Curtis 9a4e1100fa chore: move drpc transport tools to codersdk/drpc (#11224)
Part of #10532

DRPC transport over yamux and in-mem pipes was previously only used on the provisioner APIs, but now will also be used in tailnet.  Moved to subpackage of codersdk to avoid import loops.
2023-12-15 12:41:39 +04:00
Spike Curtis 211718f95a fix: fix MaliciousTar test case (#10158)
fixes #9895

Problem was that provisionerd tries to acquire the next job, and races with shutdown, triggering the assert in the handler.  Switches this test case to use the more robust handler.
2023-10-10 13:24:43 +04:00
Spike Curtis 375c70d141 feat: integrate Acquirer for provisioner jobs (#9717)
* chore: add Acquirer to provisionerdserver pkg

Signed-off-by: Spike Curtis <spike@coder.com>

* code review improvements & fixes

Signed-off-by: Spike Curtis <spike@coder.com>

* feat: integrate Acquirer for provisioner jobs

Signed-off-by: Spike Curtis <spike@coder.com>

* Fix imports, whitespace

Signed-off-by: Spike Curtis <spike@coder.com>

* provisionerdserver always closes; remove poll interval from playwright

Signed-off-by: Spike Curtis <spike@coder.com>

* post jobs outside transactions

Signed-off-by: Spike Curtis <spike@coder.com>

* graceful shutdown in test

Signed-off-by: Spike Curtis <spike@coder.com>

* Mark AcquireJob deprecated

Signed-off-by: Spike Curtis <spike@coder.com>

* Graceful shutdown on all provisionerd tests

Signed-off-by: Spike Curtis <spike@coder.com>

* Deprecate, not remove CLI flags

Signed-off-by: Spike Curtis <spike@coder.com>

---------

Signed-off-by: Spike Curtis <spike@coder.com>
2023-09-19 10:25:57 +04:00
Jon Ayers 622442203d chore: fix test flake in TestProvisionerd (#9709) 2023-09-18 11:23:22 -05:00
Spike Curtis 11b6068112 feat: add support for networked provisioners (#9593)
* Refactor provisionerd to use interface to connect to provisioners

Signed-off-by: Spike Curtis <spike@coder.com>

* feat: add support for networked provisioners

Signed-off-by: Spike Curtis <spike@coder.com>

* fix token length and linting

Signed-off-by: Spike Curtis <spike@coder.com>

---------

Signed-off-by: Spike Curtis <spike@coder.com>
2023-09-08 09:53:48 +00:00
Spike Curtis 60d5002eb6 refactor: change template archive extraction to be on provisioner (#9264)
* refactor provisionersdk protocol

Signed-off-by: Spike Curtis <spike@coder.com>

* refactor provisioners to use new protocol

Signed-off-by: Spike Curtis <spike@coder.com>

* refactor provisionerd to use new protocol

Signed-off-by: Spike Curtis <spike@coder.com>

* refactor tests & proto renames

* Fixes from self-review

Signed-off-by: Spike Curtis <spike@coder.com>

* appease fmt & link

Signed-off-by: Spike Curtis <spike@coder.com>

* code review fixes & e2e fixes

Signed-off-by: Spike Curtis <spike@coder.com>

* More fmt

Signed-off-by: Spike Curtis <spike@coder.com>

* Code review fixes

Signed-off-by: Spike Curtis <spike@coder.com>

* new gen; use uuid for session workdir

Signed-off-by: Spike Curtis <spike@coder.com>

* Revert nix-based gen CI task until dogfood is on nix

Signed-off-by: Spike Curtis <spike@coder.com>

* revert deleting dogfood Docker stuff

Signed-off-by: Spike Curtis <spike@coder.com>

* Revert "revert deleting dogfood Docker stuff"

This reverts commit 9762158167.

---------

Signed-off-by: Spike Curtis <spike@coder.com>
2023-08-25 06:10:15 +00:00
Kyle Carberry 22e781eced chore: add /v2 to import module path (#9072)
* chore: add /v2 to import module path

go mod requires semantic versioning with versions greater than 1.x

This was a mechanical update by running:
```
go install github.com/marwan-at-work/mod/cmd/mod@latest
mod upgrade
```

Migrate generated files to import /v2

* Fix gen
2023-08-18 18:55:43 +00:00
Marcin Tojek b1d1b63113 chore: ensure logs consistency across Coder (#8083) 2023-06-20 12:30:45 +02:00
Marcin Tojek a7366a8b76 feat!: drop support for legacy parameters (#7663) 2023-06-02 11:16:46 +02:00
ElliotG 0069831e8d fix: use error log when failing provisioner job (#6812)
Co-authored-by: Colin Adler <colin1adler@gmail.com>
2023-04-05 13:30:53 -05:00
Mathias Fredriksson 860e2829c5 fix: Prevent race between provisionerd connect and close (#6206)
* fix: Prevent race between provisionerd connect and close

* test: Add detection for provisioner creation after test completion
2023-02-14 16:37:43 +00:00
Mathias Fredriksson 41ae01d2e9 fix: Improve closure of provisioner and agent tailnet dial (#6199) 2023-02-14 14:57:48 +00:00
Kyle Carberry 2d0a69ba47 fix: require client pipe to be closed in provisionerd test (#6188)
https://github.com/coder/coder/actions/runs/4165019548/jobs/7207442687
2023-02-14 00:46:05 +00:00
Colin Adler d2ae16dd22 fix: routinely ping agent websocket to ensure liveness (#5824) 2023-01-23 20:05:29 +00:00
Marcin Tojek 971e36781b chore: improve logging in provisionerd_test (#5353) 2022-12-09 13:11:54 +01:00
Colin Adler ab3b3d5fca feat: add debouncing to provisionerd rpc calls (#5198) 2022-12-01 16:54:53 -06:00
Mathias Fredriksson 8ff89c4288 fix: Fix flakeyness of TestProvisionerd/ReconnectAndComplete (#5169) 2022-11-24 14:09:56 +02:00
Colin Adler 1f20cab110 fix: don't use yamux for in-memory provisioner{,d} streams (#5136) 2022-11-22 12:19:32 -06:00
Ammar Bandukwala 97dbd4dc5d Implement Quotas v3 (#5012)
* provisioner/terraform: add cost to resource_metadata

* provisionerd/runner: use Options struct

* Complete provisionerd implementation

* Add quota_allowance to groups

* Combine Quota and RBAC licenses

* Add Opts to InTx
2022-11-14 17:57:33 +00:00
Ammar Bandukwala 95fb59696e Refactor Provisioner to distinguish Plan and Apply (#5036) 2022-11-11 16:45:58 -06:00
Kyle Carberry 30281852d6 feat: Add buffering to provisioner job logs (#4918)
* feat: Add bufferring to provisioner job logs

This should improve overall build performance, and especially under load.

It removes the old `id` column on the `provisioner_job_logs` table
and replaces it with an auto-incrementing big integer to preserve order.

Funny enough, we never had to care about order before because inserts
would at minimum be 1ms different. Now they aren't, so the order needs
to be preserved.

* Fix log bufferring

* Fix frontend log streaming

* Fix JS test
2022-11-06 20:50:34 -06:00
Mathias Fredriksson 4730c589fe chore: Use standardized test timeouts and delays (#3291) 2022-08-01 15:45:05 +03:00
Mathias Fredriksson 6916d34458 fix: Fix cleanup in test helpers, prefer defer in tests (#3113)
* fix: Change uses of t.Cleanup -> defer in test bodies

Mixing t.Cleanup and defer can lead to unexpected order of execution.

* fix: Ensure t.Cleanup is not aborted by require

* chore: Add helper annotations
2022-07-25 19:22:02 +03:00
Kyle Carberry 4f1df88529 fix: Always output job failure reason in provisioner daemon tests (#2850)
This flake can be seen here: https://github.com/coder/coder/runs/7186604615?check_suite_focus=true
2022-07-10 14:52:33 -05:00
Spike Curtis 22febc749a provisionerd sends failed or complete last (#2732)
* provisionerd sends failed or complete last

Signed-off-by: Spike Curtis <spike@coder.com>

* Move runner into package

Signed-off-by: Spike Curtis <spike@coder.com>

* Remove jobRunner interface

Signed-off-by: Spike Curtis <spike@coder.com>

* renames and slight reworking from code review

Signed-off-by: Spike Curtis <spike@coder.com>

* Reword comment about okToSend

Signed-off-by: Spike Curtis <spike@coder.com>
2022-07-01 09:55:46 -07:00
Dean Sheather 6be8a373e0 feat: run a terraform plan before creating workspaces with the given template parameters (#1732) 2022-06-02 00:44:53 +10:00
Colin Adler 98ccd0eb89 feat: add README parsing to template versions (#1500) 2022-05-17 15:00:48 -05:00
Colin Adler 9319c39257 fix: additional provisionerd test double closes (#1267) 2022-05-03 08:02:54 -05:00
Kyle Carberry 603b7da413 test: Wrap provisionerd channel closes in sync.Once (#1181)
This caused a few flakes, so figured I'd tackle all of them:
https://github.com/coder/coder/runs/6167856950?check_suite_focus=true#step:9:246
2022-04-26 09:44:16 -05:00
Kyle Carberry 947e8f9d2e fix: Manually format external URL (#1168)
path.Join escaped the double slash!
2022-04-25 22:22:31 +00:00