Commit Graph

3372 Commits

Author SHA1 Message Date
Susana Ferreira f044cc3550 feat: add provisioner daemon name to provisioner jobs responses (#17877)
# Description

This PR adds the `worker_name` field to the provisioner jobs endpoint.

To achieve this, the following SQL query was updated:
-
`GetProvisionerJobsByOrganizationAndStatusWithQueuePositionAndProvisioner`

As a result, the `codersdk.ProvisionerJob` type, which represents the
provisioner job API response, was modified to include the new field.

**Notes:** 
* As mentioned in
[comment](https://github.com/coder/coder/pull/17877#discussion_r2093218206),
the `GetProvisionerJobsByIDsWithQueuePosition` query was not changed due
to load concerns. This means that for template and template version
endpoints, `worker_id` will still be returned, but `worker_name` will
not.
* Similar to `worker_id`, the `worker_name` is only present once a job
is assigned to a provisioner daemon. For jobs in a pending state (not
yet assigned), neither `worker_id` nor `worker_name` will be returned.

---

# Affected Endpoints

- `/organizations/{organization}/provisionerjobs`
- `/organizations/{organization}/provisionerjobs/{job}`

---

# Testing

- Added new tests verifying that both `worker_id` and `worker_name` are
returned once a provisioner job reaches the **succeeded** state.
- Existing tests covering state transitions and other logic remain
unchanged, as they test different scenarios.

---

# Front-end Changes

Admin provisioner jobs dashboard:
<img width="1088" alt="Screenshot 2025-05-16 at 11 51 33"
src="https://github.com/user-attachments/assets/0e20e360-c615-4497-84b7-693777c5443e"
/>

Fixes: https://github.com/coder/coder/issues/16982
2025-05-19 16:05:39 +01:00
Mathias Fredriksson 98e2ec4417 feat: show devcontainer dirty status and allow recreate (#17880)
Updates #16424
2025-05-19 12:56:10 +03:00
Sas Swart c775ea8411 test: fix a race in TestReinit (#17902)
closes https://github.com/coder/internal/issues/632

`pubsubReinitSpy` used to signal that a subscription had happened before
it actually had.
This created a slight opportunity for the main goroutine to publish
before the actual subscription was listening. The published event was
then dropped, leading to a failed test.
2025-05-19 11:37:54 +02:00
Spike Curtis 1a41608035 fix: stop extending API key access if OIDC refresh is available (#17878)
fixes #17070

Cleans up our handling of APIKey expiration and OIDC to keep them separate concepts. For an OIDC-login APIKey, both the APIKey and OIDC link must be valid to login. If the OIDC link is expired and we have a refresh token, we will attempt to refresh.

OIDC refreshes do not have any effect on APIKey expiry.

https://github.com/coder/coder/issues/17070#issuecomment-2886183613 explains why this is the correct behavior.
2025-05-19 12:05:35 +04:00
Steven Masley f36fb67f57 chore: use static params when dynamic param metadata is missing (#17836)
Existing template versions do not have the metadata (modules + plan) in
the db. So revert to using static parameter information from the
original template import.

This data will still be served over the websocket.
2025-05-16 11:47:59 -05:00
Steven Masley c2bc801f83 chore: add 'classic_parameter_flow' column setting to templates (#17828)
We are forcing users to try the dynamic parameter experience first.
Currently this setting only comes into effect if an experiment is
enabled.
2025-05-15 17:55:17 -05:00
Thomas Kosiewski 1bacd82e80 feat: add API key scope to restrict access to user data (#17692) 2025-05-15 15:32:52 +01:00
Yevhenii Shcherbina 2aa8cbebd7 fix: exclude deleted templates from metrics collection (#17839)
Also add some clarification about the lack of database constraints for
soft template deletion.

---------

Signed-off-by: Danny Kopping <dannykopping@gmail.com>
Co-authored-by: Danny Kopping <dannykopping@gmail.com>
2025-05-15 13:33:58 +02:00
Danny Kopping f2edcf3f59 fix: add missing clause for tracking replacements (#17849)
We should only be tracking resource replacements during a prebuild
claim.

Signed-off-by: Danny Kopping <dannykopping@gmail.com>
2025-05-15 13:02:30 +02:00
Steven Masley 789c4beba7 chore: add dynamic parameter error if missing metadata from provisioner (#17809) 2025-05-14 12:21:36 -05:00
ケイラ f3bcac2e90 refactor: improve overlayFS errors (#17808) 2025-05-14 10:26:47 -06:00
Danny Kopping 6e967780c9 feat: track resource replacements when claiming a prebuilt workspace (#17571)
Closes https://github.com/coder/internal/issues/369

We can't know whether a replacement (i.e. drift of terraform state
leading to a resource needing to be deleted/recreated) will take place
apriori; we can only detect it at `plan` time, because the provider
decides whether a resource must be replaced and it cannot be inferred
through static analysis of the template.

**This is likely to be the most common gotcha with using prebuilds,
since it requires a slight template modification to use prebuilds
effectively**, so let's head this off before it's an issue for
customers.

Drift details will now be logged in the workspace build logs:


![image](https://github.com/user-attachments/assets/da1988b6-2cbe-4a79-a3c5-ea29891f3d6f)

Plus a notification will be sent to template admins when this situation
arises:


![image](https://github.com/user-attachments/assets/39d555b1-a262-4a3e-b529-03b9f23bf66a)

A new metric - `coderd_prebuilt_workspaces_resource_replacements_total`
- will also increment each time a workspace encounters replacements.

We only track _that_ a resource replacement occurred, not how many. Just
one is enough to ruin a prebuild, but we can't know apriori which
replacement would cause this.
For example, say we have 2 replacements: a `docker_container` and a
`null_resource`; we don't know which one might
cause an issue (or indeed if either would), so we just track the
replacement.

---------

Signed-off-by: Danny Kopping <dannykopping@gmail.com>
2025-05-14 14:52:22 +02:00
Sas Swart 425ee6fa55 feat: reinitialize agents when a prebuilt workspace is claimed (#17475)
This pull request allows coder workspace agents to be reinitialized when
a prebuilt workspace is claimed by a user. This facilitates the transfer
of ownership between the anonymous prebuilds system user and the new
owner of the workspace.

Only a single agent per prebuilt workspace is supported for now, but
plumbing has already been done to facilitate the seamless transition to
multi-agent support.

---------

Signed-off-by: Danny Kopping <dannykopping@gmail.com>
Co-authored-by: Danny Kopping <dannykopping@gmail.com>
2025-05-14 14:15:36 +02:00
ケイラ 60762d4c13 feat: load terraform modules when using dynamic parameters (#17714) 2025-05-13 16:07:29 -05:00
Dean Sheather ef745c0c5d chore: optimize workspace_latest_builds view query (#17789)
Avoids two sequential scans of massive tables (`workspace_builds`,
`provisioner_jobs`) and uses index scans instead. This new view largely
replicates our already optimized query `GetWorkspaces` to fetch the
latest build.

The original query and the new query were compared against the dogfood
database to ensure they return the exact same data in the exact same
order (minus the new `workspaces.deleted = false` filter to improve
performance even more). The performance is massively improved even
without the `workspaces.deleted = false` filter, but it was added to
improve it even more.

Note: these query times are probably inflated due to high database load
on our dogfood environment that this intends to partially resolve.

Before: 2,139ms
([explain](https://explain.dalibo.com/plan/997e4fch241b46e6))

After: 33ms
([explain](https://explain.dalibo.com/plan/c888dc223870f181))

Co-authored-by: Cian Johnston <cian@coder.com>

---------

Signed-off-by: Danny Kopping <dannykopping@gmail.com>
Co-authored-by: Mathias Fredriksson <mafredri@gmail.com>
Co-authored-by: Danny Kopping <dannykopping@gmail.com>
2025-05-13 20:51:01 +02:00
Steven Masley 64807e1d61 chore: apply the 4mb max limit on drpc protocol message size (#17771)
Respect the 4mb max limit on proto messages
2025-05-13 11:24:51 -05:00
Danielle Maywood b0788f410f chore: rename "Test Notification" to "Troubleshooting Notification" (#17790)
Rename the "Test Notification" to "Troubleshooting Notification"
2025-05-13 13:52:55 +01:00
Susana Ferreira 599bb35a04 fix(coderd): list templates returns non-deprecated templates by default (#17747)
## Description

Modifies the behaviour of the "list templates" API endpoints to return
non-deprecated templates by default. Users can still query for
deprecated templates by specifying the `deprecated=true` query
parameter.

**Note:** The deprecation feature is an enterprise-level feature

## Affected Endpoints
* /api/v2/organizations/{organization}/templates
* /api/v2/templates

Fixes #17565
2025-05-13 12:44:46 +01:00
Danielle Maywood 0b5f27f566 feat: add parent_id column to workspace_agents table (#17758)
Adds a new nullable column `parent_id` to `workspace_agents` table. This
lays the groundwork for having child agents.
2025-05-13 00:01:31 +01:00
Steven Masley 398b999d8f chore: pass previous values into terraform apply (#17696)
Pass previous workspace build parameter values into the terraform
`plan/apply`. Enforces monotonicity in terraform as well as `coderd`.
2025-05-12 15:32:00 -05:00
ケイラ d0ab91c16f fix: reduce size of terraform modules archive (#17749) 2025-05-12 13:50:07 -06:00
Steven Masley 37832413ba chore: resolve internal drpc package conflict (#17770)
Our internal drpc package name conflicts with the external one in usage. 
`drpc.*` == external
`drpcsdk.*` == internal
2025-05-12 10:31:38 -05:00
Danny Kopping af2941bb92 feat: add is_prebuild_claim to distinguish post-claim provisioning (#17757)
Used in combination with
https://github.com/coder/terraform-provider-coder/pull/396

This is required by both https://github.com/coder/coder/pull/17475 and
https://github.com/coder/coder/pull/17571

Operators may need to conditionalize their templates to perform certain
operations once a prebuilt workspace has been claimed. This value will
**only** be set once a claim takes place and a subsequent `terraform
apply` occurs. Any `terraform apply` runs thereafter will be
indistinguishable from a normal run on a workspace.

---------

Signed-off-by: Danny Kopping <dannykopping@gmail.com>
2025-05-12 14:19:03 +00:00
Danny Kopping 3ee95f14ce chore: upgrade terraform-provider-coder & preview libs (#17738)
The changes in `coder/preview` necessitated the changes in
`codersdk/richparameters.go` & `provisioner/terraform/resources.go`.

---------

Signed-off-by: Danny Kopping <dannykopping@gmail.com>
Co-authored-by: Steven Masley <stevenmasley@gmail.com>
2025-05-09 17:41:19 +02:00
Jon Ayers a9f1a6b2a2 fix: revert fix: persist terraform modules during template import (#17665) (#17734)
This reverts commit ae3d90b057.
2025-05-08 22:03:08 -04:00
ケイラ ae3d90b057 fix: persist terraform modules during template import (#17665) 2025-05-08 16:13:46 -06:00
Steven Masley d5360a6da0 chore: fetch workspaces by username with organization permissions (#17707)
Closes https://github.com/coder/coder/issues/17691

`ExtractOrganizationMembersParam` will allow fetching a user with only
organization permissions. If the user belongs to 0 orgs, then the user "does not exist" 
from an org perspective. But if you are a site-wide admin, then the user does exist.
2025-05-08 14:41:17 -05:00
Vincent Vielle 1bb96b8528 fix: resolve flake test on manager (#17702)
Fixes coder/internal#544

---------

Co-authored-by: Mathias Fredriksson <mafredri@gmail.com>
2025-05-08 16:49:57 +03:00
Ethan 4369765996 test: fix TestWorkspaceAgentReportStats flake (#17678)
Closes https://github.com/coder/internal/issues/609.

As seen in the below logs, the `last_used_at` time was updating, but just to the same value that it was on creation; `dbtime.Now` was called in quick succession. 

```
 t.go:106: 2025-05-05 12:11:54.166 [info]  coderd.workspace_usage_tracker: updated workspaces last_used_at  count=1  now="2025-05-05T12:11:54.161329Z"
    t.go:106: 2025-05-05 12:11:54.172 [debu]  coderd: GET  host=localhost:50422  path=/api/v2/workspaces/745b7ff3-47f2-4e1a-9452-85ea48ba5c46  proto=HTTP/1.1  remote_addr=127.0.0.1  start="2025-05-05T12:11:54.1669073Z"  workspace_name=peaceful_faraday34  requestor_id=b2cf02ae-2181-480b-bb1f-95dc6acb6497  requestor_name=testuser  requestor_email=""  took=5.2105ms  status_code=200  latency_ms=5  params_workspace=745b7ff3-47f2-4e1a-9452-85ea48ba5c46  request_id=7fd5ea90-af7b-4104-91c5-9ca64bc2d5e6
    workspaceagentsrpc_test.go:70: 
        	Error Trace:	C:/actions-runner/coder/coder/coderd/workspaceagentsrpc_test.go:70
        	Error:      	Should be true
        	Test:       	TestWorkspaceAgentReportStats
        	Messages:   	2025-05-05 12:11:54.161329 +0000 UTC is not after 2025-05-05 12:11:54.161329 +0000 UTC
```

If we change the initial `LastUsedAt` time to be a time in the past, ticking with a `dbtime.Now` will always update it to a later value. If it never updates, the condition will still fail.
2025-05-06 00:15:24 +10:00
Cian Johnston 544259b809 feat: add database tables and API routes for agentic chat feature (#17570)
Backend portion of experimental `AgenticChat` feature:
- Adds database tables for chats and chat messages
- Adds functionality to stream messages from LLM providers using
`kylecarbs/aisdk-go`
- Adds API routes with relevant functionality (list, create, update
chats, insert chat message)
- Adds experiment `codersdk.AgenticChat`

---------

Co-authored-by: Kyle Carberry <kyle@carberry.com>
2025-05-02 17:29:57 +01:00
brettkolodny b7e08ba7c9 fix: filter out deleted users when attempting to delete an organization (#17621)
Closes
[coder/internal#601](https://github.com/coder/internal/issues/601)
2025-05-01 13:26:01 -03:00
Cian Johnston 2acf0adcf2 chore(codersdk/toolsdk): improve static analyzability of toolsdk.Tools (#17562)
* Refactors toolsdk.Tools to remove opaque `map[string]any` argument in
favour of typed args structs.
* Refactors toolsdk.Tools to remove opaque passing of dependencies via
`context.Context` in favour of a tool dependencies struct.
* Adds panic recovery and clean context middleware to all tools.
* Adds `GenericTool` implementation to allow keeping `toolsdk.All` with
uniform type signature while maintaining type information in handlers.
* Adds stricter checks to `patchWorkspaceAgentAppStatus` handler.
2025-04-29 16:05:23 +01:00
Mathias Fredriksson 1fc74f629e refactor(agent): update agentcontainers api initialization (#17600)
There were too many ways to configure the agentcontainers API resulting
in inconsistent behavior or features not being enabled. This refactor
introduces a control flag for enabling or disabling the containers API.
When disabled, all implementations are no-op and explicit endpoint
behaviors are defined. When enabled, concrete implementations are used
by default but can be overridden by passing options.
2025-04-29 17:53:10 +03:00
Yevhenii Shcherbina 02b2de9ae4 refactor: skip reconciliation for some presets (#17595) 2025-04-29 07:55:37 -04:00
ケイラ 12589026b6 chore: update error message for duplicate organization members (#17594)
Closes https://github.com/coder/internal/issues/345
2025-04-28 14:51:33 -06:00
Yevhenii Shcherbina a78f0fc4e1 refactor: use specific error for agpl and prebuilds (#17591)
Follow-up PR to https://github.com/coder/coder/pull/17458
Addresses this discussion:
https://github.com/coder/coder/pull/17458#discussion_r2055940797
2025-04-28 16:37:41 -04:00
Steven Masley 14105ff301 test: do not run memory race test in parallel (#17582)
Closes
https://github.com/coder/internal/issues/597#issuecomment-2835262922

The parallelized tests share configs, which when accessed concurrently
throw race errors. The configs are read only, so it is fine to run these
tests with shared idp configs.
2025-04-28 12:20:07 -05:00
Steven Masley 37c5e7c440 chore: return safe copy of string slice in 'ParseStringSliceClaim' (#17439)
Claims parsed should be safe to mutate and filter. This was likely not
causing any bugs or issues, and just doing this out of precaution
2025-04-28 12:18:02 -05:00
Danny Kopping e0483e3136 feat: add prebuilds metrics collector (#17547)
Closes https://github.com/coder/internal/issues/509

---------

Signed-off-by: Danny Kopping <dannykopping@gmail.com>
2025-04-28 12:28:56 +02:00
Danny Kopping 08ad910171 feat: add prebuilds configuration & bootstrapping (#17527)
Closes https://github.com/coder/internal/issues/508

---------

Signed-off-by: Danny Kopping <dannykopping@gmail.com>
Co-authored-by: Cian Johnston <cian@coder.com>
2025-04-25 11:07:15 +02:00
Yevhenii Shcherbina 118f12ac3a feat: implement claiming of prebuilt workspaces (#17458)
Signed-off-by: Danny Kopping <dannykopping@gmail.com>
Co-authored-by: Danny Kopping <dannykopping@gmail.com>
Co-authored-by: Danny Kopping <danny@coder.com>
Co-authored-by: Edward Angert <EdwardAngert@users.noreply.github.com>
Co-authored-by: EdwardAngert <17991901+EdwardAngert@users.noreply.github.com>
Co-authored-by: Jaayden Halko <jaayden.halko@gmail.com>
Co-authored-by: Ethan <39577870+ethanndickson@users.noreply.github.com>
Co-authored-by: M Atif Ali <atif@coder.com>
Co-authored-by: Aericio <16523741+Aericio@users.noreply.github.com>
Co-authored-by: M Atif Ali <me@matifali.dev>
Co-authored-by: Michael Suchacz <203725896+ibetitsmike@users.noreply.github.com>
2025-04-24 09:39:38 -04:00
ケイラ 36a72a2b25 chore: loosen static validation when using dynamic parameters (#17516)
Co-authored-by: Steven Masley <stevenmasley@gmail.com>
2025-04-23 10:15:49 -06:00
Steven Masley 71dbd0c888 fix: nil ptr deref when removing OIDC from deployment and accessing old users (#17501)
If OIDC is removed from a deployment, trying to create a workspace for a previous user
on OIDC would panic.
2025-04-23 08:45:26 -05:00
Steven Masley ca38729840 chore: revert dynamic params as a safe experiment (#17510) 2025-04-22 16:21:15 +00:00
ケイラ 2cc56ab515 chore: fill out workspace owner data for dynamic parameters (#17366) 2025-04-17 14:51:50 -06:00
Steven Masley ea65ddc17d fix: correct user roles being passed into terraform context (#17460)
Roles were being passed into the workspace context incorrectly. Site
wide scopes were being org scoped. Roles outside the org should also not
be sent.
2025-04-17 15:42:23 -05:00
Yevhenii Shcherbina 183146e2c9 fix: add minor fix to reconciliation loop (#17454)
Follow-up PR to https://github.com/coder/coder/pull/17261
I noticed that 1 metrics-related test fails in `dk/prebuilds` after
merging my PR into `dk/prebuilds`.
2025-04-17 13:43:24 -04:00
Yevhenii Shcherbina 27bc60d1b9 feat: implement reconciliation loop (#17261)
Closes https://github.com/coder/internal/issues/510

<details>
<summary> Refactoring Summary </summary>

### 1) `CalculateActions` Function

#### Issues Before Refactoring:

- Large function (~150 lines), making it difficult to read and maintain.
- The control flow is hard to follow due to complex conditional logic.
- The `ReconciliationActions` struct was partially initialized early,
then mutated in multiple places, making the flow error-prone.

Original source:  

https://github.com/coder/coder/blob/fe60b569ad754245e28bac71e0ef3c83536631bb/coderd/prebuilds/state.go#L13-L167

#### Improvements After Refactoring:

- Simplified and broken down into smaller, focused helper methods.
- The flow of the function is now more linear and easier to understand.
- Struct initialization is cleaner, avoiding partial and incremental
mutations.

Refactored function:  

https://github.com/coder/coder/blob/eeb0407d783cdda71ec2418c113f325542c47b1c/coderd/prebuilds/state.go#L67-L84

---

### 2) `ReconciliationActions` Struct

#### Issues Before Refactoring:

- The struct mixed both actionable decisions and diagnostic state, which
blurred its purpose.
- It was unclear which fields were necessary for reconciliation logic,
and which were purely for logging/observability.

#### Improvements After Refactoring:

- Split into two clear, purpose-specific structs:
- **`ReconciliationActions`** — defines the intended reconciliation
action.
- **`ReconciliationState`** — captures runtime state and metadata,
primarily for logging and diagnostics.

Original struct:  

https://github.com/coder/coder/blob/fe60b569ad754245e28bac71e0ef3c83536631bb/coderd/prebuilds/reconcile.go#L29-L41

</details>

---------

Signed-off-by: Danny Kopping <dannykopping@gmail.com>
Co-authored-by: Sas Swart <sas.swart.cdk@gmail.com>
Co-authored-by: Danny Kopping <dannykopping@gmail.com>
Co-authored-by: Dean Sheather <dean@deansheather.com>
Co-authored-by: Spike Curtis <spike@coder.com>
Co-authored-by: Danny Kopping <danny@coder.com>
2025-04-17 09:29:29 -04:00
Michael Suchacz daafa0d689 chore: add missing prometheus tests for UNKNOWN/STATIC paths (#17446) 2025-04-17 13:50:18 +02:00
Steven Masley 0bc49ff5ae test: fix flake in TestRoleSyncTable with test cases sharing resources (#17441)
The test case definition shares maps that can have concurrent access if run in parallel.
2025-04-17 00:14:11 +00:00