Commit Graph

2570 Commits

Author SHA1 Message Date
Steven Masley 246a829ea9 feat: evaluate dynamic parameters http endpoint (#18182)
Used when a websocket is too heavy. This implements a single request to
the preview engine.
2025-06-02 13:50:07 -05:00
Hugo Dutka 2e7cd0fe22 chore(coderd/database/dbpurge): remove dbmem from tests (#18151)
Related to https://github.com/coder/coder/issues/15109
2025-06-02 16:48:38 +02:00
Hugo Dutka 1d48131e98 chore(coderd/externalauth): remove dbmem from tests (#18147)
Related to https://github.com/coder/coder/issues/15109
2025-06-02 14:42:18 +02:00
Hugo Dutka d1fc0dd2c5 chore(coderd/database/dbmetrics): remove dbmem from tests (#18150)
Related to https://github.com/coder/coder/issues/15109
2025-06-02 14:26:56 +02:00
Hugo Dutka 8ca5519f57 chore(coderd/idpsync): run all tests with postgres (#18149)
Related to https://github.com/coder/coder/issues/15109.

Running postgres tests used to create a new postgres docker container
every time. I believe the slow down might've been caused by that and was
misattributed to postgres performance.

```
coder@main ~/coder ((0e90ac29))> DB=ci gotestsum --packages="./coderd/idpsync" -- -count=1
✓  coderd/idpsync (1.471s)

DONE 91 tests in 4.766s
```
2025-06-02 14:07:31 +02:00
Hugo Dutka bdf227c1f9 chore(coderd/database/dbgen): remove dbmem from tests (#18152)
Related to https://github.com/coder/coder/issues/15109
2025-06-02 14:05:40 +02:00
Hugo Dutka d3ed6fe652 chore(coderd/autobuild): use dbtestutil.WillUsePostgres instead of os.Getenv in test (#18145)
Standardizing on `WillUsePostgres` will make it easier to remove the
check entirely once dbmem is removed.

Related to https://github.com/coder/coder/issues/15109.
2025-06-02 13:58:07 +02:00
Hugo Dutka 782d01bae2 chore(coderd/httpmw): remove dbmem usage from tests (#18146)
Related to https://github.com/coder/coder/issues/15109
2025-06-02 13:57:56 +02:00
Hugo Dutka 9a28cb0545 chore(coderd/telemetry): remove dbmem from tests (#18148)
Related to https://github.com/coder/coder/issues/15109
2025-06-02 13:57:27 +02:00
Hugo Dutka 9812162190 chore(coderd/database/dbrollup): remove dbmem from tests (#18153)
Related to https://github.com/coder/coder/issues/15109
2025-06-02 13:56:50 +02:00
Steven Masley 9db114d17c feat: add filecache prometheus metrics (#18089)
Dynamic parameters has an in memory file cache. This adds prometheus
metrics to monitor said cache.
2025-05-30 11:54:54 -05:00
Danielle Maywood 562c4696de fix(coderd/database/dbmem): fill DisplayGroup field for InsertWorkspaceApp (#18136)
It appears `dbmem` was missed in the new app groups feature
https://github.com/coder/coder/pull/17977.
2025-05-30 17:52:31 +01:00
Cian Johnston 9afdd33e64 fix(coderd/database/dbmem): apply rlock/runlock on GetTelemetryItems (#18133)
Fixes https://github.com/coder/coder/issues/18132
2025-05-30 16:39:32 +01:00
Steven Masley 216fe441cf chore: align CSRF settings with deployment config (#18116) 2025-05-30 09:30:49 -05:00
Bruno Quaresma f974add373 chore: rollback PR #18025 (#18118)
Rollback https://github.com/coder/coder/pull/18025
2025-05-30 11:00:26 -03:00
Bruno Quaresma d779126ee3 chore: rollback PR #18081 (#18104)
Rollback https://github.com/coder/coder/pull/18081
2025-05-29 13:12:13 -03:00
Steven Masley e4648b6fc1 feat: allow iframing urls on the same domain as the deployment (#18102)
Used for AI tasks. We should eventually add regions to this csp header.
2025-05-29 10:07:57 -05:00
Steven Masley 8387dd27ab chore: add form_type parameter argument to db (#17920)
`form_type` is a new parameter field in the terraform provider. Bring
that field into coder/coder.

Validation for `multi-select` has also been added.
2025-05-29 08:55:19 -05:00
Cian Johnston 776c144128 fix(coderd): ensure agent timings are non-zero on insert (#18065)
Relates to https://github.com/coder/coder/issues/15432

Ensures that no workspace build timings with zero values for started_at or ended_at are inserted into the DB or returned from the API.
2025-05-29 13:36:06 +01:00
Danielle Maywood b712d0b23f feat(coderd/agentapi): implement sub agent api (#17823)
Closes https://github.com/coder/internal/issues/619

Implement the `coderd` side of the AgentAPI for the upcoming
dev-container agents work.

`agent/agenttest/client.go` is left unimplemented for a future PR
working to implement the agent side of this feature.
2025-05-29 12:15:47 +01:00
Danny Kopping bc83de2a72 feat: add prebuilt workspaces telemetry (#18084)
Adds telemetry for a _global_ account of prebuilt workspaces created,
failed to build, and claimed.

Partitioning this data by template/preset tuple is not currently in
scope.

---------

Signed-off-by: Danny Kopping <dannykopping@gmail.com>
2025-05-29 13:13:44 +02:00
Bruno Quaresma 2ec7404197 chore: make owner_name and owner_username consistent (#18081)
We've been using owner_name inconsistently as username. So this PR fixes
it by making the attribute naming more consistent.
2025-05-28 17:25:32 -03:00
Yevhenii Shcherbina b330c0803c fix: reimplement reporting of preset-hard-limited metric (#18055)
Addresses concerns raised in https://github.com/coder/coder/pull/18045
2025-05-28 14:18:32 -04:00
Steven Masley ca8660cea6 chore: keep previous workspace build parameters for dynamic params (#18059)
The existing code persists all static parameters and their values. Using
the previous build as the source if no new inputs are found.

Dynamic params do not have a state of the parameters saved to disk. So
instead, all previous values are persisted always, and new inputs
override.
2025-05-28 10:00:39 -05:00
Danielle Maywood 6e255c72c6 chore(coderd/database): enforce agent name unique within workspace build (#18052)
Adds a database trigger that runs on insert and update of the
`workspace_agents` table. The trigger ensures that the agent name is
unique within the context of the workspace build it is being inserted
into.
2025-05-28 14:21:17 +01:00
Yevhenii Shcherbina 110102a60a fix: optimize queue position sql query (#17974)
Use only `online provisioner daemons` for
`GetProvisionerJobsByIDsWithQueuePosition` query. It should improve
performance of the query.
2025-05-28 08:21:16 -04:00
Steven Masley b4531c4218 feat: make dynamic parameters respect owner in form (#18013)
Closes https://github.com/coder/coder/issues/18012

---------

Co-authored-by: Jaayden Halko <jaayden.halko@gmail.com>
2025-05-27 15:43:00 -05:00
ケイラ 9fc3329575 feat: persist app groups in the database (#17977) 2025-05-27 13:13:08 -06:00
Mathias Fredriksson a18eb9d08f feat(site): allow recreating devcontainers and showing dirty status (#18049)
This change allows showing the devcontainer dirty status in the UI as
well as a recreate button to update the devcontainer.

Closes #16424
2025-05-27 19:42:24 +03:00
Bruno Quaresma d63417b542 fix: update WorkspaceOwnerName to use user.name instead of user.username (#18025)
We have been using the user.username instead of user.name in wrong
places, making it very confusing for the UI.
2025-05-27 11:42:07 -03:00
Bruno Quaresma 9827c97f32 feat: add AI Tasks page (#18047)
**Preview:**

<img width="1624" alt="Screenshot 2025-05-26 at 21 25 04"
src="https://github.com/user-attachments/assets/2a51915d-2527-4467-bf99-1f2d876b953b"
/>
2025-05-27 11:34:07 -03:00
Spike Curtis 6c0bed0f53 chore: update to coder/quartz v0.2.0 (#18007)
Upgrade to coder/quartz v0.2.0 including fixing up a minor API breaking change.
2025-05-27 16:05:03 +04:00
Susana Ferreira 6f6e73af03 feat: implement expiration policy logic for prebuilds (#17996)
## Summary 

This PR introduces support for expiration policies in prebuilds. The TTL
(time-to-live) is retrieved from the Terraform configuration
([terraform-provider-coder
PR](https://github.com/coder/terraform-provider-coder/pull/404)):
```
prebuilds = {
	  instances = 2
	  expiration_policy {
		  ttl = 86400
	  }
  }
```
**Note**: Since there is no need for precise TTL enforcement down to the
second, in this implementation expired prebuilds are handled in a single
reconciliation cycle: they are deleted, and new instances are created
only if needed to match the desired count.

## Changes

* The outcome of a reconciliation cycle is now expressed as a slice of
reconciliation actions, instead of a single aggregated action.
* Adjusted reconciliation logic to delete expired prebuilds and
guarantee that the number of desired instances is correct.
* Updated relevant data structures and methods to support expiration
policies parameters.
* Added documentation to `Prebuilt workspaces` page
* Update `terraform-provider-coder` to version 2.5.0:
https://github.com/coder/terraform-provider-coder/releases/tag/v2.5.0

Depends on: https://github.com/coder/terraform-provider-coder/pull/404
Fixes: https://github.com/coder/coder/issues/17916
2025-05-26 20:31:24 +01:00
Mathias Fredriksson 0731304905 feat(agent/agentcontainers): recreate devcontainers concurrently (#18042)
This change introduces a refactor of the devcontainers recreation logic
which is now handled asynchronously rather than being request scoped.
The response was consequently changed from "No Content" to "Accepted" to
reflect this.

A new `Status` field was introduced to the devcontainer struct which
replaces `Running` (bool). This reflects that the devcontainer can now
be in various states (starting, running, stopped or errored).

The status field also protects against multiple concurrent recrations,
as long as they are initiated via the API.

Updates #16424
2025-05-26 18:30:52 +03:00
Mathias Fredriksson d6c14f3d8a feat(agent/agentcontainers): update containers periodically (#17972)
This change introduces a significant refactor to the agentcontainers API
and enables periodic updates of Docker containers rather than on-demand.
Consequently this change also allows us to move away from using a
locking channel and replace it with a mutex, which simplifies usage.

Additionally a previous oversight was fixed, and testing added, to clear
devcontainer running/dirty status when the container has been removed.

Updates coder/coder#16424
Updates coder/internal#621
2025-05-22 19:44:33 +03:00
Hugo Dutka f825477a5c fix: add timeouts to test telemetry snapshot (#17879)
This PR ensures that waits on channels will time out according to the
test context, rather than waiting indefinitely. This should alleviate
the panic seen in https://github.com/coder/internal/issues/645 and, if
the deadlock recurs, allow the test to be retried automatically in CI.
2025-05-22 13:51:24 +02:00
Ethan 34494fb330 chore: avoid depending on rbac in slim builds (#17959)
I noticed the `coder-vpn.dylib` (of course alongside the Agent/CLI binaries) had grown substantially (from 29MB to 37MB for the dylib), and discovered that importing RBAC in slim builds was the issue

This PR removes the dependency on RBAC in slim builds, and adds a compile-time check to ensure it can't be imported in the future:

```
$ make build
# github.com/coder/coder/v2/coderd/rbac
coderd/rbac/no_slim.go:8:2: initialization cycle: _DO_NOT_IMPORT_THIS_PACKAGE_IN_SLIM_BUILDS refers to itself
make: *** [Makefile:224: build/coder-slim_2.22.1-devel+7e46d24b4_linux_amd64] Error 1
```

Before and after for `coder-slim_darwin_arm64`:
```
$ gsa before after
┌───────────────────────────────────────────────────────────────────────────────────┐
│ Diff between before and after                                                     │
├─────────┬─────────────────────────────────────────┬──────────┬──────────┬─────────┤
│ PERCENT │ NAME                                    │ OLD SIZE │ NEW SIZE │ DIFF    │
├─────────┼─────────────────────────────────────────┼──────────┼──────────┼─────────┤
│ -100%   │ github.com/gorilla/mux                  │          │          │ +0 B    │
│ -100%   │ github.com/ammario/tlru                 │          │          │ +0 B    │
│ -100%   │ github.com/armon/go-radix               │          │          │ +0 B    │
│ -0.00%  │ gvisor.dev/gvisor                       │ 2.4 MB   │ 2.4 MB   │ -4 B    │
│ -0.21%  │ os                                      │ 155 kB   │ 155 kB   │ -328 B  │
│ -0.23%  │ regexp                                  │ 152 kB   │ 152 kB   │ -346 B  │
│ -0.04%  │ runtime                                 │ 876 kB   │ 876 kB   │ -372 B  │
│ -100%   │ github.com/rcrowley/go-metrics          │ 675 B    │          │ -675 B  │
│ -23.79% │ github.com/cespare/xxhash/v2            │ 3.0 kB   │ 2.3 kB   │ -715 B  │
│ -100%   │ github.com/agnivade/levenshtein         │ 1.4 kB   │          │ -1.4 kB │
│ -100%   │ github.com/go-ini/ini                   │ 1.5 kB   │          │ -1.5 kB │
│ -100%   │ github.com/xeipuuv/gojsonreference      │ 2.4 kB   │          │ -2.4 kB │
│ -100%   │ github.com/xeipuuv/gojsonpointer        │ 5.2 kB   │          │ -5.2 kB │
│ -2.43%  │ go.opentelemetry.io/otel                │ 316 kB   │ 309 kB   │ -7.7 kB │
│ -2.40%  │ slices                                  │ 381 kB   │ 372 kB   │ -9.2 kB │
│ -0.68%  │ crypto                                  │ 1.4 MB   │ 1.4 MB   │ -9.5 kB │
│ -100%   │ github.com/tchap/go-patricia/v2         │ 23 kB    │          │ -23 kB  │
│ -100%   │ github.com/yashtewari/glob-intersection │ 28 kB    │          │ -28 kB  │
│ -4.35%  │ <autogenerated>                         │ 754 kB   │ 721 kB   │ -33 kB  │
│ -100%   │ github.com/sirupsen/logrus              │ 72 kB    │          │ -72 kB  │
│ -2.56%  │ github.com/coder/coder/v2               │ 3.3 MB   │ 3.2 MB   │ -84 kB  │
│ -100%   │ github.com/gobwas/glob                  │ 107 kB   │          │ -107 kB │
│ -100%   │ sigs.k8s.io/yaml                        │ 244 kB   │          │ -244 kB │
│ -100%   │ github.com/open-policy-agent/opa        │ 2.2 MB   │          │ -2.2 MB │
├─────────┼─────────────────────────────────────────┼──────────┼──────────┼─────────┤
│ -7.79%  │ __go_buildinfo __DATA                   │ 18 kB    │ 17 kB    │ -1.4 kB │
│ -6.81%  │ __itablink __DATA_CONST                 │ 23 kB    │ 22 kB    │ -1.6 kB │
│ -6.61%  │ __typelink __DATA_CONST                 │ 71 kB    │ 66 kB    │ -4.7 kB │
│ -2.86%  │ __noptrdata __DATA                      │ 1.0 MB   │ 993 kB   │ -29 kB  │
│ -21.49% │ __data __DATA                           │ 320 kB   │ 251 kB   │ -69 kB  │
│ -6.19%  │ __rodata __DATA_CONST                   │ 6.0 MB   │ 5.6 MB   │ -372 kB │
│ -47.19% │ __rodata __TEXT                         │ 7.6 MB   │ 4.0 MB   │ -3.6 MB │
├─────────┼─────────────────────────────────────────┼──────────┼──────────┼─────────┤
│ -14.02% │ before                                  │ 50 MB    │ 43 MB    │ -7.0 MB │
│         │ after                                   │          │          │         │
└─────────┴─────────────────────────────────────────┴──────────┴──────────┴─────────┘
```
2025-05-22 19:48:23 +10:00
Yevhenii Shcherbina 53e8e9c7cd fix: reduce cost of prebuild failure (#17697)
Relates to https://github.com/coder/coder/issues/17432

### Part 1:

Notes:
- `GetPresetsAtFailureLimit` SQL query is added, which is similar to
`GetPresetsBackoff`, they use same CTEs: `filtered_builds`,
`time_sorted_builds`, but they are still different.

- Query is executed on every loop iteration. We can consider marking
specific preset as permanently failed as an optimization to avoid
executing query on every loop iteration. But I decided don't do it for
now.

- By default `FailureHardLimit` is set to 3.

- `FailureHardLimit` is configurable. Setting it to zero - means that
hard limit is disabled.

### Part 2

Notes:
- `PrebuildFailureLimitReached` notification is added.
- Notification is sent to template admins.
- Notification is sent only the first time, when hard limit is reached.
But it will `log.Warn` on every loop iteration.
- I introduced this enum:
```sql
CREATE TYPE prebuild_status AS ENUM (
  'normal',           -- Prebuilds are working as expected; this is the default, healthy state.
  'hard_limited',     -- Prebuilds have failed repeatedly and hit the configured hard failure limit; won't be retried anymore.
  'validation_failed' -- Prebuilds failed due to a non-retryable validation error (e.g. template misconfiguration); won't be retried.
);
```
`validation_failed` not used in this PR, but I think it will be used in
next one, so I wanted to save us an extra migration.

- Notification looks like this:
<img width="472" alt="image"
src="https://github.com/user-attachments/assets/e10efea0-1790-4e7f-a65c-f94c40fced27"
/>

### Latest notification views:
<img width="463" alt="image"
src="https://github.com/user-attachments/assets/11310c58-68d1-4075-a497-f76d854633fe"
/>
<img width="725" alt="image"
src="https://github.com/user-attachments/assets/6bbfe21a-91ac-47c3-a9d1-21807bb0c53a"
/>
2025-05-21 15:16:38 -04:00
Michael Suchacz b7462fb256 feat: improve transaction safety in CompleteJob function (#17970)
This PR refactors the CompleteJob function to use database transactions
more consistently for better atomicity guarantees. The large function
was broken down into three specialized handlers:

- completeTemplateImportJob
- completeWorkspaceBuildJob
- completeTemplateDryRunJob

Each handler now uses the Database.InTx wrapper to ensure all database
operations for a job completion are performed within a single
transaction, preventing partial updates in case of failures.

Added comprehensive tests for transaction behavior for each job type.

Fixes #17694

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-authored-by: Claude <noreply@anthropic.com>
2025-05-21 16:48:51 +02:00
Danielle Maywood 3e7ff9d9e1 chore(coderd/rbac): add Action{Create,Delete}Agent to ResourceWorkspace (#17932) 2025-05-20 21:20:56 +01:00
Steven Masley a123900fe8 chore: remove coder/preview dependency from codersdk (#17939) 2025-05-20 10:45:12 -05:00
Steven Masley e76d58f2b6 chore: disable parameter validatation for dynamic params for all transitions (#17926)
Dynamic params skip parameter validation in coder/coder.
This is because conditional parameters cannot be validated 
with the static parameters in the database.
2025-05-20 10:09:53 -05:00
Thomas Kosiewski 93f17bc73e fix: remove unnecessary user lookup in agent API calls (#17934)
# Use workspace.OwnerUsername instead of fetching the owner

This PR optimizes the agent API by using the `workspace.OwnerUsername` field directly instead of making an additional database query to fetch the owner's username. The change removes the need to call `GetUserByID` in the manifest API and workspace agent RPC endpoints.

An issue arose when the agent token was scoped without access to user data (`api_key_scope = "no_user_data"`), causing the agent to fail to fetch the manifest due to an RBAC issue.

Change-Id: I3b6e7581134e2374b364ee059e3b18ece3d98b41
Signed-off-by: Thomas Kosiewski <tk@coder.com>
2025-05-20 17:07:50 +02:00
Danielle Maywood 1267c9c405 fix: ensure reason present for workspace autoupdated notification (#17935)
Fixes https://github.com/coder/coder/issues/17930

Update the `WorkspaceAutoUpdated` notification to only display the
reason if it is present.
2025-05-20 16:01:57 +01:00
Michael Suchacz 769c9ee337 feat: cancel stuck pending jobs (#17803)
Closes: #16488
2025-05-20 15:22:44 +02:00
Steven Masley 9c000468a1 chore: expose use_classic_parameter_flow on workspace response (#17925) 2025-05-19 21:59:15 +00:00
Steven Masley 358b64154e chore: skip parameter resolution for dynamic params (#17922)
Pass through the user input as is. The previous code only passed through
parameters that existed in the db (static params). This would omit
conditional params.

Validation is enforced by the dynamic params websocket, so validation at
this point is not required.
2025-05-19 16:15:15 -05:00
Danielle Maywood 61f22a59ba feat(agent): add ParentId to agent manifest (#17888)
Closes https://github.com/coder/internal/issues/648

This change introduces a new `ParentId` field to the agent's manifest.
This will allow an agent to know if it is a child or not, as well as
knowing who the owner is.

This is part of the Dev Container Agents work
2025-05-19 16:09:56 +01:00
Susana Ferreira f044cc3550 feat: add provisioner daemon name to provisioner jobs responses (#17877)
# Description

This PR adds the `worker_name` field to the provisioner jobs endpoint.

To achieve this, the following SQL query was updated:
-
`GetProvisionerJobsByOrganizationAndStatusWithQueuePositionAndProvisioner`

As a result, the `codersdk.ProvisionerJob` type, which represents the
provisioner job API response, was modified to include the new field.

**Notes:** 
* As mentioned in
[comment](https://github.com/coder/coder/pull/17877#discussion_r2093218206),
the `GetProvisionerJobsByIDsWithQueuePosition` query was not changed due
to load concerns. This means that for template and template version
endpoints, `worker_id` will still be returned, but `worker_name` will
not.
* Similar to `worker_id`, the `worker_name` is only present once a job
is assigned to a provisioner daemon. For jobs in a pending state (not
yet assigned), neither `worker_id` nor `worker_name` will be returned.

---

# Affected Endpoints

- `/organizations/{organization}/provisionerjobs`
- `/organizations/{organization}/provisionerjobs/{job}`

---

# Testing

- Added new tests verifying that both `worker_id` and `worker_name` are
returned once a provisioner job reaches the **succeeded** state.
- Existing tests covering state transitions and other logic remain
unchanged, as they test different scenarios.

---

# Front-end Changes

Admin provisioner jobs dashboard:
<img width="1088" alt="Screenshot 2025-05-16 at 11 51 33"
src="https://github.com/user-attachments/assets/0e20e360-c615-4497-84b7-693777c5443e"
/>

Fixes: https://github.com/coder/coder/issues/16982
2025-05-19 16:05:39 +01:00
Mathias Fredriksson 98e2ec4417 feat: show devcontainer dirty status and allow recreate (#17880)
Updates #16424
2025-05-19 12:56:10 +03:00