Commit Graph

3026 Commits

Author SHA1 Message Date
Mykyta Protsenko c87c33f7dd perf: add index to improve the GetWorkspaceAgentByInstanceID query performance (#20936)
## Context

GetWorkspaceAgentByInstanceID has a suboptimal plan. Even though it is
designed to fetch a small subset of records, there are no corresponding
indexes and that query results in full table scan:

Query:

```
SELECT id, auth_instance_id FROM workspace_agents
where auth_instance_id='i-013c2b96b6441648a' and deleted=FALSE;
```

Plan:

```
------------------------------------------------------------------------------------------------------------------
 Seq Scan on workspace_agents  (cost=0.00..222325.48 rows=2 width=36) (actual time=0.012..234.152 rows=4 loops=1)
   Filter: ((NOT deleted) AND ((auth_instance_id)::text = 'i-013c2b96b6441648a'::text))
   Rows Removed by Filter: 302276
 Planning Time: 0.173 ms
 Execution Time: 234.169 ms
```

After adding the index, the plan improves drastically.

Updated plan:

```
 Bitmap Heap Scan on workspace_agents  (cost=4.44..12.32 rows=2 width=36) (actual time=0.019..0.019 rows=0 loops=1)
   Recheck Cond: (((auth_instance_id)::text = 'i-013c2b96b6441648a'::text) AND (NOT deleted))
   ->  Bitmap Index Scan on workspace_agents_auth_instance_id_deleted_idx  (cost=0.00..4.44 rows=2 width=0) (actual time=0.013..0.014 rows=0 loops=1)
         Index Cond: (((auth_instance_id)::text = 'i-013c2b96b6441648a'::text) AND (deleted = false))
 Planning Time: 0.388 ms
 Execution Time: 0.044 ms
```

## Changes

* add an index to optimize this query

## Testing

* ran the queries manually against prod and test DBs
* ran `./scripts/develop.sh`, connected to the local PostgreSQL
instance, inspected the indexes to make sure new index is there:

```
Indexes:
    "workspace_agents_pkey" PRIMARY KEY, btree (id)
    // NEW INDEX CREATED SUCCESSFULLY  [comment is mine]
    "workspace_agents_auth_instance_id_deleted_idx" btree (auth_instance_id, deleted)
    "workspace_agents_auth_token_idx" btree (auth_token)
    "workspace_agents_resource_id_idx" btree (resource_id)
```

---------

Signed-off-by: Danny Kopping <danny@coder.com>
Co-authored-by: Danny Kopping <danny@coder.com>
2025-11-26 05:57:25 +02:00
George K a9261577bc perf: optimize migration 371 to run faster on large deployments (#20906)
closes https://github.com/coder/coder/issues/20899

This is in response to a migration in v2.27 that takes very long on
deployments with large `api_keys` tables.

NOTE: The optimization causes the _up_ migration to delete old data
(keys that expired more than 7 days ago). The _down_ migration won't
resurrect the deleted data.
2025-11-25 21:44:59 -06:00
Asher c266bb830c chore: add debug logging and recovery to agent api requests (#20785)
This is to debug context timeouts on API requests to the agent.

Because rbac and database cannot be imported in slim, split the logger
middleware into slim and non-slim versions and break out the recovery
middleware.
2025-11-25 14:59:20 -09:00
Callum Styan b0e8384b82 perf: reduce DB calls to GetWorkspaceByAgentID via caching workspace info (#20662)
---------

Signed-off-by: Callum Styan <callumstyan@gmail.com>
2025-11-25 14:45:05 -08:00
Mathias Fredriksson e189dc1f81 fix: complete Tasks GA promotion (docs, site) (#20927)
## Summary

Completes the Coder Tasks GA promotion by updating swagger tags and
regenerating API documentation and updating the frontend API structure.

## Related

Follows #20923 and #20921 which promoted Tasks from Beta/Experimental to
GA.

---

🤖 This change was written by Claude Sonnet 4.5 Thinking using
[mux](https://github.com/coder/mux) and reviewed by a human 🏂
2025-11-25 16:46:13 +00:00
Danielle Maywood b255827a52 chore: promote tasks to stable from experimental (#20921)
- Promote tasks from `/api/experimental` to `/api/v2`.
- Move sdk from `ExperimentalClient` to `Client`.
- Update swagger
2025-11-25 15:24:25 +00:00
Mathias Fredriksson 37fc6646ad perf(coderd/database): limit GetLatestWorkspaceAppStatusByAppID to 1 row (#20917)
## Description

This PR fixes an issue where `GetLatestWorkspaceAppStatusesByAppID`
returned an unbounded number of rows for a given app ID, which could
cause performance issues for noisy or long-running AI tasks.

## Impact

This change reduces database query overhead for workspace app status
updates, particularly for busy AI tasks that update their status
frequently. Previously, fetching the latest status would return all
historical statuses, now it returns only the most recent one.

Fixes #20862

---

🤖 This change was written by Claude Sonnet 4.5 Thinking using [mux](https://github.com/coder/mux) and reviewed by a human 🏄🏻‍♂️
2025-11-25 16:56:42 +02:00
Susana Ferreira 3011207519 feat: add display name field for tasks (#20856)
## Problem

Tasks currently only expose a machine-friendly name field (e.g.
`task-python-debug-a1b2`), but this value is primarily an identifier
rather than a clean, descriptive label. We need a separate
display-friendly name for use in the UI.

This PR introduces a new `display_name` field and updates the task-name
generation flow. The Claude system prompt was updated to return valid
JSON with both `name` and `display_name`. The name generation logic
follows a fallback chain (Anthropic > prompt sanitization > random
fallback). To make task names more closely resemble their display names,
the legacy `task-` prefix has been removed. For context, PR
https://github.com/coder/coder/pull/20834 introduced a small Task icon
to the workspace list to help identify workspaces associated to tasks.

## Changes

- Database migration: Added `display_name` column to tasks table
- Updated system prompt to generate both task name and display name as
valid JSON
- Task name generation now follows a fallback chain: Anthropic > prompt
sanitization > random fallback
- Removed `task-` prefix from task names to allow more descriptive names
- Note: PR https://github.com/coder/coder/pull/20834 adds a Task icon to
workspaces in the workspace list to distinguish task-created workspaces

**Note:** UI changes will be addressed in a follow-up PR

Related to: https://github.com/coder/coder/issues/20801
2025-11-25 13:00:59 +00:00
Danielle Maywood 82f525baf3 feat(coderd): add task prompt modification endpoint (#20811)
This PR adds the backend implementation for modifying task prompts. Part
of https://github.com/coder/internal/issues/1084

## Changes

- New `UpdateTaskPrompt` database query to update task prompts
- New PATCH `/api/v2/tasks/{task}/prompt` endpoint

## Notes

This is part 1 of a 2-part PR stack. The frontend UI will be added in a
follow-up PR based on this branch
(https://github.com/coder/coder/pull/20812).

---

🤖 PR was written by Claude Sonnet 4.5 Thinking using [Coder
Mux](https://github.com/coder/cmux) and reviewed by a human 👩
2025-11-25 11:13:32 +00:00
Spike Curtis afd40436f0 fix: mock Agent querying OS for listening ports in tests (#20842)
fixes https://github.com/coder/internal/issues/1123

We want to tests that ports are not included after they are no longer used, but this isn't safe on the real OS networking stack because there is no way to guarantee a port _won't_ be used. Instead, we introduce an interface and fake implementation for testing.

On order to leave the filtering logic in the test path, this PR also does some refactoring.

Caching logic is left in the real OS querying implementation and a new test case is added for it in this PR.
2025-11-25 14:25:24 +04:00
Danielle Maywood c12303f0b2 fix: allow agents to be created on dormant workspaces (#20909)
Closes https://github.com/coder/coder/issues/20711

We now allow agents to be created on dormant workspaces.

I've ran the test with and without the change. I've confirmed that -
without the fix - it triggers the "rbac: unauthorized" error.
2025-11-25 06:24:33 +00:00
Callum Styan 658e8c34a9 perf: improve performance of metricsAggregator path by reducing memory allocations (#20724)
Signed-off-by: Callum Styan <callumstyan@gmail.com>
2025-11-24 15:45:08 -08:00
Jake Howell ca560d36ce fix: remove inflight interceptions from aibridge returned values (#20852)
Addresses [`aibridge#54`](https://github.com/coder/aibridge/issues/54)

When querying against the values in the database for
`/api/experimental/aibridge/interceptions` we found strange behaviour
wherein there was interceptions that lacked prompting and other various
fields we want. Generally this was as a result of the data not actually
existing for these values (as they were inflight).

The simple solution to this was to hide them if they didn't exist. This
PR addresses that.

---------

Co-authored-by: Danny Kopping <danny@coder.com>
2025-11-25 10:23:39 +11:00
Steven Masley cefe07d074 feat: purge expired api keys in dbpurge (#20863)
closes https://github.com/coder/coder/issues/19889

This is in response to a migration in v2.27 that takes very long on deployments with large `api_key` tables.
2025-11-24 10:24:32 -06:00
Atif Ali 636408906f chore(docs): standardize "AIBridge" to "AI Bridge" in documentation (#20831) 2025-11-24 18:09:04 +05:00
Susana Ferreira 2a9afc77de feat: associate task icon with workspaces (#20834)
## Problem

Workspaces associated with tasks were not visually distinguishable in
the workspaces list view. Additionally, the list workspaces endpoint was
not returning the `task_id` field.

<img width="2784" height="864" alt="Screenshot 2025-11-20 at 10 32 22"
src="https://github.com/user-attachments/assets/60704f16-3c66-4553-9215-f10654998a38"
/>

## Changes

- Fix `ConvertWorkspaceRows` to include `task_id` in the list workspaces
endpoint response
- Add "Task" icon to the workspace list view for workspaces associated
with tasks
- Add test to verify `task_id` is correctly returned by the list
workspaces endpoint
- Add Storybook story to showcase the Task icon in the workspace list

Closes https://github.com/coder/coder/issues/20802
2025-11-21 11:47:10 +00:00
Danny Kopping 5a7d4f69f6 feat: add configurable retention for aibridge (#20828)
Closes https://github.com/coder/internal/issues/1134

---------

Signed-off-by: Danny Kopping <danny@coder.com>
2025-11-21 11:35:36 +02:00
Marcin Tojek d004710a74 feat: add prebuild invalidation via last_invalidated_at timestamp (#20582)
Updates #17917
2025-11-20 17:12:25 +01:00
Spike Curtis 007f2df079 fix: use API, not request context to insert audit/connection logs (#20829)
Fixes: #20744

Upsert audit and connection log entries with a context derived from the API context, rather than the individual request so that we don't error out if the request is canceled or the client hangs up (e.g. if we return an error).
2025-11-20 13:01:50 +04:00
Steven Masley a10c5ff381 chore: protect build timings insert for invalid enums (#20821)
Database insert errors will fail the transaction. So this error is
fatal. Properly return it for a better error call stack, and not just
hiding the error in the logs.
2025-11-19 09:34:19 -06:00
Mathias Fredriksson f6556fce9f test(coderd/workspaceapps/apptest): fix lastusedat assertion for all test (#20827)
The test flake can be verified by setting `ReportInterval` to a really
low value, like `100 * time.Millisecond`.

We now set it to a really high value to avoid triggering flush without 
manually calling the function in test. This can easily happen because 
the default value is 30s and we run tests in parallel. The assertion
typically happens such that:

	[use workspace] -> [fetch previous last used] -> [flush]
	-> [fetch new last used]

When this edge case is triggered:

	[use workspace] -> [report interval flush]
	-> [fetch previous last used] -> [flush] -> [fetch new last used]

In this case, both the previous and new last used will be the same,
breaking the test assertion.

Fixes coder/internal#960
Fixes coder/internal#975
2025-11-19 15:10:59 +02:00
Susana Ferreira 16b8e6072f fix: set codersdk.Task current_state during task initialization (#20692)
## Problem

With the new tasks data model, a task starts with an `initializing`
status. However, the API returns `current_state: null` to represent the
agent state, causing the frontend to display "No message available".
This PR updates `codersdk.Task` to return a `current_state` when the
task is initializing with meaningful messages about what's happening
during task initialization.

**Previous message**

<img width="2764" height="288" alt="Screenshot 2025-11-07 at 09 06 13"
src="https://github.com/user-attachments/assets/feec9f15-91ca-4378-8565-5f9de062d11a"
/>

**New message**

<img width="2726" height="226" alt="Screenshot 2025-11-12 at 11 00 15"
src="https://github.com/user-attachments/assets/2f9bee3e-7ac4-4382-b1c3-1d06bbc2906e"
/>

## Changes

- Populate `current_state` with descriptive initialization messages when
task status is `initializing` and no valid app status exists for the
current build
- **dbfake**: Fix `WorkspaceBuild` builder to properly handle
pending/running jobs by linking tasks without requiring agent/app
resources

**Note:** UI Storybook changes to reflect these new messages will be
addressed in a follow-up PR.

Closes: https://github.com/coder/internal/issues/1063
2025-11-17 13:24:12 +00:00
Mathias Fredriksson fa314fe7e5 fix(coderd/database): rename duplicate migration 397 to 398 (#20783)
Fix duplicate migration from #20683.
2025-11-14 18:05:29 +00:00
Mathias Fredriksson 1483fd11ff fix(coderd/database): improve task status in tasks_with_status view (#20683)
This change restructures the `tasks_with_status` view query to:

- Improve debuggability by adding a `status_debug` column to better
understand the outcome
- Reduce clutter from `bool_or`, `bool_and` which are aggregate
functions that did not actually have serve a purpose (each join is 0-1
rows)
- Improve agent lifecycle state coverage, `start_timeout` and
`start_error` were omitted
- These states are easy to trigger even in a perfectly functioning
workspace/task so we now rely on app health to report whether or not
there was an issue
- Mark canceling and canceled workspace build jobs as error state
- Agent stop states were implicitly `unknown`, now there are explicit (I
initially considered `error`, could go either way)
2025-11-14 19:52:26 +02:00
Steven Masley f23836d426 chore: add more scopes to the curated catalog (#20746)
Just noticed when writing docs. These are probably obvious scopes to
allow
2025-11-14 08:30:10 -06:00
Susana Ferreira 79d46769fe chore: remove warning for non-trackable workspace builds in metrics (#20775)
Previously, `UpdateWorkspaceTimingsMetrics` would log a warning for
workspace builds that aren't tracked (restarts, stops, subsequent builds
after creation). This was noisy since these are legitimate operations,
not errors.

`UpdateWorkspaceTimingsMetrics` is specifically designed to track only
workspace creation, prebuild creation, and prebuild claim timings.

Related with: https://github.com/coder/coder/pull/20772
2025-11-14 12:26:32 +00:00
Danny Kopping 86c4948445 chore: add timing flag context to warn message (#20772)
`prometheus.provisionerd_server_metrics: unsupported workspace timing
flags` appears in the logs, but without knowledge of the available flags
it's not possible to troubleshoot this.

Signed-off-by: Danny Kopping <danny@coder.com>
2025-11-14 10:10:53 +00:00
Steven Masley fe3b825b86 chore: per template opt into cached terraform directories (#20609)
For experimental and dogfood purposes, this adds the ability to opt in a single template. 
Leaving the rest of the templates as is. 

For GA, this setting might be removed or changed.
2025-11-13 14:04:12 -06:00
Steven Masley 9ca5b44b56 chore: implement persistent terraform directories (experimental) (#20563)
Prior to this, every workspace build ran `terraform init` in a fresh
directory. This would mean the `modules` are downloaded fresh. If the
module is not pinned, subsequent workspace builds would have different
modules.
2025-11-13 07:50:17 -06:00
Steven Masley 04727c06e8 chore: add experiment toggle for terraform workspace caching (#20559)
Experiments passed to provisioners to determine behavior. This adds
`--experiments` flag to provisioner daemons. Prior to this, provisioners
had no method to turn on/off experiments.
2025-11-12 14:26:15 -06:00
Steven Masley 9149c1e9f2 chore: append template metadata to protobuf config (#20558)
Adds some extra meta data sent to provisioners. Also adds a field
`reuse_terraform_workspace` to tell the provisioner whether or not to
use the caching experiment.
2025-11-12 12:46:39 -06:00
Mathias Fredriksson e61b0fcf42 chore(codersdk): deprecate HasAITask on WorkspaceBuild (#20732)
Closes coder/internal#973
2025-11-12 10:27:06 +00:00
Danny Kopping 04f809f2d0 chore!: allow coder MCP tools to not be injected (#20713)
Currently, when AI Bridge is enabled AND the `oauth2` and
`mcp-server-http` experiments are enabled we inject Coder's MCP tools
into all intercepted AI Bridge requests.

This PR introduces a config to control this behaviour.

**NOTE:** this is a backwards-incompatible change; previously these
tools would be injected automatically, now this setting will need to be
explicitly enabled.

---------

Signed-off-by: Danny Kopping <danny@coder.com>
2025-11-12 11:23:01 +02:00
Ethan e49c917bb0 perf: use a single query for notification target lookups (#20574)
Somewhat minor inefficiency in notifications I discovered during a scaletest where I was creating many users. Our `GetUsers` query filter for rbac roles uses the `&&` operator on arrays, which is the intersection of the two arrays. Despite that, we were making seperate DB queries for each role, and then collating the results. I didn't see any other instances of this.

The test changes are required as the order of outgoing notifications is now non-deterministic.
2025-11-11 21:23:23 -05:00
Danielle Maywood f2a1a7e8c3 fix(coderd): gate AI task notifications on agent ready state (#20690)
Relates to https://github.com/coder/internal/issues/1098

Currently AgentAPI waits for only 2 seconds worth of identical terminal
screen snapshots before deciding a task has entered a "stable" state. We
interpret this as becoming "idle", resulting in a notification being
triggered. This behavior is not ideal and is ultimately the root cause
of our spammy notifications.

Unfortunately, until we move AgentAPI to either use the Claude Code SDK
(or ACP wrapper around it), we are unable to easily fix the root cause.

This PR instead waits until the agent is ready before it will send state
change notifications. This will at least resolve _some_ of the
complaints about task state notifications being too spammy.

---

🤖 PR was written by Claude Sonnet 4.5 using [Coder
Mux](https://github.com/coder/cmux) and reviewed by a human 👩
2025-11-10 16:00:13 +00:00
Paweł Banaszewski 991831b1dd chore: add API key ID to interceptions (#20513)
Adds APIKeyID to interceptions.
Needed for tracking API key usage with bridge.
fixes https://github.com/coder/coder/issues/20001
2025-11-10 13:46:41 +01:00
Mathias Fredriksson ce04f6cc5d fix(coderd): remove deprecated AITaskSidebarApp column (#20680)
This column was no longer used in `v2.28` and the codersdk field
deprecated. Both can now be dropped in `v2.29`.

Closes coder/internal#974
2025-11-07 12:45:45 +02:00
Cian Johnston 34f6e72879 feat(coderd): add lookup task by name in httpmw.TaskParam (#20647)
* Adds a `GetTaskByOwnerIDAndName` query
* Updates `httpmw.TaskParam` to fall back to task name if no task by
UUID found.
* Updates the `TaskByIdentifier` used in `cli/` to use direct lookup instead of searching.
2025-11-05 14:28:34 +00:00
Mathias Fredriksson daad93967a fix(coderd): fix template ai task check error message (#20651)
Create task was still mentioning magic prompt parameter when checking
template task validity. This change updates it to only mention validity
of `coder_ai_task` resource.
2025-11-03 12:54:43 +00:00
Mathias Fredriksson a6b0eae38d refactor(coderd): drop sidebar app constraint and simplify provisionerdserver for tasks (#20591)
Updates coder/internal#973
Updates coder/internal#974
2025-11-03 13:46:38 +02:00
Cian Johnston 1961252918 chore(coderd/provisionerdserver): address flake in TestServer_ExpirePrebuildsSessionToken (#20648)
Addresses a flake seen locally by @mafredri:

```
panic: interface conversion: proto.isAcquiredJob_Type is nil, not *proto.AcquiredJob_WorkspaceBuild_ [recovered]
        panic: interface conversion: proto.isAcquiredJob_Type is nil, not *proto.AcquiredJob_WorkspaceBuild_

goroutine 77 [running]:
testing.tRunner.func1.2({0x35ba440, 0xc000f15620})
        /usr/local/go/src/testing/testing.go:1734 +0x21c
testing.tRunner.func1()
        /usr/local/go/src/testing/testing.go:1737 +0x35e
panic({0x35ba440?, 0xc000f15620?})
        /usr/local/go/src/runtime/panic.go:792 +0x132
github.com/coder/coder/v2/coderd/provisionerdserver_test.TestServer_ExpirePrebuildsSessionToken(0xc00010d500)                                 /home/coder/coder/coderd/provisionerdserver/provisionerdserver_test.go:4128 +0xc4b
testing.tRunner(0xc00010d500, 0x4bd8450)
        /usr/local/go/src/testing/testing.go:1792 +0xf4
created by testing.(*T).Run in goroutine 1
        /usr/local/go/src/testing/testing.go:1851 +0x413
FAIL    github.com/coder/coder/v2/coderd/provisionerdserver     20.830s
FAIL
```

It's unclear why this would happen in the first place.
2025-11-03 11:39:02 +00:00
Mathias Fredriksson 7ae3fdc749 refactor: use task data model for notifications (#20590)
Updates coder/internal#973
Updates coder/internal#974
2025-10-31 15:53:27 +02:00
Asher d306a2d7e5 chore: log with %s on unexpected non-sdk err (#20570)
With `%w` it prints an address instead of the error, like `<op> <url>
0xc001329370` instead of `<op> <url>: some error`, honestly idk why you
even can log with `%w` it seems like it makes no sense to use `%w`
outside of `fmt.Errorf`.

This is to help debug https://github.com/coder/internal/issues/1010.
2025-10-30 10:23:52 -08:00
Danielle Maywood d80b5fc8ed refactor!: remove TaskAppID from codersdk.WorkspaceBuild (#20583)
Remove the `TaskAppID` field from `codersdk.WorkspaceBuild`. Consumers can instead use the new `codersdk.Task` data model for this information.
2025-10-30 16:45:51 +00:00
Cian Johnston 38017010ce fix(coderd): disallow POSTing a workspace build on a deleted workspace (#20584)
- Adds a check on /api/v2/workspacebuilds to disallow creating a START or STOP build if the workspace is deleted. 
- DELETEs are still allowed.
2025-10-30 13:32:18 +00:00
Cian Johnston 73dedcc765 fix: delete related task when deleting workspace (#20567)
* Instead of prompting the user to start a deleted workspace (which is
silly), prompt them to create a new task instead.
* Adds a warning dialog when deleting a workspace
* Updates provisionerdserver to delete the related task if a workspace
is related to a task
2025-10-30 10:37:51 +00:00
Steven Masley 54497f4f6b chore: add revocation endpoint to oauth well-known (#20561)
Was added to apps endpoints, but not the wider site ones. This is a site
wide oauth route
2025-10-29 16:44:53 -05:00
Mathias Fredriksson 859e94d67a fix: deprecate codersdk.AITaskPromptParameterName and reduce usage (#20501)
Depends on coder/sqlc#1
Fixes coder/internal#979
Updates coder/internal#973
2025-10-29 18:59:12 +00:00
Mathias Fredriksson 303e9ef7de fix: switch to coder/sqlc fork (#20536)
Refs https://github.com/coder/sqlc/pull/1
Unblocks https://github.com/coder/coder/pull/20501

Upstream https://github.com/sqlc-dev/sqlc/pull/4159
2025-10-29 18:45:56 +02:00
Cian Johnston 1ebc217624 fix: update task link AppStatus using task_id (#20543)
Fixes https://github.com/coder/coder/issues/20515

Alternative to https://github.com/coder/coder/pull/20519

Adds `task_id` to `workspaces_expanded` view and updates the "View Task"
link in `AppStatuses` component.

NOTE: this contains a migration
2025-10-29 15:45:45 +00:00