The metrics cache to calculate and expose build time metrics for
templates currently calls `GetTemplates`, which returns all templates
even if they are deleted. We can use the `GetTemplatesWithFilter` query
to easily filter out deleted templates from the results, and thus not
call `GetTemplateAverageBuildTime` for those deleted templates. Delete
time for workspaces for non-deleted templates is still calculated.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
fixes: https://github.com/coder/internal/issues/1143
Both gVisor and the Go standard library implementations of `net.Conn` can under certain circumstances return `nil` for `RemoteAddr()` and `LocalAddr()` calls. If we call their methods, we segfault.
This PR fixes these calls and adds ruleguard rules.
Note that `slog.F("remote_addr", conn.RemoteAddr())` is fine because slog detects the `nil` before attempting to stringify the type.
Previously setting AI Bridge retention to 0 would cause records to be
deleted immediately since we didn't check for the zero value before
calculating the deletion threshold.
This adds a check for aibridgeRetention > 0 to skip deletion when
retention is disabled, matching the pattern used for other retention
settings (connection logs, audit logs, etc.).
Also fixes the return type of DeleteOldAIBridgeRecords from int32 to
int64 since COUNT(*) returns bigint in PostgreSQL.
Refs #21055
Replace hardcoded 7-day retention for workspace agent logs with
configurable retention from deployment settings. Defaults to 7d to
preserve existing behavior.
Depends on #21038
Updates #20743
Replace hardcoded 7-day retention for expired API keys with configurable
retention from deployment settings. Skips deletion entirely when effective
retention is 0.
Depends on #21021
Updates #20743
Add configurable retention policy for audit logs. The DeleteOldAuditLogs
query excludes deprecated connection events (connect, disconnect, open,
close) which are handled separately by DeleteOldAuditLogConnectionEvents.
Disabled (0) by default.
Depends on #21021
Updates #20743
Add `DeleteOldConnectionLogs` query and integrate it into the `dbpurge`
routine. Retention is controlled by `--retention-connection-logs` flag.
Disabled (0) by default.
Depends on #21021
Updates #20743
Add `RetentionConfig` with server flags for configuring data retention:
- `--audit-logs-retention`: retention for audit log entries
- `--connection-logs-retention`: retention for connection logs
- `--api-keys-retention`: retention for expired API keys (default 7d)
Updates #20743
## Problem
`TestDescCacheTimestampUpdate` was flaky on Windows CI because
`time.Now()` has ~15.6ms resolution, causing consecutive calls to return
identical timestamps.
## Solution
Inject `quartz.Clock` into `MetricsAggregator` using an options pattern,
making the test deterministic by using a mock clock with explicit time
advancement.
### Changes
- Add `clock quartz.Clock` field to `MetricsAggregator` struct
- Add `WithClock()` option for dependency injection
- Replace all `time.Now()` calls with `ma.clock.Now()`
- Update test to use mock clock with `mClock.Advance(time.Second)`
---
This PR was fully generated by [`mux`](https://github.com/coder/mux)
using Claude Opus 4.5, and reviewed by me.
Closes https://github.com/coder/internal/issues/1146
## Problem
Users may not realize that task notifications are disabled by default.
To improve awareness, we show a warning alert on the Tasks page when all
task notifications are disabled.
**Alert visibility logic:**
- Shows when **all** task notification templates (Task Working, Task
Idle, Task Completed, Task Failed) are disabled
- Can be dismissed by the user, which stores the dismissal in the user
preferences API
- If the user later enables any task notification in Account Settings,
the dismissal state is cleared so the alert will show again if they
disable all notifications in the future
<img width="2980" height="1588" alt="Screenshot 2025-11-25 at 17 48 17"
src="https://github.com/user-attachments/assets/316bf097-d9d2-4489-bc16-2987ba45f45c"
/>
## Changes
- Added a warning alert to the Tasks page when all task notifications
are disabled
- Introduced new `/users/{user}/preferences` endpoint to manage user
preferences (stored in `user_configs` table)
- Alert is dismissible and stores the dismissal state via the new user
preferences API endpoint
- Enabling any task notification in Account Settings clears the
dismissal state via the preferences API
- Added comprehensive Storybook stories for both TasksPage and
NotificationsPage to test all alert visibility states and interactions
Closes: https://github.com/coder/internal/issues/1089
The agentapi context needs to be a context with some amount of
authorization attached to it via the context so that the cache refresh
routine can fetch the workspace from the db via GetWorkspaceForAgentID.
---------
Signed-off-by: Callum Styan <callumstyan@gmail.com>
Closes https://github.com/coder/coder/issues/20913
I've ran the test without the fix, verified the test caught the issue,
then applied the fix, and confirmed the issue no longer happens.
---
🤖 PR was initially written by Claude Opus 4.5 Thinking using Claude Code
and then review by a human 👩
## Context
GetWorkspaceAgentByInstanceID has a suboptimal plan. Even though it is
designed to fetch a small subset of records, there are no corresponding
indexes and that query results in full table scan:
Query:
```
SELECT id, auth_instance_id FROM workspace_agents
where auth_instance_id='i-013c2b96b6441648a' and deleted=FALSE;
```
Plan:
```
------------------------------------------------------------------------------------------------------------------
Seq Scan on workspace_agents (cost=0.00..222325.48 rows=2 width=36) (actual time=0.012..234.152 rows=4 loops=1)
Filter: ((NOT deleted) AND ((auth_instance_id)::text = 'i-013c2b96b6441648a'::text))
Rows Removed by Filter: 302276
Planning Time: 0.173 ms
Execution Time: 234.169 ms
```
After adding the index, the plan improves drastically.
Updated plan:
```
Bitmap Heap Scan on workspace_agents (cost=4.44..12.32 rows=2 width=36) (actual time=0.019..0.019 rows=0 loops=1)
Recheck Cond: (((auth_instance_id)::text = 'i-013c2b96b6441648a'::text) AND (NOT deleted))
-> Bitmap Index Scan on workspace_agents_auth_instance_id_deleted_idx (cost=0.00..4.44 rows=2 width=0) (actual time=0.013..0.014 rows=0 loops=1)
Index Cond: (((auth_instance_id)::text = 'i-013c2b96b6441648a'::text) AND (deleted = false))
Planning Time: 0.388 ms
Execution Time: 0.044 ms
```
## Changes
* add an index to optimize this query
## Testing
* ran the queries manually against prod and test DBs
* ran `./scripts/develop.sh`, connected to the local PostgreSQL
instance, inspected the indexes to make sure new index is there:
```
Indexes:
"workspace_agents_pkey" PRIMARY KEY, btree (id)
// NEW INDEX CREATED SUCCESSFULLY [comment is mine]
"workspace_agents_auth_instance_id_deleted_idx" btree (auth_instance_id, deleted)
"workspace_agents_auth_token_idx" btree (auth_token)
"workspace_agents_resource_id_idx" btree (resource_id)
```
---------
Signed-off-by: Danny Kopping <danny@coder.com>
Co-authored-by: Danny Kopping <danny@coder.com>
closes https://github.com/coder/coder/issues/20899
This is in response to a migration in v2.27 that takes very long on
deployments with large `api_keys` tables.
NOTE: The optimization causes the _up_ migration to delete old data
(keys that expired more than 7 days ago). The _down_ migration won't
resurrect the deleted data.
This is to debug context timeouts on API requests to the agent.
Because rbac and database cannot be imported in slim, split the logger
middleware into slim and non-slim versions and break out the recovery
middleware.
## Summary
Completes the Coder Tasks GA promotion by updating swagger tags and
regenerating API documentation and updating the frontend API structure.
## Related
Follows #20923 and #20921 which promoted Tasks from Beta/Experimental to
GA.
---
🤖 This change was written by Claude Sonnet 4.5 Thinking using
[mux](https://github.com/coder/mux) and reviewed by a human 🏂
## Description
This PR fixes an issue where `GetLatestWorkspaceAppStatusesByAppID`
returned an unbounded number of rows for a given app ID, which could
cause performance issues for noisy or long-running AI tasks.
## Impact
This change reduces database query overhead for workspace app status
updates, particularly for busy AI tasks that update their status
frequently. Previously, fetching the latest status would return all
historical statuses, now it returns only the most recent one.
Fixes#20862
---
🤖 This change was written by Claude Sonnet 4.5 Thinking using [mux](https://github.com/coder/mux) and reviewed by a human 🏄🏻♂️
## Problem
Tasks currently only expose a machine-friendly name field (e.g.
`task-python-debug-a1b2`), but this value is primarily an identifier
rather than a clean, descriptive label. We need a separate
display-friendly name for use in the UI.
This PR introduces a new `display_name` field and updates the task-name
generation flow. The Claude system prompt was updated to return valid
JSON with both `name` and `display_name`. The name generation logic
follows a fallback chain (Anthropic > prompt sanitization > random
fallback). To make task names more closely resemble their display names,
the legacy `task-` prefix has been removed. For context, PR
https://github.com/coder/coder/pull/20834 introduced a small Task icon
to the workspace list to help identify workspaces associated to tasks.
## Changes
- Database migration: Added `display_name` column to tasks table
- Updated system prompt to generate both task name and display name as
valid JSON
- Task name generation now follows a fallback chain: Anthropic > prompt
sanitization > random fallback
- Removed `task-` prefix from task names to allow more descriptive names
- Note: PR https://github.com/coder/coder/pull/20834 adds a Task icon to
workspaces in the workspace list to distinguish task-created workspaces
**Note:** UI changes will be addressed in a follow-up PR
Related to: https://github.com/coder/coder/issues/20801
This PR adds the backend implementation for modifying task prompts. Part
of https://github.com/coder/internal/issues/1084
## Changes
- New `UpdateTaskPrompt` database query to update task prompts
- New PATCH `/api/v2/tasks/{task}/prompt` endpoint
## Notes
This is part 1 of a 2-part PR stack. The frontend UI will be added in a
follow-up PR based on this branch
(https://github.com/coder/coder/pull/20812).
---
🤖 PR was written by Claude Sonnet 4.5 Thinking using [Coder
Mux](https://github.com/coder/cmux) and reviewed by a human 👩
fixes https://github.com/coder/internal/issues/1123
We want to tests that ports are not included after they are no longer used, but this isn't safe on the real OS networking stack because there is no way to guarantee a port _won't_ be used. Instead, we introduce an interface and fake implementation for testing.
On order to leave the filtering logic in the test path, this PR also does some refactoring.
Caching logic is left in the real OS querying implementation and a new test case is added for it in this PR.
Closes https://github.com/coder/coder/issues/20711
We now allow agents to be created on dormant workspaces.
I've ran the test with and without the change. I've confirmed that -
without the fix - it triggers the "rbac: unauthorized" error.
Addresses [`aibridge#54`](https://github.com/coder/aibridge/issues/54)
When querying against the values in the database for
`/api/experimental/aibridge/interceptions` we found strange behaviour
wherein there was interceptions that lacked prompting and other various
fields we want. Generally this was as a result of the data not actually
existing for these values (as they were inflight).
The simple solution to this was to hide them if they didn't exist. This
PR addresses that.
---------
Co-authored-by: Danny Kopping <danny@coder.com>
## Problem
Workspaces associated with tasks were not visually distinguishable in
the workspaces list view. Additionally, the list workspaces endpoint was
not returning the `task_id` field.
<img width="2784" height="864" alt="Screenshot 2025-11-20 at 10 32 22"
src="https://github.com/user-attachments/assets/60704f16-3c66-4553-9215-f10654998a38"
/>
## Changes
- Fix `ConvertWorkspaceRows` to include `task_id` in the list workspaces
endpoint response
- Add "Task" icon to the workspace list view for workspaces associated
with tasks
- Add test to verify `task_id` is correctly returned by the list
workspaces endpoint
- Add Storybook story to showcase the Task icon in the workspace list
Closes https://github.com/coder/coder/issues/20802
Fixes: #20744
Upsert audit and connection log entries with a context derived from the API context, rather than the individual request so that we don't error out if the request is canceled or the client hangs up (e.g. if we return an error).
Database insert errors will fail the transaction. So this error is
fatal. Properly return it for a better error call stack, and not just
hiding the error in the logs.
The test flake can be verified by setting `ReportInterval` to a really
low value, like `100 * time.Millisecond`.
We now set it to a really high value to avoid triggering flush without
manually calling the function in test. This can easily happen because
the default value is 30s and we run tests in parallel. The assertion
typically happens such that:
[use workspace] -> [fetch previous last used] -> [flush]
-> [fetch new last used]
When this edge case is triggered:
[use workspace] -> [report interval flush]
-> [fetch previous last used] -> [flush] -> [fetch new last used]
In this case, both the previous and new last used will be the same,
breaking the test assertion.
Fixescoder/internal#960Fixescoder/internal#975
## Problem
With the new tasks data model, a task starts with an `initializing`
status. However, the API returns `current_state: null` to represent the
agent state, causing the frontend to display "No message available".
This PR updates `codersdk.Task` to return a `current_state` when the
task is initializing with meaningful messages about what's happening
during task initialization.
**Previous message**
<img width="2764" height="288" alt="Screenshot 2025-11-07 at 09 06 13"
src="https://github.com/user-attachments/assets/feec9f15-91ca-4378-8565-5f9de062d11a"
/>
**New message**
<img width="2726" height="226" alt="Screenshot 2025-11-12 at 11 00 15"
src="https://github.com/user-attachments/assets/2f9bee3e-7ac4-4382-b1c3-1d06bbc2906e"
/>
## Changes
- Populate `current_state` with descriptive initialization messages when
task status is `initializing` and no valid app status exists for the
current build
- **dbfake**: Fix `WorkspaceBuild` builder to properly handle
pending/running jobs by linking tasks without requiring agent/app
resources
**Note:** UI Storybook changes to reflect these new messages will be
addressed in a follow-up PR.
Closes: https://github.com/coder/internal/issues/1063
This change restructures the `tasks_with_status` view query to:
- Improve debuggability by adding a `status_debug` column to better
understand the outcome
- Reduce clutter from `bool_or`, `bool_and` which are aggregate
functions that did not actually have serve a purpose (each join is 0-1
rows)
- Improve agent lifecycle state coverage, `start_timeout` and
`start_error` were omitted
- These states are easy to trigger even in a perfectly functioning
workspace/task so we now rely on app health to report whether or not
there was an issue
- Mark canceling and canceled workspace build jobs as error state
- Agent stop states were implicitly `unknown`, now there are explicit (I
initially considered `error`, could go either way)
Previously, `UpdateWorkspaceTimingsMetrics` would log a warning for
workspace builds that aren't tracked (restarts, stops, subsequent builds
after creation). This was noisy since these are legitimate operations,
not errors.
`UpdateWorkspaceTimingsMetrics` is specifically designed to track only
workspace creation, prebuild creation, and prebuild claim timings.
Related with: https://github.com/coder/coder/pull/20772
`prometheus.provisionerd_server_metrics: unsupported workspace timing
flags` appears in the logs, but without knowledge of the available flags
it's not possible to troubleshoot this.
Signed-off-by: Danny Kopping <danny@coder.com>
For experimental and dogfood purposes, this adds the ability to opt in a single template.
Leaving the rest of the templates as is.
For GA, this setting might be removed or changed.
Prior to this, every workspace build ran `terraform init` in a fresh
directory. This would mean the `modules` are downloaded fresh. If the
module is not pinned, subsequent workspace builds would have different
modules.
Experiments passed to provisioners to determine behavior. This adds
`--experiments` flag to provisioner daemons. Prior to this, provisioners
had no method to turn on/off experiments.
Adds some extra meta data sent to provisioners. Also adds a field
`reuse_terraform_workspace` to tell the provisioner whether or not to
use the caching experiment.
Currently, when AI Bridge is enabled AND the `oauth2` and
`mcp-server-http` experiments are enabled we inject Coder's MCP tools
into all intercepted AI Bridge requests.
This PR introduces a config to control this behaviour.
**NOTE:** this is a backwards-incompatible change; previously these
tools would be injected automatically, now this setting will need to be
explicitly enabled.
---------
Signed-off-by: Danny Kopping <danny@coder.com>
Somewhat minor inefficiency in notifications I discovered during a scaletest where I was creating many users. Our `GetUsers` query filter for rbac roles uses the `&&` operator on arrays, which is the intersection of the two arrays. Despite that, we were making seperate DB queries for each role, and then collating the results. I didn't see any other instances of this.
The test changes are required as the order of outgoing notifications is now non-deterministic.
Relates to https://github.com/coder/internal/issues/1098
Currently AgentAPI waits for only 2 seconds worth of identical terminal
screen snapshots before deciding a task has entered a "stable" state. We
interpret this as becoming "idle", resulting in a notification being
triggered. This behavior is not ideal and is ultimately the root cause
of our spammy notifications.
Unfortunately, until we move AgentAPI to either use the Claude Code SDK
(or ACP wrapper around it), we are unable to easily fix the root cause.
This PR instead waits until the agent is ready before it will send state
change notifications. This will at least resolve _some_ of the
complaints about task state notifications being too spammy.
---
🤖 PR was written by Claude Sonnet 4.5 using [Coder
Mux](https://github.com/coder/cmux) and reviewed by a human 👩