Commit Graph

109 Commits

Author SHA1 Message Date
Ethan c4db03f11a perf(coderd/database): skip redundant chat row update in InsertChatMessage (#23111)
## Summary

- add an `IS DISTINCT FROM` guard to `InsertChatMessage`'s
`updated_chat` CTE so `chats.last_model_config_id` is only rewritten
when the incoming `model_config_id` actually changes
- regenerate the query layer
- add focused regression coverage for the two meaningful behaviors:
same-model inserts and real model switches
- trim redundant message-field assertions so the new test stays focused
on the guard behavior

## Proof this is an improvement

This PR reduces work in the hottest chat write query without changing
the insert behavior.

### Why the old query did unnecessary work

Before this change, `InsertChatMessage` always ran this update whenever
`model_config_id` was non-null:

```sql
UPDATE chats
SET last_model_config_id = sqlc.narg('model_config_id')::uuid
WHERE id = @chat_id::uuid
  AND sqlc.narg('model_config_id')::uuid IS NOT NULL
```

That means the query rewrote the `chats` row even when
`chats.last_model_config_id` was already equal to the incoming value.

### What changes in this PR

This PR adds:

```sql
AND chats.last_model_config_id IS DISTINCT FROM sqlc.narg('model_config_id')::uuid
```

So same-model inserts still insert the message, but they no longer
perform a redundant `UPDATE chats`.

### Why this matters on the hot path

From the chat scaletest investigation that motivated this change:

- `InsertChatMessage` (+ `updated_chat` CTE) was the hottest write query
- about **104k calls**
- about **0.69 ms average latency**
- about **71.8 s total DB execution time**

We also verified common callsites where the update is provably
redundant:

- `CreateChat` inserts the chat with `LastModelConfigID =
opts.ModelConfigID`, then immediately inserts initial system/user
messages with that same model config
- follow-up user messages commonly pass `lockedChat.LastModelConfigID`
straight into `InsertChatMessage`
- assistant/tool/summary persistence keeps the current model in the
common case; only real switches or fallback cases need the chat row
update

That means a meaningful fraction of executions of the hottest DB write
query move from:

- **before:** insert message **+** rewrite chat row
- **after:** insert message only

This should reduce row churn and write contention on `chats`, especially
against other chat-row writers like `UpdateChatStatus` and
`GetChatByIDForUpdate`.
2026-03-17 00:44:10 +11:00
Mathias Fredriksson 4a79af1a0d refactor: add chat_message_role enum and content_version column (#23042)
Migration 000434 converts chat_messages.role from text to a Postgres
enum, rebuilds the partial index, and adds content_version smallint.
The column is backfilled with DEFAULT 0, then the default is dropped
so future inserts must set it explicitly.

Version 0 uses the role-aware heuristic from #22958. Version 1 (all
new inserts) stores []ChatMessagePart JSON for all roles, including
system messages. ParseContent takes database.ChatMessage directly
and dispatches on version internally. Unknown versions error.

All string(codersdk.ChatMessageRole*) casts at DB write sites are
replaced with database.ChatMessageRole* constants from sqlc.

Refs #22958
2026-03-13 16:47:36 +00:00
George K e5c19d0af4 feat: backend support for creating and storing service accounts (#22698)
Add is_service_account column to users table with CHECK constraints
enforcing login_type='none' and empty email for service accounts.
Update user creation API to validate service account constraints.

Related to:
https://linear.app/codercom/issue/PLAT-27/feat-backend-support-for-creating-and-storing-service-accounts
2026-03-11 10:19:08 -07:00
Jon Ayers 22a87f6cf6 fix: filter sub-agents from build duration metric (#22732) 2026-03-10 12:17:32 -05:00
Jon Ayers e7ea649dc2 fix: optimize GetProvisionerJobsByIDsWithQueuePosition query (#22724) 2026-03-09 16:47:02 -05:00
Kyle Carberry aba3832b15 fix: update the compaction message to be the "user" role (#22819)
## Bug

After compaction in the chat loop, the loop re-enters and calls the LLM
with a prompt that has **no non-system messages**. Anthropic (and most
providers) require at least one user/assistant/tool message, so the API
errors with empty messages.

## Root Cause

The compaction summary was stored as `role=system`. After compaction,
`GetChatMessagesForPromptByChatID` returns only:
- The compressed system summary (matched by the CTE)
- Original non-compressed system messages (system prompts)

All original user/assistant/tool messages are excluded (they predate the
summary). The compaction assistant/tool messages are `compressed=TRUE`
and don't match the main query's `compressed=FALSE` clauses.

So `ReloadMessages` returned only system messages. The Anthropic
provider moves system messages into a separate `system` field, leaving
the `messages` API field as `[]`.

## Fix

1. **Changed compaction summary from `role=system` to `role=user`** —
the summary now appears as a user message in the reloaded prompt, giving
the model valid conversational context to respond to.

2. **Simplified the CTE** — removed the `role = 'system'` check and
narrowed `visibility IN ('model', 'both')` to just `visibility =
'model'`. The summary is the only compressed message with
`visibility=model` (the assistant has `visibility=user`, the tool has
`visibility=both`), so the role check was redundant.

## Test

`PostRunCompactionReEntryIncludesUserSummary`: verifies the re-entry
prompt contains a user message (the compaction summary) after compaction
+ reload.
2026-03-08 22:25:27 -04:00
Danielle Maywood f91475cd51 test: remove unnecessary dbauthz.AsSystemRestricted calls in tests (#22663) 2026-03-05 20:29:49 +00:00
Sas Swart cfcb81fb0f fix: user status change chart accommodates DST (#22191)
closes https://github.com/coder/internal/issues/464

# Summary

This PR resolves a flaky test that was sensitive to DST transitions in
various time zones. The root of the flake was:
* a bug; the query and its tests assume 24 hours per day
* the tests used local system time, which resulted in failures for dates
proximal to DST transitions

# Changes

Query:

The original query assumed 24 hour intervals between each day, which is
not a valid assumption. It now increments `1 day` at a time.

Database tests:

Database level tests for the query all assumed 24 hour days. They now
increment in DST-aware days instead. Instead of using time.Now() as a
base for testing, the test uses a series of dates over the course of an
entire year, to ensure that DST transition dates are present in every
test run.

# API Endpoint

The endpoint that delivers the user status chart now accepts an IANA
timezone name as a parameter and passes it, keeping the existing offset
as a fallback, to the database query.

API level tests were added to ensure the correct response form and error
behaviour. Correctness of content is tested at the database level.
2026-03-04 12:54:39 +02:00
Kyle Carberry edee917d88 feat: add experimental agents support (#22290)
feat: add AI chat system with agent tools and chat UI

Introduce the chatd subsystem and Agents UI for AI-powered chat
within Coder workspaces.

- Add chatd package with chat loop, message compaction, prompt
  management, and LLM provider integration (OpenAI, Anthropic)
- Add agent tools: create workspace, list/read templates, read/write/
  edit files, execute commands
- Add chat API endpoints with streaming, message editing, and
  durable reconnection
- Add database schema and migrations for chats, chat messages, chat
  providers, and chat model configs
- Add RBAC policies and dbauthz enforcement for chat resources
- Add Agents UI pages with conversation timeline, queued messages
  list, diff viewer, and model configuration panel
- Add comprehensive test coverage including coderd integration tests,
  chatd unit tests, and Storybook stories
- Gate feature behind experiments flag

---------

Co-authored-by: Cian Johnston <cian@coder.com>
Co-authored-by: Danielle Maywood <danielle@themaywoods.com>
Co-authored-by: Jeremy Ruppel <jeremy@coder.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-27 16:50:56 +00:00
Kacper Sawicki 1e274063d4 feat(coderd): filter expired API tokens server-side (#22263)
## Summary

Moves expired token filtering from client-side to server-side by adding
an `include_expired` parameter to the `GetAPIKeysByLoginType` and
`GetAPIKeysByUserID` database queries. This is more efficient for large
deployments with many expired/short-lived tokens.

## Changes

- Add `include_expired` parameter to SQL queries using `OR`
short-circuit
- Add `include_expired` query parameter to `GET
/users/{user}/keys/tokens`
- Add `IncludeExpired` field to `codersdk.TokensFilter`
- Remove client-side filtering from CLI `tokens list` command
- Add `TestTokensFilterExpired` test

Fixes coder/internal#1357
2026-02-24 15:27:03 +00:00
Danielle Maywood 911d734df9 fix: avoid re-using AuthInstanceID for sub agents (#22196)
Parent agents were re-using AuthInstanceID when spawning child agents.
This caused GetWorkspaceAgentByInstanceID to return the most recently
created sub agent instead of the parent when the parent tried to refetch
its own manifest.

Fix by not reusing AuthInstanceID for sub agents, and updating
GetWorkspaceAgentByInstanceID to filter them out entirely.
2026-02-19 16:56:29 +00:00
Danielle Maywood 92a6d6c2c0 chore: remove unnecessary loop variable captures (#22180)
Since Go 1.22, the loop variable capture issue is resolved. Variables
declared by for loops are now per-iteration rather than per-loop, making
the 'v := v' pattern unnecessary.
2026-02-19 09:02:19 +00:00
Paweł Banaszewski 90c11f3386 feat: add client column to aibridge_interceptions table (#21839)
Adds `client` column to `aibridge_interceptions` table. It is set accordingly to what is passed from AI Bridge in `RecordInterception`.
Adds interception filtering by `client` value.

Depends on: https://github.com/coder/aibridge/pull/158
Updates aibridge library to include this change.

Fixes: https://github.com/coder/aibridge/issues/31
2026-02-17 15:43:02 +01:00
George K be94af386c chore(coderd/database): enforce workspace ACL JSON object constraints (#22019)
The constraints prevent faulty code from saving 'null' as JSON and breaking the `workspaces_expanded` view.
2026-02-10 16:17:29 -08:00
Mathias Fredriksson 96695edfed fix(coderd/database): correct task pending status logic (#21886)
Previously, tasks with pending provisioner jobs (not yet picked up)
were incorrectly reported as "initializing".

Refs #21887
2026-02-05 14:08:03 +02:00
Cian Johnston 91be688e39 chore(coderd/database): remove deprecated db2sdk.List(Lazy)? methods (#21902)
Removes deprecated methods db2sdk.List and db2sdk.ListLazy.
2026-02-03 17:52:07 +00:00
Mathias Fredriksson f75cbab6ce fix(coderd/database): prevent AcquireProvisionerJob from grabbing canceled jobs (#21852)
The AcquireProvisionerJob query only checked started_at IS NULL, allowing
it to acquire jobs that were canceled while pending (which have
completed_at set but started_at still NULL).

Added completed_at IS NULL check to the query to prevent this.

Also fixed JobCompleteBuilder.Do() in dbfake to set started_at when
completing jobs to match production behavior.

Fixes coder/internal#1323
2026-02-03 10:42:17 +02:00
Danielle Maywood 37aecda165 feat(coderd/provisionerdserver): insert sub agent resource (#21699)
Update provisionerdserver to handle the changes introduced to
provisionerd in https://github.com/coder/coder/pull/21602

We now create a relationship between `workspace_agent_devcontainers` and
`workspace_agents` with the newly created `subagent_id`.
2026-01-30 17:19:19 +00:00
Mathias Fredriksson 97e8a5b093 fix(coderd): allow agent auth during workspace shutdown (#21538)
Agents were losing authentication during workspace shutdown, causing
shutdown scripts to fail. The auth query required agents to belong to
the latest build, but during shutdown a `stop` build becomes latest while
the `start` build's agents are still running.

Modified the auth query to allow `start` build agents to authenticate
temporarily during `stop` execution. The query allows auth when:

- Agent's `start` build job succeeded
- Latest build is `stop` with `pending`/`running` job status
- Builds are adjacent (`stop` is `build_number + 1`)
- Template versions match

Auth closes once `stop` completes.

Renamed `GetWorkspaceAgentAndLatestBuildByAuthToken` to
`GetAuthenticatedWorkspaceAgentAndBuildByAuthToken` since it returns the
agent's build (not always latest) during shutdown.

Closes coder/internal#1249
Fixes #19467
2026-01-21 13:18:43 +00:00
George K 0712faef4f feat(enterprise): implement organization "disable workspace sharing" option (#21376)
Adds a per-organization setting to disable workspace sharing. When enabled,
all existing workspace ACLs in the organization are cleared and the workspace
ACL mutation API endpoints return `403 Forbidden`.

This complements the existing site-wide `--disable-workspace-sharing` flag by
providing more granular control at the organization level.

Closes https://github.com/coder/internal/issues/1073 (part 2)

---------

Co-authored-by: Steven Masley <Emyrk@users.noreply.github.com>
2026-01-14 09:47:50 -08:00
George K cc2efe9e1f feat(coderd/rbac): make organization-member a per-org system custom role (#21359)
Migrated the built-in organization-member role to DB storage so it can be customized per org.

Closes https://github.com/coder/internal/issues/1073 (part 1)
2026-01-12 18:19:19 -08:00
Spike Curtis bddb808b25 chore: arrange imports in a standard way (#21452)
Fixes all our Go file imports to match the preferred spec that we've _mostly_ been using. For example:

```
import (
	"context"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"golang.org/x/xerrors"
	"gopkg.in/natefinch/lumberjack.v2"

	"cdr.dev/slog/v3"
	"github.com/coder/coder/v2/codersdk/agentsdk"
	"github.com/coder/serpent"
)
```

3 groups: standard library, 3rd partly libs, Coder libs.

This PR makes the change across the codebase. The PR in the stack above modifies our formatting to maintain this state of affairs, and is a separate PR so it's possible to review that one in detail.
2026-01-08 15:24:11 +04:00
Spike Curtis 49b34a716a fix: fix slog to always use array of Fields (#21426)
Upgrades to slog v3 which includes a small, but backward incompatible API change to the acceptible call arguments when logging. This change allows us to verify via compile time type checking that arguments are correct and won't cause a panic, as was possible in slog v1, which this replaces (v2 was tagged but never used in coder/coder).

It also updates dependencies that also use slog and were updated.

I've left the `aibridge` dependency as a commit SHA, under the assumption that the team there (cc @pawbana @dannykopping ) will tag and update the dependency soon and on their own schedule.

Other dependencies, I pushed new tags.
2026-01-08 10:29:41 +04:00
Danielle Maywood f45a179181 test: move context to after db creation (#21224)
Closes  https://github.com/coder/internal/issues/1040

We move the context to just before it is used to avoid the scenario
where NewDB takes a while to spin up and runs up the context to the
deadline.
2025-12-11 21:51:16 +00:00
George K da71e546bb chore: fix test errors on newer debian-based systems due to deprecated TZ (#21115)
It appears on newer Debian systems `Canada/Newfoundland` TZ is not
present and `America/St_Johns` should be used instead. Coder tests use a
docker PG image where `Canada/Newfoundland` is still supported:

```
$ docker run --rm -it us-docker.pkg.dev/coder-v2-images-public/public/postgres:17 bash
root@ca99e82721dc:/# ls -l /usr/share/zoneinfo/Canada/Newfoundland
lrwxrwxrwx 1 root root 19 Mar 26  2025 /usr/share/zoneinfo/Canada/Newfoundland -> ../America/St_Johns
```
However, if a local PG instance is running on a Debian Trixie host,
coder test will use it and error out due to the zone being unavailable:

```
$ docker run --rm -it debian:trixie bash
root@f285092767e4:/# ls -l /usr/share/zoneinfo/Canada/Newfoundland
ls: cannot access '/usr/share/zoneinfo/Canada/Newfoundland': No such file or directory
root@f285092767e4:/# ls -l /usr/share/zoneinfo/America/St_Johns
-rw-r--r-- 1 root root 3655 Aug 24 20:12 /usr/share/zoneinfo/America/St_Johns
```

... which causes the tests to error out:

```
$ go test ./enterprise/coderd
--- FAIL: TestWorkspaceTemplateParamsChange (0.13s)
    workspaces_test.go:3097: TestWorkspaceTagsTerraform: using cached terraform providers
    workspaces_test.go:3097: Set TF_CLI_CONFIG_FILE=/home/geo/.cache/coderv2-test/terraform_workspace_tags_test/a28ed341dee8/terraform.rc
    coderdenttest.go:84:
        	Error Trace:	/home/geo/coder/coderd/database/dbtestutil/db.go:161
        	            				/home/geo/coder/coderd/database/dbtestutil/db.go:122
        	            				/home/geo/coder/coderd/coderdtest/coderdtest.go:270
        	            				/home/geo/coder/enterprise/coderd/coderdenttest/coderdenttest.go:105
        	            				/home/geo/coder/enterprise/coderd/coderdenttest/coderdenttest.go:84
        	            				/home/geo/coder/enterprise/coderd/coderdenttest/coderdenttest.go:84
        	            				/home/geo/coder/enterprise/coderd/workspaces_test.go:3103
        	Error:      	Received unexpected error:
        	            	pq: invalid value for parameter "TimeZone": "Canada/Newfoundland"
        	Test:       	TestWorkspaceTemplateParamsChange
        	Messages:   	failed to set timezone for database
...
```

This commit replaces the problematic TZ with the canonical one.
2025-12-10 08:09:13 -08:00
Steven Masley cefe07d074 feat: purge expired api keys in dbpurge (#20863)
closes https://github.com/coder/coder/issues/19889

This is in response to a migration in v2.27 that takes very long on deployments with large `api_key` tables.
2025-11-24 10:24:32 -06:00
Mathias Fredriksson 1483fd11ff fix(coderd/database): improve task status in tasks_with_status view (#20683)
This change restructures the `tasks_with_status` view query to:

- Improve debuggability by adding a `status_debug` column to better
understand the outcome
- Reduce clutter from `bool_or`, `bool_and` which are aggregate
functions that did not actually have serve a purpose (each join is 0-1
rows)
- Improve agent lifecycle state coverage, `start_timeout` and
`start_error` were omitted
- These states are easy to trigger even in a perfectly functioning
workspace/task so we now rely on app health to report whether or not
there was an issue
- Mark canceling and canceled workspace build jobs as error state
- Agent stop states were implicitly `unknown`, now there are explicit (I
initially considered `error`, could go either way)
2025-11-14 19:52:26 +02:00
Mathias Fredriksson a1fa58ac17 fix: update dbgen and dbfake task creation and toolsdk test fixtures (#20508)
Depends on #20506
Fixes coder/internal#1103
2025-10-28 14:15:58 +02:00
Paweł Banaszewski 50ba223aa1 feat: add db query for setting interception ended_at field (#20437)
Adds UpdateAIBridgeInterceptionEnded query to mark interceptions as
done.
Needed for https://github.com/coder/internal/issues/1051
2025-10-27 09:51:37 +01:00
Mathias Fredriksson 5c802c2627 feat(coderd): use task data model when creating a new task (#20275)
Updates coder/internal#976
2025-10-23 19:12:09 +03:00
Hugo Dutka e62c5db678 chore: remove references to dbtestutil.WillUsePostgres (#20436)
Addresses https://github.com/coder/internal/issues/758.

This PR only cleans up dead code, it makes no changes to test logic.
2025-10-23 14:24:54 +02:00
Cian Johnston dc6e50d6b7 feat(coderd/telemetry): add telemetry for database Tasks (#20279)
Adds Tasks to telemetry snapshots

Co-authored-by: Mathias Fredriksson <mafredri@gmail.com>
2025-10-17 10:48:56 +01:00
Mathias Fredriksson 82945cfb16 fix(coderd/database): add missing columns to tasks with status (#20311)
Updates coder/internal#976
2025-10-15 16:34:33 +00:00
Cian Johnston 9f229370e7 feat(coderd/database): add ListTasks query (#20282)
Relates to https://github.com/coder/internal/issues/981

Adds a `ListTasks` query that allows filtering by OwnerID and OrganizationID.
2025-10-14 17:33:30 +01:00
Mathias Fredriksson 5dc57da6b4 fix(coderd/database): ensure task name uniqueness (#20236)
This change ensures task names are unique per user the same way we do
for workspaces. This ensures we don't create tasks that are impossible
to start due to another task being named the same creating a workspace
name conflict.

Updates coder/internal#948
Supersedes coder/coder#20212
2025-10-13 12:42:38 +03:00
Mathias Fredriksson 952c69f412 feat(coderd/database): add task status and status view (#20235)
This change updates the `task_workspace_apps` table structure for
improved linking to workspace builds and adds queries to manage tasks
and a view to expose task status.

Updates coder/internal#948
Supersedes coder/coder#20212
Supersedes coder/coder#19773
2025-10-13 12:25:58 +03:00
Dean Sheather 39bf3ba628 chore: replace GetManagedAgentCount query with aggregate table (#19636)
- Removes GetManagedAgentCount query
- Adds new table `usage_events_daily` which stores aggregated usage
events by the type and UTC day
- Adds trigger to update the values in this table when a new row is
inserted into `usage_events`
- Adds a migration that adds `usage_events_daily` rows for existing data
in `usage_events`
- Adds tests for the trigger
- Adds tests for the backfill query in the migration

Since the `usage_events` table is unreleased currently, this migration
will do nothing on real deployments and will only affect preview
deployments such as dogfood.

Closes https://github.com/coder/internal/issues/943
2025-08-30 03:39:37 +10:00
Steven Masley ef0d74fb75 chore: improve performance of 'GetLatestWorkspaceBuildsByWorkspaceIDs' (#19452)
Closes https://github.com/coder/internal/issues/716

This prevents a scan over the entire `workspace_build` table by removing
a `join`. This is still imperfect as we are still scanning over the
number of builds for the workspaces in the arguments. Ideally we would
have some index or something precomputed. Then we could skip scanning
over the builds for the correct workspaces that are not the latest.
2025-08-26 09:26:11 -05:00
Rafael Rodriguez ad5e6785f4 feat: add filtering options to provisioners list (#19378)
## Summary

In this pull request we're adding support for additional filtering
options to the `provisioners list` CLI command and the
`/provisionerdaemons` API endpoint.

Resolves: https://github.com/coder/coder/issues/18783

### Changes

#### Added CLI Options

- `--show-offline`: When this option is provided, all provisioner
daemons will be returned. This means that when `--show-offline` is not
provided only `idle` and `busy` provisioner daemons will be returned.
- `--status=<list_of_statuses>`: When this option is provided with a
comma-separated list of valid statuses (`idle`, `busy`, or `offline`)
only provisioner daemons that have these statuses will be returned.
- `--max-age=<duration>`: When this option is provided with a valid
duration value (e.g., `24h`, `30s`) only provisioner daemons with a
`last_seen_at` timestamp within the provided max age will be returned.

#### Query Params

- `?offline=true`: Include offline provisioner daemons in the results.
Offline provisioner daemons will be excluded if `?offline=false` or if
offline is not provided.
- `?status=<list_of_statuses>`: Include provisioner daemons with the
specified statuses.
- `?max_age=<duration>`: Include provisioner daemons with a
`last_seen_at` timestamp within the max age duration.

#### Frontend

- Since offline provisioners will not be returned by default anymore
(`--show-offline` has to be provided to see them), a checkbox was added
to the provisioners list page to allow for offline provisioners to be
displayed
- A revamp of the provisioners page will be done in:
https://github.com/coder/coder/issues/17156, this checkbox change was
just added to maintain currently functionality with the backend updates

Current provisioners page (without checkbox)

<img width="1329" height="574" alt="Screenshot 2025-08-20 at 10 51
00 AM"
src="https://github.com/user-attachments/assets/77b73650-0b62-44f0-a77f-acbe5710809f"
/>

Provisioners page with checkbox (unchecked)

<img width="1314" height="626" alt="Screenshot 2025-08-20 at 10 48
40 AM"
src="https://github.com/user-attachments/assets/7ba164ad-6d3f-417b-bd39-338c0161b145"
/>

Provisioner page with checkbox (checked) and URL updated with query
parameters

<img width="1306" height="597" alt="Screenshot 2025-08-20 at 10 50
14 AM"
src="https://github.com/user-attachments/assets/e78d0986-bbf8-491b-9d56-b682973237a0"
/>

### Show Offline vs Offline Status

To list offline provisioner daemons, users can either:

1. Include the `--show-offline` option

OR

2. Include `offline` in the list of values provided to the `--status`
option
2025-08-21 16:03:34 -04:00
Callum Styan bcdade7d8c fix: add database constraint to enforce minimum username length (#19453)
Username length and format, via regex, are already enforced at the
application layer, but we have some code paths with database queries
where we could optimize away many of the DB query calls if we could be
sure at the database level that the username is never an empty string.

For example: https://github.com/coder/coder/pull/19395

---------

Signed-off-by: Callum Styan <callumstyan@gmail.com>
2025-08-21 07:56:41 -07:00
Ethan 791d39c261 test(coderd/database): use seperate context for subtests to fix flake (#19330)
Fixes flakes like
https://github.com/coder/coder/actions/runs/16927282256/job/47965470039

https://coder.com/blog/go-testing-contexts-and-t-parallel

...I'm going to take a stab at turning this into a lint rule. I think
it's possible by just reading the AST?
2025-08-13 14:45:35 +10:00
Yevhenii Shcherbina c65996a041 feat: add user_secrets table (#19162)
Closes https://github.com/coder/internal/issues/780

## Summary of changes:
- added `user_secrets` table
- `user_secrets` table contains `env_name` and `file_path` fields which
are not used at the moment, but will be used in later PRs
- `user_secrets` table doesn't contain `value_key_id`, I will add it in
a separate migration in a dbcrypt PR
- on one hand I don't want to add fields which are not used (because
it's a risk smth may change in implementation later), on the other hand
I don't want to add too many migrations for user secrets table
- added unique sql indexes
- added sql queries for CRUD operations on user-secrets
- introduced new `ResourceUserSecret` resource
- basic unit-tests for CRUD ops and authorization behavior
- Role updates:
  - owner:
    - remove `ResourceUserSecret` from site-wide perms
    - add `ResourceUserSecret` to user-wide perms
   - orgAdmin
- remove `ResourceUserSecret` from org-wide perms; seems it's not
strictly required, because `ResourceUserSecret` is not tied to
organization in dbauthz wrappers?
   - memberRole
- no need to change memberRole because it implicitly has access to
user-secrets thanks to the `allPermsExcept`
   - is it enough changes to roles?
   
Main questions:
- [ ] We will have 2 migrations for user-secrets:
  - initial migration (in current PR)
  - adding `value_key_id` in dbcrypt PR
  - is this approach reasonable?
- [ ] Are changes to roles's permissions are correct?
- [ ] Are changes in roles_test.go are correct?

---------

Co-authored-by: Steven Masley <Emyrk@users.noreply.github.com>
2025-08-07 15:58:59 -04:00
Dean Sheather dc598856e3 chore: improve build deadline code (#19203)
- Adds/improves a lot of comments to make the autostop calculation code
clearer
- Changes the behavior of the enterprise template schedule store to
match the behavior of the workspace TTL endpoint when the new TTL is
zero
- Fixes a bug in the workspace TTL endpoint where it could unset the
build deadline, even though a max_deadline was specified
- Adds a new constraint to the workspace_builds table that enforces the
deadline is non-zero and below the max_deadline if it is set
- Adds CHECK constraint enum generation to scripts/dbgen, used for
testing the above constraint
- Adds Dean and Danielle as CODEOWNERS for the autostop calculation code
2025-08-07 11:00:31 +10:00
Ethan 5c1bf1d46c test(coderd/database): use seperate context for subtests to fix flake (#19029)
Fixes flakes like https://github.com/coder/coder/actions/runs/16487670478/job/46615625141, caused by the issue described in https://coder.com/blog/go-testing-contexts-and-t-parallel

It'd be cool if we could lint for this? That a context from an outer test isn't used in a subtest if that subtest calls `t.Parallel`.
2025-07-24 20:07:54 +10:00
Cian Johnston c4b69bbe63 fix: prioritise human-initiated builds over prebuilds (#18933)
Continues from https://github.com/coder/coder/pull/18882

- Reverts extraneous changes
- Adds explicit `ORDER BY initiator_id = $PREBUILDS_USER_ID` to
`AcquireProvisionerJob`
- Improves test added for above PR

---------

Co-authored-by: blink-so[bot] <211532188+blink-so[bot]@users.noreply.github.com>
Co-authored-by: kylecarbs <7122116+kylecarbs@users.noreply.github.com>
2025-07-22 13:03:50 +01:00
Ethan 7c077d39c5 chore: populate connectionlog count using a separate query (#18629)
This is the third PR for moving connection events out of the audit log.

This PR populates `count` on `ConnectionLogResponse` using a separate query, to preemptively mitigate the issue described in #17689. It's structurally identical to a portion of https://github.com/coder/coder/pull/18600, but for the connection log instead of the audit log.
       
Future PRs:
- Implement a table in the Web UI for viewing connection logs.
- Write a query to delete old events from the audit log, call it from dbpurge.
- Write documentation for the endpoint / feature
2025-07-15 15:03:30 +10:00
Ethan 7a339a1ffe feat: add connectionlogs API (#18628)
This is the second PR for moving connection events out of the audit log.

This PR:
- Adds the `/api/v2/connectionlog` endpoint
- Adds filtering for `GetAuthorizedConnectionLogsOffset` and thus the endpoint. 
There's quite a few, but I was aiming for feature parity with the audit log.
  1. `organization:<id|name>`
  2. `workspace_owner:<username>`
  3. `workspace_owner_email:<email>`
  4. `type:<ssh|vscode|jetbrains|reconnecting_pty|workspace_app|port_forwarding>`
  5. `username:<username>` 
     - Only includes web-based connection events (workspace apps, web port forwarding) as only those include user metadata.
  6. `user_email:<email>`
  7. `connected_after:<time>`
  8. `connected_before:<time>`
  9. `workspace_id:<id>`
  10. `connection_id:<id>`
      - If you have one snapshot of the connection log, and some sessions are ongoing in that snapshot, you could use this filter to check if they've been closed since.
  11. `status:<connected|disconnected>`
       - If `connected` only sessions with a null `close_time` are returned, if `disconnected`, only those with a non-null `close_time`. If filter is omitted, both are returned.
       
Future PRs:
- Populate `count` on `ConnectionLogResponse` using a seperate query (to preemptively mitigate the issue described in #17689)
- Implement a table in the Web UI for viewing connection logs.
- Write a query to delete old events from the audit log, call it from dbpurge.
- Write documentation for the endpoint / feature (including these filters)
2025-07-15 14:55:34 +10:00
Ethan 08e17a07fc chore!: route connection logs to new table (#18340)
### Breaking Change (changelog note):
> User connections to workspaces, and the opening of workspace apps or ports will no longer create entries in the audit log. Those events will now be included in the 'Connection Log'.
Please see the 'Connection Log' page in the dashboard, and the Connection Log [documentation](https://coder.com/docs/admin/monitoring/connection-logs) for details. Those with permission to view the Audit Log will also be able to view the Connection Log. The new Connection Log has the same licensing restrictions as the Audit Log, and requires a Premium Coder deployment.

### Context

This is the first PR of a few for moving connection events out of the audit log, and into a new database table and web UI page called the 'Connection Log'.

This PR:
- Creates the new table
- Adds and tests queries for inserting and reading, including reading with an RBAC filter.
- Implements the corresponding RBAC changes, such that anyone who can view the audit log can read from the table
- Implements, under the enterprise package, a `ConnectionLogger` abstraction to replace the `Auditor` abstraction for these logs. (No-op'd in AGPL, like the `Auditor`)
- Routes SSH connection and Workspace App events into the new `ConnectionLogger`
- Updates all existing tests to check the values of the `ConnectionLogger` instead of the `Auditor`.

Future PRs:
- Add filtering to the query
- Add an enterprise endpoint to query the new table
- Write a query to delete old events from the audit log, call it from dbpurge.
- Implement a table in the Web UI for viewing connection logs.


> [!NOTE]
> The PRs in this stack obviously won't be (completely) atomic. Whilst they'll each pass CI, the stack is designed to be merged all at once. I'm splitting them up for the sake of those reviewing, and so changes can be reviewed as early as possible.  Despite this, it's really hard to make this PR any smaller than it already is. I'll be keeping it in draft until it's actually ready to merge.
2025-07-15 14:36:06 +10:00
Cian Johnston 0367dbac43 chore: optimize GetPrebuiltWorkspaces query (#18717)
* Adds GetRunningPrebuiltWorkspacesOptimized query
* Runs both original and updated query side-by-side and logs diffs
2025-07-09 11:30:42 +01:00
Hugo Dutka 3c2f3d640b chore: remove dbmem (#18803)
Remove the in-memory database. Addresses #15109.
2025-07-09 09:46:31 +02:00