fixes https://github.com/coder/internal/issues/1196
The above issue exposes two different bugs in Coder.
In the agent, there is a race where if the agent is closed while starting up networking, it will erroneously disconnect from Coderd, which delays or breaks writing final status and logs.
In Coderd, there is a bug where we don't properly record the latest agent disconnection time if the agent had previously disconnected. This causes us to report the agent status as "Connected" even after it has disconnected up until the inactivity timeout fires.
This PR fixes both issues.
It also slightly reworks when we send workspace updates based on connection and disconnection. Previously we would send two updates when the agent connected in certain circumstances, even though the status would be the same in both (only times changed). Now we universally only send one on connect, and then another on disconnect.
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore <dependency name> major version` will close this
group update PR and stop Dependabot creating any more for the specific
dependency's major version (unless you unignore this specific
dependency's major version or upgrade to it yourself)
- `@dependabot ignore <dependency name> minor version` will close this
group update PR and stop Dependabot creating any more for the specific
dependency's minor version (unless you unignore this specific
dependency's minor version or upgrade to it yourself)
- `@dependabot ignore <dependency name>` will close this group update PR
and stop Dependabot creating any more for the specific dependency
(unless you unignore this specific dependency or upgrade to it yourself)
- `@dependabot unignore <dependency name>` will remove all of the ignore
conditions of the specified dependency
- `@dependabot unignore <dependency name> <ignore condition>` will
remove the ignore condition of the specified dependency and ignore
conditions
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Closes#20399
To summarize the original commit messages:
- Do not log stats to the database.
- Return errors on the insight endpoints.
- Update the frontend to show those errors.
- Also fixes an issue with getting the user status count via codersdk,
since I added a test to ensure it was not disabled by this flag and it
was sending the wrong payload.
Bumps [next](https://github.com/vercel/next.js) from 15.5.8 to 15.5.9.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/vercel/next.js/releases">next's
releases</a>.</em></p>
<blockquote>
<h2>v15.5.9</h2>
<p>Please see the <a
href="https://nextjs.org/blog/security-update-2025-12-11">Next.js
Security Update</a> for information about this security patch.</p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/vercel/next.js/commit/c5de33e93ccccaf3bee60cf50603e2152f9886e1"><code>c5de33e</code></a>
v15.5.9</li>
<li><a
href="https://github.com/vercel/next.js/commit/dd233994aeb24e906cdb9aedca5447cdef401792"><code>dd23399</code></a>
Backport <a
href="https://redirect.github.com/facebook/react/issues/35351">facebook/react#35351</a>
for 15.5.8 (<a
href="https://redirect.github.com/vercel/next.js/issues/87086">#87086</a>)</li>
<li>See full diff in <a
href="https://github.com/vercel/next.js/compare/v15.5.8...v15.5.9">compare
view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the
[Security Alerts page](https://github.com/coder/coder/network/alerts).
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
This pull request adds a new GitHub Actions workflow,
`code-review.yaml`, to automate AI-powered code review for pull
requests. The workflow creates a Coder Task that uses an AI agent to
analyze PR changes, review code quality, identify issues, and post
committable suggestions directly on the PR. The workflow can be
triggered by adding the "code-review" label or via manual dispatch.
Key additions and features:
**AI-Powered Code Review Workflow**
* Introduces `.github/workflows/code-review.yaml`, a comprehensive
workflow that triggers on PR labeling or manual dispatch to initiate an
AI-driven code review using Coder Tasks.
* The workflow includes steps to determine PR context, extract
repository info, build a detailed code review prompt with clear
instructions and examples, and submit the review as inline suggestions
using GitHub's native suggestion syntax.
Previously the GetTemplateVersionVariables query did not sort output,
relying on PostgreSQL on-disk ordering which is undeterministic.
Variables are now sorted by name because there is no alternative for
ordering.
Tests were adjusted to accommodate the new ordering, previously they
relied on data being written to disk in insert order.
Closes https://github.com/coder/internal/issues/1040
We move the context to just before it is used to avoid the scenario
where NewDB takes a while to spin up and runs up the context to the
deadline.
for #19974
`ShareIcon`, which returns a MUI tooltip, had these problems:
1. There was no Storybook story which rendered the "Open external URL"
tooltip
2. `AppLink` renders a button. If it has tooltip text, this button is
also a Radix `TooltipTrigger`. Since `ShareIcon` is rendered as a child
of the `AppLink` button, `ShareIcon`'s tooltip (a nested tooltip) was
never appearing. I did try turning `ShareIcon` into a Radix tooltip as
well, but I still couldn't get the nested tooltip's text to appear, and
nested tooltips are not very accessible. AFAIK there's no way to focus a
child element of a button.
I've deleted the separate `ShareIcon` component and moved `ShareIcon`'s
tooltip text to the bottom of `AppLink`'s tooltip:
## before
<img width="175" height="121" alt="image"
src="https://github.com/user-attachments/assets/ad17927e-c3d1-499b-83f8-a5832b777305"
/>
^The `UsersIcon` on the right is a MUI tooltip trigger, but it can't
receive focus. It's supposed to show the text "Shared with all
authenticated users" when focused
## after
<img width="228" height="121" alt="image"
src="https://github.com/user-attachments/assets/adc202c1-57cd-4a80-8f94-f7f32897d286"
/>
for #19974
The MUI tooltip inside `PaginationNavButton` was a controlled component.
This + the stateful logic inside `PaginationNavButtonCore` meant that
`showDisabledMessage` would never be set to true. I.e., the "You are
already on the first page" tooltip if on the first page and the "You are
already on the last page" tooltip if on the last page would never show
up.
The `PaginationNavButton`s gets disabled if we're at either the
first/last page, and disabled buttons can't receive focus, so there's no
way to open the MUI tooltips with keyboard navigation.
Removing the MUI tooltip + related props from `PaginationNavButton` has
no effect on my screen reader UX with macOS VoiceOver; it's entirely
unchanged
Since the failing test logs are gone, we can only guess at what went
wrong. Given our parallel test-suite, and that tests typically run slow
on Windows, it seems reasonable that the context timed out due to a
single context being responsbile for setup and two command executions.
This change fixes the issue by updating the context usage, if this flake
ever resurfaces, we can re-investigate.
Fixescoder/internal#770
## Summary
This adds configurable overload protection to the AI Bridge daemon to
prevent the server from being overwhelmed during periods of high load.
Partially addresses coder/internal#1153 (rate limits and concurrency
control; circuit breakers are deferred to a follow-up).
## New Configuration Options
| Option | Environment Variable | Description | Default |
|--------|---------------------|-------------|---------|
| `--aibridge-max-concurrency` | `CODER_AIBRIDGE_MAX_CONCURRENCY` |
Maximum number of concurrent AI Bridge requests. Set to 0 to disable
(unlimited). | `0` |
| `--aibridge-rate-limit` | `CODER_AIBRIDGE_RATE_LIMIT` | Maximum number
of AI Bridge requests per second. Set to 0 to disable rate limiting. |
`0` |
## Behavior
When limits are exceeded:
- **Concurrency limit**: Returns HTTP `503 Service Unavailable` with
message "AI Bridge is currently at capacity. Please try again later."
- **Rate limit**: Returns HTTP `429 Too Many Requests` with
`Retry-After` header.
Both protections are optional and disabled by default (0 values).
## Implementation
The overload protection is implemented as reusable middleware in
`coderd/httpmw/ratelimit.go`:
1. **`RateLimitByAuthToken`**: Per-user rate limiting that uses
`APITokenFromRequest` to extract the authentication token, with fallback
to `X-Api-Key` header for AI provider compatibility (e.g., Anthropic).
Falls back to IP-based rate limiting if no token is present. Includes
`Retry-After` header for backpressure signaling.
2. **`ConcurrencyLimit`**: Uses an atomic counter to track in-flight
requests and reject when at capacity.
The middleware is applied in `enterprise/coderd/aibridge.go` via
`r.Group` in the following order:
1. Concurrency check (faster rejection for load shedding)
2. Rate limit check
**Note**: Rate limiting currently applies to all AI Bridge requests,
including pass-through requests. Ideally only actual interceptions
should count, but this would require changes in the aibridge library.
## Testing
Added comprehensive tests for:
- Rate limiting by auth token (Bearer token, X-Api-Key, no token
fallback to IP)
- Different tokens not rate limited against each other
- Disabled when limit is zero
- Retry-After header is set on 429 responses
- Concurrency limiting (allows within limit, rejects over limit,
disabled when zero)
Closes https://github.com/coder/internal/issues/1178
I verified the fix works by adding a `time.Sleep(100 *time.Millisecond)`
between the `CreateWorkspaceBuild` and`CancelWorkspaceBuild`
calls. Adding this reliably triggered the flake, and when I added the fix
the flake stopped happening.
When working on PRs, Claude Code was sometimes force pushing to
branches. This adds simple git workflow guidelines that emphasize proper
branch checkout and avoiding force pushes.
## Changes
Added git workflow section to `CLAUDE.md`, `AGENTS.md`, and
`.claude/docs/WORKFLOWS.md` with:
- Instructions to fetch, checkout, and pull before working on PR
branches
- Note to avoid `git push --force` unless explicitly requested
## Examples of force push behavior
Observed in recent PRs:
- PR #21148: 7 commits including merge commit from iterative changes
- PR #21150: 9 commits with multiple documentation iterations
- PR #21182: 4 commits with iterative fixes
- Force update on `feat/add-tasks-template-flag` branch:
`9bf7980b9...f98cf44f7`
The guidelines now make it clear to check out branches properly and push
normally.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
This PR piggy backs on the agent API cached workspace added in an earlier PR to provide a fast path for avoiding `GetWorkspaceByAgentID` calls in dbauthz's `GetWorkspaceAgentByID`. This query is not the most expensive, but has a significant call volume at ~16 million calls per week.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
Implements cmd+enter (Mac) / ctrl+enter (Windows/Linux) keyboard
shortcut for submitting tasks on the tasks page. Regular enter key still
creates new lines as expected.
Fixes#21179🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Added specific icons for common file types in the template file tree:
- **.sh files**: TerminalIcon (terminal/shell scripts)
- **.json files**: BracesIcon (JSON data files)
- **.yaml/.yml files**: FileCodeIcon (YAML configuration files)
<img width="260" height="290" alt="image"
src="https://github.com/user-attachments/assets/a470f4bc-fc2c-4e2d-8067-a9fbbfe32e42"
/>
These icons help users quickly identify file types at a glance in the
template editor.
It appears on newer Debian systems `Canada/Newfoundland` TZ is not
present and `America/St_Johns` should be used instead. Coder tests use a
docker PG image where `Canada/Newfoundland` is still supported:
```
$ docker run --rm -it us-docker.pkg.dev/coder-v2-images-public/public/postgres:17 bash
root@ca99e82721dc:/# ls -l /usr/share/zoneinfo/Canada/Newfoundland
lrwxrwxrwx 1 root root 19 Mar 26 2025 /usr/share/zoneinfo/Canada/Newfoundland -> ../America/St_Johns
```
However, if a local PG instance is running on a Debian Trixie host,
coder test will use it and error out due to the zone being unavailable:
```
$ docker run --rm -it debian:trixie bash
root@f285092767e4:/# ls -l /usr/share/zoneinfo/Canada/Newfoundland
ls: cannot access '/usr/share/zoneinfo/Canada/Newfoundland': No such file or directory
root@f285092767e4:/# ls -l /usr/share/zoneinfo/America/St_Johns
-rw-r--r-- 1 root root 3655 Aug 24 20:12 /usr/share/zoneinfo/America/St_Johns
```
... which causes the tests to error out:
```
$ go test ./enterprise/coderd
--- FAIL: TestWorkspaceTemplateParamsChange (0.13s)
workspaces_test.go:3097: TestWorkspaceTagsTerraform: using cached terraform providers
workspaces_test.go:3097: Set TF_CLI_CONFIG_FILE=/home/geo/.cache/coderv2-test/terraform_workspace_tags_test/a28ed341dee8/terraform.rc
coderdenttest.go:84:
Error Trace: /home/geo/coder/coderd/database/dbtestutil/db.go:161
/home/geo/coder/coderd/database/dbtestutil/db.go:122
/home/geo/coder/coderd/coderdtest/coderdtest.go:270
/home/geo/coder/enterprise/coderd/coderdenttest/coderdenttest.go:105
/home/geo/coder/enterprise/coderd/coderdenttest/coderdenttest.go:84
/home/geo/coder/enterprise/coderd/coderdenttest/coderdenttest.go:84
/home/geo/coder/enterprise/coderd/workspaces_test.go:3103
Error: Received unexpected error:
pq: invalid value for parameter "TimeZone": "Canada/Newfoundland"
Test: TestWorkspaceTemplateParamsChange
Messages: failed to set timezone for database
...
```
This commit replaces the problematic TZ with the canonical one.
Fixes#21145
The browser tab title for tasks was showing the machine-readable name
(e.g., `kyle/my-workspace.main`) instead of the user-friendly display
name (e.g., `Create Documentation`).
Changed `site/src/pages/TaskPage/TaskPage.tsx` to use
`task.display_name` for the page title. The `display_name` field is
always set by the backend (NOT NULL constraint, auto-generated if
empty), so no fallback is needed.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
This PR piggy backs on the agent API cached workspace added in earlier PRs to provide a fast path for avoiding `GetWorkspaceByID` calls in `GetLatestWorkspaceBuildByWorkspaceID` via injection of the workspaces RBAC object into the context. We can do this from the `agentConnectionMonitor` easily since we already cache the workspace.
---------
Signed-off-by: Callum Styan <callumstyan@gmail.com>
In this PR we're optimizing the `GetTemplateAppInsightsByTemplate` query
by pre-filtering out apps which do not have an active session during the
start/end time window.
---------
Signed-off-by: Callum Styan <callumstyan@gmail.com>
Tracking issue here: https://github.com/coder/internal/issues/1009
To summarize, the current version of this query selects from
`workspace_agent_stats` twice. The expensive portion of this query is
the bitmap heap scan we have to do for each of these selects. We can
easily cut the cost of this query by 40-50% by cutting this down to a
single select, and using those rows for both sets of calculations.
Eliminating the heap scan itself would require a follow up PR to
introduce a new index. Blink helped with the rewrite of the query.
The current plan looks like this:
```
Nested Loop (cost=6101.64..6101.69 rows=1 width=64) (actual time=11.782..11.787 rows=1 loops=1)
-> Aggregate (cost=2996.17..2996.19 rows=1 width=32) (actual time=3.356..3.357 rows=1 loops=1)
-> Bitmap Heap Scan on workspace_agent_stats (cost=54.80..2992.86 rows=440 width=24) (actu
al time=0.346..2.927 rows=818 loops=1)
Recheck Cond: (created_at > (now() - '00:15:00'::interval))
Filter: (connection_median_latency_ms > '0'::double precision)
Rows Removed by Filter: 1070
Heap Blocks: exact=486
-> Bitmap Index Scan on idx_agent_stats_created_at (cost=0.00..54.69 rows=1368 width
=0) (actual time=0.241..0.241 rows=1888 loops=1)
Index Cond: (created_at > (now() - '00:15:00'::interval))
-> Aggregate (cost=3105.47..3105.49 rows=1 width=32) (actual time=8.418..8.420 rows=1 loops=1)
-> Subquery Scan on a (cost=3060.95..3105.39 rows=7 width=32) (actual time=7.851..8.394 ro
ws=63 loops=1)
Filter: (a.rn = 1)
-> WindowAgg (cost=3060.95..3088.29 rows=1368 width=209) (actual time=7.850..8.382 r
ows=63 loops=1)
Run Condition: (row_number() OVER (?) <= 1)
-> Sort (cost=3060.93..3064.35 rows=1368 width=56) (actual time=7.836..8.036 r
ows=1888 loops=1)
Sort Key: workspace_agent_stats_1.agent_id, workspace_agent_stats_1.create
d_at DESC
Sort Method: quicksort Memory: 181kB
-> Bitmap Heap Scan on workspace_agent_stats workspace_agent_stats_1 (co
st=55.03..2989.67 rows=1368 width=56) (actual time=0.388..2.096 rows=1888 loops=1)
Recheck Cond: (created_at > (now() - '00:15:00'::interval))
Heap Blocks: exact=486
-> Bitmap Index Scan on idx_agent_stats_created_at (cost=0.00..54.
69 rows=1368 width=0) (actual time=0.295..0.295 rows=1888 loops=1)
Index Cond: (created_at > (now() - '00:15:00'::interval))
Planning Time: 2.350 ms
Execution Time: 13.152 ms
(24 rows)
```
The new plan looks like this
```
Aggregate (cost=2966.96..2966.98 rows=1 width=64) (actual time=3.812..3.814 rows=1 loops=1)
-> WindowAgg (cost=2891.96..2916.94 rows=1250 width=88) (actual time=2.696..3.412 rows=1890 loop
s=1)
-> Sort (cost=2891.94..2895.06 rows=1250 width=80) (actual time=2.686..2.780 rows=1890 loo
ps=1)
Sort Key: workspace_agent_stats.agent_id, workspace_agent_stats.created_at DESC
Sort Method: quicksort Memory: 226kB
-> Bitmap Heap Scan on workspace_agent_stats (cost=50.11..2827.64 rows=1250 width=80
) (actual time=0.218..1.551 rows=1890 loops=1)
Recheck Cond: (created_at > (now() - '00:15:00'::interval))
Heap Blocks: exact=474
-> Bitmap Index Scan on idx_agent_stats_created_at (cost=0.00..49.80 rows=1250
width=0) (actual time=0.146..0.147 rows=1890 loops=1)
Index Cond: (created_at > (now() - '00:15:00'::interval))
Planning Time: 0.534 ms
Execution Time: 3.969 ms
(12 rows)
```
If we compare the results of the query they're similar enough that any
differences can be attributed to slightly different timestamps for
`now()` in the version of the query I am using to generate results for
comparison:
```
workspace_rx_bytes | workspace_tx_bytes | workspace_connection_latency_50 | workspace_connection_latency_95 | session_count_vscode | session_count_ssh | session_count_jetbrains | session_count_reconnecting_pty
--------------------+--------------------+---------------------------------+---------------------------------+----------------------+-------------------+-------------------------+--------------------------------
15263563 | 74555854 | 47.933 | 250.5522 | 239 | 59 | 3 | 3
(1 row)
workspace_rx_bytes | workspace_tx_bytes | workspace_connection_latency_50 | workspace_connection_latency_95 | session_count_vscode | session_count_ssh | session_count_jetbrains | session_count_reconnecting_pty
--------------------+--------------------+---------------------------------+---------------------------------+----------------------+-------------------+-------------------------+--------------------------------
15295819 | 74598410 | 47.933 | 250.5522 | 239 | 59 | 3 | 3
```
---------
Signed-off-by: Callum Styan <callumstyan@gmail.com>
for #19974
Continuing the train of thought from
https://github.com/coder/coder/pull/20849#issuecomment-3560666271: it's
probably better to do away with a custom tooltip component that's only
used in `ResourcesChart`/`ScriptsChart`/`StagesChart` and only slightly
differs from our base tooltip
Add screenshots to the dev containers user guide:
- Running dev containers with sub-agents (index.md, working-with-dev-containers.md)
- Discovered dev containers with Start button (index.md)
- Outdated status with rebuild option (working-with-dev-containers.md)
- Display apps disabled (customizing-dev-containers.md)
Also deletes the outdated devcontainer-agent-ports.png.
Refs #21157