Use typed atomics (atomic.Int64, atomic.Int32, etc.) in test files to prevent
mixing atomic and non-atomic access on the same value, guarantee 64-bit
alignment on 32-bit platforms, and provide a cleaner API.
This PR does three things:
- Exports derp expvars to the pprof endpoint
- Exports the expvar metrics as prometheus metrics in both coderd and
wsproxy
- Updates our tailscale to a fix I also had to make to avoid a data race
condition
I generated this with mux but I also manually tested that the metrics
were getting properly emitted
- Previously all tests were sharing the global http.Transport meaning on
`Close` it would close connections presumed to be idle for other tests.
fixes https://github.com/coder/internal/issues/112
Since Go 1.22, the loop variable capture issue is resolved. Variables
declared by for loops are now per-iteration rather than per-loop, making
the 'v := v' pattern unnecessary.
Fixes all our Go file imports to match the preferred spec that we've _mostly_ been using. For example:
```
import (
"context"
"time"
"github.com/prometheus/client_golang/prometheus"
"golang.org/x/xerrors"
"gopkg.in/natefinch/lumberjack.v2"
"cdr.dev/slog/v3"
"github.com/coder/coder/v2/codersdk/agentsdk"
"github.com/coder/serpent"
)
```
3 groups: standard library, 3rd partly libs, Coder libs.
This PR makes the change across the codebase. The PR in the stack above modifies our formatting to maintain this state of affairs, and is a separate PR so it's possible to review that one in detail.
Upgrades to slog v3 which includes a small, but backward incompatible API change to the acceptible call arguments when logging. This change allows us to verify via compile time type checking that arguments are correct and won't cause a panic, as was possible in slog v1, which this replaces (v2 was tagged but never used in coder/coder).
It also updates dependencies that also use slog and were updated.
I've left the `aibridge` dependency as a commit SHA, under the assumption that the team there (cc @pawbana @dannykopping ) will tag and update the dependency soon and on their own schedule.
Other dependencies, I pushed new tags.
Provisioner steps broken into smaller granular actions.
Changes:
- `ExtractArchive` moved to `init` request (was in `configure`)
- Writing `tfstate` moved to `plan` (was in `configure`)
- Moved most plan/apply outputs to `GraphComplete`
Closes#19984
As part of this, I refactored the error template to take in a slice of
actions rather than using individual booleans and strings to control the
behavior.
We decided a link resolves the issue for now so that is what I added,
although we may want to consider a way to start the workspace and follow
the logs dynamically on that page and then show the app when finished
(similar to the tasks page), or at least make the link automatically
start the workspace instead of only taking you to the dashboard where
you have to then start the workspace.
This is to debug context timeouts on API requests to the agent.
Because rbac and database cannot be imported in slim, split the logger
middleware into slim and non-slim versions and break out the recovery
middleware.
- Attempts pings twice per replicasync callback in wsproxy
- Reworks the test setup code to be more lenient and retry proxy
registration on failure
Closescoder/internal#957
This PR makes the initial steps at removing usage of the global Go HTTP
client, which was seen to have impacts on test flakiness in
https://github.com/coder/internal/issues/1020. The first commit removes
uses from tests, with the exception of one test that is tightly coupled
to the default client. The second commit makes easy/low-risk removals
from application code. This should have some impact to reduce test flakiness.
There is currently an issue with subdomain workspace apps on workspace
proxies, where if you have a workspace proxy wildcard nested beneath the
primary wildcard, cookies from the primary may be sent to the server
before cookies from the proxy specifically.
Currently:
1. Use a subdomain app via the primary proxy `*.coder.corp.com`
a. Client sends no cookies
a. Server does token smuggling flow
a. Server sets a cookie `coder_subdomain_app_session_token` on
`*.coder.corp.com`
a. Server redirects client to reload the page
a. Request should succeed as usual
1. Wait until the primary proxy's session token cookie has expired in
the database (or make it invalid yourself)
1. Use a subdomain app via a separate proxy `*.sydney.coder.corp.com`
a. Client sends `coder_subdomain_app_session_token` cookie from
`*.coder.corp.com`
a. Server validates supplied cookie, it fails because it's expired
a. Server does token smuggling flow
a. Server sets a cookie `coder_subdomain_app_session_token` on
`*.sydney.coder.corp.com`
a. Server redirects client to reload page
a. Client sends BOTH cookies.
a. The server will only process the first cookie it receives, so if the
expired cookie for the primary proxy is sent first the request will end
up in a permanent loop on step b.
The fix is to append `_{hash(wildcard_access_url)}` to the subdomain
cookies as we cannot control browser behavior further. This avoids the
conflict as each proxy will only read it's specific cookie.
Breaking API Change:
> The presence of the `ip` field on `codersdk.ConnectionLog` cannot be
guaranteed, and so the field has been made optional. It may be omitted
on API responses.
When running a scaletest, I noticed logs of the form:
```
2025-09-12 06:34:10.924 [erro] coderd.workspaceapps: upsert connection log failed trace=0xa17580 span=0xa17620 workspace_id=81b937d7-5777-4df5-b5cb-80241f30326f agent_id=78b2ff6d-b4a6-4a4e-88a7-283e05455a88 app_id=00000000-0000-0000-0000-000000000000 user_id=00000000-0000-0000-0000-000000000000 user_agent="" app_slug_or_port=terminal status_code=404 request_id=67f03cf8-9523-444a-97bc-90de080a54c8 ...
error= 1 error occurred:
* pq: null value in column "ip" of relation "connection_logs" violates not-null constraint
```
to ensure logs are never omitted from the connection log due to a
missing IP again (i.e. I'm not sure if we can always rely on a valid,
parseable, IP from `(http.Request).RemoteAddr`), I've removed the `NOT
NULL` constraint on `ip` on `connection_logs`, and made `ip` on the API
response optional.
The specific cause for these null IPs was the
`/workspaceproxies/me/issue-signed-app-token [post]` endpoint
constructing it's own `http.Request` without a `RemoteAddr` set, and
then passing that to the token issuer.
To solve this, we'll have workspace proxies send the real IP of the
client when calling `/workspaceproxies/me/issue-signed-app-token [post]`
via the header `Coder-Workspace-Proxy-Real-IP`.
Refactors `codersdk.Client`'s use of session tokens to use a `SessionTokenProvider`, which abstracts the obtaining and storing of the session token.
The main motiviation is to unify Agent authentication an an upstack PR, which can use cloud instance identity via token exchange, rather than a fixed session token.
However, the abstraction could also allow functionality like obtaining the session token from other external sources like the OS credential manager, or an external secret/key management system like Vault.
Fixes https://github.com/coder/internal/issues/907
We convert `workspacesdk.AgentConn` to an interface and generate a mock
for it. This allows writing `coderd` tests that rely on the agent's HTTP
api to not have to set up an entire tailnet networking stack.
Closes https://github.com/coder/internal/issues/884
We're adding this as a `go run` in `lint/go` for now, since adding it to
golangci-lint ourselves involves recompiling golangci-lint and then
running that new binary. I'll look into proposing it being added to the
public golangci-lint linters.
Doesn't appear to cause the lint ci job to take any longer, which is
nice.
PProf labels segment the code into groups for determing the source of
cpu/memory profiles. Since the web server and background jobs share a
lot of the same code (eg wsbuilder), it helps to know if the load is
user induced, or background job based.
Solves https://github.com/coder/coder/issues/15096
This is a slight rework/refactor of the earlier PRs from @dannykopping
and @Emyrk:
- https://github.com/coder/coder/pull/15669
- https://github.com/coder/coder/pull/15684
- https://github.com/coder/coder/pull/17596
Rather than having a per-app CORS behaviour setting and additionally a
template level setting for ports, this PR adds a single template level
CORS behaviour setting that is then used by all apps/ports for
workspaces created from that template.
The main changes are in `proxy.go` and `request.go` to:
a) get the CORS behaviour setting from the template
b) have `HandleSubdomain` bypass the CORS middleware handler if the
selected behaviour is `passthru`
c) in `proxyWorkspaceApp`, do not modify the response if the selected
behaviour is `passthru`
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Added support for configuring CORS behavior ("simple" or "passthru")
at the template level for all shared ports.
* Introduced a new "CORS Behavior" setting in the template creation and
settings forms.
* API endpoints and responses now include the optional `cors_behavior`
property for templates.
* Workspace apps and proxy now honor the specified CORS behavior,
enabling conditional CORS middleware application.
* Enhanced workspace app tests with comprehensive scenarios covering
CORS behaviors and authentication states.
* **Bug Fixes**
* None.
* **Documentation**
* Updated API and admin documentation to describe the new
`cors_behavior` property and its usage.
* Added examples and schema references for CORS behavior in relevant API
docs.
* **Tests**
* Extended automated tests to cover different CORS behavior scenarios
for templates and workspace apps.
* **Chores**
* Updated audit logging to track changes to the `cors_behavior` field on
templates.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Signed-off-by: Callum Styan <callumstyan@gmail.com>
- Update go.mod to use Go 1.24.1
- Update GitHub Actions setup-go action to use Go 1.24.1
- Fix linting issues with golangci-lint by:
- Updating to golangci-lint v1.57.1 (more compatible with Go 1.24.1)
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
---------
Co-authored-by: Claude <claude@anthropic.com>
Migrates us to `coder/websocket` v1.8.12 rather than `nhooyr/websocket` on an older version.
Works around https://github.com/coder/websocket/issues/504 by adding an explicit test for `xerrors.Is(err, io.EOF)` where we were previously getting `io.EOF` from the netConn.
closes#14730
Adds support for WorkspaceUpdates to the WebsocketDialer. This allows us to dial the new endpoint added in #14847 and connect it up to a `tailnet.Controllers` to connect to all agents over the tailnet.
I refactored the fakeWorkspaceUpdatesProvider to a mock and moved it to `tailnettest` so it could be more easily reused. The Mock is a little more full-featured.
Bumps the Tailnet and Agent API version 2.3, and creates some extra controls and machinery around these versions.
What happened is that we accidentally shipped two new API features without bumping the version. `ScriptCompleted` on the Agent API in Coder v2.16 and `RefreshResumeToken` on the Tailnet API in Coder v2.15.
Since we can't easily retroactively bump the versions, we'll roll these changes into API version 2.3 along with the new WorkspaceUpdates RPC, which hasn't been released yet. That means there is some ambiguity in Coder v2.15-v2.17 about exactly what methods are supported on the Tailnet and Agent APIs. This isn't great, but hasn't caused us major issues because
1. RefreshResumeToken is considered optional, and clients just log and move on if the RPC isn't supported.
2. Agents basically never get started talking to a Coderd that is older than they are, since the agent binary is normally downloaded from Coderd at workspace start.
Still it's good to get things squared away in terms of versions for SDK users and possible edge cases around client and server versions.
To mitigate against this thing happening again, this PR also:
1. adds a CODEOWNERS for the API proto packages, so I'll review changes
2. defines interface types for different API versions, and has the agent explicitly use a specific version. That way, if you add a new method, and try to use it in the agent without thinking explicitly about versions, it won't compile.
With the protocol controllers stuff, we've sort of already abstracted the Tailnet API such that the interface type strategy won't work, but I'll work on getting the Controller to be version aware, such that it can check the API version it's getting against the controllers it has -- in a later PR.
chore of #14729
Refactors the `ServerTailnet` to use `tailnet.Controller` so that we reuse logic around reconnection and handling control messages, instead of reimplementing. This unifies our "client" use of the tailscale API across CLI, coderd, and wsproxy.
fixes https://github.com/coder/internal/issues/114
We need to wait for ServerTailnet goroutines to finish when closing down, otherwise we can race with the shutdown of coderd & the coordinator, which causes errors.
- Adds the database implementation for fetching and caching keys
used for JWT signing. It's been merged into the `keyrotate` pkg and
renamed to `cryptokeys` since they're coupled concepts.
* chore: move multi-org endpoints into enterprise directory
All multi-organization features are gated behind "premium" licenses. Enterprise licenses can no longer
access organization CRUD.
Currently, importing `codersdk` just to interact with the API requires
importing tailscale, which causes builds to fail unless manually using
our fork.