Adds a `sharing add` command for sharing Workspaces with other users and
groups.
The command allows sharing with multiple users and groups in a single
command, as well as specifying the role (`use` or `admin`), which defaults to
`use` if none is specified.
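For illustration, here is a minimal Go sketch of how a `--user`/`--group` value such as `member:admin` could be split into a name and a role, applying the `use` default. `shareTarget` and `parseShareTarget` are hypothetical names, not the actual CLI code.

```go
package main

import (
	"fmt"
	"strings"
)

// shareTarget is a hypothetical representation of one --user or --group value.
type shareTarget struct {
	Name string
	Role string
}

// parseShareTarget splits "name" or "name:role". The role defaults to "use"
// and must be either "use" or "admin".
func parseShareTarget(value string) (shareTarget, error) {
	name, role, found := strings.Cut(value, ":")
	if name == "" {
		return shareTarget{}, fmt.Errorf("missing name in %q", value)
	}
	if !found {
		role = "use"
	}
	switch role {
	case "use", "admin":
		return shareTarget{Name: name, Role: role}, nil
	default:
		return shareTarget{}, fmt.Errorf("invalid role %q in %q: want use or admin", role, value)
	}
}

func main() {
	for _, v := range []string{"member:admin", "contractors:use", "contractors"} {
		target, err := parseShareTarget(v)
		fmt.Println(target.Name, target.Role, err)
	}
}
```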
In the current implementation, when the command completes we show the
user the current state of the workspace ACL:
```
$ coder sharing add apricot-catfish-86 --user=member:admin --group=contractors:use
USER GROUP ROLE
member - admin
member contractors use
```
If a user is part of multiple groups, or the workspace has also been
shared with them individually, they will show up multiple times. Although
this is a bit confusing at first glance, it's important to be able to
tell the maximum role a user may have, and via which ACL entry they have
it.
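The maximum role a user ends up with is simply the highest-ranked role across all of their rows. A hypothetical sketch (the `aclEntry` type and `maxRoles` helper are illustrative, not from the codebase):

```go
package main

import "fmt"

// aclEntry is a hypothetical shape for one row of the table printed by the command.
type aclEntry struct {
	User  string
	Group string // "-" when the workspace was shared with the user directly
	Role  string
}

// roleRank orders the two roles; unknown or empty roles rank 0.
var roleRank = map[string]int{"use": 1, "admin": 2}

// maxRoles returns the highest role each user holds across all of their rows.
func maxRoles(entries []aclEntry) map[string]string {
	best := make(map[string]string)
	for _, e := range entries {
		if roleRank[e.Role] > roleRank[best[e.User]] {
			best[e.User] = e.Role
		}
	}
	return best
}

func main() {
	rows := []aclEntry{
		{User: "member", Group: "-", Role: "admin"},
		{User: "member", Group: "contractors", Role: "use"},
	}
	fmt.Println(maxRoles(rows)["member"]) // prints "admin"
}
```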
---
One piece of UX to consider is that, in order to share a
Workspace with a user, the sharer must have a role that can read that user. In
the tests we give the user the `ScopedRoleOrgAuditor` role.
Closes [coder/internal#859](https://github.com/coder/internal/issues/859)
See https://github.com/coder/internal/issues/959, but the TL;DR is:
- we call this DB query on an interval (every 15s) and it would be
called on each coderd replica as well
- the generated values update very infrequently (for our most used
internal template I saw the builds created/claimed update twice in a 1h
period)
- we have no index on the initiator ID, so this query has to scan the
entire workspace_builds table on every request
In reality this should likely just be a Prometheus metric, and
Prometheus can handle the counter reset behaviour at query time, but for
now this should at least cut the load of the query to 25% of its
current impact.
---------
Signed-off-by: Callum Styan <callumstyan@gmail.com>
I noticed that our docs mention the possibility of using the
Tailscale-managed DERP server fleet.
https://github.com/coder/coder/pull/15901 changed the phrasing from
> However, Tailscale has graciously allowed us to use
to
> However, our Wireguard integration through Tailscale has graciously
> allowed us to use
This change alters the original meaning of the sentence. AFAIK, the
original meant that we contacted Tailscale directly and asked if it
would be ok for our customers to use the Tailscale-managed DERP server
fleet, and Tailscale graciously agreed. The new phrasing conveys
something different. This PR reverts the phrasing to the original.
---------
Co-authored-by: david-fraley <67079030+david-fraley@users.noreply.github.com>
In trying to address confusion with the `-` (for stdin) directory flag last year, I had `template push` read from stdin if stdin was not a TTY. However, I made the mistake of checking if the directory flag was set or not by comparing it to the default value. This meant in something like GitHub Actions, where you don't have a TTY for stdin, it was impossible to read from the current working directory. The fix is just to check if the flag was explicitly set, using pflags.
If users encounter this bug and this fix is unavailable in their version of Coder, they can work around it by setting `-d "$(pwd)"`.
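A minimal sketch of the fixed decision, using only Go's standard `flag` package: `FlagSet.Visit` walks only the flags that were actually set, so an explicit `-directory=.` is distinguishable from the default `.` (with spf13/pflag the equivalent check is `FlagSet.Changed`). `readFromStdin` and its flag wiring are illustrative, not the real `template push` code.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// readFromStdin decides whether a hypothetical `template push` should read
// the template from stdin. An explicitly set --directory always wins, and
// only "-" means stdin; otherwise fall back to "stdin is not a TTY".
func readFromStdin(args []string, stdinIsTTY bool) (bool, error) {
	fs := flag.NewFlagSet("push", flag.ContinueOnError)
	fs.SetOutput(io.Discard)
	dir := fs.String("directory", ".", "template directory")
	if err := fs.Parse(args); err != nil {
		return false, err
	}
	explicit := false
	// Visit only calls the function for flags set on the command line.
	fs.Visit(func(f *flag.Flag) {
		if f.Name == "directory" {
			explicit = true
		}
	})
	if explicit {
		return *dir == "-", nil
	}
	return !stdinIsTTY, nil
}

func main() {
	fromStdin, _ := readFromStdin([]string{"-directory", "."}, false)
	fmt.Println(fromStdin) // false: an explicit "." is honoured even without a TTY
}
```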
This PR improves the ruleguard rule for detecting `t.Fail` calls in goroutines. It picks up additional violations, which are also fixed in this PR.
See self-review for details.
The motivation for fixing this comes from a flake I fixed in https://github.com/coder/coder/pull/19599, where tests would fail from a `require` in an `Eventually`.
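For context: the `testing` package requires `FailNow` (which `require` calls on failure) to run on the test's own goroutine, and testify's `Eventually` evaluates its condition on a separate goroutine. The safe pattern is to return errors from goroutines and assert on the test goroutine. A minimal sketch with hypothetical `work` and `runConcurrently` helpers:

```go
package main

import (
	"errors"
	"fmt"
)

// work is a hypothetical unit of work run in a goroutine.
func work(n int) error {
	if n < 0 {
		return errors.New("negative input")
	}
	return nil
}

// runConcurrently starts each job in its own goroutine but hands errors back
// to the caller instead of failing inside the goroutine. In a test, the
// caller (the test goroutine) would then call require.NoError on each result.
func runConcurrently(inputs []int) []error {
	errs := make(chan error, len(inputs))
	for _, in := range inputs {
		go func(n int) { errs <- work(n) }(in)
	}
	results := make([]error, 0, len(inputs))
	for range inputs {
		results = append(results, <-errs)
	}
	return results
}

func main() {
	for _, err := range runConcurrently([]int{1, -1, 2}) {
		if err != nil {
			fmt.Println("job failed:", err)
		}
	}
}
```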
The latest release of all `pg_dump` major versions, going back to 13,
started inserting `\restrict` `\unrestrict` keywords into dumps. This
currently breaks sqlc in `gen/dump` and our check migration script. Full
details of the postgres change are available here:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=575f54d4c
To fix, we'll always use the `pg_dump` in our postgres 13.21 docker
image for schema dumps, instead of what's on the runner/local machine.
Coder doesn't restore from postgres dumps, so we're not vulnerable to
attacks that would be patched by the latest postgres version.
Regardless, we'll unpin ASAP.
Once sqlc is updated to handle these keywords, we need to start
stripping them when comparing the schema in the migration check script,
and then we can unpin the pg_dump version. This is being tracked at
https://github.com/coder/internal/issues/965
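Once sqlc can cope with these keywords, the stripping step could look like this Go sketch. It assumes the meta-commands always sit on their own lines starting with `\restrict ` or `\unrestrict `; the function name is hypothetical, and the real check script may well do this in shell instead.

```go
package main

import (
	"fmt"
	"strings"
)

// stripRestrictLines removes the `\restrict <key>` and `\unrestrict <key>`
// lines that newer pg_dump releases emit, so two schema dumps can be
// compared on their DDL alone.
func stripRestrictLines(dump string) string {
	lines := strings.Split(dump, "\n")
	kept := lines[:0] // filter in place; kept never overtakes the range index
	for _, line := range lines {
		if strings.HasPrefix(line, `\restrict `) || strings.HasPrefix(line, `\unrestrict `) {
			continue
		}
		kept = append(kept, line)
	}
	return strings.Join(kept, "\n")
}

func main() {
	dump := "\\restrict abc123\nCREATE TABLE t (id int);\n\\unrestrict abc123\n"
	fmt.Print(stripRestrictLines(dump))
}
```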
Updated toolsdk documentation link to the latest version.
Fixes https://github.com/coder/internal/issues/958
Logging was being done at error level, but any errors most likely come from simple races between an update and a client disconnecting at around the same time. Debug is fine for these.
Fixes https://github.com/coder/internal/issues/933
Refactors CLI tests that check the `--auth` flag parsing for various public clouds into a unit test that just creates the agent Client and asserts on the type.
Testing that the agent client actually authenticates correctly with these auth types is well covered by Coderd tests, so we don't need to retread that ground here, and the deleted tests were flaky on Windows.
Refactors Agent instance identity to be a SessionTokenProvider.
Refactors the CLI to create Agent clients via a centralized function, rather than ad hoc via individual command handlers and their flags.
This allows commands besides `coder agent`, but which still use the agent identity, to support instance identity authentication.
Fixes #19111 by unifying all API requests to go through the SessionTokenProvider for auth credentials.
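A sketch of what a centralized constructor can look like: one place maps an `--auth` value to a token provider, and a unit test can check it by asserting on the returned type. All names are hypothetical, and the auth values are assumed to mirror the `--auth` options rather than taken from the code.

```go
package main

import "fmt"

// SessionTokenProvider is a hypothetical, trimmed-down version of the abstraction.
type SessionTokenProvider interface {
	GetSessionToken() (string, error)
}

// fixedToken returns a token supplied up front.
type fixedToken struct{ token string }

func (f fixedToken) GetSessionToken() (string, error) { return f.token, nil }

// instanceIdentity stands in for a provider that would exchange a cloud
// instance identity document for a session token.
type instanceIdentity struct{ cloud string }

func (i instanceIdentity) GetSessionToken() (string, error) {
	return "", fmt.Errorf("token exchange for %s not implemented in this sketch", i.cloud)
}

// newAgentTokenProvider is the single place that maps an --auth value to a provider.
func newAgentTokenProvider(auth, token string) (SessionTokenProvider, error) {
	switch auth {
	case "token":
		if token == "" {
			return nil, fmt.Errorf("--auth=token requires a token")
		}
		return fixedToken{token: token}, nil
	case "google-instance-identity", "aws-instance-identity", "azure-instance-identity":
		return instanceIdentity{cloud: auth}, nil
	default:
		return nil, fmt.Errorf("unknown auth type %q", auth)
	}
}

func main() {
	p, err := newAgentTokenProvider("google-instance-identity", "")
	fmt.Printf("%T %v\n", p, err)
}
```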
Fixes: https://github.com/coder/internal/issues/950
I'm fairly sure the intention of the `hold` wait group is to get the two goroutines that the test starts running at the same time. But two goroutines started back to back should already run concurrently.
The use of `hold` doesn't actually guarantee concurrent execution of `Acquire`, just that both goroutines get as far as `Done()`; the Go scheduler could still run them serially without incident.
So I've chosen to just remove the use of `hold` to simplify.
But, for posterity: the data race was caused by incrementing by 1 in the loop while the goroutine calls `Done()`. The counter could go up to 1 and back down to 0 before the second iteration of the loop starts, which then causes a data race between calling `Wait()` in the first goroutine and `Add()` in the second iteration. Cf. https://pkg.go.dev/sync#WaitGroup.Add
Due to how we currently label a workspace as a task, there is a delay
between when a task workspace is created and when it is labelled as a
task.
This PR introduces a fallback check for when a workspace does _not_ have
`HasAITask` set. This fallback check tests whether the special "AI
Prompt" parameter is present in the workspace's build parameters.
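A minimal sketch of the check, assuming simplified hypothetical types (the real `codersdk` structs differ) and treating `HasAITask` as a plain bool:

```go
package main

import "fmt"

// Hypothetical, simplified shapes; the real codersdk types differ.
type buildParameter struct {
	Name  string
	Value string
}

type workspace struct {
	HasAITask       bool
	BuildParameters []buildParameter
}

// aiPromptParameter is the special parameter name from the description.
const aiPromptParameter = "AI Prompt"

// isTask trusts HasAITask when it is set, and otherwise falls back to
// looking for the "AI Prompt" build parameter.
func isTask(ws workspace) bool {
	if ws.HasAITask {
		return true
	}
	for _, p := range ws.BuildParameters {
		if p.Name == aiPromptParameter {
			return true
		}
	}
	return false
}

func main() {
	justCreated := workspace{BuildParameters: []buildParameter{{Name: aiPromptParameter, Value: "fix the flaky test"}}}
	fmt.Println(isTask(justCreated)) // prints true
}
```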
* provisionerdserver: Expires prebuild user token for workspace, if it
exists, when regenerating session token.
* dbauthz: disallow prebuilds user from creating api keys
* dbpurge: added functionality to expire stale api keys owned by the
prebuilds user
Relates to https://github.com/coder/internal/issues/893
Instead of `coder task create <template> --input <input>`, it is now
`coder task create <input> --template <template>`.
If there is only one AI task template on the deployment, the
`--template` parameter can be omitted.
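The template resolution rule can be sketched as follows. `resolveTemplate` is hypothetical and takes an already-fetched list of AI task templates; it leaves validating an explicit `--template` value to the server.

```go
package main

import (
	"errors"
	"fmt"
)

// resolveTemplate picks the template for `coder task create`: an explicit
// --template wins, otherwise the single AI task template on the deployment.
func resolveTemplate(flagValue string, aiTaskTemplates []string) (string, error) {
	if flagValue != "" {
		return flagValue, nil
	}
	switch len(aiTaskTemplates) {
	case 0:
		return "", errors.New("no AI task templates found")
	case 1:
		return aiTaskTemplates[0], nil
	default:
		return "", fmt.Errorf("%d AI task templates found; specify one with --template", len(aiTaskTemplates))
	}
}

func main() {
	tmpl, err := resolveTemplate("", []string{"ai-template"})
	fmt.Println(tmpl, err) // prints "ai-template <nil>"
}
```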
A Dependabot PR got blocked by a typo in a 2.10 changelog! I then noticed we're keeping these old changelogs (<= 2.10) around, even though we haven't been updating this directory for many months now.
I'm putting this PR up as I assume we want to delete those, it seems they'd be more confusing to users than anything. They're not referenced on the website nor in the docs manifest.json.
If I'm mistaken, and we do want to keep these, feel free to close this PR.
Got sick of seeing blink create duplicates, so I'm updating the prompt. To make it configurable without committing I'm making it a variable, here's what I've got:
> Investigate this CI failure. Check logs, and figure out what went wrong. Search for existing issues in coder/internal. If an issue for the CI failure does not exist already, create one ONLY in coder/internal. Do NOT create duplicate issues. Use title format \"flake: TestName\" for flaky tests, and assign them to the person from git blame.
> If multiple tests fail with the reason `unknown`, the test process exited unexpectedly, perhaps due to a panic.
Once blink supports per-Slack-channel contexts, I'll probably just set the variable to the empty string and use that instead.
This PR should resolve https://github.com/coder/internal/issues/719 by
limiting the `workspace_builds` rows selected by the query to the most
recent 100 builds of a template, as opposed to all builds in the last
30d. For our own internal templates with the most builds (1700-2000 in a
30d period) this should cut the query execution time by about 80%.
Unless we have some restriction on keeping the 30d period, contract
related or otherwise, this seems like a safe change to make. In addition
to the execution speed improvements it also means the memory for the
query is bounded as well.
If we want to keep a 30d time period for the avg build time value I
think it's worth exploring a purpose built solution such as histogram
structures where the build times could be bucketized by template ID as
they're observed.
---------
Signed-off-by: Callum Styan <callumstyan@gmail.com>
This test still flakes occasionally, see
https://github.com/coder/internal/issues/954#issuecomment-3237154735
The cause appears to be the assignment of `time.Now()` as the
`LastSeenAt` time when creating a provisioner, which can conflict with the
calculated next scheduled autostart and with the code that sets the
updated provisioner `LastSeenAt` and then waits for it with `require.Eventually`.
Instead, we should calculate all time values for the stale portion
of the test from the provisioner's `LastSeenAt` value to avoid such
issues.
Signed-off-by: Callum Styan <callumstyan@gmail.com>
- Removes GetManagedAgentCount query
- Adds new table `usage_events_daily` which stores aggregated usage
events by the type and UTC day
- Adds trigger to update the values in this table when a new row is
inserted into `usage_events`
- Adds a migration that adds `usage_events_daily` rows for existing data
in `usage_events`
- Adds tests for the trigger
- Adds tests for the backfill query in the migration
Since the `usage_events` table is unreleased currently, this migration
will do nothing on real deployments and will only affect preview
deployments such as dogfood.
Closes https://github.com/coder/internal/issues/943
## Description
When creating a prebuilt workspace, both `flags.IsPrebuild` and
`flags.IsFirstBuild` are true. Previously, the logic rejected cases with
multiple flags, so `coderd_workspace_creation_duration_seconds` wasn’t
updated for prebuilt creations. This is the only valid scenario where
two flags can be true.
## Changes
* Fix logic to update `coderd_workspace_creation_duration_seconds`
metric for prebuilt workspaces.
* Add prebuild helper functions to coderdenttest (other prebuild tests
can reuse this).
* Update workspace's provisionerdmetric tests to include this metric.
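The corrected validity rule can be sketched as: at most one flag set, except for the prebuild plus first-build pair. `buildFlags` is a hypothetical stand-in, and `IsClaim` is an assumed third flag included only to show a combination that stays invalid.

```go
package main

import "fmt"

// buildFlags is a hypothetical stand-in for the per-build flags.
type buildFlags struct {
	IsFirstBuild bool
	IsPrebuild   bool
	IsClaim      bool // assumed extra flag, for illustration only
}

// flagsValid allows at most one flag, plus the single legitimate pair:
// creating a prebuilt workspace is both a prebuild and a first build.
func flagsValid(f buildFlags) bool {
	set := 0
	for _, b := range []bool{f.IsFirstBuild, f.IsPrebuild, f.IsClaim} {
		if b {
			set++
		}
	}
	if set <= 1 {
		return true
	}
	return set == 2 && f.IsPrebuild && f.IsFirstBuild
}

func main() {
	fmt.Println(flagsValid(buildFlags{IsPrebuild: true, IsFirstBuild: true})) // prints true
}
```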
Follow-up: https://github.com/coder/coder/pull/19503
Related to: https://github.com/coder/coder/issues/19528
Previously, if you had a new license that would start before the current
one fully expired, you would get a warning. Now, the license validity
periods are merged together, and a warning is only generated based on
the end of the current contiguous period of license coverage.
Closes #19498
Coder Tasks requires us to create a workspace, but we want to be able to
return a `codersdk.Task` instead of a `codersdk.Workspace`. This
requires untangling `createWorkspace` from directly writing to
`http.ResponseWriter`.
Refactors `codersdk.Client`'s use of session tokens to use a `SessionTokenProvider`, which abstracts the obtaining and storing of the session token.
The main motivation is to unify Agent authentication in an upstack PR, which can use cloud instance identity via token exchange, rather than a fixed session token.
However, the abstraction could also allow functionality like obtaining the session token from other external sources like the OS credential manager, or an external secret/key management system like Vault.
Relates to https://github.com/coder/internal/issues/888
As part of our renewed connection scaletesting efforts, we want to
scaletest coder in a scenario where direct connections aren't available
(relatively common for our customers), and all concurrent connections
are relayed via DERP.
This PR adds a flag, `--disable-direct`, that can be included on the
existing `coder exp scaletest workspace-traffic -ssh` command to disable direct
connections.
# Update dependencies: Tailscale and xz compression library
This PR updates two dependencies:
- Bumps our fork of Tailscale from
`v1.1.1-0.20250729141742-067f1e5d9716` to
`v1.1.1-0.20250829055033-3536204c8d21`
- Updates the xz compression library from `v0.5.12` to `v0.5.15`
The flake here had two causes:
1. the use of `time.Now()` in `MustWaitForProvisionersAvailable`, and
2. the fact that `UpdateProvisionerLastSeenAt` cannot use a time that is
earlier than the current `LastSeenAt` time.
Previously the test here was calling
`coderdtest.MustWaitForProvisionersAvailable` which was using `time.Now`
rather than the next tick time like the real `hasProvisionersAvailable`
function does. Additionally, when using `UpdateProvisionerLastSeenAt`
the underlying db query enforces that the time we're trying to set
`LastSeenAt` to cannot be older than the current value.
I was able to reliably reproduce the flake by executing both the
`UpdateProvisionerLastSeenAt` call and `tickCh <- next` in their own
goroutines, the former with a small sleep to reliably ensure we'd
trigger the autobuild before we set the `LastSeenAt` time. That's when I
also noticed that `coderdtest.MustWaitForProvisionersAvailable` was
using `time.Now` instead of the tick time. When I updated that function
to take in a tick time + added a 2nd call to
`UpdateProvisionerLastSeenAt` to set an original non-stale time, we
could then never get the test to pass because the later call to set the
stale time would not actually modify `LastSeenAt`. On top of that,
calling the provisioner daemons closer in the middle of the function
doesn't really do anything of value in this test.
**The fix for the flake is to keep the goroutines, so the test would
still flake without a relevant fix, and to include that fix: explicitly
waiting for the provisioner to be stale before passing the time to
`tickCh`.**
---------
Signed-off-by: Callum Styan <callumstyan@gmail.com>
Addresses comment raised on previous PR
https://github.com/coder/coder/pull/19619#discussion_r2307943410
We know we can skip sub agents when searching for which agent is related
to the task, as this is not an explicitly supported feature at the
moment. When we come to properly setting up a Task -> Agent relationship
this limitation will be dropped.
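A sketch of the selection rule, assuming a hypothetical `ParentID` field that is non-nil only for sub agents: skip sub agents, then take the first remaining agent.

```go
package main

import "fmt"

// workspaceAgent is hypothetical; ParentID is non-nil only for sub agents.
type workspaceAgent struct {
	Name     string
	ParentID *string
}

// taskAgent skips sub agents and returns the first remaining agent.
func taskAgent(agents []workspaceAgent) (workspaceAgent, bool) {
	for _, a := range agents {
		if a.ParentID != nil {
			continue
		}
		return a, true
	}
	return workspaceAgent{}, false
}

func main() {
	parent := "main"
	agents := []workspaceAgent{{Name: "devcontainer", ParentID: &parent}, {Name: "main"}}
	a, ok := taskAgent(agents)
	fmt.Println(a.Name, ok) // prints "main true"
}
```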
The coder-login module was recently updated to set environment variables
instead of running `coder login`.
This unfortunately broke `develop.sh`:
```
Encountered an error running "coder login", see "coder login --help" for more information
error: Trace=[create api key: ]
```
Unsetting these env vars so that they do not interfere.
Closes https://github.com/coder/internal/issues/949
Adds the following fields to `codersdk.Task`
- OwnerName
- TemplateName
- TemplateDisplayName
- TemplateIcon
- WorkspaceAgentID
- WorkspaceAgentLifecycle
- WorkspaceAgentHealth
The implementation is unfortunately not compatible with multiple agents
as we have no reliable way to tell which agent has the AI task running
in it. For now we just pick the first agent found, but in the future
this will need to be changed.
This pull request makes a minor update to an external documentation link
in the `OverviewPageView` component. The change ensures that users are
directed to the correct reference section for CLI server experiments.
* Updated the `href` attribute in the documentation link to point to
`https://coder.com/docs/reference/cli/server#--experiments` instead of
the previous URL, improving the accuracy of the reference for users.