coder

mirror of https://github.com/coder/coder.git synced 2026-06-06 06:28:20 +00:00

Author	SHA1	Message	Date
Kacper Sawicki	6f86f67754	feat(coderd): add overload protection with rate limiting and concurrency control (#21161 ) ## Summary This adds configurable overload protection to the AI Bridge daemon to prevent the server from being overwhelmed during periods of high load. Partially addresses coder/internal#1153 (rate limits and concurrency control; circuit breakers are deferred to a follow-up). ## New Configuration Options \| Option \| Environment Variable \| Description \| Default \| \|--------\|---------------------\|-------------\|---------\| \| `--aibridge-max-concurrency` \| `CODER_AIBRIDGE_MAX_CONCURRENCY` \| Maximum number of concurrent AI Bridge requests. Set to 0 to disable (unlimited). \| `0` \| \| `--aibridge-rate-limit` \| `CODER_AIBRIDGE_RATE_LIMIT` \| Maximum number of AI Bridge requests per second. Set to 0 to disable rate limiting. \| `0` \| ## Behavior When limits are exceeded: - Concurrency limit: Returns HTTP `503 Service Unavailable` with message "AI Bridge is currently at capacity. Please try again later." - Rate limit: Returns HTTP `429 Too Many Requests` with `Retry-After` header. Both protections are optional and disabled by default (0 values). ## Implementation The overload protection is implemented as reusable middleware in `coderd/httpmw/ratelimit.go`: 1. `RateLimitByAuthToken`: Per-user rate limiting that uses `APITokenFromRequest` to extract the authentication token, with fallback to `X-Api-Key` header for AI provider compatibility (e.g., Anthropic). Falls back to IP-based rate limiting if no token is present. Includes `Retry-After` header for backpressure signaling. 2. `ConcurrencyLimit`: Uses an atomic counter to track in-flight requests and reject when at capacity. The middleware is applied in `enterprise/coderd/aibridge.go` via `r.Group` in the following order: 1. Concurrency check (faster rejection for load shedding) 2. Rate limit check Note: Rate limiting currently applies to all AI Bridge requests, including pass-through requests. 
Ideally only actual interceptions should count, but this would require changes in the aibridge library. ## Testing Added comprehensive tests for: - Rate limiting by auth token (Bearer token, X-Api-Key, no token fallback to IP) - Different tokens not rate limited against each other - Disabled when limit is zero - Retry-After header is set on 429 responses - Concurrency limiting (allows within limit, rejects over limit, disabled when zero)	2025-12-11 16:38:54 +01:00
Dean Sheather	b199eb1c38	fix: allow stops and deletes after breaching AI limit (#21186 ) Fixes a bug a customer encountered once they breached their limit. Adds a test.	2025-12-09 11:05:12 +00:00
blinkagent[bot]	b4be5bcfed	docs: fix swagger tags for license endpoints (#21101 ) ## Summary Change `@Tags` from `Organizations` to `Enterprise` for `POST /licenses` and `POST /licenses/refresh-entitlements` to match the `GET` and `DELETE` license endpoints which are already tagged as `Enterprise`. ## Problem The license API endpoints were inconsistently tagged in the swagger annotations: - `GET /licenses` → `Enterprise` ✓ - `DELETE /licenses/{id}` → `Enterprise` ✓ - `POST /licenses` → `Organizations` ✗ - `POST /licenses/refresh-entitlements` → `Organizations` ✗ This caused the POST endpoints to be documented in the [Organizations API docs](https://coder.com/docs/reference/api/organizations) instead of the [Enterprise API docs](https://coder.com/docs/reference/api/enterprise) where the other license endpoints live. ## Fix Simply updated the `@Tags` annotation from `Organizations` to `Enterprise` for both POST endpoints. This was an oversight from the original swagger docs addition in #5625 (January 2023). Co-authored-by: blink-so[bot] <211532188+blink-so[bot]@users.noreply.github.com>	2025-12-05 15:27:22 +00:00
Marcin Tojek	9c7135a61d	chore: add license check for prebuilds (#20947 ) Related: https://github.com/coder/coder/pull/20864	2025-11-26 15:00:07 +01:00
Danielle Maywood	e7dbbcde87	fix: do not notify marked for deletion for deleted workspaces (#20937 ) Closes https://github.com/coder/coder/issues/20913 I ran the test without the fix, verified that it caught the issue, then applied the fix and confirmed the issue no longer happens. --- 🤖 PR was initially written by Claude Opus 4.5 Thinking using Claude Code and then reviewed by a human 👩	2025-11-26 09:23:16 +00:00
Danielle Maywood	c12303f0b2	fix: allow agents to be created on dormant workspaces (#20909 ) Closes https://github.com/coder/coder/issues/20711 We now allow agents to be created on dormant workspaces. I ran the test with and without the change and confirmed that, without the fix, it triggers the "rbac: unauthorized" error.	2025-11-25 06:24:33 +00:00
Jake Howell	ca560d36ce	fix: remove inflight interceptions from aibridge returned values (#20852 ) Addresses [`aibridge#54`](https://github.com/coder/aibridge/issues/54) When querying the database values for `/api/experimental/aibridge/interceptions` we found strange behaviour: some interceptions lacked prompts and various other fields we want. Generally this was because the data did not actually exist yet for these interceptions (they were in flight). The simple solution was to hide interceptions whose data does not exist. This PR addresses that. --------- Co-authored-by: Danny Kopping <danny@coder.com>	2025-11-25 10:23:39 +11:00
Atif Ali	636408906f	chore(docs): standardize "AIBridge" to "AI Bridge" in documentation (#20831 )	2025-11-24 18:09:04 +05:00
Marcin Tojek	d004710a74	feat: add prebuild invalidation via last_invalidated_at timestamp (#20582 ) Updates #17917	2025-11-20 17:12:25 +01:00
blinkagent[bot]	48b8e22502	fix: add Windows stub for CacheTFProviders (#20840 ) Fixes https://github.com/coder/internal/issues/1119 ## Description The `CacheTFProviders` function in `testutil/terraform_cache.go` was only available on Linux and macOS due to the `//go:build linux \|\| darwin` build tag. This caused a compile error on Windows when `enterprise/coderd/workspaces_test.go` tried to call it: ``` enterprise\coderd\workspaces_test.go:3403:28: undefined: testutil.CacheTFProviders ``` ## Changes 1. Added `testutil/terraform_cache_windows.go` with a Windows-specific stub implementation that returns an empty string 2. Updated `downloadProviders` helper in `enterprise/coderd/workspaces_test.go` to handle empty paths gracefully ## Behavior - On Linux/macOS: Terraform providers are cached as before - On Windows: Provider caching is skipped, tests download providers normally during `terraform init` ## Testing This should fix the Windows nightly gauntlet failure. The test will still run on Windows, just without provider caching optimization. Co-authored-by: blink-so[bot] <211532188+blink-so[bot]@users.noreply.github.com>	2025-11-20 07:52:07 +00:00
Steven Masley	04727c06e8	chore: add experiment toggle for terraform workspace caching (#20559 ) Experiments passed to provisioners to determine behavior. This adds `--experiments` flag to provisioner daemons. Prior to this, provisioners had no method to turn on/off experiments.	2025-11-12 14:26:15 -06:00
Susana Ferreira	ca94588bd5	fix: send prebuild job notification after job build db commit (#20693 ) ## Problem Fix race condition in prebuilds reconciler. Previously, a job notification event was sent to a Go channel before the provisioning database transaction completed. The notification is consumed by a separate goroutine that publishes to PostgreSQL's LISTEN/NOTIFY, using a separate database connection. This creates a potential race: if a provisioner daemon receives the notification and queries for the job before the provisioning transaction commits, it won't find the job in the database. This manifested as a flaky test failure in `TestReinitializeAgent`, where provisioners would occasionally miss newly created jobs. The test uses a 25-second timeout context, while the acquirer's backup polling mechanism checks for jobs every 30 seconds. This made the race condition visible in tests, though in production the backup polling would eventually pick up the job. The solution presented here guarantees that a job notification is only sent after the provisioning database transaction commits. ## Changes * The `provision()` and `provisionDelete()` functions now return the provisioner job instead of sending notifications internally. * A new `publishProvisionerJob()` helper centralizes the notification logic and is called after each transaction completes. Closes: https://github.com/coder/internal/issues/963	2025-11-12 10:36:39 +00:00
Danny Kopping	04f809f2d0	chore!: allow coder MCP tools to not be injected (#20713 ) Currently, when AI Bridge is enabled AND the `oauth2` and `mcp-server-http` experiments are enabled we inject Coder's MCP tools into all intercepted AI Bridge requests. This PR introduces a config to control this behaviour. NOTE: this is a backwards-incompatible change; previously these tools would be injected automatically, now this setting will need to be explicitly enabled. --------- Signed-off-by: Danny Kopping <danny@coder.com>	2025-11-12 11:23:01 +02:00
Kacper Sawicki	f543a87b78	chore: cache terraform providers for workspaces terraform tests (#20603 ) Fixes flaky `TestWorkspaceTagsTerraform` and `TestWorkspaceTemplateParamsChange` tests that were failing with `connection reset by peer` errors when downloading the coder/coder provider. This applies the same caching solution which was done in https://github.com/coder/coder/pull/17373 1. Extracts provider caching logic into `testutil/terraform_cache.go` 2. Updates TestProvision to use the shared caching helpers 3. Updates enterprise workspace tests to use the shared caching helpers The cache is persisted at `~/.cache/coderv2-test/` and automatically cached between CI runs via existing GitHub Actions cache setup. Closes https://github.com/coder/internal/issues/607	2025-11-12 08:43:22 +00:00
Paweł Banaszewski	991831b1dd	chore: add API key ID to interceptions (#20513 ) Adds APIKeyID to interceptions. Needed for tracking API key usage with bridge. fixes https://github.com/coder/coder/issues/20001	2025-11-10 13:46:41 +01:00
Dean Sheather	b3f651d62f	chore: change managed agent limit (#20540 )	2025-11-05 00:46:27 +11:00
Danny Kopping	ff532d9bf3	chore: handle deprecated `aibridge` experimental routes (#20565 ) In v2.28 we're [removing the aibridge experiment](https://github.com/coder/coder/pull/20544). We need to handle `/api/experimental/aibridge/*` until Beta (next release). Signed-off-by: Danny Kopping <danny@coder.com>	2025-10-29 19:11:34 -06:00
Susana Ferreira	7e8fcb4b0f	perf: optimize prebuilds membership reconciliation to check orgs not presets (#20493 ) ## Description The membership reconciliation ensures the prebuilds system user is a member of all organizations with prebuilds configured. To support prebuilds quota management, each organization must have a prebuilds group that the system user belongs to. ## Problem Previously, membership reconciliation iterated over all presets to check and update membership status. This meant database queries `GetGroupByOrgAndName` and `InsertGroupMember` were executed for each preset. Since presets are unique combinations of `(organization, template, template version, preset)`, this resulted in several redundant checks for the same organization. In dogfood, `InsertGroupMember` was called thousands of times per day, even though memberships were already configured ([internal Grafana dashboard link](https://grafana.dev.coder.com/goto/46MZ1UgDg?orgId=1)) ## Solution This PR introduces `GetOrganizationsWithPrebuildStatus`, a single query that returns: * All unique organizations with prebuilds configured * Whether the prebuilds user is a member of each organization * Whether the prebuilds group exists in each organization * Whether the prebuilds user is in the prebuilds group The membership reconciliation logic now: * Fetches status for all organizations in one query * Only performs inserts for organizations missing required memberships or groups * Safely handles concurrent operations via unique constraint violations * This reduces database load from `O(presets)` to `O(organizations)` per reconciliation loop, with a single read query when everything is configured. 
## Changes * Add `GetOrganizationsWithPrebuildStatus` SQL query * Update `membership.ReconcileAll` to use organization-based reconciliation instead of preset-based * Update tests to reflect new behavior Related to internal thread: https://codercom.slack.com/archives/C07GRNNRW03/p1760535570381369	2025-10-29 14:24:29 +00:00
Danny Kopping	b20fd6f2c1	chore: graduate aibridge API out of experimental (#20523 )	2025-10-29 07:18:54 -06:00
Danny Kopping	2294c55bd9	chore: graduate `aibridged*` packages out of experimental (#20522 )	2025-10-29 07:00:24 -06:00
Susana Ferreira	aad1b401c1	feat: add prebuilds reconciliation duration metric (#20535 ) ## Description Adds `coderd_prebuilds_reconciliation_duration_seconds` histogram metric to track the duration of each prebuilds reconciliation cycle. This metric helps operators monitor reconciliation performance and identify potential bottlenecks. ## Changes - Added `ReconcileStats` struct to capture reconciliation cycle statistics - Updated `ReconcileAll()` to return stats including elapsed time - Added histogram metric `coderd_prebuilds_reconciliation_duration_seconds`	2025-10-29 12:52:30 +00:00
Danny Kopping	95a1ca898f	chore: remove aibridge experiment (#20520 ) Removes the experiment and all references to it	2025-10-29 06:18:38 -06:00
Susana Ferreira	c3e3bb58f2	feat: delete pending canceled prebuilds (#20499 ) ## Description PR https://github.com/coder/coder/pull/20387 introduced canceling pending prebuild jobs from inactive template versions to avoid provisioning obsolete workspaces. However, the associated prebuilds remained in the database with "Canceled" status, visible in the UI. This PR now orphan-deletes these canceled prebuilt workspaces. Since the canceled jobs were never processed by a provisioner, no Terraform resources were created, making orphan deletion safe. Orphan deletion always creates a provisioner job, but behaves differently based on provisioner availability: - If no provisioner daemon is available, the job is immediately marked as completed and the workspace is marked as deleted without any provisioner processing - If a provisioner daemon is available, it processes the delete job with empty Terraform state (no actual resources to destroy) The job cancellation and workspace deletion occur atomically in the same transaction. We don't split this into two separate reconciliation runs because there's no way to distinguish between system-canceled prebuilds and user-canceled workspaces. If we deleted canceled workspaces in a later run, we'd delete user-canceled workspaces that users may want to keep for troubleshooting. Note: This only applies to system-generated prebuilds from inactive template versions. ## Changes * Update `UpdatePrebuildProvisionerJobWithCancel` query to return job ID, workspace ID, template ID, and template version preset ID * Add `DeprovisionMode` enum to support orphan deletion in the provision flow * Update `ActionTypeCancelPending` handler to cancel jobs and orphan-delete associated workspaces atomically	2025-10-29 10:37:28 +00:00
Dean Sheather	dec6d310a8	fix: avoid bad switch statement in license code (#20509 ) Noticed this while trying to investigate a flake. Relates to https://github.com/coder/internal/issues/788	2025-10-28 06:19:52 +00:00
ケイラ	4f7b279fd8	feat: add an organization member permission level (#19953 )	2025-10-27 17:14:16 -06:00
Paweł Banaszewski	50ba223aa1	feat: add db query for setting interception ended_at field (#20437 ) Adds UpdateAIBridgeInterceptionEnded query to mark interceptions as done. Needed for https://github.com/coder/internal/issues/1051	2025-10-27 09:51:37 +01:00
Susana Ferreira	f6e86c6fdb	feat: cancel pending prebuilds from non-active template versions (#20387 ) ## Description This PR introduces an optimization to automatically cancel pending prebuild-related jobs from non-active template versions in the reconciliation loop. ## Problem Currently, when a template is configured with more prebuild instances than available provisioners, the provisioner queue can become flooded with pending prebuild jobs. This issue is worsened when provisioning/deprovisioning operations take a long time. When the prebuild reconciliation loop generates jobs faster than provisioners can process them, pending jobs accumulate in the queue. Since prebuilt workspaces should always run the latest active template version, pending prebuild jobs from non-active versions become obsolete once a new version is promoted. ## Solution The reconciliation loop cancels pending prebuild-related jobs from non-active template versions that match the following criteria: * Build number: 1 (initial build created by the reconciliation loop) * Job status: `pending` * Not yet picked up by a provisioner (`worker_id` is `NULL`) * Owned by the prebuilds system user * Workspace transition: `start` This prevents the queue from being cluttered with stale prebuild jobs that would provision workspaces on an outdated template version that would consequently need to be deprovisioned. ## Changes * Added new SQL query `CountPendingNonActivePrebuilds` to identify presets with pending jobs from non-active versions * Added new SQL query `UpdatePrebuildProvisionerJobWithCancel` to cancel jobs for a specific preset * New reconciliation action type `ActionTypeCancelPending` handles the cancellation logic * Cancellation is non-blocking: failures to cancel prebuild jobs are logged as errors and don't prevent other reconciliation actions ## Follow-up PR Canceling pending prebuild jobs leaves workspaces in a Canceled state. 
While no Terraform resources need to be destroyed (since jobs were canceled before provisioning started), these database records should still be cleaned up. This will be addressed in a follow-up PR. Closes: https://github.com/coder/coder/issues/20242	2025-10-24 15:27:49 +01:00
Steven Masley	13ca9ead3a	chore!: ensure consistent secret token generation and hashing (#20388 ) This PR uses the same sha256 hashing technique as we use for APIKeys. So now all randomly generated secrets will be hashed with sha256 for consistency. This is a breaking change for the oauth tokens. Since oauth is only allowed for dev builds and experimental, this is ok.	2025-10-23 15:38:49 -05:00
Hugo Dutka	e62c5db678	chore: remove references to dbtestutil.WillUsePostgres (#20436 ) Addresses https://github.com/coder/internal/issues/758. This PR only cleans up dead code, it makes no changes to test logic.	2025-10-23 14:24:54 +02:00
Jake Howell	d455f6ea2b	fix: rename `total` to `count` in `AIBridgeListInterceptionsResponse` (#20410 ) Thanks to the great work in #20393, we’ve successfully introduced offset-based pagination for this endpoint. However, the frontend expects a `count` field in the response rather than `total`. This PR updates the response payload to rename the returned key to `count` for consistency with frontend expectations and existing API patterns. This is necessary to unblock the work in #20331	2025-10-23 13:19:12 +11:00
Marcin Tojek	caeca1097b	chore: refactor license validation (#20411 )	2025-10-22 16:12:36 +02:00
Marcin Tojek	f2a410566c	feat: add support buttons (#20339 ) Fixes: https://github.com/coder/coder/issues/16804	2025-10-22 15:35:16 +02:00
Dean Sheather	69c2c40512	chore: add user details to aibridge interception list endpoint (#20397 ) - Adds FK from `aibridge_interceptions.initiator_id` to `users.id` - This is enforced by deleting any rows that don't have any users. Since this is an experimental feature AND coder never deletes user rows I think this is acceptable. - Adds `name` as a property on `codersdk.MinimalUser` - This matches the `visible_users` view in the database. I'm unsure why `name` wasn't already included given that `username` is. - Adds a new `initiator` field to `codersdk.AIBridgeInterception` which contains `codersdk.MinimalUser` (ID, username, name, avatar URL) - Removes `initiator_id` from `codersdk.AIBridgeInterception` - Should be fine since we're still in early access	2025-10-22 16:18:31 +11:00
Dean Sheather	ea261a1f7c	chore: add offset-based pagination support to aibridge list endpoint (#20393 ) Necessary for the frontend to be able to paginate easily. Cursor pagination is good for fetching all events, but doesn't play very well when a pagination component gets involved. Adds support for `?offset=x` to the existing endpoint. The cursor-based pagination (`?after_id=x`) is still supported. The two pagination modes are mutually exclusive, and are documented as such. If both are supplied, the request will be rejected. Also adds a `total` property to the response that contains the full count of items matching the filter. We already have indices in place so I don't think this will impact performance (or we can revisit it before GA).	2025-10-21 11:50:00 +00:00
ケイラ	caeff49aba	chore: refactor roles to support multiple permission sets scoped by org id (#20186 ) In preparation for adding the "member" permission level, which will also be grouped by org ID, do a bit of a refactor to make room for it and the existing "org" level to live in the same `map`	2025-10-09 11:08:34 -06:00
Sas Swart	544f15523c	fix: adjust workspace claims to be initiated by users (#20179 ) The prebuilds user never initiates a workspace claim autonomously. A claim can only happen when a user attempts to create a workspace. When listing prebuild provisioner jobs, it would not make sense to see jobs related to users who are creating workspaces and have gotten a prebuilt workspace. When cleaning up an overwhelmed provisioner queue, we should not delete claims as they have humans waiting for them and are not part of the thundering herd. Therefore, this PR ensures that provisioner jobs that claim workspaces are considered to be initiated by the user, not the prebuilds system.	2025-10-08 10:40:54 +02:00
Zach	4d1003eace	fix: remove initial global HTTP client usage (#20128 ) This PR takes the initial steps toward removing usage of the global Go HTTP client, which was seen to contribute to test flakiness in https://github.com/coder/internal/issues/1020. The first commit removes uses from tests, with the exception of one test that is tightly coupled to the default client. The second commit makes easy/low-risk removals from application code. This should help reduce test flakiness.	2025-10-02 11:43:13 -06:00
Cian Johnston	ff930ad4f3	feat(coderd): add ability to search org members by user_id, is_system, github_user_id (#20048 ) Adds the ability to search org members by query. Supported fields: `user_id`, `is_system`, `github_user_id`.	2025-09-30 23:54:21 +01:00
Cian Johnston	3e1f6afd66	chore: work around timing issue in TestReplicas/ErrorWithoutLicense (#20002 ) Closes https://github.com/coder/internal/issues/268 Wraps the assertions in a `testutil.Eventually` so that hopefully any transient timing issues resolve themselves. If this does not resolve the issue, we may need to plumb through some kind of `chan struct{}` into `api.Entitlements.Update()`	2025-09-29 10:20:26 +01:00
Dean Sheather	fc58996bbf	chore: add StripPrefix to aibridge server handler (#19990 ) oops	2025-09-26 15:40:42 +00:00
Dean Sheather	43415f0144	chore: add enterprise feature for aibridge (#19976 ) Adds enterprise feature "aibridge" and gates the aibridge CRUD and LLM API endpoints behind it.	2025-09-27 01:13:06 +10:00
Paweł Banaszewski	0a6ba5d51a	feat: add endpoint to list aibridge interceptions (#19929 ) Co-authored-by: Dean Sheather <dean@deansheather.com>	2025-09-27 00:20:33 +10:00
Danny Kopping	0a79817050	feat: initialize `aibridged` & mount API handler (#19798 ) Addresses https://github.com/coder/internal/issues/987	2025-09-25 16:37:28 +02:00
Brett Kolodny	38ca98745b	feat: add shared_with_group: and shared_with_user: filters to /workspaces endpoint (#19875 ) Adds shared_with_user and shared_with_group filters to the /workspaces endpoint. - `shared_with_user`: filters workspaces shared with a specific user. Accepts a user UUID or username. - `shared_with_group`: filters workspaces shared with a specific group. Accepts: - a group UUID, or - `<organization name>/<group name>`, or - `<group name>` (resolved in the default organization). Closes [coder/internal#1004](https://github.com/coder/internal/issues/1004)	2025-09-19 16:05:27 -04:00
Brett Kolodny	e6b04d1918	feat: add shared filter to workspaces query (#19807 ) Adds a `shared:<boolean>` search query to the `/workspaces [get]` endpoint https://github.com/user-attachments/assets/ccf84bd9-c1fd-4085-825b-2e3176a2d488 Closes [coder/internal#972](https://github.com/coder/internal/issues/972)	2025-09-16 12:37:39 -04:00
Ethan	995b330250	test: avoid sharing deployment values between subtests (#19833 ) Blink didn't figure out a CI failure on main was caused by a data race; fixing it. I've also updated the [blink prompt](https://gist.githubusercontent.com/ethanndickson/8dea9f1db3957ac1baf30ae8ce6f1a42/raw/060aea7fabb82bef0029a17dad9a5daee7940760/blink-flake-instructions.md). https://github.com/coder/coder/actions/runs/17737809615	2025-09-16 13:51:26 +10:00
Ethan	6a9b896f5b	fix!: use client ip when creating connection logs for workspace proxied app accesses (#19788 ) Breaking API Change: > The presence of the `ip` field on `codersdk.ConnectionLog` cannot be guaranteed, and so the field has been made optional. It may be omitted on API responses. When running a scaletest, I noticed logs of the form: ``` 2025-09-12 06:34:10.924 [erro] coderd.workspaceapps: upsert connection log failed trace=0xa17580 span=0xa17620 workspace_id=81b937d7-5777-4df5-b5cb-80241f30326f agent_id=78b2ff6d-b4a6-4a4e-88a7-283e05455a88 app_id=00000000-0000-0000-0000-000000000000 user_id=00000000-0000-0000-0000-000000000000 user_agent="" app_slug_or_port=terminal status_code=404 request_id=67f03cf8-9523-444a-97bc-90de080a54c8 ... error= 1 error occurred: * pq: null value in column "ip" of relation "connection_logs" violates not-null constraint ``` To ensure logs are never omitted from the connection log due to a missing IP again (i.e. I'm not sure if we can always rely on a valid, parseable IP from `(http.Request).RemoteAddr`), I've removed the `NOT NULL` constraint on `ip` on `connection_logs`, and made `ip` on the API response optional. The specific cause for these null IPs was the `/workspaceproxies/me/issue-signed-app-token [post]` endpoint constructing its own `http.Request` without a `RemoteAddr` set, and then passing that to the token issuer. To solve this, we'll have workspace proxies send the real IP of the client when calling `/workspaceproxies/me/issue-signed-app-token [post]` via the header `Coder-Workspace-Proxy-Real-IP`.	2025-09-15 12:30:17 +10:00
Brett Kolodny	854f3c0187	feat: add workspaces/acl [delete] endpoint (#19772 ) Closes [coder/internal#971](https://github.com/coder/internal/issues/971)	2025-09-12 12:21:01 -04:00
Kacper Sawicki	3074547322	perf(enterprise): remove expensive GetWorkspaces query from entitlements (#19747 ) Closes: https://github.com/coder/internal/issues/964 This PR addresses the significant database load issue where the `GetWorkspaces` query was causing performance problems in the license entitlements code.	2025-09-09 15:46:11 +02:00
Callum Styan	0ec9df390b	fix: reduce impact of GetPrebuildMetrics on database (#19694 ) see https://github.com/coder/internal/issues/959 but the tl;dr is: - we call this DB query on an interval (every 15s) and it would be called on each coderd replica as well - the generated values update very infrequently (for our most used internal template I saw the builds created/claimed update twice in a 1h period) - we have no index on the initiator ID, so this query has to scan the entire workspace_builds table on every request In reality this should likely just be a Prometheus metric, and Prometheus can handle the counter reset behaviour at query time, but for now this should at least cut the load of the query to 25% of its current impact. --------- Signed-off-by: Callum Styan <callumstyan@gmail.com>	2025-09-04 13:43:50 -07:00

686 Commits