mirror of
https://github.com/coder/coder.git
synced 2026-06-03 21:18:24 +00:00
53e8e9c7cd
Relates to https://github.com/coder/coder/issues/17432

### Part 1:

Notes:
- `GetPresetsAtFailureLimit` SQL query is added. It is similar to `GetPresetsBackoff` and uses the same CTEs (`filtered_builds`, `time_sorted_builds`), but the two queries are still different.
- The query is executed on every loop iteration. As an optimization, we could mark a specific preset as permanently failed to avoid executing the query on every loop iteration, but I decided not to do it for now.
- By default `FailureHardLimit` is set to 3.
- `FailureHardLimit` is configurable. Setting it to zero means that the hard limit is disabled.

### Part 2

Notes:
- `PrebuildFailureLimitReached` notification is added.
- The notification is sent to template admins.
- The notification is sent only the first time the hard limit is reached, but it will `log.Warn` on every loop iteration.
- I introduced this enum:
```sql
CREATE TYPE prebuild_status AS ENUM (
  'normal',           -- Prebuilds are working as expected; this is the default, healthy state.
  'hard_limited',     -- Prebuilds have failed repeatedly and hit the configured hard failure limit; won't be retried anymore.
  'validation_failed' -- Prebuilds failed due to a non-retryable validation error (e.g. template misconfiguration); won't be retried.
);
```
`validation_failed` is not used in this PR, but I think it will be used in the next one, so I wanted to save us an extra migration.
- The notification looks like this:
<img width="472" alt="image" src="https://github.com/user-attachments/assets/e10efea0-1790-4e7f-a65c-f94c40fced27" />

### Latest notification views:
<img width="463" alt="image" src="https://github.com/user-attachments/assets/11310c58-68d1-4075-a497-f76d854633fe" />
<img width="725" alt="image" src="https://github.com/user-attachments/assets/6bbfe21a-91ac-47c3-a9d1-21807bb0c53a" />
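The hard-limit behavior described in the notes (limit of 3 by default, zero disables it, notify only once) could be sketched roughly as follows. This is an illustration only, not the code from the PR: the `Preset` struct, its `FailedBuilds` counter, and the `reconcile` helper are hypothetical names, and the real implementation derives failures from the `GetPresetsAtFailureLimit` query rather than an in-memory counter.

```go
package main

import "fmt"

// PrebuildStatus mirrors two values of the prebuild_status enum from the commit message.
type PrebuildStatus string

const (
	StatusNormal      PrebuildStatus = "normal"
	StatusHardLimited PrebuildStatus = "hard_limited"
)

// Preset is a hypothetical in-memory stand-in for a preset row.
type Preset struct {
	Name         string
	FailedBuilds int // assumed: number of failed prebuild attempts counted toward the limit
	Status       PrebuildStatus
}

// reconcile is a hypothetical helper. It reports whether the preset is hard
// limited and whether a notification should be sent on this loop iteration.
func reconcile(p *Preset, failureHardLimit int) (limited, notify bool) {
	// A limit of zero means the hard limit is disabled.
	if failureHardLimit == 0 {
		return false, false
	}
	if p.FailedBuilds < failureHardLimit {
		return false, false
	}
	// Notify only on the transition into hard_limited; later iterations
	// would only log a warning.
	notify = p.Status != StatusHardLimited
	p.Status = StatusHardLimited
	return true, notify
}

func main() {
	p := &Preset{Name: "particle-accelerator", FailedBuilds: 3, Status: StatusNormal}
	for i := 0; i < 2; i++ {
		limited, notify := reconcile(p, 3)
		fmt.Println(limited, notify) // first iteration: true true; second: true false
	}
}
```

Storing the status on the preset is what lets the second iteration skip the notification while still reporting the preset as limited.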
35 lines
2.0 KiB
Plaintext
{
  "_version": "1.1",
  "msg_id": "00000000-0000-0000-0000-000000000000",
  "payload": {
    "_version": "1.2",
    "notification_name": "Prebuild Failure Limit Reached",
    "notification_template_id": "00000000-0000-0000-0000-000000000000",
    "user_id": "00000000-0000-0000-0000-000000000000",
    "user_email": "bobby@coder.com",
    "user_name": "Bobby",
    "user_username": "bobby",
    "actions": [
      {
        "label": "View failed prebuilt workspaces",
        "url": "http://test.com/workspaces?filter=owner:prebuilds+status:failed+template:docker"
      },
      {
        "label": "View template version",
        "url": "http://test.com/templates/cern/docker/versions/angry_torvalds"
      }
    ],
    "labels": {
      "org": "cern",
      "preset": "particle-accelerator",
      "template": "docker",
      "template_version": "angry_torvalds"
    },
    "data": {},
    "targets": null
  },
  "title": "There is a problem creating prebuilt workspaces",
  "title_markdown": "There is a problem creating prebuilt workspaces",
  "body": "The number of failed prebuild attempts has reached the hard limit for template docker and preset particle-accelerator.\n\nTo resume prebuilds, fix the underlying issue and upload a new template version.\n\nRefer to the documentation for more details:\n\nTroubleshooting templates (https://coder.com/docs/admin/templates/troubleshooting)\nTroubleshooting of prebuilt workspaces (https://coder.com/docs/admin/templates/extending-templates/prebuilt-workspaces#administration-and-troubleshooting)",
  "body_markdown": "\nThe number of failed prebuild attempts has reached the hard limit for template **docker** and preset **particle-accelerator**.\n\nTo resume prebuilds, fix the underlying issue and upload a new template version.\n\nRefer to the documentation for more details:\n- [Troubleshooting templates](https://coder.com/docs/admin/templates/troubleshooting)\n- [Troubleshooting of prebuilt workspaces](https://coder.com/docs/admin/templates/extending-templates/prebuilt-workspaces#administration-and-troubleshooting)\n"
}
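
A webhook receiver could decode a payload of this shape with Go's standard `encoding/json` package. The sketch below is an assumption-laden illustration: the `Message`, `Payload`, and `Action` types and the `parse` helper are hypothetical names introduced here, they cover only a subset of the fields shown above, and they are not the types used in the coder codebase.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Action, Payload and Message are hypothetical types that cover only
// a subset of the fields in the golden file.
type Action struct {
	Label string `json:"label"`
	URL   string `json:"url"`
}

type Payload struct {
	NotificationName string            `json:"notification_name"`
	UserEmail        string            `json:"user_email"`
	Actions          []Action          `json:"actions"`
	Labels           map[string]string `json:"labels"`
}

type Message struct {
	Version string  `json:"_version"`
	MsgID   string  `json:"msg_id"`
	Payload Payload `json:"payload"`
	Title   string  `json:"title"`
}

// sample is a trimmed copy of the golden payload, for illustration only.
const sample = `{
  "_version": "1.1",
  "msg_id": "00000000-0000-0000-0000-000000000000",
  "payload": {
    "notification_name": "Prebuild Failure Limit Reached",
    "user_email": "bobby@coder.com",
    "actions": [
      {"label": "View template version",
       "url": "http://test.com/templates/cern/docker/versions/angry_torvalds"}
    ],
    "labels": {"org": "cern", "preset": "particle-accelerator", "template": "docker"}
  },
  "title": "There is a problem creating prebuilt workspaces"
}`

// parse decodes a raw webhook body into a Message; unknown fields are ignored.
func parse(raw []byte) (Message, error) {
	var m Message
	err := json.Unmarshal(raw, &m)
	return m, err
}

func main() {
	m, err := parse([]byte(sample))
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(m.Payload.NotificationName, "for preset", m.Payload.Labels["preset"])
}
```

Because `json.Unmarshal` ignores fields that have no matching struct field, a receiver written this way keeps working if additional keys are added to the payload.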