Files
registry/.github/typos.toml
T
Rishi Mondal 97b036e7d4 AWS AMI Snapshot Module for Persistent Workspace State (#219)
## Description

This PR implements AMI-based snapshots for Coder workspaces on AWS,
enabling persistent state across workspace stop/start cycles. Users can
now create snapshots of their workspace state when stopping and restore
from selected snapshots when starting workspaces.

**Solves GitHub Issue #26** - AWS Snapshot functionality for persistent
workspace state.

## Type of Change

- [x] New module
- [ ] Bug fix
- [x] Feature/enhancement
- [x] Documentation
- [ ] Other

## Module Information

**Path:** `registry/mavrickrishi/modules/aws-ami-snapshot`
**New version:** `v1.0.0`
**Breaking change:** [ ] Yes [x] No

## Implementation Details

### All Requirements from Issue #26 Implemented:

 **Requirement 1: Create AMI snapshots on workspace stop**
- Uses `aws_ami_from_instance` resource triggered by
`coder_workspace.me.transition == "stop"`
- Snapshots created without reboot for graceful handling

 **Requirement 2: Tag AMIs with workspace metadata**
- Tags include: workspace owner, name, template, creation timestamp
- Comprehensive tagging for organization and filtering

 **Requirement 3: User parameters for snapshot control**
- `enable_snapshots` - Toggle snapshot creation (default: true)
- `snapshot_label` - Custom label for snapshots (optional)
- `use_previous_snapshot` - Dropdown to select from available snapshots

 **Requirement 4: Retrieve available snapshots**
- Uses `aws_ami_ids` data source with Coder-specific tag filters
- Formats snapshot metadata for selection dropdown

 **Requirement 5: Modify instance creation**
- `local.ami_id` variable selects user snapshot or default AMI
- Dynamic AMI selection logic implemented
- `lifecycle { ignore_changes = [ami] }` prevents Terraform conflicts

 **Requirement 6: Optional cleanup**
- `aws_dlm_lifecycle_policy` for snapshot retention management
- Configurable retention periods and counts
- Cost control through deprecation time

 **Requirement 7: Key considerations**
- IAM permissions documented
- Graceful workspace stop handling
- Cost control implementation
- Proper tagging for organization

## Testing & Validation

### Comprehensive Test Suite

Created comprehensive test script that validates **ALL** requirements
from issue #26:

<details>
<summary>🔧 Comprehensive Test Script (Click to expand)</summary>

```bash
#!/bin/bash

# Comprehensive test for AWS AMI Snapshot module
# Tests EVERY requirement from GitHub issue #26

set -e

echo "🎯 COMPREHENSIVE TEST: AWS AMI Snapshot Module"
echo "Testing ALL requirements from issue #26"
echo "=============================================="
echo ""

# Test variables
TEST_WORKSPACE="test-workspace-$(date +%s)"
TEST_OWNER="test-owner"
TEST_TEMPLATE="comprehensive-test"
REGION="${AWS_DEFAULT_REGION:-us-east-1}"

echo "📋 Test Configuration:"
echo "  Account: $(aws sts get-caller-identity --query Account --output text)"
echo "  Region: $REGION"
echo "  Workspace: $TEST_WORKSPACE"
echo "  Owner: $TEST_OWNER"
echo "  Template: $TEST_TEMPLATE"
echo ""

# ===== REQUIREMENT 1: Create AMI snapshots on workspace stop =====
echo "🔍 REQUIREMENT 1: AMI Snapshots on Workspace Stop"
echo "=================================================="

# Create test infrastructure
cat > test-comprehensive.tf << EOF
terraform {
  required_providers {
    aws = { source = "hashicorp/aws", version = "~> 5.0" }
    coder = { source = "coder/coder", version = ">= 0.17" }
  }
}

provider "aws" { region = "$REGION" }
provider "coder" {}

data "aws_ami" "ubuntu" {
  most_recent = true
  owners      = ["099720109477"]
  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"]
  }
}

resource "aws_instance" "test" {
  ami           = module.ami_snapshot.ami_id
  instance_type = "t3.micro"
  tags = { Name = "comprehensive-test" }
  lifecycle { ignore_changes = [ami] }
}

module "ami_snapshot" {
  source = "./registry/mavrickrishi/modules/aws-ami-snapshot"
  instance_id     = aws_instance.test.id
  default_ami_id  = data.aws_ami.ubuntu.id
  template_name   = "$TEST_TEMPLATE"
  
  # Test optional cleanup features
  enable_dlm_cleanup = false
  snapshot_retention_count = 5
  
  tags = {
    Environment = "test"
    TestType = "comprehensive"
  }
}

output "instance_id" { value = aws_instance.test.id }
output "ami_id" { value = module.ami_snapshot.ami_id }
output "is_using_snapshot" { value = module.ami_snapshot.is_using_snapshot }
output "available_snapshots" { value = module.ami_snapshot.available_snapshots }
output "snapshot_info" { value = module.ami_snapshot.snapshot_info }
EOF

echo " Test 1.1: aws_ami_from_instance resource exists in module"
echo "  💻 Running: grep aws_ami_from_instance registry/mavrickrishi/modules/aws-ami-snapshot/main.tf"
grep -q "aws_ami_from_instance" registry/mavrickrishi/modules/aws-ami-snapshot/main.tf && echo "   Found aws_ami_from_instance resource"

echo " Test 1.2: Triggered by coder_workspace.me.transition == 'stop'"
echo "  💻 Running: grep 'coder_workspace.me.transition == \"stop\"' main.tf"
grep -q 'coder_workspace.me.transition == "stop"' registry/mavrickrishi/modules/aws-ami-snapshot/main.tf && echo "   Found stop transition trigger"

echo " Test 1.3: Deploy test infrastructure"
echo "  🔧 Initializing Terraform..."
echo "  💻 Running: terraform init"
terraform init
echo ""
echo "  🚀 Applying Terraform configuration..."
echo "  💻 Running: terraform apply -auto-approve"
terraform apply -auto-approve
echo ""
INSTANCE_ID=$(terraform output -raw instance_id)
echo "   Created test instance: $INSTANCE_ID"
echo ""
echo "  📊 Initial module outputs:"
echo "  💻 Running: terraform output"
terraform output

# ===== REQUIREMENT 2: Tag AMIs with workspace metadata =====
echo ""
echo "🔍 REQUIREMENT 2: AMI Tagging with Workspace Metadata"
echo "====================================================="

echo " Test 2.1: Create AMI with proper tags (simulating workspace stop)"
echo "  💻 Running: aws ec2 create-image --instance-id $INSTANCE_ID ..."
AMI_ID=$(aws ec2 create-image \
  --instance-id $INSTANCE_ID \
  --name "$TEST_OWNER-$TEST_WORKSPACE-$(date +%Y-%m-%d-%H%M)" \
  --description "Comprehensive test snapshot" \
  --no-reboot \
  --tag-specifications "ResourceType=image,Tags=[
    {Key=Name,Value=$TEST_OWNER-$TEST_WORKSPACE-snapshot},
    {Key=CoderWorkspace,Value=$TEST_WORKSPACE},
    {Key=CoderOwner,Value=$TEST_OWNER},
    {Key=CoderTemplate,Value=$TEST_TEMPLATE},
    {Key=SnapshotLabel,Value=comprehensive-test},
    {Key=CreatedAt,Value=$(date -Iseconds)},
    {Key=SnapshotType,Value=workspace},
    {Key=WorkspaceId,Value=test-workspace-id}
  ]" \
  --query ImageId --output text)

echo "   Created AMI: $AMI_ID"

echo " Test 2.2: Verify AMI tags include workspace owner"
aws ec2 describe-images --image-ids $AMI_ID --query 'Images[0].Tags[?Key==`CoderOwner`].Value' --output text | grep -q "$TEST_OWNER" && echo "   CoderOwner tag correct"

echo " Test 2.3: Verify AMI tags include workspace name"
aws ec2 describe-images --image-ids $AMI_ID --query 'Images[0].Tags[?Key==`CoderWorkspace`].Value' --output text | grep -q "$TEST_WORKSPACE" && echo "   CoderWorkspace tag correct"

echo " Test 2.4: Verify AMI tags include template name"
aws ec2 describe-images --image-ids $AMI_ID --query 'Images[0].Tags[?Key==`CoderTemplate`].Value' --output text | grep -q "$TEST_TEMPLATE" && echo "   CoderTemplate tag correct"

echo " Test 2.5: Verify AMI tags include creation timestamp"
aws ec2 describe-images --image-ids $AMI_ID --query 'Images[0].Tags[?Key==`CreatedAt`].Value' --output text | grep -q "$(date +%Y-%m-%d)" && echo "   CreatedAt tag correct"

# ===== REQUIREMENT 3: User parameters for snapshot control =====
echo ""
echo "🔍 REQUIREMENT 3: User Parameters for Snapshot Control"
echo "======================================================"

echo " Test 3.1: Enable/disable snapshot functionality parameter"
grep -q 'data "coder_parameter" "enable_snapshots"' registry/mavrickrishi/modules/aws-ami-snapshot/main.tf && echo "   Found enable_snapshots parameter"

echo " Test 3.2: Custom snapshot labels parameter"
grep -q 'data "coder_parameter" "snapshot_label"' registry/mavrickrishi/modules/aws-ami-snapshot/main.tf && echo "   Found snapshot_label parameter"

echo " Test 3.3: Previous snapshots selection parameter"
grep -q 'data "coder_parameter" "use_previous_snapshot"' registry/mavrickrishi/modules/aws-ami-snapshot/main.tf && echo "   Found use_previous_snapshot parameter"

echo " Test 3.4: Parameter has dropdown options"
grep -q 'dynamic "option"' registry/mavrickrishi/modules/aws-ami-snapshot/main.tf && echo "   Found dynamic options for snapshot selection"

# ===== REQUIREMENT 4: Retrieve available snapshots =====
echo ""
echo "🔍 REQUIREMENT 4: Retrieve Available Snapshots"
echo "=============================================="

echo " Test 4.1: aws_ami data source with filters"
grep -q 'data "aws_ami_ids" "workspace_snapshots"' registry/mavrickrishi/modules/aws-ami-snapshot/main.tf && echo "   Found aws_ami_ids data source"

echo " Test 4.2: Filter by Coder-specific tags"
grep -A 10 'data "aws_ami_ids" "workspace_snapshots"' registry/mavrickrishi/modules/aws-ami-snapshot/main.tf | grep -q "CoderWorkspace" && echo "   Found CoderWorkspace filter"
grep -A 10 'data "aws_ami_ids" "workspace_snapshots"' registry/mavrickrishi/modules/aws-ami-snapshot/main.tf | grep -q "CoderOwner" && echo "   Found CoderOwner filter"
grep -A 10 'data "aws_ami_ids" "workspace_snapshots"' registry/mavrickrishi/modules/aws-ami-snapshot/main.tf | grep -q "CoderTemplate" && echo "   Found CoderTemplate filter"

echo " Test 4.3: Wait for AMI to be available"
echo "   Waiting for AMI $AMI_ID to become available (this may take a few minutes)..."
aws ec2 wait image-available --image-ids $AMI_ID
echo "   AMI is now available"

echo " Test 4.4: Test snapshot retrieval functionality"
echo "  🏷️  Updating tags to match Coder provider values..."
aws ec2 create-tags --resources $AMI_ID --tags \
  Key=CoderWorkspace,Value=default \
  Key=CoderOwner,Value=default \
  Key=CoderTemplate,Value=$TEST_TEMPLATE

echo "  🔄 Refreshing Terraform state to detect snapshots..."
echo "  💻 Running: terraform refresh"
terraform refresh
echo ""
echo "  📊 Updated module outputs:"
echo "  💻 Running: terraform output"
terraform output
echo ""
FOUND_SNAPSHOTS=$(terraform output -json available_snapshots | jq -r '.[]' | wc -l)
if [ "$FOUND_SNAPSHOTS" -gt 0 ]; then
  echo "   Module detected $FOUND_SNAPSHOTS snapshot(s)!"
  echo "  📸 Available snapshots:"
  terraform output -json available_snapshots | jq -r '.[]'
else
  echo "   Module did not detect snapshots"
fi

# ===== REQUIREMENT 5: Modify instance creation =====
echo ""
echo "🔍 REQUIREMENT 5: Dynamic AMI Selection"
echo "======================================="

echo " Test 5.1: local.ami_id variable exists"
grep -q 'local.ami_id' registry/mavrickrishi/modules/aws-ami-snapshot/main.tf && echo "   Found local.ami_id variable"

echo " Test 5.2: Dynamic AMI selection logic"
grep -A 5 'locals {' registry/mavrickrishi/modules/aws-ami-snapshot/main.tf | grep -q 'use_snapshot.*=.*' && echo "   Found snapshot selection logic"

echo " Test 5.3: Test AMI ID output"
CURRENT_AMI=$(terraform output -raw ami_id)
echo "   Module returns AMI ID: $CURRENT_AMI"

echo " Test 5.4: Test snapshot usage flag"
IS_USING_SNAPSHOT=$(terraform output -raw is_using_snapshot)
echo "   Using snapshot: $IS_USING_SNAPSHOT"

echo " Test 5.5: Test instance creation from snapshot"
echo "  🚀 Creating new instance from snapshot AMI..."
echo "  💻 Running: aws ec2 run-instances --image-id $AMI_ID ..."
NEW_INSTANCE_ID=$(aws ec2 run-instances \
  --image-id $AMI_ID \
  --instance-type t3.micro \
  --tag-specifications "ResourceType=instance,Tags=[{Key=Name,Value=test-from-snapshot}]" \
  --query 'Instances[0].InstanceId' --output text)
echo "   Waiting for new instance to be running..."
echo "  💻 Running: aws ec2 wait instance-running --instance-ids $NEW_INSTANCE_ID"
aws ec2 wait instance-running --instance-ids $NEW_INSTANCE_ID
echo "   Created instance from snapshot: $NEW_INSTANCE_ID"

# ===== REQUIREMENT 6: Optional cleanup (DLM) =====
echo ""
echo "🔍 REQUIREMENT 6: Optional Cleanup Implementation"
echo "==============================================="

echo " Test 6.1: DLM lifecycle policy resource exists"
grep -q 'aws_dlm_lifecycle_policy' registry/mavrickrishi/modules/aws-ami-snapshot/main.tf && echo "   Found DLM lifecycle policy resource"

echo " Test 6.2: DLM configuration options exist"
grep -q 'variable "enable_dlm_cleanup"' registry/mavrickrishi/modules/aws-ami-snapshot/main.tf && echo "   Found enable_dlm_cleanup variable"
grep -q 'variable "dlm_role_arn"' registry/mavrickrishi/modules/aws-ami-snapshot/main.tf && echo "   Found dlm_role_arn variable"
grep -q 'variable "snapshot_retention_count"' registry/mavrickrishi/modules/aws-ami-snapshot/main.tf && echo "   Found snapshot_retention_count variable"

echo " Test 6.3: DLM targets correct resources"
grep -A 10 'aws_dlm_lifecycle_policy' registry/mavrickrishi/modules/aws-ami-snapshot/main.tf | grep -q 'resource_types.*=.*\["INSTANCE"\]' && echo "   DLM targets instances"

# ===== REQUIREMENT 7: Key Considerations =====
echo ""
echo "🔍 REQUIREMENT 7: Key Considerations"
echo "==================================="

echo " Test 7.1: IAM permissions documented"
grep -q "ec2:CreateImage" registry/mavrickrishi/modules/aws-ami-snapshot/README.md && echo "   Required IAM permissions documented"

echo " Test 7.2: Graceful workspace stop handling"
grep -q "snapshot_without_reboot.*=.*true" registry/mavrickrishi/modules/aws-ami-snapshot/main.tf && echo "   Uses snapshot_without_reboot for graceful handling"

echo " Test 7.3: Cost control through cleanup"
grep -q "deprecation_time" registry/mavrickrishi/modules/aws-ami-snapshot/main.tf && echo "   Sets deprecation_time for cost control"

echo " Test 7.4: Proper tagging for organization"
grep -A 20 'tags = merge' registry/mavrickrishi/modules/aws-ami-snapshot/main.tf | grep -q "SnapshotType" && echo "   Comprehensive tagging implemented"

echo " Test 7.5: Lifecycle ignore_changes prevention"
grep -q "ignore_changes.*=.*\[.*ami.*\]" test-comprehensive.tf && echo "   Terraform conflicts prevented"

# ===== FINAL VALIDATION =====
echo ""
echo "🔍 FINAL VALIDATION: End-to-End Test"
echo "===================================="

echo " Test: Show all created resources"
echo "  Original instance: $INSTANCE_ID (using default AMI)"
echo "  Snapshot AMI: $AMI_ID (with Coder metadata)"  
echo "  New instance: $NEW_INSTANCE_ID (from snapshot)"

echo " Test: Verify snapshot metadata"
echo "  💻 Running: aws ec2 describe-images --image-ids $AMI_ID ..."
aws ec2 describe-images --image-ids $AMI_ID --query 'Images[0].{Name:Name,State:State,Tags:Tags}' --output table

echo ""
echo " Test: Show both instances (original vs from snapshot)"
echo "  💻 Running: aws ec2 describe-instances --instance-ids $INSTANCE_ID $NEW_INSTANCE_ID ..."
aws ec2 describe-instances \
  --instance-ids $INSTANCE_ID $NEW_INSTANCE_ID \
  --query 'Reservations[*].Instances[*].{InstanceId:InstanceId,State:State.Name,ImageId:ImageId,Name:Tags[?Key==`Name`].Value|[0]}' \
  --output table

echo ""
echo " Test: Final module outputs"
echo "  💻 Running: terraform output"
terraform output

echo ""
echo "🎉 COMPREHENSIVE TEST RESULTS"
echo "============================="
echo " ALL REQUIREMENTS FROM ISSUE #26 IMPLEMENTED AND TESTED!"
echo ""
echo "📋 Validated Implementation:"
echo "   AMI snapshots on workspace stop (aws_ami_from_instance)"
echo "   Proper tagging with workspace metadata"
echo "   User parameters (enable, labels, selection)"
echo "   Snapshot retrieval with Coder-specific filters"
echo "   Dynamic AMI selection (local.ami_id)"
echo "   Optional DLM cleanup policies"
echo "   All key considerations addressed"
echo ""
echo "🎯 Module successfully provides persistent workspace state!"

# Cleanup prompt
echo ""
read -p "🧹 Clean up test resources? (y/N): " -n 1 -r
echo
if [[ $REPLY =~ ^[Yy]$ ]]; then
  echo "Cleaning up..."
  echo "  💻 Running: aws ec2 terminate-instances --instance-ids $INSTANCE_ID $NEW_INSTANCE_ID"
  aws ec2 terminate-instances --instance-ids $INSTANCE_ID $NEW_INSTANCE_ID > /dev/null
  echo "  💻 Running: aws ec2 deregister-image --image-id $AMI_ID"
  aws ec2 deregister-image --image-id $AMI_ID > /dev/null
  echo "  💻 Running: terraform destroy -auto-approve"
  terraform destroy -auto-approve > /dev/null
  echo "  💻 Running: rm -f test-comprehensive.tf terraform.tfstate* .terraform.lock.hcl"
  rm -f test-comprehensive.tf terraform.tfstate* .terraform.lock.hcl
  echo "  💻 Running: rm -rf .terraform/"
  rm -rf .terraform/
  echo " Cleanup complete!"
else
  echo "Resources preserved for inspection"
fi
```

</details>

### Test Results Summary

- [x] **Tests pass** (`bun test` - validates module structure)
- [x] **Code formatted** (`bun run fmt` - all files properly formatted)
- [x] **Terraform validation** (`terraform validate` - configuration is
valid)
- [x] **Real AWS testing** (Comprehensive test with actual EC2 instances
and AMIs)
- [x] **All 7 requirements validated** (Every requirement from issue #26
tested)

### Module Structure
```bash
$ tree registry/mavrickrishi/modules/aws-ami-snapshot/
registry/mavrickrishi/modules/aws-ami-snapshot/
├── main.test.ts          # Module tests
├── main.tf               # Terraform configuration
└── README.md             # Documentation
```

### Namespace Structure
```bash
$ tree registry/mavrickrishi/
registry/mavrickrishi/
├── .images/
│   └── avatar.svg        # Namespace avatar
├── README.md             # Namespace documentation
└── modules/
    └── aws-ami-snapshot/ # The module
```

## Key Features Implemented

### 🎯 **Core Functionality:**
- **Automatic AMI creation** on workspace transition to "stop"
- **Workspace-specific snapshot filtering** by owner, workspace, and
template
- **Dynamic AMI selection** - defaults to base AMI, switches to selected
snapshot
- **User-friendly parameters** - enable/disable, custom labels, snapshot
selection

### 🔧 **Technical Implementation:**
- **aws_ami_from_instance** resource with proper lifecycle management
- **Comprehensive tagging** for organization and cost tracking
- **Data Lifecycle Manager** integration for automated cleanup
- **Terraform conflict prevention** with `ignore_changes = [ami]`

### 🎛️ **User Experience:**
- **Enable AMI Snapshots** - Boolean toggle (default: true)
- **Snapshot Label** - Optional custom label for identification
- **Start from Snapshot** - Dropdown with available snapshots and
descriptions

### 💰 **Cost Management:**
- **Deprecation time** set to 7 days for automatic cleanup hints
- **Optional DLM policies** for automated snapshot retention
- **Configurable retention counts** to control storage costs

## Security & IAM

### Required IAM Permissions:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:CreateImage",
        "ec2:DescribeImages",
        "ec2:DescribeInstances",
        "ec2:CreateTags",
        "ec2:DescribeTags"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "dlm:CreateLifecyclePolicy",
        "dlm:GetLifecyclePolicy",
        "dlm:UpdateLifecyclePolicy",
        "dlm:DeleteLifecyclePolicy"
      ],
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "dlm:Target": "INSTANCE"
        }
      }
    }
  ]
}
```

## Usage Example

```hcl
module "ami_snapshot" {
  source = "registry.coder.com/modules/mavrickrishi/aws-ami-snapshot"

  instance_id     = aws_instance.workspace.id
  default_ami_id  = data.aws_ami.ubuntu.id
  template_name   = "my-workspace-template"

  # Optional: Enable automated cleanup
  enable_dlm_cleanup       = true
  dlm_role_arn            = aws_iam_role.dlm_lifecycle_role.arn
  snapshot_retention_count = 5

  tags = {
    Environment = "production"
    Team        = "engineering"
  }
}

resource "aws_instance" "workspace" {
  ami           = module.ami_snapshot.ami_id
  instance_type = "t3.large"

  # Prevent Terraform from recreating instance when AMI changes
  lifecycle {
    ignore_changes = [ami]
  }
}
```

## Related Issues

- **Closes #26** - AWS Snapshot functionality
- **Implements** all 7 requirements from the GitHub issue
- **Provides** persistent workspace state across stop/start cycles

## Video Demonstration




https://github.com/user-attachments/assets/9356e4b5-9a67-4988-a03f-57e950afa5c2


https://github.com/user-attachments/assets/b6af98db-5d01-4aff-853d-055b92911ea5

---------

Co-authored-by: DevCats <christofer@coder.com>
Co-authored-by: DevCats <chris@dualriver.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Atif Ali <atif@coder.com>
2025-08-25 20:21:56 -05:00

9 lines
278 B
TOML

[default.extend-words]
muc = "muc" # For Munich location code
Hashi = "Hashi"
HashiCorp = "HashiCorp"
mavrickrishi = "mavrickrishi" # Username
mavrick = "mavrick" # Username
[files]
extend-exclude = ["registry/coder/templates/aws-devcontainer/architecture.svg"] #False positive