Tutorial 11: Manage Resource Drift

Learning Objectives

Understand what resource drift is and why it occurs
Learn to detect drift using Terraform commands
Practice resolving different types of drift scenarios
Implement drift prevention strategies
Use refresh-only mode and import functionality

What is Resource Drift?

Resource drift occurs when the actual state of your infrastructure differs from what Terraform expects based on its state file. This can happen when:

Manual Changes: Someone modifies resources outside of Terraform
External Automation: Other tools or scripts change infrastructure
Console Changes: Modifications made through cloud provider consoles
API Changes: Direct API calls that bypass Terraform
Process Failures: Incomplete Terraform operations

Types of Drift

1. Configuration Drift

Resource exists but has different configuration:

# Terraform expects
resource "aws_instance" "web" {
  instance_type = "t2.micro"
  # But actual instance is t2.small
}

2. Resource Deletion Drift

Resource deleted outside Terraform:

# Terraform expects resource to exist
resource "aws_instance" "web" {
  # But instance was terminated manually
}

3. Resource Creation Drift

New resources created outside Terraform:

# Terraform doesn't know about manually created resources
# that should be managed by Terraform

Detecting Resource Drift

Using terraform plan

The primary way to detect drift is through terraform plan:

# Check for drift
terraform plan

# Return exit code 0 (no changes), 1 (error), or 2 (changes present)
terraform plan -detailed-exitcode
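
Because -detailed-exitcode reports its result through the process exit status, scripts usually branch on that status. The following is a minimal sketch of such a branch; classify_plan_exit is a hypothetical helper name, not part of Terraform, and the labels it prints are arbitrary.

```shell
#!/bin/bash
# classify_plan_exit is a hypothetical helper (not a Terraform command).
# It maps the exit status of `terraform plan -detailed-exitcode` to a label:
#   0 = plan succeeded, no changes
#   1 = plan failed
#   2 = plan succeeded, changes present
classify_plan_exit() {
    case "$1" in
        0) echo "clean" ;;
        2) echo "changes" ;;
        *) echo "error" ;;
    esac
}

# Usage (assumes terraform is installed and the directory is initialized):
#   terraform plan -detailed-exitcode > /dev/null 2>&1
#   classify_plan_exit "$?"
```

Note that exit code 2 means the plan contains any change, so it also fires for configuration edits that have not been applied yet, not only for drift.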

Plan Output Examples

No Drift:

No changes. Your infrastructure matches the configuration.

Configuration Drift:

# aws_instance.web will be updated in-place
~ resource "aws_instance" "web" {
    ~ instance_type = "t2.small" -> "t2.micro"
      # (other attributes remain unchanged)
  }

Resource Deleted:

# aws_instance.web will be created
+ resource "aws_instance" "web" {
    + ami           = "ami-12345"
    + instance_type = "t2.micro"
    # ... other attributes
  }
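
When a plan touches many resources, it can help to count the header lines shown in the examples above. The sketch below does that with awk; summarize_plan_log is a hypothetical helper, and it assumes the header wording printed by the plan examples in this tutorial. Human-readable plan output is not a stable interface, so treat this as a quick triage aid rather than a parser.

```shell
#!/bin/bash
# summarize_plan_log is a hypothetical helper. It reads the text output of
# `terraform plan -no-color` on stdin and counts resource header lines by
# the action Terraform announces for each resource.
summarize_plan_log() {
    awk '
        /^ *# .* will be updated in-place/ { updated++ }
        /^ *# .* will be created/          { created++ }
        /^ *# .* will be destroyed/        { destroyed++ }
        /^ *# .* must be replaced/         { replaced++ }
        END {
            printf "updated=%d created=%d destroyed=%d replaced=%d\n", updated, created, destroyed, replaced
        }
    '
}

# Usage (assumes terraform is installed and the directory is initialized):
#   terraform plan -no-color | summarize_plan_log
```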

Using terraform refresh

# Refresh state from real infrastructure
# (deprecated since Terraform v0.15.4; it overwrites state without review)
terraform refresh

# Then check for changes
terraform plan

Refresh-Only Mode (Recommended)

# Safer refresh that shows what would change
terraform plan -refresh-only

# Apply refresh changes
terraform apply -refresh-only

Practical Drift Scenarios

Scenario 1: Manual Instance Type Change

Setup Infrastructure

# main.tf
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = "us-west-2"
}

data "aws_ami" "amazon_linux" {
  most_recent = true
  owners      = ["amazon"]
  
  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-*-x86_64-gp2"]
  }
}

resource "aws_instance" "web" {
  ami           = data.aws_ami.amazon_linux.id
  instance_type = "t2.micro"
  
  tags = {
    Name = "drift-example"
  }
}

output "instance_id" {
  value = aws_instance.web.id
}

Create Infrastructure

terraform init
terraform apply

Simulate Drift (Manual Change)

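# Get instance ID
INSTANCE_ID=$(terraform output -raw instance_id)

# The instance type can only be changed while the instance is stopped
aws ec2 stop-instances --instance-ids "$INSTANCE_ID"
aws ec2 wait instance-stopped --instance-ids "$INSTANCE_ID"

# Change instance type manually (simulates console change)
aws ec2 modify-instance-attribute \
  --instance-id "$INSTANCE_ID" \
  --instance-type '{"Value": "t2.small"}'

# Start the instance again
aws ec2 start-instances --instance-ids "$INSTANCE_ID"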

Detect and Resolve Drift

# Detect drift
terraform plan

Expected output:

# aws_instance.web will be updated in-place
~ resource "aws_instance" "web" {
    ~ instance_type = "t2.small" -> "t2.micro"
}

Resolution Options:

Revert to Terraform Configuration:

terraform apply  # Reverts instance back to t2.micro

Update Configuration to Match Reality:

resource "aws_instance" "web" {
  ami           = data.aws_ami.amazon_linux.id
  instance_type = "t2.small"  # Update to match actual state
  
  tags = {
    Name = "drift-example"
  }
}

Scenario 2: Resource Deleted Outside Terraform

Simulate Resource Deletion

# Terminate instance manually (INSTANCE_ID was set in Scenario 1)
aws ec2 terminate-instances --instance-ids "$INSTANCE_ID"

Detect Drift

terraform plan

Expected output:

# aws_instance.web will be created
+ resource "aws_instance" "web" {
    + ami           = "ami-12345"
    + instance_type = "t2.micro"
    # ... all attributes will be created
}

Resolution

# Recreate the resource
terraform apply

Scenario 3: Tag Modifications

Setup with Tags

resource "aws_instance" "web" {
  ami           = data.aws_ami.amazon_linux.id
  instance_type = "t2.micro"
  
  tags = {
    Name        = "drift-example"
    Environment = "development"
    Owner       = "terraform"
  }
}

Simulate Tag Changes

# Add tags manually
aws ec2 create-tags \
  --resources $INSTANCE_ID \
  --tags Key=Team,Value=DevOps Key=Project,Value=WebApp

# Modify existing tag
aws ec2 create-tags \
  --resources $INSTANCE_ID \
  --tags Key=Environment,Value=staging

Detect and Handle Tag Drift

terraform plan

Output shows tag differences:

# aws_instance.web will be updated in-place
~ resource "aws_instance" "web" {
    ~ tags = {
        ~ "Environment" = "staging" -> "development"
        - "Project"     = "WebApp" -> null
        - "Team"        = "DevOps" -> null
          "Name"        = "drift-example"
          "Owner"       = "terraform"
      }
}

Options:

Revert tags: terraform apply
Ignore specific tags: Use ignore_changes
Update configuration: Add new tags to Terraform
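
Choosing between these options is easier with a list of the tag keys that exist on the resource but not in the configuration. The sketch below filters such keys; unmanaged_tag_keys is a hypothetical helper, and the managed key list passed to it is assumed to be maintained by hand to mirror the tags block.

```shell
#!/bin/bash
# unmanaged_tag_keys is a hypothetical helper. It reads actual tag keys
# from stdin (one per line) and prints those that are not in the
# space-separated list of Terraform-managed keys given as $1.
unmanaged_tag_keys() {
    local managed=" $1 "
    local key
    while read -r key; do
        [ -z "$key" ] && continue
        case "$managed" in
            *" $key "*) ;;          # key is managed by Terraform
            *) echo "$key" ;;       # key was added outside Terraform
        esac
    done
}

# Usage (assumes the AWS CLI is configured and INSTANCE_ID is set):
#   aws ec2 describe-tags \
#     --filters "Name=resource-id,Values=$INSTANCE_ID" \
#     --query 'Tags[].Key' --output text | tr '\t' '\n' |
#     unmanaged_tag_keys "Name Environment Owner"
```

Keys in the output are candidates either for ignore_changes or for adoption into the configuration. The helper compares keys only, so a changed value on a managed key (such as Environment above) still has to be found with terraform plan.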

Scenario 4: Security Group Rule Changes

Setup Security Group

resource "aws_security_group" "web" {
  name        = "web-sg"
  description = "Security group for web server"
  
  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
  
  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
  
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
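}

# Output used by the CLI command in the next step
output "security_group_id" {
  value = aws_security_group.web.id
}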

Simulate Manual Rule Addition

# Add SSH rule manually
aws ec2 authorize-security-group-ingress \
  --group-id $(terraform output -raw security_group_id) \
  --protocol tcp \
  --port 22 \
  --cidr 0.0.0.0/0

Handle the Drift

Terraform will remove the manually added rule on next apply unless you:

Add rule to configuration:

resource "aws_security_group" "web" {
  # ... existing configuration
  
  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

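Use a separate resource for the rule. Do this only if every rule of the group is moved out of the inline ingress and egress blocks: the AWS provider documentation warns that mixing inline rules with standalone rule resources on the same security group causes conflicts and overwritten rules: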

resource "aws_security_group_rule" "manual_ssh" {
  type              = "ingress"
  from_port         = 22
  to_port           = 22
  protocol          = "tcp"
  cidr_blocks       = ["0.0.0.0/0"]
  security_group_id = aws_security_group.web.id
}

Advanced Drift Management

Using Lifecycle Rules to Prevent Drift

Ignore Changes

resource "aws_instance" "web" {
  ami           = data.aws_ami.amazon_linux.id
  instance_type = "t2.micro"
  
  tags = {
    Name = "web-server"
  }
  
  lifecycle {
    ignore_changes = [
      tags,           # Ignore all tag changes
      ami,            # Ignore AMI updates
      user_data,      # Ignore user data changes
    ]
  }
}

Ignore Specific Tag Changes

resource "aws_instance" "web" {
  ami           = data.aws_ami.amazon_linux.id
  instance_type = "t2.micro"
  
  tags = {
    Name        = "web-server"
    Environment = "production"
    ManagedBy   = "terraform"
  }
  
  lifecycle {
    ignore_changes = [
      tags["LastModified"],
      tags["Backup"],
    ]
  }
}

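Prevent Accidental Deletion

prevent_destroy makes Terraform reject any plan that would destroy the resource. It does not stop someone from deleting the resource outside Terraform: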

resource "aws_instance" "web" {
  ami           = data.aws_ami.amazon_linux.id
  instance_type = "t2.micro"
  
  lifecycle {
    prevent_destroy = true
  }
}

Import Existing Resources

Import Resource Not Managed by Terraform

# Import existing EC2 instance
terraform import aws_instance.existing i-1234567890abcdef0

# Import S3 bucket
terraform import aws_s3_bucket.existing my-existing-bucket

# Import security group
terraform import aws_security_group.existing sg-12345678

Import Process Example

# 1. Add resource configuration
resource "aws_instance" "imported" {
  ami           = "ami-12345"  # Will be updated after import
  instance_type = "t2.micro"  # Will be updated after import
  
  # Minimal configuration for import
}

# 2. Import the resource
terraform import aws_instance.imported i-1234567890abcdef0

# 3. Update configuration to match imported resource
terraform plan  # Shows what needs to be updated

# 4. Update configuration file to match actual state
terraform plan  # Should show no changes
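
When several resources need importing, step 2 can be generated from a list. The sketch below prints one terraform import command per input line instead of running it, so the commands can be reviewed first; print_import_commands and the two-column input format are assumptions made for illustration.

```shell
#!/bin/bash
# print_import_commands is a hypothetical helper. It reads
# "<resource address> <resource ID>" pairs from stdin, one per line, and
# prints the matching `terraform import` command for each pair.
# Blank lines and lines starting with # are skipped.
print_import_commands() {
    local address id
    while read -r address id; do
        case "$address" in
            ''|'#'*) continue ;;
        esac
        printf "terraform import '%s' '%s'\n" "$address" "$id"
    done
}

# Usage: keep the pairs in a file (imports.txt is a hypothetical name),
# review the printed commands, then run them:
#   print_import_commands < imports.txt
#   print_import_commands < imports.txt | sh
```

Each address still needs a matching resource block in the configuration before its import command will succeed, as in step 1.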

Automated Drift Detection

Drift Detection Script

#!/bin/bash
# drift-detection.sh

set -e

DRIFT_DETECTED=false

echo "Checking for infrastructure drift..."

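# Run plan and capture output. "|| EXIT_CODE=$?" records a non-zero status
# without tripping set -e ("$?" read inside "if ! cmd" would always be 0).
EXIT_CODE=0
terraform plan -detailed-exitcode -out=drift.plan > drift.log 2>&1 || EXIT_CODE=$?

if [ "$EXIT_CODE" -ne 0 ]; then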
    
    if [ $EXIT_CODE -eq 2 ]; then
        echo "DRIFT DETECTED: Infrastructure changes found"
        DRIFT_DETECTED=true
        
        # Show the differences
        echo "Drift details:"
        cat drift.log
        
        # Optional: Send notification
        # send_slack_notification "Drift detected in $ENV environment"
        
    else
        echo "ERROR: terraform plan failed"
        cat drift.log
        exit 1
    fi
else
    echo "No drift detected - infrastructure matches configuration"
fi

# Clean up
rm -f drift.plan drift.log

if [ "$DRIFT_DETECTED" = true ]; then
    exit 2  # Exit with code 2 to indicate drift
fi
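
The script above leaves send_slack_notification commented out and undefined. One possible way to fill it in is sketched below, assuming a Slack incoming webhook whose URL is stored in an environment variable named SLACK_WEBHOOK_URL (a hypothetical name). The payload builder escapes only backslashes and double quotes, so it assumes a single-line message.

```shell
#!/bin/bash
# Hypothetical implementation of the send_slack_notification call that is
# commented out in the drift detection script. SLACK_WEBHOOK_URL is an
# assumed environment variable holding a Slack incoming-webhook URL.

# Build the JSON body, escaping backslashes and double quotes in the message.
slack_payload() {
    local msg=$1
    msg=${msg//\\/\\\\}
    msg=${msg//\"/\\\"}
    printf '{"text": "%s"}' "$msg"
}

# POST the payload to the webhook; --fail makes curl return non-zero on
# an HTTP error status so the caller can detect a failed notification.
send_slack_notification() {
    curl --silent --show-error --fail \
        -X POST -H 'Content-Type: application/json' \
        --data "$(slack_payload "$1")" \
        "$SLACK_WEBHOOK_URL"
}

# Usage:
#   send_slack_notification "Drift detected in $ENV environment"
```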

Scheduled Drift Monitoring

# .github/workflows/drift-detection.yml
name: Infrastructure Drift Detection

on:
  schedule:
    - cron: '0 9 * * *'  # Daily at 09:00 UTC
  workflow_dispatch:

jobs:
  detect-drift:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2
        with:
          terraform_version: 1.5.0
          
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v2
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-west-2
          
      - name: Initialize Terraform
        run: terraform init
        
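      - name: Check for drift
        run: |
          exit_code=0
          terraform plan -detailed-exitcode -input=false || exit_code=$?
          if [ "$exit_code" -eq 2 ]; then
            echo "::warning::Infrastructure drift detected"
            # Send notification to Slack/Teams/Email
          elif [ "$exit_code" -ne 0 ]; then
            # Exit code 1 is a plan failure, not drift
            echo "::error::terraform plan failed"
            exit "$exit_code"
          fi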

Drift Resolution Strategies

1. Immediate Correction

# Immediately fix drift by applying configuration
# (-auto-approve skips the review prompt, so use it with care)
terraform apply -auto-approve

2. Planned Correction

# Create plan for review
terraform plan -out=drift-fix.plan

# Review plan with team
terraform show drift-fix.plan

# Apply after approval
terraform apply drift-fix.plan

3. Configuration Update

# Update Terraform configuration to match reality
# Then run plan to verify
terraform plan

4. Selective Correction

# Fix only specific resources
# (-target is intended for exceptional cases, not routine use)
terraform apply -target=aws_instance.web

# Or fix multiple specific resources
terraform apply -target=aws_instance.web -target=aws_security_group.web
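
With more than a couple of addresses, the repeated -target options can be built from a list. The sketch below is one way to do that in bash; build_target_args and the TARGET_ARGS array are hypothetical names chosen for this example.

```shell
#!/bin/bash
# build_target_args is a hypothetical helper. It turns a list of resource
# addresses into repeated -target=ADDRESS options and stores them in the
# TARGET_ARGS array, one array element per option.
build_target_args() {
    TARGET_ARGS=()
    local address
    for address in "$@"; do
        TARGET_ARGS+=("-target=$address")
    done
}

# Usage (assumes terraform is installed and the directory is initialized):
#   build_target_args aws_instance.web aws_security_group.web
#   terraform plan "${TARGET_ARGS[@]}"
```

Keeping the options in an array rather than a single string preserves addresses that contain brackets or quotes, such as indexed resources.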

Drift Prevention Best Practices

1. Access Control

# IAM policy to block manual changes: denies drift-prone actions
# unless the caller is the Terraform role
data "aws_iam_policy_document" "restrict_console" {
  statement {
    effect = "Deny"
    actions = [
      "ec2:ModifyInstanceAttribute",
      "ec2:TerminateInstances",
      "s3:DeleteBucket",
      "rds:ModifyDBInstance"
    ]
    resources = ["*"]

    # Assumption: Terraform runs as a role whose name starts with "terraform-"
    condition {
      test     = "ArnNotLike"
      variable = "aws:PrincipalArn"
      values   = ["arn:aws:iam::*:role/terraform-*"]
    }
  }
}

2. Resource Tagging Strategy

locals {
  common_tags = {
    ManagedBy   = "terraform"
    Environment = var.environment
    Project     = var.project_name
    Owner       = var.team_email
    
    # Add automation tags
    TerraformConfig = basename(path.cwd)
    TerraformState  = "s3://bucket/path/terraform.tfstate"
  }
}

resource "aws_instance" "web" {
  ami           = data.aws_ami.amazon_linux.id
  instance_type = var.instance_type
  
  tags = merge(local.common_tags, {
    Name = "${var.project_name}-web-server"
    Role = "web-server"
  })
}

3. Monitoring and Alerting

# EventBridge rule for instance state changes
# (EC2 publishes state changes as events, not as a CloudWatch metric)
resource "aws_cloudwatch_event_rule" "instance_state_change" {
  name        = "ec2-instance-state-change"
  description = "Instance state changed, possibly outside of Terraform"

  event_pattern = jsonencode({
    source        = ["aws.ec2"]
    "detail-type" = ["EC2 Instance State-change Notification"]
    detail = {
      "instance-id" = [aws_instance.web.id]
    }
  })
}

# Send matching events to the SNS topic (the topic policy must allow
# events.amazonaws.com to publish)
resource "aws_cloudwatch_event_target" "alerts" {
  rule = aws_cloudwatch_event_rule.instance_state_change.name
  arn  = aws_sns_topic.alerts.arn
}

4. Regular Drift Checks

#!/bin/bash
# Regular drift check as part of deployment pipeline (add to CI/CD)

echo "Checking for drift before deployment..."

terraform plan -detailed-exitcode > /dev/null
EXIT_CODE=$?

if [ "$EXIT_CODE" -eq 2 ]; then
    echo "WARNING: Drift detected before deployment"
    echo "Please review and resolve drift before proceeding"

    # Show drift details
    terraform plan

    # Optionally fail the pipeline
    # exit 1
elif [ "$EXIT_CODE" -ne 0 ]; then
    echo "ERROR: terraform plan failed"
    exit 1
else
    echo "No drift detected, proceeding with deployment"
fi

5. Documentation and Training

# Infrastructure Change Policy

## Approved Methods for Infrastructure Changes
1. ✅ Terraform configurations in version control
2. ✅ Emergency procedures with immediate Terraform update
3. ❌ Manual changes through AWS Console
4. ❌ Direct AWS CLI modifications
5. ❌ Third-party tools without Terraform integration

## Emergency Change Procedure
1. Make necessary manual change
2. Immediately update Terraform configuration
3. Run `terraform plan` to verify configuration matches reality
4. Commit configuration changes to version control
5. Document the emergency change in incident log

## Drift Detection Schedule
- Daily automated checks
- Pre-deployment drift verification
- Weekly drift reports
- Monthly drift review meetings

Troubleshooting Common Drift Issues

Issue 1: State File Out of Sync

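# Symptoms: Plan shows unexpected changes
# Solution: Refresh state with refresh-only mode, which shows the
# detected changes and asks for approval before writing them to state
terraform plan -refresh-only
terraform apply -refresh-only

# Legacy equivalent (deprecated; updates state without review)
terraform refresh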

Issue 2: Resource Not Found

# Symptoms: Error about resource not existing
# Solution: Remove from state if intentionally deleted
terraform state rm aws_instance.deleted_manually

# Or recreate if accidentally deleted
terraform apply

Issue 3: Import Conflicts

# Symptoms: Resource already exists error during apply
# Solution: Import existing resource
terraform import aws_instance.existing i-1234567890abcdef0

# Then update configuration to match
terraform plan

Issue 4: Permission Issues

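# Symptoms: Access denied errors during drift detection
# Solution: Verify IAM permissions
aws sts get-caller-identity
aws iam get-user  # IAM users only; fails for assumed roles

# Check specific permissions (with --dry-run, a "DryRunOperation" error
# means the call is permitted; "UnauthorizedOperation" means it is not)
aws ec2 describe-instances --dry-run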

Key Takeaways

Resource drift is inevitable in dynamic environments
Regular drift detection prevents small issues from becoming big problems
Use terraform plan regularly to detect drift early
Implement access controls to minimize unauthorized changes
Use lifecycle rules to ignore expected external changes
Have procedures for both preventing and resolving drift
Import existing resources rather than recreating them
Monitor infrastructure changes with CloudWatch and alerts
Train team members on proper change procedures

Next Steps

Complete Tutorial 12: Use Refresh-Only Mode
Learn about Terraform workspaces for environment management
Explore advanced state management techniques
Practice with complex drift scenarios
