Tutorial 11: Manage Resource Drift
Learning Objectives
- Understand what resource drift is and why it occurs
- Learn to detect drift using Terraform commands
- Practice resolving different types of drift scenarios
- Implement drift prevention strategies
- Use refresh-only mode and import functionality
What is Resource Drift?
Resource drift occurs when the actual state of your infrastructure differs from what Terraform expects based on its state file. This can happen when:
- Manual Changes: Someone modifies resources outside of Terraform
- External Automation: Other tools or scripts change infrastructure
- Console Changes: Modifications made through cloud provider consoles
- API Changes: Direct API calls that bypass Terraform
- Process Failures: Incomplete Terraform operations
Types of Drift
1. Configuration Drift
Resource exists but has different configuration:
# Terraform expects
resource "aws_instance" "web" {
instance_type = "t2.micro"
# But actual instance is t2.small
}
2. Resource Deletion Drift
Resource deleted outside Terraform:
# Terraform expects resource to exist
resource "aws_instance" "web" {
# But instance was terminated manually
}
3. Resource Creation Drift
New resources created outside Terraform:
# Terraform doesn't know about manually created resources
# that should be managed by Terraform, so terraform plan will not
# report them; they have to be found by other means and imported
Detecting Resource Drift
Using terraform plan
The primary way to detect drift is to run terraform plan:
# Check for drift
terraform plan
# Return a script-friendly exit code: 0 = no changes, 1 = error, 2 = changes present
terraform plan -detailed-exitcode
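Because -detailed-exitcode reports the result through the exit status, a script can branch on it. The following is a minimal sketch, assuming it runs in an initialized working directory. The function names plan_status and check_drift are illustrative and not part of Terraform.

```shell
#!/usr/bin/env bash

# plan_status: translate a `terraform plan -detailed-exitcode` status into a label.
plan_status() {
  case "$1" in
    0) echo "clean" ;;     # plan succeeded, nothing to change
    2) echo "changes" ;;   # plan succeeded and contains changes (possible drift)
    *) echo "error" ;;     # 1 (or anything else): the plan itself failed
  esac
}

# check_drift: run the plan quietly and print the label for its exit status.
check_drift() {
  local code=0
  terraform plan -detailed-exitcode -input=false >/dev/null 2>&1 || code=$?
  plan_status "$code"
}
```

Calling check_drift prints one of the three labels. Note that exit code 2 also fires for configuration changes that have not been applied yet, so "changes" means drift only when the configuration itself is unchanged since the last apply.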
Plan Output Examples
No Drift:
No changes. Your infrastructure matches the configuration.
Configuration Drift:
# aws_instance.web will be updated in-place
~ resource "aws_instance" "web" {
~ instance_type = "t2.small" -> "t2.micro"
# (other attributes remain unchanged)
}
Resource Deleted:
# aws_instance.web will be created
+ resource "aws_instance" "web" {
+ ami = "ami-12345"
+ instance_type = "t2.micro"
# ... other attributes
}
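When a plan touches many resources, counting the action headers gives a quick overview. The sketch below reads human-readable plan output on stdin and counts the header phrases shown above, plus "will be destroyed" and "must be replaced". The function name summarize_plan is hypothetical, and because the wording of plan output can change between Terraform versions, treat this as a convenience rather than a stable interface.

```shell
#!/usr/bin/env bash

# summarize_plan: count the per-resource action headers in plan output read from stdin.
summarize_plan() {
  local text updated created destroyed replaced
  text=$(cat)
  # grep -c exits non-zero when the count is 0, so "|| true" keeps the 0 and the script alive.
  updated=$(printf '%s\n' "$text" | grep -c 'will be updated in-place' || true)
  created=$(printf '%s\n' "$text" | grep -c 'will be created' || true)
  destroyed=$(printf '%s\n' "$text" | grep -c 'will be destroyed' || true)
  replaced=$(printf '%s\n' "$text" | grep -c 'must be replaced' || true)
  echo "updated=$updated created=$created destroyed=$destroyed replaced=$replaced"
}
```

A typical call would be terraform plan -no-color | summarize_plan. The -no-color flag keeps terminal escape codes out of the text being matched.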
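Using terraform refresh (Deprecated)
# Refresh state from real infrastructure.
# Deprecated since Terraform v0.15.4: it overwrites state without
# showing the changes first. Prefer refresh-only mode.
terraform refresh
# Then check for changes
terraform plan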
Refresh-Only Mode (Recommended)
# Safer refresh that shows what would change
terraform plan -refresh-only
# Apply refresh changes
terraform apply -refresh-only
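The two refresh-only commands can be joined into a review-then-accept step, so that state is updated only with the exact changes that were reviewed. This is a sketch under assumptions: the function name accept_drift and the default file name refresh.tfplan are made up for illustration, and it relies on a saved refresh-only plan being applied with terraform apply followed by the plan file.

```shell
#!/usr/bin/env bash

# accept_drift: save a refresh-only plan, then apply it only after an explicit "yes".
# $1 is the plan file path (defaults to refresh.tfplan, a name chosen for this sketch).
accept_drift() {
  local plan_file=${1:-refresh.tfplan} answer=""
  terraform plan -refresh-only -input=false -out="$plan_file" || return 1
  printf 'Write these external changes into state? Type "yes" to continue: '
  read -r answer || true
  if [ "$answer" = "yes" ]; then
    terraform apply -input=false "$plan_file"
  else
    echo "Left state unchanged."
  fi
}
```

Accepting the changes only updates the state file. The configuration still has to be edited to match, otherwise the next normal plan will propose reverting the resource.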
Practical Drift Scenarios
Scenario 1: Manual Instance Type Change
Setup Infrastructure
# main.tf
terraform {
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.0"
}
}
}
provider "aws" {
region = "us-west-2"
}
data "aws_ami" "amazon_linux" {
most_recent = true
owners = ["amazon"]
filter {
name = "name"
values = ["amzn2-ami-hvm-*-x86_64-gp2"]
}
}
resource "aws_instance" "web" {
ami = data.aws_ami.amazon_linux.id
instance_type = "t2.micro"
tags = {
Name = "drift-example"
}
}
output "instance_id" {
value = aws_instance.web.id
}
Create Infrastructure
terraform init
terraform apply
Simulate Drift (Manual Change)
# Get instance ID
INSTANCE_ID=$(terraform output -raw instance_id)
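# Change instance type manually (simulates console change).
# The instance must be stopped before its type can be modified.
aws ec2 stop-instances --instance-ids "$INSTANCE_ID"
aws ec2 wait instance-stopped --instance-ids "$INSTANCE_ID"
aws ec2 modify-instance-attribute \
  --instance-id "$INSTANCE_ID" \
  --instance-type '{"Value": "t2.small"}'
aws ec2 start-instances --instance-ids "$INSTANCE_ID"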
Detect and Resolve Drift
# Detect drift
terraform plan
Expected output:
# aws_instance.web will be updated in-place
~ resource "aws_instance" "web" {
~ instance_type = "t2.small" -> "t2.micro"
}
Resolution Options:
- Revert to Terraform Configuration:
terraform apply # Reverts instance back to t2.micro
- Update Configuration to Match Reality:
resource "aws_instance" "web" {
ami = data.aws_ami.amazon_linux.id
instance_type = "t2.small" # Update to match actual state
tags = {
Name = "drift-example"
}
}
Scenario 2: Resource Deleted Outside Terraform
Simulate Resource Deletion
# Terminate instance manually
aws ec2 terminate-instances --instance-ids $INSTANCE_ID
Detect Drift
terraform plan
Expected output:
# aws_instance.web will be created
+ resource "aws_instance" "web" {
+ ami = "ami-12345"
+ instance_type = "t2.micro"
# ... all attributes will be created
}
Resolution
# Recreate the resource
terraform apply
Scenario 3: Tag Modifications
Setup with Tags
resource "aws_instance" "web" {
ami = data.aws_ami.amazon_linux.id
instance_type = "t2.micro"
tags = {
Name = "drift-example"
Environment = "development"
Owner = "terraform"
}
}
Simulate Tag Changes
# Add tags manually
aws ec2 create-tags \
--resources $INSTANCE_ID \
--tags Key=Team,Value=DevOps Key=Project,Value=WebApp
# Modify existing tag
aws ec2 create-tags \
--resources $INSTANCE_ID \
--tags Key=Environment,Value=staging
Detect and Handle Tag Drift
terraform plan
Output shows tag differences:
# aws_instance.web will be updated in-place
~ resource "aws_instance" "web" {
~ tags = {
~ "Environment" = "staging" -> "development"
- "Project" = "WebApp" -> null
- "Team" = "DevOps" -> null
"Name" = "drift-example"
"Owner" = "terraform"
}
}
Options:
- Revert tags:
terraform apply
- Ignore specific tags: Use ignore_changes
- Update configuration: Add new tags to Terraform
Scenario 4: Security Group Rule Changes
Setup Security Group
resource "aws_security_group" "web" {
name = "web-sg"
description = "Security group for web server"
ingress {
from_port = 80
to_port = 80
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
ingress {
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
}
Simulate Manual Rule Addition
# Add SSH rule manually (assumes an output named security_group_id
# that exposes aws_security_group.web.id)
aws ec2 authorize-security-group-ingress \
--group-id $(terraform output -raw security_group_id) \
--protocol tcp \
--port 22 \
--cidr 0.0.0.0/0
Handle the Drift
Terraform will remove the manually added rule on next apply unless you:
- Add rule to configuration:
resource "aws_security_group" "web" {
# ... existing configuration
ingress {
from_port = 22
to_port = 22
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
}
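- Use a separate rule resource. Do this only if the security group defines no inline ingress/egress blocks; mixing inline rules with aws_security_group_rule on the same group causes rule conflicts and perpetual diffs: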
resource "aws_security_group_rule" "manual_ssh" {
type = "ingress"
from_port = 22
to_port = 22
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
security_group_id = aws_security_group.web.id
}
Advanced Drift Management
Using Lifecycle Rules to Manage Drift
Ignore Changes
resource "aws_instance" "web" {
ami = data.aws_ami.amazon_linux.id
instance_type = "t2.micro"
tags = {
Name = "web-server"
}
lifecycle {
ignore_changes = [
tags, # Ignore all tag changes
ami, # Ignore AMI updates
user_data, # Ignore user data changes
]
}
}
Ignore Specific Tag Changes
resource "aws_instance" "web" {
ami = data.aws_ami.amazon_linux.id
instance_type = "t2.micro"
tags = {
Name = "web-server"
Environment = "production"
ManagedBy = "terraform"
}
lifecycle {
ignore_changes = [
tags["LastModified"],
tags["Backup"],
]
}
}
Prevent Accidental Deletion
resource "aws_instance" "web" {
ami = data.aws_ami.amazon_linux.id
instance_type = "t2.micro"
lifecycle {
prevent_destroy = true
}
}
Import Existing Resources
Import Resource Not Managed by Terraform
# Import existing EC2 instance
terraform import aws_instance.existing i-1234567890abcdef0
# Import S3 bucket
terraform import aws_s3_bucket.existing my-existing-bucket
# Import security group
terraform import aws_security_group.existing sg-12345678
Import Process Example
# 1. Add resource configuration
resource "aws_instance" "imported" {
ami = "ami-12345" # Will be updated after import
instance_type = "t2.micro" # Will be updated after import
# Minimal configuration for import
}
# 2. Import the resource
terraform import aws_instance.imported i-1234567890abcdef0
# 3. Compare the configuration with the imported resource
terraform plan # Shows attributes that differ
# 4. Edit the configuration file to match actual state, then verify
terraform plan # Should show no changes
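Re-running an import for an address that is already tracked fails, which is awkward in scripted migrations. A small guard can check the state first. This is a sketch only: import_if_missing is a hypothetical helper name, and it assumes terraform state list prints one resource address per line.

```shell
#!/usr/bin/env bash

# import_if_missing: import a resource only when its address is not yet in state.
# $1 = resource address (e.g. aws_instance.imported), $2 = provider-side ID.
import_if_missing() {
  local addr=$1 id=$2
  # -x matches the whole line, -F treats the address as a literal string.
  if terraform state list | grep -qxF -- "$addr"; then
    echo "$addr is already tracked in state; skipping import"
    return 0
  fi
  terraform import "$addr" "$id"
}
```

For example, import_if_missing aws_instance.imported i-1234567890abcdef0 performs the import on the first run and becomes a no-op afterwards.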
Automated Drift Detection
Drift Detection Script
#!/bin/bash
# drift-detection.sh
set -e
DRIFT_DETECTED=false
echo "Checking for infrastructure drift..."
# Run plan and capture its exit code.
# (Inside "if ! cmd; then", $? holds the negated status and is always 0,
# so the code is captured with "||" instead.)
EXIT_CODE=0
terraform plan -detailed-exitcode -input=false -out=drift.plan > drift.log 2>&1 || EXIT_CODE=$?
if [ "$EXIT_CODE" -eq 0 ]; then
  echo "No drift detected - infrastructure matches configuration"
elif [ "$EXIT_CODE" -eq 2 ]; then
  echo "DRIFT DETECTED: Infrastructure changes found"
  DRIFT_DETECTED=true
  # Show the differences
  echo "Drift details:"
  cat drift.log
  # Optional: Send notification
  # send_slack_notification "Drift detected in $ENV environment"
else
  echo "ERROR: terraform plan failed"
  cat drift.log
  exit 1
fi
# Clean up
rm -f drift.plan drift.log
if [ "$DRIFT_DETECTED" = true ]; then
  exit 2 # Exit with code 2 to indicate drift
fi
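Projects often keep several root modules (for example one per environment), and the same check can be looped over all of them. The sketch below assumes each directory has already been initialized with terraform init. The function name check_dirs and the precedence of the return codes are choices made for this example.

```shell
#!/usr/bin/env bash

# check_dirs: run a drift check in each given root module directory.
# Prints one "<dir>: <result>" line per directory and returns 1 if any
# plan failed, otherwise 2 if any directory has changes, otherwise 0.
check_dirs() {
  local dir code worst=0
  for dir in "$@"; do
    code=0
    # The subshell keeps the caller's working directory untouched.
    (cd "$dir" && terraform plan -detailed-exitcode -input=false >/dev/null 2>&1) || code=$?
    case "$code" in
      0) echo "$dir: clean" ;;
      2) echo "$dir: changes"; [ "$worst" -eq 1 ] || worst=2 ;;
      *) echo "$dir: error"; worst=1 ;;
    esac
  done
  return "$worst"
}
```

A call such as check_dirs envs/dev envs/prod (directory names are placeholders) yields a one-line status per module, and the return code can gate a pipeline the same way the single-directory script does.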
Scheduled Drift Monitoring
# .github/workflows/drift-detection.yml
name: Infrastructure Drift Detection
on:
  schedule:
    - cron: '0 9 * * *' # Daily at 9 AM UTC
  workflow_dispatch:
jobs:
  detect-drift:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2
        with:
          terraform_version: 1.5.0
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v2
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-west-2
      - name: Initialize Terraform
        run: terraform init
      - name: Check for drift
        run: |
          # Exit code 2 means changes are present; 1 means the plan itself failed
          code=0
          terraform plan -detailed-exitcode -input=false || code=$?
          if [ "$code" -eq 2 ]; then
            echo "::warning::Infrastructure drift detected"
            # Send notification to Slack/Teams/Email
          elif [ "$code" -ne 0 ]; then
            echo "::error::terraform plan failed"
            exit "$code"
          fi
Drift Resolution Strategies
1. Immediate Correction
# Immediately fix drift by applying configuration
# (-auto-approve skips the interactive review, so use it with care)
terraform apply -auto-approve
2. Planned Correction
# Create plan for review
terraform plan -out=drift-fix.plan
# Review plan with team
terraform show drift-fix.plan
# Apply after approval
terraform apply drift-fix.plan
3. Configuration Update
# Update Terraform configuration to match reality
# Then run plan to verify
terraform plan
4. Selective Correction
# Fix only specific resources (-target is intended for exceptional
# situations such as drift recovery, not for routine use)
terraform apply -target=aws_instance.web
# Or fix multiple specific resources
terraform apply -target=aws_instance.web -target=aws_security_group.web
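When the list of drifted resources comes from a review, typing out one -target flag per address gets error-prone. The sketch below builds the flags from plain addresses. The function name apply_targets is hypothetical, and it deliberately refuses to run with no addresses, because an apply without targets would touch everything.

```shell
#!/usr/bin/env bash

# apply_targets: apply only the listed resource addresses.
apply_targets() {
  if [ "$#" -eq 0 ]; then
    echo "usage: apply_targets ADDRESS..." >&2
    return 1
  fi
  local addr n=$# i=0
  while [ "$i" -lt "$n" ]; do
    addr=$1
    shift
    set -- "$@" "-target=$addr"   # rotate: drop the address, append its -target flag
    i=$((i + 1))
  done
  terraform apply -input=false "$@"
}
```

Running apply_targets aws_instance.web aws_security_group.web is equivalent to the second command above, with -input=false added.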
Drift Prevention Best Practices
1. Access Control
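# IAM policy document that denies selected mutating actions outside us-west-2.
# Note: the condition restricts by region, not by console vs. API access.
# To block manual changes, also scope permissions by caller
# (for example, allow these actions only for a dedicated Terraform role).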
data "aws_iam_policy_document" "restrict_console" {
statement {
effect = "Deny"
actions = [
"ec2:ModifyInstanceAttribute",
"ec2:TerminateInstances",
"s3:DeleteBucket",
"rds:ModifyDBInstance"
]
resources = ["*"]
condition {
test = "StringNotEquals"
variable = "aws:RequestedRegion"
values = ["us-west-2"]
}
}
}
2. Resource Tagging Strategy
locals {
common_tags = {
ManagedBy = "terraform"
Environment = var.environment
Project = var.project_name
Owner = var.team_email
# Add automation tags
TerraformConfig = basename(path.cwd)
TerraformState = "s3://bucket/path/terraform.tfstate"
}
}
resource "aws_instance" "web" {
ami = data.aws_ami.amazon_linux.id
instance_type = var.instance_type
tags = merge(local.common_tags, {
Name = "${var.project_name}-web-server"
Role = "web-server"
})
}
3. Monitoring and Alerting
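# EventBridge (CloudWatch Events) rule for instance state changes.
# The AWS/EC2 namespace has no "StateChange" metric; state changes are
# delivered as events, so an event rule is used instead of a metric alarm.
resource "aws_cloudwatch_event_rule" "instance_state_change" {
  name        = "ec2-instance-state-change"
  description = "Instance state changed, possibly outside of Terraform"
  event_pattern = jsonencode({
    source      = ["aws.ec2"]
    detail-type = ["EC2 Instance State-change Notification"]
    detail = {
      instance-id = [aws_instance.web.id]
    }
  })
}
# Send matching events to the alerts topic. The topic policy must allow
# events.amazonaws.com to publish to it.
resource "aws_cloudwatch_event_target" "instance_state_change" {
  rule = aws_cloudwatch_event_rule.instance_state_change.name
  arn  = aws_sns_topic.alerts.arn
}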
4. Regular Drift Checks
#!/bin/bash
# Regular drift check as part of deployment pipeline (add to CI/CD)
echo "Checking for drift before deployment..."
PLAN_CODE=0
terraform plan -detailed-exitcode -input=false > /dev/null || PLAN_CODE=$?
if [ "$PLAN_CODE" -eq 0 ]; then
  echo "No drift detected, proceeding with deployment"
elif [ "$PLAN_CODE" -eq 2 ]; then
  echo "WARNING: Drift detected before deployment"
  echo "Please review and resolve drift before proceeding"
  # Show drift details
  terraform plan
  # Optionally fail the pipeline
  # exit 1
else
  echo "ERROR: terraform plan failed"
  exit 1
fi
5. Documentation and Training
# Infrastructure Change Policy
## Approved Methods for Infrastructure Changes
1. ✅ Terraform configurations in version control
2. ✅ Emergency procedures with immediate Terraform update
3. ❌ Manual changes through AWS Console
4. ❌ Direct AWS CLI modifications
5. ❌ Third-party tools without Terraform integration
## Emergency Change Procedure
1. Make necessary manual change
2. Immediately update Terraform configuration
3. Run `terraform plan` to verify configuration matches reality
4. Commit configuration changes to version control
5. Document the emergency change in incident log
## Drift Detection Schedule
- Daily automated checks
- Pre-deployment drift verification
- Weekly drift reports
- Monthly drift review meetings
Troubleshooting Common Drift Issues
Issue 1: State File Out of Sync
# Symptoms: Plan shows unexpected changes
# Solution: Review and accept external changes with refresh-only mode
terraform plan -refresh-only
terraform apply -refresh-only
# Legacy alternative (deprecated since v0.15.4; updates state without review)
terraform refresh
Issue 2: Resource Not Found
# Symptoms: Error about resource not existing
# Solution: Remove from state if intentionally deleted
terraform state rm aws_instance.deleted_manually
# Or recreate if accidentally deleted
terraform apply
Issue 3: Import Conflicts
# Symptoms: Resource already exists error during apply
# Solution: Import existing resource
terraform import aws_instance.existing i-1234567890abcdef0
# Then update configuration to match
terraform plan
Issue 4: Permission Issues
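# Symptoms: Access denied errors during drift detection
# Solution: Verify which identity Terraform is using
aws sts get-caller-identity
# Works for IAM users only; fails when the credentials belong to an assumed role
aws iam get-user
# Check specific permissions. With --dry-run, a "DryRunOperation" error means
# the call is permitted; "UnauthorizedOperation" means it is not.
aws ec2 describe-instances --dry-run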
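The dry-run result is easy to misread because a permitted call still ends with an error message. A small wrapper can turn it into a plain answer. This is a sketch: can_describe_instances is a made-up name, and it only inspects the DryRunOperation and UnauthorizedOperation error codes in the CLI output.

```shell
#!/usr/bin/env bash

# can_describe_instances: use --dry-run to test permission without fetching data.
# A permitted dry run reports DryRunOperation; a denied one reports UnauthorizedOperation.
can_describe_instances() {
  local out
  # The CLI exits non-zero in both cases, so the exit status is ignored on purpose.
  out=$(aws ec2 describe-instances --dry-run 2>&1) || true
  case "$out" in
    *DryRunOperation*) echo "allowed" ;;
    *UnauthorizedOperation*) echo "denied" ;;
    *) echo "unknown: $out"; return 1 ;;
  esac
}
```

Anything other than the two expected codes (expired credentials, no network, missing region) is reported as unknown together with the raw message, since it points to a different problem than missing permissions.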
Key Takeaways
- Resource drift is inevitable in dynamic environments
- Regular drift detection prevents small issues from becoming big problems
- Use terraform plan regularly to detect drift early
- Implement access controls to minimize unauthorized changes
- Use lifecycle rules to ignore expected external changes
- Have procedures for both preventing and resolving drift
- Import existing resources rather than recreating them
- Monitor infrastructure changes with CloudWatch and alerts
- Train team members on proper change procedures
Next Steps
- Complete Tutorial 12: Use Refresh-Only Mode
- Learn about Terraform workspaces for environment management
- Explore advanced state management techniques
- Practice with complex drift scenarios