Query Data Sources

Access external data

Tutorial 7: Query Data Sources

Learning Objectives

  • Understand the purpose and benefits of data sources
  • Learn to query existing infrastructure and external data
  • Practice using common data sources across different providers
  • Integrate data sources with resource configurations

What are Data Sources?

Data sources let Terraform fetch information from existing infrastructure, external services, or APIs so that you can use it in your configuration. Unlike resources, which create and manage infrastructure, data sources are read-only: they only query data that already exists.

Data Sources vs Resources

# Resource - Creates/manages infrastructure
resource "aws_instance" "web" {
  ami           = "ami-12345"
  instance_type = "t2.micro"
}

# Data Source - Queries existing information
data "aws_ami" "amazon_linux" {
  most_recent = true
  owners      = ["amazon"]
}

Benefits of Data Sources

  • Dynamic Configurations: Automatically use latest AMIs, availability zones
  • Environment Discovery: Query existing infrastructure details
  • Avoid Hardcoding: Reference dynamic values instead of static ones
  • Integration: Connect to existing resources managed outside Terraform
  • Validation: Ensure referenced resources exist
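
As a small illustration of the "Avoid Hardcoding" point, the sketch below builds an IAM role ARN from values looked up at plan time instead of a pasted account ID. The role name `deploy` and the local name `deploy_role_arn` are hypothetical, chosen only for this example.

```hcl
# Look up the account and partition the provider is authenticated against.
data "aws_caller_identity" "current" {}
data "aws_partition" "current" {}

locals {
  # "deploy" is a hypothetical role name used only for illustration.
  deploy_role_arn = "arn:${data.aws_partition.current.partition}:iam::${data.aws_caller_identity.current.account_id}:role/deploy"
}
```

The same configuration then works unchanged in another account or partition, because nothing account-specific is written into the file.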

Data Source Syntax

Basic Structure

data "provider_type" "name" {
  # Query parameters
  argument1 = "value1"
  argument2 = "value2"
  
  # Filters and constraints
  filter {
    name   = "filter_name"
    values = ["filter_value"]
  }
}

Referencing Data Sources

# Reference using data.type.name.attribute
resource "aws_instance" "web" {
  ami = data.aws_ami.amazon_linux.id
}
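
Any exported attribute can be referenced the same way, not only in resources. As a minimal sketch, the outputs below expose two attributes of the `amazon_linux` data source so the selected image is visible after a plan or apply; the output names are assumptions made for this example.

```hcl
# Expose attributes of data "aws_ami" "amazon_linux" in plan/apply output.
output "amazon_linux_ami_id" {
  description = "ID of the AMI selected by the data source"
  value       = data.aws_ami.amazon_linux.id
}

output "amazon_linux_ami_name" {
  description = "Name of the AMI selected by the data source"
  value       = data.aws_ami.amazon_linux.name
}
```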

Common AWS Data Sources

AMI Data Source

# Get the latest Amazon Linux 2 AMI
data "aws_ami" "amazon_linux" {
  most_recent = true
  owners      = ["amazon"]
  
  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-*-x86_64-gp2"]
  }
  
  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }
}

# Use in resource
resource "aws_instance" "web" {
  ami           = data.aws_ami.amazon_linux.id
  instance_type = "t2.micro"
  
  tags = {
    AMI_Name = data.aws_ami.amazon_linux.name
    AMI_Date = data.aws_ami.amazon_linux.creation_date
  }
}

Availability Zones Data Source

# Get all available AZs in current region
data "aws_availability_zones" "available" {
  state = "available"
}

# Use for subnet creation
resource "aws_subnet" "public" {
  count             = length(data.aws_availability_zones.available.names)
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.${count.index + 1}.0/24"
  availability_zone = data.aws_availability_zones.available.names[count.index]
  
  tags = {
    Name = "public-subnet-${count.index + 1}"
    AZ   = data.aws_availability_zones.available.names[count.index]
  }
}
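
A possible variation on the subnet example: regions differ in how many zones they offer, so it can help to cap the number of subnets and let `cidrsubnet()` compute the ranges. The cap of three, the local `az_names`, and the resource name `public_capped` are assumptions for this sketch, which also assumes `aws_vpc.main` uses `10.0.0.0/16`.

```hcl
locals {
  # Hypothetical cap: use at most three zones, or fewer if the region has fewer.
  az_names = slice(
    data.aws_availability_zones.available.names,
    0,
    min(3, length(data.aws_availability_zones.available.names))
  )
}

resource "aws_subnet" "public_capped" {
  count  = length(local.az_names)
  vpc_id = aws_vpc.main.id

  # Carve a /24 out of the VPC range. With a 10.0.0.0/16 VPC,
  # index 0 yields 10.0.1.0/24, the same value as the hand-built string.
  cidr_block        = cidrsubnet(aws_vpc.main.cidr_block, 8, count.index + 1)
  availability_zone = local.az_names[count.index]
}
```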

VPC Data Source

# Query existing VPC by tag
data "aws_vpc" "existing" {
  filter {
    name   = "tag:Name"
    values = ["existing-vpc"]
  }
}

# Query default VPC
data "aws_vpc" "default" {
  default = true
}

# Use existing VPC
resource "aws_subnet" "app" {
  vpc_id     = data.aws_vpc.existing.id
  cidr_block = "10.0.100.0/24"
  
  tags = {
    Name     = "app-subnet"
    VPC_ID   = data.aws_vpc.existing.id
    VPC_CIDR = data.aws_vpc.existing.cidr_block
  }
}

Security Group Data Source

# Query existing security group
data "aws_security_group" "web" {
  filter {
    name   = "group-name"
    values = ["web-security-group"]
  }
}

# Use in instance
resource "aws_instance" "app" {
  ami                    = data.aws_ami.amazon_linux.id
  instance_type          = "t2.micro"
  vpc_security_group_ids = [data.aws_security_group.web.id]
}

Route53 Zone Data Source

# Query hosted zone
data "aws_route53_zone" "main" {
  name         = "example.com"
  private_zone = false
}

# Create DNS record
resource "aws_route53_record" "web" {
  zone_id = data.aws_route53_zone.main.zone_id
  name    = "web.${data.aws_route53_zone.main.name}"
  type    = "A"
  ttl     = 300
  records = [aws_instance.web.public_ip]
}

Advanced Data Source Usage

Multiple Filters

data "aws_ami" "web_server" {
  most_recent = true
  owners      = ["self", "amazon"]
  
  filter {
    name   = "name"
    values = ["web-server-*"]
  }
  
  filter {
    name   = "architecture"
    values = ["x86_64"]
  }
  
  filter {
    name   = "state"
    values = ["available"]
  }
  
  filter {
    name   = "tag:Environment"
    values = ["production"]
  }
}

Data Source with Variables

variable "environment" {
  type = string
}

data "aws_ami" "app" {
  most_recent = true
  owners      = ["self"]
  
  filter {
    name   = "name"
    values = ["${var.environment}-app-*"]
  }
  
  filter {
    name   = "tag:Environment"
    values = [var.environment]
  }
}

Conditional Data Sources

variable "use_existing_vpc" {
  type    = bool
  default = false
}

variable "existing_vpc_name" {
  type    = string
  default = ""
}

# Only query if using existing VPC
data "aws_vpc" "existing" {
  count = var.use_existing_vpc ? 1 : 0
  
  filter {
    name   = "tag:Name"
    values = [var.existing_vpc_name]
  }
}

# Use existing or created VPC
locals {
  vpc_id = var.use_existing_vpc ? data.aws_vpc.existing[0].id : aws_vpc.new[0].id
}
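
The `aws_vpc.new` resource referenced in the local is not shown in this tutorial. One way it might look, together with an alternative selection using `one()`, is sketched below; the CIDR block and the local name `selected_vpc_id` are placeholders, not part of the original configuration.

```hcl
# Assumed counterpart to the data source: create a VPC only when
# no existing one is being reused. The CIDR is a placeholder.
resource "aws_vpc" "new" {
  count      = var.use_existing_vpc ? 0 : 1
  cidr_block = "10.0.0.0/16"
}

locals {
  # one() returns the single element of a list, or null if the list is empty,
  # so coalesce() picks whichever side was actually instantiated.
  selected_vpc_id = coalesce(
    one(data.aws_vpc.existing[*].id),
    one(aws_vpc.new[*].id)
  )
}
```

Because the two `count` expressions are opposites, exactly one of the lists has an element, so `coalesce()` always has a non-null value to return.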

Working with External Data

HTTP Data Source

# Query external API
data "http" "my_ip" {
  url = "https://ifconfig.me/ip"
}

# Use in security group
resource "aws_security_group" "admin" {
  name = "admin-access"
  
  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["${chomp(data.http.my_ip.response_body)}/32"]
  }
}
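
Since the response of an external service ends up in a firewall rule here, it may be worth checking that the request succeeded. The sketch below assumes Terraform 1.2 or later (for `postcondition`) and a version of the hashicorp/http provider that exports `status_code`; the names `my_ip_checked` and `admin_cidr` are hypothetical.

```hcl
data "http" "my_ip_checked" {
  url = "https://ifconfig.me/ip"

  lifecycle {
    # Fail the plan if the service did not answer with HTTP 200.
    postcondition {
      condition     = self.status_code == 200
      error_message = "The IP lookup service did not return HTTP 200."
    }
  }
}

locals {
  # Strip surrounding whitespace, then build a /32 CIDR.
  admin_cidr = "${trimspace(data.http.my_ip_checked.response_body)}/32"
}
```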

Template Data Source

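# Read and render template file
# Note: template_file comes from the archived hashicorp/template provider;
# the built-in templatefile() function is the recommended replacement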
data "template_file" "user_data" {
  template = file("${path.module}/user-data.sh.tpl")
  
  vars = {
    environment = var.environment
    app_name    = var.app_name
    db_host     = aws_db_instance.main.endpoint
  }
}

resource "aws_instance" "web" {
  ami           = data.aws_ami.amazon_linux.id
  instance_type = "t2.micro"
  user_data     = data.template_file.user_data.rendered
}
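
The same rendering can be done with the built-in `templatefile()` function, which needs no extra provider. This is a sketch that reuses the template path and variables from the example above; the resource name `web_templated` is hypothetical.

```hcl
resource "aws_instance" "web_templated" {
  ami           = data.aws_ami.amazon_linux.id
  instance_type = "t2.micro"

  # templatefile() reads the file and substitutes the given variables.
  user_data = templatefile("${path.module}/user-data.sh.tpl", {
    environment = var.environment
    app_name    = var.app_name
    db_host     = aws_db_instance.main.endpoint
  })
}
```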

Local File Data Source

# Read local file content
# pathexpand() is needed because Terraform does not expand "~" on its own
data "local_file" "ssh_key" {
  filename = pathexpand("~/.ssh/id_rsa.pub")
}

resource "aws_key_pair" "deployer" {
  key_name   = "deployer-key"
  public_key = data.local_file.ssh_key.content
}

Cross-Provider Data Sources

Multiple Cloud Providers

# AWS data source
data "aws_ami" "amazon_linux" {
  most_recent = true
  owners      = ["amazon"]
  
  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-*"]
  }
}

# Azure data source
data "azurerm_client_config" "current" {}

data "azurerm_resource_group" "main" {
  name = "existing-rg"
}

# Use both in respective resources (other required arguments omitted for brevity)
resource "aws_instance" "web" {
  ami = data.aws_ami.amazon_linux.id
}

resource "azurerm_virtual_machine" "web" {
  resource_group_name = data.azurerm_resource_group.main.name
}

Data Source Error Handling

Handling Missing Resources

# This will fail if no AMI is found
data "aws_ami" "specific" {
  owners = ["self"]
  
  filter {
    name   = "name"
    values = ["very-specific-ami-name"]
  }
}

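# try() cannot catch this failure: the data source read itself errors out
# before any expression that references it is evaluated.
# Better approach: query with the plural aws_ami_ids data source, which
# returns an empty list when nothing matches, then fall back explicitly
data "aws_ami_ids" "specific" {
  owners = ["self"]

  filter {
    name   = "name"
    values = ["very-specific-ami-name"]
  }
}

locals {
  ami_id = length(data.aws_ami_ids.specific.ids) > 0 ? data.aws_ami_ids.specific.ids[0] : data.aws_ami.fallback.id
}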

data "aws_ami" "fallback" {
  most_recent = true
  owners      = ["amazon"]
  
  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-*"]
  }
}

Validation with Data Sources

data "aws_vpc" "selected" {
  id = var.vpc_id
}

# Validate VPC has required tags
locals {
  has_required_tags = contains(keys(data.aws_vpc.selected.tags), "Environment")
}

# Use in validation or conditional logic
resource "aws_subnet" "app" {
  count = local.has_required_tags ? 1 : 0
  
  vpc_id     = data.aws_vpc.selected.id
  cidr_block = "10.0.1.0/24"
}
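
With the `count` approach, a missing tag silently results in zero subnets. If a hard failure is preferable, a `postcondition` on the data source is one option. This sketch assumes Terraform 1.2 or later; the data source name `checked` is hypothetical.

```hcl
data "aws_vpc" "checked" {
  id = var.vpc_id

  lifecycle {
    # Stop the plan with a clear message if the tag is missing.
    postcondition {
      condition     = contains(keys(self.tags), "Environment")
      error_message = "The selected VPC must have an Environment tag."
    }
  }
}
```

Resources that reference `data.aws_vpc.checked.id` are then only planned when the VPC carries the required tag.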

Complex Data Source Examples

Building Dynamic Infrastructure

# Get all subnets in a VPC
data "aws_subnets" "app" {
  filter {
    name   = "vpc-id"
    values = [var.vpc_id]
  }
  
  filter {
    name   = "tag:Type"
    values = ["private"]
  }
}

# Get details for each subnet
data "aws_subnet" "app" {
  for_each = toset(data.aws_subnets.app.ids)
  id       = each.value
}

# Create instances across all subnets
resource "aws_instance" "app" {
  for_each = data.aws_subnet.app
  
  ami               = data.aws_ami.amazon_linux.id
  instance_type     = "t2.micro"
  subnet_id         = each.value.id
  availability_zone = each.value.availability_zone
  
  tags = {
    Name = "app-${each.value.availability_zone}"
    AZ   = each.value.availability_zone
  }
}

Database Configuration Discovery

# Query parameter group
data "aws_db_parameter_group" "mysql" {
  name = "mysql-parameters"
}

# Query subnet group
data "aws_db_subnet_group" "main" {
  name = "main-db-subnet-group"
}

# Query security group
data "aws_security_group" "db" {
  filter {
    name   = "tag:Purpose"
    values = ["database"]
  }
}

# Create RDS instance using discovered resources
resource "aws_db_instance" "main" {
  identifier = "main-database"
  
  engine         = "mysql"
  engine_version = "8.0"
  instance_class = "db.t3.micro"
  
  db_name  = "appdb"
  username = "admin"
  password = var.db_password
  
  parameter_group_name   = data.aws_db_parameter_group.mysql.name
  db_subnet_group_name   = data.aws_db_subnet_group.main.name
  vpc_security_group_ids = [data.aws_security_group.db.id]
  
  allocated_storage = 20
  storage_type      = "gp3"
  
  backup_retention_period = 7
  backup_window           = "03:00-04:00"
  maintenance_window      = "sun:04:00-sun:05:00"
  
  skip_final_snapshot = true
  
  tags = {
    Name = "main-database"
  }
}

Data Source Dependencies

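Explicit Dependencies

# Resource that the data source must wait for
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"

  tags = {
    Name = "main-vpc"
  }
}

# The filter below references no attribute of aws_vpc.main, so Terraform cannot
# infer the ordering; depends_on makes the read wait for VPC creation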
data "aws_vpc" "main" {
  depends_on = [aws_vpc.main]
  
  filter {
    name   = "tag:Name"
    values = ["main-vpc"]
  }
}
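
A dependency can also be implied simply by referencing an attribute of the resource, in which case `depends_on` is not needed. A minimal sketch, with the data source name `in_main` chosen only for illustration:

```hcl
# Referencing aws_vpc.main.id creates the dependency implicitly:
# Terraform reads this data source only after the VPC exists.
data "aws_subnets" "in_main" {
  filter {
    name   = "vpc-id"
    values = [aws_vpc.main.id]
  }
}
```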

Data Source Chains

# Chain of data sources
data "aws_caller_identity" "current" {}

data "aws_iam_policy_document" "assume_role" {
  statement {
    actions = ["sts:AssumeRole"]
    
    principals {
      type        = "AWS"
      identifiers = [data.aws_caller_identity.current.arn]
    }
  }
}

resource "aws_iam_role" "main" {
  name               = "main-role"
  assume_role_policy = data.aws_iam_policy_document.assume_role.json
}

Best Practices

1. Use Specific Filters

# Good - specific filters
data "aws_ami" "web" {
  most_recent = true
  owners      = ["self", "amazon"]
  
  filter {
    name   = "name"
    values = ["web-server-v2.*"]
  }
  
  filter {
    name   = "state"
    values = ["available"]
  }
}

# Avoid - too generic
data "aws_ami" "web" {
  most_recent = true
  owners      = ["amazon"]
}

2. Handle Multiple Results

# When expecting single result
data "aws_vpc" "main" {
  filter {
    name   = "tag:Name"
    values = ["main-vpc"]
  }
}

# When expecting multiple results
data "aws_subnets" "private" {
  filter {
    name   = "tag:Type"
    values = ["private"]
  }
}
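
A singular data source such as `aws_vpc` fails if the query matches zero or several objects, while a plural one such as `aws_subnets` returns a list that may be empty. The sketch below shows one way to consume that list defensively; the output and local names are assumptions.

```hcl
output "private_subnet_count" {
  description = "Number of subnets matched by the query"
  value       = length(data.aws_subnets.private.ids)
}

locals {
  # Sort so the choice is stable between runs; null if nothing matched.
  first_private_subnet_id = try(sort(data.aws_subnets.private.ids)[0], null)
}
```

Here `try()` is appropriate because it guards an expression (indexing an empty list), not the data source read itself.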

3. Document Data Source Purpose

# Query the latest approved AMI for web servers
# This AMI is built and maintained by the platform team
data "aws_ami" "web_server" {
  most_recent = true
  owners      = ["123456789012"]  # Platform team account
  
  filter {
    name   = "name"
    values = ["approved-web-server-*"]
  }
  
  filter {
    name   = "tag:Approved"
    values = ["true"]
  }
}

4. Use Variables for Flexibility

variable "ami_name_pattern" {
  description = "Pattern to match AMI names"
  type        = string
  default     = "amzn2-ami-hvm-*"
}

data "aws_ami" "selected" {
  most_recent = true
  owners      = ["amazon"]
  
  filter {
    name   = "name"
    values = [var.ami_name_pattern]
  }
}

5. Validate Data Source Results

data "aws_ami" "web" {
  most_recent = true
  owners      = ["self"]
  
  filter {
    name   = "name"
    values = ["web-server-*"]
  }
}

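# Validate AMI was found and is recent
# Compare timestamps directly: subtracting YYYYMMDD integers miscounts days
# across month and year boundaries. timecmp() requires Terraform 1.3 or later.
locals {
  # 720h = 30 days; true when the AMI was created within that window
  ami_is_recent = timecmp(data.aws_ami.web.creation_date, timeadd(timestamp(), "-720h")) >= 0
}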

# Use validation in lifecycle rules or conditional logic
resource "aws_instance" "web" {
  ami           = data.aws_ami.web.id
  instance_type = "t2.micro"
  
  lifecycle {
    precondition {
      condition     = local.ami_is_recent
      error_message = "AMI is older than 30 days. Please update."
    }
  }
}

Testing Data Sources

Validate Data Source Queries

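# Plan to see data source results
terraform plan

# Show data sources recorded in state
terraform show -json | jq '.values.root_module.resources[] | select(.mode == "data")'

# Refresh data sources (the standalone "terraform refresh" command is deprecated)
terraform apply -refresh-only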

Debug Data Source Issues

# Enable debug logging
export TF_LOG=DEBUG
terraform plan

# Check for specific data source
terraform console
> data.aws_ami.amazon_linux

Key Takeaways

  • Data sources query existing infrastructure and external data
  • Use filters and constraints to get specific results
  • Data sources make configurations dynamic and avoid hardcoding
  • Handle missing data gracefully with plural data sources and conditional logic; try() cannot catch a failed data source read
  • Use specific filters to avoid ambiguous results
  • Document the purpose of each data source
  • Validate data source results when critical
  • Test data source queries in different environments

Next Steps

  1. Complete Tutorial 8: Output Values
  2. Learn about local values and expressions
  3. Explore built-in functions for data manipulation
  4. Practice with complex data source scenarios
