Tutorial 7: Query Data Sources
Learning Objectives
- Understand the purpose and benefits of data sources
- Learn to query existing infrastructure and external data
- Practice using common data sources across different providers
- Integrate data sources with resource configurations
What are Data Sources?
Data sources let Terraform read information from existing infrastructure, external services, or APIs so that you can use it in your configuration. Unlike resources, which create and manage infrastructure, data sources are read-only: they only query data that already exists.
Data Sources vs Resources
# Resource - Creates/manages infrastructure
resource "aws_instance" "web" {
ami = "ami-12345"
instance_type = "t2.micro"
}
# Data Source - Queries existing information
data "aws_ami" "amazon_linux" {
most_recent = true
owners = ["amazon"]
}
Benefits of Data Sources
- Dynamic Configurations: Automatically use the latest AMIs and the currently available availability zones
- Environment Discovery: Query existing infrastructure details
- Avoid Hardcoding: Reference dynamic values instead of static ones
- Integration: Connect to existing resources managed outside Terraform
- Validation: Ensure referenced resources exist
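As a small illustration of the "Avoid Hardcoding" point, the sketch below derives a bucket name from the account ID returned by the aws_caller_identity data source instead of a literal account number. The naming scheme and the resource name logs are hypothetical.

```hcl
# Look up the account the AWS provider is currently authenticated against
data "aws_caller_identity" "current" {}

locals {
  # Hypothetical naming scheme: "app-logs-<account id>"
  log_bucket_name = "app-logs-${data.aws_caller_identity.current.account_id}"
}

resource "aws_s3_bucket" "logs" {
  bucket = local.log_bucket_name
}
```

The same configuration can then be applied in several accounts without editing the bucket name.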
Data Source Syntax
Basic Structure
data "provider_type" "name" {
# Query parameters
argument1 = "value1"
argument2 = "value2"
# Filters and constraints
filter {
name = "filter_name"
values = ["filter_value"]
}
}
Referencing Data Sources
# Reference using data.type.name.attribute
resource "aws_instance" "web" {
ami = data.aws_ami.amazon_linux.id
}
Common AWS Data Sources
AMI Data Source
# Get the latest Amazon Linux 2 AMI
data "aws_ami" "amazon_linux" {
most_recent = true
owners = ["amazon"]
filter {
name = "name"
values = ["amzn2-ami-hvm-*-x86_64-gp2"]
}
filter {
name = "virtualization-type"
values = ["hvm"]
}
}
# Use in resource
resource "aws_instance" "web" {
ami = data.aws_ami.amazon_linux.id
instance_type = "t2.micro"
tags = {
AMI_Name = data.aws_ami.amazon_linux.name
AMI_Date = data.aws_ami.amazon_linux.creation_date
}
}
Availability Zones Data Source
# Get all available AZs in current region
data "aws_availability_zones" "available" {
state = "available"
}
# Use for subnet creation
resource "aws_subnet" "public" {
count = length(data.aws_availability_zones.available.names)
vpc_id = aws_vpc.main.id
cidr_block = "10.0.${count.index + 1}.0/24"
availability_zone = data.aws_availability_zones.available.names[count.index]
tags = {
Name = "public-subnet-${count.index + 1}"
AZ = data.aws_availability_zones.available.names[count.index]
}
}
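The hardcoded "10.0.N.0/24" pattern above only works when the VPC uses 10.0.0.0/16. A possible variant, assuming aws_vpc.main has a /16 CIDR block, derives each subnet range from the VPC's own CIDR with the built-in cidrsubnet() function. It reuses the aws_availability_zones data source declared above; the resource name public_derived is hypothetical.

```hcl
# Assumes aws_vpc.main (referenced above) has a /16 CIDR block
resource "aws_subnet" "public_derived" {
  count  = length(data.aws_availability_zones.available.names)
  vpc_id = aws_vpc.main.id

  # Adds 8 bits to the VPC prefix, so a 10.0.0.0/16 VPC yields
  # 10.0.1.0/24 for index 0, 10.0.2.0/24 for index 1, and so on
  cidr_block        = cidrsubnet(aws_vpc.main.cidr_block, 8, count.index + 1)
  availability_zone = data.aws_availability_zones.available.names[count.index]
}
```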
VPC Data Source
# Query existing VPC by tag
data "aws_vpc" "existing" {
filter {
name = "tag:Name"
values = ["existing-vpc"]
}
}
# Query default VPC
data "aws_vpc" "default" {
default = true
}
# Use existing VPC
resource "aws_subnet" "app" {
vpc_id = data.aws_vpc.existing.id
cidr_block = "10.0.100.0/24"
tags = {
Name = "app-subnet"
VPC_ID = data.aws_vpc.existing.id
VPC_CIDR = data.aws_vpc.existing.cidr_block
}
}
Security Group Data Source
# Query existing security group
data "aws_security_group" "web" {
filter {
name = "group-name"
values = ["web-security-group"]
}
}
# Use in instance
resource "aws_instance" "app" {
ami = data.aws_ami.amazon_linux.id
instance_type = "t2.micro"
vpc_security_group_ids = [data.aws_security_group.web.id]
}
Route53 Zone Data Source
# Query hosted zone
data "aws_route53_zone" "main" {
name = "example.com"
private_zone = false
}
# Create DNS record
resource "aws_route53_record" "web" {
zone_id = data.aws_route53_zone.main.zone_id
name = "web.${data.aws_route53_zone.main.name}"
type = "A"
ttl = 300
records = [aws_instance.web.public_ip]
}
Advanced Data Source Usage
Multiple Filters
data "aws_ami" "web_server" {
most_recent = true
owners = ["self", "amazon"]
filter {
name = "name"
values = ["web-server-*"]
}
filter {
name = "architecture"
values = ["x86_64"]
}
filter {
name = "state"
values = ["available"]
}
filter {
name = "tag:Environment"
values = ["production"]
}
}
Data Source with Variables
variable "environment" {
type = string
}
data "aws_ami" "app" {
most_recent = true
owners = ["self"]
filter {
name = "name"
values = ["${var.environment}-app-*"]
}
filter {
name = "tag:Environment"
values = [var.environment]
}
}
Conditional Data Sources
variable "use_existing_vpc" {
type = bool
default = false
}
variable "existing_vpc_name" {
type = string
default = ""
}
# Only query if using existing VPC
data "aws_vpc" "existing" {
count = var.use_existing_vpc ? 1 : 0
filter {
name = "tag:Name"
values = [var.existing_vpc_name]
}
}
# Use existing or created VPC
locals {
vpc_id = var.use_existing_vpc ? data.aws_vpc.existing[0].id : aws_vpc.new[0].id
}
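The conditional above refers to aws_vpc.new, which this tutorial does not define. One way to complete the pattern, sketched here under the assumption that the new VPC uses a 10.0.0.0/16 block, is to give the resource the inverse count and pick the ID with one() and coalesce(). The local names below are hypothetical, and one() requires Terraform 0.15 or later.

```hcl
# Hypothetical VPC that is created only when no existing VPC is used
resource "aws_vpc" "new" {
  count      = var.use_existing_vpc ? 0 : 1
  cidr_block = "10.0.0.0/16"
}

locals {
  # one() returns the single element of a list, or null when the list is empty
  existing_vpc_id = one(data.aws_vpc.existing[*].id)
  new_vpc_id      = one(aws_vpc.new[*].id)

  # coalesce() returns the first argument that is neither null nor an empty string
  selected_vpc_id = coalesce(local.existing_vpc_id, local.new_vpc_id)
}
```

This avoids indexing into a list that may be empty, at the cost of two extra local values.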
Working with External Data
HTTP Data Source
# Query external API
data "http" "my_ip" {
url = "https://ifconfig.me/ip"
}
# Use in security group
resource "aws_security_group" "admin" {
name = "admin-access"
ingress {
from_port = 22
to_port = 22
protocol = "tcp"
cidr_blocks = ["${chomp(data.http.my_ip.response_body)}/32"]
}
}
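An external endpoint can return an error page instead of an IP address. The sketch below assumes version 3.x of the hashicorp/http provider, where a non-2xx response does not by itself fail the read, and adds postconditions (Terraform 1.2 or later) so a bad response stops the plan. The data source name my_ip_checked is hypothetical.

```hcl
data "http" "my_ip_checked" {
  url = "https://ifconfig.me/ip"

  lifecycle {
    # Reject anything other than a successful response
    postcondition {
      condition     = self.status_code == 200
      error_message = "The IP lookup did not return HTTP 200."
    }

    # cidrhost() raises an error on a malformed prefix; can() turns that into false
    postcondition {
      condition     = can(cidrhost("${chomp(self.response_body)}/32", 0))
      error_message = "The IP lookup did not return a parseable IP address."
    }
  }
}
```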
Template Data Source
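# Read and render template file
# Note: the template provider that supplies this data source is deprecated;
# the built-in templatefile() function is the recommended replacement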
data "template_file" "user_data" {
template = file("${path.module}/user-data.sh.tpl")
vars = {
environment = var.environment
app_name = var.app_name
db_host = aws_db_instance.main.endpoint
}
}
resource "aws_instance" "web" {
ami = data.aws_ami.amazon_linux.id
instance_type = "t2.micro"
user_data = data.template_file.user_data.rendered
}
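The same rendering can be done without the template provider by calling the built-in templatefile() function. This is a sketch that reuses the template path and variables from the example above; the resource name web_templated is hypothetical.

```hcl
resource "aws_instance" "web_templated" {
  ami           = data.aws_ami.amazon_linux.id
  instance_type = "t2.micro"

  # templatefile(path, vars) reads the file and renders it with the given variables
  user_data = templatefile("${path.module}/user-data.sh.tpl", {
    environment = var.environment
    app_name    = var.app_name
    db_host     = aws_db_instance.main.endpoint
  })
}
```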
Local File Data Source
# Read local file content
# Terraform does not expand "~" in paths, so wrap the path in pathexpand()
data "local_file" "ssh_key" {
filename = pathexpand("~/.ssh/id_rsa.pub")
}
resource "aws_key_pair" "deployer" {
key_name = "deployer-key"
public_key = data.local_file.ssh_key.content
}
Cross-Provider Data Sources
Multiple Cloud Providers
# AWS data source
data "aws_ami" "amazon_linux" {
most_recent = true
owners = ["amazon"]
filter {
name = "name"
values = ["amzn2-ami-hvm-*"]
}
}
# Azure data source
data "azurerm_client_config" "current" {}
data "azurerm_resource_group" "main" {
name = "existing-rg"
}
# Use both in respective resources (other required arguments omitted for brevity)
resource "aws_instance" "web" {
ami = data.aws_ami.amazon_linux.id
}
resource "azurerm_virtual_machine" "web" {
resource_group_name = data.azurerm_resource_group.main.name
}
Data Source Error Handling
Handling Missing Resources
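# This will fail the whole plan if no AMI is found
data "aws_ami" "specific" {
owners = ["self"]
filter {
name = "name"
values = ["very-specific-ami-name"]
}
}
# try() cannot rescue a failed data source read: it only catches errors
# raised while evaluating an expression. Query with the plural aws_ami_ids
# data source instead, which returns an empty list when nothing matches.
data "aws_ami_ids" "specific" {
owners = ["self"]
filter {
name = "name"
values = ["very-specific-ami-name"]
}
}
# Fall back to a known-good AMI when the specific one does not exist
locals {
specific_ami_ids = data.aws_ami_ids.specific.ids
ami_id = length(local.specific_ami_ids) > 0 ? local.specific_ami_ids[0] : data.aws_ami.fallback.id
}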
data "aws_ami" "fallback" {
most_recent = true
owners = ["amazon"]
filter {
name = "name"
values = ["amzn2-ami-hvm-*"]
}
}
Validation with Data Sources
data "aws_vpc" "selected" {
id = var.vpc_id
}
# Validate VPC has required tags
locals {
has_required_tags = contains(keys(data.aws_vpc.selected.tags), "Environment")
}
# Use in validation or conditional logic
resource "aws_subnet" "app" {
count = local.has_required_tags ? 1 : 0
vpc_id = data.aws_vpc.selected.id
cidr_block = "10.0.1.0/24"
}
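Setting count to 0 skips the subnet silently when the tag is missing. If a missing tag should stop the run with a clear message instead, a precondition (Terraform 1.2 or later) is one option. The sketch below reuses data.aws_vpc.selected from above; the resource name app_checked is hypothetical.

```hcl
resource "aws_subnet" "app_checked" {
  vpc_id     = data.aws_vpc.selected.id
  cidr_block = "10.0.1.0/24"

  lifecycle {
    # Fail the plan with an explicit message when the tag is absent
    precondition {
      condition     = contains(keys(data.aws_vpc.selected.tags), "Environment")
      error_message = "The selected VPC must have an Environment tag."
    }
  }
}
```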
Complex Data Source Examples
Building Dynamic Infrastructure
# Get all subnets in a VPC
data "aws_subnets" "app" {
filter {
name = "vpc-id"
values = [var.vpc_id]
}
filter {
name = "tag:Type"
values = ["private"]
}
}
# Get details for each subnet
data "aws_subnet" "app" {
for_each = toset(data.aws_subnets.app.ids)
id = each.value
}
# Create instances across all subnets
resource "aws_instance" "app" {
for_each = data.aws_subnet.app
ami = data.aws_ami.amazon_linux.id
instance_type = "t2.micro"
subnet_id = each.value.id
availability_zone = each.value.availability_zone
tags = {
Name = "app-${each.value.availability_zone}"
AZ = each.value.availability_zone
}
}
Database Configuration Discovery
# Query parameter group
data "aws_db_parameter_group" "mysql" {
name = "mysql-parameters"
}
# Query subnet group
data "aws_db_subnet_group" "main" {
name = "main-db-subnet-group"
}
# Query security group
data "aws_security_group" "db" {
filter {
name = "tag:Purpose"
values = ["database"]
}
}
# Create RDS instance using discovered resources
resource "aws_db_instance" "main" {
identifier = "main-database"
engine = "mysql"
engine_version = "8.0"
instance_class = "db.t3.micro"
db_name = "appdb"
username = "admin"
password = var.db_password
parameter_group_name = data.aws_db_parameter_group.mysql.name
db_subnet_group_name = data.aws_db_subnet_group.main.name
vpc_security_group_ids = [data.aws_security_group.db.id]
allocated_storage = 20
storage_type = "gp3"
backup_retention_period = 7
backup_window = "03:00-04:00"
maintenance_window = "sun:04:00-sun:05:00"
skip_final_snapshot = true
tags = {
Name = "main-database"
}
}
Data Source Dependencies
Explicit Dependencies
# Data source depends on a resource through depends_on
resource "aws_vpc" "main" {
cidr_block = "10.0.0.0/16"
tags = {
Name = "main-vpc"
}
}
# depends_on makes this data source wait for VPC creation;
# the read is deferred to apply time while the VPC has pending changes
data "aws_vpc" "main" {
depends_on = [aws_vpc.main]
filter {
name = "tag:Name"
values = ["main-vpc"]
}
}
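When a data source reads an attribute of a resource directly, Terraform infers the ordering from that reference and depends_on is not needed. A minimal sketch of this implicit form, reusing aws_vpc.main from above (the data source name main_by_id is hypothetical):

```hcl
# Referencing aws_vpc.main.id creates the dependency; no depends_on is required
data "aws_vpc" "main_by_id" {
  id = aws_vpc.main.id
}
```

Looking the VPC up by ID also removes the risk of the tag filter matching a different VPC with the same name.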
Data Source Chains
# Chain of data sources: each reference creates an implicit dependency
data "aws_caller_identity" "current" {}
data "aws_iam_policy_document" "assume_role" {
statement {
actions = ["sts:AssumeRole"]
principals {
type = "AWS"
identifiers = [data.aws_caller_identity.current.arn]
}
}
}
resource "aws_iam_role" "main" {
name = "main-role"
assume_role_policy = data.aws_iam_policy_document.assume_role.json
}
Best Practices
1. Use Specific Filters
# Good - specific filters
data "aws_ami" "web" {
most_recent = true
owners = ["self", "amazon"]
filter {
name = "name"
values = ["web-server-v2.*"]
}
filter {
name = "state"
values = ["available"]
}
}
# Avoid - too generic
data "aws_ami" "web" {
most_recent = true
owners = ["amazon"]
}
2. Handle Multiple Results
# When expecting a single result: singular data sources such as aws_vpc
# fail if zero or multiple objects match
data "aws_vpc" "main" {
filter {
name = "tag:Name"
values = ["main-vpc"]
}
}
# When expecting multiple results: plural data sources such as aws_subnets
# return a list of IDs
data "aws_subnets" "private" {
filter {
name = "tag:Type"
values = ["private"]
}
}
3. Document Data Source Purpose
# Query the latest approved AMI for web servers
# This AMI is built and maintained by the platform team
data "aws_ami" "web_server" {
most_recent = true
owners = ["123456789012"] # Platform team account
filter {
name = "name"
values = ["approved-web-server-*"]
}
filter {
name = "tag:Approved"
values = ["true"]
}
}
4. Use Variables for Flexibility
variable "ami_name_pattern" {
description = "Pattern to match AMI names"
type = string
default = "amzn2-ami-hvm-*"
}
data "aws_ami" "selected" {
most_recent = true
owners = ["amazon"]
filter {
name = "name"
values = [var.ami_name_pattern]
}
}
5. Validate Data Source Results
data "aws_ami" "web" {
most_recent = true
owners = ["self"]
filter {
name = "name"
values = ["web-server-*"]
}
}
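# Validate AMI was found and is recent (timecmp requires Terraform 1.3 or later)
locals {
# Oldest acceptable creation time: 30 days (720 hours) before now
ami_cutoff = timeadd(timestamp(), "-720h")
# timecmp returns -1 when the first timestamp is earlier than the second
ami_is_recent = timecmp(data.aws_ami.web.creation_date, local.ami_cutoff) >= 0
}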
# Use validation in lifecycle rules or conditional logic
resource "aws_instance" "web" {
ami = data.aws_ami.web.id
instance_type = "t2.micro"
lifecycle {
precondition {
condition = local.ami_is_recent
error_message = "AMI is older than 30 days. Please update."
}
}
}
Testing Data Sources
Validate Data Source Queries
# Plan to see data source results
terraform plan
# Show data sources recorded in state
terraform show -json | jq '.values.root_module.resources[] | select(.mode == "data")'
# Refresh data sources (terraform refresh is deprecated)
terraform apply -refresh-only
Debug Data Source Issues
# Enable debug logging
export TF_LOG=DEBUG
terraform plan
# Check for specific data source
terraform console
> data.aws_ami.amazon_linux
Key Takeaways
- Data sources query existing infrastructure and external data
- Use filters and constraints to get specific results
- Data sources make configurations dynamic and avoid hardcoding
- Handle missing data with plural data sources and conditional logic; try() cannot catch a failed data source read
- Use specific filters to avoid ambiguous results
- Document the purpose of each data source
- Validate data source results when critical
- Test data source queries in different environments
Next Steps
- Complete Tutorial 8: Output Values
- Learn about local values and expressions
- Explore built-in functions for data manipulation
- Practice with complex data source scenarios