Terraform Data - What It Is and How to Use It

Terraform is a tool that helps you manage cloud infrastructure services as code. Because you codify your infrastructure, this approach is also known as Infrastructure as Code (IaC). The cloud has become indispensable to more and more businesses: it not only reduces time and costs but also lets companies concentrate on their core business. Like a programming language, Terraform has many features. Today we will learn how to use Terraform data sources and why we need them.

What is a Data Source in Terraform?

Data sources allow Terraform to use information defined outside of Terraform, defined by another, separate Terraform configuration (and its state), or modified by functions.

In other words, cloud infrastructure, applications, and services expose data, and Terraform can query that data through data sources.

For example, Terraform uses data sources to retrieve information from cloud provider APIs, such as availability zone IDs, or data about other pieces of your infrastructure through the outputs of other Terraform states.

What can you do with Terraform Data?

Data sources let you load data from APIs or from other Terraform workspaces. You can use this data to make your project's code more flexible and to connect workspaces that manage different parts of your infrastructure. You can also use data sources to connect and share data between workspaces in Terraform Cloud and Terraform Enterprise.
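As a minimal sketch of sharing data between Terraform Cloud workspaces, the terraform_remote_state data source can read another workspace's outputs through the "remote" backend. The organization and workspace names below are hypothetical placeholders, and the other workspace must allow its state to be shared.

```hcl
# Read the root-module outputs of another Terraform Cloud / Enterprise workspace.
# "example-org" and "network-prod" are hypothetical names.
data "terraform_remote_state" "network" {
  backend = "remote"

  config = {
    organization = "example-org"
    workspaces = {
      name = "network-prod"
    }
  }
}
```

The other workspace's outputs then become available under data.terraform_remote_state.network.outputs.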

When to use Terraform Data

The best use case for a data source is replacing hard-coded information that could change, because such hard-coded values decrease the maintainability of our code.

How to use a Terraform Data Source

Let's look at an example where a module creates subnets, and we need to specify the Availability Zones for them.

The subnet configuration uses a variable called region, with a default value of us-west-1, to set the region.
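The variable described above could be declared as follows. The description text and the provider block that consumes the variable are assumptions, since the original only names the variable and its default.

```hcl
# Region variable with the default mentioned above; the description is an assumption.
variable "region" {
  description = "AWS region where the subnets are created"
  type        = string
  default     = "us-west-1"
}

# Assumed wiring: the AWS provider takes its region from the variable.
provider "aws" {
  region = var.region
}
```

With this wiring, data sources of the AWS provider, including aws_availability_zones, query whichever region var.region selects.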

However, changing the value of the region variable will not successfully change the region, because the subnet module also receives an azs argument to set the Availability Zones, and that argument is a hard-coded list of zones that exist only in the us-west-1 region.

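module "subnet" {
  source = "./modules/subnet" # hypothetical local module path
  # Hard-coded AZs: these zone names only exist in us-west-1
  azs    = ["us-west-1b", "us-west-1c"]
}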

Instead, use the aws_availability_zones data source to retrieve the available AZs for the current region.

data "aws_availability_zones" "az_available" {
  state = "available"
}

The aws_availability_zones data source is part of the AWS provider, and it is documented under that provider in the Terraform Registry. Like resources, data source blocks support arguments that control how they behave. In this case, the state argument restricts the result to the Availability Zones that are currently available.
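To illustrate another argument, aws_availability_zones also accepts filter blocks. The sketch below, with the hypothetical name az_standard, keeps only zones whose opt-in status is "opt-in-not-required", which in practice are the standard Availability Zones rather than opt-in zones such as Local Zones. Check the provider documentation for the filters your provider version supports.

```hcl
# Same data source, narrowed further with a filter block.
data "aws_availability_zones" "az_standard" {
  state = "available"

  # Keep only zones that do not require an explicit opt-in.
  filter {
    name   = "opt-in-status"
    values = ["opt-in-not-required"]
  }
}
```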

You can reference data source attributes with the pattern data.<TYPE>.<NAME>.<ATTRIBUTE>. So, update the module to use this data source to collect the list of Availability Zones:

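module "subnet" {
  source = "./modules/subnet" # hypothetical local module path
  # AZ names now come from the data source, so they follow the configured region
  azs    = data.aws_availability_zones.az_available.names
}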
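If the module should receive exactly two zones, as the hard-coded version did, one option is to slice the list in a local value and expose it as an output for inspection. The names selected_azs and the output are hypothetical; you would then pass local.selected_azs to the azs argument. Note that slice fails if the region returns fewer than two zones.

```hcl
locals {
  # First two available AZ names for the current region.
  selected_azs = slice(data.aws_availability_zones.az_available.names, 0, 2)
}

# Handy for checking which zones were picked after a region change.
output "selected_azs" {
  value = local.selected_azs
}
```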

Terraform Data Remote State

You can use the terraform_remote_state data source to read the output values of another Terraform workspace (state).
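Only root-module outputs of the other state are exposed this way, so the configuration that owns that state must declare them. A minimal sketch of that producing side, assuming hypothetical aws_vpc and aws_subnet resources named bitslovers and an output called subnet_id:

```hcl
# Producing configuration (the one whose state will be read).
resource "aws_vpc" "bitslovers" {
  cidr_block = "10.0.0.0/16"
}

resource "aws_subnet" "bitslovers" {
  vpc_id     = aws_vpc.bitslovers.id
  cidr_block = "10.0.1.0/24"
}

# Root-module output: this is what terraform_remote_state can read.
output "subnet_id" {
  value = aws_subnet.bitslovers.id
}
```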

Here is an example that reads a local state file:

data "terraform_remote_state" "bitslovers_vpc" {
  backend = "local"
  config = {
    path = "../terraform.tfstate"
  }
}

This remote state block uses the local backend to load state data from the path given in the config block.

For more complex scenarios, however, we usually store the state in an S3 bucket. (This is recommended for big projects where many DevOps engineers work on the same infrastructure.)

Here is how to read a remote state stored in an S3 bucket using Terraform Data:

data "terraform_remote_state" "bitslovers_vpc" {
  backend = "s3"
  config = {
    bucket  = "bitslovers-remotestate"
    key     = "blog/us_east_1/dev/terraform.tfstate"
    region  = "us-east-1"
    encrypt = true
  }
}

Then, to use the data, reference the outputs of the remote state:

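resource "aws_instance" "bitslovers" {
  ami           = "ami-0123456789abcdef0" # placeholder AMI ID, replace with a real one
  instance_type = "t3.micro"              # assumed instance type
  # Remote state outputs are exposed under the "outputs" attribute
  subnet_id     = data.terraform_remote_state.bitslovers_vpc.outputs.subnet_id
}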

It doesn't matter whether you use a local or a remote state: you access the loaded data the same way.

Data Source Lifecycle

If the arguments of a data instance contain no references to computed values, such as attributes of resources that have not been created yet, Terraform reads the data instance and updates its state during the "refresh" phase, which by default runs before creating a plan.

But why? This guarantees that the retrieved data is available during the planning phase, so the diff shows the actual values received.

However, the arguments of a data instance may refer to computed values. In that case, the instance's attributes cannot be resolved until all of its arguments are known, so reading the data instance is deferred until the apply phase. Every reference to the data instance's attributes is shown as "computed" (displayed as "known after apply" in recent Terraform versions) during planning, because the values are still unknown.
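A small sketch of the deferred case, using hypothetical names: on the first run the VPC does not exist yet, so its id is computed, and the aws_vpc data source that depends on it can only be read during apply.

```hcl
# A VPC Terraform has not created yet: its id is unknown at plan time.
resource "aws_vpc" "example" {
  cidr_block = "10.0.0.0/16"
}

# The only argument is a computed value, so on the first run
# Terraform defers reading this data source until apply.
data "aws_vpc" "created" {
  id = aws_vpc.example.id
}

output "vpc_cidr" {
  # Displayed as unknown ("known after apply") in the first plan.
  value = data.aws_vpc.created.cidr_block
}
```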

Using “depends_on” with Data Sources

Data resources have the same dependency resolution behavior as managed resources. Setting depends_on in a data block defers reading the data source until all changes to its dependencies have been applied.
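The syntax looks like this. The dependency on a hypothetical aws_vpc.example resource is there only to illustrate the meta-argument: if that VPC has pending changes, the read waits until they are applied.

```hcl
resource "aws_vpc" "example" {
  cidr_block = "10.0.0.0/16"
}

data "aws_availability_zones" "after_vpc" {
  state = "available"

  # Hypothetical dependency, shown only to illustrate the syntax.
  depends_on = [aws_vpc.example]
}
```

Because depends_on can push reads into the apply phase and make more values unknown during planning, it is best reserved for dependencies that Terraform cannot infer from references.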

Also, to ensure that data sources read the most up-to-date data in a wide variety of use cases, arguments that refer directly to managed resources are treated as if the resource were listed in depends_on. When desired, you can avoid this behavior by referencing the managed resource values indirectly through a local value.
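A sketch of that indirection, with hypothetical names: the data source reads local.vpc_id instead of aws_vpc.example.id. Whether this fully decouples the read can depend on your Terraform version, and a value that is still unknown (for example, a VPC not created yet) defers the read regardless.

```hcl
resource "aws_vpc" "example" {
  cidr_block = "10.0.0.0/16"
}

# Pass the resource attribute through a local value instead of
# referencing the managed resource directly from the data block.
locals {
  vpc_id = aws_vpc.example.id
}

data "aws_vpc" "indirect" {
  id = local.vpc_id
}
```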

Data sources vs locals

When you start to use Terraform, it's easy to mix up data sources, locals, and variables. Here's how I keep them apart:

Data sources bring in data from the outside world through providers, such as the aws_availability_zones data source, which loads the Availability Zones from AWS;
Variables bring in data from the parent module or from whoever runs Terraform;
Locals hold values derived from data sources, variables, or internal sources such as expressions and the output attributes of resources or submodules.
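The three can be seen side by side in one small configuration. The names below reuse the earlier examples, and name_prefix and first_two_azs are hypothetical.

```hcl
# Variable: supplied by the caller (parent module, -var flag, .tfvars file, TF_VAR_region).
variable "region" {
  type    = string
  default = "us-west-1"
}

# Data source: read from the outside world through the AWS provider.
data "aws_availability_zones" "az_available" {
  state = "available"
}

# Locals: derived values built from variables, data sources, and expressions.
locals {
  name_prefix   = "bitslovers-${var.region}"
  first_two_azs = slice(data.aws_availability_zones.az_available.names, 0, 2)
}
```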

Conclusion

We have taken a short tour of some characteristics and behaviors of data sources in Terraform. Data sources are a helpful tool, but they can get you into trouble if you make assumptions about how they behave. Carefully considering how your data sources behave in light of your specific needs will lead to a good design and a robust solution.