I've been building systems for almost 15 years, which is crazy to think about. I've also been building infrastructure (almost exclusively IaC) for around 7 years. I started with Kubernetes running API-driven microservices on AWS, then quickly moved to cloud-native architectures using Terraform, and for the past 5 years I've been on the serverless bandwagon. In this post, I'll share my experience with Terraform and how it has helped me scale up my infrastructure as a developer.
I've seen so many posts that start with "what is Terraform", and honestly it's off-putting, so let me be brief: it's a way of codifying your cloud infrastructure. You write your infrastructure as code (IaC) in a declarative way, and Terraform takes care of provisioning and managing the resources for you. It's like having a blueprint for your cloud infrastructure that you can version control, share, and reuse.
You're a founding engineer at a startup. You've been building the product for a few months, and now you're ready to scale up. You've got a few thousand users, you're starting to see some traction, and the shift has started: your company is bringing on new engineers, the team is no longer 5 people but 50, and you're moving away from an MVP/early product to a scalable solution. You've got resources scattered across different accounts, regions, and services. You're using a mix of manual provisioning, scripts, and ad-hoc solutions. It's time to get serious about your infrastructure and make sure it can grow at the same speed as the business.
We are going to be building a lot of new functionality onto the system, and we want a safe, easy-to-provision, repeatable, and secure way of building it. First we'll need to make a strategic architecture decision for the business. I won't go into detail about this decision, but make sure it is one that can scale with the business. For example, if you're building a modern reactive web application that is data driven, with requirements such as auditability, traceability, and scalability (all the -ilities), you might want to check out a previous post I wrote: An Introduction to Event Sourcing
Once we have our architecture, it's time to decide on our Terraform architecture, or better described, our Terraform design pattern. I don't see many people talking about this decision, but it's a critical one. We need to decide how we want to structure our Terraform code, how we want to manage our state, and how we want to deploy our infrastructure. There are many ways to do this, but I'll share my experience with a few different approaches. Let's go over the things I usually consider.
If you've worked with Terraform you'll already be locking and using remote state. State management in this context is more about how we'll split and manage our state files. I usually start with one monolithic state per environment, which is a good starting point. It's important to note that although the state is monolithic, our IaC won't be, which brings us to the next point.
A typical monolithic remote state backend configuration:
terraform {
  backend "s3" {
    bucket = "my-terraform-state"
    key    = "prod/terraform.tfstate"
    region = "eu-west-1"
  }
}

resource "aws_s3_bucket" "main" {
  bucket = "monolithic-bucket"
}
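Since locking is assumed above, here is a minimal sketch of what the same backend could look like with locking and encryption switched on. The DynamoDB table name terraform-locks is a hypothetical placeholder, and the table would need to exist (with a LockID string hash key) before running terraform init.

```hcl
terraform {
  backend "s3" {
    bucket = "my-terraform-state"
    key    = "prod/terraform.tfstate"
    region = "eu-west-1"

    # Encrypt the state object at rest in S3.
    encrypt = true

    # Assumption: a pre-existing DynamoDB table used for state locking.
    dynamodb_table = "terraform-locks"
  }
}
```

Newer Terraform releases also offer S3-native locking through the use_lockfile argument, so check the backend documentation for the version you run before settling on one approach.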
As your infrastructure grows, splitting state by domain or environment becomes essential:
terraform {
  backend "s3" {
    bucket = "my-terraform-state"
    key    = "prod/network/terraform.tfstate"
    region = "eu-west-1"
  }
}

resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}
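Once state is split, other configurations usually need values from it. One way to do that is the terraform_remote_state data source. This is only a sketch: it assumes the network configuration exposes an output called vpc_id (hypothetical, it isn't defined in the snippet above), and the subnet is purely illustrative.

```hcl
# Read the outputs of the network state from another configuration.
data "terraform_remote_state" "network" {
  backend = "s3"

  config = {
    bucket = "my-terraform-state"
    key    = "prod/network/terraform.tfstate"
    region = "eu-west-1"
  }
}

# Assumption: the network configuration declares `output "vpc_id"`.
resource "aws_subnet" "app" {
  vpc_id     = data.terraform_remote_state.network.outputs.vpc_id
  cidr_block = "10.0.1.0/24"
}
```

Only root-level outputs are visible this way, so the network configuration has to export anything that other states depend on.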
Ownership is a key aspect of scaling up your infrastructure. As your team grows, you'll want to ensure that different teams or individuals can own specific parts of the infrastructure. This means splitting your Terraform code into smaller, manageable modules that can be owned by different teams. Having each service owner also own the infrastructure for that service increases visibility and understanding of what the service is. For example, if you're building a web application, you might have a team responsible for the frontend, another team responsible for the backend, and another team responsible for the database. Each team can own their own Terraform modules and manage their own state files.
A simple example of a component module (for an S3 bucket):
resource "aws_s3_bucket" "this" {
  bucket = var.bucket_name
}

variable "bucket_name" {
  description = "The name of the S3 bucket"
  type        = string
}
A block module might compose several components, such as an API Gateway, Lambda, and DynamoDB table:
module "api_gateway" {
  source = "../api-gateway"
  name   = var.api_name
}

module "lambda" {
  source        = "../lambda"
  function_name = var.lambda_name
}

module "dynamodb" {
  source     = "../dynamodb"
  table_name = var.table_name
}
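The block above passes three input variables down to its components. Here is a sketch of how the block's variables.tf might declare them; the descriptions are my own wording for illustration, not taken from a real module.

```hcl
# variables.tf for the block module (illustrative descriptions)
variable "api_name" {
  description = "Name of the API Gateway exposed by this block"
  type        = string
}

variable "lambda_name" {
  description = "Name of the Lambda function behind the API"
  type        = string
}

variable "table_name" {
  description = "Name of the DynamoDB table used by the function"
  type        = string
}
```

Keeping the block's inputs this small is what lets a service module call it with just three arguments.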
A service module brings blocks together to represent a business domain:
module "web_api" {
  source      = "../../modules/web-api"
  api_name    = "my-web-api"
  lambda_name = "my-lambda"
  table_name  = "my-table"
}
Modules are obviously critical: they allow you to encapsulate and reuse your infrastructure code. So let's talk about the three types of modules I usually start with:
Components: These are closely tied to the resources they provision and are the building blocks of your infrastructure. They are usually small, reusable pieces of code that provision specific resources. For example, you might have a component module for provisioning an S3 bucket, another for provisioning a DynamoDB table, and another for provisioning an IAM role.
Blocks: Blocks are more usable from an engineering point of view. These modules usually fall into two categories: a collection of components that work together to provide a specific functionality, or a domain-specific module that provides a specific functionality. An example of a block could be a web-api block that contains several components such as an API Gateway, a Lambda function, and a DynamoDB table. Blocks are more focused on the functionality they provide than on the resources they provision.
Services: The final type of module in this architecture is the service. These modules are tightly coupled to the domain of the system and are defined exclusively by blocks, which makes it easy to identify which parts of the system are defined where. An example of a service module could be a web application service that uses a web-api block.
So in summary: components are tightly coupled to the resources of a provider. Blocks are collections of components (or domain-specific modules) that provide a specific functionality. Finally, services are collections of blocks that provide a specific domain functionality.
The following diagram illustrates the inheritance of modules in this approach:
graph LR
%%is-centered
Components --> Blocks
Blocks --> Services
Diagram illustrating the inheritance of modules
I want to dive more into the Terraform design pattern choices here, but let me first point out some standards that I follow, and have followed for quite some time.
Some years ago I came up with some internal standards at a company I worked with, and I've pretty much been following them since. Google also released a Terraform best practices document that I highly recommend reading; it aligns closely with what I push. Here are some of the key points I follow:
Use underscore-delimited names for resources, such as resource_name. A typical naming convention for AWS resources in Terraform:
resource "aws_iam_role" "app_server_role" {
  name = "app_server_role"
  # ...
}
Declare variables in a variables.tf file. Example of a well-documented variable in a module:
variable "bucket_name" {
  description = "The name of the S3 bucket"
  type        = string
}
Declare outputs in an outputs.tf file. Example output definition:
output "bucket_arn" {
  description = "The ARN of the S3 bucket"
  value       = aws_s3_bucket.this.arn
}
Include a README.md file in each module, and automate the generation of this file. Check out terraform-docs.
Keep data sources in a data.tf file. This isn't a critical one, as there are times when you might only need one data object, so arguments for avoiding the data.tf file can be made here.
This one is required for me: I see a lot of people using imperative code in their Terraform modules. This is a big no-no for me. Terraform is a declarative language and should be used as such. Avoid using loops, conditionals, and other imperative constructs in your modules. Instead, use the built-in functions and resources to achieve the desired outcome. We're not trying to write a program here, we're trying to define infrastructure. The key focus should be on how easy the code is to understand and maintain. Imperative code can make this difficult, so avoid it where possible.
A declarative resource definition is clear and maintainable:
resource "aws_s3_bucket" "example" {
  bucket = "my-declarative-bucket"
}
By contrast, imperative patterns (such as using null_resource and local-exec) should be avoided for core infrastructure logic:
resource "null_resource" "example" {
  provisioner "local-exec" {
    command = "aws s3 mb s3://my-bucket"
  }
}
Avoid using latest or master branches for production code, as this can lead to unexpected changes or can be misleading. For example, reference a tagged module version rather than a branch or commit hash:
module "vpc" {
  source = "git::https://github.com/terraform-aws-modules/terraform-aws-vpc.git?ref=v3.0.0"
  name   = "my-vpc"
  cidr   = "10.0.0.0/16"
}
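The same pinning idea can be applied to Terraform itself and to providers. A minimal sketch follows; the version constraints are illustrative assumptions rather than recommendations, so use the versions you have actually tested against.

```hcl
terraform {
  # Assumption: illustrative minimum CLI version.
  required_version = ">= 1.5.0"

  required_providers {
    aws = {
      source = "hashicorp/aws"
      # Assumption: allow any 5.x release, but not 6.0 or later.
      version = "~> 5.0"
    }
  }
}
```

Committing the generated .terraform.lock.hcl file alongside this keeps the exact provider versions consistent between machines and CI.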
Terralith is a design pattern that I avoid for the most part. It is a monolithic approach to managing your infrastructure: one large Terraform module contains all the resources and configuration for your entire infrastructure. To give it some credit, if you are building something that you know is going to stay small and won't need modules, it can be useful; it suits small teams or projects where the infrastructure is relatively simple. However, as your team grows and your infrastructure becomes more complex, this approach becomes difficult to manage and maintain. I highly advise avoiding this pattern.
graph TD
%%is-centered
A[Monolithic State] --> B[All Resources]
B --> C[Single Team]
Terramod is a design pattern that I highly recommend for most projects. It is a modular approach to managing your infrastructure. The idea is to break down your infrastructure into smaller, reusable modules that can be easily managed and maintained. This approach gives you a clear separation of concerns, making it easier to understand and manage your infrastructure. It also allows you to reuse modules across different projects, reducing duplication and improving maintainability.
graph TD
%%is-centered
A[Root Module] --> B[Component Module 1]
A --> C[Component Module 2]
A --> D[Component Module 3]
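As a rough sketch of the diagram, a root module in this pattern mostly just wires component modules together. The module path and bucket names below are hypothetical; the s3-bucket component is assumed to be the one shown earlier, taking a single bucket_name variable.

```hcl
# Root module: reuses the same component for two different buckets.
module "assets_bucket" {
  source      = "./modules/s3-bucket" # hypothetical path
  bucket_name = "my-app-assets"       # hypothetical name
}

module "logs_bucket" {
  source      = "./modules/s3-bucket"
  bucket_name = "my-app-logs"
}
```

The reuse is the point here: one component definition, several instances, and any change to the component lands in a single place.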
Terraservice is a design pattern that I usually go for when building infrastructure with many services and many teams. The idea is to break down your infrastructure into smaller, service-oriented modules that can be easily managed and maintained by the teams that own them. This approach allows you to have a clear separation of concerns, making it easier to understand and manage your infrastructure. It also allows you to reuse modules across different projects, reducing duplication and improving maintainability. This pattern is similar to Terramod but focuses more on the ownership aspect of the infrastructure.
graph TD
%%is-centered
S1[Service 1] --> M1[Block A]
S1 --> M2[Block B]
S2[Service 2] --> M3[Block C]
S2 --> M4[Block D]
M2 --> C2[Component Module 2]
M2 --> C3[Component Module 3]
M3 --> C4[Component Module 4]
M1 --> C4
M3 --> C1[Component Module 1]
M4 --> C1
M1 --> C1
M1 --> C2
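In practice, each service in this pattern can live in its own root configuration with its own state, owned by the team that runs it. This sketch assumes a hypothetical services/web-app directory, a state key of my own choosing, and the web-api block module from earlier.

```hcl
# services/web-app/main.tf (hypothetical layout)
terraform {
  backend "s3" {
    bucket = "my-terraform-state"
    # Assumption: one state file per service, per environment.
    key    = "prod/services/web-app/terraform.tfstate"
    region = "eu-west-1"
  }
}

# The service is defined exclusively by blocks.
module "web_api" {
  source      = "../../modules/web-api"
  api_name    = "my-web-api"
  lambda_name = "my-lambda"
  table_name  = "my-table"
}
```

With this layout a plan or apply for one service only touches that service's state, which is what makes team-level ownership workable.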
Overall, I think the biggest takeaways (the TL;DR) are:
If you're looking for more information on Terraform and IaC, I highly recommend the following resources: