Mayara Gouveia
Oct 15, 2021


Autoscaling group with warm pool on Kubernetes AWS EKS with Terraform — Yes! It's possible!

Have you heard of an auto-scaling group feature called warm pool?

This feature lets you reduce the latency of applications that periodically depend on new instances booting.

Imagine a scenario where you have a Kubernetes cluster on Amazon EKS with a few managed node groups, and one of them runs a task that, through auto-scaling, requires creating more nodes in the group to handle the workload. You run into a problem called latency! Depending on your application's demand, the time it takes for new nodes to be ready and running can be very long.

The result is…. loading……..

With a warm pool, you can keep these nodes on stand-by, already registered in the cluster in the NotReady state, for when your application needs them.

It is widely acknowledged that everything these days is code, and my lab infrastructure is no different: my lab environment is provisioned through Terraform.

I use some official Terraform modules to provision my cluster on Amazon EKS, and then came the idea of using a warm pool. However, I ran into a problem: the aws_eks_node_group resource didn't support warm pools yet!

So, after several attempts, I decided to adjust the available resources to my needs.

I used the aws_autoscaling_group resource, which already supports warm pools (but isn't made specifically for EKS), alongside the aws_launch_configuration resource, where I could configure the userdata that joins the nodes to the cluster.

Here's the code snippet of my default managed node group, without a warm pool.

resource "aws_eks_node_group" "managed_nodes_without_warm_pool" {
cluster_name = var.cluster_name
node_group_name = var.managed-nodes-name
subnet_ids = data.terraform_remote_state.cluster.subnetid
launch_template {
id = aws_launch_template.managed-nodes-without-warm-pool-lt.id
version = aws_launch_template.managed-nodes-without-warm-pool-lt.latest_version
}
scaling_config {
desired_size = var.desired
max_size = var.maxsize
min_size = var.minsize
}
labels = {
lifecycle = "OnDemand"
}
lifecycle {
create_before_destroy = true
}
}
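
This node group references a launch template that isn't reproduced in the snippet above. A minimal sketch of what such a template could look like, where the instance type and disk settings are illustrative assumptions rather than my exact values:

resource "aws_launch_template" "managed-nodes-without-warm-pool-lt" {
  name_prefix   = "managed-nodes-without-warm-pool-"
  instance_type = var.instance_type # illustrative variable name

  block_device_mappings {
    device_name = "/dev/xvda"

    ebs {
      volume_size = 20
      volume_type = "gp3"
    }
  }

  lifecycle {
    create_before_destroy = true
  }
}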

Now let's make the magic happen and create a new node group with a warm pool.

I chose to use the aws_ami data source with a filter to select the newest EKS-optimized EC2 image, using a variable to search based on the Kubernetes cluster version I am running.

data "aws_ami" "managed_nodes_ami" {
most_recent = true
filter {
name = "name"
values = ["amazon-eks-node-${var.cluster_version}-*"]
}
owners = ["amazon"]
}
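
The cluster_version variable drives the AMI lookup; EKS-optimized images follow the naming pattern amazon-eks-node-<kubernetes version>-v<release date>. An example definition, where the default value is just illustrative:

variable "cluster_version" {
  description = "Kubernetes version of the EKS cluster, used to filter the EKS-optimized AMI"
  type        = string
  default     = "1.21" # illustrative default; match your cluster's version
}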

From the resulting image, I used the aws_launch_configuration resource to set the security group, the group name, and, most importantly, the userdata.

resource "aws_launch_configuration" "managed-nodes-warm-pool-lc" {
name = "managed-nodes-warm-pool-lc"
image_id = data.aws_ami.managed_nodes_ami.id
instance_type = var.wp-instance_type
security_groups = var.security_group
associate_public_ip_address = true
user_data = file("userdata.sh")
iam_instance_profile = aws_iam_instance_profile.managed-nodes-warm-pool-ip.name
lifecycle {
create_before_destroy = true
}
}
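
The launch configuration references an IAM instance profile that isn't shown above. A sketch of what it could look like, assuming the AWS managed policies that EKS worker nodes usually need (the role and attachment names here are illustrative):

resource "aws_iam_role" "managed-nodes-warm-pool-role" {
  name = "managed-nodes-warm-pool-role"

  # Allow EC2 instances to assume this role.
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action    = "sts:AssumeRole"
      Effect    = "Allow"
      Principal = { Service = "ec2.amazonaws.com" }
    }]
  })
}

resource "aws_iam_role_policy_attachment" "worker_node_policy" {
  role       = aws_iam_role.managed-nodes-warm-pool-role.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"
}

resource "aws_iam_role_policy_attachment" "cni_policy" {
  role       = aws_iam_role.managed-nodes-warm-pool-role.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
}

resource "aws_iam_role_policy_attachment" "ecr_read_only" {
  role       = aws_iam_role.managed-nodes-warm-pool-role.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
}

resource "aws_iam_instance_profile" "managed-nodes-warm-pool-ip" {
  name = "managed-nodes-warm-pool-ip"
  role = aws_iam_role.managed-nodes-warm-pool-role.name
}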

With the launch configuration ready, I used the aws_autoscaling_group resource to create my node group with the warm pool enabled.

resource "aws_autoscaling_group" "managed-nodes-warm-pool-asg" {
name = "managed-nodes-warm-pool-asg"
desired_capacity = var.desired
max_size = var.maxsize
min_size = var.minsize
health_check_grace_period = 15
health_check_type = "EC2"
force_delete = true
launch_configuration = aws_launch_configuration.managed-nodes-warm-pool-lc.name
vpc_zone_identifier = data.terraform_remote_state.cluster.outputs.public
warm_pool {
pool_state = "Stopped"
min_size = var.wp_min
max_group_prepared_capacity = var.wp_max
}
dynamic "tag" {
for_each = var.custom_tags
content {
key = tag.key
value = tag.value
propagate_at_launch = true
}
}
timeouts {
delete = "15m"
}
}
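
The var.custom_tags map can carry whatever you need propagated to the instances. Self-managed nodes usually also want the cluster ownership tag (and, if you use the cluster autoscaler, its discovery tags). An illustrative definition, with a placeholder cluster name:

variable "custom_tags" {
  type = map(string)

  default = {
    "Name"                                 = "managed-nodes-warm-pool"
    # "my-eks-cluster" is a placeholder; use your cluster's name.
    "kubernetes.io/cluster/my-eks-cluster" = "owned"
    "k8s.io/cluster-autoscaler/enabled"    = "true"
  }
}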

In my userdata.sh script, I put the information the nodes need to join the cluster, and like magic, everything worked as it should!
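
A minimal sketch of what that script can look like, assuming the /etc/eks/bootstrap.sh helper shipped with the EKS-optimized AMI (the cluster name, endpoint, and certificate below are placeholders, not my real values):

#!/bin/bash
set -o xtrace

# Register the node with the cluster; bootstrap.sh is provided by the
# EKS-optimized AMI selected through the aws_ami data source above.
/etc/eks/bootstrap.sh my-eks-cluster \
  --apiserver-endpoint "https://<cluster-endpoint>" \
  --b64-cluster-ca "<base64-encoded-cluster-ca>" \
  --kubelet-extra-args "--node-labels=lifecycle=warm-pool"

Keep in mind that, since these are self-managed nodes, the IAM role used by the instances also has to be mapped in the cluster's aws-auth ConfigMap so the kubelets are authorized to register.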

Here's the application behavior under a small traffic load: the warm pool hosts stay in NotReady status.

And here, under a huge traffic load, the warm pool brings nodes to Ready status in just a few seconds and creates new NotReady nodes if needed.

There is no perfect tool, only solutions that fit your needs!
