Terraform shared VPC GKE Cluster on Google Cloud
Introduction:
Google Kubernetes Engine (GKE) is a managed, production-ready environment for deploying containerized applications. It is based on Kubernetes, an open-source container orchestration system. Creating a GKE cluster on Google Cloud using Terraform has several benefits. Here are some of them:
Unified Workflow: If you are already deploying infrastructure to Google Cloud with Terraform, your GKE cluster can fit into that workflow. You can also deploy applications into your GKE cluster using Terraform.
Full Lifecycle Management: Terraform doesn't only create resources; it also updates and deletes tracked resources without requiring you to inspect the API to identify them. This means you can use Terraform to manage the entire lifecycle of your GKE cluster.
Consistency: By using Terraform to create your GKE cluster, you can ensure that your cluster is created in a consistent way every time. This helps avoid configuration drift and ensures that your cluster is always in a known state.
Collaboration: By storing your Terraform state in a remote location, you can collaborate with other team members more effectively. This helps avoid conflicts and ensures that everyone is working with the same version of the infrastructure.
Automation: By using Terraform to create your GKE cluster, you can automate the entire process. This saves time and reduces the risk of human error.
Scope of Article
This article walks through how to set up a GKE cluster on Google Cloud using HashiCorp Terraform. Setting up the Shared VPC network itself is outside the scope of this article.
Before you Begin
Create a project in the Google Cloud Console and set up billing on that project. Note that the resources created in this guide (a regional GKE cluster and its node VMs) go beyond the GCP "always free" tier and will incur charges while they are running.
Please refer to this page for information on how to set up a Service Project, a Shared VPC Network, and Firewall Rules with the necessary IAM permissions for your service account and user account.
https://cloud.google.com/kubernetes-engine/docs/how-to/cluster-shared-vpc
Create a service account with the appropriate IAM permissions to provision a GKE cluster - https://cloud.google.com/iam/docs/service-accounts-create
Activate the service account - https://cloud.google.com/sdk/gcloud/reference/auth/activate-service-account (example commands are sketched after this list)
Install Terraform (this article uses Terraform v1.3.5)
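As a rough sketch of the service-account prerequisites above - the account name, key file path, and granted role below are placeholders, and the roles your organization requires may differ:
# Create the service account in the service project (placeholder names)
gcloud iam service-accounts create terraform-gke \
    --project=<PROJECT_ID> \
    --display-name="Terraform GKE provisioner"

# Grant it permission to manage GKE clusters in the service project
# (adjust the role to match your organization's policies)
gcloud projects add-iam-policy-binding <PROJECT_ID> \
    --member="serviceAccount:terraform-gke@<PROJECT_ID>.iam.gserviceaccount.com" \
    --role="roles/container.admin"

# Create a key and activate the service account for gcloud
gcloud iam service-accounts keys create key.json \
    --iam-account=terraform-gke@<PROJECT_ID>.iam.gserviceaccount.com
gcloud auth activate-service-account \
    terraform-gke@<PROJECT_ID>.iam.gserviceaccount.com \
    --key-file=key.json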
Implementation:
Terraform Tree Structure
terraform:
├── application
└── cluster
└── backend.tf
└── gke.tf
└── outputs.tf
└── provider.tf
└── variables.tf
└── vars.tfvars
provider.tf
terraform:
├── application
└── cluster
    └── provider.tf
Create a Terraform configuration file that sets the required Terraform version and the providers needed to connect to Google Cloud.
# Declare the connection to the google provider and pin the Terraform version.
# Note: variables cannot be used inside the terraform {} block, so the version
# constraints must be literal strings.
terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = ">= 4.50.0" # adjust to the provider version you have tested with
    }
  }
  required_version = ">= 1.3.5"
}

provider "google" {
  project                      = var.project
  region                       = var.region
  impersonate_service_account = "<SERVICE_ACCOUNT_NAME>@<PROJECT_ID>.iam.gserviceaccount.com"
}
Note: This article does not cover how to use Google Cloud Service Account impersonation in your Terraform code. For the production use case, we recommend using a service account access token in the provider block.
Please Refer: cloud.google.com/blog/topics/developers-pra..
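For impersonation to work, the identity running Terraform must hold the Service Account Token Creator role on the service account being impersonated. A minimal sketch, with placeholder account names:
# Allow your user account to mint tokens for the Terraform service account
gcloud iam service-accounts add-iam-policy-binding \
    <SERVICE_ACCOUNT_NAME>@<PROJECT_ID>.iam.gserviceaccount.com \
    --member="user:<YOUR_USER_EMAIL>" \
    --role="roles/iam.serviceAccountTokenCreator"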
terraform:
├── application
└── cluster
└── backend.tf
Use a Google Cloud Storage bucket to store the Terraform state file.
terraform {
  backend "gcs" {
    bucket = "cluster-tfstate"
    prefix = "terraform/state/cluster"
  }
}
Note: Before executing the above, please create the GCS bucket "cluster-tfstate".
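For example, the bucket can be created with gsutil (the location below is an assumption; pick whichever region you use, and enabling versioning is optional but useful for state recovery):
# Create the state bucket and turn on object versioning
gsutil mb -p <PROJECT_ID> -l us-central1 gs://cluster-tfstate
gsutil versioning set on gs://cluster-tfstate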
Cluster Setup
terraform:
├── application
└── cluster
└── gke.tf
In this GKE cluster we create two separately managed node pools: a control-plane node pool, to which the application's service pods are assigned, and a CPU node pool for user workload pods.
# GKE on Shared VPC Cluster
resource "google_container_cluster" "primary" {
  name               = "${var.name}-cluster"
  project            = var.project
  location           = var.region
  network            = "projects/${var.gcp_host_project}/global/networks/${var.shared_vpc_name}"
  subnetwork         = "projects/${var.gcp_host_project}/regions/${var.region}/subnetworks/${var.shared_vpc_subnetwork}"
  initial_node_count = 1 // node count for the default pool
  networking_mode    = "VPC_NATIVE"

  network_policy {
    enabled = true // enables Calico network policy enforcement, so NetworkPolicy resources can restrict traffic between user pods
  }

  # Enables private nodes and specifies a private RFC 1918 block for the control plane's VPC.
  private_cluster_config {
    enable_private_nodes    = true
    master_ipv4_cidr_block  = var.master_ipv4_cidr_block
    enable_private_endpoint = false
  }

  # Configuration for cluster IP allocation.
  ip_allocation_policy {
    cluster_secondary_range_name  = var.subnetwork_pod_range_name
    services_secondary_range_name = var.subnetwork_svc_range_name
  }

  # Enable Stackdriver Kubernetes logging and monitoring (https://github.com/terraform-providers/terraform-provider-google/pull/1544)
  logging_service    = "logging.googleapis.com/kubernetes"
  monitoring_service = "monitoring.googleapis.com/kubernetes"
}
// Control Plane Managed Node Pool
// Application services run on this separately managed control-plane node pool
resource "google_container_node_pool" "control_plane_nodes" {
  name       = "${var.name}-control-plane"
  project    = var.project
  location   = var.region
  cluster    = google_container_cluster.primary.name
  node_count = 2

  lifecycle {
    prevent_destroy = false
  }

  node_config {
    disk_size_gb = "250"
    machine_type = "n2d-standard-8"
    image_type   = "cos_containerd"
    tags         = ["${var.project}-gke"]

    labels = {
      tier = "control-plane"
    }

    metadata = {
      disable-legacy-endpoints = "true"
    }

    oauth_scopes = [
      "https://www.googleapis.com/auth/logging.write",
      "https://www.googleapis.com/auth/monitoring",
      "https://www.googleapis.com/auth/devstorage.read_only",
      "https://www.googleapis.com/auth/cloud-platform.read-only"
    ]
  }
}
// CPU node pool
// User pods run on this separately managed CPU node pool
resource "google_container_node_pool" "primary_nodes" {
  name       = "${var.name}-cpu-node-pool"
  project    = var.project
  location   = var.region
  cluster    = google_container_cluster.primary.name
  node_count = 1

  autoscaling {
    min_node_count = var.min_node_count
    max_node_count = var.max_node_count
  }

  lifecycle {
    prevent_destroy = false
  }

  node_config {
    disk_size_gb = var.disk_size_gb
    machine_type = var.machine_type
    image_type   = var.image_type
    tags         = ["${var.project}-gke"]

    labels = {
      env = var.name
    }

    metadata = {
      disable-legacy-endpoints = "true"
    }

    oauth_scopes = [
      "https://www.googleapis.com/auth/logging.write",
      "https://www.googleapis.com/auth/monitoring",
      "https://www.googleapis.com/auth/devstorage.read_only",
      "https://www.googleapis.com/auth/cloud-platform.read-only"
    ]
  }
}
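outputs.tf
The tree above also lists an outputs.tf. A minimal sketch of what it might expose (the output names here are only suggestions):
output "cluster_name" {
  description = "Name of the GKE cluster"
  value       = google_container_cluster.primary.name
}

output "cluster_location" {
  description = "Region of the GKE cluster"
  value       = google_container_cluster.primary.location
}

output "cluster_endpoint" {
  description = "IP address of the GKE control plane endpoint"
  value       = google_container_cluster.primary.endpoint
  sensitive   = true
}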
Taint control plane Node Pool
Taint all nodes within the control-plane node pool (selected here by the tier=control-plane label applied in that pool's node_config):
kubectl taint nodes -l tier=control-plane tier=control-plane:NoSchedule
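Note that taints applied with kubectl are not re-applied when GKE recreates nodes (for example during upgrades or auto-repair). As an alternative sketch, the same taint can be declared inside the control-plane node pool's node_config block so Terraform manages it:
# Inside node_config of google_container_node_pool.control_plane_nodes
taint {
  key    = "tier"
  value  = "control-plane"
  effect = "NO_SCHEDULE"
}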
Workload Identity
Workload Identity allows workloads in your GKE clusters to impersonate Identity and Access Management (IAM) service accounts to access Google Cloud services.
# Note 1:
# Workload Identity can be enabled as part of the Terraform provisioning
# (see the Terraform sketch below); otherwise, enable it afterwards on the existing cluster:
gcloud container clusters update <name>-cluster \
    --region=us-central1 \
    --workload-pool=<project-id>.svc.id.goog
# Note 2:
# Update an existing node pool
gcloud container node-pools update <name>-cpu-node-pool \
--cluster=<name>-cluster \
--region=us-central1 \
--workload-metadata=GKE_METADATA
# Ref: https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity#enable_on_cluster
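To enable Workload Identity directly from Terraform instead (a sketch, assuming a recent google provider version), the corresponding blocks would be:
# In the google_container_cluster "primary" resource:
workload_identity_config {
  workload_pool = "${var.project}.svc.id.goog"
}

# In the node_config block of each google_container_node_pool:
workload_metadata_config {
  mode = "GKE_METADATA"
}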
variables.tf
-----------------------------------------------------------------------------------------------------------------
############################
### PROJECT ###
############################
variable "project" {}
variable "region" {}
############################
### NETWORK ###
############################
variable "gcp_host_project" {}
variable "shared_vpc_name" {}
variable "shared_vpc_subnetwork" {}
variable "subnetwork_pod_range_name" {}
variable "subnetwork_svc_range_name" {}
variable "master_ipv4_cidr_block" {}
############################
### NODES ###
############################
variable "name" {}
variable "disk_size_gb" {}
variable "image_type" {}
variable "machine_type" {}
variable "min_node_count" {}
variable "max_node_count" {}
Note: You may add more variables in this file to further templatize the configuration.
vars.tfvars
-----------------------------------------------------------------------------------------------------------------
The tfvars file assigns values to the variables declared above. The values below are placeholders and examples only - replace them with values from your environment.
############################
### PROJECT ###
############################
project = "<SERVICE_PROJECT_ID>"
region  = "us-central1"
############################
### NETWORK ###
############################
gcp_host_project          = "<HOST_PROJECT_ID>"
shared_vpc_name           = "<SHARED_VPC_NAME>"
shared_vpc_subnetwork     = "<SHARED_VPC_SUBNETWORK_NAME>"
subnetwork_pod_range_name = "<POD_SECONDARY_RANGE_NAME>"
subnetwork_svc_range_name = "<SVC_SECONDARY_RANGE_NAME>"
master_ipv4_cidr_block    = "<MASTER_IPV4_CIDR_BLOCK>" # e.g. a /28 RFC 1918 range
############################
### NODES ###
############################
name           = "<CLUSTER_NAME_PREFIX>"
disk_size_gb   = "100"
image_type     = "cos_containerd"
machine_type   = "<MACHINE_TYPE>"
min_node_count = 1
max_node_count = 3
Apply Terraform
# Initialize environment
### Step 1. Terraform init
Follow the conventional Terraform workflow to build this solution. You will be prompted for required variables. Alternatively, you may create a `vars.tfvars` file and pass the `-var-file=vars.tfvars` flag.
Initialize the terraform environment.
```
terraform init
```
### Step 2. IMPORTANT: Initialize workspace
Select or create a terraform workspace before running the configuration **IN THIS
FOLDER**. This ensures that the backend state will be saved according to the
target environment. Recommended workspace names are `development`, `production`
to clearly designate the target environment:
```
terraform workspace select development
```
or
```
terraform workspace new development
```
### Step 3. Terraform plan
Plan the terraform solution.
```
terraform plan
```
or
```
terraform plan -var-file=vars.tfvars
```
### Step 4. Terraform apply
Apply the terraform solution.
```
terraform apply
```
or
```
terraform apply -var-file=vars.tfvars
```
That's all: after terraform apply completes, you should have a GKE cluster provisioned on the Shared VPC. In fact, you can replace the Shared VPC with a VPC network in the same project if you want to restrict the cluster's network access to that project.
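For a same-project VPC, the network references in gke.tf simplify to the network and subnetwork names (the names below are hypothetical):
# In google_container_cluster.primary, for a VPC in the same project:
network    = "my-private-vpc"
subnetwork = "my-private-subnet"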
Let’s confirm the cluster creation in the GCP portal.
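Alternatively, you can confirm it from the command line:
$ gcloud container clusters list --project <PROJECT_ID>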
Access the cluster using kubectl:
Configure kubectl to connect to your GKE cluster by running gcloud container clusters get-credentials, where CLUSTER_NAME is the name of your GKE cluster and REGION is the region where the cluster is located.
$ gcloud container clusters get-credentials CLUSTER_NAME --region REGION --project PROJECT_NAME
Verify the connection by running kubectl get nodes to display the nodes in your GKE cluster.
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
gke-terraform-k8s-cluster-d-node-pool-1c70ecad-xfjt Ready <none> 12m v1.25.8-gke.500
gke-terraform-k8s-cluster-d-node-pool-2956fa2f-8xt5 Ready <none> 12m v1.25.8-gke.500
gke-terraform-k8s-cluster-d-node-pool-574989dc-38cb Ready <none> 12m v1.25.8-gke.500
Delete the cluster:
Once we no longer need this infrastructure, we can clean up to reduce costs.
$ terraform destroy -auto-approve
Summary
Overall, using Terraform to create a GKE cluster on Google Cloud can help to streamline the process, ensure consistency, and improve collaboration and automation.
Please share your thoughts and experiences after following the steps outlined.