Terraform shared VPC GKE Cluster on Google Cloud

Introduction:

Google Kubernetes Engine (GKE) is a managed, production-ready environment for deploying containerized applications. It is based on Kubernetes, an open-source container orchestration system. Creating a GKE cluster on Google Cloud using Terraform has several benefits. Here are some of them:

  1. Unified Workflow: If you are already deploying infrastructure to Google Cloud with Terraform, your GKE cluster can fit into that workflow. You can also deploy applications into your GKE cluster using Terraform

  2. Full Lifecycle Management: Terraform doesn't only create resources; it also updates and deletes tracked resources without requiring you to inspect the API to identify them. This means you can use Terraform to manage the entire lifecycle of your GKE cluster

  3. Consistency: By using Terraform to create your GKE cluster, you can ensure that your cluster is created in a consistent way every time. This can help to avoid configuration drift and ensure that your cluster is always in a known state

  4. Collaboration: By storing your Terraform state in a remote location, you can collaborate with other team members more effectively. This can help to avoid conflicts and ensure that everyone is working with the same version of the infrastructure

  5. Automation: By using Terraform to create your GKE cluster, you can automate the entire process. This can help to save time and reduce the risk of human error

Scope of Article

This article will go through how to set up a GKE cluster on a shared VPC on Google Cloud using HashiCorp Terraform. Setting up the shared VPC network itself is out of scope for this article.

Before you Begin

You will need an existing shared VPC host project with a subnetwork and secondary IP ranges for Pods and Services, a service project to host the cluster, and the Terraform and gcloud CLIs installed and authenticated.

Implementation:

Terraform Tree Structure

terraform
└── application
    └── cluster
        ├── backend.tf
        ├── gke.tf
        ├── outputs.tf
        ├── provider.tf
        ├── variables.tf
        └── vars.tfvars
  1. provider.tf

      terraform
      └── application
          └── cluster
              └── provider.tf

    Create a Terraform configuration file that pins the required Terraform version and declares the provider needed to connect to Google Cloud.

# Declare the Google provider and the required Terraform version.
# Note: version constraints in the terraform block must be literal strings;
# Terraform does not allow variables here.
terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = ">= 4.50.0" # adjust to the provider version you have tested
    }
  }
  required_version = ">= 1.3.0" # adjust to the Terraform version you use
}

provider "google" {
  project                     = var.project
  region                      = var.region
  impersonate_service_account = "<SERVICE_ACCOUNT_NAME>@<PROJECT_ID>.iam.gserviceaccount.com"
}

Note: This article does not cover Google Cloud service account impersonation in depth. For production use, we recommend generating a short-lived service account access token and passing it to the provider block, as sketched below.

Please Refer: cloud.google.com/blog/topics/developers-pra..
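A rough sketch of that access-token pattern, as an alternative to the impersonate_service_account argument shown above (assumptions: a local value `terraform_service_account` holding the service account email, and a google provider version recent enough to support the `google_service_account_access_token` data source):

```
# Bootstrap provider that runs as your own identity
provider "google" {
  alias = "impersonation"
  scopes = [
    "https://www.googleapis.com/auth/cloud-platform",
    "https://www.googleapis.com/auth/userinfo.email",
  ]
}

# Generate a short-lived access token for the Terraform service account
data "google_service_account_access_token" "default" {
  provider               = google.impersonation
  target_service_account = local.terraform_service_account # assumed local value
  scopes                 = ["cloud-platform", "userinfo-email"]
  lifetime               = "1200s"
}

# Main provider authenticates with the generated token
provider "google" {
  project      = var.project
  region       = var.region
  access_token = data.google_service_account_access_token.default.access_token
}
```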

  2. backend.tf

      terraform
      └── application
          └── cluster
              └── backend.tf

Use a Google Cloud Storage bucket to store the Terraform state file.

terraform {
  backend "gcs" {
    bucket = "cluster-tfstate"
    prefix = "terraform/state/cluster"
  }
}

Note: Before running terraform init, create the GCS bucket "cluster-tfstate" referenced in the backend block, for example with the command below.
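A minimal sketch of creating that bucket with the gcloud CLI (substitute your own project ID and preferred location; enabling versioning is optional but helps recover earlier state versions):

```
# Create the state bucket (bucket names must be globally unique)
gcloud storage buckets create gs://cluster-tfstate \
    --project=<PROJECT_ID> \
    --location=us-central1 \
    --uniform-bucket-level-access

# Optional: keep older versions of the state file
gcloud storage buckets update gs://cluster-tfstate --versioning
```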

  3. Cluster Setup

      terraform
      └── application
          └── cluster
              └── gke.tf

In this GKE cluster we will create two separate node pools: a control-plane node pool that runs the application's own service pods (not the GKE control plane, which Google manages), and a CPU node pool that runs user workload pods.

# GKE on Shared VPC Cluster
resource "google_container_cluster" "primary" {
  name               = "${var.name}-cluster"
  project            = var.project
  location           = var.region
  network            = "projects/${var.gcp_host_project}/global/networks/${var.shared_vpc_name}"
  subnetwork         = "projects/${var.gcp_host_project}/regions/${var.region}/subnetworks/${var.shared_vpc_subnetwork}"
  initial_node_count = 1 // node count for the default pool
  networking_mode    = "VPC_NATIVE"
  network_policy {
    enabled = true // enables Calico network policy enforcement so NetworkPolicy resources can restrict pod-to-pod traffic
  }

  # Enable private nodes and specify an RFC 1918 /28 range for the control plane's internal endpoint.
  private_cluster_config {
    enable_private_nodes    = true
    master_ipv4_cidr_block  = var.master_ipv4_cidr_block
    enable_private_endpoint = false
  }

  # Configuration for cluster IP allocation.
  ip_allocation_policy {
    cluster_secondary_range_name  = var.subnetwork_pod_range_name
    services_secondary_range_name = var.subnetwork_svc_range_name
  }

  # Enable Stackdriver Kubernetes logging and monitoring (https://github.com/terraform-providers/terraform-provider-google/pull/1544)
  logging_service    = "logging.googleapis.com/kubernetes"
  monitoring_service = "monitoring.googleapis.com/kubernetes"

}

// Control plane managed node pool
// Application services run on this separately managed control-plane node pool
resource "google_container_node_pool" "control_plane_nodes" {
  name     = "${var.name}-control-plane"
  project  = var.project
  location = var.region
  cluster  = google_container_cluster.primary.name
  node_count = 2
  lifecycle {
    prevent_destroy = false
  }
  node_config {
    disk_size_gb = "250"
    machine_type = "n2d-standard-8"
    image_type   = "cos_containerd"
    tags         = ["${var.project}-gke"]
    labels = {
      tier = "control-plane"
    }
    metadata = {
      disable-legacy-endpoints = "true"
    }
    oauth_scopes = [
      "https://www.googleapis.com/auth/logging.write",
      "https://www.googleapis.com/auth/monitoring",
      "https://www.googleapis.com/auth/devstorage.read_only",
      "https://www.googleapis.com/auth/cloud-platform.read-only"
    ]
  }
}


// CPU node pool
// User pods run on this separately managed CPU node pool
resource "google_container_node_pool" "primary_nodes" {
  name     = "${var.name}-cpu-node-pool"
  project  = var.project
  location = var.region
  cluster  = google_container_cluster.primary.name
  node_count = 1
  autoscaling {
    min_node_count = var.min_node_count
    max_node_count = var.max_node_count
  }
  lifecycle {
    prevent_destroy = false
  }
  node_config {
    disk_size_gb = var.disk_size_gb
    machine_type = var.machine_type
    image_type   = var.image_type
    tags         = ["${var.project}-gke"]
    labels = {
      env = var.name
    }
    metadata = {
      disable-legacy-endpoints = "true"
    }
    oauth_scopes = [
      "https://www.googleapis.com/auth/logging.write",
      "https://www.googleapis.com/auth/monitoring",
      "https://www.googleapis.com/auth/devstorage.read_only",
      "https://www.googleapis.com/auth/cloud-platform.read-only"
    ]
  }
}
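The tree above also lists an outputs.tf, which the steps do not show. A minimal sketch (an assumption, not part of the original setup) could export values that later tooling, such as kubectl configuration or a Kubernetes provider, typically needs:

```
# outputs.tf (assumed sketch)
output "cluster_name" {
  value = google_container_cluster.primary.name
}

output "cluster_endpoint" {
  value     = google_container_cluster.primary.endpoint
  sensitive = true
}

output "cluster_ca_certificate" {
  value     = google_container_cluster.primary.master_auth[0].cluster_ca_certificate
  sensitive = true
}
```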
  4. Taint the Control Plane Node Pool

    Taint all nodes within the control-plane node pool so that only pods tolerating the taint are scheduled there. The nodes carry the tier=control-plane label from gke.tf, so they can be selected by label. A declarative alternative is sketched after the command.

kubectl taint nodes -l tier=control-plane tier=control-plane:NoSchedule
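Alternatively (an assumption, not a step from the original workflow), the taint can be declared directly in the control-plane pool's node_config so that new or autoscaled nodes come up already tainted:

```
# Inside google_container_node_pool.control_plane_nodes -> node_config
taint {
  key    = "tier"
  value  = "control-plane"
  effect = "NO_SCHEDULE"
}
```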
  5. Workload Identity

    Workload Identity allows workloads in your GKE clusters to impersonate Identity and Access Management (IAM) service accounts to access Google Cloud services.

# Note 1:
# Workload Identity can be enabled as part of the Terraform provisioning (see the sketch below);
# otherwise, enable it on the existing cluster afterwards:
gcloud container clusters update <name>-cluster \
    --region=us-central1 \
    --workload-pool=<PROJECT_ID>.svc.id.goog

# Note 2:
# Update an existing node pool
gcloud container node-pools update <name>-cpu-node-pool \
    --cluster=<name>-cluster \
    --region=us-central1 \
    --workload-metadata=GKE_METADATA

# Ref: https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity#enable_on_cluster

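To enable Workload Identity in Terraform instead (a sketch, assuming a reasonably recent google provider), add a workload_identity_config block to the cluster and a workload_metadata_config block to the node pool's node_config:

```
# In google_container_cluster.primary
workload_identity_config {
  workload_pool = "${var.project}.svc.id.goog"
}

# In google_container_node_pool.primary_nodes -> node_config
workload_metadata_config {
  mode = "GKE_METADATA"
}
```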

  6. variables.tf

-----------------------------------------------------------------------------------------------------------------
############################
###        PROJECT      ###
############################

variable "project" {}
variable "region" {}

############################
###        NETWORK      ###
############################

variable "gcp_host_project" {}
variable "shared_vpc_name" {}
variable "shared_vpc_subnetwork" {}
variable "subnetwork_pod_range_name" {}
variable "subnetwork_svc_range_name" {}
variable "master_ipv4_cidr_block" {}

############################
###        NODES        ###
############################

variable "name" {}
variable "disk_size_gb" {}
variable "image_type" {}
variable "machine_type" {}
variable "min_node_count" {}
variable "max_node_count" {}

Note: We may add more variables to this file to further templatize the configuration, for example with types, descriptions, and defaults as sketched below.
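A declaration enriched this way might look like the following (the default value is only illustrative):

```
variable "machine_type" {
  type        = string
  description = "Machine type for the CPU node pool"
  default     = "n2d-standard-8"
}
```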

  7. vars.tfvars

A .tfvars file assigns values to the variables declared above. The values below are placeholders; adjust them to your environment.

-----------------------------------------------------------------------------------------------------------------
############################
###        PROJECT      ###
############################

project = "<SERVICE_PROJECT_ID>"
region  = "us-central1"

############################
###        NETWORK      ###
############################

gcp_host_project          = "<HOST_PROJECT_ID>"
shared_vpc_name           = "<SHARED_VPC_NAME>"
shared_vpc_subnetwork     = "<SUBNETWORK_NAME>"
subnetwork_pod_range_name = "<POD_SECONDARY_RANGE_NAME>"
subnetwork_svc_range_name = "<SVC_SECONDARY_RANGE_NAME>"
master_ipv4_cidr_block    = "172.16.0.0/28"

############################
###        NODES        ###
############################

name           = "demo"
disk_size_gb   = 100
image_type     = "cos_containerd"
machine_type   = "n2d-standard-8"
min_node_count = 1
max_node_count = 3
  8. Apply Terraform

### Step 1. Terraform init

Follow the conventional Terraform workflow to build this solution. You will be
prompted for required variables; alternatively, create a `vars.tfvars` file and
pass the `-var-file=vars.tfvars` flag.

Initialize the Terraform environment.

```
terraform init
```

### Step 2. IMPORTANT: Initialize workspace

Select or create a Terraform workspace before running the configuration **IN THIS
FOLDER**. This ensures that the backend state is saved according to the
target environment. Recommended workspace names are `development` and `production`,
to clearly designate the target environment:

```
terraform workspace select development
```

or

```
terraform workspace new development
```

### Step 3. Terraform plan

Plan the Terraform solution.

```
terraform plan
```

or

```
terraform plan -var-file=vars.tfvars
```

### Step 4. Terraform apply

Apply the Terraform solution.

```
terraform apply
```

or

```
terraform apply -var-file=vars.tfvars
```

That's all! After terraform apply completes, you should have a GKE cluster provisioned on the shared VPC. You may also replace the shared VPC with a project-local VPC network if you want to restrict the cluster's network access to resources within the same project.

Let's confirm the cluster creation in the Google Cloud console, or from the command line as shown below.
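A quick CLI check (substitute your own project ID for the placeholder):

```
gcloud container clusters list --project=<PROJECT_ID>
```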

Access the cluster using kubectl:

Configure kubectl to connect to your GKE cluster by running gcloud container clusters get-credentials CLUSTER_NAME --region REGION, where CLUSTER_NAME is the name of your GKE cluster, REGION is the region where the cluster is located, and PROJECT_NAME is your project ID.

$ gcloud container clusters get-credentials CLUSTER_NAME --region REGION --project PROJECT_NAME

Verify the connection by running kubectl get nodes to display the nodes in your GKE cluster.

$ kubectl get nodes
NAME                                                  STATUS   ROLES    AGE   VERSION
gke-terraform-k8s-cluster-d-node-pool-1c70ecad-xfjt   Ready    <none>   12m   v1.25.8-gke.500
gke-terraform-k8s-cluster-d-node-pool-2956fa2f-8xt5   Ready    <none>   12m   v1.25.8-gke.500
gke-terraform-k8s-cluster-d-node-pool-574989dc-38cb   Ready    <none>   12m   v1.25.8-gke.500
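Optionally, verify that the control-plane taint from step 4 is in place (this assumes the tier=control-plane node label defined in gke.tf):

```
kubectl describe nodes -l tier=control-plane | grep Taints
```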

Delete the cluster:

Once we no longer need this infrastructure, we can clean up to reduce costs.

$ terraform destroy -auto-approve

Summary

Overall, using Terraform to create a GKE cluster on Google Cloud can help to streamline the process, ensure consistency, and improve collaboration and automation.

Please share your thoughts and experiences after following the steps outlined.