Blog

Do this before you upgrade GKE to K8S v1.25, or you might regret it!

20 Jan, 2023

In this blog you'll read how to make sure your persistent volume-backed pods will start after upgrading Kubernetes (K8S) to v1.25. The blog starts with a quick introduction to the problem. After that, we'll dive into how the problem manifested, what we tried, and how we ultimately fixed it. I'll end this blog post with a script you can use to make sure you won't be impacted by this problem.

Recently (January 13th, 2023), K8S v1.25 became the new default for the regular release channel on Google Kubernetes Engine (GKE). This means that GKE clusters subscribed to this release channel will upgrade the control plane soon. After the control plane upgrade, node pools with auto upgrade configured will follow. This might however cause an undocumented problem with your Google Compute Engine (GCE) persistent disk-backed persistent volumes on GKE.

We recently upgraded our cluster from K8S v1.24 to v1.25. The control plane upgraded fine, but the problems started when we were upgrading the node pools. We noticed some of our workloads not coming back up after the nodes had been upgraded. The workloads were Unschedulable, and the events shown by kubectl describe pods ... gave us a pretty clear indication of why the scheduler was unable to schedule them:

1 node(s) had untolerated taint {node.kubernetes.io/not-ready: }
24 node(s) had no available volume zone
3 Insufficient cpu
28 node(s) had volume node affinity conflict.
preemption: 0/28 nodes are available: 28 Preemption is not helpful for scheduling.

The problem was that the new GKE nodes running K8S v1.25 were unable to attach the GCE persistent disks (24 node(s) had no available volume zone). It was unclear why they were unable to since we couldn't find any other related log messages. The autoscaler meanwhile was continuously trying to bring up nodes that would be able to attach the disk.

Then we tried creating a new PVC, which stayed pending with the following message: Waiting for volume to be created either by external provisioner "pd.csi.storage.gke.io" or manually created by system administrator. This pointed us to the external provisioner, since we hadn't been creating volumes ourselves before the upgrade. Reading up on Google's documentation on the provisioner pointed us to the solution: enabling the Compute Engine persistent disk CSI Driver. The control plane took a long time to enable the plugin, as the autoscaler was still scaling up and down. After waiting a couple of minutes the plugin was enabled and the cluster started to look healthy again.
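If you want to run the same check yourself, a minimal test PVC is enough. The sketch below writes such a manifest (the claim name is made up, and standard-rwo is assumed to be a CSI-backed storage class on your cluster); apply it with kubectl apply -f test-pvc.yaml and inspect it with kubectl describe pvc:

```shell
# Write a minimal test PVC manifest (claim name and storage class are assumptions).
cat > test-pvc.yaml <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: csi-smoke-test
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: standard-rwo
  resources:
    requests:
      storage: 1Gi
EOF
echo "wrote test-pvc.yaml"
```

On a cluster hit by this problem the claim should stay Pending with the "Waiting for volume to be created" message above; on a healthy cluster it binds quickly. Don't forget to delete the claim afterwards.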

To add a bit of background to this problem: Kubernetes used to ship with built-in (in-tree) volume drivers for, among others, the major cloud providers. These drivers handle the communication between Kubernetes and the disk provisioner (e.g. GCE persistent disks). For a couple of years, the plan has been to remove these drivers from the Kubernetes source code and to re-implement them as CSI plugins instead. Even though I was unable to find it in the K8S/GKE release notes, this built-in driver seems to have been removed or stopped working in the v1.25 release.

GKE clusters created with K8S v1.18 or later are configured by default to make use of the Compute Engine persistent disk CSI Driver-plugin. However, GKE clusters originally created with an older version of K8S don't have this plugin enabled by default. This means that, when you update the nodes to K8S v1.25, GCE persistent volumes will no longer be automatically attached. This causes the pods configured with a Persistent Volume Claim to enter the Unschedulable state. The GKE cluster autoscaler will try to add nodes until a node comes up that has the required driver installed.

If you want to know whether you are at risk for this issue, determine if your workloads make use of persistent volumes and if the Compute Engine persistent disk CSI Driver add-on is enabled. If you are not sure, you can run the following command from your terminal:

gcloud container clusters list --format \
'value(name, location, addonsConfig.gcePersistentDiskCsiDriverConfig.enabled)'
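The command prints one line per cluster, with the enabled flag as the third, tab-separated column; the column is empty when the add-on was never configured. As a sketch (with made-up cluster names standing in for real output), you can filter for the clusters that still need attention like this:

```shell
# Hypothetical sample of the list output; the third (tab-separated) column is
# "True" only when the CSI driver add-on is enabled.
printf 'prod-cluster\teurope-west4\tTrue\nlegacy-cluster\teurope-west4-a\t\n' |
awk -F'\t' '$3 != "True" {print $1 " still needs the CSI driver"}'
```

Here only legacy-cluster is printed, because its enabled column is empty.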

Run the following command if any of your clusters don't have the driver enabled and they do have persistent volumes backed by GCE persistent disks:

gcloud container clusters update <CLUSTER-NAME> \
--update-addons=GcePersistentDiskCsiDriver=ENABLED

Alternatively, you can use the following script (provided by Mark van Holsteijn) to check all clusters in a project and print the update command for every cluster that doesn't have the Compute Engine persistent disk CSI Driver enabled:

#!/bin/bash
# NAME
#   check-gce-csi-driver-enabled -- checks whether the GCE CSI driver is enabled on your clusters
#
gcloud container clusters \
    list \
    --format 'value(name, location, addonsConfig.gcePersistentDiskCsiDriverConfig.enabled, selfLink)' | \
    while IFS=$'\t' read -r name location enabled selflink; do
    # split on tabs only, so an empty "enabled" column doesn't shift the
    # selfLink into the wrong variable
    project=$(awk -F/ '{print $6}' <<< "$selflink")
    location=$(awk -F/ '{print $8}' <<< "$selflink")
    location_type=$(awk -F/ '{if($7 == "zones"){print "zone"} else{print "region"}}' <<< "$selflink")

    if [[ $enabled != True ]]; then
        echo "WARN: cluster $name in $location_type $location does not have the GCE CSI driver enabled." >&2
        echo "# change your terraform configuration, or run the following command to update"
        echo "gcloud container clusters update $name --$location_type $location --project $project --update-addons=GcePersistentDiskCsiDriver=ENABLED"
    else
        echo "INFO: cluster $name in $location_type $location has the GCE CSI driver enabled \o/" >&2
    fi
done
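To see what the awk calls in the script extract, here is the same parsing applied to a sample selfLink (project and cluster names are made up). A cluster selfLink has the form .../projects/&lt;project&gt;/zones/&lt;zone&gt;/clusters/&lt;name&gt; (or locations/&lt;region&gt; for regional clusters), so when split on "/", fields 6, 7 and 8 carry the project, the location type and the location:

```shell
# Sample selfLink with hypothetical project/cluster names.
selflink="https://container.googleapis.com/v1/projects/my-project/zones/europe-west4-a/clusters/my-cluster"
project=$(printf '%s' "$selflink" | awk -F/ '{print $6}')
location=$(printf '%s' "$selflink" | awk -F/ '{print $8}')
location_type=$(printf '%s' "$selflink" | awk -F/ '{if($7 == "zones"){print "zone"} else{print "region"}}')
echo "$project $location_type $location"   # → my-project zone europe-west4-a
```

Those three values are exactly what the generated gcloud container clusters update command needs for its --project and --zone/--region flags.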
