EKS v1.26 nodes fail to join the cluster with custom VPC domain-name

If you have recently upgraded your EKS cluster to v1.26, you might notice that nodes running a custom AMI have not joined the cluster.

The first thing to do is to dive into the kubelet logs on the node itself, which reveal errors similar to:

Failed to contact API server when waiting for CSINode publishing: csinodes.storage.k8s.io "ip-10-20-30-40.custom.domain" is forbidden: User "system:node:ip-10-20-30-40.eu-west-1.compute.internal" cannot get resource "csinodes" in API group "storage.k8s.io" at the cluster scope: can only access CSINode with the same name as the requesting node
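
The key detail in that error is that two different names appear for the same node. As a minimal sketch, the snippet below pulls both names out of a shortened copy of the log line and compares them. The variable names are purely illustrative, and the comments reflect my understanding of where each name comes from rather than anything the log itself states.

```shell
# Shortened copy of the kubelet error shown above
log_line='csinodes.storage.k8s.io "ip-10-20-30-40.custom.domain" is forbidden: User "system:node:ip-10-20-30-40.eu-west-1.compute.internal" cannot get resource "csinodes"'

# Name kubelet asked for (assumption: taken from the OS hostname, which carries the custom domain)
requested=$(printf '%s\n' "$log_line" | sed -n 's/.*csinodes\.storage\.k8s\.io "\([^"]*\)" is forbidden.*/\1/p')

# Name the API server authenticated the node as (assumption: based on the EC2 private DNS name)
authenticated=$(printf '%s\n' "$log_line" | sed -n 's/.*User "system:node:\([^"]*\)".*/\1/p')

if [ "$requested" != "$authenticated" ]; then
  echo "mismatch: kubelet=$requested api-user=$authenticated"
fi
```

As long as these two names differ, the node is only allowed to touch objects named after the second one, so every request for the first one is rejected.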

This is the point where you go looking for more detail on why kubelet is failing to join. On GitHub you will come across https://github.com/awslabs/amazon-eks-ami/pull/1264, which describes the root cause of the problem and provides a workaround for the time being.

Long story short: if you are using a custom domain and your cluster is now failing in production 🙂, here is the fix.

Since I am using Anton Babenko's Terraform EKS modules, we will add the following pieces to our Terraform self-managed nodes configuration to mitigate this:

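pre_bootstrap_user_data = <<-EOT
  # Look up this instance's ID via the imds helper on the node
  export INSTANCE_ID=$(imds /latest/meta-data/instance-id)
  # Resolve the EC2 private DNS name that the node is authenticated under
  export PRIVATE_DNS_NAME=$(aws ec2 describe-instances --instance-ids "$INSTANCE_ID" --query 'Reservations[].Instances[].PrivateDnsName' --output text)
EOT

# Make kubelet register under the EC2 private DNS name instead of the custom-domain hostname
bootstrap_extra_args = "--kubelet-extra-args \"--hostname-override=$PRIVATE_DNS_NAME\""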
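
One thing that may look odd is that $PRIVATE_DNS_NAME appears inside a Terraform string. Terraform only interpolates the ${...} form, so the variable reaches the node's user data untouched and is expanded by the shell at boot. The sketch below imitates that step. It assumes the module places pre_bootstrap_user_data ahead of the bootstrap.sh call in the same script, and the DNS name is a made-up stand-in for the value the real lookup would return.

```shell
# Hypothetical value standing in for what pre_bootstrap_user_data exports at boot
export PRIVATE_DNS_NAME="ip-10-20-30-40.eu-west-1.compute.internal"

# The text Terraform writes into the script: \" has become ", $PRIVATE_DNS_NAME is still literal
rendered_args='--kubelet-extra-args "--hostname-override=$PRIVATE_DNS_NAME"'

# Let the shell parse that text the way it would on the bootstrap.sh command line
eval "set -- $rendered_args"

echo "flag:  $1"
echo "value: $2"
```

The second argument ends up as --hostname-override followed by the private DNS name, which is what kubelet needs in order to register under the name it is authorized for.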
