If you have recently upgraded your EKS cluster to v1.26, you might notice that nodes running a custom AMI have not joined the cluster.
The first thing to do is check the kubelet logs on the node itself, which show errors similar to:
Failed to contact API server when waiting for CSINode publishing: csinodes.storage.k8s.io "ip-10-20-30-40.custom.domain" is forbidden: User "system:node:ip-10-20-30-40.eu-west-1.compute.internal" cannot get resource "csinodes" in API group "storage.k8s.io" at the cluster scope: can only access CSINode with the same name as the requesting node
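The important detail in that message is that two different names appear: the CSINode the kubelet asks for carries the custom domain, while the node's identity carries the EC2 private DNS name. The following is only a minimal sketch of that comparison, using the two values from the error above; the variable names are made up for illustration and are not part of any EKS tooling.

```shell
#!/bin/sh
# Values copied from the error message above.
csinode_name="ip-10-20-30-40.custom.domain"
node_user="system:node:ip-10-20-30-40.eu-west-1.compute.internal"

# The node user has the form "system:node:<node name>", so stripping the
# prefix yields the only node name this kubelet is allowed to act as.
authorized_name="${node_user#system:node:}"

if [ "$csinode_name" = "$authorized_name" ]; then
  result="match"
else
  result="mismatch"
fi

echo "requested:  $csinode_name"
echo "authorized: $authorized_name"
echo "result:     $result"
```

On an affected node you could substitute the output of `hostname -f` and the PrivateDnsName reported by `aws ec2 describe-instances` for the two hard-coded values. If they differ, the kubelet is registering under a name its identity is not authorized for.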
This prompts a search for more detail on why the kubelet is failing to join. On GitHub you will come across https://github.com/awslabs/amazon-eks-ami/pull/1264, which describes the root cause of the problem and provides a workaround for the time being.
To make a long story short: if you are using a custom domain and your cluster is now failing in production 🙂, here is the fix.
Since I am using Anton Babenko's Terraform EKS modules, we will add the following to our Terraform self-managed node groups to mitigate this:
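pre_bootstrap_user_data = <<-EOT
  # Look up the instance's EC2 private DNS name before bootstrap runs
  export INSTANCE_ID=$(imds /latest/meta-data/instance-id)
  export PRIVATE_DNS_NAME=$(aws ec2 describe-instances --instance-ids $INSTANCE_ID --query 'Reservations[].Instances[].PrivateDnsName' --output text)
EOT

# Make the kubelet register under the private DNS name instead of the custom-domain hostname
bootstrap_extra_args = "--kubelet-extra-args \"--hostname-override=$PRIVATE_DNS_NAME\""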