Pods stuck in ContainerCreating status with Karpenter: How to fix

Kubernetes is a powerful orchestration platform, but sometimes pods get stuck in the ContainerCreating status. In this article, we’ll explore common reasons for this issue and provide practical solutions to get your pods up and running smoothly. If you’ve ever faced this frustrating situation, read on to discover how to troubleshoot and resolve it.


Problem – Pods stuck in ContainerCreating status with Karpenter

We faced an intriguing and challenging issue: Pods recurrently getting stuck in the “ContainerCreating” status when scheduled onto nodes provisioned by Karpenter. This behavior has a significant impact on the normal operation of Kubernetes, preventing containers from being deployed efficiently and, consequently, affecting the agility and reliability of the environment.

During my Karpenter deployment journey, I noticed that Pods using a Docker image based on PHP Alpine were getting stuck in “ContainerCreating”, as shown below:

fernando@lab-1487   ~  kubectl get pods -A -o wide --watch | grep 23-144                                                                                                         1|1   11105  16:06:39  
default                         inflate-b9d769f59-v98tf                             1/1     Running            0          25m     172.31.21.32    ip-172-31-23-144.ec2.internal   <none>           <none>
kube-system                     aws-node-hgwvk                                      1/1     Running            0          33m     172.31.23.144   ip-172-31-23-144.ec2.internal   <none>           <none>
kube-system                     filebeat-bjcsx                                      1/1     Running            0          33m     172.31.23.144   ip-172-31-23-144.ec2.internal   <none>           <none>
kube-system                     kube-proxy-fzwfp                                    1/1     Running            0          33m     172.31.23.144   ip-172-31-23-144.ec2.internal   <none>           <none>
kube-system                     metricbeat-szq58                                    1/1     Running            0          33m     172.31.23.144   ip-172-31-23-144.ec2.internal   <none>           <none>
lens-metrics                    node-exporter-pllfr                                 1/1     Running            0          33m     172.31.27.199   ip-172-31-23-144.ec2.internal   <none>           <none>
monitoring                      zabbix-agent-48hc7                                  1/1     Running            0          33m     172.31.23.144   ip-172-31-23-144.ec2.internal   <none>           <none>
api-cep-homolog             api-cep-nbtt-59644cc6c6-pt6ct             0/1     Pending             0          0s      <none>          ip-172-31-23-144.ec2.internal   <none>           <none>
api-cep-homolog             api-cep-nbtt-59644cc6c6-h8km8             0/1     Pending             0          0s      <none>          ip-172-31-23-144.ec2.internal   <none>           <none>
api-cep-homolog             api-cep-nbtt-59644cc6c6-pt6ct             0/1     ContainerCreating   0          0s      <none>          ip-172-31-23-144.ec2.internal   <none>           <none>
api-cep-homolog             api-cep-nbtt-59644cc6c6-h8km8             0/1     ContainerCreating   0          0s      <none>          ip-172-31-23-144.ec2.internal   <none>           <none>
api-cep-homolog             api-cep-nbtt-59644cc6c6-zrv7z             0/1     Pending             0          0s      <none>          ip-172-31-23-144.ec2.internal   <none>           <none>
api-cep-homolog             api-cep-nbtt-59644cc6c6-zrv7z             0/1     ContainerCreating   0          0s      <none>          ip-172-31-23-144.ec2.internal   <none>           <none>
api-cep-homolog             api-cep-nbtt-59644cc6c6-pt6ct             0/1     Terminating         0          0s      <none>          ip-172-31-23-144.ec2.internal   <none>           <none>
api-cep-homolog             api-cep-nbtt-59644cc6c6-pt6ct             0/1     Terminating         0          10s     <none>          ip-172-31-23-144.ec2.internal   <none>           <none>
api-cep-homolog             api-cep-nbtt-59644cc6c6-pt6ct             0/1     Terminating         0          10s     <none>          ip-172-31-23-144.ec2.internal   <none>           <none>

The affected Pods were random; many Pods from other projects came up correctly.

Several components of the Kubernetes cluster were reviewed during troubleshooting, such as the Pod, kubelet, and Karpenter logs, among other pertinent items.

The Pods had these events:

Warning   Failed  Pod api-cep-nbtt-59644cc6c6-pt6ct   spec.containers{api-cep-nbtt}   kubelet, ip-172-31-23-144.ec2.internal  Error: cannot find
volume "kube-api-access-8lv5h" to mount into container "api-cep-nbtt"

Meanwhile, the kubelet service showed this information:

 
[root@ip-172-31-23-144 bin]# systemctl status kubelet.service
 kubelet.service - Kubernetes Kubelet
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubelet-args.conf, 30-kubelet-extra-args.conf
   Active: active (running) since Thu 2023-06-22 18:33:15 UTC; 1h 24min ago
     Docs: https://github.com/kubernetes/kubernetes
 Main PID: 2949 (kubelet)
   CGroup: /runtime.slice/kubelet.service
           └─2949 /usr/bin/kubelet --cloud-provider aws --config /etc/kubernetes/kubelet/kubelet-config.json --kubeconfig /var/lib/kubelet/kubeconfig --container-runtime remote --container-runtime-endpoint un...

Jun 22 19:22:31 ip-172-31-23-144.ec2.internal kubelet[2949]: I0622 19:22:31.402350    2949 kubelet.go:1951] "SyncLoop DELETE" source="api" pods=[api-cep-homolog/api-cep-nbtt-59644cc6c6-pt6ct]
Jun 22 19:22:31 ip-172-31-23-144.ec2.internal kubelet[2949]: I0622 19:22:31.408804    2949 kubelet.go:1945] "SyncLoop REMOVE" source="api" pods=[api-cep-homolog/api-cep-nbtt-59644cc6c6-pt6ct]
Jun 22 19:22:31 ip-172-31-23-144.ec2.internal kubelet[2949]: I0622 19:22:31.408869    2949 kubelet.go:2144] "Failed to delete pod" pod="api-cep-homolog/api-cep-nbtt-59644cc6c6-pt6ct"...d not found"
Jun 22 19:22:32 ip-172-31-23-144.ec2.internal kubelet[2949]: E0622 19:22:32.034178    2949 kubelet_pods.go:160] "Mount cannot be satisfied for the container, because the volume is missing or the volume mounte...
Jun 22 19:22:32 ip-172-31-23-144.ec2.internal kubelet[2949]: E0622 19:22:32.034515    2949 kuberuntime_manager.go:864] container &Container{Name:api-cep-nbtt,Image:311635504079.dkr.ecr.u...inerPort:900
Jun 22 19:22:32 ip-172-31-23-144.ec2.internal kubelet[2949]: ],},HTTPGet:nil,TCPSocket:nil,},PreStop:nil,},TerminationMessagePath:/dev/termination-log,ImagePullPolicy:IfNotPresent,SecurityContext:...lObjectRefer
Jun 22 19:22:32 ip-172-31-23-144.ec2.internal kubelet[2949]: E0622 19:22:32.037294    2949 pod_workers.go:190] "Error syncing pod, skipping" err="failed to \\"StartContainer\\" for \\"api-cep-nbtt\\" wi...
Jun 22 19:22:32 ip-172-31-23-144.ec2.internal kubelet[2949]: I0622 19:22:32.326023    2949 kubelet.go:1973] "SyncLoop (PLEG): event for pod" pod="api-cep-homolog/api-cep-nbtt-59644cc...285179c693f}
Jun 22 19:22:32 ip-172-31-23-144.ec2.internal kubelet[2949]: I0622 19:22:32.337575    2949 kubelet.go:1973] "SyncLoop (PLEG): event for pod" pod="api-cep-homolog/api-cep-nbtt-59644cc...3ee0fbe11b0}
Jun 22 19:22:33 ip-172-31-23-144.ec2.internal kubelet[2949]: I0622 19:22:33.383158    2949 kubelet_volumes.go:140] "Cleaned up orphaned pod volumes dir" podUID=22966fe3-4ceb-481d-9d57-4586953a8692...692/volumes"
Hint: Some lines were ellipsized, use -l to show in full.
 

Full version:

[root@ip-172-31-23-144 bin]# systemctl status kubelet.service -l
 kubelet.service - Kubernetes Kubelet
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubelet-args.conf, 30-kubelet-extra-args.conf
   Active: active (running) since Thu 2023-06-22 18:33:15 UTC; 1h 28min ago
     Docs: https://github.com/kubernetes/kubernetes
 Main PID: 2949 (kubelet)
   CGroup: /runtime.slice/kubelet.service
           └─2949 /usr/bin/kubelet --cloud-provider aws --config /etc/kubernetes/kubelet/kubelet-config.json --kubeconfig /var/lib/kubelet/kubeconfig --container-runtime remote --container-runtime-endpoint unix:///run/containerd/containerd.sock --node-ip=172.31.23.144 --pod-infra-container-image=311401143452.dkr.ecr.us-east-1.amazonaws.com/eks/pause:3.5 --v=2 --node-labels=intent=apps,karpenter.sh/capacity-type=spot,karpenter.sh/provisioner-name=default

Jun 22 19:22:31 ip-172-31-23-144.ec2.internal kubelet[2949]: I0622 19:22:31.402350    2949 kubelet.go:1951] "SyncLoop DELETE" source="api" pods=[api-cep-homolog/api-cep-nbtt-59644cc6c6-pt6ct]
Jun 22 19:22:31 ip-172-31-23-144.ec2.internal kubelet[2949]: I0622 19:22:31.408804    2949 kubelet.go:1945] "SyncLoop REMOVE" source="api" pods=[api-cep-homolog/api-cep-nbtt-59644cc6c6-pt6ct]
Jun 22 19:22:31 ip-172-31-23-144.ec2.internal kubelet[2949]: I0622 19:22:31.408869    2949 kubelet.go:2144] "Failed to delete pod" pod="api-cep-homolog/api-cep-nbtt-59644cc6c6-pt6ct" err="pod not found"
Jun 22 19:22:32 ip-172-31-23-144.ec2.internal kubelet[2949]: E0622 19:22:32.034178    2949 kubelet_pods.go:160] "Mount cannot be satisfied for the container, because the volume is missing or the volume mounter (vol.Mounter) is nil" containerName="api-cep-nbtt" ok=false volumeMounter={Name:kube-api-access-8lv5h ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath: MountPropagation:<nil> SubPathExpr:}
Jun 22 19:22:32 ip-172-31-23-144.ec2.internal kubelet[2949]: E0622 19:22:32.034515    2949 kuberuntime_manager.go:864] container &Container{Name:api-cep-nbtt,Image:311635504079.dkr.ecr.us-east-1.amazonaws.com/api-apphook:supervisor-193,Command:[],Args:[],WorkingDir:,Ports:[]ContainerPort{ContainerPort{Name:9000tcp,HostPort:0,ContainerPort:9000,Protocol:TCP,HostIP:,},},Env:[]EnvVar{},Resources:ResourceRequirements{Limits:ResourceList{cpu: {{500 -3} {<nil>} 500m DecimalSI},memory: {{536870912 0} {<nil>}  BinarySI},},Requests:ResourceList{cpu: {{200 -3} {<nil>} 200m DecimalSI},memory: {{268435456 0} {<nil>}  BinarySI},},},VolumeMounts:[]VolumeMount{VolumeMount{Name:kube-api-access-8lv5h,ReadOnly:true,MountPath:/var/run/secrets/kubernetes.io/serviceaccount,SubPath:,MountPropagation:nil,SubPathExpr:,},},LivenessProbe:nil,ReadinessProbe:nil,Lifecycle:&Lifecycle{PostStart:&Handler{Exec:&ExecAction{Command:[/bin/bash -c touch /var/www/storage/logs/laravel.log && chgrp -R www-data /var/www/bootstrap/ /var/www/storage/ /var/www/storage/logs/ && chmod -R g+w /var/www/bootstrap/ /var/www/storage/ /var/www/storage/logs/ && tail -f /var/www/storage/logs/laravel.log > /proc/1/fd/2 &
Jun 22 19:22:32 ip-172-31-23-144.ec2.internal kubelet[2949]: ],},HTTPGet:nil,TCPSocket:nil,},PreStop:nil,},TerminationMessagePath:/dev/termination-log,ImagePullPolicy:IfNotPresent,SecurityContext:nil,Stdin:false,StdinOnce:false,TTY:false,EnvFrom:[]EnvFromSource{EnvFromSource{Prefix:,ConfigMapRef:&ConfigMapEnvSource{LocalObjectReference:LocalObjectReference{Name:api-apphook-php,},Optional:nil,},SecretRef:nil,},},TerminationMessagePolicy:File,VolumeDevices:[]VolumeDevice{},StartupProbe:nil,} start failed in pod api-cep-nbtt-59644cc6c6-pt6ct_api-cep-homolog(22966fe3-4ceb-481d-9d57-4586953a8692): CreateContainerConfigError: cannot find volume "kube-api-access-8lv5h" to mount into container "api-cep-nbtt"
Jun 22 19:22:32 ip-172-31-23-144.ec2.internal kubelet[2949]: E0622 19:22:32.037294    2949 pod_workers.go:190] "Error syncing pod, skipping" err="failed to \\"StartContainer\\" for \\"api-cep-nbtt\\" with CreateContainerConfigError: \\"cannot find volume \\\\\\"kube-api-access-8lv5h\\\\\\" to mount into container \\\\\\"api-cep-nbtt\\\\\\"\\"" pod="api-cep-homolog/api-cep-nbtt-59644cc6c6-pt6ct" podUID=22966fe3-4ceb-481d-9d57-4586953a8692
Jun 22 19:22:32 ip-172-31-23-144.ec2.internal kubelet[2949]: I0622 19:22:32.326023    2949 kubelet.go:1973] "SyncLoop (PLEG): event for pod" pod="api-cep-homolog/api-cep-nbtt-59644cc6c6-h8km8" event=&{ID:5a5e6258-6d1e-4faa-ac8c-4a0c10670eb6 Type:ContainerStarted Data:9af88fa8f99000720300c1d7f7cab91501db31b5a4ba36d51c9a0285179c693f}
Jun 22 19:22:32 ip-172-31-23-144.ec2.internal kubelet[2949]: I0622 19:22:32.337575    2949 kubelet.go:1973] "SyncLoop (PLEG): event for pod" pod="api-cep-homolog/api-cep-nbtt-59644cc6c6-zrv7z" event=&{ID:eabe3711-a972-494b-922a-3ad1d5c72936 Type:ContainerStarted Data:f9cdd2dd078343c8f2fa9f4744a14d19d4a24992011c0b31ef33f3ee0fbe11b0}
Jun 22 19:22:33 ip-172-31-23-144.ec2.internal kubelet[2949]: I0622 19:22:33.383158    2949 kubelet_volumes.go:140] "Cleaned up orphaned pod volumes dir" podUID=22966fe3-4ceb-481d-9d57-4586953a8692 path="/var/lib/kubelet/pods/22966fe3-4ceb-481d-9d57-4586953a8692/volumes"
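On the node itself, the same messages can also be followed with journalctl (a sketch assuming the standard EKS AMI, where the kubelet runs as a systemd unit):

# Follow the kubelet unit logs and filter for the volume error
journalctl -u kubelet.service --since "1 hour ago" --no-pager | grep -i "cannot find volume"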

However, analyzing the logs did not lead to anything conclusive.

Solution

After a lot of research and testing, a solution that avoids Pods getting stuck in the ContainerCreating status was found:

It is necessary to change the container runtime used by the Karpenter Provisioner:

From:
containerd

To:
dockerd


In the Karpenter Provisioner settings, it was necessary to adjust the containerRuntime like this:

############### ContainerRuntime
  kubeletConfiguration:
    containerRuntime: dockerd
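For context, this is roughly where that setting sits in a full v1alpha5 Provisioner manifest. This is a minimal sketch, not the actual provisioner_v4.yaml; the labels, requirements, and providerRef name are assumptions based on the node labels visible in the kubelet command line above:

apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  # Tell the kubelet on Karpenter-provisioned nodes to use dockerd instead of containerd
  kubeletConfiguration:
    containerRuntime: dockerd
  labels:
    intent: apps
  requirements:
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["spot"]
  providerRef:
    name: default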

Reapply the Karpenter Provisioner:

kubectl delete -f provisioner_v4.yaml
kubectl apply -f provisioner_v4.yaml

Nodes coming up after the adjustment:

fernando@lab-1487   ~                                                                                                                                                                  11123  17:16:18  
kubectl get node --selector=intent=apps -L kubernetes.io/arch -L node.kubernetes.io/instance-type -L karpenter.sh/provisioner-name -L topology.kubernetes.io/zone -L karpenter.sh/capacity-type --watch
NAME                            STATUS    ROLES    AGE   VERSION   ARCH    INSTANCE-TYPE   PROVISIONER-NAME   ZONE         CAPACITY-TYPE
ip-172-31-21-230.ec2.internal   Unknown   <none>   0s              amd64   c5.large        default            us-east-1b   spot
ip-172-31-21-230.ec2.internal   Unknown   <none>   0s              amd64   c5.large        default            us-east-1b   spot
ip-172-31-21-230.ec2.internal   Unknown   <none>   0s              amd64   c5.large        default            us-east-1b   spot
ip-172-31-21-230.ec2.internal   Unknown   <none>   0s              amd64   c5.large        default            us-east-1b   spot
ip-172-31-21-230.ec2.internal   Unknown   <none>   0s              amd64   c5.large        default            us-east-1b   spot
ip-172-31-21-230.ec2.internal   Unknown   <none>   0s              amd64   c5.large        default            us-east-1b   spot
ip-172-31-21-230.ec2.internal   Unknown   <none>   2s              amd64   c5.large        default            us-east-1b   spot
ip-172-31-21-230.ec2.internal   Unknown   <none>   20s             amd64   c5.large        default            us-east-1b   spot
ip-172-31-21-230.ec2.internal   NotReady   <none>   20s   v1.21.14-eks-48e63af   amd64   c5.large        default            us-east-1b   spot
ip-172-31-21-230.ec2.internal   NotReady   <none>   20s   v1.21.14-eks-48e63af   amd64   c5.large        default            us-east-1b   spot
ip-172-31-21-230.ec2.internal   NotReady   <none>   20s   v1.21.14-eks-48e63af   amd64   c5.large        default            us-east-1b   spot
ip-172-31-21-230.ec2.internal   NotReady   <none>   20s   v1.21.14-eks-48e63af   amd64   c5.large        default            us-east-1b   spot
ip-172-31-21-230.ec2.internal   NotReady   <none>   50s   v1.21.14-eks-48e63af   amd64   c5.large        default            us-east-1b   spot
ip-172-31-21-230.ec2.internal   Ready      <none>   60s   v1.21.14-eks-48e63af   amd64   c5.large        default            us-east-1b   spot
ip-172-31-21-230.ec2.internal   Ready      <none>   60s   v1.21.14-eks-48e63af   amd64   c5.large        default            us-east-1b   spot
ip-172-31-21-230.ec2.internal   Ready      <none>   60s   v1.21.14-eks-48e63af   amd64   c5.large        default            us-east-1b   spot
ip-172-31-21-230.ec2.internal   Ready      <none>   60s   v1.21.14-eks-48e63af   amd64   c5.large        default            us-east-1b   spot
ip-172-31-21-230.ec2.internal   Ready      <none>   78s   v1.21.14-eks-48e63af   amd64   c5.large        default            us-east-1b   spot
ip-172-31-21-230.ec2.internal   Ready      <none>   80s   v1.21.14-eks-48e63af   amd64   c5.large        default            us-east-1b   spot
ip-172-31-21-230.ec2.internal   Ready      <none>   110s   v1.21.14-eks-48e63af   amd64   c5.large        default            us-east-1b   spot

And now the Pods started successfully, all of them with status Running, as expected:

fernando@lab-1487   ~  kubectl get pods -A -o wide --watch | grep 21-230                                                                                                 1|SIGINT(2) ↵  11116  17:20:59  
app-lab-homolog            app-lab-nginx-574bf47b65-whd9l                 1/1     Running   0          3m28s   172.31.22.233   ip-172-31-21-230.ec2.internal   <none>           <none>
app-lab-homolog            app-lab-supervisor-56479db977-28hbv            1/1     Running   0          3m28s   172.31.31.106   ip-172-31-21-230.ec2.internal   <none>           <none>
default                         inflate-b9d769f59-nm6z2                             1/1     Running   0          8m35s   172.31.26.33    ip-172-31-21-230.ec2.internal   <none>           <none>
kube-system                     aws-node-n2n6w                                      1/1     Running   0          4m40s   172.31.21.230   ip-172-31-21-230.ec2.internal   <none>           <none>
kube-system                     filebeat-clprj                                      1/1     Running   0          4m39s   172.31.21.230   ip-172-31-21-230.ec2.internal   <none>           <none>
kube-system                     kube-proxy-zhzdt                                    1/1     Running   0          4m40s   172.31.21.230   ip-172-31-21-230.ec2.internal   <none>           <none>
kube-system                     metricbeat-cbsql                                    1/1     Running   0          4m40s   172.31.21.230   ip-172-31-21-230.ec2.internal   <none>           <none>
lens-metrics                    node-exporter-nlsbf                                 1/1     Running   0          4m40s   172.31.28.153   ip-172-31-21-230.ec2.internal   <none>           <none>
monitoring                      zabbix-agent-5fqb5                                  1/1     Running   0          4m39s   172.31.21.230   ip-172-31-21-230.ec2.internal   <none>           <none>
api-cep-homolog             api-cep-supervisor-59644cc6c6-mdnvg             0/1     Pending             0          0s      <none>          ip-172-31-21-230.ec2.internal   <none>           <none>
api-cep-homolog             api-cep-supervisor-59644cc6c6-mdnvg             0/1     ContainerCreating   0          0s      <none>          ip-172-31-21-230.ec2.internal   <none>           <none>
api-cep-homolog             api-cep-supervisor-59644cc6c6-mdnvg             1/1     Running             0          20s     172.31.30.64    ip-172-31-21-230.ec2.internal   <none>           <none>

Understanding the Problem

Dockershim vs CRI (Container Runtime Interface)

This article addresses the solution to the problem that occurred in an EKS cluster in version 1.21, where the use of Dockershim was a viable option. Dockershim served as a bridge between ‘kubelet’ (a Kubernetes component) and the Docker Engine. However, this interface has been discontinued, as Kubernetes has shifted its focus towards runtimes that adhere to the CRI (Container Runtime Interface) standard. This change was made to streamline the integration between Kubernetes and various container runtime implementations, allowing for greater flexibility and compatibility across different container technologies.

Amazon Elastic Kubernetes Service (Amazon EKS) also ended support for Dockershim starting with Kubernetes version 1.24. The official Amazon Machine Images (AMIs) for version 1.24 and later only include containerd as the runtime, while versions prior to 1.24 included both Docker Engine and containerd, with Docker Engine as the default runtime.

Keep in mind that there are specific Amazon Machine Images (AMIs) for AWS EKS that do not use Dockershim at all, but rather containerd directly. An example is Bottlerocket, so it is important to be careful with this change so as not to impact other existing workloads in the same cluster.
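Before and after a change like this, it is worth confirming which runtime each node actually reports (a sketch; the label selector matches the intent=apps label used in this environment and is otherwise an assumption):

# The CONTAINER-RUNTIME column shows docker://... or containerd://... per node
kubectl get nodes -o wide

# Limit the check to the Karpenter-provisioned nodes from this setup
kubectl get nodes --selector=intent=apps -o wide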

Versions

Following are some details about the versions of resources used in the environment:

  • Kubernetes managed by AWS (EKS), version 1.21.
  • Karpenter v0.23.0.
  • Container Runtime: docker://20.10.17
  • Container Runtime Version: containerd://1.6.6
  • Kubelet Version: v1.21.14-eks-48e63af
  • Kube-Proxy Version: v1.21.14-eks-48e63af

Kubernetes Version Matrix

When changing the container runtime or its version, it is worth checking whether the chosen version is compatible with the Kubernetes cluster version.

containerd

To find the most suitable containerd version, we can use the website below:

https://containerd.io/releases

This site presents a matrix that shows the containerd versions recommended for a given Kubernetes version. It indicates that any actively supported version of containerd may receive bug fixes to resolve issues found in any version of Kubernetes, and the recommendation is based on the most extensively tested combinations.

If the version is not available in this matrix, you can find it in the release history:

https://github.com/containerd/containerd/releases

Dockershim

For Dockershim, it is possible to check the source tree in the Kubernetes repository, according to the Kubernetes version tag:

https://github.com/kubernetes/kubernetes/tree/v1.21.0/pkg/kubelet/dockershim

We do not have a matrix of Dockershim versions compatible with each Kubernetes version; however, in the case of AWS EKS, there are plenty of details regarding the Dockershim deprecation on the page below:
https://docs.aws.amazon.com/eks/latest/userguide/dockershim-deprecation.html

Other possibilities

When a pod remains in the ContainerCreating status, it indicates that the container within the pod is struggling to start. This can happen due to various reasons, such as resource constraints, misconfigured settings, or issues with the underlying infrastructure. Let’s dive into the details.

Resource Constraints

One common cause is insufficient resources: CPU, memory, or storage. Pods may fail to start if they don’t have enough resources allocated. Adjusting resource requests and limits, as sketched below, helps prevent Pods from getting stuck in ContainerCreating.
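As a rough sketch of what sensible requests and limits look like (the names and image are placeholders; the values mirror the ones visible in the kubelet log above):

apiVersion: v1
kind: Pod
metadata:
  name: example-php-app            # placeholder name
spec:
  containers:
    - name: app
      image: php:8.2-fpm-alpine    # illustrative image, not the one from this article
      resources:
        requests:
          cpu: 200m                # guaranteed minimum
          memory: 256Mi
        limits:
          cpu: 500m                # upper bound before CPU throttling
          memory: 512Mi            # upper bound before OOMKill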

Image Pull Failures

Another culprit is image pull failures. If the container image specified in your pod’s configuration cannot be pulled from the container registry, the pod won’t start. We’ll discuss strategies to troubleshoot and resolve image-related problems.
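A minimal sketch of the usual checks (pod, namespace, registry, and secret names are placeholders):

# Events will show ErrImagePull / ImagePullBackOff along with the reason
kubectl describe pod <pod-name> -n <namespace>

# For a private registry, create a pull secret and reference it via imagePullSecrets in the Pod spec
kubectl create secret docker-registry regcred \
  --docker-server=registry.example.com \
  --docker-username=<user> \
  --docker-password=<password> \
  -n <namespace>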

Troubleshooting Steps

Now that we’ve identified potential causes, let’s walk through the troubleshooting steps; example commands are sketched after the list:

  1. Check Resource Requests and Limits: Review your pod’s resource requests and limits. Adjust them as needed to ensure sufficient resources are available.
  2. Inspect Image Pull Errors: Examine the pod’s events to identify any image pull errors. Verify that the image exists in the specified registry and that authentication (if required) is correctly configured.
  3. Analyze Pod Logs: Dive into the pod’s logs to uncover any specific issues during container startup. Look for error messages related to initialization, dependencies, or configuration.
  4. Review Node Conditions: Check the health of the underlying node where the pod is scheduled. Node conditions like disk pressure, memory pressure, or network issues can impact pod creation.
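Hedged example commands for each step above (pod, namespace, and node names are placeholders):

# 1. Check the requested and limited resources on the Pod
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[*].resources}'

# 2. Inspect events for image pull or volume mount errors
kubectl describe pod <pod-name> -n <namespace>

# 3. Analyze container logs (only available once the container has started at least once)
kubectl logs <pod-name> -n <namespace> --previous

# 4. Review node conditions such as DiskPressure or MemoryPressure
kubectl describe node <node-name>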

Best Practices

To prevent future occurrences, consider the following best practices:

  • Resource Planning: Estimate resource requirements accurately and set appropriate requests and limits.
  • Image Pull Policies: Implement image pull policies to avoid unnecessary retries and delays.
  • Health Probes: Configure readiness and liveness probes to ensure containers are healthy before serving traffic (see the sketch after this list).
  • Node Maintenance: Regularly maintain nodes to prevent resource exhaustion or other issues.
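A hedged sketch of readiness and liveness probes (the name, image, and port are illustrative, not taken from the article’s manifests):

apiVersion: v1
kind: Pod
metadata:
  name: example-app-with-probes    # placeholder name
spec:
  containers:
    - name: app
      image: php:8.2-fpm-alpine    # illustrative image
      ports:
        - containerPort: 9000
      readinessProbe:              # gate traffic until the app responds
        tcpSocket:
          port: 9000
        initialDelaySeconds: 5
        periodSeconds: 10
      livenessProbe:               # restart the container if it stops responding
        tcpSocket:
          port: 9000
        initialDelaySeconds: 15
        periodSeconds: 20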

Conclusion

By following these steps and best practices, you can troubleshoot and resolve the dreaded ContainerCreating status. Remember to monitor your pods and nodes continuously to catch issues early. Happy Kubernetes troubleshooting!


Supporting material

Karpenter for Kubernetes | Karpenter vs Cluster Autoscaler

More

Encountered other Kubernetes challenges or have questions about pod provisioning? Explore our comprehensive blog posts and resources for in-depth guidance and troubleshooting tips. Additionally, feel free to reach out to our team of experts for personalized assistance.


Disclaimer: The information provided in this article is for educational purposes only.


Cover image from freepik
