Monitoring Kubernetes clusters: Revision for the CKA Exam

If you’re preparing for the CKA (Certified Kubernetes Administrator) exam, you know that monitoring cluster resources is a critical skill. I’ve been studying Kubernetes for a few years now, and I can assure you that mastering monitoring can make all the difference to your journey.

In this article, I'll share my experience and strategies for efficiently monitoring Kubernetes clusters, focusing on the aspects covered by the CKA exam item "Monitor cluster and application resource usage". We'll look at commands, tools, and practices that can help you understand resource monitoring in Kubernetes.



Exploring monitoring in Kubernetes

Why is monitoring important?

Kubernetes is designed to manage distributed and dynamic applications. This means that problems such as overloaded pods, underutilized nodes or applications with memory leaks can happen without warning. Monitoring:

  • Helps identify performance bottlenecks.
  • Allows you to take preventive action before something affects your users.
  • Ensures that resources are used effectively.

In the CKA exam, you need to demonstrate practical knowledge of tools and metrics. Let’s see how.


Monitoring Kubernetes clusters – Native tools

One of the advantages of Kubernetes is that it offers integrated tools for monitoring resources. Here are the most important ones:

1. kubectl top

The kubectl top command is essential for monitoring CPU and memory usage in real time. It works in conjunction with Metrics Server, which collects these metrics from the cluster.

In my routine, I constantly use this command for quick checks.

Example:

To monitor node resources:

kubectl top nodes

Expected result:

NAME    CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
node-1  500m         25%    2Gi             50%
node-2  250m         12%    1Gi             25%

To view Pod resource usage:

kubectl top pods
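A few useful variations for the exam (double-check with kubectl top pod --help on your cluster, but these flags are standard):

# Per-container usage for Pods in a specific namespace
kubectl top pods -n kube-system --containers

# Pods across all namespaces, sorted by CPU usage
kubectl top pods -A --sort-by=cpu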

Exam tip:

Make sure that Metrics Server is installed before using kubectl top. You can check this with:

kubectl get deployment -n kube-system metrics-server

Otherwise, you will receive an error message such as error: Metrics API not available.

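If the Deployment exists but kubectl top still fails, it is also worth checking that the metrics API is registered and Available (the APIService name below matches the one created by the official manifest):

kubectl get apiservice v1beta1.metrics.k8s.io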

Installing Metrics Server

For kubectl top to work, you need to install Metrics Server. Here’s how to do it:

# Metrics Server installation
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

Applying this manifest installs the various components the Metrics Server needs to function.

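When the apply succeeds, kubectl prints one line per object created. With the official manifest, the output looks roughly like this (the object names match the manifest shown later in this article):

serviceaccount/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created
service/metrics-server created
deployment.apps/metrics-server created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created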

You should now be able to check resource consumption via the kubectl top command.

In my lab, I ran into problems deploying the Metrics Server with the manifest above: the Pods never became Ready.

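You can check the Pod status using the label applied throughout the manifest:

kubectl get pods -n kube-system -l k8s-app=metrics-server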

Investigating further, I saw that the readiness probe of the Metrics Server Pods was failing:


Readiness probe failed: HTTP probe failed with statuscode: 500
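The container logs pointed to the root cause. They can be retrieved with a command along these lines (deploy/metrics-server is the Deployment name used in this article):

kubectl logs -n kube-system deploy/metrics-server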



I0125 17:59:00.032698       1 server.go:191] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
E0125 17:59:07.184763       1 scraper.go:149] "Failed to scrape node" err="Get \"https://192.168.0.104:10250/metrics/resource\": tls: failed to verify certificate: x509: cannot validate certificate for 192.168.0.104 because it doesn't contain any IP SANs" node="wsl2"
I0125 17:59:10.033275       1 server.go:191] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
I0125 17:59:20.035244       1 server.go:191] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
E0125 17:59:22.185345       1 scraper.go:149] "Failed to scrape node" err="Get \"https://192.168.0.104:10250/metrics/resource\": tls: failed to verify certificate: x509: cannot validate certificate for 192.168.0.104 because it doesn't contain any IP SANs" node="wsl2"
I0125 17:59:30.033193       1 server.go:191] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
I0125 17:59:36.919835       1 server.go:191] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
E0125 17:59:37.184158       1 scraper.go:149] "Failed to scrape node" err="Get \"https://192.168.0.104:10250/metrics/resource\": tls: failed to verify certificate: x509: cannot validate certificate for 192.168.0.104 because it doesn't contain any IP SANs" node="wsl2"
I0125 17:59:40.033405       1 server.go:191] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"

I was able to correct the errors and get the Pods running by deploying the Metrics Server with this YAML manifest instead:

apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
    rbac.authorization.k8s.io/aggregate-to-admin: "true"
    rbac.authorization.k8s.io/aggregate-to-edit: "true"
    rbac.authorization.k8s.io/aggregate-to-view: "true"
  name: system:aggregated-metrics-reader
rules:
- apiGroups:
  - metrics.k8s.io
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
rules:
- apiGroups:
  - ""
  resources:
  - nodes/metrics
  verbs:
  - get
- apiGroups:
  - ""
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server-auth-reader
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: extension-apiserver-authentication-reader
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server:system:auth-delegator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:metrics-server
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  ports:
  - name: https
    port: 443
    protocol: TCP
    targetPort: https
  selector:
    k8s-app: metrics-server
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  strategy:
    rollingUpdate:
      maxUnavailable: 0
  template:
    metadata:
      labels:
        k8s-app: metrics-server
    spec:
      containers:
      - args:
        - --cert-dir=/tmp
        - --secure-port=4443
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-use-node-status-port
        - --metric-resolution=15s
        - --kubelet-insecure-tls
        image: registry.k8s.io/metrics-server/metrics-server:v0.6.3
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /livez
            port: https
            scheme: HTTPS
          periodSeconds: 10
        name: metrics-server
        ports:
        - containerPort: 4443
          name: https
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /readyz
            port: https
            scheme: HTTPS
          initialDelaySeconds: 20
          periodSeconds: 10
        resources:
          requests:
            cpu: 100m
            memory: 200Mi
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 1000
        volumeMounts:
        - mountPath: /tmp
          name: tmp-dir
      nodeSelector:
        kubernetes.io/os: linux
      priorityClassName: system-cluster-critical
      serviceAccountName: metrics-server
      volumes:
      - emptyDir: {}
        name: tmp-dir
---
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  labels:
    k8s-app: metrics-server
  name: v1beta1.metrics.k8s.io
spec:
  group: metrics.k8s.io
  groupPriorityMinimum: 100
  insecureSkipTLSVerify: true
  service:
    name: metrics-server
    namespace: kube-system
  version: v1beta1
  versionPriority: 100

The container spec was fixed by adjusting:

  • updated the secure port to --secure-port=4443
  • set the container port to containerPort: 4443
  • added the --kubelet-insecure-tls argument, so Metrics Server no longer rejects the kubelet's certificate
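If you have already applied the original components.yaml, an alternative to reapplying the whole manifest is to patch the existing Deployment. A sketch that appends just the TLS argument:

kubectl -n kube-system patch deployment metrics-server \
  --type=json \
  -p='[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--kubelet-insecure-tls"}]'

The Deployment then rolls out a new Pod with the extra flag.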

With those changes, the Metrics Server Pod started normally and it was possible to obtain Pod metrics using the kubectl top pod -n kube-system command, for example:

kubectl top pods -n kube-system

2. Logs and Events

Another way to monitor the cluster is to analyze logs and events. Here are the most useful commands:

  • Logs of a Pod: kubectl logs <pod-name>
  • Cluster events: kubectl get events
  • Detailed description of a Pod: kubectl describe pod <pod-name>
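Two variations that often save time during troubleshooting (both flags are standard kubectl options):

# Logs of a specific container in a multi-container Pod
kubectl logs <pod-name> -c <container-name>

# Events sorted chronologically, most recent last
kubectl get events --sort-by=.lastTimestamp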

These commands help identify specific problems, such as Pods that keep restarting or cannot access the resources they need.

Using the kubectl get events command, we can get events from a variety of resources, which helps a lot with troubleshooting.

Point of attention

You may need to specify the namespace with the -n parameter to retrieve the events you are interested in. In this case, I used -A, which returns events from all namespaces:

kubectl get events -A

Configuring Pod Resources

Kubernetes allows you to define requests and limits to control the pods’ CPU and memory usage. This is essential to ensure that resources are allocated fairly and to avoid performance problems.

YAML example:

apiVersion: v1
kind: Pod
metadata:
  name: exemplo-pod
spec:
  containers:
  - name: exemplo-container
    image: nginx
    resources:
      requests:
        memory: "64Mi"
        cpu: "250m"
      limits:
        memory: "128Mi"
        cpu: "500m"

  • CPU: Expressed in millicores (e.g. 250m means 25% of one core).
  • Memory: Defined in values such as Mi (mebibytes) or Gi (gibibytes).
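After creating the Pod, you can confirm that the values were applied; a quick check using the example Pod above:

kubectl get pod exemplo-pod -o jsonpath='{.spec.containers[0].resources}'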

What are Requests and Limits?

  • Requests: Minimum amount of resources guaranteed for the pod.
  • Limits: Maximum limit that the pod can consume.

Behaviors with Requests and Limits

  1. CPU usage:
    • If a container exceeds the CPU limit, it will not be terminated, but will be throttled.
    • This means it cannot use more than the configured limit.
  2. Memory usage:
    • If memory usage exceeds the limit, the container will be terminated with an OOMKilled (Out of Memory) error.
  3. No Requests or Limits:
    • Kubernetes allows the container to use unlimited resources (within the capacity of the node), but this can cause instability in the cluster.
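When a container is killed for exceeding its memory limit, the reason is recorded in the Pod status. A quick way to check it (substitute your Pod name):

kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
# prints OOMKilled if the last restart was caused by exceeding the memory limit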

Tips for the exam:

Check the offending Pod

  • In CKA, you may need to identify a pod that is consuming more resources than defined in the limits. Use kubectl describe pod to check this.

Always configure requests and limits in containers:

  • During the test, check the official Kubernetes documentation for default values, if necessary.

Prioritize critical Pods with requests:

  • Define requests for priority workloads, ensuring that they have the minimum necessary resources.

Test the behavior in the laboratory:

  • Simulate scenarios where a Pod consumes more resources than allowed, to understand the throttling and OOMKilled behavior (see the sketch after these tips).

Monitor metrics:

  • Use tools such as Prometheus and Grafana to check the actual resource consumption of containers.
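For the lab-testing tip above, here is a minimal sketch of a Pod that deliberately allocates more memory than its limit allows, so you can watch the OOMKilled behavior. The Pod name is hypothetical; the polinux/stress image is the one used in the official Kubernetes documentation examples:

apiVersion: v1
kind: Pod
metadata:
  name: oom-demo                  # hypothetical name
spec:
  containers:
  - name: stress
    image: polinux/stress
    resources:
      requests:
        memory: "50Mi"
      limits:
        memory: "100Mi"           # the process below tries to allocate 250M
    command: ["stress"]
    args: ["--vm", "1", "--vm-bytes", "250M", "--vm-hang", "1"]

Apply it and watch kubectl get pod oom-demo: the container is killed, the status shows OOMKilled, and the Pod keeps restarting.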

The correct use of requests and limits is not only essential for maintaining the stability of a Kubernetes cluster, but is also a critical topic for the CKA Exam. During your studies, practice configuring and checking resources to gain confidence in the exam.

Advanced Tools

Although the CKA exam focuses on the basics, it’s good to know some more robust tools. They may not be required, but they will help you better understand how the cluster works.

Prometheus and Grafana

While native commands are useful, for more in-depth monitoring, I recommend tools such as Prometheus and Grafana. In my experience, these tools offer powerful visual insights.

  • Prometheus: Collects detailed cluster metrics.
  • Grafana: Creates dashboards to visualize these metrics.

Example of an alert in Prometheus:

A simple alert for high CPU:

- alert: HighCPUUsage
  expr: sum(rate(container_cpu_usage_seconds_total[5m])) by (pod) > 0.8
  for: 1m
  labels:
    severity: warning
  annotations:
    summary: "CPU usage above 80% on pod {{ $labels.pod }}"

Kube-state-metrics

This tool provides detailed information about the state of Kubernetes objects such as Deployments, Pods and Nodes.
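For example, kube-state-metrics exposes the configured requests and limits as metrics, which makes it possible to compare real usage against what was declared. A PromQL sketch (metric names come from cAdvisor and kube-state-metrics; adjust the labels to your environment):

# Fraction of the CPU limit each Pod is actually using
sum(rate(container_cpu_usage_seconds_total[5m])) by (pod)
  / sum(kube_pod_container_resource_limits{resource="cpu"}) by (pod)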

OpenLens: Visual manager for Kubernetes

OpenLens is a powerful open-source tool that makes it easy to manage Kubernetes clusters via a graphical interface. Built on the basis of the popular Lens, OpenLens stands out for being a more open and flexible version, allowing you to explore the full potential of Kubernetes without licensing limitations.

This tool is especially useful for administrators who want a clear and intuitive view of their cluster’s resources, including pods, services, deployments, CPU and memory metrics, as well as detailed insights into events and logs. OpenLens makes it possible to monitor cluster health and analyze problems more efficiently, using a highly interactive graphical interface.

Through its interface, you can see RAM, CPU, and network consumption, along with other key signals for monitoring Kubernetes clusters:

[Image: OpenLens – RAM consumption in a Pod]

The main features of OpenLens include:

  • Real-time visualization of metrics: Allows you to monitor resource usage directly on the dashboard.
  • Multi-cluster management: Ideal for those working with several clusters simultaneously.
  • Support for customized extensions: You can add extensions to adapt OpenLens to your specific needs.

In practice, OpenLens simplifies tasks that could be more laborious via the command line, such as browsing namespaces or inspecting specific pod logs. It is an excellent option for those studying for the CKA, as it offers a visual environment that complements learning with kubectl and other CLI tools.

If you’re looking for a more intuitive way to explore and manage your clusters, OpenLens could be an indispensable addition to your set of advanced tools.

Exam tip:

Focus on using kubectl and native tools, but get a basic idea of these integrations to expand your knowledge.


Case Studies

Let’s imagine a common scenario that you might encounter in the exam:

Problem: A Pod is constantly restarting.

Steps to resolve:

  1. Check the pod’s status: kubectl get pod <pod-name>
  2. Analyze the logs: kubectl logs <pod-name>
  3. Check the events: kubectl describe pod <pod-name>
  4. Identify whether resource limits are being exceeded.

With this information, you can take actions such as increasing the limits or adjusting the application code.
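Putting these steps together, a quick triage might look like this (hypothetical Pod name my-app):

kubectl get pod my-app             # check the RESTARTS column and the current status
kubectl logs my-app --previous     # logs from the previous, crashed container instance
kubectl describe pod my-app        # recent events, last state and the configured limits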


Conclusion

Monitoring resource usage in Kubernetes is an essential skill, both for passing the CKA exam and for your career. Tools such as kubectl top and kubectl describe, together with well-configured requests and limits, are what you will rely on when tackling performance problems.

Remember: practice in a test environment. The more you explore these tools, the more confident you’ll be in the test. And if you need any more tips or help, feel free to contact me or leave a comment here!


To keep reviewing content for the CKA exam, check out this post, where we cover ETCD.

