If you’re preparing for the CKA (Certified Kubernetes Administrator) exam, you know that monitoring cluster resources is a critical skill. I’ve been studying Kubernetes for a few years now, and I can assure you that mastering monitoring can make all the difference in your journey.
In this article, I’ll share my experience and strategies for efficiently monitoring Kubernetes clusters, focusing on what the CKA exam covers under the topic “Monitor cluster and application resource usage”. We’ll look at commands, tools, and practices that will help you understand resource monitoring in Kubernetes.
Kubernetes is designed to manage distributed and dynamic applications. This means that problems such as overloaded pods, underutilized nodes, or applications with memory leaks can appear without warning. Monitoring helps you detect these issues early and act before they affect your users.
In the CKA exam, you need to demonstrate practical knowledge of tools and metrics. Let’s see how.
One of the advantages of Kubernetes is that it offers integrated tools for monitoring resources. Here are the most important ones:
kubectl top
The kubectl top command is essential for monitoring CPU and memory usage in real time. It works in conjunction with Metrics Server, which collects these metrics from the cluster.
In my routine, I constantly use this command for quick checks.
To monitor node resources:
kubectl top nodes
Expected result:
NAME     CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
node-1   500m         25%    2Gi             50%
node-2   250m         12%    1Gi             25%
To visualize the use of pod resources:
kubectl top pods
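A few variations of kubectl top pods that I find useful in practice (these are standard kubectl flags, though --sort-by for top requires a reasonably recent client):

# show per-container usage instead of per-pod totals
kubectl top pods --containers

# all namespaces, heaviest memory consumers first
kubectl top pods -A --sort-by=memory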
Make sure that Metrics Server is installed before using kubectl top. You can check this with:
kubectl get deployment -n kube-system metrics-server
Otherwise, you may receive an error message such as: error: Metrics API not available.
For kubectl top to work, you need to install Metrics Server. Here’s how to do it:
# Metrics Server installation
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
The manifest installs the components the Metrics Server needs to function: a ServiceAccount, the RBAC roles and bindings, a Service, the Deployment itself, and the v1beta1.metrics.k8s.io APIService registration.
You should now be able to check resource consumption via the kubectl top command.
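A quick sanity check I like to run after the installation (plain kubectl, shown here as a suggestion):

# the APIService registered by the manifest should report Available=True
kubectl get apiservice v1beta1.metrics.k8s.io

# after a minute or so, node metrics should start being served
kubectl top nodes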
In my lab I ran into problems deploying the Metrics Server with the manifest above: the Pods never became Ready.
Investigating the logs, I noticed errors in the Metrics Server Pods:
Readiness probe failed: HTTP probe failed with statuscode: 500
I0125 17:59:00.032698 1 server.go:191] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
E0125 17:59:07.184763 1 scraper.go:149] "Failed to scrape node" err="Get \"https://192.168.0.104:10250/metrics/resource\": tls: failed to verify certificate: x509: cannot validate certificate for 192.168.0.104 because it doesn't contain any IP SANs" node="wsl2"
I0125 17:59:10.033275 1 server.go:191] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
I0125 17:59:20.035244 1 server.go:191] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
E0125 17:59:22.185345 1 scraper.go:149] "Failed to scrape node" err="Get \"https://192.168.0.104:10250/metrics/resource\": tls: failed to verify certificate: x509: cannot validate certificate for 192.168.0.104 because it doesn't contain any IP SANs" node="wsl2"
I0125 17:59:30.033193 1 server.go:191] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
I0125 17:59:36.919835 1 server.go:191] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
E0125 17:59:37.184158 1 scraper.go:149] "Failed to scrape node" err="Get \"https://192.168.0.104:10250/metrics/resource\": tls: failed to verify certificate: x509: cannot validate certificate for 192.168.0.104 because it doesn't contain any IP SANs" node="wsl2"
I0125 17:59:40.033405 1 server.go:191] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
I was able to correct the errors and get the Pods running by deploying the Metrics Server with this YAML manifest instead:
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
    rbac.authorization.k8s.io/aggregate-to-admin: "true"
    rbac.authorization.k8s.io/aggregate-to-edit: "true"
    rbac.authorization.k8s.io/aggregate-to-view: "true"
  name: system:aggregated-metrics-reader
rules:
- apiGroups:
  - metrics.k8s.io
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
rules:
- apiGroups:
  - ""
  resources:
  - nodes/metrics
  verbs:
  - get
- apiGroups:
  - ""
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server-auth-reader
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: extension-apiserver-authentication-reader
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server:system:auth-delegator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:metrics-server
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  ports:
  - name: https
    port: 443
    protocol: TCP
    targetPort: https
  selector:
    k8s-app: metrics-server
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  strategy:
    rollingUpdate:
      maxUnavailable: 0
  template:
    metadata:
      labels:
        k8s-app: metrics-server
    spec:
      containers:
      - args:
        - --cert-dir=/tmp
        - --secure-port=4443
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-use-node-status-port
        - --metric-resolution=15s
        - --kubelet-insecure-tls
        image: registry.k8s.io/metrics-server/metrics-server:v0.6.3
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /livez
            port: https
            scheme: HTTPS
          periodSeconds: 10
        name: metrics-server
        ports:
        - containerPort: 4443
          name: https
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /readyz
            port: https
            scheme: HTTPS
          initialDelaySeconds: 20
          periodSeconds: 10
        resources:
          requests:
            cpu: 100m
            memory: 200Mi
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 1000
        volumeMounts:
        - mountPath: /tmp
          name: tmp-dir
      nodeSelector:
        kubernetes.io/os: linux
      priorityClassName: system-cluster-critical
      serviceAccountName: metrics-server
      volumes:
      - emptyDir: {}
        name: tmp-dir
---
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  labels:
    k8s-app: metrics-server
  name: v1beta1.metrics.k8s.io
spec:
  group: metrics.k8s.io
  groupPriorityMinimum: 100
  insecureSkipTLSVerify: true
  service:
    name: metrics-server
    namespace: kube-system
  version: v1beta1
  versionPriority: 100
The fix was in the container spec: adding the --kubelet-insecure-tls argument, which tells the Metrics Server to skip verification of the kubelet serving certificate. As the scraper errors above show, the certificate on my node did not include an IP SAN, so TLS verification against the node's IP address failed.
After that, the Metrics Server Pod became Ready and I could query Pod metrics with kubectl top pod -n kube-system.
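To confirm the fix, it is worth checking that the Pod is Ready and that metrics are being served; the label selector below matches the manifest above:

kubectl get pods -n kube-system -l k8s-app=metrics-server
kubectl top pod -n kube-system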
Another way to monitor the cluster is to analyze logs and events. Here are the most useful commands:
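The list below is my own selection (standard kubectl; the names in angle brackets are just placeholders):

kubectl logs <pod-name>             # container logs; add --previous for a crashed container
kubectl describe pod <pod-name>     # detailed state, probe failures and recent events
kubectl describe node <node-name>   # node conditions, allocated resources and pressure
kubectl get events -A               # events across all namespaces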
These commands help identify specific problems, such as pods that are rebooting or unable to access resources.
Using the kubectl get events command, we can get events from a variety of resources, which helps a lot with troubleshooting.
You may need to specify the Namespace with the -n parameter to see the events you are interested in. In this case, I used -A, which returns events from all Namespaces:
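A sorted view is often more helpful, since the default ordering of kubectl get events is not guaranteed to be chronological (the --sort-by flag is standard kubectl):

kubectl get events -A --sort-by=.metadata.creationTimestamp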
Kubernetes allows you to define requests and limits to control the pods’ CPU and memory usage. This is essential to ensure that resources are allocated fairly and to avoid performance problems.
apiVersion: v1
kind: Pod
metadata:
  name: exemplo-pod
spec:
  containers:
  - name: exemplo-container
    image: nginx
    resources:
      requests:
        memory: "64Mi"
        cpu: "250m"
      limits:
        memory: "128Mi"
        cpu: "500m"
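If you want to confirm what was actually applied and how it compares with real usage, a couple of quick checks help (the file name pod.yaml is just an assumption for this example; the Pod name matches the manifest above):

kubectl apply -f pod.yaml
kubectl get pod exemplo-pod -o jsonpath='{.spec.containers[0].resources}'
kubectl top pod exemplo-pod --containers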
If a Pod is misbehaving, check the offending Pod with kubectl describe and kubectl top to see whether it is being throttled or killed for exceeding its limits.
A few practices that have served me well:
Always configure requests and limits in containers, so the scheduler and the kubelet can allocate resources predictably.
Prioritize critical Pods with requests that reflect their real baseline consumption.
Test the behavior in the laboratory before applying changes to production.
Monitor metrics regularly with kubectl top and whatever richer tooling you have available.
The correct use of requests and limits is not only essential for maintaining the stability of a Kubernetes cluster, but is also a critical topic for the CKA Exam. During your studies, practice configuring and checking resources to gain confidence in the exam.
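One exercise worth doing in your lab: deploy a Pod with requests and limits and then look at how they add up on the node. The "Allocated resources" section of the output below shows the total CPU and memory requests and limits scheduled on that node (the node name is a placeholder):

kubectl describe node <node-name>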
Although the CKA exam focuses on the basics, it’s good to know some more robust tools. They may not be required, but they will help you better understand how the cluster works.
While native commands are useful, for more in-depth monitoring, I recommend tools such as Prometheus and Grafana. In my experience, these tools offer powerful visual insights.
A simple alert for high CPU:
- alert: HighCPUUsage
  expr: sum(rate(container_cpu_usage_seconds_total[5m])) by (pod) > 0.8
  for: 1m
  labels:
    severity: warning
  annotations:
    summary: "CPU usage above 80% on pod {{ $labels.pod }}"
This tool provides detailed information about the state of Kubernetes objects such as Deployments, Pods and Nodes.
OpenLens is a powerful open-source tool that makes it easy to manage Kubernetes clusters through a graphical interface. Based on the popular Lens, OpenLens stands out as a more open and flexible version, letting you explore the full potential of Kubernetes without licensing limitations.
This tool is especially useful for administrators who want a clear and intuitive view of their cluster’s resources, including pods, services, deployments, CPU and memory metrics, as well as detailed insights into events and logs. OpenLens makes it possible to monitor cluster health and analyze problems more efficiently, using a highly interactive graphical interface.
Through its interface you can see RAM, CPU and network consumption, along with other elements that matter when monitoring Kubernetes clusters.
The main features of OpenLens include multi-cluster management, resource browsing across namespaces, CPU and memory metrics, log viewing, and a built-in terminal for running kubectl against the selected cluster.
In practice, OpenLens simplifies tasks that could be more laborious via the command line, such as browsing namespaces or inspecting specific pod logs. It is an excellent option for those studying for the CKA, as it offers a visual environment that complements learning with kubectl and other CLI tools.
If you’re looking for a more intuitive way to explore and manage your clusters, OpenLens could be an indispensable addition to your set of advanced tools.
Focus on using kubectl and native tools, but get a basic idea of these integrations to expand your knowledge.
Let’s imagine a common scenario that you might encounter in the exam:
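Here is one plausible version of such a scenario (my own reconstruction, not an official exam question): an application Pod keeps restarting and you suspect it is exceeding its memory limit. A sequence of checks could look like this, with the namespace and pod names as placeholders:

kubectl top pods -n <namespace> --sort-by=memory   # find the heaviest consumers
kubectl describe pod <pod-name> -n <namespace>     # check limits and the last state (e.g. OOMKilled)
kubectl logs <pod-name> -n <namespace> --previous  # inspect logs from the terminated container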
With this information, you can take actions such as increasing the limits or adjusting the application code.
Monitoring resource usage in Kubernetes is an essential skill, both for passing the CKA exam and for your career. Tools such as kubectl top and kubectl describe, together with properly configured requests and limits, are what you will reach for when responding to performance problems.
Remember: practice in a test environment. The more you explore these tools, the more confident you’ll be in the test. And if you need any more tips or help, feel free to contact me or leave a comment here!
Keep reviewing the content for the CKA exam, and check out this post, where we review ETCD: