Choose the right tool for the successful monitoring of Kubernetes!
Kubernetes is a production-ready, open-source platform built on Google’s accumulated experience in container orchestration, combined with best-of-breed ideas from the community. It is designed to automate deploying, scaling, and operating application containers.
With the modern way of building and running applications, your monitoring and observability strategies need to advance, and so do the tools you use. Traditional infrastructure monitoring tools may not be sufficient, so you need a specialized Kubernetes monitoring system, such as the ones listed below.
Some help with logs and others with metrics. Some give you an interface for operating Kubernetes from a bird’s-eye view. Some are Kubernetes-native, while others are more agnostic.
Let’s explore the following tools to monitor Kubernetes.
Prometheus + Grafana
Prometheus is one of the most popular monitoring tools used with Kubernetes. It was originally developed at SoundCloud and later donated to the CNCF. It was inspired by Borgmon, Google’s monitoring system for Borg.
Prometheus stores all its data as time series. In a nutshell, what makes Prometheus stand out among other time-series databases is its built-in alerting mechanism, multidimensional data model, pull-based collection model, PromQL (the Prometheus query language), and, of course, its ever-growing community.
Some more features of Prometheus include:
- A multidimensional data model with time series identified by metric name and key/value pairs
- PromQL, a flexible query language that takes advantage of this dimensionality
- No reliance on distributed storage
- Single server nodes are autonomous
- Time-series collection happens via a pull model over HTTP
- Pushing time series is supported through an intermediary gateway
- Targets are discovered through service discovery or static configuration
- Multiple forms of graphing and dashboarding support
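To make the data model from the list above more concrete, here is a minimal Python sketch of a time series identified by a metric name plus key/value labels, rendered in the style of the Prometheus text exposition format. The names `Sample`, `make_sample`, and `render`, and the example metric and labels, are made up for illustration; this is not the official Prometheus client library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One sample; the metric name plus the label set identify the series."""
    name: str
    labels: tuple  # sorted (key, value) pairs, so equal label sets compare equal
    value: float


def make_sample(name, value, **labels):
    """Build a sample from a metric name, a value, and key/value labels."""
    return Sample(name, tuple(sorted(labels.items())), float(value))


def _escape(label_value):
    """Escape backslashes, double quotes, and newlines in a label value."""
    return (
        str(label_value)
        .replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )


def render(sample):
    """Render a sample as: name{key="value",...} value"""
    if not sample.labels:
        return f"{sample.name} {sample.value}"
    body = ",".join(f'{key}="{_escape(val)}"' for key, val in sample.labels)
    return f"{sample.name}{{{body}}} {sample.value}"


# Hypothetical metric: same name, different labels means different series.
post = make_sample("http_requests_total", 1027, method="POST", handler="/api")
get = make_sample("http_requests_total", 3, method="GET", handler="/api")
print(render(post))
print(render(get))
```

Because the label set is part of the series identity, the two samples above belong to two separate time series, which is what lets a query language slice the same metric by `method` or `handler`.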
The best way to learn Prometheus is to install it on your dev server and play around with it. The project has great documentation, but if you are looking for video-based learning, check out this Udemy course.
You can use kube-prometheus, which offers end-to-end cluster monitoring. You can also use kube-state-metrics to expose the state of Kubernetes objects.
And to visualize the data, you can use Grafana.
Grafana is used to visualize metrics, but it is also an alerting tool. Grafana can send alerts to Slack, a webhook, email, or other communication channels. Another key strength is how it handles data sources: Grafana can query several of them simultaneously.
You can query Prometheus metrics from Grafana, visualize them, create dashboards, and set up alerting as you need. Grafana has a plugin for Kubernetes and offers beautiful dashboards.
By combining Prometheus and Grafana, you can achieve a high level of Kubernetes monitoring for your production system.
Checkmk
The latest version of Checkmk includes completely revamped Kubernetes monitoring, which allows you to instantly analyze and monitor the dynamic interrelationships of container infrastructures. This provides in-depth monitoring of all of your Kubernetes objects and is very simple to use.
You do not need prior experience with dynamic infrastructures and can set up Kubernetes monitoring within minutes. Checkmk automatically adds all of your Kubernetes objects and can monitor Kubernetes assets such as clusters, nodes, deployments, pods, volumes, namespaces, DaemonSets, and StatefulSets.
The monitoring provides all relevant data in pre-configured dashboards. This allows you to detect bottlenecks and anomalies in resource consumption within minutes. You can create your own customized views as well. Navigate through multiple views by simply clicking through the various items.
The easiest way to deploy Checkmk in Kubernetes is to use a Helm repository. tribe29 provides a template that users can adapt to their own environment. You can follow this video tutorial for Kubernetes and start with a free trial of the Checkmk Enterprise Edition.
Checkmk does not stop there, of course. With more than 2,000 out-of-the-box plug-ins, you can monitor any aspect of your IT infrastructure with just one single tool. Get to the bottom of issues and analyze the interrelation of your Kubernetes host system and your orchestrated containers, for example. With Checkmk, you get deep insights into CPU, memory, network bandwidth, and other metrics.
The visualization of data and the communication of alerts are adaptable to the requirements of diverse teams. Checkmk can also integrate with other monitoring tools such as Prometheus, ntop, or Datadog. You have all information in one place and can ensure that insights are automatically shared.
Some of the other features of Checkmk include:
- Smart alerting that understands Kubernetes’ self-healing capabilities and highlights critical conditions only when action is really needed
- Powerful Kubernetes cluster collector to get all of the data you need
- Supports TLS encryption to secure your monitoring
Checkmk is extremely scalable thanks to its high-performance monitoring core and its ability to support distributed monitoring.
Kubewatch
Kubewatch is a Kubernetes watcher that publishes event notifications to a Slack channel. It lets you choose which resources you want to monitor. It is written in Go and uses a Kubernetes client library to connect to the Kubernetes API server. This library serves as the foundation for watching Kubernetes events.
Kubewatch is simple to configure and can be deployed using either Helm or system deployment. More precisely, kubewatch watches for changes to the specific Kubernetes resources you ask it to watch: deployments, daemon sets, pods, services, replica sets, replication controllers, secrets, and configuration maps.
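The idea of watching only selected resource kinds and publishing a notification for each change can be sketched in a few lines of Python. This is an illustration of the concept only, not kubewatch’s code or configuration format: the event dictionaries, `WATCHED_KINDS`, and both function names are hypothetical, and a real watcher would receive events from the Kubernetes API server rather than from a hard-coded list.

```python
# Hypothetical selection of resource kinds to watch; not kubewatch's config format.
WATCHED_KINDS = {"Deployment", "Pod", "Service"}


def format_notification(event):
    """Turn one change event into a short, chat-friendly message."""
    return f'{event["kind"]} {event["namespace"]}/{event["name"]} was {event["action"]}'


def notifications(events, watched=frozenset(WATCHED_KINDS)):
    """Keep only events for watched kinds and format them for publishing."""
    return [format_notification(e) for e in events if e["kind"] in watched]


# Illustrative events as plain dicts; a real watcher would receive these
# from the Kubernetes API server through a client library.
events = [
    {"kind": "Pod", "namespace": "default", "name": "web-1", "action": "created"},
    {"kind": "Secret", "namespace": "default", "name": "db-pass", "action": "updated"},
    {"kind": "Deployment", "namespace": "shop", "name": "cart", "action": "deleted"},
]

for message in notifications(events):
    print(message)
```

In this sketch the `Secret` event is dropped because its kind is not in the watched set; passing a different `watched` set changes which events produce a message.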
Jaeger
Distributed tracing is steadily becoming part of monitoring and troubleshooting Kubernetes environments. Jaeger is a distributed tracing system released by Uber Technologies. It’s used for monitoring transactions and troubleshooting in complex distributed systems.
Jaeger features OpenTracing-based instrumentation for Java, Python, Node, and C++. It uses consistent upfront sampling with individual per-service/endpoint probabilities and supports multiple storage backends: Cassandra, Elasticsearch, Kafka, and in-memory storage.
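As a rough illustration of upfront sampling with per-service/endpoint probabilities, the following Python sketch decides once, when a trace starts, whether to keep it. It is not Jaeger’s implementation: the probability table, the default rate, and the function names are assumptions made for this example, and deriving the decision from a hash of the trace ID is one possible way to make every service reach the same decision for the same trace.

```python
import hashlib

# Hypothetical sampling probabilities per (service, endpoint) pair.
PROBABILITIES = {
    ("checkout", "POST /orders"): 1.0,   # keep every trace
    ("frontend", "GET /health"): 0.0,    # keep none
    ("search", "GET /search"): 0.5,      # keep roughly half
}
DEFAULT_PROBABILITY = 0.001  # assumed fallback for unlisted endpoints


def trace_fraction(trace_id):
    """Map a trace ID deterministically to a number in [0, 1)."""
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    # 48 bits fit exactly in a float, so the result is always below 1.0.
    return int.from_bytes(digest[:6], "big") / 2**48


def should_sample(trace_id, service, endpoint):
    """Decide at the start of a trace whether it should be recorded."""
    probability = PROBABILITIES.get((service, endpoint), DEFAULT_PROBABILITY)
    return trace_fraction(trace_id) < probability


print(should_sample("trace-1", "checkout", "POST /orders"))
print(should_sample("trace-1", "frontend", "GET /health"))
```

Because the decision depends only on the trace ID and the configured probability, repeating the call for the same trace always gives the same answer, so a trace is either recorded in full or not at all.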
Some of the other features of Jaeger include:
- Distributed transaction monitoring
- Distributed context propagation
- Performance / latency optimization
- Root cause analysis
- Service dependency analysis
cAdvisor
cAdvisor is designed for collecting, processing, and exporting resource usage and performance information about running containers. It is also built into Kubernetes, integrated into the kubelet binary. It’s simple to use (it exposes Prometheus metrics out of the box) but not robust enough to be considered an all-round monitoring solution.
Unlike others, cAdvisor is not deployed per pod but at the node level. It auto-discovers all the containers running on a system and collects system metrics such as memory, CPU, and network usage.
cAdvisor is a basic tool, and the following are some of its features.
- Native support for Docker containers, plus support for other container types.
- Supports exporting stats to various storage plugins, e.g., InfluxDB.
- It provides the overall machine usage by analyzing the ‘root’ container on the machine.
- It can also run standalone, outside of Docker or any other container.
- cAdvisor operates per node. It auto-discovers all the containers on the given node and collects CPU, filesystem, and network usage statistics. You can view metrics in the web UI, which shows live information about all containers on the system.
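Since cAdvisor exposes its data as Prometheus metrics, a small Python sketch can show what reading that output might look like. The sample text below is made up for illustration (the container names, IDs, and values are assumptions), and the parser handles only simple `name{labels} value` lines; it does not unescape label values and is not a complete implementation of the exposition format.

```python
import re

# Matches: metric_name{label="value",...} number   (labels are optional)
LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse(text):
    """Parse metric lines into (name, labels, value) tuples, skipping comments."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = LINE.match(line)
        if match is None:
            continue
        labels = dict(LABEL.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def values_by_container(samples, metric):
    """Collect one metric's values keyed by the container's `name` label."""
    return {
        labels["name"]: value
        for name, labels, value in samples
        if name == metric and "name" in labels
    }


# Illustrative scrape output; the values are invented for this example.
SAMPLE = """
# TYPE container_memory_usage_bytes gauge
container_memory_usage_bytes{id="/docker/abc",name="web"} 15000000
container_memory_usage_bytes{id="/docker/def",name="db"} 42000000
container_cpu_usage_seconds_total{id="/docker/abc",name="web"} 12.5
"""

memory = values_by_container(parse(SAMPLE), "container_memory_usage_bytes")
print(memory)
```

In practice you would let Prometheus scrape the endpoint rather than parse it by hand; the sketch only shows that each container appears as its own labeled series.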
Telepresence
Telepresence lets you run a particular service locally while connecting that service to a remote Kubernetes cluster. This lets developers working on multi-service applications use any locally installed tool to test, debug, or edit their service. For instance, you can run a debugger or an IDE.
It also lets developers do fast local development of a particular service, even if that service depends on other services in the cluster. Make a change to your service, save, and you can instantly see the new service in action.
Telepresence is an impressive local development environment for services running in Kubernetes. The live debugging part is unique and evolving quite rapidly. Below are some more of its features.
- Allows code running in the container to connect to an IDE or debugger running on the host.
- Telepresence uses an OpenShift-specific proxy image when it observes an OpenShift cluster.
- Telepresence also supports forwarding traffic to and from other containers in the pod.
- Telepresence uses a Docker-accessible directory as the temporary dir.
Weave Scope
Weave Scope is a troubleshooting and monitoring tool for Kubernetes. It builds logical topologies of your application and infrastructure, which help you understand, monitor, and control your containerized, microservices-based application.
It gives a top-down view of your app as well as your full infrastructure. It lets you diagnose any problems with your distributed containerized app in real time, as it is deployed to a cloud provider.
Some of the features of Weave Scope include:
- Support for any deployment style (Local, hosted, or hybrid) and the ability to collect and report Host/Container metrics
- Aggregate metrics, events, and labels from Kubernetes
- Real-time Contextual metrics
- Nodes can be filtered by CPU and memory usage so that you can quickly identify the containers using the most resources.
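Filtering by resource usage, as in the last item above, boils down to a threshold plus a sort. The Python sketch below shows that idea on made-up data; the function name `top_consumers`, the field names, and the container figures are assumptions for illustration and have nothing to do with Weave Scope’s own API.

```python
def top_consumers(containers, metric, limit=3, minimum=0.0):
    """Return the containers with the highest value of `metric`, largest first."""
    matching = [c for c in containers if c[metric] >= minimum]
    return sorted(matching, key=lambda c: c[metric], reverse=True)[:limit]


# Invented usage figures: CPU as a percentage, memory in MiB.
containers = [
    {"name": "web", "cpu_percent": 12.0, "memory_mib": 256},
    {"name": "db", "cpu_percent": 71.5, "memory_mib": 2048},
    {"name": "cache", "cpu_percent": 3.2, "memory_mib": 512},
    {"name": "worker", "cpu_percent": 48.0, "memory_mib": 128},
]

busiest = [c["name"] for c in top_consumers(containers, "cpu_percent", limit=2)]
heaviest = [c["name"] for c in top_consumers(containers, "memory_mib", minimum=500)]
print(busiest, heaviest)
```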
Zabbix
With Zabbix, it is possible to collect virtually limitless types of data from your systems. It is a high-performance, real-time monitoring system that lets you monitor tens of thousands of servers, virtual machines, and network devices simultaneously.
Besides storing the data, Zabbix offers accessible visualization features and extremely flexible ways of analyzing the data for alerting.
Some of the features of Zabbix include:
- Root Cause Analysis
- Zabbix can keep data in JSON format, so many other applications can use it.
- Real-Time Monitoring
- A Zabbix proxy is highly recommended for large-scale production systems.
- Drill-Down Reports
- Low-level discovery automatically picks up new nodes without manual effort.
- Highly configurable and extensible.
Zabbix is a substantial tool that is not limited to Kubernetes; it is also well suited to monitoring infrastructure and application metrics. If you are interested in learning Zabbix, then check out this brilliant course.
Kubernetes Dashboard
Kubernetes Dashboard is not exactly a monitoring tool, but it is a general-purpose UI for Kubernetes where you can manage and troubleshoot your cluster.
If you don’t have any monitoring tools, then Dashboard would be a good start. Check out the installation guide.
Choosing the right Kubernetes tools is crucial. But guess what? All of the tools above are free to try, so why not give them a try and see what works for your Kubernetes monitoring?
Happy monitoring and troubleshooting!