Top 5 Container Metrics
1. Memory Consumption
Memory is the most critical metric when it comes to Docker containers because when the memory is full, the container stops working. Java applications usually face the problem of being terminated when memory isn’t configured correctly. When a Docker container’s memory hits the limit, Docker kills the process inside the container that is causing that. Docker protects the host by stopping the container so there’s no chance it will affect other containers running in the same host.What’s good about containers is that they can be started again in a matter of seconds if Docker stopped them. Even so, the application will have some downtime—even if it’s only for a short period. If you’re using Docker Compose, you can configure the container to start again when it’s stopped. Or if you’re using Kubernetes, you can configure the desired state of how many containers you want to be running all the time. Kubernetes will then continuously validate that the state is compliant by recreating containers.Memory is essential not only to avoid unexpected responses for an application, but also to set the guidelines for when to scale out the containers. The memory metric will set a threshold to define autoscaling rules. An auto-scaling rule might say, “If memory consumption is higher than 80 percent, then start two more containers.”It’s always best to scale out containers based on memory consumption to avoid unresponsive applications. It’s good to know what the memory consumption is, but you need to take preventive actions. Container orchestrators like Kubernetes enable you to do that.2. CPU Usage
Another critical metric in a container is CPU usage. In Docker, by default containers have full access to the CPU resources in the host. To avoid affecting other containers in the host, you can limit the percentage of CPU that Docker will allow a container to use. For example, let’s say that a container has a CPU configuration of 0.5. That means the container will have access to 50 percent at most of one CPU core from the host.Docker will not stop a container when the container reaches its CPU limit, but the application’s performance will be negatively impacted when this happens. With the CPU metric, you can quickly identify if you need to give more CPU resources to the container or not. Containers will help you to optimize resources but only if you know where you need to tune—for example if you’re configuring way too many CPU resources and the container only uses a small portion of those resources.As with memory, the CPU metric works as a threshold to configure autoscaling rules for the containers. In Kubernetes or any other container orchestrator, you can define autoscaling rules when a container reaches the CPU limit to add more containers. Therefore, applications performance will be steady because the orchestrator will make sure to provide as many containers as are needed.3. Disk Operations
Disk operations metrics like I/O operations or disk usage are numbers that could affect more than just a single container’s performance. The disk is a resource that both containers and the host depend on to provide good performance.You can get I/O metrics with the “docker stats” command. And you can get disk usage metrics with the “docker system df” and get more detailed information by adding the “-v” flag. Controlling the disk usage in the host is crucial to avoid problems when pulling new container images. Images could be huge, which is why you might need to run a clean-up process to remove old images. Logs written to the disk is another reason why disk usage increases. So “log rotation” is a good practice you still need to implement with containers.Even though containers are stateless by default, other applications are stateful like a database. For stateful applications, you’ll use volumes. A volume is how you map a folder or driver inside the container with another path or driver in the host. You could create volumes in the host before assigning them to a container.Disks become a shared resource that could affect containers’ and the host’s performance.4. Network Traffic
As you’ve seen with the previous metrics, the same type of metrics that are important in a virtual or physical server are still crucial for containers. So networking traffic can’t be missed when discussing the top metrics for containers. You could get networking traffic with the “docker stats” command to know the amount of data a container has sent or received.Knowing how much traffic is happening in a container could be a clue whether the host for a container is appropriate or not. Maybe the container needs to be running in a host that has better network performance. Or perhaps you need to scale out the containers. I’ve seen cases where specific applications, memory, and CPU were OK but the application’s performance was still poor.When taking a look at the networking metrics, I’ve found some interesting information. Sometimes, the network traffic was low, and it was because of a misconfiguration in the load balancer. Other times, the applications in the container weren’t able to use enough network bandwidth.My point here is that there were times where I wasn’t able to spot problems (for example because of a host type or a bug in the application) with memory or CPU metrics, but I was able to with network metrics.5. Number of Containers Running
To run containers at scale, you need to use a container orchestrator like Kubernetes or Swarm. There are going to be times when the orchestrator won’t be able to schedule more containers because there are no more resources available. To fix this, you could use the number of containers that are running to scale out the host.I noticed the importance of this metric when a friend used it to demonstrate that certain containers were having problems. It turns out that because of a misconfiguration, the container was running only for a period of time. Of course, the orchestrator was spinning up a new container but the application was unstable.You can get this metric by running a “docker container is” command or by asking the orchestrator how many containers are running for an application. In Kubernetes you can get this information by listing the pods that are running or by getting the state of a deployment object.The number of containers running is not a metric of a container per se. Sometimes you need to zoom out to get a different perspective on a problem.Don’t Rely Only Upon Top Metrics
I didn’t give you too many details on how to collect these types of metrics for containers. The reason for that is that you now have a lot of tools for that job, which are available as paid services or for free. Some examples of free tools are cAdvisor, InfluxDB, Grafana, or Prometheus. These tools will give you other metrics—not just for the containers, but also for the host. So don’t rely only on the top metrics I listed here. There are going to be times when these metrics look good and maybe the host is running out of IOPS, or the memory swap is terrible, which is when alternate metrics will come in handy.Knowing which are the top metrics for containers is just the beginning of improving the lead time. For example, you can also use these metrics when drawing the value stream mapping.Relevant Articles
Revolutionize Your IT Landscape with Digital Twins
In today’s fast-paced digital landscape, organizations seek innovative strategies to increase operational visibility, improve decision-making, and fuel business agility. One emerging powerhouse concept that addresses these needs is the Digital Twin—the practice of...
What makes a Good Deployment Manager?
Deployment management is a critical aspect of the software development process. It involves the planning, coordination, and execution of the deployment of software applications to various environments, such as production, testing, and development. The deployment...
DevOps vs SRE: How Do They Differ?
Nowadays, there’s a lack of clarity about the difference between site reliability engineering (SRE) and development and operations (DevOps). There’s definitely an overlap between the roles, even though there are clear distinctions. Where DevOps focuses on automation...
Self-Healing Data: The Power of Enov8 VME
Introduction In the interconnected world of applications and data, maintaining system resilience and operational efficiency is no small feat. As businesses increasingly rely on complex IT environments, disruptions caused by data issues or application failures can lead...
What is Data Lineage? An Explanation and Example
In today’s data-driven world, understanding the origins and transformations of data is critical for effective management, analysis, and decision-making. Data lineage plays a vital role in this process, providing insights into data’s lifecycle and ensuring data...
What is Data Fabrication? A Testing-Focused Explanation
In today’s post, we’ll answer what looks like a simple question: what is data fabrication? That’s such an unimposing question, but it contains a lot for us to unpack. Isn’t data fabrication a bad thing? The answer is actually no, not in this context. And...