How Docker Containers Make Use of Cgroups
Allocating finite compute resources efficiently among multiple processes on a shared machine can be difficult. A single greedy process, known as a noisy neighbour, can monopolize all system resources (CPU, memory, network bandwidth), depriving other containers and potentially causing the entire system to crash. The Linux kernel uses control groups, or cgroups, to prevent these scenarios.
Container platforms like Docker use cgroups to create isolated environments for each container. Docker places every container in its own cgroup and can assign CPU shares, memory limits, block I/O weights, and other resource constraints to it. This ensures predictable performance and prevents a single container from dominating the host machine.
In this article, we will delve into how the Linux kernel uses cgroups to manage resources. We will examine how Docker builds on cgroups to guarantee compute resources for each container. Understanding the internal operations of cgroups and container resource management can provide useful insights for those running containerized workloads.
What are cgroups?
Cgroups, or control groups, are features in the Linux kernel that isolate and limit the amount of system resources a process can use. This helps to prevent any single process from consuming all available resources and starving others. Cgroups define quotas for CPU time, system memory, network bandwidth, and more for each process.
Key Resources Monitored by cgroups
Below is a breakdown of the main resources that cgroups can monitor and control:
CPU: Cgroups can limit the amount of CPU time a group of processes can use. By setting shares or hard limits, certain groups can be prioritized, or CPU monopolization by processes can be prevented.
Memory: Cgroups can set memory limits for groups of processes, preventing runaway processes from exhausting system memory and negatively impacting other applications.
Block I/O: Cgroups allow for controlling disk input/output (I/O) bandwidth. This is useful for balancing disk access among different applications or preventing a single process from dominating disk resources.
Network: Network bandwidth available to cgroups can be limited. This can be beneficial in cases where you want to prioritize network traffic for specific applications or prevent excessive usage by particular processes.
Checking Cgroups for a Process
Knowing how to find and inspect cgroups is key to grasping their role in resource management. To find out which cgroups a particular process, such as your bash shell, belongs to, we can use the /proc pseudo-filesystem. For example, get the PID of the bash shell you're currently using by running one of:
$ pgrep bash
$ pidof bash
$ ps -fC bash
Now run cat /proc/[PID]/cgroup, replacing [PID] with the actual PID you found:
$ cat /proc/[PID]/cgroup
This will show the "slice" and "scope" that contain the process. Slices organize cgroups hierarchically, while scopes contain the actual settings that constrain processes. To check which resources can be modified for that process, navigate to the cgroup directory /sys/fs/cgroup/[path], where [path] is the cgroup path we discovered (e.g., user.slice/user-1000.slice/session-4.scope).
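As a concrete sketch, the lookup can be scripted against the shell's own PID ($$ expands to it); the slice/scope path in the comment is just an example of what a systemd-based host typically reports.

```shell
# Inspect the cgroup of the current shell ($$ expands to its PID):
cat /proc/$$/cgroup
# On a cgroup v2 host this prints a single line, for example:
#   0::/user.slice/user-1000.slice/session-4.scope
# Extract the path portion (the third colon-separated field) for reuse:
cg_path=$(awk -F: 'NR==1 {print $3}' /proc/$$/cgroup)
# The tunable controller files for this process live under that path:
echo "/sys/fs/cgroup${cg_path}"
```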
While the manual method provides detailed insights, it can be a bit cumbersome. For a more straightforward overview, consider these tools:
systemd-cgls: This command gives a hierarchical arrangement of cgroups and their processes, making it easier to understand the structure at a glance.
lscgroup: Part of the cgroup-tools package (libcgroup), this utility offers a flat listing of available cgroups, simplifying the task of identifying and examining specific cgroups.
Cgroups in action
Now that you understand what cgroups are and how to view them, the next step is to explore how Docker uses cgroups to restrict the resources available to processes. Docker has various options that utilize cgroups under the hood to limit the resources a container can use.
Let's take a closer look at them.
Limiting CPU usage
CPU resource limits can be set on a docker run command by specifying the --cpus flag.
In the following example, we run an instance of an Ubuntu container and set the CPU limit to 1 CPU:
$ sudo docker run -it --cpus="1.0" ubuntu
We can verify that our container is indeed limited to 1 CPU by making use of the stress command. stress is a command-line tool on Linux for loading and stress-testing a system; it imposes configurable types of load, such as CPU, memory, I/O, and disk stress, on the machine.
Inside the docker container, run:
$ stress -c 3
This will start 3 worker processes, each spinning on a CPU-bound loop. Executing the top command in another window, we can verify the effect on the host's CPU: the three stress workers are collectively restricted to 1 CPU, each getting roughly a third of it. Without the CPU limit, the stress command would have utilized 3 full CPUs.
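Assuming a cgroup v2 host, the limit can also be confirmed from inside the container by reading cpu.max, which holds a quota and a period in microseconds. The value below is what a container started with --cpus="1.0" would typically show; it is hard-coded here for illustration.

```shell
# Inside the container, cgroup v2 exposes the CPU quota:
#   $ cat /sys/fs/cgroup/cpu.max
#   100000 100000
# i.e. 100000us of CPU time allowed per 100000us period = 1.0 CPUs.
cpu_max="100000 100000"  # example value; read /sys/fs/cgroup/cpu.max in a real container
echo "$cpu_max" | awk '{printf "%.1f CPUs\n", $1 / $2}'
```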
Restricting Memory
Memory limits are set using the -m or --memory flag on docker run. This sets the maximum amount of system memory the container can use, including cache and buffers. If a container exceeds its memory limit, the kernel's OOM killer steps in and terminates processes inside it. The --memory-reservation flag additionally sets a soft limit below the hard limit: when the host comes under memory pressure, the kernel tries to reclaim the container's memory back down toward the reservation. For example:
$ sudo docker run -it --memory="1GB" ubuntu
Executing the above will restrict the container to 1GB of memory. We can test this restriction by also employing the stress command:
$ stress --vm 2 --vm-bytes 2048M --timeout 80s
In this command:
--vm 2 specifies that two workers should stress the memory.
--vm-bytes 2048M instructs each worker to allocate 2048 megabytes of memory.
--timeout 80s means the stress test will run for 80 seconds.
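The applied limit can be checked from inside the container by reading memory.max, which holds the cap in bytes. Docker's size suffixes are binary multiples, so "1GB" corresponds to 1024^3 bytes; the value is hard-coded below for illustration.

```shell
# Inside the container, the limit appears in bytes:
#   $ cat /sys/fs/cgroup/memory.max
#   1073741824
mem_max=1073741824  # example value; read /sys/fs/cgroup/memory.max in a real container
echo "$((mem_max / 1024 / 1024)) MiB"  # converts bytes to mebibytes
```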
Limiting Process Counts
It's crucial to limit the number of processes a container can spawn. Without this restriction, unchecked process creation may exhaust resources.
The --pids-limit flag sets a hard cap on the number of processes per container. For instance:
$ docker run -it --pids-limit=64 ubuntu bash
This command would prevent the Ubuntu container from creating more than 64 processes. Once the limit is reached, attempts to fork or clone new processes inside the container fail (typically with a "Resource temporarily unavailable" error); the container itself keeps running.
Setting process limits can prevent fork bombs and runaway daemonized processes from consuming all available system resources with excessive child processes.
The --pids-limit flag is supported in Docker Engine 1.12 and later versions. If this setting is not specified, the default number of processes per container is unlimited, subject only to the host's own limits. Be cautious when setting limits, since critical wrapper and supervisor processes may require additional child processes to function properly.
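A small guard script along the following lines can watch how close a workload is getting to its cap. pids.max and pids.current are the cgroup v2 files behind --pids-limit; the sample values are hard-coded here for illustration.

```shell
pids_max=64      # example value; read /sys/fs/cgroup/pids.max in a real container
pids_current=60  # example value; read /sys/fs/cgroup/pids.current
# Warn when 90% or more of the allowed processes are in use.
if [ "$pids_current" -ge $((pids_max * 9 / 10)) ]; then
  echo "warning: ${pids_current}/${pids_max} pids in use"
fi
```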
Summing up
Managing limited compute resources across multiple containers is crucial for system stability and performance. The Linux kernel employs cgroups to isolate and constrain resource usage by containers, preventing the "noisy neighbour" problem where a single greedy container monopolizes resources, starving others. Docker builds upon cgroups, setting default limits on CPU, memory, disk I/O, and more when creating new containers, ensuring predictable behaviour and shielding the host from resource exhaustion.
Administrators can effectively balance resource access across containers by understanding and configuring cgroup controls like CPU usage restrictions, memory caps, and process count limits, enabling efficient and stable system operation.