Detecting swap usage by Ceph daemons
TL;DR: In good Ceph architectures we prefer daemons to be killed rather than swapped. This behaviour is much more predictable and detectable, and it usually points you to where the actual problem needs fixing. We use the one-liner below to detect whether Ceph daemons are being swapped by the kernel. To find the code with comments and formatting, scroll down.
for pid in $(pgrep "ceph|radosgw");do grep "VmSwap\|Name\|^Pid" /proc/$pid/status | awk '{ printf "%-20s %-40s\n", $1, $2}';echo -en "Commandline:\t";cat /proc/$pid/cmdline|tr '\000' ' ';printf '\n%s\n';done
System memory
Computer systems use memory to load programs and execute instructions. This memory is fast, relatively expensive and always limited in size. When an operating system needs to allocate more memory than is available in the system, there are a couple of things it can do, for example:
- kill the process;
- kill another process;
- crash the kernel.
In Linux-based systems, this behaviour is governed by the kernel OOM killer (Out-Of-Memory killer).
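To see whether the OOM killer has acted on a host, you can search the kernel log for the messages it typically writes. The following is a minimal sketch: the function name `oom_events` is hypothetical, and the patterns are an assumption based on common kernel message wording, which may differ between kernel versions.

```bash
#!/bin/bash
# Filter kernel log text on stdin for lines the OOM killer typically writes.
# The patterns are an assumption; kernel message wording varies by version.
oom_events() {
    # "|| true" keeps the exit status at 0 when nothing matches
    grep -E "Out of memory|oom-kill|Killed process" || true
}

# Example usage (reading the kernel log may require root on some systems):
#   dmesg | oom_events
#   journalctl -k | oom_events
```

If a Ceph daemon was killed, the matching line normally names the process, which tells you which daemon ran out of memory.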
Memory paging
In most general-purpose computing applications, it is undesirable to have systems or programs crash. Therefore, an alternative is available, called memory paging. Memory paging refers to storing data from memory on slower secondary storage, such as a hard drive. So, instead of killing a process, the operating system moves some of its memory to a slower tier, the disk. Within Linux, this behaviour is called swap.
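To check whether a host has swap configured and how much of it is in use, you can read the `SwapTotal` and `SwapFree` fields from `/proc/meminfo`. This is a minimal sketch: the function name `swap_usage` and its optional file argument are hypothetical and only there to make the example easy to try out.

```bash
#!/bin/bash
# Print total and used swap in kB from a meminfo-style file.
# The optional first argument (default: /proc/meminfo) is a hypothetical
# parameter added for illustration.
swap_usage() {
    local meminfo="${1:-/proc/meminfo}"
    awk '/^SwapTotal:/ { swap_total = $2 }
         /^SwapFree:/  { swap_free = $2 }
         END { printf "SwapTotal: %d kB\nSwapUsed: %d kB\n", swap_total, swap_total - swap_free }' "$meminfo"
}

# Only call it when /proc/meminfo exists (Linux)
if [ -r /proc/meminfo ]; then
    swap_usage
fi
```

A `SwapTotal` of 0 means no swap is configured on the host. The standard tools `swapon --show` and `free -h` report similar information.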
Implications of swap usage
We tend to say that “nothing is free”. So, in this case, that raises the question: “How do I pay for using swap?” Well, mostly you pay for it with a loss of predictable performance. Anything that is still in memory will be relatively fast; anything in swap will be relatively slow.
Ceph daemons
Ceph daemons, the programs that make up Ceph, are designed for failure. That means that in a good Ceph architecture a Ceph daemon may fail without negatively impacting the overall availability of data from a client’s point of view. We also prefer predictable performance over an individual Ceph daemon’s uptime. When the performance of one Ceph daemon becomes unpredictable, and many daemons with the same behaviour are working together, this can even lead to service degradation or brown-out-like behaviour.
Therefore, in good Ceph architectures, we prefer daemons to be killed rather than swapped. This behaviour is much more predictable and detectable, and it usually points you to where the actual problem needs fixing. You can fix the problem by adding memory or by reducing the memory demands of the daemons.
In short, we prefer to disable swap entirely on systems that host Ceph daemons. In some cases, however, we are bound by what the customer has built, or there may be compelling reasons to enable swap. For example, many hyper-converged systems use swap. For a lot of virtual machine workloads, swap is preferred over killing the VM.
So, in some cases, we simply want to know: “Are my Ceph daemons being swapped?”
You can check this using several different methods. We use the Bash one-liner below:
for pid in $(pgrep "ceph|radosgw");do grep "VmSwap\|Name\|^Pid" /proc/$pid/status | awk '{ printf "%-20s %-40s\n", $1, $2}';echo -en "Commandline:\t";cat /proc/$pid/cmdline|tr '\000' ' ';printf '\n%s\n';done
With comments and formatting:
#!/bin/bash
# Use pgrep to get the PIDs of the Ceph processes
for pid in $(pgrep "ceph|radosgw")
do
    # Show the name, PID and swap usage of each process
    # Use awk to make the output readable
    grep "VmSwap\|Name\|^Pid" /proc/$pid/status | awk '{ printf "%-20s %-40s\n", $1, $2}'
    # Show the command line of the process
    # This helps you identify which daemon it is
    # Arguments in /proc/<pid>/cmdline are NUL-separated; tr turns the NULs into spaces
    echo -en "Commandline:\t";cat /proc/$pid/cmdline|tr '\000' ' '
    # End the command line and print one empty line to separate the output of multiple PIDs
    printf '\n%s\n'
done
Output example:
Name: ceph-crash
Pid: 598
VmSwap: 0
Commandline: /usr/bin/python3.8 /usr/bin/ceph-crash
Name: ceph-msgr
Pid: 45942
Commandline:
Name: ceph-watch-noti
Pid: 147074
Commandline:
Name: ceph-completion
Pid: 147075
Commandline:
Name: ceph-mon
Pid: 618593
VmSwap: 0
Commandline: /usr/bin/ceph-mon -n mon.alpha -f --setuser ceph --setgroup ceph --default-log-to-file=false --default-log-to-stderr=true --default-log-stderr-prefix=debug --default-mon-cluster-log-to-file=false --default-mon-cluster-log-to-stderr=true
Name: ceph-mgr
Pid: 618869
VmSwap: 0
Commandline: /usr/bin/ceph-mgr -n mgr.alpha.jtqinn -f --setuser ceph --setgroup ceph --default-log-to-file=false --default-log-to-stderr=true --default-log-stderr-prefix=debug
Name: ceph-crash
Pid: 623699
VmSwap: 0
Commandline: /usr/libexec/platform-python -s /usr/bin/ceph-crash -n client.crash.alpha
Name: ceph-osd
Pid: 633218
VmSwap: 0
Commandline: /usr/bin/ceph-osd -n osd.0 -f --setuser ceph --setgroup ceph --default-log-to-file=false --default-log-to-stderr=true --default-log-stderr-prefix=debug
Name: ceph-osd
Pid: 653387
VmSwap: 0
Commandline: /usr/bin/ceph-osd -n osd.5 -f --setuser ceph --setgroup ceph --default-log-to-file=false --default-log-to-stderr=true --default-log-stderr-prefix=debug
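On a host with many daemons, it can be easier to list only the processes that actually have memory in swap. The following is a minimal sketch of such a filter, not the script we use above: the function name `swapped_ceph_daemons` and its optional `proc_root` argument are hypothetical, and it matches on the `Name` field of the status file, which approximates what `pgrep` does.

```bash
#!/bin/bash
# List Ceph processes whose VmSwap is greater than zero.
# proc_root is a hypothetical parameter so the function can be pointed at a
# test directory; it defaults to the real /proc.
swapped_ceph_daemons() {
    local proc_root="${1:-/proc}"
    local status name swap_kb pid
    for status in "$proc_root"/[0-9]*/status; do
        [ -r "$status" ] || continue
        name=$(awk '/^Name:/ { print $2 }' "$status" 2>/dev/null)
        # Keep only process names containing "ceph" or "radosgw"
        case "$name" in
            *ceph*|*radosgw*) ;;
            *) continue ;;
        esac
        # Kernel threads have no VmSwap line; treat a missing value as 0
        swap_kb=$(awk '/^VmSwap:/ { print $2 }' "$status" 2>/dev/null)
        if [ "${swap_kb:-0}" -gt 0 ]; then
            pid=$(basename "$(dirname "$status")")
            printf '%s (pid %s): %s kB in swap\n' "$name" "$pid" "$swap_kb"
        fi
    done
}

swapped_ceph_daemons
```

No output means none of the matching processes currently has memory in swap.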
So, unless you specifically designed your cluster to use swap for Ceph daemons, the daemons should not be using swap. You can of course turn off swap, or restart your daemons to stop them from using swap. If you find your Ceph daemons are using swap and want to dig a little deeper, or just need some help in resolving that, feel free to contact us.