Detecting swap usage by Ceph daemons

TL;DR: In good Ceph architectures we prefer daemons to be killed rather than swapped. That behaviour is much more predictable and detectable, and it usually points you at the actual problem. We use the one-liner below to detect whether the kernel is swapping Ceph daemons; a commented, formatted version follows further down.

for pid in $(pgrep "ceph|radosgw");do grep "VmSwap\|Name\|^Pid" /proc/$pid/status | awk '{ printf "%-20s %-40s\n", $1, $2}';echo -en "Commandline:\t";cat /proc/$pid/cmdline|tr '\000' ' ';printf '\n%s\n';done

System memory

Computer systems use memory to load programs and execute instructions. This memory is fast, relatively expensive and always limited in size. When an operating system needs to allocate more memory than is physically available, there are a few things it can do, for example:


      • kill the process;

      • kill another process;

      • crash the kernel.

    In Linux-based systems this behaviour is governed by the kernel's OOM killer (Out-Of-Memory killer).
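
    On Linux you can observe the OOM killer's view of a process directly. A minimal sketch, assuming a Linux host with /proc mounted (reading the kernel log may require root on some distributions):

```shell
# Every process carries an OOM score; the higher it is, the sooner the
# OOM killer picks that process when memory runs out.
cat /proc/self/oom_score

# Recent OOM kills, if any, end up in the kernel log.
# '|| true' keeps the pipeline from failing when nothing matches.
dmesg 2>/dev/null | grep -i "out of memory" || true
```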

    Memory paging

    Within most general-purpose computing applications, it is undesirable to have systems or programs crash. Therefore, an alternative called memory paging is available. Memory paging refers to storing data from memory on secondary storage (slow), like a hard drive. So, instead of killing a process, the operating system moves some of its memory to a slower tier, the hard disk. Within Linux, this behaviour is called swap.
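
    Before looking at individual daemons, you can check whether the system is using swap at all. A minimal sketch that reads the standard SwapTotal and SwapFree fields from /proc/meminfo:

```shell
# System-wide swap figures, straight from /proc/meminfo (values in kB).
awk '/^SwapTotal:/ { total=$2 }
     /^SwapFree:/  { free=$2 }
     END { printf "SwapTotal: %d kB\nSwapUsed: %d kB\n", total, total-free }' /proc/meminfo
```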

    Implications of swap usage

    We tend to say that “nothing is free”. So, in this case, that raises the question: “How do I pay for using swap?” Mostly, you pay with a loss of predictable performance: anything still in memory will be relatively fast, anything in swap will be relatively slow.

    Ceph daemons

    Ceph daemons, the programs that make up Ceph, are designed for failure. That means that in a good Ceph architecture a Ceph daemon may fail without negatively impacting the overall availability of data from a client’s point of view. We also prefer predictable performance over an individual Ceph daemon’s uptime. When the performance of a single Ceph daemon becomes unpredictable, and many daemons behaving the same way work together, this can even lead to service degradation or brown-out-like behaviour.

    Therefore, in good Ceph architectures, we prefer daemons to be killed instead of being swapped. This behaviour is much more predictable and detectable, and it usually informs you of where to fix the actual problem: by adding memory or by reducing the memory demands.

    In short, we prefer to disable swap entirely on systems that host Ceph daemons. In some cases, we are bound by what the customer has built, or there may be compelling reasons to enable swap. For example, many hyper-converged systems use swap, and for a lot of virtual machine workloads swapping is preferred over killing the VM.
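
    For completeness, a sketch of how swap is typically disabled on such a host (root required; verify your /etc/fstab layout before making the change persistent):

```shell
# Disable all active swap devices for the current boot:
swapoff -a
# To keep swap off across reboots, comment out the swap entries in /etc/fstab.
# If swap must stay enabled, make the kernel reluctant to use it instead:
sysctl vm.swappiness=10
```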

    So, in some cases, we simply want to know: “Are my Ceph daemons being swapped?”

    You can check this using several different methods. We use the Bash one-liner below:

    for pid in $(pgrep "ceph|radosgw");do grep "VmSwap\|Name\|^Pid" /proc/$pid/status | awk '{ printf "%-20s %-40s\n", $1, $2}';echo -en "Commandline:\t";cat /proc/$pid/cmdline|tr '\000' ' ';printf '\n%s\n';done

    With comments and formatting:

    #Use pgrep to get the PIDs of Ceph processes
    for pid in $(pgrep "ceph|radosgw")
    do
    	#Show the swap usage for those processes
    	#Use awk to make the output readable
    	grep "VmSwap\|Name\|^Pid" /proc/$pid/status | awk '{ printf "%-20s %-40s\n", $1, $2}'
    	#Show the command line of the process
    	#This helps you identify which daemon it is
    	#We use 'tr' to replace the NUL separators in cmdline with spaces
    	echo -en "Commandline:\t";cat /proc/$pid/cmdline|tr '\000' ' '
    	#Print an empty line to separate the output of multiple pids
    	printf '\n%s\n'
    done

    Output example:

    Name:                ceph-crash
    Pid:                 598
    VmSwap:              0
    Commandline:	/usr/bin/python3.8 /usr/bin/ceph-crash
    Name:                ceph-msgr
    Pid:                 45942
    Name:                ceph-watch-noti
    Pid:                 147074
    Name:                ceph-completion
    Pid:                 147075
    Name:                ceph-mon
    Pid:                 618593
    VmSwap:              0
    Commandline:	/usr/bin/ceph-mon -n mon.alpha -f --setuser ceph --setgroup ceph --default-log-to-file=false --default-log-to-stderr=true --default-log-stderr-prefix=debug  --default-mon-cluster-log-to-file=false --default-mon-cluster-log-to-stderr=true
    Name:                ceph-mgr
    Pid:                 618869
    VmSwap:              0
    Commandline:	/usr/bin/ceph-mgr -n mgr.alpha.jtqinn -f --setuser ceph --setgroup ceph --default-log-to-file=false --default-log-to-stderr=true --default-log-stderr-prefix=debug
    Name:                ceph-crash
    Pid:                 623699
    VmSwap:              0
    Commandline:	/usr/libexec/platform-python -s /usr/bin/ceph-crash -n client.crash.alpha
    Name:                ceph-osd
    Pid:                 633218
    VmSwap:              0
    Commandline:	/usr/bin/ceph-osd -n osd.0 -f --setuser ceph --setgroup ceph --default-log-to-file=false --default-log-to-stderr=true --default-log-stderr-prefix=debug
    Name:                ceph-osd
    Pid:                 653387
    VmSwap:              0
    Commandline:	/usr/bin/ceph-osd -n osd.5 -f --setuser ceph --setgroup ceph --default-log-to-file=false --default-log-to-stderr=true --default-log-stderr-prefix=debug

    So, unless you specifically designed your cluster to use swap for Ceph daemons, the daemons should not be using swap. You can of course turn off swap, or restart your daemons to stop them from using swap. If you find your Ceph daemons are using swap and want to dig a little deeper, or just need some help in resolving that, feel free to contact us.
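
    If you only care about offenders, a small variant of the script above prints just the daemons whose VmSwap is non-zero. This is a sketch, assuming the same "ceph|radosgw" process-name pattern as before:

```shell
# List only Ceph daemons that actually occupy swap.
for pid in $(pgrep "ceph|radosgw"); do
    # Pull the VmSwap value (in kB) from the process status file.
    swap=$(awk '/^VmSwap:/ { print $2 }' /proc/$pid/status 2>/dev/null)
    # Skip processes that vanished or have no VmSwap field.
    [ -n "$swap" ] || continue
    if [ "$swap" -gt 0 ]; then
        printf '%s\t%s kB\t' "$pid" "$swap"
        tr '\000' ' ' < /proc/$pid/cmdline
        printf '\n'
    fi
done
```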