Quick overview of Ceph version running on OSDs

When checking a Ceph cluster it's useful to know which versions your OSDs in the cluster are running. There is a very simple one-line command to do this: ceph osd metadata | jq '.[].ceph_version' | sort | uniq -c. Running this on a cluster which is currently being upgraded from Jewel to Luminous it shows: 10 "ceph version 10.2.6 (656b5b63ed7c43bd014bcafd81b001959d5f089f)" 1670 […]
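For readability, the same one-liner split out (output will of course differ per cluster):

    ceph osd metadata | jq '.[].ceph_version' | sort | uniq -c

Each OSD reports its metadata as JSON, jq extracts the ceph_version field, and sort | uniq -c counts how many OSDs run each version.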

Do not use SMR disks with Ceph

Many new disks like the Seagate He8 disks use a technique called Shingled Magnetic Recording (SMR) to increase capacity. As these disks offer a very low price per gigabyte they seem interesting to use in a Ceph cluster. Performance: Due to the nature of SMR these disks are very, very, very bad when it comes […]
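As a rough detection sketch of my own (not from the post): recent Linux kernels expose the zone model of a block device via sysfs. Host-aware and host-managed SMR disks report it there; drive-managed SMR disks unfortunately report 'none', just like conventional disks:

    # Print the zone model ('none', 'host-aware' or 'host-managed') per disk
    for d in /sys/block/sd*; do
        echo "$d: $(cat "$d"/queue/zoned)"
    done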

Testing Ceph BlueStore with the Kraken release

Ceph version Kraken (11.2.0) has been released and the Release Notes tell us that the new BlueStore backend for the OSDs is now available. BlueStore: The current backend for the OSDs is the FileStore, which mainly uses the XFS filesystem to store its data. To overcome several limitations of XFS and POSIX in general the […]
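As a sketch of how BlueStore could be tried on Kraken, where it was still marked experimental (flag and tooling as I remember them from that release, so treat this as an assumption):

    # ceph.conf: BlueStore was experimental in Kraken and had to be enabled explicitly
    [global]
    enable_experimental_unrecoverable_data_corrupting_features = bluestore

    # Prepare an OSD with the BlueStore backend (/dev/sdb is an example device)
    ceph-disk prepare --bluestore /dev/sdb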

Chown Ceph OSD data directory using GNU Parallel

Starting with Ceph version Jewel (10.2.X) all daemons (MON and OSD) will run under the unprivileged user ceph. Prior to Jewel the daemons were running as root, which is a potential security issue. This means the data has to change ownership before a daemon running the Jewel code can run. Chown data: As the Release Notes state […]
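A minimal sketch of that chown step, assuming the default /var/lib/ceph/osd layout and GNU Parallel installed (the exact invocation in the post may differ). Stop the OSDs first, as the chown must not race with running daemons:

    # Chown every OSD data directory concurrently, one parallel job per directory
    find /var/lib/ceph/osd -mindepth 1 -maxdepth 1 -type d | parallel chown -R ceph:ceph {}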

Slow requests with Ceph: ‘waiting for rw locks’

Slow requests in Ceph: When an I/O operation inside Ceph takes more than X seconds, which is 30 by default, it will be logged as a slow request. This is to show you as an admin that something is wrong inside the cluster and you have to take action. Origin of slow requests: Slow […]
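Two standard commands that help track such requests down (osd.12 is just an example id):

    ceph health detail                      # shows which OSDs have blocked/slow requests
    ceph daemon osd.12 dump_ops_in_flight   # on the OSD's host: per-op state, e.g. 'waiting for rw locks'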

The Ceph Trafficlight

At PCextreme we have a 700TB Ceph cluster which is used behind our public cloud Aurora Compute, which runs Apache CloudStack. Ceph health: One of the things we monitor on the Ceph cluster is its health. This can be OK, WARN or ERR. It goes without saying that you always want to see OK, but […]
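A minimal way such a traffic light could poll the health state, assuming jq is available (a sketch, not the post's actual implementation; the JSON field names vary per Ceph release):

    # Prints HEALTH_OK, HEALTH_WARN or HEALTH_ERR on pre-Luminous releases
    ceph health --format json | jq -r '.overall_status'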

Ceph Monitors are laggy or clock might be skewed

This weekend I got to investigate a Ceph cluster which had issues where the Monitors were constantly performing new elections. After some investigation one of the three monitors was eating 100% CPU on a single core and kept printing this in the logs: mon.charlie@2(peon).paxos(paxos updating c 106399655..106400232) lease_expire from mon.0 [2a00:XXX:121:XXX::6789:1]:6789/0 is 2.380296 seconds in […]
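Two quick things to check in a case like this, assuming NTP is used for time keeping (my suggestion, not from the post):

    ceph status   # clock skew between monitors shows up as a HEALTH_WARN
    ntpq -p       # run on each monitor host to verify the offset against the NTP peers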

Protecting your Ceph pools against removal or property changes

One of the dangers of Ceph was that by accident you could remove a multi-terabyte pool and lose all the data. Although the CLI tools asked you for confirmation, librados and all its bindings did not. Imagine explaining that you just removed a 200TB pool from your storage system due to a typo in […]
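One way to guard a pool is through the per-pool flags that Ceph provides, presumably what the post goes on to describe (the pool name rbd is just an example):

    ceph osd pool set rbd nodelete true      # refuse removal of the pool
    ceph osd pool set rbd nopgchange true    # refuse pg_num/pgp_num changes
    ceph osd pool set rbd nosizechange true  # refuse size/min_size changes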

NFS-Ganesha with libcephfs on Ubuntu 14.04

This week I'm testing a lot with CephFS and one of the things I never tried was re-exporting CephFS using NFS-Ganesha and libcephfs. NFS-Ganesha is an NFS server which runs in userspace. It has multiple backends (FSALs) it can use, and libcephfs is one of them. libcephfs is a userspace library which you can use […]
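A minimal EXPORT block for the CEPH FSAL in ganesha.conf might look like this (a sketch against NFS-Ganesha 2.x directive names; the paths are examples):

    EXPORT
    {
        Export_ID = 1;
        Path = "/";             # path inside CephFS to export
        Pseudo = "/cephfs";     # NFSv4 pseudo-root seen by clients
        Access_Type = RW;

        FSAL
        {
            Name = CEPH;        # use libcephfs as the backend
        }
    }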

Ceph with a cluster and public network on IPv6

I'm a big fan of Ceph and IPv6, so I always try to deploy Ceph over IPv6 when possible. Ceph is the future, just like IPv6 is. Why implement legacy? Recently I did a deployment of Ceph with a public and cluster network running over IPv6. It has a small catch, so let me […]
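For reference, the ceph.conf bits for an IPv6 deployment, with example prefixes; whether this is the catch the post refers to is my assumption, but ms_bind_ipv6 is easy to forget:

    [global]
    ms_bind_ipv6 = true                    # bind the messenger to IPv6
    public_network = 2001:db8:100::/64     # example prefix
    cluster_network = 2001:db8:200::/64    # example prefix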