Ceph RBD latency with QD=1 bs=4k

You might be asking yourself, why talk about QD=1 bs=4k? Well, from the experience of Wido den Hollander, our colleague at 42on, a single thread IO latency is very important for many applications. What we see with benchmarks is that people focus on high amounts of bandwidth and large amounts of IOps. They simply go […]

Ceph Monitors are laggy or clock might be skewed

This weekend I got to investigate a Ceph cluster which had issues where the Monitors were constantly performing new elections. After some investigation on of the three monitors was eating 100% CPU on a single core and kept printing this in the logs: mon.charlie@2(peon).paxos(paxos updating c 106399655..106400232) lease_expire from mon.0 [2a00:XXX:121:XXX::6789:1]:6789/0 is 2.380296 seconds in […]

Ceph with a cluster and public network on IPv6

I’m a big fan of Ceph and IPv6, so I always try to deploy Ceph over IPv6 when possible. Ceph is the future, just like IPv6 is. Why implement legacy? Recently I did a deployment of Ceph with a public and cluster network running over IPv6. It has a small catch, so I let me […]

Safely backing up your Ceph monitors

So you might wonder: Why do I need to make a backup of my Ceph monitors? I have multiple monitors. That’s true, but would you run into the very unfortunate situation where you loose all you monitors, you loose all your data. The monitors contain very important metadata (pgmap, osdmap, crushmap) to run your cluster. […]

Redundant Ceph monitors with Round Robin DNS

One of the unique features of Ceph is that it can be build without any Single Point of Failure. No single machine will take your cluster down when designed properly. Ceph’s monitors play a crucial part in this. To make them redundant you want a odd number of monitors, where 3 is more then sufficient […]