Because deep performance-optimization research can cost a lot of time and effort, here are some steps you may want to try first to improve your overall Ceph cluster.
The first step is to deploy Ceph on a newer Linux release, preferably one with long-term support. Secondly, review your hardware design and service placement. Ceph was designed to run on commodity hardware, which makes building and maintaining petabyte-scale data clusters economically feasible for many organizations. As you plan your cluster hardware, you will need to balance a number of considerations, such as failure domains and potential performance issues. It is important to distribute Ceph daemons and other supporting processes across many hosts. If you haven’t done this from the start and run into Ceph Manager (amongst others) performance issues, we recommend adapting your hardware design: dedicate specific types of hosts to running specific types of daemons, and in particular separate them from any processes that utilize your Ceph data cluster (like OpenStack, CloudStack, etc.). If you are running an OSD on a host with a single disk, create a partition for your volume storage that is separate from the partition containing the OS. Generally, it is recommended to use separate disks for the OS and for the volume storage.
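As a rough illustration of this separation rule, here is a toy Python check that flags hosts mixing Ceph daemons with client-facing workloads. The hostnames and role names are hypothetical, not taken from any Ceph tooling:

```python
# Hypothetical inventory: which roles run on which host.
placement = {
    "host-a": {"mon", "mgr"},
    "host-b": {"osd"},
    "host-c": {"osd"},
    "host-d": {"openstack-compute"},  # client workload, kept on its own host
}

CEPH_DAEMONS = {"mon", "mgr", "osd", "mds", "rgw"}

def mixed_hosts(placement):
    """Return hosts that run both Ceph daemons and non-Ceph workloads."""
    return [
        host for host, roles in placement.items()
        if roles & CEPH_DAEMONS and roles - CEPH_DAEMONS
    ]
```

With the inventory above, `mixed_hosts(placement)` returns an empty list, which is what you want: no host serves both the cluster and its consumers.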
Furthermore, network configuration is of course also critical for building a high-performance Ceph Storage Cluster, especially because the Ceph Storage Cluster does not perform request routing or dispatching on behalf of the Ceph Client. Instead, Ceph Clients make requests directly to Ceph OSD Daemons, and those daemons perform data replication on behalf of the clients. This means replication and other processes impose additional load on Ceph Storage Cluster networking. It is possible to run a Ceph Storage Cluster with two networks: a public (client, front-side) network and a cluster (private, replication, back-side) network, but this approach complicates network configuration (both hardware and software) and does not usually have a significant impact on overall performance. For this reason, Ceph recommends that, for resilience and capacity, dual-NIC systems either bond these interfaces active/active or implement a layer 3 multipath strategy with, for example, FRR.
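To see why replication load matters when sizing the network, consider that with size-r replication the primary OSD forwards each client write to the other r − 1 OSDs. A minimal back-of-the-envelope sketch:

```python
def replication_traffic(client_write_bytes, replicas=3):
    """Replication traffic generated by a client write: the primary OSD
    forwards the data to the other (replicas - 1) OSDs."""
    return client_write_bytes * (replicas - 1)

# Example: a 1 GB client write with 3x replication puts roughly
# 2 GB of additional traffic on the cluster-side network.
```

This is why the replication traffic between OSDs can easily dwarf the front-side client traffic, and why network capacity deserves attention early.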
For further improvement, erasure coding can substantially lower the cost per gigabyte, but it has lower IOPS performance compared with replication. Erasure coding is a data-durability feature for object storage. You can use it when storing large amounts of write-once, read-infrequently data where performance is less critical than cost. If the performance loss is too large for your workload, however, you should design for replication, not erasure coding.
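The cost difference is easy to quantify: replication stores one full copy per replica, while an erasure-coded pool with k data chunks and m coding chunks stores (k + m)/k raw bytes per usable byte. A small sketch of that arithmetic:

```python
def raw_per_usable_replication(copies=3):
    """Raw bytes stored per usable byte with N-way replication."""
    return float(copies)

def raw_per_usable_ec(k, m):
    """Raw bytes stored per usable byte with erasure coding:
    k data chunks plus m coding chunks."""
    return (k + m) / k

# 3x replication stores 3.0 raw bytes per usable byte and survives 2 failures;
# an EC profile with k=4, m=2 also survives 2 failures but stores only 1.5.
```

Both layouts in the comment tolerate two simultaneous failures, yet the erasure-coded pool cuts raw capacity consumption in half, which is exactly the cost-per-gigabyte advantage traded off against IOPS.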
Finally, monitor nodes are of course critical for the proper operation of the cluster. To guarantee performance, first of all try to use dedicated monitor nodes so they have exclusive access to resources, or, if they run in shared environments, fence off the monitor processes. For redundancy, it is advised to distribute monitor nodes across datacenters or availability zones and to deploy an odd number of monitors (3 or 5) for quorum voting. Adding more monitors makes your cluster more durable.
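The reason for an odd number comes straight from the quorum arithmetic: a quorum needs a strict majority, so an even-numbered deployment tolerates no more failures than the odd number below it. A quick illustration:

```python
def quorum_size(monitors):
    """Strict majority needed for the monitor quorum."""
    return monitors // 2 + 1

def tolerated_failures(monitors):
    """Monitor failures the cluster can survive while keeping quorum."""
    return monitors - quorum_size(monitors)

# 3 monitors tolerate 1 failure; 4 monitors still tolerate only 1;
# 5 monitors tolerate 2. The even count adds cost without added resilience.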
If you do encounter performance issues in your Ceph cluster, always start at the lowest level (the disks, network, and other hardware), work your way up to the higher-level interfaces (block devices and object gateways), and be precise about what you find (see the 42on Ceph talk about performance testing). This is the best approach to figure out where your bottleneck is. If you are not sure, or need any help, you can of course contact us. Do you have other tips and tricks for improvement? Let us know!
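Being precise about what you find also means reporting percentiles rather than just averages, since tail latency is usually what hurts. A small sketch, using only the Python standard library, of how benchmark latency samples could be summarized at each layer:

```python
import statistics

def latency_summary(samples_ms):
    """Summarize benchmark latencies (in ms) so findings can be
    reported precisely: median, 99th percentile, and worst case."""
    cut_points = statistics.quantiles(samples_ms, n=100)
    return {
        "p50": statistics.median(samples_ms),
        "p99": cut_points[98],  # 99th of the 99 cut points
        "max": max(samples_ms),
    }
```

Comparing these numbers layer by layer (raw disk, network, RADOS, then block device or gateway) makes it much easier to pinpoint which level introduces the bottleneck.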