The importance of upgrading and updating Ceph consistently

During our Ceph consultancy projects, we often get questions about Ceph versions. A common one is which Ceph version to use: the latest, or an earlier one? Another is whether running different Ceph versions in one cluster is a bad thing. I thought it would be useful to share our advice in a blog post, so here you go.

42on supports most Ceph versions across distributions, from Debian and Ubuntu to CentOS, RHEL, and SUSE. When you run Ceph with one of the major vendors, it is best to follow their version support guidance. But what if you work with the open source community version? Which version should you be running then?

We see many Ceph users running older or different versions within their cluster. This is often the result of starting with Ceph on a small scale and later expanding with new hardware that runs a newer version. After several expansions, multiple Ceph versions can exist within one cluster: users install the new version on the new hardware while the old servers keep running the older one. The root of this problem often lies in the design of the cluster. It is therefore good to plan in advance how version control will be handled within the cluster. A good approach is to first make sure that the same current version is installed on both the old and the new hardware. Once that is done, you can perform updates on all of them together.

Ceph is a distributed storage system made up of separate components with one main job: delivering consistent storage. To do that, Ceph needs to know everything that happens within the system, and for that you should make sure that every component runs exactly the same version. The reason is that each version speaks its own language (major upgrades) and even its own dialect (minor updates). When the same version is used across the entire infrastructure, all components speak the same protocol and can communicate without friction. Working with different versions of Ceph is possible but not recommended; the more the versions differ, the less recommended it is. In the Netherlands, we can talk perfectly well to our Belgian, German, English, and French neighbors, but the further apart we are, the more our languages differ and the more difficult it gets. My Chinese and Japanese are still lacking, which is why I use ‘version English’ in my blogs.

Because Ceph consists of separate components it is important to have the same versions running within a cluster.

Apart from the protocol aspect described above, newer versions fix major and minor issues, improve performance, and add features, providing an even more consistent storage infrastructure. Also, mixing different major versions, such as 14.2.22 and 15.2.14, is unsupported. Between minor releases the differences are minimal, but it is good practice to bring all components to the same minor release as quickly and as completely as possible.
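To make the difference between a mixed major and a mixed minor situation concrete, here is a minimal Python sketch. It assumes a report shaped like the JSON that the `ceph versions` command prints (version strings mapped to daemon counts per component, plus an "overall" section). The sample data, the hashes, and the function names `parse_versions` and `classify` are made up for illustration.

```python
import json
import re

# Sample report in the shape of `ceph versions` output.
# The hashes and daemon counts are invented for this example.
SAMPLE = """
{
  "mon": {"ceph version 14.2.22 (abc123) nautilus (stable)": 3},
  "mgr": {"ceph version 14.2.22 (abc123) nautilus (stable)": 2},
  "osd": {
    "ceph version 14.2.22 (abc123) nautilus (stable)": 10,
    "ceph version 15.2.14 (def456) octopus (stable)": 4
  },
  "mds": {},
  "overall": {
    "ceph version 14.2.22 (abc123) nautilus (stable)": 15,
    "ceph version 15.2.14 (def456) octopus (stable)": 4
  }
}
"""

VERSION_RE = re.compile(r"ceph version (\d+)\.(\d+)\.(\d+)")


def parse_versions(report):
    """Return the set of (x, y, z) version tuples in the 'overall' section."""
    found = set()
    for label in report.get("overall", {}):
        match = VERSION_RE.search(label)
        if match:
            found.add(tuple(int(part) for part in match.groups()))
    return found


def classify(versions):
    """Label a set of version tuples as consistent, mixed-minor, or mixed-major."""
    if len(versions) <= 1:
        return "consistent"
    majors = {version[0] for version in versions}
    return "mixed-major" if len(majors) > 1 else "mixed-minor"


versions = parse_versions(json.loads(SAMPLE))
print(sorted(versions), classify(versions))
```

With the sample above, the cluster is flagged as "mixed-major" because 14.x and 15.x daemons run side by side. On a live cluster, the same functions could be fed the JSON output of `ceph versions` instead of the sample string.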

One could argue that there are two reasons why different Ceph versions can exist within one cluster:

  • The cluster is in the middle of an update.
  • The cluster is using different versions because there is a known, unresolved bug in the next version. Users can then decide to put part of their workloads on the new version but keep a certain part on the old one.

In both cases, the deviation is a conscious one, and that is fine.

At 42on we recommend using the second-last version of Ceph, because you want to run software that is as stable as possible and has been tested as widely as possible in the market. When enough people use a version, bugs are found and fixed, because there is always someone pushing it forward. By that point, the version has already received a year of critical bug fixes and use case performance improvements. In addition, you will still benefit from updates and support for another year.
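The "second-last version" rule can be written down in a few lines. The sketch below assumes you maintain your own list of stable major releases; the list shown here is only an example, and the function name `second_last` is hypothetical.

```python
# Example release list: (major number, code name), deliberately unordered.
# Maintain your own list; this one is only an illustration.
RELEASES = [(15, "octopus"), (14, "nautilus"), (16, "pacific")]


def second_last(releases):
    """Return the second-newest release, ordered by major version number."""
    ordered = sorted(releases)
    if len(ordered) < 2:
        raise ValueError("need at least two releases to pick the second-last one")
    return ordered[-2]


print(second_last(RELEASES))
```

For the example list, this picks (15, "octopus"): one major release behind the newest entry.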

In conclusion, when selecting your Ceph upgrade and update strategy, two things matter most:

  • Stay up-to-date with major versions.
  • Stay consistent by applying the minor updates.

Although you probably enjoy a period of relative rest if you skip those big upgrades, the eventual leap is risky. It is also a waste of value, as you work with an old system that performs worse. In addition, you miss out on all kinds of optimizations and features that simplify management or improve stability. Furthermore, if you do not update your cluster regularly, you will suddenly have a very large new project ahead of you.

So, update and upgrade your cluster consistently: preferably upgrade once a year and update several times a year. If you like, 42on can work with you to get or stay up-to-date.

I am curious about your Ceph update strategy. Have you implemented version control or are you running different and/or older versions within your clusters?

Read why it is important to update correctly in our blog: https://42on.com/5-more-ways-to-break-your-ceph-cluster/
