This year we have seen the Rook project gain a lot of traction, and we have talked a great deal with our clients about adopting Rook in their production infrastructure. We see more and more companies working with it, and we hear how efficient the solution is.
The reason we talk so much about Rook is that it touches both Kubernetes and Ceph, two open source projects with which we support our clients’ teams.
In short, Rook turns distributed storage systems into self-managing, self-scaling, self-healing storage services. It automates the tasks of a storage administrator: deployment, bootstrapping, configuration, provisioning, scaling, upgrading, migration, disaster recovery, monitoring, and resource management. Rook uses the Kubernetes platform to deliver its services via a Kubernetes Operator for Ceph. Rook is open source and released under the Apache 2.0 license.
So, Rook is a framework that makes it easy to run storage backends inside Kubernetes. But why is Rook on the rise now? Read on to find out.
Kubernetes storage challenges
One reason for adopting Rook has to do with application state and the resulting need for storage. Applications that run on Kubernetes are usually stateless. Stateless means that your application does not depend on stored state: if you call a function twice with the same input, you always get exactly the same result. This is a pattern you usually find in functional programming languages, where you don’t have methods that depend on an instance and its instance variables.
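To make the difference concrete, here is a minimal Python sketch. The names `total_cents` and `VisitCounter` are made up for illustration and have nothing to do with Rook or Kubernetes.

```python
def total_cents(price_cents: int, quantity: int) -> int:
    """Stateless: the result depends only on the arguments."""
    return price_cents * quantity


class VisitCounter:
    """Stateful: the result depends on how often it was called before."""

    def __init__(self) -> None:
        self.visits = 0  # instance state that survives between calls

    def record_visit(self) -> int:
        self.visits += 1
        return self.visits


# Calling the stateless function twice with the same input gives the same result.
same_result = total_cents(250, 3) == total_cents(250, 3)

# Calling the stateful method twice gives two different results.
counter = VisitCounter()
first_visit = counter.record_visit()
second_visit = counter.record_visit()
```

If the process holding `counter` is restarted, the count is lost. That is the kind of state an application would need persistent storage for.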
But being stateless also implies that storage is ephemeral, which is not desirable in every business case. Often an application depends on some form of stateful, persistent storage. A storage solution for Kubernetes can run externally and be reached via an API, but in some cases this is not ideal. Think of the limited portability of external storage, or of the implementation work, budget and ongoing management that such a solution requires.
How does it work?
Another possibility is a storage solution that runs within Kubernetes itself.
To help solve the challenges mentioned above, Rook takes care of the availability of storage within the Kubernetes cluster. Rook consists of multiple components:
- Rook Operator is the core of Rook. The Rook operator is a simple container that automatically bootstraps the storage clusters and monitors the storage daemons to ensure the storage clusters are healthy.
- Rook Agents run on each storage node and configure a FlexVolume plugin that integrates with Kubernetes’ volume controller framework. Agents handle all storage operations such as attaching network storage devices, mounting volumes on the host, and formatting the filesystem.
- Rook Discover pods detect the storage devices attached to each storage node.
Rook takes care of the deployment of MON, OSD and MGR daemons for the Ceph clusters as Kubernetes pods.
Its architecture looks like this:
Once Rook is set up, it can be consumed like any other Kubernetes storage, with storage classes and persistent volume claims. So, how does it work?
The architectural layers are fairly easy to follow. Let’s go through them one by one, with a visualization of each layer.
For starters, we have layer 1: Rook architectural management.
This first layer (Rook Management) is shown by the orange boxes. It consists of the primary Rook operator plus some additional daemons, depending on the configuration.
The Ceph OSDs (Object Storage Daemons) manage the underlying physical storage devices. Ceph aggregates these devices, pools them together in a software-defined cluster and then exposes that storage to the user.
Second, we have layer 2: CSI provisioning.
At layer 2 we have the Ceph Container Storage Interface (CSI) provisioning of Rook. This is where the user application requests block, file or object storage. In turn, the requested storage is provisioned based on a storage class and presented to the user. Generally, block storage is used by a single container, whereas file storage can be shared between containers. The same is true for object storage.
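As a rough illustration of such a request, the Python sketch below builds a PersistentVolumeClaim manifest and prints it as JSON, a format kubectl accepts alongside YAML. The claim name `demo-data`, the size and the storage class name `rook-ceph-block` are assumptions for this example, as is the helper `build_pvc`; use whatever storage classes exist in your own cluster.

```python
import json
from typing import Any, Dict

# Assumed mapping for this example: a block volume is mounted read-write by
# one node at a time, a shared file volume can be mounted by many.
ACCESS_MODES = {"block": "ReadWriteOnce", "file": "ReadWriteMany"}


def build_pvc(name: str, storage_class: str, size: str, kind: str = "block") -> Dict[str, Any]:
    """Return a PersistentVolumeClaim manifest as a plain dictionary."""
    if kind not in ACCESS_MODES:
        raise ValueError(f"unsupported storage kind: {kind}")
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": [ACCESS_MODES[kind]],
            "resources": {"requests": {"storage": size}},
            # The storage class decides which provisioner creates the volume.
            "storageClassName": storage_class,
        },
    }


# Hypothetical claim for 5 GiB of block storage.
pvc = build_pvc("demo-data", "rook-ceph-block", "5Gi")
print(json.dumps(pvc, indent=2))
```

Saving the printed output to a file and applying it with kubectl would hand the request to whichever provisioner backs the named storage class.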
A similar mechanism exists in the Rook setup for object storage: a storage class is defined, and the user makes a bucket claim against it. Once the claim is made, Rook uses a bucket provisioner to create the bucket for the user.
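A bucket claim can be sketched in the same spirit. The example below builds an ObjectBucketClaim manifest in Python. The API version and field names are written as we recall them from the Rook examples, so verify them against the documentation for your Rook version. The claim name `demo-bucket`, the storage class name `rook-ceph-bucket` and the helper `build_bucket_claim` are assumptions for illustration.

```python
import json
from typing import Any, Dict


def build_bucket_claim(name: str, storage_class: str) -> Dict[str, Any]:
    """Return an ObjectBucketClaim manifest as a plain dictionary."""
    return {
        # Assumed API group and version; check your Rook release.
        "apiVersion": "objectbucket.io/v1alpha1",
        "kind": "ObjectBucketClaim",
        "metadata": {"name": name},
        "spec": {
            # Prefix from which a unique bucket name is generated.
            "generateBucketName": name,
            "storageClassName": storage_class,
        },
    }


# Hypothetical claim against a hypothetical bucket storage class.
bucket_claim = build_bucket_claim("demo-bucket", "rook-ceph-bucket")
print(json.dumps(bucket_claim, indent=2))
```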
Third, we have layer 3: Ceph data path.
The dark grey blocks show how information is passed along. For instance, if the user requests object storage, they receive the details of the storage that Rook created via a Kubernetes secret. The application then connects over the S3 protocol using the connection details provided in that secret.
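Kubernetes stores the values in a secret's `data` field base64-encoded, so an application or script has to decode them before use. The sketch below shows only that decoding step. The key names `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` and the sample values are assumptions for this example, not output from a real cluster.

```python
import base64
from typing import Dict


def decode_secret_data(data: Dict[str, str]) -> Dict[str, str]:
    """Decode the base64-encoded values of a Kubernetes secret's data field."""
    return {key: base64.b64decode(value).decode("utf-8") for key, value in data.items()}


def encode(value: str) -> str:
    """Helper for this example: encode a value the way a secret stores it."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


# Made-up secret content, standing in for what a bucket claim might produce.
example_secret_data = {
    "AWS_ACCESS_KEY_ID": encode("EXAMPLE-ACCESS-KEY"),
    "AWS_SECRET_ACCESS_KEY": encode("EXAMPLE-SECRET-KEY"),
}

credentials = decode_secret_data(example_secret_data)
# Print only the key names, never the secret values themselves.
print(sorted(credentials))
```

The decoded values could then be passed to any S3-compatible client together with the endpoint of the object store.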
While Rook manages your Ceph cluster and clients’ access to it, the end result is just a Ceph cluster. This means that Rook does not sit in, or change, the default data path that is used with Ceph.
Fairbanks and 42on think that Rook is an awesome project that further enhances both Kubernetes and Ceph. If you are interested in Rook, you can get in touch with the community via one of the hyperlinks below. If you have any questions about Rook, Ceph or Kubernetes, let us know in the comment section and we will answer them.
Rook GitHub: https://github.com/rook/rook
Rook website: https://rook.io
Source: https://youtu.be/j86OXjC1Jr8 (Rook Intro and Ceph Deep Dive)