Ceph object storage and gateway 101

Ceph Object Gateway

Ceph Object Gateway, also known as RADOS Gateway (RGW), is an object storage interface built on top of librados that provides applications with a RESTful gateway to Ceph Storage Clusters. The Ceph Object Gateway supports two interfaces:

1.    S3-compatible: it provides object storage functionality with an interface that is compatible with a large subset of the Amazon S3 RESTful API.

2.    Swift-compatible: it provides object storage functionality with an interface that is compatible with a large subset of the OpenStack Swift API.
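To illustrate the two interfaces, the same gateway endpoint can be addressed with standard S3 and Swift clients. A minimal sketch, assuming a gateway listening at rgw.example.com:7480 and credentials created beforehand (the hostname, port, user and key are placeholders):

```shell
# S3 interface: list buckets with the AWS CLI, pointed at the RGW endpoint
aws --endpoint-url http://rgw.example.com:7480 s3 ls

# Swift interface: list containers with the python-swiftclient CLI,
# authenticating against the gateway's built-in auth endpoint
swift -A http://rgw.example.com:7480/auth/v1.0 -U testuser:swift -K 'secretkey' list
```

Because both APIs share one namespace, a bucket created through the S3 call above is visible as a container to the Swift call.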

A Ceph Object Gateway stores administrative data in a series of pools defined in an instance’s zone configuration. For example, the buckets, users, user quotas and usage statistics discussed in the subsequent sections are stored in pools in the Ceph Storage Cluster. By default, Ceph Object Gateway will create the following pools and map them to the default zone.

  • .rgw
  • .rgw.control
  • .rgw.gc
  • .log
  • .intent-log
  • .usage
  • .users
  • .users.email
  • .users.swift
  • .users.uid

You should consider creating these pools manually so that you can set the CRUSH ruleset and the number of placement groups yourself. In a typical configuration, the pools that store the Ceph Object Gateway's administrative data often share the same CRUSH ruleset and use fewer placement groups, because there are 10 administrative pools and each holds comparatively little data.
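Creating the pools by hand might look as follows. This is a sketch, assuming replicated pools bound to a hypothetical CRUSH rule named rgw-admin-rule and a deliberately small placement-group count; adjust both to your cluster:

```shell
# Create one of the administrative pools with 8 placement groups
# and bind it to a dedicated CRUSH rule
ceph osd pool create .rgw.control 8 8 replicated rgw-admin-rule

# Repeat for the remaining administrative pools, for example:
ceph osd pool create .rgw.gc 8 8 replicated rgw-admin-rule
ceph osd pool create .usage 8 8 replicated rgw-admin-rule
```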

Ceph object storage

Ceph Object Storage, in turn, uses the Ceph Object Gateway daemon (radosgw), an HTTP server for interacting with a Ceph Storage Cluster. Because it provides interfaces compatible with OpenStack Swift and Amazon S3, the Ceph Object Gateway has its own user management. The Ceph Object Gateway can store data in the same Ceph Storage Cluster used to store data from Ceph File System clients or Ceph Block Device clients. The S3 and Swift APIs share a common namespace, so you can write data with one API and retrieve it with the other.
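Because the gateway keeps its own user database, S3 and Swift credentials are created with radosgw-admin rather than with Ceph's cluster authentication. A sketch, with the uid and display name as placeholders:

```shell
# Create an S3-style user; the output includes the generated
# access key and secret key
radosgw-admin user create --uid=johndoe --display-name="John Doe"

# Add a Swift subuser so the same account can use the Swift API
radosgw-admin subuser create --uid=johndoe --subuser=johndoe:swift --access=full
```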

Below you can find an image with an overview of the various layers and the hierarchy of the systems for more clarification.


Please note that Ceph Object Storage does not use the Ceph Metadata Server.

Garbage Collection

Furthermore, it can be handy to know that when new data objects are written into the storage cluster, the Ceph Object Gateway immediately allocates the storage for these new objects. After you delete or overwrite data objects in the storage cluster, the Ceph Object Gateway deletes those objects from the bucket index. Sometime afterward, the Ceph Object Gateway then purges the space that was used to store the objects in the storage cluster. The process of purging the deleted object data from the storage cluster is known as Garbage Collection, or GC.

Garbage collection operations typically run in the background. You can configure these operations to either execute continuously, or to run only during intervals of low activity and light workloads. By default, the Ceph Object Gateway conducts GC operations continuously. Because GC operations are a normal part of Ceph Object Gateway operations, deleted objects that are eligible for garbage collection exist most of the time.
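The GC schedule is controlled by a handful of rgw_gc_* options in ceph.conf. A sketch of a fragment that spaces collection out (the values are illustrative, not recommendations):

```ini
[client.rgw]
# Minimum time (seconds) an object must wait after deletion
# before GC may purge it
rgw_gc_obj_min_wait = 7200
# How often (seconds) a new GC cycle starts
rgw_gc_processor_period = 3600
# Maximum time (seconds) a single GC cycle may run
rgw_gc_processor_max_time = 3600
```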

To view the objects awaiting garbage collection, you can use radosgw-admin. For example: [root@rgw ~]# radosgw-admin gc list
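Besides listing, radosgw-admin can also trigger a collection cycle by hand, which can be useful after bulk deletes. A sketch:

```shell
# Start a garbage collection cycle immediately
radosgw-admin gc process

# List pending objects, including those that have not yet
# reached their minimum wait time
radosgw-admin gc list --include-all
```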

There you have it, the Ceph object storage and gateway basics to help you on the way! In case you have any more questions about it, feel free to contact us and let’s have a chat about it. For more insights about Ceph you can also visit our LinkedIn account through the following link: https://www.linkedin.com/company/42on/

Source: Ceph and Red Hat documentation