By Franck Pachot
I’ve always worked with databases. Before the cloud era, the most abstract term was “data”. A variable in memory is data. A file is data. A disk block contains data. We often created a ‘/data’ directory to hold everything that is not binaries or configuration files. I’ll always remember doing that while working in Dakar. My colleagues laughed for minutes – my Senegalese followers will understand why. “Data”, like “information”, is abstract (which is one reason why it is not plural). It makes sense only when you associate it with a container: data bank, database, datafile, datastore…
In database infrastructure, we store data in files or block storage that we read and write by pages: read one or many contiguous blocks, bring them into memory, update them in memory, write those blocks back. And to export and import data outside of the database, we store it in files, within a filesystem that can be local or remote (like NFS). But it is basically the same: you open a file, you seek to the right offset, you read or write, you synchronize, you keep the file open until you no longer need to work on it, and then you close it. This API is so convenient that, in Linux, everything ended up being a file: you write to the network through file descriptors, you access block devices through /dev entries, you output to the screen through stderr and stdout…
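To make that open / seek / read / write / synchronize / close cycle concrete, here is a minimal sketch in Python using the thin wrappers around the POSIX calls. The 8 KB `PAGE_SIZE`, the `update_page` helper, and the `demo` function are assumptions made up for illustration; they are not taken from any particular database engine.

```python
import os
import tempfile

PAGE_SIZE = 8192  # assumed page size, for illustration only


def update_page(path, page_no, new_bytes):
    """Overwrite one page in place and return what it held before."""
    fd = os.open(path, os.O_RDWR)                        # open
    try:
        os.lseek(fd, page_no * PAGE_SIZE, os.SEEK_SET)   # seek to the right offset
        old = os.read(fd, PAGE_SIZE)                     # read the page
        os.lseek(fd, page_no * PAGE_SIZE, os.SEEK_SET)   # seek back to the page start
        os.write(fd, new_bytes)                          # write only that page
        os.fsync(fd)                                     # synchronize to storage
        return old
    finally:
        os.close(fd)                                     # close


def demo():
    """Create a 4-page file, rewrite page 2 only, return (old page, whole file)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "datafile")
        with open(path, "wb") as f:
            f.write(b"A" * PAGE_SIZE * 4)
        old = update_page(path, 2, b"B" * PAGE_SIZE)
        with open(path, "rb") as f:
            content = f.read()
    return old, content
```

The point to notice is that only one page is touched: the rest of the file is neither read nor rewritten, which is the kind of random access database datafiles rely on.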
And then came the cloud, which maps most of the artifacts we have in the data center: virtual network, virtual machines, block storage, network file systems,… but then came a new invention: the Object Storage. What is an object? Well.. it is data… Then, what is new? It has some metadata associated with it… But isn’t that what a filesystem provides? Well.. here we have no hierarchy, no directories, but more metadata, like tags. What? Everything in the cloud without folders and directories? No, we have buckets… But without hierarchy, how do you avoid name collisions? No problem, each object has a UUID.
My hope is that you find this blog post when you “google” to find out what object storage is. It is quite common that “experts” answer a quick “did you google for it?” in forums without realizing that actually, what makes you an expert is not what you know but how accurate you can be when “googling” for it.
If you already know what Object Storage is, you will probably find more information in each cloud provider’s documentation. But what if you don’t know at all? I’m writing this blog post because a friend without cloud knowledge was trying to understand what Object Storage is.
He is an Oracle DBA and came to this page about Oracle Cloud Object Storage:
Overview of Object Storage
Oracle Cloud Infrastructure offers two distinct storage class tiers to address the need for both performant, frequently accessed “hot” storage, and less frequently accessed “cold” storage. Storage tiers help you maximize performance where appropriate and minimize costs where possible.
Use Object Storage for data to which you need fast, immediate, and frequent access. Data accessibility and performance justifies a higher price to store data in the Object Storage tier.
And you know what a DBA thinks when he reads “performant, frequently accessed hot storage” and “fast, immediate, and frequent access”? This is for a database. The page describes some use cases, but makes no mention of “read” and “write”, and no mention of “random” or “sequential” access. Nothing explicitly tells you that the Object Store is not where you want to put your database datafiles. It mentions some use cases that seem closely related to databases (Big Data Support, Backup, Repository, Large Datasets,…) and it mentions features that are closely related to databases (consistency, durability, metadata, encryption).
Basically, if you already know what it is, you have a nice description. But if you are new to the cloud and try to match this service with something you know, then you are completely misled. Especially if you didn’t read about Block Storage before.
Are all cloud providers making the same mistake? Here is the AWS definition:
Cloud object storage makes it possible to store practically limitless amounts of data in its native format
The Amazon Simple Storage Service (Amazon S3) page also makes no mention of read/write workloads or random/sequential access. The features listed are Durability, Availability, & Scalability. Nothing tells you that it is not designed for database files.
Google Cloud may be clearer:
Object storage for companies of all sizes. Store any amount of data. Retrieve it as often as you’d like. Good for “hot” data that’s accessed frequently, including websites, streaming videos, and mobile apps.
The “Store” and “Retrieve it as often” wording gives the idea of write once and read many, as a whole. This is not for databases. But again, “accessed frequently” should say that this means a “read” workload.
Microsoft is known to listen to and talk with its users. Look at the Azure name and definition for this:
Blob storage: Massively scalable and secure object storage for cloud-native workloads, archives, data lakes, high-performance computing, and machine learning
Yes, that rings a bell or two for a database person. BLOB is exactly what we use in databases to store what a cloud practitioner stores in an Object Storage. Here is the “O” for “Object”, the same as in “Binary Large Object”. You do not store database files in an Object Storage. Databases need block volumes, and you have block storage services for that. You don’t store a hierarchy of folders and files in an Object Storage. File servers need protocols providing shared access and a structure of filesystem trees. You store everything else in an Object Storage. Think of it as writing to tape, but reading as if the tape had been turned into an SSD. In this store you put files and can get them back efficiently. I’m talking about “store”, “put” and “get” like in a document database. But documents can be terabytes. And you can read those files with a database, as if they were BLOBs, like Oracle Autonomous Data Warehouse reading ORC, Parquet, or Avro. Or Amazon Athena running SQL queries on S3 files.
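The “put” and “get” model can be sketched with a toy, in-memory store. Everything here (`ToyObjectStore`, `StoredObject`, the bucket and key names) is hypothetical and only mimics the shape of the idea; it is not the API of any cloud provider, and the generated UUID simply follows the description given earlier in this post, while real services differ in how they identify objects.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredObject:
    object_id: str   # generated identifier (illustrative)
    data: bytes      # the whole payload, immutable once stored
    metadata: dict   # free-form tags attached to the object


class ToyObjectStore:
    """Hypothetical in-memory stand-in for an object storage service."""

    def __init__(self):
        self._buckets = {}  # bucket name -> {key -> StoredObject}

    def put(self, bucket, key, data, **metadata):
        """Store a whole object, replacing any previous one under the same key."""
        obj = StoredObject(str(uuid.uuid4()), bytes(data), dict(metadata))
        self._buckets.setdefault(bucket, {})[key] = obj
        return obj.object_id

    def get(self, bucket, key):
        """Return the whole object; there is no seek to an offset."""
        return self._buckets[bucket][key]

    def list_keys(self, bucket, prefix=""):
        """Keys are flat strings; a '/' is just a character, not a directory."""
        return sorted(k for k in self._buckets.get(bucket, {}) if k.startswith(prefix))


def demo():
    store = ToyObjectStore()
    first_id = store.put("backups", "2020/full.dmp", b"AAAA", kind="full")
    store.put("backups", "2020/incr.dmp", b"CCCC", kind="incremental")
    # Changing one byte means getting everything and putting everything back.
    whole = bytearray(store.get("backups", "2020/full.dmp").data)
    whole[1:2] = b"B"
    second_id = store.put("backups", "2020/full.dmp", whole, kind="full")
    return store, first_id, second_id
```

In this sketch, a one-byte change produces a brand new object. That is fine for backups, exports, and Parquet files, and it is exactly why database datafiles, which are updated page by page, belong on block storage instead.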
I hope Object Storage is clearer to you now, especially if you work with databases. And also remember that what is easy for you to google may be impossible for someone else to find. You need concepts, and Cloud Practitioner certifications are really good for that.