Object Storage Technology

Overview

Not everyone is aware that we use Object Storage technology every day on sites and services such as Facebook™, LinkedIn™, Instagram™, Dropbox™ and YouTube™, to name a few. These companies use Object Storage to store user content such as photos, streaming video, collaboration files and much more. Using this technology, they can easily store the content of billions of users. Facebook™, for example, reported more than 2 billion monthly active users in June 2017.

Until recently, the use cases for Object Storage were predominantly cloud storage applications, but today the technology is commonly used in markets such as finance, media and entertainment, big data analytics, data science, video surveillance and healthcare, and in any context where large amounts of data must be stored very inexpensively while remaining analyzable and manageable at a granular level.

Aside from a few specific businesses that can predict storage space occupancy precisely and assign resources accordingly, data in the enterprise market is growing exponentially for all organizations, regardless of industry or size. Managing, storing and accessing that data is a constant and costly challenge. As any IT professional knows, this puts increasing pressure on the storage infrastructure, which requires constant maintenance, expensive upgrades and continuous investment. Storage costs are rising in particular because of the dramatic growth in unstructured data from multiple sources and in multiple formats: commercial and scientific data, images, multimedia, podcasts (digital audio files), vodcasts (digital video files), video surveillance, MP3s, Internet of Things (IoT) data, and all the activities that generate huge numbers of PDF, XLS, PPT, XML, JPEG and many other file types.

The time has come for Object Storage technology

Managing and storing data is a constant and costly challenge for all kinds of organizations. Industry studies find that more than 90% of all business data generated today is unstructured, and we know from experience that this creates several problems for traditional storage systems. On top of this, the vast majority of stored information quickly loses its worth over time: assessments have shown that one year after its creation, more than 80% of stored data is no longer accessed.

Object Storage vs SAN-NAS Storage

Continuing to buy new storage devices just because data volumes increase is simply unsustainable

Inactive data not only occupies expensive storage space, but is backed up and migrated onto new storage for years, an illogical state of affairs that goes largely ignored. Rather than address this clear and undisputed problem, companies persist in buying traditional RAID and NAS systems to store cold data, sustaining high costs and complexity and kicking the problem down the road.

By migrating static, rarely used data to lower-cost object storage, businesses can optimize the use of their primary disks.

  • 80% of data on primary disks is never accessed
  • 10% of data is accessed only once
  • 10% of data is accessed < 5 times
  • 3% of data is accessed > 5 times

Object vs File

Traditional file systems and storage systems were created in another technological era, when the volume of data to be managed, stored and accessed was in no way comparable to today's needs. Traditional file systems store data in hierarchical structures of nested folders, subfolders and files. This creates highly complex tree structures that make it very difficult to manage large sets of files, and as the amount of data grows, the performance of the file system and the storage beneath it degrades dramatically. Standard file systems also use only basic metadata to describe a file, such as its name, creation date, last modification date and type. This limited metadata does not provide enough information to efficiently handle large volumes of unstructured data in the underlying storage infrastructure.

Object Storage, by contrast, stores files as objects with a rich set of metadata attached, giving applications both the content and its context. Unlike conventional storage systems, Object Storage is not accessed directly through the operating system but through standard Internet protocols such as HTTP, using interfaces like Amazon S3, OpenStack Swift, CDMI and other REST or SOAP APIs.

File System

  • Millions of Files

  • Amendable Data

  • Locking Mechanisms

  • File System Hierarchy

  • Complex to Scale

  • TCO Increases Exponentially

Object Storage

  • Hundreds to Billions of Objects

  • Immutable Data

  • No Locking Mechanisms

  • One Storage Pool, Object ID

  • Scales Uniformly

  • Lowest TCO

An Object Storage analogy

To illustrate how object storage works compared to traditional SAN and NAS systems, a useful analogy is valet parking. When you hand your car over to a valet service, you don't care whether the valet parks it on the fifth floor or in the basement; you are only interested in getting it back when you present your ticket. In the same way, objects are addressed in the system using a key, which enables immediate access and identification without knowing their exact location among the storage nodes. Object Storage uses a single global namespace that allows any object to be stored or retrieved from anywhere in the distributed environment via a unique identifier. This allows object storage systems to scale linearly as the environment grows, simply by adding new nodes. Web-scale applications can thus access data without traversing file-system trees, using only the identifier of the information (the object). This is one of the many important features distinguishing Object Storage technology from traditional NAS and SAN storage.
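
As a minimal sketch of this key-based addressing (a toy model, not any real product's interface), note that keys may look like paths, but the namespace underneath is completely flat:

    # A toy object store: one flat index, no directory tree.
    # The key returned by put() works like the valet's ticket.
    class ToyObjectStore:
        def __init__(self):
            self._objects = {}                     # flat namespace: key -> object

        def put(self, key, data, metadata=None):
            self._objects[key] = (data, metadata or {})
            return key                             # the "ticket"

        def get(self, key):
            return self._objects[key]              # one lookup, no tree traversal

    store = ToyObjectStore()
    ticket = store.put("videos/2017/clip-001.mp4", b"...bytes...",
                       {"owner": "demo", "type": "video"})
    data, meta = store.get(ticket)                 # retrieved by key alone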

Data and Metadata

The importance of Metadata

Objects contain both data (information) and data descriptors called metadata. Objects are addressed using unique ID numbers managed in a flat index, which means they can be found and identified without knowing their exact location in the system. Metadata descriptors are defined by the application when the object is created, and additional information can be attached to a particular object group or individual object at any time. Because applications identify objects via metadata, programs can be designed specifically to manage information using a particular metadata scheme.

Object

The object is the basic unit of information storage. Its metadata and unique name are assigned when the object is created. Objects may be of any size; a complete movie, for instance, can be stored as a single object.
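
For example, with an S3-compatible API an application might create an object and its metadata together in a single call. This is a sketch only; the endpoint, bucket and metadata fields below are made-up examples:

    import boto3

    # Point the client at any S3-compatible object store (placeholder URL).
    s3 = boto3.client("s3", endpoint_url="https://objectstore.example.com")

    # The object's data, unique name and metadata are written together.
    with open("feature-film.mp4", "rb") as movie:
        s3.put_object(
            Bucket="media-archive",
            Key="films/2017/feature-film.mp4",     # the object's unique name
            Body=movie,                            # the data, whatever its size
            Metadata={                             # descriptors chosen by the app
                "title": "Feature Film",
                "codec": "h264",
                "runtime-minutes": "112",
            },
        )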

Great flexibility in searching

Object storage solutions provide an ideal way to handle PDF, XLS, PPT, XML and JPEG files, audio and image content, and video production and post-production files for healthcare, scientific research, big data analytics, IoT, or any sector where information can be highly sophisticated and can vary over time. Thanks to this rich set of metadata, object storage offers unmatched search flexibility compared with traditional storage systems.

It is also possible to define metadata-driven retention and deletion policies for scheduled storage management, and these properties can also be useful for regulatory compliance.
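
With an S3-compatible API, for instance, such a deletion schedule can be expressed as a bucket lifecycle rule. The bucket name, prefix and ten-year period below are illustrative assumptions:

    import boto3

    s3 = boto3.client("s3", endpoint_url="https://objectstore.example.com")

    # Automatically delete objects under a prefix ~10 years after creation.
    s3.put_bucket_lifecycle_configuration(
        Bucket="medical-archive",
        LifecycleConfiguration={
            "Rules": [{
                "ID": "expire-closed-cases",
                "Filter": {"Prefix": "closed-cases/"},
                "Status": "Enabled",
                "Expiration": {"Days": 3650},      # the deletion schedule
            }]
        },
    )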

As we have seen, metadata descriptors are defined by the application. For this reason, object storage is an extremely valuable storage model, allowing organizations to gain dependable security benefits, manage data growth and exploit a range of new opportunities.

For example, in a healthcare setting, patient information must be archived for years if not decades, yet may need to be accessed in a matter of seconds. Thanks to a rich set of metadata, medical reports can be accessed through different keys, as shown in the X-ray image example to the right.
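
As a sketch of such a metadata-driven lookup over an S3-compatible API (all names are hypothetical, and a large archive would normally index metadata in a search engine rather than scanning it like this):

    import boto3

    s3 = boto3.client("s3", endpoint_url="https://objectstore.example.com")

    def find_reports(bucket, prefix, wanted):
        """Yield keys of objects whose custom metadata matches all wanted fields."""
        paginator = s3.get_paginator("list_objects_v2")
        for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
            for obj in page.get("Contents", []):
                meta = s3.head_object(Bucket=bucket, Key=obj["Key"])["Metadata"]
                if all(meta.get(k) == v for k, v in wanted.items()):
                    yield obj["Key"]

    # The same X-ray can be reached through different metadata keys.
    for key in find_reports("medical-archive", "xray/",
                            {"patient-id": "0042", "modality": "x-ray"}):
        print(key)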

[Figure: X-ray image example, stored as a File vs as an Object]

Scalability

Object Storage Scale-Out architecture

From a hardware point of view, Object Storage consists of multiple nodes, each of which is a standard commodity x86 server with its own storage capacity and I/O (input/output) bandwidth. Capacity is increased by adding new nodes, which are linked together through a high-speed internal network. The entire cluster of nodes is managed by the administrator as a single storage system. Because each node embeds its own compute, capacity and bandwidth, the system can grow both capacity and performance linearly.
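
One widely used technique for spreading objects across such a cluster is consistent hashing, sketched below in Python. This illustrates the general approach, not any specific vendor's implementation; adding a node re-maps only a small fraction of the existing objects:

    import bisect
    import hashlib

    # A minimal consistent-hash ring mapping object keys to cluster nodes.
    # Adding a node moves only ~1/N of the keys, so the cluster can grow
    # without mass data migration.
    class HashRing:
        def __init__(self, nodes, vnodes=100):
            self._ring = []                          # sorted (hash, node) points
            for node in nodes:
                for i in range(vnodes):              # virtual nodes smooth the load
                    bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

        @staticmethod
        def _hash(value):
            return int(hashlib.md5(value.encode()).hexdigest(), 16)

        def node_for(self, key):
            idx = bisect.bisect(self._ring, (self._hash(key), "")) % len(self._ring)
            return self._ring[idx][1]                # first node clockwise from key

    ring = HashRing(["node-1", "node-2", "node-3"])
    print(ring.node_for("films/2017/feature-film.mp4"))   # e.g. node-2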

Object Storage

  • Performance, Capacity, Both
  • True linear predictability
  • Massive single system scale
  • Performance increases as you grow capacity
  • New performance and capacity resources available in minutes

Traditional Storage 

  • Capacity only
  • Degradation of performance & capacity at scale
  • Limited performance scalability
  • Creates islands of storage
  • Forklift upgrades

Object Storage data protection vs traditional RAID

Object Storage data protection offers much higher resilience and capacity efficiency than the traditional RAID approach. It is certainly true that RAID technology has offered ever-higher performance as it has become more sophisticated. But at the end of the day, even the most sophisticated RAID architecture has to rely on hard disk performance. RAID technology was born in an era when disks had relatively little capacity, on the order of a few megabytes. Over the years, capacity has increased dramatically to reach the present 12 TB, while HDD speed has failed to keep pace. As a result, RAID technology is generally inadequate for modern enterprise data protection and recovery needs.

RAID-based storage is an outdated model

RAID systems are still a fundamental part of storage infrastructures, but they were designed many years ago, when hard disk capacity was on the order of a few megabytes. Today, RAID technology is at the end of its life, as a simple calculation shows.
The rebuild time after a disk failure in a RAID system can never be less than the size of the disk divided by its sequential write speed. For example, for a 1 TB, 7,200-RPM disk with a sequential write transfer rate of 115 MBps, we get: 1,000,000 MB ÷ 115 MBps ≈ 8,700 seconds.
In practice, that means over 2.4 hours!

For a 4 TB disk, the rebuild time would be at least 10 hours!
For a 12 TB disk, the rebuild time would be at least 30 hours!
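
The same lower-bound arithmetic can be checked in a few lines of Python:

    # Lower bound on RAID rebuild time: disk capacity / sequential write speed.
    WRITE_MBPS = 115                                      # 7,200-RPM sequential write rate

    for capacity_tb in (1, 4, 12):
        seconds = capacity_tb * 1_000_000 / WRITE_MBPS    # 1 TB = 1,000,000 MB
        print(f"{capacity_tb:>2} TB: at least {seconds / 3600:.1f} hours to rebuild")

    #  1 TB: at least 2.4 hours to rebuild
    #  4 TB: at least 9.7 hours to rebuild
    # 12 TB: at least 29.0 hours to rebuild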

Not to mention the degradation of the entire RAID system during reconstruction. Storage professionals are well aware of this problem (better described as a nightmare).

Object Storage doesn't need to rebuild failed hard drives to recover data, as traditional RAID systems do, so disk failures in object storage nodes do not impact overall system performance. This makes Object Storage extremely valuable to IT departments, as it can dramatically reduce management time through ease of scalability and provide tremendous flexibility as well as data protection.

Security

Object Storage data protection methods

There are several techniques used to protect data within object storage, and often the simplest methods are the best. The most secure way to protect data is to make several copies of it, and this is exactly what object storage does. Object Storage makes multiple copies of objects on different nodes, and the number of copies (the replication factor) can be defined during the initial configuration. If an object is missing because a disk or an entire node has failed, the configured number of copies is re-established by generating another copy of the data on the remaining nodes, resulting in extremely high durability and availability of data.

The advantage of replication is that it is a lightweight process: it doesn't require complex calculations, unlike techniques such as Erasure Coding (EC). Erasure coding uses an algorithm that consumes a large number of CPU cycles, a cost incurred again with every access to data that needs to be reconstituted, since erasure-coded data is parsed and stored as chunks spread across the nodes. This process can lead to considerable delays, especially in WAN or cloud implementations. The downside of replication is that each redundant copy consumes additional storage capacity. To reduce that overhead, object storage can instead rely on erasure coding, a data protection technique that splits data into small pieces called chunks and distributes them across the nodes together with extra parity data. Thanks to this redundancy, the system needs only a pre-defined subset of the chunks from the dispersed storage nodes to fully reconstruct the original data.

As the cost of disks has fallen considerably, pure object replication within the nodes remains the best way to sustain performance. Some vendors, including QStar, also use data compression to optimize the capacity consumed by the multiple protection copies. As we have seen, using erasure coding to rebuild the original file requires a large number of CPU cycles to read and reassemble the chunks. This means erasure-coding nodes must be powerful servers, whereas replication-based nodes can normally be commodity servers.
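
To make the trade-off concrete, here is a toy Python sketch of both schemes. Simple XOR parity stands in for real erasure codes such as Reed-Solomon, and all values are illustrative; production systems are far more sophisticated:

    from functools import reduce

    def replicate(data: bytes, factor: int) -> list:
        """Replication: keep `factor` full copies; any single copy restores the data."""
        return [data] * factor

    def xor_parity_chunks(data: bytes, k: int) -> list:
        """Toy erasure code: split data into k chunks plus one XOR parity chunk.
        Any k of the k+1 chunks are enough to rebuild the original."""
        size = -(-len(data) // k)                                # ceiling division
        chunks = [data[i*size:(i+1)*size].ljust(size, b"\0") for i in range(k)]
        parity = bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*chunks))
        return chunks + [parity]

    # Replication factor 3 costs 3x capacity but no math on recovery;
    # 4 chunks + 1 parity cost 1.25x capacity but need CPU work to rebuild.
    copies = replicate(b"object data", 3)
    chunks = xor_parity_chunks(b"object data", 4)

    lost = chunks[1]                                             # simulate a failed node
    survivors = chunks[:1] + chunks[2:]                          # any 4 of the 5 pieces
    rebuilt = bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*survivors))
    assert rebuilt == lost                                       # chunk fully recovered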

No storage lock-in with Internet Protocols

Object Storage & APIs

With Object Storage technology, customers' storage investments are safeguarded over time. The abstraction of storage resources allows access through standard Internet protocols like HTTP, or through an application programming interface (API) such as S3, without needing to know where objects are stored or where they will be stored.

By adopting this model, companies are no longer locked into a specific operating system or hardware/storage architecture, unlike with traditional block storage systems (FC), where a missed OS upgrade or discontinued support can lead to serious compatibility-matrix problems in future data migrations or storage consolidations.
This doesn't apply to Object Storage: because access is through standard Internet protocols or APIs, companies' investments are protected from obsolete or outdated operating systems, FC storage systems, FC switches, servers and applications.

Object Storage offers flexible and scalable data storage for most applications and services developed with modern techniques and tools. The rise of Object Storage is partly due to how simple it is for software developers to interact with this storage technology.
Because the API consists of standard HTTP requests, libraries were quickly developed for most programming languages. Saving a chunk of data is as easy as an HTTP PUT request to the object store, and retrieving an object and its metadata is an ordinary GET request.
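
A minimal sketch of that request flow in Python (the endpoint is a placeholder, and the authentication headers a real service requires are omitted for brevity):

    import requests

    BASE = "https://objectstore.example.com/media-archive"    # placeholder endpoint

    # Saving a chunk of data is an HTTP PUT to the object's key...
    requests.put(f"{BASE}/notes/hello.txt",
                 data=b"hello object storage",
                 headers={"x-amz-meta-author": "demo"})       # metadata travels as headers

    # ...and retrieving the object with its metadata is a normal GET.
    resp = requests.get(f"{BASE}/notes/hello.txt")
    print(resp.content)                                       # the object's data
    print(resp.headers.get("x-amz-meta-author"))              # its attached metadata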

The Future of Object Storage

Object Storage is capable of efficiently meeting the needs of many different use cases across multiple IT environments. As such, it is no surprise that numerous businesses are opting for Object Storage technology. Storage systems for massive data archiving are no longer limited to block and file storage, which entail high costs, investment lock-in and very expensive forklift upgrades. Object Storage is the emerging technology for large content repositories, offering a highly scalable and cost-effective storage platform to support a wide variety of businesses and industries.

Related Solutions

The QStar Cloud Gateway solution integrates seamlessly into existing infrastructures without changes for either applications or users. Migrated or moved data continues to appear local even when stored in the Cloud. The simple data read/write approach is compatible with virtually all existing applications. QStar Cloud Gateway can archive unlimited amounts of information securely and cost-effectively in any Cloud, whether private, public or hybrid, such as Google, Amazon, Azure, IBM Bluemix and many others. QStar Cloud Gateway is designed to combat the relentless explosion of data, providing transparent and automatic migration of static files from local infrastructure to the Cloud, using attributes such as date created, modified or accessed, file owner, size and extension. It restores data to its original location in real time, completely transparently to applications, the network and users.

The QStar LTFS as NAS architecture virtualizes a Tape Library, effectively converting it into network-attached storage (NAS) that can be shared by multiple users and applications. The solution supports common networking protocols (SMB and NFS) plus S3-compatible API commands, and is installed on either a Windows or Linux server. Files stored in an LTFS as NAS environment are retrieved in the same manner as in the native operating system, even though the data is actually stored in a Tape Library. Users do not realize that the volume (file system) they are accessing was created on tape rather than disk; moreover, through a sophisticated cache architecture, read/write activity is managed so effectively that performance is comparable to a NAS device. Transparency is such that the architecture is also supported in virtual machine (VM) environments, even though VMs are not designed to support tape drives: existing applications installed in a VM can access the Tape as NAS architecture just like a standard NAS disk, and data can be accessed transparently over the network.