Frequently Asked Questions About Cloud Archive Storage and Active Archive Solutions

  • Question – What is Hybrid Archive Storage?

Answer – Hybridization is used across many industries to combine the best features of different approaches and eliminate the weakest. Storage technologies also have good and bad points, so combining them allows organizations to get the “best of both worlds”. Gateway software with replication (such as QStar Archive Replicator) allows content to be written synchronously to multiple storage systems, some local and some remote, which protects the data and also reduces the cost of retrieving it. Many Cloud Storage vendors charge for data retrieval, so serving reads from local storage (Object Storage, Tape Libraries, NAS disk, or a combination) instead of the Cloud reduces the overall monthly cost of archive storage while adding further levels of data protection.

  • Question – What is the 3-2-1 Archive and Data Protection Best Practice?

Answer – The 3-2-1 Archive and Data Protection Best Practice was formulated from decades of experience with many different archive technologies, many of which no longer exist. Using proprietary technology or formats can lead to expensive migration programs when (not if) they become obsolete. Storing three copies of all archive data on two different technologies, with one copy offsite (preferably offline), protects against eventualities such as fire, flood, earthquake, ransomware, viruses, malicious action and obsolescence.
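
The rule can be expressed as a simple check. The sketch below is illustrative only: the `Copy` record, its field names and the technology labels are assumptions made for this example, not part of any product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    technology: str        # illustrative label, e.g. "disk", "tape", "cloud"
    offsite: bool          # True if held away from the primary data center
    offline: bool = False  # True if air-gapped (preferred for the offsite copy)


def satisfies_3_2_1(copies):
    """Return True if there are 3+ copies, on 2+ technologies, with 1+ offsite."""
    enough_copies = len(copies) >= 3
    enough_technologies = len({c.technology for c in copies}) >= 2
    has_offsite_copy = any(c.offsite for c in copies)
    return enough_copies and enough_technologies and has_offsite_copy


archive_copies = [
    Copy("disk", offsite=False),
    Copy("tape", offsite=False),
    Copy("tape", offsite=True, offline=True),
]
compliant = satisfies_3_2_1(archive_copies)
```

Here `compliant` is true because the three copies span two technologies and one of them is offsite. Removing the offsite tape copy would make the check fail.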

  • Question – What is a Cloud Archive Gateway?

Answer – A Cloud Archive Gateway is an appliance, or software running on a server, that facilitates the movement of data from on-premises (local) storage to a private or public cloud (such as Amazon, Azure or Google). Content can be moved or copied to a “landing zone” using standard file protocols (SMB or NFS). The data is then stored in the Cloud, and only a reference is retained in the Cloud Archive Gateway so that the data can be retrieved when required. Users benefit by storing data offsite from their data center, which gives additional levels of data protection.

  • Question – What is Tiered Storage?

Answer – Data has a hierarchy of usefulness. Typically, data is created and perhaps edited repeatedly, followed by numerous reads and perhaps re-edits. During this phase the fastest storage (Flash) is ideal. As data ages, the number of read or edit requests diminishes, so the data can be relocated to lower-cost, hard disk-based storage. Finally, data that is no longer accessed but still potentially has value is archived to tape or Cloud storage.

  • Question – What is a Tiered Data Storage infrastructure?

Answer – Tiered Data Storage employs a policy-based data migrator to relocate data to the most appropriate storage tier, based on performance, security and cost. When content is first created it is typically best kept on the fastest storage tier available. After it becomes less active, it can be moved automatically, based on pre-set policies, to a slower, more cost-effective storage tier or perhaps to an archive environment such as a tape library or Cloud.
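
As a minimal sketch of such a pre-set policy, the example below picks a tier from the time since a file was last accessed. The thresholds (30 and 180 days) and the tier names are hypothetical values chosen for illustration. Real policies may also weigh security and cost.

```python
from datetime import datetime, timedelta

# Hypothetical policy: (maximum idle days, tier name), fastest tier first.
POLICY = [
    (30, "flash"),
    (180, "disk"),
]
ARCHIVE_TIER = "archive"  # e.g. a tape library or Cloud


def choose_tier(last_access, now):
    """Return the tier for data last accessed at `last_access`."""
    idle_days = (now - last_access).days
    for max_idle_days, tier in POLICY:
        if idle_days <= max_idle_days:
            return tier
    return ARCHIVE_TIER


now = datetime(2024, 1, 1)
recent_tier = choose_tier(now - timedelta(days=3), now)    # "flash"
cooling_tier = choose_tier(now - timedelta(days=90), now)  # "disk"
cold_tier = choose_tier(now - timedelta(days=400), now)    # "archive"
```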

  • Question – What is Self-Healing Archive Storage?

Answer – Archive Storage is defined as “Self-Healing” when it automatically makes additional copies of data. If the first copy of the data is unreadable, a second or even third copy is read automatically instead by the intelligent software, which knows where all data is kept (online, offline, local or remote). The number of copies is kept constant, so a failed copy is replaced. In addition, self-healing archive storage will often self-audit periodically by checking the validity of data through file hashes (digests) or checksums.
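
The behavior described above can be sketched as follows. This is an illustration under stated assumptions, not how any particular product works: replicas are modeled as an in-memory dictionary, `None` stands for an unreadable copy, and SHA-256 is an assumed choice of digest.

```python
import hashlib


def digest(data: bytes) -> str:
    """SHA-256 digest used to validate a copy (assumed hash choice)."""
    return hashlib.sha256(data).hexdigest()


def read_with_healing(replicas, expected_digest):
    """Return verified data and repair any missing or corrupt replicas.

    `replicas` maps a location name to bytes, or to None if unreadable.
    """
    good = None
    for data in replicas.values():
        if data is not None and digest(data) == expected_digest:
            good = data
            break
    if good is None:
        raise IOError("no valid copy found in any location")

    # Re-create bad copies so the number of valid copies stays constant.
    for location, data in replicas.items():
        if data is None or digest(data) != expected_digest:
            replicas[location] = good
    return good


original = b"archived content"
expected = digest(original)
replicas = {
    "local-disk": b"archived c0ntent",  # silently corrupted copy
    "tape": original,                   # healthy copy
    "cloud": None,                      # unreadable copy
}
recovered = read_with_healing(replicas, expected)
```

After the call, `recovered` holds the original bytes and all three locations contain a valid copy again. A periodic self-audit would run the same digest comparison over every stored item.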

  • Question – What is Object Storage Software?

Answer – Object Storage software creates a Cloud-like infrastructure that is either local or remote to the primary storage installation. An Object Storage system is designed without a file hierarchy. Content is stored in a flat structure and each “object” is found and retrieved through a unique identifier, much like a coat-check or valet parking ticket. Objects are protected through replication or erasure coding.
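
The coat-check analogy can be shown with a toy, in-memory store. The class and method names are hypothetical, and a random UUID is just one possible choice of unique identifier.

```python
import uuid


class FlatObjectStore:
    """Toy object store: no folders, every object is found by its identifier."""

    def __init__(self):
        self._objects = {}  # identifier -> (data, metadata)

    def put(self, data: bytes, metadata=None) -> str:
        """Store an object and hand back its 'ticket' (unique identifier)."""
        object_id = uuid.uuid4().hex
        self._objects[object_id] = (data, dict(metadata or {}))
        return object_id

    def get(self, object_id: str) -> bytes:
        """Retrieve an object by presenting its ticket."""
        return self._objects[object_id][0]

    def metadata(self, object_id: str) -> dict:
        return dict(self._objects[object_id][1])


store = FlatObjectStore()
ticket = store.put(b"scanned invoice", {"department": "finance"})
retrieved = store.get(ticket)
```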

  • Question – What is Erasure Coding?

Answer – Erasure coding is a method of protecting data in which an object is broken into small pieces (say 16 pieces), such that any set number of these pieces (say 10 pieces) is enough to recreate the whole object. Compared with keeping full replicas, erasure coding reduces the amount of storage capacity required to protect content but increases the processing required to reassemble the pieces into a whole object.
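
The capacity saving can be checked with simple arithmetic. The sketch below uses the 16-piece, 10-required example from the answer and assumes each piece is one tenth of the object's size, as is usual for this kind of scheme. The 100 TB archive size and the three-copy replication baseline are illustrative assumptions.

```python
def erasure_coded_capacity(logical_tb, total_pieces, required_pieces):
    """Raw capacity consumed when each piece is 1/required_pieces of the object."""
    return logical_tb * total_pieces / required_pieces


def replicated_capacity(logical_tb, copies):
    """Raw capacity consumed when whole copies are kept."""
    return logical_tb * copies


logical_tb = 100  # illustrative archive size

ec_raw_tb = erasure_coded_capacity(logical_tb, total_pieces=16, required_pieces=10)
ec_pieces_that_can_be_lost = 16 - 10

rep_raw_tb = replicated_capacity(logical_tb, copies=3)
rep_copies_that_can_be_lost = 3 - 1

print(ec_raw_tb, ec_pieces_that_can_be_lost)    # 160.0 6
print(rep_raw_tb, rep_copies_that_can_be_lost)  # 300 2
```

Under these assumptions the erasure-coded layout uses 160 TB of raw capacity and survives the loss of any 6 pieces, while three full replicas use 300 TB and survive the loss of 2 copies.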

  • Question – What is Long Term Archival Storage?

Answer – Data is increasingly being stored for longer periods, as it is the lifeblood of many organizations that leverage the content again and again for profit. Long-term means content that outlives the original storage platform it was written to. Therefore, data migration from one technology to another must be planned for and facilitated by the storage or storage software. Long-term Archival Storage should be secure and low cost, which leads many organizations to choose tape.

  • Question – What is High Performance Computing Storage?

Answer – High Performance Computing (HPC) is used to solve the world's most difficult problems, such as weather forecasting, DNA sequencing, quantum physics, oil and gas exploration and understanding the universe. Very high-speed calculations must be performed on very large datasets, and the storage used must match this speed. However, often only a small subset is analyzed at one time, so high-performance, high-capacity archive storage is used to hold the data that is not currently being analyzed.

  • Question – What is a Hybrid Storage Architecture?

Answer – Hybrid Storage consists of different storage technologies working together to boost their strengths and reduce their weaknesses. Hybrid Storage often refers to primary storage where super-fast Flash is used in conjunction with slower disk and even Cloud storage to provide higher capacities at lower overall cost of ownership.

  • Question – What is a Hybrid Archive Storage Architecture?

Answer – Archive storage is designed for long-term data protection, often using replication or data copying. Hybrid archive storage replicates data to different technologies, pairing higher-cost, faster storage (such as hard disk) with lower-cost, more robust storage (such as tape or Cloud). Tape storage can also be paired with Cloud storage, to avoid the often-costly Cloud-to-Cloud migration should an organization decide to switch Cloud providers.

  • Question – What is Active Archive Storage?

Answer – Active Archive Storage is designed to store vast amounts of data (many PBs) securely, while allowing users direct access to their data through standard applications or file system interfaces. Many backup products offer archive functions, but these are almost always “inactive” in that an Administrator must “restore” the data for the user. An active archive protects data without backup, through replication or mirror copies, and should include an offline or remote copy (air-gap) to protect against facility destruction.

  • Question – What is File and S3 Archive Gateway Software?

Answer – Archive Gateway software acts as a mechanism to store data to technologies that are not normally supported directly. Data is written to the gateway using Cloud (S3) or File (SMB or NFS) protocols, and from there it is rewritten to one or more technologies including tape, VTL, optical, NAS, and public or private Cloud. Content is often written through a fast cache, which maintains all metadata associated with the file or object along with any data required to access it in the future.

  • Question – What is a Public, Private and Hybrid Cloud?

Answer – Cloud storage is based on the principle of storing objects rather than files. Private Clouds are most often owned completely by the organization, while Public Clouds are owned by a third party that rents capacity monthly to the organization. To protect and secure content, replication occurs between two instances of a Private Cloud, or a multi-cloud solution stores objects to two different Public Clouds. A Hybrid Cloud stores objects to both a Private and a Public Cloud for the same reason, and possibly to reduce egress fees if data is frequently accessed.

  • Question – What is an LTO Storage Solution/Archive?

Answer – LTO (or Linear Tape-Open) is a tape format that is approaching its 9th generation and started in 2000 with a capacity of 100GB. LTO is an open standard, developed by HPE, IBM and Quantum, that uses half-inch tape media and offers mid to high performance and capacity. Individual tape cartridges (LTO9 has 18TB native capacity) are automated using a tape library, which consists of many media slots and a smaller number of tape drives. The library contains a picker mechanism to move the correct cartridge to a drive for reading or writing. LTO is a sequential medium: writing starts at the beginning of the tape and uses a serpentine method to fill the entire cartridge. Individual files cannot be deleted. Only a complete erase will remove data already written.
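
The capacity figures above allow a rough sizing calculation. The sketch below assumes decimal units (1 PB = 1,000,000 GB), native capacity without compression, and no allowance for format overhead or spare cartridges. The 1 PB archive size is an illustrative value.

```python
LTO1_NATIVE_GB = 100      # first generation, from the text above
LTO9_NATIVE_GB = 18_000   # 18 TB native, from the text above


def cartridges_needed(archive_gb, native_gb_per_cartridge=LTO9_NATIVE_GB):
    """Smallest whole number of cartridges that holds `archive_gb` (ceiling division)."""
    return -(-archive_gb // native_gb_per_cartridge)


one_petabyte_gb = 1_000_000
lto9_cartridges = cartridges_needed(one_petabyte_gb)
capacity_growth = LTO9_NATIVE_GB // LTO1_NATIVE_GB

print(lto9_cartridges)  # 56
print(capacity_growth)  # 180
```

Under these assumptions a single copy of a 1 PB archive fits on 56 LTO9 cartridges, and native capacity per cartridge has grown 180-fold since the first generation.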

  • Question – What is LTFS? (and TDO)

Answer – LTFS (Linear Tape File System) is a standard method of writing data to LTO, allowing transfer of content from one tape system to another. LTFS started with the introduction of LTO5 in 2010/2011 and many archive vendors offer it. Uniquely, each tape maintains a record of all of its content on a small index partition, while the data itself is written to the much larger data partition. TDO (Tape Disk Object) is a proprietary format developed by QStar Technologies and designed for tape library infrastructures. The metadata for the whole data set (many tapes) is written to disk and periodically also to tape, using a single data partition, which significantly improves read performance. Simply put, LTFS works best with stand-alone tape drives and TDO works best with tape libraries.

  • Question – What is a Tape File System?

Answer – A Tape File System allows archived files to be read individually from tape media. Conversely, a Backup product writes a stream of content that requires a restore operation before individual files can be read. A Tape File System can be open (such as LTFS – Linear Tape File System) or proprietary (such as QStar’s TDO – Tape Disk Object), and each offers different advantages. LTFS is typically used for stand-alone tape drives and allows data to be moved from one place to another. Proprietary tape formats are used in a tape library environment where data security and performance are more crucial.

  • Question – What is Policy-based Migration Software?

Answer – Policy-based migration software is used to identify and relocate files that are no longer required on the fastest tier of storage but still have value to the organization. Policies can be created to identify files that have not been accessed or modified for several months, which are defined as “static files”. Once identified, they can be moved or migrated to a new, lower-cost storage tier such as Cloud or a tape library. If a file is moved, it is found on a different drive letter, share or mount. If it is migrated, a stub (reparse point or symbolic link) is created in the original location that redirects requests to the new location. Options can be added to protect files from deletion, either permanently or for a period of time.
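
The identify-then-migrate flow can be sketched with the standard library. This is a simplified illustration, not a description of any product: the function names and the 180-day threshold are hypothetical, the stub is a POSIX symbolic link, and file name collisions in the archive directory are ignored.

```python
import os
import shutil
import tempfile
import time

SECONDS_PER_DAY = 86400


def find_static_files(root, min_idle_days, now=None):
    """List files under `root` not accessed or modified for `min_idle_days`."""
    now = time.time() if now is None else now
    cutoff = now - min_idle_days * SECONDS_PER_DAY
    static = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # already a stub, nothing to migrate
            info = os.stat(path)
            if max(info.st_atime, info.st_mtime) < cutoff:
                static.append(path)
    return static


def migrate_with_stub(path, archive_dir):
    """Move a file to `archive_dir` and leave a symbolic link in its place."""
    os.makedirs(archive_dir, exist_ok=True)
    target = os.path.join(archive_dir, os.path.basename(path))
    shutil.move(path, target)
    os.symlink(target, path)  # the stub that redirects later requests
    return target


# Demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    primary = os.path.join(tmp, "primary")
    archive = os.path.join(tmp, "archive")
    os.makedirs(primary)

    old_file = os.path.join(primary, "report.txt")
    with open(old_file, "w") as handle:
        handle.write("quarterly report")
    a_year_ago = time.time() - 365 * SECONDS_PER_DAY
    os.utime(old_file, (a_year_ago, a_year_ago))  # pretend the file is old

    static_files = find_static_files(primary, min_idle_days=180)
    for static_path in static_files:
        migrate_with_stub(static_path, archive)

    is_stub = os.path.islink(old_file)
    with open(old_file) as handle:
        content_via_stub = handle.read()  # still readable at the original path
```

After migration the original path is a symbolic link, yet opening it still returns the file's content from the archive location, which is the behavior the stub is meant to provide.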