Considering Cloud Object Storage for Instant Recovery

Contributed by Maanas Saran, Product Marketing Manager at Actifio

A common perception among enterprise IT admins is that object storage in public clouds such as AWS, Microsoft Azure, or Google Cloud Platform can only be used for long-term retention and archiving. This perception often stems from the notion that object storage is poorly suited to low-latency operations and therefore cannot serve as a storage endpoint for applications requiring high I/O, or for operations like backups and recoveries.

While this perception holds some ground, the reality is starkly different. Most public cloud providers offer tiered classes of storage that serve different purposes and can be used not just for long-term archiving but also for use cases like daily backups and recoveries.

Before diving into the economics of how object storage can be better than traditional storage media, let’s look at the different tiered storage options that the top public clouds offer:

| Scenarios | Amazon AWS | Microsoft Azure | Google Cloud Platform |
|---|---|---|---|
| Frequently Accessed Operations: Cloud applications, big data analytics, dynamic websites. Can also be used for provisioning clones for DevOps | AWS S3 | Azure Hot Storage Tier | Google Standard |
| Infrequently Accessed Operations: Ideal for backups, recoveries and data store for disaster recovery | AWS S3 IA | Azure Cool Storage Tier | Google Nearline |
| Rarely Accessed Operations: Ideal for archiving | AWS Glacier | Azure Archive Storage Tier | Google Coldline |

Now, a glance at the above table might prompt some to make a case for doing backups and restores to an archive tier (the Rarely Accessed Operations row), since storage costs for that tier are extremely low across all public cloud vendors. However, there are a few reasons why backing up to an archive tier might not be a good option:

1. Greater costs

When your backup data needs to be restored frequently, using Glacier, Coldline, or Azure Archive Tier could increase your recovery costs because of the high transaction costs within the archive tier.

2. Slow performance

Glacier, Google Coldline, and Azure Archive Tier are the slowest-responding storage tiers, which increases recovery times: it typically takes several hours before they can start serving data for a restore request. If SLAs and RTO are important, you should consider using a higher-performing storage class.

3. No ability to instantly mount for test/dev environments

With archive-tier storage used for backups, you lose the ability to instantly restore or mount recovery points to speed up test/dev operations.

So now that we know there are different storage classes for object storage across public clouds and also that instant recovery operations are not suited for the archive tier, let’s look at the four key advantages that the other two tiers (Frequently Accessed and Infrequently Accessed) offer when used for operations such as backups and restores:

1. Lower overall costs

Let’s compare a solution that has to recover from block storage vs. a solution that can recover straight out of object storage.

Assume that 100 TB of data needs to be recovered in AWS.

Solution 1: Uses AWS EBS (block storage) to store backups. Cost of EBS (using HDD volumes) is $45/TB/month. Thus, monthly recurring costs (MRC) = $45 x 100 = $4,500. Yearly costs = $54,000.

Solution 2: Uses AWS S3 IA storage to store backups. Cost of S3 IA is $12.50/TB/month. Thus, MRC = $12.50 x 100 = $1,250 for storage + $600 for API requests = $1,850. Yearly costs = $22,200.
Thus, savings with S3 IA over EBS = ($54,000 - $22,200) / $54,000, or about 59%.
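
The comparison above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the prices and the $600 monthly API charge are the figures quoted in this article, not current AWS list prices, and the function name is illustrative.

```python
def yearly_cost(capacity_tb: float, price_per_tb_month: float,
                extra_monthly: float = 0.0) -> float:
    """Yearly cost in dollars for capacity_tb at a flat monthly price per TB."""
    monthly = capacity_tb * price_per_tb_month + extra_monthly
    return monthly * 12


# Figures taken from the example in the text (100 TB of backup data).
ebs = yearly_cost(100, 45.00)                       # block storage, HDD volumes
s3_ia = yearly_cost(100, 12.50, extra_monthly=600)  # object storage plus API charges
savings_pct = (ebs - s3_ia) / ebs * 100

print(f"EBS:     ${ebs:,.0f} per year")
print(f"S3 IA:   ${s3_ia:,.0f} per year")
print(f"Savings: {savings_pct:.1f}%")
```

Plugging in your own capacity and negotiated prices gives a quick first estimate before a fuller TCO analysis.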

The savings are clearly attractive. So what technology do you need in order to recover instantly, straight out of object storage, without first transferring the data from object to block storage?

Look for a solution that can mount instantly from a point-in-time backup image stored in S3, S3 IA, Google Nearline, or Azure Blob storage. The solution needs to emulate a block device in front of this object storage so that the recovered application can work seamlessly. Moreover, the solution should be able to intelligently cache any writes that happen after the recovery.
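
To make the idea concrete, here is a toy sketch of a block device layered over an object store, with a local cache that absorbs writes made after recovery. It is not how any particular product works: the class name, the key layout, and the use of a plain dictionary as a stand-in for S3 or Blob storage are all assumptions made for illustration.

```python
from typing import Dict

BLOCK_SIZE = 4096  # bytes per block; an arbitrary illustrative choice


class ObjectBackedBlockDevice:
    """Read-only backup image in an object store plus a local write cache."""

    def __init__(self, object_store: Dict[str, bytes], image_id: str) -> None:
        self._store = object_store                # stand-in for S3 / Blob storage
        self._image_id = image_id                 # the point-in-time image to mount
        self._write_cache: Dict[int, bytes] = {}  # block number -> data written after mount

    def _key(self, block_no: int) -> str:
        # Hypothetical key layout: one object per block of the image.
        return f"{self._image_id}/{block_no:08d}"

    def read_block(self, block_no: int) -> bytes:
        # Blocks written after recovery are served from the cache;
        # everything else is fetched from the backup image on demand.
        if block_no in self._write_cache:
            return self._write_cache[block_no]
        return self._store.get(self._key(block_no), bytes(BLOCK_SIZE))

    def write_block(self, block_no: int, data: bytes) -> None:
        if len(data) != BLOCK_SIZE:
            raise ValueError("data must be exactly one block long")
        # The backup image is never modified; new data lives only in the cache.
        self._write_cache[block_no] = data


store = {"img-001/00000000": b"a" * BLOCK_SIZE}
device = ObjectBackedBlockDevice(store, "img-001")
device.write_block(0, b"b" * BLOCK_SIZE)
print(device.read_block(0)[:1], store["img-001/00000000"][:1])
```

Because reads are served on demand and writes never touch the stored image, the application can start immediately and the recovery point stays intact for later use.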

One of the important aspects of recoveries is the ability to rewind to any point in time and recover instantly. Storing full backups in object storage will increase the TCO.

For example: Storing 10 points in time of 10 TB source data as full backups would consume 100 TB of object storage. A more efficient approach is a solution that stores data in object storage in an incremental-forever manner and still delivers instant recovery. Assuming a 3 percent daily change rate (0.3 TB per day), 10 points in time would need only 10 TB + 0.3 TB x 9 = 12.7 TB of object storage. This incremental-forever approach delivers savings of about 87 percent.
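
The storage arithmetic for full versus incremental-forever backups can be sketched as follows. The sketch assumes every daily increment is the same size (source size times change rate) and ignores compression and deduplication; the function names are illustrative. Note that 3 percent of 10 TB is 0.3 TB per day.

```python
def full_backup_storage_tb(source_tb: float, points: int) -> float:
    """Storage needed when every point in time is a full copy."""
    return source_tb * points


def incremental_forever_storage_tb(source_tb: float, daily_change_rate: float,
                                   points: int) -> float:
    """One full copy plus one fixed-size increment per additional point in time."""
    return source_tb + source_tb * daily_change_rate * (points - 1)


full = full_backup_storage_tb(10, 10)
incremental = incremental_forever_storage_tb(10, 0.03, 10)
savings_pct = (full - incremental) / full * 100

print(f"Full backups:        {full:.1f} TB")
print(f"Incremental forever: {incremental:.1f} TB")
print(f"Savings:             {savings_pct:.1f}%")
```

For 10 TB of source data and 10 points in time, this comes to 12.7 TB against 100 TB for full copies.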

2. Lower latency and improved performance

The first byte latency for both S3 and S3 IA is in milliseconds and hence operations such as instant restore, mount and recoveries will have low latency without any significant performance degradation. This means that you can instantly mount and recover, or run your test dev environments from your backups stored in object storage.

For example: AWS S3 and Azure Blob Storage have an average download latency of around 10.6 milliseconds, which is faster than the average read latency of 15.78 milliseconds for HDDs. (Download latency for object storage is the counterpart of read latency on a physical HDD.)

References: Cloud Storage Latency; HDD Storage Latency

3. Better throughput

The traditional view of object storage is that it can’t deliver high throughput, and is therefore a poor fit for scenarios like backups and recoveries that demand it.

However, the public cloud vendors have worked extensively on increasing the throughput of the object storage tier. Some tests have found AWS S3 download throughput to be 732 Mbps (about 91 MB/s). (Reference)
High throughput on the object storage tier can lead to faster backups and recoveries.

For example: Protecting 10 TB of on-premises data with a 3 percent daily change rate means around 300 GB of changes written to object storage each day. At an average upload throughput of 24 MB/s for AWS S3, it takes only around three and a half hours to get all the changed data into S3. This also highlights the importance of an incremental-forever approach when backing up to cloud object storage.

As data sizes grow, it should be possible to use multiple threads to write to the cloud object storage, thus keeping the backup window small.
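
A rough backup-window estimate along these lines can be sketched in Python. The 24 MB/s figure is the one quoted above; the assumption that throughput scales linearly with the number of parallel streams is an optimistic simplification, and the function name is illustrative.

```python
def transfer_hours(data_gb: float, throughput_mb_s: float, streams: int = 1) -> float:
    """Estimated hours to move data_gb at throughput_mb_s per stream.

    Uses decimal units (1 GB = 1000 MB) and assumes, optimistically,
    that aggregate throughput scales linearly with the stream count.
    """
    if throughput_mb_s <= 0 or streams < 1:
        raise ValueError("throughput must be positive and streams at least 1")
    seconds = (data_gb * 1000) / (throughput_mb_s * streams)
    return seconds / 3600


daily_change_gb = 10_000 * 0.03  # 3 percent of 10 TB is 300 GB

for n in (1, 2, 4):
    print(f"{n} stream(s): {transfer_hours(daily_change_gb, 24, n):.2f} hours")
```

In practice, network bandwidth and per-request overhead cap the benefit of extra streams, so measured throughput should replace these estimates wherever it is available.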

4. Ability to instantly recover and mount recovery points for DevOps

DevOps can benefit by instantly mounting from object storage to deliver rapid and space efficient data access. Significant cost savings can be gained by leveraging thin clones directly from object storage.

Take a 10 TB MS SQL database backup stored in S3 IA as an example. You can create 10 thin clones of it without consuming any extra storage. With thick clones, you would have needed 10 x 10 = 100 TB of storage. To achieve such storage efficiencies, look for a solution that can provision instant thin clones from any point-in-time backup image stored in object storage.
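
The clone arithmetic can be sketched the same way. The `written_tb_per_clone` parameter is an assumption added here for illustration: a thin clone shares the backup image and only needs space for data written to it after it is created, so its footprint starts at zero and grows with use.

```python
def thick_clone_storage_tb(source_tb: float, clones: int) -> float:
    """Each thick clone is a full, independent copy of the source."""
    return source_tb * clones


def thin_clone_storage_tb(clones: int, written_tb_per_clone: float = 0.0) -> float:
    """Thin clones share the backup image and store only their own writes."""
    return clones * written_tb_per_clone


print(thick_clone_storage_tb(10, 10))                     # ten full copies of a 10 TB database
print(thin_clone_storage_tb(10))                          # freshly created thin clones
print(thin_clone_storage_tb(10, written_tb_per_clone=0.5))  # after each clone has written 0.5 TB
```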

Cloud object storage has historically been considered a target for archiving data for long-term retention. Object storage with its compelling economics has long been a key piece of an enterprise’s journey to storage efficiency.

But cloud object storage has a lot more to offer than being the traditional storage target for archiving. It’s time to look beyond archiving, to instant recovery of applications for IT and instant thin clones for DevOps.