Monday, December 6, 2010

SSP Failure to Cloud Storage Success - What a Difference a Decade Makes

Storability 10 year Reunion
Years ago, a few buddies and I started one of the first cloud storage providers. Of course, we didn’t call it cloud storage back then, but we merry band of brithers (and sisters), we first-generation Storage Service Providers (1gSSPs), were cloud storage way before the cloud was cool.

All the 1gSSPs – StorageNetworks, ScaleEight, StorageWay, Sanrise, and others – failed. The core problem was and still is that renting raw capacity over the network is a lousy business model.
  • 1gSSPs couldn’t sustainably buy their storage cheaper than their retail customers (although over a beer I can share some great stories of how the early 1gSSP robber barons ‘negotiated’ with the storage vendors during the boom).
  • SSPs couldn’t sustainably offer broad enough management efficiencies to generate profits.
  • SSPs couldn’t overcome a host of logistical and cultural issues (network performance/cost, stigma/liability of releasing core data, etc.).
After the bust, and the 9/11 attacks, the entire business simply collapsed. Some of us – my company, Storability, and others like Arsenal Digital – managed to flip over to providing managed storage services – running NOCs, and doing backups and restores for our customers. It wasn’t a great business, but we survived long enough to eventually be sold off.

For those interested in an unbiased history of the 1gSSP market, there is a thorough and thoughtful analysis from the National Center for Supercomputing Applications (NCSA), University of Illinois at Urbana-Champaign (UIUC) posted here.

Ten years later, things look different, and the same.

A whole host of storage service providers – née cloud storage providers – has arisen, not so much from the ashes of the 1gSSPs, but certainly with their dust in the new CSP DNA. These folks have it a little easier than we did back then, and I think more than a few of them are going to make an honest living this time.

In addition to the obvious improvements in network connectivity, bandwidth, and reliability, I see three critical changes that I believe will mark the difference between the past failure of 1gSSPs and the future success of today’s Cloud Storage Providers – file systems, file virtualization, and file storage gateways.


File Systems

Data used to be stored as long strings of 1’s and 0’s, actually millions and billions of 1’s and 0’s we called megabytes, gigabytes, petabytes, etc.  Back in the 1gSSP days, applications like database management systems untangled those 1’s and 0’s and formed them into useful information like bank account records and social security numbers.

Today, data is still made up of 1's and 0's, but the fastest growing forms of data, from the pictures you upload with your cell phone, to the books you download to your Kindle, come packaged in a convenient format called a file.

Files matter – files have digital labels that convey information about the file package itself.  Raw strings of 1’s and 0’s don’t.  More importantly, files have business and human context - 1's and 0's not so much.

Context matters - with it, we can make decisions about how to treat data. With files, we can look at the metadata (the data about the data contained in the label or header attached to the file itself) and learn who created the file, how old it is, and even gain hints about its actual content (does the file contain a song or a spreadsheet?). With this information, we can make intelligent decisions about where to put the file, how many copies we should make, how often we should back it up for safekeeping, etc.
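To make the idea concrete, here is a minimal Python sketch of a metadata-driven placement decision. The tier names, copy counts, backup schedules, and the one-year age threshold are all illustrative assumptions, not anything a particular CSP offers; the point is only that a file's label gives us something to decide with.

```python
import mimetypes
import os
import time
from dataclasses import dataclass


@dataclass
class FileInfo:
    path: str
    size_bytes: int
    age_days: float
    kind: str  # guessed MIME type, e.g. "audio/mpeg"


def describe(path: str) -> FileInfo:
    """Read the file's 'label': size, age, and a hint about its content."""
    st = os.stat(path)
    kind, _ = mimetypes.guess_type(path)
    return FileInfo(
        path=path,
        size_bytes=st.st_size,
        age_days=(time.time() - st.st_mtime) / 86400,
        kind=kind or "application/octet-stream",
    )


def choose_policy(info: FileInfo) -> dict:
    """Pick a treatment from metadata alone (hypothetical rules)."""
    # Songs, movies, and pictures: cheap storage, no backups.
    if info.kind.startswith(("audio/", "video/", "image/")):
        return {"tier": "cheap", "copies": 1, "backup": "never"}
    # Anything untouched for over a year: archive it.
    if info.age_days > 365:
        return {"tier": "archive", "copies": 2, "backup": "monthly"}
    # Everything else is treated as live business data.
    return {"tier": "primary", "copies": 3, "backup": "nightly"}
```

Calling choose_policy(describe(path)) on a real file returns one of the three treatments. A raw block of 1’s and 0’s offers nothing comparable to inspect, which is why it all ends up on the most expensive treatment.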

With raw megabytes – no context - we have no way of discerning what’s what, so we have to treat the entire string of 1’s and 0’s the same – in most cases that means treating it all as if it’s all vitally important.

1gSSPs got sort of a raw deal trying to build a business storing all that raw data.

  • They had to treat it all the same – backing it all up every night, for instance. 
  • They had to connect it directly to live applications. The banking app needs instant anytime access to the entire database – no telling when you might make an ATM withdrawal – and apps don’t like to wait for data, so the connection has to be very high speed (laws of physics and economics apply here).
  • They had to have it all – because they couldn’t discern one cluster of 1’s and 0’s from another, customers had to trust the SSP with all their raw data.
Files make life easier for today's wannabe Cloud Storage Provider.

  • Customers can decide and control what files go to the cloud
  • CSPs can offer differentiated services for files based on metadata
  • Applications are not as dependent on instant and constant access to files - they've learned to be patient waiting for downloads, just like the rest of us.
  • Files can be uploaded and downloaded between users and CSPs with ease, so variability in persistence and performance of the connection is better tolerated

File Virtualization

So, the ability to decide and control file location is critical for the success of cloud storage, but it's not enough.

If we know that Sally’s MP3 file of Andrea Bocelli’s “Silent Night” is non-business-critical (albeit absolutely amazing and worth downloading today), we can decide to push Sally’s file to a cheap storage device, and not back it up, saving us money and effort. We might even decide to upload Sally's file to a Cloud Service Provider that offers essentially free storage capacity, and really save the company some dough.

BUT…how will Sally know where it is when she goes to download it next Christmastime? Whoops.

Important point - moving files and treating them differently based on metadata is great, but users and applications cannot be expected to keep track of constantly changing file locations. So cloud storage won’t fly as a business model if Sally or her apps need to keep track of what’s where in the cloud. 

Enter file virtualization, a technology which masks the file's physical location.

File virtualization matters – with a virtualized file structure, regardless of where it physically resides, Sally and Sally’s applications are tricked into thinking Sally’s file is on her network drive at G:/Sally/Music/SilentNight.MP3.  She never realizes, and does not need to know, that it’s been moved, thus the Cloud Storage business model becomes viable.
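A rough Python sketch of the idea, assuming nothing more than a lookup table: the class name, the NAS path, and the cloud URL below are all hypothetical, and real file virtualization products do a great deal more (locking, permissions, data movement). The sketch shows only the indirection that keeps Sally's path stable.

```python
class VirtualNamespace:
    """Maps the path users see to wherever the file physically lives."""

    def __init__(self):
        self._locations = {}  # logical path -> physical location

    def place(self, logical_path, physical_location):
        """Register a file under the path users will always use."""
        self._locations[logical_path] = physical_location

    def move(self, logical_path, new_location):
        """Relocate the file; the logical path does not change."""
        if logical_path not in self._locations:
            raise KeyError(logical_path)
        self._locations[logical_path] = new_location

    def resolve(self, logical_path):
        """Return the current physical location for a logical path."""
        return self._locations[logical_path]


ns = VirtualNamespace()
sally = "G:/Sally/Music/SilentNight.MP3"

# Hypothetical locations: first an in-house NAS, later a cloud provider.
ns.place(sally, "nas01:/vol2/sally/SilentNight.MP3")
ns.move(sally, "cloud://example-csp/archive/sally/SilentNight.MP3")

# Sally still asks for the same path next Christmastime.
current = ns.resolve(sally)
```

Only the table changes when the file moves, so neither Sally nor her applications need to be told.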

File Storage Gateways

OK, so now we can decide, move, and eliminate the disruption of moving. So far so good, but the Cloud Storage Business needs one more piece of connective tissue to reach the tipping point.

If all we care about is Sally and her music, the cloud storage business is pretty simple, and in fact a bunch of free or almost-free services abound that do just that. Though I admit it is obviously possible (duh, Facebook), I don’t know how to make money off ‘free’, so I am leaving that model alone.

In order to have a successful enterprise-oriented (paying customer) cloud storage business, CSPs need the rough equivalent of a set-top box they can provide to the customer. Today, most CSPs offer a programmatic interface to upload and download files, which is kludgy at best, and isn’t going to scale in a commercial environment.

  • No customer is going to want to be locked into a single CSP, or be forced to adapt their infrastructure or modify their applications to support one vendor's cloud model. 
  • Latency is an issue - no matter what we do to reduce the performance imperative, we are eventually going to have to accept the logic that some subset of cloud resident files must reside at least temporarily at the customer premises (sort of like the difference between downloading and streaming movies).
File storage gateways matter – with a gateway in place the customer can treat the cloud just like another storage device. Sure, the vast majority of spinning disks are now located at the CSP, but to the customer the CSP (through the File Storage Gateway) appears to be just another NAS box - albeit a cheap one, that never fills up, and never needs to be backed up.
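Here is a minimal Python sketch of what such a gateway does, under stated assumptions: the backend interface (get/put), the in-memory stand-in for a CSP, and the tiny least-recently-used cache are all hypothetical simplifications, not any vendor's design. It illustrates the two bullets above: the customer talks to one interface regardless of which CSP sits behind it, and recently used files stay on premises.

```python
from collections import OrderedDict
from typing import Protocol


class CloudBackend(Protocol):
    """Anything that can store and fetch named blobs (hypothetical interface)."""

    def get(self, name: str) -> bytes: ...
    def put(self, name: str, data: bytes) -> None: ...


class InMemoryBackend:
    """Stand-in for a real CSP so the sketch runs without a network."""

    def __init__(self):
        self.objects = {}
        self.reads = 0  # counts round trips to the "cloud"

    def get(self, name):
        self.reads += 1
        return self.objects[name]

    def put(self, name, data):
        self.objects[name] = data


class FileStorageGateway:
    """Looks like local storage; keeps hot files in a small on-premises cache."""

    def __init__(self, backend: CloudBackend, cache_slots: int = 2):
        self.backend = backend
        self.cache_slots = cache_slots
        self._cache = OrderedDict()  # least recently used entry first

    def write(self, name, data):
        self.backend.put(name, data)  # write through to the cloud
        self._remember(name, data)

    def read(self, name):
        if name in self._cache:  # served locally, no cloud latency
            self._cache.move_to_end(name)
            return self._cache[name]
        data = self.backend.get(name)  # cache miss: fetch from the CSP
        self._remember(name, data)
        return data

    def _remember(self, name, data):
        self._cache[name] = data
        self._cache.move_to_end(name)
        while len(self._cache) > self.cache_slots:
            self._cache.popitem(last=False)  # evict least recently used
```

Because the gateway depends only on get and put, pointing it at a different provider means supplying a different backend object; the customer's applications keep calling read and write as before.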

Up until recently, there have been a few file storage gateway (FSG) startups poking about, which has been useful for vetting and growing the concept.  Fortunately, for commercial CSPs, serious and trusted vendors are now releasing FSGs.

So now, I believe we finally have the necessary infrastructure and technology for Cloud Storage success.  It's now possible to decide what data can, and control what data will, be safely stored in the cloud.  Once separated and moved, it's possible to decide and control how data is treated when it gets to the cloud.  It's now possible to do all this without disrupting users and applications. Moving from one CSP to another is now simple and non-disruptive. The performance and persistence issues that plagued 1gSSPs are under control. Modifications to the files, user behavior, and application intelligence are no longer necessary to achieve the benefits of cloud storage.

To my mind, these three major changes in the storage landscape – massive reliance on file systems, commercialization of file virtualization, and emergence of viable file storage gateways – have now combined to eliminate the barriers and challenges we faced in the 1gSSP days, and together provide the technical and process infrastructure necessary for cloud storage to finally reach its full potential.

With the technical and logistical hurdles out of the way, it will be up to the skill of the players to decide who wins.

All best wishes go out to the next generation of cloud storage entrepreneurs – as we brithers say, ladies and gentlemen, the ice is yours. Good curling!
