Big Data Cloud

By John Burke
On Dec 02, 2013

Big data and cloud platforms seem like a perfect match. Cloud-born data streams such as social media data are targets for analysis, and big data resource needs tend either to balloon steadily (storage) or to fluctuate widely (compute), and cloud solutions can cope with both. The reality, though, is that the enterprise is not there yet. Although more than half (53.4%) of the companies we spoke to for our 2013-2014 enterprise technology benchmark say they have a big data initiative under way, none of them has moved any production big data work into the cloud. About 40% are evaluating cloud solutions of one sort or another.

Meanwhile, Amazon (with Kinesis and Elastic MapReduce), Google (with BigQuery), and startups such as XPlenty and Qubole are rapidly staking out claims to various big data cloud services. So why the reluctance to adopt?

One big sticking point right now is the cost of storage. IT shops that run their data centers well already have very low storage costs. On the demand side, they face open-ended commitments to data growth: there is little planning yet for the end of life of data, and so very little commitment to deleting anything. On the supply side, per-TB prices keep falling even as densities increase and performance improves. Against that backdrop, in-house storage looks cheap. These shops also often have empty rack space, as they continue to virtualize and consolidate servers and as some of the application-hosting burden shifts to the cloud in the form of SaaS.

Other sticking points include the usual concerns about control and security. Many IT shops and risk managers are still unwilling to let data they care about live in the cloud, so any non-cloud-native data critical to an analysis doesn't get to go live out there in a Hadoop cluster in the cloud.

What is going to propel enterprises into cloud-based big data? A few things. Declining prices for cloud storage will help, and they are coming. (Running out of space for more storage in in-house data centers will also help.) Greater transparency about security and auditing on the part of cloud providers will help too, and that is also coming. Add to that the continuing movement of enterprise apps into the cloud and the increasing federation of data across enterprises, and you see a world emerging where even a lot of "inside" data will be cloud-native, so keeping it in the cloud for analytics will be the more logical choice. Lines of business that have money to spend on IT services, and that are willing to go around IT when they see it as an impediment to progress, will also push cloud solutions increasingly to the fore. Lastly, add in the limits on growth of IT staffs and the scarcity of skilled big data staff, and you reach the point where "as a service" delivery of big data systems at any level becomes attractive, from Hadoop clusters as a service all the way up to no-programming, drag-and-click analytics and visualization tools.

I am confident that when we ask again in the coming year about the use of cloud services for big data, we will find that early adopters have begun to use them in production.
