Will cloud-based machine learning services take over analytics?
Machine learning and “data science” have been popular topics in tech circles for years now, but their usage in more traditional industries hasn’t kept up with the buzz. Companies have largely recognized the potential of data-driven decision making, yet widespread adoption is still on the way. One common explanation for this is the lack of expertise in the area. In a much-cited report, McKinsey & Company predicted that by 2018 the US market alone would have a six-figure shortage of people with analytical expertise. Amazon Web Services (AWS), Google and Microsoft, along with many smaller companies, have bet heavily on this explanation with their cloud-based machine learning services.
Google opened the game back in 2012 with Prediction API, Microsoft followed suit with Azure Machine Learning, and last year AWS Machine Learning was launched. The important questions on the table are who these services are for and which one, if any, is the right one for me. What all these services promise is “machine learning for everybody”. Just like with dietary supplements, what sounds too good to be true usually is. After all, I wouldn’t want a “data scientist” to build my house even with the latest tools. Yet unlike with most dietary supplements, there is some truth to the claim. These services are easy to set up and to integrate into an existing cloud infrastructure. However, what is gained in simplicity is lost in flexibility. Before talking more about this tradeoff, let’s have a quick look at what AWS and Azure can do. I haven’t used the Prediction API, so I consider it unfair to talk about its capabilities.
Azure vs AWS
Even though Azure and AWS make similar promises about their offerings, the actual products differ vastly. AWS has opted for a “black box” that takes in the data, hums for a while and pushes out the results, while Azure’s drag-and-drop solution gives the user a lot more control. Whether this is good or bad depends on your point of view. AWS offers far fewer algorithms. Currently AWS Machine Learning can only do linear and logistic regression, while Azure also offers, for example, decision trees, random forests, neural networks, K-means clustering and support vector machines. Azure also allows embedded R and Python scripts, which extend the possibilities even further. In short, AWS offers a few canned algorithms and does everything from data normalization to result visualization for you, while Azure offers more control and more options.
On the practical side, both services allow the use of external data, but both work best with data stored within their own cloud environment. Both allow batch as well as streaming analytics. Pricing is as follows. AWS charges $0.42 per compute hour for data analysis and model training, plus $0.10 per thousand batch predictions and $1 per thousand real-time predictions. Pricing for Azure is a bit more complicated: $9.99 per month per user, $1 per experiment hour in the studio, $2 per production API compute hour and $0.50 per thousand API predictions.
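To make the two price lists easier to compare, here is a minimal sketch that turns the figures quoted above into a monthly estimate. The function names and the usage numbers in the example are hypothetical and chosen only for illustration. The prices are the ones stated in this article and may no longer match what the providers charge.

```python
# Prices as quoted in the article, in US dollars. They are assumptions
# for this sketch and may not match current provider pricing.
AWS_COMPUTE_HOUR = 0.42          # data analysis and model training
AWS_PER_1000_BATCH = 0.10        # batch predictions
AWS_PER_1000_REALTIME = 1.00     # real-time predictions

AZURE_SEAT_PER_MONTH = 9.99      # per user
AZURE_EXPERIMENT_HOUR = 1.00     # experiment hour in the studio
AZURE_API_COMPUTE_HOUR = 2.00    # production API compute hour
AZURE_PER_1000_PREDICTIONS = 0.50


def aws_monthly_cost(compute_hours, batch_predictions, realtime_predictions):
    """Estimate a monthly AWS Machine Learning bill from usage figures."""
    return (
        compute_hours * AWS_COMPUTE_HOUR
        + batch_predictions / 1000 * AWS_PER_1000_BATCH
        + realtime_predictions / 1000 * AWS_PER_1000_REALTIME
    )


def azure_monthly_cost(users, experiment_hours, api_compute_hours, api_predictions):
    """Estimate a monthly Azure Machine Learning bill from usage figures."""
    return (
        users * AZURE_SEAT_PER_MONTH
        + experiment_hours * AZURE_EXPERIMENT_HOUR
        + api_compute_hours * AZURE_API_COMPUTE_HOUR
        + api_predictions / 1000 * AZURE_PER_1000_PREDICTIONS
    )


if __name__ == "__main__":
    # Hypothetical workload: a small team scoring about 120,000 records a month.
    aws = aws_monthly_cost(compute_hours=10,
                           batch_predictions=100_000,
                           realtime_predictions=20_000)
    azure = azure_monthly_cost(users=1,
                               experiment_hours=10,
                               api_compute_hours=5,
                               api_predictions=120_000)
    print(f"AWS estimate:   ${aws:.2f}")
    print(f"Azure estimate: ${azure:.2f}")
```

The point of the sketch is the shape of the two bills rather than the exact totals: AWS scales almost entirely with the number of predictions, while Azure adds a fixed per-user fee and separate charges for experimenting and for serving.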
One of the great advantages of machine learning services is that they integrate really well into the cloud infrastructure. Hence the choice of a machine learning service provider is really a choice of cloud infrastructure provider. For companies already operating in the cloud, the choice is between their current provider’s service, a custom solution and the status quo. Comparing a custom solution against the machine learning service looks very different for AWS and Azure users. AWS is good at what it does, but the scope is very limited. Thus, if you’re only doing simple classification and/or regression and want it easily integrated with, for example, Redshift, AWS Machine Learning is a good choice. Azure is in a very different position. It is aimed at customers who probably have the skills to produce customized solutions but choose to use Azure instead.
So “machine learning for everybody” is kind of true for AWS. The “black box” really enables anyone to do the analysis, but there is also a downside to this. It easily leads to the so-called “danger zone”, where a person with subject matter expertise gets a tool to do machine learning without truly understanding what goes on under the hood. This is also somewhat true for Azure, even though it requires at least an understanding of how data flows from one analytical phase to another.
Machine learning services can do a lot for you. They provide “battle-tested” algorithms from the data science teams of the world’s biggest tech companies and make it relatively easy to apply them in practice. However, what they don’t do is experiment design or data wrangling. Even the most efficient algorithms are useless if they are not suitable for the problem at hand, if the data is poor or not used properly, or if the question is not posed from the right angle.
At the end of the day, the most interesting question is whether the cloud-based services will be able to popularize the use of machine learning. Even though there probably will not be large changes at the far ends of the spectrum, I expect to see movement at the margin. Some companies that would otherwise have been reluctant to invest in hiring analytical know-how will find it easier and more affordable to apply advanced analytical tools. The effects of AWS and Azure will likely be different. New AWS Machine Learning customers are more likely to come from the extensive margin and new Azure Machine Learning customers from the intensive margin. In other words, new AWS Machine Learning customers were more likely deciding whether to use machine learning at all, while new Azure customers were more likely weighing the service against a customized solution. It will also be interesting to see whether a significantly better machine learning offering will help Microsoft catch up with AWS.