What Gartner's Magic Quadrant for Advanced Analytics doesn't tell you
tl;dr: Gartner's Magic Quadrant for advanced analytics doesn't do a good job of helping you decide which analytics tool(s) to acquire. It also misses the point that open source projects are the real leaders in advanced analytics. Vendors that position themselves to complement open source projects rather than compete against them will be the ones to watch.
As a first disclaimer, I'd like to state that I believe the specific tool you use is not the most important thing in what is today called advanced analytics. It is far more important to get the right people in the right roles working towards the right goals. Nevertheless, choosing the right tool(s) helps discovery, development, deployment and recruitment, so the decision shouldn't be treated lightly.
When trying to evaluate the competencies of different vendors, many decision makers turn to Gartner and in particular to Gartner's Magic Quadrant (GMQ). For those who aren't acquainted with the GMQ, it is a square with "ability to execute" on the y-axis and "vision" on the x-axis. The square is divided into four smaller squares named the niche player, challenger, visionary and leader sections. Companies fall into one of the smaller squares based on their scores on "ability to execute" and "vision". The 2016 version can be found here.
Looking backwards is not the way to analyze advanced analytics
These two measures are produced from a combination of analyst reports and customer surveys. This brings me to my first concern: oddly, the analysis of advanced analytics is not advanced at all. I concede that making predictions about the direction of the market is challenging, but they should have included something about the trend. Instead of a video clip, we get a photo.
Fortunately, there have been a few efforts to capture who is moving in and out of fashion. If we measure academic use by Google Scholar hits in 2014 and 2015, we can see that SPSS is still the most popular tool despite tanking 30% in a single year. Another GMQ leader, SAS, did not fare much better with a 25% decrease. At the other end of the spectrum, Python usage grew most rapidly, followed by KNIME, RapidMiner and R. KDnuggets polls data analysts/scientists annually and probably captures the sentiment of more tech-oriented folks. The latest poll asked: "What software you used for Analytics, Data Mining, Data Science, Machine Learning projects in the past 12 months". Most respondents had used either R or Python, while RapidMiner was the most popular commercial tool (it has a free "community" version).
Gartner doesn't disclose how it determines the scores
Secondly, their analysis is a black box. The two dimensions are derived from more granular evaluation criteria (listed here). Gartner does not publish specific scores, in their own words because "scores are based on not just quantitative elements, but qualitative as well, so they are not strictly mathematical calculations". The analysis also utilizes client surveys, which can be a good source of information when used effectively. Surveys can give a good idea of the pros and cons of a specific product. But when survey results are compared against each other, we have a problem. Suppose you ask Volkswagen and Audi drivers whether they are satisfied with their car's acceleration and compare the answers to determine which car runs faster…
All tools do not have to come from a single vendor
Another important consideration is that all analytics capabilities do not have to come from a single vendor. Bob Muenchen raises a great point in his blog: to be included in the 2016 GMQ, a vendor had to have a visual composition framework (VCF). He writes: "What Gartner is saying is, in essence, advanced analytics software that does not use the workflow interface is not worth following!" I wholeheartedly agree with this concern, but I think it goes even deeper.
With a few notable exceptions, most vendors are embracing interconnectivity. There's nothing special about having your data in the Amazon cloud, doing data management with your platform's native solutions, heavier analytics with R, and simple ad hoc discovery with KNIME, while visualizing your results with Tableau. Hence, if a vendor specializes in machine learning and leaves visual discovery to other software, it can't be included. Given Gartner's influence, this means that their role has shifted from observer to market shaper (if it hadn't already, that is).
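To illustrate how loosely coupled such a multi-tool chain can be, a few lines of Python can aggregate results and hand them off as plain CSV for a separate visualization tool to pick up. This is a minimal, hypothetical sketch: the data, column names and hand-off format are illustrative, not taken from any particular product.

```python
import csv
import io

# Hypothetical records, as they might arrive from a data management layer.
sales = [
    {"region": "north", "amount": 120.0},
    {"region": "south", "amount": 80.0},
    {"region": "north", "amount": 60.0},
]

# Do the analytics step in Python: total amount per region.
totals = {}
for row in sales:
    totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]

# Hand off via a neutral format (CSV) that a visualization tool such as
# Tableau or a KNIME workflow could read; written to a buffer here.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["region", "total_amount"])
for region, total in sorted(totals.items()):
    writer.writerow([region, total])

print(buffer.getvalue())
```

The point is that nothing in the chain cares which vendor produced the neighboring step; a plain file format is the whole integration contract.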
The market is changing rapidly
I'm willing to go as far as to state that there are no leaders in the advanced analytics software industry. All commercial vendors have been left standing by the open source projects. In recent years, as "data science" has become the latest hype word, discussion forums have become infested with topics like "What language should I learn?" and "How do I get into data science?". If you spend 15 minutes looking into the discussion, it can be summarized by asking "R or Python?" and answering "it doesn't really matter". This pretty much sums up what people are learning. And it's not just learning: a Random Forest implementation came to R in 2001, while it took SAS until 2013 to get one out.
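To show how low the barrier is on the open source side today, here is a minimal sketch that trains a random forest with scikit-learn (assuming scikit-learn is installed; the toy data is made up for illustration):

```python
from sklearn.ensemble import RandomForestClassifier

# Toy, clearly separable data: class-1 points sit far from class-0 points.
X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y = [0, 0, 0, 1, 1, 1]

# A random forest is a bagged ensemble of decision trees; fixing
# random_state makes the fit reproducible.
clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X, y)

predictions = clf.predict([[0, 0], [6, 6]])
print(predictions)  # → [0 1]
```

Roughly the same handful of lines does the job with R's randomForest package, which is exactly why "R or Python" is the wrong question.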
Don't fight the flow, use it
The fact is that even if the largest vendors have huge data science/engineering teams full of world-class talent, they can't match the pace of the open source communities. And most have realized that they don't have to. R and Python have a few shortcomings that have held enterprises back from adopting them: they are non-trivial to scale, hard to deploy into production, require rare skills and lack commercial support.
VCF solutions like KNIME, RapidMiner and Alteryx address the skills problem by allowing simple modeling without coding. Yet they usually also allow you to extend your workflows with R code. This is great for more advanced users, flexibility, and collaboration between data scientists and more business-oriented users. Having competent analysts implement models that can be consumed by others allows advanced analytics to reach wider audiences within the organization.
Another way to make use of open source is to build on top of it. A great example is Revolution Analytics, which now goes by the name Microsoft R (acquired last year). Have a deeper look here.
Adaptation is the key for giants
Even if the market is getting more fragmented, "one stop vendors" have their pros. One of the cornerstones of SAS's success is that it offers a single throat to strangle when things fall apart. There are definite benefits in this approach, but it comes with a potentially huge cost. In a rapidly changing market, vendor lock-in is a concern that should be carefully thought through. SAS, the old king of analytics, has been slow to adapt to the new world. While it is still a perfectly viable option, there are big concerns about whether its closed ecosystem can keep up with development. There is an example of another kind, though. Under Satya Nadella, Microsoft has been able to turn itself around very quickly. It is remarkable that a company whose CEO not so long ago called open source a cancer now embraces Linux and external software.
When trying to decide which analytical tool(s) suit your needs best, there are two important factors to consider: what problems you need to solve, and what skills you have to solve them. Furthermore, you need to make sure that you can meet future challenges and attract the talent you need. Many vendors can solve today's problems, and a combination of tools can probably do it even better. Betting against open source is risky in two ways. Firstly, no single vendor can match the breadth of packages R and Python developers are churning out. Secondly, it will be hard to attract the best talent with outdated software.