Cloud analytics has changed the game and old tech players are about to be marginalized


Postponing cloud transformation is like burning the candle at both ends while not taking the time to install a light bulb; eventually you’ll end up in the dark.

In 2017, if you’re still using on-premise systems to support your analytics needs, maybe you’re just about keeping your head above water under the weight of infrastructure costs, management overhead and technical debt – but most likely you’re not.

Cloud is the platform for technology and data science innovation today, and its capabilities are already years ahead of on-premise. As we speak, old technology monoliths are giving way to new players in the market.

The age of separate IT and business is over. In the modern digital world, companies need to be able to utilize technology as part of their core business. Machine learning, personalized digital services and cognitive computing, bundled with a rapidly expanding volume and variety of data, take us into a world where traditional IT solutions are no longer an option for companies looking to stay current.

Innovation is largely driven by the open source community, and single-vendor solutions leave companies at the mercy of one provider to incorporate all the newest technology into its offering.

Especially in data and analytics, companies looking to build competitive advantage have only one viable option: an open public cloud driven by world-class platform-as-a-service components, open source tools, and open source compatible and complementary software solutions.

Welcome to the present day!

Long gone are the days when IT was nothing but an irritation in the CFO’s quarterly presentation. Today companies are investing in data and analytics, and IT is moving from isolated IT departments to the center of the business.

Personalized experiences, targeted marketing, new and smarter products, chatbots, optimized production processes and data-driven decision making are but a few examples of what companies are looking into. Today’s software is increasingly built on top of uncertainty and probability, with few guarantees that the original approach will yield the expected or the best results.

This, together with the ever-increasing pace of technological innovation, growing mountains of data and more demanding customers, is forcing companies to adopt agile ways of working. There is simply no way to commit to a five-year EDW project in 2017 when no one knows what the future holds. With so much short- and long-term uncertainty in the air, company culture must become more like an R&D house than Ford’s assembly line.

As business and IT are merging into one, the IT stack needs to be able to adapt to changing conditions and be an enabler of innovation, not an immutable fortress where new projects go to die.

And nowhere is this truer than in computational applications: predictive analytics, machine learning, deep learning and cognitive computing call for new ways of building solutions. Working with unstructured data requires flexible frameworks for processing it, and deep learning models need huge amounts of computational resources while they are trained.

New technologies for developing solutions are mushrooming, and companies need to be able to adopt them when the need arises. Ever-increasing data volume and variety call for scalable compute and storage solutions.

Need for speed

So today’s IT infrastructure needs to adapt quickly to change, enable rapid experimentation, scale to massive amounts and varieties of data and computation, and facilitate the utilization of the latest innovations today and in the future.
Especially in the realm of big data and data science, open source has become the norm.

Open source languages such as R and Python offer an unparalleled number of analytical libraries, and widely favored big data technologies such as Hadoop, Spark and Kafka are these days managed by the Apache Software Foundation and are free for anyone to use.

This is largely thanks to the many large technology companies that open source their internal projects. Great examples are Hive from Facebook, TensorFlow from Google, Luigi from Spotify and CNTK from Microsoft.

On top of releasing their own projects as open source, these companies also contribute significantly to the open source movement, not to mention the thousands and thousands of developers who solve their own problems and share their solutions with the rest of the world. Companies releasing their projects have realized that instead of swimming against the current, they can benefit from it themselves.

When innovation comes from such a large number of sources, including technology companies, the open source movement and academia, it is far-fetched to think that a single proprietary technology could keep up.

Getting locked into a vendor that can ramp up its license fees because it knows your way out is cumbersome and expensive is bad enough. Getting locked into an expensive solutions provider that keeps you behind the competition in innovation is a real nightmare.

On top of the latest innovation, open source has a number of other benefits attached. Technology alone won’t achieve analytical goals; it also needs talented people to build applications with it. Universities these days make very wide use of open source software in their teaching.

If you hire a statistician, she probably knows R, whereas physics and engineering graduates most likely know Python. Simply put, the best people want to work with the best technology.

Vertical and horizontal complements

Even if open source technology is at the core of analytics, it has several drawbacks. Firstly, even if many applications utilize open source, building and maintaining an open source data infrastructure costs time and money. Keeping software and security up to date is a tedious yet necessary task. Very few companies have the desire or even the means to perform continuous software updates and invest in world-class security experts while keeping all their systems running without interruption.

Secondly, many open source tools share similar drawbacks: they are non-trivial to scale, hard to deploy into production, require scarce skillsets and lack commercial support. Furthermore, for many tasks open source tools are simply not the desirable option. If we look at the analytics software landscape, we can see that some commercial products do pretty much the same things as open source tools, some occupy spots where open source isn’t at its strongest, and some build on top of open source to mitigate the issues mentioned above.

Hence, we can categorize commercial solutions into three classes:

1. Substitutes

Some solutions are basically substitutes for open source. They are closed solutions that don’t play well with open source and have similar functionality. Good examples are SAS and SPSS, which cover the modelling part of analytics, where open source is at its strongest, yet provide limited interactivity.

2. Horizontal complements

Horizontal complements are products that fill niches where open source is not strong. For example, open source tools are not very good at enterprise reporting and do not provide visual interfaces for data exploration. For seasoned data scientists this isn’t a problem, but business users and data analysts have different requirements. Great examples of horizontal complements are Business Intelligence vendors and data exploration tools such as Alteryx, Knime and Rapidminer.

3. Vertical complements

Vertical complements extend the capabilities of open source without competing against it. Good examples are EMR from Amazon Web Services and HDInsight from Microsoft, which offer Hadoop-ecosystem clusters on demand. Being able to press a button and start developing is a whole different experience from buying all the required hardware, installing the software and maintaining a cluster yourself. These services not only enable rapid innovation, but also provide scalable and flexible environments: you don’t need to size your capacity for your most intensive computations, but can grow the cluster to serve computational peaks, as the sketch below illustrates. Another great example is Microsoft R Server, which helps R programmers develop scalable solutions and deploy them into production with a few lines of code.
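
To make the on-demand idea concrete, here is a minimal sketch of the pattern using the AWS SDK for Python (boto3) against the EMR API: start a small Spark/Hadoop cluster with one call, then resize it later to absorb a computational peak. The cluster name, instance types and counts are illustrative assumptions, not a reference configuration.

import boto3

emr = boto3.client("emr", region_name="eu-west-1")

# Start a small cluster with Hadoop and Spark pre-installed.
response = emr.run_job_flow(
    Name="analytics-sandbox",            # illustrative name
    ReleaseLabel="emr-5.8.0",            # any current EMR release works
    Applications=[{"Name": "Hadoop"}, {"Name": "Spark"}],
    Instances={
        "InstanceGroups": [
            {"Name": "master", "InstanceRole": "MASTER",
             "InstanceType": "m4.large", "InstanceCount": 1},
            {"Name": "core", "InstanceRole": "CORE",
             "InstanceType": "m4.large", "InstanceCount": 2},
        ],
        "KeepJobFlowAliveWhenNoSteps": True,
    },
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
cluster_id = response["JobFlowId"]

# Later, scale the core group up for a computational peak
# (and back down the same way once the job is done).
core_group = next(
    g for g in emr.list_instance_groups(ClusterId=cluster_id)["InstanceGroups"]
    if g["InstanceGroupType"] == "CORE"
)
emr.modify_instance_groups(InstanceGroups=[
    {"InstanceGroupId": core_group["Id"], "InstanceCount": 10}
])

The point is not the specific calls but that capacity becomes a parameter you change in code or in a console, rather than hardware you procure, install and maintain.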

Open cloud

In the modern world, the only technology that allows companies to stay up to date with security and technology, while providing the needed scalability and flexibility at a reasonable cost, is an open cloud. A data and analytics platform that is built on PaaS infrastructure, draws on open source innovation topped with products that help you scale and deploy with ease, and allows the use of third-party software when needed will stand the test of time. Technology providers that can’t offer Spark and Hadoop as a service, managed container services, or GPUs and deployment options for deep learning today will probably not deliver tomorrow’s solutions on time either.

The time of one-stop shops is over, but going fully open source is rarely an option either. Modern analytics architecture is about finding the right balance between commercial services and open source. A scalable and flexible base infrastructure and the latest open source innovation, combined with horizontally and vertically complementary software, is the winning combination.
 

Get the eBook: 15 must wins of data cloud