Lynda Partner, senior vice president for products and offerings at Pythian, spoke at Data Summit Connect earlier this year about how to accelerate value with machine learning. In this interview with BDQ, Partner continues the conversation and highlights how to select the right use cases for ML, avoid mistakes, and manage an ML project once it becomes a reality.
How do you describe machine learning?
Machine learning is an application of AI that gives systems the ability to automatically access data and use it to learn and improve from experience without being explicitly programmed. While it’s fascinating to think that computers can learn without human intervention, the real value of machine learning comes when the business can use the outputs of that learning. Machine learning becomes valuable when its result is the ability to adjust an action according to what was learned.
Can you provide some examples of where it is being used?
One of the most well-known is evident every time we visit Netflix or Amazon and we are shown suggestions about what we should watch or buy next. To do this, machine learning algorithms have accessed tons of data about you and others like you to learn what you are most likely to be interested in, and then they take that recommendation and automatically display it to you, and just you. This has been incredibly effective in boosting sales in the case of Amazon and engagement in the case of Netflix. According to McKinsey, 35% of what consumers purchase on Amazon and 75% of what they watch on Netflix come from product recommendations based on machine learning algorithms.
What is the benefit for businesses?
These algorithms generate high-quality recommendations that result in strong business outcomes by accessing massive amounts of data and then producing recommendations in real time. And because the cost of processing this amount of data has come down with the advent of cloud computing and big data systems, the business case for using machine learning is much more compelling than it was, say, a decade ago.
Machine learning isn’t just for consumer-facing services like Netflix and Amazon; it is also increasingly widespread behind the scenes. In industrial applications, ML models are predicting when equipment needs to be serviced before it actually fails. Banks are using ML models to predict the likelihood that a transaction is fraudulent so they can cancel a credit card before it is used for more fraudulent purchases. Chatbots are making employees more efficient by answering questions intelligently, saving the time employees would otherwise spend sifting through FAQs or taking up a live person’s time. And supply chains are being optimized as algorithms take into account more possible factors than any human could keep in mind.
Why is it so important for companies to get started now?
As more and more companies realize that algorithms can do things at a scale and speed that humans or regular analytics can’t match, machine learning will no longer be optional. Competitors will be using it to provide customer experiences, efficiencies, or improved functionality that give them an edge, forcing broader adoption just to keep up. It is a change that will be as impactful as the advent of computers themselves.
Can you describe the level of maturity and experience with machine learning at organizations you work with?
The media does a great job of showcasing pioneers of machine learning, but the reality that we see every day is different. Yes, there are a few companies that are very mature, but most companies are still early in their ML journey. Even those that are actively developing models tend to be early in their lifecycle. We find very few companies that have deployed ML models at scale within their organizations.
As a consultancy and solution provider, how does Pythian seek to help companies on their journey to leveraging machine learning?
We focus on helping companies get to business value through the use of machine learning. Building models for the sake of models is not the goal; the end goal is seeing the model working at scale, all by itself, learning and delivering outputs that drive revenue growth, or create loyal customers, or reduce costs, or even save lives. There are many steps in this journey; it’s much more than hiring data scientists and setting them loose. We focus on the entire lifecycle of machine learning from ideation to implementation at scale and the care and feeding of models beyond that.
Has the pandemic accelerated or slowed down activities in this area?
During the pandemic we saw a shift toward ML use cases that focused on either cost savings or enhancing the digital customer experience. It really depended on how each organization was affected. But I would say that the shift to cost savings was the most obvious. Those companies that were forced to become more digital were usually preoccupied with less sophisticated digitization projects like adding curbside pickup if they were a retailer or dealing with huge growth in online traffic. Others, whose business was really hurt by the pandemic, took the opportunity to look at ML to reduce costs and become more efficient.
What are the issues that are speeding or hindering efforts to adopt machine learning?
The worst thing that can happen as companies start to adopt machine learning is that they experience a string of early failures that sours them on future investments. So, to me, spending the time to pick the best use cases, the ones that are most likely to be winners, even small winners, is one of the best things a company can do.
A second area to focus on—and it's tied to my previous point—is the need for collaboration across multiple teams. A successful machine learning project involves IT, business owners, data scientists, architects, security folk and more. Bringing these people together early in the process is a critical success factor.
What else is important?
Having access to lots of data—clean, integrated data—is critical. Without data, your model will starve to death, and starving people can’t run marathons, which is what you want your model to do for you. I joke that a machine learning model is like Sesame Street’s Cookie Monster with his insatiable hunger for cookies. In the case of machine learning, your model has an insatiable appetite for data. Your model is only as good as the data you feed it.
Are there market segments or departments that will benefit most from its use?
There isn’t a functional area that can’t benefit from machine learning, but not all projects are created equal. Picking the right use cases is so important. For every eight projects that are started, we see four making it to the deployment phase, and one getting deployed into production. Based on published survey data about the effort and duration for each phase, we think that attrition can translate into $1 million to $2 million per successfully deployed model. That cost can be reduced with better upfront planning and easy access to data, but with that kind of spend profile, we see machine learning projects more often in larger enterprises.
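As a rough illustration of the funnel arithmetic above, the following Python sketch computes the conversion rates implied by the eight, four, and one figures. The function names are hypothetical, and `cost_per_deployed_model` only shows how spend on abandoned projects could be attributed to the models that ship; it does not reproduce the survey data behind the $1 million to $2 million estimate.

```python
from fractions import Fraction


def funnel_rates(started: int, reached_deployment_phase: int, in_production: int) -> dict:
    """Return conversion rates between stages of an ML project funnel."""
    ordered = started >= reached_deployment_phase >= in_production >= 0
    if not ordered or started == 0:
        raise ValueError("stage counts must be non-increasing and 'started' must be positive")
    return {
        "started_to_deployment_phase": Fraction(reached_deployment_phase, started),
        "deployment_phase_to_production": (
            Fraction(in_production, reached_deployment_phase)
            if reached_deployment_phase
            else Fraction(0)
        ),
        "started_to_production": Fraction(in_production, started),
    }


def cost_per_deployed_model(total_spend: float, in_production: int) -> float:
    """Spread spend on all projects, including abandoned ones, over deployed models."""
    if in_production <= 0:
        raise ValueError("at least one model must reach production")
    return total_spend / in_production


# Figures quoted in the interview: 8 started, 4 reach deployment, 1 in production.
rates = funnel_rates(started=8, reached_deployment_phase=4, in_production=1)
for stage, rate in rates.items():
    print(f"{stage}: {float(rate):.1%}")
```

Read this way, one project in eight reaches production, so the cost of each deployed model carries the spend on the seven that did not.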
How do you advise companies to get started? Are there predictable steps in the process?
There are predictable steps, but we often find that companies start in the wrong place. You want to avoid building models for the sake of modeling and instead focus on selecting the best use case before you start modeling. The best use case is one that can be tied directly to a business problem or opportunity. When you are starting out, it is also one that has a high technical-feasibility score and a low data-risk score. These two attributes will help you deliver faster, which will make the ROI on your project higher. To do all this requires a team of businesspeople, analysts, IT folk, and data scientists all working together.
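The selection criteria above could be sketched as a simple ranking. This is a minimal illustration only: the 1-to-5 scales, the candidate use cases, their scores, and the choice to sort on feasibility first and data risk second are all assumptions, not Pythian's scoring method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UseCase:
    name: str
    tied_to_business_problem: bool
    technical_feasibility: int  # hypothetical scale: 1 (hard) to 5 (easy)
    data_risk: int              # hypothetical scale: 1 (low risk) to 5 (high risk)


def rank_use_cases(candidates):
    """Drop use cases with no business tie, then prefer high feasibility and low data risk."""
    eligible = [c for c in candidates if c.tied_to_business_problem]
    return sorted(eligible, key=lambda c: (-c.technical_feasibility, c.data_risk))


# Hypothetical candidates with made-up scores, for illustration only.
candidates = [
    UseCase("churn prediction", True, 4, 2),
    UseCase("image search experiment", False, 5, 1),
    UseCase("demand forecasting", True, 4, 4),
    UseCase("equipment failure prediction", True, 2, 3),
]

ranked = rank_use_cases(candidates)
for use_case in ranked:
    print(use_case.name)
```

In practice, the scores would come from the cross-functional team described above rather than from any single group.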
What are some of the adjacent technologies and methodologies that organizations need to focus on?
Depending on which study you look at, data scientists spend between 45% and 80% of their time integrating and cleaning data, when they really want to spend all their time visualizing and modeling. An enterprise data platform that provides access to integrated, cleaned data in its non-aggregated form would speed up machine learning projects immensely. It is the single best thing you can do to accelerate your outcomes and make your project more cost-effective at the same time.
Beyond rigorous use case selection, what else should organizations do early on?
They should also invest in a cloud data platform for easy access to data—you need IT for this. In addition, they should start a data governance program, and educate more people about ML to reduce the communications gap. For companies that have already done those things, it is time to invest in an integrated development environment for faster model deployment with IT help. They should also invest in MLOps skills, tools and processes and start thinking about model management.
Can you provide an example of a company that has been able to achieve goals using machine learning that would otherwise not have been possible?
I love the Teck story. Teck Resources is a large mining company that uses trucks to haul ore out of its mines. The trucks run non-stop, night and day, and operating them accounts for 40% of mine site costs. Each minute they are operating translates into real revenue, and each minute they aren’t means significant cost and lost revenue. When they do break down, it is often not in a convenient location. Imagine having to go out into a mine to repair a broken-down million-pound truck. Teck used machine learning to predict failures before they happened by analyzing the terabytes of sensor data coming from the trucks along with maintenance, scheduling, and other truck lifecycle records. They were able to significantly reduce downtime and the more costly repairs associated with it.
Looking ahead, where do you see the greatest potential of machine learning?
Because there is no shortage of places where machine learning can be applied, its greatest potential will be unlocked when its complexity is reduced. Like the advent of low-code and no-code tools, which bring programming capabilities to the non-programmer, machine learning’s potential will be fully realized when (a) data scientists are made more productive by eliminating the non-modeling parts of their job, and (b) even the modeling parts of their job are accelerated with pre-packaged models that can be adjusted rather than developed from scratch.