Machine learning (ML) is on the rise at businesses hungry for greater automation and intelligence, with use cases spreading across industries. At the same time, most projects are still in the early phases. From selecting data sets and data platforms to architecting and optimizing data pipelines, there are many success factors to keep in mind.
Machine learning was the focus of a workshop presented by Chelsey H Hill, assistant professor of business analytics, Feliciano School of Business, Montclair State University, at Data Summit 2019.
The sixth annual Data Summit conference is being held in Boston, May 21-22, 2019, with pre-conference workshops on May 20.
The advantages that ML offers organizations—the ability to automatically build models that can analyze huge volumes of data and deliver lightning-fast results—have also led to a growth in the availability of both commercial and open source frameworks, libraries, and toolkits for engineers.
“Machine learning is a data analytics technique that teaches computers to learn from experience,” Hill said. “From this algorithmic learning we will be able to either predict or understand.”
Machine learning is commonly used to identify spam email, segment markets, forecast the weather, recommend products, and detect credit card fraud, among other things, Hill explained.
The Cross-Industry Standard Process for Data Mining (CRISP-DM) should be used to guide any machine learning project, she said.
“The foundation of any project is that business understanding,” Hill said.
The general machine learning process consists of:
- Data collection
- Exploratory data analysis
- Model building
- Model testing/validation
- Model improvement
There are two types of machine learning, supervised and unsupervised, she explained. Unsupervised learning finds hidden patterns or data structures in the features or input data, while supervised learning trains a model on features or input data paired with known outcomes so that it can predict those outcomes for new data.
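The difference between the two types can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not anything shown in the workshop: the toy data, the function names, and the choice of a nearest-centroid classifier and a two-group clustering loop are all hypothetical.

```python
from statistics import mean

# Hypothetical toy data: one numeric feature per message, plus a known label.
features = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]
labels = ["ham", "ham", "ham", "spam", "spam", "spam"]


def fit_centroids(xs, ys):
    """Supervised: use the known labels to compute one centroid per class."""
    return {label: mean(x for x, y in zip(xs, ys) if y == label) for label in set(ys)}


def predict(centroids, x):
    """Assign x to the class whose centroid is nearest."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))


def two_means(xs, iterations=10):
    """Unsupervised: split xs into two groups without ever seeing labels."""
    low, high = min(xs), max(xs)  # start from the two extreme points
    for _ in range(iterations):
        group_low = [x for x in xs if abs(x - low) <= abs(x - high)]
        group_high = [x for x in xs if abs(x - low) > abs(x - high)]
        low, high = mean(group_low), mean(group_high)
    return group_low, group_high


centroids = fit_centroids(features, labels)
print(predict(centroids, 4.0))   # label predicted from the labeled examples
print(two_means(features))       # structure found from the features alone
```

The supervised half needs `labels` to learn anything, while the unsupervised half recovers the same two groups from the feature values alone but cannot say what the groups mean.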
The goal is to take in the data, abstract from it, and generalize.
“When we build our model we want to use it on unseen data so everything we are doing is about generalization,” Hill said.
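One common way to check generalization is to hold back part of the data, build the model on the rest, and score it only on the held-back rows. The sketch below assumes a hypothetical toy data set and hypothetical helper names (`train_test_split`, `fit`, `accuracy`); it is not the workshop's own code.

```python
import random
from statistics import mean


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and hold back a fraction as unseen data."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def fit(rows):
    """Build a simple model: the mean feature value for each label."""
    by_label = {}
    for x, y in rows:
        by_label.setdefault(y, []).append(x)
    return {y: mean(xs) for y, xs in by_label.items()}


def accuracy(model, rows):
    """Share of rows whose nearest class mean matches the true label."""
    hits = sum(1 for x, y in rows if min(model, key=lambda c: abs(model[c] - x)) == y)
    return hits / len(rows)


# Hypothetical data: two well-separated groups of (feature, label) pairs.
rows = [(i * 0.2, "low") for i in range(10)] + [(10 + i * 0.2, "high") for i in range(10)]

train, test = train_test_split(rows)
model = fit(train)  # the model never sees the test rows
print(f"held-out accuracy: {accuracy(model, test):.2f}")
```

Scoring on rows the model was not built from gives an estimate of how it will behave on new data, which is the point of the testing and validation step in the process above.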
Though computers are smart, they aren’t smart enough to operate without human intervention, she explained.
“Data doesn’t come in a nice, neat package,” Hill said.
Knowing the business, understanding the business problem, and exploring the data to understand it and give it business context are the keys to a successful machine learning implementation.
Many Data Summit 2019 presentations are available for review at https://www.dbta.com/DataSummit/2019/Presentations.aspx.