Discussing Best Practices for Machine Learning

Adoption of machine learning (ML) is projected to keep rising as companies look for greater automation and intelligence, with use cases spreading across industries.

A recent survey of Database Trends and Applications (DBTA) subscribers found that 48% of respondents currently have machine learning initiatives underway, with another 20% considering adoption.

From data quality issues to architecting and optimizing models and data pipelines, there are many success factors to keep in mind.

DBTA recently held a webinar with Gaurav Deshpande, VP of marketing, TigerGraph, and Robert Stanley, senior director special projects, Melissa Informatics, who discussed key technologies and strategies for adopting machine learning.

With quality data, AI makes data discovery, pattern recognition, and other benefits possible, Stanley explained. However, business and research data can be complex and "dirty," and as a result business goals are often blocked by poor-quality or incomplete data, leading to missed opportunities, errors, and inefficiencies.

Methods for data preparation and data quality (DQ), including data identification, classification, normalization, and integration, make data useful for AI, Stanley said.
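As a rough illustration of two of those steps, normalization and integration, the sketch below cleans records from two sources and merges them on a normalized key. It is a minimal example under stated assumptions, not Melissa's method: the record schema (`name`, `email`, `phone`), the choice of email as the matching key, and the helper names `normalize_record` and `integrate` are all hypothetical.

```python
import re


def normalize_record(rec):
    """Return a cleaned copy of a raw record (hypothetical schema: name, email, phone)."""
    # Collapse runs of whitespace and apply consistent capitalization.
    name = " ".join(rec.get("name", "").split()).title()
    # Emails are matched case-insensitively, so trim and lowercase them.
    email = rec.get("email", "").strip().lower()
    # Keep only the digits of a phone number; an empty result becomes None.
    phone = re.sub(r"\D", "", rec.get("phone", ""))
    return {"name": name, "email": email, "phone": phone or None}


def integrate(*sources):
    """Merge records from several sources, keyed on the normalized email.

    Earlier sources win; later sources only fill fields that are still empty.
    """
    merged = {}
    for source in sources:
        for raw in source:
            rec = normalize_record(raw)
            key = rec["email"]
            if not key:
                continue  # no usable identifier, so the record cannot be matched
            existing = merged.setdefault(key, {})
            for field, value in rec.items():
                if value and not existing.get(field):
                    existing[field] = value
    return merged


crm = [{"name": "  ada   LOVELACE ", "email": "Ada@Example.com ", "phone": ""}]
billing = [{"name": "Ada Lovelace", "email": "ada@example.com", "phone": "(555) 010-0199"}]
print(integrate(crm, billing))
```

Here the two inconsistent records resolve to one entry for `ada@example.com` with the name from the CRM source and the phone number from the billing source. Real DQ tooling typically adds fuzzy matching and reference data on top of rules like these.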

Machine learning is best at identifying potentially actionable content from within data that is less structured (supervised ML) or less well understood (unsupervised ML).

According to Stanley, ML shows real value in understanding diverse diseases at the molecular level, and it helps to contextualize its results within existing knowledge or to follow them up with further research.

There are many avoidable pitfalls in ML, mostly relating to spurious correlation and a lack of mechanistic understanding. They include:

  • Dirty or missing data
  • Unsupervised or semi-supervised learning on massive datasets often gives spurious results
  • Simple ML is often as effective as more complex artificial neural networks (ANNs) and "deep learning"
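To see why scanning large datasets invites spurious results, consider a small illustration of our own (not from the webinar): correlate a target made of pure random noise against many equally random features. Nothing is actually related, yet the strongest correlation found grows as more features are scanned. The sample size, feature counts, and seed are arbitrary assumptions, and `max_spurious_correlation` is a hypothetical helper.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def max_spurious_correlation(n_samples, n_features, seed=0):
    """Correlate a pure-noise target with pure-noise features; return the largest |r|."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_samples)]
    best = 0.0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_samples)]
        best = max(best, abs(pearson(feature, target)))
    return best


few = max_spurious_correlation(n_samples=20, n_features=10)
many = max_spurious_correlation(n_samples=20, n_features=5000)
print(f"best |r| over 10 features: {few:.2f}; over 5000 features: {many:.2f}")
```

Because both calls share a seed, the 5,000-feature run sees the same target and the same first ten features as the 10-feature run, so its best |r| can only be equal or higher. With only 20 samples it lands well above 0.5 even though no real relationship exists, which is the kind of result that needs mechanistic follow-up before it is trusted.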

Melissa offers machine reasoning, which combines ontologies, linked data networks, and reasoning engines, Stanley explained. Machine reasoning can make sense of incomplete or noisy data, making it possible to answer difficult questions.
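The webinar summary does not describe how Melissa's reasoning engine works. As a generic illustration of how an ontology lets a reasoner fill in facts that were never stated explicitly, here is a toy forward-chaining loop over two RDFS-style rules (subclass transitivity and type inheritance). The triples, the `"type"`/`"subClassOf"` predicate names, and the `infer` helper are assumptions for illustration only.

```python
def infer(triples):
    """Forward-chain two RDFS-style rules over (subject, predicate, object) triples
    until no new triples can be derived."""
    facts = set(triples)
    while True:
        new = set()
        for (s, p, o) in facts:
            if p != "subClassOf":
                continue
            for (s2, p2, o2) in facts:
                # Rule 1: subClassOf is transitive (A < B and B < C gives A < C).
                if p2 == "subClassOf" and s2 == o:
                    new.add((s, "subClassOf", o2))
                # Rule 2: an instance of a subclass is also an instance of its superclass.
                if p2 == "type" and o2 == s:
                    new.add((s2, "type", o))
        if new <= facts:
            return facts  # fixed point reached: nothing new was derived
        facts |= new


ontology = {
    ("metformin", "type", "biguanide"),
    ("biguanide", "subClassOf", "antidiabetic_drug"),
    ("antidiabetic_drug", "subClassOf", "drug"),
}
for triple in sorted(infer(ontology) - ontology):
    print(triple)
```

Only three triples are asserted, but the reasoner derives three more, such as `("metformin", "type", "drug")`. Production reasoners use far richer rule sets and more efficient algorithms than this nested loop.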

TigerGraph offers something similar, according to Deshpande. A graph is a natural model for interconnected data and an organic way of modeling a wide variety of relationships and transactions. It can identify key data, process massive amounts of data, and use the power of relationships and deep analytics to provide insights.

Graphs power explainable AI, Deshpande said. Machine learning with TigerGraph can seamlessly integrate multiple sources of data to provide a unified, comprehensive view of each member, find similar members with the click of a button in real time, and deliver care path recommendations for similar members.
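Deshpande did not say how TigerGraph computes member similarity. One common graph technique, shown here as a plain-Python sketch rather than TigerGraph's GSQL or any TigerGraph API, is Jaccard similarity over shared neighbors: two members are similar when they connect to many of the same diagnoses, providers, or drugs. The member IDs, entities, and helper names are hypothetical.

```python
from collections import defaultdict


def build_graph(edges):
    """Build undirected adjacency sets from (node, node) edges."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def jaccard(graph, u, v):
    """|shared neighbors| / |all neighbors| of two nodes (0.0 if neither has any)."""
    nu, nv = graph.get(u, set()), graph.get(v, set())
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0


def similar_members(graph, member, members, k=3):
    """Return up to k other members with nonzero similarity, best first."""
    scores = [(other, jaccard(graph, member, other)) for other in members if other != member]
    scores = [s for s in scores if s[1] > 0]
    scores.sort(key=lambda s: (-s[1], s[0]))  # highest score first, ties by ID
    return scores[:k]


edges = [
    ("m1", "diabetes"), ("m1", "dr_lee"), ("m1", "metformin"),
    ("m2", "diabetes"), ("m2", "dr_lee"),
    ("m3", "asthma"), ("m3", "dr_kim"),
    ("m4", "diabetes"), ("m4", "metformin"), ("m4", "dr_kim"),
]
graph = build_graph(edges)
print(similar_members(graph, "m1", ["m1", "m2", "m3", "m4"]))
```

Member `m2` ranks first (2 of 3 neighbors shared) and `m4` second (2 of 4), while `m3` shares nothing and is dropped. The shared neighbors themselves explain each match, which is one reason graph-based similarity is often described as explainable.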

An archived on-demand replay of this webinar is available here.