Machine Learning and the Future of Efficient Data Science Configurations

As machine learning (ML) becomes the norm in the data industry, it's clear that its potential is limited by current data structures. Tabular data strategies have grown less effective because they cannot capture the relationships that interconnect data points. Data models grow more complex as data scientists scale them, since different pipelines are built in different programming languages, yielding low productivity and long development times. Leaders in ML and data consumption are looking to improve data science strategies in order to increase data value and data model efficiency.

DBTA recently hosted a webinar, "Machine Learning Today: New Technologies and Strategies," where ML experts Dr. Victor Lee, VP of machine learning and AI at TigerGraph; Julian Forero, senior product marketing manager at Snowflake; and Greg Steck, senior director of industry solutions at Katana Graph, discussed how ML strategies must adapt to new modes of data consumption in order to generate the most value for enterprise growth.

Inefficient data models are an obstacle to effective data processing within organizations, according to Forero. Organizations struggle to develop and deploy data models because projects span different languages and tools, forcing data scientists to spend time on menial data tasks. Forero offered Snowflake's latest addition, Snowpark, as a solution; it accommodates multiple programming languages and data model tools under a single, collaborative layer for data access within Snowflake. The platform eliminates the need for several data pipelines connected to various systems, without concern for tool requirements or language adaptation. This lets enterprises retire legacy processing approaches, with their heavy complexity and loose governance controls, and focus on ML through data pipelines and scaling automation.

One of the keys to data value lies in data variety and how that data is structured, according to Lee. Organizations can extract only so much value from their assets in isolation, creating a need for diverse insights that target different aspects of the data. Focusing on the interconnected shape of data, i.e., using graph data science, increases data value by capturing that variety and the unique interplay between entities. Graphs inherently surface patterns and follow connections between data points, which applies to a broad set of use cases; problems such as fraud detection, compliance adherence, maintenance impact analysis, and customer visibility can all be addressed through graph database analytics.
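To make the fraud-detection use case concrete, the following is a minimal sketch, not any vendor's implementation: accounts that share attributes such as a phone number or device form a graph, and following those connections (here, via a simple breadth-first traversal over a plain Python adjacency list) exposes rings of linked accounts that tabular, row-by-row analysis would miss. The account, phone, and device names are hypothetical.

```python
from collections import defaultdict, deque

def build_graph(edges):
    """Build an undirected adjacency list from (entity, entity) pairs."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj

def connected_components(adj):
    """Group entities linked through any chain of shared attributes."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        queue, group = deque([start]), set()
        while queue:
            node = queue.popleft()
            if node in seen:
                continue
            seen.add(node)
            group.add(node)
            queue.extend(adj[node] - seen)
        components.append(group)
    return components

# Hypothetical accounts linked by shared phone numbers and devices:
edges = [
    ("acct_1", "phone_A"), ("acct_2", "phone_A"),   # two accounts, one phone
    ("acct_2", "device_X"), ("acct_3", "device_X"), # chained through a device
    ("acct_4", "phone_B"),                          # unconnected account
]
rings = connected_components(build_graph(edges))
# Flag any component containing more than one account as a potential ring.
suspicious = [g for g in rings if sum(n.startswith("acct") for n in g) > 1]
```

In a row-oriented table, acct_1 and acct_3 share no column value, yet the traversal links them through acct_2; this is the kind of multi-hop relationship graph analytics is built to follow.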

When ML is integrated with graph data science, data is enriched by graph algorithms, queries, and graph neural networks (GNNs) for optimal data consumption value.

Steck offered insight into how GNNs improve graph workloads through ML and increase model accuracy: while traditional ML relies on modelers to hand-engineer features and decide which tradeline features are most impactful, GNNs automate the feature engineering process and learn which tradeline features matter most. GNNs learn from the complex relationships between consumers, who are connected via various paths, and uncover signals of enterprise interest.
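The mechanism behind this automation is message passing: each node updates its features by aggregating its neighbors' features, so relationship signals flow into the representation without a modeler hand-crafting them. The following is a deliberately simplified sketch of one mean-aggregation step (real GNNs stack several such layers with learned weights and nonlinearities); the consumer graph and feature vectors are hypothetical.

```python
def aggregate_step(features, adj):
    """One mean-aggregation layer: node feature := avg(self, neighbors)."""
    new_features = {}
    for node, feat in features.items():
        # Collect this node's own vector plus its neighbors' vectors.
        neighborhood = [feat] + [features[n] for n in adj.get(node, [])]
        # Element-wise mean over the neighborhood.
        new_features[node] = [sum(vals) / len(neighborhood)
                              for vals in zip(*neighborhood)]
    return new_features

# Hypothetical consumer graph: two connected consumers, one isolated.
adj = {"alice": ["bob"], "bob": ["alice"], "carol": []}
features = {"alice": [1.0, 0.0], "bob": [0.0, 1.0], "carol": [0.5, 0.5]}
out = aggregate_step(features, adj)
# alice and bob now blend each other's signals; carol is unchanged.
```

Stacking this step lets information travel along multi-hop paths between consumers, which is how a GNN picks up the path-based signals described above without explicit feature engineering.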

Katana Graph, as introduced by Steck, uses GNNs to create more useful data structures for optimal and accurate information consumption. Katana Graph operates on several levels, including graph database and query, graph AI and ML, and graph analytics and mining. The platform's promise of efficient, scalable, and flexible graph processing, with support for cloud-native systems and developer contributions, will benefit a variety of industries seeking ML-driven data analytics.

You can view an archived version of this webinar here.