Page 1 of 2 next >>

AI and Machine Learning: Data and Infrastructure Implications

Artificial intelligence (AI) is the most discussed topic of the moment, but its true impact on society and its potential impact on enterprises are still unknown. While we may not fully understand that potential yet, we do understand how AI functions and the base level of what it could be capable of. We’re already seeing how AI, along with machine learning (ML) and deep learning, has the potential to revolutionize how organizations function across all industries by doing seemingly small things such as automating repetitive tasks and accelerating outcomes. Those seemingly small things add up and create a big impact.

AI, ML, and deep learning aren’t limited to the top-of-mind generative AI chatbots everyone is talking about. Used properly, AI can use data to learn how to diagnose diseases, serve as the first line of defense in fraud detection (and, beyond detection, fraud prediction in banking), or customize shopping experiences based on customer history. In all of these cases, AI and ML are meant to improve customer experiences, lives, and business outcomes. In addition, while all these models are trained using existing data, they can also be trained to go through future data on their own, which may make finding value in the rapidly growing amount of data generated each day a little less overwhelming.

This “next era” of data usage and production creates some distinct challenges for IT leaders. The data being generated and stored is scaling in volumes that are exponentially larger than anything seen before.

Moreover, 80% of all data collected is unstructured (video and other imagery, for example), which is more complex to store and manage; 90% of that data has been collected in the past decade, with more being collected every day. For example, companies creating driver-assistance technology, a form of ML built on machine vision, have generated over an exabyte of data in just a few years. (An exabyte is equivalent to 1 billion gigabytes, which is a truly unfathomable amount of data.)
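To put the exabyte figure in perspective, the conversion can be worked through directly. The short Python sketch below assumes decimal (SI) units, in which 1 gigabyte is 10^9 bytes and 1 exabyte is 10^18 bytes; the function name is illustrative and does not come from any particular tool.

```python
# Decimal (SI) storage units, assumed here for illustration.
BYTES_PER_GB = 10**9   # 1 gigabyte
BYTES_PER_EB = 10**18  # 1 exabyte


def exabytes_to_gigabytes(exabytes: int) -> int:
    """Convert a whole number of exabytes to gigabytes (hypothetical helper)."""
    return exabytes * BYTES_PER_EB // BYTES_PER_GB


gigabytes = exabytes_to_gigabytes(1)
print(f"1 exabyte = {gigabytes:,} gigabytes")
```

Under these assumptions, one exabyte works out to 1,000,000,000 gigabytes, which matches the "1 billion gigabytes" comparison.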

As AI/ML applications grow, so does the need for data to “train” them, and organizations now find themselves needing to store more data than ever before. Unfortunately, legacy storage systems simply were not made for this level of scale and complexity. This is not the fault of legacy storage systems.

As mentioned, the amount of data being generated is immense and not something we could’ve predicted a decade ago. However, it’s important to understand the need for this data, what the data does now and what it could do in the future, and therefore why legacy storage systems just aren’t cutting it.

The Process of Processing

To develop AI, ML, and deep learning applications, we generally follow a three-step process for the data involved. First, we have data preparation, in which huge amounts of “raw materials” are translated into useable data. Next, software programs are trained to learn a new capability (or capabilities) from all of that processed data in what is called “model training.”

Finally, there’s the inference stage, in which the program applies this training to new data. This cycle happens continuously, and it all adds up to massive amounts of data: storage is expected to double, or even triple, in capacity during the next few years due, in large part, to advances in AI/ML.
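The three stages described above (data preparation, model training, and inference) can be illustrated with a deliberately tiny Python sketch. Everything in it is an assumption made for illustration: the record format, the "normal" and "fraud" labels, the function names, and the simple threshold "model," which stands in for a real ML model and only works because the toy fraud amounts are larger than the normal ones.

```python
from statistics import mean


def prepare(raw_records):
    """Data preparation: turn raw text rows into (amount, label) pairs, skipping malformed rows."""
    prepared = []
    for record in raw_records:
        parts = record.strip().split(",")
        if len(parts) != 2:
            continue  # not an "amount,label" row
        try:
            amount = float(parts[0])
        except ValueError:
            continue  # amount is not a number
        prepared.append((amount, parts[1].strip()))
    return prepared


def train(examples):
    """Model training: learn a threshold halfway between the two class averages."""
    normal = [amount for amount, label in examples if label == "normal"]
    fraud = [amount for amount, label in examples if label == "fraud"]
    return (mean(normal) + mean(fraud)) / 2


def infer(threshold, amount):
    """Inference: apply the learned threshold to data the model has not seen."""
    return "fraud" if amount > threshold else "normal"


# Hypothetical raw data, including one unusable row.
raw = ["12.50,normal", "9.99,normal", "bad row", "950.00,fraud", "1200.00,fraud"]
examples = prepare(raw)
threshold = train(examples)
print(infer(threshold, 800.0), infer(threshold, 20.0))
```

Even in this toy form, each stage reads or produces data that has to live somewhere: the raw rows, the prepared examples, and the new records scored at inference time.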

As AI rapidly evolves, the uncertainty of what is still to come, in conjunction with new sources of data appearing every day, has led to a storage crisis. Applications that have never produced data before are suddenly producing data at an astonishing rate.

On the flip side, applications that have never needed data to function now do. For example, while vacuums have never been capable of collecting and storing data (or needed to), robot vacuums are now collecting data and storing it in the cloud, something that had never been heard of before. Because we still aren’t sure what data will be valuable and when, most organizations are taking the route of storing everything in case there ends up being value in that data. The large datasets used for data preparation, as well as the datasets that AI, ML, and deep learning rely on to function, may be stored for decades. If models must be retrained, datasets might need to be stored for even longer, especially if a dataset may be used to train a completely new model. The question, “What can my AI learn?” is really, “What data does my AI need in order to learn?” Until that question is answered, a storage solution that offers an inexpensive yet effective way to archive all that data, with the ability to easily retrieve it for reuse, is essential.
