Data Analysis Stalls When Companies Can't Find Critical Datasets


A mining truck in South America rolls down a dusty road on a hot day. High-tech sensors pick up an overheated bearing, and the data is instantly transmitted to the mining company’s data center. Now what?

This scenario can go one of two ways, and only one of them saves the mining company from spending a lot of money on stranded trucks. If the company's CEO has instructed the engineering and business departments to build a data-driven plan for predictive maintenance, the truck is taken out of service and gets a new bearing, saving time and repair costs. If the overheated bearing goes unnoticed because the dataset is stuck in a dark, inaccessible corner of the data infrastructure, stranded trucks begin to pile up. Enough stranded trucks eventually lead to the company's demise.

In most industries, today's business plans can't focus only on heavy machinery and its maintenance. Most strategic plans must encompass a third element: efficient ways to use the performance information that the day-to-day business generates.

A Data-Driven Business Plan

The business world continues to shortchange a critical step between storing and analyzing the explosion of new data expected over the coming years: making that data findable. Get that step right, and a business can move quickly from the old world of siloed, unusable data to a new one where stakeholders around the globe can find information in a few minutes from their local access points.

The “findability” of data is particularly important for Global 1000 companies pursuing Industrial Internet–related innovation. Once a product design team stores a turbine’s schematics in a lab in the United Kingdom, for example, there’s little chance a group of sustaining engineers in the United States will know where to find those schematics eight years later, when the drawings are needed to check an anomalous sensor reading. To deliver predictive maintenance and new product innovation, design and sustaining engineering teams need to review massive unstructured datasets, such as geometry files, simulations and telemetry data coming off equipment in the field, throughout that machinery’s lifecycle, which often spans several years. However, many of these organizations have recently had a rude awakening: the files are buried so deep in the enterprise that they are effectively lost.

Putting together a data-driven business plan requires an organization to ask some hard questions about whether it can actually locate its own data. Finding data often burns huge amounts of time and effort: engineers spend 15% to 20% of their time searching for critical files.

Gone, but Not Forgotten

We often make the mistake of assuming that enterprise data is as accessible as the services in our daily lives, where everything is a tap on the smartphone or a few clicks on a laptop away. For the enterprise, it's quite the opposite.

Engineering and business cannot simply abdicate data management and access responsibility to the IT team; it must be a collaborative effort. With each technology refresh conducted by the IT department, unstructured data is migrated, causing pathnames to change and links to break. Consequently, the information on where these critical datasets reside diminishes or is completely lost over time.
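The fix for broken pathnames is to key datasets by a stable identifier rather than by physical location, so a migration updates the location record without breaking references. A minimal sketch of that idea, with all names and paths hypothetical:

```python
# Hypothetical sketch: a catalog that keys datasets by a stable ID
# rather than by physical path. When IT migrates storage, only the
# location record changes; every reference by ID keeps working.

class DatasetCatalog:
    """Maps stable dataset IDs to their current physical locations."""

    def __init__(self):
        self._locations = {}  # dataset_id -> physical path

    def register(self, dataset_id, physical_path):
        self._locations[dataset_id] = physical_path

    def migrate(self, dataset_id, new_physical_path):
        # IT moves the file during a technology refresh;
        # the stable ID stays the same.
        self._locations[dataset_id] = new_physical_path

    def locate(self, dataset_id):
        return self._locations[dataset_id]


catalog = DatasetCatalog()
catalog.register("turbine-7/simulation-run-42", "//nas-old/sims/run42.dat")

# Hardware refresh: the file moves, but callers still look it up by ID.
catalog.migrate("turbine-7/simulation-run-42",
                "//nas-new/archive/sims/run42.dat")
print(catalog.locate("turbine-7/simulation-run-42"))
# -> //nas-new/archive/sims/run42.dat
```

In a real enterprise the catalog itself must be durable and shared, which is exactly the collaboration between engineering and IT the article argues for.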

When you deal with product lifecycles that exceed 10 years, employee turnover also plays a role in losing these assets. Ultimately, these critical datasets go “dark” over time. Aggregating dispersed datasets isn’t easy, either. Even if you could locate such a file, it may be too big to move: some simulations approach 100 gigabytes, far more than can travel through the network via email or other traditional means.

Now imagine the needle-in-a-haystack exercise when an employee attempts to retrieve and collaborate on a terabyte-size simulation in 2020. By that time, data will have multiplied tenfold from 2014, according to some estimates.

A Timeless Point of Access

Companies today have an unprecedented capability to “abstract” data from physical hardware, which enables IT to change out hardware, reconfigure the back end or otherwise move datasets around without disrupting end users’ access points. Freed from hardware, datasets that correlate with each other or otherwise belong together can be placed into a single virtual “namespace,” regardless of their respective physical locations, where they can be easily retrieved, commingled and/or used for remote collaboration.
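The namespace idea can be illustrated with a small sketch, assuming hypothetical mount points and back-end URLs: logical paths that end users see resolve through a mount table to physical back ends, so IT can retarget a back end without changing any logical path.

```python
# Illustrative sketch (all prefixes and back-end URLs hypothetical):
# a single virtual namespace whose logical prefixes resolve to
# different physical back ends.

class VirtualNamespace:
    """Maps logical path prefixes to physical storage back ends."""

    def __init__(self):
        self._mounts = {}  # logical prefix -> physical back end

    def mount(self, logical_prefix, backend):
        self._mounts[logical_prefix] = backend

    def resolve(self, logical_path):
        # Longest matching prefix wins, as with filesystem mounts.
        for prefix in sorted(self._mounts, key=len, reverse=True):
            if logical_path.startswith(prefix):
                return self._mounts[prefix] + logical_path[len(prefix):]
        raise KeyError(logical_path)


ns = VirtualNamespace()
ns.mount("/designs/uk", "nfs://lab-uk.example.com/vol1")
ns.mount("/telemetry", "s3://fleet-telemetry")

print(ns.resolve("/designs/uk/turbine/schematic.step"))
# -> nfs://lab-uk.example.com/vol1/turbine/schematic.step

# Years later, IT migrates the UK lab's storage. The logical path
# engineers use is unchanged; only the mount is retargeted.
ns.mount("/designs/uk", "nfs://datacenter-us.example.com/archive")
print(ns.resolve("/designs/uk/turbine/schematic.step"))
```

The point of the sketch is the indirection itself: because users hold only logical paths, hardware refreshes stop breaking links.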

Settling on the right infrastructure can be a huge effort for an enterprise. For industrial manufacturers racing to deliver next-generation heavy machinery design and maintenance, a single unified namespace ensures that engineers know where to find datasets relevant to an individual piece of serialized equipment—for example, a turbine, engine, fuselage, pump or drill. This could include the important informal “homework” notes that illustrate how the original engineers were thinking about specifications and other design details. These specifics are often buried in numerous laptops, desktops and other unhelpful archiving devices around the globe. When businesses can easily access these notes, and pair them with sensor-generated telemetry feeds, they are in a much better position to maintain and manufacture future devices.

We are living in an age of unprecedented data generation and reuse. The possibilities for innovation are only as good as the tools the enterprise implements to find valuable data. Engineers in 2016 cannot afford to be blind to insights recorded many years ago by design and testing engineers, insights that could help solve the problems in front of them today.