
The Importance of Data for Applications and AI

It is time to rethink the approach to solving problems with software. The new approach must be data first, not application first. Data is the key to every downstream activity. Without considering the data first, we will continue to incur high-interest technical debt. Dependencies on data cost more than software dependencies, yet they are constantly overlooked. Why is this important? Business applications generate the data used by downstream artificial intelligence (AI) and analytics tools. It is time to recognize that these downstream use cases are the most critical ones; no longer should AI and analytics be an afterthought. Then there is the notion of undeclared consumers. Whether we like it or not, this has happened in every business where someone has access to data and a keen interest in using it to solve their own problems.

How did this come to be? Common software development practices tend to treat data only as a side effect of an application rather than as the driver of every downstream activity within an organization. Other industries work differently. Consider the oil and gas industry, where the upstream side of the house finds oil and makes it available to the downstream side of the business. These companies understand that finding and safely delivering the oil is the most important part of the pipeline.


A root cause of this problem within most software development organizations is that it is easier to create an application and worry about the data later. This leads to software solutions that are more complex than necessary. Occam's razor holds that entities should not be multiplied beyond necessity. Putting a data-first focus in place within an enterprise architecture is possible but can be difficult to implement without support from the organization. If a data-first approach hasn't been the norm in the past, extra work is necessary for a successful first iteration. The process gets easier with each successive one.

Key Organizational Requirements

Here are some key organizational requirements that are often overlooked. Downstream use cases far outnumber the initial use case that created the data. This does not stop with the top-level applications in an organization; it is equally important for anyone depending on the output of machine learning models. Keeping track of data consumers matters because those downstream tools and applications also need to operate securely. Data sources must be versionable, along with extract-transform-load (ETL) workflows and analytics jobs. Machine learning models and their outputs also need to be versionable, and so does all that real-time data, because it is being used for analytics and modeling.
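One common way to make a data source versionable is content addressing: hash a canonical serialization of the data so that identical content always maps to the same version id, while any change produces a new one. The article does not prescribe a mechanism, so the sketch below is only illustrative; the `version_dataset` helper and `registry` store are hypothetical names, not part of any real tool.

```python
import hashlib
import json
from datetime import datetime, timezone

def version_dataset(records, registry):
    """Register an immutable version of a dataset, keyed by content hash.

    records  -- list of JSON-serializable rows
    registry -- dict acting as a toy version store
    Returns a short version id derived from the content.
    """
    # Canonical serialization: sorted keys mean the same content
    # always produces the same bytes, and therefore the same hash.
    payload = json.dumps(records, sort_keys=True).encode("utf-8")
    version_id = hashlib.sha256(payload).hexdigest()[:12]
    # Store each distinct version once; re-registering identical
    # content is a no-op that returns the existing id.
    if version_id not in registry:
        registry[version_id] = {
            "created": datetime.now(timezone.utc).isoformat(),
            "rows": len(records),
            "data": records,
        }
    return version_id

registry = {}
v1 = version_dataset([{"id": 1, "amount": 10}], registry)
v2 = version_dataset([{"id": 1, "amount": 12}], registry)  # changed data, new version
v1_again = version_dataset([{"id": 1, "amount": 10}], registry)  # same content, same id
```

The same idea extends to ETL outputs and model artifacts: an analytics job can record the version ids of its inputs, which makes downstream results reproducible even as the underlying data keeps changing.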

Three Approaches to New Solutions

Let’s discuss three ways for an enterprise to approach new solutions.

Approach one starts with the underlying technologies already in place within the organization. Unless your organization is very young and built on a platform that can truly fulfill all of your future needs, legacy technologies must be taken into consideration. It is easy to throw more applications on top of the same old technologies; after all, standard operating and disaster recovery procedures already exist for them. It is unlikely, however, that those legacy technologies will deliver the benefits available to users of newer technologies. The key advantage of this option is that it is likely the fastest way to get a new solution into a production-ready environment.

