Anyone who has spent time in the highly creative tech world knows that the industry is a major source of new phrases and terms. In 2018, we helped make predictive and haptics part of Merriam-Webster, and there are sure to be more tech terms added in 2019. Here’s one that anyone working with data should learn: data pipelining.
This up-and-coming concept is a component of data orchestration, and you should know it because it helps enterprises overcome big data pain points. Realistically, you’ll need to move beyond just knowing the term and understand how to build a landscape in 2019 that supports data pipelines, as they are a catalyst for real-time analytics, machine learning, predictive analytics, IoT, and logical data warehousing.
These are the big data projects that executives are expecting to deliver meaningful value in the coming year, but not all of these projects have the right tools and resources in place. During the past few years, one recurring misstep has stalled big data projects: too many organizations have tried, and failed, to centralize data within one system or a Hadoop data lake. Data pipelines that automate processes across a logical data hub avoid that time-consuming requirement and allow data to stay where it is, so that organizations can start driving value from their data now—not later.
The new generation of data pipelines also supports governance, which is sure to be in the spotlight during 2019. Enterprises will need to be much more stringent about their data usage and lineage, based on new regulations such as GDPR and consumers’ higher expectations of security and privacy. Effective governance ensures access controls for the huge volumes of data now fueling the business. Separately, but just as important, governance assures that the data is trustworthy and accurate for automated operational processes and for the predictive or prescriptive analytics that shape business decisions.
Getting to Know Data Pipelines
Data pipelines focus on discovering a signal in one data silo and sharing that signal with all relevant data consumers. Data pipeline projects can move currently stalled big data efforts out of C-suite concept and into live production. This new approach not only makes data available; it allows you to quickly discover, understand, gather, cleanse and analyze all business-relevant information about a product, customer or market—regardless of where it’s stored or its current format. What makes data pipelines incredibly appealing is that they upend, in the best way possible, how your organization can leverage its data:
- Organizations can excel in customer-facing activities and respond quickly to changes in customer behavior and markets. These responses can include revised pricing, churn prevention, cross- and up-sell offers, and promotion optimization.
- Comprehensive, up-to-date data will lead to lower risk and cost by optimizing internal operations in areas such as predictive maintenance, supply chain optimization and fraud prevention.
- Expanded digital offerings and new business models, such as sales of customer, market and product data; and new analytics-as-a-service offerings, will create new revenue streams.
At their foundation, data pipelines break down the age-old problem of having huge volumes of data in multiple formats across different locations. Data pipelines deliver:
- Automated, centralized orchestration of data
- A single, cross-landscape data control center to manage governance policies
- Efficiently processed data from all sources to unlock new use cases
- Minimal data movement and duplication
Data pipelining achieves these goals by combining technologies and processes to provide repeatable, factory-like workflows that meet the multiple challenges that can slow or even stop traditional approaches. That speed is vital, as organizations in every industry now struggle to respond to fast-changing user needs and market conditions.
Most organizations are not yet equipped to fully operationalize data pipelines. To enable them, they will need to invest in a logical data hub project that either integrates with existing platforms or adds systems supporting data integration, data quality, in-memory computing, machine learning, cloud platforms, and a new generation of query and reporting tools capable of advanced analytics. They can assemble these capabilities themselves, or evaluate data pipelining solutions that include all of them within a single product.
Once the technologies and platforms are in place, enterprises will gain the automation and data orchestration available through data pipelines. This comprehensive approach will allow enterprises to consistently:
- Ingest the data. During this ongoing process, a data orchestration solution identifies and understands all the internal and external sources of data required for a complete picture of customer needs, operational trends and market conditions. This includes understanding when new data becomes available, where it’s stored and how it must be processed to become useful.
- Understand the data. Data is more powerful when more people can leverage it. Data pipelines allow correlations and relationships in the data to be made visible by offering metadata to users. By describing the source of a dataset, the type of information it contains or previous ways the data has been used, metadata can give business users new and different types of information for solving business problems. Easy-to-use metadata catalogs can help nontechnical users find, understand and use this information.
- Refine the data. Dirty data slows down processes that depend on speed. Data pipelines make data quality management and data cleansing easier, which is increasingly important in the context of real-time analytics and new data sources, such as IoT deployments.
- Enrich the data. Correlating data from the enterprise with data from other sources, such as social media, provides accurate, real-time insights that help the organization increase both sales and customer retention.
- Analyze the data in place. In-place analysis helps an organization understand activities that take place at the edge of the network—such as a production facility or store—and then take needed actions quickly enough to prevent an interruption in production or a lost sale.
- Automate processes. Automating these steps reduces both the time and the cost of performing business-critical data analyses. It also improves the consistency of an organization’s data management, which helps increase security and efficiency.
- Continuously improve. Organizations that track metrics—such as the cost and speed of data movement, curation and transformation, as well as workflow speed—can more easily improve them.
- Assure compliance. Security, privacy, and regulatory compliance can be automated with centralized, orchestrated processes such as encryption, identity management, access control and data usage audits.
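To make the steps above concrete, here is a minimal, hypothetical sketch of a pipeline in Python. The stage names, sources and data are illustrative only—a real deployment would use a data orchestration platform rather than plain functions—but the flow (ingest from multiple silos with lineage metadata, refine, enrich with an external feed, analyze in place) mirrors the sequence described above.

```python
def ingest(sources):
    """Gather raw records from every configured source, tagging each with lineage metadata."""
    records = []
    for name, rows in sources.items():
        for row in rows:
            records.append({**row, "_source": name})  # record where the data came from
    return records

def refine(records):
    """Cleanse: drop rows with no customer id and normalize casing/whitespace."""
    return [
        {**r, "customer": r["customer"].strip().lower()}
        for r in records
        if r.get("customer")
    ]

def enrich(records, social):
    """Correlate enterprise records with an external source (e.g. social sentiment)."""
    return [{**r, "sentiment": social.get(r["customer"], "unknown")} for r in records]

def analyze(records):
    """In-place aggregation: total spend per customer."""
    totals = {}
    for r in records:
        totals[r["customer"]] = totals.get(r["customer"], 0) + r.get("spend", 0)
    return totals

# Illustrative silos: a CRM export and web transactions, plus one external feed.
crm = [{"customer": "Acme ", "spend": 120}, {"customer": None, "spend": 50}]
web = [{"customer": "acme", "spend": 30}, {"customer": "Globex", "spend": 75}]
social = {"acme": "positive"}

result = analyze(enrich(refine(ingest({"crm": crm, "web": web})), social))
print(result)  # {'acme': 150, 'globex': 75}
```

Note how the dirty CRM row is dropped during the refine step, the two spellings of the same customer are reconciled, and the external feed is joined in before analysis—each stage is a small, repeatable unit that an orchestration layer could schedule, monitor and audit.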
In this data-centric environment, big data projects transition from being a problem that needs to be solved to a solution that delivers business value every day.
Proving Big Data’s Value
An Accenture analytics survey from 2014 showed that almost eight in 10 respondents (79%) agree that “companies that do not embrace big data will lose their competitive position and may even face extinction.” Even more (83%) have pursued big data projects in order to seize a competitive edge. Clearly, executives expect big things from their big data projects. In 2019, the question is: where are these projects now?
In the coming year, will you be able to show how and where big data is paying off? Instead of justifying delays in a data lake project, you can use data pipelines within a comprehensive data hub to access the data you need—whether it’s on premises, in the cloud, in a data warehouse or in a data mart. Data pipelines clear the path for big data successes.