
Tooling Up for Analytics


These technologies, Kubernetes and Docker in particular, allow applications to be packaged in ways that make them highly portable across clouds and able to scale elastically on a cloud platform, noted Guy Harrison, CTO of Southbank Software. “Microservices are the modern equivalent of a modular code architecture since they allow loose coupling between cloud-based application components.”

Containers provide a strong model of isolation, separating the software operating environment from the environment in which it is physically deployed, added Jim Scott, director of enterprise strategy and architecture at MapR. “This creates a substantial value proposition for those who want to run software in more than one location.” In the cloud, where companies don’t know what hardware is under the covers, containers simplify the deployment and movement of software applications from on-premises environments to the cloud. “This is a critical requirement when running cloud, multi-cloud, hybrid cloud/on-premise, or even multiple on-premise environments.” Containers, Scott added, “are great to use with microservices” because they are lightweight and can be physically isolated from one another.
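To make the portability point concrete, here is a minimal sketch of a container-friendly service in Python: it is stateless and takes all of its configuration from environment variables, so the same image can run unchanged on-premises or in any cloud. The SERVICE_PORT and DATABASE_URL variable names are illustrative, not from any specific product.

```python
# A minimal sketch of a container-friendly microservice: stateless,
# configured entirely through environment variables so the same image
# runs unchanged on premises or in any cloud.
import json
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

PORT = int(os.environ.get("SERVICE_PORT", "8080"))   # injected by the orchestrator
DB_URL = os.environ.get("DATABASE_URL", "")          # backing store lives elsewhere

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Health endpoint that Kubernetes-style probes can poll.
        body = json.dumps({"status": "ok", "db_configured": bool(DB_URL)}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("", PORT), Handler).serve_forever()
```

Because the service holds no local state and learns everything from its environment, the orchestrator can stop, move, or replicate it freely, which is exactly the loose coupling Harrison describes.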

Data Governance in the GDPR Era

Data governance and data security initiatives are getting heightened attention with the recent implementation of the General Data Protection Regulation (GDPR) in the European Union; the California Consumer Privacy Act of 2018, which takes effect in 2020; and an assortment of other regulations surrounding the handling of data.

While most organizations still view governance as a cost of doing business that slows them down, some have realized that governance requirements are forcing them to gather data and create views that no individual part of the business would do on their own, noted Joe Pasqua, EVP for MarkLogic. “Doing it right means collecting the data as-is—not transforming away value or dropping things on the floor—harmonizing it incrementally, and building business-level microservices to access it. This can result in a transformative tool for the business and deals with governance requirements along the way.”
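One way to read Pasqua’s “harmonizing it incrementally” advice is to keep raw records untouched and apply harmonization rules on read, so new rules can be layered in over time without reloading data. The sketch below illustrates that pattern in Python; the field names and the harmonizer itself are hypothetical.

```python
# A sketch of incremental harmonization: raw records are kept as-is,
# and lightweight harmonizers are applied on read, so new rules can be
# added over time without transforming away the original data.
from typing import Callable, Dict, List

Record = Dict[str, object]

raw_store: List[Record] = [
    {"cust": "Acme Corp", "country": "DE"},            # as delivered by system A
    {"customer_name": "acme corp", "country_code": "DEU"},  # as delivered by system B
]

harmonizers: List[Callable[[Record], Record]] = []

def harmonizer(fn: Callable[[Record], Record]) -> Callable[[Record], Record]:
    harmonizers.append(fn)   # register a rule without altering stored data
    return fn

@harmonizer
def unify_name(rec: Record) -> Record:
    # Hypothetical rule: reconcile two source naming conventions.
    name = rec.get("cust") or rec.get("customer_name") or ""
    return {**rec, "customer_name": str(name).title()}

def read_harmonized() -> List[Record]:
    # The business-level access path: raw data in, harmonized view out.
    out = []
    for rec in raw_store:
        for fn in harmonizers:
            rec = fn(rec)
        out.append(rec)
    return out

print(read_harmonized())
```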

Since GDPR went into effect in May 2018, and even in the months leading up to the compliance deadline, there has been an increased awareness of data governance and the risks to data security, observed Hortonworks’ Clinton. “Putting data into the hands of the customers who request it allows for a never-before-seen sense of control and efficiency in which businesses are forced to improve their security protocol—reducing data loss and improving peace of mind. In addition, data governance enables IT organizations to provide trusted data to business consumers, including analysts, data scientists, and others.”

The Rise of Graph DBs and Blockchain

Looking to the future, several data management technologies are poised to make an impact. “Graph databases are finally getting their day in the sun, in large part due to the challenges of data lake management,” said Radiant Advisors’ O’Brien. “These graph database implementations are well-suited to highlight relationships and store context for objects captured in data that are related to each other in a variety of ways.” Because graph database engines are easy to use, highly scalable, and fast-performing, they enhance and augment data lake management, semantic layers, and governance, while automated metadata acquisition and self-service metadata collection allow for anything to be connected to anything easily, he said.
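The relationship-centric model O’Brien describes can be sketched in a few lines: nodes carry properties, edges carry relationship types, and queries are traversals. A real graph database adds persistence, indexing, and a query language such as Cypher or Gremlin on top of this idea; the metadata entities below are invented for illustration.

```python
# A toy property graph built from plain dictionaries, illustrating why
# graph models suit metadata and relationship management: anything can
# be connected to anything by a named relationship.
from collections import defaultdict

nodes = {
    "orders_table": {"type": "dataset"},
    "sales_report": {"type": "dashboard"},
    "jane":         {"type": "analyst"},
}
edges = defaultdict(list)   # src -> list of (relation, dst)

def relate(src: str, rel: str, dst: str) -> None:
    edges[src].append((rel, dst))

relate("sales_report", "derived_from", "orders_table")
relate("jane", "owns", "sales_report")

def neighbors(node: str, rel: str) -> list:
    # One-hop traversal: the primitive every graph query builds on.
    return [dst for (r, dst) in edges[node] if r == rel]

print(neighbors("sales_report", "derived_from"))  # ['orders_table']
```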

In addition, blockchain, the distributed ledger technology, offers “vast opportunities” in enterprise environments, with the promise to fundamentally transform how business is being done by making business-to-business interactions more secure, transparent, and efficient, said Frank Xiong, group VP of product development at Oracle. “It allows enterprises to extend their boundary to reach out to their suppliers, business partners, distributors, and end customers and to carry out operational transactions in an automated way.” Internally, a large enterprise can use blockchain technology to further integrate and automate processes to streamline operations and improve productivity, he explained.
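The tamper-evidence underpinning those B2B interactions comes from hash chaining: each block commits to the hash of its predecessor, so altering any past transaction invalidates everything after it. The following minimal sketch shows just that mechanism; production blockchains add consensus protocols, digital signatures, and smart contracts on top.

```python
# A minimal hash-chained ledger: each block commits to the previous
# block's hash, so any alteration of past transactions is detectable.
import hashlib
import json
import time

def block_hash(block: dict) -> str:
    # Hash only the content fields, in a deterministic order.
    payload = {k: block[k] for k in ("ts", "txs", "prev")}
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()

def make_block(transactions: list, prev_hash: str) -> dict:
    block = {"ts": time.time(), "txs": transactions, "prev": prev_hash}
    block["hash"] = block_hash(block)
    return block

def verify(chain: list) -> bool:
    for i, block in enumerate(chain):
        if block["hash"] != block_hash(block):
            return False                      # block contents were altered
        if i > 0 and block["prev"] != chain[i - 1]["hash"]:
            return False                      # chain linkage was broken
    return True

chain = [make_block(["genesis"], "0" * 64)]
chain.append(make_block([{"from": "supplier", "to": "buyer", "amt": 100}],
                        chain[-1]["hash"]))
assert verify(chain)
```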

However, although blockchain may ultimately transform society, noted Southbank’s Harrison, “In the short term, there are not that many applications crying out for blockchain enablement. Private blockchains are commonly being deployed in B2B scenarios such as supply chains where multiple businesses need to coordinate. Public blockchains (Ethereum, for instance) are currently almost unused in an enterprise scenario.”

The industry has yet to see blockchain-enabled frameworks emerge that provide “killer use cases,” said Harrison. “For instance, I think that eventually companies that want to assert any sort of compliance, proof of payments—or anything else that might have legal implications—will do so on the blockchain. But for now, there isn’t an easy way to do that—we need application frameworks to emerge that bake that capability in.”

As enterprises evaluate potential blockchain solutions, it’s important that they consider the tool sets offered for building new blockchain applications and for interoperating with other blockchain networks and existing applications, said Oracle’s Xiong.

Key Open Source Projects

Three Apache projects are seen as being on a strong upward trajectory for a variety of reasons.

Apache Arrow has expanded in popularity for in-memory columnar data processing, said Stirman, whose Dremio platform is based on Arrow. In addition, he said, the Gandiva Initiative, a new execution kernel for Arrow that will “dramatically accelerate” processing of Arrow data, was announced in June. Apache Arrow Flight, an RPC (remote procedure call) framework for Arrow, was also recently announced to provide a modern alternative to ODBC/JDBC and allow systems to exchange data more efficiently.
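For readers unfamiliar with Arrow’s columnar model, the short pyarrow example below shows data held in contiguous column buffers with compute kernels operating on whole columns at once. Gandiva and Flight build on this same in-memory representation and are not shown here; the table contents are invented for illustration.

```python
# A brief pyarrow sketch of in-memory columnar processing: values for
# each column live in one contiguous buffer, and compute kernels scan
# whole columns rather than individual rows.
import pyarrow as pa
import pyarrow.compute as pc

table = pa.table({
    "region": ["east", "west", "east"],
    "sales":  [120.0, 340.5, 99.9],
})

total = pc.sum(table["sales"])                      # columnar aggregation
east = table.filter(pc.equal(table["region"], "east"))  # columnar predicate
print(total.as_py(), east.num_rows)                 # 560.4 2
```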

“Kafka seems to be the default open source choice for developing integration pipelines,” noted Harrison, while MarkLogic’s Pasqua pointed out that—with the proliferation of data silos in the enterprise and the ascent of the data hub model—Apache NiFi has emerged as a way to easily route data from the silos into the hub.
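The integration-pipeline pattern Harrison mentions looks roughly like the sketch below, which uses the third-party kafka-python client; the broker address and the “orders” topic are assumptions for illustration. Producers publish events to a topic, and downstream consumers subscribe without any direct coupling to the source system.

```python
# A sketch of a Kafka-based integration pipeline using the third-party
# kafka-python client: the producer and consumer are fully decoupled,
# meeting only at the named topic.
import json
from kafka import KafkaConsumer, KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",              # assumed broker address
    value_serializer=lambda v: json.dumps(v).encode(),
)
producer.send("orders", {"order_id": 42, "status": "shipped"})
producer.flush()

consumer = KafkaConsumer(
    "orders",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda b: json.loads(b.decode()),
    auto_offset_reset="earliest",
    consumer_timeout_ms=5000,                        # stop polling after 5s idle
)
for msg in consumer:
    print(msg.value)   # each pipeline stage processes and can re-publish
```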

When data is coming from many different systems into a central data hub, organizations need to know where it came from, when, and using what processes, and NiFi helps to capture that flow of information, said Pasqua. In the new, more stringent regulatory environment, “the Wild West days” of data lakes just don’t cut it, Pasqua concluded.
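A minimal illustration of that provenance idea: wrap each record entering the hub with where it came from, when, and which process handled it, much as NiFi attaches attributes to each flow file. The field names below are illustrative, not NiFi’s actual schema.

```python
# A sketch of record-level provenance: every record entering the hub
# carries its source, ingestion time, and the process that routed it.
import time
import uuid

def with_provenance(payload: dict, source: str, process: str) -> dict:
    return {
        "payload": payload,
        "provenance": {
            "id": str(uuid.uuid4()),      # stable ID to trace the record
            "source_system": source,      # where the data came from
            "ingested_at": time.time(),   # when it entered the hub
            "process": process,           # which route/transform handled it
        },
    }

record = with_provenance({"sku": "A-17", "qty": 3},
                         source="erp_silo",
                         process="route_to_hub")
print(record["provenance"]["source_system"])  # erp_silo
```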
