<< back Page 3 of 5 next >>

Architecting the Modern Enterprise: 10 Key Technologies for a Strong Foundation


The blockchain provides an irrevocable record of transactions that is shared by all participants in the network. Data is added to the blockchain in an append-only manner; once recorded, data cannot be changed or deleted. And every participant has access to the same data, rather than an independent, perhaps different, version of the data, as would be the case when each organization keeps its own ledger (database).
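To illustrate the append-only idea, here is a minimal sketch in Python of a hash-chained ledger. It is a toy, not how any particular blockchain platform is implemented: the `Ledger` class, the `block_hash` helper, and the sample transactions are all hypothetical, and there is no network, consensus, or privacy layer. It shows only why a change to recorded data is detectable: each entry's hash covers the previous entry's hash.

```python
import hashlib
import json


def block_hash(index, prev_hash, payload):
    """Hash a block's contents together with the previous block's hash."""
    body = json.dumps(
        {"index": index, "prev": prev_hash, "payload": payload}, sort_keys=True
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


class Ledger:
    """Toy append-only ledger (hypothetical, for illustration only)."""

    def __init__(self):
        self.blocks = []

    def append(self, payload):
        # Each new block records the hash of the block before it.
        prev_hash = self.blocks[-1]["hash"] if self.blocks else "0" * 64
        index = len(self.blocks)
        block = {
            "index": index,
            "prev": prev_hash,
            "payload": payload,
            "hash": block_hash(index, prev_hash, payload),
        }
        self.blocks.append(block)
        return block

    def is_valid(self):
        # Recompute every hash and check that the chain links line up.
        prev_hash = "0" * 64
        for i, block in enumerate(self.blocks):
            if block["index"] != i or block["prev"] != prev_hash:
                return False
            expected = block_hash(block["index"], block["prev"], block["payload"])
            if block["hash"] != expected:
                return False
            prev_hash = block["hash"]
        return True


ledger = Ledger()
ledger.append({"from": "A", "to": "B", "amount": 10})
ledger.append({"from": "B", "to": "C", "amount": 4})
valid_before = ledger.is_valid()

# Simulate someone editing a recorded transaction after the fact.
ledger.blocks[0]["payload"]["amount"] = 1000
valid_after = ledger.is_valid()
```

In this sketch the ledger validates before the edit and fails validation afterward, which is the property that lets every participant trust that they are looking at the same, unaltered history.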

Think of the blockchain as a mechanism that delivers four fundamental capabilities. First, it provides a shared ledger: a distributed system of record that spans a business network. Second, the business terms, or contract, are embedded in the blockchain. Third, it offers privacy, thereby ensuring that only appropriate parties have visibility into the secure, authenticated, and verifiable transactions. And finally, it delivers a trusted account that is endorsed by all relevant participants in the network.

Blockchain offers many potential benefits, including verifiable transactions, a complete chronology of all transactions, and shared access across the network to the same information. For these reasons, it is seen as having strong potential for market segments such as banking and financial services, retailing, healthcare, manufacturing, and logistics.

Hadoop, Spark, and the Data Lake

It is no secret that organizations are storing ever-increasing amounts of data of all types from disparate internal and external sources. As a result, traditional DBMS products are not ideal for storing all of that data.

Much of the data being generated and collected today is unstructured with no fixed schema, and is therefore not suited for relational storage. But the data is useful, particularly for data analysts and data scientists trying to derive patterns and knowledge from the data.

Hadoop does not require any specific data structure or formatting; it can store and process any type of data. It employs a “schema on read” approach, in which structure is applied when the data is read rather than when it is written, which makes it well suited for analytics processing. However, Hadoop can be slow, as it is batch-oriented by nature.
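The “schema on read” idea can be sketched in a few lines of plain Python. This is not Hadoop code; the `raw_store` list, the `purchase_schema` mapping, and the `read_with_schema` function are hypothetical names chosen for illustration. The point is that records are stored untouched, and each reader decides which fields and types it cares about at query time.

```python
import json

# Raw records are stored as-is; nothing is validated or reshaped on write.
raw_store = [
    '{"user": "ann", "amount": "19.99", "ts": "2017-03-01"}',
    '{"user": "bob", "amount": 5, "note": "no timestamp"}',
    "free-form log line that is not JSON",
]

# A schema here is just field name -> converter, chosen by the reader.
purchase_schema = {"user": str, "amount": float}


def read_with_schema(lines, schema):
    """Apply a schema at read time, skipping records that do not fit it."""
    rows = []
    for line in lines:
        try:
            record = json.loads(line)
            rows.append({name: cast(record[name]) for name, cast in schema.items()})
        except (ValueError, KeyError, TypeError):
            continue  # record does not match this reader's schema
    return rows


rows = read_with_schema(raw_store, purchase_schema)
```

A different analyst could read the same `raw_store` with a different schema, for example one that requires the `ts` field, without any change to the stored data.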

Additional technologies can be combined with Hadoop to improve its capabilities. Components of the Apache Hadoop ecosystem such as Hive and Impala can be used to add a schema to Hadoop data and enable analysts to process the data in a table format. Spark can be used to speed up Hadoop processing by adding in-memory capabilities, and it also provides libraries for machine learning and graph computation.

Frequently, Hadoop is used to implement data lakes, which are storage repositories that can hold vast amounts of raw data in its native format until needed. However, it is a mistake to conclude that data lakes will replace data marts and data warehouses. A data warehouse, as defined by Bill Inmon, is “a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision-making process.” Contrast that with a data lake, where data is captured and stored with no transformation or aggregation. A data warehouse contains data transformed from multiple sources and is designed for business users. A data lake cannot serve the same purpose unless the data is modified from its “native format,” at which point it stops being a data lake by definition.
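The contrast can be made concrete with a small Python sketch. The data, the source names `web_orders` and `store_sales`, and the `load_warehouse` function are all invented for illustration. The lake holds each source's records exactly as produced, with differing field names and units; the warehouse-style load integrates them into one customer-oriented, aggregated result.

```python
from collections import defaultdict

# "Lake": records kept exactly as each source produced them (hypothetical data).
lake = {
    "web_orders": [
        {"cust": "C1", "total_usd": "20.00", "day": "2017-03-01"},
        {"cust": "C2", "total_usd": "5.50", "day": "2017-03-01"},
    ],
    "store_sales": [
        {"customer_id": "C1", "cents": 1250, "date": "2017-03-02"},
    ],
}


def load_warehouse(lake):
    """Integrate both sources into one customer-oriented, aggregated table."""
    totals = defaultdict(float)
    for row in lake["web_orders"]:
        totals[row["cust"]] += float(row["total_usd"])
    for row in lake["store_sales"]:
        totals[row["customer_id"]] += row["cents"] / 100  # cents -> dollars
    return {cust: round(amount, 2) for cust, amount in totals.items()}


customer_sales = load_warehouse(lake)
```

Once the records have been renamed, converted, and summed, the result is no longer in its native format, which is why the transformed table belongs in a warehouse or mart rather than in the lake itself.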

At any rate, enterprises need to understand and deploy technologies such as Hadoop and Spark to be able to manage and process the mountains of data that are becoming commonplace.

The Emergence of AI and Machine Learning

Artificial intelligence, or AI, has been touted as the imminent future of computing ever since the 1950s. Only now, however, is a confluence of events leading many enterprises to actually adopt AI technologies.

AI is the practice of enabling computers to effectively mimic human thinking and actions. AI technology allows computers to take on activities that heretofore were not within their domain, including things like learning, reasoning and self-correction.
