Pivotal Expands on Data Lake Vision with Embrace of Project Tachyon

Building on its data lake vision, Pivotal today announced the next step in that approach: implementing an architecture that augments disk-based storage with memory-centric processing frameworks.

Earlier this year, Pivotal and EMC introduced a Data Lake Hadoop solution that combines massively scalable enterprise storage arrays with big data and analytics capabilities. The solution pairs EMC Isilon's scale-out storage with Pivotal HD (Hadoop Distribution) and Pivotal HAWQ, a massively parallel processing, SQL-compliant query engine.

Pivotal is now actively dedicating resources to an open source project called Tachyon. Led by UC Berkeley PhD candidate Haoyuan Li, Tachyon is a memory-centric, fault-tolerant distributed file system that enables data exchange at in-memory speed across cluster frameworks.

Pivotal is the top corporate contributor to the Tachyon code base, and it is also supporting Tachyon’s development and integration with the Pivotal Big Data Suite through its Research Fellowship program.

According to a blog post by Paul M. Davis on the Pivotal site, Pivotal’s view is that the future of the data lake will include an in-memory data exchange platform based on Tachyon and an in-memory compute layer augmented by Apache Spark, an engine for large-scale data processing.

Davis says Pivotal refers to the resulting next-generation data lake implementation, based on Spark and Tachyon, as a “butterfly architecture.”

“Within this model, Tachyon provides a memory-centric caching layer for disparate data sources, and allows the tracking of data lineage, independent of the computation framework. It will serve as an efficient memory-based data exchange layer within the data lake, and is pluggable, enabling existing storage and processing systems to co-exist with the new framework,” writes Davis.

Davis adds that Pivotal believes Tachyon will improve how in-memory processing works with file storage, such as HDFS, and that Pivotal partner EMC is looking into integrating Tachyon with the advanced flash storage product DSSD, as well as Isilon technologies.
