Alluxio, developer of open source cloud data orchestration software, has introduced Alluxio Structured Data Service (SDS), featuring a Catalog Service and a Transformation Service, two major new architectural components of its Data Orchestration Platform. With these components, engineers, architects and developers can spend fewer resources on storing data and more time delivering data to analytical compute engines.
According to Alluxio, as users and enterprises leverage widely available analytics engines such as Presto, Apache Spark SQL or Apache Hive, they often run into inefficient data formats and face performance challenges. Typically, those engines consume structured data organized in databases with “tables” consisting of “rows” and “columns”, rather than as an “offset” and “length” within files or objects. This gap creates multiple challenges and inefficiencies, such as maintaining mappings between the two views or creating converted copies of the data. With this announcement, users benefit from a simpler data platform that connects to different catalogs for access to structured data, with fewer copies and pipelines and more compute-optimized data.
“With the new components, Alluxio now provides just-in-time transformation of data to be compute-optimized, independent of the storage format, for OLAP engines such as Presto and Apache Spark,” said Haoyuan Li, founder and CTO, Alluxio. “These schema-aware optimizations are made possible with the new Alluxio Catalog Service, which abstracts the widely used Apache Hive Metastore, so regardless of how the data was initially stored (CSV and text-formatted files, for example), the data is now transformed into the generally recognized compute-optimized Parquet format. Almost every organization has a surprising amount of data in CSV or other text formats, and this removes the manual work needed to make that data more usable. A second type of transformation coalesces many smaller files, combining the data into fewer files, which are more efficient for SQL engines to process. A third type of transformation, newly available in our Enterprise Edition, is sorting: table columns can be sorted, adding to the efficiency of queries.”
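As a rough illustration of the coalesce and sort transformations described above, the following Python sketch merges several small CSV inputs into fewer, larger outputs ordered by one column. It is not Alluxio's API or implementation: the function name `coalesce_and_sort` and its parameters are hypothetical, the inputs are in-memory strings rather than files in a storage system, and the conversion to Parquet is left out because it needs a third-party library.

```python
import csv
import io


def coalesce_and_sort(small_files, sort_column, max_rows_per_file):
    """Merge many small CSV texts into fewer, larger ones, sorted by one column.

    Hypothetical helper for illustration only; it assumes every input
    shares the same header row.
    """
    rows = []
    header = None
    for text in small_files:
        reader = csv.DictReader(io.StringIO(text))
        if header is None:
            header = reader.fieldnames
        rows.extend(reader)

    # Sort transformation: order all rows by the chosen column.
    # Values are compared as strings, since CSV carries no type information.
    rows.sort(key=lambda row: row[sort_column])

    # Coalesce transformation: write the rows back out in larger chunks.
    outputs = []
    for start in range(0, len(rows), max_rows_per_file):
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=header, lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows[start:start + max_rows_per_file])
        outputs.append(buffer.getvalue())
    return outputs


if __name__ == "__main__":
    parts = ["id,name\n3,c\n", "id,name\n1,a\n", "id,name\n2,b\n"]
    for merged in coalesce_and_sort(parts, "id", max_rows_per_file=2):
        print(merged)
```

Here three one-row inputs become two outputs, the first holding two rows. A query engine reading the result opens fewer files and can skip ranges of the sorted column, which is the efficiency gain the quote refers to.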
Alluxio 2.2 Community and Enterprise Editions with Structured Data Service are generally available for download at www.alluxio.io/download.