Splice Machine is launching a connector that aims to boost IoT and machine learning applications.
The connector, called the Native Spark DataSource, provides a fast, native, ACID-compliant datastore for Spark and also opens up Splice Machine’s underlying Apache Spark engine so that applications can directly use advanced capabilities such as Spark SQL, Spark Streaming, and MLlib or R for machine learning.
The connector enables data engineers, data scientists, and developers to use Spark directly, without excessive data transfers in and out of Splice Machine.
The connector is now part of Splice Machine’s community edition, with a simple query example and a streaming example also available on GitHub.
Apache Zeppelin notebooks with streaming and machine learning examples of the native Spark DataSource are also available on Splice Machine’s Cloud Service.
New functions include the ability to:
- Create Table - create a Splice Machine table from the schema of a Spark DataFrame
- Insert - insert the rows of a DataFrame into a Splice Machine table
- Update - update the rows of a Splice Machine table specified by a DataFrame
- Upsert - update or insert the rows of a Splice Machine table specified by a DataFrame
- Delete - delete the rows of a Splice Machine table specified by a DataFrame
- Query - issue a SQL query and return the result set as a DataFrame
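The semantics of these operations can be illustrated with a small in-memory mock. This is not the Splice Machine API, just a sketch of how insert, update, upsert, delete, and query behave against a table keyed on a primary-key column (here assumed to be `id`); in the real connector, the rows would come from a Spark DataFrame.

```python
# Illustrative in-memory mock of the connector's row operations.
# NOT the Splice Machine API: it only sketches the semantics of
# insert/update/upsert/delete/query for a table with a hypothetical
# primary-key column named "id".

class MockTable:
    def __init__(self):
        self.rows = {}  # primary key -> row dict

    def insert(self, df_rows):
        # Insert rows; a duplicate key is an error, as in SQL INSERT.
        for row in df_rows:
            if row["id"] in self.rows:
                raise ValueError(f"duplicate key {row['id']}")
            self.rows[row["id"]] = dict(row)

    def update(self, df_rows):
        # Update only rows whose key already exists; any number of
        # columns can be changed at once.
        for row in df_rows:
            if row["id"] in self.rows:
                self.rows[row["id"]].update(row)

    def upsert(self, df_rows):
        # Update the row if the key exists, otherwise insert it.
        for row in df_rows:
            if row["id"] in self.rows:
                self.rows[row["id"]].update(row)
            else:
                self.rows[row["id"]] = dict(row)

    def delete(self, df_rows):
        # Delete the rows matched by key.
        for row in df_rows:
            self.rows.pop(row["id"], None)

    def query(self, predicate):
        # Return matching rows, standing in for "SQL query -> DataFrame".
        return [dict(r) for r in self.rows.values() if predicate(r)]
```

For example, upserting a row with an existing key overwrites its columns, while upserting a new key inserts it; an update against a missing key is simply a no-op in this mock.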
Other features include:
- ACID transactions on all CRUD operations
- Automatic preservation of ACID properties on secondary indexes during CRUD operations
- Updates that can modify any number of columns simultaneously
- Result sets returned as lazily evaluated Spark DataFrames, with instructions pipelined through Spark’s RDD structures
Splice Machine provides the Native Spark DataSource API in Java, Scala, and Python.
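As a rough illustration of how the Python flavor of the API is used, the sketch below follows the pattern described in Splice Machine’s documentation. It is non-runnable pseudocode here: it assumes a live Splice Machine cluster, the vendor’s `splicemachine` package, and a table named `SPLICE.MY_TABLE`, and the import path and method names are assumptions drawn from that documentation rather than verified calls.

```python
# Non-runnable sketch: assumes a live Splice Machine cluster and the
# vendor's splicemachine package; names below are assumptions.
from pyspark.sql import SparkSession
from splicemachine.spark.context import PySpliceContext  # assumed import path

spark = SparkSession.builder.appName("splice-example").getOrCreate()
splice = PySpliceContext(spark)  # context bound to the Splice Machine engine

# Query: issue SQL and get back a lazily evaluated Spark DataFrame.
df = splice.df("SELECT * FROM SPLICE.MY_TABLE")

# Insert: write the rows of a DataFrame into a Splice Machine table,
# within an ACID transaction.
splice.insert(df, "SPLICE.MY_OTHER_TABLE")
```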
For more information about this news, visit www.splicemachine.com.