Into the Open – Splice Machine Launches Community Edition

Companies have long developed enterprise versions of software based on or augmenting open source offerings, beefing them up with additional features not available in the community editions. Splice Machine, which provides a hybrid in-memory RDBMS powered by Hadoop and Spark, is flipping that model. In conjunction with the general availability of its V2.0 release, the company recently announced an open source standalone and cluster download and the launch of its developer community site. The company also announced a cloud-based sandbox, powered by AWS, for developers who want to put its new open source 2.0 Community Edition to the test.

Monte Zweben, CEO and co-founder of the company, which was founded in 2012, spoke with Big Data Quarterly about why Splice Machine is rolling out an open source Community Edition - and why it is doing so now.

Why is Splice Machine launching an open source edition in addition to its licensed enterprise edition?

Monte Zweben: It benefits us and benefits our customers. The clear reason to do open source as a software company is adoption. The difference between adoption of an open source project and a proprietary software package is profound; it is orders of magnitude.

How is it different?

MZ: Think of a sophisticated developer who is trying to build a new application and working to get acquainted with the technologies that are out there. Before, a developer could download a standalone version, use it on their laptop, and have a great experience. But if they were building a big data application and needed to see it at scale, they would need a cluster version - a version of the database that runs not on a single laptop but on a cluster of nodes, whether that is 4 nodes, 8 nodes, or 50 nodes, depending on what they want to experiment with.

And what happens next, after they download the trial version, is that a sales team member calls them and, using tried and true tactics for qualifying the opportunity, asks the developer a lot of questions about what they are trying to do: when the project is kicking off, what decision making is going into which database to use, who has decision-making power, and what the budget is. The developer, who is just trying to get up to speed, shuts down and says that when it gets to be real they will be happy to answer the questions. For Splice Machine, that is a lost opportunity to gain an advocate and build a community.

And, with the open source community version?

MZ: With the new open source version, if a developer wants to scale up, they can access a sandbox on our site, put in a few configuration parameters, and spin up a Splice Machine cluster of any size to try it at scale. No salesperson will call them, and they will even be able to look at the source code for the sandbox they are executing on and see how it works. That is a fundamentally different experience, one that inverts the rules of engagement with developers. And you can see why adoption is going to be radically different for us as an open source company.

And the other reason you are doing this?

MZ: We are being compelled to by customers. One of the CTOs at a large bank that is a customer of ours was incredibly supportive of open source because it is an insurance policy for them. It allows them to avoid being locked into a particular vendor. It also allows them to gain resources from a variety of places and get answers to questions, and it mitigates their risk, because technology companies are often acquired or get distracted by other opportunities. The existence of a vibrant open source community around Splice Machine gives them a great deal of comfort that the technology will persist beyond Splice Machine itself in the event that the company were to suddenly go down a different path.

Everybody assumed that enterprises embrace open source because it is free, but that is not it whatsoever; it is this insurance policy and risk mitigation.

Why now? Why didn’t you do this earlier?

MZ: That has to do with critical mass. I don't think we were mature enough to manage a multi-disciplinary, multi-company project before. Most projects start within one organization - sometimes a company, sometimes a university: Spark started at UC Berkeley, Kafka at LinkedIn. We needed to get to critical mass. With our 2.0 product finally reaching that level of maturity, we have the wherewithal to manage an open source project with a constituency of people around Splice Machine - and not just in Splice Machine.

How important is the development of a community?

MZ: It is a very important focus for us and very important investment for us.

One of the things I have been coached on, as I seek assistance in our open sourcing efforts from leaders in the HBase, Kafka, Spark, and Hadoop communities, is that community doesn't just emerge because you make your source code available. You have to invest in building community. I took that to heart, and we are investing a great deal of our developer resources across the company in building a community engagement model that is three-fold.

What are the tenets of your approach?

MZ: First, we will have a community site with a robust set of videos, tutorials, and code snippets that will embrace the community and teach developers how to use Splice Machine in each of the programming languages. Or, if you are a data scientist and you want to see how to use a machine-learning algorithm on data gathered in Splice Machine, you can learn how to do feature engineering and how to run analytics.
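As an illustrative sketch only - using Python's standard-library sqlite3 as a stand-in for a Splice Machine connection (Splice Machine itself is ANSI-SQL compliant and typically accessed over JDBC/ODBC; the table and columns here are hypothetical) - the kind of SQL-side feature engineering described above might look like:

```python
import sqlite3

# Stand-in for a Splice Machine connection; in practice you would
# connect over JDBC/ODBC since Splice Machine speaks ANSI SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (device_id TEXT, temp REAL, ts INTEGER)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [("a", 20.0, 1), ("a", 22.0, 2), ("b", 30.0, 1), ("b", 34.0, 2)],
)

# Feature engineering pushed down into SQL: per-device aggregates
# that could feed a downstream machine-learning model.
features = conn.execute(
    """
    SELECT device_id,
           AVG(temp)             AS mean_temp,
           MAX(temp) - MIN(temp) AS temp_range,
           COUNT(*)              AS n_readings
    FROM readings
    GROUP BY device_id
    ORDER BY device_id
    """
).fetchall()

for row in features:
    print(row)
```

Computing features in SQL keeps the heavy aggregation close to the data, so only the small feature table needs to be pulled into the modeling environment.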

We will actually show snippets of code running live - and in the future even Google TensorFlow - inside the community site. And, of course, if you are an IoT developer with a streaming set of inputs from devices, you can take streaming interfaces like Apache Kafka and Spark Streaming, ingest data from dozens if not thousands of devices, continuously process the streamed input, and build a modern Lambda architecture in Splice Machine - all of that will be available.
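The Lambda architecture mentioned above pairs a speed layer (incremental, real-time updates) with a batch layer (periodic recomputation over the full dataset). A minimal, self-contained Python sketch - with a plain list simulating the device stream that would, in a real deployment, arrive via Apache Kafka or Spark Streaming - could look like:

```python
from collections import defaultdict

# Simulated device stream; in a real deployment these events would
# arrive continuously via Apache Kafka or Spark Streaming.
events = [("sensor-1", 5), ("sensor-2", 3), ("sensor-1", 7), ("sensor-2", 1)]

# Speed layer: incrementally maintain real-time totals as events arrive.
realtime_totals = defaultdict(int)
master_dataset = []          # append-only log, the batch layer's input
for device, value in events:
    master_dataset.append((device, value))
    realtime_totals[device] += value

# Batch layer: periodically recompute totals from the full master
# dataset (in a database like Splice Machine this could be a single
# SQL aggregate over the stored table).
def batch_view(dataset):
    totals = defaultdict(int)
    for device, value in dataset:
        totals[device] += value
    return dict(totals)

# Serving layer: the batch view and the speed view agree once the
# batch recomputation catches up with the stream.
print(batch_view(master_dataset))
print(dict(realtime_totals))
```

The speed layer gives low-latency answers between batch runs, while the batch layer's full recomputation corrects any drift - the core trade-off the Lambda pattern is built around.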

We are inverting the transparency of the company with regard to not just the source code, but also how the source code is being used.

What is the second part?

MZ: We are actively engaging on Stack Overflow and Slack and providing forums for our customers and our community to ask us questions. We have embraced that to the point where we have asked every developer in the organization to organize themselves to monitor these forums and be as responsive to the community as they can. We are actually planning on them investing about 20% of their time on this. We think that is necessary to really build community.

And the third?

MZ: We have engaged with the mentors and champions of many of the projects in the space today - leaders of the open source community - to mentor and champion us so that when we do apply to the Apache Software Foundation to become an official Apache project, we have our act together.

Those are the things we are doing to build community. We are going to really embrace community, be open, and even have people outside our organization on our project management committee (PMC). We are really going to try to inculcate a DNA of openness.

In addition to the new free community edition, Splice Machine continues to offer a licensed enterprise edition.

Splice Machine is hosting a webinar on Thursday, July 28, to share information about its new offerings.

This interview has been edited and condensed. Image is courtesy of Shutterstock.
