StreamSets Raises $12.5 Million in Series A Funding to Solve Big Data Ingest Problem for Better Analytics

StreamSets Inc., a company that aims to speed access to enterprise big data, has closed a $12.5 million Series A funding round co-led by Battery Ventures and New Enterprise Associates (NEA), with participation from Accel Partners and Ignition Partners. The company also launched new data ingest infrastructure, called StreamSets Data Collector, which it says helps businesses accelerate data analysis and decision-making.

Available under the open source Apache License 2.0 (ALv2), the technology automates data movement to give data scientists and analysts continuous access to big data.

According to the vendor, the product addresses a common problem: because infrastructure and data semantics change constantly, operators spend too much of their time sanitizing raw data before it can inform business decisions, which slows the collection and movement of data for reliable analytics. StreamSets calls this problem “data drift” and addresses it by ingesting, cleansing, and monitoring data in motion to support real-time analysis.

“There is a massive opportunity for StreamSets’ technology to bring world-class transparency and monitoring to data: the next generation of performance management in enterprise IT,” said Pete Sonsini, general partner at NEA.

StreamSets co-founder and CEO Girish Pancha was previously chief product officer at Informatica, where he was responsible for the company’s entire data integration product portfolio. Co-founder Arvind Prabhakar was an early employee of Cloudera, where he led teams working on integration technologies such as Apache Flume and Apache Sqoop. Prabhakar, a member of the Apache Software Foundation, is heavily involved in the open source community and was formerly an architect for the Informatica platform.

“Over the years, Arvind and I have seen first-hand that the single biggest barrier to a successful enterprise analytics platform is the challenge of ingesting data. That problem is exacerbated when the data is constantly shifting underfoot,” said Girish Pancha, StreamSets co-founder and CEO. “Current solutions are simply too opaque and brittle to handle a fluid data landscape. We were inspired to start over from the ground up and bring unprecedented transparency and event processing to data in motion.”

Data infrastructure teams can download the open source StreamSets Data Collector software and join its community, or purchase a commercial subscription license for development or production support. StreamSets will use its Series A funding to build a thriving open source community, advance the company’s product roadmap, and incrementally invest in partnerships and other go-to-market activities. In addition, Pete Sonsini of NEA and Dharmesh Thakker of Battery Ventures will join the company’s board of directors.

StreamSets is headquartered in San Francisco.



Subscribe to Big Data Quarterly E-Edition