Intuitive, Robust Pipeline Creation on Databricks with Prophecy’s Low-Code Platform

Building robust, reliable, and highly performant data pipelines is critical for ensuring downstream analytics and AI success. Despite this need, many organizations struggle on the pipeline front, often resorting to exhausting, tedious manual labor that drains company resources.

Mei Long, product manager at Prophecy, and Nathan Tong, sales engineer at Prophecy, joined DBTA’s webinar, Build Data Pipelines on Databricks in 5 Easy Steps, to discuss various practical strategies and techniques to streamline data transformation, foster seamless collaboration between teams, and establish data transformation standards that value reliability and quality.

To produce quality data, enterprises have traditionally faced two options, according to Long—legacy ETL and a cloud franken-stack. Legacy ETL, while able to drive greater productivity, results in vendor lock-in and suffers from not being cloud native. A cloud franken-stack—a pieced-together amalgamation of technologies—offers strong code capabilities and overall performance, yet supports only a few users, with productivity taking a massive hit.

Compounding these lackluster choices, raw data is rarely suitable for immediate consumption. This means data transformation is crucial to building AI- and analytics-ready data products; without it, enterprises are left with inefficient, fragile data ecosystems.

To empower enterprises with the necessary tech to achieve analytics and AI success, Long introduced Prophecy, the low-code data engineering platform with native-to-cloud execution that spans data pipeline development, deployment, management, and orchestration. Centralizing access, intelligence, and systems layers within a single platform, Prophecy aims to democratize data pipeline creation while enforcing robust standards to ensure reliable, quality pipelines.

“What we [Prophecy] are very passionate about is to bring these folks [data platform teams, business data teams, data scientists, and data analysts] together in one arena,” explained Long. “All data players work in one arena as teams, rather than everyone off doing their own things, [resulting in] ...a lot of friction, nobody understands what the requirements are, miscommunications, and folks that are duplicating work.”

The Prophecy platform offers the following capabilities that empower data pipeline creation on Databricks:

  • Visual UI with a drag-and-drop interface for building pipelines
  • 100% open, git-committable code that is native to the underlying cloud data platform, enabling DataOps practices and preventing lock-in
  • Framework Builder, which lets users add a library of visual components, setting standards for data and enabling reuse across data stakeholders
  • Rapid generative AI app creation based on unstructured, enterprise data
  • Prophecy Data Copilot, enabling rapid pipeline generation based on natural language prompts

“What’s really important to us is providing a layer of a low-code, UI interface that’s giving folks the power to build these things consistently with best standards put into practice,” noted Long.

Moving on to the five steps for building data pipelines on Databricks, Long listed the following:

  1. Set up the environment for development and execution.
  2. Read and parse raw data from sources.
  3. Build a visual pipeline to transform data.
  4. Schedule and run pipeline workflows on Databricks.
  5. Commit and merge Spark code using Git.
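As a rough illustration of steps 2 and 3, the pipeline code a tool like Prophecy emits follows a familiar read → parse → transform shape. The sketch below mimics that shape using only the Python standard library rather than Spark (the source data, column names, and validation rule are hypothetical, not taken from the webinar):

```python
import csv
import io

# Step 2: read and parse raw source data (a hypothetical CSV feed).
raw = io.StringIO("customer_id,amount\n1,10.50\n2,abc\n3,7.25\n")
rows = list(csv.DictReader(raw))

# Step 3: transform - drop malformed records and cast types, the kind
# of cleanup a visual filter/cast component in a pipeline UI expresses.
def is_valid(row):
    try:
        float(row["amount"])
        return True
    except ValueError:
        return False

clean = [
    {"customer_id": int(r["customer_id"]), "amount": float(r["amount"])}
    for r in rows
    if is_valid(r)
]

total = sum(r["amount"] for r in clean)
print(len(clean), total)
```

Steps 4 and 5—scheduling the workflow on Databricks and committing the generated code to Git—happen outside the transformation logic itself, which is what allows the emitted code to stay open and portable.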

Tong then led webinar viewers through a live demo of these steps, building a pipeline on Databricks with Prophecy’s intuitive UI and AI copilot.

For the full discussion of building data pipelines on Databricks with Prophecy, you can view an archived version of the webinar here.