Page 3 of 6

Big Data 50—Companies Driving Innovation in 2020

Founded by the original creators of Apache Spark, Delta Lake, and MLflow, Databricks provides an open and unified platform for data engineering, machine learning, and analytics and is used by thousands of organizations worldwide—including Comcast, Condé Nast, and Nationwide.

Incorporating agile software development, DevOps, and lean manufacturing methods into analytics and data management, DataKitchen provides a DataOps platform that enables data and analytics teams to gain value from data and deploy features into production while automating quality.

The company behind a cloud-native NoSQL data platform built on Apache Cassandra, DataStax gives users and organizations, including 450 of the world’s leading enterprises, the freedom to run data in any cloud at a global scale.

A provider of data virtualization, Denodo helps customers gain business agility and ROI by enabling faster and easier access to unified business information for BI, big data analytics, web, cloud integration, single-view applications, and enterprise data services.

Big Data Trailblazer by Ravi Shankar, SVP & Chief Marketing Officer

Despite numerous advances in big data, one fundamental challenge remains: Contrary to the hype, big data systems failed to become the single repository for all enterprise data. Within big data implementations, organizations cannot move data of different types, store it in different formats, and make it accessible to different people in different departments of the same organization. Very often, no single individual has a holistic view of the business, and organizations still spend more time collecting data than analyzing it.

To integrate siloed data within big data systems, organizations still rely on ETL processes, which are scripted to move data in scheduled batches, so they cannot deliver data in real time or accommodate new sources without coding and re-testing. More challenging, these legacy processes cannot support modern data formats, such as streaming IoT data or unstructured data.

Data virtualization, however, is an integration technology that unifies data in real time without replicating it. With it, companies can establish a logical data fabric that seamlessly draws data from across the silos of a big data implementation, knitting them into an integrated view of the data, no matter the kind. Data virtualization can perform all of the necessary transformations on the fly and can be managed with little or no code. A logical data fabric provides a way to connect to all data without having to collect it.
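The "connect, don't collect" idea above can be pictured in a few lines of code. This is a deliberately minimal, hypothetical sketch, not Denodo's implementation: a thin virtual layer that holds only connector definitions and federates live sources into a unified view at query time, so no data is replicated. The source names (sales, crm) and fields are illustrative.

```python
# Minimal sketch of a logical data fabric: the layer stores only
# fetch functions, and joins live rows on demand instead of copying them.

class VirtualLayer:
    def __init__(self):
        self.sources = {}  # source name -> callable returning rows (dicts)

    def register(self, name, fetch):
        self.sources[name] = fetch

    def unified_view(self, key):
        """Join rows from every registered source on a shared key, on the fly."""
        merged = {}
        for fetch in self.sources.values():
            for row in fetch():                  # data stays at the source
                merged.setdefault(row[key], {}).update(row)
        return list(merged.values())

# Two siloed "systems," exposed as fetch functions (illustrative data)
sales = lambda: [{"cust_id": 1, "orders": 12}, {"cust_id": 2, "orders": 3}]
crm   = lambda: [{"cust_id": 1, "segment": "enterprise"}]

fabric = VirtualLayer()
fabric.register("sales", sales)
fabric.register("crm", crm)
print(fabric.unified_view("cust_id"))
```

The point of the sketch is the shape of the architecture: queries run against definitions, and the underlying silos are touched only when an answer is needed.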

The award-winning Denodo Platform offers the most advanced data virtualization capabilities available for establishing a logical data fabric to maximize big data investments. Its built-in data catalog provides seamless access to data via a searchable, contextualized interface, and in-memory parallel processing accelerates data access to unparalleled speeds.


Offering a data lake engine for fast querying and a self-service semantic layer operating directly against data lake storage, Dremio eliminates the need to copy and move data to data warehouses or create cubes, aggregation tables, and BI extracts.

Big Data Trailblazer by Billy Bosworth, CEO

DREMIO IS SHATTERING a 30-year-old paradigm that holds virtually every company back—the belief that in order to query data, it needs to be extracted and loaded into a costly, proprietary data warehouse. We’ve created a new analytics architecture that removes these limitations by pairing cloud data lake storage with a purpose-built data lake engine, accelerating time to insight for data consumers and increasing productivity for data engineers.

We achieve these results in four key ways:

1. Up to 1000x faster BI queries directly on data lake storage mean no copying and moving of data to proprietary data warehouses and no cubes, aggregation tables, or extracts, improving data team productivity.

2. A secure, self-service semantic layer enables data engineers to easily and securely aggregate and provision data from physical sources as virtual data sets, while empowering data analysts to create derived virtual datasets without creating copies of data.

3. An open data lake architecture that completely separates storage and data from compute and enables flexible, future-proof sharing of data in S3 and ADLS with multiple best-of-breed applications and processing engines.

4. A highly efficient query engine requires up to 90% less infrastructure compared to traditional SQL engines, dramatically shrinking cloud costs while increasing performance and flexibility.
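Point 2 above deserves a moment of unpacking: a virtual dataset is a stored definition, not a stored copy. The sketch below is purely illustrative (these class and method names are hypothetical, not Dremio's API): a derived dataset records only the transformation over its parent and materializes results at query time.

```python
# Hedged sketch of the "virtual dataset" idea: deriving a dataset creates
# a definition chained to its parent; no rows are copied until query time.

class VirtualDataset:
    def __init__(self, source, transform=lambda rows: rows):
        self.source = source        # a callable or another VirtualDataset
        self.transform = transform  # applied lazily, at query time

    def derive(self, transform):
        return VirtualDataset(self, transform)   # no data copied here

    def query(self):
        if isinstance(self.source, VirtualDataset):
            parent = self.source.query()
        else:
            parent = self.source()
        return self.transform(parent)

# "Physical" source: raw rows sitting in data lake storage (stand-in)
raw = VirtualDataset(lambda: [{"region": "EU", "amt": 10},
                              {"region": "US", "amt": 7},
                              {"region": "EU", "amt": 5}])

# An analyst's derived dataset: EU rows only, stored as a definition
eu_sales = raw.derive(lambda rows: [r for r in rows if r["region"] == "EU"])
print(eu_sales.query())
```

Because every derived dataset is just a chained definition, analysts can layer views on views without ever multiplying copies of the underlying data.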

Imagine how much more productivity and business insight your data teams will derive when they can get access to new data sets quickly and query that data interactively. If you’re ready to kick your data lake analytics initiatives into high gear, please contact us.


With a technology portfolio that includes data modeling, enterprise architecture, business process modeling, and data governance, erwin views data as an asset that must be inventoried, cataloged, protected, and made accessible in the right context so employees can act on it.

Big Data Trailblazer by Adam Famularo, CEO


In these times of great uncertainty and massive disruption, is your enterprise data helping you drive better business outcomes?

A data-driven approach has never been more valuable in addressing the complex yet foundational questions enterprises must answer. However, you can’t manage or govern what you can’t see, much less use it to make smart decisions.

Knowing what data you have, where it lives, and where it came from is complicated. The lack of visibility and control, as well as difficulties with legacy architectures, means organizations spend more time trying to find the data they need than using it to produce meaningful outcomes.

With erwin, you can stop wasting time discovering where your data is and start using it to produce real value. And because we automate many of the associated processes, both errors and reliance on technical resources are reduced, while the speed and quality of the data pipeline increase.

The erwin EDGE platform integrates enterprise modeling and data governance/intelligence so customers can:

  • Understand their business, technology and data architectures and the relationships between them
  • Create and automate a curated enterprise data catalog, complete with physical assets, data models, data movement, data quality and on-demand, end-to-end lineage
  • Increase data literacy with agile, well-governed data preparation and integrated business glossaries and data dictionaries that provide context

Organizations that have their data management, data governance and data intelligence houses in order are much better positioned to respond to challenges and thrive moving forward—because data equals truth, and truth matters.

erwin, Inc.

An early innovator in AI and supplier of semantic graph database technology with expert knowledge in developing and deploying knowledge graph solutions, Franz offers the AllegroGraph semantic graph database and Allegro CL, which provides a Lisp programming environment.

Big Data Trailblazer by Jans Aasman, CEO

AllegroGraph’s FedShard technology underpins flexible AI Knowledge Fabrics

Ubiquitous AI requires a new data model approach that unifies typical enterprise data with knowledge bases such as taxonomies, ontologies, industry terms, and other domain knowledge.

Franz’s Knowledge Graph approach encapsulates a novel Entity-Event Model, natively integrated with domain ontologies and metadata, along with dynamic ways of focusing analytics on any entity in the system (patient, person, device, transaction, event, operation, etc.) as a prime object of an analytic (AI, ML, DL) process.

The Entity-Event Data Model utilized by AllegroGraph with FedShard puts core "entities" such as customers, patients, students, or people of interest at the center and then collects several layers of knowledge related to the entity as "events." Events represent activities that transpire in a temporal context.
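The data shape described above can be made concrete with a small sketch. This is an illustrative toy in plain Python, not Franz's AllegroGraph or FedShard API: a core entity accumulates timestamped events, and an analytic process can set its temporal lens on any window of them. The entity ID, event kinds, and fields are hypothetical.

```python
# Toy sketch of the Entity-Event idea: a prime entity at the center,
# with layers of knowledge attached as events in a temporal context.
from datetime import date

class Entity:
    def __init__(self, entity_id):
        self.entity_id = entity_id
        self.events = []   # list of (date, kind, payload)

    def add_event(self, when, kind, payload):
        self.events.append((when, kind, payload))

    def events_between(self, start, end):
        """Set the analytic lens on a temporal window of this entity's events."""
        return [e for e in self.events if start <= e[0] <= end]

patient = Entity("patient-42")
patient.add_event(date(2019, 11, 2), "visit", {"dept": "primary care"})
patient.add_event(date(2020, 1, 5),  "visit", {"dept": "cardiology"})
patient.add_event(date(2020, 3, 9),  "lab",   {"test": "lipid panel"})

# Focus an analysis on Q1 2020 only
print(patient.events_between(date(2020, 1, 1), date(2020, 3, 31)))
```

In a real knowledge graph these events would also carry links into ontologies and other entities; the sketch only shows the entity-centered, time-sliced organizing principle.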

The rich functional and contextual integration of multi-modal predictive modeling and artificial intelligence is what distinguishes AllegroGraph as a modern, scalable enterprise analytic platform. It is the first big temporal Knowledge Graph technology to pair this entity-event model, natively integrated with domain ontologies and metadata, with dynamic ways of setting the analytics lens on any entity in the system.

Financial institutions, healthcare providers, contact centers, manufacturing firms, government agencies, and other data-driven enterprises that use AllegroGraph gain a holistic, future-proofed Knowledge Graph architecture for big data predictive analytics and machine learning across complex knowledge bases in order to discover deep connections, uncover new patterns, and attain explainable results.

Contact Franz Inc. today to build your AI Knowledge Graph solution.

Franz Inc.

Providing a platform for the creation, delivery, and automated management of analytics at massive scale, GoodData enables companies to embed analytics within their products so customers, partners, and other users can make analytics-driven decisions.

Google Cloud 
Providing a suite of cloud computing resources, Google Cloud has data centers around the globe that allow for a distribution of resources to enable redundancy in case of failure and reduced latency by locating resources closer to clients.

Offering an in-memory computing platform built on Apache Ignite, the GridGain platform is used for application acceleration and as a digital integration hub for real-time data access across data sources and applications.

Big Data Trailblazer by Abe Kleinfeld, President & CEO

We see digital transformation driving rapidly increasing adoption of Digital Integration Hubs, a high-performance version of a Smart Data Hub. Companies use a Digital Integration Hub (DIH) to aggregate data from many data streams, siloed databases, and SaaS applications in a single data access layer that leverages the speed and scalability of in-memory computing. Business applications access the aggregated data using a variety of APIs to drive real-time business processes.

Companies deploy DIHs that aggregate streaming and siloed data for a range of high value use cases. DIHs can significantly decrease API calls to SaaS solutions and operational datastores, reducing costs. They can aggregate data from across the enterprise to create 360-degree customer views that power upsell and cross-sell opportunities, increasing revenue. They can decouple back-end systems of record from front-end business applications, allowing companies to strategically replace back-end systems over time, providing operational flexibility. Once data from across the enterprise is aggregated, new real-time business analytics and machine learning are possible, increasing business insights.
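The cost-saving mechanics described above, fewer direct calls to SaaS systems and a single aggregated access layer, can be sketched in miniature. This is a hypothetical illustration of the DIH pattern, not GridGain's API; the backend names and fields are invented.

```python
# Sketch of a Digital Integration Hub: backends are ingested once into an
# in-memory layer, and applications read aggregated views from that layer
# instead of calling each backend or SaaS API on every request.

class DigitalIntegrationHub:
    def __init__(self):
        self.store = {}          # in-memory aggregated data access layer
        self.backend_calls = 0   # track how often backends are actually hit

    def ingest(self, fetch):
        """Pull from a backend once; subsequent reads never touch it."""
        self.backend_calls += 1
        for key, record in fetch():
            self.store.setdefault(key, {}).update(record)

    def customer_360(self, key):
        return self.store.get(key)

# Two siloed backends exposed as fetch functions (illustrative data)
billing = lambda: [("c1", {"balance": 120.0})]
crm     = lambda: [("c1", {"tier": "gold"}), ("c2", {"tier": "silver"})]

hub = DigitalIntegrationHub()
hub.ingest(billing)
hub.ingest(crm)

# Front-end apps query the hub's aggregated 360-degree view
print(hub.customer_360("c1"))
assert hub.backend_calls == 2   # backends were hit once each, not per read
```

A production DIH would refresh the layer from change streams and scale it across an in-memory cluster; the sketch shows only the decoupling that makes the pattern pay off.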

In-memory computing plays a pivotal role for DIHs that drive real-time business processes. Aggregating data from multiple data sources may achieve some business objectives, but in-memory computing can provide the real-time performance and massive scalability necessary to drive high-performance data access layers and more engaging user experiences. As companies transform their businesses and use data from across the enterprise to drive real-time processes, Digital Integration Hubs built on a GridGain® in-memory computing platform will continue to enjoy rapidly increasing adoption.



