Graph databases, which store data together with its connections so that all relationships can be accessed rapidly, are emerging as a key component of real-time environments. Jim Webber, chief scientist at Neo4j, has seen a surge in supply chain use cases. “As global businesses slowed production due to COVID-19, we saw a rise in demand from organizations that wanted to implement graph databases into their supply chain strategies and ensure business continuity.” The advantage of graph databases, Webber added, is that they allow arbitrary numbers and types of relationships, rather than just joining tables on keys. “Since graphs are a more useful data structure than tables, they are better equipped to process connections in real time.” The biggest challenge with moving to graph databases, he pointed out, is “unlearning everything we have learned about relational databases and accepting that graphs are different.”
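To make the contrast with key-based joins concrete, here is a minimal Python sketch of the idea, not Neo4j's implementation. The company names, the `SUPPLIES` and `SHIPS_VIA` relationship types, and the `downstream` helper are all hypothetical, invented for illustration. Each record keeps its typed connections next to it, so finding everything affected by a disrupted supplier is a walk along edges rather than a series of table joins.

```python
from collections import deque

# Hypothetical supply chain: each edge is (source, relationship type, target).
EDGES = [
    ("MineCo", "SUPPLIES", "SteelWorks"),
    ("SteelWorks", "SUPPLIES", "FrameFactory"),
    ("FrameFactory", "SUPPLIES", "BikeAssembly"),
    ("RubberFarm", "SUPPLIES", "TireFactory"),
    ("TireFactory", "SUPPLIES", "BikeAssembly"),
    ("PortA", "SHIPS_VIA", "BikeAssembly"),
]


def build_adjacency(edges):
    """Store each node's outgoing relationships alongside the node itself."""
    adjacency = {}
    for source, rel_type, target in edges:
        adjacency.setdefault(source, []).append((rel_type, target))
    return adjacency


def downstream(adjacency, start, rel_type):
    """Return every node reachable from start over rel_type edges (breadth-first)."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for edge_type, neighbor in adjacency.get(node, []):
            if edge_type == rel_type and neighbor not in seen:
                seen.add(neighbor)
                order.append(neighbor)
                queue.append(neighbor)
    return order


ADJACENCY = build_adjacency(EDGES)

# Which sites are affected if the hypothetical MineCo stops shipping?
print(downstream(ADJACENCY, "MineCo", "SUPPLIES"))
# → ['SteelWorks', 'FrameFactory', 'BikeAssembly']
```

In a relational schema the same question would typically need one self-join per hop, or a recursive query, because the number of hops is not known in advance. Here the traversal simply follows whatever relationships exist.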
Vectorized databases, often described as columnar databases that process data in vectors to use the CPU cache more quickly and efficiently, are seeing increased demand, as they “deliver next-generation performance on behavioral data from sensors and machines,” said Nima Negahban, chief technology officer and co-founder at Kinetica.
Vectorized databases “offer orders of magnitude performance improvements on common big data analytic workloads like aggregations, predicate joins, equijoins, derived columns, window functions, graph solvers, and certain GIS [geographic information system] functions,” Negahban said. “Traditional databases have their internal data structures and their query processing logic designed for using as little compute as possible to process a given query or accept a mutation. This works well for problems where the actual amount of data that needs to be analyzed can be minimized using indexes. However, modern data-driven decision making requires the ability to aggregate, filter, and sort large amounts of data, which does not lend itself well to traditional indexing techniques.”
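The layout behind that claim can be sketched in a few lines of Python. This is an illustration only, not Kinetica's engine: the sensor readings and the `to_columns` and `mean_where` helpers are hypothetical, and plain Python lists do not use SIMD instructions. The point is that when each column is stored contiguously, a filter-and-aggregate query scans only the columns it needs, with no index involved.

```python
# Hypothetical sensor readings as they would sit in a row-oriented store.
ROWS = [
    {"sensor": "a", "temp_c": 20.0, "ok": True},
    {"sensor": "b", "temp_c": 31.0, "ok": True},
    {"sensor": "a", "temp_c": 22.5, "ok": False},
    {"sensor": "b", "temp_c": 30.0, "ok": True},
]


def to_columns(rows):
    """Pivot row dicts into one list per column (a column-oriented layout)."""
    columns = {key: [] for key in rows[0]}
    for row in rows:
        for key, value in row.items():
            columns[key].append(value)
    return columns


def mean_where(columns, value_col, mask_col):
    """Average value_col over the positions where mask_col is true.

    Only the two named columns are scanned; the others are never touched.
    """
    selected = [
        value
        for value, keep in zip(columns[value_col], columns[mask_col])
        if keep
    ]
    return sum(selected) / len(selected) if selected else None


COLUMNS = to_columns(ROWS)

# Average temperature across readings flagged as valid.
print(mean_where(COLUMNS, "temp_c", "ok"))  # → 27.0
```

A vectorized engine applies the same scan to large blocks of a column at a time, which is why full-table aggregations and filters can run quickly even when no index helps.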
While many breeds of next-generation databases (graph, vectorized, time series, and streaming) are seen as must-haves for 2021 and beyond, tried-and-true relational database management systems will continue to play a critical role in data environments. “Even with the growing popularity of these next-gen technologies, the traditional relational database management systems will still hold sway in the database marketplace,” said Sri Raghavan, who works in data science and advanced analytics product marketing at Teradata.
Still, with a wider variety of database types available, data managers can be selective in their deployment choices, Raghavan said. “Most enterprises today are strong on RDBMSs, with a number of them also having other database types such as graph or NoSQL,” he said. “In fact, applications developed on one are also in some cases supported by the other.” However, he continued, while there is some compatibility among database types, each type is better suited to some tasks than to others. “NoSQL and graph databases make it easy to retrieve vast volumes of data that are analyzed natively and are then made available for access to a wide range of third-party tools for further access and analysis. While this is possible with RDBMSs too, the ease of retrieval, analytics, and sharing across a wide ecosystem of tools is better with next-generation solutions.”
It’s also important to note that commodity hardware is more compatible with NoSQL databases than with RDBMS alternatives, Raghavan cautioned. RDBMS environments “typically need purpose-built, optimized hardware for data access and storage. NoSQL databases are designed for access and expansion across multiple cheap, commodity servers.” In addition, “NoSQL databases are more developer-friendly; they are more amenable to frequent code changes and modifications that can be done across shorter development sprints.”
2021 AND BEYOND
While the RDBMS will continue to play an important role for many organizations, clearly, the one-size-fits-all approach of years past is well behind us. A new breed of data systems is taking on more workloads and, as the 2020s progress, we are likely to see additional data management technologies and techniques come to the fore, allowing for more immediate and ubiquitous data access and enabling companies to truly compete on data.