Analytics environments no longer look like your grandfather’s data warehouse with its dependent data marts; they are now more layered and complex. With the rise of big data approaches and a new suite of database and data storage platforms, more choices exist for implementing analytics solutions than ever before.
While relational database management systems are still the everyday workhorse, we are now adding document, columnar, and graph datastores, and their variants, into the mix. Each datastore excels at some things and is weaker at others. Similarly, the rules followed in composing data structures vary greatly with the platform selected.
A typical example set of logical data objects could be an Account, a Person, and an Address. In a relational database we could implement these as individual tables, where a Person has one-to-many Account(s), and an Account has one-to-many Address(es). A graph database might have these same objects, but the difference would be in how the physical navigation occurs. A graph database would support a more transactional approach, starting with a single Person, then grabbing that person’s Account(s) and that person’s Address(es).
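As a rough illustration of the relational layout described above, the following sketch builds the three tables in an in-memory SQLite database and joins them for one Person. The table names, column names, and sample rows are all assumptions made up for this example; they do not come from any particular system.

```python
import sqlite3

# Hypothetical schema: every table, column, and value here is illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person  (person_id  INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE account (account_id INTEGER PRIMARY KEY,
                          person_id  INTEGER REFERENCES person(person_id),
                          label      TEXT);
    CREATE TABLE address (address_id INTEGER PRIMARY KEY,
                          account_id INTEGER REFERENCES account(account_id),
                          city       TEXT);
    INSERT INTO person  VALUES (1, 'Ada');
    INSERT INTO account VALUES (10, 1, 'checking'), (11, 1, 'savings');
    INSERT INTO address VALUES (100, 10, 'Leeds'), (101, 11, 'York'), (102, 11, 'Bath');
""")


def addresses_for_person(conn, person_id):
    """Join Person -> Account -> Address and return (account label, city) pairs."""
    cursor = conn.execute(
        """
        SELECT a.label, ad.city
        FROM person p
        JOIN account a  ON a.person_id   = p.person_id
        JOIN address ad ON ad.account_id = a.account_id
        WHERE p.person_id = ?
        ORDER BY ad.address_id
        """,
        (person_id,),
    )
    return cursor.fetchall()


person_rows = addresses_for_person(conn, 1)
```

The same three tables could serve a query across every Person just by dropping the WHERE clause, which is the kind of flexibility the relational approach is known for.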
In the right circumstances, the graph choice might be able to fly through those connections, using its internal pointers to link each related segment of data directly. When using a graph approach, the architect establishes the structures and links based on the exact queries to be used. Those connections are optimized for the expected use, so any alteration in use may require altering the database structures. These graph structures most likely will not be optimal for handling queries doing millions of joins across ALL Account(s), Address(es), and Person(s). Both relational and graph approaches could be considered transaction-based, in that the intent is to optimize data structures for the needs of a defined transaction.
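The pointer-following navigation described above can be sketched with plain in-memory objects. This is a toy model, not the API of any graph database product; the `Node` class, the `traverse` helper, and the edge labels (`HAS_ACCOUNT`, `HAS_ADDRESS`, `BILLED_AT`) are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str
    name: str
    # Edges keyed by relationship label; each list holds direct references
    # to other nodes, standing in for a graph store's internal pointers.
    edges: dict = field(default_factory=dict)

    def link(self, label, other):
        self.edges.setdefault(label, []).append(other)


def traverse(start, path):
    """Follow a fixed sequence of edge labels outward from one starting node."""
    frontier = [start]
    for label in path:
        frontier = [nxt for node in frontier for nxt in node.edges.get(label, [])]
    return frontier


# Hypothetical sample data.
ada = Node("Person", "Ada")
checking = Node("Account", "checking")
leeds = Node("Address", "Leeds")
york = Node("Address", "York")

ada.link("HAS_ACCOUNT", checking)
ada.link("HAS_ADDRESS", leeds)
checking.link("BILLED_AT", york)

accounts = traverse(ada, ["HAS_ACCOUNT"])
home_addresses = traverse(ada, ["HAS_ADDRESS"])
billing_addresses = traverse(ada, ["HAS_ACCOUNT", "BILLED_AT"])
```

Each traversal touches only the nodes reachable from the starting Person, which is cheap. A question that was not anticipated when the edges were laid out, such as one that starts from an Address, has no edge to follow in this model and would require new links or a scan of every node.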
A document database might have JSON tags for Person, Account, Address, and all the sub-elements within each; but it is likely that all three of these objects would be nested within one of the three tags, so that together they comprise a single document. Depending on the perspective of the developer, Account or Person could become the “base” document to be processed. This approach, of keeping all related data elements together, could be said to have data bounded in the aggregate, rather than bounded as a transaction.
Like the graph database, document databases likely will not perform well in doing millions of joins bringing together multiple types of documents. When millions of documents are processed, the expectation is that everything needed for an individual instance exists within the single document. However, if a document database is used for looking up a single document, then a join into another single document could be handled very well. Columnar datastores are very similar to document stores, but the data is broken up along more hierarchical or business-domain boundaries.
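To make the aggregate idea concrete, here is a minimal sketch of a single document with Person as the “base” and the Accounts and Addresses nested inside it. The field names and values are assumptions invented for the example, and the standard `json` module stands in for whatever serialization a real document store would use.

```python
import json

# Hypothetical document: Person is the base, with Accounts and their
# Addresses nested inside it so the whole aggregate travels together.
person_document = {
    "person": {
        "name": "Ada",
        "accounts": [
            {"label": "checking", "addresses": [{"city": "Leeds"}]},
            {"label": "savings", "addresses": [{"city": "York"}, {"city": "Bath"}]},
        ],
    }
}


def cities_in_document(doc):
    """Collect every city using only what is inside this one document."""
    return [
        address["city"]
        for account in doc["person"]["accounts"]
        for address in account["addresses"]
    ]


# Serialize and parse again, as a store would when writing and reading.
round_tripped = json.loads(json.dumps(person_document))
cities = cities_in_document(round_tripped)
```

No lookup into another document is needed to answer the question, which is the property that keeps per-document processing fast. Had Account been chosen as the base instead, the nesting would be inverted and the same question would be answered from a different aggregate.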
In many circumstances, both transactional needs and aggregate needs can be handled very well by a relational solution. But some circumstances do push the limits of a relational solution, and opting for an alternative, be it graph-based, document-based, or columnar-based, can be a fine choice.
Architects must weigh the nature of the queries to be run and identify which style of datastore might best optimize those queries. Even then, how the structures are designed and put together will change from one choice to another. More so than ever before, there is often no one-size-fits-all enterprise solution. The enterprise has many data needs that will require several different approaches to work in harmony. This does not mean that anything and everything goes. Proper architecture requires prudence and a parsimonious eye. Therefore, have as many options as you need and no more.