Theory Means Never Having to Say ‘Performance’

Far too often, someone starts out complaining that relational theory is a problem, is insufficient, is just plain wrong, whatever. And when the relational naysayer explains why he holds that opinion, he rambles on about performance. I have actually heard the expression, “I can get better retrieval speed using indexed VSAM files,” offered as the reason relational theory is a “waste of time.”

Theory is the framework, the idea, the attitude brought to the game. And as such, relational theory asks that any “relational tool” allow the user to perceive the data, and to manipulate that data, as if the data were contained within tables having certain qualities. Relational theory does not require that data be physically implemented as “tables.” A “table” is simply an idea; it has no single, defined physical implementation to go along with it. If one tells developers to “implement a table” from scratch, I doubt that any two would create exactly the same physical thing. The implementation is the choice of the implementer. Implementers could use VSAM files as much as they want; however, they are asked to hide those VSAM files from the code the data user writes if they wish to refer to their tool as “relational.” Relational theory seems to be held to a higher standard by some; ballistics theory is not often held accountable for vendors providing bad cannons.
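To make the point about hidden implementations concrete, here is a minimal sketch in Python. The names `RowStore`, `ColumnStore`, and `select` are hypothetical and invented for this illustration; they come from no real product. Two very different physical layouts sit behind the same table-like interface, and the code the data user writes cannot tell them apart. The sketch assumes every row carries the same set of column names.

```python
from typing import Callable, Dict, Iterator, List

Row = Dict[str, object]


class RowStore:
    """Hypothetical layout 1: each row is kept together as one dict."""

    def __init__(self, rows: List[Row]) -> None:
        self._rows = [dict(r) for r in rows]

    def rows(self) -> Iterator[Row]:
        for r in self._rows:
            yield dict(r)


class ColumnStore:
    """Hypothetical layout 2: each column is kept together as one list."""

    def __init__(self, rows: List[Row]) -> None:
        self._cols: Dict[str, List[object]] = {}
        for r in rows:
            for name, value in r.items():
                self._cols.setdefault(name, []).append(value)

    def rows(self) -> Iterator[Row]:
        # Reassemble logical rows from the physical columns.
        names = list(self._cols)
        for values in zip(*(self._cols[n] for n in names)):
            yield dict(zip(names, values))


def select(table, predicate: Callable[[Row], bool]) -> List[Row]:
    """User-facing query: it sees rows only, never the storage layout."""
    return [r for r in table.rows() if predicate(r)]


data = [{"id": 1, "city": "Oslo"}, {"id": 2, "city": "Lima"}]
from_rows = select(RowStore(data), lambda r: r["city"] == "Lima")
from_cols = select(ColumnStore(data), lambda r: r["city"] == "Lima")
```

Both calls return the same logical rows. Which layout answers faster is a separate question, and it belongs to the implementation rather than to the theory.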

It is fair to say that relational theory is the only solid framework for establishing a rational expression of data that falls anywhere inside the boundaries of formal logic. As people continue to proclaim the “death of relational” by coming up with one or another “new” physical implementation of code or data engines, whether object-oriented, XML, columnar, or anything else one might name, the primary shortcoming is that these are physical implementations with no formalized logic underpinning them. And as long as that foundational logic remains missing, even should one use a “non-relational” engine, one should always work through a relational design in order to understand the true nature of the data and the functional dependencies between the items that constitute the universe of discourse. Those functional dependencies will drive the requirements of the code that queries and manipulates the data, regardless of how you choose to physically store it. Data knowledge is important; a relational design captured in an entity-relationship diagram is valuable for the business rules about the data that it expresses. Lists of fields within a non-relational design are non-expressive shopping lists and little more.
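As an illustration of what a functional dependency means in practice, the following Python sketch tests whether sample rows are consistent with a dependency such as "customer_id determines customer_city". The function `fd_holds` and the sample columns are hypothetical and invented for this example. A functional dependency is a business rule about all possible data, so a check like this can refute a proposed dependency but can never prove one.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

Row = Dict[str, object]


def fd_holds(rows: Iterable[Row],
             determinant: Sequence[str],
             dependent: Sequence[str]) -> bool:
    """Return True if no two rows agree on `determinant` yet differ on `dependent`."""
    seen: Dict[Tuple[object, ...], Tuple[object, ...]] = {}
    for row in rows:
        key = tuple(row[a] for a in determinant)
        value = tuple(row[a] for a in dependent)
        # The first value seen for a key is recorded; any later mismatch violates the FD.
        if seen.setdefault(key, value) != value:
            return False
    return True


orders: List[Row] = [
    {"order_id": 1, "customer_id": "C1", "customer_city": "Oslo"},
    {"order_id": 2, "customer_id": "C1", "customer_city": "Oslo"},
    {"order_id": 3, "customer_id": "C2", "customer_city": "Lima"},
]

city_depends_on_customer = fd_holds(orders, ["customer_id"], ["customer_city"])
order_depends_on_customer = fd_holds(orders, ["customer_id"], ["order_id"])
```

Here the first check succeeds and the second fails, because customer C1 appears on two different orders. The same rule constrains any code that updates a customer's city, whichever engine stores the rows.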

Can non-relational approaches provide better performance than relational approaches? That is a non-question. Better or worse performance only comes into play once you have a physical implementation to cope with. So to reframe the question more properly, can non-relationally minded tools be coded internally so that they provide better data query and manipulation performance than relationally minded tools? I imagine the answer to that performance question depends on which tool has the better designers and developers. There are times and circumstances where all sorts of things may turn out to optimize performance. However, should the desire be to establish structures that generally serve many differing data query needs well, then chances are that relational approaches will be more appropriate than non-relational approaches. Additionally, one could allow that the developers of relationally minded tools have the harder task, because they carry the added burden of trying to follow the tenets of relational theory. The developers of non-relationally minded tools are freer to take their implementation down whatever path they choose. And certainly, many relationally minded tools cut an awful lot of corners and, in doing so, unfortunately expose their implementations. The bottom line seems to be that trying to create a truly relational user data experience is hard, very hard indeed.