Many businesses seem to believe that dimension tables and reference tables are indistinguishable objects. Apparently, the only difference worthy of note seems to be altering the object's name from "something reference," or "something xref," to "something dimension." As these organizations build multidimensional data marts, they often place a view on top of their reference table and feel good about how quickly and efficiently they complete their data mart.
Posted March 02, 2017
At one of my very first jobs in the service industry, I had a boss who had a variation on an old adage. He would say, "The customer is not always right. In fact many times the customer is dead wrong; but the customer is always the customer." In a similar way, in data modeling, business users and internal IT users, make the world go around. If they don't use the models that are designed, or worse, if they cannot relate to the designed data models, then what value has been delivered?
Posted February 08, 2017
There was a time when what you saw was what you got. Building up the components of a business intelligence area was very straight-forward. A staging area was a staging area; an operational data store was an operational data store. But like buying a pitcher of beer for $2, or gas for less than a dollar per gallon, those days are gone. The dynamics have changed, things are more federated, and IT must accept more than one standard tool.
Posted January 03, 2017
Terms such as "active," "inactive," and "canceled" may seem mundane and inconsequential, and when folks hear the term,"valid values," their eyes glaze and expectations of interest diminish. But exciting or not, reference values and an understanding of them are important to every organization.
Posted December 01, 2016
Why is there such pervasive abuse and misuse of database views today? Views are a helpful tool in building a business intelligence environment, yet many organizations establish practices that not only rob views of their full usefulness but present patterns that actually confuse issues instead.
Posted November 02, 2016
One symptom of an organization in the middle of a knowledge vacuum is evidenced by SQL that often includes what appears to be extravagant usage of the GROUP BY clause. Writing GROUP BYs here, there, and everywhere becomes a little SQL development dance step, a jitterbug to bypass the issue—moving but not really getting anywhere. Why do these kinds of circumstances exist? Well, maybe the only expert on the involved system has retired and no one else has picked up the torch, so no one is willing to touch the code.
Posted October 07, 2016
Programming is a literal sport. Code does exactly what it is configured to do, no compromises. When the definition of a task is fuzzy, it is up to the developer to do what they believe is correct. Does the code reflect what is desired? That answer is left open to interpretation. Sadly, developers may not have a clear understanding, and even the users requesting the solution may not be sure. The results can be very painful for an organization. Expectations may not align with the delivered solutions. Users will blame IT; IT will blame users.
Posted September 02, 2016
As one works through the normal forms, be it a journey to the placid shores of third normal, or the more arid climes of Boyce-Codd normal form, or even the cloudy peaks of fourth normal and beyond—and before one starts thinking about normalizing the design—the database designer has covered a lot of ground work already. Before thinking of normalizing, one needs to have conceptualized the relations that might be within the solution's scope.
Posted August 04, 2016
In the data warehousing arena, development databases often get short shrift. Frequently, this situation arises because development databases are considered too much work to be done properly. So, instead of embracing the problem and following through on the necessary work, it is ignored or done poorly. One could almost say that ignoring the development database has become the standard practice.
Posted July 12, 2016
Many of the NoSQL tools out there, such as MongoDB, Couchbase, Hadoop, and others, purport to be leading a revolution and breaking the bonds of servitude to the restrictive, inflexible, established, relational market. They claim users need more, users need better … and they are there to help. Of course, when speaking about those relational flaws, the comments always focus on problematic aspects of a DBMS' physical implementation.
Posted June 09, 2016
Not too long ago, IT embraced the pattern language concepts of Christopher Alexander. Being an architect, of the more traditional variety, his ideas were based on creating spaces in which people felt good, even if they didn't comprehend exactly why. Architected spaces need to express multiple qualities that include being alive, whole, comforting, free, exact, egoless, and eternal. The more those qualities were embodied, the better people responded to the desirability of the space.
Posted May 04, 2016
The traditional information engineering approach advocates the placement of as much business logic as possible inside the database management system (DBMS). But, more recently, under the umbrella of service-oriented architecture (SOA), folks are arguing for placement of that business logic in a layer of code outside the DBMS. Occasionally, those who favor locating business logic outside the DBMS have even gone so far as to say that this logic "naturally" belongs in a non-DBMS-supported layer.
Posted March 31, 2016
When it comes to operational, third normal form approaches, too many database models today have very low levels of business veracity. This is because current IT trends work against incorporating too much truth inside the entity-relationship diagram. Instead, development teams center on the finished product and excessively worry about describing a solution rather than describing the actual organization.
Posted March 03, 2016
Many people claim to understand the basics of good data and database hygiene. Often, these same people claim it is all very simple and very obvious. However, when peering into existing code and databases, it doesn't seem that good practices are as obvious as people say. "A2" as the name of a column may sound ridiculous, but it has happened.
Posted February 10, 2016
Data modelers must look at the big picture of an organization's data ecosystem to ensure additions and changes fit in properly. Simultaneously, each data modeler must be focused on the minute details, adhering to naming standards, domain rules, data type practices, still remaining ever vigilant for instilling consistency across everything they do. And while focused on all of the above, their efforts must culminate in a practical model that serves the individual project's requirements while also being implementable, maintainable, and extensible.
Posted January 07, 2016
Our data models, in reflecting a specific business, must accurately portray the essence of each business. The unique reality within each organization drives the shape of every data model. The logical meaning of each data element originates with what is actually done and how it is accomplished within that particular organization.
Posted December 02, 2015
Many neophyte data modelers have trouble distinguishing between logical and physical data models. These novices likely cannot explain why each model exists, or the differences expressed between them. Sadly, such confusion also exists in the realm of the experienced data modeler. Not to say an experienced data modeler can't express the difference between a logical and a physical data model; but across a group of experienced data modelers one would not get a consistent answer.
Posted November 09, 2015
The Agile methodology is great for getting turgid development teams to start working faster and more coherently. With Agile, which focuses on more rapid, incremental deliverables and cross-departmental collaboration, the bureaucratic plaque is flushed from the information technology groups' arteries. But there is a dark side to Agile approaches.
Posted October 07, 2015
Data modelers face a choice when encountering multiple variations of a data item. Designers must focus on the longer term appropriateness of their decisions when choosing how their designs will play out; and going vertical or horizontal does have an impact over time.
Posted September 09, 2015
There are fundamental differences in priorities and approaches in building operational solutions and in constructing analytical supporting solutions. However, these differences can produce a gap in understanding that occasionally results in minor battles and skirmishes between these groups along the path of getting work done. A critical area of conflict is access to production data. When building an operationally focused solution such as a website, the application is supreme.
Posted August 10, 2015
Data governance is a touchy subject. Many organizations have tried to build a practice…and failed. Other organizations feel too daunted to even try. IT in particular has a very conflicted relationship with data governance.
Posted July 08, 2015
Petabytes, and more, of data are meaningless without guides to what it all is. Future data users need to know where it came from, how it was filtered, and how it was transformed. Without some functional level of governance we are headed toward a "datapocalypse"—having lots of content but little of value.
Posted June 09, 2015
The objects and relationships identified within any database design need to match with the very subjective realities considered true within a given organization. An optimal data model must incorporate the semantics of the business and provide for a structural arrangement supporting the necessary corporate instrumentality. And that model should do so, even if in the details things within the model are contrary to what other organizations may consider to be the nature of their reality.
Posted May 14, 2015
How does an organization acknowledge that data is important? An organization does so by enabling and supporting efforts for gathering and persisting information about the organization's data resources.
Posted April 06, 2015
In information services, providing a solution that removes a user's pain can be a great boost to one's psyche. Being the hero, wins trust, and may even enhance job security. However, organizationally, too many sideline heroics create chaos. The day after the crisis has passed, things are left in a less than wonderful state.
Posted March 12, 2015
In the early days of data warehousing, lines were very simple. The data warehouse reflected the business. In providing this reflection, data was summarized, data was cleansed, data was standardized, and data was even drastically reformatted for legibility and reporting usage. But, the big rule of thumb, the never-to-be-crossed line, was that the data warehouse did not create new data.
Posted February 11, 2015
Many designers seem to approach a database design as if there is a single objective reality that circumscribes all truth and that happens to align with exactly how they see the world. Honestly, is it expected that there is an absolute truth for everything? And that single absolute truth goes from the highest level of abstraction to the lowest level of detail?
Posted January 07, 2015
Shunryu Suzuki, in his classic book, Zen Mind, Beginner's Mind, tells us that in the beginner's mind there are many possibilities, but in the expert's mind there are few. Both a beginner's and an expert's perspective have value, but bias toward either an expert or a beginner approach seems to consume an organization far too often.
Posted December 03, 2014
Big data tool vendors try to downplay the notion that data warehouses and data marts still need to exist, even in a big data world. Relational DBMSs are painted as "old-fashioned," "yesterday," and "inadequate." They beckon potential customers to take a dip in the refreshing data lake. The fact that big data, in all of its glory, is only part of a larger business intelligence solution is getting lost in the dialog.
Posted November 12, 2014
It is fair to say that relational theory is the only solid framework for establishing a rational expression of data that falls anywhere inside the boundaries of formal logic. As people continue to laud the "death of relational" by coming up with one or other "new" physical implementations of coding or data engines, from object-oriented, XML, columnar, or anything else one might name, the primary short-coming is that these are physical implementations that avoid having any formalized logic underpinning them.
Posted October 08, 2014
"On occasion, the inmates actually do take over the asylum. Similarly, business personnel can be defeated by their IT peers," writes Todd Schraml. Although some might argue that such an unexpected role reversal cannot happen, it may indeed occur when data architects skilled in their craft can identify the options that help achieve the business goals, versus those alternatives that stifle the business goals. "A good data architect must also be a very good business analyst," he notes.
Posted September 10, 2014
A healthy data warehouse environment can self-correct to accomplish the right things. And that self-correction is crucial, for no process is likely to be perfect from the start.
Posted August 05, 2014
Commitment to the creation of a business intelligence environment means embracing a well-measured and thoughtful allocation of resources with the right skills, the right tools, the right scope, and the right timelines
Posted July 03, 2014
How does one tame the data beast to start making tangible progress toward an organization's business intelligence goals? For any progress to occur on the business intelligence-side, tasks, projects and goals need to be fashioned into small steps along a much larger path.
Posted June 11, 2014
Finding the means to harmonize conflicting processing requirements is where a design comes alive. When done well, providing symmetry to a suite of processes greatly empowers those who develop and maintain those processes; changes and enhancement arise seemingly in the blink of an eye. In ETL, there are three key principles to driving exceptional design.
Posted May 08, 2014
Denormalization of a database design is a legitimate tool, when used appropriately. However, it is simply sad and sometimes even silly to see designs denormalized as a matter of course.
Posted April 04, 2014
Historical content within data structures increases their complexity. Making complexity "easy" for the initial developers and "harder" for the later users is a short-sighted approach.
Posted March 12, 2014
In many organizations, users find it hard to trust their own internal information technology (IT) group, leading them to try any possible option to solve problems own their own. The resulting stealth IT projects can lead to confusion or even complete chaos.
Posted February 10, 2014
Changes and enhancement to solutions are hard, even under the best of circumstances. It is not usual that, as operational changes roll out into production, the business intelligence area is left uninformed, suggesting that data warehouses and business intelligence be categorized according to the view of the old comedian Rodney Dangerfield because they both "get no respect."
Posted January 07, 2014
In the world of database products, trends seem to pop up quickly, like prairie dogs testing the air. However, regardless of the marketing jargon, or platform, vendor-product purchased, or open source utility downloaded, the one thing that remains unaltered is that in order to extract value from data, the data must be understood. The individuals within any organization who actually comprehend the data, the data structures, and all the exceptions to the usual rules are individuals who are considered critical resources.
Posted December 04, 2013
Changes to database structures should be performed in a coordinated fashion as the application processes that support the new functionality are rolled out into production. While the "work" involved in adding a column or a table to a relational database is actually minimal, often there are circumstances where developers and DBAs create additional columns and additional tables in anticipation of future needs. Sadly, this "proactive" effort results in databases littered with half-formed ideas, fits-and-starts, and scattered-about columns and tables that provide no meaningful content.
Posted November 13, 2013
How does one avoid the semantically wishy-washy use of NULL-surrogates and instead, actually design structures wherein NULLs are not necessary?
Posted October 09, 2013
Database management systems support numerous unique date and time functions - and while the date-related functions are many, they do not go far enough. One date-driven circumstance often encountered has to do with objects having a type of date range that needs to be associated with it. While there are some exceptions, this date range need generally ends up implemented via two distinct date columns—one signaling the "start" and the other designating the "end." Maybe, should the creative juices of DBMS builders' flow, such things as numeric-range-datatypes could be created in addition to a date-range data-type. Who knows where things could end up?
Posted September 11, 2013
Data models attempt to express the business rules of an organization. A good data model reflects the semantics used within an organization to such an extent that business people within that organization can relate to and easily agree with what is being expressed. In this regard the data modeler's goal is to properly mirror back the organization's concepts onto those people within the organization. The goal is not to force an organization into a "standard" data model, nor is the goal to abstract everything in the creation of a master model that will never need to change even if the business rules were drastically shifted.
Posted August 07, 2013
One of the principles within relational theory is that each entity's row or tuple be uniquely identifiable. This means the defined structure includes some combination of attributes whose populated values serve to identify an individual row within the table/relation. This, or these, attribute(s) are the candidate key(s) for the structure. The candidate key is also known as the primary key, or if a structure has multiple candidate keys, then one of them is designated as the primary key. When building up a logical design, primary keys should be identified by the actual data points in play.
Posted July 09, 2013
The grain of a fact table is derived by the dimensions with which the fact is associated. For example, should a fact have associations with a Day dimension, a Location dimension, a Customer dimension, and a Product dimension, then the usual assumption would be for the fact to be described as being at a "by Day," "by Location," "by Customer," "by Product" metrics level. Evidence of this specific level of granularity for the fact table is seen by the primary key of the fact being the composite of the Day dimension key, Location dimension key, Customer dimension key, and Product dimension key. However, this granularity and these relationships are easily disrupted.
Posted June 13, 2013
It seems that juggling is the most useful of all skills when embarking on a data warehousing project. During the discovery and analysis phase, the workload grows insanely large, like some mutant science fiction monster. Pressures to deliver can encourage rampant corner-cutting to move quickly, while the need to provide value urges caution in order not to throw out the proverbial baby with the bath water as the project speeds along. Change data capture is one area that is a glaring example of the necessary juggling and balancing.
Posted May 09, 2013
Dimensions are the workhorses of a multidimensional design. They are used to manage the numeric content being analyzed. It is through the use of dimensions that the metrics can be sliced, diced, drilled-down, filtered and sorted. Many people relate to dimensions by thinking of them as reference tables. Such thoughts aren't exactly accurate. A dimension groups together the textual/descriptor columns within a rationalized business category. Therefore, much of the content coming from relational tables may be sourced from reference tables, but the relationship between each source reference table and the targeted dimension is unlikely to be one-for-one. These grouped-format dimensions often contain one or more hierarchies of related data items used within the OLAP queries supported by the structures.
Posted April 10, 2013
Do not allow well-meaning but confused proponents to obscure concepts related to normalization and dimensional design. Under a normalized approach one usually would not expect for numeric data items and textual data items to fall into different logical relations when connected to the same entity object. Yet within a multidimensional approach that is exactly what happens. Multidimensional design and normal design are not the same, and one should not expect to claim that both approaches were used and that they resulted in the same data model.
Posted March 14, 2013
Establishing a data warehousing or business intelligence environment initiates a process that works its way through the operational applications and data sources across an enterprise. This process focuses not only on identifying the important data elements the business lives and breathes, but the process also tries very hard to provide rationality in explaining these elements to business intelligence users.
Posted February 13, 2013