An enterprise conceptual data model is often seen as a high mountain to be climbed, a journey that will last a lifetime. People have visions of 10 feet or more of wall in the corporate offices wallpapered with an entity relationship diagram [ERD] that has zillions of teeny, tiny boxes and more relationship lines than the combined lines of queuing patrons in all Disney Resorts, when full. In this context, an enterprise conceptual data model is a daunting task not to be taken lightly. But in today's world, that enterprise conceptual data model can simply be a list of subject areas.
Posted July 01, 2020
As the number of types of slowly changing dimensions (SCDs) increased, things have ended with Types 0 through 7, making essentially eight of them. But it is unclear whether there full consensus exists among current practitioners on what actually differentiates each of these eight SCD types. Some confusion may result from the fact that when the first three SCD types were defined, each could be equated to a result for a dimensional attribute. Type 1 had facts associated to dimension values as they are currently, or always current, Type 2 had facts associated with dimension values as were current when the facts were processed, Type 3 had facts associated with both current values and values current when processed.
Posted June 10, 2020
One creates the potential for some interesting anomalies when building a star schema wherein the fact table contains future-dated metrics and any of the dimensions are Type 2. A Type 2 dimension tracks changes to the data items contained within it. Effectively, each dimension contains a surrogate key, a natural key with a start and stop date, and additional descriptor columns. If any of the descriptor column values change, the existing dimension row has the stop date populated while a new row is inserted with the same natural key, new start date, and new descriptor values.
Posted May 13, 2020
As we have moved forward with APIs and microservices, every organization has even more data stores to manage and more sources of data to consider. Sorting through data structures for operational solutions can become mind-numbing due to the variety, or even frustrating due to a lack of detail from many vendors. Source systems are no longer the monoliths they once were.
Posted April 08, 2020
From CEO to presidents, VPs, directors, to any number of mid-to-low level managers, the concept of hierarchies is pervasive in organizations. But, if one is dealing with a relational DBMS, a hierarchy remains an awkward concept.
Posted March 05, 2020
A good data modeler must know when there is something the users need to analyze, even when what that is is not necessarily obvious. Subtle differences abound between things in a state, versus actions taken to create or change states, versus the duration of a business object within or across states, versus business objects and the workflows containing them. Each of these subtleties drives differing metrics with distinct uses.
Posted February 10, 2020
When applied implementation efforts are not efficient, more often than not, the inefficiencies are due to the interference of an imp known as "churn," i.e., implementation wheels spinning away and not actually making progress. Churn is bad. Churn is one of the most destructive circumstances for any IT project. Churn may raise its ugly head at any point where a project requirement or need is left unclear.
Posted January 02, 2020
What exactly is a data architecture? As the Zachman Framework exposed long ago, different people look for different kinds of details and documentation to answer fundamental questions about an enterprise's architecture. Someone involved with infrastructure will need to understand the tools used and the methods employed to move data and to be clear on concepts about how security will be enforced. But these aspects are only initial parts of the overall architecture, and as such, a simple diagram of tools used is incomplete and insufficient for a comprehensive view of data architecture.
Posted December 01, 2019
Every organization needs a data warehouse. A data warehouse has never been a one-size-fits-all kind of solution. Variations exist and should be accepted.
Posted October 31, 2019
Certain kinds of issues in data modeling need to be addressed in specific ways. Many options may exist, but it is very rare for all the possible choices to be equally appropriate. The reasons for using a less-than-satisfactory path may be many. They may include a misguided concern for speed-to-completion or be a matter of the areas of control for those "in the loop" on the existence of the issue. Largely, this involves communication or, more precisely, the lack of communication.
Posted October 01, 2019
In order to protect your organization, it is critical to watch over the elements that have been built, keep processes running, and be on top of change. Spend the time and resources necessary to properly maintain the solutions for which you are responsible. The amount spent in such endeavors will be less time than that spent trying to play catch up on too many changes after bad things have resulted.
Posted September 03, 2019
An effective approach to processing and transforming large datasets is likely comprised of multiple steps. The large data will likely be split apart into several smaller sets, maybe even in a couple of differing fashions with a common and understandable theme. But there should not be too many split-apart variants; rather, as with the three bears, it should be just the right number of smaller datasets. And then, similar to solving a Rubik's Cube, a twist or two at the very end brings all the new and old datapoints together in a complete and organized fashion.
Posted August 07, 2019
Data virtualization enables the ability to have one or more data stores that break the bank processing-wise, because they can physically exist once but logically exist in multiple transformed structures. Occasionally, IT managers get the idea that data virtualization is a more generic answer, presuming that if it works for the big data, it can work for all data.
Posted July 18, 2019
At times, there is a need to have security within the database be a bit more sophisticated than what is available. On specific tables, there may be a need to limit access to a subset of rows, or a subset of columns to specific users. Yes indeed, views have always existed, and yes indeed, views can be established limiting rows or columns displayed. However, views only can go so far.
Posted June 10, 2019
CDC can greatly minimize the amount of data processed; but the cost is that the processes themselves become more complicated and overall storage may be higher. Costs are moved around, the final level of processing becomes focused on the minimal changes, and this minimization is the efficiency to be gained. Moving forward, using the data becomes standardized and ultimately straightforward.
Posted May 01, 2019
When working on a multidimensional design, every fact table within scope should be handled with care. In an ideal world, each low-level fact table represents the metrics related to a business event. The meaning of a fact table, ideally, should be evident based on the table name and the composition of the fact table's primary key. Deciding on a primary key for a fact table is an important choice.
Posted April 09, 2019
Clarity of vision is absolutely the most important part of database design. The data architect must understand the shape and patterns of the data being modeled. This lucidity arises when the designer understands the subject area, the goals of the target database, the nature of the data sources involved, and the internal lifecycle of the database objects in scope.
Posted March 04, 2019
In dimensional modeling, business events are typically designated as facts while descriptive information elements are dimensions. However, events (or information about them) occasionally serve as dimensions as well as facts. A good data architect must watch their p's and q's and be certain when it is appropriate for a fact to also serve as a dimension—or when the dual function is not appropriate.
Posted February 08, 2019
More harm than good has been done to software development by letting the planning dates drive the work instead of having the work drive the dates. This planning-date-driven approach causes more stress, more bad decisions, more rework, more failed projects than all other causes combined.
Posted January 02, 2019
Data mart builders must understand what they are working to accomplish. The DBMS is not going to magically guide them to a solution. The builder is responsible for knowing how dimensional techniques work, why they work, and what options may exist within the dimensional framework.
Posted December 04, 2018
While relational database management systems are still the workaday workhorse, we are now adding into the mix document, columnar, and graph datastores, and their variants. Each datastore has something at which it excels, and other things it may not. Similarly, the rules followed in composing data structures, based on the platforms selected, also vary greatly.
Posted November 01, 2018
There has always been a need to tightly control some data items, such as passwords and Social Security numbers. Today, with the rise of concerns over personally identifiable information (PII), the General Data Protection Regulation (GDPR), and other legal mandates, a much larger group of data elements must be controlled. These legal data governance issues may need to guide our hands as we establish database designs.
Posted October 10, 2018
In establishing a staging or landing area for a data lake, a data hub, or a quaint data warehouse environment, structures need to be established that will mimic source data in support of two very basic queries. The first is: "What does the current source dataset look like?" And the second: "What change activity has occurred against the source since the last time it was interrogated?"
Posted September 04, 2018
In the big data world of today, issues abound. People discuss structured data versus unstructured data; graph versus JSON versus columnar data stores; even batch processing versus streaming. Differences between each of these kinds of things are important. How they are used can help direct how best to store content for use. Therefore, deep understanding of usage is critical in determining the flavor of data persistence employed.
Posted August 08, 2018
When we hear the term "think outside the box," how often do we really examine what that phrase truly means? First, one needs a box. And it is on this issue where most folks fail. Before one can consider what is "outside the box," one must clearly understand what exactly is meant by "inside the box." People often consider random approaches the same as being "outside the box." However, just different is not enough.
Posted July 02, 2018
Under usual circumstances, the one-to-many or many-to-many relationship, alone, drives the pattern used within the database model. Certainly, the logical database model should represent the proper business semantics of the situation. But on the physical side, there may exist extenuating circumstances that would cause a data modeler to consider including an associative table construct for a one-to-many relationship.
Posted June 01, 2018
Agile approaches to projects have been touted by many organizations across the globe. IT shops frustrated over expensive projects falling well shy of their goals have been desperate for change. These anxious organizations race into trying Agile as a solution to their woes.
Posted May 08, 2018
For data architects, it is not unusual to use a data modeling tool to reverse-engineer existing solutions' databases. The reverse-engineering could occur for a functional reason, such as gathering information to evaluate a replacement option, or to comprehend a solution, seeking to work out what data should be extracted for a downstream business intelligence need.
Posted April 12, 2018
Many organizations talk about data governance, but rather than establishing an ongoing governance process that is involved with every project, the governance is viewed as a one-time task to be slogged through, over with and done. Such myopic approaches will only lead to failure. Data governance is like life, it is the journey, not a destination.Proper data governance brings commonality to an organization; it leads the journey to a single version of the truth. A single version of the truth does not mean everyone must kowtow to a single metric, but it does mean distinct calculations unique to dissimilar sub-groups have different specific names at the corporate level, even if those naming differences are subtle.
Posted March 07, 2018
Far too often, business users seem consumed by the systems they handle. This makes them unable to define the necessary business processes or needs of the organization. All these users can do is describe what their off-the-shelf packages provide for them. In fact, most users take great pride in their "understanding" of the current system.
Posted February 01, 2018
Don't let the term "unstructured data" confuse you. Structure exists, somehow and somewhere, within unstructured source data. It is that subtle, possibly even encrypted, structure that contains the gems of knowledge that an enterprise seeks. Sometimes those gems only shine when extracted and combined with other little gems from other data sources.
Posted January 02, 2018
No, we weren't born in a crossfire hurricane, nor schooled with a strap right across our backs; we have other crises, mixed priorities, and resourcing deficiencies to cope with. But it's all right now. In fact, it's a gas. There is little choice for those who wish to survive in the IT trade. Either one copes with an ever-changing landscape, or one moves on to another industry. Technology shops across the nation, if not across the world, are in themidst of a crisis. The only problem is that this crisis has gone on for a couple of decades or more.
Posted December 01, 2017
Leadership in IT can be very dynamic. At one time or another it seems anyone may be impressed into a leadership role. The data people (DBAs, data architects, data modelers), often end up considered as unofficial leaders (largely because they are the smarter ones on the team). A leader, whether official or unofficial, has some responsibilities.
Posted November 01, 2017
Operational systems are where data is born. These systems either force people to enter their details or acquire the same from trusted sources. Names, addresses, merchandise selections, and credit card numbers are consumed. The operational solutions interact with users and compatriot applications to give birth to their raison d'etre, be it purchase orders, payroll checks, or any of the thousands of other documents and transactions.
Posted October 18, 2017
Every few years we hear of one new idea or product that will surely bring death to the relational database, or death to the data warehouse, or death to something. It appears that many prefer to see death or at least they greatly enjoy planning for it. Often these finalities never seem to arrive.
Posted September 07, 2017
There is an old joke that a tourist in New York City is lost while sightseeing and notices a musician leaving a taxi; he walks up to the musician and asks, "How do you get to Carnegie Hall?" The musician responds, "Practice, practice, practice." Obviously, such jokes are doomed to obscurity as people now will simply use the navigation app on their phone. However, the advice itself, in its own way, will always be relevant—practice, practice, practice.
Posted August 09, 2017
For everyone normalizing their designs, just remember: There ain't no such thing as a partial foreign key. When a project pretends that partial foreign keys are a valid concept, the end result is a database model that either lies to viewers, or does not give them the necessary information to readily join tables together.
Posted July 05, 2017
Updating fact rows inside a star schema set of tables is never a best practice. Even so, some organizations travel down this path.
Posted June 01, 2017
Choices are pervasive when designing a database. The data modeler must progress through a series of issues: What ideas are important? Which objects stand out? Which concepts can take a back seat? Adding to all those decisions comes understanding the target structures one is shooting at. A normalized design may lean one way, while a dimensional design may lean another.
Posted May 05, 2017
It often seems that working around things is a full-time task in every area of information technology. When workarounds are conceived and deployed, people are not always in agreement.
Posted April 07, 2017
Many businesses seem to believe that dimension tables and reference tables are indistinguishable objects. Apparently, the only difference worthy of note seems to be altering the object's name from "something reference," or "something xref," to "something dimension." As these organizations build multidimensional data marts, they often place a view on top of their reference table and feel good about how quickly and efficiently they complete their data mart.
Posted March 02, 2017
At one of my very first jobs in the service industry, I had a boss who had a variation on an old adage. He would say, "The customer is not always right. In fact many times the customer is dead wrong; but the customer is always the customer." In a similar way, in data modeling, business users and internal IT users, make the world go around. If they don't use the models that are designed, or worse, if they cannot relate to the designed data models, then what value has been delivered?
Posted February 08, 2017
There was a time when what you saw was what you got. Building up the components of a business intelligence area was very straight-forward. A staging area was a staging area; an operational data store was an operational data store. But like buying a pitcher of beer for $2, or gas for less than a dollar per gallon, those days are gone. The dynamics have changed, things are more federated, and IT must accept more than one standard tool.
Posted January 03, 2017
Terms such as "active," "inactive," and "canceled" may seem mundane and inconsequential, and when folks hear the term,"valid values," their eyes glaze and expectations of interest diminish. But exciting or not, reference values and an understanding of them are important to every organization.
Posted December 01, 2016
Why is there such pervasive abuse and misuse of database views today? Views are a helpful tool in building a business intelligence environment, yet many organizations establish practices that not only rob views of their full usefulness but present patterns that actually confuse issues instead.
Posted November 02, 2016
One symptom of an organization in the middle of a knowledge vacuum is evidenced by SQL that often includes what appears to be extravagant usage of the GROUP BY clause. Writing GROUP BYs here, there, and everywhere becomes a little SQL development dance step, a jitterbug to bypass the issue—moving but not really getting anywhere. Why do these kinds of circumstances exist? Well, maybe the only expert on the involved system has retired and no one else has picked up the torch, so no one is willing to touch the code.
Posted October 07, 2016
Programming is a literal sport. Code does exactly what it is configured to do, no compromises. When the definition of a task is fuzzy, it is up to the developer to do what they believe is correct. Does the code reflect what is desired? That answer is left open to interpretation. Sadly, developers may not have a clear understanding, and even the users requesting the solution may not be sure. The results can be very painful for an organization. Expectations may not align with the delivered solutions. Users will blame IT; IT will blame users.
Posted September 02, 2016
As one works through the normal forms, be it a journey to the placid shores of third normal, or the more arid climes of Boyce-Codd normal form, or even the cloudy peaks of fourth normal and beyond—and before one starts thinking about normalizing the design—the database designer has covered a lot of ground work already. Before thinking of normalizing, one needs to have conceptualized the relations that might be within the solution's scope.
Posted August 04, 2016
In the data warehousing arena, development databases often get short shrift. Frequently, this situation arises because development databases are considered too much work to be done properly. So, instead of embracing the problem and following through on the necessary work, it is ignored or done poorly. One could almost say that ignoring the development database has become the standard practice.
Posted July 12, 2016
Many of the NoSQL tools out there, such as MongoDB, Couchbase, Hadoop, and others, purport to be leading a revolution and breaking the bonds of servitude to the restrictive, inflexible, established, relational market. They claim users need more, users need better … and they are there to help. Of course, when speaking about those relational flaws, the comments always focus on problematic aspects of a DBMS' physical implementation.
Posted June 09, 2016