In writing a definition for an entity, an attribute, or any other element within a database design, the desired end is a descriptive text that is clear, factual and concise. Semantics are an ambiguous and often painful tool to employ. Balancing the need for clarity against the desire to avoid redundancy can be a juggling act that is hard to accomplish. One might not easily recognize what is complete versus what is lacking, versus what has gone too far. But even so, within a definition if one finds oneself listing valid values and decoding the value's meaning, then one has likely already moved beyond what is "concise." Lists of values easily add bulk and quantity of verbiage into a definition, yet such lists do not usually increase the quality of a definition.
Posted November 13, 2012
The beauty of a truly wonderful database design is its ability to serve many masters. And good database designers are able to empathize with those who will use their designs. In business intelligence settings, three perspectives deserve consideration when composing designs.
Posted October 10, 2012
It seems easy to fall into a state where projects and activities assume such soft-focus that routine takes control, where one simply does necessary tasks automatically, no questions are raised regarding what is moving through the work-life production line and everyone is essentially asleep at the switch. Certainly, we may have one eye open ensuring that within a broad set of parameters all is well, but as long as events are basically coloring inside the borders we continue allowing things to just move along. In this semi-somnambulant state we can easily add columns to tables, or even add new entities and tables, or triggers and procedures to our databases, then eventually at some point down the road have someone turn to us and ask, "Why this?" or, "What does this really mean?" And at that point, we surprise ourselves with the discovery that the only answer we have is that someone else told us it was what we needed, but we do not really understand why it was needed.
Posted September 11, 2012
A database design may occasionally show evidence that it lacks proper prioritization. Data models should express truths about the business, or about the universe of discourse. But in expressing business truth this does not mean a data model should express absolutely every truth that anyone might conceive. Some relationships are significant while other relationships are not. And as a general rule, database design is not an exercise in trivial pursuit. Insignificant truths only clutter up a design, increasing complexity, causing users' eyes to glaze over more quickly, and adding no real value towards the endeavors of the enterprise.
Posted August 09, 2012
The whole world can be divided into two groups, these being splitters and lumpers. Design battles are waged across conference rooms as debates rage over whether to split or to lump. Splitters take a group of items divide them up into sub-groups and sub-sub-groups occasionally going so far as to end with each lowest level becoming a group of one. On the other side of the design fence, lumpers combine items until everything is abstracted into group objects covering very broad territory, such as a "Party" construct, or ultimately an "Object" object. Within data modeling, arguments arise, such as whether to sub-type an entity. Or perhaps lumping is discussed as the grain of a multidimensional fact is proposed. This debate underlies much of the decision-making involved in determining what domains to create within a data model. The split-versus-lump issue is ubiquitous and universal. The question to split or lump arises across many kinds of choices, in addition to the entity definition, table grain, or the domain grain mentioned in the previous examples; this issue is at the heart of deliberations regarding establishing functions, overriding methods, or composing an organizational structure.
Posted July 11, 2012
In the dim, dark past of data warehousing, there was a time when the argument was put forward that "history does not change." It was posited that once a piece of data was received by the data warehouse, it was sacrosanct and nonvolatile. A fact record, once processed, was to remain unchanged forever. Dimensions, due to their descriptive nature, could be changed following the prescribed Type 1, 2, or 3 update strategies, but that was all. It was the expectation that due to their very nature, fact tables would become huge and in being huge would give poor update performance; performance so poor that updates would be virtually impossible to enact.
Posted June 13, 2012
It seems only reasonable that what one person can do, others can learn. On the other hand, taking people through training does not usually result in the creation of great new database administrators (DBAs). It often appears as if those who are exceptional at the craft operate at higher levels as they dive into a problem. Can training alone provide folks with the attention to detail, the urge to keep digging, or the ability to recall minutiae that allow them to rise from simply holding the DBA title to becoming someone who is a great DBA? Or must the genetic potential exist first, and then one might fall into the DBA occupation and astound those around them. It is very hard to say with any degree of certainty whether great DBAs are made or born; yet again the battle between nature and nurture arises.
Posted May 09, 2012
Organizations are populated with solutions entitled DW, EDW, BIC, ODS, EIW, IW, CIF, or BW. Why must every organization have a data warehousing or analytics solution identified by monikers from a very limited pool of choices? Why must every deployment of a database that is expected to function as an operational data store be called ODS? For internal solutions it seems that plain and dreary naming approaches are de rigueur. Two and three letter acronyms have long been a part of corporate-speak; but when it comes to IT systems, these TLAs have become exceedingly narrow and soul-less.
Posted April 11, 2012
Solution development work is usually accomplished via projects, or a combination of programs and projects. This project perspective often leads to thoughts of documentation as project-owned. And while many documents are project-specific, such as timelines, resource plans, and such, not everything is project-specific. Unless projects are established in a fashion whereby each is very limited in scope to the creation or enhancement of a single application or system, specification and design documents belong to the final solution and not to the project.
Posted March 07, 2012
Lectures related to master data bring forth all sorts of taxonomies intended to help clarify master data and its place within an organization. Sliding scales may be presented: at the top, not master data; at the bottom, very much master data; in the middle, increasing degrees of "master data-ness." For the longest of times everyone thought metadata was confusing enough ... oops, we've done it again. And, we have accomplished the establishment of this master data semantic monster in quite a grand fashion.
Posted February 09, 2012
Retaining the particulars of change over time is a fairly intricate configuration. Audit log or shadow tables are sometimes employed, but on occasion there is a need for the "old" and "new" rows to exist in a single operation table for application use. Far too often, the implementation of temporal data structures is shoddy, loose, and imprecise; rather than the fairly complex dance move such temporal arrangements must perform in actuality. The sub-optimal result is much like one's performance of the Funky Chicken at a friend's wedding; the desired moves are mimicked, after a fashion, but it is unlikely to earn high marks on "So You Think You Can Dance." The usual temporal implementation simply slaps on start and stop dates, debates a little over default date values versus NULLs, then moves on to the next subject.
Posted January 11, 2012
The cost for new development can often be easily justified. If a new function is needed, staffing a team to create such functionality and supporting data structures can be quantified and voted up or down by those controlling resources. Money can be found to build those things that move the organization forward; often, the expense may be covered by savings or increased revenue derived from providing the new services.
Posted December 01, 2011
It is not magic. Building a successful IT solution takes time. And that time is used in various ways: obtaining an understanding of the goal; mapping out what components are necessary and how those components interact; testing components and their interaction; and finally migrating those components into the production environment - otherwise known as analysis, design, development, testing, and deployment. Regardless of the methodology employed, these functions must always be addressed. Different approaches focus on differing needs and aspects. But any complete methodology must fill in all the blanks for accomplishing each of these tasks.
Posted November 10, 2011
Every project is, or should be, driven by user requirements. Requirements are the organization's way to articulate what needs to happen in order to provide value. Yet time and again requirements are looked at as something overly technical, mysterious, and too confusing to easily handle. Repeatedly, organizations use a template to ensure that requirements are defined early in the solution process. Sadly, the intended purposes are habitually defeated as these templates are filled with a lack of understanding for what information belongs in a given section, resulting in people creating documents that the authors themselves do not understand. Across many organizations requirements documents are created, reviewed, and even agreed to, that far too often are incoherent monstrosities saying nothing of actual value.
Posted October 15, 2011
When assembling a database design, one of the keys for success is consistency. There should be more than just similarity in the way things are named, the manner in which tables or groups of tables are constructed; the manifestation of these elements should follow standards and practices that are documented and understood. If one tries to rely on the idea that individual developers will simply look at existing tables and glean standards via osmosis as they add on or create new tables, then one does not actually have any standards at all.
Posted September 14, 2011
The Broadway tune goes, "The sun will come out tomorrow ... it's only a day away." The words from this optimistic jingle are often heard on IT projects that are overburdened with features and functions. On any project of significant size the list of desired things often becomes larger than the budgeted resources or time. Faced with limiting circumstances, the only option becomes aligning the work effort with the constraints and only doing what fits within those constraints. The items are "timeboxed," and the amount of planned work is exactly the amount of allowed work. Some things remain in, while other features are left out. Alternatively, responsibility for controlling a project may be ignored and the end date arrives with some things simply not completed. Essentially, a project is timeboxed by default.
Posted August 11, 2011
Naïve approaches to business intelligence will occasionally trap designers as they juggle operational data stores and data warehouses. The trap results from an honest endeavor to simplify designs and increase consistency throughout the solution. Under the umbrella of consistency a designer may plan for a reference table used for operational look-ups to perform a second service as a star schema dimensional table. Some or all reference tables then are declared by fiat to also be dimensions. While on a superficial level there are similarities between dimension tables and more normalized look up or reference tables, fundamentally these two concepts are separate things.
Posted July 07, 2011
Occasionally, one sees a data structure abomination. This atrocity involves an object of almost any type, in almost any database wherein the object has a start date but no end date. It is not that the finish date currently has no value and is null; it is that the end date does not even exist on the table structure. The stop date was never intended to exist. The object in question starts, but it doesn't ever end.
Posted June 08, 2011
Tables within a database management system (DBMS) need primary keys and defined indexes in order for the DBMS to have the opportunity to provide good query performance. Without indexing, the worst possible query performance is guaranteed. The content of these non-keyed tables remains unknown to the DBMS, a black box, where the only possible approach for query execution is to read every row one-by-one. Under such scenarios, the DBMS is little more than a file server that operates more slowly than usual. Without the primary key and indexing, the DBMS may expend even more processor cycles in moving data in and out of the DBMS proprietary storage areas than a file server uses in opening files.
Posted May 12, 2011
Dates are important. Without dates how can anything be planned? However, due dates have been know to increase in importance in the delivery of software solutions. Sometimes the due date becomes such an overwhelming creature of importance that the date is more important than following best practices, more important than verifying that what is built is correct, more important than the solution team gaining a proper understanding of the work they are attempting to perform.
Posted April 05, 2011
The understanding of object states and their transitions obviously is of great importance to the solution developers because as processes are built they will need to support each state and every possible transition. Additionally, knowledge of object states and transitions is of vital importance to the data modeler because the data must be persisted across each of those states, and often the state of an object needs to be easily identifiable. A data model minimally requires a reference table, along with the varying entities that reference that table (the foreign keys tracking an individual object's status). Specific states drive variations of required attributes or combinations of those attributes that apply to one state and not another. The logical definition of the database can identify these variations through the use of supertype/subtype constructs.
Posted March 09, 2011
Referential integrity helps manage data by enforcing validation between related entities. This enforcement follows logical semantics behind the database design -- i.e., an employee can only work for an already defined department; a prescription can only be written by a health care practitioner with the proper authority. A Foreign Key on an Employee table rejects data when any attempt is made to insert or update a row with a department value that does not already exist as a department identifier within a Department table.
Posted February 02, 2011
Back in the 1970s, the ANSI SPARC three-tiered model arose, foreshadowing a smooth intertwining of data and architectural design. The three tiers concept isolated the physical storage needs of data structures independent of business' perception of these structures. The three levels were comprised of schemas labeled external, conceptual, and internal, with each level describing the data in focus from varying perspectives.
Posted January 07, 2011
How does one know what one doesn't know? When evaluating what one knows, it is hard to know where to begin. The wise men say, "The more you know, the more you know you don't know." If one believes such commentary, what is known constitutes the tip of the proverbial iceberg. Databases have an easier time with such missing circumstances. If the rows of a database table are lacking referents, an outer join query filtering for NULLs might detail for you all the missing items. In developing and delivering projects, such a reference list for our minds to link to does not exist, for an outer join or anything else. Often, we do not know everything that needs to be done, particularly as a project starts. The difference between success and failure is not so much what one knows, but in how one handles the gaps between what is known now and what needs to be known before one finishes.
Posted November 30, 2010
Psychiatrists report play is important for keeping our minds and bodies healthy and happy. It has been uncovered that without at least some play, we mentally and physically deteriorate. Lack of play can lead to depression. Our brains are so hardwired in their need for playful activities that without such expressions we loose focus. Play enhances the creativity and imagination within us. Likewise, play has a proper role within our work. And not just for the psychological benefit to the individual, but for the overall health of the organization. For an IT development group, play is essential for finding better ways to approach problems. Within an IT context, play involves evaluating new ways to handle old problems. Prototyping, at times, constitutes play. Prototypes can be used for more than just the practicalities of the user interface; sometimes a prototype of specific functions is necessary for comparisons in order to judge how non-obvious techniques may perform. Such prototype endeavors can involve not only new algorithms, but potentially variant data structures. Ideally, such prototypes will improve team understanding of the data structures and offer insights into potential new and innovative approaches.
Posted November 09, 2010
As bizarre as it seems, no one sabotages our efforts more easily than ourselves. We often leave ourselves vulnerable to failure whenever we respond to requests for project estimates. Like "A Tale of Two Cities" estimating can bring our best and our worst. In offering estimates, individuals may undercut their own efforts. For example, a developer may only consider time coding, and fail to include enough headroom for testing, rework, or documentation. While some aspects of these estimating attempts might suggest a passive-aggressive approach to avoid tasks, much of this estimate-short-sheeting often results from a desire to please and provide a number to make management "happy." There is always a desire from management to have more things done, and to do things quickly on every project. Even when not stated, this unspoken desire colors the responses of the individuals doing the estimating.
Posted October 12, 2010
As databases are established, particularly databases intended to support analytics initiatives, responsibilities for the design must include articulating the planned approaches for enhancing and scaling the database over time. If a database is created to express a multidimensional data warehouse bus architecture, or a corporate information factory, or anything else, the explanation of this connection should exist somewhere. Such documentation should also expand on why things were decided as they were and the expected stylings to be associated with proposed enhancements. Descriptions involving anticipated processing patterns extend naturally from such architecture artifacts. Database and application design personnel should work together in the creation of such credentials to ensure these documents thoroughly cover the needs of the personnel involved in building and maintaining the solution.
Posted September 07, 2010
Design should be an intention, preferably a planned intention. In that intention, design requires more groundwork than a simple thought-train such as the following, "I planned to write a module that functions; since the module functions, my designs are working." Some situations do exist where true design really is less important than successful functionality. Determining the appropriate level of design and preparation offers an interesting question to every architect. At the lowest level, standards and practices present suitable patterns that can serve as a design skeleton for those low-level or isolated items not requiring a heavy-handed blueprint.
Posted August 10, 2010
When integrating data, evaluating objects from multiple sources aids in determining their equivalence. Each source may identify customers, but determining which customer from each system represents the same customer can prove daunting. Sometimes matching things is straight-forward; for example, if all sources should have an accurate social security number or taxpayer ID, success involves simply linking the matching numbers.
Posted July 12, 2010
Quality can be a hard thing to define. What is good and what is bad may not be easily identified and quantified. When a data mart accurately reflects data exactly as found in the source, should that be considered a quality result? If the source data is bad, is the data mart of high quality or not? If the data mart differs from the source, when is the difference an improvement of quality and when is said difference evidence of diminished quality? While it may seem self-evident that correcting the source of load data would be the "right" thing to do, in practice that direction is not necessarily self-evident. The reasons supporting this nonintuitive approach are varied. Sometimes changes to the source impact other processes that must not change, or the changes will expose problems that may provoke undesired political fallout, or it may simply be that making the proper adjustments to the source application would prove too costly to the organization. For all these reasons and more, in the world of business intelligence, the dependent data often is expected to be of higher quality than the source data. In order for that improvement to occur, data placed within the dependent mart or data warehouse must be altered from the source. Sometimes these alterations become codified within the process migrating data from the source. Other times changes are made via one-time ad hoc updates. Either way, this alteration leads to a situation in which the dependent data will no longer equate one-for-one to the source data. Superficial comparisons of this altered content will highlight the disparity that what exists for analytics is not the same as what exists for the operational system.
Posted June 07, 2010
It is rare to find an implemented database devoid of reference tables. Reference tables provide valid values for drop-down lists and in a slightly obtuse way, also allow for expression of a domain or user-defined data type within the database design. Nominally a reference list is but a list of values to be referenced. In practice, the implemented structure of a reference table is driven by small nuances regarding how each list is used. The signifiers may consist of a code or number that creates a short-hand way of expressing a state or value. Yet in addition to such simple codes, a "medium length" name column or even a column to contain a lengthy text description should exist in support of these codes. The names may be used in drop-down lists so that people need not memorize many code values, and the descriptions can assist help-functions by providing users with more understanding of a value's meaning or intended use. For operational applications it may be worthwhile to have an attribute expressing the order in which items should be displayed, unless alphabetical display is a preference. If codes are maintained via online screens, an indicator may be helpful in flagging values that are part of the system and not to be removed or deactivated via those online screens.
Posted May 10, 2010
Far too often we all talk past one another. This cross talking, while not always drastic, remains perceived as an understood fuzziness. Much of the time we ignore these minor miscommunications because precision and clarity are not necessarily critical in all situations. If the general gist is effectively understood between those communicating, that generality may be all that is necessary. Those involved in the communication may feel comfortable that assumptions made to "fill in the gaps" will fall within an acceptable range. Although the lack of clarity in the message communicated may be acceptable, in other circumstances it may not be acceptable.
Posted April 07, 2010
Primary keys come from candidate keys. Each candidate key consists of the attribute or attributes used to label a distinct row in a table. Every candidate key should contain the fewest number of attributes possible to identify rows individually and uniquely. Every entity within a design requires at least one candidate key.
Posted March 04, 2010
Designing a data model that supports the reporting and analytical functions is no different, initially, than any other modeling effort. Understanding the data is crucial. The data architect or modeler needs to feel comfortable with dimensional modeling techniques and needs to obtain a working knowledge of the universe of discourse for the subject-at-hand. A good start in gathering this knowledge begins by reviewing the operational data structures containing the identified source elements. The challenge in designing analytical solutions is found in applying best practices for analytics simply and effectively.
Posted February 09, 2010
The process for designing a database that supports the operational functions of an organization begins with simple understanding. The data architect or modeler needs to obtain a working knowledge of the language comprising the universe of discourse for the solution. This awareness is gathered through many activities, such as talking with the people currently doing the work, sitting with them and watching how they do their tasks, reading over existing training manuals or standard operation procedures. The designer is best served when figuratively walking a mile in the shoes of the future application users. The more that the designer knows about the user needs and goals, the better able the designer is to definitively craft a data model supporting user tasks.
Posted January 11, 2010
In composing a data model, the structures are put together thoughtfully and with intention. Data structures emerge from the application of semantics germane to the universe-of-discourse and filtered through the rules of normalization. Each table expresses meaning, with columns that are self-evident. The best models reduce the data items within the covered subject area into obvious arrangements. However, this simplicity often confuses observers, persuading many that modeling itself must therefore be a simple task. DBMS tools often incorporate wizards that facilitate the quick definition of tables which are then immediately created within the chosen database. These tools enable the developer to build tables in the blink of an eye. At times some prototypes are approached in this fashion, and while this provides for placeholders, such slapped-up table structures are insufficient for an industrial strength solution. Under these instant-table setups, developers often have no problem reworking large sections of code for minor changes, or misusing data elements to mean many things while making the determination of meanings at a specific point-in-time less clear. Unaware of these less-than-stellar consequences, users become confused; they often wonder why modeling tasks should ever need to be done because the proof of concept worked, didn't it?
Posted December 14, 2009
The pervasive nature of data continues unabated, leaving organizations more awash in data than ever before. Technology has enabled the access and leveraging of information to heights undreamed of a generation ago. Between corporate dashboards and internet Googling, vast quantities of information are truly at one's finger tips. Data-driven, domain-driven, model-driven … the data itself is a force to be met and managed. When managed well, users never explicitly think about the databases that persist all that data.
Posted November 11, 2009
The art of building software solutions is comprised of many moving parts. There are project managers coordinating tasks; business analysts gathering requirements; architects establishing standards and designs; developers assembling processes; DBAs building database structures; quality assurance staff testing components for compliance to requirements; and an array of supporting roles providing for functional environments, infrastructure, security, etc. A common task that everyone must perform is estimating the effort necessary to deliver results. Certainly for simple and/or repetitive tasks there is no need for recurrent estimating, since applicable values are based on past known metrics. For example, creating the third or fourth version of the same database within the same environment should allow a DBA to incorporate costs experienced previously as a guide. And unless something unusual occurs, such estimates should be on target. However for creative tasks, such as designing new structures, or building new kinds of processes, there will be no previous documented events to refer to. Faced with these circumstances, individuals usually are not allowed to shrug and say, "it will take as long as it takes," and be left alone.
Posted October 13, 2009
Composite keys are an implementation of business rules within the database. As an example, a table named INVOICE has a composite primary key consisting of Account_Number and Invoice_Date. In this example several possible rules are being expressed. The Account_Number which partially identifies an INVOICE instance implies that an account must exist before an invoice can exist. In addition to this INVOICE table, one expects to see a table named ACCOUNT with only the Account_Number as its primary key. Likewise, a referential integrity constraint would be defined between the INVOICE and the ACCOUNT tables based on that Account_Number value. In this manner, the DBMS would prevent Account_Number values from insertion into the INVOICE table unless they already existed in the ACCOUNT table.
Posted September 14, 2009
New data modelers often see things as black and white. But rather than being concrete flooring beneath our feet, knowledge is more like a gossamer web that builds up layer upon layer to provide the effect of a solid foundation. We may think we know facts, such as one plus one equals two, or Columbus discovered America in 1492. What we have come to know about our world comprises our own internal knowledge base. We gain much of this knowledge because it has been passed onto us by others, people with experience, parents, educators, clever friends, verbally or in print. But how do we know such items are real, actual, absolute bona fide beyond-a-shadow-of-a-doubt truths?
Posted August 14, 2009
While good database design is always necessary, the value of good design is reinforced by such endeavors as service-oriented architecture. The semantics-laden engineering of service-oriented tactics used in creating solutions melds seamlessly into the semantics-laden world of data modeling. Although one aspect does not necessarily build on top of the other, each works as part of a team in braiding solutions that grow in usefulness as they emerge. The processes surface as a method to further define the meaning of an object, and the object serves as a harbinger of the processes that must exist. Business rules much more advanced than today's simple constraints may one day reside within the database itself as a built-in function of a some future DBMS. Therefore, data architects should never try to wall themselves behind a veneer of thinking only in terms of data without a thought regarding the processes that touch the data. As a database designer, working within projects that use service-oriented architecture approaches can be exhilarating.
Posted July 13, 2009
There comes a time at the start of a new engagement when the data architect must acquaint himself with the system for the first time. When first learning about a new application, the relevant data, and its foundational concepts, many questions are reviewed.
Posted June 15, 2009
An inherent awkwardness exists in every many-to-many relationship between entities. Ambiguity causes this persistent awkwardness, primarily because a many-to-many relationship is such a fuzzy thing. In data discovery, encountering many-to-many relationships may actually expose a level of disregard about details by the subject matter experts.
Posted May 15, 2009
In a relational database approach, regardless of whether one is considering persisted data or transitory query results, almost everything is viewed as a logical table. Associations between these envisioned tables are based on data values, versus the alternative to the pre-relational idea of linking multiple data structures via "hidden" pointers maintained by the system. Relationships among objects are ultimately derived from the semantics of a situation.
Posted April 15, 2009
In formulating the tenets of relational theory, issues anent to order were explicitly addressed. These relational theory tenets included defining a relational database so that it need have no concern with the order of columns in a row, or with the order of rows in a table. And yet, such a stance seems counter-intuitive since the database brings structure and organization to content. Chaos is the primordial soup from which all things originated. Thus it seems only reasonable that a relational database, being the best and brightest of its kind, should abhor such chaos and bring ever more order instead, right?
Posted March 15, 2009
Increased governmental oversight has amplified organizational attention concerning the tracking of content changes made to application database tables. Database log reading tools do not always provide the most useful means of reviewing specific changes for all organizational purposes. Fortunately, other reviewing options do exist.
Posted February 15, 2009
Occasionally the database under design becomes more than a one-off project. For example, the data mart or data warehouse might act as a profit center for the organization, and as a solution sold to many customers. The possibility exists to capture a basic truth for a given industry in a simple universal design exactly suiting everyone. But unless the situation really qualifies as a valid "one-size-fits-all" circumstance, one needs a level of customization within the architecture.
Posted January 15, 2009
Tight budgets and limited talent pools do not make a winning combination for staffing a database management group. Despite the importance of saving money, problems can arise when the lines blur regarding what gets wasted versus what gets saved.
Posted December 15, 2008