Staging Versus Landing Areas in Analytics Environments

Plenty of analytics environments have landing areas, plenty have staging areas, and some have both. So, are "landing" and "staging" simply two names for the same thing? A survey of actual usage would show considerable overlap in implementations, and some confusion as well.

As the words themselves imply, “land” names a place, while “stage” suggests a step in a progression. Landing databases serve as the initial drop-off point for incoming data, although, as noted, some choose to blur the line between a landing area and a staging area.

Purpose of the Database

Usually, structures within the landing database are one-for-one matches of the received sources. When these landing structures are extended beyond the incoming content, the extensions typically consist of auditing information: a column recording when the data was loaded, the name of the loading program, or a surrogate key in case the source does not guarantee row uniqueness. Metadata tables may also be defined to track when load cycles ran and whether each load succeeded or failed.
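To make the landing pattern concrete, here is a minimal sketch in Python rather than in a database. It assumes rows arrive as dictionaries and uses hypothetical audit column names (`load_ts`, `loaded_by`, `landing_sk`) and a hypothetical `LoadLogEntry` record standing in for a load-tracking metadata table; none of these names come from any particular product. Source columns are copied unchanged, and only audit information is added.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from itertools import count
from typing import Any, Dict, Iterable, Iterator, List, Optional

Row = Dict[str, Any]

# Hypothetical audit column names; the article describes the kinds of
# audit columns but does not prescribe names for them.
LOAD_TS = "load_ts"
LOADED_BY = "loaded_by"
SURROGATE_KEY = "landing_sk"


@dataclass
class LoadLogEntry:
    """One row of a hypothetical load-tracking metadata table."""
    source_name: str
    program_name: str
    started_at: datetime
    finished_at: Optional[datetime] = None
    row_count: int = 0
    succeeded: bool = False
    error: Optional[str] = None


def land_rows(source_name: str, rows: Iterable[Row], program_name: str,
              key_seq: Iterator[int], load_log: List[LoadLogEntry]) -> List[Row]:
    """Copy source rows one-for-one, appending only audit columns.

    Records the outcome of the load in load_log; re-raises on failure.
    """
    entry = LoadLogEntry(source_name, program_name, datetime.now(timezone.utc))
    load_log.append(entry)
    loaded_at = entry.started_at
    landed: List[Row] = []
    try:
        for row in rows:
            landed_row = dict(row)  # source columns are copied unchanged
            landed_row[LOAD_TS] = loaded_at
            landed_row[LOADED_BY] = program_name
            # Surrogate key guarantees uniqueness even if the source does not.
            landed_row[SURROGATE_KEY] = next(key_seq)
            landed.append(landed_row)
        entry.row_count = len(landed)
        entry.succeeded = True
    except Exception as exc:
        entry.error = str(exc)  # record the failure, then let it propagate
        raise
    finally:
        entry.finished_at = datetime.now(timezone.utc)
    return landed


if __name__ == "__main__":
    log: List[LoadLogEntry] = []
    keys = count(1)
    # Two identical source rows: the source is "remiss in row uniqueness".
    source = [{"id": "A1", "name": "Ada"}, {"id": "A1", "name": "Ada"}]
    for r in land_rows("customers.csv", source, "nightly_loader", keys, log):
        print(r[SURROGATE_KEY], r["id"], r[LOADED_BY])
    print(log[0].succeeded, log[0].row_count)
```

In the usage example, the two duplicate source rows still land as two distinguishable rows because each receives its own surrogate key, and the load log records one successful cycle.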

Staging databases are intended for preparing data for further processing. A staging process can involve few or many steps, depending on how many are needed to get from the incoming “here” to the ready-to-go-out “there.” In this preparation role, a variety of table structures may be needed to hold the varying forms of in-process work. Some tables may retain versions of data for comparison in a future cycle; others, being purely intermediary, are truncated and repopulated each cycle.
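One common staging step is comparing the current delivery against a retained copy from the previous cycle to find deltas. The Python sketch below illustrates that comparison under stated assumptions: each snapshot is a list of dictionaries with a unique business key, and audit columns (such as a hypothetical `load_ts`) can be excluded from change detection through an `ignore` set. The function names are illustrative only, not taken from any specific tool.

```python
from typing import Any, Dict, FrozenSet, Hashable, Iterable, List, Tuple

Row = Dict[str, Any]


def _payload(row: Row, ignore: FrozenSet[str]) -> Row:
    """Columns that count toward change detection (audit columns excluded)."""
    return {col: val for col, val in row.items() if col not in ignore}


def index_by_key(rows: Iterable[Row], key: str) -> Dict[Hashable, Row]:
    """Index a snapshot by its business key, which must be unique."""
    indexed: Dict[Hashable, Row] = {}
    for row in rows:
        k = row[key]
        if k in indexed:
            raise ValueError(f"duplicate key {k!r} in snapshot")
        indexed[k] = row
    return indexed


def compute_deltas(previous: Iterable[Row], current: Iterable[Row], key: str,
                   ignore: FrozenSet[str] = frozenset()
                   ) -> Tuple[List[Row], List[Row], List[Row]]:
    """Compare the retained previous copy with the current one.

    Returns (inserts, updates, deletes). The current snapshot would then be
    retained as the 'previous' copy for the next cycle's comparison.
    """
    prev = index_by_key(previous, key)
    curr = index_by_key(current, key)
    inserts = [row for k, row in curr.items() if k not in prev]
    updates = [row for k, row in curr.items()
               if k in prev and _payload(row, ignore) != _payload(prev[k], ignore)]
    deletes = [row for k, row in prev.items() if k not in curr]
    return inserts, updates, deletes


if __name__ == "__main__":
    previous = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 3, "v": "c"}]
    current = [{"id": 1, "v": "a"}, {"id": 2, "v": "B"}, {"id": 4, "v": "d"}]
    inserts, updates, deletes = compute_deltas(previous, current, "id")
    print("inserts:", inserts)
    print("updates:", updates)
    print("deletes:", deletes)
```

Here the retained `previous` snapshot plays the role of a table kept for a future cycle's comparison, while the three delta lists are the kind of purely intermediary results that would be truncated and repopulated each cycle.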

Variety Within a Framework

Building up an analytics environment involves choices based on a combination of the needs of a specific circumstance and the desires of the architects involved. This last aspect means that not everything is a science: preferences, individualism, and comfort levels all play a part in arriving at the solution to be implemented. However, this is not the Wild West, and no architect is free to establish brand-new paradigms at will. Across organizations, one should expect variety within a fairly standard framework of design components.

Among these choices is whether to have a landing area, a staging area, or both. If a business has no need for staging processes because the data is ready for use as it comes in, then only a landing area may be defined. If everything goes through significant processing (say, every incoming data store must be compared to a previous copy to determine deltas), then a distinct landing area may seem moot, and only a staging area may be defined. If a staging area is forgone, an ETL area may be defined in its place as a sandbox for engineers' processing needs. Another possibility is that an architect-in-charge who sees no distinction may lump everything together and simply name it landing or staging, whichever word he or she likes better.

Agreement Is Crucial

If one is building a data warehouse, a data hub, or a data lake, one might have a landing area, a staging area, or both. Which areas are defined is not, by itself, an indicator of design quality. As long as the needed functionality exists and has a designated place to live, agreement is the key to making things work. Whether an organization chooses to squish landing into staging or staging into landing, or, like the compartments of a microwave dinner that keep the peas apart from the carrots, to maintain both a functioning landing area and a functioning staging area, any of these choices is fine.

The only real requirement is that everyone knows which design option has been chosen. Equally important, folks should understand these subtle industry distinctions, so that when newer members who think differently join the team, they can be set right without being scolded for thinking “wrong.”