True Surrogate Keys Silently Live in the Shadows

Surrogate means a stand-in, a substitute. Surrogate keys take the place of a natural key as the primary key within a database table. Ideally, surrogate keys stand in the shadows and are used in joining structures but not displayed or used in reporting. Using a surrogate key on a data­base table does not mean there is no natural key for the object in focus. If one has an object without a natural key, it means either one does not under­stand one’s data, or one’s data is meaningless—just a nonsensical group of stuff.

Using Surrogate Keys

Using surrogate keys within a database is often considered a technique to improve performance. The assumption is that using anything other than a numeric data value to join tables provides “bad” performance. Therefore, whatever the nat­ural key may be—one column, multiple columns, alphanumeric, etc.—the surrogate key can be a 100% numeric single value, standing in for that natural key value set. Some DBMSs have key generators that are numeric, others may be more wide-ranging in values. Some organizations may choose to use surrogate keys generated from hashed natural key values. Will surrogate keys improve everyone’s query performance? As with the stock market, specific circumstances differ everywhere, so the individual results may vary.

If surrogate keys in a database are exposed to and known by users, then these surrogates are essentially no longer surrogates. As the surrogate keys become values in user-created filters, the users now see these values as objects’ identities, removing the veil of surrogacy. Similar to Pinocchio becoming a real boy, these surrogates are now new business keys, IDs, or whatever. The use of surrogate keys in joins alone do not supply this level of magic, as joining on a column does not require users to remember and embrace individual val­ues. Similarly, the rise from surrogate value to business value may occur as data is piped from one data store into another. The second per­sistence of this data may now have two candidate keys, as both the original business key and the source’s surrogate key are extracted, exposed, and used indiscriminately by user communities. One man’s ceiling is another man’s floor. Adding more surrogate keys at each layer only complicates and confuses; where can the chaos stop?

Operational applications often use generated keys, but these generated keys may not always be surrogates. An ordering appli­cation generates orders, and that application’s generated order numbers are most likely legitimate business keys. Alternatively, dimension keys in a star schema are more often meant to be surrogate keys and not used as filters in queries. Surrogate keys and their values should not be extracted and pumped into other downstream solutions. These downstream systems should only extract business values, and, if they need surrogates internally, they should generate their own new surrogate values.

A Level of Discretion

Similar to a masonic brotherhood, surrogate keys are known to exist, but at the same time there is a level of discretion about them—a secret handshake of sorts. Using the surrogate values only for joins—and never for filtering—allows for the ceremo­nies within the “masonic lodge” to remain undisclosed. Surrogate keys are not meant for the front pages of the news. Surrogate keys may improve query performance, and they also can be thought of as a form of tidiness. Some folks appreciate that all tables can be defined to have a single numeric primary key. In dimensional designs, surrogate key values support the magic of slowly chang­ing dimensions, as the surrogate values are not exactly one-for-one with the natural keys but are instead aligned with the nat­ural key plus the value changes of any critically dependent data elements. When using surrogate keys, use them wisely and try to keep them in the shadows, as they are best intended.