Primary keys come from candidate keys. Each candidate key consists of the attribute or attributes used to label a distinct row in a table. Every candidate key should contain the fewest number of attributes possible to identify rows individually and uniquely. Every entity within a design requires at least one candidate key. The candidate keys should arise normally during analysis from business data, such as order number, employee ID, etc. These expected candidate keys are also referred to as natural keys. If no natural keys are uncovered, or if the natural key is a composite of an unusually large number of attributes, then an artificial, or surrogate, key may be defined. The artificial key is a data item, created for the specific database entity, that otherwise would not exist. Therefore, the processes maintaining data will also need to maintain the artificial key values and their assignment. If no natural keys exist for a table, then very awkward logic may arise in maintaining keys. During the design process the primary key of a table is designated by choosing one of the candidate keys. If a single candidate key exists for a table, the single candidate key defaults into service as the primary key. When multiple choices exist, then one candidate key must be chosen as the primary key. While the primary key choice may be arbitrary, the best primary key selection is often the option containing the fewest number of columns, i.e., select a one-column alternative over a multiple-column option. Alternatively, selecting the option that appears to be the most "obvious" choice, within the everyday semantics used by the organization, is frequently the best decision.
A primary key is a logical concept, not a physical one. Some folks confuse this issue when implementing the physical details of this logical concept for the DBMS. One example of the logical/physical confusion emerges when designers add extra columns onto a primary key definition because they are focusing on the DBMS-built index "behind" the primary key. By default, said index is likely to be clustered. Clustering should be done along the most likely access path used to select and filter rows. The confused designers add several columns onto the definition of the primary key because they know that frequent access is optimal when clustered a certain way. While the designer's assumptions about access paths may be entirely true, the designer who makes such choices creates primary keys that are no longer logical primary keys. Not sustaining the primary key as the logical and minimal set of columns results in the DBMS relying on a larger set of columns to identify uniqueness. This incorrect belief occasionally may cause the DBMS to make bad choices in resolving queries because it assumes values are necessary for all of the columns in the defined primary key to uniquely identify any row.
The same confused designer likely has the alternative of leaving untouched the initial, logical designation of the primary key. Invoking the proper options during the primary key definition may result in the supporting index not being clustered. A separate clustering index still could be defined, using all proper access path knowledge identified for the table. In this manner, the semantics of the logical primary key remain true in their physical implementation, and access performance is optimized via the other clustered index. Additionally, logical and physical ideas remain separate and distinct, eliminating the confusion caused by an attempt to define a single object serving both purposes. Indexes and keys are different concepts. The flaw in a single-object approach is that one side is misstated when forcing the two thoughts into one database object.