A Database Perspective on Data Security

Although security and privacy are often mentioned together, they are not synonymous. Data security is the set of policies and techniques that ensure the confidentiality, availability, and integrity of data at all times. Data privacy means that the parties accessing and using the data do so only in ways that comply with the agreed-upon purposes of data use for their roles. These purposes can be expressed as part of a company’s policy, but are also subject to legislation. Several aspects of security can therefore be considered necessary instruments for guaranteeing data privacy.

More concretely, data security pertains to the following concerns:

Guaranteeing Data Availability: Ensures that the data remains accessible to all authorized users and applications, even when parts of the system malfunction. Many techniques exist to safeguard data by means of backup and/or replication. Examples are tape backup, hard disk backup, electronic vaulting, replication, and mirroring.
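As a small illustration of the backup idea, the sketch below copies a database into a second, independent copy and then reads from that copy after the original is gone. It is a minimal sketch, assuming Python's built-in sqlite3 module and in-memory databases; the customer table and the backup_database helper are hypothetical names chosen for the example, and a production setup would write the copy to separate storage.

```python
import sqlite3


def backup_database(source: sqlite3.Connection) -> sqlite3.Connection:
    """Copy the full contents of `source` into a fresh in-memory replica."""
    replica = sqlite3.connect(":memory:")
    source.backup(replica)  # SQLite online backup API (Python 3.7+)
    return replica


# Hypothetical primary database with a single table.
primary = sqlite3.connect(":memory:")
primary.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
primary.execute("INSERT INTO customer (name) VALUES ('Ada')")
primary.commit()

replica = backup_database(primary)

# Simulate losing the primary copy; the replica can still serve the data.
primary.close()
rows = replica.execute("SELECT id, name FROM customer").fetchall()
print(rows)
```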

Authentication and Access Control: Refers to the tools and formats to express which users and applications have which type of access (read, add, modify) to which data. An SQL privilege corresponds to the right to use certain SQL statements, such as SELECT, INSERT, DELETE, or UPDATE, on one or more database objects. Views, which are part of the external data model, offer a further means of restricting access. A view is defined by means of an SQL query, and its content is generated when the view is invoked by an application or by another query. It can therefore be considered a virtual table without physical data, tailored to the needs of one or more applications or users and exposing only the data they are meant to see. An important condition for adequate access control is the availability of authentication techniques, which allow for unambiguously identifying the user or user category for which the access rights are to be established. The most widespread technique is still the combination of a user ID and password, although several other approaches are gaining ground, such as fingerprint readers or iris scanning.
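To make the idea of a view as a virtual table concrete, the sketch below defines a view that leaves out a sensitive column. It is a minimal sketch, assuming Python's built-in sqlite3 module; the employee table and the employee_directory view are hypothetical. SQLite has no privilege system, so the step of granting a user SELECT rights on the view but not on the base table is only noted in a comment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        department TEXT NOT NULL,
        salary INTEGER NOT NULL
    );
    INSERT INTO employee (name, department, salary) VALUES
        ('Ada', 'Research', 9000),
        ('Bob', 'Sales', 7000);

    -- The view exposes only non-sensitive columns and stores no data itself.
    CREATE VIEW employee_directory AS
        SELECT id, name, department FROM employee;
""")
# In a DBMS with privileges, one would grant SELECT on the view only,
# not on the underlying employee table (not supported by SQLite).

cursor = conn.execute("SELECT * FROM employee_directory ORDER BY id")
visible_columns = [column[0] for column in cursor.description]
directory_rows = cursor.fetchall()

print(visible_columns)
print(directory_rows)
```

Because the view is evaluated when it is queried, rows added to the base table later show up in the view without any further maintenance.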

Guaranteeing Confidentiality: Ensures that users and other parties cannot read or manipulate data for which they have no appropriate access rights. This is the data security concern most closely related to privacy. One possible technique here, especially in the context of analytics, is anonymization, which is the process of transforming sensitive data so that the exact value cannot be recovered by other parties. Another important tool is encryption, which renders data unreadable to unauthorized users who do not possess the appropriate key to decrypt the data back into a readable format.
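One simple way to anonymize data for analytics is to drop direct identifiers and replace exact values with coarser ones. The sketch below illustrates this with generalization and suppression; it is a minimal sketch under stated assumptions, with hypothetical field names and bucket sizes, and it does not by itself guarantee that individuals cannot be re-identified from combinations of attributes.

```python
def generalize_age(age: int, width: int = 10) -> str:
    """Replace an exact age with the range (bucket) it falls into."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def anonymize(record: dict) -> dict:
    """Drop the name, coarsen the age, and mask part of the ZIP code."""
    return {
        "age_range": generalize_age(record["age"]),
        "zip_prefix": record["zip_code"][:3] + "**",
        "diagnosis": record["diagnosis"],  # kept as-is for the analysis
    }


# Hypothetical record used only for illustration.
patient = {
    "name": "Ada Lovelace",
    "age": 36,
    "zip_code": "30001",
    "diagnosis": "flu",
}
anonymized = anonymize(patient)
print(anonymized)
```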

Auditing: Keeps track of which users performed which actions on the data, and at what time. Most DBMSs automatically track these actions in a rudimentary fashion by means of the log file. Regulated settings require a more advanced form of auditing, with extensive tracking and reporting facilities and a detailed inventory of all database accesses and data manipulations, including the users and user roles involved.
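A basic audit trail can be sketched with a trigger that records every change in a separate table. The example below is a minimal sketch, assuming Python's built-in sqlite3 module; the account and audit_log tables and the trigger name are hypothetical. SQLite has no notion of database users, so the sketch records only the action, the values, and a timestamp, whereas a multi-user DBMS would also record the user and role.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (
        id INTEGER PRIMARY KEY,
        balance INTEGER NOT NULL
    );

    CREATE TABLE audit_log (
        log_id INTEGER PRIMARY KEY,
        changed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
        action TEXT NOT NULL,
        account_id INTEGER NOT NULL,
        old_balance INTEGER,
        new_balance INTEGER
    );

    -- Every UPDATE on account leaves a row in audit_log.
    CREATE TRIGGER account_update_audit AFTER UPDATE ON account
    BEGIN
        INSERT INTO audit_log (action, account_id, old_balance, new_balance)
        VALUES ('UPDATE', OLD.id, OLD.balance, NEW.balance);
    END;

    INSERT INTO account (id, balance) VALUES (1, 100);
""")

conn.execute("UPDATE account SET balance = 250 WHERE id = 1")

audit_rows = conn.execute(
    "SELECT action, account_id, old_balance, new_balance FROM audit_log"
).fetchall()
print(audit_rows)
```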

Mitigating Vulnerabilities: This class of concerns pertains to detecting and resolving shortcomings or outright bugs in applications, DBMSs, or network and storage infrastructure that give malicious parties opportunities to circumvent security measures. A very important concept in the context of DBMSs is avoiding SQL injection, in which an attacker injects malicious fragments into normal-looking SQL statements. The well-known three-layer database architecture, consisting of an internal, a logical, and an external layer, is also instrumental to this purpose. Hiding implementation details from users and the outside world by means of logical and physical data independence makes it much harder to discover and exploit potential vulnerabilities.
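The difference between a query that is open to SQL injection and one that is not can be shown in a few lines. The sketch below is a minimal illustration, assuming Python's built-in sqlite3 module; the app_user table and both lookup functions are hypothetical. The first function builds the statement by string concatenation, the second passes the input as a bound parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE app_user (name TEXT PRIMARY KEY, secret TEXT NOT NULL);
    INSERT INTO app_user VALUES ('alice', 'a-secret'), ('bob', 'b-secret');
""")


def find_user_unsafe(name: str) -> list:
    # Vulnerable: the input becomes part of the SQL text itself.
    query = "SELECT name FROM app_user WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(name: str) -> list:
    # The ? placeholder binds the input as a value, never as SQL text.
    return conn.execute(
        "SELECT name FROM app_user WHERE name = ?", (name,)
    ).fetchall()


# The injected fragment turns the WHERE clause into a condition that is always true.
malicious_input = "nobody' OR '1'='1"

leaked = find_user_unsafe(malicious_input)   # matches every row
blocked = find_user_safe(malicious_input)    # matches no row

print(len(leaked), len(blocked))
```

With the bound parameter, the whole input string is compared against the name column as a literal value, so the injected OR clause never reaches the SQL parser.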

For more information on this topic, refer to the upcoming book, Principles of Database Management, by Wilfried Lemahieu, Bart Baesens, and Seppe vanden Broucke, at www.pdbmbook.com.

In addition, videos accompanying each chapter are presented on this YouTube channel:

