Resilience Is a Key Quality for Databases … and DBAs

In today’s tumultuous computing landscape, the DBMS (database management system) remains central to the IT infrastructure and, by extension, to the applications that modern organizations use to conduct business and serve their customers. It stands to reason, then, that a core competency for a DBA is keeping the organization’s databases up and running. This requires resilience.

Resilience regarding database systems refers to the ability to:

  • Recover from failures.
  • Adapt to changing conditions.
  • Maintain functionality even when faced with disruptions.

To ensure that database systems can recover from failures, DBAs must first understand the availability needs of their data in business terms. How rapidly must we be able to recover from a failure? Keep in mind that the failure could be either physical, such as a failed disk drive, or logical, such as a process run with the wrong input that corrupts the database.

Not all applications are mission-critical and demand immediate recovery. Only after we know the impact on the business can we develop an appropriate backup and recovery plan. DBAs need service level agreements (SLAs) for recovery just as we have SLAs for performance. The recovery SLA, or recovery time objective (RTO), needs to be expressed from an application perspective, such as, “Time to restore application availability after a failure for application X cannot exceed 2 hours (or 10 minutes or …) …” To create effective RTOs, you must be able to answer the question, “What is the cost of not having this data available?” When we know the expectations of the business, we can create a backup and recovery plan that matches the requirements. There are multiple techniques and methods for backing up and recovering databases. Some techniques, while more costly, can enhance availability by recovering data more rapidly.

It is imperative that the DBA team create an appropriate recovery strategy for each database object. This requires mapping database objects to applications so we can adopt the proper strategy in accordance with RTOs. Some database objects will participate in multiple applications, and their recovery strategy will therefore be more complex, because it must satisfy the most demanding RTO among those applications.
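A minimal sketch of that mapping follows, written in Python. It assumes each application's RTO is expressed in minutes and that a shared object must be recoverable within the strictest (smallest) RTO of any application that uses it. The application names, object names, and RTO values are hypothetical.

```python
# Hypothetical recovery time objectives, in minutes, per application.
APP_RTO_MINUTES = {
    "order_entry": 10,
    "billing": 120,
    "reporting": 480,
}

# Hypothetical mapping of database objects to the applications that use them.
OBJECT_TO_APPS = {
    "ORDERS": ["order_entry", "billing", "reporting"],
    "INVOICES": ["billing", "reporting"],
    "SALES_HISTORY": ["reporting"],
}


def effective_rto(object_to_apps, app_rto):
    """Return each object's RTO: the strictest (smallest) RTO of any app using it."""
    return {
        obj: min(app_rto[app] for app in apps)
        for obj, apps in object_to_apps.items()
    }


if __name__ == "__main__":
    for obj, rto in sorted(effective_rto(OBJECT_TO_APPS, APP_RTO_MINUTES).items()):
        print(f"{obj}: recover within {rto} minutes")
```

Here ORDERS, which is shared by all three applications, inherits the 10-minute objective of order entry, while SALES_HISTORY can tolerate the 480-minute objective of reporting, so it can be protected with a cheaper, slower recovery technique.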

Furthermore, DBAs must be able to adapt to changing conditions. More users, more data, changing requirements, new regulations, and more can require DBAs to adapt and modify their implementations quickly. DBAs can use many tactics to build databases that adapt to changing conditions. Partitioning data, and sharding it so that it is distributed across multiple servers, can improve scalability and performance. This design makes scaling easier as data volumes grow or access patterns change.
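To illustrate how sharding distributes data, here is a minimal Python sketch of hash-based shard routing. It assumes a fixed list of shards (the shard names are hypothetical) and routes each row by a stable hash of its key, so the same key always lands on the same server.

```python
import hashlib

# Hypothetical shard names; in practice each would map to a separate server.
SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"]


def shard_for(key: str, shards=SHARDS) -> str:
    """Route a row key to a shard using a stable hash of the key.

    hashlib is used instead of Python's built-in hash(), which is
    randomized per process for strings and so would not route consistently.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


if __name__ == "__main__":
    for customer_id in ["C1001", "C1002", "C1003"]:
        print(customer_id, "->", shard_for(customer_id))
```

One design caveat: with simple modulo routing, changing the number of shards remaps most keys, so adding capacity means moving a lot of data. Schemes such as consistent hashing or a lookup directory reduce that movement, which matters when the goal is easier scaling as volumes grow.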

Additionally, DBAs can implement automated monitoring and tuning mechanisms to detect changes in workload patterns or performance bottlenecks. Tools and scripts that automatically adjust database configurations based on real-time data and workload analysis can help prevent downtime caused by changing conditions.
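The detection half of automated monitoring can be sketched as a comparison of a recent window of measurements against a baseline. This minimal Python example assumes query latency samples in milliseconds and a hypothetical threshold factor of 1.5; a real tool would pull metrics from the DBMS and might trigger a configuration change, rather than just report, when the alert fires.

```python
from statistics import mean


def detect_regression(baseline_ms, recent_ms, factor=1.5):
    """Flag a workload change when the recent average latency exceeds the
    baseline average by more than `factor` (a hypothetical threshold).

    Returns (alert, baseline_average, recent_average).
    """
    if not baseline_ms or not recent_ms:
        raise ValueError("need samples for both baseline and recent windows")
    baseline = mean(baseline_ms)
    recent = mean(recent_ms)
    return recent > baseline * factor, baseline, recent


if __name__ == "__main__":
    baseline_samples = [12.0, 14.0, 13.0, 11.0]  # hypothetical query latencies, ms
    recent_samples = [25.0, 27.0, 24.0, 26.0]
    alert, base, now = detect_regression(baseline_samples, recent_samples)
    if alert:
        # An automated response (e.g., a tuning script) would hook in here.
        print(f"ALERT: avg latency {now:.1f} ms vs baseline {base:.1f} ms")
```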

Another helpful tactic is implementing version control for database schemas and configurations. This enables DBAs to roll back changes, or deploy new versions of the database, with minimal disruption when problems arise or requirements change.
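Version-controlled schema changes are often managed as numbered migrations, each paired with a script that undoes it. The Python sketch below uses the standard-library sqlite3 module to show the idea; the migration list, the schema_version tracking table, and the migrate function are hypothetical, not the interface of any particular migration tool. It assumes a connection opened with isolation_level=None so that it can issue BEGIN, COMMIT, and ROLLBACK itself.

```python
import sqlite3

# Hypothetical versioned migrations: (version, "up" script, "down" script).
# Each "down" script reverses its "up" script so the change can be rolled back.
MIGRATIONS = [
    (1, "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)",
        "DROP TABLE customer"),
    (2, "CREATE INDEX ix_customer_name ON customer (name)",
        "DROP INDEX ix_customer_name"),
]


def current_version(conn: sqlite3.Connection) -> int:
    """Return the highest applied version (0 if none), creating the tracking table if needed."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)")
    (version,) = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    return version or 0


def migrate(conn: sqlite3.Connection, target: int) -> int:
    """Move the schema forward or back to `target` inside a single transaction."""
    version = current_version(conn)
    conn.execute("BEGIN")
    try:
        if target > version:
            for v, up, _down in MIGRATIONS:
                if version < v <= target:
                    conn.execute(up)
                    conn.execute("INSERT INTO schema_version (version) VALUES (?)", (v,))
        else:
            for v, _up, down in reversed(MIGRATIONS):
                if target < v <= version:
                    conn.execute(down)
                    conn.execute("DELETE FROM schema_version WHERE version = ?", (v,))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")  # leave the schema at its previous version
        raise
    return current_version(conn)


if __name__ == "__main__":
    # isolation_level=None disables the module's implicit transaction handling,
    # so migrate() controls BEGIN/COMMIT/ROLLBACK explicitly.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    print(migrate(conn, 2))  # deploy the latest schema version
    print(migrate(conn, 1))  # roll back the index change
```

Because SQLite supports transactional DDL, a failed step rolls back the whole move and leaves the schema at its previous version. Not every DBMS can roll back DDL, so on other platforms the down scripts carry more of the rollback burden.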

Depending on the nature of the applications being supported, additional techniques can bolster adaptability, including flexible data modeling, NoSQL and schema-less database designs, and dynamic schemas that can evolve as data requirements change.

The choice of architecture for the application, as well as of the DBMS, can have a profound impact on your ability to adapt quickly to change. Consider adopting a microservices architecture in which different components of the application have their own databases. This approach allows for more granular control over databases and easier adaptation to changes in specific parts of the system. Leveraging cloud-native architectures and services that offer elasticity and scalability can also help.

The third consideration is maintaining functionality even when faced with disruptions. Some tactics here have been part of the DBA’s job responsibilities since Day One, such as implementing a disaster recovery plan, deploying monitors with alerts to signal impending problems in advance, and conducting regular testing and simulation exercises to validate recovery and migration plans and evaluate the effectiveness of their procedures.

The Bottom Line

By combining these strategies, DBAs can build systems, databases, and applications that are recoverable, more adaptable to changing conditions, and more resilient in the face of problems. Of course, this is a long list of tasks and responsibilities, so it is important that DBAs themselves are also resilient. By this I mean that DBAs should be optimistic, adaptable, and resourceful … as well as equipped with problem-solving skills. Armed with resilient systems and resilient DBAs, organizations can confidently continue to serve their customers. And isn’t that the bottom line?