When to Scale Your Database

Sep 10, 2014

By Cory Isaacson

If you are like many people, your current application environment or experience lies mainly with a single monolithic database tier running on a single server. You probably use a simple replication facility for reliability, but beyond that lies a whole new frontier.

We know that the database tier forms the last mile of a truly scalable architecture. Scaling other tiers of your application is easy to do.

On the other hand, scaling the database tier can be very challenging, and requires a far greater understanding of the available options and how it all works.

For example, when you scale your database tier, you are talking about implementing a database cluster, operating across many nodes or servers to take advantage of partitioning and parallel processing. With such a system, you are not only adding scalability to your database, you are also adding potential failure points, and a far higher burden of system management responsibility. Because a database cluster operates across many independent servers, you must be prepared to tackle a far greater challenge than presented by a single monolithic database server.

How Do You Know When to Scale Your Database?

When should you scale your database? The short answer I most often recommend is: Don’t scale until you have to.

I know this may seem counterintuitive coming from a book on big data scalability, but many times I have encountered database performance problems that can be readily solved with simple tuning, query optimization techniques, or cost-effective hardware upgrades.

The more complete answer to the question is to scale your database after you have exhausted other options to address your performance issues in your monolithic single-server database environment. In other words, tackle all of the obvious performance optimization steps in your current environment before you make the jump to a scalable database cluster.

That said, if you have a fast-growing application that will continue to tax your database tier at an ever-increasing rate, then it is time to develop your plans for a scalable database tier. In other words, regardless of your current situation, the time to plan for a database cluster is now—before it is too late and under total emergency or outage conditions.

Moving to a fully scalable database cluster is an important decision. It can be a big project in terms of financial resources, system resources, and the one commodity we always seem to be short of—time. It needs to be well planned and thought out to ensure you implement a scalable database strategy that meets your needs.

Next, I will review the important basic options for addressing database performance issues. Then I will briefly review the key indications you should be aware of for determining when you need to scale.

Options for Increasing Database Performance

Before I delve into knowing when to scale your database, it is important to understand all of your options for improving database performance. There are several methods to obtain increased database performance, and it is important to understand each so you can use the right method for your particular situation when required. Here are the common options available:

Performance optimization on your monolithic database
Vertical scaling
Read scaling
Implementing a full big data scalability environment (horizontal scaling)

These options are listed in the order of easiest to most difficult; therefore, you should evaluate and implement your database performance strategy in the priorities shown above. In particular, the final option for a complete big data scalability environment can involve a lot of time, effort, and cost. While this is by far the most powerful option (and the one that should ultimately solve any performance issue you are encountering), it is important to consider all of the “quick fixes” before you move to this level.

NOTE: There are definite exceptions to this recommendation. In some applications, even some brand new projects, we know upfront that data and transaction volumes will be tremendous with fast growth. In these cases, you can (and should) move directly to a big data scalability environment as your first step.

Implementing a Full Big Data Scalability Environment

Now we get to the most exciting and comprehensive option: horizontal scaling with a big data infrastructure. I have left the best for last, but of course this is also the most powerful option, presenting the biggest implementation challenge as well. This is the only long-term solution for a fast-growth, high-volume application, so if that is your scenario it’s time to get down to business and figure out the best big data approach for your situation.

When you encounter heavy writes with high transaction volume, this is almost certainly the option you need to pursue. In other cases, a read-intensive application can also benefit from a big data approach.

When implementing a horizontal scaling big data infrastructure, we are talking about effective partitioning of your data across a number of servers.

Note that all previous approaches were really limited by what a single monolithic database server can accomplish. With horizontal big data scaling, we are breaking that barrier in a big way. When this approach is done right, the results are miraculous, supporting your application and growth for some time to come.

The end result of a big data scalability implementation is a true database cluster—a group of coordinated servers managed as a unit. With a database cluster you have the flexibility to expand the cluster (or contract it) to meet the performance and growth needs of your application. Further, a database cluster must be highly available, ensuring that interim server failures do not affect live operations. This is a big goal, long-sought after, but with today’s innovations in the big data arena it is not only achievable, I believe it is what architects will rely on more and more as time goes on.

It is important to understand how a scalable big data cluster functions, what its strengths are, and—even more important—the type of limitations you may run into. There are many, many approaches to this problem, and not all are good for a given application situation.

Indications of the Need for Scale

There are many factors that must be considered, each of which indicates that it is time to scale your database. The items covered in this section are far from exhaustive but do provide the most important things to evaluate.

A primary (and sometimes painful) indication that scaling your database tier is overdue is user complaints. If you are already at the point where your user experience is adversely affected by slow performance or, worse, by outright service outages, you must respond very quickly indeed. The best advice, of course, is to detect issues well before it gets to this point and plan your scalability strategy accordingly.

There are many early items you can monitor that can serve as “red flags” for where your problem areas are, thus leading to a timely resolution. I have found that systems start to “creak” well before they “fall over,” and hopefully these pointers will provide a list of important items you should monitor.

The first place to look is basic system metrics. Given that the fundamental components of a computer are CPU, memory, and disk, evaluating metrics for these components can steer you in the right direction. There are numerous tools available for obtaining statistics on these elements, and they should provide your first early warning system. As mentioned earlier, nmon is a simple open source command-line utility to help with this, but there are many other excellent commercial and open source products with very nice UIs to highlight problem areas. In the past few years there has also been an increase in the popularity of system monitoring services—tools such as New Relic and AppDynamics—to constantly monitor your application. Whatever option you pick, make sure it is a toolset you are comfortable working with and will cover the basic stress points of your application infrastructure.

Key system metrics such as CPU, memory, and disk I/O are critical to early detection of problems with your database tier.

However, and just as important, you should pay close attention to increasing latency in your application. The latency indicators to watch can be broken down into a few basic categories.

Slow Read Queries

Every good database administrator has a tool to monitor slow queries, whether through application metrics or a slow query log. If you start seeing escalating response times, you need to jump on this right away to find the underlying cause. Very often this is due to normal table or volume growth, but it can also point out some gnarly inefficiencies (e.g., table scans) in your database access code.

If the cause is table scans, you can normally fix this with database tuning and schema changes. On the flip side, if you start seeing increased read latency for routine, well-indexed queries, then this is likely due to genuine growth in application volume and database size. This is a great position to be in for a company or fast-growing organization, but it can spell disaster for the database tier if not

addressed promptly and effectively. Database slowdown is inevitable when you start growing, and the faster you grow the more alert you need to be to this type of issue. The answer to a fast-growth database is, of course, to implement one of the scaling strategies outlined earlier. To reiterate, a hardware or configuration fix may work temporarily, but for long-term sustained growth you must implement a full scaling strategy, ultimately headed toward the goal of a full database cluster.

Slow Database Writes

Database writes and disk I/O are the most important things to monitor in your database infrastructure. If you see increasing latency increases for database writes, unless there is something awry with your transaction scope, indexing strategy, or database schema design, or there are other gross inefficiencies, this is almost always a dead-on indicator that you must scale to a full big data cluster implementation.

A common cause of slow writes is index updates on a large table or object store. I have often seen companies struggle through hardware upgrades and other temporary fixes, only to quickly become dismayed that all of the cost and effort were for naught. This is because storage infrastructures typically can only perform up to a certain maximum speed, even with the best technology available.

Therefore, when you see slow writes in the absence of serious application code or database schema inefficiencies, you are likely at your limit and need to address the situation fast.

This article is extracted and adapted from the e-book, “Understanding Big Data Scalability: Big Data Scalability Series, Part I” (Prentice Hall, 2014) by Cory Isaacson, CEO/CTO of CodeFutures Corp., maker of dbShards. For more information about the book, go to www.codefutures.com/big-data-scalability-series.