Love it or hate it, MySQL is the most popular open source database and one of the top relational databases—and with good reason. It is an extremely feature-rich and powerful database platform.
With all its positives, however, MySQL has a number of challenges that can limit its ability to perform with the agility and performance businesses increasingly require. There are ways to overcome these challenges, but they’re not necessarily intuitive. Let’s take a quick look at the state-of-the-state and then dive into how you can optimize MySQL.
Relational, or structured, database management systems (RDBMS) have been with us since the 1970s and have served us extremely well. Yet, while the RDBMS has traditionally provided excellent data management services, its rigid data model can prove frustrating to New Economy companies that need to innovate quickly and adapt to market demands and opportunities. There are many stories about schema changes taking far too long to implement. One of the most infamous is that of Craigslist: as its MySQL databases grew to more than 2 billion records, schema changes, such as adding a new column, often took months to complete.
Over the last 5 to 10 years, the market has responded to this problem with new unstructured (NoSQL) databases that attempt, among other things, to offer greater scale and flexibility, particularly with regard to the data model and schema. Yet they often sacrifice important features for what are relatively small gains in these areas. Many people view the idea of using NoSQL systems as an alternative to the RDBMS as flawed at the most basic level. For most businesses, data is precious, and its management should be well thought out and disciplined.
The good news is that the MySQL community has also been reacting to the agility challenge. With the release of MySQL 5.6, you can make online changes to your schema (adding and dropping columns and indexes). Important new features such as global transaction IDs, fractional-second time values and query performance enhancements have also been introduced. For MySQL users bumping up against limitations that NoSQL and NewSQL vendors claim to have solved, it may be well worth sticking with MySQL and following its roadmap. If you aren't running MySQL 5.6, you should be.
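For example, MySQL 5.6 lets you request an online index build explicitly and fail fast if the engine can't honor the request (the table and column names here are hypothetical):

```sql
-- MySQL 5.6+: build the index in place without blocking reads or writes.
-- The explicit ALGORITHM and LOCK clauses make the statement error out
-- if the operation cannot be done online, instead of silently locking.
ALTER TABLE customers
  ADD INDEX idx_last_name (last_name),
  ALGORITHM=INPLACE, LOCK=NONE;
```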
Understanding Your Workloads in Relation to MySQL Options
There are several actions you can take to bring MySQL to a much greater level of flexibility, scale and performance. Many people struggle with how to accomplish this, however, because they don’t have a good handle on which of the many technology and configuration options are best for their deployment. These options include different distributions that offer clustering, replication and a variety of storage engines designed to handle different workloads. Your database workload and application demands should inform the decisions you make for your MySQL deployment configuration.
Pick a Distribution or Fork
The first challenge you come up against is picking a version of MySQL. MySQL Community edition is an open source product owned by Oracle. The official MySQL distribution is curated by Oracle and can be found at its website, www.mysql.com. In addition to the "official" distribution, there are also a couple of MySQL forks. The two most prevalent are offered by Percona and MariaDB, and they may be interesting alternatives depending on your workloads and your tolerance for the risk involved in moving off the official distribution.
The Percona distribution has remained very close to the standard distribution, mirroring each of its releases. Percona has enhanced the default InnoDB storage engine (its version is called XtraDB) to improve transactional performance and deliver more consistent latencies. The changes in Percona are all "black-box" in nature, so there is no need to change your application to benefit from these enhancements.
MariaDB is the other major fork of the MySQL product. MariaDB has made extensions to the APIs that enable interesting new features such as virtual columns, table elimination and other query-related performance enhancements. Where Percona has kept its fork’s APIs identical to those of MySQL, MariaDB has extended them. These extended APIs allow MariaDB to offer new features faster, in many cases, than its MySQL parent. If you choose to migrate to MariaDB, be aware that if your applications start to take advantage of these API changes, it may be difficult to switch back to MySQL in the future.
Choose a Storage Engine
Selecting the right storage engine is crucial to optimizing MySQL and the applications that are built on it. Luckily, one of the more powerful features of MySQL is that it has a great interface for installing plugins, like storage engines, that add new capabilities. For those who may not be familiar with the role of the storage engine (lest you think this unique to MySQL, be advised that MongoDB just announced support for multiple storage engines!), it is where the majority of the data management occurs. It handles the transactional processing and rollbacks, as well as the heavy lifting for delivering query results. The storage engine also has a huge impact on the performance of MySQL for a given workload.
MySQL comes standard with several storage engines "out-of-the-box," the default being InnoDB. It also ships with the legacy MyISAM engine, a memory-only MEMORY engine and a few other more obscure ones. There are also third-party engines, including TokuDB, WiredTiger and XtraDB, which are optimized for different workloads. InnoDB is an ACID-compliant engine that, when properly tuned, will deliver decent performance, but only for specific use cases and with the right optimizations selected. If you are running a use case that is write-heavy (lots of inserts and updates) and has many indexes, InnoDB performance is likely to suffer as your data grows. In that case, selecting a different engine, or using multiple engines, may make more sense, depending on your workload.
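Because the engine is chosen per table, mixing engines within one schema is straightforward. A sketch (table names are hypothetical, and the TokuDB example assumes that plugin is installed):

```sql
-- List the engines available on this server.
SHOW ENGINES;

-- Engine choice is per table: a write-heavy log table on TokuDB
-- alongside a transactional table on the default InnoDB engine.
CREATE TABLE event_log (
  id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  payload TEXT
) ENGINE=TokuDB;

CREATE TABLE accounts (
  id INT NOT NULL PRIMARY KEY,
  balance DECIMAL(12,2) NOT NULL
) ENGINE=InnoDB;
```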
For example, if you have a workload that has heavy inserts in certain tables as well as other tables that require complex queries, you might end up having to deploy (and manage) both TokuDB and InnoDB. In some cases, you may have hybrid workloads operating against the same tables, which may force you to pick the closest storage engine fit, even if it’s not ideal.
However, an emerging category known as adaptive engines, which plug in beneath the MySQL front end, radically changes its performance and scalability. These engines use machine learning to auto-adapt to hosts, workloads and flow, allowing performant responses at much greater scale. The database remains ACID-compliant but can handle multiple concurrent workloads and deliver strong performance for both writes and reads, even with heavy indexing. Adaptive database engines can be used alongside or in place of MySQL's default engine.
Now that you’ve picked the fork and the engine, you need to figure out configuration. Proper configuration of the MySQL database is extremely important, especially under a heavy workload. As an example, a quick sysbench test demonstrates the impact that proper configuration can have. In a simple sysbench run against a MySQL instance on an Amazon m4.2xlarge, with a table of 1 million rows, the results were striking. With the default configuration parameters, the instance sustained 362 transactions per second. After the InnoDB engine was configured with larger buffers and bigger log files, the rate jumped to more than 3,000 transactions per second.
When selecting your storage engine, proper configuration is important, so focus on getting it right, or select an engine that requires minimal tuning or can self-tune. Of course, when there are many configuration parameters, tuning can be difficult and engine-specific.
If we focus on InnoDB, the MySQL default engine, there are two important parameters:
- innodb_buffer_pool_size: The bigger the better, but keep it under roughly 70% of available system memory. It may need to be smaller if other engines are in use or if other applications (web servers, a CMS, etc.) are running on the database machine.
- innodb_log_file_size: Bigger is usually better, but beware that large log files can cause unexpected periodic drops in performance when the system is under load, resulting in erratic latencies. Also, in the event of a database crash, recovery time increases in proportion to the size of the log files.
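In my.cnf, those two settings might look like this on a dedicated 16 GB database host (the values are illustrative assumptions, not a recommendation for your workload):

```ini
[mysqld]
# Roughly 70% of RAM on a dedicated 16 GB machine; set lower if other
# engines or applications share the host.
innodb_buffer_pool_size = 11G

# Larger redo logs smooth out write bursts but lengthen crash recovery.
innodb_log_file_size = 1G
```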
Keep in mind that configuration isn’t one-size-fits-all. Every time you change the database or move to new hardware, you have to reconfigure. Not only does this quickly become time-consuming and slow you down when the application demands agility, but the configuration treadmill is also never-ending. You’re essentially guaranteed a database that is never optimally tuned.
To overcome the MySQL configuration problem, look to alternate aftermarket storage engines that can both improve performance and simplify tuning. Adaptive databases are self-configuring. They deliver flexible data structures that adapt to behavior based on observed changes in the data and operational capabilities of hardware resources, without manual intervention. Machine learning techniques continuously optimize how that data is organized in memory and on disk according to application workload and resource capabilities. Using the same sysbench test as above with an adaptive database delivers much greater performance.
Scale Up and Scale Out
How do you make sure you won’t outgrow MySQL? You figure out how to best scale up and then scale out if necessary.
Scale-up describes the process of maximizing the capacity that a single MySQL node can handle. It can involve optimally tuning your database software, choosing the right storage engine (as previously discussed) and selecting appropriate hardware. There are limits to how big you can scale up a node, determined by some combination of data size, schema complexity, CPU cycles, system memory and disk I/O. While scale-out has lately garnered much attention because of the need to handle increasingly massive data sets, it is important to remember that the better you can scale up, the fewer scale-out nodes you will need, and the less you will spend on hardware.
Scale-out can be used to deliver solutions that cover several different use cases. Some of the most common are increasing read capacity through replication and increasing total database size and overall transactional throughput through sharding.
Read-replicas can be used very effectively to greatly enhance an application’s read capacity. Think about a website like Wikipedia, for example, where there are far fewer editors than viewers. Read replication allows a single "Master" server to handle full read/write operations while providing for one or more read-only "Slaves" that can serve up content to thousands of viewers.
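Wiring up a read-only slave in MySQL 5.6 takes only a few statements on the replica; the host, user and password below are placeholders:

```sql
-- On the slave: point it at the master and start replicating.
-- With MySQL 5.6 global transaction IDs enabled, AUTO_POSITION removes
-- the need to track binary log file names and offsets by hand.
CHANGE MASTER TO
  MASTER_HOST = 'master.example.com',
  MASTER_USER = 'repl',
  MASTER_PASSWORD = '...',
  MASTER_AUTO_POSITION = 1;
START SLAVE;
```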
Sharding can be used to increase the maximum overall data size and row counts. Sharding is based on the idea that data can be split, or sharded, across multiple database servers. Facebook has arguably the largest MySQL systems, with thousands of database servers, and relies heavily on sharding to handle the billions of views, posts and updates that happen every day. A simplistic way to think about sharding is to imagine that there are 26 MySQL servers supporting the application, and all of the users whose last name begins with “A” are on server 1, those whose name begins with “B” are on server 2, and so on. The use of sharding requires that your application be aware of the sharding architecture in order to find the right ‘shard’ that holds the data you are seeking.
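The last-name scheme above can be sketched as a tiny routing function. This is a toy illustration of application-side shard lookup (the host naming is hypothetical; real deployments usually shard on a hashed user ID rather than a name, to keep shards balanced):

```python
def shard_for(last_name: str, shard_count: int = 26) -> int:
    """Map a last name to a 1-based shard number.

    Toy scheme from the example above: 'A' names on server 1,
    'B' names on server 2, and so on. Names that don't start
    with an ASCII letter fall back to shard 1.
    """
    first = last_name.strip()[:1].upper()
    if not ("A" <= first <= "Z"):
        return 1
    return ord(first) - ord("A") + 1

def dsn_for(last_name: str) -> str:
    # Hypothetical host naming; the application must know this layout
    # to find the right shard for the data it is seeking.
    return f"mysql://shard{shard_for(last_name)}.db.example.com/app"

print(shard_for("Anderson"))  # 1
print(shard_for("baker"))     # 2
print(shard_for("Zhang"))     # 26
print(dsn_for("Zhang"))
```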
If we refer back to the earlier point about maximizing scale-up before you scale out, it should be apparent that for a given dataset, the bigger each shard can be, the fewer shards will be required. In the previous example, if you could double the scale-up capacity of each node, you could reduce the number of shards required from 26 to 13. This reduction has obvious financial benefits, but it also reduces the workload of database administrators and the overall operational cost of support.
MySQL Remains a Popular Choice
MySQL certainly has its challenges, but it is the richness of the solution that has made it such a popular choice, even in the face of NoSQL and NewSQL contenders. To overcome its challenges and optimize MySQL for your environment, you must first understand the workloads being serviced and have a well-defined data architecture. Then, you can make the "big" decisions – what engine, schema, replication strategy, and when and how to scale-out – that will enable you to optimize MySQL for your ever-changing environment.