Video produced by Steve Nathans-Kelly
Today, DBAs who have traditionally managed relational database systems such as Oracle and SQL Server are also managing companies’ "non-relational" databases. As platforms such as MongoDB and Cassandra are coming under the management of enterprise IT, a new set of skills is required.
In a recent Data Summit 2018 talk, titled “NoSQL Concepts for the Relational DBA,” Jason Hall, senior systems consultant, Quest Software, explored concepts familiar to relational DBAs, such as data modeling, high availability, and scalability, and discussed how those concepts translate into NoSQL platforms.
Looking at scaling, Hall said that traditional relational systems often hit a limit on how high they can theoretically scale, both a technical limit and a cost limit. Horizontal scaling is often a more cost-effective way to improve performance. Non-relational systems such as MongoDB and Cassandra have been built from the ground up to offer a much simpler implementation of horizontal scalability.
However, he noted, non-relational systems are, of course, not the only way to scale data. “We've been scaling database servers for dozens of years without this non-relational concept.”
“There are three ways to improve database performance,” said Hall.
The first way is tuning the application design. This is sometimes forgotten because people have become so accustomed to throwing hardware resources at a system, he noted. “But that's always going to be, in my opinion, the best way to improve performance. Let's have a well-tuned, efficient application that can run on as little hardware as possible.”
Beyond that, he said, there has also always been the ability to vertically scale a database server. Vertical scaling is basically throwing hardware at a system. “Let's take my one SQL Server instance and if it's slow, let's just add some memory to it. And, if it's still slow, let's add some CPU. And if it's still slow, let's move its data files to all-flash disk arrays. And at the end of that process, it's probably faster. Is it fast enough? Maybe?”
Now, vertical scaling is a good option and it's an easy option, but it's extremely expensive. And the more cloud starts to infiltrate our environments, the more expensive it becomes, Hall said.
The third option is horizontal scaling. “Now, we could horizontally scale with relational systems. I'm not going to stand here and tell you that you can't do this with Oracle or MySQL or SQL Server. But the fact of the matter is, it's really, really hard.” Most relational systems offer a halfway point in table partitioning, which allows you to take one table and split it among multiple disks. That's a good option for balancing I/O, but you still have that data living on a single server, he said.
This is where the fundamental concept of big data comes in. “When I can't store data efficiently in a single table, that's big data to me,” said Hall. And, when that happens, horizontal scaling allows you to take the data, this one table, and split it between multiple servers. “Again, you can do that in a relational model; it's just very hard. The application logic is going to have to determine where to write data and where to read data, and balancing that data over time gets really difficult. I think Oracle, even in some of their latest versions, has some automated sharding. I just think it's really, really hard in a relational model.”
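To make Hall's point about application logic concrete, here is a minimal Python sketch of application-level horizontal scaling, in which one logical table is split across several servers by hashing the row key. It is an illustration under stated assumptions, not anything shown in the talk: the names `shard_for`, `ShardedTable`, and `moved_fraction` are hypothetical, in-memory dictionaries stand in for separate database servers, and simple hash-modulo placement is just one of several possible schemes.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a row key to a shard index using a stable hash (hypothetical helper)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes as an integer, then take it modulo the shard count.
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedTable:
    """One logical table split across several 'servers'.

    Each shard is a plain dict here; in a real system each would be a
    separate database server. The application decides where every row goes.
    """

    def __init__(self, num_shards: int) -> None:
        self.shards = [dict() for _ in range(num_shards)]

    def write(self, key: str, row: dict) -> None:
        # The application, not the database, picks the target server.
        self.shards[shard_for(key, len(self.shards))][key] = row

    def read(self, key: str):
        # Reads must use the same routing rule as writes.
        return self.shards[shard_for(key, len(self.shards))].get(key)


def moved_fraction(keys, old_shards: int, new_shards: int) -> float:
    """Fraction of keys whose shard changes when the shard count changes."""
    moved = sum(
        1 for k in keys if shard_for(k, old_shards) != shard_for(k, new_shards)
    )
    return moved / len(keys)
```

With this naive modulo scheme, growing from three shards to four relocates roughly three quarters of the keys, which is one way to see why rebalancing data over time is difficult when the application owns the routing. Calling `moved_fraction(keys, 3, 4)` on a large set of keys shows the effect.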
To access more Data Summit 2018 videos, go to www.dbta.com/DataSummit/2018/videos.aspx.
Many PowerPoint presentations from Data Summit 2018 have been made available for review at www.dbta.com/DataSummit/2018/Presentations.aspx.
Data Summit 2019, presented by DBTA and Big Data Quarterly, is scheduled for May 21-22, 2019, at the Hyatt Regency Boston, with pre-conference workshops on May 20.