Databases Without Bounds

Resources used to be expensive. Resources used to be scarce. Resources used to take a long time to provision. As such, it made sense to put resource consumption at the top of the list when talking about database performance. Those days are gone. With more than 80% of databases running in virtual environments, where hardware is more commoditized every day, it is much easier to get physical resources (CPU, memory, network, and disk) whenever they are needed. In fact, Moore’s Law predicts that the number of transistors on a chip will double roughly every 2 years. Well, most physical resources are certainly on pace with that, or better:

  • CPU evolution: Transistor counts through 2011 topped out at about 2.6 billion, up from 2,300 in 1971.
  • Memory availability: Practical limits put this number at around 256 terabytes. Do you recall when 64 kilobytes was awesome?
  • Network bandwidth: Keck’s Law and Butters’ Law of Photonics predict network bandwidth growing faster than processing power, approximately doubling every 9 months.
  • Disk speeds: With the advent of SSD technology, access to fast and plentiful disk storage (persisted bits) has ramped up significantly, although perhaps not with the quantum leaps of other physical resources.
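
As a quick sanity check on the CPU trend, the implied doubling period can be worked out from two commonly cited endpoints: 2,300 transistors in 1971 and roughly 2.6 billion in 2011. This is only a rough sketch. The function name `doubling_period_years` is illustrative, and the calculation assumes smooth exponential growth between the two data points, which real product lines only approximate.

```python
import math

def doubling_period_years(start_count, end_count, years_elapsed):
    """Average doubling period, assuming steady exponential growth."""
    doublings = math.log2(end_count / start_count)
    return years_elapsed / doublings

# Assumed endpoints: 2,300 transistors (1971) and about 2.6 billion (2011).
period = doubling_period_years(2_300, 2_600_000_000, 2011 - 1971)
print(f"Implied doubling period: {period:.2f} years")
```

Under these assumptions the result comes out very close to 2 years, which is in line with the usual statement of Moore’s Law.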

These are just a few examples, as there are many laws out there projecting the advancement of technology. Regardless of which one is followed, it’s undeniable that computing power and resources are growing at a tremendous rate, and that they are getting much cheaper. Is a world of practically unlimited physical resources for our databases in the near future so outlandish?

As a purely mental exercise, let’s say that’s the world we live in today. We have unlimited access to physical resources and, thanks to virtualization, it’s dynamic—we have instant access when needed—and automated. Since practical limits don’t exist, would you not want your infrastructure (hypervisor) to auto-allocate resources when you need them? No more 2 a.m. wake-up calls because someone kicked off a huge report during your ETL load that crushed CPU and memory. So, what comes next?

Processes stuck in a runnable state are a thing of the past. There are no more signal waits. CPU Ready Time? Gone! Disk queue length is drastically reduced, if perhaps not eliminated altogether. All databases become in-memory databases: all pages/blocks reside in cache for all objects and parsed statements. Sorts happen strictly in memory. Deciding when to re-parse a statement and rebuild its execution plan gets a facelift. Parallelism cost structures get refactored in all major RDBMS optimizers. Why not parallelize the execution of every SQL statement if the overhead of processing with multiple threads plus the cost of executing the longest thread doesn’t exceed the cost of single-threaded execution? Cost-based optimization (CBO) itself gets a complete overhaul (arguably, this needs some TLC even in the database world as it actually exists today).
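
The parallelism question reduces to a simple comparison. The following is a minimal sketch of that rule under a toy cost model with arbitrary cost units. The function `should_parallelize`, the `coordination_overhead` parameter, and the sample numbers are all hypothetical and do not reflect how any particular RDBMS optimizer is implemented.

```python
def should_parallelize(serial_cost, thread_costs, coordination_overhead):
    """Return True when the parallel plan costs no more than the serial one.

    Parallel cost = coordination overhead + cost of the slowest thread,
    because the statement finishes only when its longest thread does.
    """
    if not thread_costs:
        return False  # nothing to run in parallel
    parallel_cost = coordination_overhead + max(thread_costs)
    return parallel_cost <= serial_cost

# Hypothetical costs in arbitrary units.
balanced = should_parallelize(100, [30, 28, 35, 27], 10)  # 10 + 35 = 45
skewed = should_parallelize(100, [30, 28, 95, 27], 10)    # 10 + 95 = 105
print(balanced, skewed)
```

The second call illustrates why skew matters: one slow thread can make the parallel plan more expensive than single threading, even when resources are unlimited.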

We can also get much more horsepower for much less relative money than even a decade, 5 years, or 1 year ago. Access, thanks to virtualization, is quick and easy. Dynamic rebalancing of VM workload will automatically lead to the ability to dynamically reallocate resources at the guest operating system level in short order.

As database professionals, our time spent focusing on performance and tuning is over! What’s that? It’s not? Oh, yes, you’re correct—there are many other aspects of performance and tuning that aren’t so tightly coupled with physical resources. Examples include query tuning, transactional batch logic, transactional external dependencies, data modeling, indexing (but not over-indexing), multi-threaded concurrency (locking/blocking), driving tables when joining, inefficient plan steps, etc. Remember, inefficiencies and contention can still cause issues, even if resources are removed as a constraint.

What’s my point with this exercise? All of the clues point to a future where physical resources will continue to diminish as the primary constraint in our database infrastructure. Still, advocating for throwing hardware at performance issues would be too simplistic. Other factors come into play, too, such as software licensing models (if processor-based), reduction in carbon footprints, and concurrency constraints. It’s just nice to think of a world where there isn’t such an emphasis on physical resource constraints. After all, that world could be rapidly approaching.