Achieving Exadata-Class Performance in the Public Cloud


For years, Oracle Exadata has been the hardware/software platform of choice for running Oracle databases—a resource deployed when organizations are looking to simplify digital transformations, increase database performance, and reduce costs. However, as enterprises continue their cloud migration, questions arise about how to effectively move database workloads off Oracle’s Exadata Database Machine and onto the public cloud. Technology executives weighing the risk against the often-significant rewards are particularly concerned about database performance, resiliency, and cost.

The recently debuted Exadata lineup for the Oracle Cloud won analyst kudos for significant enhancements, but it requires a commitment to the Oracle Cloud.

The challenge is to ensure adequate performance for the massive database and analytics workloads on Exadata systems—mission-critical workloads with zero tolerance for a jump in latency or lags in end-user response times. This is a particular concern in industries such as financial services, where timely reports and data are absolutely critical. The weak link among the mighty triumvirate of compute, network, and storage is usually storage. With such data-intensive workloads, achieving performance, ensuring resiliency, and maintaining cost-efficiency, including meeting the requirements of Oracle’s licensing model, is a tall order—particularly in the cloud.

Three Considerations

Lessons from recent migrations of Exadata-based workloads to the public cloud point to three critical parameters that should define the short list of architectural approaches.

1) Performance for data-intensive workloads

Transactional workloads typically require low latency, i.e., the ability to read and write small amounts of data as fast as possible. In contrast, data warehousing workloads rely on high throughput, i.e., pumping large volumes of data through the database at the highest possible bandwidth. Achieving adequate speed of analysis is the first challenge for running Oracle databases in the cloud. It is essential that data performance remains predictable, even at the upper limits of access speed and throughput imposed by the cloud providers. Maintaining a good user experience means data must be read and written at high speed so that individual queries and batch jobs complete quickly.
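
As a rough illustration of why these two workload profiles stress storage differently, the back-of-the-envelope sketch below compares touching the same volume of data through small random reads versus one large sequential scan. All of the figures in it are illustrative assumptions, not measurements from any particular system:

    # Latency-bound (transactional) vs. throughput-bound (analytical) access,
    # using assumed, illustrative numbers.
    data_gb = 100                     # total data touched
    block_kb = 8                      # typical Oracle block size
    random_read_latency_s = 100e-6    # assumed per-read latency (100 microseconds)
    scan_bandwidth_gb_per_s = 25      # assumed sequential scan bandwidth

    blocks = data_gb * 1024 * 1024 / block_kb
    random_time_s = blocks * random_read_latency_s   # one block at a time
    scan_time_s = data_gb / scan_bandwidth_gb_per_s

    print(f"{blocks:,.0f} random 8KB reads, one at a time: ~{random_time_s:,.0f} s")
    print(f"one sequential scan of the same data: ~{scan_time_s:,.1f} s")

The point is simply that per-operation latency dominates transactional work, while aggregate bandwidth dominates scans.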

New NVMe SSDs enable an uptick in performance and a decrease in total cost of ownership, and advances in the software that drives them now allow even lower latency and higher throughput. Yet, even with this technology available, the biggest gap in the cloud providers’ product portfolios today is in high-performance relational databases, primarily Oracle Database and Microsoft SQL Server. The PaaS solutions are designed for average workloads, not the high end. A complex database running on, for example, Oracle Exadata will struggle to run on a “vanilla” infrastructure-as-a-service (IaaS) deployment—while the refactoring required to take that database and migrate it to a managed PostgreSQL service is almost unimaginable.

For example, one organization deployed six HBv3 Microsoft Azure virtual machines as high-speed storage nodes for a single Oracle application node, which also ran on an HBv3 machine. The team used the HBv3 nodes’ RDMA connectivity to form a distributed pool of NVMe storage (using NVMesh on Azure) spanning all of the HBv3 nodes. It pooled the 14 NVMe drives from the seven machines into a single mirrored RAID-10 volume. Each HBv3 node contributed two 1TB NVMe drives; because data is mirrored for protection, each node effectively provided 1TB of the highest-performance storage.
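
A quick sanity check of the usable capacity of such a mirrored pool, using the drive counts and sizes described above (NVMesh’s actual volume accounting may differ in detail):

    # Usable capacity of a mirrored (RAID-10 style) distributed NVMe pool.
    nodes = 7                  # six storage-serving nodes plus the application node
    drives_per_node = 2
    drive_capacity_tb = 1.0
    mirror_copies = 2          # RAID-10: every block is stored twice

    raw_tb = nodes * drives_per_node * drive_capacity_tb
    usable_tb = raw_tb / mirror_copies
    print(f"raw: {raw_tb:.0f} TB, usable after mirroring: {usable_tb:.0f} TB "
          f"({usable_tb / nodes:.0f} TB effective per node)")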

Before going live, the IT team ran Oracle-tailored benchmarks with SLOB, using 20 schemas of 125GB each. Local and remote NVMe access times were 100 microseconds. Storage access bandwidth could reach 27 GB/s[1]. The database could realistically perform more than 1,300 transactions per second using some 800,000 database I/O operations per second[2]. This was sufficient for extremely challenging database workloads.
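
Those latency and IOPS figures hang together under a simple Little’s-law estimate. The sketch below is only an illustration derived from the quoted numbers, not part of the team’s benchmark:

    # Little's law: average concurrency = throughput x latency.
    iops = 800_000              # quoted database I/O operations per second
    latency_s = 100e-6          # quoted per-I/O access time
    outstanding_ios = iops * latency_s
    print(f"~{outstanding_ios:.0f} I/Os in flight on average")    # ~80

    # Per-transaction I/O implied by the OLTP figures above.
    tps = 1_300
    print(f"~{iops / tps:.0f} I/O operations per transaction")    # ~615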

The cluster of HBv3 nodes could also access remote storage. Using each node’s 40Gbps Ethernet NIC, it could reach LSv2 machines and Premium SSDs. This made it possible to build a second performance tier on the LSv2 NVMe media, delivering up to 5GB/s of that storage to each HBv3 node, and a third performance tier on Premium SSD, which could provide up to 4GB/s.
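
The resulting three-tier layout can be summarized as follows. The sketch only restates the figures above; note that 40Gbps of network bandwidth works out to roughly 5GB/s, which is what bounds the second tier per node:

    # Bandwidth ceilings for the three storage tiers described above (in GB/s).
    nic_gbps = 40
    nic_gbytes_per_s = nic_gbps / 8                  # ~5 GB/s of network bandwidth per node

    tiers = {
        "tier 1: pooled local NVMe (cluster read bandwidth)": 27.0,
        "tier 2: remote LSv2 NVMe (per node, network-bound)": min(5.0, nic_gbytes_per_s),
        "tier 3: Premium SSD": 4.0,
    }
    for name, gb_per_s in tiers.items():
        print(f"{name}: up to {gb_per_s} GB/s")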

2) Resiliency and lower risk

Many on-prem Oracle database customers incur considerable license costs for the Real Application Clusters (RAC) option, which allows highly available solutions to be built across multiple nodes. In the public cloud, however, Oracle RAC is supported only in Oracle’s own cloud, and customers who wish to move to Azure or AWS must find alternative ways to architect for resilience.

To overcome this limitation and achieve high availability, enterprises can rely on the Oracle Data Guard feature included with Enterprise Edition, supporting both switchover and failover to a standby database in a different availability zone.
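
Data Guard itself (or the Data Guard broker) manages the role transitions, but the roles reported by each instance can be checked from a simple script. The sketch below assumes the python-oracledb driver and uses hypothetical connection strings and credentials; it only reads the role and open mode each database reports:

    # Minimal check of Data Guard roles across two availability zones.
    # Hostnames, service name, and credentials are hypothetical placeholders.
    import oracledb

    def database_role(dsn, user, password):
        """Return (database_role, open_mode) as reported by the instance."""
        with oracledb.connect(user=user, password=password, dsn=dsn) as conn:
            with conn.cursor() as cur:
                cur.execute("SELECT database_role, open_mode FROM v$database")
                return cur.fetchone()

    for name, dsn in [("zone-1 primary", "db-az1.example.com/ORCL"),
                      ("zone-2 standby", "db-az2.example.com/ORCL")]:
        role, mode = database_role(dsn, "monitor", "secret")
        print(f"{name}: {role} ({mode})")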

3) Cost-efficiency

A significant consideration when moving away from Exadata is the impact on Oracle licensing costs. Oracle’s core factor rules change in the cloud (the core factor table does not apply in authorized cloud environments), typically increasing the number of required licenses. In addition, moving away from Exadata means losing access to Oracle’s Hybrid Columnar Compression (HCC) technology, so the now-uncompressed database will expand and cost more to store. Lastly, the database clones required for CI/CD processes may consume additional storage capacity beyond the production copy and take more IT time to manage.

To prevent licensing costs from becoming a barrier to successful cloud migration, teams can use constrained vCPU VM sizes, which restrict the VM instance to a subset of the total available cores. Such a deployment avoids the cost of unnecessary database licenses, although the full cost of the VMs still applies.
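
To make the effect concrete, the sketch below estimates processor licenses under Oracle’s published policy for authorized cloud environments, where two vCPUs (with hyper-threading enabled) count as one processor license. The specific vCPU counts are illustrative assumptions, not a sizing recommendation:

    # Estimate Oracle processor licenses for a full vs. constrained-core VM size.
    # Policy assumption: in authorized clouds, 2 vCPUs = 1 processor license.
    VCPUS_PER_LICENSE = 2

    def licenses_needed(vcpus):
        return -(-vcpus // VCPUS_PER_LICENSE)    # ceiling division

    full_vm_vcpus = 64        # an unconstrained VM size (illustrative)
    constrained_vcpus = 16    # same VM family, constrained to 16 active vCPUs

    for label, vcpus in [("full VM", full_vm_vcpus),
                         ("constrained VM", constrained_vcpus)]:
        print(f"{label}: {vcpus} vCPUs -> {licenses_needed(vcpus)} processor licenses")

The memory, local storage, and network bandwidth of the underlying VM size remain available, which is why the VM cost itself does not drop.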

This approach satisfies demanding workloads while keeping the total cost of ownership in check. The set of configurations ranges from ultimate performance with multi-node reliability to reliable high performance on large databases. With such a range available for critical database environments, decision makers can choose the configuration that best fits their needs.

New Options Address the Trend

In industries such as financial services, where “de-risking the environment” is a mantra and where competitive pressure demands infrastructure that helps the organization outperform its peers, IT leaders face corporate imperatives to diversify data center equipment beyond any one vendor and to lower costs. Migrating Exadata-class workloads to the public cloud is a strategy that is increasingly being explored. New technologies, particularly for storage, are de-risking the migration while laying the groundwork for top performance at lower complexity and cost.

[1] Read bandwidth in OLAP test—100% table scan and no updates with 400 SQL processes

[2] OLTP test—0% table scans and 20% updates—with 240 SQL processes


