Tips for Accelerating Analytics with Oracle Database In-Memory

May 4, 2016

By Jim Czuprynski

The latest release of Oracle Database (12.1.0.2) offers a unique set of features that portend increases in application workload execution, especially for analytics and data warehousing queries. This release, debuts Oracle Database In-Memory which provides a new columnar format - the In-Memory Column Store (IMCS) - for data that is likely to be accessed regularly for aggregation or analysis, as well as other features such as In-Memory Aggregation and In-Memory Joins that potentially offer several orders of magnitude of performance improvement. Finally, the new In-Memory Advisor makes short work of determining exactly which database objects are most likely able to take advantage of the IMCS.

In-Memory Column Store: Turning Data “Sideways” To Improve Query Performance

Before Oracle Database Release 12.1.0.2, the only way that a SQL statement could access data from a table or partition in memory was through the database buffer cache in the System Global Area (SGA) as complete in-memory images of the database blocks that hold data needed for the operation in row-major format. Starting with Oracle 12.1.0.2, however, it’s now possible for the first time ever to retain the data from tables, table partitions and sub-partitions, and materialized views within a new structure - the In-Memory Column Store (IMCS) – in a columnar format instead. Figure 1 shows how the IMCS is constructed within the SGA.

Oracle Database uses the following logic to populate data for candidate objects into the IMCS on the fly:

If the data a statement is requesting is part of an existing table or partition that’s been marked as a candidate for IMCS, then the database determines if the data being requested is already in the IMCS.
If it’s not, the appropriate data is retrieved –via a physical read from storage, it’s transformed from row-major storage into columnar format – and populated into the IMCS, assuming there is spare capacity in the IMCS. If in-memory compression has been requested, then the data is also compressed at the level specified for the object.
Future queries will scan and filter data directly from the IMCS and decompress only what is required to satisfy the requesting statement. However, queries can take immediate advantage of the data already populated into the IMCS while it’s still being populated, thus providing improved query performance even as population completes.

While it’s certainly possible to populate candidate objects into the IMCS “on the fly,” most customers have found that it’s simplest to determine the best candidates first and then allow those objects’ data to populate automatically during database instance startup.

Note that the data is still retained in row-major format in the buffer cache, and this is actually good news because it lets the buffer cache do what it’s best at – maintaining data via data manipulation language (DML) – while providing the same data in columnar format for analytic queries.

Another big advantage of this approach is that the indexes that Oracle DBAs have typically applied to tables - to either speed data retrieval, apply filtering to row sets, or join data sources together - can often be eliminated completely. Not all indexes are evil, of course: they are excellent for fast lookups of single rows, for maintaining uniqueness as part of referential integrity, and for cross-reference in parent-child relationships, but they are among the most expensive structures to maintain in just about any RDBMS, especially during DML. IMCS essentially eliminates the need for many indexes, and that promises improved DML response times as well.

The biggest benefit of using IMCS is what it doesn’t require: Not one line of application code needs to change to take advantage of this feature. Once an object has been placed within the IMCS, the Oracle optimizer is intrinsically aware that it can leverage data retained in the IMCS – no pesky optimizer hints or directives required.

Figure 1. Basics of Oracle In-Memory Database: The In-Memory Column Store.

Activating IMCS

Enabling IMCS is extremely simple, as it requires modification of only one initialization parameter (INMEMORY_SIZE) and then performing a bounce of the database instance. For example, this code snippet shows how to set the IMCS size to 4 GB and then bounce the database to create the IMCS cache:

Note that the INMEMORY_CLAUSE_DEFAULT parameter can also be set to a value that applies a default setting for any table or materialized view that’s created in the future, as this code snippet shows: To place an existing object within the IMCS, only the object’s INMEMORY setting needs to be changed, as this code snippet shows:

As a result, Oracle will now attempt to place table SH.CUSTOMERS into the IMCS. If there is enough space remaining in the IMCS, the object will completely reside there; however, if insufficient space is available, then as much of the object’s data as possible will be populated within. This is not necessarily a drawback because even if, say, only 75% of a one-billion-row table could fit within the IMCS, the remaining 250M rows could still be retained within the database buffer cache if space there allowed.

One question that I often hear about IMCS is, “Well, what happens when I update data in the underlying table? Doesn’t that mess up the contents of that object within IMCS?” Happily, there is no impact to the existing columnar structure; what happens instead is that just the changed data is recorded within the IMCS’s transaction journal. This means that a query can access data even after a table has been changed; Oracle simply looks at the deltas recorded in the transaction journal as of the appropriate SCN of the DML instead of the original data. (IMCS also periodically resynchronizes the “stale” data in the IMCS without any intervention by the DBA, and the refresh rate is actually tunable if the need should arise.)

Advantages for Analytic and Data Warehousing Queries

There is a literal cornucopia of advantages that IMCS provides for just about any type of SQL query, but it is especially useful in data warehousing environments and for heavy analytic processing.

In-Memory Compression. IMCS offers five different levels of in-memory compression, and that means a compute node’s relatively precious DRAM is preserved. Compression factors vary based on the type of data retained in the table’s columns, the number of NULL values, and the compressibility of the data itself. The DBMS_COMPRESSION utility has been upgraded to estimate just how much compression can be reasonably expected for a targeted object based on the object’s size, its columns’ attributes, and the desired level of compression.

In-Memory Joins and Filtering. Joining data sources together – especially using hash joins when data sources are enormous – can consume equally huge amounts of memory and processing cycles. IMCS can leverage the concept of Bloom Filters – present “under the covers” in the Oracle optimizer since Oracle 10gR2 – to convert the join to a filter that can be applied very efficiently to a massive row sources, thus reduce processing cycles. Also, IMCS has adapted and leveraged the Oracle Exadata concept of storage indexes to identify exactly where data isn’t present and often completely avoids scanning vast sections of database objects when applying selection criteria filters.

In-Memory Aggregation. Designed especially for star schema query performance improvements, this feature leverages a completely new set of optimizer methods collectively known as vector transformation to quickly identify the matching dimension keys for the query and then apply those limiting filters to the fact table. Vector transformation leverages single-instruction processing multiple data values (SIMD) vector processing so that it’s essentially possible to read all of the possible values in one or more dimensional row sets at one time. This reduces the need for complex bitmap join index structures to weed out non-matching dimension data. SIMD vector processing can also benefit analytic functions that require aggregation – SUM, COUNT, AVERAGE, and many others – especially when huge data volumes are involved.

In-Memory Advisor: Helping Hand for IMCS Placement

It’s important to remember that the IMCS is a memory column store within the database’s SGA that requires an instance restart to establish; furthermore, once it’s been sized, it cannot be resized without another bounce of the instance. It’s therefore important to establish exactly which tables and table partitions will benefit most from being populated into the IMCS, and – more importantly – which ones shouldn’t be placed within it.

The good news is that as of October 2015, Oracle offers a new feature set called the In-Memory Advisor that makes relatively short work of determining which objects would most benefit based on an observed workload. It does require the installation of some additional database objects and the creation of a new schema owner, but it takes only seconds to complete.

Once installed, SQL script imadvisor_analyze_and_report.sql will prompt for the duration of the “lookback period” on which it should construct its advice, as shown below:

The In-Memory Advisor produces a series of informative HTML-based reports that detail:

The approximate recommended size of the IMCS
Which objects should be placed in the IMCS and why
Estimated in-memory compression ratios for recommended objects
The Advisor’s rationale for its recommendations

It also provides a SQL script that, when executed, will alter the necessary database initialization parameters and even add the recommended objects at the appropriate memory compression level to the IMCS.

One final note: Please be aware that the In-Memory Advisor and the In-Memory Column Store do required additional Oracle licensing.