Newsletters




New In SQL Server 2012: Columnstore Indexes


SQL Server 2012 includes a lot of new and exciting features.  One feature that has caught the imagination of many in the user community is the high-performance feature called Columnstore Indexes.  (Incidentally, it was also known as Apollo during its beta cycles).  Columnstore indexes, as their name implies, store indexed (and always compressed) data contiguously in columns, rather than in standard format where the data is stored contiguously on 8Kb data pages according to the rows in which the data resides.  Because of their structures, columnstore indexes speed up read-heavy operations like data warehouse queries from factors of 10x to 100x.  Wow!  (A performance benchmarking white paper can be found at http://download.microsoft.com/download/7/2/E/72E63D2D-9F73-42BB-890F-C1CA0931511C/SQL_Server_2012_xVelocityBenchmark_DatasheetMar2012.pdf).

Okay, Tell Me More

When it comes to querying a table with a columnstore index, there’s really no change from the standard ways of extracting data from SQL Server.  If the columnstore index exists, the SQL Server query engine will automatically utilize it wherever the cost-based optimizer deems it best.  (You can also use a query hint to force SQL Server to use the columnstore index). Columnstore index usage will show up in the query execution plan, just like any other type of index.  Columnstore indexes also compress the data, which they store in a way that is distinct from PAGE and ROW data compression. 

Columnstore indexes are for BIG, read-mostly tables.  Don’t consider using them for tables with less than one million records.  Assuming a particular table is used for data warehousing with lots of reporting, Microsoft recommends that you place columnstore indexes on all of the columns in a given table.  The recommendation essentially says “If you put one columnstore index on one column in a table, you might as well put them on ALL columns of the table.”

Part of the reason for this recommendation is that tables with one or more columnstore indexes are read-only.  If you need to update the data in a columnstore index, Microsoft recommends that you use partitions, and then use partition switching to occasionally update the data.  That way, if most updates are simply appending the newest data to the existing table, the partition switching process can easily accommodate it.  Microsoft also has created a document that discusses how to use a local-global aggregation to frequently perform smaller data loads into a columnstore.  Details on this so-called “trickle load” technique are explained at http://social.technet.microsoft.com/wiki/contents/articles/trickle-loading-with-columnstore-indexes.aspx

What are the Rules?

Columnstore indexes have a lot of rules.  For example, columnstore indexes may only be non-clustered indexes.  They cannot be created on data types of decimal or numeric with precision greater than 18, datetimeoffset with precision greater than two, binary, varbinary, image, text, ntext, varchar(max), nvarchar(max), cursor, hierarchyid, timestamp, uniqueidentifier, sqlvariant, or xml.  They cannot be created as filtered index, placed on computed or sparse columns, multiple columns at once, or on an indexed view.  Columnstore indexes are also created using a parallelized algorithm and somewhat more memory than is used during regular index creation.  Microsoft’s proscriptive guidance provides a lot more detail about how much memory is needed to create a columnstore index on any given table and column.  On the other hand, columnstore indexes are much smaller than B-tree indexes, often as much as a factor of 4x to 15x smaller. 

SQL Server 2012 has a lot of metadata about columnstore indexes.  For example, sys.indexes show column store indexes as type = 6 and type_desc as ‘NONCLUSTERED COLUMNSTORE.’  There also are two new system tables, sys.column_store_segments and sys.column_store_dictionaries, that offer a great deal more information about columnstore indexes existing within a given database.

Also note that with regards to statistics, columnstore indexes are handled similarly to B-tree indexes.  A deep dive into columnstore and statistics is available at http://social.technet.microsoft.com/wiki/contents/articles/7404.using-statistics-with-columnstore-indexes.aspx.  Best practice guidance for columnstore indexes on filegroups can be found at http://msdn.microsoft.com/en-us/library/gg605238.aspx.

A massive amount of practitioner information is available.  While I’ve already pointed out several good resources, be sure to check the Columnstore FAQ at http://social.technet.microsoft.com/wiki/contents/articles/3540.sql-server-columnstore-index-faq-en-us.aspx.  You’ll find a great many other resources in the FAQ, such as demo videos, full conference sessions, and performance tuning guides. 


Sponsors