Reaching for Highly Scalable Systems with SQL Server 2008

SQL Server has supported VLDBs (very large databases) for some time now. Back in the SQL Server 2000 days, I recall hearing that multi-terabyte databases were unusual but doable. Now they are commonplace, while databases in the hundreds of terabytes inhabit the part of the map that says "there be dragons." While VLDBs are quite common on SQL Server today, highly scalable systems that can be flexibly extended in the same fashion as Oracle/RAC are less so. So how do you design a highly scalable architecture for SQL Server if it isn't like Oracle/RAC?

Available third-party tools include xkoto's GRIDSCALE technology. Older architectures, such as distributed partitioned views or two-phase commits coordinated by DTC (the Distributed Transaction Coordinator), also can be implemented for a scaled-out application. Since those older technologies involve costly tradeoffs in performance, maintainability, or implementation, and third-party products can be expensive, what other options exist for a SQL Server application?
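To make the distributed partitioned view approach concrete, here is a minimal sketch. The server names, database names, and table layout are hypothetical; it assumes linked-server definitions for the member servers are already in place, and each member table carries a CHECK constraint on the partitioning column so the optimizer can route queries to the right server.

```sql
-- On each member server: one horizontal slice of the logical table.
-- The CHECK constraint on OrderYear defines this member's partition.
CREATE TABLE dbo.Orders_2007
(
    OrderID   int NOT NULL,
    OrderYear int NOT NULL CHECK (OrderYear = 2007),
    Amount    money NOT NULL,
    CONSTRAINT PK_Orders_2007 PRIMARY KEY (OrderID, OrderYear)
);

-- On every participating server: the distributed partitioned view
-- unions the member tables across linked servers into one logical table.
CREATE VIEW dbo.Orders
AS
    SELECT OrderID, OrderYear, Amount FROM Server1.Sales.dbo.Orders_2007
    UNION ALL
    SELECT OrderID, OrderYear, Amount FROM Server2.Sales.dbo.Orders_2008;
```

A query such as `SELECT * FROM dbo.Orders WHERE OrderYear = 2008` can then be pruned to a single member server, but updates through the view are where the two-phase commit and maintainability costs mentioned above come in.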

You can exploit some new features quickly and easily if you have a specific non-OLTP scenario requiring flexible scalability. For example, you can use the scalable shared database architecture, which requires a SAN and is described at , to build a highly flexible and extensible hardware infrastructure for reporting applications and Business Intelligence (BI) data marts. Because the database must be in read-only mode most of the time, the architecture's strongest applicability is to reporting and BI data marts.
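The refresh cycle for a scalable shared database can be sketched roughly as follows. The database name and file path are hypothetical, and the SAN step happens outside SQL Server entirely; this is an illustration of the workflow, not the full documented procedure.

```sql
-- 1. On the build/refresh server: finish loading data, then make the
--    database read-only and detach it from that instance.
ALTER DATABASE ReportDB SET READ_ONLY;
EXEC sp_detach_db @dbname = N'ReportDB';

-- 2. Outside SQL Server: mark the SAN volume holding the database
--    files read-only and present it to each reporting server.

-- 3. On each reporting server: attach the same files from the shared
--    read-only volume, so every server serves queries from one copy.
CREATE DATABASE ReportDB
    ON (FILENAME = N'R:\Data\ReportDB.mdf')
    FOR ATTACH;
```

Because every reporting server reads the same physical files, you scale query capacity by adding commodity servers rather than copying the data mart to each one.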

You also can read white papers by the Microsoft SQL Server Customer Advisory Team (SQLCAT) that provide excellent techniques to scale out Analysis Services, with or without a SAN, at .

Through acquisition, Microsoft also is actively advancing its high-end architecture for database warehousing and BI. Microsoft recently acquired DATAllegro, a maker of data warehousing appliances, and, using DATAllegro's technology, Microsoft data warehouses can effectively scale to hundreds of terabytes, even reaching petabyte sizes.

Some analysts say the win for Microsoft is huge because the reach of its enterprise scalability now enters the rarefied air of the likes of Teradata, while others are not sure this is a sustainable, competitive differentiator for the long run. Another compelling aspect of the technology is its open architecture. You can select from a menu of hardware platforms, as well as the grid architecture, where you can distribute processing power and memory wherever it is needed. See the FAQ at .

As with any such acquisition, I suspect it will be 18 to 24 months before the software reaches a version 1.0 launch; so, working back from that date, early adopters should already be working with the product formerly known as DATAllegro. The way it is released and packaged, whether as software, software plus hardware, or an entire turnkey appliance, remains to be seen.

It's clear, however, that DATAllegro software running on Windows Server 2008 and SQL Server 2008 will be capable of assaulting territory only Teradata could hold in the past. With Microsoft's cut-rate pricing policies, I'd expect this offering to cost a fraction of what Teradata charges. Also, since the product runs on inexpensive commodity hardware, even modestly budgeted IT departments will be able to afford high-end data marts with confidence, while maintaining the IT hardware purchasing standards of their favorite commodity vendors.

If you've been around for a while, you'll recognize the pattern. Microsoft has repeatedly tackled the domain of a specific high-priced specialty market and made it a subset of the SQL Server product set. It happened in SQL Server 7.0 when Data Transformation Services first challenged high-cost ETL products. It happened in SQL Server 2000 when Reporting Services challenged expensive products like Crystal Reports. The SQL Server 2005 and PerformancePoint Server package is a strong challenger to Cognos. So, it looks like other vendors in the wider market should continue to glance over their shoulders.