Amazon Web Services Inc. has announced the limited preview of Amazon Redshift, a managed, petabyte-scale data warehouse service in the cloud that aims to let customers speed up query performance when analyzing data sets with the same SQL-based BI tools they use today.
“Over the past two years, one of the most frequent requests we’ve heard from customers is for AWS to build a data warehouse service,” says Raju Gulabani, vice president of Database Services, AWS. “Enterprises are tired of paying such high prices for their data warehouses, and smaller companies can’t afford to analyze the vast amount of data they collect (often throwing away 95% of their data).” According to Gulabani, Amazon Redshift not only lowers the cost of a data warehouse but also makes it easy to analyze large amounts of data quickly. “While actual performance will vary based on each customer’s specific query requirements, our internal tests have shown over 10 times performance improvement when compared to standard relational data warehouses. Having the ability to quickly analyze petabytes of data at a low cost changes the game for our customers.”
Amazon Redshift manages the work needed to set up, operate, and scale a data warehouse, from provisioning capacity to monitoring and backing up the cluster, to applying patches and upgrades. According to the company, Amazon Redshift is also priced cost-effectively to enable larger companies to substantially reduce their costs and smaller companies to take advantage of the analytic insights that come from using a powerful data warehouse. Through the AWS Management Console, customers can launch a Redshift cluster, starting with a few hundred gigabytes and scaling to a petabyte or more, for under $1,000 per terabyte per year – which Amazon notes is one-tenth the price of most data warehousing solutions currently available.
Amazon Redshift uses a number of techniques, including columnar data storage, advanced compression, and high-performance I/O and networking, to achieve higher performance than traditional databases for data warehousing and analytics workloads. By distributing and parallelizing queries across a cluster of inexpensive nodes, Amazon Redshift makes it easy to obtain high performance without requiring customers to hand-tune queries, maintain indices, or pre-compute results. Amazon Redshift is certified by popular business intelligence tools, including Jaspersoft and MicroStrategy. Over 20 customers, including Flipboard, NASA/JPL, Netflix, and Schumacher Group, are in the Amazon Redshift private beta program.
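The performance argument for columnar storage can be sketched in a few lines. The snippet below is purely illustrative (it does not reflect Redshift internals, and the table and column names are invented): it contrasts a row-oriented layout, where an aggregate over one column must touch every field of every row, with a column-oriented layout, where the same aggregate reads only the one column it needs — which also compresses better, since values of a single type and domain sit together.

```python
# Illustrative sketch only -- not Redshift internals.
# Hypothetical table: (user_id, region, spend).

# Row store: each record keeps all of its fields together,
# so summing one column still reads every field of every row.
rows = [
    {"user_id": 1, "region": "us-east", "spend": 12.5},
    {"user_id": 2, "region": "eu-west", "spend": 7.0},
    {"user_id": 3, "region": "us-east", "spend": 3.25},
]
row_total = sum(r["spend"] for r in rows)

# Column store: each column is a contiguous array, so an aggregate
# over "spend" touches only that array and skips the other columns.
columns = {
    "user_id": [1, 2, 3],
    "region": ["us-east", "eu-west", "us-east"],
    "spend": [12.5, 7.0, 3.25],
}
col_total = sum(columns["spend"])

print(row_total, col_total)  # 22.75 22.75 -- same answer, less data scanned
```

On disk the difference compounds: a wide table scanned for one or two columns reads a small fraction of the bytes in a columnar layout, which is why the format suits analytics workloads.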
“The Amazon Enterprise Data Warehouse manages petabytes of data for every group at Amazon. We are seeing significant performance improvements leveraging Amazon Redshift over our current multi-million dollar data warehouse,” notes Erik Selberg, manager of the Amazon.com Data Warehouse team. “Some multi-hour queries finish in under an hour, and some queries that took 5-10 minutes on our current data warehouse are now returning in seconds with Amazon Redshift. Early estimates show the cost of Amazon Redshift will be well under 1/10th the cost of our existing solution. Amazon Redshift is providing us with a cost-effective way to scale with our growing data analysis needs.”
According to Amazon, Amazon Redshift includes technology components licensed from ParAccel and is available with two underlying node types, holding either 2 terabytes or 16 terabytes of compressed customer data per node. A cluster can scale up to 100 nodes, and on-demand pricing starts at just $0.85 per hour for a 2-terabyte data warehouse, scaling linearly up to a petabyte and more. Reserved instance pricing lowers the effective price to $0.228 per hour, or under $1,000 per terabyte per year – less than one-tenth the price of comparable technology available to customers today. To learn more and sign up for the limited preview of Amazon Redshift, visit http://aws.amazon.com/redshift.
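The per-terabyte-per-year figure follows directly from the quoted hourly rates. The arithmetic below uses only numbers from the announcement; the single assumption is continuous 24/7 operation over a 365-day year (8,760 hours).

```python
# Sanity-check the "<$1,000 per terabyte per year" claim from the
# announced hourly rates. Only assumption: 24/7 use, 365-day year.
HOURS_PER_YEAR = 365 * 24  # 8,760

on_demand_per_hour = 0.85   # 2 TB node, on-demand rate
reserved_per_hour = 0.228   # 2 TB node, effective reserved rate
node_capacity_tb = 2

reserved_per_tb_year = reserved_per_hour * HOURS_PER_YEAR / node_capacity_tb
on_demand_per_tb_year = on_demand_per_hour * HOURS_PER_YEAR / node_capacity_tb

print(round(reserved_per_tb_year, 2))   # 998.64 -- just under $1,000/TB/year
print(round(on_demand_per_tb_year, 2))  # 3723.0
```

So the reserved rate works out to roughly $999 per terabyte per year, matching the headline figure, while pure on-demand use runs closer to $3,723 per terabyte per year.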