The rising popularity of web 2.0 and cloud computing services has prompted a reexamination of the infrastructure that supports them. More and more people are using web-based communities, hosted services, and applications such as social-networking sites, video-sharing sites, wikis and blogs. And the number of businesses adopting cloud computing applications such as software as a service and hosted services is climbing swiftly.
With all this growth, internet data centers are struggling to handle unprecedented workloads, spiraling power costs, and the limitations of the legacy architectures that support these services. The industry has responded by moving towards a "data center 2.0" model where new approaches to data management, scaling, and power consumption enable data center infrastructures to support this growth.
These 2.0 data centers leverage standard low-cost x86 servers, Gigabit Ethernet interconnects, and open-source software to build scale-out applications with tiering, data and application partitioning, dynamic random access memory (DRAM)-based content caching servers, and tolerance of node failures at the application layer.
These loosely coupled architectures have enabled service scaling, but at a very high price. Today's data centers are reeling from the high costs of power, capital equipment, network connectivity and space, and are hindered by serious performance, scalability and application complexity issues.
Advances in multi-core processors, flash memory and low-latency interconnects offer tremendous potential improvements in performance and power at the component level, but adapting them to realize such benefits requires major engineering and research efforts. These efforts unfortunately draw resources away from core business activities that drive revenue and profitability for web 2.0 and cloud computing companies. Because these companies must maintain focus on revenue-generating functions, demand for higher level building blocks that can exploit advanced technologies has never been higher.
Multi-core processors place many processor cores and shared caches on a single chip, providing very high potential throughput for workloads with thread-level parallelism. To fully realize the benefits of advanced multi-core processors, applications and operating environments need many parallel threads with very fast switching between them. They also need to support memory affinity and provide granular concurrency control to prevent serialization effects.
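The granular concurrency control mentioned above can be sketched with a simple example: rather than one global lock that serializes every update, the key space is split into shards, each guarded by its own lock, so threads touching different shards proceed in parallel. The shard count and hashing scheme here are illustrative assumptions, not a prescription.

```python
import threading

NUM_SHARDS = 16  # illustrative; real systems tune this to core count and workload

class ShardedCounter:
    """Per-shard locks instead of one global lock: granular concurrency control."""
    def __init__(self):
        self._locks = [threading.Lock() for _ in range(NUM_SHARDS)]
        self._data = [dict() for _ in range(NUM_SHARDS)]

    def increment(self, key):
        shard = hash(key) % NUM_SHARDS
        with self._locks[shard]:  # contends only with threads touching this shard
            self._data[shard][key] = self._data[shard].get(key, 0) + 1

    def get(self, key):
        shard = hash(key) % NUM_SHARDS
        with self._locks[shard]:
            return self._data[shard].get(key, 0)

counter = ShardedCounter()
threads = [threading.Thread(target=lambda: [counter.increment("hits") for _ in range(1000)])
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.get("hits"))  # 8000: no updates lost, yet unrelated keys never contend
```

With a single global lock the result would be the same but every thread would serialize on one mutex; sharding the locks is what lets many cores make progress at once.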
Flash memory is a non-volatile computer memory that can be electrically erased and reprogrammed. It has many promising characteristics, but also many idiosyncrasies. Flash memory offers access times roughly 100 times faster than those of hard disk drives (HDDs), and requires much less space and power than HDDs. It consumes only about 1/100th the power of DRAM and can be packed much more densely, providing much higher capacities than DRAM. Flash memory is also cheaper than DRAM and is persistent once written, whereas DRAM loses its contents when the power is turned off. Flash memory can be organized into modules of different capacities, form factors, and physical and programmatic interfaces.
However, flash memory access times are much slower than DRAM's, and flash memory chips have write behavior that is very different from their read behavior. Flash memory can only be erased in large blocks (~128 KB), and a region must be erased before it can be rewritten. Flash also endures only a limited number of erase cycles (~100,000). As a result, small writes must be buffered and combined into large blocks before being written (write coalescing), and block writes must be spread uniformly across the total flash memory subsystem to maximize its effective lifetime (wear leveling).
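The two techniques just described can be illustrated with a deliberately simplified sketch (not a real flash translation layer): small writes accumulate in a buffer until a full erase block's worth is ready (write coalescing), and each flushed block is programmed onto the least-worn physical block (wear leveling). The block sizes, block count, and selection policy are illustrative assumptions.

```python
ERASE_BLOCK = 128 * 1024  # ~128 KB erase-block granularity, as in the text
NUM_BLOCKS = 8            # tiny device for illustration

class FlashSketch:
    def __init__(self):
        self.erase_counts = [0] * NUM_BLOCKS  # erase cycles consumed per block
        self.blocks = [b""] * NUM_BLOCKS      # "physical" block contents
        self.buffer = bytearray()             # coalescing buffer

    def write(self, data):
        """Write coalescing: buffer small writes until a full erase block accumulates."""
        self.buffer.extend(data)
        while len(self.buffer) >= ERASE_BLOCK:
            chunk, self.buffer = self.buffer[:ERASE_BLOCK], self.buffer[ERASE_BLOCK:]
            self._program(bytes(chunk))

    def _program(self, chunk):
        """Wear leveling: erase and program the least-worn physical block."""
        target = self.erase_counts.index(min(self.erase_counts))
        self.erase_counts[target] += 1  # each program costs one erase cycle
        self.blocks[target] = chunk

flash = FlashSketch()
for _ in range(256):
    flash.write(b"x" * 4096)    # 256 small 4 KB writes = exactly 8 erase blocks
print(flash.erase_counts)       # wear spread evenly: [1, 1, 1, 1, 1, 1, 1, 1]
```

A real flash controller must also track logical-to-physical mappings, relocate live data during garbage collection, and handle partial-block flushes on power loss; the sketch shows only why buffering and uniform block placement follow directly from the erase-block and endurance constraints.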
The latency, bandwidth, capacity and persistence benefits of flash memory are compelling. However, incorporating flash memory into system architectures requires specific design and optimization, starting at the application layer and extending through the operating environment down to the physical machine organization.
Incorporating Flash Memory into Overall System Architecture
A very high degree of parallelism and concurrency control is required in the application and server operating environment to exploit the tremendous potential I/O throughput and bandwidth offered by advanced flash memory technology. In addition, the flash memory driver, controller, and device must be optimized and tuned to match workload behavior, especially access-size distributions and required persistence semantics.
Interconnects have come a long way since Ethernet first became popular in the 1980s. Bandwidth continues to increase while latencies are steadily getting smaller. Today, Gigabit Ethernet (GbE) is standard on most server motherboards. 10GbE is being used in data centers mostly as a backbone to consolidate gigabit links and is starting to gain traction as a point-to-point interconnect.
With latencies as low as a single microsecond between server nodes, it is feasible to distribute workloads across multiple servers and to replicate data to multiple server nodes to provide high availability and data integrity. Nevertheless, most applications available today were written assuming high latencies and low bandwidth. The software to manage data movement at such high speeds while running simultaneously on multiple server nodes is very complex.
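The replication idea above can be made concrete with a minimal in-process sketch: a write is applied to every replica before it is acknowledged, so any single node can fail without losing data. With microsecond-scale interconnect latency, waiting for all replicas becomes affordable. The node and store names are illustrative; a real system must also handle network failures, write ordering, and node recovery.

```python
class Node:
    """Stand-in for a server node; a real node would sit across the network."""
    def __init__(self, name):
        self.name = name
        self.store = {}

class ReplicatedStore:
    """Synchronous replication: write to every replica before acknowledging."""
    def __init__(self, nodes):
        self.nodes = nodes

    def put(self, key, value):
        for node in self.nodes:      # each hop costs ~1 microsecond on a
            node.store[key] = value  # low-latency interconnect, so waiting
        return "ack"                 # for all replicas is feasible

    def get(self, key):
        return self.nodes[0].store.get(key)  # any replica could serve reads

cluster = ReplicatedStore([Node("a"), Node("b"), Node("c")])
cluster.put("user:42", "cached-profile")
print(cluster.get("user:42"))  # cached-profile, durable on all three nodes
```

The complexity the text warns about lives in what this sketch omits: deciding what to do when one replica times out mid-write is where most of the engineering effort goes.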
Loosely Coupled Scale-Out Architectures
A modern web 2.0 and cloud computing data center scale-out deployment has at the front a web server tier and an application server tier (sometimes merged together), and at the back-end a reliable data tier, usually hosted by database servers, which are typically slow and expensive elements. These servers often run at very low CPU utilization because they block on HDD accesses and suffer lock serialization effects, and at low HDD capacity utilization because data must be laid out to minimize head movement and reduce access latencies.
Between the web server tier and the back-end server tier are a content caching tier and specialized application services, which may perform generic functions such as search, ad serving, photo store/retrieval, authentication, etc., or specific functions for the enterprise. Completing a response to a customer interaction involves accessing a web server, application servers, database servers and various other generic and specialized applications and servers.
Data centers generally require that user responses complete in less than a quarter of a second. This requirement is usually met by a DRAM caching tier, consisting of servers filled with DRAM. Customer information, data retrieved from slow databases, and common user interaction results are cached in this DRAM tier so they can be accessed very quickly.
Since the performance of a site can be improved dramatically through extensive caching, many racks of caching servers are typically deployed. Each server holds a limited amount of DRAM, so the data must be partitioned among the caching servers. IT staff must carefully lay out the data across these servers, which typically operate at very low network and CPU utilization because they are simply storing and retrieving small amounts of data.
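The partitioning described above is commonly done on the client side: each key is hashed to select one caching server, spreading the cached data set across the tier. A minimal sketch follows; the server names are hypothetical, and production clients often use consistent hashing instead of a plain modulus so that adding or removing a server remaps only a fraction of the keys.

```python
import hashlib

# Hypothetical caching tier; in practice this list comes from configuration.
CACHE_SERVERS = ["cache01", "cache02", "cache03", "cache04"]

def server_for(key):
    """Map a cache key to exactly one caching server by hashing it."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return CACHE_SERVERS[int(digest, 16) % len(CACHE_SERVERS)]

# Every client computes the same mapping, so no coordination is needed
# to find where a key lives.
print(server_for("user:1001:profile"))
print(server_for("page:/home:rendered"))
```

The weakness of the plain modulus shown here is exactly the administrative burden the text describes: growing the tier from four servers to five remaps almost every key, invalidating most of the cache at once.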
When loosely coupled scale-out architectures are examined closely, it becomes clear that the database and caching tiers suffer from very low utilization, high power consumption and excessive programmatic and administrative complexity, all of which contribute to high total cost of ownership (TCO).
Putting the Higher Level Building Blocks to Use
Great potential exists for using the new generation of commodity multi-core processors, flash memory and low-latency interconnects in web 2.0 and cloud computing data centers. But because of the major development and support costs involved in implementing them, which fall outside the core value of these businesses, the benefits often go unrealized. Today, new, higher-level building blocks are available to address these challenges, and they arrive at exactly the right time for the market. Exploding demand for services from today's web 2.0 and cloud computing data centers has placed existing architectures and technologies under tremendous strain, often at the expense of service availability.