Big Data Notes
“Big data” represents a paradigm shift in the technologies and techniques for storing, analyzing, and leveraging information assets. In this column, we track the progress of technologies such as Hadoop, NoSQL, and data science and see how they are revolutionizing database management, business practice, and our everyday lives.
One feature of the big data revolution is the acknowledgement that a single database management system architecture cannot meet all needs. However, the Lambda Architecture provides a useful pattern for combining multiple big data technologies to achieve multiple enterprise objectives. First proposed by Nathan Marz, it attempts to combine technologies that together give a web-scale system the availability, maintainability, and fault tolerance it requires.
Posted October 08, 2014
The pioneers of big data, such as Google, Amazon, and eBay, generated a "data exhaust" from their core operations that was more than sufficient to allow them to create data-driven process automation. But, for smaller enterprises, data might be the scarcest commodity. Hence, the emergence of data marketplaces.
Posted August 05, 2014
Big data analytics is a complex field, but if you understand the basic concepts—such as the difference between supervised and unsupervised learning—you are sure to be ahead of the person who wants to talk data science at your next cocktail party!
Posted June 11, 2014
About 3 years ago, the AMP (Algorithms, Machines, People) lab was established at U.C. Berkeley to attack the emerging challenges of advanced analytics and machine learning on big data. The resulting Berkeley Data Analytics Stack—particularly the Spark processing engine—has shown rapid uptake and tremendous promise.
Posted April 04, 2014
Solid State Disk (SSD), particularly flash SSD, promised to revolutionize database performance by providing a storage medium that was orders of magnitude faster than magnetic disk, offering the first significant improvement in disk I/O latency in decades. Aerospike is a NoSQL database whose architecture attempts to fully exploit the I/O characteristics of flash SSD.
Posted February 10, 2014