Big Data Notes
“Big data” represents a paradigm shift in the technologies and techniques for storing, analyzing, and leveraging information assets. In this column, we track the progress of technologies such as Hadoop, NoSQL, and data science and see how they are revolutionizing database management, business practice, and our everyday lives.
While the new data stores and other software components are generally open source and incur little or no licensing costs, the architecture of the new stacks grows ever more complex, and this complexity is creating a barrier to adoption for more modestly sized organizations.
Posted April 06, 2015
Someone new to big data and Hadoop might be forgiven for feeling a bit confused after reading some of the recent press coverage on Hadoop. On the one hand, Hadoop has received very bullish coverage in the mainstream media. On the other, there have been a number of claims that Hadoop is overhyped. What's a person to make of all these mixed messages?
Posted February 11, 2015
The introduction of increased transactional capability into non-relational databases makes sense—in the same way that providing SQL layers on top of Hadoop and many other non-relational stores makes sense. But it does raise the possibility of convergence of relational and non-relational systems. After all, if I take a non-relational database and add SQL and ACID transactions, have I still got a non-relational database, or have I come full circle back to the relational model?
Posted December 03, 2014
One feature of the big data revolution is the acknowledgement that a single database management system architecture cannot meet all needs. The Lambda Architecture provides a useful pattern for combining multiple big data technologies to achieve multiple enterprise objectives. First proposed by Nathan Marz, it attempts to combine technologies that together deliver the characteristics of a web-scale system, satisfying requirements for availability, maintainability, and fault tolerance.
Posted October 08, 2014
The pioneers of big data, such as Google, Amazon, and eBay, generated a "data exhaust" from their core operations that was more than sufficient to allow them to create data-driven process automation. But, for smaller enterprises, data might be the scarcest commodity. Hence, the emergence of data marketplaces.
Posted August 05, 2014
Big data analytics is a complex field, but if you understand the basic concepts—such as the difference between supervised and unsupervised learning—you are sure to be ahead of the person who wants to talk data science at your next cocktail party!
Posted June 11, 2014
About 3 years ago, the AMP (Algorithms, Machines, People) lab was established at U.C. Berkeley to attack the emerging challenges of advanced analytics and machine learning on big data. The resulting Berkeley Data Analytics Stack—particularly the Spark processing engine—has shown rapid uptake and tremendous promise.
Posted April 04, 2014
Solid State Disk (SSD), particularly flash SSD, promised to revolutionize database performance by providing a storage medium that was orders of magnitude faster than magnetic disk, offering the first significant improvement in disk I/O latency in decades. Aerospike is a NoSQL database that attempts to provide an architecture that can fully exploit the I/O characteristics of flash SSD.
Posted February 10, 2014