Guy Harrison

Guy Harrison, a software professional with more than 20 years of experience in database design, development, administration, and optimization, is director and CTO of Southbank, which he founded in 2016. He is the author of Next Generation Databases, Oracle Performance Survival Guide, and MySQL Stored Procedure Programming, as well as other books, articles, and presentations on database technology. He writes a monthly column for Database Trends and Applications and is an Oracle ACE and a MongoDB certified DBA.

Harrison can be found on the web at www.guyharrison.net, by email at guy.a.harrison@gmail.com, and as @guyharrison on Twitter. He is a partner at Toba Capital and can be reached there at guy@tobacapital.com.

Articles by Guy Harrison

By now, I'm sure most of us have spent quality time with generative AI systems such as ChatGPT and have been amazed at how convincingly they mimic human intelligence. Furthermore, it's just as easy to integrate generative AI into application code as it is to interact manually. However, when it comes to adding generative AI capabilities to enterprise applications, we usually find that something is missing—the generative AI programs simply don't have the context to interact with an application's users or a company's customers.
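
To give a sense of how simple the programmatic side is, here is a minimal sketch of calling a generative AI service from application code. It assumes an OpenAI-style chat-completions REST endpoint and an API key in an environment variable; the model name and prompt are purely illustrative.

    # Minimal sketch: calling a generative AI service from application code.
    # Assumes an OpenAI-style chat-completions endpoint and an API key in
    # the OPENAI_API_KEY environment variable; the model name is illustrative.
    import os
    import requests

    def ask_llm(prompt: str) -> str:
        response = requests.post(
            "https://api.openai.com/v1/chat/completions",
            headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
            json={
                "model": "gpt-4o-mini",
                "messages": [{"role": "user", "content": prompt}],
            },
            timeout=30,
        )
        response.raise_for_status()
        return response.json()["choices"][0]["message"]["content"]

    if __name__ == "__main__":
        # The model answers from its training data alone; it has no access to
        # our application's customers, orders, or support tickets, which is
        # exactly the missing-context problem described above.
        print(ask_llm("Summarize this customer's open support tickets."))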

Posted March 14, 2024

The remarkable and rapid uptake of ChatGPT and similar large language model (LLM)-based AIs may be driving the biggest increase in demand for computing power since the advent of the internet. The neural networks that power LLM AI solutions, such as ChatGPT, rely on massively parallel processing. This processing is similar in some respects to the massively parallel graphics processing that was demanded in the past by computer games.

Posted February 08, 2024

As you may know, an index is a database object with its own storage that provides a fast access path into a collection. Indexes exist primarily to enhance performance, so using indexes effectively is paramount when optimizing MongoDB performance.
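
As a quick illustration, here is a small pymongo sketch that creates a compound index and then uses explain() to confirm the query actually uses it; the collection and field names are invented for the example.

    # Create an index and verify it is used; names are illustrative.
    from pymongo import ASCENDING, MongoClient

    client = MongoClient("mongodb://localhost:27017")
    orders = client["shop"]["orders"]

    # A compound index supporting a filter on customerId with a sort on orderDate.
    orders.create_index([("customerId", ASCENDING), ("orderDate", ASCENDING)])

    # explain() reveals whether the winning plan is an index scan (IXSCAN)
    # or a full collection scan (COLLSCAN).
    plan = orders.find({"customerId": 42}).sort("orderDate", ASCENDING).explain()
    print(plan["queryPlanner"]["winningPlan"])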

Posted January 11, 2024

For most of the last few decades, AI has over-promised and under-delivered. However, behind the scenes, very significant advances in deep learning technologies have had a revolutionary impact within important but narrow domains. In 2022, these technologies became generally usable with the release of ChatGPT.

Posted December 14, 2023

I can't remember the first time a database or storage vendor told me, "Disk is cheap," but it was probably in the 1990s.  Vendors like to say disk is cheap because it helps them sell more of it and to encourage bigger deployments. The fact is that data storage is getting cheaper all the time. When I entered the business, one GB of storage cost about $1000. Today, it's more like 10 cents—10,000 times less!

Posted November 09, 2023

The mainstream media detractors of blockchains and cryptocurrency articulate multiple criticisms of the technology, but the one with perhaps the greatest legitimate grounds is the environmental impact created by the original incarnations of Bitcoin and Ethereum. In the original Bitcoin white paper, Satoshi Nakamoto outlined the Proof of Work (PoW) algorithm. PoW is the core innovation within the Bitcoin blockchain that creates the immutable record of cryptocurrency transactions and supports "trustless" transactions between two parties without the need for third-party involvement.

Posted October 12, 2023

The dramatic advances in generative AI—ChatGPT in particular—have motivated almost all technology companies to find an AI story that can break through the heavily AI-oriented technology news feeds. It can be hard for database companies to tell an AI story. While most AI is data-driven and therefore dependent on database technology, advances in database technology have not in themselves driven advances in AI, and AI has generally not yet revolutionized database technology.

Posted September 14, 2023

In the 1992 movie Sneakers, hackers discover a device that can break the encryption of virtually any computer system. At the time of its release, the idea seemed far-fetched. But today, because of quantum computers, we may soon have a similar device. Quantum theory is now approaching its 100th anniversary, with the central tenets of the theory having been well-established by the mid-1920s. Together with general relativity, quantum theory is one of the foundational theories of modern science. Our modern world would not be possible without quantum theory: It's central to the design of semiconductors, lasers, MRIs, and even LEDs.

Posted August 10, 2023

Traditionally, MongoDB has held its signature event—MongoDB World—in June in New York City. MongoDB usually announces significant releases of their flagship database product at this event. However, this year, the company has transitioned to a global tour of "local" events that start in New York and then proceed to a few dozen regional events across five continents. 

Posted July 13, 2023

In the seminal novel 1984, George Orwell imagines a dystopian future in which governments routinely rewrite historical records as a means of control: "He who controls the past controls the future." While Western governments have yet to engage in the wholesale creation of false history, the practice is all too common under totalitarian regimes. And with AI-generated fakes becoming increasingly hard to differentiate from the real thing, creating alternate historical records has never been easier.

Posted June 08, 2023

MongoDB Atlas—MongoDB's Database as a Service offering—has become increasingly central to MongoDB's commercial strategy. Since its launch in 2016, MongoDB has onboarded tens of thousands of customers to Atlas, and Atlas now represents almost two-thirds of MongoDB's revenue.

Posted May 11, 2023

The biggest technology news of the year so far has undoubtedly been the release of the OpenAI ChatGPT technology. ChatGPT is a chat-bot style AI that can generate amazingly knowledgeable and human-like conversations. ChatGPT can also generate essay-type answers to exam questions, write halfway decent song lyrics, or even generate computer code. I was going to explain how ChatGPT works, but instead, I'll let ChatGPT tell you.

Posted April 13, 2023

A long, long time ago—in the early 90s—I first worked as a DBA with responsibility for enterprise databases. I will never forget how surprised and disappointed I was to discover that the data in the database files was completely unencrypted. A system administrator with access to the database files could read the data in those files, even if they didn't have a database username and password.

Posted March 09, 2023

Sometime during the pandemic, I stopped carrying a wallet when leaving the house. Almost all shops supported—often insisted on—touchless payments, so there was no need to carry around paper money, and since my phone was fully able to perform touchless payments, my phone became all I needed to navigate the commercial world. I'm sure most of you have had a similar experience.

Posted February 09, 2023

MongoDB is often found within a complete application stack that combines multiple technologies to deliver functionality to users both within and outside of the organization. Very often, organizations want to create automated workflows that propagate information automatically between disparate technologies. 

Posted January 12, 2023

Modern applications have increasingly leveraged Kubernetes as the "OS of the cloud" because of its ability to abstract the underlying cloud platform and coordinate the activities of multiple Docker containers. Kubernetes does indeed radically simplify the deployment and administration of multi-service distributed applications. However, it has a significant learning curve, and maintaining a large-scale Kubernetes cluster can be daunting.

Posted December 08, 2022

In the last Mongo Matters column, we looked at MongoDB's new SQL interface for querying data in MongoDB Atlas. Almost all of the "NoSQL" databases today support some form of SQL interface in order to leverage the multitude of SQL-based BI tools and the wealth of SQL expertise in the data analytics market. This "SQL for NoSQL" usually bemuses SQL aficionados and is often seen as some sort of repudiation of the document database concept.  

Posted November 10, 2022

PostgreSQL arguably has been somewhat overlooked by database commentators. PostgreSQL doesn't have the massive marketing machine of Oracle or Microsoft, and it lacks the "new kid on the block" appeal of MongoDB or CockroachDB. However, PostgreSQL continues to increase in importance in terms of deployments and mindshare. This year, PostgreSQL overtook MongoDB as the "most loved" and "most wanted" database platform. Developer enthusiasm is probably the strongest leading indicator of future deployments since developers, more than anyone else, get to decide what technologies are used in an application.

Posted October 06, 2022

In previous columns, we've noted that the SQL language is in the ascendant. New SQL native databases such as CockroachDB and Yugabyte are showing robust adoption, while non-relational (NoSQL) databases increasingly provide SQL interfaces to their data. In light of this increasing trend, it's no surprise to see the introduction of a new SQL capability within the latest release of MongoDB—the Atlas SQL framework. 

Posted September 08, 2022

The cryptocurrency ecosystem seems to be in a perpetual state of boom and bust, though the long-term trend is an increasing valuation for crypto assets. Within the crypto space, NFTs are particularly bubbly. Individual NFTs have sold for as much as $67 million, while total NFT sales reached $40 billion in 2021.

Posted August 11, 2022

The SQL language served as a universal language for database manipulation from the mid-1980s until NoSQL databases started gaining strength about 12 years ago. However, after a short period in the wilderness, SQL is back and possibly more vital than ever.

Posted June 02, 2022

MongoDB's recent enhancements are definitely of the perfective variety—broadly improving on the initial implementations of new features of 5.0. However, they go a long way toward enhancing the capabilities of 5.0 and creating a significant advantage for users of the MongoDB Atlas cloud.

Posted May 04, 2022

Although IPFS—the so-called Interplanetary File System— has far less name recognition than blockchain, it represents one of the essential technologies underlying the current boom in NFTs (non-fungible tokens).

Posted April 07, 2022

MongoDB initially gained traction as a backend for web applications, in which it was mostly concerned with so-called "CRUD" operations—creating, reading, updating, and deleting documents. Since then, MongoDB has broadened its capabilities remarkably, but it is still typically deployed as an operational database rather than as an analytic DB or data warehouse.

Posted March 11, 2022

The old Chinese expression "May you live in interesting times" has never been more applicable in my lifetime than in 2020 and 2021. The global pandemic combined periods of great anxiety with long stretches of mind-numbing lockdown boredom. But it certainly kept our attention!

Posted February 08, 2022

The roots of open source go back to the original pioneers of the computer revolution. Early pioneers of computing at organizations such as Bell Labs and MIT held a belief that sharing program code was essential to the progression of computer technology.

Posted January 17, 2022

The design of a schema in MongoDB is just as important as it is in RDBMS.  Indeed, schema design can be even more complicated in MongoDB. At least in SQL databases, we have the "first normal form" representing the starting point for a well-designed first cut data model. In MongoDB, we have more choices, but as a consequence, we have more potential pitfalls.
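
To make the trade-off concrete, here is a small sketch of two alternative designs for the same order data, one embedded and one referenced; the document shapes are invented for illustration.

    # Design 1: embed line items inside the order document. Reads are a single
    # fetch with no join, but the document grows with every item added.
    order_embedded = {
        "_id": 1001,
        "customerId": 42,
        "items": [
            {"sku": "A1", "qty": 2, "price": 9.99},
            {"sku": "B7", "qty": 1, "price": 24.50},
        ],
    }

    # Design 2: reference line items held in their own collection. This is
    # closer to a normalized relational model but needs a $lookup or a second
    # query to reassemble the full order.
    order_referenced = {"_id": 1001, "customerId": 42}
    line_items = [
        {"orderId": 1001, "sku": "A1", "qty": 2, "price": 9.99},
        {"orderId": 1001, "sku": "B7", "qty": 1, "price": 24.50},
    ]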

Posted January 03, 2022

Since the emergence of cloud computing more than a decade ago, many have been waiting for a completely cloud-based, elastic database that could expand (and even contract) its footprint dynamically and would eliminate the significant operational effort of maintaining a production database system.

Posted December 08, 2021

One of the big-ticket items at the recent MongoDB 5.0 launch was the introduction of specialized "time-series" collections. The key concept in a time-series database is that, for a lot of data, the timestamp of the data is a critical element, both in terms of operational cost and analytic value. Time-series collections represent another attempt by the MongoDB company to extend the use of MongoDB to a wider set of applications and scenarios. They are a welcome addition.
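
As a rough sketch of the new API, the following pymongo snippet creates a time-series collection and inserts a measurement; the database, field, and metric names are illustrative, and MongoDB 5.0 or later is assumed.

    from datetime import datetime, timezone
    from pymongo import MongoClient

    db = MongoClient("mongodb://localhost:27017")["monitoring"]

    # The timeseries options tell MongoDB which field holds the timestamp
    # and which field identifies the series (the "meta" data).
    db.create_collection(
        "cpu_metrics",
        timeseries={"timeField": "ts", "metaField": "host", "granularity": "seconds"},
    )

    db.cpu_metrics.insert_one(
        {"ts": datetime.now(timezone.utc), "host": "web-01", "cpu_pct": 37.5}
    )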

Posted November 01, 2021

If you pay attention to the annual "StackOverflow Developer Survey"—and, as a DBTA reader, you probably should—you might be interested in how developers use and rate the various database platforms. Usage responses are unsurprising; MySQL, SQLite, SQL Server, and PostgreSQL all show up as the most widely used databases. But when you look at the most "loved" databases, the results are actually somewhat surprising—Redis consistently shows up as the most loved database platform by developers.

Posted October 05, 2021

It's been 3 years since MongoDB 4.0 was announced at the 2018 MongoDB World conference—and that is a long time in the software industry.

Posted September 16, 2021

In the last 5 years, we've seen a blurring of the distinction between many of the upstart databases and the traditional SQL databases. NoSQL databases such as MongoDB have added features typically associated with relational databases—transactions, SQL connectors, and the like—while the SQL databases have introduced support for JSON document models. We can see that databases such as PostgreSQL and MongoDB are increasingly converging on a common set of features. However, one category of NoSQL databases seems to be bucking the convergence trend: graph databases.

Posted August 02, 2021

MongoDB has had quite a wild ride over the past 10 years and has succeeded beyond expectations. However, what got them here won't get them where they want to be in the next 10 years. I can't wait to see the next phase of MongoDB's technology evolution.

Posted July 15, 2021

Whatever you think about NFTs, the increase in load on the Ethereum network has created another scalability crisis. Ethereum transaction fees are going through the roof, and delays on the network are increasing. If Ethereum is going to compete successfully against up-and-coming alternative chains such as Hedera Hashgraph, something has to be done to improve the throughput of the network. Luckily, we are on the verge of several big paradigm shifts in Ethereum with ETH 2.0, which may pave the way for greater throughput.

Posted June 02, 2021

Setting up a distributed MongoDB cluster by hand is a complex and error-prone process. However, MongoDB provides a Kubernetes operator, which allows such a deployment to be established within a Kubernetes cluster amazingly easily. The "operator" is a controller program that runs within the Kubernetes cluster and contains the MongoDB-specific logic for establishing MongoDB cluster topologies. One need only supply the operator with a configuration file, and the operator will do the rest—creating and configuring MongoDB nodes, setting up best-practice security, and handling the connectivity between nodes.

Posted April 29, 2021

There's still life in the data lake concept, as evidenced by the growing success of Dremio. Dremio describes itself as "the cloud data lake" platform. It provides a cloud-based engine that layers over cloud object storage such as Amazon S3, Azure's Data Lake Storage, or even legacy Hadoop systems. Why would Dremio succeed where Hadoop ultimately failed?

Posted April 06, 2021

It's hard to overstate the impact WiredTiger technology has had on MongoDB. When MongoDB announced its storage engine API in 2014, the WiredTiger team immediately saw the opportunity and raced to provide the best solution. The rest, as they say, is history.

Posted March 01, 2021

If you want to find out what will be mainstream tomorrow, look at what developers are using today. Developers typically embrace new technologies years before they hit the data center. As technology continues to power competitive advantage, developers are increasingly in the driver's seat when it comes to enterprise technology strategy.

Posted February 10, 2021

From the beginning, MongoDB has had a laser focus on making life easier for developers. MongoDB has continued to produce new developer tooling as well. In June, MongoDB introduced a new shell—mongosh. The traditional Mongo shell is a command-line utility that provides an easy way to execute commands against the database. The existing shell included a JavaScript engine, so it was capable of running scripts that performed administrative functions or simplified complex commands. The new shell includes most of the features of the traditional shell but adds modern conveniences such as syntax highlighting, improved error handling, and autocomplete.

Posted January 07, 2021

Over the past 10 years we've seen a proliferation of non-relational database systems, usually based around a distributed, fault-tolerant architecture with flexible consistency models—databases such as DynamoDB, Cassandra, and Hadoop. However, in recent years, a new set of cloud-native, SQL-enabled databases has established significant traction.

Posted December 10, 2020

The cloud-tipping point for database technology had already occurred by the beginning of 2020, but what was a gradual migration to the cloud is now looking increasingly like a sprint.

Posted December 08, 2020

Since the very inception of MongoDB, Eliot Horowitz, MongoDB's CTO and co-founder, has been a consistent and articulate owner of the MongoDB technical vision and the apparent creative force behind MongoDB's architecture and feature set. In appointing replacement CTO Mark Porter, MongoDB has chosen someone with a high degree of "geek cred" that will hopefully satisfy developers who want to hear from someone who can speak their language. However, he also brings experience from the enterprise database and cloud world that will undoubtedly be useful.

Posted November 04, 2020

For the past few years, database vendors have been busily enhancing their cloud offerings and consolidating the innovations that arose more than 10 years ago from the big data and NoSQL movements. While both NoSQL and big data were enormously influential for database technology, it remains true that the vast majority of databases are running on architectures that are positively ancient in computer science terms.

Posted October 08, 2020

MongoDB has worked hard over the past few years to improve the security of its flagship MongoDB database server. It desperately needed to do this because MongoDB has been subjected to more high-profile attacks than any other database platform.

Posted September 09, 2020

In the early days of the internet, a famous New Yorker cartoon noted, "On the Internet, nobody knows you're a dog." The cartoon was making the point that the internet allowed for truly anonymous interactions in a manner we had not seen before. The anonymity of the internet is as much a feature as a bug. The internet was initially designed to be censorship-resistant, and anonymity—by eliminating the possibility of punishing speech—reduced the potential for censorship.

Posted August 11, 2020

This year, an in-person MongoDB World conference in New York City was inconceivable. With NYC and the world still in various levels of COVID-19 lockdown, MongoDB World was held in the cloud as a virtual event—MongoDB.Live—in the first week of June. Holding the annual conference in the cloud is quite apt in many ways.

Posted July 01, 2020

The COVID-19 pandemic has provided most of us with more disruption and change in a few months than we have experienced in decades. The short-term effects of lockdown, health tragedies, and severe economic downturn are all too apparent. But we all know that there will be a post-COVID world, and it makes sense to try and predict what role technology will play in that recovery.

Posted June 10, 2020

To predict how MongoDB may navigate the COVID-19 and post-COVID-19 environment would require an ability to predict how our industry as a whole will fare—which is a big ask. Nevertheless, let's give it a shot.

Posted May 13, 2020

In the HBO series "Silicon Valley," Pied Piper CEO Richard Hendricks attempted to return power to the people by creating a peer-to-peer "new internet" in which data and communications are distributed across all the devices on the network with no central point of control. In the real world, several technology companies are trying to do the same thing, and blockchain is the critical enabling technology. Elastos is one of these next-generation blockchain projects. 

Posted April 08, 2020

Applications written in NodeJS are particularly synergistic with MongoDB since JavaScript objects (JSON) are native to MongoDB and to Node. So, it might seem strange at first that an ORM-like layer—Mongoose—has emerged. Why would we want a mapping layer between JSON objects and a JSON database?

Posted March 05, 2020

The decade just ended has truly been revolutionary. Technological forces have combined to revolutionize almost every aspect of our daily lives and transform our society—and not always for the better. But although the components of a revolution were in place in 2010, the transformations that would result from the integration of cloud, mobile, and social were far from obvious.

Posted February 10, 2020

When I was a young man—a long, long time ago now—I worked as an Oracle DBA (Oracle version 6, if you must know). I remember my astonishment at finding out that information in the database was stored in plain text within the database files. That meant if I could gain read access just to those files, I could read all the information in the database. It didn't matter what security controls I, as the DBA, implemented at the database level—an attacker who could gain read access to the files on disk could read everything.

Posted January 02, 2020

DBMS 2020: State of Play

Posted December 18, 2019

Researchers at Google recently announced they had achieved "quantum supremacy" by performing a non-trivial computation on a quantum computer that decisively outperformed a "classical" computer performing the same task. Although IBM disputed some details of the achievement, this announcement will probably stand as a milestone in the development of quantum computing technology. In minutes, Google's quantum computer performed a calculation that would have taken most traditional computers thousands of years.

Posted December 01, 2019

Despite the failed promises of the data lake, the concept retains some resonance in larger enterprises, and so MongoDB has chosen to leverage the term for one of its latest offerings. MongoDB's Atlas Data Lake bears only superficial similarity to Hadoop-powered data lakes. Nevertheless, it's a useful feature that stands to see significant uptake.

Posted October 31, 2019

Regardless of its long-term prospects, Hadoop remains a pivotal technology in the history of databases. Hadoop was one of the key technologies that broke the stranglehold of relational databases, and it forced a shift in the way in which we think about and store data.

Posted October 01, 2019

MongoDB 4.2 may seem like a grab bag of features, but all of the features represent useful additions to your MongoDB toolkit. Some features—the Atlas Data Lake, for instance—need significant enhancements to cover all conceivable use cases. Nevertheless, MongoDB 4.2 will be a useful upgrade and I'd expect it to be widely deployed.

Posted September 03, 2019

After almost a generation of relative stability, database technology has been rocked over the past decade by two megatrends—the end of the one-size-fits-all RDBMS model and the rise of cloud computing.

Posted August 07, 2019

MongoDB gained popularity with developers very early on, but serious database engineers were often skeptical about MongoDB architecture and implementation. One area that came under some criticism was cluster consistency.

Posted July 18, 2019

The blockchain technology market is generally believed to be about $2 billion in 2019 and growing at an annual rate in excess of 50%—with projections for the market to exceed $10 billion by the end of 2025. Almost all of that new spending will be cloud-oriented; very few organizations consider running their own blockchain hardware. Therefore, it's not surprising to see cloud vendors actively promoting blockchain solutions.

Posted June 10, 2019

If you have been wondering whether the tipping point for database-as-a-service (DBaaS) has arrived, it's instructive to look at the success of MongoDB Atlas. 

Posted May 01, 2019

The implications of CAP Theorem, more than anything else, led to the schism in modern database management systems. With the rise of global applications with extremely high uptime requirements, it became unthinkable to sacrifice availability for perfect consistency.  Almost in unison, the leading Web 2.0 companies such as Amazon, Google, and Facebook introduced new database services that were only "eventually" consistent but globally and highly available. Google has tried to resolve this database schism with its Spanner SQL database.

Posted April 09, 2019

MongoDB's recent and well-publicized new license—the Server Side Public License (SSPL)—was explicitly designed to prevent cloud vendors such as Amazon from deploying a MongoDB cloud service without paying MongoDB license fees. In early January, Amazon announced DocumentDB, a MongoDB-compatible cloud database service.

Posted March 04, 2019

Virtually all analyst firms, industry experts, and technology authorities still believe that blockchain is a key innovation in computer science and expect blockchain to have a massive impact across multiple industries over the next 10 years. The same authorities also agree that the ultimate success of blockchain technology depends on significant advancements to overcome the limitations inherent in the current implementations. In particular, the environmental impact of the public blockchain is too high, and the throughput of the public blockchain is too low.

Posted February 08, 2019

For some time now, the majority of open source investment has come from venture capital and mega-corporations. There is good reason to think that this level of patronage will not persist indefinitely. Should VC-funded open source companies fail to deliver a return on investment, then VC funding will dry up. The high-profile companies bolstering open source do so primarily for selfish motives and can't necessarily be relied on to do so forever. So it's essential that companies that develop and market open source products be able to generate some return on their investment.

Posted January 02, 2019

GraphQL has emerged as the favorite alternative to REST for modern web API design. GraphQL was first used internally at Facebook before being open sourced in 2015. GraphQL is described as a data query and manipulation language, which at first glance might suggest it has more in common with SQL than with REST. However, in reality, both GraphQL and REST are applicable across a very similar range of web API scenarios.
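
The difference is easiest to see side by side. The sketch below fetches the same data first through a REST endpoint and then through a GraphQL query; the endpoints, types, and field names are entirely hypothetical.

    import requests

    # REST: the server fixes the shape of the response for each endpoint.
    user = requests.get("https://api.example.com/users/42").json()

    # GraphQL: the client names exactly the fields it wants, even across
    # related objects, in a single request to a single endpoint.
    query = """
    query {
      user(id: 42) {
        name
        posts { title }
      }
    }
    """
    result = requests.post(
        "https://api.example.com/graphql", json={"query": query}
    ).json()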

Posted December 04, 2018

It's been a while since MongoDB has felt threatened by another document database vendor. Historically, the closest contender for document database dominance was Couchbase, the offspring of the original CouchDB database, which arguably ignited the document database segment.

Posted November 01, 2018

We've all seen the movie about the kidnap victim whose family asks for a "proof of life." The proof is typically a photograph of the victim posed with a current newspaper. Blockchain technology is now allowing us to provide similar proofs for the existence of digital assets. The immutable nature of the blockchain—the fact that it is impossible to overwrite time-stamped blockchain ledger entries—allows us to create "proof of existence" entries for digital assets.
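
A toy sketch of the idea: fingerprint the asset with a cryptographic hash and anchor only that fingerprint in a time-stamped ledger entry. The file name is illustrative, and the actual submission step is left abstract because it depends on the chain and service used.

    import hashlib

    def fingerprint(path: str) -> str:
        # Compute a SHA-256 digest of the digital asset.
        sha256 = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                sha256.update(chunk)
        return sha256.hexdigest()

    digest = fingerprint("contract.pdf")
    print("Anchor this digest in a blockchain transaction:", digest)
    # Later, anyone holding the original file can recompute the digest and
    # compare it against the immutable, time-stamped ledger entry.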

Posted October 10, 2018

We all know that self-driving cars and other autonomous vehicles are coming. Prototypes of self-driving vehicles can be seen around Silicon Valley, and self-driving features are commercially available in Teslas and other brands. However, there are significant "smart car" features on the way that will affect both human- and self-driven vehicles.

Posted August 08, 2018

The introduction of transactions in MongoDB 4.0 represents possibly the most significant change in MongoDB's architecture since its original release.  The lack of a transactional capability previously defined the capabilities of the database: Without transactions, MongoDB was blocked from consideration for a wide range of application scenarios.  With the implementation of transactions, MongoDB can for the first time truly claim to be a general purpose DBMS.
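
For a flavor of the new capability, here is a minimal multi-document transaction using pymongo's with_transaction helper; it assumes MongoDB 4.0+ running as a replica set, and the account documents are illustrative.

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
    db = client["bank"]

    def transfer(session, amount=100):
        # Both updates are applied within the same transaction.
        db.accounts.update_one(
            {"_id": "alice"}, {"$inc": {"balance": -amount}}, session=session
        )
        db.accounts.update_one(
            {"_id": "bob"}, {"$inc": {"balance": amount}}, session=session
        )

    with client.start_session() as session:
        # with_transaction commits both updates together, or neither.
        session.with_transaction(transfer)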

Posted July 02, 2018

It may seem strange to see MongoDB expanding the very features of the relational databases that it originally rejected. In the last few releases, we've seen implementation of joins, strict schemas, and now ACID transactions. However, what this indicates is that MongoDB is increasingly contending for serious enterprise database workloads: MongoDB is expanding the scope of its ambitions.

Posted June 01, 2018

The serverless computing architecture—sometimes called function as a service or FaaS—hides not just the underlying virtual machine, but also the application server itself. The cloud simply agrees to execute your code on demand or in response to an event.
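
A minimal sketch of the model, written in the AWS Lambda handler style: you supply a function, and the platform decides when and where it runs. The event shape shown here is hypothetical.

    import json

    def handler(event, context):
        # 'event' carries the trigger payload: an HTTP request, a queue
        # message, a file-upload notification, and so on.
        name = event.get("name", "world")
        return {
            "statusCode": 200,
            "body": json.dumps({"message": f"Hello, {name}"}),
        }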

Posted April 12, 2018

It's understandable that those new to MongoDB - a so-called "schema free" database - might assume that they no longer need to be concerned with the art-science of data modeling. However, in reality, data modeling is just as important in MongoDB as in other databases. Indeed, because some of the modeling principles for MongoDB are less well understood, arguably more attention needs to be given to the data modeling process.

Posted March 07, 2018

The Bitcoin bubble is a mixed bag for blockchain and cryptocurrency enthusiasts. While the incredible increase in Bitcoin's valuation has resulted in a huge windfall for early adopters and enhanced the recognition of blockchain technology, it has also highlighted the volatility of Bitcoin as a currency and the limitations of the underlying blockchain network.

Posted February 01, 2018

The suddenness of the non-relational "breakout" created a lot of noise and confusion and—at least initially—an explosion of new database systems. However, the database landscape is settling down, and in the past few years, the biggest meta trend in database management has been a reduction in the number of leading vendors and consolidation of core technologies. Additionally, we're starting to see database as a service (DBaaS) offerings become increasingly credible alternatives to on-premise or do-it-yourself cloud database configuration.

Posted January 03, 2018

MongoDB 3.6 was announced publicly in November and should be in production by the time this article is posted. There are no shock features in this version, but it is an attractive release that should see rapid uptake.

Posted January 02, 2018

Of all modern languages, JavaScript has one of the most fascinating backstories. During the early days of the web, Netscape hired Brendan Eich to create a prototype "glue language" that could be used in conjunction with HTML to increase the interactivity of webpages. The prototype was thrown together in just 10 days and named JavaScript. However, JavaScript owes very little to Java—it is often said that JavaScript is to Java as hamburger is to ham.

Posted December 01, 2017

MongoDB Files for IPO and Reveals Its Official Strategy for Success

Posted November 01, 2017

Virtual reality (VR)—the use of computers to create a complete simulation of a human reality—has been an active concept since the early days of digital computing and a popular theme in science fiction for many decades. The idea became part of mainstream consciousness with the release of The Matrix in 1999, in which the protagonist turned out to be living in a simulation indistinguishable from the real thing.

Posted October 18, 2017

MongoDB recently announced some interesting, though incremental, enhancements. These included improved "joins" in the aggregation framework, better document validation using JSON schema, and more reliable behavior in the event of network failures. These features attempt to close the gap between the functionality of MongoDB and traditional relational databases - joins, schemas, and commits. On top of these incremental updates, MongoDB announced a couple of features that intrude on functionality usually provided by application servers or desktop programs.
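
As a small illustration of the improved "join" support, the aggregation pipeline below uses $lookup to pull customer details into each order; the collections and fields are invented for the example.

    from pymongo import MongoClient

    db = MongoClient("mongodb://localhost:27017")["shop"]

    pipeline = [
        # Left-outer join: match orders.customerId against customers._id.
        {"$lookup": {
            "from": "customers",
            "localField": "customerId",
            "foreignField": "_id",
            "as": "customer",
        }},
        {"$unwind": "$customer"},
        {"$project": {"orderDate": 1, "total": 1, "customer.name": 1}},
    ]

    for doc in db.orders.aggregate(pipeline):
        print(doc)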

Posted September 07, 2017

The emergence of cryptocurrencies and blockchain technology may prove to be almost as significant an innovation as the internet itself. Blockchain offers a mechanism for the mediation of any transactions that previously would have required trusted third parties, while cryptocurrencies such as Bitcoin may eventually become a significant alternative to traditional "fiat" (e.g., government-backed) currencies. These technologies could eventually revolutionize the global banking infrastructure which has underpinned global commerce for centuries.

Posted August 09, 2017

MongoDB has become a favorite among developers in no small part because of its alignment with modern software development practices. Its flexible schemas are compatible with agile software development and the JSON-based document structure is well-matched with modern JavaScript-centric web architectures. However, databases don't exist solely for the convenience of software developers: Data in a database is a critical business asset.

Posted July 05, 2017

Although Java and JavaScript are the most popular all-around programming languages today, the C programming language remains the language of choice for high-performance computing after almost 45 years of mainstream use. However, where runtime performance considerations are paramount, Go and Rust are emerging as valid successors to C.

Posted June 01, 2017

MongoDB faced its worst-ever public relations challenge earlier this year when a spate of ransomware attacks plagued tens of thousands of Mongo instances. It's important to realize that we are not talking about some sort of obscure vulnerability here - these MongoDB databases were configured with NO passwords at all, and were easily found listening on the default port (27017) on publicly-accessible servers.

Posted May 05, 2017

There's a wide and growing acceptance that containers are replacing operating systems as the deployment target for application components. While application modules were previously designed to be installed upon a specific version of an operating system on a particular hardware platform, they are now increasingly being designed to run within a virtualized representation of an operating system—most frequently within a Docker container.

Posted April 07, 2017

Imagine you are standing by a railway track near a lever that switches between two sets of tracks. A runaway rail trolley is heading toward the fork in the tracks, and five people are trapped on the currently activated line. You could switch the trolley to the alternative track, but there is a single person trapped there as well. Do you switch the trolley? The artificial intelligence community is increasingly wrestling with similar moral conundrums implicit in the ever-more pervasive algorithms that underlie much of our technological infrastructure.

Posted February 08, 2017

Java started its life in the early 1990s as an attempt to develop an architecture-independent language that could be used in consumer electronics and other embedded contexts. It found itself in the right place at the right time when the web exploded in the mid-1990s and over the next 10 years became one of the mainstays of web development. Today, Java remains as popular as ever. It's arguably the most popular programming language of our generation.

Posted January 03, 2017

The Modern Heterogeneous Enterprise Data Architecture Takes Shape

Posted December 08, 2016

It's been amusing to watch the NoSQL movement transition from a "We don't need no stinking SQL" attitude to a "Can I please have some SQL with that?" philosophy. The nonrelational databases that emerged over the past 8 years initially offered no SQL capabilities. However, today we have an embarrassment of SQL options for "NoSQL." Hive offers SQL for Hadoop systems, Spark has SparkSQL, MongoDB has a SQL-based BI connector, and so on.

Posted December 01, 2016

The business of arranging millions or billions of zeros and ones in exactly the right order—also known as the business of software—has undergone many significant shifts over the history of computing, and we might be about to experience yet another.

Posted November 02, 2016

For many years now, Cassandra has been renowned for its ability to handle massive scaling and global availability. Based on Amazon's Dynamo, Cassandra implements a masterless architecture which allows database transactions to continue even when the database is subjected to massive network or data center disruption. Even in the circumstance in which two geographically separate data centers are completely isolated through a network outage, a Cassandra database may continue to operate in both geographies, reconciling conflicting transactions—albeit possibly imperfectly—when the outage is resolved.

Posted October 07, 2016

It is widely accepted that to realize the full potential of blockchain technology we will need a next-generation blockchain to supplement the one provided in the Bitcoin implementation. Ethereum represents just such a next-generation blockchain.

Posted September 02, 2016

With 20 million downloads to date, MongoDB is arguably today's fastest-growing database technology. MongoDB's rapid growth has been driven primarily by its attractiveness to developers. By using JavaScript Object Notation (JSON) documents as the native database format, MongoDB reduces the impedance mismatch between program code and database, allowing more agile and rapid application development.

Posted August 04, 2016

By the mid-2000s, a huge number of web apps were built upon the so-called LAMP stack. LAMP applications utilize the Linux operating system, Apache web server and MySQL database server, and implement application logic in PHP or another language starting with the letter "P," such as Python or Perl. But the LAMP stack is now essentially obsolete technology, and the MEAN stack provides a lot of productivity advantages, especially for modern highly-interactive web sites. But the MEAN stack is not without compromise. Here's why.

Posted July 12, 2016

For those who haven't encountered the term, the "trough of disillusionment" is a standard phase within the Gartner hype cycle. New technologies are expected to pass from a "peak of inflated expectations" through the trough of disillusionment before eventually reaching the "plateau of productivity." Most new technologies are expected to go through this trough, so it's hardly surprising to find big data entering this phase.

Posted June 09, 2016

It's become almost a standard career path in Silicon Valley: A talented engineer creates a valuable open source software commodity inside a larger organization, then leaves that company to create a new startup to commercialize the open source product. Indeed, this is virtually the plot line for the hilarious HBO comedy series, Silicon Valley. Jay Kreps, a well-known engineer at LinkedIn and creator of the NoSQL database system Voldemort, has such a story.

Posted March 31, 2016

In a new book titled "Next Generation Databases," Guy Harrison, an executive director of R&D at Dell, shares what every data professional needs to know about the future of databases in a world of NoSQL and big data.

Posted March 08, 2016

Few of us working in the software industry would dispute that agile methodologies represent a superior approach to older waterfall-style development methods. However, many software developers would agree that older enterprise-level processes often interact poorly with the agile methodology, and long for agility at the enterprise level. The Scaled Agile Framework (SAFe) provides a recipe for adopting agile principles at the enterprise level.

Posted March 03, 2016

Say what you will about Oracle, it certainly can't be accused of failing to move with the times. Typically, Oracle comes late to a technology party but arrives dressed to kill.

Posted February 10, 2016

The development of a functional and practical quantum computing system has been "pending" for some decades now, but there are real signs that this technology may soon become decisive. The implications for cryptography are encouraging major government investment - both the U.S. and China, in particular, are investing heavily in quantum computing technology. The arms race to develop functional quantum computing has begun.

Posted January 07, 2016

It's commonly asserted—and generally accepted—that the era of the "one-size-fits-all" database is over. We expect that enterprises will use a combination of database technologies to meet the distinct needs created by various application architectures.

Posted December 02, 2015

Almost every commercial endeavor and, indeed, almost every human undertaking, has software at its core. Yet, with software at the core of so much of our society, it's surprising to realize it's getting harder and harder to actually make a living selling software. In his recent book, "The Software Paradox," Stephen O'Grady - co-founder of analyst firm RedMonk - provides a cohesive and persuasive analysis of what those of us in the software business have been experiencing for several years - it's getting increasingly difficult to generate revenues selling "shrink-wrapped" software.

Posted November 09, 2015

There are quite a few databases competing to be "king" of NoSQL. MongoDB claims to have the fastest-growing NoSQL database ecosystem, MarkLogic claims to be the only Enterprise NoSQL database, while other databases claim to be the fastest or most scalable system.

Posted October 07, 2015

Shortly after the explosion of non-relational databases, around 2009, it became apparent that rather than being part of the problem, SQL would instead continue to be part of the solution. If the new wave of database systems excluded the vast population of SQL-literate professionals, then their uptake in the business world would be impeded. Furthermore, a whole generation of business intelligence tools use SQL as the common way of translating user information requests into database queries. Nowhere was the drive toward SQL adoption more clear than in the case of Hadoop.

Posted August 10, 2015

Dystopian visions of a future in which automation eliminates the vast majority of jobs are nothing new. However, even though previous predictions of doom have been misplaced, there is new concern about the impact of the latest generation of automation on the nature of work and the prospects for universal employment in the future. In particular, we're increasingly seeing automation disrupt jobs that were long considered to require human judgment or abilities.

Posted July 08, 2015

There's no doubt that the new wave of nonrelational systems represents an important and necessary revolution in database technology. But while we need to avoid being wedded to the technologies of the past and continuously innovate, ignoring the lessons of history is never a good idea.

Posted June 09, 2015

You would have to have been living under a rock for the past few years not to have heard of Bitcoin. Bitcoin is an electronic "crypto" currency which can be used like cash in many web transactions. At the time of writing, there are about 14 million bitcoins in circulation, trading at approximately $250 each, for a total value of about $3.5 billion.

Posted May 14, 2015

While the new data stores and other software components are generally open source and incur little or no licensing costs, the architecture of the new stacks grows ever more complex, and this complexity is creating a barrier to adoption for more modestly sized organizations.

Posted April 06, 2015

Someone new to big data and Hadoop might be forgiven for feeling a bit confused after reading some of the recent press coverage on Hadoop. On one hand, Hadoop has achieved very bullish coverage in mainstream media. However, counter to this positive coverage, there have been a number of claims that Hadoop is overhyped. What's a person to make of all these mixed messages?

Posted February 11, 2015

Smart watches can perform continuous biometric validation (through pulse signatures and other cues), and are always at hand - or at least, at wrist. Coupled with the ability to continually monitor health and fitness, and perhaps even the ability to include basic phone capability, there are real advantages to be had as smart watches mature.

Posted January 07, 2015

The introduction of increased transactional capability into non-relational databases makes sense—in the same way that providing SQL layers on top of Hadoop and many other non-relational stores makes sense. But it does raise the possibility of convergence of relational and non-relational systems. After all, if I take a non-relational database and add SQL and ACID transactions, have I still got a non-relational database, or have I come full circle back to the relational model?

Posted December 03, 2014

We are now seeing a seismic shift and increase in the significance of social network data for marketing and brand analysis. The next wave of social network exploitation promises to allow companies to narrowly target consumers and leads, to predict market trends, and to more actively influence consumer behavior.

Posted November 12, 2014

One feature of the big data revolution is the acknowledgement that a single database management system architecture cannot meet all needs. However, the Lambda Architecture provides a useful pattern for combining multiple big data technologies to achieve multiple enterprise objectives. First proposed by Nathan Marz, it attempts to provide a combination of technologies that together can provide the characteristics of a web-scale system that can satisfy requirements for availability, maintainability, and fault-tolerance.
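
A toy sketch of the pattern's serving layer: query results merge a precomputed batch view with a small real-time view maintained by the speed layer. The data is invented purely to show the merge.

    # Batch view: complete but recomputed only periodically (e.g., nightly).
    batch_view = {"page_a": 10_000, "page_b": 7_500}

    # Real-time view: only the events that arrived since the last batch run.
    realtime_view = {"page_a": 42, "page_c": 7}

    def page_views(page: str) -> int:
        # The serving layer combines both views: full history from the batch
        # layer plus low-latency recency from the speed layer.
        return batch_view.get(page, 0) + realtime_view.get(page, 0)

    print(page_views("page_a"))  # 10042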

Posted October 08, 2014

In this month's column, Guy Harrison writes about Docker, an open source project based on Linux containers that is showing rapid adoption. "Unlike virtual machines, Docker containers do not have to include a copy of the guest OS - each Docker container essentially shares the same copy of the underlying OS," explains Harrison. "This allows Docker containers to be much smaller, which, in turn, allows them to be more easily deployed, provides for greater density (more containers per host) and permits faster initialization."

Posted September 10, 2014

The pioneers of big data, such as Google, Amazon, and eBay, generated a "data exhaust" from their core operations that was more than sufficient to allow them to create data-driven process automation. But, for smaller enterprises, data might be the scarcest commodity. Hence, the emergence of data marketplaces.

Posted August 05, 2014

Big data analytics is a complex field, but if you understand the basic concepts—such as the difference between supervised and unsupervised learning—you are sure to be ahead of the person who wants to talk data science at your next cocktail party!
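
If a concrete example helps, the sketch below shows both flavors on the same dataset using scikit-learn (the library choice is mine, not the column's): a classifier learns from provided labels, while a clustering algorithm finds structure with no labels at all.

    from sklearn.cluster import KMeans
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression

    X, y = load_iris(return_X_y=True)

    # Supervised: labels (y) are given, and the model learns to predict them.
    classifier = LogisticRegression(max_iter=1000).fit(X, y)
    print("predicted class:", classifier.predict(X[:1]))

    # Unsupervised: no labels; the algorithm discovers clusters on its own.
    clusters = KMeans(n_clusters=3, n_init=10).fit_predict(X)
    print("cluster assignments for first five rows:", clusters[:5])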

Posted June 11, 2014

The "Internet of Things" (IoT) is shifting from aspirational buzzword to a concrete and lucrative market. New-generation computing devices require new types of operating systems and networks. While many have been initially based on some variation of the Linux OS and connect using existing Wi-Fi and Bluetooth wireless protocols, new operating systems and networking protocols are emerging.

Posted May 08, 2014

About 3 years ago, the AMP (Algorithms, Machines, People) lab was established at U.C. Berkeley to attack the emerging challenges of advanced analytics and machine learning on big data. The resulting Berkeley Data Analytics Stack—particularly the Spark processing engine—has shown rapid uptake and tremendous promise.

Posted April 04, 2014

Ironically, although the thin client advocates were right about many things - the success of browser-based applications, in particular - they were dead wrong about the diminishing role of the OS. More than ever, the OS is the source of competitive differentiation between various platforms, and a clear focus of innovation for the foreseeable future.

Posted March 12, 2014

Solid State Disk (SSD)—particularly flash SSD—promised to revolutionize database performance by providing a storage media that was orders of magnitude faster than magnetic disk, offering the first significant improvement in disk I/O latency for decades. Aerospike is a NoSQL database that attempts to provide a database architecture that can fully exploit the I/O characteristics of flash SSD.

Posted February 10, 2014

New devices promise to open up ways for us to improve our mental functioning and perhaps to further revolutionize social networking and big data. A world in which Facebook "likes" are generated automatically might not be far off, and mining the big data generated from our own brains has some amazing - though sometimes creepy - implications.

Posted January 07, 2014

Not all Hadoop packages offer a unique distribution of the Hadoop core, but all attempt to offer a differentiated value proposition through additional software utilities, hardware, or cloud packaging. Against that backdrop, Intel's distribution of Hadoop might appear to be an odd duck since Intel is not in the habit of offering software frameworks, and the brand, while ubiquitous, is not associated specifically with Hadoop, databases or big data software. However, given its excellent partnerships across the computer industry, Intel has support from a variety of vendors, including Oracle and SAP, and many of the innovations in its distribution show real promise.

Posted December 04, 2013

Two new approaches to application quality have emerged: "risk-based testing" - pioneered in particular by Rex Black - and "exploratory testing" - as evangelized by James Bach and others. Neither claims to eradicate issues of application quality, which most likely will continue as long as software coding involves human beings. However, along with automation of the more routine tests, these techniques form the basis for higher-quality application software.

Posted November 13, 2013

Security for NoSQL continues to evolve rapidly in order to attract wider enterprise adoption. Robust security is a must-have for any database in the enterprise, and over the decades since the emergence of the relational model, security and authentication capabilities have continually improved. The first new-generation non-relational "NoSQL" databases, like the early relational databases, had very simplistic security mechanisms. Here is how security for NoSQL is changing.

Posted October 09, 2013

We've seen a lot of progress in the nearly 50 years between 1965 and 2013: the modern smartphone and the World Wide Web, in particular, have been transformative. However, it's probably true that our current world would not astonish a time traveler from 1965 as much as 1965 would have surprised a visitor from 1915. Where are the jetpacks, flying cars, colonies on Mars? However, it does look like we are finally going to see one of the long-awaited benefits of the future: the self-driving car.

Posted September 11, 2013

New Hadoop Frameworks Feed the Need for Speed

Posted August 07, 2013

Like many of my generation, my early visions of the future were influenced by films like "2001: A Space Odyssey" and the original "Star Trek" TV series. In each of these, humans interact with computers using conversational English, posing complex questions and getting intelligent, relevant responses. So, you can imagine how primed someone like me is to hear that Google has been explicitly trying to create that Star Trek computer. At the Google I/O conference in San Francisco in May, Amit Singhal, Google senior vice president, spoke of his early childhood experiences watching "Star Trek," and his dreams of one day building that computer.

Posted July 09, 2013

When NoSQL first hit the IT consciousness in 2009, an explosion of NoSQL databases seemed to appear out of thin air. Some of these contenders had in fact been around for some time, with others thrown together rather quickly to exploit the NoSQL buzz. The NoSQL pack thinned out as leaders in specific categories emerged, but for some time, there was no clear leading key-value NoSQL database.

Posted June 13, 2013

The term "NoSQL" is widely acknowledged as an unfortunate and inaccurate tag for the non-relational databases that have emerged in the past five years. The databases that are associated with the NoSQL label have a wide variety of characteristics, but most reject the strict transactions and stringent relational model that are explicitly part of the relational design. The ACID (Atomic-Consistent-Independent-Durable) transactions of the relational model make it virtually impossible to scale across data centers while maintaining high availability, and the fixed schemas defined by the relational model are often inappropriate in today's world of unstructured and rapidly mutating data.

Posted April 10, 2013

Google's dominance of internet search has been uncontested for more than 12 years now. Before Google, search engines such as AltaVista indexed web pages and allowed for keyword search with an interface and functionality superficially similar to that provided by Google. However, these first-generation search engines provided relatively poor ordering of results. Because an internet search would return pages ranked by the number of times a term appeared on the website, unpopular or irrelevant sites would be just as likely to achieve top rank as popular sites.

Posted March 14, 2013

Hadoop is the most significant concrete technology behind the so called "Big Data" revolution. Hadoop combines an economical model for storing massive quantities of data - the Hadoop Distributed File System - with a flexible model for programming massively scalable programs - MapReduce. However, as powerful and flexible as MapReduce might be, it is hardly a productive programming model. Programming in MapReduce reminds one of programming in Assembly language - the simplest operations require substantial code.
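
To illustrate the point, even a word count requires an explicit map phase and reduce phase when written against MapReduce; the sketch below is a Hadoop Streaming-style mapper and reducer in Python (how it is wired into a job is omitted), whereas the equivalent in an SQL-like layer is a one-liner.

    import sys
    from itertools import groupby

    def mapper():
        # Emit (word, 1) for every word on standard input.
        for line in sys.stdin:
            for word in line.split():
                print(f"{word}\t1")

    def reducer():
        # Hadoop delivers mapper output sorted by key, so equal words are adjacent.
        pairs = (line.rstrip("\n").split("\t") for line in sys.stdin)
        for word, group in groupby(pairs, key=lambda kv: kv[0]):
            print(f"{word}\t{sum(int(count) for _, count in group)}")

    if __name__ == "__main__":
        reducer() if sys.argv[-1:] == ["reduce"] else mapper()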

Posted February 13, 2013

Coverage of Windows 8 has understandably focused on the revolutionary Metro interface. Many believe that this new interface, while fine for tablets and phones, is a step backwards for desktop productivity. By forcing users to switch between two modes of operation - desktop and Metro - Windows 8 diminishes productivity and imposes a steep learning curve on new users. The Metro interface itself supports only very limited multi-tasking, so serious work often must be done in the traditional Windows desktop. Microsoft implicitly acknowledges these limitations by providing the latest version of Microsoft Office, not in Metro format, but as traditional "desktop" applications.

Posted January 03, 2013

As the undisputed pioneer of big data, Google established most of the key technologies underlying Hadoop and many of the NoSQL databases. The Google File System (GFS) allowed clusters of commodity servers to present their internal disk storage as a unified file system and inspired the Hadoop Distributed File System (HDFS). Google's column-oriented key value store BigTable influenced many NoSQL systems such as Apache HBase, Cassandra and HyperTable. And, of course, the Google Map-Reduce algorithm became the foundation computing model for Hadoop and was widely implemented in other NoSQL systems such as MongoDB.

Posted December 06, 2012

Five years ago, Radio Frequency ID (RFID) seemed poised to revolutionize commerce. Way back in 2003, Wal-Mart announced that it would be requiring that RFID tags - so-called "electronic barcodes" - be attached to virtually all merchandise. Many - myself included - became convinced that the Wal-Mart directive would be the tipping point leading to universal adoption of RFID tags in consumer goods and elsewhere.

Posted November 13, 2012

Hadoop and the Big Data Revolution

Posted October 10, 2012

Google is the pioneer of big data. Technologies such as Google File System (GFS), BigTable and MapReduce formed the basis for open source Hadoop, which, more than any other technology, has brought big data within reach of the modern enterprise.

Posted October 10, 2012

The first computer program I ever wrote (in 1979, if you must know) was in the statistical package SPSS (Statistical Package for the Social Sciences), and the second computer platform I used was SAS (Statistical Analysis System). Both of these systems are still around today—SPSS was acquired by IBM as part of its BI portfolio, and SAS is now the world's largest privately held software company. The longevity of these platforms—they have essentially outlived almost all contemporary software packages—speaks to the perennial importance of data analysis to computing.

Posted September 11, 2012

Throughout the 2000s, a huge number of website developers rejected the Enterprise Java or .NET platforms for web development in favor of the "LAMP" stack - Linux, Apache, MySQL and Perl/Python/PHP. Although the LAMP stack was arguably less scalable or powerful than the Java or .NET frameworks, it was typically easier to learn, faster in early stages of development - and definitely cheaper. When enterprise architects designed systems, they often chose commercial application servers and databases (Oracle, Microsoft, IBM). But, when web developers or startups faced these decisions, the LAMP stack was often the default choice.

Posted July 25, 2012

Seriously chronic geeks like me usually were raised on a strong diet of science fiction that shaped our expectations of the future. Reading Heinlein and Asimov as a boy led me to expect flying cars and robot servants. Reading William Gibson and other "cyberpunk" authors as a young man led me to expect heads-up virtual reality glasses and neural interfaces. Flying cars and robot companions don't seem to be coming anytime soon, but we are definitely approaching a world in which virtual - or at least augmented - reality headsets and brain control interfaces become mainstream.

Posted July 11, 2012

One of the earliest of the new generation of non-relational databases was CouchDB. CouchDB was born in 2005 when former Lotus Notes developer Damien Katz foresaw the nonrelational wave that only fully arrived in 2009. Katz imagined a database that was fully compatible with web architectures — and more than a little influenced by Lotus Notes document database concepts.

Posted June 13, 2012

Websites such as MySpace, Facebook, and LinkedIn have brought social networking and the concept of online community to a huge cross-section of our society. Penetration and usage of these platforms may vary depending on demographic (age and geography, in particular), but no one can debate the impact of Facebook and Twitter on both everyday life and on society in general.

Posted May 09, 2012

It's hard to overestimate Amazon's influence on cloud computing and on NoSQL databases. Amazon Web Services (AWS) was the first and still is the leading concrete example of an infrastructure as a service (IaaS) cloud - a collection of cloud-based services such as compute (EC2), storage (S3) and other application building blocks.

Posted April 11, 2012

Sentiment Analysis Could Revolutionize Market Research

Posted March 29, 2012

In years to come, we might remember October 2011 as the month the big database vendors gave in to the dark side and embraced Hadoop. In October, both Microsoft and Oracle announced product offerings which included and embraced Hadoop as the enabler of their "big data" solution. The last of the big three database vendors - IBM - embraced Hadoop back in 2010.

Posted February 09, 2012

Along with thousands of IT professionals, I was in the San Francisco Moscone Center main hall last October listening to Larry Ellison's 2011 Oracle OpenWorld keynote. Larry can always be relied upon to give an entertaining presentation: a unique blend of technology insights and amusingly disparaging remarks about competitors.

Posted January 11, 2012

As the leading provider of relational database software, it's hardly surprising that Oracle initially gave little or no credence to the NoSQL movement that emerged in 2009. Indeed, an Oracle white paper from May 2011 concluded with the recommendation to "Go for the tried and true path," and avoid NoSQL databases.

Posted December 01, 2011

My 20-year-old daughter recently remarked that Facebook isn't as cool as it used to be. Sure, everyone has to be on Facebook, but that very ubiquity removes its mystique. The recently released Google+ is clearly targeted at Facebook and adds some features - particularly "Circles" - that are not available on Facebook. Facebook's dominance may be indisputable today, but it is not guaranteed for all time. If I were Mark Zuckerberg, I would fear losing my cool status more than anything else.

Posted November 10, 2011

One of the greatest achievements in artificial intelligence occurred earlier this year when IBM's Watson supercomputer defeated the two reigning human champions on the popular Jeopardy! TV show. Named after IBM's first president, Thomas J. Watson - and not, as you may have thought, Sherlock Holmes' famous assistant - Watson was the result of almost five years of intensive effort by IBM and the intellectual successor to "Deep Blue," the first computer to defeat a reigning world chess champion.

Posted October 15, 2011

The term "machine learning" evokes visions of massive super computers that eventually turn on and enslave humanity - think SkyNet from Terminator or HAL from 2001: A Space Odyssey. But the truth is that machine learning algorithms are common in web applications that we use every day and have a growing relevance to enterprise applications.

Posted September 14, 2011

Michael Stonebraker is widely recognized as one of the pioneers of the relational database. While at Berkeley, he co-founded the INGRES project, which implemented the relational principles published by Edgar Codd in his seminal papers. The INGRES project became the basis for the commercial Ingres RDBMS, which, during the 1980s, provided some of the most significant competition to Oracle.

Posted August 11, 2011

One of the funniest moments in the classic Star Trek motion pictures is the scene in which the engineer "Scotty" - who has traveled back in time to the 1980s with his comrades - attempts to use a computer. "Computer!" he exclaims, attempting to initiate a dialogue with the PC. Embarrassed, a contemporary engineer hands him a mouse. "Aha," says Scotty, who then holds the mouse to his mouth only to again exclaim, "Computer!" The idea that computers in the future would be able to understand human speech was common a few decades ago. Speech generation and recognition are so fundamental to the human experience that we tend to underestimate the incredible complexity of the information processing that makes them possible.

Posted July 07, 2011

Both HBase and Cassandra can deal with large data sets, and both provide high transaction rates and low-latency lookups. Both allow MapReduce processing to be run against the database when aggregation or parallel processing is required. Why, then, would a combination of Cassandra and Hadoop be a superior solution?

Posted June 08, 2011

The rise of "big data" solutions - often involving the increasingly common Hadoop platform - together with the growing use of sophisticated analytics to drive business value - such as collective intelligence and predictive analytics - has led to a new category of IT professional: the data scientist.

Posted May 12, 2011

The relational database is primarily oriented toward the modeling of objects (entities) and relationships. Generally, the relational model works best when there is a relatively small and static number of relationships between objects. Working with dynamic, recursive, or complex relationships has long been a tricky problem for the RDBMS. For instance, it's a fairly ordinary business requirement to print out all the parts that make up a product - including parts which, themselves, are made up of smaller parts. However, this "explosion of parts" is not supported consistently across relational databases. Oracle, SQL Server, and DB2 each provide special - but mutually incompatible - syntax for these hierarchical queries, while MySQL lacks specific support (PostgreSQL added standard recursive query support only relatively recently, in version 8.4).
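
As a minimal sketch of the problem, here is an explosion-of-parts query against a hypothetical assembly table, written with the recursive common table expression syntax (run here via SQLite from Python purely for convenience; Oracle's CONNECT BY is one of the competing dialects the column refers to):

```python
import sqlite3

# Hypothetical bill-of-materials table: each part optionally has a parent assembly.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parts (part TEXT PRIMARY KEY, parent TEXT);
    INSERT INTO parts VALUES
        ('bicycle', NULL),
        ('frame',   'bicycle'),
        ('wheel',   'bicycle'),
        ('spoke',   'wheel'),
        ('rim',     'wheel');
""")

# Recursive common table expression: walk the assembly tree from the product down.
explosion = conn.execute("""
    WITH RECURSIVE bom(part, level) AS (
        SELECT part, 0 FROM parts WHERE parent IS NULL
        UNION ALL
        SELECT p.part, bom.level + 1
        FROM parts p JOIN bom ON p.parent = bom.part
    )
    SELECT part, level FROM bom
""").fetchall()

for part, level in explosion:
    print("  " * level + part)
```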

Posted April 05, 2011

When computers first started to infringe on everyday life, science fiction authors and society in general had high expectations for "intelligent" systems. Isaac Asimov's "I, Robot" series from the 1940s portrayed robots with completely human intelligence and personality, and, in the 1968 movie "2001: A Space Odyssey," the onboard computer HAL (Heuristically programmed ALgorithmic computer) had a sufficiently human personality to suffer a paranoid break and attempt to murder the crew!

Posted March 09, 2011

Salesforce.com is well known as the pioneer of software as a service (SaaS) - the provision of hosted applications across the internet. Salesforce launched its SaaS CRM (Customer Relationship Management) product more than 10 years ago, and today claims over 70,000 customers. It's less widely known that Salesforce.com also has been a pioneer in platform as a service (PaaS), and is one of the first to provide a comprehensive internet-based application development stack. In 2007 - way before the current buzz over cloud development platforms such as Microsoft Azure - Salesforce launched the Force.com platform, which allowed developers to run applications on the same multi-tenant architecture that hosts the Salesforce.com CRM.

Posted February 02, 2011

The NoSQL name suggests it's the SQL language that is the key difference between traditional relational and newer non-relational data stores. However, an equally significant divergence is in the NoSQL consistency and transaction models. Indeed, some have suggested that NoSQL databases would be better described as "NoACID" databases, since they avoid the "ACID" transactions of the relational world.

Posted January 07, 2011

Because any database that does not support the SQL language is, by definition, a "NoSQL" database, some very different databases coexist under the NoSQL banner. Massively scalable data stores like Cassandra, Voldemort, and HBase sacrifice structure to achieve scale-out performance. However, the document-oriented NoSQL databases have very different architectures and objectives.

Posted November 30, 2010

Oracle CEO Larry Ellison has been notoriously critical of cloud computing - or at least of the way in which the term "cloud" has been applied. He often has expressed his frustration when "cloud" is applied to long established patterns such as software as a service (SaaS), especially when this is done by Salesforce.com. While there's widespread agreement that "cloud" has become a faddish, over-hyped and often abused term, some have speculated that Ellison's obvious frustration has been fueled by Oracle's inability to fully engage in the cloud computing excitement prior to the conclusion of the Sun acquisition.

Posted November 09, 2010

The relational database - or RDBMS - is a triumph of computer science. It has provided the data management layer for almost all major applications for more than two decades, and when you consider that the entire IT industry was once described as "data processing," this is a considerable achievement. For the first time in several decades, however, the relational database stranglehold on database management is loosening. The demands of big data and cloud computing have combined to create challenges that the RDBMS may be unable to adequately address.

Posted October 12, 2010

In Greek mythology, Cassandra was granted the gift of prophecy but cursed with an inability to convince others of her predictions - a sort of unbelievable "oracle," if you like. Ironically, in the database world, the Cassandra system is fast becoming one of the most credible non-relational databases for production use - a believable alternative to Oracle and other relational databases.

Posted October 12, 2010

The promises of public cloud computing - pay-as-you-go pricing, near-infinite scale, and outsourced administration - are compelling. However, for most enterprises, security, geography, and risk-mitigation concerns make private cloud platforms more desirable. Enterprise customers like the idea of on-demand provisioning but are often unwilling to accept the performance, security, and risk drawbacks of moving applications to remote hardware that is not under their direct control.

Posted September 07, 2010

NoSQL - probably the hottest term in database technology today - was unheard of only a year ago. And yet, today, there are literally dozens of database systems described as "NoSQL." How did all of this happen so quickly? Although the term "NoSQL" is barely a year old, in reality, most of the databases described as NoSQL have been around a lot longer than the term itself. Many databases described as NoSQL arose over the past few years as reactions to strains placed on traditional relational databases by two other significant trends affecting our industry: big data and cloud computing.

Posted August 10, 2010

In biology, we are taught that survival favors diversity. Organisms that reproduce without variation die out during periods of rapid change, while organisms that show variation in their features tend to survive and adapt. Likewise, ecosystems consisting of relatively few homogeneous species thrive only while conditions stay static. Does IT diversity create a competitive advantage in the business application ecosystem? Predictably, large vendors with vertically integrated stacks argue that mixing software components is a Bad Thing. These vendors claim that reducing the diversity of the application stack leads to better efficiency and maintainability.

Posted July 12, 2010

Although VMware continues to hold the majority share of the commercial virtualization market, other virtualization technologies are increasingly significant, though not necessarily as high profile. Operating system virtualization - sometimes called partial virtualization - allows an operating system such as Solaris to run multiple partitions, each of which appears to contain a distinct running instance of the same operating system. However, these technologies cannot be used to host different operating system versions, making them less appealing to enterprises seeking to consolidate workloads using virtualization.

Posted June 07, 2010

Until recently, IT professionals have been conditioned to regard response time and throughput as the ultimate measures of application performance. It's as though we were building automobiles and concerned only with faster cars and bigger trucks. Yet, just as the automotive industry has come under increasing pressure to develop more fuel-efficient vehicles, so has the IT industry been challenged to reduce the power drain associated with today's data centers.

Posted May 10, 2010

Spreadsheets have long been a disruptive force in enterprise IT; to some extent they were the "killer" applications that helped drive the adoption of personal computers (PCs) in the enterprise. Spreadsheet products such as Lotus 1-2-3 - and early versions of Excel on the Mac - saw rapid adoption by business users. Inevitably, these users pushed the boundaries of the spreadsheet model, using spreadsheets as databases and even to develop simple business applications. In the late 1980s, it was typical to see corporate IT rolling out massively expensive mainframe-based solutions while departmental users got their real work done on spreadsheets running on cheap PCs.

Posted April 07, 2010

Open source applications were somewhat niche at the beginning of the decade but now are clearly mainstream. Credible open source alternatives now exist for almost every category of application, as well as every component of the application.

Posted March 04, 2010

In 1995, Netscape founder Marc Andreessen famously claimed that applications of the future would run within a web browser, relegating the role of the operating system - Windows, in particular - to "a poorly debugged set of device drivers." Fifteen years later, we can see that although rich applications such as Microsoft Office are still dominant, the web browser has become a platform that can deliver almost any conceivable type of business or consumer application.

Posted February 09, 2010

Google's first "secret sauce" for web search was the innovative PageRank link analysis algorithm which successfully identifies the most relevant pages matching a search term. Google's superior search results were a huge factor in their early success. However, Google could never have achieved their current market dominance without an ability to reliably and quickly return those results. From the beginning, Google needed to handle volumes of data that exceeded the capabilities of existing commercial technologies. Instead, Google leveraged clusters of inexpensive commodity hardware, and created their own software frameworks to sift and index the data. Over time, these techniques evolved into the MapReduce algorithm. MapReduce allows data stored on a distributed file system - such as the Google File System (GFS) - to be processed in parallel by hundreds of thousands of inexpensive computers. Using MapReduce, Google is able to process more than a petabyte (one million GB) of new web data every hour.

Posted January 11, 2010

When a company like Microsoft talks about the future of computing, you can expect a fair bit of self-serving market positioning - public software companies need to be careful to sell a vision of the future that doesn't jeopardize today's revenue streams. But, when a company like Microsoft releases a new version of its fundamental development framework - .NET, in this case - you can see more clearly the company's technical vision for the future of computing.

Posted December 14, 2009

There's an old but clever internet parody describing the "Built-in Orderly Organized Knowledge device (BOOK)." This device is described as a "revolutionary breakthrough in technology" that is compact and portable, never crashes, and supports both sequential and indexed information access. Though satirical, the article makes excellent points: the printed book is indeed an information technology device, arguably the oldest in widespread use today.

Posted November 11, 2009

The idea of "virtual" reality—immersive computer simulations almost indistinguishable from reality—has been a mainstay of modern "cyberpunk" science fiction since the early 1980s, popularized in movies such as The Thirteenth Floor and The Matrix. Typically, a virtual reality environment produces computer simulated sensory inputs which include at least sight and sound, and, perhaps, touch, taste and smell. These inputs are presented to the user through goggles, earphones and gloves or—in the true cyberpunk sci-fi—via direct brain interfaces.

Posted October 13, 2009

Google introduced the MapReduce algorithm to perform massively parallel processing of very large data sets using clusters of commodity hardware. MapReduce is a core Google technology and key to maintaining Google's website indexes.

Posted September 14, 2009

Attendees at the O'Reilly Velocity conference in June were treated to the unusual phenomenon of a joint presentation by Google and Microsoft. The presentation outlined the results of studies by the two companies on the effects of search response time. Aside from the novelty of Microsoft-Google cooperation, the presentation was notable both in terms of its conclusions and its methodology.

Posted August 14, 2009

Predictive Analytics - sometimes referred to as Predictive Data Mining - is a branch of Business Intelligence that attempts to use historical data to make predictions about future events. At its simplest, predictive analytics utilizes statistical techniques, such as correlation and regression, that many of us have encountered in college or even high school. Correlation analysis determines whether there is a statistically significant relationship between two variables. For instance, height and age are highly correlated in children, while IQ and height are very weakly correlated. Regression attempts to find an equation relating two or more variables, so that one can be predicted from the others.
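
A small worked example, using made-up age and height figures and the textbook formulas (a sketch of the two techniques, not code from the column):

```python
# Made-up sample: ages (years) and heights (cm) for a handful of children.
ages    = [4, 6, 8, 10, 12]
heights = [102, 115, 128, 139, 149]

n = len(ages)
mean_x, mean_y = sum(ages) / n, sum(heights) / n
sxx = sum((x - mean_x) ** 2 for x in ages)
syy = sum((y - mean_y) ** 2 for y in heights)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, heights))

# Pearson correlation: how strong is the linear relationship?
r = sxy / (sxx * syy) ** 0.5

# Least-squares regression: an equation for predicting height from age.
slope = sxy / sxx
intercept = mean_y - slope * mean_x

print(f"correlation r = {r:.3f}")
print(f"predicted height at age 9 = {intercept + slope * 9:.1f} cm")
```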

Posted July 13, 2009

Virtualization has changed the IT landscape more dramatically than perhaps any other technology introduced over the past decade. Virtualized environments are omnipresent in the modern data center due to their economic advantages in hardware consolidation and manageability.

Posted June 15, 2009

Applications Insight: Open Source in the Cloud

Posted May 19, 2009

Both open source software (OSS) and cloud computing continue to experience strong interest and growth despite the economic downturn. Clearly, both provide the promise of reduced operating and software licensing costs. For instance, corporations looking to reduce the cost incurred by Microsoft Office licensing are looking more closely at the open source OpenOffice alternative, or at Google's online application suite, Google Apps. There's understandable resistance to moving from the rich experience offered by Microsoft to these lower-cost alternatives, but resistance has a way of disappearing in the face of financial imperatives.

Posted May 15, 2009

The business intelligence (BI) market is big: at least $10 billion in 2008 and much more if you include data warehousing projects. The tough economic environment may slow the growth of the BI market, but cost constraints, compliance and similar measures demanded by the current economy require accurate and timely business data, so BI is expected to remain a vigorous market segment regardless of the macro-economic situation.

Posted April 15, 2009

Way back in 2003, Walmart announced that it would require Radio Frequency ID (RFID) tags—so-called "electronic barcodes"—to be attached to virtually all merchandise. Walmart pioneered the use of the printed bar code back in the 1970s, and many—myself included—became convinced that the company's directive would be the tipping point leading to universal adoption of RFID tags in consumer goods and elsewhere.

Posted March 15, 2009

A few years ago, it seemed as though the days of the "micro-ISV" - very small independent software vendors consisting of one or two developers - were over. The role once played by shareware Windows applications had been supplanted by free web applications financed by advertising revenue. The start-up costs for such web applications - including funding a scalable and reliable web hosting infrastructure - were beyond the reach of most small software entrepreneurs.

Posted February 15, 2009

In the classic comedy, "The Hitchhiker's Guide to the Galaxy," a frustrated Ford Prefect can't understand why a bunch of marketing consultants shipwrecked on prehistoric Earth can't invent the wheel.

Posted January 15, 2009

Moore's law—first expressed by Intel cofounder Gordon Moore in 1965—predicts that computing power will increase exponentially, doubling roughly every 18 months. Moore's law has proved remarkably accurate, and we have all benefited from the rapid growth in CPU power and computer memory available for our desktop computers.
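
The arithmetic behind that growth is easy to check: doubling every 18 months compounds to roughly a thousand-fold increase over 15 years (a back-of-the-envelope sketch, not a figure from the column).

```python
# Growth factor implied by Moore's law: number of doublings = months elapsed / 18.
def moores_law_factor(years: float, doubling_months: float = 18.0) -> float:
    return 2 ** (years * 12 / doubling_months)

for years in (3, 15, 30):
    print(f"{years:>2} years -> roughly {moores_law_factor(years):,.0f}x the computing power")
```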

Posted December 15, 2008


Non-relational cloud databases such as Google's BigTable, Amazon's SimpleDB, and Microsoft's SQL Server Data Services (SSDS) have emerged. But while these new data stores may well fill a niche in cloud-based applications, they lack most of the features demanded by enterprise applications - in particular, transactional support and business intelligence capabilities.

Posted July 15, 2008

For the first time in over 20 years, there appear to be cracks forming in the relational model's dominance of the database management systems market. The relational database management system (RDBMS) of today is increasingly being seen as an obstacle to the IT architectures of tomorrow, and - for the first time - credible alternatives to the relational database are emerging. While it would be reckless to predict the demise of the relational database as a critical component of IT architectures, it is certainly feasible to imagine the relational database as just one of several choices for data storage in next-generation applications.

Posted June 15, 2008
