The roots of open source go back to the beginnings of the computer revolution. Early pioneers of computing at organizations such as Bell Labs and MIT believed that sharing program code was essential to the progress of computer technology.
These ideals arguably crystallized in the GNU Project launched by Richard Stallman. The objective of GNU (a recursive acronym for GNU’s Not Unix) was to create a complete UNIX-compatible operating system. In 1989, the GNU General Public License (GPL) was born. This license, still in use today, gave users the rights to run, share, and modify the source code.
The GNU vision of an open source, UNIX-compatible operating system was realized with the emergence of Linux. While not created by GNU, it was licensed under the GPL. Very quickly, open source offerings such as the Apache Web Server and programming languages such as Python and Perl combined to create a viable alternative to the previously dominant commercial software stacks.
The Open Source Advantage
I like to think of open source adoption in evolutionary terms. Viruses such as COVID-19 initially succeed because they are transmissible—they move quickly from one host to another. Over time, they also succeed because they evolve—the delta variant of the coronavirus has a stronger feature set (is more contagious) than the alpha variant—and so newer versions rapidly replace older versions.
Open source software has many of these attributes. It’s very transmissible because developers can use open source software without paying any licensing fees. Open source typically evolves faster as well. If developers want a new feature, they can code it themselves and contribute it back to the project. These two attributes have resulted in open source projects out-evolving closed source alternatives during the 2000s.
The growth of web-based software has also helped drive this transformation. The LAMP stack (Linux, Apache, MySQL, and PHP/Perl/Python) provided a totally free integrated software stack that could be used to economically build the second generation of web software (i.e., Web 2.0). As web-based applications increasingly replaced desktop-based Windows apps, they powered a shift toward open source.
As open source products mature, they become attractive to the enterprise as well for two reasons. First, they tend to be less expensive than their closed source equivalents. Second, the open source approach avoids vendor lock-in since—at least in theory—an enterprise can transition to the open source version without vendor assistance or get support from another organization that offers support services for the product.
Open Source Databases
Open source databases emerged on the scene in the mid-to-late 1990s, following closely behind the release of Linux.
Postgres had its roots in relational database research projects led by Michael Stonebraker at the University of California, Berkeley in the mid-1980s, but it was when Postgres95 emerged in 1995, with SQL support and under a permissive license, that it saw significant adoption.
MySQL also emerged at about the same time. Whereas Postgres emerged from academia, MySQL arrived from a pragmatic need for an easy-to-use SQL engine. While Postgres was correct, MySQL was practical. Although MySQL was less technically sophisticated than Postgres, developers found its ease-of-use compelling, and it found a sweet spot as the “M” in the LAMP stack.
Although databases such as MySQL and Postgres changed the market landscape through the first decade of the 2000s, they did not change the technology outlook. These open source databases implemented subsets of the features in the big relational databases, but they rarely offered features that the commercial products lacked.
However, when the biggest revolution in database technology since the relational model occurred, it was directly enabled by open source.
The Open Source Database Revolution
By the middle of the 2000s, the relational model had completely dominated the database market for more than 20 years. But by the end of that decade, an amazing proliferation of alternative database models—almost all driven by open source—had emerged.
The key drivers for this NoSQL trend were the demands of a new breed of always-on, globally distributed databases and the increasing value and volumes of data (as part of the rise of “big data”). However, if these were the drivers for the revolution, open source was the enabler. The ability for developers to rapidly iterate and innovate new database offerings based on open source allowed for the creation of literally dozens of new databases. Some of these, such as Cassandra, Neo4j, and MongoDB, still exist today and have achieved huge uptake. Others, such as Project Voldemort, Tokyo Cabinet, Dynamite, and Riak, are now gone. Out of a “Cambrian explosion” of new databases in the late 2000s, only the fittest open source databases survived.
This explosion of new, open source database technologies dramatically illustrated the innovation advantage of open source. Commercial vendors, such as Oracle and Microsoft, seemed frozen in place while the open source database community seemed to be moving at the speed of light. A time-traveling database professional from 1995 would have no difficulty recognizing in 2020 the feature sets of the Oracle RDBMS and SQL Server, but the feature sets of Cassandra, Neo4j, and MongoDB would be completely unfamiliar.
Within a decade, open source databases were completely mainstream. Of the top five database platforms listed by DB-Engines as of 2018, three (MongoDB, MySQL, and Postgres) were open source.
However, in recent years, some open source database platforms have felt themselves under attack by the mega-cloud vendors and, most particularly, by Amazon. Amazon has successfully monetized many open source database products in its cloud. Amazon offers PostgreSQL and MySQL as commercial services in its cloud; it is able to generate revenue off these open source products without having to invest in database research and development. Amazon’s versions of Elasticsearch and Redis have similarly been very popular.