The term "machine learning" evokes visions of massive super computers that eventually turn on and enslave humanity - think SkyNet from Terminator or HAL from 2001: A Space Odyssey. But the truth is that machine learning algorithms are common in web applications that we use every day and have a growing relevance to enterprise applications.
Attracting and retaining customers on the web requires that you provide the most compelling user experience possible. The competitive advantage for ecommerce involves not only competition on price and inventory but also the sophistication of recommendations, targeted marketing, and tailored user experience. Some of these can be achieved by applying relatively simplistic algorithms to the mass of customer-generated data that web businesses accumulate as they grow. But, to deliver the ideal user experience, one needs algorithms that can improve over time and evolve with the shifts and turns of the market and consumer. This is the domain of machine learning.
Machine learning programs modify their algorithms based on experience. For instance, Gmail-and most other mail programs-invite the users to identify messages they consider to be spam. As users tag mail as spam, the mail program adapts its spam filter accordingly and becomes able to filter out an increasing number of spam emails.
Early well-known attempts at creating learning software emulated learning in the natural world. For instance, neural networks are modeled on the structure of the human brain. The brain comprises many billions of brain cells (neurons), each of which connects to a limited number of other neurons. Each connection (synapse) can become strengthened or weakened by activity over time, which (very simplistically) results in human learning. A neural network emulates the behavior of neurons and synapses in computer code.
Genetic algorithms attempt to emulate the development of complex behaviors that we observe in biological evolution. Algorithms replicate and mutate while a "survival of the fittest" regime ensures that only the best algorithms survive.
Genetic algorithms and neural networks are flexible and widely applicable in theory, and have some applicability in practice. However, modern machine learning techniques tend to leverage approaches that are targeted to specific problem sets and have been shown to have more practical application in commercial contexts.
Many machine learning techniques involve clustering and categorization. There are a variety of business problems that can be solved if we can group like objects effectively. For instance, a mechanism that groups emails can be used to identify spam, a mechanism that groups customers can be used to target advertising, and a mechanism to group purchases can be used as the basis of a recommendation system.
A group of algorithms with impressive-sounding names-state vector machines, k-means clustering, etc.-provide mechanisms for grouping or classifying customers and other entities. Another impressively named algorithm, NonNegative Matrix Factorization (NMF), can identify components or commonalities in data sets. NMF is particularly useful when trying to determine the themes in text documents for content recommendation systems.
While some of these the techniques are capable of learning with little human intervention, most require some degree of training or supervision to achieve optimal results. This training is usually achieved by providing examples. For instance, a classification algorithm may be provided with a list of items that have been assigned specific categories. The example serves as a starting point for future classification, though the classifications can still evolve over time.
Machine learning has definitely crossed the chasm from academic research to commercial application. Although most heavily utilized today in web 2.0 applications, there are very few realms of business for which some form of machine learning cannot provide competitive advantage.