Artificial Intelligence: There’s a Ghost in the Machine—Our Own Biases

This decade has seen AI take on a role much like that of the internet in the 1990s: everyone knows it is set to become ubiquitous, major organizations are reorienting around it, and yet very few people fully understand it. That gap matters because, as AI becomes increasingly prevalent, we cannot afford to treat it as a magic box that will return the "right" answer to any question we pose.

Computing pioneer Charles Babbage recalled being asked whether his machine would still produce the right answers if the wrong figures were put into it. He replied that he could not apprehend the confusion of ideas that would provoke such a question. The anecdote is a perfect reminder that any computer or AI is only ever as good as two things: the algorithms that underpin it and the data it is given to work with. The crucial point is that the data is supplied by humans, and the algorithms are built by humans, who can introduce all sorts of issues, not least their own unconscious biases.

As the algorithms we’re using become more sophisticated, bias can become harder to spot. Researchers who work with AI have a responsibility to constantly guard against this and keep their work as impartial as possible. This is especially important in industries such as pharmaceuticals, where poor calculations could have huge healthcare ramifications, including slowing down the progress of drugs to market or causing reduced efficacy for new drugs.

When Things Go Wrong

Bias—whether consciously or unconsciously introduced—can ruin any project and lead to extremely damaging conclusions if inaccurate or incomplete data is fed in. We’ve already seen multiple examples of engineers being unable to prevent AI networks from accidentally becoming racist or sexist, and it’s not hard to see how bias can lead to many worrying outcomes.

For example, to take a case of fairly obvious bias, what could happen if a new drug were only ever tested on Caucasian men? Other genders and ethnicities could have entirely different, and more negative, reactions. This is hardly a theoretical concern: a 2014 study found that 86% of the clinical trial population was white and two-thirds was male. The more we learn about the genetic code, the clearer it becomes that the safety and efficacy of any given drug can vary greatly with a patient's genetic profile. The problem is compounded by attrition. Across the industry, a staggering 93% of patients fail to make it through clinical trials. With so few participants completing trials, researchers cannot gather fresh data each time and instead rely on historical data, which can be heavily weighted toward particular ethnicities or genders.

To make matters even more complicated, less obvious sources of bias can also creep in. For example, if a company conducts pharmacovigilance by using AI to monitor social media posts for mentions of adverse reactions to a particular drug, it could easily fail to account for cultural differences in how various groups or demographics discuss their health. This could lead the company to believe, erroneously, that the drug causes side effects in one group of patients and not in others.
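To see why raw social media counts mislead, consider a minimal sketch in Python. All figures and region names here are invented for illustration: one region's users simply post about health far more often, so its raw count of adverse-event mentions looks alarming until it is normalized by overall posting volume.

```python
# Hypothetical counts of posts mentioning an adverse event, by region.
# Raw counts alone suggest region A has far more side effects.
event_mentions = {"region_a": 480, "region_b": 60}

# Total drug-related posts per region: region A users post much more
# about health topics overall, so raw mention counts are misleading.
total_posts = {"region_a": 24000, "region_b": 3000}

def mention_rate(mentions, totals):
    """Normalize adverse-event mentions by overall posting volume."""
    return {region: mentions[region] / totals[region] for region in mentions}

rates = mention_rate(event_mentions, total_posts)
print(rates)  # both regions come out at a 2% mention rate
```

Once normalized, the apparent disparity between the two regions disappears; the difference was in posting behavior, not in the drug's safety profile.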

Removing the Bias?

So how can researchers combat this problem? Unfortunately, it’s impossible to fully remove bias from an experiment. No dataset is ever totally comprehensive and so results will always be skewed to a certain degree. On top of that, not all information is created equal. Some datasets are more reliable and objective than others.

One partial solution is to use the widest and most diverse range of data available so that bias is minimized. If AI can gather and contextualize data from multiple sources, the outputs will inevitably be richer and more comprehensive. This is, however, a significant data management challenge: firms need to collate and harmonize data from a number of internal company data silos as well as from external sources, such as published literature or patent data. It demands robust platforms with advanced computing power and intelligent algorithms built by scientists who are experts in their field.
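The harmonization step can be sketched very simply. The example below maps records from differently named fields in several sources onto one common schema; the source names, field names, and mappings are all hypothetical, chosen only to show the shape of the problem.

```python
# Hypothetical field mappings from two internal silos and one external
# source onto a common schema (all names are illustrative).
FIELD_MAPS = {
    "clinical_db": {"pt_sex": "sex", "pt_age": "age", "ae_term": "adverse_event"},
    "safety_silo": {"gender": "sex", "age_yrs": "age", "reaction": "adverse_event"},
    "literature":  {"sex": "sex", "age": "age", "event": "adverse_event"},
}

def harmonize(source, record):
    """Rename a record's fields to the common schema for its source."""
    mapping = FIELD_MAPS[source]
    return {mapping[key]: value for key, value in record.items() if key in mapping}

rows = [
    harmonize("clinical_db", {"pt_sex": "F", "pt_age": 54, "ae_term": "nausea"}),
    harmonize("safety_silo", {"gender": "F", "age_yrs": 61, "reaction": "nausea"}),
]
# Both rows now share the keys: sex, age, adverse_event.
```

Real platforms must also reconcile units, coding vocabularies, and duplicates, but every pipeline contains some version of this renaming step.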

In fact, smart researchers will not only look to offset potential bias in their AI systems but will have the systems hunt for it themselves, flagging issues such as a dataset overwhelmingly weighted toward white men. This allows researchers to spot certain forms of bias easily and make sure they are accounted for.
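One simple way such a check might work is to compare each group's share of the dataset against a reference population and flag any group that deviates beyond a tolerance. The sketch below uses invented enrollment counts and reference proportions; a real system would use validated census or epidemiological baselines.

```python
def flag_skew(sample_counts, reference_props, tolerance=0.10):
    """Flag groups whose share of the dataset deviates from a
    reference population share by more than `tolerance` (absolute)."""
    total = sum(sample_counts.values())
    flags = {}
    for group, count in sample_counts.items():
        share = count / total
        drift = share - reference_props.get(group, 0.0)
        if abs(drift) > tolerance:
            flags[group] = round(drift, 3)
    return flags

# Hypothetical trial enrollment vs. reference population shares.
trial = {"white_male": 560, "white_female": 300, "black": 80, "asian": 60}
reference = {"white_male": 0.31, "white_female": 0.32, "black": 0.13, "asian": 0.06}

print(flag_skew(trial, reference))  # flags the over-represented group
```

Here white men make up 56% of the hypothetical dataset against a 31% reference share, so the check surfaces that group for the researchers to investigate; the other groups fall within tolerance.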

The Human Touch

Beyond using the broadest possible datasets, first-class tools, and best practices for data management, another key factor in offsetting bias is the knowledge of experienced researchers. Having experts look over the findings from AI systems can often highlight results that may have arisen from bias in the data. Scientists working with AI systems must constantly interrogate and sanity-check the answers they are given. In the pharmacovigilance case above, for example, AI might tell researchers that patients in a few select countries are reporting side effects that are not being detected elsewhere on social media. Rather than assuming the problem is limited to those countries, they would need to ask why those countries and not others. More importantly, they would need to ask whether factors are involved that the AI algorithm would not pick up, such as cultural differences.

The problem of bias is only growing in importance as AI moves into the mainstream. Experts have predicted that, by 2030, AI will add more value to the global economy than the current output of China and India combined. It has an exceptional capacity to enhance our lives by aiding scientific discovery. Yet this will only be possible if it is used intelligently and in conjunction with expert human researchers.

AI is not a panacea, and it has the potential to make huge miscalculations if given incorrect or biased data. The modern researcher should heed the words of Charles Babbage and constantly question and interrogate the results they are given, no matter how advanced the computer they are working with.
