Wednesday, December 30, 2020

Computational Statistics (So-Called AI) Is Inherently Biased

FT | Bayesian statistical models (so-called AI) inherently amplify the bias of whatever data set they were trained on. Moreover, this amplification is exponential: the more an AI system is used, the more biased it becomes through self-learning. Since it is impossible to eliminate bias completely from a training data set, any AI system will eventually become extremely biased. Self-correcting mechanisms suffer from the same problem because they too are AI-based, so you end up with an unstable system that will always eventually become extremely biased, driven by even minute, impossible-to-eradicate biases in its initial data set.
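The exponential claim can be illustrated with a deliberately simplified feedback loop. This is a toy under stated assumptions. It is not a Bayesian model, and it does not describe how any real system is trained. It assumes a model that estimates the share of some group A in its data. It also assumes that each round of self-learning retrains the model on its own outputs, and that those outputs over-represent whichever group it already favours by a hypothetical "sharpen" factor. Under that assumption the bias, measured in log-odds, is multiplied by the sharpen factor every round, so it grows geometrically. With a factor of 1.0 the initial bias simply persists.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def retrain_step(p: float, sharpen: float) -> float:
    """One hypothetical self-training round.

    Assumption (not taken from any real system): the model's outputs
    over-represent whichever group it already favours, modelled by raising
    the odds p / (1 - p) to the power `sharpen`. The next model is fitted
    to those outputs, so its estimate becomes the sharpened share.
    Equivalently, logit(new p) = sharpen * logit(p).
    """
    a = p ** sharpen
    b = (1.0 - p) ** sharpen
    return a / (a + b)


def simulate(p0: float, sharpen: float, rounds: int) -> list[float]:
    """Return the estimated share of group A before and after each round."""
    history = [p0]
    p = p0
    for _ in range(rounds):
        p = retrain_step(p, sharpen)
        history.append(p)
    return history


if __name__ == "__main__":
    # A 51/49 split stands in for a small, hard-to-remove bias in the initial data.
    for sharpen in (1.0, 1.2):
        shares = simulate(0.51, sharpen, rounds=30)
        print(f"sharpen={sharpen}: share after 30 rounds = {shares[-1]:.4f}")
```

Starting from a 51/49 split, sharpen=1.0 leaves the share at 0.51 indefinitely. Sharpen=1.2 multiplies the log-odds by 1.2^30 (roughly 237) over 30 rounds and pushes the share above 0.999. The whole effect comes from the assumed sharpening step. A real system would need to exhibit that step for the amplification argument to apply to it.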

“F*** the algorithm!” became one of the catchphrases of 2020, encapsulating the fear that humanity is being subordinated to technology. Whether it was British school students complaining about their A level grades or Stanford Medical Centre staff highlighting the unfairness of vaccination priorities, people understandably rail against the idea of faceless machines stripping humans of agency. 

This is an issue that will only grow in prominence as artificial intelligence becomes ubiquitous in the computer systems that power our modern world. To some extent, these fears are based on a misconception. Humans are still the ones who exercise judgment, and algorithms do exactly what they are designed to do: discriminate. Whether they do so in a positive or a negative way depends on the humans who write these algorithms and interpret and act upon their output.

It may on occasion be convenient for a government official or an executive to blame some “rogue” algorithm for their mistakes. But we should not be fooled by this rhetoric. We should hold those who deploy AI systems legally and morally accountable for the outcomes they produce. Artificial intelligence is no more than a technological tool, like any other. It is a powerful general purpose technology, akin to electricity, that enables other technologies to work more effectively. But it is not a property in its own right and has no agency. AI would sound a lot less frightening if we were to relabel it as computational statistics. 

That said, companies, researchers and regulators should pay particular attention to the feedstock used in these AI systems: data. Researchers have shown that partial (that is, skewed) data sets used to power modern AI systems can bake in societal inequities and racial and sexual prejudices. This issue has been highlighted at Google following the departure of Timnit Gebru, an AI ethics researcher, who claimed she was dismissed after warning of the dangers of large-scale language generation systems that rely on historic data taken from the internet.