Friday, March 19, 2021

If A Representation Is "A Bid For Power" Then Google Already Rules The World

doxa.substack |  The other point that's always stressed in the AI Ethics literature is that in the hands of large, powerful, status-quo-defining entities like Google, there's a feedback loop: the models are released back into the real world, where they tend to reinforce the very status quo that produced them.

This circularity of status quo => model => status quo is well covered in Cathy O'Neil's 2016 book, Weapons of Math Destruction. O'Neil is mostly concerned with the models used by Big Finance, but the principle is exactly the same — models don't just reflect the status quo, they're increasingly critical to perpetuating it. Or, to borrow words from the title of an even earlier book on financial models by Donald MacKenzie, these models are "an engine, not a camera."

Unless I've missed something major, a very big chunk of the AI Ethics work amounts to stating and restating the age-old truth that big, costly, public representations of the regnant social hierarchy are powerful perpetuators of that very hierarchy. That's it. That's the tweet... and the paper... and the conference... and the discipline.

In the formulation of Gebru's paper, large language models (“large” because they’re trained on a massive, unsanitized corpus of texts from the wilds of the internet) re-present, or "parrot," the problematic linguistic status quo. And in parroting it, they can perpetuate it.

As people in positions of privilege with respect to a society’s racism, misogyny, ableism, etc., tend to be overrepresented in training data for LMs (as discussed in §4 above), this training data thus includes encoded biases, many already recognized as harmful...

In this section, we have discussed how the human tendency to attribute meaning to text, in combination with large LMs’ ability to learn patterns of forms that humans associate with various biases and other harmful attitudes, leads to risks of real-world harm, should LM-generated text be disseminated.

As someone who trained as an historian, I'm not at all surprised that what was true of the Roman Colosseum — in everything from the class-stratified seating arrangement to the central spectacle — is also true of the massively complex and expensive public display of cultural power that is Google's language model.