
Semantics derived automatically from language corpora contain human-like biases

Abstract: 
"Machine learning is a means to derive artificial intelligence by discovering patterns in existing data. Here, we show that applying machine learning to ordinary human language results in human-like semantic biases. We replicated a spectrum of known biases, as measured by the Implicit Association Test, using a widely used, purely statistical machine-learning model trained on a standard corpus of text from the World Wide Web. Our results indicate that text corpora contain recoverable and accurate imprints of our historic biases, whether morally neutral as toward insects or flowers, problematic as toward race or gender, or even simply veridical, reflecting the status quo distribution of gender with respect to careers or first names. Our methods hold promise for identifying and addressing sources of bias in culture, including technology." This paper proves that artificial intelligence applications are susceptible to human-biases. The paper gives specific proof of racism and sexism found in off-the-shelf artificial intelligence technology when applied to job resumes and other text-based data. In this paper, the bias is learned by the AI algorithm from humans. Keywords: artificial intelligence and privacy, how data becomes private, artificial intelligence and equal opportunity
Author: 
Caliskan, Aylin; Bryson, Joanna J.; Narayanan, Arvind.
Institution: 
Princeton
Year: 
2017
Domains-Issue Area: 
Dimensions-Problem/Solution: 
Region(s): 
Industry Focus: 
Internet & Cyberspace
Country: 
United States
Datatype(s): 
Theory/Definition