CS Seminar Speaker Dr. Jonathan Dunn

Measuring and Correcting Geographic Bias within NLP
Dr. Jonathan Dunn | University of Canterbury

Event Details
Tuesday, November 5, 2019
12:45–1:45 p.m.
Stuart Building, Room 111

Current methods in natural language processing depend heavily on text data sets collected from Twitter and the web to represent both human languages and human activities. But how well do these two sources actually correspond to global population demographics? First, I evaluate digital language data against global census information to show which populations are currently under-represented. Second, I use computational dialectology to show that there are significant linguistic differences across these populations. Third, I present a new set of geographically balanced gigaword corpora for over 30 languages, providing a practical solution to ensure that NLP represents all human populations.

Jonathan Dunn is an assistant professor of linguistics at the University of Canterbury in Christchurch, New Zealand, where he teaches syntax and computational linguistics. He received his PhD in linguistics (2013) from Purdue University, then worked as a computer science research fellow (2014–15) at Illinois Institute of Technology and as a visiting scientist (2015–18) at the National Geospatial-Intelligence Agency. He works on computational models of both individual cognitive processes (grammar and metaphor) and socially situated processes (dialectal variation and language mapping).