The Department of Chemical and Biological Engineering presents the fall 2023 seminar series with guest speaker Hadis Anahideh, an assistant professor of industrial engineering at the University of Illinois Chicago, who will give a presentation on “Fairness in Machine Learning: Challenges and Solutions for Data Labeling.” This event will take place on Wednesday, November 1, from 3:15–4:30 p.m. in the Perlstein Hall Auditorium (room 131). This event is open to the public.
Machine learning (ML) is a powerful tool for solving complex problems in various domains, such as healthcare, education, finance, and social justice. However, ML models are not immune to the biases and inequalities that exist in the real world. As we seek to harness the full potential of ML, questions about fairness in model behavior and the reliability of labeled data loom large. How can we ensure that ML models are fair and do not discriminate against certain groups of people? How can we obtain reliable and representative labeled data for training and evaluating ML models? How can we leverage the wisdom of the crowd to foster fairness in data collection and prediction? This seminar explores two pivotal works addressing these pressing questions.
In the quest for fairness, we grapple with the challenge of obtaining accurate and representative labeled data. Active learning, an approach that enables ML models to query unlabeled data points for human annotation, offers a cost-effective means of improving model performance. However, it is not without its caveats. Active learning can inadvertently introduce or amplify unfairness in the data, particularly when data is scarce, expensive, or prone to bias. To address this, we introduced the concept of an “expected fairness metric.” This metric assesses the potential impact of unlabeled samples on model fairness, even when the true labels remain unknown. Moreover, we proposed optimization methods designed to fine-tune the delicate balance between model accuracy and fairness in active learning.
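The idea of trading off informativeness against an expected fairness impact can be sketched as follows. This is an illustrative toy implementation, not the speaker's actual formulation: the entropy-based informativeness score, the demographic-parity gap as the fairness measure, and the `alpha` trade-off weight are all assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def demographic_parity_gap(y_pred, groups):
    """Absolute difference in positive-prediction rates between two groups."""
    return abs(y_pred[groups == 0].mean() - y_pred[groups == 1].mean())

def expected_fairness(model, X_lab, y_lab, x, X_eval, g_eval):
    """Expected parity gap after adding point x, averaging over its unknown
    label weighted by the model's predicted label probabilities."""
    probs = model.predict_proba(x.reshape(1, -1))[0]
    gap = 0.0
    for y_hyp, p in zip(model.classes_, probs):
        # Hypothetically label x as y_hyp, retrain, and measure the gap.
        m = LogisticRegression().fit(np.vstack([X_lab, x]),
                                     np.append(y_lab, y_hyp))
        gap += p * demographic_parity_gap(m.predict(X_eval), g_eval)
    return gap

def select_next(model, X_lab, y_lab, X_pool, g_pool, alpha=0.5):
    """Pick the pool point that best balances informativeness (entropy)
    against expected unfairness; alpha tunes the trade-off."""
    P = model.predict_proba(X_pool)
    entropy = -(P * np.log(P + 1e-12)).sum(axis=1)
    fair = np.array([expected_fairness(model, X_lab, y_lab,
                                       X_pool[i], X_pool, g_pool)
                     for i in range(len(X_pool))])
    score = alpha * entropy - (1 - alpha) * fair
    return int(np.argmax(score))
```

Retraining once per candidate and per hypothetical label is expensive; in practice one would approximate the expected-fairness term rather than refit the model for every pool point.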
But data labeling and fairness concerns extend beyond active learning. Crowdsourcing, a popular method for obtaining labeled data from a diverse pool of workers, offers another route to human annotation, but it comes with its own set of challenges. Workers in a crowdsourcing environment bring varying perspectives, expertise, and potential biases to the table. This diversity can lead to conflicting or unreliable responses, potentially undermining the fairness of the collected data. In response, we introduced a novel similarity measure that captures the degree of agreement among workers on a given task, accounting for their backgrounds and preferences. This measure provides a foundation for assessing fairness in the context of crowdsourced data. We also proposed pre-processing and in-processing methods that leverage this similarity measure to modify worker responses and generate fair predictions, taking into account both the consensus and context of worker contributions.
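A minimal sketch of agreement-based label aggregation for crowdsourced data follows. This is an assumption-laden toy, not the similarity measure from the talk: here worker-to-worker similarity is simply the fraction of shared tasks answered identically, and each worker's vote is weighted by their average agreement with the rest of the pool.

```python
def agreement(responses, a, b):
    """Fraction of shared tasks where workers a and b gave the same label.
    `responses[w][t]` is worker w's label on task t (dict of dicts)."""
    shared = set(responses[a]) & set(responses[b])
    if not shared:
        return 0.0
    return sum(responses[a][t] == responses[b][t] for t in shared) / len(shared)

def worker_weights(responses):
    """Weight each worker by mean agreement with every other worker."""
    workers = list(responses)
    return {a: sum(agreement(responses, a, b)
                   for b in workers if b != a) / (len(workers) - 1)
            for a in workers}

def aggregate(responses, task):
    """Weighted vote on one task using agreement-based worker weights."""
    weights = worker_weights(responses)
    votes = {}
    for worker, answers in responses.items():
        if task in answers:
            votes[answers[task]] = votes.get(answers[task], 0.0) + weights[worker]
    return max(votes, key=votes.get)
```

Such a pre-processing step down-weights outlier annotators before training; an in-processing variant would instead fold the agreement signal into the learner's objective.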
Hadis Anahideh is an assistant professor of industrial engineering at the University of Illinois Chicago. She holds a Ph.D. in industrial engineering from the University of Texas at Arlington and a B.S. in applied mathematics from Shahid Beheshti University in Iran. Her research interests include black-box optimization, active learning, machine learning, and algorithmic fairness. She aims to develop innovative learning and optimization methodologies for engineering design, engineering operations, and social systems. She has published several papers in journals and conferences. She is the director of the Optimal Learning and Exploration Laboratory (OPLEX) at UIC, where she works with students and collaborators on projects related to her research topics.