This article is part one of a two-part series on NSF-funded AI research by professors in Purdue Polytechnic’s Department of Computer and Information Technology.
Navigating the new world of AI
In late June, the National Science Foundation (NSF) awarded Purdue Polytechnic’s Romila Pradhan a five-year grant of nearly $500,000. Pradhan, an assistant professor in computer and information technology, received the NSF’s CAREER award, which supports promising early-career researchers, for her proposal to study and reduce bias in artificial intelligence (AI) and machine learning systems.
“In healthcare, finance, education and many other fields, machine learning was already being used in these domains for a fairly long time. But AI in general is really present in the public consciousness now,” Pradhan said. “If machine learning will continue to be used in all these critical areas, how can we make this system trustworthy?”
In 2018, Reuters reported that Amazon’s machine learning specialists had scrapped the company’s AI recruiting tool after they discovered it was consistently “downgrading” female applicants. Similar systems have come under scrutiny in other sectors of business and public life; ProPublica reported that the COMPAS tool, used in some U.S. courts to predict whether defendants will reoffend, was biased against Black defendants, and Pradhan cited systems in both healthcare and banking that have displayed racial bias.
The data’s fault, or human error?
“This project is under the umbrella of ‘explainable AI’ and responsible data science, which seek to explain the processes and outcomes of algorithmic systems to users,” Pradhan stated. She has pulled from real-world examples, such as financial algorithms used to analyze individuals’ credit history, to explain to stakeholders whether certain AI outcomes are biased.
“If someone is denied a loan because they don’t have money in a savings account, that’s an appropriate appraisal of the situation by the learning model,” Pradhan said. “But if it’s just denying that person because the applicant is part of a demographic group where data indicates worse outcomes, then that’s a pretty straightforward instance of discrimination.”
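One simple way to look for the pattern Pradhan describes is to compare approval rates across demographic groups. The sketch below is illustrative only and is not drawn from her research: the records are made-up toy data, and the names `decisions`, `approval_rates` and `disparate_impact_ratio` are hypothetical. A ratio well below 1 is a signal to investigate further, not proof of discrimination on its own.

```python
from collections import defaultdict

# Hypothetical toy records: (group label, was the loan approved?). Not real data.
decisions = [
    ("A", True), ("A", True), ("A", True), ("A", False),
    ("B", True), ("B", False), ("B", False), ("B", False),
]

def approval_rates(records):
    """Return the share of approved applications for each group."""
    approved = defaultdict(int)
    total = defaultdict(int)
    for group, ok in records:
        total[group] += 1
        approved[group] += int(ok)
    return {group: approved[group] / total[group] for group in total}

def disparate_impact_ratio(records):
    """Lowest group approval rate divided by the highest (1.0 means parity)."""
    rates = approval_rates(records)
    return min(rates.values()) / max(rates.values())

rates = approval_rates(decisions)
ratio = disparate_impact_ratio(decisions)
print(rates, round(ratio, 3))
```

A check like this only shows that outcomes differ between groups. Deciding whether the difference reflects a legitimate factor, such as savings, or group membership itself still requires looking at the features and data behind each decision.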
Pradhan, whose research interests are in data management and machine learning, emphasized that the data itself isn’t the problem. If the data is coming from a good source, “then it simply says what it says. … Machine learning models are as good as the data they’re trained on.” The problem lies in the judgement calls people make along the way: the various processes by which data engineers, scientists and other experts prepare and curate data (e.g., not accounting for statistical anomalies, or including or excluding certain populations by geography) can lead to a wide variety of unintended consequences.
For instance, Amazon’s recruitment tool was trained by repeated analysis of resume data from current and past applicants within the company. Because the majority of successful candidates within tech companies are men, Pradhan asserted that “the algorithm essentially learned that ‘male equals good’” through repeated exposure to the winning job-seekers.
NSF and the CAREER awards
The NSF CAREER award is designed to allow Pradhan, an early-career scholar who received her doctorate in computer science from Purdue in 2018, to investigate data preparation and data handling practices as a first step toward reducing bias and discrimination in machine learning applications.
“I want to address the problem of biased decisions through the lens of data. Once an application is found to generate biased decisions, I want to dig deeper into the data and the data pipeline to locate the source of the bias,” Pradhan stated. “The eventual goal is to be able to fix the problem, say by acquiring additional data, removing problematic data or modifying a data cleaning step.”
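The idea of tracing a biased outcome back to the data can be sketched in a few lines. The example below is a deliberately simplified illustration under stated assumptions, not Pradhan’s method: it measures the gap in positive-label rates between two groups in a toy training set, then drops one data source at a time to see which removal shrinks the gap most. The field names (`source`, `group`, `label`), the branch names and the data are all hypothetical.

```python
# Hypothetical toy training set: each row records its data source,
# a demographic group, and a historical outcome label (1 = positive).
rows = [
    {"source": "branch_1", "group": "A", "label": 1},
    {"source": "branch_1", "group": "A", "label": 0},
    {"source": "branch_1", "group": "B", "label": 1},
    {"source": "branch_1", "group": "B", "label": 0},
    {"source": "branch_2", "group": "A", "label": 1},
    {"source": "branch_2", "group": "A", "label": 1},
    {"source": "branch_2", "group": "B", "label": 0},
    {"source": "branch_2", "group": "B", "label": 0},
    {"source": "branch_3", "group": "A", "label": 1},
    {"source": "branch_3", "group": "A", "label": 0},
    {"source": "branch_3", "group": "B", "label": 0},
    {"source": "branch_3", "group": "B", "label": 1},
]

def rate_gap(data):
    """Absolute difference in positive-label rate between groups A and B."""
    def rate(group):
        labels = [r["label"] for r in data if r["group"] == group]
        # Treat an empty group as rate 0.0 to avoid dividing by zero.
        return sum(labels) / len(labels) if labels else 0.0
    return abs(rate("A") - rate("B"))

baseline = rate_gap(rows)

# Recompute the gap with each source left out in turn.
gaps_without = {
    source: rate_gap([r for r in rows if r["source"] != source])
    for source in sorted({r["source"] for r in rows})
}

# The source whose removal leaves the smallest gap is the likeliest culprit.
culprit = min(gaps_without, key=gaps_without.get)
print(baseline, gaps_without, culprit)
```

In this toy data, leaving out `branch_2` removes the gap entirely, which would point an analyst toward how that source’s records were collected or cleaned. A real pipeline would measure the bias of a trained model rather than raw label rates, and would consider finer-grained subsets than whole sources.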
A fairer future
It may be easier than one thinks to bend data sets toward discriminatory outcomes. The judgement calls about which data to include and exclude pass through many different experts, and the result depends on the decisions each of them makes. “It’s not just end-users,” Pradhan stated. “It’s also data engineers, which are separate from data scientists. And then there are specified machine learning engineers whose job is just to make and upkeep the learning model. Then these models also affect the domain experts and the people in charge of doing [user experience] design.”
The task of creating a truly fair learning model requires the subject matter experts to not only have the technical know-how, but to also understand how to present data to learning models correctly. “In essence, it is important to study how humans with different technical roles and expertise help in mitigating the bias,” Pradhan said.
As it currently stands in most industries, “there’s a black-box decision-making system that produces outcomes which aren’t always fair or don’t always make sense,” Pradhan stated. “The hope is to go beyond analyzing how these models get things wrong, and to develop training methods that show how to correct the errors or to prevent the issue from happening in the first place.”
Prior to joining Purdue Polytechnic’s Department of Computer and Information Technology in 2021, Pradhan received a Ph.D. from the Department of Computer Science at Purdue University and completed postdoctoral research at Purdue University and the University of California, San Diego.
- NSF CAREER award: Data Preparation for Trusted and Fair Data Science
- Amazon scraps secret AI recruiting tool that showed bias against women (Reuters)
- Machine Bias (ProPublica analysis of the COMPAS crime prediction tool)
- New faculty, staff bring diverse backgrounds, research interests to Purdue Polytechnic (Purdue Polytechnic newsroom)