

Closing the Fairness Gap in Machine Learning


With racial unrest shedding new light on AI’s fairness problem, an open source tool developed with the help of Digital Impact aims for a more holistic fix.

Artificial intelligence has a problem with bias. Just ask Niki Kilbertus, a doctoral student and researcher at Germany’s Max Planck Institute for Intelligent Systems. The rising star in machine learning wants to put an end to discrimination in digital services—something that won’t happen until we take a cold hard look at our failing relationship with AI.

Kilbertus sees the current global reckoning with racial injustice, and the Black Lives Matter protests in particular, as a reminder of how fairness techniques often get lost between the research and development stages of real-world applications.

Minimizing inequities in machine learning will become increasingly difficult if researchers, companies, and policymakers don’t do a better job of coordinating their efforts. “There is no consensus on how discrimination in machine learning algorithms should be assessed or prevented,” he says. “There is a real need to develop a unified set of tools.”


With support from a Digital Impact grant, Kilbertus and three like-minded scientists built Fairensics, an online library that collects and makes available advanced tools developed by researchers and private industry for counteracting bias in algorithms.

“Think of it as an entry point for anyone who wants to build fair machine learning models,” he says. “Fairensics lets them see, and play around with, the latest state-of-the-art tools without having to start from scratch. The hope is that they will advance what has already been done.”

As project lead, Kilbertus hopes to eventually make the GitHub-hosted repository user-friendly enough to help non-techies understand how to think about—and evaluate—potential bias in any dataset, including their own.

“We want to make it easier for journalists and people working for NGOs to know how to discover bias within any machine learning system they encounter, and to also identify potential problems in their own data,” he says.

A Collection of Standardized Tools

The idea behind Fairensics was born in 2016, after ProPublica reported that a popular risk-assessment tool used by criminal courts to set bail and prison terms was biased against Black defendants. In response, the software developer claimed its own audit had found that the tool’s recommendations were racially neutral.

Kilbertus says ProPublica and the developer were both right. Each had used different approaches to arrive at opposing conclusions. “They sliced and diced the data in different ways,” he says. “What’s interesting is that both methodologies seemed like something you would want your algorithm to satisfy in order to be fair.”
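
What the two sides measured differed: broadly speaking, ProPublica pointed to unequal false positive rates across racial groups, while the developer pointed to its scores being similarly predictive for defendants of either race. When the underlying base rates differ, a single classifier generally cannot satisfy both criteria at once. The toy counts below are hypothetical, chosen only to show how one model can look fair under one definition and unfair under the other.

# Hypothetical confusion-matrix counts for two groups, chosen so that
# positive predictive value (PPV) matches while false positive rates (FPR) do not.
def rates(tp, fp, fn, tn):
    ppv = tp / (tp + fp)   # of those flagged high risk, the share who reoffended
    fpr = fp / (fp + tn)   # of those who did not reoffend, the share flagged high risk
    return ppv, fpr

group_a = rates(tp=40, fp=20, fn=10, tn=30)
group_b = rates(tp=20, fp=10, fn=5, tn=65)

print("Group A: PPV=%.2f, FPR=%.2f" % group_a)   # PPV=0.67, FPR=0.40
print("Group B: PPV=%.2f, FPR=%.2f" % group_b)   # PPV=0.67, FPR=0.13

Judged by predictive value, the model treats the two groups alike; judged by false positive rates, it does not.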

The problem of distinct yet valid approaches to data analysis extends to research, according to Kilbertus. Scholars, too, rely on different models and are often unaware of each other’s work.

Kilbertus decided to do his part to unify methodologies and help create a shared understanding of how best to assess fairness given AI’s inherent limitations. When he and his team began working on Fairensics, he couldn’t find a single place where scholars, companies, or anyone else concerned about machine learning bias could go to find what others in the community were doing.

That changed fast, thanks to a number of high-profile stories that exposed the true extent of discrimination in AI. An AI-based hiring system that Amazon developed but later scrapped was found to favor male applicants. Google Translate showed signs of gender bias. And facial recognition technology was found to misidentify Black women at far higher rates than white men.

Many scholars leave the ethical considerations to policymakers and private industry.

Soon after, universities, nonprofits, and private companies were developing and open-sourcing toolkits; IBM and Microsoft were among the tech giants building and publicly releasing fairness solutions. Last month, LinkedIn weighed in with its own fairness fix.

“The research community and industry have really started to pull in the same direction in terms of trying to solve and mitigate the problem,” says Kilbertus. He was so impressed with IBM’s contribution, AI Fairness 360 (AIF360), that his team built Fairensics on top of it.
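
A minimal sketch of the kind of check AIF360 exposes, and that a library like Fairensics can build on, appears below; the toy dataframe, column names, and group encodings are hypothetical placeholders, not the actual Fairensics interface.

# Sketch: measuring a group disparity with IBM's AI Fairness 360 (aif360).
# The toy data, column names, and group encodings are hypothetical.
import pandas as pd
from aif360.datasets import BinaryLabelDataset
from aif360.metrics import BinaryLabelDatasetMetric

# 'label' is the favorable outcome (1 = favorable); 'group' is the protected attribute.
df = pd.DataFrame({
    "label": [1, 1, 1, 0, 1, 0, 0, 0],
    "group": [1, 1, 1, 1, 0, 0, 0, 0],   # 1 = privileged group, 0 = unprivileged
})

dataset = BinaryLabelDataset(
    df=df,
    label_names=["label"],
    protected_attribute_names=["group"],
    favorable_label=1,
    unfavorable_label=0,
)

metric = BinaryLabelDatasetMetric(
    dataset,
    privileged_groups=[{"group": 1}],
    unprivileged_groups=[{"group": 0}],
)

# 0.0 means both groups receive the favorable outcome at the same rate.
print("statistical parity difference:", metric.statistical_parity_difference())
# 1.0 means parity; values well below 1.0 flag a disparity.
print("disparate impact:", metric.disparate_impact())

AIF360 also ships bias-mitigation algorithms alongside its metrics, so a project built on top of it does not have to reimplement them from scratch.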

“The Digital Impact grant enabled us to invest the time and effort to make our ideas usable for the general public,” Kilbertus says. “As researchers, we do not always get that opportunity.”

A Call to Action

Kilbertus says there is a lot more work to be done—not just to realize his vision for Fairensics, but also to mitigate bias in AI. He thinks that while scholars are quick to identify the technical fixes to human bias in machine learning, many of them leave the ethical considerations to policymakers and private industry.

Black Lives Matter, however, shows that researchers have a bigger responsibility to address inequality in machine learning—whether it is based on race, sex, religion, or another trait. This means questioning whether AI is appropriate in a given situation.

As an example, he points to the pretrial risk assessment program ProPublica reported on in 2016. To determine flight risk, the software would draw on a criminal defendant’s history of missing court appearances.

But the algorithm failed to consider the many reasonable explanations for why someone might not appear in court, such as work scheduling conflicts or a lack of childcare. In one case, a defendant had been given overlapping court dates. In other words, the court system expected the defendant to be in two places at once.

“As researchers, we turn ideas around equity and fairness and disparate impact into technical discussions of error rates and optimization techniques,” says Kilbertus. “But the bigger issue is, ‘Why do we even have pretrial risk assessments?’ We need to start taking the bigger picture into account and thinking about how social unfairness comes about in the first place.”

To some researchers, Kilbertus’s suggestion is sacrilege. But his hope is that Fairensics becomes a place for finding the latest innovations and cutting-edge ideas, whether about ethics or something else, around minimizing bias in machine learning.

“We want Fairensics to be a dynamic library, where new insights and new methods can be found and explored by researchers and non-researchers alike,” he says. “There are gaps in how the problem of bias is currently being addressed. We want to help bridge them.”

From 2016 to 2018, Digital Impact awarded grants to research teams looking to advance the safe, equitable, and effective use of digital resources for social good. With support from the Bill & Melinda Gates Foundation, the Digital Impact Grants program awarded more than half a million dollars over three years.