Is your software racist?


Late last year, a St. Louis tech executive named Emre Şarbak noticed something strange about Google Translate. He was translating phrases from Turkish, a language that uses a single gender-neutral pronoun “o” instead of “he” or “she.” But when he asked Google’s tool to turn the sentences into English, they seemed to read like a children’s book out of the 1950s. The ungendered Turkish sentence “o is a nurse” would become “she is a nurse,” while “o is a doctor” would become “he is a doctor.”

The website Quartz went on to compose a sort-of poem highlighting some of these phrases; Google’s translation program decided that soldiers, doctors and entrepreneurs were men, while teachers and nurses were women. Overwhelmingly, the professions were male. Finnish and Chinese translations had similar problems of their own, Quartz noted.

What was going on? Google’s Translate tool “learns” language from an existing corpus of writing, and the writing often includes cultural patterns regarding how men and women are described. Because the model is trained on data that already has biases of its own, the results that it spits out serve only to further replicate and even amplify them.

It might seem strange that a seemingly objective piece of software would yield gender-biased results, but the problem is an increasing concern in the technology world. The term is “algorithmic bias” — the idea that artificially intelligent software, the stuff we count on to do everything from power our Netflix recommendations to determine our qualifications for a loan, often turns out to perpetuate social bias.

Voice-based assistants, like Amazon’s Alexa, have struggled to recognize different accents. A Microsoft chatbot on Twitter started spewing racist posts after learning from other users on the platform. In a particularly embarrassing example in 2015, a black computer programmer found that Google’s photo-recognition tool labeled him and a friend as “gorillas.”

Sometimes the results of hidden computer bias are insulting, other times merely annoying. And sometimes the effects are potentially life-changing. A ProPublica investigation two years ago found that software used to predict inmates’ likelihood of being a high risk for recidivism was nearly twice as likely to be inaccurate when assessing African-American inmates versus white inmates. Such scores are increasingly being used in sentencing and parole decisions by judges, directly affecting the way the criminal justice system treats individual citizens. Crucial pieces of software can have big societal effects, and their biases can often go unnoticed until the results are already being felt.

The industry knows it has a problem; Google took a huge public relations hit after its gorilla-photo scandal. But the issue keeps cropping up, often hidden inside proprietary “black box” software packages, and compounded by the cultural blind spots of a disproportionately white and male tech industry. The problem is now landing squarely in the public-policy realm, and leaders are struggling with how to fix it.

THE UPBEAT WAY to talk about algorithms in public life is “smart governance” — the idea that software can give leaders quick answers and better tools to make decisions. Given their ability to crunch a massive amount of information in rapid fashion, algorithms are expected to become more and more important to decision-making at every level. Already, they’re being used to determine individuals’ eligibility for welfare, the number of police that should be sent to different neighborhoods and the residents most in need of public health assistance.

As they’ve caught on, the impressive potential of smart governance has become clouded by the uncertainty over just how those “smart” systems are sizing up people. The potential for underlying bias in software is not an easy issue for political leaders to tackle, in part because it’s so deeply technical. But regulators have begun taking notice at the federal level. A 2016 report from the Obama-era Office of Science and Technology Policy warned that the impact of artificial-intelligence-driven algorithms on workers has the potential to worsen inequality, and noted that bias buried in computer code could disadvantage individuals in a host of fields. (It’s not clear that the current White House shares its concerns: Microsoft and Google researchers behind the AI Now Initiative, which works with the American Civil Liberties Union, have warned about the Trump administration’s lack of engagement with AI policy.)

For all the agreement that bias is an issue, it’s far from clear just how to tackle it. One piece of legislation introduced in Congress does mention it; the Future of AI Act, sponsored by a small bipartisan group in the House and Senate, includes a plank titled “Supporting the unbiased development of AI.” The provision, though pioneering, doesn’t offer a solution: It would set up a 19-person federal advisory committee within the Commerce Department to track the growth of such technology and provide recommendations about its impact.

It’s uncertain whether this bill would get serious consideration, and if it did, that advisory committee would have its hands full. For one, the problem of hidden software bias is as varied as the number of algorithms out there. Because each algorithm learns from different data sets and features its own unique design, it’s tough to develop a standardized set of requirements that would apply to every distinct model. On top of all that, the software packages that contain the algorithms — even those used in public policy — are often proprietary, owned and guarded by the companies that developed them. Government bodies that use AI-driven software don’t necessarily have rights to look at the underlying code.

In the case of the ProPublica investigation into recidivism bias, for example, the algorithm was inside a piece of software called COMPAS, used by various states to estimate the likelihood and severity of any future crime a released prisoner might commit. The software was developed by Northpointe, a private company that was acquired by the Toronto-based firm Constellation Software in 2011. Sentencing judges weren’t able to see into the internal workings of the model because the code was proprietary. Some states have done statistical analyses to evaluate its accuracy, but the details of how it weighs variables correlated with race remain an open question. (Researchers are continuing to look into it.) Northpointe ultimately did share the loose structure of its algorithm with ProPublica, but declined to share specific calculations, and has since disputed the story’s conclusions.

Within the field, many now say there’s overwhelming acknowledgement about the need to tackle the issue. The conversation has moved from “What do you mean there’s a problem?” to “Oh no, we need to fix it,” said University of Utah professor Suresh Venkatasubramanian, one of the academics who contributed to the OSTP’s review on the subject.

A number of the efforts aimed at curbing bias are pressing companies to shine a light on the algorithms they create. “Many of the ill effects are not intentional. It comes from people designing technology in closed rooms in close conversations and not thinking of the real world,” Rachel Goodman, a staff attorney for the ACLU’s racial justice program, told Fast Company.

In December, the New York City Council passed a bill to establish a task force dedicated to reviewing algorithms that the city employs for everything from school placement to police distribution. The bill, which is now officially law, sets up a task force that is expected to include representatives from city departments that use algorithms, members of the tech and legal industries, as well as technical ethicists.

One key focus of this group would be to test how the city’s algorithms affect different groups of New Yorkers. For example, the task force could look at whether an algorithm used to determine bail eligibility treats white and black offenders differently. The unit is also expected to figure out how to alert residents when they are affected by a decision made by an algorithm. If a model tells law enforcement to dispatch more police officers to one part of the city, that could prompt government officials to notify residents of that neighborhood about the change. Additionally, the task force would examine how the data used to train these algorithms can be more broadly disclosed.

“If we’re going to be governed by machines and algorithms and data, well, they better be transparent,” NYC Council member James Vacca said during a hearing of the body’s technology committee. “As we advance into the 21st century, we must ensure our government is not ‘black boxed.’”

Mandating the open-sourcing of algorithms is one way lawmakers could put pressure on private firms, said University of San Francisco professor Rachel Thomas, the founder of, a nonprofit research lab and coding course that teaches users how to build machine learning systems. Thomas argues that governments can refuse to take bids from black-box software providers, and require all contractors to divulge their source code and explain how their systems work.

“Ideally, it would be great if the data could be open-source as well,” she said.

The city of Pittsburgh recently put the issue of transparency front and center when implementing a new software program developed by the Allegheny County Department of Human Services to figure out which children were most susceptible to abuse and neglect. Unlike many models bought from software companies, the risk algorithm is owned by Allegheny County itself, so all records about its internal workings are public and available for researcher scrutiny. The government officials implementing the program were highly cognizant of the potential for bias and elected not to use race as a factor in the model. (They do, however, acknowledge that there are variables in the algorithm like “criminal justice history,” which could be correlated with race.) As of last December, the algorithm has been found to treat black and white families “more consistently” than human screeners had before, The New York Times reported.

Rich Caruana, a Microsoft researcher who has worked to better understand the internal mechanisms of algorithms, said that omitting variables like gender and race isn’t always the solution to countering bias. In some cases, like medical predictions, these variables could be important to accuracy. And other variables, like ZIP codes, can correlate with race and introduce bias into models that don’t explicitly include race.
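To make the proxy problem concrete, here is a minimal sketch in Python. Everything in it is invented for illustration: the ZIP codes, the group labels "X" and "Y," the applicant counts and the toy rule standing in for a model. It shows only that a rule which never sees group membership can still produce very different outcomes by group when ZIP code and group are correlated.

```python
# Hypothetical applicants as (zip_code, group) pairs. The group label is
# kept only for the after-the-fact check; the "model" never sees it.
applicants = (
    [("11111", "X")] * 8 + [("11111", "Y")] * 2
    + [("22222", "X")] * 2 + [("22222", "Y")] * 8
)


def zip_only_model(zip_code):
    """Toy stand-in for a model: approves based on ZIP code alone."""
    return zip_code == "11111"


def approval_rate_by_group(people, model):
    """Return {group: share of that group the model approves}."""
    counts = {}
    for zip_code, group in people:
        approved, total = counts.get(group, (0, 0))
        counts[group] = (approved + int(model(zip_code)), total + 1)
    return {g: approved / total for g, (approved, total) in counts.items()}


rates = approval_rate_by_group(applicants, zip_only_model)
print(rates)
```

With this made-up data, most members of group X live in the approved ZIP code and most members of group Y do not, so the approval rates diverge even though group membership was never an input.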

In Europe, a sweeping new set of privacy regulations slated to take effect this spring strives to take a crack at the issue of transparency as well. One of its provisions would offer users a “right to explanation” when their data is processed by an automated system. But while it sounds straightforward, imposing a broad requirement like this can pose its own challenges.

“It’s not yet clear how this will work,” Caruana said. He worries that as appealing as “transparency” may sound, there’s no easy way to unpack the algorithm inside AI software in a way that makes sense to people. “There are cases where the law can go too far … too soon. Suddenly, everyone in the EU has a legal right to an explanation. Most of the time, we wouldn’t know how to do it,” he said.

EVEN IF IT’S hard to unpack an algorithm, simply putting a high priority on transparency could have useful effects. Two Oxford researchers published a paper arguing that one of the main benefits of the new European rules would be to force tech companies to consider transparency from the beginning when designing their products. The new rules could pressure firms to develop models that can be more easily outlined for a consumer.

And algorithmic transparency may not be the only way regulators could offer greater insight into biased systems. Another way would be scrutinizing the data that an algorithm learns from. If historic data showed that successful candidates for a certain job were mostly men, an algorithm that learned from that data could be more likely to recommend men for the job in the future, tilting the playing field. “If you give anything bad data, it’s going to give you a bad answer,” Venkatasubramanian notes. That’s apparently what was going on with the “gorilla” scandal of 2015: Google said the issue arose because its algorithm hadn’t conducted enough analysis of people with different skin tones, in different lighting. Similarly, Georgetown Law School researchers have found that facial recognition systems used by police are statistically worse at recognizing African-American faces because they’ve been trained on more images of white people.
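The hiring example can be sketched in a few lines of Python. This is a deliberately crude illustration, not how any real hiring system works: the historical records, the 0.5 threshold and the "learn the past hire rate" rule are all assumptions made up for the example. The point is only that a system fitted to skewed history hands that skew back as a recommendation.

```python
from collections import Counter

# Made-up hiring history as (gender, was_hired) pairs, skewed toward men.
history = (
    [("man", True)] * 80 + [("man", False)] * 20
    + [("woman", True)] * 4 + [("woman", False)] * 16
)


def train(records):
    """'Learn' the historical hire rate for each gender."""
    seen, hired = Counter(), Counter()
    for gender, was_hired in records:
        seen[gender] += 1
        hired[gender] += int(was_hired)
    return {g: hired[g] / seen[g] for g in seen}


def recommend(model, gender, threshold=0.5):
    """Recommend a candidate if the learned rate meets the threshold."""
    return model[gender] >= threshold


model = train(history)
print(model)
print(recommend(model, "man"), recommend(model, "woman"))
```

Nothing about an individual candidate's qualifications enters the decision here; the only signal is the pattern in the old data, which is exactly what gets reproduced.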

One idea for tackling the data problem—and a place that many experts believe Washington could play a useful role—is new industry-wide standards or benchmarks that algorithms need to meet before they can be used broadly in the wild. These standards could call for systems to be trained on equal amounts of data for users of different racial backgrounds and genders, for instance.

Researchers have suggested that government bodies could also develop a set of principles outlining priorities for fairness, to serve as a barometer both for policymakers’ use of algorithms and for the industry’s development of them. Having such standards could push regulators to check on the results their algorithms produce, and force them to evaluate how variables like gender and race, or proxies for them, are weighted in these outputs.

This kind of monitoring is key to spotting and preventing bias, said Sarah Tan, a former Microsoft researcher and current Cornell graduate student, who studies algorithms and fairness. Companies should be required to review their algorithm’s outcomes, a potentially more accessible measure than parsing a complicated formula, she suggests. A federal body like the Federal Trade Commission, which already oversees tech companies and consumer harm, could play a key role.
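An outcome review of the kind Tan describes does not require opening up the model at all. The Python sketch below is one possible shape for such a check, under stated assumptions: the decision records are invented, the groups "A" and "B" are placeholders, and the ratio of lowest to highest approval rate is just one of several measures an auditor might choose.

```python
from collections import defaultdict


def selection_rates(records):
    """records: iterable of (group, approved) pairs -> {group: approval rate}."""
    totals = defaultdict(int)
    approvals = defaultdict(int)
    for group, approved in records:
        totals[group] += 1
        if approved:
            approvals[group] += 1
    return {g: approvals[g] / totals[g] for g in totals}


def disparity_ratio(rates):
    """Lowest group rate divided by highest; 1.0 means equal rates."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # nobody was approved, so there is no gap between groups
    return min(rates.values()) / highest


# Hypothetical decisions logged from some system under review.
decisions = (
    [("A", True)] * 6 + [("A", False)] * 4
    + [("B", True)] * 3 + [("B", False)] * 7
)

rates = selection_rates(decisions)
ratio = disparity_ratio(rates)
print(rates, ratio)
```

A reviewer would still have to decide what gap is acceptable and why it arises; the calculation only flags where to look.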

The idea of auditing algorithms on a routine basis is something that’s come up repeatedly among researchers as a potential means for holding companies accountable, not only to track instances of bias, but also to measure the impact of opaque processes like social media models that recommend news stories. Former FCC Chairman Tom Wheeler has backed the concept of a “public-interest API,” which would call on firms like Facebook to provide the details of the outputs of its algorithm.

But as AI grows in importance, some believe the scope of the problems the industry faces will extend beyond the ability of legacy agencies to keep up. Some prominent voices, including Elon Musk, a founder of research group OpenAI, and former Department of Justice attorney-adviser Andrew Tutt, have proposed going so far as to establish a standalone federal agency that would oversee the development of AI, much like the Federal Communications Commission and Food and Drug Administration do for their respective industries.

The scale and uncertainty involved are part of what makes this problem so challenging, Venkatasubramanian said. “The tendency in technology is to assume there is a single answer,” he said. “It’s our instinct, it’s how we’ve been trained. It’s not clear to me that’s what’s going to happen here.”