When Donna Strickland won the Nobel Prize this month, she became only the third woman in history to receive the award in physics. An optical physicist at the University of Waterloo, Strickland is brilliant, accomplished and inspiring. To use Wikipedia parlance, she is very clearly notable.
Except that, somehow, she wasn’t. Despite her groundbreaking research on a method of generating laser beams with ultrashort pulses, Strickland did not have a Wikipedia page until shortly after her Nobel win.
Perhaps more disconcerting, a volunteer Wikipedia editor had drafted a page about Strickland in March only to have it declined in May. The reason: There wasn’t enough coverage of Strickland’s work in independent secondary sources to establish her notability. Her achievements simply weren’t documented in enough news articles that Wikipedia editors could cite.
Before Wikipedia points a finger that might rightly be pointed back at us, let me acknowledge that Wikipedia’s shortcomings are absolutely real. Our contributors are majority Western and mostly male, and these gatekeepers apply their own judgment and prejudices. As a result, Wikipedia has dozens of articles about battleships and not nearly enough on poetry. We’ve got comprehensive coverage on college football but significantly less on African marathoners.
At the same time, Wikipedia is by design a living, breathing thing — a collection of knowledge that many sources, in aggregate, say is worth knowing. It is therefore a reflection of the world’s biases more than it is a cause of them.
We are working to correct biases in Wikipedia’s coverage. For instance, in 2014, Wikipedia editors evaluated all the biographies on English Wikipedia and found that only about 15% of them were about women. To rectify the imbalance, groups of volunteers, including the WikiProject Women Scientists and WikiProject Women in Red, have been identifying women who should have pages and creating articles about them.
Today, 17.82% of our biographies are about women. This gain of nearly 3 percentage points may not sound like much, but it represents 86,182 new articles. That works out to 72 new articles a day, every single day, for the past three and a half years.
But signs of bias pop up in different ways. A 2015 study found that, on English Wikipedia, the word “divorced” appears more than four times as often in biographies of women as in biographies of men. We don’t fully know why, but it’s likely a multitude of factors, including the widespread tendency throughout history to describe the lives of women through their relationships with men.
Technology can help identify such problems. Wikipedia articles about health get close attention from our community of medical editors, but for years, some articles on critical women’s health issues, such as breastfeeding, languished under a “low importance” categorization. An algorithm identified this mistake.
But there is only so much Wikipedia itself can do. To fix Wikipedia’s gender imbalance, we need our contributors and editors to pay more attention to the accomplishments of women. This is true across all under-represented groups: people of color, people with disabilities, LGBTQ people, indigenous communities.
Although we don’t believe that only women editors should write pages about other women, or writers of color about people of color, we do think that a broader base of contributors and editors — one that included more women and people of color, among others — would naturally help broaden our content.
Wikipedia is founded on the concept that every individual should be able to share freely in the sum of all knowledge. We believe in “knowledge equity,” which we define as the idea that diverse forms of knowledge should be recognized and respected. Wikipedia is not limited to what fits into a set of encyclopedias.
We also need other fields to identify and document diverse talent. If journalists, book publishers, scientific researchers, curators, academics, grant-makers and prize-awarding committees don’t recognize the work of women, Wikipedia’s editors have little foundation on which to build.
Increasingly, Wikipedia’s content and any biases therein have ramifications well beyond our own website. For instance, Wikipedia is now relied upon as a major source in the training of powerful artificial intelligence models, including models that underlie common technologies we all use.
In such training processes, computers ingest large data sets, draw inferences from patterns in the data and then generate predictions. As is well understood in the programming world, bad or incomplete data generate biased outcomes. This phenomenon is known by the acronym GIGO: garbage in, garbage out.
People may intuitively understand that Wikipedia is a perennial work in progress. Computers, on the other hand, simply process the data they’re given. If women account for only 17.82% of the data, we may find ourselves with software that thinks women are only 17.82% of what matters in the world.
It is true that Wikipedia has a problem if Donna Strickland, an accomplished physicist, is considered worthy of a page only when she receives the highest possible recognition in her field. But this problem reflects a far more consequential and intractable problem in the real world.
Wikipedia would like to encourage other knowledge-generating institutions to join us in our efforts to balance this inequity. We may not be able to change how society values women, but we can change how women are seen, and ensure that they are seen to begin with. That’s a start.