Monitoring and responding to hate speech is an ever-increasing challenge for governments, social platforms, the media and brands. To react to hate speech, companies first have to detect and understand it. This is often very difficult, given the way that languages evolve.
Timothy Quinn is the founder of Hatebase, a Canada-based organisation dedicated to detecting and understanding hate speech. Its technology analyses the language used on the web, including its structures, and then contextualises the resulting data. It sells this database to companies that lack the expertise or the resources to do this themselves.
At DIS 2020, Timothy will be explaining why hate speech is increasing and what his company and its clients can do to identify and counter it. Here he outlines why companies use Hatebase’s services and how hate speech differs on the various online platforms.
Can you briefly explain what Hatebase is and how you got started?
Hatebase was founded in 2013 as a pilot project with a Canadian NGO that was working to mitigate the risk of mass atrocities in various parts of the world. The initial idea was to monitor hate speech as a potentially quantifiable precursor to violence. Subsequently, we found that not only were other governmental and non-governmental entities using our data, but a variety of other organizations were also leveraging it. These included law enforcement agencies interested in reducing violence in at-risk neighborhoods, and social networks concerned about the increase of hate speech in their ecosystems.
What is the key problem you are trying to solve?
Hate speech degrades public conversation, silences diverse and under-represented viewpoints, and can be an early indicator of violence. Our primary goals are therefore to:
Reduce incidents of hate speech by monitoring the use and dissemination of discriminatory language against targeted groups
Lessen the acceptability of hate speech by encouraging counter-messaging and awareness of the impact of hate speech
Where possible, prevent violence that is predicated on hate speech
And why do your clients use your services?
We help organisations that are trying to reduce hateful content on their platforms. This is a significant problem for companies like Facebook, Twitter and TikTok, because discriminatory content and online harassment alienate legitimate users, spook advertisers, and risk large regulatory penalties in countries like Germany, the UK, Australia and France.
It’s increasingly unrealistic to expect every online ecosystem to build from scratch the linguistic expertise necessary to maintain a large, changing lexicon of multilingual hate speech, and to know how to apply that lexicon accurately in a variety of contexts... so that’s where a company like Hatebase comes in. We have expertise in hate speech moderation that other companies can benefit from, the same way they leverage Gmail to run their email infrastructure or Cloudflare to protect them from DDoS attacks.
Are there differences in the way hate speech is used on the various social networks? For example, does hate speech on Facebook differ from hate speech on Twitter?
There are differences in the way various platforms define and moderate hate speech, and there are differences in how creators of hate speech use those platforms to spread their content. Every ecosystem has a different ratio of automated to manual moderation, and a different threshold beyond which it will quarantine content it finds objectionable. On platforms that take a more sophisticated, proactive approach to moderating hate speech (which are far from the majority), we see creators of hate speech responding by being more creative with their messaging: for instance, using code words, dog whistles, double entendres, leetspeak and other forms of text obfuscation.
Do you have any theories as to why we are seeing a rise in hate speech? Is it just controversial politics (Trump, Brexit, etc.), or is there something more sinister?
Although most NGOs who monitor hate speech agree that hate speech is increasing, and is exacerbated by populist xenophobic movements around the world, it’s also important to realize that the overall volume of online conversation is increasing, which makes it difficult to figure out how to measure “net new hate”. It’s like measuring diagnoses of medical disorders, which may increase as a result of new disease vectors or rising environmental risks, or may simply be a statistical artifact of better and more aggressive monitoring.
At the end of the day, more hate and more opportunities for hate are interrelated phenomena, which is evident in the increasing homogeneity of online discrimination. Language which might have previously been regionally limited is now much more broadly accessible, and this has an exponential effect on the palatability of hate speech in communities everywhere.
What is the trickiest part of maintaining a database of hate speech?
Hate speech is a moving target for two reasons. First, every organisation has its own definition of hate speech, which is influenced by the purposes for which it is monitoring it. What the United Nations or the Council of Europe considers hate speech is very different from what Facebook or Twitter considers hate speech. Second, language is always in flux as new terms enter the lexicon through slang, memes or linguistic migration. This puts a tremendous responsibility on Hatebase to be aware of the myriad ways people disparage each other across nearly a hundred languages and nearly two hundred countries. It's something no one ever gets exactly right, because of the fluid nature of language and the changing economic and political thresholds that define hate speech.
How do you work with NGOs and volunteers? What is the split between humans and machine learning?
Hatebase uses a combination of technological and human assets to curate its data. We work with large and small NGOs in various linguistically diverse areas of the world to acquire new vocabulary, and also have a committed group of 70+ Citizen Linguists who help us further grow and moderate our multilingual vocabulary. We then use our natural language processing technology, which we call HateBrain, to search various public data sources for incidents of hate speech.
We also benefit from the 275+ universities around the world using our data to better understand how and why hate speech spreads, since this research helps make our algorithms smarter.
How do you imagine that Hatebase will evolve in the future?
Although we’ve worked closely over the past year with some of the world’s largest social networks, we’re already seeing an increasing need to work in other sectors of the economy. Media, multiplayer gaming, online retail and customer support, travel and hospitality: all of these industries share the same problem of having to moderate an increasing volume of hateful content.