New anti-harassment AI program has 90% success rate

And Big Tech needs to notice.

Carmen Triola

on August 3, 2016

Yahoo has developed an AI algorithm that it says can correctly detect up to 90 percent of abusive comments online, making it outperform other "state-of-the-art" deep-learning-based algorithms, according to a report by the algorithm's developers.

"While automatically detecting abusive language online is an important topic and task, the prior [abuse detection] has not been very unified, thus slowing progress ... [abuse] can have a profound impact on the civility of a community or a user’s experience," the developers wrote.

The algorithm used a mix of machine learning and crowdsourced abuse detection to scan the comment sections of Yahoo News and Finance.

Currently, most abusive language detectors are keyword-based systems. The problem is that abusers might avoid certain words to avoid the filters or come up with new slang. Additionally, these systems are bad at reading for context or sarcasm.
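
To see why keyword filters fall short, here is a minimal sketch of one in Python. The blocked-word list and the function name are made up for illustration and are not taken from Yahoo's or anyone else's system.

```python
import re

# Hypothetical blocklist, for illustration only.
BLOCKED_WORDS = {"idiot", "moron"}


def keyword_filter(comment: str) -> bool:
    """Return True if the comment contains a blocked word as a whole token."""
    tokens = re.findall(r"[a-z']+", comment.lower())
    return any(token in BLOCKED_WORDS for token in tokens)


# A plain insult is caught.
caught = keyword_filter("You are an idiot")

# Swapping one letter for a digit splits the token and slips past the filter.
evaded = keyword_filter("You are an id1ot")

# Sarcasm contains no blocked word at all, so the filter sees nothing wrong.
sarcasm = keyword_filter("Nice work, genius. Really.")
```

Only the first comment is flagged. The misspelled insult and the sarcastic one pass straight through, which is the gap a feature-based approach tries to close.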

Yahoo, on the other hand, went a little deeper and was able to track responses by the length of comments and words and by the number of punctuation marks, URLs and capitalized letters. It also tracked usage of "politeness words," modal verbs (like "could," "would" or "should," which can indicate either hedging or a confident speaker) and "insult and hate blacklist words." All in all, the algorithm outperformed Yahoo's old detectors by about 10 percent.
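
As a rough illustration of what counting those signals could look like, here is a short Python sketch. It is not Yahoo's code: the word lists, the feature names and the `extract_features` function are all assumptions made for this example, and a real system would feed numbers like these into a trained classifier rather than use them directly.

```python
import re
import string

# Hypothetical word lists, for illustration only; not Yahoo's actual lexicons.
POLITENESS_WORDS = {"please", "thanks", "thank", "sorry"}
MODAL_VERBS = {"could", "would", "should", "can", "may", "might", "must"}
BLACKLIST_WORDS = {"idiot", "moron"}

URL_PATTERN = re.compile(r"https?://\S+")
WORD_PATTERN = re.compile(r"[A-Za-z']+")


def extract_features(comment: str) -> dict:
    """Count simple surface signals in a single comment."""
    urls = URL_PATTERN.findall(comment)
    # Remove URLs first so their letters and punctuation are not counted as text.
    text = URL_PATTERN.sub(" ", comment)
    words = WORD_PATTERN.findall(text)
    lowered = [word.lower() for word in words]
    avg_word_length = sum(len(w) for w in words) / len(words) if words else 0.0
    return {
        "comment_length": len(comment),
        "word_count": len(words),
        "avg_word_length": avg_word_length,
        "punctuation_count": sum(1 for ch in text if ch in string.punctuation),
        "url_count": len(urls),
        "capital_letter_count": sum(1 for ch in text if ch.isupper()),
        "politeness_count": sum(1 for w in lowered if w in POLITENESS_WORDS),
        "modal_count": sum(1 for w in lowered if w in MODAL_VERBS),
        "blacklist_count": sum(1 for w in lowered if w in BLACKLIST_WORDS),
    }


features = extract_features("Could you PLEASE stop, you idiot! See http://example.com")
```

For that sample comment, the sketch counts one URL, one modal verb ("could"), one politeness word ("please") and one blacklist word, alongside the length and capitalization counts. None of these numbers says "abusive" on its own; the point is that together they give a classifier more to work with than a single banned-word match.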

Specially trained Yahoo employees also looked at the same comments and rated them as abusive or not, which helped to train the algorithm to look for implicit abuse. (The annotated database of what was marked as abusive will soon be available online on Yahoo Webscope.)

Yahoo crowdsourced abuse ratings from Amazon's Mechanical Turk as well, which allows anyone to sign up and filter through images or language. Participants received $0.02 for every comment they tried to categorize as abusive or not. These people, however, had not been trained in abuse detection like Yahoo's employees, and were found to be much worse at it. So even with the AI, human judgement is still vital to the operation.

Ultimately, the program might be severely limited as abuse is still defined by Yahoo itself and not the user -- unlike Instagram's new anti-harassment measures, which allow users to filter out comments with certain words. The program also may not be able to find fake accounts or filter through abusive pictures or videos tweeted at a user, which happened to Leslie Jones on Twitter.

Some websites also have double standards when it comes to their community guidelines. On Facebook, Australian journalist and feminist activist Clementine Ford pointed out that she'd been blocked for telling a man to "fuck off" while someone who called her a "diseased whore" hadn't been flagged. That post was ultimately removed, but the situation is a good example of how nuanced it can be to determine what is offensive and what isn't.

For instance, critics argue that women's bodies are often policed unnecessarily, with sites censoring images that show women's nipples or period stains.

Still, the team at Yahoo remains optimistic.

"As the amount of online user generated content quickly grows, it is necessary to use accurate, automated methods to flag abusive language," they wrote. "In our work we take a major step forward."
