Reddit introduces an AI-powered tool that will detect online harassment

The new "harassment filter" is trained on previously flagged content.

Chase DiBenedetto

on March 7, 2024

Credit: Jonathan Raa / NurPhoto via Getty Images

Reddit has introduced an AI-powered safety filter that will help sift out posts that contain harassing or other objectionable content.

The "harassment filter," quietly added to the platform's support page last week and spotted by Android Authority, uses a large language model (LLM) "trained on moderator actions and content removed by Reddit’s internal tools and enforcement teams," Reddit explains. The tool is intended to support the already tenuous work of Reddit moderators tasked with supervising the online communities they're a part of.

Just last month, Bloomberg reported that Reddit had signed a content licensing deal with a major "AI player," which would offer site and user data to train potential AI tech.

When a community and its moderators turn on the filter, a new flag will appear in the site's mod queue indicating content (posts and comments) that has been flagged as "potential harassment." Moderators can then approve or remove the content, and report back to Reddit on whether it was accurately detected.

The platform has introduced a slew of new features and updated experiences in recent months, ahead of its stock market debut this month. Last year, Reddit announced the Modmail Harassment Filter, which acts like a "spam" folder for moderator messages containing potentially abusive content.

How to set up Reddit's harassment filter

On desktop, go to the About Community tab on the right sidebar and select Mod Tools. On iOS and Android, tap the Mod Tools button below your community's banner.
Go to Moderation, then select Safety.
Select the Harassment filter option and toggle it on.
Choose between the Low and High filter options. The Low setting filters the least content but is more accurate in spotting harassment. The High setting does a broader sweep and will therefore filter more posts. Reddit recommends using the High option if your community encounters a "significant amount of harassing content."
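
The trade-off between the two settings can be pictured as a confidence cutoff. Reddit has not published how the filter scores content or what Low and High correspond to internally, so the sketch below is purely illustrative: the `THRESHOLDS` values, the `Item` class, the `filter_queue` function, and the sample scores are all hypothetical. It only shows why a stricter cutoff flags fewer items while a looser one sweeps in more.

```python
from dataclasses import dataclass

# Hypothetical cutoffs: Reddit has not said how Low and High map to model scores.
THRESHOLDS = {"low": 0.9, "high": 0.6}


@dataclass
class Item:
    text: str
    score: float  # assumed model confidence that the item is harassment (0.0 to 1.0)


def filter_queue(items, level):
    """Return the items that would be held for moderator review at this level."""
    cutoff = THRESHOLDS[level]
    return [item for item in items if item.score >= cutoff]


# Made-up sample data for illustration only.
queue = [
    Item("comment A", 0.95),
    Item("comment B", 0.70),
    Item("comment C", 0.30),
]

low_flagged = [item.text for item in filter_queue(queue, "low")]
high_flagged = [item.text for item in filter_queue(queue, "high")]

print(low_flagged)   # ['comment A']
print(high_flagged)  # ['comment A', 'comment B']
```

In this toy example, the Low setting holds back only the item the model is most sure about, while High also catches the borderline one, at the cost of more content for moderators to review.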

While Reddit says administrators will continue to automatically remove posts that directly violate Reddit’s Content Policy, the harassment filter gives communities oversight of objectionable but still "policy-complying" content that might slip through the cracks.
