Christopher Dengsø - Moderation API

Tutorial

How to handle users reporting inappropriate content

Users often come across inappropriate content, and it's crucial for social platforms to handle this scenario effectively. Allowing your users to report content builds trust, maintains a safe online environment, and ultimately improves the bottom line. But as your user base grows, managing these reports can become challenging.

New API endpoints for wordlists and review queues

Today we rolled out a suite of new API endpoints designed to improve the experience with our Wordlists and Review Queues for enterprise plans. These enhancements offer greater flexibility if you're aiming to customise your moderation interface and leverage our robust moderation and review queue engine.

New model

New image toxicity model

The new image toxicity model adds a single but robust label for detecting and preventing harmful images. Where the image NSFW model can distinguish between multiple types of unwanted content, it can fail to generalise to toxic content outside of the provided labels. The toxicity model on the other hand

Object moderation endpoint

Until now, Moderation API allowed for the moderation of individual pieces of text or images. In practice, there’s often a need to moderate entire entities composed of multiple content fields. While one solution has been to call the API separately for each field, this approach can be inefficient and

Smart wordlists for moderation now available

You can now add smart wordlists that understand semantic meaning, similar words, and obfuscations in your Moderation API projects. When to use a wordlist: in many cases an AI agent is a better solution for enforcing certain guidelines because it understands context and intent, but wordlists are useful if you

Updates

Llama-Guard 3 on Moderation API

Llama Guard 3, now on Moderation API, offers precise content moderation with Llama-3.1. It’s faster and more accurate than GPT-4, perfect for real-time use and customizable for nuanced moderation needs.

Tutorial

How to self-host Llama-Guard 3 for content moderation

Llama-Guard is one of the best models currently available for content moderation. Here's a guide on everything from setting up a server to configuring Llama for your use case.

Context awareness for message moderation

Context is crucial in content moderation. A message might seem innocent in one context but hateful in another. You can already supply contextId and authorId with content, which helps you understand the context when reviewing items in the review queue. Now you can also enable

GPT-4o-mini for AI Agents

Update: since the creation of this post we've also added Llama Guard 3. Llama Guard 3 is now the recommended model for AI agents. Read about Llama Guard here. OpenAI have just released their latest model, GPT-4o-mini. We're excited about the updated model and are already

Updates

Upgraded Content Analytics

I'm excited to announce the launch of the upgraded analytics dashboard that will provide deeper insights into user behaviour and content trends on your platform. With a privacy-first design, these analytics tools will allow you to track and understand how people are using and potentially abusing your platform,

Updates

Review Queue Label Filter

You can now add custom label filters for review queues. This allows you to create queues like:

* Show items with the POSITIVE label to find positive user comments.
* Show items where the TOXICITY label has a score between 20% and 70% to find content where the AI is uncertain.
* Filter

Updates

Image moderation now available

I'm excited to announce that you can now moderate images with Moderation API. Setting up image moderation works similarly to text moderation. You can adjust thresholds and disable labels that you do not care about when flagging content. We offer 9 different labels out of the box -