Llama Guard 3 on Moderation API
Llama Guard 3, now on Moderation API, offers precise content moderation with Llama-3.1. It’s faster and more accurate than GPT-4, perfect for real-time use and customizable for nuanced moderation needs.
Llama Guard is one of the best models available for content moderation right now. Here's a guide on everything from setting up a server to configuring Llama Guard for your use case.
Context is crucial when handling content moderation. Something might seem innocent in one context, but hateful in another. You can already supply contextId and authorId with content, and this can help you understand the context when reviewing items in the review queue. Now you can also enable
Update: since this post was published we've also added Llama Guard 3. Llama Guard 3 is now the recommended model for AI agents. Read about Llama Guard here. OpenAI has just released their latest model, GPT-4o-mini. We're excited about the updated model and are already
I'm excited to announce the launch of the upgraded analytics dashboard that will provide deeper insights into user behaviour and content trends on your platform. With a privacy-first design, these analytics tools will allow you to track and understand how people are using and potentially abusing your platform,
You can now add custom label filters for review queues. This allows you to create queues like:

* Show items with the POSITIVE label to find positive user comments.
* Show items where the TOXICITY label has a score between 20% and 70% to find content where the AI is uncertain.
* Filter
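The score-band filter described above can be sketched in a few lines. This is a hypothetical illustration only — the item shape, label names, and scores below are assumptions, not the actual Moderation API schema:

```python
# Hypothetical review-queue items; the schema is an assumption for illustration.
items = [
    {"id": "a1", "labels": {"TOXICITY": 0.10, "POSITIVE": 0.85}},
    {"id": "b2", "labels": {"TOXICITY": 0.45, "POSITIVE": 0.05}},
    {"id": "c3", "labels": {"TOXICITY": 0.92, "POSITIVE": 0.01}},
]

def in_band(item, label, low, high):
    """True if the label's score falls inside the [low, high] band."""
    score = item["labels"].get(label, 0.0)
    return low <= score <= high

# Items where the AI is uncertain: TOXICITY between 20% and 70%.
uncertain = [i["id"] for i in items if in_band(i, "TOXICITY", 0.20, 0.70)]
print(uncertain)  # → ['b2']
```

A band like 20%–70% is useful precisely because scores near 0% or 100% rarely need human review.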
I'm excited to announce that you can now moderate images with moderation API. Setting up image moderation works similarly to text moderation. You can adjust thresholds, and disable labels that you do not care about when flagging content. We offer 9 different labels out of the box -
We are thrilled to kick off 2024 with a host of exciting new features, and we have many more in store for the year ahead. Label Thresholds In your moderation project, you now have the ability to adjust the sensitivity per label, providing fine-grained control over content flagging. Additionally, you
We've just made 4 new classifier models available in your dashboards:

* Sexual model
* Discrimination model
* Self harm model
* Violence model
* New features:
  * Add your own options to actions. Useful if you need to specify why an action was taken, for example an item was removed because of "Spam", "Inappropriate", etc.
  * Select specific queues an action should show up in.
* Performance improvements: much better responsiveness and speed
The Clbuttic Mistake, also known as the Scunthorpe problem, has frustrated users and developers alike for years. But could new solutions finally make this problem a thing of the past? Let's explore.
We have been hard at work to develop new features and enhance the experience with Moderation API. Today, we are incredibly excited to announce: 1. A brand-new feature to create and train custom AI models 🛠 2. A new Sentiment Analysis model 🧠🌟 Introducing Custom Models 🌟 Say hello to the era of
Updates
A new sentiment model just became available in your dashboards. In our evaluations the new model surpasses other solutions on the market at understanding underlying sentiment in more complex sentences. This is likely due to the underlying large language model with its remarkable contextual understanding. The model detects
Tutorial
Complex real-world problems often require tailored solutions, rather than a one-size-fits-all approach. For example, different businesses have unique values and objectives, meaning their AI models need to reflect their distinct requirements. Traditionally, training a production-level classifier was a resource-intensive and time-consuming process that required significant amounts of data to label
Updates
We've released a new NSFW ("Not Suitable For Work") model for detecting NSFW or otherwise sensitive text. However, it's still in the experimental stage, so we recommend using it alongside your existing models. The model can detect and categorize UNSAFE or SENSITIVE content. It
For security reasons, all accounts have a character limit of 10,000 characters per request. If you have cases where you need to analyze a large amount of text, you might need to increase this limit. Now you can: just send us a message at support@moderationapi.com. In
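If you'd rather stay within the limit than raise it, long text can be split into pieces that each fit under 10,000 characters before sending. A minimal sketch, assuming you prefer breaking at whitespace so words stay intact (the limit value is the only figure taken from the post):

```python
def chunk_text(text, limit=10_000):
    """Split text into pieces no longer than `limit` characters,
    preferring to break at whitespace so words stay whole."""
    chunks = []
    while len(text) > limit:
        cut = text.rfind(" ", 0, limit)
        if cut == -1:          # no space found: hard cut at the limit
            cut = limit
        chunks.append(text[:cut])
        text = text[cut:].lstrip()
    if text:
        chunks.append(text)
    return chunks

parts = chunk_text("word " * 4_000, limit=10_000)
assert all(len(p) <= 10_000 for p in parts)
```

Each chunk can then be submitted as its own request. Note that splitting can separate context across chunks, so split on paragraph or sentence boundaries where possible.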
I'm excited to announce 3 new models available in the dashboard as of today! In this post, I'll briefly go over each of them and describe some interesting use cases. Each analyzer returns the respective scores of each label and decides on the label with the
As we added more and more models, we saw the dashboard getting cluttered. To improve this and prepare for upcoming features, we've changed to a search interface where you add each model to a project. This makes for a powerful workflow where you can add the specific models needed per
We've just published a new model for recognizing a collection of sensitive numbers.
Improvements
We've discovered and fixed an issue where sentences added to word-lists would not be detected properly.
Updates
You will now see a word list tab on the dashboard. You can create multiple word lists that can be used across your filters and analyzers. For example, create a word list for mild swear words that you want to allow in your app, and leave our profanity filter to
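The allow-list workflow described above can be sketched as follows. This is a toy illustration only — the word lists and the flagging logic are assumptions, not how Moderation API's filters work internally:

```python
# Hypothetical word lists for illustration.
PROFANITY = {"damn", "heck", "darn"}        # stand-in for the built-in filter
ALLOWED_MILD = {"heck", "darn"}             # your custom "mild swear words" list

def should_flag(text):
    """Flag only profanity that is NOT on the custom allow list."""
    words = set(text.lower().split())
    hits = (words & PROFANITY) - ALLOWED_MILD
    return bool(hits)

print(should_flag("what the heck"))   # → False (allowed by the word list)
print(should_flag("damn it"))         # → True
```

The point of a separate allow list is that you can reuse it across multiple filters and analyzers instead of editing each one.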
Zapier
In this tutorial, I'll be showing you how to automatically detect and remove contact details from unstructured text. We'll be using Zapier to automate the workflow, and the Moderation API integration to detect and moderate text coming from TypeForm submissions. In this example the data
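To make the redaction step concrete, here is a simplified, illustrative sketch. The actual tutorial relies on Moderation API's detection models; these regexes and the placeholder strings are my assumptions and will miss many real-world formats:

```python
import re

# Illustrative patterns only -- real contact-detail detection is much harder.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE = re.compile(r"\+?\d[\d \-()]{7,}\d")

def redact_contact_details(text):
    """Replace emails and phone-like numbers with placeholders."""
    text = EMAIL.sub("{{email hidden}}", text)
    text = PHONE.sub("{{number hidden}}", text)
    return text

print(redact_contact_details("Reach me at jane@example.com or +45 12 34 56 78."))
# → Reach me at {{email hidden}} or {{number hidden}}.
```

In the Zapier workflow, a step like this would sit between the form-submission trigger and whatever action stores or forwards the cleaned text.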
Zapier
In this tutorial, I'll be showing you how to automatically detect and remove swear words from text using Zapier and the Moderation API.
Updates
We've updated the API with a new language detection endpoint. This is the first of our analysis endpoints that we're working on at the moment. The endpoint detects over 160 languages using probabilistic reasoning. The more words you give it, the higher the confidence in the
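The intuition behind "more words, higher confidence" can be shown with a toy model. To be clear, this is not the endpoint's actual implementation (its internals aren't public); it only mimics the idea that each additional word contributes evidence toward one language:

```python
from collections import Counter

# Tiny made-up word-frequency profiles; real systems use character n-grams
# over large corpora for 160+ languages.
PROFILES = {
    "en": Counter({"the": 5, "and": 4, "is": 3}),
    "da": Counter({"og": 5, "det": 4, "er": 3}),
}

def detect(text):
    """Return (best_language, share_of_total_evidence)."""
    words = text.lower().split()
    scores = {
        lang: sum(profile[w] for w in words)
        for lang, profile in PROFILES.items()
    }
    total = sum(scores.values()) or 1
    best = max(scores, key=scores.get)
    return best, scores[best] / total

lang, confidence = detect("the cat is here and the dog")
# Longer input accumulates more evidence, so the winning
# language's share of the total grows.
```

With only one or two words, several languages may score similarly; with a full sentence the evidence usually concentrates on one language, which is why the endpoint's confidence rises with input length.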