New NSFW text detection model
We've released a new NSFW ("Not Safe For Work") model for detecting NSFW or otherwise sensitive text. It's still experimental, so we recommend running it alongside your existing models rather than replacing them.
The model detects content and labels it as UNSAFE or SENSITIVE, covering subjects such as profanity, violence, pornography, discrimination, and politics.
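To make the output shape concrete, here is a minimal sketch of what a request might look like. The endpoint URL, field names, and response format below are illustrative assumptions, not the documented API; check the API reference for the real details.

```python
import requests

# Hypothetical endpoint and credentials, for illustration only.
API_URL = "https://api.example.com/moderate/text"
API_KEY = "your-api-key"

def classify_text(text: str) -> dict:
    """Send text to the (hypothetical) NSFW model and return its labels."""
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"text": text, "model": "nsfw"},  # model identifier is assumed
        timeout=10,
    )
    response.raise_for_status()
    # Assumed response shape:
    # {"label": "UNSAFE" | "SENSITIVE" | "NEUTRAL", "categories": ["violence", ...]}
    return response.json()

result = classify_text("Some user-submitted text")
print(result["label"], result.get("categories", []))
```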
It overlaps somewhat with the existing toxicity and propriety models. However, where those were trained mainly on conversational data, the NSFW model was trained on a larger and more diverse data set. That doesn't make the NSFW model a better choice per se, but it can offer higher accuracy and confidence in some edge cases or on other types of content. We've also found that it handles spelling mistakes more accurately. We recommend trying all relevant models for your use case and seeing what works best; a quick comparison loop is sketched below.
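If you want to compare models on your own data, one simple approach is to run the same samples through each model and look at where they disagree. This sketch reuses the assumed endpoint and request shape from the example above; the model identifiers are also assumptions.

```python
# Reuses requests, API_URL, and API_KEY from the first sketch.
samples = ["sample text 1", "sample text 2"]

for text in samples:
    for model in ("nsfw", "toxicity", "propriety"):  # identifiers assumed
        response = requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {API_KEY}"},
            json={"text": text, "model": model},
            timeout=10,
        )
        # Inspect where the models disagree on the same input.
        print(model, response.json())
```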
In addition to detecting unsafe content, the model also flags sensitive and controversial topics like politics and religion. For example, a mention of a president or a controversial topic in a neutral or positive context is labelled SENSITIVE, but if the text is negative or hateful it will usually be labelled UNSAFE. In that sense, the SENSITIVE label acts as a level below UNSAFE.
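In practice, this tiering lends itself to a simple routing rule: block UNSAFE content, queue SENSITIVE content for human review, and let everything else through. Here's a minimal sketch building on the classify_text helper assumed above; the routing outcomes are hypothetical placeholders.

```python
def handle_submission(text: str) -> str:
    """Route text by the model's label, following the tiering described above."""
    result = classify_text(text)  # from the first sketch
    label = result["label"]

    if label == "UNSAFE":
        return "rejected"           # block outright
    elif label == "SENSITIVE":
        return "queued_for_review"  # e.g. a neutral mention of politics
    else:
        return "published"          # nothing flagged
```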
We hope the new model is helpful for your text moderation needs. Please reach out if you have any questions.