Content Safety
Hosted Llama Guard classifier that flags text containing violence, sexual content, hate speech, self-harm, harassment, and nine other categories. Useful for moderators, parents, and anyone shipping user-generated content. Uses 1 credit per run.
About Content Safety
Content Safety runs text through a hosted Llama Guard classifier that flags violence, sexual content, hate speech, self-harm, harassment, and nine other categories. Reach for it when you moderate user-generated content, screen submissions, or want a programmatic safety check before publishing. It's a Pro tool that uses 1 credit per run.
- Category: privacy
- Input: text/plain
- Output: application/json
- Cost: credit-metered (1 credit per run)
- Memory: low
Common uses
- Screen forum posts or comments for hate speech and harassment before they go live
- Flag self-harm or violence references in support-chat transcripts so a human can follow up
- Pre-filter user-submitted reviews for sexual or abusive content on a marketplace
- Gate a community submission form, blocking entries the classifier marks unsafe
- Help a parent quickly assess whether a block of text a child encountered is concerning
- Add an automated moderation step to a content pipeline that returns a machine-readable verdict
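A moderation step usually turns the verdict into an action. The sketch below maps a JSON verdict to "publish", "review", or "block". It rests on several assumptions. The field names `flagged` and `categories`, the category label `self-harm`, and the `moderate` helper are all hypothetical. The policy is also hypothetical: it escalates self-harm to a human and blocks everything else that is flagged. Check the labels against what the tool actually returns.

```python
import json

# Hypothetical policy: matched categories that should go to a human rather than be blocked.
# The label strings are illustrative; align them with the tool's real output.
ESCALATE = {"self-harm"}


def moderate(raw_verdict: str) -> str:
    """Map a JSON verdict (assumed keys: "flagged", "categories") to an action."""
    verdict = json.loads(raw_verdict)
    if not verdict.get("flagged", False):
        return "publish"
    categories = set(verdict.get("categories", []))
    # Any overlap with the escalation set routes the text to human follow-up.
    if categories & ESCALATE:
        return "review"
    return "block"


print(moderate('{"flagged": false, "categories": []}'))            # publish
print(moderate('{"flagged": true, "categories": ["self-harm"]}'))  # review
print(moderate('{"flagged": true, "categories": ["hate"]}'))       # block
```

Routing some categories to review rather than blocking them outright matches the support-chat use case, where a person should follow up.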
Frequently asked questions
What does it return?
JSON: whether the text is flagged and which of the safety categories it matched (violence, sexual content, hate, self-harm, harassment, and others).
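The exact response schema isn't documented here. The sketch below therefore assumes a shape like `{"flagged": true, "categories": ["violence"]}` and shows one way to load it into a typed object. The key names and the `Verdict` / `parse_verdict` names are hypothetical.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Verdict:
    flagged: bool
    categories: list[str] = field(default_factory=list)


def parse_verdict(raw: str) -> Verdict:
    """Parse a JSON verdict, assuming hypothetical "flagged" and "categories" keys."""
    data = json.loads(raw)
    # Fall back to an empty category list if the key is absent.
    return Verdict(flagged=bool(data["flagged"]), categories=list(data.get("categories", [])))


verdict = parse_verdict('{"flagged": true, "categories": ["violence", "harassment"]}')
print(verdict.flagged, verdict.categories)  # True ['violence', 'harassment']
```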
Which model powers it?
A hosted Llama Guard classifier covering 14 safety categories. It's a Pro tool that uses 1 credit per run.
Is my text sent to a server?
Yes. Classification runs on a hosted model, so the text is sent to the server with each request. It is processed only to return the verdict and is not used for anything else.
Does it edit or redact the content?
No. It only classifies the text and reports the matched categories. To strip personal data or mask content, use a dedicated redaction tool.
Can it handle longer passages?
Yes. It accepts plain text and classifies the passage as a whole, so a single verdict covers the entire input. For very long documents, splitting the text into sections and classifying each one shows which section triggered a flag.
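Below is a minimal sketch of the split-then-classify approach. It assumes sections are built from blank-line-separated paragraphs packed up to a character limit. The `classify` callback stands in for a call to the tool and is hypothetical, as are `split_into_sections`, `flagged_sections`, and the 2000-character default. If each section is submitted as its own request, each one would presumably count as a separate 1-credit run.

```python
from collections.abc import Callable


def split_into_sections(text: str, max_chars: int = 2000) -> list[str]:
    """Pack blank-line-separated paragraphs into sections of at most max_chars.

    A single paragraph longer than max_chars is kept whole rather than cut mid-sentence.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    sections: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            # Adding this paragraph would exceed the limit: close the current section.
            sections.append(current)
            current = para
        else:
            current = candidate
    if current:
        sections.append(current)
    return sections


def flagged_sections(text: str, classify: Callable[[str], bool], max_chars: int = 2000) -> list[int]:
    """Return the indices of sections that the (hypothetical) classifier flags."""
    return [i for i, s in enumerate(split_into_sections(text, max_chars)) if classify(s)]


sample = "Hello there.\n\nPlan the attack.\n\nGoodbye."
# A keyword stand-in replaces the real classifier for illustration only.
print(flagged_sections(sample, classify=lambda s: "attack" in s, max_chars=20))  # [1]
```

Splitting on paragraph boundaries keeps each section readable on its own. Smaller sections pinpoint a flag more precisely but cost more runs.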
Keywords
- safety
- moderation
- classify
- guard
- pii
- pro
- llm