Content Safety

Hosted Llama Guard classifier — flags text containing violence, sexual content, hate speech, self-harm, harassment, and 9 other categories. Useful for moderators, parents, and anyone shipping user-generated content. Uses 1 credit per run.

About Content Safety

Content Safety runs text through a hosted Llama Guard classifier that flags violence, sexual content, hate speech, self-harm, harassment, and nine other categories. Reach for it when you moderate user-generated content, screen submissions, or want a programmatic safety check before publishing. It's a Pro tool using 1 credit per run.

Category: privacy
Input: text/plain
Output: application/json
Cost: Credit-metered
Memory: low
Privacy: Content Safety does not run on your device. The text you submit is sent to a hosted classifier for each request, so the tool needs a network connection. The text is processed to return the verdict and not used for anything else.

Common uses

  • Screen forum posts or comments for hate speech and harassment before they go live
  • Flag self-harm or violence references in support-chat transcripts so a human can follow up
  • Pre-filter user-submitted reviews for sexual or abusive content on a marketplace
  • Gate a community submission form, blocking entries the classifier marks unsafe
  • Help a parent quickly assess whether a block of text a child encountered is concerning
  • Add an automated moderation step to a content pipeline that returns a machine-readable verdict
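
The uses above imply different actions for different results: some flags should block an entry, others should go to a human. A minimal sketch of such a routing step follows. The category labels, the `REVIEW_CATEGORIES` set, and the `route` function are all hypothetical, chosen for illustration; the labels the tool actually returns may differ.

```python
# Hypothetical category labels and policy; adjust to the labels the tool returns.
REVIEW_CATEGORIES = {"self-harm", "violence"}


def route(flagged: bool, categories: list[str]) -> str:
    """Return "publish", "review", or "block" for one classified item."""
    if not flagged:
        return "publish"
    if REVIEW_CATEGORIES.intersection(categories):
        # Hand off to a human, as in the support-chat use case.
        return "review"
    # Hold back everything else the classifier flagged.
    return "block"
```

Keeping the policy in one small function makes it easy to change which categories block outright and which only queue for review.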

Frequently asked questions

What does it return?

JSON: whether the text is flagged and which of the safety categories it matched (violence, sexual content, hate, self-harm, harassment, and others).
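
As a rough illustration, the verdict could be read like this. The key names `flagged` and `categories` are assumptions made for this sketch, not the documented response schema, and the sample payload is invented rather than captured from the service.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Verdict:
    flagged: bool
    categories: list[str] = field(default_factory=list)


def parse_verdict(raw: str) -> Verdict:
    """Parse a response body into a Verdict.

    Assumes hypothetical keys "flagged" (bool) and "categories"
    (list of strings); the real response may use different names.
    """
    data = json.loads(raw)
    return Verdict(
        flagged=bool(data.get("flagged", False)),
        categories=[str(c) for c in data.get("categories", [])],
    )


# Illustrative payload only.
sample = '{"flagged": true, "categories": ["harassment", "hate"]}'
verdict = parse_verdict(sample)
```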

Which model powers it?

A hosted Llama Guard classifier covering 14 safety categories. It's a Pro tool that uses 1 credit per run.

Is my text sent to a server?

Yes. Classification happens on a hosted model, so the text is sent for the request. It's processed to return the verdict and not used for anything else.

Does it edit or redact the content?

No. It only classifies and reports categories. To remove personal data or cover content, use a dedicated redaction tool.

Can it handle longer passages?

Yes. It accepts plain text and classifies the passage as a whole, so a long document gets a single verdict. For very long documents, split the text into sections and classify each one to see which part triggered a flag.
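
One possible way to split a long document before classifying it is sketched below. It groups blank-line-separated paragraphs into sections. The `max_chars` default of 2000 is an arbitrary choice for this example, not a documented limit, and `classify` is a placeholder for whatever function you use to call the tool.

```python
import re
from typing import Callable


def split_sections(text: str, max_chars: int = 2000) -> list[str]:
    """Group blank-line-separated paragraphs into sections of at most max_chars.

    A single paragraph longer than max_chars is kept whole as its own section.
    """
    sections: list[str] = []
    current = ""
    for para in re.split(r"\n\s*\n", text):
        para = para.strip()
        if not para:
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            sections.append(current)
            current = para
        else:
            current = candidate
    if current:
        sections.append(current)
    return sections


def flagged_sections(
    text: str, classify: Callable[[str], bool], max_chars: int = 2000
) -> list[int]:
    """Return the indexes of sections that classify() reports as flagged."""
    sections = split_sections(text, max_chars)
    return [i for i, section in enumerate(sections) if classify(section)]
```

Each section costs its own run, so smaller sections give more precise flags at the price of more credits.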

Keywords

  • safety
  • moderation
  • classify
  • guard
  • pii
  • pro
  • llm
