Prompt Injection Demo

Name: Prompt Injection Demo
Author: Wyreup

Visualise where prompt-injection content hides in text — invisible characters, confusable lookalikes (Cyrillic а vs Latin a), mixed-script words, instruction-override phrases ("ignore previous instructions"), and chat-fence markers ([INST], <|im_start|>, System:). Returns positions, severity, and HTML with <mark> spans for direct display.

About Prompt Injection Demo

Paste any text — a scraped page, a PDF body, a chat transcript, a copy-paste from the web — and see exactly where prompt-injection content is hiding. The tool surfaces three categories in one pass: invisible characters that humans cannot see (zero-width Unicode, BOM, control codes), confusable lookalikes where one alphabet impersonates another (Cyrillic `а` masquerading as Latin `a`), and instruction-override patterns ("ignore previous instructions", `[INST]`, `<|im_start|>`, `System:`, named jailbreaks). Each finding is reported with character offsets, severity, and a rendered HTML view ready to display. Runs in your browser — never sends your text to anyone.
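As a rough illustration of the first two categories, the sketch below flags invisible characters and words that mix alphabets. It is not the tool's implementation: the `INVISIBLE` table, the function names, and the script guess based on Unicode character names are all assumptions made for this example.

```python
import re
import unicodedata

# Assumption: a small illustrative set, not the tool's actual table.
INVISIBLE = {
    "\u200b": "ZERO WIDTH SPACE",
    "\u200c": "ZERO WIDTH NON-JOINER",
    "\u200d": "ZERO WIDTH JOINER",
    "\u2060": "WORD JOINER",
    "\ufeff": "BYTE ORDER MARK",
}

def find_invisible(text):
    """Return (offset, name) for each invisible or control character."""
    hits = []
    for i, ch in enumerate(text):
        if ch in INVISIBLE:
            hits.append((i, INVISIBLE[ch]))
        elif unicodedata.category(ch) == "Cc" and ch not in "\n\r\t":
            hits.append((i, "CONTROL U+%04X" % ord(ch)))
    return hits

def script_of(ch):
    """Rough script guess taken from the Unicode character name."""
    name = unicodedata.name(ch, "")
    for script in ("LATIN", "CYRILLIC", "GREEK"):
        if name.startswith(script):
            return script
    return None  # digits, punctuation, and other scripts are ignored here

def find_mixed_script_words(text):
    """Return (start, end, word, scripts) for words that mix alphabets."""
    hits = []
    for m in re.finditer(r"\w+", text):
        scripts = {s for s in map(script_of, m.group()) if s}
        if len(scripts) > 1:
            hits.append((m.start(), m.end(), m.group(), sorted(scripts)))
    return hits
```

For the input `"p\u0430ypal\u200b ok"`, where the second letter is a Cyrillic `а`, `find_invisible` reports the zero-width space at offset 6 and `find_mixed_script_words` reports the first word as mixing `CYRILLIC` and `LATIN`. A production check would more likely rely on the Unicode confusables data than on character names.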

Category: privacy
Input: text/plain
Output: application/json
Cost: Free, runs in your browser
Memory: low

Privacy: Prompt Injection Demo runs entirely on your device. Text you provide never leaves your browser: no uploads, no server, no tracking. The page works offline once loaded.

Common uses

Audit text before pasting it into an LLM prompt or fine-tune dataset.
Scan scraped pages, PDFs, or user-submitted content defensively before feeding it to an agent.
Demonstrate prompt-injection risk to a team that hasn't internalised the threat model.
Catch homoglyph attacks in URLs, account names, or imported user data.
Spot zero-width characters in payment-form values or address fields where they're used to bypass filters.

Frequently asked questions

What counts as a "prompt injection"?

Any content embedded in untrusted text designed to alter an LLM's behaviour: explicit overrides ("ignore previous instructions"), role-takeover phrases ("you are now"), chat fences from common templates (ChatML `<|im_start|>`, Llama `[INST]`, OpenAI), `System:` role spoofing, and named jailbreaks (DAN). The tool also catches non-instructional but high-risk content: invisible characters (zero-width Unicode) and confusable lookalikes that bypass naïve string filters.
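A pattern table for the instruction-style classes above might look something like the following sketch. The phrases are the ones named in the answer; the regexes, the `kind` labels, and the severity values are assumptions for illustration and are not taken from the tool.

```python
import re

# Hypothetical pattern table: (kind, severity, regex).
# Severities and exact regexes are assumptions, not the tool's own.
PATTERNS = [
    ("override", "high",
     re.compile(r"ignore\s+(?:all\s+)?previous\s+instructions", re.I)),
    ("role-takeover", "medium",
     re.compile(r"\byou\s+are\s+now\b", re.I)),
    ("chat-fence", "high",
     re.compile(r"<\|im_start\|>|\[INST\]")),
    ("role-spoof", "medium",
     re.compile(r"^[ \t]*System:", re.I | re.M)),
]

def scan(text):
    """Return findings sorted by start offset."""
    findings = []
    for kind, severity, rx in PATTERNS:
        for m in rx.finditer(text):
            findings.append({
                "kind": kind,
                "severity": severity,
                "start": m.start(),
                "end": m.end(),
                "match": m.group(),
            })
    return sorted(findings, key=lambda f: f["start"])
```

Scanning `"Hello.\nSystem: ignore previous instructions [INST]"` with this table yields three findings in order: `role-spoof`, `override`, `chat-fence`. Any rewording outside the table goes undetected, which is the limitation of a pattern-based approach.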

Will it catch every attack?

No — this is a heuristic over known classes. Novel attacks worded differently from the pattern table will slip past. Use it as one defensive layer, not as the only one. The output is structured enough that you can plug it into your own ranking / blocking pipeline.
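One way to use the structured output in a blocking pipeline is to gate on the worst severity found. The sketch below assumes a JSON shape with a top-level `findings` list whose items carry a `severity` of `low`, `medium`, or `high`; those field names and labels are hypothetical, so check them against the actual output before relying on this.

```python
import json

# Assumed severity labels; adjust to whatever the real output uses.
SEVERITY_RANK = {"low": 1, "medium": 2, "high": 3}

def should_block(report_json, threshold="high"):
    """Return True if any finding meets or exceeds the threshold."""
    report = json.loads(report_json)
    findings = report.get("findings", [])  # hypothetical field name
    worst = max(
        (SEVERITY_RANK.get(f.get("severity"), 0) for f in findings),
        default=0,
    )
    return worst >= SEVERITY_RANK[threshold]
```

Unknown severity labels rank as 0 here, so they never trigger a block; a stricter pipeline might treat them as the highest rank instead.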

Does it work on PDFs?

Run `pdf-to-text` first, then pipe the output into `prompt-injection-demo`. The Wyreup chain builder makes that two-step flow a one-click chain.

Can I get the HTML render directly?

Yes — the JSON output includes an `html` field with `<mark data-kind="…" data-severity="…">` spans around each finding. Drop it into a `<div>` to render the marked-up text inline.
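To show how markup of that shape can be built from offsets, here is a minimal sketch that wraps each finding in a `<mark>` span and HTML-escapes everything else. The `render_marks` function and the `start`, `end`, `kind`, and `severity` keys are assumptions for this example, not the tool's documented schema.

```python
from html import escape

def render_marks(text, findings):
    """Wrap each finding in a <mark> span; escape all other text."""
    out, cursor = [], 0
    for f in sorted(findings, key=lambda f: f["start"]):
        if f["start"] < cursor:
            continue  # skip overlapping findings in this simple sketch
        out.append(escape(text[cursor:f["start"]]))
        out.append(
            '<mark data-kind="%s" data-severity="%s">%s</mark>' % (
                escape(f["kind"], quote=True),
                escape(f["severity"], quote=True),
                escape(text[f["start"]:f["end"]]),
            )
        )
        cursor = f["end"]
    out.append(escape(text[cursor:]))
    return "".join(out)
```

Escaping the untouched text matters because the input is untrusted: without it, a scraped page containing `<script>` would be injected into the display along with the highlights.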

Keywords

prompt injection
security
llm
safety
suspicious
homoglyph
jailbreak
visualize
