Unicode Info
Inspect every character in text — codepoint, UTF-8 bytes, JS escape, Unicode block, category — and flag invisibles, BOMs, RTL marks, and control characters.
About Unicode Info
Unicode Info breaks text down character by character, showing each one's codepoint, UTF-8 byte sequence, JavaScript escape, Unicode block, and category. Crucially, it flags the things you can't see — zero-width characters, BOMs, right-to-left marks, and control characters — that quietly break parsers, diffs, and logins. It runs in your browser and returns a structured JSON report.
- Category: inspect
- Input: text/plain
- Output: application/json
- Cost: free, runs in your browser
- Memory: low
Common uses
- Track down an invisible zero-width space that's breaking a string comparison or a password match
- Diagnose a leading byte-order mark (BOM) corrupting the first column of a CSV import
- Spot a hidden RTL override character injected into a filename or commit message
- Inspect emoji and combining sequences to understand why a string's length looks wrong
- Verify that pasted text contains no stray control characters before storing it
- Look up the exact codepoint and Unicode block of an unfamiliar glyph
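Several of the uses above come down to spotting a handful of code points that render as nothing. The sketch below is a minimal illustration of that idea, not the tool's actual implementation: the `findHidden` function, the `Hit` shape, and the short `NAMED` table are all hypothetical, and the table covers only a small subset of the invisible and directional characters defined by Unicode.

```typescript
// Illustrative subset of suspicious code points (an assumption, not the tool's real list).
const NAMED: Record<number, string> = {
  0x200b: "zero-width space",
  0x200c: "zero-width non-joiner",
  0x200d: "zero-width joiner",
  0xfeff: "byte-order mark",
  0x200e: "left-to-right mark",
  0x200f: "right-to-left mark",
  0x202e: "right-to-left override",
};

// Hypothetical result shape for one flagged character.
interface Hit {
  index: number;     // position counted in code points, not UTF-16 units
  codepoint: string; // e.g. "U+200B"
  flag: string;
}

function classify(cp: number): string | null {
  const name = NAMED[cp];
  if (name !== undefined) return name;
  // C0 controls, DEL, and C1 controls; tab, LF, and CR are allowed through.
  const isControl = cp <= 0x1f || (cp >= 0x7f && cp <= 0x9f);
  if (isControl && cp !== 0x09 && cp !== 0x0a && cp !== 0x0d) {
    return "control character";
  }
  return null;
}

function findHidden(text: string): Hit[] {
  const hits: Hit[] = [];
  // Array.from splits by code point, so surrogate pairs count as one character.
  Array.from(text).forEach((ch, index) => {
    const cp = ch.codePointAt(0)!;
    const flag = classify(cp);
    if (flag !== null) {
      const hexValue = cp.toString(16).toUpperCase().padStart(4, "0");
      hits.push({ index, codepoint: "U+" + hexValue, flag });
    }
  });
  return hits;
}
```

For example, `"pass\u200Bword" === "password"` is false even though the two strings look identical on screen, and `findHidden("pass\u200Bword")` reports a single zero-width space at index 4.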
Frequently asked questions
What does it report for each character?
Codepoint, UTF-8 bytes, JavaScript escape sequence, Unicode block, and general category — plus flags for invisibles, BOMs, RTL marks, and control characters.
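To make those fields concrete, here is a rough sketch of how the codepoint, UTF-8 bytes, and JavaScript escape for each character could be derived with standard browser APIs. It is an illustration under assumptions, not the tool's own code: the `inspectText` function, the `hex` helper, and the `CharInfo` field names are hypothetical, and block and category lookups are left out because they need Unicode data tables.

```typescript
// Hypothetical per-character record; the tool's real field names may differ.
interface CharInfo {
  char: string;
  codepoint: string; // "U+" followed by at least four hex digits
  utf8: string[];    // one two-digit hex string per byte
  jsEscape: string;  // \uXXXX, or \u{...} above U+FFFF
}

// Upper-case hex, left-padded with zeros to the given width.
function hex(n: number, width: number): string {
  return n.toString(16).toUpperCase().padStart(width, "0");
}

function inspectText(text: string): CharInfo[] {
  const encoder = new TextEncoder(); // always encodes to UTF-8
  // Array.from iterates by code point, keeping surrogate pairs together.
  return Array.from(text).map((char) => {
    const cp = char.codePointAt(0)!;
    const bytes = Array.from(encoder.encode(char));
    return {
      char,
      codepoint: "U+" + hex(cp, 4),
      utf8: bytes.map((b) => hex(b, 2)),
      jsEscape: cp > 0xffff ? "\\u{" + hex(cp, 1) + "}" : "\\u" + hex(cp, 4),
    };
  });
}
```

Calling `inspectText("\u00e9")` yields one entry with codepoint `U+00E9` and UTF-8 bytes `C3 A9`, while an emoji such as U+1F600 yields one entry with the four bytes `F0 9F 98 80`.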
Why would I care about invisible characters?
They cause bugs that are nearly impossible to see — failed equality checks, mangled imports, and spoofed identifiers — so surfacing them is the whole point.
What's the output format?
A JSON array with one entry per character, suitable for inspection or feeding into another tool.
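As a rough idea of what feeding the report into another tool might look like, the sketch below parses a report and keeps only the entries that carry a flag. The entry shape is an assumption made for illustration: the field names `char`, `codepoint`, and `flags`, along with the `flaggedEntries` function, are hypothetical and should be replaced with whatever the real report contains.

```typescript
// Hypothetical entry shape; check the real report for the actual field names.
interface ReportEntry {
  char: string;
  codepoint: string;
  flags: string[]; // assumed to be empty for ordinary visible characters
}

// Parse a report (assumed to be a JSON array) and keep only flagged characters.
function flaggedEntries(reportJson: string): ReportEntry[] {
  const report = JSON.parse(reportJson) as ReportEntry[];
  return report.filter((entry) => entry.flags.length > 0);
}
```

A check like this could run in a pre-commit hook or import pipeline, rejecting input whenever the returned array is not empty.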
Does my text get uploaded?
No. The analysis is performed entirely in your browser; the text stays on your device.
Does it handle characters outside the basic plane?
Yes. Each character is reported by its full codepoint, including astral-plane characters (those above U+FFFF) such as many emoji.
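This matters because JavaScript's `length` counts UTF-16 code units, not codepoints, so astral characters count twice. The short sketch below shows the difference using standard string methods; the `measure` function is a hypothetical helper written for this example.

```typescript
// Compare the two ways a string can be "measured" in JavaScript.
function measure(s: string): { utf16Units: number; codePoints: number } {
  return {
    utf16Units: s.length,             // counts each surrogate half separately
    codePoints: Array.from(s).length, // counts each code point once
  };
}

// "a" followed by U+1F600: one BMP character plus one astral character.
const emoji = measure("a\u{1F600}");

// "e" followed by U+0301 (combining acute accent): one glyph, two code points.
const combining = measure("e\u0301");

// NFC normalization composes the pair into the single code point U+00E9.
const composedLength = "e\u0301".normalize("NFC").length;
```

Here `emoji` has three UTF-16 units but two codepoints, `combining` has two of each despite rendering as one glyph, and `composedLength` is 1.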
Keywords
- unicode
- codepoint
- character
- inspect
- utf-8
- utf8
- escape
- invisible
- zero-width
- bom
- rtl
- hex
- encoding