The Unicode Lab
Type anything to see how computers actually "see" your text.
How Unicode Works
1. The Code Point
Think of Unicode as a giant spreadsheet. Every character gets a row number. This number is called a Code Point, written as U+ followed by the number in hexadecimal (base 16), such as U+1234. For example, "A" is always row 65 (U+0041, because 41 in hexadecimal is 65 in decimal), and "💀" is row 128,128 (U+1F480).
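To make the "row number" idea concrete, here is a minimal Python sketch. The helper name describe is made up for this illustration; ord and chr are Python built-ins that convert between a character and its Code Point.

```python
def describe(ch: str) -> str:
    """Return the Code Point of a single character in U+XXXX notation."""
    cp = ord(ch)  # character -> integer Code Point
    # Code Points are conventionally shown as at least 4 hex digits.
    return f"U+{cp:04X} (decimal {cp})"


print(describe("A"))   # U+0041 (decimal 65)
print(describe("💀"))  # U+1F480 (decimal 128128)
print(chr(0x41))       # going the other way: number -> character, prints A
```

The same number can be written in decimal or hexadecimal; the U+ notation simply always uses hexadecimal.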
2. Encoding (UTF-8)
Computers don't store "row numbers"; they store bits (0s and 1s). UTF-8 is the most popular way to turn those Code Points into bits. It's clever: it uses 1 byte for each of the 128 ASCII characters (so plain English text is byte-for-byte identical to old ASCII), and expands to 2, 3, or 4 bytes for everything else, including accented letters, other scripts, and emojis. This keeps mostly-ASCII content such as HTML markup compact and compatible with older software.
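The variable width is easy to observe. The sketch below uses Python's built-in str.encode; the helper name utf8_hex and the sample characters are chosen only for illustration.

```python
def utf8_hex(ch: str) -> str:
    """Encode a character as UTF-8 and return its bytes as spaced hex."""
    return ch.encode("utf-8").hex(" ")


for ch in ["A", "é", "汉", "💀"]:
    encoded = ch.encode("utf-8")
    print(f"{ch}: {len(encoded)} byte(s) -> {utf8_hex(ch)}")

# A: 1 byte(s) -> 41
# é: 2 byte(s) -> c3 a9
# 汉: 3 byte(s) -> e6 b1 89
# 💀: 4 byte(s) -> f0 9f 92 80
```

Note that "A" encodes to the single byte 41 (hexadecimal), exactly the value it had in ASCII.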
3. Rendering (Fonts)
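Unicode tells the computer what the character is, but not how it looks. That's the job of a Font. If you see a square box (□) or a boxed question mark, it usually doesn't mean the Unicode is broken; it just means your current font doesn't have a drawing (a glyph) for that specific Code Point. A question mark inside a diamond (�, U+FFFD) is a different case: it is the replacement character, shown when the bytes themselves could not be decoded.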
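One way to see that a character's identity is separate from its appearance: Python's standard unicodedata module can report a character's official name without drawing anything. This is only a sketch; the helper name identify is hypothetical, and the names available depend on the Unicode version bundled with your Python.

```python
import unicodedata


def identify(ch: str) -> str:
    """Return the Code Point and official Unicode name of a character."""
    # The name comes from the Unicode Character Database, not from a font.
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


print(identify("A"))       # U+0041 LATIN CAPITAL LETTER A
print(identify("\u25a1"))  # U+25A1 WHITE SQUARE
```

Even on a machine whose font cannot draw a character, the program still knows exactly which character it is.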
ASCII vs. Unicode
The Old Way (ASCII)
- Only 128 characters total.
- English letters, digits, basic punctuation, and control codes only.
- No accents (é, ñ), no other scripts (汉, Ω), no emojis.
- Different countries filled the gap with incompatible extensions, which caused "Mojibake" (garbled text) when sharing files between them.
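Both limitations can be reproduced in a few lines. In this sketch, the function names are invented for illustration, and Latin-1 is assumed as the "wrong" encoding on the reading side; any mismatched encoding would garble the text in a similar way.

```python
def fits_in_ascii(text: str) -> bool:
    """Return True if every character in text exists in ASCII."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


def mojibake(text: str) -> str:
    """Write text as UTF-8, then misread the bytes as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


print(fits_in_ascii("A"))  # True
print(fits_in_ascii("é"))  # False
print(mojibake("é"))       # Ã©
```

The two bytes that UTF-8 uses for "é" are each read as a separate Latin-1 character, which is why one letter turns into two wrong ones.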
The Unicode Way
- Over 150,000 characters.
- Covers virtually all written languages (living and dead).
- Includes math symbols, musical notation, and emojis.
- The foundation of the modern internet.
Did You Know?
The "Ghost" Characters
Unicode includes characters that are invisible but change how text works. For example, the "Zero Width Joiner" (ZWJ, U+200D) acts like digital glue. It combines "Man" + "ZWJ" + "Woman" + "ZWJ" + "Boy" to create the single family emoji 👨‍👩‍👦 on systems whose fonts support that sequence. Where they don't, you see the three people side by side.
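The glue can be applied by hand. The sketch below builds the family sequence from its individual Code Points; the constant names are chosen for this example only.

```python
ZWJ = "\u200d"        # ZERO WIDTH JOINER
MAN = "\U0001F468"    # 👨
WOMAN = "\U0001F469"  # 👩
BOY = "\U0001F466"    # 👦

# Join the three people with the invisible joiner between each pair.
family = ZWJ.join([MAN, WOMAN, BOY])

print(family)                               # one glyph if the font supports it
print(len(family))                          # 5 Code Points
print([f"U+{ord(c):04X}" for c in family])  # the five Code Points in order
print(len(family.encode("utf-8")))          # 18 bytes in UTF-8
```

What looks like one character on screen is five Code Points and 18 bytes underneath, which is why string lengths for emoji often surprise people.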
Private Use Areas
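There are blocks in Unicode that the standard promises never to assign, such as U+E000 to U+F8FF. Anyone can use these for their own purposes; companies like Apple or Google use them for custom icons and logos (Apple, for example, places its logo at U+F8FF). Because the meaning is private, these characters only display as intended with a font that defines them.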
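Private-use Code Points can be recognized programmatically. This sketch relies on the general category "Co" (private use) reported by Python's standard unicodedata module; the helper name is_private_use is hypothetical.

```python
import unicodedata


def is_private_use(ch: str) -> bool:
    """Return True if the character's general category is Co (private use)."""
    return unicodedata.category(ch) == "Co"


for cp in (0x0041, 0xE000, 0xF8FF):
    ch = chr(cp)
    # Private-use characters have no official name, so supply a default.
    name = unicodedata.name(ch, "<no official name>")
    print(f"U+{cp:04X} private use: {is_private_use(ch)} ({name})")
```

A check like this can help flag text that will only render correctly with a particular vendor's font.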
From Stone Tablets to Smart Phones