Digital network of languages
Version 16.0 • 154,998 Characters

What is Unicode?

It's the universal standard that assigns a unique number to every character, no matter the platform, program, or language.

The Short Answer

Without Unicode, computers would have no shared way to represent "A", "あ", "Ж", and "🍕" as text. Before Unicode, different systems used conflicting encodings (ASCII plus many incompatible regional extensions), so the same bytes could stand for different characters and text arrived garbled (mojibake). Unicode unifies the world's writing systems into a single digital list, allowing you to tweet in Japanese, email in Greek, and text emojis seamlessly.

The Unicode Lab

Type anything to see how computers actually "see" your text.

Under the Hood Analysis
Example input: "Hello 🌍"

Character   Code point   Block                                   UTF-8 bytes
H           U+0048       Basic Latin (ASCII)                     0x48
e           U+0065       Basic Latin (ASCII)                     0x65
l           U+006C       Basic Latin (ASCII)                     0x6C
l           U+006C       Basic Latin (ASCII)                     0x6C
o           U+006F       Basic Latin (ASCII)                     0x6F
(space)     U+0020       Basic Latin (ASCII)                     0x20
🌍          U+1F30D      Miscellaneous Symbols and Pictographs   0xF0 0x9F 0x8C 0x8D

7 Characters • 10 Bytes (UTF-8)
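The same breakdown can be approximated in a few lines of Python. This is only a sketch of what such a lab might do, not the page's actual implementation; the function name `analyze` is hypothetical, and it reports official character names from the standard `unicodedata` module rather than block names.

```python
import unicodedata

def analyze(text):
    """Return one (char, code point, name, UTF-8 bytes) row per code point."""
    rows = []
    for ch in text:
        code_point = f"U+{ord(ch):04X}"
        # Fall back to a placeholder for characters that have no name.
        name = unicodedata.name(ch, "<unnamed>")
        utf8 = " ".join(f"0x{b:02X}" for b in ch.encode("utf-8"))
        rows.append((ch, code_point, name, utf8))
    return rows

text = "Hello 🌍"
for row in analyze(text):
    print(*row, sep="\t")

# Totals, counted in code points and in UTF-8 bytes.
print(len(text), "characters,", len(text.encode("utf-8")), "bytes (UTF-8)")
```

Note that `len(text)` counts code points, which matches the "7 Characters" figure here but can differ from what a reader perceives as one character (for example, emoji built from several code points).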

How Unicode Works

1. The Code Point

Think of Unicode as a giant spreadsheet. Every character gets a row number. This number is called a Code Point, written as U+ followed by the number in hexadecimal (for example, U+1234). For example, "A" is always row 65 (U+0041), and "💀" is row 128,128 (U+1F480).
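As a quick illustration, Python's built-in `ord` and `chr` convert between a character and its row number. The helper name `describe` is made up for this sketch.

```python
def describe(ch):
    """Return the decimal row number and the U+ label for one character."""
    number = ord(ch)                   # character -> code point (an integer)
    return number, f"U+{number:04X}"   # U+ notation is that integer in hex

print(describe("A"))    # (65, 'U+0041')
print(describe("💀"))   # (128128, 'U+1F480')
print(chr(0x1F480))     # code point -> character
```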

2. Encoding (UTF-8)

Computers don't store "row numbers"; they store bits (0s and 1s). UTF-8 is the most popular way to turn those Code Points into bits. It's clever: it uses 1 byte for the 128 ASCII characters (so plain English text is byte-for-byte identical to old ASCII), but expands to 2, 3, or 4 bytes for other scripts and emojis. Because so much of the web's markup and text is ASCII, this keeps most pages compact.
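To make the 1-to-4-byte expansion concrete, here is a hand-written encoder compared against Python's built-in one. It is a teaching sketch only: the function name `utf8_encode` is hypothetical, and it assumes a valid code point (it does not reject surrogates or values above U+10FFFF, which a real encoder must).

```python
def utf8_encode(code_point):
    """Encode one code point into UTF-8 bytes by hand (illustrative only)."""
    if code_point < 0x80:        # 1 byte:  0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:       # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:     # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])

for ch in ["A", "Ж", "あ", "🍕"]:
    encoded = utf8_encode(ord(ch))
    # The hand-rolled result should match Python's own encoder.
    assert encoded == ch.encode("utf-8")
    print(ch, len(encoded), "byte(s):", encoded.hex(" "))
```

In everyday code, `str.encode("utf-8")` and `bytes.decode("utf-8")` do this work; the manual version is only there to show where the byte counts come from.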

3. Rendering (Fonts)

Unicode tells the computer what the character is, but not how it looks. That's the job of a Font. If you see a square box (□), it doesn't mean the Unicode is broken; it just means your current font doesn't have a drawing for that specific Code Point. A question mark in a diamond (�) is a different symptom: it usually means some bytes could not be decoded as valid text.

ASCII vs. Unicode

A

The Old Way (ASCII)

  • ✓ Only 128 characters total.
  • ✓ English letters, digits, and basic punctuation only.
  • ✗ No accents (é, ñ), no other scripts (汉, Ω), no emojis.
  • ✗ Incompatible regional extensions caused "Mojibake" (garbled text) when sharing files between countries.
🌍

The Unicode Way

  • ✓ Over 150,000 characters.
  • ✓ Covers nearly every writing system in use today, plus many historic scripts.
  • ✓ Includes math symbols, musical notation, and emojis.
  • ✓ The foundation of the modern internet.
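The contrast can be reproduced in a short Python sketch. The choice of Latin-1 as the "wrong" encoding is an assumption made for illustration; any mismatched legacy encoding produces similar garbling.

```python
text = "café"

# ASCII has no code for "é", so a strict ASCII encode fails.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Mojibake: bytes written as UTF-8 but read back as Latin-1.
utf8_bytes = text.encode("utf-8")
garbled = utf8_bytes.decode("latin-1")

# Reversing the wrong step recovers the original text.
repaired = garbled.encode("latin-1").decode("utf-8")

print(ascii_ok)   # False
print(garbled)    # cafÃ©
print(repaired)   # café
```

The garbling happens because "é" is two bytes in UTF-8 (0xC3 0xA9), and Latin-1 treats each of those bytes as a separate character.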

Did You Know?

The "Ghost" Characters

Unicode includes characters that are invisible but change how text works. For example, the "Zero Width Joiner" (ZWJ) acts like digital glue. It combines "Man" + "ZWJ" + "Woman" + "ZWJ" + "Boy" to create the single family emoji 👨‍👩‍👦.
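The "digital glue" can be seen by assembling the sequence by hand. This sketch only builds the string; whether it displays as one family emoji depends on the font and platform.

```python
ZWJ = "\u200D"            # ZERO WIDTH JOINER
man = "\U0001F468"        # 👨
woman = "\U0001F469"      # 👩
boy = "\U0001F466"        # 👦

family = man + ZWJ + woman + ZWJ + boy

print(family)
print(len(family))                       # 5 code points
print(len(family.encode("utf-8")))       # 18 bytes in UTF-8
print([f"U+{ord(c):04X}" for c in family])
```

So what looks like a single character is five code points: three emoji (4 bytes each) and two joiners (3 bytes each).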

Private Use Areas

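Some ranges in Unicode are intentionally reserved for private use, most notably U+E000 to U+F8FF. The standard will never assign official characters there, so companies like Apple or Google can use these Code Points for their own icons. Such characters only display as intended with a font that defines them.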
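One way to check whether a character falls in a private use range is to look at its general category, which is "Co" for private use Code Points. A minimal sketch using the standard `unicodedata` module; the helper name `is_private_use` is hypothetical.

```python
import unicodedata

def is_private_use(ch):
    """True if the character's general category is Co (private use)."""
    return unicodedata.category(ch) == "Co"

print(is_private_use("\uE000"))   # True  (start of the range)
print(is_private_use("\uF8FF"))   # True  (end of the range)
print(is_private_use("A"))        # False (an ordinary assigned letter)
```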

History of text

From Stone Tablets to Smartphones