Base64: How It Actually Works Under the Hood
Base64 is everywhere: in JWTs, data URLs, email attachments. This is a byte-level walkthrough of what it does, why it grows files by 33%, and how the URL-safe variant differs.
If you’ve ever inspected a JWT, embedded an image as a data URL in CSS, or read raw email source, you’ve seen Base64. It looks like garbage:
SGVsbG8sIFhlcm9iaXQu
That’s the string “Hello, Xerobit.” (fifteen bytes) encoded as twenty characters. This post walks the algorithm bit by bit so the next time you see Base64, you understand exactly what’s happening.
The problem Base64 solves
Some data is binary. Image bytes, audio samples, encrypted blobs, hash digests. Some transports are text-only. Email bodies (historically), JSON fields, URL query strings, HTTP headers, console output.
You can’t just paste binary into text. Bytes 0–31 are control characters that hose terminals. Byte 0 is null, which terminates C strings. Bytes above 127 are interpreted differently depending on encoding. JSON must be valid Unicode text, so parsers reject arbitrary byte sequences such as invalid UTF-8 or unescaped control characters. The whole text-based ecosystem assumes a printable subset.
Base64 is the bridge. It maps any sequence of bytes onto a 64-character alphabet — A-Z, a-z, 0-9, +, / — that’s safe in essentially every text context. The cost: the encoded form is about 33% larger than the original.
The algorithm in three steps
Base64 turns three input bytes (24 bits) into four output characters (24 bits, 6 per character). The math:
- Take 3 bytes. That’s 24 bits.
- Split those 24 bits into 4 groups of 6 bits each.
- Map each 6-bit group to one character via the Base64 alphabet.
Let’s encode Cat:
Input: C a t
ASCII: 67 97 116
Binary: 01000011 01100001 01110100
Concatenate the binary: 010000110110000101110100 (24 bits).
Split into 6-bit groups: 010000 110110 000101 110100 (4 groups).
Convert each to decimal: 16 54 5 52.
Look up in the Base64 alphabet:
0 A 8 I 16 Q 24 Y 32 g 40 o 48 w 56 4
1 B 9 J 17 R 25 Z 33 h 41 p 49 x 57 5
2 C 10 K 18 S 26 a 34 i 42 q 50 y 58 6
3 D 11 L 19 T 27 b 35 j 43 r 51 z 59 7
4 E 12 M 20 U 28 c 36 k 44 s 52 0 60 8
5 F 13 N 21 V 29 d 37 l 45 t 53 1 61 9
6 G 14 O 22 W 30 e 38 m 46 u 54 2 62 +
7 H 15 P 23 X 31 f 39 n 47 v 55 3 63 /
16 → Q, 54 → 2, 5 → F, 52 → 0.
Result: Q2F0, four characters. Original was three bytes, output is four characters. Ratio: 4/3 ≈ 1.33x growth for every full 3-byte group.
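The three steps above can be sketched in a few lines of JavaScript. This is a minimal illustration, not a production encoder: encodeTriplet is a hypothetical helper name, and it only handles exactly three bytes.

```javascript
// Standard Base64 alphabet (RFC 4648 §4), indexed 0..63.
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Hypothetical helper: encode exactly three bytes into four characters.
function encodeTriplet(b0, b1, b2) {
  // Step 1: concatenate the three bytes into one 24-bit integer.
  const bits = (b0 << 16) | (b1 << 8) | b2;
  // Steps 2 and 3: peel off four 6-bit groups, most significant first,
  // and look each one up in the alphabet.
  return (
    ALPHABET[(bits >> 18) & 0x3f] +
    ALPHABET[(bits >> 12) & 0x3f] +
    ALPHABET[(bits >> 6) & 0x3f] +
    ALPHABET[bits & 0x3f]
  );
}

console.log(encodeTriplet(67, 97, 116)); // "Cat" → "Q2F0"
```

The shifts by 18, 12, 6, and 0 correspond to the four 6-bit groups in the worked example, and the 0x3f mask keeps only the low 6 bits of each.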
Padding: the = characters at the end
The algorithm assumes input length is divisible by 3. What if it isn’t?
If the input has 1 byte (8 bits), you pad with four zero bits to get 12 bits, which is 2 six-bit groups. That encodes to just 2 characters. By convention, you pad to 4 characters with ==.
If the input has 2 bytes (16 bits), you get 18 bits → 3 groups → 3 characters. Pad to 4 with one =.
So:
| Bytes in final group | Output chars | Output shape |
|---|---|---|
| 1 | 2 + == | XX== |
| 2 | 3 + = | XXX= |
| 3 | 4 (no padding) | XXXX |
The = exists so encoded output length is always a multiple of 4, which lets some legacy parsers detect chunk boundaries without counting bytes.
Concrete example: encoding Hi (2 bytes):
Input: H i
Binary: 01001000 01101001
Pad with zero bits to make 18 bits → split into 6-bit groups → output 3 chars + 1 padding =:
010010 000110 1001(00)
18 6 36 (last group padded with zeros)
S G k
Result: SGk=. The encoded output is 4 characters. The = tells the decoder that the final block carries only two bytes: the third character holds just 4 real bits, and its last 2 bits are zero-padding to be discarded.
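Putting the padding rules together, a complete encoder might look like the sketch below. It assumes the input is a Uint8Array (or any array of byte values), and encodeBase64 is a hypothetical name chosen for illustration; in practice you would normally use the platform's built-in encoder.

```javascript
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Hypothetical helper: encode a Uint8Array (or array of byte values).
function encodeBase64(bytes) {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const remaining = bytes.length - i; // real bytes left, counting this chunk
    // Missing bytes are treated as zero, which supplies the zero-bit padding.
    const b0 = bytes[i];
    const b1 = remaining > 1 ? bytes[i + 1] : 0;
    const b2 = remaining > 2 ? bytes[i + 2] : 0;
    const bits = (b0 << 16) | (b1 << 8) | b2;

    out += ALPHABET[(bits >> 18) & 0x3f];
    out += ALPHABET[(bits >> 12) & 0x3f];
    // Positions that would carry no real data are written as "=".
    out += remaining > 1 ? ALPHABET[(bits >> 6) & 0x3f] : "=";
    out += remaining > 2 ? ALPHABET[bits & 0x3f] : "=";
  }
  return out;
}

console.log(encodeBase64(new TextEncoder().encode("Hi"))); // "SGk="
console.log(encodeBase64(new TextEncoder().encode("H")));  // "SA=="
```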
URL-safe Base64 (RFC 4648 §5)
The standard Base64 alphabet uses + and /. Both have special meanings in URLs (+ is sometimes interpreted as space, / is the path separator). URL-safe Base64 swaps these for - and _:
| Index | Standard | URL-safe |
|---|---|---|
| 62 | + | - |
| 63 | / | _ |
Padding is also commonly omitted in URL-safe variants. JWTs use URL-safe Base64 without padding — that’s why JWT signatures don’t have trailing = characters.
To encode in URL-safe mode: encode normally, then replace every + with - and every / with _ (as regex substitutions: s/\+/-/g and s/\//_/g). Strip trailing = if your protocol accepts unpadded.
To decode URL-safe: reverse-substitute, re-pad to multiple of 4 with =, then standard-decode. The Base64 tool does both directions.
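Both directions can be sketched as plain string transforms on top of an existing standard encoder. The function names here are illustrative, not from any library, and the sketch does no validation: an input whose length is 1 mod 4 can never be valid Base64, and this code would pad it anyway.

```javascript
// Hypothetical helpers for converting between the two alphabets.
function toBase64Url(base64) {
  return base64
    .replace(/\+/g, "-") // index 62
    .replace(/\//g, "_") // index 63
    .replace(/=+$/, ""); // drop trailing padding
}

function fromBase64Url(base64url) {
  const standard = base64url.replace(/-/g, "+").replace(/_/g, "/");
  // Re-pad to a multiple of 4 so strict decoders accept it.
  const padLength = (4 - (standard.length % 4)) % 4;
  return standard + "=".repeat(padLength);
}

console.log(toBase64Url("+/8="));  // "-_8"
console.log(fromBase64Url("-_8")); // "+/8="
```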
Why exactly 33% larger?
Base64 expansion approaches 4/3 = 1.333… as inputs grow. Every 3 input bytes become 4 output characters, and a final partial group is padded out to 4 characters as well. Each output character takes 1 byte (ASCII). So output_bytes = ceil(input_bytes / 3) * 4.
For very small inputs the padding makes the ratio worse:
| Input bytes | Output bytes | Ratio |
|---|---|---|
| 1 | 4 (XX==) | 4.0× |
| 2 | 4 (XXX=) | 2.0× |
| 3 | 4 | 1.33× |
| 1000 | 1336 | 1.336× |
| 1,000,000 | 1,333,336 | 1.333× |
This 33% overhead is the immutable cost of fitting binary into a printable text channel. If size matters, don’t Base64 — use a binary-aware transport.
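To check the arithmetic yourself, the length formula is a one-liner. This sketch assumes padded output; encodedLength is a hypothetical name.

```javascript
// Padded Base64 length for a given number of input bytes.
function encodedLength(inputBytes) {
  return Math.ceil(inputBytes / 3) * 4;
}

// Print input size, output size, and growth ratio for a few sizes.
for (const n of [1, 2, 3, 1000, 1000000]) {
  const out = encodedLength(n);
  console.log(n, out, (out / n).toFixed(3) + "×");
}
```

The ratio starts at 4× for a single byte and settles toward 1.333× as the padding becomes negligible.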
Common Base64 use cases
Data URLs in HTML/CSS
<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUg..." alt="" />
Inlines a PNG directly in HTML. Saves an HTTP request. Worth it for small icons (<2KB). Beyond that, a separate request is usually cheaper than the 33% size penalty, which is paid on every page load because inlined data can’t be cached separately from the document that contains it.
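Building a data URL is just string concatenation around the encoded bytes. The sketch below assumes an environment with a global btoa (browsers and recent Node.js); toDataUrl is a hypothetical helper, and the sample bytes are only the 8-byte PNG file signature, not a complete image.

```javascript
// Hypothetical helper: wrap raw bytes in a data URL with the given MIME type.
function toDataUrl(bytes, mimeType) {
  let binary = "";
  for (const byte of bytes) binary += String.fromCharCode(byte);
  return `data:${mimeType};base64,${btoa(binary)}`;
}

// The PNG file signature, used here only as sample bytes.
const pngSignature = new Uint8Array([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);
console.log(toDataUrl(pngSignature, "image/png"));
// "data:image/png;base64,iVBORw0KGgo="
```

That iVBORw0KGgo prefix is why every Base64-encoded PNG starts with the same characters.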
JWT signatures
A JWT looks like header.payload.signature. Each part is URL-safe Base64-encoded. The header and payload are JSON; the signature is binary HMAC or RSA bytes. Base64 makes the binary signature fit in an HTTP header.
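Reading the header and payload of a JWT is therefore just URL-safe Base64 decoding followed by JSON parsing. The token below is made up for illustration, its signature part is a placeholder rather than a real HMAC, and the helper names are hypothetical. Decoding a token this way does not verify it.

```javascript
// Decode one URL-safe, unpadded Base64 segment into a UTF-8 string.
function decodeSegment(segment) {
  const standard = segment.replace(/-/g, "+").replace(/_/g, "/");
  const padded = standard + "=".repeat((4 - (standard.length % 4)) % 4);
  const bytes = Uint8Array.from(atob(padded), (c) => c.charCodeAt(0));
  return new TextDecoder().decode(bytes);
}

// Encode a UTF-8 string as a URL-safe, unpadded segment (used to build sample input).
function encodeSegment(text) {
  let binary = "";
  for (const byte of new TextEncoder().encode(text)) binary += String.fromCharCode(byte);
  return btoa(binary).replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
}

// A made-up token; the third part is a placeholder, not a real signature.
const token = [
  encodeSegment(JSON.stringify({ alg: "HS256", typ: "JWT" })),
  encodeSegment(JSON.stringify({ sub: "1234", name: "Ada" })),
  "placeholder-signature",
].join(".");

const [headerPart, payloadPart] = token.split(".");
console.log(JSON.parse(decodeSegment(headerPart)).alg);   // "HS256"
console.log(JSON.parse(decodeSegment(payloadPart)).name); // "Ada"
```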
Email attachments (MIME)
Email bodies were originally 7-bit ASCII. To send a binary file, MIME wraps it in Base64 with a line break at least every 76 characters (the maximum encoded line length MIME allows). That’s why raw email source has those wrapped Base64 blocks.
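The wrapping itself is simple to sketch. wrapForMime is a hypothetical helper that assumes CRLF line endings, as email uses, and it covers only the line-breaking step, not the surrounding MIME headers.

```javascript
// Hypothetical helper: break a Base64 string into lines of at most 76
// characters, joined with CRLF.
function wrapForMime(base64, width = 76) {
  const lines = [];
  for (let i = 0; i < base64.length; i += width) {
    lines.push(base64.slice(i, i + width));
  }
  return lines.join("\r\n");
}

const sample = btoa("x".repeat(120)); // 120 bytes → 160 Base64 characters
const wrapped = wrapForMime(sample);
console.log(wrapped.split("\r\n").map((line) => line.length)); // lengths 76, 76, 8
```

A decoder has to strip those line breaks before decoding; MIME decoders do this for you.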
API payloads with binary fields
JSON cannot carry raw bytes. APIs that need to send binary (image upload responses, file checksums, public keys) typically Base64-encode and put the string in a JSON field. The receiver decodes after parsing JSON.
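A round trip through JSON might look like the sketch below. The field names filename and data are invented for this example, and the helpers are hypothetical wrappers around btoa and atob.

```javascript
// Bytes ↔ Base64 helpers built on btoa/atob.
function bytesToBase64(bytes) {
  let binary = "";
  for (const byte of bytes) binary += String.fromCharCode(byte);
  return btoa(binary);
}
function base64ToBytes(base64) {
  return Uint8Array.from(atob(base64), (c) => c.charCodeAt(0));
}

// Sender: put the encoded bytes in an ordinary string field.
const payload = JSON.stringify({
  filename: "blob.bin",
  data: bytesToBase64(new Uint8Array([0, 255, 16, 128])),
});

// Receiver: parse JSON first, then decode the field back into bytes.
const received = JSON.parse(payload);
const receivedBytes = base64ToBytes(received.data);
console.log(received.data);             // "AP8QgA=="
console.log(Array.from(receivedBytes)); // the original four byte values
```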
What Base64 is NOT
Base64 is not encryption. It’s a reversible encoding. Anyone who sees the encoded form can decode it instantly. If your “secret” is Base64-encoded, it isn’t a secret. Use real cryptography instead: AES for encryption, bcrypt for password hashing.
Base64 is not compression. It makes data 33% larger, not smaller. Compress first, encode second if you need both.
Base64 is not a checksum. Encoding doesn’t detect or correct errors in the data.
Base64 is not the only encoding. Base32 (32-char alphabet, used in TOTP) and Base85 (denser, used in PostScript) exist. Base64 is just the most popular middle ground.
The UTF-8 / Latin-1 gotcha
This is the bug that bites every JavaScript developer at least once.
The browser’s built-in btoa() function only accepts strings where every character has a code point ≤ 255 (Latin-1). Pass it "Hello 🦊" and it throws InvalidCharacterError.
The fix: convert the string to UTF-8 bytes first, then Base64-encode the bytes:
const text = "Hello 🦊";
const bytes = new TextEncoder().encode(text); // UTF-8 bytes
const base64 = btoa(String.fromCharCode(...bytes)); // Base64
For decoding, reverse: atob returns a “binary string” where each character represents one byte. Convert back to UTF-8:
const binary = atob(base64); // one character per byte
const decodedBytes = Uint8Array.from(binary, c => c.charCodeAt(0)); // back to raw bytes
const decodedText = new TextDecoder('utf-8').decode(decodedBytes); // "Hello 🦊"
The Base64 tool on Xerobit handles this correctly via TextEncoder/TextDecoder. Most homemade implementations don’t.
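One caveat with the spread form: String.fromCharCode(...bytes) passes every byte as a separate argument, and very large inputs can exceed the engine's argument limit. A chunked variant avoids that. This is a sketch under assumptions, not how the Xerobit tool is implemented: the helper names are hypothetical and the chunk size is an arbitrary conservative choice.

```javascript
// Hypothetical helpers: UTF-8 text ↔ Base64 without spreading a huge array.
function utf8ToBase64(text) {
  const bytes = new TextEncoder().encode(text);
  let binary = "";
  const chunkSize = 0x8000; // 32,768 bytes per call; an arbitrary safe size
  for (let i = 0; i < bytes.length; i += chunkSize) {
    binary += String.fromCharCode.apply(null, bytes.subarray(i, i + chunkSize));
  }
  return btoa(binary);
}

function base64ToUtf8(base64) {
  const bytes = Uint8Array.from(atob(base64), (c) => c.charCodeAt(0));
  return new TextDecoder("utf-8").decode(bytes);
}

const encoded = utf8ToBase64("Hello 🦊");
console.log(encoded);               // "SGVsbG8g8J+mig=="
console.log(base64ToUtf8(encoded)); // "Hello 🦊"
```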
Tooling note
When you debug a Base64 string in the wild:
- Check if it contains +, /, or = (standard) or - and _ (URL-safe).
- Check the length. If it’s not a multiple of 4 and there’s no padding, it’s URL-safe with stripped padding.
- If it decodes to gibberish, you may have a UTF-8/Latin-1 encoding mismatch — try forcing UTF-8 decode.
The Base64 tool auto-detects all of this. It also runs entirely client-side, so tokens you decode (JWTs, API keys, etc.) never leave your browser.
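If you want the first two checks in code, a rough classifier is enough. This is a sketch under assumptions, not how the Xerobit tool works: describeBase64 and its result fields are hypothetical, and it inspects only characters and length without attempting a decode.

```javascript
// Hypothetical helper: a rough classifier, not a validator.
function describeBase64(input) {
  const hasStandardChars = /[+\/]/.test(input);
  const hasUrlSafeChars = /[-_]/.test(input);
  const body = input.replace(/=+$/, ""); // characters before any padding

  let variant = "either"; // letters and digits only: both alphabets agree
  if (hasStandardChars && hasUrlSafeChars) variant = "mixed";
  else if (hasStandardChars) variant = "standard";
  else if (hasUrlSafeChars) variant = "url-safe";

  return {
    variant,
    padded: body.length !== input.length,
    paddingStripped: input.length % 4 !== 0,
    // No encoder produces a body length of 1 mod 4, so that suggests truncation.
    plausibleLength: body.length % 4 !== 1,
  };
}

console.log(describeBase64("SGk=").padded);  // true
console.log(describeBase64("-_8").variant);  // "url-safe"
```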
Bottom line
Base64 is a fixed-cost transformation: 4 characters out for every 3 bytes in. The growth is a feature, not a bug — it’s the price you pay for shipping arbitrary binary through text-only channels. Understand the algorithm once, recognize the variants (standard, URL-safe, padded, unpadded), and the next mystery JWT or data URL you see will read like prose.
Written by Mian Ali Khalid. Part of the Encoding & Crypto pillar.