Xerobit

Base64: How It Actually Works Under the Hood

Base64 is everywhere — in JWTs, data URLs, email attachments. This is the byte-level walkthrough of what it does, why it grows files by 33%, and the URL-safe variant.

Mian Ali Khalid · 10 min read
Use the tool
Base64 Encoder / Decoder
Encode and decode Base64 strings and files. Client-side, safe for sensitive data.
Open Base64 Encoder / Decoder →

If you’ve ever inspected a JWT, embedded an image as a data URL in CSS, or read raw email source, you’ve seen Base64. It looks like garbage:

SGVsbG8sIFhlcm9iaXQu

That’s “Hello, Xerobit.” (trailing period included): fifteen bytes encoded as twenty characters. This post walks the algorithm bit by bit so the next time you see Base64, you understand exactly what’s happening.

The problem Base64 solves

Some data is binary. Image bytes, audio samples, encrypted blobs, hash digests. Some transports are text-only. Email bodies (historically), JSON fields, URL query strings, HTTP headers, console output.

You can’t just paste binary into text. Bytes 0–31 are control characters that hose terminals. Byte 0 is null, which terminates C strings. Bytes above 127 are interpreted differently depending on encoding. JSON text must be valid Unicode (normally UTF-8), so parsers reject arbitrary byte sequences that aren’t valid UTF-8, and raw control characters aren’t allowed inside strings. The whole text-based ecosystem assumes a printable subset.

Base64 is the bridge. It maps any sequence of bytes onto a 64-character alphabet — A-Z, a-z, 0-9, +, / — that’s safe in essentially every text context. The cost: the encoded form is about 33% larger than the original.

The algorithm in three steps

Base64 turns three input bytes (24 bits) into four output characters (24 bits, 6 per character). The math:

  1. Take 3 bytes. That’s 24 bits.
  2. Split those 24 bits into 4 groups of 6 bits each.
  3. Map each 6-bit group to one character via the Base64 alphabet.

Let’s encode Cat:

Input:  C        a        t
ASCII:  67       97       116
Binary: 01000011 01100001 01110100

Concatenate the binary: 010000110110000101110100 (24 bits).

Split into 6-bit groups: 010000 110110 000101 110100 (4 groups).

Convert each to decimal: 16 54 5 52.

Look up in the Base64 alphabet:

0  A    8  I   16 Q   24 Y   32 g   40 o   48 w   56 4
1  B    9  J   17 R   25 Z   33 h   41 p   49 x   57 5
2  C    10 K   18 S   26 a   34 i   42 q   50 y   58 6
3  D    11 L   19 T   27 b   35 j   43 r   51 z   59 7
4  E    12 M   20 U   28 c   36 k   44 s   52 0   60 8
5  F    13 N   21 V   29 d   37 l   45 t   53 1   61 9
6  G    14 O   22 W   30 e   38 m   46 u   54 2   62 +
7  H    15 P   23 X   31 f   39 n   47 v   55 3   63 /

16 → Q, 54 → 2, 5 → F, 52 → 0.

Result: Q2F0, four characters. The original was three bytes; the output is four characters. Ratio: 4/3 ≈ 1.33x growth for every complete 3-byte group.
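
The three steps can be sketched in JavaScript for a single complete 3-byte group. This is a minimal illustration, not a production encoder: the name encodeGroup is made up for this post, and padding is deliberately left out.

```javascript
// Standard Base64 alphabet: index 0-63 maps to one character.
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encode exactly three bytes into four Base64 characters.
// encodeGroup is an illustrative name, not a built-in.
function encodeGroup(b0, b1, b2) {
  // Step 1: concatenate the three bytes into one 24-bit number.
  const bits = (b0 << 16) | (b1 << 8) | b2;
  // Steps 2 and 3: slice off 6 bits at a time and look each value up.
  return (
    ALPHABET[(bits >> 18) & 0x3f] +
    ALPHABET[(bits >> 12) & 0x3f] +
    ALPHABET[(bits >> 6) & 0x3f] +
    ALPHABET[bits & 0x3f]
  );
}

const [c, a, t] = new TextEncoder().encode("Cat"); // 67, 97, 116
console.log(encodeGroup(c, a, t)); // Q2F0
```

The mask 0x3f is 63 in decimal, so each lookup index is guaranteed to land inside the 64-character alphabet.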

Padding: the = characters at the end

The algorithm assumes input length is divisible by 3. What if it isn’t?

If the input has 1 byte (8 bits), you pad it with four zero bits to reach 12 bits, which is 2 six-bit groups. That encodes to just 2 characters. By convention, you pad the output to 4 characters with ==.

If the input has 2 bytes (16 bits), you get 18 bits → 3 groups → 3 characters. Pad to 4 with one =.

So:

Input bytes   Output chars     Padded output
1             2 + ==           XX==
2             3 + =            XXX=
3             4 (no padding)   XXXX

The = exists so encoded output length is always a multiple of 4, which lets some legacy parsers detect chunk boundaries without counting bytes.

Concrete example: encoding Hi (2 bytes):

Input:  H        i
Binary: 01001000 01101001

Pad with zero bits to make 18 bits → split into 6-bit groups → output 3 chars + 1 padding =:

010010 000110 1001(00)
  18     6      36 (last group padded with zeros)
   S     G      k

Result: SGk=. The encoded output is 4 characters; the third group’s last 2 bits are zero-padding because there were no real bits there. The = signals “the last group only had data for the first 4 bits; the rest are padding.”
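
The padding rules can be folded into a complete encoder. The sketch below is one way to write it by hand, assuming the input is a Uint8Array; base64Encode is an illustrative name, and real code would normally reach for a built-in instead.

```javascript
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encode a Uint8Array to padded standard Base64.
// base64Encode is an illustrative name, not a built-in.
function base64Encode(bytes) {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const remaining = bytes.length - i; // 1, 2, or at least 3 bytes left
    // Missing bytes are treated as zero, which supplies the zero-bit padding.
    const b0 = bytes[i];
    const b1 = remaining > 1 ? bytes[i + 1] : 0;
    const b2 = remaining > 2 ? bytes[i + 2] : 0;
    const bits = (b0 << 16) | (b1 << 8) | b2;

    out += ALPHABET[(bits >> 18) & 0x3f];
    out += ALPHABET[(bits >> 12) & 0x3f];
    // Groups that contain no real input bits are written as "=".
    out += remaining > 1 ? ALPHABET[(bits >> 6) & 0x3f] : "=";
    out += remaining > 2 ? ALPHABET[bits & 0x3f] : "=";
  }
  return out;
}

const enc = new TextEncoder();
console.log(base64Encode(enc.encode("H")));   // SA==
console.log(base64Encode(enc.encode("Hi")));  // SGk=
console.log(base64Encode(enc.encode("Cat"))); // Q2F0
```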

URL-safe Base64 (RFC 4648 §5)

The standard Base64 alphabet uses + and /. Both have special meanings in URLs (+ is sometimes interpreted as space, / is the path separator). URL-safe Base64 swaps these for - and _:

Index   Standard   URL-safe
62      +          -
63      /          _

Padding is also commonly omitted in URL-safe variants. JWTs use URL-safe Base64 without padding — that’s why JWT signatures don’t have trailing = characters.

To encode in URL-safe mode: encode normally, then s/\+/-/g and s/\//_/g (the + needs escaping in most regex flavors). Strip trailing = if your protocol accepts unpadded.

To decode URL-safe: reverse-substitute, re-pad to multiple of 4 with =, then standard-decode. The Base64 tool does both directions.
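
Both directions fit in a few lines of JavaScript. This is a sketch under the assumption that the input is already valid Base64; toBase64Url and fromBase64Url are names invented for this example.

```javascript
// Convert standard Base64 to the URL-safe alphabet and drop padding.
// toBase64Url and fromBase64Url are illustrative names.
function toBase64Url(standard) {
  return standard.replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
}

// Reverse the substitution and restore padding to a multiple of 4.
// A length of 4n + 1 is never valid Base64; this sketch does not check for it.
function fromBase64Url(urlSafe) {
  const standard = urlSafe.replace(/-/g, "+").replace(/_/g, "/");
  const padLength = (4 - (standard.length % 4)) % 4;
  return standard + "=".repeat(padLength);
}

// Bytes 0xfb 0xff 0xfe hit indices 62 and 63, so they show the difference.
const standard = btoa(String.fromCharCode(0xfb, 0xff, 0xfe));
console.log(standard);              // +//+
console.log(toBase64Url(standard)); // -__-
console.log(fromBase64Url("SGk"));  // SGk=
```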

Why exactly 33% larger?

Every 3 input bytes become 4 output characters, and each output character takes 1 byte (ASCII), so the expansion is 4/3 = 1.333… whenever the input length is a multiple of 3. With padding, the general formula is output_bytes = ceil(input_bytes / 3) * 4.

For very small inputs the padding makes the ratio worse:

Input bytes   Output bytes   Ratio
1             4 (XX==)       4.0×
2             4 (XXX=)       2.0×
3             4              1.33×
1,000         1,336          1.336×
1,000,000     1,333,336      1.333×

This 33% overhead is the immutable cost of fitting binary into a printable text channel. If size matters, don’t Base64 — use a binary-aware transport.
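
The formula is easy to check in code. The helper name encodedLength is hypothetical, and the sketch assumes padded output.

```javascript
// Padded Base64 length for n input bytes: ceil(n / 3) * 4.
// encodedLength is an illustrative name.
function encodedLength(inputBytes) {
  return Math.ceil(inputBytes / 3) * 4;
}

for (const n of [1, 2, 3, 1000, 1000000]) {
  const out = encodedLength(n);
  console.log(`${n} -> ${out} (${(out / n).toFixed(3)}x)`);
}
// 1 -> 4 (4.000x)
// 2 -> 4 (2.000x)
// 3 -> 4 (1.333x)
// 1000 -> 1336 (1.336x)
// 1000000 -> 1333336 (1.333x)
```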

Common Base64 use cases

Data URLs in HTML/CSS

<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUg..." alt="" />

Inlines a PNG directly in HTML. Saves an HTTP request. Worth it for small icons (<2KB). Beyond that, a separate request is usually cheaper than the 33% size penalty, which is paid every time the page itself loads because an inlined image can’t be cached on its own.
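
A data URL is just a prefix plus the Base64 of the file bytes. The sketch below builds one from the 8-byte PNG signature, which is why PNG data URLs start with iVBORw0KGgo. The helper name toDataUrl is made up for illustration, and the result is not a displayable image because it contains only the signature.

```javascript
// Build a data URL from raw bytes and a MIME type.
// toDataUrl is an illustrative name; spreading is fine for small inputs like icons.
function toDataUrl(bytes, mimeType) {
  return `data:${mimeType};base64,${btoa(String.fromCharCode(...bytes))}`;
}

// Every PNG file begins with these 8 bytes.
const pngSignature = new Uint8Array([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);
console.log(toDataUrl(pngSignature, "image/png"));
// data:image/png;base64,iVBORw0KGgo=
```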

JWT signatures

A JWT looks like header.payload.signature. Each part is URL-safe Base64-encoded. The header and payload are JSON; the signature is binary HMAC or RSA bytes. Base64 makes the binary signature fit in an HTTP header.
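
Reading the header and payload of a JWT takes nothing more than URL-safe Base64 decoding. The sketch below builds a JWT-shaped string first so it is self-contained; the claims and the placeholder signature are invented, and decodeSegment and encodeSegment are illustrative names. Decoding this way only inspects a token, it does not verify the signature.

```javascript
// Decode one URL-safe, unpadded Base64 segment into a UTF-8 string.
// decodeSegment is an illustrative helper name.
function decodeSegment(segment) {
  const standard = segment.replace(/-/g, "+").replace(/_/g, "/");
  const padded = standard + "=".repeat((4 - (standard.length % 4)) % 4);
  const bytes = Uint8Array.from(atob(padded), (ch) => ch.charCodeAt(0));
  return new TextDecoder().decode(bytes);
}

// Encode a UTF-8 string as a URL-safe, unpadded segment (for the demo token).
function encodeSegment(text) {
  const bytes = new TextEncoder().encode(text);
  return btoa(String.fromCharCode(...bytes))
    .replace(/\+/g, "-")
    .replace(/\//g, "_")
    .replace(/=+$/, "");
}

// A JWT-shaped string with made-up claims and a fake signature part.
const token = [
  encodeSegment(JSON.stringify({ alg: "HS256", typ: "JWT" })),
  encodeSegment(JSON.stringify({ sub: "demo-user" })),
  encodeSegment("not-a-real-signature"),
].join(".");

const [headerPart, payloadPart] = token.split(".");
console.log(headerPart);                                 // eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9
console.log(JSON.parse(decodeSegment(headerPart)).alg);  // HS256
console.log(JSON.parse(decodeSegment(payloadPart)).sub); // demo-user
```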

Email attachments (MIME)

Email bodies were originally 7-bit ASCII. To send a binary file, MIME wraps it in Base64 with line breaks every 76 characters (the maximum encoded line length MIME allows). That’s why raw email source has those wrapped Base64 blocks.
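
The wrapping step is simple to reproduce. This sketch assumes CRLF line endings, which is what email uses; wrapMime is a name invented for the example.

```javascript
// Wrap a Base64 string into lines of at most 76 characters, joined by CRLF.
// wrapMime is an illustrative name.
function wrapMime(base64, width = 76) {
  const lines = [];
  for (let i = 0; i < base64.length; i += width) {
    lines.push(base64.slice(i, i + width));
  }
  return lines.join("\r\n");
}

// 100 zero bytes encode to 136 characters, so the output has two lines.
const encoded = btoa(String.fromCharCode(...new Uint8Array(100)));
const wrapped = wrapMime(encoded);
console.log(wrapped.split("\r\n").map((line) => line.length).join(", ")); // 76, 60
```

A decoder that handles MIME input has to strip the line breaks again before decoding.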

API payloads with binary fields

JSON cannot carry raw bytes. APIs that need to send binary (image upload responses, file checksums, public keys) typically Base64-encode and put the string in a JSON field. The receiver decodes after parsing JSON.
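
A round trip through JSON might look like the sketch below. The field names and the four example bytes are hypothetical, chosen only to show the order of operations on each side.

```javascript
// Sender: put raw bytes into a JSON document as a Base64 string.
const digest = new Uint8Array([0xde, 0xad, 0xbe, 0xef]); // example bytes
const body = JSON.stringify({
  filename: "example.bin", // hypothetical field names
  checksum: btoa(String.fromCharCode(...digest)),
});
console.log(body); // {"filename":"example.bin","checksum":"3q2+7w=="}

// Receiver: parse the JSON first, then decode the field back to bytes.
const parsed = JSON.parse(body);
const received = Uint8Array.from(atob(parsed.checksum), (ch) => ch.charCodeAt(0));
console.log(Array.from(received).join(" ")); // 222 173 190 239
```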

What Base64 is NOT

Base64 is not encryption. It’s a reversible encoding. Anyone who sees the encoded form can decode it instantly. If your “secret” is Base64-encoded, it isn’t a secret. Use real cryptography instead: AES for encryption, bcrypt for password hashing.

Base64 is not compression. It makes data 33% larger, not smaller. Compress first, encode second if you need both.

Base64 is not a checksum. Encoding doesn’t detect or correct errors in the data.

Base64 is not the only encoding. Base32 (32-char alphabet, used in TOTP) and Base85 (denser, used in PostScript) exist. Base64 is just the most popular middle ground.

The UTF-8 / Latin-1 gotcha

This is the bug that bites every JavaScript developer at least once.

The browser’s built-in btoa() function only accepts strings where every character has a code point ≤ 255 (Latin-1). Pass it "Hello 🦊" and it throws InvalidCharacterError.

The fix: convert the string to UTF-8 bytes first, then Base64-encode the bytes:

const text = "Hello 🦊";
const bytes = new TextEncoder().encode(text);  // UTF-8 bytes
// Spreading is fine for short inputs; very large arrays can exceed the engine's argument limit.
const base64 = btoa(String.fromCharCode(...bytes));  // Base64

For decoding, reverse: atob returns a “binary string” where each character represents one byte. Convert back to UTF-8:

const binary = atob(base64);  // "binary string": one character per byte
const decodedBytes = Uint8Array.from(binary, c => c.charCodeAt(0));
const decodedText = new TextDecoder('utf-8').decode(decodedBytes);

The Base64 tool on Xerobit handles this correctly via TextEncoder/TextDecoder. Most homemade implementations don’t.
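
The two snippets can be wrapped into a pair of reusable helpers. This is a sketch, not the tool’s actual code: utf8ToBase64 and base64ToUtf8 are illustrative names, and the chunk size is an arbitrary conservative choice to avoid spreading a very large array into a single String.fromCharCode call.

```javascript
// Encode a string as UTF-8 bytes, then Base64.
// Bytes are processed in chunks because spreading a very large array into
// String.fromCharCode can exceed the engine's argument limit.
function utf8ToBase64(text) {
  const bytes = new TextEncoder().encode(text);
  const CHUNK = 0x8000; // 32768 bytes per call; an arbitrary conservative size
  let binary = "";
  for (let i = 0; i < bytes.length; i += CHUNK) {
    binary += String.fromCharCode(...bytes.subarray(i, i + CHUNK));
  }
  return btoa(binary);
}

// Decode Base64 to bytes, then interpret the bytes as UTF-8.
function base64ToUtf8(base64) {
  const bytes = Uint8Array.from(atob(base64), (ch) => ch.charCodeAt(0));
  return new TextDecoder("utf-8").decode(bytes);
}

const encoded = utf8ToBase64("Hello 🦊");
console.log(encoded);               // SGVsbG8g8J+mig==
console.log(base64ToUtf8(encoded)); // Hello 🦊
```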

Tooling note

When you debug a Base64 string in the wild:

  1. Check if it contains + / = (standard) or - _ (URL-safe).
  2. Check the length. If it’s not a multiple of 4 and there’s no padding, the padding has been stripped, which usually means URL-safe Base64.
  3. If it decodes to gibberish, you may have a UTF-8/Latin-1 encoding mismatch — try forcing UTF-8 decode.
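
The first two checks can be approximated in code. The function below is a rough heuristic sketch with a made-up name, describeBase64; it is not a validator and not how any particular tool implements detection.

```javascript
// Guess which Base64 variant a string uses.
// describeBase64 is an illustrative name; these are heuristics, not validation.
function describeBase64(input) {
  const body = input.replace(/=+$/, "");
  const hasStandardChars = /[+\/]/.test(body);
  const hasUrlSafeChars = /[-_]/.test(body);

  let alphabet = "ambiguous"; // only letters and digits: valid in both alphabets
  if (hasStandardChars && hasUrlSafeChars) alphabet = "mixed";
  else if (hasStandardChars) alphabet = "standard";
  else if (hasUrlSafeChars) alphabet = "url-safe";

  return {
    alphabet,
    hasPadding: input.endsWith("="),
    needsRepadding: input.length % 4 !== 0,
    // A body length of 4n + 1 cannot be produced by any Base64 encoder.
    plausibleLength: body.length % 4 !== 1,
  };
}

console.log(describeBase64("+//+").alphabet);      // standard
console.log(describeBase64("-__-").alphabet);      // url-safe
console.log(describeBase64("SGk").needsRepadding); // true
```

An "ambiguous" result is common for short strings, since most Base64 characters are shared by both alphabets.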

The Base64 tool auto-detects all of this. It also runs entirely client-side, so tokens you decode (JWTs, API keys, etc.) never leave your browser.

Bottom line

Base64 is a fixed-cost transformation: 4 characters out for every 3 bytes in. The growth is a feature, not a bug — it’s the price you pay for shipping arbitrary binary through text-only channels. Understand the algorithm once, recognize the variants (standard, URL-safe, padded, unpadded), and the next mystery JWT or data URL you see will read like prose.


Written by Mian Ali Khalid. Part of the Encoding & Crypto pillar.