Xerobit

Percent Encoding and RFC 3986 Explained

Why is `+` sometimes a space and sometimes a literal plus? Why does `%2520` show up in your logs? RFC 3986 percent-encoding, end to end, with the rules nobody quite remembers.

Mian Ali Khalid · 10 min read

You build a URL. The user’s input contains an ampersand. Or a slash. Or a non-ASCII character. Suddenly the server parses the URL wrong, the email arrives with &amp; in the subject, or + becomes a space (or doesn’t). Welcome to URL encoding, the most consistently misunderstood corner of web development.

This post is the rules. Not the broad strokes — the exact rules from RFC 3986 about what gets encoded and why.

What percent encoding is

A URL can only safely carry a fixed alphabet of characters. Anything outside that alphabet — spaces, accented letters, emoji, ampersands in user input — must be encoded.

The encoding scheme is “percent encoding”: each disallowed byte is written as % followed by two hex digits representing the byte’s value.

Input:  Hello World!
Output: Hello%20World%21

Space (byte 0x20) → %20. Exclamation mark (0x21) → %21. Letters and digits pass through unchanged because they’re in the allowed alphabet.

For non-ASCII characters, percent encoding works on UTF-8 bytes:

Input:  café
UTF-8:  c (0x63) a (0x61) f (0x66) é (0xC3 0xA9)
Output: caf%C3%A9

é is a multi-byte character in UTF-8 (bytes 0xC3 0xA9), so it becomes two percent-escape sequences. The decoder reverses this: read percent-escapes, reassemble UTF-8 bytes, decode to text.
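The byte-level process above can be sketched in a few lines. This is an illustration only: it assumes a runtime with the global TextEncoder (browsers and current Node.js), and percentEncodeBytes is a hypothetical name. In real code, encodeURIComponent does this job.

```javascript
// Hypothetical helper: percent-encode every byte that is not an
// RFC 3986 unreserved character (A-Z a-z 0-9 - _ . ~).
function percentEncodeBytes(text) {
  const bytes = new TextEncoder().encode(text); // string -> UTF-8 bytes
  let out = '';
  for (const b of bytes) {
    const ch = String.fromCharCode(b);
    if (b < 0x80 && /[A-Za-z0-9\-_.~]/.test(ch)) {
      out += ch; // unreserved ASCII passes through unchanged
    } else {
      // everything else becomes % plus two uppercase hex digits
      out += '%' + b.toString(16).toUpperCase().padStart(2, '0');
    }
  }
  return out;
}

console.log(percentEncodeBytes('Hello World!')); // Hello%20World%21
console.log(percentEncodeBytes('café'));         // caf%C3%A9
```

Unlike encodeURIComponent, this sketch also encodes ! * ' ( ), because it keeps only the unreserved set.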

The five character categories in RFC 3986

This is where things get interesting. For deciding when encoding is required, every character falls into one of five categories. RFC 3986 itself defines the first two (unreserved and reserved); the other three cover everything it leaves out.

1. Unreserved characters — never encoded

A-Z  a-z  0-9  -  _  .  ~

These 66 characters are always safe. They never need encoding. They never should be encoded — encoding them is technically valid but generally pointless. Use these for URL slugs and identifiers.

2. Reserved characters — context-dependent

These have syntactic meaning in URLs:

gen-delims:  : / ? # [ ] @
sub-delims:  ! $ & ' ( ) * + , ; =

Gen-delims separate the major URL components: : ends the scheme, // introduces the host, / separates path segments, ? starts the query, # starts the fragment. Sub-delims are special only inside specific components, such as & and = in a query string.

When these characters appear in a literal sense (inside a value rather than as syntax), they must be encoded. When they’re acting as syntax (the ? separating path from query), they pass through unencoded.

This is why encodeURI and encodeURIComponent differ — one is for the URL skeleton (preserve syntax chars), the other for values (encode everything that has special meaning).

3. Other ASCII (printable) — must encode

< > { } | \ ^ " and the backtick, plus the space and any % that does not start an escape. These are not allowed in URIs and must always be encoded.

4. Control characters (ASCII codes 0-31 and 127) — must encode

Newlines, tabs, NUL bytes, DEL — all encoded.

5. Non-ASCII (UTF-8) — must encode as bytes

Everything outside ASCII is multiple UTF-8 bytes, each percent-encoded.
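As a rough illustration of the five categories, here is a small classifier. The function name and the category labels are made up for this sketch; nothing in RFC 3986 or JavaScript defines them.

```javascript
// Hypothetical classifier for a single character, following the five
// categories described above. Labels are illustrative, not from the RFC.
const GEN_DELIMS = ':/?#[]@';
const SUB_DELIMS = "!$&'()*+,;=";

function classifyChar(ch) {
  const cp = ch.codePointAt(0);
  if (cp > 0x7f) return 'non-ascii';               // category 5
  if (cp < 0x20 || cp === 0x7f) return 'control';  // category 4
  if (/^[A-Za-z0-9\-._~]$/.test(ch)) return 'unreserved'; // category 1
  if (GEN_DELIMS.includes(ch) || SUB_DELIMS.includes(ch)) {
    return 'reserved';                             // category 2
  }
  return 'other-ascii';                            // category 3
}

console.log(classifyChar('a'));  // unreserved
console.log(classifyChar('&'));  // reserved
console.log(classifyChar('<'));  // other-ascii
console.log(classifyChar('\n')); // control
console.log(classifyChar('é'));  // non-ascii
```

Only the 'reserved' result needs context to decide whether to encode; the other four answers are fixed.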

The two JavaScript functions and why both exist

JavaScript ships two URL-encoding functions, and confusion between them causes most URL bugs:

encodeURIComponent — for values

Encodes everything except the unreserved characters plus ! * ' ( ), which it also leaves alone. Use this for query string values, path segments, anything that’s a value going into a URL.

encodeURIComponent("https://example.com")
// → "https%3A%2F%2Fexample.com"

The : and / are encoded because we’re treating the input as a value, not as URL syntax.

encodeURI — for whole URLs

Encodes only what’s strictly illegal in URLs. Preserves the URL syntax characters (/, :, ?, #, &, =, +, etc.).

encodeURI("https://example.com/search?q=hello world")
// → "https://example.com/search?q=hello%20world"

The :, //, ? are kept because they’re URL syntax. Only the space gets encoded.

Rule of thumb: Use encodeURIComponent for values. Use encodeURI only when you have a complete URL with potentially-illegal characters and need to leave the structure intact. Almost always you want encodeURIComponent.

The mistake people make: using encodeURI on a value. Then & and = in the value are not encoded, and they get parsed as URL syntax — splitting one query parameter into two.
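A short demonstration of that mistake, using the standard URL parser to show what a server would see. The input value and the example.com URL are placeholders chosen for this sketch.

```javascript
const value = 'salt&pepper=1'; // hypothetical user input, meant as ONE value

// Wrong: encodeURI leaves & and = alone, so they become URL syntax.
const bad = 'https://example.com/search?q=' + encodeURI(value);
// Right: encodeURIComponent escapes them, so the value stays intact.
const good = 'https://example.com/search?q=' + encodeURIComponent(value);

const badParams = new URL(bad).searchParams;
const goodParams = new URL(good).searchParams;

console.log(bad);                     // https://example.com/search?q=salt&pepper=1
console.log(badParams.get('q'));      // salt
console.log(badParams.get('pepper')); // 1  (a parameter nobody intended)

console.log(good);                    // https://example.com/search?q=salt%26pepper%3D1
console.log(goodParams.get('q'));     // salt&pepper=1
```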

The application/x-www-form-urlencoded variant

HTML form submissions and many HTTP APIs use a variant of percent encoding called application/x-www-form-urlencoded. Two key differences from RFC 3986:

  1. Spaces become +, not %20. This is the form-encoding convention.
  2. A different safe-character list. In the WHATWG URL Standard serializer (the one URLSearchParams uses), only letters, digits, and * - . _ pass through; everything else is percent-encoded, including ~.

Plain percent encoding:        hello%20world
Form encoding:                  hello+world

When URLSearchParams builds query strings, it produces form encoding (+ for spaces). When you decodeURIComponent form-encoded text, the + stays a literal + — you have to first replace + with %20, then decode.

const formEncoded = "hello+world";
// decodeURIComponent doesn't decode + → space:
decodeURIComponent(formEncoded);  // "hello+world"

// To handle form encoding:
decodeURIComponent(formEncoded.replace(/\+/g, '%20'));  // "hello world"

The URL Encoder handles both — there’s a “form-urlencoded” mode that decodes + as space.
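If both ends of a round trip use URLSearchParams, the + handling takes care of itself. A minimal sketch, with made-up parameter names:

```javascript
// Serialising: spaces become +, a literal + becomes %2B, ~ becomes %7E.
const built = new URLSearchParams({
  q: 'hello world',
  expr: '1+1',
  home: '~user'
}).toString();
console.log(built); // q=hello+world&expr=1%2B1&home=%7Euser

// Parsing: URLSearchParams applies the form rules in reverse.
const parsed = new URLSearchParams(built);
console.log(parsed.get('q'));    // hello world
console.log(parsed.get('expr')); // 1+1
console.log(parsed.get('home')); // ~user
```

The manual replace-then-decode step is only needed when form-encoded text is decoded with decodeURIComponent instead.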

RFC 3986 strict mode (sub-delims)

RFC 3986 lists ! * ' ( ) among the sub-delims, which makes them reserved characters. JavaScript’s encodeURIComponent does NOT encode them (it follows the older RFC 2396, where they counted as unreserved), even though some applications expect the strictly encoded forms.

For maximum compatibility (mostly with older OAuth flows and some XML-based APIs), encode these too:

function strictEncode(str) {
  return encodeURIComponent(str).replace(
    /[!*'()]/g,
    c => '%' + c.charCodeAt(0).toString(16).toUpperCase()
  );
}

The URL Encoder has this as the “RFC 3986 strict” mode.

Why your “%2520” is a sign of double-encoding

Single percent-encoding: space → %20. Double percent-encoding: space → %20, then the % itself is encoded as %25, giving %2520.

If you see %2520 in your URLs, something is encoding twice. Common causes:

  • A framework that auto-encodes input that’s already encoded.
  • A redirect chain where each hop re-encodes the URL.
  • A reverse proxy that decodes/re-encodes incorrectly.

The fix is finding the duplicate-encoder, not band-aiding the output. Trace the request path; one of the layers is treating already-encoded input as raw input.
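To reproduce the symptom and spot it in logs, something like the following may help. looksDoubleEncoded is a hypothetical helper and only a heuristic: a value whose raw text really contains "%20" encodes to "%2520" legitimately, so a match is a hint, not proof.

```javascript
const once = encodeURIComponent('hello world');
const twice = encodeURIComponent(once); // a second layer encodes the % again

console.log(once);  // hello%20world
console.log(twice); // hello%2520world

// Heuristic: an encoded percent sign (%25) followed by two hex digits
// usually means an escape sequence was itself escaped.
function looksDoubleEncoded(s) {
  return /%25[0-9A-Fa-f]{2}/.test(s);
}

console.log(looksDoubleEncoded(once));  // false
console.log(looksDoubleEncoded(twice)); // true

// One decode only peels off one layer.
console.log(decodeURIComponent(twice) === once); // true
```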

Encoding by URL component

Different parts of a URL have different encoding rules:

Scheme

http, https, mailto. Letters, digits, +, -, and . only. Never percent-encoded, because the scheme alphabet is restricted.

Host

Domain names use punycode (the xn-- prefix) for international characters, not percent encoding. Modern browsers accept both the Unicode and punycode forms and usually display the human-readable one.
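The standard URL parser shows the split between the two mechanisms. This sketch assumes a runtime whose URL implementation performs IDNA conversion (browsers and standard Node.js builds); the domain is a placeholder.

```javascript
const url = new URL('https://münchen.example/straße');

// The host is converted to punycode, not percent-encoded.
console.log(url.hostname); // xn--mnchen-3ya.example

// The path is percent-encoded as UTF-8 bytes, as usual.
console.log(url.pathname); // /stra%C3%9Fe
```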

Path

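Slashes (/) are syntax and stay literal. Spaces and special characters inside a path segment are encoded. RFC 3986 lets sub-delims, : and @ appear literally in a segment, so a path segment needs less encoding than a query value, where & and = act as delimiters.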
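One way to keep syntax slashes and data slashes apart is to encode each segment on its own and join afterwards. A minimal sketch with made-up segment values:

```javascript
// Each array entry is ONE segment, even if it contains a slash.
const segments = ['reports', '2024/Q1', 'summary & notes'];

const path = '/' + segments.map(s => encodeURIComponent(s)).join('/');
console.log(path); // /reports/2024%2FQ1/summary%20%26%20notes

// Splitting on the literal slashes and decoding recovers the segments.
const decoded = path.split('/').slice(1).map(s => decodeURIComponent(s));
console.log(decoded); // [ 'reports', '2024/Q1', 'summary & notes' ]
```

encodeURIComponent encodes more than a path segment strictly requires (the & here, for example), which is harmless and keeps the rule simple.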

Query

After ?. Each parameter is name=value, separated by &. Inside values, encode everything that’s not unreserved. Use URLSearchParams to do this correctly:

const params = new URLSearchParams({
  q: "hello world",
  filter: "name=Ada"
});
fetch(`/search?${params}`);
// → /search?q=hello+world&filter=name%3DAda

Note = in the value is encoded; & separating params is not (it’s syntax).

Fragment

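After #. RFC 3986 allows the same character set as in queries. The difference is convention: & and = have no built-in key/value meaning in a fragment, so they can appear literally unless the application assigns them one. Spaces still encode to %20.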

This is why deep-linked SPA URLs with state in the fragment can be tricky — the encoding rules subtly differ from query strings.
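A small sketch of fragment handling with the URL API. Treating the fragment as key=value pairs is an application-level choice assumed for this example, not something the URL format defines.

```javascript
const url = new URL('https://example.com/app#tab=1&view=list');

// & and = pass through the parser untouched inside the fragment.
console.log(url.hash); // #tab=1&view=list

// If the app chooses to store state as key=value pairs, URLSearchParams
// can read it, but it applies form rules (so + would decode as a space).
const state = new URLSearchParams(url.hash.slice(1));
console.log(state.get('view')); // list

// Assigning a fragment with a space: the URL API encodes it as %20.
url.hash = 'section two';
console.log(url.href); // https://example.com/app#section%20two
```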

A canonical encoding workflow

For most applications, this is the safe pattern:

// Building URLs
const baseUrl = 'https://api.example.com/users';
const userId = encodeURIComponent(userInput);  // path segment
const queryParams = new URLSearchParams({       // query values
  filter: filterText,
  sort: 'desc'
});

const url = `${baseUrl}/${userId}?${queryParams}`;

  • encodeURIComponent for path segments and individual query values.
  • URLSearchParams for assembling query strings.
  • Don’t manually concat ? and & — let URLSearchParams handle it.

Common bugs (full list elsewhere)

These come up so often they have their own post: URL Encoding Common Bugs. Highlights:

  • Double-encoding: %2520 everywhere.
  • encodeURI for a query value: splits the param on an unencoded &.
  • + ambiguity: space in query, literal in path.
  • Trailing slash inconsistency: path vs path/ are different URLs.
  • Fragment encoding mismatch: a literal # must be encoded as %23 if it appears mid-URL.
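Two of these are easy to reproduce with the URL API. The URLs and values below are placeholders for this sketch.

```javascript
// Unencoded # mid-URL: everything after it becomes the fragment.
const raw = new URL('https://example.com/search?q=C#&page=2');
console.log(raw.search); // ?q=C
console.log(raw.hash);   // #&page=2

// Encoding the value first keeps the query intact.
const fixed = new URL(
  'https://example.com/search?q=' + encodeURIComponent('C#') + '&page=2'
);
console.log(fixed.searchParams.get('q'));    // C#
console.log(fixed.searchParams.get('page')); // 2

// + ambiguity: literal in the path, a space when the query is form-decoded.
const plus = new URL('https://example.com/a+b?q=a+b');
console.log(plus.pathname);              // /a+b
console.log(plus.searchParams.get('q')); // a b
```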

Bottom line

Percent encoding rules are simple in isolation — five categories, two JavaScript functions, one variant for forms. The complexity comes from getting the same input to round-trip cleanly through three or four URL-handling layers (browser, framework, proxy, server). Use URLSearchParams and encodeURIComponent consistently, never use encodeURI for values, and watch for %25 patterns in logs as the canary for double-encoding.

Written by Mian Ali Khalid. Part of the Encoding & Crypto pillar.