Percent Encoding and RFC 3986 Explained
Why is `+` sometimes a space and sometimes a literal plus? Why does `%2520` show up in your logs? RFC 3986 percent-encoding, end to end, with the rules nobody quite remembers.
You build a URL. The user’s input contains an ampersand. Or a slash. Or a non-ASCII character. Suddenly the server parses the URL wrong, the email arrives with &amp; in the subject, or + becomes a space (or doesn’t). Welcome to URL encoding, the most consistently misunderstood corner of web development.
This post covers the rules: not the broad strokes, but the exact rules from RFC 3986 about what gets encoded and why.
What percent encoding is
A URL can only safely carry a fixed alphabet of characters. Anything outside that alphabet — spaces, accented letters, emoji, ampersands in user input — must be encoded.
The encoding scheme is “percent encoding”: each disallowed byte is written as % followed by two hex digits representing the byte’s value.
Input: Hello World!
Output: Hello%20World%21
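Space (byte 0x20) → %20. Exclamation mark (0x21) → %21. Letters and digits pass through unchanged because they’re in the allowed alphabet. (! is a reserved sub-delim rather than a strictly illegal character, so some encoders, including JavaScript’s encodeURIComponent, leave it as-is.)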
For non-ASCII characters, percent encoding works on UTF-8 bytes:
Input: café
UTF-8: c (0x63) a (0x61) f (0x66) é (0xC3 0xA9)
Output: caf%C3%A9
é is a multi-byte character in UTF-8 (bytes 0xC3 0xA9), so it becomes two percent-escape sequences. The decoder reverses this: read percent-escapes, reassemble UTF-8 bytes, decode to text.
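The byte-level process can be sketched by hand. The helper name `percentEncodeUtf8` is hypothetical, and the sketch assumes the strictest policy (escape everything outside the unreserved set); it is an illustration of the mechanism, not a replacement for the built-in functions.

```javascript
// Hypothetical helper: percent-encode a string one UTF-8 byte at a time.
// Assumption: only RFC 3986 unreserved characters are left unescaped.
const UNRESERVED = /^[A-Za-z0-9\-._~]$/;

function percentEncodeUtf8(text) {
  const bytes = new TextEncoder().encode(text); // string -> UTF-8 bytes
  let out = "";
  for (const byte of bytes) {
    const ch = String.fromCharCode(byte);
    if (byte < 0x80 && UNRESERVED.test(ch)) {
      out += ch; // allowed alphabet: pass through
    } else {
      // everything else: % plus two uppercase hex digits
      out += "%" + byte.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return out;
}

console.log(percentEncodeUtf8("Hello World!")); // Hello%20World%21
console.log(percentEncodeUtf8("café"));         // caf%C3%A9
console.log(decodeURIComponent("caf%C3%A9"));   // café
```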
The five character categories in RFC 3986
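This is where things get interesting. Working from RFC 3986, every character falls into one of five categories that determine when encoding is required. (The RFC itself names only the reserved and unreserved sets; the other three are a practical grouping of everything else.)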
1. Unreserved characters — never encoded
A-Z a-z 0-9 - _ . ~
These 66 characters are always safe. They never need encoding. They never should be encoded — encoding them is technically valid but generally pointless. Use these for URL slugs and identifiers.
2. Reserved characters — context-dependent
These have syntactic meaning in URLs:
gen-delims: : / ? # [ ] @
sub-delims: ! $ & ' ( ) * + , ; =
Gen-delims separate the major URL components: :// host separator, / path separator, ? query start, # fragment start. Sub-delims are special inside specific components.
When these characters appear in a literal sense (inside a value rather than as syntax), they must be encoded. When they’re acting as syntax (the ? separating path from query), they pass through unencoded.
This is why encodeURI and encodeURIComponent differ — one is for the URL skeleton (preserve syntax chars), the other for values (encode everything that has special meaning).
3. Other ASCII (printable) — must encode
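< > { } | \ ^ " and the backtick, plus the space character. None of these may appear literally in a URI, so they must always be encoded. A literal % must also be written as %25, because % introduces an escape.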
4. Control characters and ASCII codes 0-31, 127 — must encode
Newlines, tabs, NUL bytes, DEL — all encoded.
5. Non-ASCII (UTF-8) — must encode as bytes
Everything outside ASCII is multiple UTF-8 bytes, each percent-encoded.
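One way to make the five categories concrete is a small classifier. The function name `classifyChar` and the category labels are hypothetical; the character sets are the ones listed above.

```javascript
// Hypothetical classifier mirroring the five categories in this section.
function classifyChar(ch) {
  const code = ch.codePointAt(0);
  if (code > 0x7f) return "non-ascii";                   // category 5
  if (code < 0x20 || code === 0x7f) return "control";    // category 4
  if (/[A-Za-z0-9\-._~]/.test(ch)) return "unreserved";  // category 1
  if (/[:\/?#\[\]@!$&'()*+,;=]/.test(ch)) return "reserved"; // category 2
  return "other-ascii"; // category 3: space " % < > \ ^ ` { | }
}

for (const ch of ["a", "~", "&", "[", "<", " ", "\n", "é"]) {
  console.log(JSON.stringify(ch), classifyChar(ch));
}
```

Only "unreserved" is always safe to leave alone; "reserved" depends on whether the character is acting as syntax or as data.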
The two JavaScript functions and why both exist
JavaScript ships two URL-encoding functions, and confusion between them causes most URL bugs:
encodeURIComponent — for values
Encodes everything except the unreserved characters and five marks, ! * ' ( ), which it leaves alone as a holdover from RFC 2396. Use this for query string values, path segments, anything that’s a value going into a URL.
encodeURIComponent("https://example.com")
// → "https%3A%2F%2Fexample.com"
The : and / are encoded because we’re treating the input as a value, not as URL syntax.
encodeURI — for whole URLs
Encodes only what’s strictly illegal in URLs. Preserves the URL syntax characters (/, :, ?, #, &, =, +, etc.).
encodeURI("https://example.com/search?q=hello world")
// → "https://example.com/search?q=hello%20world"
The :, //, ? are kept because they’re URL syntax. Only the space gets encoded.
Rule of thumb: Use encodeURIComponent for values. Use encodeURI only when you have a complete URL with potentially-illegal characters and need to leave the structure intact. Almost always you want encodeURIComponent.
The mistake people make: using encodeURI on a value. Then & and = in the value are not encoded, and they get parsed as URL syntax — splitting one query parameter into two.
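A short sketch of that mistake, using a made-up value and `https://example.com` as a placeholder base for parsing:

```javascript
const value = "Tom & Jerry=friends"; // illustrative user input

// Wrong: encodeURI leaves & and = alone, so the value leaks into the syntax.
const broken = "/search?q=" + encodeURI(value);
// Right: encodeURIComponent escapes them.
const correct = "/search?q=" + encodeURIComponent(value);

console.log(broken);  // /search?q=Tom%20&%20Jerry=friends
console.log(correct); // /search?q=Tom%20%26%20Jerry%3Dfriends

// Parsing both shows the damage.
const parsedBroken = new URL(broken, "https://example.com").searchParams;
const parsedCorrect = new URL(correct, "https://example.com").searchParams;
console.log([...parsedBroken.keys()]); // [ 'q', ' Jerry' ]
console.log(parsedCorrect.get("q"));   // Tom & Jerry=friends
```

The broken URL ends up with two parameters, `q` and ` Jerry`, where the caller intended one.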
The application/x-www-form-urlencoded variant
HTML form submissions and most REST APIs use a variant of percent encoding called application/x-www-form-urlencoded. Two key differences from RFC 3986:
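- Spaces become +, not %20. This is the form-encoding convention.
- Different reserved character list. Slightly stricter than RFC 3986: the WHATWG form serializer leaves only ASCII letters, digits, and * - . _ unescaped, so ~ ! ' ( ) are percent-encoded too.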
Plain percent encoding: hello%20world
Form encoding: hello+world
When URLSearchParams builds query strings, it produces form encoding (+ for spaces). When you decodeURIComponent form-encoded text, the + stays a literal + — you have to first replace + with %20, then decode.
const formEncoded = "hello+world";
// decodeURIComponent doesn't decode + → space:
decodeURIComponent(formEncoded); // "hello+world"
// To handle form encoding:
decodeURIComponent(formEncoded.replace(/\+/g, '%20')); // "hello world"
The URL Encoder handles both — there’s a “form-urlencoded” mode that decodes + as space.
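If the input is a whole query string rather than a single value, `URLSearchParams` already applies the form-decoding rules, so the manual replace step is unnecessary. A minimal sketch with made-up parameter names:

```javascript
// Parsing applies form decoding: + becomes a space, %2B becomes a plus.
const parsed = new URLSearchParams("q=hello+world&expr=1%2B1");
console.log(parsed.get("q"));    // hello world
console.log(parsed.get("expr")); // 1+1

// Serializing goes the other way: space -> +, literal plus -> %2B.
const built = new URLSearchParams({ q: "hello world", expr: "1+1" });
console.log(built.toString());   // q=hello+world&expr=1%2B1
```

Because a literal plus is always written as %2B in this format, the + in form-encoded output is unambiguous as long as both sides use form rules.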
RFC 3986 strict mode (sub-delims)
RFC 3986 marks ! * ' ( ) as sub-delims that can appear in URIs. JavaScript’s encodeURIComponent does NOT encode these, even though some applications expect strictly-encoded forms.
For maximum compatibility (mostly with older OAuth flows and some XML-based APIs), encode these too:
function strictEncode(str) {
  return encodeURIComponent(str).replace(
    /[!*'()]/g,
    c => '%' + c.charCodeAt(0).toString(16).toUpperCase()
  );
}
The URL Encoder has this as the “RFC 3986 strict” mode.
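To see the difference on a concrete value, here is a usage sketch. The function is repeated so the snippet runs on its own, and the input string is made up.

```javascript
// Same strictEncode as in the text, repeated for a self-contained snippet.
function strictEncode(str) {
  return encodeURIComponent(str).replace(
    /[!*'()]/g,
    c => "%" + c.charCodeAt(0).toString(16).toUpperCase()
  );
}

const input = "it's (fun)!*";
console.log(encodeURIComponent(input)); // it's%20(fun)!*
console.log(strictEncode(input));       // it%27s%20%28fun%29%21%2A

// Decoding needs no special handling: the standard decoder reverses both.
console.log(decodeURIComponent(strictEncode(input)) === input); // true
```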
Why your “%2520” is a sign of double-encoding
Single percent-encoding: space → %20.
Double percent-encoding: space → %20, then encode % → %25, giving %2520.
If you see %2520 in your URLs, something is encoding twice. Common causes:
- A framework that auto-encodes input that’s already encoded.
- A redirect chain where each hop re-encodes the URL.
- A reverse proxy that decodes/re-encodes incorrectly.
The fix is finding the duplicate-encoder, not band-aiding the output. Trace the request path; one of the layers is treating already-encoded input as raw input.
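While tracing, a rough log filter can help spot which hop introduces the second pass. The helper `looksDoubleEncoded` is hypothetical and only a heuristic: a value that legitimately contains the text "%20" also encodes to "%2520" after a single, correct pass.

```javascript
// Hypothetical heuristic: %25 followed by two hex digits usually means a
// percent sign from an earlier escape was itself encoded.
function looksDoubleEncoded(s) {
  return /%25[0-9A-Fa-f]{2}/.test(s);
}

const once = encodeURIComponent("hello world"); // hello%20world
const twice = encodeURIComponent(once);         // hello%2520world

console.log(looksDoubleEncoded(once));  // false
console.log(looksDoubleEncoded(twice)); // true
```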
Encoding by URL component
Different parts of a URL have different encoding rules:
Scheme
http, https, mailto. Letters, digits, +, -, and . only. Never percent-encoded, because the scheme alphabet is restricted.
Host
Domain names use Punycode (the xn-- prefix) for international characters, not percent encoding. Modern browsers accept both forms but display the human-readable one.
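The WHATWG `URL` class shows the split between host and path handling. The domain `münchen.example` is a placeholder chosen for illustration.

```javascript
const url = new URL("https://münchen.example/café");

// The host is converted to its ASCII (Punycode) form...
console.log(url.hostname); // xn--mnchen-3ya.example
// ...while the path is percent-encoded as UTF-8 bytes.
console.log(url.pathname); // /caf%C3%A9
```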
Path
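Slashes (/) are syntax and are kept literal. Spaces and special characters inside path segments are encoded. RFC 3986 lets a path segment carry sub-delims, : and @ literally, whereas in a query string &, = and + have a conventional meaning and must be encoded inside values.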
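A minimal sketch of per-segment encoding, assuming the path arrives as separate pieces. `buildPath` is a hypothetical helper, not a standard API.

```javascript
// Hypothetical helper: encode each segment on its own, then join with
// literal slashes so only the separators act as syntax.
function buildPath(...segments) {
  return "/" + segments.map(s => encodeURIComponent(s)).join("/");
}

console.log(buildPath("files", "reports/2024", "Q1 summary.pdf"));
// /files/reports%2F2024/Q1%20summary.pdf
```

The slash inside "reports/2024" becomes %2F, so it stays part of one segment instead of creating a new one. Some servers and proxies decode or reject %2F in paths, so this is worth testing against the actual stack.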
Query
After ?. Each parameter is name=value, separated by &. Inside values, encode everything that’s not unreserved. Use URLSearchParams to do this correctly:
const params = new URLSearchParams({
  q: "hello world",
  filter: "name=Ada"
});
fetch(`/search?${params}`);
// → /search?q=hello+world&filter=name%3DAda
Note = in the value is encoded; & separating params is not (it’s syntax).
Fragment
After #. Same character set as queries, but & and = carry no name=value meaning there, so they can appear literally. Spaces still encode to %20.
This is why deep-linked SPA URLs with state in the fragment can be tricky — the encoding rules subtly differ from query strings.
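The difference is visible when the same text is written to both components of a WHATWG `URL` object. The URL and the parameter names here are made up for illustration.

```javascript
const url = new URL("https://example.com/app");

url.searchParams.set("q", "a b&c"); // query value: form encoding
url.hash = "tab=a b&c";             // fragment: space escaped, & left alone

console.log(url.search); // ?q=a+b%26c
console.log(url.hash);   // #tab=a%20b&c
```

Code that parses fragment state with query-string rules (or the reverse) will disagree on both the space and the ampersand.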
A canonical encoding workflow
For most applications, this is the safe pattern:
// Building URLs
const baseUrl = 'https://api.example.com/users';
const userId = encodeURIComponent(userInput);  // path segment
const queryParams = new URLSearchParams({      // query values
  filter: filterText,
  sort: 'desc'
});
const url = `${baseUrl}/${userId}?${queryParams}`;
- encodeURIComponent for path segments and individual query values.
- URLSearchParams for assembling query strings.
- Don’t manually concat ? and &; let URLSearchParams handle it.
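A quick round-trip check of this pattern, with assumed values for `userInput` and `filterText` that contain characters likely to cause trouble:

```javascript
// Assumed inputs, chosen to include a slash, an equals sign and an ampersand.
const userInput = "ada/lovelace";
const filterText = "name=Ada & active";

const queryParams = new URLSearchParams({ filter: filterText, sort: "desc" });
const url =
  `https://api.example.com/users/${encodeURIComponent(userInput)}?${queryParams}`;
console.log(url);
// https://api.example.com/users/ada%2Flovelace?filter=name%3DAda+%26+active&sort=desc

// Parse it back and confirm nothing was lost or split.
const parsed = new URL(url);
const lastSegment = decodeURIComponent(parsed.pathname.split("/").pop());
console.log(lastSegment);                       // ada/lovelace
console.log(parsed.searchParams.get("filter")); // name=Ada & active
```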
Common bugs (full list elsewhere)
These come up so often they have their own post: URL Encoding Common Bugs. Highlights:
- Double-encoding: %2520 everywhere.
- encodeURI for query value: splits param on unencoded &.
- + ambiguity: space in query, literal in path.
- Trailing slash inconsistency: path vs path/ are different URLs.
- Fragment encoding mismatch: # must be encoded if it appears mid-URL.
Bottom line
Percent encoding rules are simple in isolation — five categories, two JavaScript functions, one variant for forms. The complexity comes from getting the same input to round-trip cleanly through three or four URL-handling layers (browser, framework, proxy, server). Use URLSearchParams and encodeURIComponent consistently, never use encodeURI for values, and watch for %25 patterns in logs as the canary for double-encoding.
Further reading
- URL Encoding: The 7 Bugs That Break Your API
- URL Encoder / Decoder tool — all four variants
- RFC 3986 — URI Generic Syntax (the actual spec)
- WHATWG URL Standard — modern browser behavior
Written by Mian Ali Khalid. Part of the Encoding & Crypto pillar.