URL Encoding: The 7 Bugs That Break Your API
Every API has at least one URL-encoding bug. Here are the seven I see most, with the symptom, the root cause, and the fix for each.
URL encoding is one of those topics every developer thinks they understand until production breaks at 2 AM. After enough debugging sessions, you start recognizing the same seven bugs over and over. Here’s the field guide — symptom first, then root cause, then fix.
For the underlying spec, see Percent Encoding and RFC 3986 Explained.
Bug #1: Double-encoding (the %2520 mystery)
Symptom: Your URLs in logs show %2520 where you expected %20. Search queries fail because %20 is being treated as literal text.
Example:
Expected: /search?q=hello%20world
Logged: /search?q=hello%2520world
Root cause: The URL is being encoded twice. The first encoding turned space → %20. The second encoding saw % and turned it into %25, leaving %2520.
Where it happens:
- Frontend encodes user input, then a framework re-encodes the whole URL.
- A redirect chain where each redirect re-percent-encodes the target URL.
- A reverse proxy that decodes/re-encodes incorrectly.
- Manually concatenating an already-encoded value into another encoded URL.
Fix: Find the second encoder and skip it. Trace the request through every layer. The pattern: search your codebase for encodeURI and encodeURIComponent calls, identify which one is acting on already-encoded input, and remove that call.
Quick diagnosis: if you see %25xx patterns in URLs (where xx is two hex digits), you’re double-encoding. The %25 is “literal percent.”
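The failure, and a quick diagnostic, can be reproduced in a few lines of plain JavaScript. The `looksDoubleEncoded` helper is a heuristic I'm sketching here, not a library function: a decode that changes the string but still leaves %-sequences behind is a strong hint (not proof) that the value was encoded twice.

```javascript
// One encode: space → %20. A second encode sees the % and produces %25.
const once = encodeURIComponent("hello world");   // "hello%20world"
const twice = encodeURIComponent(once);           // "hello%2520world"

// Heuristic check: decoding a double-encoded value yields a *different*
// string that still contains percent-escapes.
function looksDoubleEncoded(value) {
  const decoded = decodeURIComponent(value);
  return decoded !== value && /%[0-9A-Fa-f]{2}/.test(decoded);
}

looksDoubleEncoded("hello%2520world"); // true
looksDoubleEncoded("hello%20world");   // false
```

Note the heuristic can false-positive if a user legitimately typed a percent-escape, so treat it as a debugging aid, not validation.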
Bug #2: Using encodeURI for query values
Symptom: A query parameter contains characters that look fine, but the server parses the URL with extra parameters or wrong values.
Example:
// 🚫 Wrong
const userInput = "name=Ada&role=admin";
const url = `/api/lookup?q=${encodeURI(userInput)}`;
// → /api/lookup?q=name=Ada&role=admin
// Server parses: { q: "name=Ada", role: "admin" } — TWO params!
Root cause: encodeURI preserves URL syntax characters (&, =). When user input contains those, they’re left unencoded and get parsed as if they were query syntax.
Fix: Use encodeURIComponent for values. It encodes everything that has URL-syntax meaning.
// ✅ Correct
const url = `/api/lookup?q=${encodeURIComponent(userInput)}`;
// → /api/lookup?q=name%3DAda%26role%3Dadmin
// Server parses: { q: "name=Ada&role=admin" } — ONE param
Better: Use URLSearchParams and let it handle the encoding:
const params = new URLSearchParams({ q: userInput });
const url = `/api/lookup?${params}`;
The mnemonic: encodeURI is for whole URLs. encodeURIComponent is for individual URL components, i.e. the values you place inside a URL. When in doubt, reach for encodeURIComponent.
Bug #3: Plus sign confusion (space vs literal)
Symptom: Your search query "C++" arrives server-side as "C  " (a C followed by two spaces). Or your form submission for "1+2" gets stored as "1 2".
Root cause: + has two meanings in URLs depending on context:
- In `application/x-www-form-urlencoded` (form bodies, and by convention most query strings): `+` is a space.
- Everywhere else (path segments, strict RFC 3986): `+` is a literal plus.
The failure modes mirror each other. If your code encodes spaces as `+` (what `URLSearchParams` produces) but the consumer parses strictly per RFC 3986, the `+` survives as a literal plus. And if a raw, unencoded `+` reaches a form-style parser (say, "C++" concatenated into a query by hand), the parser decodes it to a space.
Fix: Match the encoding to the consumer:
| Consumer | Use |
|---|---|
| Reading a typical query string | + = space (form-encoded convention) |
| Reading a URL path | + = literal + |
| Sending to an HTML form-style endpoint | Use URLSearchParams (produces form encoding) |
| Sending to an API expecting RFC 3986 | Use encodeURIComponent for each value |
If you need a literal + in form-encoded query, encode it explicitly: %2B.
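The two conventions can be compared side by side in plain JavaScript, using nothing beyond the standard `URLSearchParams` and `encodeURIComponent`:

```javascript
// Form encoding (URLSearchParams): space → "+", literal "+" → "%2B"
const form = new URLSearchParams({ q: "C++ and spaces" }).toString();
// → "q=C%2B%2B+and+spaces"

// RFC 3986-style (encodeURIComponent): space → "%20", literal "+" → "%2B"
const strict = "q=" + encodeURIComponent("C++ and spaces");
// → "q=C%2B%2B%20and%20spaces"
```

Both forms round-trip safely through a matching decoder; garbage appears only when the producer and consumer disagree.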
Bug #4: Trailing-slash inconsistency
Symptom: /api/users/42 returns 200; /api/users/42/ returns 404 (or vice versa). Or analytics shows them as separate URLs and you can’t aggregate.
Root cause: A trailing slash changes the URL's identity. RFC 3986 treats /foo and /foo/ as distinct URIs and leaves their relationship entirely to the server.
Most frameworks have an opinion:
- Express: treats `/foo` and `/foo/` as the same route by default (strict routing off); Fastify distinguishes them unless `ignoreTrailingSlash` is enabled.
- Some Apache rewrite rules: redirect one form to the other.
- Static site generators: `/foo/` becomes a directory with `index.html`; `/foo` would need a redirect.
Fix:
- Pick a convention and enforce it everywhere. Either always with trailing slashes or never.
- Add a 301 redirect from the non-canonical form to the canonical form.
- Set the canonical URL in your
<link rel="canonical">tags. - Reference the canonical form in internal links.
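The redirect step can be sketched as a small framework-agnostic helper. `trailingSlashRedirect` is a hypothetical function of my own naming, assuming a "no trailing slash" convention; it returns the 301 target, or null when the path is already canonical:

```javascript
// Canonicalize to "no trailing slash" (the root "/" stays untouched).
// Returns the 301 redirect target, or null if already canonical.
function trailingSlashRedirect(pathname) {
  if (pathname.length > 1 && pathname.endsWith("/")) {
    return pathname.replace(/\/+$/, "") || "/";
  }
  return null;
}

trailingSlashRedirect("/api/users/42/"); // "/api/users/42" → send 301
trailingSlashRedirect("/api/users/42");  // null → serve normally
trailingSlashRedirect("/");              // null → root is canonical
```

In an Express-style app you would call this in a middleware before routing and issue `res.redirect(301, target)` when it returns non-null.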
This isn’t strictly an encoding bug, but it surfaces in URL handling and is the same shape of problem.
Bug #5: Fragment encoding mismatch
Symptom: Deep links that carry route state in the #fragment break in some clients but work in others.
Example:
https://app.example.com/page#section?filter=active
The ? after # is part of the fragment, not a query separator. But some parsers treat # as a query stop and try to parse the rest as a query string.
Root cause: The first # ends the parseable part of the URL. Everything after it is the fragment: the browser doesn't parse it further and never sends it to the server. But hash-based SPA routers overload the fragment to carry route state, including query-like syntax.
Fix:
- For SPA hash routing, use a clear convention like `#!` or `#/` to signal "this is a route, not a static fragment."
- Always `encodeURIComponent` user input that goes into the fragment.
- Don't rely on fragment-as-query parsing; it's not standard.
- Modern SPAs should prefer the History API (path-based routing) over hash routing.
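You can see the parser stop at `#` with the standard `URL` object; if you deliberately pack query-like state into the hash, you have to split it out yourself (the `hashQuery` parsing below is a convention, not anything the platform does for you):

```javascript
const u = new URL("https://app.example.com/page#section?filter=active");
u.pathname; // "/page"
u.search;   // "" — the ? lives inside the fragment, so there is no query
u.hash;     // "#section?filter=active"

// Hand-rolled hash-state parsing: everything after the first "?" in the hash.
const hashQuery = new URLSearchParams(u.hash.split("?")[1] ?? "");
hashQuery.get("filter"); // "active"
```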
Bug #6: Multi-byte UTF-8 character mishandling
Symptom: International characters in URLs render as `Ã©` instead of `é`, or as literal `?` characters.
Root cause: A layer in the chain decoded the URL as Latin-1 / Windows-1252 instead of UTF-8. Each UTF-8 multi-byte sequence got interpreted as multiple Latin-1 characters.
Example: `é` is the UTF-8 byte pair `0xC3 0xA9`. Decoded as Latin-1, those bytes become `Ã©`, two visible characters.
Where it happens:
- Old PHP servers without the `mbstring` extension.
- A frontend that encodes text as UTF-8 but a backend that reads it as Latin-1.
- File-system operations that use the OS default encoding (historically not UTF-8 on Windows).
- Database connections that don't specify a UTF-8 charset.
Fix:
- Standardize on UTF-8 everywhere: the HTTP `Content-Type` charset, database connection charset, file-reading code, file-system encoding.
- Validate at every boundary. If the input should be UTF-8 but isn't, reject it loudly rather than silently mojibake-ing.
- For URLs specifically: ensure your server sees `Content-Type: ...; charset=utf-8` on POST bodies and decodes the path and query as UTF-8.
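The misdecode is easy to replay in Node (this uses Node's `Buffer`, so it won't run in a browser): encode `é` as UTF-8 bytes, then deliberately decode those bytes as Latin-1.

```javascript
// "é" as UTF-8 is the two bytes 0xC3 0xA9.
const utf8Bytes = Buffer.from("é", "utf8");     // <Buffer c3 a9>

// Decoding each byte as a Latin-1 character reproduces the mojibake.
const mojibake = utf8Bytes.toString("latin1");  // "Ã©"

// Round-tripping with the correct charset recovers the original.
const fixed = Buffer.from(mojibake, "latin1").toString("utf8"); // "é"
```

The last line also shows why one-off mojibake is sometimes recoverable: as long as no layer dropped bytes, re-encoding as Latin-1 and re-decoding as UTF-8 undoes the damage.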
Bug #7: Inconsistent strict vs lenient parsing
Symptom: Some URLs work in your tests but break in production, or vice versa. Same URL behaves differently in different parts of the system.
Root cause: “Strict” RFC 3986 parsers reject characters that “lenient” parsers (most browsers) accept. JavaScript’s URL object follows the WHATWG URL Standard, which is more forgiving than RFC 3986. Other runtimes, notably Java’s java.net.URI, parse much closer to strict RFC 3986.
Example: a URL with a raw space throws URISyntaxException from Java’s java.net.URI, while the browser’s new URL() quietly repairs it to %20.
Fix:
- Always run user input through `encodeURIComponent` (or your language's equivalent) before assembling URLs.
- Don't rely on lenient parsing as a feature; code as if every parser is strict.
- Add input validation to reject out-of-spec characters before they get to URL assembly.
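The lenient side of the mismatch is observable directly in JavaScript: the WHATWG parser accepts and repairs a raw space, so the same string that a strict parser rejects sails through here.

```javascript
// WHATWG URL (browsers, Node) is lenient: it repairs the raw space.
const lenient = new URL("https://example.com/a b");
lenient.pathname; // "/a%20b"

// Defensive version: encode every untrusted segment yourself, so even a
// strict RFC 3986 parser downstream accepts the assembled URL.
const segment = encodeURIComponent("a b"); // "a%20b"
const assembled = new URL(`/${segment}`, "https://example.com");
```

Relying on the repair step is exactly the "lenient parsing as a feature" trap: the encoded form works everywhere, the raw form only works where the parser happens to forgive it.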
A protective workflow
Whenever I’m building URL handling, I follow this:
// 1. Treat user input as raw, untrusted data
const raw = userInput;
// 2. If assembling strings by hand, encode at the URL boundary, never deeper
const safe = encodeURIComponent(raw);
// 3. Use URL-aware tools to assemble
const params = new URLSearchParams({ q: raw }); // URLSearchParams encodes for you
const url = new URL('/api/search', baseUrl);
url.search = params.toString();
// 4. Convert to string only at the last moment
fetch(url.toString());
Three rules:
- Never manually concatenate URL pieces with `+`. Use `URL` and `URLSearchParams`.
- Encode at the boundary. Don't spread encode/decode calls through your codebase.
- Pick one convention for trailing slashes, query parameter ordering, and fragment usage. Document it.
Quick-fix lookup table
| Symptom | Likely bug | Fix |
|---|---|---|
| `%2520` in URLs | Double encoding | Find and remove the duplicate encoding call |
| Query splits into multiple params | `encodeURI` used for a value | Switch to `encodeURIComponent` |
| `+` shows as literal `+` | Form vs URL encoding mismatch | Match the encoder to the consumer; encode literal `+` as `%2B` |
| `Ã©` instead of `é` | UTF-8 / Latin-1 mismatch | Force UTF-8 in Content-Type and DB |
| 404 with trailing slash | Inconsistent slash convention | Add 301 redirect, pick one canonical |
| Hash-route + query confusion | Parser treating `#` content as query | Encode user input in fragment, use a proper SPA router |
| URL works in browser, fails on server | Lenient vs strict parsing | Always `encodeURIComponent` user input |
Bottom line
URL encoding is consistently one of the most-debugged topics in web dev because the rules are simple but the boundaries between systems are subtle. The fix for 90% of these bugs is the same: trust nothing, encode at the boundary with the right function, and use URLSearchParams whenever building query strings.
Further reading
- Percent Encoding and RFC 3986 Explained
- URL Encoder / Decoder tool — debug specific encoding issues
- WHATWG URL Standard — what browsers actually do
- RFC 3986 — the strict spec
Related posts
- Percent Encoding and RFC 3986 Explained — Why is `+` sometimes a space and sometimes a literal plus? Why does `%2520` show…
- Base64: How It Actually Works Under the Hood — Base64 is everywhere — in JWTs, data URLs, email attachments. This is the byte-l…
Related tool
Percent-encode and decode URLs per RFC 3986.
Written by Mian Ali Khalid. Part of the Encoding & Crypto pillar.