URL Canonicalization — Normalize URLs for APIs, Caching, and SEO
URL canonicalization normalizes URLs to a consistent form, removing redundant parameters, encoding differences, and trailing slash inconsistencies. Learn how to canonicalize...
Use the tool
URL Encoder / Decoder
Percent-encode and decode URLs per RFC 3986.
URL canonicalization produces a consistent “canonical” form of a URL, making comparison, caching, and deduplication reliable. Without it, example.com/path, example.com/path/, and EXAMPLE.COM/Path are treated as different URLs.
Encode and decode URL components with the URL Encoder.
What canonicalization normalizes
Input URL: HTTP://Example.COM:80/path/../blog/./Post%20One?b=2&a=1&b=2#section
Canonical: https://example.com/blog/post%20one?a=1&b=2#section
Changes made:
1. Lowercase scheme: HTTP → https
2. Lowercase host: Example.COM → example.com
3. Remove default port: :80 removed (default for HTTP)
4. Resolve path: /path/../blog/./Post%20One → /blog/Post%20One
5. Lowercase percent-encoded chars: %20 stays lowercase
6. Sort query parameters: b=2&a=1 → a=1&b=2
7. Remove duplicate parameters: b=2&b=2 → b=2
8. Preserve fragment (context-dependent)
Canonical URL in JavaScript
import { URL } from 'url';
function canonicalizeURL(rawUrl, options = {}) {
const {
scheme = 'https',
sortParams = true,
removeFragment = false,
removeTrailingSlash = true,
} = options;
const url = new URL(rawUrl);
// Normalize scheme:
url.protocol = scheme;
// Lowercase hostname:
url.hostname = url.hostname.toLowerCase();
// Remove default ports:
if ((url.protocol === 'https:' && url.port === '443') ||
(url.protocol === 'http:' && url.port === '80')) {
url.port = '';
}
// Resolve dots in path:
url.pathname = url.pathname
.split('/')
.reduce((acc, part) => {
if (part === '..') acc.pop();
else if (part !== '.') acc.push(part);
return acc;
}, [])
.join('/') || '/';
// Remove trailing slash (except root):
if (removeTrailingSlash && url.pathname.length > 1 && url.pathname.endsWith('/')) {
url.pathname = url.pathname.slice(0, -1);
}
// Sort and deduplicate query parameters:
if (sortParams && url.search) {
const params = new URLSearchParams(url.search);
const sorted = new URLSearchParams([...new Map(params)].sort());
url.search = sorted.toString() ? '?' + sorted.toString() : '';
}
// Remove fragment:
if (removeFragment) url.hash = '';
return url.toString();
}
// Examples:
canonicalizeURL('HTTP://EXAMPLE.COM/path/../blog/post?b=2&a=1');
// "https://example.com/blog/post?a=1&b=2"
canonicalizeURL('https://example.com/path/');
// "https://example.com/path"
Python URL normalization
from urllib.parse import urlparse, urlunparse, urlencode, parse_qs
def canonicalize_url(url: str) -> str:
"""Normalize a URL to canonical form."""
parsed = urlparse(url)
# Normalize scheme and host:
scheme = parsed.scheme.lower() or 'https'
netloc = parsed.netloc.lower()
# Remove default ports:
if netloc.endswith(':80') and scheme == 'http':
netloc = netloc[:-3]
elif netloc.endswith(':443') and scheme == 'https':
netloc = netloc[:-4]
# Normalize path (resolve . and ..):
path_parts = []
for part in parsed.path.split('/'):
if part == '..':
if path_parts:
path_parts.pop()
elif part != '.':
path_parts.append(part)
path = '/'.join(path_parts) or '/'
# Remove trailing slash (except root):
if len(path) > 1 and path.endswith('/'):
path = path.rstrip('/')
# Sort query parameters:
if parsed.query:
params = parse_qs(parsed.query, keep_blank_values=True)
sorted_params = sorted(params.items())
query = urlencode(sorted_params, doseq=True)
else:
query = ''
return urlunparse((scheme, netloc, path, '', query, ''))
SEO canonical URLs
For SEO, canonicalization tells search engines the “preferred” URL for duplicate content:
<!-- In HTML: declare canonical URL -->
<link rel="canonical" href="https://example.com/blog/post">
<!-- With trailing slash inconsistency: always redirect to one form -->
# nginx: redirect trailing slash to non-trailing slash:
rewrite ^(/.+)/$ $1 permanent;
# Or prefer trailing slash:
rewrite ^([^.]+[^/])$ $1/ permanent;
// Express: canonicalize before handling:
app.use((req, res, next) => {
const canonical = req.path.replace(/\/+$/, '') || '/';
if (req.path !== canonical) {
return res.redirect(301, canonical + (req.search || ''));
}
next();
});
URL comparison (canonicalize before comparing)
// Without canonicalization: same page, different URLs appear unequal
const url1 = 'HTTP://Example.com/path/';
const url2 = 'https://example.com/path';
url1 === url2; // false
// With canonicalization:
canonicalizeURL(url1) === canonicalizeURL(url2); // true: both → "https://example.com/path"
// Cache key deduplication:
const cache = new Map();
function fetchWithCache(url) {
const key = canonicalizeURL(url);
if (cache.has(key)) return cache.get(key);
const result = fetch(url).then(r => r.json());
cache.set(key, result);
return result;
}
Related tools
- URL Encoder — encode/decode URL components
- URL Query String Guide — build query strings
- Regex Tester — test URL patterns
Related posts
- URL Encoding: The 7 Bugs That Break Your API — Every API has at least one URL-encoding bug. Here are the seven I see most — wha…
- Percent Encoding and RFC 3986 Explained — Why is `+` sometimes a space and sometimes a literal plus? Why does `%2520` show…
- Double URL Encoding — What It Is, Why It Happens, and How to Prevent It — Double URL encoding happens when an already-encoded URL is encoded again, turnin…
- URL Encoding in JavaScript — encodeURIComponent vs encodeURI — JavaScript has two URL encoding functions: encodeURI for full URLs and encodeURI…
- URL Query String Guide — Parsing, Building, and Best Practices — URL query strings pass data in URLs after the ? character. Learn how to parse an…
Related tool
URL Encoder / Decoder
Percent-encode and decode URLs per RFC 3986.
Written by Mian Ali Khalid. Part of the Dev Productivity pillar.