The 2026 Regex Cheatsheet (PCRE, JS, Python — Side by Side)
A dense reference for modern regex: anchors, character classes, quantifiers, lookarounds, capture groups, named groups, Unicode, and the dialect differences that actually bite.
You haven’t written a regex in three months. You sit down to write one and immediately can’t remember whether \d matches Unicode digits in your language’s regex engine, whether (?P<name>...) works in JavaScript, and what \b does inside a character class. This cheatsheet is the answer. Dense, opinionated, current to 2026, with the side-by-side dialect notes that turn a “works on my regex tester” into a “works in production.”
For interactive testing while you read, the Regex Tester lets you paste patterns and inputs and see captures inline.
Anchors
| Pattern | Matches |
|---|---|
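| `^` | Start of string (or of each line in multiline mode) |
| `$` | End of string (or of each line in multiline mode) |
| `\A` | Start of string (always, ignores multiline) |
| `\z` | End of string (always, ignores multiline; PCRE, and Python 3.14+) |
| `\Z` | PCRE: end of string, or just before a final newline. Python: absolute end of string |
| `\b` | Word boundary (between `\w` and non-`\w`) |
| `\B` | Non-word-boundary |

JavaScript has only `^`, `$`, `\b`, and `\B`; there is no `\A`, `\z`, or `\Z`. PCRE has all of them. Python has `\A` and `\Z`, but its `\Z` means the absolute end of the string (what PCRE calls `\z`), and `\z` itself only arrived as a synonym in Python 3.14.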
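A small Python sketch of how the anchors behave with and without multiline mode. The sample string and variable names are made up for illustration; the point is only that `$` tolerates one trailing newline while Python's `\Z` does not.

```python
import re

text = "first\nsecond\n"

# Without re.MULTILINE, ^ matches only at the very start of the string.
starts_default = re.findall(r"^\w+", text)
# With re.MULTILINE, ^ also matches right after each newline.
starts_multiline = re.findall(r"^\w+", text, flags=re.MULTILINE)

# $ accepts a position just before one trailing newline;
# Python's \Z accepts only the absolute end of the string.
dollar_match = re.search(r"second$", text)
abs_end_match = re.search(r"second\Z", text)

print(starts_default)             # ['first']
print(starts_multiline)           # ['first', 'second']
print(dollar_match is not None)   # True
print(abs_end_match is not None)  # False
```

If a trailing newline must make validation fail, `\Z` (or `re.fullmatch`) is usually the safer choice in Python than `$`.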
Character classes
| Pattern | Matches |
|---|---|
| `.` | Any character except newline (any character at all with the `s` flag) |
| `\d` | Digit (`[0-9]` in ASCII mode; every Unicode decimal digit in Unicode mode; always `[0-9]` in JavaScript) |
| `\D` | Non-digit |
| `\w` | Word character (`[A-Za-z0-9_]` in ASCII mode; Unicode letters and digits too in Unicode mode; ASCII-only in JavaScript even with `u`) |
| `\W` | Non-word character |
| `\s` | Whitespace (space, tab, newline, etc.) |
| `\S` | Non-whitespace |
| `[abc]` | One of a, b, or c |
| `[^abc]` | Anything except a, b, c |
| `[a-z]` | Lowercase ASCII letter (range) |
| `\p{Letter}` | Any Unicode letter, short form `\p{L}` (PCRE, Python `regex` package, JS with the `u` or `v` flag; not Python's stdlib `re`) |
| `\P{Letter}` | Any non-letter |
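Inside `[...]`, most metacharacters lose their special meaning: `[.+]` matches a literal `.` or `+`. The exceptions are `\` (still escapes), `^` (negates, but only as the first character), `]` (closes the class), and `-` (forms a range unless it is first or last). Shorthand escapes such as `\d` and `\w` still work inside a class, but `\b` does not mean word boundary there: `[\b]` matches a backspace character.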
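The class rules are easy to check directly. A minimal Python sketch, with throwaway sample strings chosen only to exercise each rule:

```python
import re

# Inside a class, . and + are literal characters.
literal_hits = re.findall(r"[.+]", "1.5+2")

# A hyphen placed last is literal; between two characters it forms a range.
hyphen_hits = re.findall(r"[a-c-]", "a-d")

# ^ negates only when it is the first character of the class.
negated_hits = re.findall(r"[^0-9]", "a1^")
caret_literal_hits = re.findall(r"[0-9^]", "a1^")

# Inside a class, \b is a backspace (U+0008), not a word boundary.
backspace_hits = re.findall(r"[\b]", "a\bc")

print(literal_hits)        # ['.', '+']
print(hyphen_hits)         # ['a', '-']
print(negated_hits)        # ['a', '^']
print(caret_literal_hits)  # ['1', '^']
print(backspace_hits)      # ['\x08']
```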
Quantifiers
| Pattern | Matches |
|---|---|
| `?` | 0 or 1 |
| `*` | 0 or more |
| `+` | 1 or more |
| `{n}` | Exactly n |
| `{n,}` | n or more |
| `{n,m}` | n to m |
| `*?` | 0 or more (lazy) |
| `+?` | 1 or more (lazy) |
| `{n,m}?` | Lazy version of bounded |
| `*+`, `++`, `?+` | Possessive (PCRE, Java, Python `re` 3.11+; not JavaScript) |
The lazy/greedy distinction is where most regex bugs live. Greedy .* matches as much as possible, then backtracks. Lazy .*? matches as little as possible, then expands. For HTML like <a href="x"><a href="y">, the pattern <a.*> matches the whole thing; <a.*?> matches just the first tag.
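The HTML example above, run in Python as a quick sketch. The third pattern, a negated class, is an addition for comparison rather than something the text prescribes; it often expresses "up to the next `>`" more directly than a lazy quantifier.

```python
import re

html = '<a href="x"><a href="y">'

# Greedy: .* runs to the end of the string, then backs off to the last '>'.
greedy = re.search(r"<a.*>", html).group()

# Lazy: .*? stops at the first '>' that lets the match succeed.
lazy = re.search(r"<a.*?>", html).group()

# Negated class: never crosses a '>' in the first place.
negated = re.search(r"<a[^>]*>", html).group()

print(greedy)   # <a href="x"><a href="y">
print(lazy)     # <a href="x">
print(negated)  # <a href="x">
```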
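Possessive quantifiers (`*+`, `++`) are like greedy ones but refuse to backtrack into what they matched. Used on the inner repetition, they can defuse patterns that would otherwise be vulnerable to catastrophic backtracking (ReDoS). JavaScript doesn't support them. Python's stdlib `re` gained them, along with atomic groups, in 3.11; on older Pythons, use the `regex` package.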
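A short sketch of the "refuses to backtrack" behavior, assuming Python 3.11 or later for the possessive and atomic syntax in the stdlib `re` module. The version guard and variable names are illustrative; on older interpreters the guarded patterns would not compile.

```python
import re
import sys

# Greedy a* first takes all three a's, then gives one back so the final a can match.
greedy_match = re.fullmatch(r"a*a", "aaa")

possessive_match = None
atomic_match = None
if sys.version_info >= (3, 11):
    # Possessive a*+ takes all three a's and never gives any back,
    # so the trailing a has nothing left to match.
    possessive_match = re.fullmatch(r"a*+a", "aaa")
    # The atomic group (?>a*) behaves the same way.
    atomic_match = re.fullmatch(r"(?>a*)a", "aaa")

print(greedy_match is not None)  # True
print(possessive_match)          # None
print(atomic_match)              # None
```

The same property that makes this match fail is what stops the engine from exploring exponentially many ways to split a run of characters.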
Groups and captures
| Pattern | Effect |
|---|---|
| `(abc)` | Capturing group |
| `(?:abc)` | Non-capturing group |
| `(?<name>abc)` | Named capture (modern JS, PCRE) |
| `(?P<name>abc)` | Named capture (Python, older PCRE) |
| `\1`, `\2` | Backreference to numbered group |
| `\k<name>` | Backreference to named group (JS, PCRE; Python `re` uses `(?P=name)`) |
| `(?>abc)` | Atomic group (no backtracking into it; PCRE, Java, Python `re` 3.11+; not JavaScript) |
JavaScript adopted `(?<name>...)` for named captures in ES2018. Python has always used `(?P<name>...)`, and its stdlib `re` accepts only that form; the third-party `regex` package accepts both, as does PCRE. In a cross-language codebase, prefer `(?<name>...)` where every engine supports it, and verify that first, because Python's stdlib `re` rejects it.
Backreferences (\1) are the only way regex can match repetition of content: (\w+) \1 matches “the the” or “abc abc” but not “the cat.” If your engine is finite-automaton-based (RE2, Rust’s regex crate), backreferences aren’t supported because they make the language non-regular.
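A Python sketch of named captures and backreferences using the stdlib `re` spelling (`(?P<name>...)` and `(?P=name)`). The patterns and sample strings are illustrative, and the `\b` anchors around the doubled-word pattern are an addition so that "the then" is not reported as a repeat.

```python
import re

# Named captures: Python's stdlib re uses (?P<name>...).
date = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")
date_match = date.search("released 2026-01-15")
date_parts = date_match.groupdict()

# Numbered backreference: \1 must repeat exactly what group 1 captured.
doubled = re.compile(r"\b(\w+) \1\b")
repeat_match = doubled.search("it was the the best")
no_repeat_match = doubled.search("the cat")

# Named backreference in Python: (?P=name) rather than \k<name>.
quoted = re.compile(r"(?P<quote>['\"])(?P<body>.*?)(?P=quote)")
quoted_body = quoted.search("say 'hi' now").group("body")

print(date_parts)             # {'year': '2026', 'month': '01', 'day': '15'}
print(repeat_match.group(1))  # the
print(no_repeat_match)        # None
print(quoted_body)            # hi
```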
Lookarounds
| Pattern | Matches |
|---|---|
| `(?=abc)` | Lookahead (followed by abc, but doesn't consume) |
| `(?!abc)` | Negative lookahead |
| `(?<=abc)` | Lookbehind (preceded by abc) |
| `(?<!abc)` | Negative lookbehind |
Lookarounds are zero-width: they assert a condition without consuming characters. \d+(?=px) matches a number that’s followed by px, but doesn’t include px in the match.
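Lookbehind support varied for years. JavaScript got lookbehind, both fixed-width and variable-width, in ES2018. Python's stdlib `re` allows only fixed-width lookbehind; the `regex` package allows variable-width. PCRE historically required every alternative in a lookbehind to have a fixed length; PCRE2 10.43 and later also accept variable-length lookbehind up to a bounded maximum. RE2 and Go's `regexp`, which follows the RE2 design, have no lookarounds at all, because they guarantee linear-time matching and leave out features that would threaten it.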
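A Python sketch of zero-width lookarounds, including the stdlib `re` restriction on lookbehind width. The CSS-like sample strings are invented for illustration, and the final check assumes the stdlib `re` module as it currently behaves (it raises `re.error` for variable-width lookbehind).

```python
import re

css = "width: 120px; height: 3em; margin: 8px"

# Lookahead: match digits only when px follows; px itself is not consumed.
px_values = re.findall(r"\d+(?=px)", css)

# Fixed-width lookbehind: digits preceded by a literal dollar sign.
dollar_amounts = re.findall(r"(?<=\$)\d+", "cost $45, qty 10")

# Negative lookahead: words that are not immediately followed by a colon.
non_keys = re.findall(r"\b[a-z]+\b(?!:)", "width: auto")

# Variable-width lookbehind is rejected by the stdlib re module.
try:
    re.compile(r"(?<=\$\s*)\d+")
    variable_width_ok = True
except re.error:
    variable_width_ok = False

print(px_values)          # ['120', '8']
print(dollar_amounts)     # ['45']
print(non_keys)           # ['auto']
print(variable_width_ok)  # False
```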
Flags / modifiers
| Flag | Name | Effect |
|---|---|---|
| `i` | Ignore case | Case-insensitive matching |
| `m` | Multiline | `^` and `$` match at line boundaries |
| `s` | Dotall (single-line) | `.` matches newlines |
| `x` | Extended | Whitespace and comments in the pattern are ignored (not available in JavaScript) |
| `u` | Unicode | Treat pattern as Unicode code points |
| `g` | Global | (JavaScript only) match all, not just first |
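Inline flags: `(?i)pattern` enables case-insensitive matching for the pattern, `(?-i)` turns it off again, and `(?i:pattern)` scopes the flag to a group. Support varies. PCRE accepts all three anywhere in the pattern. Python's `re` accepts `(?i)` only at the very start of the pattern (an error elsewhere since 3.11) and has no bare `(?-i)`, though scoped groups such as `(?i:...)` and `(?-i:...)` work. JavaScript has no `(?i)` prefix; scoped modifiers like `(?i:...)` were only standardized in ES2025.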
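A brief Python sketch of scoped versus whole-pattern flags. The greeting strings are placeholders; the flag syntax shown is what the stdlib `re` module accepts.

```python
import re

# Scoped flag: only "hello" is case-insensitive; "World" must match exactly.
scoped = re.compile(r"(?i:hello) World")
scoped_hit = scoped.fullmatch("HELLO World")
scoped_miss = scoped.fullmatch("hello world")

# Whole-pattern inline flag: in Python it must come first in the pattern.
global_inline = re.compile(r"(?i)hello world")
global_hit = global_inline.fullmatch("HeLLo WoRLD")

# The s (dotall) flag as a compile-time argument instead of inline.
dot_default = re.search(r"a.b", "a\nb")
dot_all = re.search(r"a.b", "a\nb", flags=re.DOTALL)

print(scoped_hit is not None)  # True
print(scoped_miss)             # None
print(global_hit is not None)  # True
print(dot_default)             # None
print(dot_all is not None)     # True
```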
Watch out: in JavaScript, `g` is what makes a regex apply to every match. Without it, `replace` only replaces the first match, which is rarely what you want, and `replaceAll` and `matchAll` throw a `TypeError` when given a regex that lacks `g`.
Unicode considerations
In 2026, always use the Unicode flag unless you have a specific reason not to.
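- JavaScript: pass the `u` flag (or `v` for the newer set notation). Without it, `\p{...}` doesn't work and the pattern is matched by UTF-16 code unit rather than by code point. Note that `\d` and `\w` stay ASCII-only even with `u`.
- Python: the regex engine is Unicode-aware by default for `str` patterns in Python 3. ASCII mode requires the `re.ASCII` flag.
- PCRE: `(*UCP)` or the `u` flag.
- Go (RE2): input is always treated as UTF-8, but `\d`, `\w`, and `\s` are ASCII-only; use `\p{...}` classes for Unicode.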
The classic gotcha: `\w` and `\d` differ between ASCII and Unicode modes. In Unicode mode (the default in Python 3, or PCRE with UCP), `\d` matches every Unicode “decimal digit”, including Eastern Arabic numerals (٠١٢٣٤٥٦٧٨٩), Devanagari digits, etc. JavaScript is the exception: its `\d` is always `[0-9]`. If you want only ASCII 0-9, use `[0-9]` explicitly.
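The digit gotcha is easy to reproduce in Python, where `str` patterns are Unicode-aware by default. This sketch writes the non-ASCII digits as escapes (Arabic-Indic four and two, Devanagari four and two) so the source stays unambiguous; the sample string is invented.

```python
import re

# "42" in ASCII, Arabic-Indic, and Devanagari digits.
mixed = "42 \u0664\u0662 \u096a\u0968"

# Default (Unicode) mode: \d matches any Unicode decimal digit.
unicode_digits = re.findall(r"\d+", mixed)

# re.ASCII restricts \d, \w, \s, and \b to their ASCII definitions.
ascii_flag_digits = re.findall(r"\d+", mixed, flags=re.ASCII)

# An explicit class is ASCII-only regardless of flags.
explicit_digits = re.findall(r"[0-9]+", mixed)

print(len(unicode_digits))  # 3
print(ascii_flag_digits)    # ['42']
print(explicit_digits)      # ['42']
```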
Catastrophic backtracking
The pattern ^(a+)+$ against input aaaaaaaaaaaaaaaaaaaaa! will hang for seconds or longer. The reason: nested quantifiers create an exponential number of ways the engine can match — the engine tries them all before giving up.
The pattern doesn’t have to be that obviously bad. `(a|a)*`, `(.*a){25}`, and `^(.+)*$` all have the same problem. ReDoS attacks exploit these patterns by feeding crafted input to a vulnerable regex, or by supplying the regex itself where patterns are user-controlled.
Defenses:
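- Avoid nested quantifiers. `(a+)+` and `(a*)*` are red flags.
- Use possessive quantifiers or atomic groups where the engine has them (PCRE, Java, Python `re` 3.11+): `(?>a+)+` instead of `(a+)+`.
- Use a linear-time engine for untrusted input: Go's `regexp` (RE2), Rust's `regex` crate, Python's `re2` package.
- Set limits where the engine offers them. .NET takes a match timeout, PCRE2 has match and depth limits, and Python's `regex` package accepts a `timeout` argument. Python's stdlib `re` and JavaScript have no built-in timeout, so isolate those calls if the input is untrusted.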
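The first defense, removing the nested quantifier, can be sketched in Python. The rewrite below assumes the group is not needed for capturing; `^a+$` accepts exactly the same strings as `^(a+)+$`. The adversarial input is kept short for the nested pattern on purpose, since its cost roughly doubles with each extra `a`.

```python
import re

nested = re.compile(r"^(a+)+$")  # vulnerable: a quantifier inside a quantifier
flat = re.compile(r"^a+$")       # same set of accepted strings, no nesting

# On small inputs the two patterns agree on accept/reject.
samples = ["a", "aaaa", "", "aab", "a" * 12 + "!"]
same_verdict = all(
    (nested.match(s) is None) == (flat.match(s) is None) for s in samples
)

# The flat pattern stays cheap on long adversarial input.
# Do NOT run the nested pattern on this string.
adversarial = "a" * 50_000 + "!"
flat_result = flat.match(adversarial)

print(same_verdict)  # True
print(flat_result)   # None
```

When the inner group really is needed, an atomic group or a linear-time engine is the next thing to try.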
Common patterns
| Pattern | Use |
|---|---|
| `^\d+$` | All digits, whole string |
| `^[A-Za-z][A-Za-z0-9_-]*$` | Identifier (letter then alnum/dash/underscore) |
| `\b\w+\b` | Word |
| `\s+` | Run of whitespace (collapse to one) |
| `^\s+\|\s+$` | Leading or trailing whitespace (trim) |
| `^(?!.*\b(spam\|junk)\b).*$` | Line that does not contain the word "spam" or "junk" |
| `^https?://[^\s/$.?#].[^\s]*$` | URL (rough; for validation use a parser) |
| `^[\w.+-]+@[\w-]+\.[\w.-]+$` | Email (rough; RFC-compliant is much harder) |
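A few of these patterns applied in Python, as a sketch. The sample strings and variable names are invented; the patterns are the trim, collapse, and word-exclusion entries from the table.

```python
import re

raw = "  hello   regex \t world  "

# Trim: strip leading or trailing whitespace.
trimmed = re.sub(r"^\s+|\s+$", "", raw)

# Collapse: replace every run of whitespace with a single space.
collapsed = re.sub(r"\s+", " ", trimmed)

# Exclusion: keep only lines that lack the whole words "spam" or "junk".
no_spam = re.compile(r"^(?!.*\b(spam|junk)\b).*$")
lines = ["buy spam now", "weekly report", "junk mail", "spammy is fine"]
kept = [line for line in lines if no_spam.match(line)]

print(repr(collapsed))  # 'hello regex world'
print(kept)             # ['weekly report', 'spammy is fine']
```

Note that "spammy" survives because `\b` requires a word boundary right after "spam".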
A note on email and URL regexes: don’t use regex for these in production validation. RFC 5322 email addresses are a nightmare with quoted local parts and IP literals. URL parsing has a WHATWG URL spec for a reason. Use the URL constructor in JavaScript or urllib.parse in Python. Regex is fine for “looks roughly like an email/URL” but not for “is a valid email/URL.”
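A minimal sketch of leaning on `urllib.parse` instead of a regex for URLs. The helper name `looks_like_http_url` and the policy it encodes (http or https scheme plus a non-empty host) are assumptions for illustration, not a complete validity check.

```python
from urllib.parse import urlsplit


def looks_like_http_url(candidate: str) -> bool:
    """Hypothetical helper: True if candidate parses as an http(s) URL with a host."""
    try:
        parts = urlsplit(candidate)
    except ValueError:
        # urlsplit raises ValueError for some malformed inputs,
        # such as an unclosed IPv6 bracket.
        return False
    return parts.scheme in ("http", "https") and bool(parts.hostname)


print(looks_like_http_url("https://example.com/path?q=1"))  # True
print(looks_like_http_url("ftp://example.com/file"))        # False
print(looks_like_http_url("not a url"))                     # False
```

The parser handles the splitting; the application still decides which schemes and hosts it is willing to accept.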
For percent-encoding details on URL parts that regex can help with, see Percent-Encoding (RFC 3986) under Further reading.
Cross-dialect cheat-strip
| Feature | JavaScript | Python (re) | Python (regex) | PCRE | Go (regexp) |
|---|---|---|---|---|---|
| Named captures `(?<n>...)` | ✓ (ES2018) | ✗ | ✓ | ✓ | ✓ (Go 1.22+) |
| Named captures `(?P<n>...)` | ✗ | ✓ | ✓ | ✓ | ✓ |
| Lookbehind variable-width | ✓ | ✗ (fixed only) | ✓ | ✓ (PCRE2 10.43+, bounded) | ✗ |
| `\p{Letter}` Unicode | ✓ (with `u`) | ✗ | ✓ | ✓ | ✓ (as `\p{L}`) |
| Backreferences | ✓ | ✓ | ✓ | ✓ | ✗ |
| Possessive quantifiers | ✗ | ✓ (3.11+) | ✓ | ✓ | ✗ |
| Linear-time guarantee | ✗ | ✗ | ✗ | ✗ | ✓ |
A working principle
Regex is the right tool for finding patterns in strings. It is the wrong tool for parsing structured data. If you find yourself adding lookaheads to handle nesting, or backreferences to validate an XML or JSON structure, or a 200-character pattern with comments — stop. You’re using a regex to write a parser. Use a parser.
When you do reach for regex, write it in extended mode (x flag) with whitespace and comments. Build it up incrementally, test against representative inputs in the Regex Tester, and watch for catastrophic backtracking on adversarial input. The pattern that works on your example may explode on input you haven’t seen.
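A sketch of what extended mode looks like in Python (`re.VERBOSE`). The three-part version-number pattern is a made-up example; it uses `[0-9]` rather than `\d` to stay ASCII-only and `\Z` so a trailing newline is rejected.

```python
import re

# re.VERBOSE ignores unescaped whitespace and treats # as a comment start.
version = re.compile(
    r"""
    ^
    (?P<major>0|[1-9][0-9]*) \.   # major: no leading zeros
    (?P<minor>0|[1-9][0-9]*) \.   # minor
    (?P<patch>0|[1-9][0-9]*)      # patch
    \Z                            # absolute end of string
    """,
    re.VERBOSE,
)

good = version.match("1.20.3")
leading_zero = version.match("01.2.3")
trailing_newline = version.match("1.2.3\n")

print(good.groupdict())   # {'major': '1', 'minor': '20', 'patch': '3'}
print(leading_zero)       # None
print(trailing_newline)   # None
```

To match a literal space or `#` in this mode, escape it or put it inside a character class.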
The boring answer is the right answer: regex for matching simple patterns, parser for structured data, library function for emails and URLs.
Further reading
- The 10 Most Common JSON Validation Errors — when you stop using regex on JSON
- Percent-Encoding (RFC 3986) — URL parts where regex helps
- CSV Quoting and Escaping Rules — another format where naive splitting fails
- regex101.com — interactive tester with engine selection
- PCRE2 documentation
- Russ Cox — Regular Expression Matching Can Be Simple And Fast
Written by Mian Ali Khalid. Part of the Dev Productivity pillar.