Xerobit

The 2026 Regex Cheatsheet (PCRE, JS, Python — Side by Side)

A dense reference for modern regex: anchors, character classes, quantifiers, lookarounds, capture groups, named groups, Unicode, and the dialect differences that actually bite.

Mian Ali Khalid · 12 min read

You haven’t written a regex in three months. You sit down to write one and immediately can’t remember whether \d matches Unicode digits in your language’s regex engine, whether (?P<name>...) works in JavaScript, and what \b does inside a character class. This cheatsheet is the answer. Dense, opinionated, current to 2026, with the side-by-side dialect notes that turn a “works on my regex tester” into a “works in production.”

For interactive testing while you read, the Regex Tester lets you paste patterns and inputs and see captures inline.

Anchors

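Pattern   Matches
^         Start of string (or line in multiline mode)
$         End of string (or line in multiline mode)
\A        Start of string (always, ignores multiline)
\z        End of string (always, ignores multiline; PCRE, and Python from 3.14)
\Z        End of string, optionally before a final newline (PCRE); in Python, \Z is the absolute end of string
\b        Word boundary (between \w and non-\w)
\B        Non-word-boundary

JavaScript only has ^, $, \b, \B. No \A/\z. PCRE has all of them. Python has \A and \Z, but its \Z matches only at the very end of the string (what PCRE calls \z); \z itself was added as a synonym in Python 3.14.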
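The anchor behavior above is easy to check in a REPL. The following is a minimal sketch using Python’s stdlib re; the sample string and variable names are made up for illustration.

```python
import re

text = "first\nsecond\n"

# Without re.MULTILINE, ^ matches only at the very start of the string.
start_only = re.findall(r"^\w+", text)                      # ['first']

# With re.MULTILINE, ^ also matches right after each newline.
each_line = re.findall(r"^\w+", text, re.MULTILINE)         # ['first', 'second']

# \A ignores re.MULTILINE and still matches only at the start.
absolute_start = re.findall(r"\A\w+", text, re.MULTILINE)   # ['first']

# $ is allowed to match just before a trailing newline; Python's \Z is not.
dollar_hit = re.search(r"second$", text) is not None        # True
big_z_hit = re.search(r"second\Z", text) is not None        # False

print(start_only, each_line, absolute_start, dollar_hit, big_z_hit)
```

The last two lines are the ones that tend to surprise: a validation pattern ending in $ will accept input with one trailing newline.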

Character classes

Pattern      Matches
.            Any character except newline (or any character with s flag)
\d           Digit ([0-9] in ASCII mode; every Unicode decimal digit in Unicode-aware modes such as Python 3 str patterns; always [0-9] in JavaScript and Go)
\D           Non-digit
\w           Word character ([A-Za-z0-9_] in ASCII mode; Unicode letters and digits in Unicode-aware modes; always ASCII in JavaScript and Go)
\W           Non-word character
\s           Whitespace (space, tab, newline, etc.)
\S           Non-whitespace
[abc]        One of a, b, or c
[^abc]       Anything except a, b, c
[a-z]        Lowercase letter (range)
\p{Letter}   Any Unicode letter (JS with u or v flag; the short form \p{L} works in JS, PCRE, Go, and Python’s regex package; Python’s stdlib re has no \p{...})
\P{Letter}   Any non-letter

Inside [...], most metacharacters lose their special meaning. [.+] matches a literal . or +. The exceptions are \, ^ (only at start = negation), ] (closes the class), and - (range; literal at start or end). And \b inside a class is not a word boundary: [\b] matches the backspace character.
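A short sketch of these character-class rules in Python’s stdlib re; the inputs are arbitrary examples.

```python
import re

# Inside a class, . and + are literal characters.
literal_meta = re.findall(r"[.+]", "1.5+2")      # ['.', '+']

# A hyphen between two characters forms a range...
as_range = re.findall(r"[a-c]", "a-c b")         # ['a', 'c', 'b']

# ...but a hyphen at the end (or start) of the class is a literal hyphen.
as_literal = re.findall(r"[ac-]", "a-c b")       # ['a', '-', 'c']

# \b inside a class means backspace (U+0008), not a word boundary.
backspace = re.findall(r"[\b]", "x\by")          # ['\x08']

print(literal_meta, as_range, as_literal, backspace)
```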

Quantifiers

Pattern       Matches
?             0 or 1
*             0 or more
+             1 or more
{n}           Exactly n
{n,}          n or more
{n,m}         n to m
*?            0 or more (lazy)
+?            1 or more (lazy)
{n,m}?        Lazy version of bounded
*+, ++, ?+    Possessive (PCRE, Java, Python re from 3.11; not JavaScript)

The lazy/greedy distinction is where most regex bugs live. Greedy .* matches as much as possible, then backtracks. Lazy .*? matches as little as possible, then expands. For HTML like <a href="x"><a href="y">, the pattern <a.*> matches the whole thing; <a.*?> matches just the first tag.
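The HTML example can be reproduced directly. This sketch uses Python’s re.findall so that every match is visible; the negated-class variant at the end is a common alternative added here for comparison, not something the text above prescribes.

```python
import re

html = '<a href="x"><a href="y">'

# Greedy: .* runs to the end of the string, then backs off to the last ">".
greedy = re.findall(r"<a.*>", html)

# Lazy: .*? stops at the first ">" it can reach.
lazy = re.findall(r"<a.*?>", html)

# Negated class: cannot cross a ">" at all, so no backtracking decision is needed.
negated = re.findall(r"<a[^>]*>", html)

print(greedy)   # ['<a href="x"><a href="y">']
print(lazy)     # ['<a href="x">', '<a href="y">']
print(negated)  # ['<a href="x">', '<a href="y">']
```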

Possessive quantifiers (*+, ++) are like greedy but refuse to backtrack. They can prevent catastrophic backtracking in patterns vulnerable to ReDoS attacks. JavaScript doesn’t support them. Python’s stdlib re added them, along with atomic groups, in 3.11; the third-party regex package supports them too.

Groups and captures

Pattern          Effect
(abc)            Capturing group
(?:abc)          Non-capturing group
(?<name>abc)     Named capture (modern JS, PCRE)
(?P<name>abc)    Named capture (Python; also accepted by PCRE)
\1, \2           Backreference to numbered group
\k<name>         Backreference to named group (JS, PCRE; Python’s re uses (?P=name))
(?>abc)          Atomic group (no backtracking past it; PCRE, Java, Python re from 3.11; not JavaScript)

JavaScript adopted (?<name>...) for named captures in ES2018. Python has historically used (?P<name>...); the third-party regex package accepts both forms, as does PCRE. In a cross-language codebase, use (?<name>...) only if every engine involved supports it, and verify, because Python’s stdlib re only supports (?P<name>...).

Backreferences (\1) are the only way regex can match repetition of content: (\w+) \1 matches “the the” or “abc abc” but not “the cat.” If your engine is finite-automaton-based (RE2, Rust’s regex crate), backreferences aren’t supported because they make the language non-regular.
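As a sketch of named captures and backreferences in Python’s stdlib re (so the (?P<name>...) and (?P=name) spellings are used); the date format and sample sentences are illustrative assumptions.

```python
import re

# Named groups: Python's stdlib spelling is (?P<name>...).
date_re = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")
m = date_re.search("released 2026-01-15")
date_parts = m.groupdict()   # {'year': '2026', 'month': '01', 'day': '15'}

# Numbered backreference: \1 must repeat exactly what group 1 captured.
doubled = re.compile(r"\b(\w+) \1\b")
repeated_word = doubled.search("it was the the cat").group(1)   # 'the'
no_repeat = doubled.search("the cat")                           # None

# Named backreference: (?P=word) in Python, \k<word> in JS and PCRE.
doubled_named = re.compile(r"\b(?P<word>\w+) (?P=word)\b")
repeated_named = doubled_named.search("abc abc").group("word")  # 'abc'

print(date_parts, repeated_word, no_repeat, repeated_named)
```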

Lookarounds

Pattern     Matches
(?=abc)     Lookahead (followed by abc, but doesn’t consume)
(?!abc)     Negative lookahead
(?<=abc)    Lookbehind (preceded by abc)
(?<!abc)    Negative lookbehind

Lookarounds are zero-width: they assert a condition without consuming characters. \d+(?=px) matches a number that’s followed by px, but doesn’t include px in the match.

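Lookbehind support varied for years. JavaScript got lookbehind, both fixed-width and variable-width, in ES2018. Python’s stdlib re only allows fixed-width lookbehind; the regex package allows variable. PCRE has long required each lookbehind alternative to have a fixed length; PCRE2 10.43 and later also accept variable-length lookbehind up to a bounded maximum. RE2 and Go’s regexp have no lookarounds at all, because they guarantee linear-time matching and general lookarounds do not fit that model.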
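The four lookarounds, plus the fixed-width restriction in Python’s stdlib re, can be sketched like this. The sample strings are invented; the negative-lookahead pattern carries an extra \d alternative to sidestep a common backtracking surprise.

```python
import re

css = "12px 3em 40px"

# Lookahead: digits followed by "px"; "px" itself is not part of the match.
px_values = re.findall(r"\d+(?=px)", css)            # ['12', '40']

# Negative lookahead. The extra \d alternative stops the engine from
# backtracking to "1" in "12px" and accepting it because "2" is not "px".
non_px_values = re.findall(r"\d+(?!\d|px)", css)     # ['3']

prices = "cost $30, qty 4"

# Lookbehind: digits preceded by a dollar sign.
dollar_amounts = re.findall(r"(?<=\$)\d+", prices)   # ['30']

# Negative lookbehind: whole numbers not preceded by a dollar sign.
plain_numbers = re.findall(r"(?<!\$)\b\d+", prices)  # ['4']

# Python's stdlib re rejects variable-width lookbehind at compile time.
try:
    re.compile(r"(?<=a+)b")
    variable_width_ok = True
except re.error:
    variable_width_ok = False

print(px_values, non_px_values, dollar_amounts, plain_numbers, variable_width_ok)
```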

Flags / modifiers

Flag   Name                    Effect
i      Ignore case             Case-insensitive matching
m      Multiline               ^ and $ match line boundaries
s      Dotall (single-line)    . matches newlines
x      Extended                Whitespace and comments in pattern are ignored
u      Unicode                 Treat pattern as Unicode codepoints
g      Global                  (JavaScript only) match all, not just first

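Inline flags: (?i)pattern enables case-insensitive matching for the pattern. (?-i) disables it. (?i:pattern) scopes the flag to a group. Support varies: PCRE accepts all three forms, Python’s re accepts (?i) only at the very start of the pattern plus the scoped forms (?i:...) and (?-i:...), and JavaScript gained the scoped form only with the ES2025 regex modifiers.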

Watch out: in JavaScript, replace with a non-global regex only replaces the first match, which is rarely what you actually want. Set g to replace or iterate over every match; replaceAll and matchAll throw a TypeError when given a regex without it.
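For comparison, here is how the same flags look in Python’s stdlib re, as a rough sketch with a made-up log snippet. Python has no g flag: findall and sub already cover every match, and sub takes a count argument when only the first replacement is wanted.

```python
import re

log = "Error: disk full\nerror: retry\nok"

# Flags passed as arguments: i and m combined.
by_argument = re.findall(r"^error", log, re.IGNORECASE | re.MULTILINE)

# The same flags written inline at the very start of the pattern.
by_inline = re.findall(r"(?im)^error", log)

# A scoped inline flag: only "error" is case-insensitive.
reasons = re.findall(r"(?i:error): (\w+)", log)          # ['disk', 'retry']

# Without the s flag (re.DOTALL), . stops at newlines.
across_lines_default = re.search(r"full.*ok", log)        # None
across_lines_dotall = re.search(r"full.*ok", log, re.DOTALL)

# No g flag in Python: sub replaces everything unless count says otherwise.
replace_all = re.sub(r"error", "E", "error error")            # 'E E'
replace_first = re.sub(r"error", "E", "error error", count=1) # 'E error'

print(by_argument, by_inline, reasons, replace_all, replace_first)
```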

Unicode considerations

In 2026, always use the Unicode flag unless you have a specific reason not to.

  • JavaScript: pass the u flag (or v for the newer set notation). Without it, \p{...} doesn’t work and characters outside the Basic Multilingual Plane are treated as two UTF-16 code units. \w and \d stay ASCII-only either way.
  • Python: the regex engine is Unicode-aware by default for str patterns in Python 3. ASCII mode requires the re.ASCII flag.
  • PCRE: (*UCP) or the u flag.
  • Go (RE2): patterns always operate on Unicode code points, but \d, \w, and \s are ASCII-only; use \p{...} classes for Unicode.

The classic gotcha: \w and \d differ between ASCII and Unicode modes. In Unicode mode (Python 3 str patterns, PCRE with UCP), \d matches every Unicode “decimal digit”, including Eastern Arabic numerals (٠١٢٣٤٥٦٧٨٩), Devanagari digits, etc. If you want only ASCII 0-9, use [0-9] explicitly.
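The digit gotcha is easy to demonstrate in Python 3, where str patterns are Unicode-aware by default. In this sketch the non-ASCII characters are written as escapes to keep the source unambiguous; the sample strings are assumptions for illustration.

```python
import re

# "42" in ASCII, Arabic-Indic (U+0664 U+0662), and Devanagari (U+096A U+0968) digits.
mixed = "42 \u0664\u0662 \u096a\u0968"

unicode_digits = re.findall(r"\d+", mixed)            # all three runs match
ascii_flag = re.findall(r"\d+", mixed, re.ASCII)      # ['42']
explicit_class = re.findall(r"[0-9]+", mixed)         # ['42']

# \w behaves the same way: accented letters count only in Unicode mode.
words = "na\u00efve caf\u00e9"
unicode_words = re.findall(r"\w+", words)             # two whole words
ascii_words = re.findall(r"\w+", words, re.ASCII)     # ['na', 've', 'caf']

print(unicode_digits, ascii_flag, explicit_class, unicode_words, ascii_words)
```

If the matched digits are later passed to int(), note that Python’s int() also accepts these Unicode decimal digits, so the surprise usually shows up in downstream systems rather than in the Python code itself.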

Catastrophic backtracking

The pattern ^(a+)+$ against input aaaaaaaaaaaaaaaaaaaaa! can hang for seconds or longer in a backtracking engine. The reason: nested quantifiers create an exponential number of ways to split the run of a’s between the inner and outer loop, and the engine tries them all before giving up. Each extra a roughly doubles the work.

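The pattern doesn’t have to be that obviously bad. (a|a)*, (.*a){25}, and ^(.+)*$ all have the same kind of problem when the overall match fails. ReDoS attacks exploit these patterns by feeding crafted input to a vulnerable regex, or by supplying the regex itself where patterns are user-controlled.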

Defenses:

  1. Avoid nested quantifiers. (a+)+ and (a*)* are red flags.
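  2. Use possessive quantifiers or atomic groups where the engine has them (PCRE, Java, Python re from 3.11): (?>a+)+ instead of (a+)+.
  3. Use a linear-time engine for untrusted input: Go’s regexp (RE2), Rust’s regex crate, RE2 bindings for Python.
  4. Set timeouts where the engine offers them: .NET has a match timeout, PCRE2 has match limits, and Python’s regex package takes a timeout argument. Python’s re and JavaScript have no built-in limit, so run untrusted matches in a worker or subprocess you can kill.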
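To see the growth without hanging a terminal, the sketch below times a failing match for a few small input sizes in Python’s stdlib re, next to a flattened pattern that accepts the same strings. The helper name time_failure and the chosen sizes are assumptions for illustration; absolute timings will differ by machine, so none are shown.

```python
import re
import time

nested = re.compile(r"^(a+)+$")   # nested quantifier: exponential work on failure
flat = re.compile(r"^a+$")        # accepts the same strings with one quantifier

def time_failure(pattern, n):
    """Return the seconds spent rejecting n 'a' characters followed by '!'."""
    subject = "a" * n + "!"
    start = time.perf_counter()
    result = pattern.match(subject)
    elapsed = time.perf_counter() - start
    assert result is None          # the trailing "!" always makes the match fail
    return elapsed

# Keep n small: each extra "a" roughly doubles the time for the nested pattern.
for n in (14, 16, 18):
    print(n, f"nested={time_failure(nested, n):.4f}s", f"flat={time_failure(flat, n):.6f}s")

# Both patterns agree on what they accept; only the cost of failing differs.
samples = ["a", "aaa", "", "aab", "aaaa!"]
agreement = [bool(nested.match(s)) == bool(flat.match(s)) for s in samples]
```

Rewriting the pattern so that there is only one way to match each character is often the cheapest defense, and it works in every engine.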

Common patterns

Pattern                           Use
^\d+$                             All digits, whole string
^[A-Za-z][A-Za-z0-9_-]*$          Identifier (letter then alnum/dash/underscore)
\b\w+\b                           Word
\s+                               Run of whitespace (collapse to one)
^\s+|\s+$                         Leading or trailing whitespace (trim)
^(?!.*\b(spam|junk)\b).*$         Line that does not contain the word spam or junk
^https?://[^\s/$.?#].[^\s]*$      URL (rough; for validation use a parser)
^[\w.+-]+@[\w-]+\.[\w.-]+$        Email (rough; RFC-compliant is much harder)
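A few of these patterns in use, sketched with Python’s stdlib re. The inputs are invented, and the last two lines show a detail worth knowing: in Python, $ also matches just before a trailing newline, so re.fullmatch is the stricter check for "whole string".

```python
import re

raw = "  hello   wide\tworld \n"

# Trim, then collapse internal runs of whitespace to a single space.
trimmed = re.sub(r"^\s+|\s+$", "", raw)        # 'hello   wide\tworld'
collapsed = re.sub(r"\s+", " ", trimmed)       # 'hello wide world'

# Identifier: a letter, then letters, digits, underscores, or dashes.
identifier = re.compile(r"^[A-Za-z][A-Za-z0-9_-]*$")
good_id = bool(identifier.match("user_name-2"))   # True
bad_id = bool(identifier.match("2fast"))          # False

# Reject lines containing the whole word "spam" or "junk".
clean_line = re.compile(r"^(?!.*\b(spam|junk)\b).*$")
blocked = clean_line.match("buy spam now")        # None
allowed = clean_line.match("spammer here")        # matches: "spammer" is a different word

# $ tolerates one trailing newline; fullmatch does not.
dollar_accepts_newline = bool(re.match(r"^\d+$", "123\n"))   # True
fullmatch_rejects = re.fullmatch(r"\d+", "123\n")            # None

print(collapsed, good_id, bad_id, dollar_accepts_newline)
```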

A note on email and URL regexes: don’t use regex for these in production validation. RFC 5322 email addresses are a nightmare with quoted local parts and IP literals. URL parsing has a WHATWG URL spec for a reason. Use the URL constructor in JavaScript or urllib.parse in Python. Regex is fine for “looks roughly like an email/URL” but not for “is a valid email/URL.”
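As a sketch of the parser-first approach in Python, the hypothetical helper below (looks_like_http_url is a name made up for this example) leans on urllib.parse.urlsplit and only checks the scheme and host. urlsplit is lenient and is not a full validator either, so treat this as a starting point under those assumptions.

```python
from urllib.parse import urlsplit

def looks_like_http_url(candidate: str) -> bool:
    """Hypothetical helper: accept only http(s) URLs that carry a host."""
    try:
        parts = urlsplit(candidate)
        host = parts.hostname
    except ValueError:
        # urlsplit raises ValueError for some malformed inputs,
        # such as an unbalanced IPv6 bracket.
        return False
    return parts.scheme in ("http", "https") and bool(host)

print(looks_like_http_url("https://example.com/a?b=1"))  # True
print(looks_like_http_url("ftp://example.com"))          # False
print(looks_like_http_url("https:///path"))              # False
print(looks_like_http_url("not a url"))                  # False
```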

For percent-encoding details on URL parts that regex can help with, see the linked post.

Cross-dialect cheat-strip

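Feature                       JavaScript    Python (re)       Python (regex)    PCRE                        Go (regexp)
Named captures (?<n>...)      ✓ (ES2018)    ✗                 ✓                 ✓                           ✓ (Go 1.22+)
Named captures (?P<n>...)     ✗             ✓                 ✓                 ✓                           ✓
Lookbehind variable-width     ✓             ✗ (fixed only)    ✓                 bounded (PCRE2 10.43+)      ✗ (no lookarounds)
\p{Letter} Unicode            ✓ (with u)    ✗                 ✓                 ✓ (as \p{L})                ✓ (as \p{L})
Backreferences                ✓             ✓                 ✓                 ✓                           ✗
Possessive quantifiers        ✗             ✓ (3.11+)         ✓                 ✓                           ✗
Linear-time guarantee         ✗             ✗                 ✗                 ✗                           ✓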

A working principle

Regex is the right tool for finding patterns in strings. It is the wrong tool for parsing structured data. If you find yourself adding lookaheads to handle nesting, or backreferences to validate an XML or JSON structure, or a 200-character pattern with comments — stop. You’re using a regex to write a parser. Use a parser.

When you do reach for regex, write it in extended mode (x flag) with whitespace and comments. Build it up incrementally, test against representative inputs in the Regex Tester, and watch for catastrophic backtracking on adversarial input. The pattern that works on your example may explode on input you haven’t seen.
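Here is what extended mode looks like in Python, using re.VERBOSE. The pattern is a simplified, hypothetical version-string matcher written for this example; it is not the full SemVer grammar.

```python
import re

# In verbose mode, whitespace in the pattern is ignored and # starts a comment.
version_re = re.compile(
    r"""
    ^
    (?P<major>0|[1-9][0-9]*) \.        # major version, no leading zeros
    (?P<minor>0|[1-9][0-9]*) \.        # minor version
    (?P<patch>0|[1-9][0-9]*)           # patch version
    (?:-(?P<pre>[0-9A-Za-z.-]+))?      # optional pre-release tag such as rc.1
    $
    """,
    re.VERBOSE,
)

release = version_re.match("1.4.0-rc.1").groupdict()
plain = version_re.match("2.0.13").groupdict()
leading_zero = version_re.match("01.2.3")

print(release)       # {'major': '1', 'minor': '4', 'patch': '0', 'pre': 'rc.1'}
print(plain)         # {'major': '2', 'minor': '0', 'patch': '13', 'pre': None}
print(leading_zero)  # None
```

To match a literal space or # in verbose mode, escape it or put it inside a character class.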

The boring answer is the right answer: regex for matching simple patterns, parser for structured data, library function for emails and URLs.

Written by Mian Ali Khalid. Part of the Dev Productivity pillar.