Email Extractor — Extract Email Addresses from Text
An email extractor scans a block of text and returns all email addresses found. Paste a webpage’s source, a CSV export, a log file, or a block of copied text, and it outputs a clean list of every email address present.
Use the Email & URL Extractor to extract emails and URLs from any text.
How email extraction works
Email extraction uses regular expressions to match patterns that follow the structure local@domain.tld.
The basic email regex:
[\w.+%-]+@[\w-]+\.[\w.]+
Broken down:
- [\w.+%-]+ matches the local part: one or more word characters, dots, plus signs, percent signs, or hyphens
- @ matches the at sign
- [\w-]+ matches the first domain label: word characters and hyphens
- \. matches a literal dot
- [\w.]+ matches the TLD and any additional domain parts
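To make the breakdown concrete, here is a minimal sketch of the same basic pattern written with Python's re.VERBOSE flag, so that each part carries its own comment. The name BASIC_EMAIL and the sample sentence are illustrative assumptions, not part of the extractor itself.

```python
import re

# The basic pattern, split into commented parts with re.VERBOSE.
BASIC_EMAIL = re.compile(
    r"""
    [\w.+%-]+   # local part: word characters, dot, plus, percent, hyphen
    @           # the at sign
    [\w-]+      # first domain label: word characters and hyphens
    \.          # a literal dot
    [\w.]+      # TLD plus any further dot-separated domain parts
    """,
    re.VERBOSE,
)

# Hypothetical sample text for illustration.
sample = "Write to bob.smith@company.co.uk or user+tag@gmail.com today."
matches = BASIC_EMAIL.findall(sample)
print(matches)
# ['bob.smith@company.co.uk', 'user+tag@gmail.com']
```

In verbose mode, whitespace and comments outside character classes are ignored, so the compiled pattern behaves the same as the one-line form.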
More complete version:
[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}
What email addresses the regex matches
Valid addresses (should match):
alice@example.com
bob.smith@company.co.uk
user+tag@gmail.com (plus addressing)
first.last@subdomain.org
user@xn--n3h.ws (international domain)
Valid but often missed:
"user name"@example.com (quoted local part — unusual)
user@[192.168.1.1] (IP address domain — rare)
Invalid (both patterns above reject these, because each is missing a required part):
not.an@email (no dot or TLD after the domain)
@example.com (missing local part)
user@ (missing domain)
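One way to see how the sample addresses behave is to run each of them against both patterns. The sketch below does that with Python's re module and fullmatch, which requires the whole string to match. The names BASIC, COMPLETE, and whole_match are illustrative, and results in other regex engines may differ slightly.

```python
import re

BASIC = re.compile(r"[\w.+%-]+@[\w-]+\.[\w.]+")
COMPLETE = re.compile(r"[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}")

samples = [
    "alice@example.com",
    "bob.smith@company.co.uk",
    "user+tag@gmail.com",
    "user@xn--n3h.ws",
    '"user name"@example.com',
    "user@[192.168.1.1]",
    "not.an@email",
    "@example.com",
    "user@",
]

def whole_match(pattern, s):
    """True only if the entire string is matched by the pattern."""
    return pattern.fullmatch(s) is not None

# Map each sample to (matched by BASIC, matched by COMPLETE).
results = {s: (whole_match(BASIC, s), whole_match(COMPLETE, s)) for s in samples}
for s, (b, c) in results.items():
    print(f"{s!r:30} basic={b!s:5} complete={c}")
```

In this run the first four addresses match both patterns, while the quoted local part, the IP-literal domain, and the three incomplete strings match neither.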
A production-grade email validator must also check:
- Local part is ≤ 64 characters
- Full address is ≤ 254 characters
- Domain has valid DNS records (MX record check)
For extraction purposes (finding emails in text), a simple regex is sufficient. For validation (checking if a user’s email is real), use server-side validation or an email verification service.
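The two length limits are easy to layer on top of the regex. The following is a rough sketch, not a full validator: the function name passes_basic_checks and the constants are assumptions made for illustration, and the DNS/MX check is deliberately left out because it needs a network lookup.

```python
import re

EMAIL = re.compile(r"[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}")

MAX_LOCAL = 64     # limit on the part before "@"
MAX_ADDRESS = 254  # limit on the whole address

def passes_basic_checks(address):
    """Syntax and length checks only; no DNS/MX lookup is attempted."""
    if len(address) > MAX_ADDRESS:
        return False
    if EMAIL.fullmatch(address) is None:
        return False
    # Split on the last "@" to isolate the local part.
    local, _, _domain = address.rpartition("@")
    return len(local) <= MAX_LOCAL

print(passes_basic_checks("alice@example.com"))        # True
print(passes_basic_checks("a" * 65 + "@example.com"))  # False
```

An address that passes these checks is only plausibly well formed; whether the mailbox exists still has to be confirmed server-side.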
Extracting emails in code
JavaScript
function extractEmails(text) {
  const pattern = /[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}/g;
  return [...new Set(text.match(pattern) || [])]; // deduplicate
}
const text = `
Contact Alice at alice@example.com or Bob at bob@company.org.
For support: support@example.com (same as alice@example.com).
`;
console.log(extractEmails(text));
// ['alice@example.com', 'bob@company.org', 'support@example.com']
Python
import re

def extract_emails(text):
    pattern = r'[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}'
    emails = re.findall(pattern, text)
    # Deduplicate while preserving order:
    return list(dict.fromkeys(emails))

text = """
Contact Alice at alice@example.com or Bob at bob@company.org.
For support: support@example.com
"""
print(extract_emails(text))
# ['alice@example.com', 'bob@company.org', 'support@example.com']
Command line (grep)
# In grep bracket expressions a backslash is literal, so the hyphen goes last instead of being escaped.
# Extract emails from a file:
grep -oE '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' file.txt
# From stdin:
cat page.html | grep -oE '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'
# Deduplicate and sort:
grep -oE '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' file.txt | sort -u
Common input sources for email extraction
Exported CSV data: CRM exports, contact lists, spreadsheets. Paste the raw CSV and the extractor pulls the addresses out of whichever column holds them.
HTML/webpage source: Copy the page source into the extractor. Emails in contact pages, about pages, and footer sections are easily found.
Log files: Server logs, application logs, and email delivery logs often contain email addresses inline.
Copied email threads: Paste an email chain and extract all addresses from headers, signatures, and CC lines.
Conference/event data: Speaker bios and attendee lists in PDF or Word format. Copy the text out of the document and extract all contact emails at once.
Email obfuscation (why extraction may miss some)
Some websites obfuscate email addresses to prevent scraping:
Display obfuscation:
alice [at] example [dot] com
alice AT example DOT com
HTML entity encoding:
&#97;&#108;&#105;&#99;&#101;&#64;&#101;&#120;&#97;&#109;&#112;&#108;&#101;&#46;&#99;&#111;&#109;
(renders as alice@example.com but looks like gibberish in source)
JavaScript-rendered:
document.write('alice' + '@' + 'example.com');
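Standard regex extraction won’t catch these in the page source. The Email & URL Extractor works on raw text, so paste the rendered text (not the HTML source) to pick up entity-encoded and JavaScript-rendered addresses, which appear as normal emails once the page is displayed. Display obfuscation such as [at] and [dot] stays obfuscated even in rendered text and has to be converted back to @ and . before extraction.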
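If you are writing your own extraction script, some of this obfuscation can be undone before the regex runs. The sketch below is a heuristic under stated assumptions: the AT and DOT rules and the function names are hypothetical, they cover only the bracketed and upper-case styles shown above, and they can produce false positives on ordinary text containing the upper-case words AT or DOT. JavaScript-assembled addresses are not handled, since that would require executing the script.

```python
import html
import re

EMAIL = re.compile(r"[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}")

# Hypothetical normalisation rules for the two display styles shown above.
AT = re.compile(r"[ \t]*(?:\[at\]|\bAT\b)[ \t]*")
DOT = re.compile(r"[ \t]*(?:\[dot\]|\bDOT\b)[ \t]*")

def deobfuscate(text):
    """Decode HTML entities, then rewrite [at]/AT and [dot]/DOT markers."""
    text = html.unescape(text)  # e.g. &#64; becomes @, &#46; becomes .
    text = AT.sub("@", text)
    text = DOT.sub(".", text)
    return text

def extract_obfuscated(text):
    """Extract emails after normalisation, deduplicated in first-seen order."""
    return list(dict.fromkeys(EMAIL.findall(deobfuscate(text))))

# Hypothetical sample input; bob and carol are made up for illustration.
sample = (
    "alice [at] example [dot] com\n"
    "bob AT company DOT org\n"
    "carol&#64;example&#46;net\n"
)
print(extract_obfuscated(sample))
# ['alice@example.com', 'bob@company.org', 'carol@example.net']
```

The markers are matched with spaces and tabs only, not newlines, so addresses on separate lines are never joined together.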
URL extraction alongside email extraction
Email extractors often pair with URL extraction since both are common structured patterns in text:
URL regex:
https?://[^\s<>"{}|\\^`\[\]]+
The Email & URL Extractor extracts both simultaneously — useful when parsing documentation, web pages, or logs that contain both.
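As a rough illustration of extracting both at once, the sketch below runs the email pattern and the URL pattern over the same text. It is not the Email & URL Extractor's actual implementation; the function name extract_all and the sample sentence are assumptions.

```python
import re

EMAIL = re.compile(r"[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}")
URL = re.compile(r'https?://[^\s<>"{}|\\^`\[\]]+')

def extract_all(text):
    """Return emails and URLs found in text, each deduplicated in first-seen order."""
    return {
        "emails": list(dict.fromkeys(EMAIL.findall(text))),
        "urls": list(dict.fromkeys(URL.findall(text))),
    }

# Hypothetical sample text for illustration.
sample = "Docs: https://example.com/guide?page=2 (questions to support@example.com)"
found = extract_all(sample)
print(found)
# {'emails': ['support@example.com'], 'urls': ['https://example.com/guide?page=2']}
```

Because the URL pattern accepts characters such as . and ), a URL at the very end of a sentence or inside parentheses may be captured with trailing punctuation attached, so a cleanup step can be worth adding.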
Related tools
- Email & URL Extractor — extract emails and URLs from text
- Email Extractor Guide — detailed usage and edge cases
- Regex Tester — test and build your own extraction patterns
Written by Mian Ali Khalid. Part of the Dev Productivity pillar.