Xerobit

Email Extractor — Extract Email Addresses from Text


Mian Ali Khalid · 5 min read
Use the tool
Email & URL Extractor
Extract every email address and URL from a block of text. Regex-based, case-insensitive, deduplicated, sorted output.

An email extractor scans a block of text and returns all email addresses found. Paste a webpage’s source, a CSV export, a log file, or a block of copied text, and it outputs a clean list of every email address present.

Use the Email & URL Extractor to extract emails and URLs from any text.

How email extraction works

Email extraction uses regular expressions to match patterns that follow the structure local@domain.tld.

The basic email regex:

[\w.+%-]+@[\w-]+\.[\w.]+

Broken down:

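  • [\w.+%-]+ - local part: one or more word characters (letters, digits, underscore), dots, plus signs, percent signs, or hyphens
  • @ - the at sign
  • [\w-]+ - first domain label: word characters and hyphens
  • \. - a literal dot
  • [\w.]+ - TLD and any additional domain parts; because this class includes the dot, a sentence-ending period right after an address is captured too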

A stricter version, which spells out the allowed characters and requires an alphabetic TLD of at least two letters:

[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}
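To see how the two patterns differ in practice, the following Python sketch runs both over the same sentence. The names BASIC, STRICT, and sample are illustrative only, and the behavior shown is that of Python's re module; other regex engines may treat \w differently.

```python
import re

# The two patterns from this section, compiled unchanged.
BASIC = re.compile(r"[\w.+%-]+@[\w-]+\.[\w.]+")
STRICT = re.compile(r"[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}")

# Hypothetical sample: the first address is followed by a sentence-ending period.
sample = "Write to alice@example.com. Or try bob@mail.example.co.uk!"

basic_hits = BASIC.findall(sample)
strict_hits = STRICT.findall(sample)

print(basic_hits)   # ['alice@example.com.', 'bob@mail.example.co.uk']
print(strict_hits)  # ['alice@example.com', 'bob@mail.example.co.uk']
```

The basic pattern keeps the trailing period because its last character class allows dots. The stricter pattern has to end on letters, so the period is left out.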

What email addresses the regex matches

Valid addresses (should match):

alice@example.com
bob.smith@company.co.uk
user+tag@gmail.com         (plus addressing)
first.last@subdomain.org
user@xn--n3h.ws            (internationalized domain in Punycode form)

Valid but often missed:

"user name"@example.com   (quoted local part — unusual)
user@[192.168.1.1]        (IP address domain — rare)

Invalid strings that a loose pattern such as \S*@\S* accepts, but that both patterns above reject:

not.an@email              (domain has no dot or TLD)
@example.com              (missing local part)
user@                     (missing domain)
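As a quick check, the example strings listed above can be run against the stricter pattern. This is a minimal sketch: the cases dictionary and the use of fullmatch (the whole string must match) are choices made here, not part of the tool.

```python
import re

STRICT = re.compile(r"[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}")

# Example strings copied from the lists above, paired with the verdict this
# particular pattern gives (True means the whole string matches).
cases = {
    "alice@example.com": True,
    "bob.smith@company.co.uk": True,
    "user+tag@gmail.com": True,
    "first.last@subdomain.org": True,
    "user@xn--n3h.ws": True,
    '"user name"@example.com': False,  # valid, but quoted local parts are not covered
    "user@[192.168.1.1]": False,       # valid, but IP-literal domains are not covered
    "not.an@email": False,
    "@example.com": False,
    "user@": False,
}

results = {}
for candidate, expected in cases.items():
    results[candidate] = STRICT.fullmatch(candidate) is not None
    print(f"{candidate!r:28} -> {results[candidate]}")
```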

A production-grade email validator must also check:

  1. Local part is ≤ 64 characters
  2. Full address is ≤ 254 characters
  3. Domain has valid DNS records (MX record check)
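The two length limits are easy to check in code. The sketch below covers only those: the function name within_length_limits is hypothetical, measuring in UTF-8 bytes is an assumption, and the MX lookup is left out because it needs a network query, typically through a DNS library.

```python
def within_length_limits(address: str) -> bool:
    """Check the two length limits listed above; this does not touch DNS."""
    local, at, domain = address.rpartition("@")
    if not at or not local or not domain:
        return False  # no "@" at all, or nothing on one side of it
    # Lengths are measured in UTF-8 bytes here (an assumption of this sketch).
    local_ok = len(local.encode("utf-8")) <= 64
    total_ok = len(address.encode("utf-8")) <= 254
    return local_ok and total_ok


print(within_length_limits("alice@example.com"))        # True
print(within_length_limits("a" * 65 + "@example.com"))  # False: local part is 65 characters
print(within_length_limits("user@"))                    # False: empty domain
```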

For extraction purposes (finding emails in text), a simple regex is sufficient. For validation (checking if a user’s email is real), use server-side validation or an email verification service.

Extracting emails in code

JavaScript

function extractEmails(text) {
  const pattern = /[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}/g;
  return [...new Set(text.match(pattern) || [])];  // deduplicate
}

const text = `
  Contact Alice at alice@example.com or Bob at bob@company.org.
  For support: support@example.com (same as alice@example.com).
`;

console.log(extractEmails(text));
// ['alice@example.com', 'bob@company.org', 'support@example.com']

Python

import re

def extract_emails(text):
    pattern = r'[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}'
    emails = re.findall(pattern, text)
    # Deduplicate while preserving order:
    return list(dict.fromkeys(emails))

text = """
    Contact Alice at alice@example.com or Bob at bob@company.org.
    For support: support@example.com
"""

print(extract_emails(text))
# ['alice@example.com', 'bob@company.org', 'support@example.com']
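The tool description mentions case-insensitive, deduplicated, sorted output. One way to get that behavior in Python is sketched below. The function name extract_emails_normalized is hypothetical, and this is not necessarily how the tool itself does it. Lowercasing treats Alice@Example.com and alice@example.com as the same address, which is usually what you want, although local parts are technically case-sensitive.

```python
import re

EMAIL = re.compile(r"[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}")


def extract_emails_normalized(text):
    """Return unique addresses, lowercased and sorted alphabetically."""
    # The set removes duplicates after lowercasing; sorted() fixes the order.
    return sorted({match.lower() for match in EMAIL.findall(text)})


text = "Reply to Bob@Company.org, alice@example.com or ALICE@EXAMPLE.COM."
print(extract_emails_normalized(text))
# ['alice@example.com', 'bob@company.org']
```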

Command line (grep)

# Extract emails from a file. Inside a POSIX bracket expression a backslash is
# a literal character, so the hyphen goes last in the class instead of being escaped:
grep -oE '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' file.txt

# From stdin:
cat page.html | grep -oE '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'

# Deduplicate and sort:
grep -oE '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' file.txt | sort -u

Common input sources for email extraction

Exported CSV data: CRM exports, contact lists, spreadsheets. Paste the raw CSV to pull the addresses out of the email column.

HTML/webpage source: Copy the page source into the extractor. Emails in contact pages, about pages, and footer sections are easily found.

Log files: Server logs, application logs, and email delivery logs often contain email addresses inline.

Copied email threads: Paste an email chain and extract all addresses from headers, signatures, and CC lines.

Conference/event data: Speaker bios and attendee lists copied out of PDF or Word documents. Extract all contact emails at once.
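For sources that live in files rather than in the clipboard, the same regex can be applied line by line. The sketch below is one possible approach: extract_from_lines is a hypothetical helper, and io.StringIO stands in for an open CSV export or log file so the example runs without any files on disk.

```python
import io
import re

EMAIL = re.compile(r"[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}")


def extract_from_lines(lines):
    """Collect unique addresses from any iterable of text lines, in first-seen order."""
    seen = {}
    for line in lines:
        for address in EMAIL.findall(line):
            seen.setdefault(address, None)  # dict keys keep insertion order
    return list(seen)


# Made-up CSV content; a real file object from open(...) is used the same way.
fake_csv = io.StringIO(
    "name,email,team\n"
    "Alice,alice@example.com,Sales\n"
    "Bob,bob@company.org,Support\n"
    "Alice (duplicate),alice@example.com,Sales\n"
)
found = extract_from_lines(fake_csv)
print(found)
# ['alice@example.com', 'bob@company.org']
```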

Email obfuscation (why extraction may miss some)

Some websites obfuscate email addresses to prevent scraping:

Display obfuscation:

alice [at] example [dot] com
alice AT example DOT com

HTML entity encoding:

&#97;&#108;&#105;&#99;&#101;&#64;&#101;&#120;&#97;&#109;&#112;&#108;&#101;&#46;&#99;&#111;&#109;
(renders as alice@example.com but looks like gibberish in source)

JavaScript-rendered:

document.write('alice' + '@' + 'example.com');

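Standard regex extraction won’t catch these in the page source. The Email & URL Extractor works on raw text, so paste the rendered text (not the HTML source) to pick up entity-encoded or script-inserted addresses, which the browser has already turned into plain text. Display-obfuscated forms such as "alice [at] example [dot] com" still contain no @ sign, so they have to be rewritten before the regex can match them.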
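If you are writing your own extractor, some of these obfuscations can be undone before matching. The following is a heuristic sketch, not a feature of the tool. The deobfuscate function is hypothetical, html.unescape is the standard-library call that decodes HTML entities, and the AT/DOT rewriting can produce false positives on ordinary text that happens to contain those words in capitals.

```python
import html
import re

EMAIL = re.compile(r"[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}")


def deobfuscate(text):
    """Undo a few common display obfuscations (heuristic, may over-match)."""
    text = html.unescape(text)  # &#64; -> @, &#46; -> ., and so on
    # Bracketed forms such as " [at] " or " (dot) ", in any letter case.
    text = re.sub(r"\s*[\[(]\s*at\s*[\])]\s*", "@", text, flags=re.IGNORECASE)
    text = re.sub(r"\s*[\[(]\s*dot\s*[\])]\s*", ".", text, flags=re.IGNORECASE)
    # Bare words only when written in capitals, to limit false positives.
    text = re.sub(r"\s+AT\s+", "@", text)
    text = re.sub(r"\s+DOT\s+", ".", text)
    return text


samples = [
    "alice [at] example [dot] com",
    "bob AT example DOT com",
    "&#99;arol&#64;example&#46;com",
]
recovered = [EMAIL.findall(deobfuscate(s)) for s in samples]
print(recovered)
# [['alice@example.com'], ['bob@example.com'], ['carol@example.com']]
```

Addresses assembled by JavaScript are out of reach for this kind of text rewriting; they only exist once the script has run.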

URL extraction alongside email extraction

Email extractors often pair with URL extraction since both are common structured patterns in text:

URL regex:

https?://[^\s<>"{}|\\^`\[\]]+

The Email & URL Extractor extracts both simultaneously — useful when parsing documentation, web pages, or logs that contain both.
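A combined pass over the same text might look like the sketch below. The function name extract_emails_and_urls and the returned dictionary shape are assumptions made for illustration. The trailing-punctuation trim is also a choice made here, since the URL pattern above does not exclude commas or periods; it would cut a legitimate closing parenthesis off a URL that ends in one.

```python
import re

EMAIL = re.compile(r"[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}")
URL = re.compile(r'https?://[^\s<>"{}|\\^`\[\]]+')


def extract_emails_and_urls(text):
    """Return unique emails and URLs, each in first-seen order."""
    emails = list(dict.fromkeys(EMAIL.findall(text)))
    # Strip sentence punctuation that the URL pattern would otherwise keep.
    urls = list(dict.fromkeys(u.rstrip(".,;:!?)") for u in URL.findall(text)))
    return {"emails": emails, "urls": urls}


text = (
    "Docs: https://example.com/docs, contact admin@example.com. "
    "Mirror: <http://example.org/a?b=1>."
)
result = extract_emails_and_urls(text)
print(result)
# {'emails': ['admin@example.com'], 'urls': ['https://example.com/docs', 'http://example.org/a?b=1']}
```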



Written by Mian Ali Khalid. Part of the Dev Productivity pillar.