How to use the email extractor

Paste your text: drop in any block of text, such as a web page source, a CSV export, a log file, a chat transcript, or raw HTML. The tool accepts any plain text, and inputs of a few MB process comfortably.
Pick your mode: choose "Emails only" to pull email addresses, "URLs only" for links and web addresses, or "Both" to extract everything in one pass.
Copy the results: click Copy to put the deduplicated, sorted list on your clipboard. You get one result per line, ready to paste into a spreadsheet, import into an email client, or pipe into another tool.

What the email extractor finds

The email pattern targets the format local-part@domain.tld. It catches:

Plus-addressing (user+tag@example.com) — common for Gmail filters and email tracking.
Dots and dashes in the local part (first.last@company.co.uk).
Subdomains in the domain part (support@mail.example.com).
Short and long TLDs alike (.io, .ai, .dev, and longer extensions like .email), because the regex only requires the TLD to be at least 2 characters.

What it intentionally skips: RFC 5322 quoted local parts like "user name"@example.com are technically valid but essentially never appear in real-world data. Supporting them would add significant complexity for near-zero practical benefit.
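
A minimal sketch of the kind of pattern described above. The tool's actual regex is not shown on this page, so the expression below is an assumption modeled on the grep pattern from the FAQ, and the names EMAIL_PATTERN and findEmails are hypothetical.

```javascript
// Assumed pattern: ASCII local part, "@", dotted domain, TLD of 2+ letters.
const EMAIL_PATTERN = /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g;

// Return every substring of `text` that looks like an email address.
function findEmails(text) {
  // match() with a global regex returns null when nothing matches.
  return text.match(EMAIL_PATTERN) || [];
}

const sample =
  'Write to user+tag@example.com, first.last@company.co.uk or support@mail.example.com. ' +
  'Quoted forms such as "user name"@example.com are skipped.';

console.log(findEmails(sample));
```

With this sample, the three unquoted addresses are returned and the quoted local part is not, because the character right before its @ is a double quote, which the local-part character class excludes.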

What the URL extractor finds

The URL extractor matches two categories:

Protocol URLs: anything starting with http:// or https://, followed by the rest of the URL including paths, query strings, and fragments.
Bare domains: domains without a protocol, using a conservative list of common TLDs (.com, .org, .net, .io, .dev, .app, .co, .ai, and others). The TLD list is intentionally limited to avoid false positives, so version strings like v1.2.3 or property accesses like module.exports do not match.

When the source is a web page, paste the raw HTML rather than the rendered text: href values are unambiguous, and the extractor will find them cleanly.
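
The two categories can be sketched as a single alternation. This is an illustration under stated assumptions, not the tool's real pattern: the TLD list is the short one quoted above, the names PROTOCOL_URL, BARE_DOMAIN, URL_PATTERN, and findUrls are hypothetical, and the lookbehind needs a reasonably modern JavaScript engine.

```javascript
// 1) Protocol URLs: http(s):// followed by anything up to whitespace, quotes, or angle brackets.
const PROTOCOL_URL = /https?:\/\/[^\s<>"']+/;

// 2) Bare domains with an allowlisted TLD (assumed short list).
//    The lookbehind rejects starts inside a longer token or right after "@",
//    so email domains are not reported as URLs. The lookahead rejects partial
//    matches such as "example.co" inside "example.co.uk" or a local part before "@".
const BARE_DOMAIN =
  /(?<![\w@.\/-])(?:[a-z0-9-]+\.)+(?:com|org|net|io|dev|app|co|ai)\b(?!@|\.[a-z0-9])(?:\/[^\s<>"']*)?/;

const URL_PATTERN = new RegExp(PROTOCOL_URL.source + "|" + BARE_DOMAIN.source, "gi");

// Return all URL-like matches, with trailing sentence punctuation trimmed.
function findUrls(text) {
  return Array.from(text.matchAll(URL_PATTERN), (m) => m[0].replace(/[.,;:!?)]+$/, ""));
}

console.log(
  findUrls(
    "Docs at https://example.com/guide?x=1#top, mirror on docs.example.io, " +
      "version v1.2.3, module.exports, mail admin@example.com."
  )
);
```

In this sample only the protocol URL and docs.example.io are returned. The version string, module.exports, and the email domain are all skipped, which reflects the precision-over-recall trade-off described above.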

Common use cases

Lead extraction from contact pages — paste the page source of a company's contact or team page and get every email address in one step. Far faster than scanning manually.
Log analysis — pull every URL referenced in an nginx or Apache access log to build a request inventory, identify external dependencies, or audit traffic patterns.
Documentation link audit — extract every URL from a Markdown document or README to check for broken links. Pair with a link checker CLI to validate the full list.
Email migration — export contacts to text, extract the addresses, and import the clean list into a new email client or CRM without manual copy-paste.
Scraping cleanup — when raw scraped text contains a mix of content and email/URL noise, extraction gives you a clean separated list.

Deduplication and sorting

Real-world text almost always contains the same email address or URL multiple times. A contact page might list the same support email in the header, footer, and body. A log file might reference the same API endpoint thousands of times.

This tool deduplicates automatically: one email that appears in 15 places becomes one line in the output. Emails are lowercased before deduplication (User@Example.com and user@example.com are treated as the same address), then sorted alphabetically. URLs keep their original case and are deduplicated on exact match, so two URLs that differ only in letter case stay as separate entries.

The result is a minimal, clean list — useful as the starting point for an import, a validation run, or a manual review.
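
The deduplication rules above can be sketched with a Set. The function names are hypothetical, and sorting the URL list is an assumption taken from the "deduplicated, sorted list" wording earlier on this page.

```javascript
// Emails: lowercase first, so differently cased copies collapse into one entry.
function dedupeEmails(emails) {
  const unique = new Set(emails.map((email) => email.toLowerCase()));
  return [...unique].sort();
}

// URLs: keep the original case and drop exact duplicates only.
// Sorting here is an assumption; the default sort compares UTF-16 code units,
// so uppercase letters sort before lowercase ones.
function dedupeUrls(urls) {
  return [...new Set(urls)].sort();
}

console.log(dedupeEmails(["User@Example.com", "user@example.com", "bob@example.com"]));
console.log(dedupeUrls(["https://example.com/a", "https://example.com/A", "https://example.com/a"]));
```

A Set keeps one copy of each distinct string, so lowercasing before insertion is what makes the email comparison case-insensitive while the URL comparison stays exact.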

Email validation vs email extraction

These are two different problems that developers frequently confuse:

Extraction: find strings that look like email addresses in a body of text. It is pattern matching: fast, local, no network. That's what this tool does.
Validation: determine whether an email address will successfully receive mail. This requires either sending a test message (impractical in bulk) or querying the domain's MX records, optionally followed by an email verification API. An MX lookup only confirms that the domain is set up to receive mail, not that the specific mailbox exists.

An extracted address like fake123@disposable.ninja passes the extraction regex but may bounce on send. An address like user@company.com looks valid and may also bounce if the mailbox doesn't exist. If your use case requires confidence that addresses are live, extraction is only the first step — you'll need a validation layer.
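
As a rough illustration of the first validation step, here is a sketch of an MX lookup. It assumes a Node.js environment (browsers have no DNS API), and the names nodeResolveMx and domainHasMx are hypothetical. It is not part of this tool, which never touches the network.

```javascript
// Default resolver: Node's built-in DNS module, loaded on demand.
async function nodeResolveMx(domain) {
  const dns = await import("node:dns");
  return dns.promises.resolveMx(domain);
}

// Resolve to true when the address's domain publishes at least one MX record.
// `resolveMx` is injectable so the check can be exercised without network access.
async function domainHasMx(address, resolveMx = nodeResolveMx) {
  const domain = address.slice(address.lastIndexOf("@") + 1);
  try {
    const records = await resolveMx(domain);
    return records.length > 0;
  } catch {
    // Lookup errors such as ENOTFOUND or ENODATA are treated as "no MX records".
    return false;
  }
}

// Usage (performs a real DNS query when run):
// domainHasMx("user@company.com").then((ok) => console.log(ok));
```

A true result only says the domain advertises a mail server. Whether the individual mailbox exists still needs a verification service or an actual send.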

Working with extracted emails — legal context

Email addresses are personal data under GDPR in the EU and similar regulations globally. A few important points for developers:

Opt-in vs scraped lists: sending marketing email to scraped addresses without consent violates CASL (Canada), GDPR (EU), and most email platform terms of service. CAN-SPAM (US) is opt-out based rather than consent based, but it lists address harvesting as an aggravating factor when its other rules are broken. Email service providers (Mailchimp, SendGrid, etc.) typically suspend accounts that import cold-scraped lists.
Internal/owned data is different: extracting emails from your own exported data (your CRM, your support tickets, your logs) for migration or analysis is generally fine. The restriction is on using scraped third-party data for unsolicited outreach.
B2B cold email: legitimate practices exist but require sourcing from compliant databases with a documented legal basis. Scraping a website and emailing everyone listed is not that.

Technical notes

The tool uses JavaScript regex patterns applied entirely in your browser. No text is sent to any server. The email pattern is a pragmatic approximation of RFC 5322 — full RFC 5322 compliance would require a parser, not a regex, and would match many formats that don't exist in real data. The URL pattern prioritizes precision (low false positives) over recall (catching every possible URL format). Both patterns have been tested against large real-world datasets of logs, HTML exports, and mailing list backups.

FAQ

Is my text sent to a server?

No. Extraction runs entirely in your browser via JavaScript. Your text never leaves your machine. You can disconnect from the internet after the page loads and the tool still works.

Does it support international email addresses (IDN)?

No. Addresses with an internationalized domain name (IDN), like user@münchen.de, or with Unicode in the local part are not matched by the current regex pattern. The pattern targets ASCII addresses, which covers the vast majority of real-world usage. Non-ASCII addresses are valid under the internationalized email framework (RFC 6530) but uncommon in practice. A Unicode domain has an ASCII equivalent in Punycode (münchen.de becomes xn--mnchen-3ya.de), and an ASCII-only pattern can match that form. A Unicode local part has no ASCII equivalent and depends on SMTPUTF8 support, which many mail servers still lack.
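
If IDN addresses matter for your data, one option is to convert the domain part to its Punycode form yourself. The sketch below leans on the standard URL parser, available in browsers and Node.js, which applies IDNA processing to host names. The function name toAsciiAddress is hypothetical, and the tool itself does not perform this step.

```javascript
// Convert the domain part of an address to its ASCII (Punycode) form.
// The local part is returned unchanged.
function toAsciiAddress(address) {
  const at = address.lastIndexOf("@");
  if (at < 0) throw new Error("not an email address: " + address);
  const local = address.slice(0, at);
  const domain = address.slice(at + 1);
  // The URL parser converts a Unicode host to ASCII and lowercases it.
  const asciiDomain = new URL("http://" + domain).hostname;
  return local + "@" + asciiDomain;
}

console.log(toAsciiAddress("user@münchen.de"));
```

Only the domain is handled. An address whose local part contains non-ASCII characters still falls outside an ASCII-only pattern after conversion.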

What is the maximum text size?

There is no hard enforced limit, but performance degrades above ~10 MB of input text. The regex engine processes the input linearly, so processing time grows with input size, and very large inputs (multi-hundred-MB log files) will cause the UI to freeze for several seconds or longer. For files that large, use a command-line tool: grep -oE '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' file.txt | sort -u extracts and deduplicates emails far faster and without loading the file into a browser tab. Unlike this tool, sort -u compares case-sensitively, so lowercase the output first if you need the same behavior.

Related tools

Regex Tester — Test regular expressions with live match highlighting and explanation.
Word Counter — Count words, characters, sentences, paragraphs, and lines. Reading time estimate, char-limit indicators for X, LinkedIn, meta titles, and more.
Text Diff — Compare two text blocks line-by-line or word-by-word. Unified and split view. Shows added, removed, and changed segments with full color coding.
Case Converter — Convert text between camelCase, PascalCase, snake_case, kebab-case, SCREAMING_CASE, Title Case, sentence case, and more. Bulk mode.

Written by Mian Ali Khalid. Last updated 2026-05-13.

Email & URL Extractor