Bot Detection Using User Agent Strings — Googlebot, Bingbot, and Crawlers
Identify search engine crawlers, social media bots, and malicious scrapers from user agent strings. Includes patterns for Googlebot, Bingbot, common bots, and how to verify real crawlers with reverse DNS.
Bots use user agent strings to identify themselves — but anyone can fake a UA. Distinguishing real Googlebot from a scraper spoofing it requires reverse DNS verification. Here’s how to detect, classify, and verify bots.
Use the User Agent Parser to classify any user agent string.
Common search engine bot UAs
Googlebot (web):
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
Googlebot (mobile):
Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 ... (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
Bingbot:
Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
DuckDuckBot:
DuckDuckBot/1.1; (+http://duckduckgo.com/duckduckbot.html)
Yandex:
Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)
Baidu:
Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)
AhrefsBot:
Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)
Semrushbot:
Mozilla/5.0 (compatible; SemrushBot/7~bl; +http://www.semrush.com/bot.html)
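Notice that all of the crawler UAs above share a `Name/version` shape followed by an info URL, so a single regex can pull the bot name and version out of the string. A minimal sketch (`CRAWLER_RE` and `parseCrawler` are our own illustrative names, and the pattern is not exhaustive — it assumes the bot token ends in "bot" or "spider", as all the samples do):

```javascript
// Extract "Name/version" from crawler UAs like the samples above.
const CRAWLER_RE = /([A-Za-z]+(?:[Bb]ot|[Ss]pider))\/([\d.]+[^;)\s]*)/;

function parseCrawler(ua) {
  const m = ua.match(CRAWLER_RE);
  return m ? { name: m[1], version: m[2] } : null;
}

parseCrawler('Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)');
// → { name: 'Googlebot', version: '2.1' }
parseCrawler('Mozilla/5.0 (compatible; SemrushBot/7~bl; +http://www.semrush.com/bot.html)');
// → { name: 'SemrushBot', version: '7~bl' }
```

This only recognizes self-identifying crawlers; a scraper sending a plain browser UA returns `null`, which is why the pattern table below still needs a generic fallback.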
Bot detection patterns
const BOT_PATTERNS = {
  // Search engines (good bots):
  googlebot: /Googlebot/i,
  bingbot: /bingbot/i,
  duckduckbot: /DuckDuckBot/i,
  yandex: /YandexBot/i,
  baidu: /Baiduspider/i,
  // SEO crawlers:
  ahrefs: /AhrefsBot/i,
  semrush: /SemrushBot/i,
  majestic: /MJ12bot/i,
  // Social media crawlers:
  facebook: /facebookexternalhit/i,
  twitter: /Twitterbot/i,
  linkedin: /LinkedInBot/i,
  slack: /Slackbot/i,
  telegram: /TelegramBot/i,
  // Generic patterns:
  generic: /bot|crawler|spider|scraper|fetch|curl|wget|python-requests/i,
};

function classifyUserAgent(ua = navigator.userAgent) {
  for (const [name, pattern] of Object.entries(BOT_PATTERNS)) {
    if (pattern.test(ua)) return { isBot: true, type: name };
  }
  return { isBot: false, type: 'human' };
}
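One detail the loop depends on: `Object.entries` iterates string keys in insertion order, so the specific patterns are tried before the catch-all `generic` one. A trimmed-down sketch of the same idea (the map here is deliberately reduced to two entries):

```javascript
// Insertion order decides priority: a UA containing "Googlebot" is
// classified as googlebot, not as generic, because googlebot comes first.
const PATTERNS = {
  googlebot: /Googlebot/i,
  generic: /bot|crawler|spider/i,
};

function classify(ua) {
  for (const [name, pattern] of Object.entries(PATTERNS)) {
    if (pattern.test(ua)) return { isBot: true, type: name };
  }
  return { isBot: false, type: 'human' };
}

classify('Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)');
// → { isBot: true, type: 'googlebot' }  (not 'generic')
classify('Mozilla/5.0 (compatible; MJ12bot/v1.4.8)');
// → { isBot: true, type: 'generic' }    (only the catch-all matches)
```

If you reorder the map so `generic` comes first, every bot collapses into the generic bucket, so keep the catch-all last.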
Node.js / Express bot detection
app.use((req, res, next) => {
  const ua = req.headers['user-agent'] || '';
  const { isBot, type } = classifyUserAgent(ua);
  req.isBot = isBot;
  req.botType = type;
  // Log bot activity:
  if (isBot) {
    console.log(`Bot visit: ${type} from ${req.ip} → ${req.path}`);
  }
  next();
});
// Serve different content to bots. This is only safe against cloaking
// penalties if the pre-rendered HTML matches what users see; in production,
// gate it on a *verified* Googlebot (see below), not the UA alone.
app.get('/page', (req, res) => {
  if (req.isBot && req.botType === 'googlebot') {
    // Serve pre-rendered static HTML for SEO
    // (sendFile needs an absolute path or a root option)
    return res.sendFile('static-rendered.html', { root: __dirname });
  }
  res.sendFile('app.html', { root: __dirname });
});
Verify Googlebot with reverse DNS
Anyone can fake a Googlebot UA. Verify it’s real Google by checking that the IP reverse-resolves to googlebot.com or google.com:
import dns from 'dns/promises';

async function verifyGooglebot(ip) {
  try {
    // Step 1: Reverse DNS lookup (IP → hostname)
    const [hostname] = await dns.reverse(ip);
    // Step 2: Must end in googlebot.com or google.com
    if (!hostname.endsWith('.googlebot.com') && !hostname.endsWith('.google.com')) {
      return false;
    }
    // Step 3: Forward lookup — one of the hostname's IPs must match the original
    const addresses = await dns.lookup(hostname, { all: true });
    return addresses.some((a) => a.address === ip);
  } catch {
    return false;
  }
}
// Usage in Express:
app.use(async (req, res, next) => {
  const ua = req.headers['user-agent'] || '';
  if (/Googlebot/i.test(ua)) {
    const isReal = await verifyGooglebot(req.ip);
    req.isVerifiedGooglebot = isReal;
    if (!isReal) {
      console.warn(`Fake Googlebot from ${req.ip}`);
    }
  }
  next();
});
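Reverse plus forward DNS adds two network round trips per matching request, so in practice the verdict is usually cached per IP. A minimal in-memory sketch (the `Map`, the 24-hour TTL, and the injectable `verify` parameter are our choices, not a standard API):

```javascript
// Cache verification verdicts per IP so DNS is hit at most once per TTL.
const verdicts = new Map(); // ip -> { verified, expires }
const TTL_MS = 24 * 60 * 60 * 1000;

async function verifyCached(ip, verify) {
  const hit = verdicts.get(ip);
  if (hit && hit.expires > Date.now()) return hit.verified;
  const verified = await verify(ip); // e.g. pass verifyGooglebot from above
  verdicts.set(ip, { verified, expires: Date.now() + TTL_MS });
  return verified;
}
```

A production setup would also bound the map's size (an LRU) or use a shared store such as Redis so verdicts survive restarts and are shared across instances.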
Python: detect bots from UA
import re

BOT_PATTERNS = {
    'googlebot': re.compile(r'Googlebot', re.I),
    'bingbot': re.compile(r'bingbot', re.I),
    'social': re.compile(r'facebookexternalhit|Twitterbot|LinkedInBot|Slackbot', re.I),
    'seo': re.compile(r'AhrefsBot|SemrushBot|MJ12bot', re.I),
    'generic': re.compile(r'bot|crawler|spider|curl|wget|python-requests', re.I),
}

def classify_bot(ua: str) -> dict:
    for name, pattern in BOT_PATTERNS.items():
        if pattern.search(ua):
            return {'is_bot': True, 'type': name}
    return {'is_bot': False, 'type': 'human'}
Block bad bots in Nginx
# Block scrapers and bad bots (not search engines):
map $http_user_agent $blocked_bot {
    default 0;
    ~*SemrushBot 0;       # Allow SEO bots (explicit; default is already 0)
    ~*AhrefsBot 0;
    ~*python-requests 1;  # Block common scrapers
    ~*scrapy 1;
    ~*wget 1;
    ~*curl 1;
    ~*Go-http-client 1;
    "" 1;                 # Block empty UA
}

server {
    if ($blocked_bot) {
        return 403;
    }
}
Related tools
- User Agent Parser — identify any user agent
- User Agent String Examples — UA string reference
- Mobile User Agent Detection — iOS and Android detection
Related posts
- Browser Detection in JavaScript — Feature Detection vs User Agent Parsing — Detect browsers, OS, and device type in JavaScript using user agent strings, fea…
- User-Agent Client Hints — Modern Browser Detection Without UA Sniffing — User-Agent Client Hints replace UA string parsing with structured HTTP headers. …
- Device Type Detection — Mobile, Tablet, Desktop in JavaScript — Detect whether a user is on mobile, tablet, or desktop using user agent strings,…
- Mobile User Agent Detection — iOS, Android, and Common Patterns — Detect iOS and Android devices from user agent strings. Includes UA patterns for…
- User Agent String Examples — What UA Strings Look Like in 2025 — User agent strings identify browsers, operating systems, and devices. Here's wha…
Written by Mian Ali Khalid. Part of the Dev Productivity pillar.