
Bot Detection Using User Agent Strings — Googlebot, Bingbot, and Crawlers

Identify search engine crawlers, social media bots, and malicious scrapers from user agent strings. Includes patterns for Googlebot, Bingbot, and other common bots, plus how to verify real crawlers with reverse DNS.

Mian Ali Khalid · 5 min read

Bots use user agent strings to identify themselves — but anyone can fake a UA. Distinguishing real Googlebot from a scraper spoofing it requires reverse DNS verification. Here’s how to detect, classify, and verify bots.

Use the User Agent Parser to classify any user agent string.

Common search engine bot UAs

Googlebot (web):
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

Googlebot (mobile):
Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 ... (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

Bingbot:
Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)

DuckDuckBot:
DuckDuckBot/1.1; (+http://duckduckgo.com/duckduckbot.html)

Yandex:
Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)

Baidu:
Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)

AhrefsBot:
Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)

SemrushBot:
Mozilla/5.0 (compatible; SemrushBot/7~bl; +http://www.semrush.com/bot.html)

Bot detection patterns

const BOT_PATTERNS = {
  // Search engines (good bots):
  googlebot: /Googlebot/i,
  bingbot: /bingbot/i,
  duckduckbot: /DuckDuckBot/i,
  yandex: /YandexBot/i,
  baidu: /Baiduspider/i,
  
  // SEO crawlers:
  ahrefs: /AhrefsBot/i,
  semrush: /SemrushBot/i,
  majestic: /MJ12bot/i,
  
  // Social media crawlers:
  facebook: /facebookexternalhit/i,
  twitter: /Twitterbot/i,
  linkedin: /LinkedInBot/i,
  slack: /Slackbot/i,
  telegram: /TelegramBot/i,
  
  // Generic patterns:
  generic: /bot|crawler|spider|scraper|fetch|curl|wget|python-requests/i,
};

// The navigator default only works in the browser; on the server, pass the UA in explicitly.
function classifyUserAgent(ua = navigator.userAgent) {
  for (const [name, pattern] of Object.entries(BOT_PATTERNS)) {
    if (pattern.test(ua)) return { isBot: true, type: name };
  }
  return { isBot: false, type: 'human' };
}

Node.js / Express bot detection

app.use((req, res, next) => {
  const ua = req.headers['user-agent'] || '';
  const { isBot, type } = classifyUserAgent(ua);
  
  req.isBot = isBot;
  req.botType = type;
  
  // Log bot activity:
  if (isBot) {
    console.log(`Bot visit: ${type} from ${req.ip} → ${req.path}`);
  }
  
  next();
});

// Serve different content to bots (keep it equivalent to the user-facing
// page — mismatched content risks being treated as cloaking):
app.get('/page', (req, res) => {
  if (req.isBot && req.botType === 'googlebot') {
    // Pre-rendered static HTML for SEO; sendFile needs an absolute path or a root
    return res.sendFile('static-rendered.html', { root: process.cwd() });
  }
  res.sendFile('app.html', { root: process.cwd() });
});

Verify Googlebot with reverse DNS

Anyone can fake a Googlebot UA. Verify it’s really Google with a two-step DNS check: reverse-resolve the IP and confirm the hostname ends in googlebot.com or google.com, then forward-resolve that hostname and confirm it returns the original IP:

import dns from 'dns/promises';

async function verifyGooglebot(ip) {
  try {
    // Step 1: Reverse DNS lookup (IP → hostname)
    const [hostname] = await dns.reverse(ip);
    
    // Step 2: Must end in googlebot.com or google.com
    if (!hostname.endsWith('.googlebot.com') && !hostname.endsWith('.google.com')) {
      return false;
    }
    
    // Step 3: Forward lookup (hostname must resolve back to the original IP)
    const { address } = await dns.lookup(hostname);
    return address === ip;
    
  } catch {
    return false;
  }
}

// Usage in Express:
app.use(async (req, res, next) => {
  const ua = req.headers['user-agent'] || '';
  if (/Googlebot/i.test(ua)) {
    const isReal = await verifyGooglebot(req.ip);
    req.isVerifiedGooglebot = isReal;
    if (!isReal) {
      console.warn(`Fake Googlebot from ${req.ip}`);
    }
  }
  next();
});

Python: detect bots from UA

import re

BOT_PATTERNS = {
    'googlebot': re.compile(r'Googlebot', re.I),
    'bingbot': re.compile(r'bingbot', re.I),
    'social': re.compile(r'facebookexternalhit|Twitterbot|LinkedInBot|Slackbot', re.I),
    'seo': re.compile(r'AhrefsBot|SemrushBot|MJ12bot', re.I),
    'generic': re.compile(r'bot|crawler|spider|scraper|fetch|curl|wget|python-requests', re.I),
}

def classify_bot(ua: str) -> dict:
    for name, pattern in BOT_PATTERNS.items():
        if pattern.search(ua):
            return {'is_bot': True, 'type': name}
    return {'is_bot': False, 'type': 'human'}

Block bad bots in Nginx

# Block scrapers and bad bots (not search engines):
map $http_user_agent $blocked_bot {
    default                 0;
    ~*SemrushBot            0;  # Allow SEO bots
    ~*AhrefsBot             0;
    ~*python-requests       1;  # Block common scrapers
    ~*scrapy                1;
    ~*wget                  1;
    ~*curl                  1;
    ~*Go-http-client        1;
    ""                      1;  # Block empty UA
}

server {
    if ($blocked_bot) {
        return 403;
    }
}


Written by Mian Ali Khalid. Part of the Dev Productivity pillar.