
Word Frequency Counter — Find Keyword Density and Overused Words

A word frequency counter ranks every unique word by how often it appears. Use it to find keyword density, spot overused words, and diagnose accidental keyword stuffing.

Mian Ali Khalid · 6 min read
Use the tool
Word Counter
Count words, characters, sentences, paragraphs, and lines. Reading time estimate, char-limit indicators for X, LinkedIn, meta titles, and more.
Open Word Counter →

A word frequency counter reads a block of text and returns every unique word alongside the number of times it appears, sorted from most frequent to least. That list reveals more about your writing than a raw word count ever could.

The Word Counter on this site counts words, characters, sentences, and paragraphs. For detailed frequency analysis of every term, paste your text into the counter and use the word frequency view.

What a word frequency count reveals

Keyword density for SEO

Keyword density is the percentage of times a target keyword appears relative to total words:

Keyword density = (keyword occurrences ÷ total words) × 100

For a 1,000-word article where your target keyword appears 8 times: 8 ÷ 1,000 × 100 = 0.8%
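The formula is a one-liner in code. A minimal sketch (the punctuation-stripping is a simplification; a real counter would use proper tokenization):

```python
def keyword_density(text, keyword):
    """Percentage of total words that are the target keyword (case-insensitive)."""
    words = text.lower().split()
    if not words:
        return 0.0
    # Strip surrounding punctuation so "json." still counts as "json"
    hits = sum(1 for w in words if w.strip('.,;:!?"\'()') == keyword.lower())
    return hits / len(words) * 100

article = "JSON is a format. JSON is everywhere. Use JSON."
print(round(keyword_density(article, "JSON"), 1))  # → 33.3
```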

Google has never confirmed an ideal keyword density, and keyword stuffing (artificially inflating density) is a negative ranking signal. The rough guidelines practitioners follow:

  • 1–2% for the primary keyword: natural-feeling, enough to establish topical relevance
  • 0.5–1% for secondary keywords
  • Above 3%: starts to feel forced; risk of over-optimization
  • Above 5%: visible keyword stuffing; likely to harm rankings

More important than density is keyword placement: does the keyword appear in the title, the first paragraph, subheadings, and the last paragraph? Those positions signal topical relevance more strongly than raw count.

Overused words

Every writer has verbal tics — words they reach for too often. Run a frequency count on a draft and your overused words appear immediately. Common offenders:

  • “Very” — usually a signal to find a stronger word (“very tired” → “exhausted”)
  • “Really” — same problem
  • “Just” — often padding (“just click here” → “click here”)
  • “Thing” — vague; find the specific noun
  • “Get” — overused; often replaceable with “receive,” “obtain,” “achieve”

For technical writing, frequency counts reveal domain jargon that non-expert readers won’t follow. If “asynchronous” appears 15 times in 800 words, you’re writing for an expert audience whether you intended to or not.

Thin content detection

A high word count with low vocabulary diversity signals thin content. If 60% of your word count comes from the 10 most frequent words, the actual unique information density is low. This doesn’t map directly to Google’s “thin content” concept (which is more about value than vocabulary variety), but it’s a useful proxy for reviewing whether you’re saying the same things repeatedly.
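One way to quantify this is the share of the total word count contributed by the 10 most frequent words. A minimal sketch (the 60% threshold above is this post's rule of thumb, not a standard metric):

```python
from collections import Counter

def top10_share(text):
    """Fraction of the word count contributed by the 10 most frequent words."""
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(n for _, n in Counter(words).most_common(10)) / len(words)

repetitive = "great product great price great service great value great deal"
print(top10_share(repetitive))  # 1.0 -- only 6 unique words, all in the top 10
```

In practice, remove stop words before measuring, or "the" and "and" will inflate the share for any text.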

How word frequency analysis works

Tokenization

Before counting, the text is tokenized — split into individual words. Simple tokenizers split on whitespace. Better tokenizers also handle:

  • Punctuation attached to words (“word,” → “word”)
  • Hyphenated compounds (“well-known” — should this be one word or two?)
  • Possessives (“developer’s” → “developer” or keep as-is?)
  • Contractions (“don’t” → “don’t” or “do” + “not”?)

Most word counters for content use simple tokenization (split on whitespace, strip punctuation). Linguistic NLP tools do more sophisticated analysis.
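One set of answers to those questions, sketched as a regex tokenizer that strips surrounding punctuation but keeps hyphenated compounds and contractions whole:

```python
import re

def tokenize(text):
    # Letters, with internal hyphens and apostrophes allowed, so
    # "well-known" and "don't" survive as single tokens.
    return re.findall(r"[A-Za-z]+(?:['-][A-Za-z]+)*", text)

print(tokenize("The developer's well-known fix: don't touch it!"))
# ['The', "developer's", 'well-known', 'fix', "don't", 'touch', 'it']
```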

Stop word removal

Stop words are common words that carry minimal semantic meaning: the, a, an, is, are, was, were, and, or, but, in, on, at, to, for, of, with…

A word frequency list without stop word removal is dominated by “the” (roughly 5–7% of all words in English text), “and,” “is,” and so on. Removing stop words reveals the semantically meaningful words — the nouns, verbs, and adjectives that carry your content’s meaning.

For SEO purposes, count frequencies both with and without stop words:

  • With stop words: catches keyword phrases (stop words in the middle of phrases, like “how to convert”)
  • Without stop words: shows your actual content focus
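Single-word counts can't see phrases, so the with-stop-words pass is usually done over n-grams. A sketch:

```python
from collections import Counter

def ngram_frequency(text, n):
    """Count n-word phrases; keeping stop words preserves phrases like 'how to convert'."""
    words = text.lower().split()
    return Counter(' '.join(words[i:i + n]) for i in range(len(words) - n + 1))

text = "how to convert json and how to convert csv"
print(ngram_frequency(text, 3).most_common(1))  # [('how to convert', 2)]
```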

Case normalization

“JSON”, “Json”, and “json” are the same word. Word frequency counters typically normalize to lowercase before counting. This matters for technical content where the same concept appears as an acronym (CSS, API, URL) and as a lowercase word (css, api, url).
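The trade-off in one line: lowercasing merges the variants, at the cost of losing the acronym-versus-word distinction:

```python
from collections import Counter

tokens = ["JSON", "Json", "json", "API", "api"]
print(Counter(t.lower() for t in tokens))  # Counter({'json': 3, 'api': 2})
```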

Practical frequency analysis workflow

For a blog post

  1. Paste the draft into the Word Counter
  2. Note the primary keyword frequency — calculate density (occurrences ÷ total words × 100)
  3. Check secondary keywords appear at least twice
  4. Look for words appearing 5+ times that aren’t intentional keywords — these are your verbal tics
  5. Check the title, first paragraph, and last paragraph for primary keyword presence
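Steps 2–4 can be sketched as a small review script (the stop-word list, the twice-minimum for secondary keywords, and the 5+ tic threshold are the rules of thumb from this post, not standard values):

```python
from collections import Counter
import re

STOP = {'the', 'a', 'an', 'is', 'are', 'and', 'or', 'but', 'in', 'on',
        'at', 'to', 'for', 'of', 'with', 'it', 'this', 'that'}

def review_draft(text, primary, secondary=()):
    words = re.findall(r"[a-z']+", text.lower())
    counts, total = Counter(words), len(words)
    return {
        # Step 2: primary keyword density
        'density_pct': counts[primary] / total * 100 if total else 0.0,
        # Step 3: secondary keywords appearing fewer than twice
        'thin_secondary': [k for k in secondary if counts[k] < 2],
        # Step 4: frequent non-keyword, non-stop words — likely verbal tics
        'verbal_tics': [w for w, n in counts.most_common()
                        if n >= 5 and w not in STOP and w != primary],
    }

draft = ("json " * 8 + "csv " + "really " * 5 + "the " * 6).strip()
report = review_draft(draft, 'json', secondary=('csv',))
# {'density_pct': 40.0, 'thin_secondary': ['csv'], 'verbal_tics': ['really']}
```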

For a webpage / landing page

Landing pages are shorter (typically 300–500 words). Higher keyword density is expected and acceptable; target 2–3% for the primary keyword.

The conversion-critical words (“free,” “now,” “guarantee,” names of pain points your product solves) should appear in the most visually prominent places: headings, bullet lists, CTAs.

For technical documentation

Stop words matter less. What matters is:

  • Every feature or concept mentioned in the UI appears in the docs
  • Terms are used consistently (not “user” in one place, “account holder” in another, “client” in a third)
  • Command names, function names, and configuration keys appear exactly as they appear in the code

Run a frequency count on docs and compare against a frequency count of the UI strings and API reference. Terms that appear frequently in the UI but rarely in docs are documentation gaps.
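A sketch of that comparison (the thresholds here are illustrative, not standard values):

```python
from collections import Counter

def doc_gaps(ui_text, doc_text, min_ui=3, max_doc=1):
    """Terms used min_ui+ times in the UI strings but max_doc or fewer times in the docs."""
    ui = Counter(ui_text.lower().split())
    docs = Counter(doc_text.lower().split())
    return sorted(t for t, n in ui.items() if n >= min_ui and docs[t] <= max_doc)

ui_strings = "export export export import import import settings"
docs_text = "import import settings settings"
print(doc_gaps(ui_strings, docs_text))  # ['export']
```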

Python: word frequency from scratch

from collections import Counter
import re

def word_frequency(text, stop_words=None, top_n=20):
    # Normalize and tokenize
    words = re.findall(r'\b[a-zA-Z]+\b', text.lower())
    
    # Remove stop words
    if stop_words:
        words = [w for w in words if w not in stop_words]
    
    # Count and return top N
    return Counter(words).most_common(top_n)

stop_words = {'the', 'a', 'an', 'is', 'are', 'was', 'were', 'and', 'or',
              'but', 'in', 'on', 'at', 'to', 'for', 'of', 'with', 'it',
              'this', 'that', 'be', 'as', 'by', 'from', 'not', 'you'}

sample = """JSON is a lightweight data format. JSON is human-readable.
            Developers use JSON for APIs. JSON is used everywhere."""

freq = word_frequency(sample, stop_words=stop_words, top_n=10)
# [('json', 4), ('lightweight', 1), ('data', 1), ('format', 1), ...]

JavaScript: word frequency in the browser

function wordFrequency(text, topN = 20) {
  const stopWords = new Set(['the', 'a', 'an', 'is', 'are', 'was', 'and',
    'or', 'but', 'in', 'on', 'at', 'to', 'for', 'of', 'with', 'it',
    'this', 'that', 'be', 'as', 'by', 'from', 'not', 'you']);
  
  const words = text.toLowerCase()
    .match(/\b[a-z]+\b/g) || [];
  
  const freq = {};
  words
    .filter(w => !stopWords.has(w))
    .forEach(w => { freq[w] = (freq[w] || 0) + 1; });
  
  return Object.entries(freq)
    .sort((a, b) => b[1] - a[1])
    .slice(0, topN);
}

Tools that use word frequency data

SEO tools: Ahrefs, Semrush, and Surfer SEO analyze top-ranking pages’ word frequency to identify the terms that appear consistently in high-ranking content for a given query. Their “content optimization” features are built on comparing your page’s word frequency against the frequency profile of top 10 results.

Plagiarism checkers: Turnitin, Copyscape, and similar tools use frequency profiles as one signal in identifying copied content. Similar frequency distributions across unrelated documents flag potential copying.

Stylometric analysis: Word frequency is used in authorship attribution — identifying who wrote an anonymous text by comparing frequency patterns against known samples. This is the technique used to identify the Federalist Papers’ authors and to analyze disputed Shakespeare plays.

Spam filters: Email spam filters use word frequency as part of Bayesian classification. Emails that frequently use words like “FREE,” “CLICK NOW,” “GUARANTEED” have frequency profiles that match spam training data.

Keyword cannibalization detection

If you have multiple pages targeting the same keywords, a frequency analysis across pages reveals overlap. Pages competing for the same primary keyword will have similar top-10 frequency lists. This is a simplified version of what SEO tools call “cannibalization detection.”
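A minimal sketch of that check, using Jaccard overlap of the two pages' top-10 term sets (the 10-term window mirrors the paragraph above; any similarity threshold you apply to the score is a judgment call):

```python
from collections import Counter

def top_terms(text, n=10):
    return {w for w, _ in Counter(text.lower().split()).most_common(n)}

def cannibalization_score(page_a, page_b, n=10):
    """Jaccard similarity of the two pages' top-n term sets (0 = disjoint, 1 = identical)."""
    a, b = top_terms(page_a, n), top_terms(page_b, n)
    return len(a & b) / len(a | b) if a | b else 0.0
```

A score near 1.0 across two pages suggests they are competing for the same query.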

The fix: consolidate competing pages, or clearly differentiate each page’s focus. The page with the highest authority (most backlinks) should own the primary keyword; related pages should target distinct variations.

Related tools

  • Word Counter — count words, characters, sentences, paragraphs
  • Text Diff — compare two versions of a document
  • Case Converter — normalize text case for consistent comparison


Written by Mian Ali Khalid. Part of the Dev Productivity pillar.