Diff Algorithm Explained — How Text Comparison Tools Work
Text diff tools use the LCS (Longest Common Subsequence) or Myers diff algorithm to find minimal edits between two texts. Here's how diff algorithms work and how to use them in...
Text diff algorithms find the minimal set of changes to transform one text into another. They power git diff, code review tools, and merge conflict resolution. Understanding how they work helps you interpret diff output and choose the right tool for your use case.
Use the text compare tool to compare two texts and see every difference highlighted.
The LCS foundation
The core of most diff algorithms is the Longest Common Subsequence (LCS) problem: find the longest sequence of elements that appear in both texts in the same order.
Text A: "ABCBDAB"
Text B: "BDCAB"
LCS: "BCAB" (length 4)
Everything in Text A that’s not in the LCS was deleted. Everything in Text B not in the LCS was inserted. The LCS itself is unchanged.
For text diffing, the “elements” are usually lines (for line-by-line diff) or characters (for character-level diff).
Myers diff algorithm
The Myers diff algorithm (1986) is the foundation of diff, git diff, and most modern diff tools. It finds the shortest edit script — the minimum number of insertions and deletions to transform text A into text B.
It uses a greedy approach with a graph:
- Each cell (x, y) represents having matched x characters from A and y characters from B
- Moving right = deleting a character from A
- Moving down = inserting a character from B
- Moving diagonally = matching a character (costs nothing)
The algorithm finds the shortest path through this grid from (0,0) to (|A|, |B|).
Why Myers is used everywhere: It’s O(ND) time where N = total characters and D = edit distance. For texts with few changes (common in code commits), D is small and the algorithm is very fast.
Unified diff format
The standard output format for text diffs:
--- file1.txt 2024-03-01 10:00:00
+++ file2.txt 2024-03-04 14:20:00
@@ -1,7 +1,7 @@
function greet(name) {
- console.log("Hello, " + name);
+ console.log(`Hello, ${name}!`);
return true;
}
-// TODO: add goodbye
+// TODO: add goodbye function
Format elements:
---— original file (deleted content)+++— new file (added content)@@— hunk header:-1,7means starting at line 1, showing 7 lines from original;+1,7same for new(space) — unchanged context line-(minus) — line removed from original+(plus) — line added in new version
Context lines: Unified diff shows 3 context lines by default around each change. This helps locate the change without reading the whole file.
Diff in code
JavaScript (diff library)
import Diff from 'diff';
const oldText = `function greet(name) {
console.log("Hello, " + name);
return true;
}`;
const newText = `function greet(name) {
console.log(\`Hello, \${name}!\`);
return true;
}`;
// Word-by-word diff:
const wordDiff = Diff.diffWords(oldText, newText);
wordDiff.forEach((part) => {
const color = part.added ? 'green' : part.removed ? 'red' : 'white';
console.log(`[${color}] ${part.value}`);
});
// Line-by-line diff:
const lineDiff = Diff.diffLines(oldText, newText);
lineDiff.forEach((part) => {
const prefix = part.added ? '+' : part.removed ? '-' : ' ';
part.value.split('\n').filter(Boolean).forEach(line => {
console.log(`${prefix} ${line}`);
});
});
// Create unified patch:
const patch = Diff.createPatch('file.js', oldText, newText);
console.log(patch);
// Apply a patch:
const patched = Diff.applyPatch(oldText, patch);
Python (difflib)
import difflib
old_lines = """function greet(name) {
console.log("Hello, " + name);
return true;
}""".splitlines(keepends=True)
new_lines = """function greet(name) {
console.log(`Hello, ${name}!`);
return true;
}""".splitlines(keepends=True)
# Unified diff:
diff = difflib.unified_diff(
old_lines,
new_lines,
fromfile='old.js',
tofile='new.js',
n=3 # Context lines
)
print(''.join(diff))
# HTML diff (side-by-side):
differ = difflib.HtmlDiff()
html = differ.make_file(old_lines, new_lines,
fromdesc='Original', todesc='Modified')
with open('diff.html', 'w') as f:
f.write(html)
# Sequence matcher (similarity ratio):
matcher = difflib.SequenceMatcher(None, 'Hello World', 'Hello Python')
print(matcher.ratio()) # 0.636... (similarity 0-1)
# Find similar strings:
matches = difflib.get_close_matches('pytohn', ['python', 'Cython', 'jython'], n=3)
print(matches) # ['python', 'jython', 'Cython']
Three-way merge
Git merge uses three-way diff: the base version, your changes, and their changes.
Base: Line A, Line B, Line C
Ours: Line A, Line B_modified, Line C
Theirs: Line A, Line B, Line C_modified
Result: Line A, Line B_modified, Line C_modified (auto-merge)
A conflict occurs when both sides modify the same area:
Base: Line A, Line B, Line C
Ours: Line A, Line B_ours, Line C
Theirs: Line A, Line B_theirs, Line C
Conflict:
<<<<<<< HEAD
Line B_ours
=======
Line B_theirs
>>>>>>> feature/branch
The conflict markers show both versions; you choose which to keep.
Character-level vs line-level diff
Line-level diff: Compares whole lines. Fast, standard for code, natural for humans to read. A change anywhere on a line marks the whole line changed.
Word-level diff: Compares individual words. Shows which words changed within a line. Useful for prose/documentation.
Character-level diff: Most granular. Shows exact character positions that changed. Useful for cryptic single-character changes (typos, encoding differences).
When to use each:
- Source code review: line-level
- Documentation/markdown: word-level
- Configuration files with long single-line values: character-level
Semantic diff
Standard diff is syntactic — it doesn’t understand code structure. Tools like difftastic use language-aware parsing to produce semantically meaningful diffs:
// Code change:
const x = {a: 1, b: 2, c: 3};
// to:
const x = {b: 2, a: 1, c: 3};
// Syntactic diff: 1 line changed
// Semantic diff: no structural change (just key reorder)
Language-aware diff tools reduce noise in code reviews by ignoring reformatting when the logic is unchanged.
Related tools
- Text Diff — compare two texts online
- Text Compare Online — compare text step by step
- JSON Diff Tool — compare JSON structures
Related posts
- How Diff Tools Work: Myers Algorithm, Unified Format, and Merge Conflicts — A technical walkthrough of how diff works: the Myers algorithm, the three output…
- Understanding Merge Conflicts — How 3-Way Diff Works — Merge conflicts occur when two branches edit the same lines differently. Learn h…
- File Diff Tools — Best Tools for Comparing Files and Directories — File diff tools show what changed between files or folders. Here's a comparison …
- JSON Diff Tool — Compare Two JSON Objects and Find Differences — A JSON diff tool compares two JSON structures semantically, not textually. It fi…
- Text Compare Online — Find Differences Between Two Texts — Text comparison highlights added, removed, and changed lines between two text bl…
Related tool
Compare two text blocks line-by-line or word-by-word. Unified and split view. Shows added, removed, and changed segments with full color coding.
Written by Mian Ali Khalid. Part of the Dev Productivity pillar.