Text Diff Algorithms — How Myers, LCS, and Patience Diff Work
Text diff tools use algorithms like Myers diff, LCS (Longest Common Subsequence), and Patience diff to find minimal edit distances between texts. Understand how these work and...
Text diff algorithms find the minimum set of edits (insertions, deletions) to transform one text into another. Different algorithms make different tradeoffs between speed, output quality, and readability of the diff.
Use the text compare tool to compare two texts with highlighted differences.
The core problem: Edit Distance
Given two strings A and B, find the minimum number of insertions and deletions to turn A into B. This is equivalent to finding the Longest Common Subsequence (LCS) — the longest sequence of characters that appear in the same order in both strings.
A = "ABCDE"
B = "AZCDF"
LCS = "ACD"
Diff: delete B, delete E; insert Z after A, insert F after D
LCS (Longest Common Subsequence)
The classic O(mn) dynamic programming approach:
function lcs(a, b) {
const m = a.length, n = b.length;
const dp = Array.from({ length: m + 1 }, () => Array(n + 1).fill(0));
for (let i = 1; i <= m; i++) {
for (let j = 1; j <= n; j++) {
if (a[i-1] === b[j-1]) {
dp[i][j] = dp[i-1][j-1] + 1;
} else {
dp[i][j] = Math.max(dp[i-1][j], dp[i][j-1]);
}
}
}
// Backtrack to find actual LCS:
const result = [];
let i = m, j = n;
while (i > 0 && j > 0) {
if (a[i-1] === b[j-1]) {
result.unshift(a[i-1]);
i--; j--;
} else if (dp[i-1][j] > dp[i][j-1]) {
i--;
} else {
j--;
}
}
return result.join('');
}
Problem: O(mn) is too slow for large files. A 1000-line file comparison would be 1,000,000 operations.
Myers Diff Algorithm
The algorithm used by Git, GNU diff, and most diff tools. O(nd) where n = total lines and d = edit distance (number of differences).
Key insight: For typical source files, d is small (most lines are unchanged), so the algorithm terminates early.
Myers represents the diff as a "snake" path through an edit graph:
- Moving right = delete from A
- Moving down = insert from B
- Diagonal = characters match (free move)
Goal: find the path with the maximum number of diagonals (fewest edits)
// Simplified Myers diff for line arrays:
function myersDiff(a, b) {
const n = a.length, m = b.length;
const max = n + m;
const v = new Array(2 * max + 1).fill(0);
const trace = [];
for (let d = 0; d <= max; d++) {
trace.push([...v]);
for (let k = -d; k <= d; k += 2) {
let x;
if (k === -d || (k !== d && v[k - 1 + max] < v[k + 1 + max])) {
x = v[k + 1 + max]; // Move down (insert)
} else {
x = v[k - 1 + max] + 1; // Move right (delete)
}
let y = x - k;
while (x < n && y < m && a[x] === b[y]) { x++; y++; } // Follow diagonal
v[k + max] = x;
if (x >= n && y >= m) return backtrack(trace, a, b, max);
}
}
}
Patience Diff
Used by Bazaar, JJ, and optionally in Git (git diff --patience). Finds unique common lines and uses them as anchors:
1. Find lines that appear exactly once in both A and B
2. Find the LCS of those unique lines
3. Recursively diff the sections between anchors
Advantage: Produces cleaner diffs for refactored code. When functions move around, Patience diff keeps them together rather than matching unrelated { and } characters.
# Use patience diff in git:
git diff --patience
# Set as default:
git config diff.algorithm patience
Histogram Diff
Git’s default since Git 2.0 on some operations. Like Patience but handles repeated lines better:
git diff --diff-algorithm=histogram
Practical diff in JavaScript
// Using 'diff' npm package (battle-tested, uses Myers):
import Diff from 'diff';
// Line-by-line diff:
const changes = Diff.diffLines(oldText, newText);
changes.forEach(change => {
const prefix = change.added ? '+' : change.removed ? '-' : ' ';
change.value.split('\n').filter(Boolean).forEach(line => {
console.log(`${prefix} ${line}`);
});
});
// Word-level diff:
const wordChanges = Diff.diffWords('Hello World', 'Hello Beautiful World');
wordChanges.forEach(part => {
if (part.added) process.stdout.write(`[+${part.value}]`);
else if (part.removed) process.stdout.write(`[-${part.value}]`);
else process.stdout.write(part.value);
});
// Hello [+Beautiful ]World
Python difflib
import difflib
a = ['line 1\n', 'line 2\n', 'line 3\n']
b = ['line 1\n', 'changed line\n', 'line 3\n', 'line 4\n']
# Unified diff:
diff = list(difflib.unified_diff(a, b, fromfile='old.txt', tofile='new.txt'))
print(''.join(diff))
# HTML diff (visual):
d = difflib.HtmlDiff()
html = d.make_file(a, b)
# Sequence matcher (LCS-based):
matcher = difflib.SequenceMatcher(None, a, b)
ratio = matcher.ratio() # 0.0 to 1.0 similarity
opcodes = matcher.get_opcodes()
# [('equal', 0, 1, 0, 1), ('replace', 1, 2, 1, 2), ...]
Related tools
- Text Diff Tool — compare two texts online
- Git Diff Guide — using git diff effectively
- File Diff Tools — command-line and GUI diff tools
Related posts
- How Diff Tools Work: Myers Algorithm, Unified Format, and Merge Conflicts — A technical walkthrough of how diff works: the Myers algorithm, the three output…
- Diff Algorithm Explained — How Text Comparison Tools Work — Text diff tools use the LCS (Longest Common Subsequence) or Myers diff algorithm…
- Understanding Merge Conflicts — How 3-Way Diff Works — Merge conflicts occur when two branches edit the same lines differently. Learn h…
- File Diff Tools — Best Tools for Comparing Files and Directories — File diff tools show what changed between files or folders. Here's a comparison …
- Git Diff Guide — Reading and Using git diff Commands — git diff shows what changed between commits, branches, or the working tree. Here…
Related tool
Compare two text blocks line-by-line or word-by-word. Unified and split view. Shows added, removed, and changed segments with full color coding.
Written by Mian Ali Khalid. Part of the Dev Productivity pillar.