
How Diff Tools Work: Myers Algorithm, Unified Format, and Merge Conflicts

A technical walkthrough of how diff works: the Myers algorithm, the three output formats, and what those conflict markers in your code actually mean.

Mian Ali Khalid · 7 min read

Every time you run git diff, type diff file1 file2, or open a pull request, a diff algorithm runs and tells you what changed. Most developers accept the output as a black box. That’s fine until you’re staring at a conflict marker during a painful merge and have no idea what the tool is trying to tell you.

This post opens the box. You’ll understand why git diff outputs what it does, what each format means, and how to resolve merge conflicts with a clear mental model instead of guesswork.

The edit distance problem

Before any algorithm, there’s a math problem: given two sequences A and B, what is the minimum number of insertions and deletions needed to turn A into B?

That minimum number is the edit distance. Levenshtein distance is the better-known variant and also counts substitutions; for diffs we usually count only insertions and deletions. The sequence of operations that achieves the minimum is the Shortest Edit Script (SES): the fewest possible operations that turn A into B.

Why does minimum matter? Because a diff is most useful when it shows you only what actually changed. Any diff tool could show you “delete everything, insert everything” — that’s technically correct and completely useless. The Myers algorithm finds the edit script that is genuinely shortest.
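
To make the definition concrete, here is a minimal sketch that computes the insert/delete edit distance with the textbook longest-common-subsequence table. This is not the Myers algorithm, only a slow but simple reference for the number Myers is trying to find; the function name edit_distance is made up for this example.

```python
def edit_distance(a, b):
    """Minimum number of insertions + deletions that turn sequence a into b."""
    # lcs[i][j] = length of the longest common subsequence of a[:i] and b[:j]
    lcs = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, start=1):
        for j, y in enumerate(b, start=1):
            if x == y:
                lcs[i][j] = lcs[i - 1][j - 1] + 1
            else:
                lcs[i][j] = max(lcs[i - 1][j], lcs[i][j - 1])
    # Every element outside the common subsequence is either deleted or inserted.
    return len(a) + len(b) - 2 * lcs[len(a)][len(b)]


old = ["apple", "banana", "cherry"]
new = ["apple", "cherry", "date"]
print(edit_distance(old, new))  # 2: delete "banana", insert "date"
```

The table costs time and memory proportional to len(a) * len(b) no matter how similar the inputs are, which is exactly the cost Myers avoids when the files are close.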

The Myers diff algorithm

Eugene Myers published his algorithm in 1986 in a paper titled “An O(ND) Difference Algorithm and Its Applications.” It’s used in Git, GNU diff, and most other diff tools. In O(ND), N is the combined length of the two sequences and D is the number of differences (the length of the shortest edit script), so the algorithm runs faster when the files are more similar.

The key insight is to model the problem as a graph traversal.

The edit graph

Lay out the lines of file A along the x-axis and file B along the y-axis. You want to find a path from the top-left corner (0, 0) to the bottom-right corner (len(A), len(B)) using three types of moves:

  • Right (x+1): delete a line from A
  • Down (y+1): insert a line from B
  • Diagonal (x+1, y+1): lines match, no edit needed

The shortest edit script corresponds to the path with the most diagonal moves — in other words, the path that matches the most lines without editing them. Diagonal moves are free.

             A: "apple"  "banana"  "cherry"
B: "apple"         \
B: "cherry"                            \
B: "date"     (no diagonal: "date" matches nothing in A)

Myers’ algorithm explores this graph layer by layer. For each edit count D it records, on every diagonal k = x - y, the furthest point reachable with exactly D edits, sliding along free diagonal moves whenever lines match. It’s essentially a breadth-first search over edit depth D, starting at D=0 (perfect match) and increasing until a path reaches the destination.
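
The search can be sketched in a few dozen lines of Python. This is a teaching version of the basic greedy algorithm, assuming the inputs are lists of comparable items such as lines; the names myers_diff and _backtrack are hypothetical, and production tools add a linear-space refinement and heuristics that are not shown here.

```python
def _backtrack(a, b, trace, d_final):
    """Walk the saved rounds backwards to recover the edit script."""
    x, y = len(a), len(b)
    script = []
    for d in range(d_final, -1, -1):
        v = trace[d]                      # state before round d
        k = x - y
        if k == -d or (k != d and v[k - 1] < v[k + 1]):
            prev_k = k + 1                # arrived by a down move (insert)
        else:
            prev_k = k - 1                # arrived by a right move (delete)
        prev_x = v[prev_k]
        prev_y = prev_x - prev_k
        while x > prev_x and y > prev_y:  # undo the free diagonal moves
            x, y = x - 1, y - 1
            script.append(("=", a[x]))
        if d > 0:
            if x == prev_x:
                script.append(("+", b[prev_y]))
            else:
                script.append(("-", a[prev_x]))
        x, y = prev_x, prev_y
    script.reverse()
    return script


def myers_diff(a, b):
    """Return a shortest edit script turning a into b as (op, item) pairs."""
    n, m = len(a), len(b)
    v = {1: 0}      # v[k] = furthest x reached on diagonal k = x - y; v[1] is a sentinel
    trace = []      # snapshot of v taken before each round
    for d in range(n + m + 1):
        trace.append(dict(v))
        for k in range(-d, d + 1, 2):
            # Extend whichever neighbouring diagonal reaches further:
            # step down from k + 1 (insert) or step right from k - 1 (delete).
            if k == -d or (k != d and v[k - 1] < v[k + 1]):
                x = v[k + 1]
            else:
                x = v[k - 1] + 1
            y = x - k
            while x < n and y < m and a[x] == b[y]:
                x, y = x + 1, y + 1       # diagonal moves cost nothing
            v[k] = x
            if x >= n and y >= m:
                return _backtrack(a, b, trace, d)
    return []       # not reached: d = n + m always suffices


for op, line in myers_diff(["apple", "banana", "cherry"],
                           ["apple", "cherry", "date"]):
    print(op, line)
```

On the three-line example from the edit graph, the script keeps "apple" and "cherry", deletes "banana", and inserts "date", which is D = 2. The outer loop stops as soon as some path reaches the corner, so the number of rounds equals the number of differences rather than the size of the files.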

Why it handles real files efficiently

For files that are mostly the same with a few changes (the common case in version control), D is small. The algorithm terminates quickly because it reaches the destination after only a few rounds of edits. The worst case (completely different files) degrades to quadratic time, on the order of N*M for files of N and M lines, but in practice Myers runs fast on realistic source code.

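One practical detail: diff implementations typically run Myers on hashed lines, not raw bytes. Each line of the file is hashed to a number, and lines with identical content end up with the same number. The inner loop then compares those integers instead of full strings, with the actual text checked only when hashes match so that a collision cannot produce a false match. This keeps inner-loop operations cheap.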
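
A rough illustration of the idea, assuming nothing about how any particular tool does it: map each distinct line to a small integer once, then diff the integer lists. A Python dict does the hashing and the collision check for us. The helper name intern_lines is invented for this sketch.

```python
def intern_lines(old_lines, new_lines):
    """Map each distinct line to a small integer ID shared by both files."""
    ids = {}  # line text -> integer ID

    def to_ids(lines):
        # setdefault returns the existing ID, or assigns the next free one
        return [ids.setdefault(line, len(ids)) for line in lines]

    return to_ids(old_lines), to_ids(new_lines)


old_ids, new_ids = intern_lines(
    ["host = localhost", "port = 5432"],
    ["host = db.internal", "port = 5432"],
)
print(old_ids, new_ids)  # [0, 1] [2, 1]
```

After this step the diff algorithm only ever compares integers, however long the original lines were.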

The three diff output formats

diff has three common output formats. You’ve probably seen all of them; here’s what they mean.

Take two versions of a config file:

file-a.conf:

host = localhost
port = 5432
debug = false
timeout = 30

file-b.conf:

host = db.internal
port = 5432
debug = true
timeout = 30
max_connections = 100

Normal diff (default)

$ diff file-a.conf file-b.conf
1c1
< host = localhost
---
> host = db.internal
3c3
< debug = false
---
> debug = true
4a5
> max_connections = 100

The opcodes: 1c1 means “line 1 in A was changed to line 1 in B.” 4a5 means “after line 4 in A, insert line 5 of B.” < is A (old), > is B (new). This format is machine-readable but not human-friendly.
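
If you ever need to read these commands in a script, a small regular expression is enough. The sketch below assumes the standard command shape (a line or line range, one of a/c/d, then another line or line range); parse_change_command is a hypothetical helper, not part of any diff library.

```python
import re

# <old start>[,<old end>] <a|c|d> <new start>[,<new end>]
_CMD = re.compile(r"^(\d+)(?:,(\d+))?([acd])(\d+)(?:,(\d+))?$")
_NAMES = {"a": "add", "c": "change", "d": "delete"}


def parse_change_command(line):
    """Parse a normal-diff command such as '1c1' or '2,3d1'; None if not a command."""
    m = _CMD.match(line)
    if m is None:
        return None
    a1, a2, op, b1, b2 = m.groups()
    return {
        "op": _NAMES[op],
        "old": (int(a1), int(a2 or a1)),  # inclusive line range in A
        "new": (int(b1), int(b2 or b1)),  # inclusive line range in B
    }


print(parse_change_command("4a5"))
# {'op': 'add', 'old': (4, 4), 'new': (5, 5)}
```

Lines starting with <, >, or --- do not match the pattern, so the function returns None for them and a caller can treat them as hunk content.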

Context diff (-c flag)

$ diff -c file-a.conf file-b.conf
*** file-a.conf
--- file-b.conf
***************
*** 1,4 ****
! host = localhost
  port = 5432
! debug = false
  timeout = 30
--- 1,5 ----
! host = db.internal
  port = 5432
! debug = true
  timeout = 30
+ max_connections = 100

Context diff shows three lines of surrounding context by default (configurable with -C N). ! means changed, + means added, - means deleted. The *** 1,4 **** header says “this is lines 1–4 of the old file.” Context diff was the standard before unified diff took over.

Unified diff (-u flag)

$ diff -u file-a.conf file-b.conf
--- file-a.conf  2026-05-04 10:00:00
+++ file-b.conf  2026-05-04 10:01:00
@@ -1,4 +1,5 @@
-host = localhost
+host = db.internal
 port = 5432
-debug = false
+debug = true
 timeout = 30
+max_connections = 100

This is the format you see in git diff and every pull request on GitHub, GitLab, and Bitbucket. The rules:

  • --- is the old file, +++ is the new file
  • @@ -1,4 +1,5 @@ is the hunk header: “starting at line 1, show 4 lines from old file / starting at line 1, show 5 lines from new file”
  • Lines prefixed with - were removed
  • Lines prefixed with + were added
  • Lines with a space prefix are context (unchanged)
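
The hunk header is the only part of the format that takes a moment to decode. A minimal parser might look like this, assuming the usual convention that an omitted line count means 1; parse_hunk_header is a name made up for this sketch.

```python
import re

_HUNK = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(line):
    """Return (old_start, old_count, new_start, new_count) for a '@@ ... @@' line."""
    m = _HUNK.match(line)
    if m is None:
        raise ValueError(f"not a hunk header: {line!r}")
    old_start, old_count, new_start, new_count = m.groups()
    return (
        int(old_start),
        int(old_count) if old_count is not None else 1,  # count omitted means 1
        int(new_start),
        int(new_count) if new_count is not None else 1,
    )


print(parse_hunk_header("@@ -1,4 +1,5 @@"))  # (1, 4, 1, 5)
```

The pattern only anchors the start of the line, so headers that carry trailing text after the second @@ (as git diff often prints) still parse.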

Unified diff is more compact than context diff because shared context is shown once, not twice. The text compare tool produces unified diff output for any pair of text inputs.
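
You can also produce this format without the CLI. Python's standard difflib module has a unified_diff function; note that difflib uses its own matching algorithm (SequenceMatcher), not Myers, so on larger inputs its hunks can differ from what diff or git diff would choose. A short sketch using the two config files from above:

```python
import difflib

old = ["host = localhost", "port = 5432", "debug = false", "timeout = 30"]
new = ["host = db.internal", "port = 5432", "debug = true", "timeout = 30",
       "max_connections = 100"]

# lineterm="" because these input lines carry no trailing newlines
diff_lines = list(difflib.unified_diff(
    old, new, fromfile="file-a.conf", tofile="file-b.conf", lineterm=""))

print("\n".join(diff_lines))
```

For this small input the result is the same single @@ -1,4 +1,5 @@ hunk shown in the diff -u output, minus the timestamps on the two header lines.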

Merge conflicts: what causes them and what the markers mean

A merge conflict happens when two branches both modify the same region of a file in incompatible ways. Git can combine the changes automatically when they touch different areas. When they overlap, it can’t decide, so it marks the conflict and hands it to you.

The mechanics of a 3-way merge

Git uses a 3-way merge, not a 2-way comparison between the two branches. It needs three inputs:

  1. Base: the common ancestor commit — the version before either branch touched the file
  2. Ours: the current branch’s version
  3. Theirs: the incoming branch’s version

With a base, Git can tell which side actually changed. If only one branch differs from the base in a region, Git takes that branch’s version. If both branches made the same change, it uses either and there is no conflict. If they made different changes to the same lines, that is a conflict. Without a base, any difference between the two branches is ambiguous: a line present in ours but missing in theirs could be something we added or something they deleted, and a two-way comparison cannot tell which.
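
The decision rule for a single region fits in a few lines. This is a simplified sketch, assuming the three versions have already been aligned into corresponding regions (the hard part, which real merge code does with diffs against the base); merge_region is a hypothetical name.

```python
def merge_region(base, ours, theirs):
    """Resolve one aligned region of a 3-way merge.

    Each argument is a list of lines. Returns (merged_lines, conflict).
    """
    if ours == theirs:
        return ours, False    # both sides agree, including identical edits
    if ours == base:
        return theirs, False  # only the incoming branch changed this region
    if theirs == base:
        return ours, False    # only the current branch changed this region
    return None, True         # both changed it differently: conflict


base = ["debug = false"]
print(merge_region(base, ["debug = false"], ["debug = true"]))  # (['debug = true'], False)
print(merge_region(base, ["debug = true"], ["debug = off"]))    # (None, True)
```

Only the last branch needs a human, and it is the only case in which conflict markers get written to the file.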

Reading conflict markers

<<<<<<< HEAD
debug = true
max_connections = 50
=======
debug = false
max_connections = 100
>>>>>>> feature/scaling

  • <<<<<<< HEAD — the start of your version (current branch)
  • Everything between <<<<<<< and ======= — what your branch has
  • ======= — the divider
  • Everything between ======= and >>>>>>> — what the incoming branch has
  • >>>>>>> feature/scaling — the end of their version, with the branch name
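
Because the markers are plain lines of text, a script can pull the two sides apart. The sketch below assumes the default two-section style (no ||||||| base block) and that no real content line looks like a marker; split_conflicts is a name invented for this example.

```python
def split_conflicts(text):
    """Return a list of (ours_lines, theirs_lines), one pair per conflict block."""
    conflicts = []
    ours, theirs, state = [], [], None
    for line in text.splitlines():
        if line.startswith("<<<<<<< "):
            ours, theirs, state = [], [], "ours"    # start of our side
        elif line == "=======" and state == "ours":
            state = "theirs"                        # divider: switch sides
        elif line.startswith(">>>>>>> ") and state == "theirs":
            conflicts.append((ours, theirs))        # end of the block
            state = None
        elif state == "ours":
            ours.append(line)
        elif state == "theirs":
            theirs.append(line)
    return conflicts


sample = """timeout = 30
<<<<<<< HEAD
debug = true
max_connections = 50
=======
debug = false
max_connections = 100
>>>>>>> feature/scaling
"""
print(split_conflicts(sample))
```

For the sample this returns one pair: the two lines from HEAD and the two lines from feature/scaling. Lines outside any conflict block, such as timeout = 30, are ignored.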

The base version (before either branch touched these lines) is not shown by default. Setting git config merge.conflictStyle diff3, or running git checkout --conflict=diff3 <file> on a file that is already conflicted, enables a three-section format that adds a ||||||| block holding the ancestor’s version between your side and the ======= divider. This is extremely helpful for understanding why each side made the change it did.

Resolving strategies

Accept ours: Keep everything between <<<<<<< HEAD and =======, delete the rest including all markers. Use this when you know your branch’s version is correct.

Accept theirs: Keep everything between ======= and >>>>>>>, delete the rest. Use this when the incoming branch has the right answer.

Manual merge: Read both versions and write a combined result that matches neither side exactly. This is the common case: often both branches made valid changes and you need both. Delete all three marker lines (<<<<<<<, =======, >>>>>>>) and write the merged content.

Use a visual diff tool: git mergetool launches a configured tool (VS Code, IntelliJ, vimdiff, etc.) that typically shows three panes (base, ours, theirs) plus a fourth output pane. For multi-line conflicts where context matters, visual tools are much faster than hand-editing the markers.

After resolving all conflicts in a file, git add <file> marks it resolved. git status will show remaining conflicted files.

When to use a visual diff tool vs CLI diff

Use CLI diff (git diff, diff -u) when:

  • Reviewing changes as part of a code review workflow in the terminal
  • Writing scripts that parse diff output (unified format is straightforward to parse)
  • You need a quick sanity check on a single file before committing
  • You’re SSH’d into a remote machine and don’t have a GUI

Use a visual diff tool when:

  • Resolving merge conflicts with more than a handful of changed lines
  • Comparing two versions of a configuration file where whitespace matters
  • Reviewing a large diff where you need to scan quickly — colored columns beat +/- lines for pattern-matching
  • Working with structured text formats (CSV, YAML with complex nesting) where alignment is important

For text comparisons outside of git — comparing API responses, config snapshots, log extracts — the text compare tool does unified diff in the browser, no CLI needed. It runs entirely client-side, so you can safely paste credentials or internal config without them leaving your machine.

A note on line endings

One source of spurious diffs that every developer eventually hits: CRLF vs LF. On Windows, many editors write \r\n line endings. On Unix systems, lines end with a bare \n. If your diff is flagging every single line as changed, check for line-ending differences first.

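# Check line endings in a file (reports "with CRLF line terminators" if present)
file myconfig.conf
# or show the first lines with control characters made visible (GNU cat)
cat -A myconfig.conf | head -5
# CRLF shows as ^M$ at line ends; LF-only lines end in a bare $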

In Git, core.autocrlf and .gitattributes eol settings control how line endings are handled on checkout and commit. For cross-platform repos, setting * text=auto in .gitattributes normalizes line endings on commit.

The Myers algorithm is comparing hashed lines. If line endings differ, the hashes differ, and every line looks changed. This is not a diff algorithm bug — it’s line-ending inconsistency in the input.
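
The effect is easy to reproduce. The sketch below uses Python's difflib (which is not Myers, but compares whole lines in the same way) to count changed lines between an LF and a CRLF copy of the same content; count_changed is a hypothetical helper.

```python
import difflib


def count_changed(old_text, new_text):
    """Count lines that a line-based diff would report as changed."""
    # keepends=True keeps "\n" or "\r\n" on each line, as a diff tool sees them
    old_lines = old_text.splitlines(keepends=True)
    new_lines = new_text.splitlines(keepends=True)
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    return sum(
        max(i2 - i1, j2 - j1)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    )


unix = "host = localhost\nport = 5432\n"
windows = "host = localhost\r\nport = 5432\r\n"

print(count_changed(unix, windows))                          # 2: every line differs
print(count_changed(unix, windows.replace("\r\n", "\n")))    # 0 after normalizing
```

Normalizing line endings before comparing, which is what text=auto arranges for committed files, makes the spurious changes disappear.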

Related tool

Text Diff

Compare two text blocks line-by-line or word-by-word. Unified and split view. Shows added, removed, and changed segments with full color coding.

Written by Mian Ali Khalid. Part of the Dev Productivity pillar.