Comparing JSON Structurally (Not Just as Strings)
Two JSON documents can be byte-different and semantically identical. Or genuinely different and still compare equal after a lossy parse. Here's how to compare JSON the right way.
You diff two JSON files in git. It says they differ on every line. You scan the diffs and they all look the same. The fields are in different orders, the indentation changed, a number got serialized as 1.0 instead of 1. None of it matters semantically. The data is identical.
This is why string-diffing JSON gets you nowhere. JSON is a structured format. Comparing it as text is like comparing two sentences by counting their commas. You need a structural comparison, one that treats {"a":1,"b":2} and {"b": 2, "a": 1} as equal, because they are.
This post is the field guide to JSON diff: what makes two JSON documents equal, how to do the comparison correctly, and where the edge cases bite.
What does it mean for two JSON documents to be equal?
Two JSON documents are equal when their parsed values are equal under the JSON data model — not their byte-for-byte representations. That means key order in objects doesn’t matter, whitespace doesn’t matter, and equivalent numeric forms (1, 1.0, 1e0) compare equal. Anything else is a textual diff, not a structural one.
The JSON data model has seven value types: string, number, true, false, null, array, and object. Two documents are equal when their root values are equal, applied recursively. The exact rules:
- Strings: equal if their Unicode code point sequences match. "\u00e9" and "é" are equal; both produce the code point U+00E9.
- Numbers: equal if they represent the same mathematical value. 1, 1.0, and 1e0 are all equal.
- Booleans and null: equal only to themselves.
- Arrays: equal if same length, element-by-element equal, order matters.
- Objects: equal if same key set, each key’s value pairwise equal, key order does not matter.
Notice the asymmetry between arrays and objects. Array order is part of the data; object key order is not. This is the core insight that string-diff misses.
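As a quick check of these rules, the sketch below leans on Python's `json.loads` and the built-in `==`. The helper name `same` and the sample documents are made up for illustration. Python's own equality is slightly looser than the JSON model (it treats `True` and `1` as equal), so read this as a demonstration rather than a complete comparator.

```python
import json

def same(doc_a: str, doc_b: str) -> bool:
    """Parse both documents, then compare the parsed values."""
    return json.loads(doc_a) == json.loads(doc_b)

# Objects: key order and whitespace are ignored.
objects_equal = same('{"a":1,"b":2}', '{"b": 2, "a": 1}')

# Strings: an escape and the literal character give the same code point.
strings_equal = same('"\\u00e9"', '"é"')

# Numbers: 1, 1.0 and 1e0 are the same mathematical value.
numbers_equal = same('[1, 1.0]', '[1e0, 1]')

# Arrays: order is part of the data.
arrays_equal = same('[1, 2, 3]', '[3, 2, 1]')

print(objects_equal, strings_equal, numbers_equal, arrays_equal)
# prints: True True True False
```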
What string-diff gets wrong
Here are five differences a textual diff flags as changes that a structural diff correctly ignores:
// Document A
{"name": "Ada", "age": 30, "active": true}
// Document B
{
  "active": true,
  "name": "Ada",
  "age": 30.0
}
A line-by-line diff says these documents share zero lines. A structural diff says they’re identical: same keys, same values, same logical types. The differences are:
- Key ordering: A has name, age, active; B has active, name, age. JSON objects are unordered.
- Whitespace and indentation: A is one line, B is multi-line. Whitespace between tokens has no semantic value.
- Number representation: 30 vs 30.0. Both parse to the mathematical integer 30.
- Trailing newline: irrelevant.
- Quote style: JSON only allows ", but if you’d transcoded from JSON5, this would matter at the lexical level and not the data level.
When does a textual diff actually help?
A textual diff is useful when the file is the artifact (committed configs, generated fixtures with stable ordering) and you care about churn. If a tool serializes keys in a non-deterministic order, every commit looks like a full rewrite. That’s an annoyance in code review, not a semantic change. Fix it by making the serializer stable; once the output is deterministic, diffs become meaningful again.
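One way to get a stable serializer, sketched in Python: sort the keys and fix the indentation so the same data always produces the same bytes. The helper name `dump_stable` and the sample data are invented for this example; the `json.dumps` options are standard-library ones.

```python
import json

def dump_stable(value) -> str:
    """Serialize with sorted keys and fixed indentation, ending in a newline."""
    return json.dumps(value, sort_keys=True, indent=2, ensure_ascii=False) + "\n"

# Same data, built in a different key order.
first = {"name": "Ada", "age": 30, "active": True}
second = {"active": True, "age": 30, "name": "Ada"}

# Both serialize to identical text, so a textual diff stays quiet.
assert dump_stable(first) == dump_stable(second)
```

Writing committed fixtures through a helper like this keeps line-based diffs limited to real changes.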
For comparing data — API responses, test outputs, snapshot fixtures — always use a structural diff.
How to do a structural JSON diff
The general algorithm:
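// Deep structural equality for two values produced by JSON.parse.
function jsonEqual(a, b) {
  if (a === b) return true; // identical strings, numbers, booleans, or both null
  if (a === null || b === null) return false; // typeof null is 'object', so rule it out early
  if (typeof a !== 'object' || typeof b !== 'object') return false; // unequal primitives or mixed types
  if (Array.isArray(a) !== Array.isArray(b)) return false; // array vs. object
  if (Array.isArray(a)) {
    // Arrays: same length, element-by-element equal, order matters.
    if (a.length !== b.length) return false;
    return a.every((item, i) => jsonEqual(item, b[i]));
  }
  // Objects: same key set, pairwise-equal values, key order ignored.
  const keys = Object.keys(a);
  if (keys.length !== Object.keys(b).length) return false;
  return keys.every((key) => Object.prototype.hasOwnProperty.call(b, key) && jsonEqual(a[key], b[key]));
}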
Three things to know about real implementations:
- Parse first, compare values. Never compare on serialized strings; you’ll re-introduce the ordering and whitespace problems. Parse both inputs to native data structures, then compare those.
- Most languages give you 80% of this for free. Python’s == on dict and list does a deep structural compare. JavaScript does not: {a:1} === {a:1} is false because objects are reference-compared. You need JSON.stringify with sorted keys, or a deep-equal library.
- Number equality has gotchas. See below.
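The "80% for free" in Python comes with one wrinkle worth seeing: `bool` is a subclass of `int`, so the built-in `==` considers `true` and `1` equal. The comparator below is a minimal sketch of a stricter check, assuming both inputs came from `json.loads`; the name `json_equal` is illustrative and not taken from any library.

```python
import json

def json_equal(a, b) -> bool:
    """Deep equality for values produced by json.loads."""
    # bool is a subclass of int in Python, so check it before numbers.
    if isinstance(a, bool) or isinstance(b, bool):
        return isinstance(a, bool) and isinstance(b, bool) and a == b
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return a == b  # 1 and 1.0 are the same JSON number
    if isinstance(a, list) and isinstance(b, list):
        return len(a) == len(b) and all(json_equal(x, y) for x, y in zip(a, b))
    if isinstance(a, dict) and isinstance(b, dict):
        return a.keys() == b.keys() and all(json_equal(a[k], b[k]) for k in a)
    # Strings and null: same type and same value.
    return type(a) is type(b) and a == b

plain = json.loads("[true]") == json.loads("[1]")             # True: == conflates them
strict = json_equal(json.loads("[true]"), json.loads("[1]"))  # False
```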
The number gotcha
JSON numbers are technically arbitrary-precision decimal strings. Most parsers convert them to native floating-point (IEEE 754 doubles). That conversion can lose information.
// Document A
{"id": 9007199254740993}
// Document B
{"id": 9007199254740992}
These differ by one. But if you parse both with JSON.parse in JavaScript, both become 9007199254740992 (that is 2⁵³, one past the maximum safe integer, 2⁵³ − 1). The diff library compares two equal floats and says: equal. The data was different; the comparison says it isn’t.
Workarounds:
- For 64-bit IDs, store them as strings in the JSON itself. Both sides see "9007199254740993" and the parser preserves precision.
- Use a parser that keeps numbers exact: json-bigint in Node, or Python’s json.loads, which already parses integers at arbitrary precision and keeps decimals exact with json.loads(text, parse_float=Decimal).
- Compare on the serialized form for known-precision-sensitive fields. Fall back to string equality for those keys.
For most diffs this never matters. For financial data, IDs above 2⁵³, or scientific data, it absolutely does.
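The collapse is easy to reproduce. The sketch below uses Python's `json.loads` with `parse_int=float` to imitate a parser that stores every number as a double (an emulation, not a claim about any particular JavaScript engine), then shows the default exact-integer behaviour and the `parse_float=Decimal` option. The `price` document is an invented sample.

```python
import json
from decimal import Decimal

doc_a = '{"id": 9007199254740993}'
doc_b = '{"id": 9007199254740992}'

# Emulate a parser that stores every number as an IEEE 754 double.
lossy_a = json.loads(doc_a, parse_int=float)
lossy_b = json.loads(doc_b, parse_int=float)
print(lossy_a == lossy_b)  # True: the difference was rounded away

# Python's default parser keeps integers exact.
print(json.loads(doc_a) == json.loads(doc_b))  # False

# Decimals can be kept exact with parse_float=Decimal.
price = json.loads('{"price": 0.10}', parse_float=Decimal)["price"]
print(price == Decimal("0.1"))  # True, with no binary rounding involved
```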
Producing a useful diff (not just true/false)
Equality is binary. A diff tells you what changed. The output you want is a list of paths and operations:
[
  { "op": "replace", "path": "/user/age", "value": 31 },
  { "op": "add", "path": "/user/email", "value": "ada@example.com" },
  { "op": "remove", "path": "/user/middleName" }
]
This is the JSON Patch format (RFC 6902). It’s standardized, machine-readable, and can be applied programmatically to transform A into B. Most JSON diff libraries can emit it.
The path syntax is JSON Pointer (RFC 6901): /user/age means “the age key inside the user object.” Slashes inside keys are escaped as ~1; tildes as ~0. Array indices are integers: /items/0/price.
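To make the path and operation rules concrete, here is a deliberately small diff sketch that emits RFC 6902 style operations (`op`, `path`, `value`) with JSON Pointer escaping. The function names `escape` and `diff` and the sample documents (including the `middleName` value) are invented for this example. It walks objects only and replaces differing arrays wholesale, which a real library would handle more carefully.

```python
def escape(key: str) -> str:
    """Escape one object key for use in a JSON Pointer (RFC 6901)."""
    return key.replace("~", "~0").replace("/", "~1")

def diff(a, b, path: str = "") -> list:
    """Return JSON Patch style operations that turn a into b.

    Simplification: only objects are walked recursively; arrays and
    scalars that differ are replaced as a whole.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        ops = []
        for key in a:
            child = f"{path}/{escape(key)}"
            if key not in b:
                ops.append({"op": "remove", "path": child})
            else:
                ops.extend(diff(a[key], b[key], child))
        for key in b:
            if key not in a:
                ops.append({"op": "add", "path": f"{path}/{escape(key)}", "value": b[key]})
        return ops
    # Equal values produce no operation; keep true and 1 distinct.
    if a == b and isinstance(a, bool) == isinstance(b, bool):
        return []
    return [{"op": "replace", "path": path, "value": b}]

before = {"user": {"age": 30, "middleName": "King"}}
after = {"user": {"age": 31, "email": "ada@example.com"}}
patch = diff(before, after)
```

For these inputs `patch` holds one replace at /user/age, one remove at /user/middleName, and one add at /user/email.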
Array diffs are the hard part
Arrays expose the deepest structural-diff problem: when an array element changes, was it edited, or was it inserted, shifting everything after it?
// Document A
["apple", "banana", "cherry"]
// Document B
["apple", "blueberry", "banana", "cherry"]
A naive index-by-index compare says: position 1 changed (banana → blueberry), position 2 changed (cherry → banana), position 3 was added (cherry). Three operations. The actual change is one insertion at position 1.
Real diff algorithms run a longest-common-subsequence (LCS) or Myers diff on the array, the same algorithm git diff uses on lines. The library jsondiffpatch does this; so does json-diff-ts. For arrays of objects, you can hint the diff with a key extractor (like a primary-key field) to align matching items.
If your tool reports five changes when the human change was one, you’re seeing this problem. Switch to a diff that handles array alignment, or pre-sort if order doesn’t matter.
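Array alignment can be sketched with Python's standard library. `difflib.SequenceMatcher` is not Myers diff or a strict LCS (it uses its own longest-matching-block heuristic), but it aligns sequences in the same spirit and is enough to show the idea. The function name `array_ops` is hypothetical, and elements are matched by their sorted-key serialization, so 1 and 1.0 would count as different here.

```python
import json
from difflib import SequenceMatcher

def array_ops(a: list, b: list, path: str = "") -> list:
    """Align two arrays and emit add/remove operations at the aligned positions."""
    # Canonical strings make every element hashable, including nested objects.
    key_a = [json.dumps(x, sort_keys=True) for x in a]
    key_b = [json.dumps(x, sort_keys=True) for x in b]
    matcher = SequenceMatcher(None, key_a, key_b, autojunk=False)
    ops = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            # Once the earlier operations are applied, a[i1] sits at index j1.
            ops.extend({"op": "remove", "path": f"{path}/{j1}"} for _ in range(i1, i2))
        if tag in ("insert", "replace"):
            ops.extend({"op": "add", "path": f"{path}/{j}", "value": b[j]} for j in range(j1, j2))
    return ops

ops = array_ops(["apple", "banana", "cherry"],
                ["apple", "blueberry", "banana", "cherry"])
print(ops)  # [{'op': 'add', 'path': '/1', 'value': 'blueberry'}]
```

The operations are meant to be applied in order, which is why a removal uses the index the element has after the preceding operations.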
Sorted-key serialization: the cheap structural-equality trick
You don’t always need a fancy diff library. You need to know if two documents are equal. The cheap trick:
function canonical(obj) {
  if (obj === null || typeof obj !== 'object') return JSON.stringify(obj);
  if (Array.isArray(obj)) return '[' + obj.map(canonical).join(',') + ']';
  const keys = Object.keys(obj).sort();
  return '{' + keys.map(k => JSON.stringify(k) + ':' + canonical(obj[k])).join(',') + '}';
}

canonical(a) === canonical(b) // structural equality
This sorts keys recursively, drops whitespace, and gives you a canonical string form. Two documents equal under this scheme are structurally equal. It’s not as fast as a deep-equal that short-circuits on the first difference, but it’s a handful of lines and good enough for most cases.
Python users get this nearly for free:
import json
canonical = lambda x: json.dumps(x, sort_keys=True, separators=(',', ':'))
canonical(a) == canonical(b)
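Caveat: the Python version treats 1 and 1.0 as different. json.loads yields an int for one and a float for the other, and json.dumps preserves that, so they canonicalize to different strings ("1" vs "1.0"). The JavaScript version does not have this problem because JSON.parse maps both to the same number, but it still inherits the precision loss above 2⁵³. If you need numeric equivalence, normalize numbers first.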
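One possible normalization, sketched under the assumption that whole-valued floats should be treated as integers: walk the parsed value and convert them before serializing. The `normalize` helper is made up for this example.

```python
import json

def normalize(value):
    """Turn whole-valued floats into ints so 1 and 1.0 serialize identically."""
    if isinstance(value, float) and value.is_integer():
        return int(value)
    if isinstance(value, list):
        return [normalize(item) for item in value]
    if isinstance(value, dict):
        return {key: normalize(item) for key, item in value.items()}
    return value

def canonical(value) -> str:
    """Sorted-key, whitespace-free serialization of the normalized value."""
    return json.dumps(normalize(value), sort_keys=True, separators=(",", ":"))

a = json.loads('{"age": 30, "name": "Ada"}')
b = json.loads('{"name": "Ada", "age": 30.0}')
print(canonical(a) == canonical(b))  # True
```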
Common cases summary
| Difference | Textual diff | Structural diff |
|---|---|---|
| Key reordering | flagged | identical |
| Whitespace/indentation | flagged | identical |
| 30 vs 30.0 | flagged | identical |
| Float precision (2⁵³ + 1) | flagged | depends on parser |
| Array element inserted | reports as N edits | one insert (with LCS) |
| Genuine value change | flagged | flagged |
Tooling
For ad-hoc comparison, paste both documents into the JSON Diff tool — it does a structural compare with array-LCS alignment and shows the path-level diff inline.
For programmatic use, pick by language:
- JavaScript/TypeScript: jsondiffpatch (rich diffs, array LCS), fast-json-patch (RFC 6902)
- Python: jsondiff, deepdiff
- Go: evanphx/json-patch, wI2L/jsondiff
- CLI: jd does structural diffs from the command line, useful in CI pipelines
A working principle
Compare data as data. The string is just a serialization — it’s the rendered form, not the value. When you diff a JSON document by its bytes, you’re diffing the rendering, which carries noise from key ordering, whitespace, and numeric format choices that have nothing to do with the actual data.
Same principle applies elsewhere: compare images at the pixel level, not the file level. Compare HTML at the DOM level, not the source level. Compare data at the data level, not the serialization level.
If your CI keeps flagging “JSON changed” when nothing semantic changed, you’re at the boundary. Switch to structural comparison and the false positives evaporate.
Further reading
- What Is JSON and Why You Should Always Format It — the format basics
- The 10 Most Common JSON Validation Errors — what breaks before you can diff
- YAML vs JSON: Which to Use When — when JSON isn’t the right format
- RFC 8259 — JSON spec
- RFC 6902 — JSON Patch
- RFC 6901 — JSON Pointer
Written by Mian Ali Khalid. Part of the Data & Format pillar.