X Xerobit

Comparing JSON Structurally (Not Just as Strings)

Two JSON documents can be byte-different and semantically identical. Or byte-identical and structurally inequivalent. Here's how to compare JSON the right way.

Mian Ali Khalid · · 11 min read
Use the tool
JSON Diff
Compare two JSON objects structurally with field-by-field diff.
Open JSON Diff →

You diff two JSON files in git. It says they differ on every line. You scan the diffs and they all look the same. The fields are in different orders, the indentation changed, a number got serialized as 1.0 instead of 1. None of it matters semantically. The data is identical.

This is why string-diffing JSON gets you nowhere. JSON is a structured format. Comparing it as text is comparing a sentence by counting commas. You need a structural comparison — one that treats {"a":1,"b":2} and {"b": 2, "a": 1} as equal, because they are.

This post is the field guide to JSON diff: what makes two JSON documents equal, how to do the comparison correctly, and where the edge cases bite.

What does it mean for two JSON documents to be equal?

Two JSON documents are equal when their parsed values are equal under the JSON data model — not their byte-for-byte representations. That means key order in objects doesn’t matter, whitespace doesn’t matter, and equivalent numeric forms (1, 1.0, 1e0) compare equal. Anything else is a textual diff, not a structural one.

The JSON data model has seven value types: string, number, true, false, null, array, and object. Two documents are equal when their root values are equal, applied recursively. The exact rules:

  • Strings: equal if their Unicode codepoint sequences match. "é" and "é" are equal — both produce the codepoint U+00E9.
  • Numbers: equal if they represent the same mathematical value. 1, 1.0, and 1e0 are all equal.
  • Booleans and null: equal only to themselves.
  • Arrays: equal if same length, element-by-element equal, order matters.
  • Objects: equal if same key set, each key’s value pairwise equal, key order does not matter.

Notice the asymmetry between arrays and objects. Array order is part of the data; object key order is not. This is the core insight that string-diff misses.

What string-diff gets wrong

Here are five differences a textual diff flags as changes that a structural diff correctly ignores:

// Document A
{"name": "Ada", "age": 30, "active": true}

// Document B
{
  "active": true,
  "name": "Ada",
  "age": 30.0
}

A line-by-line diff says these documents share zero lines. A structural diff says they’re identical: same keys, same values, same logical types. The differences are:

  1. Key ordering — A has name, age, active; B has active, name, age. JSON objects are unordered.
  2. Whitespace and indentation — A is one line, B is multi-line. Whitespace between tokens has no semantic value.
  3. Number representation30 vs 30.0. Both parse to the mathematical integer 30.
  4. Trailing newline — irrelevant.
  5. Quote style — JSON only allows ", but if you’d transcoded from JSON5, this would matter at the lexical level and not the data level.

When does a textual diff actually help?

A textual diff is useful when the file is the artifact (committed configs, generated fixtures with stable ordering) and you care about churn. If a tool serializes keys in a non-deterministic order, every commit looks like a full rewrite. That’s an annoyance in code review, not a semantic change. Fix it by making the serializer stable, then diffs become meaningful again.

For comparing data — API responses, test outputs, snapshot fixtures — always use a structural diff.

How to do a structural JSON diff

The general algorithm:

function jsonEqual(a, b):
  if typeof(a) !== typeof(b): return false
  if a is null: return b is null
  if a is primitive: return a === b  // strings, numbers, booleans
  if a is array:
    if length(a) !== length(b): return false
    for i in 0..length(a):
      if !jsonEqual(a[i], b[i]): return false
    return true
  if a is object:
    if keys(a).size !== keys(b).size: return false
    for key in keys(a):
      if !(key in b): return false
      if !jsonEqual(a[key], b[key]): return false
    return true

Three things to know about real implementations:

  1. Parse first, compare values. Never compare on serialized strings — you’ll re-introduce the ordering and whitespace problems. Parse both inputs to native data structures, then compare those.
  2. Most languages give you 80% of this for free. Python’s == on dict and list does a deep structural compare. JavaScript does not — {a:1} === {a:1} is false because objects are reference-compared. You need JSON.stringify with sorted keys, or a deep-equal library.
  3. Number equality has gotchas. See below.

The number gotcha

JSON numbers are technically arbitrary-precision decimal strings. Most parsers convert them to native floating-point (IEEE 754 doubles). That conversion can lose information.

// Document A
{"id": 9007199254740993}

// Document B
{"id": 9007199254740992}

These differ by one. But if you parse both with JSON.parse in JavaScript, both become 9007199254740992 (the maximum safe integer). The diff library compares two equal floats and says: equal. The data was different; the comparison says it isn’t.

Workarounds:

  • For 64-bit IDs, store them as strings in the JSON itself. Both sides see "9007199254740993" and the parser preserves precision.
  • Use a BigInt-aware parser (json-bigint in Node, the decimal.Decimal round-trip in Python with json.loads(parse_float=Decimal)).
  • Compare on the serialized form for known-precision-sensitive fields. Fall back to string equality for those keys.

For most diffs this never matters. For financial data, IDs above 2⁵³, or scientific data, it absolutely does.

Producing a useful diff (not just true/false)

Equality is binary. A diff tells you what changed. The output you want is a list of paths and operations:

[
  { "op": "replace", "path": "/user/age", "from": 30, "to": 31 },
  { "op": "add", "path": "/user/email", "value": "ada@example.com" },
  { "op": "remove", "path": "/user/middleName" }
]

This is the JSON Patch format (RFC 6902). It’s standardized, machine-readable, and can be applied programmatically to transform A into B. Most JSON diff libraries can emit it.

The path syntax is JSON Pointer (RFC 6901): /user/age means “the age key inside the user object.” Slashes inside keys are escaped as ~1; tildes as ~0. Array indices are integers: /items/0/price.

Array diffs are the hard part

Arrays expose the deepest structural-diff problem: when an array element changes, was it edited, or was it inserted, shifting everything after it?

// Document A
["apple", "banana", "cherry"]

// Document B
["apple", "blueberry", "banana", "cherry"]

A naive index-by-index compare says: position 1 changed (banana → blueberry), position 2 changed (cherry → banana), position 3 was added (cherry). Three operations. The actual change is one insertion at position 1.

Real diff algorithms run a longest-common-subsequence (LCS) or Myers diff on the array, the same algorithm git diff uses on lines. The library jsondiffpatch does this; so does json-diff-ts. For arrays of objects, you can hint the diff with a key extractor (like a primary-key field) to align matching items.

If your tool reports five changes when the human change was one, you’re seeing this problem. Switch to a diff that handles array alignment, or pre-sort if order doesn’t matter.

Sorted-key serialization: the cheap structural-equality trick

You don’t always need a fancy diff library. You need to know if two documents are equal. The cheap trick:

function canonical(obj) {
  if (obj === null || typeof obj !== 'object') return JSON.stringify(obj);
  if (Array.isArray(obj)) return '[' + obj.map(canonical).join(',') + ']';
  const keys = Object.keys(obj).sort();
  return '{' + keys.map(k => JSON.stringify(k) + ':' + canonical(obj[k])).join(',') + '}';
}

canonical(a) === canonical(b)  // structural equality

This sorts keys recursively, drops whitespace, and gives you a canonical string form. Two documents equal under this scheme are structurally equal. It’s not as fast as a deep-equal that short-circuits on the first difference, but it’s three lines and good enough for most cases.

Python users get this nearly for free:

import json
canonical = lambda x: json.dumps(x, sort_keys=True, separators=(',', ':'))
canonical(a) == canonical(b)

Caveat: this still has the float-precision issue. 1 and 1.0 will canonicalize to different strings ("1" vs "1.0") — Python’s json.dumps preserves the input type. If you need numeric equivalence, normalize numbers first.

Common cases summary

DifferenceTextual diffStructural diff
Key reorderingflaggedidentical
Whitespace/indentationflaggedidentical
30 vs 30.0flaggedidentical
Float precision (2⁵³ + 1)flaggeddepends on parser
Array element insertedreports as N editsone insert (with LCS)
Genuine value changeflaggedflagged

Tooling

For ad-hoc comparison, paste both documents into the JSON Diff tool — it does a structural compare with array-LCS alignment and shows the path-level diff inline.

For programmatic use, pick by language:

A working principle

Compare data as data. The string is just a serialization — it’s the rendered form, not the value. When you diff a JSON document by its bytes, you’re diffing the rendering, which carries noise from key ordering, whitespace, and numeric format choices that have nothing to do with the actual data.

Same principle applies elsewhere: compare images at the pixel level, not the file level. Compare HTML at the DOM level, not the source level. Compare data at the data level, not the serialization level.

If your CI keeps flagging “JSON changed” when nothing semantic changed, you’re at the boundary. Switch to structural comparison and the false positives evaporate.

Further reading


Related posts

Related tool

JSON Diff

Compare two JSON objects structurally with field-by-field diff.

Written by Mian Ali Khalid. Part of the Data & Format pillar.