XML Parsing in JavaScript and Python — DOM, SAX, and XPath
Parse XML in JavaScript using DOMParser and the browser DOM, and in Python using ElementTree, lxml, and BeautifulSoup. Includes XPath queries, handling namespaces, and...
XML parsing requires understanding tree traversal, namespace prefixes, and XPath. Browsers provide DOMParser as a built-in Web API (Node.js does not ship one); Python has xml.etree.ElementTree in the standard library.
Use the XML Formatter to format and validate XML before parsing.
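Because tree traversal underlies everything below, here is a minimal Python sketch of a depth-first walk with ElementTree. The helper name walk and the sample document are illustrative, not part of any library:

```python
import xml.etree.ElementTree as ET


def walk(element, depth=0):
    """Yield (depth, tag) for element and every descendant, depth-first."""
    yield depth, element.tag
    for child in element:  # iterating an Element yields its direct children
        yield from walk(child, depth + 1)


root = ET.fromstring(
    "<library><book id='1'><title>Clean Code</title></book></library>"
)
for depth, tag in walk(root):
    print("  " * depth + tag)
```

When you don't need the depth, the built-in root.iter() visits the same elements in the same document order.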
JavaScript: DOMParser
const xml = `
<library>
  <book id="1">
    <title>Clean Code</title>
    <author>Robert Martin</author>
    <year>2008</year>
  </book>
  <book id="2">
    <title>The Pragmatic Programmer</title>
    <author>David Thomas</author>
    <year>1999</year>
  </book>
</library>`;
const parser = new DOMParser();
const doc = parser.parseFromString(xml, 'application/xml');
// Check for parse errors:
const error = doc.querySelector('parsererror');
if (error) throw new Error(error.textContent);
// Extract all books:
const books = [...doc.querySelectorAll('book')].map(book => ({
  id: book.getAttribute('id'),
  title: book.querySelector('title').textContent,
  author: book.querySelector('author').textContent,
  year: parseInt(book.querySelector('year').textContent, 10), // explicit base 10
}));
console.log(books);
// [{ id: '1', title: 'Clean Code', author: 'Robert Martin', year: 2008 }, ...]
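DOMParser does not throw on malformed XML; it returns a document containing a parsererror element, which is why the check above is needed. For comparison, Python's ElementTree raises xml.etree.ElementTree.ParseError, whose position attribute is a (line, column) tuple. A minimal sketch, with a hypothetical helper parse_or_none:

```python
import xml.etree.ElementTree as ET


def parse_or_none(text):
    """Return the root Element, or None after reporting why text is not well-formed."""
    try:
        return ET.fromstring(text)
    except ET.ParseError as err:
        line, column = err.position  # where the parser gave up
        print(f"XML parse error at line {line}, column {column}: {err}")
        return None


good = parse_or_none("<book><title>Clean Code</title></book>")
bad = parse_or_none("<book><title>Clean Code</book>")  # mismatched closing tag
```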
JavaScript: Fetch and parse XML
async function fetchXml(url) {
  const res = await fetch(url);
  if (!res.ok) throw new Error(`HTTP ${res.status} fetching ${url}`);
  const text = await res.text();
  const parser = new DOMParser();
  const doc = parser.parseFromString(text, 'application/xml');
  const error = doc.querySelector('parsererror');
  if (error) throw new Error(`XML parse error: ${error.textContent}`);
  return doc;
}
// Parse RSS feed (top-level await needs an ES module; elsewhere, call this
// inside an async function). Cross-origin feeds must also allow CORS.
const doc = await fetchXml('https://example.com/feed.rss');
const items = [...doc.querySelectorAll('item')].map(item => ({
  title: item.querySelector('title')?.textContent,
  link: item.querySelector('link')?.textContent,
  pubDate: item.querySelector('pubDate')?.textContent,
  description: item.querySelector('description')?.textContent,
}));
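The optional chaining (?.) above turns a missing child element into undefined instead of a TypeError. ElementTree's findtext gives the same safety: it returns None for a missing child, or a default you pass. A minimal sketch that parses an RSS string already in memory; the sample feed and the name parse_rss_items are illustrative:

```python
import xml.etree.ElementTree as ET

# Illustrative feed: the second item has no <pubDate>.
RSS = """<rss version="2.0"><channel>
<item><title>First</title><link>https://example.com/1</link>
<pubDate>Mon, 06 Jan 2025 09:00:00 GMT</pubDate></item>
<item><title>Second</title><link>https://example.com/2</link></item>
</channel></rss>"""


def parse_rss_items(text):
    """Return one dict per <item>; missing fields become None (or "")."""
    root = ET.fromstring(text)
    return [
        {
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            "pubDate": item.findtext("pubDate"),  # None when absent
            "description": item.findtext("description", default=""),  # "" when absent
        }
        for item in root.iter("item")
    ]


for entry in parse_rss_items(RSS):
    print(entry)
```

Fetching the feed text (for example with urllib.request) is left out so the sketch runs offline.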
Python: xml.etree.ElementTree
import xml.etree.ElementTree as ET
xml_string = """
<library>
<book id="1">
<title>Clean Code</title>
<author>Robert Martin</author>
</book>
<book id="2">
<title>The Pragmatic Programmer</title>
<author>David Thomas</author>
</book>
</library>"""
root = ET.fromstring(xml_string)
# Iterate over direct <book> children (findall('book') does not search deeper):
for book in root.findall('book'):
    book_id = book.get('id')
    title = book.findtext('title')
    author = book.findtext('author')
    print(f"ID: {book_id}, Title: {title}, Author: {author}")
# Parse from file:
tree = ET.parse('books.xml')
root = tree.getroot()
# XPath-like expressions (limited):
titles = [el.text for el in root.findall('.//title')]
# Serialize back to XML (UTF-8 bytes with a matching XML declaration):
tree.write('output.xml', encoding='utf-8', xml_declaration=True)
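ET.parse builds the whole tree in memory before you can query it. The SAX model named in the title works the other way: the parser streams events (start tag, text, end tag) to a handler and keeps no tree, which suits very large files. A minimal sketch with the standard library's xml.sax; the TitleCollector class is hypothetical:

```python
import xml.sax


class TitleCollector(xml.sax.ContentHandler):
    """Collect the text of every <title> element as the parser streams events."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # May be called several times for one text node, so buffer the pieces.
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False


handler = TitleCollector()
xml.sax.parseString(
    b"<library><book><title>Clean Code</title></book>"
    b"<book><title>The Pragmatic Programmer</title></book></library>",
    handler,
)
print(handler.titles)
```

xml.sax.parse('books.xml', handler) does the same for a file. ElementTree's iterparse is a middle ground: it streams, but still hands you Element objects.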
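Python: lxml (full XPath 1.0 support)
# lxml is a third-party package (pip install lxml).
from lxml import etree
# books.xml is assumed to hold the <library> document with <year> elements:
tree = etree.parse('books.xml')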
# Full XPath:
titles = tree.xpath('//book/title/text()')
# ['Clean Code', 'The Pragmatic Programmer']
# Complex XPath:
books_after_2000 = tree.xpath('//book[year > 2000]')
# XPath with attribute:
book_1 = tree.xpath('//book[@id="1"]')[0]
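If installing lxml isn't an option, ElementTree's limited path syntax still covers some predicates: [@attr='value'], [tag='text'], and 1-based positions such as [1]. It has no comparison operators, so a query like //book[year > 2000] has to be finished in Python. A sketch, assuming a sample document that includes <year>:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""<library>
<book id="1"><title>Clean Code</title><year>2008</year></book>
<book id="2"><title>The Pragmatic Programmer</title><year>1999</year></book>
</library>""")

# Supported by ElementTree: attribute, child-text and position predicates.
book_1 = root.find(".//book[@id='1']")
by_title = root.find(".//book[title='The Pragmatic Programmer']")
first_book = root.find("book[1]")  # positions are 1-based, as in XPath

# Not supported: numeric comparisons such as [year > 2000]; filter in Python.
books_after_2000 = [
    book for book in root.iter("book")
    if int(book.findtext("year", default="0")) > 2000
]
print([b.get("id") for b in books_after_2000])
```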
Handling namespaces
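Namespaces are the most common source of confusion in XML parsing: an element such as dc:creator is identified by its namespace URI, not by its prefix, so namespace-aware queries have to work with that URI: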
<!-- RSS 2.0 with Dublin Core namespace: -->
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <item>
      <title>Article Title</title>
      <dc:creator>John Doe</dc:creator>
    </item>
  </channel>
</rss>
import xml.etree.ElementTree as ET
NS = {
    'dc': 'http://purl.org/dc/elements/1.1/',
}
# rss_xml holds the RSS document above as a string:
root = ET.fromstring(rss_xml)
for item in root.findall('.//item'):
    title = item.findtext('title')
    creator = item.findtext('dc:creator', namespaces=NS)
    print(f"{title} by {creator}")
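// JavaScript: match on namespace URI + local name with getElementsByTagNameNS
// (querySelector has no way to bind a prefix to a namespace URI).
// Here doc is the parsed RSS document, e.g. the result of fetchXml():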
const dcNS = 'http://purl.org/dc/elements/1.1/';
const creators = [...doc.getElementsByTagNameNS(dcNS, 'creator')];
creators.forEach(el => console.log(el.textContent));
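Under the hood, ElementTree stores a namespaced tag as {uri}localname (Clark notation) and discards the prefix. That explains the classic pitfall with a default namespace, as in Atom feeds: findall('entry') finds nothing, because every tag is really {http://www.w3.org/2005/Atom}entry. A minimal sketch; the sample feed is illustrative, and the {*} wildcard assumes Python 3.8 or later:

```python
import xml.etree.ElementTree as ET

ATOM = """<feed xmlns="http://www.w3.org/2005/Atom">
<entry><title>Article Title</title></entry>
</feed>"""

root = ET.fromstring(ATOM)
print(root.tag)               # {http://www.w3.org/2005/Atom}feed
print(root.findall("entry"))  # [] because the unqualified name does not match

# Option 1: map a prefix of your choice to the namespace URI.
NS = {"atom": "http://www.w3.org/2005/Atom"}
titles = [e.findtext("atom:title", namespaces=NS) for e in root.findall("atom:entry", NS)]

# Option 2 (Python 3.8+): {*} matches the local name in any namespace.
titles_any_ns = [e.findtext("{*}title") for e in root.findall("{*}entry")]
print(titles, titles_any_ns)
```

The prefix in NS does not need to match the document's prefix (here the document uses none); only the URI has to match.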
XML to JSON conversion
import json
import xml.etree.ElementTree as ET
def xml_to_dict(element):
    """Recursively convert an XML element to a dict.

    Attributes go under '@attributes', text under '#text'; repeated child
    tags become lists. Tail text (mixed content) is ignored.
    """
    result = {}
    # Attributes:
    if element.attrib:
        result['@attributes'] = dict(element.attrib)
    # Text content:
    text = element.text and element.text.strip()
    if text:
        result['#text'] = text
    # Child elements:
    for child in element:
        child_data = xml_to_dict(child)
        if child.tag in result:
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(child_data)
        else:
            result[child.tag] = child_data
    return result
root = ET.fromstring(xml_string)
data = {root.tag: xml_to_dict(root)}
print(json.dumps(data, indent=2))
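One side effect of the function above is that the output shape depends on the data: a <library> with one <book> yields a dict under "book", while two <book> elements yield a list, so consumers must handle both. If you prefer a stable shape, here is a minimal variant (the name xml_to_dict_lists is hypothetical; other conventions are unchanged) that always wraps child elements in lists:

```python
import json
import xml.etree.ElementTree as ET


def xml_to_dict_lists(element):
    """Like xml_to_dict, but every child tag maps to a list of dicts."""
    result = {}
    if element.attrib:
        result["@attributes"] = dict(element.attrib)
    text = element.text and element.text.strip()
    if text:
        result["#text"] = text
    for child in element:
        # setdefault creates the list the first time a tag is seen, then appends.
        result.setdefault(child.tag, []).append(xml_to_dict_lists(child))
    return result


one = ET.fromstring("<library><book id='1'><title>Clean Code</title></book></library>")
data = {one.tag: xml_to_dict_lists(one)}
print(json.dumps(data))
```

The tradeoff is more verbose JSON: even single-valued fields such as title become one-element lists.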
Related tools
- XML Formatter — format and validate XML
- XML to JSON Converter — XML to JSON conversion
- XML vs JSON — when to use XML
Written by Mian Ali Khalid. Part of the Dev Productivity pillar.