XPath Tutorial — Query XML Documents with Path Expressions
XPath (XML Path Language) is a query language for selecting nodes from XML documents. It uses path-like syntax to navigate the element tree, similar to file system paths or CSS selectors — but with more expressive filtering via predicates and functions.
Use the XML Formatter to inspect and work with XML structures.
XPath fundamentals
Given this XML:
<bookstore>
  <book category="fiction">
    <title lang="en">The Great Gatsby</title>
    <author>F. Scott Fitzgerald</author>
    <price>12.99</price>
  </book>
  <book category="tech">
    <title lang="en">Clean Code</title>
    <author>Robert C. Martin</author>
    <price>34.99</price>
  </book>
  <book category="fiction">
    <title lang="fr">Le Petit Prince</title>
    <author>Antoine de Saint-Exupéry</author>
    <price>9.99</price>
  </book>
</bookstore>
Basic XPath expressions:
| Expression | Selects |
|---|---|
| /bookstore | Root element bookstore |
| /bookstore/book | All book elements that are direct children of bookstore |
| //book | All book elements anywhere in the document |
| //title | All title elements |
| //book/title | All title elements that are children of book |
| //@lang | All lang attributes |
| //book[1] | Each book that is the first book child of its parent (here, the first book) |
| //book[last()] | Each book that is the last book child of its parent (here, the last book) |
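The expressions in the table can be tried out directly. The following is a minimal sketch using lxml (the same library used in the Python section of this tutorial) against a trimmed copy of the sample document; the variable names are illustrative only.

```python
from lxml import etree

# Trimmed copy of the sample document (authors and prices omitted).
xml = """<bookstore>
  <book category="fiction"><title lang="en">The Great Gatsby</title></book>
  <book category="tech"><title lang="en">Clean Code</title></book>
  <book category="fiction"><title lang="fr">Le Petit Prince</title></book>
</bookstore>"""
root = etree.fromstring(xml)

# Element paths return a list of element objects.
books = root.xpath("/bookstore/book")
titles = [t.text for t in root.xpath("//book/title")]

# Attribute paths return the attribute values as strings.
langs = root.xpath("//@lang")

# Positional predicates are 1-based.
first = root.xpath("//book[1]/title/text()")
last = root.xpath("//book[last()]/title/text()")

print(len(books))  # 3
print(titles)      # ['The Great Gatsby', 'Clean Code', 'Le Petit Prince']
print(langs)       # ['en', 'en', 'fr']
print(first)       # ['The Great Gatsby']
print(last)        # ['Le Petit Prince']
```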
Predicates (filtering)
Predicates in square brackets filter nodes:
//book[@category='fiction']
→ All book elements where category attribute equals “fiction”
//book[price > 20]
→ All books with price greater than 20
//book[author='Robert C. Martin']
→ Books by Robert C. Martin
//book[@category='tech' and price < 40]
→ Tech books under $40
//title[@lang='en']
→ Titles in English
//book[position() <= 2]
→ First two books
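As a rough check of the predicates above, this sketch runs each one with lxml against the sample bookstore. The helper name titles_for is hypothetical and exists only to make the results easy to read.

```python
from lxml import etree

xml = """<bookstore>
  <book category="fiction">
    <title lang="en">The Great Gatsby</title>
    <author>F. Scott Fitzgerald</author>
    <price>12.99</price>
  </book>
  <book category="tech">
    <title lang="en">Clean Code</title>
    <author>Robert C. Martin</author>
    <price>34.99</price>
  </book>
  <book category="fiction">
    <title lang="fr">Le Petit Prince</title>
    <author>Antoine de Saint-Exupéry</author>
    <price>9.99</price>
  </book>
</bookstore>"""
root = etree.fromstring(xml)


def titles_for(expr):
    """Return the title text of every book element matched by expr."""
    return [book.findtext("title") for book in root.xpath(expr)]


fiction = titles_for("//book[@category='fiction']")
pricey = titles_for("//book[price > 20]")
by_martin = titles_for("//book[author='Robert C. Martin']")
cheap_tech = titles_for("//book[@category='tech' and price < 40]")
first_two = titles_for("//book[position() <= 2]")
english = root.xpath("//title[@lang='en']/text()")

print(fiction)     # ['The Great Gatsby', 'Le Petit Prince']
print(pricey)      # ['Clean Code']
print(by_martin)   # ['Clean Code']
print(cheap_tech)  # ['Clean Code']
print(first_two)   # ['The Great Gatsby', 'Clean Code']
print(english)     # ['The Great Gatsby', 'Clean Code']
```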
Axes
XPath axes select nodes relative to the context node:
| Axis | Meaning |
|---|---|
| child:: | Direct children (default) |
| parent:: | Parent element |
| ancestor:: | All ancestors (parent, grandparent, …) |
| descendant:: | All descendants |
| following-sibling:: | Siblings after context node |
| preceding-sibling:: | Siblings before context node |
| attribute:: | Attributes of context node |
| self:: | Context node itself |
//book/following-sibling::book
→ Every book element that has an earlier book sibling (here, the second and third books)
//title/parent::book
→ book elements that contain title elements
//book/ancestor::bookstore
→ The bookstore element (ancestor of any book)
Abbreviated syntax:
| Abbreviation | Expands to |
|---|---|
| . | self::node() |
| .. | parent::node() |
| @attr | attribute::attr |
| // | /descendant-or-self::node()/ |
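To see the axes and their abbreviations side by side, here is a small lxml sketch on a titles-only copy of the sample document. The helper name title_texts is hypothetical; the point is that the long and abbreviated forms select the same nodes.

```python
from lxml import etree

xml = """<bookstore>
  <book><title>The Great Gatsby</title></book>
  <book><title>Clean Code</title></book>
  <book><title>Le Petit Prince</title></book>
</bookstore>"""
root = etree.fromstring(xml)


def title_texts(expr):
    """Title text of each book element matched by expr."""
    return [book.findtext("title") for book in root.xpath(expr)]


# Books that have at least one book sibling before them.
later_books = title_texts("//book/following-sibling::book")

# parent::book and the abbreviation .. reach the same book elements.
parents_long = title_texts("//title/parent::book")
parents_short = title_texts("//title/..")

# Every book shares the single bookstore ancestor.
ancestors = [el.tag for el in root.xpath("//book/ancestor::bookstore")]

# // is shorthand for /descendant-or-self::node()/
long_form = root.xpath("/descendant-or-self::node()/child::title/text()")
short_form = root.xpath("//title/text()")

print(later_books)                    # ['Clean Code', 'Le Petit Prince']
print(parents_long == parents_short)  # True
print(ancestors)                      # ['bookstore']
print(long_form == short_form)        # True
```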
Functions
String functions
//title[contains(text(), 'Clean')]
→ Titles containing “Clean”
//title[starts-with(text(), 'The')]
→ Titles starting with “The”
//book[string-length(title) > 10]
→ Books with titles longer than 10 characters
normalize-space((//title)[1])
→ Text of the first title in the document, with leading/trailing whitespace removed and internal runs of whitespace collapsed to single spaces
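The string functions can be exercised the same way. This is a sketch with lxml on the sample titles; the second, whitespace-padded document is a made-up input used only to show what normalize-space() does.

```python
from lxml import etree

xml = """<bookstore>
  <book><title>The Great Gatsby</title></book>
  <book><title>Clean Code</title></book>
  <book><title>Le Petit Prince</title></book>
</bookstore>"""
root = etree.fromstring(xml)

with_clean = root.xpath("//title[contains(text(), 'Clean')]/text()")
starts_the = root.xpath("//title[starts-with(text(), 'The')]/text()")

# "Clean Code" is exactly 10 characters long, so "> 10" leaves it out.
long_titles = root.xpath("//book[string-length(title) > 10]/title/text()")

# Hypothetical padded input: normalize-space() trims both ends and
# collapses internal whitespace runs (including newlines) to one space.
padded = etree.fromstring("<title>  The   Great \n Gatsby </title>")
cleaned = padded.xpath("normalize-space(.)")

print(with_clean)   # ['Clean Code']
print(starts_the)   # ['The Great Gatsby']
print(long_titles)  # ['The Great Gatsby', 'Le Petit Prince']
print(cleaned)      # The Great Gatsby
```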
Numeric functions
count(//book)
→ Number of book elements
sum(//price)
→ Sum of all prices
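//book[price = min(//price)]
→ Cheapest book (requires XPath 2.0 or later; min() is not defined in XPath 1.0, which is what the browser's document.evaluate and lxml implement)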
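On an XPath 1.0 engine, a common workaround for the missing min() is to select the book whose price is not greater than any other price. The sketch below shows that idiom with lxml alongside count() and sum(); it is one possible formulation, not the only one.

```python
from lxml import etree

xml = """<bookstore>
  <book><title>The Great Gatsby</title><price>12.99</price></book>
  <book><title>Clean Code</title><price>34.99</price></book>
  <book><title>Le Petit Prince</title><price>9.99</price></book>
</bookstore>"""
root = etree.fromstring(xml)

# count() and sum() return floats in lxml.
book_count = root.xpath("count(//book)")
total = root.xpath("sum(//price)")

# XPath 1.0 stand-in for min(): keep the book for which no price
# anywhere in the document is lower than its own.
cheapest = root.xpath("//book[not(price > //price)]/title/text()")

print(int(book_count))   # 3
print(round(total, 2))   # 57.97
print(cheapest)          # ['Le Petit Prince']
```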
Node functions
//book[name()='book']
→ Elements named “book” (redundant here but useful dynamically)
//book[not(@category)]
→ Books without a category attribute
//book[@category != 'fiction']
→ Books that have a category attribute whose value is not fiction (books without a category are excluded; use //book[not(@category='fiction')] to include them)
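The difference between != and not(... = ...) only shows up when some nodes lack the attribute. The following lxml sketch uses a made-up third book with no category attribute (it is not part of the sample document) to make that visible.

```python
from lxml import etree

# The third book is hypothetical: it has no category attribute.
xml = """<bookstore>
  <book category="fiction"><title>The Great Gatsby</title></book>
  <book category="tech"><title>Clean Code</title></book>
  <book><title>Uncategorized</title></book>
</bookstore>"""
root = etree.fromstring(xml)

# A comparison against an empty node-set is always false, so a book
# without a category never satisfies "!=".
not_equal = root.xpath("//book[@category != 'fiction']/title/text()")

# not(...) negates the whole test, so the attribute-less book passes.
negated = root.xpath("//book[not(@category = 'fiction')]/title/text()")

# Books with no category attribute at all.
missing = root.xpath("//book[not(@category)]/title/text()")

print(not_equal)  # ['Clean Code']
print(negated)    # ['Clean Code', 'Uncategorized']
print(missing)    # ['Uncategorized']
```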
XPath in JavaScript (browser)
const xml = `<bookstore>
<book category="fiction"><title>Gatsby</title><price>12.99</price></book>
<book category="tech"><title>Clean Code</title><price>34.99</price></book>
</bookstore>`;
const parser = new DOMParser();
const doc = parser.parseFromString(xml, 'application/xml');
// Evaluate XPath:
function xpathQuery(expression, doc) {
  const result = doc.evaluate(
    expression,
    doc,  // Context node
    null, // Namespace resolver
    XPathResult.ORDERED_NODE_ITERATOR_TYPE, // Node-set result, in document order
    null  // No existing XPathResult to reuse
  );
  const nodes = [];
  let node = result.iterateNext();
  while (node) {
    nodes.push(node);
    node = result.iterateNext();
  }
  return nodes;
}
// Get all book titles:
const titles = xpathQuery('//title/text()', doc);
titles.forEach(t => console.log(t.textContent));
// "Gatsby"
// "Clean Code"
// Get fiction books:
const fiction = xpathQuery('//book[@category="fiction"]', doc);
console.log(fiction.length); // 1
// Get price of tech books:
const techPrices = xpathQuery('//book[@category="tech"]/price/text()', doc);
console.log(techPrices[0].textContent); // "34.99"
XPath in Python (lxml)
from lxml import etree
xml = b'''<bookstore>
<book category="fiction"><title>Gatsby</title><price>12.99</price></book>
<book category="tech"><title>Clean Code</title><price>34.99</price></book>
</bookstore>'''
root = etree.fromstring(xml)
# Simple selection:
titles = root.xpath('//title/text()')
print(titles) # ['Gatsby', 'Clean Code']
# Filter by attribute:
fiction_books = root.xpath('//book[@category="fiction"]')
for book in fiction_books:
    print(book.find('title').text) # "Gatsby"
# Numeric result:
total_price = root.xpath('sum(//price)')
print(round(total_price, 2)) # 47.98
count = root.xpath('count(//book)')
print(int(count)) # 2
# With namespaces:
xml_ns = b'''<root xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:title>My Doc</dc:title>
</root>'''
root_ns = etree.fromstring(xml_ns)
titles = root_ns.xpath('//dc:title/text()',
namespaces={'dc': 'http://purl.org/dc/elements/1.1/'})
print(titles) # ['My Doc']
XPath vs CSS selectors
| Task | XPath | CSS Selector |
|---|---|---|
| All matching elements | //div | div |
| Direct child | //div/p | div > p |
| By attribute | //a[@href] | a[href] |
| By attribute value | //a[@class='nav'] | a[class='nav'] (a.nav also matches when nav is one of several classes) |
| First child | //*[1][self::li] | li:first-child |
| Last child | //*[last()][self::li] | li:last-child |
| First match in the document | (//li)[1] | No equivalent |
| Contains text | //p[contains(text(),'hello')] | No standard equivalent |
| Parent selection | //li/.. | :has(> li), in browsers that support :has() |
| Adjacent sibling | //h2/following-sibling::*[1][self::p] | h2 + p |
| Select by text | //button[text()='Submit'] | No standard equivalent |
XPath is more powerful for data extraction: it can filter by text content, walk up to parents and ancestors, and index into the whole result set. CSS selectors are more concise for styling and simpler DOM traversal.
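Positional and sibling expressions are where XPath and CSS are easiest to confuse. This lxml sketch uses a small made-up fragment (not from the sample bookstore) to show how parenthesized indexing, per-parent indexing, and the two sibling forms differ.

```python
from lxml import etree

# Hypothetical fragment: two lists between a heading and a paragraph.
xml = """<div>
  <h2>Heading</h2>
  <ul><li>a</li><li>b</li></ul>
  <ul><li>c</li><li>d</li></ul>
  <p>later</p>
</div>"""
root = etree.fromstring(xml)

# (//li)[1] indexes the whole result set: one node.
first_overall = root.xpath("(//li)[1]/text()")

# //li[1] indexes per parent: the first li of each list.
first_per_parent = root.xpath("//li[1]/text()")

# Walking up from a child to its parent.
parent_tags = [el.tag for el in root.xpath("//li/..")]

# First p sibling after h2, adjacent or not.
any_later_p = root.xpath("//h2/following-sibling::p[1]/text()")

# Only matches when the sibling right after h2 is a p (like CSS h2 + p).
adjacent_p = root.xpath("//h2/following-sibling::*[1][self::p]")

print(first_overall)     # ['a']
print(first_per_parent)  # ['a', 'c']
print(parent_tags)       # ['ul', 'ul']
print(any_later_p)       # ['later']
print(adjacent_p)        # []
```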
Related tools
- XML Formatter — format and inspect XML documents
- XML Validator Online — validate XML syntax
- XML Namespace Guide — using XML namespaces
Written by Mian Ali Khalid. Part of the Data & Format pillar.