Xerobit

XPath Tutorial — Query XML Documents with Path Expressions

XPath is a query language for selecting nodes from XML documents. It does for XML roughly what CSS selectors do for HTML. Here's XPath syntax, axes, predicates, and functions with...

Mian Ali Khalid · 7 min read

XPath (XML Path Language) is a query language for selecting nodes from XML documents. It uses path-like syntax to navigate the element tree, similar to file system paths or CSS selectors — but with more expressive filtering via predicates and functions.

Use the XML Formatter to inspect and work with XML structures.

XPath fundamentals

Given this XML:

<bookstore>
  <book category="fiction">
    <title lang="en">The Great Gatsby</title>
    <author>F. Scott Fitzgerald</author>
    <price>12.99</price>
  </book>
  <book category="tech">
    <title lang="en">Clean Code</title>
    <author>Robert C. Martin</author>
    <price>34.99</price>
  </book>
  <book category="fiction">
    <title lang="fr">Le Petit Prince</title>
    <author>Antoine de Saint-Exupéry</author>
    <price>9.99</price>
  </book>
</bookstore>

Basic XPath expressions:

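Expression | Selects
/bookstore | Root element bookstore
/bookstore/book | All book elements that are direct children of bookstore
//book | All book elements anywhere in the document
//title | All title elements
//book/title | All title elements that are children of book
//@lang | All lang attributes
//book[1] | Every book that is the first book child of its parent (here, just the first book)
//book[last()] | Every book that is the last book child of its parent (here, just the last book)
(//book)[1] | The first book in the whole document, whatever its parent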
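A quick way to try these expressions is Python's lxml library, which the Python section below also uses. This sketch assumes lxml is installed (pip install lxml) and that BOOKSTORE holds a compact copy of the sample above; titles_of is a hypothetical helper. The last two queries use a second, made-up document to show why //book[1] is not always "the first book in the document".

```python
from lxml import etree

# Compact copy of the bookstore sample above (same data, less whitespace).
BOOKSTORE = """<bookstore>
  <book category="fiction"><title lang="en">The Great Gatsby</title>
    <author>F. Scott Fitzgerald</author><price>12.99</price></book>
  <book category="tech"><title lang="en">Clean Code</title>
    <author>Robert C. Martin</author><price>34.99</price></book>
  <book category="fiction"><title lang="fr">Le Petit Prince</title>
    <author>Antoine de Saint-Exupéry</author><price>9.99</price></book>
</bookstore>"""

root = etree.fromstring(BOOKSTORE)

def titles_of(books):
    """Hypothetical helper: the title text of each selected book element."""
    return [book.findtext("title") for book in books]

print(root.xpath("/bookstore")[0].tag)          # bookstore
print(len(root.xpath("/bookstore/book")))       # 3
print(root.xpath("//@lang"))                    # ['en', 'en', 'fr']
print(titles_of(root.xpath("//book[1]")))       # ['The Great Gatsby']
print(titles_of(root.xpath("//book[last()]")))  # ['Le Petit Prince']

# //book[1] means "every book that is the first book child of its parent".
# With two parent elements it matches two books; (//book)[1] matches one.
nested = etree.fromstring(
    "<s><row><book id='a'/><book id='b'/></row><row><book id='c'/></row></s>")
print([b.get("id") for b in nested.xpath("//book[1]")])    # ['a', 'c']
print([b.get("id") for b in nested.xpath("(//book)[1]")])  # ['a']
```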

Predicates (filtering)

Predicates in square brackets filter nodes:

//book[@category='fiction']

→ All book elements where category attribute equals “fiction”

//book[price > 20]

→ All books with price greater than 20

//book[author='Robert C. Martin']

→ Books by Robert C. Martin

//book[@category='tech' and price < 40]

→ Tech books under $40

//title[@lang='en']

→ Titles in English

//book[position() <= 2]

→ First two books
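The predicate examples can be checked the same way. Again this is a sketch, assuming lxml is installed and using a compact copy of the bookstore sample; appending /title/text() to each expression only makes the matches readable. Note that price > 20 works because XPath converts the price element's text to a number before comparing.

```python
from lxml import etree

# Compact copy of the bookstore sample (assumption: same data as above).
BOOKSTORE = """<bookstore>
  <book category="fiction"><title lang="en">The Great Gatsby</title>
    <author>F. Scott Fitzgerald</author><price>12.99</price></book>
  <book category="tech"><title lang="en">Clean Code</title>
    <author>Robert C. Martin</author><price>34.99</price></book>
  <book category="fiction"><title lang="fr">Le Petit Prince</title>
    <author>Antoine de Saint-Exupéry</author><price>9.99</price></book>
</bookstore>"""

root = etree.fromstring(BOOKSTORE)

# Attribute equality:
print(root.xpath("//book[@category='fiction']/title/text()"))
# ['The Great Gatsby', 'Le Petit Prince']

# Numeric comparison (element text converted to a number):
print(root.xpath("//book[price > 20]/title/text()"))  # ['Clean Code']

# Child element text equality:
print(root.xpath("//book[author='Robert C. Martin']/title/text()"))  # ['Clean Code']

# Combined conditions:
print(root.xpath("//book[@category='tech' and price < 40]/title/text()"))  # ['Clean Code']

# Predicate on the title element itself:
print(root.xpath("//title[@lang='en']/text()"))
# ['The Great Gatsby', 'Clean Code']

# Position-based predicate:
print(root.xpath("//book[position() <= 2]/title/text()"))
# ['The Great Gatsby', 'Clean Code']
```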

Axes

XPath axes select nodes relative to the context node:

Axis | Meaning
child:: | Direct children (the default axis)
parent:: | Parent node
ancestor:: | All ancestors (parent, grandparent, …)
descendant:: | All descendants
descendant-or-self:: | The context node plus all its descendants
following-sibling:: | Siblings after the context node
preceding-sibling:: | Siblings before the context node
attribute:: | Attributes of the context node
self:: | The context node itself

//book/following-sibling::book

→ Every book that has an earlier book sibling (here, the second and third books); each matching node appears only once in the result

//title/parent::book

→ The book elements that are parents of title elements

//book/ancestor::bookstore

→ The bookstore element (ancestor of any book)
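Here are the axis examples run with lxml, under the same assumptions as before (lxml installed, compact bookstore copy, hypothetical titles_of helper). The preceding-sibling query also shows that on a reverse axis, position 1 is the node nearest the context node, not the first one in the document.

```python
from lxml import etree

# Compact copy of the bookstore sample (assumption: same data as above).
BOOKSTORE = """<bookstore>
  <book category="fiction"><title lang="en">The Great Gatsby</title>
    <author>F. Scott Fitzgerald</author><price>12.99</price></book>
  <book category="tech"><title lang="en">Clean Code</title>
    <author>Robert C. Martin</author><price>34.99</price></book>
  <book category="fiction"><title lang="fr">Le Petit Prince</title>
    <author>Antoine de Saint-Exupéry</author><price>9.99</price></book>
</bookstore>"""

root = etree.fromstring(BOOKSTORE)

def titles_of(books):
    """Hypothetical helper: the title text of each selected book element."""
    return [book.findtext("title") for book in books]

# following-sibling: books that come after some other book (no duplicates).
print(titles_of(root.xpath("//book/following-sibling::book")))
# ['Clean Code', 'Le Petit Prince']

# Reverse axis: [1] is the nearest preceding sibling of the third book.
print(titles_of(root.xpath("//book[3]/preceding-sibling::book[1]")))
# ['Clean Code']

# parent:: and ancestor:: walk up the tree; shared ancestors collapse to one node.
print(len(root.xpath("//title/parent::book")))        # 3
print(len(root.xpath("//book/ancestor::bookstore")))  # 1

# attribute:: is the long form of @.
print(root.xpath("//book[2]/attribute::category"))    # ['tech']
```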

Abbreviated syntax:

  • . = self::node()
  • .. = parent::node()
  • @attr = attribute::attr
  • // = /descendant-or-self::node()/
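One way to convince yourself that these abbreviations are only shorthand is to evaluate each short form next to its expanded form and compare the results. This sketch assumes lxml is installed; describe is a hypothetical helper that turns elements into lxml's getpath() strings so the two result lists can be compared by value.

```python
from lxml import etree

# Compact copy of the bookstore sample (assumption: same data as above).
BOOKSTORE = """<bookstore>
  <book category="fiction"><title lang="en">The Great Gatsby</title>
    <author>F. Scott Fitzgerald</author><price>12.99</price></book>
  <book category="tech"><title lang="en">Clean Code</title>
    <author>Robert C. Martin</author><price>34.99</price></book>
  <book category="fiction"><title lang="fr">Le Petit Prince</title>
    <author>Antoine de Saint-Exupéry</author><price>9.99</price></book>
</bookstore>"""

root = etree.fromstring(BOOKSTORE)
tree = root.getroottree()

def describe(expr):
    """Hypothetical helper: element results as paths, other results as strings."""
    return [tree.getpath(node) if isinstance(node, etree._Element) else str(node)
            for node in root.xpath(expr)]

# Each abbreviated expression paired with its unabbreviated form.
PAIRS = [
    ("//book/title", "/descendant-or-self::node()/child::book/child::title"),
    ("//title/..", "/descendant-or-self::node()/child::title/parent::node()"),
    ("//book/@category", "/descendant-or-self::node()/child::book/attribute::category"),
    ("//book[price > 20]/.", "//book[child::price > 20]/self::node()"),
]

for short, full in PAIRS:
    # Both forms must select exactly the same nodes in the same order.
    assert describe(short) == describe(full), (short, full)

print(describe("//title/.."))
# ['/bookstore/book[1]', '/bookstore/book[2]', '/bookstore/book[3]']
```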

Functions

String functions

//title[contains(text(), 'Clean')]

→ Titles containing “Clean”

//title[starts-with(text(), 'The')]

→ Titles starting with “The”

//book[string-length(title) > 10]

→ Books with titles longer than 10 characters

normalize-space(//title[1])

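→ String value of the first title (in document order), with leading and trailing whitespace removed and internal runs of whitespace collapsed to single spaces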
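The string functions can be checked with lxml as well (assumed installed; compact bookstore copy). The messy title element is made up to show that normalize-space() also collapses whitespace inside the string, not just at the ends.

```python
from lxml import etree

# Compact copy of the bookstore sample (assumption: same data as above).
BOOKSTORE = """<bookstore>
  <book category="fiction"><title lang="en">The Great Gatsby</title>
    <author>F. Scott Fitzgerald</author><price>12.99</price></book>
  <book category="tech"><title lang="en">Clean Code</title>
    <author>Robert C. Martin</author><price>34.99</price></book>
  <book category="fiction"><title lang="fr">Le Petit Prince</title>
    <author>Antoine de Saint-Exupéry</author><price>9.99</price></book>
</bookstore>"""

root = etree.fromstring(BOOKSTORE)

print(root.xpath("//title[contains(text(), 'Clean')]/text()"))   # ['Clean Code']
print(root.xpath("//title[starts-with(text(), 'The')]/text()"))  # ['The Great Gatsby']

# "Clean Code" is exactly 10 characters long, so "> 10" excludes it.
print(root.xpath("//book[string-length(title) > 10]/title/text()"))
# ['The Great Gatsby', 'Le Petit Prince']

# normalize-space() on a node-set uses the first node in document order.
print(root.xpath("normalize-space(//title[1])"))  # The Great Gatsby

# Made-up element: ends are trimmed AND inner whitespace runs are collapsed.
messy = etree.fromstring("<title>  The   Great\n  Gatsby  </title>")
print(messy.xpath("normalize-space(.)"))  # The Great Gatsby
```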

Numeric functions

count(//book)

→ Number of book elements

sum(//price)

→ Sum of all prices

//book[price = min(//price)]

→ Cheapest book (XPath 2.0)
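Here is a sketch of the numeric functions with lxml (assumed installed; compact bookstore copy). lxml is built on libxml2, which implements XPath 1.0, so min() is not available there. The last query is a common XPath 1.0 workaround for "cheapest book": keep the book whose price is not greater than any price in the document. XPath numbers are floating-point, so the sketch rounds the sum before printing.

```python
from lxml import etree

# Compact copy of the bookstore sample (assumption: same data as above).
BOOKSTORE = """<bookstore>
  <book category="fiction"><title lang="en">The Great Gatsby</title>
    <author>F. Scott Fitzgerald</author><price>12.99</price></book>
  <book category="tech"><title lang="en">Clean Code</title>
    <author>Robert C. Martin</author><price>34.99</price></book>
  <book category="fiction"><title lang="fr">Le Petit Prince</title>
    <author>Antoine de Saint-Exupéry</author><price>9.99</price></book>
</bookstore>"""

root = etree.fromstring(BOOKSTORE)

print(root.xpath("count(//book)"))           # 3.0 (XPath numbers are floats)
print(round(root.xpath("sum(//price)"), 2))  # 57.97

# XPath 1.0 "minimum": no other price is lower than this book's price.
print(root.xpath("//book[not(price > //price)]/title/text()"))  # ['Le Petit Prince']
```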

Node functions

//book[name()='book']

→ Elements named “book” (redundant here but useful dynamically)

//book[not(@category)]

→ Books without a category attribute

//book[@category != 'fiction']

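→ Books whose category attribute exists and is not “fiction”. Books without a category attribute are not included, because comparing an empty node-set with != is false in XPath 1.0. Use //book[not(@category='fiction')] to include them.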
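The difference between != and not(=) is easiest to see with a book that has no category. The shelf document below is hypothetical (the third book is made up and has no category attribute), and the queries assume lxml is installed.

```python
from lxml import etree

# Hypothetical shelf: the third book has no category attribute.
shelf = etree.fromstring(
    "<bookstore>"
    "<book category='fiction'><title>The Great Gatsby</title></book>"
    "<book category='tech'><title>Clean Code</title></book>"
    "<book><title>Untitled Draft</title></book>"
    "</bookstore>"
)

# name() lets you match element names held in a string.
print(len(shelf.xpath("//*[name()='book']")))                     # 3

print(shelf.xpath("//book[not(@category)]/title/text()"))          # ['Untitled Draft']

# != against a missing attribute (empty node-set) is false, so it is skipped.
print(shelf.xpath("//book[@category != 'fiction']/title/text()"))  # ['Clean Code']

# not(= ...) is true when the attribute is missing, so it is included.
print(shelf.xpath("//book[not(@category = 'fiction')]/title/text()"))
# ['Clean Code', 'Untitled Draft']
```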

XPath in JavaScript (browser)

const xml = `<bookstore>
  <book category="fiction"><title>Gatsby</title><price>12.99</price></book>
  <book category="tech"><title>Clean Code</title><price>34.99</price></book>
</bookstore>`;

const parser = new DOMParser();
const doc = parser.parseFromString(xml, 'application/xml');

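// Evaluate an XPath expression that returns nodes, in document order:
function xpathQuery(expression, doc) {
  const result = doc.evaluate(
    expression,
    doc,   // Context node
    null,  // Namespace resolver (not needed: this document has no namespaces)
    XPathResult.ORDERED_NODE_ITERATOR_TYPE,  // ANY_TYPE would give an unordered iterator
    null   // No existing result object to reuse
  );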
  
  const nodes = [];
  let node = result.iterateNext();
  while (node) {
    nodes.push(node);
    node = result.iterateNext();
  }
  return nodes;
}

// Get all book titles:
const titles = xpathQuery('//title/text()', doc);
titles.forEach(t => console.log(t.textContent));
// "Gatsby"
// "Clean Code"

// Get fiction books:
const fiction = xpathQuery('//book[@category="fiction"]', doc);
console.log(fiction.length);  // 1

// Get price of tech books:
const techPrices = xpathQuery('//book[@category="tech"]/price/text()', doc);
console.log(techPrices[0].textContent);  // "34.99"

XPath in Python (lxml)

from lxml import etree

xml = b'''<bookstore>
  <book category="fiction"><title>Gatsby</title><price>12.99</price></book>
  <book category="tech"><title>Clean Code</title><price>34.99</price></book>
</bookstore>'''

root = etree.fromstring(xml)

# Simple selection:
titles = root.xpath('//title/text()')
print(titles)  # ['Gatsby', 'Clean Code']

# Filter by attribute:
fiction_books = root.xpath('//book[@category="fiction"]')
for book in fiction_books:
    print(book.find('title').text)  # "Gatsby"

# Numeric result (sum() returns a float, so round before printing):
total_price = root.xpath('sum(//price)')
print(round(total_price, 2))  # 47.98

count = root.xpath('count(//book)')
print(int(count))  # 2

# With namespaces:
xml_ns = b'''<root xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>My Doc</dc:title>
</root>'''
root_ns = etree.fromstring(xml_ns)
titles = root_ns.xpath('//dc:title/text()', 
                       namespaces={'dc': 'http://purl.org/dc/elements/1.1/'})
print(titles)  # ['My Doc']

XPath vs CSS selectors

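Task | XPath | CSS selector
All descendants | //div | div
Direct child | //div/p | div > p
By attribute | //a[@href] | a[href]
By exact attribute value | //a[@class='nav'] | a[class='nav']
By class (one of several) | //a[contains(concat(' ', normalize-space(@class), ' '), ' nav ')] | a.nav
First of its type among siblings | //li[1] | li:first-of-type
Last of its type among siblings | //li[last()] | li:last-of-type
First in the whole document | (//li)[1] | No selector equivalent
Last in the whole document | (//li)[last()] | No selector equivalent
Contains text | //p[contains(text(),'hello')] | No equivalent
Parent selection | //li/.. | No direct equivalent (:has(> li) matches elements that have an li child)
Next sibling, if it is a p | //h2/following-sibling::*[1][self::p] | h2 + p
Select by text | //button[text()='Submit'] | No equivalent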

XPath is more expressive for data extraction: it can move up to parents and ancestors, filter by text content, and compute values such as counts and sums. CSS selectors are more concise for styling and for simple DOM traversal.
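Two comparisons between XPath and CSS are easy to get subtly wrong. This sketch, assuming lxml is installed and using two small made-up documents, shows that //h2/following-sibling::p[1] skips over intervening elements while the CSS h2 + p does not, and that @class='nav' is an exact string match while a.nav matches any element whose class list contains nav.

```python
from lxml import etree

# Made-up page: a div sits between the h2 and the first p.
page = etree.fromstring(
    "<body><h2>Intro</h2><div>ad</div><p>First para</p><p>Second</p></body>")

# Loose: the first p anywhere after the h2 (skips over the div).
print(page.xpath("//h2/following-sibling::p[1]/text()"))           # ['First para']

# Strict, like CSS h2 + p: the very next element sibling must be a p.
print(page.xpath("//h2/following-sibling::*[1][self::p]/text()"))  # []

# Made-up nav: exact class match vs. class-token match (like CSS a.nav).
nav = etree.fromstring('<nav><a class="nav">x</a><a class="nav active">y</a></nav>')
print(nav.xpath("//a[@class='nav']/text()"))  # ['x']
print(nav.xpath(
    "//a[contains(concat(' ', normalize-space(@class), ' '), ' nav ')]/text()"))
# ['x', 'y']
```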



Written by Mian Ali Khalid. Part of the Data & Format pillar.