XPath Introduction and Resources

XPath is a query language used to locate and select content in an XML or HTML document. The language is defined by a W3C standard.

XPath is a more powerful alternative to CSS Selectors. CSS Selectors are easier to use, but they are not always feasible if the document author has omitted common attributes such as id and class. XPath provides a way to specify any node in a DOM tree, even without these attributes. This makes it well-suited for web scraping and document analysis.
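As a rough illustration of selecting nodes that carry no id or class, here is a minimal sketch using Python's standard-library ElementTree module, which supports only a limited subset of XPath. The sample markup and variable names are made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: no element has an id or class attribute.
DOC = """
<html>
  <body>
    <div>
      <p>Navigation</p>
    </div>
    <div>
      <p>First article paragraph</p>
      <p>Second article paragraph</p>
    </div>
  </body>
</html>
""".strip()

root = ET.fromstring(DOC)  # root is the <html> element

# Select purely by position: the second <div> under <body>, then its first <p>.
first_para = root.find("./body/div[2]/p[1]")
print(first_para.text)

# last() picks the final <p> child of that same <div>.
last_para = root.find("./body/div[2]/p[last()]")
print(last_para.text)
```

Positional paths like these are brittle if the page layout changes, so they are usually a fallback for documents that offer nothing better to match on.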

The language got a lot of attention when it was introduced in 1999. It is still useful and thriving today, but there aren't as many contemporary tutorials as there might be. So we've collected the best available resources for learning and using XPath.

Tutorials

  • XPath Tutorial from W3Schools is a multi-part, in-depth explanation of XPath, with lots of practical examples and a good explanation of how XPath is related to other XML standards.
  • XPath Tutorial and Training is a comprehensive training course on XPath from Altova, the makers of a popular XML editing tool.
  • XPath Overview from Tutorials Point provides an easy-to-follow introduction to the language.
  • XPath Tutorial is a community-written tutorial on XPath basics from the Edutech Wiki.
  • The 10-Minute XPath Tutorial is a Perl-focused introduction to XPath, primarily geared toward system administrators.
  • XML XPath Tutorial is a Java-based introduction to XPath.
  • XPath Syntax is a highly technical tutorial on XPath, with a focus on .NET implementation, from Microsoft Developer Network.
  • How XPath Works is a Java-focused introduction to XPath, from Oracle.
  • What Can XPath Do for Me? is an introduction to XPath, for the benefit of mostly non-tech-savvy academics working in the humanities. This is a very good place to start if you are using XPath to query documents for scholarly research.
  • XPath Tutorial is a gentle introduction to using XPath, with a focus on searching ebook content.
  • XPath Tutorial Application is an interesting meta-tutorial from Microsoft Developer Network. It helps you learn XPath by showing how to build an application which uses XPath to teach XPath.
  • Learning XPath by Example is a visual tutorial on the language.
  • XPath Tutorial from Tizag is an easy-to-read introduction to XPath and XML.
  • XPath for Web Scraping explains how to use XPath to programmatically extract content from web pages with Python.
  • PHP Scraping Using DOM and XPath Tutorial explains how to use XPath and PHP to programmatically extract content from web pages.
  • Mozilla Developer Network has a wide array of XPath-related documentation, tutorials, reference materials, and tools.
  • XPath Tutorial for Selenium explains how to use the language with Selenium, a popular browser automation tool used for testing and web scraping.

Reference

Tools

  • Free Online XPath Tester lets you test expressions against an online document via its URL.
  • XPath Query Expression Tool lets you test queries against XML documents or snippets pasted into a panel in the browser.
  • XPath-Tools is a set of command-line utilities for extracting data from HTML and XML documents.
  • XPath Visualizer is a Windows desktop tool that provides a visual representation of an XML or HTML tree, and the results of XPath queries performed against it.
  • XMLSpy, an XML editor, has a built-in XPath Editor and Debugger that provides a number of tools for working with XPath, including auto-completion, deep path suggestions, and multi-file evaluations.
  • Stylus Studio has several useful XPath tools, including a visual expression generator that will help you build an XPath query by selecting content within a document. Their XPath tutorials are also worth checking out.

Libraries and Implementations

Books on XPath

  • XPath 2.0 Programmer's Reference (2004), by Michael Kay, is the definitive classic reference work on XPath.
  • Definitive XSLT and XPath (2001), by G Ken Holman, is the authoritative guide to XPath and XSLT. It is highly technical and also provides much of the philosophical and theoretical background to how XML is designed and what is actually contained in the specifications. There are easier books for learning how to use XPath, but few that will help you really understand it in this much depth.
  • Python and XML (2001), by Jones and Drake, includes sections on using Python to query and manipulate XML documents via XPath.
  • XPath Kick Start: Navigating XML with XPath 1.0 and 2.0 (2003), by Steven Holzner, is a concise book designed for beginners.
  • XPath and XPointer: Locating Content in XML Documents (2002), by John Simpson, is a relatively short book covering XPath basics. Its speculation on the future of the standard, from its 2002 vantage point, is a little dated now, but the primary content is still highly relevant.
  • XSLT and XPath On The Edge (2001), by Jeni Tennison, is a cookbook-style reference manual with tons of highly useful example queries.
  • Beginning XSLT and XPath: Transforming XML Documents and Data (2009), by Ian Williams, is a very good introduction to using XSLT and XPath. Written a few years later than most other popular books on the topic, this book has the benefit of several years of experience with the standard.

Summary

XPath may not seem trendy right now. When it was first released, most people thought XML was going to become the standard language for web markup. But HTML5 broke away from strict XML, and JSON has displaced XML as the dominant data serialization format.

However, XPath is as relevant as ever. It is still the most reliable way to query information in an XML (or HTML) document, and is the basis for XSLT. If you're interested in web scraping, web search and indexing, or document analysis, XPath continues to be an important skill.
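To give a flavor of the kind of query involved in document analysis, the sketch below filters elements by attribute value, again using Python's standard-library ElementTree and its limited XPath subset. The catalog data, element names, and attribute names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog used only for illustration.
CATALOG = """
<catalog>
  <section name="fiction">
    <book lang="en"><title>Title A</title></book>
    <book lang="fr"><title>Title B</title></book>
  </section>
  <section name="reference">
    <book lang="en"><title>Title C</title></book>
  </section>
</catalog>
""".strip()

root = ET.fromstring(CATALOG)  # root is the <catalog> element

# ".//" searches all descendants; the predicate keeps only English-language books.
english_titles = [t.text for t in root.findall(".//book[@lang='en']/title")]
print(english_titles)

# Predicates can also narrow an intermediate step in the path.
reference_books = root.findall("./section[@name='reference']/book")
print(len(reference_books))
```

For real HTML pages or full XPath 1.0 support, a dedicated library is typically needed; the tutorials listed above cover several options.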

