XPath Introduction and Resources
XPath is a query language used to locate and select content in an XML or HTML document. The language is defined by a W3C standard.
XPath is a more powerful alternative to CSS Selectors. CSS Selectors are easier to use, but they are not always practical if the document author has omitted common attributes such as class. XPath can specify any node in a DOM tree by its position, its text content, or its relationship to other nodes, even without these attributes. This makes it well suited to web scraping and document analysis.
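Here is a minimal sketch of selecting content without relying on class attributes. It uses Python's standard-library `xml.etree.ElementTree`, which supports only a limited subset of XPath. Full XPath 1.0 needs a library such as lxml, and real-world HTML usually needs an HTML-aware parser because it is often not well-formed XML. The sample document and variable names are hypothetical. Neither expression depends on a `class` attribute: the first locates paragraphs by position, and the second locates them by the text of a sibling heading.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed XHTML-style document with no class attributes.
DOC = """<html>
  <body>
    <div>
      <h2>Navigation</h2>
      <p>Home</p>
    </div>
    <div>
      <h2>Prices</h2>
      <p>Widget: $10</p>
      <p>Gadget: $25</p>
    </div>
  </body>
</html>"""

root = ET.fromstring(DOC)  # root is the <html> element

# By structure and position: every <p> inside the second <div> of <body>.
by_position = [p.text for p in root.findall("./body/div[2]/p")]

# By content: the <div> with an <h2> child whose text is exactly "Prices",
# then the last <p> inside that div.
by_content = [p.text for p in root.findall(".//div[h2='Prices']/p[last()]")]

print(by_position)  # ['Widget: $10', 'Gadget: $25']
print(by_content)   # ['Gadget: $25']
```

Both path strings are also valid XPath 1.0 expressions, so they should carry over to a full XPath engine unchanged.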
The language got a lot of attention when it was introduced in 1999. It is still useful and thriving today, but there are fewer contemporary tutorials than you might expect, so we've collected the best available resources for learning and using XPath.
- XPath Tutorial from W3Schools is a multi-part, in-depth explanation of XPath, with lots of practical examples and a good explanation of how XPath is related to other XML standards.
- XPath Tutorial and Training is a comprehensive training course on XPath from Altova, the makers of a popular XML editing tool.
- XPath Overview from Tutorials Point provides an easy-to-follow introduction to the language.
- XPath Tutorial is a community-written tutorial on XPath basics from the Edutech Wiki.
- The 10-Minute XPath Tutorial is a Perl-focused introduction to XPath, primarily geared toward system administrators.
- XML XPath Tutorial is a Java-based introduction to XPath.
- XPath Syntax is a highly technical tutorial on XPath, with a focus on the .NET implementation, from the Microsoft Developer Network.
- How XPath Works is a Java-focused introduction to XPath, from Oracle.
- What Can XPath Do for Me? is an introduction to XPath aimed mostly at non-technical academics working in the humanities. This is a very good place to start if you are using XPath to query documents for scholarly research.
- XPath Tutorial is a gentle introduction to using XPath, with a focus on searching ebook content.
- XPath Tutorial Application is an interesting meta-tutorial from the Microsoft Developer Network. It helps you learn XPath by showing how to build an application that uses XPath to teach XPath.
- Learning XPath by Example is a visual tutorial on the language.
- XPath Tutorial from Tizag is an easy-to-read introduction to XPath and XML.
- XPath for Web Scraping explains how to use XPath to programmatically extract content from web pages with Python.
- PHP Scraping Using DOM and XPath Tutorial explains how to use XPath and PHP to programmatically extract content from web pages.
- Mozilla Developer Network has a wide array of XPath-related documentation, tutorials, reference materials, and tools.
- XPath Tutorial for Selenium explains how to use the language with Selenium, a popular browser automation tool used for testing and web scraping.
- XSLT and XPath Quick Reference (PDF) is a pocket reference, designed to be printed, folded, and carried with you.
- XPath Examples is an index of example XPath expressions, with explanations.
- XPath Locator Examples is a cookbook-style collection of XPath examples that can help you build complex and powerful XPath queries.
- Free Online XPath Tester is a browser-based tool that lets you test expressions against an online document specified by URL.
- XPath Query Expression Tool lets you test queries against XML documents or snippets pasted into a panel in the browser.
- XPath-Tools is a set of command-line utilities for extracting data from HTML and XML documents.
- XPath Visualizer is a Windows desktop tool that provides a visual representation of an XML or HTML tree, and the results of XPath queries performed against it.
- XMLSpy, an XML editor, has a built-in XPath Editor and Debugger that provides a number of tools for working with XPath, including auto-completion, deep path suggestions, and multi-file evaluations.
- Stylus Studio has several useful XPath tools, including a visual expression generator that will help you build an XPath query by selecting content within a document. Their XPath tutorials are also worth checking out.
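The scraping tutorials in this list show how XPath pulls specific content out of a page. Full-featured XPath in Python generally comes from a third-party library such as lxml. As a rough stand-in that needs only the standard library, this sketch applies `xml.etree.ElementTree`'s limited XPath subset to a hypothetical, well-formed page fragment. It filters links by the presence of an attribute, then by an attribute's value, and then steps back up to each link's parent element. The markup and variable names are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment (real pages often need an
# HTML-aware parser before XPath can be applied).
PAGE = """<html>
  <body>
    <ul>
      <li><a href="/docs">Docs</a></li>
      <li><a>No link target</a></li>
      <li><a href="/blog" rel="nofollow">Blog</a></li>
    </ul>
  </body>
</html>"""

root = ET.fromstring(PAGE)

# [@href] keeps only <a> elements that actually carry an href attribute.
links = [(a.text, a.get("href")) for a in root.findall(".//a[@href]")]

# [@rel='nofollow'] filters on the value of an attribute.
nofollow = [a.get("href") for a in root.findall(".//a[@rel='nofollow']")]

# ".." steps from each matching <a> back up to its parent (<li>).
parents = [el.tag for el in root.findall(".//a[@href]/..")]

print(links)     # [('Docs', '/docs'), ('Blog', '/blog')]
print(nofollow)  # ['/blog']
print(parents)   # ['li', 'li']
```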
Libraries and Implementations
- XPath module for the GNOME XML C Parser provides XPath support in C, and also has bindings to Python, Perl, C++, PHP, Pascal, Ruby, and Tcl.
- XPathTool is a Java class that provides access to XPath nodes.
- XPath Library for the OCaml programming language.
- XPath gem provides XPath support in Ruby.
- xpath npm package provides an XPath implementation and helpers for Node.js.
- LuaXPath is a simple XPath implementation in the Lua programming language.
Books on XPath
- XPath 2.0 Programmer's Reference (2004), by Michael Kay, is the definitive classic reference work on XPath.
- Definitive XSLT and XPath (2001), by G. Ken Holman, is the authoritative guide to XPath and XSLT. It is highly technical and also provides much of the philosophical and theoretical background to how XML is designed and what is actually contained in the specifications. There are easier books for learning how to use XPath, but few that will help you really understand it in this much depth.
- Python and XML (2001), by Jones and Drake, includes sections on using Python to query and manipulate XML documents via XPath.
- XPath Kick Start: Navigating XML with XPath 1.0 and 2.0 (2003), by Steven Holzner, is a concise book designed for beginners.
- XPath and XPointer: Locating Content in XML Documents (2002), by John Simpson, is a relatively short book covering XPath basics. Its speculation on the future of the standard, from its 2002 vantage point, is a little dated now, but the primary content is still highly relevant.
- XSLT and XPath On The Edge (2001), by Jeni Tennison, is a cookbook-style reference manual with tons of highly useful example queries.
- Beginning XSLT and XPath: Transforming XML Documents and Data (2009), by Ian Williams, is a very good introduction to using XSLT and XPath. Because it was written several years after most other popular books on the topic, it benefits from the experience gained with the standards in the meantime.
XPath may not seem trendy right now. When it was first released, many people expected XML to become the standard language for web markup. But HTML5 broke away from strict XML, and JSON has displaced XML as the dominant data serialization format.
However, XPath is as relevant as ever. It remains one of the most reliable ways to query information in an XML (or HTML) document, and it is the basis for XSLT. If you're interested in web scraping, web search and indexing, or document analysis, XPath continues to be an important skill.
Further Reading and Resources
We have more guides, tutorials, and infographics related to coding and development:
- XML Resources and Validators: learn all about XML itself.
- MSXML Introduction and Resources: this will get you going with Microsoft XML Core Services (MSXML), which will help you build XML-based applications.
- Composing Good HTML: learn to write code that all browsers on all devices can display properly.