XML Development: Tutorials and Beyond
XML is short for Extensible Markup Language. It is a highly structured markup language that is designed to be both human and machine readable. But XML is not a language in the way that HTML is a language. XML has no predefined tags of its own.
Instead, XML allows the coder to create any tags at all. And, more importantly, it allows those tags to be related to each other. So XML allows you to store data in a powerful way. But it doesn't provide any information on what ought to be done with that data. That's where XML-based languages come in: things like XHTML, RSS, and SOAP. XML is also a common way for programs like word processors and spreadsheets to save data in an application-independent way.
A Brief History of Markup Languages
Markup languages started as a way to combine the best elements of text files (readability of data) and binary files (precise description of data). So in the mid-1980s, the Standard Generalized Markup Language (SGML) was standardized. It was a text-based language that allowed data and its display to be precisely described. HTML was a very simple system that was based on SGML.
But when HTML became hugely popular as the basis of the World Wide Web, it became apparent that something better was needed. HTML was limited, and pages were often so poorly formed that browsers had to parse all kinds of sloppy code. For example, closing tags were often omitted and tag attributes were not placed inside quotation marks. Remember code like this?
<ul type=square>
  <li>Bugs Bunny
  <li>Daffy Duck
  <li>Foghorn Leghorn
</ul>
Poorly structured HTML couldn't be replaced with SGML, because SGML is ridiculously complicated. It would have been something like replacing HTML with PostScript. So in the mid-1990s, work began on XML. It is a subset of SGML that allows coders to describe data and its relationships. And with the use of style sheets, it can be used to format and transmit data in almost any way imaginable. But unlike SGML, writing parsing programs for it is fairly simple. And in early 1998, the W3C released the first XML standard.
Why Use XML?
This may all sound kind of abstract. After all, regardless of how powerful XML is at storing data, how does a web browser display anything but a list of data? But that's the point. The big problem with HTML in the early days was that data and layout information were scattered throughout a document. Remember when any kind of page layout had to be done with tables, making HTML code almost unreadable? Today, we use style sheets to separate the layout code from the information presented. Thus, once the layout is completed, it is a simple matter to maintain and add data.
But XML is not a replacement for HTML. In the most general sense, XML is a kind of human-readable database. But it can be turned into an HTML webpage (and a whole lot more!) by using another tool, the Extensible Stylesheet Language Transformations (or XSLT). It converts XML documents into other XML documents, such as XHTML documents. But even more interestingly, XML is used for things like RSS and SOAP.
A Basic Example
Let's start with a very basic example of how data is entered into an XML file.
<?xml version="1.0" ?>
<cartoon_characters>
  <character>
    <name>Bullwinkle</name>
    <intelligence>2</intelligence>
    <luck>10</luck>
  </character>
  <character>
    <name>Boris Badenov</name>
    <intelligence>4</intelligence>
    <luck>0</luck>
  </character>
</cartoon_characters>
Notice that none of these tags are defined by XML. They are defined by the coder. What XML does know (and this is critical) is that each character element belongs to cartoon_characters and that each character has the characteristics name, intelligence, and luck. Other characteristics (like species) as well as more characters (like Wrongway Peachfuzz) could be added and it wouldn't affect any XML parser.
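To make that last point concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree module. The species element and the read_characters helper are our own additions for illustration. The parser accepts the extra element without complaint, and code that only asks for name and luck keeps working.

```python
import xml.etree.ElementTree as ET

# The sample document from above, plus a hypothetical <species> element
# on one character to show that extra data does not break the parse.
XML_SOURCE = """<?xml version="1.0" ?>
<cartoon_characters>
  <character>
    <name>Bullwinkle</name>
    <species>Moose</species>
    <intelligence>2</intelligence>
    <luck>10</luck>
  </character>
  <character>
    <name>Boris Badenov</name>
    <intelligence>4</intelligence>
    <luck>0</luck>
  </character>
</cartoon_characters>"""


def read_characters(xml_text):
    """Return one dict per <character>, keyed by child tag name."""
    root = ET.fromstring(xml_text)
    return [
        {child.tag: child.text for child in character}
        for character in root.findall("character")
    ]


characters = read_characters(XML_SOURCE)
for entry in characters:
    # Only the fields we ask for matter; unknown fields are simply carried along.
    print(entry["name"], "has luck", entry["luck"])
```

Note that every value comes back as text. XML itself doesn't know that luck is a number, so converting it is up to the program that reads the data.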
We can take this a step further by creating an XSL transformation file that will create an XHTML file that displays the characters' names in an unordered list. First, we would have to add an extra line of code to the previous XML code, right after the first line that defines the file as XML. It would look like this:
<?xml version="1.0" ?>
<?xml-stylesheet type="text/xsl" href="bullwinkle.xsl"?>
<cartoon_characters>
. . .
Next, create an XSL file with the name "bullwinkle.xsl":
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml">
  <xsl:output method="xml" indent="yes" encoding="UTF-8"/>
  <xsl:template match="/">
    <html>
      <head>
        <title>Rocky and Bullwinkle Show</title>
      </head>
      <body>
        <h1>Cartoon Characters</h1>
        <ul>
          <xsl:for-each select="cartoon_characters/character">
            <li><xsl:value-of select="name"/></li>
          </xsl:for-each>
        </ul>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
Then load the original XML file, and it will display just like an XHTML file.
You can experiment with these files to get a better idea of what's going on. But what's most important is that you can leave the XSL file alone, while you add more and more data to the XML file.
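If you'd rather see the loop spelled out in a general-purpose language, here is a rough Python sketch that imitates what the for-each and value-of instructions do. It is not an XSLT processor, and the render_character_list function is a name we made up for this example. It only builds the list portion of the page, but it shows the same idea: the code stays fixed while the data grows.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def render_character_list(xml_text):
    """Build the <ul> of names, imitating the stylesheet's for-each loop."""
    root = ET.fromstring(xml_text)
    items = [
        "<li>" + escape(character.findtext("name", default="")) + "</li>"
        for character in root.findall("character")
    ]
    return "<ul>" + "".join(items) + "</ul>"


SMALL = (
    "<cartoon_characters>"
    "<character><name>Bullwinkle</name></character>"
    "<character><name>Boris Badenov</name></character>"
    "</cartoon_characters>"
)

# Add one more character to the data; the rendering code is untouched.
LARGER = SMALL.replace(
    "</cartoon_characters>",
    "<character><name>Wrongway Peachfuzz</name></character>"
    "</cartoon_characters>",
)

print(render_character_list(SMALL))
print(render_character_list(LARGER))
```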
XML is a huge subject. We've just dipped a toe into some very deep waters. Wikipedia lists roughly 200 XML languages. These include things like XHTML, of course. But they also include closely related XML tools like XML Encryption (for data encryption) and XML Signature (for digital signatures). But more than that, there are various important aspects to the language:
- Namespaces: a way to allow different datasets to exist in a single XML file without naming conflicts.
- Document Type Definitions: the dreaded DTD that website coders normally just copy and paste into their documents without understanding.
- Schema: a way of describing which elements and attributes an XML document may contain, which limits how it can be used.
- Database: a non-SQL approach to database storage. There are a number of different ones available.
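Namespaces are the easiest of these to show in a few lines. The sketch below, again in Python with xml.etree.ElementTree, puts two different title elements in one document. Both namespace URIs and the catalog element are made up for this example.

```python
import xml.etree.ElementTree as ET

# Both namespace URIs are hypothetical; any unique URI would do.
DOC = """<catalog xmlns:show="http://example.com/ns/show"
         xmlns:book="http://example.com/ns/book">
  <show:title>Rocky and Bullwinkle Show</show:title>
  <book:title>XML in a Nutshell</book:title>
</catalog>"""

# Map the prefixes we want to use in queries to the namespace URIs.
NAMESPACES = {
    "show": "http://example.com/ns/show",
    "book": "http://example.com/ns/book",
}

root = ET.fromstring(DOC)
show_title = root.find("show:title", NAMESPACES)
book_title = root.find("book:title", NAMESPACES)

print(show_title.text)
print(book_title.text)
# Internally, the parser stores each tag with its full namespace URI.
print(show_title.tag)
```

The two title elements never collide, because the parser treats the namespace as part of each element's name.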
There is an amazing amount of XML related material online. In fact, there is so much that it is overwhelming. As a result, we've tried to stick to just the core XML topics. But you will find links here that will answer just about any question you will ever have in your XML coding career.
- Introduction to XML: the basic W3 Schools introduction to XML — easy to understand with lots of examples
- XML Basics - An Introduction to XML: an old introduction, but one that takes you a long way with some advanced examples.
- Møller and Schwartzbach XML Tutorial: a basic but very broad introduction on XML.
- XML Master Basic Edition: a certification oriented tutorial that is very clear.
- Webucator's XML Free Tutorial: a detailed tutorial — an excellent choice after you run through one of the more simple tutorials.
- The Skew.org XML Tutorial: another advanced tutorial.
- XML Tutorial for Beginners: Portnov Computer School's introduction to XML.
- XML with Java: a free online course consisting of 13 video lectures by David J Malan.
- Computer Science E-75 Lecture 3: from the Harvard extension course "Building Dynamic Websites." This lecture focuses on XML. In less than two hours, it provides everything you need to know to create your own XML based webpages. Note: it assumes knowledge of PHP.
- W3C XML Page: everything about XML — especially upcoming events.
- W3C Archive: lots of recommendations and group notes. The general page has links to information about other XML related topics.
- Annotated XML 1.0 Specification: the raw specification can be hard to get through, but this version provides extra history, technical details, advice, and a whole lot more.
- Extensible Markup Language Frequently Asked Questions: a very basic FAQ for quick answers.
- The XML FAQ: a great collection of questions and answers about basic and advanced aspects of XML.
- XML Namespaces FAQ: detailed questions and answers about namespaces.
- XML and Databases: Ronald Bourret's thorough introduction on XML Databases. It includes an exhaustive list of links. Many of them are dead, but can be found on the Internet Archive.
- The Skew.org XML & XSLT Resources: mostly a bunch of XSLT examples, but other information as well, including its excellent list of links to all things XML related.
Given what a large subject XML is, it can be really helpful to have a book or two: to learn and for reference.
- Beginning XML, 4th Edition by David Hunter and Jeff Rafter: excellent introduction with detailed sections on things like RSS and SOAP.
- Learning XML, Second Edition by Erik Ray: a thorough introduction to XML.
- Beginning XML by Fawcett and Ayers: a basic introduction to XML.
- XML in a Nutshell by Harold and Means: a classic, but out of print and generally expensive. But you might be able to find a copy at a yard sale.
- XML Pocket Reference by St Laurent and Fitzgerald: just what it says — a booklet you can keep in your shirt pocket for reference.
- XML: The Complete Reference by Heather Williamson: an old thousand page reference; good to have around.
XML Coding Tools
- Altova XMLSpy: a complete XML integrated development environment for Microsoft Windows. It's fairly expensive, but for the professional developer, a good investment.
- <oXygen/> XML Editor: more than an editor, it provides debugging, profiling, and other tools. It is Java-based and so will run on any platform. It is also expensive, although it has reasonably priced academic and personal licenses available.
- Stylus Studio: a Microsoft Windows based XML development suite including editor and XSLT visual mapping tool. It is fairly expensive, but offers a reasonably priced home edition.
- EditiX XML Edit: a reasonably priced editor, debugger, and so on. It also offers a free EditiX Lite Version.
- Wikipedia's List of XML Editors: there are lots of editors available from open source to proprietary to web based.
XML is a great tool in part because it is highly standardized. This means that it is picky. So it is critical that you make sure that your code is valid XML. Many of the tools that we've highlighted here contain their own XML validators. But there are plenty of free XML validators to help you with your coding projects.
- The W3C Markup Validation Service: a general tool that allows you to validate by URI, file upload, and direct input.
- W3 Schools XML Validator: an easy to use, online validator.
- XML Validation: a simple online validator that allows direct input or file upload.
- Code Beautify XML Validator: a simple validator that also formats your code so that it is easy to read.
- XML Check: a standalone Windows XML validator.
- XML Schema Validator: a validator for your XML and schema definition.
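For a quick local check, a few lines of Python will tell you whether a document is at least well formed. This is only a sketch: it does not validate against a DTD or schema, which the tools above can do, and check_well_formed is a name we chose for illustration. The sloppy sample is a shortened version of the old-style HTML shown earlier.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return (True, None) if the text parses as XML, else (False, message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as error:
        return False, str(error)
    return True, None


GOOD = "<ul><li>Bugs Bunny</li><li>Daffy Duck</li></ul>"
# Unquoted attribute value and missing closing tags: fine for old browsers,
# but not well-formed XML.
SLOPPY = "<ul type=square> <li>Bugs Bunny <li>Daffy Duck </ul>"

print(check_well_formed(GOOD))
ok, message = check_well_formed(SLOPPY)
print(ok, message)
```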
XML and the Document Object Model
Because XML is so often used together with HTML, understanding how XML relates to the Document Object Model (DOM) is critical.
XML and HTML
The first time you heard of XML, you might have thought of XML as an alternative to Hypertext Markup Language (HTML). While we know XML can be used in that way, that's pretty unusual. Its greatest use is pulling data into an HTML document.
Let's look at a conceptual example.
Are you starting to see the power of XML? With this arrangement, the data displayed on a webpage can be updated dynamically by updating the referenced XML file, much in the same way a database can be used to update the contents of a webpage.
What is the Document Object Model?
Conceptualizing the XML DOM
The contents of the XML DOM can be manipulated with scripting. However, we have to understand the relationships between XML DOM elements, called nodes, before we can do anything with them.
Let's look at a simplified version of our earlier XML example code:
<?xml version="1.0" encoding="UTF-8"?>
<pets>
  <pet>
    <name>Max</name>
    <type>Dog</type>
    <birthday>July 7, 2014</birthday>
  </pet>
</pets>
The XML DOM is built of nodes. Every part of the XML DOM is a node.
- Document node: The entire contents of the XML document represent the document node.
- Root node: The first element in an XML document is called the root node. In this case, the root node is <pets>.
- Parent and child nodes: The terms parent and child are used to describe the relationship between DOM elements and the elements nested within them. In our sample code, the <pets> element node is the parent of the <pet> node, and the <pet> node has three children: name, type, and birthday. Every node in an XML document, except for the root node, has exactly one parent node and may have any number of child nodes.
- Sibling nodes: When two nodes are both the children of the same parent they are referred to as sibling nodes. In our example, name, type, and birthday are sibling nodes.
- Text node: The text contained within an element is defined as a text node within the XML DOM. This is an important distinction. If we want to get at the text in a text node, we need to refer to it as the value of the text node, not the value of the child node. In other words, the path to the text "Max" looks like this: pets > pet > name > text node > value:"Max"
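These relationships can be explored with any DOM implementation. The sketch below uses Python's standard xml.dom.minidom module rather than a browser, on the assumption that the property names match the W3C DOM closely enough to make the point. The sample is written without whitespace between tags so that the only text nodes are the ones holding real data.

```python
from xml.dom import minidom

# Same pets document as above, written compactly so that indentation
# does not show up as extra whitespace-only text nodes.
PETS_XML = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    "<pets><pet>"
    "<name>Max</name><type>Dog</type><birthday>July 7, 2014</birthday>"
    "</pet></pets>"
)

document = minidom.parseString(PETS_XML)  # document node
root = document.documentElement           # root node: <pets>
pet = root.firstChild                     # child of <pets>
name = pet.firstChild                     # first child of <pet>
text_node = name.firstChild               # text node inside <name>

print(root.tagName)                                  # the root element
print(pet.parentNode.tagName)                        # parent of <pet>
print([child.tagName for child in pet.childNodes])   # children of <pet>
print(name.nextSibling.tagName)                      # a sibling of <name>
print(name.nodeValue)                                # element nodes have no value
print(text_node.nodeValue)                           # the text lives one level down
```

The last two lines show why the text node distinction matters: asking the name element for its value gives nothing, while asking its text node gives "Max".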
Manipulating the XML DOM
- nodeValue: Gets the value contained within the node.
- parentNode: References the parent node. If we were to apply this property to the name node in our sample XML, we would be referring to the pet node.
- childNodes: References a node's children. If applied to the pet node in our code above, this property would return the name, type, and birthday nodes.
- appendChild: This method is used to add child nodes to a node.
- removeChild: Removes a node from a parent node. Keep in mind that the data will remain in the original XML file; it is only removed from the DOM built by the browser.
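Here is a short sketch of appendChild and removeChild at work, once more using Python's xml.dom.minidom as a stand-in for the browser's DOM. The color element is a hypothetical addition for this example.

```python
from xml.dom import minidom

SOURCE = (
    "<pets><pet>"
    "<name>Max</name><type>Dog</type><birthday>July 7, 2014</birthday>"
    "</pet></pets>"
)

document = minidom.parseString(SOURCE)
pet = document.documentElement.firstChild

# appendChild: add a hypothetical <color> element with a text node inside it.
color = document.createElement("color")
color.appendChild(document.createTextNode("Brown"))
pet.appendChild(color)

# removeChild: drop <birthday> from the in-memory tree.
birthday = pet.getElementsByTagName("birthday")[0]
pet.removeChild(birthday)

print([child.tagName for child in pet.childNodes])
print(pet.toxml())
# The original source text is untouched; only the DOM tree changed.
print("birthday" in SOURCE)
```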
There seems to be an endless number of online tutorials, and some are much better than others. After looking at dozens of XML DOM tutorials, we think the following tutorials will get you up-to-speed the fastest.
- W3Schools: XML Tutorial and XML DOM Tutorial.
- Microsoft Developer Network: A Beginner's Guide to the XML DOM
- Sitepoint: A Really, Really, Really Good Introduction to XML, a 25,000 word long intro to XML that covers DOM manipulation. This tutorial is actually the first four chapters of a much longer book called No Nonsense XML Web Development With PHP by Thomas Myer.
- Tutorials Point: XML DOM Tutorial.
- Mozilla Developer Network: Introduction to the DOM.
If you prefer a learning format that offers a bit more structure than a tutorial, you might be interested in one of the following online courses that cover XML and the XML DOM.
XML has been around for a long time. As a result, many XML texts have been written over the years. Below are some modern XML titles that cover the XML DOM and are highly rated by readers:
- Beginning XML by Fawcett, et al.
- New Perspectives on HTML, CSS, and XML by Patrick Carey
- XML Programming Success in a Day by Sam Key
- XML in a Nutshell by Harold and Means
- Beginning XML with DOM and Ajax by Sas Jacobs.
Serious HTML Coders Should Know XML
XML is a powerful and simple language for transporting data in a format that can be used in many different ways. The XML DOM is the model built by the browser to interact with and manipulate XML data. Once you understand how to work with the XML DOM you'll be able to get, change, and style XML data for use in webpages and applications.
MSXML: Microsoft's XML
Microsoft XML Core Services (MSXML) is a set of Microsoft tools and services for the creation of XML-based applications using Microsoft development tools.
MSXML is actually a set of World Wide Web Consortium (W3C) compliant application programming interfaces (APIs), widely used by countless software developers.
Brief MSXML History
Over the years, MSXML has gone through numerous updates and releases, usually being released alongside other Microsoft products like Internet Explorer or Microsoft Office.
- MSXML 1.0 was released in 1997 and shipped with Internet Explorer 4.0.
- MSXML 2.0a was released in 1999 and shipped with Internet Explorer 5.0.
- MSXML 2.5 was released in 2000 and shipped with Windows 2000, Internet Explorer 5.01, and MDAC 2.5.
- MSXML 2.6 was released in 2000 and shipped with Microsoft SQL Server 2000 and MDAC 2.6.
- MSXML 3.0 was released in 2001 and shipped with Windows XP, Internet Explorer 6.0, and MDAC 2.7.
- MSXML 4.0 was released in 2001 as an independent software development kit (SDK).
- MSXML 5.0 was released in 2003 and shipped with Microsoft Office 2003 and Office 2007.
- MSXML 6.0 was released in 2005 and shipped with Microsoft SQL Server 2005, Visual Studio 2005, .NET Framework 3.0, Windows Vista, Windows 7, and Windows XP Service Pack 3.
MSXML versions 1.0, 2.0a, 2.5, 2.6, and 4.0 are obsolete and deprecated, while versions 3.0, 5.0 and 6.0 continue to be supported by Microsoft.
MSXML is the native Windows API for XML-based applications, conforming to the XML 1.0 standard.
Some of the services provided by MSXML include:
- Document Object Model (DOM): a library for accessing XML documents.
- Simple API for XML (SAX): a programmatic alternative to DOM processing.
- XMLHttpRequest and Server XMLHTTPRequest: for implementing AJAX and RESTful applications.
- XPath 1.0: queries over DOM documents.
- XSLT 1.0: XML transformations.
- XSD 1.0: support for the specification with the XmlSchemaCache.
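To see how SAX differs from DOM, it helps to look at a tiny event-driven parser. MSXML itself is a Windows COM library, so the sketch below does not use it. Instead it uses Python's standard xml.sax module purely to illustrate the SAX style. The NameCollector class and the second pet are hypothetical.

```python
import xml.sax


class NameCollector(xml.sax.ContentHandler):
    """Collect the text of every <name> element as the parser streams past it."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._inside_name = False
        self._buffer = []

    def startElement(self, tag, attrs):
        if tag == "name":
            self._inside_name = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect the pieces.
        if self._inside_name:
            self._buffer.append(content)

    def endElement(self, tag):
        if tag == "name":
            self.names.append("".join(self._buffer))
            self._inside_name = False


XML_BYTES = (
    b"<pets>"
    b"<pet><name>Max</name><type>Dog</type></pet>"
    b"<pet><name>Bella</name><type>Cat</type></pet>"
    b"</pets>"
)

handler = NameCollector()
xml.sax.parseString(XML_BYTES, handler)
print(handler.names)
```

Unlike DOM, nothing here builds a tree in memory. The parser calls the handler as it reads, which is why SAX is often preferred for very large documents.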
All new applications should be written to comply with MSXML 6.0, the latest version of MSXML, or XmlLite, a lightweight XML parser for native code projects.
MSXML services are programmatically exposed as Object Linking and Embedding (OLE) automation components, and can be used by developers working in the native programming languages C and C++, or in the active scripting languages JScript and VBScript.
Use of MSXML Component Object Model (COM) components is not recommended or supported if you are writing managed code targeting the .NET Framework in C#, Visual Basic, managed C++, or any other managed programming language. MSXML uses specific threading modes and garbage collection routines that are not compatible with the .NET Framework. XML functionality should be implemented in .NET applications using classes from the System.Xml namespace or LINQ to XML, both native to the .NET Framework. Using MSXML in .NET applications through COM interoperability can result in unexpected problems that are difficult to debug.
MSXML is often used in processing XML in web applications, or as a standalone process using the Document Object Model (DOM). DOM and the Simple API for XML (SAX2) can be utilized in any programming language that is capable of using ActiveX or COM objects.
Should I Use and Learn MSXML?
If your programming work revolves around applications using the .NET Framework, you do not need to worry about MSXML, since using it in .NET projects is not recommended.
On the other hand, if you work on native code or scripting programming language projects that interact with XML, you will probably be using MSXML or its lightweight alternative, XmlLite.
Many open-source alternatives to MSXML exist, for example NativeXML, but you must choose an alternative that will support your programming language.
If you work on programs that interact with XML, and those programs do not rely on the .NET Framework, you should take a look at the following resources on MSXML:
- The Microsoft Developer Network MSXML Documentation section provides a full overview and documentation for MSXML.
- The Microsoft Developer Network Learn MSXML section provides useful resources like a beginner's guide, tutorials, and a user forum covering the use of MSXML. You can also download MSXML6 from this page.
Books that cover MSXML specifically are quite rare, in part due to the fact that there are ample MSXML resources available online. Also, many books about scripting language programming have chapters on MSXML. In some cases, these chapters are quite comprehensive and in-depth, while others merely offer a basic overview of MSXML.
- XML Application Development with Msxml 4.0 (2002) by Ayers, et al: this book covers MSXML 4.0, now considered obsolete. Despite that, readers may find many examples useful.
Should You Invest Time in Learning MSXML?
While MSXML has not been deprecated, and is still in widespread use, its long-term relevance is up for debate. Development has slowed down to a crawl and MSXML's time has obviously come and gone.
However, MSXML is still used in many projects, although the range of its potential applications is dwindling. For starters, it should not be used with the .NET Framework. It's not the only way of working with XML, either. Various open-source alternatives are available, but dealing with each one of them is beyond the scope of this article.
In case you still want to master MSXML, or merely brush up your old skills, you might have a hard time finding fresh resources. A lot of MSXML resources, especially books and other print resources, are woefully outdated and cover deprecated versions of MSXML. This does not render them useless, but it does limit their usefulness and forces you to double-check much of what you read, just to make sure it applies to MSXML 6.0.
MSXML 6.0 was released more than a decade ago, and while Microsoft still supports it (technically), it's obvious that the end of the road for MSXML is near.
XML itself is pretty straightforward. For an experienced XHTML coder, it can seem almost trivial. But there are so many related technologies and so much that can be done with it that you could spend the rest of your life doing nothing else. We've just scratched the surface here.