XML Development: Tutorials and Beyond

XML is short for Extensible Markup Language. It is a highly structured markup language that is designed to be both human and machine readable. But XML is not a language in the way that HTML is a language. XML has no predefined tags like <p>.

Instead, XML allows the coder to create any tags at all. More importantly, it allows those tags to be related to each other. So XML allows you to store data in a powerful way. But it doesn't provide any information on what ought to be done with that data. That's where XML-based languages come in: things like XHTML, RSS, and SOAP. XML is also a common way for programs like word processors and spreadsheets to save data in an application-independent way.

Using XML

A Brief History of Markup Languages

Markup languages started as a way to combine the best elements of text files (readability of data) and binary files (precise description of data). So in the mid-1980s, the Standard Generalized Markup Language (SGML) was created; it became an ISO standard in 1986. It was a text-based language that allowed data and its display to be precisely described. HTML was a very simple system that was based on SGML.

But when HTML became hugely popular as the basis of the World Wide Web, it became apparent that something better was needed. HTML was limited, and it was written so loosely that browsers had to parse all kinds of sloppy code. For example, closing tags were often omitted and attribute values were not placed inside quotation marks. Remember code like this?


<ul type=square>
<li>Bugs Bunny
<li>Daffy Duck
<li>Foghorn Leghorn
</ul>

Enter XML

Poorly structured HTML couldn't be replaced with SGML, because SGML is ridiculously complicated. It would have been something like replacing HTML with PostScript. So in the mid-1990s, work began on XML. It is a subset of SGML that allows coders to describe data and its relationships. And with the use of style sheets, it can be used to format and transmit data in almost any way imaginable. But unlike SGML, writing parsing programs for it is fairly simple. And in early 1998, the W3C released the first XML standard.

Why Use XML?

This may all sound kind of abstract. After all, regardless of how powerful XML is at storing data, how does a web browser display anything but a list of data? But that's the point. The big problem with HTML in the early days was that data and layout information were scattered throughout a document. Remember when any kind of page layout had to be done with tables, making HTML code almost unreadable? Today, we use style sheets to separate the layout code from the information presented. Thus, once the layout is completed, it is a simple matter to maintain and add data.

But XML is not a replacement for HTML. In the most general sense, XML is a kind of human-readable database. But it can be turned into an HTML webpage (and a whole lot more!) by using another tool, Extensible Stylesheet Language Transformations (XSLT). XSLT converts XML documents into other XML documents, such as XHTML documents. But even more interestingly, XML is used for things like RSS and SOAP.

A Basic Example

Let's start with a very basic example of how data is entered into an XML file.


<?xml version="1.0" ?>
<cartoon_characters>
  <character>
    <name>Bullwinkle</name>
    <intelligence>2</intelligence>
    <luck>10</luck>
  </character>
  <character>
    <name>Boris Badenov</name>
    <intelligence>4</intelligence>
    <luck>0</luck>
  </character>
</cartoon_characters>

Notice that none of these tags are defined by XML. They are defined by the coder. What an XML parser does know (and this is critical) is that each character element is a child of cartoon_characters and that each character has the child elements name, intelligence, and luck. Other characteristics (like species) as well as more characters (like Wrongway Peachfuzz) could be added, and any XML parser would still read the file without trouble.
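
To make that concrete, here is a minimal sketch of reading the file with a parser. It uses Python's standard xml.etree.ElementTree module rather than any tool named in this article, and the species element and the read_characters helper are assumptions added purely for illustration.

```python
import xml.etree.ElementTree as ET

# The XML from the example above, plus one extra <species> element on the
# first character (a hypothetical addition) to show that the parser does
# not care which tags the coder invents.
XML_TEXT = """<?xml version="1.0" ?>
<cartoon_characters>
  <character>
    <name>Bullwinkle</name>
    <species>Moose</species>
    <intelligence>2</intelligence>
    <luck>10</luck>
  </character>
  <character>
    <name>Boris Badenov</name>
    <intelligence>4</intelligence>
    <luck>0</luck>
  </character>
</cartoon_characters>
"""


def read_characters(xml_text):
    """Return one dict per <character>, keyed by child tag name."""
    root = ET.fromstring(xml_text)
    return [{child.tag: child.text for child in character}
            for character in root.findall("character")]


characters = read_characters(XML_TEXT)
for entry in characters:
    # Characters without a <species> element simply lack that key.
    print(entry["name"], "-", entry.get("species", "species not given"))
```

The parser only reports the structure it finds; deciding what intelligence or luck mean is left entirely to the program that reads the data.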

We can take this a step further by creating an XSL transformation file that will create an XHTML file that displays the characters' names in an unordered list. First, we would have to add an extra line of code to the previous XML code, right after the first line that defines the file as XML. It would look like this:


<?xml version="1.0" ?>
<?xml-stylesheet type="text/xsl" href="bullwinkle.xsl"?>
<cartoon_characters>
.
.
.

Next, create an XSL file with the name "bullwinkle.xsl":


<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns="http://www.w3.org/1999/xhtml">
<xsl:output method="xml" indent="yes" encoding="UTF-8"/>
<xsl:template match="/">
  <html>
    <head> <title>Rocky and Bullwinkle Show</title> </head>
    <body>
      <h1>Cartoon Characters</h1>
      <ul>
        <xsl:for-each select="cartoon_characters/character">
          <li><xsl:value-of select="name"/></li>
        </xsl:for-each>
      </ul>
    </body>
  </html>
</xsl:template>
</xsl:stylesheet>

Then load the original XML file in a web browser, and it will display just like an XHTML file.

You can experiment with these files to get a better idea of what's going on. But what's most important is that you can leave the XSL file alone, while you add more and more data to the XML file.
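
If you don't have an XSLT processor handy, the idea can still be sketched in ordinary code. The following is not XSLT: it is a hand-written Python stand-in that mimics what the for-each loop in bullwinkle.xsl does, using only the standard library. The function name characters_to_xhtml and the shortened sample documents are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def characters_to_xhtml(xml_text):
    """Build the same <ul> of names that the stylesheet's for-each produces."""
    root = ET.fromstring(xml_text)
    items = ["<li>" + escape(character.findtext("name", default="")) + "</li>"
             for character in root.findall("character")]
    return "<ul>" + "".join(items) + "</ul>"


# Shortened sample data: only the <name> elements matter to this transform.
two = ("<cartoon_characters>"
       "<character><name>Bullwinkle</name></character>"
       "<character><name>Boris Badenov</name></character>"
       "</cartoon_characters>")

# Adding a character changes the data only; the transform stays untouched.
three = two.replace(
    "</cartoon_characters>",
    "<character><name>Wrongway Peachfuzz</name></character>"
    "</cartoon_characters>")

print(characters_to_xhtml(two))
print(characters_to_xhtml(three))
```

The same function handles both documents, which mirrors the point above: the transformation is written once and the data can keep growing.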

There's More

XML is a huge subject. We've just dipped a toe into some very deep waters. Wikipedia lists roughly 200 XML languages. These include things like XHTML, of course. But they also include closely related XML tools like XML Encryption (for data encryption) and XML Signature (for digital signatures). But more than that, there are various important aspects to the language:

  • Namespaces: a way to allow different datasets to exist in a single XML file without naming conflicts.
  • Document Type Definitions: the dreaded DTD that website coders normally just copy and paste into their documents without understanding.
  • Schema: a way of defining which elements, attributes, and data types an XML document may contain, so that documents can be checked against the definition.
  • Database: a non-SQL approach to database storage. There are a number of different ones available.
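
As a small illustration of the first item, here is a sketch of two vocabularies that both define a name element living in one document. It uses Python's standard xml.etree.ElementTree; the example.com namespace URIs, the catalog element, and the sample values are all invented for the example.

```python
import xml.etree.ElementTree as ET

# Two vocabularies that both use <name>. The prefixes keep them apart.
# The namespace URIs below are placeholders, not real vocabularies.
XML_TEXT = """<catalog xmlns:show="http://example.com/show"
         xmlns:actor="http://example.com/actor">
  <show:name>Rocky and Bullwinkle Show</show:name>
  <actor:name>Example Voice Actor</actor:name>
</catalog>"""

# Map the prefixes used in our queries to the namespace URIs.
NAMESPACES = {
    "show": "http://example.com/show",
    "actor": "http://example.com/actor",
}

root = ET.fromstring(XML_TEXT)
show_name = root.find("show:name", NAMESPACES)
actor_name = root.find("actor:name", NAMESPACES)

print(show_name.text)
print(actor_name.text)
# Internally, each tag is qualified by its namespace URI, not its prefix.
print(show_name.tag)
```

Because each name is qualified by a different URI, the parser treats them as different elements even though the local tag name is the same.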

Online Resources

There is an amazing amount of XML related material online. In fact, there is so much that it is overwhelming. As a result, we've tried to stick to just the core XML topics. But you will find links here that will answer just about any question you will ever have in your XML coding career.

Tutorials

Video Tutorials

  • XML Tutorial for Beginners: Portnov Computer School's introduction to XML.
  • XML with Java: a free online course consisting of 13 video lectures by David J Malan.
  • Computer Science E-75 Lecture 3: from the Harvard extension course "Building Dynamic Websites." This lecture focuses on XML. In less than two hours, it provides everything you need to know to create your own XML-based webpages. Note: it assumes knowledge of PHP.

Data Sources

Advanced Topics

Books

Given what a large subject XML is, it can be really helpful to have a book or two, both to learn from and for reference.

Learning XML

XML Reference

  • XML in a Nutshell by Harold and Means: a classic, but out of print and generally expensive. But you might be able to find a copy at a yard sale.
  • XML Pocket Reference by St Laurent and Fitzgerald: just what it says — a booklet you can keep in your shirt pocket for reference.
  • XML: The Complete Reference by Heather Williamson: an old thousand page reference; good to have around.

XML Coding Tools

  • Altova XMLSpy: a complete XML integrated development environment for Microsoft Windows. It's fairly expensive, but for the professional developer, a good investment.
  • <oXygen/> XML Editor: more than an editor, it provides debugging, profiling, and other tools. It is Java-based and so will run on any platform. It is also expensive, although it has reasonably priced academic and personal licenses available.
  • Stylus Studio: a Microsoft Windows based XML development suite including editor and XSLT visual mapping tool. It is fairly expensive, but offers a reasonably priced home edition.
  • EditiX XML Edit: a reasonably priced editor, debugger, and so on. It also offers a free EditiX Lite Version.
  • Wikipedia's List of XML Editors: there are lots of editors available from open source to proprietary to web based.

XML Validators

XML is a great tool in part because it is highly standardized. This means that it is picky. So it is critical that you make sure that your code is valid XML. Many of the tools that we've highlighted here contain their own XML validators. But there are plenty of free XML validators to help you with your coding projects.

  1. The W3C Markup Validation Service: a general tool that allows you to validate by URI, file upload, and direct input.
  2. W3Schools XML Validator: an easy to use, online validator.
  3. XML Validation: a simple online validator that allows direct input or file upload.
  4. Code Beautify XML Validator: a simple validator that also formats your code so that it is easy to read.
  5. XML Check: a standalone Windows XML validator.
  6. XML Schema Validator: a validator for your XML and schema definition.
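
For a quick local check before reaching for one of these tools, a few lines of code will do. This is a minimal sketch using Python's standard xml.etree.ElementTree, and check_well_formed is a hypothetical helper name. Note the limit: it only tests whether the text is well-formed XML (tags closed, attributes quoted, and so on). It does not validate against a DTD or schema the way the validators above can.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return (True, "") if the text parses, else (False, parser message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as error:
        return False, str(error)
    return True, ""


# A tidy list, and the sloppy old-style HTML list from earlier in this article.
good = "<ul><li>Bugs Bunny</li><li>Daffy Duck</li></ul>"
bad = "<ul type=square><li>Bugs Bunny<li>Daffy Duck</ul>"

good_ok, good_message = check_well_formed(good)
bad_ok, bad_message = check_well_formed(bad)

print("good:", good_ok)
print("bad:", bad_ok, "-", bad_message)
```

The second string fails because of the unquoted attribute value and the unclosed li tags, which is exactly the kind of pickiness described above.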

XML and the Document Object Model

Because XML is so often used to feed data into HTML pages, understanding how XML relates to the Document Object Model (DOM) is critical.

XML and HTML

The first time you heard of XML, you might have thought of it as an alternative to Hypertext Markup Language (HTML). While we know XML can be used in that way, that's pretty unusual. Its greatest use is pulling data into an HTML document.

Let's look at a conceptual example.


<!DOCTYPE html>
<html>
<div id="data"></div>
<script>
  function getXMLData() {
    /* insert JavaScript function to get data from an XML file */
  }
  document.getElementById("data").innerHTML = getXMLData();
</script>
</html>

Alright, so that code doesn't actually do anything, but we can use it to explain how HTML and XML can be used to work together. In the code above, the HTML defines an empty div which will serve as a container for data in an XML file. Then, a JavaScript function is defined. The function is empty, but in a practical application this function would identify an XML file, pull data out of the file, and add HTML tags to the data so that it is rendered properly by the browser.

With a properly written function in place, when this bit of HTML is loaded, the data div will not be empty, but will instead contain the content returned by the JavaScript function.

Are you starting to see the power of XML? With this arrangement, the data displayed on a webpage can be updated dynamically by updating the referenced XML file, much in the same way a database can be used to update the contents of a webpage.
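
The same flow can be sketched outside the browser. The following is a Python stand-in for the empty JavaScript function, not the browser code itself: get_xml_data, render_page, and the characters.xml file name are all hypothetical, and a temporary file plays the role of the XML file on the server.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from html import escape


def get_xml_data(path):
    """Read an XML file and wrap the text of each <name> in paragraph tags."""
    tree = ET.parse(path)
    return "".join("<p>" + escape(name.text or "") + "</p>"
                   for name in tree.getroot().iter("name"))


def render_page(path):
    """Fill the empty data div, as the page script would."""
    return '<div id="data">' + get_xml_data(path) + "</div>"


ONE = ("<cartoon_characters>"
       "<character><name>Bullwinkle</name></character>"
       "</cartoon_characters>")
TWO = ONE.replace(
    "</cartoon_characters>",
    "<character><name>Boris Badenov</name></character></cartoon_characters>")

with tempfile.TemporaryDirectory() as folder:
    xml_path = os.path.join(folder, "characters.xml")

    with open(xml_path, "w", encoding="utf-8") as handle:
        handle.write(ONE)
    before = render_page(xml_path)

    # Update only the data file; the page-building code is unchanged.
    with open(xml_path, "w", encoding="utf-8") as handle:
        handle.write(TWO)
    after = render_page(xml_path)

print(before)
print(after)
```

Rendering the page a second time picks up the new character without any change to the rendering code, which is the database-like behavior described above.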

What is the Document Object Model?

The Document Object Model (DOM) is the programming interface used to manipulate HTML and XML documents. When you use JavaScript, or another scripting language, to manipulate an element on a webpage what you're actually doing is manipulating the DOM, not the HTML document itself.

The DOM is the virtual layer between the source documents used to build a web page and the scripting that modifies that webpage. Think of the DOM as the version of a webpage rendered by a browser and stored in the browser's memory. The DOM is a dynamic representation of a web page that exists within a web browser and can be accessed and modified by scripting — most commonly JavaScript.

Conceptualizing the XML DOM

The contents of the XML DOM can be manipulated with scripting. However, we have to understand the relationships between XML DOM elements, called nodes, before we can do anything with them.

Let's look at a simple XML example, similar in structure to our earlier code:


<?xml version="1.0" encoding="UTF-8"?>
<pets>
  <pet>
    <name>Max</name>
    <type>Dog</type>
    <birthday>July 7, 2014</birthday>
  </pet>
</pets>

The XML DOM is built of nodes. Every part of the XML DOM is a node.

  • Document node: The entire contents of the XML document represent the document node.
  • Root node: The top-level element in an XML document, which contains all the other elements, is called the root node. In this case, the root node is <pets>.
  • Parent and child nodes: The terms parent and child are used to describe the relationship between DOM elements and the elements nested within them. In our sample code, the <pets> element node is the parent of the <pet> node, and the <pet> node has three children: name, type, and birthday. Every node in an XML document, except for the document node at the very top, has exactly one parent node and may have any number of child nodes.
  • Sibling nodes: When two nodes are both the children of the same parent they are referred to as sibling nodes. In our example, name, type, and birthday are sibling nodes.
  • Text node: The text contained within an element is defined as a text node within the XML DOM. This is an important distinction. If we want to get at the text in a text node, we need to refer to it as the value of the text node, not the value of the element node that contains it. In other words, the path to the text "Max" looks like this: pets > pet > name > text node > value:"Max"
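
The node relationships listed above can be walked in code. This sketch uses Python's standard xml.dom.minidom, which exposes the same W3C DOM names (documentElement, parentNode, firstChild, nodeValue) that browser JavaScript does; it is offered as a stand-in, not as the browser's own implementation.

```python
from xml.dom import minidom

XML_TEXT = """<?xml version="1.0" encoding="UTF-8"?>
<pets>
  <pet>
    <name>Max</name>
    <type>Dog</type>
    <birthday>July 7, 2014</birthday>
  </pet>
</pets>"""

document = minidom.parseString(XML_TEXT)    # the document node
root = document.documentElement             # the root node, <pets>
pet = root.getElementsByTagName("pet")[0]   # child of <pets>
name = pet.getElementsByTagName("name")[0]  # child of <pet>
text_node = name.firstChild                 # the text node inside <name>

print(document.nodeType == document.DOCUMENT_NODE)
print(root.tagName)
print(name.parentNode.tagName)
print(text_node.nodeType == text_node.TEXT_NODE)
# The text lives on the text node; the element node itself has no value.
print(text_node.nodeValue)
print(name.nodeValue)
```

The last two lines show the distinction made in the final bullet: the value "Max" belongs to the text node, while the name element's own nodeValue is empty (None in Python, null in JavaScript).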

Manipulating the XML DOM

In general, JavaScript is used to manipulate the XML DOM. JavaScript can be used to retrieve a variety of properties from the nodes in the XML DOM. Commonly accessed XML DOM properties include:

  • nodeValue: Gets the value contained within the node. For a text node this is its text; for an element node it is empty (null).
  • parentNode: References the parent node. If we were to apply this property to the name node in our sample XML, we would be referring to the pet node.
  • childNodes: References a node's children. If applied to the pet node in our code above, this property would return the name, type, and birthday nodes, along with text nodes for any whitespace between them.
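
Here is a short sketch of those three properties, again using Python's standard xml.dom.minidom as a stand-in for browser JavaScript. The element_children helper is hypothetical; it is there because childNodes also reports the whitespace between indented elements as text nodes.

```python
from xml.dom import minidom

XML_TEXT = """<pets>
  <pet>
    <name>Max</name>
    <type>Dog</type>
    <birthday>July 7, 2014</birthday>
  </pet>
</pets>"""

document = minidom.parseString(XML_TEXT)
pet = document.getElementsByTagName("pet")[0]


def element_children(node):
    """Return only the element children, skipping whitespace text nodes."""
    return [child for child in node.childNodes
            if child.nodeType == child.ELEMENT_NODE]


all_children = pet.childNodes
elements = element_children(pet)

print(len(all_children))                        # elements plus whitespace
print([child.tagName for child in elements])    # just the elements
print(elements[0].parentNode.tagName)           # parentNode of <name>
print(elements[0].firstChild.nodeValue)         # nodeValue of the text node
```

With the indentation shown, pet has seven child nodes in total: three elements and four whitespace-only text nodes around them.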

JavaScript can be used to do more than just reference the properties of XML DOM nodes. Here are some of the most common JavaScript methods used to actively manipulate the XML DOM.

  • getElementsByTagName: You might recognize this method if you've ever used JavaScript to manipulate HTML elements. Drop in the name of any XML DOM element, such as "pet" or "name" from our example XML code, to access those elements.
  • appendChild: This method is used to add child nodes to a node.
  • removeChild: This method is used to remove a node from its parent node. Keep in mind that the data will remain in the original XML file; it is only removed from the DOM built by the browser.
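
The three methods can be tried together in a few lines. This sketch again uses Python's standard xml.dom.minidom in place of browser JavaScript; the method names are the same, and the color element is a made-up addition for the example.

```python
from xml.dom import minidom

XML_TEXT = ("<pets><pet><name>Max</name><type>Dog</type>"
            "<birthday>July 7, 2014</birthday></pet></pets>")

document = minidom.parseString(XML_TEXT)

# getElementsByTagName: look up elements by tag name.
pet = document.getElementsByTagName("pet")[0]

# appendChild: build a new <color> element (hypothetical) and attach it.
color = document.createElement("color")
color.appendChild(document.createTextNode("Brown"))
pet.appendChild(color)

# removeChild: detach <birthday> from its parent.
birthday = pet.getElementsByTagName("birthday")[0]
pet.removeChild(birthday)

result = document.documentElement.toxml()
print(result)
# The source text is untouched; only the in-memory DOM changed.
print("birthday" in XML_TEXT)
```

Only the in-memory tree changes. The original XML text still contains the birthday element, just as the original file would in the browser case.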

There are many additional XML DOM methods and properties. However, you really need a strong grasp of JavaScript and XML, and a clear idea of how you plan to use XML data, to get much further with this topic.

Resources

There seems to be an endless number of online tutorials, and some are much better than others. After looking at dozens of XML DOM tutorials, we think the following tutorials will get you up to speed the fastest.

If you prefer a learning format that offers a bit more structure than a tutorial, you might be interested in one of the following online courses that cover XML and the XML DOM.

XML has been around for a long time. As a result, many XML texts have been written over the years. Below are some modern XML titles that cover the XML DOM and are highly rated by readers:

Serious HTML Coders Should Know XML

XML is a powerful and simple language for transporting data in a format that can be used in many different ways. The XML DOM is the model built by the browser to interact with and manipulate XML data. Once you understand how to work with the XML DOM you'll be able to get, change, and style XML data for use in webpages and applications.

MSXML: Microsoft's XML

Microsoft XML Core Services (MSXML) is a set of Microsoft tools and services for the creation of XML-based applications using Microsoft development tools.

MSXML is actually a set of World Wide Web Consortium (W3C) compliant application programming interfaces (APIs) that is widely used by software developers.

Brief MSXML History

Over the years, MSXML has gone through numerous updates and releases, usually being released alongside other Microsoft products like Internet Explorer or Microsoft Office.

  • MSXML 1.0 was released in 1997 and shipped with Internet Explorer 4.0.
  • MSXML 2.0a was released in 1999 and shipped with Internet Explorer 5.0.
  • MSXML 2.5 was released in 2000 and shipped with Windows 2000, Internet Explorer 5.01, and MDAC 2.5.
  • MSXML 2.6 was released in 2000 and shipped with Microsoft SQL Server 2000 and MDAC 2.6.
  • MSXML 3.0 was released in 2001 and shipped with Windows XP, Internet Explorer 6.0, and MDAC 2.7.
  • MSXML 4.0 was released in 2001 as an independent software development kit (SDK).
  • MSXML 5.0 was released in 2003 and shipped with Microsoft Office 2003 and Office 2007.
  • MSXML 6.0 was released in 2005 and shipped with Microsoft SQL Server 2005, Visual Studio 2005, .NET Framework 3.0, Windows Vista, Windows 7, and Windows XP Service Pack 3.

MSXML versions 1.0, 2.0a, 2.5, 2.6, and 4.0 are obsolete and deprecated, while versions 3.0, 5.0 and 6.0 continue to be supported by Microsoft.

MSXML Features

MSXML is the native Windows API for XML-based applications, conforming to the XML 1.0 standard.

Some of the services provided by MSXML include:

  • The Document Object Model (DOM): a library for accessing XML documents.
  • The Simple API for XML (SAX): a programmatic alternative to DOM processing.
  • XMLHttpRequest and Server XMLHTTPRequest: for implementing AJAX and RESTful applications.
  • XPath 1.0: queries over DOM documents.
  • XSLT 1.0: XML transformations.
  • XSD 1.0: support for the specification with the XmlSchemaCache.
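
To show how SAX differs from DOM processing, here is a minimal sketch of the SAX model. It uses Python's standard xml.sax module, not MSXML, so none of the names below are MSXML APIs; the NameCollector class and the sample pet data are invented for the example. Instead of building a tree in memory, the parser calls handler methods as it streams through the document.

```python
import xml.sax


class NameCollector(xml.sax.ContentHandler):
    """Collect the text of every <name> element as the parser streams by."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._inside_name = False
        self._buffer = []

    def startElement(self, tag, attributes):
        # Called when an opening tag is read.
        if tag == "name":
            self._inside_name = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several pieces, so gather it in a buffer.
        if self._inside_name:
            self._buffer.append(content)

    def endElement(self, tag):
        # Called when a closing tag is read.
        if tag == "name":
            self.names.append("".join(self._buffer))
            self._inside_name = False


# Sample data made up for this sketch.
XML_BYTES = (b"<pets>"
             b"<pet><name>Max</name><type>Dog</type></pet>"
             b"<pet><name>Bella</name><type>Cat</type></pet>"
             b"</pets>")

handler = NameCollector()
xml.sax.parseString(XML_BYTES, handler)
print(handler.names)
```

Because nothing is kept except what the handler chooses to store, this style suits very large documents, at the cost of not being able to walk back up the tree the way DOM code can.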

All new applications should be written to comply with MSXML 6.0, the latest version of MSXML, or XmlLite, a lightweight XML parser for native code projects.

Using MSXML

MSXML services are programmatically exposed as Object Linking and Embedding (OLE) automation components, and can be used by developers working in native programming languages such as C and C++, or in active scripting languages such as JScript and VBScript.

Use of MSXML Component Object Model (COM) components is not recommended or supported if you are writing managed code targeting the .NET Framework in C#, Visual Basic, managed C++, or any other managed programming language. MSXML uses specific threading modes and garbage collection routines that are not compatible with the .NET Framework. XML functionality should be implemented in .NET applications using classes from the System.Xml namespace or LINQ to XML, both native to the .NET Framework. Using MSXML in .NET applications through COM interoperability can result in unexpected problems that are difficult to debug.

MSXML is often used in processing XML in web applications, or as a standalone process using the Document Object Model (DOM). DOM and the Simple API for XML (SAX2) can be utilized in any programming language that is capable of using ActiveX or COM objects.

Should I Use and Learn MSXML?

If your programming work revolves around applications using the .NET Framework, you do not need to worry about MSXML, since using it in .NET projects is not recommended.

On the other hand, if you work on native code or scripting programming language projects that interact with XML, you will probably be using MSXML or its lightweight alternative, XmlLite.

Many open-source alternatives to MSXML exist, for example NativeXML, but you must choose an alternative that will support your programming language.

MSXML Resources

If you work on programs that interact with XML, and those programs do not rely on the .NET Framework, you should take a look at the following resources on MSXML:

MSXML Books

Books that cover MSXML specifically are quite rare, in part due to the fact that there are ample MSXML resources available online. Also, many books about scripting language programming have chapters on MSXML. In some cases, these chapters are quite comprehensive and in-depth, while others merely offer a basic overview of MSXML.

Should You Invest Time in Learning MSXML?

While MSXML has not been deprecated, and is still in widespread use, its long-term relevance is up for debate. Development has slowed down to a crawl and MSXML's time has obviously come and gone.

However, MSXML is still used in many projects, although the range of its potential applications is dwindling. For starters, it should not be used with the .NET Framework. It's not the only way of working with XML, either. Various open-source alternatives are available, but dealing with each one of them is beyond the scope of this article.

In case you still want to master MSXML, or merely brush up your old skills, you might have a hard time finding fresh resources. A lot of MSXML resources, especially books and other print resources, are woefully outdated and cover deprecated versions of MSXML. This does not render them useless, but it does limit their usefulness and forces you to double-check much of what you read, just to make sure it applies to MSXML 6.0.

MSXML 6.0 was released more than a decade ago, and while Microsoft still supports it (technically), it's obvious that the end of the road for MSXML is near.

Conclusion

XML itself is pretty straightforward. For an experienced XHTML coder, it can seem almost trivial. But there are so many related technologies and so much that can be done with it that you could spend the rest of your life doing nothing else. We've just scratched the surface here.


Other Interesting Stuff

We have more guides, tutorials, and infographics related to coding and development:

  • Microsoft Visual Basic / Visual Studio: this is our basic primer on Visual Studio with a focus on Visual Basic.
  • HTML for Beginners: this article will take you from the very start. But given that it is book-length, there's lots that experienced coders can learn.
  • C# Resources: as one of the most popular languages in the .NET firmament, C# is very helpful to know.

What Code Should You Learn?

Confused about what programming language you should learn to code in? Check out our infographic, What Code Should You Learn? It not only discusses different aspects of the languages, it answers important questions such as, "How much money will I make programming Java for a living?"


Text written by Frank Moraes with additional content by Jon Penland and Nermin Hajdarbegovic. Compiled and edited by Frank Moraes.