XML Resources & Validators

XML is short for Extensible Markup Language. It is a highly structured markup language designed to be both human- and machine-readable. But XML is not a language in the way that HTML is a language: it has no predefined tags like HTML's <p>.

Instead, XML allows the coder to create any tags at all. More importantly, it allows those tags to be related to each other by nesting one inside another. So XML lets you store data in a powerful way, but it says nothing about what ought to be done with that data. That's where XML-based languages such as XHTML, RSS, and SOAP come in. XML is also a common way for programs like word processors and spreadsheets to save data in an application-independent format.
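To make the "any tags at all" point concrete, here is a minimal sketch using Python's standard-library xml.etree.ElementTree module. Python is simply our choice for illustration; nothing about XML requires it. The tag names library, book, title, and author are invented for this example, which is exactly the point: XML accepts them. Nesting SubElement calls is how the tags become related to each other.

```python
import xml.etree.ElementTree as ET

# Invent our own tags: XML itself defines none of them.
library = ET.Element("library")
book = ET.SubElement(library, "book")            # <book> nested inside <library>
ET.SubElement(book, "title").text = "Moby-Dick"  # <title> and <author> nested inside <book>
ET.SubElement(book, "author").text = "Herman Melville"

# Serialize the tree back to text (no XML declaration with encoding="unicode").
xml_text = ET.tostring(library, encoding="unicode")
print(xml_text)
# <library><book><title>Moby-Dick</title><author>Herman Melville</author></book></library>
```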

A Brief History of Markup Languages

Markup languages started as a way to combine the best elements of text files (readable data) and binary files (precisely described data). In 1986, the Standard Generalized Markup Language (SGML) became an international standard. It was a text-based language that allowed data and its display to be precisely described. HTML was a very simple system based on SGML.

But when HTML became hugely popular as the basis of the World Wide Web, it became apparent that something better was needed. HTML was limited and loosely structured, so browsers had to cope with all kinds of sloppy code. For example, closing tags were often omitted and attribute values were not placed inside quotation marks. Remember code like this?

<ul type=square>
<li>Bugs Bunny
<li>Daffy Duck
<li>Foghorn Leghorn
</ul>
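Browsers tolerate that list, but an XML parser will not. The unquoted attribute value and the unclosed li elements make it not well-formed. A short sketch with Python's standard xml.etree.ElementTree module shows the difference. The tidy version is our own rewrite of the same list, and is_well_formed is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# The list above, exactly as old-style HTML allowed it.
sloppy = """<ul type=square>
<li>Bugs Bunny
<li>Daffy Duck
<li>Foghorn Leghorn
</ul>"""

# The same list with quoted attributes and every element closed.
tidy = """<ul type="square">
<li>Bugs Bunny</li>
<li>Daffy Duck</li>
<li>Foghorn Leghorn</li>
</ul>"""


def is_well_formed(text: str) -> bool:
    """Return True if an XML parser accepts the text, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(sloppy))  # False: unquoted attribute, unclosed <li> tags
print(is_well_formed(tidy))    # True
```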

Enter XML

Poorly structured HTML couldn't simply be replaced with SGML, because SGML is ridiculously complicated; it would have been something like replacing HTML with PostScript. So in the mid-1990s, work began on XML, a subset of SGML that allows coders to describe data and its relationships. With the use of style sheets, it can be used to format and transmit data in almost any way imaginable. But unlike SGML, XML is fairly simple to write parsers for. In early 1998, the W3C released the first XML standard.

Why Use XML?

This may all sound kind of abstract. After all, regardless of how powerful XML is at storing data, how does a web browser display anything but a list of data? But that's the point. The big problem with HTML in the early days was that data and layout information were scattered throughout a document. Remember when any kind of page layout had to be done with tables, making HTML code almost unreadable? Today, we use style sheets to separate the layout code from the information presented. Thus, once the layout is completed, it is a simple matter to maintain and add data.

But XML is not a replacement for HTML. In the most general sense, XML is a kind of human-readable database. But it can be turned into an HTML webpage (and a whole lot more!) by using another tool, Extensible Stylesheet Language Transformations (XSLT). XSLT converts XML documents into other documents, most often other XML documents such as XHTML pages. Beyond web pages, XML also underlies formats like RSS and SOAP.

A Basic Example

Let's start with a very basic example of how data is entered into an XML file.

<?xml version="1.0" ?>
<cartoon_characters>
  <character>
    <name>Bullwinkle</name>
    <intelligence>2</intelligence>
    <luck>10</luck>
  </character>
  <character>
    <name>Boris Badenov</name>
    <intelligence>4</intelligence>
    <luck>0</luck>
  </character>
</cartoon_characters>

Notice that none of these tags are defined by XML; they are defined by the coder. What XML does capture (and this is critical) is the structure: each character element sits inside cartoon_characters, and each character has the child elements name, intelligence, and luck. Other characteristics (like species) and more characters (like Wrongway Peachfuzz) could be added, and any XML parser would still read the file without complaint.
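Because the structure is explicit, a program can read the file without knowing anything about cartoons. The sketch below uses Python's standard library and is our own example. It parses the data with a species element added to Bullwinkle, following the suggestion above. The extra element does not disturb code that only asks for name, intelligence, and luck.

```python
import xml.etree.ElementTree as ET

doc = """<?xml version="1.0" ?>
<cartoon_characters>
  <character>
    <name>Bullwinkle</name>
    <intelligence>2</intelligence>
    <luck>10</luck>
    <species>moose</species>
  </character>
  <character>
    <name>Boris Badenov</name>
    <intelligence>4</intelligence>
    <luck>0</luck>
  </character>
</cartoon_characters>"""

root = ET.fromstring(doc)  # the <cartoon_characters> element

# Turn each <character> child into a (name, intelligence, luck) tuple;
# the extra <species> element is simply never asked for.
stats = [
    (c.findtext("name"), int(c.findtext("intelligence")), int(c.findtext("luck")))
    for c in root.findall("character")
]

for name, intelligence, luck in stats:
    print(f"{name}: intelligence {intelligence}, luck {luck}")
```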

We can take this a step further by creating an XSL transformation file that produces an XHTML page displaying the characters' names in an unordered list. First, add a line to the previous XML code, right after the first line that declares the file as XML. It looks like this:

<?xml version="1.0" ?>
<?xml-stylesheet type="text/xsl" href="bullwinkle.xsl"?>
<cartoon_characters>
.
.
.

Next, create an XSL file with the name "bullwinkle.xsl":

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns="http://www.w3.org/1999/xhtml">
<xsl:output method="xml" indent="yes" encoding="UTF-8"/>
<xsl:template match="/">
  <html>
    <head> <title>Rocky and Bullwinkle Show</title> </head>
    <body>
      <h1>Cartoon Characters</h1>
      <ul>
        <xsl:for-each select="cartoon_characters/character">
          <li><xsl:value-of select="name"/></li>
        </xsl:for-each>
      </ul>
    </body>
  </html>
</xsl:template>
</xsl:stylesheet>

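Then open the original XML file (not the XSL file) in a web browser, and it will display just like an XHTML page. Some browsers refuse to apply a stylesheet to a file opened straight from disk, so you may need to serve both files from a local web server.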

You can experiment with these files to get a better idea of what's going on. But what's most important is that you can leave the XSL file alone, while you add more and more data to the XML file.
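If you'd rather see the transformation's logic outside a browser, the sketch below reproduces the page that bullwinkle.xsl is meant to generate. This is a swapped-in technique. Python's standard library has no XSLT processor, so the code builds the XHTML tree directly with xml.etree.ElementTree instead of applying the stylesheet. The names characters_to_xhtml and x are hypothetical helpers of our own.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

SOURCE = """<cartoon_characters>
  <character><name>Bullwinkle</name><intelligence>2</intelligence><luck>10</luck></character>
  <character><name>Boris Badenov</name><intelligence>4</intelligence><luck>0</luck></character>
</cartoon_characters>"""


def x(tag: str) -> str:
    """Qualify a tag with the XHTML namespace, like xmlns="..." does in the stylesheet."""
    return f"{{{XHTML_NS}}}{tag}"


def characters_to_xhtml(source: str) -> str:
    """Build the page bullwinkle.xsl describes: one <li> per character name."""
    root = ET.fromstring(source)  # root is <cartoon_characters>
    html = ET.Element(x("html"))
    head = ET.SubElement(html, x("head"))
    ET.SubElement(head, x("title")).text = "Rocky and Bullwinkle Show"
    body = ET.SubElement(html, x("body"))
    ET.SubElement(body, x("h1")).text = "Cartoon Characters"
    ul = ET.SubElement(body, x("ul"))
    # Plays the role of <xsl:for-each select="cartoon_characters/character">
    for character in root.findall("character"):
        ET.SubElement(ul, x("li")).text = character.findtext("name")
    # default_namespace (Python 3.8+) writes xmlns="..." once instead of an ns0: prefix
    return ET.tostring(html, encoding="unicode", default_namespace=XHTML_NS)


print(characters_to_xhtml(SOURCE))
```

The loop stands in for xsl:for-each. As with the stylesheet, adding characters to the XML changes the output without any change to the code.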

There's More

XML is a huge subject. We've just dipped a toe into some very deep waters. Wikipedia lists roughly 200 XML languages. These include things like XHTML, of course. But they also include closely related XML tools like XML Encryption (for data encryption) and XML Signature (for digital signatures). But more than that, there are various important aspects to the language:

  • Namespaces: a way to allow different datasets to exist in a single XML file without naming conflicts.
  • Document Type Definitions: the dreaded DTD that website coders normally just copy and paste into their documents without understanding. A DTD declares which elements and attributes a document may contain and how they may be nested.
  • Schema: a more powerful alternative to the DTD for defining which elements a document may contain, how they are structured, and what types of data they may hold.
  • Database: a non-SQL approach to database storage. There are a number of different ones available.
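Namespaces are the easiest of these to show in a few lines. In this sketch, the two namespace URIs under example.com are placeholders we made up. They only need to be unique, and the parser never fetches them. Both vocabularies use a name element, yet Python's xml.etree.ElementTree keeps them apart by expanding each tag to the form {uri}name.

```python
import xml.etree.ElementTree as ET

# Two made-up vocabularies that both define a <name> element.
doc = """<catalog xmlns:toon="http://example.com/toons"
         xmlns:voice="http://example.com/voices">
  <toon:character>
    <toon:name>Rocky</toon:name>
    <voice:name>June Foray</voice:name>
  </toon:character>
</catalog>"""

# Prefixes for our queries; only the URIs must match the file, not the prefixes.
ns = {"toon": "http://example.com/toons", "voice": "http://example.com/voices"}

root = ET.fromstring(doc)
character = root.find("toon:character", ns)
print(character.tag)                          # {http://example.com/toons}character
print(character.find("toon:name", ns).text)   # Rocky
print(character.find("voice:name", ns).text)  # June Foray
```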

Online Resources

There is an amazing amount of XML-related material online. In fact, there is so much that it is overwhelming. As a result, we've tried to stick to just the core XML topics. But you will find links here that will answer just about any question you will ever have in your XML coding career.

Tutorials

Video Tutorials

  • XML Tutorial for Beginners: Portnov Computer School's introduction to XML.
  • XML with Java: a free online course consisting of 13 video lectures by David J. Malan.
  • XML Tutorial Video For Beginners 2015: a three-hour-long lecture that goes far into XML, including a lot of information on XSLT.
  • Computer Science E-75 Lecture 3: from the Harvard Extension course "Building Dynamic Websites." This lecture focuses on XML. In less than two hours, it provides everything you need to know to create your own XML-based webpages. Note: it assumes knowledge of PHP.

Data Sources

Advanced Topics

Books

Given how large a subject XML is, it can be really helpful to have a book or two, both for learning and for reference.

Learning XML

XML Reference

  • XML in a Nutshell by Harold and Means: a classic, but out of print and generally expensive. But you might be able to find a copy at a yard sale.
  • XML Pocket Reference by St. Laurent and Fitzgerald: just what it says, a booklet you can keep in your shirt pocket for reference.
  • XML: The Complete Reference by Heather Williamson: an old thousand-page reference; good to have around.

XML Coding Tools

  • Altova XMLSpy: a complete XML integrated development environment for Microsoft Windows. It's fairly expensive, but for the professional developer, a good investment.
  • <oXygen/> XML Editor: more than an editor, it provides debugging, profiling, and other tools. It is Java-based, so it runs on any platform that supports Java. It is also expensive, although reasonably priced academic and personal licenses are available.
  • Stylus Studio: a Windows-based XML development suite including an editor and an XSLT visual mapping tool. It is fairly expensive, but offers a reasonably priced home edition.
  • EditiX XML Edit: a reasonably priced editor, debugger, and so on. It also offers a free EditiX Lite Version.
  • Wikipedia's List of XML Editors: there are lots of editors available from open source to proprietary to web based.

XML Validators

XML is a great tool in part because it is highly standardized. This also means that it is picky: a parser must reject any document that is not well-formed. So it is critical to make sure that your code is valid XML. Many of the tools highlighted here contain their own XML validators, but there are plenty of free validators to help with your coding projects.

  1. The W3C Markup Validation Service: a general tool that allows you to validate by URI, file upload, and direct input.
  2. W3Schools XML Validator: an easy-to-use online validator.
  3. XML Validation: a simple online validator that allows direct input or file upload.
  4. Code Beautify XML Validator: a simple validator that also formats your code so that it is easy to read.
  5. XML Check: a standalone Windows XML validator.
  6. XML Schema Validator: a validator that checks your XML against a schema definition.
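Strictly speaking, these tools check two different things. One is whether a document is well-formed, meaning tags are properly nested and closed and attributes are quoted. The other is whether it is valid against a DTD or schema. Checking well-formedness needs nothing beyond a standard parser, so a tiny command-line checker is easy to sketch in Python. The names first_error and main are our own. The sketch does not perform DTD or schema validation, which Python's standard library does not offer.

```python
import sys
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple, Union


def first_error(data: Union[str, bytes]) -> Optional[Tuple[int, int, str]]:
    """Return (line, column, message) for the first well-formedness error, or None."""
    try:
        ET.fromstring(data)
    except ET.ParseError as err:
        line, column = err.position  # where the parser gave up
        return line, column, str(err)
    return None


def main(paths: List[str]) -> int:
    """Check each file; return 1 if any file is not well-formed, else 0."""
    status = 0
    for path in paths:
        # Read bytes so the parser honors any encoding declared in the file.
        with open(path, "rb") as f:
            problem = first_error(f.read())
        if problem is None:
            print(f"{path}: well-formed")
        else:
            line, column, message = problem
            print(f"{path}:{line}:{column}: {message}")
            status = 1
    return status


# Only run as a command when file names are given.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

Saved under a hypothetical name such as check_xml.py, it would be run as python check_xml.py file1.xml file2.xml. It exits with status 1 if any file fails.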

Conclusion

XML itself is pretty straightforward. For an experienced XHTML coder, it can seem almost trivial. But there are so many related technologies and so much that can be done with it that you could spend the rest of your life doing nothing else. We've just scratched the surface here.