Document Type Definition (DTD) Introduction

A Document Type Definition (DTD) defines the structure of documents written in SGML-family markup languages, most commonly XML.

XML

XML is eXtensible Markup Language. It is very similar in form to HTML — content is surrounded by angle-bracketed tags, which name or identify the content.

<email>
 <to>[email protected]</to>
 <from>[email protected]</from>
 <subj>Be Careful!</subj>
 <message>
 I think Eve is listening in.
 </message>
</email>

As you can see, this looks a bit like HTML, but none of those tags are standard in HTML.

XML and HTML have intertwined histories. They are both related to an earlier markup form called SGML, which also used angle-bracketed tags to structure content.

The most important difference between XML and HTML is that HTML is used for a specific purpose — documents on the web — while XML is extensible, and can be used for any type of structured data. HTML has only a limited, specific set of tags, so it is easy to validate an HTML document to make sure it conforms to the standard. Any XML document can define its own tags and use them. This makes XML a lot more flexible than HTML.

So you want to use XML to mark up an email? You might use <email>, <message>, and <to>. Or, if you are using XML to store details about a record collection, you might have tags like <album>, <artist>, <releasedate>, and so forth.

But there's a problem. How do you define a set of tags so that everyone uses (for example) <album> instead of <record>, or <to> instead of <toAddress>?

Document Type Definition

The solution is a DTD — a Document Type Definition. A DTD is a document which specifies what elements an XML document may have. It includes information about which elements can be nested inside another, which elements are mandatory or optional, and what attributes can be included in an element. The DTD language (itself a derivative of SGML) provides a way to specify this structure of element names and attributes. The resulting definition can be used to validate an XML document to make sure it conforms to the definition.
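
As a rough sketch of what this looks like in practice, here is one possible DTD for the email example above, written as an internal subset so the definition and the document travel together. The content model (which elements are required, and in what order) and the priority attribute are assumptions made for illustration; the original example does not define them, and the addresses are placeholders.

```xml
<?xml version="1.0"?>
<!DOCTYPE email [
  <!-- Nesting: an email contains to, from, an optional subj, then message. -->
  <!ELEMENT email (to, from, subj?, message)>
  <!-- Each child holds plain character data. -->
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT subj (#PCDATA)>
  <!ELEMENT message (#PCDATA)>
  <!-- Hypothetical attribute: one of three values, defaulting to "normal". -->
  <!ATTLIST email priority (low | normal | high) "normal">
]>
<email priority="high">
  <to>alice@example.com</to>
  <from>bob@example.com</from>
  <subj>Be Careful!</subj>
  <message>I think Eve is listening in.</message>
</email>
```

Here subj? marks the subject as optional, while the other children are mandatory and must appear in the listed order. In real use, the declarations would more likely live in a separate file that each document references.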

Why validate?

HTML documents are (usually) intended to be read by humans. The markup is primarily for semantic and presentational purposes, and is used by a web browser to render the document, but the end user of HTML is almost always a person looking at a web page.

So, while validation of HTML is important and helpful, it is not strictly necessary. Browsers tend to be forgiving, and humans can figure out meaning even if the markup is a little off.

But XML is used to transmit data, not web pages. XML is usually consumed by another piece of software, not a human. There usually isn't room for ambiguity or mistakes. Additionally, it is possible for attackers to embed malicious code into XML, so applications that accept XML input can't trust all the input they receive.

HTML is most often validated by its author, as a sort of "proof-reading" step in the publishing process. XML, on the other hand, is most often validated by the receiver. This is done to ensure security and avoid errors before an application actually does something with the XML data.
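
To make the receiver's side concrete, here is a sketch of a document that a validating parser would be expected to reject. It assumes the sender and receiver agreed on the hypothetical content model declared at the top (it is not part of the original example), and the address is a placeholder.

```xml
<?xml version="1.0"?>
<!DOCTYPE email [
  <!-- Hypothetical agreed-upon structure, assumed for this sketch. -->
  <!ELEMENT email (to, from, subj?, message)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT subj (#PCDATA)>
  <!ELEMENT message (#PCDATA)>
]>
<email>
  <!-- Invalid: toAddress is not declared anywhere in the DTD. -->
  <toAddress>bob@example.com</toAddress>
  <!-- Invalid: the required to and from elements never appear. -->
  <subj>Be Careful!</subj>
  <message>I think Eve is listening in.</message>
</email>
```

The document is still well-formed XML, so a parser that does not validate would typically accept it without complaint. The problems only surface when DTD validation is switched on, which is why the receiving application has to ask for validation explicitly.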

DTD vs. XSD

DTD was the first schema format used with XML; its syntax was inherited from SGML. It has certain limitations, not the least of which is that a DTD itself is not XML. DTD grammar is somewhat difficult to parse and requires a different toolset than XML parsing.

XSD — XML Schema Definition — is a later standard that improved on DTD in several ways. An XSD document is, itself, valid XML. XSD can specify data types for each element; for example, whether an element should contain a date and time, a number, a string, or another type of data.
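
For comparison, a minimal XSD sketch for the same email structure might look like the following. The sent element and its xs:dateTime type are hypothetical additions, included only to show a typed field; treating the subject as optional is likewise an assumption.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="email">
    <xs:complexType>
      <!-- Children must appear in this order. -->
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
        <xs:element name="from" type="xs:string"/>
        <!-- minOccurs="0" makes the subject optional. -->
        <xs:element name="subj" type="xs:string" minOccurs="0"/>
        <!-- Hypothetical typed field: must hold a valid date and time. -->
        <xs:element name="sent" type="xs:dateTime"/>
        <xs:element name="message" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Note that the schema is an ordinary XML document, so the same parser that reads the data can read the schema. A DTD can only say that an element contains character data; it has no way to require that the data be a date.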

For these reasons, XSD has become more popular for validating transactional XML — that is, XML that is generated, sent, and received as part of an API. XSDs, for example, are used in SOAP.

Since DTD is easier to create and read (by humans, that is), it remained popular in contexts where XML was used for publishing information. However, this way of using XML has largely been outmoded with the rise of HTML5, and the increasing divergence of HTML and XML. Today, API developers looking for a lightweight alternative to XML+XSD are more likely to simply use JSON than they are to use XML and DTDs.

But there are still plenty of DTDs in use. If you work on legacy web technology, especially data systems built in the late 1990s, you will likely find yourself working with DTDs at some point. To help you find your way, we've put together the best DTD tutorials, resources, and tools we could find.

DTD Tutorials

Other Learning Resources

Tools

  • Online XML Validator lets you quickly validate an XML file against a DTD declared or referenced in the file itself.
  • Xmllint is a command-line tool for parsing and linting XML files. It can be used to quickly validate against a DTD.
  • DTDGenerator is a tool that produces a DTD document based on a given XML document.
  • DTD2Schema converts DTD files to XSD.
  • XML Tools by Platform is a comprehensive listing of XML tools for various languages and platforms. Most of these can be used to construct DTD files or to validate XML documents against DTDs.

Conclusion

It may seem like DTDs aren't used much anymore. In the world of XML, they've been superseded by XSD. And XML itself has largely been replaced with newer technology. But many legacy and large enterprise systems continue to use XML and DTD. If you work with large enterprise systems, or develop using enterprise web tools like .NET, you should probably be familiar with DTD and related standards.

