Document Type Definition (DTD) Introduction
A Document Type Definition (DTD) defines the structure of documents written in SGML-family markup languages, most notably XML.
XML is eXtensible Markup Language. It is very similar in form to HTML — content is surrounded by angle-bracketed tags, which name or identify the content.
<email>
  <to>[email protected]</to>
  <from>[email protected]</from>
  <subj>Be Careful!</subj>
  <message>I think Eve is listening in.</message>
</email>
As you can see, this looks a bit like HTML, but none of those tags are standard in HTML.
XML and HTML have intertwined histories. They are both related to an earlier markup form called SGML, which also used angle-bracketed tags to structure content.
The most important difference between XML and HTML is that HTML is used for a specific purpose — documents on the web — while XML is extensible, and can be used for any type of structured data. HTML has only a limited, specific set of tags, so it is easy to validate an HTML document to make sure it conforms to the standard. Any XML document can define its own tags and use them. This makes XML a lot more flexible than HTML.
So you want to use XML to mark up an email? You might use tags like <to>. Or, if you are using XML to store details about a record collection, you might have tags like <releasedate> and so forth.
But there's a problem. How do you make sure that everyone uses the same set of tags (for example, <album> or <to>) instead of inventing different names for the same things?
Document Type Definition
The solution is a DTD — a Document Type Definition. A DTD is a document which specifies what elements an XML document may have. It includes information about which elements can be nested inside another, which elements are mandatory or optional, and what attributes can be included in an element. The DTD language (itself a derivative of SGML) provides a way to specify this structure of element names and attributes. The resulting definition can be used to validate an XML document to make sure it conforms to the definition.
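As a rough illustration of the points above, here is a minimal sketch of a DTD for the email example from the start of this article, written as an internal subset inside the XML file itself. The priority attribute, the choice to make subj optional, and the example.com addresses are assumptions made for illustration; they are not part of the original example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE email [
  <!-- Nesting and order: email holds to, from, subj, message. The ? marks subj as optional. -->
  <!ELEMENT email (to, from, subj?, message)>
  <!-- Each child element holds plain text only -->
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT subj (#PCDATA)>
  <!ELEMENT message (#PCDATA)>
  <!-- Hypothetical attribute: limited to two values, defaults to "normal" when omitted -->
  <!ATTLIST email priority (normal | high) "normal">
]>
<email priority="high">
  <to>alice@example.com</to>
  <from>bob@example.com</from>
  <subj>Be Careful!</subj>
  <message>I think Eve is listening in.</message>
</email>
```

A validating parser should accept this document as written. It should reject a version that leaves out the message element, puts from before to, or sets priority to a value other than normal or high.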
HTML documents are (usually) intended to be read by humans. The markup is primarily for semantic and presentational purposes, and a web browser uses it to render the document, but the end user of HTML is almost always a person looking at a web page.
So, while validation of HTML is important and helpful, it is not strictly necessary. Browsers tend to be forgiving, and humans can figure out meaning even if the markup is a little off.
But XML is used to transmit data, not web pages. XML is usually consumed by another piece of software, not a human. There usually isn't room for ambiguity or mistakes. Additionally, it is possible for attackers to embed malicious code into XML, so applications that accept XML input can't trust all the input they receive.
HTML is most often validated by its author, as a sort of "proof-reading" step in the publishing process. XML, on the other hand, is most often validated by the receiver. This is done to ensure security and avoid errors before an application actually does something with the XML data.
DTD vs. XSD
DTD was the first document definition format used with XML, having been inherited from SGML. It has certain limitations, not the least of which is that a DTD itself is not XML. DTD grammar is somewhat difficult to parse, requiring a different toolset than XML parsing.
XSD — XML Schema Definition — is a later standard that improved on DTD in several ways. An XSD document is, itself, valid XML. XSD can specify data types for each element; for example, whether an element should contain a date and time, a number, a string, or another type of data.
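To make the difference concrete, here is a minimal sketch of what an XSD for the email example at the top of this article might look like. The sent element is hypothetical, added only to show a typed value, and treating subj as optional is likewise an assumption.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="email">
    <xs:complexType>
      <!-- Children must appear in this order -->
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
        <xs:element name="from" type="xs:string"/>
        <!-- minOccurs="0" makes subj optional -->
        <xs:element name="subj" type="xs:string" minOccurs="0"/>
        <!-- Hypothetical element: must contain a date and time, such as 2024-01-15T09:30:00 -->
        <xs:element name="sent" type="xs:dateTime"/>
        <xs:element name="message" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Note that the schema is itself an XML document, so the same parser that reads the email can read the schema. A DTD could only say that sent contains text; the xs:dateTime type lets a validator reject a value that is not a date and time.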
For these reasons, XSD has become more popular for validating transactional XML — that is, XML that is generated, sent, and received as part of an API. XSDs, for example, are used in SOAP.
Since DTD is easier to create and read (by humans, that is), it remained popular in contexts where XML was used for publishing information. However, this way of using XML has largely been outmoded with the rise of HTML5, and the increasing divergence of HTML and XML. Today, API developers looking for a lightweight alternative to XML+XSD are more likely to simply use JSON than they are to use XML and DTDs.
But, there are still plenty of DTDs in use. If you work on legacy web technology, especially data systems built in the late 90s, you will likely find yourself working with DTDs at some point. To help you find your way, we've put together the best DTD tutorials, resources, and tools we could find.
- Constructing a Document Type Definition (DTD) for XML is a well-presented overview of DTDs from New Mexico Institute of Mining and Technology.
- DTD Tutorial from W3Schools provides a methodical introduction to the topic, and is a good place to start if you are just coming to this topic.
- XML and DTDs (PDF) provides an explanation of the anatomy of an XML file, and then shows how DTDs define a specific XML document type. This is a good tutorial if you need to brush up on XML basics while learning about DTDs.
- XML DTD — An Introduction to XML Document Type Definitions is a 7-part tutorial that walks readers through the creation of a DTD and validating XML documents against it.
- The 10 Minute Guide to Reading an XML DTD is a brief overview of how to read and interpret an XML Document Type Definition, making no assumptions about how much you already know about XML or DTDs.
- DTD Tutorial is a community-written resource from the EduTech Wiki.
Other Learning Resources
- XML Schema, DTD, and Entity Attacks (PDF) is a paper detailing security vulnerabilities that can occur in systems that use DTDs for XML validation.
- XML Coding Exercises is a series of Java-based tutorials and exercises, including coverage of building and using DTDs.
- SGML Exceptions and XML is an advanced tutorial on building DTDs with complex inclusion and exclusion rules.
- Generic Programming for XML Tools is an advanced paper on implementing DTD-aware XML tools as generic programs.
- XML DTDs vs XML Schema explains the differences between DTDs and XSDs, two ways of defining the structure of an XML document.
Tools
- Online XML Validator lets you quickly validate an XML file against a DTD referenced in the file itself.
- xmllint is a command-line tool for parsing and linting XML files. It can be used to quickly validate a document against a DTD.
- DTDGenerator is a tool that produces a DTD document based on a given XML document.
- DTD2Schema converts DTD files to XSD.
- XML Tools by Platform is a comprehensive listing of XML tools for various languages and platforms. Most of these can be used to construct DTD files or to validate XML documents against DTDs.
It may seem like DTDs aren't used much anymore. In the world of XML, they've been superseded by XSD. And XML itself has largely been replaced with newer technology. But many legacy and large enterprise systems continue to use XML and DTD. If you work with large enterprise systems, or develop using enterprise web tools like .NET, you should probably be familiar with DTD and related standards.
Further Reading and Resources
We have more guides, tutorials, and infographics related to coding and development:
- XML Resources and Validators: learn all about XML itself.
- MSXML Introduction and Resources: this will get you going with Microsoft XML Core Services (MSXML), which will help you build XML-based applications.
- Composing Good HTML: learn to write code all browsers on all devices will be able to display properly.