Composing Good HTML and Validators

In this two-part guide we dive into what separates HTML from good, valid HTML. You can navigate through the chapters on the left, or continue reading below.

Part 1 - Composing Good HTML

Knowing all the rules of proper grammar does not make you a good writer, and knowing every function and construct of a language does not make you a good programmer. There is something more that is required — good style.

Writing HTML is the same way. You might know the ins and outs of every markup element, and still put together sloppy documents that are hard to read and design for. But this doesn't have to be the case — you can learn how to write good, well-styled HTML.

Why bother with good HTML?

Since HTML isn't intended to be seen by your end users and online audience, you might ask:

  • What difference does it make?

  • Why bother worrying about the markup?

Answering that question is the first step toward good HTML style: thinking about who you are writing HTML for.

There are a handful of "audiences" for your HTML:

  • The current designer/developer who has to write CSS

  • Future developers and designers who might need to redesign your website

  • Google and other search engines

  • Facebook, and other social media sites that post page excerpts

  • RSS readers

  • Screen-readers for the visually impaired

  • Your regular users, ultimately.

All these different audiences, only some of which are human, have different needs, all of which are well served by good, well-styled HTML.

Good Document Structure

The most important aspect of HTML style is to structure the overall document properly.

The overall structure should look like this:


<!DOCTYPE html>
<html>

 <head>
  <title>The Title of the Document</title>
  <!-- Meta data: Description and Keywords -->
  <!-- Open Graph Details -->
  <!-- Other Semantic Data -->
  <!-- CSS Links -->
  <!-- Favicon Links -->
 </head>

 <body>
  <header>
   <h2>Title of Site</h2> <!-- H1 if Home Page -->
   <h1>Title of Document</h1> <!-- This can go here or in <main> -->
   <nav><!-- Main Navigation Menu. Usually a <ul> --></nav>
  </header>
  <main>
   <article>
    <header>
     <h1>Article Title <!--OR--> Document Title</h1>
     <!-- Article Meta Data -->
    </header>
    
    <!-- Article Content -->

    <footer>
     <!-- Article Meta Data -->
    </footer>
    <aside>
     <!-- Comments -->
    </aside>
   </article>
  </main>
  <aside>
   <!-- Sidebar(s) -->
  </aside>
  <footer>
   <!-- Site Footer: Contact info, copyright notices, navigation menu, disclaimers, etc. -->
  </footer>
  <!-- JavaScript -->
 </body>

</html>

Note that you may not need every element specified here. For example, you might not want to put a <header> element into your <article> if you aren't including meta-data details at the top of the article. A simple <h1> or <h2> tag, at the same "level" as the article content, would be just fine.

For more details about structuring your document properly, see the Structural HTML section of our HTML Guide.

Ordering Your Major Elements

The <title> and content-related <meta> should be early in your <head>. When your SEO consultant is trying to figure out what's going on with how Google or Facebook is displaying your pages, don't make them sort through a dozen CSS links and random JavaScript code.
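As a rough sketch of that ordering, a <head> might be laid out like this. The page title, description text, and file paths are placeholders invented for illustration. The character encoding declaration is placed first by convention, and the title and content-related meta data follow immediately.

```html
<head>
 <!-- Character encoding declaration, conventionally the very first item -->
 <meta charset="utf-8">
 <!-- Title and content-related meta data come next, where they are easy to find -->
 <title>Composing Good HTML</title>
 <meta name="description" content="A placeholder description of this page.">
 <!-- Stylesheets and icons follow the content-related data -->
 <link rel="stylesheet" href="/css/main.css">
 <link rel="icon" href="/favicon.ico">
</head>
```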

Even if your navbar is stuck to the top of your window, and the main title for the site is below it, put your main site title in an <h1> or <h2> tag, and place that first inside <header>. Put the navigation menu after the title.

The <main> element should be the first element after the <header>. Even if you have a left-side sidebar, use CSS to move it over to the left side; don't put your <aside> before the main content.

The page <footer> should be the last content element inside of <body>. There should be no visible HTML elements after </footer>.

Place as much of your JavaScript as possible after the </footer> closing tag. Only put JavaScript into the <head> if it absolutely has to be there (it usually doesn't).

Making your structural markup easier to read

Sometimes you just have to include a lot of <div> tags, or you have multiple <aside> sidebars at different levels. Or maybe you have a lot of nested lists.

One of the most helpful things you can do is indent your code. Consider the following two examples:

<!-- Example 1 -->
<header>
<div class="mast-head">
<h1>Document Title</h1>
<div class="nav-bar">
<div class="main-nav">
<nav>
<ul>
<li><a>Menu Item</a></li>
<li><a>Menu Item</a></li>
<li><a>Menu Item</a>
<ul>
<li><a>Menu Item</a></li>
<li><a>Menu Item</a></li>
<li><a>Menu Item</a></li>
</ul>
</li>
<li><a>Menu Item</a></li>
</ul>
</nav>
</div>
</div>
</div>
</header>

<!-- Example 2 -->

<header>
 <div class="mast-head">
  <h1>Document Title</h1>
  <div class="nav-bar">
   <div class="main-nav">
    <nav>
     <ul>
      <li><a>Menu Item</a></li>
      <li><a>Menu Item</a></li>
      <li><a>Menu Item</a>
       <ul>
        <li><a>Menu Item</a></li>
        <li><a>Menu Item</a></li>
        <li><a>Menu Item</a></li>
       </ul>
      </li>
      <li><a>Menu Item</a></li>
     </ul>
    </nav>
   </div>
  </div>
 </div>
</header>

The second one is much easier to read, isn't it? This is very helpful for designers and developers who are trying to make sense of your document.

Make sure that you are consistent with how you indent — one space, two spaces, a tab. What you pick doesn't matter that much, but the important thing is to keep it consistent.

But what about generated HTML?

Most of the HTML on the internet today is generated by one Content Management System or another. You can't always make the code indenting work the way you want it — especially if the code that generates the HTML is spread across a lot of different theme and plugin files.

The answer is to use meaningful classes or IDs and comment ending tags. This can be especially helpful for repeated blocks of generated content, such as comments or items in a sidebar.

<header>
 <div class="mast-head">
  <h1>Document Title</h1>
  <div class="nav-bar">
   <div class="main-nav">
    <nav>
     <ul> 
      <li><a>Menu Item</a></li>
      <li><a>Menu Item</a></li>
      <li><a>Menu Item</a>
       <ul>
        <li><a>Menu Item</a></li>
        <li><a>Menu Item</a></li>
        <li><a>Menu Item</a></li>
       </ul>
      </li>
      <li><a>Menu Item</a></li>
     </ul>
    </nav>
   </div> <!-- /.main-nav -->
  </div> <!-- /.nav-bar -->
 </div> <!-- /.mast-head -->
</header>

<!-- Comments -->

<div class="comments">
 <div class="comment" id="comment-39874693029">
  <!-- Comment -->
 </div><!-- /#comment-39874693029 -->
</div> <!-- /.comments -->


<!-- Sidebars -->

<aside>
 <div class="sidebar-item" id="subscribe-form">
  <!-- Subscribe form -->
 </div><!-- /#subscribe-form -->
 <div class="sidebar-item" id="archives">
  <!-- Blog Archives -->
 </div><!-- /#archives -->
</aside>

As a general rule — if an element's opening and closing tag are not on the same line, and the element requires a class or id, it is a good idea to comment the closing tag.

Meaningful Classes and IDs

First of all — make your class and ID attributes consistent and easy to read by:

  • using only lower case letters

  • separating words with hyphens.

Next, make sure that your class names and IDs make obvious semantic sense and are not all about design and display.

Good class and ID names:

  • nav-menu

  • blog-post

  • sidebar-widget

  • comment-meta.

Bad class and ID names:

  • green-box

  • left-sidebar

  • fade-in-banner.

Sometimes, the nature of your front-end framework or CSS will cause you to need extra elements, and you'll find yourself using layout-specific class names like wrapper.

This is okay if you can't help it. Just make sure you are keeping things general. There's nothing worse than a redesign that creates CSS that looks like:

.green-box {
    background-color: blue;
}
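One way to avoid that situation is to name the class after the role the element plays rather than how it looks. The sketch below uses a hypothetical class name, notice, and made-up content, both chosen only for illustration. A redesign can then change the color freely without making the name misleading.

```html
<style>
 /* The name describes the element's purpose, so it stays accurate after a redesign */
 .notice {
  background-color: blue;
 }
</style>

<div class="notice">
 <p>Our office is closed on public holidays.</p>
</div> <!-- /.notice -->
```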

Use Content elements wisely

Within the main section of your article or other content, use sectional content tags to organize your document properly.

Headlines and Sections

Headlines for sections are very important. Don't neglect them. Your final content is much easier to read if there are several titled sections and sub-sections, rather than one giant block of content.

  • Use heading tags (<h2>, <h3>, <h4>, <h5>) to title sections and subsections.

  • Make sure your hierarchy of headlines forms a reasonable outline. Don't put an <h5> after an <h2> without an <h3> and <h4> intervening. Make sure your content has a rational and understandable structure.

  • If you are using <strong> to mark up section headers, something is wrong.
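As an illustration, part of this guide's own outline could be marked up along the following lines. This is only a sketch: the comments stand in for real content, and the point is that each heading level steps down by one, never skipping a level.

```html
<article>
 <h1>Composing Good HTML</h1>

 <h2>Good Document Structure</h2>
 <!-- Section content -->

 <h3>Ordering Your Major Elements</h3>
 <!-- Subsection content -->

 <h3>Making Your Structural Markup Easier to Read</h3>
 <!-- Subsection content -->

 <h2>Use Content Elements Wisely</h2>
 <!-- Section content -->

 <h3>Headlines and Sections</h3>
 <!-- Subsection content -->
</article>
```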

Also, be sure to put an id attribute on section titles so that you and others can create in-document links.


<h3 id="title-of-section">Title of Section</h3>

...

<a href="#title-of-section">This links to that location.</a>

Finally, don't abuse the horizontal rule ( <hr> ). If you are using sections and headlines appropriately, there is almost never any reason for it.

Links

An anchor tag needs only one attribute to work as a link: href, the URL of the linked document.

But including a title attribute is very helpful, as it lets people know where they are going before they click the link. It also helps search engines determine what the link is about. Ideally, the title attribute will be the title of the linked document.
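A minimal sketch of a link that carries a title attribute might look like this. The URL and the document title are placeholders, not a real page.

```html
<!-- The title attribute repeats the title of the linked document -->
<a href="https://example.com/html-guide" title="The Example HTML Guide">our guide to HTML</a>
```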

Another issue with links is the anchor text — the actual text which the user clicks (or taps, or selects) in order to follow the link.

Try to avoid Click Here if possible. Sometimes it is inevitable, but whenever possible you should try to make the anchor text meaningful. This is helpful for your readers and also for the document you are linking to (which may be your own).


<!-- bad anchor text -->

Learn more about HTML by <a href="">clicking here</a>.

<!-- good anchor text -->

We provide a lot of <a href="">information about HTML</a>.

Images

An image only needs a src attribute (the URL of the image) in order to display, although validators generally expect alt text as well. Including alt text and a title helps in two ways:

  • screen readers for the blind can read the description to a user who can't see the image

  • search engines can index the image and have some idea of what the image is about.
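For example, an image with both attributes filled in might be written as follows. The file URL and the descriptive text are invented for illustration.

```html
<!-- alt describes the image for users and programs that cannot see it -->
<img src="https://example.com/images/keyboard.jpg"
     alt="A hand typing HTML on a laptop keyboard"
     title="Writing HTML by hand">
```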

Definitions, Quotes, Acronyms

There are several very helpful span-level markup elements that are rarely used.

These provide fine-grained information about the words on your page. They can help users better understand your content, and they can help computers (search engines, artificial intelligence) make better sense of what you have written.

  • <dfn>: the definition tag. Use it the first time you use and define a technical term.

  • <abbr>: used for abbreviations. You can put the expanded form of the abbreviation in the title attribute.

  • <q>: most people just use typographical quote marks to delineate quotations, but using the markup makes it more explicit, and allows you to reference the source of the quote with the <cite> element.
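Here is a short sketch that uses all three in running text. The sentences are made up for illustration, apart from the quotation, which is taken from this guide.

```html
<p>
 <!-- dfn marks the term being defined; abbr carries the expansion in its title -->
 A <dfn>validator</dfn> is a tool that checks markup against a specification.
 The <abbr title="World Wide Web Consortium">W3C</abbr> hosts one online.
</p>
<p>
 <!-- cite names the work being quoted; q wraps the quotation itself -->
 As <cite>Composing Good HTML</cite> puts it,
 <q>Make sure your HTML clearly communicates what you want to communicate.</q>
</p>
```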

Learn More and Care More

Writing good HTML is a matter of:

  • learning a handful of basic principles

  • caring enough about your content and your site to follow them.

We can only help you with the learning part. You have to make the decision to care.

This article touched on some of the most basic (and commonly ignored) issues, but you can learn a lot more from our HTML Guide, especially the sections on Semantic Markup.

Most of the advice there can be summed up in one sentence:

Make sure your HTML clearly communicates what you want to communicate.

Part 2: HTML Validators

With the advent of modern, standards-based web browsers and HTML, there has been an increasing interest in validation: making sure that the source code of a website is free of errors and conforms to the relevant specifications.

This is a good thing, of course. The web is generally a better place when websites follow "the rules." But too much emphasis on validation can also be counter-productive.

Here's what you need to know.

What is HTML Validation?

Validation just means checking to see if your web page's source code conforms to the specification for the language laid out by the W3C. This checking is done by a software tool called an HTML Validator.

This is analogous to proofreading — making sure that all the words are spelled right and that conventional rules of punctuation and grammar are followed.

The specification for a markup language spells out (in excruciating detail, sometimes) how each HTML element is to be used, what its potential attributes can be, and how it relates to the other elements on a page.

To say that an HTML document is valid just means that it follows each and every one of those rules.

What is HTML Validation not?

HTML validation doesn't actually tell you if your website is any good, or looks the way you think it should, or will help you achieve your marketing goals. It only tells you if your markup conforms to the specification.

This is a little like the difference between editing and proofreading — validation is like proofreading.

HTML Validation also is only concerned with the HTML — not the CSS, the JavaScript, the underlying PHP. It also has nothing to do with things like forms working properly (form validation is a whole different thing).

Why bother with validation?

Running your HTML through a validator can help you catch mistakes that can creep into your HTML from a variety of avenues.

Simple Typos

Probably the most common source of validation errors is simple typographical errors. If an element's tag name is spelled wrong, or a right angle-bracket is hit instead of a left one, you'll get a validation error. These are often the most important errors to find and fix, and also the easiest.
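A typical slip of this kind is sketched below with made-up content. A browser treats the misspelled tag as an unknown element, so the text loses its emphasis, and a validator should flag it.

```html
<!-- Typo: "stong" is not an HTML element, so this is invalid -->
<p>This step is <stong>important</stong>.</p>

<!-- Corrected -->
<p>This step is <strong>important</strong>.</p>
```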

Version mismatch

Each version of HTML has a slightly different set of rules, and a slightly different set of elements and attributes. If your HTML is valid, that means you have followed all of those rules, and only included things that are officially a part of that version of the language.

For example, the <article> HTML tag is new in HTML5 — it was not present in the HTML 4.0 specification. That means that if you were validating against that specification, and you included <article>, it wouldn't be valid. You'd get an error.
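The following sketch shows that mismatch. The doctype declares HTML 4.01 Strict, so a validator checking against that version should reject the <article> element. The page content is a placeholder.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
 <head>
  <title>Version Mismatch</title>
 </head>
 <body>
  <!-- <article> is not defined in HTML 4.01, so validation fails here -->
  <article>
   <p>Some content.</p>
  </article>
 </body>
</html>
```

Changing the first line to <!DOCTYPE html> declares the document as HTML5, where <article> is a defined element.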

Another example of something that has changed is the way void elements (sometimes called empty or null elements) close.

The image tag ( <img> ) is a void element: it has no content, only attributes (the image itself comes from an attribute pointing to the image file, not from the content of an element). In the past, under XHTML, void elements had to be explicitly closed, so you would see this:

<img src="http://example.com/some_image" alt="Some Image" />

Now, in the HTML5 specification, the trailing slash is allowed but no longer needed, and the same image can look like this:

<img src="http://example.com/some_image" alt="Some Image">

Validation is important for a number of reasons:

  • There have been many different specifications over the years

  • Sometimes multiple specifications are active at the same time

  • Coders have developed habits based on various ways of doing things.

Bad Server-Side Code

Most websites today use some underlying Content Management System or server-side scripting to generate HTML. This adds a layer of complexity that can introduce additional errors.

For example, if a particular condition isn't met or a template file isn't loaded, the closing tag for a large element might not get included in the output.

It can also be difficult to see the whole HTML document when working on server-side dynamic scripts — the template for a single page is often spread across a number of different files.

Included Bad Code

Along with Content Management Systems, most website owners use a number of third-party plugins to help generate their websites. These are not always as high-quality as they should be, and can be a source of typos, bad markup, and poorly-written code.

Simply not knowing the HTML specification

There is a lot of minute detail to the HTML spec — things that a lot of beginner and intermediate developers may not know or understand.

Did you know that you can't put a list ( <ul> or <ol> ) inside a paragraph ( <p>)? It's not just invalid — from the standpoint of the HTML specification (and most HTML parsers), it is literally nonsense.

There are a lot of rules like that in the HTML specification — some of them explicit, some of them implicit. A validator will catch if you are breaking any of them.
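To see why a list inside a paragraph makes no sense to a parser, compare what was written with roughly what an HTML5 parser builds from it. The list content is made up for illustration. The opening <ul> tag implicitly ends the paragraph, and the stray </p> at the end produces an extra, empty paragraph.

```html
<!-- What was written -->
<p>Ingredients:
 <ul>
  <li>Flour</li>
  <li>Water</li>
 </ul>
</p>

<!-- Roughly what an HTML5 parser builds from it -->
<p>Ingredients:</p>
<ul>
 <li>Flour</li>
 <li>Water</li>
</ul>
<p></p>
```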

Why is Valid Markup important?

Many basic problems (such as typos or missing tags) can cause a problem with the way a website displays in a browser. A missing </div> can throw off the alignment and layout of every element after it on the page. A misspelled <img> tag probably means the image won't be displayed at all.

But there's more to validation than catching typos.

If you use HTML in a way that departs from the specification (a "nonstandard" way), browsers may not be able to parse and display your site correctly. The whole point of a standard is to make sure that every browser or client knows exactly what each detail of a document means, and how to display it to a user.

Validating helps to make sure that you conform to the specification.

Valid Markup is not all-important

That being said, it is important to realize that not all invalid markup is wrong. You may get errors and warnings from a validator that you can simply ignore. This can happen for at least two reasons:

  • Sometimes, common industry practice has evolved away from the official standard. If you follow the common practice, you may get an error, even though it doesn't matter.

  • Sometimes, validators are just wrong. This doesn't happen often, but it does happen.

As of the time of this writing, all five of the most popular websites on the internet have validation errors on their main page. But if you look into each error in detail (as, I'm sure, the teams behind those sites have done), there is a reason why they have made each choice.

How to Validate

The easiest way to check if a site has valid markup is to use the W3C Markup Validation Service. This is a simple online interface that lets you check a website by its URL. (You can use it to check any site, not just your own.)
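If you want something small to experiment with, the service also accepts markup pasted in directly. The page below is a minimal sketch of a document that should pass as HTML5. Its title and text are placeholders.

```html
<!DOCTYPE html>
<html lang="en">
 <head>
  <meta charset="utf-8">
  <!-- The title element is required; removing it produces a validation error -->
  <title>Validation Test</title>
 </head>
 <body>
  <h1>Validation Test</h1>
  <p>If this page validates, try removing the title element and checking again.</p>
 </body>
</html>
```

Breaking one rule at a time in a small page like this is an easy way to learn how the validator words its messages.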

If you plan to use a validator a lot, you can do it in the browser with the HTML Validator plugin for Firebug for Firefox or the Validity extension for Chrome.

If you need to do validation on HTML files away from the browser, there is also an Open Source command-line HTML 5 validator.

Further Resources