HTML for Beginners - Ultimate Guide

From the small business owner to the student creating a class project, or even casual individuals working on a blog or personal project online, HTML knowledge is incredibly useful. Although the prospect of having to learn a programming language certainly does seem daunting, the good news is that HTML uses common words so that it is fairly simple to pick up.

In this guide we cover the basics in a (hopefully) easy-to-understand manner, perfect for the absolute beginner. However, we don't stop at the basics: even seasoned webmasters will find useful tips to expand their working knowledge of HTML.

1. HTML Basics

This chapter introduces HTML, the language used to author web pages, and provides a little background regarding its history and the reason it is used.

What is HTML?

HTML stands for Hypertext Markup Language, a format for creating documents and web pages. It was invented in the early 1990s by Tim Berners-Lee and was based on an earlier markup language called SGML (Standard Generalized Markup Language), which was in turn based on a format simply called GML (Generalized Markup Language), developed at IBM in the 1960s.

HTML consists primarily of matching pairs of angle-bracketed tags surrounding human-meaningful text (<span>like this</span>). The tags provide meaning, context, and display information to the text they surround.

What is a Markup Language?

Imagine any text-based document you have ever read: a website, a book, a PDF, a Word doc, a church bulletin. There is the text, of course — but there’s something else: how the text is displayed. Some of the words are larger or smaller, some are italicized or in bold, some are a different color or a different font.

The file that one of these documents is saved into has to contain both the human-readable text and also the information about the display. A number of different ways to accomplish this have been tried, and the most convenient way to do it is to store the information in line with the text itself.

So, for example, if you want to make some text bold or italic, you might do something like this:

I want to make [start bold]these words bold[end bold] and [start italic]these other words italic[end italic].

Which, in theory, should produce something like:

I want to make these words bold and these other words italic.

These inline matching pairs of style declarations are called tags, and something like this is the basis of almost every markup language. But the format shown above isn’t HTML; it’s just a little made-up example.

The example above has many problems with it, and the inventors of HTML (and SGML and GML) came up with something similar, but much better:

  • Square brackets are often used in text, so reserving them for use in tags could cause problems. Instead HTML uses angle brackets: < and >.
  • Writing start and end over and over is very tedious. HTML simplifies this by using the tag name itself as the “start” declaration. The tag name with a slash in front of it ( / ) is used as the ending tag.
  • Rather than the whole words “bold” and “italic,” HTML uses abbreviations to make it faster to type and less obtrusive to read.

So, taking these things into account, the above example would look like:

I want to make <b>these words bold</b> and <i>these other words italic</i>.

I want to make these words bold and these other words italic.

Recently, there has been a move away from explicitly declaring typographical details (like bold and italic) and toward using the markup to convey meaning, not just appearance. For that reason, the <b> and <i> tags, although still valid, are no longer recommended for marking importance or emphasis. Instead, the preferred tags are <strong> and <em> (emphasis). So in contemporary documents the sentence above would be:

I want to make <strong>these words stand out</strong> and to <em>emphasize these words</em>.

I want to make these words stand out and to emphasize these words.

HTML is, at its core, nothing more complicated than a set of defined markup tags.

What is hypertext?

Hypertext is a word that was invented in the 1960s to describe documents that contain links that allow the reader to jump to other places in the document or to another document altogether. These links, which we now take for granted in the modern web, were a big deal when computers were first coming into maturity.

The “hyper” concept of internal and external linking was so revolutionary to the way content is organized on the internet that the word shows up in a number of places:

  • HTML is the “HyperText Markup Language.”
  • http:// stands for “HyperText Transfer Protocol.”
  • A link from one page to another is called a “hyperlink,” and the attribute that specifies where a link points (href) is called a “hyper reference.”
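
As a rough sketch of what hypertext looks like in practice, the snippet below shows one link to another document and one link to a different place in the same document. The URL and the id value are placeholders invented for illustration.

```html
<!-- A link to another document (example.com is a placeholder address) -->
<p>Read the <a href="http://example.com/history">history page</a>.</p>

<!-- A link to another place in the same document -->
<p><a href="#footnotes">Jump to the footnotes</a></p>

<!-- The target of the internal link; the id "footnotes" is made up -->
<h2 id="footnotes">Footnotes</h2>
```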

Where and How is HTML used?

HTML is used for almost all web pages. The web page you are reading right now uses HTML. It is the default language of websites.

It can also be used for other types of documents, like ebooks.

HTML documents are rendered by a web browser (the application you are using to read this page). HTML rendering hides all the tags, and changes the display of the rest of the content based on what those tags say it should look like.
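
To see rendering in action, one option is to save a small file like the sketch below as page.html (a hypothetical filename) and open it in a browser. The title and text are placeholders. The browser should display the heading and the paragraph, but none of the angle-bracketed tags.

```html
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8" />
    <title>My first page</title>
  </head>
  <body>
    <h1>Hello, world</h1>
    <p>The tags around this text are <em>not</em> shown by the browser.</p>
  </body>
</html>
```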

Do I need to learn HTML to run my website?

Unless you plan to become a web developer, and build pages from scratch, you don’t need to know all the intricate details of HTML.

If you are just using a blogging platform, a site builder, or a Content Management System (CMS) set up by someone else, you may be able to get by without knowing any HTML — there are “graphic” editors available that make adding content to a blog similar to writing in Microsoft Word or email.

However, sometimes those graphical editors don’t work exactly the right way, and sometimes you will want to do something and not understand why you can’t. Therefore, if you are going to be writing for the web, even just regular blog posts and announcements, it is highly recommended that you get a good understanding of basic HTML concepts.

Additionally, there are details of how HTML documents are structured that have an effect on things like SEO and data aggregation. If you are interested in staying informed about how your website appears to non-human visitors, understanding HTML is an important skill.

Similarly, website accessibility (the ability for a website to be navigated successfully by people with visual or other disabilities) is an increasingly important consideration. Blind users rely on screen readers to translate websites into sound, and the quality and structure of the underlying HTML document have a big impact on the ability of the screen reader to work properly.

Above all, HTML is the common underlying language of the contemporary internet. If you want to understand how the world works, it is a good idea to at least have some familiarity with HTML.

Summary

HTML — Hypertext Markup Language — is the language used for creating web pages and other web-based documents. It consists mainly of matching pairs of angle-bracketed tags, enclosing sections of human-meaningful text. The tags, which are not displayed by web browsers, are used to provide information about how the text and page should be displayed.

2. HTML Elements and Tags

This chapter takes a close look at tags, the fundamental building blocks of HTML. It covers how they work, some exceptions to the normal way they work, and a brief discussion on tag attributes.

Structure of Tags

Generally, matching pairs of tags surround the section of text which they affect. A matching pair of tags, along with the content it encloses, is called an “element.”

<strong>This element begins and ends with the "strong" tag.</strong>

The opening tag can contain additional attributes which provide more information about the contents of the tag and how to display them. These attributes are written as name–value pairs, separated by an equals ( = ) sign, with the value placed in quotes.

<a href="http://example.com">This is a link. The tag is an "a" for "anchor," and the href (hyper reference) attribute specifies where the link is pointing.</a>

A few tags do not occur in matching pairs, because they are used to insert something, rather than describe text. These are called “empty” or “void” tags, and the most common one is the one used for inserting an image. The src attribute is used to specify the URL of the image.

<img src="http://media.whoishostingthis.com/2/v60/images/wiht-logo.png" />

Notice there is no closing tag, and therefore no enclosed text. The slash right before the final angle bracket ( /> ) is used to “self-close” the tag. This is not absolutely required, but it is a good reminder that whatever follows will not be enclosed.

There are several other empty tags. Two are fairly straightforward and common.

  • <br/> inserts a line-break.
  • <hr/> inserts a horizontal rule (line) separator.
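
A short sketch of these two void tags in use, with made-up address text as the content:

```html
<p>
  123 Example Street<br/>
  Springfield
</p>

<hr/>

<p>This paragraph sits below the horizontal rule.</p>
```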

Others do not insert something visual, but are used to provide information about the page itself.

<link rel="stylesheet" type="text/css" href="theme.css" />

<meta name="description" content="A short description of this page." />

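Additionally, the <script> tag (which is used to add JavaScript to a page) can have no content, for example when it loads an external file, but it doesn’t have to be empty. Unlike the void tags above, it always needs a closing </script> tag.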

(More information on <link>, <meta>, and <script> tags will be provided later in this guide.)
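
As a hedged illustration, the sketch below shows the two forms a script element can take. The file name main.js is hypothetical.

```html
<!-- No content: the code is loaded from an external file.
     The closing tag is still required. -->
<script src="main.js"></script>

<!-- With content: the code is written directly between the tags. -->
<script>
  console.log("This message appears in the browser's developer console.");
</script>
```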

Block-level vs. inline

With the exception of tags that provide information about the document itself, HTML tags fall into two categories, block and inline.

Block elements

Block elements represent rectangular blocks of content. They have an implied line break before and after. Block elements include sectional content like paragraphs ( <p> ), divisions ( <div> ), and headlines ( <h1>, <h2>).

It is standard practice to type most block-level tags on individual lines above and below their content:

<div>
This is a div.
</div>

However, this is not always done, especially with headlines:

<h1>This is the title of a page</h1>

<h2>This is a major section</h2>

<p>
Some content in a paragraph.
</p>

Block-level elements can be nested, but some block-level elements cannot contain other block-level elements:

<div>
  <h2> Title of an article</h2>
  <p>
    First paragraph of article.
  </p>
</div>

Paragraphs and headlines cannot contain other block-level elements.
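
The sketch below illustrates this rule with invented text. Browsers are forgiving here: rather than reporting an error, they typically end the paragraph as soon as the block-level tag appears, so the result may not match what the author intended.

```html
<!-- Not valid: a div inside a paragraph -->
<p>
  Some text.
  <div>A block inside a paragraph.</div>
</p>

<!-- Valid: the div is the container, and the paragraph sits inside it -->
<div>
  <p>Some text.</p>
  <div>A block next to the paragraph.</div>
</div>
```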

Inline elements

Inline elements are elements used within text. Bold ( <strong> ), italic ( <em> ), and links ( <a> ) are all inline elements.

Inline elements are sometimes called “span-level” elements. There is also a generic span-level element, simply called a span ( <span> ). This doesn’t do a whole lot by itself, but can be used to create customized types of text display through the use of attributes such as class.

<span class="special-text">This text is special.</span>

(See the chapter on CSS for information on how to make class="special-text" display in a special format.)
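
For a quick preview before the CSS chapter, here is one way the class could be styled. The color and weight below are arbitrary choices for illustration, not a recommendation.

```html
<style>
  /* Applies to every element with class="special-text" */
  .special-text {
    color: darkred;
    font-weight: bold;
  }
</style>

<p>
  Most of this sentence is plain, but
  <span class="special-text">this text is special</span>.
</p>
```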

Sometimes it doesn’t make sense

Images ( <img> ) feel like block-level elements — they are rectangular, have definite dimensions, and are usually displayed outside of the flow of text.

However, they are actually inline elements. The reason for this is mostly a hold-over from a less-sophisticated period of web design, but we’re stuck with it now. The weird implications of this can be avoided easily, but it’s good to know. (See the chapter on images and also the one on CSS.)
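
A small sketch of what inline images mean in practice. The file name photo.jpg and the class name block-image are hypothetical, and the CSS rule is just one common way to make an image behave like a block.

```html
<style>
  /* Makes images with this class start on their own line */
  .block-image {
    display: block;
  }
</style>

<!-- Default behavior: the image sits in the line of text, like a large letter -->
<p>Text before <img src="photo.jpg" alt="A photo" /> and text after.</p>

<!-- With display: block, the image no longer shares a line with the text -->
<p>Text before <img src="photo.jpg" alt="A photo" class="block-image" /> and text after.</p>
```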

There are other weird issues like this, and they will be covered later in this guide when they come up.

More about attributes

Almost every element tag can include attributes. Many elements have a specific set of attributes they support (like <img> and the src attribute), but there are several attributes which are globally supported by all element types.

Two important attribute types are class and id.

<a href="http://example.com" class="example-link" id="link27">This anchor tag has three attributes.</a>

Class attributes

Class attributes are used to mark one or more elements as belonging to a specific “class” or group — this can be used for displaying them all the same way.

For example, it is common to use an unordered list ( <ul>) as a menu, and to make the list item ( <li> ) which points to the current page look different than all the other links in the same list.

<ul class="menu">
 <li class="menu-item">
   <a href="/home">Home</a>
 </li>
 <li class="current-item">
   <a href="/about">About</a>
 </li>
 <li class="menu-item">
   <a href="/contact">Contact</a>
 </li>
</ul>

An element can have more than one class. Multiple classes are separated by spaces inside the class attribute.

<p class="first drop-cap">
  This is the first paragraph, and it is also part of the drop-cap class.
</p>

Because classes are separated by a space, classes may not include spaces in their names.

In CSS, JavaScript, and other languages, the class of an element is notated with a dot before the name.

/*CSS*/

.first {
    color: green;
}

The above CSS code means that within any element that has a class of first, the text color should display as green.
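
Putting the pieces together, the sketch below applies two classes to one paragraph. The drop-cap rule is an assumption made up for this example (the guide does not define it); it enlarges the first letter using the ::first-letter selector.

```html
<style>
  .first {
    color: green;
  }

  /* Hypothetical styling for the drop-cap class */
  .drop-cap::first-letter {
    font-size: 300%;
    float: left;
  }
</style>

<p class="first drop-cap">
  This paragraph gets both rules: green text and a large first letter.
</p>

<p class="first">
  This paragraph only gets the green text.
</p>
```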

ID attribute

The ID attribute works similarly to the Class attribute, but is conceptually different. Rather than signifying the element’s membership in a group, it uniquely identifies that element. For this reason, there can be only one element with any specific ID on any given page.

<h1 id="page-title">This is the title of the page</h1>

IDs are less often used for affecting display, and more often used for functional purposes.

IDs can be used for internal linking within a document, such as the table of contents on a Wikipedia article.

<ol>
 <li>
  <a href="#intro">Intro</a>
 </li>
 <li>
  <a href="#history_of_topic">History of Topic</a>
 </li>
</ol>

<h2 id="intro" class="section-header">Introduction</h2>

<p>
Text of introductory section.
</p>

<h2 id="history_of_topic" class="section-header">History of Topic</h2>

<p>
Some history on this topic.
</p>

Notice that the links to the sections include the id of the target element, prefixed with the hash or pound sign ( # ). This is the standard way to reference the id of an element:

/*CSS*/

#intro {
    font-size: 14px;
}

// jQuery

$("#intro").click(function () {
  // do something when the intro area is clicked
});

Other attributes

Each HTML tag has its own set of available attributes relating to its specific purpose. For example, the <a> tag, which defines a hyperlink, includes the href (hyper-reference) attribute, which specifies the URL being linked to.

These attributes will be covered in more detail as we look at each tag individually in later chapters.

There are also a number of “global” attributes — attributes any element can have. These will also be covered in more detail later on, as their uses become more relevant.
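
As a brief preview, the sketch below uses three global attributes that can appear on any element: title, lang, and hidden. The text content is invented for illustration.

```html
<!-- title: extra information, usually shown as a tooltip on hover -->
<p title="This text appears as a tooltip">Hover over this paragraph.</p>

<!-- lang: the language of the element's content -->
<p lang="fr">Bonjour tout le monde.</p>

<!-- hidden: the element is not displayed -->
<p hidden>This paragraph is in the source but not shown on the page.</p>
```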

Comment Tags

The last point to cover in basic HTML tags is the comment. Comments begin with an angle bracket, followed by an exclamation point and two dashes. They end with two dashes and a closing angle-bracket.

<!-- This is a comment. -->

Comments may be multi-line.

<!--
This is a comment.
It has two lines.
-->

Comments may not be nested:

<!-- 
If I try to nest a comment inside another comment.
  <!-- Like this -->
Then the part after the first closing tag will fall outside the comment.
-->

You need to watch out for nesting of comments if you ever try to comment out a large section of existing HTML: inline comments in the original section will mess up your commenting.

Anything inside the comments will not be displayed to the user inside the browser. However, HTML comments can be viewed by the site visitors if they choose to view the page source. Therefore, do not use comments for anything you wish to hide from the public.

Summary

HTML is essentially text content with tags that are used to specify the meaning of that content within the document and the relationship of each piece of content to the others.

Tags are short snippets of letters inside angle-brackets. They typically consist of a matching pair — an opening and a closing tag. The opening tag is just the tag name, while the closing tag is prefixed with a slash.

Attributes may be added to any element. Attributes are specified inside the opening tag, as name–value pairs joined by an equals sign. The value must be inside single or double quotes (double quotes are standard).

The two most common attributes are class and id, which are used for both styling and functional purposes.

3. Textual HTML

This chapter covers all the most common elements that are used for typographical styling and semantic meaning within the text of a typical HTML document. Elements covered include headlines, paragraphs, lists, and links — and several others.

Headlines

HTML provides six levels of headline elements, <h1> through <h6>.

<h1>The most important title on a page</h1>

<h2>Title of a major section</h2>

<h3>Sub-section heading.</h3>

<!-- etc. -->

When to use <h1> and <h2>

Search engines consider the <h1> tag to be the most important headline on a page, and look to it for a clue as to what the page is about. It should therefore match the content of the <title> tag inside the <head> if at all possible, and there should be only one <h1> element on any page.

On your home page and blog index page, it is best to put your site title in the <h1> tag, and the titles of articles in a blog index inside <h2> tags.

However, on a single-article page, the title of the post or article itself should be inside the <h1> tag, with the title of the website in an <h2> or even <h3> tag.

Similarly on a category-based or tag-based archive page, it is usually best to put the category or tag name inside an <h1> tag.
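
The sketch below contrasts the two cases described above. The site name, post titles, and URLs are invented, and the two fragments belong to two separate pages.

```html
<!-- Blog index page: site title in h1, article titles in h2 -->
<h1>My Example Blog</h1>
<h2><a href="/first-post">First Post</a></h2>
<h2><a href="/second-post">Second Post</a></h2>

<!-- Single-article page: article title in h1, site title demoted to h2 -->
<h2>My Example Blog</h2>
<h1>First Post</h1>
```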

Hierarchical organization

It is (mildly) good for SEO, and very good for readers, to break articles into logical sections and use appropriate heading tags within the content of an article. Heading tags should be used in a hierarchical fashion — if an <h4> follows an <h3> tag, it should be the header for a subordinate section.
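
A sketch of a hierarchical outline, with invented headings. Indentation is used here only to make the levels visible; it has no effect on how the page is displayed.

```html
<h1>Growing Tomatoes</h1>

  <h2>Choosing a Variety</h2>

    <h3>Cherry Tomatoes</h3>
    <h3>Beefsteak Tomatoes</h3>

  <h2>Planting</h2>

    <h3>Soil Preparation</h3>

      <h4>Testing Soil Acidity</h4>

    <h3>Watering</h3>
```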

Subtitles

A title with a subtitle should not use two different header tags:

<!-- Do NOT do this: -->

 <h1>The Main Title of this Article:</h1>
 <h2>The Subtitle of that same article</h2>

<!-- Really. Please don't. -->

Instead, put the entire title and subtitle into a single headline tag and use another tag to define the relationship:

<h1>The Main Title of this Article:<br/>
<small>The Subtitle of that same article</small></h1>

<!-- OR -->

<h1>The Main Title of this Article:
<span class="subtitle">The Subtitle of that same article</span></h1>
<!-- This option requires some additional CSS to display in a sensible manner. -->

Headlines on Widgets

Sidebar sections, or widgets, need titles, but these are not generally relevant from a content (SEO) standpoint. Most well-informed designers use <h4> tags for this purpose, reserving <h1>, <h2>, and <h3> for keyword-relevant content.

<aside> <!-- sidebar -->

 <h4>Recent Posts</h4>
 <ul>
  <li><a href="#">Post title</a></li>
  <li><a href="#">Post title</a></li>
  <li><a href="#">Post title</a></li>
 </ul>

 <h4>Archive</h4>
  <ul>
     <li><a href="#">June 2015</a></li>
     <li><a href="#">May 2015</a></li>
     <li><a href="#">April 2015</a></li>
  </ul>

</aside>

However, if you regularly have so many sub-sections in your page content that you need to use <h4> headlines in your main text, there’s nothing really bad about using <h5> or even <h6> in your sidebar titles.

Headlines as link targets

In a particularly long document, it might be a good idea to make it possible to link not just to the page as a whole, but to a specific section.

In the past, only anchor tags ( <a> ) could be used as the target of a link, but that is no longer the case — any element can become the target of a page-location specific link.

The natural candidates for such links are headline tags, because they are used to identify the beginning of sections.

All that is needed to make this work is to add a unique id attribute to the header element. Links to that section are simply the page URL, appended with the hash sign ( # ) and the ID.


<!-- 
Imagine the following is a headline on the page
http://example.com/page
-->

<h3 id="some-headline">Some headline halfway through the page</h3>

<!--
Then a link to that section of the page would look like:
-->

<a href="http://example.com/page#some-headline">Click here to go there.</a>
<!--
If this doesn't make sense, read the section below about hyperlinks.
-->


<!--
Links on the page itself, such as for a table of contents, do not need the full URL, and can simply begin with the # sign.
-->

<a href="#some-headline">Click here to scroll there.</a>

<!--
A great example of this usage in an on-page table of contents is Wikipedia. Every article uses this type of in-document linking.
-->

Paragraphs

The paragraph tag — <p> — should surround every paragraph of text in your main content.

Multiple line breaks in your source code (without the <br> line-break tag) will not display as line breaks on-page. In order to get proper display spacing between paragraphs of text, you should use the <p> tag.

<article>
 <h2>The importance of paragraph tags</h2>
 <p>Every paragraph of your content should be within a paragraph element. The paragraph element is defined by the p-tag.</p>
 <p>Using the paragraph tag properly ensures that your line spacing between paragraphs will display properly. It also helps with assistive technologies like text-to-voice readers (it helps with proper pausing).</p>
</article>

Some people prefer to put the opening and closing tags on individual lines. This may help with reading source code, but it makes no difference to how a page ultimately looks to a user.

<article>
 <h2>Putting p-tags on individual lines</h2>
 <p>
  Some people like to put the opening and closing paragraph tags on individual lines.
 </p>
 <p>
  There is no real benefit or drawback to doing it this way.
 </p>
</article>

Many CMSes, like WordPress, insert <p> tags into your post content automatically, so you don’t have to worry about it if you are using one of these systems.

Lists

There are three types of lists available in HTML:

  • <ul>: Unordered list. Bulleted lists (like this one), called “unordered” because they are not numbered.
  • <ol>: Ordered list. Numbered lists, which can use regular numerals (1, 2, 3), Roman numerals (I, II, III or i, ii, iii), or letters (A, B, C or a, b, c).
  • <dl>: Definition list. A list with individual terms and then descriptions for each term. (This list could have been a definition list, but it isn’t.)

Unordered List — <ul>

The unordered list is a way to present a list of bullet-pointed items. The list itself is wrapped in the <ul> tag, and each item in the list is wrapped in the <li> (list item) tag.

<ul>
 <li>Apples.</li>
 <li>Oranges.</li>
 <li>Typewriters.</li>
</ul>
  • Apples.
  • Oranges.
  • Typewriters.

In the past, you could specify what kind of bullet you wanted (disc, square, circle) in the type attribute. As of HTML5, however, this attribute is obsolete. If you want to change the bullets, you should use CSS.

<ul type="square"> <!-- DON'T DO THIS -->
 <li>It's bad.</li>
 <li>It's wrong.</li>
 <li>It's unsupported.</li>
</ul>
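
One CSS-based alternative is sketched below, using the list-style-type property. The class name square-bullets is made up for this example.

```html
<style>
  /* list-style-type accepts disc, circle, square, and none, among others */
  .square-bullets {
    list-style-type: square;
  }
</style>

<ul class="square-bullets">
  <li>It's valid.</li>
  <li>It keeps styling out of the markup.</li>
</ul>
```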

Ordered List

Ordered lists are lists which are numbered or lettered. The outside element is <ol>, and the <li> tag is used again for each item.

<ol>
 <li>Collect underpants.</li>
 <li>???</li>
 <li>Profit.</li>
</ol>
  1. Collect underpants.
  2. ???
  3. Profit.

The <ol> element supports several attributes which change how the list is numbered.

The type attribute can be used to change the default Arabic numerals (1, 2, 3) to letters or Roman numerals (capitals or lower-case).

<ol type="i">
 <li>Lowercase Roman numeral 1.</li>
 <li>Lowercase Roman numeral 2.</li>
 <li>Lowercase Roman numeral 3.</li>
</ol>
  i. Lowercase Roman numeral 1.
  ii. Lowercase Roman numeral 2.
  iii. Lowercase Roman numeral 3.

Options for type are:

  • 1 — Arabic numerals (1, 2, 3) — This is the default.
  • A — Capital letters (A, B, C)
  • a — Lower-case letters (a, b, c)
  • I — Capital Roman numerals (I, II, III)
  • i — Lower-case Roman numerals (i, ii, iii)

The start attribute can be used to begin the list numbering on a number other than 1. This can be used for numbers or for other types.

<ol start="10">
 <li>Chocolate</li>
 <li>Vanilla</li>
 <li>Motor Oil</li>
</ol>

<ol type="I" start="8">
 <li>Telesphorus</li>
 <li>Hyginus</li>
 <li>Pius</li>
 <li>Anicetus</li>
</ol>
  10. Chocolate
  11. Vanilla
  12. Motor Oil

  VIII. Telesphorus
  IX. Hyginus
  X. Pius
  XI. Anicetus

Finally the reversed attribute can be used to number the list items in reverse order. This can be combined with either of the other attributes (or both).

<h3> Out of the starting gate!</h3>
<ol start="-3">
 <li>Wait for it.</li>
 <li>Wait for it.</li>
 <li>Wait for it.</li>
 <li>GO!</li>
</ol>

<h3>Top Ten Reasons</h3>
<ol start="10" reversed>
 <li>Because.</li>
 <li>And so therefore.</li>
 <li>QED</li>
 <li>etc.</li>
</ol>

Out of the starting gate!

  -3. Wait for it.
  -2. Wait for it.
  -1. Wait for it.
  0. GO!

Top Ten Reasons

  10. Because.
  9. And so therefore.
  8. QED
  7. etc.

Things to notice about these two examples:

  1. The start attribute can be negative.
  2. Even if the list is reversed, the start value is the first number for the list.
  3. The reversed attribute doesn’t need to specify a value. This is because it has only two possible values: true (present) or false (absent).
  4. A top-ten (or similar countdown) list doesn’t need to specify a start attribute if it ends with 1, which will always be the last number in a reversed list unless otherwise specified. The example above didn’t actually contain ten items, so it was necessary to specify.
  5. The default behavior is to increase the number for each succeeding list item. Therefore, if you want to “count down” from a negative number (as in the starting-gate example), you should not include the reversed attribute.
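
To illustrate point 4, the sketch below omits the start attribute. With three items and the reversed attribute, browsers that support the attribute number the list 3, 2, 1. The list content is invented.

```html
<h3>Top Three Reasons</h3>
<ol reversed>
  <li>It is simple.</li>
  <li>It is free.</li>
  <li>It works everywhere.</li>
</ol>
```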

Description / Definition Lists

Description lists (or “Definition” lists, as they are more commonly called) are a bit different than ordered and unordered lists. They are used to provide a list of terms with descriptions, such as in a glossary.

The whole list is wrapped in the <dl> tag. Each term in the list is marked with a <dt> tag (“definition term”), and each term is followed by one or more <dd> elements (“definition description”).

<h3>Types of Lists</h3>
<dl>
 <dt>Ordered List</dt>
 <dd>A numbered list of items.</dd>
 <dt>Unordered List</dt>
 <dd>A list of bulleted items.</dd>
 <dt>Definition List</dt>
 <dd>A list of terms with associated definitions.</dd>
 <dd>Each term can have one or more definition descriptions.</dd>
</dl>

Types of Lists

Ordered List
A numbered list of items.
Unordered List
A list of bulleted items.
Definition List
A list of terms with associated definitions.
Each term can have one or more definition descriptions.

The obvious use for a description list is a glossary or dictionary, but that isn’t the only standard use.

  • List of names: with contact information in the description.
  • List of audio track titles: with detailed track information in the description.
  • List of product offerings: with information about the products in the description.
  • List of stats: with the stat name as the term and the stat value in the description.

Anytime you have a list of items which each require more detail, the description list is a good idea.

Definition lists are even more powerful than you might already realize because the <dd> tag — the description — can hold any other elements: paragraphs, images, other lists. This means that a description list can be a very content-rich markup scheme whenever you have individual items which each need additional details of any kind.
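
A sketch of a content-rich description list, here a small contact list. The names, email addresses, and phone number are placeholders.

```html
<dl>
  <dt>Jane Example</dt>
  <dd>
    <p>Editor</p>
    <ul>
      <li>Email: <a href="mailto:jane@example.com">jane@example.com</a></li>
      <li>Phone: 555-0100</li>
    </ul>
  </dd>
  <dt>John Example</dt>
  <dd>
    <p>Designer</p>
    <ul>
      <li>Email: <a href="mailto:john@example.com">john@example.com</a></li>
    </ul>
  </dd>
</dl>
```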

There is also one off-the-wall use for description lists, which is somewhat controversial. It was included as an example in the HTML4 specification, but removed for HTML5: script-like dialogue.

<dl>
 <dt>Reader</dt>
 <dd>What is your favorite HTML element?</dd>
 <dt>Author</dt>
 <dd>Funny you should ask! It's the description list.</dd>
 <dt>Reader</dt>
 <dd>Really? What's so great about it?</dd>
 <dt>Author</dt>
 <dd>It's so oddly flexible.</dd>
</dl>

<!-- 
The HTML5 specification removed this type of usage as an example, and provides a different suggestion for how to mark up dialogue. 

However, some people (including this author) still think it is the best solution for this type of thing. 

Since the spec's suggestion for dialogue
 1. Is just a suggestion, and
 2. Contains no meaningful semantic information,
it really makes no difference which style you follow.
-->

<a href="http://www.w3.org/html/wg/drafts/html/master/semantics.html#conversations">The official specification's suggestion for marking up dialogue can be found <strong>here</strong>.</a>
Reader
What is your favorite HTML element?
Author
Funny you should ask! It’s the description list.
Reader
Really? What’s so great about it?
Author
It’s so oddly flexible.
The official specification’s suggestion for marking up dialogue can be found here.

Definition lists are underused, but they are actually a really great way to present all sorts of content.

Nesting lists

All three styles of list can be nested to form an outline-style hierarchical list.

<ul>
 <li>Item One</li>
 <li>Item Two
  <ul>
   <li>Sub-item A.</li>
   <li>Sub-item B.
    <ul>
     <li>Sub-sub-item i.</li>
     <li>Sub-sub-item ij.</li>
     <li>Sub-sub-item iij.</li>
    </ul>
   </li>
   <li>Sub-item C.</li>
  </ul>
 </li>
 <li>Item Three</li>
</ul>
  • Item One
  • Item Two
    • Sub-item A
    • Sub-item B
      • Sub-sub-item i
      • Sub-sub-item ij
      • Sub-sub-item iij
    • Sub-item C
  • Item Three

Notice that the bullets automatically change with each nesting. This is the default rendering style for most browsers.

Unfortunately the same thing does not happen with ordered lists. If you want the school-notes outline style with different types of numbering at each level, you have to do it yourself.

<h3>This is going to look bad</h3>
<ol>
 <li>Item One</li>
 <li>Item Two
  <ol>
   <li>Sub-item A.</li>
   <li>Sub-item B.
    <ol>
     <li>Sub-sub-item i.</li>
     <li>Sub-sub-item ij.</li>
     <li>Sub-sub-item iij.</li>
    </ol>
   </li>
   <li>Sub-item C.</li>
  </ol>
 </li>
 <li>Item Three</li>
</ol>

<h3>Here's how you have to do it</h3>

<ol type="I">
 <li>Item One</li>
 <li>Item Two
  <ol type="A">
   <li>Sub-item A.</li>
   <li>Sub-item B.
    <ol type="1">
     <li>Sub-sub-item i.</li>
     <li>Sub-sub-item ij.
      <ol type="a">
       <li>Way down in the hierarchy.</li>
       <li>Does anyone need this many list levels?
        <ol type="i">
         <li>This is getting ridiculous.</li>
        </ol>
       </li>
      </ol>
     </li>
     <li>Sub-sub-item iij.</li>
    </ol>
   </li>
   <li>Sub-item C.</li>
  </ol>
 </li>
 <li>Item Three</li>
</ol>


<!-- If this is the sort of thing you need to do a lot, it would be better to specify numbering type in the CSS. This is covered in the CSS chapter. -->

This is going to look bad

  1. Item One
  2. Item Two
    1. Sub-item A.
    2. Sub-item B.
      1. Sub-sub-item i.
      2. Sub-sub-item ij.
      3. Sub-sub-item iij.
    3. Sub-item C.
  3. Item Three

Here’s how you have to do it

  I. Item One
  II. Item Two
    A. Sub-item A.
    B. Sub-item B.
      1. Sub-sub-item i.
      2. Sub-sub-item ij.
        a. Way down in the hierarchy.
        b. Does anyone need this many list levels?
          i. This is getting ridiculous.
      3. Sub-sub-item iij.
    C. Sub-item C.
  III. Item Three
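
For reference, a CSS-based version might look like the sketch below. The class name outline is hypothetical; the selectors set a numbering style for each nesting level, so the type attribute is not needed on every list.

```html
<style>
  /* One rule per nesting level */
  ol.outline       { list-style-type: upper-roman; }
  ol.outline ol    { list-style-type: upper-alpha; }
  ol.outline ol ol { list-style-type: decimal; }
</style>

<ol class="outline">
  <li>Item One</li>
  <li>Item Two
    <ol>
      <li>Sub-item A.</li>
      <li>Sub-item B.
        <ol>
          <li>Sub-sub-item 1.</li>
        </ol>
      </li>
    </ol>
  </li>
</ol>
```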

Nested lists can mix list types.

<dl>
 <dt>This is an ordered list:</dt>
 <dd>
  <ol>
   <li>Cakes.</li>
   <li>Pies.</li>
   <li>The cake is a lie.</li>
  </ol>
 </dd>
 <dt>This is an unordered list, listing types of lists:</dt>
 <dd>
  <ul>
   <li>Ordered lists</li>
   <li>Unordered lists</li>
   <li>Description lists</li>
  </ul>
 </dd>
 <dt>This is an unordered list nested inside of an ordered list, which is inside of this description list:</dt>
 <dd>
  <ol>
   <li>The first item.</li>
   <li>The second item.</li>
   <li>The third item, which has the nested list.
    <ul>
     <li>Knife</li>
     <li>Fork</li>
     <li>Spoon</li>
     <li>Spork</li>
     <li>Chopsticks</li>
    </ul>
   </li>
   <li>This fourth item is here just to frame the nested list better.</li>
  </ol>
 </dd>
</dl>
This is an ordered list:
  1. Cakes.
  2. Pies.
  3. The cake is a lie.
This is an unordered list, listing types of lists:
  • Ordered lists
  • Unordered lists
  • Description lists
This is an unordered list nested inside of an ordered list, which is inside of this description list:
  1. The first item.
  2. The second item.
  3. The third item, which has the nested list.
    • Knife
    • Fork
    • Spoon
    • Spork
    • Chopsticks
  4. This fourth item is here just to frame the nested list better.

Note that lists cannot be nested inside of paragraph elements ( <p> ). This is because all lists are block-level elements, and paragraphs (which are also blocks) can only contain inline (span-level) elements.

This can occasionally be annoying because in normal written text there are sometimes perfectly good reasons for wanting to include a list inside of a paragraph. However, it simply does not work.
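
One common workaround, sketched here with made-up content, is to end the paragraph just before the list and start a new paragraph after it. The wording of the sentences and list items is purely illustrative.

```html
<!-- The list sits between two paragraphs rather than inside one. -->
<p>Before leaving for the trip, pack three things:</p>
<ul>
 <li>A passport</li>
 <li>A phone charger</li>
 <li>A paper map</li>
</ul>
<p>Everything else can be bought on arrival.</p>
```

If a list is written inside a <p> anyway, browsers typically close the paragraph as soon as they reach the opening tag of the list, so the list ends up outside the paragraph regardless.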

Block quotes and inline quotes

If you are quoting someone or something, use one of the two HTML quote elements.

Blockquotes

The blockquote is much more common. This is because of normal typographical convention:

  • blockquotes (multi-line quotes or excerpts) are displayed a special way (usually indented and sometimes italicized),
  • whereas inline quotes are simply marked with punctuation.
<blockquote>
To be or not to be, that is the question.
</blockquote>
To be or not to be, that is the question.

Blockquotes can be used for large blocks of quoted material, whether that material is an excerpt from a literary work, song, another blog post, or an email that you are responding to.

If you want to cite the source of the quote, there are two ways to do that. The <blockquote> element can be given a cite attribute, or a byline can be added with a <cite> tag surrounding the source title. You can also do both.

<blockquote cite="http://www.gutenberg.org/ebooks/2265">
To be or not to be, that is the question.<br>
&mdash; <cite>Hamlet</cite>, William Shakespeare
</blockquote>

<!-- Either use of "cite" is fine by itself. -->
To be or not to be, that is the question.
Hamlet, William Shakespeare

Note that the <cite> tag should include the title of the original work being quoted, and may optionally include the name of the author and other information (such as page number or act and scene number).

The citation at the end of the quote could be better identified if it was placed inside of a <footer> element, and if the citation itself linked to the source material. Doing this would make the cite attribute within the <blockquote> tag redundant, so we’ll remove it. Finally, we’ll add a paragraph tag and remove the em-dash (&mdash;), so that only the information, and not display details, is included.

<blockquote>
<p>To be or not to be, that is the question.</p>
<footer>
 <cite><a href="http://www.gutenberg.org/ebooks/2265">Hamlet</a>, William Shakespeare</cite>
</footer>
</blockquote>

A blockquote could include a <header> as well, which might be used to introduce the quote itself, or to quote original header information.
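
As a rough sketch of the second case, the example below quotes a forwarded message together with its original heading. The message text and subject line are hypothetical.

```html
<!-- Hypothetical example: the quoted message keeps its own heading. -->
<blockquote>
 <header>
  <h4>Re: Opening night schedule</h4>
 </header>
 <p>The show will begin at 8:00pm, not 7:00pm as printed on the tickets.</p>
</blockquote>
```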

Inline Quote

The less-commonly used quoting element is the inline quote, <q>.

<p>
My favorite line in <cite>Hamlet</cite> is when he says, <q cite="http://www.gutenberg.org/ebooks/2265">To be or not to be, that is the question</q>.
</p>

My favorite line in Hamlet is when he says, “To be or not to be, that is the question”.

This is not often used because there is already a perfectly good way to show that you quoted something — by using quotation marks.

However, using the <q> tag instead of simple quotation marks has a few advantages.

  • The display of the quotation marks can be changed via CSS, which is helpful for internationalization, since not all countries use the same symbols for quotation marks.
  • The fact that the text is a quotation from another source is semantically clear, whereas quotation marks could be used for other reasons:
    • rhetorical “scare quotes”
    • mentioning a word or phrase
    • reporting a real conversation that has no source text
  • The opportunity to include a cite attribute linking to the original source of the quote.
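
The first advantage can be sketched with the CSS quotes property. The example below assumes a French-language paragraph and swaps in guillemets; the sentence itself is invented for illustration.

```html
<style>
 /* Outer and inner quotation marks for French-language quotes. */
 q:lang(fr) {
  quotes: "«" "»" "‹" "›";
 }
</style>

<p lang="fr">
 Elle a dit : <q>Je reviens demain</q>.
</p>
```

Because the marks come from the style sheet rather than the text, the same markup can be restyled for another language without editing the content.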

Hyperlinks

One of the most important tags in HTML is the anchor tag ( <a> ), which defines a hyperlink. The ability to link documents into a network of connections is the essence of the web, and the definition of “hypertext.”

The element is called an “anchor” because it is used to anchor a linked URL to some specific text on a page. (This is in contrast to the <link> tag, which connects the entire document, not a specific section of text.)

The text inside the element is called the “anchor text,” and the linked URL is specified in the href attribute.

<a href="http://example.com">This is a link to example.com</a>

Along with the href, the <a> tag has several important attributes.

  • target specifies what window (or browser tab) to open the link in. The default is the same window. If you want to open a new tab, set target="_blank".
  • title sets the tooltip or hover-text of a link. This displays in a small popup when the user mouses over the anchor text. It is useful for providing some additional information about what the user is about to click on.
  • rel reports on a relationship between the linked document and the current document. It has several possible values:
    • alternate — The linked document has the same content as the current document, but in an alternate format. Used most often to link to RSS feeds.
    • author — The linked document is the homepage or profile of the author of the current document or article.
    • bookmark — The link is a permanent URL (permalink) for the section of the document that contains it, such as the permalink of a single post on a blog index page.
    • help — The linked document provides help documentation to the current document.
    • license — The linked document is the license text for the current document.
    • next — The linked document is the next part in a paginated series. Some browsers will pre-fetch the contents of the linked document in order to speed up rendering when the user finally clicks on it.
    • nofollow — The linked document is not endorsed by the author of the current document. Used to prevent giving SEO benefit to the linked page. Comment systems often add this to user-entered links by default.
    • noreferrer — Used to prevent sending referer information in the HTTP request header when the user clicks on the link. Typically, the HTTP request will specify where the user is coming from (the current page). This requests that the browser client omit that information.
    • prefetch — Similar to next, but without implying an actual sequential relationship. This requests that the browser fetch the contents of the linked page before the user clicks on it, so that navigation to the next page seems instantaneous.
    • prev — The inverse of next, this value specifies that the linked document is the previous page in a paginated series. Some browsers may prefetch the contents.
    • search — The linked page provides an interface specifically for searching the current document and related documents.
    • tag — The linked document provides context as to the topic of the current page.
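
Here is a short sketch of how these attributes might be combined. All URLs and link text are placeholders, and the rel attribute accepts several space-separated values.

```html
<!-- All URLs and link text below are placeholders. -->
<p>
 Read the
 <a href="http://example.com/guide/page-2" rel="next" title="Continue to page two of the guide">next page</a>,
 or open the
 <a href="http://example.com/license" rel="license" target="_blank">license text</a>
 in a new tab.
</p>

<!-- A link the author does not endorse and does not want to send referrer data to. -->
<p>
 A reader mentioned
 <a href="http://example.com/some-site" rel="nofollow noreferrer">this site</a>
 in the comments.
</p>
```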

The rel attribute is underused by non-technical website creators, and it is a great way to bring rich, semantic information into the markup in a way that search engines, aggregators, and screen readers can understand.

For example:

  • Google formerly used the rel="author" link (if linked to a Google+ profile) to display links to other content by the same author in search results, although that feature has since been retired.
  • Google image search includes the ability to search by license, to find Creative Commons licensed content for re-use. That feature depends, in part, on the rel="license" attribute being used in links to Creative Commons and other open licenses.
  • Several search engines and news aggregation sites use the anchor text and referenced page of a rel="tag" link to determine the topic of a given page.

The rel attribute can also be used in Microformats, which are simple ways of including additional semantic information within existing HTML attributes (usually rel and class).

For example, the XFN Microformat suggests using the rel attribute when linking to the home or profile pages of people with whom you have a relationship.

<p>Next month I'll be spending a whole weekend at a conference with <a href="http://example.com/kami-profile" rel="co-worker">Kami</a>. The conference is near my home town, so I'm hoping to be able to have lunch with <a href="http://example.com/dave-profile" rel="parent">my dad</a>.</p>

There are several additional Microformats that use the rel attribute, as well as other ways to include this kind of semantic information in the markup of your website. These will be covered in the chapter on Semantic HTML.

Text decoration

There are a number of simple tags which are used for basic text markup within a paragraph or other element.

Bold

There are two tags that can be used for making text bold.

  • <strong> is recommended for use to mark “important” text. It causes the wrapped text to display as bold but also carries semantic meaning (that the text itself is somehow important).
  • <b> simply bolds the text without suggesting any particular semantic meaning.

Italic

Like bold, there are two ways to make text display in italics.

  • <em> suggests that the wrapped text is “emphasized” somehow.
  • <i> is simply italicized, with no specific semantic meaning attached.
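
The difference between the semantic and purely visual tags is easier to see side by side. The sentences below are invented for illustration; the choice of which words deserve which tag is a judgment call, not a rule.

```html
<!-- <strong> and <em>: the text is important or stressed. -->
<p>
 <strong>Warning:</strong> do <em>not</em> unplug the drive while it is in use.
</p>

<!-- <b> and <i>: a visual convention only, with no added importance or stress. -->
<p>
 The ship <i>Mary Celeste</i> is listed under <b>Mysteries</b> in the index.
</p>
```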

Underline

Although it has become less popular recently, the standard text display for hyperlinks ( <a>) is to underline them. Therefore, non-link underlining does not get used very often. There are, though, markup tags for it.

  • <u> is the generic tag for underlining text. The use-case presented by the specifications is underlining misspelled words. The HTML5 spec also notes that other elements are usually more appropriate, and that <u> should be avoided where it could be mistaken for a link.
  • <ins> means text that has been inserted, and is usually used in conjunction with the <del> tag, to show the changes made to a text.
<p>The show will begin at <del>7:00pm</del> <ins>8:00pm</ins>.</p>

The show will begin at 7:00pm 8:00pm.

Line-through

There are two elements which mark text to be lined through. Each has a slightly different meaning.

  • <del> is for text which is to be understood as deleted or changed, and is used with the <ins> tag as noted above.
  • <s> is used for text which is no longer correct or no longer relevant.

There is also a <strike> tag which was available in HTML4. It is no longer a part of the HTML specification.

While the specification’s descriptions of <del> and <s> are slightly different in theory, experts have not come to any agreement on the practical details of the difference, or what situations would specifically call for one rather than the other.
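
One possible reading of the distinction is sketched below, offered as an illustration rather than a rule. The prices and dates are made up.

```html
<!-- <s>: the old price is no longer accurate, but nothing was edited out of a document. -->
<p>Tickets: <s>$25</s> $20 this week only.</p>

<!-- <del> and <ins>: a record of an edit made to the text. -->
<p>The meeting is on <del>Tuesday</del> <ins>Wednesday</ins>.</p>
```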

Source code and unprocessed text

There are two elements used for displaying text or code which you do not want to be rendered by the browser, but simply displayed “raw” to the user.

  • <pre> — Is used for blocks of code or unprocessed text.
  • <code> — Is used when you need to include a short word or phrase of code inline with your text.

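They both display in a monospace font (usually Courier) by default. The <pre> element also preserves whitespace and line breaks exactly as typed. Neither element stops the browser from interpreting tags, so any angle brackets in the displayed code must be written as the entities &lt; and &gt;.

This guide makes heavy use of both the <code> and the <pre> elements for displaying source code examples and discussing element and tag names.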
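
A minimal sketch of both elements follows. It assumes the usual practice of nesting <code> inside <pre> for a block of code, and it writes the displayed tags with &lt; and &gt; entities so that the browser shows them instead of interpreting them.

```html
<!-- Inline: a short piece of code inside a sentence. -->
<p>Use the <code>&lt;br&gt;</code> tag to force a line break.</p>

<!-- Block: line breaks and indentation inside <pre> are kept as typed. -->
<pre><code>&lt;p&gt;
 Roses are red&lt;br&gt;
 Violets are blue
&lt;/p&gt;</code></pre>
```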

Text sizing

You can make text arbitrarily larger or smaller with two elements that otherwise have no specific meaning:

  • <big> (no longer part of the HTML5 specification; use CSS to enlarge text instead)
  • <small>

The most common use of sizing elements is placing the subtitle of a page or article into a <small> element nested inside the <h*> headline tag.
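
A small sketch of that pattern, using this guide's own title as sample text:

```html
<h1>
 HTML for Beginners
 <small>The Ultimate Guide</small>
</h1>
```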

The generic <span> element

If you need to mark up a specific run of text for semantic or styling purposes, but none of the existing tags makes sense, you can use the generic <span> element, along with a class attribute (and some CSS) to create the desired effect.

<p>I'm not sure why there isn't a sarcasm tag. Maybe it just isn't needed because <span class="sarcasm">tone is so easy to read on the internet.</span></p>
/* CSS */

.sarcasm {
    color: purple;
    font-style: italic;
}

I’m not sure why there isn’t a sarcasm tag. Maybe it just isn’t needed because tone is so easy to read on the internet.

Separators

HTML provides two tags for adding in separation within text.

  • <br> inserts a line break
  • <hr> inserts a horizontal line

Neither of these elements requires a closing tag, because they do not enclose any text. If it helps you read your source code better, you may include the self-closing end slash: <br/> and <hr/>.

Line breaks are especially useful when you need hard line breaks but other solutions (like multiple <p> tags) don’t make sense. Two good examples are poetry or song lyrics, and addresses.

<p>
Roses are red<br>
Violets are blue<br>
Rhyming is hard<br>
HTML5 is awesome.
</p>

<hr> 

<p>
123 Main St.<br>
Fort Worth, TX 76148
</p>

Roses are red
Violets are blue
Rhyming is hard
HTML5 is awesome.


123 Main St.
Fort Worth, TX 76148

Summary

All this may seem complicated, but it really isn’t. Most of the tags you need on a regular basis are easy to remember: headlines, paragraphs, unordered lists. You don’t need to memorize all the different options or meanings behind each one. Just try to keep in mind that any normal typographical item (like a headline, a list, a paragraph, or a link) probably has an existing HTML tag to accomplish it. If you keep that in mind, you can just write without focusing on these things and then look up the specific items you can’t remember.

Try not to get bogged down with options, either. The important thing is that your markup is as meaningful as possible, without being overly complicated. If you can’t decide which of two or more options is the best, ask which one is more meaningful. If you can’t figure that out, ask which one is simpler. If you still can’t decide, just pick one — if they seem that similar, then there probably isn’t going to be a huge difference in how it works out.

4. Structural HTML

This chapter explains the overall structure of an HTML document, including what types of information are contained in the <head> and <body>. It also explains how to organize the various sections of a typical web page.

Basic HTML Document Structure

HTML documents (web pages) need to follow a few basic structural rules in order to work properly and be read accurately by web browsers.

The document must begin by declaring a DOCTYPE. Several different HTML (and related) standards have been in use over the years, so it is important to specify which standard your document is using.

Today, the correct DOCTYPE is almost always simply html. So an HTML document should begin with:

<!DOCTYPE html>

This isn’t exactly an HTML tag in the proper sense, but rather it tells the browser how to interpret all the other tags that follow.
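
For comparison, here is what one of the older declarations looked like next to the current one. These are alternatives shown side by side, not lines to be combined in a single file.

```html
<!-- HTML 4.01 Strict: one of the older, longer declarations. -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">

<!-- HTML5: the short form recommended above. -->
<!DOCTYPE html>
```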

After the DOCTYPE declaration, the opening tag is the <html> tag. The closing of the <html> tag will be the last line of the document.

Inside the opening <html> tag, you can specify the language of the document with the lang attribute (in this case, English).

<!DOCTYPE html>
<html lang="en">
.
.
.
<!-- entire contents of page -->
.
.
</html>

Nested inside the <html> tag are two sections, the <head> and the <body>. The body contains all the visible content, while the head contains information about the document itself. Nothing is outside of these two sections.

<!DOCTYPE html>
<html lang="en">
 <head>
  .
  <!-- Info about document here. -->
  .
 </head>
 <body>
  .
  .
  <!-- Contents of document here. -->
  .
  .
 </body>
</html>

This is the basic structure of every HTML document. Everything else is basically extra.

Contents of <head>

The <head> element of an HTML document usually contains all the information needed by a browser to properly render the document, plus additional information describing the contents (for the benefit of aggregators and bots).

Metadata

The <meta> tag is used several times in the <head> to specify various metadata (data about the document).

Metatags are empty tags, requiring no closing tag. You may end them with the self-closing slash ( />), but this is not required (and some people even specifically discourage it).

Character Encoding

There are several different common ways to encode characters (letters, numbers, and punctuation) in computer memory. If you don’t specify which one you are using, the web browser may mess up and display some of the wrong characters.

Most of the time, these days, you want to specify the UTF-8 character set.

(The other common encoding, ASCII, doesn’t have all the extended characters like em-dashes and curly quotes. If you’ve ever seen weird type glitches where quotation marks or apostrophes have been replaced with seemingly random characters, it’s usually because the document was written in UTF-8 but displayed using a different encoding, which means someone didn’t specify the correct character set in the document.)

<meta charset="utf-8">

Description, Author, and Keywords

Basic information about the document (who wrote it and what it is about) is also conveyed through <meta> tags. Each of these has two attributes: name, which identifies the kind of metadata, and content, which holds its value.

<meta name="description" content="A page about HTML.">
<meta name="keywords" content="HTML, tags, metadata">
<meta name="author" content="Adam Michael Wood">

This kind of information used to be especially important for SEO purposes. It no longer plays a huge role in SEO, but it still has some effect. More importantly, having correct and detailed information in these elements contributes to a semantic web, where all content is easily findable and parsable by machines.

(If you use a Content Management System, the tags and post descriptions you write in the editor screen will usually be displayed in these meta tags.)

Title

The <title> tag appears in the head, and usually does not have any attributes. It encloses the title.

<title>
 This is the title of the page.
</title>

The title should be accurate and, if possible, match the on-page visible title (usually in an <h1> or <h2> headline tag) in the body. The contents of the title are typically displayed in the tab at the top of the browser window.

It is not a good idea to nest any other tags in the title (like <b> or <i>) because they will usually not display properly.

An HTML document can only specify one title.

CSS Links

Style sheets, written in the CSS (Cascading Style Sheets) language, are separate documents which provide information about how to display a page in a browser. Information about sizes, colors, placement, and fonts is all contained in the style sheet. Keeping these details separate from the main HTML document makes it easier to change them without affecting the content of the document itself.

CSS style sheets are linked to within the <head> of the HTML document, using the <link> tag. The href attribute specifies the URL of the style sheet file, and the rel attribute specifies that the link is a stylesheet link (there are other types of links).

<link href="/css/style.css" rel="stylesheet">

RSS Information

RSS — Rich Site Summary, or Really Simple Syndication — is a way of providing a feed of site updates (like new blog posts) to subscribers, so that they are informed of new content as it is posted and can read that content from an RSS reader without having to visit your site.

If you are using a Content Management System, it will generally create an RSS feed for you, which is an XML document available at its own URL. That URL should be linked to from the <head> of your blog’s main index page, so that RSS readers and web browsers can find it easily.

<link rel="alternate" type="application/rss+xml" title="RSS" href="/feed.xml" />

The rel="alternate" attribute means that the linked URL contains the same content (a list of blog posts), but in an alternative format. The type attribute specifies the type of format (RSS).

Other info

A lot of additional details about a document frequently appear in the <head>. These will be covered in more detail later, in the relevant chapters.

JavaScript Links

It is possible to link to JS files from within the head, and this is a common practice. However, it is generally better to place these at the end of the document if possible.
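
A minimal sketch of the end-of-document placement is shown below. The script path /js/main.js is hypothetical.

```html
<!DOCTYPE html>
<html lang="en">
 <head>
  <meta charset="utf-8">
  <title>Guide to HTML</title>
 </head>
 <body>
  <p>Page content is read and displayed first.</p>

  <!-- Hypothetical script path; placed last so it does not delay the content above. -->
  <script src="/js/main.js"></script>
 </body>
</html>
```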


Example of HTML document with <head> element completed

<!DOCTYPE html>
<html lang="en">
 <head>
  <meta charset="utf-8">
  <meta name="description" content="A page about HTML.">
  <meta name="keywords" content="HTML, tags, metadata">
  <meta name="author" content="Adam Michael Wood">
  <link href="/css/style.css" rel="stylesheet">

  <title>Guide to HTML</title>
  <link rel="alternate" type="application/rss+xml" title="RSS" href="/feed.xml" />

  
 </head>
 <body>
  .
  .
  <!-- Contents of document here. -->
  .
  .
 </body>
</html>

Contents of <body>

The <body> element is the main portion of the HTML document, and may contain all sorts of things.

Typically, the structure of an HTML body can be divided into several sections, each possibly having one or more subsections:

  • header
    • logo / branding / site title
    • main navigation
    • search bar
  • main content
    • one or more articles
      • article title
      • article content
      • article metadata (author, tags, date posted)
  • sidebar(s)
    • widgets
    • secondary navigation (archives by date, category, or tag)
  • footer
    • copyright / license info
    • tertiary navigation
    • contact info
      • address / phone
      • social links

Not all of these sections will be included in every page, or appear the same way. However, this provides a good starting point for an example of how these different pieces would be put together into the <body> of a document.

The <div> element

The most generic block-level element for structuring a webpage is the <div> element. This was once used for every section and subsection of the page contents.

This resulted in a lot of nested <div> tags.

<body>
 <div class="header">
  <div class="logo">
    <!-- logo here -->
  </div>
  <div class="main-nav">
    <!-- main navigation menu here -->
  </div>
  <div class="search-bar">
    <!-- Search bar form here -->
  </div>
 </div>
 <div class="page-content">
  <div class="main">
   <div class="article">
    <div class="article-header">
     <h1>Title of Article</h1>
     <div class="article-meta">
      <!-- Date, Author -->
     </div>
    </div>
    <div class="article-content">
     <p>Article.</p>
     <p>Content.</p>
    </div>
    <div class="article-footer">
     <!-- Tags, Categories, etc. -->
    </div>
    <div class="comments">
     <!-- Article comments and commenting form. -->
    </div>
   </div>
  </div>
  <div class="sidebar">
   <!-- Sidebar content, widgets, etc. -->
  </div>
 </div>
 <div class="footer">
  <div class="license">
   <!-- Copyright info -->
  </div>
  <div class="contact-info">
   <!-- Contact information -->
  </div>
 </div>
</body>

Thanks to an extended set of structural tags in the latest HTML standard (HTML5), this can be made easier to read and more meaningful to search engines and other systems that extract information from your page (like screen readers for the blind).

Semantic structural tags

Many (but not all) of the <div> elements above can be replaced by newer semantic elements introduced in HTML5.

“Semantic” means, basically, “linguistically meaningful.” Rather than just a generic <div>, semantic tags have specific meanings related to how they are used on the page.

The most important semantic tags for page structure are:

  • <header> — Used for both document header information (page title, logo, navigation) and also article header (post title, meta data). — Don’t confuse with <head>, which contains metadata for the entire document.
  • <nav> — A container for navigation menus.
  • <main> — The primary, unique content of a page. — There can only be one <main> element in a document.
  • <article> — A single piece of content. A blog index page might have several <article> elements, but the permanent page of a post would have just the one.
  • <section> — A section of a document.
  • <aside> — Can be used for secondary content, like a sidebar. Can also be used within an <article>, for example to display pull-quotes or for comments (which are, by nature, tangential to the article).
  • <footer> — The footer for an entire document or a section of a document (like an <article>).
  • <address> — Used to contain the primary contact information related to the author or publisher of a page. Should not be used for arbitrary postal addresses contained in page content, but only for the contact information (including postal address, if relevant) of the author or publisher of a page or article.

Using these tags, let’s recreate the example document above with elements that actually specify their semantic meaning.

<body>
 <header>
  <div class="logo">
    <!-- logo here -->
  </div>
  <nav>
    <!-- main navigation menu here -->
  </nav>
  <div class="search-bar">
    <!-- Search bar form here -->
  </div>
 </header>
 <div class="page-content">
  <main>
   <article>
    <header>
     <h1>Title of Article</h1>
     <div class="article-meta">
      <!-- Date, Author -->
     </div>
    </header>
    <section class="article-content">
     <p>Article.</p>
     <p>Content.</p>
    </section>
    <footer>
     <!-- Tags, Categories, etc. -->
    </footer>
    <aside class="comments">
     <!-- Article comments and commenting form. -->
    </aside>
   </article>
  </main>
  <aside>
   <!-- Sidebar content, widgets, etc. -->
  </aside>
 </div>
 <footer>
  <div class="copyright">
   <!-- Copyright info -->
  </div>
  <address>
   <!-- Contact information -->
  </address>
 </footer>
</body>

Using semantic tags — tags that actually mean something specific — makes the markup easier to read, because there are fewer repeated <div> tags. There’s also less need to make sure everything has a meaningful class attribute related to its use in the document.

Of course, some <div> tags are still needed, but far fewer.

But making markup easier to read only provides a benefit when developing or working with the code (debugging, updating your template). The bigger benefit to semantic markup is that it provides more detailed information to screen readers and bots about how your page is structured. This makes it more accessible to the blind, which is important. It also provides SEO benefit.

(More information about semantic markup, and related benefits, is covered in another chapter.)

A note about <main>

As of this writing, Internet Explorer does not support the <main> tag — it simply doesn’t understand what it means.

You can correct this by telling IE what the element is being used for with the role attribute.

<main role="main">
 <!-- main content here -->
</main>

A note about <article>

The <article> element is intended to be used for a piece of “stand-alone” content, with the most obvious example being a blog post. However, it does not need to be thought of in the “newspaper article” sense of the word “article.”

Each comment on a post can be an <article>, nested inside the larger <article>. Also, each widget in a sidebar could be considered an individual <article>.

It seems likely, however, that having a multitude of <article> elements on each page of a site could tend toward confusion about what content is actually central, and which content is not.

There is no definitive answer about whether this may be the case, but there also seems no real benefit to an abundant use of the <article> tag. For this reason, the most sensible option is likely to restrict its use to the “primary” content of a page. Comments can then be included as an aside to the article text, or (if you prefer) outside the <article> element, in a separate <section> within <main>.
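
The second option might look something like the sketch below, with the comments in their own <section> next to the article. The class name and headings are illustrative.

```html
<main>
 <article>
  <header>
   <h1>Title of Article</h1>
  </header>
  <p>Article content.</p>
 </article>
 <!-- Comments sit beside the article rather than inside it. -->
 <section class="comments">
  <h2>Comments</h2>
  <!-- Individual comments and the commenting form go here. -->
 </section>
</main>
```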

Make <div> tags easier to read

If you still find it difficult to keep track of which closing </div> tags relate to which <div> elements, you can use comments as a helpful reminder. This strategy is used by many Content Management Systems and theme developers, especially when pieces of their HTML document are actually broken up across several different PHP template files.

The easiest way to do this is to put the class (or ID) name into a comment on the same line as the closing </div> tag. Following CSS and jQuery convention, class names are prefixed with a period ( . ) and ids with a hash sign ( # ).

<div class="wrapper">
 <div class="container">
  <div id="center-div">
  </div> <!-- / #center-div -->
 </div> <!-- / .container -->
</div> <!-- / .wrapper -->

This has no actual impact on anything, but can make future debugging and ongoing development easier, especially in a particularly complicated or long HTML document.

5. HTML Tables

This chapter covers HTML tables, including everything you need to know about how to mark up various use cases. All the major table elements and attributes are covered, including table headers, footers, body, and columns. This chapter provides concrete suggestions for dealing with some of the difficulties built into table markup and touches on real-world practices.

What are tables?

A table in HTML is a way to present “tabular data”: information that can be represented in a spreadsheet. Tables in HTML are two-dimensional, with rows and columns.

First Name   Last Name   Age
John         Smith       31
Jane         White       32
Terry        Jones       41

Tabular data comes in many forms. The easiest way to tell if something should be in a table, as opposed to a different structure like a description list, is to ask yourself, “Would this make sense as a spreadsheet?”

If data would make sense as a spreadsheet, it is a good candidate for a table.

Table Syntax

Basic Syntax

All tables use the <table> element, the table row ( <tr> ) element, and the table cell ( <td> ) element.

These three elements alone are enough for a simple table. A table is built one row ( <tr> ) at a time.

<table>
<tr>
 <td>John</td>
 <td>Smith</td>
 <td>31</td>
</tr>
<tr>
 <td>Jane</td>
 <td>White</td>
 <td>32</td>
</tr>
<tr>
 <td>Terry</td>
 <td>Jones</td>
 <td>41</td>
</tr>
</table>
John    Smith   31
Jane    White   32
Terry   Jones   41

Table Headers: Option 1

It is often desirable to put headers at the top of a table. One way of doing this is to replace normal table cells ( <td> ) with table header cells ( <th> ).

<table>
<tr>
 <th>First Name</th>
 <th>Last Name</th>
 <th>Age</th>
</tr>
<tr>
 <td>John</td>
 <td>Smith</td>
 <td>31</td>
</tr>
<tr>
 <td>Jane</td>
 <td>White</td>
 <td>32</td>
</tr>
<tr>
 <td>Terry</td>
 <td>Jones</td>
 <td>41</td>
</tr>
</table>
First Name   Last Name   Age
John         Smith       31
Jane         White       32
Terry        Jones       41

The benefit of this approach is that it doesn’t affect the entire row, only those cells which are designated as headers. That is, it’s a benefit if that’s what you want to happen.
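
Because <th> is applied cell by cell, it can also mark headers outside the first row. The sketch below reuses the sample data and, purely as an illustration, treats the first name in each row as a row header.

```html
<table>
<tr>
 <th>First Name</th>
 <th>Last Name</th>
 <th>Age</th>
</tr>
<tr>
 <!-- The first cell of each data row is a header for that row. -->
 <th>John</th>
 <td>Smith</td>
 <td>31</td>
</tr>
<tr>
 <th>Jane</th>
 <td>White</td>
 <td>32</td>
</tr>
</table>
```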

Table Headers (and Body): Option 2

The other way to make table headers is to wrap the entire first row (or several rows, even) in a table-head ( <thead> ) element.

When this is done, the rest of the content is usually wrapped in a table-body ( <tbody> ) element.

<table>
<thead>
 <tr>
  <th>First Name</th>
  <th>Last Name</th>
  <th>Age</th>
 </tr>
</thead>
<tbody>
 <tr>
  <td>John</td>
  <td>Smith</td>
  <td>31</td>
 </tr>
 <tr>
  <td>Jane</td>
  <td>White</td>
  <td>32</td>
 </tr>
 <tr>
  <td>Terry</td>
  <td>Jones</td>
  <td>41</td>
 </tr>
</tbody>
</table>

Doing this allows the entire header row to be styled.


thead {
 background-color: black;
 color: white;
 font-weight: bold;
}
First Name Last Name Age
John Smith 31
Jane White 32
Terry Jones 41

Perhaps more interestingly, this also allows the body of the table to be styled without affecting the head.

tbody tr:nth-child(odd) {
 background-color: #eee;
}
tbody tr:nth-child(even) {
 background-color: #fff;
}
First Name Last Name Age
John Smith 31
Jane White 32
Terry Jones 41

Table Footer

Along with a table head and a table body, you can also define one or more rows as belonging to a table footer ( <tfoot> ). This is useful if you need to style the last row differently than the other rows. Most commonly, this might be used if the last row is a summation or calculation based on the rows above it.

<style>
thead {
 background-color: black;
 color: white;
 font-weight: bold;
}
tbody tr:nth-child(odd) {
 background-color: #eee;
}
tbody tr:nth-child(even) {
 background-color:#fff;
}

tfoot {
 background-color: #222222;
 color: white;
 font-style: italic;
}

</style>
<table>
<thead>
 <tr>
  <th>First Name</th>
  <th>Last Name</th>
  <th>Age</th>
 </tr>
</thead>
<tbody>
 <tr>
  <td>John</td>
  <td>Smith</td>
  <td>31</td>
 </tr>
 <tr>
  <td>Jane</td>
  <td>White</td>
  <td>32</td>
 </tr>
 <tr>
  <td>Terry</td>
  <td>Jones</td>
  <td>41</td>
 </tr>
</tbody>
<tfoot>
 <tr>
  <td></td>
  <td>Average Age</td>
  <td>34.67</td>
 </tr>
</tfoot>
</table>
First Name Last Name Age
John Smith 31
Jane White 32
Terry Jones 41
Average Age 34.67
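
The footer value in this example is simply the mean of the Age column. As a rough illustration, here is how that figure could be worked out in JavaScript. The ages array and the averageAge helper are names made up for this sketch; they are not part of the table markup.

```javascript
// Ages taken from the three body rows of the example table.
const ages = [31, 32, 41];

// Hypothetical helper: mean of a list of numbers, rounded to two decimals.
function averageAge(values) {
  const total = values.reduce((sum, value) => sum + value, 0);
  return (total / values.length).toFixed(2);
}

console.log(averageAge(ages)); // 34.67
```

The result could then be written into the last cell of the <tfoot> row, either by hand or by a script.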

Table Columns

Sometimes you need to style a table column. This can be achieved (to some extent) by using column markup.

Columns work a little strangely in HTML. Since tables are written as a series of rows, columns are defined as a secondary overlay on the table.

At the top of the table, the <colgroup> element defines how columns will be laid over the table. Inside the <colgroup> are individual column definitions, using the <col> element. Each <col> spans one or more columns and defines a stylable entity.

<table>
<colgroup>
 <col style="background-color: cyan;">
 <col style="background-color: yellow;">
 <col style="background-color: red;">
</colgroup>
<thead>
 <tr>
  <th>First Name</th>
  <th>Last Name</th>
  <th>Age</th>
 </tr>
</thead>
<tbody>
 <tr>
  <td>John</td>
  <td>Smith</td>
  <td>31</td>
 </tr>
 <tr>
  <td>Jane</td>
  <td>White</td>
  <td>32</td>
 </tr>
 <tr>
  <td>Terry</td>
  <td>Jones</td>
  <td>41</td>
 </tr>
</tbody>
<tfoot>
 <tr>
  <td></td>
  <td>Average Age</td>
  <td>34.67</td>
 </tr>
</tfoot>
</table>
First Name Last Name Age
John Smith 31
Jane White 32
Terry Jones 41
Average Age 34.67

Each <col> in the example above spans one column of table cells. If we wanted to apply styles to the two name columns as a single unit, we could make the <col> span two cell columns.

<table>
<colgroup>
 <col span="2" style="background-color: cyan;">
 <col style="background-color: yellow;">
</colgroup>
<thead>
 <tr>
  <th>First Name</th>
  <th>Last Name</th>
  <th>Age</th>
 </tr>
</thead>
<tbody>
 <tr>
  <td>John</td>
  <td>Smith</td>
  <td>31</td>
 </tr>
 <tr>
  <td>Jane</td>
  <td>White</td>
  <td>32</td>
 </tr>
 <tr>
  <td>Terry</td>
  <td>Jones</td>
  <td>41</td>
 </tr>
</tbody>
<tfoot>
 <tr>
  <td></td>
  <td>Average Age</td>
  <td>34.67</td>
 </tr>
</tfoot>
</table>
First Name Last Name Age
John Smith 31
Jane White 32
Terry Jones 41
Average Age 34.67

There are problems with using the <colgroup> markup, unfortunately:

  • <col> only supports styles related to background, width, border, and visibility. This means you cannot, for example, style the first column of a table in bold.

  • Because <col> is neither a parent nor a child element of any table sections (head, body, footer), you cannot target a specific column within a section.

  • Moreover, table sections and table rows are drawn on top of columns, so background styles applied to a section or row will cover any background applied to the <col> element.

Because of these issues, <col> has limited usefulness for table styling.

There are two common solutions to this: class attributes and nth-child selectors.

To use class attributes, simply apply the column-specific class to each <td> (and/or <th>) element.

<table>
<thead>
 <tr>
  <th class="col1">First Name</th>
  <th class="col2">Last Name</th>
  <th class="col3">Age</th>
 </tr>
</thead>
<tbody>
 <tr>
  <td class="col1">John</td>
  <td class="col2">Smith</td>
  <td class="col3">31</td>
 </tr>
 <tr>
  <td class="col1">Jane</td>
  <td class="col2">White</td>
  <td class="col3">32</td>
 </tr>
 <tr>
  <td class="col1">Terry</td>
  <td class="col2">Jones</td>
  <td class="col3">41</td>
 </tr>
</tbody>
<tfoot>
 <tr>
  <td class="col1"></td>
  <td class="col2">Average Age</td>
  <td class="col3">34.67</td>
 </tr>
</tfoot>
</table>

Of course, this adds a lot of markup which isn’t strictly required. A better way would be to use the :first-child, :nth-child, and :last-child CSS selectors.

For example, what if we wanted the First Name column to be bold, and the ages to display in a red, monospace font, along with the other header and footer styles defined earlier?

<style>
thead {
 background-color: black;
 color: white;
 font-weight: bold;
}
tbody tr:nth-child(odd) {
 background-color: #eee;
}
tbody tr:nth-child(even) {
 background-color:#fff;
}

tfoot {
 background-color: #222222;
 color: white;
 font-style: italic;
}

td:first-child {
 font-weight: bold;
}

td:last-child {
 font-family: monospace;
 color: red;
}

</style>
<table>
<thead>
 <tr>
  <th>First Name</th>
  <th>Last Name</th>
  <th>Age</th>
 </tr>
</thead>
<tbody>
 <tr>
  <td>John</td>
  <td>Smith</td>
  <td>31</td>
 </tr>
 <tr>
  <td>Jane</td>
  <td>White</td>
  <td>32</td>
 </tr>
 <tr>
  <td>Terry</td>
  <td>Jones</td>
  <td>41</td>
 </tr>
</tbody>
<tfoot>
 <tr>
  <td></td>
  <td>Average Age</td>
  <td>34.67</td>
 </tr>
</tfoot>
</table>
First Name Last Name Age
John Smith 31
Jane White 32
Terry Jones 41
Average Age 34.67

Breaking the Grid: rowspan and colspan

Sometimes your tabular data doesn’t fit neatly into the grid created by a table. If you need a table cell to span two or more columns, use the colspan attribute. If you need to span more than one row, use rowspan.

For example, our table of ages has a footer row with a label for “Average Age.” This doesn’t need to be squashed into the second column. It would look better if the label spanned the first two cells in the last row.

<tfoot>
 <tr>
  <td colspan="2">
   Average Age:
  </td>
  <td>
   34.67
  </td>
 </tr>
</tfoot>
First Name Last Name Age
John Smith 31
Jane White 32
Terry Jones 41
Average Age 34.67

A similar syntax can be used to span two rows. (We’ll have to add a column for this, since we don’t have any good candidates for cell-merging.)

<table>
<thead>
 <tr>
  <th>First Name</th>
  <th>Last Name</th>
  <th>Age</th>
  <th>Cohort</th>
 </tr>
</thead>
<tbody>
 <tr>
  <td>John</td>
  <td>Smith</td>
  <td>31</td>
  <td rowspan="2">Oregon Trail Generation</td>
 </tr>
 <tr>
  <td>Jane</td>
  <td>White</td>
  <td>32</td>
 </tr>
 <tr>
  <td>Terry</td>
  <td>Jones</td>
  <td>41</td>
  <td>Generation X</td>
 </tr>
</tbody>
<tfoot>
 <tr>
  <td colspan="2">Average Age</td>
  <td>34.67</td>
  <td></td>
 </tr>
</tfoot>
</table>
First Name Last Name Age Cohort
John Smith 31 Oregon Trail Generation
Jane White 32
Terry Jones 41 Generation X
Average Age 34.67

What are tables not?

It shouldn’t really have to be said, but:

Tables are not for layout. Tables should not be used as a convenient way to make columns and headers at the level of a whole document.

This is sometimes still an issue today because, before the era of standards-based web browsers and semantic markup, many people used tables (with a lot of complex style rules) to lay out HTML documents.

This was a bad idea for a number of reasons, even then: it made the source document almost unreadable, it broke semantics completely, and it made it nearly impossible to restyle a page without recoding all of it.

Today there is a new reason to avoid this — a reason that trumps all the others: it doesn’t work on mobile. Table-based layout is definitively not responsive, incapable of gracefully scaling to fit various screen sizes.

Besides all of that — compared to the right way of doing things, table-based layout is much more difficult. Just don’t do it.

Table Edge Case — side-by-side translations

One non-data use for tables that is fairly common is side-by-side translation. Consider the following excerpt from Dante’s The Divine Comedy.

Nel mezzo del cammin di nostra vita       Midway upon the journey of our life
mi ritrovai per una selva oscura,         I found myself within a forest dark,
ché la diritta via era smarrita.          For the straightforward pathway had been lost.

Ahi quanto a dir qual era è cosa dura     Ah me! how hard a thing it is to say
esta selva selvaggia e aspra e forte      What was this forest savage, rough, and stern,
che nel pensier rinova la paura!          Which in the very thought renews the fear.

Tant’ è amara che poco è più morte;       So bitter is it, death is little more;
ma per trattar del ben ch’i’ vi trovai,   But of the good to treat, which there I found,
dirò de l’altre cose ch’i’ v’ho scorte.   Speak will I of the other things I saw there.

This is, of course, merely a table with a little styling:

<style>
#inferno-opening {
border: none;
border-spacing: 10px; 
}

</style>

<table id="inferno-opening">
<tr>
<td>
Nel mezzo del cammin di nostra vita <br>
mi ritrovai per una selva oscura, <br>
ché la diritta via era smarrita. <br>
</td><td>
Midway upon the journey of our life <br>
I found myself within a forest dark, <br>
For the straightforward pathway had been lost. <br>
</td>
</tr>
<tr>
<td>
Ahi quanto a dir qual era è cosa dura <br>
esta selva selvaggia e aspra e forte <br>
che nel pensier rinova la paura! <br>
</td>
<td>
Ah me! how hard a thing it is to say <br>
What was this forest savage, rough, and stern, <br>
Which in the very thought renews the fear. <br>
</td>
</tr>
<tr>
<td>
Tant’ è amara che poco è più morte;<br>
ma per trattar del ben ch’i’ vi trovai,<br>
dirò de l’altre cose ch’i’ v’ho scorte.<br>
</td>
<td>
So bitter is it, death is little more;<br>
But of the good to treat, which there I found,<br>
Speak will I of the other things I saw there.<br>
</td>
</tr>
</table>

The benefit of using tables in this example is that each row automatically adjusts its height based on the content in all the cells in the row. This keeps translated content next to its source, even if one language is more verbose than the other.

Many developers use this pattern for translated text, and it is perfectly fine. However, there may be a better way.

Consider the following HTML:

<div id="canto-1">
 <div class="italian">
  <p id="it-1" class="p1">
  Nel mezzo del cammin di nostra vita <br>
  mi ritrovai per una selva oscura, <br>
  ché la diritta via era smarrita. <br>
  </p>
  <p id="it-2" class="p2">
  Ahi quanto a dir qual era è cosa dura <br>
  esta selva selvaggia e aspra e forte <br>
  che nel pensier rinova la paura! <br>
  </p>
  <p id="it-3" class="p3">
  Tant’ è amara che poco è più morte; <br>
  ma per trattar del ben ch’i’ vi trovai, <br>
  dirò de l’altre cose ch’i’ v’ho scorte. <br>
  </p>
 </div>
 <div class="english">
  <p id="en-1" class="p1">
  Midway upon the journey of our life <br>
  I found myself within a forest dark, <br>
  For the straightforward pathway had been lost. <br>
  </p>
  <p id="en-2" class="p2">
  Ah me! how hard a thing it is to say <br>
  What was this forest savage, rough, and stern, <br>
  Which in the very thought renews the fear. <br>
  </p>
  <p id="en-3" class="p3">
  So bitter is it, death is little more; <br>
  But of the good to treat, which there I found, <br>
  Speak will I of the other things I saw there. <br>
  </p> 
 </div>
</div>

Using CSS to float the two languages next to each other, and JavaScript to ensure that each pair of paragraphs (it-1 and en-1, it-2 and en-2, etc.) is the same height, the same effect can be created without recourse to table-based layout.
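
The JavaScript half of that approach might look something like the sketch below. It is only one possible way to do it: the function names pairedHeights and equalizeParagraphs are invented for this example, the CSS that floats the two columns is not shown, and the code assumes the paragraphs carry the it-N and en-N ids used in the markup above and have no padding or border.

```javascript
// Pure helper: given the measured heights (in pixels) of the Italian and
// English paragraphs, return the height each pair should share.
function pairedHeights(italianHeights, englishHeights) {
  return italianHeights.map((height, i) => Math.max(height, englishHeights[i]));
}

// Browser part: measure each pair (it-1/en-1, it-2/en-2, ...) and give
// both paragraphs in the pair the taller of the two heights.
function equalizeParagraphs(doc, pairCount) {
  const italian = [];
  const english = [];
  for (let n = 1; n <= pairCount; n++) {
    italian.push(doc.getElementById("it-" + n));
    english.push(doc.getElementById("en-" + n));
  }
  const heights = pairedHeights(
    italian.map((p) => p.offsetHeight),
    english.map((p) => p.offsetHeight)
  );
  heights.forEach((height, i) => {
    italian[i].style.height = height + "px";
    english[i].style.height = height + "px";
  });
}

// Only run automatically in a browser, where `document` exists.
if (typeof document !== "undefined") {
  equalizeParagraphs(document, 3);
}
```

A real page would probably also re-run this when the window is resized, since the measured heights change as the text reflows.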

Advantages:

  • Some screens may not be wide enough to fit both text-columns side-by-side. Using this approach, one or the other can be viewed individually.
  • This allows multiple paragraphs of one text to be selected for copy-and-paste. With the table-based version, this is not possible.

Disadvantages:

  • Requires JavaScript.
  • Each paragraph must be given an id in the markup.

Tables in the Real World

The default styles for tables are really quite unattractive, and therefore seldom used. Most front-end UI frameworks (like Bootstrap and Skeleton) provide much improved default table styling.

Even if you aren’t using a front-end UI framework, it may be a good idea to pull in the styles for tables from one of the lightweight, modular frameworks. Tables have a lot of weird styling edge cases that you probably won’t cover if you try to fix the styling yourself from scratch.

Summary

Tables are probably the most complicated markup structure in HTML, and they have been abused in the past in order to serve as containers for layout. However, when tabular data needs to be displayed on a page, tables are the way to go.

6. HTML Forms

This chapter covers HTML forms in detail. Every variety of form element and user-input interface is covered, along with tips for organizing and styling forms.

Form Basics

An HTML form is a set of UI elements that allow a user to provide data, along with a mechanism for submitting that data to the server.

A very basic example might be nothing more than two text fields for a name and a submit button.

Even in this simple form, we can see there is an opportunity for the user to input data (first and last name) and then send that data to a server.

Forms can become very complicated, and there are many interesting input types now available thanks to HTML 5, but no matter how complicated they get, the heart of an HTML form is the same: a series of user input elements together with a way to submit the input to the server.

How a Form Works

Before diving into all the different user interface elements, it would be a good idea to get a clear grasp on how a form functions when sending user data to a server.

A form creates an HTTP request — the same type of request that your browser sends when loading a page. The contents of that request are determined by the values inputted into the form. The response from the server is essentially the same as the type of response received from a page load — and the browser responds the same way, by loading the response as a new page.

In other words: a form submission is essentially the same as a request for a new page, except the request carries with it user-defined data provided through form inputs.

What happens with the submitted data is a subject for server-side scripting (PHP, Ruby, etc.), so we won’t worry about that here.

HTTP Requests and Form Methods

Forms can send two different types of requests:

  • POST
  • GET

These two request types have different meanings and behave differently, so they should be used in different situations.

The Semantic Difference Between POST and GET

A GET is the default HTTP request, and is the same type of request used by your browser when you type an address into the address bar. It is a request to “get” something.

A POST is not a request to get something, but rather a request to send or submit something. You can think about posting a letter, posting bail, or posting a sign.

The Technical Difference

When using a GET request, the input parameters are included in the URL.

http://example.com/search?term=thing+i+am+searching+for

With a POST request, the input parameters are not included in the URL, but are rather sent in the body of the request.

The difference here makes sense based on the meaning of each type of request:

  • A GET request is asking for a specific resource, defined by the URL. Therefore, the details of that request should be included in the URL, because those details define what resource the request is actually asking for.

  • A POST request is sending a message to a particular address. The address is defined in the URL, and the message is defined in the body of the request.
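
To make the difference concrete, here is a small JavaScript sketch that takes the same piece of form data and shows where it would end up under each method. The browser does all of this for you when a form is submitted; buildGetUrl and buildPostRequest are hypothetical helper names used only for illustration.

```javascript
// The same form data, as key-value pairs (the value is illustrative).
const formData = new URLSearchParams({ term: "thing i am searching for" });

// GET: the parameters become the query string of the URL.
function buildGetUrl(action, params) {
  return action + "?" + params.toString();
}

// POST: the URL stays as it is; the parameters travel in the request body.
function buildPostRequest(action, params) {
  return { url: action, body: params.toString() };
}

console.log(buildGetUrl("http://example.com/search", formData));
// http://example.com/search?term=thing+i+am+searching+for

console.log(buildPostRequest("http://example.com/search", formData).body);
// term=thing+i+am+searching+for
```

Notice that the encoded data is identical in both cases. Only its location in the request changes.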

When to use POST and GET in Forms

If a form is being used to request data and information (a search form, for example) and isn’t primarily intended to add or edit content, it is probably best to use a GET request.

Other indications GET is the right choice:

  • Two different users submitting the same details into the form should get identical responses.
  • The response from the form is something one might want to link to directly.
  • Other than logging the traffic and activity, your database is the same after the form submission as it was before.
  • The form is a search form.
  • The user is using the form in order to get some information from your site, not to provide information to you.

If a form is being used to submit information, a POST is most likely the right choice.

Other indications that POST is the right choice:

  • It is highly unlikely that two different users would submit identical information.
  • It is highly unlikely that a single user would submit the exact same information more than once.
  • The form is used to submit information to the site, not to retrieve information from it.
  • Linking to the response page directly would be meaningless.
  • Your database is different after the form is submitted than it was before.

Additionally, there are two reasons why a POST should be used, even if a GET request makes more sense for other reasons:

  • For security reasons, it is preferable not to put the input parameters into the URL.
  • The length of the URL using a GET request would exceed roughly 2,000 characters, beyond which some browsers and servers may not handle it reliably.

Where to Define POST or GET

Every form submits information to the server using either the GET or the POST request type. This is defined with the method attribute in the <form> element.

<form method="GET">
<!-- form here -->
</form>

<form method="POST">
<!-- form here -->
</form>

The default method is GET, which has led to many unfortunate uses of GET when POST would have been the right choice. Don’t rely on the default; use the correct method for your situation.

Form Action — The Requested URL

A form either gets a resource (defined by a URL) or posts information to a resource (defined by a URL).

The URL of the resource is defined by the action attribute in a form.

<form action="search.php" method="GET">
<!-- form here -->
</form>

<form action="edit-post.php" method="POST">
<!-- form here -->
</form>

As with href and src attributes, the URL can be relative (action="search.php") or absolute (action="http://example.com/search.php").

If the action attribute is omitted, the default URL is the current page. (This will still trigger a reload of the page, under normal circumstances.)

Other attributes of <form>

The following attributes apply to the <form> element:

  • accept-charset — Defines the character set used for submitting form data. The default is the same as the document’s character set, so this is usually not needed.
  • action — The target URL for form submission. Explained above.
  • autocomplete — Enables autocomplete in supported browsers. Values are on or off. The default value is on. It is possible to override this setting on individual form elements.
  • enctype — Specifies how form data should be encoded. This applies to POST forms only. Values are:
    • application/x-www-form-urlencoded — All characters are encoded before being sent. Spaces are converted to + symbols, and special characters are converted to percent-encoded hexadecimal values (for example, & becomes %26). This is the default.
    • multipart/form-data — Characters are not encoded. This is required if you are using a file uploader in your form.
    • text/plain — Data is sent as plain name=value lines without any encoding. This is mainly useful for debugging.
  • method — GET or POST. Explained above.
  • name — The name of the form. It is usually a good idea to include one, and there’s no reason it couldn’t be the same as the id.
  • novalidate — Specifies that the form data should not automatically be validated when it is submitted. This attribute accepts no value. (Be careful with this.)
  • target — Equivalent to the target attribute on anchor links ( <a> ), this attribute specifies where the response from the form should be displayed.
    • _self — Opens response in same window. This is the default.
    • _blank — Opens the response in a new window or tab.
    • _parent — Opens the response in the form’s parent window or frame.
    • _top — Opens the response in the full body of the window.
    • framename — You can also specify the name of a frame to open the response into, if you have previously opened and named frames in the page.
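
The default enctype is easier to understand with an example. JavaScript’s built-in URLSearchParams object serializes key-value pairs using the application/x-www-form-urlencoded rules, so it can be used to preview what a browser would send. The encodeForm helper and the sample field names below are assumptions made for this sketch.

```javascript
// Hypothetical helper: encode a list of [name, value] pairs the way a
// form with the default enctype would.
function encodeForm(pairs) {
  return new URLSearchParams(pairs).toString();
}

// Spaces become +, while & and % are percent-encoded so they cannot be
// confused with the separators between pairs.
console.log(encodeForm([["q", "fish & chips"], ["discount", "50%"]]));
// q=fish+%26+chips&discount=50%25
```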

Using Form Elements

Element Names

When a form is submitted to the server, the request — whether it is a POST or a GET — contains the data entered into the form by the user. This data is sent in the form of a series of key-value pairs.

The value for each form element is the data entered by the user. The key for each element is the name attribute for that element. For this reason it is critical that every data-entry element in your form have a unique name attribute.

<form action="" method="post">
<label for="firstName">First Name</label>
<input type="text" name="firstName" id="firstName">
<label for="lastName">Last Name</label>
<input type="text" name="lastName" id="lastName">
<input type="submit">
</form>

Element Labels

The <label> element is very important, as it defines the label for any form element.

Some designers do not like to use form labels because they prefer to place the label’s text into the input element.


<!-- Don't do this -->

<form action="" method="post">
<input type="text" name="firstName" placeholder="First Name">
<input type="text" name="lastName" placeholder="Last Name">
<input type="submit">
</form>

While this might look better for your design, there are two serious usability problems:

  • The labels are used by screen readers to tell blind users what the fields are for.
  • Older browsers do not support the “placeholder” attribute, and even where it is supported the hint disappears as soon as the user starts typing.

Without proper labeling, you risk some users being unable to complete your form.

It is also unfortunately too common for people to include the <label> element, but not use it correctly.

For a label element to work properly, the for attribute should contain the value of the id attribute on the input element. The match is case-sensitive.

<label for="firstName">First Name</label>
<input type="text" name="firstName" id="firstName">

This serves two purposes:

  • Making sure that the markup specifies which element the label applies to helps screen readers connect labels to input elements, so that blind users can better navigate your form.

  • Clicking on the label will act like clicking on the input element. This improves usability greatly, especially on click-to-toggle elements like checkboxes and radio-buttons.

In addition to using the for attribute, a label can be bound to an input by including the input inside the <label> element.


<form>
 <label><input type="radio" name="color" value="red"> Red</label>
 <label><input type="radio" name="color" value="blue"> Blue</label>
 <label><input type="radio" name="color" value="green"> Green</label>
</form>

Setting Default Values

The value attribute corresponds to the current value of a form input element. By setting the value attribute in the markup, you can give any form element a default (starting) state.

<form action="" method="post">
<label for="firstName">First Name</label>
<input type="text" name="firstName" id="firstName" value="John">
<label for="lastName">Last Name</label>
<input type="text" name="lastName" id="lastName" value="Smith">
<input type="submit">
</form>

Some developers are tempted to use the value attribute as a way to provide placeholder or user-hint text. This is usually a bad idea, because if the value is not replaced, the placeholder text will be sent to the server, which is almost never the desired action.

In the example above (a person’s name), it would be a bad idea to use the “John Smith” values just as a placeholder or hint to the user — the user might submit this to the server. However, if this was (for example) a profile page, where users can update their own information or leave it the same, then using value this way makes sense.

If the user changes the input data on the form element, the element’s current value changes as well, even though the value attribute in the document source stays the same. If you were to use JavaScript to get the element’s value, you would find it to be the updated value, not the original value in the document source.

Disabling Elements

Most form elements can be disabled by adding the disabled attribute to them. Disabled elements cannot be clicked on or edited.

<form action="" method="post">
<label for="firstName">First Name</label>
<input type="text" name="firstName" value="John" id="firstName">
<label for="lastName">Last Name</label>
<input type="text" name="lastName" value="Smith" id="lastName" disabled>
<input type="submit">
</form>

Disabled elements send no value when the form is submitted, so be careful about using this to display (for example) profile data you don’t want the user to change.
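
The effect on the submitted data can be modeled with a few lines of JavaScript. This is a simplified sketch rather than what a browser literally runs: the fields array describes the two inputs from the form above, and submittedPairs is a hypothetical helper that keeps only the fields that would be sent.

```javascript
// Illustrative description of the two text inputs in the form above.
const fields = [
  { name: "firstName", value: "John", disabled: false },
  { name: "lastName", value: "Smith", disabled: true },
];

// Hypothetical helper: drop disabled fields, keep the rest as
// [name, value] pairs.
function submittedPairs(formFields) {
  return formFields
    .filter((field) => !field.disabled)
    .map((field) => [field.name, field.value]);
}

console.log(new URLSearchParams(submittedPairs(fields)).toString());
// firstName=John
```

The last name never reaches the server, so server-side code that expects it has to cope with the missing key.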

The <input> element

The most important and versatile form element is <input>. Unlike most other form elements which have one function, the <input> element is used for a wide variety of functions, from simple text to complex interaction to buttons (the submission button on a form is usually an <input> element).

The different types of input controls are specified by the type attribute on the <input> element.

Type: Text

The most basic (and default) input type is text. This defines a single-line text input as would be used for a username in a log-in form or to enter a query in a search form.

<form>
 <input type="text">
</form>

The list attribute can be used to specify a list of predefined values for an input.

<form>
 <input type="text" list="options">
 <datalist id="options">
  <option value="red">
  <option value="green">
  <option value="blue">
 </datalist>
</form>

Type: Submit

The second most basic input type is the submit input, which defines the form’s submit button.

<input type="submit">

The default text on the submit button is “Submit”. This can be changed with the value attribute.

<form>
<input name="search">
<input type="submit" value="Search">
</form>

The button input type creates a control that looks much like a submit button. However, do not use the button type for generic form submission. (It won’t work.) And don’t use the submit type for generic buttons within a form, because clicking one will submit the form.

Type: Password

If you want to obscure the characters entered into a text input, use the password type.

<form>
<label for="userName">User Name</label><br>
<input name="userName" type="text" id="userName"><br>
<br>
<label for="password">Password</label><br>
<input type="password" name="password" id="password"><br>
<br>
<input type="submit" value="Log In">
</form>

Text Input Types with Validation

Several input types create the same GUI — a box to type text into — but create different conditions for validating input.

For example, the email type checks to make sure that the data entered conforms to standard email address format (some text, followed by the @ sign, followed by a domain name).

These types are:

  • email
  • number — Field accepts numbers only.
  • tel — A telephone number. (Validation for telephone numbers is not widely supported in browsers.)
  • url — Accepts only well-formed URLs.

These values are validated when the form is submitted, unless the novalidate attribute is specified on the form, or the formnovalidate attribute is specified on the submit button that was used.
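
To give a feel for what this kind of format check involves, here is a deliberately simplified JavaScript sketch. The pattern and the looksLikeEmail name are assumptions made for illustration; browsers apply their own, stricter rule, and the server should always re-check whatever it receives.

```javascript
// Simplified, illustrative pattern: some text, an @ sign, then more text,
// with no spaces and no extra @ signs. This is NOT the browser's rule.
const ROUGH_EMAIL = /^[^\s@]+@[^\s@]+$/;

// Hypothetical helper: true if the input has the rough shape of an
// email address.
function looksLikeEmail(input) {
  return ROUGH_EMAIL.test(input);
}

console.log(looksLikeEmail("jane@example.com")); // true
console.log(looksLikeEmail("jane.example.com")); // false
```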

Types with Date or Time Selectors

Several input types create pop-up UI elements for selecting a date and/or a time from a calendar. These UI elements are browser based, and not universally supported.

These types are:

  • date
  • datetime (since dropped from the HTML standard; browsers now treat it as a plain text field)
  • datetime-local
  • month
  • time
  • week

<form>
<label for="date">Example of Date Input</label> <br>
<input type="date" name="date" id="date"> <br>
<br>
<label for="datetime">Example of Datetime Input</label> <br>
<input type="datetime" name="datetime" id="datetime"> <br>
<br>
<label for="datetime-local"> Example of Datetime Local</label> <br>
<input type="datetime-local" name="datetime-local" id="datetime-local"> <br>
<br>
<label for="month">Example of Month Input</label> <br>
<input type="month" name="month" id="month"> <br>
<br>
<label for="time">Example of Time Input</label> <br>
<input type="time" name="time" id="time"> <br>
<br>
<label for="week">Example of Week Input</label> <br>
<input type="week" name="week" id="week">
</form>

<!-- These input controls will look different in different browsers. -->

Type: Radio

Radio buttons are a type of form input where only one item in a set can be chosen.

Each button in a set of radio buttons is its own <input> element, and there is no requirement that they be bound together as children of a containing element.

The attribute that binds several radio buttons into a set is the name attribute.

<form>
<input type="radio" name="shape" value="square"> Square<br>
<input type="radio" name="shape" value="circle"> Circle<br>
<input type="radio" name="shape" value="triangle"> Triangle<br>
<input type="radio" name="color" value="red"> Red<br>
<input type="radio" name="color" value="blue"> Blue<br>
<input type="radio" name="color" value="green"> Green
</form>

The value submitted to the server for each name is the contents of the value attribute of the selected radio button in that set. Any labeling is for the user only, and has no effect on the value passed to the server.

The best way to label the inputs in a set of radio buttons is to wrap the <input> element and the label text into a <label> element. This makes the label text clickable, which is easier to use.


<form>
 <label>
  <input type="radio" name="shape" value="square">
  Square
 </label>
 <br>
 <label>
  <input type="radio" name="shape" value="circle"> 
  Circle
 </label>
 <br>
 <label>
  <input type="radio" name="shape" value="triangle"> 
  Triangle
 </label>
 <br>
 <label>
  <input type="radio" name="color" value="red">
  Red
 </label>
 <br>
 <label>
  <input type="radio" name="color" value="blue">
  Blue
 </label>
 <br>
 <label>
  <input type="radio" name="color" value="green">
  Green
 </label>
</form>

Notice that since the <label> element wraps the <input> element, the for and id attributes are not needed.

Type: Checkbox

The checkbox input type can be used to define one of two types of input controls (that both look like checkboxes).

The first type is a single key which may have several values (sometimes called multi-select). The second type is a boolean (TRUE/FALSE) key.

To create an array of values which may be assigned to the same key, simply create a group of checkbox inputs with the same name attribute.

<b>Colors I like</b><br>
<form>
<label>
 <input type="checkbox" name="color" value="blue">
 Blue
</label>
<br>
<label>
 <input type="checkbox" name="color" value="green">
 Green
</label>
<br>
<label>
 <input type="checkbox" name="color" value="yellow">
 Yellow
</label>
<br>
<label>
 <input type="checkbox" name="color" value="red">
 Red
</label>
</form>

In this example, multiple color selections can be made. They will each be sent to the server as individual parameters in the request. For example, if all of them were selected in a GET form, the requested URL would look like:

http://example.com?color=blue&color=green&color=yellow&color=red

You can also use checkboxes individually to represent boolean (TRUE/FALSE) values.

<form>
<label>
 <input type="checkbox" name="tos" value="TRUE">
 By clicking here you certify that you agree to our Terms of Service.
</label>
</form>

Of course, any value would work, as long as the server-side code knows how to interpret the presence of the key.

In either case, if no boxes are checked, the name key is not sent in the request.

For example, in the colors example above, if none of the options were checked, the submitted data would not include any reference to the color input key. (Not even an empty set.)
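
One common workaround, sketched here as an assumption about your setup rather than a requirement, is to place a hidden input with the same name before the checkbox so that the key is always present in the request. How duplicate keys are resolved depends entirely on your server-side code, so confirm that it uses the last value received before relying on this.

```html
<form>
 <!-- Sent on every submission, so the tos key is never missing. -->
 <input type="hidden" name="tos" value="FALSE">
 <label>
  <!-- When checked, a second pair (tos=TRUE) is sent after the hidden one. -->
  <input type="checkbox" name="tos" value="TRUE">
  I agree to the Terms of Service.
 </label>
</form>
```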

Type: Button

A button can be created in a form using the input type of button. Unlike other input types, this carries no specific meaning within a form, and is usually used only to trigger some JavaScript action.

<input type="button" value="Button Label" onclick="alert('I am a button!')">

Type: Color

New in HTML 5 — and only supported in some browsers — is a color-picking input type.

In browsers that support it, clicking on this element brings up a GUI for selecting a color. The value submitted to the server is an HTML/CSS hex-color value (ex. white = #ffffff).

<form>
<label for="favorite-color">What is your favorite color?</label><br>
<input type="color" name="favorite-color" id="favorite-color">
</form>

<!-- This will look different in different browsers, and may not be supported at all. -->

Type: Range

Also new in HTML 5, and also dependent on browser support, is the range input. This input appears as a slider, which the user can move horizontally.

The input element should define the lowest and highest values in the range with the min and max attributes (if they are omitted, the defaults are 0 and 100). The value set by the user will be submitted with the form.


<form>
<label for="form-understanding">How well do you understand forms?</label><br><br>
<i>Not at all.</i>
<input type="range" name="form-understanding" id="form-understanding" min="0" max="100">
<i>Very well.</i>
</form>
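
The range can also be given a step size and a starting position. The following is a small sketch using the step and value attributes; the name volume is just a placeholder.

```html
<form>
 <label for="volume">Volume</label><br>
 <!-- Allowed values are 0, 10, 20, and so on up to 100; the slider starts at 50. -->
 <input type="range" name="volume" id="volume" min="0" max="100" step="10" value="50">
</form>
```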



Type: Image

The image input replaces a submit button, allowing you to use an image as the button.

In addition to changing the way the button looks, the submission request also includes the X and Y coordinates of the user’s click within the image. This allows the form to act as a server-side image map.

<form>
<input type="image" src="example.jpg" alt="Submit">
</form>
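
As a rough sketch of the server-side image map idea: when the image input has a name, the click coordinates are submitted under that name with .x and .y appended. The name map and the action URL /locate are placeholders for illustration.

```html
<form action="/locate" method="get">
 <!-- A click 120 pixels across and 45 pixels down submits map.x=120&map.y=45. -->
 <input type="image" name="map" src="example.jpg" alt="Click a location on the map">
</form>
```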

Type: File

A form can include a file-upload input with the file type. The exact display and functionality of the file-upload GUI is controlled by the browser. File handling (where the file will be saved) also has to be specified on the server-side.

<form>
<input type="file" name="file-upload">
</form>

You can limit the files accepted by the file input by using the accept attribute, which allows you to specify either a list of extensions or a list of MIME types.


<!-- Specify a list of file extensions. -->
<form>
<input type="file" name="extension-limited-uploader" accept=".png, .gif, .jpg, .jpeg">
</form>


<!-- Specify a list of MIME types. -->
<form>
<input type="file" name="mime-limited-uploader" accept="image/png, image/gif, image/jpeg">
</form>

Browser support for file-extension is not universal, so the MIME-type list is probably the better way to go. (See this list of MIME types for details.)

Even if you use the accept attribute to limit the file extensions which can be uploaded through the form, it is important to verify both the file type and the file contents on the server, for at least two reasons:

  • A malicious (or careless) user can misname a file with the wrong extension. The accept limitation on a file uploader only looks at the extension, not the actual file format, so there is no guarantee that a file is of the right type.
  • It is possible to bypass the form and submit a request directly to the server. (This is why ALL inputs should be validated on the server.)
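
Note that a file input only sends the file contents when the form uses the post method and the multipart/form-data encoding. A minimal sketch, assuming a server-side handler at the placeholder URL /upload:

```html
<form action="/upload" method="post" enctype="multipart/form-data">
 <label for="photo">Choose a photo</label><br>
 <!-- accept is only a hint to the file picker; the server must still validate the upload. -->
 <input type="file" name="photo" id="photo" accept="image/png, image/jpeg">
 <input type="submit" value="Upload">
</form>
```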

Type: Hidden

You can designate a non-visible input element, whose value will be included when the form is submitted, by using the type of hidden.

<input type="hidden" name="hidden-value" value="">

The most common use for a hidden input is as a holder for a value generated elsewhere on the page, usually through JavaScript. User interaction on the page causes a value to be assigned to the hidden input, which is then included in the form submission.
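
Here is one way that pattern might look. The input name chosen-plan and the plan values are invented for this sketch; each button simply copies a value into the hidden input before the form is submitted.

```html
<form>
 <input type="hidden" name="chosen-plan" id="chosen-plan" value="">
 <!-- Each button writes its plan name into the hidden input. -->
 <input type="button" value="Basic" onclick="document.getElementById('chosen-plan').value = 'basic'">
 <input type="button" value="Premium" onclick="document.getElementById('chosen-plan').value = 'premium'">
 <input type="submit">
</form>
```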

Attributes of <input>

The <input> element has a large number of attributes. Some of them are only applicable to particular input types, and others can be used with any input.

  • accept — Defines a list of file types, either by extension or by MIME type. Only used with type="file".
  • alt — Defines an alt text. Only used with type="image".
  • autocomplete — Specifies whether the input field should autocomplete. Values are on or off. Overrides the form-level autocomplete attribute. Only applicable to text-based inputs.
  • autofocus — Specifies that the <input> should be in focus when the page loads. No value required. Should only be used once in a document.
  • checked — Sets a radio or checkbox input to the checked state.
  • disabled — Disables an <input> element. Disabled elements do not send their value when the form is submitted. No value is required.
  • form — Specifies a <form> to which the <input> belongs, for use when the <input> element is outside the <form> element. Value is the id of the target form. This attribute is not universally supported.
  • formaction — Specifies a URL to submit the form to. Overrides the action attribute of the <form> itself, or replaces it. This is only used on the submit or image types. The only reason to use this instead of the form’s action attribute is if a form requires multiple submission buttons with different actions.
  • formenctype — Specifies how the submitted form data is encoded. Overrides the enctype attribute of the <form> element. This is only used on the submit and image types. Values:
    • application/x-www-form-urlencoded
    • multipart/form-data
    • text/plain
  • formmethod — Specifies the method (get or post) of the form submission. Overrides the method attribute of the <form> element. This is only used on the submit and image types.
  • formnovalidate — Specifies that the form data should not be validated before submission. Overrides the novalidate attribute of the <form> element. This is only used on the submit and image types. This attribute requires no value.
  • formtarget — Specifies the browser context in which the response is to be displayed. Overrides the target attribute of the <form> element. This is only used on the submit and image types. Values:
    • _blank
    • _self
    • _parent
    • _top
    • framename
  • height — Specifies the height, in pixels, of an image input. It would be better to use CSS to specify this.
  • list — Specifies the id of a <datalist> element containing pre-defined options. Only used with text-based inputs.
  • max — Specifies the maximum value for a number or date-based input.
  • maxlength — Specifies the maximum number of characters in a text-based input.
  • min — Specifies the minimum value for a number or date-based input.
  • multiple — Specifies that the user can enter more than one value. Used with email and file input types. This attribute requires no value.
  • name — Specifies the name of the input. Used as the key in a key-value pair representing the input when the form is submitted. Each form element should have its own unique name, except for radio buttons and checkboxes that are deliberately grouped under a shared name.
  • pattern — Defines a regular expression to be used for validating the value of a text-based input.
  • placeholder — Defines placeholder or “helper” text for a text-based input.
  • readonly — Specifies that an input cannot be edited by the user. Similar in behavior to the disabled attribute, except readonly inputs do send their value to the server when the form is submitted. Often used with JavaScript to ensure a user cannot edit a value until certain conditions are met, or cannot edit a value after certain conditions are met. This attribute requires no value.
  • required — Specifies that the <input> must have a value, or the form will not be submitted. This attribute requires no value.
  • size — Specifies the width, in characters, of a text-based input element. Using CSS is typically a better way to accomplish this.
  • src — Specifies the URL of an image for an image input.
  • step — Defines the interval between valid inputs in a number-based input.
  • type — Specifies the type of the <input> element. Default is text. Not all possible values are supported in all browsers. Values:
    • button
    • checkbox
    • color
    • date
    • datetime
    • datetime-local
    • email
    • file
    • hidden
    • image
    • month
    • number
    • password
    • radio
    • range
    • reset
    • search
    • submit
    • tel
    • text
    • time
    • url
    • week
  • value — Specifies the starting value of an input.
  • width — Specifies the width, in pixels, of an image input. Using CSS is typically a better way to accomplish this.
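
To see how a few of these attributes work together, consider the following sketch. The field names, the five-digit pattern, and the datalist options are all made up for the example.

```html
<form>
 <label for="zip">ZIP code</label><br>
 <!-- required + pattern: the browser blocks submission unless five digits are entered. -->
 <input type="text" name="zip" id="zip" required pattern="[0-9]{5}" maxlength="5" placeholder="12345">
 <br>
 <label for="browser">Browser</label><br>
 <!-- list points at the id of the datalist that supplies suggestions. -->
 <input type="text" name="browser" id="browser" list="browser-options" autocomplete="off">
 <datalist id="browser-options">
  <option value="Firefox">
  <option value="Chrome">
  <option value="Safari">
 </datalist>
 <br>
 <input type="submit">
</form>
```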

Text Area

If you want a short, single line of text for input, use the <input type="text"> element. But if you need a larger area for a longer message, use the <textarea> tag.

<style>

textarea {
 height: 6em;
 width: 50em;
}
</style>

<form>
<label for="msg">Your message:</label><br/>
<textarea name="msg" id="msg"></textarea>
</form>

Any text inside the element will be displayed in the text area.

<form>
<label for="msg">Your message:</label><br/>
<textarea name="msg" id="msg">This text is inside the textarea element. It will be seen by the user. If the user doesn't change it, it will be submitted with the form.</textarea>
</form>

Attributes for <textarea> are:

  • autofocus — Specifies that the <textarea> should be in focus when the page loads. Should only be used once on a document. This attribute requires no value.
  • cols — Specifies the width, in characters, of the text area. This is better accomplished with CSS.
  • disabled — Disables the <textarea>. Disabled form elements do not send their value to the server when the form is submitted. This attribute requires no value.
  • form — Specifies the id of a <form> to which the <textarea> belongs, for use when the <textarea> is not contained within the <form> element. Not supported in all browsers.
  • maxlength — Specifies the maximum number of characters allowed in the <textarea>.
  • name — Defines the name of the <textarea>, and serves as the key for the key-value pair representing the <textarea> in the form submission request. All form elements should include a unique name.
  • placeholder — Defines placeholder or helper text to be displayed inside the <textarea> before the user types into it.
  • readonly — Specifies that an input cannot be edited by the user. Similar in behavior to the disabled attribute, except readonly elements do send their value to the server when the form is submitted. Often used with JavaScript to ensure a user cannot edit a value until certain conditions are met, or cannot edit a value after certain conditions are met. This attribute requires no value.
  • required — Specifies that the <textarea> must have a value, or the form will not be submitted. This attribute requires no value.
  • rows — Specifies the height, in lines of text, of the <textarea>. In some cases this is preferable to using CSS (such as when the actual number of lines is relevant), but for simply defining the height, CSS is usually a better choice.
  • wrap — Specifies whether the input should hard wrap (insert a line break character at every line break) or soft wrap (insert a line break character only where the user defines a line break). Values are soft or hard.
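
A short sketch combining several of these attributes follows. The name bio and the 200-character limit are arbitrary choices for illustration.

```html
<form>
 <label for="bio">Short bio</label><br>
 <!-- Four visible lines, at most 200 characters, and the form will not submit while it is empty. -->
 <textarea name="bio" id="bio" rows="4" cols="40" maxlength="200" placeholder="Tell us about yourself" required></textarea>
</form>
```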

Select (Drop down)

To define a drop-down selector, use the <select> element with <option> child elements.


<form>
 <label for="favorite-color">What is your favorite color?</label><br>
 <select name="favorite-color" id="favorite-color">
  <option value="red">Red</option>
  <option value="blue">Blue</option>
  <option value="green">Green</option>
  <option value="yellow">Yellow</option>
 </select>
</form>

Options can be grouped together and given group-level labels with the <optgroup> element.

<form>
 <label for="favorite-color">What is your favorite color?</label><br>
 <select name="favorite-color" id="favorite-color">
  <optgroup label="Primary Colors">
   <option value="red">Red</option>
   <option value="blue">Blue</option>
   <option value="yellow">Yellow</option>
  </optgroup>
  <optgroup label="Secondary Colors">
   <option value="green">Green</option>
   <option value="orange">Orange</option>
   <option value="purple">Purple</option>
  </optgroup>
  <optgroup label="Not Actually Colors">
   <option value="black">Black</option>
   <option value="white">White</option>
   <option value="gray">Gray</option>
  </optgroup>
 </select>
</form>

The content inside the <option> element provides a user-facing label, but the value sent to the server is defined by the value attribute, not by the content of the element.

Attributes of the <select> element:

  • autofocus — Specifies that the <select> element should be in focus when the page loads. Should only be used once on a document. This attribute requires no value.
  • disabled — Disables the element. Disabled elements do not send the value to the server when the form is submitted. This attribute requires no value.
  • form — Specifies the id of the <form> to which this <select> element belongs, for use when the <select> element is not contained within the <form>. Not supported in all browsers.
  • multiple — Specifies that the user may select more than one <option>. Multiple selections are sent as multiple key-value pairs. This attribute requires no value.
  • name — The name of the element, which serves as the key in a key-value pair representing the <select> element when the form is submitted to the server.
  • required — Specifies that the element must have a selected value, or else the form will not be submitted. This attribute requires no value.
  • size — Specifies the number of visible options. The default is 1.

Attributes of the <option> element:

  • disabled — Specifies that the <option> cannot be selected.
  • label — Specifies the label for the <option>, which replaces the contents of the element in the drop-down display.
  • selected — Specifies that the <option> should be pre-selected on page-load.
  • value — Defines the value sent to the server.
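
The sketch below shows how some of the <select> and <option> attributes might be combined. The topping names are placeholders.

```html
<form>
 <label for="toppings">Toppings</label><br>
 <!-- multiple allows several selections; size shows four options at once. -->
 <select name="toppings" id="toppings" multiple size="4">
  <!-- selected makes this option pre-selected when the page loads. -->
  <option value="cheese" selected>Cheese</option>
  <option value="mushroom">Mushroom</option>
  <option value="olive">Olive</option>
  <!-- disabled options are shown but cannot be chosen. -->
  <option value="anchovy" disabled>Anchovy (sold out)</option>
 </select>
</form>
```

If the user picks cheese and olive, the request contains two pairs with the same key: toppings=cheese and toppings=olive.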

Organizing a Form

A large or complex form can be made easier to style and use by grouping form elements into <fieldset> containers. Each <fieldset> can be titled with a <legend> element.


<form>
 <fieldset>
  <legend>Personal Information</legend>
  <label for="firstName">First Name</label> <br>
  <input name="firstName" id="firstName"> <br>
  <br>
  <label for="lastName">Last Name</label> <br>
  <input name="lastName" id="lastName"> <br>
  <br>
  <label for="birthDate">Birth Date</label> <br>
  <input name="birthDate" id="birthDate"> <br>
 </fieldset> 
 <fieldset>
  <legend>Favorite Things</legend>
  <label for="favoriteColor">Favorite Color</label>
  <select name="favoriteColor" id="favoriteColor">
   <optgroup label="Primary Colors">
    <option value="red">Red</option>
    <option value="blue">Blue</option>
    <option value="yellow">Yellow</option>
   </optgroup>
   <optgroup label="Secondary Colors">
    <option value="green">Green</option>
    <option value="orange">Orange</option>
    <option value="purple">Purple</option>
   </optgroup>
   <optgroup label="Not Actually Colors">
    <option value="black">Black</option>
    <option value="white">White</option>
    <option value="gray">Gray</option>
   </optgroup>
  </select>
  <label for="favoriteShape">Favorite Shape</label>
  <select name="favoriteShape" id="favoriteShape">
   <option value="triangle">Triangle</option>
   <option value="square">Square</option>
   <option value="circle">Circle</option>
  </select>
 </fieldset>
 <input type="submit">
</form>

Styling Forms

The default display of form elements in most browsers is extremely unattractive. Besides the general “battleship gray” of buttons and drop-down UI, there are typically serious problems with alignment, line height, and spacing.

This causes two problems:

  • Many of the form elements look bad individually.
    • For example — Radio Buttons and Check Boxes do not usually align properly with their own labels.
  • Form elements do not look good together.
    • For example — an <input type="text"> element and a <select> drop-down on the same line will not line up with each other properly.

This can be very frustrating.

Some of the problems, like vertical height and spacing, are dealt with in some of the more popular CSS resets, but not all of them.

If you are going to build a CSS style sheet for your project from scratch, be sure to create several detailed example forms, using all of the form elements in a variety of combinations. Be especially mindful of multi-column forms.

Because of the difficulties of form styling, it is often a good idea to use the form styles from a popular front end framework.
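
If you do write your own styles, a few rules along the following lines can serve as a starting point. This is only a sketch of one possible baseline, not a complete reset, and the results will still vary between browsers.

```html
<style>
/* Make form controls use the page font and a predictable box size. */
input, select, textarea, button {
 font: inherit;
 box-sizing: border-box;
}
/* Help radio buttons and checkboxes line up with their label text. */
input[type="radio"], input[type="checkbox"] {
 vertical-align: middle;
}
</style>
```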

7. Expanded HTML

This chapter introduces some of the new features built into HTML5, and covers how to embed video and audio onto a web page.

Intro and HTML5 Media

HTML5 has brought a number of additions and changes to the way websites, pages, and documents are created. This guide has covered many of these specific changes in the context of particular aspects of markup design that already existed prior to the advent of HTML5:

  • Structure
  • Semantics
  • Syntax

That is to say, most of the HTML5 topics that we have covered in the previous chapters were new ways to do (more or less) the same old things.

But HTML5 has done more than just replace our <div class="main"> with <main>. HTML5 provides the tools to create rich media experiences inside a web browser, and to make HTML a viable platform for dynamic applications.

This chapter covers the most immediately usable additions to HTML, markup used for embedding media into a web page. The next two chapters will cover more advanced topics that blur the line between website and app.

Video

Prior to HTML5 two things were true about web video:

  • You had to use a plugin to make it work
  • It was complicated

Now, the situation is exactly the opposite:

  • Video is native
  • It is as easy as adding an image

Here is the actual markup for displaying a video on a page:


<video>
  <source src="movie.mp4" type="video/mp4">
</video>

That’s it. It is that simple.

Well, okay — it takes just a little more than that, but not much. The above is a stripped-down version showing the basic structure of an embedded video.

Here is a more realistic code sample:

<video width="400" controls>
  <source src="movie.mp4" type="video/mp4">
  <source src="movie.ogg" type="video/ogg">
  <track src="subtitles_en.vtt" kind="subtitles" srclang="en" label="English">
  Your browser does not support embedded video.
</video>

In this example we see some of the practical elements needed to make HTML5 video work in real life:

  • Multiple <source> elements. This is used to provide more than one option for the source file of the video. This can be done for different reasons:
    • To provide different file formats, for maximum compatibility.
    • To provide different sizes or resolutions based on client attributes like screen size or download speed.
  • width specification. This can usually be specified in CSS, and the default is the width of the downloaded video.
  • controls attribute, which allows the playback controls (such as play, pause, seek bar, mute, and full-screen toggle) to be displayed. (The default is not to display them.)
  • <track>, which points to a file defining subtitles for the video.
  • Fallback content. Anything inside the <video> element that isn’t a <source> or <track> element will only appear in browsers that do not support the <video> element. In the example above, this means that such a browser will show the message, “Your browser does not support embedded video.”

Still very simple, but with a lot of room for enhancement.

Attributes of <video> include:

  • autoplay — If this attribute is included (it does not require a value), the video will begin playing as soon as it is ready. (Note: This is a very annoying feature in many contexts and you should be judicious about how you use it.)
  • controls — If included, the controls are displayed. No value is specified.
  • height — The height of the video player, in pixels. It is usually better not to set the height manually. CSS is the best option, and even if you need to set the width value in the markup, the height will scale automatically with the width.
  • loop — If included, the video will begin playing again as soon as it is finished. No value is specified.
  • muted — If included, the video’s volume control is set to mute when the video loads. If you must use autoplay, you really should combine it with muted. No value is specified.
  • poster — Specifies the URL of an image to display before the video is played.
  • preload — Suggests whether the browser should preload the video before the user hits “play.” This request may be ignored by the browser. Values are:
    • auto — Preload the entire video.
    • metadata — Preload just the video’s metadata.
    • none — Do not preload the video.
  • src — The URL of a video file to display. It is best to omit this and use the <source> child elements instead.
  • width — The width of the player, in pixels. It is usually better to specify this with CSS.
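
Several of these attributes can be combined on one element. In the sketch below, the file names movie.mp4, movie.webm, and poster.jpg are placeholders. The player shows the poster image first, downloads only metadata until the user presses play, and then loops with the sound muted.

```html
<video width="400" controls muted loop preload="metadata" poster="poster.jpg">
  <source src="movie.mp4" type="video/mp4">
  <source src="movie.webm" type="video/webm">
  Your browser does not support embedded video.
</video>
```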

Attributes of <source> include:

  • media — The value here is a media query, specifying the type of media for which this particular source should be used. Media queries can specify the type of device, the size of the browser window, or both. For example: screen and (min-width:320px). As of this writing, no major browser supports media queries for <source> elements.
  • src — The URL of the video file.
  • type — The MIME type of the video file. This is usually the word video followed by a slash and the exact file format (for example: video/ogg). If this is present, the browser will use it to determine if the file is supported. If it is not present, the browser will have to first query the server before determining if it can play the file. So it is a good idea to specify it, as it saves a browser request. There are currently three supported video file types that can be specified as values of type:
    • video/mp4 — This is the only video file format supported by all the major browsers.
    • video/ogg
    • video/webm

The <track> element specifies a file that contains text which should appear when the video is being played. Most commonly, this is a subtitle track, but there are a few other types of tracks that can be specified. Attributes for <track> include:

  • src — The URL of the text track.
  • kind — The type of text track. Possible values are:
    • captions
    • chapters
    • descriptions
    • metadata
    • subtitles
  • srclang — Specifies the language of the text track. This is required if the kind attribute is subtitles.
  • label — Specifies the title of the text track.
  • default — If included, this track is enabled (turned on) by default. Otherwise, the user has to turn it on manually. This attribute receives no values.
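
As an illustration, a video with more than one text track might be marked up as follows. The .vtt file names are placeholders, and the track files themselves would need to be written separately.

```html
<video width="400" controls>
  <source src="movie.mp4" type="video/mp4">
  <!-- default turns the English subtitles on when the video loads. -->
  <track src="subtitles_en.vtt" kind="subtitles" srclang="en" label="English" default>
  <track src="subtitles_es.vtt" kind="subtitles" srclang="es" label="Español">
  <!-- A chapters track lets the player offer navigation points. -->
  <track src="chapters_en.vtt" kind="chapters" srclang="en" label="Chapters">
</video>
```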

Hosting and Embedding Video

HTML5 makes it easier than ever to self-host and embed video on your own site.

However — just because you can do something, doesn’t mean that you should do it.

Most people should not host their videos on their own server, or place them into their websites using the HTML5 <video> tag.

There are two big reasons not to do this, and one big reason to use a third-party video host like YouTube or Vimeo.

  • bandwidth — Videos are typically rather large files. They take up a lot of room on your server, and they use up a lot of bandwidth when viewed. If you are paying for bandwidth, that can end up costing a lot of money if you ever have a lot of visitors. If you use “unlimited” bandwidth hosting, you’ll hit your unwritten limits and start to find your website speeds slowing down. At the very least, the bandwidth issue argues in favor of a Content Delivery Network. You should use a CDN if you are determined to “self-host,” but a third-party video sharing site is a better idea because of the other two issues.
  • formats and sizes — Do you really want to go through the trouble of making sure your video is available in five different screen resolutions? You probably don’t. It’s a lot to manage for each video, and it’s a lot to keep up with as standards evolve. When using a third-party video sharing site, they take care of this for you. You upload a large file, with high resolution, and they automatically convert it to low-resolution options for their users. This is a much simpler process.
  • wider audience — If you host your own videos, they are only accessible from your own website. But if you use YouTube or Vimeo, you have the possibility of being found on those sites, increasing the likelihood that people will come to your site and engage with your content.

Even if you can self-host your videos, it is usually better not to. YouTube and Vimeo both provide a number of benefits over self-hosting.

One reason not to host videos on YouTube is if you need to keep videos private, for example if they are pay-to-view videos on a member-subscription site. In this case, Vimeo’s Pro service is probably the best option. You can make videos accessible only from certain referring sites.

If you must self-host your videos, for whatever reason, you should definitely take advantage of a Content Delivery Network. This will improve download speeds and lower overall costs.

Audio

Audio is embedded into a web page in almost the exact same way that video is.

<audio controls>
  <source src="soundfile.ogg" type="audio/ogg">
  <source src="soundfile.mp3" type="audio/mpeg">
  Your browser does not support the audio element.
</audio>

The primary element is the <audio> tag, inside of which are one or more <source> elements, specifying audio files. Providing multiple options helps browsers find one that is supported. Text inside the <audio> element will be visible if the browser does not support audio. You can also specify a <track> element which can hold data to be displayed during playback, like a read-along transcript.

The attributes for <audio> are nearly identical to those for <video>:

  • autoplay — Causes the audio to begin playing immediately when ready. (A sure way to annoy website visitors.) This attribute requires no value.
  • controls — Specifies that the controls should be displayed. This should generally be specified. The only reason not to is if you are replacing the controls with your own interface, powered by JavaScript. This attribute requires no value.
  • loop — Sets the audio track to repeat upon finishing. This attribute requires no value. (Combine with autoplay while omitting controls to really drive your visitors insane.)
  • muted — Sets the volume to mute automatically. This attribute requires no value.
  • preload — Makes a suggestion to the browser (which can be ignored) about whether to preload the audio file. Values are:
    • auto — Automatically downloads the file before the user clicks the play button.
    • metadata — Downloads only the metadata.
    • none — Does not download anything until the user clicks the play button.
  • src — A URL to an audio file. This is usually omitted in favor of having one or more <source> elements inside the <audio> element.

The attributes for the <source> element are the same as when it is used with <video>:

  • src – A URL to an audio file.
  • type — The specific type of file. If this is not included, the browser will have to query the server to ask, in order to decide if the file format is supported, before downloading the file. So it is a good thing to specify. The two most common (and most widely supported) types are:
    • audio/mpeg
    • audio/ogg
  • media — Since this defines a media query, it is not relevant to <audio>.

The <track> element works identically in <audio> and in <video>. It defines a file of text which runs along with the audio:

  • src — The URL of the text track.
  • kind — The type of text track. Possible values are:
    • captions
    • chapters
    • descriptions
    • metadata
    • subtitles
  • srclang — Specifies the language of the text track. This is required if the kind attribute is subtitles.
  • label — Specifies the title of the text track.
  • default — If included, this track is enabled (turned on) by default. Otherwise, the user has to turn it on manually. This attribute receives no values.

While <track> can be used to add captions for the deaf or hard of hearing, a better way to provide a transcript for an <audio> file is probably to simply include the text of the transcript on the page. This makes it easier to read at the user’s desired pace, and allows its content to be indexed by search engines.
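
A page that takes the transcript approach might look something like this sketch. The file names and the dialogue are invented for the example.

```html
<audio controls>
  <source src="interview.ogg" type="audio/ogg">
  <source src="interview.mp3" type="audio/mpeg">
  Your browser does not support the audio element.
</audio>
<!-- The transcript is ordinary page content, so it can be read and indexed. -->
<h3>Transcript</h3>
<p><b>Host:</b> Welcome to the show.</p>
<p><b>Guest:</b> Thanks for having me.</p>
```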

Self-Hosting Audio

Audio does not have as many problems as video. While the files are large, they are not as large as video files. Moreover, audio does not depend on the size of the screen or its resolution in order to display properly.

For this reason, there’s really no compelling reason not to self-host audio and embed it into your site with HTML5’s <audio> element.

Because of the bandwidth and file storage issues, though, it is a good idea to make sure you are using a Content Delivery Network.

Audio, Video, and the Evolving Web

Direct support for audio and video is new in HTML 5.

It isn’t that this sort of thing had never been done before; it’s simply that it wasn’t done with HTML. Other solutions, like Flash or Java applets, had to be used. Before HTML5, media was secondary to page design; now it is fully integrated.

This begins to change the way web pages are designed, but more than that — it changes the way they are thought of.

Since the beginning of the internet, the primary metaphor for a webpage has been a “document.” We talk about “HTML documents” and begin HTML files with a “document type” declaration.

With video and audio added, our documents are becoming like the newspapers in Harry Potter — still regular documents, but with magical boxes where special things happen.

In the next chapter, we’ll look at two even more magical features of HTML 5, features that allow code to draw pictures directly into a web page. They still exist in a “document paradigm,” but combined with JavaScript, they can provide dynamic visual interfaces. After that, we’ll complete our look at expanded HTML with some new HTML 5 features that really blur the lines between apps and documents.

Dynamic Images - SVG and Canvas

This chapter provides a brief introduction to the two HTML elements that can be used for drawing: <svg> and <canvas>. It covers their differences and gives some examples of how to use them.

You have always been able to embed images in HTML, but two features that have slowly become standard over the last decade allow images to be drawn directly into HTML: <canvas> and <svg>.

This chapter can only introduce these concepts and give some brief explanations about how they can be used in an HTML document. An entire book could be written on the SVG format, and the new <canvas> element depends entirely on a functional understanding of JavaScript.

However, a brief explanation will help you understand how they fit into the larger picture of HTML generally.

SVG

The SVG (Scalable Vector Graphics) format has been around for a long time. It is a file format independent of HTML, and support for it is not new with HTML 5. However a new <svg> element has been added in HTML to promote support for the format.

SVG is a vector-graphics format. This means that the images are not stored as a picture (a collection of individual bits or pixels), but rather as a description of the points, lines, fills, shapes, and words in the image. This makes vector graphics much more scalable that “raster” or “bitmap” images, because their display resolution is determined as they are being rendered. They always render clean, never pixellated or blurry.

The other advantage of SVG graphics over other image types is that each piece of the image (each line, shape, or color) is a discrete object which can be manipulated individually.

The SVG format looks a lot like HTML: each shape is defined with angle-bracketed elements with names and attributes that define how they should be rendered.

<svg width="200" height="200">
  <circle cx="100" cy="100" r="80" stroke="black" stroke-width="5" fill="cyan" />
</svg>

When rendered, this draws a cyan circle with a thick black outline, centered in a 200-by-200-pixel area.

Let’s take a second to make sure it’s clear what is going on here.

  • <svg> defines an area within which SVG elements can be placed. The width and height attributes specify its size.
    • This usually needs to be done here, as setting it with CSS doesn’t work quite right.
  • The <circle> element defines a circle.
    • cx and cy define the coordinates for the circle’s centerpoint. The coordinate ( cx="0" cy="0" ) refers to the top-left corner of the <svg> element.
    • r is the radius of the circle.
    • stroke specifies the color of the stroke, or perimeter of the circle.
    • stroke-width specifies the thickness of the stroke or perimeter.
    • fill specifies the inside color of the circle.

This is how SVG graphics are built up, one primitive element at a time. There are a number of SVG elements, such as <line>, <polygon>, <rect>, and so forth.
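As a rough illustration of those other primitives, the sketch below combines a <rect>, a <line>, and a <polygon> in a single drawing. The coordinates and colors are arbitrary choices made for this example.

```html
<svg width="200" height="200">
  <!-- A rectangle: x and y locate its top-left corner. -->
  <rect x="20" y="20" width="160" height="60"
        stroke="black" stroke-width="3" fill="gold" />

  <!-- A straight line from (x1, y1) to (x2, y2).
       Lines have no inside to fill, so they need a stroke to be visible. -->
  <line x1="20" y1="100" x2="180" y2="100"
        stroke="black" stroke-width="3" />

  <!-- A polygon: a list of x,y points joined in order and closed automatically.
       These three points make a triangle. -->
  <polygon points="100,120 180,180 20,180"
           stroke="black" stroke-width="3" fill="tomato" />
</svg>
```

Each shape uses the same stroke, stroke-width, and fill attributes as the circle above, so once you know one primitive, the rest follow the same pattern.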

Building up an image like this, with individual elements, makes the pieces of the image accessible.

If the image of a cyan circle with a black border were simply a static raster or bitmap image (like a JPG or a GIF), there would be no way to access its individual parts via CSS or JavaScript. However, with SVG, each element and attribute is accessible. So we can do this:

<style>
circle#switch:hover {
 fill: red;
 stroke: blue;
}
</style>
<body>

<svg width="200" height="200">
  <circle id="switch" cx="100" cy="100" r="80"
  stroke="black" stroke-width="5" fill="cyan" />
</svg>

In the example here, the circle is being targeted by its ID ("#switch"). When the :hover pseudo-class is activated (by hovering over the circle with the mouse) the fill and stroke colors change.

These attributes can also be targeted with JavaScript.


<svg width="200" height="200">
  <circle id="change" cx="100" cy="100" r="80" stroke="black" stroke-width="5" fill="cyan" />
</svg>

<button id="switch-button" onclick="changeColor()">SWITCH</button>

<script>

 function changeColor() {
  var x = document.getElementById("change");
  x.setAttribute("stroke", "blue");
  x.setAttribute("fill", "red");
}

</script>

This is a very basic example, but you can start to see how a combination of CSS interactivity (including animations) and JavaScript functionality could combine to make the <svg> element a full-fledged visual interface.
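To take the JavaScript example one small step further, here is a sketch of a toggle that flips the circle between two fill colors each time it runs. The function name toggleFill and the choice of colors are assumptions made for this example; the only element methods it relies on are the standard getAttribute and setAttribute.

```javascript
// Switch an element's fill between two colors and return the new color.
// "shape" can be any element that supports getAttribute/setAttribute,
// such as the <circle id="change"> from the example above.
function toggleFill(shape, firstColor, secondColor) {
  var current = shape.getAttribute("fill");
  var next = current === firstColor ? secondColor : firstColor;
  shape.setAttribute("fill", next);
  return next;
}

// Browser wiring (skipped when there is no page to work with).
if (typeof document !== "undefined") {
  var circle = document.getElementById("change");
  var button = document.getElementById("switch-button");
  if (circle && button) {
    button.addEventListener("click", function () {
      toggleFill(circle, "cyan", "red");
    });
  }
}
```

If you try this on the page above, remove the onclick attribute from the button first, so that the two click handlers don't compete with each other.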

Here is a very simple (and a little silly) example of this sort of thing:


<style>
svg circle {
    stroke: black;
    stroke-width: 5;
    fill: white;
}

circle.red {
    fill: red;

}

circle.green {
    fill: green; 
}

circle.blue {
    fill: blue;
}
</style>

<svg id="gameboard" width="200" height="200">
    <circle id="1A" cx="25" cy="25" r="18"  />
    <circle id="1B" cx="75" cy="25" r="18"  />
    <circle id="1C" cx="125" cy="25" r="18" />

    <circle id="2A" cx="25" cy="75" r="18" />
    <circle id="2B" cx="75" cy="75" r="18" />
    <circle id="2C" cx="125" cy="75" r="18" />

    <circle id="3A" cx="25" cy="125" r="18" />
    <circle id="3B" cx="75" cy="125" r="18"  />
    <circle id="3C" cx="125" cy="125" r="18" />

</svg>
<button onclick="changeColor()">GO!</button>


<script>
function changeColor() {
    var gameboard = document.getElementById("gameboard");
    var  list = gameboard.getElementsByTagName("circle");
    for (var i = 0; i < list.length; i++) {
     var colorList = ["green", "red", "blue", "white"];
     var colorNow = colorList.sort(function() {return 0.5 - Math.random()})[0]; 
     list[i].setAttribute("class", colorNow);
    }
}
</script>

Here is that same code, with comments explaining what is going on.

<style>

/* Set beginning style for all circles.*/
svg circle {
    stroke: black;
    stroke-width: 5;
    fill: white;
}

/* Set other fill colors based on class name. */
circle.red {
    fill: red;

}


circle.green {
    fill: green; 
}

circle.blue {
    fill: blue;
}
</style>

<!-- The SVG panel is called a "gameboard". 
It is 200px by 200px -->

<svg id="gameboard" width="200" height="200">

<!-- Three rows of three circles each. -->

    <circle id="1A" cx="25" cy="25" r="18"  />
    <circle id="1B" cx="75" cy="25" r="18"  />
    <circle id="1C" cx="125" cy="25" r="18" />

    <circle id="2A" cx="25" cy="75" r="18" />
    <circle id="2B" cx="75" cy="75" r="18" />
    <circle id="2C" cx="125" cy="75" r="18" />

    <circle id="3A" cx="25" cy="125" r="18" />
    <circle id="3B" cx="75" cy="125" r="18"  />
    <circle id="3C" cx="125" cy="125" r="18" />

</svg>

<!-- A button that triggers the changeColor() function when clicked -->
<button onclick="changeColor()">GO!</button>

<script>
// defining the changeColor function, which is triggered when the button above is clicked.

function changeColor() {
    // Place the gameboard element into a variable of the same name.
    var gameboard = document.getElementById("gameboard");
    
    // Create a list of circles by getting all the circle elements inside of "gameboard" and assigning them to the variable named "list".
    var  list = gameboard.getElementsByTagName("circle");
    
    // Loop through each circle in the list.
    for (var i = 0; i < list.length; i++) {
      // Do the following with each circle:

       // Create a list of colors.
       var colorList = ["green", "red", "blue", "white"];

       // Randomly select a color. This is done by shuffling the list into a random order and then taking the first color in it.
       var colorNow = colorList.sort(function() {return 0.5 - Math.random()})[0];

       //Assign the color name to that circle as a class name. This will affect which CSS styles get assigned to it. 
       list[i].setAttribute("class", colorNow);
    }
}
</script>

When this code runs, each click of the GO! button gives every circle a randomly chosen fill color.

It’s a very silly example, but you can imagine how this type of functionality could be expanded to create a game or an application interface.
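Writing out nine <circle> elements by hand works for a small board, but a larger one would usually be generated by code. The following sketch computes the same center points used above (25, 75, and 125) for a grid of any size. The function names gridCenters and circleMarkup, and their parameters, are invented for this example.

```javascript
// Return the center point of every cell in a grid, row by row.
// "spacing" is the distance between neighboring centers and
// "offset" is the distance from the edge to the first center.
function gridCenters(rows, cols, spacing, offset) {
  var centers = [];
  for (var row = 0; row < rows; row++) {
    for (var col = 0; col < cols; col++) {
      centers.push({
        cx: offset + col * spacing,
        cy: offset + row * spacing
      });
    }
  }
  return centers;
}

// Turn one center point into the markup for a circle of radius 18.
function circleMarkup(center) {
  return '<circle cx="' + center.cx + '" cy="' + center.cy + '" r="18" />';
}

// Markup for a 3-by-3 board, one <circle> per line.
var boardMarkup = gridCenters(3, 3, 50, 25).map(circleMarkup).join("\n");
```

In current browsers, the resulting string could be assigned to the innerHTML property of the <svg id="gameboard"> element instead of typing the circles into the page.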

Canvas

The <canvas> element is new in HTML 5, and represents a completely different approach to drawing than <svg>.

The <svg> element uses the SVG standard, which is vector-based drawing. The <canvas> element creates bitmap images. This interrelates with another difference between the two formats: there are no elemental contents to access in <canvas>.

The HTML for a <canvas> is very straightforward.

<canvas>
</canvas>

You can also add height and width, and, since the element will be targeted by JavaScript, adding an ID is usually a good idea. Finally, you can include some fallback text to be displayed in case the <canvas> element isn’t supported.


<canvas id="demoCanvas1" height="300" width="300" >
  Sorry. It looks like your browser doesn't support the &lt;canvas&gt; element.
</canvas>

From a markup perspective, that’s it.

In order to use the <canvas> you have to draw into it with JavaScript. Once you’ve drawn onto the canvas, the individual pieces of the drawing are no longer accessible to CSS or JavaScript — they are just part of a bitmap image.

To give you a sense of this JavaScript drawing, here is a very simple example.

<canvas id="demoCanvas2" height="300" width="300" style="border:1px solid #111111;">
  Sorry. It looks like your browser doesn't support the &lt;canvas&gt; element.
</canvas>

<script>
var c = document.getElementById("demoCanvas2");
var ctx = c.getContext("2d");
ctx.fillStyle = "#0000FF";
ctx.fillRect(25,25,250,250);
</script>

As you can see, drawing even a simple shape to the <canvas> requires quite a bit of code. For this reason, from a practical standpoint, <canvas> is mostly going to be used with complex JavaScript libraries that will render output programmatically. Coding draw instructions manually just doesn’t make a lot of sense for anything other than an exercise or example.
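For comparison with the SVG circle earlier in this chapter, here is a sketch of the same shape drawn on a canvas. The function name drawCircle is made up for this example; the calls inside it (beginPath, arc, fill, and stroke) are standard methods of the 2D drawing context.

```javascript
// Draw a filled, outlined circle using a 2D drawing context.
function drawCircle(ctx, x, y, radius) {
  ctx.beginPath();
  // arc(x, y, radius, startAngle, endAngle): a full turn is 2 * PI radians.
  ctx.arc(x, y, radius, 0, 2 * Math.PI);
  ctx.fillStyle = "cyan";
  ctx.fill();
  ctx.lineWidth = 5;
  ctx.strokeStyle = "black";
  ctx.stroke();
}

// Browser wiring: draw into the canvas from the example above, if it exists.
if (typeof document !== "undefined") {
  var canvas = document.getElementById("demoCanvas2");
  if (canvas) {
    drawCircle(canvas.getContext("2d"), 150, 150, 80);
  }
}
```

Notice that the SVG version was a single element, while the canvas version is a sequence of instructions. Once those instructions have run, the circle is just pixels.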

Though individual shapes and elements cannot be accessed via JavaScript or CSS, the image as a whole can be accessed, analyzed, and altered via JavaScript as if it were an <img>. This makes <canvas> ideal for image editing applications that mimic Draw or Photoshop functionality.
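As a small taste of this kind of whole-image access, the sketch below inverts the colors of a block of pixels. In a browser, the pixel data comes from the drawing context's getImageData method and is written back with putImageData. The helper names invertPixels and invertCanvas are assumptions for this example.

```javascript
// Invert the colors in an array of RGBA pixel values (4 numbers per pixel).
// The array is changed in place; the alpha (opacity) values are left alone.
function invertPixels(data) {
  for (var i = 0; i < data.length; i += 4) {
    data[i] = 255 - data[i];         // red
    data[i + 1] = 255 - data[i + 1]; // green
    data[i + 2] = 255 - data[i + 2]; // blue
  }
  return data;
}

// Browser wiring: invert everything already drawn on a canvas element.
function invertCanvas(canvas) {
  var ctx = canvas.getContext("2d");
  var image = ctx.getImageData(0, 0, canvas.width, canvas.height);
  invertPixels(image.data);
  ctx.putImageData(image, 0, 0);
}
```

Calling invertCanvas on a canvas holding a blue square would turn the square yellow, since yellow is the inverse of blue.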

Conclusion

With <svg> and <canvas>, HTML has two very different ways of providing defined areas for free drawing and image rendering. Combined with JavaScript and CSS, these can become the basis for highly interactive visual interfaces for games or applications.

Web Applications

This chapter completes the brief tour of HTML 5 and its new features by looking at a few JavaScript APIs that help move HTML from a document-authoring language to a web application platform.

From Documents to Web Apps

Most people don’t think much about what a web page is — they just look at them, read them, watch a video, fill out a form, or whatever else they need to do online, without giving any thought to it.

Even many people who author web pages and manage websites don’t think too much about it.

But how you think about it affects how you understand HTML and websites, so this guide has spent a lot of time making sure you understand the traditional (or conventional) view of web pages: documents.

On the one hand, the language of documents is still embedded into the way HTML functions and continues to be a critically important way to view HTML files. On the other hand, common practice in the age of web applications and social networks has moved us well beyond documents and into the realm of web applications and user interfaces.

This chapter touches on several features added into HTML 5 which bridge the gap between documents and applications.

These features come in the way of new APIs, or Application Programming Interfaces. An API is any standardized way of accessing something from another thing. In the case of web pages and HTML, we’re talking about a way for JavaScript to access various aspects of the current document or browser session.

More on these topics will also be covered in the chapter on Javascript. This chapter is just concerned with a brief introduction to some of the new APIs implemented with HTML 5.

New APIs in HTML 5

Timed media playback:
  • Both <audio> and <video> elements expose their playback and timer controls to JavaScript, which means that you can trigger events to occur at particular times in your video, or cause the video to jump to specific times based on other events.
  • This can be used to create enhanced media experiences. For example, an online classroom experience could sync video of an instructor with on-page slides or outlines.
  • Several new JavaScript libraries and frameworks are being developed to help realize the potential of this new feature. The most notable is Popcorn.js.
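The slide-syncing idea can be sketched in plain JavaScript, without any library. The cue list, the slide names, and the function name cuesBetween are all invented for this example; currentTime and the timeupdate event are standard parts of the media elements.

```javascript
// Each cue says: at this many seconds into the video, show this slide.
var cues = [
  { time: 0, slide: "intro" },
  { time: 30, slide: "agenda" },
  { time: 95, slide: "demo" }
];

// Return the cues whose time falls after "previous" and at or before "current".
function cuesBetween(cueList, previous, current) {
  return cueList.filter(function (cue) {
    return cue.time > previous && cue.time <= current;
  });
}

// Browser wiring: check for newly reached cues as the video plays.
if (typeof document !== "undefined") {
  var video = document.querySelector("video");
  if (video) {
    var lastTime = -1;
    video.addEventListener("timeupdate", function () {
      cuesBetween(cues, lastTime, video.currentTime).forEach(function (cue) {
        console.log("Show slide: " + cue.slide);
      });
      lastTime = video.currentTime;
    });
  }
}
```

A real page would swap the visible slide instead of logging to the console, but the timing logic would stay much the same.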

Local Storage:
  • For years, web browsers have had the ability to store small pieces of information, called cookies.
  • Local Storage is a similar idea with some key differences that make it a more robust option for web applications.
  • Local Storage is only available locally, whereas cookies are sent to the server. While cookies have been used to implement storage, they are primarily used as identification tags, with the relevant data stored server-side.
  • Local Storage can store much larger pieces of content, and the content is accessible locally. It is only sent to the server by a secondary means (such as an asynchronous JavaScript call).
  • Local Storage can persist in-browser between sessions. It can therefore be used for user-created content which is to be saved locally, without the risk that data will be lost should the user close the browser or shut down the device.
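Here is a minimal sketch of saving user-created content this way. The function names saveDraft and loadDraft and the key "comment-draft" are assumptions for this example; setItem and getItem are the standard storage methods.

```javascript
// Save a draft object under a key. "storage" is anything with the standard
// setItem/getItem methods, such as window.localStorage in a browser.
function saveDraft(storage, key, draft) {
  // Storage only holds strings, so convert the object to JSON text first.
  storage.setItem(key, JSON.stringify(draft));
}

// Load a previously saved draft, or return null if nothing was saved.
function loadDraft(storage, key) {
  var text = storage.getItem(key);
  return text === null ? null : JSON.parse(text);
}
```

In a browser, you would call saveDraft(window.localStorage, "comment-draft", { text: "Hello" }) as the user types, and loadDraft(window.localStorage, "comment-draft") when the page loads again.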

Offline Web Applications:
  • Being able to run web applications offline requires two features which have been added in HTML 5: the ability for the page or app to be “aware” of being offline (disconnected from the internet), and the ability to specify files to be downloaded and cached for offline use (anything that would be required during a session).
  • Most major browsers have implemented these specifications. JavaScript can check the navigator.onLine property to see whether the browser is online, and apps can also include a manifest file specifying what additional files need to be downloaded and cached.
  • Local storage provides the final piece of offline functionality: user input can be stored locally and submitted to the server when the device is back online.
  • Combining these features, one can build, for example, a full-fledged email client in-browser. The application would use the manifest to specify the downloading of all email messages. If the device goes offline, the page can still display messages to the user. Messages created by the user can be stored locally, and sent when the device is reconnected to the internet.
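The "store now, send later" part of the email example could be sketched as follows. The function name createOutbox and the shape of the object it returns are invented for this example, and the actual delivery to a server is left to whatever send function you pass in.

```javascript
// Hold outgoing messages until the device is online, then send them in order.
// "send" is whatever function actually delivers one message to the server.
function createOutbox(send) {
  var pending = [];

  // Send every waiting message, oldest first.
  function flush() {
    while (pending.length > 0) {
      send(pending.shift());
    }
  }

  // Queue a message, and send right away if we are online.
  function add(message, isOnline) {
    pending.push(message);
    if (isOnline) {
      flush();
    }
  }

  // How many messages are still waiting to be sent.
  function size() {
    return pending.length;
  }

  return { add: add, flush: flush, size: size };
}
```

In a browser, you might call outbox.add(message, navigator.onLine) when the user hits send, and call outbox.flush() from a listener for the window's online event. A fuller version would also keep the pending list in Local Storage so that it survives a closed tab.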

Drag and Drop:
  • Drag and Drop is a common interface paradigm familiar from desktop applications.
  • In HTML 5, it is implemented by adding the draggable="true" attribute to any element. This allows the element to be clicked and dragged around the screen.
  • The effect of “dropping” the element is managed behind the scenes with JavaScript. Data from the drag element is transferred to the drop target, which can then do something with that data (download a file, display the image of the drag element).
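The data transfer described here can be sketched with three small event handlers. The handler names are made up for this example; dataTransfer, setData, getData, and preventDefault are the standard pieces of the drag and drop API.

```javascript
// When a drag starts, record which element is being dragged.
function handleDragStart(event) {
  event.dataTransfer.setData("text/plain", event.target.id);
}

// Dropping is not allowed by default; cancelling dragover permits it.
function handleDragOver(event) {
  event.preventDefault();
}

// When the item is dropped, read the stored id back out and return it.
function handleDrop(event) {
  event.preventDefault();
  return event.dataTransfer.getData("text/plain");
}
```

On a real page, these would be attached with addEventListener: "dragstart" on the draggable element, and "dragover" and "drop" on the drop target. The id returned by handleDrop could then be used to look up the dragged element and move it into the target.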

Cross Document Messaging:
  • It has historically been impossible for pages at different domains to communicate with each other. This restriction has been implemented in almost all browsers for security reasons.
  • HTML 5 provides a controlled way to post messages from a page at one domain to a page in another context or from another domain.
  • This can allow two open documents (two browser tabs) or frames within a single page to communicate and interact with each other.
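The "controlled" part matters: the receiving page should check where a message came from before trusting it. The sketch below shows one way to do that. The function name readTrustedMessage and the address https://example.com are placeholders for this example; the message event and its origin and data properties are standard.

```javascript
// Accept a message only if it comes from an origin we trust.
// Returns the message data, or null if the sender is not trusted.
function readTrustedMessage(event, trustedOrigin) {
  if (event.origin !== trustedOrigin) {
    return null;
  }
  return event.data;
}

// Browser wiring: listen for messages posted to this window.
if (typeof window !== "undefined" && window.addEventListener) {
  window.addEventListener("message", function (event) {
    var data = readTrustedMessage(event, "https://example.com");
    if (data !== null) {
      console.log("Received:", data);
    }
  });
}
```

The sending page delivers a message by calling postMessage on a reference to the other window (for example, an iframe's contentWindow), passing the data and the origin it expects the receiver to have.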

Microdata:
  • Microdata is an additional set of attributes that can be added to markup in order to add semantic data.
  • Microdata works by defining a specific element as a single item through the use of the itemscope attribute. Child elements of that element may then include attributes defining their content as being various properties of the item.
  • This can be used for any of the structured data use cases explained in the Semantic Markup chapters (it is mentioned in brief detail in the “Other Formats” chapter).

Data attribute:
  • HTML 5 introduced a customizable data attribute that can be added to any element. It takes the form of data-*.
  • This attribute’s value can be accessed (read and set) by JavaScript. Since JavaScript already provides methods associated with each element, the addition of arbitrary attributes means that each element in the document is a full-fledged object in the object-oriented sense of the word.
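In the browser, data-* attributes are exposed through each element's dataset property, with the attribute name converted to camelCase. The sketch below illustrates that naming rule and a simple way of reading a value. The function names toDatasetKey and readData, and the attribute data-product-id, are invented for this example.

```javascript
// Convert an attribute name such as "data-product-id" into the matching
// dataset key ("productId"): drop the "data-" prefix, then turn each
// hyphen followed by a lowercase letter into an uppercase letter.
function toDatasetKey(attributeName) {
  return attributeName
    .replace(/^data-/, "")
    .replace(/-([a-z])/g, function (match, letter) {
      return letter.toUpperCase();
    });
}

// Read a data-* value using getAttribute, which every element supports.
function readData(element, name) {
  return element.getAttribute("data-" + name);
}
```

For an element written as <li id="chair" data-product-id="ABC-15">, both readData(element, "product-id") and element.dataset.productId would give "ABC-15".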

Browser Compatibility

The HTML 5 specification has not yet been fully implemented in all major browsers, though it is very close at this point.

The features mentioned in this chapter are all widely supported, but there are exceptions. Other, more “cutting edge” features have only been included in one or two browsers. But this is likely to change.

New browser versions are released with some frequency, and keeping up with what specific features are available in each browser is beyond the scope of this guide.

To see if a feature you want to use is available in various major browsers, check out CanIUse.com. They keep track of browser support for every feature in HTML, CSS, and JS.
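Besides looking a feature up, a page can also check for it at run time before relying on it. The sketch below shows one common way to do that. The helper name hasFeature is made up for this example.

```javascript
// Report whether a named feature exists on a given object.
// Returns false if the object itself is missing.
function hasFeature(host, name) {
  return host != null && name in Object(host);
}
```

In a browser, hasFeature(window, "localStorage") checks for Local Storage, and hasFeature(document.createElement("canvas"), "getContext") checks for <canvas> support. If a check fails, the page can fall back to simpler behavior instead of breaking.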

HTML as an application platform

There was a time when building an in-browser app with HTML was something of a chore, and developers found ways to bring desktop-like technology into the browser. Flash and Java applets are the most prominent examples of this tendency to bring non-HTML development into the browser.

Now, the trend is reversed. HTML 5, along with Javascript, is such a robust platform for developing rich applications that developers want to build apps for the browser and are looking for ways to bring their browser-development methods into non-browser environments.

As of the time of this writing, the best platform for doing that is a tool called PhoneGap.

PhoneGap wraps a browser-based application (HTML, CSS, Javascript) in a “headless” web browser and application wrapper written natively for each supported platform: iOS, Android, Windows Phone, etc.

Other tools that provide a similar bridge between browser technology and native app experience include Ionic and AppGyver. This is a hot area of current development, so new tools are certain to appear.

Conclusion

Web browsers used to be tools for looking at documents. Now they are platforms for rich media experiences and internet-connected software applications.

This evolution, from documents to apps, has taken place slowly over the course of the last decade or more. It’s been driven by “bottom up” work by developers of web sites and web browsers, as well as “top down” declarations such as the HTML 5 specification.

It’s no longer enough for a “web developer” to know how to translate a Photoshop design into a pixel-perfect rendered web page, and HTML is just the tip of the iceberg for required skills.

The next two chapters touch on this expanded role for HTML developers, digging into how HTML integrates with CSS and JavaScript.

8. Semantic HTML

Introduction to Semantic Markup

This chapter introduces the idea of semantic markup, explains how semantic markup is different than syntactic (normal) markup, and gives some examples. It also explains the value of semantic markup for SEO and accessibility.

What is semantic markup? The word “semantic” means relating to meaning. So semantic markup is markup that relates to the meaning of the content, rather than the presentation of it.

Example: Meaning of words

A simple example is the difference between two HTML tags which both cause text to display in bold. The <b> tag specifies that some text should be displayed as bold, but carries no additional meaning about that text. The <strong> tag also causes most browsers to display text in bold, but it also carries meaning: this text is important.

The <strong> tag carries semantic meaning, whereas the <b> tag only has presentational meaning.

For another example, consider the following two headlines.


<h2>Headline</h2>

<big><b>Headline</b></big>

The first headline, using a proper headline tag, clearly identifies the words inside the element as being a headline. The second headline only specifies what the text should look like (big and bold), not what the text means.

(Incidentally, this example highlights why using visual editors like Microsoft Word to create documents can create problems.)

This is semantic markup at its most basic, what some people call Plain Old Semantic HTML (POSH — see below), but much more is possible.

Example: Meaning of content

Using one of several expanded vocabularies, additional semantic information can be included in the markup.

For example, imagine a product listing on an ecommerce site. It might include details like the price, item name, inventory number, product description, and other details. For this example, we’ll look at the simplest product listing of all — a product name with a price and description.

<span class="product-name">ABC-15 Ergonomic Office Chair</span>
<p><b>Price:</b> $550</p>
<p><b>Description:</b> This is a great chair! It has adjustable arms and the seat tilts back and forth.</p>

Such a product description would be perfectly fine for a human reader. It contains all the detail that is needed, and the presentation makes it clear what each piece of information refers to.

But it isn’t provided in a way that a computer can understand — it’s just a bunch of text in a document. There’s nothing to indicate that it is a product being offered for sale, what type of product it is, or which string of numbers refers to a price. It isn’t even really clear if all the different paragraphs contain information that is related to the item called “ABC-15 Ergonomic Office Chair.”

Using semantic markup, this information can be made clear.

<div itemscope itemtype="http://schema.org/Product">

 <h2 itemprop="name">ABC-15 Ergonomic Office Chair</h2>
 <dl>
  <dt>Price</dt>
  <dd itemprop="offers" itemscope itemtype="http://schema.org/Offer">
   <span itemprop="priceCurrency" content="USD">$</span><span itemprop="price" content="550.00">550.00</span>
  </dd>
  <dt>Description</dt>
  <dd itemprop="description">
   This is a great chair! It has adjustable arms and the seat tilts back and forth.
  </dd>
 </dl>

</div>

<!-- Don't worry if everything in this example doesn't make sense yet. Just try to get the general gist. -->

In this updated example, the <div>, with the itemscope attribute, makes it clear that everything inside the <div> is referring to a single thing. The attribute itemtype specifies what that thing is (a product) by making reference to a standardized vocabulary’s definition of it. It is saying, in essence, “The contents of this <div> refer to a thing called a product, as schema.org defines a product.”

Within the product <div>, markup specifies which element contains the price, what the price actually is, and what currency that price is in. (You can’t assume these things online; it is a world wide web, after all.)

While this example is about a piece of content which happens to be a product for sale, there are other types of things that can be specified in this manner. For example, you can add markup which specifies that a piece of content is about a specific person, an event, a piece of art, or a movie (and several other things as well).
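To show how the same pattern carries over, here is a sketch of an event marked up in the same way. The event name, date, and location are invented for this example; the item types (Event and Place) and the property names (name, startDate, and location) come from the schema.org vocabulary.

```html
<div itemscope itemtype="http://schema.org/Event">

 <h2 itemprop="name">Intro to Semantic HTML Workshop</h2>
 <dl>
  <dt>Date</dt>
  <dd>
   <!-- The datetime attribute carries the machine-readable version of the date. -->
   <time itemprop="startDate" datetime="2015-03-14T10:00">March 14, 2015 at 10:00 AM</time>
  </dd>
  <dt>Location</dt>
  <!-- The location is itself an item (a Place) nested inside the event. -->
  <dd itemprop="location" itemscope itemtype="http://schema.org/Place">
   <span itemprop="name">Community Library, Room 2</span>
  </dd>
 </dl>

</div>
```

The structure mirrors the product example: an outer element with itemscope and itemtype says what the thing is, and itemprop attributes on the elements inside it label the individual facts.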

Example: Relationships

Another use for semantic markup is to encode not just on-page content, but the relationship between one page and another page.

Whenever a page links to another page, one of the available attributes in the <link> or <a> (anchor) tag is rel. This allows you to specify the relationship between the current page (or article) and the page being linked to.

<article>
 <h2>Everything You Want to Know About Semantic HTML</h2>
 <p>...</p>
 <p>...</p>
 <p>...</p>
 <footer>
  <dl>
   <dt>Posted on:</dt>
   <dd>January 13, 2015</dd>
   <dt>Author</dt>
   <dd><a rel="author" href="http://whoishostingthis.com">WhoIsHostingThis</a></dd>
  </dl>
 </footer>
</article>

Why bother with Semantic HTML?

Choosing to include semantic information into your web pages is not a small decision. As you can see, it includes a lot of extra markup which wouldn’t be there if the entire focus was simply on how your content looked. This requires extra work, either by you or your developers. If you are using a CMS (Content Management System), the additional markup will have to be added to template files.

So, why bother with all that extra work if it doesn’t affect what people see when they look at your site?

Reason: Bots

The first thing to remember is that humans are not the only visitors to your website. Search engines crawl websites, and make decisions about what your site is about based on what they find. If additional markup can help search engines appropriately rank and display your site, shouldn’t you do it?

Reason: Apps

Wouldn’t it be great if clicking on a phone number launched a phone call, or you could automatically import event details off a web page and into your calendar app?

Apps and browser plugins to provide this sort of interactivity exist already, and more are coming. They rely on semantic markup to properly parse out things like phone numbers, contact details, event information, and addresses.

If you want your own site to have this level of interactivity and data accessibility, semantic markup is critical.

Reason: Accessibility

Speaking of accessibility, the next issue is that not all human visitors have the same capabilities. Blind people and those with other visual problems cannot use your slick design to determine how content relates — they use a screen-reader that parses your content. Screen readers need to know which information is what, otherwise it sounds like a huge block of disorganized words.

Reason: Responsive Design

The screen-reader issue relates to a larger issue of adaptability. The difference between screens and screen-readers is the most pronounced difference between presentational modes, but a similar difference exists between desktop screens and mobile screens, between touch-sensitive and conventional screens, between onscreen display and printed page. Focusing first on good semantic markup (especially POSH) makes adaptive, responsive design easier (or, at least, possible).

Reason: Design Changes

Still related to adaptability is the issue of future design changes. One of the promises of using CSS stylesheets is that markup (content) and style (design) are separated so that the design can easily be changed without having to touch the markup. In theory, a new stylesheet can be swapped in for an old one without creating too many problems. In reality, there are usually too many problems. This is because many creators of HTML documents continue to rely too much on presentational markup. Focusing on semantic markup (again, especially POSH) helps curb that tendency.

Reason: Smarter content

Many people do not take the time to think about the way their content is structured. People use a visual editor to make a headline bold and big, without bothering to think about the fact that it’s a headline. People use lists in weird ways, or fail to use lists when it would be a good idea. People don’t bother to think through which section of their page is the main section, what piece of content constitutes an article, or why they are linking to some particular website.

Often, this results in chaotic and disorganized pages that lack any type of design consistency. Often, parallel types of content will have dissimilar display styling. Hierarchies of importance are lacking, so you can’t immediately tell what is primary and what is secondary. Key information is lacking.

Reason: The Future Semantic Web

The final reason to bother with Semantic Markup is that it contributes to a future where all content, data, and information is globally searchable in a way more meaningful than currently available.

Search engine results provide links to documents and resources, but what if they could provide answers and facts?

This is already starting to happen, and the potential is beyond anything currently possible. But it requires meaningful data, structured in a way that computers can make use of. Semantic markup is the first step towards a semantic web.

How to implement semantic markup

There are several different approaches to embedding semantic information into the HTML on a page, and several layers at which such markup happens.

Since this is an introductory guide, the following chapters will cover only the most high-impact, commonly-supported vocabularies and methods.

The way to get started is to just jump in, to make it a point to use Plain Old Semantic HTML (POSH — see the next chapter) markup whenever possible, and to incorporate each “layer” of semantic markup one piece at a time.

The chapters that follow provide a suggested order for implementing semantic markup. Take it one step at a time. As you incorporate more and more aspects of semantic markup it will start to influence the way you think about page design, content, and meaning. You will start to see results in search engine rankings, social sharing, and general engagement. Semantics will become a natural part of how you think about your website.

Conclusion

The Semantic Web is an exciting frontier for advancement in fields like artificial intelligence and data analysis. If you’ve ever imagined a computer that could simply answer complex questions for you, based on the world’s available information, you have been thinking about a semantic web.

Adding semantic markup to your own site contributes to this worldwide effort, and also benefits you immediately in terms of search engine display and being more accessible for disabled users.

The following chapters will cover some of the primary techniques for providing semantic data in your website.

Plain Old Semantic HTML

This chapter begins our tour of semantic markup methods with a focus on the foundations of all such implementation: solid, standard markup.

What is Plain Old Semantic HTML?

Plain Old Semantic HTML, also known as POSH, is straightforward, standard markup that simply says what it means and means what it says within the constraints of standard HTML. It doesn’t carry any extra data (that’s the next chapter), but rather makes the content of the page more transparent and obvious because all the elements are used properly.

Proper Document Structure

The most important aspect of Semantic HTML is proper document structure. Screen readers, search bots, and whatever other technologies develop to read web content will have a much easier time of figuring out where on your page your main content is if you structure your HTML documents properly.

Elements of proper HTML document structure can best be thought of as a hierarchical tree:

  • <!DOCTYPE html> — The document should be prefaced with a proper HTML5 document-type specification.
  • <html> — The entire contents of the document should be enclosed within the <html> element.
    • <head> — Page title, meta-data, CSS links, and other similar items should be contained within the document <head>.
    • <meta charset="UTF-8">
    • <title> — The human-readable, but search-engine optimized title.
    • <meta name="description">
    • <meta name="keywords">
    • <meta name="author">
    • <link rel="stylesheet">
    • <script type="text/javascript" async>
      • JavaScript should be placed either in the header or just before the closing </body> tag.
      • The conventional wisdom has been to place it at the bottom of the page to speed up overall page rendering time, because the page waits to parse the JS before continuing to render the markup.
      • However — Semantically, scripts belong in the document head because they are not part of the content, and they pertain to the entire document.
      • As of HTML5 and the latest generation of web browsers, this can safely be done by adding the “async” attribute to the script tag. This tells the browser to download the script in the background while it keeps parsing the page, and to run the script as soon as it has arrived.
      • Unfortunately, this does not work for those people who will have the longest load times — people still using older browsers. Therefore, this is something you would need to consider based on your target (or likely) audience.
    • <body> — The visible content of the document should be placed inside the <body> element, which should immediately follow the closing of the </head> element.
    • <header> — Site name, logo, main navigation, and similar “masthead” elements (such as, perhaps, a search form) should all be contained within the document’s <header>.
      • <h1> — The site name, or page title, should usually be placed within the <header>.
      • <nav> — The main navigation menu for a site or document should be placed inside a <nav> element, inside of the <header>.
        • <a> — Several internal site links.
    • <main> — The main or primary content of a page should, if possible, immediately follow the <header>.
      • <article> — Unique, stand alone pieces of content within the <main> section should be contained within an <article> element. In a blog index page, there may be several separate <article> elements, while a regular content or post page would likely only have one (not including sub-articles).
        • <header> — It is best to think of an article as a mini-document. Details like the article title and possibly the author, publication date, and similar items can be included here.
          • <h2> or <h1> — If this is an index page with many <article> elements, it is usually best to place the article titles within <h2> headlines. However, if this is the only article on a page, the <h1> headline is best (with the site title in an <h2> element).
          • <time datetime="YYYY-MM-DD">Publication Date</time> — If desired, the publication date should be wrapped in a <time> element, with a standard-format date ( YYYY-MM-DD ) specified in the datetime attribute.
          • <a rel="author">Author Name</a> — If desired, the author’s name should be placed within an anchor tag, linking to the author’s profile or index page, with the rel attribute specifying the relationship.
        • <section> — The main text of an article is best contained within a <section> of the article. This is not absolutely necessary (the content could be a series of <p> and other elements which are direct children of <article>), but placing the actual content of an article within a <section> makes the most sense, as it makes it parallel to the other parts of the article (<header>, <footer>, and comments).
          • Block-level content markup, including (but not limited to): <h3>, <h4>, <h5>, <p>, <ul>, <ol>, <dl>, and <img>
            • Text-level content markup, such as <a>, <li>, <dt>, <dd>, <strong>, <em>, and <abbr>
        • <footer> — An article-level <footer> is not at all a requirement, but some people prefer to put article meta-data such as author, date, and tags into the footer.
          • <time> — If not used above.
          • <a rel="author"> — If not used above.
          • <a rel="up"> — Can be used to link to category index pages.
          • <a rel="tag"> — Can be used to link to tag index pages.
        • <aside> — A section for comments. There is not complete consensus about using the <aside> element as a container for comments. Some people prefer to use a <section> or <div>, or to include comments into the article’s <footer>. However, <aside> makes sense logically, since the comments to an article are secondary to the article’s content, and that is the relationship expressed by <aside>.
          • <form> — The comment input form.
          • <article> — Each comment can be enclosed in an <article> tag. There is some disagreement about this point, because not all people think a comment satisfies the definition of an <article> (a piece of stand-alone content). However, a good working definition of <article> is “something which would be a single item in an RSS feed.” Many blogging platforms create an RSS feed of comments, with each comment being a single item in the feed.
            • <header> — If each comment is an article, then basic comment meta-data (like the comment author and publication time) should be included in a header.
              • <a rel="author"> — The comment author’s name, linking to the author’s website (if provided).
              • <time> — The date that the comment was submitted.
            • <section> — The actual content of the comment.
      • <footer> — A <footer> that belongs to <main> (rather than the <article>) is a good place to put paginated navigation (next post, previous post).
        • <a rel="next">
        • <a rel="prev">
    • <aside> — One or more <aside> elements should be used for sidebars and similar page sections.
      • <section> or <article> — Each widget or unique piece of sidebar content should be enclosed in a <section> or <article> element. Using <section> probably makes the most sense, but some people would argue that a widget is a stand-alone piece of content and should therefore use the <article> tag. There is no consensus on this point.
        • Tags used in the widget <section> or <article> will vary depending on the type of widget. A date archive may have an unordered list ( <ul> ) of monthly links, while a site search widget would use a <form>.
    • <footer> — The <footer> of a web page usually contains information like the copyright notice, contact information, and secondary navigation (or a repeat of primary navigation).
      • <span> or <p> — Copyright notice
        • <time datetime="YYYY"> — The copyright year can be represented with the <time> element.
        • <a rel="publisher"> — The name of the copyright holder that follows the year in the copyright notice is usually the name of the publishing company or blog owner.
        • <a rel="license"> — Link to a license, if applicable. This is usually used for linking to a Creative Commons, GPL, or other Open License.
      • <address> — Contact information for the site owner(s) should be inside an <address> element (which is not used for formatting arbitrary postal addresses).
      • <nav> — The secondary or (repeated) primary navigation for the site.
        • <a> — Several internal site links.
    • <script> — If not included in the document <head>, scripts should be placed just before the closing </body> tag.
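
The two script-loading options described in the outline can be sketched side by side. This is only an illustration: the file names site.js and legacy.js are hypothetical, and a real page would normally pick one approach per script rather than using both.

```html
<!DOCTYPE html>
<html lang="en">
 <head>
  <meta charset="UTF-8">
  <title>Script Placement Sketch</title>
  <!-- Option 1: keep the script in the head and mark it async, so the
       browser can fetch it without pausing the parsing of the page. -->
  <script src="site.js" async></script>
 </head>
 <body>
  <main>
   <p>Page content...</p>
  </main>
  <!-- Option 2: for audiences on older browsers, load the script last,
       just before the closing body tag. -->
  <script src="legacy.js"></script>
 </body>
</html>
```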

Often, people think of a page (document) like a paper document, an essentially linear series of items. But HTML is actually a nested tree structure, with elements and child elements. Understanding this helps keep things organized and in semantically meaningful relationships with each other.

Here is an example of the above structure.


<!DOCTYPE html>
<html>
 <head>
  <meta charset="UTF-8">
  <title>Semantic HTML</title>
  <meta name="description" content="Introduction to Semantic HTML and Semantic Markup">
  <meta name="keywords" content="Semantic HTML, Semantic Markup, Microformats, POSH, schema.org">
  <meta name="author" content="WhoIsHostingThis">
  <link rel="stylesheet" type="text/css" href="style.css">
  <script type="text/javascript" src="wiht.js" async></script>
 </head>
 <body>
  <header>
   <h2><a href="home">WhoIsHostingThis</a></h2> 
       <!-- Not the home page of a site, so the site title/logo in the header is not H1 -->
   <nav>
    <a href="about">About</a>
    <a href="contact">Contact</a>
       <!-- Etc... -->
   </nav>
  </header>
  <main>
   <article>
    <header>
     <h1>Semantic Markup</h1>
         <!-- <h1> headline because this is a single-article page. Otherwise, the site title would receive the <h1> and each article title would use an <h2> -->
     <dl>
      <dt>Author</dt>
      <dd><a href="authors/adam-wood" rel="author">Adam Wood</a></dd>
      <dt>Published on</dt>
      <dd><time datetime="2015-06-01">1 June 2015</time></dd>
     </dl>
    </header>
    <section>
     <p>...</p>
     <p>...</p>
     <p>...</p>
         <!-- Actual Content of Article -->
    </section>
    <footer>
     <dl>
      <dt>Posted in</dt>
      <dd><a href="html-guides" rel="up">HTML Guides</a></dd>
      <dt>Tags</dt>
      <dd>
       <ul>
        <li><a href="tag/semantic-markup" rel="tag">Semantic Markup</a></li>
        <li><a href="tag/html" rel="tag">HTML</a></li>
        <li><a href="tag/seo" rel="tag">SEO</a></li>
       </ul>
      </dd>
     </dl>
    </footer>
    <aside>
     <form>
      <!-- Comment Form -->
     </form>
     <article>
      <header>
       <a href="http://commenterwebsite.example.com" rel="author">John Q. Public</a>, at <time datetime="2015-06-02 23:14">11:14 PM on 2 June 2015</time>
      </header>
      <section>
        <p>I think this article is great!</p>
      </section>
     </article>
         <!-- Each comment would be a separate <article> element. -->
    </aside>
   </article>
       <!-- If this were a blog index page, there would be more <article> elements -->
   <footer>
    <a href="textual-html" rel="prev">Textual HTML</a>
    <a href="forms" rel="next">Forms</a>
        <!-- In a multi-article blog index page, the <footer> of main would typically include links for paging through older and newer posts. -->
   </footer>
  </main>
  <aside>
   <section>
    <h4>Blog Archives</h4>
    <ul>
     <li><a href="archive/2015/05">May 2015</a></li>
     <li><a href="archive/2015/04">April 2015</a></li>
     <li><a href="archive/2015/03">March 2015</a></li>
     <li><a href="archive/2015/02">February 2015</a></li>
    </ul>
   </section>
      <!-- Additional <section> elements for each sidebar widget. -->
      <!-- Some people think <article> is an appropriate element for sidebar widgets. -->
  </aside>
  <footer> 
   <section> <!-- copyright -->
    &copy; <time datetime="2015">2015</time> <a href="http://whoishostingthis.com" rel="publisher">WhoIsHostingThis</a>. Released under Creative Commons <a href="http://creativecommons.org/licenses/by-nc-nd/4.0/" rel="license">CC BY-NC-ND 4.0</a>
        <!-- NOTE: WhoIsHostingThis does NOT release our content under Creative Commons. This is simply an example of how to use the rel="license" attribute -->
   </section>
   <nav>
    <a href="about">About</a>
    <a href="contact">Contact</a>
       <!-- Etc... -->
   </nav>
   <address>
    27 Mortimer Street, London, W1T 3BL, UK.
   </address>
  </footer>
 </body>
</html>

This example would obviously need to be modified in various ways for different types of pages. For example, a blog index page would put the site title into an <h1> tag, include multiple <article> elements headlined with <h2>, and exclude comments altogether.
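
As a rough sketch of that variation, a blog index page might be laid out as follows. The site name, post URLs, and dates are placeholders invented for illustration, and the sidebar and page footer are left out for brevity.

```html
<!DOCTYPE html>
<html lang="en">
 <head>
  <meta charset="UTF-8">
  <title>Example Blog</title>
 </head>
 <body>
  <header>
   <h1><a href="home">Example Blog</a></h1>
       <!-- Index page, so the site title takes the H1 -->
   <nav>
    <a href="about">About</a>
    <a href="contact">Contact</a>
   </nav>
  </header>
  <main>
   <article>
    <header>
     <h2><a href="posts/semantic-markup">Semantic Markup</a></h2>
     <time datetime="2015-06-01">1 June 2015</time>
    </header>
    <section>
     <p>...</p>
    </section>
        <!-- No comments aside on an index page -->
   </article>
   <article>
    <header>
     <h2><a href="posts/textual-html">Textual HTML</a></h2>
     <time datetime="2015-05-25">25 May 2015</time>
    </header>
    <section>
     <p>...</p>
    </section>
   </article>
   <footer>
    <a href="page/2" rel="next">Older posts</a>
   </footer>
  </main>
 </body>
</html>
```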

Class and id Attributes

Part of POSH is including semantically meaningful class and id names. But they are often underused, overused, or used inappropriately.

Many people use class and id attributes for no reason other than to specify CSS selectors. Often this gets done at the same time that CSS is being written, so you get oddly-specific attribute values, class names that should be id names, and redundant markup.

<!-- DO NOT DO THESE THINGS -->

<body>
 <header class="header">  
  <h1 class="logo">WhoIsHostingThis</h1> 
  <nav class="nav float-right">
   ...
  </nav>
 </header>
 <main class="main left-column">
  <article class="first-article">
   <header>
    <h2 class="article-title green">Lorem Ipsum Dolor Sit</h2>
    <dl class="article-metadata">
     <dt>Author</dt>
     <dd class="author"><a href="/authors/cicero">Marcus Tullius Cicero</a></dd>
     <dt>Published on</dt>
     <dd class="publish-date"><time datetime="2015-06-01" pubdate>1 June 2015</time></dd>
    </dl>
   </header>
   <section>
    
    <p class="first-paragraph">Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec mollis tincidunt enim, nec sollicitudin sapien ultrices in. </p>
    
    <p>Maecenas sagittis tortor eget enim suscipit, ac iaculis arcu faucibus. Sed bibendum et urna id pharetra. Vivamus ornare pharetra turpis non imperdiet.</p>

    <h3>Etiam Non?</h3> <!-- Sectional heading -->
    <p>Etiam non pretium quam. Mauris eu pellentesque lectus. Duis vitae orci eu diam fermentum aliquam. Phasellus eu nisi varius dui laoreet cursus.</p>

    <h3>Vivamus Hendrerit, yo</h3> <!-- Sectional heading -->
    <p>Vivamus hendrerit a purus eu suscipit. Aenean convallis pellentesque bibendum. Fusce at ante hendrerit, congue nulla vel, facilisis lectus.</p>

   </section>
  </article>
  <article>
   <!-- another article - - this could be an index page -->
  </article>
  <article>
   <!-- another article - - this could be an index page -->
  </article>
 </main>
 <aside class="right-column">
  <section class="widget monthly-archive">
   <h4 class="widget-title">Monthly Archive</h4>
   <ul>
     <li><a href="archive/2015/05">May 2015</a></li>
     <li><a href="archive/2015/04">April 2015</a></li>
     <li><a href="archive/2015/03">March 2015</a></li>
     <li><a href="archive/2015/02">February 2015</a></li>
    </ul>
  </section>
 </aside>
 <footer class="footer">
  <div class="copyright">
   &copy; 2015 WhoIsHostingThis
  </div>
  <nav class="footer-navigation">
   ...
  </nav>
 </footer>
</body>

So what’s wrong with all that?

  • Does the <header> element need to be classified as a header? Likewise the <main>, <nav>, and <footer> elements? This is redundant, but very common, because there was a time when developers created documents with dozens of nested <div> elements which needed to be differentiated. However, now that many of these sections are uniquely named elements, that is not required.

  • Even in the case of elements which appear multiple times in a document, the context provides enough differentiation that an additional class name is usually not required. For example, the <body>...<footer></footer></body> is different from <article>...<footer></footer></article>, and can be accessed via CSS without further specification.
    • One has to be careful about this. The <footer> inside of the <article> will be affected by any styles that target body footer, because the article itself is still in the <body>. The solution to this problem, though, is simple. CSS provides the child combinator, written as the > sign, which matches only immediate children. So body > footer would target only the <footer> element at the end of the document, not the one inside of <article>.
  • Similarly, even class names which don’t repeat the semantic meaning of the tag itself can sometimes be eliminated. For example, the class="logo" attribute is likely redundant because there will only be one <h1> (or <h2>) tag inside of the <header>.

  • Using class names that imply a particular presentational style, like class="green" or class="right-column", is almost always a bad idea. Doing so means that later, when the site is redesigned (this will always happen eventually), the HTML will have to be rewritten, or bizarre CSS rules will have to be created, like .green { color: red; } and .left-column { float: right; }. In a small example with only a few items, this seems like no big deal, but in a large or complex code base this can become a serious hurdle to maintenance, because developers will get confused.

  • Some of the classes used are not sensible classes, but should rather function as ids. Many people think of these things interchangeably, with the exception that multiple elements can have the same class attribute, whereas id values have to be unique. In order, then, to not have to think too hard about whether any particular designation would be unique or not, they simply put everything into the class attribute.

    This is wrong. Class attributes describe an element as a type of a thing, whereas id attributes identify that specific element in particular.

    For example, a widget is a type of a thing, so it makes sense to label something as belonging to the class of widget with class="widget". However, the element that describes the publication date on a particular article is a specific element which might need to be identified individually for some reason.

    If you mean “all the things like this,” then you are talking about targeting members of a class. If you are talking about “this item in particular,” then you are talking about targeting an element by using its id.

    For this reason, class is sometimes the most semantically appropriate way to describe something, even if there is only one of it. For example, widget is a type of thing. Even though there is only one of them at the moment, it still wouldn’t make any sense to add the attribute id="widget" to the <section> that defines the widget.

    However, several class names used in the example above would be more semantically appropriate as ids. For example:

    • monthly-archive describes one specific widget
    • author and publish-date both describe the content of specific description elements

    Unfortunately, this is not always clear. On a blog index page, many author and publish-date elements might be employed. It may be that these are simply types of <dd> elements, not specific <dd> elements. (It also may be the case that this is completely moot, as there may be no need to target those elements with either CSS or JavaScript, the two primary consumers of the HTML class and id API.)

  • The <h3> elements defining subsection headings within the article, which do not have either an id or a class, could benefit from an id attribute. This is because one of the uses for id attributes is creating links within a document. If a document is particularly long, it is a good idea to add id attributes to section headings so that someone can link to a particular portion of the document. (For a good example of this in practice, see any Wikipedia article.)
    • This is something that has to be done with each new piece of content that is created, as opposed to many other things discussed in this chapter which would be handled only once in a site design. For this reason, if your website frequently publishes long-form articles, it is a good idea to create a standard process for this. For example, MediaWiki (the technology that powers Wikipedia) creates these ids and links automatically, and a similar functionality can be added to WordPress with the use of a Table of Contents plugin.
  • The class="first-article" is unneeded. If there is a desire to target the first article on the page, CSS provides the :first-of-type selector. Using article:first-of-type in the CSS would then target only that single article. This goes for the class="first-paragraph" attribute as well.

  • The <section> elements would be improved with either class or id attributes, because it is possible that a future revision of the page would add a <section> within the same parent element.

  • Similarly, the <aside> element which is functioning as a sidebar might be better with a class="sidebar" attribute, since that is its semantic (not just layout) function, and it is conceivable that there would be other <aside> elements added in the future.

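  • If there is any reason to target the article author’s name or the publication date individually with CSS style declarations, a rel="author" value on the author link or an attribute of the <time> element can be used. This is done with the CSS attribute selector. For example: a[rel="author"] or time[datetime]. (The pubdate attribute shown in the example was dropped from the final HTML5 specification, so it is safer not to rely on time[pubdate].)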

Here is the above code example, with more appropriate class and id attributes.

<!-- DO THIS INSTEAD -->

<body>
 <header>  
  <h1>WhoIsHostingThis</h1> 
  <nav>
   ...
  </nav>
 </header>
 <main>
  <article>
   <header>
    <h2>Lorem Ipsum Dolor Sit</h2>
    <dl>
     <dt>Author</dt>
     <dd><a href="/authors/cicero" rel="author">Marcus Tullius Cicero</a></dd>
     <dt>Published on</dt>
     <dd><time datetime="2015-06-01" pubdate>1 June 2015</time></dd>
    </dl>
   </header>
   <section>
    
    <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec mollis tincidunt enim, nec sollicitudin sapien ultrices in. </p>
    
    <p>Maecenas sagittis tortor eget enim suscipit, ac iaculis arcu faucibus. Sed bibendum et urna id pharetra. Vivamus ornare pharetra turpis non imperdiet.</p>

    <h3 id="etiam-non">Etiam Non?</h3> 
    <p>Etiam non pretium quam. Mauris eu pellentesque lectus. Duis vitae orci eu diam fermentum aliquam. Phasellus eu nisi varius dui laoreet cursus.</p>

    <h3 id="vivamus-hendrerit">Vivamus Hendrerit, yo</h3> 
    <p>Vivamus hendrerit a purus eu suscipit. Aenean convallis pellentesque bibendum. Fusce at ante hendrerit, congue nulla vel, facilisis lectus.</p>

   </section>
  </article>
  <article>
   <!-- another article - - this could be an index page -->
  </article>
  <article>
   <!-- another article - - this could be an index page -->
  </article>
 </main>
 <aside class="sidebar">
  <section class="widget" id="monthly-archive">
   <h4>Monthly Archive</h4>
   <ul>
     <li><a href="archive/2015/05">May 2015</a></li>
     <li><a href="archive/2015/04">April 2015</a></li>
     <li><a href="archive/2015/03">March 2015</a></li>
     <li><a href="archive/2015/02">February 2015</a></li>
    </ul>
  </section>
 </aside>
 <footer>
  <section id="copyright">
   &copy; 2015 WhoIsHostingThis
  </section>
  <nav>
   ...
  </nav>
 </footer>
</body>

This makes the source easier to read and more meaningful, which is good for developers and document authors.
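
The id attributes on the <h3> headings can also serve as link targets. A minimal sketch, assuming a small in-page table of contents is wanted near the top of the article (the <nav> wrapper and its placement are illustrative choices, not something the example requires):

```html
<nav>
 <!-- Each href is a # followed by the id of the target heading -->
 <ul>
  <li><a href="#etiam-non">Etiam Non?</a></li>
  <li><a href="#vivamus-hendrerit">Vivamus Hendrerit, yo</a></li>
 </ul>
</nav>
```

The same fragments can be appended to the page's URL when linking from another document, so a reader lands directly on that section.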

One interesting way this makes things easier is in the way it helps someone coding CSS to think about the document structurally, rather than having to remember class names.

For example, consider targeting the first paragraph of an article for special styling rules. When relying on the class attribute, the coder has to remember (and re-check) the exact value of the class attribute. Was it class="first"? Or class="first-p"? Or maybe class="first-para"? When the focus is on element names in a consistent document structure, it is easier to remember “the first paragraph inside the content section of the first article, inside the main portion of the document” ( main > article:first-of-type > section > p:first-of-type ), because it isn’t arbitrary.
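
To make the structural approach concrete, here is a hedged sketch of a stylesheet written against the cleaned-up markup. The property values are arbitrary placeholders. The paragraph selector passes through <section> because the paragraphs in the example sit inside one, and the a[rel="author"] rule only matches links that actually carry that rel value.

```html
<style>
 /* Page-level footer only: the child combinator skips footers inside articles */
 body > footer { font-size: 0.875em; }

 /* First article in main, with no class="first-article" needed */
 main > article:first-of-type { border-top: none; }

 /* First paragraph of the first article's content section */
 main > article:first-of-type > section > p:first-of-type { font-size: 1.25em; }

 /* Attribute selector: any link marked rel="author" */
 a[rel="author"] { font-weight: bold; }

 /* Class: every widget. Id: one specific widget. */
 .widget { margin-bottom: 2em; }
 #monthly-archive { background: #f5f5f5; }
</style>
```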

Use of rel attribute

Use of the rel attribute for links has already been mentioned, but there are a number of things you can do with it which are extremely easy to implement and can add semantic value to your markup.

The rel values officially supported by HTML5 include:

  • alternate — Linked document is an alternate version of the current document; that is, same content, different format. For example, an RSS feed that contains the same content as a blog index page or a printable PDF of an article.
  • author — Linked document is the profile or home page of the author of the current document or article. A single document or article could have several rel="author" links, for example, linking to the on-site author bio, the author’s Facebook profile, the author’s GitHub page, the author’s Wikipedia bio, etc. If you link to a Google+ profile, Google will use the profile information to enrich the snippet displayed when the page appears in search results.
  • help — Linked document is a help document related to the current document. This is especially helpful for web apps. It could, in theory, be used for a link to the “About Us” page as well.
  • license — Linked document contains information about the copyright and licensing of the current document. Most often used to link to a boilerplate Open Source license, like Creative Commons or GPL. Google Image Search and other tools use this information to help users search for freely available content.
  • next — Linked document is the next document in a series with the current document. Can be used for multi-page articles, for the next article on a blog, or for the next set of items in a paginated index or feed. Some browsers will use this information to improve the user experience by pre-fetching the linked document so that it displays immediately when clicked.
  • nofollow — Linked document is specifically not endorsed by the author or publisher of the current document. Used by search engines (especially Google, who introduced it) to discount any benefit normally provided to the linked document by the presence of the link. The rel="nofollow" attribute is often inserted automatically into links submitted by public users (such as through blog comments) in order to discourage spam. Can also be used in regular content when discussing a site that the author does not approve of but wishes to critique.
  • noreferrer — Typically, when a user follows a link from one page to another page, the browser sends the server of the new page the URL of the current page as part of its request. The rel="noreferrer" attribute specifically requests that the browser not do this. Not all browsers support this request, however.
  • prefetch — Requests that the linked document should be prefetched so that it displays instantly when the link is followed. That is, it has the same functionality as rel="next", but without the semantic meaning indicating a series of documents.
  • prev — Indicates that the linked document is the previous document in a series. Can be used for multi-page articles, for the previous article on a blog, or for the previous set of items in a paginated index or feed. As with rel="next", some browsers will use this information to improve the user experience by pre-fetching the linked document so that it displays immediately when clicked, although this will usually only be done after any rel="next" links are prefetched.
  • search — The linked document is a search tool for the current document or larger site.
  • tag — The linked document provides context for the current document’s content. For example, if you wrote an article about the musician Irving Berlin, it might be a good idea to link to the Wikipedia page on Irving Berlin, so that search engines don’t misunderstand you and think you are talking about Berlin, Germany.

Of these, rel="author" and rel="license" have the most direct impact on search results, while next, prev, and prefetch have the most impact on user experience.
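
Several of these values can be sketched together in one small page. The URLs, names, and feed path below are hypothetical. Note that rel accepts more than one value, separated by spaces, and that some values (such as alternate, prev, and next) can be declared on <link> elements in the <head> as well as on visible anchors.

```html
<!DOCTYPE html>
<html lang="en">
 <head>
  <meta charset="UTF-8">
  <title>rel Attribute Sketch</title>
  <!-- Same content in another format: an RSS feed for this blog -->
  <link rel="alternate" type="application/rss+xml" title="Blog feed" href="/feed.xml">
  <!-- Series navigation declared in the head -->
  <link rel="prev" href="/guides/textual-html">
  <link rel="next" href="/guides/forms">
 </head>
 <body>
  <p>By <a href="/authors/adam-wood" rel="author">Adam Wood</a></p>
  <p>Need a hand? See the <a href="/help" rel="help">help page</a>.</p>
  <!-- User-submitted link: no endorsement, and no referrer sent -->
  <p><a href="http://commenterwebsite.example.com" rel="nofollow noreferrer">John Q. Public</a></p>
  <p>Tagged: <a href="/tag/html" rel="tag">HTML</a></p>
 </body>
</html>
```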

In addition to these, there are a few others which are not “officially supported” by the HTML5 specifications, but which seem to be used by some search engines, browser tools, and other technologies.

  • up — The linked document is the parent of the current document. Can be used to link to category archives to which the current article belongs (in a blog), to the home page, or to the next higher document in a hierarchical structure. The attribute pairs well with breadcrumb navigation, which is generally recognized as having a positive impact on SEO. It seems to have some influence over how Google constructs their index of your site.
  • archives — The linked document is of historical interest to the current document. This could be used for date-based archives on a blog, or for previous versions of a document.
  • canonical — The linked document is the canonical version of the current document. If there are multiple versions of the same content, they should all link to a single “control” or “canonical” instance of that content, in order to avoid search engines penalizing the site for duplicate content. The rel="canonical" attribute is commonly used in a <link> element in the <head>. Often employed in order to collapse www and non-www prefixed requests into a single resource, because these are technically different URLs, but almost always refer to the same document or resource.
  • category — The linked document is the index page for the category to which the current document belongs. This was implemented by WordPress. It is unclear if this provides a different level of benefit than rel="up", since neither is supported by the HTML5 specification.
  • discussion — The linked document contains a discussion of the current document. Useful for things like Talk pages in a wiki.
  • edit — The linked resource is the edit page for the current document. Used by Wikipedia. Probably provides no SEO benefit, but may improve accessibility.
  • home — The linked document is the home page for the current site.
  • index — The linked document provides a linked index (a list of pointers) to the current page. This was a part of the HTML 4.01 specification, but was removed in HTML5.
  • prerender — Sends a request to the browser to prefetch, and render, the linked document — including running any scripts. This rel value was implemented by Google for Chrome, and some other browsers have adopted it.
  • profile — The linked document is a metadata profile for the current document. Used in combination with certain semantic markup vocabularies to specify which vocabulary is being used.
  • publisher — The linked document is the home page or profile of the publisher of the current document. Used by Google to display rich snippets in search results and verify ownership of sites.
  • sitemap — The linked document is a sitemap for the current site.
  • syndication — The linked document is a syndicated copy of the content of the current document. It is unclear if this is used by anyone, as the most commonly recognized way of linking to a syndication feed is with the value rel="alternate", which is supported by the HTML5 specification.
  • shortlink — The URL indicated in href is a shortened URL for the current page.

Many years ago, the XFN (XHTML Friends Network) began championing the use of rel values to indicate the social relationships between authors of documents and other web publishers. Sadly, there seem to be very few consuming applications for this data. The idea, though, is brilliant and deserves a mention here. The list of XFN-related rel values is:

  • acquaintance
  • friend
  • met
  • co-worker
  • colleague
  • co-resident
  • neighbor
  • child
  • parent
  • sibling
  • spouse
  • muse
  • crush
  • date
  • sweetheart

Some of these rel values are used by particular technologies (apps, web crawlers, screen readers, social networks, search engines) and some are not. None of them can do any harm to a link that you may already have been planning to include in a document, and they may very well help.
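
As a hedged illustration of a few of these values in context, the sketch below uses rel="canonical" and rel="shortlink" in the <head>, rel="home" and rel="up" in breadcrumb-style navigation, and a combination of XFN values on a personal link. All URLs and the name are invented for the example, and how any given tool treats the unofficial values is not guaranteed.

```html
<!DOCTYPE html>
<html lang="en">
 <head>
  <meta charset="UTF-8">
  <title>Unofficial rel Values Sketch</title>
  <!-- Point every variant of this page at one preferred URL -->
  <link rel="canonical" href="http://www.example.com/html-guides/semantic-markup">
  <!-- A shortened URL for the same page -->
  <link rel="shortlink" href="http://example.com/?p=123">
 </head>
 <body>
  <nav>
   <a href="/" rel="home">Home</a>
   <a href="/html-guides" rel="up">HTML Guides</a>
  </nav>
  <!-- XFN values describe the author's relationship to the person linked -->
  <p>Thanks to <a href="http://friend.example.com" rel="friend met colleague">Jane Doe</a> for reviewing this guide.</p>
 </body>
</html>
```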

One of the interesting things about the rel attribute is that it has a history of being used informally and experimentally, outside of the current HTML specification. The idea has always been to make the web more useful by providing more information, just in case someone, someday, might be able to make use of that information.

For this reason, it seems that even rel values with little support should be used (whenever doing so is not an inconvenience) in that spirit. It is impossible to know what the future may hold, and it might just turn out that one or another usage of rel attributes makes your own site more valuable and helpful to others.

Search and Social Enhancement

This chapter introduces the idea of embedded data — adding structured, computer-readable data into websites. We look at several formats, focusing on the techniques that affect how your site appears in social links and search engine results pages.

Structured Data

Structured data is data which is organized in a standardized way so that a computer program can read it. Structured data is a major component of the semantic web. It is one thing to use markup to identify which section of a page is most important, but it is another thing entirely to embed facts, data, and knowledge into your site in a way that can be easily extracted by applications.

Structured data requires special formatting conventions. A number of specific formats for providing information directly to search and social media applications have developed which greatly enhance the way a web page is displayed when it appears in a feed or in search results.

These additions to your site’s markup take a little extra work sometimes, but they are extremely high-impact and beneficial.

Facebook Open Graph

When someone links to a web page on Facebook, the system automatically displays the link with a title, description, and an image — sometimes the user who creates the link can select the image from several options, and they always have the ability to edit the text (although they rarely do).

Facebook attempts to pull this information from the content of the linked page, but you may have noticed that sometimes it doesn’t pull relevant details:

  • A link title may only have the site title, not the headline for the linked article.
  • The description text might be something irrelevant, like the site’s privacy policy or mission statement.
  • The images might be unconnected to the content being linked to. Sometimes images from ads or from authors of other content on the site get displayed. Other times, no images are provided at all.

These problems can be solved, and you can provide Facebook (and its 1.39 billion users) with precisely the title, description, and image you want them to see. All you have to do is provide Facebook with the information on your own page.

Facebook uses a protocol for providing this information called Open Graph. The OG specification is Open Source and community-driven, and is used by other systems as well as Facebook. It is one of the most important additions you can make to your website, and it’s a fairly simple system:

  • Specify that you are using the Open Graph protocol by declaring the og: prefix in a prefix attribute on the <html> tag, like this: <html prefix="og: http://ogp.me/ns#">
  • Include the data you would like them to receive in the <head> of your documents, in <meta> tags.
  • Each <meta> tag takes the same key-value pair form.
    • A property attribute specifies the type of information being provided.
      • All the property values are namespaced with the prefix og:. For example: og:image
      • Subproperties (such as the height of an image) are specified with colon-delimited suffixes. For example: og:image:height
      • Subproperties should be specified in a <meta> tag immediately following the item they refer to.
      • Multiple <meta> tags can use the same property, indicating that the property has an array of values. This can be used (for instance) to provide several image options.
    • A content attribute provides the information.
      • In the case of og:title and og:description, among others, this would be text.
      • For og:image and other media-related objects, the content is the URL.
      • For some properties, like og:image:height, the content is a number.

A basic implementation would look like this:


<html prefix="og: http://ogp.me/ns#">
 <head>
  <title>Semantic Markup for Fun and Profit</title>
  <meta property="og:title" content="Semantic Markup on Facebook" />
  <meta property="og:description" content="Find out everything you need to know about how to make sure Facebook sees your stuff the right way.">
  <meta property="og:type" content="article" />
  <meta property="og:url" content="http://example.com/semantic-markup" />
  <meta property="og:image" content="http://example.com/image.jpg" />
  <meta property="og:site_name" content="WhoIsHostingThis.com">
 </head>
 ...
</html>

Notice how the og:title and the page <title> are different, and the og:description specifically mentions Facebook.

You can, of course, use the same information in og:title as in your page <title>, and you can duplicate your <meta name="description"> content into the og:description. However, using Open Graph allows you to create a title, description, image, and other content that specifically targets Facebook. (And other applications that use Open Graph, but mostly just Facebook.)
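If your pages are generated by a script, the key-value form described above is easy to produce programmatically. The following is a minimal Python sketch, not part of the Open Graph specification: the helper name og_meta_tags is hypothetical, and the og:image:width value is made up for illustration. It takes a list of (property, content) pairs, so the same property can appear more than once, and escapes each value so it is safe inside a double-quoted attribute.

```python
from html import escape


def og_meta_tags(pairs):
    """Render (property, content) pairs as Open Graph <meta> tags.

    `pairs` is a list rather than a dict so that a property such as
    og:image can appear more than once and the order is preserved.
    """
    lines = []
    for prop, content in pairs:
        # escape() also converts double quotes, so the value cannot
        # break out of the double-quoted attribute.
        lines.append(
            '<meta property="{}" content="{}">'.format(
                escape(prop), escape(str(content))
            )
        )
    return "\n".join(lines)


tags = og_meta_tags([
    ("og:title", "Semantic Markup on Facebook"),
    ("og:type", "article"),
    ("og:url", "http://example.com/semantic-markup"),
    ("og:image", "http://example.com/image.jpg"),
    ("og:image:width", 1200),  # hypothetical width, for illustration only
])
print(tags)
```

The printed lines can be pasted into the <head> of the page. Keeping a subproperty such as og:image:width directly after its og:image entry in the list preserves the ordering rule described above.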

The complete list of properties and subproperties is:

  • og:title — The title of your document or content object. Accepted value is arbitrary text. This property should always be specified
  • og:type — The type of your content. If not specified, the default is “website.” Each type has a number of additional properties which are only used when the document is of that type. Accepted values are:
    • article — A (relatively) short, standalone piece of content, like a blog post, op-ed, or magazine article.
    • Properties which can only belong to an article:
      • article:published_time - datetime - When the article was first published.
      • article:modified_time - datetime - When the article was last changed.
      • article:expiration_time - datetime - The time after which the article is out of date.
      • article:author - profile array - Writers of the article.
      • article:section - string - A high-level section name. E.g. Technology
      • article:tag - string array - Tag words associated with this article.
    • book - A book. The document might be an online version of a book. Alternatively, the page might be a reference to a book, such as an entry in a library catalog system.
    • Properties which can only belong to a book:
      • book:author - profile array - Who wrote this book.
      • book:isbn - string - The ISBN
      • book:release_date - datetime - The date the book was released.
      • book:tag - string array - Tag words associated with this book.
    • profile — The current document or content object is a profile of a real person. Any type of biographical or person-centered page can be designated as a profile. However, Facebook profile pages are Open Graph profile resources. This means that the properties belonging to other types which must link to a profile as their content can link to Facebook profile pages.
    • Properties which can only belong to a profile:
      • profile:first_name - string - A name normally given to an individual by a parent or self-chosen.
      • profile:last_name - string - A name inherited from a family or marriage and by which the individual is commonly known.
      • profile:username - string - A short unique string to identify them.
      • profile:gender - enum(male, female) - Their gender.
    • video.movie — The content of the document or object either is a movie, or refers to a movie.
    • Properties which belong to a video.movie:
      • video:actor - Actors in the movie. The value for video:actor should be the URL of another document or resource that is a profile referring to the actor. Multiple video:actor elements can be included, each one followed by one or more video:actor:role elements defining the role(s) played by the actor.
      • video:actor:role - string - The role played by the actor.
      • video:director - The value should be the URL of the director’s profile resource. There may be multiple directors.
      • video:writer - The value should be the URL of the writer’s profile. There may be multiple writers.
      • video:duration - The value should be a positive integer (1 or greater). The value is the total length, in seconds.
      • video:release_date - datetime - The date the movie was released.
      • video:tag - Tag words associated with this movie.
    • video.episode — The document is an episode of a TV show.
    • Properties belonging to a video.episode (see above for descriptions)
      • video:actor
      • video:actor:role
      • video:director
      • video:writer
      • video:duration
      • video:release_date
      • video:tag
      • video:series - The value should be the URL of an object of type video.tv_show, to which this episode belongs.
    • video.tv_show — The document either contains, is, or refers to a multi-episode show.
    • The available properties for video.tv_show are the same as those for video.movie.
    • video.other — The document either contains, is, or refers to a video.
    • The available properties for video.other are the same as those for video.movie.
    • music.song
    • Properties that belong to music.song are:
      • music:duration - The song’s length in seconds.
      • music:album - The value should be the URL of another resource of type music.album. Multiple music:album properties may be listed.
        • music:album:disc - Which disc of the album this song is on. The value should be a positive integer.
        • music:album:track - Which track this song is.
      • music:musician - The value should be the URL of a profile of the musician who created the song. Multiple music:musician elements may be listed.
    • music.album
    • Properties that belong to music.album are:
      • music:song - The URL to a music.song on this album. Multiple may be listed.
        • music:song:disc - Which disc this song is on in the current album.
        • music:song:track - The track number of this song in the current album.
      • music:musician - Same as above.
      • music:release_date - datetime - The date the album was released.
    • music.playlist
    • Properties that belong to music.playlist are:
      • music:song
        • music:song:disc
        • music:song:track
      • music:creator - URL to the profile of the person who created the playlist.
    • music.radio_station
    • Properties that belong to music.radio_station are:
      • music:creator - URL to the profile of the person who created the radio station.
  • og:image - An image URL which should represent your page or content. Multiple can be specified.
    • og:image:secure_url - An alternate url to use if the webpage requires HTTPS.
    • og:image:type - A MIME type for this image.
    • og:image:width - The number of pixels wide.
    • og:image:height - The number of pixels high.
  • og:url - The canonical URL of your object that will be used as its permanent ID.
  • og:audio - A URL to an audio file to accompany this object.
    • og:audio:secure_url - An alternate url to use if the webpage requires HTTPS.
    • og:audio:type - A MIME type for this audio file.
  • og:description - A one to two sentence description of your object.
  • og:determiner - The word that appears before this object’s title in a sentence. An enum of (a, an, the, “”, auto). If auto is chosen, the consumer of your data should choose between “a” or “an”. Default is “” (blank).
  • og:locale - The locale these tags are marked up in. Of the format language_TERRITORY. Default is en_US.
    • og:locale:alternate - An array of other locales this page is available in.
  • og:site_name - If your object is part of a larger web site, the name which should be displayed for the overall site.
  • og:video - A URL to a video file that complements this object.
    • og:video:secure_url - An alternate url to use if the webpage requires HTTPS.
    • og:video:type - A MIME type for this video.
    • og:video:width - The number of pixels wide.
    • og:video:height - The number of pixels high.

Notice that og:video is used to specify that a linked resource (identified in the content attribute) complements the current object or document (the same is true for og:audio and og:image). This is different from the case where the current page is itself about the video, which is indicated by setting og:type to one of the video.* values.
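To see how a consumer might read these tags back, here is a rough Python sketch built on the standard library's html.parser. It is only an illustration of the ordering rule (a subproperty belongs to the item immediately before it, and a repeated property forms an array), not how Facebook actually parses pages. The class name OpenGraphParser and the sample URLs are hypothetical.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect og:* <meta> tags, attaching subproperties to their parent."""

    def __init__(self):
        super().__init__()
        self.items = []  # one dict per top-level property, in document order

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        prop = attrs.get("property") or ""
        content = attrs.get("content")
        if not prop.startswith("og:") or content is None:
            return
        parts = prop.split(":")
        if len(parts) == 2:
            # A top-level property such as og:image starts a new item.
            self.items.append({"property": prop, "content": content, "sub": {}})
        elif self.items and prop.startswith(self.items[-1]["property"] + ":"):
            # A subproperty such as og:image:width belongs to the item
            # immediately before it.
            self.items[-1]["sub"][parts[-1]] = content


# Hypothetical sample markup: two images, only the first has dimensions.
SAMPLE = """
<meta property="og:title" content="Two images">
<meta property="og:image" content="http://example.com/a.jpg" />
<meta property="og:image:width" content="400" />
<meta property="og:image:height" content="300" />
<meta property="og:image" content="http://example.com/b.jpg" />
"""

parser = OpenGraphParser()
parser.feed(SAMPLE)
images = [item for item in parser.items if item["property"] == "og:image"]
for image in images:
    print(image["content"], image["sub"])
```

Because the width and height tags come directly after the first og:image, they are attached to that image only. The second image ends up with no subproperties.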

Here are a few examples of Open Graph markup, along with standard metadata, to show some of the variability and options.


<!-- A blog post -->

<html prefix="og: http://ogp.me/ns#">
 <head>
  <title>15 Minute Workout Routine</title>
  <meta name="description" content="Follow this simple routine to maximize workout results and minimize time.">
  <meta property="article:author" content="http://facebook.com/calvin-luther">
  <meta property="og:title" content="Fitness Expert Calvin Luther's Secret Workout Plan" />
  <meta property="og:description" content="You won't believe how easy getting in shape can be.">
  <meta property="og:type" content="article" />
  <meta property="og:url" content="http://fitness.example.com/15-minute-routine" />
  <meta property="og:image" content="http://fitness.example.com/eye-catching-ridiculous-picture.jpg" />
  <meta property="og:site_name" content="http://fitness.example.com">
  <meta property="og:video" content="http://fitness.example.com/videos/15-minutes">
 </head>
 ...
</html>


<!-- A Video -->

<html prefix="og: http://ogp.me/ns#">
 <head>
  <title>15 Minute Workout Routine</title>
  <meta name="description" content="Watch this video and complete the best 15 minute workout.">
  <meta property="og:title" content="Fitness Expert Calvin Luther Exposes Secret Workout Tricks ON VIDEO" />
  <meta property="og:description" content="You won't believe how easy getting in shape can be.">
  <meta property="og:type" content="video.episode" />
  <meta property="video:director" content="http://facebook.com/calvin-luther">
  <meta property="og:url" content="http://fitness.example.com/videos/15-minute-routine" />
  <meta property="og:image" content="http://fitness.example.com/eye-catching-ridiculous-picture.jpg" />
  <meta property="og:site_name" content="http://fitness.example.com">
 </head>
 ...
</html>

Open Graph also has an API for developers to connect games and web apps into Facebook, and an ability to add new og:type values. For more information on Open Graph see their documentation.

Google’s Knowledge Graph

Google isn’t just trying to index the contents of every web page on the planet. They are trying to harvest and present the combined knowledge of all humanity. Quite a task, but they’re well on their way.

The result of this work is the Google Knowledge Graph, which is a body of knowledge that gets displayed along with search results. When you search Google for public figures and organizations, Google will display key information such as a profile image, a logo, contact information, hours of business, and links to social profiles like Google+, Twitter, Facebook, and YouTube.

You can help with this project, and help yourself as well, by providing data about yourself and your organization in a structured way so that Google can automatically index it and provide it to users searching for it. This allows organizations and individuals to customize how their public data is displayed in Google results.

There are several supported semantic vocabularies, but the recommended way to provide the data is using the structure published at schema.org.

Schema.org is a project initiated and supported by several of the world’s largest search engines, including Google, Bing, Yahoo, and Yandex. It provides a systematic way of adding semantic data into web resources, primarily as a means of making them easier to search. Using schema.org’s vocabularies, it is easy to (for example) make clear in the markup of an article that you are talking about a PERSON named “Irving Berlin,” and not a European CITY named “Berlin.”

This data can be added directly into the markup of a site (we’ll cover that later), but the easiest way to add the kind of profile information that will show up in the Knowledge Graph results for a person or organization is to use JSON.

JSON stands for “JavaScript Object Notation,” and it is simply a structured way of providing data in a clear manner. Because it is a subset of JavaScript, it can be added to any HTML document within a <script> tag.

This is easy. For example, here is Google’s own example code for how to indicate an image that should be displayed as your organization’s logo:

    <script type="application/ld+json">
    { 
      "@context": "http://schema.org",
      "@type": "Organization",
      "url": "http://www.example.com",
      "logo": "http://www.example.com/images/logo.png"
    }
    </script>

According to Google, adding markup to a page “like this is a strong signal to our algorithms to show this image in Knowledge Graph displays.”

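Let’s look at what is going on in this example. (The // comments are there only to explain each line. JSON itself does not allow comments, so leave them out of the code you put on your own pages.)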

    <script type="application/ld+json">
     // The <script> tag indicates a portion of the document that is JavaScript, and not HTML.

     // The type attribute of "application/ld+json" indicates that the contents of the script element are following the conventions for "Linked Data" presented in Javascript Object Notation


    { 
        // The open curly bracket is the beginning of a single object in JSON. In this case, the object is not the logo, but rather the organization. The logo is going to be an attribute of the object.

        // JSON objects are a series of key-value pairs.

      "@context": "http://schema.org",

        // @context declares what vocabulary is being used for the structured data.

        // A vocabulary is an agreed-upon set of meanings for words, standardized by some organization and published at a URL.

        // Vocabularies are important because they help everyone know how words are being used.

        // For example: In the U.S. legal system, a corporation is, technically, a "person." In that context, how do you know whether a reference to a "person" means a natural person or an incorporated person? The legal system does this by using phrases like "natural person." The semantic web does this by specifying a vocabulary.

        // Once a vocabulary is specified, any word used within that context means EXACTLY and ONLY what the specification says it means. By indicating a vocabulary, the author of a document is agreeing to only use words in the way the vocabulary intends them to be meant.

        // The vocabulary being specified here is the one at schema.org

      "@type": "Organization",

        // Now we have to specify what sort of thing this object is referring to.

        // By stating it is an organization within the context of a schema.org specification, the code is saying: This object is an Organization in the way that schema.org defines what an organization is.

        // The definition of Organization, according to schema.org, can be found at:
        // http://schema.org/Organization

        // We could be more specific and use one of the subtypes of Organization, such as Corporation or NGO. 

      "url": "http://www.example.com",

        // This is the main web resource that refers to or is identifiable with the Organization.

      "logo": "http://www.example.com/images/logo.png"

        // This specifies that the image indicated is the logo of the organization identified in the URL.
    }
    </script>

There is a lot of meaning built in to a fairly terse grammar, but it isn’t complicated. The most difficult thing about using JSON for this is that it is too easy to lose track of where the quotation marks and commas ought to go.

Look at the example again to see how a JSON object is organized.

    <script type="application/ld+json">
    {
      "@context": "http://schema.org",
      "@type": "Organization",
      "url": "http://www.example.com",
      "logo": "http://www.example.com/images/logo.png"
    }
    </script>

A single object is enclosed within a single set of curly brackets ( { and } ). Each property of the object is defined as a series of key-value pairs, with the name (key) of the property listed before the value of that property. The key and value are separated by a colon ( : ). Key-value pairs are separated by a comma ( , ).

All keys are strings (plain text), so they are placed inside quote-marks ( " ). When the value is a string as well, it is also placed inside quote marks.
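Since misplaced commas and quotation marks are the most common mistakes, it can help to run your JSON through a parser before publishing it. The sketch below uses Python's standard json module. The helper name check_json is hypothetical, and the two sample strings are shortened versions of the example above.

```python
import json


def check_json(text):
    """Return None if `text` is valid JSON, else a message locating the error."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # lineno and colno point at the place where parsing failed.
        return "line {}, column {}: {}".format(err.lineno, err.colno, err.msg)
    return None


good = '{"@type": "Organization", "url": "http://www.example.com"}'
# Same data, but the comma between the two pairs is missing.
bad = '{"@type": "Organization" "url": "http://www.example.com"}'

print(check_json(good))
print(check_json(bad))
```

A valid object prints None, while the broken one prints the line and column where the parser gave up. This only checks JSON syntax. It does not check that the keys are real schema.org properties.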

Values do not need to be strings. Sometimes the value may be a number, a date, an array of values, another object, or even an array of objects. For example, several of the properties of Organization are actually (natural) people, like founder. Since founder is a person, and a person can be specified by an object in JSON, the value side of the key-value pair would be an object enclosed in curly braces.

The following example covers several of these ideas.

<script type="application/ld+json">
  {
    "@context" : "http://schema.org",
    "@type" : "Organization",
    "founder" : {
      "@type" : "Person",
      "name" : "John Q. Founder",
      "birthDate" : "1980-03-22"
    },
    "employee" : [             
      { 
        "@type" : "Person",
        "name" : "Jane D. Coder"
      },
      { 
        "@type" : "Person",
        "name" : "Jason S. A. Designer"
      }
    ],
    "url": "http://www.example.com",
    "logo": "http://www.example.com/images/logo.png"
  }
</script>

<!-- Here is the same code again, with explanations of what is going on. -->




<script type="application/ld+json">
//The JSON object describing the organization is technically Javascript, not HTML, so it needs to be wrapped in a <script> tag.

  {     // The entire description of this organization forms a single JSON object. Objects in JSON are enclosed in curly brackets.
    "@context" : "http://schema.org",  // This specifies the vocabulary being used.
    "@type" : "Organization",   // This object is going to describe an Organization.
    "founder" : {               // The value of "founder" is an object.
      "@type" : "Person",       // The founder of this organization is a person. There is no need to repeat the @context, since this is nested inside the larger context.
      "name" : "John Q. Founder",
      "birthDate" : "1980-03-22"  // The last pair inside an object is not followed by a comma.
    },                          // The founder object is closed with the curly brace, and this pair is concluded with a comma.
    "employee" : [              // The value of employee is not one person, but many people. This is specified by making the value of employee an array. The array is notated with square brackets. Inside the array is a set of comma-separated objects, each enclosed in curly brackets.
      {                        // Each employee is a curly-bracketed object.
        "@type" : "Person",    // Each employee is an object of type "Person". 
        "name" : "Jane D. Coder"
      },                       // The curly-bracketed Person-objects are separated with commas.
      {
        "@type" : "Person",
        "name" : "Jason S. A. Designer"
      }
    ],                         // The end of the list of employees requires a closing square bracket. The comma separates the employee definition from the next property.
    "url": "http://www.example.com",
    "logo": "http://www.example.com/images/logo.png"
                               // URL and Logo are both properties of this Organization, as in the previous example.
  } // The object which describes the Organization is closed with a curly bracket.

</script>
<!-- The entire object was inside a <script> tag, which must be closed before any other HTML content appears on the page. -->

Using this format, any person or organization can easily provide as much information as desired in the form of publicly-accessible structured data. More is usually better, but there are a handful of things that Google in particular will use to enhance a Knowledge Graph display about you or your organization.
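If you would rather not type the brackets and commas by hand, one option is to describe the object in a programming language and let a JSON library write the text. Here is a minimal Python sketch under that assumption. The helper name json_ld_script is hypothetical, and the data is the founder-and-employees example from above.

```python
import json

organization = {
    "@context": "http://schema.org",
    "@type": "Organization",
    # A nested object: the founder is a Person.
    "founder": {
        "@type": "Person",
        "name": "John Q. Founder",
        "birthDate": "1980-03-22",
    },
    # An array of objects: each employee is a Person.
    "employee": [
        {"@type": "Person", "name": "Jane D. Coder"},
        {"@type": "Person", "name": "Jason S. A. Designer"},
    ],
    "url": "http://www.example.com",
    "logo": "http://www.example.com/images/logo.png",
}


def json_ld_script(data):
    """Wrap a dict in a JSON-LD <script> element."""
    body = json.dumps(data, indent=2)
    # Escape "</" so that a string value can never close the <script> tag early.
    # "\/" is a valid JSON escape for "/", so the data is unchanged.
    body = body.replace("</", "<\\/")
    return '<script type="application/ld+json">\n{}\n</script>'.format(body)


snippet = json_ld_script(organization)
print(snippet)
```

The library places every comma and quotation mark for you, so the output is always syntactically valid JSON. Whether the properties make sense to Google is still up to you.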

Most Important Organization Details for the Knowledge Graph

  • Logo (for Organizations)
  • Contact Information
  • List of Key People
  • Locations, with Hours of Operation
  • Links to Social Media Profiles

All of these items about your organization should be provided as structured data within the markup of your website. (Additionally, they should appear in the visible content of your website.)

Here is a brief example of a JSON object notating all of this data about a single organization.


 <script type="application/ld+json">
    {
      "@context": "http://schema.org",
      "@type": "Organization",
      "name" : "Example Company Name",
      "legalName" : "ExampleCorp, Ltd.",
      "url": "http://www.example.com",
      "logo": "http://www.example.com/images/logo.png",
      "contactPoint" : [
        { 
            "@type" : "ContactPoint",
            "telephone" : "+1-401-555-1212",
            "contactType" : "customer service"
        }
      ],
      "sameAs" : [ 
        "http://wikipedia.org/example-corp",
        "http://www.facebook.com/examplecorp-profile",
        "http://www.twitter.com/exampleCorp",
        "http://plus.google.com/example_corp"
      ],
      "location" : [
        {
            "@type" : "LocalBusiness",
            "address" : {
                "@type" : "PostalAddress",
                "streetAddress" : "123 Main St.",
                "addressLocality" : "Townville City",
                "addressRegion" : "CA",
                "postalCode" : "12345"
            },
            "geo" : {
                "@type" : "GeoCoordinates",
                "latitude" : "37.42242",
                "longitude" : "-122.0858"
            },
            "openingHours" : "Mo-Sa 08:00-20:00",
            "photo" : "http://example.com/location/photo.jpg"
        }
      ],
      "founder" : {
        "@type" : "Person",
        "name" : "Sheila D. Founder",
        "url" : "http://example.com/people/sheila-founder",
        "sameAs" : [
          "http://wikipedia.org/sheila-founder",
          "http://www.facebook.com/sheila-founder",
          "http://www.twitter.com/sheilaFounder",
          "http://plus.google.com/sheila_founder"
        ]
      },
      "employee" : [
        {
            "@type" : "Person",
            "name" : "Kyle C. Eo",
            "jobTitle" : "CEO",
            "url" : "http://example.com/people/kyle-eo",
            "sameAs" : [
              "http://www.facebook.com/kyle-eo",
              "http://www.twitter.com/kyleEo",
              "http://plus.google.com/kyle_eo"
            ]
        },
        {
            "@type" : "Person",
            "name" : "Moe Knee",
            "jobTitle" : "CFO",
            "url" : "http://example.com/people/moe-knee",
            "sameAs" : [
              "http://www.facebook.com/moe-knee",
              "http://www.twitter.com/moeKnee",
              "http://plus.google.com/moe_knee"
            ]
        }
      ]
    }
 </script>

Try to see if you can make sense of that before reading the annotated version below.


 <script type="application/ld+json">
    {
      "@context": "http://schema.org",
      "@type": "Organization",
      "name" : "Example Company Name",
      "legalName" : "ExampleCorp, Ltd.",
      "url": "http://www.example.com",
      "logo": "http://www.example.com/images/logo.png",
        // All this should be familiar so far.
        // The entire object is inside a <script> tag.
        // The vocabulary for this object is schema.org, and this particular object is an organization.
        // Name and Legal Name have been added here. The "name" value is the name your company commonly goes by in normal conversation. The "legalName" value is the official legal name.
        // The "url" is the organization's home page, and "logo" is a link to an image.
      "contactPoint" : [
        // A "contactPoint" is a single way of getting a hold of a company.
        // The square bracket indicates this is an array, which could have multiple contact points in it.
        // With multiple contact points, you can specify additional details such as language, area served, and type of contact.
        { 
            "@type" : "ContactPoint",
            "telephone" : "+1-401-555-1212",
            "contactType" : "customer service"
        }
          // Each contactPoint is its own curly-bracketed object. If there were multiples, they would be separated by commas.
      ],
      "sameAs" : [ 
        // SameAs is used to specify Social Media profiles and other public-info profiles such as a Wikipedia article. 
        // SameAs is a square-bracketed array of URLs.
        "http://wikipedia.org/example-corp",
        "http://www.facebook.com/examplecorp-profile",
        "http://www.twitter.com/exampleCorp",
        "http://plus.google.com/example_corp"
      ],
      "location" : [
        // Location is a square-bracketed array of location objects, each object defining a specific location. 
        // It doesn't have to be an array if there is only one location. It can be simply a single object.
        {
            "@type" : "LocalBusiness",
              // "LocalBusiness" specifically means "A particular physical business or branch of an organization" in the schema.org vocabulary. So this is appropriate, even if it is a single location in a national chain.
            "address" : {
              // Because an address comprises several individual pieces of data, it is its own object.
                "@type" : "PostalAddress",
                "streetAddress" : "123 Main St.",
                "addressLocality" : "Townville City",
                "addressRegion" : "CA",
                "postalCode" : "12345"
            },
            "geo" : {
              // You can specify the geographic coordinates of a location. This is especially helpful if it doesn't match up precisely with the Postal Address.
              // Geo Coordinates is a self-contained object.
                "@type" : "GeoCoordinates",
                "latitude" : "37.42242",
                "longitude" : "-122.0858"
            },
            "openingHours" : "Mo-Sa 08:00-20:00",
              // Opening Hours uses a special format for specifying time ranges. See details below.
            "photo" : "http://example.com/location/photo.jpg"
              // A link to a photo.
        }
      ],
      "founder" : {
        // A founder is a person (or an array of persons).
        "@type" : "Person",
        "name" : "Sheila D. Founder",
        "url" : "http://example.com/people/sheila-founder",
          // For people within a company, it is best to have "URL" point to a company-specific profile of the person.
          // Additional details about the person, like contact information and image, can be specified at their individual profile page.
          // Their personal home page or blog (if they have one) should appear under "sameAs".
        "sameAs" : [
          // As with the Organization.sameAs attribute, this is used to specify Social Media links and context-defining public profiles like Wikipedia.
          "http://wikipedia.org/sheila-founder",
          "http://www.facebook.com/sheila-founder",
          "http://www.twitter.com/sheilaFounder",
          "http://plus.google.com/sheila_founder",
          "http://sheila-founder-personal-blog.blogspotpress.com"
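        ] // Closing the sameAs array bracket.
      }, // Closing the founder object bracket. The comma separates it from the next property.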
      "employee" : [
        // Employee is an array of people, with each item in the array being a curly-bracketed Person object.
        // There is no need to include every single person in your organization here. The most important people you would like to appear in a Knowledge Graph display are enough. (Although there's no reason not to list many employees if you want to.)
        {
            "@type" : "Person",
            "name" : "Kyle C. Eo",
            "jobTitle" : "CEO",
            "url" : "http://example.com/people/kyle-eo",
            "sameAs" : [
              "http://www.facebook.com/kyle-eo",
              "http://www.twitter.com/kyleEo",
              "http://plus.google.com/kyle_eo"
            ]
        },
        {
            "@type" : "Person",
            "name" : "Moe Knee",
            "jobTitle" : "CFO",
            "url" : "http://example.com/people/moe-knee",
            "sameAs" : [
              "http://www.facebook.com/moe-knee",
              "http://www.twitter.com/moeKnee",
              "http://plus.google.com/moe_knee"
            ] // Closing the sameAs array bracket.
        } // Closing the Person object bracket.
      ] // Closing the Employee array bracket 
    } // Closing the Organization object bracket.
 </script> <!-- Closing the script tag -->

The openingHours format can be a little confusing. It works like this:

  • Days are specified with a two-letter code: Mo, Tu, We, Th, Fr, Sa, Su
  • Time is specified in a 24 hour format. For example, 10pm is 22:00.
  • A single hours listing specifies one or more days which have the same hours, followed by those hours:
    • "Mo-Fr 8:00-17:00"
    • "Mo,Th,Sa 9:00-14:00"
    • "Su-We,Fr 15:00-22:00"
  • To specify being open all day (24 hours), omit the hours:
    • Mo-Fr (Open 24 hrs during the week, but closed on weekends.)
    • Mo-Su (Open 24/7.)
  • If times are different on different days, the simplest solution is a square-bracketed array of hours: [ "Mo-Fr 8:00-22:00", "Sa 10:00-15:00"]
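One way to check that you have understood the format is to expand a listing by hand, or with a few lines of code. The Python sketch below is an informal reading of the rules above, not an official schema.org parser. The function names are hypothetical, and it assumes that a range such as Su-We wraps around the end of the week (Su, Mo, Tu, We).

```python
DAYS = ["Mo", "Tu", "We", "Th", "Fr", "Sa", "Su"]


def expand_days(spec):
    """Expand a day spec such as "Su-We,Fr" into a list of day codes."""
    result = []
    for part in spec.split(","):
        if "-" in part:
            first, last = part.split("-")
            i, end = DAYS.index(first), DAYS.index(last)
            # Walk forward through the week, wrapping from Su back to Mo.
            while True:
                result.append(DAYS[i])
                if i == end:
                    break
                i = (i + 1) % len(DAYS)
        else:
            DAYS.index(part)  # raises ValueError for an unknown day code
            result.append(part)
    return result


def parse_opening_hours(listing):
    """Parse one listing into (days, opens, closes).

    opens and closes are None when the hours are omitted (open all day).
    """
    fields = listing.split()
    days = expand_days(fields[0])
    if len(fields) == 1:
        return days, None, None
    opens, closes = fields[1].split("-")
    return days, opens, closes


for listing in ["Mo-Fr 8:00-17:00", "Su-We,Fr 15:00-22:00", "Mo-Su"]:
    print(listing, "->", parse_opening_hours(listing))
```

The sketch does not validate the time strings themselves. It only splits them at the hyphen, so a typo such as 25:00 would pass through unnoticed.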

A number of other things can be included in the markup about your organization, such as the year it was founded, its industry, products that it sells, and areas serviced. You can also use more specific types of Organization definitions, like NGO, Corporation, or University.

For complete details of what properties can be attributed to an Organization, see the Organization specification at schema.org.

Person Profile

You should also include data about yourself or important individuals within your organization.

The place to do this would be:

  • As a standalone object, in the markup of a page about that person: a profile page, an author archive page, a bio page.
  • On your own personal home page or blog.
  • As an object embedded within a larger object describing something the person is connected to: the organization they work for, a book they authored, an event they are organizing.

Typically, the best practice is to provide a complete profile of the person on their individual profile or home page (a page all about them), and then provide a link to that page when including them as a value in a larger object (as illustrated in the code sample above).
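As a rough illustration of that practice, the Python sketch below keeps one full profile and derives a short reference to embed in a larger object. The helper name person_reference is hypothetical, and the choice of which keys to keep (@type, name, and url) is an assumption made for this example, not a rule from schema.org or Google.

```python
import json

# The complete profile, as it would appear on the person's own page.
full_profile = {
    "@context": "http://schema.org",
    "@type": "Person",
    "name": "Sheila D. Founder",
    "url": "http://example.com/people/sheila-founder",
    "birthDate": "1980-08-22",
    "jobTitle": "Founder",
}


def person_reference(profile, keep=("@type", "name", "url")):
    """Return a short Person object that links to the full profile page."""
    return {key: profile[key] for key in keep if key in profile}


# The larger object only carries the short reference.
organization = {
    "@context": "http://schema.org",
    "@type": "Organization",
    "name": "Example Company Name",
    "founder": person_reference(full_profile),
}
print(json.dumps(organization, indent=2))
```

The embedded founder object carries just enough to identify the person and point to the page where the complete profile lives.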

There are a number of biographical details that can be included in an object describing a Person. Some of the more important ones include:

  • Name
  • Home Page
  • Social Media Profiles
  • Birthdate
  • Picture
  • Companies the person works for
  • Important family members

Here is an example JSON object describing a person.


<script type="application/ld+json">
{
    "@context" : "http://schema.org",
    "@type" : "Person",
    "name" : "Sheila D. Founder",
    "url" : "http://example.com/people/sheila-founder",
    "image" : "http://example.com/images/sheila-founder.jpg",
    "sameAs" : [
      "http://wikipedia.org/sheila-founder",
      "http://www.facebook.com/sheila-founder",
      "http://www.twitter.com/sheilaFounder",
      "http://plus.google.com/sheila_founder",
      "http://sheila-founder-personal-blog.blogspotpress.com"
    ],
    "alumniOf" : {
        "@type" : "EducationalOrganization",
        "name" : "Florida State University",
        "url" : "http://fsu.edu",
        "sameAs" : "http://en.wikipedia.org/wiki/Florida_State_University"
    },
    "birthDate" : "1980-08-22",
    "email" : "[email protected]",
    "jobTitle" : "Founder",
    "worksFor" : {
        "@type" : "Organization",
        "name" : "Example Company Name",
        "url" : "http://example.com",
        "sameAs" : [
          "http://wikipedia.org/example-corp",
          "http://www.facebook.com/examplecorp-profile",
          "http://www.twitter.com/exampleCorp",
          "http://plus.google.com/example_corp"
        ]
    },
    "spouse" : {
        "@type" : "Person",
        "name" : "Moe Knee",
        "url" : "http://example.com/people/moe-knee",
        "sameAs" : [
          "http://www.facebook.com/moe-knee",
          "http://www.twitter.com/moeKnee",
          "http://plus.google.com/moe_knee"
        ]
    },
    "description" : "Sheila D. Founder is a world-recognized entrepreneur, working in the field of Semantic Markup and Example Code Writing."
}
</script>

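Here’s the same example again with comments. Try to see if you understand the above before reading the explanation. (The comments are there only to explain each line; JSON does not allow comments, so leave them out of the markup you publish.)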

<script type="application/ld+json">
 // As always, place the entire object inside a <script> tag.
{
    "@context" : "http://schema.org",
      // Define the vocabulary you are using.
    "@type" : "Person",
      // This object defines a person.
    "name" : "Sheila D. Founder",
      // The person's full name.
      // You can optionally declare separate properties for "givenName", "additionalName", and "familyName".
    "url" : "http://example.com/people/sheila-founder",
      // The canonical URL for this person's profile page within this domain.
    "image" : "http://example.com/images/sheila-founder.jpg",
      // A link to an image of this person.
    "sameAs" : [
      // A list of links to Social Media profiles, as well as the Wikipedia article.
      "http://wikipedia.org/sheila-founder",
      "http://www.facebook.com/sheila-founder",
      "http://www.twitter.com/sheilaFounder",
      "http://plus.google.com/sheila_founder",
      "http://sheila-founder-personal-blog.blogspotpress.com"
    ],
    "alumniOf" : {
      // The value of alumniOf must be an Educational Organization (or a more specific type, like University). The Educational Organization is another object, so it is contained within curly brackets.
        "@type" : "EducationalOrganization",
        "name" : "Florida State University",
        "url" : "http://fsu.edu",
        "sameAs" : "http://en.wikipedia.org/wiki/Florida_State_University"
      // As with most linked data objects, the best practice is to include a minimum amount of information here, and then (if desired) include the rest of the information on a page strictly about that other thing.
    },
    "birthDate" : "1980-08-22",
      // birthDate is a date in the format YYYY-MM-DD.
      // birthPlace, deathDate, and deathPlace can all be added in as well.
      // birthPlace and deathPlace both take a Place object as their value.
    "email" : "[email protected]",
    "jobTitle" : "Founder",
    "worksFor" : {
      // The worksFor attribute also has an object for its value, in this case an Organization.
      // Again, only enough information is provided to make it clear what organization is being referenced; all other data is reserved for a page just about that organization.
        "@type" : "Organization",
        "name" : "Example Company Name",
        "url" : "http://example.com",
        "sameAs" : [
          "http://wikipedia.org/example-corp",
          "http://www.facebook.com/examplecorp-profile",
          "http://www.twitter.com/exampleCorp",
          "http://plus.google.com/example_corp"
        ]
    },
    "spouse" : {
      // The value of spouse is a person.
      // Many other relationships to people can be defined, such as parent, child, sibling, coworker, colleague, and follower.
        "@type" : "Person",
        "name" : "Moe Knee",
        "url" : "http://example.com/people/moe-knee",
        "sameAs" : [
          "http://www.facebook.com/moe-knee",
          "http://www.twitter.com/moeKnee",
          "http://plus.google.com/moe_knee"
        ]
    },
    "description" : "Sheila D. Founder is a world-recognized entrepreneur, working in the field of Semantic Markup and Example Code Writing."
}
</script>

There are a number of additional attributes of a person that you can include in the semantic markup of a site, but those are the main ones that might show up in the Google Knowledge Graph. For a complete list of the properties that can be included, see the complete Person reference at schema.org.

How to get this to show up on Google

Here’s the thing about all this special data: Google doesn’t guarantee that they will show any of it.

How could they?

If simply publishing data were enough to get it to display on the front page of Google, then there would be A LOT of spam and misinformation available. Google is trying to provide the best possible experience to its users, not create a platform for spammers and awful marketing.

There are two things you need to demonstrate before Google will accept your data into the Knowledge Graph:

  • trust
  • notability

This comes down to classic SEO issues: you need high-quality links that demonstrate your site is trustworthy and notable. You need good on-page content.

Additionally, there are some key things to pay attention to:

  • Make sure your structured data is correctly formatted and valid. Use Google’s own structured data testing tool.
  • All semantically marked-up data inside the JSON object should also appear in the human-readable text of the page.
  • Use multiple Social Media networks, especially Google+, and make sure that your information is the same in all of them and that it matches what you have put into your markup and in your human readable content.
  • Your company name and your domain name should be reasonably similar.
  • Neither your company name nor your domain name should be misleading.
  • Having a Wikipedia article, and linking to it with the sameAs property, can help illustrate notability. Unfortunately, Wikipedia has its own notability guidelines, so you can’t use that as a way to fake being notable. You actually have to do something worth having a Wikipedia page about.
  • It is not necessary to repeat your semantic markup on every page of your site. The details of your Organization should be on your home page, the details of each person should be on their profile or author archive page, and other types of content should be marked up to highlight the content as described below.

Google’s Rich Snippets and Enhanced Search Results

In addition to the Knowledge Graph, which displays information about specific people, places, and organizations, you can also use structured data to enhance the way your site appears in Search Results when it appears naturally based on search queries.

(It is probably the case that doing these things will additionally make it more likely that your site will appear in search.)

Some of these enhancements are site-wide things that will affect how your site is displayed whenever any page appears in a search result. Others are specific to particular kinds of content.

Enhanced search result display possibilities include:

  • Displaying navigation breadcrumbs
  • Displaying site name
  • Enabling audio and video content to play from the results screen
  • Rich snippets for type-specific content:
    • Products
    • Recipes
    • Reviews
    • Events
    • Apps
    • Videos
    • News Articles
  • Event Promotion
  • Site-specific search box

This gets very complicated because everyone’s content is different. How you mark up your page is specific to what type of content you have. Additionally, the details for how to implement this are constantly being refined.

For complete details on how to add these enhancements to your site, see the Google documentation on structured data for developers.

Conclusion

Facebook and Google each have their own needs and opportunities when it comes to embedding semantic data. It can add a lot of extra work and complication to web page creation, but it can have a huge impact on how your site appears in search results and links from social networks.

The important thing to keep in mind here is that Google and Facebook, and other search engines and social networks, want to present your content in the most positive light, consistent with your message. They provide the tools and information you need to allow them to do that.

Other Formats and Conclusions

This chapter wraps up the topic of semantic markup. It looks at alternative markup formats for including semantic data and provides some practical strategies and tips for implementing semantic markup on your website.

Other Formats for Data Markup

In addition to including all the semantic data in JSON objects, it is possible to embed the semantic meanings directly into the content markup that powers the human-readable portion of your site.

There are two “competing” formats for including the data this way:

  • Microdata
  • RDFa

They are remarkably similar, and involve adding semantic names as attributes of elements, with the usual assumption that the value of the name-value pair is the content of that element.

Here are examples of Microdata and RDFa:


<!-- Microdata -->

<div itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Sheila D. Founder</span>
  <img src="sheila-founder.jpg" itemprop="image" alt="Photo of Sheila Founder"/>
  <span itemprop="jobTitle">Founder</span>
  <span itemprop="telephone">(321) 555-4567</span>
  <a href="mailto:[email protected]" itemprop="email">
    [email protected]</a>
  <a href="http://example.com/people/sheila-founder" itemprop="url">Sheila's bio</a>
  Find Sheila on the Web at:
  <a href="http://facebook.com/sheila-founder" itemprop="sameAs">
    Facebook</a>
  <a href="http://twitter.com/sheila" itemprop="sameAs">
    Twitter</a>
</div>


<!-- RDFa -->

<div vocab="http://schema.org/" typeof="Person">
  <span property="name">Sheila D. Founder</span>
  <img src="sheila-founder.jpg" property="image" alt="Photo of Sheila D. Founder"/>
  <span property="jobTitle">Founder</span>
  <span property="telephone">(321) 555-4567</span>
  <a href="mailto:[email protected]" property="email">
    [email protected]</a>
  <a href="http://example.com/people/sheila-founder" property="url">Sheila's bio</a>
  Find Sheila on the Web at:
  <a href="http://facebook.com/sheila-founder" property="sameAs">
    Facebook</a>
  <a href="http://twitter.com/sheila" property="sameAs">
    Twitter</a>
</div>

As you can see, the difference between the two formats is a minor issue of attribute names and a few other very tiny differences.

Unfortunately, this method of doing things leads to markup that is very hard to read. It takes longer to create, is harder to edit, and ultimately uses more bandwidth. It also can’t be immediately consumed by JavaScript applications, whereas JSON can be, because JSON is written in JavaScript’s own object syntax.

Additionally, remembering what attribute names to use and keeping track of how sub-items are nested in the HTML structure makes this type of markup a real chore.

You may run across these formats, so it is good to be familiar with them. But if you are building a new site from scratch, go with JSON-LD.

Another advantage of the JSON method is that it can be added to an existing site without having to rewrite any existing display code.

How to Do All This Stuff

Adding semantic content to a website seems like a lot of trouble, and it certainly does create more work in launching and maintaining a site. However, the benefits — both to oneself through increased traffic and to the world through a better semantic web — are worth the effort.

Also, it doesn’t have to be a huge chore to take care of these things.

If you are using a Content Management System, it doesn’t take much to make sure that structured data is included in the edit panel for appropriate content types and output properly in the template files. Most of the major CMSes (WordPress, Drupal, Joomla) and most of the major Ecommerce systems either include structured markup by default or have available plugins for handling it easily.

Google, Facebook, and other consumers of semantic data have code validators and tutorials to help you make sure that you are providing the data properly.

All you have to do is use them.

Other things to keep in mind

  • Adding a lot of semantic data to your pages is an additional SEO tool, not a replacement for traditional SEO practices like Link Building and creating unique content. Without links and good content, there is no way for Google to know whether it should trust your semantic data or not.

  • All semantically marked up data should also be visible to human viewers of the website.

  • Semantic data about a person or an organization does not need to appear on every page of your site. A completely fleshed-out JSON description of any particular thing should appear once, on the page that is most specifically about that one thing.

  • <script> elements that contain semantic data should embed the data directly into the HTML document in the contents of the script tag. Do not place this information into a separate JSON file.
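As a minimal sketch of the last few points, the page below repeats the marked-up facts in its visible text and embeds the JSON object directly in a <script> element. The page title and the decision to place the script in the <head> are assumptions made for illustration; the name and URL are reused from the Person example earlier in this guide.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Sheila D. Founder</title>

  <!-- The data lives in the page itself, not in a separate .json file. -->
  <script type="application/ld+json">
  {
    "@context" : "http://schema.org",
    "@type" : "Person",
    "name" : "Sheila D. Founder",
    "jobTitle" : "Founder",
    "url" : "http://example.com/people/sheila-founder"
  }
  </script>
</head>
<body>
  <!-- The same facts also appear in the human-readable content. -->
  <h1>Sheila D. Founder</h1>
  <p>Founder of Example Company Name.</p>
</body>
</html>
```

Because the script block is self-contained, it can be dropped into an existing template without touching the markup that displays the content.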

Summary

The web is moving past the age of flat documents intended to be read by humans. The coming wave of innovation in search, AI, accessibility, and knowledge gathering involves semantics — the understanding of meaning.

This work in moving toward a semantic web — a web of meaning and not just content — is happening from two directions. The developers of search engines and artificial intelligence are trying to make their programs understand the unstructured mess of normal human communication, while the web authoring community is trying to meet them part-way by making normal human communication less messy.

As a website owner or developer, you can assist in this global movement, and position your own content as strongly as possible, by making sure that your site is well-structured and semantically meaningful.

Broadly speaking, the work of making a website more semantically meaningful and accessible involves three layers of activity:

  • Proper Semantic Structuring
  • Text-level semantics
  • Embedded data

Taken together, these three strategies will make a website easier to understand, both by search engine bots and by accessibility tools such as screen readers for the blind. An additional benefit is that a focus on semantics helps web authors focus on the meaning and purpose of their content, which generally leads to better content and better design.

HTML and CSS — A Brief Introduction

This chapter introduces the web’s styling language — CSS — and explains how it interacts with HTML. Though this is not a complete guide to the subject, it provides a fairly in-depth conceptual look at site design. Topics covered include structural CSS, the “box model,” text styling, typography, animations, responsive design, and the use of front-end frameworks.

What is CSS?

CSS stands for Cascading Style Sheets. It is a language used to define how an HTML document should be displayed on a page.

It is called a “style sheet,” in reference to the idea that a document should contain all the content, and only the content, and that a separate document or sheet should contain information about styles.

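It is called “cascading” because style rules from several sources (the browser’s defaults, linked stylesheets, <style> elements) are combined in a defined order of priority, and because styles related to text display are passed down, or “cascade,” from parent to child elements. For example, if the CSS for a paragraph ( <p> ) sets the text color to blue, then a span of bold ( <strong> ) or italic ( <em> ) text inside that paragraph would also be blue, unless a new style declaration changes it.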
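The paragraph example above can be sketched as follows. This is only an illustration: the class name warning is hypothetical and is there just to show a more specific declaration changing the color.

```html
<style>
  /* Only the paragraph itself is given a color. */
  p {
    color: blue;
  }

  /* A more specific declaration changes the color for one element. */
  p em.warning {
    color: red;
  }
</style>

<p>
  This text is blue. <strong>This bold text is also blue</strong>, and so is
  <em>this italic text</em>, but <em class="warning">this italic text is red</em>.
</p>
```

No color was ever declared for <strong> or for the plain <em>; they take the blue from the paragraph that contains them.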

How CSS works — Basic overview

How to Include CSS in a document

CSS can be included inside a <style> element in a document, or in separate .css files (called “style sheets”), which are linked to from the HTML document, inside the head.

<head>

 <link rel="stylesheet" type="text/css" href="theme.css">

 <style>
  p {
    font-family: Georgia, "Times New Roman", serif;
  }
 </style>

</head>

You can link to multiple stylesheets in a single document, and include more than one <style> tag.

It is almost always a better practice to include CSS via a linked stylesheet than to embed a <style> element on the page. There are specific instances where embedded styles make sense (email, for example), but the general rule of thumb is: if you can link to a stylesheet, you should.

Style declarations

A stylesheet takes the form of a series of style declarations. These are notated as follows:


[selector] {
    [attribute]: [value];
    [attribute]: [value];
}

/* comments here */

That is:

  • A selector, or element identifier, specifying what is being styled. These include the following (there are more):
    • The name of a type of element: a, p, dl, etc. This applies styles to all elements of that type.
    • A class identifier — the name of a class, prefixed with a dot ( . ).
    • An ID identifier — the name of an ID, prefixed with the hash sign ( # ).
    • One of the above, plus some other specialty selector, such as a pseudoclass like :hover.
  • An opening curly brace, signifying the beginning of style rules regarding the given element.
  • Style rules expressed as attribute-value pairs (CSS itself calls the first part a property) linked with colons and terminated with semicolons.
  • A closing curly-brace denoting the end of the style rules for that element.

For example:


html {
    color: #222222; /* text color - very dark grey */
    font-family: Georgia, "Times New Roman", Garamond, serif; 
    font-size: 14px;
    line-height: 22px;
}

/* This is a comment. */

#logo {   /* Style by element ID */
    color: #B20000;
    font-family: "Helvetica Neue", Helvetica, Arial, sans-serif;
}

.widget {  /* Style by element class. */
    width: 100%;
    border: solid 2px black;
    padding: 22px 11px;
}

a {
    color: #008AE6;
}

a:hover {
    color: #006EB8;
    text-decoration: underline;
}

The Document Tree

Many people simply think of an HTML document as a linear structure: first the <head>, then the <body>, then the <menu>, then the <main>.

However, an HTML document is really a tree:

  • HTML
    • Head
      • Title
      • Meta
    • Body
      • Header
        • H1
        • nav
      • Main
        • p
          • a
          • a
          • strong
        • ol
          • li
          • li
          • li
            • a
      • Aside
        • section
        • section
        • section
      • Footer

Each element is nested inside another element.

In CSS, some style declarations affect the size or shape of the element itself. This has no effect on the elements inside it.

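But other styles affect the contents of the element: styles like text color, font family, font size, and line spacing. These styles are said to “cascade” down the document tree (the formal CSS term is inheritance). All the contents, including other elements, are affected unless a different style declaration overrides it at a more specific point. Background color is a special case: it is not inherited, but because backgrounds are transparent by default, a parent’s background shows through its children.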

Consider the following example:


<h1><a href="">Anchor inside a headline</a></h1>

<a href=""><h1>Headline inside an anchor</h1></a>

a {
    color: blue;
}

h1 {
    color: green;
}

Which will be blue and which will be green?

In this case, the first line (anchor inside a headline) will be blue, while the other will be green.

Why?

In the first example, the <h1> style is green, so everything inside it should be green. But then, inside of that, is an anchor that styles text blue. The anchor is more specific — closer to the text being styled — so its blue styling prevails.

The opposite is the case in the second example. The <a> element should have blue contents, but that style is overridden by the headline, which turns its contents green.

CSS Selectors

Every CSS declaration begins with a selector. This can be a type of element, a class, an ID, or a number of other things.

There is a way to target just about any element in a document: not just by class and ID, but by location in a document, context, or other attributes.

Selector declarations can also be combined. For example:

#container div {
}

This selects all <div> elements which are inside of the #container element.

Here are some of the more important selectors and selector formats.

  • * — Selects all elements.
  • element-name — Selects all elements of a particular type.
  • .class-name — Selects elements by class name.
  • #id — Selects elements by ID.
  • element-name.class-name — Selects elements of a particular type that match the class name.
  • selector1, selector2 — Selectors separated by commas match all elements indicated by either selector1 or selector2.
  • selector1 selector2 — One selector following another selector, with only a space, selects all elements matching selector2, which are inside of an element matching selector1.
  • selector1>selector2 — Selects all elements matching selector2 which are immediate children of selector1.
  • [attribute] — An attribute name inside square brackets selects all elements which have that attribute, whatever its value.
  • [attribute=value] — Selects elements which have the specific attribute-value pair.
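  • [attribute~=string] — Selects elements in which the specified attribute’s value is a space-separated list of words, one of which exactly matches the indicated string. (To match the string anywhere inside the value, use [attribute*=string].)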
  • selector:first-child — Selects elements which are the first child of their parent elements. This can be used as main p:first-child (for example) to style the opening paragraph on a page.
  • selector::first-letter — Selects the first letter of the matching element. Can be used to create drop-cap effects.
  • selector::first-line — Selects the first line of text in the matching element.
  • input:focus — Styles a form input when it has focus.
  • a:hover, a:active, a:visited — Styles anchor links at various points in their interaction cycle.

There are many more in addition to these.
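A few of the selectors listed above can be seen working together in the short sketch below. The class name menu, the link target #top, and the specific sizes are assumptions chosen only for illustration.

```html
<style>
  /* Descendant selector: every <li> anywhere inside .menu. */
  .menu li { color: #222222; }

  /* Child selector: only <li> elements directly inside .menu. */
  .menu > li { border-bottom: solid 1px #222222; }

  /* Attribute selector: only links whose href is exactly "#top". */
  a[href="#top"] { text-decoration: none; }

  /* A <p> inside <main> that is the first child of its parent. */
  main p:first-child { font-size: 18px; }

  /* Drop-cap effect on the first letter of that same paragraph. */
  main p:first-child::first-letter { font-size: 36px; }
</style>

<ul class="menu">
  <li>Home</li>
  <li>Guides
    <ul>
      <li>HTML</li> <!-- matched by ".menu li" but not by ".menu > li" -->
      <li>CSS</li>
    </ul>
  </li>
</ul>

<main>
  <p>Opening paragraph with a larger font and a drop cap.</p>
  <p>Second paragraph, styled normally. <a href="#top">Back to top</a></p>
</main>
```

The nested list items show the difference between the two combinators: they pick up the text color from the descendant selector, but only the two top-level items get the bottom border.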

CSS and Page Structure — The Box Model

The CSS Box Model is an explanation of the way CSS displays and positions block-level elements.

What are Block elements?

Block-level elements are HTML elements which have width and height and (by default) a line break before and after. They represent blocks of content. (This is in contrast to inline elements, which represent spans of text and do not create new lines by default.)

There are a number of elements which are block-level by default:

  • <address> — Contact information.
  • <article> — Article content.
  • <aside> — Aside content.
  • <audio> — Audio player.
  • <blockquote> — Long (“block”) quotation.
  • <canvas> — Drawing canvas.
  • <dd> — Definition description.
  • <div> — Document division.
  • <dl> — Definition list.
  • <fieldset> — Group of related form fields.
  • <figcaption> — Figure caption.
  • <figure> — Media (usually an image) with a caption.
  • <footer> — Section or page footer.
  • <form> — Input form.
  • <h1>, <h2>, <h3>, <h4>, <h5>, <h6> — Headlines.
  • <header> — Section or page header.
  • <hr> — Horizontal rule (dividing line).
  • <main> — Contains the central content unique to this document.
  • <nav> — Contains navigation links.
  • <noscript> — Content to use if scripting is not supported or turned off.
  • <ol> — Ordered list.
  • <output> — Form output.
  • <p> — Paragraph.
  • <pre> — Preformatted text.
  • <section> — Section of a web page.
  • <table> — Table.
  • <tfoot> — Table footer.
  • <ul> — Unordered list.
  • <video> — Video player.

You can also cause any element to behave like a block-level element by assigning it the style display: block;.
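As a small sketch of this, the <span> below would normally sit inline with the surrounding text. The class name label is hypothetical and used only for this example.

```html
<style>
  .label {
    display: block;            /* the span now starts on its own line... */
    width: 200px;              /* ...and accepts a width, as a <div> would */
    background-color: #eeeeee;
  }
</style>

<p>
  Text before.
  <span class="label">This span behaves like a block.</span>
  Text after.
</p>
```

Without the display: block; declaration, the width would be ignored and all three pieces of text would run together on one line.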

Width and Height

By default, the width of a block-level element is 100% of its containing block-level element, inclusive of any margins, border, or padding. (That is, the entire element, including margin, border, and padding, will fit inside its container.)

The default height of a block-level element is the height of all its content, plus any margins, border, or padding.

Most of the time, in designing a page layout, you want to specify the width of elements, but not their height. This is because the width of a display window is a fixed size on any given display, but the page can scroll up and down along any height.

The weird thing about specifying width (and height, but you don’t do it as often) is that the width you specify will not be the total width of the element.

Margins, border, and padding

In addition to the content of an element, the total width and total height are determined by three other attributes:

  • margin — The area around an element.
  • border — A line around the perimeter of the element.
  • padding — Space just inside the perimeter of the element.

Any background declarations (such as background, background-color, or background-image) cover the content area, the padding, and the border. The margin does not display the background.

<style>
.field {
    width: 100%;
    background-color: #66FF33; /* Lime green*/
}
.inside {
    width: 100px;
    margin: 25px;
    border: dashed 15px black;
    padding: 25px;
    background-color: #3366FF;
    color: #003366;
    font-weight: bold;
} 
</style> 

<div class="field">
 <div class="inside">
  This is some text inside the inside. Notice that it is set away from the inside edge. That is caused by the padding.  
 </div>
</div>

Notice:

  • Margin, border, and padding are added to the width declared by the CSS.
  • The height is determined by the content.
  • The margin of inside pushes it away from the left side of field, but the same is not true for the top and bottom. This is a quirk of CSS known as margin collapsing. To push the inside element away from the top of its container you would add padding to the containing element.
  • Margin on the left and right of an element affects its relationship with its parent element AND with sibling elements.
  • Margin on the top and bottom of an element affects only its relationship with sibling elements.
  • The background color fills the area of the content, the padding, and the border, but not the margin.

Because the total width includes the declared width and also the width of any margins, border, and padding, the following declaration does not work:


div {
    width: 100%;
    margin: 5px;
    padding: 15px;
}

If you do something like this, you’ll find that the <div> expands past the right side of its containing element by 40px.

In this example, because the idea seems to be to cause the <div> to fill the full width of its container, the right thing to do would be to simply omit the width declaration. This will cause the element to simply fill the width of its container automatically, with no overflow.


div {
    margin: 5px;
    padding: 15px;
}
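The arithmetic behind the 40px overflow can be worked through in a small sketch. It assumes a container that is 400px wide and the browser’s default sizing behavior, in which a declared width applies to the content area only; the names box-demo, too-wide, and fits are hypothetical.

```html
<style>
  #box-demo {
    width: 400px;
    background-color: #eeeeee;
  }

  #box-demo div {
    margin: 5px;
    padding: 15px;
    background-color: #dedede;
  }

  /* Content 400px + padding (2 x 15px) + margin (2 x 5px) = 440px in total,
     which is 40px wider than the 400px container. */
  .too-wide {
    width: 100%;
  }

  /* No width declared: the content area shrinks to
     400px - (2 x 15px) - (2 x 5px) = 360px, so the total is exactly 400px. */
  .fits {
  }
</style>

<div id="box-demo">
  <div class="too-wide">Overflows the container.</div>
  <div class="fits">Fits the container.</div>
</div>
```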

However, if you want to have an element that only takes up half of the available width, in order to have side-by-side columns, you’ll have to do things a little differently.

Floating Elements

The default behavior of block-level elements is for them to fill the full width of their container and create a line break before and after themselves. If you place several block-level elements in a series, they will simply appear straight down the page, each one below the previous one:

<style>
#container {
    width: 400px;
    background-color: #eeeeee;
    padding: 20px;
}
#container div {  /* Targets all divs that are children elements of #container. */
    height: 50px;
}
#red {
    background-color: red;
}
#blue {
    background-color: blue;
}
#green {
    background-color: green;
}
</style> 

<div id="container">
<div id="red"></div>
<div id="blue"></div>
<div id="green"></div>
</div>

Even if we were to make each of the inner <div> elements small enough that they could sit next to each other in a row, the line break will still be there.

<style>
#container {
    width: 400px;
    background-color: #eeeeee;
    padding: 20px;
}
#container div {  
    height: 50px;
    width: 50px;
}
#red {
    background-color: red;
}
#blue {
    background-color: blue;
}
#green {
    background-color: green;
}
</style> 

<div id="container">
<div id="red"></div>
<div id="blue"></div>
<div id="green"></div>
</div>

In order to allow them to sit next to each other, they must be allowed to “float.” In CSS, “float” means to allow other elements of the document to flow around the floating element. A block element can be floated to the left or to the right (there is no float: center). If multiple sibling elements are set to float, they will line up next to each other, separated by their margins.

To make these three colored boxes sit next to each other, we just need to add float: left or float: right to all three of them.


<style>
#container {
    width: 400px;
    background-color: #eeeeee;
    padding: 20px;
}
#container div {  
    height: 50px;
    width: 50px;
    float: left;
}
#red {
    background-color: red;
}
#blue {
    background-color: blue;
}
#green {
    background-color: green;
}
</style> 

<div id="container">
<div id="red"></div>
<div id="blue"></div>
<div id="green"></div>
</div>

Notice:

  • The first colored block ( #red ) is at the left side of the container, followed by #blue and #green.
  • They have no margin, so they are directly next to each other.
  • The padding on the inside of the #container is pushing the blocks down and away from the upper left hand corner.

But, oh no, what is going on with the #container element? Why are the colored blocks hanging out of it?

Here’s the problem with floats: a floated element, by default, does not contribute to the height of its container. So the height of the container is determined by the sum of:

  • its height declaration if it has one (this one doesn’t) OR its non-floated content if it doesn’t have a declared height (in this case, it also doesn’t have any)
  • vertical padding (top and bottom)
  • vertical borders (top and bottom)
  • vertical margin (top and bottom)

The inner height of the #container element in this case is zero, and the total height is only 2x the size of its padding.

This is a very annoying, and very common, problem. The solution (a bit of a hack) is called the clearfix solution. There are several options for how to accomplish this, but for our examples we are going to use the simplest one, adding overflow: auto to the #container. This does not work in all browsers or in all contexts, but it will work well enough on most browsers that it is fine for our examples here.

Here is the result of adding the clearfix.


<style>
#container {
    width: 400px;
    background-color: #eeeeee;
    padding: 20px;
    overflow: auto;
}
#container div {  
    height: 50px;
    width: 50px;
    float: left;
}
#red {
    background-color: red;
}
#blue {
    background-color: blue;
}
#green {
    background-color: green;
}
</style> 

<div id="container">
<div id="red"></div>
<div id="blue"></div>
<div id="green"></div>
</div>
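For comparison, here is a sketch of one of the other clearfix options mentioned above: instead of overflow: auto, a generated pseudo-element is placed after the floated blocks and told to clear them. This is a widely used pattern rather than the approach this guide relies on, and it is shown here on the same hypothetical #container markup.

```html
<style>
  #container {
    width: 400px;
    background-color: #eeeeee;
    padding: 20px;
  }

  /* An empty generated element after the floats. Because it clears both
     sides, it sits below the floated blocks and stretches the container. */
  #container::after {
    content: "";
    display: table;
    clear: both;
  }

  #container div {
    height: 50px;
    width: 50px;
    float: left;
  }

  #red   { background-color: red; }
  #blue  { background-color: blue; }
  #green { background-color: green; }
</style>

<div id="container">
  <div id="red"></div>
  <div id="blue"></div>
  <div id="green"></div>
</div>
```

The trade-off is a few more lines of CSS in exchange for not changing how the container handles overflowing content.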

Now, if we add some margins, padding, content, and less garish colors, you can see how this basic idea can be turned into a standard content+sidebar layout.

<style>
#container {
    background-color: #eeeeee;
    overflow: auto;
    padding: 20px;
    width: 600px;
}

main {
    float: left;
    background-color: #efefef;
    width: 300px;
    padding: 15px;
    margin-right: 20px;
}

aside {
    float: left;
    background-color: #dedede;
    padding: 15px;
    width: 220px;
}
</style>

<div id="container">
 <main>
  <h2>Lorem Ipsum</h2>
  <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Curabitur pretium, mi eu elementum ullamcorper, dui justo egestas turpis, sed auctor turpis tellus eget augue. Quisque vel malesuada erat. Vestibulum non felis et turpis iaculis iaculis.</p>

  <p>In arcu metus, finibus id dolor a, interdum lacinia lectus. Vestibulum vulputate neque eget ante tincidunt sodales. Quisque efficitur a turpis nec scelerisque. Donec commodo, diam id consequat sodales, justo quam posuere libero, non fringilla ante dui id tortor. Sed efficitur in ipsum nec pellentesque. </p>
 </main>
 <aside>
  <h3>Archives</h3>
  <ul>
   <li>May 2015</li>
   <li>April 2015</li>
   <li>March 2015</li>
   <li>February 2015</li>
   <li>January 2015</li>
  </ul>
 </aside>
</div>


Extra <div>s

In a perfect world, every element in an HTML document would be semantically significant, related only to the content of the document, and never added solely to support presentational styles.

The real world is far from perfect, unfortunately. Things have gotten a lot better over the last decade, but there are still times when you need to add an extraneous <div> element for no reason except it makes the page display the right way.

Most blogs — probably most websites — follow a typical pattern:

  • Header section that spans the width of the page
  • Central content area divided into two columns:
    • 2/3 of the width for primary content
    • 1/3 of the width for a widget sidebar
  • Footer section spanning the width of the page.

The ideal markup for this document structure would be:


<header>
 <!-- Logo and menu here -->
</header>

<main>
 <!-- Primary content here -->
</main>

<aside>
 <!-- Sidebar content here -->
</aside>

<footer>
 <!-- Footer content here -->
</footer>

Unfortunately, there’s almost no way to cause the <main> and <aside> elements to sit next to each other properly unless they can float inside of a containing element.

Therefore, the most common markup structure for a typical blog layout actually looks like this:


<header>
 <!-- Logo and menu here -->
</header>

<div id="container">
 <main>
  <!-- Primary content here -->
 </main>
 <aside>
  <!-- Sidebar content here -->
 </aside>
</div>

<footer>
 <!-- Footer content here -->
</footer>

It will often be the case that similar container elements are needed in order to get a page to display correctly. Don’t worry about it — the web is as much a visual medium as it is a collection of text-based documents.
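As a rough sketch of why the wrapper helps, the container markup above could be styled along the following lines. The percentages, the padding, and the box-sizing rule are illustrative assumptions, not values taken from this guide.

```html
<style>
/* Illustrative values only: two floated columns inside a clearing wrapper. */
#container {
    overflow: auto;          /* clearfix: make the wrapper enclose its floated children */
}
#container main,
#container aside {
    float: left;
    box-sizing: border-box;  /* keep padding inside the declared width */
    padding: 15px;
}
#container main {
    width: 66.66%;           /* roughly 2/3 for primary content */
}
#container aside {
    width: 33.33%;           /* roughly 1/3 for the sidebar */
}
</style>
```

Because the wrapper encloses both floats, the footer that follows it starts below the taller of the two columns without needing any float-related styles of its own.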

CSS for Text Styling

CSS is used for all aspects of document presentation, including both page structure and text styling. There could be a whole book on CSS and typography, but a short introduction to key concepts will do for this guide.

The Cascade

When thinking about text styling in CSS, it is especially important to remember the cascade part of Cascading Style Sheets.

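When you add a style that affects content (as opposed to element shape, size, or placement), the style is passed down the document tree to the element’s descendants. (Strictly speaking, this passing-down is called inheritance; the cascade proper is the set of rules that decides which declaration wins when several apply to the same element.) So any style applied to <body> affects all the content, while any style applied to <main> affects all the content inside it, but not outside of it.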

The practical implication of this is that you should build your stylesheet from the general to the specific. You should usually set a single base font in the <body>, which covers the majority of your content, and then set any fonts that deviate from that (such as for menus or buttons) at the specific point where they are different.

Any style declaration having to do with typography should be included at the most general (highest in the tree) point that it applies to. This will help avoid duplicating styles.
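A minimal sketch of the general-to-specific approach might look like this. The font choices and the use of a <nav> element for the menu are assumptions made for illustration.

```html
<style>
/* Set once, at the most general point: inherited by all content in the page. */
body {
    font-family: Georgia, "Times New Roman", serif;
    font-size: 16px;
    color: #111111;
}
/* Override only at the point where the design deviates from the base. */
nav {
    font-family: Arial, Helvetica, sans-serif;
}
</style>

<nav>This menu text is sans serif.</nav>
<main>
 <p>This paragraph inherits the serif font, size, and color from body.</p>
</main>
```

Only one declaration had to be written for the menu; its size and color still come from <body>.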

The Styles

The style declarations most relevant to text styling and typography are:

font-family
  • Specifies the font(s) to use.
  • Best practice is to declare a list of fonts, starting with your preferred font, followed by a series of fallbacks, and ending with a generic font family.
  • Font names that include spaces must be wrapped in quote marks.
  • Example: font-family: Garamond, Georgia, "Times New Roman", serif;

font-size
  • Specifies the size of the text.
  • Can be a size in pixels (14px), typographical ems based on the parent’s text size (1em), a percentage based on the parent’s text size (115%), a descriptive size name (small, medium, large), or a comparative descriptor based on the parent’s text size (smaller, larger).
  • Example: font-size: 14px;

line-height
  • Specifies the height of a line of text.
  • This should usually be larger than the font-size. A line of text is typically centered inside the line-height.
  • Line height can be specified in pixels, ems, or percentages. It can also be set as a multiplier of the font-size; this is done by omitting the unit suffix.
  • Example: line-height: 1.4;

margin-bottom
  • The margin below paragraphs, headlines, lists, and other typographical elements contributes to the overall readability.
  • This property has to be set on each affected element, not on the document or section.
  • Typically, setting the margin-bottom equal to the line-height creates the best results.
  • Example: margin-bottom: 1.4em;

color
  • The default color of text is black. You may wish to change this to something less stark.
  • Example: color: #111111;
  • Links have a set of default colors to denote whether or not they have been visited, and a color denoting their active state.
  • Typically, the default link color is not particularly desirable. To change the color for the various states of the anchor tag, use a:link (default), a:visited, a:hover, and a:active, declared in that order so that later states can override earlier ones.
  • Example: a:hover { color: #3399FF; }

font-weight
  • Controls whether text is bold or normal.
  • According to the specification, you can also use a numeric value (100 through 900) to set the weight more precisely. However, this only has a visible effect when the font actually provides those weights, and many fonts do not.
  • The most common use of font-weight is to simply specify bold.
  • Example: strong { font-weight: bold; }

font-style
  • Used to indicate italic type.
  • Relevant values are normal and italic. (A third option, oblique, is rarely used.)
  • Example: em { font-style: italic; }

text-decoration
  • Used to add a line to a span of text: over (overline, rarely used), through (line-through, used for deleted text), or under (underline, used for links and sometimes for headings).
  • Example: a { text-decoration: underline; }

text-transform
  • Controls the capitalization of text.
  • Relevant values are capitalize (Title Case), uppercase (ALL CAPS), and lowercase (all lower case).
  • Example: h3 { text-transform: uppercase; }

font-variant
  • Used to specify small-caps. It sets all the lowercase letters in the content to uppercase letters at a smaller font size.
  • Example: h1 { font-variant: small-caps; }

@font-face
  • This is not a property assigned to an element, but an at-rule that stands on its own in a CSS document. It is used to define a new font family. This allows designers to specify any font they wish, instead of just relying on the system fonts of client computers.
  • Inside the rule, font-family defines the name of the new font (to be used in the font-family property of other elements), and src specifies a font file.
  • Not all font file formats are supported on all browsers. The most widely supported formats are TTF, OTF, and WOFF; WOFF2 is supported by current browsers but not by older ones.
  • If variations of a font (bold, italic) require separate font files, multiple @font-face declarations can be set, each one with a different src property, and with additional properties specifying their context (for example, font-weight: bold;).
  • Example: @font-face { font-family: myNewFont; src: url(myNewFont.woff); }
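To illustrate the case where a bold variant lives in its own file, a sketch along these lines could be used. The family name myNewFont and both file names are placeholders; substitute the files you actually have.

```html
<style>
/* "myNewFont" and both .woff file names are placeholders. */
@font-face {
    font-family: myNewFont;
    src: url(myNewFont.woff);
}
/* Bold variant: same family name, different file. */
/* font-weight tells the browser when to use this file. */
@font-face {
    font-family: myNewFont;
    src: url(myNewFont-bold.woff);
    font-weight: bold;
}
body {
    /* The fallbacks apply if the font file fails to load. */
    font-family: myNewFont, Georgia, serif;
    font-size: 16px;
    line-height: 1.4;
}
p {
    margin-bottom: 1.4em;
}
</style>
```

With both rules in place, a <strong> element inside the body text should pick up the bold file automatically, since it asks for the same family at a bold weight.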

Typography Tips

Web typography is tricky. The default display of text in most browsers is really unattractive, and getting text to look good, and be readable, is a non-trivial problem.

If you are designing a site’s typography, keep the following tips in mind.

  • Use a test page that includes all possible typographical styles. Many web typography designers forget about description lists ( <dl> ), blockquotes ( <blockquote> ), headlines after <h3> ( <h4>, <h5>, <h6> ), and other rarely used styles. However, even though they are rarely used, they do get used. Be sure to include nested lists, images with and without captions, and code samples as well (if you write about technology).

  • Notice how elements’ margins interact with each other and with their context. For example, it is common to put a margin-top on headlines. This might make sense in some contexts, but if it’s the first element in an <article>, this might not look right. Similarly, a sectional headline after a paragraph might end up duplicating a line-space, creating more of a visual break than desired.

  • Use the value of line-height as a scale rule to keep the page’s text in a constant rhythm. Fonts larger than the global line-height (headlines, for example) should have their line-height set to a multiple of the global value. Also use the line-height value for margins below elements and for set-in (indented) sections.

  • It is common to identify paragraphs with a double line-break (accomplished with a margin-bottom set to the same value as line-height). It is also common to not indent paragraphs.

  • Lists tend to look better set in (margin-left) about the same distance as line-height (or a multiple thereof).

  • Description lists have really bad default styling. They generally look best with the entire list being set-in, the <dt> in bold, and the <dd> set-in even further.

  • It is very common to set the primary font size for a page to 12px, but this is not very readable. Consider 14px or even 16px as a global base font size (depending on your font-family).

  • It is very common to leave the font color black (the default), but usually a very dark gray is more attractive and readable.

  • Links ( <a> ) have been blue by default in almost every browser since the beginning of the internet. Even if you want to change the color of links to fit your branding (and you should), it should probably be a shade of blue. Also, be careful about using blue text in other places, as it may signal to users they should be able to click the text.

  • Similarly, underlined text is a near-universal signal that something is clickable. Change this convention at your own peril.

  • It used to be common advice to avoid serif fonts on the web, and to use sans serif fonts only. This was good advice because serif fonts do not render as well in low-resolution screens. However, this is no longer a concern for most people. Using a serif font (such as Georgia, Times, or Garamond) can make text more readable.

  • Generally, you don’t want more than three fonts on a page:
    • A content font, which can either be serif or sans.
    • A menu and navigation font, which should usually be sans serif, and may also be the same as your content font.
    • A font for code samples (if you do that sort of thing), which should be a monospace font like Courier or Consolas.
  • The “measure” of a text is the number of characters per line. A measure more than 80 characters long becomes unreadable. Most experts would set the ideal measure at about 65 characters per line. This is a function of the width of your content area and the size of your font.

  • If you use @font-face to import your own font onto a page, make sure to test how this looks on multiple browsers. Some browsers render certain fonts poorly. Also be sure to include fallback fonts in your font-family declarations — you cannot always count on @font-face to work in every situation.
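Several of the tips above can be combined into a small base stylesheet. The following is only a sketch under assumed values (a 16px base, a 1.5 line-height, and a 33em column as a rough stand-in for a 65-character measure); the right numbers depend on your font.

```html
<style>
/* Illustrative values: a 16px base with a 1.5 line-height gives a 24px rhythm. */
body {
    font-family: Georgia, "Times New Roman", serif;
    font-size: 16px;
    line-height: 1.5;
    color: #222222;        /* very dark gray instead of pure black */
}
main {
    max-width: 33em;       /* rough stand-in for a measure of about 65 characters */
}
p, ul, ol, dl, blockquote {
    margin-top: 0;
    margin-bottom: 1.5em;  /* one line of space below each block */
}
ul, ol {
    margin-left: 1.5em;    /* set lists in by one line-height */
    padding-left: 0;
}
h2 {
    font-size: 32px;
    line-height: 48px;     /* two lines of the 24px rhythm */
    margin-top: 0;
    margin-bottom: 24px;
}
dl { margin-left: 1.5em; }
dt { font-weight: bold; }
dd { margin-left: 1.5em; }
code, pre {
    font-family: Consolas, "Courier New", monospace;
}
</style>
```

Running a test page with every element type through a stylesheet like this is a quick way to spot margins that double up or headings that break the rhythm.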

CSS Animation

CSS allows you to animate HTML elements without JavaScript. For simple effects, this can be quite convenient. It renders faster than similar effects created in JavaScript, and will be supported even in browsers that have JavaScript disabled.

Keyframes

An animation is defined with a CSS at-rule called @keyframes. A @keyframes declaration has a name, and a set of style rules for various points (key frames) in the animation.


@keyframes colorChange {
    from {
        background-color: red;
    }
    to {
        background-color: blue;
    }
}

In the above example, the “colorChange” animation causes an element to change background color from red to blue.

More than color can be animated. Using the @keyframes rules to set positions will cause an affected element to move from one place to another.

@keyframes moveLeft {
    from {
        position: relative;
        left: 100px;
    }
    to {
        position: relative;
        left: 0px;
    }
}

A multi-point animation can also be specified, using percentages.

@keyframes moveAround {
    0% {
        top: 0px;
        left: 0px;
    }
    25% {
        top: 0px;
        left: 50px;
    }
    50% {
        top: 50px;
        left: 50px;
    }
    75% {
        top: 50px;
        left: 0px;
    }
    100% {
        top: 0px;
        left: 0px;
    }
}

This animation would cause an animated element to move over, down, back over, and then back up to its original position.

Assigning animations to elements

Animations are defined separately from the elements they will affect. Animations are defined in @keyframes declarations as shown above, and then applied to elements.

Animations are applied to elements using the @keyframes name and a length of time that the animation should last. Other properties can affect the animation as well.

@keyframes colorChange {
    from {
        background-color: red;
    }
    to {
        background-color: blue;
    }
}

.color-changing {
    width: 100px;
    height: 100px;
    animation-name: colorChange;
    animation-duration: 5s;
}

If you don’t set the animation-duration, the animation will not run, because the default value is 0.

Several other animation properties can be set as well:

  • animation-delay — Specifies a delay for the start of an animation. The default is 0, which means the animation will play right away. Usually noted in seconds ( s ), but can also be noted in milliseconds ( ms ).
  • animation-direction — Specifies whether an animation should run as normal (the default), in reverse, or alternate between forward and reverse. The alternate value only has a visible effect if the animation-iteration-count is higher than 1.
  • animation-fill-mode — Defines whether, and how, styles from the animation should affect the element when the animation is not running (when it is finished, or when there is a delay).
    • none — The default. The animation styles do not have an effect on the element when the animation is not playing.
    • forwards — After the animation ends, the last keyframe style (to or 100% if the animation did not run in reverse) stays applied to the element.
    • backwards — During the delay, the first keyframe style (from or 0% if the animation did not run in reverse) is applied to the element.
  • animation-iteration-count — Specifies the number of times an animation should be played.
  • animation-timing-function — Specifies the speed curve of the animation:
    • ease — The animation begins slowly, speeds up, and then ends slowly.
    • ease-in — The animation begins slowly, speeds up, and ends at full speed.
    • ease-out — The animation begins at full speed, but slows down as it ends.
    • linear — The animation runs at full speed from beginning to end.
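The properties listed above can be combined on a single element. The sketch below uses two hypothetical elements (#notice and #signup) and made-up animation names to show a delayed fade-in and a short alternating pulse.

```html
<style>
@keyframes fadeIn {
    from { opacity: 0; }
    to   { opacity: 1; }
}
@keyframes pulse {
    from { transform: scale(1); }
    to   { transform: scale(1.1); }
}
#notice {
    animation-name: fadeIn;
    animation-duration: 2s;
    animation-delay: 1s;                  /* wait one second before starting */
    animation-fill-mode: backwards;       /* stay invisible (the "from" style) during the delay */
    animation-timing-function: ease-out;
}
#signup {
    display: inline-block;                /* transforms do not apply to plain inline boxes */
    animation-name: pulse;
    animation-duration: 0.5s;
    animation-iteration-count: 4;
    animation-direction: alternate;       /* grow, shrink, grow, shrink */
    animation-timing-function: linear;
}
</style>

<p id="notice">Hypothetical notice text</p>
<a id="signup" href="#">Hypothetical button</a>
```

Without the fill mode, the notice would be fully visible during the one-second delay and then jump to transparent when the animation starts. An even iteration count with alternate brings the button back to its normal size at the end.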

Browser Support for Animations

All current major browsers support CSS animations, but older versions of browsers built on the WebKit rendering engine (Safari, Chrome, Opera) only recognize a vendor-prefixed syntax.

To make sure that animations work in those older browsers, you have to duplicate the @keyframes declaration and the animation properties with a special -webkit- syntax.


/* This is for non-webkit browsers. */
@keyframes moveLeft {
    from {
        position: relative;
        left: 100px;
    }
    to {
        position: relative;
        left: 0px;
    }
}

/* This is for webkit browsers: Chrome, Safari, and Opera. */
@-webkit-keyframes moveLeft {
    from {
        position: relative;
        left: 100px;
    }
    to {
        position: relative;
        left: 0px;
    }
}

#animationSupportDemo {
    animation-name: moveLeft;
    animation-duration: 4s;
    -webkit-animation-name: moveLeft; /* Chrome, Safari, and Opera. */
    -webkit-animation-duration: 4s; /* Chrome, Safari, and Opera. */
}

(In the live examples above, these additional styles have been added to the running demo code, but not the code examples, just to make it less complicated.)

This may seem redundant and silly (because it is), but it’s the only way to get your animations to work in those browsers.

When to use CSS Animations

If you need highly dynamic elements flying around the screen like an arcade game, CSS animations are not really the way to go; that’s more appropriate to JavaScript.

CSS animations are best used to provide subtle design enhancements to a page. Some examples:

  • filling in the bars of a chart
  • pulling down a tab on hover
  • flying in elements as a user scrolls down a page
  • slow and subtle changes of background color
  • “bouncing” buttons to encourage clicking

Combining CSS Animation with JavaScript

CSS animations either start immediately, or after a specified delay. However, if you want to trigger a CSS animation upon some event occurring, you can set the animation to “paused” in the CSS, and then unpause it with JavaScript at a later point. Pausing an animation is done with the property animation-play-state.

.pausedAnimation {
    animation-name: example;
    animation-duration: 5s;
    animation-play-state: paused;
    -webkit-animation-name: example;
    -webkit-animation-duration: 5s;
    -webkit-animation-play-state: paused;
}

The value for animation-play-state when unpaused is running. This can be manipulated in JavaScript:

[element].style.animationPlayState = "running"
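Put together, a paused animation that starts on a button click might look like the sketch below. The element ids (box and start) are hypothetical, and the colorChange keyframes are repeated so the example stands on its own.

```html
<style>
@keyframes colorChange {
    from { background-color: red; }
    to   { background-color: blue; }
}
#box {
    width: 100px;
    height: 100px;
    background-color: red;
    animation-name: colorChange;
    animation-duration: 5s;
    animation-play-state: paused;   /* wait until the script starts it */
}
</style>

<div id="box"></div>
<button id="start">Start</button>

<script>
// Unpause the animation when the button is clicked.
document.getElementById("start").addEventListener("click", function () {
    document.getElementById("box").style.animationPlayState = "running";
});
</script>
```

Setting the value back to "paused" from another event handler would freeze the animation at whatever point it had reached.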

CSS in the Real World

In the real world of web development and design, very few developers sit down with an HTML document and a blank CSS file and start specifying styles from scratch. (Though doing so can certainly be fun.)

There are a number of typical “best practices” that CSS designers usually follow to make their work easier and more consistent.

CSS Resets

All HTML elements have default styles set by their browser. Each browser has slightly different default styles. This means that a single document with a single stylesheet may look different in different browsers. (Aside from issues of support and compatibility.)

A “CSS reset” is a set of styles that can be placed in a CSS stylesheet before any other styles are declared. The CSS reset provides a common base for adding styles across all browsers.

CSS resets are also used to set default display styles for HTML 5 elements that might not have default displays in older browsers — elements like <article> and <main>.

There are several common CSS reset templates. The most well known is the one created by Eric Meyer.


/* http://meyerweb.com/eric/tools/css/reset/ 
   v2.0 | 20110126
   License: none (public domain)
*/

html, body, div, span, applet, object, iframe,
h1, h2, h3, h4, h5, h6, p, blockquote, pre,
a, abbr, acronym, address, big, cite, code,
del, dfn, em, img, ins, kbd, q, s, samp,
small, strike, strong, sub, sup, tt, var,
b, u, i, center,
dl, dt, dd, ol, ul, li,
fieldset, form, label, legend,
table, caption, tbody, tfoot, thead, tr, th, td,
article, aside, canvas, details, embed, 
figure, figcaption, footer, header, hgroup, 
menu, nav, output, ruby, section, summary,
time, mark, audio, video {
    margin: 0;
    padding: 0;
    border: 0;
    font-size: 100%;
    font: inherit;
    vertical-align: baseline;
}
/* HTML5 display-role reset for older browsers */
article, aside, details, figcaption, figure, 
footer, header, hgroup, menu, nav, section {
    display: block;
}
body {
    line-height: 1;
}
ol, ul {
    list-style: none;
}
blockquote, q {
    quotes: none;
}
blockquote:before, blockquote:after,
q:before, q:after {
    content: '';
    content: none;
}
table {
    border-collapse: collapse;
    border-spacing: 0;
}

CSS Preprocessors

Imagine that you define a set of colors for your website design and want to use them in various places throughout your CSS.

If you apply the same color to your secondary headlines that you do to your copyright notice (and so forth), you’ll end up duplicating the color declaration in several different places.

What happens when you want to change it?

What about a scale of text size? You’ve set your default text size to 12px, and each headline size is a specific multiple of that size to make a unified set of sizes. What happens when you change the default size?

CSS is a declarative language: it requires each individual property and value to be specified, and for most of its history it had no variables, functions, or calculations to make things easier. (Custom properties and calc() have since been added to CSS itself, but preprocessors remain widely used.)

In the same way that PHP (and other languages) make it easier to output HTML without having to repeat content on every page, CSS preprocessors allow you to include variables, functions, and other programming constructs in stylesheets.

This makes it easier to specify color schemes, size ratios, and other types of repetitive declarations.

The two most popular CSS preprocessors are:

  • Less — Used by Twitter Bootstrap
  • Sass — Used by Ruby on Rails, Jekyll, and many other Ruby-based development tools.

Generally, developers write their stylesheets in Less or Sass and then compile them into CSS before launching a site. There are also client-side (in-browser) compilers written in JavaScript, but these use a lot of resources and are typically only used in development.

It is highly unusual at this point for a professional web developer to not use Less or Sass.

This guide cannot serve as an introduction to either of these tools, so we simply encourage you to learn more yourself.

Responsive Design

There was a time when you could be pretty sure you knew what type of screen your site’s visitors were going to be viewing your page with: a desktop monitor in one of a handful of default sizes.

Those days are long gone.

Users may be viewing your site on any one of a number of devices and screen sizes: mobile phone, mini tablets, tablets, laptops, desktops, televisions.

It is nearly impossible to individually target all these different screen sizes, and you can’t simply ignore them.

You may think that your site’s demographic is more likely to view with a desktop, but that’s unlikely to be true in any situation. Over 60% of internet traffic is coming from mobile devices. Having a website with a bad mobile experience is going to be bad for your business.

What’s more, Google has begun to alter its search results based on whether sites are optimized for mobile devices or not. Increasingly, if you don’t look good in mobile, no one will see you anyway.

The solution to this problem is Responsive Design.

Responsive design is an approach to site design — a philosophy — not a tool or a program. It is a way of structuring a page’s markup and CSS so that the page’s elements will reconfigure appropriately in different size windows.

The three pillars of responsive web design are:

Fluid Grids
  • Fluid grids involve dividing page grids based on percentages, rather than absolute pixels.
  • For example, you might have a main content column that spans 70% of a screen and a sidebar column that spans 30%. As the screen increases or decreases in size, the grid also expands or shrinks.

Fluid Images
  • Images should never be wider than the screen they are being displayed on, which means they should never be wider than the grid element containing them.
  • The easiest solution to this is to set the max-width on images to 100%.
  • Example: img { max-width: 100%; }

Media Queries
  • Media queries are a bit more complicated. They allow CSS to target specific screen types and sizes, declaring rules that only apply in particular circumstances.
  • Using media queries, you can make your main content and sidebar appear side-by-side on larger screens, and make them stack on top of each other vertically on smaller screens.
  • Media queries can also be used to hide or display certain elements which would be more or less useful in different contexts, as well as change things like font sizes or even swap out different images.
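The three pillars can be seen working together in a short stylesheet. This is a minimal sketch: the 70%/30% split comes from the example above, while the 600px breakpoint, the padding, and the box-sizing rule are assumptions chosen for illustration.

```html
<style>
/* Fluid grid: column widths are percentages of the container. */
main, aside {
    float: left;
    box-sizing: border-box;   /* keep padding inside the percentage width */
    padding: 15px;
}
main  { width: 70%; }
aside { width: 30%; }

/* Fluid images: never wider than the column that holds them. */
img { max-width: 100%; }

/* Media query: below an assumed 600px breakpoint, stack the columns. */
@media (max-width: 600px) {
    main, aside {
        float: none;
        width: 100%;
    }
}
</style>
```

Resizing the browser window across the breakpoint should show the sidebar dropping below the main content instead of shrinking alongside it.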

Responsive web design is built on simple concepts but the actual implementation of it — attempting to get it right in so many different contexts, with different screen sizes, device types, and browsers — is actually very complicated and difficult.

For this reason, and other reasons, many designers working today do not try to solve these problems themselves but choose to use a front end framework.

Front End Frameworks

There are a lot of things to think about when coding the CSS for a site design, and this short introduction only touched on a bit of it.

  • Getting a responsive grid structure to work in every browser and device.
  • Designing appropriate media queries and determining optimum screen-width breakpoints.
  • Typography that is beautiful and easy to read.
  • Making forms look even moderately decent, let alone beautiful and user-friendly.
  • Styling tables to look like they aren’t from 1997.
  • Graceful degradation for older browsers.
  • Remembering to use duplicate styling rules for certain -webkit features (and making sure you type the same rule the same way each time).
  • Getting CSS and JavaScript to interact the right way.

Many of these issues are simple “boilerplate” coding tasks that take time away from coding new designs and features, and many are just too complicated to risk “hand-coding” on every project. No one wants to spend 50% or more of development time debugging foundational code; developers want to develop.

For these reasons, and following the similar trend in application code development, more and more developers are turning to front-end frameworks.

A front-end framework provides one or more HTML document structures and a complete CSS stylesheet that handles some or all of the basic needs listed above. This frees the developers to focus on actual design decisions, instead of trying to get the media queries to work properly.

As with application development frameworks, there is a wide range of possibilities. Some frameworks specify a lot of design elements, including colors and button shapes. Others simply provide a minimal responsive grid. Some frameworks are highly customizable, and others provide a one-size-fits-or-not approach. Some include JavaScript interactivity, and some do not.

Some of the more popular front-end design frameworks currently in use are:

  • Bootstrap — From the development team at Twitter, Bootstrap was designed to speed up the prototyping and development of web applications. It provides excellent form and interactivity UI and a bold, distinctive design. Many people complain that it creates bloated markup (it does), because inexperienced users are encouraged to add styling classes into the HTML. Even so, it provides a great tool for rapid prototyping of any form-based web app.
  • Pure.css — A modular toolkit of CSS framework pieces which can be used individually or as a whole.
  • Foundation — Billing itself as “the most advanced responsive front-end framework in the world,” Foundation is thought by many to be like Bootstrap for people who care about good markup.
  • Skeleton — Skeleton is a “dead simple” CSS framework that provides a responsive grid, clean typography, and just generally minimal design. It is intended to be “a starting point, not a framework.”

There are many more besides these, and new ones being developed all the time. Additionally, there are individual pieces — CSS snippets — that can be assembled into a bespoke framework. For example, you could combine a simple fluid-grid tool with a typographical library, and use a third tool kit for forms.

CSS Frameworks — like software development frameworks — are the way forward for complex web design and development. There’s just no sense in reinventing the wheel, or recoding the same solution, with every new project.

Still — to get the most out of a CSS framework, you have to understand how CSS works and how it interacts with HTML.

Summary

CSS — Cascading Style Sheets — is the language of design and visual presentation on the web. An HTML document is just a collection of content nodes, but CSS tells a browser how to present that content to the user.

As with HTML, there are both structural aspects of CSS and text-level aspects. CSS can also be used to create dynamic animations and responsive layouts.

Because of the great complexity of modern site design — which is largely due to the variety of browsing devices and screen sizes — site design must be responsive. That is, it must work on any size or type of device — the page must respond to its environment.

The difficulty and tedium of making sure that a site design can work in so many different settings, and that the large number of competing interests are dealt with, has led most developers to adopt CSS frameworks.

Like application development frameworks, CSS (or “front-end”) frameworks provide a starting point for the development of a new site design. They provide structure, guidance, and an opinion about how a page should be laid out.

CSS Frameworks can make it possible for nearly anyone to create responsive, functioning websites, but only someone with their own depth of understanding of CSS and HTML will be able to get the most use out of one.

JavaScript and HTML

This chapter introduces JavaScript, the scripting language built into most modern web browsers. The focus is on how JavaScript works with HTML and the browser, along with some practical tips for getting started using JavaScript as a developer. Topics covered include the Document Object Model, JavaScript libraries, and JavaScript application frameworks. This chapter is not a JavaScript tutorial.

What is JavaScript

JavaScript is a scripting language built into (almost) every web browser. It is used to add dynamic interactivity and scripting to web pages. (It can also be used server-side, along with tools such as Node.js, but that is not the focus here.)

JavaScript is a fully-featured programming language, so anything is possible. It is geared, however, specifically to the needs of interacting with and manipulating HTML documents.

ECMAScript

If you work with JavaScript at all, you will run across the very weird name “ECMAScript.” This is the “official” name of JavaScript.

The standard specification for the language is maintained by an organization named Ecma, which used to be the European Computer Manufacturers Association. They have since changed their name to simply “Ecma,” which is no longer an acronym for anything.

Each web browser implements the ECMAScript standard a little differently (and Microsoft goes so far as to call their implementation JScript instead of JavaScript). So some people will use “ECMAScript” to refer specifically to the standard form of the language, not to any dialects or derivations built into web browsers.

JavaScript and Java

Just so there isn’t any confusion — JavaScript has no real relationship to Java. Java was the new and trendy language when JavaScript was first being developed, and the original idea was that JavaScript would be based on Java — hence the name. However, JavaScript did not come to be based on Java at all, and there are really very few similarities between the two languages apart from the name.

Document Object Model

One of the key things to understand if you want to have a good handle on JavaScript is the Document Object Model.

The Document Object Model is conceptually very similar to the document tree structure described in previous chapters — in fact, it essentially refers to the same thing.

The DOM is the API (Application Programming Interface) by which JavaScript code interacts with the HTML document. When a document is rendered by a browser, the browser isn’t just showing you the source code with some additional style rules attached. The browser has read the source code and generated a view based on it. Each element in the document has been turned into an object (in the programming sense) with attributes and methods accessible by JavaScript.

The attributes of a DOM object include the declared (and declarable) attributes of the HTML element (such as class, id, and name), the CSS style of the element, and the content of the element itself. Methods associated with each DOM object include functions to change any of these attributes.

A rendered web page in a browser is a live view of the Document Object Model. This means that if the DOM changes in any way (for example, if the attributes of any element are changed), the view will change as well. This allows JavaScript to update or change the content of a web page without having to refresh or reload the page.
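
As a minimal sketch of that live-view idea, the function below changes an element's content, one of its attributes, and its style through its DOM object. The function name markUpdated and the id "status" are assumptions made up for this illustration, and the browser-only lines are guarded so the snippet does nothing outside a browser.

```javascript
// Hypothetical helper: accepts any DOM element (or an object shaped like one).
function markUpdated(element, message) {
  element.textContent = message;                  // change the element's content
  element.setAttribute("title", "Updated");       // change a declared attribute
  element.style.backgroundColor = "lightyellow";  // change its CSS style
  return element;
}

// In a browser, look up a real element; the id "status" is an assumed example.
if (typeof document !== "undefined") {
  const target = document.getElementById("status");
  if (target) {
    markUpdated(target, "Saved a moment ago");
  }
}
```

If the page contains an element with that id, the new text and color should appear immediately, with no reload.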

JavaScript also has APIs to most browser functions, so it can (for example) read the current state of a document, trigger refreshes, get the width of the browser window, and resize the browser window.

Using JavaScript

This is not a JavaScript tutorial. This chapter is only trying to provide some helpful context for HTML developers dealing with JavaScript.

Including JavaScript in a page

Much like CSS, JavaScript can be included into a web page two ways — embedded scripts and linked resources.

To embed a script, simply include JavaScript code between two <script> tags.

<script>
  // Set the background color of the element whose id is passed as toChange.
  function changeColor(toChange, newColor) {
    document.getElementById(toChange).style.backgroundColor = newColor;
  }
</script>

To include a separate JavaScript file into an HTML document, link to it with the <script> element.

<script type="text/javascript" src="app.js"></script>

The URL (relative or fully specified) of the JavaScript file is placed in the src attribute. The type attribute is not strictly needed in HTML5, but it is a good thing to include for both backward and (potentially) forward compatibility. Additionally, if there are multiple <script> elements on a page, it isn’t a bad idea to give them each unique id attributes.

It is generally considered better to include JavaScript as a separate file, rather than embedded on the page. This keeps functionality (JS) separate from content (HTML). However, there are practical exceptions to this general policy. For example, it is typical practice to include tracking code (such as the snippet of JS provided by Google Analytics) directly on the page.

Where to put JavaScript

There are two common places for including JavaScript on a page, in the <head> and below the <footer>.

Placing links to a JavaScript file in the <head> of a document makes good semantic sense: it is a script that has an impact on the page as a whole, so it belongs with other similar elements in the document <head>.

However, because the loading of the page is suspended while the JavaScript files are fetched and parsed, placing the <script> tags into the <head> has traditionally been frowned upon. The typical advice is to place them as the last element inside the <body>, just after the <footer>.

This is still good advice. However, there’s now a twist to the old advice: there is a way to place <script> elements into the document <head> without slowing down page rendering. As of HTML 5, the <script> tag includes an attribute named async. If you add the async attribute to your script tag, the browser fetches the file in the background while it continues to parse the page, and runs the script as soon as the download finishes.

<script type="text/javascript" src="app.js" async></script>

This should speed up the loading and rendering of the page as a whole. However, asynchronous loading and parsing may still cause some problems. If the data connection is slow or if the client’s computer is especially low-powered, it may still slow down the rendering of the page.

If you are especially concerned with squeezing out any inefficiency in slow connections and low-powered devices, it might still make sense to place your <script> tags at the bottom of the document.

JSON

A subset of JavaScript, which was touched on in the chapters on Semantic HTML, is JSON — JavaScript Object Notation.

JavaScript is an object oriented language, which means (among other things) that individual objects (in the real-world sense of the word) can be coded into data objects (in the programming sense of the word). For example, a blog post (a “real world” object) could be encoded as a JavaScript object:

{
    "title" : "Lorem Ipsum and All That Jazz",
    "author" : "Adam Wood",
    "content" :
      "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed mauris metus, euismod non sodales eu, molestie congue nibh. Nunc eu dignissim est. Donec non est a sapien rutrum imperdiet. Nunc vitae libero nec velit porta pulvinar vitae ut sapien. Aliquam consequat orci eget libero blandit semper. Aenean malesuada risus nec volutpat dapibus. Aliquam sit amet bibendum enim. Suspendisse at faucibus erat. Proin quis facilisis nisl. Vivamus sit amet enim elit. Aliquam nisl sapien, sagittis vitae nisi nec, vulputate efficitur urna."
}

As we saw with Semantic Markup, JSON can be used to encode data about the content of a page, for consumption as a Semantic Markup API. This is helpful for Google’s indexing.
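
To make the relationship between a JavaScript object and JSON text concrete, here is a small sketch using the built-in JSON.stringify and JSON.parse functions. The post object and its shortened content value are placeholders for illustration only.

```javascript
// A plain JavaScript object describing a blog post (placeholder values).
const post = {
  title: "Lorem Ipsum and All That Jazz",
  author: "Adam Wood",
  content: "Lorem ipsum dolor sit amet."
};

// Serialize the object to a JSON string, for example to send it to a server.
const jsonText = JSON.stringify(post);

// Parse the JSON string back into a new, independent JavaScript object.
const copy = JSON.parse(jsonText);

console.log(jsonText);
console.log(copy.title);
```

Note that the JSON text always uses double quotes around keys and strings, even though the object literal in the source code does not need quoted keys.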

There’s also another important use for JSON — AJAX.

AJAX

AJAX is a design pattern (not a specific technology) in which asynchronous HTTP requests are made by JavaScript, allowing fresh data to be populated onto a web page without the page needing to be reloaded.

AJAX originally stood for “Asynchronous JavaScript and XML,” because originally the most common data transfer language for this type of design was XML. However, JSON has become the most common language because it is lighter, more flexible, native to JavaScript, and easier to read. (Unfortunately, “AJAJ” isn’t very catchy, so the old name has stuck.)

The AJAX pattern doesn’t make any sense on most pages, but on web apps (like email, for example), it can make all the difference in the world.

AJAX is called “asynchronous JavaScript” because the JS code is making calls to the server at times other than on page load (which is when a browser usually makes calls to the server). Based on user actions or elapsed time, an AJAX-style web application will send a request to the server without reloading the page. The data from that response is then used to update the DOM (and thus, the view to the user) without reloading.
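
A rough sketch of this pattern using the browser's built-in fetch function is shown below. The URL "/api/messages", the element id "inbox", the response shape (a JSON array of objects with a text field), and the function names are all assumptions made for illustration; a real application would need error handling suited to its own server.

```javascript
// Turn the JSON text returned by the server into display strings.
// Assumes the response is an array like [{"text": "Hi"}, ...].
function messagesFromResponse(jsonText) {
  const messages = JSON.parse(jsonText);
  return messages.map((message) => message.text);
}

// Request new messages and update the page without reloading it.
// "/api/messages" and the element id "inbox" are hypothetical.
async function refreshInbox() {
  const response = await fetch("/api/messages");
  if (!response.ok) {
    throw new Error("Request failed with status " + response.status);
  }
  const lines = messagesFromResponse(await response.text());

  const inbox = document.getElementById("inbox");
  if (!inbox) {
    return;
  }
  inbox.replaceChildren(); // remove the old list items
  for (const line of lines) {
    const item = document.createElement("li");
    item.textContent = line;
    inbox.appendChild(item);
  }
}

// In a browser, check for new messages every 30 seconds.
if (typeof document !== "undefined") {
  setInterval(() => refreshInbox().catch(console.error), 30000);
}
```

The request is triggered by elapsed time here, but the same refreshInbox function could just as easily be called from a click handler.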

This type of design can be used in (for example) an email or chat application to send and fetch new messages. Social Networking sites use it to add new posts, and even some blogs use it to add newly-published content onto a page.

JSON is an ideal data format for this type of use case because it is native to JavaScript. XML has to be walked and converted into JavaScript values before a script can use the data, whereas JSON text maps directly onto JavaScript objects and arrays through a single built-in parsing step.

Additionally, because JSON can be the sole contents of a JavaScript file (which can have any domain as its src URL), JSON can be used to avoid the difficulties of the cross-origin lockout. Typically, web browsers prevent web sites from making requests to and receiving data from domains other than the domain of the primary document. However, JavaScript (like CSS and images) is treated as a separate resource, which can come from anywhere. With a JSON implementation of the AJAX pattern, the cross-origin request can be “disguised” as a resource inclusion.
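
This technique is commonly known as JSONP. A hedged sketch of the idea follows: the page defines a callback function, then adds a <script> element whose src points at another domain, and the server is assumed to reply with JavaScript that calls that function with the data. The URL "https://api.example.com/posts", the callback query parameter, and the function names are illustrative assumptions. Because the response runs as code, the approach only makes sense with a server you trust.

```javascript
// Build the script URL; the "callback" parameter name is an assumed convention.
function buildJsonpUrl(baseUrl, callbackName) {
  const separator = baseUrl.includes("?") ? "&" : "?";
  return baseUrl + separator + "callback=" + encodeURIComponent(callbackName);
}

// Browser-only: load data from another domain by including it as a script.
function loadJsonp(baseUrl, callbackName, onData) {
  // The server is expected to respond with: callbackName({...data...});
  window[callbackName] = function (data) {
    delete window[callbackName]; // clean up the temporary global
    onData(data);
  };
  const script = document.createElement("script");
  script.src = buildJsonpUrl(baseUrl, callbackName);
  document.head.appendChild(script);
}

// Example call (hypothetical URL), to be run in a browser:
// loadJsonp("https://api.example.com/posts", "handlePosts", function (posts) {
//   console.log("Received", posts);
// });
```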

JavaScript in the real world

jQuery

The syntax of JavaScript can be a bit verbose (and obtuse) at times, and a number of very common operations require an abundance of “boilerplate” code.

For this reason, many JS developers use a JavaScript library called jQuery. jQuery provides concise APIs for a number of common use-cases, like document-traversal, DOM manipulation, and AJAX.

Most programming languages have a “standard library,” a core set of language extensions that automate or abstract the most common things programmers need to do in the language. JavaScript does not have an official standard library. Though there are other competing projects with devoted developer bases (like Closure and Prototype), jQuery is, for many people, the JavaScript standard library.

Here is a very short example of how jQuery speeds up development. Suppose you want to change an attribute of an element.


// The "pure" JavaScript way.

document.getElementById("toChange").setAttribute("title","New Title Text");

// The jQuery way.

$("#toChange").attr("title","New Title Text");

In this case only about 30 characters are saved, but sometimes it is a lot more than that. And even the little savings add up over a big project.

JavaScript Front End Libraries

A lot of decent web designers, who have a good understanding of HTML and CSS, use JavaScript GUI enhancements without becoming JavaScript programmers.

This is possible because there are a number of JavaScript UI libraries (like jQuery UI) and front end frameworks (like Bootstrap) that provide a relatively easy HTML API.

An API (as mentioned above) is an Application Programming Interface. That is, a way to access the functionality of a piece of software from outside that software. In the context of an HTML API to JavaScript features, this means an easy way to include JavaScript features in HTML simply by adding things to the markup (usually special classes).
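
To show roughly how such an HTML API can work behind the scenes, here is a sketch of a tiny made-up library. It looks for elements with the class js-toggle and makes each one show or hide the element named in its data-target attribute. The class name, the attribute, and the function name enhanceToggles are all invented for this example; real libraries define their own conventions.

```javascript
// Hypothetical mini-library. Expected markup (an assumption for this sketch):
//   <button class="js-toggle" data-target="#details">More</button>
//   <div id="details" hidden>...</div>
function enhanceToggles(root) {
  const triggers = root.querySelectorAll(".js-toggle");
  triggers.forEach(function (trigger) {
    trigger.addEventListener("click", function () {
      // data-target holds a CSS selector for the element to show or hide.
      const target = root.querySelector(trigger.dataset.target);
      if (target) {
        target.hidden = !target.hidden;
      }
    });
  });
  return triggers.length; // number of elements that were enhanced
}

// In a browser, wire everything up once the document has been parsed.
if (typeof document !== "undefined") {
  document.addEventListener("DOMContentLoaded", function () {
    enhanceToggles(document);
  });
}
```

The page author never writes any JavaScript in this arrangement: adding the class and the attribute to the markup is enough.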

Exploring the feature set of JavaScript UI libraries can provide beginning developers (and not-so-beginning ones) with a lot of easy ideas for improving the interactivity of a site, and the tools to implement those ideas. (Just remember to pick one and stick with it for a project — try not to mix-and-match different UI libraries. The results are visual chaos and design confusion.)

JavaScript Front-End Application Frameworks

Separate from the front-end design frameworks (like Bootstrap) discussed in the previous chapter on CSS, a front-end application framework is a skeleton software app, which serves as a starting point not for the visual design but for the actual functionality of the web app.

JavaScript application frameworks like Backbone.js and Angular provide a structured template for a web app, automating a number of common tasks and providing a design architecture.

Most JS app frameworks implement the MVC, or Model View Controller, design pattern. This pattern works like this:

  • The Model handles the data
  • The View displays the data
  • The Controller connects the data in the model with the view, and handles application logic

Each JS framework implements this pattern differently, but almost all of them do some version of this.
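
The three roles above can be sketched in plain JavaScript, without any framework. This is only an illustration of the pattern under simplifying assumptions (the view returns a string instead of touching the DOM), and the class names are made up for the example. Real frameworks such as Backbone.js and Angular organize these pieces in their own ways.

```javascript
// Model: holds the data and notifies listeners when it changes.
class TodoModel {
  constructor() {
    this.items = [];
    this.listeners = [];
  }
  onChange(listener) {
    this.listeners.push(listener);
  }
  add(text) {
    this.items.push(text);
    this.listeners.forEach((listener) => listener(this.items));
  }
}

// View: turns data into something displayable (here, a plain string).
class TodoView {
  render(items) {
    return items.map((item, index) => (index + 1) + ". " + item).join("\n");
  }
}

// Controller: connects the model to the view and handles user actions.
class TodoController {
  constructor(model, view) {
    this.model = model;
    this.view = view;
    this.output = "";
    // Re-render whenever the model reports a change.
    model.onChange((items) => {
      this.output = view.render(items);
    });
  }
  addItem(text) {
    const trimmed = text.trim();
    if (trimmed !== "") {
      this.model.add(trimmed); // ignore empty input (application logic)
    }
  }
}

const controller = new TodoController(new TodoModel(), new TodoView());
controller.addItem("  Learn HTML ");
controller.addItem("Learn JavaScript");
console.log(controller.output);
```

Notice that the model knows nothing about how it is displayed, and the view knows nothing about where the data comes from. That separation is the main benefit the pattern is meant to provide.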

If you are going to attempt to build a JavaScript application, trying to build from scratch is almost always a terrible idea. Using a framework ensures that your app is built on a solid architectural foundation, and saves countless hours that otherwise would be spent coding low-level, generic functionality.

As has become the case with most software development, being a good JavaScript developer isn’t just about knowing how to code JavaScript. It’s about knowing what libraries and frameworks are best suited to various types of applications, and being able to use them with some degree of fluency.

Summary

JavaScript began life as a low-powered language used for silly effects and minor enhancements on web pages. But in the last decade or more a new generation of standards-compliant web browsers, along with advanced JavaScript libraries and frameworks, has turned the language into a serious platform for full-scale in-browser applications.

HTML vs. Everything

This chapter wraps up our HTML coverage by looking at alternatives to HTML in several contexts, and showing that HTML has become the dominant language for content across the modern technology landscape.

HTML as an alternative

HTML is the default language of the web, but it is also being used as a platform for other types of design and content.

E-books

The most popular Open Source standard for e-books, the .epub format — which is usable on almost all e-readers except the Amazon Kindle — is actually just an HTML based format. Individual chapters, and sections like Tables of Contents and Indexes, are individual HTML files. All the content files, along with assets like images and CSS files, are simply zipped into a single file and given a .epub file extension. In many ways, an .epub book is analogous to a website with many pages in it.

Amazon Kindle has always preferred a proprietary file format. The first generation of Kindles used .mobi, and after that a related format called .azw. These were more complex than the .epub format, and were not based on HTML.

However, the latest generation of Kindles uses the new .azw3 format, which is based on HTML 5. While it was always possible to create Kindle e-books from HTML via conversion software, HTML is now a primary authoring language for e-book content.

Mobile Apps

The two most popular mobile platforms, Android and iPhone, use completely different programming languages for app development. Android app development is typically done in Java, while iOS uses Objective-C and the newer language Swift.

This means that if you wanted to build and release an app for both platforms, you would typically need to build the entire thing twice in two different languages. This is fine for large software companies like Facebook, but it can put a serious strain on smaller development houses.

A few different solutions to this problem have been created, but the most intriguing is the use of HTML (along with JavaScript and CSS).

Tools like PhoneGap allow developers to build an app once in browser-based languages (HTML, JS, CSS) and then package it as an app for each operating system. The PhoneGap software wraps the browser-based application in a “chromeless browser,” a browser-style rendering frame that only views files within the app’s directory and provides no user-facing navigation. This can be done for any supported operating system, allowing apps to be built once and deployed everywhere.

HTML has become the universal language.

Alternatives to HTML

Even though HTML is the native language of the web, and a powerful platform for interoperability, there are some alternative technologies that refuse to go away.

Flash

Flash is a multimedia software platform that can run in most web browsers with a plugin.

In the late 1990s, there was nothing cooler than a Flash-based website. They were highly interactive, animated, and boasted better graphics than simple HTML-based sites. You could even have them play music and video.

These sites quickly became tedious and annoying, but the technology hung on out of habit (people write what they are used to writing) and out of fear that sites and applications written in HTML 5 and JavaScript wouldn’t be supported in all browsers. Internet Explorer 6 remained in heavy use for years after the advent of better, standards-compliant browsers, so people kept pushing out Flash-based sites.

For the most part, this trend has died down. Flash is now used mostly for desktop and mobile video games, and its use on the web has been largely curtailed among the smart crowd.

Unfortunately, small non-tech businesses (especially churches and community non-profits) are perpetually several years behind in technology trends, and many of them still want to include Flash elements (like a “Flash Intro”) into their website. This is almost always a mistake.

  • The most common use of Flash on non-interactive websites is the “Flash Intro,” which is incredibly annoying to users. No one wants to wait and watch your entertaining pre-show before finding the things they were looking for on your site. Never, ever do this.

  • Some people think it is a good idea to place content and menus into a Flash app, so that they can create cool effects like light-up menu items or rolling tabs. This is a bad idea:
    • People care about your content, not your special effects. Make your content easy to read and easy to navigate. No one is going to stay on your site longer or recommend it to their friends because they liked the way your content unfurled like a scroll when they clicked the flying menu button.
    • Placing content into Flash, instead of HTML, hides it from Search Engines, making your site effectively invisible to Google.
    • Placing content into Flash means that any particular view of your content is actually a specific state in a running app, not a shareable URL. This makes it very difficult for users to bookmark or share your content.
    • Content and navigation in a Flash app is not accessible to screen readers, making your site unavailable to people with visual disabilities.
    • Most of the effects you might want to create in Flash can be created more easily in JavaScript and CSS, without creating any of these problems.

PDF

PDF — Portable Document Format — is a great cross-platform format for print-focused documents.

With PDF, you create a single view on your content. A PDF has specific page dimensions, a particular layout and document flow, a particular font, a particular text-size. PDFs can embed information for printers (like ink colors).

All the things that make PDFs a good choice for things like sheet music and print books make it a terrible choice for online content.

And yet, many people and organizations (especially small non-tech businesses and non-profits) continue to publish PDFs to the web. A pervasive pattern of behavior is the creation of print-focused brochures which are then made available on a website, while no one ever actually receives a printed copy.

Anyone who wants to see the information in your brochure does not want to see a PDF. They want your content, and you should provide it in the format appropriate to the medium. In the case of browsers, that format is HTML.

One particularly egregious use of PDFs where HTML would be a better choice is fillable forms. Many organizations create PDF forms which can be filled out inside a PDF reader and then emailed back. One imagines some overworked secretary copying these forms into whatever database system the office uses internally.

A more sensible solution would be an online HTML form, which posts the data directly into the database application. This would save labor and reduce errors. With HTTPS, it would be even more secure than emailing PDF forms around.

Summary: HTML wins

HTML has become the universal language of the web, as well as related technologies like e-books and mobile apps.

In some cases HTML is one choice among several, providing a set of benefits and drawbacks just like any other technology choice. In other cases HTML is not just the clear winner, but the only sensible choice.