HTML and CSS Standards

Webpage Analysis Criteria

Below are the criteria by which I judge the quality of a site's HTML/CSS and its accessibility. The analysis starts with a numerical count of various features and then evaluates the meaning of those numbers within the given context. There are few absolutes, but neither validation errors nor <font> tags should exist.

The whole purpose of the HTML/CSS distinction is to separate structure from presentation through judicious use of CSS in combination with well-structured HTML. HTML provides a semantic and logical structure to the file while the CSS provides the appearance you want to achieve. The separation also makes it possible to change the appearance of an element throughout a site from one location, the CSS file(s), rather than having to edit each HTML file or some collection of include files. This process also simplifies meeting accessibility standards.

For my own part, all code will validate as correct HTML and CSS and will address accessibility issues. I use XHTML 1.0 Strict unless otherwise required. This most restrictive implementation path ultimately leads to the most flexible, accessible, and easily maintainable code.

There is a glossary at the end of this document in case any term, especially an acronym, is unfamiliar.

HTML

DOCTYPE
Is there a DOCTYPE declaration and, more importantly, does the code conform to it? Very often there is a declaration of HTML 4.01 but then the code includes XHTML tag closers on contentless tags (e.g., link, input), or vice versa. This throws the validator into a tizzy, producing spurious errors and masking real ones.
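As a minimal sketch of a declaration and code that agree with each other, the page below declares HTML 4.01 Strict and therefore writes link and input without XHTML closers. The file name, form action, and field name are placeholders of my own, not taken from any site discussed here.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html lang="en">
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  <title>Example page</title>
  <!-- HTML 4.01: contentless tags have no trailing slash.
       Under an XHTML 1.0 DOCTYPE the same tag would end with " />". -->
  <link rel="stylesheet" type="text/css" href="site.css">
</head>
<body>
  <form action="/search" method="get">
    <p><input type="text" name="q"></p>
  </form>
</body>
</html>
```

Mixing the two syntaxes in either direction is what produces the spurious validator errors described above.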

Validation errors
With standards well defined and agreed upon, everyone should strive to meet them. Adherence improves accessibility and uniformity across platforms and user agents. It also reduces the chance of user agents giving wildly different interpretations to the applied CSS (I've seen some really screwy-looking material when code violated standards).
Some recent, sophisticated developments in accessibility rely on techniques that don't validate under the present system. Those are the only exceptions to validation that should be accepted.

<font> tags
Totally unnecessary with CSS and a heavy-handed and inflexible way of controlling text. CSS has more ways to achieve more effects.
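A small sketch of the replacement, assuming a hypothetical class name (notice) and illustrative values. The style element is shown next to the markup only for brevity; in practice the rule would live in an external CSS file.

```html
<!-- Instead of: <font face="Arial" size="2" color="#990000">Sale ends Friday</font> -->
<style type="text/css">
  /* One rule controls every element carrying this class, site-wide */
  .notice { font-family: Arial, sans-serif; font-size: 85%; color: #990000; }
</style>
<p class="notice">Sale ends Friday</p>
```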

inter-element spacing achieved by using <br> tags, &nbsp;, and spacer images
Also made obsolete by CSS. Margins and padding can be controlled better and margins can be negative to achieve visual effects like these outdented paragraphs (effect achieved here through a semantic/HTML method).
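The following is one possible CSS-only way to get paragraph spacing and an outdented label without br tags, non-breaking spaces, or spacer images. It is a sketch, not the method used on this page; the class names (entry, term) and the 3em measure are assumptions for illustration.

```html
<style type="text/css">
  p { margin: 0 0 1em 0; }            /* vertical gap between paragraphs */
  .entry { margin-left: 3em; }        /* indent the whole entry */
  .entry .term { margin-left: -3em; } /* negative margin pulls the label back out */
</style>
<div class="entry">
  <p class="term">DOCTYPE</p>
  <p>Is there a DOCTYPE declaration and does the code conform to it?</p>
</div>
```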

<table> tags
Generally superseded by <div> and CSS for layout purposes. CSS is far more flexible and allows the HTML to be structured in a logical fashion for people who use alternate access methods (e.g., people with disabilities); tables often present material in an illogical order. Any page with more than five tables for layout is using them badly, though I have also seen a single table used badly. Tabular data should take full advantage of <th>, <colgroup>, and other structural markup, though colgroup, thead, etc. are not well supported in CSS by current browsers. We hope that will change in the new releases.
For a demonstration of the flexibility provided by abandoning tables for layout purposes see CSS Zen Garden. In the right-hand column you will find alternate views of the same material—exactly the same material since there is no difference in the HTML; the apparent changes are in the applied CSS. There are over 200 designs the webmaster has thought worthy of posting—my guess is that many times that were rejected. My favorite is “Mozart”, number 189.
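For genuinely tabular data, the structural markup mentioned above might look like the sketch below. It uses XHTML syntax; the caption, class names, and figures are invented purely for illustration.

```html
<table>
  <caption>Markup counts per page (illustrative figures only)</caption>
  <colgroup>
    <col class="page" />
    <col class="count" span="2" />
  </colgroup>
  <thead>
    <tr>
      <th scope="col">Page</th>
      <th scope="col">Layout tables</th>
      <th scope="col">Font tags</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th scope="row">Home</th>
      <td>6</td>
      <td>40</td>
    </tr>
    <tr>
      <th scope="row">Contact</th>
      <td>0</td>
      <td>0</td>
    </tr>
  </tbody>
</table>
```

The scope attribute tells a screen reader which header applies to each cell, which is the kind of logical structure a layout table cannot provide.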

<div> tags
The preferred structural element, but can also be over-used with multiple nesting levels just because someone didn't bother to think out structure (tables redux). Machine generation (e.g., PHP, ColdFusion) often prompts people to stop thinking and overload a page (especially when using tables but also with divs).

<h#> tags
Should be used to give structure to the page and never used solely to size text. They are needed only if the page has a relevant structure. If used, they must start with <h1> and proceed in proper nesting order. They serve the same purpose as an outline, but on the Web the page includes the text that fills in the outline. Accessibility standards encourage their use, at least for <h1>, and a blind informant says he relies heavily on heading levels.
For instance, this page uses “HTML and CSS Standards” as its <h1> and the page title, “Webpage Analysis Criteria”, as its <h2>. It then also has several <h3> and <h4> tags to organize subsidiary portions. The size and other display characteristics are controlled by CSS.
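A sketch of that outline follows. The h1 and h2 text comes from the description above; which sections sit at h3 versus h4 is my assumption, and the font sizes are illustrative.

```html
<style type="text/css">
  /* Size is a presentation decision, made here rather than by picking a heading level */
  h3 { font-size: 120%; }
  h4 { font-size: 100%; font-weight: bold; }
</style>
<h1>HTML and CSS Standards</h1>
<h2>Webpage Analysis Criteria</h2>
<h3>HTML</h3>
<h4>DOCTYPE</h4>
<h4>Validation errors</h4>
<h3>CSS</h3>
<h4>Text size</h4>
```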

<ul> tags
Should be used for all menus and for anything else that looks like a list. The fact that the menu is horizontal or you don't want bullets is irrelevant; we're talking structure, not appearance. CSS performs the bulletless horizontal magic (see Zephyr Press for a horizontal menu—top and bottom—and MGA or Wheelchair Mobility for a vertical menu whose buttons are solely CSS creations).
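A minimal sketch of the bulletless horizontal menu, assuming a hypothetical id (nav) and placeholder URLs. It is not the code used by any of the sites named above.

```html
<style type="text/css">
  #nav { list-style: none; margin: 0; padding: 0; } /* remove bullets and default indent */
  #nav li { display: inline; }                       /* run the items across the page */
  #nav a { padding: 0.25em 1em; text-decoration: none; }
</style>
<ul id="nav">
  <li><a href="/">Home</a></li>
  <li><a href="/products/">Products</a></li>
  <li><a href="/contact/">Contact</a></li>
</ul>
```

Dropping the display: inline rule turns the same markup into a vertical menu, which is the point: the structure stays a list either way.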

forms
Is the form restricted in scope to its place of use? People commit one of two form sins: they enclose the whole page in a form even though the actual form is only a small part of the page, or they break the form across multiple structural elements. Avoiding the latter by committing the former is not a solution. A form whose only purpose is to present a search box should consist of little more than the associated input tags. Feedback and data entry forms may require extensive structural elements within the form, but not the whole page. Locating a form syntactically correctly also seems to be a challenge.
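A sketch of a search form restricted to its place of use, in XHTML syntax. The ids, action URL, and field name are hypothetical.

```html
<div id="header">
  <!-- The form wraps only its own controls, not the header or the page -->
  <form id="search" action="/search" method="get">
    <p>
      <label for="q">Search</label>
      <input type="text" id="q" name="q" />
      <input type="submit" value="Go" />
    </p>
  </form>
</div>
<div id="content">
  <p>Page content sits outside the form.</p>
</div>
```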

Skip links
Extra credit. These provide on-page navigation aids for people with disabilities, especially the blind (see Zephyr Press).
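One common form of skip link is sketched below, assuming hypothetical ids and class names. It is not necessarily how Zephyr Press does it.

```html
<body>
  <!-- First link on the page: jumps past the menu to the main content -->
  <p class="skip"><a href="#content">Skip to main content</a></p>
  <ul id="nav">
    <li><a href="/">Home</a></li>
    <li><a href="/about/">About</a></li>
  </ul>
  <div id="content">
    <h1>Page title</h1>
  </div>
</body>
```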

CSS

Validation errors
Not generally a severe problem, but sometimes people invent values or even properties or use a value that's not valid for that property.

Efficiency/readability
This is where WYSInWYG tools truly shine in their stupidity. I have seen rules like “elementx { border-top: 1px solid red; padding-bottom: 4px; padding-top: 4px; margin-right: 7px; border-left: 1px solid red; margin-left: 7px; border-bottom: 1px solid red; margin-top: 7px; border-right: 1px solid red; padding-left: 4px; padding-right: 4px; margin-bottom: 7px; }”. I've seen this sort of thing many times; it is not rare (and often font or other information is included randomly just to add to the confusion). If someone actually wants to read the code without having to feed it into some compatible WYSInWYG tool, it will take many minutes to figure out that the rule says “elementx { margin: 7px; border: 1px solid red; padding: 4px; }”. And the first way probably isn't even the right way when there are differences in the TRBL values.
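When one side does differ, my reading of the last point is that the shorthand still belongs first, followed by the single exception. The sketch below assumes a hypothetical class name and values.

```html
<style type="text/css">
  .special-note {
    margin: 7px;
    border: 1px solid red;
    border-bottom-width: 3px;  /* the one exception, stated after the shorthand */
    padding: 4px 4px 8px 4px;  /* top, right, bottom, left */
  }
</style>
```

Order matters: a shorthand written after the longhand would reset the exception.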

External vs. page-level vs. inline
As much as possible CSS should be put in external files for sharing across pages. A typical page should have no inline styles and only a minimum of page-level styles. Inline styles should be used only when there is a single usage of that style and it is unlikely to be needed by any other element. Home pages are often different from the rest of a site; a few page-level styles are okay but extensive CSS should be moved to a home page-specific CSS file. Remember, an HTML page can reference as many CSS files as required to do the job and one CSS file can reference others to help provide some coherent structure. I hate trying to read through CSS files that are 20+ screens long (there are a lot of them) just to find the code that applies to some limited portion of a page. Some of those bloated CSS files are a result of the efficiency issue mentioned above, but not all.
As an example of multiple CSS files, King's College London uses a different colour scheme for each major portion of the Website—Undergraduate, Graduate, Research, etc. This is effected by the invocation of different external CSS files whose only function is to control issues surrounding colour (background, border, associated images, etc.). Other CSS files create a consistent appearance across the whole Website.
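A sketch of that arrangement in XHTML syntax. The file names are invented for illustration and are not King's College London's actual files.

```html
<head>
  <title>Undergraduate study</title>
  <!-- Shared rules for the whole site -->
  <link rel="stylesheet" type="text/css" href="/css/base.css" />
  <!-- Colour-only rules for this section; other sections link a different file here -->
  <link rel="stylesheet" type="text/css" href="/css/undergraduate-colours.css" />
  <!-- One CSS file can pull in another; for example, the first line of base.css
       could be: @import url("typography.css"); -->
</head>
```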

Text size
Should be specified relatively rather than absolutely, so it can be resized by users (see browser standards for basic type information and two scalable methods).
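The two scalable methods are not named here; the sketch below assumes they are percentages and ems, which are the usual relative units. The selectors and sizes are illustrative.

```html
<style type="text/css">
  body { font-size: 100%; }      /* start from the user's own default size */
  h1 { font-size: 1.6em; }       /* relative to the parent's size */
  .footnote { font-size: 85%; }  /* also relative, so it scales with the rest */
  /* Avoid: p { font-size: 11px; } because some browsers will not let users resize it */
</style>
```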

Class/id names
Should be chosen to reflect the function of the matter covered and not its appearance (e.g., bad: class="bluebox"; good: id="special-note").

Some things, like class and id name choices, cannot be quantified. Another is where to place rules, and yet another is how much repetition to tolerate. These are judgment calls where I choose the highest level that can easily be controlled. For example, people often specify the same font-family for all paragraphs, headings, and table data when it could be specified once in the body rule (e.g., the MIT home page specified it eleven times, yet at no point used any other font-family, even the default; ditto WGBH. These sites have recently updated their usage or are in the process of doing so, and I need to find other examples).
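A sketch of choosing the highest level, with an illustrative font stack. Because font-family is an inherited property, paragraphs, headings, and list items pick it up from body.

```html
<style type="text/css">
  /* Stated once, at the top of the tree */
  body { font-family: Verdana, Arial, sans-serif; }
  /* Only a genuine exception needs its own declaration */
  code { font-family: "Courier New", monospace; }
</style>
```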

A large CSS file is not, by itself, an indication of good CSS usage. Very often CSS is over-specified and underutilized. Bloat comes from the things already mentioned as well as from creating many more classes than needed. The BBC has at least seven CSS files, all large, and I couldn't find the organizing principle. One class name can occur in multiple files, making it frighteningly difficult to find the rule that applies at any given moment. Yet many such sites still have a <body> tag that specifies the non-standard attributes marginheight, marginwidth, leftmargin, and topmargin, all of which are correctly handled in CSS.
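The CSS equivalent of those four attributes is a single rule, sketched here with zero as an assumed value.

```html
<!-- Instead of: <body marginheight="0" marginwidth="0" leftmargin="0" topmargin="0"> -->
<style type="text/css">
  body { margin: 0; padding: 0; }
</style>
```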

For the basics of CSS usage check A CSS Quick Reference.

JavaScript

As with CSS, as much JavaScript as possible, and certainly all functions, should be in external files to reduce clutter and load times, leaving only the function invocations (onload, etc.) in the HTML. JavaScript also needs a workaround for user agents that don't recognize JavaScript or have it turned off. See J Korpela for one example of how to fix a common problem and see the rest of the page for more JavaScript advice, including an introduction that says “Specifically, one should never rely on JavaScript alone in the processing of data entered by user” (my emphasis).
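A sketch of both points, in XHTML syntax. The file name and the functions initPage and openHelp are hypothetical; they are assumed to be defined in the external file, with openHelp returning false once it has handled the click.

```html
<head>
  <title>Example page</title>
  <!-- All functions live in the external file -->
  <script type="text/javascript" src="/js/site.js"></script>
</head>
<body onload="initPage();">
  <!-- The href is a real page, so the link still works with JavaScript off;
       when scripting is available the handler takes over and cancels the jump. -->
  <p><a href="/help.html" onclick="return openHelp(this.href);">Help</a></p>
</body>
```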

That being said, the Web is constantly evolving and where it started out as an alternate publishing medium, it has recently also acquired the function of an alternate application platform. Instead of writing a document in MSWord or some other word processor and then sending copies (print or electronic) to interested parties, it is now possible to write the document on a word processor accessed through the web, have it immediately available to others, and to allow them to contribute to or modify/edit it themselves. You can also create and submit forms whose contents vary according to initial and evolving conditions, where JavaScript changes the page dynamically, without going back to the server. I don't think the standards have caught up with this situation.

Accessibility

The Web Accessibility Initiative (WAI) is the W3C set of “Strategies, guidelines, resources to make the Web accessible to people with disabilities.” The method of achieving accessibility is set out in the Web Content Accessibility Guidelines (WCAG 1.0 – stable – and 2.0 – under development).

WCAG 1.0 consists of fourteen guidelines, each with several checkpoints which are grouped into three priority levels. Some of these checkpoints can be checked with automated tools while others must be checked manually.

The U.S. government has standards set forth in Section 508 that government sites and contractors are supposed to follow. They are similar to WAI, but not as rigorous. The U.K. also has its own set of standards, as do other governments.

Typically, simply converting from the old, table-based structure to modern structure and validating the code will reduce the number of accessibility errors and warnings significantly. For instance, the “alt” attribute for images is required for both validation and WCAG. In addition, WCAG wants the contents of the alt attribute to be meaningful; that has to be a manual check. Eliminating the use of images as spacers thus eliminates all associated errors and warnings.
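A sketch of the difference between an alt attribute that merely validates and one that is meaningful. File names, dimensions, and wording are invented for illustration.

```html
<!-- Validates, but tells a screen-reader user nothing -->
<img src="/images/map.png" alt="image" width="400" height="300" />

<!-- Validates and is meaningful; only a person can judge this part -->
<img src="/images/map.png" alt="Campus map showing the library north of the main quad" width="400" height="300" />

<!-- A purely decorative image takes an empty alt; better still, replace it with CSS -->
<img src="/images/divider.png" alt="" width="400" height="2" />
```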

Several online tools make it relatively easy to find and fix many errors.

Graphs

Page graph

Websites_as_Graphs

This tool graphs the tags on an individual page within a website, despite its name. It creates a tree of color-coded nodes that gives some idea of how the page is put together. For instance, lots of red says table-based structure and green indicates div-based. Lots of nodes off the body tag indicates a probable lack of structure. All elements within a form should be clustered together (apparently harder than one might think). I've also seen pages where the form tag encloses everything (<body><form>…</form></body>), even though only a few lines, if any, are the real form. Lots of images may indicate their use as spacers. I'd like to see an additional color for lists, since they should be a strong structural element. Not all table tags get colored red; caption, th, thead, tbody, tfoot, col, and colgroup are omitted.

Unbranched chains of red or green indicate nesting that is probably not well thought out and therefore unnecessary.

What do the colors mean?

Site graph

Recommendations for a good one gratefully accepted.

http://www.touchgraph.com/ (would I have to construct the tool myself?)

VisVIP (check this out)

Glossary

<…>
Material enclosed between angle brackets constitutes one of many HTML tags and associated attributes which control what appears on the computer screen.

Accessibility
The concept that Web pages should be structured and constructed in such a way that they are available to the widest possible range of people regardless of access method. Some accommodations are directed at hand-held devices or text-only browsers. Others address disabilities ranging from color-blindness to cognitive and physical impairments (perhaps 10% of the U.S. population).

CSS
Cascading Style Sheets—a tri-level system of applying rules to control the appearance of Web pages. The rules consist of one or more property/value pairs and can be applied to multiple pages with external files, to a single page, or to a single tag. A CSS Quick Reference gives the basic outline for use.

DOCTYPE
A formal statement at the beginning of a conforming document of its Document Type Definition (DTD), a rigorous specification for a language so the user agent knows how to treat what follows. Failure to include a DOCTYPE leaves the user agent to guess at what parsing rules to use and how best to display the document.
Browsers operate in “standards mode” or “quirks mode” depending on whether a correct DOCTYPE is present. Quirks mode tries to reproduce the bad old rendering behaviors, which do not display the same way in standards mode.

HTML
HyperText Markup Language, the basic language for writing pages that appear on the World Wide Web (WWW). This includes XHTML (eXtensible HTML), which is an application of XML (eXtensible Markup Language), a more rigorous definition of how a markup language should be structured. Until HTML version 4.0 the language did not have a clear definition that most players accepted and agreed would be the basis for browser and other user agent development.

Tag (W3C often uses “element” to refer to a tag)
The basic structural element of HTML which may include several attribute/value pairs to more precisely control the presentation of a Web page.

TRBL
Top, Right, Bottom, Left (TRouBLe, i.e., stay out of trouble by following this sequence); the sequence for interpreting CSS shorthand properties. For example, the rule “img { margin-left: 5px; margin-right: 2px; margin-top: 10px; margin-bottom: 0px }” is more simply and clearly written as “img { margin: 10px 2px 0px 5px; }”.
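The same sequence governs the shorter forms of the shorthand, where missing values are copied from the opposite side. The class names below are placeholders.

```html
<style type="text/css">
  .a { margin: 10px; }            /* one value: all four sides */
  .b { margin: 10px 2px; }        /* two: top and bottom 10px, right and left 2px */
  .c { margin: 10px 2px 0; }      /* three: top 10px, right and left 2px, bottom 0 */
  .d { margin: 10px 2px 0 5px; }  /* four: top, right, bottom, left */
</style>
```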

User agent
Any device through which a person accesses the Web, whether it be one of the standard browsers, a handheld device (PDA, cell phone), a text-only browser, or screen reader or tactile device for the blind (list not exhaustive).

Validation
The process of measuring HTML or other code against a precise syntactic definition or other specification of a standard.

W3C
World Wide Web Consortium, the body responsible for setting standards for the Web, i.e., HTML, CSS, etc. Its members represent the various stakeholders in the Web.

Web or WWW
Shorthand for World Wide Web. (WWW is sometimes called “dub-dub-dub” to avoid having to say so many syllables.)

Website
The collection of pages (one or many; static or dynamic) which originate at a single page (generally designated the home page) which is itself uniquely identified by a WWW domain name (e.g., NPR.org).

WYSInWYG
What You See Is NOT What You Get: my reformulation of the usual WYSIWYG (What You See Is What You Get), a description of tools that, like MSWord and unlike earlier computer tools, purport to immediately reflect the appearance of the final product. A WYSInWYG tool mimics its namesake but has no hope of actually fulfilling that mission because of internal and external constraints beyond its control. Any visual web authoring tool is WYSInWYG because it uses an internal browser which is of necessity different from all outside browsers.

There is also another similar formulation called WYSINWOG—What You See Is Not What Others Get. Again, the reason is that each browser interprets the code differently and not everyone is using the standard visual browsers. That's why disciplined use of modern methods and standards is necessary.