(X)HTML Follows a Content Model
All forms of markup support a content model that specifies that
certain elements are
supposed to occur only within other elements. For example, markup like this
<ul>
<p>What a simple way to break the content model!</p>
</ul>
which often is used for simple indentation, actually doesn't follow
the content model for the
strict (X)HTML specifications. The <ul> tag is only supposed to
contain <li> tags. The <p>
tag is not really appropriate in this context. Much of the time, Web
page authors are able to
get away with this, but often they can't. For example, in some
browsers, the <input> tag
found outside a <form> tag is simply not displayed, yet in other browsers it is.
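The content model rule above can be checked mechanically. What follows is a minimal Python sketch, not part of any specification or validator: it uses the standard library's html.parser module, and the names ListContentChecker and ul_violations are hypothetical helpers invented for this illustration. It reports any element other than <li> that appears directly inside a <ul>, and it ignores details such as elements with optional close tags.

```python
from html.parser import HTMLParser


class ListContentChecker(HTMLParser):
    """Records elements other than <li> found directly inside a <ul>."""

    def __init__(self):
        super().__init__()
        self.open_tags = []   # stack of currently open elements
        self.violations = []  # (parent, child) pairs that break the rule

    def handle_starttag(self, tag, attrs):
        # A direct child of <ul> must be <li> under the strict content model.
        if self.open_tags and self.open_tags[-1] == "ul" and tag != "li":
            self.violations.append(("ul", tag))
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, if there is one.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass


def ul_violations(markup):
    """Return the list of (parent, child) content model violations."""
    checker = ListContentChecker()
    checker.feed(markup)
    checker.close()
    return checker.violations


print(ul_violations("<ul><p>What a simple way to break the content model!</p></ul>"))
print(ul_violations("<ul><li><p>This follows the content model.</p></li></ul>"))
```

The first call reports the stray <p>, while the second reports nothing because the paragraph sits inside an <li>.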
Elements Should Have Close Tags Unless Empty
Under traditional HTML, some elements have optional close tags. For
example, both of the
paragraphs here are allowed, although the second one is better:
<p>This isn't closed
<p>This is</p>
However, given the content model, the close of the top paragraph can
be inferred since its
content model doesn't allow for another <p> tag to occur within it.
HTML5 continues to
allow this, as discussed in Chapter 2.
A few elements, like the horizontal rule (hr) and line break (br), do
not have close tags
because they do not enclose any content. These are considered empty
elements and can be
used as is in traditional HTML. However, under XHTML you must always
close tags, so
you would have to write <br></br> or, more commonly, use a
self-closing tag format with
a final "/" character, like so: <br />.
Unused Elements May Minimize
Sometimes tags may not appear to have any effect in a document.
Consider, for example,
the <p> tag, which specifies a paragraph. As a block tag, it induces a
return by default. But does using it repeatedly, like so,
<p></p><p></p><p></p>
produce numerous blank lines? No, since the browser
minimizes the empty p
elements. Some HTML editors output nonsense markup such as
<p>&nbsp;</p><p>&nbsp;</p><p>&nbsp;</p>
to deal with this. If this looks like misused markup to you, you're right!
Elements Should Nest
A simple rule states that tags should nest, not cross; thus
<b><i>is in error as tags cross</b></i>
whereas
<b><i>is not since tags nest</i></b>
and thus is syntactically correct. All forms of markup, traditional
HTML, XHTML, and
HTML5, follow this rule. While crossing tags may seem harmless, it
does introduce
some ambiguity in parse trees. For markup to be well formed, proper
nesting is mandatory.
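The nesting rule amounts to a stack discipline: each close tag must match the most recently opened element. The Python sketch below illustrates that idea with the standard library's html.parser module. The names NestingChecker and crossing_errors are hypothetical, and the sketch does not account for empty elements or optional close tags, so it is an illustration rather than a validator.

```python
from html.parser import HTMLParser


class NestingChecker(HTMLParser):
    """Flags close tags that do not match the most recently opened tag."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of currently open elements
        self.errors = []     # (expected, found) pairs for crossed tags

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
        else:
            expected = self.open_tags[-1] if self.open_tags else None
            self.errors.append((expected, tag))


def crossing_errors(markup):
    """Return (expected, found) pairs for every mismatched close tag."""
    checker = NestingChecker()
    checker.feed(markup)
    checker.close()
    return checker.errors


print(crossing_errors("<b><i>is in error as tags cross</b></i>"))
print(crossing_errors("<b><i>is not since tags nest</i></b>"))
```

For the crossed example, the checker expected </i> but found </b>; the nested example produces no errors.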
Attributes Should Be Quoted
Under traditional HTML as well as under HTML5, simple attribute values
do not need to be
quoted. If the attribute contains only alphanumeric content, dashes,
and periods, then the
quotes can safely be removed; so,
<img src=robot.gif height=10 width=10 alt=robot>
would work fine in most browsers and would validate. However, the lack
of quotes can
lead to trouble, especially when scripting is involved. Quotes should
be used under
transitional markup forms and are required under strict forms like XHTML; so,
<img src="robot.gif" height="10" width="10" alt="robot" />
would be the correct form of the tag. Generally, it doesn't matter
whether you use single or
double quotes, unless other quotes are found within the quotes, which
is common with
JavaScript or even with CSS when it is found in an attribute value.
Stylistically, double
quotes tend to be favored, but either way you should be consistent.
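The quoting rules just described can be sketched in a few lines of Python. The helper names may_omit_quotes and quote_attribute are hypothetical, and the set of "safe" characters is taken directly from the description above (alphanumerics, dashes, and periods) rather than from the exact grammar of any specification.

```python
import re

# Characters described above as safe without quotes:
# letters, digits, dashes, and periods.
SAFE_UNQUOTED = re.compile(r"[A-Za-z0-9.-]+")


def may_omit_quotes(value):
    """True if the value contains only characters that are safe unquoted."""
    return SAFE_UNQUOTED.fullmatch(value) is not None


def quote_attribute(value):
    """Wrap a value in quotes, picking a style that does not clash."""
    if '"' not in value:
        return '"' + value + '"'
    if "'" not in value:
        return "'" + value + "'"
    # Both quote styles appear in the value, so escape the double quotes.
    return '"' + value.replace('"', "&quot;") + '"'


print(may_omit_quotes("robot.gif"))
print(may_omit_quotes("my robot"))
print(quote_attribute('alert("hi")'))
```

Always quoting, as quote_attribute does, is the simpler policy; the check in may_omit_quotes only shows when traditional HTML would let you skip the quotes.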
Entities Should Be Used for Special Characters
Markup parsers are sensitive to special characters used for the markup
itself, like < and >.
Rather than being written directly in the document, these potentially
parse-dangerous characters should
be escaped using a character entity. For example, instead of <,
use &lt; or the numeric
equivalent &#60;. Instead of >, use &gt; or &#62;. Given that the
ampersand character has
special meaning in an entity, it needs to be escaped as well
using &amp; or &#38;.
Beyond escaping characters, it is often necessary to insert special quote
characters, legal symbols like copyright and trademark, currency,
math, dingbats, and a
variety of other difficult-to-type symbols. Such characters are also
inserted with entities. For
example, to insert the Yen symbol (¥), you would use &yen; or &#165;.
With Unicode in
play, there is a vast range of characters to choose from, but
unfortunately there are
difficulties in terms of compatibility, all of which is discussed in Appendix A.
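When markup is generated by a program, escaping is usually delegated to a library rather than done by hand. As a brief illustration, Python's standard html module provides escape and unescape functions; the wrapper names escape_text and decode_entities below are hypothetical and exist only to label the two directions.

```python
import html


def escape_text(text):
    """Escape <, >, and & (and quotes) so the text is safe inside markup."""
    return html.escape(text)


def decode_entities(text):
    """Turn named or numeric character references back into characters."""
    return html.unescape(text)


print(escape_text("if (a < b && b > c)"))
print(decode_entities("&yen; and &#165; are the same character"))
```

Note that the ampersand is escaped along with the angle brackets, and that the named and numeric forms of an entity decode to the same character.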
Browsers Ignore Unknown Attributes and Elements
For better or worse, keep in mind that browsers will ignore unknown elements and
attributes; so,
<bogus>this text will display on screen</bogus>
and markup such as
<p id="myPara" obviouslybadattribute="TRUE">will also render fine.</p>
Browsers make best guesses at structuring malformed content and tend
to ignore code
that is obviously wrong. The permissive nature of browsers has
resulted in a massive number
of malformed HTML documents on the Web. Oddly, from many people's
perspective, this
isn't an issue, because the browsers do make sense out of the "tag
soup" they find. However,
such a cavalier use of the language creates documents with shaky
foundations at best. Once
other technologies such as CSS and JavaScript are thrown into the mix,
brazen flouting of the
rules can have repercussions and may result in broken pages.
Furthermore, to automate the
exchange of information on the Web, collectively we need to enforce
stricter structure of our
documents. The focus on standards-based Web development and future
development of
XHTML and HTML5 brings some hope for stability and structure of Web documents.
Major Themes of (X)HTML
The major themes addressed in this section are deep issues that you
will encounter over and
over again throughout the book.
Logical and Physical Markup
No introduction to (X)HTML would be complete without a discussion of the logical
versus physical markup battle. Physical markup refers to using a
markup language such
as (X)HTML to make pages look a particular way; logical markup refers
to using (X)HTML
to specify the structure or meaning of content while using another
technology, such as CSS,
to designate the look of the page. We begin a deeper exploration of
CSS in Chapter 4.
Physical markup is obvious; if you want to highlight something that is
important to the
reader, you might embolden it by enclosing it within a <b> tag:
<b>This is important!</b>
This simple approach fits with the WYSIWYG (what you see is what you
get) world of programs
such as Microsoft Word.
Logical markup is a little less obvious; to indicate the importance of
the phrase, it should
be enclosed in the logical strong element:
<strong>This is important.</strong>
Interestingly, the default rendering of this would be to embolden the
text. Given the
difference, it seems the simpler, more obvious approach of using a <b>
tag is the way to go.
However, the semantic meaning of strong actually provides a bit more
flexibility and is
preferred. Remember, the <strong> tag is used to say that something is
important content,
not to indicate how it looks. If a CSS rule were defined to say that
important items should
be big, red, and italic
<style type="text/css">
strong {font-size: xx-large; color: red; font-style: italic;}
</style>
confusion would not necessarily ensue, because we shouldn't have a
predisposed view of
what strong means visually. However, if we presented a CSS rule to
make <b> tags act
as such, it makes less sense because we assume that the meaning of the
tag is simply to
embolden some text.
HTML unfortunately mixes logical and physical markup thinking. Even
worse, common
renderings are so familiar to developers that tags that are logical
are assumed physical. What
does an <h1> tag do? Most Web developers would say it defines a big
heading. However,
that is assuming a physical view; it is simply saying that the
enclosed content is a level one
heading. How such a heading looks is completely arbitrary. While many
of HTML's logical
elements are relatively underutilized, others, such as headings and
paragraphs (<p>), are
used regularly though they are generally thought of as physical tags
by most HTML users.
People generally think of <h1> as a large heading and <h2> as a
smaller heading, and
expect <p> tags to cause returns, so, logical or
not, the language is physical
to most of its users. However, does that have to be the case? No,
these are logical elements and
the renderings, while common, are not required, and CSS can easily change them.
The benefits of logical elements might not be obvious to those
comfortable with physical
markup. To understand the benefits, it's important to realize that on
the Web, many browsers
render things differently. In addition, predicting what the viewing
environment will be is
difficult. What browser does the user have? What is his or her
monitor's screen resolution?
Does the user even have a screen? Considering the extreme of the user
having no screen at
all, how would a speaking browser render a <b> tag? What about a
<strong> tag? Text
tagged with <strong> might be read in a firm voice, but boldfaced text
might not have an
easily translated meaning outside the visual realm.
Many realistic examples exist of the power of logical elements. Consider the
international aspects of the Web. In some countries, the date is
written with the day first,
followed by the month and year. In the United States, the date
generally is written with
the month first, and then the day and year. A <date> or a <time> tag,
the latter of which
is actually now part of HTML5, could tag the information and enable
the browser to
localize it for the appropriate viewing environment. In short,
separation of the logical
structure from the physical presentation allows multiple physical
displays to be applied
to the same content. This is a powerful idea which, unfortunately,
even today is rarely
taken advantage of.
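The date example can be made concrete with a short Python sketch. It assumes a hypothetical helper, time_element, and two made-up convention names, "day-first" and "month-first"; it is not how any browser localizes dates. The point is only that a single logical value, carried in the datetime attribute of a <time> tag, can be given more than one physical presentation.

```python
from datetime import date

# Hypothetical convention names mapped to display formats.
DATE_FORMATS = {
    "day-first": "%d/%m/%Y",
    "month-first": "%m/%d/%Y",
}


def time_element(value, convention):
    """Mark up one logical date with a convention-specific visible form."""
    visible = value.strftime(DATE_FORMATS[convention])
    # The machine-readable value stays the same; only the display changes.
    return '<time datetime="{}">{}</time>'.format(value.isoformat(), visible)


when = date(2010, 1, 2)
print(time_element(when, "day-first"))
print(time_element(when, "month-first"))
```

Both calls carry the same datetime value, so software reading the markup sees one unambiguous date regardless of how it is displayed.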
Whether you subscribe to the physical (specific) or logical (general) viewpoint,
traditional HTML is neither purely physical nor purely logical, at
least not yet. In other
words, currently used HTML elements come in both flavors, physical and
logical, though
users nearly always think of them as physical. This is likely not
going to get settled soon;
the battle between logical and physical markup predates HTML by
literally decades.
HTML5 will certainly surprise any readers who are already logical
markup fans, because
it fully preserves traditional presentational tags like <b> and <i>,
given their common
use, though it jumps through some interesting mental hoops to claim
that their meaning has changed.
Further, the new specification promotes media- and visual-focused markup like
<canvas> and <video> and introduces tremendously powerful navigational and
sectioning logical-focused tags. If recent history is any guide, then
HTML5 is likely going
to pick up many fans.
Standards vs. Practice
Just because a standard is defined doesn't necessarily mean that it
will be embraced. Many
Web developers simply do not know or care about standards. As long as
their page looks
right in their favorite browser, they are happy and will continue
abusing HTML
tags like <table> and using various tricks and proprietary elements.
CSS has really done little to change this thinking, with the latest
browser hacks and filters as popular as the pixel
tricks and table hacks of the generation before. Developers tend to
favor that which is easy
and seems to work, so why bother to put more time in, particularly if
browsers render the
almost-right markup with little complaint or notice?
Obviously, this "good enough" approach simply isn't good enough.
Without standards,
the modern world wouldn't work well. For example, imagine a world of
construction in
which every nut and bolt might be a slightly different size. Standards
provide needed
consistency. The Web needs standards, but standards have to
acknowledge what people
actually do. Declaring that Web developers really need to validate,
use logical markup, and
separate the look from the structure of the document is great but it
doesn't get them to do
so. Standards are especially pointless if they are never widely implemented.
Web technologies today are like English—widely understood but poorly
spoken. However,
at the same time they are the Latin of the Web, providing a strong
foundation for development
and intersecting with numerous technologies. Web standards and
development practices
provide an interesting study of the difference between what theorists
say and what people
want and do. HTML5 seems a step in the right direction. The
specification acknowledges that,
for better or worse, traditional HTML practices are here for now, and
thus attempts to make
them solid while continuing to move technology forward and encourage
correct usage.
Myths and Misconceptions About HTML and XHTML
The amount of hearsay, myths, and complete misunderstandings about
HTML and XHTML
is enormous. Much of this can be attributed to the fact that many
people simply view the
page source of sites or read quick tutorials to learn HTML. This
section covers a few of the
more common misconceptions about HTML and tries to expose the truth behind them.
Misconception: WYSIWYG Works on the Web
(X)HTML isn't a specific, screen- or printer-precise formatting
language like PostScript.
Many people struggle with HTML on a daily basis, trying to create
perfect layouts using
(X)HTML elements inappropriately or using images to make up for HTML's
lack of screen
and font-handling features. Interestingly, even the concept of a
visual WYSIWYG editor
propagates this myth of HTML as a page layout language. Other
technologies, such as CSS,
are far better than HTML for handling presentation issues and their
use returns HTML to its
structural roots. However, the battle to make the end user see exactly
what you see on your
screen is likely to be a futile one.
Misconception: HTML Is a Programming Language
Many people think that making HTML pages is similar to programming.
However, HTML
is unlike programming in that it does not specify logic. It specifies
the structure of a
document. The introduction of scripting languages such as JavaScript
into Web documents
and the confusing terms Dynamic HTML (DHTML) and Ajax (Asynchronous JavaScript
and XML) tacked on may lead many to overestimate or underestimate the
role of markup in
the mix. However, markup is an important foundation for scripting and
should be treated
with the same syntactical precision that script is given.
Misconception: XHTML Is the Only Future
Approaching its tenth birthday, XHTML has yet to make many
inroads into the widespread
building of Web pages. Sorry to say, most documents are not authored
in XHTML, and many of those that are, are done incorrectly. Poor
developer education, the more stringent syntax
requirements, and ultimately the lack of obvious tangible benefit may
have kept many from
adopting the XML variant of HTML.
Misconception: XHTML Is Dead
Although XHTML hasn't taken Web development by storm, the potential
rise of HTML5
does not spell the end of XHTML. In fact, you can write XML-style
markup in HTML5,
which most developers dub XHTML 5. For precision, XHTML is the way to
go, particularly
when used in an environment that includes other forms of XML documents. XHTML's
future is bright for those who build well-formed, valid markup documents.
Myth: Traditional HTML Is Going Away
HTML is the foundation of the Web; with literally billions of pages in
existence, not every
document is going to be upgraded anytime soon. The "legacy" Web will
continue for years,
and traditional nonstandardized HTML will always be lurking around
underneath even the
most advanced Web page years from now. Beating the standards drum
might speed things
up a bit, but the fact is, there's a long way to go before we are rid
of messed-up markup.
HTML5 clearly acknowledges this point by documenting how browsers
should act in light
of malformed markup.
Having taught HTML for years and having seen how both HTML editors and people
build Web pages, I think it is very unlikely that strictly conforming
markup will be the norm
anytime soon. Although (X)HTML has had rules for years, people have
not really bothered to
follow them; from their perspective, there has been little penalty for
failing to follow the
rules, and there is no obvious benefit to actually studying the
language rigorously. Quite
often, people learn markup simply through imitation by viewing the
source of existing
pages, which are not necessarily written correctly, and going from
there. Much as with a
spoken language, (X)HTML's loosely enforced rules have allowed many
document authors
to get going quickly. Its biggest flaw is in some sense its biggest
asset and has allowed
millions of people to get involved with Web page authoring. Rigor and
structure are coming,
but it will take time, tools, and education.
Myth: Someday Standards Will Alleviate All Our Problems
Standards are important. Standards should help. Standards likely won't
fix everything.
Between varying interpretations of standards, proprietary additions, and
plain old bugs, there
is likely never going to be a day when Web development, even at the
level of (X)HTML
markup, doesn't have its quirks and oddities. The forces of the market
so far have proven
this sentiment to be, at the very least, wishful thinking. Over a
decade after first being
considered during the writing of this book's first edition, the wait
for some standards
nirvana continues.
Myth: Hand-Coding of HTML Will Continue Indefinitely
Although some people will continue to craft pages in a manner similar
to mechanical
typesetting, as Web editors improve and produce standard markup
perfectly, the need to
hand-tweak HTML documents will diminish. Hopefully, designers will
realize that knowledge
of the "invisible pixel" trick or the CSS Box Model Hack is not a
bankable resume item and
instead focus on development of their talents along with a firm
standards-based understanding
of markup, CSS, and JavaScript.
Myth: (X)HTML Is the Most Important Technology Needed to Create Web Pages
Whereas (X)HTML is the basis for Web pages, you need to know a lot
more than markup to
build useful Web pages (unless the page is very simple). However,
don't underestimate
markup, because it can become a bit of a challenge itself. Based on
the simple examples
presented in this chapter, you might surmise that mastering Web page
creation is merely a
matter of learning the multitude of markup tags, such as <h1>, <p>,
<em>, and so on, that
specify the structure of Web documents to browsers. While this
certainly is an important
first step, it would be similar to believing you could master the art
of writing by simply
understanding the various commands available in Microsoft Word. There
is a tremendous
amount to know in the field of Web design and development, including information
architecture, visual design, client- and server-side programming,
marketing and search
engines, Web servers and delivery, and much, much more.
The Future of Markup—Two Paths?
Having followed markup for well over a decade in writing editions of
this book and
beyond, it is still quite difficult to predict what will happen with
it in the future, other than
to say the move towards strict markup will likely be a bit slower than
people think and
probably not ideal. The sloppy syntax from the late 1990s is still
with us and is likely to be
so for some time. The desire to change this is strong, but so far the
battle for strict markup is
far from won. We explore here two competing, or potentially
complementary, paths for the
future of markup.
XHTML: Web Page Markup XML Style
A new version of HTML called XHTML became a W3C recommendation in January 2000.
XHTML, as discussed earlier in the chapter, is a reformulation of HTML
using XML that
attempts to change the direction and use of HTML to the way it ought
to be. So what does
that mean? In short, rules now matter. As you know, you can feed a
browser just about
anything and it will render. XHTML would aim to end that. Now if you
make a mistake, it
should matter.
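To get a feel for what "rules now matter" means, compare how a strict XML parser treats markup that a browser would happily render. The Python sketch below uses the standard library's xml.etree.ElementTree module; the helper name is_well_formed_xml is hypothetical. It checks well-formedness only, and it says nothing about DTD validity or about how a given browser selects its parsing mode.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(markup):
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


print(is_well_formed_xml("<p><b><i>tags cross</b></i></p>"))  # crossed tags
print(is_well_formed_xml("<p><b><i>tags nest</i></b></p>"))   # proper nesting
print(is_well_formed_xml("<p>line one<br>line two</p>"))      # unclosed empty element
print(is_well_formed_xml("<p>line one<br />line two</p>"))    # self-closing form
```

The crossed tags and the unclosed <br> are rejected outright, which is exactly the behavior XHTML borrows from XML.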
Theoretically, a strictly XHTML-conforming browser shouldn't render a
page at all if it
doesn't conform to the standard, though this is highly unlikely to
happen because browsers
resort to a backward-compatibility quirks mode to display such
documents. The question is,
could you enforce the strict sense of XML using XHTML? The short
answer is, maybe not
ideally.
To demonstrate, let's reformulate the xhtmlhelloworld.html example
slightly by adding
an XML directive and forcing the MIME type to be XML. We'll then try
to change the file
extension to .xml to ensure that the server gets the browser to really
treat the file as XML data.
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/xml; charset=utf-8" />
<title>Hello XHTML World</title>
<!-- Simple hello world in XHTML 1.0 strict example -->
</head>
HTML5: Back to the Future
Starting in 2004, a group of well-known organizations and individuals
got together to form a
standards body called the Web Hypertext Application Technology Working Group, or
WHATWG (www.whatwg.org), whose goal was to produce a new version of
HTML. The exact
reasons and motivations for this effort seem to vary depending on who
you talk to—slow
uptake of XHTML, frustration with the lack of movement by the Web
standards body, need for
innovation, or any one of many other reasons—but, whatever the case,
the aim was to create a
new, rich future for Web applications that include HTML as a
foundation element. Aspects of
the emerging specification such as the canvas element have already
shown up in browsers
like Safari and Firefox, so by 2008, the efforts of this group were
rolled into the W3C and drafts
began to emerge. Whether HTML5 becomes official or is
fully adopted is
obviously somewhat at the mercy of the browser vendors and the market,
but clearly another
very likely path for the future of markup goes through HTML5. Already
we see Google
adopting it in various places, so its future looks bright.
NOTE While HTML5 stabilized somewhat around October 2009, with a W3C
final candidate
recommendation goal of 2012, you are duly warned that the status of
HTML5 may change.
Because of the early nature of the specification, specific
documentation of HTML5 focuses more on
what works now than on what may make it into the specification later.
HTML5 is meant to represent a new version of HTML along the HTML 4 path. The
emerging specification also suggests that it will be a replacement for
XHTML, yet it ends up
supporting most of the syntax that end users actually use,
particularly self-identifying
empty elements (for example, <br />). It also reverses some of the
trends, such as case
sensitivity, that have entered into markup circles, so it would seem
that the HTML styles of
the past will be fine in the future. In most ways, HTML5 doesn't
present much of a
difference, as you saw earlier in the chapter's introductory example,
shown again here:
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Hello HTML World</title>
<!-- Simple hello world in HTML5 example -->
</head>
<body>
<h1>Welcome to the Future World of HTML5</h1>
<hr>
<p>HTML5 <em>really</em> isn't so hard!</p>
<p>Soon you will &hearts; using HTML.</p>
<p>You can put lots of text here if you want.
We could go on and on with fake text for you
to read, but let's get back to the book.</p>
</body>
</html>
ONLINE http://htmlref.com/ch1/helloworldhtml5.html