[LINK] XHTML

stephen loosley stephen@melbpc.org.au
Wed, 02 Feb 2000 23:45:39 +1100


Hello Jan, Glen and Stewart ..

Interesting discussion! As one who instructs Year Sevens (i.e. 12-year-olds)
on writing quite sophisticated web pages using a basic text editor (e.g. MS
Notepad), the X development will be a bit of a pain.  However, the changes
required don't appear all that difficult, and the benefits do seem worthwhile.

At 08:03 PM 2/02/2000 +1100, Stewart Fist wrote: 
 >
 > Jan in her XML rant, is right on.

Fair enough Stewart .. (though, just very quietly, Jan is in fact a 'he')

 > Here we go again making thing difficult because experts like to
 > exhibit their expertise, and corporations want all the possible goodies.

http://www.zdnet.com/devhead/resources/tag_library/history/xhtml.html

The HTML to XHTML Headache ... What Needs to Change:
Converting a document from HTML 4.0 to XHTML 1.0 will not be a totally painless affair -- some changes WILL need to be made. 

* XHTML document MUST be well-formed XML
It must conform to basic XML syntax. If it does not, the XML parser does not have an obligation to continue processing the document. Unlike today's HTML parsers, an XML parser will not try to recover and "guess" what you meant if the syntax is incorrect. 

* <html> MUST be the top-level element.
Not a change from HTML, but there are quite a few documents out there that neglect this important point. 

* Element and attribute names MUST be in lower case
HTML is not case-sensitive; XML is. 

* Attribute values MUST be quoted 

* End tags are required for non-empty elements
They are no longer optional. Affected Elements: basefont, body, colgroup, dd, dt, head, html, li, p, rt, spacer, tbody/thead/tfoot, th/td, tr 

* All empty elements must use the XML "empty tag" syntax
XML empty elements are explicitly closed with a trailing forward slash ("/") before the end bracket (eg. <br> becomes <br />). Affected Elements: area, base, bgsound, br, col, frame, hr, img, input, isindex, keygen, link, meta, option, param, wbr 

* XML does not allow attribute minimization.
Stand-alone attributes must be expanded (eg. <td nowrap>cell</td> becomes <td nowrap="nowrap">cell</td>) 

* Whitespace handling in attribute values is different in XML.
Leading/trailing spaces are truncated, and multiple spacing characters within the attribute value are collapsed to single spaces. 

* Script sections should be wrapped in XML CDATA sections
  
* SGML DTD exclusions are not possible in XML, but they should still be observed as "good practice".
Not allowed to nest within themselves: a, button, form, label. Pre exclusions: big, img, object, small, sub, sup
Button exclusions: fieldset, form, iframe, input, label, select, textarea 
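
A few of the mechanical rules in the list above (lower-case names, quoted attribute values, expanded stand-alone attributes, and the "empty tag" syntax) can be sketched in a few lines of Python using the standard html.parser module. This is only an illustration, not how HTML Tidy works: the class name XhtmlRewriter, the function to_xhtml_syntax and the shortened EMPTY_ELEMENTS set are made up for this example, and the sketch does not add missing end tags or wrap scripts in CDATA.

```python
from html import escape
from html.parser import HTMLParser

# Assumption: a subset of the empty elements listed above; extend as needed.
EMPTY_ELEMENTS = {"area", "base", "br", "col", "frame", "hr", "img",
                  "input", "isindex", "link", "meta", "param"}


class XhtmlRewriter(HTMLParser):
    """Re-emit HTML tags using XHTML-style syntax (illustrative only)."""

    def __init__(self):
        # convert_charrefs=False leaves entities such as &amp; in the text.
        super().__init__(convert_charrefs=False)
        self.out = []

    def _attrs(self, attrs):
        parts = []
        for name, value in attrs:
            if value is None:            # stand-alone attribute, e.g. nowrap
                value = name
            # Always emit the value inside double quotes.
            parts.append(' %s="%s"' % (name, escape(value, quote=True)))
        return "".join(parts)

    def handle_starttag(self, tag, attrs):
        # HTMLParser has already lower-cased tag and attribute names.
        close = " /" if tag in EMPTY_ELEMENTS else ""
        self.out.append("<%s%s%s>" % (tag, self._attrs(attrs), close))

    def handle_startendtag(self, tag, attrs):
        self.out.append("<%s%s />" % (tag, self._attrs(attrs)))

    def handle_endtag(self, tag):
        if tag not in EMPTY_ELEMENTS:    # empty elements were closed already
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)


def to_xhtml_syntax(fragment):
    """Return the fragment with its tags rewritten in XHTML-style syntax."""
    parser = XhtmlRewriter()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.out)


print(to_xhtml_syntax("<TD NOWRAP>cell</TD>"))
print(to_xhtml_syntax('<IMG SRC=pic.gif ALT="A picture"><BR>'))
```

The first call prints <td nowrap="nowrap">cell</td>, matching the example in the list, and the second prints <img src="pic.gif" alt="A picture" /><br />.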

Several of the above changes are to require certain features that were optional in the SGML world, or are optional in current usage because of historical leniency in implemented HTML parsers. When something becomes optional, people tend to abuse it. XML parsers will be very strict regarding these changes. In theory, none of these changes should make documents unreadable by current browsers. 
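
To get a feel for how strict an XML parser is, here is a small sketch using Python's standard xml.etree.ElementTree module. The helper name is_well_formed and the sample strings are invented for illustration. The helper only checks well-formedness; it does not validate against the XHTML DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if an XML parser accepts the markup, else False."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        # Unlike an HTML parser, the XML parser gives up instead of guessing.
        return False


# Hypothetical samples: HTML-style markup next to its XHTML-style equivalent.
samples = {
    "html_style": "<p>First line<br>Second line",            # no end tags
    "xhtml_style": "<p>First line<br />Second line</p>",
    "mixed_case": "<P>text</p>",                              # names differ
    "unquoted": "<td width=10>cell</td>",                     # no quotes
    "script_plain": "<script>if (a < b) go();</script>",      # bare '<'
    "script_cdata": "<script><![CDATA[ if (a < b) go(); ]]></script>",
}

for name, markup in samples.items():
    print(name, is_well_formed(markup))
```

Only xhtml_style and script_cdata are accepted. The others are rejected outright, which is the "no obligation to continue processing" behaviour described in the list above.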

HTML Tidy:
Dave Raggett (the co-author or primary author of the HTML 3.0, 3.2 and 4.0 specs) has created a free little program that converts an HTML page to XHTML for you, along with correcting many common authoring mistakes. See http://www.w3.org/People/Raggett/tidy/ for more details. 

Why it's Important:
The world of the web is changing, as are the browsers that access it. HTML has needed to change for quite some time in order to keep up, but it didn't have the power to do so. Changing HTML 4.0 into XHTML 1.0 will give it the power it needs to adapt today and to flourish in the future. 
--

Cheers all ..
Stephen Loosley
www.stephen.hm