Serving up XHTML with the correct MIME type
Why is this necessary?
Ian Hickson of Opera Software, one of the World Wide Web Consortium's major contributors, wrote an article in an attempt to clarify the correct way to serve documents written in the Extensible Hypertext Markup Language (XHTML).
The W3C note on XHTML media types is not very specific about how XHTML should be served. Use of the word "should" creates ambiguity, which may be why Ian Hickson felt compelled to write about it. For your convenience, we have re-worked a table from the document's summary:
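Document type | text/html | application/xhtml+xml | application/xml | text/xml
HTML 4 | SHOULD | MUST NOT | MUST NOT | MUST NOT
XHTML 1.0 (HTML compatible) | MAY | SHOULD | MAY | MAY
XHTML 1.0 (other) | SHOULD NOT | SHOULD | MAY | MAY
XHTML Basic | SHOULD NOT | SHOULD | MAY | MAY
XHTML 1.1 | SHOULD NOT | SHOULD | MAY | MAY
XHTML+MathML | SHOULD NOT | SHOULD | MAY | MAY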
Served just right
For most websites, authoring in HTML 4.01 is perfectly sufficient. Most of the features available in XHTML are available in good old HTML. However, some sites may wish to take advantage of the extensibility of XML, so delivering XHTML with the correct MIME type may be important.
For this reason, Keystone Websites has developed a technique that takes advantage of the PHP server-side scripting language. Web pages can be served in one of two ways:
As XHTML with a MIME type of application/xhtml+xml to those browsers and other user agents with the proper support
As HTML with a MIME type of text/html to all other user agents, and those agents that indicate a preference for that type
This technique will only work on pages that are valid XML. Authoring in this way requires considerable discipline. Documents must be well-formed, special characters must be encoded as entities, and all client-side scripting must be updated to avoid parsing errors.
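A quick well-formedness check can catch the most common mistakes before a page is served as XML. The sketch below is not part of the original technique: is_well_formed() is a hypothetical helper, it assumes PHP's DOM extension is available, and it only tests well-formedness, which is a weaker condition than validity. The sample markup is made up.

```php
<?php
// Hypothetical helper (an assumption, not from the article): reports whether
// a string parses as well-formed XML. Requires PHP's DOM extension.
function is_well_formed($markup) {
    if ($markup === "") {
        return false; // an empty document is never well-formed
    }
    // Collect parser errors quietly instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $ok = $doc->loadXML($markup);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $ok;
}

// Encoded ampersand and a closed empty element: parses cleanly.
var_dump(is_well_formed('<p>Fish &amp; chips<br /></p>'));
// Bare ampersand and an unclosed br element: rejected by the parser.
var_dump(is_well_formed('<p>Fish & chips<br></p>'));
```

Named entities such as &nbsp; are defined by the XHTML DTD, so a fragment that uses them without a DOCTYPE would be reported as not well-formed by this check.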
Using PHP includes
PHP can set a document's HTTP headers so that the document is served in the appropriate manner. Headers must be sent before any other output, so the code that sets them has to come first. Including an external PHP file on the first line of each document in a website means the header logic only has to be written once, and it can easily be changed to accommodate new scenarios. The external PHP looks like this:
<?php
// Defaults: serve HTML unless the user agent indicates support for XHTML.
$charset = "iso-8859-1";
$mime = "text/html";

// Output-buffer callback: turns XHTML-style empty tags (" />") into HTML ones (">").
function fix_code($buffer) {
    return (preg_replace("!\s*/>!", ">", $buffer));
}

// Some requests carry no Accept header at all, so guard against that.
$accept = isset($_SERVER["HTTP_ACCEPT"]) ? $_SERVER["HTTP_ACCEPT"] : "";

if(stristr($accept,"application/xhtml+xml")) {
    // The decimal alternatives come first so that "0.5" is not captured as "0".
    if(preg_match("/application\/xhtml\+xml;q=(0\.\d{1,3}|1\.0{1,3}|[01])/i",$accept,$matches)) {
        $xhtml_q = $matches[1];
        if(preg_match("/text\/html;q=(0\.\d{1,3}|1\.0{1,3}|[01])/i",$accept,$matches)) {
            $html_q = $matches[1];
            // Serve XHTML only if it is rated at least as highly as HTML.
            if((float)$xhtml_q >= (float)$html_q) {
                $mime = "application/xhtml+xml";
            }
        }
    } else {
        // XHTML is listed without a q-rating.
        $mime = "application/xhtml+xml";
    }
}

if($mime == "application/xhtml+xml") {
    $prolog_type = "<?xml version=\"1.0\" encoding=\"$charset\" ?>\n<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">\n<html xmlns=\"http://www.w3.org/1999/xhtml\" xml:lang=\"en\" lang=\"en\">\n";
} else {
    // Buffer the whole page so fix_code() can rewrite it before it is sent.
    ob_start("fix_code");
    $prolog_type = "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\" \"http://www.w3.org/TR/html4/loose.dtd\">\n<html lang=\"en\">\n";
}

header("Content-Type: $mime;charset=$charset");
header("Vary: Accept");
print $prolog_type;
?>
By adding an include at the top of each document, the code above is then pulled in to alter the header appropriately. The include code looks like this:
<?php include("/path/filename.php"); ?>
The path used will depend on the web server. Some servers accept relative paths in PHP includes, while others require a full filesystem path, such as one that starts from the account's home directory. Keystone Websites uses this path, for example:
<?php include("/home/username/public_html/includes/mime.php"); ?>
How it works
The included file does a number of things. First, a variable called $charset is created to hold the character encoding, and another called $mime is set to "text/html". A function called fix_code() is then defined that replaces each instance of " />" with ">". The pattern also strips any whitespace before the slash, so it handles pages written with or without a space before the trailing slash.
In the if statements that follow, the stristr() function looks for the substring "application/xhtml+xml" in the Accept header sent by the user agent. If it is found, the preg_match() function is employed to check for the existence of q-ratings. If the q-rating favors application/xhtml+xml, or no q-rating is present, $mime is given a value of "application/xhtml+xml". Otherwise, the variable remains set to "text/html".
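One way to read the q-ratings is to split the Accept header into its comma-separated entries rather than matching it with a single pattern. The sketch below is an illustration, not the code from the included file: accept_q() and choose_mime() are hypothetical names, a type listed without a q-rating is assumed to have a q of 1.0, and wildcard entries such as */* are ignored. The sample headers are made up.

```php
<?php
// Hypothetical helper: returns the q-rating that an Accept header gives to
// $type, 1.0 if the type is listed without one, or null if it is not listed.
function accept_q($accept, $type) {
    foreach (explode(",", $accept) as $range) {
        $parts = array_map("trim", explode(";", $range));
        if (strcasecmp($parts[0], $type) !== 0) {
            continue; // a different media type
        }
        foreach (array_slice($parts, 1) as $param) {
            if (preg_match('/^q\s*=\s*([0-9.]+)$/i', $param, $m)) {
                return (float) $m[1];
            }
        }
        return 1.0; // listed without an explicit q-rating
    }
    return null;
}

// Hypothetical helper: picks XHTML only when it is listed, not refused
// (q of 0), and rated at least as highly as text/html.
function choose_mime($accept) {
    $xhtml_q = accept_q($accept, "application/xhtml+xml");
    if ($xhtml_q === null || $xhtml_q <= 0) {
        return "text/html";
    }
    $html_q = accept_q($accept, "text/html");
    if ($html_q !== null && $html_q > $xhtml_q) {
        return "text/html";
    }
    return "application/xhtml+xml";
}

echo choose_mime("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"), "\n";
echo choose_mime("text/html;q=0.9,application/xhtml+xml;q=0.5"), "\n";
echo choose_mime("text/html,*/*"), "\n";
```

With these sample headers, the first call selects application/xhtml+xml and the other two select text/html.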
Some conditional logic is applied to the variable $mime to determine which flavor of page will be sent. If it is set to "application/xhtml+xml", a variable called $prolog_type is created that includes:
An XML declaration
A full XHTML DOCTYPE
The html element with its XML namespace and language attributes
If $mime remains set to "text/html", fix_code() is registered as the callback of an ob_start() call. This has the effect of holding the entire page in a buffer while the trailing forward slashes are removed from the code. The buffer flushes automatically when the page has finished being processed. Then $prolog_type is created to include:
A full HTML DOCTYPE
The html element with its language attribute
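The buffering step can be tried in isolation. The sketch below repeats the fix_code() callback from the included file and passes two sample tags through ob_start(). The outer buffer and the $sent variable are assumptions added only so the result can be inspected, and the sample markup is made up.

```php
<?php
// Same callback as in the included file: strips optional whitespace and the
// trailing slash from XHTML-style empty tags.
function fix_code($buffer) {
    return preg_replace("!\s*/>!", ">", $buffer);
}

// Outer buffer: exists only so this sketch can capture what would be sent.
ob_start();

// Inner buffer: everything echoed from here on is held until it is flushed,
// at which point PHP passes it through fix_code().
ob_start("fix_code");
echo "<hr />\n";
echo '<input type="text" name="q" />', "\n";
ob_end_flush(); // in a real page this happens automatically at script end

$sent = ob_get_clean();
echo $sent; // the two tags, now without their trailing slashes
```

Because the pattern is applied to the whole page, any literal "/>" inside scripts or text content would be rewritten as well, so it is worth checking pages that contain such sequences.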
Finally, the new header is sent with:
header("Content-Type: $mime;charset=$charset");
header("Vary: Accept");
Note that Vary: Accept is added to indicate the basis of the content negotiation. The variable $prolog_type is printed after the headers are set, and the rest of the document (after the include statement) follows immediately. All pages of the website must be saved with a .php file extension to instruct the server to process the scripts. Please feel free to validate this page:
World Wide Web Consortium validator
Web Design Group validator
Keystone Websites would like to thank Anne, Sean, Bill, Simon, Basil, and Kornel for technical assistance with this article. We appreciate any assistance in helping to refine this technique, which must still be considered a work in progress. This page was last updated in October 2009.
Copyright © 2003 - 2013 Keystone Websites & Simon Jessey, unless otherwise indicated. All rights reserved.