Serving up XHTML with the correct MIME type

December 25, 2013

Why is this necessary?

Ian Hickson of Opera Software, one of the World Wide Web Consortium's major contributors, wrote an article in an attempt to clarify the correct way to serve up documents written in the Extensible Hypertext Markup Language (XHTML).

The W3C note on XHTML media types is not very specific about how XHTML should be served. Use of the word "should" creates ambiguity, which may be why Ian Hickson felt compelled to write about it. For your convenience, we have re-worked a table from the document's summary:

Media types summary for serving XHTML documents

Media Type                    text/html    application/xhtml+xml    application/xml    text/xml
HTML 4                        SHOULD       MUST NOT                 MUST NOT           MUST NOT
XHTML 1.0 (HTML Compatible)   MAY          SHOULD                   MAY                MAY
XHTML 1.0 (other)             SHOULD NOT   SHOULD                   MAY                MAY
XHTML Basic                   SHOULD NOT   SHOULD                   MAY                MAY
XHTML 1.1                     SHOULD NOT   SHOULD                   MAY                MAY
XHTML + MathML                SHOULD NOT   SHOULD                   MAY                MAY

Served just right

For most websites, authoring in HTML 4.01 is perfectly sufficient. Most of the features available in XHTML are available in good old HTML. However, some sites may wish to take advantage of the extensibility of XML, so delivering XHTML with the correct MIME type may be important.

For this reason, Keystone Websites has developed a technique that takes advantage of the PHP server-side scripting language. Web pages can be served in one of two ways:

    As XHTML with a MIME type of application/xhtml+xml to those browsers and other user agents with the proper support

    As HTML with a MIME type of text/html to all other user agents, and those agents that indicate a preference for that type

This technique will only work on pages that are valid XML. Authoring in this way requires considerable discipline. Documents must be well-formed, special characters must be encoded as entities, and all client-side scripting must be updated to avoid parsing errors.
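The well-formedness requirement can be checked before a page is published. The following is a minimal sketch rather than part of the Keystone Websites technique: the helper name is_well_formed() and the sample markup are hypothetical, and it assumes PHP's DOM extension (libxml) is available.

```php
<?php
// Hypothetical helper: returns true when $markup parses as well-formed XML.
function is_well_formed($markup) {
    // Collect parser errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $ok = $doc->loadXML($markup);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $ok;
}

// Ampersand encoded as an entity, empty element closed: well-formed.
var_dump(is_well_formed('<p>Fish &amp; chips<br /></p>'));
// Bare ampersand and unclosed <br>: not well-formed.
var_dump(is_well_formed('<p>Fish & chips<br></p>'));
```

A browser that receives application/xhtml+xml applies the same strictness, which is why a single unencoded ampersand can stop a page from rendering.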

Using PHP includes

PHP can redefine a document's HTTP headers so that the document is served in the appropriate manner. Headers must be sent before any other output, so the change has to come before anything else in the document. Including an external PHP file on the first line of each document in a website means the new header only needs to be written once, and changes can easily be made to accommodate new scenarios. The external PHP looks like this:

<?php
// Character encoding and default MIME type for the response.
$charset = "iso-8859-1";
$mime = "text/html";

// Output-buffer callback: turn XHTML empty-element syntax ("/>") into HTML (">").
function fix_code($buffer) {
    return preg_replace("!\s*/>!", ">", $buffer);
}

// The Accept header may be absent, so fall back to an empty string.
$accept = isset($_SERVER["HTTP_ACCEPT"]) ? $_SERVER["HTTP_ACCEPT"] : "";

if (stristr($accept, "application/xhtml+xml")) {
    // Longer alternatives come first so that "0.9" is not captured as just "0".
    if (preg_match("/application\/xhtml\+xml\s*;\s*q=(0\.\d{1,3}|1\.0|[01])/i", $accept, $matches)) {
        $xhtml_q = $matches[1];
        if (preg_match("/text\/html\s*;\s*q=(0\.\d{1,3}|1\.0|[01])/i", $accept, $matches)) {
            $html_q = $matches[1];
            if ((float)$xhtml_q >= (float)$html_q) {
                $mime = "application/xhtml+xml";
            }
        }
    } else {
        // Listed without a q-rating: treat XHTML as acceptable.
        $mime = "application/xhtml+xml";
    }
}

if ($mime == "application/xhtml+xml") {
    // XML declaration, XHTML DOCTYPE, and html element with namespace and language.
    $prolog_type = "<?xml version=\"1.0\" encoding=\"$charset\" ?>\n<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">\n<html xmlns=\"http://www.w3.org/1999/xhtml\" xml:lang=\"en\" lang=\"en\">\n";
} else {
    // Buffer the page so fix_code() can strip the trailing slashes.
    ob_start("fix_code");
    $prolog_type = "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\" \"http://www.w3.org/TR/html4/loose.dtd\">\n<html lang=\"en\">\n";
}

header("Content-Type: $mime;charset=$charset");
header("Vary: Accept");
print $prolog_type;
?>

Adding an include at the top of each document pulls in the code above, which alters the header appropriately. The include code looks like this:

<?php include("/path/filename.php"); ?>

The path used will depend on the web server. Some servers are set up to accept relative paths with PHP includes, while others insist on a full path starting from a "home" directory. Keystone Websites uses this path, for example:

<?php include("/home/username/public_html/includes/mime.php"); ?>

How it works

The included file does a number of things. First, a variable called $charset is created to hold the character encoding, and another called $mime is set to "text/html". A function called fix_code() is then defined that replaces each "/>", together with any whitespace before it, with simply ">". Because the pattern makes the whitespace optional, it works whether or not the XHTML is written with a space before the trailing slash.
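To see what the replacement does, fix_code() can be run on a short string. This is only an illustrative sketch; the sample markup is hypothetical.

```php
<?php
// Same callback as in the include file.
function fix_code($buffer) {
    return preg_replace("!\s*/>!", ">", $buffer);
}

// Hypothetical sample markup, with and without a space before the slash.
$xhtml = '<p>Line one<br />Line two<br/><img src="logo.png" alt="Logo" /></p>';
echo fix_code($xhtml), "\n";
// Prints: <p>Line one<br>Line two<br><img src="logo.png" alt="Logo"></p>
```

The pattern is applied to the whole buffer, so a literal "/>" inside text content or a script would be rewritten as well.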

In the following if statements, the stristr() function is used to identify the substring of "application/xhtml+xml". If that is found, the preg_match() function is employed to check for the existence of q-ratings. If the q-rating favors application/xhtml+xml, or no q-rating is present, $mime is given a value of "application/xhtml+xml". Otherwise, the variable remains set to "text/html".
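The negotiation rules just described can be tried out against sample Accept headers. The following is a minimal sketch that mirrors the logic as described, not the include file itself: the function name choose_mime() and the sample headers are hypothetical, and the pattern assumes q-ratings written as 0, 1, 1.0, or 0 followed by up to three decimal places.

```php
<?php
// Hypothetical helper: returns the MIME type to serve for a given Accept header.
function choose_mime($accept) {
    // Longer alternatives come first so "0.9" is not captured as just "0".
    $q = '\s*;\s*q=(0\.\d{1,3}|1\.0|[01])';
    if (stripos($accept, 'application/xhtml+xml') === false) {
        return 'text/html'; // XHTML is not mentioned at all
    }
    if (!preg_match('/application\/xhtml\+xml' . $q . '/i', $accept, $m)) {
        return 'application/xhtml+xml'; // listed without a q-rating
    }
    $xhtml_q = (float)$m[1];
    // Serve XHTML only when its rating is at least that of text/html.
    if (preg_match('/text\/html' . $q . '/i', $accept, $m) && $xhtml_q >= (float)$m[1]) {
        return 'application/xhtml+xml';
    }
    return 'text/html';
}

$samples = array(
    'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'application/xhtml+xml;q=0.9,text/html;q=0.5',
    'text/html;q=0.9,application/xhtml+xml;q=0.5',
    'text/html, */*',
);
foreach ($samples as $accept) {
    echo choose_mime($accept), "  <=  ", $accept, "\n";
}
```

The first two sample headers select application/xhtml+xml; the last two fall back to text/html.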

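Some conditional logic is applied to the variable $mime to determine which flavor of page will be sent. If it is set to "application/xhtml+xml", a variable called $prolog_type is created that includes:

An XML declaration

A full XHTML DOCTYPE

The html element with its XML namespace and language attributes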

If $mime remains set to "text/html", fix_code() is passed to ob_start() as an output callback. This holds the entire page in a buffer so that the trailing forward slashes can be removed from the code. The buffer is flushed through the callback automatically when the page has finished being processed. Then $prolog_type is created to include:

A full HTML DOCTYPE

The html element language attribute
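The buffering step can be observed in isolation. In this minimal sketch the outer ob_start() call and the $result variable are additions used only to capture the output for inspection. In the include file the buffer is flushed when the script ends, so no explicit ob_end_flush() call is needed there.

```php
<?php
// Same callback as in the include file.
function fix_code($buffer) {
    return preg_replace("!\s*/>!", ">", $buffer);
}

ob_start();            // outer buffer, used here only to capture the result
ob_start("fix_code");  // inner buffer with the callback, as in the include file
echo "<hr />\n";       // output is held in the buffer, slash still present
echo "<br />\n";
ob_end_flush();        // passes the buffered page through fix_code()
$result = ob_get_clean();
echo $result;          // the slashes are gone by the time the page is sent
```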

Finally, the new header is sent with:

header("Content-Type: $mime;charset=$charset");
header("Vary: Accept");

Note that Vary: Accept is added to indicate the basis of the content negotiation, which tells caches that the response depends on the request's Accept header. The variable $prolog_type is printed after the headers, and the rest of the document (after the include statement) follows immediately. All pages of the website must be saved with a .php file extension to instruct the server to process the scripts. Please feel free to validate this page:

    World Wide Web Consortium validator

    Web Design Group validator

Keystone Websites would like to thank Anne, Sean, Bill, Simon, Basil, and Kornel for technical assistance with this article. We appreciate any assistance in helping to refine this technique, which must still be considered a work in progress. This page was last updated in October 2009.

Copyright © 2003 - 2013 Keystone Websites & Simon Jessey, unless otherwise indicated. All rights reserved.
