HTML Standard Tracker

SVN | Bug | Comment | Time (UTC)
2861 | | [Conformance Checkers] Reword how we require that XML documents that use <meta charset> must use UTF-8. Also require it in the first 512 bytes. | 2009-02-23 12:57
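
A rough sketch of how a conformance checker might test the two requirements named in the commit comment above: that the element carrying the declaration ends within the first 512 bytes, and that in an XML document the charset value is an ASCII case-insensitive match for "UTF-8". The names `find_meta_charset` and `check_charset_declaration`, the regular expression, and the message strings are assumptions made for illustration; none of them come from the specification. A real checker would use a proper HTML or XML parser rather than a byte-level scan.

```python
import re

# Hypothetical, simplified pattern: finds the first <meta ...> tag that
# contains "charset=" and captures the label. It is not a real parser and
# will also match "charset=" inside a content="" attribute value.
META_CHARSET_RE = re.compile(
    rb'''<meta\b[^>]*?\bcharset\s*=\s*["']?([A-Za-z0-9._:+-]+)["']?[^>]*>''',
    re.IGNORECASE,
)


def find_meta_charset(data: bytes):
    """Return (label, end_offset) for the first match, or None."""
    match = META_CHARSET_RE.search(data)
    if match is None:
        return None
    # match.end() is the offset just past the closing ">" of the tag.
    return match.group(1).decode("ascii"), match.end()


def check_charset_declaration(data: bytes, is_xml: bool) -> list:
    """Return a list of problem strings; an empty list means no problem found."""
    problems = []
    found = find_meta_charset(data)
    if found is None:
        return problems
    label, end = found
    # The whole element must lie within bytes 0..511 of the document.
    if end > 512:
        problems.append("declaration not within first 512 bytes")
    # In XML documents the value must match "UTF-8", ignoring ASCII case.
    if is_xml and label.lower() != "utf-8":
        problems.append("XML document must declare UTF-8")
    return problems
```

For example, `check_charset_declaration(b'<meta charset="ISO-8859-1">', is_xml=True)` reports the UTF-8 problem, while the same bytes with `is_xml=False` report nothing.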
@@ -9481,29 +9481,32 @@ http://software.hixie.ch/utilities/js/live-dom-viewer/?%3C%21DOCTYPE%20HTML%3E%0
   <code title="attr-meta-http-equiv">http-equiv</code>, and <code
   title="attr-meta-charset">charset</code> attributes must be
   specified.</p>
 
   <p>If either <code title="attr-meta-name">name</code> or <code
   title="attr-meta-http-equiv">http-equiv</code> is specified, then
   the <code title="attr-meta-content">content</code> attribute must
   also be specified. Otherwise, it must be omitted.</p>
 
   <p>The <dfn title="attr-meta-charset"><code>charset</code></dfn>
-  attribute specifies the character encoding used by the document. In
-  <span title="HTML5">HTML documents</span> this is a <span>character
-  encoding declaration</span>. If the attribute is present in an <span
-  title="XHTML">XML document</span>, its value must be an <span>ASCII
+  attribute specifies the character encoding used by the
+  document. This is a <span>character encoding declaration</span>. If
+  the attribute is present in an <span title="XHTML">XML
+  document</span>, its value must be an <span>ASCII
   case-insensitive</span> match for the string "<code
-  title="">UTF-8</code>", and the resource must be encoded using the
-  UTF-8 character encoding. (The element has no effect in XML
-  documents, and is only allowed to facilitate migration to and from
-  XHTML.)</p>
+  title="">UTF-8</code>" (and the document is therefore required to
+  use UTF-8 as its encoding).</p>
+
+  <p class="note">The <code title="attr-meta-charset">charset</code>
+  attribute on the <code>meta</code> element has no effect in XML
+  documents, and is only allowed in order to facilitate migration to
+  and from XHTML.</p>
 
   <p>There must not be more than one <code>meta</code> element with a
   <code title="attr-meta-charset">charset</code> attribute per
   document.</p>
 
   <p>The <dfn title="attr-meta-content"><code>content</code></dfn>
   attribute gives the value of the document metadata or pragma
   directive when the element is used for those purposes. The allowed
   values depend on the exact context, as described in subsequent
   sections of this specification.</p>
@@ -10074,21 +10077,23 @@ people expect to have work and what is necessary.
 
   <p>Conformance checkers must use the information given on the WHATWG
   Wiki PragmaExtensions page to establish if a value not explicitly
   defined in this specification is allowed or not.</p>
 
 
   <h5 id="charset">Specifying the document's character encoding</h5>
 
   <!-- XXX maybe the rest should move to "writing html" section,
   though if we do then we have to duplicate the requirements in the
-  parsing section for conformance checkers -->
+  parsing section for conformance checkers, and we have to make sure
+  that the requirements for charset="" apply even in XML, for the
+  <meta charset=""> polyglot hack -->
 
   <p>A <dfn>character encoding declaration</dfn> is a mechanism by
   which the character encoding used to store or transmit a document is
   specified.</p>
 
   <p>The following restrictions apply to character encoding
   declarations:</p>
 
   <ul>
 
@@ -10103,32 +10108,34 @@ people expect to have work and what is necessary.
    <li>The character encoding declaration must be serialized without
    the use of <span title="syntax-charref">character references</span>
    or character escapes of any kind.</li>
 
    <li id="charset512">The element containing the character encoding
    declaration must be serialised completely within the first 512
    bytes of the document.</li>
 
   </ul>
 
-  <p>If the document does not start with a BOM, and if its encoding is
-  not explicitly given by <span title="Content-Type">Content-Type
-  metadata</span>, then the character encoding used must be an
-  <span>ASCII-compatible character encoding</span>, and, in addition,
-  if that encoding isn't US-ASCII itself, then the encoding must be
-  specified using a <code>meta</code> element with a <code
+  <p>If an <span title="HTML documents">HTML document</span> does not
+  start with a BOM, and if its encoding is not explicitly given by
+  <span title="Content-Type">Content-Type metadata</span>, then the
+  character encoding used must be an <span>ASCII-compatible character
+  encoding</span>, and, in addition, if that encoding isn't US-ASCII
+  itself, then the encoding must be specified using a
+  <code>meta</code> element with a <code
   title="attr-meta-charset">charset</code> attribute or a
   <code>meta</code> element in the <span
   title="attr-meta-http-equiv-content-type">Encoding declaration
   state</span>.</p>
 
-  <p>If the document contains a <code>meta</code> element with a <code
+  <p>If an <span title="HTML documents">HTML document</span> contains
+  a <code>meta</code> element with a <code
   title="attr-meta-charset">charset</code> attribute or a
   <code>meta</code> element in the <span
   title="attr-meta-http-equiv-content-type">Encoding declaration
   state</span>, then the character encoding used must be an
   <span>ASCII-compatible character encoding</span>.</p>
 
   <p>Authors should not use JIS_X0212-1990, x-JIS0208, and encodings
   based on EBCDIC. Authors should not use UTF-32. Authors must not use
   the CESU-8, UTF-7, BOCU-1 and SCSU encodings. <a
   href="#refsCESU8">[CESU8]</a> <a href="#refsUTF7">[UTF7]</a> <a
