Short URL: http://html5.org/r/7647
| SVN | Bug | Comment | Time (UTC) |
|---|---|---|---|
| 7647 | 17839 | Embrace the Encodings specification. | 2013-01-24 01:38 |
Index: source
===================================================================
--- source (revision 7646)
+++ source (revision 7647)
@@ -521,13 +521,9 @@
been copying fixes made by the WHATWG into their fork of the document, as well as making other
changes, many of which are described <a href="#is-this-html5?">above</a>.</p>
- <p>A separate document has been published by the W3C HTML working group to document the
- differences between the HTML specified in this document and the language described in the HTML4
- specification. <a href="#refsHTMLDIFF">[HTMLDIFF]</a></p>
-
<h3>Design notes</h3>
<!--END dev-html--><p><i>This section is non-normative.</i></p><!--START dev-html-->
@@ -2284,76 +2280,68 @@
</div>
- <h4>Character encodings</h4>
- <p>The <dfn>preferred MIME name</dfn> of a character encoding is the
- name or alias labeled as "preferred MIME name" in the IANA
- <cite>Character Sets</cite> registry, if there is one, or the
- encoding's name, if none of the aliases are so labeled. <a
- href="#refsIANACHARSET">[IANACHARSET]</a></p>
+ <h4 id="encoding-terminology">Character encodings</h4>
- <p>An <dfn>ASCII-compatible character encoding</dfn> is a
- single-byte or variable-length encoding in which the bytes 0x09,
- 0x0A, 0x0C, 0x0D, 0x20 - 0x22, 0x26, 0x27, 0x2C - 0x3F, 0x41 - 0x5A,
- and 0x61 - 0x7A<!-- is that list ok? do any character sets we want
- to support do things outside that range? -->, ignoring bytes that
- are the second and later bytes of multibyte sequences, all
- correspond to single-byte sequences that map to the same Unicode
- characters as those bytes in ANSI_X3.4-1968 (US-ASCII). <a
- href="#refsRFC1345">[RFC1345]</a></p>
+ <p>A <dfn title="encoding">character encoding</dfn>, or just <i>encoding</i> where that is not
+ ambiguous, is a defined way to convert between byte streams and Unicode strings, as defined in the
+ WHATWG Encoding standard. An <span>encoding</span> has an <dfn>encoding name</dfn> and one or more
+ <dfn title="encoding label">encoding labels</dfn>, referred to as the encoding's <i>name</i> and
+ <i>labels</i> in the Encoding specification. <a
+ href="#refsENCODING">[ENCODING]</a></p>
- <p class="note">This includes such encodings as Shift_JIS,
- HZ-GB-2312, and variants of ISO-2022, even though it is possible in
- these encodings for bytes like 0x70 to be part of longer sequences
- that are unrelated to their interpretation as ASCII. It excludes
- such encodings as UTF-7, UTF-16, GSM03.38, and EBCDIC variants.</p>
+ <p>An <dfn>ASCII-compatible character encoding</dfn> is a single-byte or variable-length
+ <span>encoding</span> in which the bytes 0x09, 0x0A, 0x0C, 0x0D, 0x20 - 0x22, 0x26, 0x27, 0x2C -
+ 0x3F, 0x41 - 0x5A, and 0x61 - 0x7A<!-- is that list ok? do any character sets we want to support
+ do things outside that range? -->, ignoring bytes that are the second and later bytes of multibyte
+ sequences, all correspond to single-byte sequences that map to the same Unicode characters as
+ those bytes in ANSI_X3.4-1968 (US-ASCII). <a href="#refsRFC1345">[RFC1345]</a></p>
+ <p class="note">This includes such encodings as Shift_JIS, HZ-GB-2312, and variants of ISO-2022,
+ even though it is possible in these encodings for bytes like 0x70 to be part of longer sequences
+ that are unrelated to their interpretation as ASCII. It excludes UTF-16 variants, as well as
+ obsolete legacy encodings such as UTF-7, GSM03.38, and EBCDIC variants.</p>
+
<!--
- We'll have to change that if anyone comes up with a way to have a
- document that is valid as two different encodings at once, with
- different <meta charset> elements applying in each case.
+ We'll have to change that if anyone comes up with a way to have a document that is valid as two
+ different encodings at once, with different <meta charset> elements applying in each case.
-->
- <p>The term <dfn>a UTF-16 encoding</dfn> refers to any variant of
- UTF-16: self-describing UTF-16 with a BOM, ambiguous UTF-16 without
- a BOM, raw UTF-16LE, and raw UTF-16BE. <a
+ <p>The term <dfn>a UTF-16 encoding</dfn> refers to any variant of UTF-16: self-describing UTF-16
+ with a BOM, ambiguous UTF-16 without a BOM, raw UTF-16LE, and raw UTF-16BE. <a
href="#refsRFC2781">[RFC2781]</a></p>
- <p>The term <dfn>code unit</dfn> is used as defined in the Web IDL
- specification: a 16 bit unsigned integer, the smallest atomic
- component of a <code>DOMString</code>. (This is a narrower
- definition than the one used in Unicode.) <a
+ <p>The term <dfn>code unit</dfn> is used as defined in the Web IDL specification: a 16 bit
+ unsigned integer, the smallest atomic component of a <code>DOMString</code>. (This is a narrower
+ definition than the one used in Unicode, and is not the same as a <i>code point</i>.) <a
href="#refsWEBIDL">[WEBIDL]</a></p>
- <p>The term <dfn>Unicode code point</dfn> means a <i
- title="">Unicode scalar value</i> where possible, and an isolated
- surrogate code point when not. When a conformance requirement is
- defined in terms of characters or Unicode code points, a pair of
- <span title="code unit">code units</span> consisting of a high
- surrogate followed by a low surrogate must be treated as the single
- code point represented by the surrogate pair, but isolated
- surrogates must each be treated as the single code point with the
- value of the surrogate. <a href="#refsUNICODE">[UNICODE]</a></p>
+ <p>The term <dfn>Unicode code point</dfn> means a <i title="">Unicode scalar value</i> where
+ possible, and an isolated surrogate code point when not. When a conformance requirement is defined
+ in terms of characters or Unicode code points, a pair of <span title="code unit">code units</span>
+ consisting of a high surrogate followed by a low surrogate must be treated as the single code
+ point represented by the surrogate pair, but isolated surrogates must each be treated as the
+ single code point with the value of the surrogate. <a href="#refsUNICODE">[UNICODE]</a></p>
- <p>In this specification, the term <dfn>character</dfn>, when not
- qualified as <em>Unicode</em> character, is synonymous with the term
- <span>Unicode code point</span>.</p>
+ <p>In this specification, the term <dfn>character</dfn>, when not qualified as <em>Unicode</em>
+ character, is synonymous with the term <span>Unicode code point</span>.</p>
- <p>The term <dfn>Unicode character</dfn> is used to mean a <i
- title="">Unicode scalar value</i> (i.e. any Unicode code point that
- is not a surrogate code point). <a
+ <p>The term <dfn>Unicode character</dfn> is used to mean a <i title="">Unicode scalar value</i>
+ (i.e. any Unicode code point that is not a surrogate code point). <a
href="#refsUNICODE">[UNICODE]</a></p>
- <p>The <dfn>code-unit length</dfn> of a string is the number of
- <span title="code unit">code units</span> in that string.</p>
+ <p>The <dfn>code-unit length</dfn> of a string is the number of <span title="code unit">code
+ units</span> in that string.</p>
- <p class="note">This complexity results from the historical decision
- to define the DOM API in terms of 16 bit (UTF-16) <span title="code
- unit">code units</span>, rather than in terms of <span
+ <p class="note">This complexity results from the historical decision to define the DOM API in
+ terms of 16 bit (UTF-16) <span title="code unit">code units</span>, rather than in terms of <span
title="Unicode character">Unicode characters</span>.</p>
+ <p>When a byte stream is to be <dfn>decoded as UTF-8, with error handling</dfn>, the user agent
+ must return the result of running the <span>utf-8 decoder</span> on that byte stream.</p>
+
<!--END dev-html-->
<!--FIXUP 2dcontext +1-->
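The error handling that the removed UTF-8 section spelled out case by case can be observed with any decoder that substitutes U+FFFD for malformed sequences. The sketch below uses Python's built-in `utf-8` codec with `errors="replace"` as a stand-in. It is not the Encoding standard's utf-8 decoder, and the two are only assumed to agree on this input, which is the byte string from the example deleted in this revision.

```python
# Byte string taken from the example removed in this revision.
data = bytes.fromhex("41 98 BA 42 E2 98 43 E2 98 BA E2 98")

# errors="replace" substitutes U+FFFD REPLACEMENT CHARACTER for each
# malformed sequence instead of raising UnicodeDecodeError.
decoded = data.decode("utf-8", errors="replace")

# Show the code points explicitly so the replacement characters are visible.
print([f"U+{ord(ch):04X}" for ch in decoded])
```

On CPython 3 this is expected to give the same string as the removed example: the two stray continuation bytes become one U+FFFD each, and each truncated `E2 98` prefix becomes a single U+FFFD.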
@@ -2728,6 +2716,35 @@
<dl>
+ <dt>Unicode and Encoding</dt>
+
+ <dd>
+
+ <p>The Unicode character set is used to represent textual data, and the WHATWG Encoding standard
+ defines requirements around <span title="encoding">character encodings</span>. <a
+ href="#refsUNICODE">[UNICODE]</a></p>
+
+ <p class="note">This specification <a href="#encoding-terminology">introduces terminology</a>
+ based on the terms defined in those specifications, as described earlier.</p>
+
+ <p>The following terms are used as defined in the Encoding specification: <a
+ href="#refsENCODING">[ENCODING]</a></p>
+
+ <ul class="brief">
+
+ <li><dfn>Getting an encoding</dfn>
+
+ <li>The <dfn>encoder</dfn> and <dfn>decoder</dfn> algorithms for various encodings, including
+ the <dfn>utf-8 encoder</dfn> and <dfn>utf-8 decoder</dfn>
+
+ </ul>
+
+ <p class="note">The <span>utf-8 decoder</span> is distinct from the <i>utf-8 decode
+ algorithm</i>. The latter is not used by this specification.</p>
+
+ </dd>
+
+
<dt>XML</dt>
<dd>
@@ -3520,74 +3537,7 @@
matches of each other.</p>
- <div class="impl">
- <h3>UTF-8</h3>
-
- <p>When a user agent is required to <dfn title="decoded as UTF-8, with error handling">decode a
- byte string as UTF-8, with error handling</dfn>, it means that the byte stream must be converted
- to a Unicode string by interpreting it as UTF-8, except that any errors must be handled as
- described in the following list. Bytes in the following list are represented in hexadecimal. <a
- href="#refsRFC3629">[RFC3629]</a>
-
- <dl class="switch">
-
- <dt>One byte in the range FE to FF</dt>
-
-
- <dt><span title="overlong form">Overlong forms</span> (e.g. F0 80 80 A0)</dt>
-
- <dt>One byte in the range C0 to C1, followed by one byte in the range 80 to BF</dt> <!-- overlong ASCII (redundant with the previous line, really, but worth calling out separately as it's especially dangerous to miss this case) -->
-
-
- <dt>One byte in the range F0 to F4, followed by three bytes in the range 80 to BF that represent a code point above U+10FFFF</dt>
-
- <dt>One byte in the range F5 to F7, followed by three bytes in the range 80 to BF</dt> <!-- above U+10FFFF -->
-
- <dt>One byte in the range F8 to FB, followed by four bytes in the range 80 to BF</dt> <!-- above U+10FFFF -->
-
- <dt>One byte in the range FC to FD, followed by five bytes in the range 80 to BF</dt> <!-- above U+10FFFF -->
-
-
- <dt>One byte in the range C0 to FD that is not followed by a byte in the range 80 to BF</dt> <!-- too short -->
-
- <dt>One byte in the range E0 to FD, followed by a byte in the range 80 to BF that is not followed by a byte in the range 80 to BF</dt> <!-- too short -->
-
- <dt>One byte in the range F0 to FD, followed by two bytes in the range 80 to BF, the last of which is not followed by a byte in the range 80 to BF</dt> <!-- too short -->
-
- <dt>One byte in the range F8 to FD, followed by three bytes in the range 80 to BF, the last of which is not followed by a byte in the range 80 to BF</dt> <!-- too short -->
-
- <dt>One byte in the range FC to FD, followed by four bytes in the range 80 to BF, the last of which is not followed by a byte in the range 80 to BF</dt> <!-- too short -->
-
-
- <dt>Any byte sequence that represents a code point in the range U+D800 to U+DFFF</dt> <!-- surrogate halves -->
-
-
- <dd>The whole matched sequence must be replaced by a single U+FFFD
- REPLACEMENT CHARACTER.</dd>
-
-
- <dt>One byte in the range 80 to BF not preceded by a byte in the range 80 to FD</dt>
-
- <dt>One byte in the range 80 to BF preceded by a byte that is part of a complete UTF-8 sequence that does not include this byte</dt>
-
- <dt>One byte in the range 80 to BF preceded by a byte that is part of a sequence that has been replaced by a U+FFFD REPLACEMENT CHARACTER, either alone or as part of a sequence</dt>
-
- <dd>Each such byte must be replaced with a U+FFFD REPLACEMENT CHARACTER.</dd>
-
-
- </dl>
-
- <p>For the purposes of the above requirements, an <dfn>overlong form</dfn> in UTF-8 is a sequence
- that encodes a code point using more bytes than the minimum needed to encode that code point in
- UTF-8.</p>
-
- <p class="example">For example, the byte string "41 98 BA 42 E2 98 43 E2 98 BA E2 98" would be
- converted to the string "A��B�C☺�".</p>
-
- </div>
-
-
<h3>Common microsyntaxes</h3>
<p>There are various places in HTML that accept particular data types, such as dates or numbers.
@@ -7005,8 +6955,8 @@
<ol>
- <li>Encode the character into a sequence of octets as defined by
- UTF-8.</li>
+ <li>Encode the character into a sequence of octets as defined by the <span>utf-8 encoder</span>
+ algorithm. <a href="#refsENCODING">[ENCODING]</a></li>
<li>Replace the character with the percent-encoded form of those
octets. <a href="#refsRFC3986">[RFC3986]</a></li>
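The two steps above (encode the character to octets, then percent-encode those octets) can be sketched as follows. The function name `percent_encode_utf8` is hypothetical, and Python's `str.encode("utf-8")` is used as an approximation of the utf-8 encoder; the two are assumed to agree for Unicode scalar values.

```python
def percent_encode_utf8(character: str) -> str:
    """Return the percent-encoded form of one character's UTF-8 octets."""
    # Step 1: encode the character into a sequence of octets.
    octets = character.encode("utf-8")
    # Step 2: write each octet as "%" followed by two uppercase hex digits.
    return "".join(f"%{octet:02X}" for octet in octets)


print(percent_encode_utf8("\u263a"))  # WHITE SMILING FACE, three octets
print(percent_encode_utf8("\u00e9"))  # LATIN SMALL LETTER E WITH ACUTE, two octets
```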
@@ -7045,8 +6995,8 @@
single 0x3F octet (an ASCII question mark) and skip the remaining
substeps for this character.</li>
- <li>Encode the character into a sequence of octets as defined by
- the encoding <var title="">encoding</var>.</li>
+ <li>Encode the character into a sequence of octets as defined by the <span>encoder</span>
+ algorithm for <var title="">encoding</var>. <a href="#refsENCODING">[ENCODING]</a></li>
<li>Replace the character with the percent-encoded form of those
octets. <a href="#refsRFC3986">[RFC3986]</a></li>
@@ -7913,35 +7863,29 @@
<h4>Extracting character encodings from <code>meta</code> elements</h4>
- <p>The <dfn>algorithm for extracting a character encoding from a
- <code>meta</code> element</dfn>, given a string <var
- title="">s</var>, is as follows. It either returns a character encoding or
+ <p>The <dfn>algorithm for extracting a character encoding from a <code>meta</code> element</dfn>,
+ given a string <var title="">s</var>, is as follows. It either returns a character encoding or
nothing.</p>
<ol> <!-- http://www.hixie.ch/tests/adhoc/html/parsing/encoding/all.html -->
- <li><p>Let <var title="">position</var> be a pointer into <var
- title="">s</var>, initially pointing at the start of the
- string.</p></li>
+ <li><p>Let <var title="">position</var> be a pointer into <var title="">s</var>, initially
+ pointing at the start of the string.</p></li>
- <li><p><i>Loop</i>: Find the first seven characters in <var
- title="">s</var> after <var title="">position</var> that are an
- <span>ASCII case-insensitive</span> match for the word "<code
- title="">charset</code>". If no such match is found, return nothing
- and abort these steps.</p></li>
+ <li><p><i>Loop</i>: Find the first seven characters in <var title="">s</var> after <var
+ title="">position</var> that are an <span>ASCII case-insensitive</span> match for the word "<code
+ title="">charset</code>". If no such match is found, return nothing and abort these
+ steps.</p></li>
- <li><p>Skip any <span title="space character">space
- characters</span> that immediately follow the word "<code
- title="">charset</code>" (there might not be any).</p></li>
+ <li><p>Skip any <span title="space character">space characters</span> that immediately follow the
+ word "<code title="">charset</code>" (there might not be any).</p></li>
- <li><p>If the next character is not a U+003D EQUALS SIGN (=),
- then move <var title="">position</var> to point just before that
- next character, and jump back to the step labeled
- <i>loop</i>.</p></li>
+ <li><p>If the next character is not a U+003D EQUALS SIGN (=), then move <var
+ title="">position</var> to point just before that next character, and jump back to the step
+ labeled <i>loop</i>.</p></li>
- <li><p>Skip any <span title="space character">space
- characters</span> that immediately follow the equals sign (there
- might not be any).</p></li>
+ <li><p>Skip any <span title="space character">space characters</span> that immediately follow the
+ equals sign (there might not be any).</p></li>
<li>
@@ -7951,7 +7895,8 @@
<dt>If it is a U+0022 QUOTATION MARK character (") and there is a later U+0022 QUOTATION MARK character (") in <var title="">s</var></dt>
<dt>If it is a U+0027 APOSTROPHE character (') and there is a later U+0027 APOSTROPHE character (') in <var title="">s</var></dt>
- <dd>Return the encoding corresponding to the string between this character and the next earliest occurrence of this character.</dd>
+ <dd>Return the result of <span>getting an encoding</span> from the substring that is between
+ this character and the next earliest occurrence of this character.</dd>
<dt>If it is an unmatched U+0022 QUOTATION MARK character (")</dt>
<dt>If it is an unmatched U+0027 APOSTROPHE character (')</dt>
@@ -7959,10 +7904,9 @@
<dd>Return nothing.</dd>
<dt>Otherwise</dt>
- <dd>Return the encoding corresponding to the string from this
- character to the first <span>space character</span> or U+003B
- SEMICOLON character (;), or the end of <var title="">s</var>,
- whichever comes first.</dd>
+ <dd>Return the result of <span>getting an encoding</span> from the substring that consists of
+ this character up to but not including the first <span>space character</span> or U+003B
+ SEMICOLON character (;), or the end of <var title="">s</var>, whichever comes first.</dd>
</dl>
@@ -7970,15 +7914,12 @@
</ol>
- <p class="note">This algorithm is distinct from those in the HTTP
- specification (for example, HTTP doesn't allow the use of single
- quotes and requires supporting a backslash-escape mechanism that is
- not supported by this algorithm<!-- not to mention not having any
- rules for error-handling, which is of course why we're having to
- define it ourselves -->). While the algorithm is used in contexts
- that, historically, were related to HTTP, the syntax as supported by
- implementations diverged some time ago. <a
- href="#refsHTTP">[HTTP]</a></p>
+ <p class="note">This algorithm is distinct from those in the HTTP specification (for example, HTTP
+ doesn't allow the use of single quotes and requires supporting a backslash-escape mechanism that
+ is not supported by this algorithm<!-- not to mention not having any rules for error-handling,
+ which is of course why we're having to define it ourselves -->). While the algorithm is used in
+ contexts that, historically, were related to HTTP, the syntax as supported by implementations
+ diverged some time ago. <a href="#refsHTTP">[HTTP]</a></p>
</div>
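The extraction algorithm touched by the hunks above can be sketched in Python as shown below. This is a minimal illustration, not the specification's text: the function name `extract_charset_label` is hypothetical, and the final step of getting an encoding from the label is left out, so the sketch returns the raw substring (or `None` for "nothing").

```python
import string

# The HTML space characters: space, tab, LF, FF, CR.
SPACE_CHARACTERS = " \t\n\f\r"
# Fold only A-Z so the comparison is ASCII case-insensitive and
# string indices stay aligned with the original.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def extract_charset_label(s: str):
    folded = s.translate(_ASCII_LOWER)
    position = 0
    while True:
        # Loop: find the next "charset" at or after position.
        index = folded.find("charset", position)
        if index == -1:
            return None
        i = index + len("charset")
        # Skip space characters after the word.
        while i < len(s) and s[i] in SPACE_CHARACTERS:
            i += 1
        if i < len(s) and s[i] == "=":
            i += 1
            break
        # Not an equals sign: resume the search just before that character.
        position = i
    # Skip space characters after the equals sign.
    while i < len(s) and s[i] in SPACE_CHARACTERS:
        i += 1
    if i >= len(s):
        return None
    if s[i] in "\"'":
        closing = s.find(s[i], i + 1)
        # An unmatched quote yields nothing.
        return None if closing == -1 else s[i + 1:closing]
    # Unquoted: run up to the first space character, semicolon, or end.
    end = i
    while end < len(s) and s[end] not in SPACE_CHARACTERS and s[end] != ";":
        end += 1
    return s[i:end]


print(extract_charset_label("text/html; charset=utf-8"))
```

A real implementation would pass the returned label to the Encoding standard's label lookup rather than returning the string directly.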
@@ -15565,38 +15506,31 @@
<dd>
- <p>The <span title="attr-meta-http-equiv-content-type">Encoding
- declaration state</span> is just an alternative form of setting
- the <code title="meta-charset">charset</code> attribute: it is a
- <span>character encoding declaration</span>. <span
- class="impl">This state's user agent requirements are all handled
- by the parsing section of the specification.</span></p>
+ <p>The <span title="attr-meta-http-equiv-content-type">Encoding declaration state</span> is just
+ an alternative form of setting the <code title="meta-charset">charset</code> attribute: it is a
+ <span>character encoding declaration</span>. <span class="impl">This state's user agent
+ requirements are all handled by the parsing section of the specification.</span></p>
- <p>For <code>meta</code> elements with an <code
- title="attr-meta-http-equiv">http-equiv</code> attribute in the
- <span title="attr-meta-http-equiv-content-type">Encoding
- declaration state</span>, the <code
- title="attr-meta-content">content</code> attribute must have a
- value that is an <span>ASCII case-insensitive</span> match for a
- string that consists of: the literal string "<code
- title="">text/html;</code>", optionally followed by any number of
- <span title="space character">space characters</span>, followed by
- the literal string "<code title="">charset=</code>", followed by
- the character encoding name of the <span>character encoding
+ <p>For <code>meta</code> elements with an <code title="attr-meta-http-equiv">http-equiv</code>
+ attribute in the <span title="attr-meta-http-equiv-content-type">Encoding declaration
+ state</span>, the <code title="attr-meta-content">content</code> attribute must have a value
+ that is an <span>ASCII case-insensitive</span> match for a string that consists of: the literal
+ string "<code title="">text/html;</code>", optionally followed by any number of <span
+ title="space character">space characters</span>, followed by the literal string "<code
+ title="">charset=</code>", followed by the <span title="encoding name">name</span> of the <span
+ title="encoding">character encoding</span> of the <span>character encoding
declaration</span>.</p>
- <p>A document must not contain both a <code>meta</code> element
- with an <code title="attr-meta-http-equiv">http-equiv</code>
- attribute in the <span
- title="attr-meta-http-equiv-content-type">Encoding declaration
- state</span> and a <code>meta</code> element with the <code
- title="attr-meta-charset">charset</code> attribute present.</p>
+ <p>A document must not contain both a <code>meta</code> element with an <code
+ title="attr-meta-http-equiv">http-equiv</code> attribute in the <span
+ title="attr-meta-http-equiv-content-type">Encoding declaration state</span> and a
+ <code>meta</code> element with the <code title="attr-meta-charset">charset</code> attribute
+ present.</p>
- <p>The <span title="attr-meta-http-equiv-content-type">Encoding
- declaration state</span> may be used in <span>HTML
- documents</span>, but elements with an <code
- title="attr-meta-http-equiv">http-equiv</code> attribute in that
- state must not be used in <span>XML documents</span>.</p>
+ <p>The <span title="attr-meta-http-equiv-content-type">Encoding declaration state</span> may be
+ used in <span>HTML documents</span>, but elements with an <code
+ title="attr-meta-http-equiv">http-equiv</code> attribute in that state must not be used in
+ <span>XML documents</span>.</p>
</dd>
@@ -15915,118 +15849,89 @@
<h5 id="charset">Specifying the document's character encoding</h5>
- <!-- READ ME WHEN EDITING: if we ever move this to the "writing
- HTML" section, then we have to duplicate the requirements in the
- parsing section for conformance checkers, and we have to make sure
- that the requirements for charset="" apply even in XML, for the
- <meta charset=""> polyglot hack. -->
+ <!-- READ ME WHEN EDITING: if we ever move this to the "writing HTML" section, then we have to
+ duplicate the requirements in the parsing section for conformance checkers, and we have to make
+ sure that the requirements for charset="" apply even in XML, for the <meta charset=""> polyglot
+ hack. -->
- <p>A <dfn>character encoding declaration</dfn> is a mechanism by
- which the character encoding used to store or transmit a document is
- specified.</p>
+ <p>A <dfn>character encoding declaration</dfn> is a mechanism by which the <span
+ title="encoding">character encoding</span> used to store or transmit a document is specified.</p>
- <p>The following restrictions apply to character encoding
- declarations:</p>
+ <p>The following restrictions apply to <span title="character encoding declaration">character
+ encoding declarations</span>:</p>
<ul>
- <li>The character encoding name given must be the name of the
- character encoding used to serialize the file.</li>
+ <li>The character encoding name given must be an <span>ASCII case-insensitive</span> match for
+ the <span title="encoding name">name</span> of the <span title="encoding">character
+ encoding</span> used to serialize the file. <a href="#refsENCODING">[ENCODING]</a></li>
- <li>The value must be a valid character encoding name, and must be
- an <span>ASCII case-insensitive</span> match for the
- <span>preferred MIME name</span> for that encoding. <a
- href="#refsIANACHARSET">[IANACHARSET]</a></li>
+ <li>The character encoding declaration must be serialized without the use of <span
+ title="syntax-charref">character references</span> or character escapes of any kind.</li>
- <li>The character encoding declaration must be serialized without
- the use of <span title="syntax-charref">character references</span>
- or character escapes of any kind.</li>
+ <li id="charset1024"><span title="" id="charset512">The element containing the character encoding
+ declaration must be serialized completely within the first 1024 bytes of the
+ document.</span></li> <!-- span is for historical reasons, to keep an old ID alive -->
- <li id="charset1024"><span title="" id="charset512">The element
- containing the character encoding declaration must be serialized
- completely within the first 1024 bytes of the document.</span></li>
- <!-- span is for historical reasons, to keep an old ID alive -->
-
</ul>
- <p>In addition, due to a number of restrictions on <code>meta</code>
- elements, there can only be one <code>meta</code>-based character
- encoding declaration per document.</p> <!-- conformance criteria for
- this one are given in the XML spec, the <meta> section just after
- defining charset="", and the character encoding pragma section. Note
- that you _can_ have two character encoding declaration per document,
- if the document is an XML document and one is an XML declaration,
- the other is <meta charset>, and the encoding is UTF-8. -->
+ <p>In addition, due to a number of restrictions on <code>meta</code> elements, there can only be
+ one <code>meta</code>-based character encoding declaration per document.</p> <!-- conformance
+ criteria for this one are given in the XML spec, the <meta> section just after defining
+ charset="", and the character encoding pragma section. Note that you _can_ have two character
+ encoding declarations per document, if the document is an XML document and one is an XML
+ declaration, the other is <meta charset>, and the encoding is UTF-8. -->
- <p>If an <span title="HTML documents">HTML document</span> does not
- start with a BOM, and its encoding is not explicitly given by <span
- title="Content-Type">Content-Type metadata</span>, and the document
- is not <span>an <code>iframe</code> <code
- title="attr-iframe-srcdoc">srcdoc</code> document</span>, then the
- character encoding used must be an <span>ASCII-compatible character
- encoding</span>, and the encoding must be specified using a
- <code>meta</code> element with a <code
- title="attr-meta-charset">charset</code> attribute or a
- <code>meta</code> element with an <code
- title="attr-meta-http-equiv">http-equiv</code> attribute in the
- <span title="attr-meta-http-equiv-content-type">Encoding declaration
- state</span>.</p>
+ <p>If an <span title="HTML documents">HTML document</span> does not start with a BOM, and its
+ <span>encoding</span> is not explicitly given by <span title="Content-Type">Content-Type
+ metadata</span>, and the document is not <span>an <code>iframe</code> <code
+ title="attr-iframe-srcdoc">srcdoc</code> document</span>, then the character encoding used must be
+ an <span>ASCII-compatible character encoding</span>, and the encoding must be specified using a
+ <code>meta</code> element with a <code title="attr-meta-charset">charset</code> attribute or a
+ <code>meta</code> element with an <code title="attr-meta-http-equiv">http-equiv</code> attribute
+ in the <span title="attr-meta-http-equiv-content-type">Encoding declaration state</span>.</p>
- <p class="note">A character encoding declaration is required (either
- in the <span title="Content-Type">Content-Type metadata</span> or
- explicitly in the file) even if the encoding is US-ASCII, because a character
- encoding is needed to process non-ASCII characters entered by the
+ <p class="note">A character encoding declaration is required (either in the <span
+ title="Content-Type">Content-Type metadata</span> or explicitly in the file) even if the encoding
+ is US-ASCII, because a character encoding is needed to process non-ASCII characters entered by the
user in forms, in URLs generated by scripts, and so forth.</p>
- <p>If the document is <span>an <code>iframe</code> <code
- title="attr-iframe-srcdoc">srcdoc</code> document</span>, the
- document must not have a <span>character encoding
- declaration</span>. (In this case, the source is already decoded,
- since it is part of the document that contained the
+ <p>If the document is <span>an <code>iframe</code> <code title="attr-iframe-srcdoc">srcdoc</code>
+ document</span>, the document must not have a <span>character encoding declaration</span>. (In
+ this case, the source is already decoded, since it is part of the document that contained the
<code>iframe</code>.)</p>
- <p>If an <span title="HTML documents">HTML document</span> contains
- a <code>meta</code> element with a <code
- title="attr-meta-charset">charset</code> attribute or a
- <code>meta</code> element with an <code
- title="attr-meta-http-equiv">http-equiv</code> attribute in the
- <span title="attr-meta-http-equiv-content-type">Encoding declaration
- state</span>, then the character encoding used must be an
- <span>ASCII-compatible character encoding</span>.</p>
+ <p>If an <span title="HTML documents">HTML document</span> contains a <code>meta</code> element
+ with a <code title="attr-meta-charset">charset</code> attribute or a <code>meta</code> element
+ with an <code title="attr-meta-http-equiv">http-equiv</code> attribute in the <span
+ title="attr-meta-http-equiv-content-type">Encoding declaration state</span>, then the character
+ encoding used must be an <span>ASCII-compatible character encoding</span>.</p>
- <p>Authors are encouraged to use UTF-8. Conformance checkers may
- advise authors against using legacy encodings. <a
- href="#refsRFC3629">[RFC3629]</a></p>
+ <p>Authors should use UTF-8. Conformance checkers may advise authors against using legacy
+ encodings. <a href="#refsRFC3629">[RFC3629]</a></p>
<div class="impl">
- <p>Authoring tools should default to using UTF-8 for newly-created
- documents. <a href="#refsRFC3629">[RFC3629]</a></p>
+ <p>Authoring tools should default to using UTF-8 for newly-created documents. <a
+ href="#refsRFC3629">[RFC3629]</a></p>
</div>
- <p>Encodings in which a series of bytes in the range 0x20 to 0x7E
- can encode characters other than the corresponding characters in the
- range U+0020 to U+007E represent a potential security vulnerability:
- a user agent that does not support the encoding (or does not support
- the label used to declare the encoding, or does not use the same
- mechanism to detect the encoding of unlabelled content as another
- user agent) might end up interpreting technically benign plain text
- content as HTML tags and JavaScript. For example, this applies to
- encodings in which the bytes corresponding to "<code
- title="">&lt;script&gt;</code>" in ASCII can encode a different
- string. Authors should not use such encodings, which are known to
- include JIS_C6226-1983<!-- aka JIS-X-0208, x-JIS0208 -->,
- JIS_X0212-1990<!-- aka JIS-X-0212 -->, HZ-GB-2312<!-- has crazy
- handling of ASCII "~" -->, JOHAB <!-- a supplementary encoding in KS
- C 5601-1992 Annex 3 (= KS X 1001:1998 Annex 3) --> (Windows code
- page 1361), encodings based on ISO-2022<!--
- http://krijnhoetmer.nl/irc-logs/whatwg/20090628#l-422 and
- http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-October/023797.html
- -->, and encodings based on EBCDIC. Furthermore, authors must not
- use the CESU-8, UTF-7, BOCU-1 and SCSU encodings, which also fall
- into this category, because these encodings were never intended for
- use for Web content.
+ <p>Encodings in which a series of bytes in the range 0x20 to 0x7E can encode characters other than
+ the corresponding characters in the range U+0020 to U+007E represent a potential security
+ vulnerability: a user agent that does not support the encoding (or does not support the label used
+ to declare the encoding, or does not use the same mechanism to detect the encoding of unlabelled
+ content as another user agent) might end up interpreting technically benign plain text content as
+ HTML tags and JavaScript. Authors should therefore not use these encodings. For example, this
+ applies to encodings in which the bytes corresponding to "<code title="">&lt;script&gt;</code>" in
+ ASCII can encode a different string. Authors should not use such encodings, which are known to
+ include JIS_C6226-1983<!-- aka JIS-X-0208, x-JIS0208 -->, JIS_X0212-1990<!-- aka JIS-X-0212 -->,
+ HZ-GB-2312<!-- has crazy handling of ASCII "~" -->, JOHAB <!-- a supplementary encoding in KS C
+ 5601-1992 Annex 3 (= KS X 1001:1998 Annex 3) --> (Windows code page 1361), encodings based on
+ ISO-2022<!-- http://krijnhoetmer.nl/irc-logs/whatwg/20090628#l-422 and
+ http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-October/023797.html -->, and encodings
+ based on EBCDIC. Furthermore, authors must not use the CESU-8, UTF-7, BOCU-1 and SCSU encodings,
+ which also fall into this category; these encodings were never intended for use for Web content.
<a href="#refsRFC1345">[RFC1345]</a><!-- for the JIS types -->
<a href="#refsRFC1842">[RFC1842]</a><!-- HZ-GB-2312 -->
<a href="#refsRFC1468">[RFC1468]</a><!-- ISO-2022-JP -->
@@ -16042,27 +15947,24 @@
<!-- no idea what to reference for JOHAB or EBCDIC, so... -->
</p>
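The risk described in this paragraph can be illustrated with UTF-7, one of the encodings it forbids. The following is a minimal sketch, not part of the specification: the function name `reinterpret` and the sample payload are hypothetical, and Python's built-in `utf-7` codec is used only as a convenient decoder. Every byte of the payload lies in the range 0x20 to 0x7E, so a user agent that treats it as ASCII sees harmless text, while one that decodes it as UTF-7 sees a `script` element.

```python
def reinterpret(payload: bytes):
    """Decode the same bytes as ASCII and as UTF-7 and return both readings."""
    # Confirm the payload consists only of printable ASCII bytes (0x20 to 0x7E).
    printable_only = all(0x20 <= b <= 0x7E for b in payload)
    as_ascii = payload.decode("ascii")
    as_utf7 = payload.decode("utf-7")
    return printable_only, as_ascii, as_utf7


# Hypothetical payload: "+ADw-" and "+AD4-" are the UTF-7 forms of "<" and ">".
payload = b"+ADw-script+AD4-alert(1)+ADw-/script+AD4-"
printable_only, as_ascii, as_utf7 = reinterpret(payload)
```

The two readings differ even though no byte falls outside the printable ASCII range, which is why agreeing on the label alone is not enough to make such encodings safe.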
- <p>Authors should not use UTF-32, as the encoding detection
- algorithms described in this specification intentionally do not
- distinguish it from UTF-16. <a href="#refsUNICODE">[UNICODE]</a></p>
+ <p>Authors should not use UTF-32, as the encoding detection algorithms described in this
+ specification intentionally do not distinguish it from UTF-16. <a
+ href="#refsUNICODE">[UNICODE]</a></p>
- <p class="note">Using non-UTF-8 encodings can have unexpected
- results on form submission and URL encodings, which use the
- <span>document's character encoding</span> by default.</p>
+ <p class="note">Using non-UTF-8 encodings can have unexpected results on form submission and URL
+ encodings, which use the <span>document's character encoding</span> by default.</p>
- <p>In XHTML, the XML declaration should be used for inline character
- encoding information, if necessary.</p>
+ <p>In XHTML, the XML declaration should be used for inline character encoding information, if
+ necessary.</p>
<div class="example">
- <p>In HTML, to declare that the character encoding is UTF-8, the
- author could include the following markup near the top of the
- document (in the <code>head</code> element):</p>
+ <p>In HTML, to declare that the character encoding is UTF-8, the author could include the
+ following markup near the top of the document (in the <code>head</code> element):</p>
<pre>&lt;meta charset="utf-8"></pre>
- <p>In XML, the XML declaration would be used instead, at the very
- top of the markup:</p>
+ <p>In XML, the XML declaration would be used instead, at the very top of the markup:</p>
<pre>&lt;?xml version="1.0" encoding="utf-8"?></pre>
@@ -16629,10 +16531,10 @@
<p>The <dfn title="attr-script-charset"><code>charset</code></dfn> attribute gives the character
encoding of the external script resource. The attribute must not be specified if the <code
title="attr-script-src">src</code> attribute is not present. If the attribute is set, its value
- must be a valid character encoding name, must be an <span>ASCII case-insensitive</span> match for
- the <span>preferred MIME name</span> for that encoding, and must match the encoding given in the
- <code title="">charset</code> parameter of the <span title="Content-Type">Content-Type
- metadata</span> of the external file, if any. <a href="#refsIANACHARSET">[IANACHARSET]</a></p>
+ must be an <span>ASCII case-insensitive</span> match for the <span title="encoding
+ name">name</span> of an <span>encoding</span>, and must specify the same <span>encoding</span> as
+ the <code title="">charset</code> parameter of the <span title="Content-Type">Content-Type
+ metadata</span> of the external file, if any. <a href="#refsENCODING">[ENCODING]</a></p>
<p>The <dfn title="attr-script-async"><code>async</code></dfn> and <dfn
title="attr-script-defer"><code>defer</code></dfn> attributes are <span title="boolean
@@ -16913,8 +16815,8 @@
<p>If the <code>script</code> element has a <code title="attr-script-charset">charset</code>
attribute, then let <var>the script block's character encoding</var> for this
- <code>script</code> element be the encoding given by the <code
- title="attr-script-charset">charset</code> attribute.</p>
+ <code>script</code> element be the result of <span>getting an encoding</span> from the value of
+ the <code title="attr-script-charset">charset</code> attribute.</p>
<p>Otherwise, let <var>the script block's fallback character encoding</var> for this
<code>script</code> element be the same as <span title="document's character encoding">the
@@ -53104,15 +53006,12 @@
elements</span>, some of which can represent editable values that
can be submitted to a server for processing.</p>
- <p>The <dfn
- title="attr-form-accept-charset"><code>accept-charset</code></dfn>
- attribute gives the character encodings that are to be used for the
- submission. If specified, the value must be an <span>ordered set of
- unique space-separated tokens</span> that are <span>ASCII
- case-insensitive</span>, and each token must be an <span>ASCII
- case-insensitive</span> match for the <span>preferred MIME
- name</span> of an <span>ASCII-compatible character encoding</span>.
- <a href="#refsIANACHARSET">[IANACHARSET]</a></p>
+ <p>The <dfn title="attr-form-accept-charset"><code>accept-charset</code></dfn> attribute gives the
+ character encodings that are to be used for the submission. If specified, the value must be an
+ <span>ordered set of unique space-separated tokens</span> that are <span>ASCII
+ case-insensitive</span>, and each token must be an <span>ASCII case-insensitive</span> match for
+ the <span title="encoding name">name</span> of an <span>ASCII-compatible character
+ encoding</span>. <a href="#refsENCODING">[ENCODING]</a></p>
<p>The <dfn title="attr-form-name"><code>name</code></dfn> attribute
represents the <code>form</code>'s name within the <code
@@ -67105,6 +67004,56 @@
</div>
+ <h5>Selecting a form submission encoding</h5>
+
+ <p>If the user agent is to <dfn title="picking an encoding for the form">pick an encoding for a
+ form</dfn>, optionally with an <i>allow non-ASCII-compatible encodings</i> flag set, it must run
+ the following substeps:</p>
+
+ <ol>
+
+ <li><p>Let <var title="">input</var> be the value of the <code>form</code> element's <code
+ title="attr-form-accept-charset">accept-charset</code> attribute.</p></li>
+
+ <li><p>Let <var title="">candidate encoding labels</var> be the result of <span title="split a
+ string on spaces">splitting <var title="">input</var> on spaces</span>.</p></li>
+
+ <li><p>Let <var title="">candidate encodings</var> be an empty list of <span
+ title="encoding">character encodings</span>.</p></li>
+
+ <li><p>For each token in <var title="">candidate encoding labels</var> in turn (in the order in
+ which they were found in <var title="">input</var>), <span title="getting an encoding">get an
+ encoding</span> for the token and, if this does not result in failure, append the
+ <span>encoding</span> to <var title="">candidate encodings</var>.</p></li>
+
+ <li><p>If the <i>allow non-ASCII-compatible encodings</i> flag is not set, remove any encodings
+ that are not <span title="ASCII-compatible character encoding">ASCII-compatible character
+ encodings</span> from <var title="">candidate encodings</var>.</p></li>
+
+ <li><p>If <var title="">candidate encodings</var> is empty, return UTF-8 and abort these
+ steps.</p></li>
+
+ <li>
+
+ <p>Each character encoding in <var title="">candidate encodings</var> can represent a finite
+ number of characters. (For example, UTF-8 can represent all 1.1 million or so Unicode code
+ points, while Windows-1252 can only represent 256.)</p>
+
+ <p>For each encoding in <var title="">candidate encodings</var>, determine how many of the
+ characters in the names and values of the entries in the <var title="">form data set</var> the
+ encoding can represent (without ignoring duplicates). Let <var title="">max</var> be the
+ highest such count. (For UTF-8, <var title="">max</var> would equal the number of characters
+ in the names and values of the entries in the <var title="">form data set</var>.)</p>
+
+ <p>Return the first encoding in <var title="">candidate encodings</var> that can encode <var
+ title="">max</var> characters in the names and values of the entries in the <var title="">form
+ data set</var>.</p>
+
+ </li>
+
+ </ol>
+
+
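The substeps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the specification's algorithm: `get_encoding` is a hypothetical stand-in for "getting an encoding" that relies on Python's codec registry (whose labels and names differ from those of the Encoding specification), `NON_ASCII_COMPATIBLE` is a small assumed set rather than the real definition of an ASCII-compatible character encoding, and the form data set is modelled as a list of (name, value) string pairs.

```python
import codecs
import re

# Assumption: an illustrative subset of Python codec names that are not ASCII-compatible.
NON_ASCII_COMPATIBLE = {"utf-16", "utf-16-le", "utf-16-be", "utf-32", "utf-32-le", "utf-32-be"}


def get_encoding(label):
    """Hypothetical stand-in for 'getting an encoding'; returns None on failure."""
    try:
        name = codecs.lookup(label).name
        "".encode(name)  # raises LookupError for codecs that are not text encodings
        return name
    except LookupError:
        return None


def count_representable(encoding, text):
    """Count the characters of text (duplicates included) that the encoding can represent."""
    count = 0
    for ch in text:
        try:
            ch.encode(encoding)
            count += 1
        except UnicodeEncodeError:
            pass
    return count


def pick_encoding_for_form(accept_charset, form_data_set, allow_non_ascii_compatible=False):
    # Split the attribute value on space characters, dropping empty tokens.
    labels = [t for t in re.split(r"[ \t\n\f\r]+", accept_charset) if t]
    # Keep each label that resolves to an encoding, in source order.
    candidates = [e for e in map(get_encoding, labels) if e is not None]
    if not allow_non_ascii_compatible:
        candidates = [e for e in candidates if e not in NON_ASCII_COMPATIBLE]
    if not candidates:
        return "utf-8"
    text = "".join(name + value for name, value in form_data_set)
    counts = [count_representable(e, text) for e in candidates]
    # Return the first candidate that reaches the highest count.
    return candidates[counts.index(max(counts))]
```

Because ties are resolved in favour of the earliest token, the order in which an author lists encodings in `accept-charset` matters whenever several of them can represent the whole form data set.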
<h5>URL-encoded form data</h5>
<p class="note">This form data set encoding is in many ways an
@@ -67131,26 +67080,20 @@
<li>
<p>If the <code>form</code> element has an <code
- title="attr-form-accept-charset">accept-charset</code> attribute,
- then, taking into account the characters found in the <var
- title="">form data set</var>'s names and values, and the character
- encodings supported by the user agent, select a character encoding
- from the list given in the <code>form</code>'s <code
- title="attr-form-accept-charset">accept-charset</code> attribute
- that is an <span>ASCII-compatible character encoding</span>. If
- none of the encodings are supported, or if none are listed, then
- let the selected character encoding be UTF-8.</p>
+ title="attr-form-accept-charset">accept-charset</code> attribute, let the selected character
+ encoding be the result of <span>picking an encoding for the form</span>.</p>
- <p>Otherwise, if the <span>document's character encoding</span> is
- an <span>ASCII-compatible character encoding</span>, then that is
+ <p>Otherwise, if the <code>form</code> element has no <code
+ title="attr-form-accept-charset">accept-charset</code> attribute, but the <span>document's
+ character encoding</span> is an <span>ASCII-compatible character encoding</span>, then that is
the selected character encoding.</p>
<p>Otherwise, let the selected character encoding be UTF-8.</p>
</li>
- <li><p>Let <var title="">charset</var> be the <span>preferred MIME
- name</span> of the selected character encoding.</p></li>
+ <li><p>Let <var title="">charset</var> be the <span title="encoding name">name</span> of the
+ selected <span title="encoding">character encoding</span>.</p></li>
<li>
@@ -67181,9 +67124,8 @@
<li>
- <p>Encode the entry's name and value using the selected
- character encoding. The entry's name and value are now byte
- strings.</p>
+ <p>Encode the entry's name and value using the <span>encoder</span> for the selected character
+ encoding. The entry's name and value are now byte strings.</p>
</li>
@@ -67381,7 +67323,7 @@
component of the first such pair, when decoded as US-ASCII, is the
name of a supported character encoding, then let <var
title="">encoding</var> be that character encoding (replacing the
- default passed to the algorithm).</p></li>
+ default passed to the algorithm).</p></li> <!-- XXX -->
<li><p>Convert the name and value components of each name-value
pair in <var title="">pairs</var> to Unicode by interpreting the
@@ -67425,26 +67367,20 @@
<code>form</code> element described in the next paragraph.)</p>
<p>Otherwise, if the <code>form</code> element has an <code
- title="attr-form-accept-charset">accept-charset</code> attribute,
- then, taking into account the characters found in the <var
- title="">form data set</var>'s names and values, and the character
- encodings supported by the user agent, select a character encoding
- from the list given in the <code>form</code>'s <code
- title="attr-form-accept-charset">accept-charset</code> attribute
- that is an <span>ASCII-compatible character encoding</span>. If
- none of the encodings are supported, or if none are listed, then
- let the selected character encoding be UTF-8.</p>
+ title="attr-form-accept-charset">accept-charset</code> attribute, let the selected character
+ encoding be the result of <span>picking an encoding for the form</span>.</p>
- <p>Otherwise, if the <span>document's character encoding</span> is
- an <span>ASCII-compatible character encoding</span>, then that is
+ <p>Otherwise, if the <code>form</code> element has no <code
+ title="attr-form-accept-charset">accept-charset</code> attribute, but the <span>document's
+ character encoding</span> is an <span>ASCII-compatible character encoding</span>, then that is
the selected character encoding.</p>
<p>Otherwise, let the selected character encoding be UTF-8.</p>
</li>
- <li><p>Let <var title="">charset</var> be the <span>preferred MIME
- name</span> of the selected character encoding.</p></li>
+ <li><p>Let <var title="">charset</var> be the <span title="encoding name">name</span> of the
+ selected <span title="encoding">character encoding</span>.</p></li>
<li>
@@ -67553,22 +67489,18 @@
that it isn't limited to ASCII-compatible encodings -->
<p>If the <code>form</code> element has an <code
- title="attr-form-accept-charset">accept-charset</code> attribute,
- then, taking into account the characters found in the <var
- title="">form data set</var>'s names and values, and the character
- encodings supported by the user agent, select a character encoding
- from the list given in the <code>form</code>'s <code
- title="attr-form-accept-charset">accept-charset</code> attribute.
- If none of the encodings are supported, or if none are listed,
- then let the selected character encoding be UTF-8.</p>
+ title="attr-form-accept-charset">accept-charset</code> attribute, let the selected character
+ encoding be the result of <span>picking an encoding for the form</span>, with the <i>allow
+ non-ASCII-compatible encodings</i> flag unset.</p>
- <p>Otherwise, the selected character encoding is the
- <span>document's character encoding</span>.</p>
+ <p>Otherwise, if the <code>form</code> element has no <code
+ title="attr-form-accept-charset">accept-charset</code> attribute, then the <span>document's
+ character encoding</span> is the selected character encoding.</p>
</li>
- <li><p>Let <var title="">charset</var> be the <span>preferred MIME
- name</span> of the selected character encoding.</p></li>
+ <li><p>Let <var title="">charset</var> be the <span title="encoding name">name</span> of the
+ selected <span title="encoding">character encoding</span>.</p></li>
<li><p>If the entry's name is "<code
title="attr-fe-name-charset">_charset_</code>" and its type is
@@ -67601,7 +67533,7 @@
</li>
- <li><p>Encode <var title="">result</var> using the selected
+ <li><p>Encode <var title="">result</var> using the <span>encoder</span> for the selected
character encoding and return the resulting byte stream.</p></li>
</ol>
@@ -96046,15 +95978,9 @@
<h3>Event definitions</h3>
- <p>Messages in <span>server-sent events</span>, <span>Web
- sockets</span>, <span>cross-document messaging</span>, and
- <span>channel messaging</span> use the <dfn
- title="event-message"><code>message</code></dfn> event.
- <!--END complete-->
- <a href="#refsEVENTSOURCE">[EVENTSOURCE]</a>
- <a href="#refsWEBSOCKET">[WEBSOCKET]</a>
- <!--START complete-->
- </p>
+ <p>Messages in <span>server-sent events</span>, <span>Web sockets</span>, <span>cross-document
+ messaging</span>, and <span>channel messaging</span> use the <dfn
+ title="event-message"><code>message</code></dfn> event. </p>
<p>The following interface is defined for this event:</p>
@@ -101433,78 +101359,58 @@
invalid UTF-8 byte sequences in a UTF-8 input byte stream) are
errors that conformance checkers are expected to report.</p>
- <p>Any byte or sequence of bytes in the original byte stream that is
- <span>misinterpreted for compatibility</span> is a <span>parse
- error</span>.</p>
-
<h5>Determining the character encoding</h5>
- <p>In some cases, it might be impractical to unambiguously determine
- the encoding before parsing the document. Because of this, this
- specification provides for a two-pass mechanism with an optional
- pre-scan. Implementations are allowed, as described below, to apply
- a simplified parsing algorithm to whatever bytes they have available
- before beginning to parse the document. Then, the real parser is
- started, using a tentative encoding derived from this pre-parse and
- other out-of-band metadata. If, while the document is being loaded,
- the user agent discovers a character encoding declaration that conflicts with
- this information, then the parser can get reinvoked to perform a
- parse of the document with the real encoding.</p>
+ <p>In some cases, it might be impractical to unambiguously determine the encoding before parsing
+ the document. Because of this, this specification provides for a two-pass mechanism with an
+ optional pre-scan. Implementations are allowed, as described below, to apply a simplified parsing
+ algorithm to whatever bytes they have available before beginning to parse the document. Then, the
+ real parser is started, using a tentative encoding derived from this pre-parse and other
+ out-of-band metadata. If, while the document is being loaded, the user agent discovers a character
+ encoding declaration that conflicts with this information, then the parser can get reinvoked to
+ perform a parse of the document with the real encoding.</p>
- <p id="documentEncoding">User agents must use the following
- algorithm, called the <dfn>encoding sniffing algorithm</dfn>, to
- determine the character encoding to use when decoding a document in
- the first pass. This algorithm takes as input any out-of-band
- metadata available to the user agent (e.g. the <span
- title="Content-Type">Content-Type metadata</span> of the document)
- and all the bytes available so far, and returns a character encoding and a
- <dfn title="concept-encoding-confidence">confidence</dfn>. The
- confidence is either <i>tentative</i>, <i>certain</i>, or
- <i>irrelevant</i>. The encoding used, and whether the confidence in
- that encoding is <i>tentative</i> or <i>certain</i>, is <a
- href="#meta-charset-during-parse">used during the parsing</a> to
- determine whether to <span>change the encoding</span>. If no
- encoding is necessary, e.g. because the parser is operating on a
- Unicode stream and doesn't have to use a character encoding at all, then the
- <span title="concept-encoding-confidence">confidence</span> is
+ <p id="documentEncoding">User agents must use the following algorithm, called the <dfn>encoding
+ sniffing algorithm</dfn>, to determine the character encoding to use when decoding a document in
+ the first pass. This algorithm takes as input any out-of-band metadata available to the user agent
+ (e.g. the <span title="Content-Type">Content-Type metadata</span> of the document) and all the
+ bytes available so far, and returns a character encoding and a <dfn
+ title="concept-encoding-confidence">confidence</dfn>. The confidence is either <i>tentative</i>,
+ <i>certain</i>, or <i>irrelevant</i>. The encoding used, and whether the confidence in that
+ encoding is <i>tentative</i> or <i>certain</i>, is <a href="#meta-charset-during-parse">used
+ during the parsing</a> to determine whether to <span>change the encoding</span>. If no encoding is
+ necessary, e.g. because the parser is operating on a Unicode stream and doesn't have to use a
+ character encoding at all, then the <span title="concept-encoding-confidence">confidence</span> is
<i>irrelevant</i>.</p>
<ol>
<li>
- <p>If the user has explicitly instructed the user agent to
- override the document's character encoding with a specific
- encoding, optionally return that encoding with the <span
- title="concept-encoding-confidence">confidence</span>
- <i>certain</i> and abort these steps.</p>
+ <p>If the user has explicitly instructed the user agent to override the document's character
+ encoding with a specific encoding, optionally return that encoding with the <span
+ title="concept-encoding-confidence">confidence</span> <i>certain</i> and abort these steps.</p>
- <p class="note">Typically, user agents remember such user requests
- across sessions, and in some cases apply them to documents in
- <code>iframe</code>s as well.</p>
+ <p class="note">Typically, user agents remember such user requests across sessions, and in some
+ cases apply them to documents in <code>iframe</code>s as well.</p>
</li>
<li>
- <p>The user agent may wait for more bytes of the resource to be
- available, either in this step or at any later step in this
- algorithm. For instance, a user agent might wait 500ms or 1024
- bytes, whichever came first. In general preparsing the source to
- find the encoding improves performance, as it reduces the need to
- throw away the data structures used when parsing upon finding the
- encoding information. However, if the user agent delays too long
- to obtain data to determine the encoding, then the cost of the
- delay could outweigh any performance improvements from the
- preparse.</p>
+ <p>The user agent may wait for more bytes of the resource to be available, either in this step
+ or at any later step in this algorithm. For instance, a user agent might wait 500ms or 1024
+ bytes, whichever came first. In general, preparsing the source to find the encoding improves
+ performance, as it reduces the need to throw away the data structures used when parsing upon
+ finding the encoding information. However, if the user agent delays too long to obtain data to
+ determine the encoding, then the cost of the delay could outweigh any performance improvements
+ from the preparse.</p>
- <p class="note">The authoring conformance requirements for
- character encoding declarations limit them to only appearing <a
- href="#charset1024">in the first 1024 bytes</a>. User agents are
- therefore encouraged to use the prescan algorithm below (as
- invoked by these steps) on the first 1024 bytes, but not to stall
- beyond that.</p>
+ <p class="note">The authoring conformance requirements for character encoding declarations limit
+ them to only appearing <a href="#charset1024">in the first 1024 bytes</a>. User agents are
+ therefore encouraged to use the prescan algorithm below (as invoked by these steps) on the first
+ 1024 bytes, but not to stall beyond that.</p>
</li>
@@ -101516,14 +101422,11 @@
Content-Type: text/html; charset=GB2312
-->
- <p>For each of the rows in the following table, starting with the
- first one and going down, if there are as many or more bytes
- available than the number of bytes in the first column, and the
- first bytes of the file match the bytes given in the first column,
- then return the encoding given in the cell in the second column of
- that row, with the <span
- title="concept-encoding-confidence">confidence</span>
- <i>certain</i>, and abort these steps:</p>
+ <p>For each of the rows in the following table, starting with the first one and going down, if
+ there are as many or more bytes available than the number of bytes in the first column, and the
+ first bytes of the file match the bytes given in the first column, then return the encoding
+ given in the cell in the second column of that row, with the <span
+ title="concept-encoding-confidence">confidence</span> <i>certain</i>, and abort these steps:</p>
<!-- this table is present in several forms in this file; keep them in sync -->
<table>
@@ -101556,38 +101459,30 @@
-->
</table>
- <p class="note">This step looks for Unicode Byte Order Marks
- (BOMs).</p>
+ <p class="note">This step looks for Unicode Byte Order Marks (BOMs).</p>
- <p class="note">That this step happens before the next one
- honoring the HTTP <code>Content-Type</code> header is a
- <span>willful violation</span> of the HTTP specification,
- motivated by a desire to be maximally compatible with legacy
- content. <a href="#refsHTTP">[HTTP]</a></p>
+ <p class="note">That this step happens before the next one honoring the HTTP
+ <code>Content-Type</code> header is a <span>willful violation</span> of the HTTP specification,
+ motivated by a desire to be maximally compatible with legacy content. <a
+ href="#refsHTTP">[HTTP]</a></p>
</li>
- <li><p>If the transport layer specifies a character encoding, and it is
- supported, return that encoding with the <span
- title="concept-encoding-confidence">confidence</span>
- <i>certain</i>, and abort these steps.</p></li>
+ <li><p>If the transport layer specifies a character encoding, and it is supported, return that
+ encoding with the <span title="concept-encoding-confidence">confidence</span> <i>certain</i>, and
+ abort these steps.</p></li>
<li>
- <p>Optionally <span title="prescan a byte stream to determine its
- encoding">prescan the byte stream to determine its
- encoding</span>. The <var title="">end condition</var> is that the
- user agent decides that scanning further bytes would not be
- efficient. User agents are encouraged to only prescan the first
- 1024 bytes. User agents may decide that scanning <em>any</em>
- bytes is not efficient, in which case these substeps are entirely
- skipped.</p>
+ <p>Optionally <span title="prescan a byte stream to determine its encoding">prescan the byte
+ stream to determine its encoding</span>. The <var title="">end condition</var> is that the user
+ agent decides that scanning further bytes would not be efficient. User agents are encouraged to
+ only prescan the first 1024 bytes. User agents may decide that scanning <em>any</em> bytes is
+ not efficient, in which case these substeps are entirely skipped.</p>
- <p>The aforementioned algorithm either aborts unsuccessfully or
- returns a character encoding. If it returns a character encoding,
- then this algorithm must be aborted, returning the same encoding,
- with <span title="concept-encoding-confidence">confidence</span>
- <i>tentative</i>.</p>
+ <p>The aforementioned algorithm either aborts unsuccessfully or returns a character encoding. If
+ it returns a character encoding, then this algorithm must be aborted, returning the same
+ encoding, with <span title="concept-encoding-confidence">confidence</span> <i>tentative</i>.</p>
</li>
@@ -101624,54 +101519,42 @@
</li>
- <li><p>Otherwise, if the user agent has information on the likely
- encoding for this page, e.g. based on the encoding of the page when
- it was last visited, then return that encoding, with the <span
- title="concept-encoding-confidence">confidence</span>
- <i>tentative</i>, and abort these steps.</p></li>
+ <li><p>Otherwise, if the user agent has information on the likely encoding for this page, e.g.
+ based on the encoding of the page when it was last visited, then return that encoding, with the
+ <span title="concept-encoding-confidence">confidence</span> <i>tentative</i>, and abort these
+ steps.</p></li>
<li>
- <p>The user agent may attempt to autodetect the character encoding
- from applying frequency analysis or other algorithms to the data
- stream. Such algorithms may use information about the resource
- other than the resource's contents, including the address of the
- resource. If autodetection succeeds in determining a character
- encoding, and that encoding is a supported encoding, then return
- that encoding, with the <span
- title="concept-encoding-confidence">confidence</span>
- <i>tentative</i>, and abort these steps. <a
- href="#refsUNIVCHARDET">[UNIVCHARDET]</a></p>
+ <p>The user agent may attempt to autodetect the character encoding from applying frequency
+ analysis or other algorithms to the data stream. Such algorithms may use information about the
+ resource other than the resource's contents, including the address of the resource. If
+ autodetection succeeds in determining a character encoding, and that encoding is a supported
+ encoding, then return that encoding, with the <span
+ title="concept-encoding-confidence">confidence</span> <i>tentative</i>, and abort these steps.
+ <a href="#refsUNIVCHARDET">[UNIVCHARDET]</a></p>
- <p class="note">The UTF-8 encoding has a highly detectable bit
- pattern. Documents that contain bytes with values greater than
- 0x7F which match the UTF-8 pattern are very likely to be UTF-8,
- while documents with byte sequences that do not match it are very
- likely not. User-agents are therefore encouraged to search for
- this common encoding. <a href="#refsPPUTF8">[PPUTF8]</a> <a
+ <p class="note">The UTF-8 encoding has a highly detectable bit pattern. Documents that contain
+ bytes with values greater than 0x7F which match the UTF-8 pattern are very likely to be UTF-8,
+ while documents with byte sequences that do not match it are very likely not. User-agents are
+ therefore encouraged to search for this common encoding. <a href="#refsPPUTF8">[PPUTF8]</a> <a
href="#refsUTF8DET">[UTF8DET]</a></p>
</li>
<li>
- <p>Otherwise, return an implementation-defined or user-specified
- default character encoding, with the <span
- title="concept-encoding-confidence">confidence</span>
- <i>tentative</i>.</p>
+ <p>Otherwise, return an implementation-defined or user-specified default character encoding,
+ with the <span title="concept-encoding-confidence">confidence</span> <i>tentative</i>.</p>
- <p>In controlled environments or in environments where the
- encoding of documents can be prescribed (for example, for user
- agents intended for dedicated use in new networks), the
- comprehensive <code title="">UTF-8</code> encoding is
- suggested.</p>
+ <p>In controlled environments or in environments where the encoding of documents can be
+ prescribed (for example, for user agents intended for dedicated use in new networks), the
+ comprehensive <code title="">UTF-8</code> encoding is suggested.</p>
- <p>In other environments, the default encoding is typically
- dependent on the user's locale (an approximation of the languages,
- and thus often encodings, of the pages that the user is likely to
- frequent). The following table gives suggested defaults based on
- the user's locale, for compatibility with legacy content. Locales
- are identified by BCP 47 language tags. <a
+ <p>In other environments, the default encoding is typically dependent on the user's locale (an
+ approximation of the languages, and thus often encodings, of the pages that the user is likely
+ to frequent). The following table gives suggested defaults based on the user's locale, for
+ compatibility with legacy content. Locales are identified by BCP 47 language tags. <a
href="#refsBCP47">[BCP47]</a></p>
<!-- based on mozilla 1.9.1 localizations:
@@ -101810,29 +101693,24 @@
</ol>
- <p>The <span>document's character encoding</span> must immediately
- be set to the value returned from this algorithm, at the same time
- as the user agent uses the returned value to select the decoder to
- use for the input byte stream.</p>
+ <p>The <span>document's character encoding</span> must immediately be set to the value returned
+ from this algorithm, at the same time as the user agent uses the returned value to select the
+ decoder to use for the input byte stream.</p>
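A reduced version of the encoding sniffing algorithm can be sketched as follows. This is a simplification under stated assumptions, not the full algorithm: it omits the user override, the optional wait for more bytes, the prescan, and any cached per-page information. The byte order mark table is assumed from the well-known Unicode BOMs because the table itself is not reproduced in this diff, the `default` value is a hypothetical locale default, and `looks_like_utf8` is only one possible form of the autodetection the note encourages. Encoding names are those of Python's codec registry, not of the Encoding specification.

```python
import codecs

# Assumption: the well-known Unicode byte order marks, checked in this order.
BOMS = [
    (b"\xfe\xff", "utf-16-be"),
    (b"\xff\xfe", "utf-16-le"),
    (b"\xef\xbb\xbf", "utf-8"),
]


def looks_like_utf8(data: bytes) -> bool:
    """True if data has bytes above 0x7F and the whole buffer is valid UTF-8."""
    if not any(b > 0x7F for b in data):
        return False  # pure ASCII gives no evidence either way
    try:
        # Note: a multi-byte sequence truncated at the end of the buffer also fails here.
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def sniff_encoding(data: bytes, transport_charset=None, default="windows-1252"):
    """Return (encoding, confidence) for the bytes available so far."""
    # A BOM is honoured before the transport layer's declaration.
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return encoding, "certain"
    # A supported encoding declared by the transport layer comes next.
    if transport_charset is not None:
        try:
            return codecs.lookup(transport_charset).name, "certain"
        except LookupError:
            pass  # unsupported label: fall through to the later steps
    # Optional autodetection, reduced here to a UTF-8 validity check.
    if looks_like_utf8(data):
        return "utf-8", "tentative"
    # Otherwise fall back to the implementation-defined default.
    return default, "tentative"
```

Only the first two branches return the confidence <i>certain</i>; everything after them is <i>tentative</i>, which is what allows a later character encoding declaration to trigger a change of encoding during parsing.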
<hr>
- <p>When an algorithm requires a user agent to <dfn>prescan a byte
- stream to determine its encoding</dfn>, given some defined <var
- title="">end condition</var>, then it must run the following steps.
- These steps either abort unsuccessfully or return a character
- encoding.</p>
+ <p>When an algorithm requires a user agent to <dfn>prescan a byte stream to determine its
+ encoding</dfn>, given some defined <var title="">end condition</var>, then it must run the
+ following steps. These steps either abort unsuccessfully or return a character encoding.</p>
<ol>
<li>
- <p>Let <var title="">position</var> be a pointer to a byte in the
- input byte stream, initially pointing at the first byte. If at any
- point during these steps the user agent either runs out of bytes
- or reaches its <var title="">end condition</var>, then abort the
- <span>prescan a byte stream to determine its encoding</span>
- algorithm unsuccessfully.</p>
+ <p>Let <var title="">position</var> be a pointer to a byte in the input byte stream, initially
+ pointing at the first byte. If at any point during these steps the user agent either runs out of
+ bytes or reaches its <var title="">end condition</var>, then abort the <span>prescan a byte
+ stream to determine its encoding</span> algorithm unsuccessfully.</p>
</li>
@@ -101845,11 +101723,10 @@
<dt>A sequence of bytes starting with: 0x3C 0x21 0x2D 0x2D (ASCII '&lt;!--')</dt>
<dd>
- <p>Advance the <var title="">position</var> pointer so that it
- points at the first 0x3E byte which is preceded by two 0x2D
- bytes (i.e. at the end of an ASCII '-->' sequence) and comes
- after the 0x3C byte that was found. (The two 0x2D bytes can be
- the same as those in the '<!--' sequence.)</p>
+ <p>Advance the <var title="">position</var> pointer so that it points at the first 0x3E byte
+ which is preceded by two 0x2D bytes (i.e. at the end of an ASCII '-->' sequence) and comes
+ after the 0x3C byte that was found. (The two 0x2D bytes can be the same as those in the
+ '<!--' sequence.)</p>
</dd>
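The comment-skipping rule above can be sketched in Python as follows. This is an illustration only, not text from the specification: the function name `skip_comment` and the use of `None` to signal "ran out of bytes" are assumptions made for the example.

```python
from typing import Optional


def skip_comment(data: bytes, position: int) -> Optional[int]:
    """Return the index of the 0x3E byte that ends the comment.

    `position` is assumed to point at the 0x3C byte of an ASCII '<!--'
    sequence. Returns None when no terminating '-->' exists, which
    corresponds to running out of bytes (the prescan then aborts).
    """
    # Searching from position + 2 lets the two 0x2D bytes of '<!--'
    # double as the two 0x2D bytes that precede the closing 0x3E.
    index = data.find(b"-->", position + 2)
    if index == -1:
        return None
    return index + 2  # index of the 0x3E byte itself
```

With this sketch, the five-byte input `<!-->` counts as a complete comment, because the search starts at the first 0x2D byte of the opening sequence.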
@@ -101858,67 +101735,52 @@
<ol>
- <li><p>Advance the <var title="">position</var> pointer so
- that it points at the next 0x09, 0x0A, 0x0C, 0x0D, 0x20, or
- 0x2F byte (the one in the sequence of characters matched
+ <li><p>Advance the <var title="">position</var> pointer so that it points at the next 0x09,
+ 0x0A, 0x0C, 0x0D, 0x20, or 0x2F byte (the one in the sequence of characters matched
above).</p></li>
- <li><p>Let <var title="">attribute list</var> be an empty
- list of strings.</p></li> <!-- so long as we only care about
- http-equiv, content, and charset, this can be a 3-bit
- bitfield -->
+ <li><p>Let <var title="">attribute list</var> be an empty list of strings.</p></li> <!-- so
+ long as we only care about http-equiv, content, and charset, this can be a 3-bit bitfield -->
<li><p>Let <var title="">got pragma</var> be false.</p></li>
<li><p>Let <var title="">need pragma</var> be null.</p></li>
- <li><p>Let <var title="">charset</var> be the null value
- (which, for the purposes of this algorithm, is distinct from
- an unrecognised encoding or the empty string).</p></li>
+ <li><p>Let <var title="">charset</var> be the null value (which, for the purposes of this
+ algorithm, is distinct from an unrecognised encoding or the empty string).</p></li>
- <li><p><i>Attributes</i>: <span
- title="concept-get-attributes-when-sniffing">Get an
- attribute</span> and its value. If no attribute was sniffed,
- then jump to the <i>processing</i> step below.</p></li>
+ <li><p><i>Attributes</i>: <span title="concept-get-attributes-when-sniffing">Get an
+ attribute</span> and its value. If no attribute was sniffed, then jump to the
+ <i>processing</i> step below.</p></li>
- <li><p>If the attribute's name is already in <var
- title="">attribute list</var>, then return to the step
- labeled <i>attributes</i>.</p>
+ <li><p>If the attribute's name is already in <var title="">attribute list</var>, then return
+ to the step labeled <i>attributes</i>.</p>
- <li><p>Add the attribute's name to <var title="">attribute
- list</var>.</p>
+ <li><p>Add the attribute's name to <var title="">attribute list</var>.</p>
<li>
- <p>Run the appropriate step from the following list, if one
- applies:</p>
+ <p>Run the appropriate step from the following list, if one applies:</p>
<dl class="switch">
- <dt>If the attribute's name is "<code
- title="">http-equiv</code>"</dt>
+ <dt>If the attribute's name is "<code title="">http-equiv</code>"</dt>
- <dd><p>If the attribute's value is "<code
- title="">content-type</code>", then set <var title="">got
- pragma</var> to true.</p></dd>
+ <dd><p>If the attribute's value is "<code title="">content-type</code>", then set <var
+ title="">got pragma</var> to true.</p></dd>
- <dt>If the attribute's name is "<code
- title="">content</code>"</dt>
+ <dt>If the attribute's name is "<code title="">content</code>"</dt>
- <dd><p>Apply the <span>algorithm for extracting a character encoding
- from a <code>meta</code> element</span>, giving the
- attribute's value as the string to parse. If a character encoding is
- returned, and if <var title="">charset</var> is still set
- to null, let <var title="">charset</var> be the encoding
- returned, and set <var title="">need pragma</var> to
- true.</p></dd>
+ <dd><p>Apply the <span>algorithm for extracting a character encoding from a
+ <code>meta</code> element</span>, giving the attribute's value as the string to parse. If a
+ character encoding is returned, and if <var title="">charset</var> is still set to null,
+ let <var title="">charset</var> be the encoding returned, and set <var title="">need
+ pragma</var> to true.</p></dd>
- <dt>If the attribute's name is "<code
- title="">charset</code>"</dt>
+ <dt>If the attribute's name is "<code title="">charset</code>"</dt>
- <dd><p>Let <var title="">charset</var> be the encoding
- corresponding to the attribute's value, and set <var
- title="">need pragma</var> to false.</p></dd>
+ <dd><p>Let <var title="">charset</var> be the result of <span>getting an encoding</span>
+ from the attribute's value, and set <var title="">need pragma</var> to false.</p></dd>
</dl>
@@ -101926,25 +101788,20 @@
<li><p>Return to the step labeled <i>attributes</i>.</p></li>
- <li><p><i>Processing</i>: If <var title="">need pragma</var> is
- null, then jump to the step below labeled <i>next
- byte</i>.</p></li>
+ <li><p><i>Processing</i>: If <var title="">need pragma</var> is null, then jump to the step
+ below labeled <i>next byte</i>.</p></li>
- <li><p>If <var title="">need pragma</var> is true but <var
- title="">got pragma</var> is false, then jump to the step below
- labeled <i>next byte</i>.</p></li>
+ <li><p>If <var title="">need pragma</var> is true but <var title="">got pragma</var> is
+ false, then jump to the step below labeled <i>next byte</i>.</p></li>
- <li><p>If <var title="">charset</var> is <span>a UTF-16
- encoding</span>, change the value of <var
- title="">charset</var> to UTF-8.</p></li>
+ <li><p>If <var title="">charset</var> is <span>a UTF-16 encoding</span>, change the value of
+ <var title="">charset</var> to UTF-8.</p></li>
- <li><p>If <var title="">charset</var> is not a supported
- character encoding, then jump to the step below labeled <i>next
- byte</i>.</p></li>
+ <li><p>If <var title="">charset</var> is not a supported character encoding, then jump to the
+ step below labeled <i>next byte</i>.</p></li>
- <li><p>Abort the <span>prescan a byte stream to determine its
- encoding</span> algorithm, returning the encoding given by <var
- title="">charset</var>.</p></li>
+ <li><p>Abort the <span>prescan a byte stream to determine its encoding</span> algorithm,
+ returning the encoding given by <var title="">charset</var>.</p></li>
</ol>
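The attribute-processing steps for a sniffed `meta` tag can be sketched in Python as below. Several things here are assumptions and not taken from the specification text: the names `process_meta`, `get_encoding`, `extract_from_content`, `LABELS` and `UNSET` are hypothetical; the label table is a tiny stand-in for the one in the Encoding standard; the regular expression is a rough approximation of the algorithm for extracting a character encoding from a `meta` element; and attribute names and values are assumed to have been lowercased already by the get-an-attribute step.

```python
import re
from typing import Iterable, Optional, Tuple

# Hypothetical, deliberately tiny label table (label -> encoding name).
LABELS = {
    "utf-8": "UTF-8", "utf8": "UTF-8",
    "windows-1252": "windows-1252",
    "utf-16": "UTF-16LE", "utf-16le": "UTF-16LE", "utf-16be": "UTF-16BE",
}
UTF16 = {"UTF-16LE", "UTF-16BE"}
SUPPORTED = frozenset(LABELS.values())
UNSET = object()  # the "null" charset, distinct from an unrecognised label

CHARSET_RE = re.compile(r"""charset\s*=\s*["']?([^"';\s]+)""", re.I)


def get_encoding(label: str) -> Optional[str]:
    """Stand-in for 'getting an encoding'; None means unrecognised."""
    return LABELS.get(label.strip(" \t\n\x0c\r").lower())


def extract_from_content(value: str) -> Optional[str]:
    """Rough approximation of the meta content extraction algorithm."""
    match = CHARSET_RE.search(value)
    return get_encoding(match.group(1)) if match else None


def process_meta(attributes: Iterable[Tuple[str, str]]) -> Optional[str]:
    """Return the encoding the prescan would report, or None to move on."""
    seen = set()          # attribute list
    got_pragma = False
    need_pragma = None
    charset = UNSET
    for name, value in attributes:
        if name in seen:  # duplicate attribute names are ignored
            continue
        seen.add(name)
        if name == "http-equiv":
            if value == "content-type":
                got_pragma = True
        elif name == "content":
            encoding = extract_from_content(value)
            if encoding is not None and charset is UNSET:
                charset, need_pragma = encoding, True
        elif name == "charset":
            charset, need_pragma = get_encoding(value), False
    # Processing step and the checks that follow it.
    if need_pragma is None or (need_pragma and not got_pragma):
        return None
    if charset in UTF16:
        charset = "UTF-8"
    return charset if charset in SUPPORTED else None
```

Returning `None` stands for "jump to the step labeled next byte". The separate `UNSET` sentinel keeps the distinction the algorithm draws between a null charset and an unrecognised encoding, so an unrecognised `charset` attribute is not later overridden by a `content` attribute.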
@@ -101955,15 +101812,13 @@
<ol>
- <li><p>Advance the <var title="">position</var> pointer so
- that it points at the next 0x09 (ASCII TAB), 0x0A (ASCII LF),
- 0x0C (ASCII FF), 0x0D (ASCII CR), 0x20 (ASCII space), or 0x3E
+ <li><p>Advance the <var title="">position</var> pointer so that it points at the next 0x09
+ (ASCII TAB), 0x0A (ASCII LF), 0x0C (ASCII FF), 0x0D (ASCII CR), 0x20 (ASCII space), or 0x3E
(ASCII >) byte.</p></li>
- <li><p>Repeatedly <span
- title="concept-get-attributes-when-sniffing">get an
- attribute</span> until no further attributes can be found, then
- jump to the step below labeled <i>next byte</i>.</p></li>
+ <li><p>Repeatedly <span title="concept-get-attributes-when-sniffing">get an attribute</span>
+ until no further attributes can be found, then jump to the step below labeled <i>next
+ byte</i>.</p></li>
</ol>
@@ -101974,9 +101829,8 @@
<dt>A sequence of bytes starting with: 0x3C 0x3F (ASCII '<?')</dt>
<dd>
- <p>Advance the <var title="">position</var> pointer so that it
- points at the first 0x3E byte (ASCII >) that comes after the
- 0x3C byte that was found.</p>
+ <p>Advance the <var title="">position</var> pointer so that it points at the first 0x3E byte
+ (ASCII >) that comes after the 0x3C byte that was found.</p>
</dd>
@@ -101991,16 +101845,13 @@
</li>
- <li><i>Next byte</i>: Move <var title="">position</var> so it
- points at the next byte in the input byte stream, and return to the
- step above labeled <i>loop</i>.</li>
+ <li><i>Next byte</i>: Move <var title="">position</var> so it points at the next byte in the
+ input byte stream, and return to the step above labeled <i>loop</i>.</li>
</ol>
- <p>When the <span>prescan a byte stream to determine its
- encoding</span> algorithm says to <dfn
- title="concept-get-attributes-when-sniffing">get an attribute</dfn>,
- it means doing this:</p>
+ <p>When the <span>prescan a byte stream to determine its encoding</span> algorithm says to <dfn
+ title="concept-get-attributes-when-sniffing">get an attribute</dfn>, it means doing this:</p>
<ol>
@@ -102218,124 +102069,35 @@
<h5>Character encodings</h5>
- <p>User agents must at a minimum support the UTF-8 and Windows-1252
- encodings, but may support more. <a
- href="#refsRFC3629">[RFC3629]</a> <a
- href="#refsWIN1252">[WIN1252]</a></p>
+ <p>User agents must support the encodings defined in the WHATWG Encoding standard. User agents
+ should not support other encodings.</p>
- <p class="note">It is not unusual for Web browsers to support dozens
- if not upwards of a hundred distinct character encodings.</p>
-
- <p>User agents must support the <span>preferred MIME name</span> of
- every character encoding they support, and should support all the
- IANA-registered names and aliases of every character encoding they
- support. <a href="#refsIANACHARSET">[IANACHARSET]</a></p>
-
- <p>When comparing a string specifying a character encoding with the
- name or alias of a character encoding to determine if they are
- equal, user agents must remove any leading or trailing <span
- title="space character">space characters</span> in both names, and
- then perform the comparison in an <span>ASCII
- case-insensitive</span> manner.</p>
-
- <hr>
-
- <p>When a user agent would otherwise use a character encoding given in the
- first column of the following table to either convert content to
- Unicode characters or convert Unicode characters to bytes, it must
- instead use the encoding given in the cell in the second column of
- the same row. When a byte or sequence of bytes is treated
- differently due to this encoding aliasing, it is said to have been
- <dfn>misinterpreted for compatibility</dfn>.</p>
-
- <table id="table-encoding-overrides">
- <caption>Character encoding overrides</caption>
- <thead>
- <tr> <th> Input encoding <th> Replacement encoding <th> References
- <tbody>
- <tr> <td> EUC-KR <td> windows-949 <td>
- <a href="#refsEUCKR">[EUCKR]</a>
- <a href="#refsWIN949">[WIN949]</a>
- <tr> <td> EUC-JP <td> CP51932 <td>
- <a href="#refsEUCJP">[EUCJP]</a>
- <a href="#refsCP51932">[CP51932]</a>
- <tr> <td> GB2312 <td> GBK <td>
- <a href="#refsRFC1345">[RFC1345]</a>
- <a href="#refsGBK">[GBK]</a>
- <tr> <td> GB_2312-80 <td> GBK <td>
- <a href="#refsRFC1345">[RFC1345]</a>
- <a href="#refsGBK">[GBK]</a>
- <tr> <td> ISO-2022-JP <td> CP50220 <td>
- <a href="#refsRFC1468">[RFC1468]</a><!-- ISO-2022-JP -->
- <a href="#refsRFC2237">[RFC2237]</a><!-- ISO-2022-JP-1 -->
- <a href="#refsRFC1554">[RFC1554]</a><!-- ISO-2022-JP-2 -->
- <a href="#refsCP50220">[CP50220]</a><!-- CP50220, the compatibility replacement for ISO-2022-JP -->
- <tr> <td> ISO-8859-1 <td> windows-1252 <td>
- <a href="#refsRFC1345">[RFC1345]</a>
- <a href="#refsWIN1252">[WIN1252]</a>
- <tr> <td> ISO-8859-9 <td> windows-1254 <td>
- <a href="#refsRFC1345">[RFC1345]</a>
- <a href="#refsWIN1254">[WIN1254]</a>
- <tr> <td> ISO-8859-11 <td> windows-874 <td>
- <a href="#refsISO885911">[ISO885911]</a>
- <a href="#refsWIN874">[WIN874]</a>
- <tr> <td> KS_C_5601-1987 <td> windows-949 <td>
- <a href="#refsRFC1345">[RFC1345]</a>
- <a href="#refsWIN949">[WIN949]</a>
- <tr> <td> Shift_JIS <td> Windows-31J <td>
- <a href="#refsSHIFTJIS">[SHIFTJIS]</a>
- <a href="#refsWIN31J">[WIN31J]</a>
- <tr> <td> TIS-620 <td> windows-874 <td>
- <a href="#refsTIS620">[TIS620]</a>
- <a href="#refsWIN874">[WIN874]</a>
- <tr> <td> US-ASCII <td> windows-1252 <td>
- <a href="#refsRFC1345">[RFC1345]</a>
- <a href="#refsWIN1252">[WIN1252]</a>
- </tbody>
- </table>
-
- <p class="note">The requirement to treat certain encodings as other
- encodings according to the table above is a <span>willful
- violation</span> of the W3C Character Model specification, motivated
- by a desire for compatibility with legacy content. <a
- href="#refsCHARMOD">[CHARMOD]</a></p>
-
- <p>When a user agent is to use the self-describing UTF-16 encoding
- but no BOM has been found, user agents must default to little-endian
- UTF-16.</p>
-
- <p class="note">The requirement to default UTF-16 to little-endian
- rather than big-endian is a <span>willful violation</span> of RFC
- 2781, motivated by a desire for compatibility with legacy content.
- <a href="#refsRFC2781">[RFC2781]</a></p>
-
- <hr>
-
- <p>User agents must not support the CESU-8, UTF-7, BOCU-1 and SCSU
- encodings. <a href="#refsCESU8">[CESU8]</a> <a
- href="#refsUTF7">[UTF7]</a> <a href="#refsBOCU1">[BOCU1]</a> <a
+ <p>User agents must not support the CESU-8, UTF-7, BOCU-1 and SCSU encodings. <a
+ href="#refsCESU8">[CESU8]</a> <a href="#refsUTF7">[UTF7]</a> <a href="#refsBOCU1">[BOCU1]</a> <a
href="#refsSCSU">[SCSU]</a></p>
- <p>Support for encodings based on EBCDIC is discouraged. This encoding is rarely used for
- publicly-facing Web content.</p>
+ <p>Support for encodings based on EBCDIC is especially discouraged. This encoding is rarely used
+ for publicly-facing Web content. Support for UTF-32 is also especially discouraged. This encoding
+ is rarely used, and frequently implemented incorrectly.</p>
- <p>Support for UTF-32 is also discouraged. This encoding is rarely used, and frequently
- implemented incorrectly.</p>
+ <p class="note">This specification does not make any attempt to support EBCDIC-based encodings and
+ UTF-32 in its algorithms; support and use of these encodings can thus lead to unexpected behavior
+ in implementations of this specification.</p>
- <p class="note">This specification does not make any attempt to
- support EBCDIC-based encodings and UTF-32 in its algorithms; support
- and use of these encodings can thus lead to unexpected behavior in
- implementations of this specification.</p>
+ <p>When a user agent is to use the self-describing UTF-16 encoding but no BOM has been found, user
+ agents must default to little-endian UTF-16.</p>
+ <p class="note">The requirement to default UTF-16 to little-endian rather than big-endian is a
+ <span>willful violation</span> of RFC 2781, motivated by a desire for compatibility with legacy
+ content. <a href="#refsRFC2781">[RFC2781]</a></p>
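The little-endian default can be illustrated with a short Python sketch. The function name `decode_utf16` is hypothetical, and the sketch only covers the byte-order decision, not the rest of the decoder selection described in this section.

```python
def decode_utf16(data: bytes) -> str:
    """Decode self-describing UTF-16, defaulting to little-endian."""
    if data.startswith(b"\xff\xfe"):   # little-endian BOM
        return data[2:].decode("utf-16-le")
    if data.startswith(b"\xfe\xff"):   # big-endian BOM
        return data[2:].decode("utf-16-be")
    # No BOM found: default to little-endian, as required above,
    # rather than the big-endian default of RFC 2781.
    return data.decode("utf-16-le")
```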
<h5>Changing the encoding while parsing</h5>
- <p>When the parser requires the user agent to <dfn>change the
- encoding</dfn>, it must run the following steps. This might happen
- if the <span>encoding sniffing algorithm</span> described above
- failed to find a character encoding, or if it found a character encoding that was not
- the actual encoding of the file.</p>
+ <p>When the parser requires the user agent to <dfn>change the encoding</dfn>, it must run the
+ following steps. This might happen if the <span>encoding sniffing algorithm</span> described above
+ failed to find a character encoding, or if it found a character encoding that was not the actual
+ encoding of the file.</p>
<ol>
@@ -106293,26 +106055,19 @@
token's <i>self-closing flag</i></span>, if it is set.</p>
<p id="meta-charset-during-parse">If the element has a <code
- title="attr-meta-charset">charset</code> attribute, and its value
- is either a supported <span>ASCII-compatible character
- encoding</span> or <span>a UTF-16 encoding</span>, and the <span
- title="concept-encoding-confidence">confidence</span> is currently
- <i>tentative</i>, then <span>change the encoding</span> to the
- encoding given by the value of the <code
- title="attr-meta-charset">charset</code> attribute.</p>
+ title="attr-meta-charset">charset</code> attribute, and <span>getting an encoding</span> from
+ its value results in a supported <span>ASCII-compatible character encoding</span> or <span>a
+ UTF-16 encoding</span>, and the <span title="concept-encoding-confidence">confidence</span> is
+ currently <i>tentative</i>, then <span>change the encoding</span> to the resulting encoding.</p>
- <p>Otherwise, if the element has an <code
- title="attr-meta-http-equiv">http-equiv</code> attribute whose
- value is an <span>ASCII case-insensitive</span> match for the
- string "<code title="">Content-Type</code>", and the element has a
- <code title="attr-meta-content">content</code> attribute, and
- applying the <span>algorithm for extracting a character encoding from a
- <code>meta</code> element</span> to that attribute's value returns
- a supported <span>ASCII-compatible character encoding</span> or
- <span>a UTF-16 encoding</span>, and the <span
- title="concept-encoding-confidence">confidence</span> is currently
- <i>tentative</i>, then <span>change the encoding</span> to the
- extracted encoding.</p>
+ <p>Otherwise, if the element has an <code title="attr-meta-http-equiv">http-equiv</code>
+ attribute whose value is an <span>ASCII case-insensitive</span> match for the string "<code
+ title="">Content-Type</code>", and the element has a <code
+ title="attr-meta-content">content</code> attribute, and applying the <span>algorithm for
+ extracting a character encoding from a <code>meta</code> element</span> to that attribute's
+ value returns a supported <span>ASCII-compatible character encoding</span> or <span>a UTF-16
+ encoding</span>, and the <span title="concept-encoding-confidence">confidence</span> is
+ currently <i>tentative</i>, then <span>change the encoding</span> to the extracted encoding.</p>
</dd>
@@ -117195,16 +116950,11 @@
<dl>
<dt><code title="">charset</code></dt>
<dd>
- <p>The <code title="">charset</code> parameter may be provided
- to definitively specify the <span>document's character
- encoding</span>, overriding any <span title="character encoding
- declaration">character encoding declarations</span> in the
- document. The parameter's value must be the name of the
- character encoding used to serialize the file, must be a valid
- character encoding name, and must be an <span>ASCII
- case-insensitive</span> match for the <span>preferred MIME
- name</span> for that encoding. <a
- href="#refsIANACHARSET">[IANACHARSET]</a></p>
+ <p>The <code title="">charset</code> parameter may be provided to definitively specify the
+ <span>document's character encoding</span>, overriding any <span title="character encoding
+ declaration">character encoding declarations</span> in the document. The parameter's value
+ must be the <span title="encoding name">name</span> of the <span title="encoding">character
+ encoding</span> used to serialize the file. <a href="#refsENCODING">[ENCODING]</a></p>
</dd>
</dl>
</dd>
@@ -117944,7 +117694,7 @@
<dt>URI scheme semantics:</dt>
<dd>Scheme-specific.</dd>
<dt>Encoding considerations:</dt>
- <dd>All "<code title="">web+</code>" schemes should use UTF-8 encodings were relevant.</dd>
+ <dd>All "<code title="">web+</code>" schemes should use UTF-8 encodings where relevant.</dd>
<dt>Applications/protocols that use this URI scheme name:</dt>
<dd>Scheme-specific.</dd>
<dt>Interoperability considerations:</dt>
@@ -119904,7 +119654,7 @@
<th> <code title="">accept-charset</code>
<td> <code title="attr-form-accept-charset">form</code>
<td> Character encodings to use for <span>form submission</span>
- <td> <span>Ordered set of unique space-separated tokens</span>, <span>ASCII case-insensitive</span>, consisting of <span title="preferred MIME name">preferred MIME names</span> of <span title="ASCII-compatible character encoding">ASCII-compatible character encodings</span>*
+ <td> <span>Ordered set of unique space-separated tokens</span>, <span>ASCII case-insensitive</span>, consisting of <span title="encoding name">names</span> of <span title="ASCII-compatible character encoding">ASCII-compatible character encodings</span>*
<tr>
<th> <code title="">accesskey</code>
<td> <span title="attr-accesskey">HTML elements</span>
@@ -119968,12 +119718,12 @@
<th> <code title="">charset</code>
<td> <code title="attr-meta-charset">meta</code>
<td> <span>Character encoding declaration</span>
- <td> <span>Preferred MIME name</span> of a character encoding*
+ <td> <span>Encoding name</span>*
<tr>
<th> <code title="">charset</code>
<td> <code title="attr-script-charset">script</code>
<td> Character encoding of the external script resource
- <td> <span>Preferred MIME name</span> of a character encoding*
+ <td> <span>Encoding name</span>*
<tr>
<th> <code title="">checked</code>
<td> <code title="attr-menuitem-checked">menuitem</code>;
@@ -122061,47 +121811,32 @@
<dt id="refsEDITING">[EDITING]</dt>
<dd><cite><a href="http://dvcs.w3.org/hg/editing/raw-file/tip/editing.html">HTML Editing APIs</a></cite>, A. Gregor. W3C Editing APIs CG.</dd>
+ <dt id="refsENCODING">[ENCODING]</dt>
+ <dd><cite><a href="http://encoding.spec.whatwg.org/">Encoding</a></cite>, A. van Kesteren, J. Bell. WHATWG.</dd>
+
<dt id="refsEUCKR">[EUCKR]</dt>
<dd><cite>Hangul Unix Environment</cite>. Korea Industrial Standards Association. Ref. No. KS C 5861-1992.</dd>
<dt id="refsEUCJP">[EUCJP]</dt>
<dd><cite>Definition and Notes of Japanese EUC</cite>. UI-OSF-USLP. In English in the abridged translation of the <a href="http://home.m05.itscom.net/numa/uocjleE.pdf">UI-OSF Application Platform Profile for Japanese Environment</a>, Appendix C.</dd>
- <dt id="refsEVENTSOURCE">[EVENTSOURCE]</dt>
- <!--
- <dd><cite><a href="http://www.w3.org/TR/eventsource/">Server-Sent
- Events</a></cite>, I. Hickson. W3C.</dd>
- -->
- <dd><cite><a
- href="http://dev.w3.org/html5/eventsource/">Server-Sent
- Events</a></cite>, I. Hickson. W3C.</dd>
-
<dt id="refsFILEAPI">[FILEAPI]</dt>
- <dd><cite><a
- href="http://dev.w3.org/2006/webapi/FileUpload/publish/FileAPI.html">File
- API</a></cite>, A. Ranganathan. W3C.</dd>
+ <dd><cite><a href="http://dev.w3.org/2006/webapi/FileUpload/publish/FileAPI.html">File API</a></cite>, A. Ranganathan. W3C.</dd>
<dt id="refsFILESYSTEMAPI">[FILESYSTEMAPI]</dt>
- <dd><cite><a
- href="http://dev.w3.org/2009/dap/file-system/file-dir-sys.html">File
- API: Directories and System</a></cite>, E. Uhrhane. W3C.</dd>
+ <dd><cite><a href="http://dev.w3.org/2009/dap/file-system/file-dir-sys.html">File API: Directories and System</a></cite>, E. Uhrhane. W3C.</dd>
<dt id="refsFULLSCREEN">[FULLSCREEN]</dt>
<dd><cite><a href="http://fullscreen.spec.whatwg.org/">Fullscreen</a></cite>, A. van Kesteren, T. Çelik. WHATWG.</dd>
<dt id="refsGBK">[GBK]</dt>
- <dd><cite>Chinese Internal Code Specification</cite>. Chinese IT
- Standardization Technical Committee.</dd>
- <!-- http://www.iana.org/assignments/charset-reg/GBK -->
+ <dd><cite>Chinese Internal Code Specification</cite>. Chinese IT Standardization Technical Committee.</dd> <!-- http://www.iana.org/assignments/charset-reg/GBK -->
<dt id="refsGIF">[GIF]</dt>
<dd>(Non-normative) <cite><a href="http://www.w3.org/Graphics/GIF/spec-gif89a.txt">Graphics Interchange Format</a></cite>. CompuServe.</dd>
<dt id="refsGRAPHICS">[GRAPHICS]</dt>
- <dd>(Non-normative) <cite>Computer Graphics: Principles and
- Practice in C</cite>, Second Edition, J. Foley, A. van Dam,
- S. Feiner, J. Hughes. Addison-Wesley. ISBN
- 0-201-84840-6.</dd>
+ <dd>(Non-normative) <cite>Computer Graphics: Principles and Practice in C</cite>, Second Edition, J. Foley, A. van Dam, S. Feiner, J. Hughes. Addison-Wesley. ISBN 0-201-84840-6.</dd>
<!--
This book ("Computer Graphics: Principles and Practice in C")
apparently does not make any references to literature in the
@@ -122111,13 +121846,10 @@
-->
<dt id="refsGREGORIAN">[GREGORIAN]</dt>
- <dd>(Non-normative) <cite>Inter Gravissimas</cite>, A. Lilius,
- C. Clavius. Gregory XIII Papal Bull, February 1582.</dd>
+ <dd>(Non-normative) <cite>Inter Gravissimas</cite>, A. Lilius, C. Clavius. Gregory XIII Papal Bull, February 1582.</dd>
<dt id="refsHATOM">[HATOM]</dt>
- <dd>(Non-normative) <cite><a
- href="http://microformats.org/wiki/hatom">hAtom</a></cite>, D.
- Janes. Microformats.</dd>
+ <dd>(Non-normative) <cite><a href="http://microformats.org/wiki/hatom">hAtom</a></cite>, D. Janes. Microformats.</dd>
<dt id="refsHMAC">[HMAC]</dt>
<dd><cite><a href="http://csrc.nist.gov/publications/fips/fips198/fips-198a.pdf">The Keyed-Hash Message Authentication Code (HMAC)</a></cite>. NIST.</dd>
@@ -122125,59 +121857,32 @@
<dt id="refsHPAAIG">[HPAAIG]</dt>
<dd><cite><a href="http://dev.w3.org/html5/html-api-map/overview.html">HTML to Platform Accessibility APIs Implementation Guide</a></cite>. W3C.</dd>
- <dt id="refsHTML4">[HTML4]</dt>
- <dd>(Non-normative) <cite><a
- href="http://www.w3.org/TR/html4/">HTML 4.01
- Specification</a></cite>, D. Raggett, A. Le Hors, I. Jacobs. W3C.</dd>
-
<dt id="refsHTML">[HTML]</dt>
<dd><cite><a href="http://www.whatwg.org/specs/web-apps/current-work/">HTML</a></cite>, I. Hickson. WHATWG.</dd>
- <dt id="refsHTMLALTTECHS">[HTMLALTTECHS]</dt>
- <dd>(Non-normative) <cite><a href="http://dev.w3.org/html5/alt-techniques/">HTML5: Techniques for providing useful text alternatives</a></cite>, S. Faulkner. W3C.</dd>
-
- <dt id="refsHTMLDIFF">[HTMLDIFF]</dt>
- <dd>(Non-normative) <cite><a href="http://dev.w3.org/html5/html4-differences/">HTML5 differences from HTML4</a></cite>, S. Pieters. W3C.</dd>
-
<dt id="refsHTTP">[HTTP]</dt>
- <dd><cite><a href="http://tools.ietf.org/html/rfc2616">Hypertext
- Transfer Protocol — HTTP/1.1</a></cite>, R. Fielding, J. Gettys,
- J. Mogul, H. Frystyk, L. Masinter, P. Leach, T. Berners-Lee. IETF.</dd>
+ <dd><cite><a href="http://tools.ietf.org/html/rfc2616">Hypertext Transfer Protocol — HTTP/1.1</a></cite>, R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, T. Berners-Lee. IETF.</dd>
<dt id="refsHTTPS">[HTTPS]</dt>
<dd>(Non-normative) <cite><a href="http://tools.ietf.org/html/rfc2818">HTTP Over TLS</a></cite>, E. Rescorla. IETF.</dd>
- <dt id="refsIANACHARSET">[IANACHARSET]</dt>
- <dd><cite><a
- href="http://www.iana.org/assignments/character-sets">Character
- Sets</a></cite>. IANA.</dd>
-
<dt id="refsIANALINKTYPE">[IANALINKTYPE]</dt>
- <dd><cite><a
- href="http://www.iana.org/assignments/link-relations">Link
- Relations</a></cite>. IANA.</dd>
+ <dd><cite><a href="http://www.iana.org/assignments/link-relations">Link Relations</a></cite>. IANA.</dd>
<dt id="refsIANAPERMHEADERS">[IANAPERMHEADERS]</dt>
- <dd><cite><a
- href="http://www.iana.org/assignments/message-headers/perm-headers.html">Permanent
- Message Header Field Names</a></cite>. IANA.</dd>
+ <dd><cite><a href="http://www.iana.org/assignments/message-headers/perm-headers.html">Permanent Message Header Field Names</a></cite>. IANA.</dd>
<dt id="refsICE">[ICE]</dt>
<dd><cite><a href="http://tools.ietf.org/html/rfc5245">Interactive Connectivity Establishment (ICE): A Protocol for Network Address Translator (NAT) Traversal for Offer/Answer Protocols</a></cite>, J. Rosenberg. IETF.</dd>
<dt id="refsIEEE754">[IEEE754]</dt>
- <dd><cite><a
- href="http://ieeexplore.ieee.org/servlet/opac?punumber=4610933">IEEE
- Standard for Floating-Point Arithmetic (IEEE 754)</a></cite>. IEEE. ISBN 978-0-7381-5753-5.</dd>
+ <dd><cite><a href="http://ieeexplore.ieee.org/servlet/opac?punumber=4610933">IEEE Standard for Floating-Point Arithmetic (IEEE 754)</a></cite>. IEEE. ISBN 978-0-7381-5753-5.</dd>
<dt id="refsISO8601">[ISO8601]</dt>
<dd>(Non-normative) <cite><a href="http://isotc.iso.org/livelink/livelink/4021199/ISO_8601_2004_E.zip?func=doc.Fetch&nodeid=4021199">ISO8601: Data elements and interchange formats — Information interchange — Representation of dates and times</a></cite>. ISO.</dd>
<dt id="refsISO885911">[ISO885911]</dt>
- <dd><cite><a href="http://std.dkuug.dk/jtc1/sc2/open/02n3333.pdf">ISO-8859-11:
- Information technology — 8-bit single-byte coded graphic
- character sets — Part 11: Latin/Thai
- alphabet</a></cite>. ISO.</dd>
+ <dd><cite><a href="http://std.dkuug.dk/jtc1/sc2/open/02n3333.pdf">ISO-8859-11: Information technology — 8-bit single-byte coded graphic character sets — Part 11: Latin/Thai alphabet</a></cite>. ISO.</dd>
<dt id="refsJLREQ">[JLREQ]</dt>
<dd><cite><a href="http://www.w3.org/TR/jlreq/">Requirements for Japanese Text Layout</a></cite>. W3C.</dd> <!-- too many editors to list -->
@@ -122186,9 +121891,7 @@
<dd><cite><a href="http://www.w3.org/Graphics/JPEG/jfif3.pdf">JPEG File Interchange Format</a></cite>, E. Hamilton.</dd>
<dt id="refsJSON">[JSON]</dt>
- <dd><cite><a href="http://tools.ietf.org/html/rfc4627">The
- application/json Media Type for JavaScript Object Notation
- (JSON)</a></cite>, D. Crockford. IETF.</dd>
+ <dd><cite><a href="http://tools.ietf.org/html/rfc4627">The application/json Media Type for JavaScript Object Notation (JSON)</a></cite>, D. Crockford. IETF.</dd>
<dt id="refsJSURL">[JSURL]</dt>
<dd><cite><a href="http://tools.ietf.org/html/draft-hoehrmann-javascript-scheme">The 'javascript' resource identifier scheme</a></cite>, B. Höhrmann. IETF.
@@ -122199,15 +121902,10 @@
<dd>(Non-normative) <cite><a href="http://tools.ietf.org/html/rfc6068">The 'mailto' URI scheme</a></cite>, M. Duerst, L. Masinter, J. Zawinski. IETF.</dd>
<dt id="refsMATHML">[MATHML]</dt>
- <dd><cite><a href="http://www.w3.org/TR/MathML/">Mathematical
- Markup Language (MathML)</a></cite>, D. Carlisle, P. Ion, R. Miner,
- N. Poppelier. W3C.</dd>
+ <dd><cite><a href="http://www.w3.org/TR/MathML/">Mathematical Markup Language (MathML)</a></cite>, D. Carlisle, P. Ion, R. Miner, N. Poppelier. W3C.</dd>
<dt id="refsMEDIAFRAG">[MEDIAFRAG]</dt>
- <dd><cite><a
- href="http://www.w3.org/2008/WebVideo/Fragments/WD-media-fragments-spec/">Media
- Fragments URI</a></cite>, R. Troncy, E. Mannens, S. Pfeiffer, D.
- Van Deursen. W3C.</dd>
+ <dd><cite><a href="http://www.w3.org/2008/WebVideo/Fragments/WD-media-fragments-spec/">Media Fragments URI</a></cite>, R. Troncy, E. Mannens, S. Pfeiffer, D. Van Deursen. W3C.</dd>
<dt id="refsMFREL">[MFREL]</dt>
<dd><cite><a href="http://microformats.org/wiki/existing-rel-values#HTML5_link_type_extensions">Microformats Wiki: existing rel values</a></cite>. Microformats.</dd>
@@ -122239,10 +121937,7 @@
<dd><cite><a href="http://wiki.xiph.org/SkeletonHeaders">SkeletonHeaders</a></cite>. Xiph.Org.</dd>
<dt id="refsOPENSEARCH">[OPENSEARCH]</dt>
- <dd><cite><a
- href="http://www.opensearch.org/Specifications/OpenSearch/1.1#Autodiscovery_in_HTML.2FXHTML">Autodiscovery
- in HTML/XHTML</a></cite>. In <cite>OpenSearch 1.1 Draft 4</cite>,
- Section 4.6.2. OpenSearch.org.</dd>
+ <dd><cite><a href="http://www.opensearch.org/Specifications/OpenSearch/1.1#Autodiscovery_in_HTML.2FXHTML">Autodiscovery in HTML/XHTML</a></cite>. In <cite>OpenSearch 1.1 Draft 4</cite>, Section 4.6.2. OpenSearch.org.</dd>
<dt id="refsORIGIN">[ORIGIN]</dt>
<dd><cite><a href="http://tools.ietf.org/html/rfc6454">The Web Origin Concept</a></cite>, A. Barth. IETF.</dd>
@@ -122254,29 +121949,16 @@
<dd>(Non-normative) <cite><a href="http://www.adobe.com/devnet/acrobat/pdfs/PDF32000_2008.pdf">Document management — Portable document format — Part 1: PDF</a></cite>. ISO.</dd>
<dt id="refsPINGBACK">[PINGBACK]</dt>
- <dd><cite><a
- href="http://www.hixie.ch/specs/pingback/pingback">Pingback
- 1.0</a></cite>, S. Langridge, I. Hickson.</dd>
+ <dd><cite><a href="http://www.hixie.ch/specs/pingback/pingback">Pingback 1.0</a></cite>, S. Langridge, I. Hickson.</dd>
<dt id="refsPNG">[PNG]</dt>
- <dd><cite><a href="http://www.w3.org/TR/PNG/">Portable Network
- Graphics (PNG) Specification</a></cite>, D. Duce. W3C.</dd>
+ <dd><cite><a href="http://www.w3.org/TR/PNG/">Portable Network Graphics (PNG) Specification</a></cite>, D. Duce. W3C.</dd>
<dt id="refsPOINTERLOCK">[POINTERLOCK]</dt>
<dd><cite><a href="http://dvcs.w3.org/hg/pointerlock/raw-file/default/index.html">Pointer Lock</a></cite>, V. Scheib. W3C.</dd>
- <dt id="refsPOLYGLOT">[POLYGLOT]</dt>
- <dd>(Non-normative) <cite><a
- href="http://dev.w3.org/html5/html-xhtml-author-guide/html-xhtml-authoring-guide.html">Polyglot
- Markup: HTML-Compatible XHTML Documents</a></cite>, E. Graff.
- W3C.</dd>
-
<dt id="refsPORTERDUFF">[PORTERDUFF]</dt>
- <dd><cite><a
- href="http://keithp.com/~keithp/porterduff/p253-porter.pdf">Compositing
- Digital Images</a></cite>, T. Porter, T. Duff. In <cite>Computer
- graphics</cite>, volume 18, number 3, pp. 253-259. ACM Press, July
- 1984.</dd>
+ <dd><cite><a href="http://keithp.com/~keithp/porterduff/p253-porter.pdf">Compositing Digital Images</a></cite>, T. Porter, T. Duff. In <cite>Computer graphics</cite>, volume 18, number 3, pp. 253-259. ACM Press, July 1984.</dd>
<dt id="refsPPUTF8">[PPUTF8]</dt>
<dd>(Non-normative) <cite><a href="http://www.sw.it.aoyama.ac.jp/2012/pub/IUC11-UTF-8.pdf">The Properties and Promises <!-- Promizes (sic) --> of UTF-8</a></cite>, M. Dürst. University of Zürich. In <cite>Proceedings of the 11th International Unicode Conference</cite>.</dd>
@@ -122289,177 +121971,116 @@
<dd><cite><a href="http://tools.ietf.org/html/rfc1034">Domain Names - Concepts and Facilities</a></cite>, P. Mockapetris. IETF, November 1987.</dd>
<dt id="refsRFC1321">[RFC1321]</dt>
- <dd><cite><a href="http://tools.ietf.org/html/rfc1321">The MD5
- Message-Digest Algorithm</a></cite>, R. Rivest. IETF.</dd>
+ <dd><cite><a href="http://tools.ietf.org/html/rfc1321">The MD5 Message-Digest Algorithm</a></cite>, R. Rivest. IETF.</dd>
<dt id="refsRFC1345">[RFC1345]</dt>
- <dd><cite><a href="http://tools.ietf.org/html/rfc1345">Character Mnemonics
- and Character Sets</a></cite>, K. Simonsen. IETF.</dd>
+ <dd><cite><a href="http://tools.ietf.org/html/rfc1345">Character Mnemonics and Character Sets</a></cite>, K. Simonsen. IETF.</dd>
<dt id="refsRFC1468">[RFC1468]</dt>
- <dd><cite><a href="http://tools.ietf.org/html/rfc1468">Japanese Character
- Encoding for Internet Messages</a></cite>, J. Murai, M. Crispin, E. van der
- Poel. IETF.</dd>
+ <dd><cite><a href="http://tools.ietf.org/html/rfc1468">Japanese Character Encoding for Internet Messages</a></cite>, J. Murai, M. Crispin, E. van der Poel. IETF.</dd>
<dt id="refsRFC1494">[RFC1494]</dt>
- <dd>(Non-normative) <cite><a
- href="http://tools.ietf.org/html/rfc1494">Equivalences between
- 1988 X.400 and RFC-822 Message Bodies</a></cite>, H. Alvestrand,
- S. Thompson. IETF.</dd>
+ <dd>(Non-normative) <cite><a href="http://tools.ietf.org/html/rfc1494">Equivalences between 1988 X.400 and RFC-822 Message Bodies</a></cite>, H. Alvestrand, S. Thompson. IETF.</dd>
<dt id="refsRFC1554">[RFC1554]</dt>
- <dd><cite><a href="http://tools.ietf.org/html/rfc1554">ISO-2022-JP-2:
- Multilingual Extension of ISO-2022-JP</a></cite>, M. Ohta, K. Handa. IETF.</dd>
+ <dd><cite><a href="http://tools.ietf.org/html/rfc1554">ISO-2022-JP-2: Multilingual Extension of ISO-2022-JP</a></cite>, M. Ohta, K. Handa. IETF.</dd>
<dt id="refsRFC1557">[RFC1557]</dt>
- <dd><cite><a href="http://tools.ietf.org/html/rfc1557">Korean Character
- Encoding for Internet Messages</a></cite>, U. Choi, K. Chon, H. Park. IETF.</dd>
+ <dd><cite><a href="http://tools.ietf.org/html/rfc1557">Korean Character Encoding for Internet Messages</a></cite>, U. Choi, K. Chon, H. Park. IETF.</dd>
<dt id="refsRFC1842">[RFC1842]</dt>
+ <dd><cite><a href="http://tools.ietf.org/html/rfc1842">ASCII Printable Characters-Based Chinese Character Encoding for Internet Messages</a></cite>, Y. Wei, Y. Zhang, J. Li, J. Ding, Y. Jiang. IETF.</dd>
- <dd><cite><a href="http://tools.ietf.org/html/rfc1842">ASCII
- Printable Characters-Based Chinese Character Encoding for Internet
- Messages</a></cite>, Y. Wei, Y. Zhang, J. Li, J. Ding, Y. Jiang.
- IETF.</dd>
-
<dt id="refsRFC1922">[RFC1922]</dt>
- <dd><cite><a href="http://tools.ietf.org/html/rfc1922">Chinese Character
- Encoding for Internet Messages</a></cite>, HF. Zhu, DY. Hu, ZG. Wang, TC. Kao,
- WCH. Chang, M. Crispin. IETF.</dd>
+ <dd><cite><a href="http://tools.ietf.org/html/rfc1922">Chinese Character Encoding for Internet Messages</a></cite>, HF. Zhu, DY. Hu, ZG. Wang, TC. Kao, WCH. Chang, M. Crispin. IETF.</dd>
<dt id="refsRFC2045">[RFC2045]</dt>
- <dd><cite><a href="http://tools.ietf.org/html/rfc2045">Multipurpose Internet
- Mail Extensions (MIME) Part One: Format of Internet Message Bodies</a></cite>,
- N. Freed, N. Borenstein. IETF.</dd>
+ <dd><cite><a href="http://tools.ietf.org/html/rfc2045">Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies</a></cite>, N. Freed, N. Borenstein. IETF.</dd>
<dt id="refsRFC2046">[RFC2046]</dt>
- <dd><cite><a
- href="http://tools.ietf.org/html/rfc2046">Multipurpose Internet
- Mail Extensions (MIME) Part Two: Media Types</a></cite>, N. Freed,
- N. Borenstein. IETF.</dd> <!-- for text/plain and
- "Internet Media type"; not for definition of "valid MIME type". -->
+ <dd><cite><a href="http://tools.ietf.org/html/rfc2046">Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types</a></cite>, N. Freed, N. Borenstein. IETF.</dd> <!-- for text/plain and "Internet Media type"; not for definition of "valid MIME type". -->
<dt id="refsRFC2119">[RFC2119]</dt>
- <dd><cite><a href="http://tools.ietf.org/html/rfc2119">Key words for use in
- RFCs to Indicate Requirement Levels</a></cite>, S. Bradner. IETF.</dd>
+ <dd><cite><a href="http://tools.ietf.org/html/rfc2119">Key words for use in RFCs to Indicate Requirement Levels</a></cite>, S. Bradner. IETF.</dd>
<dt id="refsRFC2237">[RFC2237]</dt>
- <dd><cite><a href="http://tools.ietf.org/html/rfc2237">Japanese Character
- Encoding for Internet Messages</a></cite>, K. Tamaru. IETF.</dd>
+ <dd><cite><a href="http://tools.ietf.org/html/rfc2237">Japanese Character Encoding for Internet Messages</a></cite>, K. Tamaru. IETF.</dd>
<dt id="refsRFC2246">[RFC2246]</dt>
- <dd><cite><a href="http://tools.ietf.org/html/rfc2246">The TLS Protocol
- Version 1.0</a></cite>, T. Dierks, C. Allen. IETF.</dd>
+ <dd><cite><a href="http://tools.ietf.org/html/rfc2246">The TLS Protocol Version 1.0</a></cite>, T. Dierks, C. Allen. IETF.</dd>
<dt id="refsRFC2313">[RFC2313]</dt>
- <dd><cite><a href="http://tools.ietf.org/html/rfc2313">PKCS #1:
- RSA Encryption</a></cite>, B. Kaliski. IETF.</dd>
+ <dd><cite><a href="http://tools.ietf.org/html/rfc2313">PKCS #1: RSA Encryption</a></cite>, B. Kaliski. IETF.</dd>
<dt id="refsRFC2318">[RFC2318]</dt>
- <dd><cite><a href="http://tools.ietf.org/html/rfc2318">The
- text/css Media Type</a></cite>, H. Lie, B. Bos, C. Lilley. IETF.</dd>
+ <dd><cite><a href="http://tools.ietf.org/html/rfc2318">The text/css Media Type</a></cite>, H. Lie, B. Bos, C. Lilley. IETF.</dd>
<dt id="refsRFC2388">[RFC2388]</dt>
- <dd><cite><a href="http://tools.ietf.org/html/rfc2388">Returning Values from
- Forms: multipart/form-data</a></cite>, L. Masinter. IETF.</dd>
+ <dd><cite><a href="http://tools.ietf.org/html/rfc2388">Returning Values from Forms: multipart/form-data</a></cite>, L. Masinter. IETF.</dd>
<dt id="refsRFC2397">[RFC2397]</dt>
- <dd><cite><a href="http://tools.ietf.org/html/rfc2397">The "data"
- URL scheme</a></cite>, L. Masinter. IETF.</dd>
+ <dd><cite><a href="http://tools.ietf.org/html/rfc2397">The "data" URL scheme</a></cite>, L. Masinter. IETF.</dd>
<dt id="refsRFC2445">[RFC2445]</dt>
- <dd><cite><a href="http://tools.ietf.org/html/rfc2445">Internet Calendaring
- and Scheduling Core Object Specification (iCalendar)</a></cite>, F. Dawson, D.
- Stenerson. IETF.</dd>
+ <dd><cite><a href="http://tools.ietf.org/html/rfc2445">Internet Calendaring and Scheduling Core Object Specification (iCalendar)</a></cite>, F. Dawson, D. Stenerson. IETF.</dd>
<dt id="refsRFC2483">[RFC2483]</dt>
- <dd><cite><a href="http://tools.ietf.org/html/rfc2483">URI Resolution
- Services Necessary for URN Resolution</a></cite>, M. Mealling, R. Daniel.
- IETF.</dd>
+ <dd><cite><a href="http://tools.ietf.org/html/rfc2483">URI Resolution Services Necessary for URN Resolution</a></cite>, M. Mealling, R. Daniel. IETF.</dd>
<dt id="refsRFC2781">[RFC2781]</dt>
- <dd><cite><a href="http://tools.ietf.org/html/rfc2781">UTF-16, an
- encoding of ISO 10646</a></cite>, P. Hoffman, F. Yergeau. IETF.</dd>
+ <dd><cite><a href="http://tools.ietf.org/html/rfc2781">UTF-16, an encoding of ISO 10646</a></cite>, P. Hoffman, F. Yergeau. IETF.</dd>
<dt id="refsRFC3676">[RFC3676]</dt>
- <dd><cite><a href="http://tools.ietf.org/html/rfc3676">The Text/Plain Format
- and DelSp Parameters</a></cite>, R. Gellens. IETF.</dd>
+ <dd><cite><a href="http://tools.ietf.org/html/rfc3676">The Text/Plain Format and DelSp Parameters</a></cite>, R. Gellens. IETF.</dd>
<dt id="refsRFC2806">[RFC2806]</dt>
- <dd>(Non-normative) <cite><a
- href="http://tools.ietf.org/html/rfc2806">URLs for Telephone
- Calls</a></cite>, A. Vaha-Sipila. IETF.</dd>
+ <dd>(Non-normative) <cite><a href="http://tools.ietf.org/html/rfc2806">URLs for Telephone Calls</a></cite>, A. Vaha-Sipila. IETF.</dd>
<dt id="refsRFC3023">[RFC3023]</dt>
- <dd><cite><a href="http://tools.ietf.org/html/rfc3023">XML Media
- Types</a></cite>, M. Murata, S. St. Laurent, D. Kohn. IETF.</dd>
+ <dd><cite><a href="http://tools.ietf.org/html/rfc3023">XML Media Types</a></cite>, M. Murata, S. St. Laurent, D. Kohn. IETF.</dd>
<dt id="refsRFC3279">[RFC3279]</dt>
- <dd><cite><a href="http://tools.ietf.org/html/rfc3279">Algorithms
- and Identifiers for the Internet X.509 Public Key Infrastructure
- Certificate and Certificate Revocation List (CRL)
- Profile</a></cite>, W. Polk, R. Housley, L. Bassham. IETF.</dd>
+ <dd><cite><a href="http://tools.ietf.org/html/rfc3279">Algorithms and Identifiers for the Internet X.509 Public Key Infrastructure Certificate and Certificate Revocation List (CRL) Profile</a></cite>, W. Polk, R. Housley, L. Bassham. IETF.</dd>
<dt id="refsRFC3490">[RFC3490]</dt>
- <dd><cite><a href="http://tools.ietf.org/html/rfc3490">Internationalizing
- Domain Names in Applications (IDNA)</a></cite>, P. Faltstrom, P. Hoffman, A.
- Costello. IETF.</dd>
+ <dd><cite><a href="http://tools.ietf.org/html/rfc3490">Internationalizing Domain Names in Applications (IDNA)</a></cite>, P. Faltstrom, P. Hoffman, A. Costello. IETF.</dd>
<dt id="refsRFC3629">[RFC3629]</dt>
- <dd><cite><a href="http://tools.ietf.org/html/rfc3629">UTF-8, a
- transformation format of ISO 10646</a></cite>, F. Yergeau. IETF.</dd>
+ <dd><cite><a href="http://tools.ietf.org/html/rfc3629">UTF-8, a transformation format of ISO 10646</a></cite>, F. Yergeau. IETF.</dd>
<dt id="refsRFC3864">[RFC3864]</dt>
<dd><cite><a
- href="http://tools.ietf.org/html/rfc3864">Registration Procedures
- for Message Header Fields</a></cite>, G. Klyne, M. Nottingham,
- J. Mogul. IETF.</dd>
+ href="http://tools.ietf.org/html/rfc3864">Registration Procedures for Message Header Fields</a></cite>, G. Klyne, M. Nottingham, J. Mogul. IETF.</dd>
<dt id="refsRFC3986">[RFC3986]</dt>
- <dd><cite><a href="http://tools.ietf.org/html/rfc3986">Uniform Resource
- Identifier (URI): Generic Syntax</a></cite>, T. Berners-Lee, R. Fielding, L.
- Masinter. IETF.</dd>
+ <dd><cite><a href="http://tools.ietf.org/html/rfc3986">Uniform Resource Identifier (URI): Generic Syntax</a></cite>, T. Berners-Lee, R. Fielding, L. Masinter. IETF.</dd>
<dt id="refsRFC3987">[RFC3987]</dt>
- <dd><cite><a href="http://tools.ietf.org/html/rfc3987">Internationalized
- Resource Identifiers (IRIs)</a></cite>, M. Dürst, M. Suignard. IETF.</dd>
+ <dd><cite><a href="http://tools.ietf.org/html/rfc3987">Internationalized Resource Identifiers (IRIs)</a></cite>, M. Dürst, M. Suignard. IETF.</dd>
<dt id="refsRFC4281">[RFC4281]</dt>
- <dd><cite><a href="http://tools.ietf.org/html/rfc4281">The Codecs Parameter
- for "Bucket" Media Types</a></cite>, R. Gellens, D. Singer, P. Frojdh. IETF.</dd>
+ <dd><cite><a href="http://tools.ietf.org/html/rfc4281">The Codecs Parameter for "Bucket" Media Types</a></cite>, R. Gellens, D. Singer, P. Frojdh. IETF.</dd>
<dt id="refsRFC4329">[RFC4329]</dt>
- <dd>(Non-normative) <cite><a
- href="http://tools.ietf.org/html/rfc4329">Scripting Media
- Types</a></cite>, B. Höhrmann. IETF.</dd>
+ <dd>(Non-normative) <cite><a href="http://tools.ietf.org/html/rfc4329">Scripting Media Types</a></cite>, B. Höhrmann. IETF.</dd>
<dt id="refsRFC4366">[RFC4366]</dt>
- <dd><cite><a href="http://tools.ietf.org/html/rfc4366">Transport
- Layer Security (TLS) Extensions</a></cite>, S. Blake-Wilson,
- M. Nystrom, D. Hopwood, J. Mikkelsen, T. Wright. IETF.</dd>
+ <dd><cite><a href="http://tools.ietf.org/html/rfc4366">Transport Layer Security (TLS) Extensions</a></cite>, S. Blake-Wilson, M. Nystrom, D. Hopwood, J. Mikkelsen, T. Wright. IETF.</dd>
<dt id="refsRFC4395">[RFC4395]</dt>
<dd><cite><a href="http://tools.ietf.org/html/rfc4395">Guidelines and Registration Procedures for New URI Schemes</a></cite>, T. Hansen, T. Hardie, L. Masinter. IETF.</dd>
<dt id="refsRFC4648">[RFC4648]</dt>
- <dd><cite><a href="http://tools.ietf.org/html/rfc4648">The Base16,
- Base32, and Base64 Data Encodings</a></cite>, S. Josefsson.
- IETF.</dd>
+ <dd><cite><a href="http://tools.ietf.org/html/rfc4648">The Base16, Base32, and Base64 Data Encodings</a></cite>, S. Josefsson. IETF.</dd>
<dt id="refsRFC5280">[RFC5280]</dt>
- <dd><cite><a href="http://tools.ietf.org/html/rfc5280">Internet
- X.509 Public Key Infrastructure Certificate and Certificate
- Revocation List (CRL) Profile</a></cite>, D. Cooper, S. Santesson,
- S. Farrell, S. Boeyen, R. Housley, W. Polk. IETF.</dd>
+ <dd><cite><a href="http://tools.ietf.org/html/rfc5280">Internet X.509 Public Key Infrastructure Certificate and Certificate Revocation List (CRL) Profile</a></cite>, D. Cooper, S. Santesson, S. Farrell, S. Boeyen, R. Housley, W. Polk. IETF.</dd>
<dt id="refsRFC5322">[RFC5322]</dt>
- <dd><cite><a href="http://tools.ietf.org/html/rfc5322">Internet Message
- Format</a></cite>, P. Resnick. IETF.</dd>
+ <dd><cite><a href="http://tools.ietf.org/html/rfc5322">Internet Message Format</a></cite>, P. Resnick. IETF.</dd>
<dt id="refsRFC5724">[RFC5724]</dt>
- <dd><cite><a href="http://tools.ietf.org/html/rfc5724">URI Scheme
- for Global System for Mobile Communications (GSM) Short Message
- Service (SMS)</a></cite>, E. Wilde, A. Vaha-Sipila. IETF.</dd>
+ <dd><cite><a href="http://tools.ietf.org/html/rfc5724">URI Scheme for Global System for Mobile Communications (GSM) Short Message Service (SMS)</a></cite>, E. Wilde, A. Vaha-Sipila. IETF.</dd>
<dt id="refsRFC6266">[RFC6266]</dt>
<dd><cite><a href="http://tools.ietf.org/html/rfc6266">Use of the Content-Disposition Header Field in the Hypertext Transfer Protocol (HTTP)</a></cite>, J. Reschke. IETF.</dd>
@@ -122468,10 +122089,7 @@
<dd><cite><a href="http://tools.ietf.org/html/rfc6350">vCard Format Specification</a></cite>, S. Perreault. IETF.</dd>
<dt id="refsSCSU">[SCSU]</dt>
- <dd>(Non-normative) <cite><a
- href="http://www.unicode.org/reports/tr6/">UTR #6: A Standard
- Compression Scheme For Unicode</a></cite>, M. Wolf, K. Whistler,
- C. Wicksteed, M. Davis, A. Freytag, M. Scherer. Unicode Consortium.</dd>
+ <dd>(Non-normative) <cite><a href="http://www.unicode.org/reports/tr6/">UTR #6: A Standard Compression Scheme For Unicode</a></cite>, M. Wolf, K. Whistler, C. Wicksteed, M. Davis, A. Freytag, M. Scherer. Unicode Consortium.</dd>
<dt id="refsSDP">[SDP]</dt>
<dd><cite><a href="http://tools.ietf.org/html/rfc4566">SDP: Session Description Protocol</a></cite>, M. Handley, V. Jacobson, C. Perkins. IETF.</dd>
@@ -122493,29 +122111,17 @@
for information interchange</cite>. Japanese Industrial Standards Committee.</dd>
<dt id="refsSRGB">[SRGB]</dt>
- <dd><cite lang="en-GB"><a
- href="http://webstore.iec.ch/webstore/webstore.nsf/artnum/025408!OpenDocument&Click=">IEC
- 61966-2-1: Multimedia systems and equipment — Colour measurement
- and management — Part 2-1: Colour management — Default RGB colour
- space — sRGB</a></cite>. IEC.</dd>
+ <dd><cite lang="en-GB"><a href="http://webstore.iec.ch/webstore/webstore.nsf/artnum/025408!OpenDocument&Click=">IEC 61966-2-1: Multimedia systems and equipment — Colour measurement and management — Part 2-1: Colour management — Default RGB colour space — sRGB</a></cite>. IEC.</dd>
<dt id="refsSTUN">[STUN]</dt>
<dd><cite><a href="http://tools.ietf.org/html/rfc5389">Session Traversal Utilities for NAT (STUN)</a></cite>, J. Rosenberg, R. Mahy, P. Matthews, D. Wing. IETF.</dd>
<dt id="refsSVG">[SVG]</dt>
- <dd><cite><a href="http://www.w3.org/TR/SVGTiny12/">Scalable Vector
- Graphics (SVG) Tiny 1.2 Specification</a></cite>, O. Andersson,
- R. Berjon, E. Dahlström, A. Emmons, J. Ferraiolo, A. Grasso,
- V. Hardy, S. Hayman, D. Jackson, C. Lilley, C. McCormack,
- A. Neumann, C. Northway, A. Quint, N. Ramani, D. Schepers,
- A. Shellshear. W3C.</dd>
+ <dd><cite><a href="http://www.w3.org/TR/SVGTiny12/">Scalable Vector Graphics (SVG) Tiny 1.2 Specification</a></cite>, O. Andersson, R. Berjon, E. Dahlström, A. Emmons, J. Ferraiolo, A. Grasso, V. Hardy, S. Hayman, D. Jackson, C. Lilley, C. McCormack, A. Neumann, C. Northway, A. Quint, N. Ramani, D. Schepers, A. Shellshear. W3C.</dd>
<dt id="refsTIS620">[TIS620]</dt>
<dd><cite><a
- href="http://www.nectec.or.th/it-standards/std620/std620.htm">UDC
- 681.3.04:003.62</a></cite>. Thai Industrial Standards Institute,
- Ministry of Industry, Royal Thai Government. ISBN
- 974-606-153-4.</dd>
+ href="http://www.nectec.or.th/it-standards/std620/std620.htm">UDC 681.3.04:003.62</a></cite>. Thai Industrial Standards Institute, Ministry of Industry, Royal Thai Government. ISBN 974-606-153-4.</dd>
<dt id="refsTURN">[TURN]</dt>
<dd><cite><a href="http://tools.ietf.org/html/rfc5766">Traversal Using Relays around NAT (TURN): Relay Extensions to Session Traversal Utilities for NAT (STUN)</a></cite>, R. Mahy, P. Matthews, J. Rosenberg. IETF.</dd>
@@ -122533,56 +122139,31 @@
<dd><cite><a href="http://www.unicode.org/versions/">The Unicode Standard</a></cite>. Unicode Consortium.</dd>
<dt id="refsUNIVCHARDET">[UNIVCHARDET]</dt>
- <dd>(Non-normative) <cite><a
- href="http://www.mozilla.org/projects/intl/UniversalCharsetDetection.html">A
- composite approach to language/encoding
- detection</a></cite>, S. Li, K. Momoi. Netscape. In
- <cite>Proceedings of the 19th International Unicode
- Conference</cite>.</dd>
+ <dd>(Non-normative) <cite><a href="http://www.mozilla.org/projects/intl/UniversalCharsetDetection.html">A composite approach to language/encoding detection</a></cite>, S. Li, K. Momoi. Netscape. In <cite>Proceedings of the 19th International Unicode Conference</cite>.</dd>
<dt id="refsUTF7">[UTF7]</dt>
- <dd><cite><a href="http://tools.ietf.org/html/rfc2152">UTF-7: A
- Mail-Safe Transformation Format of Unicode</a></cite>,
- D. Goldsmith, M. Davis. IETF.</dd>
+ <dd><cite><a href="http://tools.ietf.org/html/rfc2152">UTF-7: A Mail-Safe Transformation Format of Unicode</a></cite>, D. Goldsmith, M. Davis. IETF.</dd>
<dt id="refsUTF8DET">[UTF8DET]</dt>
- <dd>(Non-normative) <cite><a
- href="http://www.w3.org/International/questions/qa-forms-utf-8">Multilingual
- form encoding</a></cite>, M. Dürst. W3C.</dd>
+ <dd>(Non-normative) <cite><a href="http://www.w3.org/International/questions/qa-forms-utf-8">Multilingual form encoding</a></cite>, M. Dürst. W3C.</dd>
<dt id="refsUTR36">[UTR36]</dt>
- <dd>(Non-normative) <cite><a
- href="http://www.unicode.org/reports/tr36/">UTR #36: Unicode
- Security Considerations</a></cite>, M. Davis, M. Suignard. Unicode
- Consortium.</dd>
+ <dd>(Non-normative) <cite><a href="http://www.unicode.org/reports/tr36/">UTR #36: Unicode Security Considerations</a></cite>, M. Davis, M. Suignard. Unicode Consortium.</dd>
<dt id="refsWCAG">[WCAG]</dt>
- <dd>(Non-normative) <cite><a
- href="http://www.w3.org/TR/WCAG20/">Web Content Accessibility
- Guidelines (WCAG) 2.0</a></cite>, B. Caldwell, M. Cooper, L. Reid,
- G. Vanderheiden. W3C.</dd>
+ <dd>(Non-normative) <cite><a href="http://www.w3.org/TR/WCAG20/">Web Content Accessibility Guidelines (WCAG) 2.0</a></cite>, B. Caldwell, M. Cooper, L. Reid, G. Vanderheiden. W3C.</dd>
<dt id="refsWEBADDRESSES">[WEBADDRESSES]</dt>
- <dd><cite><a href="http://www.w3.org/html/wg/href/draft">Web
- addresses in HTML5</a></cite>, D. Connolly,
- C. Sperberg-McQueen.</dd>
+ <dd><cite><a href="http://www.w3.org/html/wg/href/draft">Web addresses in HTML5</a></cite>, D. Connolly, C. Sperberg-McQueen.</dd>
<dt id="refsWEBGL">[WEBGL]</dt>
- <dd><cite><a
- href="https://cvs.khronos.org/svn/repos/registry/trunk/public/webgl/doc/spec/WebGL-spec.html">WebGL
- Specification</a></cite>, C. Marrin. Khronos Group.</dd>
+ <dd><cite><a href="https://cvs.khronos.org/svn/repos/registry/trunk/public/webgl/doc/spec/WebGL-spec.html">WebGL Specification</a></cite>, C. Marrin. Khronos Group.</dd>
<dt id="refsWEBIDL">[WEBIDL]</dt>
- <!--
- <dd><cite><a href="http://www.w3.org/TR/WebIDL/">Web
- IDL</a></cite>, C. McCormack. W3C.</dd>
- -->
- <dd><cite><a href="http://dev.w3.org/2006/webapi/WebIDL/">Web
- IDL</a></cite>, C. McCormack. W3C.</dd>
+ <dd><cite><a href="http://dev.w3.org/2006/webapi/WebIDL/">Web IDL</a></cite>, C. McCormack. W3C.</dd>
<dt id="refsWEBLINK">[WEBLINK]</dt>
- <dd><cite><a href="http://tools.ietf.org/html/rfc5988">Web
- Linking</a></cite>, M. Nottingham. IETF.</dd>
+ <dd><cite><a href="http://tools.ietf.org/html/rfc5988">Web Linking</a></cite>, M. Nottingham. IETF.</dd>
<dt id="refsWEBMCG">[WEBMCG]</dt>
<dd><cite><a href="http://www.webmproject.org/code/specs/container/">WebM Container Guidelines</a></cite>. The WebM Project.</dd>
@@ -122626,17 +122207,10 @@
<dd><cite><a href="http://tools.ietf.org/html/rfc6455">The WebSocket protocol</a></cite>, I. Fette, A. Melnikov. IETF.</dd>
<dt id="refsX121">[X121]</dt>
- <dd><cite>Recommendation X.121 — International Numbering Plan for
- Public Data Networks</cite>, CCITT Blue Book, Fascicle VIII.3,
- pp. 317-332.</dd>
+ <dd><cite>Recommendation X.121 — International Numbering Plan for Public Data Networks</cite>, CCITT Blue Book, Fascicle VIII.3, pp. 317-332.</dd>
<dt id="refsX690">[X690]</dt>
- <dd><cite><a
- href="http://www.itu.int/ITU-T/studygroups/com17/languages/X.690-0207.pdf">Recommendation
- X.690 — Information Technology — ASN.1 Encoding Rules —
- Specification of Basic Encoding Rules (BER), Canonical Encoding
- Rules (CER), and Distinguished Encoding Rules
- (DER)</a></cite>. International Telecommunication Union.</dd>
+ <dd><cite><a href="http://www.itu.int/ITU-T/studygroups/com17/languages/X.690-0207.pdf">Recommendation X.690 — Information Technology — ASN.1 Encoding Rules — Specification of Basic Encoding Rules (BER), Canonical Encoding Rules (CER), and Distinguished Encoding Rules (DER)</a></cite>. International Telecommunication Union.</dd>
<dt id="refsXFN">[XFN]</dt>
<dd><cite><a href="http://gmpg.org/xfn/11">XFN 1.1 profile</a></cite>, T. Çelik, M. Mullenweg, E. Meyer. GMPG.</dd>
@@ -122645,36 +122219,25 @@
<dd><cite><a href="http://xhr.spec.whatwg.org/"><code>XMLHttpRequest</code></a></cite>, A. van Kesteren. WHATWG.</dd>
<dt id="refsXHTML1">[XHTML1]</dt>
- <dd><cite><a href="http://www.w3.org/TR/xhtml1/">XHTML(TM) 1.0 The
- Extensible HyperText Markup Language (Second Edition)</a></cite>. W3C.</dd>
+ <dd><cite><a href="http://www.w3.org/TR/xhtml1/">XHTML(TM) 1.0 The Extensible HyperText Markup Language (Second Edition)</a></cite>. W3C.</dd>
<dt id="refsXHTMLMOD">[XHTMLMOD]</dt>
- <dd><cite><a
- href="http://www.w3.org/TR/xhtml-modularization">Modularization of
- XHTML(TM)</a></cite>, M. Altheim, F. Boumphrey, S. Dooley, S.
- McCarron, S. Schnitzenbaumer, T. Wugofski. W3C.</dd>
+ <dd><cite><a href="http://www.w3.org/TR/xhtml-modularization">Modularization of XHTML(TM)</a></cite>, M. Altheim, F. Boumphrey, S. Dooley, S. McCarron, S. Schnitzenbaumer, T. Wugofski. W3C.</dd>
<dt id="refsXML">[XML]</dt>
- <dd><cite><a href="http://www.w3.org/TR/xml/">Extensible Markup
- Language</a></cite>, T. Bray, J. Paoli, C. Sperberg-McQueen,
- E. Maler, F. Yergeau. W3C.</dd>
+ <dd><cite><a href="http://www.w3.org/TR/xml/">Extensible Markup Language</a></cite>, T. Bray, J. Paoli, C. Sperberg-McQueen, E. Maler, F. Yergeau. W3C.</dd>
<dt id="refsXMLBASE">[XMLBASE]</dt>
- <dd><cite><a href="http://www.w3.org/TR/xmlbase/">XML
- Base</a></cite>, J. Marsh, R. Tobin. W3C.</dd>
+ <dd><cite><a href="http://www.w3.org/TR/xmlbase/">XML Base</a></cite>, J. Marsh, R. Tobin. W3C.</dd>
<dt id="refsXMLNS">[XMLNS]</dt>
- <dd><cite><a href="http://www.w3.org/TR/xml-names/">Namespaces in
- XML</a></cite>, T. Bray, D. Hollander, A. Layman, R. Tobin. W3C.</dd>
+ <dd><cite><a href="http://www.w3.org/TR/xml-names/">Namespaces in XML</a></cite>, T. Bray, D. Hollander, A. Layman, R. Tobin. W3C.</dd>
<dt id="refsXPATH10">[XPATH10]</dt>
- <dd><cite><a
- href="http://www.w3.org/TR/1999/REC-xpath-19991116">XML Path
- Language (XPath) Version 1.0</a></cite>, J. Clark, S. DeRose. W3C.</dd>
+ <dd><cite><a href="http://www.w3.org/TR/1999/REC-xpath-19991116">XML Path Language (XPath) Version 1.0</a></cite>, J. Clark, S. DeRose. W3C.</dd>
<dt id="refsXSLT10">[XSLT10]</dt>
- <dd>(Non-normative) <cite><a href="http://www.w3.org/TR/1999/REC-xslt-19991116">XSL
- Transformations (XSLT) Version 1.0</a></cite>, J. Clark. W3C.</dd>
+ <dd>(Non-normative) <cite><a href="http://www.w3.org/TR/1999/REC-xslt-19991116">XSL Transformations (XSLT) Version 1.0</a></cite>, J. Clark. W3C.</dd>
<!--(once XSLTProcessor is defined somewhere, update this and the place that references this)
<dt id="refsXSLTP">[XSLTP]</dt>