[e] (0) Closer integration with encoding.spec.whatwg.org
Fixing https://www.w3.org/Bugs/Public/show_bug.cgi?id=22661
Affected topics: HTML, HTML Syntax and Parsing, Security

git-svn-id: http://svn.whatwg.org/webapps@8081 340c8d12-0b0e-0410-8428-c7bf67bfef74
Hixie committed Jul 23, 2013
1 parent 0c8820d commit 0bbd1b0
Showing 3 changed files with 51 additions and 82 deletions.
40 changes: 17 additions & 23 deletions complete.html
@@ -2473,7 +2473,7 @@ <h4 id=syntax-errors><span class=secno>1.12.2 </span>Syntax errors</h4>
<div class=example>

<p>For example, the restriction on using UTF-7 exists purely to avoid authors falling prey to a
known cross-site-scripting attack using UTF-7.</p>
known cross-site-scripting attack using UTF-7. <a href=#refsUTF7>[UTF7]</a></p>

</div>
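The UTF-7 attack mentioned in the example works because markup-significant characters such as `<` never appear as raw bytes in the UTF-7 form. A minimal Python sketch, assuming a hypothetical byte-level filter (`naive_filter_allows` is illustrative, not part of any spec or library) that only rejects literal `<` bytes:

```python
def naive_filter_allows(payload: bytes) -> bool:
    """Hypothetical filter that only rejects input containing a raw '<' byte."""
    return b"<" not in payload


# "+ADw-" and "+AD4-" are the UTF-7 encodings of "<" and ">".
payload = b"+ADw-script+AD4-alert(1)+ADw-/script+AD4-"

print(naive_filter_allows(payload))  # True: the bytes contain no '<'
print(payload.decode("utf-7"))       # <script>alert(1)</script>
```

A filter like this passes the payload, but a user agent that decodes the page as UTF-7 would see a script element.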

@@ -3065,7 +3065,7 @@ <h4 id=encoding-terminology><span class=secno>2.1.6 </span>Character encodings</
0x3F, 0x41 - 0x5A, and 0x61 - 0x7A<!-- is that list ok? do any character sets we want to support
do things outside that range? -->, ignoring bytes that are the second and later bytes of multibyte
sequences, all correspond to single-byte sequences that map to the same Unicode characters as
those bytes in ANSI_X3.4-1968 (US-ASCII). <a href=#refsRFC1345>[RFC1345]</a></p>
those bytes in Windows-1252<!--ANSI_X3.4-1968 (US-ASCII)-->. <a href=#refsENCODING>[ENCODING]</a></p>
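A rough Python check of the ASCII-compatible definition above, under several assumptions: only the byte values visible in this hunk (0x3F, 0x41 - 0x5A, 0x61 - 0x7A) are tested, since the start of the list is cut off in the diff; Python's codec tables stand in for those of the Encoding Standard, with `cp1252` as Windows-1252; and `looks_ascii_compatible` is a hypothetical name. Each byte is decoded on its own, so the clause about second and later bytes of multibyte sequences never comes into play.

```python
# Byte values visible in this hunk only; the beginning of the list is elided.
VISIBLE_BYTES = (0x3F, *range(0x41, 0x5B), *range(0x61, 0x7B))


def looks_ascii_compatible(codec: str, byte_values=VISIBLE_BYTES) -> bool:
    """Hypothetical check: does each byte decode, on its own, to the same
    character as in Windows-1252 (Python's "cp1252" codec)?"""
    for value in byte_values:
        single = bytes([value])
        try:
            decoded = single.decode(codec)
        except UnicodeDecodeError:
            # Not a complete single-byte sequence in this codec (e.g. UTF-16).
            return False
        if decoded != single.decode("cp1252"):
            return False
    return True
```

With these assumptions, Shift_JIS and UTF-8 pass, UTF-16LE fails because a lone byte is not a complete code unit, and the EBCDIC codec `cp037` fails because it places these characters at different byte values.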

<p class=note>This includes such encodings as Shift_JIS, HZ-GB-2312, and variants of ISO-2022,
even though it is possible in these encodings for bytes like 0x70 to be part of longer sequences
@@ -3077,8 +3077,8 @@ <h4 id=encoding-terminology><span class=secno>2.1.6 </span>Character encodings</
different encodings at once, with different <meta charset> elements applying in each case.
-->

<p>The term <dfn id=a-utf-16-encoding>a UTF-16 encoding</dfn> refers to any variant of UTF-16: self-describing UTF-16
with a BOM, ambiguous UTF-16 without a BOM, raw UTF-16LE, and raw UTF-16BE. <a href=#refsRFC2781>[RFC2781]</a></p>
<p>The term <dfn id=a-utf-16-encoding>a UTF-16 encoding</dfn> refers to any variant of UTF-16: UTF-16LE or UTF-16BE,
regardless of the presence or absence of a BOM. <a href=#refsENCODING>[ENCODING]</a></p>
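Under this definition the BOM does not define a separate encoding; it only decides between UTF-16LE and UTF-16BE. A minimal Python sketch of that check, modeled loosely on BOM sniffing in the Encoding Standard's decode algorithm: a leading FE FF or FF FE selects the byte order and is stripped, otherwise the caller's encoding is used unchanged. The function names are hypothetical, and the little-endian default for `fallback` is an assumption of the sketch.

```python
from typing import Tuple


def sniff_utf16_bom(data: bytes, fallback: str = "utf-16-le") -> Tuple[str, bytes]:
    """Pick a UTF-16 byte order from a leading BOM and strip it.

    Returns (python_codec_name, remaining_bytes). Without a BOM the
    caller-supplied fallback is used and nothing is stripped.
    """
    if data.startswith(b"\xfe\xff"):
        return "utf-16-be", data[2:]
    if data.startswith(b"\xff\xfe"):
        return "utf-16-le", data[2:]
    return fallback, data


def decode_utf16(data: bytes, fallback: str = "utf-16-le") -> str:
    codec, body = sniff_utf16_bom(data, fallback)
    # errors="replace" approximates replacement-character handling.
    return body.decode(codec, errors="replace")
```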

<p>The term <dfn id=code-unit>code unit</dfn> is used as defined in the Web IDL specification: a 16 bit
unsigned integer, the smallest atomic component of a <code>DOMString</code>. (This is a narrower
@@ -3431,6 +3431,10 @@ <h4 id=dependencies><span class=secno>2.2.2 </span>Dependencies</h4>
algorithm</i>. The latter first strips a Byte Order Mark (BOM), if any, and then invokes the
former.</p>

<p>For readability, character encodings are sometimes referenced in this specification with a
case that differs from the canonical case given in the encoding standard. (For example,
"UTF-16LE" instead of "utf16-le".)</p>

</dd>
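Encoding labels are compared ASCII case-insensitively, which is why the spec can write "UTF-16LE" where the Encoding Standard's name differs in case. A minimal Python sketch of label lookup in the style of the Encoding Standard's "get an encoding" step (strip leading and trailing ASCII whitespace, ASCII-lowercase, look up the label). The table is a small hand-picked subset, and the lower-case spelling of the returned names is an assumption.

```python
import string
from typing import Optional

# ASCII whitespace: TAB, LF, FF, CR, SPACE.
ASCII_WHITESPACE = "\t\n\x0c\r "

# Lowercase only A-Z, leaving non-ASCII characters untouched.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)

# Hypothetical subset of the label table (label -> encoding name).
LABELS = {
    "unicode-1-1-utf-8": "utf-8",
    "utf-8": "utf-8",
    "utf8": "utf-8",
    "utf-16": "utf-16le",
    "utf-16le": "utf-16le",
    "utf-16be": "utf-16be",
    "ascii": "windows-1252",
    "us-ascii": "windows-1252",
    "windows-1252": "windows-1252",
}


def get_encoding(label: str) -> Optional[str]:
    """Return the encoding name for a label, or None if the label is unknown."""
    key = label.strip(ASCII_WHITESPACE).translate(_ASCII_LOWER)
    return LABELS.get(key)
```

In this subset `us-ascii` resolves to windows-1252, which is consistent with this commit's switch from US-ASCII to Windows-1252 in the ASCII-compatible definition.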


@@ -86613,13 +86617,6 @@ <h5 id=character-encodings><span class=secno>12.2.2.3 </span>Character encodings
UTF-32 in its algorithms; support and use of these encodings can thus lead to unexpected behavior
in implementations of this specification.</p>

<p>When a user agent is to use the self-describing UTF-16 encoding but no Byte Order Mark (BOM)
has been found, user agents must default to little-endian UTF-16.</p>

<p class=note>The requirement to default UTF-16 to little-endian rather than big-endian is a
<a href=#willful-violation>willful violation</a> of RFC 2781, motivated by a desire for compatibility with legacy
content. <a href=#refsRFC2781>[RFC2781]</a></p>


<h5 id=changing-the-encoding-while-parsing><span class=secno>12.2.2.4 </span>Changing the encoding while parsing</h5>

@@ -103331,7 +103328,7 @@ <h2 class=no-num id=references>References</h2><!--REFS-->
<dd><cite><a href=http://fetch.spec.whatwg.org/>Cross-Origin Resource Sharing</a></cite>, A. van Kesteren. WHATWG.</dd>

<dt id=refsCP50220>[CP50220]</dt>
<dd><cite><a href=http://www.iana.org/assignments/charset-reg/CP50220>CP50220</a></cite>, Y. Naruse. IANA.</dd> <!-- really should be "NARUSE, Y." or some such, but there's a western bias to these references for consistency. sorry. -->
<dd>(Non-normative) <cite><a href=http://www.iana.org/assignments/charset-reg/CP50220>CP50220</a></cite>, Y. Naruse. IANA.</dd> <!-- really should be "NARUSE, Y." or some such, but there's a western bias to these references for consistency. sorry. -->

<dt id=refsCSP>[CSP]</dt>
<dd>(Non-normative) <cite><a href=http://dvcs.w3.org/hg/content-security-policy/raw-file/tip/csp-specification.dev.html>Content Security Policy</a></cite>, B. Sterne, A. Barth. W3C.</dd>
@@ -103523,22 +103520,22 @@ <h2 class=no-num id=references>References</h2><!--REFS-->
<dd><cite><a href=http://tools.ietf.org/html/rfc1123>Requirements for Internet Hosts -- Application and Support</a></cite>, R. Braden. IETF, October 1989.</dd>

<dt id=refsRFC1345>[RFC1345]</dt>
<dd><cite><a href=http://tools.ietf.org/html/rfc1345>Character Mnemonics and Character Sets</a></cite>, K. Simonsen. IETF.</dd>
<dd>(Non-normative) <cite><a href=http://tools.ietf.org/html/rfc1345>Character Mnemonics and Character Sets</a></cite>, K. Simonsen. IETF.</dd>

<dt id=refsRFC1468>[RFC1468]</dt>
<dd><cite><a href=http://tools.ietf.org/html/rfc1468>Japanese Character Encoding for Internet Messages</a></cite>, J. Murai, M. Crispin, E. van der Poel. IETF.</dd>
<dd>(Non-normative) <cite><a href=http://tools.ietf.org/html/rfc1468>Japanese Character Encoding for Internet Messages</a></cite>, J. Murai, M. Crispin, E. van der Poel. IETF.</dd>

<dt id=refsRFC1554>[RFC1554]</dt>
<dd><cite><a href=http://tools.ietf.org/html/rfc1554>ISO-2022-JP-2: Multilingual Extension of ISO-2022-JP</a></cite>, M. Ohta, K. Handa. IETF.</dd>
<dd>(Non-normative) <cite><a href=http://tools.ietf.org/html/rfc1554>ISO-2022-JP-2: Multilingual Extension of ISO-2022-JP</a></cite>, M. Ohta, K. Handa. IETF.</dd>

<dt id=refsRFC1557>[RFC1557]</dt>
<dd><cite><a href=http://tools.ietf.org/html/rfc1557>Korean Character Encoding for Internet Messages</a></cite>, U. Choi, K. Chon, H. Park. IETF.</dd>
<dd>(Non-normative) <cite><a href=http://tools.ietf.org/html/rfc1557>Korean Character Encoding for Internet Messages</a></cite>, U. Choi, K. Chon, H. Park. IETF.</dd>

<dt id=refsRFC1842>[RFC1842]</dt>
<dd><cite><a href=http://tools.ietf.org/html/rfc1842>ASCII Printable Characters-Based Chinese Character Encoding for Internet Messages</a></cite>, Y. Wei, Y. Zhang, J. Li, J. Ding, Y. Jiang. IETF.</dd>
<dd>(Non-normative) <cite><a href=http://tools.ietf.org/html/rfc1842>ASCII Printable Characters-Based Chinese Character Encoding for Internet Messages</a></cite>, Y. Wei, Y. Zhang, J. Li, J. Ding, Y. Jiang. IETF.</dd>

<dt id=refsRFC1922>[RFC1922]</dt>
<dd><cite><a href=http://tools.ietf.org/html/rfc1922>Chinese Character Encoding for Internet Messages</a></cite>, HF. Zhu, DY. Hu, ZG. Wang, TC. Kao, WCH. Chang, M. Crispin. IETF.</dd>
<dd>(Non-normative) <cite><a href=http://tools.ietf.org/html/rfc1922>Chinese Character Encoding for Internet Messages</a></cite>, HF. Zhu, DY. Hu, ZG. Wang, TC. Kao, WCH. Chang, M. Crispin. IETF.</dd>

<dt id=refsRFC2046>[RFC2046]</dt>
<dd><cite><a href=http://tools.ietf.org/html/rfc2046>Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types</a></cite>, N. Freed, N. Borenstein. IETF.</dd> <!-- for text/plain and "Internet Media type"; not for definition of "valid MIME type". -->
@@ -103547,7 +103544,7 @@ <h2 class=no-num id=references>References</h2><!--REFS-->
<dd><cite><a href=http://tools.ietf.org/html/rfc2119>Key words for use in RFCs to Indicate Requirement Levels</a></cite>, S. Bradner. IETF.</dd>

<dt id=refsRFC2237>[RFC2237]</dt>
<dd><cite><a href=http://tools.ietf.org/html/rfc2237>Japanese Character Encoding for Internet Messages</a></cite>, K. Tamaru. IETF.</dd>
<dd>(Non-normative) <cite><a href=http://tools.ietf.org/html/rfc2237>Japanese Character Encoding for Internet Messages</a></cite>, K. Tamaru. IETF.</dd>

<dt id=refsRFC2313>[RFC2313]</dt>
<dd><cite><a href=http://tools.ietf.org/html/rfc2313>PKCS #1: RSA Encryption</a></cite>, B. Kaliski. IETF.</dd>
@@ -103567,9 +103564,6 @@ <h2 class=no-num id=references>References</h2><!--REFS-->
<dt id=refsRFC2483>[RFC2483]</dt>
<dd><cite><a href=http://tools.ietf.org/html/rfc2483>URI Resolution Services Necessary for URN Resolution</a></cite>, M. Mealling, R. Daniel. IETF.</dd>

<dt id=refsRFC2781>[RFC2781]</dt>
<dd><cite><a href=http://tools.ietf.org/html/rfc2781>UTF-16, an encoding of ISO 10646</a></cite>, P. Hoffman, F. Yergeau. IETF.</dd>

<dt id=refsRFC3676>[RFC3676]</dt>
<dd><cite><a href=http://tools.ietf.org/html/rfc3676>The Text/Plain Format and DelSp Parameters</a></cite>, R. Gellens. IETF.</dd>

@@ -103652,7 +103646,7 @@ <h2 class=no-num id=references>References</h2><!--REFS-->
<dd><cite><a href=http://url.spec.whatwg.org/>URL</a></cite>, A. van Kesteren. WHATWG.</dd>

<dt id=refsUTF7>[UTF7]</dt>
<dd><cite><a href=http://tools.ietf.org/html/rfc2152>UTF-7: A Mail-Safe Transformation Format of Unicode</a></cite>, D. Goldsmith, M. Davis. IETF.</dd>
<dd>(Non-normative) <cite><a href=http://tools.ietf.org/html/rfc2152>UTF-7: A Mail-Safe Transformation Format of Unicode</a></cite>, D. Goldsmith, M. Davis. IETF.</dd>

<dt id=refsUTF8DET>[UTF8DET]</dt>
<dd>(Non-normative) <cite><a href=http://www.w3.org/International/questions/qa-forms-utf-8>Multilingual form encoding</a></cite>, M. D&uuml;rst. W3C.</dd>
40 changes: 17 additions & 23 deletions index
@@ -2473,7 +2473,7 @@ a.setAttribute('href', 'http://example.com/'); // change the content attribute d
<div class=example>

<p>For example, the restriction on using UTF-7 exists purely to avoid authors falling prey to a
known cross-site-scripting attack using UTF-7.</p>
known cross-site-scripting attack using UTF-7. <a href=#refsUTF7>[UTF7]</a></p>

</div>

@@ -3065,7 +3065,7 @@ a.setAttribute('href', 'http://example.com/'); // change the content attribute d
0x3F, 0x41 - 0x5A, and 0x61 - 0x7A<!-- is that list ok? do any character sets we want to support
do things outside that range? -->, ignoring bytes that are the second and later bytes of multibyte
sequences, all correspond to single-byte sequences that map to the same Unicode characters as
those bytes in ANSI_X3.4-1968 (US-ASCII). <a href=#refsRFC1345>[RFC1345]</a></p>
those bytes in Windows-1252<!--ANSI_X3.4-1968 (US-ASCII)-->. <a href=#refsENCODING>[ENCODING]</a></p>

<p class=note>This includes such encodings as Shift_JIS, HZ-GB-2312, and variants of ISO-2022,
even though it is possible in these encodings for bytes like 0x70 to be part of longer sequences
@@ -3077,8 +3077,8 @@ a.setAttribute('href', 'http://example.com/'); // change the content attribute d
different encodings at once, with different <meta charset> elements applying in each case.
-->

<p>The term <dfn id=a-utf-16-encoding>a UTF-16 encoding</dfn> refers to any variant of UTF-16: self-describing UTF-16
with a BOM, ambiguous UTF-16 without a BOM, raw UTF-16LE, and raw UTF-16BE. <a href=#refsRFC2781>[RFC2781]</a></p>
<p>The term <dfn id=a-utf-16-encoding>a UTF-16 encoding</dfn> refers to any variant of UTF-16: UTF-16LE or UTF-16BE,
regardless of the presence or absence of a BOM. <a href=#refsENCODING>[ENCODING]</a></p>

<p>The term <dfn id=code-unit>code unit</dfn> is used as defined in the Web IDL specification: a 16 bit
unsigned integer, the smallest atomic component of a <code>DOMString</code>. (This is a narrower
@@ -3431,6 +3431,10 @@ a.setAttribute('href', 'http://example.com/'); // change the content attribute d
algorithm</i>. The latter first strips a Byte Order Mark (BOM), if any, and then invokes the
former.</p>

<p>For readability, character encodings are sometimes referenced in this specification with a
case that differs from the canonical case given in the encoding standard. (For example,
"UTF-16LE" instead of "utf16-le".)</p>

</dd>


@@ -86613,13 +86617,6 @@ dictionary <dfn id=storageeventinit>StorageEventInit</dfn> : <a href=#eventinit>
UTF-32 in its algorithms; support and use of these encodings can thus lead to unexpected behavior
in implementations of this specification.</p>

<p>When a user agent is to use the self-describing UTF-16 encoding but no Byte Order Mark (BOM)
has been found, user agents must default to little-endian UTF-16.</p>

<p class=note>The requirement to default UTF-16 to little-endian rather than big-endian is a
<a href=#willful-violation>willful violation</a> of RFC 2781, motivated by a desire for compatibility with legacy
content. <a href=#refsRFC2781>[RFC2781]</a></p>


<h5 id=changing-the-encoding-while-parsing><span class=secno>12.2.2.4 </span>Changing the encoding while parsing</h5>

@@ -103331,7 +103328,7 @@ if (s = prompt('What is your name?')) {
<dd><cite><a href=http://fetch.spec.whatwg.org/>Cross-Origin Resource Sharing</a></cite>, A. van Kesteren. WHATWG.</dd>

<dt id=refsCP50220>[CP50220]</dt>
<dd><cite><a href=http://www.iana.org/assignments/charset-reg/CP50220>CP50220</a></cite>, Y. Naruse. IANA.</dd> <!-- really should be "NARUSE, Y." or some such, but there's a western bias to these references for consistency. sorry. -->
<dd>(Non-normative) <cite><a href=http://www.iana.org/assignments/charset-reg/CP50220>CP50220</a></cite>, Y. Naruse. IANA.</dd> <!-- really should be "NARUSE, Y." or some such, but there's a western bias to these references for consistency. sorry. -->

<dt id=refsCSP>[CSP]</dt>
<dd>(Non-normative) <cite><a href=http://dvcs.w3.org/hg/content-security-policy/raw-file/tip/csp-specification.dev.html>Content Security Policy</a></cite>, B. Sterne, A. Barth. W3C.</dd>
@@ -103523,22 +103520,22 @@ if (s = prompt('What is your name?')) {
<dd><cite><a href=http://tools.ietf.org/html/rfc1123>Requirements for Internet Hosts -- Application and Support</a></cite>, R. Braden. IETF, October 1989.</dd>

<dt id=refsRFC1345>[RFC1345]</dt>
<dd><cite><a href=http://tools.ietf.org/html/rfc1345>Character Mnemonics and Character Sets</a></cite>, K. Simonsen. IETF.</dd>
<dd>(Non-normative) <cite><a href=http://tools.ietf.org/html/rfc1345>Character Mnemonics and Character Sets</a></cite>, K. Simonsen. IETF.</dd>

<dt id=refsRFC1468>[RFC1468]</dt>
<dd><cite><a href=http://tools.ietf.org/html/rfc1468>Japanese Character Encoding for Internet Messages</a></cite>, J. Murai, M. Crispin, E. van der Poel. IETF.</dd>
<dd>(Non-normative) <cite><a href=http://tools.ietf.org/html/rfc1468>Japanese Character Encoding for Internet Messages</a></cite>, J. Murai, M. Crispin, E. van der Poel. IETF.</dd>

<dt id=refsRFC1554>[RFC1554]</dt>
<dd><cite><a href=http://tools.ietf.org/html/rfc1554>ISO-2022-JP-2: Multilingual Extension of ISO-2022-JP</a></cite>, M. Ohta, K. Handa. IETF.</dd>
<dd>(Non-normative) <cite><a href=http://tools.ietf.org/html/rfc1554>ISO-2022-JP-2: Multilingual Extension of ISO-2022-JP</a></cite>, M. Ohta, K. Handa. IETF.</dd>

<dt id=refsRFC1557>[RFC1557]</dt>
<dd><cite><a href=http://tools.ietf.org/html/rfc1557>Korean Character Encoding for Internet Messages</a></cite>, U. Choi, K. Chon, H. Park. IETF.</dd>
<dd>(Non-normative) <cite><a href=http://tools.ietf.org/html/rfc1557>Korean Character Encoding for Internet Messages</a></cite>, U. Choi, K. Chon, H. Park. IETF.</dd>

<dt id=refsRFC1842>[RFC1842]</dt>
<dd><cite><a href=http://tools.ietf.org/html/rfc1842>ASCII Printable Characters-Based Chinese Character Encoding for Internet Messages</a></cite>, Y. Wei, Y. Zhang, J. Li, J. Ding, Y. Jiang. IETF.</dd>
<dd>(Non-normative) <cite><a href=http://tools.ietf.org/html/rfc1842>ASCII Printable Characters-Based Chinese Character Encoding for Internet Messages</a></cite>, Y. Wei, Y. Zhang, J. Li, J. Ding, Y. Jiang. IETF.</dd>

<dt id=refsRFC1922>[RFC1922]</dt>
<dd><cite><a href=http://tools.ietf.org/html/rfc1922>Chinese Character Encoding for Internet Messages</a></cite>, HF. Zhu, DY. Hu, ZG. Wang, TC. Kao, WCH. Chang, M. Crispin. IETF.</dd>
<dd>(Non-normative) <cite><a href=http://tools.ietf.org/html/rfc1922>Chinese Character Encoding for Internet Messages</a></cite>, HF. Zhu, DY. Hu, ZG. Wang, TC. Kao, WCH. Chang, M. Crispin. IETF.</dd>

<dt id=refsRFC2046>[RFC2046]</dt>
<dd><cite><a href=http://tools.ietf.org/html/rfc2046>Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types</a></cite>, N. Freed, N. Borenstein. IETF.</dd> <!-- for text/plain and "Internet Media type"; not for definition of "valid MIME type". -->
@@ -103547,7 +103544,7 @@ if (s = prompt('What is your name?')) {
<dd><cite><a href=http://tools.ietf.org/html/rfc2119>Key words for use in RFCs to Indicate Requirement Levels</a></cite>, S. Bradner. IETF.</dd>

<dt id=refsRFC2237>[RFC2237]</dt>
<dd><cite><a href=http://tools.ietf.org/html/rfc2237>Japanese Character Encoding for Internet Messages</a></cite>, K. Tamaru. IETF.</dd>
<dd>(Non-normative) <cite><a href=http://tools.ietf.org/html/rfc2237>Japanese Character Encoding for Internet Messages</a></cite>, K. Tamaru. IETF.</dd>

<dt id=refsRFC2313>[RFC2313]</dt>
<dd><cite><a href=http://tools.ietf.org/html/rfc2313>PKCS #1: RSA Encryption</a></cite>, B. Kaliski. IETF.</dd>
@@ -103567,9 +103564,6 @@ if (s = prompt('What is your name?')) {
<dt id=refsRFC2483>[RFC2483]</dt>
<dd><cite><a href=http://tools.ietf.org/html/rfc2483>URI Resolution Services Necessary for URN Resolution</a></cite>, M. Mealling, R. Daniel. IETF.</dd>

<dt id=refsRFC2781>[RFC2781]</dt>
<dd><cite><a href=http://tools.ietf.org/html/rfc2781>UTF-16, an encoding of ISO 10646</a></cite>, P. Hoffman, F. Yergeau. IETF.</dd>

<dt id=refsRFC3676>[RFC3676]</dt>
<dd><cite><a href=http://tools.ietf.org/html/rfc3676>The Text/Plain Format and DelSp Parameters</a></cite>, R. Gellens. IETF.</dd>

@@ -103652,7 +103646,7 @@ if (s = prompt('What is your name?')) {
<dd><cite><a href=http://url.spec.whatwg.org/>URL</a></cite>, A. van Kesteren. WHATWG.</dd>

<dt id=refsUTF7>[UTF7]</dt>
<dd><cite><a href=http://tools.ietf.org/html/rfc2152>UTF-7: A Mail-Safe Transformation Format of Unicode</a></cite>, D. Goldsmith, M. Davis. IETF.</dd>
<dd>(Non-normative) <cite><a href=http://tools.ietf.org/html/rfc2152>UTF-7: A Mail-Safe Transformation Format of Unicode</a></cite>, D. Goldsmith, M. Davis. IETF.</dd>

<dt id=refsUTF8DET>[UTF8DET]</dt>
<dd>(Non-normative) <cite><a href=http://www.w3.org/International/questions/qa-forms-utf-8>Multilingual form encoding</a></cite>, M. D&uuml;rst. W3C.</dd>