[a] (0) discourage use of HZ-GB-2312; explain why.
git-svn-id: http://svn.whatwg.org/webapps@4282 340c8d12-0b0e-0410-8428-c7bf67bfef74
Hixie committed Oct 23, 2009
1 parent fbc0d2a commit ee95439
Showing 3 changed files with 72 additions and 18 deletions.
30 changes: 24 additions & 6 deletions complete.html
@@ -11888,12 +11888,13 @@ <h5 id=charset><span class=secno>4.2.5.5 </span>Specifying the document's charac
 <a href=#ascii-compatible-character-encoding>ASCII-compatible character encoding</a>.</p>

 <p>Authors should not use JIS-X-0208 <!-- x-JIS0208 -->
-(JIS_C6226-1983), JIS-X-0212 (JIS_X0212-1990), encodings based on
-ISO-2022<!-- http://krijnhoetmer.nl/irc-logs/whatwg/20090628#l-422
--->, and encodings based on EBCDIC. Authors should not use
-UTF-32. Authors must not use the CESU-8, UTF-7, BOCU-1 and SCSU
-encodings.
+(JIS_C6226-1983), JIS-X-0212 (JIS_X0212-1990), HZ-GB-2312<!-- has
+crazy handling of ASCII "~" -->, encodings based on ISO-2022<!--
+http://krijnhoetmer.nl/irc-logs/whatwg/20090628#l-422 -->, and
+encodings based on EBCDIC. Authors should not use UTF-32.
+Authors must not use the CESU-8, UTF-7, BOCU-1 and SCSU encodings.
 <a href=#refsRFC1345>[RFC1345]</a><!-- for the JIS types -->
+<a href=#refsRFC1842>[RFC1842]</a><!-- HZ-GB-2312 -->
 <a href=#refsRFC1468>[RFC1468]</a><!-- ISO-2022-JP -->
 <a href=#refsRFC2237>[RFC2237]</a><!-- ISO-2022-JP-1 -->
 <a href=#refsRFC1554>[RFC1554]</a><!-- ISO-2022-JP-2 -->
@@ -11907,8 +11908,18 @@ <h5 id=charset><span class=secno>4.2.5.5 </span>Specifying the document's charac
 <!-- no idea what to reference for EBCDIC, so... -->
 </p>

+<p class=note>Most of these encodings are discouraged because of
+security concerns. If a hostile user can contribute text to a site
+using these encodings, bugs in the site's whitelisting filter or in
+a user agent can easily lead to the filter interpreting the
+contribution as "safe" while the user agent interprets the same
+contribution as containing a <code><a href=#script>script</a></code> element. This would
+enable cross-site scripting attacks. By avoiding these encodings,
+and always providing a <a href=#character-encoding-declaration>character encoding declaration</a>,
+an author is less likely to run into this kind of problem.</p>
+
 <p>Authors are encouraged to use UTF-8. Conformance checkers may
-advise against authors using legacy encodings.</p>
+advise authors against using legacy encodings.</p>

 <div class=impl>

@@ -86522,6 +86533,13 @@ <h3 class="no-num">Reflecting IDL attributes</h3>
 Encoding for Internet Messages</a></cite>, U. Choi, K. Chon, H. Park. IETF,
 December 1993.</dd>

+<dt id=refsRFC1842>[RFC1842]</dt>
+
+<dd><cite><a href=http://www.ietf.org/rfc/rfc1842.txt>ASCII
+Printable Characters-Based Chinese Character Encoding for Internet
+Messages</a></cite>, Y. Wei, Y. Zhang, J. Li, J. Ding, Y. Jiang.
+IETF, August 1995.</dd>
+
 <dt id=refsRFC1922>[RFC1922]</dt>
 <dd><cite><a href=http://www.ietf.org/rfc/rfc1922.txt>Chinese Character
 Encoding for Internet Messages</a></cite>, HF. Zhu, DY. Hu, ZG. Wang, TC. Kao,
30 changes: 24 additions & 6 deletions index
@@ -11718,12 +11718,13 @@ people expect to have work and what is necessary.
 <a href=#ascii-compatible-character-encoding>ASCII-compatible character encoding</a>.</p>

 <p>Authors should not use JIS-X-0208 <!-- x-JIS0208 -->
-(JIS_C6226-1983), JIS-X-0212 (JIS_X0212-1990), encodings based on
-ISO-2022<!-- http://krijnhoetmer.nl/irc-logs/whatwg/20090628#l-422
--->, and encodings based on EBCDIC. Authors should not use
-UTF-32. Authors must not use the CESU-8, UTF-7, BOCU-1 and SCSU
-encodings.
+(JIS_C6226-1983), JIS-X-0212 (JIS_X0212-1990), HZ-GB-2312<!-- has
+crazy handling of ASCII "~" -->, encodings based on ISO-2022<!--
+http://krijnhoetmer.nl/irc-logs/whatwg/20090628#l-422 -->, and
+encodings based on EBCDIC. Authors should not use UTF-32.
+Authors must not use the CESU-8, UTF-7, BOCU-1 and SCSU encodings.
 <a href=#refsRFC1345>[RFC1345]</a><!-- for the JIS types -->
+<a href=#refsRFC1842>[RFC1842]</a><!-- HZ-GB-2312 -->
 <a href=#refsRFC1468>[RFC1468]</a><!-- ISO-2022-JP -->
 <a href=#refsRFC2237>[RFC2237]</a><!-- ISO-2022-JP-1 -->
 <a href=#refsRFC1554>[RFC1554]</a><!-- ISO-2022-JP-2 -->
@@ -11737,8 +11738,18 @@ people expect to have work and what is necessary.
 <!-- no idea what to reference for EBCDIC, so... -->
 </p>

+<p class=note>Most of these encodings are discouraged because of
+security concerns. If a hostile user can contribute text to a site
+using these encodings, bugs in the site's whitelisting filter or in
+a user agent can easily lead to the filter interpreting the
+contribution as "safe" while the user agent interprets the same
+contribution as containing a <code><a href=#script>script</a></code> element. This would
+enable cross-site scripting attacks. By avoiding these encodings,
+and always providing a <a href=#character-encoding-declaration>character encoding declaration</a>,
+an author is less likely to run into this kind of problem.</p>
+
 <p>Authors are encouraged to use UTF-8. Conformance checkers may
-advise against authors using legacy encodings.</p>
+advise authors against using legacy encodings.</p>

 <div class=impl>

@@ -77700,6 +77711,13 @@ interface <a href=#htmldocument>HTMLDocument</a> {
 Encoding for Internet Messages</a></cite>, U. Choi, K. Chon, H. Park. IETF,
 December 1993.</dd>

+<dt id=refsRFC1842>[RFC1842]</dt>
+
+<dd><cite><a href=http://www.ietf.org/rfc/rfc1842.txt>ASCII
+Printable Characters-Based Chinese Character Encoding for Internet
+Messages</a></cite>, Y. Wei, Y. Zhang, J. Li, J. Ding, Y. Jiang.
+IETF, August 1995.</dd>
+
 <dt id=refsRFC1922>[RFC1922]</dt>
 <dd><cite><a href=http://www.ietf.org/rfc/rfc1922.txt>Chinese Character
 Encoding for Internet Messages</a></cite>, HF. Zhu, DY. Hu, ZG. Wang, TC. Kao,
30 changes: 24 additions & 6 deletions source
@@ -12379,12 +12379,13 @@ people expect to have work and what is necessary.
 <span>ASCII-compatible character encoding</span>.</p>

 <p>Authors should not use JIS-X-0208 <!-- x-JIS0208 -->
-(JIS_C6226-1983), JIS-X-0212 (JIS_X0212-1990), encodings based on
-ISO-2022<!-- http://krijnhoetmer.nl/irc-logs/whatwg/20090628#l-422
--->, and encodings based on EBCDIC. Authors should not use
-UTF-32. Authors must not use the CESU-8, UTF-7, BOCU-1 and SCSU
-encodings.
+(JIS_C6226-1983), JIS-X-0212 (JIS_X0212-1990), HZ-GB-2312<!-- has
+crazy handling of ASCII "~" -->, encodings based on ISO-2022<!--
+http://krijnhoetmer.nl/irc-logs/whatwg/20090628#l-422 -->, and
+encodings based on EBCDIC. Authors should not use UTF-32.
+Authors must not use the CESU-8, UTF-7, BOCU-1 and SCSU encodings.
 <a href="#refsRFC1345">[RFC1345]</a><!-- for the JIS types -->
+<a href="#refsRFC1842">[RFC1842]</a><!-- HZ-GB-2312 -->
 <a href="#refsRFC1468">[RFC1468]</a><!-- ISO-2022-JP -->
 <a href="#refsRFC2237">[RFC2237]</a><!-- ISO-2022-JP-1 -->
 <a href="#refsRFC1554">[RFC1554]</a><!-- ISO-2022-JP-2 -->
@@ -12398,8 +12399,18 @@ people expect to have work and what is necessary.
 <!-- no idea what to reference for EBCDIC, so... -->
 </p>

+<p class="note">Most of these encodings are discouraged because of
+security concerns. If a hostile user can contribute text to a site
+using these encodings, bugs in the site's whitelisting filter or in
+a user agent can easily lead to the filter interpreting the
+contribution as "safe" while the user agent interprets the same
+contribution as containing a <code>script</code> element. This would
+enable cross-site scripting attacks. By avoiding these encodings,
+and always providing a <span>character encoding declaration</span>,
+an author is less likely to run into this kind of problem.</p>
+
 <p>Authors are encouraged to use UTF-8. Conformance checkers may
-advise against authors using legacy encodings.</p>
+advise authors against using legacy encodings.</p>

 <div class="impl">

@@ -95692,6 +95703,13 @@ interface <span>HTMLDocument</span> {
 Encoding for Internet Messages</a></cite>, U. Choi, K. Chon, H. Park. IETF,
 December 1993.</dd>

+<dt id="refsRFC1842">[RFC1842]</dt>
+
+<dd><cite><a href="http://www.ietf.org/rfc/rfc1842.txt">ASCII
+Printable Characters-Based Chinese Character Encoding for Internet
+Messages</a></cite>, Y. Wei, Y. Zhang, J. Li, J. Ding, Y. Jiang.
+IETF, August 1995.</dd>
+
 <dt id="refsRFC1922">[RFC1922]</dt>
 <dd><cite><a href="http://www.ietf.org/rfc/rfc1922.txt">Chinese Character
 Encoding for Internet Messages</a></cite>, HF. Zhu, DY. Hu, ZG. Wang, TC. Kao,
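
The filter/user-agent mismatch described in the note added by this commit, and the `~` handling flagged in the HZ-GB-2312 comment, can be sketched in Python. This is only an illustration under stated assumptions: `naive_filter_allows` is a hypothetical byte-level filter invented for the example, and Python's built-in `hz` codec stands in for a user agent's decoder. In HZ-GB-2312 (RFC 1842), `~` is an escape character, and `~` followed by a newline is a line continuation that the decoder consumes.

```python
def naive_filter_allows(raw: bytes) -> bool:
    """Hypothetical filter: reject input whose raw bytes contain "<script"."""
    return b"<script" not in raw.lower()


# "~\n" is an HZ line continuation, so it disappears when the bytes are decoded.
payload = b"<scr~\nipt>alert(1)</scr~\nipt>"

# The filter inspects raw bytes and finds no "<script" sequence.
filter_verdict = naive_filter_allows(payload)

# A consumer that decodes the same bytes as HZ-GB-2312 sees different text.
decoded = payload.decode("hz")

print("filter allows payload:", filter_verdict)
print("decoded text:", decoded)
```

A filter that decoded its input with the same encoding the user agent uses would not be fooled by this particular payload; the sketch only shows why an encoding that gives special meaning to an ASCII character makes such mismatches easy to introduce.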