[giow] (3) Make a BOM override HTTP headers.
Fixing https://www.w3.org/Bugs/Public/show_bug.cgi?id=17810
Affected topics: HTML Syntax and Parsing

git-svn-id: http://svn.whatwg.org/webapps@7360 340c8d12-0b0e-0410-8428-c7bf67bfef74
Hixie committed Sep 16, 2012
1 parent 50bbda4 commit 947be85
Showing 3 changed files with 93 additions and 45 deletions.
44 changes: 30 additions & 14 deletions complete.html
@@ -88430,10 +88430,6 @@ <h5 id=determining-the-character-encoding><span class=secno>12.2.2.1 </span>Dete

</li>

<li><p>If the transport layer specifies an encoding, and it is
supported, return that encoding with the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a>
<i>certain</i>, and abort these steps.</li>

<li>

<p>The user agent may wait for more bytes of the resource to be
@@ -88455,13 +88451,21 @@ <h5 id=determining-the-character-encoding><span class=secno>12.2.2.1 </span>Dete

</li>

<li><p>For each of the rows in the following table, starting with
the first one and going down, if there are as many or more bytes
available than the number of bytes in the first column, and the
first bytes of the file match the bytes given in the first column,
then return the encoding given in the cell in the second column of
that row, with the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a>
<i>certain</i>, and abort these steps:</p>
<li>

<!-- Doing this step before honouring HTTP is important for supporting
http://kb.dsqq.cn/html/2012-09/16/node_193.htm
which is encoded as UTF-8 but is incorrectly labeled as
Content-Type: text/html; charset=GB2312
-->

<p>For each of the rows in the following table, starting with the
first one and going down, if there are as many or more bytes
available than the number of bytes in the first column, and the
first bytes of the file match the bytes given in the first column,
then return the encoding given in the cell in the second column of
that row, with the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a>
<i>certain</i>, and abort these steps:</p>

<!-- this table is present in several forms in this file; keep them in sync -->
<table><thead><tr><th>Bytes in Hexadecimal
@@ -88485,12 +88489,24 @@ <h5 id=determining-the-character-encoding><span class=secno>12.2.2.1 </span>Dete
<td>UTF-EBCDIC
-->
</table><p class=note>This step looks for Unicode Byte Order Marks
(BOMs).</li>
(BOMs).</p>

<p class=note>That this step happens before the next one
honoring the HTTP <code><a href=#content-type>Content-Type</a></code> header is a
<a href=#willful-violation>willful violation</a> of the HTTP specification,
motivated by a desire to be maximally compatible with legacy
content. <a href=#refsHTTP>[HTTP]</a></p>

</li>

<li><p>If the transport layer specifies an encoding, and it is
supported, return that encoding with the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a>
<i>certain</i>, and abort these steps.</li>

<li>

<p>Otherwise, optionally <a href=#prescan-a-byte-stream-to-determine-its-encoding title="prescan a byte stream to
determine its encoding">prescan the byte stream to determine its
<p>Optionally <a href=#prescan-a-byte-stream-to-determine-its-encoding title="prescan a byte stream to determine its
encoding">prescan the byte stream to determine its
encoding</a>. The <var title="">end condition</var> is that the
user agent decides that scanning further bytes would not be
efficient. User agents are encouraged to only prescan the first
44 changes: 30 additions & 14 deletions index
@@ -88430,10 +88430,6 @@ dictionary <dfn id=storageeventinit>StorageEventInit</dfn> : <a href=#eventinit>

</li>

<li><p>If the transport layer specifies an encoding, and it is
supported, return that encoding with the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a>
<i>certain</i>, and abort these steps.</li>

<li>

<p>The user agent may wait for more bytes of the resource to be
@@ -88455,13 +88451,21 @@ dictionary <dfn id=storageeventinit>StorageEventInit</dfn> : <a href=#eventinit>

</li>

<li><p>For each of the rows in the following table, starting with
the first one and going down, if there are as many or more bytes
available than the number of bytes in the first column, and the
first bytes of the file match the bytes given in the first column,
then return the encoding given in the cell in the second column of
that row, with the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a>
<i>certain</i>, and abort these steps:</p>
<li>

<!-- Doing this step before honouring HTTP is important for supporting
http://kb.dsqq.cn/html/2012-09/16/node_193.htm
which is encoded as UTF-8 but is incorrectly labeled as
Content-Type: text/html; charset=GB2312
-->

<p>For each of the rows in the following table, starting with the
first one and going down, if there are as many or more bytes
available than the number of bytes in the first column, and the
first bytes of the file match the bytes given in the first column,
then return the encoding given in the cell in the second column of
that row, with the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a>
<i>certain</i>, and abort these steps:</p>

<!-- this table is present in several forms in this file; keep them in sync -->
<table><thead><tr><th>Bytes in Hexadecimal
@@ -88485,12 +88489,24 @@ dictionary <dfn id=storageeventinit>StorageEventInit</dfn> : <a href=#eventinit>
<td>UTF-EBCDIC
-->
</table><p class=note>This step looks for Unicode Byte Order Marks
(BOMs).</li>
(BOMs).</p>

<p class=note>That this step happens before the next one
honoring the HTTP <code><a href=#content-type>Content-Type</a></code> header is a
<a href=#willful-violation>willful violation</a> of the HTTP specification,
motivated by a desire to be maximally compatible with legacy
content. <a href=#refsHTTP>[HTTP]</a></p>

</li>

<li><p>If the transport layer specifies an encoding, and it is
supported, return that encoding with the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a>
<i>certain</i>, and abort these steps.</li>

<li>

<p>Otherwise, optionally <a href=#prescan-a-byte-stream-to-determine-its-encoding title="prescan a byte stream to
determine its encoding">prescan the byte stream to determine its
<p>Optionally <a href=#prescan-a-byte-stream-to-determine-its-encoding title="prescan a byte stream to determine its
encoding">prescan the byte stream to determine its
encoding</a>. The <var title="">end condition</var> is that the
user agent decides that scanning further bytes would not be
efficient. User agents are encouraged to only prescan the first
50 changes: 33 additions & 17 deletions source
@@ -102588,11 +102588,6 @@ dictionary <dfn>StorageEventInit</dfn> : <span>EventInit</span> {

</li>

<li><p>If the transport layer specifies an encoding, and it is
supported, return that encoding with the <span
title="concept-encoding-confidence">confidence</span>
<i>certain</i>, and abort these steps.</p></li>

<li>

<p>The user agent may wait for more bytes of the resource to be
@@ -102615,14 +102610,22 @@ dictionary <dfn>StorageEventInit</dfn> : <span>EventInit</span> {

</li>

<li><p>For each of the rows in the following table, starting with
the first one and going down, if there are as many or more bytes
available than the number of bytes in the first column, and the
first bytes of the file match the bytes given in the first column,
then return the encoding given in the cell in the second column of
that row, with the <span
title="concept-encoding-confidence">confidence</span>
<i>certain</i>, and abort these steps:</p>
<li>

<!-- Doing this step before honouring HTTP is important for supporting
http://kb.dsqq.cn/html/2012-09/16/node_193.htm
which is encoded as UTF-8 but is incorrectly labeled as
Content-Type: text/html; charset=GB2312
-->

<p>For each of the rows in the following table, starting with the
first one and going down, if there are as many or more bytes
available than the number of bytes in the first column, and the
first bytes of the file match the bytes given in the first column,
then return the encoding given in the cell in the second column of
that row, with the <span
title="concept-encoding-confidence">confidence</span>
<i>certain</i>, and abort these steps:</p>

<!-- this table is present in several forms in this file; keep them in sync -->
<table>
@@ -102655,13 +102658,26 @@ dictionary <dfn>StorageEventInit</dfn> : <span>EventInit</span> {
-->
</table>

<p class="note">This step looks for Unicode Byte Order Marks
(BOMs).</p></li>
<p class="note">This step looks for Unicode Byte Order Marks
(BOMs).</p>

<p class="note">That this step happens before the next one
honoring the HTTP <code>Content-Type</code> header is a
<span>willful violation</span> of the HTTP specification,
motivated by a desire to be maximally compatible with legacy
content. <a href="#refsHTTP">[HTTP]</a></p>

</li>

<li><p>If the transport layer specifies an encoding, and it is
supported, return that encoding with the <span
title="concept-encoding-confidence">confidence</span>
<i>certain</i>, and abort these steps.</p></li>

<li>

<p>Otherwise, optionally <span title="prescan a byte stream to
determine its encoding">prescan the byte stream to determine its
<p>Optionally <span title="prescan a byte stream to determine its
encoding">prescan the byte stream to determine its
encoding</span>. The <var title="">end condition</var> is that the
user agent decides that scanning further bytes would not be
efficient. User agents are encouraged to only prescan the first
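
As a rough illustration of the reordering in this commit, the following Python sketch checks for a BOM before consulting the transport-layer charset. It is not part of the specification text. The names `sniff_encoding`, `supported_name` and `BOM_TABLE` are hypothetical, the table rows are an assumption (the diff elides the real table, so the three standard Unicode BOMs are used), and Python's `codecs.lookup` only stands in for the spec's notion of a "supported" encoding.

```python
import codecs
from typing import Optional, Tuple

# Assumed rows: the diff elides the real table, so the three standard
# Unicode BOMs are used here for illustration only.
BOM_TABLE = [
    (b"\xfe\xff", "utf-16be"),
    (b"\xff\xfe", "utf-16le"),
    (b"\xef\xbb\xbf", "utf-8"),
]


def supported_name(label: str) -> Optional[str]:
    """Return Python's canonical codec name for label, or None if unknown.

    Stand-in for "supported"; a browser's list of encodings differs.
    """
    try:
        return codecs.lookup(label.strip()).name
    except LookupError:
        return None


def sniff_encoding(
    head: bytes, transport_charset: Optional[str] = None
) -> Tuple[Optional[str], str]:
    """Return (encoding, confidence) for the first bytes of a resource."""
    # BOM step, now first: walk the table top to bottom and match the
    # leading bytes, provided enough bytes are available.
    for bom, encoding in BOM_TABLE:
        if len(head) >= len(bom) and head.startswith(bom):
            return encoding, "certain"

    # Transport-layer step, now second: honor the declared charset
    # only when no BOM matched and the charset is supported.
    if transport_charset is not None:
        name = supported_name(transport_charset)
        if name is not None:
            return name, "certain"

    # The later steps (prescan, defaults) are outside this sketch.
    return None, "tentative"


if __name__ == "__main__":
    # A UTF-8 resource with a BOM that is labeled charset=GB2312.
    print(sniff_encoding(b"\xef\xbb\xbf<html>", "GB2312"))
    # The same label without a BOM falls through to the transport layer.
    print(sniff_encoding(b"<html>", "GB2312"))
```

With the steps in this order, a resource that begins with a UTF-8 BOM is decoded as UTF-8 even when its Content-Type header declares a different charset, which is the kind of mislabeling the source comment describes.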