Commit

[e] (0) Move a section so that the character encoding requirements are closer together.

Affected topics: HTML Syntax and Parsing

git-svn-id: http://svn.whatwg.org/webapps@6992 340c8d12-0b0e-0410-8428-c7bf67bfef74
Hixie committed Feb 13, 2012
1 parent 0a42fa6 commit 0974ce8
Showing 3 changed files with 173 additions and 175 deletions.
113 changes: 56 additions & 57 deletions complete.html
@@ -1119,8 +1119,8 @@ <h2 class="no-num no-toc">Living Standard &mdash; Last Updated 13 February 2012<
<ol>
<li><a href=#determining-the-character-encoding><span class=secno>12.2.2.1 </span>Determining the character encoding</a></li>
<li><a href=#character-encodings-0><span class=secno>12.2.2.2 </span>Character encodings</a></li>
<li><a href=#preprocessing-the-input-stream><span class=secno>12.2.2.3 </span>Preprocessing the input stream</a></li>
<li><a href=#changing-the-encoding-while-parsing><span class=secno>12.2.2.4 </span>Changing the encoding while parsing</a></ol></li>
<li><a href=#changing-the-encoding-while-parsing><span class=secno>12.2.2.3 </span>Changing the encoding while parsing</a></li>
<li><a href=#preprocessing-the-input-stream><span class=secno>12.2.2.4 </span>Preprocessing the input stream</a></ol></li>
<li><a href=#parse-state><span class=secno>12.2.3 </span>Parse state</a>
<ol>
<li><a href=#the-insertion-mode><span class=secno>12.2.3.1 </span>The insertion mode</a></li>
@@ -81878,7 +81878,59 @@ <h5 id=character-encodings-0><span class=secno>12.2.2.2 </span>Character encodin



<h5 id=preprocessing-the-input-stream><span class=secno>12.2.2.3 </span>Preprocessing the input stream</h5>
<h5 id=changing-the-encoding-while-parsing><span class=secno>12.2.2.3 </span>Changing the encoding while parsing</h5>

<p>When the parser requires the user agent to <dfn id=change-the-encoding>change the
encoding</dfn>, it must run the following steps. This might happen
if the <a href=#encoding-sniffing-algorithm>encoding sniffing algorithm</a> described above
failed to find an encoding, or if it found an encoding that was not
the actual encoding of the file.</p>

<ol><li>If the encoding that is already being used to interpret the
input stream is <a href=#a-utf-16-encoding>a UTF-16 encoding</a>, then set the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a> to
<i>certain</i> and abort these steps. The new encoding is ignored;
if it was anything but the same encoding, then it would be clearly
incorrect.</li>

<li>If the new encoding is <a href=#a-utf-16-encoding>a UTF-16 encoding</a>, change
it to UTF-8.</li>

<li>If the new encoding is identical or equivalent to the encoding
that is already being used to interpret the input stream, then set
the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a> to
<i>certain</i> and abort these steps. This happens when the
encoding information found in the file matches what the
<a href=#encoding-sniffing-algorithm>encoding sniffing algorithm</a> determined to be the
encoding, and in the second pass through the parser if the first
pass found that the encoding sniffing algorithm described in the
earlier section failed to find the right encoding.</li>

<li>If all the bytes up to the last byte converted by the current
decoder have the same Unicode interpretations in both the current
encoding and the new encoding, and if the user agent supports
changing the converter on the fly, then the user agent may change
to the new converter for the encoding on the fly. Set the
<a href="#document's-character-encoding">document's character encoding</a> and the encoding used to
convert the input stream to the new encoding, set the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a> to
<i>certain</i>, and abort these steps.</li>

<li>Otherwise, <a href=#navigate>navigate</a><!--DONAV reparse--> to the
document again, with <a href=#replacement-enabled>replacement enabled</a>, and using
the same <a href=#source-browsing-context>source browsing context</a>, but this time skip
the <a href=#encoding-sniffing-algorithm>encoding sniffing algorithm</a> and instead just set
the encoding to the new encoding and the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a> to
<i>certain</i>. Whenever possible, this should be done without
actually contacting the network layer (the bytes should be
re-parsed from memory), even if, e.g., the document is marked as
not being cacheable. If this is not possible and contacting the
network layer would involve repeating a request that uses a method
other than HTTP GET (<a href=#concept-http-equivalent-get title=concept-http-equivalent-get>or
equivalent</a> for non-HTTP URLs), then instead set the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a> to
<i>certain</i> and ignore the new encoding. The resource will be
misinterpreted. User agents may notify the user of the situation,
to aid in application development.</li>

</ol><h5 id=preprocessing-the-input-stream><span class=secno>12.2.2.4 </span>Preprocessing the input stream</h5>

<p>The <dfn id=input-stream>input stream</dfn> consists of the characters pushed
into it as the <a href=#the-input-byte-stream>input byte stream</a> is decoded or from the
@@ -81936,60 +81988,7 @@ <h5 id=preprocessing-the-input-stream><span class=secno>12.2.2.3 </span>Preproce
consumed. Otherwise, the "EOF" character is not a real character in
the stream, but rather the lack of any further characters.</p>


<h5 id=changing-the-encoding-while-parsing><span class=secno>12.2.2.4 </span>Changing the encoding while parsing</h5>

<p>When the parser requires the user agent to <dfn id=change-the-encoding>change the
encoding</dfn>, it must run the following steps. This might happen
if the <a href=#encoding-sniffing-algorithm>encoding sniffing algorithm</a> described above
failed to find an encoding, or if it found an encoding that was not
the actual encoding of the file.</p>

<ol><li>If the encoding that is already being used to interpret the
input stream is <a href=#a-utf-16-encoding>a UTF-16 encoding</a>, then set the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a> to
<i>certain</i> and abort these steps. The new encoding is ignored;
if it was anything but the same encoding, then it would be clearly
incorrect.</li>

<li>If the new encoding is <a href=#a-utf-16-encoding>a UTF-16 encoding</a>, change
it to UTF-8.</li>

<li>If the new encoding is identical or equivalent to the encoding
that is already being used to interpret the input stream, then set
the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a> to
<i>certain</i> and abort these steps. This happens when the
encoding information found in the file matches what the
<a href=#encoding-sniffing-algorithm>encoding sniffing algorithm</a> determined to be the
encoding, and in the second pass through the parser if the first
pass found that the encoding sniffing algorithm described in the
earlier section failed to find the right encoding.</li>

<li>If all the bytes up to the last byte converted by the current
decoder have the same Unicode interpretations in both the current
encoding and the new encoding, and if the user agent supports
changing the converter on the fly, then the user agent may change
to the new converter for the encoding on the fly. Set the
<a href="#document's-character-encoding">document's character encoding</a> and the encoding used to
convert the input stream to the new encoding, set the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a> to
<i>certain</i>, and abort these steps.</li>

<li>Otherwise, <a href=#navigate>navigate</a><!--DONAV reparse--> to the
document again, with <a href=#replacement-enabled>replacement enabled</a>, and using
the same <a href=#source-browsing-context>source browsing context</a>, but this time skip
the <a href=#encoding-sniffing-algorithm>encoding sniffing algorithm</a> and instead just set
the encoding to the new encoding and the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a> to
<i>certain</i>. Whenever possible, this should be done without
actually contacting the network layer (the bytes should be
re-parsed from memory), even if, e.g., the document is marked as
not being cacheable. If this is not possible and contacting the
network layer would involve repeating a request that uses a method
other than HTTP GET (<a href=#concept-http-equivalent-get title=concept-http-equivalent-get>or
equivalent</a> for non-HTTP URLs), then instead set the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a> to
<i>certain</i> and ignore the new encoding. The resource will be
misinterpreted. User agents may notify the user of the situation,
to aid in application development.</li>

</ol></div>
</div>


<div class=impl>
113 changes: 56 additions & 57 deletions index
@@ -1119,8 +1119,8 @@
<ol>
<li><a href=#determining-the-character-encoding><span class=secno>12.2.2.1 </span>Determining the character encoding</a></li>
<li><a href=#character-encodings-0><span class=secno>12.2.2.2 </span>Character encodings</a></li>
<li><a href=#preprocessing-the-input-stream><span class=secno>12.2.2.3 </span>Preprocessing the input stream</a></li>
<li><a href=#changing-the-encoding-while-parsing><span class=secno>12.2.2.4 </span>Changing the encoding while parsing</a></ol></li>
<li><a href=#changing-the-encoding-while-parsing><span class=secno>12.2.2.3 </span>Changing the encoding while parsing</a></li>
<li><a href=#preprocessing-the-input-stream><span class=secno>12.2.2.4 </span>Preprocessing the input stream</a></ol></li>
<li><a href=#parse-state><span class=secno>12.2.3 </span>Parse state</a>
<ol>
<li><a href=#the-insertion-mode><span class=secno>12.2.3.1 </span>The insertion mode</a></li>
@@ -81878,7 +81878,59 @@ dictionary <dfn id=storageeventinit>StorageEventInit</dfn> : <a href=#eventinit>



<h5 id=preprocessing-the-input-stream><span class=secno>12.2.2.3 </span>Preprocessing the input stream</h5>
<h5 id=changing-the-encoding-while-parsing><span class=secno>12.2.2.3 </span>Changing the encoding while parsing</h5>

<p>When the parser requires the user agent to <dfn id=change-the-encoding>change the
encoding</dfn>, it must run the following steps. This might happen
if the <a href=#encoding-sniffing-algorithm>encoding sniffing algorithm</a> described above
failed to find an encoding, or if it found an encoding that was not
the actual encoding of the file.</p>

<ol><li>If the encoding that is already being used to interpret the
input stream is <a href=#a-utf-16-encoding>a UTF-16 encoding</a>, then set the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a> to
<i>certain</i> and abort these steps. The new encoding is ignored;
if it was anything but the same encoding, then it would be clearly
incorrect.</li>

<li>If the new encoding is <a href=#a-utf-16-encoding>a UTF-16 encoding</a>, change
it to UTF-8.</li>

<li>If the new encoding is identical or equivalent to the encoding
that is already being used to interpret the input stream, then set
the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a> to
<i>certain</i> and abort these steps. This happens when the
encoding information found in the file matches what the
<a href=#encoding-sniffing-algorithm>encoding sniffing algorithm</a> determined to be the
encoding, and in the second pass through the parser if the first
pass found that the encoding sniffing algorithm described in the
earlier section failed to find the right encoding.</li>

<li>If all the bytes up to the last byte converted by the current
decoder have the same Unicode interpretations in both the current
encoding and the new encoding, and if the user agent supports
changing the converter on the fly, then the user agent may change
to the new converter for the encoding on the fly. Set the
<a href="#document's-character-encoding">document's character encoding</a> and the encoding used to
convert the input stream to the new encoding, set the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a> to
<i>certain</i>, and abort these steps.</li>

<li>Otherwise, <a href=#navigate>navigate</a><!--DONAV reparse--> to the
document again, with <a href=#replacement-enabled>replacement enabled</a>, and using
the same <a href=#source-browsing-context>source browsing context</a>, but this time skip
the <a href=#encoding-sniffing-algorithm>encoding sniffing algorithm</a> and instead just set
the encoding to the new encoding and the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a> to
<i>certain</i>. Whenever possible, this should be done without
actually contacting the network layer (the bytes should be
re-parsed from memory), even if, e.g., the document is marked as
not being cacheable. If this is not possible and contacting the
network layer would involve repeating a request that uses a method
other than HTTP GET (<a href=#concept-http-equivalent-get title=concept-http-equivalent-get>or
equivalent</a> for non-HTTP URLs), then instead set the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a> to
<i>certain</i> and ignore the new encoding. The resource will be
misinterpreted. User agents may notify the user of the situation,
to aid in application development.</li>

</ol><h5 id=preprocessing-the-input-stream><span class=secno>12.2.2.4 </span>Preprocessing the input stream</h5>

<p>The <dfn id=input-stream>input stream</dfn> consists of the characters pushed
into it as the <a href=#the-input-byte-stream>input byte stream</a> is decoded or from the
@@ -81936,60 +81988,7 @@ dictionary <dfn id=storageeventinit>StorageEventInit</dfn> : <a href=#eventinit>
consumed. Otherwise, the "EOF" character is not a real character in
the stream, but rather the lack of any further characters.</p>


<h5 id=changing-the-encoding-while-parsing><span class=secno>12.2.2.4 </span>Changing the encoding while parsing</h5>

<p>When the parser requires the user agent to <dfn id=change-the-encoding>change the
encoding</dfn>, it must run the following steps. This might happen
if the <a href=#encoding-sniffing-algorithm>encoding sniffing algorithm</a> described above
failed to find an encoding, or if it found an encoding that was not
the actual encoding of the file.</p>

<ol><li>If the encoding that is already being used to interpret the
input stream is <a href=#a-utf-16-encoding>a UTF-16 encoding</a>, then set the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a> to
<i>certain</i> and abort these steps. The new encoding is ignored;
if it was anything but the same encoding, then it would be clearly
incorrect.</li>

<li>If the new encoding is <a href=#a-utf-16-encoding>a UTF-16 encoding</a>, change
it to UTF-8.</li>

<li>If the new encoding is identical or equivalent to the encoding
that is already being used to interpret the input stream, then set
the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a> to
<i>certain</i> and abort these steps. This happens when the
encoding information found in the file matches what the
<a href=#encoding-sniffing-algorithm>encoding sniffing algorithm</a> determined to be the
encoding, and in the second pass through the parser if the first
pass found that the encoding sniffing algorithm described in the
earlier section failed to find the right encoding.</li>

<li>If all the bytes up to the last byte converted by the current
decoder have the same Unicode interpretations in both the current
encoding and the new encoding, and if the user agent supports
changing the converter on the fly, then the user agent may change
to the new converter for the encoding on the fly. Set the
<a href="#document's-character-encoding">document's character encoding</a> and the encoding used to
convert the input stream to the new encoding, set the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a> to
<i>certain</i>, and abort these steps.</li>

<li>Otherwise, <a href=#navigate>navigate</a><!--DONAV reparse--> to the
document again, with <a href=#replacement-enabled>replacement enabled</a>, and using
the same <a href=#source-browsing-context>source browsing context</a>, but this time skip
the <a href=#encoding-sniffing-algorithm>encoding sniffing algorithm</a> and instead just set
the encoding to the new encoding and the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a> to
<i>certain</i>. Whenever possible, this should be done without
actually contacting the network layer (the bytes should be
re-parsed from memory), even if, e.g., the document is marked as
not being cacheable. If this is not possible and contacting the
network layer would involve repeating a request that uses a method
other than HTTP GET (<a href=#concept-http-equivalent-get title=concept-http-equivalent-get>or
equivalent</a> for non-HTTP URLs), then instead set the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a> to
<i>certain</i> and ignore the new encoding. The resource will be
misinterpreted. User agents may notify the user of the situation,
to aid in application development.</li>

</ol></div>
</div>


<div class=impl>
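
Review note: the section moved by this commit ("Changing the encoding while parsing") is a five-step algorithm, and its text is unchanged by the move. For readers following the diff, here is a minimal Python sketch of those steps. It is an illustration only, not part of the commit or of any user agent: `ParserState`, `Outcome`, `change_encoding`, and the capability flags are hypothetical names, "identical or equivalent" encodings are approximated with Python's codec registry, and the bytes consumed so far are assumed to end on a character boundary.

```python
import codecs
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    KEPT_CURRENT = "kept current encoding"
    SWITCHED_ON_THE_FLY = "switched decoder on the fly"
    REPARSE = "navigate again and reparse with the new encoding"
    IGNORED_NEW = "new encoding ignored"


@dataclass
class ParserState:
    # All fields are hypothetical stand-ins for user agent state.
    encoding: str
    confidence: str = "tentative"
    consumed: bytes = b""                 # bytes decoded so far
    can_switch_on_the_fly: bool = True    # UA supports swapping converters
    can_reparse_from_memory: bool = True  # bytes are still buffered
    request_method: str = "GET"


def _canonical(name: str) -> str:
    # Approximate "identical or equivalent" via Python's codec aliases.
    return codecs.lookup(name).name


def _is_utf16(name: str) -> bool:
    return _canonical(name).replace("-", "").replace("_", "").startswith("utf16")


def _same_interpretation(data: bytes, a: str, b: str) -> bool:
    try:
        return data.decode(a) == data.decode(b)
    except UnicodeDecodeError:
        return False


def change_encoding(state: ParserState, new_encoding: str) -> Outcome:
    # Step 1: already decoding as UTF-16, so the new encoding is ignored.
    if _is_utf16(state.encoding):
        state.confidence = "certain"
        return Outcome.KEPT_CURRENT
    # Step 2: a declared UTF-16 encoding is changed to UTF-8.
    if _is_utf16(new_encoding):
        new_encoding = "utf-8"
    # Step 3: nothing to do if the encodings already agree.
    if _canonical(new_encoding) == _canonical(state.encoding):
        state.confidence = "certain"
        return Outcome.KEPT_CURRENT
    # Step 4: optional on-the-fly switch when earlier bytes decode identically.
    if state.can_switch_on_the_fly and _same_interpretation(
        state.consumed, state.encoding, new_encoding
    ):
        state.encoding = new_encoding
        state.confidence = "certain"
        return Outcome.SWITCHED_ON_THE_FLY
    # Step 5: reparse, unless that would mean repeating a non-GET request.
    if state.can_reparse_from_memory or state.request_method == "GET":
        state.encoding = new_encoding
        state.confidence = "certain"
        return Outcome.REPARSE  # the caller restarts parsing, skipping sniffing
    state.confidence = "certain"
    return Outcome.IGNORED_NEW  # the resource will be misinterpreted
```

For example, a state decoding `b"<html><head>"` as windows-1252 that is asked to change to UTF-8 can switch on the fly, because those ASCII bytes decode identically in both encodings. If a byte such as `0xE9` has already been consumed, the sketch falls through to the reparse step.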