[giow] (2) Change how character encodings are sniffed to require an http-equiv attribute, and to only process one character encoding per <meta> element, even if attributes are duplicated.

Fixing http://www.w3.org/Bugs/Public/show_bug.cgi?id=9225

git-svn-id: http://svn.whatwg.org/webapps@4993 340c8d12-0b0e-0410-8428-c7bf67bfef74
Hixie committed Apr 12, 2010
1 parent 989721f commit d36e742
Showing 3 changed files with 186 additions and 69 deletions.
77 changes: 56 additions & 21 deletions complete.html
@@ -74090,36 +74090,71 @@ <h5 id=determining-the-character-encoding><span class=secno>12.2.2.1 </span>Dete
 0x2F byte (the one in sequence of characters matched
 above).</li>

-<li><p><a href=#concept-get-attributes-when-sniffing title=concept-get-attributes-when-sniffing>Get
-an attribute</a> and its value. If no attribute was
-sniffed, then skip this inner set of steps, and jump to the
-second step in the overall "two step" algorithm.</li>
+<li><p>Let <var title="">attribute list</var> be an empty
+list of strings.</li> <!-- so long as we only care about
+http-equiv, content, and charset, this can be a 3-bit
+bitfield -->

-<li><p>If the attribute's name is neither "<code title="">charset</code>" nor "<code title="">content</code>",
-then return to step 2 in these inner steps.</li>
+<li><p>Let <var title="">got pragma</var> be false.</li>

-<li><p>If the attribute's name is "<code title="">charset</code>", let <var title="">charset</var> be
-the attribute's value, interpreted as a character
-encoding.</li>
+<li><p>Let <var title="">mode</var> be null.</li>

-<li><p>Otherwise, the attribute's name is "<code title="">content</code>": apply the <a href=#algorithm-for-extracting-an-encoding-from-a-content-type>algorithm for
-extracting an encoding from a Content-Type</a>, giving the
-attribute's value as the string to parse. If an encoding is
-returned, let <var title="">charset</var> be that
-encoding. Otherwise, return to step 2 in these inner
-steps.</li>
+<li><p>Let <var title="">charset</var> be the null value
+(which, for the purposes of this algorithm, is distinct from
+an unrecognised encoding or the empty string).</li>

+<li><p><i>Attributes</i>: <a href=#concept-get-attributes-when-sniffing title=concept-get-attributes-when-sniffing>Get an
+attribute</a> and its value. If no attribute was sniffed,
+then jump to the <i>processing</i> step below.</li>
+
+<li><p>If the attribute's name is already in <var title="">attribute list</var>, then return to the step
+labeled <i>attributes</i>.</p>
+
+<li>
+
+<p>Run the appropriate step from the following list, if one
+applies:</p>
+
+<dl class=switch><dt>If the attribute's name is "<code title="">http-equiv</code>"</dt>
+
+<dd><p>If the attribute's value is "<code title="">content-type</code>", then set <var title="">got
+pragma</var> to true.</dd>
+
+<dt>If the attribute's name is "<code title="">charset</code>"</dt>
+
+<dd><p>If <var title="">charset</var> is still set to null,
+let <var title="">charset</var> be the encoding
+corresponding to the attribute's value, and set <var title="">mode</var> to "charset".</dd>
+
+<dt>If the attribute's name is "<code title="">content</code>"</dt>
+
+<dd><p>Apply the <a href=#algorithm-for-extracting-an-encoding-from-a-content-type>algorithm for extracting an encoding
+from a Content-Type</a>, giving the attribute's value as
+the string to parse. If an encoding is returned, and if
+<var title="">charset</var> is still set to null, let <var title="">charset</var> be the encoding returned, and set
+<var title="">mode</var> to "pragma".</dd>
+
+</dl></li>
+
+<li><p>Return to the step labeled <i>attributes</i>.</li>
+
+<li><p><i>Processing</i>: If <var title="">mode</var> is
+null, then jump to the second step of the overall "two step"
+algorithm.</li>
+
+<li><p>If <var title="">mode</var> is "pragma" but <var title="">got pragma</var> is false, then jump to the second
+step of the overall "two step" algorithm.</li>
+
 <li><p>If <var title="">charset</var> is a UTF-16 encoding,
 change the value of <var title="">charset</var> to
 UTF-8.</li>

-<li><p>If <var title="">charset</var> is a supported
-character encoding, then return the given encoding, with
-<a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a>
-<i>tentative</i>, and abort all these steps.</li>
+<li><p>If <var title="">charset</var> is not a supported
+character encoding, then jump to the second step of the
+overall "two step" algorithm.</li>

-<li><p>Otherwise, return to step 2 in these inner
-steps.</li>
+<li><p>Return the encoding given by <var title="">charset</var>, with <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a>
+<i>tentative</i>, and abort all these steps.</li>

 </ol></dd>

77 changes: 56 additions & 21 deletions index
@@ -67362,36 +67362,71 @@ interface <dfn id=messageport>MessagePort</dfn> {
 0x2F byte (the one in sequence of characters matched
 above).</li>

-<li><p><a href=#concept-get-attributes-when-sniffing title=concept-get-attributes-when-sniffing>Get
-an attribute</a> and its value. If no attribute was
-sniffed, then skip this inner set of steps, and jump to the
-second step in the overall "two step" algorithm.</li>
+<li><p>Let <var title="">attribute list</var> be an empty
+list of strings.</li> <!-- so long as we only care about
+http-equiv, content, and charset, this can be a 3-bit
+bitfield -->

-<li><p>If the attribute's name is neither "<code title="">charset</code>" nor "<code title="">content</code>",
-then return to step 2 in these inner steps.</li>
+<li><p>Let <var title="">got pragma</var> be false.</li>

-<li><p>If the attribute's name is "<code title="">charset</code>", let <var title="">charset</var> be
-the attribute's value, interpreted as a character
-encoding.</li>
+<li><p>Let <var title="">mode</var> be null.</li>

-<li><p>Otherwise, the attribute's name is "<code title="">content</code>": apply the <a href=#algorithm-for-extracting-an-encoding-from-a-content-type>algorithm for
-extracting an encoding from a Content-Type</a>, giving the
-attribute's value as the string to parse. If an encoding is
-returned, let <var title="">charset</var> be that
-encoding. Otherwise, return to step 2 in these inner
-steps.</li>
+<li><p>Let <var title="">charset</var> be the null value
+(which, for the purposes of this algorithm, is distinct from
+an unrecognised encoding or the empty string).</li>

+<li><p><i>Attributes</i>: <a href=#concept-get-attributes-when-sniffing title=concept-get-attributes-when-sniffing>Get an
+attribute</a> and its value. If no attribute was sniffed,
+then jump to the <i>processing</i> step below.</li>
+
+<li><p>If the attribute's name is already in <var title="">attribute list</var>, then return to the step
+labeled <i>attributes</i>.</p>
+
+<li>
+
+<p>Run the appropriate step from the following list, if one
+applies:</p>
+
+<dl class=switch><dt>If the attribute's name is "<code title="">http-equiv</code>"</dt>
+
+<dd><p>If the attribute's value is "<code title="">content-type</code>", then set <var title="">got
+pragma</var> to true.</dd>
+
+<dt>If the attribute's name is "<code title="">charset</code>"</dt>
+
+<dd><p>If <var title="">charset</var> is still set to null,
+let <var title="">charset</var> be the encoding
+corresponding to the attribute's value, and set <var title="">mode</var> to "charset".</dd>
+
+<dt>If the attribute's name is "<code title="">content</code>"</dt>
+
+<dd><p>Apply the <a href=#algorithm-for-extracting-an-encoding-from-a-content-type>algorithm for extracting an encoding
+from a Content-Type</a>, giving the attribute's value as
+the string to parse. If an encoding is returned, and if
+<var title="">charset</var> is still set to null, let <var title="">charset</var> be the encoding returned, and set
+<var title="">mode</var> to "pragma".</dd>
+
+</dl></li>
+
+<li><p>Return to the step labeled <i>attributes</i>.</li>
+
+<li><p><i>Processing</i>: If <var title="">mode</var> is
+null, then jump to the second step of the overall "two step"
+algorithm.</li>
+
+<li><p>If <var title="">mode</var> is "pragma" but <var title="">got pragma</var> is false, then jump to the second
+step of the overall "two step" algorithm.</li>
+
 <li><p>If <var title="">charset</var> is a UTF-16 encoding,
 change the value of <var title="">charset</var> to
 UTF-8.</li>

-<li><p>If <var title="">charset</var> is a supported
-character encoding, then return the given encoding, with
-<a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a>
-<i>tentative</i>, and abort all these steps.</li>
+<li><p>If <var title="">charset</var> is not a supported
+character encoding, then jump to the second step of the
+overall "two step" algorithm.</li>

-<li><p>Otherwise, return to step 2 in these inner
-steps.</li>
+<li><p>Return the encoding given by <var title="">charset</var>, with <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a>
+<i>tentative</i>, and abort all these steps.</li>

 </ol></dd>

101 changes: 74 additions & 27 deletions source
@@ -84379,39 +84379,86 @@ interface <dfn>SQLTransactionSync</dfn> {
 0x2F byte (the one in sequence of characters matched
 above).</p></li>

-<li><p><span title="concept-get-attributes-when-sniffing">Get
-an attribute</span> and its value. If no attribute was
-sniffed, then skip this inner set of steps, and jump to the
-second step in the overall "two step" algorithm.</p></li>
-
-<li><p>If the attribute's name is neither "<code
-title="">charset</code>" nor "<code title="">content</code>",
-then return to step 2 in these inner steps.</p></li>
-
-<li><p>If the attribute's name is "<code
-title="">charset</code>", let <var title="">charset</var> be
-the attribute's value, interpreted as a character
-encoding.</p></li>
-
-<li><p>Otherwise, the attribute's name is "<code
-title="">content</code>": apply the <span>algorithm for
-extracting an encoding from a Content-Type</span>, giving the
-attribute's value as the string to parse. If an encoding is
-returned, let <var title="">charset</var> be that
-encoding. Otherwise, return to step 2 in these inner
-steps.</p></li>
+<li><p>Let <var title="">attribute list</var> be an empty
+list of strings.</p></li> <!-- so long as we only care about
+http-equiv, content, and charset, this can be a 3-bit
+bitfield -->
+
+<li><p>Let <var title="">got pragma</var> be false.</p></li>
+
+<li><p>Let <var title="">mode</var> be null.</p></li>
+
+<li><p>Let <var title="">charset</var> be the null value
+(which, for the purposes of this algorithm, is distinct from
+an unrecognised encoding or the empty string).</p></li>
+
+<li><p><i>Attributes</i>: <span
+title="concept-get-attributes-when-sniffing">Get an
+attribute</span> and its value. If no attribute was sniffed,
+then jump to the <i>processing</i> step below.</p></li>
+
+<li><p>If the attribute's name is already in <var
+title="">attribute list</var>, then return to the step
+labeled <i>attributes</i>.</p>
+
+<li>
+
+<p>Run the appropriate step from the following list, if one
+applies:</p>
+
+<dl class="switch">
+
+<dt>If the attribute's name is "<code
+title="">http-equiv</code>"</dt>
+
+<dd><p>If the attribute's value is "<code
+title="">content-type</code>", then set <var title="">got
+pragma</var> to true.</p></dd>
+
+<dt>If the attribute's name is "<code
+title="">charset</code>"</dt>
+
+<dd><p>If <var title="">charset</var> is still set to null,
+let <var title="">charset</var> be the encoding
+corresponding to the attribute's value, and set <var
+title="">mode</var> to "charset".</p></dd>
+
+<dt>If the attribute's name is "<code
+title="">content</code>"</dt>
+
+<dd><p>Apply the <span>algorithm for extracting an encoding
+from a Content-Type</span>, giving the attribute's value as
+the string to parse. If an encoding is returned, and if
+<var title="">charset</var> is still set to null, let <var
+title="">charset</var> be the encoding returned, and set
+<var title="">mode</var> to "pragma".</p></dd>
+
+</dl>
+
+</li>
+
+<li><p>Return to the step labeled <i>attributes</i>.</p></li>
+
+<li><p><i>Processing</i>: If <var title="">mode</var> is
+null, then jump to the second step of the overall "two step"
+algorithm.</p></li>
+
+<li><p>If <var title="">mode</var> is "pragma" but <var
+title="">got pragma</var> is false, then jump to the second
+step of the overall "two step" algorithm.</p></li>

 <li><p>If <var title="">charset</var> is a UTF-16 encoding,
 change the value of <var title="">charset</var> to
 UTF-8.</p></li>

-<li><p>If <var title="">charset</var> is a supported
-character encoding, then return the given encoding, with
-<span title="concept-encoding-confidence">confidence</span>
-<i>tentative</i>, and abort all these steps.</p></li>
+<li><p>If <var title="">charset</var> is not a supported
+character encoding, then jump to the second step of the
+overall "two step" algorithm.</p></li>

-<li><p>Otherwise, return to step 2 in these inner
-steps.</p></li>
+<li><p>Return the encoding given by <var
+title="">charset</var>, with <span
+title="concept-encoding-confidence">confidence</span>
+<i>tentative</i>, and abort all these steps.</p></li>

 </ol>

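For readers who find the step list easier to follow as code, the per-<meta> steps introduced by this commit can be sketched roughly as below. This is an illustrative sketch, not the spec's normative algorithm or any browser's implementation. The names `sniff_meta_encoding` and `extract_encoding_from_content_type` are hypothetical. The regular expression is a simplified stand-in for the "algorithm for extracting an encoding from a Content-Type", and Python's `codecs.lookup` stands in for "is a supported character encoding". The sketch takes attributes that have already been sniffed as (name, value) pairs and lowercases them itself. It also assumes each newly seen name is appended to the attribute list, since the steps in this diff check the list but do not say when names are added to it.

```python
import codecs
import re

# Simplified stand-in (assumption) for the spec's "algorithm for extracting
# an encoding from a Content-Type"; the real algorithm handles more cases.
_CHARSET_RE = re.compile(
    r'charset\s*=\s*(?:"([^"]*)"|\'([^\']*)\'|([^\s;]+))', re.IGNORECASE
)


def extract_encoding_from_content_type(value):
    """Return the charset label found in a content value, or None."""
    match = _CHARSET_RE.search(value)
    if match is None:
        return None
    # Exactly one of the three alternatives captured something.
    return next(g for g in match.groups() if g is not None)


def sniff_meta_encoding(attributes):
    """Run the per-<meta> steps on (name, value) pairs in source order.

    Returns an encoding name (confidence would be "tentative"), or None
    when the caller should move on to the next step of the prescan.
    """
    attribute_list = set()  # names already processed
    got_pragma = False
    mode = None
    charset = None  # None plays the role of the spec's "null value"

    # "Attributes" loop
    for name, value in attributes:
        name, value = name.lower(), value.lower()
        if name in attribute_list:
            continue  # duplicated attribute: only the first one counts
        attribute_list.add(name)  # assumption: not spelled out in the diff

        if name == "http-equiv":
            if value == "content-type":
                got_pragma = True
        elif name == "charset":
            if charset is None:
                charset, mode = value, "charset"
        elif name == "content":
            found = extract_encoding_from_content_type(value)
            if found is not None and charset is None:
                charset, mode = found, "pragma"

    # "Processing" steps
    if mode is None:
        return None
    if mode == "pragma" and not got_pragma:
        return None  # content="...charset=..." without http-equiv is ignored
    try:
        codec_name = codecs.lookup(charset).name
    except LookupError:
        return None  # not a supported encoding (per Python's codec registry)
    if codec_name.startswith("utf-16"):
        return "utf-8"  # a UTF-16 declaration is treated as UTF-8
    return codec_name
```

In this sketch a `content` attribute yields an encoding only when the same element also carries `http-equiv="content-type"`, in either order, and a second `charset` attribute on the same element has no effect. Both follow from the commit message.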
