[giow] (2) Match Gecko for character encoding processing for <script>
Fixing http://www.w3.org/Bugs/Public/show_bug.cgi?id=10656

git-svn-id: http://svn.whatwg.org/webapps@5545 340c8d12-0b0e-0410-8428-c7bf67bfef74
Hixie committed Sep 29, 2010
1 parent 04eba9d commit bb16d5b
Showing 3 changed files with 214 additions and 156 deletions.
116 changes: 67 additions & 49 deletions complete.html
@@ -14305,10 +14305,12 @@ <h4 id=script><span class=secno>4.3.1 </span>The <dfn><code>script</code></dfn>
<code><a href=#document>Document</a></code> objects can also have this flag set; it's
propagated to the <code><a href=#document>Document</a></code> when the script runs.</p>

<p>The fifth and sixth pieces of state are <dfn id="the-script-block's-type"><var>the script
block's type</var></dfn> and <dfn id="the-script-block's-character-encoding"><var>the script block's character
encoding</var></dfn>. They are determined when the script is run,
based on the attributes on the element at that time.</p>
<p>The last few pieces of state are <dfn id="the-script-block's-type"><var>the script block's
type</var></dfn>, <dfn id="the-script-block's-character-encoding"><var>the script block's character
encoding</var></dfn>, and <dfn id="the-script-block's-fallback-character-encoding"><var>the script block's fallback
character encoding</var></dfn>. They are determined when the script
is run, based on the attributes on the element at that time, and the
<code><a href=#document>Document</a></code> of the <code><a href=#script>script</a></code> element.</p>

<p>When a <code><a href=#script>script</a></code> element that is not marked as being
<a href=#parser-inserted>"parser-inserted"</a> experiences one of the events listed
@@ -14466,9 +14468,12 @@ <h4 id=script><span class=secno>4.3.1 </span>The <dfn><code>script</code></dfn>
<var><a href="#the-script-block's-character-encoding">the script block's character encoding</a></var> for this
<code><a href=#script>script</a></code> element be the encoding given by the <code title=attr-script-charset><a href=#attr-script-charset>charset</a></code> attribute.</p>

<p>Otherwise, let <var><a href="#the-script-block's-character-encoding">the script block's character encoding</a></var>
for this <code><a href=#script>script</a></code> element be the same as <a href="#document's-character-encoding" title="document's character encoding">the encoding of the document
itself</a>.</p>
<p>Otherwise, let <var><a href="#the-script-block's-fallback-character-encoding">the script block's fallback character
encoding</a></var> for this <code><a href=#script>script</a></code> element be the same as
<a href="#document's-character-encoding" title="document's character encoding">the encoding of the
document itself</a>.</p>

<p class=note>Only one of these two pieces of state is set.</p>

</li>
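
The step above can be sketched roughly as follows. This is an illustration in Python, not spec text: `ScriptBlockState` and `init_encoding_state` are hypothetical names, and the sketch assumes the elided condition before the first branch is simply "the element has a `charset` attribute".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScriptBlockState:
    """Hypothetical container for the encoding-related state of one script element."""
    # Set only when the element has a charset attribute.
    character_encoding: Optional[str] = None
    # Set only when it does not; copied from the document's character encoding.
    fallback_character_encoding: Optional[str] = None


def init_encoding_state(charset_attr: Optional[str],
                        document_encoding: str) -> ScriptBlockState:
    """Set exactly one of the two fields, matching the note in the step."""
    if charset_attr is not None:
        return ScriptBlockState(character_encoding=charset_attr)
    return ScriptBlockState(fallback_character_encoding=document_encoding)
```

Keeping the two values in separate fields is what lets a later step tell an author-declared encoding apart from one merely inherited from the document.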

@@ -14495,13 +14500,6 @@ <h4 id=script><span class=secno>4.3.1 </span>The <dfn><code>script</code></dfn>
user agent must act as if it had received an empty HTTP 400
response.</p>

<p>Once the resource's <a href=#content-type title=Content-Type>Content Type
metadata</a> is available, if it ever is, apply the
<a href=#algorithm-for-extracting-an-encoding-from-a-content-type>algorithm for extracting an encoding from a
Content-Type</a> to it. If this returns an encoding, and the
user agent supports that encoding, then let <var><a href="#the-script-block's-character-encoding">the script
block's character encoding</a></var> be that encoding.</p>

<p>For performance reasons, user agents may start fetching the
script as soon as the attribute is set, instead, in the hope that
the element will be inserted into the document. Either way, once
@@ -14648,43 +14646,63 @@ <h4 id=script><span class=secno>4.3.1 </span>The <dfn><code>script</code></dfn>
<p>The contents of that file, interpreted as a string of
Unicode characters, are the script source.</p>

<p>For each of the rows in the following table, starting with
the first one and going down, if the file has as many or more
bytes available than the number of bytes in the first column,
and the first bytes of the file match the bytes given in the
first column, then set <var><a href="#the-script-block's-character-encoding">the script block's character
encoding</a></var> to the encoding given in the cell in the second
column of that row, irrespective of any previous value:</p>

<!-- this table is present in several forms in this file; keep them in sync -->
<table id=table-script-bom><thead><tr><th>Bytes in Hexadecimal
<th>Encoding
<tbody><!-- nobody uses this
<tr>
<td>00 00 FE FF
<td>UTF-32BE
<tr>
<td>FF FE 00 00
<td>UTF-32LE
--><tr><td>FE FF
<td>Big-endian UTF-16
<tr><td>FF FE
<td>Little-endian UTF-16
<tr><td>EF BB BF
<td>UTF-8
<!-- nobody uses this
<tr>
<td>DD 73 66 73
<td>UTF-EBCDIC
-->
</table><p class=note>This step looks for Unicode Byte Order Marks
(BOMs).</p>
<p>To obtain the string of Unicode characters, the user agent
must run the following steps:</p>

<ol><li><p>If the resource's <a href=#content-type title=Content-Type>Content
Type metadata</a>, if any, specifies a character encoding,
and the user agent supports that encoding, then let <var title="">character encoding</var> be that encoding, and jump
to the bottom step in this series of steps.</li>

<li><p>If the algorithm above set <var><a href="#the-script-block's-character-encoding">the script block's
character encoding</a></var>, then let <var title="">character
encoding</var> be that encoding, and jump to the bottom step
in this series of steps.</li>

<li><p>For each of the rows in the following table, starting
with the first one and going down, if the file has as many or
more bytes available than the number of bytes in the first
column, and the first bytes of the file match the bytes given
in the first column, then set <var title="">character
encoding</var> to the encoding given in the cell in the
second column of that row, and jump to the bottom step in
this series of steps:</p>

<!-- this table is present in several forms in this file; keep them in sync -->
<table id=table-script-bom><thead><tr><th>Bytes in Hexadecimal
<th>Encoding
<tbody><!-- nobody uses this
<tr>
<td>00 00 FE FF
<td>UTF-32BE
<tr>
<td>FF FE 00 00
<td>UTF-32LE
--><tr><td>FE FF
<td>Big-endian UTF-16
<tr><td>FF FE
<td>Little-endian UTF-16
<tr><td>EF BB BF
<td>UTF-8
<!-- nobody uses this
<tr>
<td>DD 73 66 73
<td>UTF-EBCDIC
-->
</table><p class=note>This step looks for Unicode Byte Order Marks
(BOMs).</p>

<p>The file must then be converted to Unicode using the
character encoding given by <var><a href="#the-script-block's-character-encoding">the script block's character
encoding</a></var>.</p>
</li>

</dd>
<li><p>Let <var title="">character encoding</var> be <var><a href="#the-script-block's-fallback-character-encoding">the
script block's fallback character encoding</a></var>.</li>

<li><p>Convert the file to Unicode using <var>character
encoding</var>, following the rules for doing so given by the
specification for <var><a href="#the-script-block's-type">the script block's
type</a></var>.</li>

</ol></dd>
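
The precedence established by the new steps (Content-Type metadata, then the `charset` attribute, then a BOM, then the document-derived fallback) can be sketched as below. This is a minimal illustration under stated assumptions, not the spec's algorithm: all function names are hypothetical, Python's codec registry stands in for "the user agent supports that encoding", Python codec labels stand in for the table's encoding names, and a plain `bytes.decode` stands in for the conversion rules defined by the script type's own specification.

```python
import codecs
from email.message import Message
from typing import Optional

# BOM rows from the table above, in the same order (UTF-32 rows are commented
# out in the source and are omitted here too). Labels are Python codec names.
BOM_TABLE = [
    (b"\xfe\xff", "utf-16-be"),    # Big-endian UTF-16
    (b"\xff\xfe", "utf-16-le"),    # Little-endian UTF-16
    (b"\xef\xbb\xbf", "utf-8"),    # UTF-8
]


def is_supported(label: str) -> bool:
    """Assumption: 'supported' means Python has a codec for the label."""
    try:
        codecs.lookup(label)
        return True
    except LookupError:
        return False


def charset_from_content_type(content_type: Optional[str]) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, if present."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lower-cased label, or None


def choose_encoding(raw: bytes,
                    content_type: Optional[str],
                    attr_encoding: Optional[str],
                    fallback_encoding: str) -> str:
    # Step 1: Content-Type metadata wins if it names a supported encoding.
    http_charset = charset_from_content_type(content_type)
    if http_charset and is_supported(http_charset):
        return http_charset
    # Step 2: the encoding from the charset attribute, if it was set earlier.
    if attr_encoding is not None:
        return attr_encoding
    # Step 3: look for a byte order mark at the start of the file.
    for bom, encoding in BOM_TABLE:
        if raw.startswith(bom):
            return encoding
    # Step 4: fall back to the encoding inherited from the document.
    return fallback_encoding


def decode_script(raw: bytes,
                  content_type: Optional[str],
                  attr_encoding: Optional[str],
                  fallback_encoding: str) -> str:
    # Final step, simplified: a real implementation follows the script
    # language's own decoding rules. This decode does not strip a BOM.
    encoding = choose_encoding(raw, content_type, attr_encoding, fallback_encoding)
    return raw.decode(encoding, errors="replace")
```

Note the change in behaviour this ordering implies: in the removed text a BOM overrode any previous value, whereas in these steps it is consulted only when neither the Content-Type metadata nor the `charset` attribute supplied an encoding.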

<dt>If the script is from an external file and <var><a href="#the-script-block's-type">the script block's type</a></var> is an XML-based language</dt>

116 changes: 67 additions & 49 deletions index
@@ -14282,10 +14282,12 @@ c-end = "--&gt;"</pre>
<code><a href=#document>Document</a></code> objects can also have this flag set; it's
propagated to the <code><a href=#document>Document</a></code> when the script runs.</p>

<p>The fifth and sixth pieces of state are <dfn id="the-script-block's-type"><var>the script
block's type</var></dfn> and <dfn id="the-script-block's-character-encoding"><var>the script block's character
encoding</var></dfn>. They are determined when the script is run,
based on the attributes on the element at that time.</p>
<p>The last few pieces of state are <dfn id="the-script-block's-type"><var>the script block's
type</var></dfn>, <dfn id="the-script-block's-character-encoding"><var>the script block's character
encoding</var></dfn>, and <dfn id="the-script-block's-fallback-character-encoding"><var>the script block's fallback
character encoding</var></dfn>. They are determined when the script
is run, based on the attributes on the element at that time, and the
<code><a href=#document>Document</a></code> of the <code><a href=#script>script</a></code> element.</p>

<p>When a <code><a href=#script>script</a></code> element that is not marked as being
<a href=#parser-inserted>"parser-inserted"</a> experiences one of the events listed
@@ -14443,9 +14445,12 @@ c-end = "--&gt;"</pre>
<var><a href="#the-script-block's-character-encoding">the script block's character encoding</a></var> for this
<code><a href=#script>script</a></code> element be the encoding given by the <code title=attr-script-charset><a href=#attr-script-charset>charset</a></code> attribute.</p>

<p>Otherwise, let <var><a href="#the-script-block's-character-encoding">the script block's character encoding</a></var>
for this <code><a href=#script>script</a></code> element be the same as <a href="#document's-character-encoding" title="document's character encoding">the encoding of the document
itself</a>.</p>
<p>Otherwise, let <var><a href="#the-script-block's-fallback-character-encoding">the script block's fallback character
encoding</a></var> for this <code><a href=#script>script</a></code> element be the same as
<a href="#document's-character-encoding" title="document's character encoding">the encoding of the
document itself</a>.</p>

<p class=note>Only one of these two pieces of state is set.</p>

</li>

@@ -14472,13 +14477,6 @@ c-end = "--&gt;"</pre>
user agent must act as if it had received an empty HTTP 400
response.</p>

<p>Once the resource's <a href=#content-type title=Content-Type>Content Type
metadata</a> is available, if it ever is, apply the
<a href=#algorithm-for-extracting-an-encoding-from-a-content-type>algorithm for extracting an encoding from a
Content-Type</a> to it. If this returns an encoding, and the
user agent supports that encoding, then let <var><a href="#the-script-block's-character-encoding">the script
block's character encoding</a></var> be that encoding.</p>

<p>For performance reasons, user agents may start fetching the
script as soon as the attribute is set, instead, in the hope that
the element will be inserted into the document. Either way, once
@@ -14625,43 +14623,63 @@ c-end = "--&gt;"</pre>
<p>The contents of that file, interpreted as a string of
Unicode characters, are the script source.</p>

<p>For each of the rows in the following table, starting with
the first one and going down, if the file has as many or more
bytes available than the number of bytes in the first column,
and the first bytes of the file match the bytes given in the
first column, then set <var><a href="#the-script-block's-character-encoding">the script block's character
encoding</a></var> to the encoding given in the cell in the second
column of that row, irrespective of any previous value:</p>

<!-- this table is present in several forms in this file; keep them in sync -->
<table id=table-script-bom><thead><tr><th>Bytes in Hexadecimal
<th>Encoding
<tbody><!-- nobody uses this
<tr>
<td>00 00 FE FF
<td>UTF-32BE
<tr>
<td>FF FE 00 00
<td>UTF-32LE
--><tr><td>FE FF
<td>Big-endian UTF-16
<tr><td>FF FE
<td>Little-endian UTF-16
<tr><td>EF BB BF
<td>UTF-8
<!-- nobody uses this
<tr>
<td>DD 73 66 73
<td>UTF-EBCDIC
-->
</table><p class=note>This step looks for Unicode Byte Order Marks
(BOMs).</p>
<p>To obtain the string of Unicode characters, the user agent
must run the following steps:</p>

<ol><li><p>If the resource's <a href=#content-type title=Content-Type>Content
Type metadata</a>, if any, specifies a character encoding,
and the user agent supports that encoding, then let <var title="">character encoding</var> be that encoding, and jump
to the bottom step in this series of steps.</li>

<li><p>If the algorithm above set <var><a href="#the-script-block's-character-encoding">the script block's
character encoding</a></var>, then let <var title="">character
encoding</var> be that encoding, and jump to the bottom step
in this series of steps.</li>

<li><p>For each of the rows in the following table, starting
with the first one and going down, if the file has as many or
more bytes available than the number of bytes in the first
column, and the first bytes of the file match the bytes given
in the first column, then set <var title="">character
encoding</var> to the encoding given in the cell in the
second column of that row, and jump to the bottom step in
this series of steps:</p>

<!-- this table is present in several forms in this file; keep them in sync -->
<table id=table-script-bom><thead><tr><th>Bytes in Hexadecimal
<th>Encoding
<tbody><!-- nobody uses this
<tr>
<td>00 00 FE FF
<td>UTF-32BE
<tr>
<td>FF FE 00 00
<td>UTF-32LE
--><tr><td>FE FF
<td>Big-endian UTF-16
<tr><td>FF FE
<td>Little-endian UTF-16
<tr><td>EF BB BF
<td>UTF-8
<!-- nobody uses this
<tr>
<td>DD 73 66 73
<td>UTF-EBCDIC
-->
</table><p class=note>This step looks for Unicode Byte Order Marks
(BOMs).</p>

<p>The file must then be converted to Unicode using the
character encoding given by <var><a href="#the-script-block's-character-encoding">the script block's character
encoding</a></var>.</p>
</li>

</dd>
<li><p>Let <var title="">character encoding</var> be <var><a href="#the-script-block's-fallback-character-encoding">the
script block's fallback character encoding</a></var>.</li>

<li><p>Convert the file to Unicode using <var>character
encoding</var>, following the rules for doing so given by the
specification for <var><a href="#the-script-block's-type">the script block's
type</a></var>.</li>

</ol></dd>

<dt>If the script is from an external file and <var><a href="#the-script-block's-type">the script block's type</a></var> is an XML-based language</dt>

