HTML Standard Tracker


Short URL: http://html5.org/r/2888

SVN revision: 2888
Bug: none listed
Comment: Define how to determine the character encoding of worker scripts.
Time (UTC): 2009-03-20 22:28
Index: source
===================================================================
--- source	(revision 2887)
+++ source	(revision 2888)
@@ -58411,6 +58411,95 @@
   represents.</p>
 
 
+  <h4>Decoding scripts</h4>
+
+  <p>When a user agent is to <dfn>decode a script resource</dfn> to
+  obtain its source in Unicode, it must run the following steps:</p>
+
+  <ol>
+
+   <li>
+
+    <p>Let <var title="">character encoding</var> be <i
+    title="">unknown</i>.</p>
+
+   </li>
+
+   <li>
+
+    <p>For each of the rows in the following table, starting with the
+    first one and going down, if the resource has at least as many
+    bytes available as the number of bytes in the first column, and
+    the first bytes of the resource match the bytes given in the first
+    column, then let <var title="">character encoding</var> be the
+    encoding given in the cell in the second column of that row:</p>
+
+    <!-- this table is present in several forms in this file; keep them in sync -->
+    <table>
+     <thead>
+      <tr>
+       <th>Bytes in Hexadecimal
+       <th>Encoding
+     <tbody>
+<!-- nobody uses this
+      <tr>
+       <td>00 00 FE FF
+       <td>UTF-32BE
+      <tr>
+       <td>FF FE 00 00
+       <td>UTF-32LE
+-->
+      <tr>
+       <td>FE FF
+       <td>UTF-16BE
+      <tr>
+       <td>FF FE
+       <td>UTF-16LE
+      <tr>
+       <td>EF BB BF
+       <td>UTF-8
+<!-- nobody uses this
+      <tr>
+       <td>DD 73 66 73
+       <td>UTF-EBCDIC
+-->
+    </table>
+
+    <p class="note">This step looks for Unicode Byte Order Marks
+    (BOMs).</p>
+
+   </li>
+
+   <li>
+
+    <p>If <var title="">character encoding</var> is still <i
+    title="">unknown</i>, apply the <span>algorithm for extracting an
+    encoding from a Content-Type</span> to the resource's <span
+    title="Content-Type">Content Type metadata</span>; if this returns
+    an encoding, and the user agent supports that encoding, then let
+    <var title="">character encoding</var> be that encoding.</p>
+
+   </li>
+
+   <li>
+
+    <p>If <var title="">character encoding</var> is still <i
+    title="">unknown</i>, then let <var title="">character
+    encoding</var> be UTF-8.</p>
+
+   </li>
+
+   <li>
+
+    <p>Convert the resource to Unicode using the character encoding
+    given by <var title="">character encoding</var>.</p>
+
+    <p>Return the text that is so obtained.</p>
+
+   </li>
+
+  </ol>
+
   <h4>The event loop</h4>
 
   <p>Each <code>WorkerGlobalScope</code> object is associated with a
@@ -58570,9 +58659,9 @@
     title="event-error">error</code> at that object. Abort these
     steps.</p>
 
-    <p>If the attempt succeeds, then let <var title="">source</var> be
-    the text of the resource that was obtained.</p><!-- XXX do we need
-    to define character encoding decoding here? -->
+    <p>If the attempt succeeds, then <span title="decode a script
+    resource">decode the script resource</span> to obtain its <var
+    title="">source</var>.</p>
 
     <p>Let <var title="">language</var> be JavaScript.</p>
 
@@ -59266,11 +59355,12 @@
       <code>NETWORK_ERR</code> exception and abort all these
       steps.</p>
 
-      <p>If the fetching attempt succeeded, then let <var
-      title="">source</var> be the text of the resource that was
-      obtained, and let <var title="">language</var> be
-      JavaScript.</p>
+      <p>If the attempt succeeds, then <span title="decode a script
+      resource">decode the script resource</span> to obtain its <var
+      title="">source</var>.</p>
 
+      <p>Let <var title="">language</var> be JavaScript.</p>
+
       <p class="note">As with the worker's script, the script here is
       always assumed to be JavaScript, regardless of the MIME
       type.</p> <!-- XXX -->
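
The "decode a script resource" steps added in this revision can be sketched in Python as follows. This is an illustrative reading of the steps, not an implementation taken from any user agent. The function names (`determine_encoding`, `decode_script_resource`, `charset_from_content_type`, `is_supported`) are hypothetical, and the Content-Type parsing is a simplified stand-in for the spec's "algorithm for extracting an encoding from a Content-Type", which is defined elsewhere. The sketch also assumes that "the user agent supports that encoding" can be approximated by a codec lookup, and it leaves any BOM in the decoded text because the steps above do not say to remove it.

```python
import codecs
from email.message import Message

# BOM rows from the table in the diff, in the same order.
# The commented-out UTF-32 and UTF-EBCDIC rows are omitted, as in the source.
BOM_TABLE = [
    (b"\xFE\xFF", "utf-16-be"),
    (b"\xFF\xFE", "utf-16-le"),
    (b"\xEF\xBB\xBF", "utf-8"),
]


def charset_from_content_type(content_type):
    """Simplified stand-in (assumption) for the spec's Content-Type algorithm."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lowercased charset label, or None


def is_supported(label):
    """Assumption: 'supported' means Python has a codec for the label."""
    try:
        codecs.lookup(label)
        return True
    except LookupError:
        return False


def determine_encoding(data, content_type=None):
    encoding = None  # step 1: character encoding is "unknown"

    # Step 2: look for a BOM. The rows are mutually exclusive,
    # so at most one of them can match.
    for bom, name in BOM_TABLE:
        if len(data) >= len(bom) and data.startswith(bom):
            encoding = name

    # Step 3: fall back to the Content-Type metadata if still unknown.
    if encoding is None:
        label = charset_from_content_type(content_type)
        if label and is_supported(label):
            encoding = label

    # Step 4: default to UTF-8.
    if encoding is None:
        encoding = "utf-8"
    return encoding


def decode_script_resource(data, content_type=None):
    # Step 5: convert the bytes to Unicode with the chosen encoding.
    # errors="replace" is a choice made for this sketch only.
    return data.decode(determine_encoding(data, content_type), errors="replace")
```

Note that a BOM takes priority over the Content-Type metadata: a resource that starts with `EF BB BF` is treated as UTF-8 even if the header declares another charset.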
