HTML Standard Tracker

SVN: 2802
Bug: (none listed)
Comment: Support BOMs in <script src=''> JS files. (credit: mp)
Time (UTC): 2009-02-12 10:46
@@ -4742,20 +4742,21 @@
    to be available.</p></li>
 
    <li><p>Let <var title="">n</var> be the smaller of either 512 or
    the number of bytes already available.</p></li>
 
    <li>
 
     <p>If <var title="">n</var> is 4 or more, and the first bytes of
     the resource match one of the following byte sets:</p>
 
+    <!-- this table is present in several forms in this file; keep them in sync -->
     <table>
      <thead>
       <tr>
        <th>Bytes in Hexadecimal
        <th>Description
      <tbody>
       <tr>
        <td>FE FF
        <td>UTF-16BE BOM <!-- followed by a character --><!-- nobody uses this: or UTF-32LE BOM -->
       <tr>
@@ -10824,22 +10825,64 @@ people expect to have work and what is necessary.
 
       <dl class="switch">
 
        <dt>If the script is from an external file</dt>
 
        <dd>
 
         <p>The contents of that file, interpreted as string of
         Unicode characters, are the script source.</p>
 
-        <p>The file must be converted to Unicode using the character
-        encoding given by <var>the script block's character
+        <p>For each of the rows in the following table, starting with
+        the first one and going down, if the file has as many or more
+        bytes available than the number of bytes in the first column,
+        and the first bytes of the file match the bytes given in the
+        first column, then set <var>the script block's character
+        encoding</var> to the encoding given in the cell in the second
+        column of that row, irrespective of any previous value:</p>
+
+        <!-- this table is present in several forms in this file; keep them in sync -->
+        <table>
+         <thead>
+          <tr>
+           <th>Bytes in Hexadecimal
+           <th>Encoding
+         <tbody>
+<!-- nobody uses this
+          <tr>
+           <td>00 00 FE FF
+           <td>UTF-32BE
+          <tr>
+           <td>FF FE 00 00
+           <td>UTF-32LE
+-->
+          <tr>
+           <td>FE FF
+           <td>UTF-16BE
+          <tr>
+           <td>FF FE
+           <td>UTF-16LE
+          <tr>
+           <td>EF BB BF
+           <td>UTF-8
+<!-- nobody uses this
+          <tr>
+           <td>DD 73 66 73
+           <td>UTF-EBCDIC
+-->
+        </table>
+
+        <p class="note">This step looks for Unicode Byte Order Marks
+        (BOMs).</p>
+
+        <p>The file must then be converted to Unicode using the
+        character encoding given by <var>the script block's character
         encoding</var>.</p>
 
        </dd>
 
        <dt>If the script is inline and <var>the script block's type</var> is a text-based language</dt>
 
        <dd>
 
         <p>The value of the DOM <code
         title="dom-script-text">text</code> attribute at the time the
@@ -54784,20 +54827,21 @@ interface <dfn>MessageChannel</dfn> {
 
    <li><p>For each of the rows in the following table, starting with
    the first one and going down, if there are as many or more bytes
    available than the number of bytes in the first column, and the
    first bytes of the file match the bytes given in the first column,
    then return the encoding given in the cell in the second column of
    that row, with the <span
    title="concept-encoding-confidence">confidence</span>
    <i>certain</i>, and abort these steps:</p>
 
+    <!-- this table is present in several forms in this file; keep them in sync -->
     <table>
      <thead>
       <tr>
        <th>Bytes in Hexadecimal
        <th>Encoding
      <tbody>
 <!-- nobody uses this
       <tr>
        <td>00 00 FE FF
        <td>UTF-32BE

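Reviewer sketch (not part of the commit): the BOM check that this change adds for external scripts can be illustrated roughly as below. This is a minimal sketch under stated assumptions, not the specification's reference implementation. The names `BOM_TABLE`, `sniff_script_encoding`, and `decode_script` are hypothetical, the encoding labels are Python codec names rather than the labels used in the spec, and the commented-out UTF-32 and UTF-EBCDIC rows are left out, as in the diff.

```python
# Rows of the table added by this change, in document order:
# (leading bytes, Python codec name). The codec names are an assumption
# made for this sketch; the spec text only names the encodings.
BOM_TABLE = [
    (b"\xFE\xFF", "utf-16-be"),
    (b"\xFF\xFE", "utf-16-le"),
    (b"\xEF\xBB\xBF", "utf-8"),
]


def sniff_script_encoding(data: bytes, declared: str) -> str:
    """Return the encoding to use for an external script file.

    `declared` stands in for "the script block's character encoding"
    as it was before the BOM check runs.
    """
    encoding = declared
    # Walk the rows from the first one down, as the added paragraph says.
    for bom, name in BOM_TABLE:
        # The file needs at least as many bytes as the row lists,
        # and its first bytes must match them.
        if len(data) >= len(bom) and data.startswith(bom):
            # A match overrides any previous value.
            encoding = name
    return encoding


def decode_script(data: bytes, declared: str) -> str:
    """Convert the file to Unicode using the (possibly overridden) encoding."""
    return data.decode(sniff_script_encoding(data, declared))
```

For example, `sniff_script_encoding(b"\xEF\xBB\xBFvar x;", "windows-1252")` yields `"utf-8"`, while a file with no BOM keeps the declared encoding. The sketch leaves the decoded U+FEFF character at the start of the string; whether it should be stripped is not stated in the text shown in this diff.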