HTML Standard Tracker

SVN: 4126
Bug: (none)
Comment: [Gecko] [Internet Explorer] [Opera] [Webkit] List the default encodings by locale.
Time (UTC): 2009-10-13 10:45
@@ -78195,30 +78195,172 @@ interface <dfn>MessagePort</dfn> {
     <p class="note">The UTF-8 encoding has a highly detectable bit
     pattern. Documents that contain bytes with values greater than
     0x7F which match the UTF-8 pattern are very likely to be UTF-8,
     while documents with byte sequences that do not match it are very
     likely not. User-agents are therefore encouraged to search for
     this common encoding. <a href="#refsPPUTF8">[PPUTF8]</a> <a
     href="#refsUTF8DET">[UTF8DET]</a></p>
 
    </li>
 
-   <li><p>Otherwise, return an implementation-defined or
-   user-specified default character encoding, with the <span
-   title="concept-encoding-confidence">confidence</span>
-   <i>tentative</i>. In controlled environments or in environments
-   where the encoding of documents can be prescribed (for example, for
-   user agents intended for dedicated use in new networks), the more
-   comprehensive <code title="">UTF-8</code> encoding is
-   suggested. Due to its use in legacy content, <code
-   title="">windows-1252</code> is suggested as a default in
-   predominantly Western locales instead.</p></li>
+   <li>
+
+    <p>Otherwise, return an implementation-defined or user-specified
+    default character encoding, with the <span
+    title="concept-encoding-confidence">confidence</span>
+    <i>tentative</i>.</p>
+
+    <p>In controlled environments or in environments where the
+    encoding of documents can be prescribed (for example, for user
+    agents intended for dedicated use in new networks), the
+    comprehensive <code title="">UTF-8</code> encoding is
+    suggested.</p>
+
+    <p>In other environments, the default encoding is typically
+    dependent on the user's locale (an approximation of the languages,
+    and thus typically encodings, of the pages that the user is likely
+    to frequent). The following table gives suggested defaults based
+    on the user's locale, for compatibility with legacy content:</p>
+
+    <!-- based on mozilla 1.9.1 localizations: 
+         http://mxr.mozilla.org/l10n-mozilla1.9.1/find?string=global%2Fintl.properties&tree=l10n-mozilla1.9.1&hint= -->
+
+    <table>
+     <thead>
+      <tr>
+       <th>Locale
+       <th>Suggested default encoding
+     <tbody>
+
+      <tr>
+       <td>ar
+       <td>UTF-8
+
+      <tr>
+       <td>be
+       <td>ISO-8859-5
+
+      <tr>
+       <td>bg
+       <td>windows-1251
+
+      <tr>
+       <td>cs<!-- -CZ -->
+       <td>ISO-8859-2
+
+      <tr>
+       <td>cy
+       <td>UTF-8
+
+      <tr>
+       <td>fa<!-- -IR -->
+       <td>UTF-8
+
+      <tr>
+       <td>he<!-- -IL -->
+       <td>windows-1255
+
+      <tr>
+       <td>hr
+       <td>UTF-8
+
+      <tr>
+       <td>hu<!-- -HU -->
+       <td>ISO-8859-2
+
+      <tr>
+       <td>ja <!-- and ja-JP-mac -->
+       <td>windows-31J <!-- Shift_JIS -->
+
+      <tr>
+       <td>kk
+       <td>UTF-8
+
+      <tr>
+       <td>ko<!-- -KR -->
+       <td>windows-949 <!-- EUC-KR -->
+
+      <tr>
+       <td>ku
+       <td>windows-1254 <!-- ISO-8859-9 -->
+
+      <tr>
+       <td>lt
+       <td>windows-1257
+
+      <tr>
+       <td>lv<!-- -LV -->
+       <td>ISO-8859-13
+
+      <tr>
+       <td>mk<!-- -MK -->
+       <td>UTF-8
+
+      <tr>
+       <td>or
+       <td>UTF-8
+
+      <tr>
+       <td>pl<!-- -PL -->
+       <td>ISO-8859-2
+
+      <tr>
+       <td>ro
+       <td>UTF-8
+
+      <tr>
+       <td>ru
+       <td>windows-1251
+
+      <tr>
+       <td>sk
+       <td>windows-1250
+
+      <tr>
+       <td>sl
+       <td>ISO-8859-2
+
+      <tr>
+       <td>sr
+       <td>UTF-8
+
+      <tr>
+       <td>th
+       <td>windows-874 <!-- TIS-620 -->
+
+      <tr>
+       <td>tr<!-- -TR -->
+       <td>windows-1254 <!-- ISO-8859-9 -->
+
+      <tr>
+       <td>uk
+       <td>windows-1251
+
+      <tr>
+       <td>vi
+       <td>UTF-8
+
+      <tr>
+       <td>zh-CN
+       <td>GB18030
+
+      <tr>
+       <td>zh-TW
+       <td>Big5
+
+      <tr>
+       <td>All other locales
+       <td>windows-1252
+
+    </table>
+
+   </li>
 
   </ol>
 
   <p>The <span>document's character encoding</span> must immediately
   be set to the value returned from this algorithm, at the same time
   as the user agent uses the returned value to select the decoder to
   use for the input stream.</p>
 
 
   <h5>Character encodings</h5>
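
The table added in this revision amounts to a lookup from the user's locale to a suggested default encoding, with windows-1252 for every locale not listed. A minimal sketch of that lookup follows. The function name, the constant names, and the matching rule (try the full tag first, then the primary language subtag, case-insensitively, treating `_` as `-`) are assumptions made for illustration; the diff itself does not say how a locale string should be matched against the table.

```python
# Illustrative sketch only: names and matching rule are hypothetical,
# not taken from the spec text in the diff.

# Suggested defaults copied from the table in the diff (keys lower-cased).
SUGGESTED_DEFAULTS = {
    "ar": "UTF-8",
    "be": "ISO-8859-5",
    "bg": "windows-1251",
    "cs": "ISO-8859-2",
    "cy": "UTF-8",
    "fa": "UTF-8",
    "he": "windows-1255",
    "hr": "UTF-8",
    "hu": "ISO-8859-2",
    "ja": "windows-31J",
    "kk": "UTF-8",
    "ko": "windows-949",
    "ku": "windows-1254",
    "lt": "windows-1257",
    "lv": "ISO-8859-13",
    "mk": "UTF-8",
    "or": "UTF-8",
    "pl": "ISO-8859-2",
    "ro": "UTF-8",
    "ru": "windows-1251",
    "sk": "windows-1250",
    "sl": "ISO-8859-2",
    "sr": "UTF-8",
    "th": "windows-874",
    "tr": "windows-1254",
    "uk": "windows-1251",
    "vi": "UTF-8",
    "zh-cn": "GB18030",
    "zh-tw": "Big5",
}

# The "All other locales" row of the table.
FALLBACK_ENCODING = "windows-1252"


def suggested_default_encoding(locale: str) -> str:
    """Return the suggested default encoding for a locale tag.

    Assumed matching rule: exact tag first, then the primary language
    subtag, then the fallback.
    """
    tag = locale.strip().replace("_", "-").lower()
    if tag in SUGGESTED_DEFAULTS:
        return SUGGESTED_DEFAULTS[tag]
    primary = tag.split("-", 1)[0]
    return SUGGESTED_DEFAULTS.get(primary, FALLBACK_ENCODING)


if __name__ == "__main__":
    for example in ("zh-CN", "ja-JP-mac", "tr_TR", "en-US"):
        print(example, suggested_default_encoding(example))
```

Under this matching rule a bare `zh`, or a tag such as `zh-Hant-TW`, falls through to windows-1252, because the table lists only `zh-CN` and `zh-TW`. A real user agent would probably want a more careful rule for those cases. Per the spec text, whatever value is returned carries the confidence *tentative*.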
