[e] (0) Mention and encourage UTF-8 detection specifically.
git-svn-id: http://svn.whatwg.org/webapps@3882 340c8d12-0b0e-0410-8428-c7bf67bfef74
Hixie committed Sep 17, 2009
1 parent be5f566 commit 935df4e
Showing 3 changed files with 36 additions and 14 deletions.
1 change: 0 additions & 1 deletion entities-unicode.inc
@@ -1661,7 +1661,6 @@
<tr> <td> <code title="">rtriltri;</code> </td> <td> U+029CE </td> </tr>
<tr> <td> <code title="">LeftTriangleBar;</code> </td> <td> U+029CF </td> </tr>
<tr> <td> <code title="">RightTriangleBar;</code> </td> <td> U+029D0 </td> </tr>
<tr> <td> <code title="">race;</code> </td> <td> U+029DA </td> </tr>
<tr> <td> <code title="">iinfin;</code> </td> <td> U+029DC </td> </tr>
<tr> <td> <code title="">infintie;</code> </td> <td> U+029DD </td> </tr>
<tr> <td> <code title="">nvinfin;</code> </td> <td> U+029DE </td> </tr>
23 changes: 17 additions & 6 deletions index
@@ -7348,6 +7348,7 @@ interface <dfn id=htmldocument>HTMLDocument</dfn> {
purpose. Authors must not use elements, attributes, and attribute
values that are not permitted by this specification or other
applicable specifications.</p>
+<!-- http://www.w3.org/mid/17E341CD-E790-422C-9F9A-69347EE01CEB@iki.fi -->

<div class=example>
<p>For example, the following document is non-conforming, despite
@@ -62031,11 +62032,22 @@ interface <dfn id=messageport>MessagePort</dfn> {
visited, then return that encoding, with the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a>
<i>tentative</i>, and abort these steps.</li>

-<li><p>The user agent may attempt to autodetect the character
-encoding from applying frequency analysis or other algorithms to
-the data stream. If autodetection succeeds in determining a
-character encoding, then return that encoding, with the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a>
-<i>tentative</i>, and abort these steps. <a href=#refsUNIVCHARDET>[UNIVCHARDET]</a></li>
+<li>
+
+<p>The user agent may attempt to autodetect the character encoding
+from applying frequency analysis or other algorithms to the data
+stream. If autodetection succeeds in determining a character
+encoding, then return that encoding, with the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a>
+<i>tentative</i>, and abort these steps. <a href=#refsUNIVCHARDET>[UNIVCHARDET]</a></p>
+
+<p class=note>The UTF-8 encoding has a highly detectable bit
+pattern. Documents that contain bytes with values greater than
+0x7F which match the UTF-8 pattern are very likely to be UTF-8,
+while documents with byte sequences that do not match it are very
+likely not. User-agents are therefore encouraged to search for
+this common encoding.</p>
+
+</li>

<li><p>Otherwise, return an implementation-defined or
user-specified default character encoding, with the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a>
@@ -70270,7 +70282,6 @@ http://lxr.mozilla.org/seamonkey/search?string=nested
<tr> <td> <code title="">rAtail;</code> </td> <td> U+0291C </td> </tr>
<tr> <td> <code title="">rBarr;</code> </td> <td> U+0290F </td> </tr>
<tr> <td> <code title="">rHar;</code> </td> <td> U+02964 </td> </tr>
<tr> <td> <code title="">race;</code> </td> <td> U+029DA </td> </tr>
<tr> <td> <code title="">racute;</code> </td> <td> U+00155 </td> </tr>
<tr> <td> <code title="">radic;</code> </td> <td> U+0221A </td> </tr>
<tr> <td> <code title="">raemptyv;</code> </td> <td> U+029B3 </td> </tr>
26 changes: 19 additions & 7 deletions source
@@ -7397,6 +7397,7 @@ interface <dfn>HTMLDocument</dfn> {
purpose. Authors must not use elements, attributes, and attribute
values that are not permitted by this specification or other
applicable specifications.</p>
+<!-- http://www.w3.org/mid/17E341CD-E790-422C-9F9A-69347EE01CEB@iki.fi -->

<div class="example">
<p>For example, the following document is non-conforming, despite
@@ -76650,13 +76651,24 @@ interface <dfn>MessagePort</dfn> {
title="concept-encoding-confidence">confidence</span>
<i>tentative</i>, and abort these steps.</p></li>

-<li><p>The user agent may attempt to autodetect the character
-encoding from applying frequency analysis or other algorithms to
-the data stream. If autodetection succeeds in determining a
-character encoding, then return that encoding, with the <span
-title="concept-encoding-confidence">confidence</span>
-<i>tentative</i>, and abort these steps. <a
-href="#refsUNIVCHARDET">[UNIVCHARDET]</a></p></li>
+<li>
+
+<p>The user agent may attempt to autodetect the character encoding
+from applying frequency analysis or other algorithms to the data
+stream. If autodetection succeeds in determining a character
+encoding, then return that encoding, with the <span
+title="concept-encoding-confidence">confidence</span>
+<i>tentative</i>, and abort these steps. <a
+href="#refsUNIVCHARDET">[UNIVCHARDET]</a></p>
+
+<p class="note">The UTF-8 encoding has a highly detectable bit
+pattern. Documents that contain bytes with values greater than
+0x7F which match the UTF-8 pattern are very likely to be UTF-8,
+while documents with byte sequences that do not match it are very
+likely not. User-agents are therefore encouraged to search for
+this common encoding.</p>
+
+</li>

<li><p>Otherwise, return an implementation-defined or
user-specified default character encoding, with the <span
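
The note added by this commit describes a heuristic: a byte stream that contains bytes above 0x7F and is well-formed UTF-8 throughout is very likely UTF-8. The following is a minimal sketch of that check, not part of the commit or the specification. It assumes Python's standard `codecs` module; the function name `looks_like_utf8` and its `complete` parameter are hypothetical and chosen only for illustration.

```python
import codecs


def looks_like_utf8(data: bytes, complete: bool = True) -> bool:
    """Heuristic: True if data has non-ASCII bytes and all of it is valid UTF-8.

    complete=False tolerates a multi-byte sequence cut off at the end of the
    buffer, which can happen when only a prefix of the stream was read.
    """
    # Pure ASCII gives no evidence for or against UTF-8, so report False.
    if not any(byte > 0x7F for byte in data):
        return False

    # A strict incremental decoder raises UnicodeDecodeError on any byte
    # sequence that does not match the UTF-8 bit pattern.
    decoder = codecs.getincrementaldecoder("utf-8")(errors="strict")
    try:
        decoder.decode(data, final=complete)
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    print(looks_like_utf8("café".encode("utf-8")))    # valid UTF-8 with non-ASCII bytes
    print(looks_like_utf8("café".encode("latin-1")))  # 0xE9 alone is not valid UTF-8
    print(looks_like_utf8(b"plain ascii"))            # no bytes above 0x7F
```

A user agent following the spec text would return the detected encoding with the confidence tentative. The sketch only answers the yes/no question and leaves the fallback to other detection steps.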
