Commit

[e] (0) Define 'code unit'.
Fixing http://www.w3.org/Bugs/Public/show_bug.cgi?id=13676

git-svn-id: http://svn.whatwg.org/webapps@6649 340c8d12-0b0e-0410-8428-c7bf67bfef74
Hixie committed Oct 6, 2011
1 parent ee9e809 commit 4351227
Showing 3 changed files with 25 additions and 10 deletions.
13 changes: 9 additions & 4 deletions complete.html
@@ -3362,24 +3362,29 @@ <h4 id=character-encodings><span class=secno>2.1.6 </span>Character encodings</h
UTF-16: self-describing UTF-16 with a BOM, ambiguous UTF-16 without
a BOM, raw UTF-16LE, and raw UTF-16BE. <a href=#refsRFC2781>[RFC2781]</a></p>

+<p>The term <dfn id=code-unit>code unit</dfn> is used as defined in the Web IDL
+specification: a 16 bit unsigned integer, the smallest atomic
+component of a <code>DOMString</code>. (This is a narrower
+definition than the one used in Unicode.) <a href=#refsWEBIDL>[WEBIDL]</a></p>
+
<p>The term <dfn id=unicode-character>Unicode character</dfn> is used to mean a <i title="">Unicode scalar value</i> (i.e. any Unicode code point that
is not a surrogate code point). <a href=#refsUNICODE>[UNICODE]</a></p>

<p>The term <dfn id=character>character</dfn>, when not qualified as
<em>Unicode</em> character, means a <a href=#unicode-character>Unicode character</a>
where possible, or a surrogate code point when not: when an
algorithm that processes strings is defined in terms of characters,
-a pair of <span title="code unit">code units</span> consisting of a
+a pair of <a href=#code-unit title="code unit">code units</a> consisting of a
high surrogate followed by a low surrogate must be treated as a
single character, but isolated surrogates must each be treated as a
single character also.</p>

<p>The <dfn id=code-point-length>code-point length</dfn> of a string is the number of
-<span title="code unit">code units</span> in that string. <a href=#refsWEBIDL>[WEBIDL]</a></p>
+<a href=#code-unit title="code unit">code units</a> in that string.</p>

<p class=note>This complexity results from the historical decision
-to define the DOM API in terms of 16 bit (UTF-16) <span title="code
-unit">code units</span>, rather than in terms of <a href=#unicode-character title="Unicode character">Unicode characters</a>.</p>
+to define the DOM API in terms of 16 bit (UTF-16) <a href=#code-unit title="code
+unit">code units</a>, rather than in terms of <a href=#unicode-character title="Unicode character">Unicode characters</a>.</p>
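
Reviewer's illustration (not part of the commit): a minimal TypeScript sketch of the distinction the hunk above draws between counting code units and counting characters. The function names `codeUnitLength` and `characterCount` are hypothetical and do not appear in the specification; the sketch assumes only that a TypeScript `string` is a sequence of 16 bit code units, like a `DOMString`.

```typescript
// Hypothetical helper: the number of 16 bit code units in a string,
// which is what the spec text calls the "code-point length".
function codeUnitLength(s: string): number {
  return s.length;
}

// Hypothetical helper: counts "characters" as the spec text describes them.
// A high surrogate followed by a low surrogate counts as one character;
// an isolated surrogate also counts as one character.
function characterCount(s: string): number {
  let count = 0;
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    const isHighSurrogate = unit >= 0xd800 && unit <= 0xdbff;
    if (isHighSurrogate && i + 1 < s.length) {
      const next = s.charCodeAt(i + 1);
      const isLowSurrogate = next >= 0xdc00 && next <= 0xdfff;
      if (isLowSurrogate) {
        i++; // consume the low surrogate as part of the same character
      }
    }
    count++;
  }
  return count;
}
```

For a string such as `"a\uD83D\uDE00b"`, the two helpers disagree: the surrogate pair contributes two code units but only one character. A low surrogate followed by a high surrogate does not form a pair, so each is counted separately.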



13 changes: 9 additions & 4 deletions index
@@ -3362,24 +3362,29 @@ a.setAttribute('href', 'http://example.com/'); // change the content attribute d
UTF-16: self-describing UTF-16 with a BOM, ambiguous UTF-16 without
a BOM, raw UTF-16LE, and raw UTF-16BE. <a href=#refsRFC2781>[RFC2781]</a></p>

+<p>The term <dfn id=code-unit>code unit</dfn> is used as defined in the Web IDL
+specification: a 16 bit unsigned integer, the smallest atomic
+component of a <code>DOMString</code>. (This is a narrower
+definition than the one used in Unicode.) <a href=#refsWEBIDL>[WEBIDL]</a></p>
+
<p>The term <dfn id=unicode-character>Unicode character</dfn> is used to mean a <i title="">Unicode scalar value</i> (i.e. any Unicode code point that
is not a surrogate code point). <a href=#refsUNICODE>[UNICODE]</a></p>

<p>The term <dfn id=character>character</dfn>, when not qualified as
<em>Unicode</em> character, means a <a href=#unicode-character>Unicode character</a>
where possible, or a surrogate code point when not: when an
algorithm that processes strings is defined in terms of characters,
-a pair of <span title="code unit">code units</span> consisting of a
+a pair of <a href=#code-unit title="code unit">code units</a> consisting of a
high surrogate followed by a low surrogate must be treated as a
single character, but isolated surrogates must each be treated as a
single character also.</p>

<p>The <dfn id=code-point-length>code-point length</dfn> of a string is the number of
-<span title="code unit">code units</span> in that string. <a href=#refsWEBIDL>[WEBIDL]</a></p>
+<a href=#code-unit title="code unit">code units</a> in that string.</p>

<p class=note>This complexity results from the historical decision
-to define the DOM API in terms of 16 bit (UTF-16) <span title="code
-unit">code units</span>, rather than in terms of <a href=#unicode-character title="Unicode character">Unicode characters</a>.</p>
+to define the DOM API in terms of 16 bit (UTF-16) <a href=#code-unit title="code
+unit">code units</a>, rather than in terms of <a href=#unicode-character title="Unicode character">Unicode characters</a>.</p>



9 changes: 7 additions & 2 deletions source
@@ -2237,6 +2237,12 @@ a.setAttribute('href', 'http://example.com/'); // change the content attribute d
a BOM, raw UTF-16LE, and raw UTF-16BE. <a
href="#refsRFC2781">[RFC2781]</a></p>

+<p>The term <dfn>code unit</dfn> is used as defined in the Web IDL
+specification: a 16 bit unsigned integer, the smallest atomic
+component of a <code>DOMString</code>. (This is a narrower
+definition than the one used in Unicode.) <a
+href="#refsWEBIDL">[WEBIDL]</a></p>
+
<p>The term <dfn>Unicode character</dfn> is used to mean a <i
title="">Unicode scalar value</i> (i.e. any Unicode code point that
is not a surrogate code point). <a
@@ -2252,8 +2258,7 @@ a.setAttribute('href', 'http://example.com/'); // change the content attribute d
single character also.</p>

<p>The <dfn>code-point length</dfn> of a string is the number of
-<span title="code unit">code units</span> in that string. <a
-href="#refsWEBIDL">[WEBIDL]</a></p>
+<span title="code unit">code units</span> in that string.</p>

<p class="note">This complexity results from the historical decision
to define the DOM API in terms of 16 bit (UTF-16) <span title="code
