HTML Standard Tracker

SVN   Bug   Comment                                                                 Time (UTC)
2094        Turns out that Zs isn't what we want; we want White_Space. (credit: w)  2008-08-21 09:46
@@ -1037,20 +1037,24 @@
   such.</p>
 
 
   <h4>Common parser idioms</h4>
 
   <p>The <dfn title="space character">space characters</dfn>, for the
   purposes of this specification, are U+0020 SPACE, U+0009 CHARACTER
   TABULATION (tab), U+000A LINE FEED (LF), U+000C FORM FEED (FF), and
   U+000D CARRIAGE RETURN (CR).</p>
 
+  <p>The <dfn title="White_Space">White_Space characters</dfn> are
+  those that have the Unicode property "White_Space". <a
+  href="#refsUNICODE">[UNICODE]</a></p>
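To make the difference concrete, here is a short Python sketch that compares the three sets involved in this change: the spec's space characters, the Unicode general category Zs, and the Unicode White_Space property. Zs membership comes from the standard `unicodedata` module. The White_Space list is transcribed by hand from Unicode's PropList.txt for recent Unicode versions. That is an assumption, because the Unicode data current in 2008 may differ slightly. The names `SPACE_CHARACTERS`, `WHITE_SPACE`, `is_zs` and `describe` are illustrative only.

```python
import unicodedata

# The five "space characters" as defined by this specification.
SPACE_CHARACTERS = frozenset("\u0020\u0009\u000A\u000C\u000D")

# Code points with the Unicode White_Space property, transcribed from
# PropList.txt for recent Unicode versions (assumption: 2008 data may differ).
WHITE_SPACE = frozenset(
    [chr(cp) for cp in range(0x0009, 0x000E)]      # U+0009..U+000D (TAB..CR)
    + ["\u0020", "\u0085", "\u00A0", "\u1680"]
    + [chr(cp) for cp in range(0x2000, 0x200B)]    # U+2000..U+200A
    + ["\u2028", "\u2029", "\u202F", "\u205F", "\u3000"]
)


def is_zs(ch: str) -> bool:
    """True if ch has the Unicode general category Zs (space separator)."""
    return unicodedata.category(ch) == "Zs"


def describe(ch: str) -> str:
    """One-line summary of which of the three sets contain ch."""
    return (f"U+{ord(ch):04X}: space character={ch in SPACE_CHARACTERS}, "
            f"Zs={is_zs(ch)}, White_Space={ch in WHITE_SPACE}")


if __name__ == "__main__":
    for ch in ["\u0020", "\u0009", "\u000A", "\u00A0", "\u3000"]:
        print(describe(ch))
```

In the output, U+0009 and U+000A are space characters and White_Space but not Zs. U+00A0 and U+3000 are Zs and White_Space but not space characters. Switching from Zs to White_Space therefore also lets tabs and line breaks be skipped.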
+
   <p>Some of the micro-parsers described below follow the pattern of
   having an <var title="">input</var> variable that holds the string
   being parsed, and having a <var title="">position</var> variable
   pointing at the next character to parse in <var
   title="">input</var>.</p>
 
   <p>For parsers based on this pattern, a step that requires the user
   agent to <dfn>collect a sequence of characters</dfn> means that the
   following algorithm must be run, with <var title="">characters</var>
   being the set of characters that can be collected:</p>
@@ -1070,24 +1074,24 @@
    title="">result</var> and advance <var title="">position</var> to
    the next character in <var title="">input</var>.</p></li>
 
    <li><p>Return <var title="">result</var>.</p></li>
 
   </ol>
 
   <p>The step <dfn>skip whitespace</dfn> means that the user agent
   must <span>collect a sequence of characters</span> that are <span
   title="space character">space characters</span>. The step <dfn>skip
-  Zs characters</dfn> means that the user agent must <span>collect a
-  sequence of characters</span> that are in the Unicode character
-  class Zs. In both cases, the collected characters are not used. <a
-  href="#refsUNICODE">[UNICODE]</a></p>
+  White_Space characters</dfn> means that the user agent must
+  <span>collect a sequence of characters</span> that are
+  <span>White_Space</span> characters. In both cases, the collected
+  characters are not used. <a href="#refsUNICODE">[UNICODE]</a></p>
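The following is a minimal Python sketch of the <var>input</var>/<var>position</var> idiom, covering "collect a sequence of characters" and the two skip steps. `text` plays the role of <var>input</var>. Rather than mutating a shared <var>position</var> variable, each function returns the updated position, which is a design choice of this sketch and not something the spec requires. All function names are hypothetical.

```python
from typing import Callable, Tuple

# The spec's five "space characters".
SPACE_CHARACTERS = frozenset(" \t\n\x0c\r")

# Unicode White_Space code points, transcribed from PropList.txt
# (assumption: recent Unicode data).
WHITE_SPACE = frozenset(
    [chr(cp) for cp in range(0x0009, 0x000E)]
    + ["\u0020", "\u0085", "\u00A0", "\u1680"]
    + [chr(cp) for cp in range(0x2000, 0x200B)]
    + ["\u2028", "\u2029", "\u202F", "\u205F", "\u3000"]
)


def collect_sequence(text: str, position: int,
                     accept: Callable[[str], bool]) -> Tuple[str, int]:
    """Sketch of "collect a sequence of characters": gather characters while
    accept() holds, advancing position; return (result, new position)."""
    start = position
    while position < len(text) and accept(text[position]):
        position += 1
    return text[start:position], position


def skip_whitespace(text: str, position: int) -> int:
    """Sketch of "skip whitespace": collect space characters, discard them."""
    _, position = collect_sequence(text, position, lambda c: c in SPACE_CHARACTERS)
    return position


def skip_white_space_characters(text: str, position: int) -> int:
    """Sketch of "skip White_Space characters": collect and discard them."""
    _, position = collect_sequence(text, position, lambda c: c in WHITE_SPACE)
    return position
```

Take the string `"\u00a0 x"` starting at position 0. `skip_whitespace` stops immediately at position 0, because NO-BREAK SPACE is not a space character. `skip_white_space_characters` advances to position 2.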
 
 
   <h4>Boolean attributes</h4>
 
   <p>A number of attributes in HTML5 are <dfn title="boolean
   attribute">boolean attributes</dfn>. The presence of a boolean
   attribute on an element represents the true value, and the absence
   of the attribute represents the false value.</p>
 
   <p>If the attribute is present, its value must either be the empty
@@ -1457,23 +1461,23 @@
    <li><span>Find a number</span> in the string according to the
    algorithm below, starting at the start of the string.</li>
 
    <li>If the sub-algorithm in step 2 returned nothing or returned an
    error condition, return nothing and abort these steps.</li>
 
    <li>Set <var title="">number1</var> to the number returned by the
    sub-algorithm in step 2.</li>
 
    <li>Starting with the character immediately after the last one
-   examined by the sub-algorithm in step 2, skip any characters in the
-   string that are in the Unicode character class Zs (this might match
-   zero characters). <a href="#refsUNICODE">[UNICODE]</a></li>
+   examined by the sub-algorithm in step 2, skip all
+   <span>White_Space</span> characters in the string (this might match
+   zero characters).</li>
 
    <li>If there are still further characters in the string, and the
    next character in the string is a <span>valid denominator
    punctuation character</span>, set <var title="">denominator</var>
    to that character.</li>
 
    <li>If the string contains any other characters in the range U+0030
    DIGIT ZERO to U+0039 DIGIT NINE, but <var title="">denominator</var> was
    given a value in step 6, return nothing and abort these
    steps.</li>
@@ -1486,23 +1490,23 @@
    immediately after the last character that was examined by the
    sub-algorithm in step 2.</li>
 
    <li>If the sub-algorithm in step 9 returned nothing or an error
    condition, return nothing and abort these steps.</li>
 
    <li>Set <var title="">number2</var> to the number returned by the
    sub-algorithm in step 9.</li>
 
    <li>Starting with the character immediately after the last one
-   examined by the sub-algorithm in step 9, skip any characters in the
-   string that are in the Unicode character class Zs (this might match
-   zero characters). <a href="#refsUNICODE">[UNICODE]</a></li>
+   examined by the sub-algorithm in step 9, skip all
+   <span>White_Space</span> characters in the string (this might match
+   zero characters).</li>
 
    <li>If there are still further characters in the string, and the
    next character in the string is a <span>valid denominator
    punctuation character</span>, return nothing and abort these
    steps.</li>
 
    <li>If the string contains any other characters in the range U+0030
    DIGIT ZERO to U+0039 DIGIT NINE, return nothing and abort these
    steps.</li>
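The steps above skip White_Space after a number and then look at the next character, once after the first number and once after the second. Both places can share one helper, sketched here in Python. The sketch assumes the caller has already run "find a number" and knows the position just after it. The set of valid denominator punctuation characters is defined elsewhere in the spec and is not part of this diff, so it is passed in as a hypothetical `denominators` parameter instead of being hard-coded.

```python
from typing import FrozenSet, Optional, Tuple

# Unicode White_Space code points, transcribed from PropList.txt
# (assumption: recent Unicode data).
WHITE_SPACE = frozenset(
    [chr(cp) for cp in range(0x0009, 0x000E)]
    + ["\u0020", "\u0085", "\u00A0", "\u1680"]
    + [chr(cp) for cp in range(0x2000, 0x200B)]
    + ["\u2028", "\u2029", "\u202F", "\u205F", "\u3000"]
)


def skip_then_denominator(text: str, position: int,
                          denominators: FrozenSet[str]) -> Tuple[int, Optional[str]]:
    """Skip White_Space characters from position (possibly none), then return
    (new position, denominator character at that position or None).

    `denominators` is a hypothetical stand-in for the spec's set of valid
    denominator punctuation characters."""
    while position < len(text) and text[position] in WHITE_SPACE:
        position += 1
    if position < len(text) and text[position] in denominators:
        return position, text[position]
    return position, None
```

For example, with `frozenset("%")` as the denominator set, both `"50 %"` and `"50\t%"` at position 2 yield `(3, "%")`. Under the old Zs rule, the tab would not have been skipped. After the first number, a found character is recorded as <var>denominator</var>. After the second number, finding one means the algorithm returns nothing.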
  
@@ -2230,21 +2234,21 @@
    title="">input</var>, initially pointing at the start of the
    string.</p></li>
 
    <li><p>Let <var title="">results</var> be the collection of results
    that are to be returned (one or more of a date, a time, and a
    timezone), initially empty. If the algorithm aborts at any point,
    then whatever is currently in <var title="">results</var> must be
    returned as the result of the algorithm.</p></li>
 
    <!-- LEADING WHITESPACE -->
-   <li><p>For the "in content" variant: <span>skip Zs
+   <li><p>For the "in content" variant: <span>skip White_Space
    characters</span>; for the "in attributes" variant: <span>skip
    whitespace</span>.</p></li><!-- XXX skip whitespace in attribute?
    really? -->
 
    <!-- YEAR or HOUR -->
    <li><p><span>Collect a sequence of characters</span> in the range
    U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9). If the collected
    sequence is empty, then the string is invalid; abort these
    steps.</p></li>
 
@@ -2324,29 +2328,29 @@
      steps.</p></li>
 
      <li><p>Add the date represented by <var title="">year</var>, <var
      title="">month</var>, and <var title="">day</var> to the <var
      title="">results</var>.</p></li>
 
      <!-- XXX we should allow the algorithm to abort here without
      error, with just a date. -->
 
      <!-- WHITESPACE -->
-     <li><p>For the "in content" variant: <span>skip Zs
+     <li><p>For the "in content" variant: <span>skip White_Space
      characters</span>; for the "in attributes" variant: <span>skip
      whitespace</span>.</p></li>
 
      <li><p>If the character at <var title="">position</var> is a U+0054
      LATIN CAPITAL LETTER T, then move <var title="">position</var>
      forwards one character.</p></li>
 
-     <li><p>For the "in content" variant: <span>skip Zs
+     <li><p>For the "in content" variant: <span>skip White_Space
      characters</span>; for the "in attributes" variant: <span>skip
      whitespace</span>.</p></li>
 
      <!-- at this point, if <var title="">position</var> points to a
      number, we know that we passed at least one space or a T, because
      otherwise the number would have been slurped up in the last
      "collect" step. -->
 
      <!-- HOUR -->
      <li><p><span>Collect a sequence of characters</span> in the range
@@ -2433,21 +2437,21 @@
    title="">minute</var>, and <var title="">second</var> to the <var
    title="">results</var>.</p></li>
 
    <!-- TIME ZONE -->
 
    <li><p>If <var title="">results</var> has both a date and a time,
    then:</p>
 
     <ol>
 
-     <li><p>For the "in content" variant: <span>skip Zs
+     <li><p>For the "in content" variant: <span>skip White_Space
      characters</span>; for the "in attributes" variant: <span>skip
      whitespace</span>.</p></li>
 
      <li><p>If <var title="">position</var> is past the end of <var
      title="">input</var>, then skip to the next step in the overall
      set of steps.</p>
 
      <!-- UTC -->
      <li><p>Otherwise, if the character at <var
      title="">position</var> is a U+005A LATIN CAPITAL LETTER Z,
@@ -2534,21 +2538,21 @@
 
      </li>
 
      <li><p>Otherwise, the string is invalid; abort these
      steps.</p></li>
 
     </ol>
 
    </li>
 
-   <li><p>For the "in content" variant: <span>skip Zs
+   <li><p>For the "in content" variant: <span>skip White_Space
    characters</span>; for the "in attributes" variant: <span>skip
    whitespace</span>.</p></li>
 
    <li><p>If <var title="">position</var> is <em>not</em> past the end
    of <var title="">input</var>, then the string is invalid.</p>
 
    <li><p>Abort these steps (the string is parsed).</p></li>
 
   </ol>
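Throughout the date and time algorithm above, the skip step depends on the variant. The "in content" variant skips White_Space characters, and the "in attributes" variant skips only space characters. The Python sketch below shows that switch at two points: the leading skip followed by the first run of digits, and the final skip followed by the end-of-input check. The variant names `"content"` and `"attributes"` and all function names are hypothetical. The digit test deliberately uses the range U+0030 to U+0039 rather than `str.isdigit`, because `str.isdigit` also accepts non-ASCII digits.

```python
from typing import Optional, Tuple

# The spec's five "space characters".
SPACE_CHARACTERS = frozenset(" \t\n\x0c\r")

# Unicode White_Space code points, transcribed from PropList.txt
# (assumption: recent Unicode data).
WHITE_SPACE = frozenset(
    [chr(cp) for cp in range(0x0009, 0x000E)]
    + ["\u0020", "\u0085", "\u00A0", "\u1680"]
    + [chr(cp) for cp in range(0x2000, 0x200B)]
    + ["\u2028", "\u2029", "\u202F", "\u205F", "\u3000"]
)


def skip_for_variant(text: str, position: int, variant: str) -> int:
    """"content" skips White_Space characters; any other variant
    ("attributes") skips only the spec's space characters."""
    allowed = WHITE_SPACE if variant == "content" else SPACE_CHARACTERS
    while position < len(text) and text[position] in allowed:
        position += 1
    return position


def leading_digits(text: str, variant: str) -> Optional[Tuple[str, int]]:
    """Leading skip, then collect U+0030..U+0039; None means the string is invalid."""
    position = skip_for_variant(text, 0, variant)
    start = position
    while position < len(text) and "0" <= text[position] <= "9":
        position += 1
    if position == start:
        return None  # empty sequence: abort
    return text[start:position], position


def ends_cleanly(text: str, position: int, variant: str) -> bool:
    """Final skip; the string is valid only if position is then past the end."""
    return skip_for_variant(text, position, variant) == len(text)
```

For example, `leading_digits("\u3000 2008-08-21", "content")` returns `("2008", 6)`, while the "attributes" variant returns `None`, because U+3000 is not a space character.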
 
@@ -23611,22 +23615,21 @@ function AddCloud(data, x, y) { ... }</pre>
   title="">x</sub></var>&nbsp;&lt;&nbsp;<var title="">x<sub
   title="">width</sub></var></span> and <span><var title="">header<sub
   title="">y</sub></var>&nbsp;&le;&nbsp;<var title="">slot<sub
   title="">y</sub></var>&nbsp;&lt;&nbsp;<var title="">header<sub
   title="">y</sub></var>+<var title="">header<sub
   title="">height</sub></var></span>, are all either empty or covered
   by <span title="empty data cell">empty data cells</span>.</p>
 
   <p>A data cell is said to be an <dfn>empty data cell</dfn> if it
   contains no elements and its text content, if any, consists only of
-  characters in the Unicode character class Zs. <a
-  href="#refsUNICODE">[UNICODE]</a></p>
+  <span>White_Space</span> characters.</p>
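The empty-data-cell test can be sketched in Python as follows. As a simplification, this sketch models a cell as its text content plus a flag saying whether it contains elements. A real implementation would inspect the DOM, and the function and parameter names are hypothetical. An empty string passes, matching "if any" in the definition.

```python
# Unicode White_Space code points, transcribed from PropList.txt
# (assumption: recent Unicode data).
WHITE_SPACE = frozenset(
    [chr(cp) for cp in range(0x0009, 0x000E)]
    + ["\u0020", "\u0085", "\u00A0", "\u1680"]
    + [chr(cp) for cp in range(0x2000, 0x200B)]
    + ["\u2028", "\u2029", "\u202F", "\u205F", "\u3000"]
)


def is_empty_data_cell(text_content: str, has_element_children: bool) -> bool:
    """True for a cell with no child elements whose text content, if any,
    consists only of White_Space characters."""
    return not has_element_children and all(ch in WHITE_SPACE for ch in text_content)
```

A cell holding only a newline and indentation now counts as empty. Under the Zs definition, the newline (which is not in Zs) would have kept it non-empty.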
 
   <p>User agents may remove <span title="empty data cell">empty data
   cells</span> when analyzing data in a <span
   title="concept-table">table</span>.</p>
 
 
   <h3 id="forms">Forms</h3>
   <!-- XXX everything in WF2 -->
 
   <p class="big-issue">This section will contain definitions of the
