Commit

[t] (2) Strip whitespace outside the root element from the DOM
git-svn-id: http://svn.whatwg.org/webapps@944 340c8d12-0b0e-0410-8428-c7bf67bfef74
Hixie committed Jun 22, 2007
1 parent 4079b70 commit 07303ac
Showing 2 changed files with 41 additions and 18 deletions.
26 changes: 19 additions & 7 deletions index
@@ -32123,6 +32123,15 @@ function receiver(e) {
title=attr-meta-charset>character encoding declarations</a> are to be
serialised, as discussed in the section on that topic.

+<p class=note>Space characters before the root <code><a
+href="#html">html</a></code> element will be dropped when the document is
+parsed; space characters <em>after</em> the root <code><a
+href="#html">html</a></code> element will be parsed as if they were at the
+end of the <code><a href="#html">html</a></code> element. Thus, space
+characters around the root element do not round-trip. It is suggested that
+newlines be inserted after the DOCTYPE and any comments that aren't in the
+root element.
+
<h4 id=the-doctype><span class=secno>8.1.1. </span>The DOCTYPE</h4>

<p>A <dfn id=doctype title=syntax-doctype>DOCTYPE</dfn> is a mostly
@@ -35114,13 +35123,12 @@ function receiver(e) {
from the <a href="#tokenisation0">tokenisation</a> stage as follows:

<dl class=switch>
-<dt>A character token that <em>is</em> one of one of U+0009 CHARACTER
-TABULATION, U+000A LINE FEED (LF), U+000B LINE TABULATION, U+000C FORM
-FEED (FF), <!--U+000D CARRIAGE RETURN (CR),--> or U+0020 SPACE
+<dt>A character token that is one of one of U+0009 CHARACTER TABULATION,
+U+000A LINE FEED (LF), U+000B LINE TABULATION, U+000C FORM FEED (FF),
+<!--U+000D CARRIAGE RETURN (CR),--> or U+0020 SPACE

<dd>
-<p><a href="#append" title="append a character">Append that character</a>
-to the <code>Document</code> node.</p>
+<p>Ignore the token.</p>

<dt>A comment token

@@ -35451,8 +35459,7 @@ function receiver(e) {
<!--U+000D CARRIAGE RETURN (CR),--> or U+0020 SPACE

<dd>
-<p><a href="#append" title="append a character">Append that character</a>
-to the <code>Document</code> node.</p>
+<p>Ignore the token.</p>

<dt>A character token that is <em>not</em> one of U+0009 CHARACTER
TABULATION, U+000A LINE FEED (LF), U+000B LINE TABULATION, U+000C FORM
@@ -38314,6 +38321,11 @@ Put the following into the MathML namespace if parsed:
<dd>
<p>Process the token as it would be processed in <a href="#the-main0">the
main phase</a>.</p>
+<!-- if there was a <body>, the space will go
+into it, otherwise (e.g. if there was a <frameset>) it'll go into
+the <html> node (this is important in case we have "foo</html>
+bar", as we don't want that to become one word) -->
+

<dt>A character token that is <em>not</em> one of U+0009 CHARACTER
TABULATION, U+000A LINE FEED (LF), U+000B LINE TABULATION, U+000C FORM
33 changes: 22 additions & 11 deletions source
@@ -29618,6 +29618,14 @@ function receiver(e) {
title="attr-meta-charset">character encoding declarations</span> are
to be serialised, as discussed in the section on that topic.</p>

+<p class="note">Space characters before the root <code>html</code>
+element will be dropped when the document is parsed; space
+characters <em>after</em> the root <code>html</code> element will be
+parsed as if they were at the end of the <code>html</code>
+element. Thus, space characters around the root element do not
+round-trip. It is suggested that newlines be inserted after the
+DOCTYPE and any comments that aren't in the root element.</p>
+

<h4>The DOCTYPE</h4>

@@ -32438,13 +32446,12 @@ function receiver(e) {

<dl class="switch">

-<dt>A character token that <em>is</em> one of one of U+0009
-CHARACTER TABULATION, U+000A LINE FEED (LF), U+000B LINE
-TABULATION, U+000C FORM FEED (FF), <!--U+000D CARRIAGE RETURN (CR),--> or
-U+0020 SPACE</dt>
+<dt>A character token that is one of one of U+0009 CHARACTER
+TABULATION, U+000A LINE FEED (LF), U+000B LINE TABULATION, U+000C
+FORM FEED (FF), <!--U+000D CARRIAGE RETURN (CR),--> or U+0020
+SPACE</dt>
<dd>
-<p><span title="append a character">Append that character</span>
-to the <code>Document</code> node.</p>
+<p>Ignore the token.</p>
</dd>

<dt>A comment token</dt>
@@ -32625,10 +32632,10 @@ function receiver(e) {

<dt>A character token that is one of one of U+0009 CHARACTER
TABULATION, U+000A LINE FEED (LF), U+000B LINE TABULATION, U+000C
-FORM FEED (FF), <!--U+000D CARRIAGE RETURN (CR),--> or U+0020 SPACE</dt>
+FORM FEED (FF), <!--U+000D CARRIAGE RETURN (CR),--> or U+0020
+SPACE</dt>
<dd>
-<p><span title="append a character">Append that character</span>
-to the <code>Document</code> node.</p>
+<p>Ignore the token.</p>
</dd>

<dt>A character token that is <em>not</em> one of U+0009 CHARACTER
@@ -35622,10 +35629,14 @@ Put the following into the MathML namespace if parsed:

<dt>A character token that is one of one of U+0009 CHARACTER
TABULATION, U+000A LINE FEED (LF), U+000B LINE TABULATION, U+000C
-FORM FEED (FF), <!--U+000D CARRIAGE RETURN (CR),--> or U+0020 SPACE</dt>
+FORM FEED (FF), <!--U+000D CARRIAGE RETURN (CR),--> or U+0020
+SPACE</dt>
<dd>
<p>Process the token as it would be processed in <span>the main
-phase</span>.</p>
+phase</span>.</p> <!-- if there was a <body>, the space will go
+into it, otherwise (e.g. if there was a <frameset>) it'll go into
+the <html> node (this is important in case we have "foo</html>
+bar", as we don't want that to become one word) -->
</dd>

<dt>A character token that is <em>not</em> one of U+0009 CHARACTER
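
The effect of this change on whitespace around the root element can be sketched with a toy model in Python. This is only an illustration under simplifying assumptions, not the parsing algorithm from the specification: the `(kind, data)` token tuples, the `"root"` token standing in for the whole `html` element, and the function name `place_root_whitespace` are all hypothetical. It shows space characters before the root being ignored, and space characters after the root being placed in `body` (or in `html` when there is no `body`, for example with a `frameset`).

```python
# Toy model only: the token tuples and the function name are hypothetical,
# not taken from the specification.
# U+0009, U+000A, U+000B, U+000C and U+0020, as listed in the diff.
SPACE_CHARACTERS = {"\t", "\n", "\x0b", "\x0c", " "}


def place_root_whitespace(tokens, has_body=True):
    """Return (document_children, trailing_parent, trailing_text).

    tokens is a list of (kind, data) pairs, where kind is one of
    "doctype", "comment", "root" (the whole html element) or "char".
    Non-space character tokens are outside this model and are skipped.
    """
    document_children = []
    trailing_text = []
    seen_root = False
    for kind, data in tokens:
        if kind == "char" and data in SPACE_CHARACTERS:
            if seen_root:
                # After the root: processed as in the main phase, so the
                # character ends up inside the tree, not on the Document.
                trailing_text.append(data)
            # Before the root: "Ignore the token."
            continue
        if kind == "root":
            seen_root = True
        if kind in ("doctype", "comment", "root"):
            document_children.append(kind)
    trailing_parent = "body" if has_body else "html"
    return document_children, trailing_parent, "".join(trailing_text)


tokens = [
    ("doctype", "html"),
    ("char", "\n"),
    ("comment", " header "),
    ("char", "\n"),
    ("root", "html"),
    ("char", "\n"),
]
children, parent, text = place_root_whitespace(tokens)
print(children)            # ['doctype', 'comment', 'root']
print(parent, repr(text))  # body '\n'
```

In this model the two newlines before the root leave no trace among the `Document` children, while the trailing newline is kept but moves inside the tree, which is why such whitespace does not round-trip.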
