[e] (0) apply wg decision
Fixing http://www.w3.org/Bugs/Public/show_bug.cgi?id=8207

git-svn-id: http://svn.whatwg.org/webapps@6007 340c8d12-0b0e-0410-8428-c7bf67bfef74
Hixie committed Apr 14, 2011
1 parent 26203de commit c2695e1
Showing 3 changed files with 848 additions and 130 deletions.
314 changes: 272 additions & 42 deletions complete.html
@@ -372,8 +372,10 @@ <h2 class="no-num no-toc" id=contents>Table of contents</h2>
<li><a href=#urls><span class=secno>2.6 </span>URLs</a>
<ol>
<li><a href=#terminology-0><span class=secno>2.6.1 </span>Terminology</a></li>
<li><a href=#dynamic-changes-to-base-urls><span class=secno>2.6.2 </span>Dynamic changes to base URLs</a></li>
<li><a href=#interfaces-for-url-manipulation><span class=secno>2.6.3 </span>Interfaces for URL manipulation</a></ol></li>
<li><a href=#parsing-urls><span class=secno>2.6.2 </span>Parsing URLs</a></li>
<li><a href=#resolving-urls><span class=secno>2.6.3 </span>Resolving URLs</a></li>
<li><a href=#dynamic-changes-to-base-urls><span class=secno>2.6.4 </span>Dynamic changes to base URLs</a></li>
<li><a href=#interfaces-for-url-manipulation><span class=secno>2.6.5 </span>Interfaces for URL manipulation</a></ol></li>
<li><a href=#fetching-resources><span class=secno>2.7 </span>Fetching resources</a>
<ol>
<li><a href=#concept-http-equivalent><span class=secno>2.7.1 </span>Protocol concepts</a></li>
@@ -5998,9 +6000,21 @@ <h4 id=mq><span class=secno>2.5.10 </span>Media queries</h4>

<h3 id=urls><span class=secno>2.6 </span>URLs</h3>

<h4 id=terminology-0><span class=secno>2.6.1 </span>Terminology</h4>
<p>This specification defines the term <a href=#url>URL</a>, and defines
various algorithms for dealing with URLs, because for historical
reasons the rules defined by the URI and IRI specifications are not
a complete description of what HTML user agents need to implement to
be compatible with Web content.</p>

<p class=note>The term "URL" in this specification is used in a
manner distinct from the precise technical meaning it is given in
RFC 3986. Readers familiar with that RFC will find it easier to read
<em>this</em> specification if they pretend the term "URL" as used
herein is really called something else altogether. This is a
<a href=#willful-violation>willful violation</a> of RFC 3986. <a href=#refsRFC3986>[RFC3986]</a></p>


<!-- see also: svn diff -r3244:3245 source -->
<h4 id=terminology-0><span class=secno>2.6.1 </span>Terminology</h4>

<p>A <dfn id=url>URL</dfn> is a string used to identify a resource.</p>

@@ -6031,24 +6045,155 @@ <h4 id=terminology-0><span class=secno>2.6.1 </span>Terminology</h4>
whitespace">stripping leading and trailing whitespace</a> from
it, it is a <a href=#valid-non-empty-url>valid non-empty URL</a>.</p>

<p>This specification defines the URL
<dfn id=about:legacy-compat><code>about:legacy-compat</code></dfn> as a reserved, though
unresolvable, <code title="">about:</code> URI, for use in <a href=#syntax-doctype title=syntax-doctype>DOCTYPE</a>s in <a href=#html-documents>HTML
documents</a> when needed for compatibility with XML tools. <a href=#refsABOUT>[ABOUT]</a></p>

<p>This specification defines the URL
<dfn id=about:srcdoc><code>about:srcdoc</code></dfn> as a reserved, though
unresolvable, <code title="">about:</code> URI, that is used as
<a href="#the-document's-address">the document's address</a> of <a href=#an-iframe-srcdoc-document title="an iframe srcdoc
document"><code>iframe</code> <code title=attr-iframe-srcdoc>srcdoc</code> documents</a>. <a href=#refsABOUT>[ABOUT]</a></p>


<div class=impl>

<h4 id=parsing-urls><span class=secno>2.6.2 </span>Parsing URLs</h4>

<p>To <dfn id=parse-a-url>parse a URL</dfn> <var title="">url</var> into its
component parts, the user agent must use the <span class=XXX>parse
an address</span> algorithm defined by the IRI specification. <a href=#refsRFC3987>[RFC3987]</a></p>

<p>Parsing a URL can fail. If it does not, then it results in the
following components, again as defined by the IRI specification:</p>

<ul class=brief><li><dfn id=url-scheme title=url-scheme>&lt;scheme&gt;</dfn></li>
<li><dfn id=url-host title=url-host>&lt;host&gt;</dfn></li>
<li><dfn id=url-port title=url-port>&lt;port&gt;</dfn></li>
<li><dfn id=url-hostport title=url-hostport>&lt;hostport&gt;</dfn></li>
<li><dfn id=url-path title=url-path>&lt;path&gt;</dfn></li>
<li><dfn id=url-query title=url-query>&lt;query&gt;</dfn></li>
<li><dfn id=url-fragment title=url-fragment>&lt;fragment&gt;</dfn></li>
<li><dfn id=url-host-specific title=url-host-specific>&lt;host-specific&gt;</dfn></li>
</ul><hr><p>To <dfn id=resolve-a-url>resolve a URL</dfn> to an <a href=#absolute-url>absolute URL</a>
component parts, the user agent must use the following steps:</p>

<ol><li><p>Strip leading and trailing <a href=#space-character title="space
character">space characters</a> from <var title="">url</var>.</li>

<li>

<p>Parse <var title="">url</var> in the manner defined by RFC
3986, with the following exceptions:</p>

<ul><li>Add all characters with code points less than or equal to
U+0020 or greater than or equal to U+007F to the
&lt;unreserved&gt; production.</li>

<li>Add the characters U+0022, U+003C, U+003E, U+005B .. U+005E,
U+0060, and U+007B .. U+007D to the &lt;unreserved&gt;
production.
<!--
0022 QUOTATION MARK
003C LESS-THAN SIGN
003E GREATER-THAN SIGN
005B LEFT SQUARE BRACKET
005C REVERSE SOLIDUS
005D RIGHT SQUARE BRACKET
005E CIRCUMFLEX ACCENT
0060 GRAVE ACCENT
007B LEFT CURLY BRACKET
007C VERTICAL LINE
007D RIGHT CURLY BRACKET
-->
</li>

<li>Add a single U+0025 PERCENT SIGN character as a second
alternative way of matching the &lt;pct-encoded&gt; production,
except when the &lt;pct-encoded&gt; is used in the
&lt;reg-name&gt; production.</li>

<li>Add the U+0023 NUMBER SIGN character to the characters
allowed in the &lt;fragment&gt; production.</li>

<!-- some browsers also have other differences, e.g. Mozilla
seems to treat ";" as if it was not in sub-delims, if the scheme
is "ftp". -->

</ul></li>

<li>

<p>If <var title="">url</var> doesn't match the
&lt;URI-reference&gt; production, even after the above changes are
made to the ABNF definitions, then parsing the URL fails with an
error. <a href=#refsRFC3986>[RFC3986]</a></p>

<p>Otherwise, parsing <var title="">url</var> was successful; the
components of the URL are substrings of <var title="">url</var>
defined as follows:</p>

<dl><dt><dfn id=url-scheme title=url-scheme>&lt;scheme&gt;</dfn></dt>

<dd><p>The substring matched by the &lt;scheme&gt; production, if any.</dd>


<dt><dfn id=url-host title=url-host>&lt;host&gt;</dfn></dt>

<dd><p>The substring matched by the &lt;host&gt; production, if any.</dd>


<dt><dfn id=url-port title=url-port>&lt;port&gt;</dfn></dt>

<dd><p>The substring matched by the &lt;port&gt; production, if any.</dd>


<dt><dfn id=url-hostport title=url-hostport>&lt;hostport&gt;</dfn></dt>

<dd><p>If there is a &lt;scheme&gt; component and a &lt;port&gt;
component and the port given by the &lt;port&gt; component is
different than the default port defined for the protocol given by
the &lt;scheme&gt; component, then &lt;hostport&gt; is the
substring that starts with the substring matched by the
&lt;host&gt; production and ends with the substring matched by the
&lt;port&gt; production, and includes the colon in between the
two. Otherwise, it is the same as the &lt;host&gt; component.</p>


<dt><dfn id=url-path title=url-path>&lt;path&gt;</dfn></dt>

<dd>

<p>The substring matched by one of the following productions, if
one of them was matched:</p>

<ul class=brief><li>&lt;path-abempty&gt;</li>
<li>&lt;path-absolute&gt;</li>
<li>&lt;path-noscheme&gt;</li>
<li>&lt;path-rootless&gt;</li>
<li>&lt;path-empty&gt;</li>
</ul></dd>


<dt><dfn id=url-query title=url-query>&lt;query&gt;</dfn></dt>

<dd><p>The substring matched by the &lt;query&gt; production, if any.</dd>


<dt><dfn id=url-fragment title=url-fragment>&lt;fragment&gt;</dfn></dt>

<dd><p>The substring matched by the &lt;fragment&gt; production, if any.</dd>


<dt><dfn id=url-host-specific title=url-host-specific>&lt;host-specific&gt;</dfn></dt>

<dd><p>The substring that <em>follows</em> the substring matched
by the &lt;authority&gt; production, or the whole string if the
&lt;authority&gt; production wasn't matched.</dd>

</dl></li>

</ol><p class=note>These parsing rules are a <a href=#willful-violation>willful
violation</a> of RFC 3986 and RFC 3987 (which do not define error
handling), motivated by a desire to handle legacy content. <a href=#refsRFC3986>[RFC3986]</a> <a href=#refsRFC3987>[RFC3987]</a></p>
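<p>The component definitions above can be illustrated with a minimal Python sketch. It is not the specification's algorithm: it uses the generic splitting regular expression from RFC 3986 Appendix B rather than the modified ABNF, and the names <code title="">split_url</code> and <code title="">DEFAULT_PORTS</code> (including the ports listed in it) are assumptions made for illustration only.</p>

```python
import re

# Generic splitting expression from RFC 3986, Appendix B. It already lets
# "#" appear inside the fragment, but it does not implement the modified
# ABNF described above, so it never reports a parse failure.
_URI_RE = re.compile(
    r"^(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?$",
    re.DOTALL,
)

# Hypothetical default-port table (an assumption, not from the spec text).
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def split_url(url):
    """Return the component substrings of url; absent ones are None."""
    m = _URI_RE.match(url)
    scheme, authority, path, query, fragment = m.groups()
    host = port = hostport = None
    if authority is not None:
        # Drop any userinfo, then separate the host from the optional port.
        hostinfo = authority.rpartition("@")[2]
        hm = re.match(r"^(\[[^\]]*\]|[^:]*)(?::([0-9]*))?$", hostinfo)
        host, port = hm.groups() if hm else (hostinfo, None)
        default = DEFAULT_PORTS.get(scheme.lower()) if scheme else None
        # <hostport> keeps the port only when a scheme is present and the
        # port differs from that scheme's default port.
        if scheme and port and int(port) != default:
            hostport = host + ":" + port
        else:
            hostport = host
        # <host-specific> is everything that follows the authority.
        host_specific = url[m.end(2):]
    else:
        # No authority matched: <host-specific> is the whole string.
        host_specific = url
    return {
        "scheme": scheme, "host": host, "port": port, "hostport": hostport,
        "path": path, "query": query, "fragment": fragment,
        "host_specific": host_specific,
    }
```

<p>Under these assumptions, <code title="">split_url("http://example.com:80/")</code> reports a &lt;hostport&gt; equal to the &lt;host&gt;, because 80 is the default port assumed for <code title="">http</code>, whereas a port of 8080 is kept in &lt;hostport&gt;.</p>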

</div>


<h4 id=resolving-urls><span class=secno>2.6.3 </span>Resolving URLs</h4>

<p>Resolving a URL is the process of taking a relative URL and
obtaining the absolute URL that it implies.</p>

<div class=impl>

<p>To <dfn id=resolve-a-url>resolve a URL</dfn> to an <a href=#absolute-url>absolute URL</a>
relative to either another <a href=#absolute-url>absolute URL</a> or an element,
the user agent must use the following steps. Resolving a URL can
result in an error, in which case the URL is not resolvable.</p>
@@ -6150,11 +6295,113 @@ <h4 id=terminology-0><span class=secno>2.6.1 </span>Terminology</h4>

</ol></li>

<li><p>Return the result of applying the <span class=XXX>resolve
an address</span> algorithm defined by the IRI specification to
resolve <var title="">url</var> relative to <var title="">base</var> using encoding <var title="">encoding</var>. <a href=#refsRFC3987>[RFC3987]</a></li>
<li><p><a href=#parse-a-url title="parse a URL">Parse</a> <var title="">url</var> into its component parts.</li>

</ol></div>
<li>

<p>If parsing <var title="">url</var> resulted in a <a href=#url-host title=url-host>&lt;host&gt;</a> component, then replace the
matching substring of <var title="">url</var> with the string that
results from expanding any sequences of percent-encoded octets in
that component that are valid UTF-8 sequences into Unicode
characters as defined by UTF-8.</p>

<p>If any percent-encoded octets in that component are not valid
UTF-8 sequences, then return an error and abort these steps.</p>

<p>Apply the IDNA ToASCII algorithm to the matching substring,
with both the AllowUnassigned and UseSTD3ASCIIRules flags
set. Replace the matching substring with the result of the ToASCII
algorithm.</p>

<p>If ToASCII fails to convert one of the components of the
string, e.g. because it is too long or because it contains invalid
characters, then return an error and abort these steps. <a href=#refsRFC3490>[RFC3490]</a></p>

</li>

<li>

<p>If parsing <var title="">url</var> resulted in a <a href=#url-path title=url-path>&lt;path&gt;</a> component, then replace the
matching substring of <var title="">url</var> with the string that
results from applying the following steps to each character other
than U+0025 PERCENT SIGN (%) that doesn't match the original
&lt;path&gt; production defined in RFC 3986:</p>

<ol><li>Encode the character into a sequence of octets as defined by
UTF-8.</li>

<li>Replace the character with the percent-encoded form of those
octets. <a href=#refsRFC3986>[RFC3986]</a></li>

</ol><div class=example>

<p>For instance if <var title="">url</var> was "<code title="">//example.com/a^b&#9786;c%FFd%z/?e</code>", then the
<a href=#url-path title=url-path>&lt;path&gt;</a> component's substring
would be "<code title="">/a^b&#9786;c%FFd%z/</code>" and the two
characters that would have to be escaped would be "<code title="">^</code>" and "<code title="">&#9786;</code>". The
result after this step was applied would therefore be that <var title="">url</var> now had the value "<code title="">//example.com/a%5Eb%E2%98%BAc%FFd%z/?e</code>".</p>

</div>

</li>

<li>

<p>If parsing <var title="">url</var> resulted in a <a href=#url-query title=url-query>&lt;query&gt;</a> component, then replace the
matching substring of <var title="">url</var> with the string that
results from applying the following steps to each character other
than U+0025 PERCENT SIGN (%) that doesn't match the original
&lt;query&gt; production defined in RFC 3986:</p>

<ol><li>If the character in question cannot be expressed in the
encoding <var title="">encoding</var>, then replace it with a
single 0x3F octet (an ASCII question mark) and skip the remaining
substeps for this character.</li>

<li>Encode the character into a sequence of octets as defined by
the encoding <var title="">encoding</var>.</li>

<li>Replace the character with the percent-encoded form of those
octets. <a href=#refsRFC3986>[RFC3986]</a></li>

</ol></li>

<li><p>Apply the algorithm described in RFC 3986 section 5.2
Relative Resolution, using <var title="">url</var> as the
potentially relative URI reference (<var title="">R</var>), and
<var title="">base</var> as the base URI (<var title="">Base</var>). <a href=#refsRFC3986>[RFC3986]</a></li>

<li>

<p>Apply any relevant conformance criteria of RFC 3986 and RFC
3987, returning an error and aborting these steps if
appropriate. <a href=#refsRFC3986>[RFC3986]</a> <a href=#refsRFC3987>[RFC3987]</a></p>

<p class=example>For instance, if an absolute URI that would be
returned by the above algorithm violates the restrictions specific
to its scheme, e.g. a <code title="">data:</code> URI using the
"<code title="">//</code>" server-based naming authority syntax,
then user agents are to treat this as an error instead.<!-- RFC
3986, 3.1 Scheme --></p>

</li>

<li><p>Let <var title="">result</var> be the target URI (<var title="">T</var>) returned by the Relative Resolution
algorithm.</li>

<li><p>If <var title="">result</var> uses a scheme with a
server-based naming authority, replace all U+005C REVERSE SOLIDUS
(\) characters in <var title="">result</var> with U+002F SOLIDUS
(/) characters.</li>

<li><p>Return <var title="">result</var>.</li>

</ol><p class=note>Some of the steps in these rules, for example the
processing of U+005C REVERSE SOLIDUS (\) characters, are a
<a href=#willful-violation>willful violation</a> of RFC 3986 and RFC 3987, motivated
by a desire to handle legacy content. <a href=#refsRFC3986>[RFC3986]</a> <a href=#refsRFC3987>[RFC3987]</a></p>
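<p>The &lt;path&gt; and &lt;query&gt; escaping steps above can be sketched in Python as follows. This is an illustrative sketch rather than a normative implementation: the function names <code title="">escape_path</code> and <code title="">escape_query</code> are hypothetical, and the allowed-character sets are a hand-written reading of the RFC 3986 productions (unreserved, sub-delims, ":", "@" and "/", plus "?" for queries).</p>

```python
# Characters that the RFC 3986 <path> productions allow literally.
_UNRESERVED = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)
_SUB_DELIMS = "!$&'()*+,;="
_PATH_OK = set(_UNRESERVED + _SUB_DELIMS + ":@/")
# The <query> production additionally allows "?".
_QUERY_OK = _PATH_OK | {"?"}


def _pct(octets):
    """Percent-encode a sequence of octets using uppercase hex digits."""
    return "".join("%{:02X}".format(b) for b in octets)


def escape_path(path):
    """Escape characters outside the <path> production, always via UTF-8."""
    out = []
    for ch in path:
        if ch == "%" or ch in _PATH_OK:
            out.append(ch)  # U+0025 and already-valid characters stay as-is
        else:
            out.append(_pct(ch.encode("utf-8")))
    return "".join(out)


def escape_query(query, encoding):
    """Escape characters outside the <query> production using encoding."""
    out = []
    for ch in query:
        if ch == "%" or ch in _QUERY_OK:
            out.append(ch)
            continue
        try:
            octets = ch.encode(encoding)
        except UnicodeEncodeError:
            # Not expressible in the encoding: a single literal "?" (0x3F),
            # with the remaining substeps skipped for this character.
            out.append("?")
            continue
        out.append(_pct(octets))
    return "".join(out)
```

<p>Applied to the path from the example above, <code title="">escape_path("/a^b&#9786;c%FFd%z/")</code> leaves the "<code title="">%FF</code>" and "<code title="">%z</code>" sequences untouched and escapes only "<code title="">^</code>" and "<code title="">&#9786;</code>". The query variant differs in that it uses the caller-supplied encoding and substitutes a question mark for characters that encoding cannot express.</p>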

</div>
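<p>The last steps of the algorithm (relative resolution followed by the replacement of U+005C REVERSE SOLIDUS characters) can be approximated in Python as shown below. This is a rough sketch under stated assumptions: <code title="">urllib.parse.urljoin</code> stands in for the RFC 3986 section 5.2 algorithm and is not guaranteed to match it in every edge case, the host and escaping steps are omitted, and the set <code title="">SERVER_BASED_SCHEMES</code> is a hypothetical choice of schemes treated as having a server-based naming authority.</p>

```python
from urllib.parse import urljoin, urlsplit

# Assumption: schemes treated as using a server-based naming authority.
SERVER_BASED_SCHEMES = {"http", "https", "ftp"}


def resolve(url, base):
    """Resolve url against base, then normalise backslashes if applicable."""
    # Stand-in for the Relative Resolution algorithm of RFC 3986, 5.2.
    result = urljoin(base, url)
    # Replace "\" with "/" only for schemes with a server-based authority.
    if urlsplit(result).scheme in SERVER_BASED_SCHEMES:
        result = result.replace("\\", "/")
    return result
```

<p>With these assumptions, a relative reference containing a backslash that is resolved against an <code title="">http:</code> base comes back with forward slashes only, while a <code title="">mailto:</code> URL is returned unchanged.</p>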

<p>A <a href=#url>URL</a> is an <dfn id=absolute-url>absolute URL</dfn> if <a href=#resolve-a-url title="resolve a url">resolving</a> it results in the same output
regardless of what it is resolved relative to, and that output is
@@ -6170,28 +6417,11 @@ <h4 id=terminology-0><span class=secno>2.6.1 </span>Terminology</h4>
immediately after the <a href=#url-scheme title=url-scheme>&lt;scheme&gt;</a>
component and they are both U+002F SOLIDUS characters (//).</p>

<hr><p>This specification defines the URL
<dfn id=about:legacy-compat><code>about:legacy-compat</code></dfn> as a reserved, though
unresolvable, <code title="">about:</code> URI, for use in <a href=#syntax-doctype title=syntax-doctype>DOCTYPE</a>s in <a href=#html-documents>HTML
documents</a> when needed for compatibility with XML tools. <a href=#refsABOUT>[ABOUT]</a></p>

<p>This specification defines the URL
<dfn id=about:srcdoc><code>about:srcdoc</code></dfn> as a reserved, though
unresolvable, <code title="">about:</code> URI, that is used as
<a href="#the-document's-address">the document's address</a> of <a href=#an-iframe-srcdoc-document title="an iframe srcdoc
document"><code>iframe</code> <code title=attr-iframe-srcdoc>srcdoc</code> documents</a>. <a href=#refsABOUT>[ABOUT]</a></p>

<p class=note>The term "URL" in this specification is used in a
manner distinct from the precise technical meaning it is given in
RFC 3986. Readers familiar with that RFC will find it easier to read
<em>this</em> specification if they pretend the term "URL" as used
herein is really called something else altogether. This is a
<a href=#willful-violation>willful violation</a> of RFC 3986. <a href=#refsRFC3986>[RFC3986]</a></p>


<div class=impl>

<h4 id=dynamic-changes-to-base-urls><span class=secno>2.6.2 </span>Dynamic changes to base URLs</h4>
<h4 id=dynamic-changes-to-base-urls><span class=secno>2.6.4 </span>Dynamic changes to base URLs</h4>

<p>When an <code title=attr-xml-base><a href=#the-xml:base-attribute-(xml-only)>xml:base</a></code> attribute
changes, the attribute's element, and all descendant elements, are
@@ -6264,7 +6494,7 @@ <h4 id=dynamic-changes-to-base-urls><span class=secno>2.6.2 </span>Dynamic chang



<h4 id=interfaces-for-url-manipulation><span class=secno>2.6.3 </span>Interfaces for URL manipulation</h4>
<h4 id=interfaces-for-url-manipulation><span class=secno>2.6.5 </span>Interfaces for URL manipulation</h4>

<p>An interface that has a complement of <dfn id=url-decomposition-idl-attributes>URL decomposition IDL
attributes</dfn> has seven attributes with the following