[giow] (2) Try to make the application/x-www-form-urlencoded algorithm work even for ISO-2022-JP's crazy escape schemes.

Fixing http://www.w3.org/Bugs/Public/show_bug.cgi?id=12199

git-svn-id: http://svn.whatwg.org/webapps@6592 340c8d12-0b0e-0410-8428-c7bf67bfef74
Hixie committed Sep 27, 2011
1 parent cfbc996 commit 60f0ed7
Showing 3 changed files with 158 additions and 133 deletions.
94 changes: 51 additions & 43 deletions complete.html
@@ -239,7 +239,7 @@

<header class=head id=head><p><a class=logo href=http://www.whatwg.org/><img alt=WHATWG height=101 src=/images/logo width=101></a></p>
<hgroup><h1>Web Applications 1.0</h1>
<h2 class="no-num no-toc">Living Standard &mdash; Last Updated 26 September 2011</h2>
<h2 class="no-num no-toc">Living Standard &mdash; Last Updated 27 September 2011</h2>
</hgroup><dl><dt>Multiple-page version:</dt>
<dd><a href=http://www.whatwg.org/specs/web-apps/current-work/complete/>http://www.whatwg.org/specs/web-apps/current-work/complete/</a></dd>
<dt>One-page version:</dt>
@@ -52591,6 +52591,15 @@ <h5 id=form-submission-algorithm><span class=secno>4.10.22.3 </span>Form submiss

<h5 id=url-encoded-form-data><span class=secno>4.10.22.5 </span>URL-encoded form data</h5>

<p class=note>This form data set encoding is in many ways an
aberrant monstrosity, the result of many years of implementation
accidents and compromises leading to a set of requirements necessary
for interoperability, but in no way representing good design
practices. In particular, readers are cautioned to pay close
attention to the twisted details involving repeated (and in some
cases nested) conversions between character encodings and byte
sequences.</p>

<div class=impl>

<p>The <dfn id=application/x-www-form-urlencoded-encoding-algorithm><code title="">application/x-www-form-urlencoded</code> encoding
@@ -52647,65 +52656,65 @@ <h5 id=url-encoded-form-data><span class=secno>4.10.22.5 </span>URL-encoded form

<li>

<p>For each character in the entry's name and value, apply the
<p>Encode the entry's name and value using the selected
character encoding. The entry's name and value are now byte
strings.</p>

</li>

<li>

<p>For each byte in the entry's name and value, apply the
appropriate subsubsteps from the following list:</p>

<dl class=switch><dt>The character is a U+0020 SPACE character</dt>
<dl class=switch><dt>The byte is 0x20 (U+0020 SPACE if interpreted as ASCII)</dt>

<dd>Replace the character with a single U+002B PLUS SIGN
character (+).</dd>
<dd>Replace the byte with a single 0x2B byte (U+002B PLUS SIGN
character (+) if interpreted as ASCII).</dd>


<!-- * - . 0-9 a-z _ A-Z -->

<dt>If the character is in the range U+002A, U+002D, U+002E,
U+0030 to U+0039, U+0041 to U+005A, U+005F, U+0061 to
U+007A</dt>
<dt>If the byte is in the range 0x2A, 0x2D, 0x2E, 0x30 to 0x39,
0x41 to 0x5A, 0x5F, 0x61 to 0x7A</dt>

<dd><p>Leave the character as is.</dd>
<dd><p>Leave the byte as is.</dd>


<dt>Otherwise</dt>

<dd>

<p>Replace the character with a string formed as follows:</p>

<ol><li><p>Let <var title="">s</var> be an empty string.</li>

<li>
<ol><li><p>Let <var title="">s</var> be a string consisting of a
U+0025 PERCENT SIGN character (%) followed by two characters
in the ranges U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9)
and U+0041 LATIN CAPITAL LETTER A to U+0046 LATIN CAPITAL
LETTER F representing the hexadecimal value of the byte in
question (zero-padded if necessary).</li>

<p>For each byte <var title="">b</var> of the character when
expressed in the selected character encoding in turn, run
the appropriate subsubsubstep from the list below:</p>
<li><p>Encode the string <var title="">s</var> as US-ASCII,
so that it is now a byte string.</p>

<dl class=switch><dt>If the byte is in the range 0x20, 0x2A, 0x2D, 0x2E,
0x30 to 0x39, 0x41 to 0x5A, 0x5F, 0x61 to 0x7A</dt>
<li><p>Replace the byte in question in the name or value
being processed by the bytes in <var title="">s</var>,
preserving their relative order.</li>

<dd><p>Append to <var title="">s</var> the Unicode
character with the code point equal to the byte.</dd>

<dt>Otherwise</dt>
</ol></dd>

<dd><p>Append to the string a U+0025 PERCENT SIGN character
(%) followed by two characters in the ranges U+0030 DIGIT
ZERO (0) to U+0039 DIGIT NINE (9) and U+0041 LATIN CAPITAL
LETTER A to U+0046 LATIN CAPITAL LETTER F representing the
hexadecimal value of the byte (zero-padded if
necessary).</dd>
</dl></li>

</dl></li>
<li>

</ol></dd>
<p>Interpret the entry's name and value as Unicode strings
encoded in US-ASCII. (All of the bytes in the string will be in
the range 0x00 to 0x7F; the high bit will be zero throughout.)
The entry's name and value are now Unicode strings again.</p>

</dl></li>
</li>

<li><p>If the entry's name is "<code title=attr-fe-name-isindex><a href=#attr-fe-name-isindex>isindex</a></code>",
its type is "<code title="">text</code>", and this is the first
entry in the <var title="">form data set</var>, then append the
value to <var title="">result</var> and skip the rest of the
substeps for this entry, moving on to the next entry, if any, or
the next step in the overall algorithm otherwise.</li>
<li><p>If the entry's name is "<code title=attr-fe-name-isindex><a href=#attr-fe-name-isindex>isindex</a></code>", its type is "<code title="">text</code>", and this is the first entry in the <var title="">form data set</var>, then append the value to <var title="">result</var> and skip the rest of the substeps for this
entry, moving on to the next entry, if any, or the next step in
the overall algorithm otherwise.</li>

<li><p>If this is not the first entry, append a single U+0026
AMPERSAND character (&amp;) to <var title="">result</var>.</li>
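
The revised per-entry steps in this hunk (encode the whole string first, then escape byte by byte, then reinterpret as US-ASCII) can be sketched roughly as follows. This is an illustrative sketch only, not the spec's or any browser's implementation: the function name `urlencode_component`, the `SAFE_BYTES` constant, and the use of Python codec names for the "selected character encoding" are assumptions, and characters that cannot be encoded simply raise `UnicodeEncodeError` here rather than following whatever fallback the full algorithm defines.

```python
# Hypothetical helper names; a sketch of the byte-oriented steps only.

# Bytes left unescaped: * - . 0-9 A-Z _ a-z
SAFE_BYTES = frozenset(
    b"*-._0123456789"
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"abcdefghijklmnopqrstuvwxyz"
)


def urlencode_component(text: str, encoding: str = "utf-8") -> str:
    """Escape one name or value following the revised steps."""
    # Step 1: encode the whole string at once, so stateful encodings
    # such as ISO-2022-JP emit their escape sequences in context.
    raw = text.encode(encoding)

    out = bytearray()
    for byte in raw:
        if byte == 0x20:
            # Space becomes a single 0x2B byte ('+').
            out.append(0x2B)
        elif byte in SAFE_BYTES:
            # Leave the byte as is.
            out.append(byte)
        else:
            # '%' followed by two uppercase hex digits, as ASCII bytes.
            out += ("%%%02X" % byte).encode("ascii")

    # Final step: every byte is now in 0x00-0x7F, so this cannot fail.
    return out.decode("ascii")


if __name__ == "__main__":
    print(urlencode_component("a b*c"))
    print(urlencode_component("\u00e9", "utf-8"))
    print(urlencode_component("\u3042", "iso2022_jp"))
```

Because the escaping loop runs over bytes rather than characters, the ISO-2022-JP escape sequences that switch character sets are themselves percent-encoded, which the earlier character-at-a-time wording could not express cleanly.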
@@ -52799,18 +52808,17 @@ <h5 id=url-encoded-form-data><span class=secno>4.10.22.5 </span>URL-encoded form
</li>

<li><p>Convert the <var title="">name</var> and <var title="">value</var> strings to their byte representation in
US-ASCII (i.e. convert the Unicode string to a byte
string).</li>
ISO-8859-1 (i.e. convert the Unicode string to a byte string,
mapping code points to byte values directly).</li>

<li><p>Add a pair consisting of <var title="">name</var> and <var title="">value</var> to <var title="">pairs</var>.</li>

</ol></li>

<li><p>If any of the name-value pairs in <var title="">pairs</var>
have a name component consisting of the string "<code title="">_charset_</code>" encoded in US-ASCII, and the value
component of the first such pair is the name of a supported
character encoding, then let <var title="">encoding</var> be that
character encoding.</li>
component of the first such pair, when decoded as US-ASCII, is the
name of a supported character encoding, then let <var title="">encoding</var> be that character encoding.</li>

<li><p>Convert the name and value components of each name-value
pair in <var title="">pairs</var> to Unicode by interpreting the
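
The decoding-side changes in the last hunk above (ISO-8859-1 as a direct code-point-to-byte mapping, and decoding the `_charset_` value as US-ASCII before looking it up) might be sketched like this. Several things here are assumptions rather than anything the diff states: the function name `decode_pairs`, the `default_encoding` parameter, the use of Python's `codecs.lookup` as a stand-in for "supported character encoding", and the final decode step with replacement characters, since the hunk is cut off before that step is spelled out. The input is assumed to be already-unescaped name/value strings whose code points are all at most U+00FF.

```python
import codecs


def decode_pairs(pairs, default_encoding="utf-8"):
    """Sketch of the visible decoding steps for (name, value) pairs."""
    # ISO-8859-1 maps code points 0x00-0xFF directly to byte values.
    byte_pairs = [
        (name.encode("iso-8859-1"), value.encode("iso-8859-1"))
        for name, value in pairs
    ]

    encoding = default_encoding
    for name, value in byte_pairs:
        if name == b"_charset_":
            try:
                # The value is decoded as US-ASCII before the lookup.
                label = value.decode("ascii")
                codecs.lookup(label)  # raises LookupError if unknown
                encoding = label
            except (UnicodeDecodeError, LookupError):
                pass
            break  # only the first such pair is considered

    # Assumed final step: interpret the bytes using the chosen encoding.
    return [
        (name.decode(encoding, errors="replace"),
         value.decode(encoding, errors="replace"))
        for name, value in byte_pairs
    ]


if __name__ == "__main__":
    print(decode_pairs([("_charset_", "iso-8859-1"), ("q", "\xe9")]))
    print(decode_pairs([("q", "\xc3\xa9")]))
```

Switching the intermediate byte conversion from US-ASCII to ISO-8859-1 matters because percent-unescaped strings can contain code points above U+007F, which have no US-ASCII byte representation.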
94 changes: 51 additions & 43 deletions index
@@ -243,7 +243,7 @@

<header class=head id=head><p><a class=logo href=http://www.whatwg.org/><img alt=WHATWG height=101 src=/images/logo width=101></a></p>
<hgroup><h1 class=allcaps>HTML</h1>
<h2 class="no-num no-toc">Living Standard &mdash; Last Updated 26 September 2011</h2>
<h2 class="no-num no-toc">Living Standard &mdash; Last Updated 27 September 2011</h2>
</hgroup><dl><dt><strong>Web developer edition</strong></dt>
<dd><strong><a href=http://developers.whatwg.org/>http://developers.whatwg.org/</a></strong></dd>
<dt>Multiple-page version:</dt>
@@ -52458,6 +52458,15 @@ fur

<h5 id=url-encoded-form-data><span class=secno>4.10.22.5 </span>URL-encoded form data</h5>

<p class=note>This form data set encoding is in many ways an
aberrant monstrosity, the result of many years of implementation
accidents and compromises leading to a set of requirements necessary
for interoperability, but in no way representing good design
practices. In particular, readers are cautioned to pay close
attention to the twisted details involving repeated (and in some
cases nested) conversions between character encodings and byte
sequences.</p>

<div class=impl>

<p>The <dfn id=application/x-www-form-urlencoded-encoding-algorithm><code title="">application/x-www-form-urlencoded</code> encoding
@@ -52514,65 +52523,65 @@ fur

<li>

<p>For each character in the entry's name and value, apply the
<p>Encode the entry's name and value using the selected
character encoding. The entry's name and value are now byte
strings.</p>

</li>

<li>

<p>For each byte in the entry's name and value, apply the
appropriate subsubsteps from the following list:</p>

<dl class=switch><dt>The character is a U+0020 SPACE character</dt>
<dl class=switch><dt>The byte is 0x20 (U+0020 SPACE if interpreted as ASCII)</dt>

<dd>Replace the character with a single U+002B PLUS SIGN
character (+).</dd>
<dd>Replace the byte with a single 0x2B byte (U+002B PLUS SIGN
character (+) if interpreted as ASCII).</dd>


<!-- * - . 0-9 a-z _ A-Z -->

<dt>If the character is in the range U+002A, U+002D, U+002E,
U+0030 to U+0039, U+0041 to U+005A, U+005F, U+0061 to
U+007A</dt>
<dt>If the byte is in the range 0x2A, 0x2D, 0x2E, 0x30 to 0x39,
0x41 to 0x5A, 0x5F, 0x61 to 0x7A</dt>

<dd><p>Leave the character as is.</dd>
<dd><p>Leave the byte as is.</dd>


<dt>Otherwise</dt>

<dd>

<p>Replace the character with a string formed as follows:</p>

<ol><li><p>Let <var title="">s</var> be an empty string.</li>

<li>
<ol><li><p>Let <var title="">s</var> be a string consisting of a
U+0025 PERCENT SIGN character (%) followed by two characters
in the ranges U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9)
and U+0041 LATIN CAPITAL LETTER A to U+0046 LATIN CAPITAL
LETTER F representing the hexadecimal value of the byte in
question (zero-padded if necessary).</li>

<p>For each byte <var title="">b</var> of the character when
expressed in the selected character encoding in turn, run
the appropriate subsubsubstep from the list below:</p>
<li><p>Encode the string <var title="">s</var> as US-ASCII,
so that it is now a byte string.</p>

<dl class=switch><dt>If the byte is in the range 0x20, 0x2A, 0x2D, 0x2E,
0x30 to 0x39, 0x41 to 0x5A, 0x5F, 0x61 to 0x7A</dt>
<li><p>Replace the byte in question in the name or value
being processed by the bytes in <var title="">s</var>,
preserving their relative order.</li>

<dd><p>Append to <var title="">s</var> the Unicode
character with the code point equal to the byte.</dd>

<dt>Otherwise</dt>
</ol></dd>

<dd><p>Append to the string a U+0025 PERCENT SIGN character
(%) followed by two characters in the ranges U+0030 DIGIT
ZERO (0) to U+0039 DIGIT NINE (9) and U+0041 LATIN CAPITAL
LETTER A to U+0046 LATIN CAPITAL LETTER F representing the
hexadecimal value of the byte (zero-padded if
necessary).</dd>
</dl></li>

</dl></li>
<li>

</ol></dd>
<p>Interpret the entry's name and value as Unicode strings
encoded in US-ASCII. (All of the bytes in the string will be in
the range 0x00 to 0x7F; the high bit will be zero throughout.)
The entry's name and value are now Unicode strings again.</p>

</dl></li>
</li>

<li><p>If the entry's name is "<code title=attr-fe-name-isindex><a href=#attr-fe-name-isindex>isindex</a></code>",
its type is "<code title="">text</code>", and this is the first
entry in the <var title="">form data set</var>, then append the
value to <var title="">result</var> and skip the rest of the
substeps for this entry, moving on to the next entry, if any, or
the next step in the overall algorithm otherwise.</li>
<li><p>If the entry's name is "<code title=attr-fe-name-isindex><a href=#attr-fe-name-isindex>isindex</a></code>", its type is "<code title="">text</code>", and this is the first entry in the <var title="">form data set</var>, then append the value to <var title="">result</var> and skip the rest of the substeps for this
entry, moving on to the next entry, if any, or the next step in
the overall algorithm otherwise.</li>

<li><p>If this is not the first entry, append a single U+0026
AMPERSAND character (&amp;) to <var title="">result</var>.</li>
@@ -52666,18 +52675,17 @@ fur
</li>

<li><p>Convert the <var title="">name</var> and <var title="">value</var> strings to their byte representation in
US-ASCII (i.e. convert the Unicode string to a byte
string).</li>
ISO-8859-1 (i.e. convert the Unicode string to a byte string,
mapping code points to byte values directly).</li>

<li><p>Add a pair consisting of <var title="">name</var> and <var title="">value</var> to <var title="">pairs</var>.</li>

</ol></li>

<li><p>If any of the name-value pairs in <var title="">pairs</var>
have a name component consisting of the string "<code title="">_charset_</code>" encoded in US-ASCII, and the value
component of the first such pair is the name of a supported
character encoding, then let <var title="">encoding</var> be that
character encoding.</li>
component of the first such pair, when decoded as US-ASCII, is the
name of a supported character encoding, then let <var title="">encoding</var> be that character encoding.</li>

<li><p>Convert the name and value components of each name-value
pair in <var title="">pairs</var> to Unicode by interpreting the