[] (0) Define document.charset, .characterSet, .defaultCharset
git-svn-id: http://svn.whatwg.org/webapps@1460 340c8d12-0b0e-0410-8428-c7bf67bfef74
Hixie committed Apr 18, 2008
1 parent 5c98ca8 commit 2490c74
Showing 2 changed files with 117 additions and 37 deletions.
100 changes: 70 additions & 30 deletions index
@@ -24,7 +24,7 @@

<h1 id=html-5>HTML 5</h1>

<h2 class="no-num no-toc" id=working>Working Draft &mdash; 17 April 2008</h2>
<h2 class="no-num no-toc" id=working>Working Draft &mdash; 18 April 2008</h2>

<p>You can take part in this work. <a
href="http://www.whatwg.org/mailing-list">Join the working group's
@@ -2598,6 +2598,9 @@
attribute DOMString <a href="#cookie0" title=dom-document-cookie>cookie</a>;
readonly attribute DOMString <a href="#lastmodified" title=dom-document-lastModified>lastModified</a>;
readonly attribute DOMString <a href="#compatmode" title=dom-document-compatMode>compatMode</a>;
attribute DOMString <a href="#charset0" title=dom-document-charset>charset</a>;
readonly attribute DOMString <a href="#characterset" title=dom-document-characterSet>characterSet</a>;
readonly attribute DOMString <a href="#defaultcharset" title=dom-document-defaultCharset>defaultCharset</a>;

// <a href="#dom-tree0">DOM tree accessors</a>
attribute DOMString <a href="#document.title" title=dom-document-title>title</a>;
@@ -2642,9 +2645,6 @@
DOMString <a href="#querycommandvalue" title=dom-document-queryCommandValue>queryCommandValue</a>(in DOMString commandId);
<a href="#selection1">Selection</a> <a href="#getselection0" title=dom-document-getSelection>getSelection</a>();
<!-- XXX we're not done here.
attribute DOMString charset;
readonly attribute DOMString defaultCharset;
readonly attribute DOMString characterSet;
readonly attribute DOMString readyState;
readonly attribute HTMLCollection scripts;
-->
@@ -2806,6 +2806,35 @@
</ul>
</div>

<p>Documents have an associated <dfn id=character1 title="document's
character encoding">character encoding</dfn>. When a <code>Document</code>
object is created, the <a href="#character1">document's character
encoding</a> must be initialised to UTF-16. Various algorithms during page
loading affect this value, as does the <code title=dom-document-charset><a
href="#charset0">charset</a></code> setter. <a
href="#refsIANACHARSET">[IANACHARSET]</a> <!-- XXX
http://www.iana.org/assignments/character-sets -->

<p>The <dfn id=charset0
title=dom-document-charset><code>charset</code></dfn> DOM attribute must,
on getting, return the preferred MIME name of the <a
href="#character1">document's character encoding</a>. On setting, if the
new value is an IANA-registered alias for a character encoding, the <a
href="#character1">document's character encoding</a> must be set to that
character encoding. (Otherwise, nothing happens.)

<p>The <dfn id=characterset
title=dom-document-characterSet><code>characterSet</code></dfn> DOM
attribute must, on getting, return the preferred MIME name of the <a
href="#character1">document's character encoding</a>.

<p>The <dfn id=defaultcharset
title=dom-document-defaultCharset><code>defaultCharset</code></dfn> DOM
attribute must, on getting, return the preferred MIME name of a character
encoding, possibly the user's default encoding, or an encoding associated
with the user's current geographical location, or any arbitrary encoding
name.
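
The getter and setter behaviour described above can be sketched roughly as follows. This is an illustrative model only, not how any user agent implements it: the `DocumentEncodingModel` class and the `PREFERRED_MIME_NAME` table are hypothetical names, and the table is a tiny hand-picked excerpt standing in for the full IANA registry. `defaultCharset` is left out because its value is up to the user agent.

```javascript
// Hypothetical excerpt of the IANA character-sets registry.
// Keys are lowercased aliases (registry names are case-insensitive);
// values are the preferred MIME names.
const PREFERRED_MIME_NAME = new Map([
  ["utf-16", "UTF-16"],
  ["utf-8", "UTF-8"],
  ["csutf8", "UTF-8"],
  ["iso-8859-1", "ISO-8859-1"],
  ["latin1", "ISO-8859-1"],
  ["csisolatin1", "ISO-8859-1"],
]);

// Hypothetical model of a Document's character encoding state.
class DocumentEncodingModel {
  constructor() {
    // The document's character encoding is initialised to UTF-16.
    this._encoding = "UTF-16";
  }

  // charset getter: preferred MIME name of the document's encoding.
  get charset() {
    return this._encoding;
  }

  // charset setter: only a registered alias changes the encoding;
  // any other value is ignored.
  set charset(value) {
    const preferred = PREFERRED_MIME_NAME.get(String(value).toLowerCase());
    if (preferred !== undefined) {
      this._encoding = preferred;
    }
  }

  // characterSet is read-only and mirrors the same underlying state.
  get characterSet() {
    return this._encoding;
  }
}

const doc = new DocumentEncodingModel();
doc.charset = "latin1";           // registered alias: encoding changes
doc.charset = "no-such-encoding"; // unknown name: nothing happens
console.log(doc.charset, doc.characterSet);
```

Keeping a single stored encoding behind both getters reflects that `charset` and `characterSet` are defined in terms of the same document's character encoding.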

<h3 id=elements><span class=secno>2.2 </span>Elements</h3>

<p>The nodes representing <a href="#html-elements">HTML elements</a> in the
@@ -7536,7 +7565,7 @@ onActivate, onBeforeDeactivate, onDeactivate, document.hasFocus):
<dt>Contexts in which this element may be used:

<dd>If the <code title=attr-meta-charset><a
href="#charset0">charset</a></code> attribute is present, or if the
href="#charset1">charset</a></code> attribute is present, or if the
element is in the <a href="#encoding"
title=attr-meta-http-equiv-content-type>Encoding declaraton state</a>: as
the first element in a <code><a href="#head">head</a></code> element.
@@ -7571,7 +7600,7 @@ onActivate, onBeforeDeactivate, onDeactivate, document.hasFocus):

<dd><code title=attr-meta-content><a href="#content0">content</a></code>

<dd><code title=attr-meta-charset><a href="#charset0">charset</a></code>
<dd><code title=attr-meta-charset><a href="#charset1">charset</a></code>
(<a href="#html-" title="HTML documents">HTML</a> only)

<dt>DOM interface:
@@ -7596,15 +7625,15 @@ onActivate, onBeforeDeactivate, onDeactivate, document.hasFocus):
document-level metadata with the <code title=attr-meta-name><a
href="#name">name</a></code> attribute, pragma directives with the <code
title=attr-meta-http-equiv><a href="#http-equiv0">http-equiv</a></code>
attribute, and the file's <a href="#character1">character encoding
attribute, and the file's <a href="#character2">character encoding
declaration</a> when an HTML document is serialised to string form (e.g.
for transmission over the network or for disk storage) with the <code
title=attr-meta-charset><a href="#charset0">charset</a></code> attribute.
title=attr-meta-charset><a href="#charset1">charset</a></code> attribute.

<p>Exactly one of the <code title=attr-meta-name><a
href="#name">name</a></code>, <code title=attr-meta-http-equiv><a
href="#http-equiv0">http-equiv</a></code>, and <code
title=attr-meta-charset><a href="#charset0">charset</a></code> attributes
title=attr-meta-charset><a href="#charset1">charset</a></code> attributes
must be specified.

<p>If either <code title=attr-meta-name><a href="#name">name</a></code> or
@@ -7613,15 +7642,15 @@ onActivate, onBeforeDeactivate, onDeactivate, document.hasFocus):
title=attr-meta-content><a href="#content0">content</a></code> attribute
must also be specified. Otherwise, it must be omitted.

<p>The <dfn id=charset0 title=attr-meta-charset><code>charset</code></dfn>
<p>The <dfn id=charset1 title=attr-meta-charset><code>charset</code></dfn>
attribute specifies the character encoding used by the document. This is
called a <a href="#character1">character encoding declaration</a>.
called a <a href="#character2">character encoding declaration</a>.

<p>The <code title=attr-meta-charset><a href="#charset0">charset</a></code>
<p>The <code title=attr-meta-charset><a href="#charset1">charset</a></code>
attribute may be specified in <a href="#html5" title=HTML5>HTML
documents</a> only, it must not be used in <a href="#xhtml5"
title=XHTML>XML documents</a>. If the <code title=attr-meta-charset><a
href="#charset0">charset</a></code> attribute is specified, the element
href="#charset1">charset</a></code> attribute is specified, the element
must be the first element in <a href="#the-head0">the <code>head</code>
element</a> of the file.

@@ -7892,7 +7921,7 @@ people expect to have work and what is necessary.
user agent requirements are all handled by the parsing section of the
specification. The state is just an alternative form of setting the
<code title=meta-charset>charset</code> attribute: it is a <a
href="#character1">character encoding declaration</a>.</p>
href="#character2">character encoding declaration</a>.</p>

<p>For <code><a href="#meta0">meta</a></code> elements in the <a
href="#encoding" title=attr-meta-http-equiv-content-type>Encoding
@@ -7912,7 +7941,7 @@ people expect to have work and what is necessary.
then that element must be the first element in the document's <code><a
href="#head">head</a></code> element, and the document must not contain
a <code><a href="#meta0">meta</a></code> element with the <code
title=attr-meta-charset><a href="#charset0">charset</a></code> attribute
title=attr-meta-charset><a href="#charset1">charset</a></code> attribute
present.</p>

<p>The <a href="#encoding"
@@ -8096,7 +8125,7 @@ people expect to have work and what is necessary.
though if we do then we have to duplicate the requirements in the
parsing section for conformance checkers -->

<p>A <dfn id=character1>character encoding declaration</dfn> is a mechanism
<p>A <dfn id=character2>character encoding declaration</dfn> is a mechanism
by which the character encoding used to store or transmit a document is
specified.

@@ -8127,7 +8156,7 @@ people expect to have work and what is necessary.
and, in addition, if that encoding isn't US-ASCII itself, then the
encoding must be specified using a <code><a href="#meta0">meta</a></code>
element with a <code title=attr-meta-charset><a
href="#charset0">charset</a></code> attribute or a <code><a
href="#charset1">charset</a></code> attribute or a <code><a
href="#meta0">meta</a></code> element in the <a href="#encoding"
title=attr-meta-http-equiv-content-type>Encoding declaraton state</a>.

@@ -30279,7 +30308,9 @@ user reload must be equivalent to .reload()
<p>The actual HTTP headers and other metadata, not the headers as mutated
or implied by the algorithms given in this specification, are the ones
that must be used when determining the character encoding according to the
rules given in the above specifications.
rules given in the above specifications. Once the character encoding is
established, the <a href="#character1">document's character encoding</a>
must be set to that character encoding.

<p>If the root element, as parsed according to the XML specifications cited
above, is found to be an <code><a href="#html">html</a></code> element
@@ -30339,6 +30370,9 @@ user reload must be equivalent to .reload()
versions thereof. <a href="#refsRFC2046">[RFC2046]</a> <a
href="#refsRFC2046">[RFC2646]</a>

<p>The <a href="#character1">document's character encoding</a> must be set
to the character encoding used to decode the document.

<p>Upon creation of the <code>Document</code> object, the user agent must
run the <a href="#application3"
title=concept-appcache-init-no-attribute>application cache selection
@@ -38322,7 +38356,7 @@ function receiver(e) {
described below.

<p>RCDATA elements can have <a href="#text1" title=syntax-text>text</a> and
<a href="#character2" title=syntax-entities>character entity
<a href="#character3" title=syntax-entities>character entity
references</a>, but the text must not contain an <a href="#ambiguous"
title=syntax-ambiguous-ampersand>ambiguous ampersand</a>. There are also
<a href="#cdata-rcdata-restrictions">further restrictions</a> described
@@ -38332,7 +38366,7 @@ function receiver(e) {
any contents (since, again, as there's no end tag, no content can be put
between the start tag and the end tag). Foreign elements whose start tag
is <em>not</em> marked as self-closing can have <a href="#text1"
title=syntax-text>text</a>, <a href="#character2"
title=syntax-text>text</a>, <a href="#character3"
title=syntax-entities>character entity references</a>, <a href="#cdata0"
title=syntax-cdata>CDATA blocks</a>, other <a href="#elements2"
title=syntax-elements>elements</a>, and <a href="#comments0"
@@ -38342,7 +38376,7 @@ function receiver(e) {
ampersand</a>.

<p>Normal elements can have <a href="#text1" title=syntax-text>text</a>, <a
href="#character2" title=syntax-entities>character entity references</a>,
href="#character3" title=syntax-entities>character entity references</a>,
other <a href="#elements2" title=syntax-elements>elements</a>, and <a
href="#comments0" title=syntax-comments>comments</a>, but the text must
not contain the character U+003C LESS-THAN SIGN (<code>&lt;</code>) or an
@@ -38438,7 +38472,7 @@ function receiver(e) {

<p><dfn id=attribute0 title=syntax-attribute-value>Attribute values</dfn>
are a mixture of <a href="#text1" title=syntax-text>text</a> and <a
href="#character2" title=syntax-entities>character entity references</a>,
href="#character3" title=syntax-entities>character entity references</a>,
except with the additional restriction that the text cannot contain an <a
href="#ambiguous" title=syntax-ambiguous-ampersand>ambiguous
ampersand</a>.
@@ -38818,7 +38852,7 @@ function receiver(e) {
<h4 id=character><span class=secno>8.1.4 </span>Character entity references</h4>

<p>In certain cases described in other sections, <a href="#text1"
title=syntax-text>text</a> may be mixed with <dfn id=character2
title=syntax-text>text</a> may be mixed with <dfn id=character3
title=syntax-entities>character entity references</dfn>. These can be used
to escape characters that couldn't otherwise legally be included in <a
href="#text1" title=syntax-text>text</a>.
@@ -39435,6 +39469,11 @@ function receiver(e) {
heuristically decide which to use as a default.
</ol>

<p>The <a href="#character1">document's character encoding</a> must
immediately be set to the value returned from this algorithm, at the same
time as the user agent uses the returned value to select the decoder to
use for the input stream.
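
As a rough illustration of the requirement above, the following sketch records the document's encoding in the same step that selects the decoder. The function `applySniffedEncoding` and the plain `page` object are hypothetical names. `TextDecoder` is a later web-platform API, used here only as a stand-in for the user agent's internal decoder; it is not part of this draft, and its `encoding` property reports a lowercased label rather than a preferred MIME name.

```javascript
// Hypothetical helper: takes the encoding label returned by the sniffing
// algorithm and applies it to both the document state and the decoder.
function applySniffedEncoding(page, label) {
  // Construct the decoder first; TextDecoder throws a RangeError for an
  // unsupported label, in which case the document state is left untouched.
  const decoder = new TextDecoder(label);

  // Update both pieces of state together so they cannot disagree.
  page.characterEncoding = decoder.encoding;
  page.decoder = decoder;
  return decoder;
}

// Usage: a stand-in document whose encoding starts out as UTF-16.
const page = { characterEncoding: "UTF-16", decoder: null };
applySniffedEncoding(page, "utf-8");
console.log(page.characterEncoding);
console.log(page.decoder.decode(new Uint8Array([0x68, 0x69])));
```

Building the decoder before touching the document state is a design choice of this sketch: a failure to select a decoder then never leaves the recorded encoding out of step with the one actually in use.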

<h5 id=character0><span class=secno>8.2.2.2. </span>Character encoding
requirements</h5>

@@ -39566,9 +39605,11 @@ function receiver(e) {
have the same Unicode interpretations in both the current encoding and
the new encoding, and if the user agent supports changing the converter
on the fly, then the user agent may change to the new converter for the
encoding on the fly. Set the encoding to the new encoding, set the <a
href="#confidence" title=concept-encoding-confidence>confidence</a> to
<i>confident</i>, and abort these steps.
encoding on the fly. Set the <a href="#character1">document's character
encoding</a> and the encoding used to convert the input stream to the new
encoding, set the <a href="#confidence"
title=concept-encoding-confidence>confidence</a> to <i>confident</i>, and
abort these steps.

<li>Otherwise, <a href="#navigate">navigate</a> to the document again,
with <a href="#replacement">replacement enabled</a>, but this time skip
@@ -42752,16 +42793,16 @@ function receiver(e) {
set.</p>

<p id=meta-charset-during-parse>If the element has a <code
title=attr-meta-charset><a href="#charset0">charset</a></code>
title=attr-meta-charset><a href="#charset1">charset</a></code>
attribute, and its value is a supported encoding, and the <a
href="#confidence" title=concept-encoding-confidence>confidence</a> is
currently <i>tentative</i>, then <a href="#change">change the
encoding</a> to the encoding given by the value of the <code
title=attr-meta-charset><a href="#charset0">charset</a></code>
title=attr-meta-charset><a href="#charset1">charset</a></code>
attribute.</p>

<p>Otherwise, if the element has a <code title=attr-meta-charset><a
href="#charset0">content</a></code> attribute, and applying the <a
href="#charset1">content</a></code> attribute, and applying the <a
href="#algorithm4">algorithm for extracting an encoding from a
Content-Type</a> to its value returns a supported encoding <var
title="">encoding</var>, and the <a href="#confidence"
@@ -50029,7 +50070,6 @@ XXX publish a "Valid HTML5!" button with a kitten on it. Made by an artist. (Doo


Interaction with document.open/write/close is undefined
How to determine the character encoding
Integration with quirks mode problems
<style> parsing needs tweaking if we want to exactly match IE
<base> parsing needs tweaking to handle multiple <base>s