This version:
Latest version:
Previous version:
WG Chair:
Editors:
Principal contributors:
This document is part of the Document Object Model Specification; check the W3C web site for its current status. It is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to use W3C Working Drafts as reference material or to cite them as other than "work in progress". Note: Since working drafts are subject to frequent change, you are advised to check the list of current W3C working drafts.
The Document Object Model (DOM) Level One provides a mechanism for software developers and web script authors to access and manipulate parsed HTML and XML content. This document defines a set of objects that extends the Document Object Model (Core) so that the combination can represent all parts of a parsed XML document, and so that XML validity checkers can be written using the interfaces defined herein.
As in the Core DOM specification, the primary Document Object Model type definitions are presented using the Object Management Group's Interface Definition Language (IDL, ISO standard 14750).
The DOM Level One (Core) specification defines a set of object definitions that are sufficient to represent a document instance (the objects that occur within the document itself). This specification extends the DOM Level One (Core) specification such that document type definitions, entities, CDATA marked sections, and conditional sections can also be represented.
The objects and interfaces defined within this document are sufficient to allow validators and other applications that make use of a DTD (Document Type Definition) to be written. For editors, the interfaces defined here will probably be insufficient for fine-grained editing, where information about the document type declaration may be necessary, though structural isomorphism should be easily accomplished.
A Document Type Definition (DTD) defines three things:
This specification gives access to all of these, though only in the post-parse form.
From a practical point of view, this means that while all the information contained within a DTD is available, not all of the information about what created it is. Parameter entity references, for example, are assumed to have been already expanded, and hence, their boundaries are lost.
This section describes the objects that are used to represent the DTD of a document. The objects are not specific to XML, although some attributes are specific to the HTML DTD. Such cases are clearly marked.
interface DocumentType : Node {
  attribute wstring name;
  attribute NodeEnumerator externalSubset;
  attribute NodeEnumerator internalSubset;
  attribute NamedNodeList generalEntities;
  attribute NamedNodeList parameterEntities;
  attribute NamedNodeList notations;
  attribute NamedNodeList elementTypes;
};
Each document has a (possibly null) attribute that contains a reference to a DocumentType object. The DocumentType class provides an interface to access all of the entity declarations, notation declarations, and all the element type declarations.
- name
The name attribute is a wstring that holds the name of the DTD; i.e. the name immediately following the DOCTYPE keyword.
- externalSubset
The externalSubset attribute is an enumerator that allows iteration over the list of nodes (definitions) that occurred in the external subset of a document. In this example:
<!DOCTYPE ex SYSTEM "ex.dtd" >
<ex/>
it would iterate over all of the declarations that occurred within the ex.dtd external entity.
Note: An iterator interface is used so as to not constrain implementations.
- internalSubset
The internalSubset attribute is an enumerator that iterates over all the definitions that occurred within the internal subset of a document (the part that appears within the document instance). For example:
<!DOCTYPE ex SYSTEM "ex.dtd" [
<!ENTITY ex "example">
]>
<ex/>
it would iterate over a single node: the definition of the ex entity.
Note: An iterator interface is used so as to not constrain implementations.
- generalEntities
This is a NamedNodeList providing an interface to the list of general entities that were defined within the external and the internal subset. For example, in:
<!DOCTYPE ex SYSTEM "ex.dtd" [
<!ENTITY foo "foo">
<!ENTITY bar "bar">
<!ENTITY % baz "baz">
]>
<ex/>
the interface would provide access to foo and bar but not baz. All objects supporting the Node interface that are accessed through this attribute will also support the Entity interface (defined below).
- parameterEntities
This is a NamedNodeList providing an interface to the list of parameter entities that were defined within the external and the internal subset. In the example above, the interface would provide access to baz but not foo or bar. All objects supporting the Node interface that are accessed through this attribute will also support the Entity interface (defined below).
- notations
This is a NamedNodeList providing an interface to the list of notations that were defined within the external and the internal subset. All objects supporting the Node interface that are accessed through this attribute will also support the Notation interface (defined below).
- elementTypes
This is a NamedNodeList providing an interface to the list of element types that were defined within the external and the internal subset. All objects supporting the Node interface that are accessed through this attribute will also support the ElementDefinition interface (defined below).
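As a rough illustration of how generalEntities and parameterEntities divide the declarations in the foo/bar/baz example above, the following Java sketch files each entity under one of two name-keyed maps. EntityDecl and DoctypeSketch are hypothetical stand-ins invented for this illustration and are not part of the interfaces defined here; the maps merely play the role of the two NamedNodeList attributes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the Entity interface: only name and
// isParameterEntity are modelled, plus the replacement text.
class EntityDecl {
    final String name;
    final boolean isParameterEntity;
    final String content;

    EntityDecl(String name, boolean isParameterEntity, String content) {
        this.name = name;
        this.isParameterEntity = isParameterEntity;
        this.content = content;
    }
}

// Hypothetical stand-in for DocumentType; the two maps play the role of
// the generalEntities and parameterEntities attributes.
class DoctypeSketch {
    final Map<String, EntityDecl> generalEntities = new LinkedHashMap<String, EntityDecl>();
    final Map<String, EntityDecl> parameterEntities = new LinkedHashMap<String, EntityDecl>();

    // File a declaration under the list that matches its kind.
    void declare(EntityDecl decl) {
        if (decl.isParameterEntity) {
            parameterEntities.put(decl.name, decl);
        } else {
            generalEntities.put(decl.name, decl);
        }
    }

    // Builds the entity lists for the foo/bar/baz example.
    static DoctypeSketch example() {
        DoctypeSketch doctype = new DoctypeSketch();
        doctype.declare(new EntityDecl("foo", false, "foo"));
        doctype.declare(new EntityDecl("bar", false, "bar"));
        doctype.declare(new EntityDecl("baz", true, "baz"));
        return doctype;
    }
}
```

With this sketch, DoctypeSketch.example().generalEntities holds foo and bar, while parameterEntities holds only baz.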
interface ElementDefinition : Node {
  enum ContentType {
    EMPTY,
    ANY,
    PCDATA,
    MODEL_GROUP
  };
  attribute wstring name;
  attribute ContentType contentType;
  attribute ModelGroup contentModel;
  attribute NamedNodeList attributeDefinitions;
  attribute StringList inclusions;
  attribute StringList exceptions;
};
The definition of each element defined within the external or internal subset (provided it is parsed) will be available through the elementTypes attribute of the DocumentType object. The name, attribute list, and content model are all available for inspection.
- name
This is the name of the type of element being defined.
- contentType
This attribute specifies the type of content of the element. The different types are:
  - EMPTY
  The element is an empty element, and cannot have content.
  - ANY
  The element may have character data, or any of the other elements defined within the DTD as content, in any order and sequence.
  - PCDATA
  The element can have only PCDATA (Parsed Character Data) as content.
  - MODEL_GROUP
  The element has a specific content model associated with it. The model is accessible through the contentModel attribute (below).
- contentModel
If the contentType is MODEL_GROUP, then this will provide access to a ModelGroup (below) object that is the root of the content model object hierarchy for this element. For other content types, this will be null.
- attributeDefinitions
This NamedNodeList provides an interface for accessing the list of attributes that were defined to be on an ElementDefinition. Each object supporting the Node interface that is accessed through this attribute will also support the AttributeDefinition interface.
- inclusions
This provides an interface to a list of element type names that are included in the content model of this element by the SGML inclusion/exception mechanism (not available from XML, but used in HTML).
- exceptions
This provides an interface to a list of element type names that are excluded from the content model of this element by the SGML inclusion/exception mechanism (not available from XML, but used in HTML).
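The relationship between contentType and contentModel can be sketched in Java as follows. ElementDefSketch is a hypothetical stand-in, and for brevity it stores the content model as an already-rendered string rather than as a ModelGroup; the only rule taken from the text above is that contentModel is non-null exactly when contentType is MODEL_GROUP.

```java
// Hypothetical stand-in for ElementDefinition, reduced to the name,
// contentType and contentModel attributes.
class ElementDefSketch {
    enum ContentType { EMPTY, ANY, PCDATA, MODEL_GROUP }

    final String name;
    final ContentType contentType;
    // Assumption: the model is kept as rendered text, e.g. "(title,para+)".
    final String contentModel;

    ElementDefSketch(String name, ContentType contentType, String contentModel) {
        // contentModel must be present for MODEL_GROUP and null otherwise.
        boolean needsModel = contentType == ContentType.MODEL_GROUP;
        if (needsModel != (contentModel != null)) {
            throw new IllegalArgumentException(
                "contentModel must be set if and only if contentType is MODEL_GROUP");
        }
        this.name = name;
        this.contentType = contentType;
        this.contentModel = contentModel;
    }

    // Renders the definition back into element declaration syntax.
    String toDeclaration() {
        String content;
        switch (contentType) {
            case EMPTY:
                content = "EMPTY";
                break;
            case ANY:
                content = "ANY";
                break;
            case PCDATA:
                content = "(#PCDATA)";
                break;
            default:
                content = contentModel;
                break;
        }
        return "<!ELEMENT " + name + " " + content + ">";
    }
}
```

For example, new ElementDefSketch("ex", ElementDefSketch.ContentType.PCDATA, null).toDeclaration() yields <!ELEMENT ex (#PCDATA)>.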
enum OccurrenceType {
  OPT,  // ?
  PLUS, // +
  REP   // *
};

interface PCDATAToken : Node {
  // Token type for the string #PCDATA
};

interface ElementToken : Node {
  attribute wstring name;
  attribute OccurrenceType occurrence;
};

interface ModelGroup : Node {
  enum ConnectionType {
    OR,  // |
    SEQ, // ,
    AND
  };
  attribute ConnectionType connector;
  attribute OccurrenceType occurrence;
  attribute NodeList tokens;
};
The ModelGroup object represents the content model of an ElementDefinition. The content model is represented as a tree, where each node specifies how its children are connected, and the number of times that it can occur within its parent. Leaf nodes in the tree are either PCDATAToken or ElementToken.
ModelGroup
- connector
This attribute specifies how the members of tokens are joined together.
- occurrence
This specifies how often this ModelGroup may occur at its position in the content model.
- tokens
This provides access to the list of tokens that are allowed within this ModelGroup. Note that only PCDATAToken and ElementToken may occur within the token list.
ElementToken
- name
This is the type name for the element.
- occurrence
This indicates how many times this element can occur in its position in the content model.
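To make the roles of connector, occurrence and tokens concrete, the following Java sketch renders a single model group back into content model syntax. All type names are hypothetical stand-ins. Two assumptions are made that the text above does not settle: a null occurrence stands for "exactly once" (OccurrenceType has no constant for it), and the AND connector is written "&" as in SGML.

```java
import java.util.List;

// Mirrors OccurrenceType, with the suffix each value has in a DTD.
enum OccurrenceSketch {
    OPT("?"), PLUS("+"), REP("*");

    final String suffix;

    OccurrenceSketch(String suffix) {
        this.suffix = suffix;
    }
}

// Mirrors ConnectionType; "&" for AND is an assumption taken from SGML.
enum ConnectorSketch {
    OR("|"), SEQ(","), AND("&");

    final String symbol;

    ConnectorSketch(String symbol) {
        this.symbol = symbol;
    }
}

// Stand-in for both ElementToken and PCDATAToken: name is an element
// type name or the literal "#PCDATA".
class TokenSketch {
    final String name;
    final OccurrenceSketch occurrence; // null = exactly once (assumption)

    TokenSketch(String name, OccurrenceSketch occurrence) {
        this.name = name;
        this.occurrence = occurrence;
    }
}

class ModelGroupSketch {
    final ConnectorSketch connector;
    final OccurrenceSketch occurrence; // null = exactly once (assumption)
    final List<TokenSketch> tokens;

    ModelGroupSketch(ConnectorSketch connector, OccurrenceSketch occurrence,
                     List<TokenSketch> tokens) {
        this.connector = connector;
        this.occurrence = occurrence;
        this.tokens = tokens;
    }

    // Joins the tokens with the connector and appends occurrence suffixes.
    String render() {
        StringBuilder sb = new StringBuilder("(");
        for (int i = 0; i < tokens.size(); i++) {
            if (i > 0) {
                sb.append(connector.symbol);
            }
            TokenSketch token = tokens.get(i);
            sb.append(token.name);
            if (token.occurrence != null) {
                sb.append(token.occurrence.suffix);
            }
        }
        sb.append(")");
        if (occurrence != null) {
            sb.append(occurrence.suffix);
        }
        return sb.toString();
    }
}
```

A SEQ group holding title, para with PLUS, and note with OPT renders as (title,para+,note?).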
interface AttributeDefinition : Node {
  enum DeclaredValueType {
    CDATA,
    ID,
    IDREF,
    IDREFS,
    ENTITY,
    ENTITIES,
    NMTOKEN,
    NMTOKENS,
    NOTATION,
    NAME_TOKEN_GROUP
  };
  enum DefaultValueType {
    VALUE,
    FIXED,
    REQUIRED,
    IMPLIED
  };
  attribute wstring name;
  attribute StringList allowedTokens;
  attribute DeclaredValueType declaredType;
  attribute DefaultValueType defaultType;
  attribute NodeList defaultValue;
};
The AttributeDefinition interface is used to access information about a particular attribute definition on a given element. Objects supporting this interface are available from the ElementDefinition object through the attributeDefinitions attribute.
- name
The name of the attribute.
- allowedTokens
The list of tokens that are allowed as values. For example, in
<!DOCTYPE ex [
<!ELEMENT ex (#PCDATA) >
<!ATTLIST ex test (FOO|BAR) "FOO" >
]>
<ex></ex>
this would hold FOO and BAR.
- declaredType
This attribute indicates the type of values the attribute may contain.
- defaultType
This specifies whether the attribute must be given in the document instance and, if it need not be, how its value is determined when it is omitted.
- defaultValue
This provides an interface to a list of Nodes that make up the default value for an attribute. This value is used if the attribute was not given an explicit value in the document instance.
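How allowedTokens, defaultType and defaultValue might combine when a validator works out the value of an attribute can be sketched in Java as below. AttributeDefSketch is a hypothetical stand-in; the default value is simplified to a string rather than a list of Nodes, and the VALUE constant is the one that appears in the complete IDL listing in this document. The checks follow the usual XML validity rules for attribute defaults and are offered as an illustration, not as required behaviour.

```java
import java.util.List;

// Hypothetical stand-in for AttributeDefinition.
class AttributeDefSketch {
    enum DefaultValueType { VALUE, FIXED, REQUIRED, IMPLIED }

    final String name;
    final List<String> allowedTokens;   // empty when the attribute is not enumerated
    final DefaultValueType defaultType;
    final String defaultValue;          // null for REQUIRED and IMPLIED

    AttributeDefSketch(String name, List<String> allowedTokens,
                       DefaultValueType defaultType, String defaultValue) {
        this.name = name;
        this.allowedTokens = allowedTokens;
        this.defaultType = defaultType;
        this.defaultValue = defaultValue;
    }

    // Returns the value the attribute has for an element, where
    // specified is null if the instance did not give the attribute.
    String effectiveValue(String specified) {
        if (specified == null) {
            if (defaultType == DefaultValueType.REQUIRED) {
                throw new IllegalStateException("attribute " + name + " is required");
            }
            return defaultValue; // null for IMPLIED
        }
        if (!allowedTokens.isEmpty() && !allowedTokens.contains(specified)) {
            throw new IllegalStateException(
                "value " + specified + " is not allowed for attribute " + name);
        }
        if (defaultType == DefaultValueType.FIXED && !specified.equals(defaultValue)) {
            throw new IllegalStateException("attribute " + name + " is fixed");
        }
        return specified;
    }
}
```

For the test attribute declared in the example above, effectiveValue(null) returns FOO, effectiveValue("BAR") returns BAR, and any other value is rejected.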
interface Notation : Node {
  attribute wstring name;
  attribute boolean isPublic;
  attribute string publicIdentifier;
  attribute string systemIdentifier;
};
The Notation object is used to represent the definition of a notation within a DTD.
- name
This is the name of the notation.
- isPublic
If a public identifier was specified in the notation declaration, this will be TRUE, and the publicIdentifier attribute will contain the string for the public identifier.
- publicIdentifier
If a public identifier was specified in the notation declaration, this will hold the public identifier string, otherwise it will be null.
- systemIdentifier
If a system identifier was specified in the notation declaration, this will hold the system identifier string, otherwise it will be null.
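The way isPublic follows from the two identifier attributes can be sketched in Java as follows. NotationSketch is a hypothetical stand-in, and the rendering of the declaration uses ordinary XML notation declaration syntax, which is an assumption about how an application might use these attributes rather than something this interface provides.

```java
// Hypothetical stand-in for Notation.
class NotationSketch {
    final String name;
    final String publicIdentifier; // null when none was declared
    final String systemIdentifier; // null when none was declared

    NotationSketch(String name, String publicIdentifier, String systemIdentifier) {
        if (publicIdentifier == null && systemIdentifier == null) {
            throw new IllegalArgumentException("a notation needs at least one identifier");
        }
        this.name = name;
        this.publicIdentifier = publicIdentifier;
        this.systemIdentifier = systemIdentifier;
    }

    // Mirrors isPublic: true exactly when a public identifier is present.
    boolean isPublic() {
        return publicIdentifier != null;
    }

    // Renders the notation back into declaration syntax.
    String toDeclaration() {
        StringBuilder sb = new StringBuilder("<!NOTATION ").append(name);
        if (isPublic()) {
            sb.append(" PUBLIC \"").append(publicIdentifier).append('"');
            if (systemIdentifier != null) {
                sb.append(" \"").append(systemIdentifier).append('"');
            }
        } else {
            sb.append(" SYSTEM \"").append(systemIdentifier).append('"');
        }
        return sb.append('>').toString();
    }
}
```

The identifier strings used with this sketch are made up for illustration; a notation with only a system identifier reports isPublic() as false.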
To be written.
CDATA and conditional sections are objects specific to XML. CDATA sections are used in the document instance, and conditional sections in the DTD.
interface CDATASection : Node {
  attribute wstring content;
};
CDATA sections are used in the document instance, and provide a region in which most of the XML delimiter recognition does not take place. The primary purpose is for including material such as XML fragments, without needing to escape all the delimiters.
- content
This holds the text that was contained by the CDATA section. Note that this may contain characters that need to be escaped outside of CDATA sections.
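As an illustration of the note on escaping, the Java sketch below shows what an application might do before writing the content of a CDATA section out as ordinary character data. CdataSketch and escapeForText are hypothetical names, and the choice to escape ">" as well as "&" and "<" is a conservative assumption.

```java
// Hypothetical helper, not part of the CDATASection interface.
class CdataSketch {
    // Escapes the delimiters that are recognized outside CDATA sections.
    static String escapeForText(String content) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < content.length(); i++) {
            char c = content.charAt(i);
            switch (c) {
                case '&':
                    sb.append("&amp;");
                    break;
                case '<':
                    sb.append("&lt;");
                    break;
                case '>':
                    sb.append("&gt;");
                    break;
                default:
                    sb.append(c);
                    break;
            }
        }
        return sb.toString();
    }
}
```

For a CDATA section whose content is <ex>a & b</ex>, escapeForText returns &lt;ex&gt;a &amp; b&lt;/ex&gt;.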
interface ConditionalSection : Node {
  attribute boolean included;
  attribute Node condition;
  attribute NodeList content;
};
Conditional sections are used in the DTD to provide a limited form of control over inclusion or exclusion of DTD fragments.
- included
This is a flag indicating whether this section was included during parsing.
- condition
This Node indicates the condition. Generally, it will be a Text node containing either INCLUDE or IGNORE.
- content
The content of this section.
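The relationship between condition and included can be sketched in Java as below. ConditionalSectionSketch is a hypothetical stand-in that assumes the condition is plain text (INCLUDE or IGNORE) and that each declaration in the content is represented as a string; it collects only the declarations from sections that were included.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for ConditionalSection.
class ConditionalSectionSketch {
    final String condition;      // text of the condition: "INCLUDE" or "IGNORE"
    final List<String> content;  // declarations inside the section, as text

    ConditionalSectionSketch(String condition, List<String> content) {
        this.condition = condition;
        this.content = content;
    }

    // Mirrors the included flag: true only for an INCLUDE condition.
    boolean included() {
        return "INCLUDE".equals(condition.trim());
    }

    // Gathers the declarations of every included section, in order.
    static List<String> effectiveDeclarations(List<ConditionalSectionSketch> sections) {
        List<String> result = new ArrayList<String>();
        for (ConditionalSectionSketch section : sections) {
            if (section.included()) {
                result.addAll(section.content);
            }
        }
        return result;
    }
}
```

Given one INCLUDE section and one IGNORE section, effectiveDeclarations returns only the declarations of the first.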
typedef sequence<wstring> StringList;

interface DocumentType : Node {
  attribute wstring name;
  attribute NodeEnumerator externalSubset;
  attribute NodeEnumerator internalSubset;
  attribute NamedNodeList generalEntities;
  attribute NamedNodeList parameterEntities;
  attribute NamedNodeList notations;
  attribute NamedNodeList elementTypes;
};

enum OccurrenceType {
  OPT,  // ?
  PLUS, // +
  REP   // *
};

interface ModelGroup : Node {
  enum ConnectionType {
    OR,  // |
    SEQ, // ,
    AND
  };
  attribute ConnectionType connector;
  attribute OccurrenceType occurrence;
  attribute NodeList tokens;
};

interface ElementDefinition : Node {
  enum ContentType {
    EMPTY,
    ANY,
    PCDATA,
    MODEL_GROUP
  };
  attribute wstring name;
  attribute ContentType contentType;
  attribute ModelGroup contentModel;
  attribute NamedNodeList attributeDefinitions;
  attribute StringList inclusions;
  attribute StringList exceptions;
};

interface PCDATAToken : Node {
  // Token type for the string #PCDATA
};

interface ElementToken : Node {
  attribute wstring name;
  attribute OccurrenceType occurrence;
};

interface AttributeDefinition : Node {
  enum DeclaredValueType {
    CDATA,
    ID,
    IDREF,
    IDREFS,
    ENTITY,
    ENTITIES,
    NMTOKEN,
    NMTOKENS,
    NOTATION,
    NAME_TOKEN_GROUP
  };
  enum DefaultValueType {
    VALUE,
    FIXED,
    REQUIRED,
    IMPLIED
  };
  attribute wstring name;
  attribute StringList allowedTokens;
  attribute DeclaredValueType declaredType;
  attribute DefaultValueType defaultType;
  attribute NodeList defaultValue;
};

interface Notation : Node {
  attribute wstring name;
  attribute boolean isPublic;
  attribute string publicIdentifier;
  attribute string systemIdentifier;
};
typedef sequence<octet> buffer;

interface Entity : Node {
  attribute wstring name;
  attribute boolean isParameterEntity;
};

interface InternalEntity : Entity {
  attribute wstring content;
};

interface ExternalEntity : Entity {
  attribute boolean isNDATA;
  attribute boolean isPublic;
  attribute string publicIdentifier;
  attribute string systemIdentifier;
};

interface ExternalTextEntity : ExternalEntity {
  attribute wstring content;
};

interface ExternalNDATAEntity : ExternalEntity {
  attribute Notation notation;
  attribute buffer content;
};

interface NDATA : Node {
  attribute buffer content;
};
interface CDATASection : Node {
  attribute wstring content;
};

interface ConditionalSection : Node {
  attribute boolean included;
  attribute Node condition;
  attribute NodeList content;
};
//
// Note: the IDL contains the following definition for a StringList:
//
//   typedef sequence<wstring> StringList;
//
// Because Java does not support templates, we are using a Vector for this.
//
import java.util.Vector;

public interface DocumentType extends Node {
  void setName(String name);
  String getName();
  void setExternalSubset(NodeList externalSubset);
  NodeList getExternalSubset();
  void setInternalSubset(NodeList internalSubset);
  NodeList getInternalSubset();
  void setNotations(NamedNodeList notations);
  NamedNodeList getNotations();
  void setElementTypes(NamedNodeList elementTypes);
  NamedNodeList getElementTypes();
}

public final class OccurrenceType {
  public static final int OPT = 0;  // ?
  public static final int PLUS = 1; // +
  public static final int REP = 2;  // *
}

public interface ElementDefinition extends Node {
  public final class ContentType {
    public static final int EMPTY = 0;
    public static final int ANY = 1;
    public static final int PCDATA = 2;
    public static final int MODEL_GROUP = 3;
  }
  void setName(String name);
  String getName();
  // The ints for the following two methods should be
  // constants defined in the ContentType class.
  void setContentType(int contentType);
  int getContentType();
  void setContentModel(ModelGroup contentModel);
  ModelGroup getContentModel();
  void setAttributeDefinitions(NamedNodeList attributeDefinitions);
  NamedNodeList getAttributeDefinitions();
  void setInclusions(Vector inclusions);
  Vector getInclusions();
  void setExceptions(Vector exceptions);
  Vector getExceptions();
}

public interface ModelGroup extends Node {
  public final class ConnectionType {
    public static final int OR = 0;  // |
    public static final int SEQ = 1; // ,
    public static final int AND = 2;
  }
  // The ints for the following two methods should
  // be constants defined in the ConnectionType class.
  void setConnector(int connector);
  int getConnector();
  // The ints for the two methods below should be
  // constants defined in the OccurrenceType class.
  void setOccurrence(int occurrence);
  int getOccurrence();
  void setTokens(NodeList tokens);
  NodeList getTokens();
}

public interface PCDATAToken extends Node {
  // Token type for the string #PCDATA
}

public interface ElementToken extends Node {
  void setName(String name);
  String getName();
  // The ints for the following two methods should be
  // constants defined in the OccurrenceType class.
  void setOccurrence(int occurrence);
  int getOccurrence();
}

public interface AttributeDefinition extends Node {
  public final class DeclaredValueType {
    public static final int CDATA = 0;
    public static final int ID = 1;
    public static final int IDREF = 2;
    public static final int IDREFS = 3;
    public static final int ENTITY = 4;
    public static final int ENTITIES = 5;
    public static final int NMTOKEN = 6;
    public static final int NMTOKENS = 7;
    public static final int NOTATION = 8;
    public static final int NAME_TOKEN_GROUP = 9;
  }
  public final class DefaultValueType {
    public static final int VALUE = 0;
    public static final int FIXED = 1;
    public static final int REQUIRED = 2;
    public static final int IMPLIED = 3;
  }
  void setName(String name);
  String getName();
  void setAllowedTokens(Vector allowedTokens);
  Vector getAllowedTokens();
  // The ints for the following two methods should be
  // constants declared in the DeclaredValueType class.
  void setDeclaredType(int declaredType);
  int getDeclaredType();
  // The ints for the following two methods should be
  // constants declared in the DefaultValueType class.
  void setDefaultType(int defaultType);
  int getDefaultType();
  void setDefaultValue(NodeList defaultValue);
  NodeList getDefaultValue();
}

public interface Notation extends Node {
  void setName(String name);
  String getName();
  void setIsPublic(boolean isPublic);
  boolean getIsPublic();
  void setPublicIdentifier(String publicIdentifier);
  String getPublicIdentifier();
  void setSystemIdentifier(String systemIdentifier);
  String getSystemIdentifier();
}

public interface CDATASection extends Node {
  void setContent(String content);
  String getContent();
}
(This section has yet to be written.)
There are a large number of terms that the DOM uses which may not be familiar to many of the readers. We suggest that you review the glossary if you encounter terms that aren't familiar.