W3C NOTE-dcd-19980731


Document Content Description for XML

Submission to the World Wide Web Consortium 31-July-1998

This version:
http://www.w3.org/TR/1998/NOTE-dcd-19980731
http://www.w3.org/TR/1998/NOTE-dcd-19980731.html
Latest version:
http://www.w3.org/TR/NOTE-dcd
Editors:
Tim Bray (Textuality) <tbray@textuality.com>
Charles Frankston (Microsoft) <cfranks@microsoft.com>
Ashok Malhotra (IBM) <petsa@us.ibm.com>

Status of this document

This document is a submission to the World Wide Web Consortium (see Submission Request, W3C Staff Comment). It is the initial draft of the specification of the DCD facility. It is intended for review and comment by W3C members and is subject to change.

This document is a NOTE made available by the W3 Consortium for discussion only. This indicates no endorsement of its content, nor that the Consortium has, is, or will be allocating any resources to the issues addressed by the NOTE.

Abstract

This document proposes a structural schema facility, Document Content Description (DCD), for specifying rules covering the structure and content of XML documents. The DCD proposal incorporates a subset of the XML-Data Submission [XML-Data] and expresses it in a way which is consistent with the ongoing W3C RDF (Resource Description Framework) [RDF] effort; in particular, DCD is an RDF vocabulary. DCD is intended to define document constraints in an XML syntax; these constraints may be used in the same fashion as traditional XML DTDs. DCD also provides additional properties, such as basic datatypes.

Document Content Description for XML

Version 1.0

Table of Contents

1. Introduction
    1.1 Motivating Examples
    1.2 Design Principles
    1.3 Future Work
2. The DCD Framework
    2.1 A Note on Syntax
        2.1.1 Proposed Simplification of RDF Syntax
        2.1.2 Interchangeability of Elements and Attributes
    2.2 DCD Nodes and Resource Types
    2.3 Referring to Elements and Attributes
3. The DCD Vocabulary
    3.1 Properties which apply to DCDs
        3.1.1 AttributeDef
        3.1.2 Description
        3.1.3 InternalEntityDef and ExternalEntityDef
        3.1.4 Contents
        3.1.5 Namespace
    3.2 Properties Which Apply to Element Definitions
        3.2.1 Attribute and AttributeDef
        3.2.2 Contents
        3.2.3 Datatype
        3.2.4 Default and Fixed
        3.2.5 Description
        3.2.6 Groups, Occurs and Order
        3.2.7 Max, Min, MaxExclusive, MinExclusive
        3.2.8 Model
        3.2.9 Root
        3.2.10 Type
    3.3 Properties Which Apply to Attribute Definitions
        3.3.1 Global
        3.3.2 ID-Role
        3.3.3 Name
        3.3.4 Occurs
    3.4 Properties Which Apply to Internal Entity Definitions
        3.4.1 Name
        3.4.2 Value
    3.5 Properties Which Apply to External Entity Definitions
        3.5.1 Name
        3.5.2 PublicID
        3.5.3 SystemID
4. Datatypes
    4.1 Datatype Specifications
    4.2 Datatypes in instances
    4.3 Picture Constraints

Appendices

A. Local Element Definitions
B. Inheritance and Subclassing
C. Null Values
D. Unique Values
E. References
F. Acknowledgements


1. Introduction

The Document Content Description facility for XML (abbreviated DCD) is an RDF vocabulary designed for describing constraints to be applied to the structure and content of XML documents. The abbreviation "DCD" is used to describe both the general facility described in this document and individual schema instances that conform to it.

1.1 Motivating Examples

The following example is a DCD which describes the important characteristics of the DL element from HTML:

<DCD>
  <ElementDef Type="DL" Model="Elements" Content="Closed">
    <Description>A simple 'definition list' construct, which contains paired
       'DT' (DL Term) and 'DD' (DL Definition) elements</Description>
    <Group Occurs="OneOrMore" RDF:Order="Seq">
      <Element>DT</Element>
      <Group Occurs="Optional"><Element>DD</Element></Group>
    </Group>
  </ElementDef>
  <ElementDef Type="DT" Model="Data" Content="Closed">
    <Description>The term being defined in a DL list item</Description>
  </ElementDef>
  <ElementDef Type="DD" Model="Mixed" Content="Open">
    <Description>A term's definition in a DL list item</Description>
    <!-- Open because lots of markup can be in a DD -->
  </ElementDef> 
</DCD>

The example above is very document-oriented and in many respects isomorphic to what can be done with an XML DTD. The following example, less document-oriented, provides constraints for an airline booking:

<DCD>
 <ElementDef Type="Booking" Model="Elements" Content="Closed">
   <Description>Describes an airline reservation</Description>
   <Group RDF:Order="Seq">
     <Element>LastName</Element> <Element>FirstInitial</Element>
     <Element>SeatRow</Element> <Element>SeatLetter</Element>
     <Element>Departure</Element> <Element>Class</Element>
   </Group>
 </ElementDef>

 <!-- example omits boring field declarations -->
 <ElementDef Type="SeatRow" Model="Data" Datatype="i1" Min="1" Max="72" />
 <ElementDef Type="SeatLetter" Model="Data" Datatype="char" Min="A" Max="K"/>
 <ElementDef Type="Class" Model="Data" Datatype="char" Default="1"/>
</DCD>

Here is a booking record that conforms to the schema:

<Booking>
  <LastName>Bray</LastName><FirstInitial>T</FirstInitial>
  <SeatRow>33</SeatRow><SeatLetter>B</SeatLetter>
  <Departure>1997-05-24T07:55:00+1</Departure>
</Booking>
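
As a rough illustration of how an application might apply the SeatRow and SeatLetter bounds above, the following Python sketch checks the sample booking with the standard xml.etree.ElementTree module. The BOUNDS table and the check_booking function are hypothetical names introduced here. DCD does not define a processing API, and the sketch hard-codes the two constraints rather than reading them from the DCD.

```python
import xml.etree.ElementTree as ET

BOOKING = """<Booking>
  <LastName>Bray</LastName><FirstInitial>T</FirstInitial>
  <SeatRow>33</SeatRow><SeatLetter>B</SeatLetter>
  <Departure>1997-05-24T07:55:00+1</Departure>
</Booking>"""

# Hypothetical table mirroring the Min/Max properties of the DCD above.
# Each entry: element type -> (converter, Min, Max), both bounds inclusive.
BOUNDS = {
    "SeatRow": (int, 1, 72),
    "SeatLetter": (str, "A", "K"),
}

def check_booking(xml_text):
    """Return a list of human-readable constraint violations."""
    root = ET.fromstring(xml_text)
    problems = []
    for name, (convert, low, high) in BOUNDS.items():
        node = root.find(name)
        if node is None:
            continue  # absent elements are not checked in this sketch
        try:
            value = convert((node.text or "").strip())
        except ValueError:
            problems.append(f"{name}: not a valid {convert.__name__}")
            continue
        if not (low <= value <= high):
            problems.append(f"{name}: {value!r} outside [{low!r}, {high!r}]")
    return problems

print(check_booking(BOOKING))  # []
```

A booking with a SeatRow of 99 would yield one entry in the returned list. Single-character comparison here is by code point, which is an assumption of the sketch and is sufficient for the letters A to K.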

1.2 Design Principles

DCD is based on the following design principles:

  1. DCD semantics shall be a superset of those provided by XML DTDs.
  2. The DCD data model and syntax shall be conformant with that of RDF.
  3. The constraints in a DCD shall be straightforwardly usable by authoring tools and other applications which wish to retrieve information about a document's content and structure.
  4. DCD shall use mechanisms from other W3C working groups wherever they are appropriate and efficient.
  5. DCDs should be human-readable and reasonably clear.

1.3 Future Work

It is anticipated that for DCD to realize its full potential, several types of constraint are required beyond those described in this note. These include:

Subclassing and Inheritance
The creation and maintenance of document schemas is a complex and demanding task, similar in many respects to that of software engineering. Software engineering has made great progress based on object-oriented design principles, which allow the efficient re-use and customization of proven pieces of work. The same techniques should be available to the designers and maintainers of document schemas. A proposal for subclassing and inheritance is contained in Appendix "B. Inheritance and Subclassing".
Database Interface
DCD is expected to find use in constraining XML documents that contain extracts from databases. To meet these needs, it will be necessary to add properties which describe database constraints such as the uniqueness of values, key fields and referential integrity. It will also be necessary to define datatypes that faithfully mirror database datatypes such as fixed length strings. Section "4. Datatypes" is a proposal for a specific datatype repertoire. It is anticipated that a future version of the DCD specification will include other facilities to support database interaction and that they will be conformant to applicable industry and international standards such as [SQL].
The &-Connector
There was a request from the database community to allow Bag as another legal value for the RDF:Order property. This would support the concept that a relational database table is an unordered collection of columns. But this would bring back the SGML &-connector and so, on balance, it was decided (for this release) to disallow Bag as a legal value for the RDF:Order property. This decision may need to be revisited in the future.

2. The DCD Framework

A Document Content Description (DCD) is a set of properties used to constrain the types of elements and names of attributes that may appear in an XML document, the contents of the elements, and the values of the attributes.

2.1 A Note on Syntax

2.1.1 Proposed Simplification of RDF Syntax

As stated earlier, it is intended that DCD be conformant to the RDF Model and Syntax Specification [RDF]. However, it assumes certain simplifications in the RDF syntax which we intend to propose to the RDF working group. This syntax will be adopted only if ratified by the RDF working group. These syntactic simplifications are:

RDF:li
The RDF:li should not be required if typed nodes are being inserted into a collection.
Collection type
The collection type for properties can be specified as an attribute of the node.

2.1.2 Interchangeability of Elements and Attributes

The RDF syntax document allows non-repeatable properties to be expressed as attributes of the parent element. Thus, properties such as Name, Content and Model can be expressed either as elements or as attributes. The following are, therefore, equivalent:

<DCD>
  <ElementDef>
    <Type>DL</Type>
    <Description>A simple 'definition list' construct, which contains paired
        "DT" (DL Term) and "DD" (DL Definition) Elements</Description>
    <Model>Elements</Model>
    ... 
  </ElementDef>
</DCD>

<DCD>
  <?DCD syntax="explicit"?> 
  <ElementDef Type="DL" Model="Elements">
     <Description>A simple 'definition list' construct, which contains paired
         "DT" (DL Term) and "DD" (DL Definition) Elements</Description>
      ... 
  </ElementDef>
</DCD>

As shown in the above example, an optional processing instruction (PI) may be added to a DCD to specify the alternative "explicit" syntax form. The examples are equivalent and legal even without the PI. When the DCD PI is present with syntax="explicit" specified, then throughout the schema, the following properties must be specified using attribute syntax as shown below:

Type
Model
Occurs
RDF:Order
Content
Root
Fixed
Datatype

and all other properties must be specified using element syntax.

<?DCD syntax="explicit"?>

The examples in this document, for the most part, use the attribute form for properties.
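
A processor that accepts both forms has to look for a property in two places. The following Python sketch shows one way this might be done with the standard xml.etree.ElementTree module. The get_property function is a hypothetical helper, not part of DCD, and it ignores the "explicit" PI entirely.

```python
import xml.etree.ElementTree as ET

def get_property(node, name):
    """Read a non-repeatable property given as an attribute or a child element."""
    if name in node.attrib:
        return node.attrib[name]
    child = node.find(name)
    if child is not None and child.text is not None:
        return child.text.strip()
    return None  # property not specified in either form

ELEMENT_FORM = "<ElementDef><Type>DL</Type><Model>Elements</Model></ElementDef>"
ATTRIBUTE_FORM = '<ElementDef Type="DL" Model="Elements"/>'

for source in (ELEMENT_FORM, ATTRIBUTE_FORM):
    node = ET.fromstring(source)
    print(get_property(node, "Type"), get_property(node, "Model"))
```

Both iterations print the same pair of values, which is the sense in which the two syntactic forms are equivalent.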

2.2 DCD Nodes and Resource Types

The namespace which describes DCD properties and resources is identified by the URI http://w3.org/Schemas/DCD. It contains the following types: DCD, ElementDef, Group, AttributeDef, ExternalEntityDef and InternalEntityDef.

In the XML form of a DCD, the types of the elements correspond to RDF's property types. In the interests of brevity, we refer, for example, to "objects of type Namespace", which in the XML syntax are elements whose type is "Namespace" representing RDF properties where the property type is "Namespace".

A resource of type DCD is a document structure description that constrains the structure and contents of any document that identifies itself as falling under that DCD's constraints. An XML document can be identified as falling under the constraints of more than one DCD, in which event the properties applying to each such DCD are taken as constraints on the XML document. This provides two benefits: first, a single DCD can be used to provide constraints for large numbers of separate documents. Second, the DCD object provides a convenient level of granularity for applying namespace mechanisms.

The resources of type ElementDef and AttributeDef are more detailed structure descriptors. The properties of these resources provide constraints governing elements and attributes in the XML document. Implicitly, any node which is the value of an ElementDef or AttributeDef property is of respective type ElementDef or AttributeDef; however, there is typically no value in indicating this explicitly with an RDF:InstanceOf property.

2.3 Referring to Elements and Attributes

Most DCD declarations constrain the content and attributes of elements in document instances. This is done by assigning properties to objects of type ElementDef. These assignments may be seen as element type declarations. Element definitions declare that elements may have other elements as children, or may have attributes provided with certain names and properties. Child elements must be collected together into Groups which have Order and Occurs properties. See "3.2.6 Groups, Occurs and Order". Each ElementDef must have a Type property. This must be unique within the DCD. But, see Appendix "A. Local Element Definitions".

The attributes and the elements referred to in a particular DCD may come from the same DCD or from other DCDs identified by namespaces. Element definitions from within the same DCD are referred to by their Type property. If the element definition comes from another namespace, the value of the Type property may be a qualified name, where the prefix identifies the namespace.

For example, in the following, FirstName, MI and LastName are defined elsewhere in the DCD but Address comes from a namespace declared with the common prefix.

<ElementDef Type="person" Model="Elements">
  <Group RDF:Order="Seq">
    <Element>FirstName</Element>
    <Group Occurs="Optional">
      <Element>MI</Element>
    </Group>
    <Element>LastName</Element>
    <Element>common:Address</Element>
  </Group>
</ElementDef>

Attributes are declared in DCDs using objects of type AttributeDef. An attribute definition may occur on its own, as a property of the DCD, or it may occur within an element definition. In either case it may have a Global property whose value may be True or False. The default is False. Every attribute definition must have a Name property. If the value of the Global property is True the Name property must be unique in the DCD.

Global attributes can be referred to by their names in any element definition within the DCD. Global attributes in other namespaces can be referred to by the use of qualified names.

In the following example, Hidden is a global attribute in the DCD, while schemas:CLASS is a global attribute from another namespace.

<DCD>
  <AttributeDef Name="Hidden" Default="False" Global="True" />
  <ElementDef Type="MyType">
    <Attribute>Hidden</Attribute>
    <Attribute>schemas:CLASS</Attribute>
  </ElementDef>
</DCD>

In the following, the SRC attribute is defined locally within the IMG element definition.

<DCD>
  <AttributeDef Name="Border" Global="True">
    <!-- facts about the Border attribute -->
  </AttributeDef>
  <ElementDef Type="IMG">
    <AttributeDef Name="SRC" Datatype="uri">
      <Description>The URI where the image may be retrieved</Description>
    </AttributeDef>
    <Attribute Name="Border"/>
  </ElementDef>
</DCD>

Attributes defined with Global="False" can be referred to in other element definitions in the DCD by a resource identifier. For example:

<DCD>
  <AttributeDef Global="False" Name="size" Datatype="int" id="sizeAtt" />
  <ElementDef>
    <AttributeDef resource="#sizeAtt" />
  </ElementDef>
</DCD>

3. The DCD Vocabulary

The (roughly alphabetical) order in which the property descriptions appear is not intended to have any significance.

3.1 Properties which apply to DCDs

In the following descriptions, the phrase "such documents" signifies documents which have been identified as falling under the constraints of the DCD.

3.1.1 AttributeDef

Declares an attribute type which may be provided for one or more elements in such documents. This property does not assert that the attribute is provided for any individual element type; this can only be done with Attribute and AttributeDef properties of ElementDef. However, this property can be used to create an AttributeDef node which can serve as the value of Attribute properties. See discussion above.

An example of the use of AttributeDef:

<DCD>
  <?DCD syntax="explicit"?>
  ...
  <AttributeDef Name="Class" Datatype="string"/>
  ...
</DCD>

3.1.2 Description

Provides a (presumably human-readable) description of the semantics and usage of this DCD. The value of this property must match the production labeled Content in the XML specification; that is to say, it may contain markup, and is well-formed.

3.1.3 InternalEntityDef and ExternalEntityDef

Identify an entity which may be invoked via reference within such documents. The value of these properties must be a Node (in RDF terms), provided in the RDF syntax with subelement or URI. The resource which is the property value must be identified by the class mechanism as an InternalEntityDef or ExternalEntityDef.

An example of the use of InternalEntityDef and ExternalEntityDef:

<InternalEntityDef
  Name="W3C" Value="World Wide Web Consortium" />
<ExternalEntityDef resource='#copyrightNotice' />

3.1.4 Contents

Signals whether elements of types not explicitly declared via ElementDef properties may appear in such documents. The value of this property must be a string whose value is Open or Closed. Closed means that such documents may contain only elements whose types have been declared via ElementDef properties. Open means that such documents may contain elements which have not been so declared.

3.1.5 Namespace

Provides the namespace of this DCD. The value of this property must be a URI which identifies a namespace. This property is required to exist for every DCD.

The namespace of a DCD applies to all elements and attributes attached by properties to this DCD. The idea is that in an instance, the prefix part of a qualified name is used to locate the namespace and schema, and the local name part used to locate the applicable properties in the schema.

An example of the use of Namespace:

<DCD>
  <?DCD syntax="explicit"?>
  <Description>about HTML</Description>
  <Namespace>http://www.w3.org/TR/REC-html40</Namespace>
  <ElementDef Type="B" Model="Data"/>
</DCD>

This declares the namespace for this DCD to be http://www.w3.org/TR/REC-html40. If some XML document indicates that the prefix H refers to the namespace whose namespace name is http://www.w3.org/TR/REC-html40, then references to an element H:B in that document refer to the element defined in the above example using the local name B.
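
The lookup just described can be sketched in a few lines of Python. The SCHEMAS registry and the resolve function are hypothetical. How a processor actually locates the DCD for a namespace is not specified here, so the sketch assumes the definitions are already loaded in memory.

```python
# Hypothetical in-memory registry: namespace URI -> {local name: properties}.
SCHEMAS = {
    "http://www.w3.org/TR/REC-html40": {"B": {"Model": "Data"}},
}

def resolve(qname, prefixes):
    """Map a qualified name such as 'H:B' to its element definition, if any."""
    prefix, _, local = qname.rpartition(":")
    namespace = prefixes.get(prefix)          # prefix part locates the namespace
    return SCHEMAS.get(namespace, {}).get(local)  # local part locates the properties

print(resolve("H:B", {"H": "http://www.w3.org/TR/REC-html40"}))
```

An unknown prefix or an undeclared local name simply yields None in this sketch.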

3.2 Properties Which Apply to Element Definitions

[Definition:] In the following descriptions, the phrase "this type" signifies the element definition to which the properties apply.

3.2.1 Attribute and AttributeDef

Identify attributes which may be provided for elements of this type. No element definition may have two Attribute or AttributeDef properties referencing attributes that have the same name.

An example of the use of Attribute and AttributeDef:

<ElementDef Type="IMG">
  <AttributeDef Name="SRC" Datatype="uri">
    <Description>The URI where the image may be retrieved</Description>
  </AttributeDef>
  <Attribute>BORDER</Attribute>
  <Attribute>SiteMap:HUE</Attribute>
</ElementDef>

In this example, the properties of the Attribute whose name is SRC are declared within the declaration of the IMG element. This would make sense if IMG is the only element for which the SRC attribute applies.

The second attribute, BORDER, has a declaration stored separately, referenced by its name. This declaration style is suitable when such an attribute is applicable to multiple elements; it allows maintaining the declaration in one location.

Finally, the declaration for the third attribute, HUE, uses a qualified name and refers to a declaration found in another DCD, whose namespace is identified by the prefix SiteMap. BORDER and HUE must be defined as global attributes in their respective DCDs.

3.2.2 Contents

Signals whether elements of types not explicitly declared via the Group property may appear as children of elements of this type. The value of this property must be a string whose value is Open or Closed. Closed means that this element type is allowed to have children only of types which are declared via the Group property. Open means that this element type may have children of types not declared via the Group property.

Examples of the use of Content:

<ElementDef Type="DT" Model="Data" Content="Closed">
  <Description>The term being defined in a DL list item</Description>
</ElementDef>
<ElementDef Type="DD" Model="Mixed" Content="Open">
  <Description>A term's definition in a DL list item</Description>
</ElementDef>

3.2.3 Datatype

Identifies a specific datatype (in the [XML-Data] sense) which constrains the content of elements of this type. The value of this property must be a string which matches one of an enumerated list of datatypes. See section "4. Datatypes".

The Datatype property is only meaningful if the value of the Model property is Data. That is to say, it is not meaningful to provide a lexical datatype for content which contains substructures.

Examples of the use of Datatype:

<ElementDef Type="Loan" Model="Elements">
  <Description>A Bank Loan</Description>
  <Group RDF:Order="Seq">
    <Element>InterestRate</Element>
    <Element>Amount</Element>
    <Element>Maturity</Element>
  </Group>
</ElementDef> 
<ElementDef Type="InterestRate" Datatype="float"/>
<ElementDef Type="Amount" Datatype="int"/>
<ElementDef Type="Maturity" Model="Data" Datatype="dateTime"/>

3.2.4 Default and Fixed

Provides default values for the content of elements of this type, and signals whether any value other than the default is allowed. The value of the Default property must be a string which provides a default value. The only allowed values of the Fixed property are the strings True and False.

The Default value is used in the case that this element type appears as the value of an Element property of some other element type, but an element of that type fails to contain a child of this type.

The Default property is only meaningful if the value of the Model property is Data. That is to say, it is not meaningful to provide a default value for content which contains substructures.

When the Default property is used to give an element type a default value, the presence of the Fixed property with a value of True means that the default value is the only one allowed for this element type. If the Fixed property is not specified it is assumed to have a value of False.

An example of the use of Default:

<ElementDef Type="AirTicketClass" Model="Data" Datatype="char">
   <Default>Y</Default>
</ElementDef>

An example of the use of Fixed:

<ElementDef Type="Namespace" Model="Data" Fixed="True">
  <Default>http://www.w3.org/TR/REC-xml</Default>
</ElementDef>
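
The interaction between Default and Fixed might be summarized by the following Python sketch. The effective_value function is a hypothetical helper. It assumes that a missing element is represented as None and that a Fixed violation is reported by raising an exception, neither of which is prescribed by DCD.

```python
def effective_value(content, default=None, fixed=False):
    """Return the value an application would see for an element of this type."""
    if content is None:
        return default  # element absent: fall back to the Default value
    if fixed and content != default:
        # Fixed="True": the default is the only value allowed
        raise ValueError(f"value must be {default!r}, got {content!r}")
    return content

print(effective_value(None, default="Y"))  # Y
print(effective_value("F", default="Y"))   # F
```

With fixed=True, the second call would raise ValueError, because "F" differs from the default.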

3.2.5 Description

Provides a, presumably human-readable, description of the semantics and usage of elements of this type. The value of this property must match the production labeled Content in the XML specification; that is to say, it may contain markup, and is well-formed.

An example of the use of Description:

<ElementDef Type="BLINK">
  <Description>A mis-feature which should <em>never</em> be used.</Description>
</ElementDef>

3.2.6 Groups, Occurs and Order

An ElementDef whose Model property has the value Elements must also have a single property named Group, containing a specification of the elements and groups which can appear as children of elements of this type. Groups in turn may have an Occurs property. This can take one of four values.

Required
occurs exactly once
Optional
occurs zero or one times
OneOrMore
occurs one or more times
ZeroOrMore
occurs zero or more times

The default is Required.

A group declares individual elements and other groups which may occur as children of groups of this type. The order of occurrence of the children is declared using the RDF collection ordering facility via the proposed RDF:Order attribute. Legal values are Seq, in which case children must occur in the specified order, or Alt, in which case only one of the specified children may appear. The default is Seq. See section "1.3 Future Work".

An example of a simple element declaration:

<ElementDef Type="person" Model="Elements" >
  <Group RDF:Order="Seq">
    <Element>FirstName</Element>
    <Group Occurs="Optional"><Element>MI</Element></Group>
    <Element>LastName</Element>
  </Group>
</ElementDef>
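
One way to see how Occurs and RDF:Order combine is to translate a group into a regular expression over the sequence of child element types. The Python sketch below does this for the person declaration above. The Group class, to_regex and matches are hypothetical names, the group is built by hand rather than parsed from a DCD, and the sketch covers only closed content.

```python
import re
from dataclasses import dataclass

# Occurs value -> regular-expression quantifier.
QUANTIFIER = {"Required": "", "Optional": "?", "OneOrMore": "+", "ZeroOrMore": "*"}

@dataclass
class Group:
    children: list            # element type names or nested Groups
    order: str = "Seq"        # "Seq" or "Alt"
    occurs: str = "Required"  # one of the QUANTIFIER keys

def to_regex(item):
    """Translate an element name or Group into a regex over 'Name ' tokens."""
    if isinstance(item, str):
        return re.escape(item) + " "
    joiner = "" if item.order == "Seq" else "|"
    body = joiner.join(to_regex(child) for child in item.children)
    return "(?:" + body + ")" + QUANTIFIER[item.occurs]

def matches(group, child_names):
    """True if the sequence of child element types satisfies the group."""
    text = "".join(name + " " for name in child_names)
    return re.fullmatch(to_regex(group), text) is not None

person = Group(["FirstName", Group(["MI"], occurs="Optional"), "LastName"])
print(matches(person, ["FirstName", "LastName"]))        # True
print(matches(person, ["FirstName", "MI", "LastName"]))  # True
print(matches(person, ["LastName", "FirstName"]))        # False
```

Each element type is followed by a space in the encoded string, so a type name that is a prefix of another cannot match by accident.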

Here is a more complete example with attribute and element specifications:

<ElementDef Type="employee" Model="Elements" Content="Closed">
  <AttributeDef Name="employment" Occurs="Required" Datatype="enumeration">
     <Values>Temporary Permanent Retired</Values>
  </AttributeDef>
  <Group RDF:Order="Seq">
    <Element>FirstName</Element>
    <Group Occurs="Optional"><Element>MI</Element></Group>
    <Element>LastName</Element>
    <Group Occurs="OneOrMore" RDF:Order="Alt">
      <Element>Street</Element><Element>PO-Box</Element>
    </Group>
    <Group RDF:Order="Seq">
      <Element>Telephone</Element>
      <Element>Salary</Element>
    </Group>
  </Group>
</ElementDef>

3.2.7 Max, Min, MaxExclusive, MinExclusive

Provide, respectively, upper and lower bounds on the content of elements of this type. Max and Min allow values up to and including the bound, while MaxExclusive and MinExclusive allow values less than and greater than the bound, respectively. The semantics of upper and lower bounding are highly dependent on the element's Datatype; for some datatypes (e.g. uri), this property has no meaning.

If an element has no Datatype, then Max, Min, MaxExclusive and MinExclusive values are treated as strings, and tests for upper and lower bounding are performed according to the language specification collation rules defined in Chapter 5.15 of the Unicode standard [Unicode].

The Max, Min, MaxExclusive and MinExclusive properties are only meaningful if the value of the Model property is Data. That is to say, it is not meaningful to provide upper or lower bounds for content which contains substructures.

Examples of the use of Max and Min:

<ElementDef Type="MonthOfYear" Model="Data" Datatype="int"
  Max="12" Min="1" />
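
The difference between the inclusive and exclusive bounds can be illustrated with a short Python sketch. The within_bounds function is a hypothetical helper. Its default string comparison is by code point, which is a simplification of this sketch and not the Unicode collation rules required above.

```python
def within_bounds(value, props, convert=str):
    """Check a value against Min, Max, MinExclusive and MaxExclusive properties."""
    v = convert(value)
    checks = {
        "Min": lambda bound: v >= bound,           # inclusive lower bound
        "Max": lambda bound: v <= bound,           # inclusive upper bound
        "MinExclusive": lambda bound: v > bound,   # exclusive lower bound
        "MaxExclusive": lambda bound: v < bound,   # exclusive upper bound
    }
    return all(check(convert(props[name]))
               for name, check in checks.items() if name in props)

month = {"Max": "12", "Min": "1"}
print(within_bounds("12", month, int))                   # True
print(within_bounds("13", month, int))                   # False
print(within_bounds("12", {"MaxExclusive": "12"}, int))  # False
```

The convert argument stands in for the element's Datatype; passing int corresponds to the MonthOfYear declaration above.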

3.2.8 Model

Indicates which of five broad classes of constraints apply to the content of elements of this type. The value of this property must be a string whose value is one of Empty, Any, Data, Elements, or Mixed. The meanings are:

Empty
Elements of this type must have no content.
Any
Elements of this type may contain text and child elements of any declared type.
Data
Elements of this type contain text, but must not contain any child elements.
Elements
Elements of this type contain only child elements, optionally separated by white space. The types of the child elements that may appear are controlled by the Group and Element properties.
Mixed
Elements of this type may contain text and embedded child elements. The types of the child elements that may appear are controlled by the Element property.

The default is Data.

Examples of the use of Model:

<ElementDef Type='IMG' Model='Empty' />
<ElementDef Type='BODY' Model='Any' />
<ElementDef Type='DT' Model='Data' />
<ElementDef Type='DL' Model='Elements' />
<ElementDef Type='P' Model='Mixed' />

3.2.9 Root

Element definitions can have a Root property that indicates whether an element of that type can serve as the root of a conforming document. Allowed values are True and False. The default is False.

If no element definition in a DCD has a Root="True" property, then an element of any type that is allowed to appear in such documents may serve as the root element. If multiple element definitions have Root="True" then any element of one of those types can appear as the root of a conforming document.

An example of the use of Root:

<DCD>
  <?DCD syntax="explicit"?>
  <Description>DCD for an email message</Description>
  <ElementDef Type="EMail" Root="True">
   ... declarations ...
  </ElementDef>
  <ElementDef Type="Head">
   ... declarations for Head ...
  </ElementDef>
  <ElementDef Type="Body" Model="Data"/>
</DCD>
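
The rule for choosing root elements might be sketched as follows in Python, using the standard xml.etree.ElementTree module. The allowed_roots function is hypothetical. It reads only the attribute form of Type and Root, and when no definition is marked as a root it returns only the declared types, which is a simplification for DCDs with open content.

```python
import xml.etree.ElementTree as ET

def allowed_roots(dcd_xml):
    """Return the element types that may serve as the document root."""
    defs = ET.fromstring(dcd_xml).findall("ElementDef")
    flagged = [d.get("Type") for d in defs if d.get("Root") == "True"]
    # With no Root="True" anywhere, any declared type may be the root.
    return flagged or [d.get("Type") for d in defs]

DCD = """<DCD>
  <ElementDef Type="EMail" Root="True"/>
  <ElementDef Type="Head"/>
  <ElementDef Type="Body" Model="Data"/>
</DCD>"""
print(allowed_roots(DCD))  # ['EMail']
```

Removing the Root property from the EMail definition would make all three declared types acceptable as roots.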

3.2.10 Type

Gives the type of the element. This property is required to be present for every Element resource in DCD. The value of this property must be a Name in the XML sense. Furthermore, it must be an NCName as defined in [XML Namespaces]; that is to say, it may not contain a prefix or a colon.

As discussed earlier, the Type property for element definitions must be unique within the DCD. But, see Appendix "A. Local Element Definitions".

3.3 Properties Which Apply to Attribute Definitions

The following properties which apply to attribute definitions or attribute types have the same names as, and are identical in effect to, the corresponding properties of element types: Datatype, Default, Description, Max, Min, MaxExclusive, MinExclusive and Fixed.

3.3.1 Global

Indicates whether the Name property of this attribute must be unique in the DCD, and thus can serve as an address for this attribute definition. The possible values are True and False. The default is False.

An example of the use of Global:

<DCD>
  <?DCD syntax="explicit"?>
  <AttributeDef Name="CLASS" Global="True">
    <!-- facts about the CLASS attribute -->
  </AttributeDef>
</DCD>

3.3.2 ID-Role

Signals that the attribute has unique identifier or unique ID pointer semantics. The value of this property must be a string whose value is one of ID, IDREF, or IDREFS. The effect of each of these values is the same as if the attribute had been declared, in an XML DTD, with the attribute type of the same name.

An example of the use of ID-Role:

<ElementDef Type="A">
  <AttributeDef>
    <Name>NAME</Name>
    <ID-Role>ID</ID-Role>
  </AttributeDef>
</ElementDef>

3.3.3 Name

Gives the name of the attribute. This property is required to be present for every Attribute resource in DCD. The value of this property must be a Name in the XML sense. Furthermore, it must be an NCName as defined in [XML Namespaces]; that is to say, it may not contain a prefix or a colon.

As discussed earlier, the Name property for attribute definitions that have Global="True" must be unique within the DCD.

3.3.4 Occurs

Indicates whether the presence of the Attribute is required. This can take one of two values.

Required
occurs exactly once
Optional
occurs zero or one times

The default is Optional.

3.4 Properties Which Apply to Internal Entity Definitions

3.4.1 Name

Gives the name by which the entity may be invoked. This property is required to be present for every InternalEntity definition resource in DCD. The value of this property must be a Name in the XML sense. Furthermore, it must be an NCName as defined in [XML Namespaces]; that is to say, it may not contain a prefix or a colon.

3.4.2 Value

Provides the replacement text for the internal entity. The value of this property must match the production labeled Content in the XML specification; that is to say, it may contain markup, and is well-formed.

An example of the use of Value:

<InternalEntityDef>
  <Name>Warning</Name>
  <Value>Entity text <em>can</em> contain markup; references (e.g. &copy;)
      will in general be expanded unless protected, e.g. &amp;copy;</Value>
</InternalEntityDef>

3.5 Properties Which Apply to External Entity Definitions

3.5.1 Name

Gives the name by which the entity may be invoked. This property is required to be present for every ExternalEntity definition resource in DCD. The value of this property must be a Name in the XML sense. Furthermore, it must be an NCName as defined in [XML Namespaces]; that is to say, it may not contain a prefix or a colon.

3.5.2 PublicID

Provides a public identifier for the entity. This is a string whose syntax (see PublicID) and semantics are exactly as described in the XML specification.

3.5.3 SystemID

Provides a system identifier for the entity. This is a string whose syntax and semantics are exactly as described in the XML specification.

The SystemID property must be provided for every ExternalEntity resource in DCD.

4. Datatypes

4.1 Datatype Specifications

A number of datatypes are specified in this section. These are modeled after the datatypes supported by [SQL] and modern programming languages. Attributes and element types whose Model property has the value Data can constrain their values/contents to be instances of a particular datatype. XML 1.0 defines about 10 datatypes, which may only be used to constrain attribute values, and essentially one datatype, PCDATA, that can be used for element content. Here we propose a much richer set of datatypes, applicable equally to attribute and element content.

The specifications in this section serve a number of purposes:

Datatypes are referenced from the datatype namespace. In order to use this namespace in a schema, it must be declared. Some datatypes require that additional properties be specified: for example, length and precision for decimal, length for char, and legal values for enumeration. These should be specified as additional properties of the element or attribute being defined. See the final example in "3.2.6 Groups, Occurs and Order".

The DCD primitive datatypes are tabulated below.

Name | Examples | Parse type
id | X | XML ID
idref | X | XML IDREF
idrefs | X Y Z | XML IDREFS
entity | Foo | XML ENTITY
entities | Foo Bar | XML ENTITIES
nmtoken | Name | XML NMTOKEN
nmtokens | Name1 Name2 | XML NMTOKENS
enumeration (legal values must be specified) | Red Blue Green | XML ENUMERATION
notation | GIF | XML NOTATION
string | Give me liberty or give me death! | pcdata
number | 15, 3.14, -123.456E+10 | A number, with up to 31 digits. May optionally have a leading sign, fractional digits, and exponent. Punctuation as in US English. Leading and trailing blanks are removed before converting a number specified as a string. Similarly, leading and trailing zeroes are removed.
int | 1, 58502, -13 | A number, with optional sign, no fractions, no exponent.
fixed or decimal (precision and scale must be specified) | 12.0044 | Precision is the total number of digits. It may range from 1 to 31. Scale is the number of digits to the right of the decimal point and must be less than or equal to the precision.
boolean | 0, 1 (1=="true") | "1" or "0"
dateTime | 2088-04-07T18:39:09 | A date in a subset of ISO 8601 format, with optional time and no optional zone. Fractional seconds may be as precise as nanoseconds.
dateTime.tz | 2088-04-07T18:39:09-08:00 | A date in a subset of ISO 8601 format, with optional time and optional zone. Fractional seconds may be as precise as nanoseconds.
date | 2094-11-05 | A date in a subset of ISO 8601 format (no time).
time | 08:15:27 | A time in a subset of ISO 8601 format, with no date and no time zone. Fractional seconds may be as precise as nanoseconds.
time.tz | 08:15:27-05:00 | A time in a subset of ISO 8601 format, with no date but optional time zone. Fractional seconds may be as precise as nanoseconds.
interval | 2088-04-07T18:39:09 | A time interval which may have year, month, day, hour, minute and second fields. Fractional seconds may be as precise as nanoseconds.
i1, byte (1-byte integer) | 1, 127, -128 | A number, with optional sign, no fractions, no exponent.
i2 (2-byte integer) | 1, 703, -32768 | A number, with optional sign, no fractions, no exponent.
i4, int (4-byte integer) | 1, 703, -32768, 148343, -1000000000 | A number, with optional sign, no fractions, no exponent.
i8 (8-byte integer) | 1, 703, -32768, 1483433434334, -1000000000000000 | A number, with optional sign, no fractions, no exponent.
ui1 (unsigned 1-byte integer) | 1, 255 | A number, unsigned, no fractions, no exponent.
ui2 (unsigned 2-byte integer) | 1, 255, 65535 | A number, unsigned, no fractions, no exponent.
ui4 (unsigned 4-byte integer) | 1, 703, 3000000000 | A number, unsigned, no fractions, no exponent.
ui8 (unsigned 8-byte integer) | 1483433434334 | A number, unsigned, no fractions, no exponent.
r4 | .31415E+1 | Real number ranging from -3.402E+38 to -1.175E-37 or from 1.175E-37 to 3.402E+38
r8 | .314159265358979E+1 | Real number ranging from -1.79769E+308 to -2.225E-307 or from 2.225E-307 to 1.79769E+308
fixed.14.4 | 1.95 | A number with 14 digits to the left of the decimal point and 4 digits to the right of the decimal point. Convenient for representing monetary values.
uuid | 333C4-460F-11D0-BC04-0080CA83 | Hexadecimal digits representing octets. Optional embedded hyphens are allowed but ignored during conversion.
uri | urn:schemas-microsoft-com:Office9, http://www.ics.uci.edu/pub/ietf/uri/ | Universal Resource Identifier
bin.hex (length may be specified; default is unlimited) | | Hexadecimal digits representing octets
bin.base64 (length may be specified; default is unlimited) | | MIME style Base64 encoded binary blob.
char (length may be specified; default is 1) | char | Character string, n characters long
picture (picture must be specified) | 999-99-9999 | Constraint for validating strings. See "4.3 Picture Constraints".
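
The range and precision rules in the table can be checked mechanically. The following Python sketch is illustrative only and is not part of the DCD proposal. The function names are hypothetical, the integer bounds assume two's-complement ranges for the stated byte widths (with ui8 taken to be 8 bytes), and the fixed/decimal check assumes SQL-style semantics in which at most precision minus scale digits may appear to the left of the decimal point.

```python
import re

# Byte widths for the sized integer types (ui8 is taken to be 8 bytes,
# as its name suggests). These names mirror the table; the dict is hypothetical.
INTEGER_BYTES = {"i1": 1, "i2": 2, "i4": 4, "i8": 8,
                 "ui1": 1, "ui2": 2, "ui4": 4, "ui8": 8}


def integer_range(type_name):
    """Return the inclusive (low, high) bounds assumed for a sized integer type."""
    bits = 8 * INTEGER_BYTES[type_name]
    if type_name.startswith("u"):
        return 0, 2 ** bits - 1
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def is_valid_integer(type_name, text):
    """Check that text is an integer literal that fits the named type."""
    # Unsigned types take no sign; signed types take an optional one.
    pattern = r"[0-9]+" if type_name.startswith("u") else r"[+-]?[0-9]+"
    stripped = text.strip()
    if re.fullmatch(pattern, stripped) is None:
        return False
    low, high = integer_range(type_name)
    return low <= int(stripped) <= high


def fits_decimal(text, precision, scale):
    """Check a fixed/decimal literal against a precision and scale.

    Assumption: at most (precision - scale) digits before the decimal point
    and at most scale digits after it, as in SQL DECIMAL(p, s).
    """
    if not (1 <= precision <= 31 and 0 <= scale <= precision):
        raise ValueError("precision must be 1..31 and scale must be 0..precision")
    match = re.fullmatch(r"[+-]?([0-9]*)(?:\.([0-9]*))?", text.strip())
    if match is None:
        return False
    whole = match.group(1)
    fraction = match.group(2) or ""
    if not whole and not fraction:
        return False  # no digits at all, e.g. "" or "."
    return len(whole.lstrip("0")) <= precision - scale and len(fraction) <= scale
```

Under these assumptions the table's example 12.0044 fits a precision of 6 and a scale of 4, and -32768 is the smallest value accepted for i2.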

4.2 Datatypes in instances

The datatypes defined in "4. Datatypes" can also be used in instance datatype specifications as described in XML-Data [XML-Data]. For example:

<conversionRate DCD:dt="float">1.4172</conversionRate>

This provides the benefit of datatype support to well-formed documents that may not have an associated DTD or DCD. It is expected that XML parsers would provide assistance in encoding and decoding these datatypes.
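
As a rough illustration of the decoding assistance mentioned above, the following Python sketch reads the dt attribute of an instance element and converts the element content accordingly. It is a sketch under stated assumptions, not a DCD API: the namespace URI urn:example:dcd-datatypes, the CONVERTERS table and the typed_value function are all hypothetical, and only a handful of datatype names are handled.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI bound to the DCD prefix for this illustration.
DT_NS = "urn:example:dcd-datatypes"

# Hypothetical mapping from a few datatype names to Python converters.
CONVERTERS = {
    "float": float,
    "number": float,
    "int": int,
    "boolean": lambda text: text.strip() == "1",
    "string": str,
}


def typed_value(element):
    """Convert an element's text using its dt attribute, if one is present."""
    datatype = element.get("{%s}dt" % DT_NS)
    text = element.text or ""
    # Unknown or missing datatypes fall back to the raw string.
    return CONVERTERS.get(datatype, str)(text)


rate = ET.fromstring(
    '<conversionRate xmlns:DCD="urn:example:dcd-datatypes" '
    'DCD:dt="float">1.4172</conversionRate>'
)
value = typed_value(rate)
```

Because the datatype travels with the instance, no DTD or DCD needs to be consulted to obtain a typed value.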

4.3 Picture Constraints

"Pictures", similar to those in [COBOL] picture clauses, can be used to constrain the format of strings and in some cases control their conversion to numbers. A picture is an alphanumeric string consisting of character symbols. Each symbol, which is usually one character but may be two characters, is a placeholder that stands for a set of characters. For example, the picture "A" stands for a single alphabetic character.

The following is a list of picture symbols and their meanings.

A
A single alphabetic character.
B
A single blank character.
E
The character E, used to indicate floating point numbers.
S
The leftmost character of a picture indicating a signed number. The characters "+" or "-" may appear in the S position.
V
An implied decimal sign. The input 1234 validated by a picture 99V99 is converted into 12.34.
X
Any character.
Z
The leftmost leading numeric character that can be replaced by a space character when the content of that position is a zero.
9
Any numeric character.
1
Any boolean character (0 or 1).
"0", "/", "-", "." and ","
represent themselves.
cs
The currency symbol.

Here are some examples of picture constraints:

  $123,45.90 satisfies picture $999,99.99 
  $123,45.90 satisfies picture XXXX,XX.XX
  123-45-5678 satisfies picture 999-99-9999 (Social Security Number)
  24E80 satisfies picture 99E99 (floating point)
  23.45 satisfies picture 99.99
  2345 satisfies picture 99V99 (translates to 23.45)  
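
The symbol list and examples above can be approximated by translating a picture into a regular expression. The Python sketch below is one possible reading, not a normative algorithm. Its assumptions: "A" is limited to ASCII letters, "cs" matches only "$", any character not in the symbol list (such as the "$" in the first example) stands for itself, and the names SYMBOLS, picture_to_regex, satisfies and implied_decimal are hypothetical.

```python
import re

# One regular-expression fragment per single-character picture symbol.
SYMBOLS = {
    "A": "[A-Za-z]",  # assumption: ASCII letters only
    "B": " ",
    "E": "E",
    "S": "[+-]",
    "V": "",          # implied decimal point: consumes no input
    "X": ".",
    "Z": "[0-9 ]",    # a digit, or a space standing in for a leading zero
    "9": "[0-9]",
    "1": "[01]",
}


def picture_to_regex(picture):
    """Translate a picture string into a compiled regular expression."""
    parts = []
    i = 0
    while i < len(picture):
        if picture.startswith("cs", i):
            parts.append(r"\$")  # assumption: "$" is the only currency symbol
            i += 2
            continue
        ch = picture[i]
        # Characters outside the symbol list are treated as literals.
        parts.append(SYMBOLS.get(ch, re.escape(ch)))
        i += 1
    return re.compile("".join(parts))


def satisfies(value, picture):
    """Return True if the whole value matches the picture."""
    return picture_to_regex(picture).fullmatch(value) is not None


def implied_decimal(value, picture):
    """Insert the decimal point implied by V, e.g. 2345 with 99V99 gives 23.45."""
    if "V" not in picture or not satisfies(value, picture):
        raise ValueError("value does not satisfy a picture containing V")
    prefix = picture[:picture.index("V")]
    # "cs" occupies two picture characters but only one input character.
    position = len(prefix) - prefix.count("cs")
    return value[:position] + "." + value[position:]
```

With these assumptions, each of the six examples listed above is accepted by satisfies, and implied_decimal("2345", "99V99") yields the string 23.45.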


Material in the appendices represents issues that are still under discussion. This material should be considered for inclusion in a later version of DCD.

Appendices

A. Local Element Definitions

The specifications in this document only allow elements to be defined as properties of the DCD. A useful future direction may be to allow element definitions within the context of another element definition. Element definitions may be local or global. Global element definitions must have a Type property that is unique in the DCD and can be referred to by name in other definitions. Local element definitions can be used within the containing definition and can be referred to in other definitions by a resource identifier as described for attribute definitions in "2.3 Referring to Elements and Attributes".

For example, in the following, FirstName, MI and LastName are defined elsewhere in the DCD, but Address comes from a namespace declared with the prefix common. The Telephone element is defined locally within the person definition.

<ElementDef Type="person" Model="Elements" >
  <Group RDF:Order="Seq">
    <Element>FirstName</Element>
    <Group Occurs="Optional"><Element>MI</Element></Group>
    <Element>LastName</Element>
    <Element>common:Address</Element>
    <ElementDef Type="Telephone" Datatype="string"/>
  </Group>
</ElementDef>

B. Inheritance and Subclassing

An element type may be declared to re-use the content model declarations of other element types through the use of the extends property. This property effectively replaces itself with the entire content model of the element type it names. For example:

<ElementDef Type="polygon" Model="Elements">
  <AttributeDef Name="n" Occurs="Required"/>
  <AttributeDef Name="regularity"/>
  <Element>diagonals</Element>
</ElementDef> 

<ElementDef Type="regularPolygon" Model="Elements">
  <AttributeDef Name="regularity" Occurs="Required">
     <Default>regular</Default>
  </AttributeDef>
  <Element>side</Element>
  <Extends Type="polygon"/>
</ElementDef>

A legal instance of regularPolygon (in this case an empty equilateral triangle 3mm on a side) might be:

<regularPolygon n="3">
  <side><dimension unit='mm'>3</dimension></side>
  <diagonals/>
</regularPolygon>

Using extends also allows instances of the extending element type to occur anywhere the extended type is allowed. In the above example this means that any content model that allows polygon will also now allow regularPolygon. Furthermore, attributes declared on the extended element type may also occur on the extending element type, so in the example n can, in fact must, now appear on regularPolygon. For example, if in addition to the above example we have:

<ElementDef Type="picture">
  <Group Occurs="OneOrMore">
    <Element>polygon</Element>
  </Group>
</ElementDef>

then the following is a schema-valid instance:

<picture>
  <polygon n="3" regularity="irregular">...</polygon>
  <regularPolygon n="3">...</regularPolygon>
</picture>

Note that in the above examples, Element declarations occur directly within an ElementDef without an enclosing Group. We allow this to facilitate inheritance. The Element declaration opens a default Group. In fact, Element extends Group and inherits its properties.

We restrict the use of extends to cases where the merger of the two content models involved is straightforward.

  1. Either the extended element type must have Content="Open" or the extending element type must have no content at all, either explicit or inherited.
  2. If the extending element type has explicit content, the values of the order attribute must be consistent. The following table shows all the allowed values (if the extended element type has order with value Alt, no extension is possible):
    Extended | Extending
    Seq | Seq
    Bag | Bag; Seq
    Alt | Alt
  3. The values of the content attribute must be consistent, as follows:
    Extended | Extending
    Empty | Empty
    Data | Data; Empty
    Elements | Elements
    Any; Mixed | Any; Mixed; Data; Elements
  4. Allowed attributes and datatype constraints (see "4. Datatypes") are cumulative, that is, all apply. Attributes of the same name are merged: the only difference allowed is that an attribute in the extending declaration may provide and/or require a default where the extended declaration does not. Multiple datatype constraints, whether for content or for an attribute, must be intelligibly combinable (see "4. Datatypes").
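
Rule 3 in the list above amounts to a lookup from the content value of the extended type to the set of values permitted on the extending type. A minimal Python sketch of that lookup follows; the names ALLOWED_CONTENT and content_is_consistent are hypothetical, and the sketch covers only rule 3, not the order or attribute rules.

```python
# Content value of the extended type -> content values allowed on the
# extending type, transcribed from the table under rule 3.
ALLOWED_CONTENT = {
    "Empty": {"Empty"},
    "Data": {"Data", "Empty"},
    "Elements": {"Elements"},
    "Any": {"Any", "Mixed", "Data", "Elements"},
    "Mixed": {"Any", "Mixed", "Data", "Elements"},
}


def content_is_consistent(extended, extending):
    """Return True if the extending content value is allowed for the extended one."""
    return extending in ALLOWED_CONTENT.get(extended, set())
```

For instance, a type with Data content may be extended by one with Empty content, but not the other way round.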

Consistent with the above remark about the extending element type being allowed anywhere the extended one is, the guiding principle is that anything allowed by the extending declaration would also be allowed by the extended one if the tag was changed. That is, the extending type is polymorphic to the extended type. Thus, if we rename regularPolygon to polygon in the first example above, we get a schema-valid polygon:

<polygon n="3">
  <side><dimension unit='mm'>3</dimension></side>
  <diagonals/>
</polygon>

It's legal as a polygon, because it has everything a polygon requires (n attribute, diagonals sub-element), and the side sub-element is permissible because polygon has, by default, open Content.

Note that a single ElementDef can contain multiple extends. This does not cause ambiguity -- effectively, the extended content model is dropped in as a group in the relevant place in the extending model.

C. Null Values

For several situations, especially in mapping data from a database into XML, we need to handle the case where the value is not specified. This is different from a numeric value being zero or a string being empty.

If the element or attribute is not Required then it can just be omitted. If it is Required or if it has a default value then it is desirable to be able to indicate that its value in the database was undefined. This can be done by defining a special attribute to signal this condition. If an element is involved then the special attribute is an attribute of the element. If an attribute is involved it is another attribute of the parent element. In either case, the special attribute takes one of two values: "True" or "False".

Consider the case of a required Salary element. A Salary element whose value is missing would appear as:

<Employee>
    ...
  <Salary DCD:null="True"/>
</Employee>

If Salary was a required attribute on, say, an Employee element then we would need to define another attribute on Employee called, say, Salary_null.

If the element or attribute had a default value, the value would appear along with the null attribute set to True.

Similarly, special attributes can be defined to indicate errors in data conversion.
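
A consumer of such documents would check the null attribute before reading a value. The Python sketch below shows one way to do this for the Salary element case. It is illustrative only: the namespace URI urn:example:dcd bound to the DCD prefix and the function names is_null and salary_of are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI bound to the DCD prefix for this illustration.
DCD_NS = "urn:example:dcd"


def is_null(element):
    """Return True if the element carries the null attribute with value True."""
    return element.get("{%s}null" % DCD_NS) == "True"


def salary_of(employee):
    """Return the Salary text, or None when it is absent or flagged as null."""
    salary = employee.find("Salary")
    if salary is None or is_null(salary):
        return None
    return salary.text


employee = ET.fromstring(
    '<Employee xmlns:DCD="urn:example:dcd">'
    '<Salary DCD:null="True"/>'
    '</Employee>'
)
```

Here the null flag takes precedence over any content, so a default value that appears alongside a True null attribute is ignored.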

D. Unique Values

In current XML, the values of attributes of type ID are unique within a document. Unique attribute and element types are very important, and uniqueness should be extended to any named attribute and element type, with the ability to specify the scope of the uniqueness. For elements, a uniqueness specification applies only if the model type is Data; that is, it does not apply to elements that have structure. Particular implementations can use unique element and attribute types to define keys to speed up searches.

Essentially, when defining an attribute type we can specify that its value is unique within a particular element type.

<AttributeDef UniqueIn="Company" Global="True">
  <Name>SerialNumber</Name>
  <Datatype>int</Datatype>
</AttributeDef>

Company is the name of an element type defined within the DCD. This specifies that the SerialNumber attribute is unique within Company elements in documents conformant with this DCD. The default value of the UniqueIn attribute is "null", which signifies the entire document. Thus, the default behavior is the current XML behavior.
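
The scoping behavior described above can be sketched as a check that collects attribute values per scope and reports any that repeat. The following Python code is a hypothetical illustration rather than a prescribed algorithm: the function name duplicate_values and the sample document are invented for the example, and a unique_in of None stands in for the "null" default (the entire document).

```python
import xml.etree.ElementTree as ET
from collections import Counter


def duplicate_values(root, attribute, unique_in=None):
    """Return attribute values that occur more than once within a scope.

    With unique_in=None the scope is the whole document; otherwise each
    element of type unique_in is checked as a separate scope.
    """
    scopes = [root] if unique_in is None else list(root.iter(unique_in))
    duplicates = []
    for scope in scopes:
        counts = Counter(
            element.get(attribute)
            for element in scope.iter()
            if element.get(attribute) is not None
        )
        duplicates.extend(sorted(v for v, n in counts.items() if n > 1))
    return duplicates


document = ET.fromstring(
    "<Companies>"
    '<Company><Item SerialNumber="1"/><Item SerialNumber="2"/></Company>'
    '<Company><Item SerialNumber="1"/></Company>'
    "</Companies>"
)
```

In this sample, SerialNumber 1 appears in two different Company elements, which is acceptable when the scope is Company but is a duplicate when the scope is the entire document.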

E. References

COBOL
COBOL Standard. See http://www.dkuug.dk/jtc1/sc22/wg4/.
SQL
SQL Standard. See http://www.jcc.com/sql_stnd.html.
RDF
RDF Model and Syntax. See http://www.w3.org/TR/WD-rdf-syntax.
Unicode
Unicode Standard. See "The Unicode Standard, Version 2.0", Reading, Mass., Addison-Wesley Developers Press, 1996.
XML-Data
XML-Data. See http://www.w3.org/TR/1998/NOTE-XML-data-0105/.
XML Namespaces
Namespaces in XML. See http://www.w3.org/TR/WD-xml-names.

F. Acknowledgements

This work is totally dependent on the whole lineage of metadata thinking in the World Wide Web Consortium. This specification has benefited greatly as a result of input from David Fallside and David Singer, both of IBM, Andrew Layman and Jean Paoli, both of Microsoft, and from Lauren Wood of SoftQuad. We also wish to thank Henry Thompson of the University of Edinburgh and all the authors of the XML-Data specification [XML-Data].