This document is a NOTE made available by the W3 Consortium for discussion only. This indicates no endorsement of its content, nor that the Consortium has, is, or will be allocating any resources to the issues addressed by the NOTE.
This document is a submission to W3C from Veo Systems Inc. Please see Acknowledged Submissions to W3C regarding its disposition.
Comments on this document should be sent to schema@veosystems.com.
This document proposes a schema facility, Schema for Object-oriented XML (SOX), for defining the structure, content and semantics of XML documents to enable XML validation and higher levels of automated content checking. The SOX proposal is informed by the XML 1.0 [XML] specification as well as the XML-Data submission [XML-Data], the Document Content Description submission [DCD] and the EXPRESS language reference manual [ISO-10303-11].
SOX provides an alternative to XML DTDs for modeling markup relationships to enable more efficient software development processes for distributed applications. SOX also provides basic intrinsic datatypes, an extensible datatyping mechanism, content model and attribute interface inheritance, a powerful namespace mechanism, and embedded documentation. As compared to XML DTDs, SOX dramatically decreases the complexity of supporting interoperation among heterogeneous applications by facilitating software mapping of XML data structures, expressing domain abstractions and common relationships directly and explicitly, enabling reuse at the document design and the application programming levels, and supporting the generation of common application components.
A SOX document, or schema, is a valid XML document instance according to the SOX DTD, that represents a complete XML DTD-like structure. It has a document root element, and a representation of syntax that one would expect from a complete DTD, symbolically generated through the XML document instance.
"In SGML the 'DTD' defines, for an SGML element, what possible other elements may be nested inside it. For example, in an invoice, it may specify that the signing authority must be either Tom or Joe. It may specify that an item can be any part number or any accessory number or any book number. Checking the SGML validity of a document is a process which can be done automatically from the DTD. This is a check at a certain low level in that it does not verify semantic correctness, only structural correctness. But the structural constraints alone are useful in many ways. For example, a user interface for constructing a document can be generated automatically from the structural constraints.
"We plan to introduce more powerful languages for describing not only the structure of a document, but the semantics to an extent that not only can checking be automated to a higher level, but also so can the processing of a document and reasoning about its contents be automated. ..." From Web Architecture: Extensible Languages [WEBARCH-EXTLANG], Tim Berners-Lee and Dan Connolly
Automated processing of business documents in large-scale electronic commerce environments requires rigorous definition of the document structure, content and semantics to enable efficient software development processes for distributed applications. XML offers the Document Type Definition (DTD) as a formalism for defining the syntax and structure of XML documents. However, experience has shown that XML DTDs are not sufficient to specify content or semantics. Moreover, the fact that XML DTD syntax is incompatible with XML document syntax increases the complexity of supporting interoperation among heterogeneous applications. Therefore, a schema facility is required to enable XML validation and higher levels of automated content checking by facilitating software mapping of XML data structures, supporting the generation of common application components, and enabling reuse at the document design and the application programming levels.
Schema for Object-oriented XML (SOX) is now being proposed not only as an XML instance replacement syntax for SGML [8859-1] and XML [XML] document type definitions, but also as a modeling language for information modeling itself. Information modeling is the domain, and therefore the domain-specific constructs provided are those which aid in that task. SOX provides intrinsic datatypes, an extensible datatyping mechanism, content model and attribute interface inheritance, a powerful namespace mechanism, and embedded documentation.
SOX documents can be operated on by a SOX processor to produce many different types of output targets. Transformation of SOX documents will yield XML DTDs and object-oriented language classes to facilitate the development of intelligent applications, such as those needed to perform electronic commerce. Other output targets of a schema include documentation derived from the documentation-based elements in SOX itself, and user interface components. Further output targets are yet to be defined, but the inherent flexibility of this schema language allows for many other options.
This submission is a collaborative work based on implementation experience at Veo Systems Inc., bringing together experts from the complementary disciplines of electronic commerce, markup languages, formal language theory, SGML systems development, and distributed software systems development. The development of this schema language was begun by Murray Maloney in December, 1997 to satisfy the need for a single XML-based language capable of expressing sufficient information to define simple or complex data definitions, structures and formats, universally usable names or identifiers, and documentation. In early 1998, Matt Fuchs began implementing a processor to derive DTDs, documentation, and programming language interfaces from Common Business Library schemas defined by Terry Allen. Based on experience gained building a processor that generates Java beans from SOX documents, and also his earlier work at Disney and New York University, Matt Fuchs invented many object-oriented extensions that make the SOX inheritance features possible. Alex Milowski suggested the concept of parameterized element types and a syntax for encoding this concept that led to a further refinement of the object-oriented extensions. Terry Allen's practical experience creating schemas fed back into an ongoing refinement. The software development team at Veo, inspired by CTO Bart Meltzer, provided critical feedback.
The result, Schema for Object-oriented XML (SOX), is now offered to the World Wide Web Consortium (W3C) as a formal submission. We trust that you will deem it worthy of consideration and deliberation in the XML Activity's upcoming round of working drafts on schemas, namespaces, data models, and datatypes.
The goals of SOX are:
The requirements for SOX are:
SOX is more expressive than XML DTDs in the following critical areas:
SOX provides for parameterized base element types that can be used to build a foundation of regular patterns in your SOX documents. It allows you to create fully parameterized and complex types to describe the storage patterns that best suit your information-based applications. You can define patterns such as tuples and triples, tabular and columnar data, business documents, indexes, bibliographies. Parameterization allows you to reuse the structure with different content model atoms in another document. Extending base element types allows you to add attributes or further specialize the attribute datatype, enumeration and presence. Code reuse on extended base element types is much higher than without.
SOX offers an extensive and extensible set of datatypes that may be applied to data content elements and attribute types. The purpose of datatypes is to provide a contract between parties as to the constraints that are applicable to data content in elements and attribute values. These constraints may be used by a content validation engine, prior to dispatch or upon receipt of an XML document or by user interface methods.
There are three varieties of datatypes in SOX documents: scalar datatypes, enumerated datatypes and format datatypes. Scalar datatypes are derived from the basic number datatype, and support specification of the number of digits and decimal places, minimum and maximum value range, and a mask. An enumerated datatype may be derived from any of the intrinsic datatypes, and may specify an enumeration of valid values. A format datatype may be derived from any of the intrinsic datatypes, and must specify a mask.
SOX provides an extended list of intrinsic datatypes for attributes. Datatype extensibility is built upon a basic set of datatypes (binary, boolean, char, date, number, string, time) commonly used in many programming environments. The extended list includes derivations of these basic datatypes, including specializations of numbers, dates and strings. User-defined datatypes may be defined by specifying a base datatype, scale parameters or an enumeration of values, and a lexical format.
See Datatypes, Datatype masks and Datatype library for complete details.
Definitions provide for accompanying documentation through the intro and explain elements. Permitted within these two element types is a collection of familiar and easy to use HTML [HTML-4] element types. Anybody who writes HTML today will be able to write SOX documentation. Moreover, the application of W3C's page authoring guidelines [WAI-PAGEAUTH] for HTML facilitates accessibility. The importance of the embedded documentation technique, or Literate Programming, must not be overlooked. In the right hands, this technique can be used for design, implementation and testing in both rapid prototyping and large-scale development projects.
SOX offers a reduced and restricted subset of the types of entity functionality that is offered by XML. The requirements for XML-style parameter entities have been addressed, either through specializations of some XML entity capability in distinct element types, or by introduction of new language features (such as element and attribute inheritance, enumerations and datatypes), that obviate the need for XML-style parameter entities. In particular, the parameter definitions are constrained to contain only a content model atom. The parameter and paramref element types can be used to define and reference a content model fragment, in much the same way that a parameter entity might be used.
Parsed and unparsed entities may be defined in SOX documents. Entity support is provided to enable simple mapping to/from XML DTDs. There is, however, some question as to the value and life expectancy of the XML unparsed entity approach, which uses a baroque and indirect definition and reference mechanism.
In SOX, any attribute or datatype may provide an enumerated list of values for selection. XML provides enumerated lists only for NMTOKEN attributes.
In SOX, URIs, URLs, and URNs are provided as a matter of course. Support for HTML anchors and XML Linking is facilitated through inheritance and specialization of hypertext attributes that can be conveniently arranged in neat attribute interfaces.
In SOX, element types may inherit their content models and attribute definitions directly from another named element type. An element type may also inherit and extend an attribute list. Specialization of attribute definitions allows refinement and restriction of attribute datatype, enumeration list and default value. Additionally, an attribute value may be defined to be inherited from the identically named attribute on a parent or older ancestor element. Thus, for example, namespaces can be inherited from superordinate elements.
The SOX namespace is fully and precisely defined. Objects from any identifiable namespace may be used in building a SOX document. That is, any element, attribute, datatype, enumeration, entity, interface, notation, parameter, or processing instruction may be imported from any namespace.
Pending future developments in the evolution of XML and its related specifications, this submission does not include several important features that were deemed desirable.
The terminology used to describe SOX documents is defined in the body of this specification. The terms defined in the following list are used in building those definitions and in describing the actions of a SOX processor:
This section describes the basics of SOX. Before getting into the technical details, we examine a SOX document example. This is a high-level view of what a SOX document looks like and what the building blocks are. Some terminology and definitions will be introduced here.
The example presented here is a memorandum. This example was chosen because most people are familiar with the components of a memorandum. Most readers will be better able to understand the concepts that are being presented if they are not burdened by having to also try to understand the document type that is being modelled.
A version of this example without annotations is available in a non-normative appendix.
<schema name="memo" namespace="http://www.veosystems.com/schemas/memo.xml"> <h1>Memo Document Type</h1>
Every SOX document begins with the root element schema, and a top-level heading that provides a title for the SOX document. The schema element may be used to establish the namespace identifier for a SOX document.
<h2>Definitions</h2> <intro> <p>...</p> <ul> <li>...</li> <li>...</li> </ul> </intro>
Lower-order headings with intro elements may be interspersed among the SOX document's defining elements to provide bridging titles and an introduction. Aside from a handful of custom elements, SOX documentation uses familiar HTML element types for convenience in cut/paste operations, training of average designers and engineers, and straightforward conversion from SOX documents to HTML documents.
The rules for including documentation in SOX documents are fairly simple:
<h3>Memo element type</h3> <elementtype name="memo"> <explain> <title>Memo Document</title> <synopsis>A simple, useful memo.</synopsis> <help> <p>Fill in attributes, enter paragraphs, lists and images, and press SEND.</p> </help> <p>A memo consists of six required fields and a body.</p> </explain>
Here, we are defining an element type whose name is "memo".
<model> <sequence> <element name="to"/> <element name="from"/> <element name="cc"/> <element name="subject"/> <element name="file"/> <element name="date"/> <element name="body"/> </sequence> </model> </elementtype>
Our memo's content model is a sequence of seven subordinate elements. As you can see in the following fragment, the models of the majority of these element types simply contain text strings.
<h3>Memo fields</h3> <elementtype name="to"> <model><string/></model> </elementtype> <elementtype name="from"> <model><string/></model> </elementtype> <elementtype name="cc"> <model><string/></model> </elementtype> <elementtype name="subject"> <model><string/></model> </elementtype>
But notice here that the specifications of the file and date elements' string content are slightly different. The file element's string content specifies its datatype attribute to be number. That means that the value must be a number. The datatype of the date element's string content is specified to be a date. That means that the content must match the datatype definition for calendar dates.
<elementtype name="file"> <model><string datatype="number"/></model> </elementtype> <elementtype name="date"> <model><string datatype="date"/></model> </elementtype>
The body of the memo is a bit richer, allowing a choice of paragraphs, lists and images. The value of the occurs attribute specifies the minimum and maximum occurrence, or number of times, that the choice group may be used.
<elementtype name="body"> <model> <choice occurs="1,*"> <element name="p"/> <element name="list"/> <element name="image"/> </choice> </model> </elementtype>
The content model of a paragraph is simply a string.
<elementtype name="p"> <model> <string/> </model> </elementtype>
Getting a bit more creative, and a bit prescriptive, we require that a list have at least three items and no more than nine. (This is a fairly typical editorial style rule in many organizations. We are just using it here to demonstrate.)
<elementtype name="list"> <model> <element name="item" occurs="3,9"/> </model> </elementtype>
Now, we want an instance of an item to be just like an instance of a paragraph, so we say so to make it so.
<elementtype name="item"> <instanceof name="p"/> </elementtype>
The image is an empty element, and the required value of its src attribute must be a URI according to the datatype.
<elementtype name="image"> <empty/> <attdef name="src" datatype="URI"> <required/> </attdef> </elementtype>
</schema>
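To make the annotated schema concrete, here is a sketch of an XML instance that appears to satisfy it. The field values and the image URL are invented for illustration, the namespace declaration is omitted, and the lexical form of the date is an assumption, since the date datatype is not defined in this section.

```xml
<!-- Hypothetical instance of the memo document type defined above -->
<memo>
  <to>Engineering staff</to>
  <from>Schema working group</from>
  <cc>Documentation team</cc>
  <subject>Memo schema review</subject>
  <!-- file must be a number -->
  <file>1042</file>
  <!-- date format is assumed; the date datatype defines the real one -->
  <date>1998-09-30</date>
  <!-- body allows one or more paragraphs, lists and images -->
  <body>
    <p>Please review the draft memo schema.</p>
    <!-- a list must have between three and nine items -->
    <list>
      <item>Check the required fields.</item>
      <item>Check the datatypes.</item>
      <item>Check the list limits.</item>
    </list>
    <!-- image is empty and its src attribute is required -->
    <image src="http://www.example.com/images/memo.gif"/>
  </body>
</memo>
```

A list with only two items, or a memo without a file element, would be rejected by a processor that enforces the occurrence constraints and the sequence model.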
Even if you can already see that simple things are fairly simple to do, and you thought that you had seen enough to sell you on the virtues of writing SOX documents, there is in fact much more in the Schema for Object-oriented XML. Read on!
In SOX documents, element type definitions reproduce the expressiveness of XML element type declarations using explicit element and attribute markup. An element type may be defined, as shown in this example, by using the elementtype element with the required name attribute, and a subordinate model, instanceof or extends element:
<elementtype name="inline"> <model> <string/> </model> </elementtype>
A mechanism for attaching attributes to an element type is described later in Attribute definitions.
The name of an element type may be any valid unqualified XML element type name. The name must be unique among the names of element types defined in the current SOX document.
An element type may be referenced by the element, extends and instanceof elements. Provision for namespace qualification of element type references is discussed in Names and namespaces. The local part of an element type name is specified when defining an element type.
It is a fatal error to re-assign an element name, or to reference an element that has not been defined.
The content model of an element type defines the structure and composition of an element of that type in an XML instance. The definition of a content model in SOX documents extends the expressiveness of that in XML DTD by providing greater specificity of the minimum and maximum number of times some content model atom may be repeated. This allows a schema designer more precise control than that offered by XML's *, ? and + occurrence indicators.
In the following example, the definition of the content model for a list element type specifies that it contains a minimum of 3 and a maximum of 9 item elements.
<elementtype name="list"> <model> <element name="item" occurs="3,9"/> </model> </elementtype>
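For comparison, the same constraint can only be spelled out by enumeration in an XML DTD. The declaration below is one possible hand-written equivalent, not necessarily what a SOX processor would generate; the optional items are nested so that the content model stays deterministic, as XML 1.0 requires.

```xml
<!-- occurs="3,9": three required items, then up to six more.
     The optional items are nested rather than written as
     item?, item?, ... because that form would be ambiguous. -->
<!ELEMENT list (item, item, item,
                (item, (item, (item, (item, (item, item?)?)?)?)?)?)>
```

The single occurs="3,9" attribute carries the same information and remains readable as the bounds grow.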
In the following example, the b element type's content model is simply string content.
<elementtype name="b"> <model> <string/> </model> </elementtype>
In this example, the size element type's content model is string content that is constrained to be an int.
<elementtype name="size"> <model> <string datatype="int" /> </model> </elementtype>
In this example, the postcode element type's content model is string content that is constrained to match the mask (e.g., L1W 3K6).
<namespace name="canada" namespace="www.canadapost.ca/schemas/postcodes.xml" /> <elementtype name="postcode"> <model> <string> <mask>A#A #A#</mask> </string> </model> </elementtype>
In this example, the conference element type's content model is string content with a default value of "XML Developers' Days". A SOX processor must provide support for default, inherited, and fixed presence elements when modelling a string. This feature is useful for data entry applications such as a program for forms entry or a text editor. Such an application could insert the default, inherited or fixed value for the form field or when the element is inserted within a document.
<namespace name="gca" namespace="www.gca.org/schemas/xmldevdays.xml" /> <elementtype name="conference"> <model> <string> <default>XML Developers' Days</default> </string> </model> </elementtype>
In this example, the p element type's content model is mixed content.
<elementtype name="p"> <model> <mixed> <element name="a"/> <element name="b"/> <element name="i"/> </mixed> </model> </elementtype>
Note: Even though mixed content consists of string and element content, the string element is not mentioned in the mixed content model. This is partly an optimization and largely a constraint to prevent inadvertent specification of a datatype, mask and presence for string content within mixed content. The implications of such a combination are unclear, so it is best avoided.
In this example, the dl element type's content model specifies that dt or dd elements are allowed any number of times.
<elementtype name="dl"> <model> <choice occurs="*"> <element name="dt"/> <element name="dd"/> </choice> </model> </elementtype>
In this example, the dl element type's content model specifies that dt followed by dd is allowed any number of times.
<elementtype name="dl"> <model> <sequence occurs="*"> <element name="dt"/> <element name="dd"/> </sequence> </model> </elementtype>
In this example, the dl element type's content model specifies that a dh is followed by two or more dt or dd elements.
<elementtype name="dl"> <model> <sequence> <element name="dh"/> <choice occurs="2,*"> <element name="dt"/> <element name="dd"/> </choice> </sequence> </model> </elementtype>
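As a rough illustration of the last definition, the following instance would satisfy it, on the assumption that dh, dt and dd are each defined elsewhere with string content (their definitions are not shown in this section). The text values are invented.

```xml
<!-- One dh, followed by at least two dt or dd elements in any order -->
<dl>
  <dh>Terms</dh>
  <dt>SOX</dt>
  <dd>Schema for Object-oriented XML</dd>
  <dt>DTD</dt>
  <dd>Document Type Definition</dd>
</dl>
```

A dl that contained the dh and only a single dt would violate the occurs="2,*" constraint on the choice group.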
A content specification of any or empty may be used, rather than a content model, in an element definition. In that case, the model element is not required.
In the following example, the any content specification indicates that the HTML element may contain any combination of string content and any element that is defined in the schema.
<elementtype name="HTML"> <any/> </elementtype>
In the following example, the empty content specification indicates that the BR element may not contain any content.
<elementtype name="BR"> <empty/> </elementtype>
In the following example, first the inline element is defined, then the emphasis and strong elements inherit their definitions from inline.
<elementtype name="inline"> <model><string/></model> </elementtype> <elementtype name="emphasis"> <instanceof name="inline"/> </elementtype> <elementtype name="strong"> <instanceof name="inline"/> </elementtype>
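One way to read this inheritance is through the DTD that a SOX processor might derive from it. The declarations below are a sketch of that output under the assumption that string content maps to #PCDATA; the exact mapping rules are not given in this section.

```xml
<!-- emphasis and strong receive the same content model as inline -->
<!ELEMENT inline   (#PCDATA)>
<!ELEMENT emphasis (#PCDATA)>
<!ELEMENT strong   (#PCDATA)>
```

In the DTD the relationship among the three element types is no longer visible, whereas the SOX document records it explicitly.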
In the following example, the a element extends the previously defined inline with an attribute definition.
<elementtype name="a"> <extends name="inline"> <attdef name="href" datatype="uri"> <required/> </attdef> </extends> </elementtype>
Parameters may be scoped to a base element type or to the namespace.
In this example, the p1 parameter is defined as an element content model atom. The p element contains a parameter reference to p1 in its content model. The effect of this is that the element atom from the parameter is substituted for the parameter reference, and the content model includes the a element.
<parameter name="p1"> <element name="a" /> </parameter> <elementtype name="p"> <model> <mixed> <element name="emphasis"/> <element name="strong"/> <paramref scope="namespace" name="p1"/> </mixed> </model> </elementtype>
A parameterized base element type is an element type whose content model contains element-scoped parameter references. Such an element type cannot be instantiated and must be extended to be useful. Also note that an element that is based on a base element type must define all of its parameters; failure to do so is a fatal error.
In this example, block is a base element type. Its mixed content model may contain emphasis and strong elements. When the base element type is extended, the defined value of the element-scoped parameter, p1, replaces the parameter reference.
<elementtype name="block"> <model> <mixed> <element name="emphasis"/> <element name="strong"/> <paramref name="p1" scope="element"/> </mixed> </model> </elementtype> <elementtype name="p"> <extends name="block"> <parameter name="p1"> <element name="a" /> </parameter> </extends> </elementtype>
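The effect of the substitution can be pictured as the DTD declaration that might result for p once p1 has been replaced by its defined value. This is an illustrative sketch only; it assumes that a mixed model maps to an XML mixed-content declaration.

```xml
<!-- p extends block; the paramref to p1 is replaced by the a element -->
<!ELEMENT p (#PCDATA | emphasis | strong | a)*>
```

Because block cannot be instantiated, a generated DTD would presumably contain no declaration for block itself.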
Attribute definitions in SOX documents may be defined as part of the element type definition. An attribute definition has a name and a type, and must include a presence element.
<elementtype name="image"> <empty/> <attdef name="id" datatype="ID"> <implied/> </attdef> </elementtype>
An attribute's name must be unique among the attributes of its host element type or interface. It is a fatal error to attempt to re-assign an attribute name within its respective scope, except when specializing the attribute.
The attribute's datatype may be any valid XML attribute type (ID, IDREF, IDREFS, ENTITY, ENTITIES, NMTOKEN, NMTOKENS, NOTATION), any extended attribute type (ATTRIBUTE, DATATYPE, ELEMENT, INTERFACE, NAME, NAMESPACE), or any other intrinsic or user-defined datatype.
In SOX documents, unlike XML DTDs, enumerations may be specified for any attribute type. This information will be lost when an XML DTD is generated from a SOX document, except for attributes of type NMTOKEN and NOTATION. However, it may be used by an application to provide an ancillary level of validation, or by a user-interface mechanism to provide appropriate I/O methods.
Any attribute definition may specify an enumerated list of values, even strings. These enumerations are modelled after the HTML form's select element type which effectively provides a menu.
In the following example, the size attribute offers a choice among NMTOKEN values, and the topping attribute offers a selection of STRING values.
<elementtype name="pizza"> <empty/> <attdef name="size" datatype="NMTOKEN"> <enumeration> <option>small</option> <option>medium</option> <option>large</option> <option>party</option> </enumeration> <required/> </attdef> <attdef name="topping" datatype="STRING"> <enumeration multiple="true"> <option>green pepper</option> <option>mushroom</option> <option>onion</option> <option>pepperoni</option> <option>pineapple</option> </enumeration> <implied/> </attdef> </elementtype>
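The loss of information described above can be sketched with the DTD that might be generated for pizza. The mapping of the STRING datatype to CDATA is an assumption made for illustration.

```xml
<!-- The NMTOKEN enumeration survives as an enumerated attribute type -->
<!-- The STRING enumeration cannot be expressed and is reduced to CDATA -->
<!ELEMENT pizza EMPTY>
<!ATTLIST pizza
  size    (small | medium | large | party) #REQUIRED
  topping CDATA                            #IMPLIED>
```

An instance such as a pizza element with size="large" and topping="mushroom" is valid against both forms, but only the SOX definition allows an application to reject an unlisted topping.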
An attribute value's presence in an instance may be specified as default, fixed, implied, or required as in [XML], or inherited.
<elementtype name="pizza"> <empty/> <attdef name="size" datatype="NMTOKEN"> <enumeration> <option>small</option> <option>medium</option> <option>large</option> <option>party</option> </enumeration> <default>small</default> </attdef> </elementtype>
<elementtype name="glossary"> <model>....</model> <attdef name="id" datatype="ID"> <fixed>glossary</fixed> </attdef> </elementtype>
<elementtype name="A"> <model>....</model> <attdef name="A" datatype="ID"> <implied/> </attdef> <attdef name="href" datatype="uri"> <implied/> </attdef> </elementtype>
This attribute default value type becomes #IMPLIED in the generated DTD, but it may be used by an application to signal that the value of this attribute, if not specified, should be taken to be the value of an attribute whose name matches and has a specified value, and which is attached to the nearest ancestor element of the attribute's host element for which that is true; or no value if no such attribute exists in the host element's ancestry. That is, the value of an attribute of type INHERITED is scoped to the element on which it occurs.
For example:
<elementtype name="child"> <model> <sequence> <element name="child" occurs="*" /> </sequence> </model> <attdef name="family" datatype="STRING"><inherited/></attdef> <attdef name="given" datatype="STRING"><required/></attdef> </elementtype>
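A hypothetical instance shows how an application might resolve the inherited family attribute. The names are invented, and the resolution described in the comments follows the nearest-ancestor rule stated above.

```xml
<child family="Smith" given="Ann">
  <!-- no family specified: taken from the parent, "Smith" -->
  <child given="Bob"/>
  <!-- family specified explicitly: "Jones" -->
  <child family="Jones" given="Cy">
    <!-- nearest ancestor with a specified value is Cy: "Jones" -->
    <child given="Dee"/>
  </child>
</child>
```

A validating XML parser working from the generated DTD would simply report the family attribute as absent on Bob and Dee; supplying the inherited value is left to the application.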
<elementtype name="xref"> <model> <string/> </model> <attdef name="xref" datatype="URI"> <required/> </attdef> </elementtype>
An attribute interface is similar to one of the uses for an XML parameter entity, but far more powerful than that. An attribute interface is a named object that contains one or more attribute definitions.
The local part of an attribute interface name is assigned by defining an attribute interface. An attribute interface name may be referenced by the implements element.
It is a fatal error to re-assign an interface name, or to reference an attribute interface that has not been defined.
<interface name="anchor"> <attdef name="href" datatype="uri"><implied/></attdef> <attdef name="name" datatype="ID"><implied/></attdef> </interface>
Given the interface defined in the previous example, we can implement that interface in a specific element type definition. In this example, the A element type specifies that it implements the attributes defined in the anchor attribute interface.
<elementtype name="A"> <model><string/></model> <implements name="anchor"/> </elementtype>
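Implementing the anchor interface gives A the same attributes as if they had been written out in its own definition. A DTD derived from it might look like the following sketch, which assumes that the uri datatype maps to CDATA.

```xml
<!-- Attributes contributed by the anchor interface -->
<!ELEMENT A (#PCDATA)>
<!ATTLIST A
  href CDATA #IMPLIED
  name ID    #IMPLIED>
```

Any other element type that implements anchor would receive an identical pair of attribute declarations, which is where the interface resembles a parameter entity holding an attribute list.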
We can specialize the attributes in an interface. In this example, the LINK element type specifies that the href attribute is now required, and the name attribute is fixed as a null value.
<elementtype name="LINK"> <model><string/></model> <implements name="anchor"> <attdef name="href" datatype="URI"> <required/> </attdef> <attdef name="name" datatype="ID"> <fixed></fixed> </attdef> </implements> </elementtype>
We can also specialize an attribute when extending an element. In the following example, the para element's label attribute is implied. The note element extends para and specializes the label attribute by specifying a default value. The warning element extends para and specializes the label attribute by specifying a fixed value.
<elementtype name="para"> <model><string/></model> <attdef name="label"><implied/></attdef> </elementtype> <elementtype name="note"> <extends name="para"> <attdef name="label"><default>Note: </default></attdef> </extends> </elementtype> <elementtype name="warning"> <extends name="para"> <attdef name="label"><fixed>Warning: </fixed></attdef> </extends> </elementtype>
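In DTD terms, the three presence values correspond to three different attribute defaults. The declarations below sketch what might be generated; the label attribute has no datatype in the example, so CDATA is assumed here.

```xml
<!-- para: label is optional and has no default -->
<!ATTLIST para    label CDATA #IMPLIED>
<!-- note: label defaults to "Note: " but may be overridden -->
<!ATTLIST note    label CDATA "Note: ">
<!-- warning: label is always "Warning: " -->
<!ATTLIST warning label CDATA #FIXED "Warning: ">
```

An instance of note written without a label attribute is therefore treated as if label="Note: " had been specified, while a warning that specifies any other label value is invalid.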
There are some rules that apply when specializing an attribute definition inside of an extends or implements element.
The intrinsic datatypes define the domains of the atomic data units in SOX documents.
SOX documents provide a mechanism for defining datatypes that can be used to specify the datatype of an attribute or element string content. User-defined datatypes may only be derived from the intrinsic datatypes. A SOX processor must be capable of generating code to perform validation on the values of user-defined datatypes.
The local part of a datatype name is specified in the name attribute of the datatype element. A datatype name may be referenced in the datatype attribute of the attdef, enumeration, format, scalar, and string elements.
It is a fatal error to re-assign a datatype name or to reference a datatype that has not been defined.
User-defined scalar datatypes are derived from the intrinsic number datatype. A derived datatype must specify the number of digits and decimal places, and the minimum and maximum values permitted. An optional mask describes the required format of values that conform to the datatype. The minimum and maximum permitted values may be further constrained by setting the boolean minexclusive and maxexclusive attributes to "1". A SOX processor must be able to generate code that will validate a value against the datatype definition.
<datatype name="inch"> <scalar datatype="float" digits="4" decimals="2" min="0" max="12"> <mask>Z#.##</mask> </scalar> </datatype>
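Once defined, the datatype can be referenced by name from string content. The fragment below is a sketch that uses a hypothetical width element type; the instance value is chosen to fall within the declared range of 0 to 12 with two decimal places, and the precise meaning of the mask characters is left to the datatype mask definitions.

```xml
<!-- Hypothetical element type whose content is constrained to inch -->
<elementtype name="width">
  <model><string datatype="inch"/></model>
</elementtype>

<!-- A conforming instance: within 0..12, two decimal places -->
<width>8.50</width>
```

A value such as 14.25 exceeds the declared maximum, so validation code generated by a SOX processor would be expected to reject it.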
User-defined enumeration datatypes may be derived from any of the intrinsic datatypes. Each of the values specified in an enumerated datatype must conform to the specified type. A SOX processor must be able to generate code that will validate the value against the datatype definition.
<datatype name="postalcodes.ca"> <enumeration datatype="nmtoken"> <option>AB</option> <option>BC</option> <option>MB</option> <option>NB</option> <option>NF</option> <option>NT</option> <option>NS</option> <option>ON</option> <option>PE</option> <option>QC</option> <option>SK</option> <option>YT</option> </enumeration> </datatype>
User-defined format datatypes may be derived from any of the intrinsic datatypes, but will most commonly be used to specialize string values. A required mask describes the required format of values that conform to the datatype. A SOX processor must be able to generate code that will validate the value against the datatype definition.
<datatype name="part-number"> <format datatype="string"> <mask>AAA-###.##-aa</mask> </format> </datatype>
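Once defined, a datatype is referenced by name. The following sketch assumes the inch and part-number datatypes defined above. The element type names part and ruler and the attribute name length are hypothetical, and the structure follows the elementtype, string, and attdef declarations in the SOX DTD given later in this document.

```xml
<!-- Element whose string content must conform to the part-number datatype -->
<elementtype name="part">
  <model>
    <string datatype="part-number"/>
  </model>
</elementtype>

<!-- Empty element with a required attribute of datatype inch -->
<elementtype name="ruler">
  <empty/>
  <attdef name="length" datatype="inch">
    <required/>
  </attdef>
</elementtype>
```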
To accommodate SOX itself, these attribute types are available for use as valid attribute datatypes when specified in the value of the datatype attribute of an attdef element.
The occurs datatype is available to accommodate SOX itself. It is used by the occurs attribute of the model, element, string, mixed, choice, and sequence elements. It is not intended to be used as an intrinsic datatype.
Collections of definitions known as modules may be directly included into a SOX document by using an include element. Like XML's combination of external parameter entity definition and reference, an inclusion is used to effectively copy the contents of an external resource into the SOX document where the include element is encountered. Unlike external parameter entities, there is no requirement to define a name and then reference that name to invoke the inclusion of the external resource.
For example, to include a module containing definitions that are commonly used for addresses:
<include system="http://www.veosystems.com/schemas/address.xml" />
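For comparison, the equivalent inclusion in an XML DTD takes two steps: an external parameter entity is declared with a name, and the name is then referenced. In this sketch the entity name address and the resource address.ent are hypothetical.

```xml
<!-- XML DTD: declare the external parameter entity, then reference it -->
<!ENTITY % address SYSTEM "http://www.veosystems.com/schemas/address.ent">
%address;
```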
In XML, there are two types of entity: parsed and unparsed. Parsed entities are available as internal and external entities, while unparsed entities are only available as external entities.
The local part of an entity name is specified when defining a parsed or unparsed entity, and may be re-assigned. A parsed entity name may only be referenced in an XML document instance, not in a SOX document. An unparsed entity name may be referenced in an attribute of type ENTITY or ENTITIES.
It is an error to reference an unparsed entity name that has not been defined.
Internal parsed entities are a feature of XML that enables reuse of text fragments by direct reference. In SOX documents, internal parsed entities may be defined by using the textentity element. A SOX processor must transform this element to its XML equivalent when producing an XML DTD.
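A minimal sketch of that transformation, using a hypothetical entity named company. The textentity element is shown first; the comment beneath it shows the internal entity declaration that one would expect a SOX processor to emit into the DTD.

```xml
<!-- SOX document -->
<textentity name="company">Veo Systems Inc.</textentity>

<!-- Expected XML DTD equivalent:
     <!ENTITY company "Veo Systems Inc.">
-->
```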
External parsed entities are a feature of XML that offers a baroque method for including well-formed XML document fragments, including text and markup, by direct reference to the storage object of the parsed entity. In SOX documents, external parsed entities may be defined by using the extentity element. A SOX processor must transform this element to its XML equivalent when producing an XML DTD.
External parsed entities are included in SOX documents for XML compatibility. External parsed entities are available as first-class element types to satisfy the need to transform an XML DTD into a SOX document and back again without significant loss.
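As an illustration, an external parsed entity might be defined as follows. The entity name chapter1 and its URL are hypothetical; the attribute names come from the extentity declaration in the SOX DTD, and the DTD output shown in the comment is what one would expect, not a prescribed serialization.

```xml
<!-- SOX document -->
<extentity name="chapter1"
           system="http://www.veosystems.com/docs/chapter1.xml"/>

<!-- Expected XML DTD equivalent:
     <!ENTITY chapter1 SYSTEM "http://www.veosystems.com/docs/chapter1.xml">
-->
```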
External unparsed entities are a feature of XML that offers a baroque method for including binary data by indirect reference to both the storage object and the notation type of the unparsed entity. In SOX documents, external unparsed entities may be defined by using the entity element. A SOX processor must transform this element to its XML equivalent when producing an XML DTD.
External unparsed entities are included in SOX documents for XML compatibility. External unparsed entities are available as first-class element types to satisfy the need to transform an XML DTD into a SOX document and back again without significant loss.
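The indirect reference can be sketched with a notation and an unparsed entity that names it. The names gif and logo and both URLs are hypothetical; the attribute names follow the notation and entity declarations in the SOX DTD. The comment shows the DTD declarations one would expect a SOX processor to produce.

```xml
<!-- SOX document: a notation, then an unparsed entity that refers to it -->
<notation name="gif" system="http://www.veosystems.com/viewers/gif"/>
<entity name="logo"
        system="http://www.veosystems.com/images/logo.gif"
        notation="gif"/>

<!-- Expected XML DTD equivalent:
     <!NOTATION gif SYSTEM "http://www.veosystems.com/viewers/gif">
     <!ENTITY logo SYSTEM "http://www.veosystems.com/images/logo.gif" NDATA gif>
-->
```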
The availability of SOX documentation elements should eliminate any need to use traditional XML comments in the body of a SOX document. Comments are available as first-class element types to satisfy the need to transform an XML DTD into a SOX document and back again without significant loss. A SOX processor may, as appropriate to the application design, emit an XML comment into an XML DTD when a SOX comment element is encountered. Otherwise, there are no prescribed processing semantics associated with SOX comments.
<comment>A comment that belongs in an XML DTD</comment>
A notation may be defined by specifying a name and an identifier for the notation. A notation may be referenced by name as part of an external entity declaration. The external entity name may, in turn, be referenced as the value of an attribute of type ENTITY. In that case, a processor that understands the notation is expected to handle the content of the entity.
The local part of a notation name is specified in the name attribute of the notation element. A notation name may be referenced in the notation attribute of the entity element, or by any attribute of type NOTATION.
It is a fatal error to re-assign a notation name, or to reference a notation that has not been defined.
Notations are included in SOX documents for XML compatibility. Notations are available as first-class element types to satisfy the need to transform an XML DTD into a SOX document and back again without significant loss.
Processing instructions are a feature of XML that provides a mechanism for by-passing the normal operation of an XML processor and delivering instructions directly to a downstream process whose responsibility it is to interpret the instruction and act accordingly.
In SOX documents, processing instructions may be defined by using the pi element. A SOX processor must transform this element to its XML equivalent when producing an XML DTD. The use of XML processing instructions in SOX documents is discouraged, as they are not interpretable by a SOX processor.
Processing instructions are included in SOX documents for XML compatibility. Processing instructions are available as first-class element types to satisfy the need to transform an XML DTD into a SOX document and back again without significant loss.
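A sketch of the pi element and its likely XML form. The target name formatter and the instruction page-break are hypothetical, and the mapping of the name attribute to the processing instruction target and of the element content to its data is an assumption based on the pi declaration in the SOX DTD.

```xml
<!-- SOX document -->
<pi name="formatter">page-break</pi>

<!-- Expected XML equivalent: <?formatter page-break?> -->
```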
The names of SOX elements, attributes, interfaces, datatypes, notations, entities, and namespace identifiers themselves, are required to be valid XML names, with the exception that the colon (:) character is not allowed. That is, all names must begin with a letter followed by any combination of letters, digits, combining characters, extender characters, periods (.), hyphens (-), and underscores (_). However, to maximize interoperability with programming language interfaces, the use of punctuation characters is discouraged. Names are case-sensitive.
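The naming rules above can be illustrated with a few element type names. All of the names are hypothetical; the last case is shown inside a comment because the rules above would reject it.

```xml
<!-- Acceptable: begins with a letter and contains no colon -->
<elementtype name="PurchaseOrder"><empty/></elementtype>

<!-- A different name from the one above, because names are case-sensitive -->
<elementtype name="purchaseorder"><empty/></elementtype>

<!-- Acceptable but discouraged: punctuation in the name -->
<elementtype name="purchase-order.v2"><empty/></elementtype>

<!-- Not acceptable in SOX, although XML itself permits the colon:
     <elementtype name="po:order"><empty/></elementtype>
-->
```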
The names of objects of a given type must be unique for that object type, and the names of objects of one type do not share the same namespace as objects of other types. That is, a SOX processor is expected to maintain a separate lookup table, or index, for the names of each of the object classes listed here:
The purpose of this section is to define the methods by which the names of objects may be assigned, what a name is deemed to be when considered in the context of imported namespaces and element scopes, how names may be referenced, and the rules governing potential reassignment of an object name.
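Because each object class has its own name table, objects of different classes may share a local name without conflict. The memo schema in this document does this: its date element type holds string content of datatype date. The receipt element type below, with an attribute also named date, is a hypothetical addition for illustration.

```xml
<!-- Element type named date; its content uses the datatype named date -->
<elementtype name="date">
  <model><string datatype="date"/></model>
</elementtype>

<!-- Hypothetical element type with an attribute that is also named date -->
<elementtype name="receipt">
  <empty/>
  <attdef name="date" datatype="date"><implied/></attdef>
</elementtype>
```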
In SOX, the fully-qualified name of any of these objects, except namespace identifiers, is considered to be composed of multiple parts, including:
This description of the names is intentionally incompatible with that in [XML-Namespaces].
The namespace of a SOX document is not required to be specified. However, in cases where multiple namespaces are in use within a SOX document, it may be desirable to establish the current namespace by specifying an identifier in the schema element's name attribute and an associated URI in the namespace attribute.
<schema name="invoice" namespace="http://www.veosystems.com/namespaces/invoice.xml" />
When an included external module's schema element has a namespace specified, that namespace becomes established as the current namespace, and the previous namespace is superordinated.
For any reference to an object in an imported namespace, that namespace becomes established as the current namespace while the reference is being reified. That is, the imported namespace becomes the current namespace while any subordinate or superordinate element definitions, or specializations are realized.
As mentioned earlier, importing a namespace makes a resource available for any namespace-qualified name references that may be encountered while processing the current SOX document. This means that a SOX document can refer to global elements, attributes, etc., as if they had been defined locally.
In the following example, we create a new kind of memo, based on the memo that we created in the first example. Note that the basic structure of the HTML memo is identical to the earlier memo. But here, we import the memo and HTML namespaces. Aside from memo, all of the element types identify one of the imported namespaces. The resulting from, subject, date, to and cc element types are the ones defined in the memo namespace. The body element is the one defined in the HTML namespace.
<schema name="HTMLmemo" namespace="http://www.veosystems.com/schemas/HTMLmemo.xml" >
  <h1>A memo document with HTML body</h1>
  <h2>Imported namespaces</h2>
  <namespace name="memo" namespace="http://www.veosystems.com/schemas/memo.xml"/>
  <namespace name="HTML" namespace="http://www.w3.org/schemas/html.xml"/>
  <h2>Memo element type</h2>
  <elementtype name="memo">
    <model>
      <sequence>
        <element namespace="memo" name="from"/>
        <element namespace="memo" name="subject"/>
        <element namespace="memo" name="date"/>
        <element namespace="memo" name="to"/>
        <element namespace="memo" name="cc"/>
        <element namespace="HTML" name="body"/>
      </sequence>
    </model>
  </elementtype>
</schema>
The example above would be a complete SOX document for a memo with an HTML body.
If we had used an inclusion to source the contents of these two external resources, there would have been a name collision between memo:body and HTML:body.
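The collision can be pictured with a sketch of the inclusion-based alternative. The system attribute follows the include declaration in the SOX DTD; that these two resources could be included as modules in this way is an assumption made only for illustration.

```xml
<!-- Both modules are copied into the including document, so the second
     definition of body re-assigns a name that is already in use. -->
<include system="http://www.veosystems.com/schemas/memo.xml"/>
<include system="http://www.w3.org/schemas/html.xml"/>
```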
Here is another example that references namespaces:
<elementtype name="section">
  <model>
    <element namespace="HTML" name="p" occurs="*" />
  </model>
  <attdef namespace="CALS" name="security" />
  <attdef namespace="HTML" scope="element" context="img" name="src" />
</elementtype>
The local part of a namespace identifier is specified when defining a namespace. A namespace identifier may be referenced in an attribute whose name and datatype is namespace. Namespace identifiers may not be namespace-qualified.
It is a fatal error to re-assign a namespace identifier, or to reference a namespace that has not been defined.
The XML DTD for SOX comprises the core SOX DTD, the referenced HTML Text definitions, and the textual description found in this document. The definitions of SOX are presented here in two parts.
<!-- ************************************************************* -->
<!-- SOX DTD -->
<!-- PUBLIC "-//Veo Systems Inc.//DTD SOX 1.0//EN" -->
<!-- SYSTEM "schema.dtd" -->
<!-- Copyright: Veo Systems Inc., 1997, 1998
     Written by: Murray Maloney
     Date created: 17 Dec 1997
     Date revised: 30 Sep 1998
     Version: 1.0 -->
<!-- ************************************************************* -->

<!-- ************************************************************* -->
<!-- Schema ***************************************************** -->
<!-- ************************************************************* -->
<!ELEMENT schema (h1, (h2 | h3 | intro | datatype | elementtype | interface
                       | include | namespace | comment | pi | entity
                       | extentity | notation | textentity | parameter )*) >
<!ATTLIST schema
    name       NMTOKEN  #IMPLIED
    namespace  CDATA    #IMPLIED
    version    CDATA    #FIXED "1.0" >

<!-- Elements used for documentation components use a limited subset of
     HTML for convenience in cut/paste operations by average designers and
     engineers. Certainly other DTD subsets, such as DocBook, could be used
     in place of HTML, but learning curve and available tools led to this
     design decision. -->
<!ENTITY % htmltext SYSTEM "htmltext.ent" >
%htmltext;

<!-- ************************************************************* -->
<!-- ELEMENTS *************************************************** -->
<!-- ************************************************************* -->
<!-- An Element Type definition requires a name. It is defined to extend a
     named element, as an instance of a named element, as an EMPTY or ANY
     element with optional attribute definitions, or with a content model
     with optional attribute definitions. -->
<!ELEMENT elementtype (((extends|instanceof)
                        | ((any|empty|model), (attdef | implements)*)),
                       explain?) >
<!ATTLIST elementtype
    name       NMTOKEN  #REQUIRED >
<!ELEMENT extends (explain?, (attdef | implements | parameter)+) >
<!ATTLIST extends
    name       NMTOKEN  #REQUIRED
    namespace  NMTOKEN  #IMPLIED
    scope      NMTOKEN  #FIXED "element" >
<!ELEMENT instanceof (explain?) >
<!ATTLIST instanceof
    name       NMTOKEN  #REQUIRED
    namespace  NMTOKEN  #IMPLIED >
<!ELEMENT any (explain?) >
<!ELEMENT empty (explain?) >

<!-- ************************************************************* -->
<!-- MODEL ****************************************************** -->
<!-- ************************************************************* -->
<!ELEMENT model (string|element|mixed|choice|sequence|paramref)>
<!ATTLIST model
    occurs     CDATA    #IMPLIED>
<!ELEMENT element (explain?, instanceof?) >
<!ATTLIST element
    name       NMTOKEN  #REQUIRED
    namespace  NMTOKEN  #IMPLIED
    occurs     CDATA    #IMPLIED >
<!ELEMENT string ((default | fixed | mask)?, explain?)>
<!ATTLIST string
    datatype   NMTOKEN  #IMPLIED >
<!ELEMENT mixed ((element | paramref)+, explain?) >
<!ATTLIST mixed
    name       NMTOKEN  #IMPLIED
    occurs     CDATA    #FIXED "*" >
<!ELEMENT choice ((element|choice|sequence|paramref),
                  (element|choice|sequence|paramref)+, explain?) >
<!ATTLIST choice
    name       NMTOKEN  #IMPLIED
    occurs     CDATA    #IMPLIED >
<!ELEMENT sequence ((element|choice|sequence|paramref),
                    (element|choice|sequence|paramref)+, explain?) >
<!ATTLIST sequence
    name       NMTOKEN  #IMPLIED
    occurs     CDATA    #IMPLIED >

<!-- ************************************************************* -->
<!-- ATTRIBUTES ************************************************* -->
<!-- ************************************************************* -->
<!-- An interface to a named collection of attribute definitions. -->
<!ELEMENT interface ((attdef | implements)+, explain?) >
<!ATTLIST interface
    name       NMTOKEN  #REQUIRED >
<!-- Transcludes an interface specification -->
<!ELEMENT implements ((attdef | implements)*, explain?) >
<!ATTLIST implements
    name       NMTOKEN  #REQUIRED
    namespace  NMTOKEN  #IMPLIED
    scope      NMTOKEN  #FIXED "interface" >
<!-- An attribute definition has a name and datatype, and must have a
     presence element "required|implied|inherit|default|fixed" included.
     It may have a namespace associated with it, or inherits? -->
<!ELEMENT attdef (enumeration?, (required|implied|inherit|default|fixed), explain?)>
<!ATTLIST attdef
    name       NMTOKEN  #REQUIRED
    namespace  NMTOKEN  #IMPLIED
    datatype   NMTOKEN  "STRING"
    scope      NMTOKEN  #IMPLIED >
<!ELEMENT default (#PCDATA) >
<!ELEMENT fixed (#PCDATA) >
<!ELEMENT required EMPTY >
<!ELEMENT implied EMPTY >
<!ELEMENT inherit EMPTY >

<!-- ************************************************************* -->
<!-- DATATYPE *************************************************** -->
<!-- ************************************************************* -->
<!ELEMENT datatype ((enumeration|format|scalar), explain?)+ >
<!ATTLIST datatype
    name       NMTOKEN  #REQUIRED >
<!ELEMENT enumeration (option+, explain?) >
<!ATTLIST enumeration
    datatype   NMTOKEN  #IMPLIED
    multiple   (true|false) "false" >
<!ELEMENT option (#PCDATA)* >
<!ATTLIST option
    value      CDATA    #IMPLIED
    label      CDATA    #IMPLIED
    selected   (selected) #IMPLIED
    disabled   (disabled) #IMPLIED >
<!ELEMENT format (mask, explain?) >
<!ATTLIST format
    datatype   NMTOKEN  "string" >
<!ELEMENT scalar (mask?, explain?) >
<!ATTLIST scalar
    datatype      NMTOKEN  "number"
    digits        CDATA    #IMPLIED
    decimals      CDATA    #IMPLIED
    minvalue      CDATA    #IMPLIED
    maxvalue      CDATA    #IMPLIED
    minexclusive  CDATA    "0"
    maxexclusive  CDATA    "0" >
<!ELEMENT mask (#PCDATA) >

<!-- ************************************************************* -->
<!-- NAMESPACES ************************************************* -->
<!-- ************************************************************* -->
<!-- Imports a namespace and provides a shorthand name for a full URN. -->
<!ELEMENT namespace (explain?) >
<!ATTLIST namespace
    name       NMTOKEN  #REQUIRED
    namespace  CDATA    #REQUIRED >

<!-- ************************************************************* -->
<!-- ENTITIES *************************************************** -->
<!-- ************************************************************* -->
<!-- Entities. XML's entity definition and reference mechanisms are
     partially reproduced in SOX. In addition, some SOX-specific entity
     definition and reference mechanisms are also provided. -->
<!ELEMENT include (explain?)>
<!ATTLIST include
    datatype   NMTOKEN  #FIXED "schema"
    public     CDATA    #IMPLIED
    system     CDATA    #IMPLIED >
<!ELEMENT parameter (element|choice|sequence|paramref) >
<!ATTLIST parameter
    name       NMTOKEN  #REQUIRED >
<!ELEMENT paramref (explain?) >
<!ATTLIST paramref
    name       NMTOKEN  #REQUIRED
    namespace  NMTOKEN  #IMPLIED
    scope      (element | namespace) #REQUIRED >
<!-- Parsed entities. -->
<!ELEMENT textentity (#PCDATA)* >
<!ATTLIST textentity
    name       NMTOKEN  #REQUIRED >
<!ELEMENT extentity (explain?) >
<!ATTLIST extentity
    name       NMTOKEN  #REQUIRED
    system     CDATA    #REQUIRED
    public     CDATA    #IMPLIED
    notation   NMTOKEN  #FIXED "XML" >
<!-- Unparsed entity. -->
<!ELEMENT entity (explain?) >
<!ATTLIST entity
    name       NMTOKEN  #REQUIRED
    system     CDATA    #REQUIRED
    public     CDATA    #IMPLIED
    notation   NMTOKEN  #REQUIRED >
<!-- Notation declaration. -->
<!ELEMENT notation (explain?) >
<!ATTLIST notation
    name       NMTOKEN  #REQUIRED
    system     CDATA    #IMPLIED
    public     CDATA    #IMPLIED >

<!-- ************************************************************* -->
<!-- COMMENT **************************************************** -->
<!-- ************************************************************* -->
<!ELEMENT comment (#PCDATA)>

<!-- ************************************************************* -->
<!-- PROCESSING INSTRUCTIONS ************************************ -->
<!-- ************************************************************* -->
<!ELEMENT pi (#PCDATA) >
<!ATTLIST pi
    name       NMTOKEN  #REQUIRED >
<!-- ************************************************************* -->
<!-- HTML Text: SOX uses HTML element types for convenience.-->
<!-- ************************************************************* -->
<!-- Copyright: Veo Systems Inc., 1997, 1998
     Written by: Murray Maloney
     Date created: 17 Dec 1997
     Date revised: 30 Sep 1998
     Version: 1.0 -->
<!-- ************************************************************* -->
<!ENTITY % block "form | table | p | bq | pre | ol | ul | dl" >
<!ENTITY % text "#PCDATA| a | abbr | b | big | br | button | checkbox
                 | cite | code | em | fieldset | i | img | label | password
                 | q | radio | select | small | span | strike | strong
                 | sub | sup | textarea | textfield | tt | u " >
<!ENTITY % heading "#PCDATA| a | abbr | b | big | br | cite | code | em
                 | i | img | q | small | span | strike | strong | sub | sup
                 | tt | u " >
<!-- ************************************************************* -->
<!ELEMENT intro (%block;)* >
<!ELEMENT explain (title?, synopsis?, (h4 | h5 | h6 | %block;)*) >
<!ELEMENT title (%heading;)* >
<!ELEMENT synopsis (%heading;)* >
<!-- ************************************************************* -->
<!ELEMENT h1 (%heading;)* >
<!ELEMENT h2 (%heading;)* >
<!ELEMENT h3 (%heading;)* >
<!ELEMENT h4 (%heading;)* >
<!ELEMENT h5 (%heading;)* >
<!ELEMENT h6 (%heading;)* >
<!-- ************************************************************* -->
<!ELEMENT b (#PCDATA)* >
<!ELEMENT br EMPTY >
<!ELEMENT big (#PCDATA)* >
<!ELEMENT i (#PCDATA)* >
<!ELEMENT small (#PCDATA)* >
<!ELEMENT sub (#PCDATA)* >
<!ELEMENT sup (#PCDATA)* >
<!ELEMENT strike (#PCDATA)* >
<!ELEMENT tt (#PCDATA)* >
<!ELEMENT u (#PCDATA)* >
<!ELEMENT abbr (#PCDATA)* >
<!ELEMENT cite (#PCDATA)* >
<!ELEMENT code (#PCDATA)* >
<!ELEMENT em (#PCDATA)* >
<!ELEMENT q (#PCDATA)* >
<!ELEMENT span (#PCDATA)* >
<!ELEMENT strong (#PCDATA)* >
<!-- ************************************************************* -->
<!ELEMENT a (%text;)* >
<!ATTLIST a
    name      CDATA  #IMPLIED
    href      CDATA  #IMPLIED
    title     CDATA  #IMPLIED >
<!-- ************************************************************* -->
<!ELEMENT img (explain?) >
<!ATTLIST img
    src       CDATA  #REQUIRED
    alt       CDATA  #REQUIRED
    longdesc  CDATA  #IMPLIED
    usemap    CDATA  #IMPLIED >
<!-- ************************************************************* -->
<!ELEMENT pre (%text;)* >
<!ATTLIST pre
    xml:space (preserve) #REQUIRED >
<!-- ************************************************************* -->
<!ELEMENT p (%text;)* >
<!ELEMENT bq (%text;)* >
<!ELEMENT ol (lh?, li+) >
<!ELEMENT ul (lh?, li+) >
<!ELEMENT lh (%heading;)* >
<!ELEMENT li ((%block;)*) >
<!ELEMENT dl (dh?,(dt,dd)+) >
<!ELEMENT dh (%heading;)* >
<!ELEMENT dt (%text;)* >
<!ELEMENT dd ((%block;)*) >
<!-- ************************************************************* -->
<!ELEMENT table (thead?, tbody) >
<!ATTLIST table
    cols         CDATA  #IMPLIED
    width        CDATA  #IMPLIED
    height       CDATA  #IMPLIED
    align        (left|center|right|justify) #IMPLIED
    valign       (top | middle | bottom | baseline) #IMPLIED
    vspace       CDATA  #IMPLIED
    hspace       CDATA  #IMPLIED
    cellpadding  CDATA  #IMPLIED
    cellspacing  CDATA  #IMPLIED
    border       CDATA  #IMPLIED
    frame        (box|void|above| below|hsides|vsides|lhs|rhs) #IMPLIED
    rules        (none|groups|rows|cols|all) #IMPLIED >
<!ELEMENT thead (tr)+ >
<!ATTLIST thead
    align        (left|center|right|justify) #IMPLIED
    valign       (top|middle|bottom|baseline) #IMPLIED >
<!ELEMENT tbody (tr)+ >
<!ATTLIST tbody
    align        (left|center|right|justify) #IMPLIED
    valign       (top|middle|bottom|baseline) #IMPLIED >
<!ELEMENT tr (th | td)+ >
<!ATTLIST tr
    align        (left|center|right|justify) #IMPLIED
    valign       (top | middle | bottom | baseline) #IMPLIED >
<!ELEMENT th (%text;)* >
<!ATTLIST th
    colspan      CDATA  #IMPLIED
    rowspan      CDATA  #IMPLIED
    width        CDATA  #IMPLIED
    height       CDATA  #IMPLIED
    align        (left|center|right|justify) #IMPLIED
    valign       (top | middle | bottom | baseline) #IMPLIED >
<!ELEMENT td (%text;)* >
<!ATTLIST td
    colspan      CDATA  #IMPLIED
    rowspan      CDATA  #IMPLIED
    width        CDATA  #IMPLIED
    height       CDATA  #IMPLIED
    align        (left|center|right|justify) #IMPLIED
    valign       (top | middle | bottom | baseline) #IMPLIED >
<!-- ************************************************************* -->
<!ELEMENT form (button|checkbox|fieldset|label|password|radio
                |select|textfield|textarea)* >
<!ATTLIST form
    action     CDATA  #IMPLIED
    method     (get|post) "get" >
<!ELEMENT fieldset (legend?, (label, (button|checkbox|password|radio
                |select|textfield|textarea))*) >
<!ELEMENT legend (%heading;)* >
<!ELEMENT label (%heading;)* >
<!ELEMENT select (option | optgroup)+ >
<!ATTLIST select
    name       NMTOKEN  #IMPLIED
    multiple   (multiple) #IMPLIED
    disabled   (disabled) #IMPLIED
    size       CDATA  #IMPLIED
    tabindex   CDATA  #IMPLIED
    accesskey  CDATA  #IMPLIED >
<!ELEMENT optgroup (label?, (option | optgroup)+) >
<!ATTLIST optgroup
    multiple   (multiple) #IMPLIED
    disabled   (disabled) #IMPLIED
    tabindex   CDATA  #IMPLIED
    accesskey  CDATA  #IMPLIED >
<!ELEMENT button (#PCDATA)* >
<!ATTLIST button
    name       NMTOKEN  #REQUIRED
    value      CDATA  #IMPLIED
    icon       CDATA  #IMPLIED
    type       (button | submit | reset) #REQUIRED
    disabled   (disabled) #IMPLIED
    tabindex   CDATA  #IMPLIED
    accesskey  CDATA  #IMPLIED >
<!ELEMENT checkbox (#PCDATA)* >
<!ATTLIST checkbox
    name       NMTOKEN  #REQUIRED
    value      CDATA  #IMPLIED
    icon       CDATA  #IMPLIED
    type       NMTOKEN  #FIXED "checkbox"
    checked    (checked) #IMPLIED
    disabled   (disabled) #IMPLIED
    size       CDATA  #IMPLIED
    tabindex   CDATA  #IMPLIED
    accesskey  CDATA  #IMPLIED >
<!ELEMENT radio (#PCDATA)* >
<!ATTLIST radio
    name       NMTOKEN  #REQUIRED
    value      CDATA  #REQUIRED
    icon       CDATA  #IMPLIED
    type       NMTOKEN  #FIXED "radio"
    checked    (checked) #IMPLIED
    disabled   (disabled) #IMPLIED
    size       CDATA  #IMPLIED
    tabindex   CDATA  #IMPLIED
    accesskey  CDATA  #IMPLIED >
<!ELEMENT textfield (#PCDATA)* >
<!ATTLIST textfield
    name       NMTOKEN  #REQUIRED
    value      CDATA  #IMPLIED
    icon       CDATA  #IMPLIED
    type       NMTOKEN  "text"
    disabled   (disabled) #IMPLIED
    readonly   (readonly) #IMPLIED
    size       CDATA  #IMPLIED
    maxlength  CDATA  #IMPLIED
    tabindex   CDATA  #IMPLIED
    accesskey  CDATA  #IMPLIED >
<!ELEMENT password (#PCDATA)* >
<!ATTLIST password
    name       NMTOKEN  #REQUIRED
    value      CDATA  #IMPLIED
    icon       CDATA  #IMPLIED
    type       NMTOKEN  "password"
    disabled   (disabled) #IMPLIED
    readonly   (readonly) #IMPLIED
    size       CDATA  #IMPLIED
    maxlength  CDATA  #IMPLIED
    tabindex   CDATA  #IMPLIED
    accesskey  CDATA  #IMPLIED >
<!ELEMENT textarea (#PCDATA)* >
<!ATTLIST textarea
    name       NMTOKEN  #REQUIRED
    icon       CDATA  #IMPLIED
    rows       CDATA  #REQUIRED
    cols       CDATA  #REQUIRED
    disabled   (disabled) #IMPLIED
    readonly   (readonly) #IMPLIED
    tabindex   CDATA  #IMPLIED
    accesskey  CDATA  #IMPLIED >
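The core SOX DTD above declares interface, implements, and extends. A minimal sketch of how they might fit together follows. All names (common, id, lang, note, signed-note, signer) are hypothetical, and only the structure has been checked against the DTD; the semantics of inheritance are not restated here.

```xml
<!-- A named collection of attribute definitions -->
<interface name="common">
  <attdef name="id"><implied/></attdef>
  <attdef name="lang"><default>en</default></attdef>
</interface>

<!-- An element type that transcludes the interface -->
<elementtype name="note">
  <model><string/></model>
  <implements name="common"/>
</elementtype>

<!-- An element type that extends note with one more attribute -->
<elementtype name="signed-note">
  <extends name="note">
    <attdef name="signer"><required/></attdef>
  </extends>
</elementtype>
```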
A mask is a datatype format constraint. A mask consists of symbols, groups of symbols, and patterns, any of which may be modified by occurrence specifiers. Each symbol is a placeholder that stands for a character or a class of characters. Date and time mask tokens are taken from those defined in [ISO-8601].
All derived scalar datatypes have an XML parse type of string.
The following date and time datatypes are derived from ISO 8601 -- Date and Time [ISO-8601] and are informed by Date and time formats [DATETIME].
<schema name="memo" namespace="http://www.veosystems.com/schemas/memo.xml">
  <h1>Memo Document Type</h1>
  <h2>Definitions</h2>
  <intro>
    <p>...</p>
    <ul>
      <li>...</li>
      <li>...</li>
    </ul>
  </intro>
  <h3>Memo element type</h3>
  <elementtype name="memo">
    <explain>
      <title>Memo Document</title>
      <synopsis>A simple, useful memo.</synopsis>
      <help>
        <p>Fill in attributes, enter paragraphs, lists and images, and press SEND.</p>
      </help>
      <p>A memo consists of six required fields and a body.</p>
    </explain>
    <model>
      <sequence>
        <element name="to"/>
        <element name="from"/>
        <element name="cc"/>
        <element name="subject"/>
        <element name="file"/>
        <element name="date"/>
        <element name="body"/>
      </sequence>
    </model>
  </elementtype>
  <h3>Memo fields</h3>
  <elementtype name="to">
    <model><string/></model>
  </elementtype>
  <elementtype name="from">
    <model><string/></model>
  </elementtype>
  <elementtype name="cc">
    <model><string/></model>
  </elementtype>
  <elementtype name="subject">
    <model><string/></model>
  </elementtype>
  <elementtype name="file">
    <model><string datatype="number"/></model>
  </elementtype>
  <elementtype name="date">
    <model><string datatype="date"/></model>
  </elementtype>
  <elementtype name="body">
    <model>
      <choice occurs="1,*">
        <element name="p"/>
        <element name="list"/>
        <element name="image"/>
      </choice>
    </model>
  </elementtype>
  <elementtype name="p">
    <model>
      <string/>
    </model>
  </elementtype>
  <elementtype name="list">
    <model>
      <element name="item" occurs="3,9"/>
    </model>
  </elementtype>
  <elementtype name="item">
    <instanceof name="p"/>
  </elementtype>
  <elementtype name="image">
    <empty/>
    <attdef name="src" datatype="URI">
      <required/>
    </attdef>
  </elementtype>
</schema>
As in any collaborative work, some of the decisions that found their way into the SOX specification were fraught with technical differences of opinion. In particular, the authors and other collaborators had a hard time coming to terms with entities and notations. In the end we agreed to document support for both, while agreeing to disagree about whether another approach might be more suitable. However, we fully expect that the split that we encountered will be reflected in the outside world, so we are including herewith the minority opinion:
We foresee the spread of XML engendering the creation of large repositories of entities by different organizations. An instance corresponding to a Schema might legitimately choose to reference entities from a variety of these repositories (not all of which were necessarily known when the Schema was created). If we follow the approach of current DTDs, then we have the following alternatives:
None of these alternatives is entirely acceptable. In addition, there is a serious issue of name clashes among entities defined in the various repositories, which can be a serious problem if the entities need to be defined within the Schema itself.
Given the characteristics of the issue, the clearest solution to the problem is to extend namespaces to cover entities as well as element and attribute names. Doing so provides a means to declare a collection of entity names (as a subset of whatever the referenced namespace is) and a way to reference entities without fear of name clashes.
In addition, this mechanism makes it possible to handle all text and unparsed entities outside of the Schema itself. Rather than provide both an inadequate mechanism for compatibility and a more flexible one for future development, [the minority opinion was to] leave entity declarations entirely out of SOX.
Removing entity declarations from the language requires supplying an alternative mechanism for supporting entities. Part of that is accomplished through extending the namespace mechanism to include entity references. Another part is by describing the storage objects which will hold entity references, the entity repositories.
An entity repository is an XML instance which declares some number of entities. These are either simple text entities (the equivalent of an internal text entity), external parsed entities, or unparsed entities. It is also possible to include other repositories. Each entity has a name attribute of type ID, so the names must all be unique within the repository.
When an XML DTD is converted to a SOX document, a repository is created with all the entities defined in the DTD. This file is also merged with the Schema if it is desirable to generate a DTD from a Schema. The repository then contains the entity portion of the Schema namespace, as referenced by instances.
Both a SOX document and a DTD defining the structure of a repository have been elided.
Within an entity definition file at www.veosystems.com:
<textentity name="astring">this is a text string </textentity>
<extentity name="anentity"
           public="urn:veo:text:anentity"
           system="http://www.veosystems.com/anentity.xml" />
<datatype name="gif" >
  <notation public="image/gif" />
</datatype>
<entity name="animage"
        notation="gif"
        system="http://www.veosystems.com/animage.gif" />
Within a document:
<elem xmlns:ents="http://www.veosystems.com" img="ents:animage" >
  &ents:astring;
  &ents:anentity;
</elem>
We gratefully acknowledge:
A note is made here in memory of Yuri Rubinsky, who was instrumental in developing and promoting the precursor to XML -- SGML on the Web.
an element, string, mixed, choice, sequence or parameter content model atom
The elements used to define SOX objects.