XForms 1.0: Data Model

W3C Working Draft 06 April 2000

This Version: http://www.w3.org/TR/2000/WD-xforms-datamodel-20000406
Latest Version: http://www.w3.org/TR/xforms-datamodel

Editors: Micah Dubinko (Cardiff Software) <mdubinko@cardiff.com>; Stacy Silvester (Cardiff Software) <ssilvester@cardiff.com>; Sebastian Schnitzenbaumer (Stack Overflow) <schnitz@overflow.de>; Dave Raggett (W3C/HP) <dsr@w3.org>

Abstract

This document presents a proposal for explicitly representing data models for XForms, the next generation of Web forms. Apart from other mechanisms described in this document, it is based upon the framework provided by XML Schema. While XML Schemas are used to define XML grammars, the XForms data model is intended to capture the device-independent data model and logic of form-based Web applications.

Although the two specifications address different problems, they overlap in the definition of simple datatypes. Therefore, the datatypes defined in this specification closely match the datatypes found in XML Schema Part 2: Datatypes [XSchema-2]. In some cases, however, the XForms datatypes differ from those in XML Schema because of different usage scenarios and target audiences. Appendix A will provide an [XSLT] filter for translating the XForms data model into the corresponding syntax defined in the XML Schema specifications.

A later specification will focus on the user interface aspects of XForms.

Status of this document

This is a W3C Working Draft. It is intended for review by W3C members and other interested parties.

This working draft may be updated, replaced or rendered obsolete by other W3C documents at any time. It is inappropriate to use W3C Working Drafts as reference material or to cite them as other than "work in progress". A list of current public W3C working drafts can be found at http://www.w3.org/TR.

This document is work in progress and does not imply endorsement by the W3C membership or the HTML Working Group (members only).

This document has been produced as part of the W3C HTML Activity. Further information on XForms can be found at http://www.w3.org/MarkUp/Forms.

Please send detailed comments on this document to www-forms@w3.org, the public forum for discussion of W3C's work on Web forms.

1. Introduction
2. XForms Architecture
3. Example: Editing XML with XForms
4. Datatypes
- 4.1 String
- 4.2 Boolean
- 4.3 Number
- 4.4 Monetary Values
- 4.5 Date
- 4.6 Time of Day
- 4.7 Duration
- 4.8 URI
- 4.9 Binary
5. Common Datatype Facets
- 5.1 Default Values
- 5.2 Read-only Values
- 5.3 Required Values
- 5.4 Calculated Values
- 5.5 Script Validations
6. Data Model Structures
- 6.1 Enumerations
- 6.2 Unions
- 6.3 Composite Types
- 6.4 Variant Types
- 6.5 Arrays
- 6.6 Shared Datatype Libraries
7. Expression Language
- 7.1 Syntax
- 7.2 General Methods
- 7.3 String Methods
- 7.4 Finance Methods
- 7.5 Decimal Arithmetic
8. Acknowledgments
9. References
- 9.1 Normative References
- 9.2 Non-normative References
Appendix A. Mapping to XML Schema Concrete Syntax
Appendix B. Binding to XHTML 1.0 Forms
Appendix C. Rationale Behind Decimal Arithmetic

1. Introduction

Web forms are an important part of the Web, allowing fine-tuned interaction between document reader and document author, Web page visitor and Web server, software program and software program, buyer and seller over the Web.

The demand for richer Web forms that allow greater flexibility and richer interaction mechanisms has led to several proposals for the next generation of Web forms. XForms are the result of extended analysis, creating a new platform-independent markup language for user interaction and transactional behavior between a user agent and a remote entity.

XForms is the successor to HTML forms and, as such, is being designed as a set of modules for integration into [XHTML 1.0]. However, the design of XForms also allows it to be used in other XML grammars.

Proposals for the next generation of Web forms separate the user interface from the data and logic, allowing different presentations to be used with the same back-end. The form is represented in terms of the following pieces:

An explicit data model defining the form as a composite datatype with constraints on and between form data values
The user interface, expressed as a set of presentation controls that are bound to the data model
The use of XML and Unicode for exchanging form data with servers

In the past it was necessary to compromise the presentation with a one-size-fits-all approach in order to accommodate the various media. For instance, imagine a form that can be filled out either on a palm-top computer or on a paper printout. Because XForms address the user interface separately from the data, the same form can now have many presentations.

2. XForms Architecture

Apart from various other problems, the design of HTML forms did not separate the purpose of a form from its presentation. There is a fine distinction between the purpose of a form (e.g. the questions being asked of the user, the selections the user may choose from, the logical sequence of decisions) and the presentation of the form (e.g. its visual appearance on a screen).

The purpose of a form can be expressed in various device- and media-specific ways without losing the original intention of the form designer. The presentation, however, loses its richness when it tries to accommodate every possible device, because the least common denominator has to be used.

XForms separate purpose and presentation into two specifications: "Data Model" and "User Interface". The data model is covered in this document; a future document will focus on the user interface. The data model allows the abstract structure of a form to be defined without explicitly specifying a user interface. XForms will introduce a new user interface layer for richer user interaction, but its device independence will be limited so as not to sacrifice functionality found in, or going beyond, existing HTML forms. Since the data model is device-independent, it is possible to bind other XML grammars to the data model, for instance VoiceML, WML, SMIL or even existing XHTML forms.

[Figure: diagram showing the user interface, data model, instance data and XML encoding]

3. Example: Editing XML with XForms

Even though Web forms represent a world of their own, they are often only building blocks in larger frameworks, for instance database and workflow applications. Moving information out of a database into an HTML document containing a form, and then from the submitted form back into the database, is one of many usage scenarios where forms are just one component.

The XForms design focuses on improving both the form itself as well as the match to database and workflow applications. XML is the universal data format, and now almost any data can be represented as XML. Since a form is a structured data exchange, XML and forms are a perfect match. In fact, it is possible to simply edit arbitrary XML document instances with XForms in a user agent.

Here is an example of how to use XForms to edit a simple XML document, based on a simplified version of the purchase order example found in "XML Schema Part 0: Primer" [XSchema-0].

<?xml version="1.0"?>
<purchaseOrder>
  <shipTo>
    <name>Alice Smith</name>
    <street>123 Maple Street</street>
    <city>Mill Valley</city>
    <state>CA</state>
    <zip>90952</zip>
  </shipTo>
</purchaseOrder>

[Issue: Will it be possible for attributes in the instance data to be edited with XForms?]

We would like to allow the contents of the elements inside <shipTo> to be edited as text input fields on a Web page. To do this, we need to construct an XForms data model that maps to this XML. This is simple to do by hand, although advanced users might want to use more powerful tools, such as [XSLT], as explained in Appendix A.

This is what the XForms data model would look like for the preceding purchase order XML document instance:

<group name="purchaseOrder">
  <group name="shipTo">
    <string name="name"/>
    <string name="street"/>
    <string name="city"/>
    <string name="state"/>
    <string name="zip">
      <mask>ddddd</mask>
    </string>
  </group>
</group>

In XForms, the underlying data model that the form represents will be persisted into a generic, well-formed XML document where:

The leaves of the data model produce elements where the name of the element corresponds to the name of the data value and the content of the element corresponds to the data value itself;
The nodes of the data model produce elements where the name of the element corresponds to the name of the data group and the content of the element corresponds to the children of the data group.
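The two persistence rules above can be sketched in a few lines of Python. This is a minimal sketch, assuming the data model and the user's entries have already been collected into nested dictionaries (data groups) and strings (leaf values); the helper name `to_instance` is hypothetical and not part of XForms.

```python
import xml.etree.ElementTree as ET

def to_instance(name, value):
    """Build the instance element for one data value (leaf) or data group (node)."""
    element = ET.Element(name)
    if isinstance(value, dict):
        # Node: one child element per member of the group, in document order.
        for child_name, child_value in value.items():
            element.append(to_instance(child_name, child_value))
    else:
        # Leaf: the element content is the data value itself.
        element.text = str(value)
    return element

ship_to = {"name": "Alice Smith", "street": "123 Maple Street",
           "city": "Mill Valley", "state": "CA", "zip": "90952"}
instance = to_instance("purchaseOrder", {"shipTo": ship_to})
print(ET.tostring(instance, encoding="unicode"))
```

The output is the purchase order instance shown earlier, without the XML declaration and indentation.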

The data model can be embedded in a parent document, giving it form capabilities. Likewise, the instance data can also be embedded in a parent document, serving as form data. In the following example, both the data model and instance data have been embedded in an [XHTML 1.0] document:

<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML-XForms 1.0//EN"
"http://www.w3.org/TR/xhtml-forms1/DTD/xhtml-xforms1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>Purchase Order</title>             

<xform xmlns="http://www.w3.org/2000/xforms"
  action="http://www.my.com/cgi-bin/receiver.pl"
  method="postXML"
  id="po_xform">
  <model>
    <group name="purchaseOrder">
      <group name="shipTo">
        <string name="name"/>
        <string name="street"/>
        <string name="city"/>
        <string name="state"/>
        <string name="zip">
          <mask>ddddd</mask>
        </string>
      </group>
    </group>
  </model>
  <instance>
    <purchaseOrder>
      <shipTo>
        <name>Alice Smith</name>
        <street>123 Maple Street</street>
        <city>Mill Valley</city>
        <state>CA</state>
        <zip>90952</zip>
      </shipTo>
    </purchaseOrder>
  </instance>
</xform>
</head>

<body>
  <h1>Shipping Information</h1>

  <form name="po_xform">

    Name: <input name="purchaseOrder.shipTo.name"/><br/>
    Street: <input name="purchaseOrder.shipTo.street"/><br/>
    City: <input name="purchaseOrder.shipTo.city"/><br/>
    State: <input name="purchaseOrder.shipTo.state"/><br/>
    Zip: <input name="purchaseOrder.shipTo.zip"/><br/>

    <button onclick="submit('po_xform')">Submit</button>

  </form>
</body>
</html>

Note that standard [XHTML 1.0] form elements have been used for the user interface. XForms makes it possible for different user interface modules, such as SVG and SMIL, to work with the same data model. Appendix B explains how XHTML form user interface elements work with XForms.

When the user changes the entries in the text input fields, the instance data gets updated. When the user hits the submit button, the instance data gets sent over to the server, i.e. the same XML document from the very beginning but with updated element contents. On the server, this returning XML document can be validated with the same Schema as the original XML document. In fact, once valid, the original XML document on the server can be overwritten with the new one returned from the XForm. Hence we can think of XForms as editing XML in the browser.

4. Datatypes

The starting point is a small set of built-in datatypes. For instance, here is how you could define a string-valued data item named "city":

<string name="city"/>

If the user entered "Boston", this would appear in the submitted data as:

<city>Boston</city>

Each datatype supports several facets which you can use to constrain data items. The name attribute can be used with all datatypes. The value of a name attribute must match the syntax for identifiers in ECMAScript as defined in the [ECMA-262] specification. This precludes the use of ".", ":" or "-" characters in names, and makes it practical to use names within scripts.
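A form tool might check names before accepting a data model. The following is a minimal sketch, assuming an ASCII-only approximation of the ECMAScript identifier rule; the real [ECMA-262] grammar also admits Unicode letters and excludes reserved words, which this sketch ignores. The function name `is_valid_name` is hypothetical.

```python
import re

# ASCII-only approximation of an ECMAScript identifier: a letter, "$" or "_",
# followed by letters, digits, "$" or "_". Unicode letters and reserved
# words are deliberately not handled in this sketch.
IDENTIFIER = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")

def is_valid_name(name: str) -> bool:
    return IDENTIFIER.fullmatch(name) is not None

for candidate in ["city", "shipTo", "zip_code", "zip-code", "ship.to", "ns:city", "2nd"]:
    print(candidate, is_valid_name(candidate))
```

Names such as "zip-code", "ship.to" and "ns:city" are rejected, which is what allows dotted paths like purchaseOrder.shipTo.zip to be unambiguous in scripts.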

4.1 String

Strings have the following facets:

Facet	Description	Default
`min`	minimum length in characters	0
`max`	maximum length in characters	unlimited
`mask`	simple mask, e.g. "`ddd-ddd-dddd`"	no restriction
`pattern`	regular expression, e.g. "`\d{3}-\d{3}-\d{4}`"	no restriction

Basic Patterns (Masks)

A simple syntax may be used to constrain, or "mask", permissible lexical values at the character level. Most people will find masks easier to understand than regular expression patterns; however, this comes at the expense of some expressive power.

A fundamental simplifying design aspect of masks is the absence of escape characters. Characters in the mask always map one-to-one with characters in the corresponding string data.

The mask is a string where certain characters are used to represent given classes of characters, and any remaining characters are literals. The following character classes are defined:

Character Class	Description	Equiv. Regex	Default Representation
`letter`	All letter characters	`\p{L}`	'`l`'
`digit`	All digit characters	`\d`	'`d`'
`character`	All characters allowed in XML names	`\c`	'`c`'
`space`	All whitespace characters	`\s`	'`s`'
`any`	Any Unicode character except newline	`.` (dot)	'`.`'

Note that any given mask can be transformed into an equivalent regular expression.
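Because masks have no escape characters, the transformation is a straightforward left-to-right substitution. The following is a minimal sketch, assuming the default representative characters and the "Equiv. Regex" column above; it only builds the pattern text in XML Schema regular-expression notation (Python's own `re` module does not support `\p{L}` or `\c`). Runs of the same mask character are collapsed into a `{n}` quantifier, and literal regex metacharacters are escaped. The name `mask_to_pattern` is hypothetical.

```python
from itertools import groupby

# Default representative characters mapped to their XML Schema regex equivalents.
CLASSES = {"l": r"\p{L}", "d": r"\d", "c": r"\c", "s": r"\s", ".": "."}

# Characters that must be escaped to be taken literally in an XML Schema regex.
METACHARACTERS = set(".\\?*+{}()|[]")

def mask_to_pattern(mask: str) -> str:
    parts = []
    for char, run in groupby(mask):
        count = len(list(run))
        if char in CLASSES:
            atom = CLASSES[char]          # character class position
        elif char in METACHARACTERS:
            atom = "\\" + char            # literal that needs escaping
        else:
            atom = char                   # ordinary literal
        parts.append(atom if count == 1 else f"{atom}{{{count}}}")
    return "".join(parts)

print(mask_to_pattern("ddd-ddd-dddd"))  # \d{3}-\d{3}-\d{4}
```

The mask "ddd-ddd-dddd" produces exactly the regular expression used as the pattern example in the facet table.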

All non-literal positions in the mask must be filled. This facet is identified by a <mask> child element. Using this basic syntax, a telephone number could be represented like this:

<string name="phone">
  <mask>ddd-dddd</mask>
</string>

[Issue: We may want to consider allowing a child attribute for simple cases]

Multiple pattern facets can be specified. They will be processed in document order, in a logical OR fashion. For instance, allowing multiple types of postal codes might be done like this:

<string name="postalcode">
  <mask>ddddd</mask>       <!-- US ZIP code -->
  <mask>ddddd-dddd</mask>  <!-- US ZIP+4 code -->
  <mask>lldsdll</mask>     <!-- UK postal code -->
  <mask>llddsdll</mask>    <!-- UK postal code -->
  <mask>ldlsdld</mask>     <!-- Canadian postal code -->
</string>
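Since mask positions map one-to-one onto characters, a mask can also be checked directly without a regular expression engine. The following is a minimal sketch of evaluating several masks in document order, assuming Python 3 string methods as stand-ins for the character classes: `l` uses `str.isalpha`, `d` uses `str.isdecimal`, `s` accepts only space, tab, CR and LF (as XML Schema `\s` does), and `c` is only roughly approximated. All function names are hypothetical.

```python
def class_matches(rep: str, ch: str) -> bool:
    """Test one value character against one mask character (default representatives)."""
    if rep == "l":
        return ch.isalpha()
    if rep == "d":
        return ch.isdecimal()
    if rep == "c":
        return ch.isalnum() or ch in "._-:"   # rough stand-in for XML name characters
    if rep == "s":
        return ch in " \t\r\n"
    if rep == ".":
        return ch != "\n"
    return ch == rep                          # any other mask character is a literal

def matches_mask(value: str, mask: str) -> bool:
    # No escapes: every mask position corresponds to exactly one value character.
    return len(value) == len(mask) and all(
        class_matches(rep, ch) for rep, ch in zip(mask, value))

def first_matching_mask(value: str, masks):
    """Index of the first mask, in document order, that accepts value; None if none do."""
    for index, mask in enumerate(masks):
        if matches_mask(value, mask):
            return index
    return None

POSTAL_MASKS = ["ddddd", "ddddd-dddd", "lldsdll", "llddsdll", "ldlsdld"]
print(first_matching_mask("K1A 0B1", POSTAL_MASKS))  # 4
```

The value is valid if any mask accepts it (the logical OR); returning the index of the first match is one way a user agent could report which format was recognized.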

In some instances, users may want to use different representative characters than 'l', 'd', 'c', 's', and '.'. Perhaps one or more of those symbols are needed as a literal part of the string. Perhaps using different characters makes it easier to interoperate with an existing system or makes an [XSLT] transformation simpler. For these cases, the representative characters can be redefined, using an attribute with the same name as the "character class" heading above. For example:

<string name="phone">
  <mask digit="#">###-####</mask>
</string>

Only single characters can be used when redefining a representative character.

ILLEGAL:
<mask digit="foo">...</mask>

Regular Expression Patterns

Regular expressions are more powerful, but considerably harder to understand. XForms use Perl-like regular expressions modified for Unicode compliance, as described in Appendix E of the XML Schema Datatypes part 2 document [XSchema-2].

Regular expression pattern facets are identified by a <pattern> child element.

[Issue: Currently, the element <pattern> is used for compatibility with XML Schema Datatypes. Is the difference between <mask> and <pattern> too subtle?]

<string name="phone">
  <pattern>(\d{3}-)?\d{3}-\d{4}</pattern>
</string>
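One detail worth noting is that XML Schema patterns are implicitly anchored: the whole value must match, not just a substring. The following is a minimal sketch in Python, assuming the pattern only uses syntax that Python's `re` module shares with XML Schema regular expressions (true for this example, but not in general, since `re` lacks `\p{L}` and `\c`). The helper `matches_pattern` is hypothetical.

```python
import re

PHONE = r"(\d{3}-)?\d{3}-\d{4}"

def matches_pattern(value: str, pattern: str) -> bool:
    # re.fullmatch mirrors the implicit anchoring of XML Schema patterns;
    # re.search would wrongly accept values that merely contain a match.
    return re.fullmatch(pattern, value) is not None

print(matches_pattern("123-4567", PHONE))      # True
print(matches_pattern("555-123-4567", PHONE))  # True
print(matches_pattern("5551234567", PHONE))    # False
```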

4.2 Boolean

These represent true or false values. Here is an example of a Boolean data item:

<boolean name="married"/>

This would appear in the submitted data as either:

<married>true</married>

<married>false</married>

Note this example could also have been represented by an enumeration, which is explained in Section 6.1.

4.3 Number

Numeric calculations should be performed on the internal data values (not the presentation values) using decimal arithmetic, except where resource constraints preclude this.

For example:

<number name="age"/>

When submitted, this would appear as:

<age>24</age>

Numbers can be constrained in various ways using the following facets:

Facet	Description	Default
`min`	minimum value	minus infinity
`max`	maximum value	plus infinity
`integer`	if "true" only integer values are permitted	real numbers
`decimals`	how many digits after the decimal point are significant	unlimited

Here are some examples:

The following is a positive integer (1 or greater):

<number name="count" min="1" integer="true"/>
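Checking a submitted value against these facets is simple when the value is parsed as a decimal rather than a binary floating-point number. The following is a minimal sketch using Python's `decimal` module, assuming facet values arrive as strings; the parameters `minimum` and `maximum` stand in for the min and max attributes (renamed to avoid shadowing Python built-ins), and the decimals facet is not handled. The function name is hypothetical.

```python
from decimal import Decimal, InvalidOperation

def check_number(text, minimum=None, maximum=None, integer=False):
    """Check a submitted value against the min, max and integer facets."""
    try:
        value = Decimal(text)
    except InvalidOperation:
        return False  # not a number at all
    if not value.is_finite():
        return False  # reject "NaN" and "Infinity"
    if integer and value != value.to_integral_value():
        return False  # has a fractional part
    if minimum is not None and value < Decimal(minimum):
        return False
    if maximum is not None and value > Decimal(maximum):
        return False
    return True

# <number name="count" min="1" integer="true"/>
print(check_number("3", minimum="1", integer=True))    # True
print(check_number("0", minimum="1", integer=True))    # False
print(check_number("2.5", minimum="1", integer=True))  # False
```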

[Issue: Integers are used commonly enough that an abbreviated syntax may be desirable, e.g.:
<integer name="quantity"/>]

4.4 Monetary Values

Monetary values are represented using the <money> element and can be constrained in various ways using the following facets:

Facet	Description	Default
`min`	minimum value, e.g. "0"	minus infinity
`max`	maximum value	plus infinity
`decimals`	how many digits after the decimal point are significant	unlimited
`currency`	a space separated list of 3 letter currency codes, e.g. `USD` or `GBP`	unspecified

Calculations should be carried out using decimal arithmetic. The rounding method used for calculations involving monetary values will be specified after consultation with the financial community, but is likely to be ROUND_HALF_UP.
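The difference between decimal and binary floating-point rounding is easy to demonstrate. The following is a minimal sketch using Python's `decimal` module, which happens to name its rounding modes ROUND_HALF_UP and ROUND_HALF_EVEN; rounding to cents is assumed here for illustration only, since the rounding rules are still to be specified.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

CENT = Decimal("0.01")

def round_money(amount: str, rounding=ROUND_HALF_UP) -> Decimal:
    # Decimal holds the value exactly as written, so "2.675" really is 2.675.
    return Decimal(amount).quantize(CENT, rounding=rounding)

print(round_money("2.675"))                   # 2.68
print(round(2.675, 2))                        # 2.67: the binary float is slightly below 2.675
print(round_money("2.665", ROUND_HALF_EVEN))  # 2.66: round-half-even, for comparison
```

Binary floating point cannot represent 2.675 exactly, so even a correct half-up rule applied to it gives 2.67; decimal arithmetic avoids this class of error.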

The currency attribute allows you to specify a list of acceptable currencies. The first in the list shall be considered to be the default when the data value doesn't specify the currency.

This is a value restricted to US Dollars:

<money name="price" currency="USD"/>

When submitted this would appear as:

<price>24.25</price>

This is a non-negative value in British Pounds:

<money name="price" currency="GBP" decimals="2" min="0"/>

If the monetary datatype allows more than one currency, the currency for a given data value is represented by the currency attribute, e.g.

<price currency="EUR">26.00</price>

Three letter currency codes are defined in [ISO 4217].
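A receiving application has to apply the default-currency rule when reading submitted values. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`, assuming the currency facet is present (so there is always a first, default code); the function name `read_money` is hypothetical.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

def read_money(xml_text: str, currencies: str):
    """Return (currency, amount) for a submitted money value.

    `currencies` is the facet value, e.g. "USD EUR"; the first code is the default.
    """
    allowed = currencies.split()
    element = ET.fromstring(xml_text)
    currency = element.get("currency", allowed[0])  # fall back to the default
    if currency not in allowed:
        raise ValueError(f"currency {currency} is not permitted")
    return currency, Decimal(element.text.strip())

print(read_money('<price currency="EUR">26.00</price>', "USD EUR"))  # ('EUR', Decimal('26.00'))
```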

[Issue: Do we want to support a means to indirectly specify facets when there are many money values in a form, all of which accept the same currency?]

4.5 Date

Dates are specified in years, months and days following the [ISO 8601] standard for date and time. The format consists of a decimal number denoting the year, optionally followed by the month and day, separated by hyphens. Thus 31st January 2000 is "2000-01-31", while the year 1976 is "1976". Note that months and days are always represented with two digits, with a leading zero for numbers in the range 1 to 9. Here is an example of how you declare a date.

<date name="date"/>

The facets for <date> allow you to constrain the value to be in the past or future, or in a given range. This range can be set to be relative to the present (when the form is filled in), and may be specified in days, months or years.

Facet	Description	Default
`min`	minimum value or "`now`"	the distant past
`max`	maximum value or "`now`"	the distant future
`precision`	"`years`", "`months`" or "`days`"	unconstrained

The values for min and max can be explicit dates. Alternatively, you can specify the special value "now", which refers to the date the form is submitted. Finally, you can specify values relative to the submission date using positive or negative durations. The syntax for dates and durations follows the subset of [ISO 8601] specified by XML Schema for time instants and durations.

The value "now" can be used to restrict dates to be in the past or future. For example a date of birth could be constrained to be in the past:

<date name="birth" max="now"/>

For a credit card expiry date, the value could be constrained to be some time between now and 4 years hence:

<date name="expires" precision="months" min="now" max="+P4Y"/>
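Resolving "now" and relative bounds, then comparing at the declared precision, could work as follows. This is a minimal sketch in Python 3.7+, assuming relative bounds use only whole years, months and days (e.g. "+P4Y"), explicit bounds are full YYYY-MM-DD dates, and a date that falls beyond the end of the target month is clamped (29 February plus one year gives 28 February). The submission date is passed in as `today`; all names are hypothetical.

```python
import calendar
import re
from datetime import date, timedelta

# Signed durations in whole years, months and days only, e.g. "+P4Y" or "-P18Y".
RELATIVE = re.compile(r"([+-])P(?:(\d+)Y)?(?:(\d+)M)?(?:(\d+)D)?")

def shift(base: date, duration: str) -> date:
    """Apply a signed duration to a date, clamping the day at month end."""
    m = RELATIVE.fullmatch(duration)
    if not m or not any(m.group(2, 3, 4)):
        raise ValueError(f"unsupported duration: {duration}")
    sign = 1 if m.group(1) == "+" else -1
    years, months, days = (int(g or 0) for g in m.group(2, 3, 4))
    # Count months from year 0 so divmod handles year/month carries.
    index = base.year * 12 + (base.month - 1) + sign * (years * 12 + months)
    year, month0 = divmod(index, 12)
    day = min(base.day, calendar.monthrange(year, month0 + 1)[1])
    return date(year, month0 + 1, day) + timedelta(days=sign * days)

def resolve(bound: str, today: date) -> date:
    """Turn a min/max facet value ("now", a signed duration or a full date) into a date."""
    if bound == "now":
        return today
    if bound[0] in "+-":
        return shift(today, bound)
    return date.fromisoformat(bound)

def truncate(d: date, precision: str) -> tuple:
    """Keep only the fields that the precision facet makes significant."""
    if precision == "years":
        return (d.year,)
    if precision == "months":
        return (d.year, d.month)
    return (d.year, d.month, d.day)

def date_ok(value: date, today: date, min_bound=None, max_bound=None,
            precision="days") -> bool:
    v = truncate(value, precision)
    if min_bound is not None and v < truncate(resolve(min_bound, today), precision):
        return False
    if max_bound is not None and v > truncate(resolve(max_bound, today), precision):
        return False
    return True

today = date(2000, 4, 6)
# <date name="expires" precision="months" min="now" max="+P4Y"/>
print(date_ok(date(2004, 4, 30), today, "now", "+P4Y", "months"))  # True
print(date_ok(date(2004, 5, 1), today, "now", "+P4Y", "months"))   # False
```

Comparing truncated tuples means that with precision="months", any day in April 2004 satisfies max="+P4Y" when the form is submitted on 6 April 2000.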

The lexical format for dates and times to be transmitted to the server is defined in [ISO 8601].

It is recommended that user agents offer date and time pickers which offer date validation and choices from the distant past to the distant future. Small portable devices will likely validate and pick only dates in the range likely for business appointments near the current time; whereas, a full-featured desktop browser, which supports use cases such as historical records search and long-term financial obligations, should offer an extended range of dates. As always, the server must assume that the client has not performed the validation specified in the data model and perform its own validation on the entered date.

4.6 Time of Day

The time datatype is used for points in time such as the time of an appointment. It uses the [ISO 8601] subset specified by XML Schemas.

Facet	Description	Default
`min`	minimum value	unconstrained
`max`	maximum value	unconstrained
`precision`	"`hours`", "`minutes`" or "`seconds`"	unconstrained

The values for min and max are defined in exactly the same manner as for <date> values.

The following defines a data item called "meeting", which should be a time of day specified in hours for the Eastern Standard Time zone:

<time name="meeting" precision="hours" zone="EST"/>

The time could be restricted to between 9am and 5pm EST:

<time name="meeting" min="09:00-5" max="17:00-5"/>

[Issue: The time zone is expressed as hours relative to UTC. The format is less convenient than a time zone name e.g. "09:00 EST", but ISO8601 doesn't permit such names. Likewise you need to use a 24-hour clock and can't use "am" or "pm". User agents may provide time pickers with 12-hour clocks and named time zones as a convenience to users.]
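Because the offsets are numeric, times written in different zones can be compared by converting them to UTC first. The following is a minimal sketch, assuming the "HH:MM[:SS]" form followed by an optional signed whole-hour offset as in the "09:00-5" examples, treating a missing offset as UTC, and assuming the permitted range does not wrap past midnight UTC. The function names are hypothetical.

```python
import re

# "HH:MM[:SS]" with an optional signed whole-hour offset from UTC, e.g. "09:00-5".
TIME = re.compile(r"(\d{2}):(\d{2})(?::(\d{2}))?([+-]\d{1,2})?")

def to_utc_seconds(text: str) -> int:
    m = TIME.fullmatch(text)
    if not m:
        raise ValueError(f"unsupported time: {text}")
    hours, minutes, seconds = int(m[1]), int(m[2]), int(m[3] or 0)
    offset = int(m[4] or 0)  # missing offset treated as UTC (an assumption)
    # Local time = UTC + offset, so UTC = local - offset (09:00-5 is 14:00 UTC).
    return ((hours - offset) * 3600 + minutes * 60 + seconds) % 86400

def time_ok(text: str, min_time: str, max_time: str) -> bool:
    return to_utc_seconds(min_time) <= to_utc_seconds(text) <= to_utc_seconds(max_time)

print(time_ok("10:30-5", "09:00-5", "17:00-5"))  # True
print(time_ok("23:00+0", "09:00-5", "17:00-5"))  # False: 23:00 UTC is 18:00 EST
```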

4.7 Duration

This datatype is used for values representing a duration in years, months, days, hours, minutes or seconds. The precision can be specified via a facet:

Facet	Description	Default
`precision`	"`years`", "`months`", "`days`", "`hours`", "`minutes`" or "`seconds`"	unconstrained

For instance, the duration of a meeting in hours could be specified as:

<duration name="lasting" precision="hours"/>

When submitted a two-hour meeting would be represented as:

<lasting>PT2H</lasting>

[Issue: Months only provide an approximate means to specify duration since individual months vary in length.]
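The precision facet can be enforced by rejecting any non-zero component finer than the declared unit. The following is a minimal sketch, assuming the XML Schema duration lexical form PnYnMnDTnHnMnS, in which the "T" separator must precede hour, minute and second components (it is what distinguishes months from minutes). Coarser components are allowed, and month lengths are never converted, so the approximation noted in the issue above does not arise here. The function name is hypothetical.

```python
import re

# XML Schema duration lexical form PnYnMnDTnHnMnS, with an optional leading "-".
# (?!$) requires at least one component; T(?=\d) requires something after "T".
DURATION = re.compile(
    r"-?P(?!$)(?:(\d+)Y)?(?:(\d+)M)?(?:(\d+)D)?"
    r"(?:T(?=\d)(?:(\d+)H)?(?:(\d+)M)?(?:(\d+(?:\.\d+)?)S)?)?")
UNITS = ["years", "months", "days", "hours", "minutes", "seconds"]

def duration_ok(text: str, precision=None) -> bool:
    """Parse a duration and reject non-zero components finer than `precision`."""
    m = DURATION.fullmatch(text)
    if not m:
        return False
    if precision is None:
        return True
    finest = UNITS.index(precision)
    return all(float(value) == 0
               for index, value in enumerate(m.groups())
               if value is not None and index > finest)

print(duration_ok("PT2H", "hours"))     # True
print(duration_ok("PT2H30M", "hours"))  # False: minutes are finer than hours
```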

4.8 URI

This datatype is used for values representing an absolute Uniform Resource Identifier (URI) as defined in [RFC 2396].

Facet	Description	Default
`scheme`	space separated list of schemes	unconstrained

Here is how you can define a URI data item:

<uri name="home"/>

When submitted this would look like:

<home>http://www.acme.com</home>

The scheme attribute allows you to restrict URIs to a limited set of schemes. For instance, to restrict a field to email and Web addresses you could write:

<uri name="contact" scheme="mailto http"/>

User agents are encouraged to provide a means to pick or browse addresses, for instance an email address picker. The user interface may allow users to enter relative URIs, but the internal values will always be absolute URIs.
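Both behaviours described above, restricting schemes and storing absolute URIs, map onto Python's standard `urllib.parse`. The following is a minimal sketch, assuming scheme names are compared case-insensitively and that a relative entry is resolved against some base URI chosen by the user agent; the base used here is only an illustration, and `scheme_ok` is a hypothetical name.

```python
from urllib.parse import urljoin, urlsplit

def scheme_ok(value: str, schemes: str) -> bool:
    """Check an absolute URI against a space-separated scheme facet."""
    scheme = urlsplit(value).scheme.lower()
    return scheme != "" and scheme in schemes.lower().split()

# Illustrative base URI for resolving a relative entry into an absolute value.
BASE = "http://www.acme.com/staff/index.html"
print(urljoin(BASE, "alice.html"))  # http://www.acme.com/staff/alice.html

print(scheme_ok("mailto:alice@acme.com", "mailto http"))  # True
print(scheme_ok("ftp://ftp.acme.com", "mailto http"))     # False
```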

[Issue: Should we split off special datatypes for email addresses and HTTP URLs?]

4.9 Binary

This is a datatype for use with data appropriate to specific Internet media types. The user agent could use the media type to determine how to prompt the user. For example, an image could be acquired from a digital camera, an image scanner, or a disk file.

Facet	Description	Default
`<type>`	one or more elements listing mime types	unconstrained

Here is an example for JPEG and PNG images:

<binary name="photo">
  <type>image/jpeg</type>
  <type>image/png</type>
</binary>

Binary data could be packaged either in-place as part of XML form data or held separately and referenced from XML. Further work is needed to cover the details.

[Issue: Is there a need for facets to further constrain the data, for instance, to place limits on the size of the data?]

5. Common Datatype Facets

5.1 Default Values

Sometimes it may be useful to provide an explicit default value. A natural way to do this is via a facet on the datatype, e.g.

<string name="color" default="black"/>

For enumerations, it is repetitive to specify the default value once as part of the enumeration and then again as the default. One proposal is to use the <default> element instead of the <value> element for marking up the default value. For example:

<string name="rating" range="closed">
  <value>excellent</value>
  <value>good</value>
  <default>indifferent</default>
  <value>poor</value>
  <value>terrible</value>
</string>

Another possibility would be to treat default as a Boolean attribute of the <value> element, for example:

<string name="rating" range="closed">
  <value>excellent</value>
  <value>good</value>
  <value default="true">indifferent</value>
  <value>poor</value>
  <value>terrible</value>
</string>

Within an enumeration, no more than one value may be marked as the default.

If a value is not supplied for a given data item, it will default to <null/>, which represents a null value, and can be used to distinguish values which haven't been filled out from those which the user has set to an empty string.
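On the receiving side, the distinction between a null value and an empty string survives parsing if the reader checks for the <null/> child explicitly. The following is a minimal sketch using Python's `xml.etree.ElementTree`, assuming <null/> is in no namespace (an open question noted below); `read_value` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

def read_value(element):
    """Return None for <null/>, otherwise the (possibly empty) text content."""
    if element.find("null") is not None:
        return None
    return element.text or ""

customer = ET.fromstring(
    "<customer><fullname>John Smith</fullname>"
    "<email><null/></email><fax></fax></customer>")
print({child.tag: read_value(child) for child in customer})
# {'fullname': 'John Smith', 'email': None, 'fax': ''}
```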

[Issue: How should <null/> be specified? Should it exist in another namespace?]

[Issue: It may be worth allowing the default attribute on the datatype elements such as <string> and <number>.]

5.2 Read-only Values

These correspond to data items whose value is fixed. This can be represented by setting the range attribute to "closed" and supplying a single value, for instance:

<string name="color" range="closed">
  <value>black</value>
</string>

A more concise way to represent this would be to specify the fixed value using a fixed attribute as a facet on the data item, for instance:

<string name="color" fixed="black"/>

5.3 Required Values

The form may require certain values to be filled in before the form is submitted. This would be easy to represent as a facet on a data item, e.g.

<number name="age" integer="true" required="true"/>

More generally, fields may or may not be required according to the values of other fields. This could be represented as a Boolean expression, using the same syntax for expressions as used for computed values, for instance:

<string name="spouse" required="status is 'married'"/>

where "status" refers to a field which can be one of "married" or "single".

5.4 Calculated Values

The form may include values that are computed from the values of other fields. For example, the sum over line items for quantity times unit price, or the amount of tax to be paid on an order. The computed value can be represented as an expression over the values of other data items.

Here is an example:

<money name="totalPrice" calc="sum(lineItem, quantity * price)"/>

This sums the product of the values of the data items named quantity and price over the repeated group named lineItem. Section 6.5 provides an example of how lineItem could be represented.
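The effect of this calculation can be sketched over submitted instance data. This is a minimal sketch, assuming the line items have already been serialized as XML; a Python lambda stands in for the expression quantity * price, since the expression language itself is defined separately, and `sum_over` is a hypothetical helper. Decimal arithmetic is used as recommended for monetary values.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

def sum_over(parent, group, expression):
    """Add up expression(item) for every repeated child element named `group`."""
    return sum((expression(item) for item in parent.findall(group)), Decimal(0))

order = ET.fromstring("""<order>
  <lineItem><quantity>1</quantity><price>17.15</price></lineItem>
  <lineItem><quantity>2</quantity><price>17.45</price></lineItem>
</order>""")

# Stand-in for the expression "quantity * price" on each lineItem.
total = sum_over(order, "lineItem",
                 lambda item: Decimal(item.findtext("quantity")) * Decimal(item.findtext("price")))
print(total)  # 52.05
```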

5.5 Script Validations

Sometimes it will be valuable to be able to use an expression to verify that a group or field has a valid value. The validate facet will include an expression (which may be able to refer to this.value) that returns a Boolean. Our expression language will also permit a callout to a function defined elsewhere (such as in a traditional scripting language) which could also do the validation and return a Boolean. For example:

<string name="postcode" validate="ValidPostCode(this.value)"/>

where ValidPostCode is a Boolean function the form designer has written to verify that the postcode value is valid.

[Issue: this.value is used to access the value of the current data object. Scripts would be able to find their way around the form via functions to traverse the data model. This must be addressed by the expression syntax.]

6. Data Model Structures

6.1 Enumerations

An enumeration specifies a type and a set of values. Here is an example for a closed set of credit card types:

<string name="card" range="closed">
  <value>Visa</value>
  <value>MasterCard</value>
  <value>Diners</value>
  <value>American Express</value>
</string>

The range attribute specifies whether the enumeration is "open" or "closed". If "open" the datatype accepts values other than those listed, but the entered value must satisfy any facets specified for the datatype. The range attribute can be used with all of the built-in datatypes and defaults to "open".

6.2 Unions

You can specify a datatype as a union of types. For example, the following accepts a number or enumerated string:

<union name="weekday">
  <string range="closed">
    <value>Monday</value>
    <value>Tuesday</value>
    <value>Wednesday</value>
    <value>Thursday</value>
    <value>Friday</value>
    <value>Saturday</value>
    <value>Sunday</value>
  </string>
  <number min="1" max="7" integer="true"/>
</union>

Some examples of valid data are:

<weekday>Tuesday</weekday>
<weekday>2</weekday>
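Validating a union amounts to trying each member type until one accepts the value. The following is a minimal sketch in Python, assuming the member types are tried in document order and that enumerated strings are compared case-sensitively; the member checks are hand-written stand-ins for the two datatypes in the example, and all names are hypothetical.

```python
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

def closed_weekday_name(value: str) -> bool:
    # <string range="closed"> with the seven day names
    return value in WEEKDAYS

def weekday_number(value: str) -> bool:
    # <number min="1" max="7" integer="true"/>, accepting plain digits only
    return value.isdecimal() and 1 <= int(value) <= 7

def union_accepts(value: str, member_checks) -> bool:
    """A union accepts a value if any member type accepts it, tried in order."""
    return any(check(value) for check in member_checks)

WEEKDAY_MEMBERS = [closed_weekday_name, weekday_number]
print(union_accepts("Tuesday", WEEKDAY_MEMBERS))  # True
print(union_accepts("8", WEEKDAY_MEMBERS))        # False
```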

[Issue: Is the name attribute required for each of the types within a union? Is there a better name than "range"?]

6.3 Composite Types

The <group> element is used to define composite datatypes aggregating several data items. Here is an example for a datatype used to represent a customer address:

<group name="customer">
  <string name="fullname"/>
  <string name="street"/>
  <string name="city"/>
  <string name="state"/>
  <string name="zip"/>
  <string name="phone"/>
  <string name="email"/>
  <string name="fax"/>
</group>

Groups can be nested as needed for creating hierarchical datatypes. <group> elements are intended to be treated as "objects" in scripting languages such as ECMAScript. For instance, you could access the street in the above data structure using the syntax customer.street. If the group named "customer" is a member of a group named "order", the street could be accessed by order.customer.street. This is made possible by the restrictions on the characters you can use for names. The names of data items must be unique within the group in which they are defined.

When submitted, the group is represented by an element whose tag is the same as the name of the group, for example:

<customer>
  <fullname>John Smith</fullname>
  <street>21 Filofax avenue</street>
  <city>Peoria</city>
  <state>Illinois</state>
  <zip>02139</zip>
  <phone>1 809 235 6178</phone>
  <email><null/></email>
  <fax><null/></fax>
</customer>

6.4 Variant Types

The details of postal addresses vary from one country to another. If a Web-based order form is to be used internationally, one solution is to use a lowest common denominator approach, for example, to provide a multi-line text input field. This makes it harder to identify the subfields in the address, for instance, in the US, the street, city, zip code and state.

A more sophisticated approach would be for the form to adjust itself according to the user's locale. This impacts both the user interface and the data model. It is proposed that data models can exploit a variant mechanism that allows a given name to identify one of a set of variants as appropriate to the locale.

<variant name="address">
  <case locale="us">
    <string name="street"/>
    <string name="city"/>
    <string name="state"/>
    <string name="zip"/>
  </case>  
  <case locale="uk">
    <string name="street"/>
    <string name="town"/>
    <string name="county"/>
    <string name="postcode"/>
  </case>  
</variant>

If an expression used to constrain the data model needs to reference one of the variant fields, the locale appears as part of the name, for instance: address.uk.town.

[Issue: The above description needs to be extended to allow for a default case, perhaps using a default element. How should this appear in references, e.g. address.default.town?]

6.5 Arrays

Normally each datatype definition corresponds to a single data value. You can allow a sequence of data values for the same datatype by specifying values for the minOccurs and maxOccurs attributes. These can be used with all the built-in types and with the <group>, <union> and <variant> elements. The default value for these attributes is "1". The special value "*" represents unlimited repetition.

The data model for an order form will typically allow for a number of line items that detail the products and quantities being ordered. Setting maxOccurs to "*" will allow the form to have one or more such line items:

<group name="lineItem" maxOccurs="*">
  <integer name="quantity"/>
  <string name="product"/>
  <string name="description"/>
  <currency name="price"/>
</group>

When submitted, the data for this would look something like:

<lineItem>
  <quantity>1</quantity>
  <product>51645A</product>
  <description>Black HP InkJet cartridge</description>
  <price>17.15</price>
</lineItem>
<lineItem>
  <quantity>2</quantity>
  <product>51641A</product>
  <description>Tri-color HP InkJet cartridge</description>
  <price>17.45</price>
</lineItem>
...
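The minOccurs/maxOccurs rules can be made concrete with a minimal Python sketch. It counts the repeated elements in submitted data and checks that count against the two attributes, treating "*" as unbounded. The function name check_occurrences is hypothetical. The submitted fragment above has no root element, so a hypothetical <order> element is added here to make it well-formed XML.

```python
import xml.etree.ElementTree as ET

def check_occurrences(parent, tag, min_occurs="1", max_occurs="1"):
    """Return True if the number of <tag> children of parent lies within
    [min_occurs, max_occurs]. "*" for max_occurs means no upper limit,
    and both attributes default to "1" as in the data model."""
    count = len(parent.findall(tag))
    if count < int(min_occurs):
        return False
    return max_occurs == "*" or count <= int(max_occurs)

# Hypothetical <order> root wrapping two submitted line items.
submitted = ET.fromstring("""<order>
  <lineItem><quantity>1</quantity><product>51645A</product></lineItem>
  <lineItem><quantity>2</quantity><product>51641A</product></lineItem>
</order>""")

print(check_occurrences(submitted, "lineItem", max_occurs="*"))  # True
print(check_occurrences(submitted, "lineItem"))                  # False
```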

[Issue: Would our target audience prefer an explicit <array> element? What about allowing expressions for the values for minOccurs and maxOccurs so that these can be sensitive to values entered into the form?]

6.6 Shared Datatype Libraries

Many forms applications are likely to have overlapping needs for the datatypes they use. One way to share definitions is to maintain a library of common datatypes that can be pasted into data models. Another would be to provide a means to import such datatypes by reference. An important consideration is the ability to reuse server-side code for processing subforms that use the same datatype.

Using a reference to a remote definition of a datatype could cause delays while the definition is retrieved. This suggests a combined approach whereby the shared definition is pasted into the data model, but an attribute is used to give a globally unique identifier which is the same for all occurrences of the shared datatype. For example:

<string name="isbn" pattern="\d*-\d*-\d*-\d*"
 uri="http://www.isbn.org/isbn"/>

The value of the uri attribute, a URI reference [RFC 2396], acts as a namespace name identifying the shared datatype. To serve its intended purpose, this name should be unique and persistent. It is not a goal that it be directly usable for retrieving a schema (if one exists). An example of a syntax designed with these goals in mind is that for Uniform Resource Names [RFC 2141]. However, ordinary URLs can also be managed in such a way as to achieve the same goals.
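A processor or server might use the uri attribute by keying its validation (or other server-side processing) on the URI rather than on the local field name. Every occurrence of the shared datatype would then reuse the same code. The Python sketch below is only an illustration. The SHARED_TYPES registry and the validate function are not part of this proposal, and it assumes that the pattern must match the whole value.

```python
import re

# Hypothetical registry: one entry per globally unique datatype URI.
SHARED_TYPES = {
    "http://www.isbn.org/isbn": re.compile(r"\d*-\d*-\d*-\d*"),
}

def validate(value, uri=None, pattern=None):
    """Validate a string field, preferring shared code registered for
    its uri and falling back to the field's own pattern."""
    compiled = SHARED_TYPES.get(uri) if uri else None
    if compiled is None:
        if pattern is None:
            return True  # no constraint to check
        compiled = re.compile(pattern)
    # Assumption: the pattern must match the entire value.
    return compiled.fullmatch(value) is not None

print(validate("0-201-48345-9", uri="http://www.isbn.org/isbn"))  # True
print(validate("0201483459", uri="http://www.isbn.org/isbn"))     # False
```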

[Issue: If the same datatype is used multiple times in the same data model, it might become tiresome to keep repeating the same definition over and over. Is it worth providing a short cut for this situation?]

7. Expression Language

[Issue: This is at a very early stage and much work remains to be done. In particular, methods to work with the data model in a reflective way are absent from this revision.]

Constraints on and between data values are easy to represent using expressions. The proposed syntax is close to that of ECMAScript expressions [ECMA-262]. It is modified to avoid the need to escape characters such as "<" and "&", which occur in the names of certain ECMAScript operators, so the expression syntax uses English words for these operators instead. A built-in set of functions would be provided for summing expressions over arrays, for common financial calculations, and for string operations. You could also call out to functions defined directly in ECMAScript or in other scripting languages, for example, Microsoft's VBScript.

The <group> elements define scopes for names. Names in the local scope belonging to the same <group> element can be used directly. Each group implicitly defines names for the parent group ("parent()") and the top-most group ("root()"). These names are reserved and cannot be used for data items.

In order to avoid long and brittle sequences of parent references (parent().parent().parent()...), groups are used to delimit the boundaries of scope. A name is within scope if it is defined in the current group or in a group that is an ancestor of the current group. This allows a form designer to create forms where local references prevail (i.e., a group's internal field references won't change when that group is inserted into a new context). At the same time, a group can still access data in its enclosing groups, which is very useful for accessing common fields. This is also consistent with block-structured programming languages. For example:

<group name="outer">
  <money name="bar"/>
  <group name="inner">
    <money name="foo" calc="bar"/>
  </group>
</group>

Here, since no bar exists in the inner scope, the next enclosing scope (outer) is examined. A bar is found there and used in the expression.

Note that only the immediate scoping context is searched for names to the right of a period. For example, in Summary.Name, Summary is found by searching the current scope and then its ancestors, so Summary need not be in the current context. Name, however, is then located only in the context defined by Summary. In other words, a Name field in the root group will not be found, even though the root group is an ancestor scoping context of Summary and Summary itself has no Name field.
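The two lookup rules above can be sketched in Python. An unqualified name, or the first part of a dotted name, is searched for in the current group and then in each enclosing group. Every later part is looked up only in the group found by the previous step. The Group class, its methods and the Summary/Name/total data are hypothetical. The reserved names parent() and root() are not modelled.

```python
class Group:
    """Hypothetical in-memory model of a <group>: fields and nested groups."""

    def __init__(self, name, parent=None, **fields):
        self.name = name
        self.parent = parent
        self.members = dict(fields)  # member name -> value or nested Group
        if parent is not None:
            parent.members[name] = self  # register with the enclosing group

    def lookup_in_scope(self, name):
        """Search this group, then each enclosing group, for name."""
        group = self
        while group is not None:
            if name in group.members:
                return group.members[name]
            group = group.parent
        raise NameError(f"{name!r} is not in scope")

    def resolve(self, dotted):
        """Resolve 'bar' or 'Summary.Name' relative to this group."""
        first, *rest = dotted.split(".")
        value = self.lookup_in_scope(first)
        for part in rest:
            # Right of a period, only the context just named is searched.
            if not isinstance(value, Group) or part not in value.members:
                raise NameError(f"{part!r} not found in {dotted!r}")
            value = value.members[part]
        return value


# The outer/inner example above, plus a hypothetical Summary group.
root = Group("root", Name="Acme")
outer = Group("outer", root, bar=10)
inner = Group("inner", outer)
summary = Group("Summary", root, total=42)

print(inner.resolve("bar"))            # 10, found in the outer group
print(inner.resolve("Summary.total"))  # 42
```

Resolving "Summary.Name" from inner raises NameError, because Name is only sought inside Summary, even though root has a Name field.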

In some situations, expressions may need to make a remote procedure call to a server, for instance to verify that a given value is acceptable based upon a database lookup operation. This can be handled in the scripting language and doesn't impact the data modeling language as such.

7.1 Syntax

This is a partial BNF for expressions:

expr ::= identifier
     ::= number
     ::= 'string' | "string"
     ::= function
     ::= (expr)
     ::= prefix expr
     ::= expr infix expr
     ::= expr is [not] within(expr, expr)

identifier ::= (this | ((root() | parent() | name) [[expr]]))
      [.((parent() | name)[[expr]] | function)]*
function ::= name ([arg [, arg]*] )
arg ::= expr
prefix ::= - | +
infix ::= and | or | xor | + | - | * | /
infix ::= is [not] [above | below]

Whitespace is permitted between tokens, but not before the "[" character of an array index. Likewise, whitespace is not permitted before or after the "." character in compound identifiers. Whitespace is required between adjacent alphanumeric tokens; for example, whitespace is required between the operator "not" and the name of a function.
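These lexical rules are simple enough to enforce in a tokenizer. The Python sketch below covers only the tokens in the partial BNF above. The token classes, the keyword set and the use of SyntaxError are assumptions for illustration, and operator precedence is left to a parser that is not shown. The rule requiring whitespace between adjacent alphanumeric tokens follows from the name pattern: without a space, "notfoo" is read as a single name.

```python
import re

# Hypothetical keyword set taken from the operators in the partial BNF.
KEYWORDS = {"and", "or", "xor", "is", "not", "above", "below", "within", "this"}

TOKEN_RE = re.compile(r"""
    (?P<ws>\s+)
  | (?P<number>\d+(?:\.\d+)?)
  | (?P<string>'[^']*'|"[^"]*")
  | (?P<name>[A-Za-z_]\w*)
  | (?P<punct>[()\[\],.+\-*/])
""", re.VERBOSE)

def tokenize(source):
    """Split an expression into (kind, text) pairs, rejecting whitespace
    before '[' and on either side of '.'."""
    tokens = []
    pos = 0
    after_space = False
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind == "ws":
            if tokens and tokens[-1] == ("punct", "."):
                raise SyntaxError("whitespace not allowed after '.'")
            after_space = True
            continue
        if after_space and text in ("[", "."):
            raise SyntaxError(f"whitespace not allowed before {text!r}")
        if kind == "name" and text in KEYWORDS:
            kind = "keyword"
        tokens.append((kind, text))
        after_space = False
    return tokens

print(tokenize("x is not above 3"))
```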

7.2 General Methods

These are the built-in functions. You can also call functions defined in scripts if the user agent supports scripting.

x is within(y, z): Boolean function that returns true if the value of x is greater than or equal to the value of y and less than or equal to the value of z. An alternative would be a three-place function, e.g. "within(x,y,z)". Yet another would be to define within as a method on all XForm data objects, e.g. "x.within(y,z)".
sum(x, expression): Numeric function that sums the value of the expression applied to each instance of x in the current group. For instance, to sum the unit price times the quantity for each line item you could write: "sum(lineItem, price*quantity)".
average(x, expression): Numeric function that returns the arithmetic mean of the expression applied to each instance of x in the current group. For instance, to compute the average price of the line items you could write: "average(lineItem, price)".
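A minimal Python sketch of these three methods follows. It makes three assumptions: the instances of a repeated group are represented as a list of dictionaries, the expression argument is passed as a function, and the name xsum is used to avoid shadowing Python's built-in sum. Values are held as decimal numbers, in keeping with the decimal arithmetic proposed for expressions.

```python
from decimal import Decimal

def within(x, y, z):
    """'x is within(y, z)': true if y <= x <= z."""
    return y <= x <= z

def xsum(instances, expression):
    """sum(x, expression): total of expression over every instance of x."""
    return sum((expression(item) for item in instances), Decimal(0))

def average(instances, expression):
    """average(x, expression): arithmetic mean of expression over x.
    Assumption: averaging zero instances is an error (not specified)."""
    instances = list(instances)
    if not instances:
        raise ValueError("average() over zero instances")
    return xsum(instances, expression) / len(instances)

# The two line items from the submitted data in Section 6.5.
line_items = [
    {"quantity": 1, "price": Decimal("17.15")},
    {"quantity": 2, "price": Decimal("17.45")},
]
print(xsum(line_items, lambda item: item["price"] * item["quantity"]))  # 52.05
print(average(line_items, lambda item: item["price"]))                  # 17.30
```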

7.3 String Methods

Note that, following ECMAScript, the + operator can be used to concatenate strings. The functions below are adapted from the ECMAScript specification ([ECMA-262] edition 3), but are defined as functions rather than as methods on strings. The list is incomplete.

string([char0, char1, ...]): Returns a string value containing as many characters as the number of arguments. Each argument specifies the numeric character code for a single character; the first argument designates the first character, the second argument the second character, and so on. If no arguments are supplied, an empty string is returned. This corresponds to the String.fromCharCode() function in ECMAScript.
toString(x): Returns a string value computed by coercing the value of x to a string.
charAt(s, n): Returns the numeric character code of the character at position n in string s, where the first character is at position zero.
substring(s, n [, m]): Returns the substring of s that starts with the character at position n and ends with (but does not include) the character at position m, where the first character is at position zero. If n is greater than m, an empty string is returned. If n or m is greater than the length of s, it is replaced by the length of the string. If m is omitted, its value is taken to be the length of the string.
strlen(s): Returns the number of characters in string s.
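The following Python sketch implements these string functions as they are described here. That includes the rule that substring returns an empty string when n is greater than m, which differs from ECMAScript's own substring method. The function names mirror those above. Clamping negative positions to zero is an added assumption, since this draft does not say what negative positions mean.

```python
def string(*char_codes):
    """string(char0, char1, ...): build a string from character codes."""
    return "".join(chr(code) for code in char_codes)

def toString(x):
    """toString(x): coerce x to a string."""
    return str(x)

def charAt(s, n):
    """charAt(s, n): numeric code of the character at zero-based position n."""
    return ord(s[n])

def substring(s, n, m=None):
    """substring(s, n [, m]): characters from position n up to, but not
    including, position m; m defaults to the length of s."""
    length = len(s)
    if m is None:
        m = length
    n = min(max(n, 0), length)  # clamping at 0 is an assumption
    m = min(max(m, 0), length)
    if n > m:
        return ""
    return s[n:m]

def strlen(s):
    """strlen(s): number of characters in s."""
    return len(s)

print(string(72, 105))                   # Hi
print(substring("purchaseOrder", 0, 8))  # purchase
```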

7.4 Finance Methods

[Issue: Should these be in an optional financial library?]

apr(n1, n2, n3): Returns the annual percentage rate for a loan, where n1 is the principal amount of the loan, n2 is the monthly payment, and n3 is the number of monthly payments to be made. For example, "apr(35000, 269.50, 30 * 12)" returns approximately 0.085 (8.5%) as the annual interest rate on a loan of $35,000 repaid at $269.50 per month over 30 years.
cterm(n1, n2, n3): Returns the number of periods needed for an investment earning a fixed, compounded interest rate to grow to a future value, where n1 is the interest rate per period, n2 is the future value of the investment, and n3 is the amount of the initial investment. For example, "cterm(.02, 200, 100)" returns 35 as the number of periods required for $100 invested at 2% per period to grow to $200.
fv(n1, n2, n3): Returns the future value of periodic constant payments at a constant interest rate, where n1 is the amount of each equal payment, n2 is the interest rate per period, and n3 is the total number of periods. For example, "fv(100, .075 / 12, 10 * 12)" returns 17793.03 as the amount accumulated after paying $100 a month for 10 years into an account bearing annual interest of 7.5%.
ipmt(n1, n2, n3, n4, n5): Returns the amount of interest paid on a loan over a period of time, where n1 is the principal amount of the loan, n2 is the annual interest rate, n3 is the monthly payment, n4 is the first month of the computation, and n5 is the number of months to be computed. For example, "ipmt(30000, .085, 295.50, 7, 3)" returns 624.88 as the interest paid over the 3 months starting with month 7 (July) on a loan of $30,000.00 at an annual interest rate of 8.5%, repaid at $295.50 per month.
npv(n1, n2 [, ...]): Returns the net present value of an investment based on a discount rate and a series of periodic future cash flows, where n1 is the discount rate per period and n2, ... are the cash flow values, which must be equally spaced in time and occur at the end of each period. For example, "npv(0.15, 100000, 120000, 130000, 140000, 50000)" returns 368075.16 as the net present value of an investment projected to generate $100,000, $120,000, $130,000, $140,000 and $50,000 in each of the next five years, at a discount rate of 15% per annum.
pmt(n1, n2, n3): Returns the payment for a loan based on constant payments and a constant interest rate, where n1 is the principal amount of the loan, n2 is the interest rate per period, and n3 is the number of payment periods. For example, "pmt(30000.00, .085 / 12, 12 * 12)" returns 333.01 as the monthly payment for a loan of $30,000 borrowed at an annual interest rate of 8.5% and repayable over 12 years (144 months).
ppmt(n1, n2, n3, n4, n5): Returns the amount of principal paid on a loan over a period of time, where n1 is the principal amount of the loan, n2 is the annual interest rate, n3 is the monthly payment, n4 is the first month of the computation, and n5 is the number of months to be computed. For example, "ppmt(30000, .085, 295.50, 7, 3)" returns 261.62 as the principal paid over the 3 months starting with month 7 (July) on a loan of $30,000 at an annual interest rate of 8.5%, repaid at $295.50 per month. The annual interest rate is used in this function because the computation covers a range of months within the year.
pv(n1, n2, n3): Returns the present value of an investment of periodic constant payments at a constant interest rate, where n1 is the amount of each equal payment, n2 is the interest rate per period, and n3 is the total number of periods. For example, "pv(1000, .08 / 12, 5 * 12)" returns 49318.43 as the present value of payments of $1000.00 a month for 5 years at an annual interest rate of 8%.
rate(n1, n2, n3): Returns the compound interest rate per period required for an investment to grow from present to future value in a given number of periods, where n1 is the future value, n2 is the present value, and n3 is the total number of periods. For example, "rate(110, 100, 1)" returns 0.10 as the interest rate required for an investment of $100 to grow to $110 over 1 period.
term(n1, n2, n3): Returns the number of periods needed to reach a given future value from periodic constant payments into an interest-bearing account, where n1 is the payment made at the end of each period, n2 is the interest rate per period, and n3 is the future value. For example, "term(475, .05, 1500)" returns 3 as the number of periods for payments of $475, deposited at the end of each period into an account bearing 5% compound interest per period, to grow to $1500.00.
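The following Python sketch implements the finance methods with the usual annuity and compound-interest formulas. It is an illustration, not a normative definition, and it makes several assumptions:

- It uses binary floating point for brevity, unlike the decimal arithmetic proposed in Section 7.5.
- Interest rates are non-zero.
- apr is twelve times the monthly rate, found by bisection.
- ipmt and ppmt compound monthly at one twelfth of the annual rate.

With these assumptions it approximately reproduces the examples above.

```python
import math

def fv(payment, interest, periods):
    """Future value of `periods` equal payments made at the end of each period."""
    return payment * ((1 + interest) ** periods - 1) / interest

def pv(payment, interest, periods):
    """Present value of `periods` equal end-of-period payments."""
    return payment * (1 - (1 + interest) ** -periods) / interest

def pmt(principal, interest, periods):
    """Constant payment that repays `principal` over `periods` periods."""
    return principal * interest / (1 - (1 + interest) ** -periods)

def npv(interest, *cash_flows):
    """Net present value of cash flows at the end of periods 1, 2, ..."""
    return sum(flow / (1 + interest) ** t
               for t, flow in enumerate(cash_flows, start=1))

def rate(future, present, periods):
    """Compound interest rate per period taking `present` to `future`."""
    return (future / present) ** (1 / periods) - 1

def cterm(interest, future, present):
    """Periods for a single investment to grow from `present` to `future`."""
    return math.log(future / present) / math.log(1 + interest)

def term(payment, interest, future):
    """Periods of end-of-period payments needed to accumulate `future`."""
    return math.log(1 + future * interest / payment) / math.log(1 + interest)

def apr(principal, payment, months):
    """Annual rate (12 x monthly rate) at which `payment` repays the loan,
    found by bisection because there is no closed form."""
    low, high = 1e-9, 1.0  # assumed bracket for the monthly rate
    for _ in range(100):
        mid = (low + high) / 2
        if pmt(principal, mid, months) < payment:
            low = mid   # payment too small at this rate: rate must be higher
        else:
            high = mid
    return 12 * (low + high) / 2

def _monthly_interest(principal, annual_rate, payment, first, count):
    """Interest charged in months first .. first + count - 1 of the loan."""
    monthly_rate = annual_rate / 12
    balance = principal
    charged = []
    for month in range(1, first + count):
        interest = balance * monthly_rate
        balance += interest - payment
        if month >= first:
            charged.append(interest)
    return charged

def ipmt(principal, annual_rate, payment, first, count):
    """Total interest paid over `count` months starting at month `first`."""
    return sum(_monthly_interest(principal, annual_rate, payment, first, count))

def ppmt(principal, annual_rate, payment, first, count):
    """Total principal repaid over the same months: payments minus interest."""
    interest = _monthly_interest(principal, annual_rate, payment, first, count)
    return payment * len(interest) - sum(interest)

print(round(fv(100, .075 / 12, 10 * 12), 2))  # 17793.03
```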

7.5 Decimal Arithmetic

It is very common for people who are not experienced programmers to be confused by the results of numeric calculations such as division by 10. They are not aware that computers use binary arithmetic and that this method can produce results that differ from decimal arithmetic - the method we were taught in school.

The proposal is for XForms expressions to conform to [ANSI X3-274] for arithmetic. This standard features full-function decimal floating point arithmetic, with integers as a seamless subset. It preserves mantissa length, e.g. 1.20 x 2 gives 2.40 (not 2.4). It also provides an exact representation, as expected, for values such as 0.1 (not 1/16 + 1/32 + 1/256 + 1/512 + 1/4096 + 1/8192 + … ).
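Python's standard decimal module can illustrate the two properties mentioned here. It is based on Cowlishaw's General Decimal Arithmetic Specification, which grew out of the REXX arithmetic standardized in [ANSI X3-274]. It is offered only as a readily available stand-in, not as a conforming implementation of this proposal.

```python
from decimal import Decimal

# Trailing zeros are preserved: 1.20 x 2 keeps two fractional digits.
print(Decimal("1.20") * 2)   # 2.40
print(1.20 * 2)              # 2.4 (binary float keeps no trailing zero)

# 0.1 is held exactly in decimal, but only approximately in binary.
print(Decimal("0.1") * 3 == Decimal("0.3"))  # True
print(0.1 * 3 == 0.3)                        # False

# The exact value of the binary double nearest to 0.1, shown in decimal.
print(Decimal(0.1))
```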

This standard has been used heavily for over 16 years by IBM and its customers, and is based on feedback from users, mathematicians, data processing experts, and financial experts. The processing-time overhead is expected to be negligible in practice, with a fixed code overhead of about 2 to 4 KB.

Further discussion on the choice of decimal arithmetic is in Appendix C.

8. Acknowledgments

This document was written with the participation of the members of the Forms Subgroup of the W3C HTML Working Group (listed in alphabetical order):

Frank Boumphrey, HTML Writers Guild
Stewart Butterfield, Communicate.com
Tantek Çelik, Microsoft
Andrew Donoho, IBM
Micah Dubinko, Cardiff Software
Michael Fergusson, Communicate.com
Leigh Klotz, Xerox
Dave Manning, PureEdge
Mike Mansell, PureEdge
Larry Masinter, AT&T
Rob McDougal, JetForm
Gavin McKenzie, JetForm
Steven Pemberton, CWI
T. V. Raman, IBM
Dave Raggett, W3C/HP (W3C staff contact)
Sebastian Schnitzenbaumer, Stack Overflow
Stacy Silvester, Cardiff Software
Malte Wedel, Stack Overflow

9. References

9.1 Normative References

[ANSI X3-274]: American National Standards Institute (ANSI). Information Technology - Programming Language REXX. Document Number: ANSI X3.274-1996. 1996.

[ECMA-262]: European Computer Manufacturers' Association (ECMA). ECMA-262: ECMAScript Language Specification. Available at: ftp://ftp.ecma.ch/ecma-st/Ecma-262.pdf. 1999.

[ISO 4217]: International Organization for Standardization (ISO). ISO Standards for Currency Names. 1999.

[ISO 8601]: International Organization for Standardization (ISO). Representations of dates and times. Available at: http://www.iso.ch/markete/8601.pdf. 1988.

[RFC 2396]: Berners-Lee, Tim, et al. RFC 2396: Uniform Resource Identifiers (URI): Generic Syntax. Available at: http://www.ietf.org/rfc/rfc2396.txt. 1998.

[Unicode]: Aliprand, Joan, Julie Allen, Joe Becker, Mark Davis, Michael Everson, Asmus Freytag, John Jenkins, Mike Ksar, Rick McGowan, Lisa Moore, Michel Suignard, and Ken Whistler. The Unicode Standard, Version 3.0, Reading, Mass.: Addison-Wesley Developers Press. 2000.

[XML 1.0]: Bray, Tim, Jean Paoli, and C. M. Sperberg-McQueen. Extensible Markup Language (XML) 1.0. Available at: http://www.w3.org/TR/REC-xml. 1998.

[XML-Names]: Bray, Tim, Dave Hollander, and Andrew Layman. Namespaces in XML. Available at: http://www.w3.org/TR/REC-xml-names. 1999.

[XSchema-1]: Thompson, Henry S., David Beech, Murray Maloney, and Noah Mendelsohn. XML Schema Part 1: Structures. Available at: http://www.w3.org/TR/xmlschema-1. 2000.

[XSchema-2]: Biron, Paul V. and Ashok Malhotra. XML Schema Part 2: Datatypes. Available at: http://www.w3.org/TR/xmlschema-2. 2000.

9.2 Non-Normative References

[RFC 2141]: Moats, R. URN Syntax. Available at: http://www.ietf.org/rfc/rfc2141.txt. 1997.

[XHTML 1.0]: Pemberton, Steve, et al. XHTML™ 1.0: The Extensible HyperText Markup Language - A Reformulation of HTML 4 in XML 1.0. Available at: http://www.w3.org/TR/xhtml1. 2000.

[XSchema-0]: Fallside, David C. XML Schema Part 0: Primer. Available at: http://www.w3.org/TR/xmlschema-0. 2000.

[XSLT]: Clark, James. XSL Transformations (XSLT) Version 1.0. Available at: http://www.w3.org/TR/xslt. 1999.

Appendix A: Mapping to XML Schema Concrete Syntax (Non-Normative)

This is a placeholder for a section where a future revision will explain how the syntax proposed in this document for data modeling is mapped into the concrete syntax for XML Schemas.

Appendix B: Binding to XHTML 1.0 Forms (Non-Normative)

[Issue: This section contains preliminary information.]

One important aspect of XForms is providing a clean upgrade path for authors using Web forms today. The design of XForms is flexible in allowing various user interface technologies to work with a common data model. This appendix describes a simple binding between the XForms 1.0 data model and [XHTML 1.0] form elements.

The first step is to bind the <form> element to the appropriate <xform> element that defines the data model for the form. The value of the name attribute of the <form> element should match the id attribute on the <xform> element.

The next step is to ensure that the name attribute on each form control matches the full name of the corresponding field. XForms defines a hierarchical naming scheme using the name attribute for each level in the hierarchy. The full name of a field is given by the sequence of names from top to bottom of the hierarchy. In the example, the field for the street is identified by purchaseOrder.shipTo.street.

Specifying a name attribute on any of the following XHTML form elements binds that control to the element in the instance data with the same fully qualified name:

<input>
<select>
<textarea>
<object>

The initial value of an XHTML form control is the value of the bound instance data. The <button> element can be used to submit the form via a call to a script function, supplying the id value for the XForm as an argument. The script can use the DOM to traverse the markup defining the data model to build the XML representation of the data.
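As an illustration of that last step, the Python sketch below builds the XML instance data from form-control name/value pairs. It treats each dotted full name (such as purchaseOrder.shipTo.street) as a path of nested elements. It uses the standard xml.etree.ElementTree module rather than a browser DOM. The function name build_instance and the sample values are hypothetical, and repeated fields and array indices are not handled.

```python
import xml.etree.ElementTree as ET

def build_instance(controls):
    """Turn {"a.b.c": value, ...} into nested elements <a><b><c>value</c></b></a>."""
    root = None
    for full_name, value in controls.items():
        first, *rest = full_name.split(".")
        if root is None:
            root = ET.Element(first)
        elif root.tag != first:
            raise ValueError(f"{full_name!r} is outside <{root.tag}>")
        node = root
        for part in rest:
            child = node.find(part)  # reuse an element created earlier
            if child is None:
                child = ET.SubElement(node, part)
            node = child
        node.text = value
    return root

instance = build_instance({
    "purchaseOrder.shipTo.street": "123 Main St",
    "purchaseOrder.shipTo.city": "Springfield",
})
print(ET.tostring(instance, encoding="unicode"))
```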

Appendix C: Rationale Behind Decimal Arithmetic (Non-Normative)

Contributed by Mike Cowlishaw, IBM

Why is decimal arithmetic the right thing to use?

-- Many common decimal quantities (for example, 0.1) cannot be represented exactly in a binary floating point representation; binary floating point is a lossy encoding of decimal numbers. This leads to anomalies, even after a single operation, for example:

Division: 1/0.1 ==> 10 (correct)
Remainder: 1%0.1 ==> 0.0999999999999995 (incorrect, it should be 0)

Anomalies build up even more rapidly under repeated operations.

-- These anomalies are visible even if rounding is applied (the latter result, for example, rounds to 0.1 instead of 0).
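The remainder anomaly can be reproduced in Python, whose float type is IEEE 754 binary floating point, and contrasted with Python's decimal module. The number of digits printed for the binary result depends on the language's formatting; Python shows one more 9 than the figure above. The point is only that the binary result is non-zero and rounds to 0.1, while the decimal result is exactly zero.

```python
from decimal import Decimal

binary_rem = 1 % 0.1
decimal_rem = Decimal(1) % Decimal("0.1")

print(1 / 0.1)               # 10.0, the division is fine
print(binary_rem)            # 0.09999999999999995
print(round(binary_rem, 1))  # 0.1, not 0
print(decimal_rem)           # 0.0
```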

-- The anomalies lead to discrepancies between the results obtained 'manually' and those obtained by computer. This makes it difficult and expensive to verify algorithms and test software.

-- As a result, customers complain about unexpected results, and there are significantly increased costs in application development, service calls, and maintenance.

Issues of performance:

Binary floating point is often carried out in hardware, and in that case is faster than decimal arithmetic, which on most computers is implemented in software. However:

-- Few commercial applications spend much time carrying out arithmetic; measurements in an interpreted environment using decimal arithmetic suggest that typically about 8% of execution time is spent in arithmetic.

-- Conversions between decimal and string (readable) forms are simpler and more efficient than those between binary and string.

-- The bulk of numeric data stored in databases is held in decimal form (to avoid the anomalies described above); converting these to and from a binary form is inefficient as well as lossy.

-- In practice, the 'default' decimal precision (9 digits) can be very efficiently implemented using 32-bit integers for mantissa and exponent. This implementation would be especially attractive on 'small' devices.

-- In addition, all widely used microprocessor, mini, and mainframe computers (other than RISC machines) provide native decimal instructions or decimal adjustment operations.