Mathematical Markup Language (MathML) Version 2.0
2 MathML Fundamentals
2.1 MathML Overview
2.1.1 Taxonomy of MathML Elements
2.1.2 Expression Trees and Token Elements
2.1.3 Presentation Markup
2.1.4 Content Markup
2.1.5 Mixing Presentation and Content
2.2 Some MathML Examples
2.2.1 Presentation Examples
2.2.2 Content Examples
2.2.3 Mixed Markup Examples
2.3 MathML Syntax and Grammar
2.3.1 An XML Syntax Primer
2.3.2 Children versus Arguments
2.3.3 MathML Attribute Values
2.3.4 Attributes Shared by all MathML Elements
2.3.5 Collapsing Whitespace in Input
This chapter introduces the basic ideas of MathML. The first section describes the overall design of MathML. The second section presents a number of motivating examples, to give the reader something concrete to refer to while reading subsequent chapters of the MathML Specification. The final section describes basic features of the MathML syntax and grammar, which apply to all MathML markup. In particular, section 2.3 [MathML Syntax and Grammar] should be read before chapter 3 [Presentation Markup], chapter 4 [Content Markup] and chapter 5 [Combining Presentation and Content Markup].
A fundamental challenge in defining a mathematics markup language for the Web is reconciling the need to encode both the presentation of a mathematical notation and the content of the mathematical idea or object which it represents.
The relationship between a mathematical notation and a mathematical idea is subtle and deep. On a formal level, the results of mathematical logic raise unsettling questions about the correspondence between symbolic logic systems and the phenomena they model. At a more intuitive level, anyone who uses mathematical notation knows the difference that a good choice of notation can make; the symbolic structure of the notation suggests the logical structure. For example, the Leibniz notation for derivatives `suggests' the chain rule of calculus through the symbolic cancellation of fractions: $\frac{df}{dy} \cdot \frac{dy}{dx} = \frac{df}{dx}$.
Mathematicians and teachers understand this very well; part of their expertise lies in choosing notation that emphasizes key aspects of a problem while hiding or diminishing extraneous aspects. It is commonplace in mathematics and science to write one thing when technically something else is meant, because long experience shows this actually communicates the idea better at some higher level.
In many other settings, though, mathematical notation is used to encode the full, precise meaning of a mathematical object. Mathematical notation is capable of prodigious rigor, and when used carefully, it is virtually free of ambiguity. Moreover, it is precisely this lack of ambiguity which makes it possible to describe mathematical objects so that they can be used by software applications such as computer algebra systems and voice renderers. In situations where such inter-application communication is of paramount importance, the nuances of visual presentation generally play a minimal role.
MathML allows authors to encode both the notation which represents a mathematical object and the mathematical structure of the object itself. Moreover, authors can mix both kinds of encoding in order to specify both the presentation and content of a mathematical idea. The remainder of this section gives a basic overview of how MathML can be used in each of these ways.
All MathML elements fall into one of three categories: presentation elements, content elements and interface elements. Each of these categories is described in detail in chapter 3 [Presentation Markup], chapter 4 [Content Markup] and chapter 7 [The MathML Interface] respectively.
Presentation elements describe mathematical notation structure. Typical examples are the mrow element, which is used to indicate a horizontal row of pieces of expressions, and the msup element, which is used to indicate a base and superscript. As a general rule, each presentation element corresponds to a single kind of notational `schema' such as a row, a superscript, an underscript and so on. Since many notational schemata have a number of frequently occurring variants, most presentation elements accept a number of attributes which can be used to select between variants. For example, the superscript element accepts a `superscript shift' attribute which specifies the minimum amount the superscript should shift upward.
Content elements describe mathematical objects directly, as opposed to describing the notation which represents them. Typical examples include the plus element, which denotes the usual addition operator for real numbers, and the vector element, which denotes a vector from linear algebra. Each content element corresponds to some mathematical concept. Some elements represent mathematical objects like vectors, while others represent functions or operations like addition.
Every MathML element but one is either a presentation element or a content element. The math element is neither, since its role is to serve as a top-level interface element. One function of the math element is to pass on parameters to a MathML processor that affect an entire expression, such as style preferences. A second function is to communicate parameters to a Web browser about what software to use to render a MathML expression, and how the expression should be integrated into the surrounding HTML page. (As XML support is added to browsers, it may ultimately be necessary to introduce one or two more interface elements, to handle these functions separately. See chapter 7 [The MathML Interface] for details.)
Presentation and content expressions both share a number of formal properties. In both cases, most expressions naturally decompose into pieces, or sub-expressions. For example, the expression $(a + b)^2$ naturally breaks into a `base', the $(a + b)$, and a `script', which is the single character `2' in this case. Furthermore, as this example shows, the sub-expressions may themselves decompose into further sub-expressions, and so on. Of course, the decomposition process eventually terminates with indivisible expressions such as digits, letters, or other symbol characters.
Although this particular example involves mathematical notation, and hence presentation markup, the same observation applies equally well to abstract mathematical objects, and hence to content markup. For example, in a context of content markup our superscript example would typically be denoted by an exponentiation operation that would require two operands: a `base' and an `exponent'. This is no coincidence, since as a general rule, mathematical notation closely mirrors the logical structure of the underlying mathematical objects.
The recursive nature of mathematical objects and notation is strongly reflected in MathML markup. Most presentation or content elements contain some number of other MathML elements corresponding to the constituent pieces out of which the original object is recursively built. The original schema is commonly called the parent schema, and the constituent pieces are called child schemata. More generally, MathML expressions can be regarded as trees, where each node corresponds to a MathML element, the branches under a `parent' node correspond to its `children', and the leaves in the tree correspond to indivisible notation or content units such as numbers, characters, etc.
Most leaf nodes in a MathML expression tree are either canonically empty elements or token elements. Canonically empty elements directly represent symbols in MathML, such as the content element plus. MathML token elements are the only MathML elements permitted to directly contain character data. The character data may consist of ASCII characters and MathML entities, which are escape sequences of the form &name;. MathML entities typically denote non-ASCII Unicode characters such as α, → and ∑. A third kind of leaf node permitted in MathML is the annotation element, which is used to hold data in a non-MathML format.
The most important presentation token elements are mi, mn and mo for representing identifiers, numbers and operators respectively. Typically a renderer will employ slightly different typesetting styles for each of these kinds of character data: numbers are usually in upright font, identifiers in italics, and operators have extra space around them. In content markup, there are only two tokens, ci and cn, for identifiers and numbers respectively. In content markup, separate elements are provided for commonly used functions and operators. The fn element is provided for user-defined extensions to the base set.
In terms of markup, most MathML elements have a start tag and an end tag, which enclose the markup for their contents. In the case of tokens, the content is character data, and in most other cases, the content is the markup for child elements. A third category of elements, called canonically empty elements, don't require any contents, and are marked up using a single tag of the form <name/>. An example of this kind of markup is <plus/> in content markup.
Returning to the example of $(a + b)^2$, we can now see how the principles discussed above play out in practice. One form of presentation markup for this example is:
<msup> <mfenced> <mrow> <mi>a</mi> <mo>+</mo> <mi>b</mi> </mrow> </mfenced> <mn>2</mn> </msup>
The content markup for the same example is:
<apply> <power/> <apply> <plus/> <ci>a</ci> <ci>b</ci> </apply> <cn>2</cn> </apply>
While a full discussion of presentation and content markup must wait until chapter 3 [Presentation Markup] and chapter 4 [Content Markup], the main features of these sample encodings should now be relatively clear.
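As a rough illustration of the expression-tree view described above, the following Python sketch parses the two sample encodings with the standard library's xml.etree.ElementTree module and lists their leaf nodes. The helper name leaves is hypothetical and not part of MathML; the markup is the same as in the two samples above.

```python
import xml.etree.ElementTree as ET

PRESENTATION = """
<msup>
  <mfenced>
    <mrow> <mi>a</mi> <mo>+</mo> <mi>b</mi> </mrow>
  </mfenced>
  <mn>2</mn>
</msup>
"""

CONTENT = """
<apply>
  <power/>
  <apply> <plus/> <ci>a</ci> <ci>b</ci> </apply>
  <cn>2</cn>
</apply>
"""


def leaves(element):
    """Return (tag, text) pairs for the leaf nodes of an expression tree."""
    children = list(element)
    if not children:
        # Token elements carry character data; empty elements carry none.
        return [(element.tag, (element.text or "").strip())]
    result = []
    for child in children:
        result.extend(leaves(child))
    return result


presentation_leaves = leaves(ET.fromstring(PRESENTATION))
content_leaves = leaves(ET.fromstring(CONTENT))
print(presentation_leaves)
print(content_leaves)
```

In the presentation tree every leaf is a token element, while the content tree also has the canonically empty elements power and plus among its leaves.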
MathML presentation markup consists of 30 elements which accept over 50 attributes. Most of the elements correspond to layout schemata, which contain other presentation elements. Each layout schema corresponds to a two-dimensional notational device, such as a superscript or subscript, fraction or table. In addition, there are the presentation token elements mi, mn and mo introduced above, as well as several other less commonly used token elements. The remaining few presentation elements are empty elements, and are used mostly in connection with alignment.
The layout schemata fall into several classes. One group of elements is concerned with scripts, and contains elements such as msub, munder, and mmultiscripts. Another group focuses on more general layout and includes mrow, mstyle, and mfrac. A third group deals with tables. The maction element is a category by itself, and represents various kinds of actions on notation, such as in an expression which toggles between two pieces of notation.
An important feature of many layout schemata is that the order of child schemata is significant. For example, the first child of an mfrac element is the numerator and the second child is the denominator. Since the order of child schemata is not enforced at the XML level by the MathML DTD, the information added by ordering is only available to a MathML processor, as opposed to a generic XML processor. When we want to emphasize that a MathML element such as mfrac requires children in a specific order, we will refer to them as arguments, and think of the mfrac element as a notational `constructor'.
Content markup consists of about 100 elements accepting roughly a dozen attributes. The majority of these elements are empty elements corresponding to a wide variety of operators, relations and named functions. Examples of this sort include partialdiff, leq and tan. Others, such as matrix and set, are used to encode various mathematical data types, and a third, important category of content elements, such as apply, is used to make new mathematical objects from others.
The apply element is perhaps the single most important content element. It is used to apply a function to a collection of arguments. The position of the child schemata is again significant, with the first child denoting the function to be applied, and the remaining children denoting the arguments of the function, with order preserved. Note that the apply construct always uses prefix notation, like the programming language LISP. In particular, even binary operations like subtraction are marked up by applying a prefix subtraction operator to two arguments. For example, a - b would be marked up as
<apply> <minus/> <ci>a</ci> <ci>b</ci> </apply>
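To make the prefix convention concrete, here is a minimal Python sketch of an evaluator for a handful of content elements. The OPERATORS table and the evaluate function are hypothetical names introduced for illustration; they cover only plus, minus, times, divide and power, and they are not a processing model defined by MathML.

```python
import operator
import xml.etree.ElementTree as ET
from functools import reduce

# Hypothetical operator table covering only a few content elements.
OPERATORS = {
    "plus": lambda args: sum(args),
    "times": lambda args: reduce(operator.mul, args, 1),
    # minus is unary with one argument and binary with two.
    "minus": lambda args: -args[0] if len(args) == 1 else args[0] - args[1],
    "divide": lambda args: args[0] / args[1],
    "power": lambda args: args[0] ** args[1],
}


def evaluate(element, bindings):
    """Evaluate a content expression, looking up ci identifiers in bindings."""
    if element.tag == "cn":
        return float(element.text.strip())
    if element.tag == "ci":
        return bindings[element.text.strip()]
    if element.tag == "apply":
        # The first child is the function; the remaining children are its arguments.
        function_element, *operands = list(element)
        function = OPERATORS[function_element.tag]
        return function([evaluate(child, bindings) for child in operands])
    raise ValueError(f"unsupported element: {element.tag}")


difference = evaluate(
    ET.fromstring("<apply> <minus/> <ci>a</ci> <ci>b</ci> </apply>"),
    {"a": 5.0, "b": 3.0},
)
print(difference)
```

Because the operator always comes first, the evaluator never has to deal with operator precedence or parentheses; the nesting of apply elements carries that information.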
A number of functions and operations require one or more qualifiers to be well-defined. For example, in addition to an integrand, a definite integral must specify the limits of integration and the bound variable. For this reason, there are several qualifier schemata such as bvar and lowlimit. They are used with operators such as diff and int.
The declare construct is especially important for content markup that might be evaluated by a computer algebra system. The declare element provides a basic assignment mechanism, where a variable can be declared to be of a certain type, with a certain value. Typically, declarations are ignored for visual rendering, and are used when an expression is evaluated.
Different kinds of markup will be most appropriate for different kinds of tasks. Legacy data is probably best translated into pure presentation markup, since semantic information about what the author meant can only be guessed at heuristically. By contrast, some mathematical applications and pedagogically-oriented authoring tools will likely choose to be entirely content-based. However, the majority of applications fall somewhere in between these extremes. For these applications, the most appropriate markup is a mixture of both presentation and content markup.
The rules for mixing presentation and content markup derive from the general principle that mixed content should only be allowed in places where it makes sense. For content markup embedded in presentation markup this basically means that any content fragments should be semantically meaningful, and should not require additional arguments or quantifiers to be fully specified. For presentation markup embedded in content markup, this usually means that presentation markup must be contained in a content token element, so that it will be treated as an indivisible notational unit used as a variable or function name.
Another option is to use a semantics element. The semantics element is used to bind MathML expressions to various kinds of annotations. One common use for the semantics element is to bind a content expression to a presentation expression as a semantic annotation. In this way, an author can specify a non-standard notation to be used when displaying a particular content expression. Another use of the semantics element is to bind some other kind of semantic specification, such as an OpenMath expression, to a MathML expression. In this way, the semantics element can be used to extend the scope of MathML content markup.
Notation: $x^2 + 4x + 4 = 0$.
Markup:
<mrow>
  <mrow>
    <msup> <mi>x</mi> <mn>2</mn> </msup>
    <mo>+</mo>
    <mrow> <mn>4</mn> <mo>&InvisibleTimes;</mo> <mi>x</mi> </mrow>
    <mo>+</mo>
    <mn>4</mn>
  </mrow>
  <mo>=</mo>
  <mn>0</mn>
</mrow>
Note the use of nested mrow elements to denote terms, in this case the left-hand side of the equation functioning as an operand of `='. Marking terms greatly facilitates things like spacing for visual rendering, voice rendering, and line breaking.
Notation: $x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$.
Markup:
<mrow>
  <mi>x</mi>
  <mo>=</mo>
  <mfrac>
    <mrow>
      <mrow> <mo>-</mo> <mi>b</mi> </mrow>
      <mo>&PlusMinus;</mo>
      <msqrt>
        <mrow>
          <msup> <mi>b</mi> <mn>2</mn> </msup>
          <mo>-</mo>
          <mrow> <mn>4</mn> <mo>&InvisibleTimes;</mo> <mi>a</mi> <mo>&InvisibleTimes;</mo> <mi>c</mi> </mrow>
        </mrow>
      </msqrt>
    </mrow>
    <mrow> <mn>2</mn> <mo>&InvisibleTimes;</mo> <mi>a</mi> </mrow>
  </mfrac>
</mrow>
Notice that the plus/minus sign is given by a special named entity &PlusMinus;. MathML provides a very comprehensive list of entity names for mathematical symbols. In addition to the mathematical symbols needed for screen and print rendering, MathML provides symbols to facilitate audio rendering. For audio rendering, it is important to be able to automatically determine whether
<mrow> <mi>z</mi> <mfenced> <mrow> <mi>x</mi> <mo>+</mo> <mi>y</mi> </mrow> </mfenced> </mrow>
should be read as `z times the quantity x plus y' or `z of x plus y'. The entities &InvisibleTimes; and &ApplyFunction; provide a way for authors to directly encode the distinction for audio renderers. For instance, in the first case an mo element containing &InvisibleTimes; should be inserted after the mi element containing the z. MathML also introduces entities like &DifferentialD;, which represents a `differential d' that renders with slightly different spacing in print, and can be rendered as `d' or `with respect to' in speech. Unless content tags, or some other mechanism, are used to eliminate the ambiguity, authors should always use these entities, in order to make their documents more accessible.
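The following Python sketch suggests how a voice renderer might use the two invisible operators to choose between the two readings. It is only an illustration under stated assumptions: the function names speak and text_of and the SPOKEN_OPERATORS table are hypothetical, the entities are written as numeric character references (U+2062 for invisible times and U+2061 for function application, their Unicode code points) so that a generic XML parser can read the markup without the MathML DTD, and only a tiny subset of presentation elements is handled.

```python
import xml.etree.ElementTree as ET

INVISIBLE_TIMES = "\u2062"  # Unicode INVISIBLE TIMES
APPLY_FUNCTION = "\u2061"   # Unicode FUNCTION APPLICATION

# Hypothetical spoken forms for a few operators.
SPOKEN_OPERATORS = {INVISIBLE_TIMES: "times", APPLY_FUNCTION: "of", "+": "plus"}


def text_of(element):
    """Character data of a token element with surrounding whitespace removed."""
    return (element.text or "").strip(" \t\r\n")


def speak(element, after_function=False):
    """Very rough spoken rendering of mi, mn, mo, mfenced and mrow."""
    if element.tag in ("mi", "mn"):
        return text_of(element)
    if element.tag == "mo":
        symbol = text_of(element)
        return SPOKEN_OPERATORS.get(symbol, symbol)
    if element.tag == "mfenced":
        inner = " ".join(speak(child) for child in element)
        # Function arguments are read plainly; other fenced terms as a quantity.
        return inner if after_function else "the quantity " + inner
    words = []
    previous_was_apply = False
    for child in element:
        words.append(speak(child, previous_was_apply))
        previous_was_apply = child.tag == "mo" and text_of(child) == APPLY_FUNCTION
    return " ".join(words)


TEMPLATE = (
    "<mrow> <mi>z</mi> <mo>{operator}</mo>"
    " <mfenced> <mrow> <mi>x</mi> <mo>+</mo> <mi>y</mi> </mrow> </mfenced> </mrow>"
)
product_reading = speak(ET.fromstring(TEMPLATE.format(operator="&#x2062;")))
function_reading = speak(ET.fromstring(TEMPLATE.format(operator="&#x2061;")))
print(product_reading)
print(function_reading)
```

Without one of the two operators between z and the fenced term, a renderer of this kind has nothing to base the choice on, which is the ambiguity the paragraph above describes.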
Notation: $A = \begin{bmatrix} x & y \\ z & w \end{bmatrix}$.
Markup:
<mrow>
  <mi>A</mi>
  <mo>=</mo>
  <mfenced open="[" close="]">
    <mtable>
      <mtr> <mtd><mi>x</mi></mtd> <mtd><mi>y</mi></mtd> </mtr>
      <mtr> <mtd><mi>z</mi></mtd> <mtd><mi>w</mi></mtd> </mtr>
    </mtable>
  </mfenced>
</mrow>
Most elements have a number of attributes that control the details of their screen and print rendering. For example, there are several attributes for the mfenced element that control what delimiters should be used at the beginning and the end of the expression. The attributes for operator elements given using <mo> are set to default values determined by a dictionary. (For the suggested MathML operator dictionary, see appendix D [Operator Dictionary].)
Notation: $x^2 + 4x + 4 = 0$.
Markup:
<apply>
  <eq/>
  <apply>
    <plus/>
    <apply> <power/> <ci>x</ci> <cn>2</cn> </apply>
    <apply> <times/> <cn>4</cn> <ci>x</ci> </apply>
    <cn>4</cn>
  </apply>
  <cn>0</cn>
</apply>
Note that the apply element is used for relations, operators and functions.
Notation: $x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$.
Markup:
<apply>
  <eq/>
  <ci>x</ci>
  <apply>
    <divide/>
    <apply>
      <fn><mo>&PlusMinus;</mo></fn>
      <apply> <minus/> <ci>b</ci> </apply>
      <apply>
        <root/>
        <apply>
          <minus/>
          <apply> <power/> <ci>b</ci> <cn>2</cn> </apply>
          <apply> <times/> <cn>4</cn> <ci>a</ci> <ci>c</ci> </apply>
        </apply>
        <cn>2</cn>
      </apply>
    </apply>
    <apply> <times/> <cn>2</cn> <ci>a</ci> </apply>
  </apply>
</apply>
MathML content markup does not directly contain an element for the `plus or minus' operation. Therefore, we use the fn element to declare that we want the presentation markup for this operator to act as a content operator. This is a simple example of how presentation and content markup can be mixed to extend content markup.
Notation: $A = \begin{pmatrix} x & y \\ z & w \end{pmatrix}$.
Markup:
<apply>
  <eq/>
  <ci>A</ci>
  <matrix>
    <matrixrow> <ci>x</ci> <ci>y</ci> </matrixrow>
    <matrixrow> <ci>z</ci> <ci>w</ci> </matrixrow>
  </matrix>
</apply>
Note that by default, the rendering of the content element matrix includes enclosing parentheses, so we need not directly encode them. This is quite different from the presentation element mtable, which may or may not refer to a matrix, and hence requires explicit encoding of the parentheses if they are desired.
Notation: $\int_0^t \frac{dx}{x}$.
Markup:
<semantics>
  <mrow>
    <msubsup> <mo>&int;</mo> <mn>0</mn> <mi>t</mi> </msubsup>
    <mfrac>
      <mrow> <mo>&DifferentialD;</mo> <mi>x</mi> </mrow>
      <mi>x</mi>
    </mfrac>
  </mrow>
  <annotation-xml encoding="MathML-Content">
    <apply>
      <int/>
      <bvar><ci>x</ci></bvar>
      <lowlimit><cn>0</cn></lowlimit>
      <uplimit><ci>t</ci></uplimit>
      <apply> <divide/> <cn>1</cn> <ci>x</ci> </apply>
    </apply>
  </annotation-xml>
</semantics>
In this example, we use the semantics element to provide a MathML content expression to serve as a `semantic annotation' for a presentation expression. The semantics element has as its first child the expression being annotated, and the subsequent children are the annotations. There is no restriction on the kind of annotation that can be attached using the semantics element. For example, one might give a TeX encoding, or computer algebra input in an annotation. The type of annotation is specified by the encoding attribute and the annotation and annotation-xml elements.
Another common use of the semantics element arises when one wants to use a content coding, and provide a suggestion for its presentation. In this case, we would have the markup:
<semantics>
  <apply>
    <int/>
    <bvar><ci>x</ci></bvar>
    <lowlimit><cn>0</cn></lowlimit>
    <uplimit><ci>t</ci></uplimit>
    <apply> <divide/> <cn>1</cn> <ci>x</ci> </apply>
  </apply>
  <annotation-xml encoding="MathML-Presentation">
    <mrow>
      <msubsup> <mo>&int;</mo> <mn>0</mn> <mi>t</mi> </msubsup>
      <mfrac>
        <mrow> <mo>&DifferentialD;</mo> <mi>x</mi> </mrow>
        <mi>x</mi>
      </mfrac>
    </mrow>
  </annotation-xml>
</semantics>
This kind of annotation is useful when something other than the default rendering of the content encoding is desired. For example, by default, some renderers might lay out the integrand as something like `1/x dx'. Specifying that the integrand should by preference render as `dx/x' instead can be accomplished with the use of a MathML Presentation annotation as shown. Be aware, however, that renderers are not required to take into account information contained in annotations, and what use is made of them, if any, will depend on the renderer.
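As a small sketch of how a processor might separate the annotated expression from its annotations, the Python fragment below reads a semantics element with the standard library's xml.etree.ElementTree module. The function name split_semantics is hypothetical, the markup is a shortened variant of the example above, and the encoding value "TeX" is used purely for illustration.

```python
import xml.etree.ElementTree as ET

MARKUP = r"""
<semantics>
  <apply> <divide/> <cn>1</cn> <ci>x</ci> </apply>
  <annotation-xml encoding="MathML-Presentation">
    <mfrac> <mn>1</mn> <mi>x</mi> </mfrac>
  </annotation-xml>
  <annotation encoding="TeX">\frac{1}{x}</annotation>
</semantics>
"""


def split_semantics(semantics):
    """Return the annotated expression and its annotations keyed by encoding."""
    # The first child is the expression being annotated; the rest are annotations.
    annotated, *annotations = list(semantics)
    by_encoding = {item.get("encoding"): item for item in annotations}
    return annotated, by_encoding


annotated, annotations = split_semantics(ET.fromstring(MARKUP))
print(annotated.tag)
print(sorted(annotations))
```

A renderer that does not understand a given encoding can simply ignore that entry, which matches the remark above that renderers are not required to use annotations.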
MathML is an application of XML, or Extensible Markup Language [Bray1998], and as such its syntax is governed by the rules of XML syntax, and its grammar is in part specified by a DTD, or Document Type Definition. In other words, the details of using tags, attributes, entity references and so on are defined in the XML language specification, and the details about MathML element and attribute names, which elements can be nested inside each other, and so on are specified in the MathML DTD.
Issue (rewrite-for-schema): The following needs to be revised pending creation of a schema for MathML.
However, MathML also specifies some syntax and grammar rules in addition to the general rules it inherits as an XML application. These rules allow MathML to encode a great deal more information than would ordinarily be possible with pure XML, without introducing many more elements, and using a substantially more complex DTD. A grammar for content markup expressions is given in appendix B [Content Markup Validation Grammar]. Of course, one drawback to using MathML specific rules is that they are invisible to generic XML processors and validators.
There are basically two kinds of additional MathML grammar and syntax rules. One kind involves placing additional criteria on attribute values. For example, it is not possible in pure XML to require that an attribute value be a positive integer. The second kind of rule specifies more detailed restrictions on the child elements (for example on ordering) than are given in the DTD. For example, it is not possible in XML to specify that the first child be interpreted one way, and the second in another.
The following sections discuss features both of XML syntax and grammar in general, and of MathML in particular. Throughout the remainder of the MathML specification, we will usually take care to distinguish between usage required by XML syntax and the MathML DTD and usage required by MathML specific rules. However, we will frequently allude to `MathML errors' without identifying which part of the specification is being violated.
Since MathML is an application of XML, the MathML Specification uses the terminology of XML to describe it. Briefly, XML data is composed of Unicode characters (which include ordinary ASCII characters), `entity references' (informally called `entities') such as → which usually represent `extended characters', and `elements' such as <mi fontstyle="normal"> x </mi>. Elements enclose other XML data called their `content' between a `start tag' (sometimes called a `begin tag') and an `end tag', much like in HTML. There are also `empty elements' such as <plus/>, whose start tag ends with /> to indicate that the element has no content or end tag. The start tag can contain named parameters called `attributes', such as fontstyle="normal" in the example above. For further details on XML, consult the XML specification [Bray1998].
As XML is case-sensitive, MathML element and attribute names are case-sensitive. For reasons of legibility, MathML defines almost all of them in lowercase.
In formal discussions of XML markup a distinction is maintained between an element, such as an mrow element, and the tags <mrow> and </mrow> marking it. What is between the <mrow> start tag and the </mrow> end tag is the content of the mrow element. An `empty element' such as none is defined to have no content and so has a single tag of the form <none/>. Usually, the distinction between elements and tags will not be so finely drawn in this specification. For instance, we will sometimes refer to the <mrow> and <none/> elements, really meaning the elements whose tags these are, in order that references to elements are visually distinguishable from references to attributes. However, the words `element' and `tag' themselves will be used strictly in accordance with XML terminology.
Many MathML elements require a specific number of child elements and/or attach additional meanings to children in certain positions. As noted above, these kinds of requirements are MathML specific, and cannot be specified entirely in terms of XML syntax and grammar. When the children of a given MathML element are subject to these kinds of additional conditions, we will often refer to them as arguments instead of merely children in order to emphasize their MathML specific usage. Note that especially in chapter 3 [Presentation Markup] the term `argument' is usually used in this technical sense, unless otherwise noted, and therefore refers to a child element.
In the detailed discussions of element syntax given with each element throughout the MathML specification, the number of required arguments and their order is implicitly indicated by giving names for the arguments at various positions. This information is also given for presentation elements in the table of argument requirements in section 3.1.3 [Required Arguments], and for content elements in appendix B [Content Markup Validation Grammar].
A few elements have other requirements on the number or type of arguments. These additional requirements are described together with the individual elements.
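Since argument requirements of this kind are invisible to a generic XML validator, a MathML processor has to check them itself. The sketch below shows one possible shape for such a check in Python. The table REQUIRED_ARGUMENTS is a partial, illustrative list (the authoritative requirements are those referred to above), and the function name argument_errors is hypothetical.

```python
import xml.etree.ElementTree as ET

# Partial, illustrative table of argument counts for a few presentation elements.
REQUIRED_ARGUMENTS = {"mfrac": 2, "msup": 2, "msub": 2, "msubsup": 3}


def argument_errors(element):
    """Report elements whose number of child elements differs from the table."""
    errors = []
    for node in element.iter():
        required = REQUIRED_ARGUMENTS.get(node.tag)
        if required is not None and len(node) != required:
            errors.append(
                f"{node.tag}: expected {required} arguments, found {len(node)}"
            )
    return errors


well_formed_and_valid = ET.fromstring("<mfrac> <mn>1</mn> <mi>x</mi> </mfrac>")
well_formed_only = ET.fromstring("<mfrac> <mn>1</mn> </mfrac>")
print(argument_errors(well_formed_and_valid))
print(argument_errors(well_formed_only))
```

Both inputs are well-formed XML, so an XML parser accepts them; only the MathML-specific check reports the second one.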
According to the XML language specification, attributes given to elements must have one of the forms
attribute-name = "value"
or
attribute-name = 'value'
where whitespace around the '=' is optional.
Attribute names are generally shown in a monospaced font within descriptive text in this specification, but not within examples.
The attribute value, which in general in MathML can be a string of arbitrary characters, must be surrounded by a pair of either double quotes (") or single quotes ('). The kind of quotes not used to surround the value may be included within it.
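A quick way to see these quoting rules at work is to feed both forms to an XML parser. The Python sketch below uses the standard library's xml.etree.ElementTree module; the mfenced markup is made up for illustration.

```python
import xml.etree.ElementTree as ET

# The same attributes written with double quotes and with single quotes,
# the latter with optional whitespace around one of the '=' signs.
double_quoted = ET.fromstring('<mfenced open="[" close="]"> <mi>x</mi> </mfenced>')
single_quoted = ET.fromstring("<mfenced open='[' close = ']'> <mi>x</mi> </mfenced>")

# A double quote inside a single-quoted value needs no escaping.
quote_fences = ET.fromstring("""<mfenced open='"' close='"'> <mi>x</mi> </mfenced>""")

same_attributes = double_quoted.attrib == single_quoted.attrib
print(same_attributes)
print(quote_fences.get("open"))
```

After parsing, the choice of quote character is no longer visible; only the attribute values remain.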
MathML uses a more complicated syntax for attribute values than the generic XML syntax required by the MathML DTD. These additional rules are intended for use by MathML applications, and it is a MathML error to violate them, though they are not enforced by XML processing. The MathML syntax of each attribute value is specified in the table of attributes provided with the description of each element it can be used with, using a notation described below. In MathML applications these attribute values should be further processed as follows, unless otherwise specified: whitespace is ignored except to separate letter and/or digit sequences into individual words or numbers; and the same entity references (listed in chapter 6 [Entities, Characters and Fonts]) which can be used within token elements to represent characters can be used to represent those characters in attribute values (whenever those characters would be permitted by that attribute value's syntax).
In particular, the characters ", ', & and < can be included in MathML attribute values (when permitted by the attribute value syntax) using the entity references &quot;, &apos;, &amp; and &lt;, respectively.
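When attribute values are generated by software, this escaping is usually delegated to a library. The Python sketch below uses escape and quoteattr from the standard library's xml.sax.saxutils module; the helper name escape_attribute_text and the mfenced example are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# escape() handles '&', '<' and '>' by default; quotes are passed explicitly.
QUOTES = {'"': "&quot;", "'": "&apos;"}


def escape_attribute_text(text):
    """Replace the special characters with the entity references listed above."""
    return escape(text, QUOTES)


escaped = escape_attribute_text("""a < b & "c" 'd'""")
print(escaped)

# quoteattr() escapes the value and adds the surrounding quotes.
markup = f"<mfenced open={quoteattr('<')} close={quoteattr('>')}> <mi>x</mi> </mfenced>"
fenced = ET.fromstring(markup)
print(markup)
print(fenced.get("open"), fenced.get("close"))
```

Parsing the generated markup gives back the original characters, so the entity references are purely a matter of how the value is written down.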
The MathML DTD provided in appendix A [Parsing MathML] declares most attribute value types as CDATA strings. This permits increased interoperability with existing SGML and XML software and allows extension to the lists of predefined values.
To describe the MathML-specific syntax of permissible attribute values, the following conventions and notations are used for most attributes in the present document.
Notation | What it matches |
number | decimal integer or rational number (digits with one decimal point), optionally starting with '-' |
unsigned-number | decimal integer or real number, no sign |
integer | decimal integer, optionally starting with '-' |
positive-integer | decimal integer, unsigned, not 0 |
string | arbitrary string (always the entire attribute value) |
character | single non-whitespace character, or MathML entity reference; whitespace separation is optional |
#rgb | RGB color value |
#rrggbb | RGB color value |
h-unit | unit of horizontal length (allowable units are listed below) |
v-unit | unit of vertical length (allowable units are listed below) |
css-fontfamily | explained in CSS subsection, below |
html-color-name | explained in CSS subsection, below |
other italicized words | explained in the text for each attribute |
form + | one or more instances of form |
form * | zero or more instances of form |
f1 f2 ... fn | one instance of each form, in sequence, perhaps separated by whitespace |
f1 | f2 | ... | fn | any one of the specified forms |
[ form ] | optional instance of form |
( form ) | same as form |
word in plain text | that word, literally present in attribute value (unless it is obviously part of an explanatory phrase) |
quoted symbol | that symbol, literally present in attribute value (e.g. "+" or '+') |
Issue (rgb-notation): Do we need to explain what RGB colour notation is?
The order of precedence of the syntax notation operators is, from highest to lowest precedence:
form + or form *
f1 f2 ... fn (sequence of forms)
f1 | f2 | ... | fn (alternative forms)
A string can contain arbitrary characters which are specifiable within XML CDATA attribute values; it must use entity references for certain characters, as described earlier. It can contain XML-format entity or character references for any of the characters listed in chapter 6 [Entities, Characters and Fonts]. No syntax rule in MathML includes string as only part of an attribute value, only as the entire value.
Issue (character): This needs to be revised for the introduction of the mchar element.
A character consists of a single non-whitespace character or entity reference.
As a simple example, the permissible values of boolean attributes are specified as true | false, meaning that the entire attribute value should be either true or false.
Adjacent keywords and/or numbers must be separated by whitespace in the actual attribute values, except for unit identifiers (symbolized by h-unit or v-unit syntax symbols) following numbers. Whitespace is not otherwise required, but is permitted between any of the tokens listed above, except (for compatibility with CSS1) immediately before unit identifiers, between the '-' signs and digits of negative numbers, or between # and rgb or rrggbb.
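The whitespace rules for a number followed by a unit can be sketched with a regular expression. The Python fragment below is one assumed reading of the number form from the notation table (an optional '-', then digits with at most one decimal point) combined with the unit identifiers defined by this specification; the names LENGTH and parse_length are hypothetical.

```python
import re

UNITS = ("em", "ex", "px", "in", "cm", "mm", "pt", "pc")

# Optional '-', digits with at most one decimal point, then the unit with
# no whitespace in between. Leading and trailing whitespace is tolerated.
LENGTH = re.compile(r"\s*(-?(?:\d+\.?\d*|\.\d+))(" + "|".join(UNITS) + r")\s*")


def parse_length(value):
    """Return (number, unit) for a valid length, or None if it does not match."""
    match = LENGTH.fullmatch(value)
    if match is None:
        return None
    return float(match.group(1)), match.group(2)


print(parse_length("0.5em"))
print(parse_length(" 2ex "))
print(parse_length("0.5 em"))   # whitespace before the unit is not permitted
print(parse_length("- 1em"))    # nor between the '-' sign and the digits
```

A leading '+' is rejected as well, in line with the rule that an explicit plus sign is only allowed where an attribute's syntax lists it.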
Numeric attribute values for dimensions that should depend upon the current font can be given in font-related units, or in named absolute units (described in a separate subsection below). Horizontal dimensions are conventionally given in em's, and vertical dimensions in ex's, by immediately following a number by one of the unit identifiers em or ex. For example, the horizontal spacing around an operator such as `+' is conventionally given in ems, though other units can be used. Using font-related units is usually preferable to using absolute units, since it allows renderings to grow or shrink proportionately to the current font size.
For most numeric attributes, only those in a subset of the expressible values are sensible; values outside this subset are not errors, unless otherwise specified, but rather are rounded up or down (at the discretion of the renderer) to the closest value within the allowed subset. The set of allowed values may depend on the renderer, and is not specified by MathML.
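One way a renderer might implement this rounding is to pick the supported value closest to the requested one. The Python sketch below assumes a hypothetical renderer with a small list of supported values; neither the list nor the function name nearest_allowed comes from MathML.

```python
def nearest_allowed(value, allowed):
    """Return the member of allowed closest to value (first one wins on ties)."""
    return min(allowed, key=lambda candidate: abs(candidate - value))


# Hypothetical set of values supported by some renderer.
SUPPORTED = [0.0, 0.5, 1.0, 2.0]

print(nearest_allowed(-3.0, SUPPORTED))  # a negative value that is not sensible
print(nearest_allowed(1.6, SUPPORTED))
print(nearest_allowed(7.0, SUPPORTED))
```

How ties are broken is left to the renderer; taking the first candidate is just one possible choice.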
If a numeric value within an attribute value syntax description is declared to allow a minus sign ('-'), e.g. number or integer, it is not a syntax error when one is provided in cases where a negative value is not sensible. Instead, the value should be handled by the processing application as described in the preceding paragraph. An explicit plus sign ('+') is not allowed as part of a numeric value except when it is specifically listed in the syntax (as a quoted '+' or "+"), and its presence can change the meaning of the attribute value (as documented with each attribute which permits it).
Issue (html-color): The phrase html-color-name is used but never explained.
The symbols h-unit, v-unit, css-fontfamily, and html-color-name are explained in the following subsections.
Some attributes accept horizontal or vertical lengths as numbers followed by a `unit identifier' (often just called a `unit'). The syntax symbols h-unit and v-unit refer to a unit for horizontal or vertical length, respectively. The possible units and the lengths they refer to are shown in the table below; they are the same for horizontal and vertical lengths, but the syntax symbols are distinguished in attribute syntaxes as a reminder of the direction they are each used in.
The unit identifiers and meanings are taken from CSS1. (However, the syntax of numbers followed by unit identifiers in MathML is not identical to the syntax of length values with units in CSS style sheets, since numbers in CSS can't end with decimal points, and are allowed to start with '+' signs.)
The possible horizontal or vertical units in MathML are:
Unit identifier | Unit description |
em | em (font-relative unit traditionally used for horizontal lengths) |
ex | ex (font-relative unit traditionally used for vertical lengths) |
px | pixels, or pixel size of the current display |
in | inches (1 inch = 2.54 centimeters) |
cm | centimeters |
mm | millimeters |
pt | points (1 point = 1/72 inch) |
pc | picas (1 pica = 12 points) |
% | percentage of default value |
The typesetting units em and ex are defined in appendix F [Glossary], and discussed further under `Additional notes' below.
% is a `relative unit'; when an attribute value is given as n% (for any numeric value n), the value being specified is the default value for the property being controlled multiplied by n divided by 100. The default value (or the way in which it is obtained, when it is not constant) is listed in the table of attributes for each element, and its meaning is described in the subsequent documentation about that attribute. (The mpadded element has its own syntax for % and does not allow it as a unit identifier.)
For consistency with CSS, length units in MathML are rarely optional. When they are, the unit symbol is enclosed in square brackets in the attribute syntax, following the number it applies to, e.g. number [ h-unit ]. The meaning of specifying no unit is given in the documentation for each attribute; in general it is that the number given is a multiplier for the default value of the attribute. (In such cases, specifying the number nnn without a unit is equivalent to specifying the number nnn times 100 followed by %. For example, <mo maxsize="2"> ( </mo> is equivalent to <mo maxsize="200%"> ( </mo>.)
As a special exception (also consistent with CSS), a numeric value equal to 0 need not be followed by a unit identifier even if the syntax specified here requires one. In such cases, the unit identifier (or lack of one) would not matter, since 0 times any unit is 0.
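The zero exception can be sketched as follows, assuming lspace and rspace on mo as attributes whose syntax normally requires a unit. Both operators below are intended to be spaced identically.

```xml
<mrow>
  <mi>x</mi>
  <!-- zero written without a unit identifier -->
  <mo lspace="0" rspace="0">+</mo>
  <mi>y</mi>
  <!-- the same lengths with an explicit unit -->
  <mo lspace="0em" rspace="0em">+</mo>
  <mi>z</mi>
</mrow>
```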
For most attributes, the typical unit which would be used to describe them in typesetting is the same as the one used in that attribute's default value in this specification; when a specific default value is not given, the typical unit is usually mentioned in the syntax table or in the documentation for that attribute. The typical unit is usually em or ex. However, any unit can be used, unless otherwise specified for a specific attribute.
Note that some attributes, e.g. framespacing on <mtable>, can contain more than one numeric value, each followed by its own unit.
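For instance, a framespacing value might be written as in the sketch below, on the assumption that the first value is the horizontal spacing and the second the vertical. The values themselves are illustrative only.

```xml
<!-- a frame is requested so that the frame spacing is visible -->
<mtable frame="solid" framespacing="1em 0.5ex">
  <mtr>
    <mtd><mi>a</mi></mtd>
    <mtd><mi>b</mi></mtd>
  </mtr>
</mtable>
```

The two values are separated by whitespace, while each number is immediately followed by its own unit identifier.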
It is conventional to use the font-relative unit ex mainly for vertical lengths, and em mainly for horizontal lengths, but this is not required. These units are relative to the font and fontsize which would be used for rendering the element in whose attribute value they are specified, which means they should be interpreted after attributes such as fontfamily and fontsize are processed, if those occur on the same element, since changing the current font or fontsize can change the length of these units.
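The ordering requirement can be illustrated with a sketch in which a font change and an em-based length occur on the same element. The attribute values are assumptions chosen for the example.

```xml
<mrow>
  <mi>a</mi>
  <!-- 1em here is measured in the 20pt font set on this same element,
       not in the font size inherited from the surrounding mrow -->
  <mo fontsize="20pt" lspace="1em" rspace="1em">+</mo>
  <mi>b</mi>
</mrow>
```

A renderer that resolved lspace and rspace before applying fontsize would use the inherited font size instead, and would therefore produce different gaps.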
The definition of the length of each unit (but not the MathML syntax for length values) is as specified in CSS1, except that if a font provides specific values for em and/or ex which differ from the values defined by CSS1 (the font size and `x'-height respectively), those values should be used.
Several MathML attributes, listed below, correspond closely with text rendering properties defined by Cascading Style Sheets, Level 1 (CSS1).
The names and acceptable values of these attributes have been aligned with the CSS1 recommendation where possible. In general, the MathML syntax for each attribute is intended to be a subset of the CSS syntax for the corresponding property. Differences at the detail level, where they exist, are explained with the documentation about each attribute, in the sections of this specification listed in the table.
The syntax of certain attributes is partially specified, in the tables of attribute syntax in this specification, using one of the symbols css-fontfamily or html-color-name, as shown in the following table. These symbols refer to syntaxes from other W3C Recommendations, and are explained in the sections of this specification referred to in the table.
MathML attribute | CSS property | syntax symbol | MathML elements | refer to |
fontsize | font-size | - | presentation tokens; mstyle | |
fontweight | font-weight | - | presentation tokens; mstyle | |
fontstyle | font-style | - | presentation tokens; mstyle | |
fontfamily | font-family | css-fontfamily | presentation tokens; mstyle | |
color | color | html-color-name | presentation tokens; mstyle | |
background | background | html-color-name | mstyle | |
See also section 2.3.4 [Attributes Shared by all MathML Elements] below for a discussion of the class, style and id attributes for use with style sheets.
CSS or analogous style sheets specify changes to rendering properties of selected MathML elements (selecting the elements in various ways). Either the properties listed above, or any other MathML rendering attributes or properties supported by a style sheet mechanism, can be affected, in principle for any element. Since rendering properties can also be changed by attributes on an element, or automatically (which can happen to fontsize, as explained in the discussion on scriptlevel in section 3.3.4 [Style Change (mstyle)]), it is necessary to specify the relative order in which changes from various sources occur. In the case of `absolute' changes, i.e. setting a new property value independent of the old value (as opposed to `relative' changes, such as increments or multiplications by a factor), the absolute change performed last will be the only absolute change which is effective, so the sources of changes which should have the highest priority must be processed last.
In the case of CSS1, the order of processing of changes from various sources which affect one MathML element's rendering properties should be as follows:
(first changes; lowest priority)
Automatic changes, such as the changes to fontsize in relation to scriptlevel mentioned above; such changes will usually be implemented by the parent element itself before it passes a set of rendering properties to this element
(last changes; highest priority)
Note that the order of the changes derived from CSS style sheets is specified by CSS itself. The following rationale is related only to the issue of where in this pre-existing order the changes caused by explicit MathML attribute settings should be inserted.
Rationale: MathML rendering attributes are analogous to HTML rendering attributes such as align, which the CSS1 section on cascading order specifies should be processed with the same priority. Furthermore, this choice of priority permits readers, by declaring certain CSS styles as `important', to decide which of their style preferences should override explicit attribute settings in MathML. Since MathML expressions, whether composed of `presentation' or `content' elements, are primarily intended to convey meaning, with their `graphic design' (if any) intended mainly to aid in that purpose but not to be essential in it, it is likely that readers will often want their own style preferences to have priority; the main exception will be when a rendering attribute is intended to alter the meaning conveyed by an expression, which is generally discouraged in the presentation attributes of MathML.
Default values for MathML attributes are in general given along with the detailed descriptions of specific elements in the text. Default values shown in plain text, in the tables of attributes for an element, are literal (unless they are obviously explanatory phrases), but when italicized are descriptions of how default values can be computed.
Default values described as inherited are taken from the rendering environment, as described under mstyle, or in some cases (described individually) from the values of other attributes of surrounding elements, or from certain parts of those values. The value used will always be one which could have been specified explicitly, had it been known; it will never depend on the content or attributes of the same element, only on its environment. (What it means when used may, however, depend on those.)
Default values described as automatic should be computed by a MathML renderer in a way which will produce a high-quality rendering; how to do this is not usually specified by MathML. The value computed will always be one which could have been specified explicitly, had it been known, but it will usually depend on the element content and/or the rendering environment.
Other italicized descriptions of default values which appear in the tables of attributes are explained for each attribute individually.
The single or double quotes which are required around attribute values in an XML start tag are not shown in the tables of attribute value syntax for each element, but are shown around example attribute values in the text.
Note that, in general, there is no value which can be given explicitly for a MathML attribute which will simulate the effect of not specifying the attribute at all, for attributes which are inherited or automatic. Giving the words `inherited' or `automatic' explicitly will not work, and is not generally allowed. Furthermore, even for presentation attributes for which a specific default value is documented here, the mstyle element (section 3.3.4 [Style Change (mstyle)]) can be used to change this for the elements it contains. Therefore, the MathML DTD declares most presentation attribute default values as #IMPLIED, which prevents XML preprocessors from adding them with any specific default value.
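A small sketch of the mstyle case, assuming that linethickness (an attribute of mfrac) is among the presentation attributes that mstyle can set for the elements it contains:

```xml
<mstyle linethickness="2">
  <!-- this mfrac gives no linethickness of its own, so it takes the
       value supplied by the enclosing mstyle rather than the default
       documented for mfrac -->
  <mfrac>
    <mi>a</mi>
    <mi>b</mi>
  </mfrac>
</mstyle>
```

If an XML preprocessor had inserted the documented default on the mfrac itself, that explicit value would mask the setting made by mstyle, which is the situation the #IMPLIED declaration avoids.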
In an XML DTD, allowed attribute values can be declared as general strings, or they can be constrained in various ways, either by enumerating the possible values, or by declaring them to be certain special data types. The choice of an XML attribute type affects the extent to which validity checks can be performed using a DTD.
The MathML DTD specifies formal XML attribute types for all MathML attributes, including enumerations of legitimate values in some cases. In general, however, the MathML DTD is relatively permissive, frequently declaring attribute values as strings; this is done to provide for interoperability with SGML parsers while allowing multiple attributes on one MathML element to accept the same values (such as true and false), and also to allow extension to the lists of predefined values.
At the same time, even though an attribute value may be declared as a string in the DTD, only certain values are legitimate in MathML, as described above and in the rest of this specification. For example, many attributes expect numerical values. In the sections which follow, the allowed attribute values are described for each element. To determine when these constraints are actually enforced in the MathML DTD, consult appendix A [Parsing MathML]. However, lack of enforcement of a requirement in the DTD does not imply that the requirement is not part of the MathML language itself, or that it will not be enforced by a particular MathML renderer. (See section 7.2.2 [Handling of Errors] for a description of how MathML renderers should respond to MathML errors.)
Furthermore, the MathML DTD is provided for convenience; although it is intended to be fully compatible with the text of the specification, the text should be taken as definitive if there is a contradiction. (Any contradictions which may exist between various chapters of the text should be resolved by favoring chapter 6 [Entities, Characters and Fonts] first, then chapter 3 [Presentation Markup], chapter 4 [Content Markup], then section 2.3 [MathML Syntax and Grammar], and then other parts of the text.)
In order to facilitate compatibility with Cascading Style Sheets, Level 1 (CSS1), all MathML elements accept class, style, and id attributes in addition to the attributes described specifically for each element. MathML renderers not supporting CSS may ignore these attributes. (MathML specifies these attribute values as general strings, even if style-sheet mechanisms have more restrictive syntaxes for them. That is, any value for them is valid in MathML.)
Renderers supporting CSS (or analogous style sheet mechanisms) may use these attributes to help determine which MathML elements should be subject to which style sheet-induced changes to various rendering properties. The properties that can be affected, and how these changes affect them, are discussed in section 2.3.3.3 [CSS-compatible attributes] above.
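As a sketch of how these attributes might appear in practice (the identifier eq1, the class name highlight, and the style text are all hypothetical and have no meaning in MathML itself):

```xml
<mrow id="eq1" class="highlight" style="color: #cc0000">
  <mi>x</mi>
  <mo>=</mo>
  <mn>1</mn>
</mrow>
```

A renderer without style sheet support may ignore all three attributes and render the expression as if they were absent.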
Every MathML element also accepts the attribute other (section 7.2.3 [Attribute for unspecified data]) for passing non-standard attributes without violating the MathML DTD. MathML renderers are only required to process this attribute if they respond to any attributes which are not standard in MathML.
See also section 3.2.1 [Attributes common to token elements] for a list of MathML attributes which can be used on most presentation token elements.
MathML ignores whitespace occurring outside token elements. Non-whitespace characters are not allowed there. Whitespace occurring within the content of token elements is `trimmed' from the ends (i.e. all whitespace at the beginning and end of the content is removed), and `collapsed' internally (i.e. each sequence of 1 or more whitespace characters is replaced with one blank character).
In MathML, as in XML, `whitespace' means blanks, tabs, newlines, or carriage returns, i.e. characters with hexadecimal Unicode codes U+0020, U+0009, U+000a, or U+000d, respectively.
For example, <mo> ( </mo> is equivalent to <mo>(</mo>, and <mtext> Theorem 1: </mtext> is equivalent to <mtext>Theorem 1:</mtext>.
Authors wishing to encode whitespace characters at the start or end of the content of a token, or in sequences other than a single blank, without having them ignored, must use &nbsp; or other `whitespace' non-marking entities as described in section 6.1.4 [Non-Marking Entities]. For example, compare
<mtext> Theorem 1: </mtext>
with
<mtext>&nbsp;Theorem&NewLine;&nbsp;&nbsp;1:</mtext>
When the first example is rendered, there is no whitespace before `Theorem', one blank between `Theorem' and `1:', and no whitespace after `1:'. In the second example, a single blank is rendered before `Theorem', a new line is placed after `Theorem', two blanks are rendered before `1:', and there is no whitespace after the `1:'.
Note that the xml:space attribute does not apply in this situation since XML processors pass whitespace in tokens to a MathML processor; it is the MathML processing rules which specify that whitespace is trimmed and collapsed.
For whitespace occurring outside the content of the token elements mi, mn, mo, ms, mtext, ci, cn and annotation, an mspace element should be used, as opposed to an mtext element containing only `whitespace' entities.
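The two alternatives can be contrasted in a short sketch. The width of 1em is an arbitrary choice for illustration, and the no-break spaces are written as numeric character references so that the fragment does not depend on an entity declaration.

```xml
<mrow>
  <mi>a</mi>
  <!-- preferred: an explicit space element between tokens -->
  <mspace width="1em"/>
  <mi>b</mi>
  <!-- discouraged: an mtext holding only no-break spaces (U+00A0) -->
  <mtext>&#xA0;&#xA0;</mtext>
  <mi>c</mi>
</mrow>
```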