This Section defines the SMIL content control module. This module contains elements and attributes which provide for runtime content choices and optimized content delivery. Since these elements and attributes are defined in a module, designers of other markup languages can reuse the functionality in the SMIL content control module when they need to include media content control in their language. Conversely, language designers incorporating other SMIL modules do not need to include the content module if other content control functionality is already present.
Proposed Extensions to SMIL 1.0 content control functionality include:
SMIL 1.0 provides a "test-attribute" mechanism to process an element only when certain conditions are true, e.g. when the client has a certain screen-size. SMIL 1.0 also provides the "switch" element for expressing that a set of document parts are alternatives, and that the first one fulfilling certain conditions should be chosen. This is useful to express that different language versions of an audio file are available, and to have the client select one of them. SMIL Boston includes these features and extends them by supporting new system test-attributes, as well as the ability to customize a presentation to an individual viewer by providing author defined, user selected test-attributes.
The <switch> element
The switch element allows an author to specify a set of alternative elements from which only one acceptable element should be chosen. In SMIL Boston, an element is acceptable if the element is a SMIL Boston element, the media-type can be decoded (if the element declares media), and all of the test-attributes of the element evaluate to "true". When integrating content control into other languages, the language designer must specify what constitutes an "acceptable element."
An element is selected as follows: the player evaluates the elements in the order in which they occur in the switch element. The first acceptable element is selected at the exclusion of all other elements within the switch.
Thus, authors should order the alternatives from the most desirable to the least desirable. Furthermore, authors should place a relatively fail-safe alternative as the last item in the <switch> so that at least one item within the switch is chosen (unless this is explicitly not desired). Implementations should NOT arbitrarily pick an object within a <switch> when test-attributes for all child elements fail.
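The selection rule above can be sketched in a few lines. This is an illustrative model, not normative: the dictionary representation of elements and the `evaluate_test_attribute` and `can_decode` hooks are assumptions standing in for a real player's internals.

```python
# Hypothetical sketch of first-acceptable selection within a <switch>.
# Element representation and hook functions are illustrative assumptions.

def is_acceptable(element, evaluate_test_attribute, can_decode):
    """An element is acceptable if its media (if any) can be decoded and
    all of its test attributes evaluate to True."""
    if element.get("src") is not None and not can_decode(element):
        return False
    return all(evaluate_test_attribute(name, value)
               for name, value in element.get("tests", {}).items())

def select_from_switch(children, evaluate_test_attribute, can_decode):
    """Evaluate children in document order; return the first acceptable
    one, or None. Players should NOT arbitrarily pick a child when all
    test attributes fail."""
    for child in children:
        if is_acceptable(child, evaluate_test_attribute, can_decode):
            return child
    return None
```

A child with no test attributes is always acceptable, which is what makes a trailing "catch-all" alternative work as the fail-safe the previous paragraph recommends.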
Note that some network protocols, e.g. HTTP and RTSP, support content-negotiation, which may be an alternative to using the "switch" element in some cases.
Attributes
The switch element can have the following attributes:
This specification defines a list of test attributes that can be added to language elements, as allowed by the language designer. In SMIL 1.0, these elements are synchronization and media elements. Conceptually, these attributes represent Boolean tests. When one of the test attributes specified for an element evaluates to "false", the element carrying this attribute is ignored.
Within the list below, the concept of "user preference" may show up. User preferences are usually set by the playback engine using a preferences dialog box, but this specification does not place any restrictions on how such preferences are communicated from the user to the SMIL player.
This version of SMIL defines the following test attributes. Note that some hyphenated test attribute names from SMIL 1.0 have been deprecated in favor of names using the current SMIL camelCase convention. For these, the deprecated SMIL 1.0 name is shown in parentheses after the preferred name.
The systemLanguage attribute evaluates to "true" if one of the languages indicated by user preferences exactly equals one of the languages given in the value of this attribute, or if one of the languages indicated by user preferences exactly equals a prefix of one of the languages given in the value of this attribute such that the first tag character following the prefix is "-". It evaluates to "false" otherwise.
Note: This use of a prefix matching rule does not imply that language tags are assigned to languages in such a way that it is always true that if a user understands a language with a certain tag, then this user will also understand all languages with tags for which this tag is a prefix.
The prefix rule simply allows the use of prefix tags if this is the case.
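The matching rule can be sketched as follows; the function names are illustrative, and the comma-separated handling follows the `systemLanguage="mi, en"` form used later in this section.

```python
# Illustrative sketch of the systemLanguage matching rule: a preferred
# tag matches a value tag if they are equal, or if the preferred tag is
# a prefix of the value tag and the next character is "-".

def language_matches(preferred, value):
    return value == preferred or value.startswith(preferred + "-")

def system_language(preferences, attribute_value):
    """True if any user-preferred language matches any tag in the
    comma-separated attribute value."""
    values = [v.strip() for v in attribute_value.split(",")]
    return any(language_matches(p, v) for p in preferences for v in values)
```

Note the asymmetry the implementation note below warns about: a preference of "en" matches a value of "en-gb", but a preference of "en-gb" does not match a value of "en".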
Implementation note: When making the choice of linguistic preference available to the user, implementers should take into account the fact that users are not familiar with the details of language matching as described above, and should provide appropriate guidance. As an example, users may assume that on selecting "en-gb", they will be served any kind of English document if British English is not available. The user interface for setting user preferences should guide the user to add "en" to get the best matching behavior.
Multiple languages MAY be listed for content that is intended for multiple audiences. For example, a rendition of the "Treaty of Waitangi", presented simultaneously in the original Maori and English versions, would call for:
<audio src="foo.rm" systemLanguage="mi, en"/>
Authoring note: Authors should realize that if several alternative language objects are enclosed in a "switch", and none of them matches, this may lead to situations such as a video being shown without any audio track. It is thus recommended to include a "catch-all" choice at the end of such a switch which is acceptable in all cases.
The value of the systemScreenSize attribute has the following syntax:

screen-size-val ::= screen-height"X"screen-width
Examples
1) Choosing between content with different total bitrates
In a common scenario, implementations may wish to allow for selection via a systemBitrate attribute on elements. The media player evaluates each of the "choices" (elements within the switch) one at a time, looking for an acceptable bitrate given the known characteristics of the link between the media player and media server.
<par>
  <text .../>
  <switch>
    <par systemBitrate="40000"> ... </par>
    <par systemBitrate="24000"> ... </par>
    <par systemBitrate="10000"> ... </par>
  </switch>
</par>
...
2) Choosing between audio resources with different bitrates
The elements within the switch may be any combination of elements. For instance, one could merely be specifying an alternate audio track:
...
<switch>
  <audio src="joe-audio-better-quality" systemBitrate="16000"/>
  <audio src="joe-audio" systemBitrate="8000"/>
</switch>
...
3) Choosing between audio resources in different languages
In the following example, an audio resource is available both in French and in English. Based on the user's preferred language, the player can choose one of these audio resources.
...
<switch>
  <audio src="joe-audio-french" systemLanguage="fr"/>
  <audio src="joe-audio-english" systemLanguage="en"/>
</switch>
...
4) Choosing between content written for different screens
In the following example, the presentation contains alternative parts designed for screens with different resolutions and bit-depths. Depending on the particular characteristics of the screen, the player can choose one of the alternatives.
...
<par>
  <text .../>
  <switch>
    <par systemScreenSize="1280X1024" systemScreenDepth="16"> ... </par>
    <par systemScreenSize="640X480" systemScreenDepth="32"> ... </par>
    <par systemScreenSize="640X480" systemScreenDepth="16"> ... </par>
  </switch>
</par>
...
5) Distinguishing caption tracks from stock tickers
In the following example, captions are shown only if the user wants captions on.
...
<seq>
  <par>
    <audio src="audio.rm"/>
    <video src="video.rm"/>
    <textstream src="stockticker.rtx"/>
    <textstream src="closed-caps.rtx" systemCaptions="on"/>
  </par>
</seq>
...
6) Choosing the language of overdub and subtitle tracks
In the following example, a French-language movie is available with English, German, and Dutch overdub and subtitle tracks. The following SMIL segment expresses this, and switches on the alternatives that the user prefers.
...
<par>
  <switch>
    <audio src="movie-aud-en.rm" systemLanguage="en" systemOverdubOrSubtitle="overdub"/>
    <audio src="movie-aud-de.rm" systemLanguage="de" systemOverdubOrSubtitle="overdub"/>
    <audio src="movie-aud-nl.rm" systemLanguage="nl" systemOverdubOrSubtitle="overdub"/>
    <!-- French for everyone else -->
    <audio src="movie-aud-fr.rm"/>
  </switch>
  <video src="movie-vid.rm"/>
  <switch>
    <textstream src="movie-sub-en.rt" systemLanguage="en" systemOverdubOrSubtitle="subtitle"/>
    <textstream src="movie-sub-de.rt" systemLanguage="de" systemOverdubOrSubtitle="subtitle"/>
    <textstream src="movie-sub-nl.rt" systemLanguage="nl" systemOverdubOrSubtitle="subtitle"/>
    <!-- French captions for those that really want them -->
    <textstream src="movie-caps-fr.rt" systemCaptions="on"/>
  </switch>
</par>
...
During the development of SMIL 1.0, the issue of content selectability within a presentation received a great deal of attention. Early on, it was decided that a <switch> construct would form the basic selection primitive in the language. A <switch> allows a series of alternatives to be specified for a particular piece of content, one of which is selected by the runtime environment for presentation. An example of how a <switch> might be used to control the alternatives that could accompany a piece of video in a presentation would be:
...
<par>
  <video src="anchor.mpg" ... />
  <switch>
    <audio src="dutch.aiff" systemLanguage="DU" systemCaptions="overdub" ... />
    <audio src="english.aiff" systemLanguage="EN" systemCaptions="overdub" ... />
    <text src="dutch.html" systemLanguage="DU" systemCaptions="captions" ... />
    <text src="english.html" systemLanguage="EN" systemCaptions="captions" ... />
  </switch>
</par>
...
This fragment (which is pseudo-SMIL for clarity) says that a video is played in parallel with one of: Dutch audio, English audio, Dutch text, or English text. SMIL does not specify the selection mechanism, only a way of specifying the alternatives. While <switch>-based content control is a powerful mechanism, it comes with two problems.

First, it restricts the resolution of a <switch> to a single alternative. (If you want Dutch audio and Dutch text, you need to specify a compound <switch> statement, but in so doing, you always get the compound result.)

Second, and more restrictively, it requires the author to explicitly state all of the possible combinations of input streams during authoring. If the user wants Dutch audio and English text, this possibility must have been considered at authoring time.
A solution to both problems is to allow in-line use of System Test Attributes, as given in the following document fragment:
...
<par>
  <video src="anchor.mpg" ... />
  <audio src="dutch.aiff" systemLanguage="DU" systemCaptions="overdub" ... />
  <audio src="english.aiff" systemLanguage="EN" systemCaptions="overdub" ... />
  <text src="dutch.html" systemLanguage="DU" systemCaptions="captions" ... />
  <text src="english.html" systemLanguage="EN" systemCaptions="captions" ... />
</par>
...
This example says: a video is accompanied by four other data objects, all of which are (logically) shown in parallel. This is, of course, exactly what happens: all five do run in parallel, but it could be that only the video and one audio stream are actually selected by the user (or a user agent) to be rendered during the presentation. At author time you know which logical streams are available, but it is only at runtime that you know which combination of all potentially available streams actually meets the user's needs. Logically, the alternatives indicated by the in-line construct could be represented as a set of <switch> statements, although the resulting <switch> could become explosive in size. Use of an in-line test mechanism significantly simplifies the specification of adaptive content in the case that many independent alternatives exist.
The provision of <switch>-based and in-line system test attributes provides a selection mechanism based on general system attributes. This version of SMIL extends this notion with the definition of user test attributes. User test attributes allow presentation authors to define their own test attributes for use in a specific document.
The elements used to provide user group functionality are:
The <user_attributes> element
A section within the SMIL head that contains definitions of each of the user groups. The elements within the section define a collection of author-specified test attributes that can be used in the document.
The <u_group> element
An author-defined grouping of related media objects. These are defined within the section delineated by the <user_attributes> elements that make up part of the document header, and they are referenced within a media object definition.
The <u_group> element supports the following attributes:

u_state : RENDERED | NOT_RENDERED
The initial state for the <u_group> is given in the value of this attribute; if unspecified, it defaults to RENDERED. The run-time state is defined by the user or the user agent via the SMIL DOM. If a particular playback environment does not (or cannot) support user selection, the u_state attribute controls the author-specified default presentation.
override
Controls whether the user is allowed to override the initial state given in the <u_group> definition. It is up to the runtime environment to enforce this attribute. The attribute can also be used to influence adaptive behavior at a lower level in the transport hierarchy.
It would be good to have more explanation of this last use.
In addition to the <user_attributes> and <u_group> elements, this module provides a u_group attribute that can be applied to content requiring selection.

The u_group attribute
The following example shows how user groups can be applied within a SMIL document:
 1  <smil>
 2    <head>
 3      <layout>
 4        <!-- define projection regions -->
 5      </layout>
 6      <user_attributes>
 7        <u_group id="nl_aud" u_state="RENDERED" title="Dutch Audio Cap" override="allowed"/>
 8        <u_group id="uk_aud" u_state="NOT_RENDERED" title="English Audio Cap" override="allowed"/>
 9        <u_group id="nl_txt" u_state="NOT_RENDERED" title="Dutch Text Cap" override="allowed"/>
10        <u_group id="uk_txt" u_state="NOT_RENDERED" title="English Text Cap" override="allowed"/>
11      </user_attributes>
12    </head>
13    <body>
14      ...
15      <par>
16        <video src="announcer.rm" region="a"/>
17        <text src="news_headline.html" region="b"/>
18        <audio src="story_1_nl.rm" u_group="nl_aud"/>
19        <audio src="story_1_uk.rm" u_group="uk_aud"/>
20        <text src="story_1_nl.html" u_group="nl_txt" region="c"/>
21        <text src="story_1_uk.html" u_group="uk_txt" region="d"/>
22      </par>
23      ...
24    </body>
25  </smil>
Lines 6 through 11 define the available groups. Each group contains an identifier and a title (which can be used by the user interface agent to label the group), as well as the (optional) initial state definition and override flag.
In line 7, a <u_group> named "nl_aud" is defined for Dutch audio captions that is initially set to RENDERED. The other groups in this (very simple) example are set to NOT_RENDERED.

In lines 15 through 22, a SMIL <par> construct is used to identify a portion of a presentation. In this <par>, a single video (line 16) is accompanied by two audio streams (lines 18 and 19) and two text streams (lines 20 and 21), one each for English and Dutch. The <par> also contains a text object (line 17) holding a headline.
The interaction of the user interface and the initial state determines which objects are rendered. Note that the same attributes are used across the entire document, meaning that the user only needs to select his/her content preferences once to control related groups of information. In the example, the user is free to have the video and headline text accompanied by any combination of English and Dutch captions. (Note that if two audio captions are selected, the player will need to determine how these are processed for delivery.)
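The rendering decision described above can be sketched as follows. This is a hypothetical model, not normative: the dictionary shapes and the rule that a user choice only takes effect when the group's override flag is "allowed" are assumptions drawn from the example.

```python
# Hypothetical model of u_group evaluation: an object tied to a u_group
# is rendered when the group's runtime state is RENDERED. The runtime
# state is the user's choice (e.g. set via the SMIL DOM) when the group
# allows override, else the author-specified u_state default.

DEFAULT_STATE = "RENDERED"  # u_state defaults to RENDERED if unspecified

def group_state(group_def, user_choices):
    if group_def.get("override") == "allowed" and group_def["id"] in user_choices:
        return user_choices[group_def["id"]]
    return group_def.get("u_state", DEFAULT_STATE)

def is_rendered(media, groups, user_choices):
    gid = media.get("u_group")
    if gid is None:
        return True  # no user-group condition on this object
    return group_state(groups[gid], user_choices) == "RENDERED"
```

With the groups from the example, a user selecting "uk_aud" would add the English audio captions to the default Dutch ones, illustrating how in-line user groups permit combinations a single <switch> could not express.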
While this example shows in-line use of user groups, the groups could also be applied as test attributes in a <switch>. Similarly, the system test attributes typically found in a <switch> could also be used in-line as a control attribute on an element along with the u_group attribute.
A previous version of this specification used camelCase for the user group elements and attributes instead of the underlined convention used here. We need to standardize this across the SMIL modules.
The following is still under development by the SYMM Working Group. The working group is interested in considering this functionality but the syntax and semantics described here are only preliminary thinking.
Define a means to group collections of objects that share a common policy. A Channel defines a partitioning of elements into groups. Each group has a common set of access policies that control the use of quasi-physical resources:
- priority
- common server
- common access rights / charging model
- local resource use (layout, devices, etc.)
The following is still under development by the SYMM Working Group. The working group is interested in considering this functionality but the syntax and semantics described here are only preliminary thinking.
Focus on presentation as collection of content: each of the components may have a different user-level representation, encoding:
At author-time, you know alternatives; at use-time, you select
The <prefetch> element
This element gives a suggestion or hint to a user agent that a media resource will be used in the future, and that the author would like part or all of the resource fetched ahead of time to make document playback smoother. User agents can ignore <prefetch> elements, though doing so may cause an interruption in document playback when the resource is needed. The element gives authoring tools or savvy authors the ability to schedule retrieval of resources when they think that there is available bandwidth or time to do it. A <prefetch> element is contained within the body of an XML document, and its scheduling is based on its lexical order unless explicit timing is present.
The <prefetch> element, like media object elements, can have id and src attributes. If SMIL Boston Timing is integrated into the document, the begin, end, dur, clipBegin, and clipEnd attributes are also available.
The id and src attributes are the same as for other media objects: id names the element for reference in the document, and src names the resource to be prefetched. When a media object with the same src URL is encountered, the user agent can use any data it prefetched to begin playback without rebuffering or other interruption. The timing attributes begin, end, and dur constrain the presentation time period for prefetching the element. At the end of the presentation time specified by end or dur, the prefetch operation should stop.
The clipBegin and clipEnd attributes are used to identify the part of the src clip to prefetch: if only the last 30 seconds of the clip are being played, we don't want to prefetch it from the beginning. Likewise, if only the middle 30 seconds of the clip are being played, we don't want to prefetch more data than will be played.
mediaSize, mediaTime, and bandwidth Attributes
In addition to the attributes allowed on Media Object Elements, the following attributes are allowed:
mediaSize : bytes-value | percent-value
mediaTime : clock-value | percent-value
bandwidth : bitrate-value | percent-value
If both mediaSize and mediaTime are specified, mediaSize is used and mediaTime is ignored. For discrete media (non-time-based media like text/html or image/png), using the mediaTime attribute causes the entire resource to be fetched.
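The resolution of these attributes can be sketched as follows. This is an illustrative simplification under stated assumptions: the helper is hypothetical, mediaTime clock values are reduced to plain seconds (full clock-value parsing is given by the grammar later in this section), and a uniform bitrate is assumed when converting seconds to bytes.

```python
# Hypothetical sketch of deciding how much of a resource to prefetch.
# Precedence follows the text: mediaSize wins if both are given, and
# mediaTime on discrete media fetches the entire resource.

def bytes_to_prefetch(media_size, media_time, total_bytes, total_seconds,
                      is_discrete=False):
    """Return the number of bytes to fetch ahead of time."""
    def resolve(value, total):
        # percent-value form: "NN%" of the whole; otherwise a plain number
        if value.endswith("%"):
            return total * int(value[:-1]) // 100
        return int(value)

    if media_size is not None:          # mediaSize takes precedence
        return resolve(media_size, total_bytes)
    if media_time is not None:
        if is_discrete:                 # discrete media: fetch everything
            return total_bytes
        seconds = resolve(media_time, total_seconds)
        return total_bytes * seconds // total_seconds  # assumes uniform bitrate
    return total_bytes                  # default: prefetch the entire resource
```

The bandwidth attribute is orthogonal to this calculation: it would govern how fast the prefetch proceeds, not how much is fetched.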
Documents must still play back even when the prefetch elements are ignored, although rebuffering or pauses in presentation of the document may occur.
If a prefetch element is repeated, due to restart or repeat on a parent element, the prefetch operation should occur again. This ensures appropriately "fresh" data is displayed if, for example, the prefetch is for a banner ad at a URL whose content changes with each request. Note that prefetching data from a URL that changes the content dynamically is dangerous if the entire resource isn't prefetched, as the subsequent request for the remaining data may yield data from a newer resource. A user agent should respect any appropriate caching directives applied to the content, e.g. no-cache headers in HTTP. More specifically, content marked as non-cacheable would have to be refetched each time it was played, whereas content that is cacheable could be prefetched once, with the results of the prefetch cached for future use.
If the clipBegin or clipEnd on the media object differs from that on the prefetch, an implementation can use any data that was fetched and applies, but the result may not be optimal.
The bytes-value has the following syntax:

bytes-value ::= DIGIT+ ; any positive number

The percent-value has the following syntax:

percent-value ::= DIGIT+ "%" ; any number in the range 0 to 100
The clock-value has the following syntax:

Clock-val         ::= ( Hms-val | Smpte-val )
Smpte-val         ::= ( Smpte-type )? Hours ":" Minutes ":" Seconds ( ":" Frames ( "." Subframes )? )?
Smpte-type        ::= "smpte" | "smpte-30-drop" | "smpte-25"
Hms-val           ::= ( "npt=" )? ( Full-clock-val | Partial-clock-val | Timecount-val )
Full-clock-val    ::= Hours ":" Minutes ":" Seconds ("." Fraction)?
Partial-clock-val ::= Minutes ":" Seconds ("." Fraction)?
Timecount-val     ::= Timecount ("." Fraction)? (Metric)?
Metric            ::= "h" | "min" | "s" | "ms"
Hours             ::= DIGIT+ ; any positive number
Minutes           ::= 2DIGIT ; range from 00 to 59
Seconds           ::= 2DIGIT ; range from 00 to 59
Frames            ::= 2DIGIT ; @@ range?
Subframes         ::= 2DIGIT ; @@ range?
Fraction          ::= DIGIT+
Timecount         ::= DIGIT+
2DIGIT            ::= DIGIT DIGIT
DIGIT             ::= [0-9]
For Timecount values, the default metric suffix is "s" (for seconds).
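The Hms-val branch of the grammar above can be sketched as a small parser; this is illustrative, not normative, and the SMPTE forms are omitted for brevity.

```python
import re

# Sketch parser for Hms-val clock values per the grammar above,
# returning a duration in seconds. SMPTE values are not handled.

def parse_clock_value(value):
    value = value.strip()
    if value.startswith("npt="):          # optional "npt=" prefix
        value = value[4:]
    # Full-clock-val: Hours ":" Minutes ":" Seconds ("." Fraction)?
    m = re.fullmatch(r"(\d+):([0-5]\d):([0-5]\d(?:\.\d+)?)", value)
    if m:
        return int(m.group(1)) * 3600 + int(m.group(2)) * 60 + float(m.group(3))
    # Partial-clock-val: Minutes ":" Seconds ("." Fraction)?
    m = re.fullmatch(r"([0-5]\d):([0-5]\d(?:\.\d+)?)", value)
    if m:
        return int(m.group(1)) * 60 + float(m.group(2))
    # Timecount-val: Timecount ("." Fraction)? (Metric)? ; default metric "s"
    m = re.fullmatch(r"(\d+(?:\.\d+)?)(h|min|s|ms)?", value)
    if m:
        scale = {"h": 3600, "min": 60, "s": 1, "ms": 0.001}[m.group(2) or "s"]
        return float(m.group(1)) * scale
    raise ValueError("not a valid clock value: %r" % value)
```

For example, "1:02:03" parses as a Full-clock-val, "02:30" as a Partial-clock-val, and "2.5min" as a Timecount-val.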
The bitrate-value specifies a number of bits per second. It has the following syntax:

bitrate-value ::= DIGIT+ ; any positive number
1) Prefetch the image so it can be displayed immediately after the video ends:
<smil>
<body>
<seq>
<par>
<prefetch id="endimage"
src="http://www.w3c.org/logo.gif"/>
<text id="interlude"
src="http://www.w3c.org/pleasewait.html" fill="freeze"/>
</par>
<video id="main-event"
src="rtsp://www.w3c.org/video.mpg"/>
<image src="http://www.w3c.org/logo.gif"
fill="freeze"/>
</seq>
</body>
</smil>
No timing is specified, so default timing applies in the above example. The text is discrete media, so it ends immediately; the prefetch defaults to fetching the entire image at full available bandwidth, and the prefetch element ends when the image is downloaded. That ends the <par>, and the video begins playing. When the video ends, the image is shown.
2) Prefetch the images for a button so that rollover occurs quickly for the end user:
<html>
<body>
<prefetch id="upimage"
src="http://www.w3c.org/up.gif"/>
<prefetch id="downimage"
src="http://www.w3c.org/down.gif"/>
....
<!-- script will change the graphic on rollover
-->
<img src="http://www.w3c.org/up.gif"/>
</body>
</html>
Can prefetch elements be used as timebases for sync? This could be a useful capability to support. We should be able to start a prefetch and not play the content until it completes. This means that prefetch has to have an effective begin and end, depending upon how long it actually takes to get the data. Of course, if prefetching is optional, we need to decide when the begin and end events fire.

However, this introduces the problem of how to handle errors. Even though the prefetch may not be allowed, or may fail, there may be other things dependent upon the timing of the prefetch element. In this case it is appropriate for the element's timing to continue and fire begin/end events as if the prefetch element ran to completion.

Since this is all very complicated, and prefetch is intended to be transparent, one idea is to explicitly prohibit prefetch from being a syncbase. This is not as simple as it sounds; consider a prefetch element in the middle of a <seq>. Maybe the simplest solution is to allow prefetch as a syncbase, and to say that for sync purposes, all prefetch elements always have duration zero and fire begin/end events even if the prefetch itself fails or is not allowed.