The aural rendering of a document, already commonly used by the blind and print-impaired communities, combines speech synthesis and "audio icons". Often such aural presentation occurs by converting the document to plain text and feeding this to a screen reader -- software or hardware that simply reads all the characters on the screen. This results in less effective presentation than would be the case if the document structure were retained. Style sheet properties for aural presentation may be used together with visual properties (mixed media) or as an aural alternative to visual presentation.
Besides the obvious accessibility advantages, there are other large markets for aural presentation, including in-car use, industrial and medical documentation systems (intranets), home entertainment, and assistance for users who cannot read.
Property name: | 'volume' |
---|---|
Value: | <number> | <percentage> | silent | x-soft | soft | medium | loud | x-loud | inherit |
Initial: | medium |
Applies to: | all elements |
Inherited: | yes |
Percentage values: | relative to inherited value |
Media groups: | aural |
Volume refers to the median volume of the waveform. In other words, a highly inflected voice at a volume of 50 might peak well above that. The overall values are likely to be human adjustable for comfort, for example with a physical volume control (which would increase both the 0 and 100 values proportionately); what this property does is adjust the dynamic range.
Values have the following meanings:
User agents should allow the values corresponding to '0' and '100' to be set by the listener. No one setting is universally applicable; suitable values depend on the equipment in use (speakers, headphones), the environment (in car, home theater, library) and personal preferences. Some examples:
The same author style sheet could be used in all cases, simply by mapping the '0' and '100' points suitably at the client side.
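As a rough illustration of this point, none of the rules below depends on the listener's equipment; the client-side mapping of '0' and '100' decides how loud each one actually sounds. The class names are hypothetical.

```css
/* Sketch only: class names are made up for illustration. */
BODY        { volume: medium }  /* the initial value, stated explicitly */
P.aside     { volume: soft }    /* quieter than the surrounding text */
STRONG      { volume: 120% }    /* percentage is relative to the inherited value */
P.warning   { volume: x-loud }
```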
Property name: | 'speak' |
---|---|
Value: | normal | none | spell-out | inherit |
Initial: | normal |
Applies to: | all elements |
Inherited: | yes |
Percentage values: | N/A |
Media groups: | aural |
This property specifies whether text will be rendered aurally and if so, in what manner (somewhat analogous to the 'display' property). The possible values are:
Note the difference between an element whose 'volume' property has a value of 'silent' and an element whose 'speak' property has the value 'none'. The former takes up the same time as if it had been spoken, including any pause before and after the element, but no sound is generated. This may be used in language teaching applications, for example. A pause is generated for the pupil to speak the element themselves. Note that since the value of this property is inherited, child elements will also be silent. Child elements may however set the volume to a non-silent value and will then be spoken. On the other hand, elements for which the 'speak' property has the value 'none' are not spoken and take no time. Child elements may however override this value and may be spoken normally.
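The contrast described above might be written as follows. The class names are hypothetical, and the comments restate the behavior described in the preceding paragraph rather than adding to it.

```css
/* Language-teaching sketch: the answer is not voiced, but its time slot is kept */
P.answer      { volume: silent }
P.answer EM   { volume: medium }   /* child overrides the inherited 'silent' and is spoken */

/* Navigation is skipped entirely and takes no time */
DIV.nav       { speak: none }
DIV.nav H2    { speak: normal }    /* child overrides 'none' and is spoken normally */
```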
Property name: | 'pause-before' |
---|---|
Value: | <time> | <percentage> | inherit |
Initial: | depends on user agent |
Applies to: | all elements |
Inherited: | no |
Percentage values: | see prose |
Media groups: | aural |
Property name: | 'pause-after' |
---|---|
Value: | <time> | <percentage> | inherit |
Initial: | depends on user agent |
Applies to: | all elements |
Inherited: | no |
Percentage values: | see prose |
Media groups: | aural |
These properties specify a pause to be observed before (or after) speaking an element's content. Values have the following meanings:
Authors should use relative units to create more robust style sheets in the face of large changes in speech-rate.
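A small sketch of that advice, assuming that a percentage pause is resolved against the time one word takes at the current 'speech-rate' and that numeric speech rates are in words per minute. The prose these properties refer to remains the authority, and the class name is hypothetical.

```css
/* Relative pauses scale when the listener changes the speech rate */
H1       { pause-before: 200%; pause-after: 100% }
P.aside  { pause-before: 50% }

/* An absolute pause stays the same length at any speech rate */
HR       { pause-after: 500ms }
```

Under those assumptions, at 120 words per minute a word takes 500ms, so the H1 rule would yield a 1000ms pause before and a 500ms pause after.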
Property name: | 'pause' |
---|---|
Value: | [ [<time> | <percentage>]{1,2} ] | inherit |
Initial: | depends on user agent |
Applies to: | all elements |
Inherited: | no |
Percentage values: | see descriptions of 'pause-before' and 'pause-after' |
Media groups: | aural |
The 'pause' property is a shorthand for setting 'pause-before' and 'pause-after'. If two values are given, the first value is 'pause-before' and the second is 'pause-after'. If only one value is given, it applies to both properties.
Examples:
H1 { pause: 20ms }         /* pause-before: 20ms; pause-after: 20ms */
H2 { pause: 30ms 40ms }    /* pause-before: 30ms; pause-after: 40ms */
H3 { pause-after: 10ms }   /* pause-before: ?; pause-after: 10ms */
Property name: | 'cue-before' |
---|---|
Value: | <uri> | none | inherit |
Initial: | none |
Applies to: | all elements |
Inherited: | no |
Percentage values: | N/A |
Media groups: | aural |
Property name: | 'cue-after' |
---|---|
Value: | <uri> | none | inherit |
Initial: | none |
Applies to: | all elements |
Inherited: | no |
Percentage values: | N/A |
Media groups: | aural |
Auditory icons are another way to distinguish semantic elements. Sounds may be played before and/or after the element to delimit it. Values have the following meanings:
For example:
A  { cue-before: url(bell.aiff); cue-after: url(dong.wav) }
H1 { cue-before: url(pop.au); cue-after: url(pop.au) }
Property name: | 'cue' |
---|---|
Value: | [ <'cue-before'> || <'cue-after'> ] | inherit |
Initial: | not defined for shorthand properties |
Applies to: | all elements |
Inherited: | no |
Percentage values: | N/A |
Media groups: | aural |
The 'cue' property is a shorthand for setting 'cue-before' and 'cue-after'. If two values are given, the first value is 'cue-before' and the second is 'cue-after'. If only one value is given, it applies to both properties.
The following two rules are equivalent:
H1 { cue-before: url(pop.au); cue-after: url(pop.au) }
H1 { cue: url(pop.au) }
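The two-value form works the same way. As a sketch, the rule for A given earlier under 'cue-before' and 'cue-after' could be collapsed into the shorthand:

```css
/* First value is 'cue-before', second is 'cue-after' */
A { cue: url(bell.aiff) url(dong.wav) }
```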
Property name: | 'play-during' |
---|---|
Value: | <uri> mix? repeat? | auto | none | inherit |
Initial: | auto |
Applies to: | all elements |
Inherited: | no |
Percentage values: | N/A |
Media groups: | aural |
Similar to the 'cue-before' and 'cue-after' properties, this property specifies a sound to be played as a background while an element's content is spoken. Values have the following meanings:
Examples:
BLOCKQUOTE.sad { play-during: url(violins.aiff) }
BLOCKQUOTE Q   { play-during: url(harp.wav) mix }
SPAN.quiet     { play-during: none }
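The optional keywords can also be combined. The sketch below assumes that 'repeat' loops a sound that is shorter than the element's spoken duration and that 'mix' layers the sound over any background still playing from the parent element. The file name rain.wav and the class name are hypothetical.

```css
/* Loop a short ambient clip for the whole chapter */
DIV.chapter     { play-during: url(rain.wav) repeat }

/* Add a second layer on top of the parent's background, looping it too */
DIV.chapter Q   { play-during: url(harp.wav) mix repeat }
```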
If a stereo icon is dereferenced, the central point of the stereo pair should be placed at the azimuth for that element and the left and right channels should be placed to either side of this position.
Spatial audio is an important stylistic property for aural presentation. It provides a natural way to tell several voices apart, as in real life (people rarely all stand in the same spot in a room). Stereo speakers produce a lateral sound stage. Binaural headphones or the increasingly popular 5-speaker home theater setups can generate full surround sound, and multi-speaker setups can create a true three-dimensional sound stage. VRML 2.0 also includes spatial audio, which implies that in time consumer-priced spatial audio hardware will become more widely available.
Property name: | 'azimuth' |
---|---|
Value: | <angle> | [[ left-side | far-left | left | center-left | center | center-right | right | far-right | right-side ] || behind ] | leftwards | rightwards | inherit |
Initial: | center |
Applies to: | all elements |
Inherited: | yes |
Percentage values: | N/A |
Media groups: | aural |
Values have the following meanings:
This property is most likely to be implemented by mixing the same signal into different channels at differing volumes. It might also use phase shifting, digital delay, and other such techniques to provide the illusion of a sound stage. The precise means used to achieve this effect and the number of speakers used to do so are user agent-dependent; this property merely identifies the desired end result.
Examples:
H1        { azimuth: 30deg }
TD.a      { azimuth: far-right }          /* 60deg */
#12       { azimuth: behind far-right }   /* 120deg */
P.comment { azimuth: behind }             /* 180deg */
If 'azimuth' is specified and the output device cannot produce sounds behind the listening position, user agents should convert values in the rearwards hemisphere to forwards hemisphere values. One method is as follows:
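One possible conversion, offered as an assumption rather than as the required algorithm, reflects each rearward angle across the listener's left-right axis, so that a source behind and to the right is heard in front and to the right. Here x is the specified azimuth in degrees (0 ≤ x < 360, with positive angles to the listener's right, as the examples above imply):

```latex
x' =
\begin{cases}
  180^\circ - x, & 90^\circ < x \le 180^\circ \\
  540^\circ - x, & 180^\circ < x \le 270^\circ \\
  x,             & \text{otherwise}
\end{cases}
```

Under this mapping 'behind far-right' (120deg) is rendered at 60deg, the same position as 'far-right', and 'behind' (180deg) is rendered at 0deg.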
Property name: | 'elevation' |
---|---|
Value: | <angle> | below | level | above | higher | lower | inherit |
Initial: | level |
Applies to: | all elements |
Inherited: | yes |
Percentage values: | N/A |
Media groups: | aural |
Values of this property have the following meanings:
The precise means used to achieve this effect and the number of speakers used to do so are undefined. This property merely identifies the desired end result.
Examples:
H1   { elevation: above }
TR.a { elevation: 60deg }
TR.b { elevation: 30deg }
TR.c { elevation: level }
Property name: | 'speech-rate' |
---|---|
Value: | <number> | x-slow | slow | medium | fast | x-fast | faster | slower | inherit |
Initial: | medium |
Applies to: | all elements |
Inherited: | yes |
Percentage values: | N/A |
Media groups: | aural |
This property specifies the speaking rate. Note that both absolute and relative keyword values are allowed (compare with 'font-weight'). Values have the following meanings:
Property name: | 'voice-family' |
---|---|
Value: | [[<specific-voice> | <generic-voice> ],]* [<specific-voice> | <generic-voice> ] | inherit |
Initial: | depends on user agent |
Applies to: | all elements |
Inherited: | yes |
Percentage values: | N/A |
Media groups: | aural |
The value is a comma-separated, prioritized list of voice family names (compare with 'font-family'). Values have the following meanings:
Examples:
H1            { voice-family: announcer, male }
P.part.romeo  { voice-family: romeo, male }
P.part.juliet { voice-family: juliet, female }
Property name: | 'pitch' |
---|---|
Value: | <frequency> | x-low | low | medium | high | x-high | inherit |
Initial: | medium |
Applies to: | all elements |
Inherited: | yes |
Percentage values: | N/A |
Media groups: | aural |
Specifies the average pitch of the speaking voice. Values have the following meanings:
Property name: | 'pitch-range' |
---|---|
Value: | <number> | inherit |
Initial: | 50 |
Applies to: | all elements |
Inherited: | yes |
Percentage values: | N/A |
Media groups: | aural |
Specifies variation in average pitch. Values have the following meanings:
Property name: | 'stress' |
---|---|
Value: | <number> | inherit |
Initial: | 50 |
Applies to: | all elements |
Inherited: | yes |
Percentage values: | N/A |
Media groups: | aural |
Specifies the level of stress (assertiveness or emphasis) of the speaking voice. English is a stressed language, and different parts of a sentence are assigned primary, secondary or tertiary stress. The value of 'stress' controls the amount of inflection that results from these stress markers. Values have the following meanings:
Property name: | 'richness' |
---|---|
Value: | <number> | inherit |
Initial: | 50 |
Applies to: | all elements |
Inherited: | yes |
Percentage values: | N/A |
Media groups: | aural |
Specifies the richness (brightness) of the speaking voice. Values have the following meanings:
Note. The following four properties are preliminary and discussion on them is invited.
An additional speech property, 'speak-header', is described in the chapter on tables.
Property name: | 'speak-punctuation' |
---|---|
Value: | code | none | inherit |
Initial: | none |
Applies to: | all elements |
Inherited: | yes |
Percentage values: | N/A |
Media groups: | aural |
This property specifies how punctuation is spoken. Values have the following meanings:
Property name: | 'speak-date' |
---|---|
Value: | mdy | dmy | ymd | inherit |
Initial: | depends on user agent |
Applies to: | all elements |
Inherited: | yes |
Percentage values: | N/A |
Media groups: | aural |
This property controls how dates are spoken. Values have the following meanings:
This property would be useful, for example, when combined with an XML element used to identify dates, such as:
<PARA>The campaign started on <DATE value="1874-10-21"/> and finished <DATE value="1874-10-28"/></PARA>
Property name: | 'speak-numeral' |
---|---|
Value: | digits | continuous | none | inherit |
Initial: | none |
Applies to: | all elements |
Inherited: | yes |
Percentage values: | N/A |
Media groups: | aural |
This property controls how numerals are spoken. Values have the following meanings:
Property name: | 'speak-time' |
---|---|
Value: | 24 | 12 | none | inherit |
Initial: | none |
Applies to: | all elements |
Inherited: | yes |
Percentage values: | N/A |
Media groups: | aural |
This property controls how times are spoken. Values have the following meanings:
When used in combination with the 'speak-date' property, this allows elements with an attribute containing an ISO 8601 format date/time to be presented in a flexible manner.
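For instance, a style sheet might pair the two properties on the DATE element from the 'speak-date' example. This is a sketch only: both properties are preliminary, and it assumes that 'dmy' means day, month, year order and that '24' means a 24-hour clock.

```css
/* Speak dates as day-month-year and times on a 24-hour clock */
DATE { speak-date: dmy; speak-time: 24 }
```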