Robust Audio Tool UCL Multimedia

RAT Features

There are two incarnations of RAT: a toll quality audio version (v3) and the experimental high quality audio version (v4). The toll quality version has been code frozen for the past 2 years and is very stable. The experimental version has been developed over the past 2 years and is generally a better application (better algorithms, modular implementation, more features), though marginally less stable. A detailed feature comparison is given in the table below.

Selected feature highlights appear further down the page.

Feature                                  RAT v3                         RAT v4
Adaptive Playout Buffering               Yes                            Yes
Adaptive Scheduling Protection           Yes                            Yes
Audio Channels                           Mono                           Mono / Stereo
Audio Sampling Rates                     8 kHz                          8 / 16 / 32 / 48 kHz
Audio File Formats (import and export)   Raw                            Raw / Au / Wav
Audio Coding Schemes
  G.711 PCM µ-Law (64kb/s)               Yes                            Yes
  G.711 PCM A-Law (64kb/s)               Yes                            Yes
  Wide-Band ADPCM (64kb/s)               No                             Yes
  G.726 ADPCM (16-40kb/s)                No                             Yes
  DVI ADPCM (32kb/s)                     Yes                            Yes
  Variable Rate DVI ADPCM (~32kb/s)      No                             Yes
  Full Rate GSM (13kb/s)                 Yes                            Yes
  LPC (5.6kb/s)                          Yes                            Yes
Audio Duplex                             Half / full duplex             Full duplex
Audio Platforms
  Advanced Linux Sound Architecture      No                             Yes
  FreeBSD PCM                            No                             Yes
  HP-UX                                  Yes                            No
  IRIX                                   Yes                            Yes
  NetBSD                                 Yes                            No
  Open Sound System (Linux/FreeBSD)      Yes                            Yes
  Solaris                                Yes                            Yes
  SunOS                                  Yes                            No
  Win32                                  Yes                            Yes (better)
Audio Clock Skew Correction              No                             Yes
Automatic Gain Control                   Yes                            Yes
Encryption                               DES                            DES
Loss concealment schemes
  Packet Repetition                      Yes                            Yes
  Noise Substitution                     No                             Yes
  Silence Substitution                   Yes                            Yes
  Waveform Replication                   No                             Yes
Licensing                                Non-commercial Redistribution  Open Source
Multiple stream mixing                   Yes                            Yes
Network Stacks                           IPv4                           IPv4 / IPv6
Packet Spike Filtering                   No                             Yes
Round Trip Time Calculation              No                             Yes
RTP Compliant                            Yes                            Yes
Sample Rate Conversion                   No                             Yes
Silence Detection                        Yes                            Yes
Sound Localization (3D Rendering)        No                             Yes
Transcoder Operation                     Yes                            No
Transmission Strategies
  Layered audio                          No                             Yes
  Redundant audio                        Yes                            Yes

Selected feature highlights

Sender based repair of damaged audio streams

Unless some form of resource reservation protocol (e.g. RSVP) is used, an IP based network, such as the Internet or the Mbone, will occasionally lose packets. These lost packets result in broken up audio, which rapidly becomes unintelligible as the loss rate increases. RAT implements two sender based repair schemes to recover from this problem: redundant transmission and interleaving.

In redundant transmission, a more heavily compressed copy of each packet is piggy-backed onto the following packet. If the original packet is lost, the redundant copy can be used in its place. Because the redundant copy is very heavily compressed, sound quality suffers, but the result is still better than having no audio to play out in place of the lost packet. Clearly, there is a tradeoff between the amount of compression used for the redundant copy (and hence the stream bandwidth overhead) and the quality of the resultant audio.
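The piggy-backing idea can be sketched as follows. This is a minimal illustration, not RAT's implementation: the `Packet` class, the `packetise` and `receive` functions, and the `hi(...)` / `lo(...)` strings standing in for the primary and heavily compressed encodings are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Packet:
    seq: int                  # sequence number of the primary unit
    primary: str              # primary encoding of unit `seq`
    redundant: Optional[str]  # heavily compressed copy of unit `seq - 1`, if any


def packetise(units: List[str]) -> List[Packet]:
    """Piggy-back a heavily compressed copy of each unit onto the next packet."""
    packets = []
    for seq, unit in enumerate(units):
        redundant = f"lo({units[seq - 1]})" if seq > 0 else None
        packets.append(Packet(seq, f"hi({unit})", redundant))
    return packets


def receive(packets: List[Packet], n_units: int) -> List[Optional[str]]:
    """Rebuild the stream, using a redundant copy where the primary was lost."""
    primary: Dict[int, str] = {p.seq: p.primary for p in packets}
    backup: Dict[int, str] = {
        p.seq - 1: p.redundant for p in packets if p.redundant is not None
    }
    # Prefer the primary encoding; fall back to the redundant copy, else None.
    return [primary.get(i, backup.get(i)) for i in range(n_units)]


units = ["u0", "u1", "u2", "u3"]
sent = packetise(units)
arrived = [p for p in sent if p.seq != 1]  # packet 1 is lost in the network
playout = receive(arrived, len(units))
print(playout)  # ['hi(u0)', 'lo(u1)', 'hi(u2)', 'hi(u3)']
```

Unit 1 is played out from its lower quality copy carried in packet 2. Note that in this sketch a loss of two consecutive packets still leaves a gap, since the only copy of the first lost unit travels in the second lost packet.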

Redundant transmission was developed by UCL and INRIA Sophia-Antipolis, as part of the MICE/MERCI multimedia conferencing projects. It is discussed further in the following papers:

  • Vicky Hardman, Angela Sasse, Mark Handley and Anna Watson, "Reliable Audio for Use over the Internet", in Proceedings of INET'95, June 1995, Honolulu, Hawaii.
  • Isidor Kouvelas, Orion Hodson, Vicky Hardman and Jon Crowcroft, "Redundancy Control in Real-Time Internet Audio Conferencing", in Proceedings of AVSPN 97, September 1997, Aberdeen, Scotland, UK.
  • Colin Perkins, Isidor Kouvelas, Orion Hodson, Vicky Hardman, Mark Handley, Jean-Chrysostome Bolot, Andres Vega-Garcia, Sacha Fosse-Parisis, "RTP Payload for Redundant Audio Data", IETF Audio/Video Transport Working Group, RFC2198, September 1997.

As an alternative to redundant transmission, recent versions of RAT provide the option to send interleaved audio. Units of audio data are resequenced before transmission, so that originally adjacent units are separated by a guaranteed distance in the transmitted stream, and are returned to their original order at the receiver. Interleaving disperses the effect of packet losses. If, for example, units are 5ms in length and packets 20ms (i.e. 4 units per packet), then the first packet could contain units 1, 5, 9, 13; the second packet would contain units 2, 6, 10, 14; and so on. The loss of a single packet from an interleaved stream therefore results in multiple small gaps in the reconstructed stream, as opposed to the single large gap which would occur in a non-interleaved stream.
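The example above (16 units, 4 units per packet) can be reproduced with a short sketch. The function names `interleave` and `deinterleave` are illustrative only, and the block size of 16 units is an assumption taken from the worked example rather than a documented RAT parameter.

```python
from typing import List, Optional


def interleave(units: List[int], units_per_packet: int) -> List[List[int]]:
    """Spread adjacent units across different packets within one block."""
    if len(units) % units_per_packet != 0:
        raise ValueError("block length must be a multiple of units_per_packet")
    n_packets = len(units) // units_per_packet
    # Packet k carries units k, k + n_packets, k + 2 * n_packets, ...
    return [units[k::n_packets] for k in range(n_packets)]


def deinterleave(
    packets: List[Optional[List[int]]], units_per_packet: int
) -> List[Optional[int]]:
    """Restore the original order; units from lost packets become None."""
    n_packets = len(packets)
    out: List[Optional[int]] = [None] * (n_packets * units_per_packet)
    for k, packet in enumerate(packets):
        if packet is None:  # this packet was lost
            continue
        for j, unit in enumerate(packet):
            out[k + j * n_packets] = unit
    return out


units = list(range(1, 17))       # units 1..16, 5ms each
packets = interleave(units, 4)   # four 20ms packets
print(packets[0], packets[1])    # [1, 5, 9, 13] [2, 6, 10, 14]

packets[1] = None                # the second packet is lost
print(deinterleave(packets, 4))
# [1, None, 3, 4, 5, None, 7, 8, 9, None, 11, 12, 13, None, 15, 16]
```

The single lost packet shows up as four isolated one-unit gaps rather than one gap of four consecutive units. The receiver must wait for the whole block before playing it out, which is where the extra latency comes from.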

Although interleaving does not reduce the amount of loss observed, it does significantly improve the perceived quality of an audio stream. The obvious disadvantage of interleaving is that it increases latency. This limits the use of this technique for interactive applications, although it performs well for non-interactive use. The major advantage of interleaving is that it does not increase the bandwidth requirements of a stream.

Receiver based repair of damaged audio streams

Receiver based recovery schemes rely on producing a replacement for a lost packet which is similar to the original. This is possible since audio signals, and in particular speech, exhibit large amounts of short-term self similarity. As such, these techniques work for relatively small loss rates (less than 15%) and for small packets (4-40ms). When the loss length approaches the length of a phoneme (5-100ms) these techniques break down, since whole phonemes may be missed by the listener.

It is, therefore, clear that receiver based repair schemes are not a substitute for sender-based repair, but rather work in tandem with it. A sender-based scheme is used to repair most losses, leaving a small number of isolated gaps to be repaired. Once the effective loss rate has been reduced in this way, receiver based repair forms a cheap and effective means of patching over the remaining loss.

A number of receiver based repair schemes are implemented in RAT:

  • Silence substitution
  • Packet repetition
  • Pattern matching repair

A simple form of receiver based recovery is silence substitution. The gap left by a lost packet is filled with silence, to maintain the timing relationship between the surrounding packets. It is only effective with short packet lengths (less than 4ms) and low loss rates (less than 2%), making it suitable for striped audio with narrow and distributed stripes over low loss paths.

The performance of silence substitution degrades rapidly as packet sizes increase, and quality is unacceptably bad for the 40ms packet size in common use in network audio conferencing tools. Despite this, the use of silence substitution is widespread, primarily because it is simple to implement.
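Silence substitution is simple enough to sketch in a few lines. The function name and the representation of a lost packet as `None` are assumptions made for this illustration, and samples are taken to be signed linear PCM so that silence is the value 0.

```python
from typing import List, Optional


def silence_substitution(
    packets: List[Optional[List[int]]], samples_per_packet: int
) -> List[int]:
    """Fill each lost packet with silence so later packets keep their timing."""
    out: List[int] = []
    for packet in packets:
        if packet is None:
            out.extend([0] * samples_per_packet)  # zero is silence in signed PCM
        else:
            out.extend(packet)
    return out


received = [[3, 4], None, [7, 8]]         # the middle packet was lost
print(silence_substitution(received, 2))  # [3, 4, 0, 0, 7, 8]
```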

Packet repetition replaces lost packets with copies of the packets that arrived immediately before the loss. It has low computational complexity and performs reasonably well. The subjective quality of repetition is improved by gradually fading repeated units. The GSM system, for example, advocates repeating the first 20ms at the same amplitude and then fading the repeated signal to zero amplitude over the next 320ms.

The use of repetition with fading is a good compromise between the poor performance of silence substitution, and the more complex pattern matching scheme.
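Repetition with fading might look like the sketch below. It is an illustration under stated assumptions, not RAT's code: the function and parameter names are hypothetical, the gain is stepped once per packet rather than per sample, and the defaults (hold for 1 packet, fade over 16 packets) correspond to the GSM figures quoted above only if packets are 20ms long.

```python
from typing import List, Optional


def repair_by_repetition(
    packets: List[Optional[List[int]]],
    samples_per_packet: int,
    hold_packets: int = 1,
    fade_packets: int = 16,
) -> List[List[int]]:
    """Replace each lost packet with a (progressively faded) copy of the last good one."""
    repaired: List[List[int]] = []
    last_good = [0] * samples_per_packet  # nothing to repeat yet: fall back to silence
    lost_run = 0                          # number of consecutive losses so far
    for packet in packets:
        if packet is not None:
            last_good, lost_run = packet, 0
            repaired.append(list(packet))
            continue
        lost_run += 1
        if lost_run <= hold_packets:
            gain = 1.0  # repeat at full amplitude
        else:
            # Linear fade to zero over `fade_packets` further losses.
            gain = max(0.0, 1.0 - (lost_run - hold_packets) / fade_packets)
        repaired.append([round(sample * gain) for sample in last_good])
    return repaired


stream = [[100, -100], None, None, None]  # three consecutive losses
repaired = repair_by_repetition(stream, 2, hold_packets=1, fade_packets=2)
print(repaired)  # [[100, -100], [100, -100], [50, -50], [0, 0]]
```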

Pattern matching repair uses audio before and after the loss to interpolate a suitable signal to cover the loss. It performs somewhat better than packet repetition, but is significantly more computationally intensive.
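The flavour of pattern matching can be shown with a deliberately simplified, one-sided sketch: it uses only the audio before the loss, searches the recent history for the segment that best matches the samples just before the gap, and copies what followed that segment. The function name, the sum of absolute differences as the match criterion, and the one-sided search are all assumptions for illustration; the scheme described above also uses audio after the loss.

```python
from typing import List, Optional


def pattern_match_fill(history: List[int], gap_len: int, template_len: int) -> List[int]:
    """Fill a gap with the samples that followed the best earlier match of the
    most recent `template_len` samples."""
    template = history[-template_len:]
    best_start: Optional[int] = None
    best_err: Optional[int] = None
    # A candidate must leave `gap_len` samples of history after the matched segment.
    last_start = len(history) - template_len - gap_len
    for start in range(last_start, -1, -1):  # most recent candidates first
        window = history[start:start + template_len]
        err = sum(abs(a - b) for a, b in zip(window, template))
        if best_err is None or err < best_err:
            best_start, best_err = start, err
    if best_start is None:
        return [0] * gap_len  # not enough history: fall back to silence
    follow = best_start + template_len
    return history[follow:follow + gap_len]


period = [0, 3, 5, 3, 0, -3, -5, -3]  # one cycle of a periodic test signal
signal = period * 4
history = signal[:20]                 # samples received before the loss
filled = pattern_match_fill(history, gap_len=4, template_len=6)
print(filled)                         # [0, -3, -5, -3]
print(filled == signal[20:24])        # True
```

On a perfectly periodic signal the gap is reconstructed exactly. Real speech is only approximately periodic, so the copied segment is an estimate, and the search over candidate positions is what makes this approach more expensive than repetition.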

Adaptive Scheduling Protection

Current general purpose operating systems, such as Unix and Windows 95, do not provide adequate support for real-time services in their scheduling algorithms. RAT uses a novel adaptive algorithm, in which the DMA driven audio playout is used to 'cushion' the system against scheduling anomalies. This is described in the following paper:

Secure Conferencing

RAT allows for secure conferencing, whereby media streams and participant identity information can be encrypted using triple-DES. Other encryption algorithms could easily be added.

Improved Statistics and diagnostic features

Like other RTP-based audio tools, RAT provides reception quality statistics and user information for all participants in a conference. In addition, it has a graphical display of the loss to/from each participant, making diagnosis of problems a simple matter.

Conference coordination bus

RAT implements a conference coordination bus, whereby the user interface and media engine are separated, and communicate via an IPC mechanism. This allows for complete control of RAT by another process operating on the same host. Advantages of this split approach include:

  • Customised user-interface: the existing RAT user interface can easily be replaced, with no loss of functionality.
  • Lip-synchronisation: RAT can communicate with a video tool, to synchronise audio and video.
  • Integration with wide area conference control: a separate conference control process may be run on the same host as the audio/video tools. This can use the conference bus to control the media tools, to provide, for example, H.323 conference control.

More details of the conference bus used in RAT are available here.

Transcoder operation

When the available bandwidth differs between participants in a conference, or when some participants do not have multicast capable access, the RAT transcoder/gateway may be used. It connects two multicast groups, or one multicast group and a single unicast host. RTP packets received from either group are transcoded into the format specified for the other group, multiple sources are mixed together, and the resulting stream is transmitted to the other group. A different codec can therefore be used in each group, so the two groups can have different bandwidth requirements.
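The mixing step of such a gateway can be sketched as below. This is an illustrative assumption rather than RAT's code: the `mix` function is hypothetical, the sources are assumed to have been decoded already to 16-bit signed linear PCM at a common sampling rate, and the mixed result would then be re-encoded with the other group's codec.

```python
from typing import List

INT16_MIN, INT16_MAX = -32768, 32767


def mix(sources: List[List[int]]) -> List[int]:
    """Sum time-aligned 16-bit PCM sources sample by sample, clipping on overflow."""
    length = max((len(s) for s in sources), default=0)
    mixed: List[int] = []
    for i in range(length):
        # Sources shorter than the longest one contribute nothing past their end.
        total = sum(s[i] for s in sources if i < len(s))
        mixed.append(max(INT16_MIN, min(INT16_MAX, total)))
    return mixed


speaker_a = [1000, 20000, -30000]
speaker_b = [500, 20000, -10000]
print(mix([speaker_a, speaker_b]))  # [1500, 32767, -32768]
```

Clipping to the 16-bit range is the simplest way to handle overflow when several loud sources coincide; other strategies, such as scaling the sum, are possible.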