com.googlecode.compressingcircularbuffers
Class CompressingCircularBuffer

java.lang.Object
  extended by com.googlecode.compressingcircularbuffers.CompressingCircularBuffer

public class CompressingCircularBuffer
extends java.lang.Object

Maintains a compressed, fixed-memory-footprint, "sketch" (summary) of a sampled data sequence. The compressed sequence is formed by applying a specified compression method to a series of approximately equal sized contiguous sequential chunks of the original sequence.

For example, with the default, AveragingCompressionMethod, sequential raw data samples (fed in via the update method) are averaged together in approximately equal sized contiguous chunks to form the compressed sequence. As new data comes in, those chunks expand in size (with associated averages adjusted accordingly) so that a buffer of the specified size can still contain a compressed form of the growing sequence.

In general, these chunk sizes will either 2k or 2k+1, where k is chosen so that the aggregated samples fill up the available buffer space (k starts at zero and increases as more samples come in and the buffer becomes full at the current compression level). Chunks at the beginning or at the end of the sequence use the larger aggregation interval, with the smaller interval used in the middle. You can use the methods getFirstSampleIndex and getLastSampleIndex to determine the number of samples aggregated into any specific compressed value.

Performance is very similar to that of an ordinary circular buffer: The algorithm updates the compressed sequence in at most one read from, and two writes to, its internal buffer per update. Thus it is appropriate for use when update times must be very predictable, since there is never any 'stalling' as large chunks of memory are reorganized to reflect a change in compression level. For example, if 'memory' were a large file on a hard disk, the algorithm could still have acceptable worst case update times (a couple of seeks/writes per update). As with ordinary circular buffers, it is also close to optimal in memory use: all buffer elements are filled with useful data.

This class uses an indexing scheme that exploits basic properties of modular arithmetic to quickly locate exactly those positions containing the 'compressed away' data that are available to store the new data. That's why it's more efficient than more straightforward approaches. In the associated paper "Compressing Circular Buffers" I explain this indexing scheme more completely.

Copyright, 2010, John C. Gunther. All rights reserved. Released under the terms of the Apache 2.0 licence.


Field Summary
static int DEFAULT_MAX_COMPRESSION_RATIO
          Default upper bound on compression ratio if none is specified in constructor.
static int MAX_ALLOWED_UPDATES
          Upper bound on maximum number of updates that can be performed without a reset.
static int NOSUCHINDEX
          Compressed element index returned when no compressed element matching given criteria is available.
 
Constructor Summary
CompressingCircularBuffer()
          Creates a new compressing circular buffer.
CompressingCircularBuffer(int nElements)
          Creates a new compressing circular buffer.
CompressingCircularBuffer(int nElements, CompressionMethod compressionMethod)
          Creates a new compressing circular buffer.
CompressingCircularBuffer(int nElements, CompressionMethod compressionMethod, int minCompressionRatio)
          Creates a new compressing circular buffer.
CompressingCircularBuffer(int nElements, CompressionMethod compressionMethod, int minCompressionRatio, int maxCompressionRatio)
          Creates a new compressing circular buffer.
 
Method Summary
 int getCompressedIndex(int iRawDataSample)
          Retrieves the index of the compressed value whose aggregation interval includes the given raw data sample.
 double getCompressedValue(int iCompressed)
          Retrieves compressed value associated with given index.
 int getCompressionRatio()
          Retrieves the number of points that will be included in the next new compressed value that will be formed from samples fed in via the update method.
 int getFirstSampleIndex(int iCompressed)
          Retrieves the index of the first uncompressed sequence value included in the specified compressed value; uncompressed sequence indexes start at 0.
 double getIncompleteCompressedValue()
          Retrieves the "in progress", or incomplete compressed value.
 int getLastSampleIndex(int iCompressed)
          Retrieves the index of the last uncompressed sequence value incorporated into the specified compressed value.
 int getNCompressed()
          Retrieves the number of compressed values within the circular buffer.
 int getNCompressedSamples()
          Retrieves the number of samples fed to the circular buffer that have been incorporated into the compressed sequence.
 int getNSamples()
          Retrieves the number of samples fed to the circular buffer so far.
 void reset()
          Returns the compressing circular buffer to it's initial state, discarding all previously seen data.
 void update(double sample)
          Updates the compressed representation of the sequence to include a newly collected data value (sequence element).
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

MAX_ALLOWED_UPDATES

public static final int MAX_ALLOWED_UPDATES
Upper bound on maximum number of updates that can be performed without a reset. This bound arises because integer related indexing calculations must remain in the valid range for integers.

See Also:
Constant Field Values

NOSUCHINDEX

public static final int NOSUCHINDEX
Compressed element index returned when no compressed element matching given criteria is available.

See Also:
Constant Field Values

DEFAULT_MAX_COMPRESSION_RATIO

public static final int DEFAULT_MAX_COMPRESSION_RATIO
Default upper bound on compression ratio if none is specified in constructor.

See Also:
Constant Field Values
Constructor Detail

CompressingCircularBuffer

public CompressingCircularBuffer(int nElements,
                                 CompressionMethod compressionMethod,
                                 int minCompressionRatio,
                                 int maxCompressionRatio)
Creates a new compressing circular buffer.

Parameters:
nElements - number of compressed elements used to represent sequence. Must be an odd integer and at least 1. Default is 1.
compressionMethod - how compression will be performed. Use an instance of one of the predefined compression methods (ThinningCompressionMethod, AveragingCompressionMethod, SummingCompressionMethod, MinimizingCompresionMethod, MaximizingCompressionMethod, or RandomSamplingCompressionMethod) or else create your own by implementing the CompressionMethod interface. Default is new AveragingCompressionMethod().
minCompressionRatio - minimum number of data points contained in any compressed value. Default is 1.
maxCompressionRatio - maximum number of data points in any compressed value. Must be ≥ minCompressionRatio. Note that compressed values always contain a number of points equal to minCompressionRatio*2k with k a non-negative integer. Thus actual maximum is the largest integer of this form that does not exceed the maxCompressionRatio that you specify. Once this limit is reached, the sequence will behave like an ordinary circular buffer (wrapping around and overwriting the oldest compressed value). Default is DEFAULT_MAX_COMPRESSION_RATIO.
See Also:
DEFAULT_MAX_COMPRESSION_RATIO, ThinningCompressionMethod, AveragingCompressionMethod, SummingCompressionMethod, MinimizingCompressionMethod, MaximizingCompressionMethod, RandomSamplingCompressionMethod

CompressingCircularBuffer

public CompressingCircularBuffer(int nElements,
                                 CompressionMethod compressionMethod,
                                 int minCompressionRatio)
Creates a new compressing circular buffer.

Convenience constructor. Same as:

 
 CompressingCircularBuffer(nElements, compressionMethod,
                           minCompressionRatio, DEFAULT_MAX_COMPRESSION_RATIO)
 

See Also:
CompressingCircularBuffer(int, CompressionMethod, int, int)

CompressingCircularBuffer

public CompressingCircularBuffer(int nElements,
                                 CompressionMethod compressionMethod)
Creates a new compressing circular buffer.

Convenience constructor. Same as:

 
 CompressingCircularBuffer(nElements, compressionMethod,
                           1, DEFAULT_MAX_COMPRESSION_RATIO)
 

See Also:
CompressingCircularBuffer(int, CompressionMethod, int, int)

CompressingCircularBuffer

public CompressingCircularBuffer(int nElements)
Creates a new compressing circular buffer.

Convenience constructor. Same as:

 
 CompressingCircularBuffer(nElements, new AveragingCompressionMethod(),
                           1, DEFAULT_MAX_COMPRESSION_RATIO)
 

See Also:
CompressingCircularBuffer(int, CompressionMethod, int, int)

CompressingCircularBuffer

public CompressingCircularBuffer()
Creates a new compressing circular buffer.

Convenience constructor. Same as:

 
 CompressingCircularBuffer(1, new AveragingCompressionMethod(),
                           1, DEFAULT_MAX_COMPRESSION_RATIO)
 

See Also:
CompressingCircularBuffer(int, CompressionMethod, int, int)
Method Detail

update

public void update(double sample)
Updates the compressed representation of the sequence to include a newly collected data value (sequence element).

For example, if you were sampling an on-line sensor once per minute, you might call update every minute, feeding in each newly sampled value. In that case, the compressing circular buffer would maintain an up-to-date compressed "sketch" of the entire sequence; that sketch would fit into the number of buffer elements that you specified in the compressing circular buffer's constructor.

Parameters:
sample - next value of a sampled sequence to be incorporated into the compressed representation.

getCompressedIndex

public int getCompressedIndex(int iRawDataSample)
Retrieves the index of the compressed value whose aggregation interval includes the given raw data sample.

Parameters:
iRawDataSample - raw data sample index
Returns:
index of compressed value whose aggregation interval includes the index of the given data sample, or NOSUCHINDEX if no compressed value includes the given sample.

getFirstSampleIndex

public int getFirstSampleIndex(int iCompressed)
Retrieves the index of the first uncompressed sequence value included in the specified compressed value; uncompressed sequence indexes start at 0.

Returns NOSUCHINDEX if there is currently no compressed value at the given index.

This method can be used when you need to determine the range of underlying uncompressed sequence values that a compressed value incorporates.

Parameters:
iCompressed - the compressed sequence element whose sample index is to be retrieved
Returns:
index of the first uncompressed sequence value incorporated into the specified compressed value.
See Also:
getLastSampleIndex, NOSUCHINDEX

getLastSampleIndex

public int getLastSampleIndex(int iCompressed)
Retrieves the index of the last uncompressed sequence value incorporated into the specified compressed value.

Returns NOSUCHINDEX if there is currently no compressed value at the given index.

Parameters:
iCompressed - the compressed sequence element whose sample index is to be retrieved
Returns:
index of the last uncompressed sequence value incorporated into the specified compressed value.
See Also:
getFirstSampleIndex, NOSUCHINDEX

getNCompressedSamples

public int getNCompressedSamples()
Retrieves the number of samples fed to the circular buffer that have been incorporated into the compressed sequence. The most recently collected samples that have not yet been incorporated into the compressed sequence are not included in the count.

Returns:
number of samples fed into the compressing circular buffer since it was created (or since the last reset), and that have been incorporated into the compressed sequence.

getNSamples

public int getNSamples()
Retrieves the number of samples fed to the circular buffer so far. In other words, the number of times that update was called since instantiation or reset.

Returns:
number of samples fed into the compressing circular buffer since it was created (or since the last reset).

getNCompressed

public int getNCompressed()
Retrieves the number of compressed values within the circular buffer.

Note: Except during startup, this will always equal the number of elements in the buffer specified in the constructor.

Returns:
number of compressed elements in the buffer

getCompressionRatio

public int getCompressionRatio()
Retrieves the number of points that will be included in the next new compressed value that will be formed from samples fed in via the update method.

Initially, this starts at minCompressionRatio, until the compressed values buffer fills up, in which case it doubles, until the buffer is filled again at the higher compression ratio, then it doubles again, and so forth.

Returns:
number of samples (updates) included in each newly added compressed sequence value.

getCompressedValue

public double getCompressedValue(int iCompressed)
Retrieves compressed value associated with given index.

Indexes range from 0...nElements-1, with order corresponding to the order of the underlying, uncompressed, data in the original sequence (0 for the oldest, nElements-1 for the newest).

Parameters:
iCompressed - index of compressed value to be retrieved
Returns:
value of the compressed sequence at the given index, or Double.NaN if no data has been placed into the compressed buffer at the given index (occurs only during startup).

getIncompleteCompressedValue

public double getIncompleteCompressedValue()
Retrieves the "in progress", or incomplete compressed value.

This value reflects any recent updates not yet incorporated into a compressed value. For example, if the buffer is currently compressing at a 4 to 1 ratio, and the total number of updates were not a multiple of 4, then a compressed value for either the most recently seen, 1, 2, or 3 points would be returned.

Note that if no incomplete compressed value is available, Double.NaN will be returned.

Returns:
value of the incomplete or "in progress" compressed element.

reset

public void reset()
Returns the compressing circular buffer to it's initial state, discarding all previously seen data.