Unit stringdiff

Classes

Functions

computeDifference - Decode integer from diffmap
dumpDifference - Extract other string using original string and diffmap
extractDifference - Compute difference between original and other and return "difference map" string
IntegerToTwoCharString - String difference routines: Barry Brannan, September 1997.
TwoCharStringToInteger - Encode integer into diffmap

Types

Constants

Variables


Functions


function computeDifference(block_size: integer; original, second: string): string;

Decode integer from diffmap

function dumpDifference(diffmap: string): string;

Extract other string using original string and diffmap

function extractDifference(original, diffmap: string): string;

Compute difference between original and other and return "difference map" string

function IntegerToTwoCharString(n: integer): string;

String difference routines: Barry Brannan, September 1997. barrylb@poboxes.com http://www.poboxes.com/barrylb Bug reports, suggestions welcome. If you make any improvements I would appreciate if you sent them to me. Some possible improvements... * faster speed * cleaner code The routines: ------------- computeDiff(blocksize, original, second) --> diffmap extractDiff(original, diffmap) --> second dumpDifference(diffmap) --> string illustrating diffmap nb blocksize is stored in diffmap What it is useful for: ---------------------- Suppose you are downloading pages of data from an online service at regular intervals. Rather than saving whole snapshots of the data each time, you save the first page then save "difference maps" for each update. If the pages don't change much then you have huge gains in storage efficency. How it works: ------------- 1. Each string (original, second) is divided into "blocks" of the specified blocksize. 2. Algorithm: For each block in second do Begin Find corresponding block in original. If found block in original then write block number to diffmap Else write literal text to diffmap End. 3. When trying to find corresponding block in original, the routines attempt to find a consecutive block rather than just the first block to optimize storage efficiency. See "Diffmap format", below. 4. The blocksize would ideally be 1 but that generally is too slow, so a blocksize of say 32 byte would be suitable for files under 64K. The maximum number of blocks in a file cannot exceed 65535 so you may have to increase the block size anyway if the files are too large. Diffmap format: --------------- Starts with one byte magic number then two bytes indicating the blocksize then followed by a sequence of codes... chr #1 -- source block range indicator, followed by four bytes. First two are start block number, second two are end block number. chr #2 -- literal string, followed by two bytes indicating the number of characters, then the characters themselves. chr #3 -- single source block, followed by two bytes indicating the source block number. Current magic number is '1' (ascii 49)

function TwoCharStringToInteger(s: string): integer;

Encode integer into diffmap

Types


Constants


Variables