Maps: Generic Associative Containers for Delphi (2, 3, & 4)
by Robert R. Marsh, S.J.
Maps are probably the most generally useful of all the abstract data types yet they are missing from Delphis VCL. What are they? First of all they are containersthey are designed to hold and manipulate data. What kind of data? Any kindnearly! The maps provided in this library are genericthey can hold Delphis atomic types (integer, real, string, etc.) and they can also hold objects. And like Delphis TStringList, maps are associative containersthey store things in pairs so that given one you can get the other. But where TStringList can only associate objects with strings, maps can be used to associate strings with objects, or integers with characters, or whatever you want.
The trick behind the ability to handle generic data is Delphis own array of const idiom which is used in a few strategic places in the VCL, e.g., in the Format function. You wont find it much used elsewhere though mainly because programmers imagine it to be slow and unwieldy, perhaps confusing it with another of Delphis mechanisms, the Variant type. Ross Judson, however, has shown it to be a powerful, robust, and efficient way of manipulating generic data. The inspiration for this library comes from Ross Judson's SDL (Standard Delphi Library) which uses the "array of const" technique to implement a very powerful generic container and algorithm library modeled after the C++ Standard Template Library (STL). Delphi programmers should overcome their distrust of anything remotely related to C++ and give it a careful look at Ross website (http://www.soletta.com).
The aim of the Maps library presented here is much more modest than Judsons. Delphi's "built-in" containers, TList and TStringList, are so easy to use that most programmers turn to them without further thought despite two great disadvantages. First, Delphi's lists manage to be generic only by handling all data as pointers, thereby ignoring a great deal of information that could be used to make life easier for the programmer. Second, and more important, the array-based architecture of Delphi's lists trades off the efficiency of some operations against others. In particular efficiency of insertion and deletion is sacrificed for rapid random access and searching. Every implementation makes such tradeoffswhat the programmer needs is a variety of containers offering a range of behaviors. With the library presented here you have the choices you need. If you want the fastest addition and deletion, for example, you could use the list-based map. For the fastest search, the hash map is the best choice but if, in addition to efficient searching, you need to access data in order the map based on a treap proves a good choice.
Some Terms
The Maps library will be easier to grasp if some terms are defined.
Variable: The generic type, Variable, is introduced to hold almost any data type, simple or object. It is, in fact, an alias for Delphi's TVarRec (which is unrelated to the Variant type). Of the commonly used types only arrays and records cannot be stored in a map.
Item, Key, & Datum: Maps store Items. Each Item is a pair of Variables, a Key and a Datum. The Map associates the Datum with the Key so that given a Key the associated Datum can be found efficiently. The relationship is asymmetric though: given a Datum a Key can be found but usually only by exhaustive search.
Indicator: An Indicator is analogous to a pointerit indicates something, in this case an Item stored in a map. If you have a valid Indicator for a particular map you can access the Item it points to or you can use the Indicator to navigate through the map. I might have called it instead an "iterator" but since most of the time what it does is indicate rather than iterate I decided against it.
Value and Reference: There is always an issue with container typesshould they store a copy of your data (a value) or just a reference to it? This matters particularly where objects are concerned because someone has to take responsibility for eventually freeing the objectshould it be you or the container? TList and TStringList choose to store pointers to data and leave the lifetime of the object up to the programmer. These Maps, in contrast, store values, i.e., they copy the value they are given and look after the disposal of their own contents. This applies both to simple types (integer, extended, string, etc.) and to objects.
Using Variables
In Delphi 4.x the following data types are compatible with Variable: Integer, Boolean, Char, Extended, ShortString, Pointer, PChar, Object, Class, WideChar, PWideChar, AnsiString, Currency, Variant, Interface, WideString, and Int64. (Interface is not available in D2, and WideString and Int64 are absent from D3).
The Maps Library imposes a couple of its own restrictions on this abundance. Although Variables are compatible with any TObject descendant, Maps will only accept objects descended from TPersistent. This gives Maps the ability to copy and stream their contents (but see below for a caveat). The Maps library also interprets pointers in a particular way. All Variables holding pointers are interpreted as references to chunks of memory. PChars, for example, are treated as null-terminated strings and ordinary pointers as blocks of memory allocated (directly or indirectly) by the Delphi heap memory manager. The reason is, again, the need to copy and stream such data.
When a compatible type is passed to a routine using the array of const idiom Delphi automatically creates a suitable Variable to hold it. Much of the time, however, you will want to construct Variables yourself using the AsVariable function. There are also a full suite of helper functions to make the inverse transformations, e.g., AsInteger(V) returns the integer value of the Variable (if it is holding one). The type of Variable can be checked by the TypeOf function.
Void is defined as a Variable of no particular type. Void is compatible with any other type and is counted as equal to any other value. Void Keys make little sense but Void Data can be useful. The IsVoid function
The first (non-Void) Key and Datum added to a map define the maps run-time type. Subsequent additions have to follow suit or an exception is raised.
Caveat There are some thorny issues with streaming objects. I indicated above that descendants of TPersistent are automatically copied and streamed by the Maps Library but there is one importnat proviso. The designers of Delphi chose not to make TPersistent's constructor virtual which means that it is difficult (probably impossible) to construct instances of persistent objects properly. But, if that is so, how does Delphi's VCL handle its own persistent objects like TStringList or TFont? By trickery, as far as I can see! I have found no documentation on the subject (so I may well be wrong) but it seems that the VCL persistent objects only work because they don't instantiate anything in their constructors. Instead the necessary structures are created on first use. See, for example, the Create and Grow methods of TStringListthe necessary memory is only allocated if the list is actually used. Not only does this solve the problem of the missing virtual constructor but tends to optimize resource usage by delaying until the last minute before allocating resources. If you have the luxury of crafting your own persistent objects then make sure you follow the same practice and the Maps Library will be able to copy and stream them correctly.
Common Behavior
Five kinds of Map are available distinguished by underlying data structures used in their implementation. The different characteristics of each will be described below. For now we just name them: TListMap, THATMap, THashMap, TTreeMap, and TTreapMap. Each descends from TAbstractMap and as such shares a common set of properties and methods.
Count gives the number of items in the map and Capacity the maximum a map is expected to have to hold. Setting the Capacity is essential for THashMap and helpful for THATMap, but immaterial for the other maps.
Each map has three read/write array properties Keys, Data, and DataByKey. Keys and Data are indexed by Indicator. DataByKey indexes by Key.
Items can be added to a map using the above properties or, more directly, by the Addxxx methods. AddItems adds several Key-Datum pairs in one operation. AddItemVal, and AddItemRef add one item at a time, "Val" or "Ref" indicating whether the item is handed over by value or merely by reference. The Val/Ref convention is used throughout to make it clear whether the map takes over the management of the item or not. Usually an item is added by value and accessed by reference but occasionally it can be useful to do differently.
Clear empties a map of all its contents and restores it to its newly created state. Items can be removed selectively according to Key using DeleteByKey or DeleteByKeys and by Datum using DeleteByDatum or DeleteByData. The latter two methods can be slow.
There are a number of ways of locating items. The Find function, modeled after TStrings.Find, tries to locate an item by Key and sets an Indicator accordingly. IndexOfKey does the same job in a different style. IndexOfDatum locates an item according to the value of a Datum but does so by a slow exhaustive search. The actual ordering of items varies with each kind of map. Maps use a set of built-in ordering methods and the programmer never has to provide an ordering function. If a Map holds strings two properties govern how they are orderedCaseSensitive and Ansi. Ansi causes string-ordering to follow the default Windows locale but introduces considerable overhead (Delphis own string containers use Ansi ordering).
Location methods are one of two ways to generate Indicators, the other being the navigation methods First, Last, Prev, and Next. An Indicator can be checked for validity by the Valid function. Invalidate is used to force an Indicator to an invalid value. For example, calling Next on a Last Indicator invalidates the Indicator. In this sense Valid works like BOF or EOF in a database system.
It is important to realize that Indicators only have temporary validity: adding to or removing items from a map leaves their behavior undefined (they may still register as Valid and yet not be so). Note too that Indicators from location methods can only be used for navigation in certain Maps (TListMap, THATMap, THashMap work, TTreeMap and TTreapMap dont).
Armed with an Indicator an item can be accessed (see Keys and Data above). GetKeyVal and GetKeyRef return the Key pointed at by (GetDatumVal and GetDatumRef analogously). If you get a Variable by value you take on the responsibility for disposing of it appropriately. Since theres no copying involved, getting a Variable by reference is also quicker. PutKeyVal, PutKeyRef, PutDatumVal, and PutDatumRef modify the Variable according to a similar pattern.
Maps can be persistent. SaveToStream and SaveToFile store away enough information to recreate the map with all its contents using LoadFromStream or LoadFromFile. Maps can also be used as published properties of other objects permitting, for example, maps of maps. The Assign method accomplishes copying of one map from another. It can also handle copying to and from TStringList (with conversion of data types as necessary).
With a few exceptions (noted below) Maps do not permit duplicate Keys.
Differences
TListMap is based on an unordered doubly-linked list. Items are added to the map either at the front or the back depending on the value of the AddAtFront property. There is an additional Delete method which removes an item according to an Indicator. The Capacity property has no function.
Unordered
Addition fastest of the maps
Location very slow exhaustive search
Deletion fast once you have an Indicator
Navigation fast
THATMap is based on a Hashed Array Tree (adapted from C code by Edward Sitarski, Dr Dobbs Journal, September 1996, 107-110). It can be regarded as a generalization of TStringList with a reduced tendency to fragmentation. THATMap is the only one of the supplied maps with random-access Indicators. Its Indicators can be cast as integers and manipulated directly (and not just via Next and Prev).
THATMap may be Sorted or not. When unsorted, items can be inserted at any location as well as added at the end. Move, Exchange, Reverse, and Sort are also available. When sorted, items are added in Key order. Through the Duplicates property it is possible to allow, ignore, or raise an exception when a duplicate Key is added to a sorted THATMap. Setting Capacity in advance makes additions to the map much faster.
Ordered or Unordered
Addition moderate if unsorted (faster at end), slow if sorted
Location very slow exhaustive search if unsorted, fast binary search if sorted
Deletion slow in general, fast at the end if unsorted
Navigation very fast (random-access too)
THashMap is based on a hash table with linked-list chaining. It is very important to set Capacity to the number of items you expect the map to accommodate. If asked to hold more performance degrades significantly. Capacity is best set while the map is empty to avoid the slow operation of rehashing all Keys.
The chief virtue of THashMap is very fast searching (independent of how big the map is). Navigation through the map is fast but not in any intelligible order.
Unordered
Addition very fast
Location very fast indeed
Deletion fast
Navigation fast but out of Key order
TTreeMap is based on a simple binary search tree with a high probability of being balanced because items are ordered not by Key but by the Keys hash value. Overall performance is second only to THashMap and there is no need to set Capacity in advance.
Unordered
Addition fast
Location fast
Deletion fast
Navigation fast but out of Key order
TTreapMap is based on the treapa relatively new balanced binary tree (R. Seidel and C. R. Aragon, "Randomized Search Trees." Algorithmica, 16(4/5):464-497, 1996). Performance is generally good with the advantage that Keys can be retrieved in order. Unlike most of the maps presented here TTreapMap can handle Duplicates.
Ordered
Addition good
Location good
Deletion good
Navigation good and in Key order
Examples of Use
The example program, Anagram, takes a file of words (a Scrabble-users dictionary) and extracts all the sets of words which are anagrams of each other. It also attempts the same task with TStringList. The advantages of maps show in the greater ease of development and the greater speed (a factor of ~25 on my machine).
Some general examples would be helpful though to get going. The snippet below adds some items to a map and then displays them in order.
00 var01 m: TListMap;02 I: Indicator;03 K: Variable;04 D: Variable;05 begin06 m:=TListMap.Create;07 try08 m.AddItems([1,2,3,4],[one,two,three,four]);09 m.AddItemVal(AsVariable([5]), AsVariable([five]));10 I:=m.First;11 while m.Valid(I) do12 begin13 K:=m.GetKeyRef(I);14 D:=m.GetDatumRef(I);15 ShowMessage(IntToStr(AsInteger(K)) + + AsString(D));16 I:=m.Next;17 end;18 finally19 m.Free;20 end;21 end;Notes:
08 AddItems uses the [] notation to add four key/datum pairs. The keys are integers, the data strings.
09 AddItemVal and the helper function AsVariable is used add another pair.
10 First returns an Indicator to the first item in the map.
11 We loop as long as the I is valid, i.e., until we have gone through every item in the map.
13 GetKeyRef retrieves a reference to the item at I.
14 GetDatumRef likewise. These are Variables.
15 AsInteger and AsString are used to extract the contents of K and D.
16 Next returns an Indicator, either to the next item or, if at end, an invalid one.
17 Free destroys the map and disposes of all memory associated with its contents.
The next snippet shows how to locate and modify items.
00 var01 m: THashMap;02 I: Indicator;03 V: Variable;04 begin05 m:=THashMap.Create;06 try07 m.AddItems([one,two,three,four],[1.0,2.0,3.0,4.0]);08 if m.Find(AsVariable([three]),I) then09 begin10 V:=m.GetDatumRef(I);11 V:=AsVariable([2.0*AsExtended(V)]);12 m.PutDatumVal(I,V);13 end;14 m.DataByKey[AsVariable([four])] := AsVariable([4.4]);15 finally16 m.Free;17 end;18 end;
Notes:
05 A different kind of map but the same pattern of use.
07 This time strings are used to key floating point values.
08 We try and find the key threeif it exists I will Indicate it.
09 V gets a reference to the datum (it should be 3.0)
10 We double it and put it back in V
11 and put V in its original place in the map
14 A different approachreplace 4.0 by 4.4 using the array property DataByKey