Nils Eilers, 13.06.2017
Grijjy, Inc. Consulting: Open Source Libraries and Components on GitHub
Grijjy, Inc. Consulting makes various libraries, components, and source code used in their application development available on GitHub. Part of this offering includes generic sets, which are described in the following post:
Introduction of TgoSet
Have you ever weighed the use of a TList<T>
or a TDictionary<T>
for storing a collection of items?
You want to quickly check if the collection contains a particular item, so a dictionary would make sense.
However, you don't want to store Key/Value pairs, just items, so a list would seem more appropriate.
Perhaps you've solved this problem by creating a dictionary with a dummy value type (like TDictionary<String, Integer>
) and adding 0
values with all strings. While this works, it feels like a hack and wastes memory and performance.
Therefore, we created a generic set called TgoSet<T>
. You can find it in the Grijjy.Collections
unit.
What Is a Set?
A set, like a dictionary, is a collection of unordered items that uses hashing for fast lookups.
Like a list, it stores only items without Key/Value pairs.
Unlike a list, the order of items in the collection is undefined, and items cannot be accessed by index.
Items can be enumerated using a
for..in
loop or theToArray
method.
Simple API Example
Using a TgoSet
is straightforward. The API is similar to a dictionary's:
var
Keywords: TgoSet<String>;
Keyword: String;
begin
Keywords := TgoSet<String>.Create;
try
Keywords.Add('if');
Keywords.Add('case');
Keywords.Add('repeat');
Keywords.Add('while');
Keywords.Remove('repeat');
if (Keywords.Contains('if')) then
WriteLn('"if" is a Delphi keyword');
Write('Delphi keywords:');
for Keyword in Keywords do
Write(' ', Keyword);
finally
Keywords.Free;
end;
end;
Output:
"if" is a Delphi keyword
Delphi keywords: while if case
As you can see, the order of items in the collection is undefined. It typically does not match the insertion order or any alphabetical sorting.
Like in a dictionary, the order of items is determined by their hash codes.
Note:
Like a dictionary, the
Add
method raises an exception if the item already exists in the set.To avoid this, use
AddOrSet
, although it's slightly slower.
You can also provide your own comparer to the constructor, like in a dictionary, if you do not want to use the default comparer. This might be useful if:
You want to compare strings case-insensitively.
You want to use a faster hash function than the one provided by Delphi.
Other Set Types
If you look at the source code of TgoSet<T>
, you will see that it inherits from TgoReadOnlySet<T>
. This means you can provide a read-only view of the set as a property or parameter, ensuring that others cannot accidentally modify the set.
Like other collections in Delphi, there is a version that only stores objects and automatically takes ownership of those objects.
This version,
TgoObjectSet<T>
, destroys its objects when the set is destroyed, cleared, or objects are removed.To remove an object without destroying it, use the
Extract
API.
How a Set Works
A set is most similar to a dictionary without values.
Delphi’s
TDictionary<TKey, TValue>
is implemented using hash tables and linear probing for collision resolution.We use the same model in our set.
Our hash table is simply a dynamic array of TItem
records:
type
TgoReadOnlySet<T> = class(TEnumerable<T>)
private type
TItem = record
HashCode: Integer;
Item: T;
end;
private
FItems: TArray<TItem>;
...
end;
A TItem
stores its value and hash code.
The length of the hash table (
FItems
) is always a power of two.
Adding an Item
Adding an item involves:
Calculating the hash code for the item (using
IEqualityComparer<T>.GetHashCode
).Mapping the hash code to an index of the hash table using a logical
and
operation with the length of the hash table minus 1.Finding the appropriate item in the hash table.
If it is empty, the item is added.
Otherwise, the index is incremented until an empty item is found (linear probing).
procedure TgoSet<T>.Add(const AItem: T);
var
Mask, Index, HashCode, HC: Integer;
begin
if (FCount >= FGrowThreshold) then
Resize(Length(FItems) * 2);
HashCode := FComparer.GetHashCode(AItem) and $7FFFFFFF;
Mask := Length(FItems) - 1;
Index := HashCode and Mask;
while True do
begin
HC := FItems[Index].HashCode;
if (HC = EMPTY_HASH) then
Break;
if (HC = HashCode) and FComparer.Equals(FItems[Index].Item, AItem) then
raise EListError.CreateRes(@SGenericDuplicateItem);
Index := (Index + 1) and Mask;
end;
FItems[Index].HashCode := HashCode;
FItems[Index].Item := AItem;
Inc(FCount);
end;
Explanation
If the hash table is 75% full, it resizes.
The hash code is bitwise
and
ed with$7FFFFFFF
to ensure it is non-negative.This prevents conflicts with
-1
, which indicates an empty hash table entry.
The while loop performs linear probing to find empty entries.
If a matching hash code exists, it checks equality with
IEqualityComparer<T>.Equals
.If they match, an exception is raised since duplicates are not allowed.
This is the basic implementation of a hash table in a collection. Additional code is required for resizing the set and removing items.
For more details, refer to the source code on Grijjy’s GitHub repository.
Original post: “Expand your Collections collection – Part 1: a generic Set” by Erik van Bilsen on Grijjy's website, January 5, 2017.
Translated by Nils Eilers.