Let us analyze your application - now have a free initial consultation.

Are you looking for a company to support or support your Delphi project?
Click here!

Jetzt Anfrage einreichenTelefon ERPwerk Button.fw

Generische Sets

Nils Eilers, 13.06.2017

 

Grijjy, Inc. Consulting: Open Source Libraries and Components on GitHub

Grijjy, Inc. Consulting makes various libraries, components, and source code used in their application development available on GitHub. Part of this offering includes generic sets, which are described in the following post:


Introduction of TgoSet

Have you ever weighed the use of a TList<T> or a TDictionary<T> for storing a collection of items?

  • You want to quickly check if the collection contains a particular item, so a dictionary would make sense.

  • However, you don't want to store Key/Value pairs, just items, so a list would seem more appropriate.

Perhaps you've solved this problem by creating a dictionary with a dummy value type (like TDictionary<String, Integer>) and adding 0 values with all strings. While this works, it feels like a hack and wastes memory and performance.

Therefore, we created a generic set called TgoSet<T>. You can find it in the Grijjy.Collections unit.


What Is a Set?

A set, like a dictionary, is a collection of unordered items that uses hashing for fast lookups.

  • Like a list, it stores only items without Key/Value pairs.

  • Unlike a list, the order of items in the collection is undefined, and items cannot be accessed by index.

  • Items can be enumerated using a for..in loop or the ToArray method.


Simple API Example

Using a TgoSet is straightforward. The API is similar to a dictionary's:

var
  Keywords: TgoSet<String>;
  Keyword: String;

begin
  Keywords := TgoSet<String>.Create;
  try
    Keywords.Add('if');
    Keywords.Add('case');
    Keywords.Add('repeat');
    Keywords.Add('while');
    Keywords.Remove('repeat');

    if (Keywords.Contains('if')) then
      WriteLn('"if" is a Delphi keyword');

    Write('Delphi keywords:');
    for Keyword in Keywords do
      Write(' ', Keyword);

  finally
    Keywords.Free;
  end;
end;

Output:

"if" is a Delphi keyword  
Delphi keywords: while if case  

As you can see, the order of items in the collection is undefined. It typically does not match the insertion order or any alphabetical sorting.

  • Like in a dictionary, the order of items is determined by their hash codes.

Note:

  • Like a dictionary, the Add method raises an exception if the item already exists in the set.

  • To avoid this, use AddOrSet, although it's slightly slower.

You can also provide your own comparer to the constructor, like in a dictionary, if you do not want to use the default comparer. This might be useful if:

  • You want to compare strings case-insensitively.

  • You want to use a faster hash function than the one provided by Delphi.


Other Set Types

If you look at the source code of TgoSet<T>, you will see that it inherits from TgoReadOnlySet<T>. This means you can provide a read-only view of the set as a property or parameter, ensuring that others cannot accidentally modify the set.

Like other collections in Delphi, there is a version that only stores objects and automatically takes ownership of those objects.

  • This version, TgoObjectSet<T>, destroys its objects when the set is destroyed, cleared, or objects are removed.

  • To remove an object without destroying it, use the Extract API.


How a Set Works

A set is most similar to a dictionary without values.

  • Delphi’s TDictionary<TKey, TValue> is implemented using hash tables and linear probing for collision resolution.

  • We use the same model in our set.

Our hash table is simply a dynamic array of TItem records:

type
  TgoReadOnlySet<T> = class(TEnumerable<T>)
  private type
    TItem = record
      HashCode: Integer;
      Item: T;
    end;
  private
    FItems: TArray<TItem>;
    ...
  end;

A TItem stores its value and hash code.

  • The length of the hash table (FItems) is always a power of two.


Adding an Item

Adding an item involves:

  1. Calculating the hash code for the item (using IEqualityComparer<T>.GetHashCode).

  2. Mapping the hash code to an index of the hash table using a logical and operation with the length of the hash table minus 1.

  3. Finding the appropriate item in the hash table.

    • If it is empty, the item is added.

    • Otherwise, the index is incremented until an empty item is found (linear probing).

procedure TgoSet<T>.Add(const AItem: T);
var
  Mask, Index, HashCode, HC: Integer;
begin
  if (FCount >= FGrowThreshold) then
    Resize(Length(FItems) * 2);

  HashCode := FComparer.GetHashCode(AItem) and $7FFFFFFF;
  Mask := Length(FItems) - 1;
  Index := HashCode and Mask;

  while True do
  begin
    HC := FItems[Index].HashCode;
    if (HC = EMPTY_HASH) then
      Break;

    if (HC = HashCode) and FComparer.Equals(FItems[Index].Item, AItem) then
      raise EListError.CreateRes(@SGenericDuplicateItem);

    Index := (Index + 1) and Mask;
  end;

  FItems[Index].HashCode := HashCode;
  FItems[Index].Item := AItem;
  Inc(FCount);
end;

Explanation

  1. If the hash table is 75% full, it resizes.

  2. The hash code is bitwise anded with $7FFFFFFF to ensure it is non-negative.

    • This prevents conflicts with -1, which indicates an empty hash table entry.

  3. The while loop performs linear probing to find empty entries.

    • If a matching hash code exists, it checks equality with IEqualityComparer<T>.Equals.

    • If they match, an exception is raised since duplicates are not allowed.

This is the basic implementation of a hash table in a collection. Additional code is required for resizing the set and removing items.

For more details, refer to the source code on Grijjy’s GitHub repository.


Original post: “Expand your Collections collection – Part 1: a generic Set” by Erik van Bilsen on Grijjy's website, January 5, 2017.
Translated by Nils Eilers.