Internal Structure |
Microsoft-supplied sorted classes internally use a binary tree data structure. This structure has some negative space and performance implications:
Elements require individual allocations increasing space requirements.
Elements require individual allocations eliminating locality speed advantages.
Enumeration requires a tree traversal with a callback for each element.
Accessing the first and last elements require traversing from root to leaf.
The data structure used by all ranked classes is an order statistic B+ tree variant. A tree becomes an order statistic tree with the addition of two traits:
Select(i) — find the i'th smallest element in the tree (i.e. retrieve key by index).
Rank(x) – find the rank of item x in the tree (i.e. retrieve index by key). The names of the classes in the KaosCollections library are derived from this operation.
As a B+ tree, all elements are stored in leaf nodes at the same depth. The leaf level is a sorted doubly linked list with head and tail pointers. The first key of every leaf (except the leftmost) is copied to one branch for subdividing. A tree with no elements is represented as an empty leaf.
This structure differs from a common B+ tree in three ways:
While the root may contain as few as two children, other rightmost branches may contain as few as one child. This variation optimizes for time and space when bulk loading of presorted data and improves seek performance for data near the end of the collection - both common operations. All other branches maintain at least 50% capacity usage following every add and remove operation.
The rightmost leaf may contain as few as one item. Again, this variation optimizes the structure for bulk loading of presorted data. All other leaves maintain at least 50% capacity usage following every add and remove operation.
Every branch stores the number of elements (weight) in all of its descendent leaves. For example, the weight of the root is the total number of elements in every leaf.