Data Structures
Closed-addressing Containers
Boost.Unordered sports one of the fastest implementations of closed addressing, also commonly known as separate chaining. An example figure representing the data structure is below:

An array of "buckets" is allocated and each bucket in turn points to its own individual linked list. This makes meeting the standard requirements of bucket iteration straightforward. Unfortunately, iteration of the entire container is often slow with this layout, as each bucket must be examined for occupancy, yielding a time complexity of O(bucket_count() + size()) when the standard requires complexity to be O(size()).
Canonical standard implementations will wind up looking like the diagram below:
It’s worth noting that this approach is only used by libc++ and libstdc++; the MSVC Dinkumware implementation uses a different one. A more detailed analysis of the standard containers can be found here.
This unusually laid out data structure is chosen to make iteration of the entire container efficient by inter-connecting all of the nodes into a singly-linked list. One might also notice that buckets point to the node before the start of the bucket’s elements. This is done so that removing elements from the list can be done efficiently without introducing the need for a doubly-linked list. Unfortunately, this data structure introduces a guaranteed extra indirection. For example, to access the first element of a bucket, something like this must be done:
auto const idx = get_bucket_idx(hash_function(key));
node* p = buckets[idx];            // first load
node* n = p->next;                 // second load
if (n && is_in_bucket(n, idx)) {
  value_type const& v = n->value;  // third load
  // ...
}
With a simple bucket group layout, this is all that must be done:
auto const idx = get_bucket_idx(hash_function(key));
node* n = buckets[idx];            // first load
if (n) {
  value_type const& v = n->value;  // second load
  // ...
}
In practice, the extra indirection can have a dramatic performance impact on common operations such as insert, find and erase. But to keep iteration of the container fast, Boost.Unordered introduces a novel data structure, a "bucket group". A bucket group is a fixed-width view of a subsection of the buckets array. It contains a bitmask (a std::size_t) which it uses to track occupancy of buckets, and contains two pointers so that it can form a doubly-linked list with non-empty groups. An example diagram is below:

Thus container-wide iteration is turned into traversing the non-empty bucket groups (an operation with constant time complexity), which reduces the time complexity back to O(size()). In total, a bucket group is only 4 words in size and it views sizeof(std::size_t) * CHAR_BIT buckets, meaning that for all common implementations there are only 4 bits of space overhead per bucket introduced by the bucket groups.
A more detailed description of Boost.Unordered’s closed-addressing implementation is given in an external article. For more information on implementation rationale, read the corresponding section.
Open-addressing Containers
The diagram shows the basic internal layout of boost::unordered_flat_set/boost::unordered_node_set and boost::unordered_flat_map/boost::unordered_node_map.

As with all open-addressing containers, elements (or pointers to the element nodes in the case of boost::unordered_node_set and boost::unordered_node_map) are stored directly in the bucket array.
This array is logically divided into 2^n groups of 15 elements each. In addition to the bucket array, there is an associated metadata array with 2^n 16-byte words.

A metadata word is divided into 15 hi bytes (one for each associated bucket), and an overflow byte (ofw in the diagram). The value of hi is:
- 0 if the corresponding bucket is empty.
- 1 to encode a special empty bucket called a sentinel, which is used internally to stop iteration when the container has been fully traversed.
- If the bucket is occupied, a reduced hash value obtained from the hash value of the element.
When looking for an element with hash value h, SIMD technologies such as SSE2 and Neon allow us to very quickly inspect the full metadata word and look for the reduced value of h among all the 15 buckets with just a handful of CPU instructions: non-matching buckets can be readily discarded, and those whose reduced hash value matches need be inspected via full comparison with the corresponding element. If the looked-for element is not present, the overflow byte is inspected:
- If the bit in position h mod 8 is zero, lookup terminates (and the element is not present).
- If the bit is set to 1 (the group has been overflowed), further groups are checked using quadratic probing, and the process is repeated.
Insertion is algorithmically similar: empty buckets are located using SIMD, and when going past a full group its corresponding overflow bit is set to 1.
In architectures without SIMD support, the logical layout stays the same, but the metadata word is codified using a technique we call bit interleaving: this layout allows us to emulate SIMD with reasonably good performance using only standard arithmetic and logical operations.

A more detailed description of Boost.Unordered’s open-addressing implementation is given in an external article. For more information on implementation rationale, read the corresponding section.
Concurrent Containers
boost::concurrent_flat_set/boost::concurrent_node_set and boost::concurrent_flat_map/boost::concurrent_node_map use the basic open-addressing layout described above, augmented with synchronization mechanisms.

Two levels of synchronization are used:
- Container level: A read-write mutex is used to control access from any operation to the container. Typically, such access is in read mode (that is, concurrent) even for modifying operations, so for most practical purposes there is no thread contention at this level. Access is only in write mode (blocking) when rehashing or performing container-wide operations such as swapping or assignment.
- Group level: Each 15-slot group is equipped with an 8-byte word containing:
  - A read-write spinlock for synchronized access to any element in the group.
  - An atomic insertion counter used for optimistic insertion as described below.

By using atomic operations to access the group metadata, lookup is (group-level) lock-free up to the point where an actual comparison needs to be done with an element that has been previously SIMD-matched: only then is the group’s spinlock used.
Insertion uses the following optimistic algorithm:
- The value of the insertion counter for the initial group in the probe sequence is locally recorded (let’s call this value c0).
- Lookup is as described above. If lookup finds no equivalent element, the search for an available slot for insertion successively locks/unlocks each group in the probing sequence.
- When an available slot is located, it is preemptively occupied (its reduced hash value is set) and the insertion counter is atomically incremented: if no other thread has incremented the counter during the whole operation (which is checked by comparing with c0), then we’re good to go and complete the insertion; otherwise we roll back and start over.
This algorithm has very low contention both at the lookup and actual insertion phases, in exchange for the possibility that computations have to be started over if some other thread interferes in the process by performing a successful insertion beginning at the same group. In practice, the start-over frequency is extremely small, measured in the range of parts per million for some of our benchmarks.
For more information on implementation rationale, read the corresponding section.