Why you should consider intrusive lists and trees

Intrusive data structures are forms of generic collections where the internal parts of the collection are directly stored in the data structures they are holding. The most common types of intrusive data structures are intrusive lists and intrusive trees.

Intrusive lists

Consider a conventional linked list:

struct ListNode {
    ListNode *prev;
    ListNode *next;
    MyData *d;
};

For each MyData* added to this list, a ListNode must be allocated to point to it and to point to the rest of the list. This seems fairly inefficient, but it is just accepted as "the way things are" for std::list.

What if the list nodes were part of the definition of MyData? Then we wouldn't need a separate allocation. The trade-off is that we are only able to place the object into one list this way, which is almost always fine.

struct MyData {
    MyData *prev;
    MyData *next;
    int value;
};

In fact, we can even make this more generic using CRTP:

template <typename Derived>
struct ListBase {
    Derived *prev;
    Derived *next;
}

struct MyData : public ListBase<MyData> {
    int value;
};

Intrusive red-black trees

This is mostly the same as the list example, except we have a bit more data per node:

struct TreeNode {
    TreeNode *parent;
    TreeNode *left;
    TreeNode *right;
    bool is_red;
    MyData *d;
};

As before, we can use CRTP to streamline this into the definition of MyData, with the same trade-off that we are only able to add the object into one tree.

template <typename Derived>
struct TreeBase {
    Derived *parent;
    Derived *left;
    Derived *right;
    bool is_red;
};

struct MyData : public TreeBase<MyData> {
    int value;
}

While this may seem only abstractly useful so far, there are several specific examples where it really shines.

Reducing allocations

Intrusive data structures require no extra allocations. The item is simply added to or spliced from the list or tree. This is especially helpful for systems-level programming, where every allocation has to be accounted for, and dynamic allocation of list or tree nodes may not always even be possible. However, for lists, the intrusive version must be doubly-linked, which may waste some space.

Flexible array members

Structures with a flexible array member cannot be contained in owning C++ containers because they "clip" the struct to its size without the member. However, this is not a problem for intrusive data structures.

struct Flexible : public TreeBase<Flexible> {
    size_t data_size;
    int data[];
};

Bimaps

In bidirectional maps (not to be confused with bitmaps), a unique instance of an object can be looked up by more than one key. This form of database can be quite painful with an extrusive map like std::map that owns its data—you must pick one key field of the struct to be "canonical" and the rest of the maps point to that key.

Here we can use multiple instances of the tree node in the same data structure to be able to index multiple keys:

struct Document {
    TreeNode<Document> id_node;
    TreeNode<Document> owner_node;

    uint64_t id;
    User *owner;
    void *data;
    size_t size;
};

This would allow inserting the same instance of a Document into a tree that allows lookup by id, and into a tree that allows lookup by owner.

Thread synchronization

When multiple threads need to wait for a resource, they can place an object with a list node on their own stack and register it with some form of coordinating object. One possible example is a hypothetical implementation of std::stop_callback.

struct stop_callback {
    ListNode<stop_callback> stop_source_node;
};

Constructing the stop_callback against a stop_token would extract its stop source, lock the stop source, and add the stop_callback to the list. Conversely, destructing the stop_callback would lock the stop source and splice the stop_callback out of the list.