Zenhack.net

Red-Black Trees

24 Jul 2016

Damien has been brushing up on his data-structure-implementation skills, and in the course of discussing it with him, I’ve gotten a bit nostalgic for working on that kind of thing. I’ve mostly been working in high-level languages on things that aren’t algorithmically tricky, and I feel like I’m getting soft. So today I decided to implement a red-black tree in C. It’s not done yet, but in thinking about how to write it, I came up with a trick that I thought was kinda neat, and while I’m sure I’m not the first person to think of it, I haven’t seen it written up or heard it discussed before, so this week’s post is a write-up of the idea.

The Wikipedia article on red-black trees gives a description of the algorithms, and much of an implementation in C. The algorithm for re-balancing the tree involves checking various conditions about parent, grandparent, and “uncle” nodes. You somehow have to propagate the needed information into the routine that does the re-balancing, as it isn’t available in a standard binary tree.

Wikipedia’s approach is to just add a parent pointer to the node. This is a reasonable approach, but it increases memory usage. What are other approaches? One thing you could try to do is pass the necessary information into each call, but I’ve found that to be finicky in the past.

Here’s my idea: store the parent pointers in a linked-list on the stack, while you’re doing your insert/remove/whatever. Then, you have equivalent information available when you get to the re-balancing, but you don’t have to store the pointers except when you actually need them; when you return from e.g. insert, they go away. There’s no real overhead in terms of allocation either, since the list is on the stack.

Pseudocode for a few bits of the implementation:


struct Lineage {
    Lineage *parent;
    Node *node;
    Direction child_dir;
};

void insert(Node *node, Key key, Value val, Lineage *parent) {

    if(node->is_leaf) {
        node->is_leaf = false;
        node->color = RED;
        node->key = key;
        node->val = val;
        node->left = leaf();
        node->right = leaf();
        rebalance(node, parent);
        return;
    }

    Lineage child_lineage = (Lineage) {
        .parent = parent,
        .node = node,
    };

    if(key == node->key) {
        node->val = val;
    } else if(key < node->key) {
        insert(node->left, key, val, &child_lineage);
    } else { // key > node->key
        insert(node->right, key, val, &child_lineage);
    }
}

void rebalance(Node *node, Lineage *parent) {
    // have a look at the wikipedia article for the full explanation of
    // this.
    if(parent == NULL) {
        // Case 1: we're the root:
        node->color = BLACK;
    } else if(parent->node->color == BLACK) {
        // Case 2: parent is black, so we're fine.
    } else if(uncle(parent)->color == RED) {
        // Case 3: both parent and uncle are red; repaint them, paint the
        // grandparent red, and recurse.

        uncle(parent)->color = BLACK;
        parent->node->color = BLACK;

        parent->parent->node->color = RED;
        rebalance(parent->parent->node, parent->parent->parent);
    }
    // ... remaining cases; the above should be enough to show how the lineage
    // list is used.
}

Node *uncle(Lineage *parent) {
    if(parent->child_dir == LEFT) {
        return parent->node->right;
    } else {
        return parent->node->left;
    }
}