In part 1 we saw a simple implementation of a linked list, but it only works with a particular structure:
struct node {
int value;
struct node *next;
};
What happens when the value is not an int
, but a float
? Or we want a more complex structure:
struct node {
int id;
const char *name;
struct node *next;
};
Clearly we need another approach.
GLib
Let’s take a look at a real implementation from a real library: GLib (not to be confused with glibc). GLib is the fundamental library every GNOME and GTK+ project uses, and fortunately for our purses it’s implemented in C.
typedef void* gpointer;
typedef struct _GSList GSList;
struct _GSList
{
gpointer data;
GSList *next;
};
GSList*
g_slist_remove_link (GSList *list,
GSList *link)
{
GSList *tmp = NULL;
GSList **previous_ptr = &list;
while (*previous_ptr)
{
tmp = *previous_ptr;
if (tmp == link)
{
*previous_ptr = tmp->next;
tmp->next = NULL;
break;
}
previous_ptr = &tmp->next;
}
return list;
}
I personally can’t stand this style, so let’s clean it up:
typedef struct GSList {
void *data;
struct GSList *next;
} GSList;
GSList *g_slist_remove_link(GSList *list, GSList *link)
{
GSList *tmp = NULL;
GSList **previous_ptr = &list;
while (*previous_ptr) {
tmp = *previous_ptr;
if (tmp == link) {
*previous_ptr = tmp->next;
tmp->next = NULL;
break;
}
previous_ptr = &tmp->next;
}
return list;
}
This looks very similar to our good taste version in part 1, except it’s using a tmp
pointer.
Let’s use p
instead of previous_ptr
and replace tmp
with it when possible:
GSList *tmp = NULL;
GSList **p = &list;
while (*p) {
tmp = *p;
if (*p == link) {
*p = (*p)->next;
tmp->next = NULL;
break;
}
p = &(*p)->next;
}
return list;
Now it is clear that tmp
is only really used to clear the next
pointer of the node we want to
remove. This is something that should be done by the user of this function if that’s what is
desired – which more likely it’s not, because the node will be disposed of – so let’s remove that
unnecessary code.
GSList **p = &list;
while (*p) {
if (*p == link) {
*p = (*p)->next;
break;
}
p = &(*p)->next;
}
return list;
We are getting closer to our version.
We can move the break comparison *p == link
into the while
condition, and update *p
after the
loop:
GSList **p = &list;
while (*p && *p != link)
p = &(*p)->next;
*p = (*p)->next;
return list;
At this point we’ve made too many changes to be comfortable without running any tests.
Tests
Let’s try to port our unit tests from part 1 to GLib’s single linked list.
First, let’s look at the structures:
struct node {
int value;
struct node *next;
};
struct GSList {
void *data;
struct GSList *next;
};
As we can see, the only real difference is that in our version the data (int value
) was directly
available, in GLib’s version data
is a pointer to our data.
So, instead of initializing our node’s data directly, we have to do do it indirectly:
// Before
node a = { .value = 0 };
// After
node_data a = { .value = 0 };
GSList link_a = { .data = &a };
And of course: we have to define our node_data
structure:
struct node_data {
int value;
};
So, our test looks like this:
int do_test(void) {
// Define the nodes
node_data a = { .value = 0 };
node_data b = { .value = 1 };
node_data c = { .value = 2 };
GSList link_a = { .data = &a };
GSList link_b = { .data = &b };
GSList link_c = { .data = &c };
// Link the nodes
GSList *list = &link_a;
link_a.next = &link_b;
link_b.next = &link_c;
// Remove the second node
list = g_slist_remove_link(list, &link_b);
if (link_a.next != &link_c) return false;
// Remove the first node
list = g_slist_remove_link(list, &link_a);
if (list != &link_c) return false;
return true;
}
This test passes both with GLib’s original version, and our cleaned up version. So we are good.
… not so fast.
Unexpected behavior
One difference from part 1’s code and this new version is the while
condition:
// Before
while (*p != entry)
// Now
while (*p && *p != link)
Have you spotted the problem?
What happens when *p
is NULL
?
It turns out Linus’ version has a bug – or at least a missing feature. If we search for an entry that
doesn’t exist, eventually we reach the end of the list, *p
is NULL
, we try to dereference
NULL->next
, and we crash.
Now, it’s debatable whether or not trying to remove a non-existing element is a valid use case, but surely it can happen. In those cases would probably would want to return an error (can’t do it in this API), show a warning (we would need a lot more infraestructure), or simply ignore the problem.
Either way crashing is never good, so we want to avoid that.
Our current tests aren’t enough though, we need to search for an element that isn’t in the list, for example an element that we just removed:
// Remove the second node
head = g_slist_remove_link(head, &link_b);
if (link_a.next != &link_c) return false;
// Remove the first node
head = g_slist_remove_link(head, &link_a);
if (head != &link_c) return false;
// Remove the same node again
head = g_slist_remove_link(head, &link_a);
if (head != &link_c) return false;
Does the code works now? Nope, because we made a mistake refactoring:
// This isn't the same
while (*p) {
if (*p == link) {
*p = (*p)->next;
break;
}
p = &(*p)->next;
}
// As this:
while (*p && *p != link)
p = &(*p)->next;
*p = (*p)->next;
The difference is that if *p
is NULL
we don’t want to execute anything else, so we have to check
that *p
isn’t NULL
at the end too:
while (*p && *p != link)
p = &(*p)->next;
if (*p) *p = (*p)->next;
Now the code is equivalent, and all the tests pass.
But if we are going to do two checks (one in the condition statement and one outside), maybe it’s better to move the code back inside as it originally was:
GSList **p = &list;
while (*p) {
if (*p == link) {
*p = (*p)->next;
break;
}
p = &(*p)->next;
}
return list;
Well, except that we can use a for
instead, and flip the if
condition:
GSList **p;
for (p = &list; *p; p = &(*p)->next) {
if (*p != link) continue;
*p = (*p)->next;
break;
}
return list;
And of course all the tests still pass.
Conclusion
Now that we have a proper implementation that is extensible, we can use any structure we want, not
just a simple int
.
It’s as extensible as the implementation of a mainstream library (GLib), but it’s simpler, has good taste, and it works better than Linus Torvalds’s naive implentation.
Is that all we can do? Turns out there’s more, see part 3.