In part 1 we saw a simple implementation of a linked list, but it only works with a particular structure:
struct node {
int value;
struct node *next;
};
What happens when the value is not an int
, but a float
? Or we want a more complex structure:
struct node {
int id;
const char *name;
struct node *next;
};
GLib
Let’s take a look at a real implementation from a real library: GLib (not to be confused with glibc). GLib is the fundamental library every GNOME and GTK+ project uses, and fortunately for our purses it’s implemented in C.
typedef void* gpointer;
typedef struct _GSList GSList;
struct _GSList
{
gpointer data;
GSList *next;
};
GSList*
g_slist_remove_link (GSList *list,
GSList *link)
{
GSList *tmp = NULL;
GSList **previous_ptr = &list;
while (*previous_ptr)
{
tmp = *previous_ptr;
if (tmp == link)
{
*previous_ptr = tmp->next;
tmp->next = NULL;
break;
}
previous_ptr = &tmp->next;
}
return list;
}
I personally can’t stand this style, so let’s clean it up:
typedef struct GSList {
void *data;
struct GSList *next;
} GSList;
GSList *g_slist_remove_link(GSList *list, GSList *link)
{
GSList *tmp = NULL;
GSList **previous_ptr = &list;
while (*previous_ptr) {
tmp = *previous_ptr;
if (tmp == link) {
*previous_ptr = tmp->next;
tmp->next = NULL;
break;
}
previous_ptr = &tmp->next;
}
return list;
}
This looks very similar to our good taste version, except it’s using a tmp
pointer.
Let’s use p
instead of previous_ptr
and replace tmp
with it when possible:
GSList *tmp = NULL;
GSList **p = &list;
while (*p) {
tmp = *p;
if (*p == link) {
*p = (*p)->next;
tmp->next = NULL;
break;
}
p = &(*p)->next;
}
return list;
Now it is clear that tmp
is only really used to clear the next
pointer of the node we want to
remove. This is something that should be done by the user of this function if
it’s wanted, which more likely is not, because the node will be disposed of, so
let’s remove that unnecessary code.
GSList **p = &list;
while (*p) {
if (*p == link) {
*p = (*p)->next;
break;
}
p = &(*p)->next;
}
return list;
We are getting closer to our version.
We can move the break comparison *p == link
into the while condition, and update *p
after the
loop:
GSList **p = &list;
while (*p && *p != link)
p = &(*p)->next;
*p = (*p)->next;
return list;
At this point we have made too many changes to be comfortable without running any tests.
Tests
Let’s try to port our unit tests from part 1 to GLib’s single linked list.
First, let’s look at the structures:
struct node {
int value;
struct node *next;
};
struct GSList {
void *data;
struct GSList *next;
};
As we can see, the only real difference is that in our version the data (int value
) was directly
available, in GLib’s version data
is a pointer to our data.
So, instead of initializing our node’s data directly, we have to do do it indirectly:
// Before
node a = { .value = 0 };
// After
node_data a = { .value = 0 };
GSList link_a = { .data = &a };
And of course, we have to define our node_data
structure:
struct node_data {
int value;
};
So, our test looks like this:
int do_test(void) {
node_data a = { .value = 0 };
node_data b = { .value = 1 };
node_data c = { .value = 2 };
GSList link_a = { .data = &a };
GSList link_b = { .data = &b };
GSList link_c = { .data = &c };
GSList *list = &link_a;
link_a.next = &link_b;
link_b.next = &link_c;
list = g_slist_remove_link(list, &link_b);
if (link_a.next != &link_c) return false;
list = g_slist_remove_link(list, &link_a);
if (list != &link_c) return false;
return true;
}
This test passes both with GLib’s original version, and our cleaned up version. So we are good.
Not so fast.
Unexpected behavior
One difference from part 1’s code and this new version is the while
condition:
// Before
while (*p != entry)
// Now
while (*p && *p != link)
Have you spotted the problem?
What happens when *p
is NULL
?
It turns out Linus’ version has a bug–or at least a missing feature. If we search for an entry that
doesn’t exist, eventually we reach the end of the list, *p
is NULL
, we try to dereference
NULL->next
, and we crash.
Now, it’s debatable whether or not trying to remove a non-existing element is a valid use case, but surely it does happen. In those cases would probably want to return an error (can’t do it in this API), show a warning (we would need a lot more infraestructure), or simply ignore the problem.
Either way crashing is never good, so we want to avoid that.
Our current tests aren’t enough though, we need to search for an element that isn’t in the list, for example an element that we just removed:
head = g_slist_remove_link(head, &link_b);
if (link_a.next != &link_c) return false;
head = g_slist_remove_link(head, &link_a);
if (head != &link_c) return false;
head = g_slist_remove_link(head, &link_a);
if (head != &link_c) return false;
Does the code works now? Nope, because we made a mistake refactoring:
// This isn't the same
while (*p) {
if (*p == link) {
*p = (*p)->next;
break;
}
p = &(*p)->next;
}
// As this:
while (*p && *p != link)
p = &(*p)->next;
*p = (*p)->next;
The difference is that if *p
is NULL
we don’t want to execute anything else, so we have to check
that *p
isn’t NULL
at the end too:
while (*p && *p != link)
p = &(*p)->next;
if (*p) *p = (*p)->next;
Now the code is equivalent, but is that the only difference?
We can add more tests to make sure that passing NULL
arguments works, and also removing all the
elements of the list:
head = g_slist_remove_link(head, &link_b);
if (link_a.next != &link_c) return false;
head = g_slist_remove_link(head, &link_a);
if (head != &link_c) return false;
head = g_slist_remove_link(head, &link_a);
if (head != &link_c) return false;
head = g_slist_remove_link(head, NULL);
if (head != &link_c) return false;
head = g_slist_remove_link(head, &link_c);
if (head != NULL) return false;
head = g_slist_remove_link(NULL, NULL);
head = g_slist_remove_link(NULL, &link_c);
All the tests pass, so our refactored code is correct now.
But if we are going to do two checks, one in the condition statement and one outside, maybe it’s better to move it back inside as it originally was:
GSList **p = &list;
while (*p) {
if (*p == link) {
*p = (*p)->next;
break;
}
p = &(*p)->next;
}
return list;
Well, except that we can use a for instead, and flip the if condition:
GSList **p;
for (p = &list; *p; p = &(*p)->next) {
if (*p != link) continue;
*p = (*p)->next;
break;
}
return list;
And of course all the tests pass.
Conclusion
Now that we have a proper implementation that is extensible, we can use any structure we want, not
just a simple int
.
It’s as extensible as the implementation of a mainstream library (GLib), but it’s simpler, has good taste, and it works better than Linus Torvalds’s naive implentation.
Is that all we can do? Turns out theres more, see part 3.