good-taste

In part 1 we saw a simple implementation of a linked list, but it only works with a particular structure:

struct node {
	int value;
	struct node *next;
};

What happens when the value is not an int, but a float? Or we want a more complex structure:

struct node {
	int id;
	const char *name;
	struct node *next;
};

Clearly we need another approach.

GLib

Let’s take a look at a real implementation from a real library: GLib (not to be confused with glibc). GLib is the fundamental library every GNOME and GTK+ project uses, and fortunately for our purses it’s implemented in C.

typedef void* gpointer;
typedef struct _GSList GSList;

struct _GSList
{
  gpointer data;
  GSList *next;
};

GSList*
g_slist_remove_link (GSList *list,
                     GSList *link)
{
  GSList *tmp = NULL;
  GSList **previous_ptr = &list;

  while (*previous_ptr)
    {
      tmp = *previous_ptr;
      if (tmp == link)
        {
          *previous_ptr = tmp->next;
          tmp->next = NULL;
          break;
        }

      previous_ptr = &tmp->next;
    }

  return list;
}

I personally can’t stand this style, so let’s clean it up:

typedef struct GSList {
	void *data;
	struct GSList *next;
} GSList;

GSList *g_slist_remove_link(GSList *list, GSList *link)
{
	GSList *tmp = NULL;
	GSList **previous_ptr = &list;

	while (*previous_ptr) {
		tmp = *previous_ptr;
		if (tmp == link) {
			*previous_ptr = tmp->next;
			tmp->next = NULL;
			break;
		}

		previous_ptr = &tmp->next;
	}

	return list;
}

This looks very similar to our good taste version in part 1, except it’s using a tmp pointer.

Let’s use p instead of previous_ptr and replace tmp with it when possible:

GSList *tmp = NULL;
GSList **p = &list;

while (*p) {
	tmp = *p;
	if (*p == link) {
		*p = (*p)->next;
		tmp->next = NULL;
		break;
	}

	p = &(*p)->next;
}

return list;

Now it is clear that tmp is only really used to clear the next pointer of the node we want to remove. This is something that should be done by the user of this function if that’s what is desired – which more likely it’s not, because the node will be disposed of – so let’s remove that unnecessary code.

GSList **p = &list;

while (*p) {
	if (*p == link) {
		*p = (*p)->next;
		break;
	}

	p = &(*p)->next;
}

return list;

We are getting closer to our version.

We can move the break comparison *p == link into the while condition, and update *p after the loop:

GSList **p = &list;

while (*p && *p != link)
	p = &(*p)->next;

*p = (*p)->next;

return list;

At this point we’ve made too many changes to be comfortable without running any tests.

Tests

Let’s try to port our unit tests from part 1 to GLib’s single linked list.

First, let’s look at the structures:

struct node {
	int value;
	struct node *next;
};

struct GSList {
	void *data;
	struct GSList *next;
};

As we can see, the only real difference is that in our version the data (int value) was directly available, in GLib’s version data is a pointer to our data.

So, instead of initializing our node’s data directly, we have to do do it indirectly:

// Before
node a = { .value = 0 };

// After
node_data a = { .value = 0 };
GSList link_a = { .data = &a };

And of course: we have to define our node_data structure:

struct node_data {
	int value;
};

So, our test looks like this:

int do_test(void) {
	// Define the nodes

	node_data a = { .value = 0 };
	node_data b = { .value = 1 };
	node_data c = { .value = 2 };

	GSList link_a = { .data = &a };
	GSList link_b = { .data = &b };
	GSList link_c = { .data = &c };

	// Link the nodes
	GSList *list = &link_a;
	link_a.next = &link_b;
	link_b.next = &link_c;

	// Remove the second node
	list = g_slist_remove_link(list, &link_b);
	if (link_a.next != &link_c) return false;

	// Remove the first node
	list = g_slist_remove_link(list, &link_a);
	if (list != &link_c) return false;

	return true;
}

This test passes both with GLib’s original version, and our cleaned up version. So we are good.

… not so fast.

Unexpected behavior

One difference from part 1’s code and this new version is the while condition:

// Before
while (*p != entry)

// Now
while (*p && *p != link)

Have you spotted the problem?

What happens when *p is NULL?

It turns out Linus’ version has a bug – or at least a missing feature. If we search for an entry that doesn’t exist, eventually we reach the end of the list, *p is NULL, we try to dereference NULL->next, and we crash.

Now, it’s debatable whether or not trying to remove a non-existing element is a valid use case, but surely it can happen. In those cases would probably would want to return an error (can’t do it in this API), show a warning (we would need a lot more infraestructure), or simply ignore the problem.

Either way crashing is never good, so we want to avoid that.

Our current tests aren’t enough though, we need to search for an element that isn’t in the list, for example an element that we just removed:

// Remove the second node
head = g_slist_remove_link(head, &link_b);
if (link_a.next != &link_c) return false;

// Remove the first node
head = g_slist_remove_link(head, &link_a);
if (head != &link_c) return false;

// Remove the same node again
head = g_slist_remove_link(head, &link_a);
if (head != &link_c) return false;

Does the code works now? Nope, because we made a mistake refactoring:

// This isn't the same

while (*p) {
	if (*p == link) {
		*p = (*p)->next;
		break;
	}

	p = &(*p)->next;
}

// As this:

while (*p && *p != link)
	p = &(*p)->next;

*p = (*p)->next;

The difference is that if *p is NULL we don’t want to execute anything else, so we have to check that *p isn’t NULL at the end too:

while (*p && *p != link)
	p = &(*p)->next;

if (*p) *p = (*p)->next;

Now the code is equivalent, and all the tests pass.

But if we are going to do two checks (one in the condition statement and one outside), maybe it’s better to move the code back inside as it originally was:

GSList **p = &list;

while (*p) {
	if (*p == link) {
		*p = (*p)->next;
		break;
	}

	p = &(*p)->next;
}

return list;

Well, except that we can use a for instead, and flip the if condition:

GSList **p;

for (p = &list; *p; p = &(*p)->next) {
	if (*p != link) continue;
	*p = (*p)->next;
	break;
}

return list;

And of course all the tests still pass.

Conclusion

Now that we have a proper implementation that is extensible, we can use any structure we want, not just a simple int.

It’s as extensible as the implementation of a mainstream library (GLib), but it’s simpler, has good taste, and it works better than Linus Torvalds’s naive implentation.

Is that all we can do? Turns out there’s more, see part 3.