good-taste

In a TED talk in 2016 Linus Torvalds was asked to explain his notion of “taste”. As an illustration he presented code as is typically taught in universities:

void remove_list_entry(node **head, node *entry)
{
	node *prev, *walk;

	prev = NULL;
	walk = *head;

	// Walk the list

	while (walk != entry) {
		prev = walk;
		walk = walk->next;
	}

	// Remove the entry by updating the
	// head or the previous entry

	if (!prev)
		*head = entry->next;
	else
		prev->next = entry->next;
}

In contrast, he provided an alternative that he considers superior simply by using an indirect pointer:

void remove_list_entry(node **head, node *entry)
{
	node **indirect;

	// The "indirect" pointer points to the
	// *address* of the thing we'll update

	indirect = head;

	// Walk the list, looking for the thing that
	// points to the entry we want to remove

	while ((*indirect) != entry)
		indirect = &(*indirect)->next;

	// ... and just remove it
	*indirect = entry->next;
}

He argues that the important part isn’t the details of the code (although details are also important), but that by looking at the code in a different way it can be greatly improved.

Taste isn’t something you can learn on a book: you absorb it by working with people who have it through the course of many years, or even decades.

If you follow this illustration you are not going to learn good taste, but you’ll have a better idea of what Linus Torvalds means by it.

It’s divided in three parts with increasing levels of complication. Part 1 is applicable to beginners, part 2 is relevant to people in the software industry, and part 3 is state-of-the-art level.

Let’s get started with part 1.

Linked list

Anyone who has studied computer science in university at some point was asked to create a linked list, which is one of the most basic data structures.

The core of a linked linked list is a node, which is a structure consisting of two members: a value and a pointer to the next node.

struct node {
	int value;
	struct node *next;
};

A linked list is nothing more than a sequence of nodes:

head points to the first node, and the last node’s next pointer doesn’t point to anything (NULL).

So to get the second node’s value (1), we could do this:

head->next->value;

Traditional code

To remove an element we need to make the previous element point to the next element – effectively skipping the element we want to remove. Since we don’t have a pointer to the previous element, we need to traverse the list until we find the element we want to remove, and the previous element will be the last element traversed.

So, a naive implementation would be like:

walk = *head;

// Walk the list

while (walk != entry) {
	prev = walk;
	walk = walk->next;
}

// Remove the entry by updating the previous entry

prev->next = entry->next;

The moment walk is equal to entry we have found the node we want to remove, therefore prev is the previous node.

All we have to do is update the previous node’s next ponter to the current node’s next pointer (0 points to 2).

The problem is that when entry is the very first node, there is no prev, so this program would either produce undefined behavior or crash.

To fix this all we have to do is add a check to deal with the corner case:

if (!prev)
	*head = entry->next;
else
	prev->next = entry->next;

Now the code works properly (if you initialize prev to NULL), and that’s where most programmers would call it a day. “It compiles, ship it!”

But good programmers wouldn’t stop there. It’s not enough that the code compiles and runs correctly, there’s many other considerations: efficiency, readability, maintainability, style, etc.

Let’s consider more.

Refactoring code

The first step in refactoring code is ensuring the code keeps working in the same way as before, so it behooves us to write tests.

Let’s write a very simple unit test:

int do_test(void)
{
	node *head;

	// Define the nodes
	node a = { .value = 0 };
	node b = { .value = 1 };
	node c = { .value = 2 };

	// Link the nodes
	head = &a;
	a.next = &b;
	b.next = &c;

	// Remove the second node
	remove_list_entry(&head, &b);
	if (a.next != &c) return false;

	// Remove the first node
	remove_list_entry(&head, &a);
	if (head != &c) return false;

	return true;
}

In this test we check both cases: removing an element in the middle of the list, and removing the first element.

We check that the code runs correctly:

printf("result: %s\n", do_test() ? "OK" : "FAIL");
=> result: OK

… and it does.

Now we are ready to improve the code.

Improving code

Walking the list isn’t something we can get rid of, but perhaps there’s something we can do about the last check:

if (!prev)
	*head = entry->next;
else
	prev->next = entry->next;

In both cases the right-hand side of the assignment is the same: entry->next, so an easy refactoring could be storing the left-hand side which is variable, in a variable:

node **tmp;
if (!prev)
	tmp = head;
else
	tmp = &prev->next;
*tmp = entry->next;

This code does exactly the same as the original, because in the first branch *tmp is the same as *head, and in the second branch *tmp is prev->next.

If we run the test we can verify that’s indeed the case.

Now, we know the only time prev is going to be NULL is when entry is the first node – and therefore the while loop is never run, so we can initialize tmp at the same time we initialize prev (at the start):

prev = NULL;
tmp = head;
walk = *head;

Now tmp only needs to be updated when prev has not been updated (not NULL):

if (prev) tmp = &prev->next;

But if we are going to update tmp only when prev is updated, perhaps we can update both at the same time:

while (walk != entry) {
	prev = walk;
	tmp = &prev->next;
	walk = walk->next;
}

At this point the resulting code is:

node *prev, *walk, **tmp;

prev = NULL;
tmp = head;
walk = *head;

while (walk != entry) {
	prev = walk;
	tmp = &prev->next;
	walk = walk->next;
}

*tmp = entry->next;

Turns out that prev isn’t used at all any more, so let’s get rid of it:

node *walk, **tmp;

tmp = head;
walk = *head;

while (walk != entry) {
	tmp = &walk->next;
	walk = walk->next;
}

*tmp = entry->next;

The test still passes, so we are are doing OK.

Next, it’s very obvious that tmp and walk are very similar, in fact tmp is simply the address of walk, and walk is only used to check if we’ve reached the target entry. So we can use *tmp instead of walk:

node **tmp;

tmp = head;

while ((*tmp) != entry) {
	tmp = &(*tmp)->next;
}

*tmp = entry->next;

We’ve reached the exact same code Linus Torvalds considered to be “good taste”, all we have to do is change the name of tmp which is an indirect pointer to the pointer we want to update, and clean it up a bit:

node **p = head;
while (*p != entry)
	p = &(*p)->next;
*p = entry->next;

This already drives the point home, but that’s not all we can do, we can use a for loop to compact the code:

node **p;
for (p = head; *p != entry; p = &(*p)->next);
*p = entry->next;

All we had to do is consider a pointer to a pointer, and now the code is only two lines and there are no corner cases.

Conclusion

As we’ve seen from this illustration, to a developer with good taste ™ it’s not enough for the code to compile and run correctly, by simply employing a little bit of thought the code can be improved to the point where the best developers in the world could consider “good”.

However, as Linus Torvalds pointed out: this is nothing. This is a very trivial illustration of the kind of pushback you would receive from good taste developers to perfectly correct code. Real examples look significantly different from this one.

For starters, in the real world not all linked lists contain a single int field, so what would a real linked list look like? For the answer check part 2.