In a TED talk in 2016 Linus Torvalds was asked to explain his notion of “taste”. As an illustration he presented code as is typically taught in universities:
void remove_list_entry(node **head, node *entry)
{
node *prev, *walk;
prev = NULL;
walk = *head;
// Walk the list
while (walk != entry) {
prev = walk;
walk = walk->next;
}
// Remove the entry by updating the
// head or the previous entry
if (!prev)
*head = entry->next;
else
prev->next = entry->next;
}
In contrast, he provided an alternative that he considers superior simply by using an indirect pointer:
void remove_list_entry(node **head, node *entry)
{
node **indirect;
// The "indirect" pointer points to the
// *address* of the thing we'll update
indirect = head;
// Walk the list, looking for the thing that
// points to the entry we want to remove
while ((*indirect) != entry)
indirect = &(*indirect)->next;
// ... and just remove it
*indirect = entry->next;
}
He argues that the important part isn’t the details of the code (although details are also important), but that by looking at the code in a different way it can be greatly improved.
Taste isn’t something you can learn on a book: you absorb it by working with people who have it through the course of many years, or even decades.
If you follow this illustration you are not going to learn good taste, but you’ll have a better idea of what Linus Torvalds means by it.
It’s divided in three parts with increasing levels of complication. Part 1 is applicable to beginners, part 2 is relevant to people in the software industry, and part 3 is state-of-the-art level.
Let’s get started with part 1.
Linked list
Anyone who has studied computer science in university at some point was asked to create a linked list, which is one of the most basic data structures.
The core of a linked linked list is a node
, which is a structure consisting of two members: a
value and a pointer to the next node.
struct node {
int value;
struct node *next;
};
A linked list is nothing more than a sequence of nodes:
head
points to the first node, and the last node’s next
pointer doesn’t point to anything
(NULL
).
So to get the second node’s value (1
), we could do this:
head->next->value;
Traditional code
To remove an element we need to make the previous element point to the next element – effectively skipping the element we want to remove. Since we don’t have a pointer to the previous element, we need to traverse the list until we find the element we want to remove, and the previous element will be the last element traversed.
So, a naive implementation would be like:
walk = *head;
// Walk the list
while (walk != entry) {
prev = walk;
walk = walk->next;
}
// Remove the entry by updating the previous entry
prev->next = entry->next;
The moment walk
is equal to entry
we have found the node we want to remove, therefore prev
is
the previous node.
All we have to do is update the previous node’s next
ponter to the current node’s next
pointer
(0
points to 2
).
The problem is that when entry
is the very first node, there is no prev
, so this program would
either produce undefined behavior or crash.
To fix this all we have to do is add a check to deal with the corner case:
if (!prev)
*head = entry->next;
else
prev->next = entry->next;
Now the code works properly (if you initialize prev
to NULL
), and that’s where most programmers
would call it a day. “It compiles, ship it!”
But good programmers wouldn’t stop there. It’s not enough that the code compiles and runs correctly, there’s many other considerations: efficiency, readability, maintainability, style, etc.
Let’s consider more.
Refactoring code
The first step in refactoring code is ensuring the code keeps working in the same way as before, so it behooves us to write tests.
Let’s write a very simple unit test:
int do_test(void)
{
node *head;
// Define the nodes
node a = { .value = 0 };
node b = { .value = 1 };
node c = { .value = 2 };
// Link the nodes
head = &a;
a.next = &b;
b.next = &c;
// Remove the second node
remove_list_entry(&head, &b);
if (a.next != &c) return false;
// Remove the first node
remove_list_entry(&head, &a);
if (head != &c) return false;
return true;
}
In this test we check both cases: removing an element in the middle of the list, and removing the first element.
We check that the code runs correctly:
printf("result: %s\n", do_test() ? "OK" : "FAIL");
=> result: OK
… and it does.
Now we are ready to improve the code.
Improving code
Walking the list isn’t something we can get rid of, but perhaps there’s something we can do about the last check:
if (!prev)
*head = entry->next;
else
prev->next = entry->next;
In both cases the right-hand side of the assignment is the same: entry->next
, so an easy
refactoring could be storing the left-hand side which is variable, in a variable:
node **tmp;
if (!prev)
tmp = head;
else
tmp = &prev->next;
*tmp = entry->next;
This code does exactly the same as the original, because in the first branch *tmp
is the same as
*head
, and in the second branch *tmp
is prev->next
.
If we run the test we can verify that’s indeed the case.
Now, we know the only time prev
is going to be NULL
is when entry
is the first node – and
therefore the while
loop is never run, so we can initialize tmp
at the same time we initialize
prev
(at the start):
prev = NULL;
tmp = head;
walk = *head;
Now tmp
only needs to be updated when prev
has not been updated (not NULL
):
if (prev) tmp = &prev->next;
But if we are going to update tmp
only when prev
is updated, perhaps we can update both at the
same time:
while (walk != entry) {
prev = walk;
tmp = &prev->next;
walk = walk->next;
}
At this point the resulting code is:
node *prev, *walk, **tmp;
prev = NULL;
tmp = head;
walk = *head;
while (walk != entry) {
prev = walk;
tmp = &prev->next;
walk = walk->next;
}
*tmp = entry->next;
Turns out that prev
isn’t used at all any more, so let’s get rid of it:
node *walk, **tmp;
tmp = head;
walk = *head;
while (walk != entry) {
tmp = &walk->next;
walk = walk->next;
}
*tmp = entry->next;
The test still passes, so we are are doing OK.
Next, it’s very obvious that tmp
and walk
are very similar, in fact tmp
is simply the address
of walk
, and walk
is only used to check if we’ve reached the target entry
. So we can use
*tmp
instead of walk
:
node **tmp;
tmp = head;
while ((*tmp) != entry) {
tmp = &(*tmp)->next;
}
*tmp = entry->next;
We’ve reached the exact same code Linus Torvalds considered to be “good taste”, all we have to do is
change the name of tmp
which is an indirect pointer to the pointer we want to
update, and clean it up a bit:
node **p = head;
while (*p != entry)
p = &(*p)->next;
*p = entry->next;
This already drives the point home, but that’s not all we can do, we can use a for
loop to compact
the code:
node **p;
for (p = head; *p != entry; p = &(*p)->next);
*p = entry->next;
All we had to do is consider a pointer to a pointer, and now the code is only two lines and there are no corner cases.
Conclusion
As we’ve seen from this illustration, to a developer with good taste ™ it’s not enough for the code to compile and run correctly, by simply employing a little bit of thought the code can be improved to the point where the best developers in the world could consider “good”.
However, as Linus Torvalds pointed out: this is nothing. This is a very trivial illustration of the kind of pushback you would receive from good taste developers to perfectly correct code. Real examples look significantly different from this one.
For starters, in the real world not all linked lists contain a single int
field, so what would a
real linked list look like? For the answer check part 2.