Linked List Variations

Previously, we studied the basic linked-based implementation of a list data type. We can further augment this basic implementation to make common operations more efficient, or to specialize the list’s behavior to domain-specific scenarios. In doing so, we begin making trade-offs between features, performance, and implementation complexity.

Memoizing the Size of a List

Recall that the size operation over a linked list traverses the list, counting every node. This is not a problem for small lists, but if the list is large then traversing the list—perhaps multiple times in quick succession—will be very inefficient. Rather than recomputing the size of a list on the fly, we can, instead, remember the size as an additional field of the list structure:

typedef struct list {
  node_t* head;
  size_t size;
} list_t;

size now has a straightforward implementation.

size_t size(const list_t* lst) {
  return lst->size;
}

Note that we want to keep the size function even though it is only accessing a field of lst. This function still has the effect of hiding the implementation of the lst structure even though the function, itself, is very simple.

While size is now simple, we must ensure that the lst’s size field is kept up-to-date by the other list functions. For example, list_init should initialize the size to be 0:

void list_init(list_t* lst) {
  lst->head = NULL;
  lst->size = 0;
}

Functions that insert into the list will need to increment the size field. If our list interface includes an insert_front function it may look like this:

void insert_front(list_t* lst, int value) {
  node_t* node = (node_t*)malloc(sizeof(node_t));
  node->value = value;
  node->next = lst->head;

  lst->head = node;
  lst->size++;
}

And functions that remove from the list, e.g., remove_first need to decrement the size:

bool remove_first(list_t* lst) {
  if (lst->head == NULL) {
    return false;
  }

  node_t* temp = lst->head;
  lst->head = lst->head->next;
  lst->size--;
  free(temp);
  return true;
}

We can capture this responsibility of the list functions to keep the size field updated as an invariant on the list structure:

After the execution of every list function, the size field accurately records the size of the list.

This invariant induces extra burden on our implementation. In particular, you can imagine that it is likely we’ll introduce at least one bug into our code by forgetting to update the size field. However, if we are willing to take on this burden, we gain the benefit of a quick way to look up the size of a list.

Similar to size, we can also observe a similar sort of performance issue with a function that adds a value at the end of a list. To insert onto the end of the list we must traverse the entire list, which could be costly for a long list. Inserting at the front of a list is much easier, and does not depend on the length of the list.

To alleviate this issue, we can see that if we had a pointer to the last node of the list, which we’ll call tail. We can use this end pointer to quickly append onto the end. This leads to the following variation on our list type:

typedef struct list {
  node_t* head;
  node_t* tail;
} list_t;

Like with size, our functions must now ensure that tail is always pointing to the last node of the list. Unlike size, keeping tail up-to-date requires some careful thought! Let’s consider updating insert_back to use the tail pointer:

void insert_back(list_t* lst, int value) {
  node_t* new_node = (node_t*)malloc(sizeof(node_t));
  new_node->value = value;
  new_node->next = NULL;

  // this code has a bug! do you see it?
  // Hint: think about the possible cases of old_tail...
  node_t* old_tail = lst->tail;
  lst->tail = new_node;
  old_tail->next = new_node;
}

What’s wrong with this code? Consider the field access to old_tail->next, which requires that old_tail->next, the (old) last node of the list, is non-NULL. When might this not be the case? When the list is empty, of course! Thus, like the insert_front function, we must perform a case analysis on lst->tail:

void insert_back(list_t* lst, int value) {
  node_t* new_node = (node_t*)malloc(sizeof(node_t));
  new_node->value = value;
  new_node->next = NULL;

  // this code still has a bug! do you see it?
  // Hint: did we forget to update something?
  if (lst->tail == NULL) {
    lst->tail = new_node;
  } else {
    node_t* old_tail = lst->tail;
    lst->tail = new_node;
    old_tail->next = new_node;
  }
}

As the code suggests, we forgot to do something. What was it? We needed to update l->head as well because the list was empty! This final code snippet is correct:

void insert_back(list_t* lst, int value) {
  node_t* new_node = (node_t*)malloc(sizeof(node_t));
  new_node->value = value;
  new_node->next = NULL;

  // this code still has a bug! do you see it?
  // Hint: did we forget to update something?
  if (lst->tail == NULL) {
    lst->tail = new_node;
    lst->tail = new_node;
  } else {
    node_t* old_tail = lst->tail;
    lst->tail = new_node;
    old_tail->next = new_node;
  }
}

In the one-element case, both l->head and l->tail point to the same node. Will this be a problem when we insert_back an additional element? Let’s think through this case with diagrams! In the one-element scenario, our list lst looks as follows.

lst.head [o]--->[0][/]
   .tail [o]-----^

Note that lst has two pointers, the head pointer and the tail pointer. Now, let’s modify lst so we insert 1 via insert_back. If we follow our code precisely, we:

  1. We hold a temporary pointer to the old tail node of the list, 0.
  2. We make the tail node of the list a new node containing 1.
  3. We update the old tail node to point to this new node.

These three steps modify the diagram as follows:

  • Step 1:

    lst.head [o]--->[0][ ]
       .tail [o]-----^
    old_tail [o]-----|
    
  • Step 2:

    lst.head [o]--->[0][o]    [1][/]
       .tail [o]-----^---------^
                     |
    old_tail [o]-----|
    
  • Step 3:

    lst.head [o]--->[0][o]--->[1][/]
       .tail [o]-----^---------^
                     |
    old_tail [o]-----|
    

So our code works out in the end! But it wasn’t clear at all it would do that. We needed a diagram to check that our approach was correct. This is the complexity introduced by adding a pointer to the end of the list: we must be very careful about maintaining the head and tail pointers since they might be updated at times we don’t expect, particularly, when the list is empty.

Other Variants of Linked Structures

So far we have considered modifications to the list_t structure, but we might also consider modifying how our node_t structures are represented. For example, two common changes are:

  • Making nodes doubly-linked. That is, equipping each node with a previous pointer. Previous pointers allow us to perform right-to-left traversals of the list and can simplify trickier operations such as deletion.
  • Connecting the first and last nodes of the list, creating a circular structure. Circular structures arise naturally in a variety of contexts and it is sometimes useful to represent them directly in our code.

We will explore some of these structural variants in class.

Abstract Data Types

These variations on linked lists bring us to an interesting observation: we can change the way a list is represented without changing the basic operations available to clients of this code. This is a good example of abstraction. By abstraction, we mean that we:

  • Expose the essential parts of a software component required to interact with others and
  • Hide the inner workings of the component when we can.

This approach to building software helps minimize the surface area of the component we’re writing, minimizing the ways it can interact with other components, thus reducing the complexity of our programs.

When our software components are best realized as types in our programs, this abstraction takes the form of abstract data types (ADTs). We break up our component into two parts:

  • A public interface that defines the ways that our program can interact with this type, i.e., what functions are available to create, use, and delete values of this type.
  • A private implementation of the interface functions, hidden from the outside world.

We will spend more time working with abstract data types soon, but you should start to think about lists as an abstraction we use to talk about many possible implementation approaches.

Acknowledgements

This reading is adapted from Peter-Michael Osera’s readings about Abstract Data Types and Linked List Variants.