Lab: AddressSanitizer

Today’s lab will introduce another useful tool to keep on hand when you’re working in C: AddressSanitizer. The exercises will walk you through AddressSanitizer’s capabilities, but at a high level, it is a tool that adds checks to your program to catch some of the pointer and memory-related errors you’ll likely make when writing C programs. AddressSanitizer is a good complement to gdb; where gdb allows you to walk a program through its execution and examine its internal state, AddressSanitizer adds automated checks to watch for specific errors and report useful diagnostic information when an error occurs. Neither tool is a great replacement for the other, but together they help save you from what would otherwise have been a long, tedious debugging process.

Today’s lab uses the starter code provided in the file asan.tar.gz. Download the file, then run these terminal commands to get set up:

$ cd csc161/labs
$ tar xvzf ~/Downloads/asan.tar.gz
$ cd asan
$ code .

A. Building a Program with AddressSanitizer

The first step when using AddressSanitizer is to recompile the program with an extra flag: -fsanitize=address. As you might guess from the structure of that option, there are other sanitizers available to check for errors or undesirable program behaviors, but AddressSanitizer was the first.

We’ll start by building the following program:

int main() {
  // Make an array of ten values
  int values[10];

  // Initialize the values array to zeroes
  for (int i=0; i<sizeof(values); i++) {
    values[i] = 0;
  }

  return 0;
}

This program contains a memory error. If you’ve spotted it already, please keep it to yourself; these can be tricky to spot in small programs, and nearly impossible in a larger program. We’ll use AddressSanitizer to catch the error in a moment.

If you save this code in a file named partA.c, a normal compilation and run would look like this:

$ clang -o partA partA.c
$ ./partA

What happens when you run the program? It’s likely the program fails with a segmentation fault on MathLAN machines. You might see different behavior on other computers. Some programs with similar errors will work on some runs and fail on others. These are good clues that we’re dealing with a memory error.

One of the first tools you should reach for when you encounter a memory error is gdb. But to get useful debugging information, you’ll need to recompile partA with the -g flag before running it in gdb:

$ clang -g -o partA partA.c
$ gdb ./partA

Type run into gdb to run the program. It will stop when the segmentation fault occurs. Where does the program stop? What useful information can you learn from the point where the program has stopped? It turns out, not much.

This is where AddressSanitizer can be more useful. We can compile the program with AddressSanitizer like this:

$ clang -g -fsanitize=address -o partA partA.c

The program now includes extra checks for memory errors that will alert you when something goes wrong. The exercises below will ask you to try using AddressSanitizer to diagnose the bug in this program.

Exercises

  1. The narrative above explains how to compile partA.c with AddressSanitizer on the command line, but the starter code for today includes a Makefile. Modify the Makefile to build the program with AddressSanitizer.

  2. Now run partA. The program should fail with a long AddressSanitizer error instead of just crashing. You can ignore the parts of the error beyond the line that begins with “SUMMARY”, but everything above that line is useful diagnostic information. Try to write out an explanation for what each line of the diagnostic output means, excluding the line that starts with “HINT”. Your explanation should include a fairly clear statement of what the error is. There will be a few terms you won’t recognize (like thread, at least in this context) but you can ignore these parts of the diagnostic message in this class.

  3. Use the information from AddressSanitizer to fix the error. Once you’ve fixed the error you should be able to run the program with AddressSanitizer enabled, but receive no error reports.

B. Exploring AddressSanitizer

For this part of the lab, you will explore the different kinds of memory errors AddressSanitizer can detect. Each exercise asks you to write a program containing a specific error in C, and then to check if AddressSanitizer detects the error. You’ll compare the diagnostic information from AddressSanitizer (if it produces any at all) to the information you can get from examining the error in gdb.

For some errors, you’ll almost certainly end up with compiler warnings. This lab is the one time you should ignore these warnings; they’re there to help you avoid errors, but we’re adding errors on purpose.

For each exercise, you will follow the same steps:

  1. Write a program that contains the requested error.

  2. Compile and run the program without AddressSanitizer. Write down what happens when you run the program. Does it crash or give any other evidence that an error has occurred?

  3. Now try running the program in gdb. Does the debugger give you any useful information to guide you to the error?

  4. Finally, build the program with AddressSanitizer and run it again. Does AddressSanitizer catch the error? If so, what diagnostic information does it print? Does the diagnostic information point directly at the error?

  5. Which tool do you think is better for finding this type of error? Pick either gdb or AddressSanitizer and explain why you think it’s more helpful for this error.

Exercises

  1. Use After Return: It’s never safe to access a function’s local variables once that function has returned. C allows us to create pointers to local variables, and if you’re not careful you might end up using one of these pointers after the function returns. Create a source file called use-after-return.c and fill it in with a program that includes this error. Follow the steps listed above to see how AddressSanitizer and gdb handle this error.

    Update: You will need to run the program slightly differently to tell AddressSanitizer to check for this error:

    $ ASAN_OPTIONS=detect_stack_use_after_return=1 ./use-after-return
    

    This extra option is only required for this exercise. You can run the program as usual for the other exercises.

  2. Uninitialized Reads: Any time we reserve space in memory in C (e.g. by creating a local variable or array) we are required to initialize it before reading its contents. Create a source file named uninitialized-read.c and fill it in with a program that contains this error. Follow the steps listed above to see how AddressSanitizer and gdb handle this error.

  3. Use After Scope: Functions aren’t the only way to create a new scope in C. You can declare new variables inside of an if, a for loop, or any other construct with curly braces. Create a source file named use-after-scope.c and fill it in with a program that accesses memory for a variable that has gone out of scope; your implementation should only contain a main function. Follow the steps listed above to see how AddressSanitizer and gdb handle this error.

  4. Constant String Modifications: We’ve seen in an earlier lab that string constants in C are immutable. Create a source file named constant-string.c and fill it in with a program that attempts to modify one or more characters in a string constant. Follow the steps listed above to see how AddressSanitizer and gdb handle this error.

  5. Stack Buffer Overflow: The example bug in part A was one kind of stack buffer overflow: it writes far beyond the end of an array. We’re more likely to make errors that read or write one element beyond the end of an array, or maybe even one element before the array (if you accidentally use negative indices). Create a source file named stack-buffer-overflow.c and fill it in with a program that reads one element beyond the bounds of a local array. Follow the steps listed above to see how AddressSanitizer and gdb handle this error.

  6. Global Buffer Overflow: We can also make accesses beyond the end of a global array. Create a source file named global-buffer-overflow.c and fill it in with a program that writes one element beyond the bounds of a global array. Follow the steps listed above to see how AddressSanitizer and gdb handle this error.

AddressSanitizer doesn’t catch all of the errors in the exercises above, but it catches some of the most difficult ones to diagnose in a larger project. The MemorySanitizer is able to catch one of the errors above that AddressSanitizer misses. There are at least three more errors AddressSanitizer is particularly good at finding, but we haven’t yet learned how to write code that contains these errors. We’ll revisit AddressSanitizer to discuss these additional errors soon.