Debugging Memory Errors, including Segmentation Faults

                  Mark A. Sheldon, Tufts University
        ======================================================

As you write more complex programs, it is inevitable that you will get
some memory errors.  The most common memory error is a segmentation
fault, though many of you will also likely get a "double free"
message.

No one can tell you what your problem is based only on the information
that you have a segmentation fault or a double free --- all we can do
is tell you what that means.

In this document, I'll tell you what these errors mean, list some
common causes, and then share debugging secrets of the C++ Jedi
masters.


Common memory errors:  What they mean
-------------------------------------

A segmentation fault occurs when your program attempts to access or
alter memory the operating system says you shouldn't.  That's it.
Your program can have bugs in which it accesses its own memory
inappropriately, but if you step outside the bounds of your program's
memory, you get a segmentation fault and the OS kills your program.

A double free occurs when you use delete on the same pointer value
more than once.  E.g., if you allocate a struct instance on the heap,
delete it, and then later delete it again, you'll get a double free
error.

You can also get an error when you attempt to free space that isn't in
the heap.  For example, if the pointer value you give to delete is the
address of something on the stack you'll see "free():  invalid
pointer".


Common causes of memory errors
------------------------------

Double and invalid frees are self-explanatory.  A typical cause of a
double free is making multiple copies of an address you get from new,
and then using delete on each copy.  Invalid frees often result from
misunderstanding delete:  either putting an ampersand to the left of a
stack variable and passing the result to delete, or storing the
address of a stack variable in a data structure that subsequently
deletes that field.
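
To make these two errors concrete, here is a minimal sketch.  The names
Point, releasePoint, and freeExactlyOnce are hypothetical, invented
only for this illustration.  Two variables hold the same address, but
only one delete is correct.  Setting a pointer to nullptr after
deleting it is one common defensive habit; note that it cannot protect
other copies of the address.

```cpp
struct Point {
        int x;
        int y;
};

// Hypothetical helper: deletes the Point and then nulls out the
// caller's variable, so deleting through that variable again is safe.
void releasePoint(Point *&p)
{
        delete p;       // using delete on the null pointer does nothing
        p = nullptr;
}

// Returns true once both pointers have been cleared.
bool freeExactlyOnce()
{
        Point *owner = new Point;
        owner->x = 1;
        owner->y = 2;

        Point *alias = owner;   // a copy of the address, NOT a new Point

        releasePoint(owner);    // frees the Point; owner is now nullptr
        releasePoint(owner);    // harmless: owner is nullptr

        // delete alias;        // BUG!!  double free: alias still holds
        //                      // the address that was already freed
        alias = nullptr;        // forget the stale address instead

        // int onStack = 5;
        // delete &onStack;     // BUG!!  invalid free: not heap memory

        return owner == nullptr && alias == nullptr;
}
```

The two commented-out deletes are the bugs described above; they are
left as comments so the sketch itself runs cleanly.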

Segmentation faults, however, have a variety of causes.  Here are the
most common ones students encounter in an introductory course:

    o Indexing (well) beyond the bounds of an array.  If you are well
      out of bounds, the access will create a segmentation fault.  If
      you update something just slightly outside the bounds of an
      array, you will alter some other memory, and that can lead to a
      segmentation fault much later.
    o Neglecting to return a value from a function whose return type
      is not void.  Be careful!  If you promise a return value from a
      function, you must return something in ALL cases; otherwise the
      caller gets garbage, which is especially dangerous when the
      promised value is a pointer.  The compiler will warn you about
      this:  "control reaches end of non-void function."  Do not
      ignore this message!
    o Dereferencing a null pointer.  The null pointer, by definition,
      is not actually the address of anything, and so dereferencing it
      is an error.  You dereference a pointer with the * or ->
      operators and sometimes with the [] operator (when you are
      treating the pointer as the address of the beginning of an
      array).
    o Dereferencing uninitialized pointers.  Pointers are like integers
      or floats in that declaring a variable without an explicit
      initialization means the variable can contain anything.  Some
      compilers will initialize pointer variables to the null pointer,
      but some don't.  Using the variable before you give it a value
      can lead to a segmentation fault (if you're lucky!).

      A particularly common instance of this problem arises when
      students program as if a pointer to a Node struct creates a Node
      (it doesn't).  For example:

          struct Node {
                  int          data;
                  struct Node *next;
          };

          ...
          
                  Node *aNewNode;
 
                  aNewNode->data = 0;          // BUG!!
                  aNewNode->next = nullptr;    // BUG!!
          
      In the above example, the programmer allocated an uninitialized
      variable that can hold the address of a Node.  This does not
      create a Node.  If you want to use this variable, you have to
      store the address of a Node in there yourself.  For example:

          aNewNode = new Node;

      After this line, there is a Node on the heap, and the address of
      that Node is stored in aNewNode (unless the program ran out of
      memory).
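
The habits that avoid several of the causes listed above can be
sketched together in a tiny linked list.  The function names newNode,
listLength, and deleteList are hypothetical, chosen for illustration;
the Node struct is the one from the example above.  Every pointer gets
a value before it is used, every loop checks for the null pointer
before dereferencing, and the non-void functions return a value on
every path.

```cpp
struct Node {
        int   data;
        Node *next;
};

// Creates a Node on the heap and initializes BOTH fields before
// anyone else can see it.
Node *newNode(int data, Node *next)
{
        Node *aNewNode = new Node;      // now there really is a Node
        aNewNode->data = data;
        aNewNode->next = next;
        return aNewNode;
}

// Counts the nodes.  The loop test guarantees curr is never the null
// pointer when curr->next is evaluated.
int listLength(Node *front)
{
        int count = 0;
        for (Node *curr = front; curr != nullptr; curr = curr->next) {
                count++;
        }
        return count;                   // returned in ALL cases
}

// Deletes each Node exactly once.
void deleteList(Node *front)
{
        while (front != nullptr) {
                Node *rest = front->next;  // save before deleting
                delete front;
                front = rest;
        }
}
```

For example, newNode(1, newNode(2, nullptr)) builds a two-element
list, and listLength(nullptr) is 0 rather than a crash.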

When we get to classes, we'll see that one way to get double frees
involves passing instances of your classes by value and/or assigning
an instance of a class to a variable that contains another instance.
For now, just avoid doing that.  Pass instances by reference (or pass
a pointer to the instance).  Don't use assignment on instances of your
classes.  Otherwise, you have to learn about copy constructors and
overloading the assignment operator, which we don't cover in this
class.  You are welcome to read about these things, but I encourage
you to avoid the problem for now.
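
As a rough illustration of the advice above, here is a sketch of a
class that owns heap memory, together with a function that takes an
instance by reference.  The class IntArray and the function sum are
hypothetical, made up for this example.

```cpp
class IntArray {
public:
        IntArray(int capacity)
        {
                size = capacity;
                data = new int[capacity];
                for (int i = 0; i < size; i++) {
                        data[i] = 0;
                }
        }

        ~IntArray() { delete [] data; }

        void set(int i, int value) { data[i] = value; }
        int  get(int i) const      { return data[i]; }
        int  length() const        { return size; }

private:
        int  size;
        int *data;
};

// Takes the instance by reference: no copy is made, so only the
// caller's instance ever runs its destructor.
int sum(const IntArray &arr)
{
        int total = 0;
        for (int i = 0; i < arr.length(); i++) {
                total += arr.get(i);
        }
        return total;
}

// A by-value version, int sum(IntArray arr), would copy the data
// pointer.  The copy's destructor and the original's destructor
// would then both delete the same array:  a double free.
```

The same reasoning applies to assignment:  after a = b, both instances
would hold the same data pointer, and both destructors would delete it.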


Debugging secrets of the C++ Jedi masters
-----------------------------------------

When your program has a bug, this means that either you have an
erroneous conception of the problem, or an erroneous solution to the
problem, or an erroneous encoding of your solution to the problem.

You can stare at the program and think, but, if that doesn't yield
results in a few minutes, then you need to take action.

Caution:  Do NOT just make various changes to your program hoping to
make the error go away.  I call this the "poke it with a stick"
method.  Just adding, removing, or rearranging *'s or &'s using trial
and error is a way to transform a program you wrote but whose behavior
you don't understand into a program you didn't actually write and
whose behavior you don't understand.  No good will come of this!

The first step is to try to figure out where things are going wrong.

Use debug output to trace your program.

Secret #1:  Strategically place debug print statements in your
program.  A useful way to proceed is to put a print statement at the
start and end of various (or even all) functions.  The debug print
statements should be useful and not just something like "b" or
"here".  Print out "Entering findMax" and "Leaving findMax" at least.
If your program prints the entering message and not the leaving
message, then you are closer to knowing where the program is crashing
or going wrong!

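===> Always put a new line at the end of a debug print statement. <===

Output is often buffered, so a message without a new line may still be
sitting in the buffer, unprinted, when the program crashes.  Using
std::endl both ends the line and flushes the output.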
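
Here is a minimal sketch of Secret #1.  The body of findMax is an
assumption (the function is only mentioned by name above), and the
sketch assumes the array has at least one element.  The messages go to
std::cerr and end with std::endl, so each one is written out
immediately.

```cpp
#include <iostream>

// Assumes size >= 1.
int findMax(int nums[], int size)
{
        std::cerr << "Entering findMax" << std::endl;

        int max = nums[0];
        for (int i = 1; i < size; i++) {
                if (nums[i] > max) {
                        max = nums[i];
                }
        }

        std::cerr << "Leaving findMax" << std::endl;
        return max;
}
```

If the program crashes inside the loop, "Entering findMax" appears
without a matching "Leaving findMax".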

Secret #2:  It's even better to print out parameter values when you
enter a function and return values when you leave it, if there are
any.  If you see that findMax gets an array with [-1, 2, 4, 0] and
returns 0, then you've found a bug.
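
One way to do this is sketched below.  The helper debugPrintArray is
hypothetical, and the body of findMax is again an assumption that the
array has at least one element.

```cpp
#include <iostream>

// Hypothetical helper: prints an array as [a, b, c] with no new line.
void debugPrintArray(int nums[], int size)
{
        std::cerr << "[";
        for (int i = 0; i < size; i++) {
                if (i > 0) {
                        std::cerr << ", ";
                }
                std::cerr << nums[i];
        }
        std::cerr << "]";
}

// Assumes size >= 1.
int findMax(int nums[], int size)
{
        std::cerr << "Entering findMax:  nums = ";
        debugPrintArray(nums, size);
        std::cerr << ", size = " << size << std::endl;

        int max = nums[0];
        for (int i = 1; i < size; i++) {
                if (nums[i] > max) {
                        max = nums[i];
                }
        }

        std::cerr << "Leaving findMax:  returning " << max << std::endl;
        return max;
}
```

With the parameters and the return value side by side in the output, a
wrong answer is visible at a glance.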

Secret #3:  Within a function you can print out helpful messages,
too.  "Starting loop to find where to insert node", "inserting new
node after element with name Sam".

Secret #4:  Once you know where a program is crashing or going wrong,
print out ALL THE RELEVANT VARIABLE VALUES --- and this includes
pointer values.  The null pointer on our system will print 0x0.

Many students don't want to print pointer values, because they don't
know what 0x601010 means.  You don't have to know what it means,
though a few bits of information can be helpful.  If you print out a
pointer you're about to delete and then later do that again, and you
see the same pointer, you've just figured out where a double free is
coming from.  That is, you may not know exactly what 0x601010 means,
but you can tell whether it's the same as another value or different.
You can tell whether two nodes have next fields pointing to the same
node, for example, or even whether a node points to itself.  You can
tell whether a pointer is 0x0, which is how the null pointer prints
out.
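
A small sketch of printing pointer values for a linked list node
follows.  The helpers debugPrintNode and pointsToItself are
hypothetical names; Node is the struct from earlier in this document.
The exact text printed for an address depends on your system.

```cpp
#include <iostream>

struct Node {
        int   data;
        Node *next;
};

// Prints the address of a node and, if there is a node, its fields.
// Never dereferences the null pointer.
void debugPrintNode(const char *label, Node *n)
{
        std::cerr << label << ":  node at " << static_cast<void *>(n);
        if (n != nullptr) {
                std::cerr << ", data = " << n->data
                          << ", next = " << static_cast<void *>(n->next);
        }
        std::cerr << std::endl;
}

// You can compare pointer values in code just as you compare them by
// eye in the debug output.
bool pointsToItself(Node *n)
{
        return n != nullptr && n->next == n;
}
```

Calling debugPrintNode just before each delete gives you a list of
addresses; the same address showing up twice marks the double free.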

Pointers that begin 0x7ff...  are usually on the stack.  If you get an
invalid pointer error in free(), and the value looks like that, then
you have given delete an address of a stack allocated variable rather
than an address of a heap allocated variable.  Now you can try to
isolate how that happened.
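
To see what stack and heap addresses look like on your own system, you
can print one of each.  This is only a sketch, and the function name
printStackAndHeapAddresses is made up for the example; the actual
values vary from system to system and from run to run.

```cpp
#include <iostream>

void printStackAndHeapAddresses()
{
        int  onStack = 42;              // lives on the stack
        int *onHeap  = new int(42);     // the int lives on the heap

        std::cerr << "stack variable at "
                  << static_cast<void *>(&onStack) << std::endl;
        std::cerr << "heap variable at  "
                  << static_cast<void *>(onHeap) << std::endl;

        // delete &onStack;     // BUG!!  invalid free: stack address
        delete onHeap;          // correct: this address came from new
}
```

Compare the two printed values with the pointer named in your error to
decide which kind of address you handed to delete.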