Tufts: Comp 15: Pointers Review

Pointers!?

Pointers and references are key to a thorough understanding of how to program in C++. This page is a review of pointers and references, and will cover what they are, why we need them, and how they are used.

What are pointers and references?

We should start by distinguishing between pointer variables and pointer values. As you might recall, every variable has both an address and a value. The address identifies where the variable is stored in memory, and the value identifies what is actually being stored.

A pointer value is the address of some spot in memory. These look like 0x7fff3889b4b4 or 0x602010, and are hexadecimal (base 16) numbers that identify where something is. If you think of memory as one huge array, a pointer value is an index in that array. Pointer variables are variables that hold a pointer value. In the array analogy, a pointer variable is like an integer variable you declared to store an index. A pointer variable points to something else; the pointer value says where that something else is.

A reference variable refers to another variable that already exists. It is basically a new name for the same spot in memory, meaning that the value is shared: changing the value through either name changes it for both. We most often see these as function parameters, but might see them elsewhere as well.

It is important to remember the following as we continue:

Every location in memory, and therefore every variable, has an address.
Every address corresponds to a unique location in memory.
The computer knows the address of every variable in your program.
Given a memory address, the computer can find out what value is stored at that location.
While addresses are just numbers, C++ treats them as a separate type. This allows the compiler to catch cases where you accidentally assign a pointer to a numeric variable and vice versa (which is almost always an error).

Why use pointers?

So it sounds like pointers just add an unnecessary level of complexity. Why should you care where something is? There are a number of things that can only be done using pointers, and there are other things that are much easier or simpler when pointers are used rather than some workaround.

Call by Reference

Call by reference (CBR) as opposed to call by value (CBV) can be useful when you want to return multiple things from a function, or when you have a function parameter that is very big and would be inefficient to copy.

A standard function is CBV, meaning all the arguments in the function call are copied into new variables in the stack frame where the function runs. One result of this is that any changes the function makes to its parameters are not seen by the caller. Imagine a situation where you want a function to both get an input string and tell you if it was able to do so successfully. In this case, you want to return two things: a string containing the input, and a boolean indicating whether the operation succeeded. A C++ function returns only one value, so you would have to bundle the two into a struct; instead, it is often nicer to pass a string variable by reference and return the boolean. In fact, you will likely use a function that does this very thing when dealing with I/O.

Another reason to use CBR is when a function argument is big and it would be inefficient to copy the whole thing every time the function is called. In fact, arrays already behave this way for exactly this reason: the function receives the address of the array's first element rather than a copy of the array. However, if you have a really big struct (maybe it contains multiple arrays as well as other things), by default the whole thing is copied for every function call. With call by reference, the new stack frame holds a reference variable instead, so the structure is never copied.

Dynamic Memory

C++ doesn't let you change the size of a variable once you have made it. If you declare an array int arr[10], you have space for 10 integers, and if you realize you have 11, you can't ask C++ to add another spot to the end. Because of this, programming in C++ requires you to use dynamic memory if you don't know in advance how big a structure you need. We use pointers to interact with a structure whose size we determine at runtime.

How to use pointers

Okay, at this point, you have been convinced that you do in fact need to learn how to use pointers, no matter how annoying they may seem.

To use them, you need to understand:

Two symbols you will use and come across often (&, *)
Two contexts in which you will use these (in variable declarations and as operators)

& is the address-of operator, and is used to declare reference variables. As you might expect, the address-of operator is applied to variables that already exist and returns their memory address.

address_of.cpp

&, when used in a variable declaration, creates a variable with the type reference-to-a-char or reference-to-an-int etc. It is used when you already have a variable and want a new way to refer to it. This is most often seen in functions, where you might have a reference parameter to keep from copying your data over again.

ref_variable.cpp

* is the dereference operator, and it is also used to declare a pointer variable. The dereference operator takes a memory address and gets the value stored there. The value of a pointer variable is a memory address. When used in a declaration, the * will make a pointer-to-a-char or pointer-to-an-int etc.

There is a special pointer which specifically points to 'nothing': this is known as the null pointer (nullptr in C++). We use this as a comparison to see if a pointer variable is meaningful (you often set a pointer to null once you are done using it, to signal that the memory address it held shouldn't be used). Attempts to dereference the null pointer will usually crash your program, which is better than perhaps modifying random memory locations.

It is also worth mentioning the -> syntax. This is used when you want to access a field or a function of a structure that you have a pointer to. For example, if you have a pointer named pc to a class object that has a member variable courseNum, you would access it using the syntax pc->courseNum. (This does the exact same thing as (*pc).courseNum, but is easier to read and clearer, so you should use the arrow.)

pointers.cpp

Warning:
There is considerable confusion out there about the * symbol. It is not part of the type, though people often pronounce it, and even worse, write it that way. A declaration has a type followed by variables written in sample expressions that would produce a value of that type: in int *p, the expression *p is an int.

      int  *p, x;  // declares a pointer to an int and an int
      int*  q, y;  // SAME!!  EVIL EVIL EVIL — NEVER WRITE THIS!!

Some will call the second example above a difference in style — it isn't. It's an abomination! I have seen experienced programmers waste hours because of it. The * goes with the variable, not the type, so write it there! (My theory is that it's a habit picked up when the person didn't understand what was going on, and when/if they did figure it out, they continued it out of a perverse notion of style and as a way to haze newcomers.)

Pointers and Dynamic Memory

Pointers are the way we interface with dynamic memory. To that end, there is some pointer syntax that's necessary to use that memory.

First, creating and destroying variables dynamically:

new creates a newly allocated variable and gives you a pointer to the variable. Such a variable is a dynamic variable. The variable itself has no name, so it's often called an anonymous variable.
```
double *dp = new double;
```
Dynamic variables live until you destroy them. (Compare to static and automatic variables.)
Dynamic variables can be destroyed using delete when you don't need the space any more.
```
delete dp;
dp = nullptr;
```
You should destroy variables you don't need any more, and you should set such pointers to nullptr, because using the pointer after you delete the variable is an error that may cause random things to happen. Beware of aliases of the variable: they still hold the old address!

Creating and destroying arrays dynamically:

Here's how you make an array using new:
```
double *darray = new double[compute_array_size()];
```
compute_array_size() is just there to indicate that you can put any integer expression you like in the square brackets: a constant, an integer variable, the result of a calculation, a function call. Really, any integer expression.
You destroy an array like this:
```
delete [] darray;
```
Note the square brackets. If you leave them out, the C++ standard says anything can happen. So don't leave them out!

Examples:

new1.cpp.
Simple dynamically allocated array: new2a.cpp
Function returning a dynamically allocated array: new2b.cpp

Warnings/Common Bugs

Declaring a pointer variable allocates space to hold a pointer, i.e., an address. It does not allocate space to hold the data pointed to, nor does it initialize the pointer to anything in particular. For example:
```
char *pc;
```
allocates space to hold a pointer to a character, but does not allocate space for any characters. Thus,
```
*pc = 'q';
```
is an error unless pc has been given the address of an actual character variable.
For this reason, it is a very good practice to initialize pointer values to the null pointer like this:
```
char *cp = nullptr;
```
Aliasing. When two expressions refer to the same location, we say they are aliases. When you make a pointer to a variable, dereferencing the pointer is the same as accessing the original variable, and the value in the variable can be changed either way. When you have aliases, you can see a variable's value change even though that variable has not been used at all (it was changed via an alias).
You can create a pointer to a local variable, and you can return pointers, but you must never return a pointer to a local variable. Local variables are destroyed when a function returns, so a pointer to one is meaningless afterwards, and using it can have unpredictable effects on the behavior of your program. We'll see below how to return a pointer that does have meaning.