Tufts: Comp 15: Classes, Data Abstraction, and Representation Invariants

Classes for Modularity and Encapsulation

Classes give us a way to package data and the functions that operate on that data together: a class is a struct that has data elements, plus:

Functions
Control of visibility

Data Abstraction with Classes

Data abstraction is the process of encapsulating all the information about a user-defined data type in one place. Good data abstraction involves having a strong abstraction barrier that prevents clients from interfering with the implementation.

Let's consider a class whose instances represent points in the X-Y plane:

Point1.cpp:

Notice how we put functions and data together into the class. We made the functions public, which means anyone with a Point object can use them. The data is there, too, but we made that private, which means only the functions within the class can access the data members. Abstraction barrier!

Look in main(). We declare variables of type Point: a class makes a new type. A value whose type is a class is called an instance of that class. It's also called an object.

It's normal to want to initialize a value when you create it, so C++ has something called a constructor, which is a function that is called automatically when someone makes a new instance. In this case, the constructor takes two parameters: the x- and y-coordinates of the Point to be created. (Constructor is really a misnomer: it doesn't construct anything. It's really an initializer.)

To invoke member functions, also known as methods, you put an instance on the left of a dot, select the member function you want, and pass the arguments. The member functions live in the object, just as data members live in a struct!

You don't have to pass the object the function is to operate on (it's moved to the left of the dot). The function will work on the object it was taken out of.

The :: is called the scope resolution operator. It's there to tell C++ that the thing it's attached to really belongs in the class named on the left. Without it, C++ would assume the function was an outsider and thus would not coordinate its use with any objects or give it access to data members.

While looking at the member functions, like get_x(), note that member functions can just use the member variables. C++ will assume that such references mean the data members in the object the function was called from.

Terminology:

An object is a data value that has state and behavior. Sound familiar? How do you represent something; what can you do with it?
A class is a description of the shared characteristics of a set of objects.
C++ objects are instances of classes. We use member variables to store the state of an object. We put functions into the structs to represent behavior: they are called member functions in C++ (and methods in other languages).
A class can be thought of as a kind of template for making objects that all have the same sort of abstract state and behavior.
In C++, classes are an extension of a struct definition.

Exercise: Add a distance() member function to the Point class and add code in main() to test it.

Real Modularity: Reusing Points

Our code above is nice, but no one else can use the point implementation. We can't even use it in another application without cutting and pasting (boo, hiss). To solve this problem, separate out the following three things:

interface
implementation
client

The Point interface goes in a .h (header) file: Point.h.

For somewhat technical reasons, there is a new constructor here: a nullary constructor, i.e., one that takes no arguments. It will make a point for (0, 0) if someone wants to make a Point without specifying the coordinates.

The Point implementation goes in a separate .cpp file: Point.cpp.

Notice this file also includes Point.h.
Remember: If you define a member function outside of the class declaration, you have to use the scope resolution operator (::) to tell C++ that the function you're defining is part of a class and not a regular top-level function.

The client code, which uses the Point code, is written in yet another file: PointClient.cpp. This code includes the Point.h header file, and uses whatever is declared as public there.

(Notice that we use quotation marks, not angle brackets, in the #include directive, so clang++ will look for Point.h in the current directory.)

What's going on here? Why do we hide all the data structure and function information somewhere else? This way,

The application only has to worry about the application.
The implementation only worries about the implementation.
They both use the interface/contract.

The client uses the contract to determine what it can do.
The implementer uses the contract as a statement of what it must provide.

Moreover, the implementation can now be easily reused, independently of the application.

If we discover a better way to implement the same interface, we can change the implementation, and the client doesn't have to change at all.

And, if the client finds a better implementation of the same interface, they can use that one without changing their code.

Rectangles

Let's build a rectangle abstraction on top of points, but using all the new tools. This would be great to attempt yourself, and then double-check your code against what is here.

Getters, Setters, Data Abstraction, and Representation Invariants

How can I be sure,
In a world that's constantly changing?
How can I be sure,
where I stand with you?
–The Rascals

Control of visibility is very important!

It is very common to provide getters and setters for elements of an object's state: a getter returns the current value of a state element, and a setter updates the value of a state element.

Getters and setters should only be provided for elements of logical or abstract state. The concrete state should not be revealed (unless it maps directly to the logical state, but the client should not worry about that!).
Setters should only be provided for elements of the logical state that can change (i.e., that are mutable) and that it makes sense for the client to update individually.

For example, the rectangles have lower left and upper right points. It would be very common to have member functions (methods):

get_lower_left()
get_upper_right()
set_lower_left()
set_upper_right()

Why? Why not make the fields public so that clients can just get and update the fields themselves?

Data abstraction: The client should not know or care how we represent our data.

That permits client and implementation to be more decoupled, more independent, more modular.

Consider our rectangles. We're storing only two points of the rectangle, but why should client code work only with rectangles that are represented by their lower left and upper right points? This is an arbitrary decision that has nothing to do with their application.

We could extend our rectangles:

class Rectangle
{
public:
        Rectangle(Point low_left, Point up_right);

        Point get_lower_left ();
        Point get_upper_left ();
        Point get_upper_right();
        Point get_lower_right();


        void  set_lower_left (Point new_lower_left);
        void  set_upper_left (Point new_upper_left);
        void  set_upper_right(Point new_upper_right);
        void  set_lower_right(Point new_lower_right);

        int   get_width ();
        int   get_height();

        void print();

private:
        Point lower_left, upper_right;
};

Notice that the implementation is still storing 2 points, but it's allowing clients to behave as if it has all four corners available as well as the width and the height. It could store all those things, but it doesn't have to.

The client doesn't care about the actual state of an object, only its logical state, i.e., the state of the thing the object represents. In a way, the implementer can lie about what's in an object. As long as they can produce a value or record an appropriate state change, the client is happy.

Exercise:
Implement all the getters in the Rectangle class above.

Representation Invariants: How you can be sure in a world that's constantly changing.

You may have started to think about the setters in the above example. First, you must consider whether you want to have setters at all. Data values that cannot change are said to be immutable. Mutable values can change. Our points above are immutable.

If you do want to support mutations, what do they mean? For example, if the client is thinking that a Rectangle instance represents a rectangle (with sides parallel to the edges of the screen), what does it mean to change the lower left corner?

Point p1(0, 0), p2(10, 20), p3(5, 6);
Rectangle r(p1, p2);

r.set_lower_left(p3);

Can we really change one corner out of four? If so, then the figure isn't a rectangle any more! So, consider set_upper_left(). How would you implement that?

Stranger: what if I set the upper left corner to be to the right of the upper right corner?

A representation invariant is a property of an object that is true whenever anyone outside the abstraction looks at it. It's a consistency property.

When you initialize an object (in the constructor), you establish the invariant (e.g., the left corners are to the left of the right corners).
All public member functions assume the invariant is true when they are called.
If the function changes anything, it re-establishes the invariant before the function returns.

Therefore, setters should either update all the necessary members of an object in a consistent way or fail. (They can silently fail to do anything, effectively ignoring the request, return a failure indication, or throw something called an exception.)