perltoot - Tom's object-oriented tutorial for perl
Object-oriented programming is a big seller these days. Some managers
would rather have objects than sliced bread. Why is that? What's so
special about an object? Just what is an object anyway?
An object is nothing but a way of tucking away complex behaviours into
a neat little easy-to-use bundle. (This is what professors call
abstraction.) Smart people who have nothing to do but sit around for
weeks on end figuring out really hard problems make these nifty
objects that even regular people can use. (This is what professors call
software reuse.) Users (well, programmers) can play with this little
bundle all they want, but they aren't to open it up and mess with the
insides. Just like an expensive piece of hardware, the contract says
that you void the warranty if you muck with the cover. So don't do that.
The heart of objects is the class, a protected little private namespace
full of data and functions. A class is a set of related routines that
addresses some problem area. You can think of it as a user-defined type.
The Perl package mechanism, also used for more traditional modules,
is used for class modules as well. Objects "live" in a class, meaning
that they belong to some package.
More often than not, the class provides the user with little bundles.
These bundles are objects. They know whose class they belong to,
and how to behave. Users ask the class to do something, like "give
me an object." Or they can ask one of these objects to do something.
Asking a class to do something for you is calling a class method.
Asking an object to do something for you is calling an object method.
Asking either a class (usually) or an object (sometimes) to give you
back an object is calling a constructor, which is just a
kind of method.
That's all well and good, but how is an object different from any other
Perl data type? Just what is an object really; that is, what's its
fundamental type? The answer to the first question is easy. An object
is different from any other data type in Perl in one and only one way:
you may dereference it using not merely string or numeric subscripts
as with simple arrays and hashes, but with named subroutine calls.
In a word, with methods.
The answer to the second question is that it's a reference, and not just
any reference, mind you, but one whose referent has been bless()ed
into a particular class (read: package). What kind of reference? Well,
the answer to that one is a bit less concrete. That's because in Perl
the designer of the class can employ any sort of reference they'd like
as the underlying intrinsic data type. It could be a scalar, an array,
or a hash reference. It could even be a code reference. But because
of its inherent flexibility, an object is usually a hash reference.
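To make that concrete, here is a small sketch that blesses three different kinds of referents into three made-up classes (Person, Stack, and Counter are illustrative only). The built-in ref() reports the class name, while reftype() from the core Scalar::Util module reports the intrinsic type underneath.

```perl
use strict;
use warnings;
use Scalar::Util qw(reftype);

# Three hypothetical classes, each built on a different kind of referent.
package Person;  sub name  { return $_[0]{NAME} }   # hash-based
package Stack;   sub top   { return $_[0][-1] }     # array-based
package Counter; sub value { return ${ $_[0] } }    # scalar-based

package main;

my $p = bless { NAME => "Jason" }, 'Person';
my $s = bless [ 1, 2, 5 ], 'Stack';
my $n = 42;
my $c = bless \$n, 'Counter';

# ref() on a blessed reference reports the class (package) name...
print join(" ", ref($p), ref($s), ref($c)), "\n";              # Person Stack Counter
# ...while reftype() reports the underlying intrinsic data type.
print join(" ", reftype($p), reftype($s), reftype($c)), "\n";  # HASH ARRAY SCALAR
# And methods can now be called against all three.
print join(" ", $p->name, $s->top, $c->value), "\n";           # Jason 5 42
```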
Before you create a class, you need to decide what to name it. That's
because the class (package) name governs the name of the file used to
house it, just as with regular modules. Then, that class (package)
should provide one or more ways to generate objects. Finally, it should
provide mechanisms to allow users of its objects to indirectly manipulate
these objects from a distance.
For example, let's make a simple Person class module. It gets stored in
the file Person.pm. If it were called a Happy::Person class, it would
be stored in the file Happy/Person.pm, and its package would become
Happy::Person instead of just Person. (On a system not
running Unix or Plan 9, but something like classic Mac OS or VMS, the directory
separator may be different, but the principle is the same.) Do not assume
any formal relationship between modules based on their directory names.
This is merely a grouping convenience, and has no effect on inheritance,
variable accessibility, or anything else.
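The name-to-file mapping itself is mechanical. The sketch below uses a hypothetical helper, module_to_file(), to mimic what use and require do with a package name: each "::" becomes "/" and ".pm" is appended. Perl then searches the directories listed in @INC for that relative path.

```perl
use strict;
use warnings;

# Hypothetical helper mimicking the package-name-to-file mapping
# performed by "use" and "require".
sub module_to_file {
    my ($package) = @_;
    (my $file = $package) =~ s{::}{/}g;   # Happy::Person -> Happy/Person
    return "$file.pm";
}

print "Person -> ",        module_to_file("Person"),        "\n";  # Person.pm
print "Happy::Person -> ", module_to_file("Happy::Person"), "\n";  # Happy/Person.pm
```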
For this module we aren't going to use Exporter, because we're
a well-behaved class module that doesn't export anything at all.
In order to manufacture objects, a class needs to have a constructor
method. A constructor gives you back not just a regular data type,
but a brand-new object in that class. This magic is taken care of by
the bless() function, whose sole purpose is to enable its referent to
be used as an object. Remember: being an object really means nothing
more than that methods may now be called against it.
While a constructor may be named anything you'd like, most Perl
programmers seem to like to call theirs new(). However, new() is not
a reserved word, and a class is under no obligation to supply such.
Some programmers have also been known to use a function with
the same name as the class as the constructor.
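Here is a minimal sketch of both points, using a hypothetical Widget class whose constructor is named spawn() rather than new(). A method call on a plain, unblessed hash reference fails, while the same call on the blessed one succeeds.

```perl
use strict;
use warnings;

package Widget;    # hypothetical class for illustration

# The constructor doesn't have to be called new(); spawn() works too.
sub spawn {
    my $class = shift;
    my $self  = { label => shift };
    return bless $self, $class;   # bless() returns the reference itself
}

sub label { return $_[0]{label} }

package main;

my $plain = { label => "raw" };
# An unblessed hash ref belongs to no class, so method calls on it die.
my $ok = eval { $plain->label; 1 };
print $ok ? "worked\n" : "failed: not blessed\n";

my $w = Widget->spawn("knob");
print ref($w), ": ", $w->label, "\n";   # Widget: knob
```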
By far the most common mechanism used in Perl to represent a Pascal
record, a C struct, or a C++ class is an anonymous hash. That's because a
hash has an arbitrary number of data fields, each conveniently accessed by
an arbitrary name of your own devising.
If you were just doing a simple
struct-like emulation, you would likely go about it something like this:
    $rec = {
        name  => "Jason",
        age   => 23,
        peers => [ "Norbert", "Rhys", "Phineas"],
    };
If you felt like it, you could add a bit of visual distinction
by up-casing the hash keys:
    $rec = {
        NAME  => "Jason",
        AGE   => 23,
        PEERS => [ "Norbert", "Rhys", "Phineas"],
    };
And so you could get at $rec->{NAME} to find "Jason", or
@{ $rec->{PEERS} } to get at "Norbert", "Rhys", and "Phineas".
(Have you ever noticed how many 23-year-old programmers seem to
be named "Jason" these days? :-)
This same model is often used for classes, although it is not considered
the pinnacle of programming propriety for folks from outside the
class to come waltzing into an object, brazenly accessing its data
members directly. Generally speaking, an object should be considered
an opaque cookie that you use object methods to access. Visually,
methods look like you're dereffing a reference using a function name
instead of brackets or braces.
Some languages provide a formal syntactic interface to a class's methods,
but Perl does not. It relies on you to read the documentation of each
class. If you try to call an undefined method on an object, Perl won't
complain at compile time, but the program will trigger an exception when
the call is actually made at run time.
Likewise, if you call a method expecting a prime number as its argument
with a non-prime one instead, you can't expect the compiler to catch this.
(Well, you can expect it all you like, but it's not going to happen.)
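The sketch below, built on a stripped-down and hypothetical Person class, shows a misspelled method name compiling cleanly and only failing when the call runs. The universal can() method offers a way to ask first.

```perl
use strict;
use warnings;

package Person;   # minimal stand-in with a single method
sub new  { my ($class, $name) = @_; return bless { NAME => $name }, $class }
sub name { return $_[0]{NAME} }

package main;

my $him = Person->new("Jason");

# This compiles fine: the method is looked up only when the call runs.
my $ok = eval { $him->nmae; 1 };     # note the typo
print "caught: $@" unless $ok;       # e.g. Can't locate object method "nmae" via package "Person" at ...

# can() lets you check for a method before calling it.
print "has name()\n" if $him->can("name");
print "no nmae()\n"  unless $him->can("nmae");
```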
Let's suppose you have a well-educated user of your Person class,
someone who has read the docs that explain the prescribed
interface. Here's how they might use the Person class:
    use Person;
    $him = Person->new();
    $him->name("Jason");
    $him->age(23);
    $him->peers( "Norbert", "Rhys", "Phineas" );
    push @All_Recs, $him;
    printf "%s is %d years old.\n", $him->name, $him->age;
    print "His peers are: ", join(", ", $him->peers), "\n";
    printf "Last rec's name is %s\n", $All_Recs[-1]->name;
As you can see, the user of the class doesn't know (or at least, has no
business paying attention to the fact) that the object has one particular
implementation or another. The interface to the class and its objects
is exclusively via methods, and that's all the user of the class should
ever play with.
Still, someone has to know what's in the object. And that someone is
the class. It implements methods that the programmer uses to access
the object. Here's how to implement the Person class using the standard
hash-ref-as-an-object idiom. We'll make a class method called new() to
act as the constructor, and three object methods called name(), age(), and
peers() to get at per-object data hidden away in our anonymous hash.
    package Person;
    use strict;

    sub new {
        my $self  = {};
        $self->{NAME}   = undef;
        $self->{AGE}    = undef;
        $self->{PEERS}  = [];
        bless($self);
        return $self;
    }

    sub name {
        my $self = shift;
        if (@_) { $self->{NAME} = shift }
        return $self->{NAME};
    }

    sub age {
        my $self = shift;
        if (@_) { $self->{AGE} = shift }
        return $self->{AGE};
    }

    sub peers {
        my $self = shift;
        if (@_) { @{ $self->{PEERS} } = @_ }
        return @{ $self->{PEERS} };
    }

    1;  # a module must end by returning a true value
We've created three methods to access an object's data, name(), age(),
and peers(). These are all substantially similar. If called with an
argument, they set the appropriate field; otherwise they return the
value held by that field, meaning the value of that hash key.
Even though at this point you may not even know what it means, someday
you're going to worry about inheritance. (You can safely ignore this
for now and worry about it later if you'd like.) To ensure that this
all works out smoothly, you must use the double-argument form of bless().
The second argument is the class into which the referent will be blessed.
By not assuming our own class as the default second argument and instead
using the class passed into us, we make our constructor inheritable.
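The difference is easy to see with a subclass. In this sketch, Employee and its salary() method are hypothetical, as is the one-argument new_one_arg() constructor kept around for contrast. Employee inherits both constructors from Person, but only the two-argument bless() hands back an Employee.

```perl
use strict;
use warnings;

package Person;
sub new {                       # inheritable: blesses into the class passed in
    my $class = shift;
    my $self  = { NAME => undef };
    return bless($self, $class);
}
sub new_one_arg {               # for contrast: always blesses into Person
    my $self = { NAME => undef };
    return bless($self);
}

package Employee;               # hypothetical subclass
our @ISA = ('Person');
sub salary { return 'unknown' }

package main;

my $e1 = Employee->new;          # Person::new, called with "Employee"
my $e2 = Employee->new_one_arg;  # Person::new_one_arg ignores its class

print ref($e1), "\n";   # Employee
print ref($e2), "\n";   # Person
printf "e1 %s salary()\n", $e1->can('salary') ? "can call" : "cannot call";
printf "e2 %s salary()\n", $e2->can('salary') ? "can call" : "cannot call";
```

Because $e2 really is a plain Person, it never sees anything the subclass adds.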
    sub new {
        my $class = shift;
        my $self  = {};
        $self->{NAME}   = undef;
        $self->{AGE}    = undef;
        $self->{PEERS}  = [];
        bless ($self, $class);
        return $self;
    }
That's about all there is for constructors. These methods bring objects
to life, returning neat little opaque bundles to the user to be used in
subsequent method calls.
Every story has a beginning and an end. The beginning of the object's
story is its constructor, explicitly called when the object comes into
existence. But the ending of its story is the destructor, a method
implicitly called when an object leaves this life. Any per-object
clean-up code is placed in the destructor, which must (in Perl) be called
DESTROY.
If constructors can have arbitrary names, then why not destructors?
Because while a constructor is explicitly called, a destructor is not.
Destruction happens automatically via Perl's garbage collection (GC)
system, which is a quick but somewhat lazy reference-based GC system.
To know what to call, Perl insists that the destructor be named DESTROY.
Perl's notion of the right time to call a destructor is not well-defined
currently, which is why your destructors should not rely on when they are
called.
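Here is a small sketch of that reference-based scheme at work; the Tracer class and its log are hypothetical. With reference counting, DESTROY runs when the last reference to the object goes away, not merely when one particular variable is cleared. As noted above, though, it's wise not to lean on the exact moment.

```perl
use strict;
use warnings;

package Tracer;   # hypothetical class that logs its own destruction
my @log;
sub new     { my ($class, $name) = @_; return bless { name => $name }, $class }
sub DESTROY { my $self = shift; push @log, "destroying $self->{name}" }

package main;

{
    my $t = Tracer->new("inner");
    push @log, "leaving block";
}   # the only reference to "inner" disappears here, so DESTROY runs implicitly

my $keep  = Tracer->new("outer");
my $alias = $keep;
undef $keep;                  # $alias still refers to it: no DESTROY yet
push @log, "after undef";
undef $alias;                 # last reference gone: DESTROY runs now

print "$_\n" for @log;
```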
Why is DESTROY in all caps? Perl on occasion uses purely uppercase
function names as a convention to indicate that the function will
be automatically called by Perl in some way. Others that are called
implicitly include BEGIN, END, AUTOLOAD, plus all methods used by
tied objects, described in the perltie manpage.
In really good object-oriented programming languages, the user doesn't
care when the destructor is called. It just happens when it's supposed
to. In low-level languages without any GC at all, there's no way to
depend on this happening at the right time, so the programmer must
explicitly call the destructor to clean up memory and state, crossing
their fingers that it's the right time to do so. Unlike in C++, an
object destructor is almost never needed in Perl, and even when it is,
explicit invocation is uncalled for. In the case of our Person class,
we don't need a destructor because Perl takes care of simple matters
like memory deallocation.
The only situation where Perl's reference-based GC won't work is
when there's a circularity in the data structure, such as:
    $this->{WHATEVER} = $this;
In that case, you must delete the self-reference manually if you expect
your program not to leak memory. While admittedly error-prone, this is
the best we can do right now. Nonetheless, rest assured that when your
program is finished, its objects' destructors are all duly called.
So you are guaranteed that an object eventually gets properly
destroyed, except in the unique case of a program that never exits.
(If you're running Perl embedded in another application, this full GC
pass happens a bit more frequently--whenever a thread shuts down.)
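Here is a sketch of the leak and the manual fix, using a hypothetical Node class whose destructor only counts how many times it has run. The first block leaves a self-referencing object behind; the second deletes the self-reference before the object goes out of scope.

```perl
use strict;
use warnings;

package Node;   # hypothetical class
my $destroyed = 0;
sub new     { my $class = shift; return bless {}, $class }
sub DESTROY { $destroyed++ }

package main;

{
    my $this = Node->new;
    $this->{WHATEVER} = $this;   # the object now refers to itself
}   # $this is gone, but the self-reference keeps the object alive
print "after leaky block: $destroyed\n";   # 0

{
    my $this = Node->new;
    $this->{WHATEVER} = $this;
    delete $this->{WHATEVER};    # break the cycle by hand
}   # now the last reference really is gone
print "after fixed block: $destroyed\n";   # 1
```

The leaked object from the first block is still cleaned up eventually, during the global destruction pass at program exit.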
The methods we've talked about so far have either been constructors or
else simple "data methods", interfaces to data stored in the object.
These are a bit like an object's data members in the C++ world, except
that strangers don't access them as data. Instead, they should only
access the object's data indirectly via its methods. This is an
important rule: in Perl, access to an object's data should only
be made through methods.
Perl doesn't impose restrictions on who gets to use which methods.
The public-versus-private distinction is by convention, not syntax.
(Well, unless you use the Alias module described below in
"Data Members as Variables".) Occasionally you'll see method names beginning or ending
with an underscore or two. This marking is a convention indicating
that the methods are private to that class alone and sometimes to its
closest acquaintances, its immediate subclasses. But this distinction
is not enforced by Perl itself. It's up to the programmer to behave.
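A quick sketch shows how little the underscore actually protects; the _secret() method here is hypothetical.

```perl
use strict;
use warnings;

package Person;
sub new     { my $class = shift; return bless { NAME => shift }, $class }
sub _secret { return "only Person should call me" }   # "private" by convention only

package main;

my $him = Person->new("Jason");
# Nothing stops outside code from calling the underscored method;
# the leading underscore is a hint to humans, not a rule Perl enforces.
print $him->_secret(), "\n";
```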
There's no reason to limit methods to those that simply access data.
Methods can do anything at all. The key point is that they're invoked
against an object or a class. Let's say we'd like object methods that
do more than fetch or set one particular field.
    sub exclaim {
        my $self = shift;
        return sprintf "Hi, I'm %s, age %d, working with %s",
            $self->{NAME}, $self->{AGE}, join(", ", @{$self->{PEERS}});
    }
Or maybe even one like this:
    sub happy_birthday {
        my $self = shift;
        return ++$self->{AGE};
    }
Some might argue that one should go at these this way:
    sub exclaim {
        my $self = shift;
        return sprintf "Hi, I'm %s, age %d, working with %s",
            $self->name, $self->age, join(", ", $self->peers);
    }

    sub happy_birthday {
        my $self = shift;
        return $self->age( $self->age() + 1 );
    }
But since these methods are all executing in the class itself, this
may not be critical. There are tradeoffs to be made. Using direct
hash access is faster (about an order of magnitude faster, in fact), and
it's more convenient when you want to interpolate in strings. But using
methods (the external interface) internally shields not just the users of
your class but even you yourself from changes in your data representation.
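Putting the pieces together, here is a self-contained sketch of the Person class with the accessor-based exclaim() and happy_birthday() from above, condensed into one file with a package main section so it can run on its own.

```perl
use strict;
use warnings;

package Person;
sub new {
    my $class = shift;
    my $self  = { NAME => undef, AGE => undef, PEERS => [] };
    return bless($self, $class);
}
sub name  { my $self = shift; $self->{NAME} = shift if @_; return $self->{NAME} }
sub age   { my $self = shift; $self->{AGE}  = shift if @_; return $self->{AGE} }
sub peers { my $self = shift; @{ $self->{PEERS} } = @_ if @_; return @{ $self->{PEERS} } }

# Accessor-based versions: they keep working if the hash layout changes.
sub exclaim {
    my $self = shift;
    return sprintf "Hi, I'm %s, age %d, working with %s",
        $self->name, $self->age, join(", ", $self->peers);
}
sub happy_birthday {
    my $self = shift;
    return $self->age( $self->age() + 1 );
}

package main;

my $him = Person->new;
$him->name("Jason");
$him->age(23);
$him->peers("Norbert", "Rhys", "Phineas");
$him->happy_birthday;
my $msg = $him->exclaim;
print "$msg\n";   # Hi, I'm Jason, age 24, working with Norbert, Rhys, Phineas
```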
What about "class data", data items common to each object in a class?
What would you want that for? Well, in your Person class, you might
like to keep track of the total people alive. How do you implement that?
You could make it a global variable called $Person::Census. But about the
only reason you'd do that would be if you wanted people to be able to
get at your class data directly. They could just say $Person::Census
and play around with it. Maybe this is ok in your design scheme.
You might even conceivably want to make it an exported variable. To be
exportable, a variable must be a (package) global. If this were a
traditional module rather than an object-oriented one, you might do that.
While this approach is expected in most traditional modules, it's
generally considered rather poor form in most object modules. In an
object module, you should set up a protective veil to separate interface
from implementation. So provide a class method to access class data
just as you provide object methods to access object data.
So, you could still keep $Census as a package global and rely upon
others to honor the contract of the module and therefore not play around
with its implementation. You could even be supertricky and make $Census a
tied object as described in the perltie manpage, thereby intercepting all accesses.
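For the supertricky route, here is a minimal sketch of a tie class (CensusCounter is a hypothetical name) that records every read and write of $Person::Census. TIESCALAR, FETCH, and STORE are the method names the perltie manpage specifies for tied scalars.

```perl
use strict;
use warnings;

package CensusCounter;   # hypothetical tie class that logs every access
my @accesses;
sub TIESCALAR { my ($class, $init) = @_; my $value = $init; return bless \$value, $class }
sub FETCH     { my $self = shift; push @accesses, "FETCH"; return $$self }
sub STORE     { my ($self, $new) = @_; push @accesses, "STORE $new"; $$self = $new }

package Person;
our $Census;
tie $Census, 'CensusCounter', 0;

package main;

$Person::Census++;                 # outside code pokes at the global...
print "census: $Person::Census\n"; # census: 1
print "$_\n" for @accesses;        # ...and every access was intercepted
```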
But more often than not, you just want to make your class data a
file-scoped lexical. To do so, simply put this at the top of the file:
    my $Census = 0;
Even though the scope of a my() normally expires when the block in which
it was declared is done (in this case the whole file being required or
used), Perl's deep binding of lexical variables guarantees that the
variable will not be deallocated, remaining accessible to functions
declared within that scope. This doesn't work with global variables
given temporary values via local(), though.
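The same persistence can be seen in a single file by letting a bare block play the part of the required file. In this sketch bump() and population() are hypothetical names; the block's my() variable goes out of scope at the closing brace, yet both subroutines keep sharing it.

```perl
use strict;
use warnings;

# The bare block stands in for a file being required or used.
{
    my $Census = 0;                       # scope ends at the closing brace...
    sub bump       { return ++$Census }   # hypothetical helper
    sub population { return $Census }
}

# ...but the named subs still share that same variable afterwards.
bump() for 1 .. 3;
print "population: ", population(), "\n";   # population: 3
```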
Irrespective of whether you leave $Census a package global or make
it instead a file-scoped lexical, you should make these
changes to your Person::new() constructor:
    sub new {
        my $class = shift;
        my $self  = {};
        $Census++;
        $self->{NAME}   = undef;
        $self->{AGE}    = undef;
        $self->{PEERS}  = [];
        bless ($self, $class);
        return $self;
    }

    sub population {
        return $Census;
    }
Now that we've done this, we certainly do need a destructor so that
when Person is destroyed, the $Census goes down. Here's how
this could be done:
    sub DESTROY { --$Census }
Notice how there's no memory to deallocate in the destructor? That's
something that Perl takes care of for you all by itself.
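Here is the whole census arrangement condensed into one runnable sketch: the file-scoped lexical, the counting constructor, population(), and DESTROY. Dropping the last reference to one person drives the count back down.

```perl
use strict;
use warnings;

package Person;
my $Census = 0;          # file-scoped lexical class data

sub new {
    my $class = shift;
    my $self  = { NAME => undef, AGE => undef, PEERS => [] };
    $Census++;
    return bless($self, $class);
}
sub population { return $Census }
sub DESTROY    { --$Census }     # no memory to free, just bookkeeping

package main;

my @people = (Person->new, Person->new, Person->new);
print "population: ", Person->population, "\n";   # population: 3
pop @people;                     # drop one reference; its DESTROY runs
print "population: ", Person->population, "\n";   # population: 2
```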
Alternatively, you could use the Class::Data::Inheritable module from
CPAN.
It turns out that this is not really a good way to go about handling
class data. A good scalable rule is that you must never reference class
data directly from an object method. Otherwise you aren't building a
scalable, inheritable class. The object must be the rendezvous point
for all operations, especially from an object method. The globals
(class data) would in some sense be in the "wrong" package in your
derived classes. In Perl, methods execute in the context of the class
they were defined in, not that of the object that triggered them.
Therefore, namespace visibility of package globals in methods is unrelated
to inheritance.
Got that? Maybe not. Ok, let's say that some other class "borrowed"
(well, inherited) the DESTROY method as it was defined above. When those
objects are destroyed, the original $Census variable will be altered,
not the one in the new class's package namespace. Perhaps this is what
you want, but probably it isn't.
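Here is a sketch of that surprise. It makes $Census a package global so the two counters are easy to compare, and Employee is a hypothetical subclass that declares its own $Employee::Census in the hope of tracking employees. Both new() and DESTROY are inherited from Person, so only $Person::Census ever changes.

```perl
use strict;
use warnings;

{
    package Person;
    our $Census = 0;
    sub new     { my $class = shift; $Census++; return bless {}, $class }
    sub DESTROY { $Census-- }
}

{
    package Employee;        # hypothetical subclass
    our @ISA = ('Person');
    our $Census = 0;         # meant to count Employees, but never touched
}

my $boss = Employee->new;    # inherited new() bumps $Person::Census
print "Person: $Person::Census, Employee: $Employee::Census\n";   # Person: 1, Employee: 0
undef $boss;                 # inherited DESTROY decrements $Person::Census
print "Person: $Person::Census, Employee: $Employee::Census\n";   # Person: 0, Employee: 0
```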
Here's how to fix this. We'll store a reference to the data in the
value accessed by the hash key "_CENSUS". Why the underscore? Well,
mostly because an initial underscore already conveys strong feelings
of magicalness to a C programmer. It's really just a mnemonic device
to remind ourselves that this field is special and not to be used as
a public data member in the same way that NAME, AGE, and PEERS are.
(Because we've been developing this code under the strict pragma, prior
to perl version 5.004 we'll have to quote the field name.)
sub new {
my $class = shift;
my $self = {};
$self->{NAME} = undef;
$self->{AGE} = undef;
$self->{PEERS} = [];
$self->{"_CENSUS"} = \$Census;
bless ($self, $class);
++ ${ $self->{"_CENSUS"} };
return $self;
}
sub population {
my $self = shift;
if (ref $self) {
return ${ $self->{"_CENSUS"} };
} else {
return $Census;
}
}
sub DESTROY {
my $self = shift;
-- ${ $self->{"_CENSUS"} |