perltooc - Tom's OO Tutorial for Class Data in Perl
When designing an object class, you are sometimes faced with the situation
of wanting common state shared by all objects of that class.
Such class attributes act somewhat like global variables for the entire
class, but unlike program-wide globals, class attributes have meaning only to
the class itself.
Here are a few examples where class attributes might come in handy:
- to keep a count of the objects you've created, or how many are
  still extant.
- to extract the name or file descriptor for a logfile used by a debugging
  method.
- to access collective data, like the total amount of cash dispensed by
  all ATMs in a network in a given day.
- to access the last object created by a class, or the most accessed object,
  or to retrieve a list of all objects.
Unlike a true global, class attributes should not be accessed directly.
Instead, their state should be inspected, and perhaps altered, only
through the mediated access of class methods. These class attribute
accessor methods are similar in spirit and function to accessors used
to manipulate the state of instance attributes on an object. They provide a
clear firewall between interface and implementation.
You should allow access to class attributes through either the class
name or any object of that class. If we assume that $an_object is of
type Some_Class, and the &Some_Class::population_count method accesses
class attributes, then these two invocations should both be possible,
and almost certainly equivalent.
    Some_Class->population_count()
    $an_object->population_count()
The question is, where do you store the state which that method accesses?
Unlike more restrictive languages like C++, where these are called
static data members, Perl provides no syntactic mechanism to declare
class attributes, any more than it provides a syntactic mechanism to
declare instance attributes. Perl provides the developer with a broad
set of powerful but flexible features that can be uniquely crafted to
the particular demands of the situation.
A class in Perl is typically implemented in a module. A module consists
of two complementary feature sets: a package for interfacing with the
outside world, and a lexical file scope for privacy. Either of these
two mechanisms can be used to implement class attributes. That means you
get to decide whether to put your class attributes in package variables
or to put them in lexical variables.
And those aren't the only decisions to make. If you choose to use package
variables, you can make your class attribute accessor methods either ignorant
of inheritance or sensitive to it. If you choose lexical variables,
you can elect to permit access to them from anywhere in the entire file
scope, or you can limit direct data access exclusively to the methods
implementing those attributes.
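As a rough sketch of the lexical-variable choice, the following keeps one
class attribute in a lexical shared by the whole class block and another in
a lexical visible only to its own accessor. The class name Counted_Class,
the logname method, and the "debug.log" default are made up for
illustration; only population_count comes from the text above.

```perl
use strict;
use warnings;

{
    package Counted_Class;    # hypothetical class name

    # Lexical shared by every sub in this block; outsiders cannot name it.
    my $population = 0;

    sub new {
        my $obclass = shift;
        $population++;
        return bless {}, ref($obclass) || $obclass;
    }

    sub DESTROY { $population-- }

    # Callable on the class name or on any object of the class.
    sub population_count { return $population }

    # Tighter variant: the data is reachable only through this one accessor.
    {
        my $logname = "debug.log";    # hypothetical default
        sub logname {
            shift;                    # ignore class name or object
            $logname = shift if @_;
            return $logname;
        }
    }
}

my $first  = Counted_Class->new;
my $second = Counted_Class->new;
print "via class:  ", Counted_Class->population_count, "\n";   # 2
print "via object: ", $first->population_count, "\n";          # 2
undef $second;
print "after one is destroyed: ", Counted_Class->population_count, "\n";   # 1
```

Neither $population nor $logname has an entry in the package symbol table,
so the methods are the only way in.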
One of the easiest ways to solve a hard problem is to let someone else
do it for you! In this case, Class::Data::Inheritable (available on a
CPAN near you) offers a canned solution to the class data problem
using closures. So before you wade into this document, consider
having a look at that module.
Because a class in Perl is really just a package, using package variables
to hold class attributes is the most natural choice. This makes it simple
for each class to have its own class attributes. Let's say you have a class
called Some_Class that needs a couple of different attributes that you'd
like to be global to the entire class. The simplest thing to do is to
use package variables like $Some_Class::CData1 and $Some_Class::CData2
to hold these attributes. But we certainly don't want to encourage
outsiders to touch those data directly, so we provide methods
to mediate access.
In the accessor methods below, we'll for now just ignore the first
argument--that part to the left of the arrow on method invocation, which
is either a class name or an object reference.
    package Some_Class;

    sub CData1 {
        shift;      # XXX: ignore calling class/object
        $Some_Class::CData1 = shift if @_;
        return $Some_Class::CData1;
    }

    sub CData2 {
        shift;      # XXX: ignore calling class/object
        $Some_Class::CData2 = shift if @_;
        return $Some_Class::CData2;
    }
This technique is highly legible and should be completely straightforward
to even the novice Perl programmer. By fully qualifying the package
variables, they stand out clearly when reading the code. Unfortunately,
if you misspell one of these, you've introduced an error that's hard
to catch. It's also somewhat disconcerting to see the class name itself
hard-coded in so many places.
Both these problems can be easily fixed. Just add the use strict
pragma, then pre-declare your package variables. (The our declarator,
introduced in Perl 5.6, works for package globals just as my
works for scoped lexicals.)
    package Some_Class;
    use strict;
    our($CData1, $CData2);      # our() is new to perl5.6

    sub CData1 {
        shift;      # XXX: ignore calling class/object
        $CData1 = shift if @_;
        return $CData1;
    }

    sub CData2 {
        shift;      # XXX: ignore calling class/object
        $CData2 = shift if @_;
        return $CData2;
    }
As with any other global variable, some programmers prefer to start their
package variables with capital letters. This helps clarity somewhat, but
by no longer fully qualifying the package variables, their significance
can be lost when reading the code. You can fix this easily enough by
choosing better names than were used here.
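Here is a small usage sketch of such an accessor, showing that the class
name and an object reach the same data. The bare-bones constructor is an
assumption added only so there is an object to call through.

```perl
use strict;
use warnings;

{
    package Some_Class;
    our $CData1;

    # Hypothetical minimal constructor, not part of the text above.
    sub new {
        my $obclass = shift;
        return bless {}, ref($obclass) || $obclass;
    }

    sub CData1 {
        shift;                      # ignore class name or object
        $CData1 = shift if @_;
        return $CData1;
    }
}

my $obj = Some_Class->new;

Some_Class->CData1("set through the class");
print "object sees: ", $obj->CData1, "\n";

$obj->CData1("set through an object");
print "class sees:  ", Some_Class->CData1, "\n";

# Outsiders can still reach the package variable directly,
# which is exactly what the accessor is meant to discourage.
print "direct peek: ", $Some_Class::CData1, "\n";
```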
Just as the mindless enumeration of accessor methods for instance attributes
grows tedious after the first few (see the perltoot manpage), so too does the
repetition begin to grate when listing out accessor methods for class
data. Repetition runs counter to the primary virtue of a programmer:
Laziness, here manifesting as that innate urge every programmer feels
to factor out duplicate code whenever possible.
Here's what to do. First, make just one hash to hold all class attributes.
    package Some_Class;
    use strict;
    our %ClassData = (
        CData1 => "",
        CData2 => "",
    );
Using closures (see the perlref manpage) and direct access to the package symbol
table (see the perlmod manpage), now clone an accessor method for each key in
the %ClassData hash. Each of these methods is used to fetch or store
values to the specific, named class attribute.
    for my $datum (keys %ClassData) {
        no strict "refs";       # to register new methods in package
        *$datum = sub {
            shift;              # XXX: ignore calling class/object
            $ClassData{$datum} = shift if @_;
            return $ClassData{$datum};
        };
    }
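To see the cloned accessors at work, here is a self-contained sketch that
repeats the hash and loop in one file and then calls the generated methods.
The values "first" and "second" are arbitrary.

```perl
use strict;
use warnings;

{
    package Some_Class;
    our %ClassData = (
        CData1 => "",
        CData2 => "",
    );

    for my $datum (keys %ClassData) {
        no strict "refs";           # installing subs by computed name
        *$datum = sub {
            shift;                  # ignore class name or object
            $ClassData{$datum} = shift if @_;
            return $ClassData{$datum};
        };
    }
}

Some_Class->CData1("first");
Some_Class->CData2("second");

for my $name (sort keys %Some_Class::ClassData) {
    print "$name = ", Some_Class->$name(), "\n";
}

# The generated subs are ordinary methods as far as Perl is concerned.
print "CData2 is a real method\n" if Some_Class->can("CData2");
```

Each closure captures its own copy of $datum, which is why one loop body
can serve every attribute.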
It's true that you could work out a solution employing an &AUTOLOAD
method, but this approach is unlikely to prove satisfactory. Your
function would have to distinguish between class attributes and object
attributes; it could interfere with inheritance; and it would have to
be careful about DESTROY. Such complexity is uncalled for in most cases,
and certainly in this one.
You may wonder why we're rescinding strict refs for the loop. We're
manipulating the package's symbol table to introduce new function names
using symbolic references (indirect naming), which the strict pragma
would otherwise forbid. Normally, symbolic references are a dodgy
notion at best. This isn't just because they can be used accidentally
when you aren't meaning to. It's also because for most uses
to which beginning Perl programmers attempt to put symbolic references,
we have much better approaches, like nested hashes or hashes of arrays.
But there's nothing wrong with using symbolic references to manipulate
something that is meaningful only from the perspective of the package
symbol table, like method names or package variables. In other
words, when you want to refer to the symbol table, use symbol references.
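The contrast can be sketched like this. The fruit names and the count_*
function names are invented for the example: data that is merely keyed by
name goes in a hash, while a symbolic reference is reserved for the one
job that really concerns the symbol table, installing a named sub.

```perl
use strict;
use warnings;

# Tempting but dodgy: one global per name, reached symbolically, e.g.
#     no strict "refs";  ${"count_$fruit"}++;

# Usually better: the names become keys of a single hash.
my %count;
$count{$_}++ for qw(apple pear apple);

# Legitimate: the goal itself is a new entry in the symbol table.
for my $fruit (sort keys %count) {
    no strict "refs";
    *{"main::count_$fruit"} = sub { return $count{$fruit} };
}

print "apple: ", count_apple(), "\n";   # 2
print "pear:  ", count_pear(),  "\n";   # 1
```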
Clustering all the class attributes in one place has several advantages.
They're easy to spot, initialize, and change. The aggregation also
makes them convenient to access externally, such as from a debugger
or a persistence package. The only possible problem is that we don't
automatically know the name of each class's class object, should it have
one. This issue is addressed below in The Eponymous Meta-Object.
Suppose you have an instance of a derived class, and you access class
data using an inherited method call. Should that end up referring
to the base class's attributes, or to those in the derived class?
How would it work in the earlier examples? The derived class inherits
all the base class's methods, including those that access class attributes.
But what package are the class attributes stored in?
The answer is that, as written, class attributes are stored in the package into
which those methods were compiled. When you invoke the &CData1 method
on the name of the derived class or on one of that class's objects, the
version shown above is still run, so you'll access $Some_Class::CData1--or
in the method cloning version, $Some_Class::ClassData{CData1}.
Think of these class methods as executing in the context of their base
class, not in that of their derived class. Sometimes this is exactly
what you want. If Feline subclasses Carnivore, then the population of
Carnivores in the world should go up when a new Feline is born.
But what if you wanted to figure out how many Felines you have apart
from Carnivores? The current approach doesn't support that.
You'll have to decide on a case-by-case basis whether it makes any sense
for class attributes to be package-relative. If you want it to be so,
then stop ignoring the first argument to the function. Either it will
be a package name if the method was invoked directly on a class name,
or else it will be an object reference if the method was invoked on an
object reference. In the latter case, the ref() function provides the
class of that object.
    package Some_Class;
    sub CData1 {
        my $obclass = shift;
        my $class   = ref($obclass) || $obclass;
        my $varname = $class . "::CData1";
        no strict "refs";       # to access package data symbolically
        $$varname = shift if @_;
        return $$varname;
    }
And then do likewise for all other class attributes (such as CData2,
etc.) that you wish to access as package variables in the invoking package
instead of the compiling package as we had previously.
Once again we temporarily disable the strict references ban, because
otherwise we couldn't use the fully-qualified symbolic name for
the package global. This is perfectly reasonable: since all package
variables by definition live in a package, there's nothing wrong with
accessing them via that package's symbol table. That's what it's there
for (well, somewhat).
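A sketch of what package-relative attributes buy you, using the Carnivore
and Feline classes mentioned earlier. The Population attribute and the
constructor are assumptions made for the example.

```perl
use strict;
use warnings;

{
    package Carnivore;
    our $Population = 0;

    sub new {
        my $obclass = shift;
        my $class   = ref($obclass) || $obclass;
        my $self    = bless {}, $class;
        $self->Population($self->Population + 1);
        return $self;
    }

    # Reads or writes $Population in the *invoking* class's package.
    sub Population {
        my $obclass = shift;
        my $class   = ref($obclass) || $obclass;
        my $varname = $class . "::Population";
        no strict "refs";           # symbolic access to package data
        $$varname = shift if @_;
        return $$varname;
    }
}

{
    package Feline;
    our @ISA = ("Carnivore");
    our $Population = 0;            # Feline's own counter
}

Carnivore->new;
Feline->new for 1 .. 3;

print "Carnivores: ", Carnivore->Population, "\n";   # 1
print "Felines:    ", Feline->Population, "\n";      # 3
```

Note the trade-off: with this version a newborn Feline no longer bumps the
Carnivore count, which is the opposite of the behavior described earlier.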
What about just using a single hash for everything and then cloning
methods? What would that look like? The only difference would be the
closure used to produce new method entries for the class's symbol table.
    no strict "refs";
    *$datum = sub {
        my $obclass = shift;
        my $class   = ref($obclass) || $obclass;
        my $varname = $class . "::ClassData";
        $varname->{$datum} = shift if @_;
        return $varname->{$datum};
    }
It could be argued that the %ClassData hash in the previous example is
neither the most imaginative nor the most intuitive of names. Is there
something else that might make more sense, be more useful, or both?
As it happens, yes, there is. For the "class meta-object", we'll use
a package variable of the same name as the package itself. Within the
scope of a package Some_Class declaration, we'll use the eponymously
named hash %Some_Class as that class's meta-object. (Using an eponymously
named hash is somewhat reminiscent of classes that name their constructors
eponymously in the Python or C++ fashion. That is, class Some_Class would
use &Some_Class::Some_Class as a constructor, probably even exporting that
name as well. The StrNum class in Recipe 13.14 in The Perl Cookbook
does this, if you're looking for an example.)
This predictable approach has many benefits, including having a well-known
identifier to aid in debugging, transparent persistence,
or checkpointing. It's also the obvious name for monadic classes and
translucent attributes, discussed later.
Here's an example of such a class. Notice how the name of the
hash storing the meta-object is the same as the name of the package
used to implement the class.
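    package Some_Class;
    use strict;

    our %Some_Class = (         # our() is new to perl5.6
        CData1 => "",
        CData2 => "",
    );

    # this accessor is calling-package-relative
    sub CData1 {
        my $obclass = shift;
        my $class   = ref($obclass) || $obclass;
        no strict "refs";       # to access eponymous meta-object
        $class->{CData1} = shift if @_;
        return $class->{CData1};
    }

    # but this accessor is not
    sub CData2 {
        shift;                  # XXX: ignore calling class/object
        no strict "refs";       # to access eponymous meta-object
        __PACKAGE__ -> {CData2} = shift if @_;
        return __PACKAGE__ -> {CData2};
    }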
In the second accessor method, the __PACKAGE__ notation was used for
two reasons. First, to avoid hardcoding the literal package name
in the code in case we later want to change that name. Second, to
clarify to the reader that what matters here is the package currently
being compiled into, not the package of the invoking object or class.
If the long sequence of non-alphabetic characters bothers you, you can
always put the __PACKAGE__ in a variable first.
    sub CData2 {
        shift;                  # XXX: ignore calling class/object
        no strict "refs";       # to access eponymous meta-object
        my $class = __PACKAGE__;
        $class->{CData2} = shift if @_;
        return $class->{CData2};
    }
Even though we're using symbolic references for good not evil, some
folks tend to become unnerved when they see so many places with strict
ref checking disabled. Given a symbolic reference, you can always
produce a real reference (the reverse is not true, though). So we'll
create a subroutine that does this conversion for us. If invoked as a
function of no arguments, it returns a reference to the compiling class's
eponymous hash. Invoked as a class method, it returns a reference to
the eponymous hash of its caller. And when invoked as an object method,
this function returns a reference to the eponymous hash for whatever
class the object belongs to.
    package Some_Class;
    use strict;

    our %Some_Class = (         # our() is new to perl5.6
        CData1 => "",
        CData2 => "",
    );

    # tri-natured: function, class method, or object method
    sub _classobj {
        my $obclass = shift || __PACKAGE__;
        my $class   = ref($obclass) || $obclass;
        no strict "refs";       # to convert sym ref to real one
        return \%$class;
    }

    for my $datum (keys %{ _classobj() } ) {
        # turn off strict refs so that we can
        # register a method in the symbol table
        no strict "refs";
        *$datum = sub {
            use strict "refs";
            my $self = shift->_classobj();
            $self->{$datum} = shift if @_;
            return $self->{$datum};
        }
    }
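A brief sketch of how this behaves when a subclass enters the picture.
Derived_Class and the string values are hypothetical; the point is only
that a value stored through the subclass does not disturb the base class.

```perl
use strict;
use warnings;

{
    package Some_Class;
    our %Some_Class = (CData1 => "base default");

    sub _classobj {
        my $obclass = shift || __PACKAGE__;
        my $class   = ref($obclass) || $obclass;
        no strict "refs";           # convert symbolic ref to a real one
        return \%$class;
    }

    for my $datum (keys %{ _classobj() }) {
        no strict "refs";           # register a method by name
        *$datum = sub {
            use strict "refs";
            my $self = shift->_classobj();
            $self->{$datum} = shift if @_;
            return $self->{$datum};
        };
    }
}

{
    package Derived_Class;          # hypothetical subclass
    our @ISA = ("Some_Class");
}

Derived_Class->CData1("set on the subclass");
print "base:    ", Some_Class->CData1, "\n";
print "derived: ", Derived_Class->CData1, "\n";

# Called as a plain function, _classobj hands back the compiling
# class's own eponymous hash.
my $same = Some_Class::_classobj() == \%Some_Class::Some_Class;
print "function form returns the eponymous hash\n" if $same;
```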
A reasonably common strategy for handling class attributes is to store
a reference to each package variable on the object itself. This is
a strategy you've probably seen before, such as in the perltoot manpage and
the perlbot manpage, but there may be variations in the example below that you
haven't thought of before.
    package Some_Class;
    our($CData1, $CData2);              # our() is new to perl5.6

    sub new {
        my $obclass = shift;
        return bless my $self = {
            ObData1 => "",
            ObData2 => "",
            CData1  => \$CData1,
            CData2  => \$CData2,
        } => (ref $obclass || $obclass);
    }

    sub ObData1 {
        my $self = shift;
        $self->{ObData1} = shift if @_;
        return $self->{ObData1};
    }

    sub ObData2 {
        my $self = shift;
        $self->{ObData2} = shift if @_;
        return $self->{ObData2};
    }

    sub CData1 {
        my $self = shift;
        my $dataref = ref $self
                        ? $self->{CData1}
                        : \$CData1;
        $$dataref = shift if @_;
        return $$dataref;
    }

    sub CData2 {
        my $self = shift;
        my $dataref = ref $self
                        ? $self->{CData2}
                        : \$CData2;
        $$dataref = shift if @_;
        return $$dataref;
    }
As written above, a derived class will inherit these methods, which
will consequently access package variables in the base class's package.
This is not necessarily expected behavior in all circumstances. Here's an
example that uses a variable meta-object, taking care to access the
proper package's data.
    package Some_Class;
    use strict;

    our %Some_Class = (         # our() is new to perl5.6
        CData1 => "",
        CData2 => "",
    );

    sub _classobj {
        my $self  = shift;
        my $class = ref($self) || $self;
        no strict "refs";
        # get (hard) ref to eponymous meta-object
        return \%$class;
    }

    sub new {
        my $obclass  = shift;
        my $classobj = $obclass->_classobj();
        bless my $self = {
            ObData1 => "",
            ObData2 => "",
            CData1  => \$classobj->{CData1},
            CData2  => \$classobj->{CData2},
        } => (ref $obclass || $obclass);
        return $self;
    }

    sub ObData1 {
        my $self = shift;
        $self->{ObData1} = shift if @_;
        return $self->{ObData1};
    }

    sub ObData2 {
        my $self = shift;
        $self->{ObData2} = shift if @_;
        return $self->{ObData2};
    }

    sub CData1 {
        my $self = shift;
        $self = $self->_classobj() unless ref $self;
        my $dataref = $self->{CData1};
        $$dataref = shift if @_;
        return $$dataref;
    }

    sub CData2 {
        my $self = shift;
        $self = $self->_classobj() unless ref $self;
        my $dataref = $self->{CData2};
        $$dataref = shift if @_;
        return $$dataref;
    }
Not only are we now strict refs clean, using an eponymous meta-object
seems to make the code cleaner. Unlike the previous version, this one
does something interesting in the face of inheritance: it accesses the
class meta-object in the invoking class instead of the one into which
the method was initially compiled.
You can easily access data in the class meta-object, making
it easy to dump the complete class state using an external mechanism such
as when debugging or implementing a persistent class. This works because
the class meta-object is a package variable, has a well-known name, and
clusters all its data together. (Transparent persistence
is not always feasible, but it's certainly an appealing idea.)
There's still no check that object accessor methods have not been
invoked on a class name. If strict ref checking is enabled, you'd
blow up. If not, then you get the eponymous meta-object. What you do
with--or about--this is up to you. The next two sections demonstrate
innovative uses for this powerful feature.
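The two outcomes can be seen side by side in a small sketch. The
ObData1_lax method is a made-up twin of ObData1 that differs only in
rescinding strict refs, and the stored string is arbitrary.

```perl
use strict;
use warnings;

{
    package Some_Class;
    our %Some_Class = (ObData1 => "stored in the meta-object");

    # Object accessor compiled with strict refs in force.
    sub ObData1 {
        my $self = shift;
        $self->{ObData1} = shift if @_;
        return $self->{ObData1};
    }

    # Hypothetical twin with strict refs switched off.
    sub ObData1_lax {
        my $self = shift;
        no strict "refs";
        $self->{ObData1} = shift if @_;
        return $self->{ObData1};
    }
}

# Invoked on the class name, $self is just the string "Some_Class".
my $survived = eval { Some_Class->ObData1(); 1 };
print "strict accessor blew up: $@" unless $survived;

# Without strict refs the same string quietly names %Some_Class.
print "lax accessor returned: ", Some_Class->ObData1_lax(), "\n";
```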
Some of the standard modules shipped with Perl provide class interfaces
without any attribute methods whatsoever. The most commonly used module
not numbered amongst the pragmata, the Exporter module, is a class with
neither constructors nor attributes. Its job is simply to provide a
standard interface for modules wishing to export part of their namespace
into that of their caller. Modules use the Exporter's &import method by
setting their inheritance list in their package's @ISA array to mention
"Exporter". But class Exporter provides no constructor, so you can't
have several instances of the class. In fact, you can't have any--it
just doesn't make any sense. All you get is its methods. Its interface
contains no statefulness, so state data is wholly superfluous.
Another sort of class that pops up from time to time is one that supports
a unique instance. Such classes are called monadic classes, or less
formally, singletons or highlander classes.
If a class is monadic, where do you store its state, that is,
its attributes? How do you make sure that there's never more than
one instance? While you could merely use a slew of package variables,
it's a lot cleaner to use the eponymously named hash. Here's a complete
example of a monadic class:
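    package Cosmos;
    %Cosmos = ();

    # accessor method for "name" attribute
    sub name {
        my $self = shift;
        $self->{name} = shift if @_;
        return $self->{name};
    }

    # read-only accessor method for "birthday" attribute
    sub birthday {
        my $self = shift;
        die "can't reset birthday" if @_;  # XXX: croak() is better
        return $self->{birthday};
    }

    # accessor method for "stars" attribute
    sub stars {
        my $self = shift;
        $self->{stars} = shift if @_;
        return $self->{stars};
    }

    # oh my - one of our stars just went out!
    sub supernova {
        my $self = shift;
        my $count = $self->stars();
        $self->stars($count - 1) if $count > 0;
    }

    # constructor/initializer method - fix by reboot
    sub bigbang {
        my $self = shift;
        %$self = (
            name     => "the world according to tchrist",
            birthday => time(),
            stars    => 0,
        );
        return $self;       # yes, it's probably a class.  SURPRISE!
    }

    # After the class is compiled, but before any use or require
    # returns, we start off the universe with a bang.
    __PACKAGE__ -> bigbang();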
Hold on, that doesn't look like anything special. Those attribute
accessors look no different than they would if this were a regular class
instead of a monadic one. The crux of the matter is there's nothing
that says that $self must hold a reference to a blessed object. It merely
has to be something you can invoke methods on. Here the package name
itself, Cosmos, works as an object. Look at the &supernova method. Is that
a class method or an object method? The answer is that static analysis
cannot reveal the answer. Perl doesn't care, and neither should you.
In the three attribute methods, %$self is really accessing the %Cosmos
package variable.
If, like Stephen Hawking, you posit the existence of multiple, sequential,
and unrelated universes, then you can invoke the &bigbang method yourself
at any time to start everything all over again. You might think of
&bigbang as more of an initializer than a constructor, since the function
doesn't allocate new memory; it only initializes what's already there.
But like any other constructor, it does return a scalar value to use
for later method invocations.
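Here is a cut-down sketch of that idea in use. It keeps only the stars
attribute, and because the sketch runs under use strict, it rescinds strict
refs for the package so the class name can double as a hash name; the full
class above simply runs without strict.

```perl
use strict;
use warnings;

{
    package Cosmos;
    no strict "refs";       # $self is the string "Cosmos", used as a hash name
    our %Cosmos = ();

    sub stars {
        my $self = shift;
        $self->{stars} = shift if @_;
        return $self->{stars};
    }

    sub supernova {
        my $self  = shift;
        my $count = $self->stars();
        $self->stars($count - 1) if $count > 0;
    }

    sub bigbang {
        my $self = shift;
        %$self = (stars => 0);
        return $self;
    }

    __PACKAGE__->bigbang();
}

my $universe = Cosmos->bigbang();
print "bigbang returned a plain string: $universe\n" unless ref $universe;

$universe->stars(3);
$universe->supernova();
print "stars via \$universe:  ", $universe->stars(), "\n";     # 2
print "stars via class name: ", Cosmos->stars(), "\n";        # 2
print "stars in the package hash: $Cosmos::Cosmos{stars}\n";  # 2

Cosmos->bigbang();          # start everything over
print "after a new bang: ", Cosmos->stars(), "\n";            # 0
```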
Imagine that some day in the future, you decide that one universe just
isn't enough. You could write a new class from scratch, but you already
have an existing class that does what you want--except that it's monadic,
and you want more than just one cosmos.
That's what code reuse via subclassing is all about. Look how short
the new code is:
    package Multiverse;
    use Cosmos;
    @ISA = qw(Cosmos);

    sub new {
        my $protoverse = shift;
        my $class      = ref($protoverse) || $protoverse;
        my $self       = {};
        return bless($self, $class)->bigbang();
    }
    1;
Because we were careful to be good little creators when we designed our
Cosmos class, we can now reuse it without touching a single line of code
when it comes time to write our Multiverse class. The same code that
worked when invoked as a class method continues to work perfectly well
when invoked against separate instances of a derived class.
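To make that concrete, here is a one-file sketch with a trimmed Cosmos
(only the stars attribute) and the Multiverse subclass. Setting @ISA
directly stands in for "use Cosmos", since both packages live in the same
file here, and the star counts are arbitrary.

```perl
use strict;
use warnings;

{
    package Cosmos;
    no strict "refs";       # lets the class name double as a hash name
    our %Cosmos = ();

    sub stars {
        my $self = shift;
        $self->{stars} = shift if @_;
        return $self->{stars};
    }

    sub bigbang {
        my $self = shift;
        %$self = (stars => 0);
        return $self;
    }

    __PACKAGE__->bigbang();
}

{
    package Multiverse;
    our @ISA = ("Cosmos");  # stands in for "use Cosmos" in this sketch

    sub new {
        my $protoverse = shift;
        my $class      = ref($protoverse) || $protoverse;
        return bless({}, $class)->bigbang();
    }
}

my $here  = Multiverse->new();
my $there = Multiverse->new();

$here->stars(100);
$there->stars(7);
Cosmos->stars(1);           # the original, monadic universe

# Three independent star counts, all served by the same inherited code.
print join(" ", $here->stars(), $there->stars(), Cosmos->stars()), "\n";
```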
The astonishing thing about the Cosmos class above is that the value
returned by the &bigbang "constructor" is not a reference to a blessed
object at all. It's just the class's own name. A class name is, for
virtually all intents and purposes, a perfectly acceptable object.
It has state, behavior, and identity, the three crucial components
of an object system. It even manifests inheritance, polymorphism,
and encapsulation. And what more can you ask of an object?
To understand object orientation in Perl, it's important to recognize the
unification of what other programming languages might think of as class
methods and object methods into just plain methods. "Class methods"
and "object methods" are distinct only in the compartmentalizing mind
of the Perl programmer, not in the Perl language itself.
Along those same lines, a constructor is nothing special either, which
is one reason why Perl has no pre-ordained name for them. "Constructor"
is just an informal term loosely used to describe a method that returns
a scalar value that you can make further method calls against. So long
as it's either a class name or an object reference, that's good enough.
It doesn't even have to be a reference to a brand new object.
You can have as many--or as few--constructors as you want, and you can
name them whatever you care to. Blindly and obediently using new()
for each and every constructor you ever write is to speak Perl with
such a severe C++ accent that you do a disservice to both languages.
There's no reason to insist that each class have but one constructor,
or that a constructor be named new(), or that a constructor be
used solely as a class method and not an object method.
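As an illustration only, here is a hypothetical Temperature class with
three constructors, none of them named new(), one of which is meant to be
called on an existing object. All the names here are invented.

```perl
use strict;
use warnings;

{
    package Temperature;    # hypothetical class

    sub from_celsius {
        my ($obclass, $degrees) = @_;
        return bless { celsius => $degrees }, ref($obclass) || $obclass;
    }

    sub from_fahrenheit {
        my ($obclass, $degrees) = @_;
        return $obclass->from_celsius(($degrees - 32) * 5 / 9);
    }

    # A constructor that makes sense only as an object method.
    sub warmer_by {
        my ($self, $delta) = @_;
        return $self->from_celsius($self->{celsius} + $delta);
    }

    sub celsius { return $_[0]{celsius} }
}

my $boiling = Temperature->from_fahrenheit(212);
my $hotter  = $boiling->warmer_by(10);

print "boiling: ", $boiling->celsius, "\n";   # 100
print "hotter:  ", $hotter->celsius, "\n";    # 110
```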
The next section shows how useful it can be to further distance ourselves
from any formal distinction between class method calls and object method
calls, both in constructors and in accessor methods.
A package's eponymous hash can be used for more than just containing
per-class, global state data. It can also serve as a sort of template
containing default settings for object attributes. These default
settings can then be used in constructors for initialization of a
particular object. The class's eponymous hash can also be used to
implement translucent attributes. A translucent attribute is one
that has a class-wide default. Each object can set its own value for the
attribute, in which case $object->attribute() returns that value.
But if no value has been set, then $object->attribute() returns
the class-wide default.
We'll apply something of a copy-on-write approach to these translucent
attributes. If you're just fetching values from them, you get
translucency. But if you store a new value to them, that new value is
set on the current object. On the other hand, if you use the class as
an object and store the attribute value directly on the class, then the
meta-object's value changes, and later fetch operations on objects with
uninitialized values for those attributes will retrieve the meta-object's
new values. Objects with their own initialized values, however, won't
see any change.
Let's look at some concrete examples of using these properties before we
show how to implement them. Suppose that a class named Vermin
has a translucent data attribute called "color". First you set the color
in the meta-object, then you create three objects using a constructor
that happens to be named &spawn.
    use Vermin;
    Vermin->color("vermilion");

    $ob1 = Vermin->spawn();     # so that's where Jedi come from
    $ob2 = Vermin->spawn();
    $ob3 = Vermin->spawn();

    print $ob3->color();        # prints "vermilion"
Each of these objects' colors is now "vermilion", because that's the
meta-object's value for that attribute, and these objects do not have
individual color values set.
Changing the attribute on one object has no effect on other objects
previously created.
    $ob3->color("chartreuse");
    print $ob3->color();        # prints "chartreuse"
    print $ob1->color();        # prints "vermilion", translucently
If you now use $ob3 to spawn off another o |