perltooc - Tom's OO Tutorial for Class Data in Perl
When designing an object class, you are sometimes faced with the situation
of wanting common state shared by all objects of that class.
Such class attributes act somewhat like global variables for the entire
class, but unlike program-wide globals, class attributes have meaning only to
the class itself.
Here are a few examples where class attributes might come in handy:
- to keep a count of the objects you've created, or how many are still extant.
- to extract the name or file descriptor for a logfile used by a debugging method.
- to access collective data, like the total amount of cash dispensed by all ATMs in a network in a given day.
- to access the last object created by a class, or the most accessed object, or to retrieve a list of all objects.
Unlike a true global, class attributes should not be accessed directly.
Instead, their state should be inspected, and perhaps altered, only
through the mediated access of class methods. These class attribute
accessor methods are similar in spirit and function to accessors used
to manipulate the state of instance attributes on an object. They provide a
clear firewall between interface and implementation.
You should allow access to class attributes through either the class
name or any object of that class. If we assume that $an_object is of
type Some_Class, and the &Some_Class::population_count method accesses
class attributes, then these two invocations should both be possible,
and almost certainly equivalent.
Some_Class->population_count()
$an_object->population_count()
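As a minimal sketch of that equivalence, assuming a hypothetical constructor new and, purely for illustration, a package variable $Population as storage (only one of the options weighed below), both invocations return the same count:

```perl
use strict;
use warnings;

{
    package Some_Class;

    # Hypothetical class-wide counter; outsiders should use population_count().
    our $Population = 0;

    # Hypothetical constructor: each new object bumps the shared count.
    sub new {
        my $class = shift;
        $Population++;
        return bless {}, $class;
    }

    # Ignores its first argument, so it behaves identically whether
    # invoked on the class name or on an object.
    sub population_count {
        return $Population;
    }
}

package main;

my $an_object = Some_Class->new();
Some_Class->new();

my $via_class  = Some_Class->population_count();
my $via_object = $an_object->population_count();
print "$via_class $via_object\n";   # 2 2
```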
The question is, where do you store the state which that method accesses?
Unlike more restrictive languages like C++, where these are called
static data members, Perl provides no syntactic mechanism to declare
class attributes, any more than it provides a syntactic mechanism to
declare instance attributes. Perl provides the developer with a broad
set of powerful but flexible features that can be uniquely crafted to
the particular demands of the situation.
A class in Perl is typically implemented in a module. A module consists
of two complementary feature sets: a package for interfacing with the
outside world, and a lexical file scope for privacy. Either of these
two mechanisms can be used to implement class attributes. That means you
get to decide whether to put your class attributes in package variables
or to put them in lexical variables.
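The sketch below contrasts the two mechanisms. The class names Pkg_Class and Lex_Class and their bump and count methods are hypothetical, and each bare block stands in for the separate module file the class would normally occupy:

```perl
use strict;
use warnings;

{
    package Pkg_Class;
    our $Count = 0;          # package variable: also visible as $Pkg_Class::Count
    sub bump  { $Count++ }
    sub count { return $Count }
}

{
    package Lex_Class;
    my $Count = 0;           # lexical: visible only to code inside this block
    sub bump  { $Count++ }
    sub count { return $Count }
}

package main;

Pkg_Class->bump for 1 .. 3;
Lex_Class->bump for 1 .. 2;

my $pkg_count = Pkg_Class->count;
my $lex_count = Lex_Class->count;
print "$pkg_count $lex_count $Pkg_Class::Count\n";   # 3 2 3
```

Only the package variable can be reached from outside, as $Pkg_Class::Count; the lexical $Count in Lex_Class is reachable solely through the methods compiled in its scope.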
And those aren't the only decisions to make. If you choose to use package
variables, you can make your class attribute accessor methods either ignorant
of inheritance or sensitive to it. If you choose lexical variables,
you can elect to permit access to them from anywhere in the entire file
scope, or you can limit direct data access exclusively to the methods
implementing those attributes.
One of the easiest ways to solve a hard problem is to let someone else
do it for you! In this case, Class::Data::Inheritable (available on a
CPAN near you) offers a canned solution to the class data problem
using closures. So before you wade into this document, consider
having a look at that module.
Because a class in Perl is really just a package, using package variables
to hold class attributes is the most natural choice. This makes it simple
for each class to have its own class attributes. Let's say you have a class
called Some_Class that needs a couple of different attributes that you'd
like to be global to the entire class. The simplest thing to do is to
use package variables like $Some_Class::CData1 and $Some_Class::CData2
to hold these attributes. But we certainly don't want to encourage
outsiders to touch those data directly, so we provide methods
to mediate access.
For now, the accessor methods below simply ignore their first argument:
the part to the left of the arrow in a method invocation, which is
either a class name or an object reference.
package Some_Class;
sub CData1 {
    shift;                              # ignore the class name or object
    $Some_Class::CData1 = shift if @_;  # store a new value, if one was given
    return $Some_Class::CData1;
}
sub CData2 {
    shift;
    $Some_Class::CData2 = shift if @_;
    return $Some_Class::CData2;
}
This technique is highly legible and should be straightforward even
to the novice Perl programmer. Because the package variables are fully
qualified, they stand out clearly when reading the code. Unfortunately,
if you misspell one of these, you've introduced an error that's hard
to catch: use strict never complains about a fully qualified name, so
the typo silently refers to a brand-new variable. It's also somewhat
disconcerting to see the class name itself hard-coded in so many places.
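The misspelling hazard is easy to reproduce. In this sketch the setter set_CData1_buggy is hypothetical and deliberately misspells the variable as $Some_Class::CDtaa1; even under use strict it compiles, because fully qualified names are exempt from strict vars:

```perl
use strict;
use warnings;

{
    package Some_Class;

    sub CData1 {
        shift;
        $Some_Class::CData1 = shift if @_;
        return $Some_Class::CData1;
    }

    # Hypothetical setter with a typo: "CDtaa1" instead of "CData1".
    # Fully qualified names are exempt from strict vars, so this compiles
    # and quietly stores into a different package variable.
    sub set_CData1_buggy {
        shift;
        $Some_Class::CDtaa1 = shift;
        return $Some_Class::CDtaa1;
    }
}

package main;

Some_Class->set_CData1_buggy("hello");
my $value = Some_Class->CData1();
print defined $value ? "CData1 is $value\n" : "CData1 is still undefined\n";
```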
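Both these problems can be easily fixed. Just add the use strict
pragma, then pre-declare your package variables with our. Because the
accessors can now use short, unqualified names, the class name no
longer appears in every method, and strict turns a misspelled name into
a compile-time error. (The our operator, introduced in Perl 5.6, works
for package globals much as my works for scoped lexicals.)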
package Some_Class;
use strict;
our($CData1, $CData2);
sub CData1 {
    shift;
    $CData1 = shift if @_;
    return $CData1;
}
sub CData2 {
    shift;
    $CData2 = shift if @_;
    return $CData2;
}
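To see strict catch the same kind of typo once the names are unqualified, this sketch compiles a deliberately misspelled accessor inside a string eval, so the compile-time failure can be inspected rather than aborting the program. The class name Typo_Class is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical class Typo_Class: the accessor misspells $CData1 as $CDtaa1.
# Compiling it inside a string eval lets us inspect the error instead of
# having the whole program fail to compile.
my $compiled = eval q{
    package Typo_Class;
    use strict;
    our($CData1, $CData2);
    sub CData1 {
        shift;
        $CDtaa1 = shift if @_;    # typo: never declared with our()
        return $CData1;
    }
    1;
};
my $error  = $compiled ? "" : $@;
my $report = $compiled ? "compiled\n" : "rejected: $error";
print $report;
```

The reported error is strict's "Global symbol ... requires explicit package name" complaint, raised before any of the class's code can run.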
As with any other global variable, some programmers prefer to start their
package variables with capital letters. This helps clarity somewhat, but
because the package variables are no longer fully qualified, their
significance can be lost when reading the code. You can fix this easily
enough by choosing better names than were used here.
Just as the mindless enumeration of accessor methods for instance attributes
grows tedious after the first few (see the perltoot manpage), so too does the
repetition begin to grate when listing out accessor methods for class
data. Repetition runs counter to the primary virtue of a programmer:
Laziness, here manifesting as that innate urge every programmer feels
to factor out duplicate code whenever possible.
Here's what to do. First, make just one hash to hold all class attributes.
package Some_Class;
use strict;
our %ClassData = (
    CData1 => "",
    CData2 => "",
);
Using closures (see the perlref manpage) and direct access to the package symbol
table (see the perlmod manpage), now clone an accessor method for each key in
the %ClassData hash. Each of these methods fetches or stores the value
of one specific, named class attribute.
for my $datum (keys %ClassData) {
    no strict "refs";       # *$datum uses a string as a glob name
    *$datum = sub {
        shift;              # ignore the class name or object
        $ClassData{$datum} = shift if @_;
        return $ClassData{$datum};
    };
}
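Put together with a short usage example, the generated accessors behave just like the hand-written ones. This is a single-file sketch; the bare block stands in for the module file:

```perl
use strict;
use warnings;

{
    package Some_Class;
    our %ClassData = (
        CData1 => "",
        CData2 => "",
    );

    # Install one closure per key; each iteration's $datum is a fresh
    # lexical, so every closure remembers its own attribute name.
    for my $datum (keys %ClassData) {
        no strict "refs";   # needed to assign through the symbolic glob name
        *$datum = sub {
            shift;          # ignore the class name or object
            $ClassData{$datum} = shift if @_;
            return $ClassData{$datum};
        };
    }
}

package main;

Some_Class->CData1("red");
Some_Class->CData2("blue");
my @values = (Some_Class->CData1(), Some_Class->CData2());
print "@values\n";   # red blue
```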
It's true that you could work out a solution employing an &AUTOLOAD
method, but this approach is unlikely to prove satisfactory. Your
function would have to distinguish between class attributes and object
attributes; it could interfere with inheritance; and it would have to be
careful about DESTROY. Such complexity is uncalled for in most cases,
and certainly in this one.
You may wonder why we're rescinding strict refs for the loop. We're
manipulating the package's symbol table to introduce new function names
using symbolic references (indirect naming), which the strict pragma
would otherwise forbid. Normally, symbolic references are a dodgy
notion at best. This isn't just because they can be used accidentally
when you don't mean to. It's also because for most uses
to which beginning Perl programmers attempt to put symbolic references,
we have much better approaches, like nested hashes or hashes of arrays.
But there's nothing wrong with using symbolic references to manipulate
something that is meaningful only from the perspective of the package
symbol table, like method names or package variables. In other
words, when you want to refer to the symbol table, use symbolic references.
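A small sketch of why the loop needs no strict "refs": under strict refs, assigning through a string used as a glob name dies at run time, while the same assignment succeeds once strict refs is lifted for a single block. The method name greet is hypothetical:

```perl
use strict;
use warnings;

my $name = "greet";   # hypothetical method name held in a string

# With strict refs in force, using a string as a glob reference is fatal at run time.
my $installed = eval { *$name = sub { "hello" }; 1 };
my $error  = $installed ? "" : $@;
my $report = $installed ? "installed\n" : "refused: $error";
print $report;

# Lifting strict refs for just this block lets the same assignment succeed,
# creating the symbol-table entry *main::greet.
{
    no strict "refs";
    *$name = sub { "hello" };
}
my $greeting = main->can("greet")->();
print "$greeting\n";   # hello
```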
Clustering all the class attributes in one place has several advantages.
They're easy to spot, initialize, and change. The aggregation also
makes them convenient to access externally, such as from a debugger
or a persistence package. The only possible problem is that we don't
automatically know the name of each class's class object, should it have
one. This issue is addressed below, in "The Eponymous Meta-Object".
Suppose you have an instance of a derived class, and you access class
data using an inherited method call. Should that end up referring
to the base class's attributes, or to those in the derived class?
How would it work in the earlier examples? The derived class inherits
all the base class's methods, including those that access class attributes.
But what package are the class attributes stored in?
The answer is that, as written, class attributes are stored in the package into
which those methods were compiled. When you invoke the &CData1 method
on the name of the derived class or on one of that class's objects, the
version shown above is still run, so you'll access $Some_Class::CData1 (or,
in the method-cloning version, $Some_Class::ClassData{CData1}).
Think of these class methods as executing in the context of their base
class, not in that of their derived class. Sometimes this is exactly
what you want. If Feline subclasses Carnivore, then the population of
Carnivores in the world should go up when a new Feline is born.
But what if you wanted to figure out how many Felines you have apart
from Carnivores? The current approach doesn't support that.
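A minimal sketch of that situation, with hypothetical new and population methods defined only in Carnivore: because the counter lives in the package where the methods were compiled, Felines and Carnivores share one count:

```perl
use strict;
use warnings;

{
    package Carnivore;
    our $Population = 0;   # lives in Carnivore, where the methods are compiled

    sub new {
        my $class = shift;
        $Population++;     # always Carnivore's counter, whatever $class is
        return bless {}, $class;
    }

    sub population { return $Population }
}

{
    package Feline;
    our @ISA = ("Carnivore");   # Feline inherits new() and population()
}

package main;

Carnivore->new;
Feline->new;
Feline->new;

my $carnivores = Carnivore->population;
my $felines    = Feline->population;
print "$carnivores $felines\n";   # 3 3
```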
You'll have to decide on a case-by-case basis whether it makes any sense
for class attributes to be package-relative. If you want it to be so,
then stop ignoring the first argument to the function. Either it will
be a package name if the method was invoked directly on a class name,
or else it will be an object reference if the method was invoked on an
object reference. In the latter case, the ref() function provides the
class of that object.
package Some_Class;
sub CData1 {
    my $obclass = shift;
    my $class   = ref($obclass) || $obclass;
    my $varname = $class . "::CData1";
    no strict "refs";
    $$varname = shift if @_;
    return $$varname;
}
And then do likewise for all other class attributes (such as CData2,
etc.) that you wish to access as package variables in the invoking package
instead of the compiling package as we had previously.
Once again we temporarily disable the strict references ban, because
otherwise we couldn't use the fully-qualified symbolic name for
the package global. This is perfectly reasonable: since all package
variables by definition live in a package, there's nothing wrong with
accessing them via that package's symbol table. That's what it's there
for (well, somewhat).
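A runnable sketch of the package-relative accessor, assuming a hypothetical subclass Derived_Class: the base class and the subclass each end up with their own CData1 package variable, even though only one accessor was written:

```perl
use strict;
use warnings;

{
    package Some_Class;

    sub CData1 {
        my $obclass = shift;
        my $class   = ref($obclass) || $obclass;   # class name, even if given an object
        my $varname = $class . "::CData1";         # fully qualified symbolic name
        no strict "refs";
        $$varname = shift if @_;
        return $$varname;
    }
}

{
    package Derived_Class;              # hypothetical subclass
    our @ISA = ("Some_Class");
}

package main;

Some_Class->CData1("base value");
my $obj = bless {}, "Derived_Class";    # stands in for a real constructor
$obj->CData1("derived value");

my $base    = Some_Class->CData1();
my $derived = Derived_Class->CData1();
print "$base | $derived\n";   # base value | derived value
```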
What about just using a single hash for everything and then cloning
methods? What would that look like? The only difference would be the
closure used to produce new method entries for the class's symbol table.
for my $datum (keys %ClassData) {
    no strict "refs";
    *$datum = sub {
        my $obclass = shift;
        my $class   = ref($obclass) || $obclass;
        my $varname = $class . "::ClassData";
        $varname->{$datum} = shift if @_;
        return $varname->{$datum};
    };
}
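In this sketch (the subclass name Derived_Class is hypothetical), each class that inherits the cloned accessors stores its values in its own %ClassData package hash; the subclass declares one explicitly so its initial keys are visible:

```perl
use strict;
use warnings;

{
    package Some_Class;
    our %ClassData = (CData1 => "", CData2 => "");

    for my $datum (keys %ClassData) {
        no strict "refs";   # also covers the symbolic hash access inside the closure
        *$datum = sub {
            my $obclass = shift;
            my $class   = ref($obclass) || $obclass;
            my $varname = $class . "::ClassData";   # e.g. "Derived_Class::ClassData"
            $varname->{$datum} = shift if @_;
            return $varname->{$datum};
        };
    }
}

{
    package Derived_Class;
    our @ISA = ("Some_Class");
    our %ClassData = (CData1 => "", CData2 => "");   # the subclass's own storage
}

package main;

Some_Class->CData1("base");
Derived_Class->CData1("derived");

my $base    = Some_Class->CData1();
my $derived = Derived_Class->CData1();
print "$base / $derived\n";   # base / derived
```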
It could be argued that the %ClassData hash in the previous example is
neither the most imaginative nor the most intuitive of names. Is there
something else that might make more sense, be more useful, or both?
As it happens, yes, there is. For the "class meta-object", we'll use
a package variable of the same name as the package itself. Within the
scope of a package Some_Class declaration, we'll use the eponymously
named hash %Some_Class as that class's meta-object. (Using an eponymously
named hash is somewhat reminiscent of classes that name their constructors
eponymously in the Python or C++ fashion. That is, class Some_Class would
use &Some_Class::Some_Class as a constructor, probably even exporting that
name as well. The StrNum class in Recipe 13.14 in The Perl Cookbook
does this, if you're looking for an example.)
This predictable approach has many benefits, including having a well-known
identifier to aid in debugging, transparent persistence,
or checkpointing. It's also the obvious name for monadic classes and
translucent attributes, discussed later.
Here's an example of such a class. Notice how the name of the
hash storing the meta-object is the same as the name of the package
used to implement the class.
package Some_Class;
use strict;
our %Some_Class = (
    CData1 => "",
    CData2 => "",
);
sub CData1 {
    my $obclass = shift;
    my $class   = ref($obclass) || $obclass;
    no strict "refs";
    $class->{CData1} = shift if @_;
    return $class->{CData1};
}
sub CData2 {
    shift;
    no strict "refs";
    __PACKAGE__ -> {CData2} = shift if @_;
    return __PACKAGE__ -> {CData2};
}
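A short sketch exercising these two accessors on the base class itself, once through the class name and once through an instance (the bare bless stands in for a real constructor): both values land in the single eponymous hash %Some_Class:

```perl
use strict;
use warnings;

{
    package Some_Class;
    our %Some_Class = (    # the eponymous meta-object
        CData1 => "",
        CData2 => "",
    );

    sub CData1 {
        my $obclass = shift;
        my $class   = ref($obclass) || $obclass;
        no strict "refs";                  # $class is a string used as a hash name
        $class->{CData1} = shift if @_;
        return $class->{CData1};
    }

    sub CData2 {
        shift;
        no strict "refs";                  # __PACKAGE__ is the string "Some_Class"
        __PACKAGE__ -> {CData2} = shift if @_;
        return __PACKAGE__ -> {CData2};
    }
}

package main;

my $obj = bless {}, "Some_Class";
Some_Class->CData1("via class");
$obj->CData2("via object");

for my $key (sort keys %Some_Class::Some_Class) {
    print "$key = $Some_Class::Some_Class{$key}\n";
}
```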
In the second accessor method, the __PACKAGE__ notation was used for
two reasons. First, to avoid hardcoding the literal package name
in the code in case we later want to change that name. Second, to
clarify to the reader that what matters here is the package currently
being compiled into, not the package of the invoking object or class.
If the long sequence of non-alphabetic characters bothers you, you can
always put the __PACKAGE__ in a variable first.
sub CData2 {
    shift;
    no strict "refs";
    my $class = __PACKAGE__;
    $class->{CData2} = shift if @_;
    return $class->{CData2};
}
Even though we're using symbolic references for good, not evil, some
folks tend to become unnerved when they see so many places with strict
ref checking disabled. Given a symbolic reference, you can always
produce a real reference (the reverse is not true, though). So we'll
create a subroutine that does this conversion for us. If invoked as a
function of no arguments, it returns a reference to the compiling class's
eponymous hash. Invoked as a class method, it returns a reference to
the eponymous hash of the class it was invoked on. And when invoked as
an object method, this function returns a reference to the eponymous
hash for whatever class the object belongs to.
package Some_Class;
use strict;
our %Some_Class = (
    CData1 => "",
    CData2 => "",
);
sub _classobj {