|
perlguts - Introduction to the Perl API
This document attempts to describe how to use the Perl API, as well as
to provide some info on the basic workings of the Perl core. It is far
from complete and probably contains many errors. Please refer any
questions or comments to the author below.
Perl has three typedefs that handle Perl's three main data types:
SV Scalar Value
AV Array Value
HV Hash Value
Each typedef has specific routines that manipulate the various data types.
Perl uses a special typedef IV which is a simple signed integer type that is
guaranteed to be large enough to hold a pointer (as well as an integer).
Additionally, there is the UV, which is simply an unsigned IV.
Perl also uses two special typedefs, I32 and I16, which will always be at
least 32-bits and 16-bits long, respectively. (Again, there are U32 and U16,
as well.) They will usually be exactly 32 and 16 bits long, but on Crays
they will both be 64 bits.
An SV can be created and loaded with one command. There are five types of
values that can be loaded: an integer value (IV), an unsigned integer
value (UV), a double (NV), a string (PV), and another scalar (SV).
Seven routines cover these five types:
SV* newSViv(IV);
SV* newSVuv(UV);
SV* newSVnv(double);
SV* newSVpv(const char*, STRLEN);
SV* newSVpvn(const char*, STRLEN);
SV* newSVpvf(const char*, ...);
SV* newSVsv(SV*);
STRLEN is an integer type (Size_t, usually defined as size_t in
config.h) guaranteed to be large enough to represent the size of
any string that perl can handle.
In the unlikely case of a SV requiring more complex initialisation, you
can create an empty SV with newSV(len). If len is 0 an empty SV of
type NULL is returned, else an SV of type PV is returned with len + 1 (for
the NUL) bytes of storage allocated, accessible via SvPVX. In both cases
the SV has value undef.
SV *sv = newSV(0); /* no storage allocated */
SV *sv = newSV(10); /* 10 (+1) bytes of uninitialised storage allocated */
To change the value of an already-existing SV, there are eight routines:
void sv_setiv(SV*, IV);
void sv_setuv(SV*, UV);
void sv_setnv(SV*, double);
void sv_setpv(SV*, const char*);
void sv_setpvn(SV*, const char*, STRLEN);
void sv_setpvf(SV*, const char*, ...);
void sv_vsetpvfn(SV*, const char*, STRLEN, va_list *, SV **, I32, bool *);
void sv_setsv(SV*, SV*);
Notice that you can choose to specify the length of the string to be
assigned by using sv_setpvn, newSVpvn, or newSVpv, or you may
allow Perl to calculate the length by using sv_setpv or by specifying
0 as the second argument to newSVpv. Be warned, though, that Perl will
determine the string's length by using strlen, which depends on the
string terminating with a NUL character.
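The difference matters as soon as the data contains an embedded NUL. The following plain C sketch does not use the Perl API; the names measured_length, payload and payload_len are made up for illustration. It only shows why a strlen-based length, such as the one computed for sv_setpv, stops at the first NUL, while an explicit length of the kind passed to sv_setpvn covers the whole buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: the length a strlen-based routine would see. */
static size_t measured_length(const char *buf)
{
    assert(buf != NULL);
    return strlen(buf);   /* stops at the first NUL byte */
}

/* Five bytes of payload with a NUL in the middle, plus the implicit
 * terminating NUL that the string literal adds. */
static const char payload[] = "ab\0cd";

/* The explicit length counts every payload byte, embedded NUL included. */
static const size_t payload_len = sizeof(payload) - 1;
```

Here the measured length is 2 but the explicit length is 5, so a routine that relies on strlen would silently drop the last three bytes.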
The arguments of sv_setpvf are processed like sprintf, and the
formatted output becomes the value.
sv_vsetpvfn is an analogue of vsprintf, but it allows you to specify
either a pointer to a variable argument list or the address and length of
an array of SVs. The last argument points to a boolean; on return, if that
boolean is true, then locale-specific information has been used to format
the string, and the string's contents are therefore untrustworthy (see
the perlsec manpage). This pointer may be NULL if that information is not
important. Note that this function requires you to specify the length of
the format.
The sv_set*() functions are not generic enough to operate on values
that have "magic". See Magic Virtual Tables later in this document.
All SVs that contain strings should be terminated with a NUL character.
If a string is not NUL-terminated, there is a risk of
core dumps and corruptions from code which passes the string to C
functions or system calls which expect a NUL-terminated string.
Perl's own functions typically add a trailing NUL for this reason.
Nevertheless, you should be very careful when you pass a string stored
in an SV to a C function or system call.
To access the actual value that an SV points to, you can use the macros:
SvIV(SV*)
SvUV(SV*)
SvNV(SV*)
SvPV(SV*, STRLEN len)
SvPV_nolen(SV*)
which will automatically coerce the actual scalar type into an IV, UV, double,
or string.
In the SvPV macro, the length of the string returned is placed into the
variable len (this is a macro, so you do not use &len). If you do
not care what the length of the data is, use the SvPV_nolen macro.
Historically the SvPV macro with the global variable PL_na has been
used in this case. But that can be quite inefficient because PL_na must
be accessed in thread-local storage in threaded Perl. In any case, remember
that Perl allows arbitrary strings of data that may both contain NULs and
might not be terminated by a NUL.
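Also remember that C doesn't allow you to safely say foo(SvPV(s, len),
len);. The order in which function arguments are evaluated is unspecified,
so len may be read before SvPV has set it. It might work with your
compiler, but it won't work for everyone. Break this sort of statement up
into separate assignments: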
SV *s;
STRLEN len;
char * ptr;
ptr = SvPV(s, len);
foo(ptr, len);
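The same pattern can be tried out without Perl at all. In the sketch below, struct toy_sv, TOY_PV, is_hello, check_safely and greeting are invented stand-ins, not part of the Perl API. TOY_PV imitates SvPV only in that it yields a pointer and stores the length as a side effect.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an SV that holds a string. */
struct toy_sv {
    const char *pv;   /* string buffer */
    size_t      cur;  /* string length */
};

/* Like SvPV, this macro yields the buffer and, as a side effect,
 * stores the length in its second argument. */
#define TOY_PV(sv, len) ((len) = (sv)->cur, (sv)->pv)

/* Hypothetical consumer: returns 1 if ptr/len describe "hello". */
static int is_hello(const char *ptr, size_t len)
{
    return len == 5 && memcmp(ptr, "hello", 5) == 0;
}

/* Safe pattern: the assignment to len is complete before the call,
 * so len is always set by the time it is read. */
static int check_safely(const struct toy_sv *s)
{
    size_t len;
    const char *ptr;

    assert(s != NULL);
    ptr = TOY_PV(s, len);
    return is_hello(ptr, len);
}

/* Sample value for trying the function out. */
static const struct toy_sv greeting = { "hello", 5 };
```

Writing is_hello(TOY_PV(s, len), len) instead would read len in one argument and write it in another, with nothing in C to fix which happens first.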
If you want to know if the scalar value is TRUE, you can use:
SvTRUE(SV*)
Although Perl will automatically grow strings for you, if you need to force
Perl to allocate more memory for your SV, you can use the macro
SvGROW(SV*, STRLEN newlen)
which will determine if more memory needs to be allocated. If so, it will
call the function sv_grow. Note that SvGROW can only increase, not
decrease, the allocated memory of an SV and that it does not automatically
add a byte for a trailing NUL (perl's own string functions typically do
SvGROW(sv, len + 1)).
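The grow-only behaviour can be modelled in plain C. In this sketch, struct toy_buf and toy_grow are invented names rather than Perl API, and the real SvGROW and sv_grow differ in detail. The model only illustrates the two points made above: the allocation never shrinks, and no byte is added for a trailing NUL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of a string buffer with an allocated size. */
struct toy_buf {
    char  *pv;   /* allocated storage, or NULL */
    size_t len;  /* bytes allocated */
};

/* Ensure at least newlen bytes are allocated. The buffer is never
 * shrunk, and no extra byte is added for a trailing NUL: callers that
 * need one ask for n + 1. Returns the buffer, or NULL on failure. */
static char *toy_grow(struct toy_buf *b, size_t newlen)
{
    assert(b != NULL);
    if (newlen > b->len) {
        char *p = realloc(b->pv, newlen);
        if (p == NULL)
            return NULL;      /* the old buffer is left untouched */
        b->pv  = p;
        b->len = newlen;
    }
    return b->pv;
}
```

A caller that wants room for ten characters and a terminator would therefore request 11 bytes, mirroring the SvGROW(sv, len + 1) idiom.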
If you have an SV and want to know what kind of data Perl thinks is stored
in it, you can use the following macros to check the type of SV you have.
SvIOK(SV*)
SvNOK(SV*)
SvPOK(SV*)
You can get and set the current length of the string stored in an SV with
the following macros:
SvCUR(SV*)
SvCUR_set(SV*, I32 val)
You can also get a pointer to the end of the string stored in the SV
with the macro:
SvEND(SV*)
But note that these last three macros are valid only if SvPOK() is true.
If you want to append something to the end of the string stored in an SV*,
you can use the following functions:
void sv_catpv(SV*, const char*);
void sv_catpvn(SV*, const char*, STRLEN);
void sv_catpvf(SV*, const char*, ...);
void sv_vcatpvfn(SV*, const char*, STRLEN, va_list *, SV **, I32, bool *);
void sv_catsv(SV*, SV*);
The first function calculates the length of the string to be appended by
using strlen. In the second, you specify the length of the string
yourself. The third function processes its arguments like sprintf and
appends the formatted output. The fourth function works like vsprintf.
You can specify the address and length of an array of SVs instead of the
va_list argument. The fifth function extends the string stored in the first
SV with the string stored in the second SV. It also forces the second SV
to be interpreted as a string.
The sv_cat*() functions are not generic enough to operate on values that
have "magic". See Magic Virtual Tables later in this document.
If you know the name of a scalar variable, you can get a pointer to its SV
by using the following:
SV* get_sv("package::varname", FALSE);
This returns NULL if the variable does not exist.
If you want to know if this variable (or any other SV) is actually defined,
you can call:
SvOK(SV*)
The scalar undef value is stored in an SV instance called PL_sv_undef.
Its address can be used whenever an SV* is needed. Make sure that
you don't try to compare a random sv with &PL_sv_undef. For example
when interfacing Perl code, it'll work correctly for:
foo(undef);
But won't work when called as:
$x = undef;
foo($x);
So, to repeat: always use SvOK() to check whether an sv is defined.
Also you have to be careful when using &PL_sv_undef as a value in
AVs or HVs (see AVs, HVs and undefined values).
There are also the two values PL_sv_yes and PL_sv_no, which contain
boolean TRUE and FALSE values, respectively. Like PL_sv_undef, their
addresses can be used whenever an SV* is needed.
Do not be fooled into thinking that (SV *) 0 is the same as &PL_sv_undef.
Take this code:
SV* sv = (SV*) 0;
if (I-am-to-return-a-real-value) {
sv = sv_2mortal(newSViv(42));
}
sv_setsv(ST(0), sv);
This code tries to return a new SV (which contains the value 42) if it should
return a real value, or undef otherwise. Instead it has returned a NULL
pointer which, somewhere down the line, will cause a segmentation violation,
bus error, or just weird results. Change the zero to &PL_sv_undef in the
first line and all will be well.
To free an SV that you've created, call SvREFCNT_dec(SV*). Normally this
call is not necessary (see Reference Counts and Mortality).
Perl provides the function sv_chop to efficiently remove characters
from the beginning of a string; you give it an SV and a pointer to
somewhere inside the PV, and it discards everything before the
pointer. The efficiency comes by means of a little hack: instead of
actually removing the characters, sv_chop sets the flag OOK
(offset OK) to signal to other functions that the offset hack is in
effect, and it puts the number of bytes chopped off into the IV field
of the SV. It then moves the PV pointer (called SvPVX) forward that
many bytes, and adjusts SvCUR and SvLEN.
Hence, at this point, the start of the buffer that we allocated lives
at SvPVX(sv) - SvIV(sv) in memory and the PV pointer is pointing
into the middle of this allocated storage.
This is best demonstrated by example:
% ./perl -Ilib -MDevel::Peek -le '$a="12345"; $a=~s/.//; Dump($a)'
SV = PVIV(0x8128450) at 0x81340f0
REFCNT = 1
FLAGS = (POK,OOK,pPOK)
IV = 1 (OFFSET)
PV = 0x8135781 ( "1" . ) "2345"\0
CUR = 4
LEN = 5
Here the number of bytes chopped off (1) is put into IV, and
Devel::Peek::Dump helpfully reminds us that this is an offset. The
portion of the string between the "real" and the "fake" beginnings is
shown in parentheses, and the values of SvCUR and SvLEN reflect
the fake beginning, not the real one.
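The bookkeeping behind the offset hack can be sketched in plain C. The struct toy_str and the functions toy_chop and toy_real_start below are a made-up model, not Perl's actual data structures; the field comments say which SV property each one stands in for.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the string-related fields of an SV. */
struct toy_str {
    char  *pv;      /* visible start of the string (like SvPVX) */
    size_t cur;     /* visible string length (like SvCUR) */
    size_t len;     /* bytes allocated from pv onwards (like SvLEN) */
    size_t offset;  /* bytes chopped so far (kept in the IV slot) */
};

/* Discard everything before ptr, which must point inside the string. */
static void toy_chop(struct toy_str *s, char *ptr)
{
    size_t delta;

    assert(s != NULL);
    assert(ptr >= s->pv && ptr <= s->pv + s->cur);
    delta = (size_t)(ptr - s->pv);

    s->pv     += delta;   /* no bytes are moved */
    s->cur    -= delta;
    s->len    -= delta;
    s->offset += delta;
}

/* Real start of the allocation, needed when the buffer is freed. */
static char *toy_real_start(const struct toy_str *s)
{
    assert(s != NULL);
    return s->pv - s->offset;
}
```

Chopping one byte off a six-byte buffer holding "12345" leaves cur at 4, len at 5 and offset at 1 in this model, the same figures that appear in the Dump output above.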
Something similar to the offset hack is performed on AVs to enable
efficient shifting and splicing off the beginning of the array; while
AvARRAY points to the first element in the array that is visible from
Perl, AvALLOC points to the real start of the C array. These are
usually the same, but a shift operation can be carried out by
increasing AvARRAY by one and decreasing AvFILL and AvLEN.
Again, the location of the real start of the C array only comes into
play when freeing the array. See av_shift in av.c.
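A comparable toy model shows the array version of the trick. The struct toy_av and the function toy_shift are invented for illustration and hold plain ints rather than SV pointers; see av_shift in av.c for the real logic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of an array: alloc is the real start of the C
 * array, array is the first element visible to the program. */
struct toy_av {
    int  *alloc;  /* like AvALLOC */
    int  *array;  /* like AvARRAY */
    long  fill;   /* highest visible index, -1 if empty (like AvFILL) */
    long  max;    /* highest index that fits from array onwards */
};

/* Remove and return the first element without moving the others.
 * Sets *ok to 0 and returns 0 if the array is empty. */
static int toy_shift(struct toy_av *av, int *ok)
{
    int first;

    assert(av != NULL && ok != NULL);
    if (av->fill < 0) {
        *ok = 0;
        return 0;
    }
    first = av->array[0];
    av->array++;      /* the visible start moves forward ... */
    av->fill--;       /* ... so the visible part shrinks by one */
    av->max--;
    *ok = 1;
    return first;
}
```

After one shift, array is one element ahead of alloc, and only alloc is suitable for handing back to the allocator.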
Recall that the usual method of determining the type of scalar you have is
to use Sv*OK macros. Because a scalar can be both a number and a string,
these macros will usually return TRUE, and calling the Sv*V
macros will do the appropriate conversion of string to integer/double or
integer/double to string.
If you really need to know if you have an integer, double, or string
pointer in an SV, you can use the following three macros instead:
SvIOKp(SV*)
SvNOKp(SV*)
SvPOKp(SV*)
These will tell you if you truly have an integer, double, or string pointer
stored in your SV. The "p" stands for private.
There are various ways in which the private and public flags may differ.
For example, a tied SV may have a valid underlying value in the IV slot
(so SvIOKp is true), but the data should be accessed via the FETCH
routine rather than directly, so SvIOK is false. Another is when
numeric conversion has occurred and precision has been lost: only the
private flag is set on 'lossy' values. So when an NV is converted to an
IV with loss, SvIOKp, SvNOKp and SvNOK will be set, while SvIOK won't be.
In general, though, it's best to use the Sv*V macros.
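The lossy-conversion case can be illustrated with a small model. The flag names and values below (TOY_IOK and friends), struct toy_num, and both functions are hypothetical; the real flags are defined in sv.h and are set by Perl's own conversion routines.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits; the real values live in sv.h. */
enum {
    TOY_IOK  = 1 << 0,  /* public: integer value may be used freely */
    TOY_NOK  = 1 << 1,  /* public: double value may be used freely */
    TOY_IOKp = 1 << 2,  /* private: IV slot holds something */
    TOY_NOKp = 1 << 3   /* private: NV slot holds something */
};

struct toy_num {
    unsigned flags;
    long     iv;
    double   nv;
};

/* Start from a scalar holding only a double. */
static struct toy_num toy_from_double(double d)
{
    struct toy_num n = { TOY_NOK | TOY_NOKp, 0, d };
    return n;
}

/* Cache an integer version of the double. The public integer flag is
 * set only when the conversion lost nothing. Assumes nv is within the
 * range of long. */
static void toy_cache_integer(struct toy_num *n)
{
    assert(n != NULL);
    n->iv = (long)n->nv;
    n->flags |= TOY_IOKp;
    if ((double)n->iv == n->nv)
        n->flags |= TOY_IOK;
}
```

In this model 3.5 ends up with the private integer flag only, while 4.0 gets both the private and the public one.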
There are two ways to create and load an AV. The first method creates an
empty AV:
AV* newAV();
The second method both creates the AV and initially populates it with SVs:
AV* av_make(I32 num, SV **ptr);
The second argument points to an array containing num SV*'s. Once the
AV has been created, the SVs can be destroyed, if so desired.
Once the AV has been created, the following operations are possible on AVs:
void av_push(AV*, SV*);
SV* av_pop(AV*);
SV* av_shift(AV*);
void av_unshift(AV*, I32 num);
These should be familiar operations, with the exception of av_unshift.
This routine adds num elements at the front of the array with the undef
value. You must then use av_store (described below) to assign values
to these new elements.
Here are some other functions:
I32 av_len(AV*);
SV** av_fetch(AV*, I32 key, I32 lval);
SV** av_store(AV*, I32 key, SV* val);
The av_len function returns the highest index value in the array (just
like $#array in Perl). If the array is empty, -1 is returned. The
av_fetch function returns the value at index key, but if lval
is non-zero, then av_fetch will store an undef value at that index.
The av_store function stores the value val at index key, and does
not increment the reference count of val. Thus the caller is responsible
for taking care of that, and if av_store returns NULL, the caller will
have to decrement the reference count to avoid a memory leak. Note that
av_fetch and av_store both return SV**'s, not SV*'s as their
return value.
void av_clear(AV*);
void av_undef(AV*);
void av_extend(AV*, I32 key);
The av_clear function deletes all the elements in the AV* array, but
does not actually delete the array itself. The av_undef function will
delete all the elements in the array plus the array itself. The
av_extend function extends the array so that it contains at least key+1
elements. If key+1 is less than the currently allocated length of the array,
then nothing is done.
If you know the name of an array variable, you can get a pointer to its AV
by using the following:
AV* get_av("package::varname", FALSE);
This returns NULL if the variable does not exist.
See Understanding the Magic of Tied Hashes and Arrays for more
information on how to use the array access functions on tied arrays.
To create an HV, you use the following routine:
HV* newHV();
Once the HV has been created, the following operations are possible on HVs:
SV** hv_store(HV*, const char* key, U32 klen, SV* val, U32 hash);
SV** hv_fetch(HV*, const char* key, U32 klen, I32 lval);
The klen parameter is the length of the key being passed in (Note that
you cannot pass 0 in as a value of klen to tell Perl to measure the
length of the key). The val argument contains the SV pointer to the
scalar being stored, and hash is the precomputed hash value (zero if
you want hv_store to calculate it for you). The lval parameter
indicates whether this fetch is actually a part of a store operation, in
which case a new undefined value will be added to the HV with the supplied
key and hv_fetch will return as if the value had already existed.
Remember that hv_store and hv_fetch return SV**'s and not just
SV*. To access the scalar value, you must first dereference the return
value. However, you should check to make sure that the return value is
not NULL before dereferencing it.
These two functions check whether a hash table entry exists and delete it, respectively.
bool hv_exists(HV*, const char* key, U32 klen);
SV* hv_delete(HV*, const char* key, U32 klen, I32 flags);
If flags does not include the G_DISCARD flag then hv_delete will
create and return a mortal copy of the deleted value.
And more miscellaneous functions:
void hv_clear(HV*);
void hv_undef(HV*);
Like their AV counterparts, hv_clear deletes all the entries in the hash
table but does not actually delete the hash table. The hv_undef function deletes
both the entries and the hash table itself.
Perl keeps the actual data in a linked list of structures with a typedef of HE.
These contain the actual key and value pointers (plus extra administrative
overhead). The key is a string pointer; the value is an SV*. However,
once you have an HE*, to get the actual key and value, use the routines
specified below.
I32 hv_iterinit(HV*);
/* Prepares starting point to traverse hash table */
HE* hv_iternext(HV*);
/* Get the next entry, and return a pointer to a
structure that has both the key and value */
char* hv_iterkey(HE* entry, I32* retlen);
/* Get the key from an HE structure and also return
the length of the key string */
SV* hv_iterval(HV*, HE* entry);
/* Return an SV pointer to the value of the HE
structure */
SV* hv_iternextsv(HV*, char** key, I32* retlen);
/* This convenience routine combines hv_iternext,
hv_iterkey, and hv_iterval. The key and retlen
arguments are return values for the key and its
length. The value is the SV* that the function returns */
If you know the name of a hash variable, you can get a pointer to its HV
by using the following:
HV* get_hv("package::varname", FALSE);
This returns NULL if the variable does not exist.
The hash algorithm is defined in the PERL_HASH(hash, key, klen) macro:
hash = 0;
while (klen--)
hash = (hash * 33) + *key++;
hash = hash + (hash >> 5); /* after 5.6 */
The last step was added in version 5.6 to improve distribution of
lower bits in the resulting hash value.
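Written out as a standalone function, the algorithm looks roughly like this. The name toy_perl_hash is made up, and two details are assumptions of this sketch rather than statements about the macro: the accumulator is a 32-bit unsigned integer, and each key byte is read as an unsigned char.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch of the algorithm described above, using a 32-bit unsigned
 * accumulator so that overflow wraps around. */
static uint32_t toy_perl_hash(const char *key, size_t klen)
{
    uint32_t hash = 0;

    assert(key != NULL || klen == 0);
    while (klen--)
        hash = (hash * 33) + (unsigned char)*key++;
    hash = hash + (hash >> 5);   /* the step added in 5.6 */
    return hash;
}
```

For the one-byte key "a" (byte value 97) the loop yields 97, and the final step adds 97 >> 5 = 3, giving 100.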
See Understanding the Magic of Tied Hashes and Arrays for more
information on how to use the hash access functions o |