DB_File - Perl5 access to Berkeley DB version 1.x
use DB_File;
[$X =] tie %hash, 'DB_File', [$filename, $flags, $mode, $DB_HASH] ;
[$X =] tie %hash, 'DB_File', $filename, $flags, $mode, $DB_BTREE ;
[$X =] tie @array, 'DB_File', $filename, $flags, $mode, $DB_RECNO ;
$status = $X->del($key [, $flags]) ;
$status = $X->put($key, $value [, $flags]) ;
$status = $X->get($key, $value [, $flags]) ;
$status = $X->seq($key, $value, $flags) ;
$status = $X->sync([$flags]) ;
$status = $X->fd ;
$count = $X->get_dup($key) ;
@list = $X->get_dup($key) ;
%list = $X->get_dup($key, 1) ;
$status = $X->find_dup($key, $value) ;
$status = $X->del_dup($key, $value) ;
$a = $X->length;
$a = $X->pop ;
$X->push(list);
$a = $X->shift;
$X->unshift(list);
@r = $X->splice(offset, length, elements);
$old_filter = $db->filter_store_key ( sub { ... } ) ;
$old_filter = $db->filter_store_value( sub { ... } ) ;
$old_filter = $db->filter_fetch_key ( sub { ... } ) ;
$old_filter = $db->filter_fetch_value( sub { ... } ) ;
untie %hash ;
untie @array ;
DB_File is a module which allows Perl programs to make use of the
facilities provided by Berkeley DB version 1.x (if you have a newer
version of DB, see Using DB_File with Berkeley DB version 2 or greater).
It is assumed that you have a copy of the Berkeley DB manual pages at
hand when reading this documentation. The interface defined here
mirrors the Berkeley DB interface closely.
Berkeley DB is a C library which provides a consistent interface to a
number of database formats. DB_File provides an interface to all
three of the database types currently supported by Berkeley DB.
The file types are:
- DB_HASH
This database type allows arbitrary key/value pairs to be stored in data
files. This is equivalent to the functionality provided by other
hashing packages like DBM, NDBM, ODBM, GDBM, and SDBM. Remember, though,
the files created using DB_HASH are not compatible with any of the
other packages mentioned.
A default hashing algorithm, which will be adequate for most
applications, is built into Berkeley DB. If you do need to use your own
hashing algorithm, it is possible to write your own in Perl and have
DB_File use it instead.
- DB_BTREE
The btree format allows arbitrary key/value pairs to be stored in a
sorted, balanced tree.
As with the DB_HASH format, it is possible to provide a user-defined
Perl routine to perform the comparison of keys. By default, though, the
keys are stored in lexical order.
- DB_RECNO
DB_RECNO allows both fixed-length and variable-length flat text files
to be manipulated using the same key/value pair interface as in DB_HASH
and DB_BTREE. In this case the key will consist of a record (line)
number.
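To make the default BTREE ordering concrete, here is a minimal sketch. It assumes DB_File is installed and passes undef as the filename so the btree lives only in memory (see In Memory Databases); the keys are arbitrary sample data. Because the default comparison is a plain byte-wise (lexical) one, an uppercase key sorts before lowercase ones.

```perl
use strict;
use warnings;
use DB_File;

# An in-memory BTREE (undef filename) using the default lexical ordering.
my %tree;
tie %tree, 'DB_File', undef, O_RDWR|O_CREAT, 0666, $DB_BTREE
    or die "Cannot create in-memory btree: $!\n";

$tree{$_} = length $_ for qw(pear Apple banana);

# keys() walks the btree in key order, so no sort() is needed here.
my @ordered = keys %tree;
print "$_\n" for @ordered;    # Apple, banana, pear (uppercase sorts first)

untie %tree;
```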
Although DB_File is intended to be used with Berkeley DB version 1,
it can also be used with version 2, 3 or 4. In this case the interface is
limited to the functionality provided by Berkeley DB 1.x. Anywhere the
version 2 or greater interface differs, DB_File arranges for it to work
like version 1. This feature allows DB_File scripts that were built
with version 1 to be migrated to version 2 or greater without any changes.
If you want to make use of the new features available in Berkeley DB
2.x or greater, use the Perl module BerkeleyDB instead.
Note: The database file format has changed multiple times in Berkeley
DB versions 2, 3 and 4. If you cannot recreate your databases, you
must dump any existing databases with either the db_dump or the
db_dump185 utility that comes with Berkeley DB.
Once you have rebuilt DB_File to use Berkeley DB version 2 or greater,
your databases can be recreated using db_load. Refer to the Berkeley DB
documentation for further details.
Please read COPYRIGHT before using version 2.x or greater of Berkeley
DB with DB_File.
DB_File allows access to Berkeley DB files using the tie() mechanism
in Perl 5 (for full details, see tie() in the perlfunc manpage). This facility
allows DB_File to access Berkeley DB files using either an
associative array (for DB_HASH & DB_BTREE file types) or an ordinary
array (for the DB_RECNO file type).
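As a sketch of the array interface for DB_RECNO, each element of the tied array is one line of an ordinary text file. The example assumes the core File::Temp module is available to supply a scratch directory, and the file name lines.txt is hypothetical; the file is read back after untie, when the records have been written out.

```perl
use strict;
use warnings;
use DB_File;
use File::Temp qw(tempdir);

# A throwaway directory so the example cannot clobber an existing file.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/lines.txt";

my @lines;
tie @lines, 'DB_File', $file, O_RDWR|O_CREAT, 0666, $DB_RECNO
    or die "Cannot open $file: $!\n";

# Each array element is one record (line) of the flat text file.
push @lines, 'first line', 'second line';
$lines[2] = 'third line';

my $count  = scalar @lines;   # number of records
my $second = $lines[1];       # the tied array indexes records from 0
untie @lines;

# After untie the records are in a plain newline-separated file.
open my $fh, '<', $file or die "Cannot read $file: $!\n";
my @on_disk = <$fh>;
close $fh;
chomp @on_disk;

print "$count records; second is '$second'\n";
```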
In addition to the tie() interface, it is also possible to access most
of the functions provided in the Berkeley DB API directly.
See THE API INTERFACE.
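The tie() call returns the underlying object, and the API methods listed in the synopsis can be called on it directly. A minimal sketch, assuming an in-memory DB_HASH database with arbitrary sample keys; the methods return 0 on success, -1 on error, and 1 when a key is not found (or, for put with R_NOOVERWRITE, when the key already exists).

```perl
use strict;
use warnings;
use DB_File;

# Keep the object returned by tie() so the API methods can be called on it.
my %h;
my $db = tie %h, 'DB_File', undef, O_RDWR|O_CREAT, 0666, $DB_HASH
    or die "Cannot create database: $!\n";

# put() returns 0 on success; with R_NOOVERWRITE it returns 1 if the key exists.
my $put_status = $db->put('apple', 'red');
my $dup_status = $db->put('apple', 'green', R_NOOVERWRITE);

# get() stores the value in its second argument; 1 means "not found".
my ($value, $ignored);
my $get_status     = $db->get('apple', $value);
my $missing_status = $db->get('cherry', $ignored);

# del() returns 0 when the key was removed.
my $del_status = $db->del('apple');

print "put=$put_status dup=$dup_status get=$get_status ($value) ",
      "missing=$missing_status del=$del_status\n";

# Drop the extra reference so untie can close the database cleanly.
undef $db;
untie %h;
```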
Berkeley DB uses the function dbopen() to open or create a database.
Here is the C prototype for dbopen():
DB*
dbopen (const char * file, int flags, int mode,
        DBTYPE type, const void * openinfo)
The parameter type is an enumeration which specifies which of the 3
interface methods (DB_HASH, DB_BTREE or DB_RECNO) is to be used.
Depending on which of these is actually chosen, the final parameter,
openinfo points to a data structure which allows tailoring of the
specific interface method.
This interface is handled slightly differently in DB_File. Here is
an equivalent call using DB_File:
tie %array, 'DB_File', $filename, $flags, $mode, $DB_HASH ;
The filename, flags and mode parameters are the direct
equivalent of their dbopen() counterparts. The final parameter $DB_HASH
performs the function of both the type and openinfo parameters in
dbopen().
In the example above $DB_HASH is actually a pre-defined reference to a
hash object. DB_File has three of these pre-defined references.
Apart from $DB_HASH, there are also $DB_BTREE and $DB_RECNO.
The keys allowed in each of these pre-defined references are limited to
the names used in the equivalent C structure. So, for example, the
$DB_HASH reference will only allow keys called bsize, cachesize,
ffactor, hash, lorder and nelem.
To change one of these elements, just assign to it like this:
$DB_HASH->{'cachesize'} = 10000 ;
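Because only the documented element names are accepted, a misspelt key is caught rather than silently ignored. A small sketch, assuming the rejection surfaces as a Perl exception (croak) that eval can trap; the exact wording of the message is not relied on.

```perl
use strict;
use warnings;
use DB_File;

# A documented element name is accepted.
$DB_HASH->{'cachesize'} = 10000;

# A misspelt element name is rejected with an exception, which eval traps.
my $ok = eval { $DB_HASH->{'cachsize'} = 10000; 1 };
print $ok ? "accepted\n" : "rejected: $@";
```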
The three predefined variables $DB_HASH, $DB_BTREE and $DB_RECNO are
adequate for most applications. If you do need to create extra
instances of these objects, constructors are available for each file
type.
Here are examples of the constructors and the valid options available
for DB_HASH, DB_BTREE and DB_RECNO respectively.
$a = new DB_File::HASHINFO ;
$a->{'bsize'} ;
$a->{'cachesize'} ;
$a->{'ffactor'};
$a->{'hash'} ;
$a->{'lorder'} ;
$a->{'nelem'} ;
$b = new DB_File::BTREEINFO ;
$b->{'flags'} ;
$b->{'cachesize'} ;
$b->{'maxkeypage'} ;
$b->{'minkeypage'} ;
$b->{'psize'} ;
$b->{'compare'} ;
$b->{'prefix'} ;
$b->{'lorder'} ;
$c = new DB_File::RECNOINFO ;
$c->{'bval'} ;
$c->{'cachesize'} ;
$c->{'psize'} ;
$c->{'flags'} ;
$c->{'lorder'} ;
$c->{'reclen'} ;
$c->{'bfname'} ;
The values stored in the hashes above are mostly the direct equivalent
of their C counterparts. Like their C counterparts, all are set to
default values, which means you don't have to set all of the
values when you only want to change one. Here is an example:
$a = new DB_File::HASHINFO ;
$a->{'cachesize'} = 12345 ;
tie %y, 'DB_File', "filename", $flags, 0777, $a ;
A few of the options need extra discussion here. When used, the C
equivalent of the keys hash, compare and prefix store pointers
to C functions. In DB_File these keys are used to store references
to Perl subs. Below are templates for each of the subs:
sub hash
{
    my ($data) = @_ ;
    ...
    return $hash ;
}
sub compare
{
    my ($key, $key2) = @_ ;
    ...
    return $result ;    # -1, 0 or 1
}
sub prefix
{
    my ($key, $key2) = @_ ;
    ...
    return $bytes ;
}
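Here is the hash template filled in, as a sketch. The function my_hash is hypothetical (a djb2-style string hash chosen only for illustration), not the algorithm built into Berkeley DB; whichever function you choose must be the one used every time a given database file is opened. The database here is held in memory so nothing is left on disk.

```perl
use strict;
use warnings;
use DB_File;

# Hypothetical replacement hash: a djb2-style string hash, for illustration.
# It returns an integer and must stay the same for the life of the database.
sub my_hash {
    my ($data) = @_;
    my $hash = 5381;
    for my $byte (unpack 'C*', $data) {
        # Keep the value below 2**31 so it fits a signed 32-bit integer.
        $hash = ($hash * 33 + $byte) % 2_147_483_647;
    }
    return $hash;
}

# A private HASHINFO object, so the global $DB_HASH is left untouched.
my $info = DB_File::HASHINFO->new;
$info->{'hash'} = \&my_hash;

my %h;
tie %h, 'DB_File', undef, O_RDWR|O_CREAT, 0666, $info
    or die "Cannot create database: $!\n";

$h{$_} = uc $_ for qw(apple banana cherry);
my $fetched = $h{'banana'};
untie %h;

print "banana -> $fetched\n";
```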
See Changing the BTREE sort order for an example of using the
compare template.
If you are using the DB_RECNO interface and you intend to make use of
bval, you should check out The 'bval' Option.
It is possible to omit some or all of the final 4 parameters in the
call to tie and let them take default values. As DB_HASH is the most
common file format used, the call:
tie %A, "DB_File", "filename" ;
is equivalent to:
tie %A, "DB_File", "filename", O_CREAT|O_RDWR, 0666, $DB_HASH ;
It is also possible to omit the filename parameter, so the
call:
tie %A, "DB_File" ;
is equivalent to:
tie %A, "DB_File", undef, O_CREAT|O_RDWR, 0666, $DB_HASH ;
See In Memory Databases for a discussion on the use of undef
in place of a filename.
Berkeley DB allows the creation of in-memory databases by using NULL
(that is, a (char *)0 in C) in place of the filename. DB_File
uses undef instead of NULL to provide this functionality.
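Combining the defaults described above with an in-memory database, a tie that names nothing but the class gives a scratch hash that never touches the disk. A minimal sketch; the key and value are arbitrary.

```perl
use strict;
use warnings;
use DB_File;

# With every optional parameter omitted, the filename is undef, so this is
# an in-memory DB_HASH database opened with O_CREAT|O_RDWR and mode 0666.
my %cache;
tie %cache, 'DB_File'
    or die "Cannot create in-memory database: $!\n";

$cache{'answer'} = 42;
my $answer = $cache{'answer'};
my $exists = exists $cache{'answer'} ? 1 : 0;

# Nothing is written to disk; the data disappears when the hash is untied.
untie %cache;

print "answer=$answer exists=$exists\n";
```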
The DB_HASH file format is probably the most commonly used of the three
file formats that DB_File supports. It is also very straightforward
to use.
This example shows how to create a database, add key/value pairs to the
database, delete key/value pairs and finally how to enumerate the
contents of the database.
use warnings ;
use strict ;
use DB_File ;
our (%h, $k, $v) ;
unlink "fruit" ;
tie %h, "DB_File", "fruit", O_RDWR|O_CREAT, 0666, $DB_HASH
or die "Cannot open file 'fruit': $!\n";
$h{"apple"} = "red" ;
$h{"orange"} = "orange" ;
$h{"banana"} = "yellow" ;
$h{"tomato"} = "red" ;
print "Banana Exists\n\n" if $h{"banana"} ;
delete $h{"apple"} ;
while (($k, $v) = each %h)
{ print "$k -> $v\n" }
untie %h ;
Here is the output:
Banana Exists
orange -> orange
tomato -> red
banana -> yellow
Note that, like ordinary associative arrays, the keys are
retrieved in an apparently random order.
The DB_BTREE format is useful when you want to store data in a given
order. By default the keys will be stored in lexical order, but as you
will see from the example shown in the next section, it is very easy to
define your own sorting function.
This script shows how to override the default sorting algorithm that
BTREE uses. Instead of using the normal lexical ordering, a
case-insensitive compare function will be used.
use warnings ;
use strict ;
use DB_File ;
my %h ;
sub Compare
{
    my ($key1, $key2) = @_ ;
    "\L$key1" cmp "\L$key2" ;
}
$DB_BTREE->{'compare'} = \&Compare ;
unlink "tree" ;
tie %h, "DB_File", "tree", O_RDWR|O_CREAT, 0666, $DB_BTREE
or die "Cannot open file 'tree': $!\n" ;
$h{'Wall'} = 'Larry' ;
$h{'Smith'} = 'John' ;
$h{'mouse'} = 'mickey' ;
$h{'duck'} = 'donald' ;
delete $h{"duck"} ;
foreach (keys %h)
{ print "$_\n" }
untie %h ;
Here is the output from the code above.
mouse
Smith
Wall
There are a few points to bear in mind if you want to change the
ordering in a BTREE database:
-
The new compare function must be specified when you create the database.
-
You cannot change the ordering once the database has been created. Thus
you must use the same compare function every time you access the
database.
-
Duplicate keys are entirely defined by the comparison function.
In the case-insensitive example above, the keys 'KEY' and 'key'
would be considered duplicates, and assigning to the second one
would overwrite the first. If duplicates are allowed for (with the
R_DUP flag discussed below), only a single copy of duplicate keys
is stored in the database, so (again with the example above) assigning
three values to the keys 'KEY', 'Key', and 'key' would leave just
the first key, 'KEY', in the database with three values. In some
situations this results in information loss, so care should be taken
to provide fully qualified comparison functions when necessary.
For example, the above comparison routine could be modified to
additionally compare case-sensitively if two keys are equal in the
case-insensitive comparison:
sub compare {
    my($key1, $key2) = @_;
    lc $key1 cmp lc $key2 ||
        $key1 cmp $key2;
}
And now you will only have duplicates when the keys themselves
are truly the same. (Note: in versions of the db library prior to
about November 1996, such duplicate keys were retained, so it was
possible to recover the original keys in sets of keys that
compared as equal.)
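The difference between the two comparison routines can be checked directly. This sketch stores the keys 'KEY', 'Key' and 'key' without R_DUP and counts how many distinct keys remain; the helper distinct_keys is hypothetical, and each btree is held in memory and built from its own DB_File::BTREEINFO object so the global $DB_BTREE is not modified.

```perl
use strict;
use warnings;
use DB_File;

# Hypothetical helper: store @keys under $compare, return the distinct count.
sub distinct_keys {
    my ($compare, @keys) = @_;
    my $info = DB_File::BTREEINFO->new;    # private copy, $DB_BTREE untouched
    $info->{'compare'} = $compare;

    my %h;
    tie %h, 'DB_File', undef, O_RDWR|O_CREAT, 0666, $info
        or die "Cannot create in-memory btree: $!\n";
    $h{$_} = 1 for @keys;
    my $n = scalar keys %h;
    untie %h;
    return $n;
}

my @keys = qw(KEY Key key);
my $case_insensitive = sub { lc $_[0] cmp lc $_[1] };
my $fully_qualified  = sub { lc $_[0] cmp lc $_[1] || $_[0] cmp $_[1] };

my $n_ci = distinct_keys($case_insensitive, @keys);   # the keys collapse to 1
my $n_fq = distinct_keys($fully_qualified,  @keys);   # all three survive

print "case-insensitive: $n_ci, fully qualified: $n_fq\n";
```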
The BTREE file type optionally allows a single key to be associated
with an arbitrary number of values. This option is enabled by setting
the flags element of $DB_BTREE to R_DUP when creating the database.
There are some difficulties in using the tied hash interface if you
want to manipulate a BTREE database with duplicate keys. Consider this
code:
use warnings ;
use strict ;
use DB_File ;
my ($filename, %h) ;
$filename = "tree" ;
unlink $filename ;
$DB_BTREE->{'flags'} = R_DUP ;
tie %h, "DB_File", $filename, O_RDWR|O_CREAT, 0666, $DB_BTREE
or die "Cannot open $filename: $!\n";
$h{'Wall'} = 'Larry' ;
$h{'Wall'} = 'Brick' ;
$h{'Wall'} = 'Brick' ;
$h{'Smith'} = 'John' ;
$h{'mouse'} = 'mickey' ;
foreach (sort keys %h)
{ print "$_ -> $h{$_}\n" }
untie %h ;
Here is the output:
Smith -> John
Wall -> Larry
Wall -> Larry
Wall -> Larry
mouse -> mickey
As you can see, three records have been successfully created with key Wall;
the only problem is that when they are retrieved from the database they
all seem to have the same value, namely Larry. The problem is caused
by the way that the associative array interface works. Basically, when
the associative array interface is used to fetch the value associated
with a given key, it will only ever retrieve the first value.
Although it may not be immediately obvious from the code above, the
associative array interface can be used to write values with duplicate
keys, but it cannot be used to read them back from the database.
The way to get around this problem is to use the Berkeley DB API method
called seq. This method allows sequential access to key/value
pairs. See THE API INTERFACE for details of both the seq method
and the API in general.
Here is the script above rewritten using the seq API method.
use warnings ;
use strict ;
use DB_File ;
my ($filename, $x, %h, $status, $key, $value) ;
$filename = "tree" ;
unlink $filename ;
$DB_BTREE->{'flags'} = R_DUP ;
$x = tie %h, "DB_File", $filename, O_RDWR|O_CREAT, 0666, $DB_BTREE
or die "Cannot open $filename: $!\n";
$h{'Wall'} = 'Larry' ;
$h{'Wall'} = 'Brick' ;
$h{'Wall'} = 'Brick' ;
$h{'Smith'} = 'John' ;
$h{'mouse'} = 'mickey' ;
$key = $value = 0 ;
for ($status = $x->seq($key, $value, R_FIRST) ;
     $status == 0 ;
     $status = $x->seq($key, $value, R_NEXT) )
    { print "$key -> $value\n" }
undef $x ;
untie %h ;
That prints:
Smith -> John
Wall -> Brick
Wall -> Brick
Wall -> Larry
mouse -> mickey
This time we have got all the key/value pairs, including the multiple
values associated with the key Wall.
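If you only need the values for a single key, seq can also position the cursor with R_CURSOR and then walk forward with R_NEXT. A sketch, assuming an in-memory btree loaded with the same sample data as above: for DB_BTREE, R_CURSOR lands on the smallest key greater than or equal to the one given, so the loop stops as soon as the returned key changes. The order in which the duplicates come back is not relied on.

```perl
use strict;
use warnings;
use DB_File;

my $btree = DB_File::BTREEINFO->new;
$btree->{'flags'} = R_DUP;    # allow duplicate keys

my %h;
my $x = tie %h, 'DB_File', undef, O_RDWR|O_CREAT, 0666, $btree
    or die "Cannot create in-memory btree: $!\n";

$h{'Wall'}  = $_ for qw(Larry Brick Brick);
$h{'Smith'} = 'John';
$h{'mouse'} = 'mickey';

# R_CURSOR positions the cursor at the first entry whose key is >= $key;
# R_NEXT then steps through the duplicates until the key changes.
my $wanted = 'Wall';
my ($key, $value) = ($wanted, '');
my @values;
for (my $status = $x->seq($key, $value, R_CURSOR);
     $status == 0 && $key eq $wanted;
     $status = $x->seq($key, $value, R_NEXT)) {
    push @values, $value;
}

print "$wanted: @values\n";

undef $x;
untie %h;
```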
To make life easier when dealing with duplicate keys, DB_File comes with
a few utility methods.
The get_dup method assists in
reading duplicate values from BTREE databases. The method can take the
following forms:
$count = $x->get_dup($key) ;
@list = $x->get_dup($key) ;