ActivePerl 5.8 documentation

DB_File - Perl5 access to Berkeley DB version 1.x


NAME

DB_File - Perl5 access to Berkeley DB version 1.x


SYNOPSIS

 use DB_File;
 [$X =] tie %hash,  'DB_File', [$filename, $flags, $mode, $DB_HASH] ;
 [$X =] tie %hash,  'DB_File', $filename, $flags, $mode, $DB_BTREE ;
 [$X =] tie @array, 'DB_File', $filename, $flags, $mode, $DB_RECNO ;
 $status = $X->del($key [, $flags]) ;
 $status = $X->put($key, $value [, $flags]) ;
 $status = $X->get($key, $value [, $flags]) ;
 $status = $X->seq($key, $value, $flags) ;
 $status = $X->sync([$flags]) ;
 $status = $X->fd ;
 # BTREE only
 $count = $X->get_dup($key) ;
 @list  = $X->get_dup($key) ;
 %list  = $X->get_dup($key, 1) ;
 $status = $X->find_dup($key, $value) ;
 $status = $X->del_dup($key, $value) ;
 # RECNO only
 $a = $X->length;
 $a = $X->pop ;
 $X->push(list);
 $a = $X->shift;
 $X->unshift(list);
 @r = $X->splice(offset, length, elements);
 # DBM Filters
 $old_filter = $db->filter_store_key  ( sub { ... } ) ;
 $old_filter = $db->filter_store_value( sub { ... } ) ;
 $old_filter = $db->filter_fetch_key  ( sub { ... } ) ;
 $old_filter = $db->filter_fetch_value( sub { ... } ) ;
 untie %hash ;
 untie @array ;


DESCRIPTION

DB_File is a module which allows Perl programs to make use of the facilities provided by Berkeley DB version 1.x (if you have a newer version of DB, see Using DB_File with Berkeley DB version 2 or greater). It is assumed that you have a copy of the Berkeley DB manual pages at hand when reading this documentation. The interface defined here mirrors the Berkeley DB interface closely.

Berkeley DB is a C library which provides a consistent interface to a number of database formats. DB_File provides an interface to all three of the database types currently supported by Berkeley DB.

The file types are:

DB_HASH

This database type allows arbitrary key/value pairs to be stored in data files. This is equivalent to the functionality provided by other hashing packages like DBM, NDBM, ODBM, GDBM, and SDBM. Remember though, the files created using DB_HASH are not compatible with any of the other packages mentioned.

A default hashing algorithm, which will be adequate for most applications, is built into Berkeley DB. If you do need to use your own hashing algorithm it is possible to write your own in Perl and have DB_File use it instead.

DB_BTREE

The btree format allows arbitrary key/value pairs to be stored in a sorted, balanced tree.

As with the DB_HASH format, it is possible to provide a user defined Perl routine to perform the comparison of keys. By default, though, the keys are stored in lexical order.

DB_RECNO

DB_RECNO allows both fixed-length and variable-length flat text files to be manipulated using the same key/value pair interface as in DB_HASH and DB_BTREE. In this case the key will consist of a record (line) number.
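As a rough illustration of the record-number idea, the sketch below ties an ordinary Perl array to an in-memory DB_RECNO database (undef in place of a filename, as described under In Memory Databases). The variable names are illustrative only; the push and length methods are the RECNO methods listed in the SYNOPSIS.

```perl
use strict ;
use warnings ;
use DB_File ;

# In-memory RECNO database: undef filename, so nothing is written to disk.
my @lines ;
my $x = tie @lines, 'DB_File', undef, O_RDWR|O_CREAT, 0666, $DB_RECNO
    or die "Cannot create in-memory RECNO database: $!\n" ;

# Each array element is one record; the array index selects the record.
$lines[0] = "first" ;
$lines[1] = "second" ;
$x->push("third") ;

my $second = $lines[1] ;
my $count  = $x->length ;
print "record at index 1 is '$second', $count records in total\n" ;

# Release the object before untie to avoid an "inner references" warning.
undef $x ;
untie @lines ;
```

Note that the tied array is indexed from 0, in the usual Perl fashion.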

Using DB_File with Berkeley DB version 2 or greater

Although DB_File is intended to be used with Berkeley DB version 1, it can also be used with version 2, 3 or 4. In this case the interface is limited to the functionality provided by Berkeley DB 1.x. Anywhere the version 2 or greater interface differs, DB_File arranges for it to work like version 1. This feature allows DB_File scripts that were built with version 1 to be migrated to version 2 or greater without any changes.

If you want to make use of the new features available in Berkeley DB 2.x or greater, use the Perl module BerkeleyDB instead.

Note: The database file format has changed multiple times in Berkeley DB version 2, 3 and 4. If you cannot recreate your databases, you must dump any existing databases with either the db_dump or the db_dump185 utility that comes with Berkeley DB. Once you have rebuilt DB_File to use Berkeley DB version 2 or greater, your databases can be recreated using db_load. Refer to the Berkeley DB documentation for further details.

Please read COPYRIGHT before using version 2.x or greater of Berkeley DB with DB_File.

Interface to Berkeley DB

DB_File allows access to Berkeley DB files using the tie() mechanism in Perl 5 (for full details, see tie() in the perlfunc manpage). This facility allows DB_File to access Berkeley DB files using either an associative array (for DB_HASH & DB_BTREE file types) or an ordinary array (for the DB_RECNO file type).

In addition to the tie() interface, it is also possible to access most of the functions provided in the Berkeley DB API directly. See THE API INTERFACE.
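The following is a minimal sketch of mixing the two styles, assuming an in-memory DB_HASH database and using only the put, get and del methods listed in the SYNOPSIS. The variable names are illustrative.

```perl
use strict ;
use warnings ;
use DB_File ;

my %h ;
# tie returns the underlying DB_File object; keep it to call API methods.
my $X = tie %h, 'DB_File', undef, O_RDWR|O_CREAT, 0666, $DB_HASH
    or die "Cannot create in-memory database: $!\n" ;

# put() stores a pair; as in the C API, a status of 0 means success.
my $put_status = $X->put("apple", "red") ;

# get() copies the stored value into its second argument.
my $value ;
my $get_status = $X->get("apple", $value) ;
print "apple -> $value\n" if $get_status == 0 ;

# The tied hash and the API object see the same data.
my $via_hash = $h{"apple"} ;
print "hash view: $via_hash\n" ;

my $del_status = $X->del("apple") ;
my $still_there = exists $h{"apple"} ? 1 : 0 ;

# Release the object before untie to avoid an "inner references" warning.
undef $X ;
untie %h ;
```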

Opening a Berkeley DB Database File

Berkeley DB uses the function dbopen() to open or create a database. Here is the C prototype for dbopen():

      DB*
      dbopen (const char * file, int flags, int mode, 
              DBTYPE type, const void * openinfo)

The parameter type is an enumeration which specifies which of the three interface methods (DB_HASH, DB_BTREE or DB_RECNO) is to be used. Depending on which of these is chosen, the final parameter, openinfo, points to a data structure which allows tailoring of the specific interface method.

This interface is handled slightly differently in DB_File. Here is an equivalent call using DB_File:

        tie %array, 'DB_File', $filename, $flags, $mode, $DB_HASH ;

The filename, flags and mode parameters are the direct equivalent of their dbopen() counterparts. The final parameter $DB_HASH performs the function of both the type and openinfo parameters in dbopen().

In the example above $DB_HASH is actually a pre-defined reference to a hash object. DB_File has three of these pre-defined references. Apart from $DB_HASH, there is also $DB_BTREE and $DB_RECNO.

The keys allowed in each of these pre-defined references are limited to the names used in the equivalent C structure. So, for example, the $DB_HASH reference will only allow keys called bsize, cachesize, ffactor, hash, lorder and nelem.

To change one of these elements, just assign to it like this:

        $DB_HASH->{'cachesize'} = 10000 ;

The three predefined variables $DB_HASH, $DB_BTREE and $DB_RECNO are adequate for most applications. If you do need to create extra instances of these objects, constructors are available for each file type.

Here are examples of the constructors and the valid options available for DB_HASH, DB_BTREE and DB_RECNO respectively.

     $a = new DB_File::HASHINFO ;
     $a->{'bsize'} ;
     $a->{'cachesize'} ;
     $a->{'ffactor'};
     $a->{'hash'} ;
     $a->{'lorder'} ;
     $a->{'nelem'} ;
     $b = new DB_File::BTREEINFO ;
     $b->{'flags'} ;
     $b->{'cachesize'} ;
     $b->{'maxkeypage'} ;
     $b->{'minkeypage'} ;
     $b->{'psize'} ;
     $b->{'compare'} ;
     $b->{'prefix'} ;
     $b->{'lorder'} ;
     $c = new DB_File::RECNOINFO ;
     $c->{'bval'} ;
     $c->{'cachesize'} ;
     $c->{'psize'} ;
     $c->{'flags'} ;
     $c->{'lorder'} ;
     $c->{'reclen'} ;
     $c->{'bfname'} ;

The values stored in the hashes above are mostly the direct equivalent of their C counterparts. Like their C counterparts, all are set to default values, which means you don't have to set all of the values when you only want to change one. Here is an example:

     $a = new DB_File::HASHINFO ;
     $a->{'cachesize'} =  12345 ;
     tie %y, 'DB_File', "filename", $flags, 0777, $a ;

A few of the options need extra discussion here. When used, the C equivalent of the keys hash, compare and prefix store pointers to C functions. In DB_File these keys are used to store references to Perl subs. Below are templates for each of the subs:

    sub hash
    {
        my ($data) = @_ ;
        ...
        # return the hash value for $data
        return $hash ;
    }
    sub compare
    {
        my ($key1, $key2) = @_ ;
        ...
        # return  0 if $key1 eq $key2
        #        -1 if $key1 lt $key2
        #         1 if $key1 gt $key2
        return (-1 , 0 or 1) ;
    }
    sub prefix
    {
        my ($key1, $key2) = @_ ;
        ...
        # return number of bytes of $key2 which are 
        # necessary to determine that it is greater than $key1
        return $bytes ;
    }

See Changing the BTREE sort order for an example of using the compare template.

If you are using the DB_RECNO interface and you intend making use of bval, you should check out The 'bval' Option.

Default Parameters

It is possible to omit some or all of the final 4 parameters in the call to tie and let them take default values. As DB_HASH is the most common file format used, the call:

    tie %A, "DB_File", "filename" ;

is equivalent to:

    tie %A, "DB_File", "filename", O_CREAT|O_RDWR, 0666, $DB_HASH ;

It is also possible to omit the filename parameter, so the call:

    tie %A, "DB_File" ;

is equivalent to:

    tie %A, "DB_File", undef, O_CREAT|O_RDWR, 0666, $DB_HASH ;

See In Memory Databases for a discussion on the use of undef in place of a filename.

In Memory Databases

Berkeley DB allows the creation of in-memory databases by using NULL (that is, a (char *)0 in C) in place of the filename. DB_File uses undef instead of NULL to provide this functionality.
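A small sketch of an in-memory database follows. It assumes nothing beyond the tie call shown above; the hash name and its contents are made up for illustration. Because no file backs the database, the data is gone once the hash is untied.

```perl
use strict ;
use warnings ;
use DB_File ;

my %cache ;
# undef in place of the filename requests an in-memory database.
tie %cache, 'DB_File', undef, O_RDWR|O_CREAT, 0666, $DB_HASH
    or die "Cannot create in-memory database: $!\n" ;

$cache{"answer"} = 42 ;
my $while_tied = $cache{"answer"} ;
print "while tied: $while_tied\n" ;

# No file backs the database, so its contents are discarded on untie.
untie %cache ;
my $after_untie = exists $cache{"answer"} ? "still there" : "gone" ;
print "after untie: $after_untie\n" ;
```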


DB_HASH

The DB_HASH file format is probably the most commonly used of the three file formats that DB_File supports. It is also very straightforward to use.

A Simple Example

This example shows how to create a database, add key/value pairs to the database, delete key/value pairs and finally how to enumerate the contents of the database.

    use warnings ;
    use strict ;
    use DB_File ;
    our (%h, $k, $v) ;
    unlink "fruit" ;
    tie %h, "DB_File", "fruit", O_RDWR|O_CREAT, 0666, $DB_HASH 
        or die "Cannot open file 'fruit': $!\n";
    # Add a few key/value pairs to the file
    $h{"apple"} = "red" ;
    $h{"orange"} = "orange" ;
    $h{"banana"} = "yellow" ;
    $h{"tomato"} = "red" ;
    # Check for existence of a key
    print "Banana Exists\n\n" if exists $h{"banana"} ;
    # Delete a key/value pair.
    delete $h{"apple"} ;
    # print the contents of the file
    while (($k, $v) = each %h)
      { print "$k -> $v\n" }
    untie %h ;

Here is the output:

    Banana Exists

    orange -> orange
    tomato -> red
    banana -> yellow

Note that, as with ordinary associative arrays, the keys are retrieved in an apparently random order.
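If a predictable order is wanted from a DB_HASH database, one option is to sort the keys in Perl after fetching them. The sketch below assumes an in-memory database holding the same fruit data as the example above.

```perl
use strict ;
use warnings ;
use DB_File ;

my %h ;
tie %h, 'DB_File', undef, O_RDWR|O_CREAT, 0666, $DB_HASH
    or die "Cannot create in-memory database: $!\n" ;

$h{"orange"} = "orange" ;
$h{"banana"} = "yellow" ;
$h{"tomato"} = "red" ;

# DB_HASH returns keys in no useful order, so sort them explicitly.
my @report = map { "$_ -> $h{$_}" } sort keys %h ;
print "$_\n" for @report ;

untie %h ;
```

For large databases, where sorting every key in memory is unattractive, the DB_BTREE format described next keeps the keys in order for you.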


DB_BTREE

The DB_BTREE format is useful when you want to store data in a given order. By default the keys will be stored in lexical order, but as you will see from the example shown in the next section, it is very easy to define your own sorting function.

Changing the BTREE sort order

This script shows how to override the default sorting algorithm that BTREE uses. Instead of using the normal lexical ordering, a case insensitive compare function will be used.

    use warnings ;
    use strict ;
    use DB_File ;
    my %h ;
    sub Compare
    {
        my ($key1, $key2) = @_ ;
        "\L$key1" cmp "\L$key2" ;
    }
    # specify the Perl sub that will do the comparison
    $DB_BTREE->{'compare'} = \&Compare ;
    unlink "tree" ;
    tie %h, "DB_File", "tree", O_RDWR|O_CREAT, 0666, $DB_BTREE 
        or die "Cannot open file 'tree': $!\n" ;
    # Add a key/value pair to the file
    $h{'Wall'} = 'Larry' ;
    $h{'Smith'} = 'John' ;
    $h{'mouse'} = 'mickey' ;
    $h{'duck'}  = 'donald' ;
    # Delete
    delete $h{"duck"} ;
    # Cycle through the keys printing them in order.
    # Note it is not necessary to sort the keys as
    # the btree will have kept them in order automatically.
    foreach (keys %h)
      { print "$_\n" }
    untie %h ;

Here is the output from the code above.

    mouse
    Smith
    Wall

There are a few points to bear in mind if you want to change the ordering in a BTREE database:

  1. The new compare function must be specified when you create the database.

  2. You cannot change the ordering once the database has been created. Thus you must use the same compare function every time you access the database.

  3. Duplicate keys are entirely defined by the comparison function. In the case-insensitive example above, the keys 'KEY' and 'key' would be considered duplicates, and assigning to the second one would overwrite the first. If duplicates are allowed (with the R_DUP flag discussed below), only a single copy of duplicate keys is stored in the database, so (again with the example above) assigning three values to the keys 'KEY', 'Key' and 'key' would leave just the first key, 'KEY', in the database with three values. For some situations this results in information loss, so care should be taken to provide fully qualified comparison functions when necessary. For example, the above comparison routine could be modified to additionally compare case-sensitively if two keys are equal in the case-insensitive comparison:

        sub compare {
            my($key1, $key2) = @_;
            lc $key1 cmp lc $key2 ||
            $key1 cmp $key2;
        }
    

    And now you will only have duplicates when the keys themselves are truly the same. (note: in versions of the db library prior to about November 1996, such duplicate keys were retained so it was possible to recover the original keys in sets of keys that compared as equal).
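To see the effect of such a tie-breaking comparison, here is a short sketch. It assumes an in-memory BTREE database and a separate DB_File::BTREEINFO object, so that the shared $DB_BTREE variable is left alone; the sub name compare_ci_then_cs is made up for illustration.

```perl
use strict ;
use warnings ;
use DB_File ;

# Case-insensitive first; fall back to a case-sensitive compare on a tie.
sub compare_ci_then_cs
{
    my ($key1, $key2) = @_ ;
    return lc($key1) cmp lc($key2) || $key1 cmp $key2 ;
}

# A private BTREEINFO object, so $DB_BTREE itself is not modified.
my $info = DB_File::BTREEINFO->new ;
$info->{'compare'} = \&compare_ci_then_cs ;

my %h ;
tie %h, 'DB_File', undef, O_RDWR|O_CREAT, 0666, $info
    or die "Cannot create in-memory BTREE database: $!\n" ;

# These three keys differ only in case, yet all of them are kept.
$h{$_} = 1 for qw(key KEY Key apple) ;

my @keys = keys %h ;
print "@keys\n" ;

untie %h ;
```

With this comparison the keys that differ only in case sit next to each other, ordered among themselves by the case-sensitive compare.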

Handling Duplicate Keys

The BTREE file type optionally allows a single key to be associated with an arbitrary number of values. This option is enabled by setting the flags element of $DB_BTREE to R_DUP when creating the database.

There are some difficulties in using the tied hash interface if you want to manipulate a BTREE database with duplicate keys. Consider this code:

    use warnings ;
    use strict ;
    use DB_File ;
    my ($filename, %h) ;
    $filename = "tree" ;
    unlink $filename ;
    # Enable duplicate records
    $DB_BTREE->{'flags'} = R_DUP ;
    tie %h, "DB_File", $filename, O_RDWR|O_CREAT, 0666, $DB_BTREE 
        or die "Cannot open $filename: $!\n";
    # Add some key/value pairs to the file
    $h{'Wall'} = 'Larry' ;
    $h{'Wall'} = 'Brick' ; # Note the duplicate key
    $h{'Wall'} = 'Brick' ; # Note the duplicate key and value
    $h{'Smith'} = 'John' ;
    $h{'mouse'} = 'mickey' ;
    # iterate through the associative array
    # and print each key/value pair.
    foreach (sort keys %h)
      { print "$_  -> $h{$_}\n" }
    untie %h ;

Here is the output:

    Smith   -> John
    Wall    -> Larry
    Wall    -> Larry
    Wall    -> Larry
    mouse   -> mickey

As you can see, three records have been successfully created with the key Wall. The catch is that when they are retrieved from the database they all seem to have the same value, namely Larry. The problem is caused by the way that the associative array interface works: when it is used to fetch the value associated with a given key, it will only ever retrieve the first value.

Although it may not be immediately obvious from the code above, the associative array interface can be used to write values with duplicate keys, but it cannot be used to read them back from the database.

The way to get around this problem is to use the Berkeley DB API method called seq. This method allows sequential access to key/value pairs. See THE API INTERFACE for details of both the seq method and the API in general.

Here is the script above rewritten using the seq API method.

    use warnings ;
    use strict ;
    use DB_File ;
    my ($filename, $x, %h, $status, $key, $value) ;
    $filename = "tree" ;
    unlink $filename ;
    # Enable duplicate records
    $DB_BTREE->{'flags'} = R_DUP ;
    $x = tie %h, "DB_File", $filename, O_RDWR|O_CREAT, 0666, $DB_BTREE 
        or die "Cannot open $filename: $!\n";
    # Add some key/value pairs to the file
    $h{'Wall'} = 'Larry' ;
    $h{'Wall'} = 'Brick' ; # Note the duplicate key
    $h{'Wall'} = 'Brick' ; # Note the duplicate key and value
    $h{'Smith'} = 'John' ;
    $h{'mouse'} = 'mickey' ;
    # iterate through the btree using seq
    # and print each key/value pair.
    $key = $value = 0 ;
    for ($status = $x->seq($key, $value, R_FIRST) ;
         $status == 0 ;
         $status = $x->seq($key, $value, R_NEXT) )
      {  print "$key -> $value\n" }
    undef $x ;
    untie %h ;

That prints:

    Smith   -> John
    Wall    -> Brick
    Wall    -> Brick
    Wall    -> Larry
    mouse   -> mickey

This time we have got all the key/value pairs, including the multiple values associated with the key Wall.

To make life easier when dealing with duplicate keys, DB_File comes with a few utility methods.

The get_dup() Method

The get_dup method assists in reading duplicate values from BTREE databases. The method can take the following forms:

    $count = $x->get_dup($key) ;
    @list  = $x->get_dup($key) ;