
A persistent, lazy, caching dictionary that uses the anydbm module for persistence. Keys must be plain strings (an anydbm limitation), and values must be picklable objects.

import anydbm
import cPickle as pickle

class pdict(object):
    """ A persistent, lazy, caching dictionary that uses the anydbm module for
    persistence. Keys must be plain strings (an anydbm limitation), and
    values must be picklable objects. """
    def __init__(self, file, mode):
        """ Create new pdict using file. mode is passed to anydbm.open(). """
        self._cache = {}    # key -> unpickled value, for keys read or written
        self._flush = {}    # key -> value modified since the last sync()
        self._dbm = anydbm.open(file, mode)

    def __contains__(self, key):
        return key in self._cache or key in self._dbm

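    def __getitem__(self, key):
        if key in self._cache:
            return self._cache[key]
        # Cache miss: load the value lazily (KeyError if it is not in the
        # database) and keep the unpickled object for later lookups.
        value = self._cache[key] = pickle.loads(self._dbm[key])
        return value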

    def __setitem__(self, key, value):
        self._cache[key] = self._flush[key] = value

    def __delitem__(self, key):
        # Unlike assignment, deletion reaches the database immediately
        # instead of waiting for sync().
        found = False
        for data in (self._cache, self._flush, self._dbm):
            if key in data:
                del data[key]
                found = True
        if not found:
            raise KeyError(key)

    def keys(self):
        keys = set(self._cache.keys())
        keys.update(self._dbm.keys())
        return keys

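    def sync(self):
        """ Write modified values back to the database. """
        for key, value in self._flush.iteritems():
            self._dbm[key] = pickle.dumps(value, 2)
        # Not every anydbm backend provides sync() (the plain dbm module
        # does not), so only call it when it exists.
        if hasattr(self._dbm, 'sync'):
            self._dbm.sync()
        self._flush = {}

    def close(self):
        """ Flush pending writes and close the underlying database. """
        self.sync()
        self._dbm.close()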

This class is meant for large datasets. Values are loaded lazily from the anydbm database on first access and then cached in memory, so repeated lookups do not unpickle them again. Similarly, modified values are not written back to the database until sync() is called.
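To make the lazy-load and write-back behaviour concrete, here is a minimal sketch written for Python 3, where anydbm became the dbm package and cPickle was folded into pickle. It is an illustration under assumptions, not the recipe's own code: the class name LazyPDict and its loads counter are hypothetical, the class takes an already-open database rather than a filename, and the example opens dbm.dumb directly so it runs on any installation.

```python
import dbm.dumb
import os
import pickle
import tempfile


class LazyPDict:
    """Python 3 sketch of the recipe's pdict (hypothetical name).
    Keys are str; dbm stores keys and values as bytes."""

    def __init__(self, db):
        self._db = db        # an already-open dbm-style database
        self._cache = {}     # key -> unpickled value
        self._flush = {}     # key -> value modified since the last sync()
        self.loads = 0       # hypothetical counter: reads that hit the database

    def __getitem__(self, key):
        if key not in self._cache:
            self.loads += 1  # only the first lookup touches the database
            self._cache[key] = pickle.loads(self._db[key])
        return self._cache[key]

    def __setitem__(self, key, value):
        # Deferred write: kept in memory until sync().
        self._cache[key] = self._flush[key] = value

    def sync(self):
        for key, value in self._flush.items():
            self._db[key] = pickle.dumps(value, pickle.HIGHEST_PROTOCOL)
        self._flush = {}
        if hasattr(self._db, "sync"):  # not every dbm backend has sync()
            self._db.sync()


path = os.path.join(tempfile.mkdtemp(), "example")

db = dbm.dumb.open(path, "c")
d = LazyPDict(db)
d["a"] = [1, 2, 3]
print("a" in db)   # False: nothing is written before sync()
d.sync()
print("a" in db)   # True
db.close()

db = dbm.dumb.open(path, "r")
d = LazyPDict(db)
first = d["a"]
second = d["a"]
print(first is second, d.loads)   # True 1: the second lookup came from the cache
db.close()
```

The loads counter exists only to make the caching visible; the recipe's class has no such attribute.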

The class is quite basic and deliberately doesn't implement all dictionary behaviour: values() and items(), for example, would have to load every value from the database, which defeats the purpose of lazy fetching.

Features that might be useful include a limit on the number of cached and pending (flush) entries to avoid excessive memory use, support for arbitrary key types by pickling the keys, and probably more. I think you get the idea.
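A cache limit could look roughly like the sketch below. Everything here is an assumption for illustration: BoundedPDict and its max_cache parameter are hypothetical, a plain dict of pickled bytes stands in for the anydbm database, and the cache is kept in least-recently-used order with collections.OrderedDict. Only clean entries are evicted, because dropping a value that has not been synced yet would lose the write.

```python
import pickle
from collections import OrderedDict


class BoundedPDict:
    """Sketch of a pdict whose cache keeps at most max_cache clean values.

    Assumptions: `store` is any dict-like mapping of key -> pickled bytes,
    and max_cache is a hypothetical parameter, not part of the recipe.
    """

    def __init__(self, store, max_cache=2):
        self._store = store
        self._max_cache = max_cache
        self._cache = OrderedDict()   # least recently used entries first
        self._flush = {}              # dirty values, never evicted before sync()

    def _remember(self, key, value):
        self._cache[key] = value
        self._cache.move_to_end(key)
        # Evict least recently used clean entries until under the limit;
        # dirty entries stay until sync() writes them out.
        for old in list(self._cache):
            if len(self._cache) <= self._max_cache:
                break
            if old not in self._flush:
                del self._cache[old]

    def __getitem__(self, key):
        if key in self._cache:
            value = self._cache[key]
        else:
            value = pickle.loads(self._store[key])
        self._remember(key, value)
        return value

    def __setitem__(self, key, value):
        self._flush[key] = value
        self._remember(key, value)

    def sync(self):
        for key, value in self._flush.items():
            self._store[key] = pickle.dumps(value, pickle.HIGHEST_PROTOCOL)
        self._flush = {}


store = {}                     # plain dict standing in for the database
d = BoundedPDict(store, max_cache=2)
for i, name in enumerate(["a", "b", "c"]):
    d[name] = i                # dirty entries are kept even above the limit
d.sync()
value = d["a"]
print(list(d._cache))          # ['c', 'a']: least recently used clean 'b' evicted
```

Note that the limit is soft: if every cached entry is dirty, the cache can grow past max_cache until the next sync().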

1 comment

Darshan Hegde 9 years, 8 months ago

Thanks for the recipe.

I couldn't find a sync() method in anydbm (Python 2.7.8: https://docs.python.org/2/library/anydbm.html).

If I switch to dumbdbm it works just fine :)
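Whether sync() exists depends on which backend anydbm picks: dumbdbm and gdbm objects provide it, while the plain dbm (ndbm) module does not. A defensive call plus a check of which backend a file uses might look like the following Python 3 sketch; sync_if_supported is a hypothetical helper, while dbm.whichdb() and dbm.dumb.open() are standard library functions (Python 2 has the equivalent whichdb.whichdb()).

```python
import dbm
import dbm.dumb
import os
import tempfile


def sync_if_supported(db):
    """Hypothetical helper: call db.sync() only when the backend has it.

    Returns True if sync() was called, False otherwise.
    """
    sync = getattr(db, "sync", None)
    if sync is None:
        return False
    sync()
    return True


path = os.path.join(tempfile.mkdtemp(), "comments")
db = dbm.dumb.open(path, "c")    # force the dumb backend, as in the comment
db["key"] = b"value"
print(sync_if_supported(db))      # True: dbm.dumb objects have sync()
db.close()
print(dbm.whichdb(path))          # prints dbm.dumb
```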