I often return result sets from a database call as a list of dictionary objects. When the pickled list is transmitted over the wire, the size of the pickle greatly affects the speed of the transmission.
I wrote this small class to emulate a list of dictionaries without the memory and pickle storage overhead of storing every row as its own dictionary. The column names are stored once, and each row is kept as a plain tuple.
#!/usr/bin/python2.4

class Table(object):
    """A structure which emulates a list of dicts."""

    def __init__(self, *args):
        # Column names are stored once; each row is stored as a tuple.
        self.columns = args
        self.rows = []

    def _createRow(self, k, v):
        # Build a fresh dict from column names and one row's values.
        return dict(zip(k, v))

    def append(self, row):
        # Accept a dict, list, or tuple; dict values are taken in column order.
        if isinstance(row, dict):
            row = [row[x] for x in self.columns]
        row = tuple(row)
        if len(row) != len(self.columns):
            raise TypeError, 'Row must contain %d elements.' % len(self.columns)
        self.rows.append(row)

    def __iter__(self):
        for row in self.rows:
            yield self._createRow(self.columns, row)

    def __getitem__(self, i):
        return self._createRow(self.columns, self.rows[i])

    def __setitem__(self, i, row):
        # Same normalization and length check as append().
        if isinstance(row, dict):
            row = [row[x] for x in self.columns]
        row = tuple(row)
        if len(row) != len(self.columns):
            raise TypeError, 'Row must contain %d elements.' % len(self.columns)
        self.rows[i] = row

    def __repr__(self):
        return ("<" + self.__class__.__name__ + " object at 0x" + str(id(self))
                + " " + str(self.columns) + ", %d rows.>" % len(self.rows))


if __name__ == "__main__":
    import pickle

    # Compare the pickled size of a Table with that of a list of dicts.
    t = Table("a", "b", "c")
    for i in xrange(10000):
        t.append((1, 2, 3))
    print "Table size when pickled:", len(pickle.dumps(t))

    t = []
    for i in xrange(10000):
        t.append({"a": 1, "b": 2, "c": 3})
    print "List size when pickled: ", len(pickle.dumps(t))
The size of the pickled table is reduced by roughly 50%, which provides a worthwhile speedup when sending the pickle over a slow or busy connection.
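The saving can be checked on a current interpreter with a sketch like the following. It is written for Python 3 (the recipe itself targets Python 2.4) and, as an assumption made purely for illustration, uses a plain `(columns, rows)` tuple in place of the `Table` class, plus distinct row values so that pickle's memoization of repeated objects does not skew the comparison. The exact ratio depends on the data and the pickle protocol, so no figure is claimed here.

```python
import pickle

# Hypothetical sample data: 10,000 distinct rows with three columns.
columns = ("a", "b", "c")
rows = [(i, i * 2, str(i)) for i in range(10000)]

# Compact layout: column names stored once, each row as a plain tuple.
compact = pickle.dumps((columns, rows))

# Naive layout: every row carries its own dictionary.
as_dicts = [dict(zip(columns, row)) for row in rows]
naive = pickle.dumps(as_dicts)

print("Compact size when pickled:", len(compact))
print("List size when pickled:   ", len(naive))
print("Ratio: %.2f" % (len(compact) / len(naive)))
```

Unpickling the compact form and zipping each row with the column names gives back the same dictionaries, so nothing is lost by sending the smaller payload.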
It behaves much like a regular list of dictionaries, except that the dictionary returned by __getitem__ or __iter__ is built on each call, so changes made to a returned dictionary are not written back to the table. It does not support slices, but these could easily be added if needed.
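Slice support could be added along these lines. This is a Python 3 sketch rather than the recipe's code: `SliceableTable` is a hypothetical, stripped-down stand-in for `Table` that accepts only sequences in `append`, and returning a plain list of dictionaries for a slice is one possible design choice among several.

```python
class SliceableTable:
    """Hypothetical minimal variant of Table whose __getitem__ accepts slices."""

    def __init__(self, *columns):
        self.columns = columns
        self.rows = []

    def append(self, row):
        row = tuple(row)
        if len(row) != len(self.columns):
            raise TypeError("Row must contain %d elements." % len(self.columns))
        self.rows.append(row)

    def _create_row(self, values):
        # Build a fresh dict on every access; the table stores only tuples.
        return dict(zip(self.columns, values))

    def __getitem__(self, i):
        if isinstance(i, slice):
            # A slice yields a list of freshly built dicts, one per row.
            return [self._create_row(r) for r in self.rows[i]]
        return self._create_row(self.rows[i])


t = SliceableTable("a", "b", "c")
for n in range(5):
    t.append((n, n * 10, n * 100))

print(t[1:3])

# The returned dict is a fresh copy, so changing it leaves the table untouched.
d = t[0]
d["a"] = 999
print(t[0]["a"])  # prints 0
```

Returning a list keeps the slice case simple. Returning another table object that shares the column tuple would preserve the compact pickle size for the slice as well, at the cost of a little more code.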
2004/11/09: Changed .__setitem__ and .append so that lists, tuples, or dicts can be inserted.
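The row handling shared by `.append` and `.__setitem__` can be sketched as a single helper. `normalize_row` is a hypothetical name introduced here for illustration, and the code is Python 3 rather than the recipe's Python 2.4.

```python
def normalize_row(columns, row):
    """Return row as a tuple ordered by columns (hypothetical helper)."""
    if isinstance(row, dict):
        # Pick values in column order; a missing key raises KeyError.
        row = [row[name] for name in columns]
    row = tuple(row)
    if len(row) != len(columns):
        raise TypeError("Row must contain %d elements." % len(columns))
    return row


columns = ("a", "b", "c")
print(normalize_row(columns, {"c": 3, "a": 1, "b": 2}))  # (1, 2, 3)
print(normalize_row(columns, [1, 2, 3]))                 # (1, 2, 3)
print(normalize_row(columns, (1, 2, 3)))                 # (1, 2, 3)
```

Note that with this approach, as in the recipe, any extra keys in an inserted dictionary are silently ignored, while a list or tuple of the wrong length raises `TypeError`.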
dbrow. dbrow does something similar, though it probably wouldn't survive pickling. It is, however, efficient in both memory and speed. Available at:
http://opensource.theopalgroup.com/