Description:
You want to access portions of a string. For example, you've read a fixed-width record and want to extract the fields.
Source:
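# The simplest way: slicing
afield = theline[3:8]

# When you think in terms of field lengths, struct.unpack may be handier.
# (On Python 3, struct.unpack needs theline to be a bytes object.)
import struct

# Get a 5-byte string, skip 3 bytes, get two 8-byte strings, then all the rest
baseformat = "5s 3x 8s 8s"
numremain = len(theline) - struct.calcsize(baseformat)
format = "%s %ds" % (baseformat, numremain)
leading, s1, s2, trailing = struct.unpack(format, theline)

# Computing the last field's length is best encapsulated in a function
def fields(baseformat, theline, lastfield=None):
    numremain = len(theline) - struct.calcsize(baseformat)
    format = "%s %d%s" % (baseformat, numremain, lastfield and "s" or "x")
    return struct.unpack(format, theline)

# To split at 5-byte boundaries, skipping any shorter remainder
numfives, therest = divmod(len(theline), 5)
form5 = "%s %dx" % ("5s "*numfives, therest)
fivers = struct.unpack(form5, theline)

# Again, this is best encapsulated in a function
def split_by(theline, n, lastfield=None):
    numblocks, therest = divmod(len(theline), n)
    baseblock = "%ds" % n
    format = "%s %d%s" % (baseblock*numblocks, therest, lastfield and "s" or "x")
    return struct.unpack(format, theline)

# To chop a string into individual characters
chars = list(theline)

# To cut at known positions, a list comprehension over slices works well;
# a slice end of None means "up to the end of the string"
cuts = [8, 14, 20, 26, 30]
pieces = [ theline[i:j] for i, j in zip([0]+cuts, cuts+[None]) ]

def split_at(theline, cuts, lastfield=None):
    pieces = [ theline[i:j] for i, j in zip([0]+cuts, cuts) ]
    if lastfield:
        pieces.append(theline[cuts[-1]:])
    return pieces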
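As a quick sanity check of the helper functions, here is a minimal sketch that runs them on a made-up record. It assumes Python 3, where struct.unpack requires a bytes object, and it restates the helpers with a conditional expression in place of the and/or idiom and with the last-field choice applied only to the remainder in split_by. The record contents and the cut positions are invented for illustration.

```python
import struct

def fields(baseformat, theline, lastfield=False):
    # Length of whatever follows the fixed-width fields
    numremain = len(theline) - struct.calcsize(baseformat)
    # Keep the remainder ("s") or skip it ("x")
    fmt = "%s %d%s" % (baseformat, numremain, "s" if lastfield else "x")
    return struct.unpack(fmt, theline)

def split_by(theline, n, lastfield=False):
    # Whole n-byte blocks, plus a shorter remainder that is kept or skipped
    numblocks, therest = divmod(len(theline), n)
    fmt = "%s %d%s" % (("%ds" % n) * numblocks, therest,
                       "s" if lastfield else "x")
    return struct.unpack(fmt, theline)

def split_at(theline, cuts, lastfield=False):
    # Slice between consecutive cut positions, starting from 0
    pieces = [theline[i:j] for i, j in zip([0] + cuts, cuts)]
    if lastfield:
        pieces.append(theline[cuts[-1]:])
    return pieces

# Hypothetical 26-byte record: 5-byte id, 3 bytes of padding,
# two 8-byte names, and a 2-byte tail
record = b"ID001" + b"   " + b"Alice   " + b"Smith   " + b"XY"

print(fields("5s 3x 8s 8s", record))                  # tail skipped
print(fields("5s 3x 8s 8s", record, lastfield=True))  # tail kept
print(split_by(record, 8))                            # 8-byte blocks, tail skipped
print(split_at(record, [5, 8, 16, 24], lastfield=True))
```

Note that struct.unpack returns a tuple while split_at returns a list, so callers that mix the two approaches may want to normalize the result type.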
Discussion:
This recipe is inspired by Recipe 1.1 of O'Reilly's "Perl Cookbook". Python's slicing takes the place of Perl's substr. Perl's unpack and Python's struct.unpack are rather similar, though Perl's is slightly handier: it accepts a field length of "*" for the last field to mean "all the rest", while in Python we have to compute and insert the exact length, whether for extraction or for skipping. This is rarely a major issue, since such extraction tasks will most often be encapsulated in small, probably local functions. If such a function is called in a loop, memoizing (automatic caching) may help performance a lot by avoiding the repeated computation of the same format.
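The memoizing idea might look something like the following sketch. It assumes Python 3 and uses functools.lru_cache and struct.Struct from the standard library, neither of which appears in the recipe itself; compiled_format is a hypothetical helper name. The format is rebuilt only when a new combination of base format, line length, and lastfield flag shows up.

```python
import struct
from functools import lru_cache

@lru_cache(maxsize=None)
def compiled_format(baseformat, total_length, lastfield):
    # Built once per distinct (baseformat, total_length, lastfield) triple
    numremain = total_length - struct.calcsize(baseformat)
    fmt = "%s %d%s" % (baseformat, numremain, "s" if lastfield else "x")
    return struct.Struct(fmt)

def fields(baseformat, theline, lastfield=False):
    # Look up (or build) the precompiled format, then unpack with it
    cached = compiled_format(baseformat, len(theline), bool(lastfield))
    return cached.unpack(theline)

# Two made-up records of equal length share one cached format
first = fields("5s 3x 8s 8s", b"ID001" + b"   " + b"Alice   " + b"Smith   " + b"XY")
second = fields("5s 3x 8s 8s", b"ID002" + b"   " + b"Bob     " + b"Jones   " + b"ZZ")
print(first, second)
print(compiled_format.cache_info())
```

Whether the cache pays off depends on how many distinct line lengths occur; with fixed-width records there is usually only one, which is the favorable case.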
In a purely Python context, the point of this recipe is to remind you that struct.unpack is often a viable, and sometimes a preferable, alternative to slicing. It is not preferable quite as often as unpack is to substr in Perl, given the lack of a "*"-valued field length, but often enough to be worth keeping in mind.
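To make the comparison concrete, the sketch below extracts the same three fields both ways from a made-up 24-byte record (Python 3 assumed, so the record is a bytes object). The struct format states field lengths, while the slices state absolute offsets; which reads better depends on how the record layout is documented.

```python
import struct

# Hypothetical record: 5-byte id, 3 bytes of padding, two 8-byte names
theline = b"ID001" + b"   " + b"Alice   " + b"Smith   "

# Field-length view: take 5, skip 3, take 8, take 8
by_struct = struct.unpack("5s 3x 8s 8s", theline)

# Offset view: the same fields as explicit slices
by_slices = (theline[0:5], theline[8:16], theline[16:24])

print(by_struct == by_slices)
```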
One decision in the code as presented is worth noticing (and perhaps worth criticizing): each of the encapsulation functions takes an optional "lastfield=None" parameter. This reflects the observation that we often want to skip the last, unknown-length subfield, but often enough want to retain it instead. The use of lastfield in the "cutesy" expression 'lastfield and "s" or "x"' (equivalent to C's "lastfield?'s':'x'") saves an if/else, but it's unclear whether the saving is worth the cuteness; '"sx"[not lastfield]' and other similar alternatives are roughly equivalent in this respect. When lastfield is false, applying struct.unpack to just a prefix of theline (specifically theline[:struct.calcsize(format)]) is an alternative, but it is not easy to merge with the case of lastfield being true, when the format does need a supplementary Ns field for N=len(theline)-struct.calcsize(format).
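The alternatives mentioned here can be sketched side by side. The function names below are hypothetical, chosen only for illustration, and the conditional expression is a later addition to the language (Python 2.5 and up) that the recipe does not use. fields_prefix covers only the case where the last field is skipped, which is exactly why it is hard to merge with the case where it is retained.

```python
import struct

def last_code_and_or(lastfield):
    # The recipe's idiom; safe here because "s" is always a true value
    return lastfield and "s" or "x"

def last_code_index(lastfield):
    # Indexing by a boolean: False -> "s", True -> "x"
    return "sx"[not lastfield]

def last_code_conditional(lastfield):
    # Conditional expression (Python 2.5 and later)
    return "s" if lastfield else "x"

def fields_prefix(baseformat, theline):
    # Skip-the-rest case only: unpack just the prefix the format covers
    return struct.unpack(baseformat, theline[:struct.calcsize(baseformat)])

for flag in (None, False, True):
    print(flag, last_code_and_or(flag), last_code_index(flag),
          last_code_conditional(flag))

# Made-up 26-byte record; the trailing 2 bytes are simply never looked at
record = b"ID001" + b"   " + b"Alice   " + b"Smith   " + b"XY"
print(fields_prefix("5s 3x 8s 8s", record))
```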
Performance is not emphasized as crucial to any of these idioms, apart from the reminder that memoizing is often a useful technique. "Premature optimization is the root of all evil." Make your code CLEAR, SIMPLE, and SOLID first, and worry about making it truly optimal only afterwards, *if at all* (most often, in real life, the clear, simple, solid solution will be fast enough!-).