Simulating scanf()
Python does not currently have an equivalent to scanf().
Regular expressions are generally more powerful, though also more
verbose, than scanf() format strings. The table below
offers some more-or-less equivalent mappings between
scanf() format tokens and regular expressions.
scanf() Token       Regular Expression
%c                  .
%5c                 .{5}
%d                  [-+]?\d+
%e, %E, %f, %g      [-+]?(\d+(\.\d*)?|\d*\.\d+)([eE][-+]?\d+)?
%i                  [-+]?(0[xX][\dA-Fa-f]+|0[0-7]*|\d+)
%o                  0[0-7]*
%s                  \S+
%u                  \d+
%x, %X              0[xX][\dA-Fa-f]+
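As a rough illustration, the patterns in the table can be tried directly with the re module. The sketch below copies three of them into a dictionary; the name SCANF_PATTERNS and the sample inputs are made up for this example. Unlike scanf(), a regular expression only returns matched text, so converting to a number is assumed to be a separate step.

```python
import re

# Patterns copied from the table; the dictionary itself is illustrative only.
SCANF_PATTERNS = {
    "%d": r"[-+]?\d+",
    "%f": r"[-+]?(\d+(\.\d*)?|\d*\.\d+)([eE][-+]?\d+)?",
    "%x": r"0[xX][\dA-Fa-f]+",
}

# %d: match a signed decimal integer at the start of the string.
d_match = re.match(SCANF_PATTERNS["%d"], "-42 apples")
int_value = int(d_match.group())

# %x: match a hexadecimal literal; int() with base 16 accepts the 0x prefix.
x_match = re.match(SCANF_PATTERNS["%x"], "0x1F")
hex_value = int(x_match.group(), 16)

# %f: match a floating-point number with an optional exponent.
f_match = re.match(SCANF_PATTERNS["%f"], "6.02e23 mol")
float_value = float(f_match.group())
```

Each match object holds only the matched substring; int() and float() do the conversion that scanf() would perform implicitly.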
To extract the filename and numbers from a string like

    /usr/sbin/sendmail - 0 errors, 4 warnings

you would use a scanf() format like

    %s - %d errors, %d warnings

The equivalent regular expression would be

    (\S+) - (\d+) errors, (\d+) warnings
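A minimal sketch of applying that regular expression in Python follows. The variable names are chosen for this example, and the integer conversion is an added step, since the groups come back as strings.

```python
import re

line = "/usr/sbin/sendmail - 0 errors, 4 warnings"

# One group per scanf() conversion: %s, %d, %d.
pattern = re.compile(r"(\S+) - (\d+) errors, (\d+) warnings")

match = pattern.match(line)
if match is not None:
    filename = match.group(1)
    error_count = int(match.group(2))    # groups are strings; convert explicitly
    warning_count = int(match.group(3))
```

Checking the result of match() against None before reading the groups plays the role of checking scanf()'s return value.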
Avoiding recursion
If you create regular expressions that require the engine to perform a
lot of recursion, you may encounter a RuntimeError exception with
the message "maximum recursion limit exceeded". For example:
>>> import re
>>> s = 'Begin ' + 1000*'a very long string ' + 'end'
>>> re.match('Begin (\w| )*? end', s).end()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/local/lib/python2.3/sre.py", line 132, in match
    return _compile(pattern, flags).match(string)
RuntimeError: maximum recursion limit exceeded
You can often restructure your regular expression to avoid recursion.
Starting with Python 2.3, simple uses of the *? pattern are
special-cased to avoid recursion. Thus, the above regular expression
can avoid recursion by being recast as
Begin [a-zA-Z0-9_ ]*?end. As a further benefit, such regular
expressions will run faster than their recursive equivalents.
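The recast pattern can be checked against the same test string. This is only a sketch: it exercises the restructured expression and does not try to reproduce the RuntimeError, which depends on the interpreter version in use. The variable names are chosen for this example.

```python
import re

# Same input as in the failing example above.
s = "Begin " + 1000 * "a very long string " + "end"

# A single character class replaces the (\w| ) alternation, so the
# non-greedy repeat applies to one simple item rather than a group.
recast = re.compile(r"Begin [a-zA-Z0-9_ ]*?end")

m = recast.match(s)
end_pos = m.end()   # index just past the matched text
```

Because the only occurrence of "end" is at the end of the string, the non-greedy match has to consume all of the repeated text before it succeeds.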