python-list
Re: advanced regex, was: Re: scanf style parsing
by Skip Montanaro
Oct 5 2001 2:14AM
    Hans-Peter>  Well, yesterday I tried to parse a simple hexdump,
    Hans-Peter>  produced by tcpdump -xs1500 port 80. The idea was to filter
    Hans-Peter>  the hex codes and display any 7-bit ASCII codes, as
    Hans-Peter>  slightly more advanced hex monitors do.

    Hans-Peter>  As I'm fairly new to advanced regex constructs, would
    Hans-Peter>  somebody enlighten me on how to efficiently parse lines like:

    Hans-Peter>     2067 726f 7570 732e 2e2e 3c2f 613e 3c2f
    Hans-Peter>     666f 6e74 3e3c 2f74 643e 3c2f 7472 3e3c
    Hans-Peter>     7472 3e3c 7464 2062 6763 6f6c 6f72 3d23
    Hans-Peter>     6666 6363 3333 2063 6f6c 7370 616e 3d34
    Hans-Peter>     3e3c 494d 4720 6865 6967 6874 3d31 2073
    Hans-Peter>     7263 3d22 2f69 6d61 6765 732f 636c 6561
    Hans-Peter>     7264 6f74 2e67 6966 2220 7769 6474 683d
    Hans-Peter>     3120 3e3c 2f74 643e 3c2f 7472 3e3c 2f74
    Hans-Peter>     6162 6c65 3e3c 703e 3c66 6f6e 7420 7369
    Hans-Peter>     7a65 3d2d 313e 4172 6520 796f 7520 6120

    Hans-Peter>  allowing for a varying number of columns. I will refrain
    Hans-Peter>  from showing my stupid beginnings, but I wasn't able to get
    Hans-Peter>  that _one_ regex right, with all columns listed in
    Hans-Peter>  matchobj.groups().

I'm not sure quite what you're looking for, but this data is so regular I
wouldn't use regular expressions to parse it (no pun intended).
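
For the record, a single pattern has trouble listing every column in matchobj.groups() because, in Python's `re` module, a capturing group inside a repetition reports only its last repetition, and a pattern with a fixed number of groups cannot adapt to a varying column count. If a regex is still wanted, `re.findall` sidesteps the problem by collecting every match. A minimal sketch; `HEX_COLUMN` and `hex_columns` are hypothetical names, and the pattern assumes every column is exactly four hex digits:

```python
import re

# A column is four hex digits standing alone as a word, so findall returns
# however many columns the line holds.
HEX_COLUMN = re.compile(r"\b[0-9a-fA-F]{4}\b")


def hex_columns(line):
    """Return every four-digit hex column in line."""
    return HEX_COLUMN.findall(line)


# By contrast, a repeated capturing group reports only its last repetition.
repeated = re.match(r"(?:([0-9a-f]{4})\s*)+", "2067 726f 7570")
print(repeated.groups())  # ('7570',)
print(hex_columns("    2067 726f 7570 732e"))  # ['2067', '726f', '7570', '732e']
```

The list from hex_columns() then plays the role Hans-Peter wanted groups() to play, whatever the column count.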

Assuming the above stream is coming in on stdin and I want to display
any printable ASCII characters, I'd start with something like this:

    import sys

    for line in sys.stdin:
        line = line.strip()
        fields = line.split()
        printing = []
        for pair in fields:
            # Each column holds two bytes as four hex digits.
            first = chr(int(pair[:2], 16))
            second = chr(int(pair[2:], 16))
            # Show only printable 7-bit ASCII (space through "~");
            # everything else becomes ".".
            if first < " " or first > "~":
                first = "."
            if second < " " or second > "~":
                second = "."
            printing.extend([first, second])
        print(line, "".join(printing))

The above hex data fed to this code produces

    2067 726f 7570 732e 2e2e 3c2f 613e 3c2f  groups...</a></
    666f 6e74 3e3c 2f74 643e 3c2f 7472 3e3c font></td></tr><
    7472 3e3c 7464 2062 6763 6f6c 6f72 3d23 tr><td bgcolor=#
    6666 6363 3333 2063 6f6c 7370 616e 3d34 ffcc33 colspan=4
    3e3c 494d 4720 6865 6967 6874 3d31 2073 ><IMG height=1 s
    7263 3d22 2f69 6d61 6765 732f 636c 6561 rc="/images/clea
    7264 6f74 2e67 6966 2220 7769 6474 683d rdot.gif" width=
    3120 3e3c 2f74 643e 3c2f 7472 3e3c 2f74 1 ></td></tr></t
    6162 6c65 3e3c 703e 3c66 6f6e 7420 7369 able><p><font si
    7a65 3d2d 313e 4172 6520 796f 7520 6120 ze=-1>Are you a 

on stdout.
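
Two things in real captures can trip up that loop: tcpdump -x prints a one-line summary before each packet's hex lines, and if a packet has an odd number of bytes its last column may hold a single byte, in which case pair[2:] is empty and int("", 16) raises ValueError. A minimal sketch (Python 3) that passes non-hex lines through unchanged and decodes each hex line with `bytes.fromhex`. It assumes the layout quoted above, with no offset column; `HEX_LINE` and `annotate` are hypothetical names:

```python
import re

# Assumed layout: a hex line is only whitespace-separated groups of two or
# four hex digits (no offset column). Anything else passes through untouched.
GROUP = r"[0-9a-fA-F]{2}(?:[0-9a-fA-F]{2})?"
HEX_LINE = re.compile(r"^\s*" + GROUP + r"(?:\s+" + GROUP + r")*\s*$")


def annotate(line):
    """Return a hex line with its printable 7-bit ASCII appended."""
    line = line.rstrip("\n")
    if not HEX_LINE.match(line):
        return line  # e.g. a packet summary line or a blank line
    fields = line.split()
    # Decoding the joined digits handles a lone two-digit final column too.
    data = bytes.fromhex("".join(fields))
    # 0x20 (space) through 0x7e (~) are printable; everything else becomes ".".
    text = "".join(chr(b) if 0x20 <= b <= 0x7e else "." for b in data)
    return " ".join(fields) + " " + text


print(annotate("    2067 726f 7570 732e 2e2e 3c2f 613e 3c2f\n"))
print(annotate("0a0d 6f\n"))  # 0a0d 6f ..o
```

As a filter it would be driven by something like `for raw in sys.stdin: print(annotate(raw))`. The pattern is deliberately strict: a non-hex line made up entirely of two- or four-character hex-looking words would still be misread, which is unlikely for tcpdump summaries but not impossible.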

-- 
Skip Montanaro (skip@pobox.com)
http://www.mojam.com/
http://www.musi-cal.com/

-- 
http://mail.python.org/mailman/listinfo/python-list