python-list
Re: advanced regex, was: Re: scanf style parsing
by George Demmy
Oct 4 2001 4:54PM
hpj@[...].net (Hans-Peter Jansen) writes:
>  Well, yesterday, I tried to parse some simple hexdump, produced by
>  tcpdump -xs1500 port 80. The idea was, filter the hexcodes, and display
>  any 7 bit ascii codes like slightly more advanced hex monitors do.
>  
>  As I'm fairly new to advanced regex constructs, would somebody enlighten
>  me, how to efficiently parse lines like:
>  
>                   2067 726f 7570 732e 2e2e 3c2f 613e 3c2f
>                   666f 6e74 3e3c 2f74 643e 3c2f 7472 3e3c
>                   7472 3e3c 7464 2062 6763 6f6c 6f72 3d23
>                   6666 6363 3333 2063 6f6c 7370 616e 3d34
>                   3e3c 494d 4720 6865 6967 6874 3d31 2073
>                   7263 3d22 2f69 6d61 6765 732f 636c 6561
>                   7264 6f74 2e67 6966 2220 7769 6474 683d
>                   3120 3e3c 2f74 643e 3c2f 7472 3e3c 2f74
>                   6162 6c65 3e3c 703e 3c66 6f6e 7420 7369
>                   7a65 3d2d 313e 4172 6520 796f 7520 6120
>  
>  with respect to varying column numbers. I will refrain from
>  showing my stupid beginnings, but I wasn't able to get that _one_
>  regex right, with all columns in matchobj.groups() listed.
>  
>  new-in-regexing-ly, yr's
>  Hans-Peter
>  
>  P.S.: I ended up in a "simple" c based filter...
>  Please CC me

Hi Hans-Peter,

You're asking how to use a regex to parse your hexdump, with an eye
towards displaying the ascii representation. I'm not sure a regex is
the right tool for the latter. Here is some example code that might
help.

import re

hexpat = re.compile ('[a-f0-9]{4}')

# your first line of the hexdump, stripped

line = '2067 726f 7570 732e 2e2e 3c2f 613e 3c2f'
hexpat.search (line).span ()

->  (0, 4)

hexpat.search (line[4:]).span ()

->  (1, 5)
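
For the varying column counts in the original question, one option (a sketch in current Python 3, not necessarily the single-regex-with-groups form that was asked for) is to let findall collect every match instead of stepping through the line by hand. The variable names columns and spans are only illustrative.

```python
import re

# Same pattern as in the post: four lowercase hex digits.
hexpat = re.compile(r'[a-f0-9]{4}')

line = '2067 726f 7570 732e 2e2e 3c2f 613e 3c2f'

# findall returns every non-overlapping match, however many columns there are.
columns = hexpat.findall(line)
print(columns)

# finditer yields match objects, in case the positions are needed as well.
spans = [m.span() for m in hexpat.finditer(line)]
print(spans[:2])
```

Because the column count never appears in the pattern, a short final line needs no special handling.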



As for getting your ascii...

def hex2ascii (hexstr):
    """hex2ascii (hexstr) -> ascii rep of 4 character hex string"""
    # error checking here, please!
    return chr (int (hexstr[:2], 16)) + chr (int (hexstr[2:], 16))

# slurp your hexdump by line (your example is stored in hexdat, by line)
# stripping off the leading whitespace

with open ("dumpfile") as dumpfile:
    hexdat = [x.strip () for x in dumpfile]

# "".join copes with blank lines, where reduce on an empty list would fail
for i in hexdat:
    print (i, "".join (map (hex2ascii, i.split ())))

-> 
2067 726f 7570 732e 2e2e 3c2f 613e 3c2f  groups...</a></
666f 6e74 3e3c 2f74 643e 3c2f 7472 3e3c font></td></tr><
7472 3e3c 7464 2062 6763 6f6c 6f72 3d23 tr><td bgcolor=#
6666 6363 3333 2063 6f6c 7370 616e 3d34 ffcc33 colspan=4
3e3c 494d 4720 6865 6967 6874 3d31 2073 ><IMG height=1 s
7263 3d22 2f69 6d61 6765 732f 636c 6561 rc="/images/clea
7264 6f74 2e67 6966 2220 7769 6474 683d rdot.gif" width=
3120 3e3c 2f74 643e 3c2f 7472 3e3c 2f74 1 ></td></tr></t
6162 6c65 3e3c 703e 3c66 6f6e 7420 7369 able><p><font si
7a65 3d2d 313e 4172 6520 796f 7520 6120 ze=-1>Are you a 
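
The sample above happens to contain only printable characters. For real traffic, a hex-monitor style display would probably want to mask everything outside printable 7 bit ascii, and the "error checking here, please!" comment still applies. The following is a minimal sketch in current Python 3 under those assumptions; hex2printable, format_line and the '.' placeholder are hypothetical names and choices, not part of the code above.

```python
def hex2printable(hexstr, placeholder='.'):
    """Decode pairs of hex digits, masking bytes outside printable 7 bit ascii."""
    # bytes.fromhex raises ValueError on an odd digit count or non-hex digits,
    # which gives basic error checking for free.
    data = bytes.fromhex(hexstr)
    return ''.join(chr(b) if 0x20 <= b <= 0x7e else placeholder for b in data)


def format_line(line):
    """Return the hex columns of one dump line followed by their masked text."""
    groups = line.split()
    text = ''.join(hex2printable(g) for g in groups)
    return '%s  %s' % (' '.join(groups), text)


print(format_line('  3120 3e3c 0d0a'))
```

Decoding each group by its own length, rather than assuming four digits, also lets a shorter final group through, as long as it has an even number of digits.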

Hope this helps, and critique most welcome...

G
-- 
George Demmy
Layton Graphics, Inc
Marietta, Georgia
