Re: uri regex
by Craig Jackson
Jun 15 2005 10:21AM
Stuart Johnston wrote:
> cjackson wrote:
>
> > Hi,
> >
> > I flunked the IQ test so I need some help. I want to match all domains
> > in the body that are not in .com, .org, .us, .edu, .gov and .mil. But
> > there's more. I need to match some characters at the end of the URI
> > that can often be found there such as >.?)*!"';
> >
> > The rule would match http://www.go.za and http://www.go.za), but not
> > match http://www.go.com
> >
> > Here's my regex that does not work...
> >
> > m{https?://[^\s/:"')!?>*]+(?<!\.com)(?<!\.net)(?<!\.org)(?<!\.gov)(?<!\.us)(?<!\.edu)(?<!\.mil)(?:"|'|:|\?|!|> |\*|\)|$)}
> >
> >
> >
> > It works for all of the characters except for an ending "." such as
> > http://www.go.com.
> >
> > I have grappled with this for some time and read the pcrepattern.txt
> > accompanying Exim source, but damn if I can get it to work. Anybody
> > want to spit out the answer?
>
>
> Assuming that you are creating a SA rule, have you considered using a
> uri test? That way you wouldn't have to worry about the extra
> characters at the end. SA would take care of it for you.
>
Yes, it is a uri test, which I patterned after WEIRD_PORTS in 20_uri.
Mine is like this:
uri SUSPECT_DOM_CJ =~ <expression>
score SUSPECT_DOM_CJ <score>
I didn't know that SA took care of the ending characters in uri tests.
I'll take another look to consider this. Thanks.
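
For anyone reading this thread later, here is a rough sketch of why the quoted pattern lets http://www.go.com. through. It is written in Python rather than as an SA rule (Python's re module accepts the same fixed-width lookbehinds), and the names original, adjusted and matches are made up for illustration. Because "." is allowed inside the host character class, the greedy match swallows the trailing dot, so the text just before the terminator is "com." and none of the lookbehinds fire. The adjusted pattern is only one possible fix under those assumptions, not something tested in SA.

```python
import re

# Host character class and TLD lookbehinds copied from the pattern in the post.
HOST = r"""https?://[^\s/:"')!?>*]+"""
NOT_COMMON_TLD = "".join(
    rf"(?<!\.{tld})" for tld in ("com", "net", "org", "gov", "us", "edu", "mil")
)
# The archived copy shows "> |"; the space is assumed to be a wrapping artifact.
TERMINATOR = r"""(?:"|'|:|\?|!|>|\*|\)|$)"""

# The pattern as posted.
original = re.compile(HOST + NOT_COMMON_TLD + TERMINATOR)

# Assumed fix: (?<!\.) stops the host from ending on a dot, and \.? then
# consumes one optional trailing dot before the terminator.
adjusted = re.compile(HOST + r"(?<!\.)" + NOT_COMMON_TLD + r"\.?" + TERMINATOR)


def matches(pattern, text):
    """Return True if the pattern is found anywhere in text."""
    return pattern.search(text) is not None


if __name__ == "__main__":
    samples = [
        "http://www.go.za",
        "http://www.go.za)",
        "http://www.go.za.",
        "http://www.go.com",
        "http://www.go.com.",
    ]
    for uri in samples:
        print(uri, matches(original, uri), matches(adjusted, uri))
```

If SA already strips the trailing characters before a uri test sees the string, as Stuart suggests, the terminator group and the extra dot handling may not be needed at all and the lookbehinds alone could be enough.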