Re: uri regex
by Stuart Johnston
Jun 15 2005 11:08AM
cjackson wrote:
> Hi,
>
> I flunked the IQ test so I need some help. I want to match all domains
> in the body that are not in .com, .org, .us, .edu, .gov and .mil. But there's
> more. I need to match some characters at the end of the URI that can
> often be found there such as >.?)*!"';
>
> The rule would match http://www.go.za and http://www.go.za), but not
> match http://www.go.com
>
> Here's my regex that does not work...
>
> m{https?://[^\s/:"')!?>*]+(?<!\.com)(?<!\.net)(?<!\.org)(?<!\.gov)(?<!\.us)(?<!\.edu)(?<!\.mil)(?:"|'|:|\?|!|> |\*|\)|$)}
>
>
>
> It works for all of the characters except for an ending "." such as
> http://www.go.com.
>
> I have grappled with this for some time and read the pcrepattern.txt
> accompanying Exim source, but damn if I can get it to work. Anybody want
> to spit out the answer?
Assuming that you are creating a SpamAssassin (SA) rule, have you
considered using a uri test? That way you wouldn't have to worry about
the extra characters at the end; SA would take care of them for you.
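
If the pattern does have to run against raw body text, the trailing "." case in the quoted regex can be reproduced outside the mail server. The sketch below is only an illustration in Python's `re` module, not the poster's final rule: the names `ORIGINAL`, `ADJUSTED` and `matches` are made up for the example, and the `> ` alternative in the quoted pattern is assumed to mean a plain `>`. The host character class accepts ".", so in `http://www.go.com.` the host ends in "com." and none of the look-behinds fire. One possible adjustment is to forbid the host from ending in "." and then allow a single optional "." before the terminator.

```python
import re

# Host characters and terminators, taken from the quoted pattern.
# Assumption: the "> " alternative in the original is treated as ">".
HOST = r"""[^\s/:"')!?>*]+"""
TERMINATOR = r"""(?:["':?!>*)]|$)"""

# One fixed-width negative look-behind per TLD listed in the quoted regex.
EXCLUDED_TLDS = ["com", "net", "org", "gov", "us", "edu", "mil"]
LOOKBEHINDS = "".join(rf"(?<!\.{tld})" for tld in EXCLUDED_TLDS)

# The quoted pattern: a host ending in "com." passes every look-behind.
ORIGINAL = re.compile(rf"https?://{HOST}{LOOKBEHINDS}{TERMINATOR}")

# Adjusted sketch: the host may not end in ".", and one optional "."
# is consumed separately before the terminator.
ADJUSTED = re.compile(rf"https?://{HOST}(?<!\.){LOOKBEHINDS}\.?{TERMINATOR}")


def matches(pattern: re.Pattern, text: str) -> bool:
    """Return True if the pattern matches anywhere in the text."""
    return pattern.search(text) is not None


if __name__ == "__main__":
    for url in ("http://www.go.za", "http://www.go.za)", "http://www.go.za.",
                "http://www.go.com", "http://www.go.com."):
        print(url, matches(ORIGINAL, url), matches(ADJUSTED, url))
```

Neither pattern handles a URI with a path after the host, since "/" is excluded from the host class and is not a terminator. That is one more reason a uri test may be the simpler route.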