Re: uri regex
by Craig Jackson
Jun 15 2005 10:21AM
Bret Miller wrote:
> >I flunked the IQ test, so I need some help. I want to match
> >all domains
> >in the body that are not in .com, .org, .us, .edu, .gov and .mil.
> >But there's
> >more. I need to match some characters at the end of the URI that can
> >often be found there such as >.?)*!"';
> >
> >The rule would match http://www.go.za and http://www.go.za), but not
> >match http://www.go.com
> >
> >Here's my regex that does not work...
> >
> >m{https?://[^\s/:"')!?>*]+(?<!\.com)(?<!\.net)(?<!\.org)(?<!\.gov)(?<!\.us)(?<!\.edu)(?<!\.mil)(?:"|'|:|\?|!|>|\*|\)|$)}
> >
> >
> >It works for all of the characters except for an ending "." such as
> >http://www.go.com.
> >
> >I have grappled with this for some time and read the pcrepattern.txt
> >that accompanies the Exim source, but damn if I can get it to work.
> >Anybody want to spit out the answer?
>
>
> I'm no regex expert, but your ending (?:"|'|:|\?|!|>|\*|\)|$) doesn't
> list a ., so it wouldn't catch it.
>
> Maybe
> (?:"|'|:|\?|!|>|\*|\)|$|\.)
> would be better?
>
> Bret
>
>
>
Thanks, Bret, but I tried that and got matches ending at each "." in the
URI.
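A possible fix, sketched in Python's `re` (which, like Perl and PCRE, accepts a fixed-length lookbehind for each excluded TLD) rather than as a finished SpamAssassin rule. The root cause seems to be that the host class `[^\s/:"')!?>*]` also accepts ".", so a trailing period becomes part of the host and `(?<!\.com)` no longer sees `.com` at the end. Adding `\.` to the terminators instead lets the engine backtrack and stop at an earlier dot in the host. The sketch below matches the host one dot-separated label at a time, so it can never end on a dot, and then allows a single optional "." before the terminator. The names `LABEL`, `NON_US_URI` and `non_us_uris`, and the addition of whitespace and "/" to the terminator set, are my own assumptions and not part of the original rule.

```python
import re

# Host label: the original rule's character class minus '.', so the host is
# matched one dot-separated label at a time and can never end on a dot.
LABEL = r'[^\s/:"\')!?>*.]+'

# Hypothetical name; an untested adaptation of the rule from this thread.
NON_US_URI = re.compile(
    r'https?://' + LABEL + r'(?:\.' + LABEL + r')*'
    # Reject hosts that end in an excluded TLD (fixed-width lookbehinds).
    r'(?<!\.com)(?<!\.net)(?<!\.org)(?<!\.gov)'
    r'(?<!\.us)(?<!\.edu)(?<!\.mil)'
    # Allow one trailing '.', then require a terminator or end of string.
    # Whitespace and '/' are assumed additions to the original terminators.
    r'(?=\.?(?:[\s"\':?!>*)/]|$))'
)


def non_us_uris(text):
    """Return the scheme+host part of each matching URI in text."""
    return [m.group(0) for m in NON_US_URI.finditer(text)]


print(non_us_uris("See http://www.go.za. Not http://www.go.com."))
# ['http://www.go.za']
```

Because the host can no longer swallow the final ".", `http://www.go.com.` is rejected by the `.com` lookbehind, and backtracking to a shorter host fails because the next character is neither a terminator nor a single dot followed by one. The same pattern should carry over to a Perl `m{...}` rule with little more than quoting changes, but I have not tried it in Exim or SpamAssassin.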