spamassassin-users
RE: uri regex
by Bret Miller
Jun 15 2005 10:38AM
>  >> I flunked the IQ test so I need some help. I want to match all domains
>  >> in the body that are not in .com, .org, .us, .edu, .gov and .mil. But
>  >> there's more. I need to match some characters at the end of the URI
>  >> that can often be found there such as >.?)*!"';
>  >>
>  >> The rule would match http://www.go.za and http://www.go.za), but not
>  >> match http://www.go.com
>  >>
>  >> Here's my regex that does not work...
>  >>
>  >> m{https?://[^\s/:"')!?>*]+(?<!\.com)(?<!\.net)(?<!\.org)(?<!\.gov)(?<!\.us)(?<!\.edu)(?<!\.mil)(?:"|'|:|\?|!|>|\*|\)|$)}
>  >>
>  >> It works for all of the characters except for an ending "." such as
>  >> http://www.go.com.
>  >>
>  >> I have grappled with this for some time and read the pcrepattern.txt
>  >> accompanying Exim source, but damn if I can get it to work. Anybody
>  >> want to spit out the answer?
>  > 
>  > 
>  > Assuming that you are creating a SA rule, have you considered using a
>  > uri test?  That way you wouldn't have to worry about the extra
>  > characters at the end.  SA would take care of it for you.
>  > 
>  Yes, it is a uri test which I patterned after WEIRD_PORTS in 20_uri
>  
>  Mine is like this...
>  
>  uri SUSPECT_DOM_CJ =~ <expression>
>  score SUSPECT_DOM_CJ <score>
>  
>  I didn't know that SA took care of the ending characters in 
>  uri tests. I'll take another look to consider this. Thanks.


That I do know a little about. The developers have been working on
handling extra characters on the end of URIs. I think the fix got into
3.0.4 so you should probably upgrade if you haven't.

Bret
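
For readers following the thread, here is a hedged sketch of why the quoted pattern probably misfires on a trailing ".", and one possible way to tighten it. It is written for Python's re module rather than Perl or SpamAssassin, purely so it can be run standalone. The names ORIGINAL, REVISED and is_suspect are made up for illustration and appear nowhere in the thread. The likely cause: "." is allowed in the host character class, so for http://www.go.com. the host part swallows the final dot, the text in front of the terminator ends in "com." rather than ".com", and every lookbehind passes. The revised pattern assumes it is acceptable to forbid "." as the last host character and to allow one optional dot before the terminator.

```python
import re

# TLDs the original poster wants to exclude (taken from the quoted regex).
TLDS = ("com", "net", "org", "gov", "us", "edu", "mil")

# One fixed-width negative lookbehind per TLD, e.g. (?<!\.com)
LOOKBEHINDS = "".join(rf"(?<!\.{tld})" for tld in TLDS)

# Host characters, copied from the character class in the quoted regex.
HOST_CHAR = r"""[^\s/:"')!?>*]"""
# Assumption: same class, but additionally excluding "." for the last host char.
HOST_END = r"""[^\s/:"')!?>*.]"""
# Trailing punctuation or end of string; a character-class spelling of the
# alternation (?:"|'|:|\?|!|>|\*|\)|$) from the quoted regex.
TERMINATOR = r"""(?:["':?!>*)]|$)"""

# The pattern as posted, ported to Python.
ORIGINAL = re.compile(r"https?://" + HOST_CHAR + "+" + LOOKBEHINDS + TERMINATOR)

# Illustrative revision: the host must end in a non-dot character, the
# lookbehinds are checked there, and a single sentence-ending "." is
# consumed separately before the terminator.
REVISED = re.compile(
    r"https?://" + HOST_CHAR + "*" + HOST_END + LOOKBEHINDS + r"\.?" + TERMINATOR
)


def is_suspect(text: str) -> bool:
    """Return True if text contains a URI whose host is outside the listed TLDs."""
    return REVISED.search(text) is not None


if __name__ == "__main__":
    for sample in (
        "http://www.go.za",
        "http://www.go.za)",
        "http://www.go.za.",
        "http://www.go.com",
        "http://www.go.com.",
    ):
        print(sample, bool(ORIGINAL.search(sample)), is_suspect(sample))
```

If SpamAssassin 3.0.4 or later does strip trailing punctuation before uri rules run, as the reply above suggests, the terminator group may be unnecessary there; that has not been checked here.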
