spamassassin-users
Re: Rule Design Benchmark/Resource Question
by Rocky Olsen
Mar 31 2005 3:57PM
Thanks

On Thu, Mar 31, 2005 at 05:16:25PM -0500, Matt Kettler wrote:
>  Rocky Olsen wrote:
>  
>  >Before I pull my hair out doing bench/resource tests, I was wondering if
>  >anyone out there knew whether there is much of a speed/resource usage
>  >difference between the following ways of writing the same rule.
>  >
>  >
>  >Method A:
>  >body	rule_a		/(?:feh|meh|bleh)/i
>  >
>  >vs.
>  >
>  >Method B:
>  >
>  >body	__rule_a	/(?:feh)/i
>  >body	__rule_b	/(?:meh)/i
>  >body	__rule_c	/(?:bleh)/i
>  >
>  >meta	rule_d		(__rule_a || __rule_b || __rule_c)
>  >
>  >
>  >There probably isn't much difference using just 3 rules, but I'm thinking
>  >more along the lines of large (500+) lists, and it isn't limited to just body
>  >stuff.  So if anyone has some real-world benchmarks/experience with what is
>  >preferred, or if the developers know which is faster for SA, I would love
>  >the input.
>  >  
>  >
>  
>  To start with, use perl's regex debugger as your friend:
>  
>  $ perl -Mre=debug -e  "/(?:feh|meh|bleh)/i"
>  size 11 Got 92 bytes for offset annotations.
>  
>  $ perl -Mre=debug -e  "/(?:feh)/i"
>  Freeing REx: `","'
>  Compiling REx `(?:feh)'
>  size 3 Got 28 bytes for offset annotations.
>  
>  (repeat 2 times)
>  
>  However, this only deals with part of the story: the cost of the regex
>  itself. It does not deal with the per-rule overhead in SA.
>  
>  In general I'd favor the combined approach, unless for some reason your
>  combined rule is considerably larger than the sum of its parts. Bigevil
>  ran much better once Chris S did some combining and common subexpression
>  elimination.
>  
>  Also, I'd suggest eliminating the (?:) for the single-text-matches. It
>  does nothing of use, and doesn't change the evaluation of the regex any
>  for a simple single text match. All it does is waste 4 bytes of disk
>  space per rule.
>  
>  body __RULE_A   /feh/i
>  
>  instead of:
>  body __RULE_A   /(?:feh)/i
>  
>  I leave comparing the two using re=debug as an exercise for the student.
>  Also compare to /(feh)/i and /(feh)\1/i to see how capturing and backreferences work.

-- 
______________________________________________________________________


what's with today, today?

Email:	rocky@[...].org
PGP:	http://rocky.mindphone.org/rocky_mindphone.org.gpg