ASPN ActiveState Programmer Network
ActiveState
/ Home / Perl / PHP / Python / Tcl / XSLT /
/ Safari / My ASPN /
Cookbooks | Documentation | Mailing Lists | Modules | News Feeds | Products | User Groups


Recent Messages
List Archives
About the List
List Leaders
Subscription Options

View Subscriptions
Help

View by Topic
ActiveState
.NET Framework
Open Source
Perl
PHP
Python
Tcl
Web Services
XML & XSLT

View by Category
Database
General
SOAP
System Administration
Tools
User Interfaces
Web Programming
XML Programming


MyASPN >> Mail Archive >> perl5-porters
perl5-porters
Re: (blead patch) New TRIE regex patch
by other posts by this author
Mar 2 2005 4:33PM messages near this date
Re: (blead patch) New TRIE regex patch | Re: (blead patch) New TRIE regex patch
demerphq <demerphq@[...].com>  wrote:
:On Wed, 02 Mar 2005 14:53:21 +0000, hv@[...].org <hv@[...].org>  wrote:
:>  demerphq <demerphq@[...].com> wrote:
:>  :Hmm. Yes I think you are right. Ok. If i rework it so the trie is
:>  :allocated and inserted into the regexes data structure at the get-go
:>  :then pregfree would clean it up.
:>  
:>  Make sure in that case that you can cope with reentrancy.
:
:Can you expand on that a bit please? Im not entirely certain what you
:mean there.

If you're attaching stuff to the regexp structure, you need to cope
with reentrancy due to eval blocks, by making sure that stuff is saved
and restored at the right time. There's stuff that does all that, I just
don't remember where right now - I'm sure a trawl would turn it up.

:Personally I think of cases such as you describe as being pretty well
:pathological. I had a look through a large set of regexes and none of
:them had more than a small handful of unique characters.

The sort of situation I'm thinking about is a virus scanner, which would
have a large number of short characteristic sequences to search for;
I can also imagine bioinformatics applications that would do something
similar.

:(Do a review of your own code. How many times have your made a regex
:with /a|list|of|words/ that had more than a small number of letters
:(ie well less than 256)?)

In my current work application (about 48 KLOC) I found only two regexps
that would get a trie, one matching '(before | after)', and the other
'(add | remove | replace)'; however I don't think I'm a typical user -
in general, the people who will get the biggest benefit are those who
wrote their code *without* knowing that this is one case that isn't
currently well optimised.

:>  Maybe it would make sense to use a poor man's hash: [...]
:
:Possibly such a strategy could be taken when ( charcount * length )
:was high enough although my gut feeling is that it probably wouldnt be
:worth it.

In general we don't want multiple strategies if we can possibly avoid it.
It's down to finding a best-fit strategy that minimises the resource cost
of the average case while avoiding pathological behaviour in all cases.

Hugo
Thread:

Yitzchak Scott-Thoennes
Demerphq

Demerphq
Yitzchak Scott-Thoennes

Dave Mitchell
Demerphq
Dave Mitchell
Demerphq
Dave Mitchell

Demerphq
Demerphq
Yitzchak Scott-Thoennes
Rafael Garcia-Suarez

Privacy Policy | Email Opt-out | Feedback | Syndication
© ActiveState Software Inc. All rights reserved