python-list
Re: python script as an emergency mailbox cleaner
by Phil Weldon
Sep 21 2003 9:15PM
Yes, I tend to discount your advice, because it may be that you aren't
considering three things: the messages generated by Worm.Automat.AHB are a
very restricted subset of spam; the legitimate 'undeliverable e-mail'
messages are closely related to them; and so are the 'undelivered e-mail'
messages caused by Worm.Automat.AHB-generated e-mail with the target e-mail
address in the FROM line.  The current need is a quick way to counter the
'spam' effects of Worm.Automat.AHB, not to correctly categorize Nigerian
fund transfer and Viagra spam.

To explain further, the bogus 'undeliverable e-mail' type messages are
permuting, and the database supplying the input to the worm's generator is
growing.  There are at least two classes of bogus 'undeliverable mail':

1.  e-mail generated by the worm itself
2.  real 'undeliverable e-mail' messages that result from the worm
using your e-mail address as the sender on bogus 'undeliverable e-mail',
which then generates a legitimate but unwanted and useless 'undeliverable
e-mail' message.

Now, if you have the time to supply your arguments rather than cv, I'll be
happy to learn.

And, to quote the Inboxer help file,

"The text box in the Create Filters area indicates the number of messages
that were processed to build the filters. Generally, the higher the number,
the more accurate the filters will become."

So far, the scoring Inboxer developed from the ~1500 bad and 264
good examples has produced no false negatives or false positives, including
correctly classifying a dozen completely legitimate 'undelivered e-mail'
messages in a set of ~400 new messages.  The ~1500 bad e-mail messages have
a date spread of 18SEP03 through 20SEP03, while the 265 good e-mail messages
have a date spread of 1AUG03 through 20SEP03.  Both sets were sent to my ISP
mailbox.
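
For anyone who wants to check a claim like "no false negatives or false
positives" on their own mail, here is a minimal sketch of the tally.  It
assumes each message has already been given a score between 0 and 1 by some
filter and has been labelled by hand; the function name tally_errors and the
0.9 cutoff are illustrative assumptions, not anything taken from Inboxer or
spambayes.

```python
def tally_errors(results, cutoff=0.9):
    """Count filter mistakes.

    results: iterable of (score, is_bad) pairs, where score is the
    filter's output in [0, 1] and is_bad is the hand-assigned label.
    cutoff is an assumed threshold, not a value from any real filter.
    """
    false_pos = 0  # legitimate mail flagged as bad
    false_neg = 0  # bad mail let through
    for score, is_bad in results:
        flagged = score >= cutoff
        if flagged and not is_bad:
            false_pos += 1
        elif not flagged and is_bad:
            false_neg += 1
    return false_pos, false_neg


# Hypothetical scores, made up purely for illustration.
sample = [(0.99, True), (0.95, True), (0.02, False), (0.10, False)]
print(tally_errors(sample))
```

A legitimate 'undelivered e-mail' message that scored above the cutoff would
show up here as a false positive, which is the case under discussion.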

I will try dividing the two sets of messages into smaller sets and try the
results of your suggestion on new e-mails as they collect.  By the way, my
current ratio of Worm.Automat.AHB instigated messages to legitimate e-mail
(which for my purposes includes traditional spam) is far greater than
1500:265; it's more like 1500:50.
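
One way the division into smaller sets could be done is sketched below, under
the assumption that the messages are available as two Python sequences.  The
helper balanced_batches is hypothetical; it simply shuffles both sets and
pairs up equal-sized slices, so each training round sees as many good
examples as bad ones, and the surplus bad messages are left unused.

```python
import random


def balanced_batches(bad, good, batch_size, seed=0):
    """Yield (bad_slice, good_slice) pairs of equal size.

    The number of batches is limited by the smaller set, so extra
    messages from the larger set are not used for training.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    bad = list(bad)
    good = list(good)
    rng.shuffle(bad)
    rng.shuffle(good)
    n_batches = min(len(bad), len(good)) // batch_size
    for i in range(n_batches):
        lo, hi = i * batch_size, (i + 1) * batch_size
        yield bad[lo:hi], good[lo:hi]


# Stand-in message ids; real use would pass the actual messages.
batches = list(balanced_batches(range(1500), range(264), batch_size=50))
print(len(batches))
```

Each pair could then be fed to the filter's training step in turn, with the
new e-mails scored after every round to see whether the results change.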

And I guess I should download from spambayes and donate to PSF since my
daughter is using Python in her physics classes at Carnegie-Mellon.
Coincidentally, I just happened to be looking at my loose-leaf copy of
Feynman's Lectures on Physics, with a reference manual in the back for the
FORTRAN IV I had to use for physics classes.



Phil Weldon, pweldon@[...].com

"Tim Peters" <tim.one@[...].net>  wrote in message
news:mailman.1064166807.8722.python-list@[...]..
>  [Phil Weldon]
>  > I don't think 'fewer' examples of bogus 'Undeliverable e-mail'
>  > messages will be 'better' because of the permuting and morphing
>  > nature of this worm-generated message.  'Fewer' examples would result
>  > in ALL 'Undeliverable e-mail' messages being categorized as
>  > objectionable, because the number of valid messages of this type is so
>  > small in the saved e-mail that most users have.
> 
>  Which is exactly why training on "too many" such unwanted messages will
>  make it very difficult for the handful of legitimate messages of that
>  sort to score as ham.  I started the spambayes project, and did most of
>  the research for, and coding of, its tokenizer and classifier, but you're
>  certainly free to ignore my ill-informed advice <wink>
> 
.
.
.
>  > Now, if I can just find a way to charge the cost to Earthlink because
>  > of their failure to perform their implicit contract to provide
>  > reliable e-mail service.
> 
>  I suspect they already thought of that trick <wink> -- a good start would
>  be to read your service contract with them.
> 
> 


-- 
http://mail.python.org/mailman/listinfo/python-list