spamassassin-users
Re: bayes_seen file of 340MB
by Matt Kettler
Jun 15 2005 7:48AM
At 06:59 AM 6/15/2005, Federico Giannici wrote:
> We have a SpamAssassin installation with a single bayes database for 
> all our mailboxes (a couple thousand).
> 
> I think that the "bayes_toks" file has the expected size (around 8MB), but 
> the "bayes_seen" file seems too big to me: around 340MB!
> Is this size normal?

Yes, bayes_seen doesn't have expiry (yet). It was completely overlooked in 
the original bayes design.

It should be addressed in 3.1.0 (although I'm not sure it will be very 
automatic unless you take the path of disabling bayes_seen).

http://bugzilla.spamassassin.org/show_bug.cgi?id=2975

In the interim, you can stop SA and delete the file.
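
A minimal Python sketch of the "delete the file" step, for anyone who wants 
to script it. The path in the usage note is only an assumption (the usual 
per-user location); a site-wide setup will have bayes_path pointing 
somewhere else. The function name and the size threshold are hypothetical, 
and the sketch does not stop SA for you, so do that first.

```python
from pathlib import Path


def remove_bayes_seen(path, min_size_bytes=0):
    """Delete the bayes_seen file at `path` if it is at least
    `min_size_bytes` large. Returns the number of bytes freed,
    or 0 if nothing was removed.

    Assumes every SpamAssassin process using the database has
    already been stopped; this function does not check that.
    """
    seen = Path(path).expanduser()
    if not seen.is_file():
        return 0  # nothing to do
    size = seen.stat().st_size
    if size < min_size_bytes:
        return 0  # file is still small, leave it alone
    seen.unlink()
    return size
```

For example, `remove_bayes_seen("~/.spamassassin/bayes_seen", 100 * 1024 * 1024)` 
would only remove the file once it has grown past 100MB.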

Be aware that once the file is gone, sa-learn no longer knows which 
messages it has already trained on, so those messages can be learned a 
second time. This is not a big deal for most, but a few people rely on 
dumping files into a directory and learning the whole directory each time, 
including the files from the last learning run.
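
To illustrate the re-learning effect, here is a toy model in Python. It is 
not SpamAssassin's code; the class and its fields are made up for the 
example, and it only assumes that the "seen" record is what lets a learner 
skip a message it has already trained on.

```python
class ToyLearner:
    """Toy stand-in for a Bayes learner with a 'seen' record."""

    def __init__(self):
        self.seen = {}          # message id -> "s" (spam) or "h" (ham)
        self.token_counts = {}  # token -> number of times learned

    def learn(self, msg_id, tokens, is_spam):
        """Learn a message once. Returns False if it was skipped."""
        if msg_id in self.seen:
            return False  # already trained, skip it
        for tok in tokens:
            self.token_counts[tok] = self.token_counts.get(tok, 0) + 1
        self.seen[msg_id] = "s" if is_spam else "h"
        return True


learner = ToyLearner()
learner.learn("msg-1", ["viagra"], is_spam=True)   # learned
learner.learn("msg-1", ["viagra"], is_spam=True)   # skipped, count stays 1

learner.seen.clear()                               # "delete bayes_seen"
learner.learn("msg-1", ["viagra"], is_spam=True)   # learned again, count is 2
```

In the toy model, wiping the seen record makes the same message count twice, 
which is what happens if you re-run a learn over a directory of old mail.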


> Doesn't a file of this size slow down the queries?

I'm not sure; probably. I try to wipe my bayes_seen on occasion.