Re: text categorization with SVM and NaiveBayes
by Ken Williams
Jan 9 2007 4:52AM
On Jan 8, 2007, at 10:51 AM, Tom Fawcett wrote:
> Just to add a note here: Ken is correct -- both NB and SVMs are
> known to be rather poor at providing accurate probabilities. Their
> scores tend to be too extreme. Producing good probabilities from
> these scores is called calibrating the classifier, and it's more
> complex than just taking a root of the score. There are several
> methods for calibrating scores. The good news is that there's an
> effective one called isotonic regression (or Pool Adjacent
> Violators), which is pretty easy and fast. The bad news is that
> there's no plug-in (i.e., CPAN-ready) Perl implementation of it (I've
> got a simple implementation which I should convert and contribute
> someday).
>
> If you want to read about classifier calibration, google one of
> these titles:
>
> "Transforming classifier scores into accurate multiclass
> probability estimates"
> by Bianca Zadrozny and Charles Elkan
>
> "Predicting Good Probabilities With Supervised Learning"
> by A. Niculescu-Mizil and R. Caruana
Cool, thanks for the references. It might be nice to add some such
calibration scheme to Algorithm::NaiveBayes (and friends), so that the
user has a choice of several normalization schemes, including "none".
If I get a surplus of tuits I'll add it, or if you feel like
contributing your implementation, that would be great too.
-Ken
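
For readers who want to see what the Pool Adjacent Violators idea from the quoted message looks like in practice, here is a minimal sketch in Python. It is not Tom's implementation and not part of Algorithm::NaiveBayes; the function names (pav, fit_calibrator, calibrate) are hypothetical and chosen only for illustration. It assumes binary 0/1 labels and a held-out set of raw classifier scores, and it does no special handling of tied scores.

```python
import bisect


def pav(values, weights=None):
    """Pool Adjacent Violators: least-squares non-decreasing fit to `values`."""
    if weights is None:
        weights = [1.0] * len(values)
    blocks = []  # each block is [mean, total_weight, item_count]
    for v, w in zip(values, weights):
        blocks.append([float(v), float(w), 1])
        # Merge the last two blocks while they violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, n1 + n2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


def fit_calibrator(scores, labels):
    """Sort examples by raw score and fit PAV to their 0/1 labels.

    Returns (sorted_scores, calibrated_probabilities), two parallel lists.
    """
    pairs = sorted(zip(scores, labels))
    xs = [s for s, _ in pairs]
    ys = pav([y for _, y in pairs])
    return xs, ys


def calibrate(score, xs, ys):
    """Step-function lookup: use the fitted value of the largest
    training score that is <= `score` (or the first one if none is)."""
    i = bisect.bisect_right(xs, score) - 1
    return ys[max(i, 0)]
```

A typical use would be to call fit_calibrator once on held-out scores and their true labels, then pass each new raw score through calibrate. Because the fitted values are averages of 0/1 labels, they stay within [0, 1] without any extra normalization.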