perl-ai
Creating Collection of uncategorized data
by Alan Gibson
Jan 4 2007 7:25PM
Hello,

First post to this list. I'm beginning a project that will use
automated text classification to classify congressional bills, and
AI::Categorizer looks like the best framework to use. However, I'm
hitting a snag on what should be a simple operation.

I train an svm classifier on 1000 documents; this operation goes fine.
I then try to create an instance of AI::Categorizer::Collection::Files
containing 5 unclassified documents. I supply only the path because
the 5 documents are not yet categorized:

    my $c = AI::Categorizer::Collection::Files->new(path => $path);
    while (my $document = $c->next) {
        my $hypothesis = $nb->categorize($document);
        print "Best assigned category: ", $hypothesis->best_category, "\n";
        print "All assigned categories: ", join(', ', $hypothesis->categories), "\n";
    }

This produces the error

No category information about '5-508' at /usr/local/share/perl/5.8.7/AI/Categorizer/Collection/Files.pm line 44.
Mandatory parameter 'all_categories' missing in call to AI::Categorizer::Hypothesis->new()

To get around this error I could just supply the categories of the 5
unknown test documents, but in our real-world application we will have
a constant stream of unclassified documents coming in that will
receive human attention only long after they have been automatically
classified.

Is the design intent to allow only test documents that are already
categorized (e.g. for creating confidence statistics)? If so, does
anyone have suggestions on the preferred way to classify unknown
documents with AI::Categorizer?

Thanks,
Alan
