Re: AI::Categorizer::InMemory %data
by Ken Williams
Aug 5 2005 4:29PM
Hi Bill,
The problem in your example is actually in how you're creating
%dochash. You're re-using the same %doc hash for both documents, so
both entries in %dochash hold a reference to one and the same hash,
and the second set of assignments overwrites the first.
Witness this condensed version of your example:
=======================================================
my %doc;
my %dochash;

$doc{name} = "Seahawks";
$doc{content} = "The Seahawks are a pretty good team. I enjoy watching them.";
$dochash{SeahawksDocTitle} = \%doc;

$doc{name} = "Seattle";
$doc{content} = "I like to go to seattle and watch the mariners and stuff";
$dochash{SeattleDocTitle} = \%doc;

use Data::Dumper;
print Dumper \%dochash;
=======================================================
$VAR1 = {
  'SeattleDocTitle' => {
    'content' => 'I like to go to seattle and watch the mariners and stuff',
    'name' => 'Seattle'
  },
  'SeahawksDocTitle' => $VAR1->{'SeattleDocTitle'}
};
=======================================================
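To make the shared reference visible without Data::Dumper, here is a minimal sketch in plain Perl. The %copies hash and the $shared flag are names introduced only for this illustration. It compares the two stored references directly, then shows that copying with { %doc } gives each entry its own hash:

```perl
use strict;
use warnings;

my %doc = (name => "Seahawks");
my %dochash;

# Storing \%doc stores a reference to the one and only %doc hash.
$dochash{SeahawksDocTitle} = \%doc;

# Changing %doc afterwards also changes what the first entry points to.
$doc{name} = "Seattle";
$dochash{SeattleDocTitle} = \%doc;

# References compare equal numerically when they point at the same thing.
my $shared = $dochash{SeahawksDocTitle} == $dochash{SeattleDocTitle};
print $shared ? "same hash\n" : "different hashes\n";

# { %doc } builds a new anonymous hash holding a shallow copy of %doc.
my %copies;
$doc{name} = "Seahawks";
$copies{SeahawksDocTitle} = { %doc };
$doc{name} = "Seattle";
$copies{SeattleDocTitle} = { %doc };
print "$copies{SeahawksDocTitle}{name}, $copies{SeattleDocTitle}{name}\n";
```

Note that the copy is shallow: any nested references inside %doc (an array of categories, say) would still be shared between the copies unless they are different arrays to begin with.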
There are several ways to create the data structure you intend - one
way would be something like this:
=======================================================
my %dochash;

$dochash{SeahawksDocTitle} =
  {
    name => "Seahawks",
    content => "The Seahawks are a pretty good team. I enjoy watching them.",
  };

$dochash{SeattleDocTitle} =
  {
    name => "Seattle",
    content => "I like to go to seattle and watch the mariners and stuff",
  };

use Data::Dumper;
print Dumper \%dochash;
=======================================================
$VAR1 = {
  'SeattleDocTitle' => {
    'name' => 'Seattle',
    'content' => 'I like to go to seattle and watch the mariners and stuff'
  },
  'SeahawksDocTitle' => {
    'name' => 'Seahawks',
    'content' => 'The Seahawks are a pretty good team. I enjoy watching them.'
  }
};
=======================================================
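Another of the several ways, if the documents come from a list rather than being written out by hand, is to declare the hash inside a loop. This is only a sketch: the @raw array and its [ title, name, content ] layout are assumptions made up for the example, not anything AI::Categorizer requires.

```perl
use strict;
use warnings;

# Hypothetical source data: [ title, name, content ] triples.
my @raw = (
    [ "SeahawksDocTitle", "Seahawks",
      "The Seahawks are a pretty good team. I enjoy watching them." ],
    [ "SeattleDocTitle", "Seattle",
      "I like to go to seattle and watch the mariners and stuff" ],
);

my %dochash;
for my $row (@raw) {
    my ($title, $name, $content) = @$row;

    # 'my' inside the loop creates a fresh %doc on every pass,
    # so each stored reference points at its own hash.
    my %doc = (name => $name, content => $content);
    $dochash{$title} = \%doc;
}

print "$_ => $dochash{$_}{name}\n" for sort keys %dochash;
```

Because %doc is scoped to the loop body, taking \%doc on each pass yields a distinct hash, which is the difference from declaring %doc once at the top and re-using it.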
Then the following display code shows that the Collection is created
properly:
=======================================================
print "Number of docs: ", $collection->count_documents, "\n";
while (my $doc = $collection->next) {
  print $doc->name, " => [", join(", ", map $_->name, $doc->categories), "]\n";
}
=======================================================
Number of docs: 2
Seahawks => [trucks, cars]
Seattle => [seattle, baseball]
=======================================================
-Ken
On Aug 4, 2005, at 7:26 PM, Bill W. wrote:
> Hello perl-ai!
>
> I've been playing with AI::Categorizer for a week or two now, and am
> having difficulties creating a collection object using the InMemory
> module. I'm new to perl and oop and programming for that matter, but
> I've managed to get the functionality I'm looking for from
> AI::Categorizer using Collection::Files. However, it would be very
> much more useful and efficient if I could create the collection from
> memory. It seems that the collection is created, and I can load it
> into a knowledgeset. I can even train NaiveBayes on the knowledge set
> and categorize documents (although I'm not sure that it's doing so
> properly.). It seems that it's not acknowledging all of the
> categories that are included in the collection's documents, it seems
> to only be recognizing one document's category set as the set for the
> collection. The main error I'm getting is when I try to generate a
> stats_table using:
>
> my $mem_experiment = $l_mem->categorize_collection( collection =>
> $c_mem_test );
> print $mem_experiment->stats_table;
>
> Can't take log of 0 at
> /usr/local/share/perl/5.8.4/Statistics/Contingency.pm line 183.
>
> Can anyone tell me where I'm going wrong? I very much appreciate help
> from anyone who has gotten this working. And thanks to Ken for
> creating this great tool.
>
> -Bill
>
>
> ---------code snippet--------
> my %doc;
> my %dochash;
>
> my $cars = AI::Categorizer::Category->by_name(name => "cars");
> my $trucks = AI::Categorizer::Category->by_name(name => "trucks");
> my $baseball = AI::Categorizer::Category->by_name(name => "baseball");
> my $seattle = AI::Categorizer::Category->by_name(name => "seattle");
>
> push(my @seahawks_categories,$cars,$trucks);
> push(my @seattle_categories,$seattle,$baseball);
>
>
> $doc{name} = "Seahawks";
> $doc{content} = "The Seahawks are a pretty good team. I enjoy watching
> them, and going to Seattle to see them";
> $doc{categories} = \@seahawks_categories;
> $dochash{SeahawksDocTitle} = \%doc;
>
> $doc{name} = "Seattle";
> $doc{content} = "I like to go to seattle and watch the mariners and
> stuff";
> $doc{categories} = \@seattle_categories;
> $dochash{SeattleDocTitle} = \%doc;
>
>
> my $collection = new AI::Categorizer::Collection::InMemory( data
> => \%dochash);
>
> return($collection);
>