perl-ai
Re: AI::Categorizer::InMemory %data
by Ken Williams
Aug 5 2005 4:29PM
Hi Bill,

The problem in your example is in how you're creating %dochash.  
You're re-using the same %doc hash for both documents, so both entries 
in %dochash hold references to one underlying hash, and the second set 
of assignments overwrites the first.  Witness this condensed version of 
your example:

=======================================================
my %doc;
my %dochash;

$doc{name} = "Seahawks";
$doc{content} = "The Seahawks are a pretty good team. I enjoy watching them.";
$dochash{SeahawksDocTitle} = \%doc;

$doc{name} = "Seattle";
$doc{content} = "I like to go to seattle and watch the mariners and stuff";
$dochash{SeattleDocTitle} = \%doc;

use Data::Dumper;
print Dumper \%dochash;
=======================================================
$VAR1 = {
          'SeattleDocTitle' => {
                                 'content' => 'I like to go to seattle and watch the mariners and stuff',
                                 'name' => 'Seattle'
                               },
          'SeahawksDocTitle' => $VAR1->{'SeattleDocTitle'}
        };
=======================================================
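
The shared reference in the condensed example can also be checked directly. The following is only a minimal sketch: the variable names follow the example above, while the `$same` variable and the printed messages are illustrative additions, not part of the original code.

```perl
use strict;
use warnings;

my %doc;
my %dochash;

$doc{name} = "Seahawks";
$dochash{SeahawksDocTitle} = \%doc;

$doc{name} = "Seattle";              # overwrites the same underlying hash
$dochash{SeattleDocTitle} = \%doc;

# Two references compare equal numerically when they point to the same hash.
my $same = $dochash{SeahawksDocTitle} == $dochash{SeattleDocTitle};
print "Same hash? ", ($same ? "yes" : "no"), "\n";
print "Seahawks entry name: $dochash{SeahawksDocTitle}{name}\n";
```

Both keys resolve to one hash, which is why Data::Dumper prints the second entry as a back-reference to the first.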


There are several ways to create the data structure you intend - one 
way would be something like this:


=======================================================
my %dochash;

$dochash{SeahawksDocTitle} =
   {
    name => "Seahawks",
    content => "The Seahawks are a pretty good team. I enjoy watching them.",
   };

$dochash{SeattleDocTitle} =
   {
    name => "Seattle",
    content => "I like to go to seattle and watch the mariners and stuff",
   };

use Data::Dumper;
print Dumper \%dochash;
=======================================================
$VAR1 = {
          'SeattleDocTitle' => {
                                 'name' => 'Seattle',
                                 'content' => 'I like to go to seattle and watch the mariners and stuff'
                               },
          'SeahawksDocTitle' => {
                                  'name' => 'Seahawks',
                                  'content' => 'The Seahawks are a pretty good team. I enjoy watching them.'
                                }
        };
=======================================================
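
Another of the several possible ways, sketched here as an assumption rather than as the recommended approach, is to keep the reusable %doc hash but store a shallow copy of it each time. The `{ %doc }` expression builds a new anonymous hash from the current contents of %doc, so each entry gets its own storage.

```perl
use strict;
use warnings;

my %doc;
my %dochash;

$doc{name}    = "Seahawks";
$doc{content} = "The Seahawks are a pretty good team. I enjoy watching them.";
$dochash{SeahawksDocTitle} = { %doc };   # new anonymous hash holding a copy

$doc{name}    = "Seattle";
$doc{content} = "I like to go to seattle and watch the mariners and stuff";
$dochash{SeattleDocTitle} = { %doc };    # a second, independent copy

# Each key now reports its own document name.
print "$_ => $dochash{$_}{name}\n" for sort keys %dochash;
```

The copy is shallow: a value that is itself a reference (such as an array of categories) would still be shared unless a different array is assigned for each document.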


Then the following display code shows that the Collection is created 
properly:

=======================================================
print "Number of docs: ", $collection->count_documents, "\n";
while (my $doc = $collection->next) {
   print $doc->name, " => [",
      join( ", ", map $_->name, $doc->categories ), "]\n";
}
=======================================================
Number of docs: 2
Seahawks => [trucks, cars]
Seattle => [seattle, baseball]
=======================================================


  -Ken


On Aug 4, 2005, at 7:26 PM, Bill W. wrote:

>  Hello perl-ai!
> 
>  I've been playing with AI::Categorizer for a week or two now, and am 
>  having difficulties creating a collection object using the InMemory 
>  module.  I'm new to perl and oop and programming for that matter, but 
>  I've managed to get the functionality I'm looking for from 
>  AI::Categorizer using Collection::Files.  However, it would be very 
>  much more useful and efficient if I could create the collection from 
>  memory.  It seems that the collection is created, and I can load it 
>  into a knowledgeset.  I can even train NaiveBayes on the knowledge set 
>  and categorize documents (although I'm not sure that it's doing so 
>  properly.).  It seems that it's not acknowledging all of the 
>  categories that are included in the collection's documents, it seems 
>  to only be recognizing one document's category set as the set for the 
>  collection.  The main error I'm getting is when I try to generate a 
>  stats_table using:
> 
>  my $mem_experiment = $l_mem->categorize_collection( collection => 
>  $c_mem_test );
>  print $mem_experiment->stats_table;
> 
>  Can't take log of 0 at 
>  /usr/local/share/perl/5.8.4/Statistics/Contingency.pm line 183.
> 
>  Can anyone tell me where I'm going wrong?  I very much appreciate help 
>  from anyone who has gotten this working.  And thanks to Ken for 
>  creating this great tool.
> 
>  -Bill
> 
> 
>  ---------code snippet--------
>  my %doc;
>  my %dochash;
> 
>  my $cars = AI::Categorizer::Category->by_name(name => "cars");
>  my $trucks = AI::Categorizer::Category->by_name(name => "trucks");
>  my $baseball = AI::Categorizer::Category->by_name(name => "baseball");
>  my $seattle = AI::Categorizer::Category->by_name(name => "seattle");
> 
>  push(my @seahawks_categories,$cars,$trucks);
>  push(my @seattle_categories,$seattle,$baseball);
> 
> 
>  $doc{name} = "Seahawks";
>  $doc{content} = "The Seahawks are a pretty good team. I enjoy watching 
>  them, and going to Seattle to see them";
>  $doc{categories} = \@seahawks_categories;
>  $dochash{SeahawksDocTitle} = \%doc;
> 
>  $doc{name} = "Seattle";
>  $doc{content} = "I like to go to seattle and watch the mariners and 
>  stuff";
>  $doc{categories} = \@seattle_categories;
>  $dochash{SeattleDocTitle} = \%doc;
> 
> 
>  my $collection = new AI::Categorizer::Collection::InMemory(     data 
>  => \%dochash);
> 
>  return($collection);
> 