PDF to text
by Martin Leese
Jul 30 2004 8:56PM
Hi All,
I want to convert a PDF file to plain text so that I can
parse out some information from the content. That is to
say, I want to dump any images and formatting, and just
extract the content.
I looked on CPAN and found the PDF module. As far as I
can tell, I would need to use PDF::Core. The examples give:
    use PDF::Core;
    $pdf = PDF::Core->new;
    $pdf = PDF->new(filename);
    $res = $pdf->GetObject($ref);
but I don't see where '$ref' comes from, nor how to get
GetObject to traverse the entire document.
Could somebody please give me some pointers here?
I think I see how to distinguish content from other
stuff; just use something like:
    if ($res =~ m/^\(.*\)$/)
    {
        my $string = UnQuoteString($res);
        print "$string\n";
    }
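To make that idea concrete, here is a rough, self-contained sketch. It assumes $res holds the raw text of a single object, and that text content appears as a PDF literal string in parentheses. The helper unquote_pdf_string is a hypothetical stand-in written for illustration, not the module's own UnQuoteString, and the sample values in the loop are made up. It decodes only the standard literal-string escapes (\n, \r, \t, \b, \f, \(, \), \\, octal codes, and backslash-newline continuation).

```perl
use strict;
use warnings;

# Single-letter escapes defined for PDF literal strings.
my %simple = (n => "\n", r => "\r", t => "\t", b => "\b", f => "\f");

# Decode one escape sequence (the part after the backslash).
sub decode_escape {
    my ($e) = @_;
    return ''                  if $e =~ /^[\r\n]/;  # backslash-newline: continuation
    return chr(oct($e) & 0xFF) if $e =~ /^[0-7]/;   # one to three octal digits
    return exists $simple{$e} ? $simple{$e} : $e;   # named escape, else the char itself
}

# Hypothetical helper: takes the text between the outer parentheses
# and returns it with all escape sequences decoded.
sub unquote_pdf_string {
    my ($body) = @_;
    $body =~ s/\\(\r\n|\r|\n|[0-7]{1,3}|.)/decode_escape($1)/gse;
    return $body;
}

# Made-up sample objects: two literal strings and one dictionary.
for my $res ('(Hello \(PDF\) world)', '<< /Type /Page >>', '(Line\0401)') {
    # Same test as above, with a capture group for the string body.
    if ($res =~ m/^\((.*)\)$/s) {
        print unquote_pdf_string($1), "\n";
    }
}
```

With these samples the dictionary is skipped and the two strings print as "Hello (PDF) world" and "Line 1". This only covers objects that are a single literal string; hex strings in angle brackets and text inside compressed streams would need separate handling.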
Many thanks,
Martin
_______________________________________________
Perl-Win32-Users mailing list
Perl-Win32-Users@[...].com
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs