perl-win32-users
PDF to text
by Martin Leese
Jul 30 2004 8:56PM
Hi All,

I want to convert a PDF file to plain text so that I can
parse out some information from the content.  That is to
say, I want to dump any images and formatting, and just
extract the content.

I looked on CPAN and found PDF.  As far as I can tell, I
would need to use PDF::Core.  The examples give:

use PDF::Core;

$pdf=PDF::Core->new;
$pdf=PDF->new(filename);

$res=$pdf->GetObject($ref);

but I don't see where '$ref' comes from nor how to get
GetObject to traverse the entire document.

Could somebody please give me some pointers here?

I think I see how to distinguish content from other
stuff; just use something like:

if ($res =~ m/^\(.*\)$/)
{
     my $string = UnQuoteString($res);
     print "$string\n";
}
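
As a rough illustration of that test, here is a minimal, self-contained sketch that does not depend on PDF::Core at all. The helper unquote_pdf_string is hypothetical (it is not the module's UnQuoteString), and the sample values are invented stand-ins for whatever GetObject might return. It only handles PDF literal strings of the form (...), and it assumes the text has already been pulled out of any compressed stream.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of PDF::Core: decode the backslash
# escapes that can appear inside a PDF literal string.
sub unquote_pdf_string {
    my ($raw) = @_;
    my %simple = (n => "\n", r => "\r", t => "\t", b => "\b", f => "\f");
    # A backslash is followed by 1-3 octal digits, a line break
    # (line continuation), or a single character.
    $raw =~ s{\\(?:([0-7]{1,3})|\r?\n|(.))}{
        defined $1 ? chr(oct($1) & 0xFF)
      : defined $2 ? (exists $simple{$2} ? $simple{$2} : $2)
      :              ''
    }ges;
    return $raw;
}

# Invented sample values standing in for primitives read from a PDF.
my @samples = ('(Hello \(world\))', '/Type', '(Line\nbreak)', '42');

for my $res (@samples) {
    # Same idea as the test above: keep only values wrapped in parentheses.
    # The /s modifier lets '.' match newlines embedded in the string.
    if ($res =~ m/^\((.*)\)$/s) {
        print unquote_pdf_string($1), "\n";
    }
}
```

Names such as /Type and numbers such as 42 are skipped because they are not wrapped in parentheses. Hex strings of the form <...> are not handled by this sketch.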

Many thanks,
Martin



