perl-xml
Re: Problem timing out XML::LibXML parse_html_string call
by Aaron Crane
Feb 3 2009 1:11PM

Sam Tregar writes:
>  I'm using XML::LibXML to parse some HTML.  Mostly it's working great
>  - fast and very useful XPath support.  My problem is that it's
>  choking on some very bad HTML in a very bad way - it's sitting on
>  the CPU until killed manually.  I expected some HTML wouldn't parse,
>  so this isn't such a tragedy.  What is a big problem is that my
>  attempts to work around this with alarm() aren't working!

The problem with handling signals in Perl is that they arrive
asynchronously.  If a signal is delivered while the Perl interpreter
is in the middle of executing an op, and the Perl-level signal handler
runs immediately, the handler's code might modify interpreter state in
a way that causes crashes later on.

Perl 5.8 introduced "safe signals" to alleviate this problem.  The
approach is to have the OS-level signal handler merely set a flag
indicating that the signal has been received.  Then the interpreter
checks the flags at safe points (between ops, effectively), and
invokes your Perl-level handler at that point, when it's known to be
safe.
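For comparison, here is a minimal sketch of the usual eval/alarm timeout idiom applied to ordinary Perl code, which safe signals handle fine: a pure-Perl loop executes many ops, so the interpreter reaches a safe point soon after SIGALRM arrives and runs the handler there.  The helper name `with_timeout` is hypothetical (it is not part of XML::LibXML or any other module), and the busy loop merely stands in for slow code.

```perl
use strict;
use warnings;

# Hypothetical helper (not from any CPAN module): run $code, giving up
# after $seconds.  Returns (1, $result) on success, or (0, $error) if
# $code died or the alarm went off.
sub with_timeout {
    my ($seconds, $code) = @_;
    my $result;
    my $ok = eval {
        # local restores the previous ALRM handler when the eval exits
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm $seconds;
        $result = $code->();
        alarm 0;    # finished in time: cancel the pending alarm
        1;
    };
    alarm 0;        # make sure no alarm is left pending after a die
    return $ok ? (1, $result) : (0, $@);
}

# A pure-Perl busy loop runs many ops, so the interpreter soon reaches a
# safe point after SIGALRM arrives and calls the handler there.
my ($ok, $err) = with_timeout(1, sub { my $n = 0; while (1) { $n++ } });
print $ok ? "finished\n" : "gave up: $err";
```

Under the default safe signals, the same helper wrapped around a call that spins inside a single XS op would not return: the signal is only flagged, and the handler waits for a safe point that never comes.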

The only problem with this scheme is that if an op goes into an
infinite loop, the Perl-level signal handler never gets invoked.
That's very unlikely for regular ops in stable releases of Perl, but
a call to an XS function -- a single op -- might ultimately fall into
an infinite loop.  And that's what's happening here; libxml2 (or
perhaps the XS component of XML::LibXML) has an infinite-loop bug, so
your signal handler never gets invoked.

You can switch back to the pre-5.8 signal-handling behaviour by
setting the environment variable PERL_SIGNALS to 'unsafe'.  The
variable must already be set when perl starts up; assigning to
$ENV{PERL_SIGNALS} from inside your running program has no effect on
the current interpreter.  For example, using env(1):

    $ env PERL_SIGNALS=unsafe perl your_program.pl
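If the wrapper itself needs to be a Perl script rather than env(1), the same effect can be had by putting the variable into a child process's environment before starting it.  This is a sketch under stated assumptions: the helper name `capture_with_unsafe_signals` is hypothetical, and the list-form pipe open assumes a Unix-like system.  The demonstration asks a child perl what it saw at startup.

```perl
use strict;
use warnings;

# Hypothetical helper: run @cmd (a perl invocation) with PERL_SIGNALS set
# to 'unsafe' in its startup environment and return its standard output.
# local limits the change to the duration of this call.
sub capture_with_unsafe_signals {
    my @cmd = @_;
    local $ENV{PERL_SIGNALS} = 'unsafe';
    # List-form pipe open: no shell is involved (assumes a Unix-like OS).
    open my $fh, '-|', @cmd or die "Can't run $cmd[0]: $!";
    my $out = do { local $/; <$fh> };
    close $fh;
    return $out;
}

# The child perl sees the setting because it was present when it started.
my $seen = capture_with_unsafe_signals($^X, '-e', 'print $ENV{PERL_SIGNALS}');
print "child saw: $seen\n";
```

For a real wrapper you would run your_program.pl instead of the `-e` one-liner.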

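If it's not possible for you to put an appropriate wrapper round your
program, something along these lines might help if placed near the top
of your main script.  It sets the variable and then re-executes the
program, so that the new perl process sees PERL_SIGNALS=unsafe from
the start: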

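    BEGIN {
        # PERL_SIGNALS is only consulted when perl starts, so if unsafe
        # signals aren't already in effect, set the variable and restart
        # this program in a fresh perl ($^X) with the same arguments.
        if (!$ENV{PERL_SIGNALS} || $ENV{PERL_SIGNALS} ne 'unsafe') {
            $ENV{PERL_SIGNALS} = 'unsafe';
            exec $^X, $0, @ARGV
                or die "Can't re-exec $0 with PERL_SIGNALS=unsafe: $!\n";
        }
    }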

See also `perldoc perlipc` and search for "safe signals".

-- 
Aaron Crane ** http://aaroncrane.co.uk/
_______________________________________________
Perl-XML mailing list
Perl-XML@[...].com