php-dev
#41896 [NEW]: preg_replace crashes with large input
by giacomoread at hotmail dot com
Jul 4 2007 12:03PM
From:             giacomoread at hotmail dot com
Operating system: All
PHP version:      5.2.3
PHP Bug Type:     Scripting Engine problem
Bug description:  preg_replace crashes with large input

Description:
------------
I found a similar bug that was closed with status Bogus. Unacceptable!
Nothing in the documentation states any limit on the input to
preg_replace, and no portable workarounds are documented. Saying that 'it
is just a stack overflow' just to keep the bug count down is more than a
little unprofessional. A scripting language should either handle the
workaround internally or document the input limits, NOT cause segfaults.
This is a bug whether the PHP community is willing to accept it or not.
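
For readers hitting the same crash, one possible guard is sketched below. It
assumes the failure really is the PCRE stack overflow mentioned above, and
that the PHP build honours the pcre.recursion_limit setting (available since
PHP 5.2.0; it may have no effect when PCRE's JIT is in use). The helper name
guarded_replace and the limit value are hypothetical and not taken from the
report. The idea is to make PCRE give up early and return NULL, which the
caller can detect with preg_last_error(), rather than run out of stack.

```php
<?php
// Hypothetical helper (not part of the report): wraps preg_replace() and
// turns a NULL result into an exception instead of passing bad data on.
function guarded_replace($pattern, $replacement, $subject)
{
    $result = preg_replace($pattern, $replacement, $subject);
    if ($result === null) {
        throw new RuntimeException(
            'preg_replace failed, preg_last_error() = ' . preg_last_error()
        );
    }
    return $result;
}

// Illustrative value only; a safe limit depends on the available stack size.
ini_set('pcre.recursion_limit', '10000');

echo guarded_replace('/\s+/', ' ', "a \n\t b"), "\n"; // prints "a b"
```

A caller that catches the exception can then fall back to processing the
page in smaller pieces or skipping it, which is a design choice this sketch
leaves open.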

Reproduce code:
---------------
function parse($html, &$title, &$text, &$anchors)
{
  $pstring1 = "'[^']*'";
  $pstring2 = '"[^"]*"';
  $pnstring = "[^'\">]";
  $pintag   = "(?:$pstring1|$pstring2|$pnstring)*";
  $pattrs   = "(?:\\s$pintag){0,1}";

  $pcomment = enclose("<!--", "-", "->");
  $pscript  = enclose("<script$pattrs>", "<", "\\/script>");
  $pstyle   = enclose("<style$pattrs>", "<", "\\/style>");
  $pexclude = "(?:$pcomment|$pscript|$pstyle)";

  $ptitle   = enclose("<title$pattrs>", "<", "\\/title>");
  $panchor  = "<a(?:\\s$pintag){0,1}>";
  $phref    = "href\\s*=[\\s'\"]*([^\\s'\">]*)";

  $html = preg_replace("/$pexclude/iX", " ", $html);

  if ($title !== false)
    $title = preg_match("/$ptitle/iX", $html, $title)
             ? $title[1] : '';

  if ($text !== false)
  {
    $text = preg_replace("/<$pintag>/iX",   " ", $html);
    $text = preg_replace("/\\s+|&nbsp;/iX", " ", $text);
  }

  if ($anchors !== false)
  {
    preg_match_all("/$panchor/iX", $html, $anchors);
    $anchors = $anchors[0];

    reset($anchors);
    while (list($i, $x) = each($anchors))
      $anchors[$i] =
        preg_match("/$phref/iX", $x, $x) ? $x[1] : '';

    $anchors = array_unique($anchors);
  }
}

function enclose($start, $end1, $end2)
{
  return "$start((?:[^$end1]|$end1(?!$end2))*)$end1$end2";
}
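
The enclose() pattern above matches the enclosed body one character per
iteration of a group, which is the kind of construct that can drive older
PCRE versions into deep recursion on long bodies. A possible rewrite,
sketched here as an assumption and not as a confirmed fix for this report,
unrolls the loop with possessive quantifiers so that runs of ordinary
characters are consumed in a single step. The names enclose_unrolled and
strip_scripts are hypothetical, and the attribute sub-pattern is left out to
keep the sketch short.

```php
<?php
// Hypothetical variant of enclose(): same arguments, but the body is matched
// as "run of non-$end1 chars, then ($end1 not followed by $end2, run)*".
// Possessive quantifiers (*+) stop PCRE from backtracking into the runs.
function enclose_unrolled($start, $end1, $end2)
{
    return "{$start}([^{$end1}]*+(?:{$end1}(?!{$end2})[^{$end1}]*+)*+){$end1}{$end2}";
}

// Illustrative use: blank out <script> blocks (attributes not handled here).
function strip_scripts($html)
{
    $pscript = enclose_unrolled("<script>", "<", "\\/script>");
    return preg_replace("/$pscript/i", " ", $html);
}

echo strip_scripts("a<script>if (1 < 2) {}</script>b"), "\n"; // prints "a b"
```

The group now repeats once per stray "<" inside the body instead of once per
character, which reduces recursion depth but does not remove it entirely.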

Expected result:
----------------
The code should split the HTML pages into title, text and links. It works
fine until large pages are downloaded. Then it segfaults, with gdb pointing
at preg_replace as the culprit.


-- 
Edit bug report at http://bugs.php.net/?id=41896&edit=1