[PHP-DEV] re2c issues? (Was Re: [ZEND-ENGINE-CVS] cvs: ZendEngine2(PHP_5_3) / zend_language_scanner.l)
by Matt Wilmas
Jul 8 2008 9:51AM
Hi Nuno, all,

I didn't test it, but yeah, that should fix the # problem. :-)  BTW, I also
had other ideas about checking for <?, <%, <script>, etc. tags in the inline
HTML scanning part, so that the largest possible chunk of HTML is always
grabbed (I'll send the patch in the future; I haven't modified anything yet,
and it's not related to the subject anyway :-)).

Still wondering about the behavior of re2c at EOF being different from
Flex -- can't re2c have an addition/enhancement that simply keeps track of
the rule that *would have* matched before hitting EOF (e.g. YYCURSOR >=
YYLIMIT) and then jumps to it when doing the YYFILL check?
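
To make the idea concrete, here is a rough hand-written sketch in C. It is
not re2c output and not the Zend scanner: the Scanner struct, the token
codes and the remember_accept flag are all made up for illustration. It has
one fixed-length rule (a single space), one variable-length rule ([a-z]+)
and a catch-all for any other single character. With remember_accept off,
the EOF check behaves like a YYFILL that just returns 0; with it on, the
scanner returns the rule that would have matched.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token codes for this sketch (not PHP's real token ids). */
enum { TOK_EOF = 0, TOK_WORD = 1, TOK_SPACE = 2, TOK_OTHER = 3 };

typedef struct {
    const char *cursor; /* plays the role of YYCURSOR */
    const char *limit;  /* plays the role of YYLIMIT  */
} Scanner;

static int is_word(char c) { return c >= 'a' && c <= 'z'; }

/* Scan one token. remember_accept selects what happens when the input
 * ends while the variable-length rule is still running. */
static int scan(Scanner *s, int remember_accept)
{
    int accepted = TOK_EOF; /* rule that would match what was read so far */

    if (s->cursor >= s->limit) return TOK_EOF;

    if (*s->cursor == ' ') { s->cursor++; return TOK_SPACE; }   /* fixed length */
    if (!is_word(*s->cursor)) { s->cursor++; return TOK_OTHER; } /* fixed length */

    for (;;) {                       /* variable-length rule: [a-z]+ */
        s->cursor++;
        accepted = TOK_WORD;
        if (s->cursor >= s->limit) { /* the YYFILL-style check */
            if (!remember_accept) return TOK_EOF; /* pending token is lost */
            break;                   /* fall back to the remembered rule */
        }
        if (!is_word(*s->cursor)) break;
    }
    return accepted;
}

/* Count the tokens returned before the scanner reports end of input. */
int count_tokens(const char *src, int remember_accept)
{
    Scanner s;
    int n = 0;

    assert(src != NULL);
    s.cursor = src;
    s.limit = src + strlen(src);
    while (scan(&s, remember_accept) != TOK_EOF) n++;
    return n;
}
```

In this toy version, count_tokens("function test", 0) stops after the space,
while count_tokens("function test", 1) also returns the final word. Input
ending in a fixed-length match, such as "function test()", gives the same
count either way.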

Another thing that isn't working is the warning about unterminated /*
comments (it is never shown).  The optimization for comment parsing I was
going to do (along with the above HTML stuff) would also work around that --
not using re2c rules, but a manual scan or zend_memnstr() for the closing */.
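
As a rough sketch of what such a manual scan could look like (plain C with
memchr() standing in for zend_memnstr(); the function name find_comment_end
is hypothetical and this is not the actual patch): given a pointer just past
the opening /* and the end of the buffer, it returns the position right
after the closing */, or NULL if the comment is unterminated, which is where
the warning could be raised.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* p points just past the opening slash-star, limit is one past the last
 * byte of the buffer. Returns a pointer just past the closing star-slash,
 * or NULL when the comment never closes before limit. */
const char *find_comment_end(const char *p, const char *limit)
{
    assert(p <= limit);

    while (p < limit) {
        /* Jump straight to the next '*' instead of matching byte by byte. */
        const char *star = memchr(p, '*', (size_t)(limit - p));

        if (star == NULL || star + 1 >= limit) {
            return NULL;             /* no '*' left, or '*' is the last byte */
        }
        if (star[1] == '/') {
            return star + 2;         /* found the closing delimiter */
        }
        p = star + 1;                /* lone '*', keep looking */
    }
    return NULL;
}
```

A NULL return corresponds to the unterminated case, so the caller can emit
the warning there and still hand back the rest of the buffer as one token.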

Like I said in the comments for Bug #45372, if the last thing at the end of
a file is matched by a variable length rule, it will not be returned,
because of

#define YYFILL(n) { if (YYCURSOR >= YYLIMIT) return 0; }

I put the ? in the subject line because I'm not sure how important this
really is, but it just seems broken to me (though it usually shows up with
invalid code), and I couldn't think of a workaround with my limited
knowledge of re2c (though I think it would need to be changed internally).
Some things this affects are 1) the tokenizer extension -- the last token
won't be returned (if variable length, of course); 2) highlighting (if
someone is trying to "see" an unclosed string error, for example? PHP
highlighting on forums...); and 3) parse errors, which can be different than
previously if the parser gets one less token, for example:

$foo = "Unclosed string<newline>
// Different error line number; I think the space before the quote is the
// last token returned (even w/o newline)

function test
// *Nothing* after test; used to say expecting '(', now says expecting
// T_STRING; again, the space before test is the last token

function test()
   OR
array

at the end of a file still work the same because ")" and "array" are fixed
length matches...  Or, add a few newlines at the end and they won't be
counted, etc.

It's been a while since I checked out the details, so I can't recall at the
moment if there are more serious examples.  I'm also not sure if some of
this is affecting the ini scanner (see Bug #45384), as I haven't really
looked at its code.

What are everyone's thoughts...?


- Matt


----- Original Message -----
From: "Nuno Lopes"
Sent: Tuesday, July 08, 2008

>  nlopess Tue Jul  8 15:16:35 2008 UTC
> 
>    Modified files:              (Branch: PHP_5_3)
>      /ZendEngine2 zend_language_scanner.l
>    Log:
>    now really fix once and for all the #-style comments.
>    also remove some duplicated code in <?, <%, <%= handlers. this also has
>    the side-effect of producing better bytecodes in some special cases
>
>
> http://cvs.php.net/viewvc.cgi/ZendEngine2/zend_language_scanner.l?r1=1.131.2.11.2.13.2.21&r2=1.131.2.11.2.13.2.22&diff_format=u

