perl6-internals
RE: Applying regexen/grammars to objects (was Re: String API)
by Gordon Henriksen
Aug 25 2003 1:57AM

Benjamin Goldberg wrote:

> Gordon Henriksen wrote:
>
> > Having a lazily slurped file string simply delays disaster, and
> > opens the door for Very Big Mistakes. Such strings would have to be
> > treated very delicately, or the program would behave very
> > inefficiently or crash.
>
> Although Dan's convinced me that STRING*s don't need to be anything
> other than concrete, wholly-in-memory, non-active buffers of data,
> (for various and sundry reasons), I'm not sure why a lazily slurped
> file string would need to be treated "delicately".
>
> In particular, what would make the program crash?


s/crash/use HUGE GOBS OF MEMORY and exhaust the system's swapfile/g.

> Why would you have the potential to load the entire file into
> memory if you're careless?


Mutations would remain in memory, right? uc() such a string and watch
your swapfile fill right up. Or s///g. Or just in general change it.
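
To make the uc() case concrete, here is a minimal sketch in Python (chosen only for illustration; nothing here is Parrot's API). `LazyFileString` and its `bytes_read` counter are hypothetical names. The sketch assumes results stay in memory rather than being written back.

```python
import os
import tempfile


class LazyFileString:
    """Hypothetical lazily slurped string: reads from disk only on demand."""

    def __init__(self, path):
        self.path = path
        self.bytes_read = 0  # how much of the file has been pulled into memory

    def __len__(self):
        # Byte length is cheap: it comes from file metadata.
        return os.path.getsize(self.path)

    def slice(self, start, stop):
        # Reading a window touches only that window.
        with open(self.path, "rb") as f:
            f.seek(start)
            data = f.read(stop - start)
        self.bytes_read += len(data)
        return data

    def upper(self):
        # A whole-string mutation must produce a full-size result, so the
        # entire file ends up in memory.
        return self.slice(0, len(self)).upper()


def demo():
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"abc" * 1000)
        path = f.name
    try:
        s = LazyFileString(path)
        s.slice(0, 10)
        after_slice = s.bytes_read
        result = s.upper()
        return after_slice, s.bytes_read, len(result)
    finally:
        os.remove(path)
```

The small slice reads ten bytes; the single upper() call reads everything and holds a second full-size copy as its result.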

And character indexing a file too big to fit in memory, when char
indexing is an O(n) problem for significant cases (UTF-8)...? Very
Bad things...
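
A rough sketch of why character indexing is O(n) for UTF-8, again in Python purely for illustration. `byte_offset_of_char` is a made-up helper, not a Parrot function, and it assumes the buffer holds well-formed UTF-8.

```python
def byte_offset_of_char(buf: bytes, char_index: int) -> int:
    """Return the byte offset where character number char_index starts.

    UTF-8 characters are 1 to 4 bytes long, so the only way to find the
    n-th one is to walk the buffer from the start.
    """
    if char_index < 0:
        raise IndexError(char_index)
    seen = 0
    for offset, byte in enumerate(buf):
        # Continuation bytes look like 0b10xxxxxx; any other byte starts
        # a new character.
        if byte & 0xC0 != 0x80:
            if seen == char_index:
                return offset
            seen += 1
    raise IndexError(char_index)
```

With a fixed-width encoding the same lookup is a multiplication. Here, indexing near the end of a file-backed string means reading nearly the whole file.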

Or were you thinking that changes would be written back? In which
case... each string mod would have to rewrite the (huge, remember) file
from that point forward. Way to render an API useless.
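
The write-back cost can be sketched the same way. `replace_in_file` below is a hypothetical helper, written in Python for illustration only; it reads the tail into memory for brevity, where a real implementation would presumably stream it in chunks.

```python
import os
import tempfile


def replace_in_file(path, offset, old_len, new_bytes):
    """Replace old_len bytes at offset with new_bytes, in place.

    Returns the number of bytes written. Files have no "insert"
    operation, so a change in length forces everything after the edit
    to be rewritten.
    """
    with open(path, "r+b") as f:
        f.seek(offset + old_len)
        tail = f.read()
        f.seek(offset)
        f.write(new_bytes)
        f.write(tail)
        f.truncate()  # drop leftover bytes if the file got shorter
    return len(new_bytes) + len(tail)


def demo():
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"hello world" + b"x" * 1000)
        path = f.name
    try:
        written = replace_in_file(path, 0, 5, b"HELLO!")
        with open(path, "rb") as f:
            content = f.read()
        return written, content
    finally:
        os.remove(path)
```

Growing five bytes to six at the front of the file rewrites all of it; every s///g hit that changes length would pay that again.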

I have no doubt that p6 will have file-tied strings which will address
many of these problems--they're just very complex and don't belong
inside STRING*.


> > And what if your admittedly huge file is larger than 2**32 bytes? (A
> > very real possibility! You said it was too big to fit in memory!)
> > Are you going to suggest that all STRING* consumers on 32-bit
> > platforms emulate 64-bit arithmetic whenever manipulating STRING*
> > lengths?
>
> Blech.  Yeah, that *would* be annoying.  OTOH, they're already
> emulating 64-bit arithmetic whenever they deal with file offsets.  Or
> perhaps I should be saying, "bad enough that they're already ... With
> file offsets, we don't want to have to do it with string lengths,
> too."


I've got my money on option #2.


> >         grammar HTTPServer {
> >                 rule http {
> >                         (<request> <commit>)*
> >                 }
> >                 rule request {
> >                         <get_request> | <post_request> | ...
> >                 }
> >                 rule get_request {
> >                         GET <path> <version> <crlf>
> >                         <header>
> [snip]
>
> You should have a <commit> after that CRLF there :)


Yeah, well, one could go after GET, too, and after <path>, and after
<version>, and every other non-optional protocol element. It gets noisy
after a while.


> > How cool is that? Just imagine trying to apply the same pattern to a
> > more long-lived protocol than HTTP, though--a database connection,
> > maybe, or IRC.
>
> Through a database connection?  I can envision that for the purpose of
> implementing the protocol, [...]


I did indeed mean implementing the database protocol. Though, not
thRough.


> > [2] No doubt, unshift hacks[3] could be found to make the lazy
> > slurpy file string not crash. But these are just changes to make
> > strings behave like streams, and would impose upon STRING*
> > consumers everywhere Very Strange things like those strings which
> > don't know their own length. A string wants to be a string, and a
> > stream wants to be a stream.
>
> I wasn't considering allowing lazily slurped file strings on anything
> other than plain files (ones for which perl's "-f" operator returns
> true).
>
> Thus, I can't see how the string wouldn't know its own length.


Fine, in theory--but a string in UTF-8 or any other variable-length
encoding would need to open and scan THE ENTIRE FILE at the time it was
tied in order to know its length in characters. Ouch.
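
For illustration, a Python sketch of the difference between the two lengths. `char_length` is a hypothetical helper and assumes the file is well-formed UTF-8; the byte length comes from file metadata, while the character length needs a pass over every byte.

```python
import os
import tempfile


def char_length(path, chunk_size=65536):
    """Count UTF-8 characters in a file by scanning every byte of it."""
    count = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            # Every byte that is not a continuation byte (0b10xxxxxx)
            # begins a new character.
            count += sum(1 for b in chunk if b & 0xC0 != 0x80)
    return count


def demo():
    text = "żółw " * 100  # 5 characters, 8 bytes per repetition
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(text.encode("utf-8"))
        path = f.name
    try:
        # Byte length: one metadata lookup. Character length: a full scan.
        return os.path.getsize(path), char_length(path)
    finally:
        os.remove(path)
```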


> > [3] Unshift hack #1: Where commit appears in the above, exit the
> > grammar, trim the beginning of the string, and re-enter. (But that
> > forces the grammar author to discard the regex state, whereas commit
> > would offer no such restriction.) Unshift hack #2: Tell =~ that
> > <commit> can trim the beginning of the string. (DWIM departs;
> > /cgxism returns.)
>
> Trimming off the beginning of the string is the job of the <cut>
> operator, not the <commit> operator.


Indeed, my bad--been a while since I read the apocalypse.

> Hmm... I wonder how <cut> would be done with an iterator.  Bleh.


Equivalent to <commit>, I say.... Then your grammar rule can work on an
iterator, or on a string that's being used as a buffer.

Here's a question: How does $iter =~ /a+b/ work on an iterator which
returns "aaaaaaack!"? Requires a putback op.
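
A minimal sketch of what such a putback op might look like, in Python for illustration only. `PutbackIterator` and `match_a_plus_b` are hypothetical names, and the matcher is hand-written for this one pattern, anchored at the iterator's current position, rather than being a real regex engine.

```python
class PutbackIterator:
    """Hypothetical character iterator with a putback (unread) operation."""

    def __init__(self, source):
        self._source = iter(source)
        self._pending = []  # characters pushed back; next one to return is last

    def __iter__(self):
        return self

    def __next__(self):
        if self._pending:
            return self._pending.pop()
        return next(self._source)

    def putback(self, chars):
        # Push in reverse so the first character of chars comes out first.
        self._pending.extend(reversed(chars))


def match_a_plus_b(it):
    """Try to match /a+b/ at the iterator's current position.

    On failure, every consumed character is put back so the caller can
    try something else from the same position.
    """
    consumed = []
    for ch in it:
        consumed.append(ch)
        if ch == "a":
            continue
        if ch == "b" and len(consumed) > 1:
            return "".join(consumed)
        break
    it.putback(consumed)
    return None
```

On "aaaaaaack!" the matcher has to read through the "c" before it knows it has failed, and by then the a's are gone from the underlying iterator unless something buffers them. That buffer is exactly what a commit would let the engine discard.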

I'm not sure about <cut> vs. <commit>. They seem so orthogonal, and they
pervasively tie a grammar to an implementation choice. It seems more
like an m:option.

--
 
Gordon Henriksen
IT Manager
ICLUBcentral Inc.
gordon@[...].com


