perl6-internals
Re: String API
by Gordon Henriksen
Aug 24 2003 5:29AM
Now, I don't really have much of an opinion on compound strings in
general. I do want to address one particular argument, though: the
lazily slurped file string.

On Thursday, August 21, 2003, at 07:22, Benjamin Goldberg wrote:

> A foolish question: can you imagine strings which are lazily read from
> a file?
>
> If so, could you imagine such a string, sitting in front of a really
> really big file, bigger than could fit into memory?

Having a lazily slurped file string simply delays disaster, and opens
the door for Very Big Mistakes. Such strings would have to be treated
very delicately, or the program would behave very inefficiently or
crash. (And let's be frank, a lazily concatenated STRING* is just a
tie()d string value; I thought that was leaving the core.) There's
power in such strings, no doubt. There's also TERROR of passing the
string to anything lest your program explode because some CPAN module's
author wasn't also TERRIFIED of your input being something
not-just-a-string. If I'm going to have the potential to load the entire
file into memory if I'm the least bit careless, I'd prefer to be up
front about it. Anti-action-at-a-distance. I don't need to be deluded
that my code is efficient because it reads lazily. (Fact is, it's
probably faster if it buffers the file all at once, if it's going to
buffer it at all. Certainly more memory-efficient (!). Fewer chunks.
Less overhead. But probably faster still to mmap() it.)
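
To make the "buffer it all at once" versus "mmap() it" contrast concrete, here is a minimal Python sketch. It is only an illustration of the two access patterns, not anything Parrot does; the function names count_slurp and count_mmap are made up for this example, and it assumes a non-empty file (mapping an empty file raises an error).

```python
import mmap


def count_slurp(path, needle):
    """Read the whole file into one bytes object, then search it."""
    with open(path, "rb") as f:
        data = f.read()  # entire file is now resident in memory
    return data.count(needle)


def count_mmap(path, needle):
    """Map the file and search it; the OS pages data in on demand."""
    with open(path, "rb") as f:
        # Length 0 means "map the whole file". Assumes the file is non-empty.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            count = 0
            pos = m.find(needle)
            while pos != -1:
                count += 1
                # Skip past this match so matches never overlap,
                # mirroring what bytes.count() does.
                pos = m.find(needle, pos + len(needle))
            return count
```

Both functions are explicit about what they do to memory, which is the point: neither pretends to be an ordinary string that might quietly pull in the whole file.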

And what if your admittedly huge file is larger than 2**32 bytes? (A
very real possibility! You said it was too big to fit in memory!) Are
you going to suggest that all STRING* consumers on 32-bit platforms
emulate 64-bit arithmetic whenever manipulating STRING* lengths?
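
A small worked example of the arithmetic problem, sketched in Python. It assumes, purely for illustration, that a string length is stored in an unsigned 32-bit field; the helper names as_uint32 and fits_in_uint32 are hypothetical and not part of any real API.

```python
UINT32_MAX = 0xFFFFFFFF  # largest value an unsigned 32-bit field can hold


def as_uint32(n):
    """Truncate n the way storing it in an unsigned 32-bit field would."""
    return n & UINT32_MAX


def fits_in_uint32(n):
    """True if n can be stored in 32 bits without losing information."""
    return 0 <= n <= UINT32_MAX


five_gib = 5 * 2**30                 # a 5 GiB file, in bytes
stored_length = as_uint32(five_gib)  # wraps around to 2**30 (1 GiB)
```

Under that assumption a 5 GiB file would report a length of 1 GiB, so every consumer would either need wider arithmetic or get silently wrong answers.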

To efficiently process a Very Large String, you need to *stream* through
it, not buffer it. The same applies to infinite strings (generators) or
indeterminate strings (generators and sockets). Such strings don't have
representable or knowable lengths. STRING*s *really* *really* should
reliably have lengths, I think.
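
As a rough illustration of streaming versus buffering, the Python sketch below processes input one chunk at a time and shows a generator that has no length to ask for. The names stream_newline_count and naturals are invented for this example.

```python
def stream_newline_count(stream, chunk_size=65536):
    """Count newline bytes while holding at most one chunk in memory."""
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty read means end of input
            break
        total += chunk.count(b"\n")
    return total


def naturals():
    """An infinite sequence: it can be consumed, but it has no length."""
    n = 0
    while True:
        yield n
        n += 1
```

Calling len() on naturals() raises TypeError, which is the honest answer for a sequence like this; a string type that promised a length could not give one.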

IMAGINE, if you will, something absolutely crazy:

	grammar HTTPServer {
		rule http {
			(<request> <commit>)*
		}
		rule request {
			<get_request> | <post_request> | ...
		}
		rule get_request {
			GET <path> <version> <crlf>
			<header>
			{
				my $file = open(...)
					or print("403 Access Denied\r\n"), fail;
				print "200 OK\r\n";
				while (<$file>) { print }
				close $file;
			}
		}
		rule post_request {
			POST <path> <version> <crlf>
			<header>
			{
				# Blahblahblah...
			}
		}
		rule crlf { \r\n }
		rule header {
			<header_line>* <crlf>
			<commit>
		}
		rule header_line {
			([:alpha:]+): ([^\r\n]* <crlf> ([ \t]+ [^\r\n]* <crlf>)*)
			<commit>
		}
		# ... more ...
	}

If perl's using a stream rather than buffering to a STRING*, then
$sock =~ /<HTTPServer::http>/ could actually work, and quite
efficiently. [1] How cool is that? Just imagine trying to apply the same
pattern to a more long-lived protocol than HTTP, though: a database
connection, maybe, or IRC. Or an HTTP client, which can download lots of
data. Using chunky strings? perl, meet rlimit. rlimit, this is perl. [2]
Using streams? Network programming becomes crazily easy.

--

Gordon Henriksen
malichus@[...].com

[1] Of course, this requires that the regex engine be coded to think in
sequences. The regex engine could keep its own backtracking buffer, and
trim that buffer at each commit.
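
One way to picture the backtracking buffer described in [1] is the Python sketch below. It is a toy under stated assumptions, not a regex engine: CommitBuffer and its methods are hypothetical names, and "matching" is reduced to reading up to a delimiter. What it does show is that the bytes read since the last commit stay available for backtracking, and that commit() discards them.

```python
class CommitBuffer:
    """Keeps only the bytes read since the last commit (hypothetical helper)."""

    def __init__(self, stream, chunk_size=4096):
        self._stream = stream
        self._chunk_size = chunk_size
        self._buf = bytearray()
        self._pos = 0  # read cursor, relative to the start of _buf

    def read_until(self, delimiter):
        """Return bytes up to and including delimiter, or None at end of input."""
        while True:
            idx = self._buf.find(delimiter, self._pos)
            if idx != -1:
                end = idx + len(delimiter)
                data = bytes(self._buf[self._pos:end])
                self._pos = end
                return data
            chunk = self._stream.read(self._chunk_size)
            if not chunk:
                return None
            self._buf += chunk

    def backtrack(self):
        """Rewind to the last commit point; that input is still buffered."""
        self._pos = 0

    def commit(self):
        """Drop everything consumed so far; it can no longer be revisited."""
        del self._buf[:self._pos]
        self._pos = 0

    def buffered(self):
        """Number of bytes currently held for possible backtracking."""
        return len(self._buf)
```

With a buffer like this, memory use is bounded by the distance between commits rather than by the total size of the input.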

[2] No doubt, unshift hacks[3] could be found to make the lazy slurpy
file string not crash. But these are just changes to make strings behave
like streams, and would impose upon STRING* consumers everywhere Very
Strange things like those strings which don't know their own length. A
string wants to be a string, and a stream wants to be a stream.

[3] Unshift hack #1: Where commit appears in the above, exit the
grammar, trim the beginning of the string, and re-enter. (But that
forces the grammar author to discard the regex state, whereas commit
would offer no such restriction.) Unshift hack #2: Tell =~ that <commit>
can trim the beginning of the string. (DWIM departs; /cgxism returns.)
