ASPN ActiveState Programmer Network
perl-xml
RE: PerlSax to parse/search large (~350 MB) file
by Sterin, Ilya
Jul 11 2001 1:50PM

OK, now we understand.  The only other problem is that you are thinking in
terms of lines, which XML does not use: one element can be on one line or
span twenty lines, and it will parse the same way.  When you set your
handlers, start collecting text at the <conDef> start element and keep
appending until you reach the </conDef> end element.
Check the <code> content when you encounter it.  If it's not the code you
want, set a flag so the buffered <conDef> string is discarded later; if the
code matches, set a flag instead, and when you finally reach the </conDef>
end element, abort parsing and use the string.  Remember that if you want
the full string, including the elements, you will have to append the tags as
well as the content to the string.  I haven't used PerlSax before, but this
is easily accomplished with XML::Parser.
The above should give you an overview of how to approach this.
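A minimal sketch of the buffering approach described above, written in Python rather than Perl: Python's `xml.sax` drives the same expat library that XML::Parser wraps, and the `startElement`/`characters`/`endElement` methods play the role of XML::Parser's Start/Char/End handlers. The element names `conDef` and `code` come from the sample in this thread; `find_condef_sax`, `ConDefHandler` and the `FoundRecord` exception used to abort the parse are hypothetical names for illustration. The rebuilt string is re-escaped markup, not a byte-for-byte copy of the input (attribute quoting and entity spelling may differ).

```python
import io
import xml.sax
from xml.sax.saxutils import escape, quoteattr


class FoundRecord(Exception):
    """Hypothetical helper: raised to abort the parse once the match is complete."""

    def __init__(self, record):
        super().__init__("matching <conDef> found")
        self.record = record


class ConDefHandler(xml.sax.ContentHandler):
    """Buffers each <conDef> and stops at the one whose <code> matches."""

    def __init__(self, wanted_code):
        super().__init__()
        self.wanted_code = wanted_code
        self.buf = None        # markup pieces of the current <conDef>; None outside one
        self.in_code = False   # True while inside a <code> element
        self.code_text = []    # <code> text can arrive in several characters() calls
        self.matched = False   # flag: the current <conDef> has the wanted code

    def startElement(self, name, attrs):
        if name == "conDef":
            self.buf = []      # start a fresh record
            self.matched = False
        if self.buf is None:
            return             # outside any <conDef>: ignore
        attr_text = "".join(f" {k}={quoteattr(v)}" for k, v in attrs.items())
        self.buf.append(f"<{name}{attr_text}>")  # append the tag, not just content
        if name == "code":
            self.in_code = True
            self.code_text = []

    def characters(self, content):
        if self.buf is None:
            return
        self.buf.append(escape(content))
        if self.in_code:
            self.code_text.append(content)

    def endElement(self, name):
        if self.buf is None:
            return
        self.buf.append(f"</{name}>")
        if name == "code":
            self.in_code = False
            if "".join(self.code_text).strip() == self.wanted_code:
                self.matched = True
        elif name == "conDef":
            if self.matched:
                raise FoundRecord("".join(self.buf))  # abort parsing here
            self.buf = None    # wrong record: discard the buffered string


def find_condef_sax(source, wanted_code):
    """Return the rebuilt <conDef> markup for wanted_code, or None."""
    handler = ConDefHandler(wanted_code)
    try:
        xml.sax.parse(source, handler)  # source: file name or binary file object
    except FoundRecord as hit:
        return hit.record
    return None


if __name__ == "__main__":
    sample = io.BytesIO(
        b"<root><conDef> <name>Influenza</name><code>C12345</code>"
        b"<id>637</id></conDef></root>"
    )
    print(find_condef_sax(sample, "C12345"))
```

Raising an exception from a handler is one way to stop expat early; in XML::Parser the equivalent would be calling `die` inside a handler and catching it with `eval`.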

Ilya

-----Original Message-----
From: Corey Smith (s)
To: Sterin, Ilya; 'perl-xml@listserv.ActiveState.com'
Sent: 7/11/01 7:01 AM
Subject: RE: PerlSax to parse/search large (~350 MB) file

Let me try this again.  Here's a sample line from the XML file I'm working
with:

<conDef> <name>Influenza</name><code>C12345</code><id>637</id>...</conDef>

I would like to search the file for the content of the <code> tag.  Once the
code is located, the entire line (everything from <conDef> to </conDef>)
containing that code will be output.  Because the file is large,
speed/efficiency is important.
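Since speed and memory matter for a ~350 MB file, one streaming alternative (a swapped-in technique, not PerlSax) is Python's `xml.etree.ElementTree.iterparse`, which hands over each element as soon as its end tag has been parsed, so finished records can be thrown away instead of building the whole tree. This sketch assumes `<conDef>` elements are direct children of the document element; `find_condef_iterparse` is a hypothetical helper name.

```python
import io
import xml.etree.ElementTree as ET


def find_condef_iterparse(source, wanted_code):
    """Return the serialized <conDef> whose <code> equals wanted_code, or None.

    ASSUMPTION: <conDef> records are direct children of the document element.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # first event: start of the document element
    for event, elem in context:
        if event != "end" or elem.tag != "conDef":
            continue
        if (elem.findtext("code") or "").strip() == wanted_code:
            elem.tail = None  # don't serialize whitespace after </conDef>
            return ET.tostring(elem, encoding="unicode")  # stop reading here
        root.clear()  # drop finished records so memory use stays flat
    return None


if __name__ == "__main__":
    sample = io.BytesIO(
        b"<root>\n<conDef> <name>Influenza</name><code>C12345</code>"
        b"<id>637</id></conDef>\n</root>"
    )
    print(find_condef_iterparse(sample, "C12345"))
```

Returning as soon as the match is found means the rest of the file is never read, which helps most when the wanted record is near the start.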

Thanks for the response.

Corey

>  -----Original Message-----
>  From:	Sterin, Ilya [SMTP:Isterin@[...].com]
>  Sent:	Tuesday, July 10, 2001 11:10 PM
>  To:	Corey Smith (s); 'perl-xml@listserv.ActiveState.com'
>  Subject:	RE: PerlSax to parse/search large (~350 MB) file
>  
>  I'm a little confused as to what you are trying to do.  Give us a better
>  example, unless someone here can understand your problem.  Are you looking
>  for a specific tag <...> or the content of a tag?  Once you find it, are you
>  asking how you can extract the content?
>  
>  Ilya
>  
>  > -----Original Message-----
>  > From: perl-xml-admin@[...].com
>  > [mailto:perl-xml-admin@[...].com]On Behalf Of Corey Smith (s)
>  > Sent: Tuesday, July 10, 2001 6:02 PM
>  > To: 'perl-xml@listserv.ActiveState.com'
>  > Subject: PerlSax to parse/search large (~350 MB) file
>  >
>  >
>  > The task:
>  > 	Search a large XML file for an identifier contained in an element.
>  > Having located the line associated with the desired identifier,
>  > output that line from the source file to a file.  Output all other
>  > lines to another file.
>  >
>  > The problem:
>  > 	Once there is a match on the identifier, how can I identify the line
>  > from the input file so that I can output it to a file?
>  >
>  > Any help would be greatly appreciated.  Thanks.
>  >
>  >
>  >
>  >
>  > _______________________________________________
>  > Perl-XML mailing list
>  > Perl-XML@[...].com
>  > http://listserv.ActiveState.com/mailman/listinfo/perl-xml
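If, as the sample in this thread suggests, each `<conDef>` record really does occupy one physical line, the original task (the matching line to one file, every other line to another) can be done line by line, using a substring test as a cheap pre-filter and parsing only candidate lines. This is a sketch under that assumption only: as noted earlier in the thread, XML does not guarantee any line layout, and a record split across lines would silently go to the "other" output. `split_by_code` is a hypothetical name; in practice `in_lines`, `match_out` and `other_out` would be open text files.

```python
import io
import xml.etree.ElementTree as ET


def split_by_code(in_lines, wanted_code, match_out, other_out):
    """Write each line to match_out or other_out; return matching line numbers.

    ASSUMPTION: each <conDef> record sits on exactly one physical line.
    Anything else (XML declaration, root tags, partial records) goes to other_out.
    """
    matched = []
    for lineno, line in enumerate(in_lines, start=1):
        stripped = line.strip()
        is_match = False
        # Cheap substring test first; only candidate lines are actually parsed.
        if (wanted_code in stripped and stripped.startswith("<conDef")
                and stripped.endswith("</conDef>")):
            try:
                record = ET.fromstring(stripped)
                is_match = (record.tag == "conDef" and
                            (record.findtext("code") or "").strip() == wanted_code)
            except ET.ParseError:
                pass  # not a self-contained record: treat it as "other"
        if is_match:
            match_out.write(line)
            matched.append(lineno)  # 1-based line number in the input
        else:
            other_out.write(line)
    return matched


if __name__ == "__main__":
    src = io.StringIO(
        "<root>\n"
        "<conDef> <name>Influenza</name><code>C12345</code><id>637</id></conDef>\n"
        "</root>\n"
    )
    hit, rest = io.StringIO(), io.StringIO()
    print(split_by_code(src, "C12345", hit, rest))
```

Because lines are copied verbatim, the matching record comes out byte-for-byte as it was in the source file, which the event-based sketches cannot promise.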

© ActiveState Software Inc. All rights reserved