perl-win32-users
RE: spidering/crawling/scraping a site..
by Mark - BLS CTR Thomas
Oct 28 2005 8:14AM
bruce [mailto:bedouglas@[...].net] wrote:
>  mark...
>  
>  i'm actually faced with a greater issue. i'm looking to
>  crawl/extract/download the site. simply scraping each site doesn't get me
>  the underlying files for the site, in the correct location/names on my
>  server, to allow me to kind of replicate the basic links of the site.

Aside from the questionable legality of what you're asking, keep in mind
that sites using database-backed form-based authentication (as opposed to
HTTP Basic Authentication) are often generated dynamically, assembled from
includes, CMSes, or other applications, in which case trying to get the
"underlying files" is futile: the pages may not exist as discrete files on
the server.

>  this requires a crawler... which is what i was originally looking for/hoping
>  for, that allows me to deal with the login (user/passwd) form.

I have given you the information you need to do this. Simply save the files
locally as you crawl them.
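
To make "save the files locally" a little more concrete, here is a minimal
sketch in Python of mapping each crawled URL to a local path that mirrors the
site's layout. The function names (local_path_for, save_page), the index.html
fallback for directory URLs, and the way the query string is folded into the
file name are all my own assumptions for illustration, not anything a
particular crawler does for you. It also assumes you have already fetched the
page body, login included, by whatever means you are using.

```python
import os
from urllib.parse import urlsplit, quote


def local_path_for(url, root):
    """Map a crawled URL to a file path under root (hypothetical helper)."""
    parts = urlsplit(url)
    path = parts.path or "/"
    # A URL ending in "/" names no file; store it as index.html (an assumption).
    if path.endswith("/"):
        path += "index.html"
    # Dynamic pages often differ only by query string, so fold it into the
    # file name, percent-encoded because "?" is not legal in Windows names.
    if parts.query:
        path += "_" + quote(parts.query, safe="")
    # Drop empty, "." and ".." segments so nothing is written outside root.
    segments = [s for s in path.split("/") if s not in ("", ".", "..")]
    # A port in the host ("host:8080") would put ":" in a directory name.
    host = parts.netloc.replace(":", "_")
    return os.path.join(root, host, *segments)


def save_page(url, body, root):
    """Write the already-fetched bytes of url under root; return the path."""
    target = local_path_for(url, root)
    os.makedirs(os.path.dirname(target), exist_ok=True)
    with open(target, "wb") as fh:
        fh.write(body)
    return target
```

Note that this only reproduces the URL structure as the browser sees it. For
a dynamically generated site, a saved "view.php_id%3D7" is the rendered
output of one request, not the script behind it, and links inside the saved
pages would still need rewriting to point at the local copies.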

- Mark.
