RE: spidering/crawling/scraping a site..
by Mark - BLS CTR Thomas
Oct 28 2005 8:14AM
bruce [mailto:bedouglas@[...].net] wrote:
> mark...
>
> i'm actually faced with a greater issue. i'm looking to
> crawl/extract/download the site. simply scraping each site doesn't get me
> the underlying files for the site, in the correct location/names on my
> server, to allow me to kind of replicate the basic links of the site.
Aside from the questionable legality of what you're asking, keep in mind
that sites using database-backed, form-based authentication (as opposed to
HTTP Basic Authentication) are often assembled dynamically from includes, a
CMS, or some other application, in which case trying to get the "underlying
files" is futile. The pages may not exist as discrete files on the server;
all a crawler can retrieve is the generated output.
> this requires a crawler... which is what i was originally looking for/hoping
> for, that allows me to deal with the login (user/passwd) form.
I have given you the information you need to do this. Simply save the files
locally as you crawl them.
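As a rough sketch of the "save the files locally" step, here is one way to
map each crawled URL to a path that mirrors the site's layout. It is written
in Python for brevity rather than Perl, and the names local_path_for,
save_page, and the "mirror" directory are made up for illustration. It
assumes your crawler has already logged in and fetched the page body, and it
ignores query strings, so dynamic pages that differ only by query would
overwrite each other.

```python
import posixpath
from pathlib import Path
from urllib.parse import urlsplit, unquote


def local_path_for(url, root="mirror"):
    """Map a URL to a file path under `root` that mirrors the site's layout.

    Hypothetical helper: the query string is ignored in this sketch.
    """
    parts = urlsplit(url)
    path = unquote(parts.path)
    # Directory-style URLs ("", "/" or "/docs/") carry no file name,
    # so store them as index.html inside that directory.
    if path == "" or path.endswith("/"):
        path += "index.html"
    # Collapse "." and ".." segments so a URL cannot escape the mirror root.
    safe = posixpath.normpath("/" + path.lstrip("/")).lstrip("/")
    # Use the bare host name (no port) as the top-level directory.
    host = parts.hostname or "unknown-host"
    return Path(root) / host / safe


def save_page(url, body, root="mirror"):
    """Write the bytes already fetched for `url` to its mirrored location."""
    target = local_path_for(url, root)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(body)
    return target
```

Calling save_page(url, body) from inside the crawl loop, once per fetched
page, is enough to build the local tree. Rewriting the links inside the saved
pages so they point at the local copies is a separate step not shown here.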
- Mark.
_______________________________________________
Perl-Win32-Users mailing list
Perl-Win32-Users@[...].com
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs