Re: [pyxpcom] Python web crawler using Gecko
by Mark Hammond
Mar 17 2008 10:11AM
I'm afraid that you are blazing a trail here (and might be a little confused
about how you want this architected). Note that using XULRunner implies you
are *not* using Python as the main executable: XULRunner *is* the
executable. So what you might want is for your XULRunner app to boot up,
then call your pyxpcom component, which fires off your web crawler (in the
same process), which does its thing. Note, though, that this is quite
different from "embedding gecko in a GUI-less Python app".
Maybe look at something like the sample in extensions/python/dom/samples (or
something like that) - that creates a XULRunner app with a number of
buttons. You can adapt one of the buttons so that, when it is clicked, the
process above starts (i.e., import the webcrawler module and call its entry
point). I'm not sure how you would then hook attempts to load other URLs
(other than the loading of JS, which it sounds like you want to allow), but
that kind of question isn't really related to Python - i.e., if you can
work out how to hook it in JS, we can help "port" it to Python - so asking on
a generic Mozilla/JS list might get better answers for that specific part of
your puzzle.
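
To make the "import the webcrawler module and call its entry point" step
concrete, here is a minimal Python sketch of what the button's handler could
do. It is only an illustration: `webcrawler`, `main`, `start_crawler` and
`on_button_click` are hypothetical placeholder names, not names taken from
the pyxpcom sample, and how the handler gets attached to the button is left
out because that depends on the sample itself.

```python
import importlib


def start_crawler(module_name, entry_point, *args, **kwargs):
    """Import a module by name and call one of its functions.

    The import happens only when this function is called, so the crawler
    module is loaded inside the already-running host process at click time
    rather than at application start-up.
    """
    module = importlib.import_module(module_name)
    entry = getattr(module, entry_point)
    if not callable(entry):
        raise TypeError("%s.%s is not callable" % (module_name, entry_point))
    return entry(*args, **kwargs)


def on_button_click(event=None):
    # Hypothetical handler: assumes a module named "webcrawler" that exposes
    # a function "main" is importable from the application's Python path.
    return start_crawler("webcrawler", "main")
```

The helper can be tried outside XULRunner with any importable module, for
example `start_crawler("json", "dumps", [1, 2])`.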
Hope this helps,
Mark
> -----Original Message-----
> From: pyxpcom-bounces@[...].com
> [mailto:pyxpcom-bounces@[...].com] On Behalf
> Of Ivan Vrtaric
> Sent: Sunday, 16 March 2008 9:57 PM
> To: pyxpcom@[...].com
> Subject: [pyxpcom] Python web crawler using Gecko
>
> Hi,
>
> I have got a task for a course at my faculty to try and see if
> it's possible to extend a web crawler written in Python with
> methods that would return the DOM after the JavaScript is
> executed, and that would check if further JavaScript calls on
> the same page open any new HTTP connections. I got a
> suggestion to use PyXPCOM and XULRunner, and I've been
> researching XPCOM for some time now, but I get confused
> any time I try to get beyond simple examples and tests.
> I'd appreciate it if anyone on this list could help me with some
> examples and/or pointers for creating a browser instance
> (without any GUI), passing it the URL, executing JavaScript,
> and returning the resulting DOM.
> I read the Gecko embedding guide on Mozilla's website, and also
> browsed through XULPlanet's interface reference, but I'm
> still not sure how to embed Gecko in a GUI-less Python app. So
> any help would be really appreciated.
>
> TIA
>
> _______________________________________________
> pyxpcom mailing list
> pyxpcom@[...].com
> To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
>