Re: [pyxpcom] Python web crawler using Gecko
by Mark Hammond
Mar 17 2008 10:11AM

I'm afraid that you are blazing a trail here (and might be a little confused
about how you want this architected).  Note that using XULRunner implies you
are *not* using Python as the main executable - XULRunner *is* the
executable.  So what you might want is for your XULRunner app to boot up,
then call your pyxpcom component, which fires off your web crawler (in the
same process), which does its thing.  But note that this is quite different
from "embedding Gecko in a GUI-less Python app".

Maybe look at something like the sample in extensions/python/dom/samples (or
something like that) - that creates a XULRunner app with a number of
buttons.  You can adopt one of the buttons, so that when it is clicked, the
process above starts (i.e., import the webcrawler module and call its entry
point).  I'm not sure how you would then hook attempts to load other URLs
(other than the loading of JS, which it sounds like you want to allow), but
that kind of question isn't really related to Python - i.e., if you can
work out how to hook it in JS, we can help "port" it to Python.  So asking on
a generic Mozilla/JS list might get better answers for that specific part of
your puzzle.
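
As a rough illustration of the "import the webcrawler module and call its
entry point" step, here is a minimal sketch in plain Python.  It does not use
any pyxpcom or XULRunner API; how the handler gets wired to a button depends
on the sample you start from.  The names webcrawler, main, on_button_click
and the start URL are all assumptions made up for this example, and the stub
module at the bottom only exists so the sketch runs on its own.

```python
import importlib
import sys
import types


def run_entry_point(module_name, entry_point="main", *args, **kwargs):
    """Import module_name on first use and call its entry point."""
    module = importlib.import_module(module_name)
    entry = getattr(module, entry_point, None)
    if not callable(entry):
        raise AttributeError(
            "%s has no callable %r" % (module_name, entry_point))
    return entry(*args, **kwargs)


def on_button_click(event=None):
    """Hypothetical click handler: start the crawler in this process."""
    # "webcrawler", "main" and the URL are placeholders, not real names
    # from the pyxpcom samples.
    return run_entry_point("webcrawler", "main", "http://example.com/")


# Stand-in for the real crawler module (an assumption for this sketch).
# In a real app you would drop these lines and ship your own webcrawler.py.
_stub = types.ModuleType("webcrawler")
_stub.main = lambda start_url: "crawling " + start_url
sys.modules.setdefault("webcrawler", _stub)
```

Importing inside the handler, rather than at startup, means the crawler code
is only loaded once the app is already up and the button is actually used.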

Hope this helps,

Mark

>  -----Original Message-----
>  From: pyxpcom-bounces@[...].com 
>  [mailto:pyxpcom-bounces@[...].com] On Behalf 
>  Of Ivan Vrtaric
>  Sent: Sunday, 16 March 2008 9:57 PM
>  To: pyxpcom@[...].com
>  Subject: [pyxpcom] Python web crawler using Gecko
>  
>  Hi,
>  
>  I have been given a task for a course at my faculty: to see if 
>  it's possible to extend a web crawler written in Python with 
>  methods that would return the DOM after the JavaScript is 
>  executed, and that would check if further JavaScript calls on 
>  the same page open any new HTTP connections. I got a 
>  suggestion to use PyXPCOM and XULRunner, and I've been 
>  researching XPCOM for some time now, but I get confused 
>  any time I try to get beyond simple examples and tests.
>  I'd appreciate it if anyone on this list could help me with some 
>  examples and/or pointers for creating a browser instance 
>  (without any GUI), passing it the URL, executing JavaScript, 
>  and returning the resulting DOM.
>  I read the Gecko embedding guide on Mozilla's website, and also 
>  browsed through XULPlanet's interface reference, but I'm 
>  still not sure how to embed Gecko in a GUI-less Python app. So 
>  any help would be really appreciated.
>  
>  TIA
>  
>  _______________________________________________
>  pyxpcom mailing list
>  pyxpcom@[...].com
>  To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
>  

© 2004 ActiveState, a division of Sophos All rights reserved