Re: [pyxpcom] Python web crawler using Gecko
by ivrtaric
Mar 30 2008 4:27PM
Sorry for the belated reply, but better late than never, I suppose :)

I'm sorry I mentioned XULRunner, because I don't actually use it as a
runner. It was among the suggestions my professor gave me, and I had to
install the xulrunner package on my Ubuntu system along with python-xpcom.
Other than that, I don't use it, at least not intentionally :)

I'm actually having trouble using an instance of nsWebBrowser in my
Python code:

### Code start
import xpcom.server
from xpcom import components

interfaces = components.interfaces

class MyChrome:
    # Interfaces this object exposes to XPCOM
    _com_interfaces_ = [interfaces.nsIWebBrowserChrome,
                        interfaces.nsIEmbeddingSiteWindow]

    def __init__(self, webBrowser=None, title="Test title"):
        self.webBrowser = webBrowser
        self.title = title

    # nsIWebBrowserChrome methods implementation
    # nsIEmbeddingSiteWindow methods implementation

# Wrap the Python object so it can be handed to XPCOM as nsIWebBrowserChrome
chrome = xpcom.server.WrapObject(MyChrome(),
                                 interfaces.nsIWebBrowserChrome)

browsercls = components.classes[
    "@mozilla.org/embedding/browser/nsWebBrowser;1"].createInstance()

setup = browsercls.queryInterface(interfaces.nsIWebBrowserSetup)
setup.setProperty(setup.SETUP_IS_CHROME_WRAPPER, True)
# setup.setProperty(setup.SETUP_ALLOW_JAVASCRIPT, True)
# setup.setProperty(setup.SETUP_ALLOW_IMAGES, True)

browser = browsercls.queryInterface(interfaces.nsIWebBrowser)
browser.containerWindow = chrome

baseWindow = browser.queryInterface(interfaces.nsIBaseWindow)

print
print browser.containerWindow
print
print baseWindow.title
### Code end

I have several problems with this code. First, after I set
SETUP_IS_CHROME_WRAPPER, Python throws an exception every time I try to
set any other SETUP_ property (that's why those setProperty calls are
commented out in the code). Second, I have no idea how to connect my
chrome class object with the WebBrowser instance, other than by direct
assignment (browser.containerWindow = chrome). The C++ code snippets I
found on the web (specifically, on the Mozilla embedding APIs overview
page) use a function SetContainerWindow(), but there appears to be no
such function in the Python implementation. Third, Python throws an
"unspecified XPCOM exception" on the last line (print baseWindow.title),
and it throws the same exception if I try to print (or access)
browser.contentDOMWindow.
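
To narrow down exactly which of the calls above fails, each step can be
run on its own and its outcome recorded, rather than letting the first
exception end the script. What follows is only a minimal sketch of that
idea: try_step, report and FakeBrowser are hypothetical names made up
for illustration (they are not part of PyXPCOM), and FakeBrowser merely
imitates the behaviour described above so that the sketch runs without
Gecko installed.

```python
from __future__ import print_function  # same print() on Python 2 and 3


def try_step(label, func, *args):
    """Run one step and record whether it raised, instead of aborting.

    Hypothetical helper for illustration; not part of PyXPCOM.
    Returns a (label, ok, result_or_exception) tuple.
    """
    try:
        return (label, True, func(*args))
    except Exception as exc:
        # Catch everything so that the remaining steps still run.
        return (label, False, exc)


def report(results):
    """Print one line per step and return the labels of the failed steps."""
    failed = []
    for label, ok, value in results:
        print("%-28s %s  %r" % (label, "ok  " if ok else "FAIL", value))
        if not ok:
            failed.append(label)
    return failed


class FakeBrowser(object):
    """Stand-in for the real browser object (assumption, for demo only)."""
    containerWindow = "chrome"

    @property
    def contentDOMWindow(self):
        # Imitates the failure described in the message above.
        raise RuntimeError("unspecified XPCOM exception")


browser = FakeBrowser()
results = [
    try_step("browser.containerWindow", getattr, browser, "containerWindow"),
    try_step("browser.contentDOMWindow", getattr, browser, "contentDOMWindow"),
]
failed = report(results)
```

With the real objects in place of FakeBrowser, each setup.setProperty
call and each attribute access from the code above could be passed as
its own step, which would show whether the failures are independent of
one another or all follow from one earlier step.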

Now, I'm almost completely sure I'm doing a lot of things wrong, but I
don't know where, how, or why. What I'd like to ask is: is this the
right way to use PyXPCOM at all, and is it even possible to use
Gecko from Python in this manner (for example, to instantiate it, pass it
a URL, and retrieve the DOM tree from it)?

Some code snippets would be appreciated, too :)

TIA

>  I'm afraid that you are blazing a trail here (and might be a little confused
>  about how you want this architected).  Note that using XULRunner implies you
>  are *not* using python as the main executable - XULRunner *is* the
>  executable - so what you might want is for your XULRunner app to boot up,
>  then call your pyxpcom component, which fires off your web-crawler (in the
>  same process), which does its thing - but note that this is quite different
>  from "embedding gecko in a GUI-less Python app"
> 
>  Maybe look at something like the sample in extensions/python/dom/samples (or
>  something like that) - that creates a XULRunner app with a number of
>  buttons.  You can adopt one of the buttons, so that when it is clicked, the
>  process above starts (ie, import the webcrawler module and call its entry
>  point).  I'm not sure how you would then hook attempts to load other URLs
>  (other than the loading of JS, which it sounds like you want to allow), but
>  that kind of question isn't really related to Python - ie, if you can
>  work out how to hook it in JS, we can help "port" it to Python - so asking on
>  a generic Mozilla/JS list might get better answers for that specific part of
>  your puzzle.
> 
>  Hope this helps,
> 
>  Mark
> 
>    
> > -----Original Message-----
> > From: pyxpcom-bounces@[...].com 
> > [mailto:pyxpcom-bounces@[...].com] On Behalf 
> > Of Ivan Vrtaric
> > Sent: Sunday, 16 March 2008 9:57 PM
> > To: pyxpcom@[...].com
> > Subject: [pyxpcom] Python web crawler using Gecko
> >
> > Hi,
> >
> > I have been given a task for a course at my faculty: to see if
> > it's possible to extend a web crawler written in Python with 
> > methods that would return DOM after the JavaScript is 
> > executed, and that would check if further JavaScript calls on 
> > the same page open any new HTTP connections. I got a 
> > suggestion to use PyXPCOM and XULRunner, and I've been 
> > researching the XPCOM for some time now, but I get confused 
> > any time I try to get beyond simple examples and tests.
> > I'd appreciate it if anyone on this list could help me with some
> > examples and/or pointers for creating a browser instance 
> > (without any GUI), passing it the URL, executing JavaScript, 
> > and returning the resulting DOM.
> > I read the Gecko embedding guide on Mozilla's website, and also
> > browsed through XULPlanet's interface reference, but I'm
> > still not sure how to embed Gecko in a GUI-less Python app. So
> > any help would be really appreciated.
> >
> > TIA
> >
> > _______________________________________________
> > pyxpcom mailing list
> > pyxpcom@[...].com
> > To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
> >
> >     
> 
> 
>    
