Re: [pyxpcom] Python web crawler using Gecko
by ivrtaric
Mar 30 2008 4:27PM
Sorry for the belated reply, but better late than never, I suppose :)
I'm sorry I mentioned XULRunner, because I don't actually use it as a
runner. It was among the suggestions my professor gave me, and I had to
install the xulrunner package on my Ubuntu system along with python-xpcom.
Other than that, I don't use it, at least not intentionally :)
I'm actually having trouble using the instance of nsWebBrowser in my
Python code:
### Code start
# Imports assumed here; the original post omitted them.
import xpcom.server
from xpcom import components

interfaces = components.interfaces

class MyChrome:
    _com_interfaces_ = [interfaces.nsIWebBrowserChrome,
                        interfaces.nsIEmbeddingSiteWindow]

    def __init__(self, webBrowser=None, title="Test title"):
        self.webBrowser = webBrowser
        self.title = title

    # nsIWebBrowserChrome methods implementation
    # nsIEmbeddingSiteWindow methods implementation

chrome = xpcom.server.WrapObject(MyChrome(), interfaces.nsIWebBrowserChrome)
browsercls = components.classes[
    "@mozilla.org/embedding/browser/nsWebBrowser;1"].createInstance()
setup = browsercls.queryInterface(interfaces.nsIWebBrowserSetup)
setup.setProperty(setup.SETUP_IS_CHROME_WRAPPER, True)
# setup.setProperty(setup.SETUP_ALLOW_JAVASCRIPT, True)
# setup.setProperty(setup.SETUP_ALLOW_IMAGES, True)
browser = browsercls.queryInterface(interfaces.nsIWebBrowser)
browser.containerWindow = chrome
baseWindow = browser.queryInterface(interfaces.nsIBaseWindow)
print
print browser.containerWindow
print
print baseWindow.title
### Code end
I have several problems with this code. First, after I set
SETUP_IS_CHROME_WRAPPER, Python throws an exception every time I try to
set any other SETUP_ property (that's why those setProperty calls are
commented out in the code). Second, I have no idea how to connect my
chrome class object with the WebBrowser instance, other than by direct
assignment (browser.containerWindow = chrome). The C++ code snippets I
found on the web (specifically, on the Mozilla embedding APIs overview
page) use a function SetContainerWindow(), but there appears to be
no such function in the Python implementation. Also, Python throws an
"unspecified XPCOM exception" on the last line, and it throws the same
exception if I try to print (or access) browser.contentDOMWindow.
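On the SetContainerWindow() point: as far as I understand, XPIDL declares
containerWindow as an attribute, which C++ exposes as a Get/Set method pair
and script bindings expose as a plain property, so the direct assignment is
probably the Python counterpart of the C++ call. A minimal sketch of that
naming rule follows; the helper name cpp_accessor_names is made up for
illustration and is not part of PyXPCOM.

```python
def cpp_accessor_names(attribute):
    """Return the (getter, setter) names C++ uses for an XPIDL attribute.

    Hypothetical helper for illustration only; it is not a PyXPCOM API.
    """
    # XPIDL capitalizes the first letter and prefixes Get/Set in C++ headers.
    capitalized = attribute[0].upper() + attribute[1:]
    return "Get" + capitalized, "Set" + capitalized


# The attribute assigned directly in the Python code above:
getter, setter = cpp_accessor_names("containerWindow")
# In C++:    browser->SetContainerWindow(chrome);
# In Python: browser.containerWindow = chrome
```

The same rule would explain why browser.contentDOMWindow is read as a
property in Python while C++ snippets call GetContentDOMWindow().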
Now, I'm almost completely sure I'm doing a lot of things wrong, but I
don't know where, how, or why. What I'd like to ask is: is this the
right way of using PyXPCOM at all, and is it even possible to use
Gecko from Python in this manner (for example, to instantiate it, pass it
a URL and retrieve the DOM tree from it)?
Some code snippets would be appreciated, too :)
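For the first problem (the exception when setting further SETUP_
properties), one way to narrow it down might be to try each property on its
own and record what happens, rather than commenting calls out by hand. The
sketch below assumes only that the setup object has a setProperty() method
and SETUP_* constants, as in the code above; probe_setup_properties is a
hypothetical helper name, and the "except ... as" syntax needs Python 2.6 or
later.

```python
def probe_setup_properties(setup, names):
    """Try each named SETUP_* property separately and record the outcome.

    Returns a dict mapping each name to None on success, or to the repr()
    of the exception that was raised. Hypothetical helper, not a PyXPCOM API.
    """
    results = {}
    for name in names:
        try:
            prop_id = getattr(setup, name)    # e.g. setup.SETUP_ALLOW_IMAGES
            setup.setProperty(prop_id, True)
            results[name] = None
        except Exception as exc:              # deliberately broad: diagnostic only
            results[name] = repr(exc)
    return results


# Intended use with the setup object from the code above:
# outcome = probe_setup_properties(
#     setup, ["SETUP_ALLOW_JAVASCRIPT", "SETUP_ALLOW_IMAGES"])
```

Running it once before and once after setting SETUP_IS_CHROME_WRAPPER
would show whether the failure depends on the order of the calls.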
TIA
> I'm afraid that you are blazing a trail here (and might be a little confused
> about how you want this architected). Note that using XULRunner implies you
> are *not* using python as the main executable - XULRunner *is* the
> executable - so what you might want is for your XULRunner app to boot up,
> then call your pyxpcom component, which fires off your web-crawler (in the
> same process), which does its thing - but note that this is quite different
> from "embedding gecko in a GUI-less Python app"
>
> Maybe look at something like the sample in extensions/python/dom/samples (or
> something like that) - that creates a XULRunner app with a number of
> buttons. You can adopt one of the buttons, so that when it is clicked, the
> process above starts (ie, import the webcrawler module and call its entry
> point). I'm not sure how you would then hook attempts to load other URLs
> (other than the loading of JS, which it sounds like you want to allow), but
> that kind of question isn't really related to Python - ie, if you can
> work out how to hook it in JS, we can help "port" it to Python - so asking on
> a generic Mozilla/JS list might get better answers for that specific part of
> your puzzle.
>
> Hope this helps,
>
> Mark
>
>
> > -----Original Message-----
> > From: pyxpcom-bounces@[...].com
> > [mailto:pyxpcom-bounces@[...].com] On Behalf
> > Of Ivan Vrtaric
> > Sent: Sunday, 16 March 2008 9:57 PM
> > To: pyxpcom@[...].com
> > Subject: [pyxpcom] Python web crawler using Gecko
> >
> > Hi,
> >
> > I have a task for a course at my faculty: to try and see if
> > it's possible to extend a web crawler written in Python with
> > methods that would return the DOM after the JavaScript is
> > executed, and that would check if further JavaScript calls on
> > the same page open any new HTTP connections. I got a
> > suggestion to use PyXPCOM and XULRunner, and I've been
> > researching XPCOM for some time now, but I get confused
> > any time I try to get beyond simple examples and tests.
> > I'd appreciate it if anyone on this list could help me with some
> > examples and/or pointers for creating a browser instance
> > (without any GUI), passing it the URL, executing JavaScript,
> > and returning the resulting DOM.
> > I read the Gecko embedding guide on Mozilla's website, and also
> > browsed through XULPlanet's interface reference, but I'm
> > still not sure how to embed Gecko in a GUI-less Python app. So
> > any help would be really appreciated.
> >
> > TIA
> >
> > _______________________________________________
> > pyxpcom mailing list
> > pyxpcom@[...].com
> > To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
> >
> >
>
>
>