catalog-sig
Re: [Catalog-sig] search queries in PyPI
by Tarek Ziade
May 14 2008 12:06PM
2008/5/14 Noah Kantrowitz <kantrn@[...].edu> :

>  Tarek Ziade wrote:
> 
> > Hi,
> >
> > I was wondering how the search works in PyPI (I didn't have time to dig
> > into the code).
> >
> > I was unable to do specific queries. For instance, how do I get the
> > packages that have the word "nose" and the word "plugin" in their short
> > descriptions?
> >
> > I tried 'nose AND plugin', 'nose+plugin', etc., without success.
> >
> > I tried '"nose plugin"' and I got back a package that had this sequence of
> > words, but also a package that has nothing to do with it
> > (z3c.sampledata 0.1.0 <http://pypi.python.org/pypi/z3c.sampledata/0.1.0>).
> >
> >
>  Try "nose%plugin". That's the syntax used in the XML-RPC API at least.


Ah, interesting! That worked, thanks!

I also dug into the code to see how it is done.

Here's the pseudo code:

def search(query):
    results = {}

    terms = query.split()  # split on whitespace; split('') raises ValueError
    for term in terms:
        for field in ('name', 'description', 'summary'):
            for result in store.query_packages(term):
                # ... some score calculation if result.name == field
                results[result.name] = result

    return results

Basically, there is one request to the storage (database) for each word
entered in the query.

'AND' is not used as an operator; it is even removed, because it is listed
as a stop word.

So Noah's query, which uses %, is not split into words: it is sent to the DB
as a single string inside a SQL LIKE clause, where % acts as a wildcard.
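
To illustrate why "nose%plugin" behaves like an ordered AND, here is a small
sketch using sqlite3. The table, its columns and the sample rows are made up
for the example and are not the real PyPI schema; like_search is a
hypothetical helper, not a function from store.py.

```python
import sqlite3

# Hypothetical in-memory table standing in for the PyPI storage.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE packages (name TEXT, summary TEXT)")
conn.executemany(
    "INSERT INTO packages VALUES (?, ?)",
    [
        ("pkg-a", "A nose plugin for coverage"),
        ("pkg-b", "A plugin system"),
        ("pkg-c", "Follow your nose"),
    ],
)


def like_search(term):
    """Return names whose summary matches %term%; term may itself contain %."""
    rows = conn.execute(
        "SELECT name FROM packages WHERE summary LIKE ? ORDER BY name",
        ("%" + term + "%",),
    )
    return [name for (name,) in rows]


print(like_search("nose%plugin"))  # only rows with "nose" followed by "plugin"
print(like_search("plugin%nose"))  # word order matters with a % wildcard
```

Note that the wildcard only matches "nose" followed later by "plugin", so it
is not a true AND: a summary with the words in the other order is missed.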

Meanwhile, store.query_packages *has* a feature to do AND and OR
searches:

def query_packages(query, operator='and'):
   ...


I think it wouldn't cost too much to change the web UI so that it uses the
features store.py already provides.
It would also make searches faster, since a single database query could
serve each search.
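
As a rough sketch of what "one query per search" could look like, the
function below builds a single parameterized WHERE clause for all the terms.
build_where and the "summary" column name are assumptions made for the
example; I have not checked how store.query_packages really builds its SQL.

```python
def build_where(terms, operator="and", column="summary"):
    """Build one parameterized WHERE clause covering all terms.

    `column` is interpolated into the SQL text, so it must come from
    trusted code, never from user input. The terms go through ? parameters.
    """
    if operator.lower() not in ("and", "or"):
        raise ValueError("operator must be 'and' or 'or'")
    joiner = " %s " % operator.upper()
    clause = joiner.join("%s LIKE ?" % column for _ in terms)
    params = ["%" + term + "%" for term in terms]
    return clause, params


clause, params = build_where(["nose", "plugin"], operator="and")
print(clause)
print(params)
```

With this shape, the number of words in the query only changes the length of
the WHERE clause, not the number of round trips to the database.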

I still need to install a PyPI instance for a patch I wanted to propose to
make PyPI permissive about nonexistent classifiers, so maybe I can try a
patch for this in the meantime?

The change could take the words AND and OR into account in order to build
the proper query.
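
Something along these lines, as a first idea only. parse_query is a
hypothetical helper, not existing PyPI code, and it makes two simplifying
assumptions: only upper-case AND / OR count as operators, and if both appear
in one query the last one wins.

```python
def parse_query(query, default="and"):
    """Split a search string into (terms, operator).

    Upper-case AND / OR are treated as operators and dropped from the
    terms; every other word is kept as a search term.
    """
    operator = default
    terms = []
    for word in query.split():
        if word in ("AND", "OR"):
            operator = word.lower()
        else:
            terms.append(word)
    return terms, operator


print(parse_query("nose AND plugin"))
print(parse_query("nose OR plugin"))
```

The resulting operator could then be passed straight to the `operator`
argument of query_packages.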

Tarek




> 
> 
>  --Noah


-- 
Tarek Ziadé - Directeur Technique
INGENIWEB (TM) - SAS 50000 Euros - RC B 438 725 632
Bureaux de la Colline - 1 rue Royale - Bâtiment D - 9ème étage
92210 Saint Cloud - France
Phone : 01.78.15.24.00 / Fax : 01 46 02 44 04
http://www.ingeniweb.com - une société du groupe Alter Way
