squeak
Re: Finding and indexing 'similar' string
Aug 26 2003 6:00PM

Chris,

        There is also an algorithm called 'Metaphone' that was originally
published in Computer Language in 1990.  It does a somewhat better job
of matching similar-sounding words (at least in English).  The principal
weakness of Soundex is that it always keeps the first letter of the word
as-is, and that letter is often spelled differently in words that sound
alike.
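
To make the first-letter point concrete, here is a minimal Python sketch of
the commonly described American Soundex rules. It is an illustration only,
not code from any of the posts in this thread, and the function name
`soundex` is just a label chosen for the example.

```python
# Minimal sketch of the commonly described American Soundex rules.
# Assumption: ASCII letters only; anything else is ignored.
CODES = {}
for digit, letters in enumerate(("bfpv", "cgjkqsxz", "dt", "l", "mn", "r"), start=1):
    for ch in letters:
        CODES[ch] = str(digit)


def soundex(word):
    """Return a 4-character code: first letter plus up to three digits."""
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    first = letters[0]
    result = first.upper()          # the first letter is kept verbatim
    prev = CODES.get(first, "")     # its code still suppresses a repeat
    for ch in letters[1:]:
        if ch in "hw":
            continue                # h and w do not separate equal codes
        code = CODES.get(ch, "")    # vowels and y have no code
        if code and code != prev:
            result += code
        prev = code                 # a vowel resets prev, allowing repeats
    return (result + "000")[:4]     # pad with zeros, truncate to 4


if __name__ == "__main__":
    for name in ("Robert", "Rupert", "Knight", "Night"):
        print(name, soundex(name))
```

'Robert' and 'Rupert' share a code, but 'Knight' and 'Night' do not, purely
because the first letter is carried over unchanged.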

        You might also try searching for 'agrep' ("approximate grep"),
'string similarity', 'approximate string matching', or
'approximate pattern matching' for other references.
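
As a rough idea of what those searches turn up, one widely used measure of
string similarity is the Levenshtein edit distance. The Python sketch below
is only an illustration of that measure; it is not how agrep works
internally, and the name `edit_distance` is made up for the example.

```python
def edit_distance(a, b):
    """Levenshtein distance: fewest single-character insertions,
    deletions, or substitutions needed to turn a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]                              # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,        # delete ca
                            curr[j - 1] + 1,    # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("Muller", "Miller"))
    print(edit_distance("kitten", "sitting"))
```

A small distance relative to the word length suggests the two strings are
'similar'; where to set the cutoff depends on the application.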


Here are a few fairly good references:

        http://www.bitmechanic.com/mail-archives/mysql/Jan1998/0666.html
        http://aspell.net/metaphone/metaphone-kuhn.txt
        http://www.dcc.ufmg.br/~ghuiban/paa/tp3/node18.html


                                        -Dean







Chris Muller <afunkyobject@[...].com> 
Sent by: squeak-dev-bounces@[...].org
08/26/03 12:01 PM
Please respond to chris; Please respond to The general-purpose Squeak 
developers list 

 
        To:     Squeak List <squeak-dev@[...].org> 
        cc: 
        Subject:        Re: Finding and indexing 'similar' string



Jim Menard wrote:

>  How about using the Soundex algorithm? A quick Google search found this 
>  brief explanation <http://www.frontiernet.net/~rjacob/soundex.htm>

Ohhh!  Thank you Jim!  What a simple, well-explained method for a
sounds-like index.  This would be a great new index type for
MagmaCollections.

Do you know whether it works for other keywords, or just surnames?  I
would think it would, since some people's surnames are regular words
anyway.

 - Chris




