Re: Finding and indexing 'similar' string
by Jim Menard
Aug 26 2003 4:10PM
Chris,
On Tuesday, August 26, 2003, at 12:01 PM, Chris Muller wrote:
> Jim Menard wrote:
>
> > How about using the Soundex algorithm? A quick Google search found
> > this brief explanation <http://www.frontiernet.net/~rjacob/soundex.htm>
>
> Ohhh! Thank you Jim! What a simple, well-explained method for a
> sounds-like index. This would be a great new index type for
> MagmaCollections.
>
> Do you know whether it works for other keywords? Or just surnames? I
> would think it would, since some people's surnames are regular words
> anyway.
It works for any word, not just surnames, because it encodes how a word
sounds rather than how it is spelled. I have read about one problem with
the algorithm, though: the letter groupings are tuned to English, so you
need different sets of characters and weightings for different languages.
For example,
I think you would want "j" and "h" to map to the same sound in Mexican
Spanish. (Forgive me if that's a bad example. The only Spanish I've
ever learned was "May I have another beer, please?" and "Where is the
bathroom?")
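
For reference, here is a minimal sketch of the common American Soundex
rules in Python. It is not taken from the page linked above or from
MagmaCollections. The names CODES, soundex and build_index are made up
for illustration, and the table parameter is an assumption about how a
per-language letter grouping might be plugged in.

```python
from collections import defaultdict

# English letter groups used by American Soundex. Letters not listed
# (vowels, y, h, w) carry no code.
CODES = {}
for group, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                     ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in group:
        CODES[ch] = digit


def soundex(word, table=CODES):
    """Return a 4-character Soundex code, or "" if word has no letters."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    result = first.upper()
    prev = table.get(first, "")  # the first letter's code can suppress a repeat
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w are skipped and do not separate equal codes
        code = table.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code is kept
    return (result + "000")[:4]  # pad with zeros, truncate to 4 characters


def build_index(words, table=CODES):
    """Hypothetical sounds-like index: Soundex code -> list of words."""
    index = defaultdict(list)
    for w in words:
        index[soundex(w, table)].append(w)
    return dict(index)
```

With this sketch, build_index(["Robert", "Rupert", "Rubin"]) puts Robert
and Rupert under the same key and Rubin under a different one. Another
language would need its own table, and possibly a different rule for
h and w, which are hard-coded above.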
Jim
--
Jim Menard, jimm@[...].com, http://www.io.com/~jimm/
"333: Eric the Half A Beast" -- Tim Allen in rec.humor.oracle.d