Talk:Soundex: Difference between revisions


Revision as of 06:57, 31 May 2012

Task Improvement

It's all very well to have "do soundex" as a task, but it would be far better if we had a more concrete task. For example, attempting spelling corrections on a short text using soundex matching against a supplied dictionary. Right now, it feels like this task isn't really going anywhere. –Donal Fellows 15:51, 12 November 2009 (UTC)

A contributor to the problem is that there's no algorithm. I looked at the one on WP and it doesn't make that much sense to me. I checked the talk page and there's an alternate algorithm proposed, but it apparently doesn't cover all cases. Also, for languages without built-in soundex libraries, doing the conversion alone seems like task enough to me. --Mwn3d 16:07, 12 November 2009 (UTC)

Fair point about languages without soundex in libs. Maybe the other idea would be better as a task that builds on this one… –Donal Fellows 16:30, 12 November 2009 (UTC)

As I understand it, there are different soundex algorithms, depending somewhat on the language and on the application. I also read the Wikipedia entry, and it does not present the algorithm clearly. A couple of years ago I needed an algorithm to match information for new entries in a database to existing names. That's when I ran across the soundex algorithm. --Rldrenth 21:03, 12 November 2009 (UTC)

The task seems very ambiguous to me. Should I be writing code that parses a word based on the Wikipedia "Rules" section? Should I show that burrows and Burroughs have the same soundex index? Should there be a "Sample Output" section? The task might be simple to implement, if only I knew what was expected. -Crazyfirex Feb. 20, 19:39:47 (UTC)

Yes, the Burroughs and burrows words took me a while to get straight, but in doing so I found a bug in the program. Because of that bug, I checked my REXX program against almost all of the other samples to confirm that my interpretation and implementation of the rules were correct.

Also, thanks to the Go program, I found another bug (using 12346 as a word). If I hadn't perused all of the examples' outputs, I'd never have found that error.

I think the old saw that it's not over until the fat lady sings should apply here. If an example doesn't show any output, we can't assume the program is correct. I coded up an equivalent of the PL/I example and it produced incorrect results. [I don't have a PL/I compiler, so I can't bet my life on it that it's wrong.] I'm sure that there are other examples that are incorrect, particularly those that assume the first character is a letter, and those that don't handle a character that isn't a letter of the Latin alphabet (punctuation, blanks, apostrophes, hyphens, etc.). -- Gerard Schildberger 06:57, 31 May 2012 (UTC)
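
The input cases mentioned above (a first character that isn't a letter, embedded punctuation, an all-digit word such as 12346) can be isolated in a small pre-processing step. What follows is only a minimal sketch in Python, assuming the convention that anything other than A-Z is dropped before encoding. The name `letters_only` is hypothetical, and the task description does not mandate this behaviour.

```python
def letters_only(word: str) -> str:
    """Uppercase the word and keep only the Latin letters A-Z.

    Assumption: non-letters (digits, blanks, apostrophes, hyphens) are
    dropped rather than treated as separators or reported as errors.
    """
    return "".join(ch for ch in word.upper() if "A" <= ch <= "Z")


for sample in ["Burroughs", "O'Brien", "Smith-Jones", " van der Waals", "12346"]:
    cleaned = letters_only(sample)
    # An empty result means there is nothing left to encode; the caller
    # still has to decide what to return in that case.
    print(repr(sample), "->", repr(cleaned))
```

With this convention "12346" reduces to an empty string, so an implementation that blindly indexes the first character would fail on it. Showing output for a few such inputs makes that kind of bug visible.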

Which Soundex?

It isn't clear which Soundex algorithm each example is implementing. For example, the US Census rules have a special case for "H" and "W": they are ignored, and they do not separate consonants that share a code, whereas a vowel does. I suggest adding a set of test cases to the problem description that can distinguish between the many variants of Soundex out there. For starters:

A261 for Ashcraft
B620 for Burroughs and Burrows

--IanOsgood 15:35, 13 November 2009 (UTC)
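
To make the H/W special case concrete, here is a minimal sketch in Python of one reading of the US Census rules. It is not taken from any example on the task page; the names `CODES` and `soundex` are illustrative, and dropping non-letters and returning an empty string for input with no letters are assumptions rather than part of any stated rule.

```python
# Letter groups that share a digit; vowels, Y, H and W have no code.
CODES = {}
for digit, group in enumerate(["BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"], start=1):
    for letter in group:
        CODES[letter] = str(digit)


def soundex(name: str) -> str:
    # Assumption: anything that is not A-Z is dropped before encoding.
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""  # assumption: no letters means no code

    result = letters[0]
    prev = CODES.get(letters[0], "")  # code of the last letter considered
    for ch in letters[1:]:
        if ch in "HW":
            continue  # ignored, and does NOT separate equal codes
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel sets prev to "", so it DOES separate equal codes
    return (result + "000")[:4]  # pad with zeros, keep letter plus three digits


for word in ["Ashcraft", "Burroughs", "Burrows"]:
    print(word, soundex(word))
# Ashcraft A261
# Burroughs B620
# Burrows B620
```

Ashcraft is the case that tells the variants apart: a version that treats H like a vowel encodes both the S and the C and yields A226 instead of A261.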

There's an algorithm that's been floating around for years that's attributed to (DE?) Knuth, but that's not the original. Mind you, the original was not designed for use by computers either. In any case, go with the Knuth algorithm (Google can find implementations of it easily enough). –Donal Fellows 12:12, 14 November 2009 (UTC)
