Function to generate the Soundex code for any string (usually a name). Conforms to Knuth's algorithm and the common Perl implementation.
Tags: algorithms
1 comment
Warning: This algorithm (by Odell and Russell, as reported in Knuth) is designed for English-language surnames. If a significant share of your surnames are non-English, you might do well to alter the values in digits to improve your matches. For example, to accommodate a large number of Spanish surnames, count 'J' and 'L' ('L' because of the way 'll' is used) as vowels by setting their positions in digits to '0'.
The basic assumptions of Soundex are that consonants matter more than vowels, and that consonants fall into groups of letters that are easily confused with one another. Coming up with a set of confusables for a language is not horribly tough, but remember: each group should contain all letters that are confusable with any other letter in the group. A slightly better code for both English and Spanish names has digits = '01230120002055012623010202'.
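To make the effect of changing digits concrete, here is a minimal sketch of a table-driven Soundex in Python. It is not the recipe's own code: the names soundex, ENGLISH_DIGITS, and BILINGUAL_DIGITS, the length parameter, and the choice to return an empty string for input with no letters are all assumptions made for illustration. ENGLISH_DIGITS is the usual English letter-to-digit table; BILINGUAL_DIGITS is the string quoted above, which differs only in mapping 'J' and 'L' to '0'.

```python
# One digit per letter A..Z; '0' marks letters treated as vowels (ignored).
ENGLISH_DIGITS = "01230120022455012623010202"    # usual English table
BILINGUAL_DIGITS = "01230120002055012623010202"  # 'J' and 'L' set to '0'


def soundex(name, length=4, digits=ENGLISH_DIGITS):
    """Return a Soundex-style code for name (illustrative sketch only)."""
    codes = ""
    first = ""
    for ch in name.upper():
        if "A" <= ch <= "Z":          # skip anything outside plain A..Z
            if not first:
                first = ch            # remember the first letter as-is
            d = digits[ord(ch) - ord("A")]
            # Collapse runs of letters that map to the same digit.
            if not codes or d != codes[-1]:
                codes += d
    if not first:
        return ""                     # assumption: no letters, no code
    codes = first + codes[1:]         # first letter replaces its own digit
    codes = codes.replace("0", "")    # drop the vowel markers
    return (codes + "0" * length)[:length]  # pad or truncate


print(soundex("Castillo"), soundex("Castiyo"))
print(soundex("Castillo", digits=BILINGUAL_DIGITS),
      soundex("Castiyo", digits=BILINGUAL_DIGITS))
```

With the English table the two spellings get different codes because 'L' counts as a consonant; with the bilingual table 'll' is ignored like a vowel, so both spellings collapse to the same code. Passing the table as a parameter keeps the matching logic unchanged while letting each data set pick its own confusable groups.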