The Secret of Soundex, Page 2
Now my little Visual Basic pumpkins, it's time for a history lesson. And it'll be about as exciting as my bedside clock, originally sculptured in the shape of a bedside clock.
You see, back in the early 20th century, the US National Archives folk were sitting around and realised they had a problem.
The president was breathing down their necks for the latest census reports. But listing every single name on the reports was a bit space-consuming and would have required more paper than potentially existed in the Amazon, either the rainforest or the online bookstore.
So they decided to group the many variations of names together depending on how they sounded. The surname 'Moore' sounds a lot like 'Mower', so 'Mower' would probably get classified under the same heading as 'Moore'.
And 'Mour', 'Moor' and 'Moooooore' are all similar-sounding variations, strange though they may be. So each would get consolidated under the general heading of 'Moore'.
But the problem these Archive chappies had was how they could clearly define which words matched phonetically.
Yes - Moore, Mower, Mour, Moor and Moooooore - all sound the same. But the boffins wanted a system; a categorical method of determining whether two names actually sound the same.
And hence, Soundex was born.
Soundex is an algorithm that follows a fixed set of rules to produce a four-character code (a letter followed by three digits) for any word. The theory is that two words sounding roughly the same will produce the same four-character code.
So 'Moore' has the code M600. And so does 'Mower'. And 'Mour', 'Moor' and 'Moooooore'. Groovy, eh?
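If you can't wait for the next page, here's a rough sketch of the idea. It's written in Python rather than VB, and it assumes the commonly documented American Soundex rules (keep the first letter, turn the remaining consonants into digits, skip vowels, collapse repeated digits, pad with zeros). The sample VB project may differ in the details, so treat the names CODES and soundex as illustrative only.

```python
# Letter groups follow the commonly documented American Soundex table
# (an assumption; the article's own rules are covered on the next page).
GROUPS = ("BFPV", "CGJKQSXZ", "DT", "L", "MN", "R")
CODES = {letter: str(digit)
         for digit, letters in enumerate(GROUPS, start=1)
         for letter in letters}


def soundex(word):
    """Return a four-character Soundex code, or "" if the word has no letters."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]                # the first letter is kept as-is
    prev = CODES.get(letters[0], "")   # its digit still counts for duplicate checks
    for c in letters[1:]:
        if c in "HW":
            continue                   # H and W do not break up a run of equal digits
        code = CODES.get(c, "")        # vowels (and Y) have no digit
        if code and code != prev:
            result += code
        prev = code                    # a vowel resets prev, so a later repeat is coded again
    return (result + "000")[:4]        # pad with zeros, then trim to four characters


for name in ("Moore", "Mower", "Mour", "Moor", "Moooooore"):
    print(name, soundex(name))         # every name prints with the code M600
```

All five spellings come out as M600 because only the 'M' and the 'r' carry any weight; the vowels and the 'w' are ignored.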
Hold on a minute... hey, are you chewing gum? This is a history lesson! Spit it out and instead, try the new orange flavoured Tic Tacs: more freshness, less fattening. Yummy!
<Karl's bank balance increases by another £10,000>
Now imagine the real-world implications of this. If you had a surname field in a database*, you could add another to hold its Soundex code. And that would make 'fuzzy' phonetic searching exceptionally easy.
And if you wanted to create a spell checker, you'd simply need to create a database full of correctly spelled words and their related Soundex codes. If the user taps in a word that isn't in the database, your program would simply need to look up words with the same Soundex code.
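To make the lookup idea concrete, here's a minimal sketch in Python (not VB). An in-memory dictionary stands in for the database, the word list is made up for illustration, and build_index and suggest are hypothetical helper names. A compact soundex helper, assuming the commonly documented American Soundex rules, is included so the snippet runs on its own.

```python
from collections import defaultdict

# Digit table from the commonly documented American Soundex rules (an assumption).
CODES = {letter: str(digit)
         for digit, letters in enumerate(
             ("BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"), start=1)
         for letter in letters}


def soundex(word):
    """Compact Soundex helper: first letter plus up to three digits, zero padded."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    result, prev = letters[0], CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "HW":
            continue                   # H and W are skipped entirely
        code = CODES.get(c, "")        # vowels have no digit
        if code and code != prev:
            result += code
        prev = code
    return (result + "000")[:4]


def build_index(words):
    """Group correctly spelled words under their Soundex code."""
    index = defaultdict(list)
    for word in words:
        index[soundex(word)].append(word)
    return index


def suggest(word, words, index):
    """Return sound-alike words, or an empty list if the word is already known."""
    if word.lower() in words:
        return []                      # spelled correctly, nothing to suggest
    return list(index.get(soundex(word), []))


# A made-up word list standing in for the database of correct spellings.
DICTIONARY = ["green", "grain", "moor", "more", "receive"]
INDEX = build_index(DICTIONARY)

print(suggest("grene", DICTIONARY, INDEX))    # ['green', 'grain']
print(suggest("recieve", DICTIONARY, INDEX))  # ['receive']
print(suggest("green", DICTIONARY, INDEX))    # []
```

The same lookup by code would serve the surname search too: store the code next to each surname once, then match on the code rather than the spelling. Note that 'grene' also pulls in 'grain', so a real spell checker would probably want to rank the candidates somehow.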
In short, Soundex allows you to give your applications an in-built intelligence, a knowledge of how words are spoken rather than spelt. Next up, we're going to learn how the Soundex code is generated, as well as taking a peek at a sample VB project...
* Note: It's worth pointing out that SQL Server inherently supports Soundex. Here's a sample SQL statement which retrieves all records where the AU_LNAME field sounds like 'Green':
SELECT * FROM authors WHERE Soundex(AU_LNAME) LIKE Soundex('Green')
For more information, look up SOUNDEX in SQL Server Books Online.