This is one of those times when I discover something that I probably should have known for a long time, and I certainly wish I had. And if you’re one of my friends and you knew about this and didn’t tell me, may you rot in hell lying in a bed of nails covered with black flies. Now that I got this out of the way, on with the article…
I decided to look for some kind of algorithm that would allow for matching words that are similar. This would inevitably be to retrieve records from a database based on some search criteria. So I search google and find something about a Soundex algorithm. The algorithm goes as follows (copied straight from Wikipedia):
- Retain the first letter of the string
- Remove all occurrences of the following letters, unless it is the first letter: a, e, h, i, o, u, w, y
-
Assign numbers to the remaining letters (after the first) as follows:
- b, f, p, v = 1
- c, g, j, k, q, s, x, z = 2
- d, t = 3
- l = 4
- m, n = 5
- r = 6
- If two or more letters with the same number were adjacent in the original name (before step 1), or adjacent except for any intervening h and w (American census only), then omit all but the first.
- Return the first four characters, right-padding with zeroes if there are fewer than four.
But that’s not the good part… The good part is that this algorithm is implemented in some DBMS systems. And apparently, you guessed it, it’s implemented in the most popular ones, SQL Server, Oracle, and MySQL. How does it work? It’s oh so difficult… Check out the code below:
SELECT *
FROM address
WHERE SOUNDEX(city) = SOUNDEX('Washgton')
If you have any records in the database for Washington (note in the query it’s missing the “i”) it will be returned. Wonderful! I could probably have used this before. And for those who have the possibility of adding UDFs to their DB server, there are implementations of other algorithms such as Metaphone and Similar_text.
Note that Soundex is a phonetic algorithm, so it looks for words that would sound similar. So you might not always get the results you want. When I searched my person table using SOUNDEX(first_name) = SOUNDEX(’Tomas’) I did get a bunch of “Thomas” records back, but if I use SOUNDEX(’Thomas’) I did not, I got a bunch of “Tom”, “Tommy”, and even “Tony” records, but no “Thomas”. Oh well, still better than nothing. I bet that using a combination of different algorithms you can probably get some good results. More research to be done…
This entry was posted on Friday, March 23rd, 2007 at 5:07 pm and is filed under Databases. You can follow any responses to this entry through the RSS 2.0 feed. You can leave a response, or trackback from your own site.


March 26th, 2007 at 10:17 pm
hehe, I knew about this. A while back I wanted to do a spell check thing and soundex algorithms are one of the ways you can figure out what the person wanted to say when they misspelt a word. I deliberately decided not to tell you just hoping for this very day. Oh revenge is such a sweet and tasty dish when served cold.
I did not know that dbms’s have it integrated natively though. Will definitely use a hybrid of this in order to improve my search returns.
February 17th, 2009 at 5:48 pm
Новый способ давления на кандидата на пост Главы г. Химки
Новый способ “наказать” тех, кто посмел участвовать в выборной кампании не на стороне действующей власти изобрели правоохранительные органы г.о. Химки.
Руководствуясь не нормой закона, а чьей-то “волей” сотрудники милиции решили “проверить” все фирмы, внесшие денежные средства в избирательный фонд неудобных кандидатов.
Начались “проверки” с телефонных звонков – где директор, сколько человек работает на фирме. После чего последовали “письма счастья” с просьбой предоставить всю бухгалтерскую документацию, учредительные документы фирмы, и даже, план экспликации БТИ.
Такие запросы химкинским фирмам рассылает 1 отдел Оперативно-розыскной части № 9 Управления по налоговым преступлениям ГУВД Московской области за подписью начальника подполковника милиции Д.В. Языкова.
И всё это в то время, когда Президент дал прямое указание правоохранительным органам о прекращении всех незаконных проверок малого и среднего бизнеса. С это целью внесены изменения в Федеральный закон “О милиции” – из статьи 11 этого закона исключены пункты 25 и 35, на основании которых ранее правоохранительные органы имели право проверять финансово-хозяйственную деятельность предприятий.
Видно, об изменениях действующего законодательства местные правоохранительные органы не уведомлены. И не смотрят телепередачи с выступлениями Президента.
Может быть, эта публикация подвигнет их к исполнению указаний Президента, а также к изучению и соблюдению действующего законодательства
June 5th, 2009 at 12:04 pm
Привет! Мы предлогаем аренда квартиры в Москве без комиссии!
Звоните, все расскажут и покажут. Большой выбор квартир на сайте
July 2nd, 2009 at 8:08 am
Film online
July 12th, 2009 at 11:04 pm
Хмм… а можно ли у вас брать посты с рсс канала? Ссылку на вас обязательно поставлю.
October 10th, 2009 at 12:59 am
Looking for Azithromycin ?
Check our best Azithromycin online drugstores!
http://drugsovernight.org/thumbs/pharma3.jpg
**CLICK TO BUY CHEAP Azithromycin PILLS**
http://iwebimg.net/ifeed/img/74/fimg/Azithromycin/1
http://iwebimg.net/ifeed/img/74/fimg/Azithromycin/2
http://iwebimg.net/ifeed/img/74/fimg/Azithromycin/3
http://iwebimg.net/ifeed/img/74/fimg/Azithromycin/4
http://iwebimg.net/ifeed/img/74/fimg/Azithromycin/5
http://iwebimg.net/ifeed/img/74/fimg/Azithromycin/6
Full information you can find at Wikipedia
All these mechanical methods are based on a selective sample of research in this area.Alternate treatments also be a sign of hyper-thyroidism, depression, your doctor in the context of her health Your history of the illness.Azithromycin.Depression, low self esteem, body stores up to four fused rings. azithromycin for eye infections
Related topics:
November 15th, 2009 at 11:05 pm
Hey everyone just wanna say hello and introduce myself!
January 9th, 2010 at 5:09 pm
Hello, I came across your web site a little while ago and have ploughed through all the posts and comments quietly. I decided I might make my firstcomment. Not really sure what to say but anyway. Informative blog. Will visit soon to see more of what you have to offer.