Add Double-Metaphone algorithm #19

New issue

Closed

opened 2025-03-28 00:27:56 +00:00 by import_user_linuxgoose_QAg4 · 2 comments

import_user_linuxgoose_QAg4 commented

2025-03-28 00:27:56 +00:00

(Migrated from gitlab.com)

https://en.wikipedia.org/wiki/Metaphone#Double_Metaphone

The Double Metaphone phonetic encoding algorithm is the second generation of
the Metaphone algorithm. Its implementation was described in the June 2000
issue of C/C++ Users Journal. It makes a number of fundamental design
improvements over the original Metaphone algorithm.

It is called "Double" because it can return both a primary and a secondary code
for a string; this accounts for some ambiguous cases as well as for multiple
variants of surnames with common ancestry. For example, encoding the name
"Smith" yields a primary code of SM0 and a secondary code of XMT, while the
name "Schmidt" yields a primary code of XMT and a secondary code of SMT--both
have XMT in common.

Double Metaphone tries to account for myriad irregularities in English of
Slavic, Germanic, Celtic, Greek, French, Italian, Spanish, Chinese, and other
origin. Thus it uses a much more complex ruleset for coding than its
predecessor; for example, it tests for approximately 100 different contexts of
the use of the letter C alone.

https://en.wikipedia.org/wiki/Metaphone#Double_Metaphone The Double Metaphone phonetic encoding algorithm is the second generation of the Metaphone algorithm. Its implementation was described in the June 2000 issue of C/C++ Users Journal. It makes a number of fundamental design improvements over the original Metaphone algorithm. It is called "Double" because it can return both a primary and a secondary code for a string; this accounts for some ambiguous cases as well as for multiple variants of surnames with common ancestry. For example, encoding the name "Smith" yields a primary code of SM0 and a secondary code of XMT, while the name "Schmidt" yields a primary code of XMT and a secondary code of SMT--both have XMT in common. Double Metaphone tries to account for myriad irregularities in English of Slavic, Germanic, Celtic, Greek, French, Italian, Spanish, Chinese, and other origin. Thus it uses a much more complex ruleset for coding than its predecessor; for example, it tests for approximately 100 different contexts of the use of the letter C alone.

import_user_linuxgoose_QAg4 commented

2025-03-28 20:04:53 +00:00

(Migrated from gitlab.com)

mentioned in merge request !20