Add Double-Metaphone algorithm #19

Closed
opened 2025-03-28 00:27:56 +00:00 by import_user_linuxgoose_QAg4 · 2 comments
import_user_linuxgoose_QAg4 commented 2025-03-28 00:27:56 +00:00 (Migrated from gitlab.com)

https://en.wikipedia.org/wiki/Metaphone#Double_Metaphone

The Double Metaphone phonetic encoding algorithm is the second generation of
the Metaphone algorithm. Its implementation was described in the June 2000
issue of C/C++ Users Journal. It makes a number of fundamental design
improvements over the original Metaphone algorithm.

It is called "Double" because it can return both a primary and a secondary code
for a string; this accounts for some ambiguous cases as well as for multiple
variants of surnames with common ancestry. For example, encoding the name
"Smith" yields a primary code of SM0 and a secondary code of XMT, while the
name "Schmidt" yields a primary code of XMT and a secondary code of SMT; both
names have XMT in common.
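
To make the matching idea concrete, here is a minimal sketch of how a caller might compare two names once their primary and secondary codes are known. It does not implement the encoding itself: the codes are copied from the example above, and the names `DoubleCode`, `KNOWN_CODES`, `shared_codes`, and `sounds_alike` are hypothetical, not part of this library's API.

```python
from typing import NamedTuple


class DoubleCode(NamedTuple):
    """Hypothetical container for a Double Metaphone result."""
    primary: str
    secondary: str


# Codes taken verbatim from the example in this issue; nothing is computed here.
KNOWN_CODES = {
    "Smith": DoubleCode("SM0", "XMT"),
    "Schmidt": DoubleCode("XMT", "SMT"),
}


def shared_codes(a: DoubleCode, b: DoubleCode) -> set[str]:
    """Return the codes that appear in both results, ignoring empty codes."""
    return {c for c in a if c} & {c for c in b if c}


def sounds_alike(a: DoubleCode, b: DoubleCode) -> bool:
    """Treat two names as a phonetic match if any of their codes coincide."""
    return bool(shared_codes(a, b))


if __name__ == "__main__":
    smith, schmidt = KNOWN_CODES["Smith"], KNOWN_CODES["Schmidt"]
    print(shared_codes(smith, schmidt))
    print(sounds_alike(smith, schmidt))
```

Comparing all four code pairs rather than only the primaries is what lets "Smith" and "Schmidt" match, since their primary codes (SM0 and XMT) differ.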

Double Metaphone tries to account for myriad irregularities in English of
Slavic, Germanic, Celtic, Greek, French, Italian, Spanish, Chinese, and other
origin. Thus it uses a much more complex ruleset for coding than its
predecessor; for example, it tests for approximately 100 different contexts of
the use of the letter C alone.

import_user_linuxgoose_QAg4 commented 2025-03-28 20:04:53 +00:00 (Migrated from gitlab.com)

mentioned in merge request !20

import_user_linuxgoose_QAg4 commented 2025-03-28 20:05:03 +00:00 (Migrated from gitlab.com)

assigned to @linuxgoose

import_user_linuxgoose_QAg4 (Migrated from gitlab.com) closed this issue 2025-03-28 20:08:23 +00:00
Reference: linuxgoose/linguistics-robin#19