History of Soundex and How to Code Yourself
The soundex coding system is a system whereby names, when pronounced in English, are indexed together by their phonetic sound. The basic aim of the soundex system is for surnames with the same pronunciation to be coded to the same number, such as 'Robert' and 'Roberts', so that blanket indexing of a group of records can occur easily despite minor differences in spelling. The soundex system is referred to as a phonetic algorithm and is the most widely known and used of all phonetic algorithms.
The soundex algorithm was originally devised by Robert Russell and Margaret Odell and is known as the "Russell" Soundex. The algorithm was patented in 1918 and 1922 under U.S. Patent 1,261,167 and U.S. Patent 1,435,663, respectively.
A variation of the original Russell Soundex, called "American" Soundex, was used in the 1930s to produce indices to several of the federal U.S. census records. The soundex index of a census is available as a record separate from the population and ancillary schedules of that same census. The soundex index usually covers all the inhabitants of a State, which makes it an especially valuable tool when a family's exact place of residence within the State is not known. The entire censuses of 1880, 1900 and 1920 have been soundexed, while the 1910 census has been soundexed for only some States and the 1930 federal census for only twelve southern States. During the twentieth century the soundex code was found primarily in documents created by federal and state agencies of the United States of America. Alternative encoding algorithms have been developed and are outlined later on this page.
Encoding the Name
Creating the soundex code for a surname follows a very specific set of rules:
First, every soundex code begins with the first letter of the surname. So if your surname is Jones, the soundex code will begin with the letter 'J'. Similarly, if your surname is Anderson, the soundex code will begin with the letter 'A'.
Second, remove all of the vowels 'a', 'e', 'i', 'o' and 'u' from the rest of the surname, as well as all occurrences of the letters 'h', 'w' and 'y'.
Third, the second letter of any double letter in the surname is eliminated. For example, the surname 'Donnatelli' would be coded on only one 'n' and one 'l'.
Fourth, if two coded consonants with the same code value appear side by side in the surname (see the letter-code chart in step five, below), those two consonants are treated as one and coded with a single number. Examples are the letter pairs 'ck' and 'sz': all four of these letters code as the number 2, so only one 2 appears in the surname code for each pair.
- The soundex code for the surname 'Buerck' would be B620, not B622.
- The soundex code for the surname 'Lukaschowsky', where there are two separate sets of side-by-side same-value letters, would be L222.
Fifth, only the first three remaining letters of the surname are assigned a numerical value, as follows:
- 1 = B F P V
- 2 = C G J K Q S X Z
- 3 = D T
- 4 = L
- 5 = M N
- 6 = R
If fewer than three letters that can be coded remain in the surname, add '0's to the end of the soundex number to make up the three digits.
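The five rules above can be sketched in a few lines of Python. This is a minimal illustration rather than an official implementation: the names `CODES` and `soundex` are made up for this example, and it assumes that "side by side" is judged in the original spelling, so that any uncoded letter (a vowel, 'h', 'w' or 'y') between two same-value consonants keeps them separate. Some published versions of American Soundex treat 'h' and 'w' differently, and this sketch does not attempt that. It also assumes that the first letter counts when checking for side-by-side letters.

```python
# Letter-to-digit chart from the fifth rule.
CODES = {
    letter: str(digit)
    for digit, letters in enumerate(("BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"), start=1)
    for letter in letters
}


def soundex(surname):
    """Return the first letter plus three digits (illustrative sketch only)."""
    # Keep only the letters A-Z, so spaces, hyphens and apostrophes are ignored.
    letters = [ch for ch in surname.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    # Code of the letter just seen; None for vowels, H, W and Y.
    # Assumption: the first letter takes part in the side-by-side check.
    previous = CODES.get(first)
    for ch in letters[1:]:
        code = CODES.get(ch)
        # Double letters and side-by-side same-value letters are coded once.
        if code is not None and code != previous:
            digits.append(code)
        previous = code
    # Only the first three digits are kept; short codes are padded with '0'.
    return first + "".join(digits[:3]).ljust(3, "0")


if __name__ == "__main__":
    for name in ("Buerck", "Lukaschowsky", "Donnatelli"):
        print(name, soundex(name))
```

Run as a script, this prints B620 for 'Buerck' and L222 for 'Lukaschowsky', matching the examples given under the fourth rule.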
Step 1) for the surname 'Williams', the first letter 'W' is kept and 'illiams' remains of the surname
Step 2) the second of the letters 'l' is crossed out
Step 3) the letters 'i' and 'a' are removed, leaving only 'l', 'm' and 's'
Step 4) according to the chart above,
'l' = 4
'm' = 5
's' = 2
Soundex Code for the surname Williams is W452.
Step 1) for the surname 'Young', the first letter 'Y' is kept and 'oung' remains of the surname
Step 2) there are no double letters, so 'oung' still remains
Step 3) the letters 'o' and 'u' are removed leaving only 'n' and 'g'
Step 4) according to the chart above,
'n' = 5
'g' = 2
Step 5) only two letters remained so '0' has to be added to the end of the code
Soundex Code for the surname Young is Y520.
Exceptions to the Rules
Surnames with prefixes pose a particular sort of problem, as the prefix was sometimes ignored when the surname was coded. For example, if you are searching for the surname 'de la Roux', it may be that only the 'Roux' portion of the surname was soundex coded.
Oriental or Indian names were sometimes soundex coded as if they were one longer continuous name if there was no distinguishable surname. For example, the name 'White-cloud' may have been soundex coded as 'Whitecloud' or simply as 'Cloud'. As another example, 'Shinka-Wa-Sa' may have been coded as 'Shinka' or 'Sa'.
Nuns in religious orders were all coded as if their last name was 'Sister'. So, in this instance all nuns will be found under the soundex code of 'S236'.
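Because of these exceptions, a search may need to try more than one code for the same person. The sketch below is one hedged way to list the candidates: it codes the name as one continuous word, with any leading prefix words ignored, and by its first and last parts alone. The helper `candidate_codes` and the prefix list `PREFIX_WORDS` are hypothetical and deliberately incomplete. The compact `soundex` function is included only so the block runs on its own, and it assumes that side-by-side letters are judged in the original spelling.

```python
# Letter-to-digit chart used by the soundex rules on this page.
CODES = {
    letter: str(digit)
    for digit, letters in enumerate(("BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"), start=1)
    for letter in letters
}

# Hypothetical, incomplete list of prefix words, for illustration only.
PREFIX_WORDS = {"de", "la", "le", "du", "di", "van", "von"}


def soundex(surname):
    """Compact encoder: first letter plus three digits (sketch only)."""
    letters = [ch for ch in surname.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    digits = []
    previous = CODES.get(letters[0])
    for ch in letters[1:]:
        code = CODES.get(ch)  # None for vowels, H, W and Y
        if code is not None and code != previous:
            digits.append(code)
        previous = code
    return letters[0] + "".join(digits[:3]).ljust(3, "0")


def candidate_codes(name):
    """Return the distinct codes under which a name may have been filed."""
    parts = name.replace("-", " ").split()
    if not parts:
        return []
    variants = ["".join(parts)]          # coded as one continuous name
    while len(parts) > 1 and parts[0].lower() in PREFIX_WORDS:
        parts = parts[1:]                # prefix ignored by the coder
    variants.append("".join(parts))
    variants.extend([parts[0], parts[-1]])  # first or last part only
    codes = []
    for variant in variants:
        code = soundex(variant)
        if code and code not in codes:
            codes.append(code)
    return codes


if __name__ == "__main__":
    for name in ("de la Roux", "White-cloud", "Shinka-Wa-Sa"):
        print(name, candidate_codes(name))
```

For 'de la Roux' this yields D462 (the whole name) and R200 ('Roux' alone). Nuns need no such list: `soundex("Sister")` gives the single code S236 mentioned above.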
Alternative Soundex Algorithms
The "Reverse Soundex" prefixes the last letter of the name instead of the first but otherwise essentially retains the same coding system.
Deficiencies in the Russell and American Soundex algorithms led Lawrence Philips to develop the "Metaphone" algorithm, which can produce a more precise index.
"Daitch-Mokotoff" Soundex was developed by Gary Mokotoff and Randy Daitch to address the problems they had while attempting to apply the Russell Soundex to Jews with Germanic or Slavic surnames such as Moskowitz vs. Moskovitz or Levine vs. Lewin. The procedure for encoding a name using the Daitch-Mokotoff algorithm is far more complex than that used for coding a name according to the Russell and American Soundex.
The New York State Identification and Intelligence System (NYSIIS) algorithm adds its own improvements to the Russell and American Soundex encoding systems.
- U.S.A. Archives
- Daitch-Mokotoff from Avotaynu
- Daitch-Mokotoff from Jewish Genealogy
- Freely available Soundex, Metaphone, and Double Metaphone implementation in Java at Apache Commons' codec project