9 years ago · f68f344e1d
--- a/Makefile.am
+++ b/Makefile.am
@@ -63,6 +63,7 @@ src/espeak-ng.1: src/espeak-ng.1.ronn

 docs:	docs/index.html \
 	docs/add_language.html \
 	docs/dictionary.html \
 	src/espeak-ng.1.html \
 	README.html \
 	src/espeak-ng.1
--- a/docs/dictionary.html
+++ b/docs/dictionary.html
@@ -1,652 +0,0 @@
 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
 <html>

 <head>
  <title>eSpeak: Pronunciation Dictionaries</title>
  <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
 </head>
 <body>

 <A href="docindex.html">Back</A>
 <hr>
 <h2>4. TEXT TO PHONEME TRANSLATION</h2>
 <hr>
 <h3>4.1 Translation Files</h3>
 There is a separate set of pronunciation files for each language, their names starting with the language name.
 <p>
 There are two separate methods for translating words into phonemes:
 <ul>
 <li>Pronunciation Rules.  These are an attempt to define the pronunciation rules for the language. The source file is:<br>
 	<strong><em>&lt;language&gt;_rules</em></strong>&nbsp;	(eg.  en_rules)<br>

 <p>
 <li>
 Lookup Dictionary.  A list of individual words and their pronunciations and/or various other properties.  The source files are:<br>
 	<strong><em>&lt;language&gt;_list</em></strong>&nbsp;	(eg.  en_list) and optionally <strong><em>&lt;language&gt;_extra</em></strong>&nbsp;<br>
 </ul>
 These two files are compiled into the file
 	<strong><em>&lt;language&gt;_dict</em></strong>&nbsp; in the espeak-data directory (eg.  espeak-data/en_dict)

 <p>&nbsp;<hr>
 <h3>4.2  Phoneme names</h3>
 Each of the language's phonemes is represented by a mnemonic of 1, 2, 3, or 4 characters.  Together with a number of utility codes (eg. stress marks and pauses), these are defined in the phoneme data file (see *spec not yet available*).
 <p>
 The utility 'phonemes' are:

 <ul><table>
  <tbody align="left">
    <tr>
      <td><strong>' </strong></td>
      <td>primary stress</td>
    </tr>
    <tr>
      <td><strong>, </strong></td>
      <td>secondary stress</td>
    </tr>
    <tr>
      <td><strong>% </strong></td>
      <td>unstressed syllable</td>
    </tr>
    <tr>
      <td><strong>=&nbsp;&nbsp;&nbsp;</strong></td>
      <td>put the primary stress on the preceding syllable</td>
    </tr>
    <tr>
      <td><strong>_:</strong></td>
      <td>short pause</td>
    </tr>
    <tr>
      <td><strong>_</strong></td>
      <td>a shorter pause</td>
    </tr>
    <tr>
      <td><strong>|| </strong></td>
      <td>indicates a word boundary within a phoneme string</td>
    </tr>
    <tr>
      <td><strong>| </strong></td>
      <td>can be used to separate two adjacent characters, to prevent them from being considered as a multi-character phoneme mnemonic</td>
    </tr>
  </tbody>
 </table>
 </ul>

 It is not necessary to specify the stress of every syllable.  Stress markers are only needed in order to change the effect of the language's default stress rule.
 <p>
 The phonemes which are used to represent a language's sounds are based loosely on the Kirshenbaum ascii character representation of the International Phonetic Alphabet <a href="http://www.kirshenbaum.net/IPA/ascii-ipa.pdf">www.kirshenbaum.net/IPA/ascii-ipa.pdf</a>

 <p>&nbsp;<hr>
 <h3>4.3  Pronunciation Rules</h3>
 The rules in the <strong><em>&lt;language&gt;_rules</em></strong>&nbsp;  file specify the phonemes which are used to pronounce each letter, or sequence of letters.  Some rules only apply when the letter or letters are preceded by, or followed by, other specified letters.
 <p>
 To find the pronunciation of a word, the rules are searched and any which match the letters at the in the word are given a score depending on how many letters are matched.  The pronunciation from the best matching rule is chosen. The pointer into the source word is then advanced past those letters which have been matched and the process is repeated until all the letters of the word have been processed.
 <p>
 <h4>4.3.1 Rule Groups</h4>
 The rules are organized in groups, each starting with a ".group" line:
 <ul><dl>
 <dt><strong>.group &lt;character&gt;</strong><br><dd>
 	A group for each letter or character.
 <p>
 <dt><strong>.group &lt;2 characters&gt;</strong><br><dd>
 	Optional groups for some common 2 letter combinations.  This is only needed, for efficiency, in cases where there are many rules for a particular letter.  They would not be needed for a language which has regular spelling rules.  The first character can only be an ascii character (less than 0x80).
 <p>
 <dt><strong>.group</strong><br><dd>
 	A group for other characters which don't have their own group.
 <p>
 <dt><strong>.L&lt;nn&gt;</strong><br><dd>
 	Defines a group of letter sequences, any of which can match with <strong>Lnn</strong> in a <strong>pre</strong> or <strong>post</strong> rule (see below).  <strong>nn</strong> is a 2 digit decimal number in the range 01 to 25.  eg:<p>
 	<code>.L01  b bl br pl pr</code>
 <p>
 <dt><strong>.replace</strong><br><dd>
 	See section 4.7 Character Substitution, below.
 </dl>
 </ul>When matching a word, firstly the 2-letter group for the two letters at the current position in the word (if such a group exists) is searched, and then the single-letter group.  The highest scoring rule in either of those two groups is used.

 <h4>4.3.2 Rules</h4>
 Each rule is on separate line, and has the syntax:
 <ul>
 	[&lt;pre&gt;)]  &lt;match&gt;  [(&lt;post&gt;]  &lt;phoneme string&gt;
 </ul>
 eg.

 <ul><pre>.group o
       o        0	// "o" is pronounced as [0]
       oo       u:      // but "oo" is pronounced as [u:]
    b) oo (k    U
 </pre>
 </ul> "oo" is pronounced as [u:], but when also preceded by "b" and followed by "k", it is pronounced [U].
 <p>
 In the case of a single-letter group, the first character of &lt;match&gt; much be the group letter.  In the case of a 2-letter group, the first two characters of &lt;match&gt; must be the group letters.  The second and third rules above may be in either  .group o  or  .group oo
 <p>
 Alphabetic characters in the &lt;pre&gt;, &lt;match&gt;, and &lt;post&gt; parts must be lower case, and matching is case-insensitive.  Some upper case letters are used in &lt;pre&gt; and &lt;post&gt; with special meanings.
 <p>
 <h4>4.3.3 Special characters in &lt;phoneme string&gt;:</h4>
 <ul><table>
  <tbody>
    <tr>
      <td><strong>_^_&lt;language code&gt;&nbsp;&nbsp;&nbsp;</strong></td>
      <td>Translate using a different language.</td>
    </tr>
  </tbody>
 </table>
 If this rule is selected when translating a word, then the translation is aborted and the word is re-translated using the specified different language.  &lt;language code&gt; may be upper or lower case. This can be used to recognise certain letter combinations as being foreign words and to use the foreign pronunciation for them. eg:
 <pre>
    th (_     _^_EN
 </pre>
 indicates that a word which ends in "th" is translated using the English translation rules and spoken with English phonemes.
 </ul>
 <h4>4.3.4 Special Characters in both &lt;pre&gt; and &lt;post&gt;:</h4>

 <ul><table>
  <tbody>
    <tr>
      <td><strong>_</strong></td>
      <td>Beginning or end of a word (or a hyphen).</td>
    </tr>
    <tr>
      <td><strong>-</strong></td>
      <td>Hyphen.</td>
    </tr>
    <tr>
      <td><strong>A</strong></td>
      <td>Any vowel (the set of vowel characters may be defined for a particular language).</td>
    </tr>
    <tr>
      <td><strong>C</strong></td>
      <td>Any consonant.</td>
    </tr>
    <tr>
      <td><strong>B&nbsp;H&nbsp;F&nbsp;G&nbsp;Y&nbsp;</strong></td>
      <td>These may indicate other sets of characters (defined for a particular language).</td>
    </tr>
     <tr>
      <td><strong>L&lt;nn&gt;</strong></td>
      <td>Any of the sequence of characters defined as a letter group (see 4.3.1 above).</td>
    </tr>
     <tr>
      <td><strong>D</strong></td>
      <td>Any digit.</td>
    </tr>
    <tr>
      <td><strong>K</strong></td>
      <td>Not a vowel (i.e. a consonant or word boundary or non-alphabetic character).</td>
    </tr>
    <tr>
      <td><strong>X</strong></td>
      <td>There is no vowel until the word boundary.</td>
    </tr>
    <tr>
      <td><strong>Z</strong></td>
      <td>A non-alphabetic character.</td>
    </tr>
    <tr>
      <td><strong>%</strong></td>
      <td>Doubled (placed before a character in &lt;pre&gt; and after it in &lt;post&gt;.</td>
    </tr>
    <tr>
      <td><strong>/</strong></td>
      <td>The following character is treated literally.</td>
    </tr>
  </tbody>
 </table>
 </ul>
 The sets of letters indicated by A, B, C, E, F G may be defined differently for each language.
 <p>
 Examples of rules:
 <pre>     _)  a         // "a" at the start of a word
         a (CC     // "a" followed by two consonants
         a (C%     // "a" followed by a double consonant (the same letter twice)
         a (/%     // "a" followed by a percent sign
     %C) a         // "a" preceded by a double consonants
 </pre>
 <h4>4.3.5 Special characters only in &lt;pre&gt;:</h4>
 <ul><table>
  <tbody>
    <tr>
      <td><strong>@&nbsp;&nbsp;&nbsp;</strong></td>
      <td>Any syllable.</td>
    </tr>
    <tr>
      <td><strong>&#038;</strong></td>
      <td>A syllable which may be stressed (i.e. is not defined as unstressed).</td>
    </tr>
    <tr>
      <td><strong>V</strong></td>
      <td>Matches only if a previous word has indicated that a verb form is expected.</td>
    </tr>
  </tbody>
 </table>
 </ul>
 eg.
 <pre>     @@)  bi      // "bi" preceded by at least two syllables
     @@a) bi      // "bi" preceded by at least 2 syllables and following 'a'
 </pre>
 Note, that matching characters in the &lt;pre&gt; part do not affect the syllable counting.
 <p>
 <h4>4.3.6 Special characters only in &lt;post&gt;:</h4>
 <ul><table>
  <tbody>
    <tr>
      <td><strong>@</strong></td>
      <td>A vowel follows somewhere in the word.</td>
    </tr>
    <tr>
      <td><strong>+</strong></td>
      <td>Force an increase in the score in this rule (may be repeated for more effect).</td>
    </tr>
    <tr>
      <td><strong>S&lt;number&gt;&nbsp;&nbsp;</strong></td>
      <td>This number of matching characters are a standard suffix, remove them and retranslate the word.</td>
    </tr>
    <tr>
      <td><strong>P&lt;number&gt;</strong></td>
      <td>This number of matching characters are a standard prefix, remove them and retranslate the word.</td>
    </tr>
    <tr>
      <td><strong>Lnn</strong></td>
      <td><strong>nn</strong> is a 2-digit decimal number in the range 01 to 20<br>
          Matches with any of the letter sequences which have been defined for letter group <strong>nn</strong></td>
    </tr>
    <tr>
      <td><strong>N</strong></td>
      <td>Only use this rule if the word is not a retranslation after removing a suffix.</td>
    </tr>
    <tr>
      <td><strong>#</strong></td>
      <td>(English specific) change the next "e" into a special character "E"</td>
    </tr>
    <tr>
      <td><strong>$noprefix</strong></td>
      <td>Only use this rule if the word is not a retranslation after removing a prefix.</td>
    </tr>
    <tr>
      <td><strong>$w_alt<br>$w_alt2<br>$w_alt3</strong></td>
      <td>Only use this rule if the word is found in the *_list file with the <b>$alt</b>, <b>$alt2</b> or <b>$alt3</b> attribute respectively.</td>
    </tr>
    <tr>
      <td><strong>$p_alt<br>$p_alt2<br>$p_alt3</strong></td>
      <td>Only use this rule if the part-word, up to and including the pre and match parts of this rule, is found in the *_list file with the <b>$alt</b>, <b>$alt2</b> or <b>$alt3</b> attribute respectively.</td>
    </tr>
  </tbody>
 </table>
 </ul>

 eg.
 <pre>   @) ly (_S2   lI      // "ly", at end of a word with at least one other
                        //   syllable, is a suffix pronounced [lI].  Remove
                        //   it and retranslate the word.

   _) un (@P2   %Vn     // "un" at the start of a word is an unstressed
                        //   prefix pronounced [Vn]
   _) un (i     ju:     // ... except in words starting "uni"
   _) un (inP2  ,Vn     // ... but it is for words starting "unin"
 </pre>
 S and P must be at the end of the &lt;post&gt; string.
 <p>
 S&lt;number&gt; may be followed by additional letters (eg. S2ei ).  Some of these are probably specific to English, but similar functions could be made for other languages.

 <ul><table>
  <tbody>
    <tr>
      <td><strong>q</strong></td>
      <td>query the _list file to find stress position or other attributes for the stem, but don't re-translate the word with the suffix removed.</td>
    </tr>
    <tr>
      <td><strong>t</strong></td>
      <td>determine the stress pattern of the word <strong>before</strong> adding the suffix</td>
    </tr>
    <tr>
      <td><strong>d&nbsp;&nbsp;&nbsp;</strong></td>
      <td>the previous letter may have been doubled when the suffix was added.</td>
    </tr>
    <tr>
      <td><strong>e</strong></td>
      <td>"e" may have been removed.</td>
    </tr>
    <tr>
      <td><strong>i</strong></td>
      <td>"y" may have been changed to "i."</td>
    </tr>
    <tr>
      <td><strong>v</strong></td>
      <td>the suffix means the verb form of pronunciation should be used.</td>
    </tr>
    <tr>
      <td><strong>f</strong></td>
      <td>the suffix means the next word is likely to be a verb.</td>
    </tr>
    <tr>
      <td><strong>m</strong></td>
      <td>after this suffix has been removed, additional suffixes may be removed.</td>
    </tr>
  </tbody>
 </table>
 </ul>
 <p>
 P&lt;number&gt; may be followed by additonal letters (eg. P3v ).

 <ul><table>
  <tbody>
    <tr>
      <td><strong>t&nbsp;&nbsp;&nbsp;</strong></td>
      <td>determine the stress pattern of the word <strong>before</strong> adding the prefix</td>
    </tr>
    <tr>
      <td><strong>v</strong></td>
      <td>the suffix means the verb form of pronunciation should be used.</td>
    </tr>
  </tbody>
 </table>
 </ul>

 <p>&nbsp;<hr>
 <h3>4.4  Pronunciation Dictionary List</h3>
 The <strong><em>&lt;language&gt;_list</em></strong>&nbsp;  file contains a list of words whose pronunciations are given explicitly, rather than determined by the Pronunciation Rules.
 The <strong><em>&lt;language&gt;_extra</em></strong>&nbsp; file, if present, is also used and it's contents are taken as coming after those in <strong><em>&lt;language&gt;_list</em></strong>.
 <p>
 Also the list can be used to specify the stress pattern, or other properties, of a word.
 <p>
 If the Pronunciation rules are applied to a word and indicate a standard prefix or suffix, then the word is again looked up in Pronunciation Dictionary List after the prefix or suffix has been removed.
 <p>
 Lines in the dictionary list have the form:
 <ul>
 &lt;word&gt; &nbsp; &nbsp;  [&lt;phoneme string&gt;] &nbsp; &nbsp;	[&lt;flags&gt;]
 </ul>eg.
 <pre>     book      bUk
 </pre>
 Rather than a full pronunciation, just the stress may be given, to change where it would be otherwise placed by the Pronunciation Rules:
 <pre>     berlin       $2      // stress on second syllable
     absolutely   $3      // stress on third syllable
     for          $u      // an unstressed word
 </pre>
 <h4>4.4.1 Multiple Words</h4>
 A pronunciation may also be specified for a group of words, when these appear together. Up to  four words may be given, enclosed in brackets.  This may be used for change the pronunciation or stress pattern when these words occur together,
 <pre>    (de jure)    deI||dZ'U@rI2   // note || used as a word break in the phoneme string</pre>
 or to run them together, pronounced as a single word
 <pre>    (of a)       @v@
 </pre>
 or to give them a flag when they occur together
 <pre>    (such as)    sVtS||a2z   $pause	   // precede with a pause
 </pre>
 Hyphenated words in the <strong><em>&lt;language&gt;_list</em></strong>&nbsp;  file must also be enclosed within brackets, because the two parts are considered as separate words.
 <h4>4.4.2 Special characters in &lt;phoneme string&gt;:</h4>
 <ul><table>
  <tbody>
    <tr>
      <td><strong>_^_&lt;language code&gt;&nbsp;&nbsp;&nbsp;</strong></td>
      <td>Translate using a different language.  See explanation in 4.3.3 above.</td>
    </tr>
  </tbody>
 </table>
 </ul>
 <h4>4.4.3 Flags</h4>
 A word (or group of words) may be given one or more flags, either instead of, or as well as, the phonetic translation.

 <ul><table>
  <tbody valign="top">
    <tr>
      <td>$u</td>
      <td>The word is unstressed. In the case of a multi-syllable word, a slight stress is applied according to the default stress rules.</td>
    </tr>
    <tr>
      <td>$u1</td>
      <td>The word is unstressed, with a slight stress on its 1st syllable.</td>
    </tr>
    <tr>
      <td>$u2</td>
      <td>The word is unstressed, with a slight stress on its 2nd syllable.</td>
    </tr>
    <tr>
      <td>$u3</td>
      <td>The word is unstressed, with a slight stress on its 3rd syllable.</td>
    </tr>
    <tr>
      <td>&nbsp;</td>
      <td>&nbsp;</td>
    </tr>
    <tr>
      <td>$u+ $u1+ $u2+ $u3+</td>
      <td>As above, but the word has full stress if it's at the end of a clause.</td>
    </tr>
    <tr>
      <td>&nbsp;</td>
      <td>&nbsp;</td>
    </tr>
    <tr>
      <td>$1</td>
      <td>Primary stress on the 1st syllable.</td>
    </tr>
    <tr>
      <td>$2</td>
      <td>Primary stress on the 2nd syllable.</td>
    </tr>
    <tr>
      <td>$3</td>
      <td>Primary stress on the 3rd syllable.</td>
    </tr>
    <tr>
      <td>$4</td>
      <td>Primary stress on the 4th syllable.</td>
    </tr>
    <tr>
      <td>$5</td>
      <td>Primary stress on the 5th syllable.</td>
    </tr>
    <tr>
      <td>$6</td>
      <td>Primary stress on the 6th syllable.</td>
    </tr>
    <tr>
      <td>$7</td>
      <td>Primary stress on the 7th syllable.</td>
    </tr>
    <tr>
      <td>&nbsp;</td>
      <td>&nbsp;</td>
    </tr>
    <tr>
      <td>$pause</td>
      <td>Ensure a short pause before this word (eg. for conjunctions such as "and", some prepositions, etc).</td>
    </tr>
    <tr>
      <td>$brk</td>
      <td>Ensure a very short pause before this word, shorter than $pause (eg. for some prepositions, etc).</td>
    </tr>
    <tr>
      <td>$only</td>
      <td>The rule does not apply if a prefix or suffix has already been removed.</td>
    </tr>
    <tr>
      <td>$onlys</td>
      <td>As $only, except that a standard plural ending is allowed.</td>
    </tr>
    <tr>
      <td>$stem</td>
      <td>The rule only applies if a suffix has already been removed.</td>
    </tr>
    <tr>
      <td>$strend</td>
      <td>Word is fully stressed if it's at the end of a clause.</td>
    </tr>
    <tr>
      <td>$strend2</td>
      <td>As $strend, but the word is also stressed if followed only by unstressed word(s).</td>
    </tr>
    <tr>
      <td>$unstressend&nbsp;</td>
      <td>Word is unstressed if it's at the end of a clause.</td>
    </tr>
    <tr>
      <td>$atend</td>
      <td>Use this pronunciation if it's at the end of a clause.</td>
    </tr>
    <tr>
      <td>$double</td>
      <td>Cause a doubling of the initial consonant of the following word (used for Italian).</td>
    </tr>
    <tr>
      <td>$capital</td>
      <td>Use this pronunciation if the word has initial capital letter (eg. polish v Polish).</td>
    </tr>
    <tr>
      <td>$allcaps</td>
      <td>Use this pronunciation if the word is all capitals.</td>
    </tr>
    <tr>
      <td>$dot</td>
      <td>Ignore a . after this word even when followed by a capital letter (eg. Mr. Dr. ).</td>
    </tr>
    <tr>
      <td>$hasdot</td>
      <td>Use this pronunciation if the word is followed by a dot. (This attribute also implies $dot).</td>
    </tr>
    <tr>
      <td>$sentence</td>
      <td>The rule only applies if the clause includes end-of-sentence (i.e. it is not terminated by a comma).  For example, "$atend $sentence" means that the rule only applies at the end of a sentence.</td>
    </tr>
    <tr>
      <td>$abbrev</td>
      <td>This has two meanings.<br> 1. If there is no phoneme string: Speak the word as individual letters, even if it contains a vowel (eg. "abc" should be spoken as "a" "b" "c").<br>2. If there is a phoneme string: This word is capitalized because it is an abbreviation and capitalization does not indicate emphasis (if the "emphasize all-caps" is on).</td>
    </tr>
    <tr>
      <td>&nbsp;</td>
      <td>&nbsp;</td>
    </tr>
    <tr>
      <td>$accent</td>
      <td>Used for the pronunciation of a single alphabetic character.  The character name is spoken as the base-letter name plus the accent (diacritic) name.  eg. It can be used to specify that "&#xe2;" is spoken as "a" "circumflex".</td>
    </tr>
    <tr>
      <td>$combine</td>
      <td>This word is treated as though it is combined with the following word with a hyphen.  This may be subject to fuither conditions for certain languages.</td>
    </tr>
     <tr>
      <td>$alt &nbsp; $alt2 &nbsp; $alt3</td>
      <td>These are language specific.  Their use should be described in the language's **_list file</td>
    </tr>
   <tr>
    <tr>
      <td>&nbsp;</td>
      <td>&nbsp;</td>
    </tr>
    <tr>
      <td>$verb</td>
      <td>Use this pronunciation if it's a verb.</td>
    </tr>
    <tr>
      <td>$noun</td>
      <td>Use this pronunciation if it's a noun.</td>
    </tr>
    <tr>
      <td>$past</td>
      <td>Use this pronunciation if it's past tense.</td>
    </tr>
    <tr>
      <td>$verbf</td>
      <td>The following word is probably is a verb.</td>
    </tr>
    <tr>
      <td>$verbsf</td>
      <td>The following word is probably is a if it has an "s" suffix.</td>
    </tr>
    <tr>
      <td>$nounf</td>
      <td>The following word is probably not a verb.</td>
    </tr>
    <tr>
      <td>$pastf</td>
      <td>The following word is probably past tense.</td>
    </tr>
    <tr>
      <td>$verbextend</td>
      <td>Extend the influence of $verbf and $verbsf.</td>
    </tr>
  </tbody>
 </table></ul>
 The last group are probably English specific, but something similar may be useful in other languages. They are a crude attempt to improve the accuracy of pairs like  ob'ject (verb) v  'object (noun) and read (present) v read (past).
 <p>
 The dictionary list is searched from bottom to top.  The first match that satisfies any conditions is used (i.e. the one lowest down the list).  So if we have:
 <pre>
    to    t@               // unstressed version
    to    tu:   $atend     // stressed version
 </pre>
 then if "to" is at the end of the clause, we get [tu:], if not then we get [t@].

 <p>
 <h4>4.4.4  Translating a Word to another Word</h4>
 Rather than specifying the pronunciation of a word by a phoneme string, you can specify another "sounds like" word.<p>Use the attribute <b>$text</b>  eg.<p>
 <pre>
    cough    coff   $text
 </pre>
 Alternatively, use the command <b>$textmode</b> on a line by itself to turn this on for all subsequent entries in the file, until it's turned off by <b>$phonememode</b>. eg.<p>
 <pre>
    $textmode
    cough     coff
    through   threw
    $phonememode
 </pre>
 This feature cannot be used for the special entries in the <b>_list</b> files which start with an underscore, such as numbers.<p>
 Currently "textmode" entries are only recognized for complete words, and not for for stems from which a prefix or suffix has been removed (eg. the word "coughs" would not match the example above).
 <p>

 <p>&nbsp;<hr>
 <h3>4.5  Conditional Rules</h3>
 Rules in a <b>_rules</b> file and entries in a <b>_list</b> file can be made conditional.  They apply only to some voices.  This can be useful to specify different pronunciations for different variants of a language (dialects or accents).<p>
 Conditional rules have &nbsp; <b>?</b> &nbsp; and a condition number at the start if the line in the <b>_rules</b> or <b>_list</b> file.  This means that the rule only applies of that condition number is specified in a <b>dictrules</b> line in the <a href="voices.html">voice file</a>.<p>
 If the rule starts with &nbsp; <b>?!</b> &nbsp; then the rule only applies if the condition number is <b>not</b> specified in the voice file. eg.<p>
 <pre>
   ?3     can't     kant    // only use this if the voice has:  dictrules 3
   ?!3    rather    rA:D3   // only use if the voice doesn't have:  dictrules 3
 </pre>

 <p>&nbsp;<hr>
 <h3>4.6  Numbers and Character Names</h3>
 <h4>4.6.1 Letter names</h4>
 The names of individual letters can be given either in the <b>_rules</b> or <b>_list</b> file.  Sometimes an individual letter is also used as a word in the language and its pronunciation as a word differs from its letter name.  If so, it should be listed in the <b>_list</b> file, preceded by an underscore, to give the letter name (as distinct from its pronunciation as a word).  eg. in English:
 <pre>   _a   eI</pre>
 <h4>4.6.2 Numbers</h4>
 The operation the TranslateNumber() function is controlled by the language's <code>langopts.numbers</code> option.  This constructs spoken numbers from fragments according to various options which can be set for each language.  The number fragments are given in the <b>_list</b> file.
 <p>
 <ul>
 <table><tbody align="left">
 <tr>
 <td>
 _0 to _9 &nbsp;
 <td>The numbers 0 to 9
 </tr>
 <tr>
 <td>_13<td>etc. Any pronunciations which are needed for specific numbers in the range _10 to _99 &nbsp;
 </tr>
 <tr>
 <td>_2X &nbsp;_3X<td>Twenty, thirty, etc., used to make numbers 10 to 99 &nbsp;
 </tr>
 <tr>
 </tr>
 <tr><TD>_0C<td>The word for "hundred"</td>
 <tr><TD>_1C &nbsp;_2C<td>Special pronunciation for one hundred, two hundred, etc., if needed.</tr>
 <tr><TD>_1C0<td>Special pronunciation (if needed) for 100 exactly</td>
 <tr><TD>_0M1<td>The word for "thousand"</tr>
 <tr><TD>_0M2<td>The word for "million"</tr>
 <tr><TD>_0M3<td>The word for 1000000000</tr>
 <tr><TD>_1M1 &nbsp;_2M1<td>Special pronunciation for one thousand, two thousand, etc, if needed</td>
 <tr><TD>_0and<td>Word for "and" when speaking numbers (eg. "two hundred and twenty").</tr>
 <tr><TD>_dpt<td>Word spoken for the decimnal point/comma</tr>
 <tr><TD>_dpt2<td>Word spoken (if any) at the end of all the digits after a decimal point.</tr>
 </tbody></table>
 </ul>

 <p>&nbsp;<hr>
 <h3>4.7  Character Substitution</h3>
 Character substitutions can be specified by using a <b> .replace </b> section at the start of the <b> _rules </b> file.  Each line specified either one or two alphabetic characters to be replaced by another one or two alphabetic characters.  This substitution is done to a word before it is translated using the spelling-to-phoneme rules.  Only the lower-case version of the characters needs to be specified.  eg.<p>
 &nbsp; .replace<br>
 &nbsp; &nbsp; &#xf4; &nbsp; &#x151; &nbsp; // (Hungarian) allow the use of o-circumflex instead of o-double-accute<br>
 &nbsp; &nbsp; &#xfb; &nbsp; &#x171;<p>
 &nbsp; &nbsp; cx &nbsp; &#x109; &nbsp; // (Esperanto) allow "cx" as an alternative to c-circumflex<p>

 &nbsp; &nbsp; &#xfb01; &nbsp; fi &nbsp; // replace a single character ligature by two characters
 <p>



 </body>
 </html>
--- a/docs/dictionary.md
+++ b/docs/dictionary.md
@@ -1,64 +1,62 @@
 # Table of contents

 * [Text to phoneme translation](#text-to-phoneme-translation)
  * [Translation Files](#translation-files)
  * [Phoneme names](#phoneme-names)
  * [Pronunciation Rules](#pronunciation-rules)
    * [Rule Groups](#rule-groups)
    * [Rules](#rules)
    * [Special characters in \<phoneme string\>](#special-characters-in-phoneme-string)
    * [Special Characters in both \<pre\> and \<post\> ](#special-characters-in-both-pre-and-post)
    * [Special characters only in \<pre\> ](#special-characters-only-in-pre)
    * [Special characters only in \<post\> ](#special-characters-only-in-post)
  * [Pronunciation Dictionary List](#pronunciation-dictionary-list)
    * [Multiple Words](#multiple-words)
    * [Special characters in \<phoneme string\>](#special-characters-in-phoneme-string)
    * [Flags](#flags)
    * [Translating a Word to another Word](#translating-a-word-to-another-word)
  * [Conditional Rules](#conditional-rules)
  * [Numbers and Character Names](#numbers-and-character-names)
    * [Letter names](#letter-names)
    * [Numbers](#numbers)
  * [Character Substitution](#character-substitution)

 # Text to phoneme translation

 # Text to Phoneme Translation

 - [Translation Files](#translation-files)
 - [Phoneme names](#phoneme-names)
 - [Pronunciation Rules](#pronunciation-rules)
  - [Rule Groups](#rule-groups)
  - [Rules](#rules)
  - [Special Characters in \<phoneme string\>](#special-characters-in-phoneme-string)
  - [Special Characters in Both \<pre\> and \<post\> ](#special-characters-in-both-pre-and-post)
  - [Special Characters Only in \<pre\> ](#special-characters-only-in-pre)
  - [Special Characters Only in \<post\> ](#special-characters-only-in-post)
 - [Pronunciation Dictionary List](#pronunciation-dictionary-list)
  - [Multiple Words](#multiple-words)
  - [Special Characters in \<phoneme string\>](#special-characters-in-phoneme-string-1)
  - [Flags](#flags)
  - [Translating a Word to Another Word](#translating-a-word-to-another-word)
 - [Conditional Rules](#conditional-rules)
 - [Numbers and Character Names](#numbers-and-character-names)
  - [Letter Names](#letter-names)
  - [Numbers](#numbers)
 - [Character Substitution](#character-substitution)

 ----------

 ## Translation Files

 There is a separate set of pronunciation files for each language, their
 names starting with the language name.
 There is a separate set of pronunciation files for each language, their names
 starting with the language name.

 There are two separate methods for translating words into phonemes:

 * Pronunciation Rules. These are an attempt to define the pronunciation rules for the language. The source file is:
 **\<language\>\_rules**  (eg. `en_rules`)
 * Pronunciation Rules. These are an attempt to define the pronunciation rules
  for the language. The source file is: `<language>_rules` (e.g. `en_rules`).

 * Lookup Dictionary. A list of individual words and their pronunciations and/or various other properties. The source files are:
 **\<language\>\_list**  (eg. `en_list`) and optionally **\<language\>\_extra**.
 * Lookup Dictionary. A list of individual words and their pronunciations and/or
  various other properties. The source files are: `<language>_list` (e.g.
  `en_list`) and optionally `<language>_extra` (e.g. `en_extra`).

 These two files are compiled into the file **\<language\>\_dict**  in
 the espeak-data directory (eg. `espeak-data/en_dict`)
 These files are compiled into the file `<language>_dict`  in the espeak-data
 directory (e.g. `espeak-data/en_dict`).

 ## Phoneme names

 Each of the language's phonemes is represented by a mnemonic of 1, 2, 3,
 or 4 characters. Together with a number of utility codes (eg. stress
 marks and pauses), these are defined in the phoneme data file (_TODO_).
 marks and pauses), these are defined in the phoneme data file.

 The utility 'phonemes' are:


 Symbol    | Description
 --------- | -------------
 **'**     | primary stress
 **,**     | secondary stress
 **%**     | unstressed syllable
 **=**     | put the primary stress on the preceding syllable
 **\_:**   | short pause
 **\_**    | a shorter pause
 **\|\|**  | indicates a word boundary within a phoneme string
 **\|**    | can be used to separate two adjacent characters, to prevent them from being considered as a multi-character phoneme mnemonic 
 | Symbol | Description |
 |--------|-------------|
 | `'`    | primary stress |
 | `,`    | secondary stress |
 | `%`    | unstressed syllable |
 | `=`    | put the primary stress on the preceding syllable |
 | `_:`   | short pause |
 | `_`    | a shorter pause |
 | `||`   | indicates a word boundary within a phoneme string |
 | `|`    | can be used to separate two adjacent characters, to prevent them from being considered as a multi-character phoneme mnemonic |
 
 It is not necessary to specify the stress of every syllable. Stress
 markers are only needed in order to change the effect of the language's
@@ -69,14 +67,15 @@ loosely on the Kirshenbaum ascii character representation of the
 International Phonetic Alphabet
 [www.kirshenbaum.net/IPA/ascii-ipa.pdf](http://www.kirshenbaum.net/IPA/ascii-ipa.pdf)

 Full list of commonly used phonemes can be found in [phsource/phonemes](../phsource/phonemes) file.
 Full list of commonly used phonemes can be found in the `phsource/phonemes`
 file.

 ## Pronunciation Rules

 The rules in the **\<language\>\_rules**  file specify the phonemes
 which are used to pronounce each letter, or sequence of letters. Some
 rules only apply when the letter or letters are preceded by, or followed
 by, other specified letters.
 The rules in the `<language>_rules` file specify the phonemes which are
 used to pronounce each letter, or sequence of letters. Some rules only
 apply when the letter or letters are preceded by, or followed by, other
 specified letters.

 To find the pronunciation of a word, the rules are searched and any
 which match the letters at the in the word are given a score depending
@@ -87,24 +86,29 @@ repeated until all the letters of the word have been processed.

 ### Rule Groups

 The rules are organized in groups, each starting with a ".group" line:
 The rules are organized in groups, each starting with a `.group` line:

 * **.group \<character\>**  
 * `.group <character>`
  A group for each letter or character.

 * **.group \<2 characters\>**  
  Optional groups for some common 2 letter combinations. This is only needed, for efficiency, in cases where there are many rules for a particular letter. They would not be needed for a language which has regular spelling rules. The first character can only be an ascii character (less than 0x80).
 * `.group <2 characters>`
  Optional groups for some common 2 letter combinations. This is only needed,
  for efficiency, in cases where there are many rules for a particular letter.
  They would not be needed for a language which has regular spelling rules. The
  first character can only be an ascii character (less than 0x80).

 * **.group**  
 * `.group`
  A group for other characters which don't have their own group.

 * **.L\<nn\>**  
  Defines a group of letter sequences, any of which can match with Lnn in a pre or post rule (see below). nn is a 2 digit decimal number in the range 01 to 25. eg:  
  `.L01 b bl br pl pr`
 * `.L<nn>`
  Defines a group of letter sequences, any of which can match with `Lnn` in a
  pre or post rule (see below). nn is a 2 digit decimal number in the range 01
  to 25. e.g.:

 * **.replace**  
  See section [Character Substitution](#character-substitution).
 	.L01 b bl br pl pr

 * `.replace`
  See section [Character Substitution](#character-substitution).

 When matching a word, firstly the 2-letter group for the two letters at
 the current position in the word (if such a group exists) is searched,
@@ -115,145 +119,144 @@ those two groups is used.

 Each rule is on separate line, and has the syntax:

 `[<pre>)] <match> [(<post>] <phoneme string>`
 	[<pre>)] <match> [(<post>] <phoneme string>

 e.g.

 eg.

 ```
 .group o
       o        0    // "o" is pronounced as [0]
       oo       u:   // but "oo" is pronounced as [u:]
    b) oo (k    U
 ```
 	.group o
 	       o        0    // "o" is pronounced as [0]
 	       oo       u:   // but "oo" is pronounced as [u:]
 	    b) oo (k    U

 "oo" is pronounced as [u:], but when also preceded by "b" and followed
 by "k", it is pronounced [U].
 `oo` is pronounced as `[u:]`, but when also preceded by `b` and followed
 by `k`, it is pronounced `[U]`.

 In the case of a single-letter group, the first character of \<match\>
 In the case of a single-letter group, the first character of `<match>`
 much be the group letter. In the case of a 2-letter group, the first two
 characters of \<match\> must be the group letters. The second and third
 characters of `<match>` must be the group letters. The second and third
 rules above may be in either .group o or .group oo

 Alphabetic characters in the \<pre\>, \<match\>, and \<post\> parts must
 Alphabetic characters in the `<pre>`, `<match>`, and `<post>` parts must
 be lower case, and matching is case-insensitive. Some upper case letters
 are used in \<pre\> and \<post\> with special meanings.
 are used in `<pre>` and `<post>` with special meanings.

 ### Special characters in \<phoneme string\>:


 * **_^_\<language code\>**  
 * `_^_<language code>`
  Translate using a different language.
 If this rule is selected when translating a word, then the translation is aborted and the word is re-translated using the specified different language. \<language code\> may be upper or lower case. This can be used to recognise certain letter combinations as being foreign words and to use the foreign pronunciation for them. eg:
 `th (_    _^_EN`

 indicates that a word which ends in "th" is translated using the English translation rules and spoken with English phonemes. 
 If this rule is selected when translating a word, then the translation is
 aborted and the word is re-translated using the specified different language.
 `<language code>` may be upper or lower case. This can be used to recognise
 certain letter combinations as being foreign words and to use the foreign
 pronunciation for them. e.g.:

 ### Special Characters in both \<pre\> and \<post\>
 	th (_    _^_EN

 indicates that a word which ends in `th` is translated using the English
 translation rules and spoken with English phonemes. 

 ### Special Characters in both \<pre\> and \<post\>

 Symbol           | Description
 -----------------| -----------
 **\_**           | Beginning or end of a word (or a hyphen)
 **-**            | Hyphen.                              
 **A**            | Any vowel (the set of vowel characters may be defined for a particular language).                
 **C**            | Any consonant.                       
 **B H F G Y**    | These may indicate other sets of characters (defined for a particular language).                           
 **L\<nn\>**      | Any of the sequence of characters defined as a letter group (see above).                              
 **D**            | Any digit.                           
 **K**            | Not a vowel (i.e. a consonant or word boundary or non-alphabetic character).                          
 **X**            | There is no vowel until the word boundary.                            
 **Z**            | A non-alphabetic character.          
 **%**            | Doubled (placed before a character in \<pre\> and after it in \<post\>. 
 **/**            | The following character is treated literally.                           

 The sets of letters indicated by A, B, C, E, F G may be defined
 | Symbol      | Description |
 |-------------|-------------|
 | `_`         | Beginning or end of a word (or a hyphen). |
 | `-`         | Hyphen. |
 | `A`         | Any vowel (the set of vowel characters may be defined for a particular language). |
 | `C`         | Any consonant. |
 | `B H F G Y` | These may indicate other sets of characters (defined for a particular language). |
 | `L<nn>`     | Any of the sequence of characters defined as a letter group (see above). |
 | `D`         | Any digit. |
 | `K`         | Not a vowel (i.e. a consonant or word boundary or non-alphabetic character). |
 | `X`         | There is no vowel until the word boundary. |
 | `Z`         | A non-alphabetic character. |
 | `%`         | Doubled (placed before a character in \<pre\> and after it in \<post\>. |
 | `/`         | The following character is treated literally. |

 The sets of letters indicated by `A`, `B`, `C`, `E`, `F` and `G` may be defined
 differently for each language.

 Examples of rules:

 ```
     _)  a         // "a" at the start of a word
         a (CC     // "a" followed by two consonants
         a (C%     // "a" followed by a double consonant (the same letter twice)
         a (/%     // "a" followed by a percent sign
     %C) a         // "a" preceded by a double consonants
 ```
 	_)  a         // "a" at the start of a word
 	    a (CC     // "a" followed by two consonants
 	    a (C%     // "a" followed by a double consonant (the same letter twice)
 	    a (/%     // "a" followed by a percent sign
 	%C) a         // "a" preceded by a double consonants

 ### Special characters only in \<pre\>:

 Symbol          | Description
 **@**           | Any syllable.   
 **&**           | A syllable which may be stressed (i.e. is not defined as unstressed).
 **V**           | Matches only if a previous word has  indicated that a verb form is expected.                            
 | Symbol | Description |
 |--------|-------------|
 | `@`    | Any syllable. |
 | `&`    | A syllable which may be stressed (i.e. is not defined as unstressed). |
 | `V`    | Matches only if a previous word has  indicated that a verb form is expected. |

 eg.
 ```
     @@)  bi      // "bi" preceded by at least two syllables
     @@a) bi      // "bi" preceded by at least 2 syllables and following 'a'
 ```
 e.g.

 	@@)  bi      // "bi" preceded by at least two syllables
 	@@a) bi      // "bi" preceded by at least 2 syllables and following 'a'

 Note, that matching characters in the \<pre\> part do not affect the
 syllable counting.

 ### Special characters only in \<post\>


 Symbol             | Description
 -------------------| -----------
 **@**              | A vowel follows somewhere in the word.                               
 **+**              | Force an increase in the score in this rule (may be repeated for more effect).                            
 **S\<number\>**    | This number of matching characters are a standard suffix, remove them and retranslate the word.           
 **P\<number\>**    | This number of matching characters are a standard prefix, remove them and retranslate the word.           
 **Lnn**            | **nn** is a 2-digit decimal number in the range 01 to 20 Matches with any of the letter sequences which have been defined for letter group **nn** 
 **N**              | Only use this rule if the word is not a retranslation after removing a suffix.                             
 **\#**             | (English specific) change the next "e" into a special character "E"    
 **$noprefix**     | Only use this rule if the word is not a retranslation after removing a prefix.                             
 **$w\_alt**       | Only use this rule if the word is $w\_alt2 found in the \*\_list file with the **$w\_alt3** **$alt**, **$alt2** or **$alt3** attribute respectively.             
 **$p\_alt**       | Only use this rule if the part-word, $p\_alt2\ up to and including the pre and $p\_alt3 match parts of this rule, is found in the \*\_list file with the **$alt**, **$alt2** or **$alt3** attribute respectively.             

 eg.
 ```
   @) ly (_S2   lI      // "ly", at end of a word with at least one other
                        //   syllable, is a suffix pronounced [lI].  Remove
                        //   it and retranslate the word.

   _) un (@P2   %Vn     // "un" at the start of a word is an unstressed
                        //   prefix pronounced [Vn]
   _) un (i     ju:     // ... except in words starting "uni"
   _) un (inP2  ,Vn     // ... but it is for words starting "unin"
 ```

 S and P must be at the end of the \<post\> string.

 S\<number\> may be followed by additional letters (eg. S2ei ). Some of
 ### Special characters only in \<post\>:

 | Symbol      | Description |
 |-------------|-------------|
 | `@`         | A vowel follows somewhere in the word. |
 | `+`         | Force an increase in the score in this rule (may be repeated for more effect). |
 | `S<number>` | This number of matching characters are a standard suffix, remove them and retranslate the word. |
 | `P<number>` | This number of matching characters are a standard prefix, remove them and retranslate the word. |
 | `Lnn`       | `nn` is a 2-digit decimal number in the range 01 to 20 Matches with any of the letter sequences which have been defined for letter group `nn` |
 | `N`         | Only use this rule if the word is not a retranslation after removing a suffix. |
 | `#`         | (English specific) change the next "e" into a special character "E" |
 | `$noprefix` | Only use this rule if the word is not a retranslation after removing a prefix. |
 | `$w_alt`    | Only use this rule if the word is `$w_alt2` found in the `*_list` file with the `$w_alt3`, `$alt`, `$alt2` or `$alt3` attribute respectively. |
 | `$p_alt`    | Only use this rule if the part-word, `$p_alt2` up to and including the pre and `$p_alt3` match parts of this rule, is found in the `*_list` file with the `$alt`, `$alt2` or `$alt3` attribute respectively. |

 e.g.

 	@) ly (_S2   lI      // "ly", at end of a word with at least one other
 	                     //   syllable, is a suffix pronounced [lI].  Remove
 	                     //   it and retranslate the word.
 	
 	_) un (@P2   %Vn     // "un" at the start of a word is an unstressed
 	                     //   prefix pronounced [Vn]
 	_) un (i     ju:     // ... except in words starting "uni"
 	_) un (inP2  ,Vn     // ... but it is for words starting "unin"

 `S` and `P` must be at the end of the \<post\> string.

 `S<number>` may be followed by additional letters (e.g. `S2ei`). Some of
 these are probably specific to English, but similar functions could be
 made for other languages.

 Symbol| Description
 ----- | -----------
 **q** | query the \_list file to find stress position or other attributes for the stem, but don't re-translate the word with the suffix removed.       
 **t** | determine the stress pattern of the word **before** adding the suffix   
 **d** | the previous letter may have been doubled when the suffix was added.  
 **e** | "e" may have been removed.          
 **i** | "y" may have been changed to "i."   
 **v** | the suffix means the verb form of pronunciation should be used.       
 **f** | the suffix means the next word is likely to be a verb.                
 **m** | after this suffix has been removed, additional suffixes may be removed. 
 | Symbol | Description |
 |--------|-------------|
 | `q`    | Query the `*_list` file to find stress position or other attributes for the stem, but don't re-translate the word with the suffix removed. |
 | `t`    | Determine the stress pattern of the word *before* adding the suffix. |
 | `d`    | The previous letter may have been doubled when the suffix was added. |
 | `e`    | `e` may have been removed. |
 | `i`    | `y` may have been changed to `i`. |
 | `v`    | The suffix means the verb form of pronunciation should be used. |
 | `f`    | The suffix means the next word is likely to be a verb. |
 | `m`    | After this suffix has been removed, additional suffixes may be removed. |

 P\<number\> may be followed by additonal letters (eg. P3v ).
 `P<number>` may be followed by additonal letters (e.g. `P3v`).

 Symbol| Description
 ----- | -----------
 **t** | determine the stress pattern of the word **before** adding the prefix  
 **v** | the suffix means the verb form of pronunciation should be used.      
 | Symbol | Description |
 |--------|-------------|
 | `t`    | Determine the stress pattern of the word **before** adding the prefix. |
 | `v`    | The suffix means the verb form of pronunciation should be used. |

 ## Pronunciation Dictionary List

 The **\<language\>\_list**  file contains a list of words whose
 pronunciations are given explicitly, rather than determined by the
 Pronunciation Rules. The **\<language\>\_extra**  file, if present, is
 also used and it's contents are taken as coming after those in
 **\<language\>\_list**.
 The `<language>_list` file contains a list of words whose pronunciations are
 given explicitly, rather than determined by the Pronunciation Rules. The
 `<language>_extra` file, if present, is also used and it's contents are taken
 as coming after those in `<language>_list`.

 Also the list can be used to specify the stress pattern, or other
 properties, of a word.
@@ -264,23 +267,18 @@ Dictionary List after the prefix or suffix has been removed.

 Lines in the dictionary list have the form:

 ```
 <word>     [<phoneme string>]     [<flags>]
 ```
 	<word>     [<phoneme string>]     [<flags>]

 e.g.

 eg.
 ```
     book      bUk
 ```
 	book      bUk

 Rather than a full pronunciation, just the stress may be given, to
 change where it would be otherwise placed by the Pronunciation Rules:

 ```
     berlin       $2      // stress on second syllable
     absolutely   $3      // stress on third syllable
     for          $u      // an unstressed word
 ```
 	berlin       $2      // stress on second syllable
 	absolutely   $3      // stress on third syllable
 	for          $u      // an unstressed word

 ### Multiple Words

@@ -289,77 +287,71 @@ appear together. Up to four words may be given, enclosed in brackets.
 This may be used for change the pronunciation or stress pattern when
 these words occur together,

 ```
    (de jure)    deI||dZ'U@rI2   // note || used as a word break in the phoneme string
 ```
 	(de jure)    deI||dZ'U@rI2   // note || used as a word break in the phoneme string

 or to run them together, pronounced as a single word
 ```
    (of a)       @v@
 ```

 	(of a)       @v@

 or to give them a flag when they occur together

 ```
    (such as)    sVtS||a2z   $pause        // precede with a pause
 ```
 	(such as)    sVtS||a2z   $pause        // precede with a pause

 Hyphenated words in the **\<language\>\_list**  file must also be
 enclosed within brackets, because the two parts are considered as
 separate words.
 Hyphenated words in the `<language>_list` file must also be enclosed within
 brackets, because the two parts are considered as separate words.

 ### Special characters in \<phoneme string\>:

 Symbol| Description
 ----- | -----------
 **\_\^\_\<language code\>**  | Translate using a different language. See explanation above.                         
 | Symbol               | Description |
 |----------------------|-------------|
 | `_^_<language code>` | Translate using a different language. See explanation above. |

 ### 3 Flags

 A word (or group of words) may be given one or more flags, either
 instead of, or as well as, the phonetic translation.

 Symbol| Description
 ----- | -----------
 $u                        | The word is unstressed. In the case of a multi-syllable word, a slight stress is applied according to the default stress rules.               
 $u1                       | The word is unstressed, with a slight stress on its 1st syllable.  
 $u2                       | The word is unstressed, with a slight stress on its 2nd syllable.  
 $u3                       | The word is unstressed, with a slight stress on its 3rd syllable.  
 $u+ $u1+ $u2+ $u3+        | As above, but the word has full stress if it's at the end of a clause.                             
 $1                        | Primary stress on the 1st syllable. 
 $2                        | Primary stress on the 2nd syllable. 
 $3                        | Primary stress on the 3rd syllable. 
 $4                        | Primary stress on the 4th syllable. 
 $5                        | Primary stress on the 5th syllable. 
 $6                        | Primary stress on the 6th syllable. 
 $7                        | Primary stress on the 7th syllable. 
 $pause                    | Ensure a short pause before this word (eg. for conjunctions such as "and", some prepositions, etc).     
 $brk                      | Ensure a very short pause before this word, shorter than $pause (eg. for some prepositions, etc).        
 $only                     | The rule does not apply if a prefix or suffix has already been removed. 
 $onlys                    | As $only, except that a standard  plural ending is allowed.           
 $stem                     | The rule only applies if a suffix has already been removed.           
 $strend                   | Word is fully stressed if it's at the end of a clause.                
 $strend2                  | As $strend, but the word is also stressed if followed only by unstressed word(s).                 
 $unstressend              | Word is unstressed if it's at the end of a clause.                    
 $atend                    | Use this pronunciation if it's at the end of a clause.                
 $double                   | Cause a doubling of the initial consonant of the following word (used for Italian).                 
 $capital                  | Use this pronunciation if the word has initial capital letter (eg. polish v Polish).                   
 $allcaps                  | Use this pronunciation if the word is all capitals.                    
 $dot                      | Ignore a . after this word even when followed by a capital letter (eg. Mr. Dr. ).                          
 $hasdot                   | Use this pronunciation if the word is followed by a dot. (This attribute also implies $dot).      
 $sentence                 | The rule only applies if the clause includes end-of-sentence (i.e. it is not terminated by a comma). For example, "$atend $sentence" means that the rule only applies at the end of a sentence.                  
 $abbrev                   | This has two meanings. If there is no phoneme string: Speak the word as individual letters, even if it contains a vowel (eg. "abc" should be spoken as "a" "b" "c"). If there is a phoneme string: This word is capitalized because it is an abbreviation and capitalization does not indicate emphasis (if the "emphasize all-caps" is on).                   
 $accent                   | Used for the pronunciation of a single alphabetic character. The character name is spoken as the base-letter name plus the accent (diacritic) name. eg. It can be used to specify that "â" is spoken as "a" "circumflex".                       
 $combine                  | This word is treated as though it is combined with the following word with a hyphen. This may be subject to fuither conditions for certain languages.                          
 $alt   $alt2   $alt3      | These are language specific. Their use should be described in the language's \*\*\_list file 
 $verb                     | Use this pronunciation if it's a verb.                               
 $noun                     | Use this pronunciation if it's a noun.                               
 $past                     | Use this pronunciation if it's past tense.                              
 $verbf                    | The following word is probably is a verb.                               
 $verbsf                   | The following word is probably is a if it has an "s" suffix.            
 $nounf                    | The following word is probably not a verb.                               
 $pastf                    | The following word is probably past tense.                              
 $verbextend               | Extend the influence of $verbf and $verbsf.                           
 | Symbol               | Description |
 |----------------------|-------------|
 | `$u`                 | The word is unstressed. In the case of a multi-syllable word, a slight stress is applied according to the default stress rules. |
 | `$u1`                | The word is unstressed, with a slight stress on its 1st syllable. |
 | `$u2`                | The word is unstressed, with a slight stress on its 2nd syllable. |
 | `$u3`                | The word is unstressed, with a slight stress on its 3rd syllable. |
 | `$u+ $u1+ $u2+ $u3+` | As above, but the word has full stress if it's at the end of a clause. |
 | `$1`                 | Primary stress on the 1st syllable. |
 | `$2`                 | Primary stress on the 2nd syllable. |
 | `$3`                 | Primary stress on the 3rd syllable. |
 | `$4`                 | Primary stress on the 4th syllable. |
 | `$5`                 | Primary stress on the 5th syllable. |
 | `$6`                 | Primary stress on the 6th syllable. |
 | `$7`                 | Primary stress on the 7th syllable. |
 | `$pause`             | Ensure a short pause before this word (eg. for conjunctions such as "and", some prepositions, etc). |
 | `$brk`               | Ensure a very short pause before this word, shorter than $pause (eg. for some prepositions, etc). |
 | `$only`              | The rule does not apply if a prefix or suffix has already been removed. |
 | `$onlys`             | As $only, except that a standard  plural ending is allowed. |
 | `$stem`              | The rule only applies if a suffix has already been removed. |
 | `$strend`            | Word is fully stressed if it's at the end of a clause. |
 | `$strend2`           | As $strend, but the word is also stressed if followed only by unstressed word(s). |
 | `$unstressend`       | Word is unstressed if it's at the end of a clause. |
 | `$atend`             | Use this pronunciation if it's at the end of a clause. |
 | `$double`            | Cause a doubling of the initial consonant of the following word (used for Italian). |
 | `$capital`           | Use this pronunciation if the word has initial capital letter (eg. polish v Polish). |
 | `$allcaps`           | Use this pronunciation if the word is all capitals. |
 | `$dot`               | Ignore a `.` after this word even when followed by a capital letter (e.g. `Mr.` and `Dr.`). |
 | `$hasdot`            | Use this pronunciation if the word is followed by a dot. (This attribute also implies `$dot`). |
 | `$sentence`          | The rule only applies if the clause includes end-of-sentence (i.e. it is not terminated by a comma). For example, `$atend $sentence` means that the rule only applies at the end of a sentence. |
 | `$abbrev`            | This has two meanings. If there is no phoneme string: Speak the word as individual letters, even if it contains a vowel (eg. "abc" should be spoken as "a" "b" "c"). If there is a phoneme string: This word is capitalized because it is an abbreviation and capitalization does not indicate emphasis (if the "emphasize all-caps" is on). |
 | `$accent`            | Used for the pronunciation of a single alphabetic character. The character name is spoken as the base-letter name plus the accent (diacritic) name. eg. It can be used to specify that "â" is spoken as "a" "circumflex". |
 | `$combine`           | This word is treated as though it is combined with the following word with a hyphen. This may be subject to further conditions for certain languages. |
 | `$alt $alt2 $alt3`   | These are language specific. Their use should be described in the language's `*_list` file. |
 | `$verb`              | Use this pronunciation if it's a verb. |
 | `$noun`              | Use this pronunciation if it's a noun. |
 | `$past`              | Use this pronunciation if it's past tense. |
 | `$verbf`             | The following word is probably is a verb. |
 | `$verbsf`            | The following word is probably is a if it has an "s" suffix. |
 | `$nounf`             | The following word is probably not a verb. |
 | `$pastf`             | The following word is probably past tense. |
 | `$verbextend`        | Extend the influence of `$verbf` and `$verbsf`. |

 The last group are probably English specific, but something similar may
 be useful in other languages. They are a crude attempt to improve the
@@ -370,113 +362,102 @@ The dictionary list is searched from bottom to top. The first match that
 satisfies any conditions is used (i.e. the one lowest down the list). So
 if we have:

 ```
    to    t@               // unstressed version
    to    tu:   $atend     // stressed version
 ```
 	to    t@               // unstressed version
 	to    tu:   $atend     // stressed version

 then if "to" is at the end of the clause, we get [tu:], if not then we
 get [t@].
 then if `to` is at the end of the clause, we get `[tu:]`, if not then we
 get `[t@]`.

 ### Translating a Word to another Word
 ### Translating a Word to Another Word

 Rather than specifying the pronunciation of a word by a phoneme string,
 you can specify another "sounds like" word.

 Use the attribute **$text** eg.
 ```
    cough    coff   $text
 ```

 Alternatively, use the command **$textmode** on a line by itself to
 turn this on for all subsequent entries in the file, until it's turned
 off by **$phonememode**. eg.
 ```
    $textmode
    cough     coff
    through   threw
    $phonememode
 ```

 This feature cannot be used for the special entries in the **\_list**
 files which start with an underscore, such as numbers.

 Currently "textmode" entries are only recognized for complete words, and
 not for for stems from which a prefix or suffix has been removed (eg.
 Use the attribute `$text` e.g.

 	cough    coff   $text

 Alternatively, use the command `$textmode` on a line by itself to turn this on
 for all subsequent entries in the file, until it's turned off by
 `$phonememode`. e.g.

 	$textmode
 	cough     coff
 	through   threw
 	$phonememode

 This feature cannot be used for the special entries in the `*_list` files which
 start with an underscore, such as numbers.

 Currently `textmode` entries are only recognized for complete words, and
 not for for stems from which a prefix or suffix has been removed (e.g.
 the word "coughs" would not match the example above).

 ## Conditional Rules

 Rules in a **\_rules** file and entries in a **\_list** file can be made
 Rules in a `*_rules` file and entries in a `*_list` file can be made
 conditional. They apply only to some voices. This can be useful to
 specify different pronunciations for different variants of a language
 (dialects or accents).

 Conditional rules have   **?**   and a condition number at the start if
 the line in the **\_rules** or **\_list** file. This means that the rule
 only applies of that condition number is specified in a **dictrules**
 Conditional rules have `?` and a condition number at the start if
 the line in the `*_rules` or `*_list` file. This means that the rule
 only applies of that condition number is specified in a `dictrules`
 line in the [voice file](voices.html).

 If the rule starts with   **?!**   then the rule only applies if the
 condition number is **not** specified in the voice file. eg.
 If the rule starts with `?!` then the rule only applies if the
 condition number is `not` specified in the voice file. e.g.

 ```
   ?3     can't     kant    // only use this if the voice has:  dictrules 3
   ?!3    rather    rA:D3   // only use if the voice doesn't have:  dictrules 3
 ```
 	?3     can't     kant    // only use this if the voice has:  dictrules 3
 	?!3    rather    rA:D3   // only use if the voice doesn't have:  dictrules 3

 ## Numbers and Character Names

 ### Letter names

 The names of individual letters can be given either in the **\_rules**
 or **\_list** file. Sometimes an individual letter is also used as a
 word in the language and its pronunciation as a word differs from its
 letter name. If so, it should be listed in the **\_list** file, preceded
 by an underscore, to give the letter name (as distinct from its
 pronunciation as a word). eg. in English:
 The names of individual letters can be given either in the `*_rules` or
 `*_list` file. Sometimes an individual letter is also used as a word in
 the language and its pronunciation as a word differs from its letter name.
 If so, it should be listed in the `*_list` file, preceded by an underscore,
 to give the letter name (as distinct from its pronunciation as a word).
 e.g. in English:

 ```
   _a   eI
 ```
 	_a   eI

 ### Numbers

 The operation the TranslateNumber() function is controlled by the
 The operation the `TranslateNumber()` function is controlled by the
 language's `langopts.numbers` option. This constructs spoken
 numbers from fragments according to various options which can be set for
 each language. The number fragments are given in the **\_list** file.

 Symbol| Description
 ----- | -----------
 \_0 to \_9      | The numbers 0 to 9                  
 \_13            | etc. Any pronunciations which are needed for specific numbers in the range \_10 to \_99                  
 \_2X  \_3X      | Twenty, thirty, etc., used to make  numbers 10 to 99                    
 \_0C            | The word for "hundred"              
 \_1C  \_2C      | Special pronunciation for one hundred, two hundred, etc., if needed.                             
 \_1C0           | Special pronunciation (if needed) for 100 exactly                     
 \_0M1           | The word for "thousand"             
 \_0M2           | The word for "million"              
 \_0M3           | The word for 1000000000             
 \_1M1  \_2M1    | Special pronunciation for one thousand, two thousand, etc, if needed                              
 \_0and          | Word for "and" when speaking numbers (eg. "two hundred and twenty").     
 \_dpt           | Word spoken for the decimnal point/comma                         
 \_dpt2          | Word spoken (if any) at the end of all the digits after a decimal point.                              
 each language. The number fragments are given in the `*_list` file.

 | Symbol        | Description |
 |---------------|-------------|
 | `_0` to `_9`  | The numbers 0 to 9. |
 | `_13` etc.    | Any pronunciations which are needed for specific numbers in the range `_10` to `_99`. |
 | `_2X` `_3X`   | Twenty, thirty, etc., used to make numbers 10 to 99. |
 | `_0C`         | The word for `hundred`. |
 | `_1C` `_2C`   | Special pronunciation for one hundred, two hundred, etc., if needed. |
 | `_1C0`        | Special pronunciation (if needed) for 100 exactly. |
 | `_0M1`        | The word for `thousand`. |
 | `_0M2`        | The word for `million`. |
 | `_0M3`        | The word for 1,000,000,000. |
 | `_1M1` `_2M1` | Special pronunciation for one thousand, two thousand, etc, if needed. |
 | `_0and`       | Word for `and` when speaking numbers (e.g. `two hundred and twenty`). |
 | `_dpt`        | Word spoken for the decimnal point/comma. |
 | `_dpt2`       | Word spoken (if any) at the end of all the digits after a decimal point. |

 ## Character Substitution

 Character substitutions can be specified by using a **.replace**section
 at the start of the **\_rules**file. Each line specified either one or
 Character substitutions can be specified by using a `.replace` section
 at the start of the `*_rules` file. Each line specified either one or
 two alphabetic characters to be replaced by another one or two
 alphabetic characters. This substitution is done to a word before it is
 translated using the spelling-to-phoneme rules. Only the lower-case
 version of the characters needs to be specified. eg.

 ```
   .replace
     ô   ő   // (Hungarian) allow the use of o-circumflex instead of o-double-accute
     û   ű
     cx   ĉ   // (Esperanto) allow "cx" as an alternative to c-circumflex
     ﬁ   fi   // replace a single character ligature by two characters
 ```
 version of the characters needs to be specified. e.g.

 	.replace
 	   ô   ő   // (Hungarian) allow the use of o-circumflex instead of o-double-accute
 	   û   ű
 	   cx  ĉ   // (Esperanto) allow "cx" as an alternative to c-circumflex
 	   ﬁ   fi  // replace a single character ligature by two characters
--- a/docs/index.md
+++ b/docs/index.md
@@ -3,7 +3,7 @@
 - [Features](#features)
 - [History](#history)
 - [Adding a Language](add_language.md)
  - [Pronunciation Dictionary](dictionary.md)
  - [Text to Phoneme Translation](dictionary.md)
 - [Voice Files](voices.md)
  - [MBROLA Voices](mbrola.md)
  - [Phonemes](phonemes.md)