| 6. ADDING OR IMPROVING A LANGUAGE {.western} | |||||
| --------------------------------- | |||||
| Most of the work doesn't need any programming knowledge, just an | |||||
| understanding of the language, an awareness of its features, patience, | |||||
| and attention to detail. Wikipedia is a good source of basic phonetic | |||||
| information, eg | |||||
| [http://en.wikipedia.org/wiki/Vowel](http://en.wikipedia.org/wiki/Vowel). | |||||
| In many cases it should be fairly easy to add a rough implementation of | |||||
| a new language, hopefully enough to be intelligible. After that it's a | |||||
| gradual process of improvement. | |||||
| ### 6.1 Language Code {.western} | |||||
| Generally, the language's international [ISO | |||||
| 639-1](http://en.wikipedia.org/wiki/ISO_639-1) code is used to identify | |||||
| the language. It is used in the filenames which contain the language's | |||||
| data. In the examples below, the code **"fr"** is used. | |||||
| Replace this with the code of your language. | |||||
| If the language does not have a 2-letter ISO\_639-1 code, then use the | |||||
| 3-letter ISO\_639-3 code. Language codes may differ from country codes. | |||||
| It is possible to have different variants of a language for different | |||||
| dialects. For example, the sounds of some phonemes may be changed, or some of | |||||
| the pronunciation rules may differ. | |||||
| ### 6.2 Language Files {.western} | |||||
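| The following files are needed for your language (shown here for the example code **"fr"**): | |||||
| - a voice file, **espeak-data/voices/fr** (see 6.3) | |||||
| - a phoneme definition file, eg. **ph\_french** (see 6.4) | |||||
| - **fr\_rules**, the spelling to phoneme rules (see 6.5) | |||||
| - **fr\_list**, the exceptions list and attributes of certain words (see 6.5) | |||||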
| The **fr\_rules** and **fr\_list** files are compiled to produce the | |||||
| file **espeak-data/fr\_dict**, which eSpeak uses when it is speaking. | |||||
| ### 6.3 Voice File {.western} | |||||
| Each language needs a voice file in **espeak-data/voices** or | |||||
| **espeak-data/voices/test**. The filename of the default voice for a | |||||
| language should be the same as the language code (eg. "fr" for French). | |||||
| Details of the contents of voice files are given in | |||||
| [voices.html](http://espeak.sf.net/voices.html). | |||||
| The simplest voice file would contain just 2 lines to give the language | |||||
| name and language code, eg: | |||||
| ~~~~ {.western} | |||||
| name french | |||||
| language fr | |||||
| ~~~~ | |||||
| This language code specifies which phoneme table and dictionary are used | |||||
| (i.e. **phonemetable fr** and **espeak-data/fr\_dict**). If | |||||
| needed, these can be overridden by **phonemes** and **dictionary** | |||||
| attributes in the voice file. For example you may want to start the | |||||
| implementation of a new language by using the phoneme table of an | |||||
| existing language. | |||||
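As a rough illustration of the voice file described above, the following Python sketch builds the text of a minimal voice file and optionally adds the **phonemes** and **dictionary** overrides. The function name and the language code "xx" are hypothetical, and the exact attribute syntax should be checked against voices.html.

```python
def minimal_voice_file(name, language, phonemes=None, dictionary=None):
    """Return the text of a small voice file (a sketch, not a full voice definition)."""
    lines = [f"name {name}", f"language {language}"]
    if phonemes is not None:
        # Override: use the phoneme table of another language.
        lines.append(f"phonemes {phonemes}")
    if dictionary is not None:
        # Override: use the dictionary of another language.
        lines.append(f"dictionary {dictionary}")
    return "\n".join(lines) + "\n"


# The two-line voice file from the text.
print(minimal_voice_file("french", "fr"), end="")

# A hypothetical new language "xx" that borrows the French phoneme table to start with.
print(minimal_voice_file("test-xx", "xx", phonemes="fr"), end="")
```

The second call shows the situation mentioned in the text: starting a new language by reusing the phoneme table of an existing one.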
| ### 6.4 Phoneme Definition File {.western} | |||||
| You must first decide on the set of phonemes (vowel and consonant | |||||
| sounds) for the language. These should be defined in a phoneme | |||||
| definition file **ph\_xxxx**, where "xxxx" is the name of your | |||||
| language. A reference to this file is then included at the end of the | |||||
| master phoneme file, **phsource/phonemes**, eg: | |||||
| ~~~~ {.western} | |||||
| phonemetable fr base | |||||
| include ph_french | |||||
| ~~~~ | |||||
| This example defines a phoneme table **"fr"** which inherits the | |||||
| contents of phoneme table **"base"**. Its contents are found in the file | |||||
| **ph\_french**. | |||||
| The **base** phoneme table contains definitions of a basic set of | |||||
| consonants, and also some "control" phonemes such as stress marks and | |||||
| pauses. These are defined in **phsource/phonemes**. The phoneme table | |||||
| for a language will inherit these, or alternatively it may inherit the | |||||
| phoneme table of another language which in turn inherits the **base** | |||||
| phoneme table. | |||||
| The phonemes file for the language defines those additional phonemes | |||||
| which are not inherited (generally the vowels and diphthongs, plus any | |||||
| additional consonants that are needed), or phonemes whose definitions | |||||
| differ from the inherited version (eg. the redefinition of a consonant). | |||||
| Details of phonemes files are given in | |||||
| [phontab.html](http://espeak.sf.net/phontab.html). | |||||
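The inheritance described above can be pictured with a small Python model. This is only a conceptual sketch: the class, the phoneme entries, and their descriptions are placeholders invented for illustration, and they do not reflect how eSpeak actually stores phoneme data.

```python
class PhonemeTable:
    """Conceptual model of a phoneme table that inherits from a parent table."""

    def __init__(self, name, parent=None, phonemes=None):
        self.name = name
        self.parent = parent
        self.phonemes = dict(phonemes or {})

    def lookup(self, mnemonic):
        """Return (defining table name, definition), searching this table first."""
        table = self
        while table is not None:
            if mnemonic in table.phonemes:
                return table.name, table.phonemes[mnemonic]
            table = table.parent
        raise KeyError(mnemonic)


# Placeholder entries: the real definitions live in phsource/phonemes and ph_french.
base = PhonemeTable("base", phonemes={"t": "basic consonant", "_": "pause"})
fr = PhonemeTable("fr", parent=base, phonemes={"a": "vowel", "t": "redefined consonant"})

print(fr.lookup("a"))   # defined only in "fr"
print(fr.lookup("_"))   # inherited from "base"
print(fr.lookup("t"))   # redefined in "fr", so the "fr" version wins
```

A language table only needs to list what is new or different; everything else falls through to the table it inherits from.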
| The **Compile phoneme data** function of the **espeakedit** program | |||||
| compiles the phonemes files of all languages to produce the files | |||||
| **espeak-data/phontab**, **phonindex**, and **phondata** which are used | |||||
| by eSpeak. | |||||
| For many languages, the consonant phonemes which are already available | |||||
| in eSpeak, together with the available vowel files which can be used to | |||||
| define vowel phonemes, will be sufficient, at least for an initial | |||||
| implementation. | |||||
| ### 6.5 Dictionary Files {.western} | |||||
| Once the language's phonemes have been defined, then pronunciation | |||||
| dictionary data can be produced in order to translate the language's | |||||
| source text into phonemes. This consists of two source files: | |||||
| **fr\_rules** (the spelling to phoneme rules) and **fr\_list** (an | |||||
| exceptions list, and attributes of certain words). The corresponding | |||||
| compiled data file is **espeak-data/fr\_dict** which is produced from | |||||
| **fr\_rules** and **fr\_list** sources by the command: | |||||
| > `espeak-ng --compile=fr`{.western} | |||||
| Alternatively, it can be produced by using the **espeakedit** program. | |||||
| Details of the contents of the dictionary files are given in | |||||
| [dictionary.html](http://espeak.sf.net/dictionary.html). | |||||
| The **fr\_list** file contains: | |||||
| - - - - | |||||
| ### 6.6 Program Code {.western} | |||||
| The behaviour of the eSpeak program is controlled by various options | |||||
| such as: | |||||
| - - - - | |||||
| The function SetTranslator() at the start of the source code file | |||||
| tr\_languages.cpp recognizes the language code and sets the appropriate | |||||
| options. For a new language, you would add its language code and the | |||||
| required options in SetTranslator(). However, this may not be necessary | |||||
| during testing because most of the options can also be set in the voice | |||||
| file in espeak-data/voices (see [Voice | |||||
| files](http://espeak.sf.net/voices.html)). | |||||
| ### 6.7 Improving a Language {.western} | |||||
| Listen carefully to the eSpeak voice. Try to identify what sounds wrong | |||||
| and what needs to be improved. | |||||
| **If you are interested in working on a language, please contact me so | |||||
| that I can set up the initial data and discuss the features of the | |||||
| language.** | |||||
| For most of the eSpeak voices, I do not speak or understand the | |||||
| language, and I do not know how it should sound. I can only make | |||||
| improvements as a result of feedback from speakers of that language. If | |||||
| you want to help to improve a language, listen carefully and try to | |||||
| identify individual errors, either in the spelling-to-phoneme | |||||
| translation, the position of stressed syllables within words, or the | |||||
| sound of phonemes, or problems with rhythm and vowel lengths. |
| ANALYSIS | |||||
| ======== | |||||
| (Further notes are needed) | |||||
| Recordings of spoken words and phrases can be analysed to try and make | |||||
| eSpeak match a language more closely. Unlike most other (larger and | |||||
| better quality) synthesizers, eSpeak's data is not produced directly | |||||
| from recorded sounds. To use an analogy, it's like a drawing or sketch | |||||
| compared with a photograph. Or vector graphics compared with a bitmap | |||||
| image. It's smaller, less accurate, with less subtlety, but it can | |||||
| sometimes show some aspects of the picture more clearly than a more | |||||
| accurate image. | |||||
| #### Recording Sounds {.western} | |||||
| Recordings should be made while speaking slowly, clearly, and firmly and | |||||
| loudly (but not shouting). Speak about half a metre from the microphone. | |||||
| Try to avoid background noise and hum interference from electrical power | |||||
| cables. | |||||
| #### Praat {.western} | |||||
| I use a modified version of the Praat program | |||||
| ([www.praat.org](http://www.praat.org)) to view and analyse both sound | |||||
| recordings and output from eSpeak. The modification adds a new function | |||||
| (`Spectrum->To_eSpeak`{.western}) which analyses a voiced sound and | |||||
| produces a file which can be loaded into espeakedit. Details of the | |||||
| modification are in the `"praat-mod"`{.western} directory in the | |||||
| espeakedit package. The analysis contains a sequence of frames, one per | |||||
| cycle at the speech's fundamental frequency. Each frame is a short-time | |||||
| spectrum, together with Praat's estimate of the F1 to F5 formant | |||||
| frequencies at the time of that cycle. I also use Praat's | |||||
| `New->Record_mono_sound`{.western} function to make sound recordings. | |||||
| ### Vowels and Diphthongs {.western} | |||||
| #### Analysing a Recording {.western} | |||||
| Make a recording, with a male voice, and trim it in Praat to keep just | |||||
| the required vowel sound. Then use the new | |||||
| `Spectrum->To_eSpeak`{.western} modification (this was named | |||||
| `To_Spectrogram2`{.western} in earlier versions) to analyse the sound. | |||||
| It produces a file named `"spectrum.dat"`{.western}. Load the | |||||
| `"spectrum.dat"`{.western} file into espeakedit. Espeakedit has two Open | |||||
| functions, `File->Open`{.western} and `File->Open2`{.western}. They are | |||||
| the same, except that they remember different paths. I generally use | |||||
| `File->Open2`{.western} for reading the `"spectrum.dat"`{.western} file. | |||||
| The data is displayed in espeakedit as a sequence of spectrum frames | |||||
| (see [editor.html](editor.html)). | |||||
| #### Tone Quality {.western} | |||||
| It can be difficult to match the tonal quality of a new vowel to be | |||||
| compatible with existing vowel files. This is determined by the relative | |||||
| heights and widths of the formant peaks. These vary depending on how the | |||||
| recording was made, the microphone, and the strength and tone of the | |||||
| voice. Also the positions of the higher peaks (F3 upwards) can vary | |||||
| depending on the characteristics of the speaker's voice. Formant peaks | |||||
| correspond to resonances within the mouth and throat, and they depend on | |||||
| its size and shape. With a female voice, all the formants (F1 upwards) | |||||
| are generally shifted to higher frequencies. For these reasons, it's | |||||
| best to use a male voice, and to use its analysed spectra only as | |||||
| guidance. Rather than construct formant-peaks entirely to match the | |||||
| analysed data, instead copy keyframes from a similar existing vowel. | |||||
| Then make small adjustments to match the position of the F1, F2, F3 | |||||
| formant peaks and hopefully produce the required vowel sound. | |||||
| #### Using an Existing Vowel File {.western} | |||||
| Choose a similar vowel file from `phsource/vowel`{.western} and open it | |||||
| in espeakedit. It may be useful to use | |||||
| `phsource/vowel/vowelchart`{.western} as a map to show how vowel files | |||||
| compare with each other. You can select a keyframe from the vowel file | |||||
| and use CTRL-C and CTRL-V to copy the green formant peaks onto a frame | |||||
| of the new spectrum sequence. Then adjust the peaks to match the new | |||||
| frame. Press F1 to hear the sound of the formant peaks in the selected | |||||
| frame. The F0 peak is provided in order to adjust the correct balance of | |||||
| low frequencies, below the F1 peak. If the sound is too muffled, or | |||||
| conversely, too "thin", try adjusting the amplitude or position of the | |||||
| F0 peak. | |||||
| #### Length and Amplitude {.western} | |||||
| Use an existing vowel file as a guide for how to set the amplitude and | |||||
| length of the keyframes. At the right of each keyframe, its length is | |||||
| shown in ms, and under that is its relative (RMS) amplitude. The second | |||||
| keyframe should be marked with a red marker (use CTRL-M to toggle this). | |||||
| This divides the vowel into the front-part (with one frame), and the | |||||
| rest. Use F2 to play the sound of the new vowel sequence. It will also | |||||
| produce a WAV file (the default name is speech.wav) which you can read | |||||
| into praat to see whether it has a sensible shape. | |||||
| #### Using the New Vowel {.western} | |||||
| Make a new directory (eg. vwl\_xx) in phsource for your new vowels. Save | |||||
| the spectrum sequence with a name which you have chosen for it. You can | |||||
| then edit the phoneme file for your language (eg. phsource/ph\_xxx), and | |||||
| change a phoneme to refer to your new vowel file. Then do | |||||
| `Data->Compile_Phoneme_Data`{.western} from espeakedit's menubar to | |||||
| re-compile the phoneme data. |
| 2.1 INSTALLATION {.western} | |||||
| ---------------- | |||||
| ### 2.1.1 Linux and other Posix systems {.western} | |||||
| There are two versions of the command line program. They both have the | |||||
| same command parameters (see below). | |||||
| 1. 2. | |||||
| Place the **espeak-ng** or **speak-ng** executable file in the command | |||||
| path, eg in **/usr/local/bin** | |||||
| Place the "**espeak-data**" directory in /usr/share as | |||||
| **/usr/share/espeak-data**.\ | |||||
| Alternatively if it is placed in the user's home directory (i.e. | |||||
| **/home/\<user\>/espeak-data**) then that will be used instead. | |||||
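The search order described above (the user's home directory first, then /usr/share) can be sketched in Python as follows. The function name and the injectable `exists` check are illustrative choices, not part of eSpeak; the real program performs this lookup internally.

```python
import os
import posixpath


def find_espeak_data(home, exists=os.path.isdir):
    """Return the espeak-data directory that would be used, or None if none is found.

    `exists` can be replaced with a stub so the search order can be checked
    without touching the file system.
    """
    # A copy in the user's home directory takes precedence.
    home_candidate = posixpath.join(home, "espeak-data")
    if exists(home_candidate):
        return home_candidate
    # Otherwise fall back to the system-wide location.
    system_candidate = "/usr/share/espeak-data"
    if exists(system_candidate):
        return system_candidate
    return None


# Example with a stub: only the system-wide directory exists.
print(find_espeak_data("/home/alice", exists=lambda p: p == "/usr/share/espeak-data"))
```

The user name "alice" is a made-up example value.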
| #### Dependencies {.western} | |||||
| **espeak-ng** uses the PortAudio sound library (version 18), so you will | |||||
| need to have the **libportaudio0** library package installed. It may be | |||||
| already, since it's used by other software, such as OpenOffice.org and | |||||
| the Audacity sound editor. | |||||
| Some Linux distributions (eg. SuSE 10) have version 19 of PortAudio | |||||
| which has a slightly different API. The speak program can be compiled to | |||||
| use version 19 of PortAudio by copying the file portaudio19.h to | |||||
| portaudio.h before compiling. | |||||
| The speak program may be compiled without using PortAudio, by removing | |||||
| the line | |||||
| ~~~~ {.western style="margin-bottom: 0.5cm"} | |||||
| #define USE_PORTAUDIO | |||||
| ~~~~ | |||||
| in the file speech.h. | |||||
| ### 2.1.2 Windows {.western} | |||||
| The installer, **setup\_espeak.exe**, installs the SAPI5 version of | |||||
| eSpeak. During installation you need to specify which voices you want to | |||||
| appear in SAPI5 voice menus. | |||||
| It also installs a command line program **espeak-ng** in the espeak-ng | |||||
| program directory. | |||||
| 2.2 COMMAND OPTIONS {.western} | |||||
| ------------------- | |||||
| ### 2.2.1 Examples {.western} | |||||
| To use at the command line, type:\ | |||||
| **espeak-ng "This is a test"**\ | |||||
| or\ | |||||
| **espeak-ng -f \<text file\>** | |||||
| Or just type\ | |||||
| **espeak-ng**\ | |||||
| followed by text on subsequent lines. Each line is spoken when RETURN | |||||
| is pressed. | |||||
| Use **espeak-ng -x** to see the corresponding phoneme codes. | |||||
| ### 2.2.2 The Command Line Options {.western} | |||||
| **espeak-ng [options] ["text words"]** | |||||
| : Text input can be taken either from a file, from a string in the | |||||
| command, or from stdin. | |||||
| **-f \<text file\>** | |||||
| : Speaks a text file. | |||||
| **--stdin** | |||||
| : Takes the text input from stdin. | |||||
| If neither -f nor --stdin is given, then the text input is taken from "text words" (a text string within double quotes). \ | |||||
| If that is not present then text is taken from stdin, but each line is treated as a separate sentence. \ | |||||
| **-a \<integer\>** | |||||
| : Sets amplitude (volume) in a range of 0 to 200. The default is 100. | |||||
| **-p \<integer\>** | |||||
| : Adjusts the pitch in a range of 0 to 99. The default is 50. | |||||
| **-s \<integer\>** | |||||
| : Sets the speed in words-per-minute (approximate values for the | |||||
| default English voice, others may differ slightly). The default | |||||
| value is 175. I generally use a faster speed of 260. The lower limit | |||||
| is 80. There is no upper limit, but about 500 is probably a | |||||
| practical maximum. | |||||
| **-b \<integer\>** | |||||
| : Input text character format. | |||||
| : 1 UTF-8. This is the default. | |||||
| : 2 The 8-bit character set which corresponds to the language (eg. | |||||
| Latin-2 for Polish). | |||||
| : 4 16 bit Unicode. | |||||
| : Without this option, eSpeak assumes text is UTF-8, but will | |||||
| automatically switch to the 8-bit character set if it finds an | |||||
| illegal UTF-8 sequence. | |||||
| **-g \<integer\>** | |||||
| : Word gap. This option inserts a pause between words. The value is | |||||
| the length of the pause, in units of 10 ms (at the default speed of | |||||
| 175 wpm). | |||||
| **-h** or **--help** | |||||
| : The first line of output gives the eSpeak version number. | |||||
| **-k \<integer\>** | |||||
| : Indicate words which begin with capital letters. | |||||
| : 1 eSpeak uses a click sound to indicate when a word starts with a | |||||
| capital letter, or double click if word is all capitals. | |||||
| : 2 eSpeak speaks the word "capital" before a word which begins with | |||||
| a capital letter. | |||||
| : Other values: eSpeak increases the pitch for words which begin | |||||
| with a capital letter. The greater the value, the greater the | |||||
| increase in pitch. Try -k20. | |||||
| **-l \<integer\>** | |||||
| : Line-break length, default value 0. If set, then lines which are | |||||
| shorter than this are treated as separate clauses and spoken | |||||
| separately with a break between them. This can be useful for some | |||||
| text files, but bad for others. | |||||
| **-m** | |||||
| : Indicates that the text contains SSML (Speech Synthesis Markup | |||||
| Language) tags or other XML tags. Those SSML tags which are | |||||
| supported are interpreted. Other tags, including HTML, are ignored, | |||||
| except that some HTML tags such as \<hr\> \<h2\> and \<li\> ensure a | |||||
| break in the speech. | |||||
| **-q** | |||||
| : Quiet. No sound is generated. This may be useful with options such | |||||
| as -x and --pho. | |||||
| **-v \<voice filename\>[+\<variant\>]** | |||||
| : Sets a Voice for the speech, usually to select a language. eg: | |||||
| ~~~~ {.western style="margin-left: 1cm; margin-bottom: 0.5cm"} | |||||
| espeak-ng -vaf | |||||
| ~~~~ | |||||
| This selects the Afrikaans voice. A modifier after the voice name can be used | |||||
| to vary the tone of the voice, eg: | |||||
| ~~~~ {.western style="margin-left: 1cm; margin-bottom: 0.5cm"} | |||||
| espeak-ng -vaf+3 | |||||
| ~~~~ | |||||
| The variants are `+m1 +m2 +m3 +m4 +m5 +m6 +m7`{.western} for male voices | |||||
| and `+f1 +f2 +f3 +f4`{.western}, which simulate female voices by using | |||||
| higher pitches. Other variants include `+croak`{.western} and | |||||
| `+whisper`{.western}. | |||||
| \<voice filename\> is a file within the `espeak-data/voices`{.western} | |||||
| directory.\ | |||||
| \<variant\> is a file within the `espeak-data/voices/!v`{.western} | |||||
| directory. | |||||
| Voice files can specify a language, alternative pronunciations or | |||||
| phoneme sets, different pitches, tonal qualities, and prosody for the | |||||
| voice. See the [voices.html](voices.html) file. | |||||
| Voice names which start with **mb-** are for use with Mbrola diphone | |||||
| voices, see [mbrola.html](mbrola.html). | |||||
| Some languages may need additional dictionary data, see | |||||
| [languages.html](languages.html). | |||||
| **-w \<wave file\>** | |||||
| Writes the speech output to a file in WAV format, rather than speaking | |||||
| it. | |||||
| **-x** | |||||
| The phoneme mnemonics, into which the input text is translated, are | |||||
| written to stdout. If a phoneme name contains more than one letter (eg. | |||||
| [tS]), the --sep or --tie option can be used to distinguish this from | |||||
| separate phonemes. | |||||
| **-X** | |||||
| As -x, but in addition, details are shown of the pronunciation rule and | |||||
| dictionary list lookup. This can be useful to see why a certain | |||||
| pronunciation is being produced. Each matching pronunciation rule is | |||||
| listed, together with its score, the highest scoring rule being used in | |||||
| the translation. "Found:" indicates the word was found in the dictionary | |||||
| lookup list, and "Flags:" means the word was found with only properties | |||||
| and not a pronunciation. You can see when a word has been retranslated | |||||
| after removing a prefix or suffix. | |||||
| **-z** | |||||
| The option removes the end-of-sentence pause which normally occurs at | |||||
| the end of the text. | |||||
| **--stdout** | |||||
| Writes the speech output to stdout as it is produced, rather than | |||||
| speaking it. The data starts with a WAV file header which indicates the | |||||
| sample rate and format of the data. The length field is set to zero | |||||
| because the length of the data is unknown when the header is produced. | |||||
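A program that reads this stream needs to cope with the zero length field. The Python sketch below parses a WAV header and treats a zero data size as "unknown". It assumes the canonical 44-byte PCM layout (RIFF chunk, "fmt " chunk, then the "data" chunk header); the header eSpeak writes may differ in detail, and the example values (22050 Hz, mono, 16 bit) are illustrative only.

```python
import struct

# Canonical 44-byte PCM header: RIFF chunk, "fmt " chunk, then the "data" chunk header.
WAV_HEADER = struct.Struct("<4sI4s4sIHHIIHH4sI")


def parse_wav_header(header):
    """Parse the first 44 bytes of a WAV stream into a small dictionary."""
    if len(header) < WAV_HEADER.size:
        raise ValueError("need at least 44 bytes")
    (riff, _riff_size, wave, fmt_id, _fmt_size, audio_format, channels,
     sample_rate, _byte_rate, _block_align, bits_per_sample,
     data_id, data_size) = WAV_HEADER.unpack(header[:WAV_HEADER.size])
    if (riff, wave, fmt_id, data_id) != (b"RIFF", b"WAVE", b"fmt ", b"data"):
        raise ValueError("not a canonical PCM WAV header")
    return {
        "audio_format": audio_format,
        "channels": channels,
        "sample_rate": sample_rate,
        "bits_per_sample": bits_per_sample,
        # A zero length means the writer did not know the length in advance.
        "data_size": data_size if data_size else None,
    }


# Illustrative header with both length fields set to zero.
example = WAV_HEADER.pack(b"RIFF", 0, b"WAVE", b"fmt ", 16, 1, 1,
                          22050, 44100, 2, 16, b"data", 0)
info = parse_wav_header(example)
print(info["sample_rate"], info["data_size"])
```

With an unknown length, the reader simply consumes samples until the stream ends.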
| **--compile [=\<voice name\>]** | |||||
| Compile the pronunciation rule and dictionary lookup data from their | |||||
| source files in the current directory. The Voice determines which | |||||
| language's files are compiled. For example, if it's an English voice, | |||||
| then *en\_rules*, *en\_list*, and *en\_extra* (if present), are compiled | |||||
| to replace *en\_dict* in the *espeak-data* directory. If no Voice is | |||||
| specified then the default Voice is used. | |||||
| **--compile-debug [=\<voice name\>]** | |||||
| The same as **--compile**, but source line numbers from the \*\_rules | |||||
| file are included. These are included in the rules trace when the **-X** | |||||
| option is used. | |||||
| **--ipa** | |||||
| Writes phonemes to stdout, using the International Phonetic Alphabet | |||||
| (IPA).\ | |||||
| If a phoneme name contains more than one letter (eg. [tS]), the --sep | |||||
| or --tie option can be used to distinguish this from separate phonemes. | |||||
| **--path [="\<directory path\>"]** | |||||
| Specifies the directory which contains the espeak-data directory. | |||||
| **--pho** | |||||
| When used with an mbrola voice (eg. -v mb-en1), it writes mbrola phoneme | |||||
| data (.pho file format) to stdout. This includes the mbrola phoneme | |||||
| names with duration and pitch information, in a form which is suitable | |||||
| as input to this mbrola voice. The --phonout option can be used to write | |||||
| this data to a file. | |||||
| **--phonout [="\<filename\>"]** | |||||
| If specified, the output from -x, -X, --ipa, and --pho options is | |||||
| written to this file, rather than to stdout. | |||||
| **--punct [="\<characters\>"]** | |||||
| Speaks the names of punctuation characters when they are encountered in | |||||
| the text. If \<characters\> are given, then only those listed | |||||
| punctuation characters are spoken, eg. `--punct=".,;?"`{.western} | |||||
| **--sep [=\<character\>]** | |||||
| The character is used to separate individual phonemes in the output | |||||
| which is produced by the -x or --ipa options. The default is a space | |||||
| character. The character z means use a ZWNJ character (U+200c). | |||||
| **--split [=\<minutes\>]** | |||||
| Used with **-w**, it starts a new WAV file every `<minutes>`{.western} | |||||
| minutes, at the next sentence boundary. | |||||
| **--tie [=\<character\>]** | |||||
| The character is used within multi-letter phonemes in the output which | |||||
| is produced by the -x or --ipa options. The default is the tie | |||||
| character ͡ U+361. The character z means use a ZWJ character (U+200d). | |||||
| **--voices [=\<language code\>]** | |||||
| Lists the available voices.\ | |||||
| If =\<language code\> is present then only those voices which are | |||||
| suitable for that language are listed.\ | |||||
| `--voices=mbrola`{.western} lists the voices which use mbrola diphone | |||||
| voices. These are not included in the default `--voices`{.western} list.\ | |||||
| `--voices=variant`{.western} lists the available voice variants (voice | |||||
| modifiers). | |||||
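When espeak-ng is driven from another program, it can help to check option values before building the command. The following Python sketch assembles an argument list from a few of the options described above and enforces the documented ranges (amplitude 0 to 200, pitch 0 to 99, speed at least 80). The function and parameter names are hypothetical, and nothing is executed; the list could be passed to a process launcher of your choice.

```python
def build_espeak_command(text, voice=None, variant=None, amplitude=None,
                         pitch=None, speed=None, wav_file=None):
    """Return an argument list for espeak-ng; nothing is executed here."""
    args = ["espeak-ng"]
    if voice is not None:
        # -v <voice filename>[+<variant>]
        args += ["-v", voice if variant is None else f"{voice}+{variant}"]
    elif variant is not None:
        raise ValueError("a variant needs a voice name")
    if amplitude is not None:
        if not 0 <= amplitude <= 200:
            raise ValueError("amplitude must be in the range 0 to 200")
        args += ["-a", str(amplitude)]
    if pitch is not None:
        if not 0 <= pitch <= 99:
            raise ValueError("pitch must be in the range 0 to 99")
        args += ["-p", str(pitch)]
    if speed is not None:
        if speed < 80:
            raise ValueError("speed has a lower limit of 80")
        args += ["-s", str(speed)]
    if wav_file is not None:
        # Write a WAV file instead of speaking.
        args += ["-w", wav_file]
    args.append(text)
    return args


print(build_espeak_command("This is a test", voice="af", variant="f2", speed=260))
```

Passing the arguments as a list avoids shell quoting problems with text that contains spaces or quotes.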
| ### 2.2.3 The Input Text {.western} | |||||
| **HTML Input** | |||||
| : If the -m option is used to indicate marked-up text, then HTML can | |||||
| be spoken directly. | |||||
| **Phoneme Input** | |||||
| : As well as plain text, phoneme mnemonics can be used in the text | |||||
| input to **espeak-ng**. They are enclosed within double square | |||||
| brackets. Spaces are used to separate words and all stressed | |||||
| syllables must be marked explicitly. | |||||
| : eg: | |||||
| `espeak-ng -v en "[[D,Is Iz sVm f@n'EtIk t'Ekst 'InpUt]]" `{.western} | |||||
| : This command will speak: "This is some phonetic text input". | |||||
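If phoneme input is generated by a script, a small helper can add the double square brackets. This Python sketch only builds strings and an argument list; the helper names are hypothetical, and the phoneme string is the one from the example above.

```python
def phoneme_text(mnemonics):
    """Wrap phoneme mnemonics in the double square brackets that mark phoneme input."""
    if "[[" in mnemonics or "]]" in mnemonics:
        raise ValueError("mnemonics must not already contain [[ or ]]")
    return f"[[{mnemonics}]]"


def phoneme_command(mnemonics, voice="en"):
    """Return an argument list which speaks the given phonemes (nothing is executed)."""
    return ["espeak-ng", "-v", voice, phoneme_text(mnemonics)]


command = phoneme_command("D,Is Iz sVm f@n'EtIk t'Ekst 'InpUt")
print(command)
```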
| 4. TEXT TO PHONEME TRANSLATION {.western} | |||||
| ------------------------------ | |||||
| ### 4.1 Translation Files {.western} | |||||
| There is a separate set of pronunciation files for each language, their | |||||
| names starting with the language name. | |||||
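| There are two separate methods for translating words into phonemes: | |||||
| - pronunciation rules, in the file ***\<language\>\_rules*** (see 4.3) | |||||
| - a lookup list of individual words, in the file ***\<language\>\_list*** | |||||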
| These two files are compiled into the file ***\<language\>\_dict*** in | |||||
| the espeak-data directory (eg. espeak-data/en\_dict) | |||||
| ### 4.2 Phoneme names {.western} | |||||
| Each of the language's phonemes is represented by a mnemonic of 1, 2, 3, | |||||
| or 4 characters. Together with a number of utility codes (eg. stress | |||||
| marks and pauses), these are defined in the phoneme data file (see | |||||
| \*spec not yet available\*). | |||||
| The utility 'phonemes' are: | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | **'** | primary stress | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | **,** | secondary stress | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | **%** | unstressed syllable | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | **= ** | put the primary stress on the | | |||||
| | | preceding syllable | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | **\_:** | short pause | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | **\_** | a shorter pause | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | **||** | indicates a word boundary within a | | |||||
| | | phoneme string | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | **|** | can be used to separate two adjacent | | |||||
| | | characters, to prevent them from | | |||||
| | | being considered as a | | |||||
| | | multi-character phoneme mnemonic | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| It is not necessary to specify the stress of every syllable. Stress | |||||
| markers are only needed in order to change the effect of the language's | |||||
| default stress rule. | |||||
| The phonemes which are used to represent a language's sounds are based | |||||
| loosely on the Kirshenbaum ascii character representation of the | |||||
| International Phonetic Alphabet | |||||
| [www.kirshenbaum.net/IPA/ascii-ipa.pdf](http://www.kirshenbaum.net/IPA/ascii-ipa.pdf) | |||||
| ### 4.3 Pronunciation Rules {.western} | |||||
| The rules in the ***\<language\>\_rules*** file specify the phonemes | |||||
| which are used to pronounce each letter, or sequence of letters. Some | |||||
| rules only apply when the letter or letters are preceded by, or followed | |||||
| by, other specified letters. | |||||
| To find the pronunciation of a word, the rules are searched and any | |||||
| which match the letters at the current position in the word are given a score depending | |||||
| on how many letters are matched. The pronunciation from the best | |||||
| matching rule is chosen. The pointer into the source word is then | |||||
| advanced past those letters which have been matched and the process is | |||||
| repeated until all the letters of the word have been processed. | |||||
| #### 4.3.1 Rule Groups {.western} | |||||
| The rules are organized in groups, each starting with a ".group" line: | |||||
| When matching a word, firstly the 2-letter group for the two letters at | |||||
| the current position in the word (if such a group exists) is searched, | |||||
| and then the single-letter group. The highest scoring rule in either of | |||||
| those two groups is used. | |||||
| #### 4.3.2 Rules {.western} | |||||
| Each rule is on a separate line, and has the syntax: | |||||
| eg. | |||||
| "oo" is pronounced as [u:], but when also preceded by "b" and followed | |||||
| by "k", it is pronounced [U]. | |||||
| In the case of a single-letter group, the first character of \<match\> | |||||
| must be the group letter. In the case of a 2-letter group, the first two | |||||
| characters of \<match\> must be the group letters. The second and third | |||||
| rules above may be in either .group o or .group oo. | |||||
| Alphabetic characters in the \<pre\>, \<match\>, and \<post\> parts must | |||||
| be lower case, and matching is case-insensitive. Some upper case letters | |||||
| are used in \<pre\> and \<post\> with special meanings. | |||||
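The search described in 4.3 and 4.3.1 can be sketched in Python. This is a deliberately simplified model: it handles only literal lower case letters in \<pre\>, \<match\>, and \<post\>, and it scores a rule by the number of letters matched, whereas eSpeak's real scoring is more detailed. The "oo" rules follow the example in the text; the other rules and their phonemes are hypothetical fillers so that whole words can be translated.

```python
from collections import namedtuple

Rule = namedtuple("Rule", "pre match post phonemes")

# Rules indexed by group name (1 or 2 letters).
GROUPS = {
    "oo": [Rule("", "oo", "", "u:"), Rule("b", "oo", "k", "U")],
    # Hypothetical single-letter rules, for illustration only.
    "o": [Rule("", "o", "", "0")],
    "b": [Rule("", "b", "", "b")],
    "k": [Rule("", "k", "", "k")],
    "m": [Rule("", "m", "", "m")],
    "n": [Rule("", "n", "", "n")],
}


def score(rule, word, pos):
    """Return the number of letters matched, or None if the rule does not apply."""
    end = pos + len(rule.match)
    if not word.startswith(rule.match, pos):
        return None
    if not word[:pos].endswith(rule.pre):
        return None
    if not word.startswith(rule.post, end):
        return None
    return len(rule.pre) + len(rule.match) + len(rule.post)


def translate(word):
    """Translate a word by repeatedly applying the best scoring rule."""
    word = word.lower()
    pos = 0
    phonemes = []
    while pos < len(word):
        candidates = []
        two_letters = word[pos:pos + 2]
        if len(two_letters) == 2:
            candidates += GROUPS.get(two_letters, [])   # 2-letter group first
        candidates += GROUPS.get(word[pos], [])          # then the single-letter group
        best, best_score = None, -1
        for rule in candidates:
            rule_score = score(rule, word, pos)
            if rule_score is not None and rule_score > best_score:
                best, best_score = rule, rule_score
        if best is None:
            raise ValueError(f"no rule matches at position {pos} of {word!r}")
        phonemes.append(best.phonemes)
        pos += len(best.match)   # advance past the letters which were matched
    return "".join(phonemes)


print(translate("moon"))
print(translate("book"))
```

In "book" the rule with both \<pre\> and \<post\> context matches more letters than the plain "oo" rule, so it scores higher and is chosen.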
| #### 4.3.3 Special characters in \<phoneme string\>: {.western} | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | **\_\^\_\<language code\> ** | Translate using a different | | |||||
| | | language. | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| #### 4.3.4 Special Characters in both \<pre\> and \<post\>: {.western} | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | **\_** | Beginning or end of a word (or a | | |||||
| | | hyphen). | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | **-** | Hyphen. | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | **A** | Any vowel (the set of vowel | | |||||
| | | characters may be defined for a | | |||||
| | | particular language). | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | **C** | Any consonant. | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | **B H F G Y ** | These may indicate other sets of | | |||||
| | | characters (defined for a particular | | |||||
| | | language). | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | **L\<nn\>** | Any of the sequence of characters | | |||||
| | | defined as a letter group (see 4.3.1 | | |||||
| | | above). | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | **D** | Any digit. | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | **K** | Not a vowel (i.e. a consonant or | | |||||
| | | word boundary or non-alphabetic | | |||||
| | | character). | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | **X** | There is no vowel until the word | | |||||
| | | boundary. | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | **Z** | A non-alphabetic character. | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | **%** | Doubled (placed before a character | | |||||
| | | in \<pre\> and after it in \<post\>). | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | **/** | The following character is treated | | |||||
| | | literally. | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| The sets of letters indicated by A, B, C, E, F, G may be defined | |||||
| differently for each language. | |||||
| Examples of rules: | |||||
| ~~~~ {.western} | |||||
| _) a // "a" at the start of a word | |||||
| a (CC // "a" followed by two consonants | |||||
| a (C% // "a" followed by a double consonant (the same letter twice) | |||||
| a (/% // "a" followed by a percent sign | |||||
| %C) a // "a" preceded by a double consonant | |||||
| ~~~~ | |||||
| #### 4.3.5 Special characters only in \<pre\>: {.western} | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | **@ ** | Any syllable. | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | **&** | A syllable which may be stressed | | |||||
| | | (i.e. is not defined as unstressed). | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | **V** | Matches only if a previous word has | | |||||
| | | indicated that a verb form is | | |||||
| | | expected. | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| eg. | |||||
| ~~~~ {.western} | |||||
| @@) bi // "bi" preceded by at least two syllables | |||||
| @@a) bi // "bi" preceded by at least 2 syllables and following 'a' | |||||
| ~~~~ | |||||
| Note that matching characters in the \<pre\> part do not affect the | |||||
| syllable counting. | |||||
| #### 4.3.6 Special characters only in \<post\>: {.western} | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | **@** | A vowel follows somewhere in the | | |||||
| | | word. | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | **+** | Force an increase in the score in | | |||||
| | | this rule (may be repeated for more | | |||||
| | | effect). | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | **S\<number\> ** | This number of matching characters | | |||||
| | | are a standard suffix, remove them | | |||||
| | | and retranslate the word. | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | **P\<number\>** | This number of matching characters | | |||||
| | | are a standard prefix, remove them | | |||||
| | | and retranslate the word. | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | **Lnn** | **nn** is a 2-digit decimal number | | |||||
| | | in the range 01 to 20\ | | |||||
| | | Matches with any of the letter | | |||||
| | | sequences which have been defined | | |||||
| | | for letter group **nn** | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | **N** | Only use this rule if the word is | | |||||
| | | not a retranslation after removing a | | |||||
| | | suffix. | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | **\#** | (English specific) change the next | | |||||
| | | "e" into a special character "E" | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | **\$noprefix** | Only use this rule if the word is | | |||||
| | | not a retranslation after removing a | | |||||
| | | prefix. | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | **\$w\_alt\ | Only use this rule if the word is | | |||||
| | \$w\_alt2\ | found in the \*\_list file with the | | |||||
| | \$w\_alt3** | **\$alt**, **\$alt2** or **\$alt3** | | |||||
| | | attribute respectively. | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | **\$p\_alt\ | Only use this rule if the part-word, | | |||||
| | \$p\_alt2\ | up to and including the pre and | | |||||
| | \$p\_alt3** | match parts of this rule, is found | | |||||
| | | in the \*\_list file with the | | |||||
| | | **\$alt**, **\$alt2** or **\$alt3** | | |||||
| | | attribute respectively. | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| eg. | |||||
| ~~~~ {.western} | |||||
| @) ly (_S2 lI // "ly", at end of a word with at least one other | |||||
| // syllable, is a suffix pronounced [lI]. Remove | |||||
| // it and retranslate the word. | |||||
| _) un (@P2 %Vn // "un" at the start of a word is an unstressed | |||||
| // prefix pronounced [Vn] | |||||
| _) un (i ju: // ... except in words starting "uni" | |||||
| _) un (inP2 ,Vn // ... but it is for words starting "unin" | |||||
| ~~~~ | |||||
| S and P must be at the end of the \<post\> string. | |||||
| S\<number\> may be followed by additional letters (eg. S2ei ). Some of | |||||
| these are probably specific to English, but similar functions could be | |||||
| made for other languages. | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | **q** | query the \_list file to find stress | | |||||
| | | position or other attributes for the | | |||||
| | | stem, but don't re-translate the | | |||||
| | | word with the suffix removed. | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | **t** | determine the stress pattern of the | | |||||
| | | word **before** adding the suffix | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | **d ** | the previous letter may have been | | |||||
| | | doubled when the suffix was added. | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | **e** | "e" may have been removed. | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | **i** | "y" may have been changed to "i". | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | **v** | the suffix means the verb form of | | |||||
| | | pronunciation should be used. | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | **f** | the suffix means the next word is | | |||||
| | | likely to be a verb. | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | **m** | after this suffix has been removed, | | |||||
| | | additional suffixes may be removed. | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
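One way to picture the **d**, **e** and **i** letters is as generating alternative stems to try once the suffix has been removed. The Python sketch below is an illustration only: `candidate_stems` is a hypothetical helper, the other letters are ignored, and how eSpeak itself chooses between the candidates is an assumption left out here.

```python
def candidate_stems(word, count, letters=""):
    """List possible stems after removing a `count`-character suffix.

    `letters` holds the optional letters after S<number>; only d, e and i
    are modelled. Hypothetical helper, not eSpeak's own code.
    """
    stem = word[:-count]
    stems = [stem]
    if "d" in letters and len(stem) >= 2 and stem[-1] == stem[-2]:
        stems.append(stem[:-1])        # undo a doubled letter
    if "e" in letters:
        stems.append(stem + "e")       # restore an "e" that was removed
    if "i" in letters and stem.endswith("i"):
        stems.append(stem[:-1] + "y")  # restore a "y" that became "i"
    return stems


print(candidate_stems("happily", 2, "i"))  # ['happi', 'happy']
print(candidate_stems("running", 3, "d"))  # ['runn', 'run']
print(candidate_stems("making", 3, "e"))   # ['mak', 'make']
```

Each candidate could then be looked up or retranslated in turn; the order used here is arbitrary.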
| P\<number\> may be followed by additional letters (eg. P3v ). | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | **t ** | determine the stress pattern of the | | |||||
| | | word **before** adding the prefix | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | **v** | the prefix means the verb form of | | |||||
| | | pronunciation should be used. | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| ### 4.4 Pronunciation Dictionary List {.western} | |||||
| The ***\<language\>\_list*** file contains a list of words whose | |||||
| pronunciations are given explicitly, rather than determined by the | |||||
| Pronunciation Rules. The ***\<language\>\_extra*** file, if present, is | |||||
| also used and its contents are taken as coming after those in | |||||
| ***\<language\>\_list***. | |||||
| Also the list can be used to specify the stress pattern, or other | |||||
| properties, of a word. | |||||
| If the Pronunciation rules are applied to a word and indicate a standard | |||||
| prefix or suffix, then the word is again looked up in Pronunciation | |||||
| Dictionary List after the prefix or suffix has been removed. | |||||
| Lines in the dictionary list have the form: \<word\> \<phoneme string\> | |||||
| eg. | |||||
| ~~~~ {.western style="margin-bottom: 0.5cm"} | |||||
| book bUk | |||||
| ~~~~ | |||||
| Rather than a full pronunciation, just the stress may be given, to | |||||
| change where it would be otherwise placed by the Pronunciation Rules: | |||||
| ~~~~ {.western} | |||||
| berlin $2 // stress on second syllable | |||||
| absolutely $3 // stress on third syllable | |||||
| for $u // an unstressed word | |||||
| ~~~~ | |||||
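As a rough illustration of the simple entries shown above, the following Python sketch splits a line into a word, an optional phoneme string and any `$` flags. The name `parse_list_line` is hypothetical and this is not eSpeak's own parser. It assumes that `//` only ever starts a comment, and it does not handle bracketed multi-word entries or conditional prefixes.

```python
def parse_list_line(line):
    """Split one simple *_list line into (word, phonemes, flags).

    Hypothetical helper for illustration; not eSpeak's own parser.
    """
    # Drop a trailing "// comment", if any.
    text = line.split("//", 1)[0].strip()
    if not text:
        return None
    fields = text.split()
    word = fields[0]
    # Tokens starting with "$" are flags; anything else is the phoneme string.
    flags = [f for f in fields[1:] if f.startswith("$")]
    phonemes = [f for f in fields[1:] if not f.startswith("$")]
    return word, (phonemes[0] if phonemes else None), flags


print(parse_list_line("book      bUk"))
print(parse_list_line("berlin    $2      // stress on second syllable"))
```

The first call yields the word with its phoneme string and no flags; the second yields no phoneme string and the single flag `$2`.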
| #### 4.4.1 Multiple Words {.western} | |||||
| A pronunciation may also be specified for a group of words, when these | |||||
| appear together. Up to four words may be given, enclosed in brackets. | |||||
| This may be used to change the pronunciation or stress pattern when | |||||
| these words occur together, | |||||
| ~~~~ {.western style="margin-bottom: 0.5cm"} | |||||
| (de jure) deI||dZ'U@rI2 // note || used as a word break in the phoneme string | |||||
| ~~~~ | |||||
| or to run them together, pronounced as a single word | |||||
| ~~~~ {.western style="margin-bottom: 0.5cm"} | |||||
| (of a) @v@ | |||||
| ~~~~ | |||||
| or to give them a flag when they occur together | |||||
| ~~~~ {.western style="margin-bottom: 0.5cm"} | |||||
| (such as) sVtS||a2z $pause // precede with a pause | |||||
| ~~~~ | |||||
| Hyphenated words in the ***\<language\>\_list*** file must also be | |||||
| enclosed within brackets, because the two parts are considered as | |||||
| separate words. | |||||
| #### 4.4.2 Special characters in \<phoneme string\>: {.western} | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | **\_\^\_\<language code\> ** | Translate using a different | | |||||
| | | language. See explanation in 4.3.3 | | |||||
| | | above. | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| #### 4.4.3 Flags {.western} | |||||
| A word (or group of words) may be given one or more flags, either | |||||
| instead of, or as well as, the phonetic translation. | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | \$u | The word is unstressed. In the case | | |||||
| | | of a multi-syllable word, a slight | | |||||
| | | stress is applied according to the | | |||||
| | | default stress rules. | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | \$u1 | The word is unstressed, with a | | |||||
| | | slight stress on its 1st syllable. | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | \$u2 | The word is unstressed, with a | | |||||
| | | slight stress on its 2nd syllable. | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | \$u3 | The word is unstressed, with a | | |||||
| | | slight stress on its 3rd syllable. | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | | | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | \$u+ \$u1+ \$u2+ \$u3+ | As above, but the word has full | | |||||
| | | stress if it's at the end of a | | |||||
| | | clause. | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | | | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | \$1 | Primary stress on the 1st syllable. | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | \$2 | Primary stress on the 2nd syllable. | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | \$3 | Primary stress on the 3rd syllable. | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | \$4 | Primary stress on the 4th syllable. | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | \$5 | Primary stress on the 5th syllable. | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | \$6 | Primary stress on the 6th syllable. | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | \$7 | Primary stress on the 7th syllable. | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | | | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | \$pause | Ensure a short pause before this | | |||||
| | | word (eg. for conjunctions such as | | |||||
| | | "and", some prepositions, etc). | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | \$brk | Ensure a very short pause before | | |||||
| | | this word, shorter than \$pause (eg. | | |||||
| | | for some prepositions, etc). | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | \$only | The rule does not apply if a prefix | | |||||
| | | or suffix has already been removed. | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | \$onlys | As \$only, except that a standard | | |||||
| | | plural ending is allowed. | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | \$stem | The rule only applies if a suffix | | |||||
| | | has already been removed. | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | \$strend | Word is fully stressed if it's at | | |||||
| | | the end of a clause. | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | \$strend2 | As \$strend, but the word is also | | |||||
| | | stressed if followed only by | | |||||
| | | unstressed word(s). | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | \$unstressend | Word is unstressed if it's at the | | |||||
| | | end of a clause. | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | \$atend | Use this pronunciation if it's at | | |||||
| | | the end of a clause. | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | \$double | Cause a doubling of the initial | | |||||
| | | consonant of the following word | | |||||
| | | (used for Italian). | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | \$capital | Use this pronunciation if the word | | |||||
| | | has initial capital letter (eg. | | |||||
| | | polish v Polish). | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | \$allcaps | Use this pronunciation if the word | | |||||
| | | is all capitals. | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | \$dot | Ignore a . after this word even when | | |||||
| | | followed by a capital letter (eg. | | |||||
| | | Mr. Dr. ). | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | \$hasdot | Use this pronunciation if the word | | |||||
| | | is followed by a dot. (This | | |||||
| | | attribute also implies \$dot). | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | \$sentence | The rule only applies if the clause | | |||||
| | | includes end-of-sentence (i.e. it is | | |||||
| | | not terminated by a comma). For | | |||||
| | | example, "\$atend \$sentence" means | | |||||
| | | that the rule only applies at the | | |||||
| | | end of a sentence. | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | \$abbrev | This has two meanings.\ | | |||||
| | | 1. If there is no phoneme string: | | |||||
| | | Speak the word as individual | | |||||
| | | letters, even if it contains a vowel | | |||||
| | | (eg. "abc" should be spoken as "a" | | |||||
| | | "b" "c").\ | | |||||
| | | 2. If there is a phoneme string: | | |||||
| | | This word is capitalized because it | | |||||
| | | is an abbreviation and | | |||||
| | | capitalization does not indicate | | |||||
| | | emphasis (if the "emphasize | | |||||
| | | all-caps" is on). | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | | | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | \$accent | Used for the pronunciation of a | | |||||
| | | single alphabetic character. The | | |||||
| | | character name is spoken as the | | |||||
| | | base-letter name plus the accent | | |||||
| | | (diacritic) name. eg. It can be used | | |||||
| | | to specify that "â" is spoken as "a" | | |||||
| | | "circumflex". | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | \$combine | This word is treated as though it is | | |||||
| | | combined with the following word | | |||||
| | | with a hyphen. This may be subject | | |||||
| | | to further conditions for certain | | |||||
| | | languages. | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | \$alt \$alt2 \$alt3 | These are language specific. Their | | |||||
| | | use should be described in the | | |||||
| | | language's \*\_list file. | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | | | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | \$verb | Use this pronunciation if it's a | | |||||
| | | verb. | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | \$noun | Use this pronunciation if it's a | | |||||
| | | noun. | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | \$past | Use this pronunciation if it's past | | |||||
| | | tense. | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | \$verbf | The following word is probably a | | |||||
| | | verb. | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | \$verbsf | The following word is probably a | | |||||
| | | verb if it has an "s" suffix. | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | \$nounf | The following word is probably not a | | |||||
| | | verb. | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | \$pastf | The following word is probably past | | |||||
| | | tense. | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | \$verbextend | Extend the influence of \$verbf and | | |||||
| | | \$verbsf. | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| The last group are probably English specific, but something similar may | |||||
| be useful in other languages. They are a crude attempt to improve the | |||||
| accuracy of pairs like ob'ject (verb) v 'object (noun) and read | |||||
| (present) v read (past). | |||||
| The dictionary list is searched from bottom to top. The first match that | |||||
| satisfies any conditions is used (i.e. the one lowest down the list). So | |||||
| if we have: | |||||
| ~~~~ {.western} | |||||
| to t@ // unstressed version | |||||
| to tu: $atend // stressed version | |||||
| ~~~~ | |||||
| then if "to" is at the end of the clause, we get [tu:], if not then we | |||||
| get [t@]. | |||||
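The bottom-to-top search can be sketched in Python as follows. This is a minimal illustration, not eSpeak's actual lookup code: the function name `lookup` and the tuple layout are assumptions, and only the `$atend` condition is modelled.

```python
def lookup(entries, word, at_clause_end):
    """Return the phoneme string for `word`, searching from the bottom up.

    `entries` is a list of (word, phonemes, flags) tuples in file order.
    Only the $atend condition is modelled in this sketch.
    """
    for entry_word, phonemes, flags in reversed(entries):
        if entry_word != word:
            continue
        if "$atend" in flags and not at_clause_end:
            continue  # condition not satisfied, keep searching upwards
        return phonemes
    return None


entries = [
    ("to", "t@", []),           # unstressed version
    ("to", "tu:", ["$atend"]),  # stressed version
]
print(lookup(entries, "to", at_clause_end=True))   # tu:
print(lookup(entries, "to", at_clause_end=False))  # t@
```

Placing the conditional entry below the unconditional one means it is tried first, and the unconditional entry acts as the fallback.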
| #### 4.4.4 Translating a Word to another Word {.western} | |||||
| Rather than specifying the pronunciation of a word by a phoneme string, | |||||
| you can specify another "sounds like" word. | |||||
| Use the attribute **\$text** eg. | |||||
| ~~~~ {.western style="margin-bottom: 0.5cm"} | |||||
| cough coff $text | |||||
| ~~~~ | |||||
| Alternatively, use the command **\$textmode** on a line by itself to | |||||
| turn this on for all subsequent entries in the file, until it's turned | |||||
| off by **\$phonememode**. eg. | |||||
| ~~~~ {.western} | |||||
| $textmode | |||||
| cough coff | |||||
| through threw | |||||
| $phonememode | |||||
| ~~~~ | |||||
| This feature cannot be used for the special entries in the **\_list** | |||||
| files which start with an underscore, such as numbers. | |||||
| Currently "textmode" entries are only recognized for complete words, and | |||||
| not for stems from which a prefix or suffix has been removed (eg. | |||||
| the word "coughs" would not match the example above). | |||||
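The switching between the two modes can be pictured with a short Python sketch. The function `read_entries` is hypothetical and only tracks whether each entry's second field is a "sounds like" word or a phoneme string. Flag-only entries are skipped, which is a simplification made for this example.

```python
def read_entries(lines):
    """Yield (word, value, kind), where kind is "text" or "phonemes".

    Illustrative sketch of $textmode / $phonememode handling only.
    """
    text_mode = False
    for line in lines:
        fields = line.split("//", 1)[0].split()
        if not fields:
            continue
        if fields[0] == "$textmode":
            text_mode = True
            continue
        if fields[0] == "$phonememode":
            text_mode = False
            continue
        if len(fields) < 2 or fields[1].startswith("$"):
            continue  # flag-only entries are ignored in this sketch
        is_text = text_mode or "$text" in fields[2:]
        yield fields[0], fields[1], "text" if is_text else "phonemes"


sample = [
    "cough coff $text",
    "$textmode",
    "through threw",
    "$phonememode",
    "book bUk",
]
for entry in read_entries(sample):
    print(entry)
```

Here "cough" and "through" are reported as text entries and "book" as a phoneme entry.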
| ### 4.5 Conditional Rules {.western} | |||||
| Rules in a **\_rules** file and entries in a **\_list** file can be made | |||||
| conditional. They apply only to some voices. This can be useful to | |||||
| specify different pronunciations for different variants of a language | |||||
| (dialects or accents). | |||||
| Conditional rules have **?** and a condition number at the start of | |||||
| the line in the **\_rules** or **\_list** file. This means that the rule | |||||
| only applies if that condition number is specified in a **dictrules** | |||||
| line in the [voice file](voices.html). | |||||
| If the rule starts with **?!** then the rule only applies if the | |||||
| condition number is **not** specified in the voice file. eg. | |||||
| ~~~~ {.western} | |||||
| ?3 can't kant // only use this if the voice has: dictrules 3 | |||||
| ?!3 rather rA:D3 // only use if the voice doesn't have: dictrules 3 | |||||
| ~~~~ | |||||
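The test described above can be written out as a small Python sketch. The function `rule_applies` is a hypothetical name, and `dictrules` stands for the set of condition numbers given on the voice file's **dictrules** line. It is an illustration of the logic, not eSpeak's implementation.

```python
def rule_applies(line, dictrules):
    """Return (applies, rest_of_line) for a possibly conditional line.

    `dictrules` is the set of condition numbers from the voice file.
    Illustrative only; not eSpeak's own code.
    """
    fields = line.split(None, 1)
    if not fields or not fields[0].startswith("?"):
        return True, line  # unconditional line
    cond = fields[0][1:]
    negated = cond.startswith("!")
    if negated:
        cond = cond[1:]
    present = int(cond) in dictrules
    rest = fields[1] if len(fields) > 1 else ""
    # "?N" applies when N is present, "?!N" applies when N is absent.
    return (present != negated), rest


print(rule_applies("?3 can't kant", {3}))      # (True, "can't kant")
print(rule_applies("?!3 rather rA:D3", {3}))   # (False, 'rather rA:D3')
```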
| ### 4.6 Numbers and Character Names {.western} | |||||
| #### 4.6.1 Letter names {.western} | |||||
| The names of individual letters can be given either in the **\_rules** | |||||
| or **\_list** file. Sometimes an individual letter is also used as a | |||||
| word in the language and its pronunciation as a word differs from its | |||||
| letter name. If so, it should be listed in the **\_list** file, preceded | |||||
| by an underscore, to give the letter name (as distinct from its | |||||
| pronunciation as a word). eg. in English: | |||||
| ~~~~ {.western style="margin-bottom: 0.5cm"} | |||||
| _a eI | |||||
| ~~~~ | |||||
| #### 4.6.2 Numbers {.western} | |||||
| The operation of the TranslateNumber() function is controlled by the | |||||
| language's `langopts.numbers`{.western} option. This constructs spoken | |||||
| numbers from fragments according to various options which can be set for | |||||
| each language. The number fragments are given in the **\_list** file. | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | \_0 to \_9 | The numbers 0 to 9 | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | \_13 | etc. Any pronunciations which are | | |||||
| | | needed for specific numbers in the | | |||||
| | | range \_10 to \_99 | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | \_2X \_3X | Twenty, thirty, etc., used to make | | |||||
| | | numbers 10 to 99 | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | \_0C | The word for "hundred" | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | \_1C \_2C | Special pronunciation for one | | |||||
| | | hundred, two hundred, etc., if | | |||||
| | | needed. | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | \_1C0 | Special pronunciation (if needed) | | |||||
| | | for 100 exactly | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | \_0M1 | The word for "thousand" | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | \_0M2 | The word for "million" | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | \_0M3 | The word for 1000000000 | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | \_1M1 \_2M1 | Special pronunciation for one | | |||||
| | | thousand, two thousand, etc, if | | |||||
| | | needed | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | \_0and | Word for "and" when speaking numbers | | |||||
| | | (eg. "two hundred and twenty"). | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | \_dpt | Word spoken for the decimal | | |||||
| | | point/comma | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | \_dpt2 | Word spoken (if any) at the end of | | |||||
| | | all the digits after a decimal | | |||||
| | | point. | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
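To show how the fragments might combine, here is a minimal Python sketch for numbers from 0 to 99. It is not the TranslateNumber() algorithm: the function name `number_fragments` is hypothetical, and the tens-before-units ordering is an assumption, since real languages differ and the ordering is controlled by `langopts.numbers`.

```python
def number_fragments(n, fragments):
    """Return the list of _list keys used to speak n (0 <= n <= 99).

    `fragments` is the set of keys defined in the _list file. The ordering
    (tens before units) is an assumption made for this sketch.
    """
    if not 0 <= n <= 99:
        raise ValueError("this sketch only handles 0 to 99")
    exact = "_%d" % n
    if n < 10 or exact in fragments:
        return [exact]            # a digit, or a specific entry such as _13
    tens, units = divmod(n, 10)
    keys = ["_%dX" % tens]        # e.g. _2X for "twenty"
    if units:
        keys.append("_%d" % units)
    return keys


defined = {"_%d" % i for i in range(10)} | {"_13", "_2X", "_3X"}
print(number_fragments(13, defined))  # ['_13']
print(number_fragments(27, defined))  # ['_2X', '_7']
print(number_fragments(30, defined))  # ['_3X']
```

A specific entry such as `_13` takes priority over building the number from a tens fragment and a units fragment.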
| ### 4.7 Character Substitution {.western} | |||||
| Character substitutions can be specified by using a **.replace** section | |||||
| at the start of the **\_rules** file. Each line specifies either one or | |||||
| two alphabetic characters to be replaced by another one or two | |||||
| alphabetic characters. This substitution is done to a word before it is | |||||
| translated using the spelling-to-phoneme rules. Only the lower-case | |||||
| version of the characters needs to be specified. eg. | |||||
| .replace\ | |||||
| ô ő // (Hungarian) allow the use of o-circumflex instead of o-double-acute\ | |||||
| û ű\ | |||||
| cx ĉ // (Esperanto) allow "cx" as an alternative to c-circumflex\ | |||||
| ﬁ fi // replace a single character ligature by two characters |
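The substitution step can be sketched in Python as below. The function `apply_replacements` is hypothetical. Trying a two-character match before a one-character match is an assumption of this sketch rather than documented eSpeak behaviour.

```python
def apply_replacements(word, table):
    """Apply one- or two-character substitutions to a lower-case word.

    `table` maps a source string (1 or 2 characters) to its replacement.
    Illustrative sketch; the matching order is an assumption.
    """
    out = []
    i = 0
    while i < len(word):
        for size in (2, 1):  # prefer the longer match
            chunk = word[i:i + size]
            if len(chunk) == size and chunk in table:
                out.append(table[chunk])
                i += size
                break
        else:
            out.append(word[i])  # no substitution for this character
            i += 1
    return "".join(out)


esperanto = {"cx": "\u0109"}   # "cx" -> c-circumflex
ligatures = {"\ufb01": "fi"}   # single-character fi ligature -> "fi"
print(apply_replacements("cxu", esperanto))
print(apply_replacements("\ufb01n", ligatures))
```

The result would then be passed to the spelling-to-phoneme rules in place of the original word.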
| ESPEAKEDIT PROGRAM {.western} | |||||
| ------------------ | |||||
| The **espeakedit** program is used to prepare phoneme data for the | |||||
| eSpeak speech synthesizer. | |||||
| It has two main functions: | |||||
| - - | |||||
| ### Installation {.western} | |||||
| **espeakedit** needs the following packages:\ | |||||
| (The package names mentioned here are those from the Ubuntu "Dapper" | |||||
| Linux distribution). | |||||
| - - - | |||||
| In addition, a modified version of **praat** | |||||
| ([www.praat.org](www.praat.org)) is used to view and analyse WAV sound | |||||
| files. This needs the package **libmotif3** to run and **libmotif-dev** | |||||
| to compile. | |||||
| ### Quick Guide {.western} | |||||
| This will quickly illustrate the main features. Details of the interface | |||||
| and key commands are given in [editor\_if.html](editor_if.html). | |||||
| For more detailed information on analysing sound recordings and | |||||
| preparing phoneme definitions and keyframe data see | |||||
| [analyse.html](analyse.html) (to be written). | |||||
| #### Compiling Phoneme Data {.western} | |||||
| 1. 2. 3. 4. | |||||
| #### Keyframe Sequences {.western} | |||||
| 1. 2. 3. 4. 5. 6. 7. | |||||
| #### Text and Prosody Windows {.western} | |||||
| 1. 2. 3. 4. 5. 6. 7. 8. 9. | |||||
| The Prosody window can be used to experiment with different phoneme | |||||
| lengths and different intonation. |
| USER INTERFACE - FORMANT EDITOR {.western} | |||||
| ------------------------------- | |||||
| ### Frame Sequence Display {.western} | |||||
| The eSpeak editor can display a number of frame sequences in tabbed | |||||
| windows. Each frame can contain a short-time frequency spectrum, | |||||
| covering the period of one cycle at the sound's pitch. Frames can also | |||||
| show: | |||||
| - - - - - | |||||
| ### Text Tab {.western} | |||||
| Enter text in the top left text window. Click the **Translate** button | |||||
| to see the phonetic transcription in the text window below. Then click | |||||
| the **Speak** button to speak the text and show the results in the | |||||
| **Prosody** tab, if that is open. | |||||
| If changes are made in the **Prosody** tab, then clicking **Speak** will | |||||
| speak the modified prosody while **Translate** will revert to the | |||||
| default prosody settings for the text. | |||||
| To enter phonetic symbols (Kirschenbaum encoding) in the top left text | |||||
| window, enclose them within [[ ]]. | |||||
| ### Spect Tab {.western} | |||||
| The "Spect" tab in the left panel of the eSpeak editor shows information | |||||
| about the currently selected frame and sequence. | |||||
| - - - - - - | |||||
| ### Key Commands {.western} | |||||
| - - - - - | |||||
| USER INTERFACE - PROSODY EDITOR {.western style="margin-left: 1cm"} | |||||
| ------------------------------- | |||||
| - |
| # eSpeak NG - Documentation | |||||
| ====================== | |||||
| ### [Usage](commands.md) | |||||
| ### [Languages](languages.md) | |||||
| ### [Voice Files](voices.md) | |||||
| Voice files specify a language and other characteristics of a voice. | |||||
| ### [Mbrola Voices](mbrola.md) | |||||
| eSpeak NG can be used as a front-end for Mbrola diphone voices. | |||||
| ### [Pronunciation Dictionary](dictionary.md) | |||||
| ### [Adding a Language](add_language.md) | |||||
| How to add or improve a language. | |||||
| ### [Phonemes](phonemes.md) | |||||
| The list of phoneme mnemonics for English, for use in the Pronunciation | |||||
| Dictionary. | |||||
| ### [Phoneme Tables](phontab.md) | |||||
| The tables of the phonemes used by each language, with their properties | |||||
| and sound production. | |||||
| ### [Intonation](intonation.md) | |||||
| Different intonation "tunes" may be defined for different languages for | |||||
| clauses which end in full-stop, comma, question-mark, and | |||||
| exclamation-mark. | |||||
| ### [eSpeak NG Library API](speak_lib.h) | |||||
| API definition and header file for a shared library version of eSpeak NG. | |||||
| ### [Markup tags](ssml.md) | |||||
| SSML (Speech Synthesis Markup Language) and HTML tags recognized by | |||||
| eSpeak NG. | |||||
| ### [The espeakedit program](editor.md) | |||||
| GUI software to edit vowel files and to compile the phoneme data for use | |||||
| by eSpeak NG. See also [Espeakedit user interface](editor_if.md). | |||||
| INTONATION {.western} | |||||
| ---------- | |||||
| In eSpeak's standard intonation model, a "tune" is applied to each | |||||
| clause depending on its punctuation. Other intonation models may be used | |||||
| for some languages, such as tone languages. | |||||
| Named tunes are defined in the text file: | |||||
| `phsource/intonation`{.western}. This file must be compiled for use by | |||||
| eSpeak by using the espeakedit program, using the menu option: | |||||
| `Compile -> Compile intonation data`{.western}. | |||||
| ### Clauses {.western} | |||||
| The tunes which are used for a language can be specified by using a | |||||
| `tunes`{.western} statement in a voice file in | |||||
| `espeak-data/voices`{.western}. eg: | |||||
| `tunes s1 c1 q1 e1`{.western} | |||||
| Its parameters are four tune names which are used for clauses which end | |||||
| in: | |||||
| 1. 2. 3. 4. | |||||
| A clause consists of the following parts: | |||||
| - - - - | |||||
| ### Tune definitions {.western} | |||||
| Here is an example tune definition from the file | |||||
| `phsource/intonation`{.western}. | |||||
| ~~~~ {.western} | |||||
| tune s1 | |||||
| prehead 46 57 | |||||
| headenv fall 16 | |||||
| head 4 80 55 -8 -5 | |||||
| headextend 0 63 38 13 0 | |||||
| nucleus fall 70 18 24 12 | |||||
| nucleus0 fall 64 8 | |||||
| endtune | |||||
| ~~~~ | |||||
| It contains: | |||||
| **tune** \<tune name\> | |||||
| : Starts the definition of a tune. The `tune name`{.western} can | |||||
| be used in a `tunes`{.western} statement in voice files. | |||||
| **endtune** \<tune name\> | |||||
| : Ends the definition of a tune. | |||||
| **prehead** \<start pitch\> \<end pitch\> | |||||
| : Gives the pitch path for any series of unstressed syllables before | |||||
| the first stressed syllable. | |||||
| **headenv** \<envelope\> \<height\> | |||||
| : Gives the pitch envelope which is used for stressed syllables in the | |||||
| head (before the nucleus), including `onset`{.western} and | |||||
| `headlast`{.western} syllables if these are specified. | |||||
| `height`{.western} gives a pitch range for the envelope. | |||||
| **head** \<steps\> \<start pitch\> \<end pitch\> \<unstressed start\> \<unstressed end\> | |||||
| : `start pitch`{.western} and `end pitch`{.western} give a pitch | |||||
| path for the stressed syllables of the head. `steps`{.western} is | |||||
| the maximum number of stressed syllables for which this applies. If | |||||
| there are additional stressed syllables, then the | |||||
| `headextend`{.western} statement is used for them. | |||||
| : `unstressed start`{.western} and `unstressed end`{.western} give | |||||
| a pitch path for unstressed syllables between two stressed | |||||
| syllables. Their values are relative to the pitch of the previous | |||||
| stressed syllable. Values are usually negative, meaning that the | |||||
| unstressed syllables have lower pitch than the previous stressed | |||||
| syllable. | |||||
| **headextend** \<percentage list\> | |||||
| : If the head contains more stressed syllables than is specified by | |||||
| `steps`{.western}, then `percentage list`{.western} is used. It | |||||
| contains up to 8 numbers which are used repeatedly for the | |||||
| additional stressed syllables. A value of 0 corresponds to the lower of | |||||
| the `start pitch`{.western} and `end pitch`{.western} values of the | |||||
| `head`{.western} statement. 100 corresponds to the higher value. | |||||
| Negative values and values greater than 100 are allowed. | |||||
| **nucleus** \<envelope\> \<top pitch\> \<bottom pitch\> \<tail start\> \<tail end\> | |||||
| : This gives the pitch envelope and pitch range of the last stressed | |||||
| syllable of the clause. `tail start`{.western} and | |||||
| `tail end`{.western} give a pitch path for the unstressed syllables | |||||
| which are after the last stressed syllable. | |||||
| **nucleus0** \<envelope\> \<top pitch\> \<bottom pitch\> | |||||
| : This is used instead of `nucleus`{.western} if there are no | |||||
| unstressed syllables after the last stressed syllable. In this case, | |||||
| the pitch changes of the nucleus and the tail are both included in | |||||
| the nucleus. | |||||
| The following attributes may also be included: | |||||
| **onset** \<pitch\> \<unstressed start\> \<unstressed end\> | |||||
| : This specifies the pitch for the first stressed syllable of the | |||||
| head. If the `onset`{.western} statement is present, then the | |||||
| `head`{.western} statement is used for the stressed syllables after the | |||||
| first. | |||||
| **headlast** \<pitch\> \<unstressed start\> \<unstressed end\> | |||||
| : This specifies the pitch for the last stressed syllable of the head | |||||
| (i.e. the stressed syllable before the nucleus). | |||||
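The `headextend` percentages described above can be sketched in Python. This is only an illustration of the stated rule (0 maps to the lower of the two `head` pitches, 100 to the higher, and the list is reused cyclically); the function name `headextend_pitches` and the linear scaling between 0 and 100 are assumptions, not eSpeak's actual code.

```python
def headextend_pitches(start_pitch, end_pitch, percentages, count):
    """Return pitches for `count` additional stressed syllables.

    Assumption: a percentage scales linearly between the lower (0) and the
    higher (100) of the two `head` pitch values. The list is reused
    cyclically when there are more syllables than percentages.
    """
    if not 1 <= len(percentages) <= 8:
        raise ValueError("headextend takes between 1 and 8 numbers")
    low = min(start_pitch, end_pitch)
    span = max(start_pitch, end_pitch) - low
    return [low + span * percentages[i % len(percentages)] / 100
            for i in range(count)]


# Values taken from the example tune s1:
#   head        4 80 55 -8 -5
#   headextend  0 63 38 13 0
pitches = headextend_pitches(80, 55, [0, 63, 38, 13, 0], 7)
print(pitches)
```

With the `s1` values, the first extra stressed syllable sits at the lower head pitch (55) and the second at 63% of the way up towards 80.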
| 3. LANGUAGES {.western} | |||||
| ------------ | |||||
| **Languages**. The eSpeak speech synthesizer supports several languages; | |||||
| however, in many cases these are initial drafts and need more work to | |||||
| improve them. Assistance from native speakers is welcome for these, or | |||||
| other new languages. Please contact me if you want to help. | |||||
| eSpeak does text to speech synthesis for the following languages, some | |||||
| better than others. Afrikaans, Albanian, Armenian, Cantonese, Catalan, | |||||
| Croatian, Czech, Danish, Dutch, English, Esperanto, Finnish, French, | |||||
| German, Greek, Hindi, Hungarian, Icelandic, Indonesian, Italian, | |||||
| Kurdish, Latvian, Lojban, Macedonian, Mandarin, Norwegian, Polish, | |||||
| Portuguese, Romanian, Russian, Serbian, Slovak, Spanish, Swahili, | |||||
| Swedish, Tamil, Turkish, Vietnamese, Welsh. | |||||
| #### Help Needed {.western} | |||||
| Many of these are just experimental attempts at these languages, | |||||
| produced after a quick reading of the corresponding article on | |||||
| wikipedia.org. They will need work or advice from native speakers to | |||||
| improve them. Please contact me if you want to advise or assist with | |||||
| these or other languages. | |||||
| The sound of some phonemes may be poorly implemented, particularly [r] | |||||
| since I'm English and therefore unable to make a "proper" [r] sound. | |||||
| A major factor is the rhythm or cadence. An Italian speaker told me the | |||||
| Italian voice improved from "difficult to understand" to "good" by | |||||
| changing the relative length of stressed syllables. Identifying | |||||
| unstressed function words in the xx\_list file is also important to make | |||||
| the speech flow well. See [Adding or Improving a | |||||
| Language](add_language.html). | |||||
| #### Character sets {.western} | |||||
| Languages recognise text either as UTF-8 or alternatively in an 8-bit | |||||
| character set which is appropriate for that language. For example, for | |||||
| Polish this is Latin2, for Russian it is KOI8-R. This choice can be | |||||
| overridden by a line in the voices file to specify an ISO 8859 character | |||||
| set, eg. for Russian the line: | |||||
| ~~~~ {.western style="margin-bottom: 0.5cm"} | |||||
| charset 5 | |||||
| ~~~~ | |||||
| will mean that ISO 8859-5 is used as the 8-bit character set rather than | |||||
| KOI8-R. | |||||
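The effect of choosing the wrong 8-bit character set can be illustrated with a short Python sketch. It does not call eSpeak; it only shows that the same bytes decode to different text under KOI8-R and ISO 8859-5. The helper names `codec_for_voice` and `decode_input` are hypothetical, and the mapping from `charset N` to a Python codec name is an assumption made for this example.

```python
def codec_for_voice(charset=None, language_default="koi8_r"):
    """Pick a Python codec name for 8-bit input.

    `charset` stands for the number on a voice file's `charset` line
    (hypothetical mapping: N -> ISO 8859-N). Without it, the language's
    default 8-bit set is used (KOI8-R for Russian in this sketch).
    """
    if charset is None:
        return language_default
    return f"iso8859_{charset}"


def decode_input(data, charset=None):
    """Decode 8-bit input bytes using the character set chosen above."""
    return data.decode(codec_for_voice(charset))


sample = "Привет"
koi8_bytes = sample.encode("koi8_r")
iso_bytes = sample.encode("iso8859_5")

print(decode_input(koi8_bytes))            # decoded with the KOI8-R default
print(decode_input(iso_bytes, charset=5))  # decoded as ISO 8859-5
print(decode_input(iso_bytes))             # wrong character set: garbled text
```

The last call shows why ISO 8859-5 input needs the `charset 5` line: the bytes are still valid KOI8-R, so nothing fails, but the resulting text is wrong.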
| In the case of a language which uses a non-Latin character set (eg. | |||||
| Greek or Russian) if the text contains a word with Latin characters then | |||||
| that particular word will be pronounced using English pronunciation | |||||
| rules and English phonemes. Speaking entirely English text using a Greek | |||||
| or Russian voice will sound OK, but each word is spoken separately so it | |||||
| won't flow properly. | |||||
| Sample texts in various languages can be found at | |||||
| [http://\<language\>.wikipedia.org](http://meta.wikimedia.org/wiki/List_of_Wikipedias) | |||||
| and [www.gutenberg.org](http://www.gutenberg.org/) | |||||
| ### 3.1 Voice Files {.western} | |||||
| A number of Voice files are provided in the | |||||
| `espeak-data/voices`{.western} directory. You can select one of these | |||||
| with the **-v \<voice filename\>** parameter to the speak command, eg: | |||||
| ~~~~ {.western style="margin-bottom: 0.5cm"} | |||||
| espeak-ng -vaf | |||||
| ~~~~ | |||||
| to speak using the Afrikaans voice. | |||||
| Language voices generally start with the 2 letter [ISO 639-1 | |||||
| code](http://en.wikipedia.org/wiki/ISO_639-1) for the language. If the | |||||
| language does not have an ISO 639-1 code, then the 3 letter [ISO 639-3 | |||||
| code](http://www.sil.org/iso639-3/codes.asp) can be used. | |||||
| For details of the voice files see [Voices](voices.html). | |||||
| #### Default Voice {.western} | |||||
| ### 3.2 English Voices {.western} | |||||
| ### 3.3 Voice Variants {.western} | |||||
| To make alternative voices for a language, you can make additional voice | |||||
| files in espeak-data/voices which contain commands to change various | |||||
| voice and pronunciation attributes. See [voices.html](voices.html). | |||||
| Alternatively there are some preset voice variants which can be applied | |||||
| to any of the language voices, by appending `+`{.western} and a variant | |||||
| name. Their effects are defined by files in | |||||
| `espeak-data/voices/!v`{.western}. | |||||
| The variants are `+m1 +m2 +m3 +m4 +m5 +m6 +m7`{.western} for male | |||||
| voices, `+f1 +f2 +f3 +f4 +f5`{.western} for female voices, and | |||||
| `+croak +whisper`{.western} for other effects. For example: | |||||
| ~~~~ {.western style="margin-bottom: 0.5cm"} | |||||
| espeak-ng -ven+m3 | |||||
| ~~~~ | |||||
| The available voice variants can be listed with: | |||||
| ~~~~ {.western style="margin-bottom: 0.5cm"} | |||||
| espeak-ng --voices=variant | |||||
| ~~~~ | |||||
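As a small illustration of the naming scheme, the following Python sketch builds the `-v` option for a language and an optional variant. The function `voice_option` and the constant `DOCUMENTED_VARIANTS` are hypothetical names for this example; the set contains only the variants named in the text above, and an installation may provide others.

```python
# Variant names listed in this section; other installations may differ.
DOCUMENTED_VARIANTS = (
    {f"m{i}" for i in range(1, 8)}      # +m1 .. +m7, male voices
    | {f"f{i}" for i in range(1, 6)}    # +f1 .. +f5, female voices
    | {"croak", "whisper"}              # other effects
)


def voice_option(language, variant=None):
    """Build the -v option, appending '+' and the variant name if given."""
    if variant is None:
        return f"-v{language}"
    return f"-v{language}+{variant}"


print(voice_option("en", "m3"))          # same form as the example above
print("m3" in DOCUMENTED_VARIANTS)
```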
| ### 3.4 Other Languages {.western} | |||||
| The eSpeak speech synthesizer does text to speech for the following | |||||
| additional languages. | |||||
| ### 3.5 Provisional Languages {.western} | |||||
| These languages are only initial naive implementations which have had | |||||
| little or no feedback and improvement from native speakers. | |||||
| ### 3.6 Mbrola Voices {.western} | |||||
| Some additional voices, whose names start with **mb-** (for example | |||||
| **mb-en1**) use eSpeak as a front-end to Mbrola diphone voices. eSpeak | |||||
| does the spelling-to-phoneme translation and intonation. See | |||||
| [mbrola.html](mbrola.html). |
| MBROLA VOICES {.western} | |||||
| ------------- | |||||
| The Mbrola project is a collection of diphone voices for speech | |||||
| synthesis. They do not include any text-to-phoneme translation, so this | |||||
| must be done by another program. The Mbrola voices are cost-free but are | |||||
| not open source. They are available from the Mbrola website at:\ | |||||
| [http://www.tcts.fpms.ac.be/synthesis/mbrola/mbrcopybin.html](http://www.tcts.fpms.ac.be/synthesis/mbrola/mbrcopybin.html) | |||||
| eSpeak can be used as a front-end to Mbrola. It provides the | |||||
| spelling-to-phoneme translation and intonation, which Mbrola then uses | |||||
| to generate speech sound. | |||||
| ### Voice Names {.western} | |||||
| To use a Mbrola voice, eSpeak needs information to translate from its | |||||
| own phonemes to the equivalent Mbrola phonemes. This has been set up for | |||||
| only some voices so far. | |||||
| The eSpeak voices which use Mbrola are named as:\ | |||||
| **mb-**xxx | |||||
| where xxx is the name of a Mbrola voice (eg. **mb-en1** for the Mbrola | |||||
| "**en1**" English voice). These voice files are in eSpeak's directory | |||||
| `espeak-data/voices/mbrola`{.western}. | |||||
| The installation instructions below use the Mbrola voice "en1" as an | |||||
| example. You can use other mbrola voices for which there is an | |||||
| equivalent eSpeak voice in `espeak-data/voices/mbrola`{.western}. | |||||
| There are some additional eSpeak Mbrola voices which speak English text | |||||
| using a Mbrola voice for a different language. These contain the name of | |||||
| the Mbrola voice with a suffix **-en**. For example, the voice | |||||
| **mb-de4-en** will speak English text with a German accent by using the | |||||
| Mbrola **de4** voice. | |||||
| ### Windows Installation {.western} | |||||
| The SAPI5 version of eSpeak uses the mbrola.dll. | |||||
| 1. 2. 3. 4. | |||||
| ### Linux Installation {.western} | |||||
| From eSpeak version 1.44 onwards, eSpeak calls the mbrola program | |||||
| directly, rather than passing phoneme data to it using a pipe. | |||||
| 1. 2. 3. | |||||
| ### Mbrola Voice Files {.western} | |||||
| eSpeak's voice files for Mbrola voices are in directory | |||||
| `espeak-data/voices/mbrola`{.western}. They contain a line:\ | |||||
| `mbrola <voice> <translation>`{.western} \ | |||||
| eg.\ | |||||
| `mbrola en1 en1_phtrans`{.western} | |||||
| - - | |||||
| They are binary files which are compiled, using espeakedit, from source | |||||
| files in `phsource/mbrola`{.western}, see below. | |||||
| ### Mbrola Phoneme Translation Data {.western} | |||||
| Mbrola phoneme translation files specify translations from eSpeak | |||||
| phoneme names to mbrola phoneme names. They are referenced from voice | |||||
| files. | |||||
| The source files are in `phsource/mbrola`{.western}. These are compiled | |||||
| using the `espeakedit`{.western} program | |||||
| (`Compile->Compile mbrola phonemes list`{.western}) to produce data | |||||
| files in `espeak-data/mbrola_ph`{.western} which are used by eSpeak. | |||||
| Each line in the mbrola phoneme translation file contains: | |||||
| `<control> <espeak ph1> <espeak ph2> <percent> <mbrola ph1> [<mbrola ph2>] `{.western} | |||||
| **\<control\>** | |||||
| - - - - | |||||
| **\<espeak ph1\>**\ | |||||
| The eSpeak phoneme which is to be translated to an mbrola phoneme. | |||||
| **\<espeak ph2\>**\ | |||||
| If this field is not `NULL`{.western}, then the match only occurs if | |||||
| this field matches the next phoneme. If control bit 1 is set, then the | |||||
| *previous* rather than the *next* phoneme is matched. This field may | |||||
| also have the following values:\ | |||||
| `VWL`{.western} matches any Vowel phoneme. | |||||
| **\<percent\>**\ | |||||
| If this field is zero then only one mbrola phoneme is used. If this | |||||
| field is non-zero, then two mbrola phonemes are used, and this value | |||||
| gives the percentage length of the first mbrola phoneme. | |||||
| **\<mbrola ph1\>**\ | |||||
| The mbrola phoneme to which the eSpeak phoneme is translated. This | |||||
| field may be `NULL`{.western}. | |||||
| **\<mbrola ph2\>**\ | |||||
| The second mbrola phoneme. This field is only used if the \<percent\> | |||||
| field is not zero. | |||||
| The list is searched from start to finish, until a match is found. | |||||
| Therefore, a line with more specific match condition should appear | |||||
| before a line which matches the same eSpeak phoneme but with a more | |||||
| general condition. | |||||
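The first-match search described above can be sketched in Python. This is a simplified model, not eSpeak's implementation: the `Rule` type, the `VOWELS` set, the example rules and the mbrola phoneme names in them are all made up for illustration, and treating "control bit 1" as the mask `0x2` is an assumption. Other control bits are ignored.

```python
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    control: int
    espeak_ph1: str
    espeak_ph2: str               # "NULL", "VWL", or a phoneme name
    percent: int
    mbrola_ph1: str
    mbrola_ph2: Optional[str] = None


VOWELS = {"a", "E", "I", "@"}     # hypothetical vowel set for this sketch
MATCH_PREVIOUS = 0x2              # assumption: "control bit 1" is mask 0x2


def context_matches(rule, prev_ph, next_ph):
    """Check the <espeak ph2> field against the neighbouring phoneme."""
    if rule.espeak_ph2 == "NULL":
        return True
    neighbour = prev_ph if rule.control & MATCH_PREVIOUS else next_ph
    if rule.espeak_ph2 == "VWL":
        return neighbour in VOWELS
    return neighbour == rule.espeak_ph2


def translate(rules, ph, prev_ph=None, next_ph=None):
    """Return [(mbrola phoneme, percent of length), ...] from the first matching rule."""
    for rule in rules:
        if rule.espeak_ph1 == ph and context_matches(rule, prev_ph, next_ph):
            if rule.percent == 0:
                return [(rule.mbrola_ph1, 100)]
            return [(rule.mbrola_ph1, rule.percent),
                    (rule.mbrola_ph2, 100 - rule.percent)]
    return None


# Made-up rules: the specific "t before a vowel" line precedes the general "t" line.
rules = [
    Rule(0, "t", "VWL", 0, "t_v"),
    Rule(0, "t", "NULL", 0, "t"),
    Rule(0, "aI", "NULL", 50, "a", "I"),
]
print(translate(rules, "t", next_ph="a"))
print(translate(rules, "t", next_ph="s"))
print(translate(rules, "aI"))
```

If the two `t` rules were swapped, the general rule would always match first and the vowel-specific rule would never be reached, which is why ordering matters.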
| The file `dictsource/dict_phonemes`{.western} lists the eSpeak phonemes | |||||
| which are used for each language. Translations for all these should be | |||||
| given in the mbrola phoneme translation file. In addition, some phonemes | |||||
| which are referenced from phoneme files (eg. | |||||
| `phsource/ph_language, phsource/phonemes`{.western}) in lines such as: | |||||
| ~~~~ {.western} | |||||
| beforenotvowel l/ | |||||
| reduceto a# 0 | |||||
| ~~~~ | |||||
| should also be included, even though they don't appear in | |||||
| `dictsource/dict_phonemes`{.western}. | |||||
| If the language's \*\_list or \*\_rules files include rules to speak | |||||
| words "as English", the mbrola phoneme translation file should include | |||||
| rules which translate English phonemes into near equivalents, so that | |||||
| they can be spoken by the mbrola voice. |
| PHONEMES {.western} | |||||
| -------- | |||||
| In general a different set of phonemes can be defined for each language. | |||||
| In most cases different languages inherit the same basic set of | |||||
| consonants. They can add to these or modify them as needed. | |||||
| The phoneme mnemonics are based on the scheme by Kirshenbaum which | |||||
| represents International Phonetic Alphabet symbols using ASCII | |||||
| characters. See: | |||||
| [www.kirshenbaum.net/IPA/ascii-ipa.pdf](http://www.kirshenbaum.net/IPA/ascii-ipa.pdf). | |||||
| Phoneme mnemonics can be used directly in the text input to | |||||
| **espeak-ng**. They are enclosed within double square brackets. Spaces | |||||
| are used to separate words, and all stressed syllables must be marked | |||||
| explicitly. eg:\ | |||||
| `[[D,Is Iz sVm f@n'EtIk t'Ekst 'InpUt]]`{.western} | |||||
| ### English Consonants {.western} | |||||
| `[p]`{.western} | |||||
| `[b]`{.western} | |||||
| `[t]`{.western} | |||||
| `[d]`{.western} | |||||
| `[tS]`{.western} | |||||
| **ch**urch | |||||
| `[dZ]`{.western} | |||||
| **j**udge | |||||
| `[k]`{.western} | |||||
| `[g]`{.western} | |||||
| `[f]`{.western} | |||||
| `[v]`{.western} | |||||
| `[T]`{.western} | |||||
| **th**in | |||||
| `[D]`{.western} | |||||
| **th**is | |||||
| `[s]`{.western} | |||||
| `[z]`{.western} | |||||
| `[S]`{.western} | |||||
| **sh**op | |||||
| `[Z]`{.western} | |||||
| plea**s**ure | |||||
| `[h]`{.western} | |||||
| `[m]`{.western} | |||||
| `[n]`{.western} | |||||
| `[N]`{.western} | |||||
| si**ng** | |||||
| `[l]`{.western} | |||||
| `[r]`{.western} | |||||
| **r**ed (Omitted if not immediately followed by a vowel). | |||||
| `[j]`{.western} | |||||
| **y**es | |||||
| `[w]`{.western} | |||||
| **Some Additional Consonants** | |||||
| `[C]`{.western} | |||||
| German i**ch** | |||||
| `[x]`{.western} | |||||
| German bu**ch** | |||||
| `[l^]`{.western} | |||||
| Italian **gl**i | |||||
| `[n^]`{.western} | |||||
| Spanish **ñ** | |||||
| ### English Vowels {.western} | |||||
| These are the phonemes which are used by the English spelling-to-phoneme | |||||
| translations (en\_rules and en\_list). In some varieties of English | |||||
| different phonemes may have the same sound, but they are kept separate | |||||
| because they may differ in another variety. | |||||
| In rhotic accents, such as General American, the phonemes | |||||
| `[3:], [A@], [e@], [i@], [O@], [U@]`{.western} include the "r" sound. | |||||
| `[@]`{.western} | |||||
| alph**a** | |||||
| schwa | |||||
| `[3]`{.western} | |||||
| bett**er** | |||||
| rhotic schwa. In British English this is the same as `[@]`{.western}, | |||||
| but it includes 'r' colouring in American and other rhotic accents. In | |||||
| these cases a separate `[r]`{.western} should not be included unless it | |||||
| is followed immediately by another vowel. | |||||
| `[3:]`{.western} | |||||
| n**ur**se | |||||
| `[@L]`{.western} | |||||
| simp**le** | |||||
| `[@2]`{.western} | |||||
| the | |||||
| Used only for "the". | |||||
| `[@5]`{.western} | |||||
| to | |||||
| Used only for "to". | |||||
| `[a]`{.western} | |||||
| tr**a**p | |||||
| `[aa]`{.western} | |||||
| b**a**th | |||||
| This is `[a]`{.western} in some accents, `[A:]`{.western} in others. | |||||
| `[a#]`{.western} | |||||
| **a**bout | |||||
| This may be `[@]`{.western} or may be a more open schwa. | |||||
| `[A:]`{.western} | |||||
| p**al**m | |||||
| `[A@]`{.western} | |||||
| st**ar**t | |||||
| `[E]`{.western} | |||||
| dr**e**ss | |||||
| `[e@]`{.western} | |||||
| squ**are** | |||||
| `[I]`{.western} | |||||
| k**i**t | |||||
| `[I2]`{.western} | |||||
| **i**ntend | |||||
| As `[I]`{.western}, but also indicates an unstressed syllable. | |||||
| `[i]`{.western} | |||||
| happ**y** | |||||
| An unstressed "i" sound at the end of a word. | |||||
| `[i:]`{.western} | |||||
| fl**ee**ce | |||||
| `[i@]`{.western} | |||||
| n**ear** | |||||
| `[0]`{.western} | |||||
| l**o**t | |||||
| `[V]`{.western} | |||||
| str**u**t | |||||
| `[u:]`{.western} | |||||
| g**oo**se | |||||
| `[U]`{.western} | |||||
| f**oo**t | |||||
| `[U@]`{.western} | |||||
| c**ure** | |||||
| `[O:]`{.western} | |||||
| th**ou**ght | |||||
| `[O@]`{.western} | |||||
| n**or**th | |||||
| `[o@]`{.western} | |||||
| f**or**ce | |||||
| `[aI]`{.western} | |||||
| pr**i**ce | |||||
| `[eI]`{.western} | |||||
| f**a**ce | |||||
| `[OI]`{.western} | |||||
| ch**oi**ce | |||||
| `[aU]`{.western} | |||||
| m**ou**th | |||||
| `[oU]`{.western} | |||||
| g**oa**t | |||||
| `[aI@]`{.western} | |||||
| sc**ie**nce | |||||
| `[aU@]`{.western} | |||||
| h**our** | |||||
| ### Some Additional Vowels {.western} | |||||
| Other languages will have their own vowel definitions, eg: | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | `[e]`{.western} | German **eh**, French **é** | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | `[o]`{.western} | German **oo**, French **o** | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | `[y]`{.western} | German **ü**, French **u** | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| | `[Y]`{.western} | German **ö**, French **oe** | | |||||
| +--------------------------------------+--------------------------------------+ | |||||
| `[:]`{.western} can be used to lengthen a vowel, eg `[e:]`{.western}. |
| PHONEME TABLES {.western} | |||||
| -------------- | |||||
| A phoneme table defines all the phonemes which are used by a language, | |||||
| together with their properties and the data for their production as | |||||
| sounds. | |||||
| Generally each language has its own phoneme table, although additional | |||||
| phoneme tables can be used for different voices within the language. | |||||
| These alternatives are referenced from Voice files. | |||||
| A phoneme table does not need to define all the phonemes used by a | |||||
| language. It can inherit the phonemes from a previously defined phoneme | |||||
| table. For example, a phoneme table may redefine (or add) some of the | |||||
| vowels that it uses, but inherit most of its consonants from a standard | |||||
| set. | |||||
| The source files for the phoneme data are in the "phsource" directory in | |||||
| the espeakedit download package. "Vowel files", which are referenced in | |||||
| FMT(), VowelStart(), and VowelEnding() instructions are made using the | |||||
| espeakedit program. | |||||
| ### Phoneme files {.western} | |||||
| The phoneme tables are defined in a master phoneme file, named | |||||
| **phonemes**. This starts with the **base** phoneme table followed by | |||||
| phoneme tables for other languages and voices. These inherit phonemes | |||||
| from the **base** table or previously defined tables. | |||||
| In addition to phoneme definitions, the phoneme file can contain the | |||||
| following: | |||||
| **include** \<filename\> | |||||
| : Includes the text of the specified file at this point. This allows | |||||
| different phoneme tables to be kept in different text files, for | |||||
| convenience. \<filename\> is a relative path. The included file can | |||||
| itself contain **include** statements. | |||||
| **phonemetable** \<name\> \<parent\> | |||||
| : Starts a new phoneme table, and ends the previous table.\ | |||||
| \<name\> Is the name of this phoneme table. This name is used in | |||||
| Voice files.\ | |||||
| \<parent\> Is the name of a previously defined phoneme table whose | |||||
| phoneme definitions are inherited by this one. The name **base** | |||||
| indicates the first (base) phoneme table. | |||||
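The inheritance between phoneme tables can be pictured with a small Python model. It is a sketch of the lookup idea only: the class `PhonemeTable`, its methods, the table name `en` and the placeholder definition strings are invented for this example and do not reflect how eSpeak stores compiled phoneme data.

```python
class PhonemeTable:
    """A phoneme table that falls back to its parent for missing phonemes."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.phonemes = {}

    def define(self, phoneme, definition):
        """Add a phoneme, or redefine one inherited from a parent table."""
        self.phonemes[phoneme] = definition

    def lookup(self, phoneme):
        """Search this table first, then each ancestor in turn."""
        table = self
        while table is not None:
            if phoneme in table.phonemes:
                return table.phonemes[phoneme]
            table = table.parent
        raise KeyError(phoneme)


base = PhonemeTable("base")
base.define("s", "base definition of s")
base.define("aI", "base definition of aI")

en = PhonemeTable("en", parent=base)      # like: phonemetable en base
en.define("aI", "en definition of aI")    # redefines an inherited vowel

print(en.lookup("aI"))   # found in the en table itself
print(en.lookup("s"))    # inherited from base
```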
| ### Phoneme definitions {.western} | |||||
| Note: These new Phoneme definitions apply to eSpeak version 1.42.20 and | |||||
| later. | |||||
| A phoneme table contains a list of phoneme definitions. Each starts with | |||||
| the keyword **phoneme** and the phoneme name (this is the name used in | |||||
| the pronunciation rules in a language's \*\_rules and \*\_list files), | |||||
| and ends with the keyword **endphoneme**. For example: | |||||
| ~~~~ {.western} | |||||
| phoneme aI | |||||
| vowel | |||||
| starttype #a endtype #i | |||||
| length 230 | |||||
| FMT(vowels/ai) | |||||
| endphoneme | |||||
| phoneme s | |||||
| vls alv frc sibilant | |||||
| voicingswitch z | |||||
| lengthmod 3 | |||||
| Vowelin f1=0 f2=1700 -300 300 f3=-100 80 | |||||
| Vowelout f1=0 f2=1700 -300 250 f3=-100 80 rms=20 | |||||
| IF nextPh(isPause) THEN | |||||
| WAV(ufric/s_) | |||||
| ELIF nextPh(p) OR nextPh(t) OR nextPh(k) THEN | |||||
| WAV(ufric/s!) | |||||
| ENDIF | |||||
| WAV(ufric/s) | |||||
| endphoneme | |||||
| ~~~~ | |||||
| A phoneme definition contains both static properties and executed | |||||
| instructions. The instructions may contain conditional statements, so | |||||
| that the effect of the phoneme may be different depending on adjacent | |||||
| phonemes, whether the syllable is stressed, etc. | |||||
| The instructions of a phoneme are interpreted in two different phases. | |||||
| In the first phase, the instructions may change the phoneme and replace | |||||
| it by a different phoneme. In the second phase, instructions are used to | |||||
| produce the sound for the phoneme. | |||||
| The **import\_phoneme** statement can be used to copy a previously | |||||
| defined phoneme from a specified phoneme table. For example: | |||||
| ~~~~ {.western} | |||||
| phoneme t | |||||
| import_phoneme base/t[ | |||||
| endphoneme | |||||
| ~~~~ | |||||
| means: `phoneme t`{.western} in this phoneme table is a copy of | |||||
| `phoneme t[`{.western} from phoneme table "base". A **length** | |||||
| instruction can be used after **import\_phoneme** to vary the length | |||||
| from the original. | |||||
| ### Phoneme Properties {.western} | |||||
| Within the phoneme definition the following lines may occur. (V) | |||||
| indicates vowels only, and (C) consonants only. | |||||
| ### Phoneme Instructions {.western} | |||||
| Phoneme Instructions may be included within conditional statements. | |||||
| During the first phase of phoneme interpretation, an instruction which | |||||
| causes a change to a different phoneme will terminate the instructions. | |||||
| During the second phase, FMT() and WAV() instructions will terminate the | |||||
| instructions. | |||||
| ### Conditional Statements {.western} | |||||
| Phoneme definitions can contain conditional statements such as: | |||||
| ~~~~ {.western} | |||||
| IF <condition> THEN | |||||
| <statements> | |||||
| ENDIF | |||||
| ~~~~ | |||||
| or more generally: | |||||
| ~~~~ {.western} | |||||
| IF <condition> THEN | |||||
| <statements> | |||||
| ELIF <condition> THEN | |||||
| <statements> | |||||
| ... | |||||
| ELSE | |||||
| <statements> | |||||
| ENDIF | |||||
| ~~~~ | |||||
| where the `ELSE`{.western} and multiple `ELIF`{.western} parts are | |||||
| optional. | |||||
| Multiple conditions may be joined with `AND`{.western} or | |||||
| `OR`{.western}, but not a mixture of `AND`{.western}s and | |||||
| `OR`{.western}s. | |||||
| A condition may be preceded by `NOT`{.western}. For example: | |||||
| ~~~~ {.western} | |||||
| IF <condition> AND NOT <condition> THEN | |||||
| <statements> | |||||
| ENDIF | |||||
| ~~~~ | |||||
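The rule that `AND` and `OR` may not be mixed within one condition can be checked mechanically. The following Python sketch is a hypothetical lint helper, not part of eSpeak or espeakedit; it assumes that the joining keywords are separated from the conditions by whitespace, as in the examples above.

```python
def joiners_are_consistent(condition):
    """Return True if the condition uses only AND or only OR, not both."""
    words = condition.split()
    return not ("AND" in words and "OR" in words)


print(joiners_are_consistent("nextPh(p) OR nextPh(t) OR nextPh(k)"))
print(joiners_are_consistent("nextPh(isPause) AND NOT nextPh(p)"))
print(joiners_are_consistent("nextPh(p) AND nextPh(t) OR nextPh(k)"))
```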
| **Condition** Can be: | |||||
| **Attributes** | |||||
| ### Sound Specifications {.western} | |||||
| There are three ways to produce sounds: | |||||
| - - - | |||||
| ### Vowel Transitions {.western} | |||||
| These specify how a consonant affects an adjacent vowel. A consonant may | |||||
| cause a transition in the vowel's formants as the mouth changes shape | |||||
| between the consonant and the vowel. The following attributes may be | |||||
| specified. Note that the maximum rate of change of formant frequencies | |||||
| is limited by the speak program. | |||||
| TEXT MARKUP {.western} | |||||
| ----------- | |||||
| ### SSML: Speech Synthesis Markup Language {.western} | |||||
| The following markup tags and attributes are recognised: | |||||
| **\<speak\>** | |||||
| - - | |||||
| **\<voice\>** | |||||
| - - - - - | |||||
| **\<prosody\>** | |||||
| - - - - | |||||
| **\<say-as\>** | |||||
| - - - - - | |||||
| **\<mark\>** name | |||||
| **\<s\>** | |||||
| - | |||||
| **\<p\>** | |||||
| - | |||||
| **\<sub\>** alias | |||||
| **\<tts:style\>** | |||||
| - - | |||||
| **\<audio\>** src | |||||
| **\<emphasis\>** | |||||
| - | |||||
| **\<break\>** | |||||
| - - | |||||
| ### HTML {.western} | |||||
| eSpeak can speak HTML text directly, or text containing both SSML and | |||||
| HTML markup.\ | |||||
| Any unrecognised tags are ignored. | |||||
| The following tags cause a sentence break.\ | |||||
| **\<br\> \<dd\> \<li\> \<img\> \<td\>** | |||||
| The following tags cause a paragraph break.\ | |||||
| **\<h1\> \<h2\> \<h3\> \<h4\> \<hr\>** | |||||
| Text between the following tags is ignored.\ | |||||
| **\<script\> ... \</script\>**\ | |||||
| **\<style\> ... \</style\>** |
| 5. VOICES {.western} | |||||
| --------- | |||||
| ### 5.1 Voice Files {.western} | |||||
| A Voice file specifies a language (and possibly a language variant or | |||||
| dialect) together with various attributes that affect the | |||||
| characteristics of the voice quality and how the language is spoken. | |||||
| Voice files are placed in the `espeak-data/voices`{.western} directory, | |||||
| or within subdirectories in there. | |||||
| The available voice files can be listed by: | |||||
| ~~~~ {.western} | |||||
| espeak-ng --voices | |||||
| or | |||||
| espeak-ng --voices=<language> | |||||
| ~~~~ | |||||
| also | |||||
| ~~~~ {.western style="margin-bottom: 0.5cm"} | |||||
| espeak-ng --voices=variant | |||||
| ~~~~ | |||||
| Lists voice variants which can be applied to eSpeak voices. | |||||
| ~~~~ {.western style="margin-bottom: 0.5cm"} | |||||
| espeak-ng --voices=mbrola | |||||
| ~~~~ | |||||
| Lists the Mbrola voices. | |||||
| ### 5.2 Contents of Voice Files {.western} | |||||
| The **language** attribute is mandatory. All the other attributes are | |||||
| optional. | |||||
| #### Identification Attributes {.western} | |||||
| **name \<name\>** | |||||
| A name given to this voice. | |||||
| **language \<language code\> [\<priority\>]** | |||||
| This attribute should appear before the other attributes which are | |||||
| listed below. | |||||
| It selects the default behaviour and characteristics for the language, | |||||
| and sets default values for "phonemes", "dictionary" and other | |||||
| attributes. The \<language code\> should be a two-letter ISO 639-1 | |||||
| language code. One or more language variant codes may be appended, | |||||
| separated by hyphens. (eg. en-uk-north). | |||||
| The optional \<priority\> value gives the preference of this voice | |||||
| compared with others for the specified language. A low value indicates a | |||||
| more preferred voice. The default value is 5. | |||||
| More than one **language** line may be present. A voice may be selected | |||||
| for other related languages (variants which have the same initial 2 | |||||
| letter language code as the specified language), but it will be less | |||||
| preferred for these. Different language variants may be specified by | |||||
| additional **language** lines in order to indicate that this is a | |||||
| preferred voice for them also. Eg. | |||||
| ~~~~ {.western} | |||||
| language en-uk-north | |||||
| language en | |||||
| ~~~~ | |||||
| indicates that this voice is for the "en-uk-north" dialect, but it is | |||||
| also a main choice when a general "en" language is specified. Without | |||||
| the second **language** line, it would be disfavoured for "en" for being | |||||
| a more specialised voice. | |||||
| **gender \<gender\> [\<age\>]** | |||||
| This attribute is only a label for use in voice selection. It doesn't | |||||
| change the sound of the voice. | |||||
| \<gender\> may be male, female, or unknown.\ | |||||
| \<age\> is optional and gives an age in years. | |||||
| **pitch \<base\> \<range\>** | |||||
| Two integer values. The first gives a base pitch to the voice (value in | |||||
| Hz). The second controls the range of pitches used by the voice. Setting | |||||
| it equal to the base pitch will give a monotone. The default values are | |||||
| 82 118. | |||||
| **formant \<number\> \<frequency\> \<strength\> \<width\> | |||||
| \<freq\_add\>** | |||||
| Systematically adjusts the frequency, strength, and width of the | |||||
| resonance peaks of the voice. Values are percentages of the default | |||||
| values. Changing these affects the tone/quality of the voice. | |||||
| **freq\_add** Adds a constant value (in Hz) to the frequency of the | |||||
| formant peak. The value may be negative. | |||||
| - - - - | |||||
| **echo \<delay\> \<amplitude\>** | |||||
| Parameter 1 gives the delay in ms (0 to 250 ms).\ | |||||
| Parameter 2 gives the echo amplitude (0 to 100).\ | |||||
| Adding some echo can give a clearer or more interesting sound, | |||||
| especially when listening through a domestic stereo sound system, rather | |||||
| than small computer speakers. | |||||
| **tone** | |||||
| Controls the tone of the sound.\ | |||||
| **tone** is followed by up to 4 pairs of \<frequency\> \<amplitude\> | |||||
| which define a frequency response graph. Frequency is in Hz and | |||||
| amplitude is in the range 0 to 255. The default is: | |||||
| `tone 600 170 1200 135 2000 110`{.western} | |||||
| This means that from frequency 0Hz to 600Hz the amplitude is 170. From | |||||
| 600Hz to 1200Hz the amplitude decreases from 170 to 135, then decreases | |||||
| to 110 at 2000Hz and remains at 110 at higher frequencies. This | |||||
| adjustment applies only to voiced sounds such as vowels and sonorant | |||||
| consonants (such as [n] and [l]). Unvoiced sounds such as [s] are | |||||
| unaffected. | |||||
| This **tone** statement can also appear in | |||||
| `espeak-data/config`{.western}, in which case it applies to all voices | |||||
| which don't have their own **tone** statement. | |||||
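The frequency response described for **tone** can be sketched in Python as a piecewise function. The function name `tone_amplitude` is hypothetical. The text only says that the amplitude "decreases" between points, so straight-line interpolation is an assumption here. The sketch illustrates the shape of the graph and is not eSpeak's implementation.

```python
DEFAULT_TONE = [(600, 170), (1200, 135), (2000, 110)]  # default from the text


def tone_amplitude(freq_hz, points=DEFAULT_TONE):
    """Amplitude of the frequency response at freq_hz.

    Assumption: amplitude changes linearly between adjacent points.
    """
    first_freq, first_amp = points[0]
    if freq_hz <= first_freq:
        return float(first_amp)  # flat from 0 Hz up to the first point
    for (f0, a0), (f1, a1) in zip(points, points[1:]):
        if freq_hz <= f1:
            # straight line between (f0, a0) and (f1, a1)
            return a0 + (a1 - a0) * (freq_hz - f0) / (f1 - f0)
    return float(points[-1][1])  # flat above the last point


for f in (300, 900, 1200, 1600, 4000):
    print(f, tone_amplitude(f))
```

Under the linear assumption, 900 Hz lies halfway between the first two points, so its amplitude is halfway between 170 and 135.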
| **flutter \<value\>** | |||||
| Default value: 2.\ | |||||
| Adds pitch fluctuations to give a wavering or older-sounding voice. A | |||||
| large value (eg. 20) makes the voice sound "croaky". | |||||
| **roughness \<value\>** | |||||
| Default value: 2. Range: 0 to 7.\ | |||||
| Reduces the amplitude of alternate waveform cycles in order to make the | |||||
| voice sound creaky. | |||||
| **voicing \<value\>** | |||||
| Default value: 100.\ | |||||
| Adjusts the strength of formant-synthesized sounds (vowels and sonorant | |||||
| consonants). | |||||
| **consonants \<value\> \<value\>** | |||||
| Default values: 100, 100.\ | |||||
| Adjusts the strength of noise sounds which are used in consonants. The | |||||
| first value is the strength of unvoiced consonants such as "s" and "t". | |||||
| The second value is the strength of the noise component of voiced | |||||
| consonants such as "z" and "d". | |||||
| **breath \<up to 8 integer values\>** | |||||
| Default values: 0.\ | |||||
| Adds noise which corresponds to the formant frequency peaks. The values | |||||
| give the strength of noise for each formant peak (formants 1 to 8). | |||||
| Use together with a low or zero value of the **voicing** attribute to | |||||
| make a "whisper". For example:\ | |||||
| `breath 75 75 60 40 15 10`{.western}\ | |||||
| `breathw 150 150 200 200 400 400`{.western}\ | |||||
| `voicing 18`{.western}\ | |||||
| `flutter 20`{.western}\ | |||||
| `formant 0 100 0 100 // remove formant 0`{.western} | |||||
| **breathw \<up to 8 integer values\>** | |||||
| These values give bandwidths of the noise peaks of the **breath** | |||||
| attribute. If **breathw** values are not given, then suitable default | |||||
| values will be used. | |||||
| **speed \<value\>** | |||||
| Default value: 100.\ | |||||
| Adjusts the speaking speed by a percentage of the default rate. This | |||||
| can be used if a language voice seems faster or slower compared to other | |||||
| voices. | |||||
| **phonemes \<name\>** | |||||
| Specifies which set of phonemes to use from those contained in the | |||||
| phontab, phonindex, and phondata data files. This is a **phonemetable** | |||||
| name as given in the "phoneme" source file. | |||||
| This parameter is usually not needed as it is set by default to the | |||||
| first two letters of the "language" parameter. However, different voices | |||||
| of the same language can use different phoneme sets, to give different | |||||
| accents. | |||||
| **dictionary \<name\>** | |||||
| Specifies which pair of dictionary files to use. eg. "en" indicates | |||||
| that *espeak-data/en\_dict* should be used to translate from words to | |||||
| phonemes. This parameter is usually not needed as it is set by default | |||||
| to the first two letters of the "language" parameter. | |||||
| **dictrules \<list of rule numbers\>** | |||||
| Gives a list of conditional dictionary rules which are applied for this | |||||
| voice. Rule numbers are in the range 0 to 31 and are specific to a | |||||
| language dictionary. They apply to rules in the language's **\_rules** | |||||
| dictionary file and also its **\_list** exceptions list. See | |||||
| [dictionary.html](dictionary.html). | |||||
| **replace \<flags\> \<phoneme\> \<replacement phoneme\>** | |||||
| Replace a phoneme by another whenever it occurs. | |||||
| \<replacement phoneme\> may be NULL. | |||||
| Flags: bit 0: replacement only occurs on the final phoneme of a word.\ | |||||
| Flags: bit 1: replacement doesn't occur in stressed syllables.\ | |||||
| eg. | |||||
| ~~~~ {.western} | |||||
| replace 0 h NULL // drops h's | |||||
| replace 0 V U // replaces vowel in 'strut' by that in 'foot' | |||||
| // as occurs in northern British English | |||||
| replace 3 N n // change 'fishing' to 'fishin' etc. | |||||
| // (only the last phoneme of a word, only in unstressed syllables) | |||||
| ~~~~ | |||||
| The phoneme mnemonics can be defined for each language, but some are | |||||
| listed in [phonemes.html](phonemes.html). | |||||
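The \<flags\> value of **replace** is a small bit field, so the value 3 in the last example sets both bit 0 and bit 1. A minimal Python sketch of decoding it follows. The constant and function names are hypothetical, and only the two bits described above are handled.

```python
FINAL_PHONEME_ONLY = 1 << 0  # bit 0: only the final phoneme of a word
UNSTRESSED_ONLY = 1 << 1     # bit 1: not in stressed syllables


def decode_replace_flags(flags):
    """Return which of the two documented restrictions a flags value sets."""
    return {
        "final_phoneme_only": bool(flags & FINAL_PHONEME_ONLY),
        "unstressed_only": bool(flags & UNSTRESSED_ONLY),
    }


for value in (0, 1, 2, 3):
    print(value, decode_replace_flags(value))
```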
| **stressLength \<8 integer values\>** | |||||
| Eight integer parameters. These control the relative lengths of the | |||||
| vowels in stressed and unstressed syllables. | |||||
| **stressAdd \<8 integer values\>** | |||||
| Eight integer parameters. These are added to the voice's corresponding | |||||
| stressLength values. They are used in the voice variant files in | |||||
| `espeak-data/voices/!v`{.western} to give some variety. Negative values | |||||
| may be used. | |||||
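The relationship between **stressAdd** and **stressLength** amounts to an element-by-element sum. The Python sketch below shows it with made-up numbers. The function name `apply_stress_add` and all eight values in each list are hypothetical and are not eSpeak defaults.

```python
def apply_stress_add(stress_length, stress_add):
    """Add each stressAdd value to the corresponding stressLength value."""
    if len(stress_length) != 8 or len(stress_add) != 8:
        raise ValueError("expected eight values in each list")
    return [base + delta for base, delta in zip(stress_length, stress_add)]


# Hypothetical values for illustration only (not eSpeak defaults).
base_lengths = [150, 140, 180, 180, 0, 0, 200, 200]
variant_add = [0, 0, -10, -10, 0, 0, 10, 20]  # negative values are allowed
print(apply_stress_add(base_lengths, variant_add))
```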
| **stressAmp \<8 integer values\>** | |||||
| Eight integer parameters. These control the relative amplitudes of the | |||||
| vowels in stressed and unstressed syllables (see stressLength above). | |||||
| The general default values are: 16, 16, 20, 20, 20, 24, 24, 22, although | |||||
| these defaults may be different for particular languages. | |||||
| **intonation \<param1\>** | |||||
| **charset \<param1\>** | |||||
| The ISO 8859 character set number (not all are implemented). | |||||
| **dictmin \<value\>** | |||||
| Used for some languages to detect if additional language data is | |||||
| installed. If the size of the compiled dictionary data for the language | |||||
| (the file `espeak-data/*_dict`{.western}) is less than this size then a | |||||
| warning is given. | |||||
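The **dictmin** check is a comparison between a file size and a threshold. The Python sketch below shows the idea. The function name `dictionary_too_small` is hypothetical, and the unit of **dictmin** is assumed to be bytes because the text does not state it. The demo uses a temporary file, not a real compiled dictionary.

```python
import os
import tempfile


def dictionary_too_small(dict_path, dictmin):
    """Return True if the file at dict_path is smaller than dictmin.

    Assumption: dictmin is measured in bytes.
    """
    return os.path.getsize(dict_path) < dictmin


# Demo with a 10-byte temporary file standing in for a compiled dictionary.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 10)
demo_path = tmp.name
if dictionary_too_small(demo_path, 100):
    print("warning: dictionary data looks incomplete")
os.remove(demo_path)
```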
| **alphabet2 \<alphabet\> \<language\>** | |||||
| Used to specify a language to be used to speak words which are written | |||||
| in a non-native alphabet. eg: | |||||
| ~~~~ {.western style="margin-bottom: 0.5cm"} | |||||
| alphabet2 cyr ru | |||||
| ~~~~ | |||||
| Alphabet names include: latin, cyr (cyrillic), ar (arabic). The default | |||||
| language for the latin alphabet is English. | |||||
| **dictdialect \<dialect\>** | |||||
| Words can be marked in the \*\_list or \*\_rules file to be spoken using | |||||
| a foreign voice. This **dictdialect** attribute can be used to specify | |||||
| which dialect of the foreign language should be used, instead of the | |||||
| default dialect. The currently available dialects are:\ | |||||
| **en-us** (US English)\ | |||||
| **es-la** (Latin American Spanish).\ | |||||
| eg. | |||||
| ~~~~ {.western style="margin-bottom: 0.5cm"} | |||||
| dictdialect en-us | |||||
| ~~~~ | |||||
| This means that any words or rules which are marked with \_\^\_EN will be | |||||
| spoken with the US English voice instead of the default UK English | |||||
| voice. | |||||
| Additional attributes are available to set various internal options | |||||
| which control how language is processed. These would normally be set in | |||||
| the program code rather than in a voice file. | |||||
| A number of voice files are provided in the | |||||
| `espeak-data/voices`{.western} directory. You can select one of these | |||||
| with the **-v \<voice filename\>** parameter to the speak command. | |||||
| **default** | |||||
| This voice is used if none is specified in the speak command. You can | |||||
| copy your preferred voice to "default" so you can use the speak command | |||||
| without the need to specify a voice. | |||||
| For a list of voices provided for English and other languages see | |||||
| [Languages](languages.html). |