| @@ -0,0 +1,157 @@ | |||
| 6. ADDING OR IMPROVING A LANGUAGE {.western} | |||
| --------------------------------- | |||
| Most of the work doesn't need any programming knowledge, just an | |||
| understanding of the language, an awareness of its features, patience, | |||
| and attention to detail. Wikipedia is a good source of basic phonetic | |||
| information, eg. | |||
| [http://en.wikipedia.org/wiki/Vowel](http://en.wikipedia.org/wiki/Vowel). | |||
| In many cases it should be fairly easy to add a rough implementation of | |||
| a new language, hopefully enough to be intelligible. After that it's a | |||
| gradual process of improvement. | |||
| ### 6.1 Language Code {.western} | |||
| Generally, the language's international [ISO | |||
| 639-1](http://en.wikipedia.org/wiki/ISO_639-1) code is used to identify | |||
| the language. It is used in the filenames which contain the language's | |||
| data. In the examples below, the code **"fr"** is used; | |||
| replace it with the code of your language. | |||
| If the language does not have a 2-letter ISO\_639-1 code, then use the | |||
| 3-letter ISO\_639-3 code. Language codes may differ from country codes. | |||
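The choice of code, and the file names that follow from it, can be sketched in a few lines of Python. This is only an illustration of the naming convention described in this section; the function names are hypothetical and are not part of eSpeak.

```python
from typing import Optional


def choose_language_code(iso_639_1: Optional[str], iso_639_3: str) -> str:
    """Prefer the 2-letter ISO 639-1 code; fall back to the 3-letter ISO 639-3 code."""
    if iso_639_1:
        return iso_639_1.lower()
    return iso_639_3.lower()


def data_file_names(code: str) -> dict:
    """File names derived from the language code (hypothetical helper)."""
    return {
        "rules": f"{code}_rules",
        "list": f"{code}_list",
        "dict": f"espeak-data/{code}_dict",
    }


# French has a 2-letter code, so "fr" is used.
print(choose_language_code("fr", "fra"))
# A language with no 2-letter code falls back to its 3-letter code.
print(choose_language_code(None, "grc"))
print(data_file_names("fr"))
```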
| It is possible to have different variants of a language for different | |||
| dialects. For example, the sound of some phonemes may change, or some of | |||
| the pronunciation rules may differ. | |||
| ### 6.2 Language Files {.western} | |||
| The following files are needed for your language. | |||
| - - - - | |||
| The **fr\_rules** and **fr\_list** files are compiled to produce the | |||
| file **espeak-data/fr\_dict**, which eSpeak uses when it is speaking. | |||
| ### 6.3 Voice File {.western} | |||
| Each language needs a voice file in **espeak-data/voices** or | |||
| **espeak-data/voices/test**. The filename of the default voice for a | |||
| language should be the same as the language code (eg. "fr" for French). | |||
| Details of the contents of voice files are given in | |||
| [voices.html](http://espeak.sf.net/voices.html). | |||
| The simplest voice file would contain just 2 lines to give the language | |||
| name and language code, eg: | |||
| ~~~~ {.western} | |||
| name french | |||
| language fr | |||
| ~~~~ | |||
| This language code specifies which phoneme table and dictionary to use | |||
| (i.e. **phonemetable fr** and **espeak-data/fr\_dict**). If | |||
| needed, these can be overridden by **phonemes** and **dictionary** | |||
| attributes in the voice file. For example you may want to start the | |||
| implementation of a new language by using the phoneme table of an | |||
| existing language. | |||
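As a rough sketch, a voice file of this kind could be generated with a small Python helper. The helper name, the voice name "test-language", and the language code "xx" are made up for illustration; only the **name**, **language**, **phonemes**, and **dictionary** attributes come from the text above.

```python
def make_voice_file(name, language, phonemes=None, dictionary=None):
    """Return the text of a minimal voice file (hypothetical helper).

    phonemes and dictionary are optional overrides; when omitted, the
    language code alone selects the phoneme table and dictionary.
    """
    lines = [f"name {name}", f"language {language}"]
    if phonemes:
        lines.append(f"phonemes {phonemes}")
    if dictionary:
        lines.append(f"dictionary {dictionary}")
    return "\n".join(lines) + "\n"


# The two-line French voice file shown above.
print(make_voice_file("french", "fr"))

# A new language "xx" that borrows the French phoneme table to get started.
print(make_voice_file("test-language", "xx", phonemes="fr"))
```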
| ### 6.4 Phoneme Definition File {.western} | |||
| You must first decide on the set of phonemes (vowel and consonant | |||
| sounds) for the language. These should be defined in a phoneme | |||
| definition file **ph\_xxxx**, where "xxxx" is the name of your | |||
| language. A reference to this file is then included at the end of the | |||
| master phoneme file, **phsource/phonemes**, eg: | |||
| ~~~~ {.western} | |||
| phonemetable fr base | |||
| include ph_french | |||
| ~~~~ | |||
| This example defines a phoneme table **"fr"** which inherits the | |||
| contents of phoneme table **"base"**. Its contents are found in the file | |||
| **ph\_french**. | |||
| The **base** phoneme table contains definitions of a basic set of | |||
| consonants, and also some "control" phonemes such as stress marks and | |||
| pauses. These are defined in **phsource/phonemes**. The phoneme table | |||
| for a language will inherit these, or alternatively it may inherit the | |||
| phoneme table of another language which in turn inherits the **base** | |||
| phoneme table. | |||
| The phonemes file for the language defines those additional phonemes | |||
| which are not inherited (generally the vowels and diphthongs, plus any | |||
| additional consonants that are needed), or phonemes whose definitions | |||
| differ from the inherited version (eg. the redefinition of a consonant). | |||
| Details of phonemes files are given in | |||
| [phontab.html](http://espeak.sf.net/phontab.html). | |||
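One way to picture the inheritance described above is as a chain of lookups: a phoneme is looked for in the language's own table first, then in the table it inherits from. The Python sketch below models that idea only. The phoneme names and the description strings are placeholders, not real eSpeak phoneme data, and this is not how eSpeak stores its tables.

```python
from collections import ChainMap


def make_table(own, parent=None):
    """Build a phoneme table that falls back to its parent table (illustrative)."""
    return ChainMap(own, parent) if parent is not None else ChainMap(own)


# Placeholder "base" table: some consonants plus control phonemes.
base = make_table({
    "p": "base p",
    "r": "base r",
    "'": "primary stress",
    "_:": "short pause",
})

# Placeholder "fr" table: adds a vowel and redefines one consonant.
fr = make_table({"a": "fr vowel a", "r": "fr r (redefined)"}, parent=base)

print(fr["p"])   # inherited from base
print(fr["r"])   # redefined by fr
print(fr["a"])   # defined only in fr
```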
| The **Compile phoneme data** function of the **espeakedit** program | |||
| compiles the phonemes files of all languages to produce the files | |||
| **espeak-data/phontab**, **phonindex**, and **phondata** which are used | |||
| by eSpeak. | |||
| For many languages, the consonant phonemes which are already available | |||
| in eSpeak, together with the available vowel files which can be used to | |||
| define vowel phonemes, will be sufficient, at least for an initial | |||
| implementation. | |||
| ### 6.5 Dictionary Files {.western} | |||
| Once the language's phonemes have been defined, then pronunciation | |||
| dictionary data can be produced in order to translate the language's | |||
| source text into phonemes. This consists of two source files: | |||
| **fr\_rules** (the spelling to phoneme rules) and **fr\_list** (an | |||
| exceptions list, and attributes of certain words). The corresponding | |||
| compiled data file is **espeak-data/fr\_dict** which is produced from | |||
| **fr\_rules** and **fr\_list** sources by the command: | |||
| > `espeak-ng --compile=fr`{.western}. | |||
| Or by using the **espeakedit** program. | |||
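If you rebuild the dictionary often while editing the rules, the compile command above can be wrapped in a small script. The following Python sketch only assembles and runs that same command; the function names are hypothetical, and it assumes **espeak-ng** is on the command path and that the source files are in the directory you pass in.

```python
import shutil
import subprocess


def compile_command(code):
    """Return the argument list for compiling one language's dictionary."""
    return ["espeak-ng", f"--compile={code}"]


def compile_dictionary(code, source_dir):
    """Run the compile command in the directory holding <code>_rules and <code>_list."""
    if shutil.which("espeak-ng") is None:
        raise FileNotFoundError("espeak-ng was not found on the command path")
    return subprocess.run(compile_command(code), cwd=source_dir, check=True)


print(compile_command("fr"))
```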
| Details of the contents of the dictionary files are given in | |||
| [dictionary.html](http://espeak.sf.net/dictionary.html). | |||
| The **fr\_list** file contains: | |||
| - - - - | |||
| ### 6.6 Program Code {.western} | |||
| The behaviour of the eSpeak program is controlled by various options | |||
| such as: | |||
| - - - - | |||
| The function SetTranslator() at the start of the source code file | |||
| tr\_languages.cpp recognizes the language code and sets the appropriate | |||
| options. For a new language, you would add its language code and the | |||
| required options in SetTranslator(). However, this may not be necessary | |||
| during testing because most of the options can also be set in the voice | |||
| file in espeak-data/voices (see [Voice | |||
| files](http://espeak.sf.net/voices.html)). | |||
| ### 6.7 Improving a Language {.western} | |||
| Listen carefully to the eSpeak voice. Try to identify what sounds wrong | |||
| and what needs to be improved. | |||
| - - - - - | |||
| **If you are interested in working on a language, please contact me so | |||
| that I can set up the initial data and discuss the features of the | |||
| language.** | |||
| For most of the eSpeak voices, I do not speak or understand the | |||
| language, and I do not know how it should sound. I can only make | |||
| improvements as a result of feedback from speakers of that language. If | |||
| you want to help to improve a language, listen carefully and try to | |||
| identify individual errors, either in the spelling-to-phoneme | |||
| translation, the position of stressed syllables within words, or the | |||
| sound of phonemes, or problems with rhythm and vowel lengths. | |||
| @@ -0,0 +1,101 @@ | |||
| ANALYSIS | |||
| ======== | |||
| (Further notes are needed) | |||
| Recordings of spoken words and phrases can be analysed to try and make | |||
| eSpeak match a language more closely. Unlike most other (larger and | |||
| better quality) synthesizers, eSpeak's data is not produced directly | |||
| from recorded sounds. To use an analogy, it's like a drawing or sketch | |||
| compared with a photograph. Or vector graphics compared with a bitmap | |||
| image. It's smaller, less accurate, with less subtlety, but it can | |||
| sometimes show some aspects of the picture more clearly than a more | |||
| accurate image. | |||
| #### Recording Sounds {.western} | |||
| Recordings should be made while speaking slowly, clearly, and firmly and | |||
| loudly (but not shouting). Speak about half a metre from the microphone. | |||
| Try to avoid background noise and hum interference from electrical power | |||
| cables. | |||
| #### Praat {.western} | |||
| I use a modified version of the praat program | |||
| ([www.praat.org](www.praat.org)) to view and analyse both sound | |||
| recordings and output from eSpeak. The modification adds a new function | |||
| (`Spectrum->To_eSpeak`{.western}) which analyses a voiced sound and | |||
| produces a file which can be loaded into espeakedit. Details of the | |||
| modification are in the `"praat-mod"`{.western} directory in the | |||
| espeakedit package. The analysis contains a sequence of frames, one per | |||
| cycle at the speech's fundamental frequency. Each frame is a short time | |||
| spectrum, together with praat's estimation of the f1 to f5 formant | |||
| frequencies at the time of that cycle. I also use Praat's | |||
| `New->Record_mono_sound`{.western} function to make sound recordings. | |||
| ### Vowels and Diphthongs {.western} | |||
| #### Analysing a Recording {.western} | |||
| Make a recording, with a male voice, and trim it in Praat to keep just | |||
| the required vowel sound. Then use the new | |||
| `Spectrum->To_eSpeak`{.western} modification (this was named | |||
| `To_Spectrogram2`{.western} in earlier versions) to analyse the sound. | |||
| It produces a file named `"spectrum.dat"`{.western}. Load the | |||
| `"spectrum.dat"`{.western} file into espeakedit. Espeakedit has two Open | |||
| functions, `File->Open`{.western} and `File->Open2`{.western}. They are | |||
| the same, except that they remember different paths. I generally use | |||
| `File->Open2`{.western} for reading the `"spectrum.dat"`{.western} file. | |||
| The data is displayed in espeakedit as a sequence of spectrum frames | |||
| (see [editor.html](editor.html)). | |||
| #### Tone Quality {.western} | |||
| It can be difficult to match the tonal quality of a new vowel to be | |||
| compatible with existing vowel files. This is determined by the relative | |||
| heights and widths of the formant peaks. These vary depending on how the | |||
| recording was made, the microphone, and the strength and tone of the | |||
| voice. Also the positions of the higher peaks (F3 upwards) can vary | |||
| depending on the characteristics of the speaker's voice. Formant peaks | |||
| correspond to resonances within the mouth and throat, and they depend on | |||
| its size and shape. With a female voice, all the formants (F1 upwards) | |||
| are generally shifted to higher frequencies. For these reasons, it's | |||
| best to use a male voice, and to use its analysed spectra only as | |||
| guidance. Rather than construct formant-peaks entirely to match the | |||
| analysed data, instead copy keyframes from a similar existing vowel. | |||
| Then make small adjustments to match the position of the F1, F2, F3 | |||
| formant peaks and hopefully produce the required vowel sound. | |||
| #### Using an Existing Vowel File {.western} | |||
| Choose a similar vowel file from `phsource/vowel`{.western} and open it | |||
| into espeakedit. It may be useful to use | |||
| `phsource/vowel/vowelchart`{.western} as a map to show how vowel files | |||
| compare with each other. You can select a keyframe from the vowel file | |||
| and use CTRL-C and CTRL-V to copy the green formant peaks onto a frame | |||
| of the new spectrum sequence. Then adjust the peaks to match the new | |||
| frame. Press F1 to hear the sound of the formant peaks in the selected | |||
| frame. The F0 peak is provided in order to adjust the correct balance of | |||
| low frequencies, below the F1 peak. If the sound is too muffled, or | |||
| conversely, too "thin", try adjusting the amplitude or position of the | |||
| F0 peak. | |||
| #### Length and Amplitude {.western} | |||
| Use an existing vowel file as a guide for how to set the amplitude and | |||
| length of the keyframes. At the right of each keyframe, its length is | |||
| shown in ms and under that is its relative (RMS) amplitude. The second | |||
| keyframe should be marked with a red marker (use CTRL-M to toggle this). | |||
| This divides the vowel into the front-part (with one frame), and the | |||
| rest. Use F2 to play the sound of the new vowel sequence. It will also | |||
| produce a WAV file (the default name is speech.wav) which you can read | |||
| into praat to see whether it has a sensible shape. | |||
| #### Using the New Vowel {.western} | |||
| Make a new directory (eg. vwl\_xx) in phsource for your new vowels. Save | |||
| the spectrum sequence with a name which you have chosen for it. You can | |||
| then edit the phoneme file for your language (eg. phsource/ph\_xxx), and | |||
| change a phoneme to refer to your new vowel file. Then do | |||
| `Data->Compile_Phoneme_Data`{.western} from espeakedit's menubar to | |||
| re-compile the phoneme data. | |||
| @@ -0,0 +1,279 @@ | |||
| 2.1 INSTALLATION {.western} | |||
| ---------------- | |||
| ### 2.1.1 Linux and other Posix systems {.western} | |||
| There are two versions of the command line program. They both have the | |||
| same command parameters (see below). | |||
| 1. 2. | |||
| Place the **espeak-ng** or **speak-ng** executable file in the command | |||
| path, eg in **/usr/local/bin** | |||
| Place the "**espeak-data**" directory in /usr/share as | |||
| **/usr/share/espeak-data**.\ | |||
| Alternatively if it is placed in the user's home directory (i.e. | |||
| **/home/\<user\>/espeak-data**) then that will be used instead. | |||
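The lookup order described above (the home directory copy is preferred over the one in /usr/share) can be sketched as follows. This is an illustration in Python of the stated rule, not eSpeak's own code; the function name is hypothetical.

```python
from pathlib import Path


def find_espeak_data(home, system=Path("/usr/share/espeak-data")):
    """Return the espeak-data directory to use, or None if neither exists.

    A copy in the user's home directory takes priority over the system copy.
    """
    candidate = Path(home) / "espeak-data"
    if candidate.is_dir():
        return candidate
    if Path(system).is_dir():
        return Path(system)
    return None


print(find_espeak_data(Path.home()))
```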
| #### Dependencies {.western} | |||
| **espeak-ng** uses the PortAudio sound library (version 18), so you will | |||
| need to have the **libportaudio0** library package installed. It may be | |||
| already, since it's used by other software, such as OpenOffice.org and | |||
| the Audacity sound editor. | |||
| Some Linux distributions (eg. SuSe 10) have version 19 of PortAudio | |||
| which has a slightly different API. The speak program can be compiled to | |||
| use version 19 of PortAudio by copying the file portaudio19.h to | |||
| portaudio.h before compiling. | |||
| The speak program may be compiled without using PortAudio, by removing | |||
| the line | |||
| ~~~~ {.western style="margin-bottom: 0.5cm"} | |||
| #define USE_PORTAUDIO | |||
| ~~~~ | |||
| in the file speech.h. | |||
| ### 2.1.2 Windows {.western} | |||
| The installer: **setup\_espeak.exe** installs the SAPI5 version of | |||
| eSpeak. During installation you need to specify which voices you want to | |||
| appear in SAPI5 voice menus. | |||
| It also installs a command line program **espeak-ng** in the espeak-ng | |||
| program directory. | |||
| 2.2 COMMAND OPTIONS {.western} | |||
| ------------------- | |||
| ### 2.2.1 Examples {.western} | |||
| To use at the command line, type:\ | |||
| **espeak-ng "This is a test"**\ | |||
| or\ | |||
| **espeak-ng -f \<text file\>** | |||
| Or just type\ | |||
| **espeak-ng**\ | |||
| followed by text on subsequent lines. Each line is spoken when RETURN | |||
| is pressed. | |||
| Use **espeak-ng -x** to see the corresponding phoneme codes. | |||
| ### 2.2.2 The Command Line Options {.western} | |||
| **espeak-ng [options] ["text words"]** | |||
| : Text input can be taken either from a file, from a string in the | |||
| command, or from stdin. | |||
| **-f \<text file\>** | |||
| : Speaks a text file. | |||
| **--stdin** | |||
| : Takes the text input from stdin. | |||
| If neither -f nor --stdin is given, then the text input is taken from "text words" (a text string within double quotes). \ | |||
| If that is not present then text is taken from stdin, but each line is treated as a separate sentence. \ | |||
| **-a \<integer\>** | |||
| : Sets amplitude (volume) in a range of 0 to 200. The default is 100. | |||
| **-p \<integer\>** | |||
| : Adjusts the pitch in a range of 0 to 99. The default is 50. | |||
| **-s \<integer\>** | |||
| : Sets the speed in words-per-minute (approximate values for the | |||
| default English voice, others may differ slightly). The default | |||
| value is 175. I generally use a faster speed of 260. The lower limit | |||
| is 80. There is no upper limit, but about 500 is probably a | |||
| practical maximum. | |||
| **-b \<integer\>** | |||
| : Input text character format. | |||
| : 1 UTF-8. This is the default. | |||
| : 2 The 8-bit character set which corresponds to the language (eg. | |||
| Latin-2 for Polish). | |||
| : 4 16 bit Unicode. | |||
| : Without this option, eSpeak assumes text is UTF-8, but will | |||
| automatically switch to the 8-bit character set if it finds an | |||
| illegal UTF-8 sequence. | |||
| **-g \<integer\>** | |||
| : Word gap. This option inserts a pause between words. The value is | |||
| the length of the pause, in units of 10 ms (at the default speed of | |||
| 170 wpm). | |||
| **-h** or **--help** | |||
| : The first line of output gives the eSpeak version number. | |||
| **-k \<integer\>** | |||
| : Indicate words which begin with capital letters. | |||
| : 1 eSpeak uses a click sound to indicate when a word starts with a | |||
| capital letter, or double click if word is all capitals. | |||
| : 2 eSpeak speaks the word "capital" before a word which begins with | |||
| a capital letter. | |||
| : Other values: eSpeak increases the pitch for words which begin | |||
| with a capital letter. The greater the value, the greater the | |||
| increase in pitch. Try -k20. | |||
| **-l \<integer\>** | |||
| : Line-break length, default value 0. If set, then lines which are | |||
| shorter than this are treated as separate clauses and spoken | |||
| separately with a break between them. This can be useful for some | |||
| text files, but bad for others. | |||
| **-m** | |||
| : Indicates that the text contains SSML (Speech Synthesis Markup | |||
| Language) tags or other XML tags. Those SSML tags which are | |||
| supported are interpreted. Other tags, including HTML, are ignored, | |||
| except that some HTML tags such as \<hr\> \<h2\> and \<li\> ensure a | |||
| break in the speech. | |||
| **-q** | |||
| : Quiet. No sound is generated. This may be useful with options such | |||
| as -x and --pho. | |||
| **-v \<voice filename\>[+\<variant\>]** | |||
| : Sets a Voice for the speech, usually to select a language. eg: | |||
| ~~~~ {.western style="margin-left: 1cm; margin-bottom: 0.5cm"} | |||
| espeak-ng -vaf | |||
| ~~~~ | |||
| To use the Afrikaans voice. A modifier after the voice name can be used | |||
| to vary the tone of the voice, eg: | |||
| ~~~~ {.western style="margin-left: 1cm; margin-bottom: 0.5cm"} | |||
| espeak-ng -vaf+3 | |||
| ~~~~ | |||
| The variants are `+m1 +m2 +m3 +m4 +m5 +m6 +m7`{.western} for male voices | |||
| and `+f1 +f2 +f3 +f4`{.western} which simulate female voices by using | |||
| higher pitches. Other variants include `+croak`{.western} and | |||
| `+whisper`{.western}. | |||
| \<voice filename\> is a file within the `espeak-data/voices`{.western} | |||
| directory.\ | |||
| \<variant\> is a file within the `espeak-data/voices/!v`{.western} | |||
| directory. | |||
| Voice files can specify a language, alternative pronunciations or | |||
| phoneme sets, different pitches, tonal qualities, and prosody for the | |||
| voice. See the [voices.html](voices.html) file. | |||
| Voice names which start with **mb-** are for use with Mbrola diphone | |||
| voices, see [mbrola.html](mbrola.html) | |||
| Some languages may need additional dictionary data, see | |||
| [languages.html](languages.html) | |||
| **-w \<wave file\>** | |||
| Writes the speech output to a file in WAV format, rather than speaking | |||
| it. | |||
| **-x** | |||
| The phoneme mnemonics, into which the input text is translated, are | |||
| written to stdout. If a phoneme name contains more than one letter (eg. | |||
| [tS]), the --sep or --tie option can be used to distinguish this from | |||
| separate phonemes. | |||
| **-X** | |||
| As -x, but in addition, details are shown of the pronunciation rule and | |||
| dictionary list lookup. This can be useful to see why a certain | |||
| pronunciation is being produced. Each matching pronunciation rule is | |||
| listed, together with its score, the highest scoring rule being used in | |||
| the translation. "Found:" indicates the word was found in the dictionary | |||
| lookup list, and "Flags:" means the word was found with only properties | |||
| and not a pronunciation. You can see when a word has been retranslated | |||
| after removing a prefix or suffix. | |||
| **-z** | |||
| The option removes the end-of-sentence pause which normally occurs at | |||
| the end of the text. | |||
| **--stdout** | |||
| Writes the speech output to stdout as it is produced, rather than | |||
| speaking it. The data starts with a WAV file header which indicates the | |||
| sample rate and format of the data. The length field is set to zero | |||
| because the length of the data is unknown when the header is produced. | |||
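A program which reads this stream therefore cannot rely on the length in the header and should read samples until the end of the input. The Python sketch below shows one way to do that. It assumes the common 44-byte header layout (a "fmt " chunk followed directly by the "data" chunk), which is an assumption rather than something stated here, and the sample rate in the demonstration is made up.

```python
import io
import struct


def make_header(sample_rate, channels=1, bits=16, data_size=0):
    """Build a 44-byte WAV header (used only to create demonstration input)."""
    block_align = channels * bits // 8
    riff_size = 0 if data_size == 0 else 36 + data_size
    return (
        struct.pack("<4sI4s", b"RIFF", riff_size, b"WAVE")
        + struct.pack("<4sIHHIIHH", b"fmt ", 16, 1, channels, sample_rate,
                      sample_rate * block_align, block_align, bits)
        + struct.pack("<4sI", b"data", data_size)
    )


def read_wav_stream(stream):
    """Parse the header, then read samples to end-of-input, ignoring the length."""
    riff, _riff_size, wave = struct.unpack("<4sI4s", stream.read(12))
    if riff != b"RIFF" or wave != b"WAVE":
        raise ValueError("not a RIFF/WAVE stream")
    chunk_id, fmt_size = struct.unpack("<4sI", stream.read(8))
    if chunk_id != b"fmt ":
        raise ValueError("expected a fmt chunk")
    fmt = stream.read(fmt_size)
    _fmt_tag, channels, rate, _byte_rate, _align, bits = struct.unpack("<HHIIHH", fmt[:16])
    chunk_id, declared_size = struct.unpack("<4sI", stream.read(8))
    if chunk_id != b"data":
        raise ValueError("expected a data chunk")
    samples = stream.read()  # the declared size may be zero, so read everything
    info = {"sample_rate": rate, "channels": channels, "bits": bits,
            "declared_size": declared_size}
    return info, samples


# Demonstration with a made-up header and 100 silent 16-bit samples.
demo = io.BytesIO(make_header(22050) + b"\x00\x00" * 100)
info, samples = read_wav_stream(demo)
print(info, len(samples))
```

To read real output, the same function could be given `sys.stdin.buffer` when the script is placed at the end of a pipe from `espeak-ng --stdout`.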
| **--compile [=\<voice name\>]** | |||
| Compile the pronunciation rule and dictionary lookup data from their | |||
| source files in the current directory. The Voice determines which | |||
| language's files are compiled. For example, if it's an English voice, | |||
| then *en\_rules*, *en\_list*, and *en\_extra* (if present), are compiled | |||
| to replace *en\_dict* in the *espeak-data* directory. If no Voice is | |||
| specified then the default Voice is used. | |||
| **--compile-debug [=\<voice name\>]** | |||
| The same as **--compile**, but source line numbers from the \*\_rules | |||
| file are included. These are included in the rules trace when the **-X** | |||
| option is used. | |||
| **--ipa** | |||
| Writes phonemes to stdout, using the International Phonetic Alphabet | |||
| (IPA).\ | |||
| If a phoneme name contains more than one letter (eg. [tS]), the --sep | |||
| or --tie option can be used to distinguish this from separate phonemes. | |||
| **--path [="\<directory path\>"]** | |||
| Specifies the directory which contains the espeak-data directory. | |||
| **--pho** | |||
| When used with an mbrola voice (eg. -v mb-en1), it writes mbrola phoneme | |||
| data (.pho file format) to stdout. This includes the mbrola phoneme | |||
| names with duration and pitch information, in a form which is suitable | |||
| as input to this mbrola voice. The --phonout option can be used to write | |||
| this data to a file. | |||
| **--phonout [="\<filename\>"]** | |||
| If specified, the output from -x, -X, --ipa, and --pho options is | |||
| written to this file, rather than to stdout. | |||
| **--punct [="\<characters\>"]** | |||
| Speaks the names of punctuation characters when they are encountered in | |||
| the text. If \<characters\> are given, then only those listed | |||
| punctuation characters are spoken, eg. `--punct=".,;?"`{.western} | |||
| **--sep [=\<character\>]** | |||
| The character is used to separate individual phonemes in the output | |||
| which is produced by the -x or --ipa options. The default is a space | |||
| character. The character z means use a ZWNJ character (U+200c). | |||
| **--split [=\<minutes\>]** | |||
| Used with **-w**, it starts a new WAV file every `<minutes>`{.western} | |||
| minutes, at the next sentence boundary. | |||
| **--tie [=\<character\>]** | |||
| The character is used within multi-letter phonemes in the output which | |||
| is produced by the -x or --ipa options. The default is the tie | |||
| character ͡ U+361. The character z means use a ZWJ character (U+200d). | |||
| **--voices [=\<language code\>]** | |||
| Lists the available voices.\ | |||
| If =\<language code\> is present then only those voices which are | |||
| suitable for that language are listed.\ | |||
| `--voices=mbrola`{.western} lists the voices which use mbrola diphone | |||
| voices. These are not included in the default `--voices`{.western} list.\ | |||
| `--voices=variant`{.western} lists the available voice variants (voice | |||
| modifiers). | |||
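When espeak-ng is called from a script, it can help to check option values against the ranges given above before running the command. The following Python sketch builds an argument list from a few of the options in this section. The function and its parameter names are hypothetical; only the option letters, ranges, and defaults are taken from the descriptions above.

```python
def build_command(text, voice=None, variant=None, amplitude=100,
                  pitch=50, speed=175, wave_file=None):
    """Return an espeak-ng argument list, checking the documented ranges."""
    if not 0 <= amplitude <= 200:
        raise ValueError("amplitude (-a) must be in the range 0 to 200")
    if not 0 <= pitch <= 99:
        raise ValueError("pitch (-p) must be in the range 0 to 99")
    if speed < 80:
        raise ValueError("speed (-s) has a lower limit of 80")
    argv = ["espeak-ng", "-a", str(amplitude), "-p", str(pitch), "-s", str(speed)]
    if voice:
        # A variant is appended to the voice name after a "+" sign.
        argv += ["-v", voice + (f"+{variant}" if variant else "")]
    if wave_file:
        argv += ["-w", wave_file]
    argv.append(text)
    return argv


print(build_command("This is a test", voice="af", variant="f2", wave_file="out.wav"))
```

The resulting list can be passed to `subprocess.run()` on a system where espeak-ng is installed.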
| ### 2.2.3 The Input Text {.western} | |||
| **HTML Input** | |||
| : If the -m option is used to indicate marked-up text, then HTML can | |||
| be spoken directly. | |||
| **Phoneme Input** | |||
| : As well as plain text, phoneme mnemonics can be used in the text | |||
| input to **espeak-ng**. They are enclosed within double square | |||
| brackets. Spaces are used to separate words and all stressed | |||
| syllables must be marked explicitly. | |||
| : eg: | |||
| `espeak-ng -v en "[[D,Is Iz sVm f@n'EtIk t'Ekst 'InpUt]]" `{.western} | |||
| : This command will speak: "This is some phonetic text input". | |||
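When phoneme strings are produced by a program, the bracket syntax can be assembled as in this short Python sketch. The helper name is hypothetical; it only joins per-word phoneme strings with spaces and wraps them in double square brackets, as described above.

```python
def phoneme_input(words):
    """Join per-word phoneme strings and wrap them in double square brackets."""
    return "[[" + " ".join(words) + "]]"


text = phoneme_input(["D,Is", "Iz", "sVm", "f@n'EtIk", "t'Ekst", "'InpUt"])
print(text)
```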
| @@ -0,0 +1,655 @@ | |||
| 4. TEXT TO PHONEME TRANSLATION {.western} | |||
| ------------------------------ | |||
| ### 4.1 Translation Files {.western} | |||
| There is a separate set of pronunciation files for each language, their | |||
| names starting with the language name. | |||
| There are two separate methods for translating words into phonemes: | |||
| - - | |||
| These two files are compiled into the file ***\<language\>\_dict*** in | |||
| the espeak-data directory (eg. espeak-data/en\_dict) | |||
| ### 4.2 Phoneme names {.western} | |||
| Each of the language's phonemes is represented by a mnemonic of 1, 2, 3, | |||
| or 4 characters. Together with a number of utility codes (eg. stress | |||
| marks and pauses), these are defined in the phoneme data file (see | |||
| \*spec not yet available\*). | |||
| The utility 'phonemes' are: | |||
| +--------------------------------------+--------------------------------------+ | |||
| | **'** | primary stress | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | **,** | secondary stress | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | **%** | unstressed syllable | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | **= ** | put the primary stress on the | | |||
| | | preceding syllable | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | **\_:** | short pause | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | **\_** | a shorter pause | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | **||** | indicates a word boundary within a | | |||
| | | phoneme string | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | **|** | can be used to separate two adjacent | | |||
| | | characters, to prevent them from | | |||
| | | being considered as a | | |||
| | | multi-character phoneme mnemonic | | |||
| +--------------------------------------+--------------------------------------+ | |||
| It is not necessary to specify the stress of every syllable. Stress | |||
| markers are only needed in order to change the effect of the language's | |||
| default stress rule. | |||
| The phonemes which are used to represent a language's sounds are based | |||
| loosely on the Kirshenbaum ascii character representation of the | |||
| International Phonetic Alphabet | |||
| [www.kirshenbaum.net/IPA/ascii-ipa.pdf](http://www.kirshenbaum.net/IPA/ascii-ipa.pdf) | |||
| ### 4.3 Pronunciation Rules {.western} | |||
| The rules in the ***\<language\>\_rules*** file specify the phonemes | |||
| which are used to pronounce each letter, or sequence of letters. Some | |||
| rules only apply when the letter or letters are preceded by, or followed | |||
| by, other specified letters. | |||
| To find the pronunciation of a word, the rules are searched and any | |||
| which match the letters at the current position in the word are given a score depending | |||
| on how many letters are matched. The pronunciation from the best | |||
| matching rule is chosen. The pointer into the source word is then | |||
| advanced past those letters which have been matched and the process is | |||
| repeated until all the letters of the word have been processed. | |||
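The search described above can be sketched in Python as follows. This is a simplified model, not eSpeak's implementation: the score here is just the count of letters matched (including any preceding and following context letters), the real scoring is more detailed, and the single-letter rules and their phonemes are hypothetical examples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    match: str        # letters this rule translates (must not be empty)
    phonemes: str     # phoneme string to output
    pre: str = ""     # letters which must come before the match
    post: str = ""    # letters which must come after the match


def score(rule, word, pos):
    """Return the number of letters matched, or -1 if the rule does not apply."""
    if not word.startswith(rule.match, pos):
        return -1
    if rule.pre and not word[:pos].endswith(rule.pre):
        return -1
    end = pos + len(rule.match)
    if rule.post and not word.startswith(rule.post, end):
        return -1
    return len(rule.pre) + len(rule.match) + len(rule.post)


def translate(word, rules):
    """Repeatedly pick the best-scoring rule and advance past its matched letters."""
    out = []
    pos = 0
    while pos < len(word):
        best, best_score = None, -1
        for rule in rules:
            s = score(rule, word, pos)
            if s > best_score:
                best, best_score = rule, s
        if best is None:
            raise ValueError(f"no rule matches at position {pos} in {word!r}")
        out.append(best.phonemes)
        pos += len(best.match)
    return "".join(out)


rules = [
    Rule("b", "b"), Rule("k", "k"), Rule("t", "t"),  # hypothetical
    Rule("o", "0"),                                  # hypothetical
    Rule("oo", "u:"),
    Rule("oo", "U", pre="b", post="k"),
]
print(translate("book", rules), translate("boot", rules))
```

In "book" the rule with context matches four letters and so beats the plain "oo" rule; in "boot" the context does not match, so the plain "oo" rule is used.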
| #### 4.3.1 Rule Groups {.western} | |||
| The rules are organized in groups, each starting with a ".group" line: | |||
| When matching a word, firstly the 2-letter group for the two letters at | |||
| the current position in the word (if such a group exists) is searched, | |||
| and then the single-letter group. The highest scoring rule in either of | |||
| those two groups is used. | |||
| #### 4.3.2 Rules {.western} | |||
| Each rule is on a separate line, and has the syntax: | |||
| eg. | |||
| "oo" is pronounced as [u:], but when also preceded by "b" and followed | |||
| by "k", it is pronounced [U]. | |||
| In the case of a single-letter group, the first character of \<match\> | |||
| must be the group letter. In the case of a 2-letter group, the first two | |||
| characters of \<match\> must be the group letters. The second and third | |||
| rules above may be in either .group o or .group oo. | |||
| Alphabetic characters in the \<pre\>, \<match\>, and \<post\> parts must | |||
| be lower case, and matching is case-insensitive. Some upper case letters | |||
| are used in \<pre\> and \<post\> with special meanings. | |||
| #### 4.3.3 Special characters in \<phoneme string\>: {.western} | |||
| +--------------------------------------+--------------------------------------+ | |||
| | **\_\^\_\<language code\> ** | Translate using a different | | |||
| | | language. | | |||
| +--------------------------------------+--------------------------------------+ | |||
| #### 4.3.4 Special Characters in both \<pre\> and \<post\>: {.western} | |||
| +--------------------------------------+--------------------------------------+ | |||
| | **\_** | Beginning or end of a word (or a | | |||
| | | hyphen). | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | **-** | Hyphen. | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | **A** | Any vowel (the set of vowel | | |||
| | | characters may be defined for a | | |||
| | | particular language). | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | **C** | Any consonant. | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | **B H F G Y ** | These may indicate other sets of | | |||
| | | characters (defined for a particular | | |||
| | | language). | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | **L\<nn\>** | Any of the sequence of characters | | |||
| | | defined as a letter group (see 4.3.1 | | |||
| | | above). | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | **D** | Any digit. | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | **K** | Not a vowel (i.e. a consonant or | | |||
| | | word boundary or non-alphabetic | | |||
| | | character). | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | **X** | There is no vowel until the word | | |||
| | | boundary. | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | **Z** | A non-alphabetic character. | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | **%** | Doubled (placed before a character | | |||
| | | in \<pre\> and after it in \<post\>). | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | **/** | The following character is treated | | |||
| | | literally. | | |||
| +--------------------------------------+--------------------------------------+ | |||
| The sets of letters indicated by A, B, C, E, F, G may be defined | |||
| differently for each language. | |||
| Examples of rules: | |||
| ~~~~ {.western} | |||
| _) a // "a" at the start of a word | |||
| a (CC // "a" followed by two consonants | |||
| a (C% // "a" followed by a double consonant (the same letter twice) | |||
| a (/% // "a" followed by a percent sign | |||
| %C) a // "a" preceded by a double consonant | |||
| ~~~~ | |||
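The post-context symbols in the examples above can be illustrated with a small matcher. This is only a sketch, not eSpeak's implementation: the function name `match_post`, the fixed vowel set, and the restriction to `C`, `D`, `%`, `/` and literal letters are all assumptions made for illustration.

```python
VOWELS = set("aeiou")  # assumed vowel set; eSpeak defines letter sets per language


def is_consonant(ch: str) -> bool:
    """A letter that is not in the assumed vowel set."""
    return ch.isalpha() and ch not in VOWELS


def match_post(pattern: str, following: str) -> bool:
    """Return True if `following` (the text after the matched letters)
    satisfies the post-context `pattern`."""
    pos = 0      # position in `following`
    prev = None  # last text character consumed
    i = 0
    while i < len(pattern):
        sym = pattern[i]
        if sym == "%":
            # '%' after a character in <post>: that character must be doubled.
            if prev is None or pos >= len(following) or following[pos] != prev:
                return False
        else:
            if sym == "/":
                # '/' makes the next pattern character literal.
                i += 1
                if i >= len(pattern):
                    return False
                ok = pos < len(following) and following[pos] == pattern[i]
            elif pos >= len(following):
                ok = False
            elif sym == "C":
                ok = is_consonant(following[pos])
            elif sym == "D":
                ok = following[pos].isdigit()
            else:
                ok = following[pos] == sym  # ordinary letter
            if not ok:
                return False
            prev = following[pos]
        pos += 1
        i += 1
    return True
```

For the word "apple", the text following the first "a" is "pple", so both `a (CC` and `a (C%` would match under these assumptions, while `a (C%` would not match "angle".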
| #### 4.3.5 Special characters only in \<pre\>: {.western} | |||
| +--------------------------------------+--------------------------------------+ | |||
| | **@ ** | Any syllable. | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | **&** | A syllable which may be stressed | | |||
| | | (i.e. is not defined as unstressed). | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | **V** | Matches only if a previous word has | | |||
| | | indicated that a verb form is | | |||
| | | expected. | | |||
| +--------------------------------------+--------------------------------------+ | |||
| eg. | |||
| ~~~~ {.western} | |||
| @@) bi // "bi" preceded by at least two syllables | |||
| @@a) bi // "bi" preceded by at least 2 syllables and following 'a' | |||
| ~~~~ | |||
| Note that matching characters in the \<pre\> part do not affect the | |||
| syllable counting. | |||
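As a rough illustration of the `@` symbol in \<pre\>, the sketch below counts syllables before the matched letters. It assumes that a syllable can be approximated by a group of consecutive vowel letters, which is a simplification chosen for this example; the names `count_syllables` and `pre_syllables_ok` are hypothetical, and only the `@` symbols of the pattern are considered.

```python
import re

VOWEL_GROUP = re.compile(r"[aeiouy]+")  # assumed vowel letters


def count_syllables(text: str) -> int:
    """Approximate the syllable count as the number of vowel groups."""
    return len(VOWEL_GROUP.findall(text.lower()))


def pre_syllables_ok(pre_pattern: str, preceding: str) -> bool:
    """Check only the '@' symbols of a <pre> pattern against the letters
    that precede the match in the word."""
    required = pre_pattern.count("@")
    return count_syllables(preceding) >= required
```

With this approximation, `@@) bi` would accept the "bi" in "alibi" (two vowel groups in "ali") but not the "bi" in "albino" (one vowel group in "al").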
| #### 4.3.6 Special characters only in \<post\>: {.western} | |||
| +--------------------------------------+--------------------------------------+ | |||
| | **@** | A vowel follows somewhere in the | | |||
| | | word. | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | **+** | Force an increase in the score in | | |||
| | | this rule (may be repeated for more | | |||
| | | effect). | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | **S\<number\> ** | This number of matching characters | | |||
| | | are a standard suffix, remove them | | |||
| | | and retranslate the word. | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | **P\<number\>** | This number of matching characters | | |||
| | | are a standard prefix, remove them | | |||
| | | and retranslate the word. | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | **Lnn** | **nn** is a 2-digit decimal number | | |||
| | | in the range 01 to 20\ | | |||
| | | Matches with any of the letter | | |||
| | | sequences which have been defined | | |||
| | | for letter group **nn** | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | **N** | Only use this rule if the word is | | |||
| | | not a retranslation after removing a | | |||
| | | suffix. | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | **\#** | (English specific) change the next | | |||
| | | "e" into a special character "E" | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | **\$noprefix** | Only use this rule if the word is | | |||
| | | not a retranslation after removing a | | |||
| | | prefix. | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | **\$w\_alt\ | Only use this rule if the word is | | |||
| | \$w\_alt2\ | found in the \*\_list file with the | | |||
| | \$w\_alt3** | **\$alt**, **\$alt2** or **\$alt3** | | |||
| | | attribute respectively. | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | **\$p\_alt\ | Only use this rule if the part-word, | | |||
| | \$p\_alt2\ | up to and including the pre and | | |||
| | \$p\_alt3** | match parts of this rule, is found | | |||
| | | in the \*\_list file with the | | |||
| | | **\$alt**, **\$alt2** or **\$alt3** | | |||
| | | attribute respectively. | | |||
| +--------------------------------------+--------------------------------------+ | |||
| eg. | |||
| ~~~~ {.western} | |||
| @) ly (_S2 lI // "ly", at end of a word with at least one other | |||
| // syllable, is a suffix pronounced [lI]. Remove | |||
| // it and retranslate the word. | |||
| _) un (@P2 %Vn // "un" at the start of a word is an unstressed | |||
| // prefix pronounced [Vn] | |||
| _) un (i ju: // ... except in words starting "uni" | |||
| _) un (inP2 ,Vn // ... but it is for words starting "unin" | |||
| ~~~~ | |||
| S and P must be at the end of the \<post\> string. | |||
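The effect of a suffix rule such as `S2` in the example above can be sketched as follows. This is a minimal illustration under stated assumptions: `strip_suffix` is a hypothetical helper, the stem pronunciations in `STEMS` are invented placeholders, and a dictionary lookup stands in for the full retranslation of the stem.

```python
from typing import Callable


def strip_suffix(word: str, n_chars: int, suffix_phonemes: str,
                 translate: Callable[[str], str]) -> str:
    """Remove the last `n_chars` letters, translate the remaining stem,
    then append the pronunciation given for the suffix."""
    if not 0 < n_chars < len(word):
        raise ValueError("suffix length must leave a non-empty stem")
    stem = word[:-n_chars]
    return translate(stem) + suffix_phonemes


# Hypothetical stem pronunciations, standing in for retranslation.
STEMS = {"quick": "kwIk", "sad": "sad"}


def translate_stem(stem: str) -> str:
    return STEMS[stem]
```

For example, `strip_suffix("quickly", 2, "lI", translate_stem)` removes "ly", looks up "quick", and joins the two phoneme strings.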
| S\<number\> may be followed by additional letters (eg. S2ei ). Some of | |||
| these are probably specific to English, but similar functions could be | |||
| made for other languages. | |||
| +--------------------------------------+--------------------------------------+ | |||
| | **q** | query the \_list file to find stress | | |||
| | | position or other attributes for the | | |||
| | | stem, but don't re-translate the | | |||
| | | word with the suffix removed. | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | **t** | determine the stress pattern of the | | |||
| | | word **before** adding the suffix | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | **d ** | the previous letter may have been | | |||
| | | doubled when the suffix was added. | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | **e** | "e" may have been removed. | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | **i** | "y" may have been changed to "i". | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | **v** | the suffix means the verb form of | | |||
| | | pronunciation should be used. | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | **f** | the suffix means the next word is | | |||
| | | likely to be a verb. | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | **m** | after this suffix has been removed, | | |||
| | | additional suffixes may be removed. | | |||
| +--------------------------------------+--------------------------------------+ | |||
| P\<number\> may be followed by additional letters (eg. P3v ). | |||
| +--------------------------------------+--------------------------------------+ | |||
| | **t ** | determine the stress pattern of the | | |||
| | | word **before** adding the prefix | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | **v** | the prefix means the verb form of | | |||
| | | pronunciation should be used. | | |||
| +--------------------------------------+--------------------------------------+ | |||
| ### 4.4 Pronunciation Dictionary List {.western} | |||
| The ***\<language\>\_list*** file contains a list of words whose | |||
| pronunciations are given explicitly, rather than determined by the | |||
| Pronunciation Rules. The ***\<language\>\_extra*** file, if present, is | |||
| also used and its contents are taken as coming after those in | |||
| ***\<language\>\_list***. | |||
| Also the list can be used to specify the stress pattern, or other | |||
| properties, of a word. | |||
| If the Pronunciation rules are applied to a word and indicate a standard | |||
| prefix or suffix, then the word is again looked up in the Pronunciation | |||
| Dictionary List after the prefix or suffix has been removed. | |||
| Lines in the dictionary list have the form: | |||
| eg. | |||
| ~~~~ {.western style="margin-bottom: 0.5cm"} | |||
| book bUk | |||
| ~~~~ | |||
| Rather than a full pronunciation, just the stress may be given, to | |||
| change where it would be otherwise placed by the Pronunciation Rules: | |||
| ~~~~ {.western} | |||
| berlin $2 // stress on second syllable | |||
| absolutely $3 // stress on third syllable | |||
| for $u // an unstressed word | |||
| ~~~~ | |||
| #### 4.4.1 Multiple Words {.western} | |||
| A pronunciation may also be specified for a group of words, when these | |||
| appear together. Up to four words may be given, enclosed in brackets. | |||
| This may be used to change the pronunciation or stress pattern when | |||
| these words occur together, | |||
| ~~~~ {.western style="margin-bottom: 0.5cm"} | |||
| (de jure) deI||dZ'U@rI2 // note || used as a word break in the phoneme string | |||
| ~~~~ | |||
| or to run them together, pronounced as a single word | |||
| ~~~~ {.western style="margin-bottom: 0.5cm"} | |||
| (of a) @v@ | |||
| ~~~~ | |||
| or to give them a flag when they occur together | |||
| ~~~~ {.western style="margin-bottom: 0.5cm"} | |||
| (such as) sVtS||a2z $pause // precede with a pause | |||
| ~~~~ | |||
| Hyphenated words in the ***\<language\>\_list*** file must also be | |||
| enclosed within brackets, because the two parts are considered as | |||
| separate words. | |||
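The line formats shown so far (single words, bracketed word groups, optional phoneme string, optional flags, trailing `//` comments) can be read with a small parser. The sketch below is an assumption about how such lines might be split, not the parser eSpeak uses; `ListEntry` and `parse_list_line` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ListEntry:
    words: List[str]                 # one word, or several for a bracketed group
    phonemes: Optional[str] = None   # None when only flags are given
    flags: List[str] = field(default_factory=list)


def parse_list_line(line: str) -> Optional[ListEntry]:
    """Split one *_list line into words, phoneme string and flags."""
    text = line.split("//", 1)[0].strip()  # drop a trailing comment
    if not text:
        return None
    if text.startswith("("):
        # Bracketed group: "(de jure) ..." -> words ["de", "jure"]
        group, _, rest = text[1:].partition(")")
        words = group.split()
    else:
        parts = text.split(None, 1)
        words = [parts[0]]
        rest = parts[1] if len(parts) > 1 else ""
    entry = ListEntry(words=words)
    for token in rest.split():
        if token.startswith("$"):
            entry.flags.append(token)   # e.g. $pause, $2, $u
        elif entry.phonemes is None:
            entry.phonemes = token
    return entry
```

A line such as `berlin $2` yields an entry with no phoneme string and the single flag `$2`, while `(such as) sVtS||a2z $pause` yields two words, a phoneme string and one flag.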
| #### 4.4.2 Special characters in \<phoneme string\>: {.western} | |||
| +--------------------------------------+--------------------------------------+ | |||
| | **\_\^\_\<language code\> ** | Translate using a different | | |||
| | | language. See explanation in 4.3.3 | | |||
| | | above. | | |||
| +--------------------------------------+--------------------------------------+ | |||
| #### 4.4.3 Flags {.western} | |||
| A word (or group of words) may be given one or more flags, either | |||
| instead of, or as well as, the phonetic translation. | |||
| +--------------------------------------+--------------------------------------+ | |||
| | \$u | The word is unstressed. In the case | | |||
| | | of a multi-syllable word, a slight | | |||
| | | stress is applied according to the | | |||
| | | default stress rules. | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | \$u1 | The word is unstressed, with a | | |||
| | | slight stress on its 1st syllable. | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | \$u2 | The word is unstressed, with a | | |||
| | | slight stress on its 2nd syllable. | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | \$u3 | The word is unstressed, with a | | |||
| | | slight stress on its 3rd syllable. | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | | | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | \$u+ \$u1+ \$u2+ \$u3+ | As above, but the word has full | | |||
| | | stress if it's at the end of a | | |||
| | | clause. | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | | | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | \$1 | Primary stress on the 1st syllable. | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | \$2 | Primary stress on the 2nd syllable. | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | \$3 | Primary stress on the 3rd syllable. | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | \$4 | Primary stress on the 4th syllable. | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | \$5 | Primary stress on the 5th syllable. | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | \$6 | Primary stress on the 6th syllable. | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | \$7 | Primary stress on the 7th syllable. | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | | | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | \$pause | Ensure a short pause before this | | |||
| | | word (eg. for conjunctions such as | | |||
| | | "and", some prepositions, etc). | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | \$brk | Ensure a very short pause before | | |||
| | | this word, shorter than \$pause (eg. | | |||
| | | for some prepositions, etc). | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | \$only | The rule does not apply if a prefix | | |||
| | | or suffix has already been removed. | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | \$onlys | As \$only, except that a standard | | |||
| | | plural ending is allowed. | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | \$stem | The rule only applies if a suffix | | |||
| | | has already been removed. | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | \$strend | Word is fully stressed if it's at | | |||
| | | the end of a clause. | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | \$strend2 | As \$strend, but the word is also | | |||
| | | stressed if followed only by | | |||
| | | unstressed word(s). | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | \$unstressend | Word is unstressed if it's at the | | |||
| | | end of a clause. | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | \$atend | Use this pronunciation if it's at | | |||
| | | the end of a clause. | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | \$double | Cause a doubling of the initial | | |||
| | | consonant of the following word | | |||
| | | (used for Italian). | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | \$capital | Use this pronunciation if the word | | |||
| | | has initial capital letter (eg. | | |||
| | | polish v Polish). | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | \$allcaps | Use this pronunciation if the word | | |||
| | | is all capitals. | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | \$dot | Ignore a . after this word even when | | |||
| | | followed by a capital letter (eg. | | |||
| | | Mr. Dr. ). | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | \$hasdot | Use this pronunciation if the word | | |||
| | | is followed by a dot. (This | | |||
| | | attribute also implies \$dot). | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | \$sentence | The rule only applies if the clause | | |||
| | | includes end-of-sentence (i.e. it is | | |||
| | | not terminated by a comma). For | | |||
| | | example, "\$atend \$sentence" means | | |||
| | | that the rule only applies at the | | |||
| | | end of a sentence. | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | \$abbrev | This has two meanings.\ | | |||
| | | 1. If there is no phoneme string: | | |||
| | | Speak the word as individual | | |||
| | | letters, even if it contains a vowel | | |||
| | | (eg. "abc" should be spoken as "a" | | |||
| | | "b" "c").\ | | |||
| | | 2. If there is a phoneme string: | | |||
| | | This word is capitalized because it | | |||
| | | is an abbreviation and | | |||
| | | capitalization does not indicate | | |||
| | | emphasis (if the "emphasize | | |||
| | | all-caps" option is on). | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | | | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | \$accent | Used for the pronunciation of a | | |||
| | | single alphabetic character. The | | |||
| | | character name is spoken as the | | |||
| | | base-letter name plus the accent | | |||
| | | (diacritic) name. eg. It can be used | | |||
| | | to specify that "â" is spoken as "a" | | |||
| | | "circumflex". | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | \$combine | This word is treated as though it is | | |||
| | | combined with the following word | | |||
| | | with a hyphen. This may be subject | | |||
| | | to further conditions for certain | | |||
| | | languages. | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | \$alt \$alt2 \$alt3 | These are language specific. Their | | |||
| | | use should be described in the | | |||
| | | language's \*\_list file. | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | | | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | \$verb | Use this pronunciation if it's a | | |||
| | | verb. | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | \$noun | Use this pronunciation if it's a | | |||
| | | noun. | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | \$past | Use this pronunciation if it's past | | |||
| | | tense. | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | \$verbf | The following word is probably a | | |||
| | | verb. | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | \$verbsf | The following word is probably a | | |||
| | | verb if it has an "s" suffix. | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | \$nounf | The following word is probably not a | | |||
| | | verb. | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | \$pastf | The following word is probably past | | |||
| | | tense. | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | \$verbextend | Extend the influence of \$verbf and | | |||
| | | \$verbsf. | | |||
| +--------------------------------------+--------------------------------------+ | |||
| The last group are probably English specific, but something similar may | |||
| be useful in other languages. They are a crude attempt to improve the | |||
| accuracy of pairs like ob'ject (verb) v 'object (noun) and read | |||
| (present) v read (past). | |||
| The dictionary list is searched from bottom to top. The first match that | |||
| satisfies any conditions is used (i.e. the one lowest down the list). So | |||
| if we have: | |||
| ~~~~ {.western} | |||
| to t@ // unstressed version | |||
| to tu: $atend // stressed version | |||
| ~~~~ | |||
| then if "to" is at the end of the clause, we get [tu:], if not then we | |||
| get [t@]. | |||
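The bottom-to-top search can be sketched as follows. Only the `$atend` condition is modelled here, and the tuple layout of `ENTRIES` and the function name `lookup` are assumptions made for this example.

```python
# Entries in file order: (word, phoneme string, flags)
ENTRIES = [
    ("to", "t@", set()),         # unstressed version
    ("to", "tu:", {"$atend"}),   # stressed version
]


def lookup(word, at_clause_end, entries=ENTRIES):
    """Search from the bottom of the list and return the first entry
    whose conditions are satisfied, or None if nothing matches."""
    for entry_word, phonemes, flags in reversed(entries):
        if entry_word != word:
            continue
        if "$atend" in flags and not at_clause_end:
            continue  # condition not met, keep searching upwards
        return phonemes
    return None
```

Because the conditional entry is lower in the list, it is tried first; the unconditional entry above it acts as the fallback.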
| #### 4.4.4 Translating a Word to another Word {.western} | |||
| Rather than specifying the pronunciation of a word by a phoneme string, | |||
| you can specify another "sounds like" word. | |||
| Use the attribute **\$text** eg. | |||
| ~~~~ {.western style="margin-bottom: 0.5cm"} | |||
| cough coff $text | |||
| ~~~~ | |||
| Alternatively, use the command **\$textmode** on a line by itself to | |||
| turn this on for all subsequent entries in the file, until it's turned | |||
| off by **\$phonememode**. eg. | |||
| ~~~~ {.western} | |||
| $textmode | |||
| cough coff | |||
| through threw | |||
| $phonememode | |||
| ~~~~ | |||
| This feature cannot be used for the special entries in the **\_list** | |||
| files which start with an underscore, such as numbers. | |||
| Currently "textmode" entries are only recognized for complete words, and | |||
| not for stems from which a prefix or suffix has been removed (eg. | |||
| the word "coughs" would not match the example above). | |||
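The switching between text mode and phoneme mode can be pictured as a simple state flag while reading the file. The sketch below is an illustration only; `read_entries` is a hypothetical name, and flags other than `$text` are ignored.

```python
def read_entries(lines):
    """Return (word, value, kind) tuples, where kind is 'text' for a
    "sounds like" word and 'phonemes' for a phoneme string."""
    text_mode = False
    result = []
    for raw in lines:
        line = raw.split("//", 1)[0].strip()  # drop comments
        if not line:
            continue
        if line == "$textmode":
            text_mode = True       # applies to all following entries
            continue
        if line == "$phonememode":
            text_mode = False
            continue
        tokens = line.split()
        word = tokens[0]
        values = [t for t in tokens[1:] if not t.startswith("$")]
        flags = [t for t in tokens[1:] if t.startswith("$")]
        if not values:
            continue               # flags only, nothing to translate
        kind = "text" if (text_mode or "$text" in flags) else "phonemes"
        result.append((word, values[0], kind))
    return result
```

An entry is treated as a replacement word either because it carries `$text` itself or because it lies between `$textmode` and `$phonememode`.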
| ### 4.5 Conditional Rules {.western} | |||
| Rules in a **\_rules** file and entries in a **\_list** file can be made | |||
| conditional. They apply only to some voices. This can be useful to | |||
| specify different pronunciations for different variants of a language | |||
| (dialects or accents). | |||
| Conditional rules have **?** and a condition number at the start of | |||
| the line in the **\_rules** or **\_list** file. This means that the rule | |||
| only applies if that condition number is specified in a **dictrules** | |||
| line in the [voice file](voices.html). | |||
| If the rule starts with **?!** then the rule only applies if the | |||
| condition number is **not** specified in the voice file. eg. | |||
| ~~~~ {.western} | |||
| ?3 can't kant // only use this if the voice has: dictrules 3 | |||
| ?!3 rather rA:D3 // only use if the voice doesn't have: dictrules 3 | |||
| ~~~~ | |||
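The `?` and `?!` prefixes in the example above can be evaluated against the set of condition numbers from the voice's **dictrules** line. The following is a minimal sketch; the function name `applies` and the regular expression are assumptions for illustration.

```python
import re

# "?3 ..." or "?!3 ...": optional "!", a condition number, then the rule text
CONDITION = re.compile(r"^\?(!?)(\d+)\s+(.*)$")


def applies(line: str, dictrules: set):
    """Return the rule text if the line applies for this voice, else None.
    `dictrules` holds the condition numbers listed in the voice file."""
    stripped = line.strip()
    m = CONDITION.match(stripped)
    if m is None:
        return stripped                      # unconditional line
    negated = m.group(1) == "!"
    active = int(m.group(2)) in dictrules
    return m.group(3) if active != negated else None
```

A voice with `dictrules 3` would therefore use the `?3` entry and skip the `?!3` entry; a voice without it does the opposite.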
| ### 4.6 Numbers and Character Names {.western} | |||
| #### 4.6.1 Letter names {.western} | |||
| The names of individual letters can be given either in the **\_rules** | |||
| or **\_list** file. Sometimes an individual letter is also used as a | |||
| word in the language and its pronunciation as a word differs from its | |||
| letter name. If so, it should be listed in the **\_list** file, preceded | |||
| by an underscore, to give the letter name (as distinct from its | |||
| pronunciation as a word). eg. in English: | |||
| ~~~~ {.western style="margin-bottom: 0.5cm"} | |||
| _a eI | |||
| ~~~~ | |||
| #### 4.6.2 Numbers {.western} | |||
| The operation of the TranslateNumber() function is controlled by the | |||
| language's `langopts.numbers`{.western} option. This constructs spoken | |||
| numbers from fragments according to various options which can be set for | |||
| each language. The number fragments are given in the **\_list** file. | |||
| +--------------------------------------+--------------------------------------+ | |||
| | \_0 to \_9 | The numbers 0 to 9 | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | \_13 | etc. Any pronunciations which are | | |||
| | | needed for specific numbers in the | | |||
| | | range \_10 to \_99 | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | \_2X \_3X | Twenty, thirty, etc., used to make | | |||
| | | numbers 10 to 99 | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | \_0C | The word for "hundred" | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | \_1C \_2C | Special pronunciation for one | | |||
| | | hundred, two hundred, etc., if | | |||
| | | needed. | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | \_1C0 | Special pronunciation (if needed) | | |||
| | | for 100 exactly | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | \_0M1 | The word for "thousand" | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | \_0M2 | The word for "million" | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | \_0M3 | The word for 1000000000 | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | \_1M1 \_2M1 | Special pronunciation for one | | |||
| | | thousand, two thousand, etc, if | | |||
| | | needed | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | \_0and | Word for "and" when speaking numbers | | |||
| | | (eg. "two hundred and twenty"). | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | \_dpt | Word spoken for the decimal | | |||
| | | point/comma | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | \_dpt2 | Word spoken (if any) at the end of | | |||
| | | all the digits after a decimal | | |||
| | | point. | | |||
| +--------------------------------------+--------------------------------------+ | |||
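To show how the fragments in the table might combine, the sketch below builds the list of fragment keys for a number from 0 to 999. It is an assumption-laden illustration: the real behaviour depends on the language's `langopts.numbers` option, the ordering used here is English-like, `number_fragments` is a hypothetical name, and the values in `FRAGMENTS` are plain words rather than phoneme strings. Entries such as `_1C0` are not handled.

```python
def number_fragments(n: int, fragments: dict) -> list:
    """Return the fragment keys used to speak n (0 <= n <= 999)."""
    if not 0 <= n <= 999:
        raise ValueError("only 0..999 is handled in this sketch")
    keys = []
    hundreds, rest = divmod(n, 100)
    if hundreds:
        special = f"_{hundreds}C"
        if special in fragments:       # e.g. _1C, _2C if defined
            keys.append(special)
        else:
            keys += [f"_{hundreds}", "_0C"]
        if rest and "_0and" in fragments:
            keys.append("_0and")
    if rest or not hundreds:
        exact = f"_{rest}"
        if exact in fragments:         # _0 to _9, or a specific entry such as _13
            keys.append(exact)
        else:
            tens, units = divmod(rest, 10)
            keys.append(f"_{tens}X")   # twenty, thirty, ...
            if units:
                keys.append(f"_{units}")
    return keys


# Example fragment table (English words used as stand-ins for phoneme strings).
FRAGMENTS = {f"_{i}": w for i, w in enumerate(
    "zero one two three four five six seven eight nine".split())}
FRAGMENTS.update({"_13": "thirteen", "_2X": "twenty", "_4X": "forty",
                  "_0C": "hundred", "_0and": "and"})
```

Joining the looked-up values for 220 gives "two hundred and twenty", which matches the `_0and` example in the table.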
| ### 4.7 Character Substitution {.western} | |||
| Character substitutions can be specified by using a **.replace** section | |||
| at the start of the **\_rules** file. Each line specifies either one or | |||
| two alphabetic characters to be replaced by another one or two | |||
| alphabetic characters. This substitution is done to a word before it is | |||
| translated using the spelling-to-phoneme rules. Only the lower-case | |||
| version of the characters needs to be specified. eg. | |||
| .replace\ | |||
| ô ő // (Hungarian) allow the use of o-circumflex instead of o-double-acute\ | |||
| û ű\ | |||
| cx ĉ // (Esperanto) allow "cx" as an alternative to c-circumflex\ | |||
| ﬁ fi // replace a single character ligature by two characters | |||
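The substitution step can be sketched as a simple lookup applied to the lower-cased word before translation. This is an illustration under assumptions: `substitute` is a hypothetical name, the tables below are taken from the example lines above, and replacements are applied in table order with no special handling of overlaps.

```python
# Example tables; in practice each language has its own .replace section.
HUNGARIAN = {"ô": "ő", "û": "ű"}
ESPERANTO = {"cx": "ĉ"}
LIGATURES = {"\ufb01": "fi"}  # U+FB01, the single-character "fi" ligature


def substitute(word: str, table: dict) -> str:
    """Lower-case the word, then replace each source sequence by its target."""
    word = word.lower()
    for source, target in table.items():
        word = word.replace(source, target)
    return word
```

Since only lower-case forms are listed, the word is lower-cased first, so "Cxu" and "cxu" are treated alike.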
| @@ -0,0 +1,46 @@ | |||
| ESPEAKEDIT PROGRAM {.western} | |||
| ------------------ | |||
| The **espeakedit** program is used to prepare phoneme data for the | |||
| eSpeak speech synthesizer. | |||
| It has two main functions: | |||
| - - | |||
| ### Installation {.western} | |||
| **espeakedit** needs the following packages:\ | |||
| (The package names mentioned here are those from the Ubuntu "Dapper" | |||
| Linux distribution). | |||
| - - - | |||
| In addition, a modified version of **praat** | |||
| ([www.praat.org](www.praat.org)) is used to view and analyse WAV sound | |||
| files. This needs the package **libmotif3** to run and **libmotif-dev** | |||
| to compile. | |||
| ### Quick Guide {.western} | |||
| This will quickly illustrate the main features. Details of the interface | |||
| and key commands are given in [editor\_if.html](editor_if.html). | |||
| For more detailed information on analysing sound recordings and | |||
| preparing phoneme definitions and keyframe data see | |||
| [analyse.html](analyse.html) (to be written). | |||
| #### Compiling Phoneme Data {.western} | |||
| 1. 2. 3. 4. | |||
| #### Keyframe Sequences {.western} | |||
| 1. 2. 3. 4. 5. 6. 7. | |||
| #### Text and Prosody Windows {.western} | |||
| 1. 2. 3. 4. 5. 6. 7. 8. 9. | |||
| The Prosody window can be used to experiment with different phoneme | |||
| lengths and different intonation. | |||
| @@ -0,0 +1,41 @@ | |||
| USER INTERFACE - FORMANT EDITOR {.western} | |||
| ------------------------------- | |||
| ### Frame Sequence Display {.western} | |||
| The eSpeak editor can display a number of frame sequences in tabbed | |||
| windows. Each frame can contain a short-time frequency spectrum, | |||
| covering the period of one cycle at the sound's pitch. Frames can also | |||
| show: | |||
| - - - - - | |||
| ### Text Tab {.western} | |||
| Enter text in the top left text window. Click the **Translate** button | |||
| to see the phonetic transcription in the text window below. Then click | |||
| the **Speak** button to speak the text and show the results in the | |||
| **Prosody** tab, if that is open. | |||
| If changes are made in the **Prosody** tab, then clicking **Speak** will | |||
| speak the modified prosody while **Translate** will revert to the | |||
| default prosody settings for the text. | |||
| To enter phonetic symbols (Kirschenbaum encoding) in the top left text | |||
| window, enclose them within [[ ]]. | |||
| ### Spect Tab {.western} | |||
| The "Spect" tab in the left panel of the eSpeak editor shows information | |||
| about the currently selected frame and sequence. | |||
| - - - - - - | |||
| ### Key Commands {.western} | |||
| - - - - - | |||
| USER INTERFACE - PROSODY EDITOR {.western style="margin-left: 1cm"} | |||
| ------------------------------- | |||
| - | |||
| @@ -0,0 +1,52 @@ | |||
| # eSpeak NG - Documentation | |||
| ====================== | |||
| ### [Usage](commands.md) | |||
| ### [Languages](languages.md) | |||
| ### [Voice Files](voices.md) | |||
| Voice files specify a language and other characteristics of a voice. | |||
| ### [Mbrola Voices](mbrola.md) | |||
| eSpeak NG can be used as a front-end for Mbrola diphone voices. | |||
| ### [Pronunciation Dictionary](dictionary.md) | |||
| ### [Adding a Language](add_language.md) | |||
| How to add or improve a language. | |||
| ### [Phonemes](phonemes.md) | |||
| The list of phoneme mnemonics for English, for use in the Pronunciation | |||
| Dictionary. | |||
| ### [Phoneme Tables](phontab.md) | |||
| The tables of the phonemes used by each language, with their properties | |||
| and sound production. | |||
| ### [Intonation](intonation.md) | |||
| Different intonation "tunes" may be defined for different languages for | |||
| clauses which end in full-stop, comma, question-mark, and | |||
| exclamation-mark. | |||
| ### [eSpeak NG Library API](speak_lib.h) | |||
| API definition and header file for a shared library version of eSpeak NG. | |||
| ### [Markup tags](ssml.md) | |||
| SSML (Speech Synthesis Markup Language) and HTML tags recognized by | |||
| eSpeak NG. | |||
| ### [The espeakedit program](editor.md) | |||
| GUI software to edit vowel files and to compile the phoneme data for use | |||
| by eSpeak NG. See also [Espeakedit user interface](editor_if.md). | |||
| @@ -0,0 +1,102 @@ | |||
| INTONATION {.western} | |||
| ---------- | |||
| In eSpeak's standard intonation model, a "tune" is applied to each | |||
| clause depending on its punctuation. Other intonation models may be used | |||
| for some languages, such as tone languages. | |||
| Named tunes are defined in the text file: | |||
| `phsource/intonation`{.western}. This file must be compiled for use by | |||
| eSpeak by using the espeakedit program, using the menu option: | |||
| `Compile -> Compile intonation data`{.western}. | |||
| ### Clauses {.western} | |||
| The tunes which are used for a language can be specified by using a | |||
| `tunes`{.western} statement in a voice file in | |||
| `espeak-data/voices`{.western}. eg: | |||
| `tunes s1 c1 q1 e1`{.western} | |||
| Its parameters are four tune names which are used for clauses which end | |||
| in: | |||
| 1. 2. 3. 4. | |||
| A clause consists of the following parts: | |||
| - - - - | |||
| ### Tune definitions {.western} | |||
| Here is an example tune definition from the file | |||
| `phsource/intonation`{.western}. | |||
| ~~~~ {.western} | |||
| tune s1 | |||
| prehead 46 57 | |||
| headenv fall 16 | |||
| head 4 80 55 -8 -5 | |||
| headextend 0 63 38 13 0 | |||
| nucleus fall 70 18 24 12 | |||
| nucleus0 fall 64 8 | |||
| endtune | |||
| ~~~~ | |||
| It contains: | |||
| **tune** \<tune name\> | |||
| : Starts the definition of a tune. The `tune name`{.western} can | |||
| be used in `tunes`{.western} statements in voice files. | |||
| **endtune** \<tune name\> | |||
| : Ends the definition of a tune. | |||
| **prehead** \<start pitch\> \<end pitch\> | |||
| : Gives the pitch path for any series of unstressed syllables before | |||
| the first stressed syllable. | |||
| **headenv** \<envelope\> \<height\> | |||
| : Gives the pitch envelope which is used for stressed syllables in the | |||
| head (before the nucleus), including `onset`{.western} and | |||
| `headlast`{.western} syllables if these are specified. | |||
| `height`{.western} gives a pitch range for the envelope. | |||
| **head** \<steps\> \<start pitch\> \<end pitch\> \<unstressed start\> \<unstressed end\> | |||
| : `start pitch`{.western} and `end pitch`{.western} give a pitch | |||
| path for the stressed syllables of the head. `steps`{.western} is | |||
| the maximum number of stressed syllables for which this applies. If | |||
| there are additional stressed syllables, then the | |||
| `headextend`{.western} statement is used for them. | |||
| : `unstressed start`{.western} and `unstressed end`{.western} give | |||
| a pitch path for unstressed syllables between two stressed | |||
| syllables. Their values are relative to the pitch of the previous | |||
| stressed syllable. Values are usually negative, meaning that the | |||
| unstressed syllables have lower pitch than the previous stressed | |||
| syllable. | |||
| **headextend** \<percentage list\> | |||
| : If the head contains more stressed syllables than is specified by | |||
| `steps`{.western}, then `percentage list`{.western} is used. It | |||
| contains up to 8 numbers which are used repeatedly for the | |||
| additional stressed syllables. A value of 0 corresponds to the lower | |||
| of the `start pitch`{.western} and `end pitch`{.western} values of the | |||
| `head`{.western} statement. 100 corresponds to the higher value. | |||
| Negative values and values greater than 100 are allowed. | |||
| **nucleus** \<envelope\> \<top pitch\> \<bottom pitch\> \<tail start\> \<tail end\> | |||
| : This gives the pitch envelope and pitch range of the last stressed | |||
| syllable of the clause. `tail start`{.western} and | |||
| `tail end`{.western} give a pitch path for the unstressed syllables | |||
| which are after the last stressed syllable. | |||
| **nucleus0** \<envelope\> \<top pitch\> \<bottom pitch\> | |||
| : This is used instead of `nucleus`{.western} if there are no | |||
| unstressed syllables after the last stressed syllable. In this case, | |||
| the pitch changes of the nucleus and the tail are both included in | |||
| the nucleus. | |||
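As a rough illustration of how the `headextend` percentages relate to the `head` pitch values, the following Python sketch maps each percentage onto the range between the lower and the higher of `start pitch` and `end pitch`. The linear mapping, the function names, and the lack of rounding are assumptions made for illustration only; eSpeak's internal calculation may differ.

```python
def headextend_pitch(percent, head_start, head_end):
    """Map a headextend percentage onto the head pitch range.

    0 corresponds to the lower of the two head pitches and 100 to the
    higher; values outside 0..100 extrapolate beyond that range.
    Assumption: the mapping is linear (illustrative only).
    """
    low = min(head_start, head_end)
    high = max(head_start, head_end)
    return low + (high - low) * percent / 100.0


def extra_stressed_pitches(count, percentages, head_start, head_end):
    """Pitches for `count` stressed syllables beyond `steps`.

    The percentage list is reused cyclically when it is shorter than
    the number of additional stressed syllables.
    """
    return [
        headextend_pitch(percentages[i % len(percentages)], head_start, head_end)
        for i in range(count)
    ]


# Values from the example tune s1: "head 4 80 55 ..." and "headextend 0 63 38 13 0".
pitches = extra_stressed_pitches(6, [0, 63, 38, 13, 0], 80, 55)
```

With the `s1` values, the sixth additional stressed syllable wraps around to the first percentage in the list.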
| The following attributes may also be included: | |||
| **onset** \<pitch\> \<unstressed start\> \<unstressed end\> | |||
| : This specifies the pitch for the first stressed syllable of the | |||
| head. If the `onset`{.western} statement is present, then the | |||
| `head`{.western} statement is used for the stressed syllables after the | |||
| first. | |||
| **headlast** \<pitch\> \<unstressed start\> \<unstressed end\> | |||
| : This specifies the pitch for the last stressed syllable of the head | |||
| (i.e. the stressed syllable before the nucleus). | |||
| @@ -0,0 +1,125 @@ | |||
| 3. LANGUAGES {.western} | |||
| ------------ | |||
| **Languages**. The eSpeak speech synthesizer supports several languages; | |||
| however, in many cases these are initial drafts and need more work to | |||
| improve them. Assistance from native speakers is welcome for these, or | |||
| other new languages. Please contact me if you want to help. | |||
| eSpeak does text to speech synthesis for the following languages, some | |||
| better than others: Afrikaans, Albanian, Armenian, Cantonese, Catalan, | |||
| Croatian, Czech, Danish, Dutch, English, Esperanto, Finnish, French, | |||
| German, Greek, Hindi, Hungarian, Icelandic, Indonesian, Italian, | |||
| Kurdish, Latvian, Lojban, Macedonian, Mandarin, Norwegian, Polish, | |||
| Portuguese, Romanian, Russian, Serbian, Slovak, Spanish, Swahili, | |||
| Swedish, Tamil, Turkish, Vietnamese, Welsh. | |||
| #### Help Needed {.western} | |||
| Many of these are just experimental attempts at these languages, | |||
| produced after a quick reading of the corresponding article on | |||
| wikipedia.org. They will need work or advice from native speakers to | |||
| improve them. Please contact me if you want to advise or assist with | |||
| these or other languages. | |||
| The sound of some phonemes may be poorly implemented, particularly [r] | |||
| since I'm English and therefore unable to make a "proper" [r] sound. | |||
| A major factor is the rhythm or cadence. An Italian speaker told me the | |||
| Italian voice improved from "difficult to understand" to "good" by | |||
| changing the relative length of stressed syllables. Identifying | |||
| unstressed function words in the xx\_list file is also important to make | |||
| the speech flow well. See [Adding or Improving a | |||
| Language](add_language.html). | |||
| #### Character sets {.western} | |||
| Languages recognise text either as UTF-8 or alternatively in an 8-bit | |||
| character set which is appropriate for that language. For example, for | |||
| Polish this is Latin2, for Russian it is KOI8-R. This choice can be | |||
| overridden by a line in the voices file to specify an ISO 8859 character | |||
| set, eg. for Russian the line: | |||
| ~~~~ {.western style="margin-bottom: 0.5cm"} | |||
| charset 5 | |||
| ~~~~ | |||
| will mean that ISO 8859-5 is used as the 8-bit character set rather than | |||
| KOI8-R. | |||
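The reason the 8-bit character set matters can be seen by encoding the same Russian word in both character sets. The Python sketch below uses only the standard codecs `koi8_r` and `iso8859_5`; the sample word is an arbitrary choice for illustration and the snippet does not call eSpeak itself.

```python
text = "привет"  # arbitrary sample word, chosen for illustration

# The same text produces different byte values in the two 8-bit character sets.
koi8_bytes = text.encode("koi8_r")
iso_bytes = text.encode("iso8859_5")

# Reading KOI8-R bytes as if they were ISO 8859-5 yields the wrong letters,
# which is why the voice's charset must match the encoding of the input text.
misread = koi8_bytes.decode("iso8859_5")
```

Both encodings use one byte per letter, so the lengths match but the byte values do not.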
| In the case of a language which uses a non-Latin character set (eg. | |||
| Greek or Russian) if the text contains a word with Latin characters then | |||
| that particular word will be pronounced using English pronunciation | |||
| rules and English phonemes. Speaking entirely English text using a Greek | |||
| or Russian voice will sound OK, but each word is spoken separately so it | |||
| won't flow properly. | |||
| Sample texts in various languages can be found at | |||
| [http://\<language\>.wikipedia.org](http://meta.wikimedia.org/wiki/List_of_Wikipedias) | |||
| and [www.gutenberg.org](http://www.gutenberg.org/) | |||
| ### 3.1 Voice Files {.western} | |||
| A number of Voice files are provided in the | |||
| `espeak-data/voices`{.western} directory. You can select one of these | |||
| with the **-v \<voice filename\>** parameter to the speak command, eg: | |||
| ~~~~ {.western style="margin-bottom: 0.5cm"} | |||
| espeak-ng -vaf | |||
| ~~~~ | |||
| to speak using the Afrikaans voice. | |||
| Language voices generally start with the 2 letter [ISO 639-1 | |||
| code](http://en.wikipedia.org/wiki/ISO_639-1) for the language. If the | |||
| language does not have an ISO 639-1 code, then the 3 letter [ISO 639-3 | |||
| code](http://www.sil.org/iso639-3/codes.asp) can be used. | |||
| For details of the voice files see [Voices](voices.html). | |||
| #### Default Voice {.western} | |||
| ### 3.2 English Voices {.western} | |||
| ### 3.3 Voice Variants {.western} | |||
| To make alternative voices for a language, you can make additional voice | |||
| files in espeak-data/voices which contain commands to change various | |||
| voice and pronunciation attributes. See [voices.html](voices.html). | |||
| Alternatively there are some preset voice variants which can be applied | |||
| to any of the language voices, by appending `+`{.western} and a variant | |||
| name. Their effects are defined by files in | |||
| `espeak-data/voices/!v`{.western}. | |||
| The variants are `+m1 +m2 +m3 +m4 +m5 +m6 +m7`{.western} for male | |||
| voices, `+f1 +f2 +f3 +f4 +f5`{.western} for female voices, and | |||
| `+croak +whisper`{.western} for other effects. For example: | |||
| ~~~~ {.western style="margin-bottom: 0.5cm"} | |||
| espeak-ng -ven+m3 | |||
| ~~~~ | |||
| The available voice variants can be listed with: | |||
| ~~~~ {.western style="margin-bottom: 0.5cm"} | |||
| espeak-ng --voices=variant | |||
| ~~~~ | |||
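When the command is driven from a script, the voice argument can be assembled from a language code and an optional variant name. The Python sketch below only builds the argument list and does not run anything; the helper name `espeak_command` is hypothetical, and passing the text as a trailing argument is an assumption about how the command is invoked.

```python
def espeak_command(language, variant=None, text=""):
    """Build an argument list such as ["espeak-ng", "-ven+m3", "Hello"].

    `espeak_command` is a hypothetical helper for this sketch. The variant,
    if given, is appended to the language voice name after a "+".
    """
    voice = language if variant is None else f"{language}+{variant}"
    return ["espeak-ng", f"-v{voice}", text]


argv = espeak_command("en", "m3", "Hello")
```

The resulting list could be handed to `subprocess.run` on a system where `espeak-ng` is installed.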
| ### 3.4 Other Languages {.western} | |||
| The eSpeak speech synthesizer does text to speech for the following | |||
| additional languages. | |||
| ### 3.5 Provisional Languages {.western} | |||
| These languages are only initial naive implementations which have had | |||
| little or no feedback and improvement from native speakers. | |||
| ### 3.6 Mbrola Voices {.western} | |||
| Some additional voices, whose names start with **mb-** (for example | |||
| **mb-en1**) use eSpeak as a front-end to Mbrola diphone voices. eSpeak | |||
| does the spelling-to-phoneme translation and intonation. See | |||
| [mbrola.html](mbrola.html). | |||
| @@ -0,0 +1,128 @@ | |||
| MBROLA VOICES {.western} | |||
| ------------- | |||
| The Mbrola project is a collection of diphone voices for speech | |||
| synthesis. They do not include any text-to-phoneme translation, so this | |||
| must be done by another program. The Mbrola voices are cost-free but are | |||
| not open source. They are available from the Mbrola website at:\ | |||
| [http://www.tcts.fpms.ac.be/synthesis/mbrola/mbrcopybin.html](http://www.tcts.fpms.ac.be/synthesis/mbrola/mbrcopybin.html) | |||
| eSpeak can be used as a front-end to Mbrola. It provides the | |||
| spelling-to-phoneme translation and intonation, which Mbrola then uses | |||
| to generate speech sound. | |||
| ### Voice Names {.western} | |||
| To use a Mbrola voice, eSpeak needs information to translate from its | |||
| own phonemes to the equivalent Mbrola phonemes. This has been set up for | |||
| only some voices so far. | |||
| The eSpeak voices which use Mbrola are named as:\ | |||
| **mb-**xxx | |||
| where xxx is the name of a Mbrola voice (eg. **mb-en1** for the Mbrola | |||
| "**en1**" English voice). These voice files are in eSpeak's directory | |||
| `espeak-data/voices/mbrola`{.western}. | |||
| The installation instructions below use the Mbrola voice "en1" as an | |||
| example. You can use other mbrola voices for which there is an | |||
| equivalent eSpeak voice in `espeak-data/voices/mbrola`{.western}. | |||
| There are some additional eSpeak Mbrola voices which speak English text | |||
| using a Mbrola voice for a different language. These contain the name of | |||
| the Mbrola voice with a suffix **-en**. For example, the voice | |||
| **mb-de4-en** will speak English text with a German accent by using the | |||
| Mbrola **de4** voice. | |||
| ### Windows Installation {.western} | |||
| The SAPI5 version of eSpeak uses the mbrola.dll. | |||
| 1. 2. 3. 4. | |||
| ### Linux Installation {.western} | |||
| From eSpeak version 1.44 onwards, eSpeak calls the mbrola program | |||
| directly, rather than passing phoneme data to it using a pipe. | |||
| 1. 2. 3. | |||
| ### Mbrola Voice Files {.western} | |||
| eSpeak's voice files for Mbrola voices are in directory | |||
| `espeak-data/voices/mbrola`{.western}. They contain a line:\ | |||
| `mbrola <voice> <translation>`{.western} \ | |||
| eg.\ | |||
| `mbrola en1 en1_phtrans`{.western} | |||
| - - | |||
| They are binary files which are compiled, using espeakedit, from source | |||
| files in `phsource/mbrola`{.western}, see below. | |||
| ### Mbrola Phoneme Translation Data {.western} | |||
| Mbrola phoneme translation files specify translations from eSpeak | |||
| phoneme names to mbrola phoneme names. They are referenced from voice | |||
| files. | |||
| The source files are in `phsource/mbrola`{.western}. These are compiled | |||
| using the `espeakedit`{.western} program | |||
| (`Compile->Compile mbrola phonemes list`{.western}) to produce data | |||
| files in `espeak-data/mbrola_ph`{.western} which are used by eSpeak. | |||
| Each line in the mbrola phoneme translation file contains: | |||
| `<control> <espeak ph1> <espeak ph2> <percent> <mbrola ph1> [<mbrola ph2>] `{.western} | |||
| **\<control\>** | |||
| - - - - | |||
| **\<espeak ph1\>**\ | |||
| The eSpeak phoneme which is to be translated to an mbrola phoneme. | |||
| **\<espeak ph2\>**\ | |||
| If this field is not `NULL`{.western}, then the match only occurs if | |||
| this field matches the next phoneme. If control bit 1 is set, then the | |||
| *previous* rather than the *next* phoneme is matched. This field may | |||
| also have the following values:\ | |||
| `VWL`{.western} matches any Vowel phoneme. | |||
| **\<percent\>**\ | |||
| If this field is zero then only one mbrola phoneme is used. If this | |||
| field is non-zero, then two mbrola phonemes are used, and this value | |||
| gives the percentage length of the first mbrola phoneme. | |||
| **\<mbrola ph1\>**\ | |||
| The mbrola phoneme to which the eSpeak phoneme is translated. This | |||
| field may be `NULL`{.western}. | |||
| **\<mbrola ph2\>**\ | |||
| The second mbrola phoneme. This field is only used if the \<percent\> | |||
| field is not zero. | |||
| The list is searched from start to finish, until a match is found. | |||
| Therefore, a line with a more specific match condition should appear | |||
| before a line which matches the same eSpeak phoneme but with a more | |||
| general condition. | |||
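The first-match search can be sketched in Python as follows. This is an illustration under stated assumptions, not eSpeak's implementation: the `<control>` field is left out entirely, the vowel set and the phoneme names in the sample rules are hypothetical, and the `Rule` type and `translate` function are names invented for the sketch.

```python
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    espeak_ph1: str                   # eSpeak phoneme to translate
    espeak_ph2: str                   # "NULL", "VWL", or a specific next phoneme
    percent: int                      # 0 means a single mbrola phoneme
    mbrola_ph1: str
    mbrola_ph2: Optional[str] = None


VOWELS = {"a", "E", "I", "@"}  # hypothetical vowel set for this sketch


def translate(rules, phoneme, next_phoneme):
    """Return [(mbrola phoneme, percent of length), ...] for the first match."""
    for rule in rules:
        if rule.espeak_ph1 != phoneme:
            continue
        if rule.espeak_ph2 == "VWL":
            if next_phoneme not in VOWELS:
                continue
        elif rule.espeak_ph2 != "NULL" and rule.espeak_ph2 != next_phoneme:
            continue
        if rule.percent == 0:
            return [(rule.mbrola_ph1, 100)]
        # Non-zero percent: split the length between two mbrola phonemes.
        return [(rule.mbrola_ph1, rule.percent),
                (rule.mbrola_ph2, 100 - rule.percent)]
    return None  # no rule matched


# Hypothetical rules: the specific "before a vowel" line precedes the general one.
rules = [
    Rule("r", "VWL", 0, "r"),
    Rule("r", "NULL", 0, "R"),
    Rule("tS", "NULL", 40, "t", "S"),
]
```

If the two `r` rules were swapped, the general rule would always match first and the vowel-specific rule would never be reached.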
| The file `dictsource/dict_phonemes`{.western} lists the eSpeak phonemes | |||
| which are used for each language. Translations for all these should be | |||
| given in the mbrola phoneme translation file. In addition, some phonemes | |||
| which are referenced from phoneme files (eg. | |||
| `phsource/ph_language, phsource/phonemes`{.western}) in lines such as: | |||
| ~~~~ {.western} | |||
| beforenotvowel l/ | |||
| reduceto a# 0 | |||
| ~~~~ | |||
| should also be included, even though they don't appear in | |||
| `dictsource/dict_phonemes`{.western}. | |||
| If the language's \*\_list or \*\_rules files include rules to speak | |||
| words "as English", the mbrola phoneme translation file should include | |||
| rules which translate English phonemes into near equivalents, so that | |||
| they can be spoken by the mbrola voice. | |||
| @@ -0,0 +1,283 @@ | |||
| PHONEMES {.western} | |||
| -------- | |||
| In general a different set of phonemes can be defined for each language. | |||
| In most cases different languages inherit the same basic set of | |||
| consonants. They can add to these or modify them as needed. | |||
| The phoneme mnemonics are based on the scheme by Kirshenbaum which | |||
| represents International Phonetic Alphabet symbols using ASCII | |||
| characters. See: | |||
| [www.kirshenbaum.net/IPA/ascii-ipa.pdf](http://www.kirshenbaum.net/IPA/ascii-ipa.pdf). | |||
| Phoneme mnemonics can be used directly in the text input to | |||
| **espeak-ng**. They are enclosed within double square brackets. Spaces | |||
| are used to separate words, and all stressed syllables must be marked | |||
| explicitly. eg:\ | |||
| `[[D,Is Iz sVm f@n'EtIk t'Ekst 'InpUt]]`{.western} | |||
| ### English Consonants {.western} | |||
| `[p]`{.western} | |||
| `[b]`{.western} | |||
| `[t]`{.western} | |||
| `[d]`{.western} | |||
| `[tS]`{.western} | |||
| **ch**urch | |||
| `[dZ]`{.western} | |||
| **j**udge | |||
| `[k]`{.western} | |||
| `[g]`{.western} | |||
| `[f]`{.western} | |||
| `[v]`{.western} | |||
| `[T]`{.western} | |||
| **th**in | |||
| `[D]`{.western} | |||
| **th**is | |||
| `[s]`{.western} | |||
| `[z]`{.western} | |||
| `[S]`{.western} | |||
| **sh**op | |||
| `[Z]`{.western} | |||
| plea**s**ure | |||
| `[h]`{.western} | |||
| `[m]`{.western} | |||
| `[n]`{.western} | |||
| `[N]`{.western} | |||
| si**ng** | |||
| `[l]`{.western} | |||
| `[r]`{.western} | |||
| **r**ed (Omitted if not immediately followed by a vowel). | |||
| `[j]`{.western} | |||
| **y**es | |||
| `[w]`{.western} | |||
| **Some Additional Consonants** | |||
| \ | |||
| `[C]`{.western} | |||
| German i**ch** | |||
| `[x]`{.western} | |||
| German bu**ch** | |||
| `[l^]`{.western} | |||
| Italian **gl**i | |||
| `[n^]`{.western} | |||
| Spanish **ñ** | |||
| ### English Vowels {.western} | |||
| These are the phonemes which are used by the English spelling-to-phoneme | |||
| translations (en\_rules and en\_list). In some varieties of English | |||
| different phonemes may have the same sound, but they are kept separate | |||
| because they may differ in another variety. | |||
| In rhotic accents, such as General American, the phonemes | |||
| `[3:], [A@], [e@], [i@], [O@], [U@]`{.western} include the "r" sound. | |||
| `[@]`{.western} | |||
| alph**a** | |||
| schwa | |||
| `[3]`{.western} | |||
| bett**er** | |||
| rhotic schwa. In British English this is the same as `[@]`{.western}, | |||
| but it includes 'r' colouring in American and other rhotic accents. In | |||
| these cases a separate `[r]`{.western} should not be included unless it | |||
| is followed immediately by another vowel. | |||
| `[3:]`{.western} | |||
| n**ur**se | |||
| `[@L]`{.western} | |||
| simp**le** | |||
| `[@2]`{.western} | |||
| the | |||
| Used only for "the". | |||
| `[@5]`{.western} | |||
| to | |||
| Used only for "to". | |||
| `[a]`{.western} | |||
| tr**a**p | |||
| `[aa]`{.western} | |||
| b**a**th | |||
| This is `[a]`{.western} in some accents, `[A:]`{.western} in others. | |||
| `[a#]`{.western} | |||
| **a**bout | |||
| This may be `[@]`{.western} or may be a more open schwa. | |||
| `[A:]`{.western} | |||
| p**al**m | |||
| `[A@]`{.western} | |||
| st**ar**t | |||
| `[E]`{.western} | |||
| dr**e**ss | |||
| `[e@]`{.western} | |||
| squ**are** | |||
| `[I]`{.western} | |||
| k**i**t | |||
| `[I2]`{.western} | |||
| **i**ntend | |||
| As `[I]`{.western}, but also indicates an unstressed syllable. | |||
| `[i]`{.western} | |||
| happ**y** | |||
| An unstressed "i" sound at the end of a word. | |||
| `[i:]`{.western} | |||
| fl**ee**ce | |||
| `[i@]`{.western} | |||
| n**ear** | |||
| `[0]`{.western} | |||
| l**o**t | |||
| `[V]`{.western} | |||
| str**u**t | |||
| `[u:]`{.western} | |||
| g**oo**se | |||
| `[U]`{.western} | |||
| f**oo**t | |||
| `[U@]`{.western} | |||
| c**ure** | |||
| `[O:]`{.western} | |||
| th**ou**ght | |||
| `[O@]`{.western} | |||
| n**or**th | |||
| `[o@]`{.western} | |||
| f**or**ce | |||
| `[aI]`{.western} | |||
| pr**i**ce | |||
| `[eI]`{.western} | |||
| f**a**ce | |||
| `[OI]`{.western} | |||
| ch**oi**ce | |||
| `[aU]`{.western} | |||
| m**ou**th | |||
| `[oU]`{.western} | |||
| g**oa**t | |||
| `[aI@]`{.western} | |||
| sc**ie**nce | |||
| `[aU@]`{.western} | |||
| h**our** | |||
| ### Some Additional Vowels {.western} | |||
| Other languages will have their own vowel definitions, eg: | |||
| +--------------------------------------+--------------------------------------+ | |||
| | `[e]`{.western} | German **eh**, French **é** | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | `[o]`{.western} | German **oo**, French **o** | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | `[y]`{.western} | German **ü**, French **u** | | |||
| +--------------------------------------+--------------------------------------+ | |||
| | `[Y]`{.western} | German **ö**, French **oe** | | |||
| +--------------------------------------+--------------------------------------+ | |||
| `[:]`{.western} can be used to lengthen a vowel, eg `[e:]`{.western} | |||
| @@ -0,0 +1,174 @@ | |||
| PHONEME TABLES {.western} | |||
| -------------- | |||
| A phoneme table defines all the phonemes which are used by a language, | |||
| together with their properties and the data for their production as | |||
| sounds. | |||
| Generally each language has its own phoneme table, although additional | |||
| phoneme tables can be used for different voices within the language. | |||
| These alternatives are referenced from Voice files. | |||
| A phoneme table does not need to define all the phonemes used by a | |||
| language. It can inherit the phonemes from a previously defined phoneme | |||
| table. For example, a phoneme table may redefine (or add) some of the | |||
| vowels that it uses, but inherit most of its consonants from a standard | |||
| set. | |||
| The source files for the phoneme data are in the "phsource" directory in | |||
| the espeakedit download package. "Vowel files", which are referenced in | |||
| FMT(), VowelStart(), and VowelEnding() instructions are made using the | |||
| espeakedit program. | |||
| ### Phoneme files {.western} | |||
| The phoneme tables are defined in a master phoneme file, named | |||
| **phonemes**. This starts with the **base** phoneme table followed by | |||
| phoneme tables for other languages and voices. These inherit phonemes | |||
| from the **base** table or previously defined tables. | |||
| In addition to phoneme definitions, the phoneme file can contain the | |||
| following: | |||
| **include** \<filename\> | |||
| : Includes the text of the specified file at this point. This allows | |||
| different phoneme tables to be kept in different text files, for | |||
| convenience. \<filename\> is a relative path. The included file can | |||
| itself contain **include** statements. | |||
| **phonemetable** \<name\> \<parent\> | |||
| : Starts a new phoneme table, and ends the previous table.\ | |||
| \<name\> Is the name of this phoneme table. This name is used in | |||
| Voice files.\ | |||
| \<parent\> Is the name of a previously defined phoneme table whose | |||
| phoneme definitions are inherited by this one. The name **base** | |||
| indicates the first (base) phoneme table. | |||
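Inheritance between phoneme tables can be pictured as a lookup that walks from a table to its parent until the phoneme is found. The Python sketch below is conceptual only: the dictionary layout, the table contents, and the `lookup` function are assumptions made for illustration, and eSpeak itself works from compiled phoneme data rather than a structure like this.

```python
# Hypothetical tables: "en" adds a vowel and inherits consonants from "base".
TABLES = {
    "base": {"parent": None, "phonemes": {"t": "base definition of t",
                                          "s": "base definition of s"}},
    "en": {"parent": "base", "phonemes": {"aI": "en definition of aI",
                                          "t": "en redefinition of t"}},
}


def lookup(tables, table_name, phoneme):
    """Find a phoneme definition, falling back to parent tables."""
    name = table_name
    while name is not None:
        table = tables[name]
        if phoneme in table["phonemes"]:
            return table["phonemes"][phoneme]
        name = table["parent"]
    raise KeyError(f"phoneme {phoneme!r} not defined in {table_name!r} or its parents")
```

A phoneme redefined in the child table hides the inherited definition, while anything not redefined is taken from the parent.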
| ### Phoneme definitions {.western} | |||
| Note: These new Phoneme definitions apply to eSpeak version 1.42.20 and | |||
| later. | |||
| A phoneme table contains a list of phoneme definitions. Each starts with | |||
| the keyword **phoneme** and the phoneme name (this is the name used in | |||
| the pronunciation rules in a language's \*\_rules and \*\_list files), | |||
| and ends with the keyword **endphoneme**. For example: | |||
| ~~~~ {.western} | |||
| phoneme aI | |||
| vowel | |||
| starttype #a endtype #i | |||
| length 230 | |||
| FMT(vowels/ai) | |||
| endphoneme | |||
| phoneme s | |||
| vls alv frc sibilant | |||
| voicingswitch z | |||
| lengthmod 3 | |||
| Vowelin f1=0 f2=1700 -300 300 f3=-100 80 | |||
| Vowelout f1=0 f2=1700 -300 250 f3=-100 80 rms=20 | |||
| IF nextPh(isPause) THEN | |||
| WAV(ufric/s_) | |||
| ELIF nextPh(p) OR nextPh(t) OR nextPh(k) THEN | |||
| WAV(ufric/s!) | |||
| ENDIF | |||
| WAV(ufric/s) | |||
| endphoneme | |||
| ~~~~ | |||
| A phoneme definition contains both static properties and executed | |||
| instructions. The instructions may contain conditional statements, so | |||
| that the effect of the phoneme may be different depending on adjacent | |||
| phonemes, whether the syllable is stressed, etc. | |||
| The instructions of a phoneme are interpreted in two different phases. | |||
| In the first phase, the instructions may change the phoneme and replace | |||
| it by a different phoneme. In the second phase, instructions are used to | |||
| produce the sound for the phoneme. | |||
| The **import\_phoneme** statement can be used to copy a previously | |||
| defined phoneme from a specified phoneme table. For example: | |||
| ~~~~ {.western} | |||
| phoneme t | |||
| import_phoneme base/t[ | |||
| endphoneme | |||
| ~~~~ | |||
| means: `phoneme t`{.western} in this phoneme table is a copy of | |||
| `phoneme t[`{.western} from phoneme table "base". A **length** | |||
| instruction can be used after **import\_phoneme** to vary the length | |||
| from the original. | |||
| ### Phoneme Properties {.western} | |||
| Within the phoneme definition the following lines may occur ((V) | |||
| indicates only for vowels, (C) only for consonants): | |||
| ### Phoneme Instructions {.western} | |||
| Phoneme Instructions may be included within conditional statements. | |||
| During the first phase of phoneme interpretation, an instruction which | |||
| causes a change to a different phoneme will terminate the instructions. | |||
| During the second phase, FMT() and WAV() instructions will terminate the | |||
| instructions. | |||
| ### Conditional Statements {.western} | |||
| Phoneme definitions can contain conditional statements such as: | |||
| ~~~~ {.western} | |||
| IF <condition> THEN | |||
| <statements> | |||
| ENDIF | |||
| ~~~~ | |||
| or more generally: | |||
| ~~~~ {.western} | |||
| IF <condition> THEN | |||
| <statements> | |||
| ELIF <condition> THEN | |||
| <statements> | |||
| ... | |||
| ELSE | |||
| <statements> | |||
| ENDIF | |||
| ~~~~ | |||
| where the `ELSE`{.western} and multiple `ELIF`{.western} parts are | |||
| optional. | |||
| Multiple conditions may be joined with `AND`{.western} or | |||
| `OR`{.western}, but not a mixture of `AND`{.western}s and | |||
| `OR`{.western}s. | |||
| A condition may be preceded by `NOT`{.western}. For example: | |||
| ~~~~ {.western} | |||
| IF <condition> AND NOT <condition> THEN | |||
| <statements> | |||
| ENDIF | |||
| ~~~~ | |||
| **Condition** Can be: | |||
| **Attributes** | |||
| ### Sound Specifications {.western} | |||
| There are three ways to produce sounds: | |||
| - - - | |||
| ### Vowel Transitions {.western} | |||
| These specify how a consonant affects an adjacent vowel. A consonant may | |||
| cause a transition in the vowel's formants as the mouth changes shape | |||
| between the consonant and the vowel. The following attributes may be | |||
| specified. Note that the maximum rate of change of formant frequencies | |||
| is limited by the speak program. | |||
| @@ -0,0 +1,64 @@ | |||
| TEXT MARKUP {.western} | |||
| ----------- | |||
| ### SSML: Speech Synthesis Markup Language {.western} | |||
| The following markup tags and attributes are recognised: | |||
| **\<speak\>** | |||
| - - | |||
| **\<voice\>** | |||
| - - - - - | |||
| **\<prosody\>** | |||
| - - - - | |||
| **\<say-as\>** | |||
| - - - - - | |||
| **\<mark\>** name | |||
| **\<s\>** | |||
| - | |||
| **\<p\>** | |||
| - | |||
| **\<sub\>** alias | |||
| **\<tts:style\>** | |||
| - - | |||
| **\<audio\>** src | |||
| **\<emphasis\>** | |||
| - | |||
| **\<break\>** | |||
| - - | |||
| ### HTML {.western} | |||
| eSpeak can speak HTML text directly, or text containing both SSML and | |||
| HTML markup.\ | |||
| Any unrecognised tags are ignored. | |||
| The following tags cause a sentence break.\ | |||
| **\<br\> \<dd\> \<li\> \<img\> \<td\>** | |||
| The following tags cause a paragraph break.\ | |||
| **\<h1\> \<h2\> \<h3\> \<h4\> \<hr\>** | |||
| Text between the following tags is ignored.\ | |||
| **\<script\> ... \</script\>**\ | |||
| **\<style\> ... \</style\>** | |||
| @@ -0,0 +1,311 @@ | |||
| 5. VOICES {.western} | |||
| --------- | |||
| ### 5.1 Voice Files {.western} | |||
| A Voice file specifies a language (and possibly a language variant or | |||
| dialect) together with various attributes that affect the | |||
| characteristics of the voice quality and how the language is spoken. | |||
| Voice files are placed in the `espeak-data/voices`{.western} directory, | |||
| or within subdirectories in there. | |||
| The available voice files can be listed by: | |||
| ~~~~ {.western} | |||
| espeak-ng --voices | |||
| or | |||
| espeak-ng --voices=<language> | |||
| ~~~~ | |||
| also | |||
| ~~~~ {.western style="margin-bottom: 0.5cm"} | |||
| espeak-ng --voices=variant | |||
| ~~~~ | |||
| Lists voice variants which can be applied to eSpeak voices. | |||
| ~~~~ {.western style="margin-bottom: 0.5cm"} | |||
| espeak-ng --voices=<mbrola> | |||
| ~~~~ | |||
| Lists the Mbrola voices. | |||
| ### 5.2 Contents of Voice Files {.western} | |||
| The **language** attribute is mandatory. All the other attributes are | |||
| optional. | |||
| #### Identification Attributes {.western} | |||
| **name \<name\>** | |||
| A name given to this voice. | |||
| **language \<language code\> [\<priority\>]** | |||
| This attribute should appear before the other attributes which are | |||
| listed below. | |||
| It selects the default behaviour and characteristics for the language, | |||
| and sets default values for "phonemes", "dictionary" and other | |||
| attributes. The \<language code\> should be a two-letter ISO 639-1 | |||
| language code. One or more language variant codes may be appended, | |||
| separated by hyphens. (eg. en-uk-north). | |||
| The optional \<priority\> value gives the preference of this voice | |||
| compared with others for the specified language. A low value indicates a | |||
| more preferred voice. The default value is 5. | |||
| More than one **language** line may be present. A voice may be selected | |||
| for other related languages (variants which have the same initial 2 | |||
| letter language code as the specified language), but it will be less | |||
| preferred for these. Different language variants may be specified by | |||
| additional **language** lines in order to indicate that this is a | |||
| preferred voice for them also. Eg. | |||
| ~~~~ {.western} | |||
| language en-uk-north | |||
| language en | |||
| ~~~~ | |||
| indicates that this voice is for the "en-uk-north" dialect, but it is | |||
| also a main choice when a general "en" language is specified. Without | |||
| the second **language** line, it would be disfavoured for "en" for being | |||
| a more specialised voice. | |||
| **gender \<gender\> [\<age\>]** | |||
| This attribute is only a label for use in voice selection. It doesn't | |||
| change the sound of the voice. | |||
| \<gender\> may be male, female, or unknown.\ | |||
| \<age\> is optional and gives an age in years. | |||
| **pitch \<base\> \<range\>** | |||
| Two integer values. The first gives a base pitch to the voice (value in | |||
| Hz). The second controls the range of pitches used by the voice. Setting | |||
| it equal to the base pitch will give a monotone. The default values are | |||
| 82 118. | |||
| **formant \<number\> \<frequency\> \<strength\> \<width\> | |||
| \<freq\_add\>** | |||
| Systematically adjusts the frequency, strength, and width of the | |||
| resonance peaks of the voice. Values are percentages of the default | |||
| values. Changing these affects the tone/quality of the voice. | |||
| **freq\_add** Adds a constant value (in Hz) to the frequency of the | |||
| formant peak. The value may be negative. | |||
| - - - - | |||
| **echo \<delay\> \<amplitude\>** | |||
| Parameter 1 gives the delay in ms (0 to 250 ms).\ | |||
| Parameter 2 gives the echo amplitude (0 to 100).\ | |||
| Adding some echo can give a clearer or more interesting sound, | |||
| especially when listening through a domestic stereo sound system, rather | |||
| than small computer speakers. | |||
| **tone** | |||
| Controls the tone of the sound.\ | |||
| **tone** is followed by up to 4 pairs of \<frequency\> \<amplitude\> | |||
| which define a frequency response graph. Frequency is in Hz and | |||
| amplitude is in the range 0 to 255. The default is: | |||
| `tone 600 170 1200 135 2000 110`{.western} | |||
| This means that from frequency 0Hz to 600Hz the amplitude is 170. From | |||
| 600Hz to 1200Hz the amplitude decreases from 170 to 135, then decreases | |||
| to 110 at 2000Hz and remains at 110 at higher frequencies. This | |||
| adjustment applies only to voiced sounds such as vowels and sonorant | |||
| consonants (such as [n] and [l]). Unvoiced sounds such as [s] are | |||
| unaffected. | |||
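The frequency response described by a `tone` statement can be sketched as a piecewise-linear function of frequency. In the Python sketch below, linear interpolation between the listed points is an assumption (the text only says the amplitude "decreases" between them), and `tone_amplitude` is a name invented for the example.

```python
def tone_amplitude(freq_hz, points):
    """Amplitude at `freq_hz` for a list of (frequency, amplitude) pairs.

    Flat below the first point and above the last; between points the
    amplitude is interpolated linearly (an assumption of this sketch).
    """
    if freq_hz <= points[0][0]:
        return float(points[0][1])
    for (f0, a0), (f1, a1) in zip(points, points[1:]):
        if freq_hz <= f1:
            return a0 + (a1 - a0) * (freq_hz - f0) / (f1 - f0)
    return float(points[-1][1])


# The default statement: tone 600 170 1200 135 2000 110
DEFAULT_TONE = [(600, 170), (1200, 135), (2000, 110)]
```

For the default values this gives 170 up to 600 Hz, a value between 170 and 135 from 600 Hz to 1200 Hz, and 110 at 2000 Hz and above.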
| This **tone** statement can also appear in | |||
| `espeak-data/config`{.western}, in which case it applies to all voices | |||
| which don't have their own **tone** statement. | |||
| **flutter \<value\>** | |||
| Default value: 2.\ | |||
| Adds pitch fluctuations to give a wavering or older-sounding voice. A | |||
| large value (eg. 20) makes the voice sound "croaky". | |||
| **roughness \<value\>** | |||
| Default value: 2. Range 0 - 7\ | |||
| Reduces the amplitude of alternate waveform cycles in order to make the | |||
| voice sound creaky. | |||
| **voicing \<value\>** | |||
| Default value: 100.\ | |||
| Adjusts the strength of formant-synthesized sounds (vowels and sonorant | |||
| consonants). | |||
| **consonants \<value\> \<value\>** | |||
| Default values: 100, 100.\ | |||
| Adjusts the strength of noise sounds which are used in consonants. The | |||
| first value is the strength of unvoiced consonants such as "s" and "t". | |||
| The second value is the strength of the noise component of voiced | |||
| consonants such as "z" and "d". | |||
| **breath \<up to 8 integer values\>** | |||
| Default values: 0.\ | |||
| Adds noise which corresponds to the formant frequency peaks. The values | |||
| give the strength of noise for each formant peak (formants 1 to 8). | |||
| Use together with a low or zero value of the **voicing** attribute to | |||
| make a "whisper". For example:\ | |||
| `breath 75 75 60 40 15 10`{.western}\ | |||
| `breathw 150 150 200 200 400 400`{.western}\ | |||
| `voicing 18`{.western}\ | |||
| `flutter 20`{.western}\ | |||
| `formant 0 100 0 100 // remove formant 0`{.western} | |||
| **breathw \<up to 8 integer values\>** | |||
| These values give bandwidths of the noise peaks of the **breath** | |||
| attribute. If **breathw** values are not given, then suitable default | |||
| values will be used. | |||
| **speed \<value\>** | |||
| Default value 100.\ | |||
| Adjusts the speaking speed by a percentage of the default rate. This | |||
| can be used if a language voice seems faster or slower compared to other | |||
| voices. | |||
| **phonemes \<name\>** | |||
| Specifies which set of phonemes to use from those contained in the | |||
| phontab, phonindex, and phondata data files. This is a **phonemetable** | |||
| name as given in the "phoneme" source file. | |||
| This parameter is usually not needed as it is set by default to the | |||
| first two letters of the "language" parameter. However, different voices | |||
| of the same language can use different phoneme sets, to give different | |||
| accents. | |||
| **dictionary \<name\>** | |||
| Specifies which pair of dictionary files to use. eg. "en" indicates | |||
| that *espeak-data/en\_dict* should be used to translate from words to | |||
| phonemes. This parameter is usually not needed as it is set by default | |||
| to the first two letters of the "language" parameter. | |||
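| A sketch of a voice file which sets both attributes explicitly instead | |||
| of relying on the defaults. The names **en-us** and **en** are | |||
| assumptions for illustration; use a phonemetable name and a dictionary | |||
| name which actually exist in your data: | |||
| ~~~~ {.western} | |||
| language en-us | |||
| phonemes en-us    // assumed phonemetable name, to give a different accent | |||
| dictionary en     // assumed to select the compiled en_dict file | |||
| ~~~~ | |||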
| **dictrules \<list of rule numbers\>** | |||
| Gives a list of conditional dictionary rules which are applied for this | |||
| voice. Rule numbers are in the range 0 to 31 and are specific to a | |||
| language dictionary. They apply to rules in the language's **\_rules** | |||
| dictionary file and also its **\_list** exceptions list. See | |||
| [dictionary.html](dictionary.html). | |||
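| For example, a voice might enable two conditional rules as follows. | |||
| The rule numbers 1 and 3 are hypothetical; what they do depends on the | |||
| language's dictionary files: | |||
| ~~~~ {.western} | |||
| dictrules 1 3   // apply conditional rules 1 and 3 (range 0 to 31) | |||
| ~~~~ | |||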
| **replace \<flags\> \<phoneme\> \<replacement phoneme\>** | |||
| Replace a phoneme by another whenever it occurs. | |||
| \<replacement phoneme\> may be NULL. | |||
| Flags: bit 0: replacement only occurs on the final phoneme of a word.\ | |||
| Flags: bit 1: replacement doesn't occur in stressed syllables.\ | |||
| eg. | |||
| ~~~~ {.western} | |||
| replace 0 h NULL // drops h's | |||
| replace 0 V U // replaces vowel in 'strut' by that in 'foot' | |||
| // as occurs in northern British English | |||
| replace 3 N n // change 'fishing' to 'fishin' etc. | |||
| // (only the last phoneme of a word, only in unstressed syllables) | |||
| ~~~~ | |||
| The phoneme mnemonics can be defined for each language, but some are | |||
| listed in [phonemes.html](phonemes.html). | |||
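| The flags value is the sum of the bits which are set: bit 0 | |||
| contributes 1 and bit 1 contributes 2. The lines below sketch the four | |||
| possible values using the same phonemes as the example above. They are | |||
| alternatives for comparison, not lines to use together: | |||
| ~~~~ {.western} | |||
| replace 0 N n   // no bits set: replace wherever the phoneme occurs | |||
| replace 1 N n   // bit 0: only as the final phoneme of a word | |||
| replace 2 N n   // bit 1: only in unstressed syllables | |||
| replace 3 N n   // bits 0 and 1: both restrictions apply | |||
| ~~~~ | |||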
| **stressLength \<8 integer values\>** | |||
| Eight integer parameters. These control the relative lengths of the | |||
| vowels in stressed and unstressed syllables. | |||
| **stressAdd \<8 integer values\>** | |||
| Eight integer parameters. These are added to the voice's corresponding | |||
| stressLength values. They are used in the voice variant files in | |||
| `espeak-data/voices/!v`{.western} to give some variety. Negative values | |||
| may be used. | |||
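| As a worked sketch of how stressAdd combines with stressLength. All | |||
| values here are hypothetical, chosen only to make the arithmetic easy | |||
| to follow: | |||
| ~~~~ {.western} | |||
| // in the voice file | |||
| stressLength 150 140 180 180 0 0 200 200 | |||
| // in a variant file | |||
| stressAdd 10 10 0 0 0 0 -20 -20 | |||
| // resulting lengths: 160 150 180 180 0 0 180 180 | |||
| ~~~~ | |||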
| **stressAmp \<8 integer values\>** | |||
| Eight integer parameters. These control the relative amplitudes of the | |||
| vowels in stressed and unstressed syllables (see stressLength above). | |||
| The general default values are: 16, 16, 20, 20, 20, 24, 24, 22, although | |||
| these defaults may be different for particular languages. | |||
| **intonation \<param1\>** | |||
| **charset \<param1\>** | |||
| The ISO 8859 character set number (not all are implemented). | |||
| **dictmin \<value\>** | |||
| Used for some languages to detect if additional language data is | |||
| installed. If the size of the compiled dictionary data for the language | |||
| (the file `espeak-data/*_dict`{.western}) is less than this size then a | |||
| warning is given. | |||
| **alphabet2 \<alphabet\> \<language\>** | |||
| Used to specify a language to be used to speak words which are written | |||
| in a non-native alphabet. eg: | |||
| ~~~~ {.western style="margin-bottom: 0.5cm"} | |||
| alphabet2 cyr ru | |||
| ~~~~ | |||
| Alphabet names include: latin, cyr (cyrillic), ar (arabic). The default | |||
| language for the latin alphabet is English. | |||
| **dictdialect \<dialect\>** | |||
| Words can be marked in the \*\_list or \*\_rules file to be spoken using | |||
| a foreign voice. This **dictdialect** attribute can be used to specify | |||
| which dialect of the foreign language should be used, instead of the | |||
| default dialect. The currently available dialects are:\ | |||
| **en-us** (US English)\ | |||
| **es-la** (Latin American Spanish).\ | |||
| eg. | |||
| ~~~~ {.western style="margin-bottom: 0.5cm"} | |||
| dictdialect en-us | |||
| ~~~~ | |||
| This means that any words or rules which are marked with \_\^\_EN will be | |||
| spoken with the US English voice instead of the default UK English | |||
| voice. | |||
| Additional attributes are available to set various internal options | |||
| which control how language is processed. These would normally be set in | |||
| the program code rather than in a voice file. | |||
| A number of voice files are provided in the | |||
| `espeak-data/voices`{.western} directory. You can select one of these | |||
| with the **-v \<voice filename\>** parameter to the speak command. | |||
| **default** | |||
| This voice is used if none is specified in the speak command. You can | |||
| copy your preferred voice to "default" so you can use the speak command | |||
| without the need to specify a voice. | |||
| For a list of voices provided for English and other languages see | |||
| [Languages](languages.html). | |||