6. ADDING OR IMPROVING A LANGUAGE {.western} | |||||
--------------------------------- | |||||
Most of the work doesn't need any programming knowledge, just an
understanding of the language, an awareness of its features, patience
and attention to detail. Wikipedia is a good source of basic phonetic
information, eg.
[http://en.wikipedia.org/wiki/Vowel](http://en.wikipedia.org/wiki/Vowel).
In many cases it should be fairly easy to add a rough implementation of
a new language, hopefully enough to be intelligible. After that it's a
gradual process of improvement.
### 6.1 Language Code {.western} | |||||
Generally, the language's international [ISO | |||||
639-1](http://en.wikipedia.org/wiki/ISO_639-1) code is used to identify | |||||
the language. It is used in the filenames which contain the language's
data. The examples below use the code **"fr"**.
Replace this with the code of your language.
If the language does not have a 2-letter ISO\_639-1 code, then use the | |||||
3-letter ISO\_639-3 code. Language codes may differ from country codes. | |||||
It is possible to have different variants of a language for different
dialects. For example, the sounds of some phonemes may be changed, or some of
the pronunciation rules may differ.
### 6.2 Language Files {.western} | |||||
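The following files are needed for your language:
- **fr\_rules**: the spelling-to-phoneme rules (see 6.5).
- **fr\_list**: the exceptions list and attributes of certain words (see 6.5).
- **ph\_french**: the phoneme definition file (see 6.4).
- **fr**: the voice file in **espeak-data/voices** (see 6.3).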
The **fr\_rules** and **fr\_list** files are compiled to produce the | |||||
file **espeak-data/fr\_dict**, which eSpeak uses when it is speaking. | |||||
### 6.3 Voice File {.western} | |||||
Each language needs a voice file in **espeak-data/voices** or | |||||
**espeak-data/voices/test**. The filename of the default voice for a | |||||
language should be the same as the language code (eg. "fr" for French). | |||||
Details of the contents of voice files are given in | |||||
[voices.html](http://espeak.sf.net/voices.html). | |||||
The simplest voice file would contain just 2 lines to give the language | |||||
name and language code, eg: | |||||
~~~~ {.western} | |||||
name french | |||||
language fr | |||||
~~~~ | |||||
This language code specifies which phoneme table and dictionary are used
(i.e. **phonemetable fr** and **espeak-data/fr\_dict**). If
needed, these can be overridden by **phonemes** and **dictionary**
attributes in the voice file. For example, you may want to start the
implementation of a new language by using the phoneme table of an
existing language.
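As a minimal sketch of the above, the following Python helper produces the text of a voice file with the two required lines and, optionally, the **phonemes** and **dictionary** overrides. The function name and the language code "xx" are hypothetical and used only for illustration; the attribute names are the ones mentioned in this section.

```python
def make_voice_file(name, language, phonemes=None, dictionary=None):
    """Return the text of a minimal voice file (illustrative helper)."""
    lines = [f"name {name}", f"language {language}"]
    # Optional overrides, e.g. to borrow the phoneme table of an existing language.
    if phonemes is not None:
        lines.append(f"phonemes {phonemes}")
    if dictionary is not None:
        lines.append(f"dictionary {dictionary}")
    return "\n".join(lines) + "\n"


# The two-line French example from the text.
print(make_voice_file("french", "fr"), end="")

# A hypothetical new language "xx" that starts out with the French phoneme table.
print(make_voice_file("mylang", "xx", phonemes="fr"), end="")
```

The resulting text would be saved under the language code as its filename in **espeak-data/voices** or **espeak-data/voices/test**.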
### 6.4 Phoneme Definition File {.western} | |||||
You must first decide on the set of phonemes (vowel and consonant
sounds) for the language. These should be defined in a phoneme
definition file **ph\_xxxx**, where "xxxx" is the name of your
language. A reference to this file is then included at the end of the
master phoneme file, **phsource/phonemes**, eg:
~~~~ {.western} | |||||
phonemetable fr base | |||||
include ph_french | |||||
~~~~ | |||||
This example defines a phoneme table **"fr"** which inherits the | |||||
contents of phoneme table **"base"**. Its contents are found in the file | |||||
**ph\_french**. | |||||
The **base** phoneme table contains definitions of a basic set of | |||||
consonants, and also some "control" phonemes such as stress marks and | |||||
pauses. These are defined in **phsource/phonemes**. The phoneme table | |||||
for a language will inherit these, or alternatively it may inherit the | |||||
phoneme table of another language which in turn inherits the **base** | |||||
phoneme table. | |||||
The phonemes file for the language defines those additional phonemes | |||||
which are not inherited (generally the vowels and diphthongs, plus any | |||||
additional consonants that are needed), or phonemes whose definitions | |||||
differ from the inherited version (eg. the redefinition of a consonant). | |||||
Details of phonemes files are given in | |||||
[phontab.html](http://espeak.sf.net/phontab.html). | |||||
The **Compile phoneme data** function of the **espeakedit** program | |||||
compiles the phonemes files of all languages to produce the files | |||||
**espeak-data/phontab**, **phonindex**, and **phondata** which are used | |||||
by eSpeak. | |||||
For many languages, the consonant phonemes which are already available
in eSpeak, together with the available vowel files which can be used to
define vowel phonemes, will be sufficient, at least for an initial
implementation.
### 6.5 Dictionary Files {.western} | |||||
Once the language's phonemes have been defined, then pronunciation | |||||
dictionary data can be produced in order to translate the language's | |||||
source text into phonemes. This consists of two source files: | |||||
**fr\_rules** (the spelling to phoneme rules) and **fr\_list** (an | |||||
exceptions list, and attributes of certain words). The corresponding | |||||
compiled data file is **espeak-data/fr\_dict** which is produced from | |||||
**fr\_rules** and **fr\_list** sources by the command: | |||||
> `espeak-ng --compile=fr`{.western}. | |||||
Or by using the **espeakedit** program. | |||||
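When the dictionary is recompiled often during development, the command can be wrapped in a small script. The sketch below only builds and runs the `espeak-ng --compile=<code>` command shown above; the function names are hypothetical, and running it assumes that **espeak-ng** is on the command path and that the source files are in the directory passed as `source_dir`.

```python
import subprocess


def compile_command(language):
    """Build the argument list for compiling one language's dictionary."""
    return ["espeak-ng", f"--compile={language}"]


def compile_dictionary(language, source_dir):
    """Run the compile command from the directory holding the source files.

    The *_rules and *_list sources are read from the current directory,
    so the command is started with source_dir as its working directory.
    """
    return subprocess.run(compile_command(language), cwd=source_dir, check=True)


print(" ".join(compile_command("fr")))
```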
Details of the contents of the dictionary files are given in | |||||
[dictionary.html](http://espeak.sf.net/dictionary.html). | |||||
The **fr\_list** file contains: | |||||
- - - - | |||||
### 6.6 Program Code {.western} | |||||
The behaviour of the eSpeak program is controlled by various options | |||||
such as: | |||||
- - - - | |||||
The function SetTranslator() at the start of the source code file | |||||
tr\_languages.cpp recognizes the language code and sets the appropriate | |||||
options. For a new language, you would add its language code and the | |||||
required options in SetTranslator(). However, this may not be necessary | |||||
during testing because most of the options can also be set in the voice | |||||
file in espeak-data/voices (see [Voice | |||||
files](http://espeak.sf.net/voices.html)). | |||||
### 6.7 Improving a Language {.western} | |||||
Listen carefully to the eSpeak voice. Try to identify what sounds wrong | |||||
and what needs to be improved. | |||||
- - - - - | |||||
**If you are interested in working on a language, please contact me so | |||||
that I can set up the initial data and discuss the features of the | |||||
language.** | |||||
For most of the eSpeak voices, I do not speak or understand the | |||||
language, and I do not know how it should sound. I can only make | |||||
improvements as a result of feedback from speakers of that language. If | |||||
you want to help to improve a language, listen carefully and try to | |||||
identify individual errors, either in the spelling-to-phoneme | |||||
translation, the position of stressed syllables within words, or the | |||||
sound of phonemes, or problems with rhythm and vowel lengths. |
ANALYSIS | |||||
======== | |||||
(Further notes are needed) | |||||
Recordings of spoken words and phrases can be analysed to try and make | |||||
eSpeak match a language more closely. Unlike most other (larger and | |||||
better quality) synthesizers, eSpeak's data is not produced directly | |||||
from recorded sounds. To use an analogy, it's like a drawing or sketch | |||||
compared with a photograph. Or vector graphics compared with a bitmap | |||||
image. It's smaller, less accurate, with less subtlety, but it can | |||||
sometimes show some aspects of the picture more clearly than a more | |||||
accurate image. | |||||
#### Recording Sounds {.western} | |||||
Recordings should be made while speaking slowly, clearly, and firmly and | |||||
loudly (but not shouting). Speak about half a metre from the microphone. | |||||
Try to avoid background noise and hum interference from electrical power | |||||
cables. | |||||
#### Praat {.western} | |||||
I use a modified version of the Praat program
([www.praat.org](http://www.praat.org)) to view and analyse both sound
recordings and output from eSpeak. The modification adds a new function
(`Spectrum->To_eSpeak`{.western}) which analyses a voiced sound and
produces a file which can be loaded into espeakedit. Details of the
modification are in the `"praat-mod"`{.western} directory in the | |||||
espeakedit package. The analysis contains a sequence of frames, one per | |||||
cycle at the speech's fundamental frequency. Each frame is a short time | |||||
spectrum, together with praat's estimation of the f1 to f5 formant | |||||
frequencies at the time of that cycle. I also use Praat's | |||||
`New->Record_mono_sound`{.western} function to make sound recordings. | |||||
### Vowels and Diphthongs {.western} | |||||
#### Analysing a Recording {.western} | |||||
Make a recording, with a male voice, and trim it in Praat to keep just | |||||
the required vowel sound. Then use the new | |||||
`Spectrum->To_eSpeak`{.western} modification (this was named | |||||
`To_Spectrogram2`{.western} in earlier versions) to analyse the sound. | |||||
It produces a file named `"spectrum.dat"`{.western}. Load the | |||||
`"spectrum.dat"`{.western} file into espeakedit. Espeakedit has two Open | |||||
functions, `File->Open`{.western} and `File->Open2`{.western}. They are | |||||
the same, except that they remember different paths. I generally use | |||||
`File->Open2`{.western} for reading the `"spectrum.dat"`{.western} file. | |||||
The data is displayed in espeakedit as a sequence of spectrum frames | |||||
(see [editor.html](editor.html)). | |||||
#### Tone Quality {.western} | |||||
It can be difficult to match the tonal quality of a new vowel to be | |||||
compatible with existing vowel files. This is determined by the relative | |||||
heights and widths of the formant peaks. These vary depending on how the | |||||
recording was made, the microphone, and the strength and tone of the | |||||
voice. Also the positions of the higher peaks (F3 upwards) can vary | |||||
depending on the characteristics of the speaker's voice. Formant peaks | |||||
correspond to resonances within the mouth and throat, and they depend on | |||||
its size and shape. With a female voice, all the formants (F1 upwards) | |||||
are generally shifted to higher frequencies. For these reasons, it's | |||||
best to use a male voice, and to use its analysed spectra only as | |||||
guidance. Rather than construct formant-peaks entirely to match the | |||||
analysed data, instead copy keyframes from a similar existing vowel. | |||||
Then make small adjustments to match the position of the F1, F2, F3 | |||||
formant peaks and hopefully produce the required vowel sound. | |||||
#### Using an Existing Vowel File {.western} | |||||
Choose a similar vowel file from `phsource/vowel`{.western} and open it
in espeakedit. It may be useful to use
`phsource/vowel/vowelchart`{.western} as a map to show how vowel files | |||||
compare with each other. You can select a keyframe from the vowel file | |||||
and use CTRL-C and CTRL-V to copy the green formant peaks onto a frame | |||||
of the new spectrum sequence. Then adjust the peaks to match the new | |||||
frame. Press F1 to hear the sound of the formant peaks in the selected | |||||
frame. The F0 peak is provided in order to adjust the correct balance of | |||||
low frequencies, below the F1 peak. If the sound is too muffled, or | |||||
conversely, too "thin", try adjusting the amplitude or position of the | |||||
F0 peak. | |||||
#### Length and Amplitude {.western} | |||||
Use an existing vowel file as a guide for how to set the amplitude and | |||||
length of the keyframes. At the right of each keyframe, its length is
shown in ms and under that is its relative (RMS) amplitude. The second
keyframe should be marked with a red marker (use CTRL-M to toggle this). | |||||
This divides the vowel into the front-part (with one frame), and the | |||||
rest. Use F2 to play the sound of the new vowel sequence. It will also | |||||
produce a WAV file (the default name is speech.wav) which you can read | |||||
into praat to see whether it has a sensible shape. | |||||
#### Using the New Vowel {.western} | |||||
Make a new directory (eg. vwl\_xx) in phsource for your new vowels. Save | |||||
the spectrum sequence with a name which you have chosen for it. You can | |||||
then edit the phoneme file for your language (eg. phsource/ph\_xxx), and | |||||
change a phoneme to refer to your new vowel file. Then do | |||||
`Data->Compile_Phoneme_Data`{.western} from espeakedit's menubar to | |||||
re-compile the phoneme data. |
2.1 INSTALLATION {.western} | |||||
---------------- | |||||
### 2.1.1 Linux and other Posix systems {.western} | |||||
There are two versions of the command line program. They both have the | |||||
same command parameters (see below). | |||||
1. 2. | |||||
Place the **espeak-ng** or **speak-ng** executable file in the command | |||||
path, eg in **/usr/local/bin** | |||||
Place the "**espeak-data**" directory in /usr/share as | |||||
**/usr/share/espeak-data**.\ | |||||
Alternatively if it is placed in the user's home directory (i.e. | |||||
**/home/\<user\>/espeak-data**) then that will be used instead. | |||||
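The lookup order described above can be sketched as follows. This is only an illustration of the documented behaviour (a copy in the user's home directory is preferred over the one in /usr/share), not the program's actual code; the function name is hypothetical.

```python
import os


def find_data_dir(home, system_dir="/usr/share/espeak-data"):
    """Return the espeak-data directory that would be used, or None."""
    # A copy in the user's home directory takes precedence.
    candidate = os.path.join(home, "espeak-data")
    if os.path.isdir(candidate):
        return candidate
    # Otherwise fall back to the system-wide location.
    if os.path.isdir(system_dir):
        return system_dir
    return None


print(find_data_dir(os.path.expanduser("~")))
```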
#### Dependencies {.western} | |||||
**espeak-ng** uses the PortAudio sound library (version 18), so you will
need to have the **libportaudio0** library package installed. It may
already be installed, since it's used by other software, such as
OpenOffice.org and the Audacity sound editor.
Some Linux distributions (eg. SuSe 10) have version 19 of PortAudio
which has a slightly different API. The speak program can be compiled to | |||||
use version 19 of PortAudio by copying the file portaudio19.h to | |||||
portaudio.h before compiling. | |||||
The speak program may be compiled without using PortAudio, by removing | |||||
the line | |||||
~~~~ {.western style="margin-bottom: 0.5cm"} | |||||
#define USE_PORTAUDIO | |||||
~~~~ | |||||
in the file speech.h. | |||||
### 2.1.2 Windows {.western} | |||||
The installer, **setup\_espeak.exe**, installs the SAPI5 version of
eSpeak. During installation you need to specify which voices you want to | |||||
appear in SAPI5 voice menus. | |||||
It also installs a command line program **espeak-ng** in the espeak-ng | |||||
program directory. | |||||
2.2 COMMAND OPTIONS {.western} | |||||
------------------- | |||||
### 2.2.1 Examples {.western} | |||||
To use at the command line, type:\ | |||||
**espeak-ng "This is a test"**\ | |||||
or\ | |||||
**espeak-ng -f \<text file\>** | |||||
Or just type\ | |||||
**espeak-ng**\ | |||||
followed by text on subsequent lines. Each line is spoken when RETURN | |||||
is pressed. | |||||
Use **espeak-ng -x** to see the corresponding phoneme codes. | |||||
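When espeak-ng is called from a script, it can help to build the argument list in one place and check values before running the command. The sketch below uses only the `-a`, `-p`, `-s` and `-v` options with the ranges and defaults documented in the options list (amplitude 0 to 200, pitch 0 to 99, speed at least 80). The function name is hypothetical.

```python
def speak_command(text, amplitude=100, pitch=50, speed=175, voice=None):
    """Build an espeak-ng argument list, checking the documented ranges."""
    if not 0 <= amplitude <= 200:
        raise ValueError("amplitude (-a) must be in the range 0 to 200")
    if not 0 <= pitch <= 99:
        raise ValueError("pitch (-p) must be in the range 0 to 99")
    if speed < 80:
        raise ValueError("speed (-s) has a lower limit of 80")
    cmd = ["espeak-ng", "-a", str(amplitude), "-p", str(pitch), "-s", str(speed)]
    if voice is not None:
        cmd += ["-v", voice]
    cmd.append(text)
    return cmd


print(speak_command("This is a test"))
print(speak_command("This is a test", speed=260, voice="af"))
```

The returned list can be passed to `subprocess.run()` on a system where espeak-ng is installed.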
### 2.2.2 The Command Line Options {.western} | |||||
**espeak-ng [options] ["text words"]** | |||||
: Text input can be taken either from a file, from a string in the | |||||
command, or from stdin. | |||||
**-f \<text file\>** | |||||
: Speaks a text file. | |||||
**--stdin** | |||||
: Takes the text input from stdin. | |||||
If neither -f nor --stdin is given, then the text input is taken from "text words" (a text string within double quotes). \ | |||||
If that is not present then text is taken from stdin, but each line is treated as a separate sentence. \ | |||||
**-a \<integer\>** | |||||
: Sets amplitude (volume) in a range of 0 to 200. The default is 100. | |||||
**-p \<integer\>** | |||||
: Adjusts the pitch in a range of 0 to 99. The default is 50. | |||||
**-s \<integer\>** | |||||
: Sets the speed in words-per-minute (approximate values for the | |||||
default English voice, others may differ slightly). The default | |||||
value is 175. I generally use a faster speed of 260. The lower limit | |||||
is 80. There is no upper limit, but about 500 is probably a | |||||
practical maximum. | |||||
**-b \<integer\>** | |||||
: Input text character format. | |||||
: 1 UTF-8. This is the default. | |||||
: 2 The 8-bit character set which corresponds to the language (eg. | |||||
Latin-2 for Polish). | |||||
: 4 16 bit Unicode. | |||||
: Without this option, eSpeak assumes text is UTF-8, but will | |||||
automatically switch to the 8-bit character set if it finds an | |||||
illegal UTF-8 sequence. | |||||
**-g \<integer\>** | |||||
: Word gap. This option inserts a pause between words. The value is
the length of the pause, in units of 10 ms (at the default speed of
175 wpm).
**-h** or **--help**
: The first line of output gives the eSpeak version number. | |||||
**-k \<integer\>** | |||||
: Indicate words which begin with capital letters. | |||||
: 1 eSpeak uses a click sound to indicate when a word starts with a | |||||
capital letter, or double click if word is all capitals. | |||||
: 2 eSpeak speaks the word "capital" before a word which begins with | |||||
a capital letter. | |||||
: Other values: eSpeak increases the pitch for words which begin | |||||
with a capital letter. The greater the value, the greater the | |||||
increase in pitch. Try -k20. | |||||
**-l \<integer\>** | |||||
: Line-break length, default value 0. If set, then lines which are | |||||
shorter than this are treated as separate clauses and spoken | |||||
separately with a break between them. This can be useful for some | |||||
text files, but bad for others. | |||||
**-m** | |||||
: Indicates that the text contains SSML (Speech Synthesis Markup | |||||
Language) tags or other XML tags. Those SSML tags which are | |||||
supported are interpreted. Other tags, including HTML, are ignored, | |||||
except that some HTML tags such as \<hr\> \<h2\> and \<li\> ensure a | |||||
break in the speech. | |||||
**-q** | |||||
: Quiet. No sound is generated. This may be useful with options such | |||||
as -x and --pho. | |||||
**-v \<voice filename\>[+\<variant\>]** | |||||
: Sets a Voice for the speech, usually to select a language. eg: | |||||
~~~~ {.western style="margin-left: 1cm; margin-bottom: 0.5cm"} | |||||
espeak-ng -vaf | |||||
~~~~ | |||||
To use the Afrikaans voice. A modifier after the voice name can be used | |||||
to vary the tone of the voice, eg: | |||||
~~~~ {.western style="margin-left: 1cm; margin-bottom: 0.5cm"} | |||||
espeak-ng -vaf+3 | |||||
~~~~ | |||||
The variants are `+m1 +m2 +m3 +m4 +m5 +m6 +m7`{.western} for male voices
and `+f1 +f2 +f3 +f4`{.western}, which simulate female voices by using
higher pitches. Other variants include `+croak`{.western} and
`+whisper`{.western}.
\<voice filename\> is a file within the `espeak-data/voices`{.western} | |||||
directory.\ | |||||
\<variant\> is a file within the `espeak-data/voices/!v`{.western} | |||||
directory. | |||||
Voice files can specify a language, alternative pronunciations or | |||||
phoneme sets, different pitches, tonal qualities, and prosody for the | |||||
voice. See the [voices.html](voices.html) file. | |||||
Voice names which start with **mb-** are for use with Mbrola diphone | |||||
voices, see [mbrola.html](mbrola.html) | |||||
Some languages may need additional dictionary data, see | |||||
[languages.html](languages.html) | |||||
**-w \<wave file\>** | |||||
Writes the speech output to a file in WAV format, rather than speaking | |||||
it. | |||||
**-x** | |||||
The phoneme mnemonics, into which the input text is translated, are | |||||
written to stdout. If a phoneme name contains more than one letter (eg. | |||||
[tS]), the --sep or --tie option can be used to distinguish this from | |||||
separate phonemes. | |||||
**-X** | |||||
As -x, but in addition, details are shown of the pronunciation rule and | |||||
dictionary list lookup. This can be useful to see why a certain | |||||
pronunciation is being produced. Each matching pronunciation rule is | |||||
listed, together with its score, the highest scoring rule being used in | |||||
the translation. "Found:" indicates the word was found in the dictionary | |||||
lookup list, and "Flags:" means the word was found with only properties | |||||
and not a pronunciation. You can see when a word has been retranslated | |||||
after removing a prefix or suffix. | |||||
**-z** | |||||
This option removes the end-of-sentence pause which normally occurs at
the end of the text. | |||||
**--stdout** | |||||
Writes the speech output to stdout as it is produced, rather than | |||||
speaking it. The data starts with a WAV file header which indicates the | |||||
sample rate and format of the data. The length field is set to zero | |||||
because the length of the data is unknown when the header is produced. | |||||
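A program which reads this stream needs to take the sample rate and format from the header and then read samples until end of input, rather than trusting the length field. The sketch below assumes the canonical 44-byte PCM WAV header layout, which this document does not specify, and it checks the data chunk size as the "length field". The function name is hypothetical and the header in the example is synthetic, not captured from eSpeak.

```python
import struct

# Canonical 44-byte PCM WAV header layout (an assumption, see above).
WAV_HEADER = "<4sI4s4sIHHIIHH4sI"


def parse_wav_header(header):
    """Decode the first 44 bytes of a WAV stream into a dictionary."""
    (riff, _riff_size, wave, fmt_id, _fmt_size, audio_format, channels,
     sample_rate, _byte_rate, _block_align, bits_per_sample,
     data_id, data_size) = struct.unpack(WAV_HEADER, header[:44])
    if (riff, wave, fmt_id, data_id) != (b"RIFF", b"WAVE", b"fmt ", b"data"):
        raise ValueError("not a canonical PCM WAV header")
    return {
        "audio_format": audio_format,
        "channels": channels,
        "sample_rate": sample_rate,
        "bits_per_sample": bits_per_sample,
        # Zero means the length was unknown when the header was written.
        "length_known": data_size != 0,
    }


# Synthetic header with arbitrary example values and a zero length field.
example = struct.pack(WAV_HEADER, b"RIFF", 0, b"WAVE", b"fmt ", 16,
                      1, 1, 22050, 44100, 2, 16, b"data", 0)
print(parse_wav_header(example))
```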
**--compile [=\<voice name\>]** | |||||
Compile the pronunciation rule and dictionary lookup data from their | |||||
source files in the current directory. The Voice determines which | |||||
language's files are compiled. For example, if it's an English voice, | |||||
then *en\_rules*, *en\_list*, and *en\_extra* (if present), are compiled | |||||
to replace *en\_dict* in the *espeak-data* directory. If no Voice is
specified then the default Voice is used. | |||||
**--compile-debug [=\<voice name\>]** | |||||
The same as **--compile**, but source line numbers from the \*\_rules | |||||
file are included. These are included in the rules trace when the **-X** | |||||
option is used. | |||||
**--ipa** | |||||
Writes phonemes to stdout, using the International Phonetic Alphabet | |||||
(IPA).\ | |||||
If a phoneme name contains more than one letter (eg. [tS]), the --sep | |||||
or --tie option can be used to distinguish this from separate phonemes. | |||||
**--path [="\<directory path\>"]** | |||||
Specifies the directory which contains the espeak-data directory. | |||||
**--pho** | |||||
When used with an mbrola voice (eg. -v mb-en1), it writes mbrola phoneme | |||||
data (.pho file format) to stdout. This includes the mbrola phoneme | |||||
names with duration and pitch information, in a form which is suitable | |||||
as input to this mbrola voice. The --phonout option can be used to write | |||||
this data to a file. | |||||
**--phonout [="\<filename\>"]** | |||||
If specified, the output from -x, -X, --ipa, and --pho options is | |||||
written to this file, rather than to stdout. | |||||
**--punct [="\<characters\>"]** | |||||
Speaks the names of punctuation characters when they are encountered in | |||||
the text. If \<characters\> are given, then only those listed | |||||
punctuation characters are spoken, eg. `--punct=".,;?"`{.western} | |||||
**--sep [=\<character\>]** | |||||
The character is used to separate individual phonemes in the output | |||||
which is produced by the -x or --ipa options. The default is a space | |||||
character. The character z means use a ZWNJ character (U+200c). | |||||
**--split [=\<minutes\>]** | |||||
Used with **-w**, it starts a new WAV file every `<minutes>`{.western} | |||||
minutes, at the next sentence boundary. | |||||
**--tie [=\<character\>]** | |||||
The character is used within multi-letter phonemes in the output which | |||||
is produced by the -x or --ipa options. The default is the tie | |||||
character ͡ U+361. The character z means use a ZWJ character (U+200d). | |||||
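The difference between the two options can be illustrated with a naive model of the output: `--sep` puts a character between phonemes, while `--tie` puts a character between the letters of a multi-letter phoneme. The sketch below is not eSpeak's code, and the phoneme list is an arbitrary example. It only shows how the special value `z` and the defaults described above map to characters, and why [tS] can then be told apart from [t] followed by [S].

```python
ZWNJ = "\u200c"         # used when --sep=z
ZWJ = "\u200d"          # used when --tie=z
DEFAULT_TIE = "\u0361"  # combining tie character


def format_with_sep(phonemes, arg=None):
    """Join phoneme names with the separator selected by --sep."""
    sep = " " if arg is None else (ZWNJ if arg == "z" else arg)
    return sep.join(phonemes)


def format_with_tie(phonemes, arg=None):
    """Tie the letters inside each multi-letter phoneme, as selected by --tie."""
    tie = DEFAULT_TIE if arg is None else (ZWJ if arg == "z" else arg)
    return "".join(tie.join(name) for name in phonemes)


phonemes = ["tS", "I", "p"]  # arbitrary example list
print(format_with_sep(phonemes))        # separated by spaces
print(format_with_sep(phonemes, "-"))   # separated by hyphens
print(format_with_tie(phonemes, "_"))   # letters of "tS" tied with "_"
```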
**--voices [=\<language code\>]** | |||||
Lists the available voices.\ | |||||
If =\<language code\> is present then only those voices which are | |||||
suitable for that language are listed.\ | |||||
`--voices=mbrola`{.western} lists the voices which use mbrola diphone
voices. These are not included in the default `--voices`{.western} list.\
`--voices=variant`{.western} lists the available voice variants (voice
modifiers).
### 2.2.3 The Input Text {.western} | |||||
**HTML Input** | |||||
: If the -m option is used to indicate marked-up text, then HTML can | |||||
be spoken directly. | |||||
**Phoneme Input** | |||||
: As well as plain text, phoneme mnemonics can be used in the text | |||||
input to **espeak-ng**. They are enclosed within double square | |||||
brackets. Spaces are used to separate words and all stressed | |||||
syllables must be marked explicitly. | |||||
: eg: | |||||
`espeak-ng -v en "[[D,Is Iz sVm f@n'EtIk t'Ekst 'InpUt]]" `{.western} | |||||
: This command will speak: "This is some phonetic text input". | |||||
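When phoneme input is generated by a script, the double square brackets can be added by a small helper. This sketch only wraps a phoneme string and builds the same kind of command as the example above; the function names are hypothetical.

```python
def phoneme_input(mnemonics):
    """Enclose a string of phoneme mnemonics in double square brackets."""
    return f"[[{mnemonics}]]"


def speak_phonemes_command(mnemonics, voice="en"):
    """Build an espeak-ng argument list which speaks phoneme input."""
    return ["espeak-ng", "-v", voice, phoneme_input(mnemonics)]


print(speak_phonemes_command("D,Is Iz sVm f@n'EtIk t'Ekst 'InpUt"))
```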
4. TEXT TO PHONEME TRANSLATION {.western} | |||||
------------------------------ | |||||
### 4.1 Translation Files {.western} | |||||
There is a separate set of pronunciation files for each language, their | |||||
names starting with the language name. | |||||
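There are two separate methods for translating words into phonemes:
- Pronunciation rules, in the file ***\<language\>\_rules***, which translate spelling into phonemes.
- A lookup list, in the file ***\<language\>\_list***, which gives exceptions and attributes of certain words.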
These two files are compiled into the file ***\<language\>\_dict*** in | |||||
the espeak-data directory (eg. espeak-data/en\_dict) | |||||
### 4.2 Phoneme names {.western} | |||||
Each of the language's phonemes is represented by a mnemonic of 1, 2, 3, | |||||
or 4 characters. Together with a number of utility codes (eg. stress | |||||
marks and pauses), these are defined in the phoneme data file (see | |||||
\*spec not yet available\*). | |||||
The utility 'phonemes' are: | |||||
+--------------------------------------+--------------------------------------+ | |||||
| **'** | primary stress | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| **,** | secondary stress | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| **%** | unstressed syllable | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| **= ** | put the primary stress on the | | |||||
| | preceding syllable | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| **\_:** | short pause | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| **\_** | a shorter pause | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| **||** | indicates a word boundary within a | | |||||
| | phoneme string | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| **|** | can be used to separate two adjacent | | |||||
| | characters, to prevent them from | | |||||
| | being considered as a | | |||||
| | multi-character phoneme mnemonic | | |||||
+--------------------------------------+--------------------------------------+ | |||||
It is not necessary to specify the stress of every syllable. Stress | |||||
markers are only needed in order to change the effect of the language's | |||||
default stress rule. | |||||
The phonemes which are used to represent a language's sounds are based | |||||
loosely on the Kirshenbaum ascii character representation of the | |||||
International Phonetic Alphabet | |||||
[www.kirshenbaum.net/IPA/ascii-ipa.pdf](http://www.kirshenbaum.net/IPA/ascii-ipa.pdf) | |||||
### 4.3 Pronunciation Rules {.western} | |||||
The rules in the ***\<language\>\_rules*** file specify the phonemes | |||||
which are used to pronounce each letter, or sequence of letters. Some | |||||
rules only apply when the letter or letters are preceded by, or followed | |||||
by, other specified letters. | |||||
To find the pronunciation of a word, the rules are searched and any
which match the letters at the current position in the word are given a
score depending on how many letters are matched. The pronunciation from the best
matching rule is chosen. The pointer into the source word is then | |||||
advanced past those letters which have been matched and the process is | |||||
repeated until all the letters of the word have been processed. | |||||
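The search described above can be sketched in a few lines of Python. This is a simplified model, not eSpeak's implementation: the score here is just the number of letters matched in the preceding context, the match and the following context, and only literal letters are supported. The two "oo" rules follow the example discussed in section 4.3.2; the single-letter rules are hypothetical and only make the example complete.

```python
# Each rule is (pre, match, post, phonemes); pre and post may be empty.
RULES = [
    ("", "b", "", "b"),     # hypothetical
    ("", "k", "", "k"),     # hypothetical
    ("", "t", "", "t"),     # hypothetical
    ("", "o", "", "0"),     # hypothetical
    ("", "oo", "", "u:"),   # "oo" is pronounced [u:]
    ("b", "oo", "k", "U"),  # but [U] between "b" and "k"
]


def rule_score(word, pos, rule):
    """Return the number of letters a rule matches at pos, or None."""
    pre, match, post, _ = rule
    if not word.startswith(match, pos):
        return None
    if pre and not word[:pos].endswith(pre):
        return None
    if post and not word.startswith(post, pos + len(match)):
        return None
    return len(pre) + len(match) + len(post)


def translate(word, rules=RULES):
    """Translate a word into a list of phoneme strings."""
    pos, out = 0, []
    while pos < len(word):
        best = None
        for rule in rules:
            score = rule_score(word, pos, rule)
            if score is not None and (best is None or score > best[0]):
                best = (score, rule)
        if best is None:
            raise ValueError(f"no rule matches at position {pos} in {word!r}")
        out.append(best[1][3])
        # Advance past the matched letters only, not the context.
        pos += len(best[1][1])
    return out


print(translate("book"))  # ['b', 'U', 'k']
print(translate("boot"))  # ['b', 'u:', 't']
```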
#### 4.3.1 Rule Groups {.western} | |||||
The rules are organized in groups, each starting with a ".group" line: | |||||
When matching a word, firstly the 2-letter group for the two letters at | |||||
the current position in the word (if such a group exists) is searched, | |||||
and then the single-letter group. The highest scoring rule in either of | |||||
those two groups is used. | |||||
#### 4.3.2 Rules {.western} | |||||
Each rule is on a separate line, and has the syntax: `[<pre>)] <match> [(<post>] <phoneme string>`{.western}
eg. | |||||
"oo" is pronounced as [u:], but when also preceded by "b" and followed | |||||
by "k", it is pronounced [U]. | |||||
In the case of a single-letter group, the first character of \<match\>
must be the group letter. In the case of a 2-letter group, the first two
characters of \<match\> must be the group letters. The second and third | |||||
rules above may be in either .group o or .group oo | |||||
Alphabetic characters in the \<pre\>, \<match\>, and \<post\> parts must | |||||
be lower case, and matching is case-insensitive. Some upper case letters | |||||
are used in \<pre\> and \<post\> with special meanings. | |||||
#### 4.3.3 Special characters in \<phoneme string\>: {.western} | |||||
+--------------------------------------+--------------------------------------+ | |||||
| **\_\^\_\<language code\> ** | Translate using a different | | |||||
| | language. | | |||||
+--------------------------------------+--------------------------------------+ | |||||
#### 4.3.4 Special Characters in both \<pre\> and \<post\>: {.western} | |||||
+--------------------------------------+--------------------------------------+ | |||||
| **\_** | Beginning or end of a word (or a | | |||||
| | hyphen). | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| **-** | Hyphen. | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| **A** | Any vowel (the set of vowel | | |||||
| | characters may be defined for a | | |||||
| | particular language). | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| **C** | Any consonant. | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| **B H F G Y ** | These may indicate other sets of | | |||||
| | characters (defined for a particular | | |||||
| | language). | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| **L\<nn\>** | Any of the sequence of characters | | |||||
| | defined as a letter group (see 4.3.1 | | |||||
| | above). | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| **D** | Any digit. | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| **K** | Not a vowel (i.e. a consonant or | | |||||
| | word boundary or non-alphabetic | | |||||
| | character). | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| **X** | There is no vowel until the word | | |||||
| | boundary. | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| **Z** | A non-alphabetic character. | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| **%** | Doubled (placed before a character | | |||||
| | in \<pre\> and after it in \<post\>). |
+--------------------------------------+--------------------------------------+ | |||||
| **/** | The following character is treated | | |||||
| | literally. | | |||||
+--------------------------------------+--------------------------------------+ | |||||
The sets of letters indicated by A, B, C, F, G, H and Y may be defined
differently for each language.
Examples of rules: | |||||
~~~~ {.western} | |||||
_) a // "a" at the start of a word | |||||
a (CC // "a" followed by two consonants | |||||
a (C% // "a" followed by a double consonant (the same letter twice) | |||||
a (/% // "a" followed by a percent sign | |||||
%C) a // "a" preceded by a double consonant
~~~~ | |||||
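The following Python sketch is one way to picture how a few of these symbols could be tested against the letters that follow a match. It is not eSpeak's implementation: the function name `matches_post`, the fixed vowel set, and the handling of only `C`, `D`, `Z`, `%` and `/` are assumptions made for illustration (real letter sets are defined per language).

```python
# Hypothetical vowel set; in eSpeak the sets are defined for each language.
VOWELS = set("aeiou")

def matches_post(pattern: str, text: str) -> bool:
    """Return True if `text` (the letters after the match) starts with `pattern`."""
    pos = 0  # current position in text
    i = 0    # current position in pattern
    while i < len(pattern):
        sym = pattern[i]
        if sym == "/":
            # "/" means the following pattern character is taken literally.
            i += 1
            if i >= len(pattern) or pos >= len(text) or text[pos] != pattern[i]:
                return False
            pos += 1
        elif sym == "%":
            # "%" in <post> means the previously matched letter is doubled.
            if pos == 0 or pos >= len(text) or text[pos] != text[pos - 1]:
                return False
            pos += 1
        else:
            if pos >= len(text):
                return False
            ch = text[pos]
            if sym == "C":      # any consonant
                ok = ch.isalpha() and ch not in VOWELS
            elif sym == "D":    # any digit
                ok = ch.isdigit()
            elif sym == "Z":    # a non-alphabetic character
                ok = not ch.isalpha()
            else:               # an ordinary letter matches itself
                ok = ch == sym
            if not ok:
                return False
            pos += 1
        i += 1
    return True
```

For example, `matches_post("C%", "pple")` succeeds because "p" is followed by a second "p", while `matches_post("C%", "ngle")` fails.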
#### 4.3.5 Special characters only in \<pre\>: {.western} | |||||
+--------------------------------------+--------------------------------------+ | |||||
| **@** | Any syllable. |
+--------------------------------------+--------------------------------------+ | |||||
| **&** | A syllable which may be stressed | | |||||
| | (i.e. is not defined as unstressed). | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| **V** | Matches only if a previous word has | | |||||
| | indicated that a verb form is | | |||||
| | expected. | | |||||
+--------------------------------------+--------------------------------------+ | |||||
eg. | |||||
~~~~ {.western} | |||||
@@) bi // "bi" preceded by at least two syllables | |||||
@@a) bi // "bi" preceded by at least 2 syllables and following 'a' | |||||
~~~~ | |||||
Note that matching characters in the \<pre\> part do not affect the
syllable counting.
#### 4.3.6 Special characters only in \<post\>: {.western} | |||||
+--------------------------------------+--------------------------------------+ | |||||
| **@** | A vowel follows somewhere in the | | |||||
| | word. | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| **+** | Force an increase in the score in | | |||||
| | this rule (may be repeated for more | | |||||
| | effect). | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| **S\<number\>** | This number of matching characters |
| | are a standard suffix, remove them | | |||||
| | and retranslate the word. | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| **P\<number\>** | This number of matching characters | | |||||
| | are a standard prefix, remove them | | |||||
| | and retranslate the word. | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| **Lnn** | **nn** is a 2-digit decimal number | | |||||
| | in the range 01 to 20\ | | |||||
| | Matches with any of the letter | | |||||
| | sequences which have been defined | | |||||
| | for letter group **nn** | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| **N** | Only use this rule if the word is | | |||||
| | not a retranslation after removing a | | |||||
| | suffix. | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| **\#** | (English specific) change the next | | |||||
| | "e" into a special character "E" | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| **\$noprefix** | Only use this rule if the word is | | |||||
| | not a retranslation after removing a | | |||||
| | prefix. | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| **\$w\_alt\ | Only use this rule if the word is | | |||||
| \$w\_alt2\ | found in the \*\_list file with the | | |||||
| \$w\_alt3** | **\$alt**, **\$alt2** or **\$alt3** | | |||||
| | attribute respectively. | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| **\$p\_alt\ | Only use this rule if the part-word, | | |||||
| \$p\_alt2\ | up to and including the pre and | | |||||
| \$p\_alt3** | match parts of this rule, is found | | |||||
| | in the \*\_list file with the | | |||||
| | **\$alt**, **\$alt2** or **\$alt3** | | |||||
| | attribute respectively. | | |||||
+--------------------------------------+--------------------------------------+ | |||||
eg. | |||||
~~~~ {.western} | |||||
@) ly (_S2 lI // "ly", at end of a word with at least one other | |||||
// syllable, is a suffix pronounced [lI]. Remove | |||||
// it and retranslate the word. | |||||
_) un (@P2 %Vn // "un" at the start of a word is an unstressed | |||||
// prefix pronounced [Vn] | |||||
_) un (i ju: // ... except in words starting "uni" | |||||
_) un (inP2 ,Vn // ... but it is for words starting "unin" | |||||
~~~~ | |||||
S and P must be at the end of the \<post\> string. | |||||
S\<number\> may be followed by additional letters (eg. S2ei ). Some of | |||||
these are probably specific to English, but similar functions could be | |||||
made for other languages. | |||||
+--------------------------------------+--------------------------------------+ | |||||
| **q** | query the \_list file to find stress | | |||||
| | position or other attributes for the | | |||||
| | stem, but don't re-translate the | | |||||
| | word with the suffix removed. | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| **t** | determine the stress pattern of the | | |||||
| | word **before** adding the suffix | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| **d** | the previous letter may have been |
| | doubled when the suffix was added. | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| **e** | "e" may have been removed. | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| **i** | "y" may have been changed to "i". |
+--------------------------------------+--------------------------------------+ | |||||
| **v** | the suffix means the verb form of | | |||||
| | pronunciation should be used. | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| **f** | the suffix means the next word is | | |||||
| | likely to be a verb. | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| **m** | after this suffix has been removed, | | |||||
| | additional suffixes may be removed. | | |||||
+--------------------------------------+--------------------------------------+ | |||||
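As a rough illustration of the **d**, **e** and **i** letters, the Python sketch below lists the stems that might be considered after a suffix of a given length is removed. The function name `candidate_stems` is hypothetical, and how eSpeak actually chooses between such candidates is not described here; this only shows the spelling adjustments the letters refer to.

```python
def candidate_stems(word: str, suffix_len: int, letters: str = "") -> list[str]:
    """List possible stems after removing `suffix_len` characters from `word`.

    `letters` holds the optional letters that follow S<number>, e.g. "ei".
    Assumes suffix_len is at least 1.
    """
    stem = word[:-suffix_len]
    candidates = [stem]
    # d: the previous letter may have been doubled when the suffix was added.
    if "d" in letters and len(stem) >= 2 and stem[-1] == stem[-2]:
        candidates.append(stem[:-1])
    # e: an "e" may have been removed when the suffix was added.
    if "e" in letters:
        candidates.append(stem + "e")
    # i: a "y" may have been changed to "i".
    if "i" in letters and stem.endswith("i"):
        candidates.append(stem[:-1] + "y")
    return candidates
```

With these assumptions, `candidate_stems("happily", 2, "i")` gives `["happi", "happy"]`, and the second candidate is the one a dictionary lookup would be expected to find.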
P\<number\> may be followed by additional letters (eg. P3v ).
+--------------------------------------+--------------------------------------+ | |||||
| **t** | determine the stress pattern of the |
| | word **before** adding the prefix | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| **v** | the prefix means the verb form of |
| | pronunciation should be used. | | |||||
+--------------------------------------+--------------------------------------+ | |||||
### 4.4 Pronunciation Dictionary List {.western} | |||||
The ***\<language\>\_list*** file contains a list of words whose | |||||
pronunciations are given explicitly, rather than determined by the | |||||
Pronunciation Rules. The ***\<language\>\_extra*** file, if present, is
also used and its contents are taken as coming after those in
***\<language\>\_list***.
The list can also be used to specify the stress pattern, or other
properties, of a word.
If the Pronunciation rules are applied to a word and indicate a standard | |||||
prefix or suffix, then the word is again looked up in Pronunciation | |||||
Dictionary List after the prefix or suffix has been removed. | |||||
Lines in the dictionary list have the form: | |||||
eg. | |||||
~~~~ {.western style="margin-bottom: 0.5cm"} | |||||
book bUk | |||||
~~~~ | |||||
Rather than a full pronunciation, just the stress may be given, to | |||||
change where it would be otherwise placed by the Pronunciation Rules: | |||||
~~~~ {.western} | |||||
berlin $2 // stress on second syllable | |||||
absolutely $3 // stress on third syllable | |||||
for $u // an unstressed word | |||||
~~~~ | |||||
#### 4.4.1 Multiple Words {.western} | |||||
A pronunciation may also be specified for a group of words, when these | |||||
appear together. Up to four words may be given, enclosed in brackets. | |||||
This may be used to change the pronunciation or stress pattern when
these words occur together,
~~~~ {.western style="margin-bottom: 0.5cm"} | |||||
(de jure) deI||dZ'U@rI2 // note || used as a word break in the phoneme string | |||||
~~~~ | |||||
or to run them together, pronounced as a single word | |||||
~~~~ {.western style="margin-bottom: 0.5cm"} | |||||
(of a) @v@ | |||||
~~~~ | |||||
or to give them a flag when they occur together | |||||
~~~~ {.western style="margin-bottom: 0.5cm"} | |||||
(such as) sVtS||a2z $pause // precede with a pause | |||||
~~~~ | |||||
Hyphenated words in the ***\<language\>\_list*** file must also be | |||||
enclosed within brackets, because the two parts are considered as | |||||
separate words. | |||||
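The line formats shown so far (a word with a phoneme string, a word with only a stress flag, and a bracketed group of words) can be read with a small parser. The Python sketch below is an illustration only, not the eSpeak compiler: the function name `parse_list_line` is hypothetical, and it assumes that fields are separated by whitespace, that `//` starts a comment, and that any field beginning with `$` is a flag.

```python
def parse_list_line(line: str):
    """Split a *_list line into (words, phonemes, flags).

    Returns None for blank lines and comment-only lines.
    """
    text = line.split("//", 1)[0].strip()   # drop any trailing comment
    if not text:
        return None
    if text.startswith("("):
        # A bracketed group of words, e.g. "(such as) sVtS||a2z $pause"
        close = text.index(")")
        words = text[1:close].split()
        fields = text[close + 1:].split()
    else:
        fields = text.split()
        words, fields = [fields[0]], fields[1:]
    flags = [f for f in fields if f.startswith("$")]
    others = [f for f in fields if not f.startswith("$")]
    phonemes = others[0] if others else None  # None when only flags are given
    return words, phonemes, flags
```

For instance, `parse_list_line("berlin $2 // stress on second syllable")` returns `(["berlin"], None, ["$2"])`, reflecting that only the stress position is given.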
#### 4.4.2 Special characters in \<phoneme string\>: {.western} | |||||
+--------------------------------------+--------------------------------------+ | |||||
| **\_\^\_\<language code\>** | Translate using a different |
| | language. See explanation in 4.3.3 | | |||||
| | above. | | |||||
+--------------------------------------+--------------------------------------+ | |||||
#### 4.4.3 Flags {.western} | |||||
A word (or group of words) may be given one or more flags, either | |||||
instead of, or as well as, the phonetic translation. | |||||
+--------------------------------------+--------------------------------------+ | |||||
| \$u | The word is unstressed. In the case | | |||||
| | of a multi-syllable word, a slight | | |||||
| | stress is applied according to the | | |||||
| | default stress rules. | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| \$u1 | The word is unstressed, with a | | |||||
| | slight stress on its 1st syllable. | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| \$u2 | The word is unstressed, with a | | |||||
| | slight stress on its 2nd syllable. | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| \$u3 | The word is unstressed, with a | | |||||
| | slight stress on its 3rd syllable. | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| | | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| \$u+ \$u1+ \$u2+ \$u3+ | As above, but the word has full | | |||||
| | stress if it's at the end of a | | |||||
| | clause. | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| | | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| \$1 | Primary stress on the 1st syllable. | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| \$2 | Primary stress on the 2nd syllable. | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| \$3 | Primary stress on the 3rd syllable. | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| \$4 | Primary stress on the 4th syllable. | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| \$5 | Primary stress on the 5th syllable. | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| \$6 | Primary stress on the 6th syllable. | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| \$7 | Primary stress on the 7th syllable. | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| | | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| \$pause | Ensure a short pause before this | | |||||
| | word (eg. for conjunctions such as | | |||||
| | "and", some prepositions, etc). | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| \$brk | Ensure a very short pause before | | |||||
| | this word, shorter than \$pause (eg. | | |||||
| | for some prepositions, etc). | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| \$only | The rule does not apply if a prefix | | |||||
| | or suffix has already been removed. | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| \$onlys | As \$only, except that a standard | | |||||
| | plural ending is allowed. | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| \$stem | The rule only applies if a suffix | | |||||
| | has already been removed. | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| \$strend | Word is fully stressed if it's at | | |||||
| | the end of a clause. | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| \$strend2 | As \$strend, but the word is also | | |||||
| | stressed if followed only by | | |||||
| | unstressed word(s). | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| \$unstressend | Word is unstressed if it's at the | | |||||
| | end of a clause. | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| \$atend | Use this pronunciation if it's at | | |||||
| | the end of a clause. | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| \$double | Cause a doubling of the initial | | |||||
| | consonant of the following word | | |||||
| | (used for Italian). | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| \$capital | Use this pronunciation if the word |
| | has an initial capital letter (eg. |
| | polish v Polish). |
+--------------------------------------+--------------------------------------+ | |||||
| \$allcaps | Use this pronunciation if the word | | |||||
| | is all capitals. | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| \$dot | Ignore a . after this word even when | | |||||
| | followed by a capital letter (eg. | | |||||
| | Mr. Dr. ). | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| \$hasdot | Use this pronunciation if the word | | |||||
| | is followed by a dot. (This | | |||||
| | attribute also implies \$dot). | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| \$sentence | The rule only applies if the clause | | |||||
| | includes end-of-sentence (i.e. it is | | |||||
| | not terminated by a comma). For | | |||||
| | example, "\$atend \$sentence" means | | |||||
| | that the rule only applies at the | | |||||
| | end of a sentence. | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| \$abbrev | This has two meanings.\ | | |||||
| | 1. If there is no phoneme string: | | |||||
| | Speak the word as individual | | |||||
| | letters, even if it contains a vowel | | |||||
| | (eg. "abc" should be spoken as "a" | | |||||
| | "b" "c").\ | | |||||
| | 2. If there is a phoneme string: | | |||||
| | This word is capitalized because it | | |||||
| | is an abbreviation and | | |||||
| | capitalization does not indicate | | |||||
| | emphasis (if the "emphasize | | |||||
| | all-caps" is on). | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| | | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| \$accent | Used for the pronunciation of a | | |||||
| | single alphabetic character. The | | |||||
| | character name is spoken as the | | |||||
| | base-letter name plus the accent | | |||||
| | (diacritic) name. eg. It can be used | | |||||
| | to specify that "â" is spoken as "a" | | |||||
| | "circumflex". | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| \$combine | This word is treated as though it is | | |||||
| | combined with the following word | | |||||
| | with a hyphen. This may be subject | | |||||
| | to further conditions for certain |
| | languages. | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| \$alt \$alt2 \$alt3 | These are language specific. Their | | |||||
| | use should be described in the | | |||||
| | language's \*\_list file. |
+--------------------------------------+--------------------------------------+ | |||||
| | | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| \$verb | Use this pronunciation if it's a | | |||||
| | verb. | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| \$noun | Use this pronunciation if it's a | | |||||
| | noun. | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| \$past | Use this pronunciation if it's past | | |||||
| | tense. | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| \$verbf | The following word is probably a |
| | verb. | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| \$verbsf | The following word is probably a |
| | verb if it has an "s" suffix. |
+--------------------------------------+--------------------------------------+ | |||||
| \$nounf | The following word is probably not a | | |||||
| | verb. | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| \$pastf | The following word is probably past | | |||||
| | tense. | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| \$verbextend | Extend the influence of \$verbf and | | |||||
| | \$verbsf. | | |||||
+--------------------------------------+--------------------------------------+ | |||||
The last group are probably English specific, but something similar may | |||||
be useful in other languages. They are a crude attempt to improve the | |||||
accuracy of pairs like ob'ject (verb) v 'object (noun) and read | |||||
(present) v read (past). | |||||
The dictionary list is searched from bottom to top. The first match that | |||||
satisfies any conditions is used (i.e. the one lowest down the list). So | |||||
if we have: | |||||
~~~~ {.western} | |||||
to t@ // unstressed version | |||||
to tu: $atend // stressed version | |||||
~~~~ | |||||
then if "to" is at the end of the clause, we get [tu:], if not then we | |||||
get [t@]. | |||||
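The bottom-to-top search can be sketched in Python as follows. This is a simplified model rather than eSpeak's code: the entry tuples, the function name `lookup` and the `at_clause_end` parameter are assumptions, and only the `$atend` condition is checked.

```python
# Entries in file order: (word, phoneme string, flags)
ENTRIES = [
    ("to", "t@", []),            # unstressed version
    ("to", "tu:", ["$atend"]),   # stressed version
]

def lookup(word, entries, at_clause_end):
    """Return the phoneme string of the lowest entry whose conditions hold."""
    for entry_word, phonemes, flags in reversed(entries):  # bottom to top
        if entry_word != word:
            continue
        if "$atend" in flags and not at_clause_end:
            continue  # condition not satisfied, keep searching upwards
        return phonemes
    return None  # not listed; the pronunciation rules would be used instead
```

Because the search starts at the bottom, the conditional `$atend` entry is tried first and the unconditional entry above it acts as the fallback.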
#### 4.4.4 Translating a Word to another Word {.western} | |||||
Rather than specifying the pronunciation of a word by a phoneme string, | |||||
you can specify another "sounds like" word. | |||||
Use the attribute **\$text** eg. | |||||
~~~~ {.western style="margin-bottom: 0.5cm"} | |||||
cough coff $text | |||||
~~~~ | |||||
Alternatively, use the command **\$textmode** on a line by itself to | |||||
turn this on for all subsequent entries in the file, until it's turned | |||||
off by **\$phonememode**. eg. | |||||
~~~~ {.western} | |||||
$textmode | |||||
cough coff | |||||
through threw | |||||
$phonememode | |||||
~~~~ | |||||
This feature cannot be used for the special entries in the **\_list** | |||||
files which start with an underscore, such as numbers. | |||||
Currently "textmode" entries are only recognized for complete words, and
not for stems from which a prefix or suffix has been removed (eg.
the word "coughs" would not match the example above).
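To make the effect of **\$textmode** and **\$phonememode** concrete, here is a small Python sketch that reads entries and records whether each replacement is a phoneme string or a "sounds like" word. The function name `read_entries` is hypothetical, and the sketch assumes every entry has a word followed by one replacement field; it ignores all other flags.

```python
def read_entries(lines):
    """Return (word, replacement, kind) tuples, where kind is "text" or "phonemes"."""
    text_mode = False
    entries = []
    for raw in lines:
        line = raw.split("//", 1)[0].strip()  # drop comments
        if not line:
            continue
        if line == "$textmode":       # following entries are "sounds like" words
            text_mode = True
            continue
        if line == "$phonememode":    # back to phoneme strings
            text_mode = False
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # this sketch skips entries without a replacement
        word, replacement, flags = fields[0], fields[1], fields[2:]
        kind = "text" if text_mode or "$text" in flags else "phonemes"
        entries.append((word, replacement, kind))
    return entries
```

An entry marked with `$text` is treated the same way as one that appears between `$textmode` and `$phonememode`.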
### 4.5 Conditional Rules {.western} | |||||
Rules in a **\_rules** file and entries in a **\_list** file can be made | |||||
conditional. They apply only to some voices. This can be useful to | |||||
specify different pronunciations for different variants of a language | |||||
(dialects or accents). | |||||
Conditional rules have **?** and a condition number at the start of
the line in the **\_rules** or **\_list** file. This means that the rule
only applies if that condition number is specified in a **dictrules**
line in the [voice file](voices.html).
If the rule starts with **?!** then the rule only applies if the | |||||
condition number is **not** specified in the voice file. eg. | |||||
~~~~ {.western} | |||||
?3 can't kant // only use this if the voice has: dictrules 3 | |||||
?!3 rather rA:D3 // only use if the voice doesn't have: dictrules 3 | |||||
~~~~ | |||||
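The condition test can be expressed as a short Python sketch. The function name `rule_applies` and the representation of the voice's **dictrules** numbers as a set are assumptions for illustration; the sketch also assumes that a leading `?` always introduces a condition.

```python
def rule_applies(line: str, dictrules: set[int]) -> bool:
    """Decide whether a rule or list line applies for a voice's dictrules numbers."""
    first = line.split(None, 1)[0] if line.strip() else ""
    if first.startswith("?!"):
        # Applies only if the condition number is NOT set in the voice file.
        return int(first[2:]) not in dictrules
    if first.startswith("?"):
        # Applies only if the condition number is set in the voice file.
        return int(first[1:]) in dictrules
    return True  # unconditional line
```

A voice file containing `dictrules 3` would correspond to `dictrules={3}` here, so the `?3` line applies and the `?!3` line does not.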
### 4.6 Numbers and Character Names {.western} | |||||
#### 4.6.1 Letter names {.western} | |||||
The names of individual letters can be given either in the **\_rules** | |||||
or **\_list** file. Sometimes an individual letter is also used as a | |||||
word in the language and its pronunciation as a word differs from its | |||||
letter name. If so, it should be listed in the **\_list** file, preceded | |||||
by an underscore, to give the letter name (as distinct from its | |||||
pronunciation as a word). eg. in English: | |||||
~~~~ {.western style="margin-bottom: 0.5cm"} | |||||
_a eI | |||||
~~~~ | |||||
#### 4.6.2 Numbers {.western} | |||||
The operation of the TranslateNumber() function is controlled by the
language's `langopts.numbers`{.western} option. This constructs spoken | |||||
numbers from fragments according to various options which can be set for | |||||
each language. The number fragments are given in the **\_list** file. | |||||
+--------------------------------------+--------------------------------------+ | |||||
| \_0 to \_9 | The numbers 0 to 9 | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| \_13 | etc. Any pronunciations which are | | |||||
| | needed for specific numbers in the | | |||||
| | range \_10 to \_99 | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| \_2X \_3X | Twenty, thirty, etc., used to make | | |||||
| | numbers 10 to 99 | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| \_0C | The word for "hundred" | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| \_1C \_2C | Special pronunciation for one | | |||||
| | hundred, two hundred, etc., if | | |||||
| | needed. | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| \_1C0 | Special pronunciation (if needed) | | |||||
| | for 100 exactly | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| \_0M1 | The word for "thousand" | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| \_0M2 | The word for "million" | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| \_0M3 | The word for 1000000000 | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| \_1M1 \_2M1 | Special pronunciation for one | | |||||
| | thousand, two thousand, etc, if | | |||||
| | needed | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| \_0and | Word for "and" when speaking numbers | | |||||
| | (eg. "two hundred and twenty"). | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| \_dpt | Word spoken for the decimal |
| | point/comma | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| \_dpt2 | Word spoken (if any) at the end of | | |||||
| | all the digits after a decimal | | |||||
| | point. | | |||||
+--------------------------------------+--------------------------------------+ | |||||
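The Python sketch below shows one way the fragments in this table could be combined for numbers from 0 to 999. It is not the TranslateNumber() function: the word order is an English-like assumption (the real order depends on `langopts.numbers`), the function names are hypothetical, and plain English words stand in for the phoneme strings that a real **\_list** file would contain.

```python
# Placeholder fragments; a real _list file gives phoneme strings, not words.
FRAGMENTS = {
    "_0": "zero", "_1": "one", "_2": "two", "_3": "three", "_4": "four",
    "_5": "five", "_6": "six", "_7": "seven", "_8": "eight", "_9": "nine",
    "_13": "thirteen",               # a specific number in the range 10 to 99
    "_2X": "twenty", "_4X": "forty", # tens
    "_0C": "hundred",
    "_0and": "and",
}

def under_100(n, frag):
    """Fragments for 0 to 99: a specific entry if listed, else tens plus units."""
    key = f"_{n}"
    if key in frag:
        return [frag[key]]
    tens, units = divmod(n, 10)
    parts = [frag[f"_{tens}X"]]      # KeyError if the tens fragment is missing
    if units:
        parts.append(frag[f"_{units}"])
    return parts

def speak_number(n, frag):
    """Return the list of fragments spoken for n, for 0 <= n <= 999."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only handles 0 to 999")
    if n < 100:
        return under_100(n, frag)
    if n == 100 and "_1C0" in frag:  # special form for exactly 100
        return [frag["_1C0"]]
    hundreds, rest = divmod(n, 100)
    if f"_{hundreds}C" in frag:      # special form such as _1C or _2C
        parts = [frag[f"_{hundreds}C"]]
    else:
        parts = [frag[f"_{hundreds}"], frag["_0C"]]
    if rest:
        if "_0and" in frag:
            parts.append(frag["_0and"])
        parts += under_100(rest, frag)
    return parts
```

Each lookup tries the most specific fragment first and falls back to the general one, which mirrors the "if needed" wording in the table.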
### 4.7 Character Substitution {.western} | |||||
Character substitutions can be specified by using a **.replace** section
at the start of the **\_rules** file. Each line specifies either one or
two alphabetic characters to be replaced by another one or two
alphabetic characters. This substitution is done to a word before it is
translated using the spelling-to-phoneme rules. Only the lower-case
version of the characters needs to be specified. eg.
.replace\
ô ő // (Hungarian) allow the use of o-circumflex instead of o-double-acute\
û ű\
cx ĉ // (Esperanto) allow "cx" as an alternative to c-circumflex\
ﬁ fi // replace a single character ligature by two characters
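A substitution table of this kind could be applied to a word as in the Python sketch below. This is an assumption-laden illustration, not eSpeak's code: the function name `apply_replacements` is hypothetical, two-character sources are tried before one-character sources, and the Hungarian, Esperanto and ligature entries are combined in one table only for the sake of the example.

```python
# Source characters mapped to their replacements (lower case only).
REPLACEMENTS = {"ô": "ő", "û": "ű", "cx": "ĉ", "ﬁ": "fi"}

def apply_replacements(word: str, table: dict[str, str]) -> str:
    """Replace one- or two-character sequences in `word` according to `table`."""
    out = []
    i = 0
    while i < len(word):
        for length in (2, 1):                 # prefer the longer source
            chunk = word[i:i + length]
            if len(chunk) == length and chunk in table:
                out.append(table[chunk])
                i += length
                break
        else:
            out.append(word[i])               # no substitution at this position
            i += 1
    return "".join(out)
```

The result of the substitution is what would then be passed to the spelling-to-phoneme rules.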
ESPEAKEDIT PROGRAM {.western} | |||||
------------------ | |||||
The **espeakedit** program is used to prepare phoneme data for the | |||||
eSpeak speech synthesizer. | |||||
It has two main functions: | |||||
- - | |||||
### Installation {.western} | |||||
**espeakedit** needs the following packages:\ | |||||
(The package names mentioned here are those from the Ubuntu "Dapper" | |||||
Linux distribution). | |||||
- - - | |||||
In addition, a modified version of **praat** | |||||
([www.praat.org](www.praat.org)) is used to view and analyse WAV sound | |||||
files. This needs the package **libmotif3** to run and **libmotif-dev** | |||||
to compile. | |||||
### Quick Guide {.western} | |||||
This will quickly illustrate the main features. Details of the interface | |||||
and key commands are given in [editor\_if.html](editor_if.html).
For more detailed information on analysing sound recordings and | |||||
preparing phoneme definitions and keyframe data see | |||||
[analyse.html](analyse.html) (to be written). | |||||
#### Compiling Phoneme Data {.western} | |||||
1. 2. 3. 4. | |||||
#### Keyframe Sequences {.western} | |||||
1. 2. 3. 4. 5. 6. 7. | |||||
#### Text and Prosody Windows {.western} | |||||
1. 2. 3. 4. 5. 6. 7. 8. 9. | |||||
The Prosody window can be used to experiment with different phoneme | |||||
lengths and different intonation. |
USER INTERFACE - FORMANT EDITOR {.western} | |||||
------------------------------- | |||||
### Frame Sequence Display {.western} | |||||
The eSpeak editor can display a number of frame sequences in tabbed
windows. Each frame can contain a short-time frequency spectrum, | |||||
covering the period of one cycle at the sound's pitch. Frames can also | |||||
show: | |||||
- - - - - | |||||
### Text Tab {.western} | |||||
Enter text in the top left text window. Click the **Translate** button | |||||
to see the phonetic transcription in the text window below. Then click | |||||
the **Speak** button to speak the text and show the results in the | |||||
**Prosody** tab, if that is open. | |||||
If changes are made in the **Prosody** tab, then clicking **Speak** will | |||||
speak the modified prosody while **Translate** will revert to the | |||||
default prosody settings for the text. | |||||
To enter phonetic symbols (Kirschenbaum encoding) in the top left text | |||||
window, enclose them within [[ ]]. | |||||
### Spect Tab {.western} | |||||
The "Spect" tab in the left panel of the eSpeak editor shows information | |||||
about the currently selected frame and sequence. | |||||
- - - - - - | |||||
### Key Commands {.western} | |||||
- - - - - | |||||
USER INTERFACE - PROSODY EDITOR {.western style="margin-left: 1cm"} | |||||
------------------------------- | |||||
- |
eSpeak NG - Documentation
=========================
### [Usage](commands.md) | |||||
### [Languages](languages.md) | |||||
### [Voice Files](voices.md) | |||||
Voice files specify a language and other characteristics of a voice. | |||||
### [Mbrola Voices](mbrola.md) | |||||
eSpeak NG can be used as a front-end for Mbrola diphone voices. | |||||
### [Pronunciation Dictionary](dictionary.md) | |||||
### [Adding a Language](add_language.md) | |||||
How to add or improve a language. | |||||
### [Phonemes](phonemes.md) | |||||
The list of phoneme mnemonics for English, for use in the Pronunciation | |||||
Dictionary. | |||||
### [Phoneme Tables](phontab.md) | |||||
The tables of the phonemes used by each language, with their properties | |||||
and sound production. | |||||
### [Intonation](intonation.md) | |||||
Different intonation "tunes" may be defined for different languages for | |||||
clauses which end in full-stop, comma, question-mark, and | |||||
exclamation-mark. | |||||
### [eSpeak NG Library API](speak_lib.h) | |||||
API definition and header file for a shared library version of eSpeak NG. | |||||
### [Markup tags](ssml.md) | |||||
SSML (Speech Synthesis Markup Language) and HTML tags recognized by | |||||
eSpeak NG. | |||||
### [The espeakedit program](editor.md) | |||||
GUI software to edit vowel files and to compile the phoneme data for use | |||||
by eSpeak NG. See also [Espeakedit user interface](editor_if.md). | |||||
INTONATION {.western} | |||||
---------- | |||||
In eSpeak's standard intonation model, a "tune" is applied to each | |||||
clause depending on its punctuation. Other intonation models may be used | |||||
for some languages, such as tone languages. | |||||
Named tunes are defined in the text file: | |||||
`phsource/intonation`{.western}. This file must be compiled for use by | |||||
eSpeak by using the espeakedit program, using the menu option: | |||||
`Compile -> Compile intonation data`{.western}. | |||||
### Clauses {.western} | |||||
The tunes which are used for a language can be specified by using a | |||||
`tunes`{.western} statement in a voice file in | |||||
`espeak-data/voices`{.western}. eg: | |||||
`tunes s1 c1 q1 e1`{.western} | |||||
Its parameters are four tune names which are used for clauses which end
in: | |||||
1. 2. 3. 4. | |||||
A clause consists of the following parts: | |||||
- - - - | |||||
### Tune definitions {.western} | |||||
Here is an example tune definition from the file | |||||
`phsource/intonation`{.western}. | |||||
~~~~ {.western} | |||||
tune s1 | |||||
prehead 46 57 | |||||
headenv fall 16 | |||||
head 4 80 55 -8 -5 | |||||
headextend 0 63 38 13 0 | |||||
nucleus fall 70 18 24 12 | |||||
nucleus0 fall 64 8 | |||||
endtune | |||||
~~~~ | |||||
It contains: | |||||
**tune** \<tune name\> | |||||
: Starts the definition of a tune. The `tune name`{.western} can
be used in `tunes`{.western} statements in voice files.
**endtune** \<tune name\> | |||||
: Ends the definition of a tune. | |||||
**prehead** \<start pitch\> \<end pitch\> | |||||
: Gives the pitch path for any series of unstressed syllables before | |||||
the first stressed syllable. | |||||
**headenv** \<envelope\> \<height\> | |||||
: Gives the pitch envelope which is used for stressed syllables in the | |||||
head (before the nucleus), including `onset`{.western} and | |||||
`headlast`{.western} syllables if these are specified. | |||||
`height`{.western} gives a pitch range for the envelope. | |||||
**head** \<steps\> \<start pitch\> \<end pitch\> \<unstressed start\> \<unstressed end\> | |||||
: `start pitch`{.western} and `end pitch`{.western} give a pitch | |||||
path for the stressed syllables of the head. `steps`{.western} is | |||||
the maximum number of stressed syllables for which this applies. If | |||||
there are additional stressed syllables, then the | |||||
`headextend`{.western} statement is used for them. | |||||
: `unstressed start`{.western} and `unstressed end`{.western} give | |||||
a pitch path for unstressed syllables between two stressed | |||||
syllables. Their values are relative to the pitch of the previous | |||||
stressed syllable. Values are usually negative, meaning that the | |||||
unstressed syllables have lower pitch than the previous stressed | |||||
syllable. | |||||
**headextend** \<percentage list\> | |||||
: If the head contains more stressed syllables than is specified by | |||||
`steps`{.western}, then `percentage list`{.western} is used. It | |||||
contains up to 8 numbers which are used repeatedly for the | |||||
additional stressed syllables. A value of 0 corresponds to the lower
of the `start pitch`{.western} and `end pitch`{.western} values of the
`head`{.western} statement. 100 corresponds to the higher value.
Negative values and values greater than 100 are allowed. | |||||
**nucleus** \<envelope\> \<top pitch\> \<bottom pitch\> \<tail start\> \<tail end\> | |||||
: This gives the pitch envelope and pitch range of the last stressed | |||||
syllable of the clause. `tail start`{.western} and | |||||
`tail end`{.western} give a pitch path for the unstressed syllables | |||||
which are after the last stressed syllable. | |||||
**nucleus0** \<envelope\> \<top pitch\> \<bottom pitch\> | |||||
: This is used instead of `nucleus`{.western} if there are no | |||||
unstressed syllables after the last stressed syllable. In this case, | |||||
the pitch changes of the nucleus and the tail are both included in
the nucleus.
The following attributes may also be included: | |||||
**onset** \<pitch\> \<unstressed start\> \<unstressed end\> | |||||
: This specifies the pitch for the first stressed syllable of the | |||||
head. If the `onset`{.western} statement is present, then the
`head`{.western} statement is used for the stressed syllables after the
first.
**headlast** \<pitch\> \<unstressed start\> \<unstressed end\> | |||||
: This specifies the pitch for the last stressed syllable of the head | |||||
(i.e. the stressed syllable before the nucleus). | |||||
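The `headextend` percentages can be read as positions on a scale whose 0 and 100 points are the lower and higher of the two `head` pitches. The following Python sketch works through that arithmetic. It assumes plain linear scaling and that "used repeatedly" means cycling through the list; the function names are hypothetical and this is not eSpeak's own code.

```python
def headextend_pitch(percent, start_pitch, end_pitch):
    """Map a headextend percentage onto the pitch range of the head statement.

    Assumption: linear scaling, where 0 is the lower and 100 the higher
    of the two head pitches. Values outside 0..100 extrapolate.
    """
    low = min(start_pitch, end_pitch)
    high = max(start_pitch, end_pitch)
    return low + (high - low) * percent / 100.0


def extra_syllable_pitches(count, percentages, start_pitch, end_pitch):
    """Pitches for `count` additional stressed syllables.

    Assumption: the percentage list is cycled from its start when there
    are more syllables than list entries.
    """
    return [
        headextend_pitch(percentages[i % len(percentages)], start_pitch, end_pitch)
        for i in range(count)
    ]


# Values taken from the example tune: "head 4 80 55 ..." and
# "headextend 0 63 38 13 0".
pitches = extra_syllable_pitches(3, [0, 63, 38, 13, 0], 80, 55)
```

With the example tune, the head pitches 80 and 55 give a range of 25, so a value of 38 lands at 55 + 9.5 = 64.5.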
3. LANGUAGES {.western} | |||||
------------ | |||||
**Languages**. The eSpeak speech synthesizer supports several languages,
although in many cases these are initial drafts which need more work.
Assistance from native speakers is welcome for these, or for other new
languages. Please contact me if you want to help.
eSpeak does text to speech synthesis for the following languages, some | |||||
better than others. Afrikaans, Albanian, Armenian, Cantonese, Catalan, | |||||
Croatian, Czech, Danish, Dutch, English, Esperanto, Finnish, French, | |||||
German, Greek, Hindi, Hungarian, Icelandic, Indonesian, Italian, | |||||
Kurdish, Latvian, Lojban, Macedonian, Mandarin, Norwegian, Polish, | |||||
Portuguese, Romanian, Russian, Serbian, Slovak, Spanish, Swahili, | |||||
Swedish, Tamil, Turkish, Vietnamese, Welsh. | |||||
#### Help Needed {.western} | |||||
Many of these are just experimental attempts at these languages, | |||||
produced after a quick reading of the corresponding article on | |||||
wikipedia.org. They will need work or advice from native speakers to | |||||
improve them. Please contact me if you want to advise or assist with | |||||
these or other languages. | |||||
The sound of some phonemes may be poorly implemented, particularly [r] | |||||
since I'm English and therefore unable to make a "proper" [r] sound. | |||||
A major factor is the rhythm or cadence. An Italian speaker told me the
Italian voice improved from "difficult to understand" to "good" by
changing the relative length of stressed syllables. Identifying
unstressed function words in the xx\_list file is also important to make
the speech flow well. See [Adding or Improving a
Language](add_language.html).
#### Character sets {.western} | |||||
Languages recognise text either as UTF-8 or alternatively in an 8-bit
character set which is appropriate for that language. For example, for | |||||
Polish this is Latin2, for Russian it is KOI8-R. This choice can be | |||||
overridden by a line in the voices file to specify an ISO 8859 character | |||||
set, eg. for Russian the line: | |||||
~~~~ {.western style="margin-bottom: 0.5cm"} | |||||
charset 5 | |||||
~~~~ | |||||
will mean that ISO 8859-5 is used as the 8-bit character set rather than | |||||
KOI8-R. | |||||
In the case of a language which uses a non-Latin character set (eg. | |||||
Greek or Russian) if the text contains a word with Latin characters then | |||||
that particular word will be pronounced using English pronunciation | |||||
rules and English phonemes. Speaking entirely English text using a Greek | |||||
or Russian voice will sound OK, but each word is spoken separately so it | |||||
won't flow properly. | |||||
Sample texts in various languages can be found at | |||||
[http://\<language\>.wikipedia.org](http://meta.wikimedia.org/wiki/List_of_Wikipedias) | |||||
and [www.gutenberg.org](http://www.gutenberg.org/) | |||||
### 3.1 Voice Files {.western} | |||||
A number of Voice files are provided in the | |||||
`espeak-data/voices`{.western} directory. You can select one of these | |||||
with the **-v \<voice filename\>** parameter to the espeak-ng command, eg:
~~~~ {.western style="margin-bottom: 0.5cm"} | |||||
espeak-ng -vaf | |||||
~~~~ | |||||
to speak using the Afrikaans voice. | |||||
Language voices generally start with the 2 letter [ISO 639-1 | |||||
code](http://en.wikipedia.org/wiki/ISO_639-1) for the language. If the | |||||
language does not have an ISO 639-1 code, then the 3 letter [ISO 639-3 | |||||
code](http://www.sil.org/iso639-3/codes.asp) can be used. | |||||
For details of the voice files see [Voices](voices.html). | |||||
#### Default Voice {.western} | |||||
### 3.2 English Voices {.western} | |||||
### 3.3 Voice Variants {.western} | |||||
To make alternative voices for a language, you can make additional voice
files in `espeak-data/voices`{.western} which contain commands to change various
voice and pronunciation attributes. See [voices.html](voices.html).
Alternatively there are some preset voice variants which can be applied | |||||
to any of the language voices, by appending `+`{.western} and a variant | |||||
name. Their effects are defined by files in | |||||
`espeak-data/voices/!v`{.western}. | |||||
The variants are `+m1 +m2 +m3 +m4 +m5 +m6 +m7`{.western} for male
voices, `+f1 +f2 +f3 +f4 +f5`{.western} for female voices, and
`+croak +whisper`{.western} for other effects. For example:
~~~~ {.western style="margin-bottom: 0.5cm"} | |||||
espeak-ng -ven+m3 | |||||
~~~~ | |||||
The available voice variants can be listed with: | |||||
~~~~ {.western style="margin-bottom: 0.5cm"} | |||||
espeak-ng --voices=variant | |||||
~~~~ | |||||
### 3.4 Other Languages {.western} | |||||
The eSpeak speech synthesizer does text to speech for the following
additional languages.
### 3.5 Provisional Languages {.western} | |||||
These languages are only initial naive implementations which have had | |||||
little or no feedback and improvement from native speakers. | |||||
### 3.6 Mbrola Voices {.western} | |||||
Some additional voices, whose names start with **mb-** (for example
**mb-en1**), use eSpeak as a front-end to Mbrola diphone voices. eSpeak
does the spelling-to-phoneme translation and intonation. See
[mbrola.html](mbrola.html).
MBROLA VOICES {.western} | |||||
------------- | |||||
The Mbrola project is a collection of diphone voices for speech | |||||
synthesis. They do not include any text-to-phoneme translation, so this | |||||
must be done by another program. The Mbrola voices are cost-free but are | |||||
not open source. They are available from the Mbrola website at:\ | |||||
[http://www.tcts.fpms.ac.be/synthesis/mbrola/mbrcopybin.html](http://www.tcts.fpms.ac.be/synthesis/mbrola/mbrcopybin.html) | |||||
eSpeak can be used as a front-end to Mbrola. It provides the | |||||
spelling-to-phoneme translation and intonation, which Mbrola then uses | |||||
to generate speech sound. | |||||
### Voice Names {.western} | |||||
To use a Mbrola voice, eSpeak needs information to translate from its | |||||
own phonemes to the equivalent Mbrola phonemes. This has been set up for | |||||
only some voices so far. | |||||
The eSpeak voices which use Mbrola are named as:\ | |||||
**mb-**xxx | |||||
where xxx is the name of a Mbrola voice (eg. **mb-en1** for the Mbrola | |||||
"**en1**" English voice). These voice files are in eSpeak's directory | |||||
`espeak-data/voices/mbrola`{.western}. | |||||
The installation instructions below use the Mbrola voice "en1" as an | |||||
example. You can use other mbrola voices for which there is an | |||||
equivalent eSpeak voice in `espeak-data/voices/mbrola`{.western}. | |||||
There are some additional eSpeak Mbrola voices which speak English text | |||||
using a Mbrola voice for a different language. These contain the name of | |||||
the Mbrola voice with a suffix **-en**. For example, the voice | |||||
**mb-de4-en** will speak English text with a German accent by using the | |||||
Mbrola **de4** voice. | |||||
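The naming scheme described above is regular enough to take apart mechanically. The following Python sketch splits an eSpeak Mbrola voice name into the Mbrola voice it uses and a flag for the **-en** suffix. `parse_mbrola_voice` is a hypothetical helper written for illustration; it is not part of eSpeak.

```python
def parse_mbrola_voice(name):
    """Split an eSpeak Mbrola voice name such as 'mb-en1' or 'mb-de4-en'.

    Returns (mbrola_voice, speaks_english_text), where the flag is True
    when the name carries the '-en' suffix described in the text.
    """
    prefix = "mb-"
    if not name.startswith(prefix):
        raise ValueError("not an eSpeak Mbrola voice name: " + name)
    rest = name[len(prefix):]
    suffix = "-en"
    if rest.endswith(suffix):
        return rest[:-len(suffix)], True
    return rest, False


print(parse_mbrola_voice("mb-en1"))
print(parse_mbrola_voice("mb-de4-en"))
```

The first call reports the Mbrola voice `en1` with no suffix; the second reports `de4` together with the flag for English text.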
### Windows Installation {.western} | |||||
The SAPI5 version of eSpeak uses the mbrola.dll. | |||||
1. 2. 3. 4. | |||||
### Linux Installation {.western} | |||||
From eSpeak version 1.44 onwards, eSpeak calls the mbrola program | |||||
directly, rather than passing phoneme data to it using a pipe. | |||||
1. 2. 3. | |||||
### Mbrola Voice Files {.western} | |||||
eSpeak's voice files for Mbrola voices are in directory | |||||
`espeak-data/voices/mbrola`{.western}. They contain a line:\ | |||||
`mbrola <voice> <translation>`{.western} \ | |||||
eg.\ | |||||
`mbrola en1 en1_phtrans`{.western} | |||||
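- `<voice>`{.western} is the name of the Mbrola voice (eg. en1).
- `<translation>`{.western} is the name of a phoneme translation file in
`espeak-data/mbrola_ph`{.western}. These are binary files which are compiled,
using espeakedit, from source files in `phsource/mbrola`{.western}, see below.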
### Mbrola Phoneme Translation Data {.western} | |||||
Mbrola phoneme translation files specify translations from eSpeak | |||||
phoneme names to mbrola phoneme names. They are referenced from voice | |||||
files. | |||||
The source files are in `phsource/mbrola`{.western}. These are compiled | |||||
using the `espeakedit`{.western} program | |||||
(`Compile->Compile mbrola phonemes list`{.western}) to produce data | |||||
files in `espeak-data/mbrola_ph`{.western} which are used by eSpeak. | |||||
Each line in the mbrola phoneme translation file contains: | |||||
`<control> <espeak ph1> <espeak ph2> <percent> <mbrola ph1> [<mbrola ph2>] `{.western} | |||||
**\<control\>** | |||||
- - - - | |||||
**\<espeak ph1\>**\ | |||||
The eSpeak phoneme which is to be translated to an mbrola phoneme. | |||||
**\<espeak ph2\>**\ | |||||
If this field is not `NULL`{.western}, then the match only occurs if | |||||
this field matches the next phoneme. If control bit 1 is set, then the | |||||
*previous* rather than the *next* phoneme is matched. This field may | |||||
also have the following values:\ | |||||
`VWL`{.western} matches any Vowel phoneme. | |||||
**\<percent\>**\ | |||||
If this field is zero then only one mbrola phoneme is used. If this | |||||
field is non-zero, then two mbrola phonemes are used, and this value | |||||
gives the percentage length of the first mbrola phoneme. | |||||
**\<mbrola ph1\>**\ | |||||
The mbrola phoneme to which the eSpeak phoneme is translated. This | |||||
field may be `NULL`{.western}. | |||||
**\<mbrola ph2\>**\ | |||||
The second mbrola phoneme. This field is only used if the \<percent\> | |||||
field is not zero. | |||||
The list is searched from start to finish, until a match is found.
Therefore, a line with a more specific match condition should appear
before a line which matches the same eSpeak phoneme but with a more
general condition.
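The first-match search can be sketched in Python as follows. This is a simplified model, not eSpeak's implementation: the control bits are parsed but ignored, `VOWELS` is a hypothetical stand-in for eSpeak's own vowel test, the second Mbrola phoneme is assumed to receive the remaining percentage of the length, and the three rule lines are invented for illustration rather than taken from a real translation file.

```python
VOWELS = {"a", "e", "i", "o", "u", "@"}  # hypothetical stand-in for "any vowel"


def parse_rule(line):
    """Parse: <control> <espeak ph1> <espeak ph2> <percent> <mbrola ph1> [<mbrola ph2>]"""
    fields = line.split()
    control, ph1, ph2, percent, mb1 = fields[:5]
    mb2 = fields[5] if len(fields) > 5 else None
    return int(control), ph1, ph2, int(percent), mb1, mb2


def translate(rules, phoneme, next_phoneme):
    """Return [(mbrola phoneme, percent of length), ...] from the first matching rule."""
    for _control, ph1, ph2, percent, mb1, mb2 in rules:
        if ph1 != phoneme:
            continue
        if ph2 == "VWL":
            if next_phoneme not in VOWELS:
                continue
        elif ph2 != "NULL" and ph2 != next_phoneme:
            continue
        if percent == 0:
            return [(mb1, 100)]          # a single Mbrola phoneme
        return [(mb1, percent), (mb2, 100 - percent)]
    return None                          # no rule matched


# Invented rules: the specific "l before a vowel" line comes before the
# general "l" line, as the text recommends.
rules = [parse_rule(line) for line in (
    "0 l VWL 0 l",
    "0 l NULL 0 L",
    "0 aI NULL 50 a I",
)]
```

If the two `l` lines were swapped, the general line would always match first and the vowel-specific line would never be reached.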
The file `dictsource/dict_phonemes`{.western} lists the eSpeak phonemes | |||||
which are used for each language. Translations for all these should be | |||||
given in the mbrola phoneme translation file. In addition, some phonemes | |||||
which are referenced from phoneme files (eg. | |||||
`phsource/ph_language, phsource/phonemes`{.western}) in lines such as: | |||||
~~~~ {.western} | |||||
beforenotvowel l/ | |||||
reduceto a# 0 | |||||
~~~~ | |||||
should also be included, even though they don't appear in | |||||
`dictsource/dict_phonemes`{.western}. | |||||
If the language's \*\_list or \*\_rules files include rules to speak
words "as English", the mbrola phoneme translation file should include
rules which translate English phonemes into near equivalents, so that
they can be spoken by the mbrola voice.
PHONEMES {.western} | |||||
-------- | |||||
In general a different set of phonemes can be defined for each language. | |||||
In most cases different languages inherit the same basic set of | |||||
consonants. They can add to these or modify them as needed. | |||||
The phoneme mnemonics are based on the scheme by Kirshenbaum which | |||||
represents International Phonetic Alphabet symbols using ASCII
characters. See: | |||||
[www.kirshenbaum.net/IPA/ascii-ipa.pdf](http://www.kirshenbaum.net/IPA/ascii-ipa.pdf). | |||||
Phoneme mnemonics can be used directly in the text input to | |||||
**espeak-ng**. They are enclosed within double square brackets. Spaces | |||||
are used to separate words, and all stressed syllables must be marked | |||||
explicitly. eg:\ | |||||
`[[D,Is Iz sVm f@n'EtIk t'Ekst 'InpUt]]`{.western} | |||||
### English Consonants {.western} | |||||
`[p]`{.western} | |||||
`[b]`{.western} | |||||
`[t]`{.western} | |||||
`[d]`{.western} | |||||
`[tS]`{.western} | |||||
**ch**urch | |||||
`[dZ]`{.western} | |||||
**j**udge | |||||
`[k]`{.western} | |||||
`[g]`{.western} | |||||
`[f]`{.western} | |||||
`[v]`{.western} | |||||
`[T]`{.western} | |||||
**th**in | |||||
`[D]`{.western} | |||||
**th**is | |||||
`[s]`{.western} | |||||
`[z]`{.western} | |||||
`[S]`{.western} | |||||
**sh**op | |||||
`[Z]`{.western} | |||||
plea**s**ure | |||||
`[h]`{.western} | |||||
`[m]`{.western} | |||||
`[n]`{.western} | |||||
`[N]`{.western} | |||||
si**ng** | |||||
`[l]`{.western} | |||||
`[r]`{.western} | |||||
**r**ed (Omitted if not immediately followed by a vowel). | |||||
`[j]`{.western} | |||||
**y**es | |||||
`[w]`{.western} | |||||
**Some Additional Consonants** | |||||
`[C]`{.western} | |||||
German i**ch** | |||||
`[x]`{.western} | |||||
German bu**ch** | |||||
`[l^]`{.western} | |||||
Italian **gl**i | |||||
`[n^]`{.western} | |||||
Spanish **ñ** | |||||
### English Vowels {.western} | |||||
These are the phonemes which are used by the English spelling-to-phoneme | |||||
translations (en\_rules and en\_list). In some varieties of English | |||||
different phonemes may have the same sound, but they are kept separate | |||||
because they may differ in another variety. | |||||
In rhotic accents, such as General American, the phonemes
`[3:], [A@], [e@], [i@], [O@], [U@]`{.western} include the "r" sound.
`[@]`{.western} | |||||
alph**a** | |||||
schwa | |||||
`[3]`{.western} | |||||
bett**er** | |||||
rhotic schwa. In British English this is the same as `[@]`{.western}, | |||||
but it includes 'r' colouring in American and other rhotic accents. In | |||||
these cases a separate `[r]`{.western} should not be included unless it | |||||
is followed immediately by another vowel. | |||||
`[3:]`{.western} | |||||
n**ur**se | |||||
`[@L]`{.western} | |||||
simp**le** | |||||
`[@2]`{.western} | |||||
the | |||||
Used only for "the". | |||||
`[@5]`{.western} | |||||
to | |||||
Used only for "to". | |||||
`[a]`{.western} | |||||
tr**a**p | |||||
`[aa]`{.western} | |||||
b**a**th | |||||
This is `[a]`{.western} in some accents, `[A:]`{.western} in others. | |||||
`[a#]`{.western} | |||||
**a**bout | |||||
This may be `[@]`{.western} or may be a more open schwa. | |||||
`[A:]`{.western} | |||||
p**al**m | |||||
`[A@]`{.western} | |||||
st**ar**t | |||||
`[E]`{.western} | |||||
dr**e**ss | |||||
`[e@]`{.western} | |||||
squ**are** | |||||
`[I]`{.western} | |||||
k**i**t | |||||
`[I2]`{.western} | |||||
**i**ntend | |||||
As `[I]`{.western}, but also indicates an unstressed syllable. | |||||
`[i]`{.western} | |||||
happ**y** | |||||
An unstressed "i" sound at the end of a word. | |||||
`[i:]`{.western} | |||||
fl**ee**ce | |||||
`[i@]`{.western} | |||||
n**ear** | |||||
`[0]`{.western} | |||||
l**o**t | |||||
`[V]`{.western} | |||||
str**u**t | |||||
`[u:]`{.western} | |||||
g**oo**se | |||||
`[U]`{.western} | |||||
f**oo**t | |||||
`[U@]`{.western} | |||||
c**ure** | |||||
`[O:]`{.western} | |||||
th**ou**ght | |||||
`[O@]`{.western} | |||||
n**or**th | |||||
`[o@]`{.western} | |||||
f**or**ce | |||||
`[aI]`{.western} | |||||
pr**i**ce | |||||
`[eI]`{.western} | |||||
f**a**ce | |||||
`[OI]`{.western} | |||||
ch**oi**ce | |||||
`[aU]`{.western} | |||||
m**ou**th | |||||
`[oU]`{.western} | |||||
g**oa**t | |||||
`[aI@]`{.western} | |||||
sc**ie**nce | |||||
`[aU@]`{.western} | |||||
h**our** | |||||
### Some Additional Vowels {.western} | |||||
Other languages will have their own vowel definitions, eg: | |||||
+--------------------------------------+--------------------------------------+ | |||||
| `[e]`{.western} | German **eh**, French **é** | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| `[o]`{.western} | German **oo**, French **o** | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| `[y]`{.western} | German **ü**, French **u** | | |||||
+--------------------------------------+--------------------------------------+ | |||||
| `[Y]`{.western} | German **ö**, French **oe** | | |||||
+--------------------------------------+--------------------------------------+ | |||||
`[:]`{.western} can be used to lengthen a vowel, eg `[e:]`{.western}.
PHONEME TABLES {.western} | |||||
-------------- | |||||
A phoneme table defines all the phonemes which are used by a language, | |||||
together with their properties and the data for their production as | |||||
sounds. | |||||
Generally each language has its own phoneme table, although additional | |||||
phoneme tables can be used for different voices within the language. | |||||
These alternatives are referenced from Voice files. | |||||
A phoneme table does not need to define all the phonemes used by a | |||||
language. It can inherit the phonemes from a previously defined phoneme | |||||
table. For example, a phoneme table may redefine (or add) some of the | |||||
vowels that it uses, but inherit most of its consonants from a standard | |||||
set. | |||||
The source files for the phoneme data are in the "phsource" directory in
the espeakedit download package. "Vowel files", which are referenced in
FMT(), VowelStart(), and VowelEnding() instructions, are made using the
espeakedit program.
### Phoneme files {.western} | |||||
The phoneme tables are defined in a master phoneme file, named | |||||
**phonemes**. This starts with the **base** phoneme table followed by | |||||
phoneme tables for other languages and voices. These inherit phonemes | |||||
from the **base** table or previously defined tables. | |||||
In addition to phoneme definitions, the phoneme file can contain the | |||||
following: | |||||
**include** \<filename\> | |||||
: Includes the text of the specified file at this point. This allows | |||||
different phoneme tables to be kept in different text files, for | |||||
convenience. \<filename\> is a relative path. The included file can | |||||
itself contain **include** statements. | |||||
**phonemetable** \<name\> \<parent\> | |||||
: Starts a new phoneme table, and ends the previous table.\ | |||||
\<name\> Is the name of this phoneme table. This name is used in | |||||
Voice files.\ | |||||
\<parent\> Is the name of a previously defined phoneme table whose | |||||
phoneme definitions are inherited by this one. The name **base** | |||||
indicates the first (base) phoneme table. | |||||
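Inheritance between phoneme tables behaves like a chained lookup: a phoneme is taken from the current table if it is defined there, otherwise from the parent, and so on back to **base**. The Python sketch below models only that lookup. The `PhonemeTable` class and the placeholder definition strings are hypothetical and do not reflect eSpeak's compiled data format.

```python
class PhonemeTable:
    """Toy model of a phonemetable with an optional parent table."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.phonemes = {}

    def define(self, phoneme, definition):
        # A definition here adds a phoneme or overrides an inherited one.
        self.phonemes[phoneme] = definition

    def lookup(self, phoneme):
        # Walk up the parent chain until a definition is found.
        table = self
        while table is not None:
            if phoneme in table.phonemes:
                return table.phonemes[phoneme]
            table = table.parent
        raise KeyError(phoneme)


base = PhonemeTable("base")
base.define("t", "t as defined in base")
base.define("s", "s as defined in base")

en = PhonemeTable("en", parent=base)   # like: phonemetable en base
en.define("t", "t as redefined in en")
```

Here `en.lookup("s")` falls through to the base table, while `en.lookup("t")` uses the redefinition in `en`.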
### Phoneme definitions {.western} | |||||
Note: These new Phoneme definitions apply to eSpeak version 1.42.20 and | |||||
later. | |||||
A phoneme table contains a list of phoneme definitions. Each starts with | |||||
the keyword **phoneme** and the phoneme name (this is the name used in | |||||
the pronunciation rules in a language's \*\_rules and \*\_list files), | |||||
and ends with the keyword **endphoneme**. For example: | |||||
~~~~ {.western} | |||||
phoneme aI | |||||
vowel | |||||
starttype #a endtype #i | |||||
length 230 | |||||
FMT(vowels/ai) | |||||
endphoneme | |||||
phoneme s | |||||
vls alv frc sibilant | |||||
voicingswitch z | |||||
lengthmod 3 | |||||
Vowelin f1=0 f2=1700 -300 300 f3=-100 80 | |||||
Vowelout f1=0 f2=1700 -300 250 f3=-100 80 rms=20 | |||||
IF nextPh(isPause) THEN | |||||
WAV(ufric/s_) | |||||
ELIF nextPh(p) OR nextPh(t) OR nextPh(k) THEN | |||||
WAV(ufric/s!) | |||||
ENDIF | |||||
WAV(ufric/s) | |||||
endphoneme | |||||
~~~~ | |||||
A phoneme definition contains both static properties and executed | |||||
instructions. The instructions may contain conditional statements, so | |||||
that the effect of the phoneme may be different depending on adjacent | |||||
phonemes, whether the syllable is stressed, etc. | |||||
The instructions of a phoneme are interpreted in two different phases. | |||||
In the first phase, the instructions may change the phoneme and replace | |||||
it by a different phoneme. In the second phase, instructions are used to | |||||
produce the sound for the phoneme. | |||||
The **import\_phoneme** statement can be used to copy a previously | |||||
defined phoneme from a specified phoneme table. For example: | |||||
~~~~ {.western} | |||||
phoneme t | |||||
import_phoneme base/t[ | |||||
endphoneme | |||||
~~~~ | |||||
means: `phoneme t`{.western} in this phoneme table is a copy of | |||||
`phoneme t[`{.western} from phoneme table "base". A **length** | |||||
instruction can be used after **import\_phoneme** to vary the length | |||||
from the original. | |||||
### Phoneme Properties {.western} | |||||
Within the phoneme definition the following lines may occur: ( (V) | |||||
indicates only for vowels, (C) only for consonants) | |||||
### Phoneme Instructions {.western} | |||||
Phoneme Instructions may be included within conditional statements. | |||||
During the first phase of phoneme interpretation, an instruction which | |||||
causes a change to a different phoneme will terminate the instructions. | |||||
During the second phase, FMT() and WAV() instructions will terminate the | |||||
instructions. | |||||
### Conditional Statements {.western} | |||||
Phoneme definitions can contain conditional statements such as: | |||||
~~~~ {.western} | |||||
IF <condition> THEN | |||||
<statements> | |||||
ENDIF | |||||
~~~~ | |||||
or more generally: | |||||
~~~~ {.western} | |||||
IF <condition> THEN | |||||
<statements> | |||||
ELIF <condition> THEN | |||||
<statements> | |||||
... | |||||
ELSE | |||||
<statements> | |||||
ENDIF | |||||
~~~~ | |||||
where the `ELSE`{.western} and multiple `ELIF`{.western} parts are
optional.
Multiple conditions may be joined with `AND`{.western} or | |||||
`OR`{.western}, but not a mixture of `AND`{.western}s and | |||||
`OR`{.western}s. | |||||
A condition may be preceded by `NOT`{.western}. For example: | |||||
~~~~ {.western} | |||||
IF <condition> AND NOT <condition> THEN | |||||
<statements> | |||||
ENDIF | |||||
~~~~ | |||||
**Condition** Can be: | |||||
**Attributes** | |||||
### Sound Specifications {.western} | |||||
There are three ways to produce sounds: | |||||
- - - | |||||
### Vowel Transitions {.western} | |||||
These specify how a consonant affects an adjacent vowel. A consonant may | |||||
cause a transition in the vowel's formants as the mouth changes shape | |||||
between the consonant and the vowel. The following attributes may be | |||||
specified. Note that the maximum rate of change of formant frequencies | |||||
is limited by the speak program. | |||||
TEXT MARKUP {.western} | |||||
----------- | |||||
### SSML: Speech Synthesis Markup Language {.western} | |||||
The following markup tags and attributes are recognised: | |||||
**\<speak\>** | |||||
- - | |||||
**\<voice\>** | |||||
- - - - - | |||||
**\<prosody\>** | |||||
- - - - | |||||
**\<say-as\>** | |||||
- - - - - | |||||
**\<mark\>** name | |||||
**\<s\>** | |||||
- | |||||
**\<p\>** | |||||
- | |||||
**\<sub\>** alias | |||||
**\<tts:style\>** | |||||
- - | |||||
**\<audio\>** src | |||||
**\<emphasis\>** | |||||
- | |||||
**\<break\>** | |||||
- - | |||||
### HTML {.western} | |||||
eSpeak can speak HTML text directly, or text containing both SSML and | |||||
HTML markup.\ | |||||
Any unrecognised tags are ignored. | |||||
The following tags cause a sentence break.\
**\<br\> \<dd\> \<li\> \<img\> \<td\>**
The following tags cause a paragraph break.\
**\<h1\> \<h2\> \<h3\> \<h4\> \<hr\>**
Text between the following tags is ignored.\
**\<script\> ... \</script\>**\
**\<style\> ... \</style\>**
5. VOICES {.western} | |||||
--------- | |||||
### 5.1 Voice Files {.western} | |||||
A Voice file specifies a language (and possibly a language variant or | |||||
dialect) together with various attributes that affect the | |||||
characteristics of the voice quality and how the language is spoken. | |||||
Voice files are placed in the `espeak-data/voices`{.western} directory, | |||||
or within subdirectories in there. | |||||
The available voice files can be listed by: | |||||
~~~~ {.western} | |||||
espeak-ng --voices | |||||
or | |||||
espeak-ng --voices=<language> | |||||
~~~~ | |||||
also | |||||
~~~~ {.western style="margin-bottom: 0.5cm"} | |||||
espeak-ng --voices=variant
~~~~ | |||||
Lists voice variants which can be applied to eSpeak voices. | |||||
~~~~ {.western style="margin-bottom: 0.5cm"} | |||||
espeak-ng --voices=mbrola
~~~~ | |||||
Lists the Mbrola voices. | |||||
### 5.2 Contents of Voice Files {.western} | |||||
The **language** attribute is mandatory. All the other attributes are | |||||
optional. | |||||
#### Identification Attributes {.western} | |||||
**name \<name\>** | |||||
A name given to this voice. | |||||
**language \<language code\> [\<priority\>]** | |||||
This attribute should appear before the other attributes which are | |||||
listed below. | |||||
It selects the default behaviour and characteristics for the language, | |||||
and sets default values for "phonemes", "dictionary" and other | |||||
attributes. The \<language code\> should be a two-letter ISO 639-1 | |||||
language code. One or more language variant codes may be appended, | |||||
separated by hyphens. (eg. en-uk-north). | |||||
The optional \<priority\> value gives the preference of this voice | |||||
compared with others for the specified language. A low value indicates a | |||||
more preferred voice. The default value is 5. | |||||
More than one **language** line may be present. A voice may be selected | |||||
for other related languages (variants which have the same initial 2 | |||||
letter language code as the specified language), but it will be less | |||||
preferred for these. Different language variants may be specified by | |||||
additional **language** lines in order to indicate that this is a | |||||
preferred voice for them also. Eg. | |||||
~~~~ {.western} | |||||
language en-uk-north | |||||
language en | |||||
~~~~ | |||||
indicates that this voice is for the "en-uk-north" dialect, but it is
also a main choice when a general "en" language is specified. Without | |||||
the second **language** line, it would be disfavoured for "en" for being | |||||
a more specialised voice. | |||||
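To make the **language** syntax concrete, the Python sketch below reads the `language` lines out of the text of a voice file and fills in the default priority of 5 where none is given. It is a minimal illustration: `parse_language_lines` is a hypothetical helper, `//` is assumed to start a comment as in the voice file examples, and the scoring eSpeak applies when choosing between voices is not modelled.

```python
DEFAULT_PRIORITY = 5  # default stated in the text; lower means more preferred


def parse_language_lines(voice_file_text):
    """Return [(language code, priority), ...] in file order."""
    entries = []
    for line in voice_file_text.splitlines():
        fields = line.split("//")[0].split()   # drop any trailing comment
        if len(fields) >= 2 and fields[0] == "language":
            priority = int(fields[2]) if len(fields) > 2 else DEFAULT_PRIORITY
            entries.append((fields[1], priority))
    return entries


example = """name northern
language en-uk-north
language en 2   // also offered for general English
gender male
"""
entries = parse_language_lines(example)
```

For the sample text this yields `en-uk-north` with the default priority 5 and `en` with priority 2.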
**gender \<gender\> [\<age\>]** | |||||
This attribute is only a label for use in voice selection. It doesn't | |||||
change the sound of the voice. | |||||
\<gender\> may be male, female, or unknown.\ | |||||
\<age\> is optional and gives an age in years. | |||||
**pitch \<base\> \<range\>** | |||||
Two integer values. The first gives a base pitch to the voice (value in
Hz). The second controls the range of pitches used by the voice. Setting
it equal to the base pitch will give a monotone. The default values are
82 118.
**formant \<number\> \<frequency\> \<strength\> \<width\> | |||||
\<freq\_add\>** | |||||
Systematically adjusts the frequency, strength, and width of the | |||||
resonance peaks of the voice. Values are percentages of the default | |||||
values. Changing these affects the tone/quality of the voice. | |||||
**freq\_add** adds a constant value (in Hz) to the frequency of the
formant peak. The value may be negative.
- - - - | |||||
**echo \<delay\> \<amplitude\>** | |||||
Parameter 1 gives the delay in ms (0 to 250 ms).\
Parameter 2 gives the echo amplitude (0 to 100).\ | |||||
Adding some echo can give a clearer or more interesting sound, | |||||
especially when listening through a domestic stereo sound system, rather | |||||
than small computer speakers. | |||||
**tone** | |||||
Controls the tone of the sound.\ | |||||
**tone** is followed by up to 4 pairs of \<frequency\> \<amplitude\> | |||||
which define a frequency response graph. Frequency is in Hz and | |||||
amplitude is in the range 0 to 255. The default is: | |||||
`tone 600 170 1200 135 2000 110`{.western}
This means that from frequency 0Hz to 600Hz the amplitude is 170. From | |||||
600Hz to 1200Hz the amplitude decreases from 170 to 135, then decreases | |||||
to 110 at 2000Hz and remains at 110 at higher frequencies. This | |||||
adjustment applies only to voiced sounds such as vowels and sonorant | |||||
consonants (such as [n] and [l]). Unvoiced sounds such as [s] are | |||||
unaffected. | |||||
This **tone** statement can also appear in | |||||
`espeak-data/config`{.western}, in which case it applies to all voices | |||||
which don't have their own **tone** statement. | |||||
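The frequency response described by a **tone** statement can be evaluated at any frequency. The Python sketch below does this for the default values, assuming straight-line interpolation between the listed points (the text says only that the amplitude "decreases", so the linear shape is an assumption). `tone_amplitude` is a hypothetical helper, not an eSpeak function.

```python
DEFAULT_TONE = [(600, 170), (1200, 135), (2000, 110)]  # (frequency Hz, amplitude)


def tone_amplitude(freq_hz, points=DEFAULT_TONE):
    """Amplitude of the tone response at freq_hz.

    Flat below the first point and above the last point; linear
    interpolation (an assumption) between neighbouring points.
    """
    first_freq, first_amp = points[0]
    if freq_hz <= first_freq:
        return float(first_amp)
    for (f0, a0), (f1, a1) in zip(points, points[1:]):
        if freq_hz <= f1:
            return a0 + (a1 - a0) * (freq_hz - f0) / (f1 - f0)
    return float(points[-1][1])


for freq in (300, 900, 1200, 5000):
    print(freq, tone_amplitude(freq))
```

Under the linear assumption, 900 Hz is halfway between the first two points and so gets the amplitude halfway between 170 and 135.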
**flutter \<value\>** | |||||
Default value: 2.\ | |||||
Adds pitch fluctuations to give a wavering or older-sounding voice. A | |||||
large value (eg. 20) makes the voice sound "croaky". | |||||
**roughness \<value\>** | |||||
Default value: 2. Range 0 - 7\ | |||||
Reduces the amplitude of alternate waveform cycles in order to make the | |||||
voice sound creaky. | |||||
**voicing \<value\>** | |||||
Default value: 100.\ | |||||
Adjusts the strength of formant-synthesized sounds (vowels and sonorant | |||||
consonants). | |||||
**consonants \<value\> \<value\>** | |||||
Default values: 100, 100.\ | |||||
Adjusts the strength of noise sounds which are used in consonants. The | |||||
first value is the strength of unvoiced consonants such as "s" and "t". | |||||
The second value is the strength of the noise component of voiced | |||||
consonants such as "z" and "d". | |||||
**breath \<up to 8 integer values\>** | |||||
Default values: 0.\ | |||||
Adds noise which corresponds to the formant frequency peaks. The values | |||||
give the strength of noise for each formant peak (formants 1 to 8). | |||||
Use together with a low or zero value of the **voicing** attribute to
make a "whisper". For example:\
`breath 75 75 60 40 15 10`{.western}\
`breathw 150 150 200 200 400 400`{.western}\
`voicing 18`{.western}\
`flutter 20`{.western}\
`formant 0 100 0 100 // remove formant 0`{.western}
**breathw \<up to 8 integer values\>** | |||||
These values give bandwidths of the noise peaks of the **breath** | |||||
attribute. If **breathw** values are not given, then suitable default | |||||
values will be used. | |||||
**speed \<value\>** | |||||
Default value 100.\ | |||||
Adjusts the speaking speed by a percentage of the default rate. This | |||||
can be used if a language voice seems faster or slower compared to other | |||||
voices. | |||||
**phonemes \<name\>** | |||||
Specifies which set of phonemes to use from those contained in the
phontab, phonindex, and phondata data files. This is a **phonemetable**
name as given in the **phonemes** source file.
This parameter is usually not needed as it is set by default to the | |||||
first two letters of the "language" parameter. However, different voices | |||||
of the same language can use different phoneme sets, to give different | |||||
accents. | |||||
**dictionary \<name\>** | |||||
Specifies which pair of dictionary files to use. eg. "english" indicates
that *espeak-data/en\_dict* should be used to translate from words to
phonemes. This parameter is usually not needed as it is set by default
to the first two letters of the "language" parameter.
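The default described above can be sketched in a few lines of Python. This is only an illustration, not eSpeak's own code: the function name `default_dict_path` and the `data_dir` argument are hypothetical, and the sketch assumes the compiled dictionary always lives directly under the data directory.

```python
def default_dict_path(language, data_dir="espeak-data"):
    """Return the compiled dictionary path implied by a voice's language.

    Hypothetical helper: when no "dictionary" attribute is given, the
    name defaults to the first two letters of the "language" attribute.
    """
    name = language[:2]
    return "{}/{}_dict".format(data_dir, name)


print(default_dict_path("fr"))     # a plain language code
print(default_dict_path("en-us"))  # a dialect still maps to the "en" dictionary
```

The same two-letter default is described for the **phonemes** attribute, so a voice normally only sets either attribute when it needs something other than that default.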
**dictrules \<list of rule numbers\>** | |||||
Gives a list of conditional dictionary rules which are applied for this | |||||
voice. Rule numbers are in the range 0 to 31 and are specific to a | |||||
language dictionary. They apply to rules in the language's **\_rules** | |||||
dictionary file and also its **\_list** exceptions list. See | |||||
[dictionary.html](dictionary.html). | |||||
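As a rough sketch of the constraint above, the following Python reads a **dictrules** line and checks that every rule number lies in the range 0 to 31. The function name `parse_dictrules` is hypothetical, and treating `//` as a comment marker is an assumption based on the voice-file examples in this document.

```python
def parse_dictrules(line):
    """Parse a 'dictrules' voice-file line into a sorted list of rule numbers."""
    text = line.split("//", 1)[0]          # drop a trailing // comment
    fields = text.split()
    if not fields or fields[0] != "dictrules":
        raise ValueError("not a dictrules line")
    rules = sorted({int(field) for field in fields[1:]})
    for rule in rules:
        if not 0 <= rule <= 31:            # range stated in the text above
            raise ValueError("rule number {} is outside 0..31".format(rule))
    return rules


print(parse_dictrules("dictrules 3 1   // enable conditional rules 1 and 3"))
```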
**replace \<flags\> \<phoneme\> \<replacement phoneme\>** | |||||
Replaces one phoneme with another wherever it occurs.
\<replacement phoneme\> may be NULL, which removes the phoneme.
Flags: bit 0: replacement only occurs on the final phoneme of a word.\ | |||||
Flags: bit 1: replacement doesn't occur in stressed syllables.\ | |||||
eg. | |||||
~~~~ {.western} | |||||
replace 0 h NULL // drops h's | |||||
replace 0 V U // replaces vowel in 'strut' by that in 'foot' | |||||
// as occurs in northern British English | |||||
replace 3 N n // change 'fishing' to 'fishin' etc. | |||||
// (only the last phoneme of a word, only in unstressed syllables) | |||||
~~~~ | |||||
The phoneme mnemonics can be defined for each language, but some are
listed in [phonemes.html](phonemes.html).
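The two flag bits can be read as independent conditions. The Python below is a minimal sketch of that reading, not eSpeak's implementation: the names `FINAL_ONLY`, `UNSTRESSED_ONLY` and `apply_replace` are hypothetical, a word is modelled as a list of `(phoneme, stressed)` pairs, and the transcriptions are illustrative only.

```python
FINAL_ONLY = 1 << 0       # bit 0: only the final phoneme of a word
UNSTRESSED_ONLY = 1 << 1  # bit 1: not in stressed syllables


def apply_replace(word, flags, old, new):
    """Apply one 'replace' rule to a word.

    word: list of (phoneme, stressed) pairs, where stressed says whether
          the phoneme belongs to a stressed syllable.
    new:  replacement phoneme, or None to model NULL (phoneme is dropped).
    """
    result = []
    last = len(word) - 1
    for index, (phoneme, stressed) in enumerate(word):
        hit = phoneme == old
        if hit and (flags & FINAL_ONLY) and index != last:
            hit = False
        if hit and (flags & UNSTRESSED_ONLY) and stressed:
            hit = False
        if not hit:
            result.append(phoneme)
        elif new is not None:
            result.append(new)
    return result


# Illustrative transcriptions: 'fishing' (second syllable unstressed), 'sing'.
fishing = [("f", True), ("I", True), ("S", False), ("I", False), ("N", False)]
sing = [("s", True), ("I", True), ("N", True)]

print(apply_replace(fishing, 3, "N", "n"))  # final and unstressed: replaced
print(apply_replace(sing, 3, "N", "n"))     # stressed syllable: unchanged
```

With flags 3 both conditions must hold, which is why the rule in the example above changes "fishing" but would leave a stressed one-syllable word such as "sing" alone.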
**stressLength \<8 integer values\>** | |||||
Eight integer parameters. These control the relative lengths of the | |||||
vowels in stressed and unstressed syllables. | |||||
- - - - - - - - | |||||
**stressAdd \<8 integer values\>** | |||||
Eight integer parameters. These are added to the voice's corresponding | |||||
stressLength values. They are used in the voice variant files in | |||||
`espeak-data/voices/!v`{.western} to give some variety. Negative values | |||||
may be used. | |||||
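The addition is element by element across the eight stress levels. A small Python sketch, with a hypothetical function name and made-up numbers (they are not eSpeak's defaults):

```python
def apply_stress_add(stress_length, stress_add):
    """Add a variant's stressAdd values to a voice's stressLength values."""
    if len(stress_length) != 8 or len(stress_add) != 8:
        raise ValueError("expected eight values in each list")
    return [base + delta for base, delta in zip(stress_length, stress_add)]


# Made-up values for illustration only.
base = [150, 140, 180, 180, 0, 0, 200, 200]
variant = [10, 10, 0, 0, 0, 0, -20, -30]   # negative values shorten vowels
print(apply_stress_add(base, variant))
```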
**stressAmp \<8 integer values\>** | |||||
Eight integer parameters. These control the relative amplitudes of the | |||||
vowels in stressed and unstressed syllables (see stressLength above). | |||||
The general default values are: 16, 16, 20, 20, 20, 24, 24, 22, although | |||||
these defaults may be different for particular languages. | |||||
**intonation \<param1\>** | |||||
- - - - | |||||
**charset \<param1\>** | |||||
The ISO 8859 character set number (not all are implemented).
**dictmin \<value\>** | |||||
Used for some languages to detect if additional language data is | |||||
installed. If the size of the compiled dictionary data for the language | |||||
(the file `espeak-data/*_dict`{.western}) is less than this size then a | |||||
warning is given. | |||||
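The check amounts to a file-size comparison. The following Python is a sketch under assumptions: the function name `dict_too_small` is hypothetical, and the **dictmin** value is assumed to be a size in bytes, which the text above does not state.

```python
import os


def dict_too_small(dict_path, dictmin):
    """Return True when the compiled dictionary is smaller than dictmin.

    Assumes dictmin is a byte count. Raises OSError if the file is missing.
    """
    return os.path.getsize(dict_path) < dictmin


# Example use (the path is only an illustration):
# if dict_too_small("espeak-data/fr_dict", 20000):
#     print("warning: additional language data does not seem to be installed")
```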
**alphabet2 \<alphabet\> \<language\>** | |||||
Used to specify a language to be used to speak words which are written | |||||
in a non-native alphabet. eg: | |||||
~~~~ {.western style="margin-bottom: 0.5cm"} | |||||
alphabet2 cyr ru | |||||
~~~~ | |||||
Alphabet names include: latin, cyr (Cyrillic), ar (Arabic). The default
language for the Latin alphabet is English.
**dictdialect \<dialect\>** | |||||
Words can be marked in the \*\_list or \*\_rules file to be spoken using | |||||
a foreign voice. This **dictdialect** attribute can be used to specify | |||||
which dialect of the foreign language should be used, instead of the | |||||
default dialect. The currently available dialects are:\ | |||||
**en-us** (US English)\ | |||||
**es-la** (Latin American Spanish).\ | |||||
eg. | |||||
~~~~ {.western style="margin-bottom: 0.5cm"} | |||||
dictdialect en-us | |||||
~~~~ | |||||
This means that any words or rules which are marked with \_\^\_EN will be
spoken with the US English voice instead of the default UK English
voice.
Additional attributes are available to set various internal options | |||||
which control how language is processed. These would normally be set in | |||||
the program code rather than in a voice file. | |||||
A number of voice files are provided in the
`espeak-data/voices`{.western} directory. You can select one of these | |||||
with the **-v \<voice filename\>** parameter to the speak command. | |||||
**default** | |||||
This voice is used if none is specified in the speak command. You can | |||||
copy your preferred voice to "default" so you can use the speak command | |||||
without the need to specify a voice. | |||||
For a list of voices provided for English and other languages see | |||||
[Languages](languages.html). |