@@ -0,0 +1,157 @@ | |||
6. ADDING OR IMPROVING A LANGUAGE {.western} | |||
--------------------------------- | |||
Most of the work doesn't need any programming knowledge, just an
understanding of the language, an awareness of its features, patience
and attention to detail. Wikipedia is a good source of basic phonetic
information, eg.
[http://en.wikipedia.org/wiki/Vowel](http://en.wikipedia.org/wiki/Vowel). | |||
In many cases it should be fairly easy to add a rough implementation of | |||
a new language, hopefully enough to be intelligible. After that it's a | |||
gradual process of improvement. | |||
### 6.1 Language Code {.western} | |||
Generally, the language's international [ISO | |||
639-1](http://en.wikipedia.org/wiki/ISO_639-1) code is used to identify | |||
the language. It is used in the filenames which contain the language's | |||
data. The examples below use the code **"fr"**.
Replace this with the code of your language.
If the language does not have a 2-letter ISO\_639-1 code, then use the | |||
3-letter ISO\_639-3 code. Language codes may differ from country codes. | |||
It is possible to have different variants of a language for different | |||
dialects. For example, the sound of some phonemes may be changed, or
some of the pronunciation rules may differ.
### 6.2 Language Files {.western} | |||
The following files are needed for your language. | |||
- - - - | |||
The **fr\_rules** and **fr\_list** files are compiled to produce the | |||
file **espeak-data/fr\_dict**, which eSpeak uses when it is speaking. | |||
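The naming convention above can be sketched in a few lines of Python. This is only an illustration: the helper name `dictionary_files` is hypothetical, and the check for a 2- or 3-letter lower-case code is an assumption based on the ISO 639-1 / ISO 639-3 guidance in section 6.1.

```python
from pathlib import PurePosixPath


def dictionary_files(lang_code: str, data_dir: str = "espeak-data") -> dict:
    """Return the dictionary source and compiled file names for a language code.

    Hypothetical helper: it only applies the <code>_rules, <code>_list and
    <code>_dict naming convention described in the text.
    """
    # Assumption: a 2-letter ISO 639-1 or 3-letter ISO 639-3 code, lower case.
    if not (2 <= len(lang_code) <= 3 and lang_code.isalpha() and lang_code.islower()):
        raise ValueError(f"unexpected language code: {lang_code!r}")
    return {
        "rules": f"{lang_code}_rules",
        "list": f"{lang_code}_list",
        "dict": str(PurePosixPath(data_dir) / f"{lang_code}_dict"),
    }


files = dictionary_files("fr")
print(files["rules"], files["list"], files["dict"])
```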
### 6.3 Voice File {.western} | |||
Each language needs a voice file in **espeak-data/voices** or | |||
**espeak-data/voices/test**. The filename of the default voice for a | |||
language should be the same as the language code (eg. "fr" for French). | |||
Details of the contents of voice files are given in | |||
[voices.html](http://espeak.sf.net/voices.html). | |||
The simplest voice file would contain just 2 lines to give the language | |||
name and language code, eg: | |||
~~~~ {.western} | |||
name french | |||
language fr | |||
~~~~ | |||
This language code specifies which phoneme table and dictionary to use
(i.e. **phonemetable fr** and **espeak-data/fr\_dict**). If
needed, these can be overridden by **phonemes** and **dictionary** | |||
attributes in the voice file. For example you may want to start the | |||
implementation of a new language by using the phoneme table of an | |||
existing language. | |||
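As a rough sketch, a voice file like the one above could be generated with a small Python helper. The function name `voice_file_text` and the language code "xx" are hypothetical; the `name`, `language`, `phonemes` and `dictionary` attributes are the ones mentioned in this section, and their exact syntax should be checked against voices.html.

```python
from typing import Optional


def voice_file_text(name: str, language: str,
                    phonemes: Optional[str] = None,
                    dictionary: Optional[str] = None) -> str:
    """Build the text of a minimal voice file (hypothetical helper)."""
    lines = [f"name {name}", f"language {language}"]
    # Optional overrides, e.g. to borrow another language's phoneme table.
    if phonemes is not None:
        lines.append(f"phonemes {phonemes}")
    if dictionary is not None:
        lines.append(f"dictionary {dictionary}")
    return "\n".join(lines) + "\n"


# The two-line French voice file from the text.
print(voice_file_text("french", "fr"), end="")

# A new language "xx" (made-up code) that starts with the French phoneme table.
print(voice_file_text("newlang", "xx", phonemes="fr"), end="")
```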
### 6.4 Phoneme Definition File {.western} | |||
You must first decide on the set of phonemes (vowel and consonant | |||
sounds) for the language. These should be defined in a phoneme | |||
definition file **ph\_xxxx**, where "xxxx" is the name of your
language. A reference to this file is then included at the end of the
master phoneme file, **phsource/phonemes**, eg: | |||
~~~~ {.western} | |||
phonemetable fr base | |||
include ph_french | |||
~~~~ | |||
This example defines a phoneme table **"fr"** which inherits the | |||
contents of phoneme table **"base"**. Its contents are found in the file | |||
**ph\_french**. | |||
The **base** phoneme table contains definitions of a basic set of | |||
consonants, and also some "control" phonemes such as stress marks and | |||
pauses. These are defined in **phsource/phonemes**. The phoneme table | |||
for a language will inherit these, or alternatively it may inherit the | |||
phoneme table of another language which in turn inherits the **base** | |||
phoneme table. | |||
The phonemes file for the language defines those additional phonemes | |||
which are not inherited (generally the vowels and diphthongs, plus any | |||
additional consonants that are needed), or phonemes whose definitions | |||
differ from the inherited version (eg. the redefinition of a consonant). | |||
Details of phonemes files are given in | |||
[phontab.html](http://espeak.sf.net/phontab.html). | |||
The **Compile phoneme data** function of the **espeakedit** program | |||
compiles the phonemes files of all languages to produce the files | |||
**espeak-data/phontab**, **phonindex**, and **phondata** which are used | |||
by eSpeak. | |||
For many languages, the consonant phonemes which are already available
in eSpeak, together with the available vowel files which can be used to
define vowel phonemes, will be sufficient, at least for an initial
implementation.
### 6.5 Dictionary Files {.western} | |||
Once the language's phonemes have been defined, then pronunciation | |||
dictionary data can be produced in order to translate the language's | |||
source text into phonemes. This consists of two source files: | |||
**fr\_rules** (the spelling to phoneme rules) and **fr\_list** (an | |||
exceptions list, and attributes of certain words). The corresponding | |||
compiled data file is **espeak-data/fr\_dict** which is produced from | |||
**fr\_rules** and **fr\_list** sources by the command: | |||
> `espeak-ng --compile=fr`{.western}. | |||
Or by using the **espeakedit** program. | |||
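When the dictionary is recompiled often, the command above can be wrapped in a short script. The following Python sketch is an assumption about how one might do that; the helper names are hypothetical, and it relies only on the `--compile=<voice name>` option shown above, run from the directory that holds the `_rules` and `_list` source files.

```python
import shutil
import subprocess


def compile_command(lang_code: str) -> list:
    """Return the argument list for compiling one language's dictionary."""
    return ["espeak-ng", f"--compile={lang_code}"]


def compile_dictionary(lang_code: str, source_dir: str = "."):
    """Run the compile command in source_dir (hypothetical wrapper)."""
    if shutil.which("espeak-ng") is None:
        raise FileNotFoundError("espeak-ng was not found in the command path")
    # check=True raises CalledProcessError if espeak-ng reports a failure.
    return subprocess.run(compile_command(lang_code), cwd=source_dir, check=True)


print(" ".join(compile_command("fr")))
```

Calling `compile_dictionary("fr", "path/to/sources")` would then rebuild the French dictionary, where the path is a placeholder for wherever the source files are kept.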
Details of the contents of the dictionary files are given in | |||
[dictionary.html](http://espeak.sf.net/dictionary.html). | |||
The **fr\_list** file contains: | |||
- - - - | |||
### 6.6 Program Code {.western} | |||
The behaviour of the eSpeak program is controlled by various options | |||
such as: | |||
- - - - | |||
The function SetTranslator() at the start of the source code file | |||
tr\_languages.cpp recognizes the language code and sets the appropriate | |||
options. For a new language, you would add its language code and the | |||
required options in SetTranslator(). However, this may not be necessary | |||
during testing because most of the options can also be set in the voice | |||
file in espeak-data/voices (see [Voice | |||
files](http://espeak.sf.net/voices.html)). | |||
### 6.7 Improving a Language {.western} | |||
Listen carefully to the eSpeak voice. Try to identify what sounds wrong | |||
and what needs to be improved. | |||
- - - - - | |||
**If you are interested in working on a language, please contact me so | |||
that I can set up the initial data and discuss the features of the | |||
language.** | |||
For most of the eSpeak voices, I do not speak or understand the | |||
language, and I do not know how it should sound. I can only make | |||
improvements as a result of feedback from speakers of that language. If | |||
you want to help to improve a language, listen carefully and try to | |||
identify individual errors, either in the spelling-to-phoneme | |||
translation, the position of stressed syllables within words, or the | |||
sound of phonemes, or problems with rhythm and vowel lengths. |
@@ -0,0 +1,101 @@ | |||
ANALYSIS | |||
======== | |||
(Further notes are needed) | |||
Recordings of spoken words and phrases can be analysed to try and make | |||
eSpeak match a language more closely. Unlike most other (larger and | |||
better quality) synthesizers, eSpeak's data is not produced directly | |||
from recorded sounds. To use an analogy, it's like a drawing or sketch | |||
compared with a photograph. Or vector graphics compared with a bitmap | |||
image. It's smaller, less accurate, with less subtlety, but it can | |||
sometimes show some aspects of the picture more clearly than a more | |||
accurate image. | |||
#### Recording Sounds {.western} | |||
Recordings should be made while speaking slowly, clearly, firmly and
loudly (but not shouting). Speak about half a metre from the microphone.
Try to avoid background noise and hum interference from electrical power | |||
cables. | |||
#### Praat {.western} | |||
I use a modified version of the praat program | |||
([www.praat.org](http://www.praat.org)) to view and analyse both sound
recordings and output from eSpeak. The modification adds a new function
(`Spectrum->To_eSpeak`{.western}) which analyses a voiced sound and
produces a file which can be loaded into espeakedit. Details of the | |||
modification are in the `"praat-mod"`{.western} directory in the | |||
espeakedit package. The analysis contains a sequence of frames, one per | |||
cycle at the speech's fundamental frequency. Each frame is a short time | |||
spectrum, together with praat's estimation of the f1 to f5 formant | |||
frequencies at the time of that cycle. I also use Praat's | |||
`New->Record_mono_sound`{.western} function to make sound recordings. | |||
### Vowels and Diphthongs {.western} | |||
#### Analysing a Recording {.western} | |||
Make a recording, with a male voice, and trim it in Praat to keep just | |||
the required vowel sound. Then use the new | |||
`Spectrum->To_eSpeak`{.western} modification (this was named | |||
`To_Spectrogram2`{.western} in earlier versions) to analyse the sound. | |||
It produces a file named `"spectrum.dat"`{.western}. Load the | |||
`"spectrum.dat"`{.western} file into espeakedit. Espeakedit has two Open | |||
functions, `File->Open`{.western} and `File->Open2`{.western}. They are | |||
the same, except that they remember different paths. I generally use | |||
`File->Open2`{.western} for reading the `"spectrum.dat"`{.western} file. | |||
The data is displayed in espeakedit as a sequence of spectrum frames | |||
(see [editor.html](editor.html)). | |||
#### Tone Quality {.western} | |||
It can be difficult to match the tonal quality of a new vowel to be | |||
compatible with existing vowel files. This is determined by the relative | |||
heights and widths of the formant peaks. These vary depending on how the | |||
recording was made, the microphone, and the strength and tone of the | |||
voice. Also the positions of the higher peaks (F3 upwards) can vary | |||
depending on the characteristics of the speaker's voice. Formant peaks | |||
correspond to resonances within the mouth and throat, and they depend on | |||
its size and shape. With a female voice, all the formants (F1 upwards) | |||
are generally shifted to higher frequencies. For these reasons, it's | |||
best to use a male voice, and to use its analysed spectra only as | |||
guidance. Rather than construct formant-peaks entirely to match the | |||
analysed data, instead copy keyframes from a similar existing vowel. | |||
Then make small adjustments to match the position of the F1, F2, F3 | |||
formant peaks and hopefully produce the required vowel sound. | |||
#### Using an Existing Vowel File {.western} | |||
Choose a similar vowel file from `phsource/vowel`{.western} and open it
in espeakedit. It may be useful to use
`phsource/vowel/vowelchart`{.western} as a map to show how vowel files | |||
compare with each other. You can select a keyframe from the vowel file | |||
and use CTRL-C and CTRL-V to copy the green formant peaks onto a frame | |||
of the new spectrum sequence. Then adjust the peaks to match the new | |||
frame. Press F1 to hear the sound of the formant peaks in the selected | |||
frame. The F0 peak is provided in order to adjust the correct balance of | |||
low frequencies, below the F1 peak. If the sound is too muffled, or | |||
conversely, too "thin", try adjusting the amplitude or position of the | |||
F0 peak. | |||
#### Length and Amplitude {.western} | |||
Use an existing vowel file as a guide for how to set the amplitude and | |||
length of the keyframes. At the right of each keyframe, its length is | |||
shown in mS and under that is its relative (RMS) amplitude. The second | |||
keyframe should be marked with a red marker (use CTRL-M to toggle this). | |||
This divides the vowel into the front-part (with one frame), and the | |||
rest. Use F2 to play the sound of the new vowel sequence. It will also | |||
produce a WAV file (the default name is speech.wav) which you can read | |||
into praat to see whether it has a sensible shape. | |||
#### Using the New Vowel {.western} | |||
Make a new directory (eg. vwl\_xx) in phsource for your new vowels. Save | |||
the spectrum sequence with a name which you have chosen for it. You can | |||
then edit the phoneme file for your language (eg. phsource/ph\_xxx), and | |||
change a phoneme to refer to your new vowel file. Then do | |||
`Data->Compile_Phoneme_Data`{.western} from espeakedit's menubar to | |||
re-compile the phoneme data. |
@@ -0,0 +1,279 @@ | |||
2.1 INSTALLATION {.western} | |||
---------------- | |||
### 2.1.1 Linux and other Posix systems {.western} | |||
There are two versions of the command line program. They both have the | |||
same command parameters (see below). | |||
1. 2. | |||
Place the **espeak-ng** or **speak-ng** executable file in the command | |||
path, eg in **/usr/local/bin** | |||
Place the "**espeak-data**" directory in /usr/share as | |||
**/usr/share/espeak-data**.\ | |||
Alternatively if it is placed in the user's home directory (i.e. | |||
**/home/\<user\>/espeak-data**) then that will be used instead. | |||
#### Dependencies {.western} | |||
**espeak-ng** uses the PortAudio sound library (version 18), so you will | |||
need to have the **libportaudio0** library package installed. It may be | |||
already, since it's used by other software, such as OpenOffice.org and | |||
the Audacity sound editor. | |||
Some Linux distributions (eg. SuSe 10) have version 19 of PortAudio
which has a slightly different API. The speak program can be compiled to | |||
use version 19 of PortAudio by copying the file portaudio19.h to | |||
portaudio.h before compiling. | |||
The speak program may be compiled without using PortAudio, by removing | |||
the line | |||
~~~~ {.western style="margin-bottom: 0.5cm"} | |||
#define USE_PORTAUDIO | |||
~~~~ | |||
in the file speech.h. | |||
### 2.1.2 Windows {.western} | |||
The installer: **setup\_espeak.exe** installs the SAPI5 version of | |||
eSpeak. During installation you need to specify which voices you want to | |||
appear in SAPI5 voice menus. | |||
It also installs a command line program **espeak-ng** in the espeak-ng | |||
program directory. | |||
2.2 COMMAND OPTIONS {.western} | |||
------------------- | |||
### 2.2.1 Examples {.western} | |||
To use at the command line, type:\ | |||
**espeak-ng "This is a test"**\ | |||
or\ | |||
**espeak-ng -f \<text file\>** | |||
Or just type\ | |||
**espeak-ng**\ | |||
followed by text on subsequent lines. Each line is spoken when RETURN | |||
is pressed. | |||
Use **espeak-ng -x** to see the corresponding phoneme codes. | |||
### 2.2.2 The Command Line Options {.western} | |||
**espeak-ng [options] ["text words"]** | |||
: Text input can be taken either from a file, from a string in the | |||
command, or from stdin. | |||
**-f \<text file\>** | |||
: Speaks a text file. | |||
**--stdin** | |||
: Takes the text input from stdin. | |||
If neither -f nor --stdin is given, then the text input is taken from "text words" (a text string within double quotes). \ | |||
If that is not present then text is taken from stdin, but each line is treated as a separate sentence. \ | |||
**-a \<integer\>** | |||
: Sets amplitude (volume) in a range of 0 to 200. The default is 100. | |||
**-p \<integer\>** | |||
: Adjusts the pitch in a range of 0 to 99. The default is 50. | |||
**-s \<integer\>** | |||
: Sets the speed in words-per-minute (approximate values for the | |||
default English voice, others may differ slightly). The default | |||
value is 175. I generally use a faster speed of 260. The lower limit | |||
is 80. There is no upper limit, but about 500 is probably a | |||
practical maximum. | |||
**-b \<integer\>** | |||
: Input text character format. | |||
: 1 UTF-8. This is the default. | |||
: 2 The 8-bit character set which corresponds to the language (eg. | |||
Latin-2 for Polish). | |||
: 4 16 bit Unicode. | |||
: Without this option, eSpeak assumes text is UTF-8, but will | |||
automatically switch to the 8-bit character set if it finds an | |||
illegal UTF-8 sequence. | |||
**-g \<integer\>** | |||
: Word gap. This option inserts a pause between words. The value is | |||
    the length of the pause, in units of 10 ms (at the default speed of
    175 wpm).
**-h** or **--help**
: The first line of output gives the eSpeak version number. | |||
**-k \<integer\>** | |||
: Indicate words which begin with capital letters. | |||
: 1 eSpeak uses a click sound to indicate when a word starts with a | |||
capital letter, or double click if word is all capitals. | |||
: 2 eSpeak speaks the word "capital" before a word which begins with | |||
a capital letter. | |||
: Other values: eSpeak increases the pitch for words which begin | |||
with a capital letter. The greater the value, the greater the | |||
increase in pitch. Try -k20. | |||
**-l \<integer\>** | |||
: Line-break length, default value 0. If set, then lines which are | |||
shorter than this are treated as separate clauses and spoken | |||
separately with a break between them. This can be useful for some | |||
text files, but bad for others. | |||
**-m** | |||
: Indicates that the text contains SSML (Speech Synthesis Markup | |||
Language) tags or other XML tags. Those SSML tags which are | |||
supported are interpreted. Other tags, including HTML, are ignored, | |||
except that some HTML tags such as \<hr\> \<h2\> and \<li\> ensure a | |||
break in the speech. | |||
**-q** | |||
: Quiet. No sound is generated. This may be useful with options such | |||
as -x and --pho. | |||
**-v \<voice filename\>[+\<variant\>]** | |||
: Sets a Voice for the speech, usually to select a language. eg: | |||
~~~~ {.western style="margin-left: 1cm; margin-bottom: 0.5cm"} | |||
espeak-ng -vaf | |||
~~~~ | |||
This selects the Afrikaans voice. A modifier after the voice name can be
used to vary the tone of the voice, eg:
~~~~ {.western style="margin-left: 1cm; margin-bottom: 0.5cm"} | |||
espeak-ng -vaf+3 | |||
~~~~ | |||
The variants are `+m1 +m2 +m3 +m4 +m5 +m6 +m7`{.western} for male voices | |||
and `+f1 +f2 +f3 +f4`{.western} which simulate female voices by using
higher pitches. Other variants include `+croak`{.western} and | |||
`+whisper`{.western}. | |||
\<voice filename\> is a file within the `espeak-data/voices`{.western} | |||
directory.\ | |||
\<variant\> is a file within the `espeak-data/voices/!v`{.western} | |||
directory. | |||
Voice files can specify a language, alternative pronunciations or | |||
phoneme sets, different pitches, tonal qualities, and prosody for the | |||
voice. See the [voices.html](voices.html) file. | |||
Voice names which start with **mb-** are for use with Mbrola diphone | |||
voices, see [mbrola.html](mbrola.html) | |||
Some languages may need additional dictionary data, see | |||
[languages.html](languages.html) | |||
**-w \<wave file\>** | |||
Writes the speech output to a file in WAV format, rather than speaking | |||
it. | |||
**-x** | |||
The phoneme mnemonics, into which the input text is translated, are | |||
written to stdout. If a phoneme name contains more than one letter (eg. | |||
[tS]), the --sep or --tie option can be used to distinguish this from | |||
separate phonemes. | |||
**-X** | |||
As -x, but in addition, details are shown of the pronunciation rule and | |||
dictionary list lookup. This can be useful to see why a certain | |||
pronunciation is being produced. Each matching pronunciation rule is | |||
listed, together with its score, the highest scoring rule being used in | |||
the translation. "Found:" indicates the word was found in the dictionary | |||
lookup list, and "Flags:" means the word was found with only properties | |||
and not a pronunciation. You can see when a word has been retranslated | |||
after removing a prefix or suffix. | |||
**-z** | |||
The option removes the end-of-sentence pause which normally occurs at | |||
the end of the text. | |||
**--stdout** | |||
Writes the speech output to stdout as it is produced, rather than | |||
speaking it. The data starts with a WAV file header which indicates the | |||
sample rate and format of the data. The length field is set to zero | |||
because the length of the data is unknown when the header is produced. | |||
**--compile [=\<voice name\>]** | |||
Compile the pronunciation rule and dictionary lookup data from their | |||
source files in the current directory. The Voice determines which | |||
language's files are compiled. For example, if it's an English voice, | |||
then *en\_rules*, *en\_list*, and *en\_extra* (if present), are compiled | |||
to replace *en\_dict* in the *espeak-data* directory. If no Voice is
specified then the default Voice is used. | |||
**--compile-debug [=\<voice name\>]** | |||
The same as **--compile**, but source line numbers from the \*\_rules | |||
file are included. These are included in the rules trace when the **-X** | |||
option is used. | |||
**--ipa** | |||
Writes phonemes to stdout, using the International Phonetic Alphabet | |||
(IPA).\ | |||
If a phoneme name contains more than one letter (eg. [tS]), the --sep | |||
or --tie option can be used to distinguish this from separate phonemes. | |||
**--path [="\<directory path\>"]** | |||
Specifies the directory which contains the espeak-data directory. | |||
**--pho** | |||
When used with an mbrola voice (eg. -v mb-en1), it writes mbrola phoneme | |||
data (.pho file format) to stdout. This includes the mbrola phoneme | |||
names with duration and pitch information, in a form which is suitable | |||
as input to this mbrola voice. The --phonout option can be used to write | |||
this data to a file. | |||
**--phonout [="\<filename\>"]** | |||
If specified, the output from -x, -X, --ipa, and --pho options is | |||
written to this file, rather than to stdout. | |||
**--punct [="\<characters\>"]** | |||
Speaks the names of punctuation characters when they are encountered in | |||
the text. If \<characters\> are given, then only those listed | |||
punctuation characters are spoken, eg. `--punct=".,;?"`{.western} | |||
**--sep [=\<character\>]** | |||
The character is used to separate individual phonemes in the output | |||
which is produced by the -x or --ipa options. The default is a space | |||
character. The character z means use a ZWNJ character (U+200c). | |||
**--split [=\<minutes\>]** | |||
Used with **-w**, it starts a new WAV file every `<minutes>`{.western} | |||
minutes, at the next sentence boundary. | |||
**--tie [=\<character\>]** | |||
The character is used within multi-letter phonemes in the output which | |||
is produced by the -x or --ipa options. The default is the tie | |||
character ͡ U+361. The character z means use a ZWJ character (U+200d). | |||
**--voices [=\<language code\>]** | |||
Lists the available voices.\ | |||
If =\<language code\> is present then only those voices which are | |||
suitable for that language are listed.\ | |||
`--voices=mbrola`{.western} lists the voices which use mbrola diphone | |||
voices. These are not included in the default `--voices`{.western} list.\
`--voices=variant`{.western} lists the available voice variants (voice | |||
modifiers). | |||
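When **espeak-ng** is called from a script, it can help to check option values before running it. The Python sketch below builds an argument list from a few of the options described above (`-a`, `-p`, `-s`, `-v`, `-w`) and enforces the documented ranges. The function name `build_args` is hypothetical, and only the limits stated in this section are checked.

```python
from typing import Optional


def build_args(text: str, voice: Optional[str] = None, amplitude: int = 100,
               pitch: int = 50, speed: int = 175,
               wav_file: Optional[str] = None) -> list:
    """Return an espeak-ng argument list (hypothetical helper)."""
    if not 0 <= amplitude <= 200:
        raise ValueError("amplitude (-a) must be in the range 0 to 200")
    if not 0 <= pitch <= 99:
        raise ValueError("pitch (-p) must be in the range 0 to 99")
    if speed < 80:
        raise ValueError("speed (-s) has a lower limit of 80")
    args = ["espeak-ng", "-a", str(amplitude), "-p", str(pitch), "-s", str(speed)]
    if voice:
        args += ["-v", voice]      # eg. "af" or "af+3"
    if wav_file:
        args += ["-w", wav_file]   # write a WAV file instead of speaking
    args.append(text)
    return args


print(build_args("This is a test", voice="af+3"))
```

The resulting list can be passed to `subprocess.run()` as it stands; passing a list rather than a single string avoids shell quoting problems with the text.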
### 2.2.3 The Input Text {.western} | |||
**HTML Input** | |||
: If the -m option is used to indicate marked-up text, then HTML can | |||
be spoken directly. | |||
**Phoneme Input** | |||
: As well as plain text, phoneme mnemonics can be used in the text | |||
input to **espeak-ng**. They are enclosed within double square | |||
brackets. Spaces are used to separate words and all stressed | |||
syllables must be marked explicitly. | |||
: eg: | |||
`espeak-ng -v en "[[D,Is Iz sVm f@n'EtIk t'Ekst 'InpUt]]" `{.western} | |||
: This command will speak: "This is some phonetic text input". | |||
@@ -0,0 +1,655 @@ | |||
4. TEXT TO PHONEME TRANSLATION {.western} | |||
------------------------------ | |||
### 4.1 Translation Files {.western} | |||
There is a separate set of pronunciation files for each language, their | |||
names starting with the language name. | |||
There are two separate methods for translating words into phonemes: | |||
- - | |||
These two files are compiled into the file ***\<language\>\_dict*** in | |||
the espeak-data directory (eg. espeak-data/en\_dict) | |||
### 4.2 Phoneme names {.western} | |||
Each of the language's phonemes is represented by a mnemonic of 1, 2, 3, | |||
or 4 characters. Together with a number of utility codes (eg. stress | |||
marks and pauses), these are defined in the phoneme data file (see | |||
\*spec not yet available\*). | |||
The utility 'phonemes' are: | |||
+--------------------------------------+--------------------------------------+ | |||
| **'** | primary stress | | |||
+--------------------------------------+--------------------------------------+ | |||
| **,** | secondary stress | | |||
+--------------------------------------+--------------------------------------+ | |||
| **%** | unstressed syllable | | |||
+--------------------------------------+--------------------------------------+ | |||
| **= ** | put the primary stress on the | | |||
| | preceding syllable | | |||
+--------------------------------------+--------------------------------------+ | |||
| **\_:** | short pause | | |||
+--------------------------------------+--------------------------------------+ | |||
| **\_** | a shorter pause | | |||
+--------------------------------------+--------------------------------------+ | |||
| **||** | indicates a word boundary within a | | |||
| | phoneme string | | |||
+--------------------------------------+--------------------------------------+ | |||
| **|** | can be used to separate two adjacent | | |||
| | characters, to prevent them from | | |||
| | being considered as a | | |||
| | multi-character phoneme mnemonic | | |||
+--------------------------------------+--------------------------------------+ | |||
It is not necessary to specify the stress of every syllable. Stress | |||
markers are only needed in order to change the effect of the language's | |||
default stress rule. | |||
The phonemes which are used to represent a language's sounds are based | |||
loosely on the Kirshenbaum ascii character representation of the | |||
International Phonetic Alphabet | |||
[www.kirshenbaum.net/IPA/ascii-ipa.pdf](http://www.kirshenbaum.net/IPA/ascii-ipa.pdf) | |||
### 4.3 Pronunciation Rules {.western} | |||
The rules in the ***\<language\>\_rules*** file specify the phonemes | |||
which are used to pronounce each letter, or sequence of letters. Some | |||
rules only apply when the letter or letters are preceded by, or followed | |||
by, other specified letters. | |||
To find the pronunciation of a word, the rules are searched and any | |||
which match the letters at the current position in the word are given a score depending
on how many letters are matched. The pronunciation from the best | |||
matching rule is chosen. The pointer into the source word is then | |||
advanced past those letters which have been matched and the process is | |||
repeated until all the letters of the word have been processed. | |||
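The search loop described above can be illustrated with a much simplified Python sketch. It is not the eSpeak implementation: the `Rule` tuple, the scoring (one point per matched letter in the context and match parts) and the sample rules are assumptions made for illustration, and the special characters described later in this chapter are ignored.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    pre: str       # letters required before the match ("" means none)
    match: str     # letters translated by this rule
    post: str      # letters required after the match ("" means none)
    phonemes: str  # phoneme string produced


def score(rule: Rule, word: str, pos: int) -> int:
    """Return the number of letters matched, or -1 if the rule does not apply."""
    end = pos + len(rule.match)
    if word[pos:end] != rule.match:
        return -1
    if rule.pre and not word[:pos].endswith(rule.pre):
        return -1
    if rule.post and not word[end:].startswith(rule.post):
        return -1
    return len(rule.pre) + len(rule.match) + len(rule.post)


def translate(word: str, rules: list) -> str:
    """Repeatedly apply the best scoring rule and advance past its match."""
    pos, out = 0, []
    while pos < len(word):
        best = max(rules, key=lambda r: score(r, word, pos))
        if score(best, word, pos) < 0:
            raise ValueError(f"no rule matches at {word[pos:]!r}")
        out.append(best.phonemes)
        pos += len(best.match)
    return "".join(out)


RULES = [
    Rule("", "b", "", "b"), Rule("", "k", "", "k"),
    Rule("", "m", "", "m"), Rule("", "n", "", "n"),
    Rule("", "oo", "", "u:"),    # "oo" is normally [u:]
    Rule("b", "oo", "k", "U"),   # but [U] between "b" and "k"
]

print(translate("book", RULES))  # bUk
print(translate("moon", RULES))  # mu:n
```

In "book" both "oo" rules match at the second letter, and the more specific rule wins because it matches more letters.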
#### 4.3.1 Rule Groups {.western} | |||
The rules are organized in groups, each starting with a ".group" line: | |||
When matching a word, firstly the 2-letter group for the two letters at | |||
the current position in the word (if such a group exists) is searched, | |||
and then the single-letter group. The highest scoring rule in either of | |||
those two groups is used. | |||
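A minimal sketch of that group lookup, assuming the groups are held in a dictionary keyed by their one or two group letters (the data structure and the function name `candidate_rules` are hypothetical):

```python
def candidate_rules(groups: dict, word: str, pos: int) -> list:
    """Collect rules from the 2-letter group (if any), then the 1-letter group."""
    two_letter = groups.get(word[pos:pos + 2], []) if pos + 2 <= len(word) else []
    one_letter = groups.get(word[pos], [])
    # The highest scoring rule is then chosen from this combined list.
    return two_letter + one_letter


groups = {"b": ["b-rule"], "o": ["o-rule"], "oo": ["oo-rule"]}
print(candidate_rules(groups, "book", 0))  # ['b-rule']
print(candidate_rules(groups, "book", 1))  # ['oo-rule', 'o-rule']
```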
#### 4.3.2 Rules {.western} | |||
Each rule is on a separate line, and has the syntax:
eg. | |||
"oo" is pronounced as [u:], but when also preceded by "b" and followed | |||
by "k", it is pronounced [U]. | |||
In the case of a single-letter group, the first character of \<match\> | |||
must be the group letter. In the case of a 2-letter group, the first two
characters of \<match\> must be the group letters. The second and third
rules above may be in either .group o or .group oo.
Alphabetic characters in the \<pre\>, \<match\>, and \<post\> parts must | |||
be lower case, and matching is case-insensitive. Some upper case letters | |||
are used in \<pre\> and \<post\> with special meanings. | |||
#### 4.3.3 Special characters in \<phoneme string\>: {.western} | |||
+--------------------------------------+--------------------------------------+ | |||
| **\_\^\_\<language code\> ** | Translate using a different | | |||
| | language. | | |||
+--------------------------------------+--------------------------------------+ | |||
#### 4.3.4 Special Characters in both \<pre\> and \<post\>: {.western} | |||
+--------------------------------------+--------------------------------------+ | |||
| **\_** | Beginning or end of a word (or a | | |||
| | hyphen). | | |||
+--------------------------------------+--------------------------------------+ | |||
| **-** | Hyphen. | | |||
+--------------------------------------+--------------------------------------+ | |||
| **A** | Any vowel (the set of vowel | | |||
| | characters may be defined for a | | |||
| | particular language). | | |||
+--------------------------------------+--------------------------------------+ | |||
| **C** | Any consonant. | | |||
+--------------------------------------+--------------------------------------+ | |||
| **B H F G Y ** | These may indicate other sets of | | |||
| | characters (defined for a particular | | |||
| | language). | | |||
+--------------------------------------+--------------------------------------+ | |||
| **L\<nn\>** | Any of the sequence of characters | | |||
| | defined as a letter group (see 4.3.1 | | |||
| | above). | | |||
+--------------------------------------+--------------------------------------+ | |||
| **D** | Any digit. | | |||
+--------------------------------------+--------------------------------------+ | |||
| **K** | Not a vowel (i.e. a consonant or | | |||
| | word boundary or non-alphabetic | | |||
| | character). | | |||
+--------------------------------------+--------------------------------------+ | |||
| **X** | There is no vowel until the word | | |||
| | boundary. | | |||
+--------------------------------------+--------------------------------------+ | |||
| **Z** | A non-alphabetic character. | | |||
+--------------------------------------+--------------------------------------+ | |||
| **%** | Doubled (placed before a character | | |||
| | in \<pre\> and after it in \<post\>. | | |||
+--------------------------------------+--------------------------------------+ | |||
| **/** | The following character is treated | | |||
| | literally. | | |||
+--------------------------------------+--------------------------------------+ | |||
The sets of letters indicated by A, B, C, E, F, G may be defined
differently for each language. | |||
Examples of rules: | |||
~~~~ {.western} | |||
_) a // "a" at the start of a word | |||
a (CC // "a" followed by two consonants | |||
a (C% // "a" followed by a double consonant (the same letter twice) | |||
a (/% // "a" followed by a percent sign | |||
        %C) a          // "a" preceded by a double consonant
~~~~ | |||
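To make the meaning of these context characters more concrete, here is a small Python sketch that tests a \<post\> pattern against the letters which follow the match. It is an illustration only and not eSpeak's own matcher: it handles just `_`, `A`, `C`, `D`, `%`, `/` and literal letters, and the vowel set "aeiou" is an assumption (the real set is defined per language).

```python
VOWELS = set("aeiou")  # assumption: the real vowel set depends on the language


def post_matches(pattern: str, rest: str) -> bool:
    """Return True if the <post> pattern matches the start of `rest`."""
    i = 0  # position in rest
    p = 0  # position in pattern
    while p < len(pattern):
        sym = pattern[p]
        if sym == "_":                      # end of word: nothing may follow
            if i != len(rest):
                return False
            p += 1
            continue
        if sym == "/":                      # next pattern character is literal
            p += 1
            if p >= len(pattern) or i >= len(rest) or rest[i] != pattern[p]:
                return False
        elif sym == "%":                    # doubled: same letter as the previous one
            if i == 0 or i >= len(rest) or rest[i] != rest[i - 1]:
                return False
        else:
            if i >= len(rest):
                return False
            ch = rest[i]
            if sym == "A":
                ok = ch in VOWELS
            elif sym == "C":
                ok = ch.isalpha() and ch not in VOWELS
            elif sym == "D":
                ok = ch.isdigit()
            else:
                ok = ch == sym              # ordinary lower case letter
            if not ok:
                return False
        i += 1
        p += 1
    return True


print(post_matches("CC", "pty"))   # "a (CC" in "apty": True
print(post_matches("C%", "ppy"))   # "a (C%" in "appy": True
print(post_matches("C%", "pty"))   # not a double consonant: False
```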
#### 4.3.5 Special characters only in \<pre\>: {.western} | |||
+--------------------------------------+--------------------------------------+ | |||
| **@ ** | Any syllable. | | |||
+--------------------------------------+--------------------------------------+ | |||
| **&** | A syllable which may be stressed | | |||
| | (i.e. is not defined as unstressed). | | |||
+--------------------------------------+--------------------------------------+ | |||
| **V** | Matches only if a previous word has | | |||
| | indicated that a verb form is | | |||
| | expected. | | |||
+--------------------------------------+--------------------------------------+ | |||
eg. | |||
~~~~ {.western} | |||
@@) bi // "bi" preceded by at least two syllables | |||
@@a) bi // "bi" preceded by at least 2 syllables and following 'a' | |||
~~~~ | |||
Note, that matching characters in the \<pre\> part do not affect the | |||
syllable counting. | |||
#### 4.3.6 Special characters only in \<post\>: {.western} | |||
+--------------------------------------+--------------------------------------+ | |||
| **@** | A vowel follows somewhere in the | | |||
| | word. | | |||
+--------------------------------------+--------------------------------------+ | |||
| **+** | Force an increase in the score in | | |||
| | this rule (may be repeated for more | | |||
| | effect). | | |||
+--------------------------------------+--------------------------------------+ | |||
| **S\<number\>** | This number of matching characters |
| | are a standard suffix, remove them | | |||
| | and retranslate the word. | | |||
+--------------------------------------+--------------------------------------+ | |||
| **P\<number\>** | This number of matching characters | | |||
| | are a standard prefix, remove them | | |||
| | and retranslate the word. | | |||
+--------------------------------------+--------------------------------------+ | |||
| **Lnn** | **nn** is a 2-digit decimal number | | |||
| | in the range 01 to 20\ | | |||
| | Matches with any of the letter | | |||
| | sequences which have been defined | | |||
| | for letter group **nn** | | |||
+--------------------------------------+--------------------------------------+ | |||
| **N** | Only use this rule if the word is | | |||
| | not a retranslation after removing a | | |||
| | suffix. | | |||
+--------------------------------------+--------------------------------------+ | |||
| **\#** | (English specific) change the next | | |||
| | "e" into a special character "E" | | |||
+--------------------------------------+--------------------------------------+ | |||
| **\$noprefix** | Only use this rule if the word is | | |||
| | not a retranslation after removing a | | |||
| | prefix. | | |||
+--------------------------------------+--------------------------------------+ | |||
| **\$w\_alt\ | Only use this rule if the word is | | |||
| \$w\_alt2\ | found in the \*\_list file with the | | |||
| \$w\_alt3** | **\$alt**, **\$alt2** or **\$alt3** | | |||
| | attribute respectively. | | |||
+--------------------------------------+--------------------------------------+ | |||
| **\$p\_alt\ | Only use this rule if the part-word, | | |||
| \$p\_alt2\ | up to and including the pre and | | |||
| \$p\_alt3** | match parts of this rule, is found | | |||
| | in the \*\_list file with the | | |||
| | **\$alt**, **\$alt2** or **\$alt3** | | |||
| | attribute respectively. | | |||
+--------------------------------------+--------------------------------------+ | |||
eg. | |||
~~~~ {.western} | |||
@) ly (_S2 lI // "ly", at end of a word with at least one other | |||
// syllable, is a suffix pronounced [lI]. Remove | |||
// it and retranslate the word. | |||
_) un (@P2 %Vn // "un" at the start of a word is an unstressed | |||
// prefix pronounced [Vn] | |||
_) un (i ju: // ... except in words starting "uni" | |||
_) un (inP2 ,Vn // ... but it is for words starting "unin" | |||
~~~~ | |||
S and P must be at the end of the \<post\> string. | |||
S\<number\> may be followed by additional letters (eg. S2ei ). Some of | |||
these are probably specific to English, but similar functions could be | |||
made for other languages. | |||
+--------------------------------------+--------------------------------------+ | |||
| **q** | query the \_list file to find stress | | |||
| | position or other attributes for the | | |||
| | stem, but don't re-translate the | | |||
| | word with the suffix removed. | | |||
+--------------------------------------+--------------------------------------+ | |||
| **t** | determine the stress pattern of the | | |||
| | word **before** adding the suffix | | |||
+--------------------------------------+--------------------------------------+ | |||
| **d** | the previous letter may have been |
| | doubled when the suffix was added. | | |||
+--------------------------------------+--------------------------------------+ | |||
| **e** | "e" may have been removed. | | |||
+--------------------------------------+--------------------------------------+ | |||
| **i** | "y" may have been changed to "i." | | |||
+--------------------------------------+--------------------------------------+ | |||
| **v** | the suffix means the verb form of | | |||
| | pronunciation should be used. | | |||
+--------------------------------------+--------------------------------------+ | |||
| **f** | the suffix means the next word is | | |||
| | likely to be a verb. | | |||
+--------------------------------------+--------------------------------------+ | |||
| **m** | after this suffix has been removed, | | |||
| | additional suffixes may be removed. | | |||
+--------------------------------------+--------------------------------------+ | |||
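The **d**, **e** and **i** options describe spelling changes that may have happened when the suffix was added. The following is a minimal Python sketch of how candidate stems could be generated from them. The function name `candidate_stems` and the order in which stems are tried are assumptions for illustration, not eSpeak's actual implementation.

```python
def candidate_stems(word, suffix_len, flags=""):
    """Return possible stems after removing a suffix of suffix_len letters.

    flags is a string of the single-letter options described above;
    only 'd', 'e' and 'i' are modelled in this sketch.
    """
    stem = word[:-suffix_len] if suffix_len > 0 else word
    stems = [stem]
    # 'd': the previous letter may have been doubled ("running" -> "run")
    if "d" in flags and len(stem) >= 2 and stem[-1] == stem[-2]:
        stems.append(stem[:-1])
    # 'e': an "e" may have been removed ("making" -> "make")
    if "e" in flags:
        stems.append(stem + "e")
    # 'i': a "y" may have been changed to "i" ("happily" -> "happy")
    if "i" in flags and stem.endswith("i"):
        stems.append(stem[:-1] + "y")
    return stems


print(candidate_stems("happily", 2, "i"))
print(candidate_stems("running", 3, "d"))
```

Each candidate stem would then be looked up or retranslated in turn. A real rule such as `S2i` would supply the suffix length and the flags.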
P\<number\> may be followed by additional letters (eg. P3v ).
+--------------------------------------+--------------------------------------+ | |||
| **t** | determine the stress pattern of the |
| | word **before** adding the prefix | | |||
+--------------------------------------+--------------------------------------+ | |||
| **v** | the prefix means the verb form of |
| | pronunciation should be used. | | |||
+--------------------------------------+--------------------------------------+ | |||
### 4.4 Pronunciation Dictionary List {.western} | |||
The ***\<language\>\_list*** file contains a list of words whose | |||
pronunciations are given explicitly, rather than determined by the | |||
Pronunciation Rules. The ***\<language\>\_extra*** file, if present, is | |||
also used and its contents are taken as coming after those in
***\<language\>\_list***. | |||
Also the list can be used to specify the stress pattern, or other | |||
properties, of a word. | |||
If the Pronunciation rules are applied to a word and indicate a standard | |||
prefix or suffix, then the word is again looked up in Pronunciation | |||
Dictionary List after the prefix or suffix has been removed. | |||
Lines in the dictionary list have the form of a word followed by a
phoneme string, by flags, or by both:
eg. | |||
~~~~ {.western style="margin-bottom: 0.5cm"} | |||
book bUk | |||
~~~~ | |||
Rather than a full pronunciation, just the stress may be given, to | |||
change where it would be otherwise placed by the Pronunciation Rules: | |||
~~~~ {.western} | |||
berlin $2 // stress on second syllable | |||
absolutely $3 // stress on third syllable | |||
for $u // an unstressed word | |||
~~~~ | |||
#### 4.4.1 Multiple Words {.western} | |||
A pronunciation may also be specified for a group of words, when these | |||
appear together. Up to four words may be given, enclosed in brackets. | |||
This may be used to change the pronunciation or stress pattern when
these words occur together, | |||
~~~~ {.western style="margin-bottom: 0.5cm"} | |||
(de jure) deI||dZ'U@rI2 // note || used as a word break in the phoneme string | |||
~~~~ | |||
or to run them together, pronounced as a single word | |||
~~~~ {.western style="margin-bottom: 0.5cm"} | |||
(of a) @v@ | |||
~~~~ | |||
or to give them a flag when they occur together | |||
~~~~ {.western style="margin-bottom: 0.5cm"} | |||
(such as) sVtS||a2z $pause // precede with a pause | |||
~~~~ | |||
Hyphenated words in the ***\<language\>\_list*** file must also be | |||
enclosed within brackets, because the two parts are considered as | |||
separate words. | |||
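The line formats shown above can be sketched as a small Python parser. It is only an illustration: the function name `parse_list_line` is hypothetical, and it assumes that `//` never occurs inside a phoneme string and that flags are exactly the tokens starting with `$`.

```python
def parse_list_line(line):
    """Split one *_list line into (words, phonemes, flags).

    Returns None for blank or comment-only lines.
    """
    text = line.split("//", 1)[0].strip()  # drop the trailing comment
    if not text:
        return None
    if text.startswith("("):
        # Bracketed group of words, e.g. "(de jure)"
        close = text.index(")")
        words = text[1:close].split()
        rest = text[close + 1:].split()
    else:
        first, *rest = text.split()
        words = [first]
    flags = [tok for tok in rest if tok.startswith("$")]
    phonemes = [tok for tok in rest if not tok.startswith("$")]
    return words, (phonemes[0] if phonemes else None), flags


print(parse_list_line("book bUk"))
print(parse_list_line("(such as) sVtS||a2z $pause // precede with a pause"))
```

A line with only a stress flag, such as `berlin $2`, yields no phoneme string, so the pronunciation rules would still supply the phonemes.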
#### 4.4.2 Special characters in \<phoneme string\>: {.western} | |||
+--------------------------------------+--------------------------------------+ | |||
| **\_\^\_\<language code\>** | Translate using a different |
| | language. See explanation in 4.3.3 | | |||
| | above. | | |||
+--------------------------------------+--------------------------------------+ | |||
#### 4.4.3 Flags {.western} | |||
A word (or group of words) may be given one or more flags, either | |||
instead of, or as well as, the phonetic translation. | |||
+--------------------------------------+--------------------------------------+ | |||
| \$u | The word is unstressed. In the case | | |||
| | of a multi-syllable word, a slight | | |||
| | stress is applied according to the | | |||
| | default stress rules. | | |||
+--------------------------------------+--------------------------------------+ | |||
| \$u1 | The word is unstressed, with a | | |||
| | slight stress on its 1st syllable. | | |||
+--------------------------------------+--------------------------------------+ | |||
| \$u2 | The word is unstressed, with a | | |||
| | slight stress on its 2nd syllable. | | |||
+--------------------------------------+--------------------------------------+ | |||
| \$u3 | The word is unstressed, with a | | |||
| | slight stress on its 3rd syllable. | | |||
+--------------------------------------+--------------------------------------+ | |||
| | | | |||
+--------------------------------------+--------------------------------------+ | |||
| \$u+ \$u1+ \$u2+ \$u3+ | As above, but the word has full | | |||
| | stress if it's at the end of a | | |||
| | clause. | | |||
+--------------------------------------+--------------------------------------+ | |||
| | | | |||
+--------------------------------------+--------------------------------------+ | |||
| \$1 | Primary stress on the 1st syllable. | | |||
+--------------------------------------+--------------------------------------+ | |||
| \$2 | Primary stress on the 2nd syllable. | | |||
+--------------------------------------+--------------------------------------+ | |||
| \$3 | Primary stress on the 3rd syllable. | | |||
+--------------------------------------+--------------------------------------+ | |||
| \$4 | Primary stress on the 4th syllable. | | |||
+--------------------------------------+--------------------------------------+ | |||
| \$5 | Primary stress on the 5th syllable. | | |||
+--------------------------------------+--------------------------------------+ | |||
| \$6 | Primary stress on the 6th syllable. | | |||
+--------------------------------------+--------------------------------------+ | |||
| \$7 | Primary stress on the 7th syllable. | | |||
+--------------------------------------+--------------------------------------+ | |||
| | | | |||
+--------------------------------------+--------------------------------------+ | |||
| \$pause | Ensure a short pause before this | | |||
| | word (eg. for conjunctions such as | | |||
| | "and", some prepositions, etc). | | |||
+--------------------------------------+--------------------------------------+ | |||
| \$brk | Ensure a very short pause before | | |||
| | this word, shorter than \$pause (eg. | | |||
| | for some prepositions, etc). | | |||
+--------------------------------------+--------------------------------------+ | |||
| \$only | The rule does not apply if a prefix | | |||
| | or suffix has already been removed. | | |||
+--------------------------------------+--------------------------------------+ | |||
| \$onlys | As \$only, except that a standard | | |||
| | plural ending is allowed. | | |||
+--------------------------------------+--------------------------------------+ | |||
| \$stem | The rule only applies if a suffix | | |||
| | has already been removed. | | |||
+--------------------------------------+--------------------------------------+ | |||
| \$strend | Word is fully stressed if it's at | | |||
| | the end of a clause. | | |||
+--------------------------------------+--------------------------------------+ | |||
| \$strend2 | As \$strend, but the word is also | | |||
| | stressed if followed only by | | |||
| | unstressed word(s). | | |||
+--------------------------------------+--------------------------------------+ | |||
| \$unstressend | Word is unstressed if it's at the | | |||
| | end of a clause. | | |||
+--------------------------------------+--------------------------------------+ | |||
| \$atend | Use this pronunciation if it's at | | |||
| | the end of a clause. | | |||
+--------------------------------------+--------------------------------------+ | |||
| \$double | Cause a doubling of the initial | | |||
| | consonant of the following word | | |||
| | (used for Italian). | | |||
+--------------------------------------+--------------------------------------+ | |||
| \$capital | Use this pronunciation if the word | | |||
| | has initial capital letter (eg. | | |||
| | polish v Polish). | | |||
+--------------------------------------+--------------------------------------+ | |||
| \$allcaps | Use this pronunciation if the word | | |||
| | is all capitals. | | |||
+--------------------------------------+--------------------------------------+ | |||
| \$dot | Ignore a . after this word even when | | |||
| | followed by a capital letter (eg. | | |||
| | Mr. Dr. ). | | |||
+--------------------------------------+--------------------------------------+ | |||
| \$hasdot | Use this pronunciation if the word | | |||
| | is followed by a dot. (This | | |||
| | attribute also implies \$dot). | | |||
+--------------------------------------+--------------------------------------+ | |||
| \$sentence | The rule only applies if the clause | | |||
| | includes end-of-sentence (i.e. it is | | |||
| | not terminated by a comma). For | | |||
| | example, "\$atend \$sentence" means | | |||
| | that the rule only applies at the | | |||
| | end of a sentence. | | |||
+--------------------------------------+--------------------------------------+ | |||
| \$abbrev | This has two meanings.\ | | |||
| | 1. If there is no phoneme string: | | |||
| | Speak the word as individual | | |||
| | letters, even if it contains a vowel | | |||
| | (eg. "abc" should be spoken as "a" | | |||
| | "b" "c").\ | | |||
| | 2. If there is a phoneme string: | | |||
| | This word is capitalized because it | | |||
| | is an abbreviation and | | |||
| | capitalization does not indicate | | |||
| | emphasis (if the "emphasize | | |||
| | all-caps" is on). | | |||
+--------------------------------------+--------------------------------------+ | |||
| | | | |||
+--------------------------------------+--------------------------------------+ | |||
| \$accent | Used for the pronunciation of a | | |||
| | single alphabetic character. The | | |||
| | character name is spoken as the | | |||
| | base-letter name plus the accent | | |||
| | (diacritic) name. eg. It can be used | | |||
| | to specify that "â" is spoken as "a" | | |||
| | "circumflex". | | |||
+--------------------------------------+--------------------------------------+ | |||
| \$combine | This word is treated as though it is | | |||
| | combined with the following word | | |||
| | with a hyphen. This may be subject | | |||
| | to further conditions for certain |
| | languages. | | |||
+--------------------------------------+--------------------------------------+ | |||
| \$alt \$alt2 \$alt3 | These are language specific. Their | | |||
| | use should be described in the | | |||
| | language's \*\_list file. |
+--------------------------------------+--------------------------------------+ | |||
| | | | |||
+--------------------------------------+--------------------------------------+ | |||
| \$verb | Use this pronunciation if it's a | | |||
| | verb. | | |||
+--------------------------------------+--------------------------------------+ | |||
| \$noun | Use this pronunciation if it's a | | |||
| | noun. | | |||
+--------------------------------------+--------------------------------------+ | |||
| \$past | Use this pronunciation if it's past | | |||
| | tense. | | |||
+--------------------------------------+--------------------------------------+ | |||
| \$verbf | The following word is probably a |
| | verb. | | |||
+--------------------------------------+--------------------------------------+ | |||
| \$verbsf | The following word is probably a |
| | verb if it has an "s" suffix. |
+--------------------------------------+--------------------------------------+ | |||
| \$nounf | The following word is probably not a | | |||
| | verb. | | |||
+--------------------------------------+--------------------------------------+ | |||
| \$pastf | The following word is probably past | | |||
| | tense. | | |||
+--------------------------------------+--------------------------------------+ | |||
| \$verbextend | Extend the influence of \$verbf and | | |||
| | \$verbsf. | | |||
+--------------------------------------+--------------------------------------+ | |||
The last group are probably English specific, but something similar may | |||
be useful in other languages. They are a crude attempt to improve the | |||
accuracy of pairs like ob'ject (verb) v 'object (noun) and read | |||
(present) v read (past). | |||
The dictionary list is searched from bottom to top. The first match that | |||
satisfies any conditions is used (i.e. the one lowest down the list). So | |||
if we have: | |||
~~~~ {.western} | |||
to t@ // unstressed version | |||
to tu: $atend // stressed version | |||
~~~~ | |||
then if "to" is at the end of the clause, we get [tu:], if not then we | |||
get [t@]. | |||
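The bottom-to-top search can be illustrated with a short Python sketch. The data layout and the function name `lookup` are assumptions made for the example, and only the `$atend` condition is modelled.

```python
# Entries in file order: (word, phonemes, flags)
ENTRIES = [
    ("to", "t@", []),            # unstressed version
    ("to", "tu:", ["$atend"]),   # stressed version
]


def lookup(word, at_clause_end, entries=ENTRIES):
    """Return the phonemes of the lowest entry whose conditions hold."""
    for entry_word, phonemes, flags in reversed(entries):
        if entry_word != word:
            continue
        if "$atend" in flags and not at_clause_end:
            continue  # condition not satisfied, keep searching upwards
        return phonemes
    return None


print(lookup("to", at_clause_end=True))
print(lookup("to", at_clause_end=False))
```

Because the search starts at the bottom, the more specific entry should be placed below the general one, as in the example above.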
#### 4.4.4 Translating a Word to another Word {.western} | |||
Rather than specifying the pronunciation of a word by a phoneme string, | |||
you can specify another "sounds like" word. | |||
Use the attribute **\$text** eg. | |||
~~~~ {.western style="margin-bottom: 0.5cm"} | |||
cough coff $text | |||
~~~~ | |||
Alternatively, use the command **\$textmode** on a line by itself to | |||
turn this on for all subsequent entries in the file, until it's turned | |||
off by **\$phonememode**. eg. | |||
~~~~ {.western} | |||
$textmode | |||
cough coff | |||
through threw | |||
$phonememode | |||
~~~~ | |||
This feature cannot be used for the special entries in the **\_list** | |||
files which start with an underscore, such as numbers. | |||
Currently "textmode" entries are only recognized for complete words, and | |||
not for stems from which a prefix or suffix has been removed (eg.
the word "coughs" would not match the example above). | |||
### 4.5 Conditional Rules {.western} | |||
Rules in a **\_rules** file and entries in a **\_list** file can be made | |||
conditional. They apply only to some voices. This can be useful to | |||
specify different pronunciations for different variants of a language | |||
(dialects or accents). | |||
Conditional rules have **?** and a condition number at the start of
the line in the **\_rules** or **\_list** file. This means that the rule
only applies if that condition number is specified in a **dictrules**
line in the [voice file](voices.html).
If the rule starts with **?!** then the rule only applies if the | |||
condition number is **not** specified in the voice file. eg. | |||
~~~~ {.western} | |||
?3 can't kant // only use this if the voice has: dictrules 3 | |||
?!3 rather rA:D3 // only use if the voice doesn't have: dictrules 3 | |||
~~~~ | |||
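As a rough sketch of the selection logic, the following Python function decides whether a line applies for a given set of `dictrules` numbers. The function name `applicable_rule` and the regular expression are illustrative assumptions, not part of eSpeak.

```python
import re

# "?" then an optional "!" then the condition number, then the rule itself
_CONDITION = re.compile(r"^\?(!?)(\d+)\s+(.*)$")


def applicable_rule(line, dictrules):
    """Return the rule text if it applies for this voice, else None.

    dictrules is the set of condition numbers given on the voice
    file's dictrules line.
    """
    match = _CONDITION.match(line)
    if match is None:
        return line  # unconditional line, always applies
    negated = match.group(1) == "!"
    number = int(match.group(2))
    present = number in dictrules
    return match.group(3) if present != negated else None


print(applicable_rule("?3 can't kant", {3}))
print(applicable_rule("?!3 rather rA:D3", {3}))
```

With `dictrules 3` in the voice file, the first line is kept and the second is skipped. Without it, the reverse happens.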
### 4.6 Numbers and Character Names {.western} | |||
#### 4.6.1 Letter names {.western} | |||
The names of individual letters can be given either in the **\_rules** | |||
or **\_list** file. Sometimes an individual letter is also used as a | |||
word in the language and its pronunciation as a word differs from its | |||
letter name. If so, it should be listed in the **\_list** file, preceded | |||
by an underscore, to give the letter name (as distinct from its | |||
pronunciation as a word). eg. in English: | |||
~~~~ {.western style="margin-bottom: 0.5cm"} | |||
_a eI | |||
~~~~ | |||
#### 4.6.2 Numbers {.western} | |||
The operation of the TranslateNumber() function is controlled by the
language's `langopts.numbers`{.western} option. This constructs spoken | |||
numbers from fragments according to various options which can be set for | |||
each language. The number fragments are given in the **\_list** file. | |||
+--------------------------------------+--------------------------------------+ | |||
| \_0 to \_9 | The numbers 0 to 9 | | |||
+--------------------------------------+--------------------------------------+ | |||
| \_13 | etc. Any pronunciations which are | | |||
| | needed for specific numbers in the | | |||
| | range \_10 to \_99 | | |||
+--------------------------------------+--------------------------------------+ | |||
| \_2X \_3X | Twenty, thirty, etc., used to make | | |||
| | numbers 10 to 99 | | |||
+--------------------------------------+--------------------------------------+ | |||
| \_0C | The word for "hundred" | | |||
+--------------------------------------+--------------------------------------+ | |||
| \_1C \_2C | Special pronunciation for one | | |||
| | hundred, two hundred, etc., if | | |||
| | needed. | | |||
+--------------------------------------+--------------------------------------+ | |||
| \_1C0 | Special pronunciation (if needed) | | |||
| | for 100 exactly | | |||
+--------------------------------------+--------------------------------------+ | |||
| \_0M1 | The word for "thousand" | | |||
+--------------------------------------+--------------------------------------+ | |||
| \_0M2 | The word for "million" | | |||
+--------------------------------------+--------------------------------------+ | |||
| \_0M3 | The word for 1000000000 | | |||
+--------------------------------------+--------------------------------------+ | |||
| \_1M1 \_2M1 | Special pronunciation for one | | |||
| | thousand, two thousand, etc, if | | |||
| | needed | | |||
+--------------------------------------+--------------------------------------+ | |||
| \_0and | Word for "and" when speaking numbers | | |||
| | (eg. "two hundred and twenty"). | | |||
+--------------------------------------+--------------------------------------+ | |||
| \_dpt | Word spoken for the decimal |
| | point/comma | | |||
+--------------------------------------+--------------------------------------+ | |||
| \_dpt2 | Word spoken (if any) at the end of | | |||
| | all the digits after a decimal | | |||
| | point. | | |||
+--------------------------------------+--------------------------------------+ | |||
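To show how these fragments might combine, here is a much-simplified Python sketch for numbers from 0 to 999. It is not the TranslateNumber() logic: the function name `number_fragments`, the English-like word order and the handling of "and" are assumptions, and the many `langopts.numbers` options and the `_1C0` entry are ignored.

```python
def number_fragments(n, defined):
    """List the *_list keys that could make up n (0 <= n <= 999).

    defined is the set of keys present in the language's *_list file.
    """
    if not 0 <= n <= 999:
        raise ValueError("this sketch only handles 0 to 999")
    hundreds, rest = divmod(n, 100)
    keys = []
    if hundreds:
        special = f"_{hundreds}C"
        if special in defined:
            keys.append(special)             # special form, e.g. _2C
        else:
            keys += [f"_{hundreds}", "_0C"]  # digit followed by "hundred"
        if rest and "_0and" in defined:
            keys.append("_0and")
    if rest or not hundreds:
        if f"_{rest}" in defined:
            keys.append(f"_{rest}")          # _0 to _9, or a specific entry such as _13
        else:
            tens, units = divmod(rest, 10)
            keys.append(f"_{tens}X")
            if units:
                keys.append(f"_{units}")
    return keys


defined = {f"_{i}" for i in range(20)} | {"_0C", "_0and"}
print(number_fragments(213, defined))
print(number_fragments(45, defined))
```

The function returns entry names rather than phonemes, so the phoneme strings themselves still come from the **\_list** file.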
### 4.7 Character Substitution {.western} | |||
Character substitutions can be specified by using a **.replace** section
at the start of the **\_rules** file. Each line specifies either one or
two alphabetic characters to be replaced by another one or two | |||
alphabetic characters. This substitution is done to a word before it is | |||
translated using the spelling-to-phoneme rules. Only the lower-case | |||
version of the characters needs to be specified. eg. | |||
.replace\
ô ő // (Hungarian) allow the use of o-circumflex instead of o-double-acute\
û ű\
cx ĉ // (Esperanto) allow "cx" as an alternative to c-circumflex\
ﬁ fi // replace a single character ligature by two characters
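The effect of a **.replace** section can be sketched in Python as a left-to-right substitution over the word. The table layout and the function name `substitute` are assumptions for illustration. Lower-casing the whole word is a simplification of the statement that only lower-case characters need to be specified.

```python
REPLACEMENTS = {
    "ô": "ő",         # (Hungarian) o-circumflex -> o-double-acute
    "û": "ű",
    "cx": "ĉ",        # (Esperanto) "cx" as an alternative to c-circumflex
    "\ufb01": "fi",   # single-character "fi" ligature -> two characters
}


def substitute(word, table=REPLACEMENTS):
    """Apply the replacements to a word, scanning left to right."""
    word = word.lower()
    out = []
    i = 0
    while i < len(word):
        for length in (2, 1):  # prefer a two-character match
            chunk = word[i:i + length]
            if len(chunk) == length and chunk in table:
                out.append(table[chunk])
                i += length
                break
        else:
            out.append(word[i])  # no replacement applies here
            i += 1
    return "".join(out)


print(substitute("cxu"))
print(substitute("\ufb01n"))
```

The result of the substitution is what would then be passed to the spelling-to-phoneme rules.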
@@ -0,0 +1,46 @@ | |||
ESPEAKEDIT PROGRAM {.western} | |||
------------------ | |||
The **espeakedit** program is used to prepare phoneme data for the | |||
eSpeak speech synthesizer. | |||
It has two main functions: | |||
- - | |||
### Installation {.western} | |||
**espeakedit** needs the following packages:\ | |||
(The package names mentioned here are those from the Ubuntu "Dapper" | |||
Linux distribution). | |||
- - - | |||
In addition, a modified version of **praat** | |||
([www.praat.org](www.praat.org)) is used to view and analyse WAV sound | |||
files. This needs the package **libmotif3** to run and **libmotif-dev** | |||
to compile. | |||
### Quick Guide {.western} | |||
This will quickly illustrate the main features. Details of the interface | |||
and key commands are given in [editor\_if.html](editor_if.html) | |||
For more detailed information on analysing sound recordings and | |||
preparing phoneme definitions and keyframe data see | |||
[analyse.html](analyse.html) (to be written). | |||
#### Compiling Phoneme Data {.western} | |||
1. 2. 3. 4. | |||
#### Keyframe Sequences {.western} | |||
1. 2. 3. 4. 5. 6. 7. | |||
#### Text and Prosody Windows {.western} | |||
1. 2. 3. 4. 5. 6. 7. 8. 9. | |||
The Prosody window can be used to experiment with different phoneme | |||
lengths and different intonation. |
@@ -0,0 +1,41 @@ | |||
USER INTERFACE - FORMANT EDITOR {.western} | |||
------------------------------- | |||
### Frame Sequence Display {.western} | |||
The eSpeak editor can display a number of frame sequences in tabbed
windows. Each frame can contain a short-time frequency spectrum, | |||
covering the period of one cycle at the sound's pitch. Frames can also | |||
show: | |||
- - - - - | |||
### Text Tab {.western} | |||
Enter text in the top left text window. Click the **Translate** button | |||
to see the phonetic transcription in the text window below. Then click | |||
the **Speak** button to speak the text and show the results in the | |||
**Prosody** tab, if that is open. | |||
If changes are made in the **Prosody** tab, then clicking **Speak** will | |||
speak the modified prosody while **Translate** will revert to the | |||
default prosody settings for the text. | |||
To enter phonetic symbols (Kirschenbaum encoding) in the top left text | |||
window, enclose them within [[ ]]. | |||
### Spect Tab {.western} | |||
The "Spect" tab in the left panel of the eSpeak editor shows information | |||
about the currently selected frame and sequence. | |||
- - - - - - | |||
### Key Commands {.western} | |||
- - - - - | |||
USER INTERFACE - PROSODY EDITOR {.western style="margin-left: 1cm"} | |||
------------------------------- | |||
- |
@@ -0,0 +1,52 @@ | |||
# eSpeak NG - Documentation | |||
====================== | |||
### [Usage](commands.md) | |||
### [Languages](languages.md) | |||
### [Voice Files](voices.md) | |||
Voice files specify a language and other characteristics of a voice. | |||
### [Mbrola Voices](mbrola.md) | |||
eSpeak NG can be used as a front-end for Mbrola diphone voices. | |||
### [Pronunciation Dictionary](dictionary.md) | |||
### [Adding a Language](add_language.md) | |||
How to add or improve a language. | |||
### [Phonemes](phonemes.md) | |||
The list of phoneme mnemonics for English, for use in the Pronunciation | |||
Dictionary. | |||
### [Phoneme Tables](phontab.md) | |||
The tables of the phonemes used by each language, with their properties | |||
and sound production. | |||
### [Intonation](intonation.md) | |||
Different intonation "tunes" may be defined for different languages for | |||
clauses which end in full-stop, comma, question-mark, and | |||
exclamation-mark. | |||
### [eSpeak NG Library API](speak_lib.h) | |||
API definition and header file for a shared library version of eSpeak NG. | |||
### [Markup tags](ssml.md) | |||
SSML (Speech Synthesis Markup Language) and HTML tags recognized by | |||
eSpeak NG. | |||
### [The espeakedit program](editor.md) | |||
GUI software to edit vowel files and to compile the phoneme data for use | |||
by eSpeak NG. See also [Espeakedit user interface](editor_if.md). | |||
@@ -0,0 +1,102 @@ | |||
INTONATION {.western} | |||
---------- | |||
In eSpeak's standard intonation model, a "tune" is applied to each | |||
clause depending on its punctuation. Other intonation models may be used | |||
for some languages, such as tone languages. | |||
Named tunes are defined in the text file: | |||
`phsource/intonation`{.western}. This file must be compiled for use by | |||
eSpeak by using the espeakedit program, using the menu option: | |||
`Compile -> Compile intonation data`{.western}. | |||
### Clauses {.western} | |||
The tunes which are used for a language can be specified by using a | |||
`tunes`{.western} statement in a voice file in | |||
`espeak-data/voices`{.western}. eg: | |||
`tunes s1 c1 q1 e1`{.western} | |||
Its parameters are four tune names which are used for clauses which end
in: | |||
1.  a full-stop
2.  a comma
3.  a question-mark
4.  an exclamation-mark
A clause consists of the following parts: | |||
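-   the prehead: any unstressed syllables before the first stressed syllable
-   the head: the stressed syllables before the nucleus, with the unstressed syllables between them
-   the nucleus: the last stressed syllable of the clause
-   the tail: any unstressed syllables after the nucleus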
### Tune definitions {.western} | |||
Here is an example tune definition from the file | |||
`phsource/intonation`{.western}. | |||
~~~~ {.western} | |||
tune s1 | |||
prehead 46 57 | |||
headenv fall 16 | |||
head 4 80 55 -8 -5 | |||
headextend 0 63 38 13 0 | |||
nucleus fall 70 18 24 12 | |||
nucleus0 fall 64 8 | |||
endtune | |||
~~~~ | |||
It contains: | |||
**tune** \<tune name\> | |||
: Starts the definition of a tune. The `tune name`{.western} can | |||
be used in a `tunes`{.western} statement in voice files.
**endtune** \<tune name\> | |||
: Ends the definition of a tune. | |||
**prehead** \<start pitch\> \<end pitch\> | |||
: Gives the pitch path for any series of unstressed syllables before | |||
the first stressed syllable. | |||
**headenv** \<envelope\> \<height\> | |||
: Gives the pitch envelope which is used for stressed syllables in the | |||
head (before the nucleus), including `onset`{.western} and | |||
`headlast`{.western} syllables if these are specified. | |||
`height`{.western} gives a pitch range for the envelope. | |||
**head** \<steps\> \<start pitch\> \<end pitch\> \<unstressed start\> \<unstressed end\> | |||
: `start pitch`{.western} and `end pitch`{.western} give a pitch | |||
path for the stressed syllables of the head. `steps`{.western} is | |||
the maximum number of stressed syllables for which this applies. If | |||
there are additional stressed syllables, then the | |||
`headextend`{.western} statement is used for them. | |||
: `unstressed start`{.western} and `unstressed end`{.western} give | |||
a pitch path for unstressed syllables between two stressed | |||
syllables. Their values are relative to the pitch of the previous | |||
stressed syllable. Values are usually negative, meaning that the | |||
unstressed syllables have lower pitch than the previous stressed | |||
syllable. | |||
**headextend** \<percentage list\> | |||
: If the head contains more stressed syllables than is specified by | |||
`steps`{.western}, then `percentage list`{.western} is used. It | |||
contains up to 8 numbers which are used repeatedly for the | |||
additional stressed syllables. A value of 0 corresponds to the lower
of the `start pitch`{.western} and `end pitch`{.western} values of the
`head`{.western} statement. 100 corresponds to the higher value. | |||
Negative values and values greater than 100 are allowed. | |||
**nucleus** \<envelope\> \<top pitch\> \<bottom pitch\> \<tail start\> \<tail end\> | |||
: This gives the pitch envelope and pitch range of the last stressed | |||
syllable of the clause. `tail start`{.western} and | |||
`tail end`{.western} give a pitch path for the unstressed syllables | |||
which are after the last stressed syllable. | |||
**nucleus0** \<envelope\> \<top pitch\> \<bottom pitch\> | |||
: This is used instead of `nucleus`{.western} if there are no | |||
unstressed syllables after the last stressed syllable. In this case, | |||
the pitch changes of the nucleus and the tail are both included in
the nucleus. | |||
The following attributes may also be included: | |||
**onset** \<pitch\> \<unstressed start\> \<unstressed end\> | |||
: This specifies the pitch for the first stressed syllable of the | |||
head. If the `onset`{.western} statement is present, then the | |||
`head`{.western} statement is used for the stressed syllables after the
first. | |||
**headlast** \<pitch\> \<unstressed start\> \<unstressed end\> | |||
: This specifies the pitch for the last stressed syllable of the head | |||
(i.e. the stressed syllable before the nucleus). | |||
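The `headextend` percentages can be illustrated with a short Python sketch. The text only fixes the meaning of 0 and 100, so the linear interpolation between them, and the function name `headextend_pitches`, are assumptions made for this example.

```python
def headextend_pitches(start_pitch, end_pitch, percentages, count):
    """Pitches for `count` stressed syllables beyond the `steps` of head."""
    low = min(start_pitch, end_pitch)
    high = max(start_pitch, end_pitch)
    pitches = []
    for i in range(count):
        pct = percentages[i % len(percentages)]  # the list is used repeatedly
        pitches.append(low + (high - low) * pct / 100)
    return pitches


# Values taken from the example tune s1:
#   head       4 80 55 -8 -5
#   headextend 0 63 38 13 0
print(headextend_pitches(80, 55, [0, 63, 38, 13, 0], 6))
```

Under this assumption, negative percentages and values above 100 simply extend the same line below the lower pitch or above the higher one.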
@@ -0,0 +1,125 @@ | |||
3. LANGUAGES {.western} | |||
------------ | |||
**Languages**. The eSpeak speech synthesizer supports several languages, | |||
however in many cases these are initial drafts and need more work to | |||
improve them. Assistance from native speakers is welcome for these, or | |||
other new languages. Please contact me if you want to help. | |||
eSpeak does text to speech synthesis for the following languages, some | |||
better than others. Afrikaans, Albanian, Armenian, Cantonese, Catalan, | |||
Croatian, Czech, Danish, Dutch, English, Esperanto, Finnish, French, | |||
German, Greek, Hindi, Hungarian, Icelandic, Indonesian, Italian, | |||
Kurdish, Latvian, Lojban, Macedonian, Mandarin, Norwegian, Polish, | |||
Portuguese, Romanian, Russian, Serbian, Slovak, Spanish, Swahili, | |||
Swedish, Tamil, Turkish, Vietnamese, Welsh. | |||
#### Help Needed {.western} | |||
Many of these are just experimental attempts at these languages, | |||
produced after a quick reading of the corresponding article on | |||
wikipedia.org. They will need work or advice from native speakers to | |||
improve them. Please contact me if you want to advise or assist with | |||
these or other languages. | |||
The sound of some phonemes may be poorly implemented, particularly [r] | |||
since I'm English and therefore unable to make a "proper" [r] sound. | |||
A major factor is the rhythm or cadence. An Italian speaker told me the | |||
Italian voice improved from "difficult to understand" to "good" by | |||
changing the relative length of stressed syllables. Identifying | |||
unstressed function words in the xx\_list file is also important to make | |||
the speech flow well. See [Adding or Improving a | |||
Language](add_language.html). | |||
#### Character sets {.western} | |||
Languages recognise text either as UTF-8 or alternatively in an 8-bit | |||
character set which is appropriate for that language. For example, for | |||
Polish this is Latin2, for Russian it is KOI8-R. This choice can be | |||
overridden by a line in the voices file to specify an ISO 8859 character | |||
set, eg. for Russian the line: | |||
~~~~ {.western style="margin-bottom: 0.5cm"} | |||
charset 5 | |||
~~~~ | |||
will mean that ISO 8859-5 is used as the 8-bit character set rather than | |||
KOI8-R. | |||
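
To see why the 8-bit character set matters, the following Python sketch decodes the same bytes with both character sets. It uses only Python's standard codecs and does not call eSpeak; `charset_to_codec` is a hypothetical helper that maps the number from a `charset` line to a Python codec name.

```python
def charset_to_codec(number):
    """Return Python's codec name for ISO 8859-<number>."""
    return f"iso8859_{number}"


word = "мир"
raw = word.encode("koi8_r")               # bytes as stored in a KOI8-R text file

as_koi8 = raw.decode("koi8_r")            # decoded with the matching character set
as_iso = raw.decode(charset_to_codec(5))  # decoded as ISO 8859-5 instead

print(as_koi8 == word)   # the round trip preserves the word
print(as_iso == word)    # the wrong character set yields different letters
```

The bytes are valid in both character sets, so no error is raised; the text is simply read as different Cyrillic letters. The voice's character set therefore has to match the encoding of the input text.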
In the case of a language which uses a non-Latin character set (eg. | |||
Greek or Russian) if the text contains a word with Latin characters then | |||
that particular word will be pronounced using English pronunciation | |||
rules and English phonemes. Speaking entirely English text using a Greek | |||
or Russian voice will sound OK, but each word is spoken separately so it | |||
won't flow properly. | |||
Sample texts in various languages can be found at | |||
[http://\<language\>.wikipedia.org](http://meta.wikimedia.org/wiki/List_of_Wikipedias) | |||
and [www.gutenberg.org](http://www.gutenberg.org/). | |||
### 3.1 Voice Files {.western} | |||
A number of Voice files are provided in the | |||
`espeak-data/voices`{.western} directory. You can select one of these | |||
with the **-v \<voice filename\>** parameter to the speak command, eg: | |||
~~~~ {.western style="margin-bottom: 0.5cm"} | |||
espeak-ng -vaf | |||
~~~~ | |||
to speak using the Afrikaans voice. | |||
Language voices generally start with the 2 letter [ISO 639-1 | |||
code](http://en.wikipedia.org/wiki/ISO_639-1) for the language. If the | |||
language does not have an ISO 639-1 code, then the 3 letter [ISO 639-3 | |||
code](http://www.sil.org/iso639-3/codes.asp) can be used. | |||
For details of the voice files see [Voices](voices.html). | |||
#### Default Voice {.western} | |||
### 3.2 English Voices {.western} | |||
### 3.3 Voice Variants {.western} | |||
To make alternative voices for a language, you can make additional voice | |||
files in espeak-data/voices which contain commands to change various | |||
voice and pronunciation attributes. See [voices.html](voices.html). | |||
Alternatively there are some preset voice variants which can be applied | |||
to any of the language voices, by appending `+`{.western} and a variant | |||
name. Their effects are defined by files in | |||
`espeak-data/voices/!v`{.western}. | |||
The variants are `+m1 +m2 +m3 +m4 +m5 +m6 +m7`{.western} for male | |||
voices, `+f1 +f2 +f3 +f4 +f5`{.western} for female voices, and | |||
`+croak +whisper`{.western} for other effects. For example: | |||
~~~~ {.western style="margin-bottom: 0.5cm"} | |||
espeak-ng -ven+m3 | |||
~~~~ | |||
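
When the command is generated from a script, the voice argument can be assembled as in the Python sketch below. The variant names are the ones listed above; an installed version may provide a different set. `espeak_command` is a hypothetical helper: it only builds the argument list and does not run the program.

```python
MALE = [f"m{i}" for i in range(1, 8)]      # m1 .. m7
FEMALE = [f"f{i}" for i in range(1, 6)]    # f1 .. f5
OTHER = ["croak", "whisper"]
VARIANTS = set(MALE + FEMALE + OTHER)


def espeak_command(language, variant, text):
    """Build an argument list such as ['espeak-ng', '-ven+m3', text]."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown voice variant: {variant}")
    return ["espeak-ng", f"-v{language}+{variant}", text]


print(espeak_command("en", "m3", "Hello"))
```

The resulting list can be passed to `subprocess.run` on a system where `espeak-ng` is installed.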
The available voice variants can be listed with: | |||
~~~~ {.western style="margin-bottom: 0.5cm"} | |||
espeak-ng --voices=variant | |||
~~~~ | |||
### 3.4 Other Languages {.western} | |||
The eSpeak speech synthesizer does text to speech for the following | |||
additional languages. | |||
### 3.5 Provisional Languages {.western} | |||
These languages are only initial naive implementations which have had | |||
little or no feedback and improvement from native speakers. | |||
### 3.6 Mbrola Voices {.western} | |||
Some additional voices, whose names start with **mb-** (for example | |||
**mb-en1**) use eSpeak as a front-end to Mbrola diphone voices. eSpeak | |||
does the spelling-to-phoneme translation and intonation. See | |||
[mbrola.html](mbrola.html). |
@@ -0,0 +1,128 @@ | |||
MBROLA VOICES {.western} | |||
------------- | |||
The Mbrola project is a collection of diphone voices for speech | |||
synthesis. They do not include any text-to-phoneme translation, so this | |||
must be done by another program. The Mbrola voices are cost-free but are | |||
not open source. They are available from the Mbrola website at:\ | |||
[http://www.tcts.fpms.ac.be/synthesis/mbrola/mbrcopybin.html](http://www.tcts.fpms.ac.be/synthesis/mbrola/mbrcopybin.html) | |||
eSpeak can be used as a front-end to Mbrola. It provides the | |||
spelling-to-phoneme translation and intonation, which Mbrola then uses | |||
to generate speech sound. | |||
### Voice Names {.western} | |||
To use a Mbrola voice, eSpeak needs information to translate from its | |||
own phonemes to the equivalent Mbrola phonemes. This has been set up for | |||
only some voices so far. | |||
The eSpeak voices which use Mbrola are named as:\ | |||
**mb-**xxx | |||
where xxx is the name of a Mbrola voice (eg. **mb-en1** for the Mbrola | |||
"**en1**" English voice). These voice files are in eSpeak's directory | |||
`espeak-data/voices/mbrola`{.western}. | |||
The installation instructions below use the Mbrola voice "en1" as an | |||
example. You can use other mbrola voices for which there is an | |||
equivalent eSpeak voice in `espeak-data/voices/mbrola`{.western}. | |||
There are some additional eSpeak Mbrola voices which speak English text | |||
using a Mbrola voice for a different language. These contain the name of | |||
the Mbrola voice with a suffix **-en**. For example, the voice | |||
**mb-de4-en** will speak English text with a German accent by using the | |||
Mbrola **de4** voice. | |||
### Windows Installation {.western} | |||
The SAPI5 version of eSpeak uses the mbrola.dll. | |||
1. 2. 3. 4. | |||
### Linux Installation {.western} | |||
From eSpeak version 1.44 onwards, eSpeak calls the mbrola program | |||
directly, rather than passing phoneme data to it using a pipe. | |||
1. 2. 3. | |||
### Mbrola Voice Files {.western} | |||
eSpeak's voice files for Mbrola voices are in directory | |||
`espeak-data/voices/mbrola`{.western}. They contain a line:\ | |||
`mbrola <voice> <translation>`{.western} \ | |||
eg.\ | |||
`mbrola en1 en1_phtrans`{.western} | |||
- - | |||
They are binary files which are compiled, using espeakedit, from source | |||
files in `phsource/mbrola`{.western}, see below. | |||
### Mbrola Phoneme Translation Data {.western} | |||
Mbrola phoneme translation files specify translations from eSpeak | |||
phoneme names to mbrola phoneme names. They are referenced from voice | |||
files. | |||
The source files are in `phsource/mbrola`{.western}. These are compiled | |||
using the `espeakedit`{.western} program | |||
(`Compile->Compile mbrola phonemes list`{.western}) to produce data | |||
files in `espeak-data/mbrola_ph`{.western} which are used by eSpeak. | |||
Each line in the mbrola phoneme translation file contains: | |||
`<control> <espeak ph1> <espeak ph2> <percent> <mbrola ph1> [<mbrola ph2>] `{.western} | |||
**\<control\>** | |||
- - - - | |||
**\<espeak ph1\>**\ | |||
The eSpeak phoneme which is to be translated to an mbrola phoneme. | |||
**\<espeak ph2\>**\ | |||
If this field is not `NULL`{.western}, then the match only occurs if | |||
this field matches the next phoneme. If control bit 1 is set, then the | |||
*previous* rather than the *next* phoneme is matched. This field may | |||
also have the following values:\ | |||
`VWL`{.western} matches any Vowel phoneme. | |||
**\<percent\>**\ | |||
If this field is zero then only one mbrola phoneme is used. If this | |||
field is non-zero, then two mbrola phonemes are used, and this value | |||
gives the percentage length of the first mbrola phoneme. | |||
**\<mbrola ph1\>**\ | |||
The mbrola phoneme to which the eSpeak phoneme is translated. This | |||
field may be `NULL`{.western}. | |||
**\<mbrola ph2\>**\ | |||
The second mbrola phoneme. This field is only used if the \<percent\> | |||
field is not zero. | |||
The list is searched from start to finish, until a match is found. | |||
Therefore, a line with a more specific match condition should appear | |||
before a line which matches the same eSpeak phoneme but with a more | |||
general condition. | |||
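
The first-match search can be sketched in Python as follows. This is an illustration of the ordering rule, not eSpeak's implementation. It assumes that "control bit 1" means the bit with value 2, uses a made-up vowel set for `VWL`, and represents each line as a tuple in the field order given above. The names `translate`, `context_matches`, `VOWELS` and `MATCH_PREVIOUS` are hypothetical.

```python
VOWELS = {"a", "e", "i", "o", "u", "@"}  # hypothetical vowel set for illustration
MATCH_PREVIOUS = 0x2                     # assumption: "control bit 1" has value 2


def context_matches(condition, phoneme):
    """Check the <espeak ph2> field against a neighbouring phoneme."""
    if condition == "NULL":
        return True
    if condition == "VWL":
        return phoneme in VOWELS
    return condition == phoneme


def translate(rules, prev_ph, ph, next_ph):
    """Return [(mbrola phoneme, percent of length), ...] for the first matching rule."""
    for control, ph1, ph2, percent, mb1, mb2 in rules:
        if ph1 != ph:
            continue
        neighbour = prev_ph if control & MATCH_PREVIOUS else next_ph
        if not context_matches(ph2, neighbour):
            continue
        if percent == 0:
            return [(mb1, 100)]
        return [(mb1, percent), (mb2, 100 - percent)]
    return None


rules = [
    (0, "r", "VWL", 0, "r", None),    # specific: [r] before a vowel
    (0, "r", "NULL", 0, "R", None),   # general: any other [r]
    (0, "aI", "NULL", 50, "a", "I"),  # split into two mbrola phonemes
]
print(translate(rules, "a", "r", "a"))
print(translate(rules, "a", "r", "t"))
print(translate(rules, "t", "aI", "m"))
```

If the two `r` lines were swapped, the general line would always match first and the vowel-specific line would never be reached.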
The file `dictsource/dict_phonemes`{.western} lists the eSpeak phonemes | |||
which are used for each language. Translations for all these should be | |||
given in the mbrola phoneme translation file. In addition, some phonemes | |||
which are referenced from phoneme files (eg. | |||
`phsource/ph_language, phsource/phonemes`{.western}) in lines such as: | |||
~~~~ {.western} | |||
beforenotvowel l/ | |||
reduceto a# 0 | |||
~~~~ | |||
should also be included, even though they don't appear in | |||
`dictsource/dict_phonemes`{.western}. | |||
If the language's \*\_list or \*\_rules files include rules to speak | |||
words "as English", the mbrola phoneme translation file should include | |||
rules which translate English phonemes into near equivalents, so that | |||
they can be spoken by the mbrola voice. |
@@ -0,0 +1,283 @@ | |||
PHONEMES {.western} | |||
-------- | |||
In general a different set of phonemes can be defined for each language. | |||
In most cases different languages inherit the same basic set of | |||
consonants. They can add to these or modify them as needed. | |||
The phoneme mnemonics are based on the scheme by Kirshenbaum which | |||
represents International Phonetic Alphabet symbols using ASCII | |||
characters. See: | |||
[www.kirshenbaum.net/IPA/ascii-ipa.pdf](http://www.kirshenbaum.net/IPA/ascii-ipa.pdf). | |||
Phoneme mnemonics can be used directly in the text input to | |||
**espeak-ng**. They are enclosed within double square brackets. Spaces | |||
are used to separate words, and all stressed syllables must be marked | |||
explicitly. eg:\ | |||
`[[D,Is Iz sVm f@n'EtIk t'Ekst 'InpUt]]`{.western} | |||
### English Consonants {.western} | |||
`[p]`{.western} | |||
`[b]`{.western} | |||
`[t]`{.western} | |||
`[d]`{.western} | |||
`[tS]`{.western} | |||
**ch**urch | |||
`[dZ]`{.western} | |||
**j**udge | |||
`[k]`{.western} | |||
`[g]`{.western} | |||
`[f]`{.western} | |||
`[v]`{.western} | |||
`[T]`{.western} | |||
**th**in | |||
`[D]`{.western} | |||
**th**is | |||
`[s]`{.western} | |||
`[z]`{.western} | |||
`[S]`{.western} | |||
**sh**op | |||
`[Z]`{.western} | |||
plea**s**ure | |||
`[h]`{.western} | |||
`[m]`{.western} | |||
`[n]`{.western} | |||
`[N]`{.western} | |||
si**ng** | |||
`[l]`{.western} | |||
`[r]`{.western} | |||
**r**ed (Omitted if not immediately followed by a vowel). | |||
`[j]`{.western} | |||
**y**es | |||
`[w]`{.western} | |||
**Some Additional Consonants** | |||
\ | |||
`[C]`{.western} | |||
German i**ch** | |||
`[x]`{.western} | |||
German bu**ch** | |||
`[l^]`{.western} | |||
Italian **gl**i | |||
`[n^]`{.western} | |||
Spanish **ñ** | |||
### English Vowels {.western} | |||
These are the phonemes which are used by the English spelling-to-phoneme | |||
translations (en\_rules and en\_list). In some varieties of English | |||
different phonemes may have the same sound, but they are kept separate | |||
because they may differ in another variety. | |||
In rhotic accents, such as General American, the phonemes | |||
`[3:], [A@], [e@], [i@], [O@], [U@]`{.western} include the "r" sound. | |||
`[@]`{.western} | |||
alph**a** | |||
schwa | |||
`[3]`{.western} | |||
bett**er** | |||
rhotic schwa. In British English this is the same as `[@]`{.western}, | |||
but it includes 'r' colouring in American and other rhotic accents. In | |||
these cases a separate `[r]`{.western} should not be included unless it | |||
is followed immediately by another vowel. | |||
`[3:]`{.western} | |||
n**ur**se | |||
`[@L]`{.western} | |||
simp**le** | |||
`[@2]`{.western} | |||
the | |||
Used only for "the". | |||
`[@5]`{.western} | |||
to | |||
Used only for "to". | |||
`[a]`{.western} | |||
tr**a**p | |||
`[aa]`{.western} | |||
b**a**th | |||
This is `[a]`{.western} in some accents, `[A:]`{.western} in others. | |||
`[a#]`{.western} | |||
**a**bout | |||
This may be `[@]`{.western} or may be a more open schwa. | |||
`[A:]`{.western} | |||
p**al**m | |||
`[A@]`{.western} | |||
st**ar**t | |||
`[E]`{.western} | |||
dr**e**ss | |||
`[e@]`{.western} | |||
squ**are** | |||
`[I]`{.western} | |||
k**i**t | |||
`[I2]`{.western} | |||
**i**ntend | |||
As `[I]`{.western}, but also indicates an unstressed syllable. | |||
`[i]`{.western} | |||
happ**y** | |||
An unstressed "i" sound at the end of a word. | |||
`[i:]`{.western} | |||
fl**ee**ce | |||
`[i@]`{.western} | |||
n**ear** | |||
`[0]`{.western} | |||
l**o**t | |||
`[V]`{.western} | |||
str**u**t | |||
`[u:]`{.western} | |||
g**oo**se | |||
`[U]`{.western} | |||
f**oo**t | |||
`[U@]`{.western} | |||
c**ure** | |||
`[O:]`{.western} | |||
th**ou**ght | |||
`[O@]`{.western} | |||
n**or**th | |||
`[o@]`{.western} | |||
f**or**ce | |||
`[aI]`{.western} | |||
pr**i**ce | |||
`[eI]`{.western} | |||
f**a**ce | |||
`[OI]`{.western} | |||
ch**oi**ce | |||
`[aU]`{.western} | |||
m**ou**th | |||
`[oU]`{.western} | |||
g**oa**t | |||
`[aI@]`{.western} | |||
sc**ie**nce | |||
`[aU@]`{.western} | |||
h**our** | |||
### Some Additional Vowels {.western} | |||
Other languages will have their own vowel definitions, eg: | |||
+--------------------------------------+--------------------------------------+ | |||
| `[e]`{.western} | German **eh**, French **é** | | |||
+--------------------------------------+--------------------------------------+ | |||
| `[o]`{.western} | German **oo**, French **o** | | |||
+--------------------------------------+--------------------------------------+ | |||
| `[y]`{.western} | German **ü**, French **u** | | |||
+--------------------------------------+--------------------------------------+ | |||
| `[Y]`{.western} | German **ö**, French **oe** | | |||
+--------------------------------------+--------------------------------------+ | |||
`[:]`{.western} can be used to lengthen a vowel, eg `[e:]`{.western}. |
@@ -0,0 +1,174 @@ | |||
PHONEME TABLES {.western} | |||
-------------- | |||
A phoneme table defines all the phonemes which are used by a language, | |||
together with their properties and the data for their production as | |||
sounds. | |||
Generally each language has its own phoneme table, although additional | |||
phoneme tables can be used for different voices within the language. | |||
These alternatives are referenced from Voice files. | |||
A phoneme table does not need to define all the phonemes used by a | |||
language. It can inherit the phonemes from a previously defined phoneme | |||
table. For example, a phoneme table may redefine (or add) some of the | |||
vowels that it uses, but inherit most of its consonants from a standard | |||
set. | |||
The source files for the phoneme data are in the "phsource" directory in | |||
the espeakedit download package. "Vowel files", which are referenced in | |||
FMT(), VowelStart(), and VowelEnding() instructions are made using the | |||
espeakedit program. | |||
### Phoneme files {.western} | |||
The phoneme tables are defined in a master phoneme file, named | |||
**phonemes**. This starts with the **base** phoneme table followed by | |||
phoneme tables for other languages and voices. These inherit phonemes | |||
from the **base** table or previously defined tables. | |||
In addition to phoneme definitions, the phoneme file can contain the | |||
following: | |||
**include** \<filename\> | |||
: Includes the text of the specified file at this point. This allows | |||
different phoneme tables to be kept in different text files, for | |||
convenience. \<filename\> is a relative path. The included file can | |||
itself contain **include** statements. | |||
**phonemetable** \<name\> \<parent\> | |||
: Starts a new phoneme table, and ends the previous table.\ | |||
\<name\> Is the name of this phoneme table. This name is used in | |||
Voice files.\ | |||
\<parent\> Is the name of a previously defined phoneme table whose | |||
phoneme definitions are inherited by this one. The name **base** | |||
indicates the first (base) phoneme table. | |||
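
The inheritance set up by **phonemetable** can be pictured as a lookup that walks from a table to its parent until the phoneme is found. The Python sketch below is only a model of that idea; eSpeak compiles the tables into binary data, and the dictionary layout and the `find_phoneme` helper here are hypothetical.

```python
def find_phoneme(tables, table_name, phoneme):
    """Return (defining table, definition), searching parents as needed."""
    name = table_name
    while name is not None:
        parent, definitions = tables[name]
        if phoneme in definitions:
            return name, definitions[phoneme]
        name = parent
    raise KeyError(phoneme)


# Each entry: table name -> (parent table name, {phoneme: definition}).
tables = {
    "base": (None, {"t": "base definition of t", "s": "base definition of s"}),
    "en": ("base", {"aI": "en definition of aI", "t": "en definition of t"}),
}

print(find_phoneme(tables, "en", "aI"))  # defined in "en"
print(find_phoneme(tables, "en", "s"))   # inherited from "base"
print(find_phoneme(tables, "en", "t"))   # redefined in "en"
```

A definition in the child table takes precedence over the inherited one, which is how a language can redefine a few phonemes while keeping the rest of a standard set.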
### Phoneme definitions {.western} | |||
Note: These new Phoneme definitions apply to eSpeak version 1.42.20 and | |||
later. | |||
A phoneme table contains a list of phoneme definitions. Each starts with | |||
the keyword **phoneme** and the phoneme name (this is the name used in | |||
the pronunciation rules in a language's \*\_rules and \*\_list files), | |||
and ends with the keyword **endphoneme**. For example: | |||
~~~~ {.western} | |||
phoneme aI | |||
vowel | |||
starttype #a endtype #i | |||
length 230 | |||
FMT(vowels/ai) | |||
endphoneme | |||
phoneme s | |||
vls alv frc sibilant | |||
voicingswitch z | |||
lengthmod 3 | |||
Vowelin f1=0 f2=1700 -300 300 f3=-100 80 | |||
Vowelout f1=0 f2=1700 -300 250 f3=-100 80 rms=20 | |||
IF nextPh(isPause) THEN | |||
WAV(ufric/s_) | |||
ELIF nextPh(p) OR nextPh(t) OR nextPh(k) THEN | |||
WAV(ufric/s!) | |||
ENDIF | |||
WAV(ufric/s) | |||
endphoneme | |||
~~~~ | |||
A phoneme definition contains both static properties and executed | |||
instructions. The instructions may contain conditional statements, so | |||
that the effect of the phoneme may be different depending on adjacent | |||
phonemes, whether the syllable is stressed, etc. | |||
The instructions of a phoneme are interpreted in two different phases. | |||
In the first phase, the instructions may change the phoneme and replace | |||
it by a different phoneme. In the second phase, instructions are used to | |||
produce the sound for the phoneme. | |||
The **import\_phoneme** statement can be used to copy a previously | |||
defined phoneme from a specified phoneme table. For example: | |||
~~~~ {.western} | |||
phoneme t | |||
import_phoneme base/t[ | |||
endphoneme | |||
~~~~ | |||
means: `phoneme t`{.western} in this phoneme table is a copy of | |||
`phoneme t[`{.western} from phoneme table "base". A **length** | |||
instruction can be used after **import\_phoneme** to vary the length | |||
from the original. | |||
### Phoneme Properties {.western} | |||
Within the phoneme definition the following lines may occur, where (V) | |||
marks a line that applies only to vowels and (C) only to consonants: | |||
### Phoneme Instructions {.western} | |||
Phoneme Instructions may be included within conditional statements. | |||
During the first phase of phoneme interpretation, an instruction which | |||
causes a change to a different phoneme will terminate the instructions. | |||
During the second phase, FMT() and WAV() instructions will terminate the | |||
instructions. | |||
### Conditional Statements {.western} | |||
Phoneme definitions can contain conditional statements such as: | |||
~~~~ {.western} | |||
IF <condition> THEN | |||
<statements> | |||
ENDIF | |||
~~~~ | |||
or more generally: | |||
~~~~ {.western} | |||
IF <condition> THEN | |||
<statements> | |||
ELIF <condition> THEN | |||
<statements> | |||
... | |||
ELSE | |||
<statements> | |||
ENDIF | |||
~~~~ | |||
where the `ELSE`{.western} and multiple `ELIF`{.western} parts are | |||
optional. | |||
Multiple conditions may be joined with `AND`{.western} or | |||
`OR`{.western}, but not a mixture of `AND`{.western}s and | |||
`OR`{.western}s. | |||
A condition may be preceded by `NOT`{.western}. For example: | |||
~~~~ {.western} | |||
IF <condition> AND NOT <condition> THEN | |||
<statements> | |||
ENDIF | |||
~~~~ | |||
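
The rule that `AND`{.western} and `OR`{.western} may not be mixed in one condition can be checked mechanically. The Python sketch below is a hypothetical lint helper, not part of the eSpeak tools. It splits the text between `IF` and `THEN` on whitespace and rejects a condition that contains both keywords.

```python
def connectives_are_valid(condition_text):
    """Return False if a condition mixes AND with OR."""
    tokens = condition_text.split()
    return not ("AND" in tokens and "OR" in tokens)


print(connectives_are_valid("nextPh(p) OR nextPh(t) OR nextPh(k)"))
print(connectives_are_valid("nextPh(isPause) AND NOT nextPh(p)"))
print(connectives_are_valid("nextPh(p) AND nextPh(t) OR nextPh(k)"))
```

`NOT` is not a connective, so it may appear alongside either keyword.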
**Condition** Can be: | |||
**Attributes** | |||
### Sound Specifications {.western} | |||
There are three ways to produce sounds: | |||
- - - | |||
### Vowel Transitions {.western} | |||
These specify how a consonant affects an adjacent vowel. A consonant may | |||
cause a transition in the vowel's formants as the mouth changes shape | |||
between the consonant and the vowel. The following attributes may be | |||
specified. Note that the maximum rate of change of formant frequencies | |||
is limited by the speak program. | |||
@@ -0,0 +1,64 @@ | |||
TEXT MARKUP {.western} | |||
----------- | |||
### SSML: Speech Synthesis Markup Language {.western} | |||
The following markup tags and attributes are recognised: | |||
**\<speak\>** | |||
- - | |||
**\<voice\>** | |||
- - - - - | |||
**\<prosody\>** | |||
- - - - | |||
**\<say-as\>** | |||
- - - - - | |||
**\<mark\>** name | |||
**\<s\>** | |||
- | |||
**\<p\>** | |||
- | |||
**\<sub\>** alias | |||
**\<tts:style\>** | |||
- - | |||
**\<audio\>** src | |||
**\<emphasis\>** | |||
- | |||
**\<break\>** | |||
- - | |||
### HTML {.western} | |||
eSpeak can speak HTML text directly, or text containing both SSML and | |||
HTML markup.\ | |||
Any unrecognised tags are ignored. | |||
The following tags cause a sentence break.\ | |||
**\<br\> \<dd\> \<li\> \<img\> \<td\> ** | |||
The following tags cause a paragraph break.\ | |||
**\<h1\> \<h2\> \<h3\> \<h4\> \<hr\> ** | |||
Text between the following tags is ignored.\ | |||
**\<script\> ... \</script\> \ | |||
\<style\> ... \</style\> ** |
@@ -0,0 +1,311 @@ | |||
5. VOICES {.western} | |||
--------- | |||
### 5.1 Voice Files {.western} | |||
A Voice file specifies a language (and possibly a language variant or | |||
dialect) together with various attributes that affect the | |||
characteristics of the voice quality and how the language is spoken. | |||
Voice files are placed in the `espeak-data/voices`{.western} directory, | |||
or within subdirectories in there. | |||
The available voice files can be listed by: | |||
~~~~ {.western} | |||
espeak-ng --voices | |||
or | |||
espeak-ng --voices=<language> | |||
~~~~ | |||
also | |||
~~~~ {.western style="margin-bottom: 0.5cm"} | |||
espeak-ng --voices=variant | |||
~~~~ | |||
Lists voice variants which can be applied to eSpeak voices. | |||
~~~~ {.western style="margin-bottom: 0.5cm"} | |||
espeak-ng --voices=mbrola | |||
~~~~ | |||
Lists the Mbrola voices. | |||
### 5.2 Contents of Voice Files {.western} | |||
The **language** attribute is mandatory. All the other attributes are | |||
optional. | |||
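
As an illustration of the file layout, the Python sketch below reads the text of a voice file into a list of attributes and checks the one mandatory rule. It assumes one attribute per line and treats `//` as the start of a comment, as in the examples in this document. `parse_voice_file` is a hypothetical helper, not part of eSpeak, and it does not validate attribute names or values.

```python
def parse_voice_file(text):
    """Return [(attribute name, [values...]), ...] for a voice file."""
    attributes = []
    for line in text.splitlines():
        line = line.split("//", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        name, *values = line.split()
        attributes.append((name, values))
    if not any(name == "language" for name, _ in attributes):
        raise ValueError("a voice file must contain a language attribute")
    return attributes


example = """\
name north
language en-uk-north
language en
pitch 82 118   // base pitch and range
"""
print(parse_voice_file(example))
```

A list is used rather than a dictionary because an attribute such as **language** may appear more than once.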
#### Identification Attributes {.western} | |||
**name \<name\>** | |||
A name given to this voice. | |||
**language \<language code\> [\<priority\>]** | |||
This attribute should appear before the other attributes which are | |||
listed below. | |||
It selects the default behaviour and characteristics for the language, | |||
and sets default values for "phonemes", "dictionary" and other | |||
attributes. The \<language code\> should be a two-letter ISO 639-1 | |||
language code. One or more language variant codes may be appended, | |||
separated by hyphens. (eg. en-uk-north). | |||
The optional \<priority\> value gives the preference of this voice | |||
compared with others for the specified language. A low value indicates a | |||
more preferred voice. The default value is 5. | |||
More than one **language** line may be present. A voice may be selected | |||
for other related languages (variants which have the same initial 2 | |||
letter language code as the specified language), but it will be less | |||
preferred for these. Different language variants may be specified by | |||
additional **language** lines in order to indicate that this is a | |||
preferred voice for them also. Eg. | |||
~~~~ {.western} | |||
language en-uk-north | |||
language en | |||
~~~~ | |||
indicates that this voice is for the "en-uk-north" dialect, but it is | |||
also a main choice when a general "en" language is specified. Without | |||
the second **language** line, it would be disfavoured for "en" for being | |||
a more specialised voice. | |||
**gender \<gender\> [\<age\>]** | |||
This attribute is only a label for use in voice selection. It doesn't | |||
change the sound of the voice. | |||
\<gender\> may be male, female, or unknown.\ | |||
\<age\> is optional and gives an age in years. | |||
**pitch \<base\> \<range\>** | |||
Two integer values. The first gives a base pitch to the voice (value in | |||
Hz). The second controls the range of pitches used by the voice. Setting | |||
it equal to the base pitch will give a monotone. The default values are | |||
82 118. | |||
**formant \<number\> \<frequency\> \<strength\> \<width\> | |||
\<freq\_add\>** | |||
Systematically adjusts the frequency, strength, and width of the | |||
resonance peaks of the voice. Values are percentages of the default | |||
values. Changing these affects the tone/quality of the voice. | |||
**freq\_add** Adds a constant value (in Hz) to the frequency of the | |||
formant peak. The value may be negative. | |||
- - - - | |||
**echo \<delay\> \<amplitude\>** | |||
Parameter 1 gives the delay in ms (0 to 250 ms).\ | |||
Parameter 2 gives the echo amplitude (0 to 100).\ | |||
Adding some echo can give a clearer or more interesting sound, | |||
especially when listening through a domestic stereo sound system, rather | |||
than small computer speakers. | |||
**tone** | |||
Controls the tone of the sound.\ | |||
**tone** is followed by up to 4 pairs of \<frequency\> \<amplitude\> | |||
which define a frequency response graph. Frequency is in Hz and | |||
amplitude is in the range 0 to 255. The default is: | |||
`tone 600 170 1200 135 2000 110`{.western} | |||
This means that from frequency 0Hz to 600Hz the amplitude is 170. From | |||
600Hz to 1200Hz the amplitude decreases from 170 to 135, then decreases | |||
to 110 at 2000Hz and remains at 110 at higher frequencies. This | |||
adjustment applies only to voiced sounds such as vowels and sonorant | |||
consonants (such as [n] and [l]). Unvoiced sounds such as [s] are | |||
unaffected. | |||
This **tone** statement can also appear in | |||
`espeak-data/config`{.western}, in which case it applies to all voices | |||
which don't have their own **tone** statement. | |||
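
The frequency response described for **tone** can be reproduced with a few lines of Python. The sketch assumes that the amplitude changes linearly between the given points, which matches the description above but is not confirmed against the eSpeak source. `tone_amplitude` is a hypothetical helper.

```python
DEFAULT_TONE = [(600, 170), (1200, 135), (2000, 110)]  # (frequency Hz, amplitude)


def tone_amplitude(points, freq_hz):
    """Amplitude at freq_hz for a list of (frequency, amplitude) pairs."""
    if freq_hz <= points[0][0]:
        return points[0][1]            # flat below the first point
    for (f0, a0), (f1, a1) in zip(points, points[1:]):
        if freq_hz <= f1:
            # linear interpolation between neighbouring points (assumption)
            return a0 + (a1 - a0) * (freq_hz - f0) / (f1 - f0)
    return points[-1][1]               # flat above the last point


for freq in (300, 900, 1200, 1600, 5000):
    print(freq, tone_amplitude(DEFAULT_TONE, freq))
```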
**flutter \<value\>** | |||
Default value: 2.\ | |||
Adds pitch fluctuations to give a wavering or older-sounding voice. A | |||
large value (eg. 20) makes the voice sound "croaky". | |||
**roughness \<value\>** | |||
Default value: 2. Range 0 - 7\ | |||
Reduces the amplitude of alternate waveform cycles in order to make the | |||
voice sound creaky. | |||
**voicing \<value\>** | |||
Default value: 100.\ | |||
Adjusts the strength of formant-synthesized sounds (vowels and sonorant | |||
consonants). | |||
**consonants \<value\> \<value\>** | |||
Default values: 100, 100.\ | |||
Adjusts the strength of noise sounds which are used in consonants. The | |||
first value is the strength of unvoiced consonants such as "s" and "t". | |||
The second value is the strength of the noise component of voiced | |||
consonants such as "z" and "d". | |||
**breath \<up to 8 integer values\>** | |||
Default values: 0.\ | |||
Adds noise which corresponds to the formant frequency peaks. The values | |||
give the strength of noise for each formant peak (formants 1 to 8). | |||
Use together with a low or zero value of the **voicing** attribute to | |||
make a "whisper". For example:\ | |||
`breath 75 75 60 40 15 10`{.western}\ | |||
`breathw 150 150 200 200 400 400`{.western}\ | |||
`voicing 18`{.western}\ | |||
`flutter 20`{.western}\ | |||
`formant 0 100 0 100 // remove formant 0`{.western} | |||
**breathw \<up to 8 integer values\>** | |||
These values give bandwidths of the noise peaks of the **breath** | |||
attribute. If **breathw** values are not given, then suitable default | |||
values will be used. | |||
**speed \<value\>** | |||
Default value 100.\ | |||
Adjusts the speaking speed by a percentage of the default rate. This | |||
can be used if a language voice seems faster or slower compared to other | |||
voices. | |||
**phonemes \<name\>** | |||
Specifies which set of phonemes to use from those contained in the | |||
phontab, phonindex, and phondata data files. This is a **phonemetable** | |||
name as given in the "phonemes" source file. | |||
This parameter is usually not needed as it is set by default to the | |||
first two letters of the "language" parameter. However, different voices | |||
of the same language can use different phoneme sets, to give different | |||
accents. | |||
**dictionary \<name\>** | |||
Specifies which pair of dictionary files to use. eg. "english" indicates | |||
that *espeak-data/en\_dict* should be used to translate from words to | |||
phonemes. This parameter is usually not needed as it is set by default | |||
to the first two letters of the "language" parameter. | |||
**dictrules \<list of rule numbers\>** | |||
Gives a list of conditional dictionary rules which are applied for this | |||
voice. Rule numbers are in the range 0 to 31 and are specific to a | |||
language dictionary. They apply to rules in the language's **\_rules** | |||
dictionary file and also its **\_list** exceptions list. See | |||
[dictionary.html](dictionary.html). | |||
**replace \<flags\> \<phoneme\> \<replacement phoneme\>** | |||
Replace a phoneme by another whenever it occurs. | |||
\<replacement phoneme\> may be NULL. | |||
Flags: bit 0: replacement only occurs on the final phoneme of a word.\ | |||
Flags: bit 1: replacement doesn't occur in stressed syllables.\ | |||
eg. | |||
~~~~ {.western} | |||
replace 0 h NULL // drops h's | |||
replace 0 V U // replaces vowel in 'strut' by that in 'foot' | |||
// as occurs in northern British English | |||
replace 3 N n // change 'fishing' to 'fishin' etc. | |||
// (only the last phoneme of a word, only in unstressed syllables) | |||
~~~~ | |||
The phoneme mnemonics can be defined for each language, but some are | |||
listed in [phonemes.html](phonemes.html) | |||
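
The flag values in the **replace** examples combine the two bits described above: 1 for bit 0, 2 for bit 1, and 3 for both. The Python sketch below decodes a `replace` line to make this explicit. `parse_replace` and the constant names are hypothetical and exist only for illustration.

```python
FINAL_PHONEME_ONLY = 0x1     # bit 0: only on the final phoneme of a word
NOT_IN_STRESSED = 0x2        # bit 1: not in stressed syllables


def parse_replace(line):
    """Decode a 'replace <flags> <phoneme> <replacement phoneme>' line."""
    text = line.split("//", 1)[0]          # ignore a trailing comment
    keyword, flags, phoneme, replacement = text.split()
    if keyword != "replace":
        raise ValueError("not a replace line")
    flags = int(flags)
    return {
        "phoneme": phoneme,
        "replacement": None if replacement == "NULL" else replacement,
        "final_only": bool(flags & FINAL_PHONEME_ONLY),
        "unstressed_only": bool(flags & NOT_IN_STRESSED),
    }


print(parse_replace("replace 0 h NULL   // drops h's"))
print(parse_replace("replace 3 N n      // change 'fishing' to 'fishin' etc."))
```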
**stressLength \<8 integer values\>** | |||
Eight integer parameters. These control the relative lengths of the | |||
vowels in stressed and unstressed syllables. | |||
- - - - - - - - | |||
**stressAdd \<8 integer values\>** | |||
Eight integer parameters. These are added to the voice's corresponding | |||
stressLength values. They are used in the voice variant files in | |||
`espeak-data/voices/!v`{.western} to give some variety. Negative values | |||
may be used. | |||
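
The effect of **stressAdd** is an element-by-element sum with the eight stressLength values. The Python sketch below shows the arithmetic. The stressLength numbers are placeholders and not eSpeak's defaults, and `apply_stress_add` is a hypothetical helper.

```python
def apply_stress_add(stress_length, stress_add):
    """Add each stressAdd value to the corresponding stressLength value."""
    if len(stress_length) != 8 or len(stress_add) != 8:
        raise ValueError("both lists must contain eight values")
    return [length + add for length, add in zip(stress_length, stress_add)]


base_lengths = [150] * 8                       # placeholder stressLength values
variant_add = [10, -10, 0, 0, 0, 0, 5, -5]     # placeholder stressAdd values
print(apply_stress_add(base_lengths, variant_add))
```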
**stressAmp \<8 integer values\>** | |||
Eight integer parameters. These control the relative amplitudes of the | |||
vowels in stressed and unstressed syllables (see stressLength above). | |||
The general default values are: 16, 16, 20, 20, 20, 24, 24, 22, although | |||
these defaults may be different for particular languages. | |||
**intonation \<param1\>** | |||
- - - - | |||
**charset \<param1\>** | |||
The ISO 8859 character set number. (not all are implemented). | |||
**dictmin \<value\>** | |||
Used for some languages to detect if additional language data is | |||
installed. If the size of the compiled dictionary data for the language | |||
(the file `espeak-data/*_dict`{.western}) is less than this size then a | |||
warning is given. | |||
**alphabet2 \<alphabet\> \<language\>** | |||
Used to specify a language to be used to speak words which are written | |||
in a non-native alphabet. eg: | |||
~~~~ {.western style="margin-bottom: 0.5cm"} | |||
alphabet2 cyr ru | |||
~~~~ | |||
Alphabet names include: latin, cyr (cyrillic), ar (arabic). The default | |||
language for latin alphabet is English. | |||
**dictdialect \<dialect\>** | |||
Words can be marked in the \*\_list or \*\_rules file to be spoken using | |||
a foreign voice. This **dictdialect** attribute can be used to specify | |||
which dialect of the foreign language should be used, instead of the | |||
default dialect. The currently available dialects are:\ | |||
**en-us** (US English)\ | |||
**es-la** (Latin American Spanish).\ | |||
eg. | |||
~~~~ {.western style="margin-bottom: 0.5cm"} | |||
dictdialect en-us | |||
~~~~ | |||
This means that any words or rules which are marked with \_\^\_EN will be | |||
spoken with the US English voice instead of the default UK English | |||
voice. | |||
Additional attributes are available to set various internal options | |||
which control how language is processed. These would normally be set in | |||
the program code rather than in a voice file. | |||
A number of Voice files are provided in the | |||
`espeak-data/voices`{.western} directory. You can select one of these | |||
with the **-v \<voice filename\>** parameter to the speak command. | |||
**default** | |||
This voice is used if none is specified in the speak command. You can | |||
copy your preferred voice to "default" so you can use the speak command | |||
without the need to specify a voice. | |||
For a list of voices provided for English and other languages see | |||
[Languages](languages.html). |