@@ -46,6 +46,8 @@ libespeak-ng.so* | |||
*.html | |||
*.exe | |||
src/espeak-ng.1 | |||
src/espeak-ng | |||
src/espeakedit | |||
src/speak-ng |
@@ -32,7 +32,6 @@ EXTRA_DIST += ChangeLog | |||
all-local: \ | |||
espeak-data/phontab \ | |||
docs/speak_lib.h \ | |||
dictionaries \ | |||
mbrola | |||
@@ -53,12 +52,19 @@ distclean-local: | |||
##### documentation: | |||
%.html: %.md _layouts/webpage.html | |||
cat $< | sed -e 's/\.md)/.html)/g' | kramdown --template _layouts/webpage.html > $@ | |||
cat $< | sed -e 's/\.md)/.html)/g' -e 's/\.ronn/.html/g' | \ | |||
kramdown --template _layouts/webpage.html > $@ | |||
docs: README.html | |||
%.html: %.ronn | |||
ronn --html $< | |||
docs/speak_lib.h: src/include/espeak-ng/speak_lib.h | |||
cp $< $@ | |||
src/espeak-ng.1: src/espeak-ng.1.ronn | |||
ronn --roff $< | |||
docs: docs/index.html \ | |||
src/espeak-ng.1.html \ | |||
README.html \ | |||
src/espeak-ng.1 | |||
##### build targets: | |||
@@ -9,6 +9,7 @@ | |||
- [Cross-Compiling For Windows](#cross-compiling-for-windows) | |||
- [Testing](#testing) | |||
- [Installing](#installing) | |||
- [Documentation](#documentation) | |||
- [Building Voices](#building-voices) | |||
- [Adding New Voices](#adding-new-voices) | |||
- [Praat Changes](#praat-changes) | |||
@@ -42,6 +43,7 @@ Optionally, you need: | |||
To build the documentation, you need: | |||
1. the `kramdown` markdown processor. | |||
2. the `ronn` man-page markdown processor. | |||
### Debian | |||
@@ -65,6 +67,7 @@ Documentation dependencies: | |||
| Dependency | Install | | |||
|---------------|--------------------------------------| | |||
| kramdown | `sudo apt-get install ruby-kramdown` | | |||
| ronn | `sudo apt-get install ruby-ronn` | | |||
Cross-compiling for windows: | |||
@@ -181,6 +184,14 @@ already have an espeak-ng install by running: | |||
find /usr/lib | grep libespeak-ng | |||
## Documentation | |||
The [main documentation](docs/index.md) for eSpeak NG provides more information | |||
on using and creating voices/languages for eSpeak NG. | |||
The [espeak-ng](src/espeak-ng.1.ronn) command-line documentation provides a | |||
reference of the different command-line options available, with example usage. | |||
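
When the HTML pages are generated, the `%.html: %.md` rule in the Makefile pipes each page through `sed`, so that links ending in `.md)` or containing `.ronn` point at the generated `.html` files. As a rough sketch of that rewrite, the two links above would be transformed roughly like this. The sample line and the `out` variable are illustrative only and are not part of the build:

```shell
# Illustrative input: the two link targets used in this section.
line='[main documentation](docs/index.md) [espeak-ng](src/espeak-ng.1.ronn)'

# Same substitutions as the Makefile rule: ".md)" -> ".html)" and ".ronn" -> ".html".
out=$(printf '%s\n' "$line" | sed -e 's/\.md)/.html)/g' -e 's/\.ronn/.html/g')

printf '%s\n' "$out"
```

Running the substitutions by hand like this can be a quick way to check that a new cross-reference will still resolve after the pages are built.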
## Building Voices | |||
If you are modifying a language's phoneme, voice or dictionary files, you |
@@ -37,12 +37,14 @@ | |||
.group a | |||
a a | |||
a (a a_! | |||
ai ai | |||
au au | |||
ap ap // prefix | |||
.group ā | |||
ā a: | |||
ā (ā a:_! | |||
.group b | |||
b b | |||
@@ -109,6 +111,8 @@ | |||
.group i | |||
i i | |||
i (i i_! | |||
i (ī i_! | |||
ie ie | |||
iu iu | |||
@@ -1150,6 +1154,7 @@ | |||
.group u | |||
u u | |||
u (u u_! | |||
ui ui | |||
.group ū |
@@ -1,5 +1,15 @@ | |||
6. ADDING OR IMPROVING A LANGUAGE {.western} | |||
--------------------------------- | |||
# Table of contents | |||
* [Adding or improving a language](#adding-or-improving-a-language) | |||
* [Language Code](#language-code) | |||
* [Language Files](#language-files) | |||
* [Voice File](#voice-file) | |||
* [Phoneme Definition File](#phoneme-definition-file) | |||
* [Dictionary Files](#dictionary-files) | |||
* [Program Code](#program-code) | |||
* [Improving a Language](#improving-a-language) | |||
# Adding or improving a language | |||
Most of the work doesn't need any programming knowledge. Just an | |||
understanding of the language, an awareness of its features, patience | |||
@@ -11,10 +21,9 @@ In many cases it should be fairly easy to add a rough implementation of | |||
a new language, hopefully enough to be intelligible. After that it's a | |||
gradual process of improvement. | |||
### 6.1 Language Code {.western} | |||
## Language Code | |||
Generally, the language's international [ISO | |||
639-1](http://en.wikipedia.org/wiki/ISO_639-1) code is used to identify | |||
Generally, the language's international [ISO 639-1](http://en.wikipedia.org/wiki/ISO_639-1) code is used to identify | |||
the language. It is used in the filenames which contain the language's | |||
data. In the examples below the code **"fr"** is used as an example. | |||
Replace this with the code of your language. | |||
@@ -26,31 +35,28 @@ It is possible to have different variants of a language for different | |||
dialects. For example the sound of some phonemes are changed, or some of | |||
the pronunciation rules differ. | |||
### 6.2 Language Files {.western} | |||
## Language Files | |||
The following files are needed for your language. | |||
- - - - | |||
The **fr\_rules** and **fr\_list** files are compiled to produce the | |||
file **espeak-data/fr\_dict**, which eSpeak uses when it is speaking. | |||
### 6.3 Voice File {.western} | |||
## Voice File | |||
Each language needs a voice file in **espeak-data/voices** or | |||
**espeak-data/voices/test**. The filename of the default voice for a | |||
language should be the same as the language code (eg. "fr" for French). | |||
Details of the contents of voice files are given in | |||
[voices.html](http://espeak.sf.net/voices.html). | |||
[voices](voices.md). | |||
The simplest voice file would contain just 2 lines to give the language | |||
name and language code, eg: | |||
~~~~ {.western} | |||
name french | |||
language fr | |||
~~~~ | |||
name french | |||
language fr | |||
This language code specifies which phoneme table and dictionary to use | |||
(i.e. **phonemetable fr** and **espeak-data/fr\_dict**). If | |||
@@ -59,7 +65,7 @@ attributes in the voice file. For example you may want to start the | |||
implementation of a new language by using the phoneme table of an | |||
existing language. | |||
### 6.4 Phoneme Definition File {.western} | |||
## Phoneme Definition File | |||
You must first decide on the set of phonemes (vowel and consonant | |||
sounds) for the language. These should be defined in a phoneme | |||
@@ -67,10 +73,8 @@ definition file **ph\_xxxx**, where "ph\_xxxx" is the name of your | |||
language. A reference to this file is then included at the end of the | |||
master phoneme file, **phsource/phonemes**, eg: | |||
~~~~ {.western} | |||
phonemetable fr base | |||
include ph_french | |||
~~~~ | |||
phonemetable fr base | |||
include ph_french | |||
This example defines a phoneme table **"fr"** which inherits the | |||
contents of phoneme table **"base"**. Its contents are found in the file | |||
@@ -89,7 +93,7 @@ additional consonants that are needed), or phonemes whose definitions | |||
differ from the inherited version (eg. the redefinition of a consonant). | |||
Details of phonemes files are given in | |||
[phontab.html](http://espeak.sf.net/phontab.html). | |||
[phontab](phontab.md). | |||
The **Compile phoneme data** function of the **espeakedit** program | |||
compiles the phonemes files of all languages to produce the files | |||
@@ -101,7 +105,7 @@ in eSpeak, together with the available vowel files which can be used to | |||
define vowel phonemes, will be sufficient. At least for an initial | |||
implementation. | |||
### 6.5 Dictionary Files {.western} | |||
## Dictionary Files | |||
Once the language's phonemes have been defined, then pronunciation | |||
dictionary data can be produced in order to translate the language's | |||
@@ -111,23 +115,31 @@ exceptions list, and attributes of certain words). The corresponding | |||
compiled data file is **espeak-data/fr\_dict** which is produced from | |||
**fr\_rules** and **fr\_list** sources by the command: | |||
> `espeak-ng --compile=fr`{.western}. | |||
`espeak-ng --compile=fr` | |||
Or by using the **espeakedit** program. | |||
Details of the contents of the dictionary files are given in | |||
[dictionary.html](http://espeak.sf.net/dictionary.html). | |||
[dictionary](dictionary.md). | |||
The **fr\_list** file contains: | |||
- - - - | |||
### 6.6 Program Code {.western} | |||
* Pronunciations which are exceptions to the rules in fr_rules (eg. foreign names). | |||
* Pronunciation of letter names, symbol names, and punctuation names. | |||
* Pronunciation of numbers. | |||
* Attributes for words. For example, common function words which should not be stressed, or conjunctions which should be preceded by a pause. | |||
## Program Code | |||
The behaviour of the eSpeak program is controlled by various options | |||
such as: | |||
- - - - | |||
* Default rules for which syllable of a word has the main stress. | |||
* Relative lengths and amplitude of vowels in stressed and unstressed syllables. | |||
* Which intonation tunes to use. | |||
* Rules for speaking numbers. | |||
The function SetTranslator() at the start of the source code file | |||
tr\_languages.cpp recognizes the language code and sets the appropriate | |||
@@ -135,18 +147,19 @@ options. For a new language, you would add its language code and the | |||
required options in SetTranslator(). However, this may not be necessary | |||
during testing because most of the options can also be set in the voice | |||
file in espeak-data/voices (see [Voice | |||
files](http://espeak.sf.net/voices.html)). | |||
files](voices.md)). | |||
### 6.7 Improving a Language {.western} | |||
## Improving a Language | |||
Listen carefully to the eSpeak voice. Try to identify what sounds wrong | |||
and what needs to be improved. | |||
- - - - - | |||
**If you are interested in working on a language, please contact me so | |||
that I can set up the initial data and discuss the features of the | |||
language.** | |||
* Make the spelling-to-phoneme translation rules more accurate, including the position of stressed syllables within words. Some languages are easier than others. I expect most are easier than English. | |||
* Improve the sounds of the phonemes. It may be that a phoneme should sound different depending on adjacent sounds, or whether it's at the start or the end of a word, between vowels, in a stressed or unstressed syllable, etc. This may consist of making small adjustments to vowel and diphthong quality or length, or adjusting the strength of consonants. Phoneme definitions can include conditional statements which can be used to change the sound of a phoneme depending on its environment. Bigger changes may be recording new or replacement consonant sounds, or may even need program code to implement new types of sounds. | |||
* Some common words should be added to the dictionary (the fr_list file for the language) with an "unstressed" attribute **\$u** or **\$u+** (eg. in English, words such as "the", "is", "had", "my", "she", "of", "in", "some"), or should be preceded by a short pause (such as "and", "but", "which"), or have other attributes, in order to make the speech flow better. | |||
* Improve the rhythm of the speech by adjusting the relative lengths of vowels in different contexts, eg. stressed/unstressed syllable, or depending on the following phonemes. This is important for making the speech sound good for the language. | |||
* Make new intonation "tunes" for statements or questions (see [Intonation](intonation.md)). | |||
For most of the eSpeak voices, I do not speak or understand the | |||
language, and I do not know how it should sound. I can only make |
@@ -1,69 +0,0 @@ | |||
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> | |||
<html> | |||
<head> | |||
<title></title> | |||
<meta name="GENERATOR" content="Quanta Plus"> | |||
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"> | |||
</head> | |||
<body> | |||
<A href="docindex.html">Back</A> | |||
<hr> | |||
<h2>ANALYSIS</h2> | |||
<hr> | |||
(Further notes are needed) | |||
<p> | |||
Recordings of spoken words and phrases can be analysed to try and make eSpeak match a language more closely. | |||
Unlike most other (larger and better quality) synthesizers, eSpeak's data is not produced directly from recorded sounds. To use an analogy, it's like a drawing or sketch compared with a photograph. Or vector graphics compared with a bitmap image. It's smaller, less accurate, with less subtlety, but it can sometimes show some aspects of the picture more clearly than a more accurate image. | |||
<h4>Recording Sounds</h4> | |||
Recordings should be made while speaking slowly, clearly, and firmly and loudly (but not shouting). Speak about half a metre from the microphone. Try to avoid background noise and hum interference from electrical power cables. | |||
<h4>Praat</h4> | |||
I use a modified version of the praat program (<a href="www.praat.org">www.praat.org</a>) to view and analyse both sound recordings and output from eSpeak. The modification adds a new function (<code>Spectrum->To_eSpeak</code>) which analyses a voiced sound and produces a file which can be loaded into espeakedit. Details of the modification are in the <code>"praat-mod"</code> directory in the espeakedit package. | |||
The analysis contains a sequence of frames, one per cycle at the speech's fundamental frequency. Each frame is a short time spectrum, together with praat's estimation of the f1 to f5 formant frequencies at the time of that cycle. | |||
I also use Praat's <code>New->Record_mono_sound</code> function to make sound recordings. | |||
<h3>Vowels and Diphthongs</h3> | |||
<h4>Analysing a Recording</h4> | |||
Make a recording, with a male voice, and trim it in Praat to keep just the required vowel sound. Then use the new <code>Spectrum->To_eSpeak</code> modification (this was named <code>To_Spectrogram2</code> in earlier versions) to analyse the sound. It produces a file named <code>"spectrum.dat"</code>. | |||
Load the <code>"spectrum.dat"</code> file into espeakedit. Espeakedit has two Open functions, <code>File->Open</code> and <code>File->Open2</code>. They are the same, except that they remember different paths. I generally use <code>File->Open2</code> for reading the <code>"spectrum.dat"</code> file. | |||
The data is displayed in espeakedit as a sequence of spectrum frames (see <a href="editor.html">editor.html</a>). | |||
<h4>Tone Quality</h4> | |||
It can be difficult to match the tonal quality of a new vowel to be compatible with existing vowel files. This is determined by the relative heights and widths of the formant peaks. These vary depending on how the recording was made, the microphone, and the strength and tone of the voice. Also the positions of the higher peaks (F3 upwards) can vary depending on the characteristics of the speaker's voice. Formant peaks correspond to resonances within the mouth and throat, and they depend on its size and shape. With a female voice, all the formants (F1 upwards) are generally shifted to higher frequencies. | |||
For these reasons, it's best to use a male voice, and to use its analysed spectra only as guidance. Rather than construct formant-peaks entirely to match the analysed data, instead copy keyframes from a similar existing vowel. Then make small adjustments to match the position of the F1, F2, F3 formant peaks and hopefully produce the required vowel sound. | |||
<h4>Using an Existing Vowel File</h4> | |||
Choose a similar vowel file from <code>phsource/vowel</code> and open it into espeakedit. It may be useful to use <code>phsource/vowel/vowelchart</code> as a map to show how vowel files compare with each other. You can select a keyframe from the vowel file and use CTRL-C and CTRL-V to copy the green formant peaks onto a frame of the new spectrum sequence. Then adjust the peaks to match the new frame. Press F1 to hear the sound of the formant peaks in the selected frame. | |||
The F0 peak is provided in order to adjust the correct balance of low frequencies, below the F1 peak. If the sound is too muffled, or conversely, too "thin", try adjusting the amplitude or position of the F0 peak. | |||
<h4>Length and Amplitude</h4> | |||
Use an existing vowel file as a guide for how to set the amplitude and length of the keyframes. At the right of each keyframe, its length is shown in mS and under that is its relative (RMS) amplitude. | |||
The second keyframe should be marked with a red marker (use CTRL-M to toggle this). This divides the vowel into the front-part (with one frame), and the rest. | |||
Use F2 to play the sound of the new vowel sequence. It will also produce a WAV file (the default name is speech.wav) which you can read into praat to see whether it has a sensible shape. | |||
<h4>Using the New Vowel</h4> | |||
Make a new directory (eg. vwl_xx) in phsource for your new vowels. Save the spectrum sequence with a name which you have chosen for it. | |||
You can then edit the phoneme file for your language (eg. phsource/ph_xxx), and change a phoneme to refer to your new vowel file. Then do <code>Data->Compile_Phoneme_Data</code> from espeakedit's menubar to re-compile the phoneme data. | |||
</body> | |||
</html> |
@@ -1,55 +1,66 @@ | |||
ANALYSIS | |||
======== | |||
# Table of contents | |||
* [ANALYSIS](#analysis) | |||
* [Recording Sounds](#recording-sounds) | |||
* [Praat](#praat) | |||
* [Vowels and Diphthongs](#vowels-and-diphthongs) | |||
* [Analysing a Recording](#analysing-a-recording) | |||
* [Tone Quality](#tone-quality) | |||
* [Using an Existing Vowel File](#using-an-existing-vowel-file) | |||
* [Length and Amplitude](#length-and-amplitude) | |||
* [Using the New Vowel](#using-the-new-vowel) | |||
# ANALYSIS | |||
(Further notes are needed) | |||
Recordings of spoken words and phrases can be analysed to try and make | |||
eSpeak match a language more closely. Unlike most other (larger and | |||
better quality) synthesizers, eSpeak's data is not produced directly | |||
eSpeak NG match a language more closely. Unlike most other (larger and | |||
better quality) synthesizers, eSpeak NG's data is not produced directly | |||
from recorded sounds. To use an analogy, it's like a drawing or sketch | |||
compared with a photograph. Or vector graphics compared with a bitmap | |||
image. It's smaller, less accurate, with less subtlety, but it can | |||
sometimes show some aspects of the picture more clearly than a more | |||
accurate image. | |||
#### Recording Sounds {.western} | |||
## Recording Sounds | |||
Recordings should be made while speaking slowly, clearly, and firmly and | |||
loudly (but not shouting). Speak about half a metre from the microphone. | |||
Try to avoid background noise and hum interference from electrical power | |||
cables. | |||
#### Praat {.western} | |||
## Praat | |||
I use a modified version of the praat program | |||
([www.praat.org](www.praat.org)) to view and analyse both sound | |||
recordings and output from eSpeak. The modification adds a new function | |||
(`Spectrum->To_eSpeak`{.western}) which analyses a voiced sound and | |||
([www.praat.org](http://www.praat.org)) to view and analyse both sound | |||
recordings and output from eSpeak NG. The modification adds a new function | |||
(**Spectrum->To_eSpeak**) which analyses a voiced sound and | |||
produces a file which can be loaded into espeakedit. Details of the | |||
modification are in the `"praat-mod"`{.western} directory in the | |||
modification are in the `praat-mod` directory in the | |||
espeakedit package. The analysis contains a sequence of frames, one per | |||
cycle at the speech's fundamental frequency. Each frame is a short time | |||
spectrum, together with praat's estimation of the f1 to f5 formant | |||
frequencies at the time of that cycle. I also use Praat's | |||
`New->Record_mono_sound`{.western} function to make sound recordings. | |||
**New->Record_mono_sound** function to make sound recordings. | |||
### Vowels and Diphthongs {.western} | |||
# Vowels and Diphthongs | |||
#### Analysing a Recording {.western} | |||
## Analysing a Recording | |||
Make a recording, with a male voice, and trim it in Praat to keep just | |||
the required vowel sound. Then use the new | |||
`Spectrum->To_eSpeak`{.western} modification (this was named | |||
`To_Spectrogram2`{.western} in earlier versions) to analyse the sound. | |||
It produces a file named `"spectrum.dat"`{.western}. Load the | |||
`"spectrum.dat"`{.western} file into espeakedit. Espeakedit has two Open | |||
functions, `File->Open`{.western} and `File->Open2`{.western}. They are | |||
**Spectrum->To_eSpeak** modification (this was named | |||
`To_Spectrogram2` in earlier versions) to analyse the sound. | |||
It produces a file named `spectrum.dat`. Load the | |||
`spectrum.dat` file into espeakedit. Espeakedit has two Open | |||
functions, **File->Open** and **File->Open2**. They are | |||
the same, except that they remember different paths. I generally use | |||
`File->Open2`{.western} for reading the `"spectrum.dat"`{.western} file. | |||
**File->Open2** for reading the `spectrum.dat` file. | |||
The data is displayed in espeakedit as a sequence of spectrum frames | |||
(see [editor.html](editor.html)). | |||
(see [editor](editor.md)). | |||
#### Tone Quality {.western} | |||
## Tone Quality | |||
It can be difficult to match the tonal quality of a new vowel to be | |||
compatible with existing vowel files. This is determined by the relative | |||
@@ -66,11 +77,11 @@ analysed data, instead copy keyframes from a similar existing vowel. | |||
Then make small adjustments to match the position of the F1, F2, F3 | |||
formant peaks and hopefully produce the required vowel sound. | |||
#### Using an Existing Vowel File {.western} | |||
## Using an Existing Vowel File | |||
Choose a similar vowel file from `phsource/vowel`{.western} and open it | |||
Choose a similar vowel file from `phsource/vowel` and open it | |||
into espeakedit. It may be useful to use | |||
`phsource/vowel/vowelchart`{.western} as a map to show how vowel files | |||
`phsource/vowel/vowelchart` as a map to show how vowel files | |||
compare with each other. You can select a keyframe from the vowel file | |||
and use CTRL-C and CTRL-V to copy the green formant peaks onto a frame | |||
of the new spectrum sequence. Then adjust the peaks to match the new | |||
@@ -80,22 +91,22 @@ low frequencies, below the F1 peak. If the sound is too muffled, or | |||
conversely, too "thin", try adjusting the amplitude or position of the | |||
F0 peak. | |||
#### Length and Amplitude {.western} | |||
## Length and Amplitude | |||
Use an existing vowel file as a guide for how to set the amplitude and | |||
length of the keyframes. At the right of each keyframe, its length is | |||
shown in mS and under that is its relative (RMS) amplitude. The second | |||
shown in milliseconds and under that is its relative (RMS) amplitude. The second | |||
keyframe should be marked with a red marker (use CTRL-M to toggle this). | |||
This divides the vowel into the front-part (with one frame), and the | |||
rest. Use F2 to play the sound of the new vowel sequence. It will also | |||
produce a WAV file (the default name is speech.wav) which you can read | |||
into praat to see whether it has a sensible shape. | |||
#### Using the New Vowel {.western} | |||
## Using the New Vowel | |||
Make a new directory (eg. vwl\_xx) in phsource for your new vowels. Save | |||
Make a new directory (eg. `vwl_xx`) in phsource for your new vowels. Save | |||
the spectrum sequence with a name which you have chosen for it. You can | |||
then edit the phoneme file for your language (eg. phsource/ph\_xxx), and | |||
then edit the phoneme file for your language (eg. `phsource/ph_xxx`), and | |||
change a phoneme to refer to your new vowel file. Then do | |||
`Data->Compile_Phoneme_Data`{.western} from espeakedit's menubar to | |||
**Data->Compile_Phoneme_Data** from espeakedit's menubar to | |||
re-compile the phoneme data. |
@@ -1,227 +0,0 @@ | |||
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> | |||
<html> | |||
<head> | |||
<title>eSpeak Speech Synthesizer</title> | |||
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> | |||
</head> | |||
<body> | |||
<A href="index.html">Back</A> | |||
<hr> | |||
<h2>2.1 INSTALLATION</h2> | |||
<hr> | |||
<h3>2.1.1 Linux and other Posix systems</h3> | |||
There are two versions of the command line program. They both have the same command parameters (see below). | |||
<ol> | |||
<li><strong>espeak-ng</strong> uses the speech engine in the <strong>libespeak-ng</strong> shared library. The libespeak-ng library must first be installed. | |||
<p> | |||
<li><strong>speak-ng</strong> is a stand-alone version which includes its own copy of the speech engine. | |||
</ol> | |||
Place the <strong>espeak-ng</strong> or <strong>speak-ng</strong> executable file in the command path, eg in <strong>/usr/local/bin</strong> | |||
<p> | |||
Place the "<strong>espeak-data</strong>" directory in /usr/share as <strong>/usr/share/espeak-data</strong>.<br> | |||
Alternatively if it is placed in the user's home directory (i.e. <strong>/home/<user>/espeak-data</strong>) | |||
then that will be used instead. | |||
<p> | |||
<h4>Dependencies</h4> | |||
<strong>espeak-ng</strong> uses the PortAudio sound library (version 18), so you will need to have the <strong>libportaudio0</strong> library package installed. It may be already, since it's used by other software, such as OpenOffice.org and the Audacity sound editor.<p> | |||
Some Linux distributions (eg. SuSe 10) have version 19 of PortAudio which has a slightly different API. The speak program can be compiled to use version 19 of PortAudio by copying the file portaudio19.h to portaudio.h before compiling.<p> | |||
The speak program may be compiled without using PortAudio, by removing the line<pre> #define USE_PORTAUDIO | |||
</pre>in the file speech.h. | |||
<p> <hr> | |||
<h3>2.1.2 Windows</h3> | |||
The installer: <strong>setup_espeak.exe</strong> installs the SAPI5 version of eSpeak. | |||
During installation you need to specify which voices you want to appear in SAPI5 voice menus. | |||
<p> | |||
It also installs a command line program <strong>espeak-ng</strong> in the espeak-ng program directory. | |||
<p> <hr> | |||
<h2>2.2 COMMAND OPTIONS</h2> | |||
<hr> | |||
<h3>2.2.1 Examples</h3> | |||
To use at the command line, type:<br> | |||
<strong>espeak-ng "This is a test"</strong><br> | |||
or<br> | |||
<strong>espeak-ng -f <text file></strong> | |||
<p> | |||
Or just type<br> | |||
<strong>espeak-ng</strong><br> | |||
followed by text on subsequent lines. Each line is spoken when | |||
RETURN is pressed. | |||
<p> | |||
Use <strong>espeak-ng -x</strong> to see the corresponding phoneme codes. | |||
<p> <hr> | |||
<h3>2.2.2 The Command Line Options</h3> | |||
<dl> | |||
<dt> | |||
<strong>espeak-ng [options] ["text words"]</strong><br> | |||
<dd>Text input can be taken either from a file, from a string in the command, or from stdin. | |||
<p> | |||
<dt> | |||
<strong>-f <text file></strong><br> | |||
<dd>Speaks a text file. | |||
<p> | |||
<dt> | |||
<strong> --stdin</strong><br> | |||
<dd>Takes the text input from stdin. | |||
<p> | |||
<dt> | |||
If neither -f nor --stdin is given, then the text input is taken from "text words" (a text string within double quotes). <br>If that is not present then text is taken from stdin, but each line is treated as a separate sentence. | |||
<p> | |||
<dt> | |||
<strong>-a <integer></strong><br> | |||
<dd>Sets amplitude (volume) in a range of 0 to 200. The default is 100. | |||
<p> | |||
<dt> | |||
<strong>-p <integer></strong><br> | |||
<dd>Adjusts the pitch in a range of 0 to 99. The default is 50. | |||
<p> | |||
<dt> | |||
<strong>-s <integer></strong><br> | |||
<dd>Sets the speed in words-per-minute (approximate values for the default English voice, others may differ slightly). The default value is 175. I generally use a faster speed | |||
of 260. The lower limit is 80. There is no upper limit, but about 500 is probably a practical maximum. | |||
<p> | |||
<dt> | |||
<strong>-b <integer></strong><br> | |||
<dd>Input text character format.<p> | |||
1 UTF-8. This is the default.<p> | |||
2 The 8-bit character set which corresponds to the language (eg. Latin-2 for Polish).<p> | |||
4 16 bit Unicode.<p> | |||
Without this option, eSpeak assumes text is UTF-8, but will automatically switch to the 8-bit character set if it finds an illegal UTF-8 sequence. | |||
<p> | |||
<dt> | |||
<strong>-g <integer></strong><br> | |||
<dd>Word gap. This option inserts a pause between words. The value is the length of the pause, in units of 10 ms (at the default speed of 175 wpm). | |||
<p> | |||
<dt> | |||
<strong>-h </strong> or <strong> --help</strong><br> | |||
<dd>The first line of output gives the eSpeak version number. | |||
<p> | |||
<dt> | |||
<strong>-k <integer></strong><br> | |||
<dd>Indicate words which begin with capital letters.<p> | |||
1 eSpeak uses a click sound to indicate when a word starts with a capital letter, or double click if word is all capitals.<p> | |||
2 eSpeak speaks the word "capital" before a word which begins with a capital letter.<p> | |||
Other values: eSpeak increases the pitch for words which begin with a capital letter. The greater the value, the greater the increase in pitch. Try -k20. | |||
<p> | |||
<dt> | |||
<strong>-l <integer></strong><br> | |||
<dd>Line-break length, default value 0. If set, then lines which are shorter | |||
than this are treated as separate clauses and spoken separately with a | |||
break between them. This can be useful for some text files, but bad for | |||
others. | |||
<p> | |||
<dt> | |||
<strong>-m</strong><br> | |||
<dd>Indicates that the text contains SSML (Speech Synthesis Markup Language) tags or other XML tags. Those SSML tags which are supported are interpreted. Other tags, including HTML, are ignored, except that some HTML tags such as <hr> <h2> and <li> ensure a break in the speech. | |||
<p> | |||
<dt> | |||
<strong>-q</strong><br><dd> | |||
Quiet. No sound is generated. This may be useful with options such as -x and --pho. | |||
<p> | |||
<dt> | |||
<strong>-v <voice filename>[+<variant>]</strong><br> | |||
<dd>Sets a Voice for the speech, usually to select a language. eg: | |||
<pre> espeak-ng -vaf</pre> | |||
To use the Afrikaans voice. A modifier after the voice name can be used to vary the tone of the voice, eg: | |||
<pre> espeak-ng -vaf+3</pre> | |||
The variants are <code> +m1 +m2 +m3 +m4 +m5 +m6 +m7</code> for male voices and <code> +f1 +f2 +f3 +f4 </code> which simulate female voices by using higher pitches. Other variants include <code>+croak</code> and <code>+whisper</code>. | |||
<p> | |||
<voice filename> is a file within the <code>espeak-data/voices</code> directory.<br> | |||
<variant> is a file within the <code>espeak-data/voices/!v</code> directory.<p> | |||
Voice files can specify a language, alternative pronunciations or phoneme sets, different pitches, tonal qualities, and prosody for the voice. | |||
See the <a href="voices.html">voices.html</a> file.<p> | |||
Voice names which start with <b>mb-</b> are for use with Mbrola diphone voices, see <a href="mbrola.html">mbrola.html</a><p> | |||
Some languages may need additional dictionary data, see <a href="languages.html">languages.html</a> | |||
<p> | |||
<dt> | |||
<strong>-w <wave file></strong><br> | |||
<dd>Writes the speech output to a file in WAV format, rather than speaking it. | |||
<p> | |||
<dt> | |||
<strong>-x</strong><br> | |||
<dd>The phoneme mnemonics, into which the input text is translated, are written to stdout. | |||
If a phoneme name contains more than one letter (eg. [tS]), the --sep or --tie option can be used to distinguish | |||
this from separate phonemes. | |||
<p> | |||
<dt> | |||
<strong>-X</strong><br> | |||
<dd>As -x, but in addition, details are shown of the pronunciation rule and dictionary list lookup. This can be useful to see why a certain pronunciation is being produced. Each matching pronunciation rule is listed, together with its score, the highest scoring rule being used in the translation. "Found:" indicates the word was found in the dictionary lookup list, and "Flags:" means the word was found with only properties and not a pronunciation. You can see when a word has been retranslated after removing a prefix or suffix. | |||
<p> | |||
<dt> | |||
<strong>-z</strong><br> | |||
<dd>The option removes the end-of-sentence pause which normally occurs at the end of the text. | |||
<p> | |||
<dt> | |||
<strong>--stdout</strong><br> | |||
<dd>Writes the speech output to stdout as it is produced, rather than speaking it. The data starts with a WAV file header which indicates the sample rate and format of the data. The length field is set to zero because the length of the data is unknown when the header is produced. | |||
<p> | |||
<dt><strong>--compile [=<voice name>]</strong><br> | |||
<dd> | |||
Compile the pronunciation rule and dictionary lookup data from their source files in the current directory. The Voice determines which language's files are compiled. For example, if it's an English voice, then <em>en_rules</em>, <em>en_list</em>, and <em>en_extra</em> (if present), are compiled to replace <em>en_dict</em> in the <em>espeak-data</em> directory. If no Voice is specified then the default Voice is used. | |||
<p> | |||
<dt><strong>--compile-debug [=<voice name>]</strong><br> | |||
<dd> | |||
The same as <strong>--compile</strong>, but source line numbers from the *_rules file are included. These are included in the rules trace when the <strong>-X</strong> option is used. | |||
<p> | |||
<dt><strong>--ipa</strong><br> | |||
<dd> | |||
Writes phonemes to stdout, using the International Phonetic Alphabet (IPA).<br> | |||
If a phoneme name contains more than one letter (eg. [tS]), the --sep or --tie option can be used to distinguish | |||
this from separate phonemes. | |||
<p> | |||
<dt><strong>--path [="<directory path>"]</strong><br> | |||
<dd> | |||
Specifies the directory which contains the espeak-data directory. | |||
<p> | |||
<dt><strong>--pho</strong><br> | |||
<dd> | |||
When used with an mbrola voice (eg. -v mb-en1), it writes mbrola phoneme data (.pho file format) to stdout. This includes the mbrola phoneme names with duration and pitch information, in a form which is suitable as input to this mbrola voice. The --phonout option can be used to write this data to a file. | |||
<p> | |||
<dt><strong>--phonout [="<filename>"]</strong><br> | |||
<dd> | |||
If specified, the output from -x, -X, --ipa, and --pho options is written to this file, rather than to stdout. | |||
<p> | |||
<dt><strong>--punct [="<characters>"]</strong><br> | |||
<dd> | |||
Speaks the names of punctuation characters when they are encountered in the text. If <characters> are given, then only those listed punctuation characters are spoken, eg. <code> --punct=".,;?"</code> | |||
<p> | |||
<dt><strong>--sep [=<character>]</strong><br> | |||
<dd> | |||
The character is used to separate individual phonemes in the output which is produced by the -x or --ipa options. The default is a space character. The character z means use a ZWNJ character (U+200c). | |||
<p> | |||
<dt><strong>--split [=<minutes>]</strong><br> | |||
<dd> | |||
Used with <strong>-w</strong>, it starts a new WAV file every <code><minutes></code> minutes, at the next sentence boundary. | |||
<p> | |||
<dt><strong>--tie [=<character>]</strong><br> | |||
<dd> | |||
The character is used within multi-letter phonemes in the output which is produced by the -x or --ipa options. The default is the tie character ͡ U+361. The character z means use a ZWJ character (U+200d). | |||
<p> | |||
<dt> | |||
<strong>--voices [=<language code>]</strong><br> | |||
<dd>Lists the available voices.<br> | |||
If =<language code> is present then only those voices which are suitable for that language are listed.<br> | |||
<code>--voices=mbrola</code> lists the voices which use mbrola diphone voices. These are not included in the default <code>--voices</code> list.<br> | |||
<code>--voices=variant</code> lists the available voice variants (voice modifiers).<br> | |||
</dl> | |||
<p> <hr> | |||
<h3>2.2.3 The Input Text</h3> | |||
<dl> | |||
<dt><b>HTML Input</b> | |||
<dd> | |||
If the -m option is used to indicate marked-up text, then HTML can be spoken directly. | |||
<p> | |||
<dt><b>Phoneme Input</b> | |||
<dd> | |||
As well as plain text, phoneme mnemonics can be used in the text input to <strong>espeak-ng</strong>. They are enclosed within double square brackets. Spaces are used to separate words and all stressed syllables must be marked explicitly.<p> | |||
eg: <code> espeak-ng -v en "[[D,Is Iz sVm f@n'EtIk t'Ekst 'InpUt]]" </code><p> | |||
This command will speak: "This is some phonetic text input". | |||
</dl> | |||
<hr> | |||
<a href="http://sourceforge.net"><img src="http://sflogo.sourceforge.net/sflogo.php?group_id=159649&type=2" width="125" height="37" border="0" alt="SourceForge.net Logo" /></a> | |||
</body> |
@@ -43,7 +43,7 @@ in the file speech.h. | |||
## Windows | |||
The installer: **setup\_espeak.exe** installs the SAPI5 version of | |||
eSpeak. During installation you need to specify which voices you want to | |||
eSpeak NG. During installation you need to specify which voices you want to | |||
appear in SAPI5 voice menus. | |||
It also installs a command line program **espeak-ng** in the espeak-ng | |||
@@ -104,7 +104,7 @@ practical maximum. | |||
> 1 UTF-8. This is the default. | |||
> 2 The 8-bit character set which corresponds to the language (eg. Latin-2 for Polish). | |||
> 4 16 bit Unicode. | |||
> Without this option, eSpeak assumes text is UTF-8, but will | |||
> Without this option, eSpeak NG assumes text is UTF-8, but will | |||
automatically switch to the 8-bit character set if it finds an | |||
illegal UTF-8 sequence. | |||
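
The fallback described above can be sketched in Python. This is only an illustration of the idea, not eSpeak NG's own code: the function name `decode_input` and the choice of ISO 8859-2 (Latin-2) as the 8-bit fallback are assumptions made for the example, and the sketch decodes the whole input in one go rather than switching part-way through a stream.

```python
def decode_input(data: bytes, fallback: str = "iso8859_2") -> str:
    """Decode input text as UTF-8, falling back to an 8-bit character set.

    `fallback` is a hypothetical parameter standing in for "the 8-bit
    character set which corresponds to the language" (Latin-2 for Polish here).
    """
    try:
        # Default assumption: the text is UTF-8.
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # An illegal UTF-8 sequence was found, so reinterpret the bytes
        # using the language's 8-bit character set instead.
        return data.decode(fallback)


polish = "zażółć"
print(decode_input(polish.encode("utf-8")))      # valid UTF-8 is decoded directly
print(decode_input(polish.encode("iso8859_2")))  # Latin-2 bytes trigger the fallback
```

Both calls yield the same text, because the Latin-2 byte sequence is not valid UTF-8 and is therefore re-read with the fallback character set.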
@@ -116,16 +116,16 @@ the length of the pause, in units of 10 mS (at the default speed of | |||
**-h** or **--help** | |||
> The first line of output gives the eSpeak version number. | |||
> The first line of output gives the eSpeak NG version number. | |||
**-k \<integer\>** | |||
> Indicate words which begin with capital letters. | |||
> 1 eSpeak uses a click sound to indicate when a word starts with a | |||
> 1 eSpeak NG uses a click sound to indicate when a word starts with a | |||
capital letter, or double click if word is all capitals. | |||
> 2 eSpeak speaks the word "capital" before a word which begins with | |||
> 2 eSpeak NG speaks the word "capital" before a word which begins with | |||
a capital letter. | |||
> Other values: eSpeak increases the pitch for words which begin | |||
> Other values: eSpeak NG increases the pitch for words which begin | |||
with a capital letter. The greater the value, the greater the | |||
increase in pitch. Try -k20. | |||
@@ -1,49 +1,75 @@ | |||
4. TEXT TO PHONEME TRANSLATION {.western} | |||
------------------------------ | |||
### 4.1 Translation Files {.western} | |||
# Table of contents | |||
* [Text to phoneme translation](#text-to-phoneme-translation) | |||
* [Translation Files](#translation-files) | |||
* [Phoneme names](#phoneme-names) | |||
* [Pronunciation Rules](#pronunciation-rules) | |||
* [Rule Groups](#rule-groups) | |||
* [Rules](#rules) | |||
* [Special characters in \<phoneme string\>](#special-characters-in-phoneme-string) | |||
* [Special Characters in both \<pre\> and \<post\> ](#special-characters-in-both-pre-and-post) | |||
* [Special characters only in \<pre\> ](#special-characters-only-in-pre) | |||
* [Special characters only in \<post\> ](#special-characters-only-in-post) | |||
* [Pronunciation Dictionary List](#pronunciation-dictionary-list) | |||
* [Multiple Words](#multiple-words) | |||
* [Special characters in \<phoneme string\>](#special-characters-in-phoneme-string) | |||
* [Flags](#flags) | |||
* [Translating a Word to another Word](#translating-a-word-to-another-word) | |||
* [Conditional Rules](#conditional-rules) | |||
* [Numbers and Character Names](#numbers-and-character-names) | |||
* [Letter names](#letter-names) | |||
* [Numbers](#numbers) | |||
* [Character Substitution](#character-substitution) | |||
# Text to phoneme translation | |||
## Translation Files | |||
There is a separate set of pronunciation files for each language, their | |||
names starting with the language name. | |||
There are two separate methods for translating words into phonemes: | |||
- - | |||
* Pronunciation Rules. These are an attempt to define the pronunciation rules for the language. The source file is: | |||
**\<language\>\_rules** (eg. `en_rules`) | |||
* Lookup Dictionary. A list of individual words and their pronunciations and/or various other properties. The source files are: | |||
**\<language\>\_list** (eg. `en_list`) and optionally **\<language\>\_extra**. | |||
These two files are compiled into the file ***\<language\>\_dict*** in | |||
the espeak-data directory (eg. espeak-data/en\_dict) | |||
These two files are compiled into the file **\<language\>\_dict** in | |||
the espeak-data directory (eg. `espeak-data/en_dict`) | |||
### 4.2 Phoneme names {.western} | |||
## Phoneme names | |||
Each of the language's phonemes is represented by a mnemonic of 1, 2, 3, | |||
or 4 characters. Together with a number of utility codes (eg. stress | |||
marks and pauses), these are defined in the phoneme data file (see | |||
\*spec not yet available\*). | |||
marks and pauses), these are defined in the phoneme data file (_TODO_). | |||
The utility 'phonemes' are: | |||
+--------------------------------------+--------------------------------------+ | |||
| **'** | primary stress | | |||
+--------------------------------------+--------------------------------------+ | |||
| **,** | secondary stress | | |||
+--------------------------------------+--------------------------------------+ | |||
| **%** | unstressed syllable | | |||
+--------------------------------------+--------------------------------------+ | |||
| **= ** | put the primary stress on the | | |||
| | preceding syllable | | |||
+--------------------------------------+--------------------------------------+ | |||
| **\_:** | short pause | | |||
+--------------------------------------+--------------------------------------+ | |||
| **\_** | a shorter pause | | |||
+--------------------------------------+--------------------------------------+ | |||
| **||** | indicates a word boundary within a | | |||
| | phoneme string | | |||
+--------------------------------------+--------------------------------------+ | |||
| **|** | can be used to separate two adjacent | | |||
| | characters, to prevent them from | | |||
| | being considered as a | | |||
| | multi-character phoneme mnemonic | | |||
+--------------------------------------+--------------------------------------+ | |||
+-----------+--------------------------------------+ | |||
| **'** | primary stress | | |||
+-----------+--------------------------------------+ | |||
| **,** | secondary stress | | |||
+-----------+--------------------------------------+ | |||
| **%** | unstressed syllable | | |||
+-----------+--------------------------------------+ | |||
| **=** | put the primary stress on the | | |||
| | preceding syllable | | |||
+-----------+--------------------------------------+ | |||
| **\_:** | short pause | | |||
+-----------+--------------------------------------+ | |||
| **\_** | a shorter pause | | |||
+-----------+--------------------------------------+ | |||
| **||** | indicates a word boundary within a | | |||
| | phoneme string | | |||
+-----------+--------------------------------------+ | |||
| **|** | can be used to separate two adjacent | | |||
| | characters, to prevent them from | | |||
| | being considered as a | | |||
| | multi-character phoneme mnemonic | | |||
+-----------+--------------------------------------+ | |||
It is not necessary to specify the stress of every syllable. Stress | |||
markers are only needed in order to change the effect of the language's | |||
@@ -54,9 +80,11 @@ loosely on the Kirshenbaum ascii character representation of the | |||
International Phonetic Alphabet | |||
[www.kirshenbaum.net/IPA/ascii-ipa.pdf](http://www.kirshenbaum.net/IPA/ascii-ipa.pdf) | |||
### 4.3 Pronunciation Rules {.western} | |||
The full list of commonly used phonemes can be found in the [phsource/phonemes](../phsource/phonemes) file. | |||
The rules in the ***\<language\>\_rules*** file specify the phonemes | |||
## Pronunciation Rules | |||
The rules in the **\<language\>\_rules** file specify the phonemes | |||
which are used to pronounce each letter, or sequence of letters. Some | |||
rules only apply when the letter or letters are preceded by, or followed | |||
by, other specified letters. | |||
@@ -68,21 +96,52 @@ matching rule is chosen. The pointer into the source word is then | |||
advanced past those letters which have been matched and the process is | |||
repeated until all the letters of the word have been processed. | |||
#### 4.3.1 Rule Groups {.western} | |||
### Rule Groups | |||
The rules are organized in groups, each starting with a ".group" line: | |||
**.group \<character\>** | |||
> A group for each letter or character. | |||
**.group \<2 characters\>** | |||
> Optional groups for some common 2 letter combinations. This is only needed, for efficiency, in cases where there are many rules for a particular letter. They would not be needed for a language which has regular spelling rules. The first character can only be an ascii character (less than 0x80). | |||
**.group** | |||
> A group for other characters which don't have their own group. | |||
**.L\<nn\>** | |||
> Defines a group of letter sequences, any of which can match with Lnn in a pre or post rule (see below). nn is a 2 digit decimal number in the range 01 to 25. eg: | |||
`.L01 b bl br pl pr` | |||
**.replace** | |||
> See section [Character Substitution](#character-substitution). | |||
When matching a word, firstly the 2-letter group for the two letters at | |||
the current position in the word (if such a group exists) is searched, | |||
and then the single-letter group. The highest scoring rule in either of | |||
those two groups is used. | |||
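
The group lookup order can be sketched as follows. This is a simplified illustration under stated assumptions, not the eSpeak NG implementation: rules are plain dictionaries, the `score` values are made up (the real scoring depends on how much of the rule matches), the unnamed `.group` is keyed as `""`, and the names `groups_to_search` and `best_rule` are hypothetical.

```python
def groups_to_search(word, pos, groups):
    """Return the names of the rule groups consulted at `pos`, in search order."""
    names = []
    pair = word[pos:pos + 2]
    # First the 2-letter group for the two letters at the current position,
    # if such a group exists.
    if len(pair) == 2 and pair in groups:
        names.append(pair)
    single = word[pos]
    # Then the single-letter group. Assumption: characters without their own
    # group fall back to the unnamed ".group", keyed here as "".
    names.append(single if single in groups else "")
    return names


def best_rule(word, pos, groups):
    """Pick the highest scoring rule whose <match> part fits at `pos`."""
    candidates = [
        rule
        for name in groups_to_search(word, pos, groups)
        for rule in groups.get(name, [])
        if word.startswith(rule["match"], pos)
    ]
    return max(candidates, key=lambda rule: rule["score"], default=None)


# Hypothetical rule data; the scores are illustrative only.
groups = {
    "o": [{"match": "o", "phonemes": "0", "score": 1}],
    "oo": [{"match": "oo", "phonemes": "u:", "score": 2}],
    "": [],
}
print(best_rule("book", 1, groups))
print(best_rule("bot", 1, groups))
```

For "book" both the "oo" and "o" groups are searched and the "oo" rule wins on score; for "bot" only the "o" group applies.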
#### 4.3.2 Rules {.western} | |||
### Rules | |||
Each rule is on a separate line, and has the syntax: | |||
`[<pre>)] <match> [(<post>] <phoneme string>` | |||
eg. | |||
``` | |||
.group o | |||
o 0 // "o" is pronounced as [0] | |||
oo u: // but "oo" is pronounced as [u:] | |||
b) oo (k U | |||
``` | |||
"oo" is pronounced as [u:], but when also preceded by "b" and followed | |||
by "k", it is pronounced [U]. | |||
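
To make the rule syntax concrete, here is a minimal Python sketch that splits a rule line into its four parts. It is an illustration only and not how eSpeak NG compiles its rules: the `Rule` type and `parse_rule` function are hypothetical names, and the sketch assumes the parts are separated by whitespace and that `//` starts a comment, as in the example above. It does not handle rules whose match part is itself a bracket character.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    pre: str       # text that must precede the match ("" if absent)
    match: str     # the letters being translated
    post: str      # text that must follow the match ("" if absent)
    phonemes: str  # the resulting phoneme string


def parse_rule(line: str) -> Rule:
    """Split one rule line of the form: [<pre>)] <match> [(<post>] <phoneme string>"""
    body = line.split("//", 1)[0]  # drop a trailing comment
    tokens = body.split()
    pre = post = ""
    if tokens and tokens[0].endswith(")"):
        pre = tokens.pop(0)[:-1]   # "b)" -> "b"
    if not tokens:
        raise ValueError(f"no <match> part in rule: {line!r}")
    match = tokens.pop(0)
    if tokens and tokens[0].startswith("("):
        post = tokens.pop(0)[1:]   # "(k" -> "k"
    return Rule(pre, match, post, " ".join(tokens))


print(parse_rule('oo u: // but "oo" is pronounced as [u:]'))
print(parse_rule("b) oo (k U"))
```

The second call yields `pre="b"`, `match="oo"`, `post="k"` and `phonemes="U"`, which is the "book" rule from the example.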
@@ -95,140 +154,142 @@ Alphabetic characters in the \<pre\>, \<match\>, and \<post\> parts must | |||
be lower case, and matching is case-insensitive. Some upper case letters | |||
are used in \<pre\> and \<post\> with special meanings. | |||
#### 4.3.3 Special characters in \<phoneme string\>: {.western} | |||
+--------------------------------------+--------------------------------------+ | |||
| **\_\^\_\<language code\> ** | Translate using a different | | |||
| | language. | | |||
+--------------------------------------+--------------------------------------+ | |||
#### 4.3.4 Special Characters in both \<pre\> and \<post\>: {.western} | |||
+--------------------------------------+--------------------------------------+ | |||
| **\_** | Beginning or end of a word (or a | | |||
| | hyphen). | | |||
+--------------------------------------+--------------------------------------+ | |||
| **-** | Hyphen. | | |||
+--------------------------------------+--------------------------------------+ | |||
| **A** | Any vowel (the set of vowel | | |||
| | characters may be defined for a | | |||
| | particular language). | | |||
+--------------------------------------+--------------------------------------+ | |||
| **C** | Any consonant. | | |||
+--------------------------------------+--------------------------------------+ | |||
| **B H F G Y ** | These may indicate other sets of | | |||
| | characters (defined for a particular | | |||
| | language). | | |||
+--------------------------------------+--------------------------------------+ | |||
| **L\<nn\>** | Any of the sequence of characters | | |||
| | defined as a letter group (see 4.3.1 | | |||
| | above). | | |||
+--------------------------------------+--------------------------------------+ | |||
| **D** | Any digit. | | |||
+--------------------------------------+--------------------------------------+ | |||
| **K** | Not a vowel (i.e. a consonant or | | |||
| | word boundary or non-alphabetic | | |||
| | character). | | |||
+--------------------------------------+--------------------------------------+ | |||
| **X** | There is no vowel until the word | | |||
| | boundary. | | |||
+--------------------------------------+--------------------------------------+ | |||
| **Z** | A non-alphabetic character. | | |||
+--------------------------------------+--------------------------------------+ | |||
| **%** | Doubled (placed before a character | | |||
| | in \<pre\> and after it in \<post\>. | | |||
+--------------------------------------+--------------------------------------+ | |||
| **/** | The following character is treated | | |||
| | literally. | | |||
+--------------------------------------+--------------------------------------+ | |||
### Special characters in \<phoneme string\>: | |||
**_^_\<language code\>** | |||
> Translate using a different language. | |||
If this rule is selected when translating a word, then the translation is aborted and the word is re-translated using the specified different language. \<language code\> may be upper or lower case. This can be used to recognise certain letter combinations as being foreign words and to use the foreign pronunciation for them. eg: | |||
`th (_ _^_EN` | |||
indicates that a word which ends in "th" is translated using the English translation rules and spoken with English phonemes. | |||
### Special Characters in both \<pre\> and \<post\> | |||
+------------------+--------------------------------------+ | |||
| **\_** | Beginning or end of a word (or a | | |||
| | hyphen). | | |||
+------------------+--------------------------------------+ | |||
| **-** | Hyphen. | | |||
+------------------+--------------------------------------+ | |||
| **A** | Any vowel (the set of vowel | | |||
| | characters may be defined for a | | |||
| | particular language). | | |||
+------------------+--------------------------------------+ | |||
| **C** | Any consonant. | | |||
+------------------+--------------------------------------+ | |||
| **B H F G Y** | These may indicate other sets of | | |||
| | characters (defined for a particular | | |||
| | language). | | |||
+------------------+--------------------------------------+ | |||
| **L\<nn\>** | Any of the sequence of characters | | |||
|                  | defined as a letter group (see Rule  | | |||
|                  | Groups above).                       | | |||
+------------------+--------------------------------------+ | |||
| **D** | Any digit. | | |||
+------------------+--------------------------------------+ | |||
| **K** | Not a vowel (i.e. a consonant or | | |||
| | word boundary or non-alphabetic | | |||
| | character). | | |||
+------------------+--------------------------------------+ | |||
| **X** | There is no vowel until the word | | |||
| | boundary. | | |||
+------------------+--------------------------------------+ | |||
| **Z** | A non-alphabetic character. | | |||
+------------------+--------------------------------------+ | |||
| **%** | Doubled (placed before a character | | |||
|                  | in \<pre\>, after it in \<post\>).   | | |||
+------------------+--------------------------------------+ | |||
| **/** | The following character is treated | | |||
| | literally. | | |||
+------------------+--------------------------------------+ | |||
The sets of letters indicated by A, B, C, E, F G may be defined | |||
differently for each language. | |||
Examples of rules: | |||
~~~~ {.western} | |||
``` | |||
_) a // "a" at the start of a word | |||
a (CC // "a" followed by two consonants | |||
a (C% // "a" followed by a double consonant (the same letter twice) | |||
a (/% // "a" followed by a percent sign | |||
%C) a // "a" preceded by a double consonant | |||
~~~~ | |||
``` | |||
#### 4.3.5 Special characters only in \<pre\>: {.western} | |||
### Special characters only in \<pre\>: | |||
+--------------------------------------+--------------------------------------+ | |||
| **@ ** | Any syllable. | | |||
+--------------------------------------+--------------------------------------+ | |||
| **&** | A syllable which may be stressed | | |||
| | (i.e. is not defined as unstressed). | | |||
+--------------------------------------+--------------------------------------+ | |||
| **V** | Matches only if a previous word has | | |||
| | indicated that a verb form is | | |||
| | expected. | | |||
+--------------------------------------+--------------------------------------+ | |||
+-----------------+--------------------------------------+ | |||
| **@** | Any syllable. | | |||
+-----------------+--------------------------------------+ | |||
| **&** | A syllable which may be stressed | | |||
| | (i.e. is not defined as unstressed). | | |||
+-----------------+--------------------------------------+ | |||
| **V** | Matches only if a previous word has | | |||
| | indicated that a verb form is | | |||
| | expected. | | |||
+-----------------+--------------------------------------+ | |||
eg. | |||
~~~~ {.western} | |||
``` | |||
@@) bi // "bi" preceded by at least two syllables | |||
@@a) bi // "bi" preceded by at least 2 syllables and following 'a' | |||
~~~~ | |||
``` | |||
Note that matching characters in the \<pre\> part do not affect the | |||
syllable counting. | |||
#### 4.3.6 Special characters only in \<post\>: {.western} | |||
+--------------------------------------+--------------------------------------+ | |||
| **@** | A vowel follows somewhere in the | | |||
| | word. | | |||
+--------------------------------------+--------------------------------------+ | |||
| **+** | Force an increase in the score in | | |||
| | this rule (may be repeated for more | | |||
| | effect). | | |||
+--------------------------------------+--------------------------------------+ | |||
| **S\<number\> ** | This number of matching characters | | |||
| | are a standard suffix, remove them | | |||
| | and retranslate the word. | | |||
+--------------------------------------+--------------------------------------+ | |||
| **P\<number\>** | This number of matching characters | | |||
| | are a standard prefix, remove them | | |||
| | and retranslate the word. | | |||
+--------------------------------------+--------------------------------------+ | |||
| **Lnn** | **nn** is a 2-digit decimal number | | |||
| | in the range 01 to 20\ | | |||
| | Matches with any of the letter | | |||
| | sequences which have been defined | | |||
| | for letter group **nn** | | |||
+--------------------------------------+--------------------------------------+ | |||
| **N** | Only use this rule if the word is | | |||
| | not a retranslation after removing a | | |||
| | suffix. | | |||
+--------------------------------------+--------------------------------------+ | |||
| **\#** | (English specific) change the next | | |||
| | "e" into a special character "E" | | |||
+--------------------------------------+--------------------------------------+ | |||
| **\$noprefix** | Only use this rule if the word is | | |||
| | not a retranslation after removing a | | |||
| | prefix. | | |||
+--------------------------------------+--------------------------------------+ | |||
| **\$w\_alt\ | Only use this rule if the word is | | |||
| \$w\_alt2\ | found in the \*\_list file with the | | |||
| \$w\_alt3** | **\$alt**, **\$alt2** or **\$alt3** | | |||
| | attribute respectively. | | |||
+--------------------------------------+--------------------------------------+ | |||
| **\$p\_alt\ | Only use this rule if the part-word, | | |||
| \$p\_alt2\ | up to and including the pre and | | |||
| \$p\_alt3** | match parts of this rule, is found | | |||
| | in the \*\_list file with the | | |||
| | **\$alt**, **\$alt2** or **\$alt3** | | |||
| | attribute respectively. | | |||
+--------------------------------------+--------------------------------------+ | |||
### Special characters only in \<post\> | |||
+--------------------+--------------------------------------+ | |||
| **@** | A vowel follows somewhere in the | | |||
| | word. | | |||
+--------------------+--------------------------------------+ | |||
| **+** | Force an increase in the score in | | |||
| | this rule (may be repeated for more | | |||
| | effect). | | |||
+--------------------+--------------------------------------+ | |||
| **S\<number\>** | This number of matching characters | | |||
| | are a standard suffix, remove them | | |||
| | and retranslate the word. | | |||
+--------------------+--------------------------------------+ | |||
| **P\<number\>** | This number of matching characters | | |||
| | are a standard prefix, remove them | | |||
| | and retranslate the word. | | |||
+--------------------+--------------------------------------+ | |||
| **Lnn** | **nn** is a 2-digit decimal number | | |||
| | in the range 01 to 20\ | | |||
| | Matches with any of the letter | | |||
| | sequences which have been defined | | |||
| | for letter group **nn** | | |||
+--------------------+--------------------------------------+ | |||
| **N** | Only use this rule if the word is | | |||
| | not a retranslation after removing a | | |||
| | suffix. | | |||
+--------------------+--------------------------------------+ | |||
| **\#** | (English specific) change the next | | |||
| | "e" into a special character "E" | | |||
+--------------------+--------------------------------------+ | |||
| **\$noprefix** | Only use this rule if the word is | | |||
| | not a retranslation after removing a | | |||
| | prefix. | | |||
+--------------------+--------------------------------------+ | |||
| **\$w\_alt\ | Only use this rule if the word is | | |||
| \$w\_alt2\ | found in the \*\_list file with the | | |||
| \$w\_alt3** | **\$alt**, **\$alt2** or **\$alt3** | | |||
| | attribute respectively. | | |||
+--------------------+--------------------------------------+ | |||
| **\$p\_alt\ | Only use this rule if the part-word, | | |||
| \$p\_alt2\ | up to and including the pre and | | |||
| \$p\_alt3** | match parts of this rule, is found | | |||
| | in the \*\_list file with the | | |||
| | **\$alt**, **\$alt2** or **\$alt3** | | |||
| | attribute respectively. | | |||
+--------------------+--------------------------------------+ | |||
eg. | |||
~~~~ {.western} | |||
``` | |||
@) ly (_S2 lI // "ly", at end of a word with at least one other | |||
// syllable, is a suffix pronounced [lI]. Remove | |||
// it and retranslate the word. | |||
@@ -237,7 +298,7 @@ eg. | |||
// prefix pronounced [Vn] | |||
_) un (i ju: // ... except in words starting "uni" | |||
_) un (inP2 ,Vn // ... but it is for words starting "unin" | |||
~~~~ | |||
``` | |||
S and P must be at the end of the \<post\> string. | |||
@@ -245,49 +306,49 @@ S\<number\> may be followed by additional letters (eg. S2ei ). Some of | |||
these are probably specific to English, but similar functions could be | |||
made for other languages. | |||
+--------------------------------------+--------------------------------------+ | |||
| **q** | query the \_list file to find stress | | |||
| | position or other attributes for the | | |||
| | stem, but don't re-translate the | | |||
| | word with the suffix removed. | | |||
+--------------------------------------+--------------------------------------+ | |||
| **t** | determine the stress pattern of the | | |||
| | word **before** adding the suffix | | |||
+--------------------------------------+--------------------------------------+ | |||
| **d ** | the previous letter may have been | | |||
| | doubled when the suffix was added. | | |||
+--------------------------------------+--------------------------------------+ | |||
| **e** | "e" may have been removed. | | |||
+--------------------------------------+--------------------------------------+ | |||
| **i** | "y" may have been changed to "i." | | |||
+--------------------------------------+--------------------------------------+ | |||
| **v** | the suffix means the verb form of | | |||
| | pronunciation should be used. | | |||
+--------------------------------------+--------------------------------------+ | |||
| **f** | the suffix means the next word is | | |||
| | likely to be a verb. | | |||
+--------------------------------------+--------------------------------------+ | |||
| **m** | after this suffix has been removed, | | |||
| | additional suffixes may be removed. | | |||
+--------------------------------------+--------------------------------------+ | |||
+-------+--------------------------------------+ | |||
| **q** | query the \_list file to find stress | | |||
| | position or other attributes for the | | |||
| | stem, but don't re-translate the | | |||
| | word with the suffix removed. | | |||
+-------+--------------------------------------+ | |||
| **t** | determine the stress pattern of the | | |||
| | word **before** adding the suffix | | |||
+-------+--------------------------------------+ | |||
| **d** | the previous letter may have been | | |||
| | doubled when the suffix was added. | | |||
+-------+--------------------------------------+ | |||
| **e** | "e" may have been removed. | | |||
+-------+--------------------------------------+ | |||
| **i** | "y" may have been changed to "i." | | |||
+-------+--------------------------------------+ | |||
| **v** | the suffix means the verb form of | | |||
| | pronunciation should be used. | | |||
+-------+--------------------------------------+ | |||
| **f** | the suffix means the next word is | | |||
| | likely to be a verb. | | |||
+-------+--------------------------------------+ | |||
| **m** | after this suffix has been removed, | | |||
| | additional suffixes may be removed. | | |||
+-------+--------------------------------------+ | |||
P\<number\> may be followed by additional letters (eg. P3v ). | |||
+--------------------------------------+--------------------------------------+ | |||
| **t ** | determine the stress pattern of the | | |||
| | word **before** adding the prefix | | |||
+--------------------------------------+--------------------------------------+ | |||
| **v** | the suffix means the verb form of | | |||
| | pronunciation should be used. | | |||
+--------------------------------------+--------------------------------------+ | |||
+--------+--------------------------------------+ | |||
| **t** | determine the stress pattern of the | | |||
| | word **before** adding the prefix | | |||
+--------+--------------------------------------+ | |||
| **v**  | the prefix means the verb form of    | | |||
| | pronunciation should be used. | | |||
+--------+--------------------------------------+ | |||
### 4.4 Pronunciation Dictionary List {.western} | |||
## Pronunciation Dictionary List | |||
The ***\<language\>\_list*** file contains a list of words whose | |||
The **\<language\>\_list** file contains a list of words whose | |||
pronunciations are given explicitly, rather than determined by the | |||
Pronunciation Rules. The ***\<language\>\_extra*** file, if present, is | |||
Pronunciation Rules. The **\<language\>\_extra** file, if present, is | |||
also used and its contents are taken as coming after those in | |||
***\<language\>\_list***. | |||
**\<language\>\_list**. | |||
Also the list can be used to specify the stress pattern, or other | |||
properties, of a word. | |||
@@ -298,57 +359,59 @@ Dictionary List after the prefix or suffix has been removed. | |||
Lines in the dictionary list have the form: | |||
eg. | |||
``` | |||
<word> [<phoneme string>] [<flags>] | |||
``` | |||
~~~~ {.western style="margin-bottom: 0.5cm"} | |||
eg. | |||
``` | |||
book bUk | |||
~~~~ | |||
``` | |||
Rather than a full pronunciation, just the stress may be given, to | |||
change where it would be otherwise placed by the Pronunciation Rules: | |||
~~~~ {.western} | |||
``` | |||
berlin $2 // stress on second syllable | |||
absolutely $3 // stress on third syllable | |||
for $u // an unstressed word | |||
~~~~ | |||
``` | |||
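The line format above can be sketched as a small parser. This is a minimal illustration, not eSpeak NG's own code: the helper name `parse_list_line` is hypothetical, it handles single-word entries only, and it assumes that `//` starts a comment and that any token beginning with `$` is a flag.

```python
def parse_list_line(line):
    """Parse one single-word dictionary list line into its parts.

    Illustrative helper only; not part of eSpeak NG.
    """
    # Assumption: everything after "//" is a comment.
    text = line.split("//", 1)[0].strip()
    if not text:
        return None  # blank or comment-only line
    word, *rest = text.split()
    # Assumption: tokens starting with "$" are flags,
    # and the first other token is the phoneme string.
    flags = [token for token in rest if token.startswith("$")]
    others = [token for token in rest if not token.startswith("$")]
    phonemes = others[0] if others else None
    return {"word": word, "phonemes": phonemes, "flags": flags}


for sample in ("book bUk", "berlin $2 // stress on second syllable", "for $u"):
    print(parse_list_line(sample))
```

An entry such as `berlin $2` comes back with no phoneme string and a single flag, which matches the case where only the stress is given.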
#### 4.4.1 Multiple Words {.western} | |||
### Multiple Words | |||
A pronunciation may also be specified for a group of words, when these | |||
appear together. Up to four words may be given, enclosed in brackets. | |||
This may be used to change the pronunciation or stress pattern when
these words occur together, | |||
~~~~ {.western style="margin-bottom: 0.5cm"} | |||
``` | |||
(de jure) deI||dZ'U@rI2 // note || used as a word break in the phoneme string | |||
~~~~ | |||
``` | |||
or to run them together, pronounced as a single word | |||
~~~~ {.western style="margin-bottom: 0.5cm"} | |||
``` | |||
(of a) @v@ | |||
~~~~ | |||
``` | |||
or to give them a flag when they occur together | |||
~~~~ {.western style="margin-bottom: 0.5cm"} | |||
``` | |||
(such as) sVtS||a2z $pause // precede with a pause | |||
~~~~ | |||
``` | |||
Hyphenated words in the ***\<language\>\_list*** file must also be | |||
Hyphenated words in the **\<language\>\_list** file must also be | |||
enclosed within brackets, because the two parts are considered as | |||
separate words. | |||
#### 4.4.2 Special characters in \<phoneme string\>: {.western} | |||
### Special characters in \<phoneme string\>: | |||
+--------------------------------------+--------------------------------------+ | |||
| **\_\^\_\<language code\> ** | Translate using a different | | |||
| | language. See explanation in 4.3.3 | | |||
| **\_\^\_\<language code\>** | Translate using a different | | |||
| | language. See explanation in 3 | | |||
| | above. | | |||
+--------------------------------------+--------------------------------------+ | |||
#### 4.4.3 Flags {.western} | |||
### Flags
A word (or group of words) may be given one or more flags, either | |||
instead of, or as well as, the phonetic translation. | |||
@@ -449,12 +512,12 @@ instead of, or as well as, the phonetic translation. | |||
| | end of a sentence. | | |||
+--------------------------------------+--------------------------------------+ | |||
| \$abbrev | This has two meanings.\ | | |||
| | 1. If there is no phoneme string: | | |||
| | If there is no phoneme string: | | |||
| | Speak the word as individual | | |||
| | letters, even if it contains a vowel | | |||
| | (eg. "abc" should be spoken as "a" | | |||
| | "b" "c").\ | | |||
| | 2. If there is a phoneme string: | | |||
| | If there is a phoneme string: | | |||
| | This word is capitalized because it | | |||
| | is an abbreviation and | | |||
| | capitalization does not indicate | | |||
@@ -517,35 +580,33 @@ The dictionary list is searched from bottom to top. The first match that | |||
satisfies any conditions is used (i.e. the one lowest down the list). So | |||
if we have: | |||
~~~~ {.western} | |||
``` | |||
to t@ // unstressed version | |||
to tu: $atend // stressed version | |||
~~~~ | |||
``` | |||
then if "to" is at the end of the clause, we get [tu:], if not then we | |||
get [t@]. | |||
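The bottom-to-top search can be sketched as follows. This is a simplified model under stated assumptions: the function `lookup` and the tuple layout are hypothetical, and only the `$atend` condition is modelled.

```python
def lookup(entries, word, at_end):
    """Return the phoneme string for `word`, searching from the bottom up.

    entries: list of (word, phonemes, flags) tuples in file order.
    at_end:  True if the word is at the end of the clause.
    Illustrative only; real entries can carry other conditions too.
    """
    for entry_word, phonemes, flags in reversed(entries):
        if entry_word != word:
            continue
        # Skip an entry whose $atend condition is not satisfied.
        if "$atend" in flags and not at_end:
            continue
        return phonemes
    return None


entries = [
    ("to", "t@", []),            # unstressed version
    ("to", "tu:", ["$atend"]),   # stressed version
]
print(lookup(entries, "to", at_end=True))
print(lookup(entries, "to", at_end=False))
```

Because the list is walked in reverse, the conditional entry is tried first and the unconditional one acts as the fallback.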
#### 4.4.4 Translating a Word to another Word {.western} | |||
### Translating a Word to another Word | |||
Rather than specifying the pronunciation of a word by a phoneme string, | |||
you can specify another "sounds like" word. | |||
Use the attribute **\$text** eg. | |||
~~~~ {.western style="margin-bottom: 0.5cm"} | |||
``` | |||
cough coff $text | |||
~~~~ | |||
``` | |||
Alternatively, use the command **\$textmode** on a line by itself to | |||
turn this on for all subsequent entries in the file, until it's turned | |||
off by **\$phonememode**. eg. | |||
~~~~ {.western} | |||
``` | |||
$textmode | |||
cough coff | |||
through threw | |||
$phonememode | |||
~~~~ | |||
``` | |||
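The mode switching described above can be sketched as a small reader. This is a hedged illustration, not eSpeak NG's parser: `read_entries` is a hypothetical name, and it assumes every entry line has a word followed by one value, with `//` starting a comment.

```python
def read_entries(lines):
    """Classify each entry as a phoneme string or a "sounds like" text.

    Illustrative only. Returns a list of (word, value, kind) tuples,
    where kind is "text" or "phonemes".
    """
    text_mode = False
    entries = []
    for raw in lines:
        line = raw.split("//", 1)[0].strip()
        if not line:
            continue
        if line == "$textmode":
            text_mode = True      # applies to all following entries
            continue
        if line == "$phonememode":
            text_mode = False     # back to phoneme strings
            continue
        tokens = line.split()
        word = tokens[0]
        value = tokens[1] if len(tokens) > 1 else None
        flags = tokens[2:]
        kind = "text" if (text_mode or "$text" in flags) else "phonemes"
        entries.append((word, value, kind))
    return entries


sample = ["book bUk", "$textmode", "cough coff", "through threw",
          "$phonememode", "to t@", "cough coff $text"]
for entry in read_entries(sample):
    print(entry)
```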
This feature cannot be used for the special entries in the **\_list** | |||
files which start with an underscore, such as numbers. | |||
@@ -554,7 +615,7 @@ Currently "textmode" entries are only recognized for complete words, and | |||
not for stems from which a prefix or suffix has been removed (eg.
the word "coughs" would not match the example above). | |||
### 4.5 Conditional Rules {.western} | |||
## Conditional Rules | |||
Rules in a **\_rules** file and entries in a **\_list** file can be made | |||
conditional. They apply only to some voices. This can be useful to | |||
@@ -569,14 +630,14 @@ line in the [voice file](voices.html). | |||
If the rule starts with **?!** then the rule only applies if the | |||
condition number is **not** specified in the voice file. eg. | |||
~~~~ {.western} | |||
``` | |||
?3 can't kant // only use this if the voice has: dictrules 3 | |||
?!3 rather rA:D3 // only use if the voice doesn't have: dictrules 3 | |||
~~~~ | |||
``` | |||
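The `?` and `?!` prefixes can be read as a simple membership test against the condition numbers set in the voice file. A minimal sketch, assuming a hypothetical helper `condition_applies` and a set of integers standing in for the voice's `dictrules` values:

```python
def condition_applies(prefix, dictrules):
    """Decide whether a conditional rule or entry is active.

    prefix:    the condition as written, e.g. "?3" or "?!3".
    dictrules: set of condition numbers given in the voice file.
    Illustrative only; not eSpeak NG's implementation.
    """
    negated = prefix.startswith("?!")
    number = int(prefix[2:] if negated else prefix[1:])
    present = number in dictrules
    # "?!N" applies only when N is NOT specified by the voice.
    return (not present) if negated else present


voice_with_3 = {3}
voice_without = set()
print(condition_applies("?3", voice_with_3), condition_applies("?!3", voice_with_3))
print(condition_applies("?3", voice_without), condition_applies("?!3", voice_without))
```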
### 4.6 Numbers and Character Names {.western} | |||
## Numbers and Character Names | |||
#### 4.6.1 Letter names {.western} | |||
### Letter names | |||
The names of individual letters can be given either in the **\_rules** | |||
or **\_list** file. Sometimes an individual letter is also used as a | |||
@@ -585,14 +646,14 @@ letter name. If so, it should be listed in the **\_list** file, preceded | |||
by an underscore, to give the letter name (as distinct from its | |||
pronunciation as a word). eg. in English: | |||
~~~~ {.western style="margin-bottom: 0.5cm"} | |||
``` | |||
_a eI | |||
~~~~ | |||
``` | |||
#### 4.6.2 Numbers {.western} | |||
### Numbers | |||
The operation of the TranslateNumber() function is controlled by the
language's `langopts.numbers`{.western} option. This constructs spoken | |||
language's `langopts.numbers` option. This constructs spoken | |||
numbers from fragments according to various options which can be set for | |||
each language. The number fragments are given in the **\_list** file. | |||
@@ -636,7 +697,7 @@ each language. The number fragments are given in the **\_list** file. | |||
| | point. | | |||
+--------------------------------------+--------------------------------------+ | |||
### 4.7 Character Substitution {.western} | |||
## Character Substitution | |||
Character substitutions can be specified by using a **.replace** section
at the start of the **\_rules** file. Each line specifies either one or
@@ -645,11 +706,11 @@ alphabetic characters. This substitution is done to a word before it is | |||
translated using the spelling-to-phoneme rules. Only the lower-case | |||
version of the characters needs to be specified. eg. | |||
``` | |||
.replace\ | |||
ô ő // (Hungarian) allow the use of o-circumflex instead of | |||
o-double-accute\ | |||
ô ő // (Hungarian) allow the use of o-circumflex instead of o-double-acute
û ű | |||
cx ĉ // (Esperanto) allow "cx" as an alternative to c-circumflex | |||
ﬁ fi // replace a single character ligature by two characters
``` | |||
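The substitution step can be sketched as a table lookup applied to a word before the spelling-to-phoneme rules run. This is an illustration under assumptions: `apply_replacements` is a hypothetical name, the word is lower-cased first because only lower-case forms are listed, and longer source strings are tried first so that `cx` wins over a single `c`.

```python
def apply_replacements(word, table):
    """Apply .replace-style substitutions to a word.

    table maps a source string (one or two characters here) to its
    replacement. Illustrative only; not eSpeak NG's implementation.
    """
    lowered = word.lower()
    # Try longer source strings first.
    keys = sorted(table, key=len, reverse=True)
    out = []
    i = 0
    while i < len(lowered):
        for key in keys:
            if lowered.startswith(key, i):
                out.append(table[key])
                i += len(key)
                break
        else:
            out.append(lowered[i])  # no substitution at this position
            i += 1
    return "".join(out)


table = {
    "\u00f4": "\u0151",  # o-circumflex -> o-double-acute (Hungarian)
    "\u00fb": "\u0171",  # u-circumflex -> u-double-acute
    "cx": "\u0109",      # "cx" -> c-circumflex (Esperanto)
    "\ufb01": "fi",      # single-character ligature -> two characters
}
print(apply_replacements("cxu", table))
```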
@@ -1,67 +0,0 @@ | |||
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> | |||
<html> | |||
<head> | |||
<title>eSpeak Speech Synthesizer</title> | |||
<meta name="GENERATOR" content="Quanta Plus"> | |||
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"> | |||
</head> | |||
<body> | |||
<table border="1" cellpadding="10" background="images/sand-light.jpg" width="100%"> | |||
<tbody> | |||
<tr> | |||
<td width="15%"> | |||
<a href="http://sourceforge.net"><img src="http://sflogo.sourceforge.net/sflogo.php?group_id=159649&type=2" width="125" height="37" border="0" alt="SourceForge.net Logo" /></a> | |||
</td> | |||
<td> | |||
<div align="center"><h1>eSpeak - Documents</h1></div> | |||
</td> | |||
</tr> | |||
<tr> | |||
<td valign="top"> | |||
<font size="+1"><strong> | |||
<A href="index.html">Home</A> | |||
<p> | |||
<A href="commands.html">Usage</A> | |||
<p> | |||
<A href="languages.html">Languages</A> | |||
</strong></font> | |||
</td> | |||
<td> | |||
<h3><A href="voices.html">Voice Files</A></h3> | |||
Voice files specify a language and other characteristics of a voice. | |||
<h3><A href="mbrola.html">Mbrola Voices</A></h3> | |||
eSpeak can be used as a front-end for Mbrola diphone voices. | |||
<h3><A href="dictionary.html">Pronunciation Dictionary</A></h3> | |||
<ul> | |||
<li>How to add pronunciation corrections. | |||
<li>How to build up pronunciation rules for a new language. | |||
</ul><p> | |||
<h3><A href="add_language.html">Adding a Language</A></h3> | |||
How to add or improve a language. | |||
<h3><A href="phonemes.html">Phonemes</A></h3> | |||
The list of phoneme mnemonics for English, for use in the Pronunciation Dictionary. | |||
<h3><A href="phontab.html">Phoneme Tables</A></h3> | |||
The tables of the phonemes used by each language, with their properties and sound production. | |||
<h3><A href="intonation.html">Intonation</A></h3> | |||
Different intonation "tunes" may be defined for different languages for clauses which end in full-stop, comma, question-mark, and exclamation-mark. | |||
<h3><A href="speak_lib.h">eSpeak Library API</A></h3> | |||
API definition and header file for a shared library version of eSpeak. | |||
<h3><A href="ssml.html">Markup tags</A></h3> | |||
SSML (Speech Synthesis Markup Language) and HTML tags recognized by eSpeak. | |||
<h3><A href="editor.html">The espeakedit program</A></h3> | |||
GUI software to edit vowel files and to compile the phoneme data for use by eSpeak.<br> | |||
<ul> | |||
<li><a href="editor_if.html">espeakedit program GUI details</a> | |||
<li><a href="analyse.html">Analysing sound recordings</a> | |||
<li><a href="makephonemes.html">Adjusting phoneme data</a> (to be written) | |||
</ul> | |||
</td> | |||
</tr> | |||
</tbody> | |||
</table> | |||
</body> | |||
</html> |
@@ -1,75 +0,0 @@ | |||
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> | |||
<html> | |||
<head> | |||
<title>espeakedit</title> | |||
<meta name="GENERATOR" content="Quanta Plus"> | |||
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"> | |||
</head> | |||
<body> | |||
<A href="docindex.html">Back</A> | |||
<hr> | |||
<h2>ESPEAKEDIT PROGRAM</h2> | |||
<hr> | |||
The <strong>espeakedit</strong> program is used to prepare phoneme data for the eSpeak speech synthesizer.<p> | |||
It has two main functions: | |||
<ul> | |||
<li>Prepare keyframe files for individual vowels and voiced consonants. These each contain a sequence of keyframes which define how formant peaks (peaks in the frequency spectrum) vary during the sound.<p> | |||
<li>Process the master <strong>phonemes</strong> file which, by including the phoneme files for the various languages, defines all their phonemes and references the keyframe files and the sound sample files which they use. <strong>espeakedit</strong> processes these and compiles them into the <strong>phondata</strong>, <strong>phonindex</strong>, and <strong>phontab</strong> files in the <strong>espeak-data</strong> directory which are used by the eSpeak speech synthesizer. | |||
</ul> | |||
<hr> | |||
<h3>Installation</h3> | |||
<strong>espeakedit</strong> needs the following packages:<br> | |||
(The package names mentioned here are those from the Ubuntu "Dapper" Linux distribution). | |||
<ul> | |||
<li><strong>sox</strong> (a universal sound sample translator) | |||
<li><strong>libwxgtk2.6-0</strong> (wxWidgets Cross-platform C++ GUI toolkit) | |||
<li><strong>portaudio0</strong> (Portaudio V18, portable audio I/O) | |||
</ul> | |||
In addition, a modified version of <strong>praat</strong> (<a href="www.praat.org">www.praat.org</a>) is used to view and analyse WAV sound files. | |||
This needs the package <strong>libmotif3</strong> to run and <strong>libmotif-dev</strong> to compile. | |||
<hr> | |||
<h3>Quick Guide</h3> | |||
This will quickly illustrate the main features. Details of the interface and key commands are given in <a href="editor_if.html">editor_if.html</a><p> | |||
For more detailed information on analysing sound recordings and preparing phoneme definitions and keyframe data see <a href="analyse.html">analyse.html</a> (to be written). | |||
<h4>Compiling Phoneme Data</h4> | |||
<ol> | |||
<li>Run the <strong>espeakedit</strong> program.<p> | |||
<li>Select <b>Data->Compile phoneme data</b> from the menu bar. Dialog boxes will ask you to locate the directory (<b>phsource</b>) which contains the master phonemes file, and the directory (<b>dictsource</b>) which contains the dictionary files (en_rules, en_list, etc). Once specified, espeakedit will remember their locations, although they can be changed later from <b>Options->Paths</b>.<p>
<li>A message in the status line at the bottom of the espeakedit window will indicate whether there are any errors in the phoneme data, and how many languages' dictionary files have been compiled. The compiled data is placed into the <b>espeak-data</b> directory, ready for use by the speak program. If errors are found in the phoneme data, they are listed in a file <b>error_log</b> in the <b>phsource</b> directory.</li>
<p> | |||
NOTE: espeakedit can be used from the command line to compile the phoneme data, with the command: <b> espeakedit --compile</b> | |||
<li>Select <b>Tools->Make vowels chart->From compiled phoneme data</b>. This will look for the vowels in the compiled phoneme data of each language and produce a vowel chart (.png file) in <b>phsource/vowelcharts</b>. These charts plot the vowels' F1 (formant 1) frequency against their F2 frequency, which corresponds approximately to their open/close and front/back positions. The colour in the circle for each vowel indicates its F3 frequency, red indicates a low F3, through yellow and green to blue and violet for a high F3. In the case of a diphthong, a line is drawn from the circle to the position of the end of the vowel. | |||
</ol> | |||
<h4>Keyframe Sequences</h4> | |||
<ol> | |||
<li>Select <b>File->Open</b> from the menu bar and select a vowel file, <b>phsource/vowel/a</b>. This will open a tab in the espeakedit window which contains a sequence of 4 keyframes. Each keyframe shows a black graph, which is the outline of an original analysed spectrum from a sound recording, and also a green line, which shows the formant peaks which have been added (using the black graph as a guide) and which produce the sound.<p> | |||
<li>Click in the "a" tab window and then press the <b>F2</b> key. This will produce and play the sound of the keyframe sequence. The first time you do this, you'll get a save dialog asking where you want the WAV file to be saved. Once you give a location all future sounds will be stored in that same location, although it can be changed from <b>Options->Paths</b>.<p> | |||
<li>Click on the second of the four frames, the one with the red square. Press <b>F1</b>. That plays the sound of just that frame.<p> | |||
<li>Press the <b>1</b> (number one) key. That selects formant F1 and a red triangle appears under the F1 formant peak to indicate that it's selected. Also an = sign appears next to formant 1 in the formants list in the left panel of the window.<p> | |||
<li>Press the left-arrow key a couple of times to move the F1 peak to the left. The red triangle and its associated green formant peak move to a lower frequency. Its numeric value in the formants list in the left panel decreases.<p>
<li>Press the <b>F1</b> key again. The frame will give a slightly different vowel sound. As you move the F1 peak slightly up and down and then press <b>F1</b> again, the sound changes. Similarly if you press the <b>2</b> key to select the F2 formant, then moving that will also change the sound. If you move the F1 peak down to about 700 Hz (and reduce its height a bit with the down-arrow key) and move F2 up to 1400 Hz, then you'll hear an "er" schwa [@] sound instead of the original [a].<p>
<li>Select <b>File->Open</b> and choose <b>phsource/vowel/aI</b>. This opens a new tab labelled "aI" which contains more frames. This is the [aI] diphthong and if you click in the tab window and press <b>F2</b> you'll hear the English word "eye". If you click on each frame in turn and press <b>F1</b> then you can hear each of the keyframes in turn. They sound different, starting with an [A] sound (as in "palm"), going through something like [@] in "her" and ending with something like [I] in "kit" (or perhaps a French é). Together they make the diphthong [aI]. | |||
</ol> | |||
<h4>Text and Prosody Windows</h4> | |||
<ol> | |||
<li>Click on the <b>Text</b> tab in the left panel. Two text windows appear in the panel with buttons <b>Translate</b> and <b>Speak</b> below them.<p> | |||
<li>Type some text into the top window and click the <b>Translate</b> button. The phonetic translation will appear in the lower window.<p> | |||
<li>Click the <b>Speak</b> button. The text will be spoken and a <b>Prosody</b> tab will open in the main window.<p> | |||
<li>Click on a vowel phoneme which is displayed in the Prosody tab. A red line appears under it to indicate that it has been selected.<p> | |||
<li>Use the <b>up-arrow</b> or <b>down-arrow</b> key to move the vowel's blue pitch contour up or down. Then click the <b>Speak</b> button again to hear the effect of the altered pitch. If the adjacent phoneme also has a pitch contour then you may hear a discontinuity in the sound if it no longer matches with the one which you have moved.<p> | |||
<li>Hold down the <b>Ctrl</b> key while using the <b>up-arrow</b> or <b>down-arrow</b> keys. The gradient of the pitch contour will change.<p> | |||
<li>Click with the right mouse button over a phoneme. A menu allows you to select a different pitch envelope shape. Details of the currently selected phoneme appear in the Status line at the bottom of the window. The <b>Stress</b> number gives the stress level of the phoneme (see voices.html for a list).<p> | |||
<li>Click the <b>Translate</b> button. This re-translates the text and restores the original pitches.<p> | |||
<li>Click on a vowel phoneme in the Prosody window and use the <b><</b> and <b>></b> keys to shorten or lengthen it.<p> | |||
</ol> | |||
The Prosody window can be used to experiment with different phoneme lengths and different intonation.<p> | |||
<hr> | |||
</body> | |||
</html> | |||
@@ -1,46 +1,72 @@ | |||
ESPEAKEDIT PROGRAM {.western} | |||
------------------ | |||
# Table of contents | |||
The **espeakedit** program is used to prepare phoneme data for the | |||
eSpeak speech synthesizer. | |||
* [Espeakedit program](#espeakedit-program) | |||
* [Installation](#installation) | |||
* [Quick Guide](#quick-guide) | |||
* [Compiling Phoneme Data](#compiling-phoneme-data) | |||
* [Keyframe Sequences](#keyframe-sequences) | |||
* [Text and Prosody Windows](#text-and-prosody-windows) | |||
# Espeakedit program | |||
The **espeakedit** program is used to prepare phoneme data for the eSpeak speech synthesizer. | |||
It has two main functions: | |||
- - | |||
* Prepare keyframe files for individual vowels and voiced consonants. These each contain a sequence of keyframes which define how formant peaks (peaks in the frequency spectrum) vary during the sound. | |||
* Process the master **phonemes** file which, by including the phoneme files for the various languages, defines all their phonemes and references the keyframe files and the sound sample files which they use. **espeakedit** processes these and compiles them into the **phondata**, **phonindex**, and **phontab** files in the **espeak-data** directory which are used by the eSpeak speech synthesizer. | |||
## Installation | |||
**espeakedit** needs the following packages: | |||
(The package names mentioned here are those from the Ubuntu "Dapper" Linux distribution). | |||
* **sox** (a universal sound sample translator) | |||
* **libwxgtk2.6-0** (wxWidgets Cross-platform C++ GUI toolkit) | |||
* **portaudio0** (Portaudio V18, portable audio I/O) | |||
In addition, a modified version of **praat** ([www.praat.org](http://www.praat.org/)) is used to view and analyse WAV sound files. This needs the package **libmotif3** to run and **libmotif-dev** to compile. | |||
### Installation {.western} | |||
## Quick Guide | |||
**espeakedit** needs the following packages:\ | |||
(The package names mentioned here are those from the Ubuntu "Dapper" | |||
Linux distribution). | |||
This will quickly illustrate the main features. Details of the interface and key commands are given in [editor_if](editor_if.md) | |||
- - - | |||
For more detailed information on analysing sound recordings and preparing phoneme definitions and keyframe data see [analyse](analyse.md). | |||
In addition, a modified version of **praat** | |||
([www.praat.org](www.praat.org)) is used to view and analyse WAV sound | |||
files. This needs the package **libmotif3** to run and **libmotif-dev** | |||
to compile. | |||
### Compiling Phoneme Data | |||
### Quick Guide {.western} | |||
1. Run the `espeakedit` program. | |||
2. Select **Data->Compile phoneme data** from the menu bar. Dialog boxes will ask you to locate the directory (`phsource`) which contains the master phonemes file, and the directory (`dictsource`) which contains the dictionary files (en_rules, en_list, etc). Once specified, espeakedit will remember their locations, although they can be changed later from **Options->Paths**.
3. A message in the status line at the bottom of the espeakedit window will indicate whether there are any errors in the phoneme data, and how many languages' dictionary files have been compiled. The compiled data is placed into the `espeak-data` directory, ready for use by the speak program. If errors are found in the phoneme data, they are listed in a file `error_log` in the `phsource` directory.
This will quickly illustrate the main features. Details of the interface | |||
and key commands are given in [editor\_if.html](editor_if.html) | |||
NOTE: espeakedit can be used from the command line to compile the phoneme data, with the command: | |||
For more detailed information on analysing sound recordings and | |||
preparing phoneme definitions and keyframe data see | |||
[analyse.html](analyse.html) (to be written). | |||
`espeakedit --compile` | |||
#### Compiling Phoneme Data {.western} | |||
4. Select **Tools->Make vowels chart->From compiled phoneme data**. This will look for the vowels in the compiled phoneme data of each language and produce a vowel chart (.png file) in `phsource/vowelcharts`. These charts plot the vowels' F1 (formant 1) frequency against their F2 frequency, which corresponds approximately to their open/close and front/back positions. The colour in the circle for each vowel indicates its F3 frequency: red indicates a low F3, through yellow and green to blue and violet for a high F3. In the case of a diphthong, a line is drawn from the circle to the position of the end of the vowel.
1. 2. 3. 4. | |||
### Keyframe Sequences | |||
#### Keyframe Sequences {.western} | |||
1. Select **File->Open** from the menu bar and select a vowel file, `phsource/vowel/a`. This will open a tab in the espeakedit window which contains a sequence of 4 keyframes. Each keyframe shows a black graph, which is the outline of an original analysed spectrum from a sound recording, and also a green line, which shows the formant peaks which have been added (using the black graph as a guide) and which produce the sound. | |||
2. Click in the "a" tab window and then press the **F2** key. This will produce and play the sound of the keyframe sequence. The first time you do this, you'll get a save dialog asking where you want the WAV file to be saved. Once you give a location all future sounds will be stored in that same location, although it can be changed from **Options->Paths**. | |||
3. Click on the second of the four frames, the one with the red square. Press **F1**. That plays the sound of just that frame. | |||
4. Press the **1** (number one) key. That selects formant F1 and a red triangle appears under the F1 formant peak to indicate that it's selected. Also an = sign appears next to formant 1 in the formants list in the left panel of the window. | |||
5. Press the left-arrow key a couple of times to move the F1 peak to the left. The red triangle and its associated green formant peak move to a lower frequency. Its numeric value in the formants list in the left panel decreases.
6. Press the **F1** key again. The frame will give a slightly different vowel sound. As you move the F1 peak slightly up and down and then press **F1** again, the sound changes. Similarly if you press the **2** key to select the F2 formant, then moving that will also change the sound. If you move the F1 peak down to about 700 Hz (and reduce its height a bit with the down-arrow key) and move F2 up to 1400 Hz, then you'll hear an "er" schwa [@] sound instead of the original [a].
7. Select **File->Open** and choose `phsource/vowel/aI`. This opens a new tab labelled "aI" which contains more frames. This is the [aI] diphthong and if you click in the tab window and press **F2** you'll hear the English word "eye". If you click on each frame in turn and press **F1** then you can hear each of the keyframes in turn. They sound different, starting with an [A] sound (as in "palm"), going through something like [@] in "her" and ending with something like [I] in "kit" (or perhaps a French é). Together they make the diphthong [aI]. | |||
1. 2. 3. 4. 5. 6. 7. | |||
### Text and Prosody Windows | |||
#### Text and Prosody Windows {.western} | |||
1. Click on the **Text** tab in the left panel. Two text windows appear in the panel with buttons **Translate** and **Speak** below them. | |||
2. Type some text into the top window and click the **Translate** button. The phonetic translation will appear in the lower window. | |||
3. Click the **Speak** button. The text will be spoken and a **Prosody** tab will open in the main window. | |||
4. Click on a vowel phoneme which is displayed in the Prosody tab. A red line appears under it to indicate that it has been selected. | |||
5. Use the **up-arrow** or **down-arrow** key to move the vowel's blue pitch contour up or down. Then click the **Speak** button again to hear the effect of the altered pitch. If the adjacent phoneme also has a pitch contour then you may hear a discontinuity in the sound if it no longer matches with the one which you have moved. | |||
6. Hold down the **Ctrl** key while using the **up-arrow** or **down-arrow** keys. The gradient of the pitch contour will change. | |||
7. Click with the right mouse button over a phoneme. A menu allows you to select a different pitch envelope shape. Details of the currently selected phoneme appear in the Status line at the bottom of the window. The **Stress** number gives the stress level of the phoneme (see voices.html for a list). | |||
8. Click the **Translate** button. This re-translates the text and restores the original pitches. | |||
9. Click on a vowel phoneme in the Prosody window and use the **<** and **>** keys to shorten or lengthen it. | |||
1. 2. 3. 4. 5. 6. 7. 8. 9. | |||
The Prosody window can be used to experiment with different phoneme lengths and different intonation. | |||
The Prosody window can be used to experiment with different phoneme | |||
lengths and different intonation. |
@@ -1,143 +0,0 @@ | |||
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> | |||
<html> | |||
<head> | |||
<title>Editor - Spectrum</title> | |||
<meta name="GENERATOR" content="Quanta Plus"> | |||
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"> | |||
</head> | |||
<body> | |||
<A href="docindex.html">Back</A> | |||
<hr> | |||
<h2>USER INTERFACE - FORMANT EDITOR</h2> | |||
<hr> | |||
<h3>Frame Sequence Display</h3> | |||
The eSpeak editor can display a number of frame sequences in tabbed windows. Each frame can contain a short-time frequency spectrum, covering the period of one cycle at the sound's pitch. Frames can also show:
<ul> | |||
<LI>Blue vertical lines showing the estimated position of the f1 to f5 formants (if the sequence was produced by praat analysis). These should correspond with the peaks in the spectrum, but may not do so exactly<p> | |||
<li>Numbers at the right side of the frame showing the position from the start of the sequence in mS, and the pitch of the sound.<p> | |||
<li>Up to 9 formant peaks (numbered 0 to 8) added by the user, usually to match the peaks in the spectrum, in order to produce the required sound. These are shown in green, can be moved by keyboard presses as described below, and may merge if they are close together. If a frame has formant peaks then it is a Keyframe and is shown with a pale yellow background.<p>
<li>If formant peaks are present, a relative amplitude (r.m.s.) value is shown at the right side of the frame. | |||
<li> | |||
</ul> | |||
<h3>Text Tab</h3> | |||
Enter text in the top left text window. Click the <b>Translate</b> button to see the phonetic transcription in the text window below. Then click the <b>Speak</b> button to speak the text and show the results in the <b>Prosody</b> tab, if that is open. | |||
<p> | |||
If changes are made in the <b>Prosody</b> tab, then clicking <b>Speak</b> will speak the modified prosody while <b>Translate</b> will revert to the default prosody settings for the text. | |||
<p> | |||
To enter phonetic symbols (Kirschenbaum encoding) in the top left text window, enclose them within [[ ]]. | |||
<h3>Spect Tab</h3> | |||
The "Spect" tab in the left panel of the eSpeak editor shows information about the currently selected frame and sequence. | |||
<ul> | |||
<li>The <strong>Formants</strong> section displays the Frequency, Height, and Width of each formant peak (peaks 0 to 8). Peaks 6, 7, 8 don't have a variable width.<p> | |||
<li><strong>% amp - Frame</strong> can be used to adjust the amplitude of the frame. If you change this value then the rms amplitude value at the right side of the frame will change. The formant peaks don't change, just the overall amplitude of the frame.<p>
<li><strong>mS</strong> shows the time in mS until the next keyframe (or end of sequence if there is none). The spin control initially shows the same value, but this can be changed in order to increase or decrease the effective length of a keyframe.<p>
<li><strong>% amp - Sequence</strong> adjusts the amplitude of the whole sequence. Changing this value changes the rms amplitudes of all the keyframes in the sequence.<p>
<li><strong>% mS - Sequence</strong> shows the total length of the sequence.<p>
<li><strong>Graph</strong><br> | |||
Yellow vertical lines show the position of keyframes within the sequence.<br> | |||
Black bars on these show the frequencies of formant peaks which have been set at these keyframes.<br> | |||
Thick red lines, if present, show the formants, as detected in the original analysis.<br> | |||
Thin black line, if present, shows the pitch profile measured in the original analysis. | |||
</ul> | |||
</li> | |||
</ul> | |||
<h3>Key Commands</h3> | |||
<ul> | |||
<li><strong>Selection</strong>.<p> | |||
The selected frame(s) are shown with a red border. The selected formant peak is also indicated by an equals ("=") sign next to its number in the "Spect" panel to the right of the window.<p> | |||
The selected formant peak is shown with a red triangle under the peak.<p> | |||
Keyframes are shown with a pale yellow background. A keyframe is any frame with any formant peaks which are not zero height. If all formant peaks become zero height, the frame is no longer a keyframe. If you increase a peak's height the frame becomes a keyframe. | |||
<dl> | |||
<dt><strong>Numbers 0 to 8</strong> | |||
<dd>Select formant peak number 0 to 8. | |||
<dt><strong>Page Up/Down</strong> | |||
<dd>Move to next/previous frame | |||
</dl> | |||
<li><strong>Formant movement</strong>. With the following keys, holding down <b>Shift</b> causes slower movement. | |||
<dl> | |||
<dt>Left
<dd>Moves the selected formant peak to lower frequency.
<dt>Right
<dd>Moves the selected formant peak to higher frequency.
<dt>Up | |||
<dd>Increases height of the selected formant peak. | |||
<dt>Down | |||
<dd>Decreases height of the selected formant peak. | |||
<dt><strong>&lt;</strong> | |||
<dd>Narrows the selected formant peak. | |||
<dt><strong>&gt;</strong> | |||
<dd>Widens the selected formant peak. | |||
<dt><strong>CTRL &lt;</strong> | |||
<dd>Narrows the selected formant peak. | |||
<dt><strong>CTRL &gt;</strong> | |||
<dd>Widens the selected formant peak. | |||
<dt><b>/</b> | |||
<dd>Makes the selected formant peak symmetrical. | |||
</dl> | |||
<li><strong>Frame Cut and Paste</strong> | |||
<dl> | |||
<dt><b>CTRL A</b> | |||
<dd>Select all frames in the sequence. | |||
<dt><b>CTRL C</b> | |||
<dd>Copy selected frames to (internal) clipboard. | |||
<dt><b>CTRL V</b> | |||
<dd>Paste frames from the clipboard to overwrite the contents of the selected frame and the frames which follow it. Only the formant peaks information is pasted. | |||
<dt><b>CTRL SHIFT V</b> | |||
<dd>Paste frames from the clipboard to insert them above the selected frame. | |||
<dt><b>CTRL X</b> | |||
<dd>Delete the selected frames. | |||
</dl> | |||
<li><strong>Frame editing</strong> | |||
<dl> | |||
<dt><b>CTRL D</b> | |||
<dd>Copy the formant peaks down to the selected frame from the next keyframe above. | |||
<dt><b>CTRL SHIFT D</b> | |||
<dd>Copy the formant peaks up to the selected frame from the next key-frame below. | |||
<dt><b>CTRL Z</b> | |||
<dd>Set all formant peaks in the selected frame to zero height. It is no longer a key-frame. | |||
<dt><b>CTRL I</b> | |||
<dd>Set the formant peaks in the selected frame as an interpolation between the next keyframes above and below it. A dialog box allows you to enter a percentage. 50% gives values half-way between the two adjacent key-frames, 0% gives values equal to the one above, and 100% equal to the one below. | |||
</dl> | |||
<li><strong>Display and Sound</strong> | |||
<dl> | |||
<dt><b>CTRL Q</b> | |||
<dd>Shows interpolated formant peaks on non-keyframes. These frames don't become keyframes until any of the peaks are edited to increase their height. | |||
<dt><b>CTRL SHIFT Q</b> | |||
<dd>Removes the interpolated formant peaks display. | |||
<dt><b>CTRL G</b> | |||
<dd>Toggle grid on and off. | |||
<dt><b>F1</b> | |||
<dd>Play sound made from the one selected keyframe. | |||
<dt><b>F2</b> | |||
<dd>Play sound made from all the keyframes in the sequence. | |||
</ul> | |||
<p> | |||
<hr> | |||
<h2>USER INTERFACE - PROSODY EDITOR</h2> | |||
<hr> | |||
<ul><LI> | |||
<dl> | |||
<dt><b>Left</b> | |||
<dd>Move to previous phoneme. | |||
<dt><b>Right</b> | |||
<dd>Move to next phoneme. | |||
<dt><b>Up</b> | |||
<dd>Increase pitch. | |||
<dt><b>Down</b> | |||
<dd>Decrease pitch. | |||
<dt><b>Ctrl Up</b> | |||
<dd>Increase pitch range. | |||
<dt><b>Ctrl Down</b> | |||
<dd>Decrease pitch range. | |||
<dt><b>&gt;</b> | |||
<dd>Increase length. | |||
<dt><b>&lt;</b> | |||
<dd>Decrease length. | |||
</dd> | |||
</dl> | |||
</LI> | |||
</ul> | |||
</body> | |||
</html> |
@@ -1,41 +1,180 @@ | |||
USER INTERFACE - FORMANT EDITOR {.western} | |||
------------------------------- | |||
# Table of contents | |||
### Frame Sequence Display {.western} | |||
* [User interface - formant editor](#user-interface---formant-editor) | |||
* [Frame Sequence Display](#frame-sequence-display) | |||
* [Text Tab](#text-tab) | |||
* [Spect Tab](#spect-tab) | |||
* [Key Commands](#key-commands) | |||
* [Selection](#selection) | |||
* [Formant movement](#formant-movement) | |||
* [Frame Cut and Paste](#frame-cut-and-paste) | |||
* [Frame editing](#frame-editing) | |||
* [Display and Sound](#display-and-sound) | |||
* [User interface - prosody editor](#user-interface---prosody-editor) | |||
The eSpeak editor can display a number of frame sequences in tabbed | |||
windows. Each frame can contain a short-time frequency spectrum, | |||
covering the period of one cycle at the sound's pitch. Frames can also | |||
show: | |||
# User interface - formant editor | |||
- - - - - | |||
## Frame Sequence Display | |||
### Text Tab {.western} | |||
The eSpeak editor can display a number of frame sequences in tabbed windows. Each frame can contain a short-time frequency spectrum, covering the period of one cycle at the sound's pitch. Frames can also show: | |||
Enter text in the top left text window. Click the **Translate** button | |||
to see the phonetic transcription in the text window below. Then click | |||
the **Speak** button to speak the text and show the results in the | |||
**Prosody** tab, if that is open. | |||
* Blue vertical lines showing the estimated position of the f1 to f5 formants (if the sequence was produced by praat analysis). These should correspond with the peaks in the spectrum, but may not do so exactly. | |||
* Numbers at the right side of the frame showing the position from the start of the sequence in milliseconds, and the pitch of the sound. | |||
* Up to 9 formant peaks (numbered 0 to 8) added by the user, usually to match the peaks in the spectrum, in order to produce the required sound. These are shown in green, can be moved by keyboard presses as described below, and may merge if they are close together. If a frame has formant peaks then it is a keyframe and is shown with a pale yellow background. | |||
* If formant peaks are present, a relative amplitude (r.m.s.) value is shown at the right side of the frame. | |||
If changes are made in the **Prosody** tab, then clicking **Speak** will | |||
speak the modified prosody while **Translate** will revert to the | |||
default prosody settings for the text. | |||
## Text Tab | |||
To enter phonetic symbols (Kirschenbaum encoding) in the top left text | |||
window, enclose them within [[ ]]. | |||
Enter text in the top left text window. Click the **Translate** button to see the phonetic transcription in the text window below. Then click the **Speak** button to speak the text and show the results in the **Prosody** tab, if that is open. | |||
### Spect Tab {.western} | |||
If changes are made in the **Prosody** tab, then clicking **Speak** will speak the modified prosody while **Translate** will revert to the default prosody settings for the text. | |||
The "Spect" tab in the left panel of the eSpeak editor shows information | |||
about the currently selected frame and sequence. | |||
To enter phonetic symbols in [Kirshenbaum](https://en.wikipedia.org/wiki/Kirshenbaum)-like encoding in the top left text window, enclose them within **[[ ]]**. | |||
- - - - - - | |||
## Spect Tab | |||
### Key Commands {.western} | |||
The **Spect** tab in the left panel of the eSpeak editor shows information about the currently selected frame and sequence. | |||
- - - - - | |||
* **Formants** | |||
section displays the Frequency, Height, and Width of each formant peak (peaks 0 to 8). Peaks 6, 7, 8 don't have a variable width. | |||
USER INTERFACE - PROSODY EDITOR {.western style="margin-left: 1cm"} | |||
------------------------------- | |||
* **% amp - Frame** | |||
can be used to adjust the amplitude of the frame. If you change this value then the rms amplitude value at the right side of the frame will change. | |||
The formant peaks don't change, just the overall amplitude of the frame. | |||
* **mS** | |||
shows the time in milliseconds until the next keyframe (or end of sequence if there is none). | |||
The spin control initially shows the same value, but this can be changed in order to increase or decrease the effective length of a keyframe. | |||
* **% amp - Sequence** | |||
adjusts the amplitude of the whole sequence. Changing this value changes the rms amplitudes of all the keyframes in the sequence. | |||
* **% mS - Sequence** | |||
shows the total length of the sequence. | |||
* **Graph** | |||
Yellow vertical lines show the position of keyframes within the sequence. | |||
Black bars on these show the frequencies of formant peaks which have been set at these keyframes. | |||
Thick red lines, if present, show the formants, as detected in the original analysis. | |||
Thin black line, if present, shows the pitch profile measured in the original analysis. | |||
## Key Commands | |||
### Selection | |||
The selected frame(s) are shown with a red border. The selected formant peak is also indicated by an equals (**=**) sign next to its number in the "Spect" panel to the right of the window. | |||
The selected formant peak is shown with a red triangle under the peak. | |||
Keyframes are shown with a pale yellow background. A keyframe is any frame with any formant peaks which are not zero height. If all formant peaks become zero height, the frame is no longer a keyframe. If you increase a peak's height the frame becomes a keyframe. | |||
* **Numbers 0 to 8** | |||
Select formant peak number 0 to 8. | |||
* **Page Up/Down** | |||
Move to next/previous frame. | |||
### Formant movement | |||
With the following keys, holding down **Shift** causes slower movement. | |||
* **Left** | |||
Moves the selected formant peak to higher frequency. | |||
* **Right** | |||
Moves the selected formant peak to lower frequency. | |||
* **Up** | |||
Increases height of the selected formant peak. | |||
* **Down** | |||
Decreases height of the selected formant peak. | |||
* **<** | |||
Narrows the selected formant peak. | |||
* **>** | |||
Widens the selected formant peak. | |||
* **CTRL <** | |||
Narrows the selected formant peak. | |||
* **CTRL >** | |||
Widens the selected formant peak. | |||
* **/** | |||
Makes the selected formant peak symmetrical. | |||
### Frame Cut and Paste | |||
* **CTRL A** | |||
Select all frames in the sequence. | |||
* **CTRL C** | |||
Copy selected frames to (internal) clipboard. | |||
* **CTRL V** | |||
Paste frames from the clipboard to overwrite the contents of the selected frame and the frames which follow it. Only the formant peaks information is pasted. | |||
* **CTRL SHIFT V** | |||
Paste frames from the clipboard to insert them above the selected frame. | |||
* **CTRL X** | |||
Delete the selected frames. | |||
### Frame editing | |||
* **CTRL D** | |||
Copy the formant peaks down to the selected frame from the next keyframe above. | |||
* **CTRL SHIFT D** | |||
Copy the formant peaks up to the selected frame from the next keyframe below. | |||
* **CTRL Z** | |||
Set all formant peaks in the selected frame to zero height. It is then no longer a keyframe. | |||
* **CTRL I** | |||
Set the formant peaks in the selected frame as an interpolation between the next keyframes above and below it. A dialog box allows you to enter a percentage. 50% gives values half-way between the two adjacent keyframes, 0% gives values equal to the one above, and 100% gives values equal to the one below. | |||
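The percentage used by **CTRL I** can be read as a linear blend between the two neighbouring keyframes. The Python sketch below illustrates that reading only. The `Peak` fields and the function names are hypothetical, and espeakedit's actual interpolation may differ in detail (for example in rounding).

```python
from dataclasses import dataclass


@dataclass
class Peak:
    """One formant peak; the field names are illustrative, not espeakedit's."""
    freq: float    # peak frequency
    height: float  # peak height (0 means the peak is unset)
    width: float   # peak width


def interpolate_peaks(above, below, percent):
    """Blend two keyframes' peaks: 0% equals `above`, 100% equals `below`."""
    t = percent / 100.0
    return [
        Peak(
            freq=a.freq + (b.freq - a.freq) * t,
            height=a.height + (b.height - a.height) * t,
            width=a.width + (b.width - a.width) * t,
        )
        for a, b in zip(above, below)
    ]


def is_keyframe(peaks):
    """A frame is a keyframe if any of its peaks has non-zero height."""
    return any(p.height != 0 for p in peaks)
```

With this model, a frame filled in by interpolation from two keyframes has non-zero peak heights, so it satisfies the keyframe definition given under Selection.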
### Display and Sound | |||
* **CTRL Q** | |||
Shows interpolated formant peaks on non-keyframes. These frames don't become keyframes until any of the peaks are edited to increase their height. | |||
* **CTRL SHIFT Q** | |||
Removes the interpolated formant peaks display. | |||
* **CTRL G** | |||
Toggle grid on and off. | |||
* **F1** | |||
Play sound made from the one selected keyframe. | |||
* **F2** | |||
Play sound made from all the keyframes in the sequence. | |||
# User interface - prosody editor | |||
* **Left** | |||
Move to previous phoneme. | |||
* **Right** | |||
Move to next phoneme. | |||
* **Up** | |||
Increase pitch. | |||
* **Down** | |||
Decrease pitch. | |||
* **Ctrl Up** | |||
Increase pitch range. | |||
* **Ctrl Down** | |||
Decrease pitch range. | |||
* **>** | |||
Increase length. | |||
* **<** | |||
Decrease length. | |||
- |
@@ -1,87 +0,0 @@ | |||
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> | |||
<html> | |||
<head> | |||
<title>eSpeak: Speech Synthesizer</title> | |||
</head> | |||
<body> | |||
<table border="1" cellpadding="10" background="images/sand-light.jpg"> | |||
<tbody> | |||
<tr> | |||
<td width="15%" valign="top"> | |||
<a href="http://sourceforge.net"><img src="http://sflogo.sourceforge.net/sflogo.php?group_id=159649&type=2" width="125" height="37" border="0" alt="SourceForge.net Logo" /></a> | |||
</td> | |||
<td> | |||
<div align="center"><IMG src="images/lips.png" width="193" height="172" border="0"> | |||
<h1>eSpeak text to speech</h1></div> | |||
<div align="center"> | |||
(email) jonsd at users dot sourceforge.net<br> | |||
<a href="http://espeak.sf.net/download.html"><strong>Download</strong></a> | |||
| |||
<a href="http://sourceforge.net/projects/espeak/"><strong>eSpeak Sourceforge page</strong></a> | |||
| |||
<a href="http://sourceforge.net/forum/?group_id=159649"><strong>Forum</strong></a> | |||
| |||
<a href="http://sourceforge.net/mail/?group_id=159649"><strong>Mailing list</strong></a> | |||
</div> | |||
</td> | |||
</tr> | |||
<tr> | |||
<td valign="top"> | |||
<font size="+1"><strong> | |||
<A href="commands.html">Usage</a> | |||
<p> | |||
<A href="languages.html">Languages</A> | |||
<p> | |||
<A href="docindex.html">Documents</A> | |||
<p> | |||
<A href="http://espeak.sf.net/samples.html">Samples</A> | |||
<p> | |||
<A href="http://espeak.sf.net/license.html">License</A> | |||
</strong></font> | |||
</td> | |||
<td> | |||
eSpeak is a compact open source software speech synthesizer for English and other languages, for Linux and Windows. | |||
<a href="http://espeak.sourceforge.net/"><strong>http://espeak.sourceforge.net</strong></a> | |||
<p> | |||
eSpeak uses a "formant synthesis" method. This allows many languages to be provided in a small size. The speech is clear, and can be used at high speeds, but is not as natural or smooth as larger synthesizers which are based on human speech recordings. | |||
<p> | |||
eSpeak is available as: | |||
<ul> | |||
<li>A command line program (Linux and Windows) to speak text from a file or from stdin. | |||
<li>A shared library version for use by other programs. (On Windows this is a DLL). | |||
<li>A SAPI5 version for Windows, so it can be used with screen-readers and other programs that support the Windows SAPI5 interface. | |||
<li>eSpeak has been ported to other platforms, including Solaris and Mac OSX. | |||
</ul> | |||
Features. | |||
<ul> | |||
<li>Includes different Voices, whose characteristics can be altered. | |||
<li>Can produce speech output as a WAV file. | |||
<li>SSML (Speech Synthesis Markup Language) is supported (not complete), and also HTML. | |||
<li>Compact size. The program and its data, including many languages, totals about 1.4 Mbytes. | |||
<li>Can be used as a front-end to MBROLA diphone voices, see <a href="mbrola.html">mbrola.html</a>. eSpeak converts text to phonemes with pitch and length information. | |||
<li>Can translate text into phoneme codes, so it could be adapted as a front end for another speech synthesis engine. | |||
<li>Potential for other languages. Several are included in varying stages of progress. Help from native speakers for these or other languages is welcome. | |||
<li>Development tools are available for producing and tuning phoneme data. | |||
<li>Written in C. | |||
</ul> | |||
<p> | |||
I regularly use eSpeak to listen to blogs and news sites. I prefer the sound through a domestic stereo system rather than small computer speakers, which can sound rather harsh. | |||
<hr> | |||
<strong>Languages</strong>. The eSpeak speech synthesizer supports several languages, however in many cases these are initial drafts and need more work to improve them. Assistance from native speakers is welcome for these, or other new languages. Please contact me if you want to help.<p> | |||
eSpeak does text to speech synthesis for the following languages, some better than others. Afrikaans, Albanian, Armenian, Cantonese, Catalan, Croatian, Czech, Danish, Dutch, English, Esperanto, Finnish, French, German, Greek, Hindi, Hungarian, Icelandic, Indonesian, Italian, Kurdish, Latvian, Lojban, Macedonian, Mandarin, Norwegian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Spanish, Swahili, Swedish, Tamil, Turkish, Vietnamese, Welsh. | |||
<hr> | |||
The latest <strong>development version</strong> is at: | |||
<a href="http://espeak.sf.net/test/latest.html">espeak.sf.net/test/latest.html</a>. | |||
<hr> | |||
<strong>espeakedit</strong> is a GUI program used to prepare and compile phoneme data. It is now available for download. Documentation is currently sparse, but if you want to use it to add or improve language support, let me know. | |||
<hr> | |||
History. Originally known as <strong>speak</strong> and originally written for Acorn/RISC_OS computers starting in 1995. This version is an enhancement and re-write, including a relaxation of the original memory and processing power constraints, and with support for additional languages. | |||
</td> | |||
</tr> | |||
</tbody> | |||
</table> | |||
</body> | |||
</html> |
@@ -1,3 +1,4 @@ | |||
<<<<<<< HEAD | |||
# eSpeak NG - Documentation | |||
====================== | |||
@@ -50,3 +51,77 @@ GUI software to edit vowel files and to compile the phoneme data for use | |||
by eSpeak NG. See also [Espeakedit user interface](editor_if.md). | |||
======= | |||
# eSpeak NG: Speech Synthesizer | |||
- [Features](#features) | |||
- [History](#history) | |||
- [Languages](languages.html) | |||
- [Adding a Language](add_language.html) | |||
- [Pronunciation Dictionary](dictionary.html) | |||
- [Voice Files](voices.html) | |||
- [MBROLA Voices](mbrola.html) | |||
- [Phonemes](phonemes.html) | |||
- [Phoneme Tables](phontab.html) | |||
- [Intonation](intonation.html) | |||
- [Markup Tags](ssml.html) | |||
- [License](../COPYING) | |||
---------- | |||
eSpeak NG is a compact open source software speech synthesizer for English and | |||
other languages, for Linux and Windows. | |||
eSpeak NG uses a "formant synthesis" method. This allows many languages to be | |||
provided in a small size. The speech is clear, and can be used at high speeds, | |||
but is not as natural or smooth as larger synthesizers which are based on human | |||
speech recordings. | |||
eSpeak is available as: | |||
* A command line program (Linux and Windows) to speak text from a file or | |||
from stdin. | |||
* A shared library version for use by other programs. (On Windows this is | |||
a DLL). | |||
* A SAPI5 version for Windows, so it can be used with screen-readers and | |||
other programs that support the Windows SAPI5 interface. | |||
* eSpeak has been ported to other platforms, including Solaris and Mac OSX. | |||
## Features | |||
* Includes different Voices, whose characteristics can be altered. | |||
* Can produce speech output as a WAV file. | |||
* SSML (Speech Synthesis Markup Language) is supported (not complete), | |||
and also HTML. | |||
* Compact size. The program and its data, including many languages, | |||
totals about 1.4 Mbytes. | |||
* Can be used as a front-end to [MBROLA diphone voices](mbrola.html). | |||
eSpeak NG converts text to phonemes with pitch and length information. | |||
* Can translate text into phoneme codes, so it could be adapted as a | |||
front end for another speech synthesis engine. | |||
* Potential for other languages. Several are included in varying stages | |||
of progress. Help from native speakers for these or other languages is | |||
welcome. | |||
* Written in C. | |||
The eSpeak speech synthesizer supports over 70 languages, however in many cases | |||
these are initial drafts and need more work to improve them. Assistance from | |||
native speakers is welcome for these, or other new languages. Please contact me | |||
if you want to help. | |||
## History | |||
The program was originally known as __speak__ and was written | |||
for Acorn/RISC\_OS computers starting in 1995 by Jonathan Duddington. This was | |||
enhanced and re-written in 2007 as __eSpeak__, including a relaxation of the | |||
original memory and processing power constraints, and with support for additional | |||
languages. | |||
In 2010, Reece H. Dunn started maintaining a version of eSpeak on GitHub that | |||
was designed to make it easier to build eSpeak on POSIX systems, porting the | |||
build system to autotools in 2012. In late 2015, this project was officially | |||
forked to a new eSpeak NG project. The new eSpeak NG project is a significant | |||
departure from the eSpeak project, with the intention of cleaning up the | |||
existing codebase, adding new features, and adding to and improving the | |||
supported languages. | |||
>>>>>>> upstream/master |
@@ -1,38 +1,52 @@ | |||
INTONATION {.western} | |||
---------- | |||
# Table of contents | |||
In eSpeak's standard intonation model, a "tune" is applied to each | |||
* [Intonation](#intonation) | |||
* [Clauses](#clauses) | |||
* [Tune definitions](#tune-definitions) | |||
# Intonation | |||
In eSpeak NG's standard intonation model, a "tune" is applied to each | |||
clause depending on its punctuation. Other intonation models may be used | |||
for some languages, such as tone languages. | |||
Named tunes are defined in the text file: | |||
`phsource/intonation`{.western}. This file must be compiled for use by | |||
eSpeak by using the espeakedit program, using the menu option: | |||
`Compile -> Compile intonation data`{.western}. | |||
`phsource/intonation`. This file must be compiled for use by | |||
eSpeak NG by using the espeakedit program, using the menu option: | |||
**Compile -> Compile intonation data**. | |||
### Clauses {.western} | |||
## Clauses | |||
The tunes which are used for a language can be specified by using a | |||
`tunes`{.western} statement in a voice file in | |||
`espeak-data/voices`{.western}. eg: | |||
`tunes` statement in a voice file in `espeak-data/voices`. eg: | |||
`tunes s1 c1 q1 e1`{.western} | |||
`tunes s1 c1 q1 e1` | |||
Its parameters are four tune names which are used for clauses which end | |||
in: | |||
1. 2. 3. 4. | |||
1. Full-stop. | |||
1. Comma. | |||
1. Question mark. | |||
1. Exclamation mark. | |||
A clause consists of the following parts: | |||
- - - - | |||
* **Pre-head.** | |||
These are any unstressed syllables before the first stressed syllable. | |||
* **Head** | |||
This is the part from the first stressed syllable up to the last syllable before the nucleus. | |||
* **Nucleus** | |||
This is the stressed syllable which is the focus of the clause. eSpeak chooses the last stressed syllable of the clause. | |||
* **Tail** | |||
These are the syllables after the nucleus. | |||
### Tune definitions {.western} | |||
## Tune definitions | |||
Here is an example tune definition from the file | |||
`phsource/intonation`{.western}. | |||
Here is an example tune definition from the file `phsource/intonation`. | |||
~~~~ {.western} | |||
``` | |||
tune s1 | |||
prehead 46 57 | |||
headenv fall 16 | |||
@@ -41,62 +55,62 @@ headextend 0 63 38 13 0 | |||
nucleus fall 70 18 24 12 | |||
nucleus0 fall 64 8 | |||
endtune | |||
~~~~ | |||
``` | |||
It contains: | |||
**tune** \<tune name\> | |||
: Starts the definition of a tune. The `tune name`{.western} can | |||
be used in a `tunes`{.western} statements in voice files. | |||
**endtune** \<tune name\> | |||
: Ends the definition of a tune. | |||
**prehead** \<start pitch\> \<end pitch\> | |||
: Gives the pitch path for any series of unstressed syllables before | |||
the first stressed syllable. | |||
**headenv** \<envelope\> \<height\> | |||
: Gives the pitch envelope which is used for stressed syllables in the | |||
head (before the nucleus), including `onset`{.western} and | |||
`headlast`{.western} syllables if these are specified. | |||
`height`{.western} gives a pitch range for the envelope. | |||
**head** \<steps\> \<start pitch\> \<end pitch\> \<unstressed start\> \<unstressed end\> | |||
: `start pitch`{.western} and `end pitch`{.western} give a pitch | |||
path for the stressed syllables of the head. `steps`{.western} is | |||
the maximum number of stressed syllables for which this applies. If | |||
there are additional stressed syllables, then the | |||
`headextend`{.western} statement is used for them. | |||
: `unstressed start`{.western} and `unstressed end`{.western} give | |||
a pitch path for unstressed syllables between two stressed | |||
syllables. Their values are relative to the pitch of the previous | |||
stressed syllable. Values are usually negative, meaning that the | |||
unstressed syllables have lower pitch than the previous stressed | |||
syllable. | |||
**headextend** \<percentage list\> | |||
: If the head contains more stressed syllables than is specified by | |||
`steps`{.western}, then `percentage list`{.western} is used. It | |||
contains up to 8 numbers which are used repeatedly for the | |||
additional stressed syllables. A value of 0 corresponds to the lower | |||
the `start pitch`{.western} and `end pitch`{.western} values of the | |||
`head`{.western} statement. 100 corresponds to the higher value. | |||
Negative values and values greater than 100 are allowed. | |||
**nucleus** \<envelope\> \<top pitch\> \<bottom pitch\> \<tail start\> \<tail end\> | |||
: This gives the pitch envelope and pitch range of the last stressed | |||
syllable of the clause. `tail start`{.western} and | |||
`tail end`{.western} give a pitch path for the unstressed syllables | |||
which are after the last stressed syllable. | |||
**nucleus0** \<envelope\> \<top pitch\> \<bottom pitch\> | |||
: This is used instead of `nucleus`{.western} if there are no | |||
unstressed syllables after the last stressed syllable. In this case, | |||
the pitch changes of the nucleus and the tail and both included in | |||
the nucleus. | |||
* **tune** \<tune name\> | |||
Starts the definition of a tune. The `tune name` can | |||
be used in a `tunes` statement in voice files. | |||
* **endtune** \<tune name\> | |||
Ends the definition of a tune. | |||
* **prehead** \<start pitch\> \<end pitch\> | |||
Gives the pitch path for any series of unstressed syllables before | |||
the first stressed syllable. | |||
* **headenv** \<envelope\> \<height\> | |||
Gives the pitch envelope which is used for stressed syllables in the | |||
head (before the nucleus), including `onset` and | |||
`headlast` syllables if these are specified. | |||
`height` gives a pitch range for the envelope. | |||
* **head** \<steps\> \<start pitch\> \<end pitch\> \<unstressed start\> \<unstressed end\> | |||
`start pitch` and `end pitch` give a pitch | |||
path for the stressed syllables of the head. `steps` is | |||
the maximum number of stressed syllables for which this applies. If | |||
there are additional stressed syllables, then the | |||
`headextend` statement is used for them. | |||
`unstressed start` and `unstressed end` give | |||
a pitch path for unstressed syllables between two stressed | |||
syllables. Their values are relative to the pitch of the previous | |||
stressed syllable. Values are usually negative, meaning that the | |||
unstressed syllables have lower pitch than the previous stressed | |||
syllable. | |||
* **headextend** \<percentage list\> | |||
If the head contains more stressed syllables than is specified by | |||
`steps`, then `percentage list` is used. It | |||
contains up to 8 numbers which are used repeatedly for the | |||
additional stressed syllables. A value of 0 corresponds to the lower of | |||
the `start pitch` and `end pitch` values of the | |||
`head` statement. 100 corresponds to the higher value. | |||
Negative values and values greater than 100 are allowed. | |||
* **nucleus** \<envelope\> \<top pitch\> \<bottom pitch\> \<tail start\> \<tail end\> | |||
This gives the pitch envelope and pitch range of the last stressed | |||
syllable of the clause. `tail start` and | |||
`tail end` give a pitch path for the unstressed syllables | |||
which are after the last stressed syllable. | |||
* **nucleus0** \<envelope\> \<top pitch\> \<bottom pitch\> | |||
This is used instead of `nucleus` if there are no | |||
unstressed syllables after the last stressed syllable. In this case, | |||
the pitch changes of the nucleus and the tail are both included in | |||
the nucleus. | |||
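One way to read the `headextend` percentages is that each value is a position between the lower and the higher of the two `head` pitches, with the list reused in a cycle for further stressed syllables. The Python sketch below follows that reading. The function name is hypothetical, and eSpeak NG's internal arithmetic (scaling, rounding) may differ.

```python
def headextend_pitches(percentages, start_pitch, end_pitch, count):
    """Return pitches for `count` extra stressed syllables.

    0 maps to the lower of the two head pitches and 100 to the higher;
    values outside 0..100 extrapolate. The list is reused cyclically.
    """
    low = min(start_pitch, end_pitch)
    high = max(start_pitch, end_pitch)
    return [
        low + (high - low) * percentages[i % len(percentages)] / 100
        for i in range(count)
    ]
```

For instance, with the list `0 63 38 13 0` and assumed (hypothetical) head pitches of 60 and 40, the first extra syllable would sit at the lower pitch, 40, and the sixth would wrap around to the first percentage and sit at 40 again.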
The following attributes may also be included: | |||
**onset** \<pitch\> \<unstressed start\> \<unstressed end\> | |||
: This specifies the pitch for the first stressed syllable of the | |||
head. If the `onset`{.western} statement is present, then the | |||
`head`{.western} statement used for the stressed syllables after the | |||
first. | |||
**headlast** \<pitch\> \<unstressed start\> \<unstressed end\> | |||
: This specifies the pitch for the last stressed syllable of the head | |||
(i.e. the stressed syllable before the nucleus). | |||
* **onset** \<pitch\> \<unstressed start\> \<unstressed end\> | |||
This specifies the pitch for the first stressed syllable of the | |||
head. If the `onset` statement is present, then the | |||
`head` statement is used for the stressed syllables after the | |||
first. | |||
* **headlast** \<pitch\> \<unstressed start\> \<unstressed end\> | |||
This specifies the pitch for the last stressed syllable of the head | |||
(i.e. the stressed syllable before the nucleus). | |||
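Because each statement is a keyword followed by space-separated values, a tune block is straightforward to read programmatically. The Python sketch below is a hypothetical helper, not part of eSpeak NG or espeakedit, and it assumes one statement per line between `tune` and `endtune`.

```python
def parse_tunes(text):
    """Parse tune blocks into {tune_name: {keyword: [values...]}}.

    Numeric values become ints; other tokens (such as envelope names)
    are kept as strings.
    """
    tunes = {}
    current = None
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        keyword, args = tokens[0], tokens[1:]
        if keyword == "tune":
            current = tunes.setdefault(args[0], {})
        elif keyword == "endtune":
            current = None
        elif current is not None:
            current[keyword] = [
                int(a) if a.lstrip("-").isdigit() else a for a in args
            ]
    return tunes
```

Applied to the `s1` example, `parse_tunes(...)["s1"]["prehead"]` would hold the two pitch values of the `prehead` statement, and `["nucleus"]` would hold the envelope name followed by its four numbers.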
@@ -1,12 +1,24 @@ | |||
3. LANGUAGES {.western} | |||
------------ | |||
**Languages**. The eSpeak speech synthesizer supports several languages, | |||
# Table of contents | |||
* [Languages](#languages) | |||
* [Help Needed](#help-needed) | |||
* [Character sets](#character-sets) | |||
* [Voice Files](#voice-files) | |||
* [Default Voice](#default-voice) | |||
* [English Voices](#english-voices) | |||
* [Voice Variants](#voice-variants) | |||
* [Other Languages](#other-languages) | |||
* [Provisional Languages](#provisional-languages) | |||
* [Mbrola Voices](#mbrola-voices) | |||
# Languages | |||
The eSpeak NG speech synthesizer supports several languages, | |||
however in many cases these are initial drafts and need more work to | |||
improve them. Assistance from native speakers is welcome for these, or | |||
other new languages. Please contact me if you want to help. | |||
eSpeak does text to speech synthesis for the following languages, some | |||
eSpeak NG does text to speech synthesis for the following languages, some | |||
better than others. Afrikaans, Albanian, Armenian, Cantonese, Catalan, | |||
Croatian, Czech, Danish, Dutch, English, Esperanto, Finnish, French, | |||
German, Greek, Hindi, Hungarian, Icelandic, Indonesian, Italian, | |||
@@ -15,7 +27,7 @@ Portuguese, Romanian, Russian, Serbian, Slovak, Spanish, Swahili, | |||
Swedish, Tamil, Turkish, Vietnamese, Welsh. | |||
#### Help Needed {.western} | |||
### Help Needed | |||
Many of these are just experimental attempts at these languages, | |||
produced after a quick reading of the corresponding article on | |||
@@ -31,9 +43,9 @@ Italian voice improved from "difficult to understand" to "good" by | |||
changing the relative length of stressed syllables. Identifying | |||
unstressed function words in the xx\_list file is also important to make | |||
the speech flow well. See [Adding or Improving a | |||
Language](add_language.html) | |||
Language](add_language.md) | |||
#### Character sets {.western} | |||
### Character sets | |||
Languages recognise text either as UTF8 or alternatively in an 8-bit | |||
character set which is appropriate for that language. For example, for | |||
@@ -41,9 +53,7 @@ Polish this is Latin2, for Russian it is KOI8-R. This choice can be | |||
overridden by a line in the voices file to specify an ISO 8859 character | |||
set, eg. for Russian the line: | |||
~~~~ {.western style="margin-bottom: 0.5cm"} | |||
charset 5 | |||
~~~~ | |||
will mean that ISO 8859-5 is used as the 8-bit character set rather than | |||
KOI8-R. | |||
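Why the 8-bit character set matters can be shown outside eSpeak NG. In this Python sketch the same bytes are decoded once as ISO 8859-5 and once as KOI8-R. The codec names are Python's own, and the mapping from `charset N` to `iso8859_N` is an assumption made for illustration, not eSpeak NG's implementation.

```python
def charset_codec(number):
    """Map a voice-file `charset N` value to a Python codec name (assumed)."""
    return f"iso8859_{number}"


text = "Привет"
raw = text.encode(charset_codec(5))    # bytes as stored in an ISO 8859-5 file

as_iso = raw.decode(charset_codec(5))  # decodes back to the original text
as_koi8 = raw.decode("koi8_r")         # same bytes, wrong table: garbled text
```

Only the decoding that matches the file's real encoding recovers the Russian word, which is why the voice file needs the `charset` line when the input is not KOI8-R.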
@@ -56,18 +66,16 @@ or Russian voice will sound OK, but each word is spoken separately so it | |||
won't flow properly. | |||
Sample texts in various languages can be found at | |||
[http://\<language\>.wikipedia.org](http://meta.wikimedia.org/wiki/List_of_Wikipedias) | |||
and [www.gutenberg.org](http://www.gutenberg.org/) | |||
[Wikipedia](http://meta.wikimedia.org/wiki/List_of_Wikipedias) | |||
and [Project Gutenberg](http://www.gutenberg.org/). | |||
### 3.1 Voice Files {.western} | |||
## Voice Files | |||
A number of Voice files are provided in the | |||
`espeak-data/voices`{.western} directory. You can select one of these | |||
with the **-v \<voice filename\>** parameter to the speak command, eg: | |||
`espeak-data/voices` directory. You can select one of these | |||
with the `-v <voice filename>` parameter to the speak command, eg: | |||
~~~~ {.western style="margin-bottom: 0.5cm"} | |||
espeak-ng -vaf | |||
~~~~ | |||
to speak using the Afrikaans voice. | |||
@@ -78,48 +86,61 @@ code](http://www.sil.org/iso639-3/codes.asp) can be used. | |||
For details of the voice files see [Voices](voices.html). | |||
#### Default Voice {.western} | |||
### Default Voice | |||
**default** | |||
This voice is used if none is specified in the speak command. Copy your preferred voice to "default" so you can use the speak command without the need to specify a voice. | |||
## English Voices | |||
* **en** | |||
is the standard default English voice. | |||
* **en-us** | |||
American English. | |||
* **en-sc** | |||
English with a Scottish accent. | |||
### 3.2 English Voices {.western} | |||
* **en-n** | |||
**en-rp** | |||
**en-wm** | |||
are different English voices. These can be considered caricatures of various British accents: Northern, Received Pronunciation, West Midlands respectively. | |||
### 3.3 Voice Variants {.western} | |||
## Voice Variants | |||
To make alternative voices for a language, you can make additional voice | |||
files in `espeak-data/voices` which contain commands to change various | |||
voice and pronunciation attributes. See [voices.html](voices.html). | |||
voice and pronunciation attributes. See [voices](voices.md). | |||
Alternatively there are some preset voice variants which can be applied | |||
to any of the language voices, by appending `+`{.western} and a variant | |||
to any of the language voices, by appending **+** and a variant | |||
name. Their effects are defined by files in | |||
`espeak-data/voices/!v`{.western}. | |||
`espeak-data/voices/!v`. | |||
The variants are `+m1 +m2 +m3 +m4 +m5 +m6 +m7`{.western} for male | |||
voices, `+f1 +f2 +f3 +f4 +f5 `{.western}for female voices, and | |||
`+croak +whisper`{.western} for other effects. For example: | |||
The variants are `+m1 +m2 +m3 +m4 +m5 +m6 +m7` for male | |||
voices, `+f1 +f2 +f3 +f4 +f5` for female voices, and | |||
`+croak +whisper` for other effects. For example: | |||
~~~~ {.western style="margin-bottom: 0.5cm"} | |||
espeak-ng -ven+m3 | |||
~~~~ | |||
The available voice variants can be listed with: | |||
~~~~ {.western style="margin-bottom: 0.5cm"} | |||
espeak-ng --voices=variant | |||
~~~~ | |||
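As a rough sketch of how a language voice and a preset variant combine into a single `-v` argument, the following Python helper builds the option string. The function name `voice_option` is hypothetical, and the variant list is copied from the text above; the authoritative list is whatever `espeak-ng --voices=variant` reports on a given installation.

```python
from typing import Optional

# Variant names as listed in the text above (an installation may have more).
MALE = [f"m{i}" for i in range(1, 8)]    # m1 .. m7
FEMALE = [f"f{i}" for i in range(1, 6)]  # f1 .. f5
EFFECTS = ["croak", "whisper"]
KNOWN_VARIANTS = set(MALE + FEMALE + EFFECTS)


def voice_option(language: str, variant: Optional[str] = None) -> str:
    """Build the -v argument, e.g. ("en", "m3") gives "-ven+m3"."""
    if variant is None:
        return f"-v{language}"
    if variant not in KNOWN_VARIANTS:
        raise ValueError(f"unknown voice variant: {variant}")
    # The variant is appended to the language voice with a "+" sign.
    return f"-v{language}+{variant}"


print(voice_option("en", "m3"))
print(voice_option("af"))
```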
### 3.4 Other Languages {.western} | |||
## Other Languages | |||
The eSpeak speech synthesizer does text to speech for the following | |||
The eSpeak NG speech synthesizer does text to speech for the following | |||
additional languages. | |||
### 3.5 Provisional Languages {.western} | |||
## Provisional Languages | |||
These languages are only initial naive implementations which have had | |||
little or no feedback and improvement from native speakers. | |||
### 3.6 Mbrola Voices {.western} | |||
## Mbrola Voices | |||
Some additional voices, whose names start with **mb-** (for example | |||
**mb-en1**) use eSpeak as a front-end to Mbrola diphone voices. eSpeak | |||
**mb-en1**) use eSpeak NG as a front-end to Mbrola diphone voices. eSpeak NG | |||
does the spelling-to-phoneme translation and intonation. See | |||
[mbrola.html](mbrola.html). | |||
[mbrola](mbrola.md). |
@@ -1,126 +1,154 @@ | |||
MBROLA VOICES {.western} | |||
------------- | |||
# Table of contents | |||
* [Mbrola voices](#mbrola-voices) | |||
* [Voice Names](#voice-names) | |||
* [Windows Installation](#windows-installation) | |||
* [Linux Installation](#linux-installation) | |||
* [Mbrola Voice Files](#mbrola-voice-files) | |||
* [Mbrola Phoneme Translation Data](#mbrola-phoneme-translation-data) | |||
# Mbrola voices | |||
The Mbrola project is a collection of diphone voices for speech | |||
synthesis. They do not include any text-to-phoneme translation, so this | |||
must be done by another program. The Mbrola voices are cost-free but are | |||
not open source. They are available from the Mbrola website at:\ | |||
not open source. They are available from the Mbrola website at: | |||
[http://www.tcts.fpms.ac.be/synthesis/mbrola/mbrcopybin.html](http://www.tcts.fpms.ac.be/synthesis/mbrola/mbrcopybin.html) | |||
eSpeak can be used as a front-end to Mbrola. It provides the | |||
eSpeak NG can be used as a front-end to Mbrola. It provides the | |||
spelling-to-phoneme translation and intonation, which Mbrola then uses | |||
to generate speech sound. | |||
### Voice Names {.western} | |||
## Voice Names | |||
To use a Mbrola voice, eSpeak needs information to translate from its | |||
To use a Mbrola voice, eSpeak NG needs information to translate from its | |||
own phonemes to the equivalent Mbrola phonemes. This has been set up for | |||
only some voices so far. | |||
The eSpeak voices which use Mbrola are named as:\ | |||
The eSpeak NG voices which use Mbrola are named as:\ | |||
**mb-**xxx | |||
where xxx is the name of a Mbrola voice (eg. **mb-en1** for the Mbrola | |||
"**en1**" English voice). These voice files are in eSpeak's directory | |||
`espeak-data/voices/mbrola`{.western}. | |||
"**en1**" English voice). These voice files are in eSpeak NG's directory | |||
`espeak-data/voices/mbrola`. | |||
The installation instructions below use the Mbrola voice "en1" as an | |||
example. You can use other mbrola voices for which there is an | |||
equivalent eSpeak voice in `espeak-data/voices/mbrola`{.western}. | |||
equivalent eSpeak NG voice in `espeak-data/voices/mbrola`. | |||
There are some additional eSpeak Mbrola voices which speak English text | |||
There are some additional eSpeak NG Mbrola voices which speak English text | |||
using a Mbrola voice for a different language. These contain the name of | |||
the Mbrola voice with a suffix **-en**. For example, the voice | |||
**mb-de4-en** will speak English text with a German accent by using the | |||
Mbrola **de4** voice. | |||
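The naming convention above can be sketched as a small parser. This is only an illustration of the `mb-xxx` and `mb-xxx-en` patterns described in the text; the names `MbrolaVoiceName` and `parse_mbrola_voice_name` are hypothetical and do not exist in eSpeak NG.

```python
from typing import NamedTuple


class MbrolaVoiceName(NamedTuple):
    mbrola_voice: str     # name of the underlying Mbrola voice, e.g. "en1"
    english_suffix: bool  # True for "-en" voices (English text, other-language voice)


def parse_mbrola_voice_name(name: str) -> MbrolaVoiceName:
    """Split an eSpeak NG Mbrola voice name into its parts."""
    prefix, suffix = "mb-", "-en"
    if not name.startswith(prefix):
        raise ValueError(f"not a Mbrola voice name: {name}")
    rest = name[len(prefix):]
    if rest.endswith(suffix):
        return MbrolaVoiceName(rest[:-len(suffix)], True)
    return MbrolaVoiceName(rest, False)


print(parse_mbrola_voice_name("mb-en1"))
print(parse_mbrola_voice_name("mb-de4-en"))
```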
### Windows Installation {.western} | |||
## Windows Installation | |||
The SAPI5 version of eSpeak NG uses the mbrola.dll. | |||
The SAPI5 version of eSpeak uses the mbrola.dll. | |||
1. Install eSpeak. Include the voice **mb-en1** in the list of voices during the eSpeak installation. | |||
2. Install the PC/Windows version of Mbrola (MbrolaTools35.exe) from: [http://www.tcts.fpms.ac.be/synthesis/mbrola/bin/pcwin/MbrolaTools35.exe](http://www.tcts.fpms.ac.be/synthesis/mbrola/bin/pcwin/MbrolaTools35.exe). | |||
3. Get the **en1** voice from: [http://www.tcts.fpms.ac.be/synthesis/mbrola/mbrcopybin.html](http://www.tcts.fpms.ac.be/synthesis/mbrola/mbrcopybin.html) unpack the archive, and copy the "**en1**" data file (not the whole "en1" directory) into `C:/Program Files/eSpeak/espeak-data/mbrola`. | |||
4. Use the voice **espeak-MB-EN1** from the list of SAPI5 voices. | |||
1. 2. 3. 4. | |||
### Linux Installation {.western} | |||
## Linux Installation | |||
From eSpeak version 1.44 onwards, eSpeak calls the mbrola program | |||
From eSpeak version 1.44 onwards, eSpeak NG calls the mbrola program | |||
directly, rather than passing phoneme data to it using a pipe. | |||
1. 2. 3. | |||
1. To install the Linux Mbrola binary, download: [http://www.tcts.fpms.ac.be/synthesis/mbrola/bin/pclinux/mbr301h.zip](http://www.tcts.fpms.ac.be/synthesis/mbrola/bin/pclinux/mbr301h.zip). Unpack the archive, and copy and rename the file from: `mbrola-linux-i386` to `mbrola` somewhere in your executable path (eg. `/usr/bin/mbrola` ). | |||
2. Get the en1 voice from: [http://www.tcts.fpms.ac.be/synthesis/mbrola/mbrcopybin.html](http://www.tcts.fpms.ac.be/synthesis/mbrola/mbrcopybin.html). Unpack the archive, and copy the "**en1**" data file (not the whole "en1" directory) to `/usr/share/mbrola/en1`. | |||
eSpeak will look for mbrola voices firstly in `espeak-data/mbrola` and then in `/usr/share/mbrola` | |||
3. If you use an eSpeak voice such as "**mb-en1**", then eSpeak will use the mbrola "en1" voice, eg: | |||
`espeak-ng -v mb-en1 "Hello world"` | |||
### Mbrola Voice Files {.western} | |||
To generate mbrola phoneme data (.pho file) you can use: | |||
`espeak-ng -v mb-en1 -q --pho "Hello world"` | |||
or | |||
`espeak-ng -v mb-en1 -q --pho --phonout=out.pho "Hello world"` | |||
eSpeak's voice files for Mbrola voices are in directory | |||
`espeak-data/voices/mbrola`{.western}. They contain a line:\ | |||
`mbrola <voice> <translation>`{.western} \ | |||
eg.\ | |||
`mbrola en1 en1_phtrans`{.western} | |||
- - | |||
## Mbrola Voice Files | |||
They are binary files which are compiled, using espeakedit, from source | |||
files in `phsource/mbrola`{.western}, see below. | |||
eSpeak NG's voice files for Mbrola voices are in directory `espeak-data/voices/mbrola`. | |||
They contain a line: `mbrola <voice> <translation>` | |||
### Mbrola Phoneme Translation Data {.western} | |||
eg. | |||
`mbrola en1 en1_phtrans` | |||
Mbrola phoneme translation files specify translations from eSpeak | |||
* **\<voice\>** | |||
is the name of the Mbrola voice. | |||
* **\<translation\>** | |||
is a translation file to convert between eSpeak phonemes and the equivalent Mbrola phonemes. | |||
These are kept in: `espeak-data/mbrola_ph` | |||
They are binary files which are compiled, using espeakedit, from source files in `phsource/mbrola`, see below. | |||
## Mbrola Phoneme Translation Data | |||
Mbrola phoneme translation files specify translations from eSpeak NG | |||
phoneme names to mbrola phoneme names. They are referenced from voice | |||
files. | |||
The source files are in `phsource/mbrola`{.western}. These are compiled | |||
using the `espeakedit`{.western} program | |||
(`Compile->Compile mbrola phonemes list`{.western}) to produce data | |||
files in `espeak-data/mbrola_ph`{.western} which are used by eSpeak. | |||
The source files are in `phsource/mbrola`. These are compiled | |||
using the `espeakedit` program | |||
(`Compile->Compile mbrola phonemes list`) to produce data | |||
files in `espeak-data/mbrola_ph` which are used by eSpeak NG. | |||
Each line in the mbrola phoneme translation file contains: | |||
`<control> <espeak ph1> <espeak ph2> <percent> <mbrola ph1> [<mbrola ph2>] `{.western} | |||
**\<control\>** | |||
`<control> <espeak ph1> <espeak ph2> <percent> <mbrola ph1> [<mbrola ph2>]` | |||
- - - - | |||
* **\<control\>** | |||
bit 0 skip the next phoneme | |||
bit 1 match this and Previous phoneme | |||
bit 2 only at the start of a word | |||
bit 3 don't match two phonemes across a word boundary | |||
**\<espeak ph1\>**\ | |||
The eSpeak phoneme which is to be translated to an mbrola phoneme. | |||
* **\<espeak ph1\>** | |||
The eSpeak NG phoneme which is to be translated to an mbrola phoneme. | |||
**\<espeak ph2\>**\ | |||
If this field is not `NULL`{.western}, then the match only occurs if | |||
* **\<espeak ph2\>** | |||
If this field is not `NULL`, then the match only occurs if | |||
this field matches the next phoneme. If control bit 1 is set, then the | |||
*previous* rather than the *next* phoneme is matched. This field may | |||
also have the following values:\ | |||
`VWL`{.western} matches any Vowel phoneme. | |||
also have the following values: | |||
`VWL` matches any Vowel phoneme. | |||
**\<percent\>**\ | |||
If this field is zero then only one mbrola phoneme is used. If this | |||
* **\<percent\>** | |||
If this field is zero then only one mbrola phoneme is used. If this | |||
field is non-zero, then two mbrola phonemes are used, and this value | |||
gives the percentage length of the first mbrola phoneme. | |||
**\<mbrola ph1\>**\ | |||
The mbrola phoneme to which the eSpeak phoneme is translated. This | |||
field may be `NULL`{.western}. | |||
* **\<mbrola ph1\>** | |||
The mbrola phoneme to which the eSpeak NG phoneme is translated. This | |||
field may be `NULL`. | |||
**\<mbrola ph2\>**\ | |||
The second mbrola phoneme. This field is only used if the \<percent\> | |||
* **\<mbrola ph2\>** | |||
The second mbrola phoneme. This field is only used if the \<percent\> | |||
field is not zero. | |||
The list is searched from start to finish, until a match is found. | |||
Therefore, a line with more specific match condition should appear | |||
before a line which matches the same eSpeak phoneme but with a more | |||
before a line which matches the same eSpeak NG phoneme but with a more | |||
general condition. | |||
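The line format and the first-match search described above can be sketched in Python as follows. This is a simplified illustration, not the eSpeak NG implementation: it assumes the `<control>` field is a decimal integer, it ignores the word-boundary bits (2 and 3), and the sample lines and all names (`Rule`, `parse_rule`, `find_rule`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set

# Control bits, as listed in the field description above.
SKIP_NEXT = 1 << 0        # bit 0: skip the next phoneme
MATCH_PREVIOUS = 1 << 1   # bit 1: match this and previous phoneme
WORD_START_ONLY = 1 << 2  # bit 2: only at the start of a word (not handled here)
NO_CROSS_WORD = 1 << 3    # bit 3: no match across a word boundary (not handled here)


@dataclass
class Rule:
    control: int
    espeak_ph1: str
    espeak_ph2: Optional[str]  # None stands for NULL
    percent: int
    mbrola_ph1: Optional[str]  # None stands for NULL
    mbrola_ph2: Optional[str]  # only meaningful if percent != 0


def parse_rule(line: str) -> Rule:
    """Parse one translation line into a Rule."""
    fields = line.split()
    if len(fields) not in (5, 6):
        raise ValueError(f"expected 5 or 6 fields: {line!r}")
    control, ph1, ph2, percent, mb1 = fields[:5]
    mb2 = fields[5] if len(fields) == 6 else None
    return Rule(int(control), ph1,
                None if ph2 == "NULL" else ph2,
                int(percent),
                None if mb1 == "NULL" else mb1,
                mb2)


def find_rule(rules: Iterable[Rule], phoneme: str,
              previous: Optional[str], following: Optional[str],
              vowels: Set[str]) -> Optional[Rule]:
    """Return the first rule that matches, searching from start to finish."""
    for rule in rules:
        if rule.espeak_ph1 != phoneme:
            continue
        if rule.espeak_ph2 is None:
            return rule  # no condition on the neighbouring phoneme
        # Bit 1 switches the condition from the next to the previous phoneme.
        neighbour = previous if rule.control & MATCH_PREVIOUS else following
        if rule.espeak_ph2 == "VWL":
            if neighbour in vowels:
                return rule
        elif rule.espeak_ph2 == neighbour:
            return rule
    return None


# Made-up sample lines: the specific rule comes before the general one.
SAMPLE_LINES = ["0 r VWL 0 r", "0 r NULL 0 R", "0 aI NULL 50 a I"]
RULES = [parse_rule(line) for line in SAMPLE_LINES]
VOWELS = {"a", "aI"}

before_vowel = find_rule(RULES, "r", None, "a", VOWELS)
elsewhere = find_rule(RULES, "r", "a", "t", VOWELS)
diphthong = find_rule(RULES, "aI", None, None, VOWELS)
print(before_vowel.mbrola_ph1, elsewhere.mbrola_ph1, diphthong.percent)
```

If the two `r` lines were swapped, the general rule would always match first and the vowel-specific rule would never be reached, which is why the ordering matters.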
The file `dictsource/dict_phonemes`{.western} lists the eSpeak phonemes | |||
The file `dictsource/dict_phonemes` lists the eSpeak NG phonemes | |||
which are used for each language. Translations for all these should be | |||
given in the mbrola phoneme translation file. In addition, some phonemes | |||
which are referenced from phoneme files (eg. | |||
`phsource/ph_language, phsource/phonemes`{.western}) in lines such as: | |||
`phsource/ph_language, phsource/phonemes`) in lines such as: | |||
~~~~ {.western} | |||
beforenotvowel l/ | |||
reduceto a# 0 | |||
~~~~ | |||
beforenotvowel l/ | |||
reduceto a# 0 | |||
should also be included, even though they don't appear in | |||
`dictsource/dict_phonemes`{.western}. | |||
`dictsource/dict_phonemes`. | |||
If the language's \*\_list or \*\_rules files includes rules to speak | |||
words "as English" the mbrola phoneme translation file should include |
@@ -1,5 +1,12 @@ | |||
PHONEMES {.western} | |||
-------- | |||
# Table of contents | |||
* [Phonemes](#phonemes) | |||
* [English Consonants](#english-consonants) | |||
* [Some Additional Consonants](#some-additional-consonants) | |||
* [English Vowels](#english-vowels) | |||
* [Some Additional Vowels](#some-additional-vowels) | |||
# Phonemes | |||
In general a different set of phonemes can be defined for each language. | |||
@@ -14,98 +21,48 @@ characters. See: | |||
Phoneme mnemonics can be used directly in the text input to | |||
**espeak-ng**. They are enclosed within double square brackets. Spaces | |||
are used to separate words, and all stressed syllables must be marked | |||
explicitly. eg:\ | |||
`[[D,Is Iz sVm f@n'EtIk t'Ekst 'InpUt]]`{.western} | |||
### English Consonants {.western} | |||
`[p]`{.western} | |||
`[b]`{.western} | |||
`[t]`{.western} | |||
`[d]`{.western} | |||
`[tS]`{.western} | |||
**ch**urch | |||
`[dZ]`{.western} | |||
**j**udge | |||
`[k]`{.western} | |||
`[g]`{.western} | |||
`[f]`{.western} | |||
`[v]`{.western} | |||
`[T]`{.western} | |||
**th**in | |||
`[D]`{.western} | |||
**th**is | |||
`[s]`{.western} | |||
`[z]`{.western} | |||
`[S]`{.western} | |||
**sh**op | |||
`[Z]`{.western} | |||
plea**s**ure | |||
`[h]`{.western} | |||
`[m]`{.western} | |||
`[n]`{.western} | |||
`[N]`{.western} | |||
si**ng** | |||
`[l]`{.western} | |||
`[r]`{.western} | |||
**r**ed (Omitted if not immediately followed by a vowel). | |||
`[j]`{.western} | |||
**y**es | |||
`[w]`{.western} | |||
**Some Additional Consonants** | |||
\ | |||
`[C]`{.western} | |||
German i**ch** | |||
`[x]`{.western} | |||
German bu**ch** | |||
`[l^]`{.western} | |||
Italian **gl**i | |||
`[n^]`{.western} | |||
Spanish **ñ** | |||
### English Vowels {.western} | |||
explicitly. eg: | |||
\[[D,Is Iz sVm f@n'EtIk t'Ekst 'InpUt]\] | |||
## English Consonants | |||
+----------------+-------------------------------+ | |||
|\[p\] | \[b\] | | |||
+----------------+-------------------------------+ | |||
|\[t\] | \[d\] | | |||
+----------------+-------------------------------+ | |||
|\[tS\] **ch**urch | \[dZ\] **j**udge | | |||
+----------------+-------------------------------+ | |||
|\[k\] | \[g\] | | |||
+----------------+-------------------------------+ | |||
|\[f\] | \[v\] | | |||
+----------------+-------------------------------+ | |||
|\[T\] **th**in | \[D\] **th**is | | |||
+----------------+-------------------------------+ | |||
|\[s\] | \[z\] | | |||
+----------------+-------------------------------+ | |||
|\[S\] **sh**op | \[Z\] plea**s**ure | | |||
+----------------+-------------------------------+ | |||
|\[h\] | | | |||
+----------------+-------------------------------+ | |||
|\[m\] | \[n\] | | |||
+----------------+-------------------------------+ | |||
|\[N\] si**ng** | | | |||
+----------------+-------------------------------+ | |||
|\[l\] | \[r\] **r**ed (Omitted if not immediately followed by a vowel). | | |||
+----------------+-------------------------------+ | |||
|\[j\] **y**es | \[w\] | | |||
+----------------+-------------------------------+ | |||
## Some Additional Consonants | |||
+-------------------------+---------------------------+ | |||
| \[C\] German i**ch**    | \[x\] German bu**ch**     | | |||
+-------------------------+---------------------------+ | |||
| \[l^\] Italian **gl**i  | \[n^\] Spanish **ñ**      | | |||
+-------------------------+---------------------------+ | |||
## English Vowels | |||
These are the phonemes which are used by the English spelling-to-phoneme | |||
translations (en\_rules and en\_list). In some varieties of English | |||
@@ -113,171 +70,91 @@ different phonemes may have the same sound, but they are kept separate | |||
because they may differ in another variety. | |||
In rhotic accents, such as General American, the phonemes | |||
`[3:], [A@], [e@], [i@], [O@], [U@] `{.western}include the "r" sound. | |||
`[@]`{.western} | |||
alph**a** | |||
schwa | |||
`[3]`{.western} | |||
bett**er** | |||
rhotic schwa. In British English this is the same as `[@]`{.western}, | |||
but it includes 'r' colouring in American and other rhotic accents. In | |||
these cases a separate `[r]`{.western} should not be included unless it | |||
is followed immediately by another vowel. | |||
`[3:]`{.western} | |||
n**ur**se | |||
`[@L]`{.western} | |||
simp**le** | |||
`[@2]`{.western} | |||
the | |||
Used only for "the". | |||
`[@5]`{.western} | |||
to | |||
Used only for "to". | |||
`[a]`{.western} | |||
tr**a**p | |||
`[aa]`{.western} | |||
b**a**th | |||
This is `[a]`{.western} in some accents, `[A:]`{.western} in others. | |||
`[a#]`{.western} | |||
**a**bout | |||
This may be `[@]`{.western} or may be a more open schwa. | |||
`[A:]`{.western} | |||
p**al**m | |||
`[A@]`{.western} | |||
st**ar**t | |||
`[E]`{.western} | |||
dr**e**ss | |||
`[e@]`{.western} | |||
squ**are** | |||
`[I]`{.western} | |||
k**i**t | |||
`[I2]`{.western} | |||
**i**ntend | |||
As `[I]`{.western}, but also indicates an unstressed syllable. | |||
`[i]`{.western} | |||
happ**y** | |||
An unstressed "i" sound at the end of a word. | |||
`[i:]`{.western} | |||
fl**ee**ce | |||
`[i@]`{.western} | |||
n**ear** | |||
`[0]`{.western} | |||
l**o**t | |||
`[V]`{.western} | |||
str**u**t | |||
`[u:]`{.western} | |||
g**oo**se | |||
`[U]`{.western} | |||
f**oo**t | |||
`[U@]`{.western} | |||
c**ure** | |||
`[O:]`{.western} | |||
th**ou**ght | |||
`[O@]`{.western} | |||
n**or**th | |||
`[o@]`{.western} | |||
f**or**ce | |||
`[aI]`{.western} | |||
pr**i**ce | |||
`[eI]`{.western} | |||
f**a**ce | |||
`[OI]`{.western} | |||
ch**oi**ce | |||
`[aU]`{.western} | |||
m**ou**th | |||
`[oU]`{.western} | |||
g**oa**t | |||
`[aI@]`{.western} | |||
sc**ie**nce | |||
`[aU@]`{.western} | |||
h**our** | |||
### Some Additional Vowels {.western} | |||
`[3:], [A@], [e@], [i@], [O@], [U@]` include the "r" sound. | |||
+---------+--------------------------+---------------------------------------------------------------------+ | |||
|\[@\] | alph**a** | schwa | | |||
+---------+--------------------------+---------------------------------------------------------------------+ | |||
|\[3\] | bett**er** | rhotic schwa. In British English this is the same as \[@\], | | |||
| | | but it includes 'r' colouring in American and other rhotic accents. | | |||
| | | In these cases a separate \[r\] should not be included unless it is | | |||
| | | followed immediately by another vowel. | | |||
+---------+--------------------------+---------------------------------------------------------------------+ | |||
|\[3:\] | n**ur**se | | | |||
+---------+--------------------------+---------------------------------------------------------------------+ | |||
|\[@L\] | simp**le** | | | |||
+---------+--------------------------+---------------------------------------------------------------------+ | |||
|\[@2\]   | the                      | Used only for "the".                                                | | |||
+---------+--------------------------+---------------------------------------------------------------------+ | |||
|\[@5\]   | to                       | Used only for "to".                                                 | | |||
+---------+--------------------------+---------------------------------------------------------------------+ | |||
|\[a\] | tr**a**p | | | |||
+---------+--------------------------+---------------------------------------------------------------------+ | |||
|\[aa\] | b**a**th | This is \[a\] in some accents, \[A:\] in others. | | |||
+---------+--------------------------+---------------------------------------------------------------------+ | |||
|\[a#\] | **a**bout | This may be \[@\] or may be a more open schwa. | | |||
+---------+--------------------------+---------------------------------------------------------------------+ | |||
|\[A:\] | p**al**m | | | |||
+---------+--------------------------+---------------------------------------------------------------------+ | |||
|\[A@\] | st**ar**t | | | |||
+---------+--------------------------+---------------------------------------------------------------------+ | |||
|\[E\] | dr**e**ss | | | |||
+---------+--------------------------+---------------------------------------------------------------------+ | |||
|\[e@\] | squ**are** | | | |||
+---------+--------------------------+---------------------------------------------------------------------+ | |||
|\[I\] | k**i**t | | | |||
+---------+--------------------------+---------------------------------------------------------------------+ | |||
|\[I2\] | **i**ntend | As \[I\], but also indicates an unstressed syllable. | | |||
+---------+--------------------------+---------------------------------------------------------------------+ | |||
|\[i\] | happ**y** | An unstressed "i" sound at the end of a word. | | |||
+---------+--------------------------+---------------------------------------------------------------------+ | |||
|\[i:\] | fl**ee**ce | | | |||
+---------+--------------------------+---------------------------------------------------------------------+ | |||
|\[i@\] | n**ear** | | | |||
+---------+--------------------------+---------------------------------------------------------------------+ | |||
|\[0\] | l**o**t | | | |||
+---------+--------------------------+---------------------------------------------------------------------+ | |||
|\[V\] | str**u**t | | | |||
+---------+--------------------------+---------------------------------------------------------------------+ | |||
|\[u:\] | g**oo**se | | | |||
+---------+--------------------------+---------------------------------------------------------------------+ | |||
|\[U\] | f**oo**t | | | |||
+---------+--------------------------+---------------------------------------------------------------------+ | |||
|\[U@\] | c**ure** | | | |||
+---------+--------------------------+---------------------------------------------------------------------+ | |||
|\[O:\] | th**ou**ght | | | |||
+---------+--------------------------+---------------------------------------------------------------------+ | |||
|\[O@\] | n**or**th | | | |||
+---------+--------------------------+---------------------------------------------------------------------+ | |||
|\[o@\] | f**or**ce | | | |||
+---------+--------------------------+---------------------------------------------------------------------+ | |||
|\[aI\] | pr**i**ce | | | |||
+---------+--------------------------+---------------------------------------------------------------------+ | |||
|\[eI\] | f**a**ce | | | |||
+---------+--------------------------+---------------------------------------------------------------------+ | |||
|\[OI\] | ch**oi**ce | | | |||
+---------+--------------------------+---------------------------------------------------------------------+ | |||
|\[aU\] | m**ou**th | | | |||
+---------+--------------------------+---------------------------------------------------------------------+ | |||
|\[oU\] | g**oa**t | | | |||
+---------+--------------------------+---------------------------------------------------------------------+ | |||
|\[aI@\] | sc**ie**nce | | | |||
+---------+--------------------------+---------------------------------------------------------------------+ | |||
|\[aU@\] | h**our** | | | |||
+---------+--------------------------+---------------------------------------------------------------------+ | |||
## Some Additional Vowels | |||
Other languages will have their own vowel definitions, eg: | |||
+--------------------------------------+--------------------------------------+ | |||
| `[e]`{.western} | German **eh**, French **é** | | |||
+--------------------------------------+--------------------------------------+ | |||
| `[o]`{.western} | German **oo**, French **o** | | |||
+--------------------------------------+--------------------------------------+ | |||
| `[y]`{.western} | German **ü**, French **u** | | |||
+--------------------------------------+--------------------------------------+ | |||
| `[Y]`{.western} | German **ö**, French **oe** | | |||
+--------------------------------------+--------------------------------------+ | |||
`[:] `{.western}can be used to lengthen a vowel, eg `[e:]`{.western} | |||
+---------+--------------------------------------+ | |||
| \[e\]   | German **eh**, French **é**          | | |||
+---------+--------------------------------------+ | |||
| \[o\]   | German **oo**, French **o**          | | |||
+---------+--------------------------------------+ | |||
| \[y\]   | German **ü**, French **u**           | | |||
+---------+--------------------------------------+ | |||
| \[Y\]   | German **ö**, French **oe**          | | |||
+---------+--------------------------------------+ | |||
**\[:\]** can be used to lengthen a vowel, eg \[e:\] |
@@ -1,5 +1,15 @@ | |||
PHONEME TABLES {.western} | |||
-------------- | |||
# Table of contents | |||
* [Phoneme tables](#phoneme-tables) | |||
* [Phoneme files](#phoneme-files) | |||
* [Phoneme definitions](#phoneme-definitions) | |||
* [Phoneme Properties](#phoneme-properties) | |||
* [Phoneme Instructions](#phoneme-instructions) | |||
* [Conditional Statements](#conditional-statements) | |||
* [Sound Specifications](#sound-specifications) | |||
* [Vowel Transitions](#vowel-transitions) | |||
# Phoneme tables | |||
A phoneme table defines all the phonemes which are used by a language, | |||
together with their properties and the data for their production as | |||
@@ -20,7 +30,7 @@ the espeakedit download package. "Vowel files", which are referenced in | |||
FMT(), VowelStart(), and VowelEnding() instructions are made using the | |||
espeakedit program. | |||
### Phoneme files {.western} | |||
## Phoneme files | |||
The phoneme tables are defined in a master phoneme file, named | |||
**phonemes**. This starts with the **base** phoneme table followed by | |||
@@ -30,22 +40,22 @@ from the **base** table or previously defined tables. | |||
In addition to phoneme definitions, the phoneme file can contain the | |||
following: | |||
**include** \<filename\> | |||
: Includes the text of the specified file at this point. This allows | |||
different phoneme tables to be kept in different text files, for | |||
convenience. \<filename\> is a relative path. The included file can | |||
itself contain **include** statements. | |||
**phonemetable** \<name\> \<parent\> | |||
: Starts a new phoneme table, and ends the previous table.\ | |||
\<name\> Is the name of this phoneme table. This name is used in | |||
Voice files.\ | |||
\<parent\> Is the name of a previously defined phoneme table whose | |||
phoneme definitions are inherited by this one. The name **base** | |||
indicates the first (base) phoneme table. | |||
### Phoneme definitions {.western} | |||
Note: These new Phoneme definitions apply to eSpeak version 1.42.20 and | |||
* **include** \<filename\> | |||
Includes the text of the specified file at this point. This allows | |||
different phoneme tables to be kept in different text files, for | |||
convenience. \<filename\> is a relative path. The included file can | |||
itself contain **include** statements. | |||
* **phonemetable** \<name\> \<parent\> | |||
Starts a new phoneme table, and ends the previous table. | |||
\<name\> Is the name of this phoneme table. This name is used in | |||
Voice files. | |||
\<parent\> Is the name of a previously defined phoneme table whose | |||
phoneme definitions are inherited by this one. The name **base** | |||
indicates the first (base) phoneme table. | |||
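The inheritance between phoneme tables can be pictured with a short Python sketch. This is a conceptual model only: the class `PhonemeTable` is hypothetical, phoneme definitions are reduced to plain strings, and it assumes that a definition in a child table takes precedence over an inherited one with the same name.

```python
from typing import Dict, Optional


class PhonemeTable:
    """A named table of phoneme definitions with an optional parent table."""

    def __init__(self, name: str, parent: Optional["PhonemeTable"] = None):
        self.name = name
        self.parent = parent
        self.phonemes: Dict[str, str] = {}

    def define(self, phoneme: str, definition: str) -> None:
        self.phonemes[phoneme] = definition

    def lookup(self, phoneme: str) -> str:
        """Search this table first, then each parent in turn."""
        table: Optional[PhonemeTable] = self
        while table is not None:
            if phoneme in table.phonemes:
                return table.phonemes[phoneme]
            table = table.parent
        raise KeyError(phoneme)


# Equivalent in spirit to: phonemetable en base
base = PhonemeTable("base")
base.define("t", "base definition of t")
base.define("a", "base definition of a")

en = PhonemeTable("en", parent=base)
en.define("a", "en definition of a")

print(en.lookup("t"))  # inherited from the base table
print(en.lookup("a"))  # defined in the en table itself
```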
## Phoneme definitions | |||
Note: These new Phoneme definitions apply to eSpeak version 1.42.20 and | |||
later. | |||
A phoneme table contains a list of phoneme definitions. Each starts with | |||
@@ -53,7 +63,7 @@ the keyword **phoneme** and the phoneme name (this is the name used in | |||
the pronunciation rules in a language's \*\_rules and \*\_list files), | |||
and ends with the keyword **endphoneme**. For example: | |||
~~~~ {.western} | |||
``` | |||
phoneme aI | |||
vowel | |||
starttype #a endtype #i | |||
@@ -75,7 +85,7 @@ and ends with the keyword **endphoneme**. For example: | |||
ENDIF | |||
WAV(ufric/s) | |||
endphoneme | |||
~~~~ | |||
``` | |||
A phoneme definition contains both static properties and executed | |||
instructions. The instructions may contain conditional statements, so | |||
@@ -90,23 +100,110 @@ produce the sound for the phoneme. | |||
The **import\_phoneme** statement can be used to copy a previously | |||
defined phoneme from a specified phoneme table. For example: | |||
~~~~ {.western} | |||
``` | |||
phoneme t | |||
import_phoneme base/t[ | |||
endphoneme | |||
~~~~ | |||
``` | |||
means: `phoneme t`{.western} in this phoneme table is a copy of | |||
`phoneme t[`{.western} from phoneme table "base". A **length** | |||
means: `phoneme t` in this phoneme table is a copy of | |||
`phoneme t[` from phoneme table "base". A **length** | |||
instruction can be used after **import\_phoneme** to vary the length | |||
from the original. | |||
### Phoneme Properties {.western} | |||
## Phoneme Properties | |||
Within the phoneme definition the following lines may occur: ( (V) | |||
indicates only for vowels, (C) only for consonants) | |||
### Phoneme Instructions {.western} | |||
Type. One of these must be present. | |||
+------------+-----------------------------------------------+ | |||
| **vowel** | | | |||
+------------+-----------------------------------------------+ | |||
| **liquid** | semi-vowels, such as: `r, l, j, w` | | |||
+------------+-----------------------------------------------+ | |||
| **nasal** | nasal eg: `m, n, N` | | |||
+------------+-----------------------------------------------+ | |||
| **stop** | stop eg: `p, b, t, d, k, g` | | |||
+------------+-----------------------------------------------+ | |||
| **frc** | fricative eg: `f, v, T, D, s, z, S, Z, C, x` | | |||
+------------+-----------------------------------------------+ | |||
| **afr** | affricate eg: `tS, dZ` | | |||
+------------+-----------------------------------------------+ | |||
| **pause** | | | |||
+------------+-----------------------------------------------+ | |||
| **stress** | used for stress symbols, eg: ' , = % | | |||
+------------+-----------------------------------------------+ | |||
| **virtual**| Used to represent a class of phonemes. | | |||
+------------+-----------------------------------------------+ | |||
Properties: | |||
+--------------+----------------------------------------------------------------------------------+ | |||
|**vls** | (C) voiceless eg. `p, t, k, f, s` | | |||
+--------------+----------------------------------------------------------------------------------+ | |||
|**vcd** | (C) voiced eg. `b, d, g, v, z` | | |||
+--------------+----------------------------------------------------------------------------------+ | |||
|**sibilant** | (C) eg: `s, z, S, Z, tS, dZ` | | |||
+--------------+----------------------------------------------------------------------------------+ | |||
|**palatal** | (C) A palatal or palatalized consonant. | | |||
+--------------+----------------------------------------------------------------------------------+ | |||
|**rhotic** | (C) An "r" type consonant. | | |||
+--------------+----------------------------------------------------------------------------------+ | |||
|**unstressed**| (V) This vowel is always unstressed, unless explicitly marked otherwise. | | |||
+--------------+----------------------------------------------------------------------------------+ | |||
|**nolink** | Prevent any linking from the previous phoneme. | | |||
+--------------+----------------------------------------------------------------------------------+ | |||
|**nopause** | Used in a `liquid` or `nasal` phoneme to prevent eSpeak inserting a short | | |||
| | pause if a word starts with this phoneme and the previous word ends with a vowel.| | |||
+--------------+----------------------------------------------------------------------------------+ | |||
|**trill** | (C) Apply trill to the voicing. | | |||
+--------------+----------------------------------------------------------------------------------+ | |||
Place of Articulation (C): | |||
+--------+------------------+ | |||
|**blb** | bi-labial | | |||
+--------+------------------+ | |||
|**ldb** | labio-dental | | |||
+--------+------------------+ | |||
|**dnt** | dental | | |||
+--------+------------------+ | |||
|**alv** | alveolar | | |||
+--------+------------------+ | |||
|**rfx** | retroflex | | |||
+--------+------------------+ | |||
|**pla** | palato-alveolar | | |||
+--------+------------------+ | |||
|**pal** | palatal | | |||
+--------+------------------+ | |||
|**vel** | velar | | |||
+--------+------------------+ | |||
|**lbv** | labio-velar | | |||
+--------+------------------+ | |||
|**uvl** | uvular | | |||
+--------+------------------+ | |||
|**phr** | pharyngeal | | |||
+--------+------------------+ | |||
|**glt** | glottal | | |||
+--------+------------------+ | |||
* **starttype** \<phoneme\> | |||
Allocates this phoneme to a group so that conditions such as nextPh(#e) can test for any of a group of phonemes. Pre-defined groups for use for vowels are: #@ #a #e #i #o #u. Additional groups can be defined as phonemes with type "virtual". | |||
* **endtype** \<phoneme\> | |||
Allocates this phoneme to a group so that conditions such as prevPh(#e) can test for any of a group of phonemes. Pre-defined groups for use for vowels are: #@ #a #e #i #o #u. Additional groups can be defined as phonemes with type "virtual". | |||
* **lengthmod** \<integer\> | |||
\(C\) Determines how this consonant affects the length of the previous vowel. | |||
This value is used as index into the `length_mods` table in the `CalcLengths()` function in the eSpeak program. | |||
* **voicingswitch** \<phoneme\> | |||
This is used for some languages to change between voiced and unvoiced phonemes. | |||
## Phoneme Instructions | |||
Phoneme Instructions may be included within conditional statements. | |||
@@ -115,20 +212,75 @@ causes a change to a different phoneme will terminate the instructions. | |||
During the second phase, FMT() and WAV() instructions will terminate the | |||
instructions. | |||
### Conditional Statements {.western} | |||
* **length** \<length\> | |||
The relative length of the phoneme, typically about 140 for a short vowel and from 200 to 300 for a long vowel or diphthong. A length() instruction is needed for vowels. It is optional for consonants. | |||
* **ipa** \<ipa string\> | |||
In many cases, eSpeak makes IPA (International Phonetic Alphabet) phoneme names automatically from eSpeak phoneme names. If this is not correct, then the phoneme definition can include an **ipa** instruction to specify the correct IPA name. IPA strings may include non-ASCII characters. They may also include characters specified by their character codes in the form U+ followed by 4 hexadecimal digits. For example a string: aU+0303 indicates 'a' with a 'combining tilde'. | |||
* **WAV**(\<wav file\>, \<amplitude\>) | |||
\<wav file\> is a path to a WAV file (22 kHz, 16 bits, mono) within `phsource/` which will be played to produce the sound. This method is used for unvoiced consonants. \<wav file\> does not include a .WAV filename extension, although the file to which it refers may or may not have one. | |||
\<amplitude\> is optional. It is a percentage change to the amplitude of the WAV file. So, `WAV(ufric/s, 50)` means: play file 'ufric/s.wav' at 50% amplitude. | |||
* **FMT**(\<vowel file\>, \<amplitude\>) | |||
\<vowel file\> is a path to a file (within `phsource/`) which defines how to generate the sound (a vowel or voiced consonant) from a sequence of formant values. Vowel files are made using the espeakedit program. | |||
\<amplitude\> is optional. It is a percentage change to the amplitude of the sound which is synthesized from the FMT() instruction. | |||
* **FMT**(\<vowel file\>, \<amplitude\>) **addWav**(\<wav file\>, \<amplitude\>) | |||
For voiced consonants, a FMT() instruction may be followed by an addWav() instruction. addWav() has the same format as a WAV() instruction, but the WAV file is mixed with the sound which is synthesized from the FMT() instruction. | |||
* **VowelStart**(\<vowel file\>, \<length adjust\>) | |||
This is used to modify the start of a vowel when it follows a sonorant consonant (such as [l] or [j]). It replaces the first frame of the \<vowel file\> which is specified in a FMT() instruction by this \<vowel file\>, and adjusts the length of the original by a signed value \<length adjust\>. The VowelStart() instruction may be specified either in the phoneme definition of the vowel, or in the phoneme definition of the sonorant consonant which precedes the vowel. The former takes precedence. | |||
* **VowelEnding**(\<vowel file\>, \<length adjust\>) | |||
This is used to modify the end of a vowel when it is followed by a sonorant consonant (such as [l] or [j]). This \<vowel file\> is appended to the \<vowel file\> which is specified in a FMT() instruction, and the length of the original is adjusted by a signed value \<length adjust\>. The VowelEnding() instruction may be specified either in the phoneme definition of the vowel, or in the phoneme definition of the sonorant consonant which follows the vowel. The former takes precedence. | |||
* **Vowelin** \<vowel transition data\> | |||
(C) Specifies the effects of this consonant on the formants of a following vowel. See "vowel transitions", below. | |||
* **Vowelout** \<vowel transition data\> | |||
(C) Specifies the effects of this consonant on the formants of a preceding vowel. See "vowel transitions", below. | |||
* **ChangePhoneme(**\<phoneme\>) | |||
Change to the specified phoneme. | |||
* **ChangeIfDiminished(**\<phoneme\>) | |||
Change to the specified phoneme (such as schwa, @) if this syllable has "diminished" stress. | |||
* **ChangeIfUnstressed(**\<phoneme\>) | |||
Change to the specified phoneme if this syllable has "diminished" or "unstressed" stress. | |||
* **ChangeIfNotStressed(**\<phoneme\>) | |||
Change to the specified phoneme if this syllable does not have "primary" stress. | |||
* **ChangeIfStressed(**\<phoneme\>) | |||
Change to the specified phoneme if this syllable has "primary" stress. | |||
* **IfNextVowelAppend(**\<phoneme\>) | |||
If the following phoneme is a vowel then this additional phoneme will be inserted before it. | |||
* **RETURN** | |||
Ends execution of the instructions. | |||
* **CALL** \<phoneme table\>/\<phoneme\> | |||
Executes the instructions of the specified phoneme. | |||
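
The `U+` notation described for the **ipa** instruction can be illustrated with a short Python sketch. This is not eSpeak's own parser: `decode_ipa_string` is a hypothetical helper, and it assumes that every `U+` is followed by exactly four hexadecimal digits, as the description states.

```python
import re
import unicodedata

# Matches "U+" followed by exactly four hexadecimal digits.
_CODE_POINT = re.compile(r"U\+([0-9A-Fa-f]{4})")


def decode_ipa_string(text):
    """Replace each U+XXXX escape with the character it names (illustrative only)."""
    return _CODE_POINT.sub(lambda m: chr(int(m.group(1), 16)), text)


decoded = decode_ipa_string("aU+0303")
# Look up the Unicode name of each resulting character.
names = [unicodedata.name(ch) for ch in decoded]
print(names)
```

For the example string from the text, the result is the letter 'a' followed by the combining tilde character.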
### Conditional Statements | |||
Phoneme definitions can contain conditional statements such as: | |||
~~~~ {.western} | |||
IF <condition> THEN | |||
``` | |||
<pre> IF <condition> THEN | |||
<statements> | |||
ENDIF | |||
~~~~ | |||
</pre> | |||
``` | |||
or more generally: | |||
~~~~ {.western} | |||
IF <condition> THEN | |||
``` | |||
<pre> IF <condition> THEN | |||
<statements> | |||
ELIF <condition> THEN | |||
<statements> | |||
@@ -136,34 +288,138 @@ or more generally: | |||
ELSE | |||
<statements> | |||
ENDIF | |||
~~~~ | |||
</pre> | |||
``` | |||
where the `ELSE`{.western} and multiple `ELIF`{.western} parts are | |||
optional. | |||
where the `ELSE` and multiple `ELIF` parts are optional. | |||
Multiple conditions may be joined with `AND`{.western} or | |||
`OR`{.western}, but not a mixture of `AND`{.western}s and | |||
`OR`{.western}s. | |||
Multiple conditions may be joined with `AND` or `OR`, but not a mixture of `AND`s and `OR`s. | |||
A condition may be preceded by `NOT`{.western}. For example: | |||
A condition may be preceded by `NOT`. For example: | |||
~~~~ {.western} | |||
IF <condition> AND NOT <condition> THEN | |||
``` | |||
<pre> IF <condition> AND NOT <condition> THEN | |||
<statements> | |||
ENDIF | |||
~~~~ | |||
</pre> | |||
``` | |||
### Conditions | |||
Conditions can be: | |||
* thisPh(\<attribute\>) | |||
Test the current phoneme. | |||
* prevPh(\<attribute\>) | |||
Test the previous phoneme. | |||
* prevPhW(\<attribute\>) | |||
Test the previous phoneme, but only within the same word. Returns false if there is no previous phoneme in the word. | |||
* prev2PhW(\<attribute\>) | |||
Test the phoneme before the previous phoneme, but only within the same word. Returns false if it is not in this word. | |||
* nextPh(\<attribute\>) | |||
Test the following phoneme. | |||
* next2Ph(\<attribute\>) | |||
Test the phoneme after the next phoneme. | |||
* nextPhW(\<attribute\>) | |||
Test the next phoneme, but only within the same word. Returns false if there is no following phoneme in the word. | |||
* next2PhW(\<attribute\>) | |||
Test the phoneme after the next phoneme, but only within the same word. Returns false if not found before the word end. | |||
* next3PhW(\<attribute\>) | |||
Test the third phoneme after the current phoneme, but only within the same word. Returns false if not found before the word end. | |||
* nextVowel(\<attribute\>) | |||
Test the next vowel after the current phoneme, but only within the same word. Returns false if there is none. | |||
* prevVowel(\<attribute\>) | |||
Test the previous vowel before the current phoneme, but only within the same word. Returns false if there is none. | |||
* PreVoicing() | |||
This is used as part of the instructions for voiced stop consonants (eg. [d] [g]). If true then produce a voiced murmur before the stop. | |||
* KlattSynth() | |||
Returns true if the voice is using the Klatt synthesizer rather than the eSpeak synthesizer. | |||
### Attributes | |||
Note: Additional attributes could be added to eSpeak if needed. | |||
**Condition** Can be: | |||
True if the phoneme has this phoneme name. | |||
**Attributes** | |||
* \<phoneme name\> | |||
True if the phoneme has this phoneme name. | |||
### Sound Specifications {.western} | |||
* \<phoneme group\> | |||
True if the phoneme has this starttype (or if it has this endtype if it's used in prevPh() ). The pre-defined phoneme groups are #@, #a, #e, #i, #o, #u. | |||
* isPause | |||
True if the phoneme is a pause. | |||
* isPause2 | |||
`nextPh(isPause2)` is used to test whether the next phoneme is not a vowel or liquid consonant within the same word. | |||
* isVowel | |||
isNotVowel | |||
isLiquid | |||
isNasal | |||
isVFricative | |||
These test the phoneme type. | |||
* isPalatal | |||
isRhotic | |||
These test whether the phoneme has this property. | |||
* isWordStart | |||
notWordStart | |||
These test whether this is the first phoneme in a word. | |||
* isWordEnd | |||
True if this is the final phoneme in a word. | |||
* isFirstVowel | |||
isSecondVowel | |||
isFinalVowel | |||
True if this is the first, second, or last vowel in a word. | |||
* isAfterStress | |||
True if this phoneme is after the stressed vowel in a word. | |||
* isVoiced | |||
True if this phoneme is a vowel or a voiced consonant. | |||
* isDiminished | |||
True if the syllable stress is "diminished" | |||
* isUnstressed | |||
True if the syllable stress is "diminished" or "unstressed" | |||
* isNotStressed | |||
True if the syllable stress is not "primary stress". | |||
* isStressed | |||
True if the syllable stress is "primary stress". | |||
* isMaxStress | |||
True if this is the highest stressed syllable in the word. | |||
## Sound Specifications | |||
There are three ways to produce sounds: | |||
- - - | |||
* Playing a WAV file, by using a WAV() instruction. This is used for unvoiced consonants such as `[p] [t] [s]`. | |||
* Generating a wave from a sequence of formant parameters, by using a FMT() instruction. This is used for vowels and also for sonorants such as `[l] [j] [n]`. | |||
* A mixture of these. A stored WAV file is mixed with a wave generated from formant parameters. Use a FMT() instruction followed by addWav(). This is used for voiced stops and fricatives such as `[b] [g] [v] [z]`. | |||
### Vowel Transitions {.western} | |||
## Vowel Transitions | |||
These specify how a consonant affects an adjacent vowel. A consonant may | |||
cause a transition in the vowel's formants as the mouth changes shape | |||
@@ -172,3 +428,34 @@ specified. Note that the maximum rate of change of formant frequencies | |||
is limited by the speak program. | |||
* **len=<integer>** | |||
Nominal length of the transition in mS. If omitted a default value is used. | |||
* **rms=<integer>** | |||
Adjusts the amplitude of the vowel at the end of the transition. If omitted a default value is used. | |||
* **f1=<integer>** | |||
0: f1 formant frequency unchanged. | |||
1: f1 formant frequency decreases. | |||
2: f1 formant frequency decreases more. | |||
* **f2=<freq> <min> <max>** | |||
<freq>: The frequency towards which the f2 formant moves (Hz). | |||
<min>: Signed integer (Hz). The minimum f2 frequency change. | |||
<max>: Signed integer (Hz). The maximum f2 frequency change. | |||
* **f3=<change> <amplitude>** | |||
<change>: Signed integer (Hz). Frequency change of f3, f4, and f5 formants. | |||
<amplitude>: Amplitude of the f3, f4, and f5 formants at the end of the transition. 100 = no change. | |||
* **brk** | |||
Break. Do not merge the synthesized wave of the consonant into the vowel. This will produce a discontinuity in the formants. | |||
* **rate** | |||
Allow a greater maximum rate of change of formant frequencies. | |||
* **glstop** | |||
Indicates a glottal stop. | |||
@@ -1,64 +1,78 @@ | |||
TEXT MARKUP {.western} | |||
----------- | |||
### SSML: Speech Synthesis Markup Language {.western} | |||
# Text markup | |||
## SSML: Speech Synthesis Markup Language | |||
The following markup tags and attributes are recognised: | |||
**\<speak\>** | |||
- - | |||
* xml:base (the value is just passed back as a parameter with the UriCallback() function) | |||
* xml:lang | |||
**\<voice\>** | |||
- - - - - | |||
* xml:lang | |||
* name | |||
* age | |||
* variant | |||
* gender | |||
**\<prosody\>** | |||
- - - - | |||
* rate | |||
* volume | |||
* pitch | |||
* range | |||
**\<say-as\>** | |||
- - - - - | |||
* interpret-as="characters" | |||
* interpret-as="characters" format="glyphs" | |||
* interpret-as="tts:key" | |||
* interpret-as="tts:char" | |||
* interpret-as="tts:digits" | |||
**\<mark\>** name | |||
**\<s\>** | |||
- | |||
* xml:lang | |||
**\<p\>** | |||
- | |||
* xml:lang | |||
**\<sub\>** alias | |||
**\<tts:style\>** | |||
- - | |||
* field="punctuation" mode=none,all,some | |||
* field="capital_letters" mode=no,spelling,icon,pitch | |||
**\<audio\>** src | |||
**\<emphasis\>** | |||
- | |||
* level | |||
**\<break\>** | |||
- - | |||
* strength | |||
* time | |||
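
As an illustration of how the tags listed above fit together, the following Python sketch builds a small SSML document with the standard library and checks that it is well-formed. `build_ssml` is a hypothetical helper, and the attribute values `slow` and `500ms` are taken from the SSML specification rather than from this document, so treat them as assumptions about what eSpeak NG accepts.

```python
import xml.etree.ElementTree as ET


def build_ssml(text, rate="slow", pause="500ms"):
    """Wrap text in <speak>/<prosody> and append a <break> (illustrative only)."""
    speak = ET.Element("speak")
    prosody = ET.SubElement(speak, "prosody", {"rate": rate})
    prosody.text = text
    ET.SubElement(speak, "break", {"time": pause})
    return ET.tostring(speak, encoding="unicode")


ssml = build_ssml("Hello world")
# Parsing the string back confirms that the markup is well-formed XML.
root = ET.fromstring(ssml)
print(ssml)
```

The resulting string could then be passed to the command-line program with its `-m` option, which tells it to interpret SSML markup.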
### HTML {.western} | |||
## HTML | |||
eSpeak can speak HTML text directly, or text containing both SSML and | |||
HTML markup.\ | |||
Any unrecognised tags are ignored. | |||
eSpeak can speak HTML text directly, or text containing both SSML and HTML markup. | |||
Any unrecognised tags are ignored. | |||
The following tags cause a sentence break.\ | |||
**\<br\> \<dd\> \<li\> \<img\> \<td\> ** | |||
The following tags cause a sentence break. | |||
**\<br\> \<dd\> \<li\> \<img\> \<td\> ** | |||
The following tags cause a paragraph break.\ | |||
**\<h1\> \<h2\> \<h3\> \<h4\> \<hr\> ** | |||
The following tags cause a paragraph break. | |||
**\<h1\> \<h2\> \<h3\> \<h4\> \<hr\> ** | |||
Text between the following tags is ignored.\ | |||
**\<script\> ... \</script\> \ | |||
\<style\> ... \</style\> ** | |||
Text between the following tags is ignored. | |||
**\<script\> ... \</script\> | |||
\<style\> ... \</style\> | |||
** |
@@ -1,311 +1,279 @@ | |||
5. VOICES {.western} | |||
--------- | |||
### 5.1 Voice Files {.western} | |||
# Voice Files | |||
A Voice file specifies a language (and possibly a language variant or | |||
dialect) together with various attributes that affect the | |||
characteristics of the voice quality and how the language is spoken. | |||
Voice files are placed in the `espeak-data/voices`{.western} directory, | |||
Voice files are placed in the `espeak-data/voices` directory, | |||
or within subdirectories in there. | |||
The available voice files can be listed by: | |||
~~~~ {.western} | |||
espeak-ng --voices | |||
espeak-ng --voices | |||
or | |||
espeak-ng --voices=<language> | |||
~~~~ | |||
espeak-ng --voices=<language> | |||
also | |||
~~~~ {.western style="margin-bottom: 0.5cm"} | |||
espeak-ng --voices=<variant> | |||
~~~~ | |||
espeak-ng --voices=<variant> | |||
Lists voice variants which can be applied to eSpeak voices. | |||
Lists voice variants which can be applied to eSpeak NG voices. | |||
~~~~ {.western style="margin-bottom: 0.5cm"} | |||
espeak-ng --voices=<mbrola> | |||
~~~~ | |||
espeak-ng --voices=<mbrola> | |||
Lists the Mbrola voices. | |||
### 5.2 Contents of Voice Files {.western} | |||
## Contents of Voice Files | |||
The **language** attribute is mandatory. All the other attributes are | |||
optional. | |||
#### Identification Attributes {.western} | |||
**name \<name\>** | |||
A name given to this voice. | |||
**language \<language code\> [\<priority\>]** | |||
### Identification Attributes | |||
This attribute should appear before the other attributes which are | |||
listed below. | |||
* **name \<name\>** | |||
A name given to this voice. | |||
* **language \<language code\> [\<priority\>]** | |||
This attribute should appear before the other attributes which are | |||
listed below. | |||
It selects the default behaviour and characteristics for the language, | |||
and sets default values for "phonemes", "dictionary" and other | |||
attributes. The \<language code\> should be a two-letter ISO 639-1 | |||
language code. One or more language variant codes may be appended, | |||
separated by hyphens. (eg. en-uk-north). | |||
separated by hyphens. (eg. en-uk-north). | |||
The optional \<priority\> value gives the preference of this voice | |||
compared with others for the specified language. A low value indicates a | |||
more preferred voice. The default value is 5. | |||
more preferred voice. The default value is 5. | |||
More than one **language** line may be present. A voice may be selected | |||
for other related languages (variants which have the same initial 2 | |||
letter language code as the specified language), but it will be less | |||
preferred for these. Different language variants may be specified by | |||
additional **language** lines in order to indicate that this is a | |||
preferred voice for them also. Eg. | |||
~~~~ {.western} | |||
language en-uk-north | |||
language en | |||
~~~~ | |||
indicates that this voice is for the "en-uk-north" dialect, but it is | |||
preferred voice for them also. Eg. | |||
``` | |||
language en-uk-north | |||
language en | |||
``` | |||
indicates that this voice is for the "en-uk-north" dialect, but it is | |||
also a main choice when a general "en" language is specified. Without | |||
the second **language** line, it would be disfavoured for "en" for being | |||
a more specialised voice. | |||
**gender \<gender\> [\<age\>]** | |||
This attribute is only a label for use in voice selection. It doesn't | |||
change the sound of the voice. | |||
\<gender\> may be male, female, or unknown.\ | |||
\<age\> is optional and gives an age in years. | |||
* **gender \<gender\> [\<age\>]** | |||
This attribute is only a label for use in voice selection. It doesn't | |||
change the sound of the voice. | |||
\<gender\> may be male, female, or unknown. | |||
\<age\> is optional and gives an age in years. | |||
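
The priority rule for **language** lines can be sketched in Python. This is a simplified model, not eSpeak NG's actual voice selection code: `parse_language_lines` and `preferred_voice` are hypothetical helpers, the two voice files are invented for the example, and the fallback to related language variants is ignored. The only facts taken from the text are the line format, the default priority of 5, and the rule that a lower value is preferred.

```python
DEFAULT_PRIORITY = 5  # default stated in the description of the language attribute


def parse_language_lines(voice_text):
    """Map each language code in a voice file to its priority (illustrative only)."""
    priorities = {}
    for line in voice_text.splitlines():
        fields = line.split("//")[0].split()  # drop any trailing comment
        if len(fields) >= 2 and fields[0] == "language":
            priority = int(fields[2]) if len(fields) > 2 else DEFAULT_PRIORITY
            priorities[fields[1]] = priority
    return priorities


def preferred_voice(candidates, language):
    """Return the voice name with the lowest priority value, or None if none match."""
    matching = {name: langs[language]
                for name, langs in candidates.items() if language in langs}
    if not matching:
        return None
    return min(matching, key=matching.get)


# Two hypothetical voice files; the explicit priority 3 is made up for the example.
northern = parse_language_lines("name northern\nlanguage en-uk-north\nlanguage en 3\n")
default = parse_language_lines("name default\nlanguage en\n")
choice = preferred_voice({"northern": northern, "default": default}, "en")
print(northern, default, choice)
```

Here the hypothetical "northern" voice wins for plain "en" because its explicit priority of 3 is lower than the default of 5.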
**pitch \<base\> \<range\>** | |||
### Voice Attributes | |||
Two integer values. The first gives a base pitch to the voice (value in | |||
* **pitch \<base\> \<range\>** | |||
Two integer values. The first gives a base pitch to the voice (value in | |||
Hz). The second controls the range of pitches used by the voice. Setting | |||
it equal to the base pitch will give a monotone. The default values are | |||
82 118. | |||
**formant \<number\> \<frequency\> \<strength\> \<width\> | |||
\<freq\_add\>** | |||
Systematically adjusts the frequency, strength, and width of the | |||
it equal to the base pitch will give a monotone. The default values are 82 118. | |||
* **formant \<number\> \<frequency\> \<strength\> \<width\> | |||
\<freq\_add\>** | |||
Systematically adjusts the frequency, strength, and width of the | |||
resonance peaks of the voice. Values are percentages of the default | |||
values. Changing these affects the tone/quality of the voice. | |||
**freq\_add**Adds a constant value (in Hz) to the frequency of the | |||
* **freq\_add** | |||
Adds a constant value (in Hz) to the frequency of the | |||
formant peak. The value may be negative. | |||
* Formants 1,2,3 are the standard three formants which define vowels. | |||
* Formant 0 is used to give a low frequency component to the sounds, of frequency lower than F1. | |||
* Formants 4,5 are higher than F3. They affect the quality of the voice. | |||
* Formants 6,7,8 are weak, high frequency, additions to vowels to give a clearer sound. | |||
- - - - | |||
**echo \<delay\> \<amplitude\>** | |||
Parameter 1 gives the delay in mS (0 to 250mS).\ | |||
Parameter 2 gives the echo amplitude (0 to 100).\ | |||
Adding some echo can give a clearer or more interesting sound, | |||
* **echo \<delay\> \<amplitude\>** | |||
Parameter 1 gives the delay in mS (0 to 250mS). | |||
Parameter 2 gives the echo amplitude (0 to 100). | |||
Adding some echo can give a clearer or more interesting sound, | |||
especially when listening through a domestic stereo sound system, rather | |||
than small computer speakers. | |||
**tone** | |||
Controls the tone of the sound.\ | |||
**tone** is followed by up to 4 pairs of \<frequency\> \<amplitude\> | |||
* **tone** | |||
Controls the tone of the sound. | |||
**tone** is followed by up to 4 pairs of \<frequency\> \<amplitude\> | |||
which define a frequency response graph. Frequency is in Hz and | |||
amplitude is in the range 0 to 255. The default is: | |||
` `{.western}`tone 600 170 1200 135 2000 110`{.western} | |||
This means that from frequency 0Hz to 600Hz the amplitude is 170. From | |||
amplitude is in the range 0 to 255. The default is: | |||
`tone 600 170 1200 135 2000 110` | |||
This means that from frequency 0Hz to 600Hz the amplitude is 170. From | |||
600Hz to 1200Hz the amplitude decreases from 170 to 135, then decreases | |||
to 110 at 2000Hz and remains at 110 at higher frequencies. This | |||
adjustment applies only to voiced sounds such as vowels and sonorant | |||
consonants (such as [n] and [l]). Unvoiced sounds such as [s] are | |||
unaffected. | |||
This **tone** statement can also appear in | |||
`espeak-data/config`{.western}, in which case it applies to all voices | |||
unaffected. | |||
This **tone** statement can also appear in | |||
`espeak-data/config`, in which case it applies to all voices | |||
which don't have their own **tone** statement. | |||
**flutter \<value\>** | |||
Default value: 2.\ | |||
Adds pitch fluctuations to give a wavering or older-sounding voice. A | |||
* **flutter \<value\>** | |||
Default value: 2. | |||
Adds pitch fluctuations to give a wavering or older-sounding voice. A | |||
large value (eg. 20) makes the voice sound "croaky". | |||
**roughness \<value\>** | |||
Default value: 2. Range 0 - 7\ | |||
Reduces the amplitude of alternate waveform cycles in order to make the | |||
* **roughness \<value\>** | |||
Default value: 2. Range 0 - 7 | |||
Reduces the amplitude of alternate waveform cycles in order to make the | |||
voice sound creaky. | |||
**voicing \<value\>** | |||
Default value: 100.\ | |||
Adjusts the strength of formant-synthesized sounds (vowels and sonorant | |||
* **voicing \<value\>** | |||
Default value: 100 | |||
Adjusts the strength of formant-synthesized sounds (vowels and sonorant | |||
consonants). | |||
**consonants \<value\> \<value\>** | |||
Default values: 100, 100.\ | |||
Adjusts the strength of noise sounds which are used in consonants. The | |||
first value is the strength of unvoiced consonants such as "s" and "t". | |||
The second value is the strength of the noise component of voiced | |||
* **consonants \<value\> \<value\>** | |||
Default values: 100, 100 | |||
Adjusts the strength of noise sounds which are used in consonants. The | |||
first value is the strength of unvoiced consonants such as "s" and "t". | |||
The second value is the strength of the noise component of voiced | |||
consonants such as "z" and "d". | |||
**breath \<up to 8 integer values\>** | |||
Default values: 0.\ | |||
Adds noise which corresponds to the formant frequency peaks. The values | |||
give the strength of noise for each formant peak (formants 1 to 8). | |||
Use together with a low or zero value of the **voicing** attribute to | |||
make a "whisper". For example:\ | |||
`breath 75 75 60 40 15 10 breathw 150 150 200 200 400 400 voicing 18 flutter 20 formant 0 100 0 100 // remove formant 0 `{.western} | |||
**breathw \<up to 8 integer values\>** | |||
These values give bandwidths of the noise peaks of the **breath** | |||
* **breath \<up to 8 integer values\>** | |||
Default values: 0. | |||
Adds noise which corresponds to the formant frequency peaks. The values | |||
give the strength of noise for each formant peak (formants 1 to 8). | |||
Use together with a low or zero value of the **voicing** attribute to | |||
make a "whisper". For example: | |||
``` | |||
breath 75 75 60 40 15 10 | |||
breathw 150 150 200 200 400 400 | |||
voicing 18 | |||
flutter 20 | |||
formant 0 100 0 100 // remove formant 0 | |||
``` | |||
* **breathw \<up to 8 integer values\>** | |||
These values give bandwidths of the noise peaks of the **breath** | |||
attribute. If **breathw** values are not given, then suitable default | |||
values will be used. | |||
**speed \<value\>** | |||
Default value 100.\ | |||
Adjusts the speaking speed by a percentage of the default rate. This | |||
* **speed \<value\>** | |||
Default value 100. | |||
Adjusts the speaking speed by a percentage of the default rate. This | |||
can be used if a language voice seems faster or slower compared to other | |||
voices. | |||
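
The frequency response defined by a **tone** statement can be sketched in Python. The functions `parse_tone` and `tone_amplitude` are hypothetical helpers, and the straight-line interpolation between points is an assumption: the description only says that the amplitude "decreases" from one point to the next, not which curve is used.

```python
def parse_tone(statement):
    """Turn 'tone f1 a1 f2 a2 ...' into a list of (frequency, amplitude) pairs."""
    values = [int(v) for v in statement.split()[1:]]
    return list(zip(values[0::2], values[1::2]))


def tone_amplitude(points, freq_hz):
    """Amplitude at freq_hz, assuming linear interpolation between the points."""
    first_freq, first_amp = points[0]
    if freq_hz <= first_freq:
        return float(first_amp)  # flat from 0 Hz up to the first point
    for (f0, a0), (f1, a1) in zip(points, points[1:]):
        if freq_hz <= f1:
            return a0 + (a1 - a0) * (freq_hz - f0) / (f1 - f0)
    return float(points[-1][1])  # flat above the last point


points = parse_tone("tone 600 170 1200 135 2000 110")
for freq in (300, 900, 1600, 5000):
    print(freq, tone_amplitude(points, freq))
```

With the default statement, the amplitude stays at 170 up to 600Hz, falls through 135 at 1200Hz, and stays at 110 from 2000Hz upwards, matching the description of the **tone** attribute.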
**phonemes \<name\>** | |||
### Language Attributes | |||
Specifies which set of phonemes to use from those contained in the | |||
* **phonemes \<name\>** | |||
Specifies which set of phonemes to use from those contained in the | |||
phontab, phonindex, and phondata data files. This is a **phonemetable** | |||
name as given in the "phoneme" source file. | |||
This parameter is usually not needed as it is set by default to the | |||
This parameter is usually not needed as it is set by default to the | |||
first two letters of the "language" parameter. However, different voices | |||
of the same language can use different phoneme sets, to give different | |||
accents. | |||
**dictionary \<name\>** | |||
Specifies which pair of dictionary files to use. eg. "english" indicates | |||
* **dictionary \<name\>** | |||
Specifies which pair of dictionary files to use. eg. "english" indicates | |||
that *espeak-data/en\_dict* should be used to translate from words to | |||
phonemes. This parameter is usually not needed as it is set by default | |||
to the first two letters of the "language" parameter. | |||
**dictrules \<list of rule numbers\>** | |||
Gives a list of conditional dictionary rules which are applied for this | |||
* **dictrules \<list of rule numbers\>** | |||
Gives a list of conditional dictionary rules which are applied for this | |||
voice. Rule numbers are in the range 0 to 31 and are specific to a | |||
language dictionary. They apply to rules in the language's **\_rules** | |||
dictionary file and also its **\_list** exceptions list. See | |||
[dictionary.html](dictionary.html). | |||
**replace \<flags\> \<phoneme\> \<replacement phoneme\>** | |||
Replace a phoneme by another whenever it occurs. | |||
\<replacement phoneme\> may be NULL. | |||
Flags: bit 0: replacement only occurs on the final phoneme of a word.\ | |||
Flags: bit 1: replacement doesn't occur in stressed syllables.\ | |||
eg. | |||
~~~~ {.western} | |||
* **replace \<flags\> \<phoneme\> \<replacement phoneme\>** | |||
Replace a phoneme by another whenever it occurs. | |||
\<replacement phoneme\> may be NULL. | |||
Flags: bit 0: replacement only occurs on the final phoneme of a word. | |||
Flags: bit 1: replacement doesn't occur in stressed syllables. | |||
eg. | |||
``` | |||
replace 0 h NULL // drops h's | |||
replace 0 V U // replaces vowel in 'strut' by that in 'foot' | |||
// as occurs in northern British English | |||
replace 3 N n // change 'fishing' to 'fishin' etc. | |||
// (only the last phoneme of a word, only in unstressed syllables) | |||
~~~~ | |||
The phoneme mnemonics can be defined for each language, but some are | |||
listed in [phonemes.html](phonemes.html) | |||
**stressLength \<8 integer values\>** | |||
``` | |||
The phoneme mnemonics can be defined for each language, but some are | |||
listed in [phonemes](phonemes.md) | |||
Eight integer parameters. These control the relative lengths of the | |||
* **stressLength \<8 integer values\>** | |||
Eight integer parameters. These control the relative lengths of the | |||
vowels in stressed and unstressed syllables. | |||
- - - - - - - - | |||
**stressAdd \<8 integer values\>** | |||
Eight integer parameters. These are added to the voice's corresponding | |||
* 0 unstressed | |||
* 1 diminished. Its use depends on the language. In English it's used for unstressed syllables within multisyllabic words. In Spanish it's used for unstressed final syllables. | |||
* 2 secondary stress | |||
* 3 words marked as "unstressed" in the dictionary | |||
* 4 not currently used | |||
* 5 not currently used | |||
* 6 stressed syllable (the main syllable in stressed words) | |||
* 7 tonic syllable (by default, the last stressed syllable in the clause) | |||
* **stressAdd \<8 integer values\>** | |||
Eight integer parameters. These are added to the voice's corresponding | |||
stressLength values. They are used in the voice variant files in | |||
`espeak-data/voices/!v`{.western} to give some variety. Negative values | |||
may be used. | |||
`espeak-data/voices/!v` to give some variety. Negative values may be used. | |||
**stressAmp \<8 integer values\>** | |||
Eight integer parameters. These control the relative amplitudes of the | |||
* **stressAmp \<8 integer values\>** | |||
Eight integer parameters. These control the relative amplitudes of the | |||
vowels in stressed and unstressed syllables (see stressLength above). | |||
The general default values are: 16, 16, 20, 20, 20, 24, 24, 22, although | |||
these defaults may be different for particular languages. | |||
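
As a rough illustration of how the eight-value stress attributes fit together, the Python sketch below parses a voice-file line into its eight integers and adds `stressAdd` values to `stressLength` values, as described above. The helper names are hypothetical and are not part of eSpeak NG; the `stressLength` and `stressAdd` numbers are made up for the example, and only the `stressAmp` values are the general defaults quoted above. It assumes a line holds just the attribute name and its values, with no trailing comment.

```python
# Stress-level names, indexed 0..7, following the list in the stressLength entry.
STRESS_LEVELS = [
    "unstressed",
    "diminished",
    "secondary stress",
    "marked as unstressed in the dictionary",
    "not currently used",
    "not currently used",
    "stressed syllable",
    "tonic syllable",
]


def parse_stress_attribute(line):
    """Split a line such as 'stressAmp 16 16 ...' into (name, list of 8 ints)."""
    name, *fields = line.split()
    values = [int(field) for field in fields]
    if len(values) != 8:
        raise ValueError(f"{name} expects 8 integer values, got {len(values)}")
    return name, values


def apply_stress_add(stress_length, stress_add):
    """Add each stressAdd value to the corresponding stressLength value."""
    return [base + delta for base, delta in zip(stress_length, stress_add)]


# The general default amplitudes quoted in the stressAmp entry.
name, amp = parse_stress_attribute("stressAmp 16 16 20 20 20 24 24 22")
for level, value in enumerate(amp):
    print(level, STRESS_LEVELS[level], value)

# Hypothetical lengths and adjustments, for illustration only.
_, length = parse_stress_attribute("stressLength 150 140 180 180 0 0 200 220")
_, add = parse_stress_attribute("stressAdd 10 10 0 0 0 0 -10 -10")
adjusted = apply_stress_add(length, add)
print(adjusted)
```

Negative `stressAdd` values shorten the corresponding stress level, which matches the note that negative values may be used.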
**intonation \<param1\>** | |||
- - - - | |||
**charset \<param1\>** | |||
* **intonation \<param1\>** | |||
1 Default. | |||
2 Less intonation. | |||
3 Less intonation, and comma does not raise the pitch. | |||
4 Pitch rises (rather than falls) at the end of sentence. | |||
The ISO 8859 character set number (not all are implemented). | |||
**dictmin \<value\>** | |||
* **charset \<param1\>** | |||
The ISO 8859 character set number (not all are implemented). | |||
Used for some languages to detect if additional language data is | |||
* **dictmin \<value\>** | |||
Used for some languages to detect if additional language data is | |||
installed. If the size of the compiled dictionary data for the language | |||
(the file `espeak-data/*_dict`{.western}) is less than this size then a | |||
(the file `espeak-data/*_dict`) is less than this size then a | |||
warning is given. | |||
**alphabet2 \<alphabet\> \<language\>** | |||
Used to specify a language to be used to speak words which are written | |||
in a non-native alphabet. eg: | |||
~~~~ {.western style="margin-bottom: 0.5cm"} | |||
alphabet2 cyr ru | |||
~~~~ | |||
Alphabet names include: latin, cyr (cyrillic), ar (arabic). The default | |||
* **alphabet2 \<alphabet\> \<language\>** | |||
Used to specify a language to be used to speak words which are written | |||
in a non-native alphabet. eg: | |||
``` | |||
alphabet2 cyr ru | |||
``` | |||
Alphabet names include: latin, cyr (cyrillic), ar (arabic). The default | |||
language for the Latin alphabet is English. | |||
**dictdialect \<dialect\>** | |||
Words can be marked in the \*\_list or \*\_rules file to be spoken using | |||
* **dictdialect \<dialect\>** | |||
Words can be marked in the \*\_list or \*\_rules file to be spoken using | |||
a foreign voice. This **dictdialect** attribute can be used to specify | |||
which dialect of the foreign language should be used, instead of the | |||
default dialect. The currently available dialects are:\ | |||
**en-us** (US English)\ | |||
**es-la** (Latin American Spanish).\ | |||
eg. | |||
~~~~ {.western style="margin-bottom: 0.5cm"} | |||
dictdialect en-us | |||
~~~~ | |||
This means that any words or rules which are marked with \_\^\_EN will be | |||
default dialect. The currently available dialects are: | |||
**en-us** (US English) | |||
**es-la** (Latin American Spanish). | |||
eg. | |||
``` | |||
dictdialect en-us | |||
``` | |||
This means that any words or rules which are marked with \_\^\_EN will be | |||
spoken with the US English voice instead of the default UK English | |||
voice. | |||
Additional attributes are available to set various internal options | |||
which control how language is processed. These would normally be set in | |||
the program code rather than in a voice file. | |||
## Voice Files Provided | |||
A number of Voice files are provided in the | |||
`espeak-data/voices`{.western} directory. You can select one of these | |||
`espeak-data/voices` directory. You can select one of these | |||
with the **-v \<voice filename\>** parameter to the speak command. | |||
**default** | |||
This voice is used if none is specified in the speak command. You can | |||
* **default** | |||
This voice is used if none is specified in the speak command. You can | |||
copy your preferred voice to "default" so you can use the speak command | |||
without the need to specify a voice. | |||
For a list of voices provided for English and other languages see | |||
[Languages](languages.html). | |||
[Languages](languages.md). |
@@ -1,8 +1,6 @@ | |||
phoneme i | |||
vowel starttype #i endtype #i | |||
length 100 | |||
IfNextVowelAppend(;) | |||
FMT(vowel/i_6) | |||
endphoneme | |||
@@ -59,8 +57,7 @@ endphoneme | |||
phoneme i: | |||
vowel starttype #i endtype #i | |||
length 250 | |||
IfNextVowelAppend(;) | |||
FMT(vowel/i_6) | |||
FMT(vowel/i_7) | |||
endphoneme | |||
phoneme E | |||
@@ -91,7 +88,7 @@ endphoneme | |||
phoneme a | |||
vowel starttype #a endtype #a | |||
length 100 | |||
FMT(vowel/aa_7) // a_5 or aa_7 | |||
FMT(vowel/aa_7) // possible variants: a_3, a_5 or aa_7 | |||
endphoneme | |||
phoneme a: | |||
@@ -101,52 +98,6 @@ phoneme a: | |||
FMT(vowel/aa_9) // was a_3 or aa_9 | |||
endphoneme | |||
phoneme a3 | |||
vowel starttype #a endtype #a | |||
length 100 | |||
//ChangeIfDiminished(a#) | |||
FMT(vowel/a_3) | |||
endphoneme | |||
phoneme a5 | |||
vowel starttype #a endtype #a | |||
length 100 | |||
//ChangeIfDiminished(a#) | |||
FMT(vowel/a_5) | |||
endphoneme | |||
phoneme a5: | |||
vowel starttype #a endtype #a | |||
length 350 | |||
FMT(vowel/a_5) | |||
endphoneme | |||
phoneme a77 | |||
vowel starttype #a endtype #a | |||
length 100 | |||
//ChangeIfDiminished(a#) | |||
FMT(vowel/aa_7) | |||
endphoneme | |||
phoneme a77: | |||
vowel starttype #a endtype #a | |||
length 350 | |||
FMT(vowel/aa_7) | |||
endphoneme | |||
phoneme a22 | |||
vowel starttype #a endtype #a | |||
length 100 | |||
//ChangeIfDiminished(a#) | |||
FMT(vowel/aa_2) | |||
endphoneme | |||
phoneme a22: | |||
vowel starttype #a endtype #a | |||
length 350 | |||
FMT(vowel/aa_2) | |||
endphoneme | |||
phoneme o | |||
vowel starttype #o endtype #o | |||
length 100 | |||
@@ -168,7 +119,7 @@ endphoneme | |||
phoneme u: | |||
vowel starttype #u endtype #u | |||
length 250 | |||
FMT(vowel/u) | |||
FMT(vowel/u_3) | |||
endphoneme | |||
@@ -300,7 +251,7 @@ phoneme c | |||
vls pal stop palatal | |||
voicingswitch J | |||
lengthmod 2 | |||
WAV(ustop/c, 80) | |||
WAV(ustop/c, 90) | |||
endphoneme | |||
phoneme l |
@@ -0,0 +1,149 @@ | |||
# espeak-ng - A multi-lingual software speech synthesizer. | |||
## SYNOPSIS | |||
__espeak-ng__ [<options>] [<<words>>] | |||
## DESCRIPTION | |||
__espeak-ng__ is a software speech synthesizer for English, and some other | |||
languages. | |||
## OPTIONS | |||
* `-h`, `--help`: | |||
Show summary of options. | |||
* `--version`: | |||
Prints the espeak library version and the location of the espeak voice | |||
data. | |||
* `-f <text file>`: | |||
Text file to speak. | |||
* `--stdin`: | |||
Read text input from stdin instead of a file. | |||
If neither -f nor --stdin is provided, <words> are spoken; if no | |||
words are provided, text is spoken from stdin a line at a time. | |||
* `-q`: | |||
Quiet, don't produce any speech (may be useful with -x). | |||
* `-a <integer>`: | |||
Amplitude, 0 to 200, default is 100. | |||
* `-g <integer>`: | |||
Word gap. Pause between words, units of 10ms at the default speed. | |||
* `-k <integer>`: | |||
Indicate capital letters with: 1=sound, 2=the word "capitals", higher | |||
values = a pitch increase (try -k20). | |||
* `-l <integer>`: | |||
Line length. If not zero (which is the default), consider lines less than | |||
this length as end-of-clause. | |||
* `-p <integer>`: | |||
Pitch adjustment, 0 to 99, default is 50. | |||
* `-s <integer>`: | |||
Speed in words per minute, default is 160. | |||
* `-v <voice name>`: | |||
Use voice file of this name from espeak-data/voices. A variant can be | |||
specified using <voice>+<variant>, such as af+m3. | |||
* `-w <wave file name>`: | |||
Write output to this WAV file, rather than speaking it directly. | |||
* `--split=<minutes>`: | |||
Used with `-w` to split the audio output into <minutes> recorded | |||
chunks. | |||
* `-b`: | |||
Input text encoding, 1=UTF8, 2=8 bit, 4=16 bit. | |||
* `-m`: | |||
Indicates that the text contains SSML (Speech Synthesis Markup Language) | |||
tags or other XML tags. Those SSML tags which are supported are | |||
interpreted. Other tags, including HTML, are ignored, except that some HTML | |||
tags such as <hr> <h2> and <li> ensure a break in the | |||
speech. | |||
* `-x`: | |||
Write phoneme mnemonics to stdout. | |||
* `-X`: | |||
Write phoneme mnemonics and a translation trace to stdout. If rules files | |||
have been built with --compile=debug, line numbers will also be displayed. | |||
* `-z`: | |||
No final sentence pause at the end of the text. | |||
* `--stdout`: | |||
Write speech output to stdout. | |||
* `--compile=voicename`: | |||
Compile the pronunciation rules and dictionary in the current directory. | |||
=<voicename> is optional and specifies which language is compiled. | |||
* `--compile-debug=voicename`: | |||
Compile the pronunciation rules and dictionary in the current directory as | |||
above, but include line numbers, which are shown when -X is used. | |||
* `--ipa`: | |||
Write phonemes to stdout using International Phonetic Alphabet. --ipa=1 Use | |||
ties, --ipa=2 Use ZWJ, --ipa=3 Separate with _. | |||
* `--tie=<character>`: | |||
The character to use to join multi-letter phonemes in -x and --ipa output. | |||
* `--path=<path>`: | |||
Specifies the directory containing the espeak-data directory. | |||
* `--pho`: | |||
Write mbrola phoneme data (.pho) to stdout or to the file in --phonout. | |||
* `--phonout=<filename>`: | |||
Write output from -x -X commands and mbrola phoneme data to this file. | |||
* `--punct="<characters>"`: | |||
Speak the names of punctuation characters during speaking. If | |||
=<characters> is omitted, all punctuation is spoken. | |||
* `--sep=<character>`: | |||
The character used to separate phonemes in the -x and --ipa output. | |||
* `--voices[=<language code>]`: | |||
Lists the available voices. If =<language code> is present then only | |||
those voices which are suitable for that language are listed. | |||
* `--voices=<directory>`: | |||
Lists the voices in the specified subdirectory. | |||
## EXAMPLES | |||
* `espeak-ng "This is a test"`: | |||
Speak the sentence "This is a test" using the default English voice. | |||
* `espeak-ng -f hello.txt`: | |||
Speak the contents of hello.txt using the default English voice. | |||
* `cat hello.txt | espeak-ng`: | |||
Speak the contents of hello.txt using the default English voice. | |||
* `espeak-ng -x hello`: | |||
Speak the word "hello" using the default English voice, and print the | |||
phonemes that were spoken. | |||
* `espeak-ng -ven-us "[[h@'loU]]"`: | |||
Speak the phonemes "h@'loU" using the American English voice. | |||
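
When calling the command from a script, the options above can be assembled programmatically. The Python sketch below builds an argument list from the `-v`, `-s`, `-p`, `-a` and `-w` options and checks the documented ranges for pitch and amplitude. The function name `build_espeak_command` and the file name `test.wav` are hypothetical and chosen only for this illustration; the sketch does not run espeak-ng itself.

```python
import shlex


def build_espeak_command(text, voice=None, speed=None, pitch=None,
                         amplitude=None, wav_file=None):
    """Return an argv list for espeak-ng using options from the OPTIONS section."""
    if pitch is not None and not 0 <= pitch <= 99:
        raise ValueError("pitch must be in the range 0 to 99")
    if amplitude is not None and not 0 <= amplitude <= 200:
        raise ValueError("amplitude must be in the range 0 to 200")

    argv = ["espeak-ng"]
    if voice is not None:
        argv += ["-v", voice]          # voice name, optionally <voice>+<variant>
    if speed is not None:
        argv += ["-s", str(speed)]     # speed in words per minute
    if pitch is not None:
        argv += ["-p", str(pitch)]     # pitch adjustment
    if amplitude is not None:
        argv += ["-a", str(amplitude)] # amplitude
    if wav_file is not None:
        argv += ["-w", wav_file]       # write a WAV file instead of speaking
    argv.append(text)
    return argv


cmd = build_espeak_command("This is a test", voice="en-us", speed=160,
                           wav_file="test.wav")
# Show the equivalent shell command line (shlex.join needs Python 3.8+).
print(shlex.join(cmd))
```

If espeak-ng is installed, the resulting list could be passed to `subprocess.run(cmd, check=True)`; passing a list rather than a shell string avoids quoting problems in the text to be spoken.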
## AUTHOR | |||
eSpeak NG is maintained by Reece H. Dunn <[email protected]>. It is based on | |||
eSpeak by Jonathan Duddington <[email protected]>. | |||
This manual page is based on the eSpeak page written by Luke Yelavich | |||
<[email protected]> for the Ubuntu project. |