
Markdown document files cleaned up.

master
Valdis Vitolins 9 years ago
parent
commit
5530cf85c8
26 changed files with 1743 additions and 1753 deletions
  1. .gitignore (+2, -0)
  2. Makefile.am (+11, -5)
  3. README.md (+11, -0)
  4. dictsource/lv_rules (+5, -0)
  5. docs/add_language.md (+44, -31)
  6. docs/analyse.html (+0, -69)
  7. docs/analyse.md (+41, -30)
  8. docs/commands.html (+0, -227)
  9. docs/commands.md (+6, -6)
  10. docs/dictionary.md (+288, -227)
  11. docs/docindex.html (+0, -67)
  12. docs/editor.html (+0, -75)
  13. docs/editor.md (+54, -28)
  14. docs/editor_if.html (+0, -143)
  15. docs/editor_if.md (+166, -27)
  16. docs/index.html (+0, -87)
  17. docs/index.md (+75, -0)
  18. docs/intonation.md (+82, -68)
  19. docs/languages.md (+57, -36)
  20. docs/mbrola.md (+85, -57)
  21. docs/phonemes.md (+136, -259)
  22. docs/phontab.md (+335, -48)
  23. docs/ssml.md (+37, -23)
  24. docs/voices.md (+155, -187)
  25. phsource/ph_latvian (+4, -53)
  26. src/espeak-ng.1.ronn (+149, -0)

+ 2
- 0
.gitignore

@@ -46,6 +46,8 @@ libespeak-ng.so*
*.html
*.exe

src/espeak-ng.1

src/espeak-ng
src/espeakedit
src/speak-ng

+ 11
- 5
Makefile.am

@@ -32,7 +32,6 @@ EXTRA_DIST += ChangeLog

all-local: \
espeak-data/phontab \
docs/speak_lib.h \
dictionaries \
mbrola

@@ -53,12 +52,19 @@ distclean-local:
##### documentation:

%.html: %.md _layouts/webpage.html
cat $< | sed -e 's/\.md)/.html)/g' | kramdown --template _layouts/webpage.html > $@
cat $< | sed -e 's/\.md)/.html)/g' -e 's/\.ronn/.html/g' | \
kramdown --template _layouts/webpage.html > $@

docs: README.html
%.html: %.ronn
ronn --html $<

docs/speak_lib.h: src/include/espeak-ng/speak_lib.h
cp $< $@
src/espeak-ng.1: src/espeak-ng.1.ronn
ronn --roff $<

docs: docs/index.html \
src/espeak-ng.1.html \
README.html \
src/espeak-ng.1

##### build targets:
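
The revised `%.html: %.md` rule above passes each Markdown file through two `sed` substitutions before kramdown renders it, so that links between documents point at the generated HTML pages. As a rough illustration of what those two substitutions do to link targets, here is a Python sketch. The function name `rewrite_doc_links` is hypothetical and is not part of the build; only the two patterns are taken from the rule.

```python
import re


def rewrite_doc_links(markdown_text: str) -> str:
    """Mimic the two sed expressions from the %.html: %.md rule.

    Hypothetical helper for illustration only; the real build uses sed.
    """
    # sed -e 's/\.md)/.html)/g' : a ".md" directly before ")" ends a
    # Markdown link target, so it is retargeted to the HTML page.
    text = re.sub(r"\.md\)", ".html)", markdown_text)
    # sed -e 's/\.ronn/.html/g' : every ".ronn" becomes ".html".
    text = re.sub(r"\.ronn", ".html", text)
    return text


sample = "[main](docs/index.md) and [espeak-ng](src/espeak-ng.1.ronn)"
print(rewrite_doc_links(sample))
```

Note that the second pattern is not anchored to a closing parenthesis, so it also rewrites any `.ronn` that appears in running text, not only in link targets.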


+ 11
- 0
README.md

@@ -9,6 +9,7 @@
- [Cross-Compiling For Windows](#cross-compiling-for-windows)
- [Testing](#testing)
- [Installing](#installing)
- [Documentation](#documentation)
- [Building Voices](#building-voices)
- [Adding New Voices](#adding-new-voices)
- [Praat Changes](#praat-changes)
@@ -42,6 +43,7 @@ Optionally, you need:
To build the documentation, you need:

1. the `kramdown` markdown processor.
2. the `ronn` man-page markdown processor.

### Debian

@@ -65,6 +67,7 @@ Documentation dependencies:
| Dependency | Install |
|---------------|--------------------------------------|
| kramdown | `sudo apt-get install ruby-kramdown` |
| ronn | `sudo apt-get install ruby-ronn` |

Cross-compiling for windows:

@@ -181,6 +184,14 @@ already have an espeak-ng install by running:

find /usr/lib | grep libespeak-ng

## Documentation

The [main documentation](docs/index.md) for eSpeak NG provides more information
on using and creating voices/languages for eSpeak NG.

The [espeak-ng](src/espeak-ng.1.ronn) command-line documentation provides a
reference of the different command-line options available, with example usage.

## Building Voices

If you are modifying a language's phoneme, voice or dictionary files, you

+ 5
- 0
dictsource/lv_rules

@@ -37,12 +37,14 @@

.group a
a a
a (a a_!
ai ai
au au
ap ap // prefix

.group ā
ā a:
ā (ā a:_!

.group b
b b
@@ -109,6 +111,8 @@

.group i
i i
i (i i_!
i (ī i_!
ie ie
iu iu

@@ -1150,6 +1154,7 @@

.group u
u u
u (u u_!
ui ui

.group ū

+ 44
- 31
docs/add_language.md

@@ -1,5 +1,15 @@
6. ADDING OR IMPROVING A LANGUAGE {.western}
---------------------------------
# Table of contents

* [Adding or improving a language](#adding-or-improving-a-language)
* [Language Code](#language-code)
* [Language Files](#language-files)
* [Voice File](#voice-file)
* [Phoneme Definition File](#phoneme-definition-file)
* [Dictionary Files](#dictionary-files)
* [Program Code](#program-code)
* [Improving a Language](#improving-a-language)

# Adding or improving a language

Most of the work doesn't need any programming knowledge. Just an
understanding of the language, an awareness of its features, patience
@@ -11,10 +21,9 @@ In many cases it should be fairly easy to add a rough implementation of
a new language, hopefully enough to be intelligible. After that it's a
gradual process of improvement.

### 6.1 Language Code {.western}
## Language Code

Generally, the language's international [ISO
639-1](http://en.wikipedia.org/wiki/ISO_639-1) code is used to identify
Generally, the language's international [ISO 639-1](http://en.wikipedia.org/wiki/ISO_639-1) code is used to identify
the language. It is used in the filenames which contain the language's
data. In the examples below the code **"fr"** is used as an example.
Replace this with the code of your language.
@@ -26,31 +35,28 @@ It is possible to have different variants of a language for different
dialects. For example the sound of some phonemes are changed, or some of
the pronunciation rules differ.

### 6.2 Language Files {.western}
## Language Files

The following files are needed for your language.

- - - -

The **fr\_rules** and **fr\_list** files are compiled to produce the
file **espeak-data/fr\_dict**, which eSpeak uses when it is speaking.

### 6.3 Voice File {.western}
## Voice File

Each language needs a voice file in **espeak-data/voices** or
**espeak-data/voices/test**. The filename of the default voice for a
language should be the same as the language code (eg. "fr" for French).

Details of the contents of voice files are given in
[voices.html](http://espeak.sf.net/voices.html).
[voices](voices.md).

The simplest voice file would contain just 2 lines to give the language
name and language code, eg:

~~~~ {.western}
name french
language fr
~~~~
name french
language fr
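
A minimal sketch of producing that two-line voice file from a script. The helper name `minimal_voice_file` is hypothetical; saving the result under **espeak-data/voices** with the language code as its filename is left to the caller.

```python
def minimal_voice_file(name: str, language: str) -> str:
    """Return the text of the simplest voice file: a name and a language code.

    Hypothetical helper for illustration; not part of eSpeak NG.
    """
    return f"name {name}\nlanguage {language}\n"


# Contents for a default French voice, which would be saved as "fr".
print(minimal_voice_file("french", "fr"), end="")
```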

This language code specifies which phoneme table and dictionary to use
(i.e. **phonemetable fr** and **espeak-data/fr\_dict**). If
@@ -59,7 +65,7 @@ attributes in the voice file. For example you may want to start the
implementation of a new language by using the phoneme table of an
existing language.

### 6.4 Phoneme Definition File {.western}
## Phoneme Definition File

You must first decide on the set of phonemes (vowel and consonant
sounds) for the language. These should be defined in a phoneme
@@ -67,10 +73,8 @@ definition file **ph\_xxxx**, where "ph\_xxxx" is the name of your
language. A reference to this file is then included at the end of the
master phoneme file, **phsource/phonemes**, eg:

~~~~ {.western}
phonemetable fr base
include ph_french
~~~~
phonemetable fr base
include ph_french

This example defines a phoneme table **"fr"** which inherits the
contents of phoneme table **"base"**. Its contents are found in the file
@@ -89,7 +93,7 @@ additional consonants that are needed), or phonemes whose definitions
differ from the inherited version (eg. the redefinition of a consonant).

Details of phonemes files are given in
[phontab.html](http://espeak.sf.net/phontab.html).
[phontab](phontab.md).

The **Compile phoneme data** function of the **espeakedit** program
compiles the phonemes files of all languages to produce the files
@@ -101,7 +105,7 @@ in eSpeak, together with the available vowel files which can be used to
define vowel phonemes, will be sufficient. At least for an initial
implementation.

### 6.5 Dictionary Files {.western}
## Dictionary Files

Once the language's phonemes have been defined, then pronunciation
dictionary data can be produced in order to translate the language's
@@ -111,23 +115,31 @@ exceptions list, and attributes of certain words). The corresponding
compiled data file is **espeak-data/fr\_dict** which is produced from
**fr\_rules** and **fr\_list** sources by the command:

> `espeak-ng --compile=fr`{.western}.
`espeak-ng --compile=fr`

Or by using the **espeakedit** program.
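
As a rough sketch of how the pieces fit together, the Python below derives the source and compiled file names for a language code and builds the `--compile` command line as an argument list. The helper names `dictionary_paths` and `compile_command` and the default directory arguments are assumptions made for illustration; only the `_rules`, `_list` and `_dict` naming and the `--compile=<code>` option come from the text above. The command is printed rather than run, so the sketch works without espeak-ng installed.

```python
from pathlib import Path


def dictionary_paths(lang: str, source_dir: str = ".", data_dir: str = "espeak-data") -> dict:
    """Return the dictionary source files and compiled output for a language code.

    Hypothetical helper; the directory defaults are assumptions.
    """
    src = Path(source_dir)
    return {
        "sources": [src / f"{lang}_rules", src / f"{lang}_list"],
        "compiled": Path(data_dir) / f"{lang}_dict",
    }


def compile_command(lang: str) -> list:
    """Build the argument list for compiling one language's dictionary."""
    return ["espeak-ng", f"--compile={lang}"]


paths = dictionary_paths("fr")
print("sources: ", [p.name for p in paths["sources"]])
print("compiled:", paths["compiled"].name)
print("command: ", " ".join(compile_command("fr")))
```

The argument list can be passed to `subprocess.run` from the directory that holds the `_rules` and `_list` sources.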

Details of the contents of the dictionary files are given in
[dictionary.html](http://espeak.sf.net/dictionary.html).
[dictionary](dictionary.md).

The **fr\_list** file contains:

- - - -

### 6.6 Program Code {.western}
* Pronunciations which are exceptions to the rules in fr_rules (eg. foreign names).
* Pronunciation of letter names, symbol names, and punctuation names.
* Pronunciation of numbers.
* Attributes for words. For example, common function words which should not be stressed, or conjunctions which should be preceded by a pause.

## Program Code

The behaviour of the eSpeak program is controlled by various options
such as:

- - - -

* Default rules for which syllable of a word has the main stress.
* Relative lengths and amplitude of vowels in stressed and unstressed syllables.
* Which intonation tunes to use.
* Rules for speaking numbers.

The function SetTranslator() at the start of the source code file
tr\_languages.cpp recognizes the language code and sets the appropriate
@@ -135,18 +147,19 @@ options. For a new language, you would add its language code and the
required options in SetTranslator(). However, this may not be necessary
during testing because most of the options can also be set in the voice
file in espeak-data/voices (see [Voice
files](http://espeak.sf.net/voices.html)).
files](voices.md)).

### 6.7 Improving a Language {.western}
## Improving a Language

Listen carefully to the eSpeak voice. Try to identify what sounds wrong
and what needs to be improved.

- - - - -

**If you are interested in working on a language, please contact me so
that I can set up the initial data and discuss the features of the
language.**
* Make the spelling-to-phoneme translation rules more accurate, including the position of stressed syllables within words. Some languages are easier than others. I expect most are easier than English.
* Improve the sounds of the phonemes. It may be that a phoneme should sound different depending on adjacent sounds, or whether it's at the start or the end of a word, between vowels, in a stressed or unstressed syllable, etc. This may consist of making small adjustments to vowel and diphthong quality or length, or adjusting the strength of consonants. Phoneme definitions can include conditional statements which can be used to change the sound of a phoneme depending on its environment. Bigger changes may be recording new or replacement consonant sounds, or may even need program code to implement new types of sounds.
* Some common words should be added to the dictionary (the fr_list file for the language) with an "unstressed" attribute **\$u** or **\$u+** (eg. in English, words such as "the", "is", "had", "my", "she", "of", "in", "some"), or should be preceded by a short pause (such as "and", "but", "which"), or have other attributes, in order to make the speech flow better.
* Improve the rhythm of the speech by adjusting the relative lengths of vowels in different contexts, eg. stressed/unstressed syllable, or depending on the following phonemes. This is important for making the speech sound good for the language.
* Make new intonation "tunes" for statements or questions (see [Intonation](intonation.md)).

For most of the eSpeak voices, I do not speak or understand the
language, and I do not know how it should sound. I can only make

+ 0
- 69
docs/analyse.html

@@ -1,69 +0,0 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>

<head>
<title></title>
<meta name="GENERATOR" content="Quanta Plus">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<A href="docindex.html">Back</A>
<hr>
<h2>ANALYSIS</h2>
<hr>
(Further notes are needed)
<p>
Recordings of spoken words and phrases can be analysed to try and make eSpeak match a language more closely.

Unlike most other (larger and better quality) synthesizers, eSpeak's data is not produced directly from recorded sounds. To use an analogy, it's like a drawing or sketch compared with a photograph. Or vector graphics compared with a bitmap image. It's smaller, less accurate, with less subtlety, but it can sometimes show some aspects of the picture more clearly than a more accurate image.

<h4>Recording Sounds</h4>
Recordings should be made while speaking slowly, clearly, and firmly and loudly (but not shouting). Speak about half a metre from the microphone. Try to avoid background noise and hum interference from electrical power cables.


<h4>Praat</h4>
I use a modified version of the praat program (<a href="http://www.praat.org">www.praat.org</a>) to view and analyse both sound recordings and output from eSpeak. The modification adds a new function (<code>Spectrum->To_eSpeak</code>) which analyses a voiced sound and produces a file which can be loaded into espeakedit. Details of the modification are in the <code>"praat-mod"</code> directory in the espeakedit package.

The analysis contains a sequence of frames, one per cycle at the speech's fundamental frequency. Each frame is a short time spectrum, together with praat's estimation of the f1 to f5 formant frequencies at the time of that cycle.

I also use Praat's <code>New->Record_mono_sound</code> function to make sound recordings.

<h3>Vowels and Diphthongs</h3>
<h4>Analysing a Recording</h4>

Make a recording, with a male voice, and trim it in Praat to keep just the required vowel sound. Then use the new <code>Spectrum->To_eSpeak</code> modification (this was named <code>To_Spectrogram2</code> in earlier versions) to analyse the sound. It produces a file named <code>"spectrum.dat"</code>.

Load the <code>"spectrum.dat"</code> file into espeakedit. Espeakedit has two Open functions, <code>File->Open</code> and <code>File->Open2</code>. They are the same, except that they remember different paths. I generally use <code>File->Open2</code> for reading the <code>"spectrum.dat"</code> file.

The data is displayed in espeakedit as a sequence of spectrum frames (see <a href="editor.html">editor.html</a>).

<h4>Tone Quality</h4>

It can be difficult to match the tonal quality of a new vowel to be compatible with existing vowel files. This is determined by the relative heights and widths of the formant peaks. These vary depending on how the recording was made, the microphone, and the strength and tone of the voice. Also the positions of the higher peaks (F3 upwards) can vary depending on the characteristics of the speaker's voice. Formant peaks correspond to resonances within the mouth and throat, and they depend on its size and shape. With a female voice, all the formants (F1 upwards) are generally shifted to higher frequencies.

For these reasons, it's best to use a male voice, and to use its analysed spectra only as guidance. Rather than construct formant-peaks entirely to match the analysed data, instead copy keyframes from a similar existing vowel. Then make small adjustments to match the position of the F1, F2, F3 formant peaks and hopefully produce the required vowel sound.

<h4>Using an Existing Vowel File</h4>

Choose a similar vowel file from <code>phsource/vowel</code> and open it into espeakedit. It may be useful to use <code>phsource/vowel/vowelchart</code> as a map to show how vowel files compare with each other. You can select a keyframe from the vowel file and use CTRL-C and CTRL-V to copy the green formant peaks onto a frame of the new spectrum sequence. Then adjust the peaks to match the new frame. Press F1 to hear the sound of the formant peaks in the selected frame.

The F0 peak is provided in order to adjust the correct balance of low frequencies, below the F1 peak. If the sound is too muffled, or conversely, too "thin", try adjusting the amplitude or position of the F0 peak.


<h4>Length and Amplitude</h4>

Use an existing vowel file as a guide for how to set the amplitude and length of the keyframes. At the right of each keyframe, its length is shown in mS and under that is its relative (RMS) amplitude.

The second keyframe should be marked with a red marker (use CTRL-M to toggle this). This divides the vowel into the front-part (with one frame), and the rest.

Use F2 to play the sound of the new vowel sequence. It will also produce a WAV file (the default name is speech.wav) which you can read into praat to see whether it has a sensible shape.


<h4>Using the New Vowel</h4>

Make a new directory (eg. vwl_xx) in phsource for your new vowels. Save the spectrum sequence with a name which you have chosen for it.

You can then edit the phoneme file for your language (eg. phsource/ph_xxx), and change a phoneme to refer to your new vowel file. Then do <code>Data->Compile_Phoneme_Data</code> from espeakedit's menubar to re-compile the phoneme data.

</body>
</html>

+ 41
- 30
docs/analyse.md

@@ -1,55 +1,66 @@
ANALYSIS
========
# Table of contents

* [ANALYSIS](#analysis)
* [Recording Sounds](#recording-sounds)
* [Praat](#praat)
* [Vowels and Diphthongs](#vowels-and-diphthongs)
* [Analysing a Recording](#analysing-a-recording)
* [Tone Quality](#tone-quality)
* [Using an Existing Vowel File](#using-an-existing-vowel-file)
* [Length and Amplitude](#length-and-amplitude)
* [Using the New Vowel](#using-the-new-vowel)

# ANALYSIS

(Further notes are needed)

Recordings of spoken words and phrases can be analysed to try and make
eSpeak match a language more closely. Unlike most other (larger and
better quality) synthesizers, eSpeak's data is not produced directly
eSpeak NG match a language more closely. Unlike most other (larger and
better quality) synthesizers, eSpeak NG's data is not produced directly
from recorded sounds. To use an analogy, it's like a drawing or sketch
compared with a photograph. Or vector graphics compared with a bitmap
image. It's smaller, less accurate, with less subtlety, but it can
sometimes show some aspects of the picture more clearly than a more
accurate image.

#### Recording Sounds {.western}
## Recording Sounds

Recordings should be made while speaking slowly, clearly, and firmly and
loudly (but not shouting). Speak about half a metre from the microphone.
Try to avoid background noise and hum interference from electrical power
cables.

#### Praat {.western}
## Praat

I use a modified version of the praat program
([www.praat.org](www.praat.org)) to view and analyse both sound
recordings and output from eSpeak. The modification adds a new function
(`Spectrum->To_eSpeak`{.western}) which analysis a voiced sound and
([www.praat.org](http://www.praat.org)) to view and analyse both sound
recordings and output from eSpeak NG. The modification adds a new function
(**Spectrum->To_eSpeak**) which analyses a voiced sound and
produces a file which can be loaded into espeakedit. Details of the
modification are in the `"praat-mod"`{.western} directory in the
modification are in the `praat-mod` directory in the
espeakedit package. The analysis contains a sequence of frames, one per
cycle at the speech's fundamental frequency. Each frame is a short time
spectrum, together with praat's estimation of the f1 to f5 formant
frequencies at the time of that cycle. I also use Praat's
`New->Record_mono_sound`{.western} function to make sound recordings.
**New->Record_mono_sound** function to make sound recordings.

### Vowels and Diphthongs {.western}
# Vowels and Diphthongs

#### Analysing a Recording {.western}
## Analysing a Recording

Make a recording, with a male voice, and trim it in Praat to keep just
the required vowel sound. Then use the new
`Spectrum->To_eSpeak`{.western} modification (this was named
`To_Spectrogram2`{.western} in earlier versions) to analyse the sound.
It produces a file named `"spectrum.dat"`{.western}. Load the
`"spectrum.dat"`{.western} file into espeakedit. Espeakedit has two Open
functions, `File->Open`{.western} and `File->Open2`{.western}. They are
**Spectrum->To_eSpeak** modification (this was named
`To_Spectrogram2` in earlier versions) to analyse the sound.
It produces a file named `spectrum.dat`. Load the
`spectrum.dat` file into espeakedit. Espeakedit has two Open
functions, **File->Open** and **File->Open2**. They are
the same, except that they remember different paths. I generally use
`File->Open2`{.western} for reading the `"spectrum.dat"`{.western} file.
**File->Open2** for reading the `spectrum.dat` file.
The data is displayed in espeakedit as a sequence of spectrum frames
(see [editor.html](editor.html)).
(see [editor](editor.md)).

#### Tone Quality {.western}
## Tone Quality

It can be difficult to match the tonal quality of a new vowel to be
compatible with existing vowel files. This is determined by the relative
@@ -66,11 +77,11 @@ analysed data, instead copy keyframes from a similar existing vowel.
Then make small adjustments to match the position of the F1, F2, F3
formant peaks and hopefully produce the required vowel sound.

#### Using an Existing Vowel File {.western}
## Using an Existing Vowel File

Choose a similar vowel file from `phsource/vowel`{.western} and open it
Choose a similar vowel file from `phsource/vowel` and open it
into espeakedit. It may be useful to use
`phsource/vowel/vowelchart`{.western} as a map to show how vowel files
`phsource/vowel/vowelchart` as a map to show how vowel files
compare with each other. You can select a keyframe from the vowel file
and use CTRL-C and CTRL-V to copy the green formant peaks onto a frame
of the new spectrum sequence. Then adjust the peaks to match the new
@@ -80,22 +91,22 @@ low frequencies, below the F1 peak. If the sound is too muffled, or
conversely, too "thin", try adjusting the amplitude or position of the
F0 peak.

#### Length and Amplitude {.western}
## Length and Amplitude

Use an existing vowel file as a guide for how to set the amplitude and
length of the keyframes. At the right of each keyframe, its length is
shown in mS and under that is its relative (RMS) amplitude. The second
shown in milliseconds and under that is its relative (RMS) amplitude. The second
keyframe should be marked with a red marker (use CTRL-M to toggle this).
This divides the vowel into the front-part (with one frame), and the
rest. Use F2 to play the sound of the new vowel sequence. It will also
produce a WAV file (the default name is speech.wav) which you can read
into praat to see whether it has a sensible shape.

#### Using the New Vowel {.western}
## Using the New Vowel

Make a new directory (eg. vwl\_xx) in phsource for your new vowels. Save
Make a new directory (eg. `vwl_xx`) in phsource for your new vowels. Save
the spectrum sequence with a name which you have chosen for it. You can
then edit the phoneme file for your language (eg. phsource/ph\_xxx), and
then edit the phoneme file for your language (eg. `phsource/ph_xxx`), and
change a phoneme to refer to your new vowel file. Then do
`Data->Compile_Phoneme_Data`{.western} from espeakedit's menubar to
**Data->Compile_Phoneme_Data** from espeakedit's menubar to
re-compile the phoneme data.

+ 0
- 227
docs/commands.html

@@ -1,227 +0,0 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>

<head>
<title>eSpeak Speech Synthesizer</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>
<body>
<A href="index.html">Back</A>
<hr>
<h2>2.1 INSTALLATION</h2>
<hr>
<h3>2.1.1 Linux and other Posix systems</h3>
There are two versions of the command line program. They both have the same command parameters (see below).
<ol>
<li><strong>espeak-ng</strong> uses speech engine in the <strong>libespeak-ng</strong> shared library. The libespeak-ng library must first be installed.
<p>
<li><strong>speak-ng</strong> is a stand-alone version which includes its own copy of the speech engine.
</ol>
Place the <strong>espeak-ng</strong> or <strong>speak-ng</strong> executable file in the command path, eg in <strong>/usr/local/bin</strong>
<p>
Place the "<strong>espeak-data</strong>" directory in /usr/share as <strong>/usr/share/espeak-data</strong>.<br>
Alternatively if it is placed in the user's home directory (i.e. <strong>/home/&lt;user&gt;/espeak-data</strong>)
then that will be used instead.
<p>
<h4>Dependencies</h4>
<strong>espeak-ng</strong> uses the PortAudio sound library (version 18), so you will need to have the <strong>libportaudio0</strong> library package installed. It may be already, since it's used by other software, such as OpenOffice.org and the Audacity sound editor.<p>
Some Linux distributions (eg. SuSe 10) have version 19 of PortAudio which has a slightly different API. The speak program can be compiled to use version 19 of PortAudio by copying the file portaudio19.h to portaudio.h before compiling.<p>
The speak program may be compiled without using PortAudio, by removing the line<pre> #define USE_PORTAUDIO
</pre>in the file speech.h.
<p>&nbsp;<hr>

<h3>2.1.2 Windows</h3>
The installer: <strong>setup_espeak.exe</strong> installs the SAPI5 version of eSpeak.
During installation you need to specify which voices you want to appear in SAPI5 voice menus.
<p>
It also installs a command line program <strong>espeak-ng</strong> in the espeak-ng program directory.

<p>&nbsp;<hr>
<h2>2.2 COMMAND OPTIONS</h2>
<hr>
<h3>2.2.1 Examples</h3>
To use at the command line, type:<br>
&nbsp; <strong>espeak-ng "This is a test"</strong><br>
or<br>
&nbsp; <strong>espeak-ng -f &lt;text file&gt;</strong>
<p>
Or just type<br>
&nbsp; <strong>espeak-ng</strong><br>
followed by text on subsequent lines. Each line is spoken when
RETURN is pressed.
<p>
Use <strong>espeak-ng -x</strong> to see the corresponding phoneme codes.
<p>&nbsp;<hr>
<h3>2.2.2 The Command Line Options</h3>
<dl>
<dt>
<strong>espeak-ng [options] ["text words"]</strong><br>
<dd>Text input can be taken either from a file, from a string in the command, or from stdin.
<p>
<dt>
<strong>-f &lt;text file&gt;</strong><br>
<dd>Speaks a text file.
<p>
<dt>
<strong> --stdin</strong><br>
<dd>Takes the text input from stdin.
<p>
<dt>
If neither -f nor --stdin is given, then the text input is taken from "text words" (a text string within double quotes). <br>If that is not present then text is taken from stdin, but each line is treated as a separate sentence.
<p>
<dt>
<strong>-a &lt;integer&gt;</strong><br>
<dd>Sets amplitude (volume) in a range of 0 to 200. The default is 100.
<p>
<dt>
<strong>-p &lt;integer&gt;</strong><br>
<dd>Adjusts the pitch in a range of 0 to 99. The default is 50.
<p>
<dt>
<strong>-s &lt;integer&gt;</strong><br>
<dd>Sets the speed in words-per-minute (approximate values for the default English voice, others may differ slightly). The default value is 175. I generally use a faster speed
of 260. The lower limit is 80. There is no upper limit, but about 500 is probably a practical maximum.
<p>
<dt>
<strong>-b &lt;integer&gt;</strong><br>
<dd>Input text character format.<p>
1 &nbsp; UTF-8. This is the default.<p>
2 &nbsp; The 8-bit character set which corresponds to the language (eg. Latin-2 for Polish).<p>
4 &nbsp; 16 bit Unicode.<p>
Without this option, eSpeak assumes text is UTF-8, but will automatically switch to the 8-bit character set if it finds an illegal UTF-8 sequence.
<p>
<dt>
<strong>-g &lt;integer&gt;</strong><br>
<dd>Word gap. This option inserts a pause between words. The value is the length of the pause, in units of 10 mS (at the default speed of 170 wpm).
<p>
<dt>
<strong>-h </strong> or <strong> --help</strong><br>
<dd>The first line of output gives the eSpeak version number.
<p>
<dt>
<strong>-k &lt;integer&gt;</strong><br>
<dd>Indicate words which begin with capital letters.<p>
1 &nbsp; eSpeak uses a click sound to indicate when a word starts with a capital letter, or double click if word is all capitals.<p>
2 &nbsp; eSpeak speaks the word "capital" before a word which begins with a capital letter.<p>
Other values: &nbsp; eSpeak increases the pitch for words which begin with a capital letter. The greater the value, the greater the increase in pitch. Try -k20.
<p>
<dt>
<strong>-l &lt;integer&gt;</strong><br>
<dd>Line-break length, default value 0. If set, then lines which are shorter
than this are treated as separate clauses and spoken separately with a
break between them. This can be useful for some text files, but bad for
others.
<p>
<dt>
<strong>-m</strong><br>
<dd>Indicates that the text contains SSML (Speech Synthesis Markup Language) tags or other XML tags. Those SSML tags which are supported are interpreted. Other tags, including HTML, are ignored, except that some HTML tags such as &lt;hr&gt; &lt;h2&gt; and &lt;li&gt; ensure a break in the speech.
<p>
<dt>
<strong>-q</strong><br><dd>
Quiet. No sound is generated. This may be useful with options such as -x and --pho.
<p>
<dt>
<strong>-v &lt;voice filename&gt;[+&lt;variant&gt;]</strong><br>
<dd>Sets a Voice for the speech, usually to select a language. eg:
<pre> espeak-ng -vaf</pre>
To use the Afrikaans voice. A modifier after the voice name can be used to vary the tone of the voice, eg:
<pre> espeak-ng -vaf+3</pre>
The variants are <code> +m1 +m2 +m3 +m4 +m5 +m6 +m7</code> for male voices and <code> +f1 +f2 +f3 +f4 </code> which simulate female voices by using higher pitches. Other variants include <code>+croak</code> and <code>+whisper</code>.
<p>
&lt;voice filename&gt; is a file within the <code>espeak-data/voices</code> directory.<br>
&lt;variant&gt; is a file within the <code>espeak-data/voices/!v</code> directory.<p>
Voice files can specify a language, alternative pronunciations or phoneme sets, different pitches, tonal qualities, and prosody for the voice.
See the <a href="voices.html">voices.html</a> file.<p>
Voice names which start with <b>mb-</b> are for use with Mbrola diphone voices, see <a href="mbrola.html">mbrola.html</a><p>
Some languages may need additional dictionary data, see <a href="languages.html">languages.html</a>
<p>
<dt>
<strong>-w &lt;wave file&gt;</strong><br>
<dd>Writes the speech output to a file in WAV format, rather than speaking it.
<p>
<dt>
<strong>-x</strong><br>
<dd>The phoneme mnemonics, into which the input text is translated, are written to stdout.
If a phoneme name contains more than one letter (eg. [tS]), the --sep or --tie option can be used to distinguish
this from separate phonemes.
<p>
<dt>
<strong>-X</strong><br>
<dd>As -x, but in addition, details are shown of the pronunciation rule and dictionary list lookup. This can be useful to see why a certain pronunciation is being produced. Each matching pronunciation rule is listed, together with its score, the highest scoring rule being used in the translation. "Found:" indicates the word was found in the dictionary lookup list, and "Flags:" means the word was found with only properties and not a pronunciation. You can see when a word has been retranslated after removing a prefix or suffix.
<p>
<dt>
<strong>-z</strong><br>
<dd>The option removes the end-of-sentence pause which normally occurs at the end of the text.
<p>
<dt>
<strong>--stdout</strong><br>
<dd>Writes the speech output to stdout as it is produced, rather than speaking it. The data starts with a WAV file header which indicates the sample rate and format of the data. The length field is set to zero because the length of the data is unknown when the header is produced.
<p>
<dt><strong>--compile [=&lt;voice name&gt;]</strong><br>
<dd>
Compile the pronunciation rule and dictionary lookup data from their source files in the current directory. The Voice determines which language's files are compiled. For example, if it's an English voice, then <em>en_rules</em>, <em>en_list</em>, and <em>en_extra</em> (if present), are compiled to replace <em>en_dict</em> in the <em>espeak-data</em> directory. If no Voice is specified then the default Voice is used.
<p>
<dt><strong>--compile-debug [=&lt;voice name&gt;]</strong><br>
<dd>
The same as <strong>--compile</strong>, but source line numbers from the *_rules file are included. These are included in the rules trace when the <strong>-X</strong> option is used.
<p>
<dt><strong>--ipa</strong><br>
<dd>
Writes phonemes to stdout, using the International Phonetic Alphabet (IPA).<br>
If a phoneme name contains more than one letter (eg. [tS]), the --sep or --tie option can be used to distinguish
this from separate phonemes.
<p>
<dt><strong>--path [="&lt;directory path&gt;"]</strong><br>
<dd>
Specifies the directory which contains the espeak-data directory.
<p>
<dt><strong>--pho</strong><br>
<dd>
When used with an mbrola voice (eg. -v mb-en1), it writes mbrola phoneme data (.pho file format) to stdout. This includes the mbrola phoneme names with duration and pitch information, in a form which is suitable as input to this mbrola voice. The --phonout option can be used to write this data to a file.
<p>
<dt><strong>--phonout [="&lt;filename&gt;"]</strong><br>
<dd>
If specified, the output from -x, -X, --ipa, and --pho options is written to this file, rather than to stdout.
<p>
<dt><strong>--punct [="&lt;characters&gt;"]</strong><br>
<dd>
Speaks the names of punctuation characters when they are encountered in the text. If &lt;characters&gt; are given, then only those listed punctuation characters are spoken, eg. <code> --punct=".,;?"</code>
<p>
<dt><strong>--sep [=&lt;character&gt;]</strong><br>
<dd>
The character is used to separate individual phonemes in the output which is produced by the -x or --ipa options. The default is a space character. The character z means use a ZWNJ character (U+200c).
<p>
<dt><strong>--split [=&lt;minutes&gt;]</strong><br>
<dd>
Used with <strong>-w</strong>, it starts a new WAV file every <code>&lt;minutes&gt;</code> minutes, at the next sentence boundary.
<p>
<dt><strong>--tie [=&lt;character&gt;]</strong><br>
<dd>
The character is used within multi-letter phonemes in the output which is produced by the -x or --ipa options. The default is the tie character&nbsp; &#x361; &nbsp;U+361. The character z means use a ZWJ character (U+200d).
<p>
<dt>
<strong>--voices [=&lt;language code&gt;]</strong><br>
<dd>Lists the available voices.<br>
If =&lt;language code&gt; is present then only those voices which are suitable for that language are listed.<br>
<code>--voices=mbrola</code> lists the voices which use mbrola diphone voices. These are not included in the default <code>--voices</code> list<br>
<code>--voices=variant</code> lists the available voice variants (voice modifiers).<br>

</dl>
<p>&nbsp;<hr>
<h3>2.2.3 The Input Text</h3>
<dl>
<dt><b>HTML Input</b>
<dd>
If the -m option is used to indicate marked-up text, then HTML can be spoken directly.
<p>
<dt><b>Phoneme Input</b>
<dd>
As well as plain text, phoneme mnemonics can be used in the text input to <strong>espeak-ng</strong>. They are enclosed within double square brackets. Spaces are used to separate words and all stressed syllables must be marked explicitly.<p>
&nbsp; eg: &nbsp; <code> espeak-ng -v en "[[D,Is Iz sVm f@n'EtIk t'Ekst 'InpUt]]" </code><p>
This command will speak: "This is some phonetic text input".
</dl>

<hr>
<a href="http://sourceforge.net"><img src="http://sflogo.sourceforge.net/sflogo.php?group_id=159649&amp;type=2" width="125" height="37" border="0" alt="SourceForge.net Logo" /></a>

</body>

+ 6
- 6
docs/commands.md View File

@@ -43,7 +43,7 @@ in the file speech.h.
## Windows

The installer: **setup\_espeak.exe** installs the SAPI5 version of
eSpeak. During installation you need to specify which voices you want to
eSpeak NG. During installation you need to specify which voices you want to
appear in SAPI5 voice menus.

It also installs a command line program **espeak-ng** in the espeak-ng
@@ -104,7 +104,7 @@ practical maximum.
> 1   UTF-8. This is the default.
> 2   The 8-bit character set which corresponds to the language (eg. Latin-2 for Polish).
> 4   16 bit Unicode.
> Without this option, eSpeak assumes text is UTF-8, but will
> Without this option, eSpeak NG assumes text is UTF-8, but will
automatically switch to the 8-bit character set if it finds an
illegal UTF-8 sequence.

@@ -116,16 +116,16 @@ the length of the pause, in units of 10 mS (at the default speed of

**-h** or **--help**

> The first line of output gives the eSpeak version number.
> The first line of output gives the eSpeak NG version number.

**-k \<integer\>**

> Indicate words which begin with capital letters.
> 1   eSpeak uses a click sound to indicate when a word starts with a
> 1   eSpeak NG uses a click sound to indicate when a word starts with a
capital letter, or double click if word is all capitals.
> 2   eSpeak speaks the word "capital" before a word which begins with
> 2   eSpeak NG speaks the word "capital" before a word which begins with
a capital letter.
> Other values:   eSpeak increases the pitch for words which begin
> Other values:   eSpeak NG increases the pitch for words which begin
with a capital letter. The greater the value, the greater the
increase in pitch. Try -k20.


+ 288
- 227
docs/dictionary.md View File

@@ -1,49 +1,75 @@
4. TEXT TO PHONEME TRANSLATION {.western}
------------------------------

### 4.1 Translation Files {.western}
# Table of contents

* [Text to phoneme translation](#text-to-phoneme-translation)
* [Translation Files](#translation-files)
* [Phoneme names](#phoneme-names)
* [Pronunciation Rules](#pronunciation-rules)
* [Rule Groups](#rule-groups)
* [Rules](#rules)
* [Special characters in \<phoneme string\>](#special-characters-in-phoneme-string)
* [Special Characters in both \<pre\> and \<post\> ](#special-characters-in-both-pre-and-post)
* [Special characters only in \<pre\> ](#special-characters-only-in-pre)
* [Special characters only in \<post\> ](#special-characters-only-in-post)
* [Pronunciation Dictionary List](#pronunciation-dictionary-list)
* [Multiple Words](#multiple-words)
* [Special characters in \<phoneme string\>](#special-characters-in-phoneme-string)
* [Flags](#flags)
* [Translating a Word to another Word](#translating-a-word-to-another-word)
* [Conditional Rules](#conditional-rules)
* [Numbers and Character Names](#numbers-and-character-names)
* [Letter names](#letter-names)
* [Numbers](#numbers)
* [Character Substitution](#character-substitution)

# Text to phoneme translation


## Translation Files

There is a separate set of pronunciation files for each language, their
names starting with the language name.

There are two separate methods for translating words into phonemes:

- -
* Pronunciation Rules. These are an attempt to define the pronunciation rules for the language. The source file is:
**\<language\>\_rules** (eg. `en_rules`)

* Lookup Dictionary. A list of individual words and their pronunciations and/or various other properties. The source files are:
**\<language\>\_list** (eg. `en_list`) and optionally **\<language\>\_extra**.

These two files are compiled into the file ***\<language\>\_dict***  in
the espeak-data directory (eg. espeak-data/en\_dict)
These two files are compiled into the file **\<language\>\_dict**  in
the espeak-data directory (eg. `espeak-data/en_dict`)

### 4.2 Phoneme names {.western}
## Phoneme names

Each of the language's phonemes is represented by a mnemonic of 1, 2, 3,
or 4 characters. Together with a number of utility codes (eg. stress
marks and pauses), these are defined in the phoneme data file (see
\*spec not yet available\*).
marks and pauses), these are defined in the phoneme data file (_TODO_).

The utility 'phonemes' are:

+--------------------------------------+--------------------------------------+
| **'** | primary stress |
+--------------------------------------+--------------------------------------+
| **,** | secondary stress |
+--------------------------------------+--------------------------------------+
| **%** | unstressed syllable |
+--------------------------------------+--------------------------------------+
| **=   ** | put the primary stress on the |
| | preceding syllable |
+--------------------------------------+--------------------------------------+
| **\_:** | short pause |
+--------------------------------------+--------------------------------------+
| **\_** | a shorter pause |
+--------------------------------------+--------------------------------------+
| **||** | indicates a word boundary within a |
| | phoneme string |
+--------------------------------------+--------------------------------------+
| **|** | can be used to separate two adjacent |
| | characters, to prevent them from |
| | being considered as a |
| | multi-character phoneme mnemonic |
+--------------------------------------+--------------------------------------+
+-----------+--------------------------------------+
| **'** | primary stress |
+-----------+--------------------------------------+
| **,** | secondary stress |
+-----------+--------------------------------------+
| **%** | unstressed syllable |
+-----------+--------------------------------------+
| **=** | put the primary stress on the |
| | preceding syllable |
+-----------+--------------------------------------+
| **\_:** | short pause |
+-----------+--------------------------------------+
| **\_** | a shorter pause |
+-----------+--------------------------------------+
| **||** | indicates a word boundary within a |
| | phoneme string |
+-----------+--------------------------------------+
| **|** | can be used to separate two adjacent |
| | characters, to prevent them from |
| | being considered as a |
| | multi-character phoneme mnemonic |
+-----------+--------------------------------------+

It is not necessary to specify the stress of every syllable. Stress
markers are only needed in order to change the effect of the language's
@@ -54,9 +80,11 @@ loosely on the Kirshenbaum ascii character representation of the
International Phonetic Alphabet
[www.kirshenbaum.net/IPA/ascii-ipa.pdf](http://www.kirshenbaum.net/IPA/ascii-ipa.pdf)

### 4.3 Pronunciation Rules {.western}
The full list of commonly used phonemes can be found in the [phsource/phonemes](../phsource/phonemes) file.

The rules in the ***\<language\>\_rules***  file specify the phonemes
## Pronunciation Rules

The rules in the **\<language\>\_rules**  file specify the phonemes
which are used to pronounce each letter, or sequence of letters. Some
rules only apply when the letter or letters are preceded by, or followed
by, other specified letters.
@@ -68,21 +96,52 @@ matching rule is chosen. The pointer into the source word is then
advanced past those letters which have been matched and the process is
repeated until all the letters of the word have been processed.

#### 4.3.1 Rule Groups {.western}
### Rule Groups

The rules are organized in groups, each starting with a ".group" line:

**.group \<character\>**

> A group for each letter or character.

**.group \<2 characters\>**

> Optional groups for some common 2 letter combinations. This is only needed, for efficiency, in cases where there are many rules for a particular letter. They would not be needed for a language which has regular spelling rules. The first character can only be an ascii character (less than 0x80).

**.group**

> A group for other characters which don't have their own group.

**.L\<nn\>**

> Defines a group of letter sequences, any of which can match with Lnn in a pre or post rule (see below). nn is a 2 digit decimal number in the range 01 to 25. eg:
`.L01 b bl br pl pr`

**.replace**

> See section [Character Substitution](#character-substitution).


When matching a word, firstly the 2-letter group for the two letters at
the current position in the word (if such a group exists) is searched,
and then the single-letter group. The highest scoring rule in either of
those two groups is used.
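
The group lookup described above can be sketched in Python. This is only an illustration, not the eSpeak NG implementation: the `groups` dictionary, the function names, and in particular the scoring (here simply the length of the matched text) are assumptions made for the example.

```python
def groups_to_search(word, pos, groups):
    """Return the names of the rule groups consulted at word[pos]."""
    names = []
    two = word[pos:pos + 2]
    if len(two) == 2 and two in groups:
        names.append(two)            # the 2-letter group is searched first
    one = word[pos]
    # Fall back to the unnamed ".group" when the letter has no group of its own.
    names.append(one if one in groups else "")
    return names


def best_rule(word, pos, groups):
    """Pick the highest scoring matching rule from the candidate groups."""
    best = None
    for name in groups_to_search(word, pos, groups):
        for match, phonemes in groups.get(name, []):
            if word.startswith(match, pos):
                score = len(match)   # assumption: longer matches score higher
                if best is None or score > best[0]:
                    best = (score, match, phonemes)
    return best


# Hypothetical groups, loosely modelled on ".group o" and ".group oo".
GROUPS = {
    "o": [("o", "0")],
    "oo": [("oo", "u:")],
}
```

With these hypothetical groups, `best_rule("boot", 1, GROUPS)` consults the "oo" group and then the "o" group, and the "oo" rule wins because it matches more letters.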

#### 4.3.2 Rules {.western}
### Rules

Each rule is on a separate line, and has the syntax:

`[<pre>)] <match> [(<post>] <phoneme string>`

eg.

```
.group o
o 0 // "o" is pronounced as [0]
oo u: // but "oo" is pronounced as [u:]
b) oo (k U
```

"oo" is pronounced as [u:], but when also preceded by "b" and followed
by "k", it is pronounced [U].
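
As a rough illustration of how the three rules above interact, the following Python sketch parses rule lines and applies them to a word. It handles only literal letters in `<pre>` and `<post>`, and it scores a rule by the number of characters it matched; both are simplifying assumptions for this example, not the actual eSpeak NG scoring. The rules for "b", "k" and "t" are added here only so that whole words can be translated.

```python
def parse_rule(line):
    """Split '[<pre>)] <match> [(<post>] <phoneme string>' into four parts."""
    fields = line.split("//")[0].split()      # drop any trailing comment
    pre = post = ""
    if fields[0].endswith(")"):
        pre = fields.pop(0)[:-1]
    match = fields.pop(0)
    if fields and fields[0].startswith("("):
        post = fields.pop(0)[1:]
    phonemes = fields[0] if fields else ""
    return pre, match, post, phonemes


def translate(word, rules):
    """Translate a word, always using the highest scoring matching rule."""
    out, pos = [], 0
    while pos < len(word):
        best = None
        for pre, match, post, phonemes in rules:
            end = pos + len(match)
            if (word.startswith(match, pos)
                    and word[:pos].endswith(pre)
                    and word.startswith(post, end)):
                # Assumption: every matched character adds one point.
                score = len(pre) + len(match) + len(post)
                if best is None or score > best[0]:
                    best = (score, len(match), phonemes)
        if best is None:
            pos += 1                           # no rule: skip this letter
            continue
        out.append(best[2])
        pos += best[1]                         # advance past the matched letters
    return "".join(out)


RULES = [parse_rule(line) for line in [
    'o 0 // "o" is pronounced as [0]',
    'oo u: // but "oo" is pronounced as [u:]',
    "b) oo (k U",
    "b b",   # hypothetical consonant rules, added for the example
    "k k",
    "t t",
]]
```

Here `translate("book", RULES)` selects the `b) oo (k` rule because it matches the most context, while `translate("boot", RULES)` falls back to the plain "oo" rule.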

@@ -95,140 +154,142 @@ Alphabetic characters in the \<pre\>, \<match\>, and \<post\> parts must
be lower case, and matching is case-insensitive. Some upper case letters
are used in \<pre\> and \<post\> with special meanings.

#### 4.3.3 Special characters in \<phoneme string\>: {.western}

+--------------------------------------+--------------------------------------+
| **\_\^\_\<language code\>   ** | Translate using a different |
| | language. |
+--------------------------------------+--------------------------------------+

#### 4.3.4 Special Characters in both \<pre\> and \<post\>: {.western}

+--------------------------------------+--------------------------------------+
| **\_** | Beginning or end of a word (or a |
| | hyphen). |
+--------------------------------------+--------------------------------------+
| **-** | Hyphen. |
+--------------------------------------+--------------------------------------+
| **A** | Any vowel (the set of vowel |
| | characters may be defined for a |
| | particular language). |
+--------------------------------------+--------------------------------------+
| **C** | Any consonant. |
+--------------------------------------+--------------------------------------+
| **B H F G Y ** | These may indicate other sets of |
| | characters (defined for a particular |
| | language). |
+--------------------------------------+--------------------------------------+
| **L\<nn\>** | Any of the sequence of characters |
| | defined as a letter group (see 4.3.1 |
| | above). |
+--------------------------------------+--------------------------------------+
| **D** | Any digit. |
+--------------------------------------+--------------------------------------+
| **K** | Not a vowel (i.e. a consonant or |
| | word boundary or non-alphabetic |
| | character). |
+--------------------------------------+--------------------------------------+
| **X** | There is no vowel until the word |
| | boundary. |
+--------------------------------------+--------------------------------------+
| **Z** | A non-alphabetic character. |
+--------------------------------------+--------------------------------------+
| **%** | Doubled (placed before a character |
| | in \<pre\> and after it in \<post\>. |
+--------------------------------------+--------------------------------------+
| **/** | The following character is treated |
| | literally. |
+--------------------------------------+--------------------------------------+
### Special characters in \<phoneme string\>:


**_^_\<language code\>**

> Translate using a different language.
If this rule is selected when translating a word, then the translation is aborted and the word is re-translated using the specified different language. \<language code\> may be upper or lower case. This can be used to recognise certain letter combinations as being foreign words and to use the foreign pronunciation for them. eg:
`th (_ _^_EN`

indicates that a word which ends in "th" is translated using the English translation rules and spoken with English phonemes.

### Special Characters in both \<pre\> and \<post\>

+------------------+--------------------------------------+
| **\_** | Beginning or end of a word (or a |
| | hyphen). |
+------------------+--------------------------------------+
| **-** | Hyphen. |
+------------------+--------------------------------------+
| **A** | Any vowel (the set of vowel |
| | characters may be defined for a |
| | particular language). |
+------------------+--------------------------------------+
| **C** | Any consonant. |
+------------------+--------------------------------------+
| **B H F G Y** | These may indicate other sets of |
| | characters (defined for a particular |
| | language). |
+------------------+--------------------------------------+
| **L\<nn\>** | Any of the sequence of characters |
| | defined as a letter group (see |
| | Rule Groups above). |
+------------------+--------------------------------------+
| **D** | Any digit. |
+------------------+--------------------------------------+
| **K** | Not a vowel (i.e. a consonant or |
| | word boundary or non-alphabetic |
| | character). |
+------------------+--------------------------------------+
| **X** | There is no vowel until the word |
| | boundary. |
+------------------+--------------------------------------+
| **Z** | A non-alphabetic character. |
+------------------+--------------------------------------+
| **%** | Doubled (placed before a character |
| | in \<pre\> and after it in \<post\>). |
+------------------+--------------------------------------+
| **/** | The following character is treated |
| | literally. |
+------------------+--------------------------------------+

The sets of letters indicated by A, B, C, E, F, G may be defined
differently for each language.

Examples of rules:

~~~~ {.western}
```
_) a // "a" at the start of a word
a (CC // "a" followed by two consonants
a (C% // "a" followed by a double consonant (the same letter twice)
a (/% // "a" followed by a percent sign
%C) a // "a" preceded by a double consonant
~~~~
```

#### 4.3.5 Special characters only in \<pre\>: {.western}
### Special characters only in \<pre\>:

+--------------------------------------+--------------------------------------+
| **@   ** | Any syllable. |
+--------------------------------------+--------------------------------------+
| **&** | A syllable which may be stressed |
| | (i.e. is not defined as unstressed). |
+--------------------------------------+--------------------------------------+
| **V** | Matches only if a previous word has |
| | indicated that a verb form is |
| | expected. |
+--------------------------------------+--------------------------------------+
+-----------------+--------------------------------------+
| **@** | Any syllable. |
+-----------------+--------------------------------------+
| **&** | A syllable which may be stressed |
| | (i.e. is not defined as unstressed). |
+-----------------+--------------------------------------+
| **V** | Matches only if a previous word has |
| | indicated that a verb form is |
| | expected. |
+-----------------+--------------------------------------+

eg.

~~~~ {.western}
```
@@) bi // "bi" preceded by at least two syllables
@@a) bi // "bi" preceded by at least 2 syllables and following 'a'
~~~~
```

Note that matching characters in the \<pre\> part do not affect the
syllable counting.

#### 4.3.6 Special characters only in \<post\>: {.western}
+--------------------------------------+--------------------------------------+
| **@** | A vowel follows somewhere in the |
| | word. |
+--------------------------------------+--------------------------------------+
| **+** | Force an increase in the score in |
| | this rule (may be repeated for more |
| | effect). |
+--------------------------------------+--------------------------------------+
| **S\<number\>  ** | This number of matching characters |
| | are a standard suffix, remove them |
| | and retranslate the word. |
+--------------------------------------+--------------------------------------+
| **P\<number\>** | This number of matching characters |
| | are a standard prefix, remove them |
| | and retranslate the word. |
+--------------------------------------+--------------------------------------+
| **Lnn** | **nn** is a 2-digit decimal number |
| | in the range 01 to 20\ |
| | Matches with any of the letter |
| | sequences which have been defined |
| | for letter group **nn** |
+--------------------------------------+--------------------------------------+
| **N** | Only use this rule if the word is |
| | not a retranslation after removing a |
| | suffix. |
+--------------------------------------+--------------------------------------+
| **\#** | (English specific) change the next |
| | "e" into a special character "E" |
+--------------------------------------+--------------------------------------+
| **\$noprefix** | Only use this rule if the word is |
| | not a retranslation after removing a |
| | prefix. |
+--------------------------------------+--------------------------------------+
| **\$w\_alt\ | Only use this rule if the word is |
| \$w\_alt2\ | found in the \*\_list file with the |
| \$w\_alt3** | **\$alt**, **\$alt2** or **\$alt3** |
| | attribute respectively. |
+--------------------------------------+--------------------------------------+
| **\$p\_alt\ | Only use this rule if the part-word, |
| \$p\_alt2\ | up to and including the pre and |
| \$p\_alt3** | match parts of this rule, is found |
| | in the \*\_list file with the |
| | **\$alt**, **\$alt2** or **\$alt3** |
| | attribute respectively. |
+--------------------------------------+--------------------------------------+
### Special characters only in \<post\>
+--------------------+--------------------------------------+
| **@** | A vowel follows somewhere in the |
| | word. |
+--------------------+--------------------------------------+
| **+** | Force an increase in the score in |
| | this rule (may be repeated for more |
| | effect). |
+--------------------+--------------------------------------+
| **S\<number\>** | This number of matching characters |
| | are a standard suffix, remove them |
| | and retranslate the word. |
+--------------------+--------------------------------------+
| **P\<number\>** | This number of matching characters |
| | are a standard prefix, remove them |
| | and retranslate the word. |
+--------------------+--------------------------------------+
| **Lnn** | **nn** is a 2-digit decimal number |
| | in the range 01 to 20\ |
| | Matches with any of the letter |
| | sequences which have been defined |
| | for letter group **nn** |
+--------------------+--------------------------------------+
| **N** | Only use this rule if the word is |
| | not a retranslation after removing a |
| | suffix. |
+--------------------+--------------------------------------+
| **\#** | (English specific) change the next |
| | "e" into a special character "E" |
+--------------------+--------------------------------------+
| **\$noprefix** | Only use this rule if the word is |
| | not a retranslation after removing a |
| | prefix. |
+--------------------+--------------------------------------+
| **\$w\_alt\ | Only use this rule if the word is |
| \$w\_alt2\ | found in the \*\_list file with the |
| \$w\_alt3** | **\$alt**, **\$alt2** or **\$alt3** |
| | attribute respectively. |
+--------------------+--------------------------------------+
| **\$p\_alt\ | Only use this rule if the part-word, |
| \$p\_alt2\ | up to and including the pre and |
| \$p\_alt3** | match parts of this rule, is found |
| | in the \*\_list file with the |
| | **\$alt**, **\$alt2** or **\$alt3** |
| | attribute respectively. |
+--------------------+--------------------------------------+

eg.

~~~~ {.western}
```
@) ly (_S2 lI // "ly", at end of a word with at least one other
// syllable, is a suffix pronounced [lI]. Remove
// it and retranslate the word.
@@ -237,7 +298,7 @@ eg.
// prefix pronounced [Vn]
_) un (i ju: // ... except in words starting "uni"
_) un (inP2 ,Vn // ... but it is for words starting "unin"
~~~~
```

S and P must be at the end of the \<post\> string.

@@ -245,49 +306,49 @@ S\<number\> may be followed by additional letters (eg. S2ei ). Some of
these are probably specific to English, but similar functions could be
made for other languages.

+--------------------------------------+--------------------------------------+
| **q** | query the \_list file to find stress |
| | position or other attributes for the |
| | stem, but don't re-translate the |
| | word with the suffix removed. |
+--------------------------------------+--------------------------------------+
| **t** | determine the stress pattern of the |
| | word **before** adding the suffix |
+--------------------------------------+--------------------------------------+
| **d   ** | the previous letter may have been |
| | doubled when the suffix was added. |
+--------------------------------------+--------------------------------------+
| **e** | "e" may have been removed. |
+--------------------------------------+--------------------------------------+
| **i** | "y" may have been changed to "i." |
+--------------------------------------+--------------------------------------+
| **v** | the suffix means the verb form of |
| | pronunciation should be used. |
+--------------------------------------+--------------------------------------+
| **f** | the suffix means the next word is |
| | likely to be a verb. |
+--------------------------------------+--------------------------------------+
| **m** | after this suffix has been removed, |
| | additional suffixes may be removed. |
+--------------------------------------+--------------------------------------+
+-------+--------------------------------------+
| **q** | query the \_list file to find stress |
| | position or other attributes for the |
| | stem, but don't re-translate the |
| | word with the suffix removed. |
+-------+--------------------------------------+
| **t** | determine the stress pattern of the |
| | word **before** adding the suffix |
+-------+--------------------------------------+
| **d** | the previous letter may have been |
| | doubled when the suffix was added. |
+-------+--------------------------------------+
| **e** | "e" may have been removed. |
+-------+--------------------------------------+
| **i** | "y" may have been changed to "i." |
+-------+--------------------------------------+
| **v** | the suffix means the verb form of |
| | pronunciation should be used. |
+-------+--------------------------------------+
| **f** | the suffix means the next word is |
| | likely to be a verb. |
+-------+--------------------------------------+
| **m** | after this suffix has been removed, |
| | additional suffixes may be removed. |
+-------+--------------------------------------+

P\<number\> may be followed by additional letters (eg. P3v ).

+--------------------------------------+--------------------------------------+
| **t   ** | determine the stress pattern of the |
| | word **before** adding the prefix |
+--------------------------------------+--------------------------------------+
| **v** | the suffix means the verb form of |
| | pronunciation should be used. |
+--------------------------------------+--------------------------------------+
+--------+--------------------------------------+
| **t** | determine the stress pattern of the |
| | word **before** adding the prefix |
+--------+--------------------------------------+
| **v** | the prefix means the verb form of |
| | pronunciation should be used. |
+--------+--------------------------------------+

### 4.4 Pronunciation Dictionary List {.western}
## Pronunciation Dictionary List

The ***\<language\>\_list***  file contains a list of words whose
The **\<language\>\_list**  file contains a list of words whose
pronunciations are given explicitly, rather than determined by the
Pronunciation Rules. The ***\<language\>\_extra***  file, if present, is
Pronunciation Rules. The **\<language\>\_extra**  file, if present, is
also used and its contents are taken as coming after those in
***\<language\>\_list***.
**\<language\>\_list**.

Also the list can be used to specify the stress pattern, or other
properties, of a word.
@@ -298,57 +359,59 @@ Dictionary List after the prefix or suffix has been removed.

Lines in the dictionary list have the form:

eg.
```
<word> [<phoneme string>] [<flags>]
```

~~~~ {.western style="margin-bottom: 0.5cm"}
eg.
```
book bUk
~~~~
```

Rather than a full pronunciation, just the stress may be given, to
change where it would be otherwise placed by the Pronunciation Rules:

~~~~ {.western}
```
berlin $2 // stress on second syllable
absolutely $3 // stress on third syllable
for $u // an unstressed word
~~~~
```

#### 4.4.1 Multiple Words {.western}
### Multiple Words

A pronunciation may also be specified for a group of words, when these
appear together. Up to four words may be given, enclosed in brackets.
This may be used to change the pronunciation or stress pattern when
these words occur together,

~~~~ {.western style="margin-bottom: 0.5cm"}
```
(de jure) deI||dZ'U@rI2 // note || used as a word break in the phoneme string
~~~~
```

or to run them together, pronounced as a single word

~~~~ {.western style="margin-bottom: 0.5cm"}
```
(of a) @v@
~~~~
```

or to give them a flag when they occur together

~~~~ {.western style="margin-bottom: 0.5cm"}
```
(such as) sVtS||a2z $pause // precede with a pause
~~~~
```

Hyphenated words in the ***\<language\>\_list***  file must also be
Hyphenated words in the **\<language\>\_list**  file must also be
enclosed within brackets, because the two parts are considered as
separate words.

#### 4.4.2 Special characters in \<phoneme string\>: {.western}
### Special characters in \<phoneme string\>:

+--------------------------------------+--------------------------------------+
| **\_\^\_\<language code\>   ** | Translate using a different |
| | language. See explanation in 4.3.3 |
| **\_\^\_\<language code\>** | Translate using a different |
| | language. See the explanation |
| | under Pronunciation Rules above. |
+--------------------------------------+--------------------------------------+

#### 4.4.3 Flags {.western}
### Flags

A word (or group of words) may be given one or more flags, either
instead of, or as well as, the phonetic translation.
@@ -449,12 +512,12 @@ instead of, or as well as, the phonetic translation.
| | end of a sentence. |
+--------------------------------------+--------------------------------------+
| \$abbrev | This has two meanings.\ |
| | 1. If there is no phoneme string: |
| | If there is no phoneme string: |
| | Speak the word as individual |
| | letters, even if it contains a vowel |
| | (eg. "abc" should be spoken as "a" |
| | "b" "c").\ |
| | 2. If there is a phoneme string: |
| | If there is a phoneme string: |
| | This word is capitalized because it |
| | is an abbreviation and |
| | capitalization does not indicate |
@@ -517,35 +580,33 @@ The dictionary list is searched from bottom to top. The first match that
satisfies any conditions is used (i.e. the one lowest down the list). So
if we have:

~~~~ {.western}
```
to t@ // unstressed version
to tu: $atend // stressed version
~~~~
```

then if "to" is at the end of the clause, we get [tu:], if not then we
get [t@].
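
The bottom-to-top search can be sketched in Python as follows. The tuple layout of the entries and the `at_end_of_clause` parameter are assumptions made for this example, and only the `$atend` condition is modelled.

```python
def lookup(word, entries, at_end_of_clause):
    """Return the phonemes of the lowest entry whose conditions are met."""
    for entry_word, phonemes, flags in reversed(entries):   # bottom to top
        if entry_word != word:
            continue
        if "$atend" in flags and not at_end_of_clause:
            continue                                        # condition not met
        return phonemes
    return None                                             # not in the list


# The two "to" entries from the example above, in file order.
ENTRIES = [
    ("to", "t@", []),            # unstressed version
    ("to", "tu:", ["$atend"]),   # stressed version
]
```

Because the search starts at the bottom, the `$atend` entry is tried first and the unconditional entry acts as the fallback.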

#### 4.4.4 Translating a Word to another Word {.western}
### Translating a Word to another Word

Rather than specifying the pronunciation of a word by a phoneme string,
you can specify another "sounds like" word.

Use the attribute **\$text** eg.

~~~~ {.western style="margin-bottom: 0.5cm"}
```
cough coff $text
~~~~
```

Alternatively, use the command **\$textmode** on a line by itself to
turn this on for all subsequent entries in the file, until it's turned
off by **\$phonememode**. eg.

~~~~ {.western}
```
$textmode
cough coff
through threw
$phonememode
~~~~
```

This feature cannot be used for the special entries in the **\_list**
files which start with an underscore, such as numbers.
@@ -554,7 +615,7 @@ Currently "textmode" entries are only recognized for complete words, and
not for stems from which a prefix or suffix has been removed (eg.
the word "coughs" would not match the example above).

### 4.5 Conditional Rules {.western}
## Conditional Rules

Rules in a **\_rules** file and entries in a **\_list** file can be made
conditional. They apply only to some voices. This can be useful to
@@ -569,14 +630,14 @@ line in the [voice file](voices.html).
If the rule starts with   **?!**   then the rule only applies if the
condition number is **not** specified in the voice file. eg.

~~~~ {.western}
```
?3 can't kant // only use this if the voice has: dictrules 3
?!3 rather rA:D3 // only use if the voice doesn't have: dictrules 3
~~~~
```
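
A minimal Python sketch of this check, assuming the condition numbers from the voice file's `dictrules` line are already available as a set (the function name and that representation are hypothetical):

```python
def rule_applies(line, dictrules):
    """Decide whether a rule or list entry is active for the current voice."""
    first = line.split()[0]
    if not first.startswith("?"):
        return True                            # unconditional line
    negate = first.startswith("?!")
    number = int(first[2:] if negate else first[1:])
    # "?n" needs n to be set in the voice, "?!n" needs it to be absent.
    return (number in dictrules) != negate
```

For a voice with `dictrules 3`, the `?3` line is used and the `?!3` line is ignored; for a voice without it, the reverse holds.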

### 4.6 Numbers and Character Names {.western}
## Numbers and Character Names

#### 4.6.1 Letter names {.western}
### Letter names

The names of individual letters can be given either in the **\_rules**
or **\_list** file. Sometimes an individual letter is also used as a
@@ -585,14 +646,14 @@ letter name. If so, it should be listed in the **\_list** file, preceded
by an underscore, to give the letter name (as distinct from its
pronunciation as a word). eg. in English:

~~~~ {.western style="margin-bottom: 0.5cm"}
```
_a eI
~~~~
```

#### 4.6.2 Numbers {.western}
### Numbers

The operation of the TranslateNumber() function is controlled by the
language's `langopts.numbers`{.western} option. This constructs spoken
language's `langopts.numbers` option. This constructs spoken
numbers from fragments according to various options which can be set for
each language. The number fragments are given in the **\_list** file.

@@ -636,7 +697,7 @@ each language. The number fragments are given in the **\_list** file.
| | point. |
+--------------------------------------+--------------------------------------+

### 4.7 Character Substitution {.western}
## Character Substitution

Character substitutions can be specified by using a **.replace** section
at the start of the **\_rules** file. Each line specifies either one or
@@ -645,11 +706,11 @@ alphabetic characters. This substitution is done to a word before it is
translated using the spelling-to-phoneme rules. Only the lower-case
version of the characters needs to be specified. eg.

```
  .replace\
    ô   ő   // (Hungarian) allow the use of o-circumflex instead of
o-double-accute\
    ô   ő   // (Hungarian) allow the use of o-circumflex instead of o-double-acute
    û   ű

    cx   ĉ   // (Esperanto) allow "cx" as an alternative to c-circumflex

    ﬁ   fi   // replace a single character ligature by two characters
```
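
To make the effect of such a table concrete, here is a minimal Python sketch that applies the substitutions to a word before any further translation. It assumes the word is lower-cased first and that longer source strings are tried before shorter ones; both choices, and the names `apply_replacements` and `REPLACEMENTS`, are assumptions of this example rather than a description of eSpeak's implementation.

```python
# Substitution table taken from the example above (source -> replacement).
REPLACEMENTS = {"ô": "ő", "û": "ű", "cx": "ĉ", "\ufb01": "fi"}


def apply_replacements(word, replacements):
    """Replace one- or two-character sources in a lower-cased word."""
    word = word.lower()
    sources = sorted(replacements, key=len, reverse=True)  # longest first
    out = []
    i = 0
    while i < len(word):
        for src in sources:
            if word.startswith(src, i):
                out.append(replacements[src])
                i += len(src)
                break
        else:
            out.append(word[i])  # no substitution starts at this position
            i += 1
    return "".join(out)
```

For instance, an Esperanto word written with "cx" would be passed on with "ĉ" in its place, and a word containing the single-character ligature would be passed on with the two letters "f" and "i".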


+ 0
- 67
docs/docindex.html View File

@@ -1,67 +0,0 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>

<head>
<title>eSpeak Speech Synthesizer</title>
<meta name="GENERATOR" content="Quanta Plus">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<table border="1" cellpadding="10" background="images/sand-light.jpg" width="100%">
<tbody>
<tr>
<td width="15%">
<a href="http://sourceforge.net"><img src="http://sflogo.sourceforge.net/sflogo.php?group_id=159649&amp;type=2" width="125" height="37" border="0" alt="SourceForge.net Logo" /></a>
</td>
<td>
<div align="center"><h1>eSpeak - Documents</h1></div>
</td>
</tr>

<tr>
<td valign="top">
<font size="+1"><strong>
<A href="index.html">Home</A>
<p>
<A href="commands.html">Usage</A>
<p>
<A href="languages.html">Languages</A>
</strong></font>
</td>
<td>
<h3><A href="voices.html">Voice Files</A></h3>
Voice files specify a language and other characteristics of a voice.
<h3><A href="mbrola.html">Mbrola Voices</A></h3>
eSpeak can be used as a front-end for Mbrola diphone voices.
<h3><A href="dictionary.html">Pronunciation Dictionary</A></h3>
<ul>
<li>How to add pronunciation corrections.
<li>How to build up pronunciation rules for a new language.
</ul><p>
<h3><A href="add_language.html">Adding a Language</A></h3>
How to add or improve a language.
<h3><A href="phonemes.html">Phonemes</A></h3>
The list of phoneme mnemonics for English, for use in the Pronunciation Dictionary.
<h3><A href="phontab.html">Phoneme Tables</A></h3>
The tables of the phonemes used by each language, with their properties and sound production.
<h3><A href="intonation.html">Intonation</A></h3>
Different intonation "tunes" may be defined for different languages for clauses which end in full-stop, comma, question-mark, and exclamation-mark.
<h3><A href="speak_lib.h">eSpeak Library API</A></h3>
API definition and header file for a shared library version of eSpeak.
<h3><A href="ssml.html">Markup tags</A></h3>
SSML (Speech Synthesis Markup Language) and HTML tags recognized by eSpeak.
<h3><A href="editor.html">The espeakedit program</A></h3>
GUI software to edit vowel files and to compile the phoneme data for use by eSpeak.<br>
<ul>
<li><a href="editor_if.html">espeakedit program GUI details</a>
<li><a href="analyse.html">Analysing sound recordings</a>
<li><a href="makephonemes.html">Adjusting phoneme data</a> (to be written)
</ul>
</td>
</tr>
</tbody>
</table>


</body>
</html>

+ 0
- 75
docs/editor.html View File

@@ -1,75 +0,0 @@

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>

<head>
<title>espeakedit</title>
<meta name="GENERATOR" content="Quanta Plus">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<A href="docindex.html">Back</A>
<hr>
<h2>ESPEAKEDIT PROGRAM</h2>
<hr>
The <strong>espeakedit</strong> program is used to prepare phoneme data for the eSpeak speech synthesizer.<p>
It has two main functions:
<ul>
<li>Prepare keyframe files for individual vowels and voiced consonants. These each contain a sequence of keyframes which define how formant peaks (peaks in the frequency spectrum) vary during the sound.<p>
<li>Process the master <strong>phonemes</strong> file which, by including the phoneme files for the various languages, defines all their phonemes and references the keyframe files and the sound sample files which they use. <strong>espeakedit</strong> processes these and compiles them into the <strong>phondata</strong>, <strong>phonindex</strong>, and <strong>phontab</strong> files in the <strong>espeak-data</strong> directory which are used by the eSpeak speech synthesizer.
</ul>
<hr>
<h3>Installation</h3>
<strong>espeakedit</strong> needs the following packages:<br>
(The package names mentioned here are those from the Ubuntu "Dapper" Linux distribution).
<ul>
<li><strong>sox</strong> &nbsp; (a universal sound sample translator)
<li><strong>libwxgtk2.6-0</strong> &nbsp; (wxWidgets Cross-platform C++ GUI toolkit)
<li><strong>portaudio0</strong> &nbsp; (Portaudio V18, portable audio I/O)
</ul>
In addition, a modified version of <strong>praat</strong> (<a href="www.praat.org">www.praat.org</a>) is used to view and analyse WAV sound files.
This needs the package <strong>libmotif3</strong> to run and <strong>libmotif-dev</strong> to compile.
<hr>
<h3>Quick Guide</h3>
This will quickly illustrate the main features. Details of the interface and key commands are given in <a href="editor_if.html">editor_if.html</a><p>
For more detailed information on analysing sound recordings and preparing phoneme definitions and keyframe data see <a href="analyse.html">analyse.html</a> (to be written).
<h4>Compiling Phoneme Data</h4>
<ol>
<li>Run the <strong>espeakedit</strong> program.<p>
<li>Select <b>Data->Compile phoneme data</b> from the menu bar. Dialog boxes will ask you to locate the directory (<b>phsource</b>) which contains the master phonemes file, and the directory (<b>dictsource,</b>) which contains the dictionary files (en_rules, en_list, etc). Once specified, espeakedit will remember their locations, although they can be changed later from <b>Options->Paths</b>.<p>
<li>A message in the status line at the bottom of the espeakedit window will indicate whether there are any errors in the phoneme data, and how many language's dictionary files have been compiled. The compiled data is placed into the <b>espeak-data</b> directory, ready for use by the speak program. If errors are found in the phoneme data, they are listed in a file <b>error_log</b> in the <b>phsource</b> directory.</li>
<p>
NOTE: espeakedit can be used from the command line to compile the phoneme data, with the command: <b> espeakedit --compile</b>
<li>Select <b>Tools->Make vowels chart->From compiled phoneme data</b>. This will look for the vowels in the compiled phoneme data of each language and produce a vowel chart (.png file) in <b>phsource/vowelcharts</b>. These charts plot the vowels' F1 (formant 1) frequency against their F2 frequency, which corresponds approximately to their open/close and front/back positions. The colour in the circle for each vowel indicates its F3 frequency, red indicates a low F3, through yellow and green to blue and violet for a high F3. In the case of a diphthong, a line is drawn from the circle to the position of the end of the vowel.
</ol>
<h4>Keyframe Sequences</h4>
<ol>
<li>Select <b>File->Open</b> from the menu bar and select a vowel file, <b>phsource/vowel/a</b>. This will open a tab in the espeakedit window which contains a sequence of 4 keyframes. Each keyframe shows a black graph, which is the outline of an original analysed spectrum from a sound recording, and also a green line, which shows the formant peaks which have been added (using the black graph as a guide) and which produce the sound.<p>
<li>Click in the "a" tab window and then press the <b>F2</b> key. This will produce and play the sound of the keyframe sequence. The first time you do this, you'll get a save dialog asking where you want the WAV file to be saved. Once you give a location all future sounds will be stored in that same location, although it can be changed from <b>Options->Paths</b>.<p>
<li>Click on the second of the four frames, the one with the red square. Press <b>F1</b>. That plays the sound of just that frame.<p>
<li>Press the <b>1</b> (number one) key. That selects formant F1 and a red triangle appears under the F1 formant peak to indicate that it's selected. Also an = sign appears next to formant 1 in the formants list in the left panel of the window.<p>
<li>Press the left-arrow key a couple of times to move the F1 peak to the left. The red triangle and its associated green formant peak moves lower frequency. Its numeric value in the formants list in the left panel decreases.<p>
<li>Press the <b>F1</b> key again. The frame will give a slightly different vowel sound. As you move the F1 peak slightly up and down and then press <b>F1</b> again, the sound changes. Similarly if you press the <b>2</b> key to select the F2 formant, then moving that will also change the sound. If you move the F1 peak down to about 700 Hz (and reduce its height a bit with the down-arrow key) and move F2 up to 1400 Hz, then you'll hear a "er" schwa [@] sound instead of the original [a].<p>
<li>Select <b>File->Open</b> and choose <b>phsource/vowel/aI</b>. This opens a new tab labelled "aI" which contains more frames. This is the [aI] diphthong and if you click in the tab window and press <b>F2</b> you'll hear the English word "eye". If you click on each frame in turn and press <b>F1</b> then you can hear each of the keyframes in turn. They sound different, starting with an [A] sound (as in "palm"), going through something like [@] in "her" and ending with something like [I] in "kit" (or perhaps a French é). Together they make the diphthong [aI].
</ol>
<h4>Text and Prosody Windows</h4>
<ol>
<li>Click on the <b>Text</b> tab in the left panel. Two text windows appear in the panel with buttons <b>Translate</b> and <b>Speak</b> below them.<p>
<li>Type some text into the top window and click the <b>Translate</b> button. The phonetic translation will appear in the lower window.<p>
<li>Click the <b>Speak</b> button. The text will be spoken and a <b>Prosody</b> tab will open in the main window.<p>
<li>Click on a vowel phoneme which is displayed in the Prosody tab. A red line appears under it to indicate that it has been selected.<p>
<li>Use the <b>up-arrow</b> or <b>down-arrow</b> key to move the vowel's blue pitch contour up or down. Then click the <b>Speak</b> button again to hear the effect of the altered pitch. If the adjacent phoneme also has a pitch contour then you may hear a discontinuity in the sound if it no longer matches with the one which you have moved.<p>
<li>Hold down the <b>Ctrl</b> key while using the <b>up-arrow</b> or <b>down-arrow</b> keys. The gradient of the pitch contour will change.<p>
<li>Click with the right mouse button over a phoneme. A menu allows you to select a different pitch envelope shape. Details of the currently selected phoneme appear in the Status line at the bottom of the window. The <b>Stress</b> number gives the stress level of the phoneme (see voices.html for a list).<p>
<li>Click the <b>Translate</b> button. This re-translates the text and restores the original pitches.<p>
<li>Click on a vowel phoneme in the Prosody window and use the <b>&lt;</b> and <b>&gt;</b> keys to shorten or lengthen it.<p>
</ol>
The Prosody window can be used to experiment with different phoneme lengths and different intonation.<p>

<hr>

</body>
</html>




+ 54
- 28
docs/editor.md View File

@@ -1,46 +1,72 @@
ESPEAKEDIT PROGRAM {.western}
------------------
# Table of contents

The **espeakedit** program is used to prepare phoneme data for the
eSpeak speech synthesizer.
* [Espeakedit program](#espeakedit-program)
* [Installation](#installation)
* [Quick Guide](#quick-guide)
* [Compiling Phoneme Data](#compiling-phoneme-data)
* [Keyframe Sequences](#keyframe-sequences)
* [Text and Prosody Windows](#text-and-prosody-windows)

# Espeakedit program

The **espeakedit** program is used to prepare phoneme data for the eSpeak speech synthesizer.

It has two main functions:

- -
* Prepare keyframe files for individual vowels and voiced consonants. These each contain a sequence of keyframes which define how formant peaks (peaks in the frequency spectrum) vary during the sound.
* Process the master **phonemes** file which, by including the phoneme files for the various languages, defines all their phonemes and references the keyframe files and the sound sample files which they use. **espeakedit** processes these and compiles them into the **phondata**, **phonindex**, and **phontab** files in the **espeak-data** directory which are used by the eSpeak speech synthesizer.


## Installation

**espeakedit** needs the following packages:
(The package names mentioned here are those from the Ubuntu "Dapper" Linux distribution).

* **sox** (a universal sound sample translator)
* **libwxgtk2.6-0** (wxWidgets Cross-platform C++ GUI toolkit)
* **portaudio0** (Portaudio V18, portable audio I/O)

In addition, a modified version of **praat** ([www.praat.org](http://www.praat.org/)) is used to view and analyse WAV sound files. This needs the package **libmotif3** to run and **libmotif-dev** to compile.

### Installation {.western}
## Quick Guide

**espeakedit** needs the following packages:\
(The package names mentioned here are those from the Ubuntu "Dapper"
Linux distribution).
This will quickly illustrate the main features. Details of the interface and key commands are given in [editor_if](editor_if.md).

- - -
For more detailed information on analysing sound recordings and preparing phoneme definitions and keyframe data see [analyse](analyse.md).

In addition, a modified version of **praat**
([www.praat.org](www.praat.org)) is used to view and analyse WAV sound
files. This needs the package **libmotif3** to run and **libmotif-dev**
to compile.
### Compiling Phoneme Data

### Quick Guide {.western}
1. Run the `espeakedit` program.
2. Select **Data->Compile phoneme data** from the menu bar. Dialog boxes will ask you to locate the directory (`phsource`) which contains the master phonemes file, and the directory (`dictsource`) which contains the dictionary files (en_rules, en_list, etc). Once specified, espeakedit will remember their locations, although they can be changed later from **Options->Paths**.
3. A message in the status line at the bottom of the espeakedit window will indicate whether there are any errors in the phoneme data, and how many languages' dictionary files have been compiled. The compiled data is placed into the `espeak-data` directory, ready for use by the speak program. If errors are found in the phoneme data, they are listed in a file `error_log` in the `phsource` directory.

This will quickly illustrate the main features. Details of the interface
and key commands are given in [editor\_if.html](editor_if.html)
NOTE: espeakedit can be used from the command line to compile the phoneme data, with the command:

For more detailed information on analysing sound recordings and
preparing phoneme definitions and keyframe data see
[analyse.html](analyse.html) (to be written).
`espeakedit --compile`

#### Compiling Phoneme Data {.western}
4. Select **Tools->Make vowels chart->From compiled phoneme data**. This will look for the vowels in the compiled phoneme data of each language and produce a vowel chart (.png file) in `phsource/vowelcharts`. These charts plot the vowels' F1 (formant 1) frequency against their F2 frequency, which corresponds approximately to their open/close and front/back positions. The colour in the circle for each vowel indicates its F3 frequency: red indicates a low F3, through yellow and green to blue and violet for a high F3. In the case of a diphthong, a line is drawn from the circle to the position of the end of the vowel.

1. 2. 3. 4.
### Keyframe Sequences

#### Keyframe Sequences {.western}
1. Select **File->Open** from the menu bar and select a vowel file, `phsource/vowel/a`. This will open a tab in the espeakedit window which contains a sequence of 4 keyframes. Each keyframe shows a black graph, which is the outline of an original analysed spectrum from a sound recording, and also a green line, which shows the formant peaks which have been added (using the black graph as a guide) and which produce the sound.
2. Click in the "a" tab window and then press the **F2** key. This will produce and play the sound of the keyframe sequence. The first time you do this, you'll get a save dialog asking where you want the WAV file to be saved. Once you give a location all future sounds will be stored in that same location, although it can be changed from **Options->Paths**.
3. Click on the second of the four frames, the one with the red square. Press **F1**. That plays the sound of just that frame.
4. Press the **1** (number one) key. That selects formant F1 and a red triangle appears under the F1 formant peak to indicate that it's selected. Also an = sign appears next to formant 1 in the formants list in the left panel of the window.
5. Press the left-arrow key a couple of times to move the F1 peak to the left. The red triangle and its associated green formant peak move to a lower frequency. Its numeric value in the formants list in the left panel decreases.
6. Press the **F1** key again. The frame will give a slightly different vowel sound. As you move the F1 peak slightly up and down and then press **F1** again, the sound changes. Similarly if you press the **2** key to select the F2 formant, then moving that will also change the sound. If you move the F1 peak down to about 700 Hz (and reduce its height a bit with the down-arrow key) and move F2 up to 1400 Hz, then you'll hear an "er" schwa [@] sound instead of the original [a].
7. Select **File->Open** and choose `phsource/vowel/aI`. This opens a new tab labelled "aI" which contains more frames. This is the [aI] diphthong and if you click in the tab window and press **F2** you'll hear the English word "eye". If you click on each frame in turn and press **F1** then you can hear each of the keyframes in turn. They sound different, starting with an [A] sound (as in "palm"), going through something like [@] in "her" and ending with something like [I] in "kit" (or perhaps a French é). Together they make the diphthong [aI].

1. 2. 3. 4. 5. 6. 7.
### Text and Prosody Windows

#### Text and Prosody Windows {.western}
1. Click on the **Text** tab in the left panel. Two text windows appear in the panel with buttons **Translate** and **Speak** below them.
2. Type some text into the top window and click the **Translate** button. The phonetic translation will appear in the lower window.
3. Click the **Speak** button. The text will be spoken and a **Prosody** tab will open in the main window.
4. Click on a vowel phoneme which is displayed in the Prosody tab. A red line appears under it to indicate that it has been selected.
5. Use the **up-arrow** or **down-arrow** key to move the vowel's blue pitch contour up or down. Then click the **Speak** button again to hear the effect of the altered pitch. If the adjacent phoneme also has a pitch contour then you may hear a discontinuity in the sound if it no longer matches with the one which you have moved.
6. Hold down the **Ctrl** key while using the **up-arrow** or **down-arrow** keys. The gradient of the pitch contour will change.
7. Click with the right mouse button over a phoneme. A menu allows you to select a different pitch envelope shape. Details of the currently selected phoneme appear in the Status line at the bottom of the window. The **Stress** number gives the stress level of the phoneme (see [voices](voices.md) for a list).
8. Click the **Translate** button. This re-translates the text and restores the original pitches.
9. Click on a vowel phoneme in the Prosody window and use the **<** and **>** keys to shorten or lengthen it.

1. 2. 3. 4. 5. 6. 7. 8. 9.
The Prosody window can be used to experiment with different phoneme lengths and different intonation.

The Prosody window can be used to experiment with different phoneme
lengths and different intonation.

+ 0
- 143
docs/editor_if.html View File

@@ -1,143 +0,0 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>

<head>
<title>Editor - Spectrum</title>
<meta name="GENERATOR" content="Quanta Plus">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<A href="docindex.html">Back</A>
<hr>
<h2>USER INTERFACE - FORMANT EDITOR</h2>
<hr>
<h3>Frame Sequence Display</h3>
The eSpeak editor can display a number of frame-sequencies in tabbed windows. Each frame can contain a short-time frequency spectrum, covering the period of one cycle at the sound's pitch. Frames can also show:
<ul>
<LI>Blue vertical lines showing the estimated position of the f1 to f5 formants (if the sequence was produced by praat analysis). These should correspond with the peaks in the spectrum, but may not do so exactly<p>
<li>Numbers at the right side of the frame showing the position from the start of the sequence in mS, and the pitch of the sound.<p>
<li>Up to 9 formant peaks (numbered 0 to 9) added by the user, usually to match the peaks in the spectrum, in order to produce the required sound. These are shown in green, can be moved by keyboard presses as described below, and may merge if they are close together. If a frame has formant peaks then it is a Keyframe and is shown with a pale yellow background.<p>
<li>If formant peaks are present, a relative amplitude (r.m.s.) value is shown at the right side of the frame.
<li>
</ul>
<h3>Text Tab</h3>
Enter text in the top left text window. Click the <b>Translate</b> button to see the phonetic transcription in the text window below. Then click the <b>Speak</b> button to speak the text and show the results in the <b>Prosody</b> tab, if that is open.
<p>
If changes are made in the <b>Prosody</b> tab, then clicking <b>Speak</b> will speak the modified prosody while <b>Translate</b> will revert to the default prosody settings for the text.
<p>
To enter phonetic symbols (Kirschenbaum encoding) in the top left text window, enclose them within [[ ]].
<h3>Spect Tab</h3>
The "Spect" tab in the left panel of the eSpeak editor shows information about the currently selected frame and sequence.
<ul>
<li>The <strong>Formants</strong> section displays the Frequency, Height, and Width of each formant peak (peaks 0 to 8). Peaks 6, 7, 8 don't have a variable width.<p>
<li><strong>% amp - Frame</strong> can be used to adjust the amplitiude of the frame. If you change this value then the rms amplitude value at the right side of the frame will change. The formant peaks don't change, just the overall amplitude of the frame.<p>
<li><strong>mS</strong> shows the time in mS until the next keyframe (or end of sequence if there is none). The spin control initially shows the same value, but this can be changed in order to increase or decrease the effctive length of a keyframe.<p>
<li><strong>% amp - Sequence</strong> /ul> adjusts the amplitude of the whole sequence. Changing this values changes the rms amplitudes of all the keyframes in the sequence.<p>
<li><strong>% mS - Sequence</strong> /ul> shows the total length of the sequence.<p>
<li><strong>Graph</strong><br>
Yellow vertical lines show the position of keyframes within the sequence.<br>
Black bars on these show the frequencies of formant peaks which have been set at these keyframes.<br>
Thick red lines, if present, show the formants, as detected in the original analysis.<br>
Thin black line, if present, shows the pitch profile measured in the original analysis.
</ul>
</li>
</ul>
<h3>Key Commands</h3>
<ul>
<li><strong>Selection</strong>.<p>
The selected frame(s) are shown with a red border. The selected formant peak is also indicated by an equals ("=") sign next to its number in the "Spect" panel to the right of the window.<p>
The selected formant peak is shown with a red triangle under the peak.<p>
Keyframes are shown with a pale yellow background. A keyframe is any frame with any formant peaks which are not zero height. If all formant peaks become zero height, the frame is no longer a keyframe. If you increase a peak's height the frame becomes a keyframe.

<dl>
<dt><strong>Numbers 0 to 8</strong>
<dd>Select formant peak number 0 to 8.
<dt><strong>Page Up/Down</strong>
<dd>Move to next/previous frame
</dl>
<li><strong>Formant movement</strong>. With the following keys, holding down <b>Shift</b> causes slower movement.
<dl>
<dt>Left
<dd>Moves the selected formant peak to higher frequency.
<dt>Right
<dd>Moves the selected formant peak to lower frequency.
<dt>Up
<dd>Increases height of the selected formant peak.
<dt>Down
<dd>Decreases height of the selected formant peak.
<dt><strong>&lt;</strong>
<dd>Narrows the selected formant peak.
<dt><strong>&gt;</strong>
<dd>Widens the selected formant peak.
<dt><strong>CTRL &lt;</strong>
<dd>Narrows the selected formant peak.
<dt><strong>CTRL &gt;</strong>
<dd>Widens the selected formant peak.
<dt><b>/</b>
<dd>Makes the selected formant peak symmetrical.
</dl>
<li><strong>Frame Cut and Paste</strong>
<dl>
<dt><b>CTRL A</b>
<dd>Select all frames in the sequence.
<dt><b>CTRL C</b>
<dd>Copy selected frames to (internal) clipboard.
<dt><b>CTRL V</b>
<dd>Paste frames from the clipboard to overwrite the contents of the selected frame and the frames which follow it. Only the formant peaks information is pasted.
<dt><b>CTRL SHIFT V</b>
<dd>Paste frames from the clippoard to insert them above the selected frame.
<dt><b>CTRL X</b>
<dd>Delete the selected frames.
</dl>
<li><strong>Frame editing</strong>
<dl>
<dt><b>CTRL D</b>
<dd>Copy the formant peaks down to the selected frame from the next keyframe above.
<dt><b>CTRL SHIFT D</b>
<dd>Copy the formant peaks up to the selected frame from the next key-frame below.
<dt><b>CTRL Z</b>
<dd>Set all formant peaks in the selected frame to zero height. It is no longer a key-frame.
<dt><b>CTRL I</b>
<dd>Set the formant peaks in the selected frame as an interpolation between the next keyframes above and below it. A dialog box allows you to enter a percentage. 50% gives values half-way between the two adjacent key-frames, 0% gives values equal to the one above, and 100% equal to the one below.
</dl>
<li><strong>Display and Sound</strong>
<dl>
<dt><b>CTRL Q</b>
<dd>Shows interpolated formant peaks on non-keyframes. These frames don't become keyframes until any of the peaks are edited to increase their height.
<dt><b>CTRL SHIFT Q</b>
<dd>Removes the interpolated formant peaks display.
<dt><b>CTRL G</b>
<dd>Toggle grid on and off.
<dt><b>F1</b>
<dd>Play sound made from the one selected keyframe.
<dt><b>F2</b>
<dd>Play sound made from all the keyframes in the sequence.
</ul>
<p>&nbsp;
<hr>
<h2>USER INTERFACE - PROSODY EDITOR</h2>
<hr>
<ul><LI>
<dl>
<dt><b>Left</b>
<dd>Move to previous phoneme.
<dt><b>Right</b>
<dd>Move to next phoneme.
<dt><b>Up</b>
<dd>Increase pitch.
<dt><b>Down</b>
<dd>Decrease pitch.
<dt><b>Ctrl Up</b>
<dd>Increase pitch range.
<dt><b>Ctrl Down</b>
<dd>Decrease pitch range.
<dt><b>&gt;</b>
<dd>Increase length.
<dt><b>&lt;</b>
<dd>Decrease length.
</dd>
</dl>
</LI>
</ul>
</body>
</html>

+ 166
- 27
docs/editor_if.md View File

@@ -1,41 +1,180 @@
USER INTERFACE - FORMANT EDITOR {.western}
-------------------------------
# Table of contents

### Frame Sequence Display {.western}
* [User interface - formant editor](#user-interface---formant-editor)
* [Frame Sequence Display](#frame-sequence-display)
* [Text Tab](#text-tab)
* [Spect Tab](#spect-tab)
* [Key Commands](#key-commands)
* [Selection](#selection)
* [Formant movement](#formant-movement)
* [Frame Cut and Paste](#frame-cut-and-paste)
* [Frame editing](#frame-editing)
* [Display and Sound](#display-and-sound)
* [User interface - prosody editor](#user-interface---prosody-editor)

The eSpeak editor can display a number of frame-sequencies in tabbed
windows. Each frame can contain a short-time frequency spectrum,
covering the period of one cycle at the sound's pitch. Frames can also
show:
# User interface - formant editor

- - - - -
## Frame Sequence Display

### Text Tab {.western}
The eSpeak editor can display a number of frame sequences in tabbed windows. Each frame can contain a short-time frequency spectrum, covering the period of one cycle at the sound's pitch. Frames can also show:

Enter text in the top left text window. Click the **Translate** button
to see the phonetic transcription in the text window below. Then click
the **Speak** button to speak the text and show the results in the
**Prosody** tab, if that is open.
* Blue vertical lines showing the estimated position of the f1 to f5 formants (if the sequence was produced by praat analysis). These should correspond with the peaks in the spectrum, but may not do so exactly.
* Numbers at the right side of the frame showing the position from the start of the sequence in milliseconds, and the pitch of the sound.
* Up to 9 formant peaks (numbered 0 to 8) added by the user, usually to match the peaks in the spectrum, in order to produce the required sound. These are shown in green, can be moved by keyboard presses as described below, and may merge if they are close together. If a frame has formant peaks then it is a Keyframe and is shown with a pale yellow background.
* If formant peaks are present, a relative amplitude (r.m.s.) value is shown at the right side of the frame.

If changes are made in the **Prosody** tab, then clicking **Speak** will
speak the modified prosody while **Translate** will revert to the
default prosody settings for the text.
## Text Tab

To enter phonetic symbols (Kirschenbaum encoding) in the top left text
window, enclose them within [[ ]].
Enter text in the top left text window. Click the **Translate** button to see the phonetic transcription in the text window below. Then click the **Speak** button to speak the text and show the results in the **Prosody** tab, if that is open.

### Spect Tab {.western}
If changes are made in the **Prosody** tab, then clicking **Speak** will speak the modified prosody while **Translate** will revert to the default prosody settings for the text.

The "Spect" tab in the left panel of the eSpeak editor shows information
about the currently selected frame and sequence.
To enter phonetic symbols in [Kirshenbaum](https://en.wikipedia.org/wiki/Kirshenbaum)-like encoding in the top left text window, enclose them within **[[ ]]**.

- - - - - -
## Spect Tab

### Key Commands {.western}
* **Spect**
tab in the left panel of the eSpeak editor shows information about the currently selected frame and sequence.

- - - - -
* **Formants**
section displays the Frequency, Height, and Width of each formant peak (peaks 0 to 8). Peaks 6, 7, 8 don't have a variable width.

USER INTERFACE - PROSODY EDITOR {.western style="margin-left: 1cm"}
-------------------------------
* **% amp - Frame**
can be used to adjust the amplitude of the frame. If you change this value then the rms amplitude value at the right side of the frame will change.
The formant peaks don't change, just the overall amplitude of the frame.

* **mS**
shows the time in milliseconds until the next keyframe (or end of sequence if there is none).
The spin control initially shows the same value, but this can be changed in order to increase or decrease the effective length of a keyframe.

* **% amp - Sequence**
adjusts the amplitude of the whole sequence. Changing this value changes the rms amplitudes of all the keyframes in the sequence.

* **% mS - Sequence**
shows the total length of the sequence.

* **Graph**
Yellow vertical lines show the position of keyframes within the sequence.
Black bars on these show the frequencies of formant peaks which have been set at these keyframes.
Thick red lines, if present, show the formants, as detected in the original analysis.
A thin black line, if present, shows the pitch profile measured in the original analysis.

## Key Commands

### Selection

The selected frame(s) are shown with a red border. The selected formant peak is also indicated by an equals (**=**) sign next to its number in the "Spect" panel to the right of the window.
The selected formant peak is shown with a red triangle under the peak.
Keyframes are shown with a pale yellow background. A keyframe is any frame with any formant peaks which are not zero height. If all formant peaks become zero height, the frame is no longer a keyframe. If you increase a peak's height the frame becomes a keyframe.

* **Numbers 0 to 8**
Select formant peak number 0 to 8.

* **Page Up/Down**
Move to next/previous frame

### Formant movement

With the following keys, holding down **Shift** causes slower movement.

* **Left**
Moves the selected formant peak to higher frequency.

* **Right**
Moves the selected formant peak to lower frequency.

* **Up**
Increases height of the selected formant peak.

* **Down**
Decreases height of the selected formant peak.

* **<**
Narrows the selected formant peak.

* **>**
Widens the selected formant peak.

* **CTRL <**
Narrows the selected formant peak.

* **CTRL >**
Widens the selected formant peak.

* **/**
Makes the selected formant peak symmetrical.

### Frame Cut and Paste

* **CTRL A**
Select all frames in the sequence.

* **CTRL C**
Copy selected frames to (internal) clipboard.

* **CTRL V**
Paste frames from the clipboard to overwrite the contents of the selected frame and the frames which follow it. Only the formant peaks information is pasted.

* **CTRL SHIFT V**
Paste frames from the clipboard to insert them above the selected frame.

* **CTRL X**
Delete the selected frames.

### Frame editing

* **CTRL D**
Copy the formant peaks down to the selected frame from the next keyframe above.

* **CTRL SHIFT D**
Copy the formant peaks up to the selected frame from the next key-frame below.

* **CTRL Z**
Set all formant peaks in the selected frame to zero height. It is no longer a key-frame.

* **CTRL I**
Set the formant peaks in the selected frame as an interpolation between the next keyframes above and below it. A dialog box allows you to enter a percentage. 50% gives values half-way between the two adjacent key-frames, 0% gives values equal to the one above, and 100% equal to the one below.
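
The percentage behaviour of **CTRL I** can be illustrated with a minimal Python sketch. It assumes plain linear interpolation of each peak's frequency, height and width; the text only states the 0%, 50% and 100% cases, so the exact formula used by the editor is an assumption, and the function name and sample values are hypothetical.

```python
def interpolate_peaks(above, below, percent):
    """Interpolate formant peaks between two keyframes.

    above, below: lists of (frequency, height, width) tuples, one per peak.
    percent: 0 gives the values of `above`, 100 gives the values of `below`.
    Linear interpolation is an assumption made for this sketch.
    """
    t = percent / 100.0
    return [
        tuple(a + (b - a) * t for a, b in zip(peak_above, peak_below))
        for peak_above, peak_below in zip(above, below)
    ]


# Hypothetical keyframes with two formant peaks each.
above = [(500, 40, 100), (1500, 30, 120)]
below = [(700, 20, 100), (1300, 50, 160)]

print(interpolate_peaks(above, below, 50))
# → [(600.0, 30.0, 100.0), (1400.0, 40.0, 140.0)]
```

With 0 the result equals the keyframe above and with 100 it equals the keyframe below, which matches the description of the dialog box.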

### Display and Sound

* **CTRL Q**
Shows interpolated formant peaks on non-keyframes. These frames don't become keyframes until any of the peaks are edited to increase their height.

* **CTRL SHIFT Q**
Removes the interpolated formant peaks display.

* **CTRL G**
Toggle grid on and off.

* **F1**
Play sound made from the one selected keyframe.

* **F2**
Play sound made from all the keyframes in the sequence.

# User interface - prosody editor

* **Left**
Move to previous phoneme.

* **Right**
Move to next phoneme.

* **Up**
Increase pitch.

* **Down**
Decrease pitch.

* **Ctrl Up**
Increase pitch range.

* **Ctrl Down**
Decrease pitch range.

* **>**
Increase length.

* **<**
Decrease length.

-

+ 0
- 87
docs/index.html View File

@@ -1,87 +0,0 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>eSpeak: Speech Synthesizer</title>
</head>
<body>

<table border="1" cellpadding="10" background="images/sand-light.jpg">
<tbody>
<tr>
<td width="15%" valign="top">
<a href="http://sourceforge.net"><img src="http://sflogo.sourceforge.net/sflogo.php?group_id=159649&amp;type=2" width="125" height="37" border="0" alt="SourceForge.net Logo" /></a>
</td>
<td>
<div align="center"><IMG src="images/lips.png" width="193" height="172" border="0">
<h1>eSpeak text to speech</h1></div>
<div align="center">
(email) &nbsp; jonsd at users dot sourceforge.net<br>
<a href="http://espeak.sf.net/download.html"><strong>Download</strong></a>
&nbsp; &nbsp; &nbsp; &nbsp;
<a href="http://sourceforge.net/projects/espeak/"><strong>eSpeak Sourceforge page</a>
&nbsp; &nbsp; &nbsp; &nbsp;
<a href="http://sourceforge.net/forum/?group_id=159649"><strong>Forum</strong></a>
&nbsp; &nbsp; &nbsp; &nbsp;
<a href="http://sourceforge.net/mail/?group_id=159649"><strong>Mailing list</strong></a>
</div>
</td>
</tr>
<tr>
<td valign="top">
<font size="+1"><strong>
<A href="commands.html">Usage</a>
<p>
<A href="languages.html">Languages</A>
<p>
<A href="docindex.html">Documents</A>
<p>
<A href="http://espeak.sf.net/samples.html">Samples</A>
<p>
<A href="http://espeak.sf.net/license.html">License</A>
</strong></font>
</td>
<td>
eSpeak is a compact open source software speech synthesizer for English and other languages, for Linux and Windows. &nbsp;
<a href="http://espeak.sourceforge.net/"><strong>http://espeak.sourceforge.net</strong></a>
<p>
eSpeak uses a "formant synthesis" method. This allows many languages to be provided in a small size. The speech is clear, and can be used at high speeds, but is not as natural or smooth as larger synthesizers which are based on human speech recordings.
<p>
eSpeak is available as:
<ul>
<li>A command line program (Linux and Windows) to speak text from a file or from stdin.
<li>A shared library version for use by other programs. (On Windows this is a DLL).
<li>A SAPI5 version for Windows, so it can be used with screen-readers and other programs that support the Windows SAPI5 interface.
<li>eSpeak has been ported to other platforms, including Solaris and Mac OSX.
</ul>
Features.
<ul>
<li>Includes different Voices, whose characteristics can be altered.
<li>Can produce speech output as a WAV file.
<li>SSML (Speech Synthesis Markup Language) is supported (not complete), and also HTML.
<li>Compact size. The program and its data, including many languages, totals about 1.4 Mbytes.
<li>Can be used as a front-end to MBROLA diphone voices, see <a href="mbrola.html">mbrola.html</a>. eSpeak converts text to phonemes with pitch and length information.
<li>Can translate text into phoneme codes, so it could be adapted as a front end for another speech synthesis engine.
<li>Potential for other languages. Several are included in varying stages of progress. Help from native speakers for these or other languages is welcome.
<li>Development tools are available for producing and tuning phoneme data.
<li>Written in C.
</ul>
<p>
I regularly use eSpeak to listen to blogs and news sites. I prefer the sound through a domestic stereo system rather than small computer speakers, which can sound rather harsh.

<hr>
<strong>Languages</strong>. The eSpeak speech synthesizer supports several languages, however in many cases these are initial drafts and need more work to improve them. Assistance from native speakers is welcome for these, or other new languages. Please contact me if you want to help.<p>
eSpeak does text to speech synthesis for the following languages, some better than others. Afrikaans, Albanian, Armenian, Cantonese, Catalan, Croatian, Czech, Danish, Dutch, English, Esperanto, Finnish, French, German, Greek, Hindi, Hungarian, Icelandic, Indonesian, Italian, Kurdish, Latvian, Lojban, Macedonian, Mandarin, Norwegian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Spanish, Swahili, Swedish, Tamil, Turkish, Vietnamese, Welsh.
<hr>
The latest <strong>development version</strong> is at:
<a href="http://espeak.sf.net/test/latest.html">espeak.sf.net/test/latest.html</a>.
<hr>
<strong>espeakedit</strong> is a GUI program used to prepare and compile phoneme data. It is now available for download. Documentation is currently sparse, but if you want to use it to add or improve language support, let me know.
<hr>
History. Originally known as <strong>speak</strong> and originally written for Acorn/RISC_OS computers starting in 1995. This version is an enhancement and re-write, including a relaxation of the original memory and processing power constraints, and with support for additional languages.
</td>
</tr>
</tbody>
</table>

</body>
</html>

+ 75
- 0
docs/index.md View File

@@ -1,3 +1,4 @@
<<<<<<< HEAD
# eSpeak NG - Documentation
======================

@@ -50,3 +51,77 @@ GUI software to edit vowel files and to compile the phoneme data for use
by eSpeak NG. See also [Espeakedit user interface](editor_if.md).


=======
# eSpeak NG: Speech Synthesizer

- [Features](#features)
- [History](#history)
- [Languages](languages.html)
- [Adding a Language](add_language.html)
- [Pronunciation Dictionary](dictionary.html)
- [Voice Files](voices.html)
- [MBROLA Voices](mbrola.html)
- [Phonemes](phonemes.html)
- [Phoneme Tables](phontab.html)
- [Intonation](intonation.html)
- [Markup Tags](ssml.html)
- [License](../COPYING)

----------

eSpeak NG is a compact open source software speech synthesizer for English and
other languages, for Linux and Windows.

eSpeak NG uses a "formant synthesis" method. This allows many languages to be
provided in a small size. The speech is clear, and can be used at high speeds,
but is not as natural or smooth as larger synthesizers which are based on human
speech recordings.

eSpeak is available as:

* A command line program (Linux and Windows) to speak text from a file or
from stdin.
* A shared library version for use by other programs. (On Windows this is
a DLL).
* A SAPI5 version for Windows, so it can be used with screen-readers and
other programs that support the Windows SAPI5 interface.
* eSpeak has been ported to other platforms, including Solaris and Mac OSX.

## Features

* Includes different Voices, whose characteristics can be altered.
* Can produce speech output as a WAV file.
* SSML (Speech Synthesis Markup Language) is supported (not complete),
and also HTML.
* Compact size. The program and its data, including many languages,
totals about 1.4 Mbytes.
* Can be used as a front-end to [MBROLA diphone voices](mbrola.html).
eSpeak NG converts text to phonemes with pitch and length information.
* Can translate text into phoneme codes, so it could be adapted as a
front end for another speech synthesis engine.
* Potential for other languages. Several are included in varying stages
of progress. Help from native speakers for these or other languages is
welcome.
* Written in C.

The eSpeak speech synthesizer supports over 70 languages, however in many cases
these are initial drafts and need more work to improve them. Assistance from
native speakers is welcome for these, or other new languages. Please contact me
if you want to help.

## History

The program was originally known as __speak__ and originally written
for Acorn/RISC\_OS computers starting in 1995 by Jonathan Duddington. This was
enhanced and re-written in 2007 as __eSpeak__, including a relaxation of the
original memory and processing power constraints, and with support for additional
languages.

In 2010, Reece H. Dunn started maintaining a version of eSpeak on GitHub that
was designed to make it easier to build eSpeak on POSIX systems, porting the
build system to autotools in 2012. In late 2015, this project was officially
forked to a new eSpeak NG project. The new eSpeak NG project is a significant
departure from the eSpeak project, with the intention of cleaning up the
existing codebase, adding new features, and adding to and improving the
supported languages.
>>>>>>> upstream/master

+ 82
- 68
docs/intonation.md View File

@@ -1,38 +1,52 @@
INTONATION {.western}
----------
# Table of contents

In eSpeak's standard intonation model, a "tune" is applied to each
* [Intonation](#intonation)
* [Clauses](#clauses)
* [Tune definitions](#tune-definitions)

# Intonation

In eSpeak NG's standard intonation model, a "tune" is applied to each
clause depending on its punctuation. Other intonation models may be used
for some languages, such as tone languages.

Named tunes are defined in the text file:
`phsource/intonation`{.western}. This file must be compiled for use by
eSpeak by using the espeakedit program, using the menu option:
`Compile -> Compile intonation data`{.western}.
`phsource/intonation`. This file must be compiled for use by
eSpeak NG by using the espeakedit program, using the menu option:
**Compile -> Compile intonation data**.

### Clauses {.western}
## Clauses

The tunes which are used for a language can be specified by using a
`tunes`{.western} statement in a voice file in
`espeak-data/voices`{.western}. eg:
`tunes` statement in a voice file in `espeak-data/voices`. eg:

`tunes   s1  c1  q1  e1`{.western}
`tunes   s1  c1  q1  e1`

Its parameters are four tune names which are used for clauses which end
in:

1. 2. 3. 4.
1. Full-stop.
1. Comma.
1. Question mark.
1. Exclamation mark.
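
As a rough illustration of how the four parameters line up with clause endings, the following Python sketch splits a `tunes` statement into a mapping. The function name `parse_tunes` is hypothetical and is not part of eSpeak NG; only the order of the four tune names comes from the list above.

```python
def parse_tunes(line):
    """Map the four tune names of a `tunes` statement to clause endings."""
    fields = line.split()
    if len(fields) != 5 or fields[0] != "tunes":
        raise ValueError("expected: tunes <full-stop> <comma> <question> <exclamation>")
    endings = (".", ",", "?", "!")
    return dict(zip(endings, fields[1:]))


print(parse_tunes("tunes   s1  c1  q1  e1"))
# → {'.': 's1', ',': 'c1', '?': 'q1', '!': 'e1'}
```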


A clause consists of the following parts:

- - - -
* **Pre-head.**
These are any unstressed syllables before the first stressed syllable.
* **Head**
This is the part from the first stressed syllable up to the last syllable before the nucleus.
* **Nucleus**
This is the stressed syllable which is the focus of the clause. eSpeak chooses the last stressed syllable of the clause.
* **Tail**
These are the syllables after the nucleus.

### Tune definitions {.western}
## Tune definitions

Here is an example tune definition from the file
`phsource/intonation`{.western}.
Here is an example tune definition from the file `phsource/intonation`.

~~~~ {.western}
```
tune s1
prehead 46 57
headenv fall 16
@@ -41,62 +55,62 @@ headextend 0 63 38 13 0
nucleus fall 70 18 24 12
nucleus0 fall 64 8
endtune
~~~~
```

It contains:

**tune** \<tune name\>
: Starts the definition of a tune. The `tune name`{.western} can
be used in a `tunes`{.western} statements in voice files.
**endtune** \<tune name\>
: Ends the definition of a tune.
**prehead** \<start pitch\> \<end pitch\>
: Gives the pitch path for any series of unstressed syllables before
the first stressed syllable.
**headenv** \<envelope\> \<height\>
: Gives the pitch envelope which is used for stressed syllables in the
head (before the nucleus), including `onset`{.western} and
`headlast`{.western} syllables if these are specified.
`height`{.western} gives a pitch range for the envelope.
**head** \<steps\> \<start pitch\> \<end pitch\> \<unstressed start\> \<unstressed end\>
: `start pitch`{.western} and `end pitch`{.western} give a pitch
path for the stressed syllables of the head. `steps`{.western} is
the maximum number of stressed syllables for which this applies. If
there are additional stressed syllables, then the
`headextend`{.western} statement is used for them.
: `unstressed start`{.western} and `unstressed end`{.western} give
a pitch path for unstressed syllables between two stressed
syllables. Their values are relative to the pitch of the previous
stressed syllable. Values are usually negative, meaning that the
unstressed syllables have lower pitch than the previous stressed
syllable.
**headextend** \<percentage list\>
: If the head contains more stressed syllables than is specified by
`steps`{.western}, then `percentage list`{.western} is used. It
contains up to 8 numbers which are used repeatedly for the
additional stressed syllables. A value of 0 corresponds to the lower
the `start pitch`{.western} and `end pitch`{.western} values of the
`head`{.western} statement. 100 corresponds to the higher value.
Negative values and values greater than 100 are allowed.
**nucleus** \<envelope\> \<top pitch\> \<bottom pitch\> \<tail start\> \<tail end\>
: This gives the pitch envelope and pitch range of the last stressed
syllable of the clause. `tail start`{.western} and
`tail end`{.western} give a pitch path for the unstressed syllables
which are after the last stressed syllable.
**nucleus0** \<envelope\> \<top pitch\> \<bottom pitch\>
: This is used instead of `nucleus`{.western} if there are no
unstressed syllables after the last stressed syllable. In this case,
the pitch changes of the nucleus and the tail and both included in
the nucleus.
* **tune** \<tune name\>
Starts the definition of a tune. The `tune name` can
be used in `tunes` statements in voice files.
* **endtune** \<tune name\>
Ends the definition of a tune.
* **prehead** \<start pitch\> \<end pitch\>
Gives the pitch path for any series of unstressed syllables before
the first stressed syllable.
* **headenv** \<envelope\> \<height\>
Gives the pitch envelope which is used for stressed syllables in the
head (before the nucleus), including `onset` and
`headlast` syllables if these are specified.
`height` gives a pitch range for the envelope.
* **head** \<steps\> \<start pitch\> \<end pitch\> \<unstressed start\> \<unstressed end\>
`start pitch` and `end pitch` give a pitch
path for the stressed syllables of the head. `steps` is
the maximum number of stressed syllables for which this applies. If
there are additional stressed syllables, then the
`headextend` statement is used for them.
`unstressed start` and `unstressed end` give
a pitch path for unstressed syllables between two stressed
syllables. Their values are relative to the pitch of the previous
stressed syllable. Values are usually negative, meaning that the
unstressed syllables have lower pitch than the previous stressed
syllable.
* **headextend** \<percentage list\>
If the head contains more stressed syllables than is specified by
`steps`, then `percentage list` is used. It
contains up to 8 numbers which are used repeatedly for the
additional stressed syllables. A value of 0 corresponds to the lower
of the `start pitch` and `end pitch` values of the
`head` statement. 100 corresponds to the higher value.
Negative values and values greater than 100 are allowed.
* **nucleus** \<envelope\> \<top pitch\> \<bottom pitch\> \<tail start\> \<tail end\>
This gives the pitch envelope and pitch range of the last stressed
syllable of the clause. `tail start` and
`tail end` give a pitch path for the unstressed syllables
which are after the last stressed syllable.
* **nucleus0** \<envelope\> \<top pitch\> \<bottom pitch\>
This is used instead of `nucleus` if there are no
unstressed syllables after the last stressed syllable. In this case,
the pitch changes of the nucleus and the tail are both included in
the nucleus.
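
The `headextend` percentages can be read as positions between the lower and higher of the two `head` pitch values. The Python sketch below shows one way to interpret that. It assumes a linear mapping and assumes that "used repeatedly" means cycling through the list; the `head` pitch values 60 and 40 are hypothetical, while the percentage list is the one from the example tune.

```python
def headextend_pitch(percent, start_pitch, end_pitch):
    """Map a headextend percentage onto the head's pitch range.

    0 gives the lower of the two head pitches, 100 the higher.
    Values below 0 or above 100 fall outside that range.
    """
    low = min(start_pitch, end_pitch)
    high = max(start_pitch, end_pitch)
    return low + (high - low) * percent / 100.0


def extended_pitches(percent_list, start_pitch, end_pitch, extra_syllables):
    """Pitches for the stressed syllables beyond `steps`, cycling the list."""
    return [
        headextend_pitch(percent_list[i % len(percent_list)], start_pitch, end_pitch)
        for i in range(extra_syllables)
    ]


percent_list = [0, 63, 38, 13, 0]  # from "headextend 0 63 38 13 0"
for pitch in extended_pitches(percent_list, 60, 40, 7):
    print(round(pitch, 1))
```

Seven extra syllables are requested here, so the sixth and seventh reuse the first two percentages.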

The following attributes may also be included:

**onset** \<pitch\> \<unstressed start\> \<unstressed end\>
: This specifies the pitch for the first stressed syllable of the
head. If the `onset`{.western} statement is present, then the
`head`{.western} statement used for the stressed syllables after the
first.
**headlast** \<pitch\> \<unstressed start\> \<unstressed end\>
: This specifies the pitch for the last stressed syllable of the head
(i.e. the stressed syllable before the nucleus).
* **onset** \<pitch\> \<unstressed start\> \<unstressed end\>
This specifies the pitch for the first stressed syllable of the
head. If the `onset` statement is present, then the
`head` statement is used for the stressed syllables after the
first.
* **headlast** \<pitch\> \<unstressed start\> \<unstressed end\>
This specifies the pitch for the last stressed syllable of the head
(i.e. the stressed syllable before the nucleus).


+ 57
- 36
docs/languages.md View File

@@ -1,12 +1,24 @@
3. LANGUAGES {.western}
------------

**Languages**. The eSpeak speech synthesizer supports several languages,
# Table of contents

* [Languages](#languages)
* [Help Needed](#help-needed)
* [Character sets](#character-sets)
* [Voice Files](#voice-files)
* [Default Voice](#default-voice)
* [English Voices](#english-voices)
* [Voice Variants](#voice-variants)
* [Other Languages](#other-languages)
* [Provisional Languages](#provisional-languages)
* [Mbrola Voices](#mbrola-voices)

# Languages

The eSpeak NG speech synthesizer supports several languages,
however in many cases these are initial drafts and need more work to
improve them. Assistance from native speakers is welcome for these, or
other new languages. Please contact me if you want to help.

eSpeak does text to speech synthesis for the following languages, some
eSpeak NG does text to speech synthesis for the following languages, some
better than others. Afrikaans, Albanian, Armenian, Cantonese, Catalan,
Croatian, Czech, Danish, Dutch, English, Esperanto, Finnish, French,
German, Greek, Hindi, Hungarian, Icelandic, Indonesian, Italian,
@@ -15,7 +27,7 @@ Portuguese, Romanian, Russian, Serbian, Slovak, Spanish, Swahili,
Swedish, Tamil, Turkish, Vietnamese, Welsh.


#### Help Needed {.western}
### Help Needed

Many of these are just experimental attempts at these languages,
produced after a quick reading of the corresponding article on
@@ -31,9 +43,9 @@ Italian voice improved from "difficult to understand" to "good" by
changing the relative length of stressed syllables. Identifying
unstressed function words in the xx\_list file is also important to make
the speech flow well. See [Adding or Improving a
Language](add_language.html)
Language](add_language.md)

#### Character sets {.western}
### Character sets

Languages recognise text either as UTF8 or alternatively in an 8-bit
character set which is appropriate for that language. For example, for
@@ -41,9 +53,7 @@ Polish this is Latin2, for Russian it is KOI8-R. This choice can be
overridden by a line in the voices file to specify an ISO 8859 character
set, eg. for Russian the line:

~~~~ {.western style="margin-bottom: 0.5cm"}
charset 5
~~~~

will mean that ISO 8859-5 is used as the 8-bit character set rather than
KOI8-R.
@@ -56,18 +66,16 @@ or Russian voice will sound OK, but each word is spoken separately so it
won't flow properly.

Sample texts in various languages can be found at
[http://\<language\>.wikipedia.org](http://meta.wikimedia.org/wiki/List_of_Wikipedias)
and [www.gutenberg.org](http://www.gutenberg.org/)
[wikipedia](http://meta.wikimedia.org/wiki/List_of_Wikipedias)
and [gutenberg](http://www.gutenberg.org/)

### 3.1 Voice Files {.western}
## Voice Files

A number of Voice files are provided in the
`espeak-data/voices`{.western} directory. You can select one of these
with the **-v \<voice filename\>** parameter to the speak command, eg:
`espeak-data/voices` directory. You can select one of these
with the `-v <voice filename>` parameter to the speak command, eg:

~~~~ {.western style="margin-bottom: 0.5cm"}
espeak-ng -vaf
~~~~

to speak using the Afrikaans voice.

@@ -78,48 +86,61 @@ code](http://www.sil.org/iso639-3/codes.asp) can be used.

For details of the voice files see [Voices](voices.html).

#### Default Voice {.western}
### Default Voice

**default**
This voice is used if none is specified in the speak command. Copy your preferred voice to "default" so you can use the speak command without the need to specify a voice.

## English Voices

* **en**
is the standard default English voice.

* **en-us**
American English.

* **en-sc**
English with a Scottish accent.

### 3.2 English Voices {.western}
* **en-n**, **en-rp**, **en-wm**
are different English voices. These can be considered caricatures of various British accents: Northern, Received Pronunciation, West Midlands respectively.

### 3.3 Voice Variants {.western}
## Voice Variants

To make alternative voices for a language, you can make additional voice
files in espeak-data/voices which contains commands to change various
voice and pronunciation attributes. See [voices.html](voices.html).
voice and pronunciation attributes. See [voices](voices.md).

Alternatively there are some preset voice variants which can be applied
to any of the language voices, by appending `+`{.western} and a variant
to any of the language voices, by appending **+** and a variant
name. Their effects are defined by files in
`espeak-data/voices/!v`{.western}.
`espeak-data/voices/!v`.

The variants are `+m1 +m2 +m3 +m4 +m5 +m6 +m7`{.western} for male
voices, `+f1 +f2 +f3 +f4 +f5 `{.western}for female voices, and
`+croak +whisper`{.western} for other effects. For example:
The variants are `+m1 +m2 +m3 +m4 +m5 +m6 +m7` for male
voices, `+f1 +f2 +f3 +f4 +f5` for female voices, and
`+croak +whisper` for other effects. For example:

~~~~ {.western style="margin-bottom: 0.5cm"}
espeak-ng -ven+m3
~~~~

The available voice variants can be listed with:

~~~~ {.western style="margin-bottom: 0.5cm"}
espeak-ng --voices=variant
~~~~

### 3.4 Other Languages {.western}
## Other Languages

The eSpeak speech synthesizer does text to speech for the following
The eSpeak NG speech synthesizer does text to speech for the following
additional languages.

### 3.5 Provisional Languages {.western}
## Provisional Languages

These languages are only initial naive implementations which have had
little or no feedback and improvement from native speakers.

### 3.6 Mbrola Voices {.western}
## Mbrola Voices

Some additional voices, whose names start with **mb-** (for example
**mb-en1**) use eSpeak as a front-end to Mbrola diphone voices. eSpeak
**mb-en1**) use eSpeak NG as a front-end to Mbrola diphone voices. eSpeak NG
does the spelling-to-phoneme translation and intonation. See
[mbrola.html](mbrola.html).
[mbrola](mbrola.md).

+ 85
- 57
docs/mbrola.md View File

@@ -1,126 +1,154 @@
MBROLA VOICES {.western}
-------------
# Table of contents

* [Mbrola voices](#mbrola-voices)
* [Voice Names](#voice-names)
* [Windows Installation](#windows-installation)
* [Linux Installation](#linux-installation)
* [Mbrola Voice Files](#mbrola-voice-files)
* [Mbrola Phoneme Translation Data](#mbrola-phoneme-translation-data)

# Mbrola voices

The Mbrola project is a collection of diphone voices for speech
synthesis. They do not include any text-to-phoneme translation, so this
must be done by another program. The Mbrola voices are cost-free but are
not open source. They are available from the Mbrola website at:\
not open source. They are available from the Mbrola website at:

[http://www.tcts.fpms.ac.be/synthesis/mbrola/mbrcopybin.html](http://www.tcts.fpms.ac.be/synthesis/mbrola/mbrcopybin.html)

eSpeak can be used as a front-end to Mbrola. It provides the
eSpeak NG can be used as a front-end to Mbrola. It provides the
spelling-to-phoneme translation and intonation, which Mbrola then uses
to generate speech sound.

### Voice Names {.western}
## Voice Names

To use a Mbrola voice, eSpeak needs information to translate from its
To use a Mbrola voice, eSpeak NG needs information to translate from its
own phonemes to the equivalent Mbrola phonemes. This has been set up for
only some voices so far.

The eSpeak voices which use Mbrola are named as:\
The eSpeak NG voices which use Mbrola are named as:\
  **mb-**xxx

where xxx is the name of a Mbrola voice (eg. **mb-en1** for the Mbrola
"**en1**" English voice). These voice files are in eSpeak's directory
`espeak-data/voices/mbrola`{.western}.
"**en1**" English voice). These voice files are in eSpeak NG's directory
`espeak-data/voices/mbrola`.

The installation instructions below use the Mbrola voice "en1" as an
example. You can use other mbrola voices for which there is an
equivalent eSpeak voice in `espeak-data/voices/mbrola`{.western}.
equivalent eSpeak NG voice in `espeak-data/voices/mbrola`.

There are some additional eSpeak Mbrola voices which speak English text
There are some additional eSpeak NG Mbrola voices which speak English text
using a Mbrola voice for a different language. These contain the name of
the Mbrola voice with a suffix **-en**. For example, the voice
**mb-de4-en** will speak English text with a German accent by using the
Mbrola **de4** voice.

### Windows Installation {.western}
## Windows Installation

The SAPI5 version of eSpeak NG uses the mbrola.dll.

The SAPI5 version of eSpeak uses the mbrola.dll.
1. Install eSpeak. Include the voice **mb-en1** in the list of voices during the eSpeak installation.
2. Install the PC/Windows version of Mbrola (MbrolaTools35.exe) from: [http://www.tcts.fpms.ac.be/synthesis/mbrola/bin/pcwin/MbrolaTools35.exe](http://www.tcts.fpms.ac.be/synthesis/mbrola/bin/pcwin/MbrolaTools35.exe).
3. Get the **en1** voice from: [http://www.tcts.fpms.ac.be/synthesis/mbrola/mbrcopybin.html](http://www.tcts.fpms.ac.be/synthesis/mbrola/mbrcopybin.html). Unpack the archive, and copy the "**en1**" data file (not the whole "en1" directory) into `C:/Program Files/eSpeak/espeak-data/mbrola`.
4. Use the voice **espeak-MB-EN1** from the list of SAPI5 voices.

1. 2. 3. 4.

### Linux Installation {.western}
## Linux Installation

From eSpeak version 1.44 onwards, eSpeak calls the mbrola program
From eSpeak version 1.44 onwards, eSpeak NG calls the mbrola program
directly, rather than passing phoneme data to it using a pipe.

1. 2. 3.
1. To install the Linux Mbrola binary, download: [http://www.tcts.fpms.ac.be/synthesis/mbrola/bin/pclinux/mbr301h.zip](http://www.tcts.fpms.ac.be/synthesis/mbrola/bin/pclinux/mbr301h.zip). Unpack the archive, and copy and rename the file from: `mbrola-linux-i386` to `mbrola` somewhere in your executable path (eg. `/usr/bin/mbrola` ).
2. Get the en1 voice from: [http://www.tcts.fpms.ac.be/synthesis/mbrola/mbrcopybin.html](http://www.tcts.fpms.ac.be/synthesis/mbrola/mbrcopybin.html). Unpack the archive, and copy the "**en1**" data file (not the whole "en1" directory) to `/usr/share/mbrola/en1`.

eSpeak will look for mbrola voices firstly in `espeak-data/mbrola` and then in `/usr/share/mbrola`

3. If you use the eSpeak voice such as "**mb-en1**" then eSpeak will use the mbrola "en1" voice, eg:
`espeak-ng -v mb-en1 "Hello world"`

### Mbrola Voice Files {.western}
To generate mbrola phoneme data (.pho file) you can use:
`espeak-ng -v mb-en1 -q --pho "Hello world"`
or
`espeak-ng -v mb-en1 -q --pho --phonout=out.pho "Hello world"`

eSpeak's voice files for Mbrola voices are in directory
`espeak-data/voices/mbrola`{.western}. They contain a line:\
  `mbrola <voice> <translation>`{.western} \
eg.\
  `mbrola en1 en1_phtrans`{.western}

- -
## Mbrola Voice Files

They are binary files which are compiled, using espeakedit, from source
files in `phsource/mbrola`{.western}, see below.
eSpeak NG's voice files for Mbrola voices are in directory `espeak-data/voices/mbrola`.
They contain a line: `mbrola <voice> <translation>`

### Mbrola Phoneme Translation Data {.western}
eg.
`mbrola en1 en1_phtrans`

Mbrola phoneme translation files specify translations from eSpeak
* **\<voice\>**
is the name of the Mbrola voice.
* **\<translation\>**
is a translation file to convert between eSpeak phonemes and the equivalent Mbrola phonemes.
These are kept in: `espeak-data/mbrola_ph`

They are binary files which are compiled, using espeakedit, from source files in `phsource/mbrola`, see below.

## Mbrola Phoneme Translation Data

Mbrola phoneme translation files specify translations from eSpeak NG
phoneme names to mbrola phoneme names. They are referenced from voice
files.

The source files are in `phsource/mbrola`{.western}. These are compiled
using the `espeakedit`{.western} program
(`Compile->Compile mbrola phonemes list`{.western}) to produce data
files in `espeak-data/mbrola_ph`{.western} which are used by eSpeak.
The source files are in `phsource/mbrola`. These are compiled
using the `espeakedit` program
(`Compile->Compile mbrola phonemes list`) to produce data
files in `espeak-data/mbrola_ph` which are used by eSpeak NG.

Each line in the mbrola phoneme translation file contains:

`<control> <espeak ph1> <espeak ph2> <percent> <mbrola ph1> [<mbrola ph2>] `{.western}

**\<control\>**
`<control> <espeak ph1> <espeak ph2> <percent> <mbrola ph1> [<mbrola ph2>] `

- - - -
* **\<control\>**
bit 0 skip the next phoneme
bit 1 match this and previous phoneme
bit 2 only at the start of a word
bit 3 don't match two phonemes across a word boundary

**\<espeak ph1\>**\
The eSpeak phoneme which is to be translated to an mbrola phoneme.
* **\<espeak ph1\>**
The eSpeak NG phoneme which is to be translated to an mbrola phoneme.

**\<espeak ph2\>**\
If this field is not `NULL`{.western}, then the match only occurs if
* **\<espeak ph2\>**
If this field is not `NULL`, then the match only occurs if
this field matches the next phoneme. If control bit 1 is set, then the
*previous* rather than the *next* phoneme is matched. This field may
also have the following values:\
`VWL`{.western}   matches any Vowel phoneme.
also have the following values:
`VWL`   matches any Vowel phoneme.

**\<percent\>**\
If this field is zero then only one mbrola phoneme is used. If this
* **\<percent\>**
If this field is zero then only one mbrola phoneme is used. If this
field is non-zero, then two mbrola phonemes are used, and this value
gives the percentage length of the first mbrola phoneme.

**\<mbrola ph1\>**\
The mbrola phoneme to which the eSpeak phoneme is translated. This
field may be `NULL`{.western}.
* **\<mbrola ph1\>**
The mbrola phoneme to which the eSpeak NG phoneme is translated. This
field may be `NULL`.

**\<mbrola ph2\>**\
The second mbrola phoneme. This field is only used if the \<percent\>
* **\<mbrola ph2\>**
The second mbrola phoneme. This field is only used if the \<percent\>
field is not zero.

The list is searched from start to finish, until a match is found.
Therefore, a line with more specific match condition should appear
before a line which matches the same eSpeak phoneme but with a more
before a line which matches the same eSpeak NG phoneme but with a more
general condition.
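
The first-match search described above can be sketched in Python. This is only an illustration of the documented field layout, not the eSpeak NG implementation: the names `Rule`, `parse_rule` and `find_rule` are hypothetical, the control field is assumed to be a decimal integer, and control bits 0 and 3 are not modelled.

```python
from dataclasses import dataclass
from typing import Optional

# Control bits, following the field description above.
MATCH_PREVIOUS = 2   # bit 1: match this and the previous phoneme
WORD_START_ONLY = 4  # bit 2: only at the start of a word


@dataclass
class Rule:
    control: int
    ph1: str                     # eSpeak NG phoneme to translate
    ph2: str                     # "NULL", "VWL", or a phoneme name
    percent: int                 # length share of the first mbrola phoneme
    mbr1: str                    # first mbrola phoneme (may be "NULL")
    mbr2: Optional[str] = None   # second mbrola phoneme, if any


def parse_rule(line: str) -> Rule:
    """Split one translation line into its whitespace-separated fields."""
    fields = line.split()
    mbr2 = fields[5] if len(fields) > 5 else None
    return Rule(int(fields[0]), fields[1], fields[2], int(fields[3]), fields[4], mbr2)


def find_rule(rules, ph, prev_ph, next_ph, at_word_start, vowels):
    """Return the first rule that matches, scanning from start to finish."""
    for rule in rules:
        if rule.ph1 != ph:
            continue
        if rule.control & WORD_START_ONLY and not at_word_start:
            continue
        if rule.ph2 != "NULL":
            # Bit 1 selects the previous rather than the next phoneme.
            other = prev_ph if rule.control & MATCH_PREVIOUS else next_ph
            if rule.ph2 == "VWL":
                if other not in vowels:
                    continue
            elif rule.ph2 != other:
                continue
        return rule
    return None
```

Because the scan stops at the first hit, a line such as `0 t VWL 0 t_v` (a made-up entry) has to come before the general `0 t NULL 0 t` line, otherwise the general line would always win.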

The file `dictsource/dict_phonemes`{.western} lists the eSpeak phonemes
The file `dictsource/dict_phonemes` lists the eSpeak NG phonemes
which are used for each language. Translations for all these should be
given in the mbrola phoneme translation file. In addition, some phonemes
which are referenced from phoneme files (eg.
`phsource/ph_language, phsource/phonemes`{.western}) in lines such as:
`phsource/ph_language, phsource/phonemes`) in lines such as:

~~~~ {.western}
beforenotvowel l/
reduceto a# 0
~~~~
beforenotvowel l/
reduceto a# 0

should also be included, even though they don't appear in
`dictsource/dict_phonemes`{.western}.
`dictsource/dict_phonemes`.

If the language's \*\_list or \*\_rules files includes rules to speak
words "as English" the mbrola phoneme translation file should include

+ 136
- 259
docs/phonemes.md View File

@@ -1,5 +1,12 @@
PHONEMES {.western}
--------
# Table of contents

* [Phonemes](#phonemes)
* [English Consonants](#english-consonants)
* [Some Additional Consonants](#some-additional-consonants)
* [English Vowels](#english-vowels)
* [Some Additional Vowels](#some-additional-vowels)

# Phonemes

In general a different set of phonemes can be defined for each language.

@@ -14,98 +21,48 @@ characters. See:
Phoneme mnemonics can be used directly in the text input to
**espeak-ng**. They are enclosed within double square brackets. Spaces
are used to separate words, and all stressed syllables must be marked
explicitly. eg:\
`[[D,Is Iz sVm f@n'EtIk t'Ekst 'InpUt]]`{.western}

### English Consonants {.western}

`[p]`{.western}

`[b]`{.western}

`[t]`{.western}

`[d]`{.western}

`[tS]`{.western}

**ch**urch

`[dZ]`{.western}

**j**udge

`[k]`{.western}

`[g]`{.western}

`[f]`{.western}

`[v]`{.western}

`[T]`{.western}

**th**in

`[D]`{.western}

**th**is

`[s]`{.western}

`[z]`{.western}

`[S]`{.western}

**sh**op

`[Z]`{.western}

plea**s**ure

`[h]`{.western}

`[m]`{.western}

`[n]`{.western}

`[N]`{.western}

si**ng**

`[l]`{.western}

`[r]`{.western}

**r**ed (Omitted if not immediately followed by a vowel).

`[j]`{.western}

**y**es

`[w]`{.western}

**Some Additional Consonants**

\

`[C]`{.western}

German i**ch**

`[x]`{.western}

German bu**ch**

`[l^]`{.western}

Italian **gl**i

`[n^]`{.western}

Spanish **ñ**

### English Vowels {.western}
explicitly. eg:
\[[D,Is Iz sVm f@n'EtIk t'Ekst 'InpUt]\]

## English Consonants

+----------------+-------------------------------+
|\[p\] | \[b\] |
+----------------+-------------------------------+
|\[t\] | \[d\] |
+----------------+-------------------------------+
|\[tS\] **ch**urch | \[dZ\] **j**udge |
+----------------+-------------------------------+
|\[k\] | \[g\] |
+----------------+-------------------------------+
|\[f\] | \[v\] |
+----------------+-------------------------------+
|\[T\] **th**in | \[D\] **th**is |
+----------------+-------------------------------+
|\[s\] | \[z\] |
+----------------+-------------------------------+
|\[S\] **sh**op | \[Z\] plea**s**ure |
+----------------+-------------------------------+
|\[h\] | |
+----------------+-------------------------------+
|\[m\] | \[n\] |
+----------------+-------------------------------+
|\[N\] si**ng** | |
+----------------+-------------------------------+
|\[l\] | \[r\] **r**ed (Omitted if not immediately followed by a vowel). |
+----------------+-------------------------------+
|\[j\] **y**es | \[w\] |
+----------------+-------------------------------+

## Some Additional Consonants

+-------------------------+---------------------------+
| \[C\] German i**ch**    | \[x\] German bu**ch**     |
+-------------------------+---------------------------+
| \[l^\] Italian **gl**i  | \[n^\] Spanish **ñ**      |
+-------------------------+---------------------------+

## English Vowels

These are the phonemes which are used by the English spelling-to-phoneme
translations (en\_rules and en\_list). In some varieties of English
@@ -113,171 +70,91 @@ different phonemes may have the same sound, but they are kept separate
because they may differ in another variety.

In rhotic accents, such as General American, the phonemes
`[3:], [A@], [e@], [i@], [O@], [U@] `{.western}include the "r" sound.

`[@]`{.western}

alph**a**

schwa

`[3]`{.western}

bett**er**

rhotic schwa. In British English this is the same as `[@]`{.western},
but it includes 'r' colouring in American and other rhotic accents. In
these cases a separate `[r]`{.western} should not be included unless it
is followed immediately by another vowel.

`[3:]`{.western}

n**ur**se

`[@L]`{.western}

simp**le**

`[@2]`{.western}

the

Used only for "the".

`[@5]`{.western}

to

Used only for "to".

`[a]`{.western}

tr**a**p

`[aa]`{.western}

b**a**th

This is `[a]`{.western} in some accents, `[A:]`{.western} in others.

`[a#]`{.western}

**a**bout

This may be `[@]`{.western} or may be a more open schwa.

`[A:]`{.western}

p**al**m

`[A@]`{.western}

st**ar**t

`[E]`{.western}

dr**e**ss

`[e@]`{.western}

squ**are**

`[I]`{.western}

k**i**t

`[I2]`{.western}

**i**ntend

As `[I]`{.western}, but also indicates an unstressed syllable.

`[i]`{.western}

happ**y**

An unstressed "i" sound at the end of a word.

`[i:]`{.western}

fl**ee**ce

`[i@]`{.western}

n**ear**

`[0]`{.western}

l**o**t

`[V]`{.western}

str**u**t

`[u:]`{.western}

g**oo**se

`[U]`{.western}

f**oo**t

`[U@]`{.western}

c**ure**

`[O:]`{.western}

th**ou**ght

`[O@]`{.western}

n**or**th

`[o@]`{.western}

f**or**ce

`[aI]`{.western}

pr**i**ce

`[eI]`{.western}

f**a**ce

`[OI]`{.western}

ch**oi**ce

`[aU]`{.western}

m**ou**th

`[oU]`{.western}

g**oa**t

`[aI@]`{.western}

sc**ie**nce

`[aU@]`{.western}

h**our**

### Some Additional Vowels {.western}
`[3:], [A@], [e@], [i@], [O@], [U@]` include the "r" sound.

+---------+--------------------------+---------------------------------------------------------------------+
|\[@\] | alph**a** | schwa |
+---------+--------------------------+---------------------------------------------------------------------+
|\[3\] | bett**er** | rhotic schwa. In British English this is the same as \[@\], |
| | | but it includes 'r' colouring in American and other rhotic accents. |
| | | In these cases a separate \[r\] should not be included unless it is |
| | | followed immediately by another vowel. |
+---------+--------------------------+---------------------------------------------------------------------+
|\[3:\] | n**ur**se | |
+---------+--------------------------+---------------------------------------------------------------------+
|\[@L\] | simp**le** | |
+---------+--------------------------+---------------------------------------------------------------------+
|\[@2\] | the | Used only for "the". |
+---------+--------------------------+---------------------------------------------------------------------+
|\[@5\] | to | Used only for "to". |
+---------+--------------------------+---------------------------------------------------------------------+
|\[a\] | tr**a**p | |
+---------+--------------------------+---------------------------------------------------------------------+
|\[aa\] | b**a**th | This is \[a\] in some accents, \[A:\] in others. |
+---------+--------------------------+---------------------------------------------------------------------+
|\[a#\] | **a**bout | This may be \[@\] or may be a more open schwa. |
+---------+--------------------------+---------------------------------------------------------------------+
|\[A:\] | p**al**m | |
+---------+--------------------------+---------------------------------------------------------------------+
|\[A@\] | st**ar**t | |
+---------+--------------------------+---------------------------------------------------------------------+
|\[E\] | dr**e**ss | |
+---------+--------------------------+---------------------------------------------------------------------+
|\[e@\] | squ**are** | |
+---------+--------------------------+---------------------------------------------------------------------+
|\[I\] | k**i**t | |
+---------+--------------------------+---------------------------------------------------------------------+
|\[I2\] | **i**ntend | As \[I\], but also indicates an unstressed syllable. |
+---------+--------------------------+---------------------------------------------------------------------+
|\[i\] | happ**y** | An unstressed "i" sound at the end of a word. |
+---------+--------------------------+---------------------------------------------------------------------+
|\[i:\] | fl**ee**ce | |
+---------+--------------------------+---------------------------------------------------------------------+
|\[i@\] | n**ear** | |
+---------+--------------------------+---------------------------------------------------------------------+
|\[0\] | l**o**t | |
+---------+--------------------------+---------------------------------------------------------------------+
|\[V\] | str**u**t | |
+---------+--------------------------+---------------------------------------------------------------------+
|\[u:\] | g**oo**se | |
+---------+--------------------------+---------------------------------------------------------------------+
|\[U\] | f**oo**t | |
+---------+--------------------------+---------------------------------------------------------------------+
|\[U@\] | c**ure** | |
+---------+--------------------------+---------------------------------------------------------------------+
|\[O:\] | th**ou**ght | |
+---------+--------------------------+---------------------------------------------------------------------+
|\[O@\] | n**or**th | |
+---------+--------------------------+---------------------------------------------------------------------+
|\[o@\] | f**or**ce | |
+---------+--------------------------+---------------------------------------------------------------------+
|\[aI\] | pr**i**ce | |
+---------+--------------------------+---------------------------------------------------------------------+
|\[eI\] | f**a**ce | |
+---------+--------------------------+---------------------------------------------------------------------+
|\[OI\] | ch**oi**ce | |
+---------+--------------------------+---------------------------------------------------------------------+
|\[aU\] | m**ou**th | |
+---------+--------------------------+---------------------------------------------------------------------+
|\[oU\] | g**oa**t | |
+---------+--------------------------+---------------------------------------------------------------------+
|\[aI@\] | sc**ie**nce | |
+---------+--------------------------+---------------------------------------------------------------------+
|\[aU@\] | h**our** | |
+---------+--------------------------+---------------------------------------------------------------------+

## Some Additional Vowels

Other languages will have their own vowel definitions, eg:

+--------------------------------------+--------------------------------------+
| `[e]`{.western} | German **eh**, French **é** |
+--------------------------------------+--------------------------------------+
| `[o]`{.western} | German **oo**, French **o** |
+--------------------------------------+--------------------------------------+
| `[y]`{.western} | German **ü**, French **u** |
+--------------------------------------+--------------------------------------+
| `[Y]`{.western} | German **ö**, French **oe** |
+--------------------------------------+--------------------------------------+
`[:] `{.western}can be used to lengthen a vowel, eg `[e:]`{.western}
+---------+--------------------------------------+
| \[e\]   | German **eh**, French **é**          |
+---------+--------------------------------------+
| \[o\]   | German **oo**, French **o**          |
+---------+--------------------------------------+
| \[y\]   | German **ü**, French **u**           |
+---------+--------------------------------------+
| \[Y\]   | German **ö**, French **oe**          |
+---------+--------------------------------------+

**\[:\]** can be used to lengthen a vowel, eg \[e:\]

+ 335
- 48
docs/phontab.md View File

@@ -1,5 +1,15 @@
PHONEME TABLES {.western}
--------------
# Table of contents

* [Phoneme tables](#phoneme-tables)
* [Phoneme files](#phoneme-files)
* [Phoneme definitions](#phoneme-definitions)
* [Phoneme Properties](#phoneme-properties)
* [Phoneme Instructions](#phoneme-instructions)
* [Conditional Statements](#conditional-statements)
* [Sound Specifications](#sound-specifications)
* [Vowel Transitions](#vowel-transitions)

# Phoneme tables

A phoneme table defines all the phonemes which are used by a language,
together with their properties and the data for their production as
@@ -20,7 +30,7 @@ the espeakedit download package. "Vowel files", which are referenced in
FMT(), VowelStart(), and VowelEnding() instructions are made using the
espeakedit program.

### Phoneme files {.western}
## Phoneme files

The phoneme tables are defined in a master phoneme file, named
**phonemes**. This starts with the **base** phoneme table followed by
@@ -30,22 +40,22 @@ from the **base** table or previously defined tables.
In addition to phoneme definitions, the phoneme file can contain the
following:

**include** \<filename\>
: Includes the text of the specified file at this point. This allows
different phoneme tables to be kept in different text files, for
convenience. \<filename\> is a relative path. The included file can
itself contain **include** statements.
**phonemetable** \<name\> \<parent\>
: Starts a new phoneme table, and ends the previous table.\
\<name\> Is the name of this phoneme table. This name is used in
Voice files.\
\<parent\> Is the name of a previously defined phoneme table whose
phoneme definitions are inherited by this one. The name **base**
indicates the first (base) phoneme table.
### Phoneme definitions {.western}
Note: These new Phoneme definitions apply to eSpeak version 1.42.20 and
* **include** \<filename\>
Includes the text of the specified file at this point. This allows
different phoneme tables to be kept in different text files, for
convenience. \<filename\> is a relative path. The included file can
itself contain **include** statements.
* **phonemetable** \<name\> \<parent\>
Starts a new phoneme table, and ends the previous table.
\<name\> Is the name of this phoneme table. This name is used in
Voice files.
\<parent\> Is the name of a previously defined phoneme table whose
phoneme definitions are inherited by this one. The name **base**
indicates the first (base) phoneme table.
## Phoneme definitions
Note: These new Phoneme definitions apply to eSpeak version 1.42.20 and
later.

A phoneme table contains a list of phoneme definitions. Each starts with
@@ -53,7 +63,7 @@ the keyword **phoneme** and the phoneme name (this is the name used in
the pronunciation rules in a language's \*\_rules and \*\_list files),
and ends with the keyword **endphoneme**. For example:

~~~~ {.western}
```
phoneme aI
vowel
starttype #a endtype #i
@@ -75,7 +85,7 @@ and ends with the keyword **endphoneme**. For example:
ENDIF
WAV(ufric/s)
endphoneme
~~~~
```

A phoneme definition contains both static properties and executed
instructions. The instructions may contain conditional statements, so
@@ -90,23 +100,110 @@ produce the sound for the phoneme.
The **import\_phoneme** statement can be used to copy a previously
defined phoneme from a specified phoneme table. For example:

~~~~ {.western}
```
phoneme t
import_phoneme base/t[
endphoneme
~~~~
```

means: `phoneme t`{.western} in this phoneme table is a copy of
`phoneme t[`{.western} from phoneme table "base". A **length**
means: `phoneme t` in this phoneme table is a copy of
`phoneme t[` from phoneme table "base". A **length**
instruction can be used after **import\_phoneme** to vary the length
from the original.

### Phoneme Properties {.western}
## Phoneme Properties

Within the phoneme definition the following lines may occur: ( (V)
indicates only for vowels, (C) only for consonants)

### Phoneme Instructions {.western}
Type. One of these must be present.

+------------+-----------------------------------------------+
| **vowel** | |
+------------+-----------------------------------------------+
| **liquid** | semi-vowels, such as: `r, l, j, w` |
+------------+-----------------------------------------------+
| **nasal** | nasal eg: `m, n, N` |
+------------+-----------------------------------------------+
| **stop** | stop eg: `p, b, t, d, k, g` |
+------------+-----------------------------------------------+
| **frc** | fricative eg: `f, v, T, D, s, z, S, Z, C, x` |
+------------+-----------------------------------------------+
| **afr** | affricate eg: `tS, dZ` |
+------------+-----------------------------------------------+
| **pause** | |
+------------+-----------------------------------------------+
| **stress** | used for stress symbols, eg: ' , = % |
+------------+-----------------------------------------------+
| **virtual**| Used to represent a class of phonemes. |
+------------+-----------------------------------------------+

Properties:

+--------------+----------------------------------------------------------------------------------+
|**vls** | (C) voiceless eg. `p, t, k, f, s` |
+--------------+----------------------------------------------------------------------------------+
|**vcd** | (C) voiced eg. `b, d, g, v, z` |
+--------------+----------------------------------------------------------------------------------+
|**sibilant** | (C) eg: `s, z, S, Z, tS, dZ` |
+--------------+----------------------------------------------------------------------------------+
|**palatal** | (C) A palatal or palatalized consonant. |
+--------------+----------------------------------------------------------------------------------+
|**rhotic** | (C) An "r" type consonant. |
+--------------+----------------------------------------------------------------------------------+
|**unstressed**| (V) This vowel is always unstressed, unless explicitly marked otherwise. |
+--------------+----------------------------------------------------------------------------------+
|**nolink** | Prevent any linking from the previous phoneme. |
+--------------+----------------------------------------------------------------------------------+
|**nopause** | Used in a `liquid` or `nasal` phoneme to prevent eSpeak inserting a short |
| | pause if a word starts with this phoneme and the previous word ends with a vowel.|
+--------------+----------------------------------------------------------------------------------+
|**trill** | (C) Apply trill to the voicing. |
+--------------+----------------------------------------------------------------------------------+

Place of Articulation (C):

+--------+------------------+
|**blb** | bi-labial |
+--------+------------------+
|**lbd** | labio-dental |
+--------+------------------+
|**dnt** | dental |
+--------+------------------+
|**alv** | alveolar |
+--------+------------------+
|**rfx** | retroflex |
+--------+------------------+
|**pla** | palato-alveolar |
+--------+------------------+
|**pal** | palatal |
+--------+------------------+
|**vel** | velar |
+--------+------------------+
|**lbv** | labio-velar |
+--------+------------------+
|**uvl** | uvular |
+--------+------------------+
|**phr** | pharyngeal |
+--------+------------------+
|**glt** | glottal |
+--------+------------------+


* **starttype** \<phoneme\>
Allocates this phoneme to a group so that conditions such as nextPh(#e) can test for any of a group of phonemes. Pre-defined groups for use for vowels are: #@ #a #e #i #o #u. Additional groups can be defined as phonemes with type "virtual".

* **endtype** \<phoneme\>
Allocates this phoneme to a group so that conditions such as prevPh(#e) can test for any of a group of phonemes. Pre-defined groups for use for vowels are: #@ #a #e #i #o #u. Additional groups can be defined as phonemes with type "virtual".

* **lengthmod** \<integer\>
\(C\) Determines how this consonant affects the length of the previous vowel.
This value is used as index into the `length_mods` table in the `CalcLengths()` function in the eSpeak program.

* **voicingswitch** \<phoneme\>
This is used for some languages to change between voiced and unvoiced phonemes.

## Phoneme Instructions

Phoneme Instructions may be included within conditional statements.

@@ -115,20 +212,75 @@ causes a change to a different phoneme will terminate the instructions.
During the second phase, FMT() and WAV() instructions will terminate the
instructions.

### Conditional Statements {.western}
* **length** \<length\>
The relative length of the phoneme, typically about 140 for a short vowel and from 200 to 300 for a long vowel or diphthong. A length() instruction is needed for vowels. It is optional for consonants.

* **ipa** \<ipa string\>
In many cases, eSpeak makes IPA (International Phonetic Alphabet) phoneme names automatically from eSpeak phoneme names. If this is not correct, then the phoneme definition can include an **ipa** instruction to specify the correct IPA name. IPA strings may include non-ASCII characters. They may also include characters specified by their character codes in the form U+ followed by 4 hexadecimal digits. For example a string: aU+0303 indicates 'a' with a 'combining tilde'.

* **WAV**(\<wav file\>, \<amplitude\>)
\<wav file\> is a path to a WAV file (22 kHz, 16 bits, mono) within `phsource/` which will be played to produce the sound. This method is used for unvoiced consonants. \<wav file\> does not include a .WAV filename extension, although the file to which it refers may or may not have one.
\<amplitude\> is optional. It is a percentage change to the amplitude of the WAV file. So, `WAV(ufric/s, 50)` means: play file 'ufric/s.wav' at 50% amplitude.

* **FMT**(\<vowel file\>, \<amplitude\>)
\<vowel file\> is a path to a file (within `phsource/`) which defines how to generate the sound (a vowel or voiced consonant) from a sequence of formant values. Vowel files are made using the espeakedit program.
\<amplitude\> is optional. It is a percentage change to the amplitude of the sound which is synthesized from the FMT() instruction.

* **FMT**(\<vowel file\>, \<amplitude\>) **addWav**(\<wav file\>, \<amplitude\>)
For voiced consonants, a FMT() instruction may be followed by an addWav() instruction. addWav() has the same format as a WAV() instruction, but the WAV file is mixed with the sound which is synthesized from the FMT() instruction.

* **VowelStart**(\<vowel file\>, \<length adjust\>)
This is used to modify the start of a vowel when it follows a sonorant consonant (such as [l] or [j]). It replaces the first frame of the \<vowel file\> which is specified in a FMT() instruction by this \<vowel file\>, and adjusts the length of the original by a signed value \<length adjust\>. The VowelStart() instruction may be specified either in the phoneme definition of the vowel, or in the phoneme definition of the sonorant consonant which precedes the vowel. The former takes precedence.

* **VowelEnding**(\<vowel file\>, \<length adjust\>)
This is used to modify the end of a vowel when it is followed by a sonorant consonant (such as [l] or [j]). This \<vowel file\> is appended to the \<vowel file\> which is specified in a FMT() instruction, and the length of the original is adjusted by a signed value \<length adjust\>. The VowelEnding() instruction may be specified either in the phoneme definition of the vowel, or in the phoneme definition of the sonorant consonant which follows the vowel. The former takes precedence.

* **Vowelin** \<vowel transition data\>
(C) Specifies the effects of this consonant on the formants of a following vowel. See "vowel transitions", below.

* **Vowelout** \<vowel transition data\>
(C) Specifies the effects of this consonant on the formants of a preceding vowel. See "vowel transitions", below.

* **ChangePhoneme(**\<phoneme\>)
Change to the specified phoneme.

* **ChangeIfDiminished(**\<phoneme\>)
Change to the specified phoneme (such as schwa, @) if this syllable has "diminished" stress.

* **ChangeIfUnstressed(**\<phoneme\>)
Change to the specified phoneme if this syllable has "diminished" or "unstressed" stress.

* **ChangeIfNotStressed(**\<phoneme\>)
Change to the specified phoneme if this syllable does not have "primary" stress.

* **ChangeIfStressed(**\<phoneme\>)
Change to the specified phoneme if this syllable has "primary" stress.

* **IfNextVowelAppend(**\<phoneme\>)
If the following phoneme is a vowel then this additional phoneme will be inserted before it.

* **RETURN**
Ends execution of instructions.

* **CALL** \<phoneme table\>/\<phoneme\>
Executes the instructions of the specified phoneme.
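
The `U+` notation described for the **ipa** instruction above can be illustrated with a short Python sketch. The function name `decode_ipa` is hypothetical and this is not how eSpeak NG itself parses the string. It only shows what a string such as `aU+0303` stands for.

```python
import re


def decode_ipa(text: str) -> str:
    """Replace each 'U+' followed by exactly 4 hex digits with that character."""
    return re.sub(
        r"U\+([0-9A-Fa-f]{4})",
        lambda match: chr(int(match.group(1), 16)),
        text,
    )
```

Under this reading, `decode_ipa("aU+0303")` gives the letter `a` followed by the combining tilde character (code point 0303 hexadecimal).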


### Conditional Statements

Phoneme definitions can contain conditional statements such as:

~~~~ {.western}
IF <condition> THEN
```
IF <condition> THEN
<statements>
ENDIF
~~~~
```

or more generally:

~~~~ {.western}
IF <condition> THEN
```
IF <condition> THEN
<statements>
ELIF <condition> THEN
<statements>
@@ -136,34 +288,138 @@ or more generally:
ELSE
<statements>
ENDIF
~~~~
```

where the `ELSE`{.western} and multiple `ELIF`{.western} parts are
optional.
where the `ELSE` and multiple `ELIF` parts are optional.

Multiple conditions may be joined with `AND`{.western} or
`OR`{.western}, but not a mixture of `AND`{.western}s and
`OR`{.western}s.
Multiple conditions may be joined with `AND` or `OR`, but not a mixture of `AND`s and `OR`s.

A condition may be preceded by `NOT`{.western}. For example:
A condition may be preceded by `NOT`. For example:

~~~~ {.western}
IF <condition> AND NOT <condition> THEN
```
IF <condition> AND NOT <condition> THEN
<statements>
ENDIF
~~~~
```

### Conditions

Conditions can be:

* thisPh(\<attribute\>)
Test this current phoneme

* prevPh(\<attribute\>)
Test the previous phoneme

* prevPhW(\<attribute\>)
Test the previous phoneme, but only within the same word. Returns false if there is no previous phoneme in the word.
* prev2PhW(\<attribute\>)
Test the phoneme before the previous phoneme, but only within the same word. Returns false if it is not in this word.
* nextPh(\<attribute\>)
Test the following phoneme
* next2Ph(\<attribute\>)
Test the phoneme after the next phoneme.
* nextPhW(\<attribute\>)
Test the next phoneme, but only within the same word. Returns false if there is no following phoneme in the word.
* next2PhW(\<attribute\>)
Test the phoneme after the next phoneme, but only within the same word. Returns false if not found before the word end.
* next3PhW(\<attribute\>)
Test the third phoneme after the current phoneme, but only within the same word. Returns false if not found before the word end.
* nextVowel(\<attribute\>)
Test the next vowel after the current phoneme, but only within the same word. Returns false if there is none.
* prevVowel(\<attribute\>)
Test the previous vowel before the current phoneme, but only within the same word. Returns false if there is none.
* PreVoicing()
This is used as part of the instructions for voiced stop consonants (eg. [d] [g]). If true then produce a voiced murmur before the stop.
* KlattSynth()
Returns true if the voice is using the Klatt synthesizer rather than the eSpeak synthesizer.


### Attributes

Note: Additional attributes could be added to eSpeak if needed.

**Condition** Can be:
True if the phoneme has this phoneme name.

**Attributes**
* \<phoneme name\>
True if the phoneme has this phoneme name.

### Sound Specifications {.western}
* \<phoneme group\>
True if the phoneme has this starttype (or if it has this endtype if it's used in prevPh() ). The pre-defined phoneme groups are #@, #a, #e, #i, #o, #u.

* isPause
True if the phoneme is a pause.

* isPause2
`nextPh(isPause2)` is used to test whether the next phoneme is not a vowel or liquid consonant within the same word.

* isVowel
isNotVowel
isLiquid
isNasal
isVFricative
These test the phoneme type.

* isPalatal
isRhotic
These test whether the phoneme has this property.

* isWordStart
notWordStart
These test whether this is the first phoneme in a word.

* isWordEnd
True if this is the final phoneme in a word.

* isFirstVowel
isSecondVowel
isFinalVowel
True if this is the First, Second, or Last vowel in a word.

* isAfterStress
True if this phoneme is after the stressed vowel in a word.

* isVoiced
True if this phoneme is a vowel or a voiced consonant.

* isDiminished
True if the syllable stress is "diminished"

* isUnstressed
True if the syllable stress is "diminished" or "unstressed"

* isNotStressed
True if the syllable stress is not "primary stress".

* isStressed
True if the syllable stress is "primary stress".

* isMaxStress
True if this is the highest stressed syllable in the word.

## Sound Specifications

There are three ways to produce sounds:

- - -
* Playing a WAV file, by using a WAV() instruction. This is used for unvoiced consonants such as `[p] [t] [s]`.
* Generating a wave from a sequence of formant parameters, by using a FMT() instruction. This is used for vowels and also for sonorants such as `[l] [j] [n]`.
* A mixture of these. A stored WAV file is mixed with a wave generated from formant parameters. Use a FMT() instruction followed by addWav(). This is used for voiced stops and fricatives such as `[b] [g] [v] [z]`.


### Vowel Transitions {.western}
## Vowel Transitions

These specify how a consonant affects an adjacent vowel. A consonant may
cause a transition in the vowel's formants as the mouth changes shape
@@ -172,3 +428,34 @@ specified. Note that the maximum rate of change of formant frequencies
is limited by the speak program.



* **len=\<integer\>**
Nominal length of the transition in ms. If omitted a default value is used.

* **rms=\<integer\>**
Adjusts the amplitude of the vowel at the end of the transition. If omitted a default value is used.

* **f1=\<integer\>**
0: f1 formant frequency unchanged.
1: f1 formant frequency decreases.
2: f1 formant frequency decreases more.

* **f2=\<freq\> \<min\> \<max\>**
\<freq\>: The frequency towards which the f2 formant moves (Hz).
\<min\>: Signed integer (Hz). The minimum f2 frequency change.
\<max\>: Signed integer (Hz). The maximum f2 frequency change.

* **f3=\<change\> \<amplitude\>**
\<change\>: Signed integer (Hz). Frequency change of f3, f4, and f5 formants.
\<amplitude\>: Amplitude of the f3, f4, and f5 formants at the end of the transition. 100 = no change.

* **brk**
Break. Do not merge the synthesized wave of the consonant into the vowel. This will produce a discontinuity in the formants.

* **rate**
Allow a greater maximum rate of change of formant frequencies.

* **glstop**
Indicates a glottal stop.



+ 37
- 23
docs/ssml.md View File

@@ -1,64 +1,78 @@
TEXT MARKUP {.western}
-----------

### SSML: Speech Synthesis Markup Language {.western}
# Text markup

## SSML: Speech Synthesis Markup Language

The following markup tags and attributes are recognised:

**\<speak\>**

- -
* xml:base (the value is just passed back as a parameter with the UriCallback() function)
* xml:lang

**\<voice\>**

- - - - -
* xml:lang
* name
* age
* variant
* gender

**\<prosody\>**

- - - -
* rate
* volume
* pitch
* range

**\<say-as\>**

- - - - -
* interpret-as="characters"
* interpret-as="characters" format="glyphs"
* interpret-as="tts:key"
* interpret-as="tts:char"
* interpret-as="tts:digits"

**\<mark\>** name

**\<s\>**

-
* xml:lang

**\<p\>**

-
* xml:lang

**\<sub\>** alias

**\<tts:style\>**

- -
* field="punctuation" mode=none,all,some
* field="capital_letters" mode=no,spelling,icon,pitch

**\<audio\>** src

**\<emphasis\>**

-
* level

**\<break\>**

- -
* strength
* time

### HTML {.western}
## HTML

eSpeak can speak HTML text directly, or text containing both SSML and
HTML markup.\
Any unrecognised tags are ignored.
eSpeak can speak HTML text directly, or text containing both SSML and HTML markup.
Any unrecognised tags are ignored.

The following tags cause a sentence break.\
**\<br\>   \<dd\>   \<li\>   \<img\>   \<td\>  **
The following tags cause a sentence break.
**\<br\> \<dd\> \<li\> \<img\> \<td\> **

The following tags cause a paragraph break.\
**\<h1\>   \<h2\>   \<h3\>   \<h4\>   \<hr\>  **
The following tags cause a paragraph break.
**\<h1\> \<h2\> \<h3\> \<h4\> \<hr\> **

Text between the following tags is ignored.\
**\<script\>   ...   \</script\>  \
\<style\>   ...   \</style\>  **
Text between the following tags is ignored.
**\<script\> ... \</script\>
\<style\> ... \</style\>
**
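
As an illustration of the tags listed above, the following Python sketch assembles a small SSML document and checks that it is well-formed XML. The helper name `build_ssml` and the sample values (`en`, `slow`, `500ms`, the mark name `end`) are assumptions made for this example; only the tag and attribute names come from the list above, and whether a given attribute value is honoured depends on the synthesizer.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, lang="en", rate="slow", pause="500ms"):
    """Return an SSML string that uses only tags listed in this document.

    The default argument values are illustrative, not eSpeak defaults.
    """
    body = escape(text)  # escape &, < and > so the text cannot break the markup
    return (
        f"<speak xml:lang={quoteattr(lang)}>"
        f"<p><s><prosody rate={quoteattr(rate)}>{body}</prosody></s></p>"
        f"<break time={quoteattr(pause)}/>"
        '<mark name="end"/>'
        "</speak>"
    )


ssml = build_ssml("Tea & biscuits")
root = ET.fromstring(ssml)  # raises ParseError if the markup is malformed
print(ssml)
```

A string built this way could be passed to `espeak-ng -m`, which the manual page describes as the option for text that contains SSML tags.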

+ 155
- 187
docs/voices.md View File

@@ -1,311 +1,279 @@
5. VOICES {.western}
---------

### 5.1 Voice Files {.western}
# Voice Files

A Voice file specifies a language (and possibly a language variant or
dialect) together with various attributes that affect the
characteristics of the voice quality and how the language is spoken.

Voice files are placed in the `espeak-data/voices`{.western} directory,
Voice files are placed in the `espeak-data/voices` directory,
or within subdirectories in there.

The available voice files can be listed by:

~~~~ {.western}
espeak-ng --voices
espeak-ng --voices
or
espeak-ng --voices=<language>
~~~~
espeak-ng --voices=<language>

also

~~~~ {.western style="margin-bottom: 0.5cm"}
espeak-ng --voices=<variant>
~~~~
espeak-ng --voices=<variant>

Lists voice variants which can be applied to eSpeak voices.
Lists voice variants which can be applied to eSpeak NG voices.

~~~~ {.western style="margin-bottom: 0.5cm"}
espeak-ng --voices=<mbrola>
~~~~
espeak-ng --voices=<mbrola>

Lists the Mbrola voices.

### 5.2 Contents of Voice Files {.western}
## Contents of Voice Files

The **language** attribute is mandatory. All the other attributes are
optional.

#### Identification Attributes {.western}

**name  \<name\>**

A name given to this voice.

**language  \<language code\> [\<priority\>]**
### Identification Attributes

This attribute should appear before the other attributes which are
listed below.
* **name  \<name\>**
A name given to this voice.

* **language  \<language code\> [\<priority\>]**
This attribute should appear before the other attributes which are
listed below.
It selects the default behaviour and characteristics for the language,
and sets default values for "phonemes", "dictionary" and other
attributes. The \<language code\> should be a two-letter ISO 639-1
language code. One or more language variant codes may be appended,
separated by hyphens. (eg. en-uk-north).

separated by hyphens. (eg. en-uk-north).
The optional \<priority\> value gives the preference of this voice
compared with others for the specified language. A low value indicates a
more preferred voice. The default value is 5.

more preferred voice. The default value is 5.
More than one **language** line may be present. A voice may be selected
for other related languages (variants which have the same initial 2
letter language code as the specified language), but it will be less
preferred for these. Different language variants may be specified by
additional **language** lines in order to indicate that this is a
preferred voice for them also. Eg.

~~~~ {.western}
language en-uk-north
language en
~~~~

indicates that this voice is for the "en-uk-north" dialect, but it is
preferred voice for them also. Eg.
```
language en-uk-north
language en
```
indicates that this voice is for the "en-uk-north" dialect, but it is
also a main choice when a general "en" language is specified. Without
the second **language** line, it would be disfavoured for "en" for being
a more specialised voice.

**gender  \<gender\> [\<age\>]**

This attribute is only a label for use in voice selection. It doesn't
change the sound of the voice.

\<gender\> may be male, female, or unknown.\
\<age\> is optional and gives an age in years.
* **gender  \<gender\> [\<age\>]**
This attribute is only a label for use in voice selection. It doesn't
change the sound of the voice.
\<gender\> may be male, female, or unknown.
\<age\> is optional and gives an age in years.
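
The **language** attribute described above takes a language code and an optional priority. As a minimal sketch, the Python below parses such a line and applies the documented default priority of 5; the function name `parse_language_line` is hypothetical, and the parsing rules are an assumption based on the prose rather than on the eSpeak NG source.

```python
DEFAULT_PRIORITY = 5  # documented default for the optional <priority> value


def parse_language_line(line):
    """Parse 'language <language code> [<priority>]' into (code, priority)."""
    fields = line.split()
    if len(fields) not in (2, 3) or fields[0] != "language":
        raise ValueError(f"not a language line: {line!r}")
    code = fields[1]
    priority = int(fields[2]) if len(fields) == 3 else DEFAULT_PRIORITY
    return code, priority


lines = ["language en-uk-north", "language en 2"]
parsed = [parse_language_line(line) for line in lines]
# A low priority value indicates a more preferred voice, so sort ascending.
for code, priority in sorted(parsed, key=lambda item: item[1]):
    print(code, priority)
```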

**pitch  \<base\> \<range\>**
### Voice Attributes

Two integer values. The first gives a base pitch to the voice (value in
* **pitch  \<base\> \<range\>**
Two integer values. The first gives a base pitch to the voice (value in
Hz). The second controls the range of pitches used by the voice. Setting
it equal to the base pitch will give a monotone. The default values are
82 118.

**formant  \<number\> \<frequency\> \<strength\> \<width\>
\<freq\_add\>**

Systematically adjusts the frequency, strength, and width of the
it equal to the base pitch will give a monotone. The default values are 82 118.
* **formant  \<number\> \<frequency\> \<strength\> \<width\>
\<freq\_add\>**
Systematically adjusts the frequency, strength, and width of the
resonance peaks of the voice. Values are percentages of the default
values. Changing these affects the tone/quality of the voice.

**freq\_add**Adds a constant value (in Hz) to the frequency of the
* **freq\_add**
Adds a constant value (in Hz) to the frequency of the
formant peak. The value may be negative.
* Formants 1,2,3 are the standard three formants which define vowels.
* Formant 0 is used to give a low frequency component to the sounds, of frequency lower than F1.
* Formants 4,5 are higher than F3. They affect the quality of the voice.
* Formants 6,7,8 are weak, high frequency, additions to vowels to give a clearer sound.

- - - -

**echo  \<delay\> \<amplitude\>**

Parameter 1 gives the delay in mS (0 to 250mS).\
Parameter 2 gives the echo amplitude (0 to 100).\
Adding some echo can give a clearer or more interesting sound,
* **echo  \<delay\> \<amplitude\>**
Parameter 1 gives the delay in mS (0 to 250mS).
Parameter 2 gives the echo amplitude (0 to 100).
Adding some echo can give a clearer or more interesting sound,
especially when listening through a domestic stereo sound system, rather
than small computer speakers.

**tone**

Controls the tone of the sound.\
**tone** is followed by up to 4 pairs of \<frequency\> \<amplitude\>
* **tone**
Controls the tone of the sound.
**tone** is followed by up to 4 pairs of \<frequency\> \<amplitude\>
which define a frequency response graph. Frequency is in Hz and
amplitude is in the range 0 to 255. The default is:

`  `{.western}`tone 600 170  1200 135  2000 110`{.western}

This means that from frequency 0Hz to 600Hz the amplitude is 170. From
amplitude is in the range 0 to 255. The default is:
`tone 600 170 1200 135 2000 110`
This means that from frequency 0Hz to 600Hz the amplitude is 170. From
600Hz to 1200Hz the amplitude decreases from 170 to 135, then decreases
to 110 at 2000Hz and remains at 110 at higher frequencies. This
adjustment applies only to voiced sounds such as vowels and sonorant
consonants (such as [n] and [l]). Unvoiced sounds such as [s] are
unaffected.

This **tone** statement can also appear in
`espeak-data/config`{.western}, in which case it applies to all voices
unaffected.
This **tone** statement can also appear in
`espeak-data/config`, in which case it applies to all voices
which don't have their own **tone** statement.

**flutter  \<value\>**

Default value: 2.\
Adds pitch fluctuations to give a wavering or older-sounding voice. A
* **flutter  \<value\>**
Default value: 2.
Adds pitch fluctuations to give a wavering or older-sounding voice. A
large value (eg. 20) makes the voice sound "croaky".

**roughness  \<value\>**

Default value: 2. Range 0 - 7\
Reduces the amplitude of alternate waveform cycles in order to make the
* **roughness  \<value\>**
Default value: 2. Range 0 - 7
Reduces the amplitude of alternate waveform cycles in order to make the
voice sound creaky.

**voicing  \<value\>**

Default value: 100.\
Adjusts the strength of formant-synthesized sounds (vowels and sonorant
* **voicing  \<value\>**
Default value: 100
Adjusts the strength of formant-synthesized sounds (vowels and sonorant
consonants).

**consonants  \<value\> \<value\>**

Default values: 100, 100.\
Adjusts the strength of noise sounds which are used in consonants. The
first value is the strength of unvoiced consonants such as "s" and "t".
The second value is the strength of the noise component of voiced
* **consonants  \<value\> \<value\>**
Default values: 100, 100
Adjusts the strength of noise sounds which are used in consonants. The
first value is the strength of unvoiced consonants such as "s" and "t".
The second value is the strength of the noise component of voiced
consonants such as "z" and "d".

**breath  \<up to 8 integer values\>**

Default values: 0.\
Adds noise which corresponds to the formant frequency peaks. The values
give the strength of noise for each formant peak (formants 1 to 8).

Use together with a low or zero value of the **voicing** attribute to
make a "whisper". For example:\

`breath   75 75 60 40 15 10 breathw  150 150 200 200 400 400 voicing  18 flutter  20 formant   0 100 0 100   // remove formant 0 `{.western}

**breathw  \<up to 8 integer values\>**

These values give bandwidths of the noise peaks of the **breath**
* **breath  \<up to 8 integer values\>**
Default values: 0.
Adds noise which corresponds to the formant frequency peaks. The values
give the strength of noise for each formant peak (formants 1 to 8).
Use together with a low or zero value of the **voicing** attribute to
make a "whisper". For example:
```
breath 75 75 60 40 15 10
breathw 150 150 200 200 400 400
voicing 18
flutter 20
formant 0 100 0 100 // remove formant 0
```

* **breathw  \<up to 8 integer values\>**
These values give bandwidths of the noise peaks of the **breath**
attribute. If **breathw** values are not given, then suitable default
values will be used.

**speed  \<value\>**

Default value 100.\
Adjusts the speaking speed by a percentage of the default rate. This
* **speed  \<value\>**
Default value 100.
Adjusts the speaking speed by a percentage of the default rate. This
can be used if a language voice seems faster or slower compared to other
voices.
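
The frequency response defined by the **tone** attribute above can be pictured as a piecewise-linear curve. The sketch below computes the amplitude at a given frequency from the \<frequency\> \<amplitude\> pairs, following the prose description: flat up to the first point, changing between points, and flat after the last one. The function name `tone_amplitude` is hypothetical, and linear interpolation between points is an assumption; the actual eSpeak NG implementation may differ.

```python
def tone_amplitude(freq_hz, points):
    """Amplitude at freq_hz for a list of (frequency, amplitude) pairs."""
    points = sorted(points)
    if freq_hz <= points[0][0]:
        return float(points[0][1])  # flat from 0 Hz up to the first point
    for (f0, a0), (f1, a1) in zip(points, points[1:]):
        if freq_hz <= f1:
            # Linear interpolation between two neighbouring points.
            return a0 + (a1 - a0) * (freq_hz - f0) / (f1 - f0)
    return float(points[-1][1])  # flat above the last point


# The documented default: tone 600 170 1200 135 2000 110
DEFAULT_TONE = [(600, 170), (1200, 135), (2000, 110)]

for freq in (300, 900, 1600, 4000):
    print(freq, tone_amplitude(freq, DEFAULT_TONE))
```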

**phonemes  \<name\>**
### Language Attributes

Specifies which set of phonemes to use from those contained in the
* **phonemes  \<name\>**
Specifies which set of phonemes to use from those contained in the
phontab, phonindex, and phondata data files. This is a **phonemetable**
name as given in the "phoneme" source file.

This parameter is usually not needed as it is set by default to the
This parameter is usually not needed as it is set by default to the
first two letters of the "language" parameter. However, different voices
of the same language can use different phoneme sets, to give different
accents.

**dictionary  \<name\>**

Specifies which pair of dictionary files to use. eg. "english" indicates
* **dictionary  \<name\>**
Specifies which pair of dictionary files to use. eg. "english" indicates
that *espeak-data/en\_dict* should be used to translate from words to
phonemes. This parameter is usually not needed as it is set by default
to the first two letters of "language" parameter.

**dictrules  \<list of rule numbers\>**

Gives a list of conditional dictionary rules which are applied for this
* **dictrules  \<list of rule numbers\>**
Gives a list of conditional dictionary rules which are applied for this
voice. Rule numbers are in the range 0 to 31 and are specific to a
language dictionary. They apply to rules in the language's **\_rules**
dictionary file and also its **\_list** exceptions list. See
[dictionary.html](dictionary.html).

**replace  \<flags\> \<phoneme\> \<replacement phoneme\>**

Replace a phoneme by another whenever it occurs.

\<replacement phoneme\> may be NULL.

Flags: bit 0: replacement only occurs on the final phoneme of a word.\
Flags: bit 1: replacement doesn't occur in stressed syllables.\
eg.

~~~~ {.western}
* **replace  \<flags\> \<phoneme\> \<replacement phoneme\>**
Replace a phoneme by another whenever it occurs.
\<replacement phoneme\> may be NULL.
Flags: bit 0: replacement only occurs on the final phoneme of a word.
Flags: bit 1: replacement doesn't occur in stressed syllables.
eg.
```
replace 0 h NULL // drops h's
replace 0 V U // replaces vowel in 'strut' by that in 'foot'
// as occurs in northern British English
replace 3 N n // change 'fishing' to 'fishin' etc.
// (only the last phoneme of a word, only in unstressed syllables)
~~~~

The phoneme mnemonics can be defined for each language, but some are
listed in [phonemes.html](phonemes.html)

**stressLength  \<8 integer values\>**
```
The phoneme mnemonics can be defined for each language, but some are
listed in [phonemes](phonemes.md)

Eight integer parameters. These control the relative lengths of the
* **stressLength  \<8 integer values\>**
Eight integer parameters. These control the relative lengths of the
vowels in stressed and unstressed syllables.

- - - - - - - -

**stressAdd  \<8 integer values\>**

Eight integer parameters. These are added to the voice's corresponding
* 0 unstressed
* 1 diminished. Its use depends on the language. In English it's used for unstressed syllables within multisyllabic words. In Spanish it's used for unstressed final syllables.
* 2 secondary stress
* 3 words marked as "unstressed" in the dictionary
* 4 not currently used
* 5 not currently used
* 6 stressed syllable (the main syllable in stressed words)
* 7 tonic syllable (by default, the last stressed syllable in the clause)

* **stressAdd  \<8 integer values\>**
Eight integer parameters. These are added to the voice's corresponding
stressLength values. They are used in the voice variant files in
`espeak-data/voices/!v`{.western} to give some variety. Negative values
may be used.
`espeak-data/voices/!v` to give some variety. Negative values may be used.

**stressAmp  \<8 integer values\>**

Eight integer parameters. These control the relative amplitudes of the
* **stressAmp  \<8 integer values\>**
Eight integer parameters. These control the relative amplitudes of the
vowels in stressed and unstressed syllables (see stressLength above).
The general default values are: 16, 16, 20, 20, 20, 24, 24, 22, although
these defaults may be different for particular languages.

**intonation  \<param1\>**
- - - -
**charset  \<param1\>**
* **intonation  \<param1\>**
1 Default.
2 Less intonation.
3 Less intonation, and comma does not raise the pitch.
4 Pitch rises (rather than falls) at the end of sentence.

The ISO 8859 character set number. (not all are implemented).

**dictmin  \<value\>**
* **charset  \<param1\>**
The ISO 8859 character set number. (not all are implemented).

Used for some languages to detect if additional language data is
* **dictmin \<value\>**
Used for some languages to detect if additional language data is
installed. If the size of the compiled dictionary data for the language
(the file `espeak-data/*_dict`{.western}) is less than this size then a
(the file `espeak-data/*_dict`) is less than this size then a
warning is given.

**alphabet2  \<alphabet\> \<language\>**

Used to specify a language to be used to speak words which are written
in a non-native alphabet. eg:

~~~~ {.western style="margin-bottom: 0.5cm"}
alphabet2 cyr ru
~~~~

Alphabet names include: latin, cyr (cyrillic), ar (arabic). The default
* **alphabet2  \<alphabet\> \<language\>**
Used to specify a language to be used to speak words which are written
in a non-native alphabet. eg:
```
alphabet2 cyr ru
```
Alphabet names include: latin, cyr (cyrillic), ar (arabic). The default
language for latin alphabet is English.

**dictdialect  \<dialect\>**

Words can be marked in the \*\_list or \*\_rules file to be spoken using
* **dictdialect  \<dialect\>**
Words can be marked in the \*\_list or \*\_rules file to be spoken using
a foreign voice. This **dictdialect** attribute can be used to specify
which dialect of the foreign language should be used, instead of the
default dialect. The currently available dialects are:\
**en-us** (US English)\
**es-la** (Latin American Spanish).\
eg.

~~~~ {.western style="margin-bottom: 0.5cm"}
dictdialect en-us
~~~~

This means that any words or rules which are marked with \_\^\_EN will be
default dialect. The currently available dialects are:
**en-us** (US English)
**es-la** (Latin American Spanish).
eg.
```
dictdialect en-us
```
This means that any words or rules which are marked with \_\^\_EN will be
spoken with the US English voice instead of the default UK English
voice.
Additional attributes are available to set various internal options
which control how language is processed. These would normally be set in
the program code rather than in a voice file.
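
The \<flags\> value of the **replace** attribute above is a bit field. As a minimal sketch, the following Python parses a `replace` line and decodes the two documented bits. The function and key names are hypothetical, and treating everything after `//` as a comment is an assumption based on the examples shown above.

```python
def decode_replace_flags(flags):
    """Decode the two documented bits of the replace <flags> value."""
    return {
        "final_phoneme_only": bool(flags & 1),  # bit 0
        "skip_stressed": bool(flags & 2),       # bit 1
    }


def parse_replace_line(line):
    """Parse 'replace <flags> <phoneme> <replacement phoneme>'."""
    line = line.split("//", 1)[0]  # drop a trailing // comment, if any
    fields = line.split()
    if len(fields) != 4 or fields[0] != "replace":
        raise ValueError(f"not a replace line: {line!r}")
    _, flags, phoneme, replacement = fields
    result = {
        "phoneme": phoneme,
        # NULL means the phoneme is dropped rather than replaced.
        "replacement": None if replacement == "NULL" else replacement,
    }
    result.update(decode_replace_flags(int(flags)))
    return result


print(parse_replace_line("replace 3 N n // change 'fishing' to 'fishin' etc."))
print(parse_replace_line("replace 0 h NULL // drops h's"))
```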

## Voice Files Provided

A number of Voice files are provided in the
`espeak-data/voices`{.western} directory. You can select one of these
`espeak-data/voices` directory. You can select one of these
with the **-v \<voice filename\>** parameter to the speak command.

**default**

This voice is used if none is specified in the speak command. You can
* **default**
This voice is used if none is specified in the speak command. You can
copy your preferred voice to "default" so you can use the speak command
without the need to specify a voice.

For a list of voices provided for English and other languages see
[Languages](languages.html).
[Languages](languages.md).

+ 4
- 53
phsource/ph_latvian View File

@@ -1,8 +1,6 @@
phoneme i

vowel starttype #i endtype #i
length 100
IfNextVowelAppend(;)
FMT(vowel/i_6)
endphoneme

@@ -59,8 +57,7 @@ endphoneme
phoneme i:
vowel starttype #i endtype #i
length 250
IfNextVowelAppend(;)
FMT(vowel/i_6)
FMT(vowel/i_7)
endphoneme

phoneme E
@@ -91,7 +88,7 @@ endphoneme
phoneme a
vowel starttype #a endtype #a
length 100
FMT(vowel/aa_7) // a_5 or aa_7
FMT(vowel/aa_7) // possible variants: a_3, a_5 or aa_7
endphoneme

phoneme a:
@@ -101,52 +98,6 @@ phoneme a:
FMT(vowel/aa_9) // was a_3 or aa_9
endphoneme

phoneme a3
vowel starttype #a endtype #a
length 100
//ChangeIfDiminished(a#)
FMT(vowel/a_3)
endphoneme

phoneme a5
vowel starttype #a endtype #a
length 100
//ChangeIfDiminished(a#)
FMT(vowel/a_5)
endphoneme

phoneme a5:
vowel starttype #a endtype #a
length 350
FMT(vowel/a_5)
endphoneme

phoneme a77
vowel starttype #a endtype #a
length 100
//ChangeIfDiminished(a#)
FMT(vowel/aa_7)
endphoneme

phoneme a77:
vowel starttype #a endtype #a
length 350
FMT(vowel/aa_7)
endphoneme

phoneme a22
vowel starttype #a endtype #a
length 100
//ChangeIfDiminished(a#)
FMT(vowel/aa_2)
endphoneme

phoneme a22:
vowel starttype #a endtype #a
length 350
FMT(vowel/aa_2)
endphoneme

phoneme o
vowel starttype #o endtype #o
length 100
@@ -168,7 +119,7 @@ endphoneme
phoneme u:
vowel starttype #u endtype #u
length 250
FMT(vowel/u)
FMT(vowel/u_3)
endphoneme


@@ -300,7 +251,7 @@ phoneme c
vls pal stop palatal
voicingswitch J
lengthmod 2
WAV(ustop/c, 80)
WAV(ustop/c, 90)
endphoneme

phoneme l

+ 149
- 0
src/espeak-ng.1.ronn View File

@@ -0,0 +1,149 @@
# espeak-ng - A multi-lingual software speech synthesizer.

## SYNOPSIS

__espeak-ng__ [<options>] [<&lt;words&gt;>]

## DESCRIPTION

__espeak-ng__ is a software speech synthesizer for English, and some other
languages.

## OPTIONS

* `-h`, `--help`:
Show summary of options.

* `--version`:
Prints the espeak library version and the location of the espeak voice
data.

* `-f <text file>`:
Text file to speak.

* `--stdin`:
Read text input from stdin instead of a file.

If neither -f nor --stdin are provided, &lt;words&gt; are spoken, or if no
words are provided then text is spoken from stdin a line at a time.

* `-q`:
Quiet, don't produce any speech (may be useful with -x).

* `-a <integer>`:
Amplitude, 0 to 200, default is 100.

* `-g <integer>`:
Word gap. Pause between words, units of 10ms at the default speed.

* `-k <integer>`:
Indicate capital letters with: 1=sound, 2=the word "capitals", higher
values = a pitch increase (try -k20).

* `-l <integer>`:
Line length. If not zero (which is the default), consider lines less than
this length as end-of-clause.

* `-p <integer>`:
Pitch adjustment, 0 to 99, default is 50.

* `-s <integer>`:
Speed in words per minute, default is 160.

* `-v <voice name>`:
Use voice file of this name from espeak-data/voices. A variant can be
specified using <voice>+<variant>, such as af+m3.

* `-w <wave file name>`:
Write output to this WAV file, rather than speaking it directly.

* `--split=<minutes>`:
Used with `-w` to split the audio output into &lt;minutes&gt; recorded
chunks.

* `-b`:
Input text encoding, 1=UTF8, 2=8 bit, 4=16 bit.

* `-m`:
Indicates that the text contains SSML (Speech Synthesis Markup Language)
tags or other XML tags. Those SSML tags which are supported are
interpreted. Other tags, including HTML, are ignored, except that some HTML
tags such as &lt;hr&gt; &lt;h2&gt; and &lt;li&gt; ensure a break in the
speech.

* `-x`:
Write phoneme mnemonics to stdout.

* `-X`:
Write phoneme mnemonics and translation trace to stdout. If rules files
have been built with --compile=debug, line numbers will also be displayed.

* `-z`:
No final sentence pause at the end of the text.

* `--stdout`:
Write speech output to stdout.

* `--compile=voicename`:
Compile the pronunciation rules and dictionary in the current directory.
=&lt;voicename&gt; is optional and specifies which language is compiled.

* `--compile-debug=voicename`:
Compile the pronunciation rules and dictionary in the current directory as
above, but include line numbers, that get shown when -X is used.

* `--ipa`:
Write phonemes to stdout using International Phonetic Alphabet. --ipa=1 Use
ties, --ipa=2 Use ZWJ, --ipa=3 Separate with _.

* `--tie=<character>`:
The character to use to join multi-letter phonemes in -x and --ipa output.

* `--path=<path>`:
Specifies the directory containing the espeak-data directory.

* `--pho`:
Write mbrola phoneme data (.pho) to stdout or to the file in --phonout.

* `--phonout=<filename>`:
Write output from -x -X commands and mbrola phoneme data to this file.

* `--punct="<characters>"`:
Speak the names of punctuation characters during speaking. If
=&lt;characters&gt; is omitted, all punctuation is spoken.

* `--sep=<character>`:
The character to separate phonemes from the -x and --ipa output.

* `--voices[=<language code>]`:
Lists the available voices. If =&lt;language code&gt; is present then only
those voices which are suitable for that language are listed.

* `--voices=<directory>`:
Lists the voices in the specified subdirectory.

## EXAMPLES

* `espeak-ng "This is a test"`:
Speak the sentence "This is a test" using the default English voice.

* `espeak-ng -f hello.txt`:
Speak the contents of hello.txt using the default English voice.

* `cat hello.txt | espeak-ng`:
Speak the contents of hello.txt using the default English voice.

* `espeak-ng -x hello`:
Speak the word "hello" using the default English voice, and print the
phonemes that were spoken.

* `espeak-ng -ven-us "[[h@'loU]]"`:
Speak the phonemes "h@'loU" using the American English voice.

## AUTHOR

eSpeak NG is maintained by Reece H. Dunn <[email protected]>. It is based on
eSpeak by Jonathan Duddington <[email protected]>.

This manual page is based on the eSpeak page written by Luke Yelavich
<[email protected]> for the Ubuntu project.
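
For scripted use, the options above can be assembled into an argument list. The Python sketch below is one possible wrapper: the function name `espeak_ng_argv` and its parameter names are hypothetical, while the flags (`-v`, `-s`, `-p`, `-a`, `-w`) and the value ranges are taken from the OPTIONS section. It only builds the command; actually running it assumes `espeak-ng` is installed and on the `PATH`.

```python
def espeak_ng_argv(text, voice=None, speed=None, pitch=None,
                   amplitude=None, wav_path=None):
    """Build an espeak-ng command line from the documented options."""
    if pitch is not None and not 0 <= pitch <= 99:
        raise ValueError("pitch must be in the range 0 to 99")
    if amplitude is not None and not 0 <= amplitude <= 200:
        raise ValueError("amplitude must be in the range 0 to 200")
    argv = ["espeak-ng"]
    if voice is not None:
        argv += ["-v", voice]           # e.g. "en-us" or "af+m3"
    if speed is not None:
        argv += ["-s", str(speed)]      # words per minute
    if pitch is not None:
        argv += ["-p", str(pitch)]
    if amplitude is not None:
        argv += ["-a", str(amplitude)]
    if wav_path is not None:
        argv += ["-w", wav_path]        # write a WAV file instead of speaking
    argv.append(text)
    return argv


argv = espeak_ng_argv("hello", voice="en-us", speed=140, wav_path="out.wav")
print(argv)
# To run it: import subprocess; subprocess.run(argv, check=True)
```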
