
Initial conversion of documents from HTML to markdown

master
Valdis Vitolins 9 years ago
parent
commit
78aec60405
14 changed files with 2518 additions and 0 deletions
  1. docs/add_language.md (+157, -0)
  2. docs/analyse.md (+101, -0)
  3. docs/commands.md (+279, -0)
  4. docs/dictionary.md (+655, -0)
  5. docs/editor.md (+46, -0)
  6. docs/editor_if.md (+41, -0)
  7. docs/index.md (+52, -0)
  8. docs/intonation.md (+102, -0)
  9. docs/languages.md (+125, -0)
  10. docs/mbrola.md (+128, -0)
  11. docs/phonemes.md (+283, -0)
  12. docs/phontab.md (+174, -0)
  13. docs/ssml.md (+64, -0)
  14. docs/voices.md (+311, -0)

+ 157
- 0
docs/add_language.md

@@ -0,0 +1,157 @@
6. ADDING OR IMPROVING A LANGUAGE {.western}
---------------------------------

Most of the work doesn't need any programming knowledge. Just an
understanding of the language, an awareness of its features, patience
and attention to detail. Wikipedia is a good source of basic phonetic
information, eg
[http://en.wikipedia.org/wiki/Vowel](http://en.wikipedia.org/wiki/Vowel).

In many cases it should be fairly easy to add a rough implementation of
a new language, hopefully enough to be intelligible. After that it's a
gradual process of improvement.

### 6.1 Language Code {.western}

Generally, the language's international [ISO
639-1](http://en.wikipedia.org/wiki/ISO_639-1) code is used to identify
the language. It is used in the filenames which contain the language's
data. In the examples below the code **"fr"** is used as an example.
Replace this with the code of your language.

If the language does not have a 2-letter ISO\_639-1 code, then use the
3-letter ISO\_639-3 code. Language codes may differ from country codes.

It is possible to have different variants of a language for different
dialects. For example, the sounds of some phonemes are changed, or some
of the pronunciation rules differ.

### 6.2 Language Files {.western}

The following files are needed for your language.

- - - -

The **fr\_rules** and **fr\_list** files are compiled to produce the
file **espeak-data/fr\_dict**, which eSpeak uses when it is speaking.

### 6.3 Voice File {.western}

Each language needs a voice file in **espeak-data/voices** or
**espeak-data/voices/test**. The filename of the default voice for a
language should be the same as the language code (eg. "fr" for French).

Details of the contents of voice files are given in
[voices.html](http://espeak.sf.net/voices.html).

The simplest voice file would contain just 2 lines to give the language
name and language code, eg:

~~~~ {.western}
name french
language fr
~~~~

This language code specifies which phoneme table and dictionary to use
(i.e. **phonemetable fr** and **espeak-data/fr\_dict**). If
needed, these can be overridden by **phonemes** and **dictionary**
attributes in the voice file. For example you may want to start the
implementation of a new language by using the phoneme table of an
existing language.
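
As a rough illustration of the paragraph above, the following Python sketch builds the text of a minimal voice file, optionally adding the **phonemes** and **dictionary** overrides. The helper name `voice_file` is hypothetical and not part of eSpeak; only the attribute names come from this section, and the full set of attributes is described in the voice file documentation.

```python
def voice_file(name, language, phonemes=None, dictionary=None):
    """Return the text of a minimal voice file (hypothetical helper)."""
    lines = [f"name {name}", f"language {language}"]
    # Optional overrides: by default both are implied by the language code.
    if phonemes is not None:
        lines.append(f"phonemes {phonemes}")
    if dictionary is not None:
        lines.append(f"dictionary {dictionary}")
    return "\n".join(lines) + "\n"


# A new language that temporarily borrows the phoneme table of "en".
print(voice_file("french", "fr", phonemes="en"), end="")
```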

### 6.4 Phoneme Definition File {.western}

You must first decide on the set of phonemes (vowel and consonant
sounds) for the language. These should be defined in a phoneme
definition file **ph\_xxxx**, where "xxxx" is the name of your
language. A reference to this file is then included at the end of the
master phoneme file, **phsource/phonemes**, eg:

~~~~ {.western}
phonemetable fr base
include ph_french
~~~~

This example defines a phoneme table **"fr"** which inherits the
contents of phoneme table **"base"**. Its contents are found in the file
**ph\_french**.

The **base** phoneme table contains definitions of a basic set of
consonants, and also some "control" phonemes such as stress marks and
pauses. These are defined in **phsource/phonemes**. The phoneme table
for a language will inherit these, or alternatively it may inherit the
phoneme table of another language which in turn inherits the **base**
phoneme table.

The phonemes file for the language defines those additional phonemes
which are not inherited (generally the vowels and diphthongs, plus any
additional consonants that are needed), or phonemes whose definitions
differ from the inherited version (eg. the redefinition of a consonant).
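
The inheritance described above behaves like a chained lookup: a phoneme is looked up in the language's own table first, then in the table it inherits from. The Python sketch below only models that idea with `collections.ChainMap`; the function name, phoneme names and definitions are placeholders, and this is not how eSpeak stores its compiled phoneme data.

```python
from collections import ChainMap


def build_table(own, parent=None):
    """Return a phoneme table that falls back to `parent` for missing entries."""
    return ChainMap(own) if parent is None else parent.new_child(own)


# Placeholder definitions, for illustration only.
base = build_table({"p": "base consonant p", "_:": "short pause"})
fr = build_table({"a": "French vowel a", "p": "redefined consonant p"}, base)

print(fr["a"])   # defined only in the "fr" table
print(fr["p"])   # redefined in "fr", so the "base" version is hidden
print(fr["_:"])  # inherited unchanged from "base"
```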

Details of phonemes files are given in
[phontab.html](http://espeak.sf.net/phontab.html).

The **Compile phoneme data** function of the **espeakedit** program
compiles the phonemes files of all languages to produce the files
**espeak-data/phontab**, **phonindex**, and **phondata** which are used
by eSpeak.

For many languages, the consonant phonemes which are already available
in eSpeak, together with the available vowel files which can be used to
define vowel phonemes, will be sufficient, at least for an initial
implementation.

### 6.5 Dictionary Files {.western}

Once the language's phonemes have been defined, then pronunciation
dictionary data can be produced in order to translate the language's
source text into phonemes. This consists of two source files:
**fr\_rules** (the spelling to phoneme rules) and **fr\_list** (an
exceptions list, and attributes of certain words). The corresponding
compiled data file is **espeak-data/fr\_dict** which is produced from
**fr\_rules** and **fr\_list** sources by the command:

> `espeak-ng --compile=fr`{.western}.

Or by using the **espeakedit** program.
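
The file naming and the compile command above can be summarised in a small Python sketch. The helper names are hypothetical; the file names and the `--compile` option are the ones given in this section. The sketch only builds the argument list and does not run eSpeak.

```python
def dictionary_files(lang):
    """Names of the dictionary source files and compiled output for a language."""
    return {
        "rules": f"{lang}_rules",                # spelling-to-phoneme rules
        "list": f"{lang}_list",                  # exceptions and word attributes
        "compiled": f"espeak-data/{lang}_dict",  # produced by the compile step
    }


def compile_command(lang):
    """Argument list for compiling the dictionary of one language."""
    return ["espeak-ng", f"--compile={lang}"]


print(dictionary_files("fr"))
print(" ".join(compile_command("fr")))
```

If espeak-ng is installed, the list returned by `compile_command` could be passed to `subprocess.run` from the directory that holds the source files.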

Details of the contents of the dictionary files are given in
[dictionary.html](http://espeak.sf.net/dictionary.html).

The **fr\_list** file contains:

- - - -

### 6.6 Program Code {.western}

The behaviour of the eSpeak program is controlled by various options
such as:

- - - -

The function SetTranslator() at the start of the source code file
tr\_languages.cpp recognizes the language code and sets the appropriate
options. For a new language, you would add its language code and the
required options in SetTranslator(). However, this may not be necessary
during testing because most of the options can also be set in the voice
file in espeak-data/voices (see [Voice
files](http://espeak.sf.net/voices.html)).

### 6.7 Improving a Language {.western}

Listen carefully to the eSpeak voice. Try to identify what sounds wrong
and what needs to be improved.

- - - - -

**If you are interested in working on a language, please contact me so
that I can set up the initial data and discuss the features of the
language.**

For most of the eSpeak voices, I do not speak or understand the
language, and I do not know how it should sound. I can only make
improvements as a result of feedback from speakers of that language. If
you want to help to improve a language, listen carefully and try to
identify individual errors, either in the spelling-to-phoneme
translation, the position of stressed syllables within words, or the
sound of phonemes, or problems with rhythm and vowel lengths.

+ 101
- 0
docs/analyse.md

@@ -0,0 +1,101 @@
ANALYSIS
========

(Further notes are needed)

Recordings of spoken words and phrases can be analysed to try and make
eSpeak match a language more closely. Unlike most other (larger and
better quality) synthesizers, eSpeak's data is not produced directly
from recorded sounds. To use an analogy, it's like a drawing or sketch
compared with a photograph. Or vector graphics compared with a bitmap
image. It's smaller, less accurate, with less subtlety, but it can
sometimes show some aspects of the picture more clearly than a more
accurate image.

#### Recording Sounds {.western}

Recordings should be made while speaking slowly, clearly, firmly and
loudly (but not shouting). Speak about half a metre from the microphone.
Try to avoid background noise and hum interference from electrical power
cables.

#### Praat {.western}

I use a modified version of the praat program
([www.praat.org](http://www.praat.org)) to view and analyse both sound
recordings and output from eSpeak. The modification adds a new function
(`Spectrum->To_eSpeak`{.western}) which analyses a voiced sound and
produces a file which can be loaded into espeakedit. Details of the
modification are in the `"praat-mod"`{.western} directory in the
espeakedit package. The analysis contains a sequence of frames, one per
cycle at the speech's fundamental frequency. Each frame is a short time
spectrum, together with praat's estimation of the f1 to f5 formant
frequencies at the time of that cycle. I also use Praat's
`New->Record_mono_sound`{.western} function to make sound recordings.

### Vowels and Diphthongs {.western}

#### Analysing a Recording {.western}

Make a recording, with a male voice, and trim it in Praat to keep just
the required vowel sound. Then use the new
`Spectrum->To_eSpeak`{.western} modification (this was named
`To_Spectrogram2`{.western} in earlier versions) to analyse the sound.
It produces a file named `"spectrum.dat"`{.western}. Load the
`"spectrum.dat"`{.western} file into espeakedit. Espeakedit has two Open
functions, `File->Open`{.western} and `File->Open2`{.western}. They are
the same, except that they remember different paths. I generally use
`File->Open2`{.western} for reading the `"spectrum.dat"`{.western} file.
The data is displayed in espeakedit as a sequence of spectrum frames
(see [editor.html](editor.html)).

#### Tone Quality {.western}

It can be difficult to match the tonal quality of a new vowel to be
compatible with existing vowel files. This is determined by the relative
heights and widths of the formant peaks. These vary depending on how the
recording was made, the microphone, and the strength and tone of the
voice. Also the positions of the higher peaks (F3 upwards) can vary
depending on the characteristics of the speaker's voice. Formant peaks
correspond to resonances within the mouth and throat, and they depend on
its size and shape. With a female voice, all the formants (F1 upwards)
are generally shifted to higher frequencies. For these reasons, it's
best to use a male voice, and to use its analysed spectra only as
guidance. Rather than construct formant-peaks entirely to match the
analysed data, instead copy keyframes from a similar existing vowel.
Then make small adjustments to match the position of the F1, F2, F3
formant peaks and hopefully produce the required vowel sound.

#### Using an Existing Vowel File {.western}

Choose a similar vowel file from `phsource/vowel`{.western} and open it
in espeakedit. It may be useful to use
`phsource/vowel/vowelchart`{.western} as a map to show how vowel files
compare with each other. You can select a keyframe from the vowel file
and use CTRL-C and CTRL-V to copy the green formant peaks onto a frame
of the new spectrum sequence. Then adjust the peaks to match the new
frame. Press F1 to hear the sound of the formant peaks in the selected
frame. The F0 peak is provided in order to adjust the correct balance of
low frequencies, below the F1 peak. If the sound is too muffled, or
conversely, too "thin", try adjusting the amplitude or position of the
F0 peak.

#### Length and Amplitude {.western}

Use an existing vowel file as a guide for how to set the amplitude and
length of the keyframes. At the right of each keyframe, its length is
shown in mS and under that is its relative (RMS) amplitude. The second
keyframe should be marked with a red marker (use CTRL-M to toggle this).
This divides the vowel into the front-part (with one frame), and the
rest. Use F2 to play the sound of the new vowel sequence. It will also
produce a WAV file (the default name is speech.wav) which you can read
into praat to see whether it has a sensible shape.

#### Using the New Vowel {.western}

Make a new directory (eg. vwl\_xx) in phsource for your new vowels. Save
the spectrum sequence with a name which you have chosen for it. You can
then edit the phoneme file for your language (eg. phsource/ph\_xxx), and
change a phoneme to refer to your new vowel file. Then do
`Data->Compile_Phoneme_Data`{.western} from espeakedit's menubar to
re-compile the phoneme data.

+ 279
- 0
docs/commands.md

@@ -0,0 +1,279 @@
2.1 INSTALLATION {.western}
----------------

### 2.1.1 Linux and other Posix systems {.western}

There are two versions of the command line program. They both have the
same command parameters (see below).

1. 2.

Place the **espeak-ng** or **speak-ng** executable file in the command
path, eg in **/usr/local/bin**

Place the "**espeak-data**" directory in /usr/share as
**/usr/share/espeak-data**.\
Alternatively if it is placed in the user's home directory (i.e.
**/home/\<user\>/espeak-data**) then that will be used instead.

#### Dependencies {.western}

**espeak-ng** uses the PortAudio sound library (version 18), so you will
need to have the **libportaudio0** library package installed. It may be
already, since it's used by other software, such as OpenOffice.org and
the Audacity sound editor.

Some Linux distributions (eg. SuSe 10) have version 19 of PortAudio
which has a slightly different API. The speak program can be compiled to
use version 19 of PortAudio by copying the file portaudio19.h to
portaudio.h before compiling.

The speak program may be compiled without using PortAudio, by removing
the line

~~~~ {.western style="margin-bottom: 0.5cm"}
#define USE_PORTAUDIO
~~~~

in the file speech.h.

### 2.1.2 Windows {.western}

The installer: **setup\_espeak.exe** installs the SAPI5 version of
eSpeak. During installation you need to specify which voices you want to
appear in SAPI5 voice menus.

It also installs a command line program **espeak-ng** in the espeak-ng
program directory.

2.2 COMMAND OPTIONS {.western}
-------------------

### 2.2.1 Examples {.western}

To use at the command line, type:\
  **espeak-ng "This is a test"**\
or\
  **espeak-ng -f \<text file\>**

Or just type\
  **espeak-ng**\
followed by text on subsequent lines. Each line is spoken when RETURN
is pressed.

Use **espeak-ng -x** to see the corresponding phoneme codes.

### 2.2.2 The Command Line Options {.western}

**espeak-ng [options] ["text words"]**
: Text input can be taken either from a file, from a string in the
command, or from stdin.
**-f \<text file\>**
: Speaks a text file.
**--stdin**
: Takes the text input from stdin.
If neither -f nor --stdin is given, then the text input is taken from "text words" (a text string within double quotes). \
If that is not present then text is taken from stdin, but each line is treated as a separate sentence. \
**-a \<integer\>**
: Sets amplitude (volume) in a range of 0 to 200. The default is 100.
**-p \<integer\>**
: Adjusts the pitch in a range of 0 to 99. The default is 50.
**-s \<integer\>**
: Sets the speed in words-per-minute (approximate values for the
default English voice, others may differ slightly). The default
value is 175. I generally use a faster speed of 260. The lower limit
is 80. There is no upper limit, but about 500 is probably a
practical maximum.
**-b \<integer\>**
: Input text character format.
: 1   UTF-8. This is the default.
: 2   The 8-bit character set which corresponds to the language (eg.
Latin-2 for Polish).
: 4   16 bit Unicode.
: Without this option, eSpeak assumes text is UTF-8, but will
automatically switch to the 8-bit character set if it finds an
illegal UTF-8 sequence.
**-g \<integer\>**
: Word gap. This option inserts a pause between words. The value is
the length of the pause, in units of 10 mS (at the default speed of
170 wpm).
**-h** or **--help**
: The first line of output gives the eSpeak version number.
**-k \<integer\>**
: Indicate words which begin with capital letters.
: 1   eSpeak uses a click sound to indicate when a word starts with a
capital letter, or double click if word is all capitals.
: 2   eSpeak speaks the word "capital" before a word which begins with
a capital letter.
: Other values:   eSpeak increases the pitch for words which begin
with a capital letter. The greater the value, the greater the
increase in pitch. Try -k20.
**-l \<integer\>**
: Line-break length, default value 0. If set, then lines which are
shorter than this are treated as separate clauses and spoken
separately with a break between them. This can be useful for some
text files, but bad for others.
**-m**
: Indicates that the text contains SSML (Speech Synthesis Markup
Language) tags or other XML tags. Those SSML tags which are
supported are interpreted. Other tags, including HTML, are ignored,
except that some HTML tags such as \<hr\> \<h2\> and \<li\> ensure a
break in the speech.
**-q**
: Quiet. No sound is generated. This may be useful with options such
as -x and --pho.
**-v \<voice filename\>[+\<variant\>]**
: Sets a Voice for the speech, usually to select a language. eg:

~~~~ {.western style="margin-left: 1cm; margin-bottom: 0.5cm"}
espeak-ng -vaf
~~~~

To use the Afrikaans voice. A modifier after the voice name can be used
to vary the tone of the voice, eg:

~~~~ {.western style="margin-left: 1cm; margin-bottom: 0.5cm"}
espeak-ng -vaf+3
~~~~

The variants are `+m1 +m2 +m3 +m4 +m5 +m6 +m7`{.western} for male voices
and `+f1 +f2 +f3 +f4`{.western} which simulate female voices by using
higher pitches. Other variants include `+croak`{.western} and
`+whisper`{.western}.

\<voice filename\> is a file within the `espeak-data/voices`{.western}
directory.\
\<variant\> is a file within the `espeak-data/voices/!v`{.western}
directory.

Voice files can specify a language, alternative pronunciations or
phoneme sets, different pitches, tonal qualities, and prosody for the
voice. See the [voices.html](voices.html) file.

Voice names which start with **mb-** are for use with Mbrola diphone
voices, see [mbrola.html](mbrola.html)

Some languages may need additional dictionary data, see
[languages.html](languages.html)

**-w \<wave file\>**

Writes the speech output to a file in WAV format, rather than speaking
it.

**-x**

The phoneme mnemonics, into which the input text is translated, are
written to stdout. If a phoneme name contains more than one letter (eg.
[tS]), the --sep or --tie option can be used to distinguish this from
separate phonemes.

**-X**

As -x, but in addition, details are shown of the pronunciation rule and
dictionary list lookup. This can be useful to see why a certain
pronunciation is being produced. Each matching pronunciation rule is
listed, together with its score, the highest scoring rule being used in
the translation. "Found:" indicates the word was found in the dictionary
lookup list, and "Flags:" means the word was found with only properties
and not a pronunciation. You can see when a word has been retranslated
after removing a prefix or suffix.

**-z**

This option removes the end-of-sentence pause which normally occurs at
the end of the text.

**--stdout**

Writes the speech output to stdout as it is produced, rather than
speaking it. The data starts with a WAV file header which indicates the
sample rate and format of the data. The length field is set to zero
because the length of the data is unknown when the header is produced.
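
A program that reads this stream has to cope with the zero length field. The Python sketch below parses a canonical 44-byte WAV header and reports the data length as unknown when the field is zero. The header layout is the standard RIFF/WAVE one; the sample rate, channel count and sample width used to build the demonstration header are illustrative assumptions, not values taken from this document.

```python
import struct

# Canonical 44-byte PCM WAV header, little-endian, no padding.
HEADER_FORMAT = "<4sI4s4sIHHIIHH4sI"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)


def make_streaming_header(sample_rate=22050, channels=1, bits=16):
    """Build a demonstration header whose length fields are zero (assumed values)."""
    block_align = channels * bits // 8
    return struct.pack(
        HEADER_FORMAT,
        b"RIFF", 0, b"WAVE",
        b"fmt ", 16, 1, channels, sample_rate,
        sample_rate * block_align, block_align, bits,
        b"data", 0,
    )


def parse_header(header):
    """Return format details; data_size is None when the length is unknown."""
    (riff, _riff_size, wave, fmt, _fmt_size, _audio_format, channels,
     sample_rate, _byte_rate, _block_align, bits, data, data_size) = struct.unpack(
        HEADER_FORMAT, header[:HEADER_SIZE])
    if (riff, wave, fmt, data) != (b"RIFF", b"WAVE", b"fmt ", b"data"):
        raise ValueError("not a canonical 44-byte WAV header")
    return {
        "sample_rate": sample_rate,
        "channels": channels,
        "bits": bits,
        "data_size": data_size or None,  # zero means "read until end of stream"
    }


info = parse_header(make_streaming_header())
print(info)
```

A reader built this way would consume samples until the stream ends, rather than trusting the length field.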

**--compile [=\<voice name\>]**

Compile the pronunciation rule and dictionary lookup data from their
source files in the current directory. The Voice determines which
language's files are compiled. For example, if it's an English voice,
then *en\_rules*, *en\_list*, and *en\_extra* (if present), are compiled
to replace *en\_dict* in the *espeak-data* directory. If no Voice is
specified then the default Voice is used.

**--compile-debug [=\<voice name\>]**

The same as **--compile**, but source line numbers from the \*\_rules
file are included. These are included in the rules trace when the **-X**
option is used.

**--ipa**

Writes phonemes to stdout, using the International Phonetic Alphabet
(IPA).\
If a phoneme name contains more than one letter (eg. [tS]), the --sep
or --tie option can be used to distinguish this from separate phonemes.

**--path [="\<directory path\>"]**

Specifies the directory which contains the espeak-data directory.

**--pho**

When used with an mbrola voice (eg. -v mb-en1), it writes mbrola phoneme
data (.pho file format) to stdout. This includes the mbrola phoneme
names with duration and pitch information, in a form which is suitable
as input to this mbrola voice. The --phonout option can be used to write
this data to a file.

**--phonout [="\<filename\>"]**

If specified, the output from -x, -X, --ipa, and --pho options is
written to this file, rather than to stdout.

**--punct [="\<characters\>"]**

Speaks the names of punctuation characters when they are encountered in
the text. If \<characters\> are given, then only those listed
punctuation characters are spoken, eg. `--punct=".,;?"`{.western}

**--sep [=\<character\>]**

The character is used to separate individual phonemes in the output
which is produced by the -x or --ipa options. The default is a space
character. The character z means use a ZWNJ character (U+200c).

**--split [=\<minutes\>]**

Used with **-w**, it starts a new WAV file every `<minutes>`{.western}
minutes, at the next sentence boundary.

**--tie [=\<character\>]**

The character is used within multi-letter phonemes in the output which
is produced by the -x or --ipa options. The default is the tie
character  ͡  U+361. The character z means use a ZWJ character (U+200d).

**--voices [=\<language code\>]**

Lists the available voices.\
If =\<language code\> is present then only those voices which are
suitable for that language are listed.\
`--voices=mbrola`{.western} lists the voices which use mbrola diphone
voices. These are not included in the default `--voices`{.western} list.\
`--voices=variant`{.western} lists the available voice variants (voice
modifiers).
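
When espeak-ng is called from a script, it can help to check option values against the ranges listed above before starting the program. The Python sketch below builds an argument list using only options documented in this section (`-a`, `-p`, `-s`, `-v`, `-w`). The function name and its parameters are hypothetical, and the sketch does not run espeak-ng itself.

```python
def build_args(text, voice=None, variant=None, amplitude=100, pitch=50,
               speed=175, wav_path=None):
    """Return an espeak-ng argument list after checking the documented ranges."""
    if not 0 <= amplitude <= 200:
        raise ValueError("amplitude must be in the range 0 to 200")
    if not 0 <= pitch <= 99:
        raise ValueError("pitch must be in the range 0 to 99")
    if speed < 80:
        raise ValueError("speed has a lower limit of 80 words per minute")
    args = ["espeak-ng", "-a", str(amplitude), "-p", str(pitch), "-s", str(speed)]
    if voice is not None:
        # A variant is appended to the voice name with "+", eg. af+f2
        args += ["-v", voice if variant is None else f"{voice}+{variant}"]
    if wav_path is not None:
        args += ["-w", wav_path]
    args.append(text)
    return args


print(build_args("This is a test", voice="af", variant="f2"))
```

If espeak-ng is on the command path, the returned list could be passed to `subprocess.run(args, check=True)`.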

### 2.2.3 The Input Text {.western}

**HTML Input**
: If the -m option is used to indicate marked-up text, then HTML can
be spoken directly.
**Phoneme Input**
: As well as plain text, phoneme mnemonics can be used in the text
input to **espeak-ng**. They are enclosed within double square
brackets. Spaces are used to separate words and all stressed
syllables must be marked explicitly.
:   eg:  
`espeak-ng -v en "[[D,Is Iz sVm f@n'EtIk t'Ekst 'InpUt]]" `{.western}
: This command will speak: "This is some phonetic text input".
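
The double square bracket convention can be illustrated with a short Python sketch that separates plain text from phoneme input. This is only a model of the input format described above, with a hypothetical function name; it is not the parser that eSpeak uses.

```python
import re

# Text between [[ and ]] is treated as phoneme mnemonics.
_PHONEME_BLOCK = re.compile(r"\[\[(.*?)\]\]")


def split_input(text):
    """Split text into ("text", ...) and ("phonemes", ...) segments."""
    parts = []
    last = 0
    for m in _PHONEME_BLOCK.finditer(text):
        if m.start() > last:
            parts.append(("text", text[last:m.start()]))
        parts.append(("phonemes", m.group(1)))
        last = m.end()
    if last < len(text):
        parts.append(("text", text[last:]))
    return parts


print(split_input("[[D,Is Iz sVm f@n'EtIk t'Ekst 'InpUt]]"))
print(split_input("Say [[f@n'EtIk]] now"))
```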


+ 655
- 0
docs/dictionary.md

@@ -0,0 +1,655 @@
4. TEXT TO PHONEME TRANSLATION {.western}
------------------------------

### 4.1 Translation Files {.western}

There is a separate set of pronunciation files for each language, their
names starting with the language name.

There are two separate methods for translating words into phonemes:

- -

These two files are compiled into the file ***\<language\>\_dict***  in
the espeak-data directory (eg. espeak-data/en\_dict)

### 4.2 Phoneme names {.western}

Each of the language's phonemes is represented by a mnemonic of 1, 2, 3,
or 4 characters. Together with a number of utility codes (eg. stress
marks and pauses), these are defined in the phoneme data file (see
\*spec not yet available\*).

The utility 'phonemes' are:

+--------------------------------------+--------------------------------------+
| **'** | primary stress |
+--------------------------------------+--------------------------------------+
| **,** | secondary stress |
+--------------------------------------+--------------------------------------+
| **%** | unstressed syllable |
+--------------------------------------+--------------------------------------+
| **=   ** | put the primary stress on the |
| | preceding syllable |
+--------------------------------------+--------------------------------------+
| **\_:** | short pause |
+--------------------------------------+--------------------------------------+
| **\_** | a shorter pause |
+--------------------------------------+--------------------------------------+
| **||** | indicates a word boundary within a |
| | phoneme string |
+--------------------------------------+--------------------------------------+
| **|** | can be used to separate two adjacent |
| | characters, to prevent them from |
| | being considered as a |
| | multi-character phoneme mnemonic |
+--------------------------------------+--------------------------------------+

It is not necessary to specify the stress of every syllable. Stress
markers are only needed in order to change the effect of the language's
default stress rule.

The phonemes which are used to represent a language's sounds are based
loosely on the Kirshenbaum ascii character representation of the
International Phonetic Alphabet
[www.kirshenbaum.net/IPA/ascii-ipa.pdf](http://www.kirshenbaum.net/IPA/ascii-ipa.pdf)

### 4.3 Pronunciation Rules {.western}

The rules in the ***\<language\>\_rules***  file specify the phonemes
which are used to pronounce each letter, or sequence of letters. Some
rules only apply when the letter or letters are preceded by, or followed
by, other specified letters.

To find the pronunciation of a word, the rules are searched and any
which match the letters at the current position in the word are given a
score depending on how many letters are matched. The pronunciation from
the best matching rule is chosen. The pointer into the source word is then
advanced past those letters which have been matched and the process is
repeated until all the letters of the word have been processed.
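
The search-score-advance loop described above can be sketched in Python. This is a deliberately simplified model: the contexts are literal letters only, the score is just the number of letters matched, and the rule set and phoneme strings are illustrative. The real scoring in eSpeak is more detailed, and the names `Rule`, `score` and `translate` are hypothetical.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    pre: str       # letters that must precede the match ("" = no condition)
    match: str     # letters consumed by this rule
    post: str      # letters that must follow the match ("" = no condition)
    phonemes: str  # phoneme string produced by the rule


def score(rule, word, pos):
    """Return the number of letters matched at pos, or None if no match."""
    if not word.startswith(rule.match, pos):
        return None
    if rule.pre and not word[:pos].endswith(rule.pre):
        return None
    end = pos + len(rule.match)
    if rule.post and not word.startswith(rule.post, end):
        return None
    return len(rule.pre) + len(rule.match) + len(rule.post)


def translate(word, rules):
    """Translate a word by repeatedly applying the best-scoring rule."""
    word = word.lower()
    pos = 0
    out = []
    while pos < len(word):
        best, best_score = None, -1
        for rule in rules:
            s = score(rule, word, pos)
            if s is not None and s > best_score:
                best, best_score = rule, s
        if best is None:
            pos += 1  # no rule matches: skip this letter
            continue
        out.append(best.phonemes)
        pos += len(best.match)  # advance past the matched letters
    return out


RULES = [
    Rule("", "b", "", "b"),
    Rule("", "k", "", "k"),
    Rule("", "t", "", "t"),
    Rule("", "oo", "", "u:"),
    Rule("b", "oo", "k", "U"),  # "oo" preceded by "b" and followed by "k"
]

print(translate("boot", RULES))
print(translate("book", RULES))
```

In "book" the context-specific rule matches four letters and so outscores the plain "oo" rule, which matches two.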

#### 4.3.1 Rule Groups {.western}

The rules are organized in groups, each starting with a ".group" line:

When matching a word, firstly the 2-letter group for the two letters at
the current position in the word (if such a group exists) is searched,
and then the single-letter group. The highest scoring rule in either of
those two groups is used.
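
The group lookup order can be modelled as follows. This Python sketch only gathers the candidate rules for one position, taking the 2-letter group first and then the single-letter group; choosing the highest-scoring candidate is a separate step. The function name, the dictionary layout and the rule strings are illustrative assumptions.

```python
def candidate_rules(groups, word, pos):
    """Collect the rules from the 2-letter group, then the single-letter group."""
    two = word[pos:pos + 2]
    one = word[pos:pos + 1]
    found = []
    if len(two) == 2 and two in groups:
        found.extend(groups[two])
    if one in groups:
        found.extend(groups[one])
    return found


# Group name -> rule lines (placeholder rule text, for illustration only).
GROUPS = {
    "b": ["b  b"],
    "o": ["o  0"],
    "oo": ["oo  u:", "b) oo (k  U"],
}

print(candidate_rules(GROUPS, "book", 1))
```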

#### 4.3.2 Rules {.western}

Each rule is on a separate line, and has the syntax:

eg.

"oo" is pronounced as [u:], but when also preceded by "b" and followed
by "k", it is pronounced [U].

In the case of a single-letter group, the first character of \<match\>
must be the group letter. In the case of a 2-letter group, the first two
characters of \<match\> must be the group letters. The second and third
rules above may be in either .group o or .group oo.

Alphabetic characters in the \<pre\>, \<match\>, and \<post\> parts must
be lower case, and matching is case-insensitive. Some upper case letters
are used in \<pre\> and \<post\> with special meanings.

#### 4.3.3 Special characters in \<phoneme string\>: {.western}

+--------------------------------------+--------------------------------------+
| **\_\^\_\<language code\>   ** | Translate using a different |
| | language. |
+--------------------------------------+--------------------------------------+

#### 4.3.4 Special Characters in both \<pre\> and \<post\>: {.western}

+--------------------------------------+--------------------------------------+
| **\_** | Beginning or end of a word (or a |
| | hyphen). |
+--------------------------------------+--------------------------------------+
| **-** | Hyphen. |
+--------------------------------------+--------------------------------------+
| **A** | Any vowel (the set of vowel |
| | characters may be defined for a |
| | particular language). |
+--------------------------------------+--------------------------------------+
| **C** | Any consonant. |
+--------------------------------------+--------------------------------------+
| **B H F G Y ** | These may indicate other sets of |
| | characters (defined for a particular |
| | language). |
+--------------------------------------+--------------------------------------+
| **L\<nn\>** | Any of the sequence of characters |
| | defined as a letter group (see 4.3.1 |
| | above). |
+--------------------------------------+--------------------------------------+
| **D** | Any digit. |
+--------------------------------------+--------------------------------------+
| **K** | Not a vowel (i.e. a consonant or |
| | word boundary or non-alphabetic |
| | character). |
+--------------------------------------+--------------------------------------+
| **X** | There is no vowel until the word |
| | boundary. |
+--------------------------------------+--------------------------------------+
| **Z** | A non-alphabetic character. |
+--------------------------------------+--------------------------------------+
| **%** | Doubled (placed before a character |
| | in \<pre\> and after it in \<post\>). |
+--------------------------------------+--------------------------------------+
| **/** | The following character is treated |
| | literally. |
+--------------------------------------+--------------------------------------+

The sets of letters indicated by A, B, C, E, F, G may be defined
differently for each language.

Examples of rules:

~~~~ {.western}
_) a // "a" at the start of a word
a (CC // "a" followed by two consonants
a (C% // "a" followed by a double consonant (the same letter twice)
a (/% // "a" followed by a percent sign
%C) a // "a" preceded by a double consonant
~~~~
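
To make the special characters in the examples above more concrete, here is a Python sketch that checks a \<post\> pattern against the letters which follow the match. It handles only A, C, D, %, / and literal lower case letters; word boundaries and the other special characters are left out. The vowel set is an assumption (it may be defined differently for each language), and the function name is hypothetical.

```python
VOWELS = set("aeiou")  # assumption: the vowel set is language specific


def match_post(pattern, text):
    """Return True if `text` (the letters after the match) satisfies `pattern`."""
    i = 0  # position in text
    p = 0  # position in pattern
    while p < len(pattern):
        tok = pattern[p]
        if tok == "/":  # the following pattern character is literal
            p += 1
            if p >= len(pattern) or i >= len(text) or text[i] != pattern[p]:
                return False
        elif tok == "%":  # doubled: same letter as the previous one
            if i == 0 or i >= len(text) or text[i] != text[i - 1]:
                return False
        elif i >= len(text):
            return False
        elif tok == "A":  # any vowel
            if text[i] not in VOWELS:
                return False
        elif tok == "C":  # any consonant
            if not text[i].isalpha() or text[i] in VOWELS:
                return False
        elif tok == "D":  # any digit
            if not text[i].isdigit():
                return False
        elif text[i] != tok:  # literal letter
            return False
        i += 1
        p += 1
    return True


print(match_post("CC", "nd"))    # "a (CC" applied to "and"
print(match_post("C%", "pple"))  # "a (C%" applied to "apple"
print(match_post("/%", "%"))     # "a (/%" applied to "a%"
```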

#### 4.3.5 Special characters only in \<pre\>: {.western}

+--------------------------------------+--------------------------------------+
| **@   ** | Any syllable. |
+--------------------------------------+--------------------------------------+
| **&** | A syllable which may be stressed |
| | (i.e. is not defined as unstressed). |
+--------------------------------------+--------------------------------------+
| **V** | Matches only if a previous word has |
| | indicated that a verb form is |
| | expected. |
+--------------------------------------+--------------------------------------+

eg.

~~~~ {.western}
@@) bi // "bi" preceded by at least two syllables
@@a) bi // "bi" preceded by at least 2 syllables and following 'a'
~~~~

Note that matching characters in the \<pre\> part do not affect the
syllable counting.

#### 4.3.6 Special characters only in \<post\>: {.western}

+--------------------------------------+--------------------------------------+
| **@** | A vowel follows somewhere in the |
| | word. |
+--------------------------------------+--------------------------------------+
| **+** | Force an increase in the score in |
| | this rule (may be repeated for more |
| | effect). |
+--------------------------------------+--------------------------------------+
| **S\<number\>  ** | This number of matching characters |
| | are a standard suffix, remove them |
| | and retranslate the word. |
+--------------------------------------+--------------------------------------+
| **P\<number\>** | This number of matching characters |
| | are a standard prefix, remove them |
| | and retranslate the word. |
+--------------------------------------+--------------------------------------+
| **Lnn** | **nn** is a 2-digit decimal number |
| | in the range 01 to 20\ |
| | Matches with any of the letter |
| | sequences which have been defined |
| | for letter group **nn** |
+--------------------------------------+--------------------------------------+
| **N** | Only use this rule if the word is |
| | not a retranslation after removing a |
| | suffix. |
+--------------------------------------+--------------------------------------+
| **\#** | (English specific) change the next |
| | "e" into a special character "E" |
+--------------------------------------+--------------------------------------+
| **\$noprefix** | Only use this rule if the word is |
| | not a retranslation after removing a |
| | prefix. |
+--------------------------------------+--------------------------------------+
| **\$w\_alt\ | Only use this rule if the word is |
| \$w\_alt2\ | found in the \*\_list file with the |
| \$w\_alt3** | **\$alt**, **\$alt2** or **\$alt3** |
| | attribute respectively. |
+--------------------------------------+--------------------------------------+
| **\$p\_alt\ | Only use this rule if the part-word, |
| \$p\_alt2\ | up to and including the pre and |
| \$p\_alt3** | match parts of this rule, is found |
| | in the \*\_list file with the |
| | **\$alt**, **\$alt2** or **\$alt3** |
| | attribute respectively. |
+--------------------------------------+--------------------------------------+

eg.

~~~~ {.western}
@) ly (_S2 lI // "ly", at end of a word with at least one other
// syllable, is a suffix pronounced [lI]. Remove
// it and retranslate the word.

_) un (@P2 %Vn // "un" at the start of a word is an unstressed
// prefix pronounced [Vn]
_) un (i ju: // ... except in words starting "uni"
_) un (inP2 ,Vn // ... but it is for words starting "unin"
~~~~

S and P must be at the end of the \<post\> string.

S\<number\> may be followed by additional letters (eg. S2ei ). Some of
these are probably specific to English, but similar functions could be
made for other languages.

+--------------------------------------+--------------------------------------+
| **q** | query the \_list file to find stress |
| | position or other attributes for the |
| | stem, but don't re-translate the |
| | word with the suffix removed. |
+--------------------------------------+--------------------------------------+
| **t** | determine the stress pattern of the |
| | word **before** adding the suffix |
+--------------------------------------+--------------------------------------+
| **d   ** | the previous letter may have been |
| | doubled when the suffix was added. |
+--------------------------------------+--------------------------------------+
| **e** | "e" may have been removed. |
+--------------------------------------+--------------------------------------+
| **i** | "y" may have been changed to "i." |
+--------------------------------------+--------------------------------------+
| **v** | the suffix means the verb form of |
| | pronunciation should be used. |
+--------------------------------------+--------------------------------------+
| **f** | the suffix means the next word is |
| | likely to be a verb. |
+--------------------------------------+--------------------------------------+
| **m** | after this suffix has been removed, |
| | additional suffixes may be removed. |
+--------------------------------------+--------------------------------------+
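As a rough illustration of what the **d**, **e** and **i** letters imply, the sketch below lists the stems that might be tried once a suffix has been removed. `candidate_stems` is a hypothetical helper written for this example, not a function in eSpeak, and the real translator may apply these adjustments in a different way.

```python
def candidate_stems(word, suffix_len, flags=""):
    """Return stems to try after removing a suffix of suffix_len letters.

    flags uses the letters from the table above (only d, e and i are
    modelled here; this is an illustrative assumption, not eSpeak code).
    """
    stem = word[:-suffix_len]
    candidates = [stem]
    # d: the previous letter may have been doubled (runn -> run)
    if "d" in flags and len(stem) >= 2 and stem[-1] == stem[-2]:
        candidates.append(stem[:-1])
    # e: an "e" may have been removed (mak -> make)
    if "e" in flags:
        candidates.append(stem + "e")
    # i: a "y" may have been changed to "i" (happi -> happy)
    if "i" in flags and stem.endswith("i"):
        candidates.append(stem[:-1] + "y")
    return candidates
```

For example, `candidate_stems("running", 3, "d")` offers both "runn" and "run" as stems to retranslate.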

P\<number\> may be followed by additional letters (eg. P3v ).

+--------------------------------------+--------------------------------------+
| **t   ** | determine the stress pattern of the |
| | word **before** adding the prefix |
+--------------------------------------+--------------------------------------+
| **v** | the prefix means the verb form of |
| | pronunciation should be used. |
+--------------------------------------+--------------------------------------+

### 4.4 Pronunciation Dictionary List {.western}

The ***\<language\>\_list*** file contains a list of words whose
pronunciations are given explicitly, rather than determined by the
Pronunciation Rules. The ***\<language\>\_extra*** file, if present, is
also used and its contents are taken as coming after those in
***\<language\>\_list***.

The list can also be used to specify the stress pattern, or other
properties, of a word.

If the Pronunciation Rules are applied to a word and indicate a standard
prefix or suffix, then the word is looked up again in the Pronunciation
Dictionary List after the prefix or suffix has been removed.

Lines in the dictionary list have the form:

\<word\> [\<phoneme string\>] [\<flags\>]

eg.

~~~~ {.western style="margin-bottom: 0.5cm"}
book bUk
~~~~

Rather than a full pronunciation, just the stress may be given, to
change where it would be otherwise placed by the Pronunciation Rules:

~~~~ {.western}
berlin $2 // stress on second syllable
absolutely $3 // stress on third syllable
for $u // an unstressed word
~~~~

#### 4.4.1 Multiple Words {.western}

A pronunciation may also be specified for a group of words, when these
appear together. Up to four words may be given, enclosed in brackets.
This may be used to change the pronunciation or stress pattern when
these words occur together,

~~~~ {.western style="margin-bottom: 0.5cm"}
(de jure) deI||dZ'U@rI2 // note || used as a word break in the phoneme string
~~~~

or to run them together, pronounced as a single word

~~~~ {.western style="margin-bottom: 0.5cm"}
(of a) @v@
~~~~

or to give them a flag when they occur together

~~~~ {.western style="margin-bottom: 0.5cm"}
(such as) sVtS||a2z $pause // precede with a pause
~~~~

Hyphenated words in the ***\<language\>\_list***  file must also be
enclosed within brackets, because the two parts are considered as
separate words.
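The entry layouts shown so far (a single word or a bracketed group, an optional phoneme string, optional `$` flags and a trailing `//` comment) can be pulled apart with a few lines of code. The following is a minimal sketch in Python, assuming exactly that layout; `parse_list_line` is a hypothetical name, and the sketch does not handle conditional `?` prefixes or any other syntax.

```python
def parse_list_line(line):
    """Split one dictionary list line into words, phoneme string and flags.

    Illustrative only: assumes "<word or (words)> [phonemes] [$flags] // comment".
    """
    text = line.split("//", 1)[0].strip()   # drop the trailing comment
    if not text:
        return None
    if text.startswith("("):
        # a group of words enclosed in brackets
        close = text.index(")")
        words = text[1:close].split()
        rest = text[close + 1:].split()
    else:
        parts = text.split()
        words, rest = parts[:1], parts[1:]
    flags = [tok for tok in rest if tok.startswith("$")]
    phonemes = [tok for tok in rest if not tok.startswith("$")]
    return {
        "words": words,
        "phonemes": phonemes[0] if phonemes else None,
        "flags": flags,
    }
```

An entry such as `berlin $2` yields no phoneme string and the single flag `$2`, which matches the case where only the stress is given.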

#### 4.4.2 Special characters in \<phoneme string\>: {.western}

+--------------------------------------+--------------------------------------+
| **\_\^\_\<language code\>   ** | Translate using a different |
| | language. See explanation in 4.3.3 |
| | above. |
+--------------------------------------+--------------------------------------+

#### 4.4.3 Flags {.western}

A word (or group of words) may be given one or more flags, either
instead of, or as well as, the phonetic translation.

+--------------------------------------+--------------------------------------+
| \$u | The word is unstressed. In the case |
| | of a multi-syllable word, a slight |
| | stress is applied according to the |
| | default stress rules. |
+--------------------------------------+--------------------------------------+
| \$u1 | The word is unstressed, with a |
| | slight stress on its 1st syllable. |
+--------------------------------------+--------------------------------------+
| \$u2 | The word is unstressed, with a |
| | slight stress on its 2nd syllable. |
+--------------------------------------+--------------------------------------+
| \$u3 | The word is unstressed, with a |
| | slight stress on its 3rd syllable. |
+--------------------------------------+--------------------------------------+
|   |   |
+--------------------------------------+--------------------------------------+
| \$u+ \$u1+ \$u2+ \$u3+ | As above, but the word has full |
| | stress if it's at the end of a |
| | clause. |
+--------------------------------------+--------------------------------------+
|   |   |
+--------------------------------------+--------------------------------------+
| \$1 | Primary stress on the 1st syllable. |
+--------------------------------------+--------------------------------------+
| \$2 | Primary stress on the 2nd syllable. |
+--------------------------------------+--------------------------------------+
| \$3 | Primary stress on the 3rd syllable. |
+--------------------------------------+--------------------------------------+
| \$4 | Primary stress on the 4th syllable. |
+--------------------------------------+--------------------------------------+
| \$5 | Primary stress on the 5th syllable. |
+--------------------------------------+--------------------------------------+
| \$6 | Primary stress on the 6th syllable. |
+--------------------------------------+--------------------------------------+
| \$7 | Primary stress on the 7th syllable. |
+--------------------------------------+--------------------------------------+
|   |   |
+--------------------------------------+--------------------------------------+
| \$pause | Ensure a short pause before this |
| | word (eg. for conjunctions such as |
| | "and", some prepositions, etc). |
+--------------------------------------+--------------------------------------+
| \$brk | Ensure a very short pause before |
| | this word, shorter than \$pause (eg. |
| | for some prepositions, etc). |
+--------------------------------------+--------------------------------------+
| \$only | The rule does not apply if a prefix |
| | or suffix has already been removed. |
+--------------------------------------+--------------------------------------+
| \$onlys | As \$only, except that a standard |
| | plural ending is allowed. |
+--------------------------------------+--------------------------------------+
| \$stem | The rule only applies if a suffix |
| | has already been removed. |
+--------------------------------------+--------------------------------------+
| \$strend | Word is fully stressed if it's at |
| | the end of a clause. |
+--------------------------------------+--------------------------------------+
| \$strend2 | As \$strend, but the word is also |
| | stressed if followed only by |
| | unstressed word(s). |
+--------------------------------------+--------------------------------------+
| \$unstressend  | Word is unstressed if it's at the |
| | end of a clause. |
+--------------------------------------+--------------------------------------+
| \$atend | Use this pronunciation if it's at |
| | the end of a clause. |
+--------------------------------------+--------------------------------------+
| \$double | Cause a doubling of the initial |
| | consonant of the following word |
| | (used for Italian). |
+--------------------------------------+--------------------------------------+
| \$capital | Use this pronunciation if the word |
| | has initial capital letter (eg. |
| | polish v Polish). |
+--------------------------------------+--------------------------------------+
| \$allcaps | Use this pronunciation if the word |
| | is all capitals. |
+--------------------------------------+--------------------------------------+
| \$dot | Ignore a . after this word even when |
| | followed by a capital letter (eg. |
| | Mr. Dr. ). |
+--------------------------------------+--------------------------------------+
| \$hasdot | Use this pronunciation if the word |
| | is followed by a dot. (This |
| | attribute also implies \$dot). |
+--------------------------------------+--------------------------------------+
| \$sentence | The rule only applies if the clause |
| | includes end-of-sentence (i.e. it is |
| | not terminated by a comma). For |
| | example, "\$atend \$sentence" means |
| | that the rule only applies at the |
| | end of a sentence. |
+--------------------------------------+--------------------------------------+
| \$abbrev | This has two meanings.\ |
| | 1. If there is no phoneme string: |
| | Speak the word as individual |
| | letters, even if it contains a vowel |
| | (eg. "abc" should be spoken as "a" |
| | "b" "c").\ |
| | 2. If there is a phoneme string: |
| | This word is capitalized because it |
| | is an abbreviation and |
| | capitalization does not indicate |
| | emphasis (if the "emphasize |
| | all-caps" is on). |
+--------------------------------------+--------------------------------------+
|   |   |
+--------------------------------------+--------------------------------------+
| \$accent | Used for the pronunciation of a |
| | single alphabetic character. The |
| | character name is spoken as the |
| | base-letter name plus the accent |
| | (diacritic) name. eg. It can be used |
| | to specify that "â" is spoken as "a" |
| | "circumflex". |
+--------------------------------------+--------------------------------------+
| \$combine | This word is treated as though it is |
| | combined with the following word |
| | with a hyphen. This may be subject |
| | to further conditions for certain |
| | languages. |
+--------------------------------------+--------------------------------------+
| \$alt   \$alt2   \$alt3 | These are language specific. Their |
| | use should be described in the |
| | language's \*\_list file. |
+--------------------------------------+--------------------------------------+
|   |   |
+--------------------------------------+--------------------------------------+
| \$verb | Use this pronunciation if it's a |
| | verb. |
+--------------------------------------+--------------------------------------+
| \$noun | Use this pronunciation if it's a |
| | noun. |
+--------------------------------------+--------------------------------------+
| \$past | Use this pronunciation if it's past |
| | tense. |
+--------------------------------------+--------------------------------------+
| \$verbf | The following word is probably a |
| | verb. |
+--------------------------------------+--------------------------------------+
| \$verbsf | The following word is probably a |
| | verb if it has an "s" suffix. |
+--------------------------------------+--------------------------------------+
| \$nounf | The following word is probably not a |
| | verb. |
+--------------------------------------+--------------------------------------+
| \$pastf | The following word is probably past |
| | tense. |
+--------------------------------------+--------------------------------------+
| \$verbextend | Extend the influence of \$verbf and |
| | \$verbsf. |
+--------------------------------------+--------------------------------------+

The last group are probably English specific, but something similar may
be useful in other languages. They are a crude attempt to improve the
accuracy of pairs like ob'ject (verb) v 'object (noun) and read
(present) v read (past).

The dictionary list is searched from bottom to top. The first match that
satisfies any conditions is used (i.e. the one lowest down the list). So
if we have:

~~~~ {.western}
to t@ // unstressed version
to tu: $atend // stressed version
~~~~

then if "to" is at the end of the clause, we get [tu:], if not then we
get [t@].
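The bottom-to-top search can be sketched in a few lines of Python. This is only an illustration of the lookup order described above: the tuple layout and the `lookup` function are assumptions made for the example, and only the `$atend` condition is modelled.

```python
def lookup(entries, word, at_clause_end):
    """Return the phoneme string of the lowest entry whose conditions hold.

    entries is a list of (word, phonemes, flags) tuples in file order.
    """
    for entry_word, phonemes, flags in reversed(entries):  # bottom to top
        if entry_word != word:
            continue
        if "$atend" in flags and not at_clause_end:
            continue  # condition not satisfied, keep searching upwards
        return phonemes
    return None


entries = [
    ("to", "t@", []),            # unstressed version
    ("to", "tu:", ["$atend"]),   # stressed version
]
```

Because the conditional entry is lower in the list it is tried first, and the unconditional entry above it acts as the fallback.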

#### 4.4.4 Translating a Word to another Word {.western}

Rather than specifying the pronunciation of a word by a phoneme string,
you can specify another "sounds like" word.

Use the attribute **\$text** eg.

~~~~ {.western style="margin-bottom: 0.5cm"}
cough coff $text
~~~~

Alternatively, use the command **\$textmode** on a line by itself to
turn this on for all subsequent entries in the file, until it's turned
off by **\$phonememode**. eg.

~~~~ {.western}
$textmode
cough coff
through threw
$phonememode
~~~~

This feature cannot be used for the special entries in the **\_list**
files which start with an underscore, such as numbers.

Currently "textmode" entries are only recognized for complete words, and
not for stems from which a prefix or suffix has been removed (eg.
the word "coughs" would not match the example above).

### 4.5 Conditional Rules {.western}

Rules in a **\_rules** file and entries in a **\_list** file can be made
conditional. They apply only to some voices. This can be useful to
specify different pronunciations for different variants of a language
(dialects or accents).

Conditional rules have **?** and a condition number at the start of
the line in the **\_rules** or **\_list** file. This means that the rule
only applies if that condition number is specified in a **dictrules**
line in the [voice file](voices.html).

If the rule starts with   **?!**   then the rule only applies if the
condition number is **not** specified in the voice file. eg.

~~~~ {.western}
?3 can't kant // only use this if the voice has: dictrules 3
?!3 rather rA:D3 // only use if the voice doesn't have: dictrules 3
~~~~
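The effect of the **?** and **?!** prefixes can be expressed as a small predicate. The Python below is a sketch under the assumption that the condition is the first whitespace-separated token of the line; `rule_applies` is a hypothetical helper, and `dictrules` stands for the set of numbers given on the voice file's **dictrules** line.

```python
def rule_applies(line, dictrules):
    """Decide whether a rule or list line is active for a voice.

    dictrules is the set of condition numbers from the voice file.
    """
    if not line.startswith("?"):
        return True                      # unconditional line
    condition = line.split()[0][1:]      # text after the "?"
    negate = condition.startswith("!")
    number = int(condition.lstrip("!"))
    # "?n" needs n to be present, "?!n" needs n to be absent
    return (number in dictrules) != negate
```

With `dictrules 3` in the voice file, the `?3` line is used and the `?!3` line is skipped; without it, the reverse holds.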

### 4.6 Numbers and Character Names {.western}

#### 4.6.1 Letter names {.western}

The names of individual letters can be given either in the **\_rules**
or **\_list** file. Sometimes an individual letter is also used as a
word in the language and its pronunciation as a word differs from its
letter name. If so, it should be listed in the **\_list** file, preceded
by an underscore, to give the letter name (as distinct from its
pronunciation as a word). eg. in English:

~~~~ {.western style="margin-bottom: 0.5cm"}
_a eI
~~~~

#### 4.6.2 Numbers {.western}

The operation of the TranslateNumber() function is controlled by the
language's `langopts.numbers`{.western} option. This constructs spoken
numbers from fragments according to various options which can be set for
each language. The number fragments are given in the **\_list** file.

+--------------------------------------+--------------------------------------+
| \_0 to \_9   | The numbers 0 to 9 |
+--------------------------------------+--------------------------------------+
| \_13 etc. | Any pronunciations which are |
| | needed for specific numbers in the |
| | range \_10 to \_99 |
+--------------------------------------+--------------------------------------+
| \_2X  \_3X | Twenty, thirty, etc., used to make |
| | numbers 10 to 99   |
+--------------------------------------+--------------------------------------+
| \_0C | The word for "hundred" |
+--------------------------------------+--------------------------------------+
| \_1C  \_2C | Special pronunciation for one |
| | hundred, two hundred, etc., if |
| | needed. |
+--------------------------------------+--------------------------------------+
| \_1C0 | Special pronunciation (if needed) |
| | for 100 exactly |
+--------------------------------------+--------------------------------------+
| \_0M1 | The word for "thousand" |
+--------------------------------------+--------------------------------------+
| \_0M2 | The word for "million" |
+--------------------------------------+--------------------------------------+
| \_0M3 | The word for 1000000000 |
+--------------------------------------+--------------------------------------+
| \_1M1  \_2M1 | Special pronunciation for one |
| | thousand, two thousand, etc, if |
| | needed |
+--------------------------------------+--------------------------------------+
| \_0and | Word for "and" when speaking numbers |
| | (eg. "two hundred and twenty"). |
+--------------------------------------+--------------------------------------+
| \_dpt | Word spoken for the decimal |
| | point/comma |
+--------------------------------------+--------------------------------------+
| \_dpt2 | Word spoken (if any) at the end of |
| | all the digits after a decimal |
| | point. |
+--------------------------------------+--------------------------------------+
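To show how these fragments might combine, here is a simplified Python sketch that picks the **\_list** keys for a number from 0 to 999. It is an assumption-laden illustration, not the TranslateNumber() algorithm: the real behaviour depends on `langopts.numbers` (for example the order of tens and units), and `number_fragments` and `ENGLISH_LIKE` are names invented for this example.

```python
def number_fragments(n, defined):
    """Return the _list keys used to speak n (0 to 999) in this sketch.

    defined is the set of keys present in the language's _list file.
    """
    if not 0 <= n <= 999:
        raise ValueError("this sketch only covers 0 to 999")
    keys = []
    hundreds, rest = divmod(n, 100)
    if hundreds:
        special = f"_{hundreds}C"          # eg. _2C, if the language has one
        if special in defined:
            keys.append(special)
        else:
            keys += [f"_{hundreds}", "_0C"]
        if rest and "_0and" in defined:
            keys.append("_0and")
    if rest or not hundreds:
        exact = f"_{rest}"                 # eg. _13 for a specific number
        if exact in defined:
            keys.append(exact)
        else:
            tens, units = divmod(rest, 10)
            keys.append(f"_{tens}X")
            if units:
                keys.append(f"_{units}")
    return keys


# A hypothetical English-like set of entries: _0 to _19, _2X to _9X, _0C, _0and
ENGLISH_LIKE = (
    {f"_{d}" for d in range(20)}
    | {f"_{t}X" for t in range(2, 10)}
    | {"_0C", "_0and"}
)
```

With this set, 220 is assembled from `_2`, `_0C`, `_0and` and `_2X`, which corresponds to "two hundred and twenty".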

### 4.7 Character Substitution {.western}

Character substitutions can be specified by using a **.replace** section
at the start of the **\_rules** file. Each line specifies either one or
two alphabetic characters to be replaced by another one or two
alphabetic characters. This substitution is done to a word before it is
translated using the spelling-to-phoneme rules. Only the lower-case
version of the characters needs to be specified. eg.

    .replace
        ô   ő   // (Hungarian) allow the use of o-circumflex instead of o-double-acute
        û   ű

        cx   ĉ   // (Esperanto) allow "cx" as an alternative to c-circumflex

        ﬁ   fi   // replace a single character ligature by two characters
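The substitution step can be pictured as a simple table lookup applied to the word before translation. The Python sketch below assumes that two-character sequences are tried before single characters; that ordering, and the name `apply_replacements`, are assumptions for illustration rather than a description of eSpeak's code.

```python
def apply_replacements(word, table):
    """Replace one- or two-character sequences in word using table."""
    out = []
    i = 0
    while i < len(word):
        for length in (2, 1):                 # try the longer match first
            chunk = word[i:i + length]
            if len(chunk) == length and chunk in table:
                out.append(table[chunk])
                i += length
                break
        else:
            out.append(word[i])               # no replacement for this letter
            i += 1
    return "".join(out)


# The example substitutions from the text (\ufb01 is the "fi" ligature)
REPLACE = {"ô": "ő", "û": "ű", "cx": "ĉ", "\ufb01": "fi"}
```

For instance, the Esperanto spelling "cxu" becomes "ĉu" before the pronunciation rules are applied.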

+ 46
- 0
docs/editor.md View File

@@ -0,0 +1,46 @@
ESPEAKEDIT PROGRAM {.western}
------------------

The **espeakedit** program is used to prepare phoneme data for the
eSpeak speech synthesizer.

It has two main functions:

- -

### Installation {.western}

**espeakedit** needs the following packages:\
(The package names mentioned here are those from the Ubuntu "Dapper"
Linux distribution).

- - -

In addition, a modified version of **praat**
([www.praat.org](www.praat.org)) is used to view and analyse WAV sound
files. This needs the package **libmotif3** to run and **libmotif-dev**
to compile.

### Quick Guide {.western}

This will quickly illustrate the main features. Details of the interface
and key commands are given in [editor\_if.html](editor_if.html)

For more detailed information on analysing sound recordings and
preparing phoneme definitions and keyframe data see
[analyse.html](analyse.html) (to be written).

#### Compiling Phoneme Data {.western}

1. 2. 3. 4.

#### Keyframe Sequences {.western}

1. 2. 3. 4. 5. 6. 7.

#### Text and Prosody Windows {.western}

1. 2. 3. 4. 5. 6. 7. 8. 9.

The Prosody window can be used to experiment with different phoneme
lengths and different intonation.

+ 41
- 0
docs/editor_if.md View File

@@ -0,0 +1,41 @@
USER INTERFACE - FORMANT EDITOR {.western}
-------------------------------

### Frame Sequence Display {.western}

The eSpeak editor can display a number of frame sequences in tabbed
windows. Each frame can contain a short-time frequency spectrum,
covering the period of one cycle at the sound's pitch. Frames can also
show:

- - - - -

### Text Tab {.western}

Enter text in the top left text window. Click the **Translate** button
to see the phonetic transcription in the text window below. Then click
the **Speak** button to speak the text and show the results in the
**Prosody** tab, if that is open.

If changes are made in the **Prosody** tab, then clicking **Speak** will
speak the modified prosody while **Translate** will revert to the
default prosody settings for the text.

To enter phonetic symbols (Kirschenbaum encoding) in the top left text
window, enclose them within [[ ]].

### Spect Tab {.western}

The "Spect" tab in the left panel of the eSpeak editor shows information
about the currently selected frame and sequence.

- - - - - -

### Key Commands {.western}

- - - - -

USER INTERFACE - PROSODY EDITOR {.western style="margin-left: 1cm"}
-------------------------------

-

+ 52
- 0
docs/index.md View File

@@ -0,0 +1,52 @@
# eSpeak NG - Documentation

### [Usage](commands.md)

### [Languages](languages.md)

### [Voice Files](voices.md)

Voice files specify a language and other characteristics of a voice.

### [Mbrola Voices](mbrola.md)

eSpeak NG can be used as a front-end for Mbrola diphone voices.

### [Pronunciation Dictionary](dictionary.md)

### [Adding a Language](add_language.md)

How to add or improve a language.

### [Phonemes](phonemes.md)

The list of phoneme mnemonics for English, for use in the Pronunciation
Dictionary.

### [Phoneme Tables](phontab.md)

The tables of the phonemes used by each language, with their properties
and sound production.

### [Intonation](intonation.md)

Different intonation "tunes" may be defined for different languages for
clauses which end in full-stop, comma, question-mark, and
exclamation-mark.

### [eSpeak NG Library API](speak_lib.h)

API definition and header file for a shared library version of eSpeak NG.

### [Markup tags](ssml.md)

SSML (Speech Synthesis Markup Language) and HTML tags recognized by
eSpeak NG.

### [The espeakedit program](editor.md)

GUI software to edit vowel files and to compile the phoneme data for use
by eSpeak NG. See also [Espeakedit user interface](editor_if.md).



+ 102
- 0
docs/intonation.md View File

@@ -0,0 +1,102 @@
INTONATION {.western}
----------

In eSpeak's standard intonation model, a "tune" is applied to each
clause depending on its punctuation. Other intonation models may be used
for some languages, such as tone languages.

Named tunes are defined in the text file:
`phsource/intonation`{.western}. This file must be compiled for use by
eSpeak by using the espeakedit program, using the menu option:
`Compile -> Compile intonation data`{.western}.

### Clauses {.western}

The tunes which are used for a language can be specified by using a
`tunes`{.western} statement in a voice file in
`espeak-data/voices`{.western}. eg:

`tunes   s1  c1  q1  e1`{.western}

Its parameters are four tune names which are used for clauses which end
in:

1. full-stop
2. comma
3. question-mark
4. exclamation-mark
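A `tunes` statement can be read as a mapping from clause ending to tune name. The sketch below assumes the four names are given in the order full-stop, comma, question-mark, exclamation-mark (the order suggested by the tune names `s1 c1 q1 e1`); `parse_tunes` is a hypothetical helper, not part of eSpeak.

```python
def parse_tunes(statement):
    """Map each clause ending to the tune name from a "tunes" line."""
    fields = statement.split()
    if len(fields) != 5 or fields[0] != "tunes":
        raise ValueError("expected: tunes <name> <name> <name> <name>")
    # Assumed order of the four parameters
    endings = ("full-stop", "comma", "question-mark", "exclamation-mark")
    return dict(zip(endings, fields[1:]))
```

Applied to the example line, this gives `s1` for clauses ending in a full-stop and `q1` for questions.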

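A clause consists of the following parts:

- prehead: any unstressed syllables before the first stressed syllable
- head: the stressed syllables before the nucleus, and the unstressed syllables between them
- nucleus: the last stressed syllable of the clause
- tail: any unstressed syllables after the nucleus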

### Tune definitions {.western}

Here is an example tune definition from the file
`phsource/intonation`{.western}.

~~~~ {.western}
tune s1
prehead 46 57
headenv fall 16
head 4 80 55 -8 -5
headextend 0 63 38 13 0
nucleus fall 70 18 24 12
nucleus0 fall 64 8
endtune
~~~~

It contains:

**tune** \<tune name\>
: Starts the definition of a tune. The `tune name`{.western} can
be used in `tunes`{.western} statements in voice files.
**endtune** \<tune name\>
: Ends the definition of a tune.
**prehead** \<start pitch\> \<end pitch\>
: Gives the pitch path for any series of unstressed syllables before
the first stressed syllable.
**headenv** \<envelope\> \<height\>
: Gives the pitch envelope which is used for stressed syllables in the
head (before the nucleus), including `onset`{.western} and
`headlast`{.western} syllables if these are specified.
`height`{.western} gives a pitch range for the envelope.
**head** \<steps\> \<start pitch\> \<end pitch\> \<unstressed start\> \<unstressed end\>
: `start pitch`{.western} and `end pitch`{.western} give a pitch
path for the stressed syllables of the head. `steps`{.western} is
the maximum number of stressed syllables for which this applies. If
there are additional stressed syllables, then the
`headextend`{.western} statement is used for them.
: `unstressed start`{.western} and `unstressed end`{.western} give
a pitch path for unstressed syllables between two stressed
syllables. Their values are relative to the pitch of the previous
stressed syllable. Values are usually negative, meaning that the
unstressed syllables have lower pitch than the previous stressed
syllable.
**headextend** \<percentage list\>
: If the head contains more stressed syllables than is specified by
`steps`{.western}, then `percentage list`{.western} is used. It
contains up to 8 numbers which are used repeatedly for the
additional stressed syllables. A value of 0 corresponds to the lower of
the `start pitch`{.western} and `end pitch`{.western} values of the
`head`{.western} statement. 100 corresponds to the higher value.
Negative values and values greater than 100 are allowed.
**nucleus** \<envelope\> \<top pitch\> \<bottom pitch\> \<tail start\> \<tail end\>
: This gives the pitch envelope and pitch range of the last stressed
syllable of the clause. `tail start`{.western} and
`tail end`{.western} give a pitch path for the unstressed syllables
which are after the last stressed syllable.
**nucleus0** \<envelope\> \<top pitch\> \<bottom pitch\>
: This is used instead of `nucleus`{.western} if there are no
unstressed syllables after the last stressed syllable. In this case,
the pitch changes of the nucleus and the tail are both included in
the nucleus.

The following attributes may also be included:

**onset** \<pitch\> \<unstressed start\> \<unstressed end\>
: This specifies the pitch for the first stressed syllable of the
head. If the `onset`{.western} statement is present, then the
`head`{.western} statement is used for the stressed syllables after the
first.
**headlast** \<pitch\> \<unstressed start\> \<unstressed end\>
: This specifies the pitch for the last stressed syllable of the head
(i.e. the stressed syllable before the nucleus).
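The `headextend` percentages can be turned into pitch values with simple linear scaling between the two `head` pitches. The following Python sketch assumes exactly that interpretation (0 is the lower pitch, 100 the higher, and the list repeats when it runs out); `headextend_pitches` is a hypothetical name and the real synthesizer may round or clamp values differently.

```python
def headextend_pitches(start_pitch, end_pitch, percentages, count):
    """Pitches for `count` additional stressed syllables in the head."""
    low = min(start_pitch, end_pitch)
    high = max(start_pitch, end_pitch)
    pitches = []
    for i in range(count):
        percent = percentages[i % len(percentages)]   # list is used repeatedly
        pitches.append(low + (high - low) * percent / 100)
    return pitches
```

For the `s1` tune above (`head 4 80 55 ...` and `headextend 0 63 38 13 0`), a value of 63 lands at 70.75, between the head's pitches of 55 and 80.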


+ 125
- 0
docs/languages.md View File

@@ -0,0 +1,125 @@
3. LANGUAGES {.western}
------------

**Languages**. The eSpeak speech synthesizer supports several languages;
however, in many cases these are initial drafts and need more work to
improve them. Assistance from native speakers is welcome for these, or
other new languages. Please contact me if you want to help.

eSpeak does text to speech synthesis for the following languages, some
better than others. Afrikaans, Albanian, Armenian, Cantonese, Catalan,
Croatian, Czech, Danish, Dutch, English, Esperanto, Finnish, French,
German, Greek, Hindi, Hungarian, Icelandic, Indonesian, Italian,
Kurdish, Latvian, Lojban, Macedonian, Mandarin, Norwegian, Polish,
Portuguese, Romanian, Russian, Serbian, Slovak, Spanish, Swahili,
Swedish, Tamil, Turkish, Vietnamese, Welsh.


#### Help Needed {.western}

Many of these are just experimental attempts at these languages,
produced after a quick reading of the corresponding article on
wikipedia.org. They will need work or advice from native speakers to
improve them. Please contact me if you want to advise or assist with
these or other languages.

The sound of some phonemes may be poorly implemented, particularly [r]
since I'm English and therefore unable to make a "proper" [r] sound.

A major factor is the rhythm or cadence. An Italian speaker told me the
Italian voice improved from "difficult to understand" to "good" by
changing the relative length of stressed syllables. Identifying
unstressed function words in the xx\_list file is also important to make
the speech flow well. See [Adding or Improving a
Language](add_language.html)

#### Character sets {.western}

Languages recognise text either as UTF-8 or alternatively in an 8-bit
character set which is appropriate for that language. For example, for
Polish this is Latin2, for Russian it is KOI8-R. This choice can be
overridden by a line in the voices file to specify an ISO 8859 character
set, eg. for Russian the line:

~~~~ {.western style="margin-bottom: 0.5cm"}
charset 5
~~~~

will mean that ISO 8859-5 is used as the 8-bit character set rather than
KOI8-R.
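The choice of 8-bit character set matters because the same byte stands for different letters in each encoding. The Python sketch below uses the standard library codecs `koi8_r` and `iso8859_5` to show this; `decode_russian` and its `charset` parameter are hypothetical names that only mimic the effect of the `charset` line, they are not eSpeak functions.

```python
def decode_russian(data, charset=None):
    """Decode 8-bit Russian text.

    charset=None models the default (KOI8-R); charset=5 models a voice
    file containing "charset 5" (ISO 8859-5).
    """
    codec = "koi8_r" if charset is None else f"iso8859_{charset}"
    return data.decode(codec)


raw = b"\xc1"
default_letter = decode_russian(raw)        # Cyrillic small a (U+0430)
iso_letter = decode_russian(raw, 5)         # Cyrillic capital Es (U+0421)
```

Text that was written in one character set and read in the other would therefore be spoken as the wrong letters.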

In the case of a language which uses a non-Latin character set (eg.
Greek or Russian) if the text contains a word with Latin characters then
that particular word will be pronounced using English pronunciation
rules and English phonemes. Speaking entirely English text using a Greek
or Russian voice will sound OK, but each word is spoken separately so it
won't flow properly.

Sample texts in various languages can be found at
[http://\<language\>.wikipedia.org](http://meta.wikimedia.org/wiki/List_of_Wikipedias)
and [www.gutenberg.org](http://www.gutenberg.org/)

### 3.1 Voice Files {.western}

A number of Voice files are provided in the
`espeak-data/voices`{.western} directory. You can select one of these
with the **-v \<voice filename\>** parameter to the speak command, eg:

~~~~ {.western style="margin-bottom: 0.5cm"}
espeak-ng -vaf
~~~~

to speak using the Afrikaans voice.

Language voices generally start with the 2 letter [ISO 639-1
code](http://en.wikipedia.org/wiki/ISO_639-1) for the language. If the
language does not have an ISO 639-1 code, then the 3 letter [ISO 639-3
code](http://www.sil.org/iso639-3/codes.asp) can be used.

For details of the voice files see [Voices](voices.html).

#### Default Voice {.western}

### 3.2 English Voices {.western}

### 3.3 Voice Variants {.western}

To make alternative voices for a language, you can make additional voice
files in espeak-data/voices which contain commands to change various
voice and pronunciation attributes. See [voices.html](voices.html).

Alternatively there are some preset voice variants which can be applied
to any of the language voices, by appending `+`{.western} and a variant
name. Their effects are defined by files in
`espeak-data/voices/!v`{.western}.

The variants are `+m1 +m2 +m3 +m4 +m5 +m6 +m7`{.western} for male
voices, `+f1 +f2 +f3 +f4 +f5`{.western} for female voices, and
`+croak +whisper`{.western} for other effects. For example:

~~~~ {.western style="margin-bottom: 0.5cm"}
espeak-ng -ven+m3
~~~~

The available voice variants can be listed with:

~~~~ {.western style="margin-bottom: 0.5cm"}
espeak-ng --voices=variant
~~~~
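A voice name with a variant is simply the language voice, a `+`, and the variant name. As a small sketch of that convention in Python (the helper `split_voice` is made up for this example):

```python
def split_voice(name):
    """Split a voice name such as "en+m3" into (voice, variant)."""
    base, _, variant = name.partition("+")
    return base, (variant or None)   # variant is None when no "+" is present
```

So `en+m3` selects the `en` voice with the `m3` variant, while `af` has no variant.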

### 3.4 Other Languages {.western}

The eSpeak speech synthesizer does text to speech for the following
additional languages.

### 3.5 Provisional Languages {.western}

These languages are only initial naive implementations which have had
little or no feedback and improvement from native speakers.

### 3.6 Mbrola Voices {.western}

Some additional voices, whose names start with **mb-** (for example
**mb-en1**) use eSpeak as a front-end to Mbrola diphone voices. eSpeak
does the spelling-to-phoneme translation and intonation. See
[mbrola.html](mbrola.html).

+ 128
- 0
docs/mbrola.md View File

@@ -0,0 +1,128 @@
MBROLA VOICES {.western}
-------------

The Mbrola project is a collection of diphone voices for speech
synthesis. They do not include any text-to-phoneme translation, so this
must be done by another program. The Mbrola voices are cost-free but are
not open source. They are available from the Mbrola website at:\

[http://www.tcts.fpms.ac.be/synthesis/mbrola/mbrcopybin.html](http://www.tcts.fpms.ac.be/synthesis/mbrola/mbrcopybin.html)

eSpeak can be used as a front-end to Mbrola. It provides the
spelling-to-phoneme translation and intonation, which Mbrola then uses
to generate speech sound.

### Voice Names {.western}

To use a Mbrola voice, eSpeak needs information to translate from its
own phonemes to the equivalent Mbrola phonemes. This has been set up for
only some voices so far.

The eSpeak voices which use Mbrola are named as:\
  **mb-**xxx

where xxx is the name of a Mbrola voice (eg. **mb-en1** for the Mbrola
"**en1**" English voice). These voice files are in eSpeak's directory
`espeak-data/voices/mbrola`{.western}.

The installation instructions below use the Mbrola voice "en1" as an
example. You can use other mbrola voices for which there is an
equivalent eSpeak voice in `espeak-data/voices/mbrola`{.western}.

There are some additional eSpeak Mbrola voices which speak English text
using a Mbrola voice for a different language. These contain the name of
the Mbrola voice with a suffix **-en**. For example, the voice
**mb-de4-en** will speak English text with a German accent by using the
Mbrola **de4** voice.
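
The naming scheme above can be sketched as a small Python helper. This is an illustration only: `parse_mbrola_voice_name` and `MbrolaVoiceName` are hypothetical names, not part of eSpeak, and the sketch assumes that a trailing **-en** always marks a voice which speaks English text.

```python
from typing import NamedTuple, Optional


class MbrolaVoiceName(NamedTuple):
    mbrola_voice: str     # name of the Mbrola voice, e.g. "en1" or "de4"
    speaks_english: bool  # True when the "-en" suffix is present


def parse_mbrola_voice_name(name: str) -> Optional[MbrolaVoiceName]:
    """Split an eSpeak voice name of the form mb-xxx or mb-xxx-en."""
    prefix, suffix = "mb-", "-en"
    if not name.startswith(prefix):
        return None  # not an eSpeak voice which uses Mbrola
    rest = name[len(prefix):]
    if not rest:
        return None
    if rest.endswith(suffix) and len(rest) > len(suffix):
        return MbrolaVoiceName(rest[:-len(suffix)], True)
    return MbrolaVoiceName(rest, False)


print(parse_mbrola_voice_name("mb-en1"))
print(parse_mbrola_voice_name("mb-de4-en"))
```

The helper only interprets the name; whether a matching voice file exists in `espeak-data/voices/mbrola`{.western} still has to be checked separately.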

### Windows Installation {.western}

The SAPI5 version of eSpeak uses the mbrola.dll.

1. 2. 3. 4.

### Linux Installation {.western}

From eSpeak version 1.44 onwards, eSpeak calls the mbrola program
directly, rather than passing phoneme data to it using a pipe.

1. 2. 3.

### Mbrola Voice Files {.western}

eSpeak's voice files for Mbrola voices are in directory
`espeak-data/voices/mbrola`{.western}. They contain a line:\
  `mbrola <voice> <translation>`{.western} \
eg.\
  `mbrola en1 en1_phtrans`{.western}

- -

They are binary files which are compiled, using espeakedit, from source
files in `phsource/mbrola`{.western}, see below.

### Mbrola Phoneme Translation Data {.western}

Mbrola phoneme translation files specify translations from eSpeak
phoneme names to mbrola phoneme names. They are referenced from voice
files.

The source files are in `phsource/mbrola`{.western}. These are compiled
using the `espeakedit`{.western} program
(`Compile->Compile mbrola phonemes list`{.western}) to produce data
files in `espeak-data/mbrola_ph`{.western} which are used by eSpeak.

Each line in the mbrola phoneme translation file contains:

`<control> <espeak ph1> <espeak ph2> <percent> <mbrola ph1> [<mbrola ph2>] `{.western}

**\<control\>**

- - - -

**\<espeak ph1\>**\
The eSpeak phoneme which is to be translated to an mbrola phoneme.

**\<espeak ph2\>**\
If this field is not `NULL`{.western}, then the match only occurs if
this field matches the next phoneme. If control bit 1 is set, then the
*previous* rather than the *next* phoneme is matched. This field may
also have the following values:\
`VWL`{.western}   matches any Vowel phoneme.

**\<percent\>**\
If this field is zero then only one mbrola phoneme is used. If this
field is non-zero, then two mbrola phonemes are used, and this value
gives the percentage length of the first mbrola phoneme.

**\<mbrola ph1\>**\
The mbrola phoneme to which the eSpeak phoneme is translated. This
field may be `NULL`{.western}.

**\<mbrola ph2\>**\
The second mbrola phoneme. This field is only used if the \<percent\>
field is not zero.

The list is searched from start to finish, until a match is found.
Therefore, a line with a more specific match condition should appear
before a line which matches the same eSpeak phoneme but with a more
general condition.
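
The first-match search can be sketched in Python as follows. This is not eSpeak's implementation: the `Rule` type, the function names and the example rules are hypothetical, and the sketch assumes that "control bit 1" means the bit with value 2. Only the control bit described above is modelled, and a `NULL` value in \<mbrola ph1\> is passed through unchanged.

```python
from typing import List, NamedTuple, Optional, Set, Tuple

# Assumption: "control bit 1" is the bit with value 2 (bits counted from 0).
MATCH_PREVIOUS = 0x2


class Rule(NamedTuple):
    control: int
    espeak_ph1: str
    espeak_ph2: str                   # "NULL", "VWL", or a phoneme name
    percent: int
    mbrola_ph1: str                   # may be "NULL"
    mbrola_ph2: Optional[str] = None  # only used when percent is non-zero


def context_matches(rule: Rule, prev_ph: Optional[str],
                    next_ph: Optional[str], vowels: Set[str]) -> bool:
    """Check the <espeak ph2> field against the neighbouring phoneme."""
    if rule.espeak_ph2 == "NULL":
        return True
    neighbour = prev_ph if rule.control & MATCH_PREVIOUS else next_ph
    if rule.espeak_ph2 == "VWL":
        return neighbour in vowels
    return neighbour == rule.espeak_ph2


def translate(rules: List[Rule], ph: str, prev_ph: Optional[str],
              next_ph: Optional[str],
              vowels: Set[str]) -> List[Tuple[Optional[str], int]]:
    """Return (mbrola phoneme, percentage of length) pairs for one phoneme."""
    for rule in rules:  # searched from start to finish; first match wins
        if rule.espeak_ph1 != ph:
            continue
        if not context_matches(rule, prev_ph, next_ph, vowels):
            continue
        if rule.percent == 0:
            return [(rule.mbrola_ph1, 100)]
        return [(rule.mbrola_ph1, rule.percent),
                (rule.mbrola_ph2, 100 - rule.percent)]
    return []  # no rule matched


# Hypothetical rules: the specific line comes before the general one.
RULES = [
    Rule(0, "t", "VWL", 0, "t_h"),
    Rule(0, "t", "NULL", 0, "t"),
    Rule(0, "aI", "NULL", 40, "a", "I"),
]
VOWELS = {"a", "aI"}

print(translate(RULES, "t", None, "a", VOWELS))
print(translate(RULES, "t", "a", None, VOWELS))
print(translate(RULES, "aI", "t", None, VOWELS))
```

If the two `t` rules were swapped, the general rule would always match first and the `VWL` rule would never be reached, which is why the ordering matters.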

The file `dictsource/dict_phonemes`{.western} lists the eSpeak phonemes
which are used for each language. Translations for all these should be
given in the mbrola phoneme translation file. In addition, some phonemes
which are referenced from phoneme files (eg.
`phsource/ph_language, phsource/phonemes`{.western}) in lines such as:

~~~~ {.western}
beforenotvowel l/
reduceto a# 0
~~~~

should also be included, even though they don't appear in
`dictsource/dict_phonemes`{.western}.

If the language's \*\_list or \*\_rules files include rules to speak
words "as English" the mbrola phoneme translation file should include
rules which translate English phonemes into near equivalents, so that
they can be spoken by the mbrola voice.

+ 283
- 0
docs/phonemes.md View File

@@ -0,0 +1,283 @@
PHONEMES {.western}
--------

In general a different set of phonemes can be defined for each language.

In most cases different languages inherit the same basic set of
consonants. They can add to these or modify them as needed.

The phoneme mnemonics are based on the scheme by Kirshenbaum which
represents International Phonetic Alphabet symbols using ASCII
characters. See:
[www.kirshenbaum.net/IPA/ascii-ipa.pdf](http://www.kirshenbaum.net/IPA/ascii-ipa.pdf).

Phoneme mnemonics can be used directly in the text input to
**espeak-ng**. They are enclosed within double square brackets. Spaces
are used to separate words, and all stressed syllables must be marked
explicitly. eg:\
`[[D,Is Iz sVm f@n'EtIk t'Ekst 'InpUt]]`{.western}
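
As a rough illustration of the double-square-bracket convention, the following Python sketch separates ordinary text from embedded phoneme strings. It is not how espeak-ng itself parses input; `split_phoneme_input` is a hypothetical helper and the regular expression is an assumption about what a bracketed span looks like.

```python
import re
from typing import List, Tuple

# Non-greedy match of everything between "[[" and the next "]]".
PHONEME_SPAN = re.compile(r"\[\[(.*?)\]\]")


def split_phoneme_input(text: str) -> List[Tuple[str, str]]:
    """Split input into ("text", ...) and ("phonemes", ...) segments."""
    segments = []
    pos = 0
    for match in PHONEME_SPAN.finditer(text):
        if match.start() > pos:
            segments.append(("text", text[pos:match.start()]))
        segments.append(("phonemes", match.group(1)))
        pos = match.end()
    if pos < len(text):
        segments.append(("text", text[pos:]))
    return segments


print(split_phoneme_input("This is [[f@n'EtIk]] input"))
```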

### English Consonants {.western}

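+--------------------+------------------------------------------+
| `[p]`{.western}    |                                          |
+--------------------+------------------------------------------+
| `[b]`{.western}    |                                          |
+--------------------+------------------------------------------+
| `[t]`{.western}    |                                          |
+--------------------+------------------------------------------+
| `[d]`{.western}    |                                          |
+--------------------+------------------------------------------+
| `[tS]`{.western}   | **ch**urch                               |
+--------------------+------------------------------------------+
| `[dZ]`{.western}   | **j**udge                                |
+--------------------+------------------------------------------+
| `[k]`{.western}    |                                          |
+--------------------+------------------------------------------+
| `[g]`{.western}    |                                          |
+--------------------+------------------------------------------+
| `[f]`{.western}    |                                          |
+--------------------+------------------------------------------+
| `[v]`{.western}    |                                          |
+--------------------+------------------------------------------+
| `[T]`{.western}    | **th**in                                 |
+--------------------+------------------------------------------+
| `[D]`{.western}    | **th**is                                 |
+--------------------+------------------------------------------+
| `[s]`{.western}    |                                          |
+--------------------+------------------------------------------+
| `[z]`{.western}    |                                          |
+--------------------+------------------------------------------+
| `[S]`{.western}    | **sh**op                                 |
+--------------------+------------------------------------------+
| `[Z]`{.western}    | plea**s**ure                             |
+--------------------+------------------------------------------+
| `[h]`{.western}    |                                          |
+--------------------+------------------------------------------+
| `[m]`{.western}    |                                          |
+--------------------+------------------------------------------+
| `[n]`{.western}    |                                          |
+--------------------+------------------------------------------+
| `[N]`{.western}    | si**ng**                                 |
+--------------------+------------------------------------------+
| `[l]`{.western}    |                                          |
+--------------------+------------------------------------------+
| `[r]`{.western}    | **r**ed (Omitted if not immediately      |
|                    | followed by a vowel).                    |
+--------------------+------------------------------------------+
| `[j]`{.western}    | **y**es                                  |
+--------------------+------------------------------------------+
| `[w]`{.western}    |                                          |
+--------------------+------------------------------------------+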

**Some Additional Consonants**

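+--------------------+------------------------------------------+
| `[C]`{.western}    | German i**ch**                           |
+--------------------+------------------------------------------+
| `[x]`{.western}    | German bu**ch**                          |
+--------------------+------------------------------------------+
| `[l^]`{.western}   | Italian **gl**i                          |
+--------------------+------------------------------------------+
| `[n^]`{.western}   | Spanish **ñ**                            |
+--------------------+------------------------------------------+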

### English Vowels {.western}

These are the phonemes which are used by the English spelling-to-phoneme
translations (en\_rules and en\_list). In some varieties of English
different phonemes may have the same sound, but they are kept separate
because they may differ in another variety.

In rhotic accents, such as General American, the phonemes
`[3:], [A@], [e@], [i@], [O@], [U@]`{.western} include the "r" sound.

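+--------------------+----------------+--------------------------------------------------+
| `[@]`{.western}    | alph**a**      | schwa                                            |
+--------------------+----------------+--------------------------------------------------+
| `[3]`{.western}    | bett**er**     | rhotic schwa. In British English this is the     |
|                    |                | same as `[@]`{.western}, but it includes 'r'     |
|                    |                | colouring in American and other rhotic accents.  |
|                    |                | In these cases a separate `[r]`{.western} should |
|                    |                | not be included unless it is followed            |
|                    |                | immediately by another vowel.                    |
+--------------------+----------------+--------------------------------------------------+
| `[3:]`{.western}   | n**ur**se      |                                                  |
+--------------------+----------------+--------------------------------------------------+
| `[@L]`{.western}   | simp**le**     |                                                  |
+--------------------+----------------+--------------------------------------------------+
| `[@2]`{.western}   | the            | Used only for "the".                             |
+--------------------+----------------+--------------------------------------------------+
| `[@5]`{.western}   | to             | Used only for "to".                              |
+--------------------+----------------+--------------------------------------------------+
| `[a]`{.western}    | tr**a**p       |                                                  |
+--------------------+----------------+--------------------------------------------------+
| `[aa]`{.western}   | b**a**th       | This is `[a]`{.western} in some accents,         |
|                    |                | `[A:]`{.western} in others.                      |
+--------------------+----------------+--------------------------------------------------+
| `[a#]`{.western}   | **a**bout      | This may be `[@]`{.western} or may be a more     |
|                    |                | open schwa.                                      |
+--------------------+----------------+--------------------------------------------------+
| `[A:]`{.western}   | p**al**m       |                                                  |
+--------------------+----------------+--------------------------------------------------+
| `[A@]`{.western}   | st**ar**t      |                                                  |
+--------------------+----------------+--------------------------------------------------+
| `[E]`{.western}    | dr**e**ss      |                                                  |
+--------------------+----------------+--------------------------------------------------+
| `[e@]`{.western}   | squ**are**     |                                                  |
+--------------------+----------------+--------------------------------------------------+
| `[I]`{.western}    | k**i**t        |                                                  |
+--------------------+----------------+--------------------------------------------------+
| `[I2]`{.western}   | **i**ntend     | As `[I]`{.western}, but also indicates an        |
|                    |                | unstressed syllable.                             |
+--------------------+----------------+--------------------------------------------------+
| `[i]`{.western}    | happ**y**      | An unstressed "i" sound at the end of a word.    |
+--------------------+----------------+--------------------------------------------------+
| `[i:]`{.western}   | fl**ee**ce     |                                                  |
+--------------------+----------------+--------------------------------------------------+
| `[i@]`{.western}   | n**ear**       |                                                  |
+--------------------+----------------+--------------------------------------------------+
| `[0]`{.western}    | l**o**t        |                                                  |
+--------------------+----------------+--------------------------------------------------+
| `[V]`{.western}    | str**u**t      |                                                  |
+--------------------+----------------+--------------------------------------------------+
| `[u:]`{.western}   | g**oo**se      |                                                  |
+--------------------+----------------+--------------------------------------------------+
| `[U]`{.western}    | f**oo**t       |                                                  |
+--------------------+----------------+--------------------------------------------------+
| `[U@]`{.western}   | c**ure**       |                                                  |
+--------------------+----------------+--------------------------------------------------+
| `[O:]`{.western}   | th**ou**ght    |                                                  |
+--------------------+----------------+--------------------------------------------------+
| `[O@]`{.western}   | n**or**th      |                                                  |
+--------------------+----------------+--------------------------------------------------+
| `[o@]`{.western}   | f**or**ce      |                                                  |
+--------------------+----------------+--------------------------------------------------+
| `[aI]`{.western}   | pr**i**ce      |                                                  |
+--------------------+----------------+--------------------------------------------------+
| `[eI]`{.western}   | f**a**ce       |                                                  |
+--------------------+----------------+--------------------------------------------------+
| `[OI]`{.western}   | ch**oi**ce     |                                                  |
+--------------------+----------------+--------------------------------------------------+
| `[aU]`{.western}   | m**ou**th      |                                                  |
+--------------------+----------------+--------------------------------------------------+
| `[oU]`{.western}   | g**oa**t       |                                                  |
+--------------------+----------------+--------------------------------------------------+
| `[aI@]`{.western}  | sc**ie**nce    |                                                  |
+--------------------+----------------+--------------------------------------------------+
| `[aU@]`{.western}  | h**our**       |                                                  |
+--------------------+----------------+--------------------------------------------------+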

### Some Additional Vowels {.western}

Other languages will have their own vowel definitions, eg:

+--------------------------------------+--------------------------------------+
| `[e]`{.western}                      | German **eh**, French **é**          |
+--------------------------------------+--------------------------------------+
| `[o]`{.western}                      | German **oo**, French **o**          |
+--------------------------------------+--------------------------------------+
| `[y]`{.western}                      | German **ü**, French **u**           |
+--------------------------------------+--------------------------------------+
| `[Y]`{.western}                      | German **ö**, French **oe**          |
+--------------------------------------+--------------------------------------+

`[:]`{.western} can be used to lengthen a vowel, eg `[e:]`{.western}

+ 174
- 0
docs/phontab.md View File

@@ -0,0 +1,174 @@
PHONEME TABLES {.western}
--------------

A phoneme table defines all the phonemes which are used by a language,
together with their properties and the data for their production as
sounds.

Generally each language has its own phoneme table, although additional
phoneme tables can be used for different voices within the language.
These alternatives are referenced from Voice files.

A phoneme table does not need to define all the phonemes used by a
language. It can inherit the phonemes from a previously defined phoneme
table. For example, a phoneme table may redefine (or add) some of the
vowels that it uses, but inherit most of its consonants from a standard
set.

The source files for the phoneme data are in the "phsource" directory in
the espeakedit download package. "Vowel files", which are referenced in
FMT(), VowelStart(), and VowelEnding() instructions are made using the
espeakedit program.

### Phoneme files {.western}

The phoneme tables are defined in a master phoneme file, named
**phonemes**. This starts with the **base** phoneme table followed by
phoneme tables for other languages and voices. These inherit phonemes
from the **base** table or previously defined tables.

In addition to phoneme definitions, the phoneme file can contain the
following:

**include** \<filename\>
: Includes the text of the specified file at this point. This allows
different phoneme tables to be kept in different text files, for
convenience. \<filename\> is a relative path. The included file can
itself contain **include** statements.
**phonemetable** \<name\> \<parent\>
: Starts a new phoneme table, and ends the previous table.\
\<name\> Is the name of this phoneme table. This name is used in
Voice files.\
\<parent\> Is the name of a previously defined phoneme table whose
phoneme definitions are inherited by this one. The name **base**
indicates the first (base) phoneme table.
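
The inheritance between phoneme tables can be pictured with a short Python sketch. The class and method names are hypothetical and the definitions are plain strings; the real data is compiled by espeakedit. The sketch only assumes what is stated above: a table looks in its own definitions first and then in its parent chain.

```python
from typing import Dict, Optional


class PhonemeTable:
    """A named phoneme table which inherits from an optional parent table."""

    def __init__(self, name: str, parent: Optional["PhonemeTable"] = None):
        self.name = name
        self.parent = parent
        self.phonemes: Dict[str, str] = {}

    def define(self, phoneme: str, definition: str) -> None:
        """Add or redefine a phoneme in this table only."""
        self.phonemes[phoneme] = definition

    def lookup(self, phoneme: str) -> Optional[str]:
        """Search this table, then each parent in turn."""
        table: Optional["PhonemeTable"] = self
        while table is not None:
            if phoneme in table.phonemes:
                return table.phonemes[phoneme]
            table = table.parent
        return None


base = PhonemeTable("base")
base.define("t", "base definition of t")
base.define("aI", "base definition of aI")

en = PhonemeTable("en", parent=base)   # like: phonemetable en base
en.define("aI", "English definition of aI")

print(en.lookup("t"))    # inherited from base
print(en.lookup("aI"))   # redefined in en
```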

### Phoneme definitions {.western}

Note: These new Phoneme definitions apply to eSpeak version 1.42.20 and
later.

A phoneme table contains a list of phoneme definitions. Each starts with
the keyword **phoneme** and the phoneme name (this is the name used in
the pronunciation rules in a language's \*\_rules and \*\_list files),
and ends with the keyword **endphoneme**. For example:

~~~~ {.western}
phoneme aI
vowel
starttype #a endtype #i
length 230
FMT(vowels/ai)
endphoneme

phoneme s
vls alv frc sibilant
voicingswitch z
lengthmod 3
Vowelin f1=0 f2=1700 -300 300 f3=-100 80
Vowelout f1=0 f2=1700 -300 250 f3=-100 80 rms=20

IF nextPh(isPause) THEN
WAV(ufric/s_)
ELIF nextPh(p) OR nextPh(t) OR nextPh(k) THEN
WAV(ufric/s!)
ENDIF
WAV(ufric/s)
endphoneme
~~~~

A phoneme definition contains both static properties and executed
instructions. The instructions may contain conditional statements, so
that the effect of the phoneme may be different depending on adjacent
phonemes, whether the syllable is stressed, etc.

The instructions of a phoneme are interpreted in two different phases.
In the first phase, the instructions may change the phoneme and replace
it by a different phoneme. In the second phase, instructions are used to
produce the sound for the phoneme.

The **import\_phoneme** statement can be used to copy a previously
defined phoneme from a specified phoneme table. For example:

~~~~ {.western}
phoneme t
import_phoneme base/t[
endphoneme
~~~~

means: `phoneme t`{.western} in this phoneme table is a copy of
`phoneme t[`{.western} from phoneme table "base". A **length**
instruction can be used after **import\_phoneme** to vary the length
from the original.

### Phoneme Properties {.western}

Within the phoneme definition the following lines may occur ((V)
indicates vowels only, (C) consonants only):

### Phoneme Instructions {.western}

Phoneme Instructions may be included within conditional statements.

During the first phase of phoneme interpretation, an instruction which
causes a change to a different phoneme will terminate the instructions.
During the second phase, FMT() and WAV() instructions will terminate the
instructions.

### Conditional Statements {.western}

Phoneme definitions can contain conditional statements such as:

~~~~ {.western}
IF <condition> THEN
<statements>
ENDIF
~~~~

or more generally:

~~~~ {.western}
IF <condition> THEN
<statements>
ELIF <condition> THEN
<statements>
...
ELSE
<statements>
ENDIF
~~~~

where the `ELSE`{.western} and multiple `ELIF`{.western} parts are
optional.

Multiple conditions may be joined with `AND`{.western} or
`OR`{.western}, but not a mixture of `AND`{.western}s and
`OR`{.western}s.

A condition may be preceded by `NOT`{.western}. For example:

~~~~ {.western}
IF <condition> AND NOT <condition> THEN
<statements>
ENDIF
~~~~
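
The rules for combining conditions can be illustrated with a small Python evaluator. It is a sketch only: `evaluate_condition` is a hypothetical function, each condition is looked up in a dictionary of already-known results, and `NOT` is assumed to apply only to the single condition which follows it.

```python
from typing import Dict, List


def evaluate_condition(tokens: List[str], known: Dict[str, bool]) -> bool:
    """Evaluate e.g. ["nextPh(p)", "OR", "NOT", "nextPh(t)"]."""
    operators = {tok for tok in tokens if tok in ("AND", "OR")}
    if len(operators) > 1:
        raise ValueError("AND and OR cannot be mixed in one condition")

    results = []
    negate = False
    for tok in tokens:
        if tok in ("AND", "OR"):
            continue
        if tok == "NOT":
            negate = True   # applies to the next condition only
            continue
        value = known[tok]
        results.append(not value if negate else value)
        negate = False

    if not results:
        raise ValueError("no condition given")
    return all(results) if operators == {"AND"} else any(results)


known = {"nextPh(isPause)": False, "nextPh(p)": True}
print(evaluate_condition(["nextPh(p)", "AND", "NOT", "nextPh(isPause)"], known))
```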

**Condition** Can be:

**Attributes**

### Sound Specifications {.western}

There are three ways to produce sounds:

- - -

### Vowel Transitions {.western}

These specify how a consonant affects an adjacent vowel. A consonant may
cause a transition in the vowel's formants as the mouth changes shape
between the consonant and the vowel. The following attributes may be
specified. Note that the maximum rate of change of formant frequencies
is limited by the speak program.



+ 64
- 0
docs/ssml.md View File

@@ -0,0 +1,64 @@
TEXT MARKUP {.western}
-----------

### SSML: Speech Synthesis Markup Language {.western}

The following markup tags and attributes are recognised:

**\<speak\>**

- -

**\<voice\>**

- - - - -

**\<prosody\>**

- - - -

**\<say-as\>**

- - - - -

**\<mark\>** name

**\<s\>**

-

**\<p\>**

-

**\<sub\>** alias

**\<tts:style\>**

- -

**\<audio\>** src

**\<emphasis\>**

-

**\<break\>**

- -

### HTML {.western}

eSpeak can speak HTML text directly, or text containing both SSML and
HTML markup.\
Any unrecognised tags are ignored.

The following tags cause a sentence break.\
**\<br\>   \<dd\>   \<li\>   \<img\>   \<td\>  **

The following tags cause a paragraph break.\
**\<h1\>   \<h2\>   \<h3\>   \<h4\>   \<hr\>  **

Text between the following tags is ignored.\
**\<script\>   ...   \</script\>  \
\<style\>   ...   \</style\>  **
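
The HTML tag handling listed above amounts to a small lookup, sketched here in Python. The function name and the returned labels are hypothetical, and matching tag names case-insensitively is an assumption rather than something the text states.

```python
SENTENCE_BREAK_TAGS = {"br", "dd", "li", "img", "td"}
PARAGRAPH_BREAK_TAGS = {"h1", "h2", "h3", "h4", "hr"}
SKIPPED_CONTENT_TAGS = {"script", "style"}


def classify_html_tag(tag: str) -> str:
    """Return how a tag such as "<br>" or "</script>" is treated."""
    parts = tag.strip("<>/ ").split()
    name = parts[0].lower() if parts else ""
    if name in SENTENCE_BREAK_TAGS:
        return "sentence-break"
    if name in PARAGRAPH_BREAK_TAGS:
        return "paragraph-break"
    if name in SKIPPED_CONTENT_TAGS:
        return "skip-content"
    return "no-break"  # other tags cause no break in this sketch


for example in ("<br>", "<h2>", "<script>", "<span class='x'>"):
    print(example, classify_html_tag(example))
```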

+ 311
- 0
docs/voices.md View File

@@ -0,0 +1,311 @@
5. VOICES {.western}
---------

### 5.1 Voice Files {.western}

A Voice file specifies a language (and possibly a language variant or
dialect) together with various attributes that affect the
characteristics of the voice quality and how the language is spoken.

Voice files are placed in the `espeak-data/voices`{.western} directory,
or within subdirectories in there.

The available voice files can be listed by:

~~~~ {.western}
espeak-ng --voices
or
espeak-ng --voices=<language>
~~~~

also

~~~~ {.western style="margin-bottom: 0.5cm"}
espeak-ng --voices=variant
~~~~

Lists voice variants which can be applied to eSpeak voices.

~~~~ {.western style="margin-bottom: 0.5cm"}
espeak-ng --voices=mbrola
~~~~

Lists the Mbrola voices.

### 5.2 Contents of Voice Files {.western}

The **language** attribute is mandatory. All the other attributes are
optional.

#### Identification Attributes {.western}

**name  \<name\>**

A name given to this voice.

**language  \<language code\> [\<priority\>]**

This attribute should appear before the other attributes which are
listed below.

It selects the default behaviour and characteristics for the language,
and sets default values for "phonemes", "dictionary" and other
attributes. The \<language code\> should be a two-letter ISO 639-1
language code. One or more language variant codes may be appended,
separated by hyphens. (eg. en-uk-north).

The optional \<priority\> value gives the preference of this voice
compared with others for the specified language. A low value indicates a
more preferred voice. The default value is 5.

More than one **language** line may be present. A voice may be selected
for other related languages (variants which have the same initial 2
letter language code as the specified language), but it will be less
preferred for these. Different language variants may be specified by
additional **language** lines in order to indicate that this is a
preferred voice for them also. Eg.

~~~~ {.western}
language en-uk-north
language en
~~~~

indicates that this voice is for the "en-uk-north" dialect, but it is
also a main choice when a general "en" language is specified. Without
the second **language** line, it would be disfavoured for "en" for being
a more specialised voice.
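
The effect of **language** lines and priorities on voice selection can be sketched in Python. This is not eSpeak's actual scoring: the `Voice` type and both functions are hypothetical, and the sketch assumes that an exact language match always beats a related variant, with the priority value (lower is better) deciding between voices of the same kind.

```python
from typing import List, NamedTuple, Optional, Tuple

DEFAULT_PRIORITY = 5  # default <priority> stated above


class Voice(NamedTuple):
    name: str
    languages: List[Tuple[str, int]]  # (language code, priority) per line


def rank(voice: Voice, requested: str) -> Optional[Tuple[int, int]]:
    """Smaller is better; None means the voice does not apply at all."""
    best = None
    for code, priority in voice.languages:
        if code == requested:
            key = (0, priority)        # exact match
        elif code[:2] == requested[:2]:
            key = (1, priority)        # related variant, less preferred
        else:
            continue
        if best is None or key < best:
            best = key
    return best


def select_voice(voices: List[Voice], requested: str) -> Optional[Voice]:
    ranked = [(rank(v, requested), v) for v in voices]
    ranked = [(key, v) for key, v in ranked if key is not None]
    if not ranked:
        return None
    return min(ranked, key=lambda item: item[0])[1]


general = Voice("english", [("en", 2)])
northern = Voice("northern", [("en-uk-north", DEFAULT_PRIORITY)])
print(select_voice([northern, general], "en").name)
print(select_voice([northern, general], "en-uk-north").name)
```

Adding a second entry such as `("en", 1)` to the northern voice would make it the preferred choice for a plain "en" request as well, mirroring the two-line example above.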

**gender  \<gender\> [\<age\>]**

This attribute is only a label for use in voice selection. It doesn't
change the sound of the voice.

\<gender\> may be male, female, or unknown.\
\<age\> is optional and gives an age in years.

**pitch  \<base\> \<range\>**

Two integer values. The first gives a base pitch to the voice (value in
Hz). The second controls the range of pitches used by the voice. Setting
it equal to the base pitch will give a monotone. The default values are
82 118.

**formant  \<number\> \<frequency\> \<strength\> \<width\>
\<freq\_add\>**

Systematically adjusts the frequency, strength, and width of the
resonance peaks of the voice. Values are percentages of the default
values. Changing these affects the tone/quality of the voice.

**freq\_add** Adds a constant value (in Hz) to the frequency of the
formant peak. The value may be negative.

- - - -

**echo  \<delay\> \<amplitude\>**

Parameter 1 gives the delay in ms (0 to 250 ms).\
Parameter 2 gives the echo amplitude (0 to 100).\
Adding some echo can give a clearer or more interesting sound,
especially when listening through a domestic stereo sound system, rather
than small computer speakers.

**tone**

Controls the tone of the sound.\
**tone** is followed by up to 4 pairs of \<frequency\> \<amplitude\>
which define a frequency response graph. Frequency is in Hz and
amplitude is in the range 0 to 255. The default is:

`tone 600 170  1200 135  2000 110`{.western}

This means that from frequency 0Hz to 600Hz the amplitude is 170. From
600Hz to 1200Hz the amplitude decreases from 170 to 135, then decreases
to 110 at 2000Hz and remains at 110 at higher frequencies. This
adjustment applies only to voiced sounds such as vowels and sonorant
consonants (such as [n] and [l]). Unvoiced sounds such as [s] are
unaffected.
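
The frequency response described by a **tone** line can be sketched as a piecewise function in Python. The function name is hypothetical, and the straight-line interpolation between the listed points is an assumption; the text only says that the amplitude decreases from one point to the next.

```python
from typing import List, Tuple

# The default tone line: tone 600 170  1200 135  2000 110
DEFAULT_TONE: List[Tuple[int, int]] = [(600, 170), (1200, 135), (2000, 110)]


def tone_amplitude(freq_hz: float,
                   points: List[Tuple[int, int]] = DEFAULT_TONE) -> float:
    """Amplitude (0 to 255) applied to voiced sounds at a given frequency."""
    first_freq, first_amp = points[0]
    if freq_hz <= first_freq:
        return float(first_amp)          # flat from 0 Hz to the first point
    for (f0, a0), (f1, a1) in zip(points, points[1:]):
        if freq_hz <= f1:
            # assumed linear change between neighbouring points
            return a0 + (a1 - a0) * (freq_hz - f0) / (f1 - f0)
    return float(points[-1][1])          # flat above the last point


for freq in (0, 600, 900, 1200, 2000, 5000):
    print(freq, tone_amplitude(freq))
```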

This **tone** statement can also appear in
`espeak-data/config`{.western}, in which case it applies to all voices
which don't have their own **tone** statement.

**flutter  \<value\>**

Default value: 2.\
Adds pitch fluctuations to give a wavering or older-sounding voice. A
large value (eg. 20) makes the voice sound "croaky".

**roughness  \<value\>**

Default value: 2. Range 0 - 7\
Reduces the amplitude of alternate waveform cycles in order to make the
voice sound creaky.

**voicing  \<value\>**

Default value: 100.\
Adjusts the strength of formant-synthesized sounds (vowels and sonorant
consonants).

**consonants  \<value\> \<value\>**

Default values: 100, 100.\
Adjusts the strength of noise sounds which are used in consonants. The
first value is the strength of unvoiced consonants such as "s" and "t".
The second value is the strength of the noise component of voiced
consonants such as "z" and "d".

**breath  \<up to 8 integer values\>**

Default values: 0.\
Adds noise which corresponds to the formant frequency peaks. The values
give the strength of noise for each formant peak (formants 1 to 8).

Use together with a low or zero value of the **voicing** attribute to
make a "whisper". For example:

`breath   75 75 60 40 15 10`{.western}\
`breathw  150 150 200 200 400 400`{.western}\
`voicing  18`{.western}\
`flutter  20`{.western}\
`formant   0 100 0 100   // remove formant 0`{.western}

**breathw  \<up to 8 integer values\>**

These values give bandwidths of the noise peaks of the **breath**
attribute. If **breathw** values are not given, then suitable default
values will be used.

**speed  \<value\>**

Default value 100.\
Adjusts the speaking speed by a percentage of the default rate. This
can be used if a language voice seems faster or slower compared to other
voices.

**phonemes  \<name\>**

Specifies which set of phonemes to use from those contained in the
phontab, phonindex, and phondata data files. This is a **phonemetable**
name as given in the "phonemes" source file.

This parameter is usually not needed as it is set by default to the
first two letters of the "language" parameter. However, different voices
of the same language can use different phoneme sets, to give different
accents.

**dictionary  \<name\>**

Specifies which pair of dictionary files to use. eg. "english" indicates
that *espeak-data/en\_dict* should be used to translate from words to
phonemes. This parameter is usually not needed as it is set by default
to the first two letters of the "language" parameter.
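
The default naming rule for **phonemes** and **dictionary** can be written as a one-line derivation, shown here as a Python sketch. `default_data_names` is a hypothetical helper, and the `espeak-data/<name>_dict` path pattern is taken from the file names mentioned in this document rather than from the program code.

```python
from typing import Dict


def default_data_names(language: str) -> Dict[str, str]:
    """Derive the default phoneme table and dictionary names for a voice."""
    prefix = language[:2]  # first two letters of the language attribute
    return {
        "phonemes": prefix,
        "dictionary": prefix,
        "dict_file": f"espeak-data/{prefix}_dict",
    }


print(default_data_names("en-uk-north"))
```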

**dictrules  \<list of rule numbers\>**

Gives a list of conditional dictionary rules which are applied for this
voice. Rule numbers are in the range 0 to 31 and are specific to a
language dictionary. They apply to rules in the language's **\_rules**
dictionary file and also its **\_list** exceptions list. See
[dictionary.html](dictionary.html).

**replace  \<flags\> \<phoneme\> \<replacement phoneme\>**

Replace a phoneme by another whenever it occurs.

\<replacement phoneme\> may be NULL.

Flags: bit 0: replacement only occurs on the final phoneme of a word.\
Flags: bit 1: replacement doesn't occur in stressed syllables.\
eg.

~~~~ {.western}
replace 0 h NULL // drops h's
replace 0 V U // replaces vowel in 'strut' by that in 'foot'
// as occurs in northern British English
replace 3 N n // change 'fishing' to 'fishin' etc.
// (only the last phoneme of a word, only in unstressed syllables)
~~~~
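
How the two flag bits restrict a replacement can be sketched in Python. The types and the function are hypothetical, each phoneme simply carries a flag saying whether it belongs to a stressed syllable, and the sketch assumes that the first matching **replace** line is the one applied.

```python
from typing import List, NamedTuple

FINAL_ONLY = 0x1       # bit 0: only the final phoneme of a word
UNSTRESSED_ONLY = 0x2  # bit 1: not in stressed syllables


class Phoneme(NamedTuple):
    name: str
    stressed: bool = False  # True if it belongs to a stressed syllable


class Replace(NamedTuple):
    flags: int
    phoneme: str
    replacement: str        # "NULL" removes the phoneme


def apply_replacements(word: List[Phoneme],
                       rules: List[Replace]) -> List[Phoneme]:
    result = []
    last = len(word) - 1
    for index, ph in enumerate(word):
        new_name = ph.name
        for rule in rules:
            if rule.phoneme != ph.name:
                continue
            if rule.flags & FINAL_ONLY and index != last:
                continue
            if rule.flags & UNSTRESSED_ONLY and ph.stressed:
                continue
            new_name = rule.replacement
            break
        if new_name != "NULL":
            result.append(Phoneme(new_name, ph.stressed))
    return result


RULES = [Replace(0, "h", "NULL"), Replace(0, "V", "U"), Replace(3, "N", "n")]
fishing = [Phoneme("f", True), Phoneme("I", True),
           Phoneme("S"), Phoneme("I2"), Phoneme("N")]
print([ph.name for ph in apply_replacements(fishing, RULES)])
```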

The phoneme mnemonics can be defined for each language, but some are
listed in [phonemes.html](phonemes.html).

**stressLength  \<8 integer values\>**

Eight integer parameters. These control the relative lengths of the
vowels in stressed and unstressed syllables.

- - - - - - - -

**stressAdd  \<8 integer values\>**

Eight integer parameters. These are added to the voice's corresponding
stressLength values. They are used in the voice variant files in
`espeak-data/voices/!v`{.western} to give some variety. Negative values
may be used.

**stressAmp  \<8 integer values\>**

Eight integer parameters. These control the relative amplitudes of the
vowels in stressed and unstressed syllables (see stressLength above).
The general default values are: 16, 16, 20, 20, 20, 24, 24, 22, although
these defaults may be different for particular languages.

**intonation  \<param1\>**

- - - -

**charset  \<param1\>**

The ISO 8859 character set number. (not all are implemented).

**dictmin  \<value\>**

Used for some languages to detect if additional language data is
installed. If the size of the compiled dictionary data for the language
(the file `espeak-data/*_dict`{.western}) is less than this size then a
warning is given.
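
The **dictmin** check can be pictured as a simple file-size comparison. This Python sketch is an illustration only: `check_dictionary_size` is a hypothetical function, and it assumes that the value is a size in bytes.

```python
import os
import warnings


def check_dictionary_size(dict_path: str, dictmin: int) -> bool:
    """Warn if the compiled dictionary is smaller than dictmin (bytes assumed)."""
    size = os.path.getsize(dict_path)
    if size < dictmin:
        warnings.warn(
            f"{dict_path} is {size} bytes, less than dictmin {dictmin}: "
            "additional language data may not be installed"
        )
        return False
    return True
```

A caller would pass the path of the language's compiled dictionary, for example a file matching `espeak-data/*_dict`{.western}, together with the voice's **dictmin** value.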

**alphabet2  \<alphabet\> \<language\>**

Used to specify a language to be used to speak words which are written
in a non-native alphabet. eg:

~~~~ {.western style="margin-bottom: 0.5cm"}
alphabet2 cyr ru
~~~~

Alphabet names include: latin, cyr (cyrillic), ar (arabic). The default
language for the latin alphabet is English.

**dictdialect  \<dialect\>**

Words can be marked in the \*\_list or \*\_rules file to be spoken using
a foreign voice. This **dictdialect** attribute can be used to specify
which dialect of the foreign language should be used, instead of the
default dialect. The currently available dialects are:\
**en-us** (US English)\
**es-la** (Latin American Spanish).\
eg.

~~~~ {.western style="margin-bottom: 0.5cm"}
dictdialect en-us
~~~~

This means that any words or rules which are marked with \_\^\_EN will be
spoken with the US English voice instead of the default UK English
voice.

Additional attributes are available to set various internal options
which control how language is processed. These would normally be set in
the program code rather than in a voice file.

A number of Voice files are provided in the
`espeak-data/voices`{.western} directory. You can select one of these
with the **-v \<voice filename\>** parameter to the speak command.

**default**

This voice is used if none is specified in the speak command. You can
copy your preferred voice to "default" so you can use the speak command
without the need to specify a voice.

For a list of voices provided for English and other languages see
[Languages](languages.html).
