- <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
- <html>
-
- <head>
- <title>eSpeak: Voice Files</title>
- <meta name="GENERATOR" content="Quanta Plus">
- <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
- </head>
- <body>
- <A href="index.html">Back</A>
- <hr>
- <h2>5. VOICES</h2>
- <hr>
- <h3>5.1 Voice Files</h3>
- A Voice file specifies a language (and possibly a language variant or dialect) together with various attributes that affect the characteristics of the voice quality and how the language is spoken.<p>
- Voice files are placed in the <code>espeak-data/voices</code> directory, or in subdirectories of it.<p>
- The available voice files can be listed by:<pre>
- espeak --voices
- or
- espeak --voices=<language></pre>
- <hr>
- <h3>5.2 Contents of Voice Files</h3>
- The <strong>language</strong> attribute is mandatory. All the other attributes are optional.
- <p>
- <h4>Identification Attributes</h4>
- <ul>
- <dl>
- <dt>
- <strong>name <name></strong><br>
- <dd>A name given to this voice.
- <p>
- <dt>
- <strong>language <language code> [<priority>]</strong><br>
- <dd>This attribute should appear before the other attributes which are listed below.<p>
- It selects the default behaviour and characteristics for the language, and sets default values for
- "phonemes", "dictionary" and other attributes. The <language code> should be a two-letter ISO 639-1 language code. One or more language variant codes may be appended, separated by hyphens (e.g. en-uk-north).<p>
- The optional <priority> value gives the preference of this voice compared with others for the specified language. A low value indicates a more preferred voice. The default value is 5.<p>
- More than one <strong>language</strong> line may be present. A voice may be selected for other related languages (variants which have the same initial 2 letter language code as the specified language), but it will be less preferred for these. Different language variants may be specified by additional <strong>language</strong> lines in order to indicate that this is a preferred voice for them also. Eg.<pre>
- language en-uk-north
- language en</pre>
- indicates that this voice is for the "en-uk-north" dialect, but it is also a main choice when a general "en" language is specified. Without the second <strong>language</strong> line, it would be disfavoured for "en" for being a more specialised voice.
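- <p>
- As a rough illustration of the rules above, the following Python sketch reads the <strong>language</strong> lines of a voice file and applies the default priority of 5 when none is given. The helper name <code>parse_language_lines</code> and the sample voice text are made up for this example and are not part of eSpeak; the way eSpeak actually ranks voices is more involved than this.

```python
DEFAULT_PRIORITY = 5  # default priority stated in the text above


def parse_language_lines(voice_text):
    """Return (language_code, priority) pairs found in voice-file text.

    Hypothetical helper for illustration only; not part of eSpeak.
    """
    entries = []
    for line in voice_text.splitlines():
        # Drop trailing // comments, as used in the examples in this document.
        fields = line.split("//", 1)[0].split()
        if len(fields) >= 2 and fields[0] == "language":
            # The priority is optional; a lower value is more preferred.
            priority = int(fields[2]) if len(fields) > 2 else DEFAULT_PRIORITY
            entries.append((fields[1], priority))
    return entries


# Made-up voice file content, for demonstration only.
voice = """name northern
language en-uk-north
language en 3
"""
entries = parse_language_lines(voice)
print(entries)
```

- Here the voice is listed for "en-uk-north" with the default priority and for "en" with an explicit priority of 3.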
- <p>
- <dt>
- <strong>gender <gender> [<age>]</strong><br>
- <dd>
- This attribute is only a label for use in voice selection. It doesn't change the sound of the voice.
- <ul><gender> may be male, female, or unknown.<br>
- <age> is optional and gives an age in years.
- </dl>
- </ul>
- <h4>Voice Attributes</h4>
- <ul>
- <dl>
- <dt>
- <strong>pitch <base> <range></strong><br>
- <dd> Two integer values.
- The first gives a base pitch to the voice (value in Hz).
- The second controls the range of pitches used by the voice. Setting
- it equal to the base pitch will give a monotone. The default values are 82 118.
- <p>
- <dt>
- <strong>formant <number> <frequency> <strength> <width> <freq_add></strong><br>
- <dd> Systematically adjusts the frequency, strength, and width of the
- resonance peaks of the voice. Values are percentages of the
- default values. Changing these affects the tone/quality of the voice.<p>
- <strong>freq_add </strong> Adds a constant value (in Hz) to the frequency of the formant peak. The value may be negative.
- <ul>
- <li>Formants 1,2,3 are the standard three formants which define vowels.</li>
- <li>Formant 0 is used to give a low frequency component to the sounds, of
- frequency lower than F1.</li>
- <li>Formants 4,5 are higher than F3. They affect the quality of the voice.</li>
- <li>Formants 6,7,8 are weak, high frequency, additions to vowels to give
- a clearer sound.</li>
- </ul>
- <p>
- <dt>
- <strong>echo <delay> <amplitude></strong><br>
- <dd> Parameter 1 gives the delay in ms (0 to 250 ms).<br>
- Parameter 2 gives the echo amplitude (0 to 100).<br>
-
- Adding some echo can give a clearer or more interesting sound,
- especially when listening through a domestic stereo sound system,
- rather than small computer speakers.
- <dt>
- <strong>tone</strong><br>
- <dd> Controls the tone of the sound.<br>
- <strong>tone</strong> is followed by up to 4 pairs of <frequency> <amplitude> which define a frequency response graph. Frequency is
- in Hz and amplitude is in the range 0 to 255. The default is:<p>
- <code> tone 600 170 1200 135 2000 110</code><p>
- This means that from frequency 0Hz to 600Hz the amplitude is 170. From
- 600Hz to 1200Hz the amplitude decreases from 170 to 135, then decreases to 110 at 2000Hz
- and remains at 110 at higher frequencies. This adjustment applies only to voiced sounds such as
- vowels and sonorant consonants (such as [n] and [l]). Unvoiced sounds such
- as [s] are unaffected.<p>
- This <strong>tone</strong> statement can also appear in <code>espeak-data/config</code>, in which case it applies to all voices which
- don't have their own <strong>tone</strong> statement.
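- <p>
- The frequency response described above can be sketched in Python as follows. This assumes straight-line interpolation between the given points, a flat response below the first point and a flat response above the last; the function name <code>tone_amplitude</code> is hypothetical, and the code illustrates the description rather than eSpeak's own implementation.

```python
def tone_amplitude(freq_hz, points):
    """Amplitude at freq_hz for a list of (frequency, amplitude) pairs.

    Assumption: linear interpolation between points, flat below the
    first point and flat above the last one.
    """
    prev_f, prev_a = 0, points[0][1]
    for f, a in points:
        if freq_hz <= f:
            if f == prev_f:
                return float(a)
            # Interpolate between the previous point and this one.
            return prev_a + (a - prev_a) * (freq_hz - prev_f) / (f - prev_f)
        prev_f, prev_a = f, a
    # Above the last point the amplitude stays constant.
    return float(points[-1][1])


# The default from the text: tone 600 170 1200 135 2000 110
DEFAULT_TONE = [(600, 170), (1200, 135), (2000, 110)]

for freq in (300, 900, 1600, 5000):
    print(freq, tone_amplitude(freq, DEFAULT_TONE))
```

- Under these assumptions, the amplitude half way between 600Hz and 1200Hz is 152.5, half way between 170 and 135.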
- <p>
- <dt>
- <strong>flutter <value></strong><br>
- <dd> Default value: 2.<br>
-
- Adds pitch fluctuations to give a wavering or older-sounding voice.
- A large value (eg. 20) makes the voice sound "croaky".
- <p>
- <dt>
- <strong>roughness <value></strong><br>
- <dd> Default value: 2. Range 0 - 7<br>
-
- Reduces the amplitude of alternate waveform cycles in order to make the voice sound creaky.
- <p>
- <dt>
- <strong>voicing <value></strong><br>
- <dd> Default value: 100.<br>
-
- Adjusts the strength of formant-synthesized sounds (vowels and sonorant consonants).
- <p>
- <dt>
- <strong>breath <up to 8 integer values></strong><br>
- <dd> Default values: 0.<br>
-
- Adds noise which corresponds to the formant frequency peaks. The values give the strength
- of noise for each formant peak (formants 1 to 8).
- <p>
- Use together with a low or zero value of the <strong>voicing</strong> attribute to make a "whisper".
- For example:<br>
- <code>breath 75 75 60 40 15 10<br>
- breathw 150 150 200 200 400 400<br>
- voicing 18<br>
- flutter 20<br>
- formant 0 100 0 100 // remove formant 0
- </code>
-
- <p>
- <dt>
- <strong>breathw <up to 8 integer values></strong><br>
- <dd>
- These values give bandwidths of the noise peaks of the <strong>breath</strong> attribute. If <strong>breathw</strong> values are not given, then suitable default values will be used.
- <p>
- </dl>
- </ul>
- <h4>Language Attributes</h4>
- <ul>
- <dl>
- <p>
- <dt>
- <strong>phonemes <name></strong><br>
- <dd>Specifies which set of phonemes to use from those contained in the
- phontab, phonindex, and phondata data files.
- This is a <strong>phonemetable</strong> name as given in the "phoneme" source file.
- <p>
- This parameter is usually not needed as it is set by default to the first two letters of the "language" parameter.
- However, different voices of the same language can use different phoneme sets, to give different accents.
- </dd>
- <dt>
- <strong>dictionary <name></strong><br>
- <dd> Specifies which pair of dictionary files to use. eg. "english"
- indicates that <em>espeak-data/en_dict</em> should
- be used to translate from words to phonemes. This parameter is usually
- not needed as it is set by default to the first two letters of the "language" parameter.</dd>
- <p>
- <dt>
- <strong>dictrules <list of rule numbers></strong><br>
- <dd>
- Gives a list of conditional dictionary rules which are applied for this voice. Rule numbers are in the range 0 to 31 and are specific to a language dictionary. They apply to rules in the language's <b>_rules</b> dictionary file and also its <b>_list</b> exceptions list.
- See <a href="dictionary.html">dictionary.html</a>.
- </dd>
- <p>
- <dt>
- <strong>replace <flags> <phoneme> <replacement phoneme></strong><br>
- <dd> Replace a phoneme by another whenever it occurs.<p>
- <replacement phoneme> may be NULL.<p>
- Flags: bit 0: replacement only occurs on the final phoneme of a word.<br>
- Flags: bit 1: replacement doesn't occur in stressed syllables.<br>
- eg.
- <pre>
- replace 0 h NULL // drops h's
- replace 0 V U // replaces vowel in 'strut' by that in 'foot'
- // as occurs in northern British English
- replace 3 N n // change 'fishing' to 'fishin' etc.
- // (only the last phoneme of a word, only in unstressed syllables)
- </pre>
- The phoneme mnemonics can be defined for each language, but some are listed in <A href="phonemes.html">phonemes.html</A>.
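- <p>
- The <flags> value is a bit field, so 3 means that both bit 0 and bit 1 are set. A minimal Python sketch of how the value decomposes is given below; the constant and function names are invented for this example and do not come from the eSpeak source.

```python
# Bit meanings as listed above; the names are hypothetical.
FINAL_PHONEME_ONLY = 1 << 0   # bit 0: only the final phoneme of a word
NOT_IN_STRESSED = 1 << 1      # bit 1: not in stressed syllables


def describe_replace_flags(flags):
    """Decode the <flags> value of a replace line into its two conditions."""
    return {
        "final_phoneme_only": bool(flags & FINAL_PHONEME_ONLY),
        "skip_stressed_syllables": bool(flags & NOT_IN_STRESSED),
    }


for value in (0, 1, 2, 3):
    print(value, describe_replace_flags(value))
```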
- </dd>
- <p>
- <dt>
- <strong>stressLength <8 integer values></strong><br>
- <dd> Eight integer parameters. These control the relative lengths of the vowels in
- stressed and unstressed syllables.
- <ul>
- <li> 0 unstressed
- </li><li> 1 diminished. Its use depends on the language. In English it's used for unstressed syllables within multisyllabic words. In Spanish it's used for unstressed final syllables.
- </li><li> 2 secondary stress
- </li><li> 3 words marked as "unstressed" in the dictionary
- </li><li> 4 not currently used
- </li><li> 5 not currently used
- </li><li> 6 stressed syllable (the main syllable in stressed words)
- </li><li> 7 tonic syllable (by default, the last stressed syllable in the clause)
- </li></ul>
- </dd>
- <p>
- <dt>
- <strong>stressAdd <8 integer values></strong><br>
- <dd> Eight integer parameters. These are added to the voice's corresponding stressLength values. They are used in the voice variant files in <code>espeak-data/voices/!v</code> to give some variety. Negative values may be used.</dd>
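- <p>
- As a small illustration, the element-by-element addition can be written in Python as below. The function name <code>apply_stress_add</code> and the numbers are placeholders chosen for the example; they are not defaults taken from eSpeak.

```python
def apply_stress_add(stress_length, stress_add):
    """Add each stressAdd value to the corresponding stressLength value."""
    if len(stress_length) != 8 or len(stress_add) != 8:
        raise ValueError("both lists must contain eight integers")
    return [length + add for length, add in zip(stress_length, stress_add)]


# Placeholder values for demonstration only (not eSpeak defaults).
base_lengths = [100, 100, 110, 110, 0, 0, 120, 130]
variant_add = [0, 0, -10, -10, 0, 0, 10, 20]

adjusted = apply_stress_add(base_lengths, variant_add)
print(adjusted)
```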
- <p>
- <dt>
- <strong>stressAmp <8 integer values></strong><br>
- <dd> Eight integer parameters. These control the relative amplitudes of the vowels in
- stressed and unstressed syllables (see stressLength above).
- The general default values are: 16, 16, 20, 20, 20, 24, 24, 22, although these defaults may be different for particular languages.</dd>
- <p>
- <dt>
- <strong>intonation <param1></strong><br>
- <dd><ul>
- <li>1 Default.</li>
- <li>2 Less intonation.</li>
- <li>3 Less intonation, and comma does not raise the pitch.</li>
- <li>4 Pitch rises (rather than falls) at the end of sentence.</li>
- </ul>
- </dd>
- <p>
- <dt>
- <strong>charset <param1></strong><br>
- <dd>
- The ISO 8859 character set number (not all are implemented).
- </dd>
- <p>
- Additional attributes are available to set various internal options which control how language is processed. These would normally be set in the program code rather than in a voice file.
- </ul>
- <hr>
- <h3>5.3 Voice Files Provided</h3>
- A number of Voice files are provided in the <code>espeak-data/voices</code> directory.
- You can select one of these with the <strong>-v <voice filename></strong> parameter to the
- espeak command.
- <p>
- <dl>
- <dt>
- <strong>default</strong><br>
- <dd> This voice is used if none is specified in the espeak command. Copy your preferred voice to "default" so you can use the espeak command without the need to specify a voice.</dd>
- </dl>
- For a list of voices provided for English and other languages see <a href="languages.html">Languages</a>.
-
-
- </body>
- </html>