| 
							- <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
 - <html>
 - 
 - <head>
 -   <title>eSpeak: Voice Files</title>
 -   <meta name="GENERATOR" content="Quanta Plus">
 -   <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
 - </head>
 - <body>
 - <A href="index.html">Back</A>
 - <hr>
 - <h2>5. VOICES</h2>
 - <hr>
 - <h3>5.1 Voice Files</h3>
 - A Voice file specifies a language (and possibly a language variant or dialect) together with various attributes that affect the characteristics of the voice quality and how the language is spoken.<p>
 - Voice files are placed in the <code>espeak-data/voices</code> directory, or in subdirectories of it.<p>
 - The available voice files can be listed by:<pre>
 -    espeak --voices
 - or
 -    espeak --voices=<language></pre>
 - <hr>
 - <h3>5.2 Contents of Voice Files</h3>
 - The <strong>language</strong> attribute is mandatory.  All the other attributes are optional.
 - <p>
 - <h4>Identification Attributes</h4>
 - <ul>
 - <dl>
 - <dt>
 - <strong>name  <name></strong><br>
 - <dd>A name given to this voice.
 - <p>
 - <dt>
 - <strong>language  <language code> [<priority>]</strong><br>
 - <dd>This attribute should appear before the other attributes which are listed below.<p>
 - It selects the default behaviour and characteristics for the language, and sets default values for
 - "phonemes", "dictionary" and other attributes. The <language code> should be a two-letter ISO 639-1 language code.  One or more language variant codes may be appended, separated by hyphens.  (eg.  en-uk-north).<p>
 - The optional <priority> value gives the preference of this voice compared with others for the specified language.  A low value indicates a more preferred voice.  The default value is 5.<p>
 - More than one <strong>language</strong> line may be present.  A voice may be selected for other related languages (variants which have the same initial 2 letter language code as the specified language), but it will be less preferred for these.  Different language variants may be specified by additional <strong>language</strong> lines in order to indicate that this is a preferred voice for them also.  Eg.<pre>
 -    language en-uk-north
 -    language en</pre>
 - indicates that this voice is for the "en-uk-north" dialect, but it is also a main choice when a general "en" language is specified.  Without the second <strong>language</strong> line, it would be disfavoured for "en" for being a more specialised voice.
 - <p>
 - <dt>
 - <strong>gender  <gender> [<age>]</strong><br>
 - <dd><gender> may be male, female, or unknown.<br>
 - <age> is optional and gives an age in years.
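 - <p>
 - As a rough sketch, the identification attributes at the top of a voice file might look like the following.  The name, priority and age shown here are invented for illustration.<pre>
 -    name northern-example        // hypothetical voice name
 -    language en-uk-north
 -    language en 3                // also offered for general "en"; 3 is more preferred than the default 5
 -    gender male 40               // age in years</pre>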
 - </dl>
 - </ul>
 - <h4>Voice Attributes</h4>
 - <ul>
 - <dl>
 - <dt>
 - <strong>pitch  <base> <range></strong><br>
 - <dd>   Two integer values.
 -    The first gives a base pitch to the voice (value in Hz).
 -    The second controls the range of pitches used by the voice. Setting
 -    it equal to the base pitch will give a monotone. The default values are 82 118.
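 - <p>
 - For illustration, two alternative settings are shown.  The first restates the default values and the second sets the range value equal to the base to give a monotone voice.  The value 90 is an arbitrary choice.<pre>
 -    pitch 82 118      // the default values
 -    pitch 90 90       // range equal to base: monotone</pre>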
 - <p>
 - <dt>
 - <strong>formant  <number> <frequency> <strength> <width></strong><br>
 - <dd>   Systematically adjusts the frequency, strength, and width of the
 -    resonance peaks of the voice.  Values are percentages of the
 -    default values.  Changing these affects the tone/quality of the voice.
 - <ul>
 -    <li>Formants 1,2,3 are the standard three formants which define vowels.</li>
 -    <li>Formant 0 is used to give a low frequency component to the sounds, of
 -       frequency lower than F1.</li>
 -    <li>Formants 4,5 are higher than F3.  They affect the quality of the voice.</li>
 -    <li>Formants 6,7,8 are weak, high frequency, additions to vowels to give
 -       a clearer sound.</li>
 - </ul>
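 - <p>
 - As an illustrative sketch, since each value is a percentage of the default, a value of 100 leaves that setting unchanged.  The particular numbers below are arbitrary and only show the order of the parameters.<pre>
 -    formant 1 95 100 100     // formant 1: frequency lowered to 95% of default
 -    formant 2 100 80 100     // formant 2: strength reduced to 80%
 -    formant 3 100 100 120    // formant 3: width increased to 120%</pre>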
 - <p>
 - <dt>
 - <strong>echo  <delay> <amplitude></strong><br>
 - <dd>   Parameter 1 gives the delay in ms (0 to 250 ms).<br>
 -    Parameter 2 gives the echo amplitude (0 to 100).<br>
 - 
 -    Adding some echo can give a clearer or more interesting sound,
 -    especially when listening through a domestic stereo sound system,
 -    rather than small computer speakers.
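 - <p>
 - For example, a line such as the following (values chosen arbitrarily within the ranges above) would add an echo delayed by 100 milliseconds at an amplitude of 30:<pre>
 -    echo 100 30</pre>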
 - <dt>
 - <strong>tone</strong><br>
 - <dd>  Controls the tone of the sound.<br>
 - <strong>tone</strong> is followed by up to 4 pairs of <frequency> <amplitude> which define a frequency response graph.  Frequency is
 - in Hz and amplitude is in the range 0 to 255. The default is:<p>
 - <code>   tone 600 170  1200 135  2000 110</code><p>
 - This means that from frequency 0Hz to 600Hz the amplitude is 170. From
 - 600Hz to 1200Hz the amplitude decreases from 170 to 135, then decreases to 110 at 2000Hz
 - and remains at 110 at higher frequencies.  This adjustment applies only to voiced sounds such as
 - vowels and sonorant consonants (such as [n] and [l]). Unvoiced sounds such
 - as [s] are unaffected.<p>
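 - As a worked example of the default setting, and assuming the amplitude changes in a straight line between the given points (the text above does not state the shape of the curve), the response at a few frequencies would be roughly:<pre>
 -     300 Hz    170      (below the first point)
 -     900 Hz    152.5    (midway between 170 and 135)
 -    1600 Hz    122.5    (midway between 135 and 110)
 -    3000 Hz    110      (above the last point)</pre>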
 - This <strong>tone</strong> statement can also appear in <code>espeak-data/config</code>, in which case it applies to all voices which
 - don't have their own <strong>tone</strong> statement.
 - <p>
 - <dt>
 - <strong>flutter  <value></strong><br>
 - <dd>   Default value: 2.<br>
 - 
 -    Adds pitch fluctuations to give a wavering or older-sounding voice.
 -    A large value (eg. 20) makes the voice sound "croaky".
 - <p>
 - <dt>
 - <strong>roughness  <value></strong><br>
 - <dd>   Default value: 2. Range 0 - 7<br>
 - 
 -    Reduces the amplitude of alternate waveform cycles in order to make the voice sound creaky.
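 - <p>
 - As a sketch, this attribute might be combined with <strong>flutter</strong> to suggest an older, creaky voice.  The values are illustrative only: 20 is the "croaky" flutter value mentioned above and 7 is the top of the roughness range.<pre>
 -    flutter 20
 -    roughness 7</pre>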
 - <p>
 - <dt>
 - <strong>voicing  <value></strong><br>
 - <dd>   Default value: 100.<br>
 - 
 -    Adjusts the strength of formant-synthesized sounds (vowels and sonorant consonants).
 - <p>
 - <dt>
 - <strong>breath  <up to 8 integer values></strong><br>
 - <dd>   Default values: 0.<br>
 - 
 -    Adds noise which corresponds to the formant frequency peaks.  The values give the strength
 -    of noise for each formant peak (formants 1 to 8).
 - <p>
 -    Use together with a low or zero value of the <strong>voicing</strong> attribute to make a "whisper".
 -    For example:<br>
 -    <code>breath   75 75 60 40 15 10<br>
 -          breathw  150 150 200 200 400 400<br>
 -          voicing  18<br>
 -          flutter  20<br>
 -          formant   0 100 0 100   // remove formant 0
 -    </code>
 - 
 - <p>
 - <dt>
 - <strong>breathw  <up to 8 integer values></strong><br>
 - <dd> 
 -    These values give bandwidths of the noise peaks of the <strong>breath</strong> attribute.  If <strong>breathw</strong> values are not given, then suitable default values will be used.
 - <p>
 - </dl>
 - </ul>
 - <h4>Language Attributes</h4>
 - <ul>
 - <dl>
 - <p>
 - <dt>
 - <strong>phonemes  <name></strong><br>
 - <dd>Specifies which set of phonemes to use from those contained in the
 -    phontab, phonindex, and phondata data files.
 -    This is a <strong>phonemetable</strong> name as given in the "phoneme" source file.
 - <p>
 -    This parameter is usually not needed as it is set by default to the first two letters of the "language" parameter.
 -    However, different voices of the same language can use different phoneme sets, to give different accents.
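 - <p>
 - For instance, a voice for a regional accent might keep the default dictionary of its language but name its own phoneme set.  The phonemetable name below is hypothetical and would have to exist in the "phoneme" source file.<pre>
 -    language en-uk-north
 -    phonemes en-north-example    // hypothetical phonemetable name</pre>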
 - </dd>
 - <dt>
 - <strong>dictionary  <name></strong><br>
 - <dd>   Specifies which dictionary file to use.  eg. "en"
 -    indicates that <em>espeak-data/en_dict</em> should
 -    be used to translate from words to phonemes.  This parameter is usually
 -    not needed as it is set by default to the first two letters of the "language" parameter.</dd>
 - <p>
 - <dt>
 - <strong>dictrules  <list of rule numbers></strong><br>
 - <dd>
 - Gives a list of conditional dictionary rules which are applied for this voice.  Rule numbers are in the range 0 to 31 and are specific to a language.  They can apply to rules in the language's <b>_rules</b> dictionary file and also its <b>_list</b> exceptions list.
 - See <a href="dictionary.html">dictionary.html</a>.
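 - <p>
 - A minimal example, assuming that the rule numbers are separated by spaces like the values of the other attributes.  The numbers are arbitrary and only have an effect if the language's dictionary files contain conditional rules which use them.<pre>
 -    dictrules 1 3</pre>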
 - </dd>
 - <p>
 - <dt>
 - <strong>replace  <flags> <phoneme> <replacement phoneme></strong><br>
 - <dd>   Replace a phoneme by another whenever it occurs.<p>
 -    <replacement phoneme> may be NULL.<p>
 -    Flags: bit 0:  replacement only occurs on the final phoneme of a word.<br>
 -    Flags: bit 1:  replacement doesn't occur in stressed syllables.<br>
 -    eg.
 - <pre>
 -       replace  0  h  NULL      // drops h's
 -       replace  0  V  U         // replaces vowel in 'strut' by that in 'foot'
 -                                // as occurs in northern British English
 -       replace  3  N  n         // change 'fishing' to 'fishin' etc.
 -                                // (only the last phoneme of a word, only in unstressed syllables)
 - </pre>
 -    The phoneme mnemonics can be defined for each language, but some are listed in <A href="phonemes.html">phonemes.html</A>.
 - </dd>
 - <p>
 - <dt>
 - <strong>stressLength  <8 integer values></strong><br>
 - <dd>   Eight integer parameters.  These control the relative lengths of the vowels in
 -    stressed and unstressed syllables.
 - <ul>
 - <li>      0   unstressed
 - </li><li>      1   diminished. Its use depends on the language. In English it's used for unstressed syllables within multisyllabic words. In Spanish it's used for unstressed final syllables.
 - </li><li>      2   secondary stress
 - </li><li>      3   words marked as "unstressed" in the dictionary
 - </li><li>      4      not currently used
 - </li><li>      5      not currently used
 - </li><li>      6   stressed syllable (the main syllable in stressed words)
 - </li><li>      7   tonic syllable (by default, the last stressed syllable in the clause)
 - </li></ul>
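 - As an illustration of the layout, the eight values are taken here to be given in the order of the stress levels listed above, 0 to 7.  The numbers are invented for the example and are not the defaults of any language.<pre>
 -    //             0   1   2   3   4   5   6   7
 -    stressLength 150 130 180 180   0   0 220 250</pre>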
 - </dd>
 - <p>
 - <dt>
 - <strong>stressAdd  <8 integer values></strong><br>
 - <dd>   Eight integer parameters.  These are added to the voice's corresponding stressLength values.  They are used in the voice variant files in <code>espeak-data/voices/!v</code> to give some variety.  Negative values may be used.</dd>
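 - <dd>As a worked example with invented numbers, each stressAdd value in a variant file is added to the corresponding stressLength value of the voice it modifies:<pre>
 -    stressLength 150 130 180 180   0   0 220 250     // in the voice file
 -    stressAdd     10   0 -10   0   0   0  20  30     // in the variant file
 -    // result:   160 130 170 180   0   0 240 280</pre></dd>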
 - <p>
 - <dt>
 - <strong>stressAmp  <8 integer values></strong><br>
 - <dd>   Eight integer parameters.  These control the relative amplitudes of the vowels in
 -    stressed and unstressed syllables (see stressLength above).
 -    The general default values are:  16, 16, 20, 20, 20, 24, 24, 22, although these defaults may be different for particular languages.</dd>
 - <p>
 - <dt>
 - <strong>intonation  <param1> <param2></strong><br>
 - <dd>   (for further development)<br>
 - 
 - 
 - </dd>
 - <p>
 - <dt>
 - <strong>charset  <param1></strong><br>
 - <dd>
 - The ISO 8859 character set number (not all are implemented).
 - </dd>
 - <p>
 - Additional attributes are available to set various internal options which control how language is processed.  These would normally be set in the program code rather than in a voice file.
 - <p>
 - <dt>
 - <strong>stressrule  <param1> <param2> <param3> <param4></strong><br>
 - <dd>
 - Controls how different stress levels are applied to the syllables of a word.
 - </dd>
 - </dl>
 - </ul>
 - <hr>
 - <h3>5.3 Voice Files Provided</h3>
 - A number of Voice files are provided in the <code>espeak-data/voices</code> directory.
 - You can select one of these with the <strong>-v <voice filename></strong> parameter to the
 - espeak command.
 - <p>
 - <dl>
 - <dt>
 - <strong>default</strong><br>
 - <dd>   This voice is used if none is specified in the espeak command.  Copy your preferred voice to "default" so you can use the espeak command without the need to specify a voice.</dd>
 - </dl>
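 - Two illustrative commands follow.  The first assumes that a voice file named <code>en</code> is installed.  The second assumes that it is run from the directory which contains <code>espeak-data</code>, and the source file name shown is hypothetical.<pre>
 -    espeak -v en "Hello world"
 -    cp espeak-data/voices/en-example espeak-data/voices/default</pre>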
 - For a list of voices provided for English and other languages see <a href="languages.html">Languages</a>.
 - 
 - 
 - </body>
 - </html>
 
 
  |