# Table of contents
* [ANALYSIS](#analysis)
* [Recording Sounds](#recording-sounds)
* [Praat](#praat)
* [Vowels and Diphthongs](#vowels-and-diphthongs)
* [Analysing a Recording](#analysing-a-recording)
* [Tone Quality](#tone-quality)
* [Using an Existing Vowel File](#using-an-existing-vowel-file)
* [Length and Amplitude](#length-and-amplitude)
* [Using the New Vowel](#using-the-new-vowel)

# ANALYSIS
(Further notes are needed)

Recordings of spoken words and phrases can be analysed to try to make
eSpeak NG match a language more closely. Unlike most other (larger and
better-quality) synthesizers, eSpeak NG's data is not produced directly
from recorded sounds. To use an analogy, it's like a drawing or sketch
compared with a photograph, or vector graphics compared with a bitmap
image. It's smaller, less accurate and less subtle, but it can
sometimes show some aspects of the picture more clearly than a more
accurate image.

## Recording Sounds
Recordings should be made while speaking slowly, clearly, firmly and
loudly (but not shouting). Speak about half a metre from the microphone.
Try to avoid background noise and hum interference from electrical power
cables.

## Praat
I use a modified version of the Praat program
([www.praat.org](http://www.praat.org)) to view and analyse both sound
recordings and output from eSpeak NG. The modification adds a new function
(**Spectrum->To_eSpeak**) which analyses a voiced sound and
produces a file which can be loaded into espeakedit. Details of the
modification are in the `praat-mod` directory in the
espeakedit package.

The analysis contains a sequence of frames, one per
cycle at the speech's fundamental frequency. Each frame is a short-time
spectrum, together with Praat's estimate of the F1 to F5 formant
frequencies at the time of that cycle. I also use Praat's
**New->Record_mono_sound** function to make sound recordings.
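
Because the analysis produces one frame per pitch cycle, frame spacing follows the speaker's F0 contour: each frame lasts 1/F0 seconds. The short Python sketch below only illustrates that arithmetic. It is not part of Praat or espeakedit, and the function name and its input (a hypothetical list of per-cycle F0 values in Hz) are assumptions.

```python
def frame_start_times_ms(f0_per_cycle):
    """Return the start time (ms) of each one-cycle analysis frame.

    f0_per_cycle: hypothetical list of fundamental-frequency estimates
    in Hz, one per pitch cycle, in time order.
    """
    times = []
    t = 0.0
    for f0 in f0_per_cycle:
        if f0 <= 0:
            raise ValueError("F0 must be positive for a voiced cycle")
        times.append(t)
        t += 1000.0 / f0  # one cycle lasts 1/F0 seconds
    return times


# A steady 125 Hz voice gives one frame every 8 ms.
print(frame_start_times_ms([125.0, 125.0, 125.0]))  # [0.0, 8.0, 16.0]
```

A higher-pitched voice therefore yields more, shorter frames for the same stretch of speech.
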
# Vowels and Diphthongs
## Analysing a Recording
Make a recording with a male voice, and trim it in Praat to keep just
the required vowel sound. Then use the new
**Spectrum->To_eSpeak** function (this was named
`To_Spectrogram2` in earlier versions) to analyse the sound.
It produces a file named `spectrum.dat`. Load the
`spectrum.dat` file into espeakedit. Espeakedit has two Open
functions, **File->Open** and **File->Open2**. They are
the same, except that they remember different paths. I generally use
**File->Open2**.

The data is displayed in espeakedit as a sequence of spectrum frames
(see [editor](editor.md)).

## Tone Quality
It can be difficult to make the tonal quality of a new vowel
compatible with existing vowel files. Tonal quality is determined by the relative
heights and widths of the formant peaks. These vary depending on how the
recording was made, the microphone, and the strength and tone of the
voice. Also, the positions of the higher peaks (F3 upwards) can vary
depending on the characteristics of the speaker's voice. Formant peaks
correspond to resonances within the mouth and throat, and they depend on
the size and shape of these cavities. With a female voice, all the formants (F1 upwards)
are generally shifted to higher frequencies. For these reasons, it's
best to use a male voice, and to use its analysed spectra only as
guidance. Rather than constructing formant peaks entirely to match the
analysed data, copy keyframes from a similar existing vowel.
Then make small adjustments to the positions of the F1, F2 and F3
formant peaks and hopefully produce the required vowel sound.

## Using an Existing Vowel File
Choose a similar vowel file from `phsource/vowel` and open it
in espeakedit. It may be useful to use
`phsource/vowel/vowelchart` as a map to show how vowel files
compare with each other. You can select a keyframe from the vowel file
and use CTRL-C and CTRL-V to copy its green formant peaks onto a frame
of the new spectrum sequence. Then adjust the peaks to match the new
frame. Press F1 to hear the sound of the formant peaks in the selected
frame. The F0 peak is provided to adjust the balance of
low frequencies, below the F1 peak. If the sound is too muffled or,
conversely, too "thin", try adjusting the amplitude or position of the
F0 peak.

## Length and Amplitude
Use an existing vowel file as a guide for how to set the amplitude and
length of the keyframes. At the right of each keyframe, its length is
shown in milliseconds, and under that is its relative (RMS) amplitude. The second
keyframe should be marked with a red marker (use CTRL-M to toggle this).
This divides the vowel into the front part (with one frame) and the
rest. Use F2 to play the sound of the new vowel sequence. It will also
produce a WAV file (the default name is `speech.wav`) which you can read
into Praat to see whether it has a sensible shape.
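
The figure shown for each keyframe is a root-mean-square (RMS) amplitude. As a reminder of what that measures, here is a minimal Python sketch that computes the RMS of a block of sample values, for example one pitch cycle taken from `speech.wav`, and compares two frames in decibels. It is illustrative only. espeakedit's own scaling of the displayed value is not described here, so treat the numbers as relative, and note that `relative_db` is a hypothetical helper.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a sequence of sample values."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def relative_db(frame, reference):
    """Level of one frame relative to another, in decibels.

    Hypothetical helper; the reference frame must not be silent.
    """
    return 20.0 * math.log10(rms(frame) / rms(reference))


print(rms([100, -100, 100, -100]))                      # 100.0
# Doubling the amplitude raises the level by about 6 dB.
print(round(relative_db([200, -200], [100, -100]), 2))  # 6.02
```
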
## Using the New Vowel
Make a new directory (e.g. `vwl_xx`) in `phsource` for your new vowels. Save
the spectrum sequence there, with a name which you have chosen for it. You can
then edit the phoneme file for your language (e.g. `phsource/ph_xxx`), and
change a phoneme to refer to your new vowel file. Then do
**Data->Compile_Phoneme_Data** from espeakedit's menu bar to
re-compile the phoneme data.

# Table of contents
* [Linux and other Posix systems](#linux-and-other-posix-systems)
* [Dependencies](#dependencies)
* [Windows](#windows)
* [Examples](#examples)
* [The Command Line Options](#the-command-line-options)
* [The Input Text](#the-input-text)

# INSTALLATION
## Linux and other Posix systems
There are two versions of the command line program. They both have the
same command parameters (see below).

1. **espeak** uses the speech engine in the **libespeak** shared library. The libespeak library must first be installed.
1. **speak** is a stand-alone version which includes its own copy of the speech engine.

Place the **espeak-ng** or **speak-ng** executable file in the command
path, e.g. in **/usr/local/bin**.

Place the **espeak-data** directory in **/usr/share** as **/usr/share/espeak-data**.
Alternatively, if it is placed in the user's home directory (e.g. **/home/\<user\>/espeak-data**), then that will be used instead.

## Dependencies
**espeak-ng** uses the PortAudio sound library (version 18), so you will need to have the **libportaudio0** library package installed. It may already be installed, since it's used by other software, such as OpenOffice.org and the Audacity sound editor.

Some Linux distributions (e.g. SuSE 10) have version 19 of PortAudio, which has a slightly different API. The speak program can be compiled to use version 19 of PortAudio by copying the file `portaudio19.h` to `portaudio.h` before compiling.

The speak program may be compiled without using PortAudio by removing
the line
```
#define USE_PORTAUDIO
```
in the file `speech.h`.

## Windows
The installer **setup\_espeak.exe** installs the SAPI5 version of
eSpeak NG. During installation you need to specify which voices you want to
appear in SAPI5 voice menus.

It also installs a command line program **espeak-ng** in the espeak-ng
program directory.

## Examples
To use at the command line, type:
```
espeak-ng "This is a test"
```
or
```
espeak-ng -f <text file>
```
or just type
```
espeak-ng
```
followed by text on subsequent lines. Each line is spoken when RETURN
is pressed.

Use **espeak-ng -x** to see the corresponding phoneme codes.
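
To drive espeak-ng from a script, a minimal Python sketch could look like the one below. It assumes `espeak-ng` is installed and on the PATH, and it uses only the `-x` and `-q` options documented in the next section. `espeak_args` and `phonemes` are hypothetical helper names. Passing the text as its own list element avoids shell quoting problems.

```python
import shutil
import subprocess


def espeak_args(text, show_phonemes=False, quiet=False):
    """Build an espeak-ng argument list (hypothetical helper)."""
    args = ["espeak-ng"]
    if show_phonemes:
        args.append("-x")  # write phoneme mnemonics to stdout
    if quiet:
        args.append("-q")  # generate no sound
    args.append(text)
    return args


def phonemes(text):
    """Return espeak-ng's phoneme mnemonics for text without speaking it."""
    if shutil.which("espeak-ng") is None:
        raise FileNotFoundError("espeak-ng is not on PATH")
    result = subprocess.run(espeak_args(text, show_phonemes=True, quiet=True),
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


print(espeak_args("This is a test", show_phonemes=True, quiet=True))
# ['espeak-ng', '-x', '-q', 'This is a test']
```
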
## The Command Line Options
* **espeak-ng [options] ["text words"]**
  Text input can be taken either from a file, from a string in the command, or from stdin.
* **-f \<text file\>**
  Speaks a text file.
* **--stdin**
  Takes the text input from stdin. If neither -f nor --stdin is given, then the text input is taken from "text words" (a text string within double quotes). If that is not present then text is taken from stdin, but each line is treated as a separate sentence.
* **-a \<integer\>**
  Sets amplitude (volume) in a range of 0 to 200. The default is 100.
* **-p \<integer\>**
  Adjusts the pitch in a range of 0 to 99. The default is 50.
* **-s \<integer\>**
  Sets the speed in words-per-minute (approximate values for the
  default English voice, others may differ slightly). The default
  value is 175. I generally use a faster speed of 260. The lower limit
  is 80. There is no upper limit, but about 500 is probably a
  practical maximum.
* **-b \<integer\>**
  Input text character format.
  * 1: UTF-8. This is the default.
  * 2: The 8-bit character set which corresponds to the language (e.g. Latin-2 for Polish).
  * 4: 16-bit Unicode.

  Without this option, eSpeak NG assumes text is UTF-8, but will
  automatically switch to the 8-bit character set if it finds an
  illegal UTF-8 sequence.
* **-g \<integer\>**
  Word gap. This option inserts a pause between words. The value is
  the length of the pause, in units of 10 ms (at the default speed).
* **-h** or **--help**
  The first line of output gives the eSpeak NG version number.
* **-k \<integer\>**
  Indicates words which begin with capital letters.
  * 1: eSpeak NG uses a click sound to indicate when a word starts with a
    capital letter, or a double click if the word is all capitals.
  * 2: eSpeak NG speaks the word "capital" before a word which begins with a
    capital letter.
  * Other values: eSpeak NG increases the pitch for words which begin
    with a capital letter. The greater the value, the greater the
    increase in pitch. Try -k20.
* **-l \<integer\>**
  Line-break length, default value 0. If set, then lines which are
  shorter than this are treated as separate clauses and spoken
  separately with a break between them. This can be useful for some
  text files, but bad for others.
* **-m**
  Indicates that the text contains SSML (Speech Synthesis Markup
  Language) tags or other XML tags. Those SSML tags which are
  supported are interpreted. Other tags, including HTML, are ignored,
  except that some HTML tags such as \<hr\>, \<h2\> and \<li\> ensure a
  break in the speech.
* **-q**
  Quiet. No sound is generated. This may be useful with options such
  as -x and --pho.
* **-v \<voice filename\>[+\<variant\>]**
  Sets a voice for the speech, usually to select a language, e.g.
  `espeak-ng -vaf` to use the Afrikaans voice. A modifier after the voice name can be used to vary the tone of the voice, e.g.
  `espeak-ng -vaf+3`.

  The variants are `+m1 +m2 +m3 +m4 +m5 +m6 +m7` for male voices and `+f1 +f2 +f3 +f4`, which simulate female voices by using higher pitches. Other variants include `+croak` and `+whisper`.

  \<voice filename\> is a file within the `espeak-data/voices` directory.\
  \<variant\> is a file within the `espeak-data/voices/!v` directory.

  Voice files can specify a language, alternative pronunciations or phoneme sets, different pitches, tonal qualities, and prosody for the voice. See the [voices](voices.md) file.

  Voice names which start with **mb-** are for use with Mbrola diphone voices; see [mbrola](mbrola.md).

  Some languages may need additional dictionary data; see [languages](languages.md).
* **-w \<wave file\>**
  Writes the speech output to a file in WAV format, rather than speaking it.
* **-x**
  The phoneme mnemonics, into which the input text is translated, are written to stdout. If a phoneme name contains more than one letter (e.g. [tS]), the --sep or --tie option can be used to distinguish this from separate phonemes.
* **-X**
  As -x, but in addition, details are shown of the pronunciation rule and dictionary list lookup. This can be useful to see why a certain pronunciation is being produced. Each matching pronunciation rule is listed, together with its score, the highest scoring rule being used in the translation. "Found:" indicates the word was found in the dictionary lookup list, and "Flags:" means the word was found with only properties and not a pronunciation. You can see when a word has been retranslated after removing a prefix or suffix.
* **-z**
  This option removes the end-of-sentence pause which normally occurs at the end of the text.
* **--stdout**
  Writes the speech output to stdout as it is produced, rather than speaking it. The data starts with a WAV file header which indicates the sample rate and format of the data. The length field is set to zero because the length of the data is unknown when the header is produced.
* **--compile [=\<voice name\>]**
  Compiles the pronunciation rule and dictionary lookup data from their source files in the current directory. The voice determines which language's files are compiled. For example, if it's an English voice, then *en\_rules*, *en\_list*, and *en\_extra* (if present) are compiled to replace *en\_dict* in the *espeak-data* directory. If no voice is specified then the default voice is used.
* **--compile-debug [=\<voice name\>]**
  The same as **--compile**, but source line numbers from the \*\_rules file are included. These are included in the rules trace when the **-X** option is used.
* **--ipa**
  Writes phonemes to stdout, using the International Phonetic Alphabet (IPA). If a phoneme name contains more than one letter (e.g. [tS]), the --sep or --tie option can be used to distinguish this from separate phonemes.
* **--path [="\<directory path\>"]**
  Specifies the directory which contains the espeak-data directory.
* **--pho**
  When used with an mbrola voice (e.g. -v mb-en1), it writes mbrola phoneme data (.pho file format) to stdout. This includes the mbrola phoneme names with duration and pitch information, in a form which is suitable as input to this mbrola voice. The --phonout option can be used to write this data to a file.
* **--phonout [="\<filename\>"]**
  If specified, the output from the -x, -X, --ipa, and --pho options is written to this file, rather than to stdout.
* **--punct [="\<characters\>"]**
  Speaks the names of punctuation characters when they are encountered in the text. If \<characters\> are given, then only those listed punctuation characters are spoken, e.g. `--punct=".,;?"`
* **--sep [=\<character\>]**
  The character is used to separate individual phonemes in the output which is produced by the -x or --ipa options. The default is a space character. The character z means use a ZWNJ character (U+200C).
* **--split [=\<minutes\>]**
  Used with **-w**, it starts a new WAV file every `<minutes>` minutes, at the next sentence boundary.
* **--tie [=\<character\>]**
  The character is used within multi-letter phonemes in the output which is produced by the -x or --ipa options. The default is the tie character ͡ (U+0361). The character z means use a ZWJ character (U+200D).
* **--voices [=\<language code\>]**
  Lists the available voices.

  If =\<language code\> is present then only those voices which are suitable for that language are listed.

  `--voices=mbrola` lists the voices which use mbrola diphone voices. These are not included in the default `--voices` list.

  `--voices=variant` lists the available voice variants (voice modifiers).
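
The numeric ranges listed above can be checked before espeak-ng is called. The Python sketch below assumes the ranges given in this section (amplitude 0 to 200, pitch 0 to 99, speed at least 80 wpm). It also assumes that espeak-ng accepts each option value as a separate argument, as in this document's `-v en` example. `build_command` and its keyword arguments are hypothetical; only the flags come from the list above.

```python
def build_command(text, amplitude=None, pitch=None, speed=None,
                  word_gap=None, voice=None, wav_file=None):
    """Return an espeak-ng argument list after checking documented ranges.

    Hypothetical helper: only the flags (-a, -p, -s, -g, -v, -w) are
    taken from the espeak-ng option list.
    """
    args = ["espeak-ng"]
    if amplitude is not None:
        if not 0 <= amplitude <= 200:
            raise ValueError("amplitude (-a) must be 0 to 200")
        args += ["-a", str(amplitude)]
    if pitch is not None:
        if not 0 <= pitch <= 99:
            raise ValueError("pitch (-p) must be 0 to 99")
        args += ["-p", str(pitch)]
    if speed is not None:
        if speed < 80:
            raise ValueError("speed (-s) must be at least 80 wpm")
        args += ["-s", str(speed)]
    if word_gap is not None:
        if word_gap < 0:
            raise ValueError("word gap (-g) cannot be negative")
        args += ["-g", str(word_gap)]  # units of 10 ms at the default speed
    if voice is not None:
        args += ["-v", voice]  # e.g. "af", or "en+f2" for a variant
    if wav_file is not None:
        args += ["-w", wav_file]  # write a WAV file instead of speaking
    args.append(text)
    return args


print(build_command("Hello", speed=260, voice="en+f2", wav_file="hello.wav"))
# ['espeak-ng', '-s', '260', '-v', 'en+f2', '-w', 'hello.wav', 'Hello']
```

The resulting list can be passed straight to `subprocess.run(..., check=True)`.
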
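
The fallback described for running without **-b** (assume UTF-8, and switch to the language's 8-bit character set on an illegal UTF-8 sequence) can be sketched in Python. This mirrors the documented behaviour and is not eSpeak NG's own code. The fallback encoding (ISO-8859-2, i.e. Latin-2, as in the Polish example) is an assumption. The sketch also decides for the whole byte string at once, which is a simplification.

```python
def decode_input(data: bytes, fallback: str = "iso8859_2") -> str:
    """Decode text as UTF-8, falling back to an 8-bit character set.

    The fallback (ISO-8859-2 / Latin-2) is an assumption for illustration.
    """
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # An illegal UTF-8 sequence: reinterpret the bytes as 8-bit text.
        return data.decode(fallback)


print(decode_input("Łódź".encode("utf-8")))      # Łódź
print(decode_input("Łódź".encode("iso8859_2")))  # Łódź
```
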
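
Because the **--stdout** header carries a zero length, a consumer should read the format from the header and then read samples until end of file, without trusting the size fields. The Python sketch below parses such a header. It assumes the canonical 44-byte PCM layout (a 16-byte `fmt ` chunk immediately followed by the `data` chunk), which is not confirmed here. The sample rate and format values in the example header are made up for illustration.

```python
import struct


def parse_wav_header(header: bytes) -> dict:
    """Parse a canonical 44-byte PCM WAV header.

    The RIFF and data size fields are ignored, since a streamed
    header may carry zero there.
    """
    if len(header) < 44 or header[0:4] != b"RIFF" or header[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE header")
    if header[12:16] != b"fmt ":
        raise ValueError("expected the fmt chunk straight after the RIFF header")
    (audio_format, channels, sample_rate, _byte_rate,
     _block_align, bits) = struct.unpack("<HHIIHH", header[20:36])
    if header[36:40] != b"data":
        raise ValueError("expected the data chunk after a 16-byte fmt chunk")
    return {"format": audio_format, "channels": channels,
            "sample_rate": sample_rate, "bits_per_sample": bits}


# Example header with both size fields set to zero (made-up format values).
header = struct.pack("<4sI4s4sIHHIIHH4sI", b"RIFF", 0, b"WAVE", b"fmt ", 16,
                     1, 1, 22050, 44100, 2, 16, b"data", 0)
print(parse_wav_header(header))
# {'format': 1, 'channels': 1, 'sample_rate': 22050, 'bits_per_sample': 16}
```

In practice you would pass it the first 44 bytes read from the stdout pipe of `espeak-ng --stdout "text"`, then treat everything after that as sample data.
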

## The Input Text
* **HTML Input**
  If the -m option is used to indicate marked-up text, then HTML can
  be spoken directly.
* **Phoneme Input**
  As well as plain text, phoneme mnemonics can be used in the text
  input to **espeak-ng**. They are enclosed within double square
  brackets. Spaces are used to separate words and all stressed
  syllables must be marked explicitly, e.g.:

  `espeak-ng -v en "[[D,Is Iz sVm f@n'EtIk t'Ekst 'InpUt]]"`

  This command will speak: "This is some phonetic text input".

# Table of contents
* [Espeakedit program](#espeakedit-program)
* [Installation](#installation)
* [Quick Guide](#quick-guide)
* [Compiling Phoneme Data](#compiling-phoneme-data)
* [Keyframe Sequences](#keyframe-sequences)
* [Text and Prosody Windows](#text-and-prosody-windows)

# Espeakedit program
The **espeakedit** program is used to prepare phoneme data for the eSpeak speech synthesizer.
It has two main functions:
* Prepare keyframe files for individual vowels and voiced consonants. These each contain a sequence of keyframes which define how formant peaks (peaks in the frequency spectrum) vary during the sound.
* Process the master **phonemes** file which, by including the phoneme files for the various languages, defines all their phonemes and references the keyframe files and the sound sample files which they use. **espeakedit** processes these and compiles them into the **phondata**, **phonindex**, and **phontab** files in the **espeak-data** directory, which are used by the eSpeak speech synthesizer.

## Installation
**espeakedit** needs the following packages
(the package names mentioned here are those from the Ubuntu "Dapper" Linux distribution):
* **sox** (a universal sound sample translator)
* **libwxgtk2.6-0** (wxWidgets cross-platform C++ GUI toolkit)
* **portaudio0** (PortAudio V18, portable audio I/O)

In addition, a modified version of **praat** ([www.praat.org](http://www.praat.org/)) is used to view and analyse WAV sound files. This needs the package **libmotif3** to run and **libmotif-dev** to compile.

## Quick Guide
This will quickly illustrate the main features. Details of the interface and key commands are given in [editor_if](editor_if.md).

For more detailed information on analysing sound recordings and preparing phoneme definitions and keyframe data, see [analyse](analyse.md).
### Compiling Phoneme Data
1. Run the `espeakedit` program.
2. Select **Data->Compile phoneme data** from the menu bar. Dialog boxes will ask you to locate the directory (`phsource`) which contains the master phonemes file, and the directory (`dictsource`) which contains the dictionary files (en_rules, en_list, etc.). Once specified, espeakedit will remember their locations, although they can be changed later from **Options->Paths**.
3. A message in the status line at the bottom of the espeakedit window will indicate whether there are any errors in the phoneme data, and how many languages' dictionary files have been compiled. The compiled data is placed into the `espeak-data` directory, ready for use by the speak program. If errors are found in the phoneme data, they are listed in a file `error_log` in the `phsource` directory.

   NOTE: espeakedit can be used from the command line to compile the phoneme data, with the command:
   `espeakedit --compile`
4. Select **Tools->Make vowels chart->From compiled phoneme data**. This will look for the vowels in the compiled phoneme data of each language and produce a vowel chart (.png file) in `phsource/vowelcharts`. These charts plot the vowels' F1 (formant 1) frequency against their F2 frequency, which corresponds approximately to their open/close and front/back positions. The colour in the circle for each vowel indicates its F3 frequency: red indicates a low F3, through yellow and green to blue and violet for a high F3. In the case of a diphthong, a line is drawn from the circle to the position of the end of the vowel.
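
The chart's encoding (F1 against F2 for position, F3 as a colour running from red to violet) can be made concrete with a small Python sketch. The axis orientation (front vowels to the left and open vowels lower down, as on a conventional vowel chart) and the F3 limits of the colour scale are assumptions for illustration. espeakedit's actual scaling is not documented here, and `chart_point` is a hypothetical name.

```python
import colorsys


def chart_point(f1, f2, f3, f3_low=2000.0, f3_high=3200.0):
    """Place one vowel on an F1/F2 chart and colour it by F3.

    f3_low and f3_high are hypothetical limits for the colour scale.
    Returns (x, y, (r, g, b)) with colour components in 0..1.
    """
    x = -f2  # higher F2 (front vowels) plots further left
    y = f1   # higher F1 (open vowels) plots lower if y grows downwards
    frac = (f3 - f3_low) / (f3_high - f3_low)
    frac = min(max(frac, 0.0), 1.0)
    hue = frac * 0.75  # hue 0.0 is red; 0.75 (270 degrees) is violet
    return x, y, colorsys.hsv_to_rgb(hue, 1.0, 1.0)


print(chart_point(700, 1200, 2000))  # (-1200, 700, (1.0, 0.0, 0.0))
```
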
### Keyframe Sequences
1. Select **File->Open** from the menu bar and select the vowel file `phsource/vowel/a`. This will open a tab in the espeakedit window which contains a sequence of 4 keyframes. Each keyframe shows a black graph, which is the outline of an original analysed spectrum from a sound recording, and also a green line, which shows the formant peaks which have been added (using the black graph as a guide) and which produce the sound.
2. Click in the "a" tab window and then press the **F2** key. This will produce and play the sound of the keyframe sequence. The first time you do this, you'll get a save dialog asking where you want the WAV file to be saved. Once you give a location, all future sounds will be stored in that same location, although it can be changed from **Options->Paths**.
3. Click on the second of the four frames, the one with the red square. Press **F1**. That plays the sound of just that frame.
4. Press the **1** (number one) key. That selects formant F1, and a red triangle appears under the F1 formant peak to indicate that it's selected. Also an = sign appears next to formant 1 in the formants list in the left panel of the window.
5. Press the left-arrow key a couple of times to move the F1 peak to the left. The red triangle and its associated green formant peak move to a lower frequency. Its numeric value in the formants list in the left panel decreases.
6. Press the **F1** key again. The frame will give a slightly different vowel sound. As you move the F1 peak slightly up and down and then press **F1** again, the sound changes. Similarly, if you press the **2** key to select the F2 formant, then moving that will also change the sound. If you move the F1 peak down to about 700 Hz (and reduce its height a bit with the down-arrow key) and move F2 up to 1400 Hz, then you'll hear an "er" schwa [@] sound instead of the original [a].
7. Select **File->Open** and choose `phsource/vowel/aI`. This opens a new tab labelled "aI" which contains more frames. This is the [aI] diphthong, and if you click in the tab window and press **F2** you'll hear the English word "eye". If you click on each frame in turn and press **F1**, then you can hear each of the keyframes in turn. They sound different, starting with an [A] sound (as in "palm"), going through something like [@] in "her" and ending with something like [I] in "kit" (or perhaps a French é). Together they make the diphthong [aI].

### Text and Prosody Windows
1. Click on the **Text** tab in the left panel. Two text windows appear in the panel with **Translate** and **Speak** buttons below them.
2. Type some text into the top window and click the **Translate** button. The phonetic translation will appear in the lower window.
3. Click the **Speak** button. The text will be spoken and a **Prosody** tab will open in the main window.
4. Click on a vowel phoneme which is displayed in the Prosody tab. A red line appears under it to indicate that it has been selected.
5. Use the **up-arrow** or **down-arrow** key to move the vowel's blue pitch contour up or down. Then click the **Speak** button again to hear the effect of the altered pitch. If the adjacent phoneme also has a pitch contour, then you may hear a discontinuity in the sound if it no longer matches the one which you have moved.
6. Hold down the **Ctrl** key while using the **up-arrow** or **down-arrow** keys. The gradient of the pitch contour will change.
7. Click with the right mouse button over a phoneme. A menu allows you to select a different pitch envelope shape. Details of the currently selected phoneme appear in the status line at the bottom of the window. The **Stress** number gives the stress level of the phoneme (see [voices](voices.md) for a list).
8. Click the **Translate** button. This re-translates the text and restores the original pitches.
9. Click on a vowel phoneme in the Prosody window and use the **<** and **>** keys to shorten or lengthen it.

The Prosody window can be used to experiment with different phoneme lengths and different intonation.

# Table of contents
* [User interface - formant editor](#user-interface---formant-editor)
* [Frame Sequence Display](#frame-sequence-display)
* [Text Tab](#text-tab)
* [Spect Tab](#spect-tab)
* [Key Commands](#key-commands)
* [Selection](#selection)
* [Formant movement](#formant-movement)
* [Frame Cut and Paste](#frame-cut-and-paste)
* [Frame editing](#frame-editing)
* [Display and Sound](#display-and-sound)
* [User interface - prosody editor](#user-interface---prosody-editor)

# User interface - formant editor
## Frame Sequence Display
The eSpeak editor can display a number of frame sequences in tabbed windows. Each frame can contain a short-time frequency spectrum, covering the period of one cycle at the sound's pitch. Frames can also show:
* Blue vertical lines showing the estimated position of the F1 to F5 formants (if the sequence was produced by Praat analysis). These should correspond with the peaks in the spectrum, but may not do so exactly.
* Numbers at the right side of the frame showing the position from the start of the sequence in milliseconds, and the pitch of the sound.
* Up to 9 formant peaks (numbered 0 to 8) added by the user, usually to match the peaks in the spectrum, in order to produce the required sound. These are shown in green, can be moved by key presses as described below, and may merge if they are close together. If a frame has formant peaks then it is a keyframe and is shown with a pale yellow background.
* If formant peaks are present, a relative amplitude (RMS) value is shown at the right side of the frame.

## Text Tab
Enter text in the top left text window. Click the **Translate** button to see the phonetic transcription in the text window below. Then click the **Speak** button to speak the text and show the results in the **Prosody** tab, if that is open.

If changes are made in the **Prosody** tab, then clicking **Speak** will speak the modified prosody, while **Translate** will revert to the default prosody settings for the text.

To enter phonetic symbols in [Kirshenbaum](https://en.wikipedia.org/wiki/Kirshenbaum)-like encoding in the top left text window, enclose them within **[[ ]]**.

## Spect Tab
* **Spect**
  tab in the left panel of the eSpeak editor shows information about the currently selected frame and sequence.
* **Formants**
  section displays the Frequency, Height, and Width of each formant peak (peaks 0 to 8). Peaks 6, 7 and 8 don't have a variable width.
* **% amp - Frame**
  can be used to adjust the amplitude of the frame. If you change this value then the RMS amplitude value at the right side of the frame will change.
  The formant peaks don't change, just the overall amplitude of the frame.
* **mS**
  shows the time in milliseconds until the next keyframe (or the end of the sequence if there is none).
  The spin control initially shows the same value, but this can be changed in order to increase or decrease the effective length of a keyframe.
* **% amp - Sequence**
  adjusts the amplitude of the whole sequence. Changing this value changes the RMS amplitudes of all the keyframes in the sequence.
* **% mS - Sequence**
  shows the total length of the sequence.
* **Graph**
  Yellow vertical lines show the position of keyframes within the sequence.
  Black bars on these show the frequencies of formant peaks which have been set at these keyframes.
  Thick red lines, if present, show the formants as detected in the original analysis.
  A thin black line, if present, shows the pitch profile measured in the original analysis.
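
The **mS** value described above is simply the gap from one keyframe to the next, or to the end of the sequence for the last keyframe. A minimal Python sketch of that arithmetic follows. The function name and its input (keyframe start times in ms, read off the numbers beside each frame) are hypothetical.

```python
def keyframe_lengths_ms(positions_ms, sequence_end_ms):
    """Length of each keyframe as described for the mS field: the time
    until the next keyframe, or until the end of the sequence.

    positions_ms: hypothetical keyframe start times in ms, ascending.
    """
    if any(b <= a for a, b in zip(positions_ms, positions_ms[1:])):
        raise ValueError("keyframe positions must be strictly increasing")
    ends = list(positions_ms[1:]) + [sequence_end_ms]
    return [end - start for start, end in zip(positions_ms, ends)]


lengths = keyframe_lengths_ms([0, 40, 150, 230], 260)
print(lengths)       # [40, 110, 80, 30]
print(sum(lengths))  # 260, the sequence length when the first keyframe is at 0 ms
```
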

## Key Commands
### Selection
The selected frame(s) are shown with a red border. The selected formant peak is indicated by an equals (**=**) sign next to its number in the "Spect" panel on the left of the window, and by a red triangle under the peak.

Keyframes are shown with a pale yellow background. A keyframe is any frame with any formant peaks which are not zero height. If all formant peaks become zero height, the frame is no longer a keyframe. If you increase a peak's height, the frame becomes a keyframe.
* **Numbers 0 to 8**
  Select formant peak number 0 to 8.
* **Page Up/Down**
  Move to next/previous frame.

### Formant movement
With the following keys, holding down **Shift** causes slower movement.
* **Left**
  Moves the selected formant peak to a lower frequency.
* **Right**
  Moves the selected formant peak to a higher frequency.
* **Up**
  Increases the height of the selected formant peak.
* **Down**
  Decreases the height of the selected formant peak.
* **<**
  Narrows the selected formant peak.
* **>**
  Widens the selected formant peak.
* **CTRL <**
  Narrows the selected formant peak.
* **CTRL >**
  Widens the selected formant peak.
* **/**
  Makes the selected formant peak symmetrical.

### Frame Cut and Paste
* **CTRL A**
  Select all frames in the sequence.
* **CTRL C**
  Copy the selected frames to the (internal) clipboard.
* **CTRL V**
  Paste frames from the clipboard to overwrite the contents of the selected frame and the frames which follow it. Only the formant peak information is pasted.
* **CTRL SHIFT V**
  Paste frames from the clipboard to insert them above the selected frame.
* **CTRL X**
  Delete the selected frames.

### Frame editing
* **CTRL D**
  Copy the formant peaks down to the selected frame from the next keyframe above.
* **CTRL SHIFT D**
  Copy the formant peaks up to the selected frame from the next keyframe below.
* **CTRL Z**
  Set all formant peaks in the selected frame to zero height. It is no longer a keyframe.
* **CTRL I**
  Set the formant peaks in the selected frame as an interpolation between the next keyframes above and below it. A dialog box allows you to enter a percentage: 50% gives values half-way between the two adjacent keyframes, 0% gives values equal to the one above, and 100% gives values equal to the one below.
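
The **CTRL I** percentage can be read as a blend weight between the two neighbouring keyframes. The Python sketch below assumes straight linear interpolation of each peak's frequency, height and width, which matches the 0%, 50% and 100% cases described above. espeakedit's exact formula is not stated. The `(frequency_hz, height, width)` tuple layout and the function name are hypothetical.

```python
def interpolate_peaks(above, below, percent):
    """Interpolate formant peaks between the keyframe above and the one below.

    0% returns the keyframe above, 100% the keyframe below.
    Each peak is a hypothetical (frequency_hz, height, width) tuple.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    if len(above) != len(below):
        raise ValueError("both keyframes need the same number of peaks")
    w = percent / 100.0
    return [tuple(a + (b - a) * w for a, b in zip(pa, pb))
            for pa, pb in zip(above, below)]


print(interpolate_peaks([(700, 100, 80)], [(300, 60, 60)], 50))
# [(500.0, 80.0, 70.0)]
```
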

### Display and Sound
* **CTRL Q**
  Shows interpolated formant peaks on non-keyframes. These frames don't become keyframes until any of the peaks are edited to increase their height.
* **CTRL SHIFT Q**
  Removes the interpolated formant peaks display.
* **CTRL G**
  Toggles the grid on and off.
* **F1**
  Plays the sound made from the one selected keyframe.
* **F2**
  Plays the sound made from all the keyframes in the sequence.

# User interface - prosody editor
* **Left**
  Move to previous phoneme.
* **Right**
  Move to next phoneme.
* **Up**
  Increase pitch.
* **Down**
  Decrease pitch.
* **Ctrl Up**
  Increase pitch range.
* **Ctrl Down**
  Decrease pitch range.
* **>**
  Increase length.
* **<**
  Decrease length.