9 years ago · 91779563dd
--- a/docs/add_language.md
+++ b/docs/add_language.md
@@ -1,10 +1,10 @@
 # Adding or Improving a Language

 - [Language Code](#language-code)
 - [Language Files](#language-files)
  - [Language](#language)
  - [Accent](#accent)
  - [Language Family](#language-family)
 - [Language Files](#language-files)
 - [Voice File](#voice-file)
 - [Phoneme Definition File](#phoneme-definition-file)
 - [Dictionary Files](#dictionary-files)
@@ -31,9 +31,6 @@ The language is identified using the
 list of valid tags originates from various standards and has been combined
 into the
 [IANA Language Subtag Registry](http://www.iana.org/assignments/language-subtag-registry/language-subtag-registry).
 Additional private-use tags for other accents and dialects are defined in the
 [bcp47-extensions](https://raw.githubusercontent.com/espeak-ng/bcp47-data/master/bcp47-extensions)
 file of the [bcp47-data](https://github.com/rhdunn/bcp47-data) project.

 ### Language

@@ -42,20 +39,17 @@ These language tags are used to specify the language, such as:
 *  `de` (German) -- The [ISO 639-1](https://en.wikipedia.org/wiki/ISO_639-1)
   2-letter language code for the language.

   __NOTE:__ BCP 47 uses ISO 639-1 codes for languages that are allocated
   2-letter codes (e.g. using `en` instead of `eng`).

 *  `yue` (Cantonese) -- The [ISO 639-3](https://en.wikipedia.org/wiki/ISO_639-3)
   3-letter language code for the language.

 *  `ta-Arab` (Tamil written in the Arabic alphabet) -- The
   [ISO 15924](https://en.wikipedia.org/wiki/ISO_15924) 4-letter script code.

 __NOTE:__ The language tags listed in the IANA Language Subtag Registry should
 be used instead of those from the standards they were inherited from. For
 example, ISO 639-3 duplicates languages found in ISO 639-1, but BCP 47 always
 uses the ISO 639-1 form when available. That is, ISO 639-3 `eng` is never used
 for English in BCP 47.

 __NOTE:__ Where the script is the primary script for the language, the script
 tag should be omitted.
   __NOTE:__ Where the script is the primary script for the language, the script
   tag should be omitted.

 ### Accent

@@ -76,10 +70,10 @@ such as:
   language tags for accents that cannot be described using the available
   BCP 47 language tags.

 __NOTE:__ If the accent you are trying to describe cannot be specified using
 the above system, raise an issue in the
 [bcp47-data](https://github.com/rhdunn/bcp47-data) project and a private use
 tag will be defined for that accent.
   __NOTE:__ If the accent you are trying to describe cannot be specified using
   the above system, raise an issue in the
   [bcp47-data](https://github.com/rhdunn/bcp47-data) project and a private use
   tag will be defined for that accent.

 ### Language Family

@@ -96,8 +90,8 @@ are listed under the `cel` language family code.

 The following files are needed for your language.

  * `espeak-data/voices/fr`. The voice file. This gives the language name and
    may set some options.
  * `espeak-data/voices/roa/fr`. The voice file. This gives the language name
    and may set some options.
  * `phsource/ph_french`. The phoneme definition file. This contains phoneme
    definitions for the vowels and consonants which the language uses. Usually
    it will contain mostly vowels. Most consonants will be inherited from the
@@ -110,13 +104,13 @@ The following files are needed for your language.
    attributes such as "unstressed" and "pause" to some common words.

 The `fr_rules` and `fr_list` files are compiled to produce the
 file `espeak-data/fr_dict`, which eSpeak uses when it is speaking.
 `espeak-data/fr_dict` file, which eSpeak uses when it is speaking.

 ## Voice File

 Each language needs a voice file in `espeak-data/voices` or
 `espeak-data/voices/test`. The filename of the default voice for a
 language should be the same as the language code (eg. "fr" for French).
 Each language needs a voice file in `espeak-data/voices` grouped by the
 [language family](#language-family). The filename of the default voice for a
 language should be the same as the language code (e.g. `fr` for French).

 Details of the contents of voice files are given in [Voices](voices.md).
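
The directory layout described above can be sketched as a small path helper. This is only an illustration: `voice_file_path` is a hypothetical name, not part of eSpeak NG, and the caller must supply the language family code (for example `roa` for French, as in the file list above).

```python
from pathlib import PurePosixPath


def voice_file_path(family, language, root="espeak-data/voices"):
    """Return the expected location of a language's default voice file.

    Hypothetical helper: the voice file is named after the language code
    and stored in a directory named after the language family code.
    """
    return PurePosixPath(root) / family / language


print(voice_file_path("roa", "fr"))
```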

--- a/docs/voices.md
+++ b/docs/voices.md
@@ -39,8 +39,11 @@ dialect) together with various attributes that affect the
 characteristics of the voice quality and how the language is spoken.

 Voice files are located in the `espeak-data/voices` directory, and are
 grouped by the language family of the language being specified in the
 voice files.
 grouped by the [ISO 639-5](https://en.wikipedia.org/wiki/ISO_639-5)
 language family of the language being specified in the voice files.
 See also Wiktionary's
 [List of language families](https://en.wiktionary.org/wiki/Wiktionary:List_of_families)
 for more details.

 The `default` voice is used if none is specified in the speak command. You
 can copy your preferred voice to "default" so you can use the speak command
@@ -65,19 +68,47 @@ It selects the default behaviour and characteristics for the language,
 and sets default values for "phonemes", "dictionary" and other
 attributes.

 The \<language code\> is a
 [BCP 47](https://en.wikipedia.org/wiki/IETF_language_tag) language tag.
 When this is not enough to identify an accent, the
 [bcp47-data](https://github.com/rhdunn/bcp47-data) accents file describes
 the private use tags used by eSpeak NG. For example:
 The \<language code\> is a valid
 [BCP 47](https://en.wikipedia.org/wiki/IETF_language_tag) language tag. The
 list of valid tags originates from various standards and has been combined
 into the
 [IANA Language Subtag Registry](http://www.iana.org/assignments/language-subtag-registry/language-subtag-registry).
 For example:

 *  `de` (German) -- The [ISO 639-1](https://en.wikipedia.org/wiki/ISO_639-1)
   2-letter language code for the language.

   __NOTE:__ BCP 47 uses ISO 639-1 codes for languages that are allocated
   2-letter codes (e.g. using `en` instead of `eng`).

 *  `yue` (Cantonese) -- The [ISO 639-3](https://en.wikipedia.org/wiki/ISO_639-3)
   3-letter language code for the language.

 *  `ta-Arab` (Tamil written in the Arabic alphabet) -- The
   [ISO 15924](https://en.wikipedia.org/wiki/ISO_15924) 4-letter script code.

   __NOTE:__ Where the script is the primary script for the language, the script
   tag should be omitted.

 *  `es-419` (Spanish (Latin America)) -- The
   [UN M.49](https://en.wikipedia.org/wiki/UN_M.49) 3-digit region code.

 *  `fr-CA` (French (Canada)) -- The
   [ISO 3166-1 alpha-2](https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2) 2-letter region code.

 *  `en-GB-scotland` (English (Scotland)) -- This is using the BCP 47 variant
   tags.

 *  `en-GB-x-rp` (English (Received Pronunciation)) -- This is using the
   [bcp47-extensions](https://raw.githubusercontent.com/espeak-ng/bcp47-data/master/bcp47-extensions)
   language tags for accents that cannot be described using the available
   BCP 47 language tags.

 *  `en` -- English
 *  `en-GB-scotland` -- English with a Scottish accent
 *  `en-GB-x-rp` -- English with a Received Pronunciation accent
 *  `es-419` -- Spanish with a Latin American accent
 *  `fr-CA` -- French with a Canadian accent
   __NOTE:__ If the accent you are trying to describe cannot be specified using
   the above system, raise an issue in the
   [bcp47-data](https://github.com/rhdunn/bcp47-data) project and a private use
   tag will be defined for that accent.
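
To make the structure of the example tags concrete, the following is a minimal sketch that splits a tag into the kinds of subtag listed above. It is an assumption-laden simplification, not a full BCP 47 parser and not code from eSpeak NG: `split_tag` is a hypothetical name, and extended language subtags, extensions and grandfathered tags are ignored.

```python
def split_tag(tag):
    """Split a BCP 47 style tag into labelled subtags (simplified sketch)."""
    parts = tag.split("-")
    result = {
        "language": parts[0].lower(),
        "script": None,
        "region": None,
        "variants": [],
        "private_use": [],
    }
    for index, part in enumerate(parts[1:], start=1):
        if part.lower() == "x":
            # Everything after "x" is a private-use subtag (e.g. "rp").
            result["private_use"] = [p.lower() for p in parts[index + 1:]]
            break
        nothing_after_script = result["region"] is None and not result["variants"]
        if len(part) == 4 and part.isalpha() and result["script"] is None and nothing_after_script:
            # 4-letter script code, written in title case (e.g. "Arab").
            result["script"] = part.title()
        elif nothing_after_script and (
            (len(part) == 2 and part.isalpha()) or (len(part) == 3 and part.isdigit())
        ):
            # 2-letter or 3-digit region code (e.g. "CA" or "419").
            result["region"] = part.upper()
        else:
            # Anything else is treated as a variant (e.g. "scotland").
            result["variants"].append(part.lower())
    return result


for example in ["de", "ta-Arab", "es-419", "fr-CA", "en-GB-scotland", "en-GB-x-rp"]:
    print(example, split_tag(example))
```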

 
 The optional \<priority\> value gives the preference of this voice
 compared with others for the specified language. A low value indicates a
 more preferred voice. The default value is 5.
@@ -89,12 +120,12 @@ preferred for these. Different language variants may be specified by
 additional `language` lines in order to indicate that this is a
 preferred voice for them also. E.g.

 	language en-uk-north
 	language en-GB-x-gbclan
 	language en

 indicates that this voice is for the "en-uk-north" dialect, but it is
 also a main choice when a general "en" language is specified. Without
 the second `language` line, it would be disfavoured for "en" for being
 indicates that this voice is for the `en-GB-x-gbclan` dialect, but it is
 also a main choice when a general `en` language is specified. Without
 the second `language` line, it would be disfavoured for `en` for being
 a more specialised voice.
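
The effect of `language` lines and the optional priority can be illustrated with a toy model. This is a sketch under stated assumptions, not eSpeak NG's actual voice selection: the function names are hypothetical, only exact tag matches are considered, and the penalty for more specialised voices mentioned above is not modelled. It uses only the rules given in this section: a missing priority defaults to 5, and a lower value is more preferred.

```python
DEFAULT_PRIORITY = 5  # default stated in this section


def parse_language_lines(text):
    """Return (language code, priority) pairs from the text of a voice file."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "language":
            priority = int(fields[2]) if len(fields) >= 3 else DEFAULT_PRIORITY
            entries.append((fields[1], priority))
    return entries


def pick_voice(voices, requested):
    """Pick the voice that lists `requested` with the lowest priority value.

    `voices` maps a hypothetical voice name to the text of its voice file.
    Returns None when no voice lists the requested tag exactly.
    """
    best = None
    for name, text in voices.items():
        for code, priority in parse_language_lines(text):
            if code == requested and (best is None or priority < best[0]):
                best = (priority, name)
    return None if best is None else best[1]


voices = {
    "specialised": "\tlanguage en-GB-x-gbclan\n\tlanguage en\n",
    "general": "language en 2\n",
}
print(pick_voice(voices, "en"))
print(pick_voice(voices, "en-GB-x-gbclan"))
```

Because the second `language` line is present, the `specialised` voice is still a candidate for `en`; removing that line would leave it matching only `en-GB-x-gbclan` in this model.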

 ### gender