
docs: improve the language code documentation

master
Reece H. Dunn 9 years ago
parent
commit
91779563dd
2 changed files with 64 additions and 39 deletions
  1. docs/add_language.md (+16, -22)
  2. docs/voices.md (+48, -17)

docs/add_language.md (+16, -22)

# Adding or Improving a Language


- [Language Code](#language-code)
- [Language Files](#language-files)
- [Language](#language)
- [Accent](#accent)
- [Language Family](#language-family)
- [Language Files](#language-files)
- [Voice File](#voice-file)
- [Phoneme Definition File](#phoneme-definition-file)
- [Dictionary Files](#dictionary-files)
list of valid tags originates from various standards and has been combined
into the
[IANA Language Subtag Registry](http://www.iana.org/assignments/language-subtag-registry/language-subtag-registry).
Additional private-use tags for other accents and dialects are defined in the
[bcp47-extensions](https://raw.githubusercontent.com/espeak-ng/bcp47-data/master/bcp47-extensions)
file of the [bcp47-data](https://github.com/rhdunn/bcp47-data) project.


### Language


* `de` (German) -- The [ISO 639-1](https://en.wikipedia.org/wiki/ISO_639-1)
2-letter language code for the language.


__NOTE:__ BCP 47 uses ISO 639-1 codes for languages that are allocated
2-letter codes (e.g. using `en` instead of `eng`).

* `yue` (Cantonese) -- The [ISO 639-3](https://en.wikipedia.org/wiki/ISO_639-3)
3-letter language codes for the language.


* `ta-Arab` (Tamil written in the Arabic alphabet) -- The
[ISO 15924](https://en.wikipedia.org/wiki/ISO_15924) 4-letter script code.


__NOTE:__ The language tags listed in the IANA Language Subtag Registry should
be used instead of those from the standards they were inherited from. For
example, ISO 639-3 duplicates languages found in ISO 639-1, but BCP 47 always
uses the ISO 639-1 form when available. That is, ISO 639-3 `eng` is never used
for English in BCP 47.

__NOTE:__ Where the script is the primary script for the language, the script
tag should be omitted.


### Accent


language tags for accents that cannot be described using the available
BCP 47 language tags.


__NOTE:__ If the accent you are trying to describe cannot be specified using
the above system, raise an issue in the
[bcp47-data](https://github.com/rhdunn/bcp47-data) project and a private use
tag will be defined for that accent.


### Language Family




The following files are needed for your language.


* `espeak-data/voices/fr`. The voice file. This gives the language name and
may set some options.
* `espeak-data/voices/roa/fr`. The voice file. This gives the language name
and may set some options.
* `phsource/ph_french`. The phoneme definition file. This contains phoneme
definitions for the vowels and consonants which the language uses. Usually
it will contain mostly vowels. Most consonants will be inherited from the
attributes such as "unstressed" and "pause" to some common words.


The `fr_rules` and `fr_list` files are compiled to produce the
file `espeak-data/fr_dict`, which eSpeak uses when it is speaking.
`espeak-data/fr_dict` file, which eSpeak uses when it is speaking.


## Voice File


Each language needs a voice file in `espeak-data/voices` or
`espeak-data/voices/test`. The filename of the default voice for a
language should be the same as the language code (eg. "fr" for French).
Each language needs a voice file in `espeak-data/voices` grouped by the
[language family](#language-family). The filename of the default voice for a
language should be the same as the language code (e.g. `fr` for French).


Details of the contents of voice files are given in [Voices](voices.md).



docs/voices.md (+48, -17)

characteristics of the voice quality and how the language is spoken.


Voice files are located in the `espeak-data/voices` directory, and are
grouped by the language family of the language being specified in the
voice files.
grouped by the [ISO 639-5](https://en.wikipedia.org/wiki/ISO_639-5)
language family of the language being specified in the voice files.
See also Wiktionary's
[List of language families](https://en.wiktionary.org/wiki/Wiktionary:List_of_families)
for more details.


The `default` voice is used if none is specified in the speak command. You
can copy your preferred voice to "default" so you can use the speak command
and sets default values for "phonemes", "dictionary" and other
attributes.


The \<language code\> is a
[BCP 47](https://en.wikipedia.org/wiki/IETF_language_tag) language tag.
When this is not enough to identify an accent, the
[bcp47-data](https://github.com/rhdunn/bcp47-data) accents file describes
the private use tags used by eSpeak NG. For example:
The \<language code\> is a valid
[BCP 47](https://en.wikipedia.org/wiki/IETF_language_tag) language tag. The
list of valid tags originates from various standards and has been combined
into the
[IANA Language Subtag Registry](http://www.iana.org/assignments/language-subtag-registry/language-subtag-registry).
For example:

* `de` (German) -- The [ISO 639-1](https://en.wikipedia.org/wiki/ISO_639-1)
2-letter language code for the language.

__NOTE:__ BCP 47 uses ISO 639-1 codes for languages that are allocated
2-letter codes (e.g. using `en` instead of `eng`).

* `yue` (Cantonese) -- The [ISO 639-3](https://en.wikipedia.org/wiki/ISO_639-3)
3-letter language codes for the language.

* `ta-Arab` (Tamil written in the Arabic alphabet) -- The
[ISO 15924](https://en.wikipedia.org/wiki/ISO_15924) 4-letter script code.

__NOTE:__ Where the script is the primary script for the language, the script
tag should be omitted.

* `es-419` (Spanish (Latin America)) -- The
[UN M.49](https://en.wikipedia.org/wiki/UN_M.49) 3-number region codes.

* `fr-CA` (French (Canada)) -- Using the
[ISO 3166-2](https://en.wikipedia.org/wiki/ISO_3166-2) 2-letter region codes.

* `en-GB-scotland` (English (Scotland)) -- This is using the BCP 47 variant
tags.

* `en-GB-x-rp` (English (Received Pronunciation)) -- This is using the
[bcp47-extensions](https://raw.githubusercontent.com/espeak-ng/bcp47-data/master/bcp47-extensions)
language tags for accents that cannot be described using the available
BCP 47 language tags.


* `en` -- English
* `en-GB-scotland` -- English with a Scottish accent
* `en-GB-x-rp` -- English with a Received Pronunciation accent
* `es-419` -- Spanish with a Latin American accent
* `fr-CA` -- French with a Canadian accent
__NOTE:__ If the accent you are trying to describe cannot be specified using
the above system, raise an issue in the
[bcp47-data](https://github.com/rhdunn/bcp47-data) project and a private use
tag will be defined for that accent.
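
The tag shapes listed above can be illustrated with a small Python sketch that splits a tag into its language, script, region, variant and private-use parts. This is only an illustration under simplifying assumptions: `TAG_RE` and `split_tag` are hypothetical names, not part of eSpeak NG, and the pattern ignores parts of BCP 47 that the examples do not use (extended language subtags, extensions, grandfathered tags). It checks the shape of a tag only, not whether the subtags are registered with IANA.

```python
import re

# Simplified BCP 47 shape (an assumption for illustration, not the full grammar):
# language[-script][-region][-variant...][-x-private...]
TAG_RE = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"                 # ISO 639-1 / ISO 639-3 code
    r"(?:-(?P<script>[A-Za-z]{4}))?"                # ISO 15924 script code
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?"       # 2-letter or UN M.49 region
    r"(?P<variants>(?:-(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*)"
    r"(?:-x-(?P<private>[A-Za-z0-9]{1,8}(?:-[A-Za-z0-9]{1,8})*))?$"
)


def split_tag(tag):
    """Split a language tag into its parts; raise ValueError if the shape is wrong."""
    match = TAG_RE.match(tag)
    if match is None:
        raise ValueError(f"not a recognised language tag shape: {tag!r}")
    parts = match.groupdict()
    # Turn "-scotland-..." into a list of variant subtags (empty if none).
    parts["variants"] = [v for v in parts["variants"].split("-") if v]
    return parts


if __name__ == "__main__":
    for example in ("de", "yue", "ta-Arab", "es-419", "fr-CA",
                    "en-GB-scotland", "en-GB-x-rp"):
        print(example, split_tag(example))
```

A real implementation would also validate each subtag against the IANA Language Subtag Registry and the bcp47-extensions file.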


The optional \<priority\> value gives the preference of this voice
compared with others for the specified language. A low value indicates a
more preferred voice. The default value is 5.
additional `language` lines in order to indicate that this is a
preferred voice for them also. E.g.


language en-uk-north
language en-GB-x-gbclan
language en


indicates that this voice is for the "en-uk-north" dialect, but it is
also a main choice when a general "en" language is specified. Without
the second `language` line, it would be disfavoured for "en" for being
indicates that this voice is for the `en-GB-x-gbclan` dialect, but it is
also a main choice when a general `en` language is specified. Without
the second `language` line, it would be disfavoured for `en` for being
a more specialised voice.
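
As a rough illustration of the `language` lines described above, the following Python sketch reads them from the text of a voice file and applies the default priority of 5 when none is given. It assumes each line has the form `language <language code> [<priority>]`, separated by whitespace. `parse_language_lines` and `DEFAULT_PRIORITY` are hypothetical names for this example, not eSpeak NG APIs, and the sketch is not how eSpeak NG itself reads voice files.

```python
DEFAULT_PRIORITY = 5  # default stated in the documentation above


def parse_language_lines(voice_text):
    """Return (language code, priority) pairs for each `language` line.

    Assumes a priority field, when present, is an integer; other attribute
    lines in the voice file are skipped.
    """
    entries = []
    for line in voice_text.splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[0] != "language":
            continue
        priority = int(fields[2]) if len(fields) > 2 else DEFAULT_PRIORITY
        entries.append((fields[1], priority))
    return entries


if __name__ == "__main__":
    sample = "language en-GB-x-gbclan\nlanguage en\n"
    for code, priority in parse_language_lines(sample):
        print(code, priority)
```

Because a lower value means a more preferred voice, sorting the candidate voices for a language by this priority would put the preferred one first.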


### gender
