# Adding or Improving a Language

- [Considerations Before Preparation](#considerations-before-preparation)
- [Language Code](#language-code)
  - [Language Tag](#language-tag)
  - [Language Family](#language-family)
  - [Accent (optional)](#accent-optional)
- [Configuration Files](#configuration-files)
  - [Makefile.am File](#makefileam-file)
  - [Phonemes File](#phonemes-file)
  - [Voice File](#voice-file)
  - [Phoneme Definition File](#phoneme-definition-file)
  - [Dictionary Files](#dictionary-files)
- [Program Code](#program-code)
- [Compiling Rules File for Debugging](#compiling-rules-file-for-debugging)
- [Improving a Language](#improving-a-language)

----------

## Considerations Before Preparation

Most of the work doesn't need any programming knowledge. However, you should be
able to [build](../README.md#building) eSpeak so that you get immediate
feedback by running and testing it. You also need to understand the main
concepts of the language, be aware of its features, and have patience and
attention to detail. Wikipedia is a good source of basic phonetic
information, e.g.
[http://en.wikipedia.org/wiki/Vowel](http://en.wikipedia.org/wiki/Vowel).

In many cases it should be fairly easy to add a rough implementation of
a new language, hopefully enough to be intelligible. After that it's a
gradual process of improvement.

## Language Code

### Language Tag

The language is identified using the
[BCP 47](https://en.wikipedia.org/wiki/IETF_language_tag) language tag.
The valid tags originate from various standards and have been combined
into the
[IANA Language Subtag Registry](http://www.iana.org/assignments/language-subtag-registry/language-subtag-registry).

These language tags are used to specify the language, such as:

* `fr` (French) -- The [ISO 639-1](https://en.wikipedia.org/wiki/ISO_639-1)
  2-letter language code for the language.
  __NOTE:__ BCP 47 uses ISO 639-1 codes for languages that are allocated
  2-letter codes (e.g. `en` instead of `eng`).
* `yue` (Cantonese) -- The [ISO 639-3](https://en.wikipedia.org/wiki/ISO_639-3)
  3-letter language code for the language.
* `ta-Arab` (Tamil written in the Arabic alphabet) -- The
  [ISO 15924](https://en.wikipedia.org/wiki/ISO_15924) 4-letter script code.
  __NOTE:__ Where the script is the primary script for the language, the script
  tag should be omitted.

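The subtags of a tag are separated by hyphens. They can usually be told apart by their shape: a language code comes first, a 4-letter subtag is a script, and a 2-letter or 3-digit subtag is a region. Everything after `x` is private use. The sketch below splits a tag using only these shape rules. `split_tag` is a hypothetical helper that is not part of eSpeak. It does not check the tag against the IANA registry, and it does not handle extended language subtags or extensions other than `x`.

```shell
# split_tag TAG  (hypothetical helper)
# Classifies each subtag of a BCP 47 tag by its shape only
# (it does not consult the IANA Language Subtag Registry).
split_tag() {
    position=language          # the first subtag is always the language
    old_ifs=$IFS
    IFS=-                      # subtags are separated by hyphens
    for subtag in $1; do
        if [ "$position" = private ]; then
            echo "private-use: $subtag"   # everything after "x" is private use
            continue
        fi
        case $position:$subtag in
            language:*)
                echo "language: $subtag"; position=script ;;
            *:[xX])
                position=private ;;
            script:[[:alpha:]][[:alpha:]][[:alpha:]][[:alpha:]])
                echo "script: $subtag"; position=region ;;
            script:[[:alpha:]][[:alpha:]] | script:[[:digit:]][[:digit:]][[:digit:]] | \
            region:[[:alpha:]][[:alpha:]] | region:[[:digit:]][[:digit:]][[:digit:]])
                echo "region: $subtag"; position=variant ;;
            *)
                echo "variant: $subtag"; position=variant ;;
        esac
    done
    IFS=$old_ifs
}
```

For example, `split_tag ta-Arab` prints `language: ta` and `script: Arab`. `split_tag es-419` prints `language: es` and `region: 419`.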
### Language Family

The voices are grouped by the closest language family to which the language
belongs. These language families are defined in
[ISO 639-5](https://en.wikipedia.org/wiki/ISO_639-5). See also Wiktionary's
[List of language families](https://en.wiktionary.org/wiki/Wiktionary:List_of_families)
for more details.

For example, the Celtic languages (Welsh, Irish Gaelic, Scottish Gaelic, etc.)
are listed under the `cel` language family code.

### Accent (optional)

If necessary, the language tags are also used to specify the accent or dialect
of a language, such as:

* `es-419` (Spanish (Latin America)) -- The
  [UN M.49](https://en.wikipedia.org/wiki/UN_M.49) 3-digit region code.
* For an accent that has no registered subtag, a private use tag will be
  defined for that accent in the
  [bcp47-data](https://github.com/rhdunn/bcp47-data) project.

## Configuration Files

To add a new language, you have to create or edit the following files:

| path/file                    | action            |
|------------------------------|-------------------|
| Makefile.am                  | edit              |
| phsource/phonemes            | edit              |
| phsource/ph_french           | create            |
| dictsource/fr_list           | create            |
| dictsource/fr_rules          | create            |
| dictsource/fr_extra          | create (optional) |
| espeak-data/voices/roa/fr    | create            |

where:

* __french__ is the name of the new language
* __fr__ is the code of this language
* __roa__ is the family of this language

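The files to create follow a fixed naming pattern based on the language name, code and family. The sketch below creates empty placeholders for them and assumes it is run from the top of the source tree. `new_language_skeleton` is a hypothetical helper that is not part of eSpeak. It skips the optional `fr_extra` file. It also does not write the voice file, whose contents are described in [Voice File](#voice-file).

```shell
# new_language_skeleton NAME CODE FAMILY  (hypothetical helper)
# e.g. new_language_skeleton french fr roa
# Creates empty placeholder files following the naming pattern in the
# table above, plus the voice directory for the language family.
# Existing files are left untouched.
new_language_skeleton() {
    name=$1 code=$2 family=$3
    mkdir -p phsource dictsource "espeak-data/voices/$family" || return 1
    for file in "phsource/ph_$name" "dictsource/${code}_list" "dictsource/${code}_rules"; do
        [ -e "$file" ] || : > "$file"
    done
}
```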
### Makefile.am File

`Makefile.am` is the build configuration file.
Search it for the configuration of an existing language (e.g. English)
and add similar lines for your language in the following sections,
e.g. for French:

    phsource/phonemes.stamp: \
    	...
    	phsource/ph_french \
    	...

    dictionaries: \
    	...
    	espeak-ng-data/fr_dict \
    	...

    fr: espeak-ng-data/fr_dict

    dictsource/fr_extra:
    	touch dictsource/fr_extra

    espeak-ng-data/fr_dict: src/espeak-ng phsource/phonemes.stamp dictsource/fr_list dictsource/fr_rules dictsource/fr_extra
    	cd dictsource && ESPEAK_DATA_PATH=$(PWD) LD_LIBRARY_PATH=../src:${LD_LIBRARY_PATH} ../src/espeak-ng --compile=fr && cd ..

    ...

Note that you don't need to add the `fr_extra` reference in the last group if your language doesn't have this file.

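The per-language rules in the last group follow the same pattern for every language. The sketch below prints them for another language code, assuming the same layout as the French example. `makefile_rules` is a hypothetical helper. You still have to paste its output into `Makefile.am` by hand and add the matching entries to the `phsource/phonemes.stamp` and `dictionaries` lists.

```shell
# makefile_rules CODE  (hypothetical helper)
# Prints the per-language Makefile.am rules shown above for CODE.
makefile_rules() {
    code=$1
    tab=$(printf '\t')   # make recipe lines must start with a tab
    printf '%s: espeak-ng-data/%s_dict\n\n' "$code" "$code"
    printf 'dictsource/%s_extra:\n' "$code"
    printf '%stouch dictsource/%s_extra\n\n' "$tab" "$code"
    printf 'espeak-ng-data/%s_dict: src/espeak-ng phsource/phonemes.stamp' "$code"
    printf ' dictsource/%s_list dictsource/%s_rules dictsource/%s_extra\n' "$code" "$code" "$code"
    # Single quotes keep $(PWD) and ${LD_LIBRARY_PATH} literal, for make to expand.
    printf '%scd dictsource && ESPEAK_DATA_PATH=$(PWD) LD_LIBRARY_PATH=../src:${LD_LIBRARY_PATH} ../src/espeak-ng --compile=%s && cd ..\n' "$tab" "$code"
}
```

For example, `makefile_rules de` prints the same four rules with `de` in place of `fr`.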
### Phonemes File

Open the `phsource/phonemes` file and add the following lines to it, so that
it includes your new phoneme definition file (e.g. `ph_french`):

    ...
    phonemetable fr base
    include ph_french
    ...

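Because the reference goes at the end of the master phoneme file, this edit can also be scripted. The sketch below assumes the new table inherits from `base`, as in the example above. `add_phoneme_table` is a hypothetical helper. It appends the two lines and refuses to add a table that is already defined.

```shell
# add_phoneme_table CODE NAME [FILE]  (hypothetical helper)
# e.g. add_phoneme_table fr french
# Appends "phonemetable CODE base" and "include ph_NAME" to the master
# phoneme file (phsource/phonemes by default).
add_phoneme_table() {
    code=$1 name=$2 file=${3:-phsource/phonemes}
    [ -f "$file" ] || { echo "$file not found" >&2; return 1; }
    # Refuse to define the same phoneme table twice.
    if grep -q "^phonemetable $code " "$file"; then
        echo "phonemetable $code is already defined in $file" >&2
        return 1
    fi
    # Make sure the appended lines start on a new line.
    [ -n "$(tail -c 1 "$file")" ] && echo >> "$file"
    printf 'phonemetable %s base\ninclude ph_%s\n' "$code" "$name" >> "$file"
}
```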
### Voice File

E.g. `espeak-data/voices/roa/fr` is the voice file for French.
It gives the language name and may set some options.

Each language needs a voice file in `espeak-data/voices` grouped by the
[language family](#language-family). The filename of the default voice for a
language should be the same as the language code (e.g. `fr` for French).

The simplest voice file would contain just 2 lines, giving the language
name and the language code. A voice file can also set other options; for
example, you may want to start the implementation of a new language by using
the phoneme table of an existing language.

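The sketch below writes the simplest voice file for French. It assumes the `name` and `language` keywords described in [Voices](voices.md). `write_voice_file` is a hypothetical helper. It places the file in the directory of the language family and names it after the language code.

```shell
# write_voice_file NAME CODE FAMILY  (hypothetical helper)
# e.g. write_voice_file french fr roa  ->  espeak-data/voices/roa/fr
write_voice_file() {
    name=$1 code=$2 family=$3
    dir=espeak-data/voices/$family
    mkdir -p "$dir" || return 1
    # The simplest voice file: the language name and the language code.
    printf 'name %s\nlanguage %s\n' "$name" "$code" > "$dir/$code"
}
```

After `write_voice_file french fr roa`, the file `espeak-data/voices/roa/fr` contains the two lines `name french` and `language fr`.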
Details of the contents of voice files are given in [Voices](voices.md).

### Phoneme Definition File

E.g. `phsource/ph_french` is the phoneme definition file for French.
It contains phoneme definitions for the vowels and consonants which the
language uses. Usually it will contain mostly vowels. Most consonants will be
inherited from the common phoneme definitions in the _master phoneme file_:
`phsource/phonemes`.

You must first decide on the set of phonemes (vowel and consonant
sounds) for the language. These should be defined in a phoneme
definition file `ph_french`, where `french` is the name of your
language. A reference to this file is then included at the end of the
master phoneme file, `phsource/phonemes`, as shown in
[Phonemes File](#phonemes-file). Often it is enough to use the inherited
consonants and define only the vowel phonemes, at least for an initial
implementation.

### Dictionary Files

There are usually two dictionary files, e.g. for French:

* `dictsource/fr_list`. This contains pronunciations for numbers, letter and
  symbol names, and words with exceptional pronunciations. It also gives
  attributes such as "unstressed" and "pause" to some common words.
  The `fr_list` file contains:
  * Pronunciations which are exceptions to the rules in `fr_rules` (e.g.
    foreign names).
  * Pronunciation of letter names, symbol names, and punctuation names.
  * Pronunciation of numbers.
  * Attributes for words. For example, common function words which should
    not be stressed, or conjunctions which should be preceded by a pause.
* `dictsource/fr_rules`. This contains the spelling-to-phoneme translation
  rules.

Details of the contents of the dictionary files are given in
[Dictionary](dictionary.md).

The `fr_rules` and `fr_list` files are compiled to produce the
`espeak-ng-data/fr_dict` file, which eSpeak uses when it is speaking.

Once the language's phonemes have been defined, then pronunciation
dictionary data can be produced in order to translate the language's
text into phonemes. The dictionary is compiled with:

    make fr

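Before running `make fr`, it can help to check that all the files from the [Configuration Files](#configuration-files) table are in place. The sketch below assumes it is run from the top of the source tree. `check_language_files` is a hypothetical helper. It does not check the optional `fr_extra` file, because the Makefile rule creates that file with `touch`.

```shell
# check_language_files NAME CODE FAMILY  (hypothetical helper)
# e.g. check_language_files french fr roa && make fr
# Lists the problems on stderr and returns non-zero if anything is missing.
check_language_files() {
    name=$1 code=$2 family=$3
    status=0
    for file in "phsource/ph_$name" "dictsource/${code}_list" \
                "dictsource/${code}_rules" "espeak-data/voices/$family/$code"; do
        if [ ! -f "$file" ]; then
            echo "missing: $file" >&2
            status=1
        fi
    done
    # The master phoneme file must include the new phoneme definition file.
    if ! grep -q "^[[:space:]]*include[[:space:]][[:space:]]*ph_$name[[:space:]]*$" \
            phsource/phonemes 2>/dev/null; then
        echo "phsource/phonemes does not include ph_$name" >&2
        status=1
    fi
    return $status
}
```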
## Program Code

The behaviour of the eSpeak program is controlled by various options