
cmn: search for dictionary matches instead of translating characters.

cmn (Mandarin Chinese) has been broken since commit 4825905.

This fix makes Mandarin behave more like Cantonese: instead of
translating characters, we search for dictionary matches.

The handling of normal versus Chao tones needs further investigation.
It looks like Latin characters used as pinyin still use Chao tones,
whereas the characters in cmn_list and cmn_listx do not.

See #1044 for discussion. See also #1028 and #1163.
Branch: master
Author: Juho Hiltunen, 2 years ago
Commit: 1443c970cd
4 changed files with 41 additions and 9 deletions
1. dictsource/cmn_list: +0, -1
2. dictsource/extra/cmn_listx: +0, -1
3. phsource/ph_cmn: +40, -6
4. src/libespeak-ng/tr_languages.c: +1, -1

dictsource/cmn_list (+0, -1)

@@ -93,7 +93,6 @@ z zi51
ㄨ wu55
ㄩ y55

$textmode
// Most frequent pronunciations of the 3799 most common characters (from Unihan database ftp://ftp.unicode.org/Public/UNIDATA/Unihan.zip, kHanyuPinlu field with some corrections)
涉 she4
礦 kuang4

dictsource/extra/cmn_listx (+0, -1)

@@ -1,4 +1,3 @@
$textmode
//From Unihan database ftp://ftp.unicode.org/Public/UNIDATA/Unihan.zip kMandarin entries (except the ones that have kHanyuPinlu, which are in zh_list)
//with compounds from CC-CEDICT http://www.mdbg.net/chindict/chindict.php?page=cedict and some corrections
//21611 single characters plus 36500 compound exceptions (includes 320 added 'yi' and 10721 added 'bu' exceptions, and 9700 extra 2-syllable words for 3rd-tone sandhi blocking)

phsource/ph_cmn (+40, -6)

@@ -1,23 +1,39 @@

//====================================================
// Tone Numbers
// Note:
// For tones 1-5, both normal tones and Chao tones
// are defined
//====================================================

phoneme 11 // tone: low level
// tone: low level
phoneme 11
stress
Tone(12, 9, envelope/i_risefall, NULL)
endphoneme

phoneme 5
stress
Tone(12, 9, envelope/i_risefall, NULL)
endphoneme


phoneme 21 // tone: low fall
stress
Tone(20, 10, envelope/p_fall, NULL)
endphoneme

phoneme 214 // tone: fall rise
// tone: fall rise
phoneme 214
stress
Tone(18, 42, envelope/p_214, NULL)
endphoneme

phoneme 3
stress
Tone(18, 42, envelope/p_214, NULL)
endphoneme


phoneme 22 // tone: mid-low level
stress
Tone(22, 20, envelope/p_fall, NULL)
@@ -28,7 +44,13 @@ phoneme 33 // tone: mid level
Tone(32, 30, envelope/p_fall, NULL)
endphoneme

phoneme 35 // tone: mid rise
// tone: mid rise
phoneme 35
stress
Tone(30, 50, envelope/p_rise, NULL)
endphoneme

phoneme 2
stress
Tone(30, 50, envelope/p_rise, NULL)
endphoneme
@@ -38,7 +60,13 @@ phoneme 44 // tone: mid-high level
Tone(38, 41, envelope/p_rise, NULL)
endphoneme

phoneme 51 // tone: high fall
// tone: high fall
phoneme 51
stress
Tone(50, 10, envelope/p_fall, NULL)
endphoneme

phoneme 4
stress
Tone(50, 10, envelope/p_fall, NULL)
endphoneme
@@ -48,7 +76,13 @@ phoneme 53 // tone: high fall
Tone(50, 30, envelope/p_fall, NULL)
endphoneme

phoneme 55 // tone: high level
// tone: high level
phoneme 55
stress
Tone(55, 50, envelope/p_level, NULL)
endphoneme

phoneme 1
stress
Tone(55, 50, envelope/p_level, NULL)
endphoneme

src/libespeak-ng/tr_languages.c (+1, -1)

@@ -1588,8 +1588,8 @@ Translator *SelectTranslator(const char *name)
tr->langopts.ideographs = 1;
tr->langopts.our_alphabet = 0x3100;
tr->langopts.word_gap = 0x21; // length of a final vowel is less dependent on the next consonant, don't merge consonant with next word
tr->langopts.textmode = true;
if (name2 == L3('y', 'u', 'e')) {
tr->langopts.textmode = true;
tr->langopts.listx = 1; // compile zh_listx after zh_list
tr->langopts.numbers = NUM_DEFAULT;
tr->langopts.numbers2 = NUM2_ZERO_TENS;
