- fixed/removed non-working rules in be_list
- added stress to the words in be_list
- fixed multi-thousand transcription
- removed non-working rules in be_rules
- added rules for palatalization and phoneme lengthening
- fixed dropping of [a] at the end of words
- fixed message "Full dictionary is not installed for"
- added configuration in tr_languages.c
- fixed/added phonemes for `Q`, `ts`, `ts;`, `dz`, `dz.`, `;` etc
Remove LOPT_IT_DOUBLING and delete unused functionality.
The code checked for both the langopt LOPT_IT_DOUBLING and the attribute
$double. LOPT_IT_DOUBLING is redundant, and $double already has a test in
dictionary.test.
The code also checked tr->langopts.param[LOPT_IT_DOUBLING] & 2, but no
language sets that bit, so that logic was removed.
cmn: search for dictionary matches instead of translating characters.
cmn (Mandarin Chinese) has been broken since 4825905.
This fix makes Mandarin behave more like Cantonese: instead of
translating characters, we search for dictionary matches.
The functionality of normal vs Chao tones should be investigated more.
It looks like Latin characters written as pinyin still use Chao tones,
whereas the characters in cmn_list and cmn_listx do not.
See #1044 for discussion. See also #1028 and #1163.
The memory sanitizer would complain:
==4157154==WARNING: MemorySanitizer: use-of-uninitialized-value
#0 0x7fc191d0a85b in TranslateWord3 /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/translate.c:1065:7
#1 0x7fc191d02916 in TranslateWord /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/translate.c:1100:14
#2 0x7fc191d1b324 in TranslateWord2 /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/translate.c:1448:15
#3 0x7fc191d14ebc in TranslateClause /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/translate.c:2623:17
#4 0x7fc191cfbc9b in SpeakNextClause /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/synthesize.c:1569:2
#5 0x7fc191cd52fc in Synthesize /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/speech.c:457:2
#6 0x7fc191cd6d7c in sync_espeak_Synth /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/speech.c:570:29
#7 0x7fc191cd6d7c in espeak_ng_Synthesize /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/speech.c:678:10
#8 0x7fc191ca0340 in espeak_Synth /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/espeak_api.c:90:32
#9 0x4a4381 in main /home/samy/brl/speech/espeak-ng-git/src/espeak-ng.c:691:3
#10 0x7fc19168b7fc in __libc_start_main csu/../csu/libc-start.c:332:16
#11 0x421449 in _start (/home/samy/ens/projet/1/speech/espeak-ng-git/src/.libs/espeak-ng+0x421449)
Uninitialized value was created by a heap allocation
#0 0x45000d in malloc (/home/samy/ens/projet/1/speech/espeak-ng-git/src/.libs/espeak-ng+0x45000d)
#1 0x7fc191d1ca29 in NewTranslator /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/tr_languages.c:242:26
#2 0x7fc191d1ca29 in SelectTranslator /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/tr_languages.c:482:7
(and similar for expect_verb_s, expect_noun, expect_past,
clause_upper_count, clause_lower_count)
Indeed, TranslateWord3 doesn't always initialize these fields. It is
better to initialize them directly when the Translator is created.
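The idea of initializing the fields at creation time can be sketched as follows. This is a minimal illustration, not the actual change: `SketchTranslator`, `sketch_reset_clause_state` and `sketch_new_translator` are hypothetical names, and the struct holds only a subset of the fields listed above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, cut-down stand-in for the real Translator struct. */
typedef struct {
    int expect_noun;
    int expect_past;
    int clause_upper_count;
    int clause_lower_count;
} SketchTranslator;

/* Give every per-clause field a known value. */
void sketch_reset_clause_state(SketchTranslator *tr)
{
    assert(tr != NULL);
    tr->expect_noun = 0;
    tr->expect_past = 0;
    tr->clause_upper_count = 0;
    tr->clause_lower_count = 0;
}

/* malloc() returns uninitialized memory, so reset the fields right after
   allocation instead of relying on later code to write them first. */
SketchTranslator *sketch_new_translator(void)
{
    SketchTranslator *tr = malloc(sizeof(*tr));
    if (tr == NULL)
        return NULL;
    sketch_reset_clause_state(tr);
    return tr;
}
```

With this shape, any code that reads one of these fields before writing it sees 0 rather than whatever the heap happened to contain.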
Valgrind reports:
==3632264== Conditional jump or move depends on uninitialised value(s)
==3632264== at 0x4846688: strcmp (vg_replace_strmem.c:924)
==3632264== by 0x490EC12: LookupDictList (dictionary.c:2889)
==3632264== by 0x49554C6: TranslateWord3 (translate.c:588)
==3632264== by 0x4957FCE: TranslateWord (translate.c:1100)
==3632264== by 0x4959344: TranslateWord2 (translate.c:1361)
==3632264== by 0x4961390: TranslateClause (translate.c:2621)
==3632264== by 0x494FF7A: SpeakNextClause (synthesize.c:1569)
==3632264== by 0x4939B9D: Synthesize (speech.c:457)
==3632264== by 0x493AE6A: sync_espeak_Synth (speech.c:570)
==3632264== by 0x493B286: espeak_ng_Synthesize (speech.c:678)
==3632264== by 0x4916925: espeak_Synth (espeak_api.c:90)
==3632264== by 0x10CF5D: main (espeak-ng.c:691)
Indeed, tr->phonemes_repeat is not necessarily initialized.
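A minimal sketch of why the strcmp() is flagged and how an initialized buffer avoids it. The struct, the buffer size and the function names are assumptions made for illustration; only the field name phonemes_repeat comes from the report above.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the part of the translator state involved. */
typedef struct {
    char phonemes_repeat[20]; /* size is an assumption */
} SketchRepeatState;

/* Make the buffer a valid empty C string before anyone compares it. */
void sketch_reset_repeat(SketchRepeatState *st)
{
    assert(st != NULL);
    st->phonemes_repeat[0] = '\0';
}

/* Returns 1 when `phonemes` equals the stored string, 0 otherwise.
   strcmp() reads the stored buffer up to its terminator, which is
   only well defined once sketch_reset_repeat() has run. */
int sketch_is_repeat(const SketchRepeatState *st, const char *phonemes)
{
    assert(st != NULL && phonemes != NULL);
    return strcmp(st->phonemes_repeat, phonemes) == 0;
}
```

Writing a single terminating byte is enough here, because strcmp() stops at the first NUL and never looks at the rest of the buffer.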
93d3c67df accidentally changed number handling for Finnish.
D_FRACTION2 seems to match Finnish number standards. It causes the
decimals to be read as a number instead of as individual digits: 12,12 is
read "twelve point twelve" instead of "twelve point one two".
It is undocumented, so it might cause regressions as well.
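The difference between the two readings can be illustrated with a small sketch. It does not reproduce how D_FRACTION2 is implemented; `sketch_fraction_tokens` and its `as_number` flag are hypothetical and only show which tokens a reader would speak for the digits after the decimal separator.

```c
#include <assert.h>
#include <stddef.h>

/* Copy the fraction digits in `frac` into `out` as the tokens to be spoken.
   as_number != 0: the digits stay together as one number ("12").
   as_number == 0: each digit becomes its own token ("1 2").
   Output is truncated if `out` is too small. */
void sketch_fraction_tokens(const char *frac, int as_number,
                            char *out, size_t out_size)
{
    size_t pos = 0;

    assert(frac != NULL && out != NULL);
    if (out_size == 0)
        return;

    for (size_t i = 0; frac[i] != '\0' && pos + 2 < out_size; i++) {
        if (i > 0 && !as_number)
            out[pos++] = ' '; /* separate individual digits */
        out[pos++] = frac[i];
    }
    out[pos] = '\0';
}
```

For the fraction part of 12,12, the first mode yields the single token 12 ("twelve") and the second yields 1 and 2 ("one two").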
See commit message for 23a4d88f.
This commit fixes cmn and yue.
CalcPitches_Tone() now accepts cmn for translator_name.
SelectTranslator() now has a case for yue instead of zhy.
Option "language <name>" already causes SelectTranslator(<name>) to be
called. Having two options that do almost the same thing is unnecessary
and confusing.
In the long term, all options from SelectTranslator() should have a
switch case in LoadVoice() so they are user configurable (see #218). If
needed, a new option (maybe called "LoadOptions") could be added to load
an existing voice or language file.
Changes language configuration files for: hak, cmn, yue, ltg, ms, mb-ma1.
No user-visible changes.
voices: Change default number pronunciation rule to enabled.
docs: add details about number flags to the documentation.
It's clearly intended to be enabled by default:
- it's defined as the default behaviour in translate.h (NUM_DEFAULT)
- tr_languages.c sets many default values related to number processing
that have no meaning unless langopts.numbers == 1.
It is also a more sensible default since most languages will want to
have number processing on. This makes adding new languages easier
because adding an entry to tr_languages.c is unnecessary.
A negative side effect is that languages with partial number definitions
might experience bugs when reading undefined numbers. This is a bug and
should be fixed.
This will have the side effect of enabling number processing for
languages that currently have it disabled. However, there shouldn't be
any.
Here's a way to check affected languages:
for voice in $(ESPEAK_DATA_PATH=`pwd` LD_LIBRARY_PATH=src:${LD_LIBRARY_PATH} \
    src/espeak-ng --voices | grep -v Languages | awk '{print $2}'); do
  OUTPUT=$(ESPEAK_DATA_PATH=`pwd` LD_LIBRARY_PATH=src:${LD_LIBRARY_PATH} \
    src/espeak-ng -qx -v $voice "1 - 2 - 3 - 12 - 123") && echo "$voice: $OUTPUT"
done
These voices clearly benefit from enabling numbers (they already have
number rules in *_list):
ba, cmn (zh), hak, haw, ja, kok, nb, nci
Some languages are missing some definitions (like _12) in _list files.
It causes the program to skip some numbers.
Numbering needs to be turned off explicitly for:
jbo, mi, my, piqd, py, qu, quc, th, uz
Languages with no number rules at all:
chr, cv, he, nog, tk, ug
Code cleanup: remove param2 from langopts and rename keyword option in language files.
- param2[] is only used to set a second value to LOPT_BRACKET_PAUSE. It is simpler
to have two values in param[] instead. This simplifies the codebase.
- Instead of setting "option bracket X Y" in language files, use
keywords "brackets X" and "bracketsAnnounced Y" instead to follow the
naming convention of other keywords.
- Add missing documentation to docs/voices.md.
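A minimal sketch of the resulting layout, with both bracket values stored in one param[] array and set through the two keywords named above. The index names, the struct and `sketch_set_bracket_option` are hypothetical; the real LOPT_* identifiers and the voice-file parser are not shown here.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical indices into param[]; one slot per bracket setting. */
enum {
    SKETCH_BRACKET_PAUSE,
    SKETCH_BRACKET_PAUSE_ANNOUNCED,
    SKETCH_PARAM_COUNT
};

/* Hypothetical stand-in for langopts: a single array, no param2[]. */
typedef struct {
    int param[SKETCH_PARAM_COUNT];
} SketchLangOpts;

/* Store `value` for a recognised keyword.
   Returns 1 on success, 0 if the keyword is unknown. */
int sketch_set_bracket_option(SketchLangOpts *opts, const char *keyword,
                              int value)
{
    assert(opts != NULL && keyword != NULL);
    if (strcmp(keyword, "brackets") == 0) {
        opts->param[SKETCH_BRACKET_PAUSE] = value;
        return 1;
    }
    if (strcmp(keyword, "bracketsAnnounced") == 0) {
        opts->param[SKETCH_BRACKET_PAUSE_ANNOUNCED] = value;
        return 1;
    }
    return 0;
}
```

Giving each value its own keyword and its own slot means a language file can set one without restating the other, which the old two-argument "option bracket X Y" form did not allow.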