The memory sanitizer would complain:
==4157154==WARNING: MemorySanitizer: use-of-uninitialized-value
#0 0x7fc191d0a85b in TranslateWord3 /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/translate.c:1065:7
#1 0x7fc191d02916 in TranslateWord /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/translate.c:1100:14
#2 0x7fc191d1b324 in TranslateWord2 /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/translate.c:1448:15
#3 0x7fc191d14ebc in TranslateClause /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/translate.c:2623:17
#4 0x7fc191cfbc9b in SpeakNextClause /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/synthesize.c:1569:2
#5 0x7fc191cd52fc in Synthesize /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/speech.c:457:2
#6 0x7fc191cd6d7c in sync_espeak_Synth /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/speech.c:570:29
#7 0x7fc191cd6d7c in espeak_ng_Synthesize /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/speech.c:678:10
#8 0x7fc191ca0340 in espeak_Synth /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/espeak_api.c:90:32
#9 0x4a4381 in main /home/samy/brl/speech/espeak-ng-git/src/espeak-ng.c:691:3
#10 0x7fc19168b7fc in __libc_start_main csu/../csu/libc-start.c:332:16
#11 0x421449 in _start (/home/samy/ens/projet/1/speech/espeak-ng-git/src/.libs/espeak-ng+0x421449)
Uninitialized value was created by a heap allocation
#0 0x45000d in malloc (/home/samy/ens/projet/1/speech/espeak-ng-git/src/.libs/espeak-ng+0x45000d)
#1 0x7fc191d1ca29 in NewTranslator /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/tr_languages.c:242:26
#2 0x7fc191d1ca29 in SelectTranslator /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/tr_languages.c:482:7
(and similar for expect_verb_sn expect_noun, expect_past,
clause_upper_count, clause_lower_count)
Indeed TranslateWord3 doesn't always initialize these fields. Better
just initialize them directly from the Translator creation.
valgrind reports
==3632264== Conditional jump or move depends on uninitialised value(s)
==3632264== at 0x4846688: strcmp (vg_replace_strmem.c:924)
==3632264== by 0x490EC12: LookupDictList (dictionary.c:2889)
==3632264== by 0x49554C6: TranslateWord3 (translate.c:588)
==3632264== by 0x4957FCE: TranslateWord (translate.c:1100)
==3632264== by 0x4959344: TranslateWord2 (translate.c:1361)
==3632264== by 0x4961390: TranslateClause (translate.c:2621)
==3632264== by 0x494FF7A: SpeakNextClause (synthesize.c:1569)
==3632264== by 0x4939B9D: Synthesize (speech.c:457)
==3632264== by 0x493AE6A: sync_espeak_Synth (speech.c:570)
==3632264== by 0x493B286: espeak_ng_Synthesize (speech.c:678)
==3632264== by 0x4916925: espeak_Synth (espeak_api.c:90)
==3632264== by 0x10CF5D: main (espeak-ng.c:691)
And indeed tr->phonemes_repeat may not necessarily be initialized.
93d3c67df accidentally changes number handling for finnish.
D_FRACTION2 seems to match finnish number standards. It causes the
decimals to be read as numbers instead of individual digits: 12,12 is
read "twelve point twelve" instead of "twelve point one two".
It is undocumented so it might cause regressions as well.
See commit message for 23a4d88f.
This commit fixes cmn and yue.
CalcPitches_Tone() now accepts cmn for translator_name.
SelectTranslator() now has a case for yue instead of zhy.
Option "language <name>"already causes SelectTranslator(<name>) to be
called. Having two options to do almost the same thing is unnecessary
and confusing.
In the long term, all options from SelectTranslator() should have a
switch case in LoadVoice() so they are user configurable (see #218). If
needed, a new option (maybe called "LoadOptions") could be added to load
an existing voice or language file.
Changes language configuration files for: hak, cmn, yue, ltg, ms, mb-ma1.
No changes to users.
voices: Change default number pronunciation rule to enabled.
docs: add details about number flags to the documentation.
It's clearly intended to be enabled by default:
- it's defined as default behaviour translate.h (NUM_DEFAULT)
- tr_languages.c sets many default values related to number processing
that have no meaning unless langopts.numbers == 1.
It is also a more sensible default since most languages will want to
have number processing on. This makes adding new languages easier
because adding an entry to tr_languages.c is unnecessary.
A negative side effect is that languages with partial number defines
might experience bugs when reading undefined numbers. This is a bug and
should be fixed.
This will have the side effect of enabling number processing for
languages that currently have it disabled. However, there shouldn't be
any.
Here's a way to check affected languages:
for voice in $(ESPEAK_DATA_PATH=`pwd` LD_LIBRARY_PATH=src:${LD_LIBRARY_PATH}
src/espeak-ng --voices | grep -v Languages | awk '{print $2}'); do
OUTPUT=$(ESPEAK_DATA_PATH=`pwd` LD_LIBRARY_PATH=src:${LD_LIBRARY_PATH}
src/espeak-ng -qx -v $voice "1 - 2 - 3 - 12 - 123") && echo "$voice:
$OUTPUT" ; done
These voices clearly benefit from enabling numbers (they already have
number rules in *_list):
ba, cmn (zh), hak, haw, ja, kok, nb, nci
Some languages are missing some definitions (like _12) in _list files.
It causes the program to skip some numbers.
Numbering needs to be turned off explicitly for:
jbo, mi, my, piqd, py, qu, quc, th, uz
Languages with no number rules at all:
chr, cv, he, nog, tk, ug
Code cleanup: remove param2 from langopts and rename keyword option in language files.
- param2[] is only used to set a second value to LOPT_BRACKET_PAUSE. It is simpler
to have two values in param[] instead. This simplifies the codebase.
- Instead of setting "option bracket X Y" in language files, use
keywords "brackets X" and "bracketsAnnounced Y" instead to follow the
naming convention of other keywords.
- Add missing documentation to docs/voices.md.
code cleanup: Check all local includes with include-what-you-use
Going through files in src/libespeak-ng/, include-what-you-use removed a
few unnecessary includes and included explanations on why a certain
header should be included. This makes tracking globals and dependencies easier.
Running the codebase through IWYU should be repeated after each major
code restIncludes to standard c library weren't checked to avoid
breaking builds with other platforms.
See https://github.com/include-what-you-use/include-what-you-use