TranslateWord2 passes translator2 as tr to TranslateWord which may call
TranslateWord3, SpeakIndividualLetters, TranslateLetter, which was calling
SetTranslator2 again, thus freeing the very tr being used. Make that
latter use another translator.
When there are language switches, when we rewind to the start of the
phoneme list we have to reset the phoneme table back.
This avoids some branching that depends on undefined values, caught by
valgrind in the case e.g. of an emoji substitution that contains a
language switch.
Ref #874
And make sure that the unfilled entries are NULL to catch any spurious
reference.
Also fix some missing phoneme table switch according to language
changes.
And fill the last phlist prepause and newword fields, otherwise they are
detected as undefined:
==483407== Conditional jump or move depends on uninitialised value(s)
==483407== at 0x488E6AB: Generate (synthesize.c:1228)
==483407== by 0x488FD94: SpeakNextClause (synthesize.c:1587)
==483407== by 0x4887F56: Synthesize (speech.c:457)
==483407== by 0x488884C: sync_espeak_Synth (speech.c:570)
==483407== by 0x487B270: espeak_Synth (espeak_api.c:90)
==483407== by 0x10ACA0: main (espeak-ng.c:691)
==483407== Uninitialised value was created by a client request
==483407== at 0x4884893: MakePhonemeList (phonemelist.c:155)
==483407== by 0x4895712: TranslateClause (translate.c:2682)
==483407== by 0x488FCCF: SpeakNextClause (synthesize.c:1569)
==483407== by 0x4887F56: Synthesize (speech.c:457)
==483407== by 0x488884C: sync_espeak_Synth (speech.c:570)
==483407== by 0x487B270: espeak_Synth (espeak_api.c:90)
==483407== by 0x10ACA0: main (espeak-ng.c:691)
==483407==
==483407== Conditional jump or move depends on uninitialised value(s)
==483407== at 0x488E622: Generate (synthesize.c:1211)
==483407== by 0x488FD94: SpeakNextClause (synthesize.c:1587)
==483407== by 0x4887F56: Synthesize (speech.c:457)
==483407== by 0x488884C: sync_espeak_Synth (speech.c:570)
==483407== by 0x487B270: espeak_Synth (espeak_api.c:90)
==483407== by 0x10ACA0: main (espeak-ng.c:691)
==483407== Uninitialised value was created by a client request
==483407== at 0x4884893: MakePhonemeList (phonemelist.c:155)
==483407== by 0x4895712: TranslateClause (translate.c:2682)
==483407== by 0x488FCCF: SpeakNextClause (synthesize.c:1569)
==483407== by 0x4887F56: Synthesize (speech.c:457)
==483407== by 0x488884C: sync_espeak_Synth (speech.c:570)
==483407== by 0x487B270: espeak_Synth (espeak_api.c:90)
==483407== by 0x10ACA0: main (espeak-ng.c:691)
==483407==
This is changing the ssml.test output, but with no audible difference,
so this is probably a real fix for it.
These are global arrays reused several times. When using them msan and
valgrind thus believe they are always initialized, which reduces their
capacity to detect uninitialized values. We can however explicitly tell them
when they are reused, and thus to be considered as uninitialized.
When pollint() returns 100.0, multiplying by 2.55 doesn't actually seem to
be getting 255 on i386. Multiplying by 255 and dividing by 100, however,
does (probably because float computation with small integer values are
guaranteed to have integer results).
Fixes #1151
CheckThousandsGroup: Avoid reading uninitialized data
For the case when word is smaller than 4 characters, we should not look at
the 3rd or 4th character before checking the previous ones, otherwise
we'd at best read uninitialized data, at worse non-existing data.
Some rules test against character not being of a certain type. That may
match with the \0 end-of-text marker, and thus actually step over
it and let MatchRule continue with uninitialized data after it, leading
to potential random behavior.
This commits fixes it by making sure that we don't read past that \0.
This seems to be changing the pronunciation of "capitals" from k'apIt@Lz to
k'apIt,alz, I don't know why, I guess the rule for it was actually
bogus?
MatchRule: Prevent non-eating special characters from eating characters
Special characters such as N, S1, etc. are not actually eating
characters. Their treatment should thus *not* update pre_ptr and post_ptr,
otherwise those would underflow/overflow, e.g. in the case
@) s (_NS1 [z]
this would overflow. This for instance noticeable with the memory sanitizer:
ESPEAK_DATA_PATH=$PWD ./src/espeak-ng -qX "capitals"
Translate 'capitals'
1 c [k]
1 a [a]
1 p [p]
1 i [I]
1 t [t]
1 a [a]
1 l [l]
20 l (C [l]
==2837201==WARNING: MemorySanitizer: use-of-uninitialized-value
#0 0x7f7f4422744b in utf8_in2 /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/translate.c:281:2
#1 0x7f7f442281bc in utf8_in /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/translate.c:332:9
#2 0x7f7f440e0d31 in MatchRule /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/dictionary.c:1767:21
#3 0x7f7f440d937f in TranslateRules /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/dictionary.c:2320:6
#4 0x7f7f44230e5f in TranslateWord3 /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/translate.c:733:15
#5 0x7f7f44229844 in TranslateWord /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/translate.c:1100:14
#6 0x7f7f44256e50 in TranslateWord2 /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/translate.c:1361:11
#7 0x7f7f4424d6cc in TranslateClause /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/translate.c:2623:17
#8 0x7f7f44213359 in SpeakNextClause /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/synthesize.c:1569:2
#9 0x7f7f441a9f56 in Synthesize /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/speech.c:457:2
#10 0x7f7f441a9023 in sync_espeak_Synth /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/speech.c:570:29
#11 0x7f7f441ad59f in espeak_ng_Synthesize /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/speech.c:678:10
#12 0x7f7f4410b3f4 in espeak_Synth /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/espeak_api.c:90:32
#13 0x4a8be3 in main /home/samy/brl/speech/espeak-ng-git/src/espeak-ng.c:691:3
#14 0x7f7f43a2e7fc in __libc_start_main csu/../csu/libc-start.c:332:16
#15 0x421449 in _start (/home/samy/ens/projet/1/speech/espeak-ng-git/src/.libs/espeak-ng+0x421449)
Uninitialized value was created by an allocation of 'sbuf' in the stack frame of function 'TranslateClause'
#0 0x7f7f4423a1f0 in TranslateClause /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/translate.c:1941
While trying to match _NS1, MatchRule is overflowing the buffer.
It happens that this had not usually posed problem because rules usually
have these non-eating special characters last in the rule and thus it wasn't
mattering that post_ptr is pointing outside valid text.
Some rules test against character not being of a certain type. That may
match with the \0 beginning-of-text marker, and thus actually step over
it and let MatchRule continue with uninitialized data before it, leading
to potential random behavior.
This commits fixes it by making sure that we don't read before that \0.
LookupDict2 looks forward in the wtab array, it should still stop at its
end. Otherwise the memory sanitizer reports this:
testing en A. B C, D. E: F.
==65960==WARNING: MemorySanitizer: use-of-uninitialized-value
#0 0x7ff9d7ef0de8 in LookupDict2 /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/dictionary.c:2676:11
#1 0x7ff9d7eec2ec in LookupDictList /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/dictionary.c:2899:10
#2 0x7ff9d802860a in TranslateWord3 /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/translate.c:588:12
#3 0x7ff9d80249d4 in TranslateWord /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/translate.c:1100:14
#4 0x7ff9d8051fe0 in TranslateWord2 /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/translate.c:1361:11
#5 0x7ff9d804885c in TranslateClause /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/translate.c:2623:17
#6 0x7ff9d800e4e9 in SpeakNextClause /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/synthesize.c:1569:2
#7 0x7ff9d7fa50e6 in Synthesize /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/speech.c:457:2
#8 0x7ff9d7fa41b3 in sync_espeak_Synth /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/speech.c:570:29
#9 0x7ff9d7fa872f in espeak_ng_Synthesize /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/speech.c:678:10
#10 0x7ff9d7f06584 in espeak_Synth /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/espeak_api.c:90:32
#11 0x4a8be3 in main /home/samy/brl/speech/espeak-ng-git/src/espeak-ng.c:691:3
#12 0x7ff9d78297fc in __libc_start_main csu/../csu/libc-start.c:332:16
#13 0x421449 in _start (/home/samy/ens/projet/1/speech/espeak-ng-git/src/.libs/espeak-ng+0x421449)
Uninitialized value was created by an allocation of 'words' in the stack frame of function 'TranslateClause'
#0 0x7ff9d8035380 in TranslateClause /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/translate.c:1941
Strictly speaking, we are not supposed to use memcmp to compare strings
since we are not supposed to read beyond \0, which memcmp is supposed to
potentially do. Sanitizers would warn about it, and using strncmp happens to
provide the proper semantic while being not really slower, so better
just use them.
phonemes_name is only initialized when V_LANGUAGE is met. This is not
necessarily the case, notably with
testing espeak_SetVoiceByName("!v/Annie") (language variant; intonation)
Cannot set intonation: language not set, or is invalid.
Uninitialized bytes in __interceptor_strcmp at offset 0 inside [0x7fff8a875e30, 1)
==4169902==WARNING: MemorySanitizer: use-of-uninitialized-value
#0 0x4c6a49 in LookupPhonemeTable /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/synthdata.c:363:7
#1 0x4c6a49 in SelectPhonemeTableName /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/synthdata.c:380:12
#2 0x5098a9 in LoadVoice /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/voices.c:950:34
#3 0x50edcf in espeak_ng_SetVoiceByName /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/voices.c:1585:7
#4 0x4aad63 in espeak_SetVoiceByName /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/espeak_api.c:125:32
#5 0x4a3fe1 in test_espeak_set_voice_by_name_language_variant_intonation_parameter /home/samy/brl/speech/espeak-ng-git/tests/api.c:356:2
#6 0x4a3fe1 in main /home/samy/brl/speech/espeak-ng-git/tests/api.c:567:2
#7 0x7f26e88cb7fc in __libc_start_main csu/../csu/libc-start.c:332:16
#8 0x4213a9 in _start (/home/samy/ens/projet/1/speech/espeak-ng-git/tests/api.test+0x4213a9)
Uninitialized value was created by an allocation of 'phonemes_name' in the stack frame of function 'LoadVoice'
#0 0x504290 in LoadVoice /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/voices.c:519
so better catch it properly rather than relying on uninitialized data.
The memory sanitizer would complain:
==4157154==WARNING: MemorySanitizer: use-of-uninitialized-value
#0 0x7fc191d0a85b in TranslateWord3 /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/translate.c:1065:7
#1 0x7fc191d02916 in TranslateWord /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/translate.c:1100:14
#2 0x7fc191d1b324 in TranslateWord2 /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/translate.c:1448:15
#3 0x7fc191d14ebc in TranslateClause /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/translate.c:2623:17
#4 0x7fc191cfbc9b in SpeakNextClause /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/synthesize.c:1569:2
#5 0x7fc191cd52fc in Synthesize /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/speech.c:457:2
#6 0x7fc191cd6d7c in sync_espeak_Synth /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/speech.c:570:29
#7 0x7fc191cd6d7c in espeak_ng_Synthesize /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/speech.c:678:10
#8 0x7fc191ca0340 in espeak_Synth /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/espeak_api.c:90:32
#9 0x4a4381 in main /home/samy/brl/speech/espeak-ng-git/src/espeak-ng.c:691:3
#10 0x7fc19168b7fc in __libc_start_main csu/../csu/libc-start.c:332:16
#11 0x421449 in _start (/home/samy/ens/projet/1/speech/espeak-ng-git/src/.libs/espeak-ng+0x421449)
Uninitialized value was created by a heap allocation
#0 0x45000d in malloc (/home/samy/ens/projet/1/speech/espeak-ng-git/src/.libs/espeak-ng+0x45000d)
#1 0x7fc191d1ca29 in NewTranslator /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/tr_languages.c:242:26
#2 0x7fc191d1ca29 in SelectTranslator /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/tr_languages.c:482:7
(and similar for expect_verb_sn expect_noun, expect_past,
clause_upper_count, clause_lower_count)
Indeed TranslateWord3 doesn't always initialize these fields. Better
just initialize them directly from the Translator creation.