Some rules test that a character is *not* of a certain type. Such a test
may match the \0 end-of-text marker, step over it, and let MatchRule
continue with the uninitialized data after it, leading to potentially
random behavior.
This commit fixes it by making sure that we don't read past that \0.
This seems to change the pronunciation of "capitals" from k'apIt@Lz to
k'apIt,alz. I don't know why; I guess the rule for it was actually
bogus?
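The end-of-text guard described above can be sketched as follows. This is
only an illustration, not the actual MatchRule code: is_vowel and
next_is_not_vowel are hypothetical names, and the word is assumed to be
terminated by a \0 marker.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical character class, used only for this illustration. */
static bool is_vowel(char c)
{
    /* strchr() also finds the terminator of "aeiou", so exclude \0. */
    return c != '\0' && strchr("aeiou", c) != NULL;
}

/*
 * Negative test: "the next character is not a vowel".
 * The \0 end-of-text marker is not a vowel, so the test succeeds on it,
 * but the cursor must not step over it.
 * Returns the new cursor on a match, or NULL on a mismatch.
 */
static const char *next_is_not_vowel(const char *post_ptr)
{
    assert(post_ptr != NULL);
    if (is_vowel(*post_ptr))
        return NULL;         /* class test failed */
    if (*post_ptr == '\0')
        return post_ptr;     /* matched the marker: stay on it */
    return post_ptr + 1;     /* matched a real character: consume it */
}
```

Calling next_is_not_vowel repeatedly on "ts" advances over 't' and 's' and
then keeps returning the position of the \0, never the byte after it.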
MatchRule: Prevent non-eating special characters from eating characters
Special characters such as N, S1, etc. do not actually eat characters.
Their treatment should thus *not* update pre_ptr and post_ptr, otherwise
those would underflow/overflow. For example, with the rule
@) s (_NS1 [z]
post_ptr would overflow. This is noticeable for instance with the memory
sanitizer:
ESPEAK_DATA_PATH=$PWD ./src/espeak-ng -qX "capitals"
Translate 'capitals'
1 c [k]
1 a [a]
1 p [p]
1 i [I]
1 t [t]
1 a [a]
1 l [l]
20 l (C [l]
==2837201==WARNING: MemorySanitizer: use-of-uninitialized-value
#0 0x7f7f4422744b in utf8_in2 /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/translate.c:281:2
#1 0x7f7f442281bc in utf8_in /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/translate.c:332:9
#2 0x7f7f440e0d31 in MatchRule /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/dictionary.c:1767:21
#3 0x7f7f440d937f in TranslateRules /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/dictionary.c:2320:6
#4 0x7f7f44230e5f in TranslateWord3 /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/translate.c:733:15
#5 0x7f7f44229844 in TranslateWord /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/translate.c:1100:14
#6 0x7f7f44256e50 in TranslateWord2 /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/translate.c:1361:11
#7 0x7f7f4424d6cc in TranslateClause /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/translate.c:2623:17
#8 0x7f7f44213359 in SpeakNextClause /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/synthesize.c:1569:2
#9 0x7f7f441a9f56 in Synthesize /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/speech.c:457:2
#10 0x7f7f441a9023 in sync_espeak_Synth /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/speech.c:570:29
#11 0x7f7f441ad59f in espeak_ng_Synthesize /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/speech.c:678:10
#12 0x7f7f4410b3f4 in espeak_Synth /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/espeak_api.c:90:32
#13 0x4a8be3 in main /home/samy/brl/speech/espeak-ng-git/src/espeak-ng.c:691:3
#14 0x7f7f43a2e7fc in __libc_start_main csu/../csu/libc-start.c:332:16
#15 0x421449 in _start (/home/samy/ens/projet/1/speech/espeak-ng-git/src/.libs/espeak-ng+0x421449)
Uninitialized value was created by an allocation of 'sbuf' in the stack frame of function 'TranslateClause'
#0 0x7f7f4423a1f0 in TranslateClause /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/translate.c:1941
While trying to match _NS1, MatchRule overflows the buffer.
This has not usually been a problem because rules usually have these
non-eating special characters last in the rule, so it did not matter that
post_ptr was pointing outside the valid text.
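The difference between eating and non-eating rule characters can be
sketched with a toy matcher. The rule language here is made up for the
illustration ('C' eats one consonant, 'N' is a pure condition driven by a
hypothetical no_suffix flag, anything else is a literal); it is not the
espeak-ng rule syntax, and match_post is not the real MatchRule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical character class for the toy rule language. */
static bool is_consonant(char c)
{
    return c >= 'a' && c <= 'z' && strchr("aeiou", c) == NULL;
}

/*
 * Matches `rule` against `text` going forward.
 * Returns the number of text characters consumed, or -1 on a mismatch.
 */
static int match_post(const char *rule, const char *text, bool no_suffix)
{
    assert(rule != NULL && text != NULL);
    const char *post_ptr = text;

    for (; *rule != '\0'; rule++) {
        if (*rule == 'N') {
            /* Condition only: post_ptr deliberately stays where it is. */
            if (!no_suffix)
                return -1;
            continue;
        }
        if (*post_ptr == '\0')
            return -1;       /* nothing left to consume */
        if (*rule == 'C') {
            if (!is_consonant(*post_ptr))
                return -1;
        } else if (*rule != *post_ptr) {
            return -1;       /* literal mismatch */
        }
        post_ptr++;          /* only eating characters advance the cursor */
    }
    return (int)(post_ptr - text);
}
```

With this structure a rule such as "CN" applied to "t" consumes exactly
one character: the trailing 'N' is evaluated with post_ptr still on the
\0 instead of one byte past it.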
Some rules test that a character is *not* of a certain type. Such a test
may match the \0 beginning-of-text marker, step over it, and let MatchRule
continue with the uninitialized data before it, leading to potentially
random behavior.
This commit fixes it by making sure that we don't read before that \0.
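For the backward direction, the same idea can be sketched as a scan that
walks towards the start of the word. skip_back_non_vowels is a
hypothetical helper, and the buffer is assumed to carry a \0 marker just
before the first character of the word.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Walks backwards over characters that are not vowels.
 * Stops on the first vowel found or on the \0 beginning-of-text marker,
 * whichever comes first, and returns that position.
 */
static const char *skip_back_non_vowels(const char *pre_ptr)
{
    assert(pre_ptr != NULL);
    /* The \0 check comes first: \0 is "not a vowel", so without it the
       loop would keep going into whatever lies before the word. */
    while (*pre_ptr != '\0' && strchr("aeiou", *pre_ptr) == NULL)
        pre_ptr--;
    return pre_ptr;
}
```

The order of the two conditions is the whole point: the marker has to be
recognized before the negative class test gets a chance to accept it.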
LookupDict2 looks forward in the wtab array, but it should still stop at
its end. Otherwise the memory sanitizer reports this:
testing en A. B C, D. E: F.
==65960==WARNING: MemorySanitizer: use-of-uninitialized-value
#0 0x7ff9d7ef0de8 in LookupDict2 /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/dictionary.c:2676:11
#1 0x7ff9d7eec2ec in LookupDictList /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/dictionary.c:2899:10
#2 0x7ff9d802860a in TranslateWord3 /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/translate.c:588:12
#3 0x7ff9d80249d4 in TranslateWord /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/translate.c:1100:14
#4 0x7ff9d8051fe0 in TranslateWord2 /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/translate.c:1361:11
#5 0x7ff9d804885c in TranslateClause /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/translate.c:2623:17
#6 0x7ff9d800e4e9 in SpeakNextClause /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/synthesize.c:1569:2
#7 0x7ff9d7fa50e6 in Synthesize /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/speech.c:457:2
#8 0x7ff9d7fa41b3 in sync_espeak_Synth /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/speech.c:570:29
#9 0x7ff9d7fa872f in espeak_ng_Synthesize /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/speech.c:678:10
#10 0x7ff9d7f06584 in espeak_Synth /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/espeak_api.c:90:32
#11 0x4a8be3 in main /home/samy/brl/speech/espeak-ng-git/src/espeak-ng.c:691:3
#12 0x7ff9d78297fc in __libc_start_main csu/../csu/libc-start.c:332:16
#13 0x421449 in _start (/home/samy/ens/projet/1/speech/espeak-ng-git/src/.libs/espeak-ng+0x421449)
Uninitialized value was created by an allocation of 'words' in the stack frame of function 'TranslateClause'
#0 0x7ff9d8035380 in TranslateClause /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/translate.c:1941
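A bounded look-ahead of this kind can be sketched as below. WordEntry,
FLAG_CAPITAL and peek_has_flag are hypothetical stand-ins, not the
espeak-ng types; the sketch assumes the caller knows how many entries of
the array have actually been filled in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, cut-down word table entry. */
typedef struct {
    unsigned flags;
} WordEntry;

#define FLAG_CAPITAL 0x1u    /* hypothetical flag bit */

/*
 * Returns true if the entry `ahead` positions after `pos` exists and has
 * `flag` set. `count` is the number of valid entries in `tab`.
 */
static bool peek_has_flag(const WordEntry *tab, size_t count,
                          size_t pos, size_t ahead, unsigned flag)
{
    assert(tab != NULL && pos < count);
    if (ahead >= count - pos)
        return false;        /* would read past the valid entries */
    return (tab[pos + ahead].flags & flag) != 0;
}
```

Looking one entry ahead from the last valid position then simply answers
"no" instead of reading stack memory that was never written.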
Strictly speaking, we are not supposed to use memcmp to compare strings,
since we must not read beyond the \0 and memcmp is allowed to do so.
Sanitizers would warn about it, and strncmp provides the proper semantics
while not really being slower, so better just use it.
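As a small illustration of the difference, here is a prefix test written
with strncmp. has_prefix is a hypothetical helper and not a function from
the espeak-ng sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Returns true if `s` starts with `prefix`.
 * strncmp stops at the first \0 in either string, so a short `s` is never
 * read past its terminator. memcmp(s, prefix, strlen(prefix)) would
 * require `s` to be at least strlen(prefix) bytes long.
 */
static bool has_prefix(const char *s, const char *prefix)
{
    assert(s != NULL && prefix != NULL);
    return strncmp(s, prefix, strlen(prefix)) == 0;
}
```

has_prefix("ca", "cap") is well defined and false here, whereas the memcmp
variant would be asked to compare three bytes of a string whose contents
end after "ca" and its terminator.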
phonemes_name is only initialized when V_LANGUAGE is met. This is not
necessarily the case, notably with
testing espeak_SetVoiceByName("!v/Annie") (language variant; intonation)
Cannot set intonation: language not set, or is invalid.
Uninitialized bytes in __interceptor_strcmp at offset 0 inside [0x7fff8a875e30, 1)
==4169902==WARNING: MemorySanitizer: use-of-uninitialized-value
#0 0x4c6a49 in LookupPhonemeTable /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/synthdata.c:363:7
#1 0x4c6a49 in SelectPhonemeTableName /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/synthdata.c:380:12
#2 0x5098a9 in LoadVoice /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/voices.c:950:34
#3 0x50edcf in espeak_ng_SetVoiceByName /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/voices.c:1585:7
#4 0x4aad63 in espeak_SetVoiceByName /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/espeak_api.c:125:32
#5 0x4a3fe1 in test_espeak_set_voice_by_name_language_variant_intonation_parameter /home/samy/brl/speech/espeak-ng-git/tests/api.c:356:2
#6 0x4a3fe1 in main /home/samy/brl/speech/espeak-ng-git/tests/api.c:567:2
#7 0x7f26e88cb7fc in __libc_start_main csu/../csu/libc-start.c:332:16
#8 0x4213a9 in _start (/home/samy/ens/projet/1/speech/espeak-ng-git/tests/api.test+0x4213a9)
Uninitialized value was created by an allocation of 'phonemes_name' in the stack frame of function 'LoadVoice'
#0 0x504290 in LoadVoice /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/voices.c:519
so better catch it properly rather than relying on uninitialized data.
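One way to catch this kind of case is to give the buffer a defined empty
value and test for it before the lookup. The sketch below uses
hypothetical names (lookup_table, select_table, a three-entry table list)
and is not the code in voices.c or synthdata.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical lookup: returns the table index, or -1 if not found. */
static int lookup_table(const char *name)
{
    static const char *const tables[] = { "base", "en", "fr" };
    assert(name != NULL);
    for (size_t i = 0; i < sizeof tables / sizeof tables[0]; i++) {
        if (strcmp(tables[i], name) == 0)
            return (int)i;
    }
    return -1;
}

/* `language` may be NULL when the voice file has no language line. */
static int select_table(const char *language)
{
    char phonemes_name[40] = "";    /* always a valid, empty string */

    if (language != NULL)
        snprintf(phonemes_name, sizeof phonemes_name, "%s", language);
    if (phonemes_name[0] == '\0')
        return -1;    /* language not set: report it, don't look it up */
    return lookup_table(phonemes_name);
}
```

With the explicit empty-string check, a variant-only voice fails in a
predictable way instead of comparing table names against stack garbage.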
The memory sanitizer would complain:
==4157154==WARNING: MemorySanitizer: use-of-uninitialized-value
#0 0x7fc191d0a85b in TranslateWord3 /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/translate.c:1065:7
#1 0x7fc191d02916 in TranslateWord /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/translate.c:1100:14
#2 0x7fc191d1b324 in TranslateWord2 /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/translate.c:1448:15
#3 0x7fc191d14ebc in TranslateClause /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/translate.c:2623:17
#4 0x7fc191cfbc9b in SpeakNextClause /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/synthesize.c:1569:2
#5 0x7fc191cd52fc in Synthesize /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/speech.c:457:2
#6 0x7fc191cd6d7c in sync_espeak_Synth /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/speech.c:570:29
#7 0x7fc191cd6d7c in espeak_ng_Synthesize /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/speech.c:678:10
#8 0x7fc191ca0340 in espeak_Synth /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/espeak_api.c:90:32
#9 0x4a4381 in main /home/samy/brl/speech/espeak-ng-git/src/espeak-ng.c:691:3
#10 0x7fc19168b7fc in __libc_start_main csu/../csu/libc-start.c:332:16
#11 0x421449 in _start (/home/samy/ens/projet/1/speech/espeak-ng-git/src/.libs/espeak-ng+0x421449)
Uninitialized value was created by a heap allocation
#0 0x45000d in malloc (/home/samy/ens/projet/1/speech/espeak-ng-git/src/.libs/espeak-ng+0x45000d)
#1 0x7fc191d1ca29 in NewTranslator /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/tr_languages.c:242:26
#2 0x7fc191d1ca29 in SelectTranslator /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/tr_languages.c:482:7
(and similarly for expect_verb_s, expect_noun, expect_past,
clause_upper_count, clause_lower_count)
Indeed, TranslateWord3 doesn't always initialize these fields. Better
just initialize them directly when the Translator is created.
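Initializing at creation time can be sketched like this. MiniTranslator is
a hypothetical, cut-down structure that only borrows a few of the field
names mentioned above; it is not the real Translator, and the helper names
are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, cut-down translator state. */
typedef struct {
    int expect_noun;
    int expect_past;
    int clause_upper_count;
    int clause_lower_count;
} MiniTranslator;

/* Gives every field a defined starting value. */
static void mini_translator_init(MiniTranslator *tr)
{
    assert(tr != NULL);
    tr->expect_noun = 0;
    tr->expect_past = 0;
    tr->clause_upper_count = 0;
    tr->clause_lower_count = 0;
}

/* malloc() returns uninitialized memory, so initialize right away. */
static MiniTranslator *new_mini_translator(void)
{
    MiniTranslator *tr = malloc(sizeof *tr);
    if (tr == NULL)
        return NULL;
    mini_translator_init(tr);
    return tr;
}
```

Code that later reads these fields before writing them then sees zeros
rather than whatever the allocator handed back.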