Some rules test that a character is not of a certain type. Such a test may
match the \0 beginning-of-text marker, and thus actually step over
it and let MatchRule continue with the uninitialized data before it, leading
to potentially random behavior.
This commit fixes it by making sure that we don't read before that \0.
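
A minimal sketch of the kind of guard described above, not the actual MatchRule code. The helper names (`is_vowel`, `prev_is_not_vowel`), the vowel set and the "\0word\0" buffer layout are assumptions made for illustration only:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical character class; '\0' is excluded explicitly because
 * strchr() would otherwise find the terminator of the vowel string. */
static bool is_vowel(char c)
{
    return c != '\0' && strchr("aeiou", c) != NULL;
}

/* Hypothetical "previous character is not a vowel" test.
 * p is assumed to point inside a buffer laid out as "\0word\0".
 * The leading '\0' is also "not a vowel", so it has to be rejected
 * explicitly; otherwise the caller would step over the marker. */
static bool prev_is_not_vowel(const char *p)
{
    char c = p[-1];
    if (c == '\0')
        return false; /* beginning-of-text marker: report a mismatch */
    return !is_vowel(c);
}
```

With the buffer "\0tab", the test on the first letter fails instead of matching the marker, so nothing before the \0 is ever read.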
LookupDict2 looks forward in the wtab array, but it should still stop at
its end. Otherwise the memory sanitizer reports this:
testing en A. B C, D. E: F.
==65960==WARNING: MemorySanitizer: use-of-uninitialized-value
#0 0x7ff9d7ef0de8 in LookupDict2 /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/dictionary.c:2676:11
#1 0x7ff9d7eec2ec in LookupDictList /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/dictionary.c:2899:10
#2 0x7ff9d802860a in TranslateWord3 /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/translate.c:588:12
#3 0x7ff9d80249d4 in TranslateWord /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/translate.c:1100:14
#4 0x7ff9d8051fe0 in TranslateWord2 /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/translate.c:1361:11
#5 0x7ff9d804885c in TranslateClause /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/translate.c:2623:17
#6 0x7ff9d800e4e9 in SpeakNextClause /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/synthesize.c:1569:2
#7 0x7ff9d7fa50e6 in Synthesize /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/speech.c:457:2
#8 0x7ff9d7fa41b3 in sync_espeak_Synth /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/speech.c:570:29
#9 0x7ff9d7fa872f in espeak_ng_Synthesize /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/speech.c:678:10
#10 0x7ff9d7f06584 in espeak_Synth /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/espeak_api.c:90:32
#11 0x4a8be3 in main /home/samy/brl/speech/espeak-ng-git/src/espeak-ng.c:691:3
#12 0x7ff9d78297fc in __libc_start_main csu/../csu/libc-start.c:332:16
#13 0x421449 in _start (/home/samy/ens/projet/1/speech/espeak-ng-git/src/.libs/espeak-ng+0x421449)
Uninitialized value was created by an allocation of 'words' in the stack frame of function 'TranslateClause'
#0 0x7ff9d8035380 in TranslateClause /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/translate.c:1941
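
As an illustration of a bounded look-ahead, here is a small sketch. `WordEntry`, `count_following` and the explicit `end` index are hypothetical and do not reflect how LookupDict2 actually tracks the end of wtab:

```c
#include <stddef.h>

/* Hypothetical stand-in for a word-table entry. */
typedef struct {
    unsigned flags;
} WordEntry;

/* Count how many of the `want` entries following index i have `flag`
 * set, without ever reading at or past `end` (the number of
 * initialized entries). */
static size_t count_following(const WordEntry *wtab, size_t i, size_t end,
                              size_t want, unsigned flag)
{
    size_t n = 0;
    for (size_t k = 1; k <= want; k++) {
        if (i + k >= end)
            break; /* stop at the end of the initialized entries */
        if (wtab[i + k].flags & flag)
            n++;
    }
    return n;
}
```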
Strictly speaking, we are not supposed to use memcmp to compare strings,
since we are not supposed to read beyond the \0, which memcmp may
potentially do. Sanitizers would warn about it, and strncmp happens to
provide the proper semantics while not really being slower, so better
just use it.
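
A short sketch of the difference, using a hypothetical `starts_with` helper that is not part of espeak-ng:

```c
#include <stdbool.h>
#include <string.h>

/* Return true if the NUL-terminated string s starts with prefix.
 * strncmp stops at the first '\0' in either string, so it never reads
 * past the end of a short s, whereas memcmp(s, prefix, n) is allowed
 * to access all n bytes of s. */
static bool starts_with(const char *s, const char *prefix)
{
    return strncmp(s, prefix, strlen(prefix)) == 0;
}
```

For instance `starts_with("h", "hello")` only examines the two bytes that "h" actually occupies.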
phonemes_name is only initialized when V_LANGUAGE is met. This is not
necessarily the case, notably with
testing espeak_SetVoiceByName("!v/Annie") (language variant; intonation)
Cannot set intonation: language not set, or is invalid.
Uninitialized bytes in __interceptor_strcmp at offset 0 inside [0x7fff8a875e30, 1)
==4169902==WARNING: MemorySanitizer: use-of-uninitialized-value
#0 0x4c6a49 in LookupPhonemeTable /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/synthdata.c:363:7
#1 0x4c6a49 in SelectPhonemeTableName /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/synthdata.c:380:12
#2 0x5098a9 in LoadVoice /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/voices.c:950:34
#3 0x50edcf in espeak_ng_SetVoiceByName /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/voices.c:1585:7
#4 0x4aad63 in espeak_SetVoiceByName /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/espeak_api.c:125:32
#5 0x4a3fe1 in test_espeak_set_voice_by_name_language_variant_intonation_parameter /home/samy/brl/speech/espeak-ng-git/tests/api.c:356:2
#6 0x4a3fe1 in main /home/samy/brl/speech/espeak-ng-git/tests/api.c:567:2
#7 0x7f26e88cb7fc in __libc_start_main csu/../csu/libc-start.c:332:16
#8 0x4213a9 in _start (/home/samy/ens/projet/1/speech/espeak-ng-git/tests/api.test+0x4213a9)
Uninitialized value was created by an allocation of 'phonemes_name' in the stack frame of function 'LoadVoice'
#0 0x504290 in LoadVoice /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/voices.c:519
so better catch it properly rather than relying on uninitialized data.
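
A simplified sketch of catching the missing language explicitly. The `load_voice` function, its line format and its return codes are assumptions for illustration and do not mirror the real LoadVoice:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, cut-down voice loader: only a "language" line sets the
 * phoneme table name. Returns 0 on success, -1 if no language was set. */
static int load_voice(const char *const *lines, size_t nlines,
                      char *out, size_t outlen)
{
    char phonemes_name[40] = ""; /* initialized: stays empty by default */

    for (size_t i = 0; i < nlines; i++) {
        if (strncmp(lines[i], "language ", 9) == 0)
            snprintf(phonemes_name, sizeof(phonemes_name), "%s",
                     lines[i] + 9);
    }
    if (phonemes_name[0] == '\0')
        return -1; /* report the error instead of looking up garbage */
    snprintf(out, outlen, "%s", phonemes_name);
    return 0;
}
```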
The memory sanitizer would complain:
==4157154==WARNING: MemorySanitizer: use-of-uninitialized-value
#0 0x7fc191d0a85b in TranslateWord3 /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/translate.c:1065:7
#1 0x7fc191d02916 in TranslateWord /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/translate.c:1100:14
#2 0x7fc191d1b324 in TranslateWord2 /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/translate.c:1448:15
#3 0x7fc191d14ebc in TranslateClause /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/translate.c:2623:17
#4 0x7fc191cfbc9b in SpeakNextClause /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/synthesize.c:1569:2
#5 0x7fc191cd52fc in Synthesize /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/speech.c:457:2
#6 0x7fc191cd6d7c in sync_espeak_Synth /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/speech.c:570:29
#7 0x7fc191cd6d7c in espeak_ng_Synthesize /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/speech.c:678:10
#8 0x7fc191ca0340 in espeak_Synth /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/espeak_api.c:90:32
#9 0x4a4381 in main /home/samy/brl/speech/espeak-ng-git/src/espeak-ng.c:691:3
#10 0x7fc19168b7fc in __libc_start_main csu/../csu/libc-start.c:332:16
#11 0x421449 in _start (/home/samy/ens/projet/1/speech/espeak-ng-git/src/.libs/espeak-ng+0x421449)
Uninitialized value was created by a heap allocation
#0 0x45000d in malloc (/home/samy/ens/projet/1/speech/espeak-ng-git/src/.libs/espeak-ng+0x45000d)
#1 0x7fc191d1ca29 in NewTranslator /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/tr_languages.c:242:26
#2 0x7fc191d1ca29 in SelectTranslator /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/tr_languages.c:482:7
(and similar for expect_verb_s, expect_noun, expect_past,
clause_upper_count, clause_lower_count)
Indeed TranslateWord3 doesn't always initialize these fields. Better
just initialize them directly at Translator creation.
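
A minimal sketch of initializing such fields at creation time. `MiniTranslator` and `new_translator` are hypothetical; only the field names are taken from the message above, and the real Translator has many more members:

```c
#include <stdlib.h>

/* Hypothetical, cut-down translator state. */
typedef struct {
    int expect_noun;
    int expect_past;
    int clause_upper_count;
    int clause_lower_count;
} MiniTranslator;

static MiniTranslator *new_translator(void)
{
    MiniTranslator *tr = malloc(sizeof(*tr));
    if (tr == NULL)
        return NULL;
    /* malloc leaves the memory indeterminate: set every field that
     * later code may read before writing it. */
    tr->expect_noun = 0;
    tr->expect_past = 0;
    tr->clause_upper_count = 0;
    tr->clause_lower_count = 0;
    return tr;
}
```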
Similarly to dab5457620 "Fix deleting FrameManagerImpl*", we need a
virtual destructor. clang was complaining about it:
src/speechPlayer/src/speechPlayer.cpp:52:2: warning: delete called on 'SpeechWaveGenerator' that is abstract but has non-virtual destructor [-Wdelete-abstract-non-virtual-dtor]
delete playerHandleInfo->waveGenerator;
^
We need to keep holding the mutex while setting it, to make sure that the
write of true does not become visible late, notably not after the signal
(in which case the signal would be useless, leading to a hang).
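
The general pattern can be sketched with POSIX threads as follows. This is an assumption about the shape of the fix, not the speechPlayer code itself; the `Completion` type and its functions are hypothetical:

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical completion flag protected by a mutex. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool done;
} Completion;

static void completion_init(Completion *c)
{
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->cond, NULL);
    c->done = false;
}

static void completion_signal(Completion *c)
{
    pthread_mutex_lock(&c->lock);
    c->done = true;                /* set while holding the mutex ... */
    pthread_cond_signal(&c->cond); /* ... so a waiter cannot miss it */
    pthread_mutex_unlock(&c->lock);
}

static void completion_wait(Completion *c)
{
    pthread_mutex_lock(&c->lock);
    while (!c->done) /* re-check the flag after every wakeup */
        pthread_cond_wait(&c->cond, &c->lock);
    pthread_mutex_unlock(&c->lock);
}
```

Because the waiter checks the flag and starts waiting under the same mutex, the setter cannot slip its write and signal in between those two steps.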
delete playerHandleInfo->frameManager wasn't actually calling
~FrameManagerImpl, because playerHandleInfo->frameManager is only a
FrameManager, which didn't have a virtual destructor, and nothing was
saying that it's actually a FrameManagerImpl behind. Adding a virtual
destructor fixes that.
IsLetterGroup: Do not blindly walk back in the word
strlen(p) may be arbitrarily long, so stepping back by that much would
underflow the word, for instance:
testing fr Latn
=================================================================
==3741805==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffd733c1329 at pc 0x7ff5ffbad2de bp 0x7ffd733bf310 sp 0x7ffd733bf308
READ of size 1 at 0x7ffd733c1329 thread T0
#0 0x7ff5ffbad2dd in IsLetterGroup src/libespeak-ng/dictionary.c:714
#1 0x7ff5ffbbe425 in MatchRule src/libespeak-ng/dictionary.c:1979
#2 0x7ff5ffbc09e9 in TranslateRules src/libespeak-ng/dictionary.c:2301
#3 0x7ff5ffc26656 in TranslateWord3 src/libespeak-ng/translate.c:733
#4 0x7ff5ffc2a10b in TranslateWord src/libespeak-ng/translate.c:1100
#5 0x7ff5ffc2bef2 in TranslateWord2 src/libespeak-ng/translate.c:1361
#6 0x7ff5ffc374e2 in TranslateClause src/libespeak-ng/translate.c:2623
#7 0x7ff5ffc1d010 in SpeakNextClause src/libespeak-ng/synthesize.c:1569
#8 0x7ff5ffbfbd46 in Synthesize src/libespeak-ng/speech.c:492
#9 0x7ff5ffbfd52a in sync_espeak_Synth src/libespeak-ng/speech.c:570
#10 0x7ff5ffbfdd1f in espeak_ng_Synthesize src/libespeak-ng/speech.c:678
#11 0x7ff5ffbc72fd in espeak_Synth src/libespeak-ng/espeak_api.c:90
#12 0x5627511a3137 in main src/espeak-ng.c:691
#13 0x7ff5fee557fc in __libc_start_main ../csu/libc-start.c:332
#14 0x5627511a0569 in _start (/home/samy/ens/projet/1/speech/espeak-ng-git/src/.libs/espeak-ng+0x6569)
Address 0x7ffd733c1329 is located in stack of thread T0 at offset 1177 in frame
#0 0x7ff5ffc2f760 in TranslateClause src/libespeak-ng/translate.c:1941
This frame has 16 object(s):
[48, 52) 'cc' (line 1944)
[64, 68) 'source_index' (line 1945)
[80, 84) 'prev_in' (line 1948)
[96, 100) 'prev_out' (line 1949)
[112, 116) 'next_in' (line 1952)
[128, 132) 'char_inserted' (line 1954)
[144, 148) 'word_flags' (line 1963)
[160, 164) 'charix_top' (line 1975)
[176, 180) 'tone' (line 1985)
[192, 196) 'next2_in' (line 2294)
[208, 212) 'c_temp' (line 2518)
[224, 374) 'number_buf' (line 2522)
[448, 1048) 'num_wtab' (line 2523)
[1184, 1984) 'sbuf' (line 1982) <== Memory access at offset 1177 underflows this variable
[2112, 3720) 'charix' (line 1977)
[3856, 7456) 'words' (line 1978)
sbuf does however properly start with a '\0' header, so we can make
IsLetterGroup carefully walk back in the word and issue a mismatch if it
walks back too far.
Fixes #1108
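
A sketch of such a careful backwards walk, assuming a buffer that begins with a '\0' header. `matches_backwards` is a hypothetical helper, not the real IsLetterGroup:

```c
#include <stdbool.h>
#include <string.h>

/* Return true if pattern matches the text that ends at `last`
 * (inclusive). The buffer is assumed to start with a '\0' header,
 * e.g. "\0latn". */
static bool matches_backwards(const char *last, const char *pattern)
{
    size_t len = strlen(pattern);
    const char *w = last;

    /* Step back one character at a time instead of jumping by len:
     * standing on the '\0' header means the word is shorter than the
     * pattern, so report a mismatch rather than underflow. */
    for (size_t i = 1; i < len; i++) {
        if (*w == '\0')
            return false;
        w--;
    }
    /* strncmp stops at the header if w ended up on it. */
    return strncmp(w, pattern, len) == 0;
}
```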
TranslateWord2 uses phonemes in ph_list2. Apart from the breakable loops, it
may statically require up to 7 phonemes. Then TranslateClause always
uses 2 phonemes. We thus have to keep these margins along the loops to
avoid any overflow.
Fixes #1073, #1095
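
The margin idea can be sketched as follows. The capacity and the `append_with_margin` helper are hypothetical; only the margins of 7 and 2 come from the description above:

```c
#include <stddef.h>

enum {
    LIST_SIZE = 16,    /* hypothetical capacity, tiny for illustration */
    WORD_MARGIN = 7,   /* fixed entries a word may still append */
    CLAUSE_MARGIN = 2  /* entries the clause always appends at the end */
};

/* Append up to n items from a loop while keeping room for the fixed
 * entries that are written afterwards. Returns the new count. */
static size_t append_with_margin(int *list, size_t count,
                                 const int *items, size_t n)
{
    const size_t limit = (size_t)(LIST_SIZE - WORD_MARGIN - CLAUSE_MARGIN);

    for (size_t i = 0; i < n; i++) {
        if (count >= limit)
            break; /* stop early so the later fixed writes still fit */
        list[count++] = items[i];
    }
    return count;
}
```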