IsLetterGroup: Do not blindly walk back in the word
strlen(p) may be arbitrarily long, so blindly walking back that many bytes
can underflow the word, for instance:
testing fr Latn
=================================================================
==3741805==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffd733c1329 at pc 0x7ff5ffbad2de bp 0x7ffd733bf310 sp 0x7ffd733bf308
READ of size 1 at 0x7ffd733c1329 thread T0
#0 0x7ff5ffbad2dd in IsLetterGroup src/libespeak-ng/dictionary.c:714
#1 0x7ff5ffbbe425 in MatchRule src/libespeak-ng/dictionary.c:1979
#2 0x7ff5ffbc09e9 in TranslateRules src/libespeak-ng/dictionary.c:2301
#3 0x7ff5ffc26656 in TranslateWord3 src/libespeak-ng/translate.c:733
#4 0x7ff5ffc2a10b in TranslateWord src/libespeak-ng/translate.c:1100
#5 0x7ff5ffc2bef2 in TranslateWord2 src/libespeak-ng/translate.c:1361
#6 0x7ff5ffc374e2 in TranslateClause src/libespeak-ng/translate.c:2623
#7 0x7ff5ffc1d010 in SpeakNextClause src/libespeak-ng/synthesize.c:1569
#8 0x7ff5ffbfbd46 in Synthesize src/libespeak-ng/speech.c:492
#9 0x7ff5ffbfd52a in sync_espeak_Synth src/libespeak-ng/speech.c:570
#10 0x7ff5ffbfdd1f in espeak_ng_Synthesize src/libespeak-ng/speech.c:678
#11 0x7ff5ffbc72fd in espeak_Synth src/libespeak-ng/espeak_api.c:90
#12 0x5627511a3137 in main src/espeak-ng.c:691
#13 0x7ff5fee557fc in __libc_start_main ../csu/libc-start.c:332
#14 0x5627511a0569 in _start (/home/samy/ens/projet/1/speech/espeak-ng-git/src/.libs/espeak-ng+0x6569)
Address 0x7ffd733c1329 is located in stack of thread T0 at offset 1177 in frame
#0 0x7ff5ffc2f760 in TranslateClause src/libespeak-ng/translate.c:1941
This frame has 16 object(s):
[48, 52) 'cc' (line 1944)
[64, 68) 'source_index' (line 1945)
[80, 84) 'prev_in' (line 1948)
[96, 100) 'prev_out' (line 1949)
[112, 116) 'next_in' (line 1952)
[128, 132) 'char_inserted' (line 1954)
[144, 148) 'word_flags' (line 1963)
[160, 164) 'charix_top' (line 1975)
[176, 180) 'tone' (line 1985)
[192, 196) 'next2_in' (line 2294)
[208, 212) 'c_temp' (line 2518)
[224, 374) 'number_buf' (line 2522)
[448, 1048) 'num_wtab' (line 2523)
[1184, 1984) 'sbuf' (line 1982) <== Memory access at offset 1177 underflows this variable
[2112, 3720) 'charix' (line 1977)
[3856, 7456) 'words' (line 1978)
sbuf is, however, properly '\0'-headed, so we can make IsLetterGroup
walk back in the word carefully and report a mismatch if it would walk
back too far.
Fixes #1108
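The careful walk back described for IsLetterGroup can be sketched roughly as follows. This is not the espeak-ng code. The helper name `matches_before` is hypothetical. The sketch assumes, as the commit says of sbuf, that the buffer holding the word starts with a '\0' sentinel. Each step back checks for that sentinel, and the helper reports a mismatch instead of reading before the buffer.

```c
#include <string.h>

/*
 * Hypothetical helper (not the actual espeak-ng code): does `pattern`
 * match the bytes ending at `word`, i.e. the current byte and the ones
 * just before it? The buffer is assumed to start with a '\0' sentinel,
 * as sbuf does, so the walk back can stop there instead of underflowing.
 */
static int matches_before(const char *word, const char *pattern)
{
    size_t len = strlen(pattern);
    const char *w = word;

    if (len == 0)
        return 1; /* an empty pattern trivially matches */

    /* Walk back len-1 bytes, checking each one for the sentinel. */
    for (size_t i = 0; i + 1 < len; i++) {
        w--;
        if (*w == '\0')
            return 0; /* pattern would extend past the buffer start: mismatch */
    }

    /* Compare the len bytes from w up to and including *word. */
    return memcmp(w, pattern, len) == 0;
}
```

A pattern longer than everything between the sentinel and `word` is therefore rejected without touching memory before the buffer.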
TranslateWord2 stores phonemes in ph_list2. Apart from the loops, which
can break out early, it may statically require up to 7 phonemes, and
TranslateClause then always uses 2 more. The loops thus have to keep
these margins free to avoid any overflow.
Fixes #1073
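One way to picture the ph_list2 margins is the following minimal sketch. It assumes the two margins simply add up (7 + 2 phonemes). `PH_LIST2_SIZE` and the other macros are hypothetical names with an illustrative capacity; the real sizes live in espeak-ng's headers. Each breakable loop stops at a limit that leaves both margins untouched.

```c
#include <stdbool.h>

/* Illustrative values only; the real capacity is defined by espeak-ng. */
#define PH_LIST2_SIZE 1000        /* hypothetical capacity of ph_list2 */
#define WORD_STATIC_PHONEMES 7    /* worst case TranslateWord2 may add outside its loops */
#define CLAUSE_TAIL_PHONEMES 2    /* phonemes TranslateClause always uses afterwards */

/* Count that a breakable loop must not reach, so both margins stay free. */
#define PH_LIST2_LOOP_LIMIT (PH_LIST2_SIZE - WORD_STATIC_PHONEMES - CLAUSE_TAIL_PHONEMES)

/* Can the static (non-loop) phonemes plus the clause tail still fit? */
static bool room_for_static_part(int n_ph_list2)
{
    return n_ph_list2 + WORD_STATIC_PHONEMES + CLAUSE_TAIL_PHONEMES <= PH_LIST2_SIZE;
}

/*
 * A breakable loop that wants to append `wanted` phonemes. It stops
 * early once it reaches the limit. Incrementing the count stands in
 * for writing ph_list2[n_ph_list2].
 */
static int append_phonemes(int n_ph_list2, int wanted)
{
    while (wanted > 0 && n_ph_list2 < PH_LIST2_LOOP_LIMIT) {
        n_ph_list2++;
        wanted--;
    }
    return n_ph_list2;
}
```

After any such loop, `room_for_static_part` still holds, because the loop never eats into the reserved 9 entries.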
Valgrind reports:
==3642987== Conditional jump or move depends on uninitialised value(s)
==3642987== at 0x491F268: TranslateNumber_1 (numbers.c:1785)
==3642987== by 0x4923C35: TranslateNumber (numbers.c:2080)
==3642987== by 0x49556DC: TranslateWord3 (translate.c:644)
==3642987== by 0x4957FCE: TranslateWord (translate.c:1100)
==3642987== by 0x4959344: TranslateWord2 (translate.c:1361)
==3642987== by 0x496116E: TranslateClause (translate.c:2613)
==3642987== by 0x494FF7A: SpeakNextClause (synthesize.c:1569)
==3642987== by 0x4939B9D: Synthesize (speech.c:457)
==3642987== by 0x493AE6A: sync_espeak_Synth (speech.c:570)
==3642987== by 0x493B286: espeak_ng_Synthesize (speech.c:678)
==3642987== by 0x4916925: espeak_Synth (espeak_api.c:90)
==3642987== by 0x10CF5D: main (espeak-ng.c:691)
==3642987== Uninitialised value was created by a stack allocation
==3642987== at 0x495BD9F: TranslateClause (translate.c:1941)
Indeed, TranslateNumber_1 looks back up to three bytes before the word,
with IsDigit09(word[-3]), so we have to increase the leading margin to
three spaces.
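The three-space margin can be illustrated with a small sketch, under these assumptions. `prepare_word` and `digit_three_back` are hypothetical helpers, and `is_digit09` is a stand-in for espeak-ng's IsDigit09. The word is copied behind three spaces, so a look-back as deep as word[-3] always reads initialised bytes.

```c
#include <string.h>

/* Must be at least as deep as the furthest look-back, word[-3]. */
#define WORD_LEADING_MARGIN 3

/* Hypothetical stand-in for espeak-ng's IsDigit09(). */
static int is_digit09(unsigned int c)
{
    return c >= '0' && c <= '9';
}

/*
 * Hypothetical helper: copy `text` into `buf` behind WORD_LEADING_MARGIN
 * spaces and return a pointer to the first letter of the word, or NULL
 * if `buf` is too small. Bytes word[-3]..word[-1] are then initialised.
 */
static char *prepare_word(char *buf, size_t bufsize, const char *text)
{
    size_t n = strlen(text);

    if (WORD_LEADING_MARGIN + n + 1 > bufsize)
        return NULL;
    memset(buf, ' ', WORD_LEADING_MARGIN);
    memcpy(buf + WORD_LEADING_MARGIN, text, n + 1); /* includes the '\0' */
    return buf + WORD_LEADING_MARGIN;
}

/* A look-back in the spirit of TranslateNumber_1's IsDigit09(word[-3]). */
static int digit_three_back(const char *word)
{
    return is_digit09((unsigned char)word[-3]);
}
```

With a margin of only one or two spaces, `digit_three_back` on the first letter would read a byte that was never written. That is exactly the uninitialised read Valgrind flags.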
pre_ptr is already one byte before the current letter, so we do not want
to subtract 1 again. Otherwise this would for instance underflow word_iz
of addPluralSuffixes.
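The off-by-one can be shown in isolation with the following sketch. `previous_letter` is a hypothetical function, not espeak-ng code. It only models the pointer arithmetic: once `pre_ptr` is one byte back, the previous letter is `*pre_ptr` itself.

```c
/*
 * Hypothetical sketch: `pre_ptr` already points at the byte just before
 * the current letter, so that byte is the previous letter and no further
 * "- 1" is needed.
 */
static char previous_letter(const char *current)
{
    const char *pre_ptr = current - 1; /* already one byte back */

    /* Correct: read *pre_ptr.
     * Buggy: pre_ptr[-1] reads two bytes before `current`. For example,
     * with `current` at the second byte of a 4-byte buffer such as
     * word_iz, that read lands one byte before the buffer. */
    return *pre_ptr;
}
```

This mirrors the ASan report, where the read lands at offset 47, one byte below word_iz at offset 48.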
Otherwise ASan reports this during make check:
testing en ibm mit ibms mits IBM MIT APH CES ITX IBMs MIT's APHs CES's ITXs
==3733154==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffe420233ef at pc 0x7f2e8a30aef1 bp 0x7ffe42022c80 sp 0x7ffe42022c78
READ of size 1 at 0x7ffe420233ef thread T0
#0 0x7f2e8a30aef0 in utf8_in2 src/libespeak-ng/translate.c:281
#1 0x7f2e8a2a6db1 in MatchRule src/libespeak-ng/dictionary.c:2058
#2 0x7f2e8a2a89e9 in TranslateRules src/libespeak-ng/dictionary.c:2301
#3 0x7f2e8a30cc77 in addPluralSuffixes src/libespeak-ng/translate.c:393
#4 0x7f2e8a30e2c9 in TranslateWord3 src/libespeak-ng/translate.c:684
#5 0x7f2e8a31210b in TranslateWord src/libespeak-ng/translate.c:1100
#6 0x7f2e8a313ef2 in TranslateWord2 src/libespeak-ng/translate.c:1361
#7 0x7f2e8a31f4e2 in TranslateClause src/libespeak-ng/translate.c:2623
#8 0x7f2e8a305010 in SpeakNextClause src/libespeak-ng/synthesize.c:1569
#9 0x7f2e8a2e390e in Synthesize src/libespeak-ng/speech.c:457
#10 0x7f2e8a2e552a in sync_espeak_Synth src/libespeak-ng/speech.c:570
#11 0x7f2e8a2e5d1f in espeak_ng_Synthesize src/libespeak-ng/speech.c:678
#12 0x7f2e8a2af2fd in espeak_Synth src/libespeak-ng/espeak_api.c:90
#13 0x5618104c9137 in main src/espeak-ng.c:691
#14 0x7f2e8953d7fc in __libc_start_main ../csu/libc-start.c:332
#15 0x5618104c6569 in _start (/home/samy/ens/projet/1/speech/espeak-ng-git/src/.libs/espeak-ng+0x6569)
Address 0x7ffe420233ef is located in stack of thread T0 at offset 47 in frame
#0 0x7f2e8a30cb3b in addPluralSuffixes src/libespeak-ng/translate.c:380
This frame has 3 object(s):
[32, 36) 'word_zz' (line 381)
[48, 52) 'word_iz' (line 382) <== Memory access at offset 47 underflows this variable
[64, 68) 'word_ss' (line 383)
And indeed, RULE_NOVOWELS keeps looking back until it finds a spacing
character, so we have to provide it with one.
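Why the scan needs a space can be seen in this minimal sketch of a RULE_NOVOWELS-style look-back. The functions are hypothetical. The vowel test is simplified to a, e, i, o, u, whereas the real rule engine presumably checks a letter group. The loop's only stop condition is a space, so a buffer without a leading space would be scanned past its start.

```c
#include <stdbool.h>

/* Simplified vowel test for illustration only. */
static bool is_vowel(char c)
{
    switch (c) {
    case 'a': case 'e': case 'i': case 'o': case 'u':
        return true;
    default:
        return false;
    }
}

/*
 * Hypothetical sketch of a RULE_NOVOWELS-style check: starting at `p`,
 * walk back until a space and fail if a vowel is met on the way. The
 * loop only stops at ' ', so the buffer must contain a space before the
 * word; without one it would run off the start of the buffer.
 */
static bool no_vowels_back_to_space(const char *p)
{
    for (; *p != ' '; p--) {
        if (is_vowel(*p))
            return false;
    }
    return true;
}
```

Giving buffers like word_iz a leading space bounds the loop, whichever way the scan ends.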
event_notify currently introduces an arbitrary 50ms delay between speech
requests. This usually goes unnoticed since it is small. But when
cancelling a long series of events, these delays add up to potentially
seconds, while the user was precisely asking to cancel everything as
fast as possible.
This 50ms delay was probably meant to work around some issues elsewhere.
If those issues still exist, they should be fixed, not worked around.