Revert "maintainability: pass seq_len_adjust to LookupSpect() instead of using globals"
This reverts commit d08b8e43ca.
The reverted commit causes gcc-4.8 builds to output a different SHA1
hash on the language-phonemes test for `af` (the first language
tested). The test does not break on clang or on gcc-7, so this may be
a compiler bug. However, the Travis CI build server uses gcc-4.8 on
Ubuntu Trusty (14.04 LTS), and other older OSes may do so as well.
LookupDict2: Fix searching entries longer than 128 bytes
This is a fix for https://github.com/nvaccess/nvda/issues/7740.
With the addition of emoji support, dictionary entries can now be
longer than 128 bytes. This fix makes sure the character is
interpreted as an unsigned byte so it does not treat long entries
as having a negative offset.
Treating the offset as a signed byte (as the previous code did)
could cause the hash chain search to loop indefinitely when
processing certain input, such as the Tamil characters from the NVDA
issue noted above, which are added as a test case in translate.test.
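The effect of the signedness can be sketched in isolation. This is an
illustration only, not the LookupDict2 code: the helper names are
hypothetical, and it assumes each entry starts with a one-byte length
that is used to step to the next entry, with a zero byte ending the
chain.

```c
/* Hypothetical: read the length byte as signed (the old behaviour).
 * Values of 128 and above come back negative. */
static int entry_step_signed(const char *entry)
{
    return (signed char)entry[0];
}

/* Hypothetical: read the length byte as unsigned (the fixed behaviour). */
static int entry_step_unsigned(const char *entry)
{
    return (unsigned char)entry[0];
}

/* Walk a chain of entries, stepping by the unsigned length byte.
 * Assumes the chain is terminated by a zero length byte. */
static int count_entries(const char *p)
{
    int n = 0;
    while (*p != 0) {
        p += entry_step_unsigned(p);
        n++;
    }
    return n;
}
```

With a signed step, an entry of 144 bytes would move the pointer
backwards by 112 bytes instead of forwards, so the walk may never reach
the terminator.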
option_tone2 is set in voices.c but never used. docs/voices.md
indicates that the intonation keyword only takes one parameter,
confirming that option_tone2 is unused.
This is a similar change to b60d2452c3.
In this case, it is when tr->dictionary_name is passed as the name
parameter in LoadDictionary.
This happens in the SetTranslator2 function when loading the
dictionary for the second language translator object.
SetVoiceStack looks for "!v" in variant_name and skips the first
three characters if "!v" is found. The problem here is that it
does not check that the third character is the path separator, so it
may advance past the end of the string if variant_name is exactly "!v".
This fixes that problem by checking for the path separator. It
also simplifies the logic by checking the bytes explicitly.
NOTE: This is not strictly needed, as the only code path this is
relevant for is in espeak_ng_SetVoiceByName, and the variant name
comes from ExtractVoiceVariantName, which sets up the variant name
correctly.
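A minimal sketch of the byte-by-byte check described above. The helper
name is hypothetical and PATHSEP is assumed to be '/' here; the real
value is platform dependent.

```c
#define PATHSEP '/' /* assumption: POSIX-style path separator */

/* Hypothetical helper: return variant_name with a leading "!v" plus
 * PATHSEP removed, or variant_name unchanged if that prefix is absent.
 * The && chain stops at the first mismatch, so a NUL terminator is
 * never read past, even for "" or "!v". */
static const char *skip_variant_prefix(const char *variant_name)
{
    if (variant_name[0] == '!' &&
        variant_name[1] == 'v' &&
        variant_name[2] == PATHSEP)
        return variant_name + 3;
    return variant_name;
}
```

Because each byte is tested in order, the string "!v" fails at the
third comparison (its terminator is not the separator) and is returned
untouched.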
Compare variant_name with "!v" only if long enough
Various places call SetVoiceStack with "" for the variant_name. This
causes -fsanitize=address to fail with an overflow as the call to
memcmp is checking the first 2 bytes, and there is only 1 byte
available.
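One way to guard the comparison, sketched with a hypothetical helper
name (this is not necessarily the exact form used in SetVoiceStack):
only call memcmp once the string is known to hold at least two bytes.

```c
#include <string.h>

/* Hypothetical helper: true if variant_name starts with "!v".
 * strlen is checked first so memcmp never reads 2 bytes from a
 * 1-byte string such as "". */
static int has_variant_marker(const char *variant_name)
{
    return strlen(variant_name) >= 2 &&
           memcmp(variant_name, "!v", 2) == 0;
}
```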
Copy name in LoadDictionary if not dictionary_name
compiledict.c sets dict_name to dictionary_name if dict_name is
not set, and passes that to LoadDictionary. LoadDictionary then
copies the passed in name to dictionary_name.
This causes -fsanitize=address to fail with overlapping memory
addresses passed to strncpy (copying the string to itself). As
such, don't copy the name in this case.
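The guard can be sketched as follows. The struct, its buffer size, and
the function name are assumptions made for illustration; only the
pointer comparison reflects the change described above.

```c
#include <string.h>

/* Hypothetical stand-in for the translator object; the 40-byte
 * buffer size is an assumption. */
struct translator_sketch {
    char dictionary_name[40];
};

/* Copy name into tr->dictionary_name unless name already points at
 * that buffer, in which case source and destination would overlap. */
static void set_dictionary_name(struct translator_sketch *tr,
                                const char *name)
{
    if (name != tr->dictionary_name) {
        strncpy(tr->dictionary_name, name,
                sizeof(tr->dictionary_name) - 1);
        tr->dictionary_name[sizeof(tr->dictionary_name) - 1] = '\0';
    }
}
```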
Fixes two sanitizer warnings:
```
src/libespeak-ng/compiledata.c:2291:27: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
src/libespeak-ng/compiledata.c:2424:34: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
```
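Both warnings come from shifting the signed constant 1 into the sign
bit of a 32-bit int, which is undefined behaviour in C. A minimal
sketch of the usual remedy, shifting an unsigned value instead; the
helper name is hypothetical and this is not necessarily the exact form
used in compiledata.c:

```c
#include <stdint.h>

/* Hypothetical helper: build a single-bit mask without signed
 * overflow. bit must be in the range 0..31. */
static uint32_t bit_mask(unsigned bit)
{
    return (uint32_t)1 << bit;
}
```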
buf was not modified after filepath was copied into it; it was only
passed to fopen. As such the copy is redundant, and can lead to a
buffer overflow if the specified filepath is longer than buf can
hold.
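A sketch of the simplified call, with a hypothetical wrapper name and
an assumed "rb" mode: passing filepath straight to fopen means there
is no fixed-size intermediate buffer left to overflow.

```c
#include <stdio.h>

/* Hypothetical wrapper: open filepath directly rather than copying
 * it into a local buffer first. The "rb" mode is an assumption. */
static FILE *open_for_reading(const char *filepath)
{
    return fopen(filepath, "rb");
}
```

The caller still has to handle a NULL return, for example when the
path does not exist.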
If ph_code is not located in the phoneme_tab, the resulting ph
value will be NULL. This does not normally happen, but it can
happen if word_phonemes contains garbage data, such as with the
1.49.2 multi-word logic when processing words like 'riposted'.
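The kind of check involved can be sketched like this. The types and
names are hypothetical simplifications; the sketch only assumes that
the table is an array of pointers in which unknown codes map to NULL.

```c
#include <stddef.h>

/* Hypothetical, simplified phoneme record. */
typedef struct {
    const char *mnemonic;
} PhonemeSketch;

/* Count the codes in a 0-terminated string that map to a phoneme.
 * Codes with no table entry yield NULL and are skipped instead of
 * being dereferenced. */
static int count_known_phonemes(const PhonemeSketch *const *tab,
                                size_t tab_len,
                                const unsigned char *codes)
{
    int known = 0;
    for (; *codes != 0; codes++) {
        const PhonemeSketch *ph =
            (*codes < tab_len) ? tab[*codes] : NULL;
        if (ph == NULL)
            continue; /* garbage code: nothing to look at */
        known++;
    }
    return known;
}
```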
translate: Don't crash translating root words that map to another list entry.
If the list file contains a text replacement to another
entry in the list file, e.g.:
    ripost riposte $text
    riposte rI#p0st
calling it from a prefix or suffix rule such as 'riposted'
causes word_out[0] to be NULL, as TranslateWord3 has the
information needed to perform the mapping. In this case, no
phonemes have been written in this loop, but the phonemes have
already been calculated, so don't override them.