It is possible -- especially at higher speeds -- for the n at the
end of a word to be velarised if the next word starts with a velar
plosive. I prefer the velarised sound across word boundaries, but
others do not. As such, limit the velarisation to within a word.
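A minimal sketch of the word-internal rule follows. It assumes a
hypothetical encoding in which each phoneme is one ASCII character
('n' = alveolar nasal, 'N' = velar nasal, 'k'/'g' = velar plosives).
The function name and encoding are illustrative only and are not
how eSpeak NG implements the rule.

```c
#include <stddef.h>

/* Hypothetical encoding: 'k' and 'g' are the velar plosives. */
static int is_velar_plosive(char ph)
{
	return ph == 'k' || ph == 'g';
}

/* Velarise 'n' to 'N' only when the next phoneme in the SAME word is a
 * velar plosive. The function is called on one word at a time, so a
 * word-final 'n' is never affected by the first phoneme of the next word. */
void velarise_within_word(char *word)
{
	for (size_t i = 0; word[i] != '\0'; i++) {
		/* word[i + 1] is at worst the terminating NUL, so this read is in bounds */
		if (word[i] == 'n' && is_velar_plosive(word[i + 1]))
			word[i] = 'N';
	}
}
```

With this approach "sink" becomes "siNk", while "ten" in "ten kings"
keeps its final n because "ten" and "kings" are processed separately.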
Revert "maintainability: pass seq_len_adjust to LookupSpect() instead of using globals"
This reverts commit d08b8e43ca.
This commit causes gcc-4.8 to output a different SHA1 hash on the
language-phonemes test for `af` (the first language tested). It
does not break on clang or on gcc-7, so it may be a compiler bug;
however, the Travis CI build server uses gcc-4.8 on Ubuntu Trusty
(14.04 LTS), and other older OSes may use it too.
LookupDict2: Fix searching entries longer than 128 bytes
This is a fix for https://github.com/nvaccess/nvda/issues/7740.
With the addition of emoji support, dictionary entries can now be
longer than 128 bytes. This fix makes sure the byte holding the
offset to the next entry is interpreted as unsigned, so long entries
are not treated as having a negative offset.
Treating the offset as a signed byte (as the previous code did)
could cause the hash chain search to loop indefinitely on certain
input, such as the Tamil characters from the NVDA issue noted above,
which have been added as a test case to translate.test.
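A minimal sketch of why the cast matters, assuming a hypothetical
chain layout in which each entry starts with one byte giving the
offset to the next entry and a 0 byte ends the chain (not necessarily
eSpeak NG's exact dictionary format). The helper count_entries is
hypothetical.

```c
/* Hypothetical layout: each entry starts with one byte holding the
 * offset to the next entry; an offset byte of 0 ends the chain. */
int count_entries(const char *chain)
{
	const char *p = chain;
	int count = 0;
	unsigned int offset;

	/* Read the offset through unsigned char, so an offset of 200 stays 200.
	 * Reading the same byte as a signed char would give -56 and move p
	 * backwards, which can send the walk into a loop or out of the buffer. */
	while ((offset = (unsigned char)*p) != 0) {
		p += offset;
		count++;
	}
	return count;
}
```

Whether plain char is signed is implementation-defined (it is signed
with gcc on x86), which is why the explicit cast to unsigned char is
needed rather than relying on the platform default.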
option_tone2 is set in voices.c but never used. docs/voices.md indicates that the `intonation` keyword only takes one parameter, confirming that option_tone2 is unused.
This is a similar change to b60d2452c3.
In this case, the problem occurs when tr->dictionary_name is passed
as the name parameter to LoadDictionary.
This happens in the SetTranslator2 function when loading the
dictionary for the second language translator object.
SetVoiceStack looks for "!v" at the start of variant_name and skips
the first three characters if "!v" is found. The problem is that it
does not check that the third character is the path separator, so it
may advance past the end of the string into unknown memory if
variant_name is exactly "!v".
This fixes that problem by checking for the path separator. It
also simplifies the logic by checking the bytes explicitly.
NOTE: This is not strictly needed, as the only code path this is
relevant to is in espeak_ng_SetVoiceByName, and the variant name
there comes from ExtractVoiceVariantName, which sets up the variant
name correctly.
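The explicit byte check can be sketched as below. The helper name
skip_variant_prefix is hypothetical, and PATHSEP is assumed to be '/'
(a POSIX path separator) for illustration.

```c
/* Assumption: PATHSEP stands in for the platform path separator. */
#ifndef PATHSEP
#define PATHSEP '/'
#endif

/* Return the part of variant_name after a leading "!v" + separator,
 * or variant_name unchanged if that prefix is absent. */
const char *skip_variant_prefix(const char *variant_name)
{
	/* && short-circuits: each byte is only read if the previous one
	 * matched, so the NUL terminator of "", "!" or "!v" stops the
	 * checks before anything past the end of the string is read. */
	if (variant_name[0] == '!' && variant_name[1] == 'v' && variant_name[2] == PATHSEP)
		return variant_name + 3;
	return variant_name;
}
```

For "!v" the third check sees the NUL terminator, which is not the
separator, so the pointer is not advanced past the string.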
Compare variant_name with "!v" only if long enough
Various places call SetVoiceStack with "" for the variant_name. This
causes -fsanitize=address to fail with an overflow, as the call to
memcmp checks the first 2 bytes but only 1 byte (the terminating
NUL) is available.
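A minimal sketch of the length guard, using a hypothetical helper
has_variant_prefix (not a function from the eSpeak NG sources):

```c
#include <string.h>

/* True if variant_name starts with "!v". The strlen() check runs first,
 * so memcmp() never reads 2 bytes from a shorter string such as "",
 * which has only its NUL terminator. */
int has_variant_prefix(const char *variant_name)
{
	return strlen(variant_name) >= 2 && memcmp(variant_name, "!v", 2) == 0;
}
```

An alternative with the same effect is strncmp(variant_name, "!v", 2),
since strncmp stops comparing at the first NUL it reaches.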
Copy name in LoadDictionary only if it is not dictionary_name
compiledict.c sets dict_name to dictionary_name if dict_name is
not set, and passes that to LoadDictionary. LoadDictionary then
copies the passed-in name to dictionary_name.
This causes -fsanitize=address to fail because overlapping memory
addresses are passed to strncpy (copying the string to itself). As
such, don't copy the name in this case.
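The guard can be sketched as a pointer comparison before the copy.
Here dictionary_name is a hypothetical stand-in for the real global
buffer (its size is arbitrary), and set_dictionary_name is a
hypothetical helper, not the actual LoadDictionary code.

```c
#include <string.h>

/* Hypothetical stand-in for the global buffer filled by LoadDictionary. */
static char dictionary_name[40];

/* Copy name into dictionary_name unless the caller passed
 * dictionary_name itself: strncpy() with overlapping source and
 * destination is undefined behaviour, which is what the sanitizer flags. */
void set_dictionary_name(const char *name)
{
	if (name != dictionary_name) {
		strncpy(dictionary_name, name, sizeof(dictionary_name) - 1);
		/* strncpy does not terminate on truncation, so do it explicitly */
		dictionary_name[sizeof(dictionary_name) - 1] = '\0';
	}
}
```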
Z-SAMPA is a more complex specification than Kirshenbaum, X-SAMPA,
and CXS. Additionally, it specifies phonemes, such as the palatal
and velar trills, that are not defined by the IPA, which marks the
velar trill as impossible.
As such, Z-SAMPA will not be supported in eSpeak NG for now. That
may change in the future once support for the other transcription
schemes has been implemented.
Fixes two sanitizer warnings:
```
src/libespeak-ng/compiledata.c:2291:27: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
src/libespeak-ng/compiledata.c:2424:34: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
```
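One common way to avoid this warning, shown as a sketch rather than
the exact change made in compiledata.c, is to shift an unsigned
constant. The helper bit_mask is hypothetical, and a 32-bit unsigned
int is assumed.

```c
/* With a 32-bit int, 1 << 31 shifts into the sign bit, which is
 * undefined behaviour and is what the sanitizer reports. Shifting the
 * unsigned constant 1u is well defined for shift counts 0..31. */
unsigned int bit_mask(unsigned int bit)
{
	return 1u << bit;
}
```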