HomoFast-eSpeak-Persian

Author	SHA1	Message	Date
Samuel Thibault	55b2e9b2a3	MatchRule: Prevent non-eating special characters from eating characters Special characters such as N, S1, etc. are not actually eating characters. Their treatment should thus not update pre_ptr and post_ptr, otherwise those would underflow/overflow, e.g. in the case @) s (_NS1 [z] this would overflow. This is for instance noticeable with the memory sanitizer: ESPEAK_DATA_PATH=$PWD ./src/espeak-ng -qX "capitals" Translate 'capitals' 1 c [k] 1 a [a] 1 p [p] 1 i [I] 1 t [t] 1 a [a] 1 l [l] 20 l (C [l] ==2837201==WARNING: MemorySanitizer: use-of-uninitialized-value #0 0x7f7f4422744b in utf8_in2 /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/translate.c:281:2 #1 0x7f7f442281bc in utf8_in /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/translate.c:332:9 #2 0x7f7f440e0d31 in MatchRule /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/dictionary.c:1767:21 #3 0x7f7f440d937f in TranslateRules /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/dictionary.c:2320:6 #4 0x7f7f44230e5f in TranslateWord3 /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/translate.c:733:15 #5 0x7f7f44229844 in TranslateWord /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/translate.c:1100:14 #6 0x7f7f44256e50 in TranslateWord2 /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/translate.c:1361:11 #7 0x7f7f4424d6cc in TranslateClause /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/translate.c:2623:17 #8 0x7f7f44213359 in SpeakNextClause /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/synthesize.c:1569:2 #9 0x7f7f441a9f56 in Synthesize /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/speech.c:457:2 #10 0x7f7f441a9023 in sync_espeak_Synth /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/speech.c:570:29 #11 0x7f7f441ad59f in espeak_ng_Synthesize /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/speech.c:678:10 #12 0x7f7f4410b3f4 in espeak_Synth /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/espeak_api.c:90:32 #13 0x4a8be3 in main /home/samy/brl/speech/espeak-ng-git/src/espeak-ng.c:691:3 #14 0x7f7f43a2e7fc in __libc_start_main csu/../csu/libc-start.c:332:16 #15 0x421449 in _start (/home/samy/ens/projet/1/speech/espeak-ng-git/src/.libs/espeak-ng+0x421449) Uninitialized value was created by an allocation of 'sbuf' in the stack frame of function 'TranslateClause' #0 0x7f7f4423a1f0 in TranslateClause /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/translate.c:1941 While trying to match _NS1, MatchRule is overflowing the buffer. This had not usually posed a problem because rules usually have these non-eating special characters last in the rule, so it did not matter that post_ptr pointed outside valid text.	3 years ago
Samuel Thibault	0f85657c7b	MatchRule: Do not underflow the text Some rules test against a character not being of a certain type. That may match the \0 beginning-of-text marker, and thus actually step over it and let MatchRule continue with uninitialized data before it, leading to potentially random behavior. This commit fixes it by making sure that we don't read before that \0.	3 years ago
Samuel Thibault	e87691ad3b	Prevent LookupDict2 from overflowing wtab LookupDict2 looks forward in the wtab array, it should still stop at its end. Otherwise the memory sanitizer reports this: testing en A. B C, D. E: F. ==65960==WARNING: MemorySanitizer: use-of-uninitialized-value #0 0x7ff9d7ef0de8 in LookupDict2 /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/dictionary.c:2676:11 #1 0x7ff9d7eec2ec in LookupDictList /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/dictionary.c:2899:10 #2 0x7ff9d802860a in TranslateWord3 /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/translate.c:588:12 #3 0x7ff9d80249d4 in TranslateWord /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/translate.c:1100:14 #4 0x7ff9d8051fe0 in TranslateWord2 /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/translate.c:1361:11 #5 0x7ff9d804885c in TranslateClause /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/translate.c:2623:17 #6 0x7ff9d800e4e9 in SpeakNextClause /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/synthesize.c:1569:2 #7 0x7ff9d7fa50e6 in Synthesize /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/speech.c:457:2 #8 0x7ff9d7fa41b3 in sync_espeak_Synth /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/speech.c:570:29 #9 0x7ff9d7fa872f in espeak_ng_Synthesize /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/speech.c:678:10 #10 0x7ff9d7f06584 in espeak_Synth /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/espeak_api.c:90:32 #11 0x4a8be3 in main /home/samy/brl/speech/espeak-ng-git/src/espeak-ng.c:691:3 #12 0x7ff9d78297fc in __libc_start_main csu/../csu/libc-start.c:332:16 #13 0x421449 in _start (/home/samy/ens/projet/1/speech/espeak-ng-git/src/.libs/espeak-ng+0x421449) Uninitialized value was created by an allocation of 'words' in the stack frame of function 'TranslateClause' #0 0x7ff9d8035380 in TranslateClause /home/samy/brl/speech/espeak-ng-git/src/libespeak-ng/translate.c:1941	3 years ago
Samuel Thibault	c8de29dec9	Properly compare strings Strictly speaking, we are not supposed to use memcmp to compare strings, since we must not read beyond the \0 and memcmp is allowed to do so. Sanitizers would warn about it, and strncmp provides the proper semantics without being noticeably slower, so it is better to use it.	3 years ago
Samuel Thibault	1ce5a1bb0b	IsLetterGroup: Do not blindly walk back in the word strlen(p) may be arbitrarily long, that would underflow the word, for instance: testing fr Latn ================================================================= ==3741805==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffd733c1329 at pc 0x7ff5ffbad2de bp 0x7ffd733bf310 sp 0x7ffd733bf308 READ of size 1 at 0x7ffd733c1329 thread T0 #0 0x7ff5ffbad2dd in IsLetterGroup src/libespeak-ng/dictionary.c:714 #1 0x7ff5ffbbe425 in MatchRule src/libespeak-ng/dictionary.c:1979 #2 0x7ff5ffbc09e9 in TranslateRules src/libespeak-ng/dictionary.c:2301 #3 0x7ff5ffc26656 in TranslateWord3 src/libespeak-ng/translate.c:733 #4 0x7ff5ffc2a10b in TranslateWord src/libespeak-ng/translate.c:1100 #5 0x7ff5ffc2bef2 in TranslateWord2 src/libespeak-ng/translate.c:1361 #6 0x7ff5ffc374e2 in TranslateClause src/libespeak-ng/translate.c:2623 #7 0x7ff5ffc1d010 in SpeakNextClause src/libespeak-ng/synthesize.c:1569 #8 0x7ff5ffbfbd46 in Synthesize src/libespeak-ng/speech.c:492 #9 0x7ff5ffbfd52a in sync_espeak_Synth src/libespeak-ng/speech.c:570 #10 0x7ff5ffbfdd1f in espeak_ng_Synthesize src/libespeak-ng/speech.c:678 #11 0x7ff5ffbc72fd in espeak_Synth src/libespeak-ng/espeak_api.c:90 #12 0x5627511a3137 in main src/espeak-ng.c:691 #13 0x7ff5fee557fc in __libc_start_main ../csu/libc-start.c:332 #14 0x5627511a0569 in _start (/home/samy/ens/projet/1/speech/espeak-ng-git/src/.libs/espeak-ng+0x6569) Address 0x7ffd733c1329 is located in stack of thread T0 at offset 1177 in frame #0 0x7ff5ffc2f760 in TranslateClause src/libespeak-ng/translate.c:1941 This frame has 16 object(s): [48, 52) 'cc' (line 1944) [64, 68) 'source_index' (line 1945) [80, 84) 'prev_in' (line 1948) [96, 100) 'prev_out' (line 1949) [112, 116) 'next_in' (line 1952) [128, 132) 'char_inserted' (line 1954) [144, 148) 'word_flags' (line 1963) [160, 164) 'charix_top' (line 1975) [176, 180) 'tone' (line 1985) [192, 196) 'next2_in' (line 2294) [208, 212) 'c_temp' (line 2518) [224, 374) 'number_buf' (line 2522) [448, 1048) 'num_wtab' (line 2523) [1184, 1984) 'sbuf' (line 1982) <== Memory access at offset 1177 underflows this variable [2112, 3720) 'charix' (line 1977) [3856, 7456) 'words' (line 1978) sbuf is however properly '\0'-headed, so we can make IsLetterGroup carefully walk back in the word and issue a mismatch if it walks back too much. Fixes #1108	3 years ago
Samuel Thibault	34a12c83b4	IsLetterGroup: Do not blindly walk back in the word strlen(p) may be arbitrarily long, that would underflow the word, for instance: testing fr Latn ================================================================= ==3741805==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffd733c1329 at pc 0x7ff5ffbad2de bp 0x7ffd733bf310 sp 0x7ffd733bf308 READ of size 1 at 0x7ffd733c1329 thread T0 #0 0x7ff5ffbad2dd in IsLetterGroup src/libespeak-ng/dictionary.c:714 #1 0x7ff5ffbbe425 in MatchRule src/libespeak-ng/dictionary.c:1979 #2 0x7ff5ffbc09e9 in TranslateRules src/libespeak-ng/dictionary.c:2301 #3 0x7ff5ffc26656 in TranslateWord3 src/libespeak-ng/translate.c:733 #4 0x7ff5ffc2a10b in TranslateWord src/libespeak-ng/translate.c:1100 #5 0x7ff5ffc2bef2 in TranslateWord2 src/libespeak-ng/translate.c:1361 #6 0x7ff5ffc374e2 in TranslateClause src/libespeak-ng/translate.c:2623 #7 0x7ff5ffc1d010 in SpeakNextClause src/libespeak-ng/synthesize.c:1569 #8 0x7ff5ffbfbd46 in Synthesize src/libespeak-ng/speech.c:492 #9 0x7ff5ffbfd52a in sync_espeak_Synth src/libespeak-ng/speech.c:570 #10 0x7ff5ffbfdd1f in espeak_ng_Synthesize src/libespeak-ng/speech.c:678 #11 0x7ff5ffbc72fd in espeak_Synth src/libespeak-ng/espeak_api.c:90 #12 0x5627511a3137 in main src/espeak-ng.c:691 #13 0x7ff5fee557fc in __libc_start_main ../csu/libc-start.c:332 #14 0x5627511a0569 in _start (/home/samy/ens/projet/1/speech/espeak-ng-git/src/.libs/espeak-ng+0x6569) Address 0x7ffd733c1329 is located in stack of thread T0 at offset 1177 in frame #0 0x7ff5ffc2f760 in TranslateClause src/libespeak-ng/translate.c:1941 This frame has 16 object(s): [48, 52) 'cc' (line 1944) [64, 68) 'source_index' (line 1945) [80, 84) 'prev_in' (line 1948) [96, 100) 'prev_out' (line 1949) [112, 116) 'next_in' (line 1952) [128, 132) 'char_inserted' (line 1954) [144, 148) 'word_flags' (line 1963) [160, 164) 'charix_top' (line 1975) [176, 180) 'tone' (line 1985) [192, 196) 'next2_in' (line 2294) [208, 212) 'c_temp' (line 2518) [224, 374) 'number_buf' (line 2522) [448, 1048) 'num_wtab' (line 2523) [1184, 1984) 'sbuf' (line 1982) <== Memory access at offset 1177 underflows this variable [2112, 3720) 'charix' (line 1977) [3856, 7456) 'words' (line 1978) sbuf is however properly '\0'-headed, so we can make IsLetterGroup carefully walk back in the word and issue a mismatch if it walks back too much. Fixes #1108	3 years ago
Samuel Thibault	cd33c4042a	MatchRule: Do not go back before the pre-ptr pre_ptr is already one byte before the current letter, so we do not want to subtract 1 again. Otherwise this would for instance underflow word_iz of addPluralSuffixes.	3 years ago
Samuel Thibault	0119efec28	MatchRule: Do not go back before the pre-ptr pre_ptr is already one byte before the current letter, so we do not want to subtract 1 again. Otherwise this would for instance underflow word_iz of addPluralSuffixes.	3 years ago
Juho Hiltunen	d80f1a80a2	Use ESPEAKNG_DEFAULT_VOICE instead of hard coded "en". This will make it easier to set a default voice other than English. This is important for cases when a language will fall back to the default voice. Some references to L('e', 'n') still need to be changed.	4 years ago
Juho Hiltunen	f269acf22c	code cleanup: remove unneeded switch case for stress_rule = 10. There are no languages that use stress_rule = 10.	4 years ago
Juho Hiltunen	c6ef7061ca	code cleanup: use #defines for stress positions. The #defines should be renamed to better convey their meaning.	4 years ago
agonzalezd	2fb981855b	eu: Updated Basque phonetics and stress rule	4 years ago
Juho Hiltunen	389ce6b738	code cleanup: remove unused FLAG_HYPHENATED and related code. The flag is never set anywhere so the if-clause will always evaluate to false.	4 years ago
Juho Hiltunen	0cf3ee564c	Code cleanup: remove param2 from langopts and rename keyword option in language files. - param2[] is only used to set a second value to LOPT_BRACKET_PAUSE. It is simpler to have two values in param[] instead. This simplifies the codebase. - Instead of setting "option bracket X Y" in language files, use keywords "brackets X" and "bracketsAnnounced Y" instead to follow the naming convention of other keywords. - Add missing documentation to docs/voices.md.	4 years ago
freddii	61efed30fa	fixed spelling mistakes	4 years ago
Juho Hiltunen	ee944700f8	code cleanup: Check all local includes with include-what-you-use Going through files in src/libespeak-ng/, include-what-you-use removed a few unnecessary includes and included explanations on why a certain header should be included. This makes tracking globals and dependencies easier. Running the codebase through IWYU should be repeated after each major code restructuring. Includes to standard c library weren't checked to avoid breaking builds with other platforms. See https://github.com/include-what-you-use/include-what-you-use	5 years ago
BenTalagan	9fd480afbf	Fixing typos and naming	5 years ago
BenTalagan	94677f4af8	Rule alignment fixes for non compliant platforms / Fix for emscripten demo	5 years ago
Reece H. Dunn	c6ac526847	When printing phonemes, don't add a space at the start of a sentence or clause.	7 years ago
Reece H. Dunn	55c64036e0	Use UTF-8 strings in replace rules, instead of a packed UTF-16 pair.	7 years ago
Reece H. Dunn	0e91fcbc04	Don't use pw when reading the replacement data.	7 years ago
Reece H. Dunn	424f705525	Revert the new (broken) replacement rule logic. The replacement tests for bs, hr, and sr are no longer marked as broken as they work using the old code. The mk tests keep the broken annotation, as they don't work in the old code either. This reverts commit `801a8d197c`. This reverts commit `64d5701e5e`. This reverts commit `3b51ebf617`. This reverts commit `1fd235d2c0`. This reverts commit `9f0667de86`.	7 years ago
Valdis Vitolins	9f0667de86	Part of issue #199 — extend .replace rule to allow using groups of characters	7 years ago
Reece H. Dunn	55d001514e	Don't print 'Bad rules data in ...' if there are no rules in the dictionary.	7 years ago
Reece H. Dunn	5ebf5b8fa4	en: Fix several -er and -est words (e.g. wickeder, wickedest, brittlest).	7 years ago
Reece H. Dunn	36c29c479a	LookupDict2: Check if flags is not null before setting it. [msvc /analyze]	7 years ago
Reece H. Dunn	28f700e829	en: fix resignedly, manoeuvred, reckon, Frances (name), and sponging.	7 years ago
Reece H. Dunn	566e904b33	LookupDict2: Fix searching entries longer than 128 This is a fix for https://github.com/nvaccess/nvda/issues/7740. With the addition of emoji support, dictionary entries can now be longer than 128 bytes. This fix makes sure the character is interpreted as an unsigned byte so it does not treat long entries as having a negative offset. Treating the offset as a signed byte (like in the previous code) could cause the hash chain search to loop indefinitely when processing certain input, like the Tamil characters in the NVDA issue noted above that is added as a test case to translate.test.	7 years ago
Reece H. Dunn	b7a8751f4d	Remove a redundant comment -- history is available in git.	7 years ago
Juho Hiltunen	c4ec7bfe34	remove option_phoneme_variants that is never set and always evaluates true	7 years ago
Reece H. Dunn	dfe66289c8	Don't set final_ph2 to before phonetic in the stack if ix=1.	7 years ago
Reece H. Dunn	7dad0dfd40	Prevent TranslateRoman reading stack data from Lookup.	7 years ago
Reece H. Dunn	b24db06a84	Copy name to tr->dictionary_name if not equal This is a similar change to `b60d2452c3`. In this case, it is when tr->dictionary_name is passed as the name parameter in LoadDictionary. This happens in the SetTranslator2 function when loading the dictionary for the second language translator object.	7 years ago
Reece H. Dunn	b60d2452c3	Copy name in LoadDictionary if not dictionary_name compiledict.c sets dict_name to dictionary_name if dict_name is not set, and passes that to LoadDictionary. LoadDictionary then copies the passed in name to dictionary_name. This causes -fsanitize=address to fail with overlapping memory addresses passed to strncpy (copying the string to itself). As such, don't copy the name in this case.	7 years ago
Juho Hiltunen	231a1d0944	headers: add new file dictionary.h with declarations of functions in dictionary.c	7 years ago
Juho Hiltunen	cd991bd2c9	headers: add new file numbers.h with declarations of functions in numbers.c	7 years ago
Juho Hiltunen	07160f9286	headers: add new file synthdata.h with declarations of functions in synthdata.c	7 years ago
Juho Hiltunen	706df97b20	headers: add new file readclause.h with declarations of functions in readclause.c	7 years ago
Juho Hiltunen	30ad5c39f6	move struct MatchRecord to dictionary.c since it's the only file that uses it	7 years ago
Juho Hiltunen	78749f14f8	readability fix: use boolean instead of 0 and 1 for loop control	7 years ago
Juho Hiltunen	da287fb851	Unify terminology for stress synthesize.h now contains the definitions STRESS_IS_... that should be used with code related to syllable stress. Note that isBreak and other defines were renumbered so that stress definitions could have values 0-6. Possible TODOs: 1. Unify with terms used with phonemes, i.e. keywords like isDiminished in compiledata.c and stress_type in phsource/phonemes 2. Add functionality and documentation about STRESS_IS_PRIORITY and STRESS_IS_EMPHASIZED	7 years ago
Juho Hiltunen	88315250ff	readability fix: use boolean instead of 0 and 1 for loop control	7 years ago
Juho Hiltunen	afb4fe75e2	fi: fix behaviour of S_2_TO_HEAVY (adding secondary stress) Stress flag S_2_TO_HEAVY is currently only used by Finnish. Current behaviour skips adding secondary stress if the following syllable is heavy. The behaviour should be to skip adding secondary stress if the rest of the word (excluding last syllable) contains a heavy syllable. Source of grammar rule and examples of expected behaviour: http://scripta.kotus.fi/visk/sisallys.php?p=13	7 years ago
Reece H. Dunn	50a2d8e291	Revert "Use strcpy instead of memcpy+strlen." This reverts commit `119c200e00`.	7 years ago
Reece H. Dunn	7f42e0aaca	compiledict.c: Fix -Wmissing-prototypes warnings.	7 years ago
Reece H. Dunn	74f9f5e34b	wavegen.c: Fix -Wmissing-prototypes warnings.	7 years ago
Reece H. Dunn	119c200e00	Use strcpy instead of memcpy+strlen. This replaces uses of: memcpy(dst, src, strlen(src)) with: strcpy(dst, src) This fixes issues with reading past the end of the copied buffer (e.g. when processing word-based replacements for emoji characters) by ensuring that the destination buffer is null terminated. Reported by Michael Curran.	7 years ago
Reece H. Dunn	a2f751044c	Remove unused letter assignment in MatchRule. This was identified by the clang static analyser. The letter variable is set in the various match_type switch cases, so does not need to be initialised in the start of the while loop.	8 years ago
Reece H. Dunn	ecdff298b0	last_letter in MatchRule is not used. This was identified by the clang static analyser.	8 years ago
Reece H. Dunn	8a777385a8	Use wflags to access wtab->flags in LookupDict2. Clang static analysis reports a 'Dereference of null pointer' error when accessing wtab->flags. This is properly guarded against when setting the wflags variable, so use that variable instead.	8 years ago
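
The signed-byte problem described in commit 566e904b33 can be illustrated with a minimal sketch. The table layout here (each entry begins with a one-byte length, and a zero length ends the chain) and the function names are assumptions made for illustration; this is not the actual LookupDict2 code. On typical two's-complement platforms, a length byte above 127 read as a signed value becomes a negative step, so a search would move backwards instead of advancing.

```c
#include <stddef.h>

/* Hypothetical helper: step to the next entry, reading the
 * length byte as unsigned (the behaviour the fix describes). */
static int next_offset_unsigned(const char *entry)
{
    return (unsigned char)entry[0];
}

/* Hypothetical helper: the same byte read as signed. Values above
 * 127 typically come out negative, which is the failure mode. */
static int next_offset_signed(const char *entry)
{
    return (signed char)entry[0];
}

/* Count entries in an assumed length-prefixed chain. A zero length
 * byte ends the chain; `size` bounds the walk so the loop can never
 * run past the end of the buffer. */
static int count_entries(const char *table, size_t size)
{
    int n = 0;
    size_t pos = 0;

    while (pos < size) {
        unsigned char len = (unsigned char)table[pos];
        if (len == 0)
            break;
        n++;
        pos += len;
    }
    return n;
}
```

With a 200-byte entry followed by a 5-byte entry, `count_entries` sees two entries, whereas `next_offset_signed` on the first entry yields a negative step.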


115 Commits (55b2e9b2a3cf80989a4dcfe6afde122e3bb92297)