HomoFast-eSpeak-Persian

Author	SHA1	Message	Date
Valdis Vitolins	0954b24c8c	Merge pull request #1248	3 years ago
Samuel Thibault	c7cf819df7	Add .. test Ref #1271	3 years ago
Samuel Thibault	dda0967b0d	Make it clear where phoneme tests should be added	3 years ago
kmamadoudram	1f76c4b8bd	adding fuzzer for espeak_synth (#1178) * Add: fuzzer files and modifications in config & compil * add configure.ac change * add minimize-corpus.sh * add fuzzing directory and readme * add to check if CC support libfuzzer * Make workflow dump the crash POC * Add debugging information * Run fuzzing only once a week for now Co-authored-by: kmamadoudram Co-authored-by: yocvito Co-authored-by: Samuel Thibault	3 years ago
Bill Dengler	254b64939d	Add support for Totontepec Mixe This commit implements support for [Totontepec Mixe](https://en.wikipedia.org/wiki/Totontepec_Mixe). The Espeak rules are based on the phonological inventory, orthographic mappings, and phonetic processes described in the "Esbozo fonológico" (phonological outline/sketch) chapter of Verónica Guzmán Guzmán's 2012 master's thesis in Indo American Linguistics awarded by the [Centro de Investigaciones y Estudios Superiores en Antropología Social](https://ciesas.edu.mx/) and Vocabulario Mixe de Totontepec (Totontepec Mixe vocabulary), compiled by Alvin Schoenhals and Louise C. Schoenhals and published by the Summer Institute of Linguistics in 1965. This commit was developed as part of a project for [Computational Linguistics](https://jnw.domains.swarthmore.edu/ling073/syllabus.php) at [Swarthmore College](https://swarthmore.edu). We feel that this language is suitable for merge with "testing" status, but further verification/improvements by native speakers would be very helpful. co-authored-by: Elizabeth Resendiz	3 years ago
Dekedro	bc0ceab7b9	Fix grep regex in tests	3 years ago
Samuel Thibault	199cbe4a1a	Mark re-used phoneme_list as undefined And fill the last phlist prepause and newword fields, otherwise they are detected as undefined: ==483407== Conditional jump or move depends on uninitialised value(s) ==483407== at 0x488E6AB: Generate (synthesize.c:1228) ==483407== by 0x488FD94: SpeakNextClause (synthesize.c:1587) ==483407== by 0x4887F56: Synthesize (speech.c:457) ==483407== by 0x488884C: sync_espeak_Synth (speech.c:570) ==483407== by 0x487B270: espeak_Synth (espeak_api.c:90) ==483407== by 0x10ACA0: main (espeak-ng.c:691) ==483407== Uninitialised value was created by a client request ==483407== at 0x4884893: MakePhonemeList (phonemelist.c:155) ==483407== by 0x4895712: TranslateClause (translate.c:2682) ==483407== by 0x488FCCF: SpeakNextClause (synthesize.c:1569) ==483407== by 0x4887F56: Synthesize (speech.c:457) ==483407== by 0x488884C: sync_espeak_Synth (speech.c:570) ==483407== by 0x487B270: espeak_Synth (espeak_api.c:90) ==483407== by 0x10ACA0: main (espeak-ng.c:691) ==483407== ==483407== Conditional jump or move depends on uninitialised value(s) ==483407== at 0x488E622: Generate (synthesize.c:1211) ==483407== by 0x488FD94: SpeakNextClause (synthesize.c:1587) ==483407== by 0x4887F56: Synthesize (speech.c:457) ==483407== by 0x488884C: sync_espeak_Synth (speech.c:570) ==483407== by 0x487B270: espeak_Synth (espeak_api.c:90) ==483407== by 0x10ACA0: main (espeak-ng.c:691) ==483407== Uninitialised value was created by a client request ==483407== at 0x4884893: MakePhonemeList (phonemelist.c:155) ==483407== by 0x4895712: TranslateClause (translate.c:2682) ==483407== by 0x488FCCF: SpeakNextClause (synthesize.c:1569) ==483407== by 0x4887F56: Synthesize (speech.c:457) ==483407== by 0x488884C: sync_espeak_Synth (speech.c:570) ==483407== by 0x487B270: espeak_Synth (espeak_api.c:90) ==483407== by 0x10ACA0: main (espeak-ng.c:691) ==483407== This is changing the ssml.test output, but with no audible difference, so this is probably a real fix for it.	3 years ago
Samuel Thibault	a34d74ed43	Make envelope computation more robust When pollint() returns 100.0, multiplying by 2.55 doesn't actually seem to yield 255 on i386. Multiplying by 255 and dividing by 100, however, does (probably because float computations with small integer values are guaranteed to have integer results). Fixes #1151	3 years ago
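The rounding issue behind the envelope fix can be illustrated with a small C sketch. The function names are hypothetical and this is not eSpeak NG's envelope code; it assumes a percentage in [0, 100] is converted to a byte in [0, 255] by truncation, and uses `double` for simplicity rather than whatever type eSpeak NG actually uses.

```c
#include <assert.h>

/* Hypothetical helpers (not eSpeak NG's envelope code): convert a
   percentage in [0, 100] to a byte value in [0, 255] by truncation. */

/* Fragile: 2.55 has no exact binary representation, so the product can
   land just below an integer (e.g. 254.999...) and truncate to one less. */
static int percent_to_byte_fragile(double percent)
{
    assert(percent >= 0.0 && percent <= 100.0);
    return (int)(percent * 2.55);
}

/* Robust: for integer-valued percentages, percent * 255 is an exact
   integer, and the correctly rounded division by 100 is exact whenever
   the true quotient is an integer, so truncation cannot lose a unit. */
static int percent_to_byte_robust(double percent)
{
    assert(percent >= 0.0 && percent <= 100.0);
    return (int)(percent * 255 / 100);
}
```

On a typical x86-64 build with IEEE-754 doubles, `100.0 * 2.55` evaluates to 254.99999999999997, so the fragile version returns 254 for 100.0. The exact outcome depends on the type and evaluation precision in use, which is consistent with the commit seeing the problem only on i386.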
Samuel Thibault	2878e91db0	Fix testsuite under various locales When the current locale doesn't match the current voice, grep would be surprised by the produced output and believe that this is not text, for instance with LC_ALL=ru_RU.CP1251 we get: TEST tests/language-replace.test [...] testing mk grep: (standard input): binary file matches 2d1 < Translate 'пејзаж' But we can give -a to grep so it always considers its input as text.	3 years ago
Samuel Thibault	f352f1e43f	CheckThousandsGroup: Avoid reading uninitialized data For the case when the word is shorter than 4 characters, we should not look at the 3rd or 4th character before checking the previous ones, otherwise we'd at best read uninitialized data, at worst nonexistent data.	3 years ago
Samuel Thibault	075cac9b07	fr: Fix PR replacement Ref #853	3 years ago
Samuel Thibault	02dd413a32	Add valgrind CI run Now that all errors are fixed.	3 years ago
Valdis Vitolins	5435f465c8	Issue #1063: update tests	3 years ago
Samuel Thibault	f23265419d	tests: Check value returned by espeak-ng Otherwise we would miss errors produced on shutdown.	3 years ago
Valdis Vitolins	22005d5c86	Test before pull request #1126: MatchRule: Do not overflow the text	3 years ago
Valdis Vitolins	c14636b3bd	Add phoneme test	3 years ago
Samuel Thibault	7f1222c6ad	Properly maintain margin in ph_list2 TranslateWord2 uses phonemes in ph_list2. Apart from the breakable loops, it may statically require up to 7 phonemes. Then TranslateClause always uses 2 phonemes. We thus have to keep these margins along the loops to avoid any overflow. Fixes #1073 #1095	3 years ago
Samuel Thibault	26a675543c	Properly maintain margin in ph_list2 TranslateWord2 uses phonemes in ph_list2. Apart from the breakable loops, it may statically require up to 7 phonemes. Then TranslateClause always uses 2 phonemes. We thus have to keep these margins along the loops to avoid any overflow. Fixes #1073	3 years ago
Samuel Thibault	5ae18f9d4a	Properly maintain margin in ph_list2 TranslateWord2 uses phonemes in ph_list2. Apart from the breakable loops, it may statically require up to 7 phonemes. Then TranslateClause always uses 2 phonemes. We thus have to keep these margins along the loops to avoid any overflow. Fixes #1073	3 years ago
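The "keep these margins along the loops" idea from the ph_list2 commits can be sketched as a fixed-size buffer whose fill loop re-checks the reserved space on every iteration. Everything here is hypothetical: `PhonemeList`, `LIST_CAPACITY` and both helper functions are illustrative names, not eSpeak NG's ph_list2 code. Only the margin sizes (up to 7 for the per-word step, 2 for the clause step) come from the commit message.

```c
#include <assert.h>
#include <stddef.h>

#define LIST_CAPACITY 32   /* hypothetical buffer size */
#define WORD_MARGIN 7      /* slots a per-word step may still need (per the commit) */
#define CLAUSE_MARGIN 2    /* slots always needed at the end of the clause */

typedef struct {
    int codes[LIST_CAPACITY];
    size_t len;
} PhonemeList;

/* Hypothetical: append as many of the `n` codes as fit while leaving room
   for WORD_MARGIN + CLAUSE_MARGIN later entries; returns how many fit. */
static size_t append_with_margin(PhonemeList *list, const int *codes, size_t n)
{
    const size_t reserved = WORD_MARGIN + CLAUSE_MARGIN;
    size_t appended = 0;

    /* The margin is checked on every iteration, not just once before the
       loop, so a long input can never eat into the reserved slots. */
    while (appended < n && list->len + reserved < LIST_CAPACITY) {
        list->codes[list->len++] = codes[appended++];
    }
    return appended;
}

/* Later fixed-size steps rely on the reserved slots being available. */
static void append_reserved(PhonemeList *list, int code)
{
    assert(list->len < LIST_CAPACITY);
    list->codes[list->len++] = code;
}
```

With a 32-entry buffer, the loop stops at 23 entries, leaving exactly the 9 slots that the later steps may use without further bounds checks.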
Ulrich Müller	889092c9d6	tests: Add unit test for ieee80.c Signed-off-by: Ulrich Müller	3 years ago
Valdis Vitolins	f78bb1bec5	en: fix issue #1069: Ligature ﬅ is st, not ft	3 years ago
Valdis Vitolins	86bc55c5c4	Revert fixes for Russian from [email protected] This reverts commit `433d219eca`.	3 years ago
Ineiev	2ecff9fd40	ru: add pronunciation for common acronym, сша	3 years ago
Ineiev	aa663ad7f7	ru: improve definitions for l and l;	3 years ago
Ineiev	55fa5ea5d3	ru: use s. and z. instead of S and Z	3 years ago
Ineiev	b5115298b4	ru: improve source for y	3 years ago
Ineiev	e105e98747	ru: add palatalized velars	3 years ago
Ineiev	d8f56d14a3	sr, ru: use t from Serbian in Russian	3 years ago
Ineiev	433d219eca	ru: fix name of vcd pal nas pzd	3 years ago
Valdis Vitolins	cded9518ab	Fix test for Arabic	3 years ago
bespsm	7aacdef65d	Add initial support for Belarusian	4 years ago
Marco BARNIG	be962b067b	Add support for Luxembourgish	3 years ago
Valdis Vitolins	7211bb77ec	Revert "ar: issue #1009: improve sound of Ain" This reverts commit `02783150cd`.	4 years ago
Valdis Vitolins	078ff000ad	Revert "ar: issue #1009: change r to trilling R" This reverts commit `aa765c7680`.	4 years ago
Valdis Vitolins	02783150cd	ar: issue #1009: improve sound of Ain	4 years ago
Valdis Vitolins	aa765c7680	ar: issue #1009: change r to trilling R	4 years ago
Valdis Vitolins	9425be8386	Allow several pipe delimited hash values for Klatt tests	4 years ago
Juho Hiltunen	01f094346d	nb: fix regression for language options. `182aba4cc` started calling Norwegian nb instead of no. The result is that code in tr_languages.c was never run.	4 years ago
Juho Hiltunen	88cb55ee0f	tests: add missing phoneme tests. Added tests are copied from existing tests from the same language family or English. The tested phonemes should be adapted for better test coverage.	4 years ago
Juho Hiltunen	ddde4b1060	tests: add a test to make sure each language has a phoneme test	4 years ago
Lucas Werkmeister	2dfc8ae66e	Add Elvish ⟨k⟩ as equivalent of ⟨c⟩ According to Appendix E of The Lord of the Rings, ⟨k⟩ is used with the same value as ⟨c⟩ in names from non-Elvish languages (both representing /k/). However, in the Silmarillion, ⟨k⟩ is also used in some Elvish names, such as Tulkas and Kementári, as well as in some words in the Appendix (Elements in Quenya and Sindarin Names), e.g. kir- as an element or root in Calacirya, Cirth, and other words. And in earlier versions of the language (when Quenya was called Qenya and Sindarin Gnomish), ⟨k⟩ also often occurs. Therefore, let’s support it as an alternative spelling of ⟨c⟩. Currently, eSpeak NG doesn’t seem to do the two-step replacement of ⟨kh⟩→⟨ch⟩→⟨x⟩, which means that ⟨kh⟩ is ultimately pronounced as /kh/ (or /kʰ/?) rather than [χ]; according to Appendix E, this is correct in Dwarvish, while in Orkish and Adûnaic ⟨kh⟩ should be equivalent to ⟨ch⟩. Since we’re not really aiming for pronouncing any of these languages, either way is fine.	4 years ago
Lucas Werkmeister	95d74edd86	Fix Elvish double plosives Consonants written twice always represent long consonants, not actual repetition. eSpeak NG’s default behavior when speaking a doubled consonant phoneme seems to work well enough for non-plosive consonants, but for plosives, we need to tell it that the two input characters correspond to one long phoneme, not a repeated regular one. All three doubled voiceless plosives – ⟨tt⟩, ⟨pp⟩, ⟨cc⟩ – are regularly found in Quenya, according to the Ambar Eldaron Quenya Dictionary [1]. Their voiced counterparts – ⟨dd⟩, ⟨bb⟩, ⟨gg⟩ – apparently don’t occur, nor are any doubled plosives to be found in the Omikhleia Sindarin Dictionary [2], voiced or not. But let’s define all six pairs in both languages anyway, since it doesn’t cost us much to do so, and it seems fairly clear that this is how these double consonants should be pronounced, if they ever occurred. [1]: https://ambar-eldaron.com/telechargements/quenya-engl-A4.pdf [2]: https://www.jrrvf.com/hisweloke/sindar/index.html	4 years ago
Lucas Werkmeister	3ad8114e4a	Fix Elvish ⟨o⟩, ⟨ó⟩, ⟨ô⟩ vowels ⟨o⟩ almost certainly represents [ɔ] – Appendix E of The Lord of the Rings describes it as the sound in English “for”. This means we should use a phoneme [[O]], not [[o]]; we should also create our own phoneme for this, since the one we inherit from Latin sounds much more like [o] to me. In Quenya, long ⟨ó⟩ (and, presumably, ⟨ô⟩) is, according to Appendix E, “tenser and ‘closer’”, which presumably means [o]. (Online sources seem to agree.) The Latin [[o:]] phoneme works well enough for this. In Sindarin, ⟨ó⟩ has “the same quality” as ⟨o⟩ according to Appendix E, so emit it as [[O:]] for [ɔː]. This sounds sensible enough to me. I’m undecided whether “Lothlórien” should be in sjn_list, to pronounce it with [oː] instead of [ɔː]. It’s composed of Sindarin “loth” and Quenya “Lórien”, so that could potentially justify a pronunciation with a Quenya ⟨ó⟩. But then again, maybe it should be a standard Sindarin ⟨ó⟩. For now, I’ve opted to not add it; in the film The Fellowship of the Ring, Aragorn (Viggo Mortensen) says “Lothlórien” after the Fellowship leave Moria, and to me his ⟨ó⟩ sounds more like [ɔː] than [oː], so if this is wrong, at least it’s no more wrong than the famous movie adaptation :)	4 years ago
Lucas Werkmeister	4ed36b07ec	Fix Elvish ⟨e⟩, ⟨é⟩, ⟨ê⟩ vowels ⟨e⟩ almost certainly always represents [ɛ], not [e]. Appendix E of The Lord of the Rings describes it as the sound in English “were”, and I’m not aware of any English dialect that pronounces “were” with an [e]. In Quenya, long ⟨é⟩ (and, presumably, ⟨ê⟩) is, according to Appendix E, “tenser and ‘closer’”, which I assume means [e]. Several online sources agree with this as well. In Sindarin, Appendix E is quite clear that ⟨é⟩ has “the same quality” as ⟨e⟩, only differing from it in length: I assume this must mean that ⟨é⟩ is [ɛː] in Sindarin. The online information on this is confusing and sometimes contradictory even within the same page; several sources claim that Sindarin has an [eː], but I have not seen this claim substantiated with a source from Tolkien, and I suspect it’s simply a confusion with Quenya. It scarcely matters, anyway: Sindarin words with ⟨é⟩ or ⟨ê⟩ seem to be pretty rare. (I’m aware of a single word with an ⟨é⟩ – the name Eluréd, son of Dior – and the Omikhleia Sindarin dictionary [1] features some words with ⟨ê⟩, giving their pronunciation with [ɛː].) The [[EI]] phoneme for Sindarin ⟨ei⟩ is copied from the base2 phonemes. [1]: https://www.jrrvf.com/hisweloke/sindar/index.html	4 years ago
Lucas Werkmeister	cc4b50f3f4	Replace Elvish [[ui]]/[[uI]] diphthong Previously, we used vdiph/ui_4 for [[ui]]; I think the main reason for that was that I didn’t like how the most common ⟨ui⟩, vdiph/ui, seemed to almost vanish in “Cuiviénen”. However, vdiph/ui_4 has the curious property that in some positions, e.g. ⟨uia⟩ in “tuia” or ⟨uil⟩ “tuilindo”, it sounds (to me) more like /ul/ than /ui/. (This also affects Finnish, which seems to be the only other language that uses vdiph/ui_4 [a few other languages also use it for [[ui]] but don’t seem to emit that phoneme in their rules files] – listen to eSpeak NG pronounce Finnish “luiun”, for instance.) I eventually found out that this can be worked around by substantially lengthening the phoneme (length 500 seems to work in all positions), but this extreme length (the absolute maximum is just 511) becomes rather noticeable whenever the ui is used, including in positions where it had sounded just fine before. Meanwhile, the more standard vdiph/ui can be made to sound reasonably good in “Cuiviénen” with a much smaller increment to its length: 290 (as also in ph_lithuanian) instead of 240 (as in ph_base2) is enough. In this version, [[uI]] sounds acceptable enough for Elvish ⟨ui⟩ in all positions, as far as I can tell.	4 years ago
Lucas Werkmeister	9a1bb4ccf9	Add Quenya ⟨hy⟩ According to Appendix E of The Lord of the Rings, this has the same relation to ⟨y⟩ ([j]) as ⟨hw⟩ ([ʍ]) does to ⟨w⟩ ([w]) – this probably means the voiceless palatal fricative [ç], though Wikipedia says a voiceless palatal approximant (which would be closer to [j], the voiced palatal approximant) is sometimes also posited. We previously emitted [[hj]] for ⟨hy⟩, which sounds fairly close to [ç], similar to how [[hw]] is fairly close to [ʍ] (see previous commit) – however, translating it into [[C]] again means better --ipa output. (In Sindarin, ⟨hy⟩ does not occur.)	4 years ago
Lucas Werkmeister	2f77829172	Add Elvish ⟨hw⟩ This is “a voiceless w, as in English white (in northern pronunciation)” according to Appendix E of The Lord of the Rings, and so we copy the [[w#]] phoneme from the English phonemes. I can’t actually hear much of a difference from the previous [[hw]] (I know what the difference between [[w]] and [[w#]] should be, but [[hw]] already sounds like [[w#]] to me), but at least this improves the --ipa output, changing it from [hw] to [ʍ].	4 years ago
Lucas Werkmeister	a39b6b7079	Add Quenya ⟨hr⟩ and Sindarin ⟨rh⟩ Both represent a “voiceless R”, which I believe means a voiceless alveolar trill, [r̥]. Ideally this would be one phoneme, but I’m not sure eSpeak NG currently has a phoneme for this. The Wikipedia article [1][2] lists occurrences in comparatively few languages, and I chose Welsh for guidance: eSpeak NG currently turns Welsh “Rhagfyr” into [[hr'agvYr]], and [[h]] and [[r]] are apparently just two separate phonemes, so for now we do the same for Quenya and Sindarin, and emit hR. [1]: https://en.wikipedia.org/wiki/Voiceless_alveolar_trill [2]: https://en.wikipedia.org/wiki/Special:PermanentLink/1024721264	4 years ago
Lucas Werkmeister	2d47c32ba9	Add Quenya ⟨hl⟩ and Sindarin ⟨lh⟩ Both represent a “voiceless L”; Appendix E of The Lord of the Rings notes that the Quenya ⟨hl⟩ was pronounced like /l/ by the Third Age, but for now we reproduce the original pronunciation. (Maybe we can later use conditional rules for different pronunciations, but I think for now I won’t go down that road.)	4 years ago
Lucas Werkmeister	438be8ed50	Change short Elvish ⟨i⟩ from [[i]] to [[I]] The B-side of the album Poems and Songs of Middle Earth begins with a reading of the Sindarin poem A Elbereth Gilthoniel by J.R.R. Tolkien himself, and in this recording, as best I can tell, he always pronounces short i (i.e. ⟨i⟩, not ⟨í⟩ or ⟨î⟩) as /ɪ/ rather than /i/, regardless of stress; for instance, the word “silivren” has the same i-sound twice (it is not “silívren”). I believe this means that we should use the phoneme [[I]], not [[i]], for ⟨i⟩ (in both Quenya and Sindarin).	4 years ago


392 Commits (ff0f9c2045db5d0c40d73cd15bf6ebd33a75b923)