Add Pashto language support based on Urdu language files
This commit adds support for the Pashto language (ps) to espeak-ng. The implementation is based on the Urdu language files: it adds the ps_rules, ps_list, ps_emoji and ps_extra files, and updates Makefile.am to include Pashto in the dictionary targets and build rules.
fix: add Pashto data file.
refactor: use enhanced rules for stress.
fix: add missing configs.
Add Pashto phoneme support and improve voice files
fix: add Pashto phonemes test.
fix: restore original phonemes.
fix: remove redundant ps_dict from Makefile.am.
fix: use correct phonemes with IPA and stress rules.
feat: translate all en_emoji to Pashto.
fix: add Pashto dict entry in Makefile.am
feat: enhance ps_rules with example pairs and words.
Fuzzing: compile the whole libespeak with -fsanitize=fuzzer-no-link
-fsanitize=fuzzer-no-link makes it possible to build a library with
fuzzing instrumentation enabled, and to let the fuzzing test program
explicitly trigger the fuzzing. This allows the fuzzer to trace cmp
instructions and use them to guide fuzzing toward better results.
code cleanup: new file langopts.c for handling language options.
The switch statement in LoadVoice() currently mixes voice and language
options. This change starts separating them into two functions.
CheckTranslator will be moved to langopts.c; in the future it should no
longer be needed in voices.c. There will also be other temporary
solutions along the way.
code cleanup: start moving translateWord3() to a new source file.
The file will be organized to have only one callable function, which
should make the code structure simpler.
Existing code will be changed to use function parameters instead of
global variables.
A possible problem is the large number of dependencies on numbers.c.
Make distclean-local remove more espeak-ng-data files
If for whatever reason the build broke, recompiling the data would fail
because espeak-ng would be unable to re-open the existing files. One is
then stuck until the files are removed by hand. Let's make "make
distclean" remove all of them so that the build can always restart from
zero.
* Add: fuzzer files and modifications in config & compilation
* add configure.ac change
* add minimize-corpus.sh
* add fuzzing directory and readme
* add a check for whether CC supports libFuzzer
* Make workflow dump the crash POC
* Add debugging information
* Run fuzzing only once a week for now
Co-authored-by: kmamadoudram <[email protected]>
Co-authored-by: yocvito <[email protected]>
Co-authored-by: Samuel Thibault <[email protected]>
This commit implements support for [Totontepec Mixe](https://en.wikipedia.org/wiki/Totontepec_Mixe). The Espeak rules are based on the phonological inventory, orthographic mappings, and phonetic processes described in the "Esbozo fonológico" (phonological outline/sketch) chapter of Verónica Guzmán Guzmán's 2012 master's thesis in Indo American Linguistics awarded by the [Centro de Investigaciones y Estudios Superiores en Antropología Social](https://ciesas.edu.mx/) and *Vocabulario Mixe de Totontepec* (Totontepec Mixe vocabulary), compiled by Alvin Schoenhals and Louise C. Schoenhals and published by the Summer Institute of Linguistics in 1965.
This commit was developed as part of a project for [Computational Linguistics](https://jnw.domains.swarthmore.edu/ling073/syllabus.php) at [Swarthmore College](https://swarthmore.edu). We feel that this language is suitable for merge with "testing" status, but further verification/improvements by native speakers would be very helpful.
Co-authored-by: Elizabeth Resendiz <[email protected]>
Noticed a build failure on NixOS when building the package with 'make -j16':
build flags: -j16 -l16 SHELL=bash
Makefile:2844: warning: ignoring prerequisites on suffix rule definition
make all-am
make[1]: Entering directory '/build/espeak-ng'
Makefile:2844: warning: ignoring prerequisites on suffix rule definition
touch dictsource/az_extra
...
touch dictsource/yue_extra
cd dictsource && ESPEAK_DATA_PATH=/build/espeak-ng LD_LIBRARY_PATH=../src: ../src/espeak-ng --compile=yue && cd ..
bash: line 1: ../src/espeak-ng: No such file or directory
make[1]: *** [Makefile:3546: espeak-ng-data/yue_dict] Error 127
make[1]: Leaving directory '/build/espeak-ng'
make: *** [Makefile:831: all] Error 2
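The fix is to add a dependency on 'espeak-ng', as the other rules already
do, so that a parallel build cannot start compiling a dictionary before
src/espeak-ng has been built.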
For ⟨ae⟩, we used to use the Latin [[aI]] phoneme, which sounds like
/ae/. Call that phoneme [[aE]] for us and introduce a separate [[aI]]
one, which sounds more like /ai/ (since Sindarin has both, and they’re
supposed to sound different, though Appendix E of The Lord of the Rings
notes that there is nothing closely corresponding to ⟨ae⟩ in English and
that it may be pronounced like ⟨ai⟩). Furthermore, for ⟨oe⟩, just remove
the TODO – the Latin phoneme is called [[OI]] but sounds more like /oe/
than /oi/, so it’s actually just fine for our purposes. Finally, the
⟨ui⟩ diphthong is copied from Finnish, just like in Quenya.
Both are copied from the Finnish phonemes, since Finnish was a major
inspiration for Quenya. This means that the ⟨iu⟩ diphthong is a
“falling” one – according to Appendix E of The Lord of the Rings, this
is the original pronunciation, but by the Third Age (the time in which
The Lord of the Rings is set) it had become a “rising” one, so I may
change the phoneme later; I’m not sure yet.
This prepares the languages of Quenya and Sindarin, setting up their
infrastructure without declaring a lot of rules yet – just enough for
“Eä” (a Quenya word, but I can’t think of a similarly simple one for
Sindarin). Phonemes are inherited from Esperanto for now.
Found with the following script:
for file in phsource/ph_*; do
grep -qF "$file" Makefile.am || echo "$file missing!";
done
I’ve left out one of the files highlighted by that script: ph_burmese
exists, but isn’t actually used by phsource/phonemes (there’s no
`include` directive for it).
- fix a buffer overflow in ucd_tolower that caused failures when
  compiling with AddressSanitizer
- force the use of the C++ compiler for espeak-ng
- add a malloc in the fuzz target so that it works on a
  null-terminated string
- set (but do not overwrite) the ESPEAK_DATA_PATH environment
  variable inside the fuzz target
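The "set but do not overwrite" rule for ESPEAK_DATA_PATH has the same semantics as POSIX setenv() called with its overwrite argument set to 0. When launching a fuzzing binary by hand, the equivalent can be sketched in shell as below; the default directory is a hypothetical example, not necessarily the path the fuzz target uses.

```shell
# Give ESPEAK_DATA_PATH a default only if it is not already set.
# ${VAR=word} assigns only when VAR is unset (an existing value, even an
# empty one, is kept), mirroring setenv("ESPEAK_DATA_PATH", ..., 0).
# The default directory below is a hypothetical example.
: "${ESPEAK_DATA_PATH=/usr/share/espeak-ng-data}"
export ESPEAK_DATA_PATH
echo "using ESPEAK_DATA_PATH=$ESPEAK_DATA_PATH"
```

Using `=` rather than `:=` is deliberate here: `:=` would also replace a variable that is set but empty, which is closer to overwriting than the fuzz target's behaviour.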
And simplify the _dict pattern rule. If a build had previously completed
in the directory and "make distclean" had not been run, the ??_dict
dictionaries still existed. When one of them was rebuilt, espeak-ng
loaded the old (out-of-date) copy first, so the second build behaved
inconsistently. The change removes the target before building it, thus
ensuring that old, possibly damaged, data is not used.
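In shell terms the new recipe order is "delete the target, then compile". The function below is a hypothetical sketch of that order (rebuild_dict is not part of the espeak-ng build); the compile step is passed in as a command so the idea can be tried without espeak-ng.

```shell
# rebuild_dict LANG CMD [ARGS...]  (hypothetical helper)
# Delete espeak-ng-data/LANG_dict, then run CMD. In the real Makefile,
# CMD would be the espeak-ng --compile=LANG invocation.
rebuild_dict() {
    dict="espeak-ng-data/$1_dict"
    shift
    # Remove any stale or damaged copy so the compiler cannot load it first.
    rm -f "$dict" || return 1
    # Run the compile step; its exit status becomes the function's status.
    "$@"
}
```

For example, `rebuild_dict yue sh -c 'cd dictsource && ../src/espeak-ng --compile=yue'` would mirror the recipe for espeak-ng-data/yue_dict (the real rule also sets ESPEAK_DATA_PATH and LD_LIBRARY_PATH).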
I also changed the extraction of the language code; the GNU make %
(pattern) extension, together with the long-standing $* variable (which
predates GNU make), allows the text matched by '%' to be used directly in
the command line. The cd .. at the end of the command is unnecessary:
make (all versions) executes each command line in a separate shell, via
a single system() call, so the cd has no effect on make itself.
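The point about the trailing cd .. can be checked directly: each command line behaves like a separate `sh -c` invocation, so a cd inside it changes only that child shell's working directory. A small demonstration, using / as an arbitrary directory to cd into:

```shell
start=$(pwd)
# Simulate one recipe line: the cd happens inside a child shell.
sh -c 'cd / && echo "inside the recipe line: $(pwd)"'
# The parent (playing the role of make) is unaffected, so a trailing
# "cd .." in the recipe changes nothing that make can observe.
echo "back in the parent: $(pwd)"
```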
Signed-off-by: John Bowler <[email protected]>
Soundicons are used for external audio with SSML <audio> tag and for
replacing punctuation names with sound files in LoadConfig().
Currently there's a bug with soundicon slots: if both LoadConfig() and
<audio> are used, the punctuation sound icons occupy all slots, so no
sound from <audio> is played.