For ⟨ae⟩, we used to use the Latin [[aI]] phoneme, which actually sounds
like /ae/. Make that phoneme available as [[aE]] for our languages and
introduce a separate [[aI]] one, which sounds more like /ai/ (since
Sindarin has both, and they’re
supposed to sound different, though Appendix E of The Lord of the Rings
notes that there is nothing closely corresponding to ⟨ae⟩ in English and
that it may be pronounced like ⟨ai⟩). Furthermore, for ⟨oe⟩, just remove
the TODO – the Latin phoneme is called [[OI]] but sounds more like /oe/
than /oi/, so it’s actually just fine for our purposes. Finally, the
⟨ui⟩ diphthong is copied from Finnish, just like in Quenya.
According to Appendix E of The Lord of the Rings, ⟨ph⟩ stands for /f/
when final (because ⟨f⟩ is pronounced as /v/ in that position), and
otherwise is used instead of ⟨f⟩ either because it’s derived from ⟨p⟩
(in which case it’s presumably pronounced just like ⟨f⟩), or to
represent an especially long /f/. We can’t really tell which case we
have, but from the Omikhleia Sindarin dictionary [1], it appears that
all the long ⟨ph⟩’s are between two vowels, and all the short ones have
at least one adjacent consonant, so let’s use that as a rule and hope it
works out. (The Ambar Eldaron Quenya dictionary [2] is less easily
searchable, so I’m just hoping that this rule works reasonably well for
both languages.)
[1]: https://www.jrrvf.com/hisweloke/sindar/index.html
[2]: https://ambar-eldaron.com/telechargements/quenya-engl-A4.pdf
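The ⟨ph⟩ heuristic above can be sketched roughly as follows. This is
only an illustration in Python, not the actual espeak-ng rule syntax;
the function name, the vowel set, and the sample words are assumptions
made for this sketch, and the sample words only exercise the rule rather
than claim anything about the dictionary entries.

```python
# Hypothetical helper: decides whether a "ph" should be treated as long,
# using the "between two vowels" heuristic described above.
# The vowel inventory below is an assumption for this sketch.
VOWELS = set("aeiouyáéíóúýâêîôûŷäëïöü")


def ph_is_long(word: str, index: int) -> bool:
    """Return True if the "ph" starting at `index` sits between two vowels."""
    word = word.lower()
    if word[index:index + 2] != "ph":
        raise ValueError("no 'ph' at this index")
    # Neighbours on either side; an empty string stands for a word boundary.
    before = word[index - 1] if index > 0 else ""
    after = word[index + 2] if index + 2 < len(word) else ""
    return before in VOWELS and after in VOWELS
```

With this sketch, a ⟨ph⟩ with a vowel on both sides (as in "ephel")
counts as long, while one next to a consonant or at the end of the word
(as in "alph") counts as short.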
Not all of the diphthongs in Quenya and Sindarin are defined in the
Latin phonemes, and for now we’re sticking to those, so some diphthongs
just get TODOs for now. Also, we’re temporarily using the same phoneme
for ⟨ae⟩/⟨ai⟩ and ⟨oe⟩/⟨oi⟩, which should really be different (though
Appendix E of The Lord of the Rings notes that ⟨ae⟩ and ⟨oe⟩ don’t have
close English equivalents, and that they “may be pronounced as ai, oi”).
A circumflex marks “specially prolonged” vowels; according to Appendix E
of The Lord of the Rings, only in Sindarin, but the Ambar Eldaron Quenya
dictionary [1] also has a very few circumflex words (sû, lîs), so
let’s support the circumflex in Quenya as well. Marking extra-long
vowels with two colons seems to work well and is also done in several
other languages.
[1]: https://ambar-eldaron.com/telechargements/quenya-engl-A4.pdf
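As a rough illustration of the length marking, here is a small Python
sketch that maps a vowel's diacritic to a length suffix. It is not how
the espeak-ng rules are written; the function name is hypothetical, and
the single colon for an acute accent is an assumption (only the two
colons for extra-long vowels come from the text above).

```python
import unicodedata

ACUTE = "\u0301"       # combining acute accent
CIRCUMFLEX = "\u0302"  # combining circumflex accent


def length_mark(vowel: str) -> str:
    """Return a length suffix for a (possibly accented) vowel letter."""
    # NFD splits e.g. "û" into "u" plus the combining circumflex.
    decomposed = unicodedata.normalize("NFD", vowel)
    if CIRCUMFLEX in decomposed:
        return "::"  # extra-long ("specially prolonged") vowel
    if ACUTE in decomposed:
        return ":"   # assumed marking for an ordinary long vowel
    return ""        # short vowel: no length mark
```

So the ⟨û⟩ of “sû” would get two colons and a plain ⟨u⟩ none.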
Long vowels, marked with an acute accent or a circumflex, are longer
than short vowels (duh) and always make a heavy syllable (i.e. we don’t
include the rules to move stress to the previous syllable). In Quenya,
⟨é⟩ and ⟨ó⟩ are “tenser and ‘closer’” than the short vowels, according
to Appendix E of The Lord of the Rings, while in Sindarin they’re
supposed to be the same; the phonemes we inherit from Latin seem to
reproduce this reasonably well for Quenya, and for now we use them for
Sindarin too, which works nicely for the most common Sindarin word with
a long o, “Lothlórien” (because Lórien is actually a Quenya name, and
therefore I assume *that* ⟨ó⟩ should actually be /o/ and not /ɔ/). I
might adjust the phonemes later (at which point Lothlórien will
presumably have to go in sjn_list).
In Elvish languages, ⟨ch⟩, ⟨dh⟩, and ⟨th⟩ count as single consonants for
the purposes of stress, since they represent single letters in the
original scripts. The easiest way to implement this is to replace them
with single letters at the beginning – ⟨ð⟩ for ⟨dh⟩ and ⟨þ⟩ for ⟨th⟩ are
natural, and ⟨x⟩ for ⟨ch⟩ also makes some sense, though it means we need
to replace real ⟨x⟩ first (it’s not mentioned in Appendix E of The Lord
of the Rings, but does occur in some Quenya words, notably Helcaraxë).
Real ⟨x⟩ is pronounced like /ks/, but of course we need to spell that as
⟨cs⟩, since ⟨k⟩ does not occur in Elvish languages.
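The replacement order described above can be sketched in Python like
this. It only illustrates the ordering (real ⟨x⟩ first, then the
digraphs), it is not the actual espeak-ng rule file, and the function
name is made up for this sketch.

```python
def normalize_digraphs(word: str) -> str:
    """Replace Elvish digraphs with single placeholder letters."""
    word = word.lower()
    # Real "x" has to go first, because "x" is reused for "ch" below.
    word = word.replace("x", "cs")
    # Each digraph becomes one letter so it counts as a single consonant.
    for digraph, letter in (("ch", "x"), ("dh", "ð"), ("th", "þ")):
        word = word.replace(digraph, letter)
    return word
```

For example, “Helcaraxë” becomes “helcaracsë”, while “edhel” becomes
“eðel” and “echor” becomes “exor”. Doing the ⟨x⟩ replacement after the
digraphs instead would wrongly turn the ⟨ch⟩ placeholder into ⟨cs⟩.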
- fix a buffer overflow in ucd_tolower leading to failure when
compiling with address sanitizer
- force the use of C++ compiler for espeak-ng
- add a malloc to have a null-terminated string in the fuzz target
- set (but do not overwrite) the ESPEAK_DATA_PATH environment
variable inside the fuzz target