 BCP 47 language subtag registry making the newly registered `hyw` language code the
 preferred value for `hy-arevmda`. This change keeps support for detecting the
 `hy-arevela` and `hy-arevmda` language tags.
+* Support any length replacement rule strings for the source part of the rule (replacing
+  from the 'source' string to the 'target' string).
 * Add more tests to check the various parts of espeak-ng.
 * Various changes to clean up the codebase.
 * Fix various compiler warnings (`-Winitialized`, `-Wmissing-prototypes`, `-Wreturn-type`,
 ## Character Substitution
 Character substitutions can be specified by using a `.replace` section
-at the start of the `*_rules` file. In each line one character can be
-replaced by one or two characters. (Source and target of replacement can consume
-up to four bytes.) This substitution is done to a word _before_ word is searched
-in `*_list` or `*_listx` file and translated using the spelling-to-phoneme rules.
-Only the lower-case version of the characters needs to be specified. e.g.:
+at the start of the `*_rules` file. In each line multiple _source_ characters
+can be replaced by one or two characters. This substitution is done to a word
+_before_ the word is searched in the `*_list` or `*_listx` file and translated using
+the spelling-to-phoneme rules. Only the lower-case version of the characters
+needs to be specified. e.g.:
 .replace
 ô ő // (Hungarian) allow the use of o-circumflex instead of o-double-acute
 if (nc == fc) {
     if (*from == 0) return from + 1;
-    from += utf8_in((int *)&fc, from);
-    match_next += utf8_in((int *)&nc, match_next);
-    nc = towlower2(nc, tr);
-    if (*from == 0 && nc == fc) {
-        *ignore_next_n = 1;
+    bool matched = true;
+    int nmatched = 0;
+    while (*from != 0) {
+        from += utf8_in((int *)&fc, from);
+        match_next += utf8_in((int *)&nc, match_next);
+        nc = towlower2(nc, tr);
+        if (nc != fc)
+            matched = false;
+        else
+            nmatched++;
+    }
+    if (*from == 0 && matched) {
+        *ignore_next_n = nmatched;
         return from + 1;
     }
 }
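
The matching idea in the hunk above can be illustrated in isolation. The following is a minimal sketch, not the espeak-ng implementation: `decode_utf8`, `lower_ascii`, and `match_source` are hypothetical names standing in for `utf8_in`, `towlower2`, and the loop in the patch. It assumes well-formed UTF-8 and lower-cases ASCII only. Unlike the patch, it counts every matched character (including the first), stops at the first mismatch, and treats running out of input as a failed match.

```c
#include <assert.h>
#include <stddef.h>

/* Decode one UTF-8 sequence at s into *cp and return the bytes consumed.
 * Hypothetical stand-in for utf8_in(); assumes well-formed input. */
static int decode_utf8(const char *s, unsigned int *cp)
{
    const unsigned char *p = (const unsigned char *)s;
    if (p[0] < 0x80) {
        *cp = p[0];
        return 1;
    }
    if ((p[0] & 0xE0) == 0xC0) {
        *cp = ((p[0] & 0x1Fu) << 6) | (p[1] & 0x3Fu);
        return 2;
    }
    if ((p[0] & 0xF0) == 0xE0) {
        *cp = ((p[0] & 0x0Fu) << 12) | ((p[1] & 0x3Fu) << 6) | (p[2] & 0x3Fu);
        return 3;
    }
    *cp = ((p[0] & 0x07u) << 18) | ((p[1] & 0x3Fu) << 12)
        | ((p[2] & 0x3Fu) << 6) | (p[3] & 0x3Fu);
    return 4;
}

/* ASCII-only lower-casing; a simplification of locale-aware lowering. */
static unsigned int lower_ascii(unsigned int c)
{
    return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
}

/* Return the number of input characters consumed when the whole
 * (lower-case) source string matches the start of input, else 0. */
static int match_source(const char *source, const char *input)
{
    assert(source != NULL && input != NULL);
    int nmatched = 0;
    while (*source != 0) {
        unsigned int sc, ic;
        if (*input == 0)
            return 0; /* input ended before the source string did */
        source += decode_utf8(source, &sc);
        input += decode_utf8(input, &ic);
        if (lower_ascii(ic) != sc)
            return 0; /* first mismatch ends the attempt */
        nmatched++;
    }
    return nmatched;
}
```

A caller would skip `match_source(...)` characters of the input after emitting the rule's target, which mirrors the role `ignore_next_n` plays in the patch.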