code cleanup: start moving translateWord3() to a new source file.
The new file will expose only one callable function, which should
simplify the code structure.
Existing code will be changed to use function parameters instead of
global variables.
A possible problem is the large number of dependencies on numbers.c.
cmn: search for dictionary matches instead of translating characters.
cmn (Mandarin Chinese) has been broken since 4825905.
This fix makes Mandarin behave more like Cantonese. Instead of
translating characters, we search for dictionary matches.
The handling of normal versus Chao tones needs further investigation:
it looks like Latin-script pinyin still uses Chao tones, whereas
the characters in cmn_list and cmn_listx do not.
See #1044 for discussion. See also #1028 and #1163.
When working in backward mode, utf8_in2 assumes that it is given
the address of the last byte of the previous character (see all other
calls to utf8_in2). Otherwise, utf8_in2 returns the size of the current
multibyte character instead of that of the previous one.
This first reverts "Fix number_buf buffer overflow"
(commit ada93e2db0)
This for loop is apparently expected to skip over NUL
characters.
Fixes #1302
Instead, this limits number processing to 32 digits, as break_numbers does
not support more and would produce bogus results with further digits.
Also fix the signedness of break_numbers so that the 32nd bit
actually works.