Reece H. Dunn
|
c41ac642fa
|
tokenizer.c: Tokenise Zp codepoints as paragraphs.
|
8 years ago |
Reece H. Dunn
|
f3ea6f68f3
|
tokenizer.c: Tokenise U+000B [VERTICAL TAB (VT)] as whitespace, not as newlines.
|
8 years ago |
Reece H. Dunn
|
fc7a4e6701
|
tokenizer.c: Recognise U+000C [FORM FEED (FF)] as a newline codepoint.
|
8 years ago |
Reece H. Dunn
|
d2d718d700
|
tokenizer.c: Tokenize line separator codepoints as newline tokens.
|
8 years ago |
Reece H. Dunn
|
bf45e7ce36
|
tokenizer.c: Recognise U+0085 [NEW LINE (NEL)] as a newline codepoint.
|
8 years ago |
Reece H. Dunn
|
df6ca7a22c
|
tokenizer.c: Support whitespace tokens.
|
8 years ago |
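Taken together, the tokenizer commits above amount to a per-codepoint classification rule. The sketch below is a hypothetical illustration, not the tokenizer.c code. The enum and function names are invented. It handles only the codepoints these commits name, plus space and tab (assumed to be whitespace). CR LF pairing needs lookahead and is left out.

```c
#include <stdint.h>

typedef uint32_t codepoint_t;

/* Hypothetical token categories; tokenizer.c defines its own. */
typedef enum {
	TOKEN_OTHER,
	TOKEN_WHITESPACE,
	TOKEN_NEWLINE,
	TOKEN_PARAGRAPH
} token_class;

/* Classify one codepoint following the rules described in the commits. */
token_class classify_codepoint(codepoint_t c)
{
	switch (c)
	{
	case 0x000A: /* LINE FEED (LF) */
	case 0x000C: /* FORM FEED (FF) */
	case 0x000D: /* CARRIAGE RETURN (CR) */
	case 0x0085: /* NEW LINE (NEL) */
	case 0x2028: /* LINE SEPARATOR, the only Zl codepoint */
		return TOKEN_NEWLINE;
	case 0x2029: /* PARAGRAPH SEPARATOR, the only Zp codepoint */
		return TOKEN_PARAGRAPH;
	case 0x0009: /* CHARACTER TABULATION (assumed whitespace) */
	case 0x000B: /* VERTICAL TAB (VT): whitespace, not a newline */
	case 0x0020: /* SPACE (assumed whitespace) */
		return TOKEN_WHITESPACE;
	default:
		return TOKEN_OTHER;
	}
}
```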
Reece H. Dunn
|
8f0dae6a38
|
tokenizer.c: Support windows newlines.
|
8 years ago |
Reece H. Dunn
|
b897ff5aa8
|
encoding.c: Support calling peekc past the end of the buffer. This makes calling peekc easier.
|
8 years ago |
Reece H. Dunn
|
3f692f498b
|
encoding.c: Implement a peekc API.
|
8 years ago |
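With a peekc that tolerates the end of the buffer, a caller can look ahead without first checking for end of input. Here is a minimal sketch over a byte buffer. It assumes, hypothetically, that 0 is returned once the input is exhausted. The struct and function names are illustrative and are not the encoding.c API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder state over a buffer of single-byte characters. */
typedef struct {
	const uint8_t *current;
	const uint8_t *end;
} byte_decoder;

/* Return the next character without consuming it, or 0 at end of input. */
uint32_t decoder_peekc(const byte_decoder *d)
{
	return d->current < d->end ? *d->current : 0;
}

/* Consume and return the next character, or 0 at end of input. */
uint32_t decoder_getc(byte_decoder *d)
{
	return d->current < d->end ? *d->current++ : 0;
}
```

Because both functions can be called repeatedly at the end of the buffer, lookahead such as "is the next character LF?" needs no separate bounds check.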
Reece H. Dunn
|
1c8ed9c190
|
tokenizer.c: Support mac newlines.
|
8 years ago |
Reece H. Dunn
|
7602c9ac18
|
tokenizer.c: Support linux newlines.
|
8 years ago |
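Linux (LF), Mac (CR) and Windows (CR LF) line endings should each yield exactly one newline. The sketch below counts line breaks in a byte string to show the one-character lookahead that CR LF requires. It illustrates the rule and is not the tokenizer's code. The function name is hypothetical.

```c
#include <stddef.h>

/* Count line breaks: "\r\n" counts once, a lone "\r" or "\n" counts once. */
size_t count_newlines(const char *s)
{
	size_t count = 0;
	while (*s)
	{
		if (*s == '\r')
		{
			++count;
			if (s[1] == '\n') /* Windows: CR LF is a single newline */
				++s;
		}
		else if (*s == '\n') /* Linux: LF */
			++count;
		++s;
	}
	return count;
}
```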
Reece H. Dunn
|
bce44316bb
|
Create a basic tokenizer API using a structure that mirrors the TtsTokenizer interface in the tts-dev-studio project.
|
8 years ago |
Reece H. Dunn
|
5c6bc0e556
|
Armenian emphasis mark (U+055B) is used for interjections, so treat it as an exclamation mark.
|
8 years ago |
Reece H. Dunn
|
1c4ce3dcd3
|
tokenizer.c: create and use a clause_type_from_codepoint function, with tests.
|
8 years ago |
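The two commits above map clause-ending punctuation to a clause type. The Armenian emphasis mark is grouped with the exclamation mark. The sketch below is only in the spirit of clause_type_from_codepoint. Its enum and function name are hypothetical, since espeak-ng uses its own clause flags. It covers only a few ASCII marks plus U+055B.

```c
#include <stdint.h>

typedef uint32_t codepoint_t;

/* Hypothetical clause types; espeak-ng defines its own values. */
typedef enum {
	CLAUSE_NONE,
	CLAUSE_PERIOD,
	CLAUSE_COMMA,
	CLAUSE_QUESTION,
	CLAUSE_EXCLAMATION
} clause_type;

clause_type simple_clause_type(codepoint_t c)
{
	switch (c)
	{
	case '.':
		return CLAUSE_PERIOD;
	case ',':
		return CLAUSE_COMMA;
	case '?':
		return CLAUSE_QUESTION;
	case '!':
	case 0x055B: /* ARMENIAN EMPHASIS MARK, used for interjections */
		return CLAUSE_EXCLAMATION;
	default:
		return CLAUSE_NONE;
	}
}
```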
Reece H. Dunn
|
691457e98d
|
Add Prepended_Concatenation_Mark support from PropList.txt.
|
8 years ago |
Reece H. Dunn
|
4ce8b61180
|
Extend ucd_property to 64-bits to allow all properties to be specified.
|
8 years ago |
Reece H. Dunn
|
a9aabc6242
|
Add tests for the PropList API.
|
8 years ago |
Reece H. Dunn
|
9dabf64680
|
encoding.c: Support determining the string length for length < 0.
|
8 years ago |
Reece H. Dunn
|
b5ed1f28a5
|
encoding.c: Don't crash if NULL is passed as the string to the decode APIs.
|
8 years ago |
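The two decoder fixes above share a pattern. A NULL string is treated as empty instead of crashing, and a negative length means "compute it from the NUL terminator". Here is a minimal sketch using a hypothetical helper that resolves the byte range before decoding. It is not the encoding.c function.

```c
#include <stddef.h>
#include <string.h>

/* Resolve the [begin, end) byte range to decode. A NULL string is treated
 * as empty; a negative length means the string is NUL-terminated. */
void resolve_input_range(const char *string, int length,
                         const char **begin, const char **end)
{
	if (string == NULL)
	{
		*begin = *end = "";
		return;
	}
	if (length < 0)
		length = (int)strlen(string);
	*begin = string;
	*end = string + length;
}
```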
Reece H. Dunn
|
d167d5649b
|
encoding.c: Implement support for the auto-detected character set (utf-8 + codepoint-encoding).
|
8 years ago |
Reece H. Dunn
|
6a0b5e4ae1
|
encoding.c: Support using wchar_t strings with the text decoder API.
|
8 years ago |
Reece H. Dunn
|
b74f756f00
|
encoding.c: Support the ISO-10646-UCS-2 encoding.
|
8 years ago |
Reece H. Dunn
|
fa5d31a8af
|
encoding.c: Support the UTF-8 encoding.
|
8 years ago |
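A UTF-8 decoder reads a lead byte that gives the sequence length, then appends the low six bits of each continuation byte. The sketch below is a simplified decoder, not the encoding.c implementation. Returning U+FFFD and consuming one byte on malformed input is an assumption made here. It rejects overlong forms, surrogates and values above U+10FFFF, as RFC 3629 requires.

```c
#include <stddef.h>
#include <stdint.h>

#define REPLACEMENT_CHAR 0xFFFD

/* Decode one UTF-8 sequence at *p (requires *p < end), advance *p past it
 * and return the codepoint. Malformed input consumes one byte. */
uint32_t utf8_decode(const uint8_t **p, const uint8_t *end)
{
	const uint8_t *s = *p;
	uint32_t c = *s;
	size_t n;     /* number of continuation bytes */
	uint32_t min; /* smallest codepoint valid for this length */

	if (c < 0x80) { *p = s + 1; return c; }
	else if ((c & 0xE0) == 0xC0) { n = 1; c &= 0x1F; min = 0x80; }
	else if ((c & 0xF0) == 0xE0) { n = 2; c &= 0x0F; min = 0x800; }
	else if ((c & 0xF8) == 0xF0) { n = 3; c &= 0x07; min = 0x10000; }
	else { *p = s + 1; return REPLACEMENT_CHAR; }

	if ((size_t)(end - s) <= n) { *p = s + 1; return REPLACEMENT_CHAR; }
	for (size_t i = 1; i <= n; ++i)
	{
		if ((s[i] & 0xC0) != 0x80) { *p = s + 1; return REPLACEMENT_CHAR; }
		c = (c << 6) | (s[i] & 0x3F);
	}
	/* Reject overlong forms, surrogates and values above U+10FFFF. */
	if (c < min || (c >= 0xD800 && c <= 0xDFFF) || c > 0x10FFFF)
	{
		*p = s + 1;
		return REPLACEMENT_CHAR;
	}
	*p = s + n + 1;
	return c;
}
```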
Reece H. Dunn
|
2499610433
|
encoding.c: Support the ISCII encoding.
|
8 years ago |
Reece H. Dunn
|
39f3ea54cf
|
encoding.c: Support the KOI8-R encoding.
|
8 years ago |
Reece H. Dunn
|
b8a1006dd8
|
encoding.c: Support the ISO 8859-16 encoding.
|
8 years ago |
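ISO 8859-15 differs from ISO 8859-1 in only eight positions, which include the euro sign at 0xA4. Every other byte maps to the codepoint with the same value. The sketch below builds a single-byte decoder on that observation. encoding.c may use full lookup tables instead, and the function name is hypothetical.

```c
#include <stdint.h>

/* Decode one ISO 8859-15 byte. Bytes not listed map to the same codepoint,
 * exactly as in ISO 8859-1. */
uint32_t iso8859_15_decode(uint8_t byte)
{
	switch (byte)
	{
	case 0xA4: return 0x20AC; /* EURO SIGN */
	case 0xA6: return 0x0160; /* LATIN CAPITAL LETTER S WITH CARON */
	case 0xA8: return 0x0161; /* LATIN SMALL LETTER S WITH CARON */
	case 0xB4: return 0x017D; /* LATIN CAPITAL LETTER Z WITH CARON */
	case 0xB8: return 0x017E; /* LATIN SMALL LETTER Z WITH CARON */
	case 0xBC: return 0x0152; /* LATIN CAPITAL LIGATURE OE */
	case 0xBD: return 0x0153; /* LATIN SMALL LIGATURE OE */
	case 0xBE: return 0x0178; /* LATIN CAPITAL LETTER Y WITH DIAERESIS */
	default:   return byte;
	}
}
```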
Reece H. Dunn
|
166e815723
|
encoding.c: Support the ISO 8859-15 encoding.
|
8 years ago |
Reece H. Dunn
|
91e054ec7c
|
encoding.c: Fix the ISO 8859 encoding names with date suffixes.
|
8 years ago |
Reece H. Dunn
|
0235c42652
|
encoding.c: Support the ISO 8859-14 encoding.
|
8 years ago |
Reece H. Dunn
|
24faceab57
|
encoding.c: Support the ISO 8859-13 encoding.
|
8 years ago |
Reece H. Dunn
|
495c0aed20
|
encoding.c: Support the ISO 8859-11 encoding.
|
8 years ago |
Reece H. Dunn
|
84f20f8bb8
|
encoding.c: Support the ISO 8859-10 encoding.
|
8 years ago |
Reece H. Dunn
|
0421f127e8
|
encoding.c: Support the ISO 8859-9 encoding.
|
8 years ago |
Reece H. Dunn
|
7da585e25e
|
encoding.c: Support the ISO 8859-8 encoding.
|
8 years ago |
Reece H. Dunn
|
56c0b38785
|
encoding.c: Support the ISO 8859-7 encoding.
|
8 years ago |
Reece H. Dunn
|
9e4638ff25
|
encoding.c: Support the ISO 8859-6 encoding.
|
8 years ago |
Reece H. Dunn
|
51295d9d1b
|
encoding.c: Support the ISO 8859-5 encoding.
|
8 years ago |
Reece H. Dunn
|
b5589fc5ee
|
encoding.c: Support the ISO 8859-4 encoding.
|
8 years ago |
Reece H. Dunn
|
a93b0f3d64
|
encoding.c: Support the ISO 8859-3 encoding.
|
8 years ago |
Reece H. Dunn
|
0a0e84a322
|
encoding.c: Support the ISO 8859-2 encoding.
|
8 years ago |
Reece H. Dunn
|
26bec1eedf
|
encoding.c: Support the ISO 8859-1 encoding.
|
8 years ago |
Reece H. Dunn
|
0590da5da7
|
encoding.c: Create a string decoding API; support US-ASCII decoding.
|
8 years ago |
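US-ASCII defines only bytes 0x00 to 0x7F, and each maps to the codepoint with the same value. The sketch below decodes a whole byte string into a codepoint buffer. Substituting U+FFFD for bytes 0x80 to 0xFF is an assumption here, not necessarily what encoding.c does. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode `count` US-ASCII bytes into `out`; returns the number written. */
size_t ascii_decode(const uint8_t *in, size_t count, uint32_t *out)
{
	for (size_t i = 0; i < count; ++i)
		out[i] = in[i] < 0x80 ? in[i] : 0xFFFD; /* assumed replacement */
	return count;
}
```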
Reece H. Dunn
|
da7eaa7b9c
|
encoding.c: Create a text decoder API based on the usage in readclause.c.
|
8 years ago |
Reece H. Dunn
|
887b1c837f
|
encoding.c: Don't crash when passing a NULL string to LookupMnem.
|
8 years ago |
Reece H. Dunn
|
26f4eb4f8f
|
encoding.c: Support US-ASCII encoding names.
|
8 years ago |
Reece H. Dunn
|
b47363b7d3
|
Create an espeak_ng_EncodingFromName API.
|
8 years ago |
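espeak_ng_EncodingFromName maps a character-set name to an encoding. The sketch below shows one plausible shape for such a lookup: a case-insensitive search of an alias table that treats a NULL name as unknown rather than crashing. The enum, table and function are hypothetical. The aliases are a few of the IANA-registered names for US-ASCII and ISO 8859-1, including the date-suffixed ISO_8859-1:1987.

```c
#include <ctype.h>
#include <stddef.h>

typedef enum { ENC_UNKNOWN, ENC_US_ASCII, ENC_ISO_8859_1 } encoding_id;

typedef struct { const char *name; encoding_id id; } encoding_alias;

/* A small subset of IANA aliases, for illustration only. */
static const encoding_alias aliases[] = {
	{ "US-ASCII",        ENC_US_ASCII },
	{ "ANSI_X3.4-1968",  ENC_US_ASCII },
	{ "ISO646-US",       ENC_US_ASCII },
	{ "ISO-8859-1",      ENC_ISO_8859_1 },
	{ "ISO_8859-1",      ENC_ISO_8859_1 },
	{ "ISO_8859-1:1987", ENC_ISO_8859_1 },
	{ "latin1",          ENC_ISO_8859_1 },
};

/* Compare two ASCII strings ignoring case; non-zero if equal. */
static int equal_ignoring_case(const char *a, const char *b)
{
	while (*a && *b)
	{
		if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
			return 0;
		++a, ++b;
	}
	return *a == *b;
}

encoding_id encoding_from_name(const char *name)
{
	if (name == NULL) /* do not crash on a NULL name */
		return ENC_UNKNOWN;
	for (size_t i = 0; i < sizeof(aliases) / sizeof(aliases[0]); ++i)
		if (equal_ignoring_case(aliases[i].name, name))
			return aliases[i].id;
	return ENC_UNKNOWN;
}
```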
Reece H. Dunn
|
ac082c9400
|
Add tests for the remaining is* APIs.
|
8 years ago |
Reece H. Dunn
|
c9f2940373
|
isblank: don't include <noBreak> characters, and add tests for this API.
|
8 years ago |
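In Unicode terms, a blank is a horizontal space: tab (as in C isblank) plus the space separators of general category Zs. The commit above excludes the Zs characters whose decomposition is tagged <noBreak>: U+00A0 NO-BREAK SPACE, U+2007 FIGURE SPACE and U+202F NARROW NO-BREAK SPACE. The sketch below hard-codes the current Zs list instead of reading the UCD tables, and the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* A blank is TAB or a Zs space separator without a <noBreak> decomposition. */
bool is_blank(uint32_t c)
{
	switch (c)
	{
	case 0x0009: /* CHARACTER TABULATION, as in C isblank */
	case 0x0020: /* SPACE */
	case 0x1680: /* OGHAM SPACE MARK */
	case 0x205F: /* MEDIUM MATHEMATICAL SPACE */
	case 0x3000: /* IDEOGRAPHIC SPACE */
		return true;
	case 0x00A0: /* NO-BREAK SPACE: <noBreak> */
	case 0x2007: /* FIGURE SPACE: <noBreak> */
	case 0x202F: /* NARROW NO-BREAK SPACE: <noBreak> */
		return false;
	default:
		/* U+2000..U+200A are Zs; U+2007 was excluded above. */
		return c >= 0x2000 && c <= 0x200A;
	}
}
```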
Reece H. Dunn
|
5f9dc111cf
|
Add tests for the isdigit and isxdigit ctype APIs.
|
8 years ago |
Reece H. Dunn
|
bd71fed013
|
ctype: return true in isupper/islower if there is a simple case mapping present
|
8 years ago |
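Under this definition, a codepoint is upper case if it has a simple lowercase mapping, and lower case if it has a simple uppercase mapping. The sketch below is restricted to ASCII and Latin-1 letters so that it stays short; real data would come from the simple case mapping fields of UnicodeData.txt. All function names are hypothetical. Under this rule U+00DF (ß) is not lower case, because it has no simple uppercase mapping.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simple lowercase mapping, ASCII and Latin-1 letters only. */
static uint32_t simple_lower(uint32_t c)
{
	if (c >= 'A' && c <= 'Z') return c + 0x20;
	if (c >= 0xC0 && c <= 0xDE && c != 0xD7) return c + 0x20; /* skip U+00D7 */
	return c;
}

/* Simple uppercase mapping, ASCII and Latin-1 letters only. */
static uint32_t simple_upper(uint32_t c)
{
	if (c >= 'a' && c <= 'z') return c - 0x20;
	if (c >= 0xE0 && c <= 0xFE && c != 0xF7) return c - 0x20; /* skip U+00F7 */
	if (c == 0xFF) return 0x0178; /* y with diaeresis -> U+0178 */
	return c; /* includes U+00DF, which has no simple uppercase mapping */
}

bool is_upper(uint32_t c) { return simple_lower(c) != c; }
bool is_lower(uint32_t c) { return simple_upper(c) != c; }
```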