Reece H. Dunn | 5030ff95cc | automake: don't make the tests print 'done', have the make rule print 'PASSED' instead. | 8 years ago
Reece H. Dunn | 8821c9e361 | Split out the readclause tests so the tokenizer tests can use public-only APIs. | 8 years ago
Reece H. Dunn | d72557aed2 | Add simple tests for testing voice selection by name. | 8 years ago
Reece H. Dunn | 6ec3e85007 | Add language.tests for de, en and jp to test the phoneme generation. | 8 years ago
Reece H. Dunn | dd90d3812d | tokenizer.c: Support general symbol tokens. | 8 years ago
Reece H. Dunn | 786575c6ed | tokenizer.c: Support general punctuation tokens. | 8 years ago
Reece H. Dunn | 683579f403 | Make the tokenizer.h API public. | 8 years ago
Reece H. Dunn | 9af96da469 | Make the encoding.h API public. | 8 years ago
Reece H. Dunn | 55bfbb4754 | tokenizer.c: Support ellipsis tokens. | 8 years ago
Reece H. Dunn | b847df63b5 | tokenizer.c: Support semicolon tokens. | 8 years ago
Reece H. Dunn | af7e8fc5a3 | tokenizer.c: Support colon tokens. | 8 years ago
Reece H. Dunn | 7560070dcd | tokenizer.c: Support comma tokens. | 8 years ago
Reece H. Dunn | c9199cfacb | tokenizer.c: Support exclamation mark tokens. | 8 years ago
Reece H. Dunn | 128ceaff6a | tokenizer.c: Support question mark tokens. | 8 years ago
Reece H. Dunn | 8f62e18324 | tokenizer.c: Support full stop tokens. | 8 years ago
Reece H. Dunn | d50f3f2fa5 | tokenizer.c: Support word tokens. | 8 years ago
Reece H. Dunn | a902f451d8 | tests/tokenizer.test: Support printing the tokens from a provided file, making it easy to investigate tokenizer issues. | 8 years ago
Reece H. Dunn | d093513b65 | tokenizer.c: Add an options parameter to the tokenizer_reset API. | 8 years ago
Reece H. Dunn | c41ac642fa | tokenizer.c: Tokenise Zp codepoints as paragraphs. | 8 years ago
Reece H. Dunn | f3ea6f68f3 | tokenizer.c: Tokenise U+000B [VERTICAL TAB (VT)] as whitespace, not as newlines. | 8 years ago
Reece H. Dunn | fc7a4e6701 | tokenizer.c: Recognise U+000C [FORM FEED (FF)] as a newline codepoint. | 8 years ago
Reece H. Dunn | d2d718d700 | tokenizer.c: Tokenize line separator codepoints as newline tokens. | 8 years ago
Reece H. Dunn | bf45e7ce36 | tokenizer.c: Recognise U+0085 [NEW LINE (NEL)] as a newline codepoint. | 8 years ago
Reece H. Dunn | df6ca7a22c | tokenizer.c: Support whitespace tokens. | 8 years ago
Reece H. Dunn | 8f0dae6a38 | tokenizer.c: Support windows newlines. | 8 years ago
Reece H. Dunn | b897ff5aa8 | encoding.c: Support calling peekc past the end of the buffer. This makes calling peekc easier. | 8 years ago
Reece H. Dunn | 3f692f498b | encoding.c: Implement a peekc API. | 8 years ago
Reece H. Dunn | 1c8ed9c190 | tokenizer.c: Support mac newlines. | 8 years ago
Reece H. Dunn | 7602c9ac18 | tokenizer.c: Support linux newlines. | 8 years ago
Reece H. Dunn | bce44316bb | Create a basic tokenizer API using a structure that mirrors the TtsTokenizer interface in the tts-dev-studio project. | 8 years ago
Reece H. Dunn | 5c6bc0e556 | Armenian emphasis mark (U+055B) is used for interjections, so treat it as an exclamation mark. | 8 years ago
Reece H. Dunn | 1c4ce3dcd3 | tokenizer.c: create and use a clause_type_from_codepoint function, with tests. | 8 years ago
Reece H. Dunn | 691457e98d | Add Prepended_Concatenation_Mark support from PropList.txt. | 8 years ago
Reece H. Dunn | 4ce8b61180 | Extend ucd_property to 64-bits to allow all properties to be specified. | 8 years ago
Reece H. Dunn | a9aabc6242 | Add tests for the PropList API. | 8 years ago
Reece H. Dunn | 9dabf64680 | encoding.c: Support determining the string length for length < 0. | 8 years ago
Reece H. Dunn | b5ed1f28a5 | encoding.c: Don't crash if NULL is passed as the string to the decode APIs. | 8 years ago
Reece H. Dunn | d167d5649b | encoding.c: Implement support for the auto-detected character set (utf-8 + codepoint-encoding). | 8 years ago
Reece H. Dunn | 6a0b5e4ae1 | encoding.c: Support using wchar_t strings with the text decoder API. | 8 years ago
Reece H. Dunn | b74f756f00 | encoding.c: Support the ISO-10646-UCS-2 encoding. | 8 years ago
Reece H. Dunn | fa5d31a8af | encoding.c: Support the UTF-8 encoding. | 8 years ago
Reece H. Dunn | 2499610433 | encoding.c: Support the ISCII encoding. | 8 years ago
Reece H. Dunn | 39f3ea54cf | encoding.c: Support the KOI8-R encoding. | 8 years ago
Reece H. Dunn | b8a1006dd8 | encoding.c: Support the ISO 8859-16 encoding. | 8 years ago
Reece H. Dunn | 166e815723 | encoding.c: Support the ISO 8859-15 encoding. | 8 years ago
Reece H. Dunn | 91e054ec7c | encoding.c: Fix the ISO 8859 encoding names with date suffices. | 8 years ago
Reece H. Dunn | 0235c42652 | encoding.c: Support the ISO 8859-14 encoding. | 8 years ago
Reece H. Dunn | 24faceab57 | encoding.c: Support the ISO 8859-13 encoding. | 8 years ago
Reece H. Dunn | 495c0aed20 | encoding.c: Support the ISO 8859-11 encoding. | 8 years ago
Reece H. Dunn | 84f20f8bb8 | encoding.c: Support the ISO 8859-10 encoding. | 8 years ago
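
The newline-related tokenizer.c commits in this log (linux, mac and windows newlines, NEL, FORM FEED, line separator, Zp, VERTICAL TAB) together describe a small classification rule. The sketch below is one way that rule could look, not the code from tokenizer.c. Every name in it (`token_kind`, `classify_break`) is hypothetical, it works on an already-decoded array of codepoints rather than going through the peekc API, and treating TAB and SPACE as whitespace is an assumption, since the log only mentions VERTICAL TAB by name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical token kinds; the real tokenizer.h may use other names. */
typedef enum {
    TOK_OTHER,
    TOK_WHITESPACE,
    TOK_NEWLINE,
    TOK_PARAGRAPH
} token_kind;

/*
 * Classify the codepoint at cps[i] and report in *consumed how many
 * codepoints belong to it: 2 for a CR LF pair, 0 at or past the end
 * of the buffer, otherwise 1.
 */
token_kind classify_break(const uint32_t *cps, size_t len, size_t i,
                          size_t *consumed)
{
    assert(cps != NULL && consumed != NULL);

    if (i >= len) {
        *consumed = 0; /* nothing left to read */
        return TOK_OTHER;
    }

    *consumed = 1;
    switch (cps[i]) {
    case 0x000D: /* CR: mac newline, or the start of a windows CR LF */
        if (i + 1 < len && cps[i + 1] == 0x000A)
            *consumed = 2; /* look ahead one codepoint, bounds-checked */
        return TOK_NEWLINE;
    case 0x000A: /* LF: linux newline */
    case 0x000C: /* FORM FEED */
    case 0x0085: /* NEL */
    case 0x2028: /* LINE SEPARATOR, the only Zl codepoint */
        return TOK_NEWLINE;
    case 0x2029: /* PARAGRAPH SEPARATOR, the only Zp codepoint */
        return TOK_PARAGRAPH;
    case 0x000B: /* VERTICAL TAB: whitespace, not a newline */
    case 0x0009: /* TAB (assumed whitespace) */
    case 0x0020: /* SPACE (assumed whitespace) */
        return TOK_WHITESPACE;
    default:
        return TOK_OTHER;
    }
}
```

The bounds check before reading `cps[i + 1]` plays the role that a peekc call tolerating reads past the end of the buffer would play in a stream-based design: the caller can ask about the next codepoint after a trailing CR without special-casing the end of input.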