HomoFast-eSpeak-Persian

Tree: 128ceaff6a

Author	SHA1	Message	Date
Reece H. Dunn	128ceaff6a	tokenizer.c: Support question mark tokens.	8 years ago
Reece H. Dunn	8f62e18324	tokenizer.c: Support full stop tokens.	8 years ago
Reece H. Dunn	d50f3f2fa5	tokenizer.c: Support word tokens.	8 years ago
Reece H. Dunn	a902f451d8	tests/tokenizer.test: Support printing the tokens from a provided file, making it easy to investigate tokenizer issues.	8 years ago
Reece H. Dunn	d093513b65	tokenizer.c: Add an options parameter to the tokenizer_reset API.	8 years ago
Reece H. Dunn	c41ac642fa	tokenizer.c: Tokenise Zp codepoints as paragraphs.	8 years ago
Reece H. Dunn	f3ea6f68f3	tokenizer.c: Tokenise U+000B [VERTICAL TAB (VT)] as whitespace, not as newlines.	8 years ago
Reece H. Dunn	fc7a4e6701	tokenizer.c: Recognise U+000C [FORM FEED (FF)] as a newline codepoint.	8 years ago
Reece H. Dunn	d2d718d700	tokenizer.c: Tokenize line separator codepoints as newline tokens.	8 years ago
Reece H. Dunn	bf45e7ce36	tokenizer.c: Recognise U+0085 [NEW LINE (NEL)] as a newline codepoint.	8 years ago
Reece H. Dunn	df6ca7a22c	tokenizer.c: Support whitespace tokens.	8 years ago
Reece H. Dunn	8f0dae6a38	tokenizer.c: Support windows newlines.	8 years ago
Reece H. Dunn	1c8ed9c190	tokenizer.c: Support mac newlines.	8 years ago
Reece H. Dunn	7602c9ac18	tokenizer.c: Support linux newlines.	8 years ago
Reece H. Dunn	bce44316bb	Create a basic tokenizer API using a structure that mirrors the TtsTokenizer interface in the tts-dev-studio project.	8 years ago
Reece H. Dunn	5c6bc0e556	Armenian emphasis mark (U+055B) is used for interjections, so treat it as an exclamation mark.	8 years ago
Reece H. Dunn	1c4ce3dcd3	tokenizer.c: create and use a clause_type_from_codepoint function, with tests.	8 years ago

17 Commits (128ceaff6a0177a358184364dde9e7e9c16c06a3)
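
The commit messages above describe how individual codepoints are meant to be classified: LF, CR, FF, NEL and the line separator count as newlines, VT counts as whitespace, and Zp codepoints count as paragraphs. The following is a minimal sketch of that classification, not the code in tokenizer.c. All names prefixed with `sketch_` are hypothetical. The hard-coded switch is an assumption; the real tokenizer may use Unicode category lookups instead. Treating TAB and SPACE as whitespace is also an assumption that the log does not state.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical token categories; not the names used in tokenizer.c. */
typedef enum {
    SKETCH_TOKEN_OTHER,
    SKETCH_TOKEN_NEWLINE,
    SKETCH_TOKEN_PARAGRAPH,
    SKETCH_TOKEN_WHITESPACE,
    SKETCH_TOKEN_FULL_STOP,
    SKETCH_TOKEN_QUESTION_MARK
} sketch_token_type;

/* Classify a single codepoint following the rules listed in the log. */
sketch_token_type sketch_classify(uint32_t cp)
{
    switch (cp) {
    case 0x000A: /* LINE FEED: "linux newlines" */
    case 0x000D: /* CARRIAGE RETURN: "mac newlines" */
    case 0x000C: /* FORM FEED */
    case 0x0085: /* NEW LINE (NEL) */
    case 0x2028: /* LINE SEPARATOR, the only Zl codepoint */
        return SKETCH_TOKEN_NEWLINE;
    case 0x2029: /* PARAGRAPH SEPARATOR, the only Zp codepoint */
        return SKETCH_TOKEN_PARAGRAPH;
    case 0x0009: /* TAB (assumed whitespace) */
    case 0x000B: /* VERTICAL TAB: whitespace, not a newline */
    case 0x0020: /* SPACE (assumed whitespace) */
        return SKETCH_TOKEN_WHITESPACE;
    case 0x002E: /* FULL STOP */
        return SKETCH_TOKEN_FULL_STOP;
    case 0x003F: /* QUESTION MARK */
        return SKETCH_TOKEN_QUESTION_MARK;
    default:
        return SKETCH_TOKEN_OTHER;
    }
}

/* Number of codepoints consumed by a newline starting at cps[0], or 0 if
 * cps[0] is not a newline. A CR LF pair ("windows newlines") counts as a
 * single newline spanning two codepoints. */
size_t sketch_newline_length(const uint32_t *cps, size_t count)
{
    if (count == 0 || sketch_classify(cps[0]) != SKETCH_TOKEN_NEWLINE)
        return 0;
    if (cps[0] == 0x000D && count > 1 && cps[1] == 0x000A)
        return 2;
    return 1;
}
```

A caller would advance its read position by the value returned from `sketch_newline_length`, so that a CR LF sequence yields one newline token rather than two.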