HomoFast-eSpeak-Persian

Author	SHA1	Message	Date
Reece H. Dunn	dd90d3812d	tokenizer.c: Support general symbol tokens.	8 years ago
Reece H. Dunn	786575c6ed	tokenizer.c: Support general punctuation tokens.	8 years ago
Reece H. Dunn	0705844bf8	tokenizer.c: Move general category classification that does not override property behaviour to the end, for generic classification.	8 years ago
Reece H. Dunn	683579f403	Make the tokenizer.h API public.	8 years ago
Reece H. Dunn	9af96da469	Make the encoding.h API public.	8 years ago
Reece H. Dunn	55bfbb4754	tokenizer.c: Support ellipsis tokens.	8 years ago
Reece H. Dunn	b847df63b5	tokenizer.c: Support semicolon tokens.	8 years ago
Reece H. Dunn	af7e8fc5a3	tokenizer.c: Support colon tokens.	8 years ago
Reece H. Dunn	7560070dcd	tokenizer.c: Support comma tokens.	8 years ago
Reece H. Dunn	c9199cfacb	tokenizer.c: Support exclamation mark tokens.	8 years ago
Reece H. Dunn	128ceaff6a	tokenizer.c: Support question mark tokens.	8 years ago
Reece H. Dunn	8f62e18324	tokenizer.c: Support full stop tokens.	8 years ago
Reece H. Dunn	d50f3f2fa5	tokenizer.c: Support word tokens.	8 years ago
Reece H. Dunn	d093513b65	tokenizer.c: Add an options parameter to the tokenizer_reset API.	8 years ago
Reece H. Dunn	c41ac642fa	tokenizer.c: Tokenise Zp codepoints as paragraphs.	8 years ago
Reece H. Dunn	fc7a4e6701	tokenizer.c: Recognise U+000C [FORM FEED (FF)] as a newline codepoint.	8 years ago
Reece H. Dunn	d2d718d700	tokenizer.c: Tokenize line separator codepoints as newline tokens.	8 years ago
Reece H. Dunn	bf45e7ce36	tokenizer.c: Recognise U+0085 [NEW LINE (NEL)] as a newline codepoint.	8 years ago
Reece H. Dunn	df6ca7a22c	tokenizer.c: Support whitespace tokens.	8 years ago
Reece H. Dunn	539edac795	tokenizer.c: Create a codepoint_type helper function to classify codepoints for the tokenizer.	8 years ago
Reece H. Dunn	8f0dae6a38	tokenizer.c: Support windows newlines.	8 years ago
Reece H. Dunn	1c8ed9c190	tokenizer.c: Support mac newlines.	8 years ago
Reece H. Dunn	7602c9ac18	tokenizer.c: Support linux newlines.	8 years ago
Reece H. Dunn	bce44316bb	Create a basic tokenizer API using a structure that mirrors the TtsTokenizer interface in the tts-dev-studio project.	8 years ago
Reece H. Dunn	3cc53d98f4	Add ucd.h to tokenizer.c to provide the definition of the ucd_category identifier for the emscripten build.	8 years ago
Reece H. Dunn	61d668c0cb	ucd-tools: Inverted_Terminal_Punctuation eSpeakNG extended property support; use in clause_type_from_codepoint.	8 years ago
Reece H. Dunn	5c6bc0e556	Armenian emphasis mark (U+055B) is used for interjections, so treat it as an exclamation mark.	8 years ago
Reece H. Dunn	bc13173ac4	ucd-tools: Punctuation_In_Word eSpeakNG extended property support; use in clause_type_from_codepoint.	8 years ago
Reece H. Dunn	1131d0924b	ucd-tools: Optional_Space_After eSpeakNG extended property support; use in clause_type_from_codepoint.	8 years ago
Reece H. Dunn	b932f3c493	ucd-tools: Extended_Dash eSpeakNG extended property support; use in clause_type_from_codepoint.	8 years ago
Reece H. Dunn	3100ca9d1b	Use ucd_properties to implement clause_type_from_codepoint for supported types.	8 years ago
Reece H. Dunn	1c4ce3dcd3	tokenizer.c: create and use a clause_type_from_codepoint function, with tests.	8 years ago

32 Commits (dd90d3812d1f4b9dadfa2cdc9462d4a59ecc472f)