Reece H. Dunn
94376c2d5f
Use an enum and named values for the steps in compile_line to make the logic easier to read.
8 years ago
Reece H. Dunn
efd1df3206
Build a test version of libespeak-ng that exposes the internal APIs.
8 years ago
Reece H. Dunn
85801fc1e3
Remove the now unused dictdialect functionality from the code.
8 years ago
Reece H. Dunn
2f8f125c68
Remove voice/language support for alphabet2.
This is not used by any of the espeak-ng voices and languages.
Additionally, this functionality would be superseded by support
for specifying the language used by different scripts in the
language argument on the command line.
8 years ago
Reece H. Dunn
dd90d3812d
tokenizer.c: Support general symbol tokens.
8 years ago
Reece H. Dunn
786575c6ed
tokenizer.c: Support general punctuation tokens.
8 years ago
Reece H. Dunn
0705844bf8
tokenizer.c: Move general category classification that does not override property behaviour to the end, for generic classification.
8 years ago
Reece H. Dunn
683579f403
Make the tokenizer.h API public.
8 years ago
Reece H. Dunn
9af96da469
Make the encoding.h API public.
8 years ago
Reece H. Dunn
55bfbb4754
tokenizer.c: Support ellipsis tokens.
8 years ago
Alberto Pettarin
123309a07b
Added git ignore for emscripten in UCD tools
8 years ago
Reece H. Dunn
b847df63b5
tokenizer.c: Support semicolon tokens.
8 years ago
Reece H. Dunn
af7e8fc5a3
tokenizer.c: Support colon tokens.
8 years ago
Reece H. Dunn
7560070dcd
tokenizer.c: Support comma tokens.
8 years ago
Reece H. Dunn
c9199cfacb
tokenizer.c: Support exclamation mark tokens.
8 years ago
Reece H. Dunn
128ceaff6a
tokenizer.c: Support question mark tokens.
8 years ago
Reece H. Dunn
8f62e18324
tokenizer.c: Support full stop tokens.
8 years ago
chrislm
5d8bb74169
IT: new improvements tested in April 2017
Reduced length to 160 for unstressed syllables
Added some exceptions to the Italian dictionaries
8 years ago
Reece H. Dunn
d50f3f2fa5
tokenizer.c: Support word tokens.
8 years ago
Reece H. Dunn
d093513b65
tokenizer.c: Add an options parameter to the tokenizer_reset API.
8 years ago
Reece H. Dunn
c41ac642fa
tokenizer.c: Tokenise Zp codepoints as paragraphs.
8 years ago
Reece H. Dunn
fc7a4e6701
tokenizer.c: Recognise U+000C [FORM FEED (FF)] as a newline codepoint.
8 years ago
Reece H. Dunn
d2d718d700
tokenizer.c: Tokenize line separator codepoints as newline tokens.
8 years ago
Reece H. Dunn
bf45e7ce36
tokenizer.c: Recognise U+0085 [NEW LINE (NEL)] as a newline codepoint.
8 years ago
Reece H. Dunn
df6ca7a22c
tokenizer.c: Support whitespace tokens.
8 years ago
Reece H. Dunn
539edac795
tokenizer.c: Create a codepoint_type helper function to classify codepoints for the tokenizer.
8 years ago
Reece H. Dunn
8f0dae6a38
tokenizer.c: Support windows newlines.
8 years ago
Reece H. Dunn
b897ff5aa8
encoding.c: Support calling peekc past the end of the buffer. This makes calling peekc easier.
8 years ago
Reece H. Dunn
3f692f498b
encoding.c: Implement a peekc API.
8 years ago
Reece H. Dunn
1c8ed9c190
tokenizer.c: Support mac newlines.
8 years ago
Reece H. Dunn
7602c9ac18
tokenizer.c: Support linux newlines.
8 years ago
Reece H. Dunn
bce44316bb
Create a basic tokenizer API using a structure that mirrors the TtsTokenizer interface in the tts-dev-studio project.
8 years ago
Reece H. Dunn
3cc53d98f4
Add ucd.h to tokenizer.c to provide the definition of the ucd_category identifier for the emscripten build.
8 years ago
Reece H. Dunn
61d668c0cb
ucd-tools: Inverted_Terminal_Punctuation eSpeakNG extended property support; use in clause_type_from_codepoint.
8 years ago
Reece H. Dunn
5c6bc0e556
Armenian emphasis mark (U+055B) is used for interjections, so treat it as an exclamation mark.
8 years ago
Reece H. Dunn
bc13173ac4
ucd-tools: Punctuation_In_Word eSpeakNG extended property support; use in clause_type_from_codepoint.
8 years ago
Reece H. Dunn
1131d0924b
ucd-tools: Optional_Space_After eSpeakNG extended property support; use in clause_type_from_codepoint.
8 years ago
Reece H. Dunn
b932f3c493
ucd-tools: Extended_Dash eSpeakNG extended property support; use in clause_type_from_codepoint.
8 years ago
Reece H. Dunn
3100ca9d1b
Use ucd_properties to implement clause_type_from_codepoint for supported types.
8 years ago
Reece H. Dunn
be86091088
Use #defines for the ESPEAKNG_PROPERTY_ constants, so they can be used in things like switch expressions.
8 years ago
Reece H. Dunn
d18d98b92c
Use #defines for the UCD_PROPERTY_ constants, so they can be used in things like switch expressions.
8 years ago
Reece H. Dunn
1c4ce3dcd3
tokenizer.c: create and use a clause_type_from_codepoint function, with tests.
8 years ago
Reece H. Dunn
92f703d98b
Use defines instead of hard-coded numbers for more clause logic.
8 years ago
Reece H. Dunn
8749891069
Better specify the CLAUSE_ flags returned by ReadClause.
8 years ago
Reece H. Dunn
2e375362c4
ucd-tools: Paragraph_Separator eSpeakNG extended property support.
8 years ago
Reece H. Dunn
7fc4f5ece2
ucd-tools: Ellipsis eSpeakNG extended property support.
8 years ago
Reece H. Dunn
8d8c8b3b56
ucd-tools: Semi_Colon eSpeakNG extended property support.
8 years ago
Reece H. Dunn
9869ee051e
ucd-tools: Colon eSpeakNG extended property support.
8 years ago
Reece H. Dunn
9ef03b8ac8
ucd-tools: Comma eSpeakNG extended property support.
8 years ago
Reece H. Dunn
5017153d62
ucd-tools: Exclamation_Mark eSpeakNG extended property support.
8 years ago