Reece H. Dunn
85801fc1e3
Remove the now unused dictdialect functionality from the code.
8 years ago
Reece H. Dunn
2f8f125c68
Remove voice/language support for alphabet2.
This is not used by any of the espeak-ng voices and languages.
Additionally, this functionality would be superseded by support
for specifying the language used by different scripts in the
language argument on the command line.
8 years ago
Reece H. Dunn
dd90d3812d
tokenizer.c: Support general symbol tokens.
8 years ago
Reece H. Dunn
786575c6ed
tokenizer.c: Support general punctuation tokens.
8 years ago
Reece H. Dunn
0705844bf8
tokenizer.c: Move general category classification that does not override property behaviour to the end, for generic classification.
8 years ago
Reece H. Dunn
683579f403
Make the tokenizer.h API public.
8 years ago
Reece H. Dunn
9af96da469
Make the encoding.h API public.
8 years ago
Reece H. Dunn
55bfbb4754
tokenizer.c: Support ellipsis tokens.
8 years ago
Reece H. Dunn
b847df63b5
tokenizer.c: Support semicolon tokens.
8 years ago
Reece H. Dunn
af7e8fc5a3
tokenizer.c: Support colon tokens.
8 years ago
Reece H. Dunn
7560070dcd
tokenizer.c: Support comma tokens.
8 years ago
Reece H. Dunn
c9199cfacb
tokenizer.c: Support exclamation mark tokens.
8 years ago
Reece H. Dunn
128ceaff6a
tokenizer.c: Support question mark tokens.
8 years ago
Reece H. Dunn
8f62e18324
tokenizer.c: Support full stop tokens.
8 years ago
chrislm
5d8bb74169
IT: new improvements tested in April 2017
Reduced length to 160 for unstressed syllables
Added some exceptions to the Italian dictionaries
8 years ago
Reece H. Dunn
d50f3f2fa5
tokenizer.c: Support word tokens.
8 years ago
Reece H. Dunn
d093513b65
tokenizer.c: Add an options parameter to the tokenizer_reset API.
8 years ago
Reece H. Dunn
c41ac642fa
tokenizer.c: Tokenise Zp codepoints as paragraphs.
8 years ago
Reece H. Dunn
fc7a4e6701
tokenizer.c: Recognise U+000C [FORM FEED (FF)] as a newline codepoint.
8 years ago
Reece H. Dunn
d2d718d700
tokenizer.c: Tokenize line separator codepoints as newline tokens.
8 years ago
Reece H. Dunn
bf45e7ce36
tokenizer.c: Recognise U+0085 [NEW LINE (NEL)] as a newline codepoint.
8 years ago
Reece H. Dunn
df6ca7a22c
tokenizer.c: Support whitespace tokens.
8 years ago
Reece H. Dunn
539edac795
tokenizer.c: Create a codepoint_type helper function to classify codepoints for the tokenizer.
8 years ago
Reece H. Dunn
8f0dae6a38
tokenizer.c: Support Windows newlines.
8 years ago
Reece H. Dunn
b897ff5aa8
encoding.c: Support calling peekc past the end of the buffer. This makes calling peekc easier.
8 years ago
Reece H. Dunn
3f692f498b
encoding.c: Implement a peekc API.
8 years ago
Reece H. Dunn
1c8ed9c190
tokenizer.c: Support Mac newlines.
8 years ago
Reece H. Dunn
7602c9ac18
tokenizer.c: Support Linux newlines.
8 years ago
Reece H. Dunn
bce44316bb
Create a basic tokenizer API using a structure that mirrors the TtsTokenizer interface in the tts-dev-studio project.
8 years ago
Reece H. Dunn
3cc53d98f4
Add ucd.h to tokenizer.c to provide the definition of the ucd_category identifier for the emscripten build.
8 years ago
Reece H. Dunn
61d668c0cb
ucd-tools: Inverted_Terminal_Punctuation eSpeakNG extended property support; use in clause_type_from_codepoint.
8 years ago
Reece H. Dunn
5c6bc0e556
Armenian emphasis mark (U+055B) is used for interjections, so treat it as an exclamation mark.
8 years ago
Reece H. Dunn
bc13173ac4
ucd-tools: Punctuation_In_Word eSpeakNG extended property support; use in clause_type_from_codepoint.
8 years ago
Reece H. Dunn
1131d0924b
ucd-tools: Optional_Space_After eSpeakNG extended property support; use in clause_type_from_codepoint.
8 years ago
Reece H. Dunn
b932f3c493
ucd-tools: Extended_Dash eSpeakNG extended property support; use in clause_type_from_codepoint.
8 years ago
Reece H. Dunn
3100ca9d1b
Use ucd_properties to implement clause_type_from_codepoint for supported types.
8 years ago
Reece H. Dunn
1c4ce3dcd3
tokenizer.c: Create and use a clause_type_from_codepoint function, with tests.
8 years ago
Reece H. Dunn
92f703d98b
Use defines instead of hard-coded numbers for more clause logic.
8 years ago
Reece H. Dunn
8749891069
Better specify the CLAUSE_ flags returned by ReadClause.
8 years ago
Reece H. Dunn
e4e1e4db0a
TranslateWord: remove the unused add_plural_suffix variable.
8 years ago
Reece H. Dunn
62d4aff9a9
Remove the now unused option_multibyte variable.
8 years ago
Reece H. Dunn
ec8a7b810f
Use the text decoder object at the top-level Synthesize/espeak_TextToPhonemes call, not in TranslateClause.
8 years ago
Reece H. Dunn
b3e0fbc8ed
encoding.c: Create a text_decoder_decode_string_multibyte helper to work with the espeakCHARS_* flags.
8 years ago
Reece H. Dunn
9dabf64680
encoding.c: Support determining the string length for length < 0.
8 years ago
Reece H. Dunn
b5ed1f28a5
encoding.c: Don't crash if NULL is passed as the string to the decode APIs.
8 years ago
Reece H. Dunn
d167d5649b
encoding.c: Implement support for the auto-detected character set (utf-8 + codepoint-encoding).
8 years ago
Reece H. Dunn
be480c12de
Make TranslateClause return 'const void *' to preserve constness.
8 years ago
Reece H. Dunn
6451917bde
encoding.c: Fix text_decoder_get_buffer at EOF.
8 years ago
Reece H. Dunn
7c16ac543c
Use the text decoder API in readclause.c.
8 years ago
Reece H. Dunn
8933185de4
Remove the unused f_in argument to the Read/Translate/SpeakNextClause functions.
8 years ago