Reece H. Dunn
7b8fa3660d
Install the encoding.h and tokenizer.h header files.
8 years ago
Reece H. Dunn
dd90d3812d
tokenizer.c: Support general symbol tokens.
8 years ago
Reece H. Dunn
786575c6ed
tokenizer.c: Support general punctuation tokens.
8 years ago
Reece H. Dunn
0705844bf8
tokenizer.c: Move general category classification that does not override property behaviour to the end, for generic classification.
8 years ago
Reece H. Dunn
683579f403
Make the tokenizer.h API public.
8 years ago
Reece H. Dunn
9af96da469
Make the encoding.h API public.
8 years ago
Reece H. Dunn
55bfbb4754
tokenizer.c: Support ellipsis tokens.
8 years ago
Reece H. Dunn
706e780ff4
Merge remote-tracking branch 'pettarin/master'
8 years ago
Alberto Pettarin
3b4487e8a7
Updated directions to compile JS with emscripten
8 years ago
Alberto Pettarin
123309a07b
Added git ignore for emscripten files in UCD tools
8 years ago
Reece H. Dunn
b847df63b5
tokenizer.c: Support semicolon tokens.
8 years ago
Alberto Pettarin
6ce74efeca
Fixed selection of default voice in JS demo
8 years ago
Reece H. Dunn
af7e8fc5a3
tokenizer.c: Support colon tokens.
8 years ago
Reece H. Dunn
7560070dcd
tokenizer.c: Support comma tokens.
8 years ago
Reece H. Dunn
c9199cfacb
tokenizer.c: Support exclamation mark tokens.
8 years ago
Reece H. Dunn
128ceaff6a
tokenizer.c: Support question mark tokens.
8 years ago
Reece H. Dunn
8f62e18324
tokenizer.c: Support full stop tokens.
8 years ago
Reece H. Dunn
0bbc9e9730
Merge remote-tracking branch 'Christianlm/master'
8 years ago
chrislm
5d8bb74169
IT: new improvements tested in April 2017
Reduced length to 160 for unstressed syllables
Added some exceptions to the Italian dictionaries
8 years ago
Reece H. Dunn
d50f3f2fa5
tokenizer.c: Support word tokens.
8 years ago
Reece H. Dunn
a902f451d8
tests/tokenizer.test: Support printing the tokens from a provided file, making it easy to investigate tokenizer issues.
8 years ago
Reece H. Dunn
d093513b65
tokenizer.c: Add an options parameter to the tokenizer_reset API.
8 years ago
Reece H. Dunn
c41ac642fa
tokenizer.c: Tokenise Zp codepoints as paragraphs.
8 years ago
Reece H. Dunn
f3ea6f68f3
tokenizer.c: Tokenise U+000B [VERTICAL TAB (VT)] as whitespace, not as newlines.
8 years ago
Reece H. Dunn
fc7a4e6701
tokenizer.c: Recognise U+000C [FORM FEED (FF)] as a newline codepoint.
8 years ago
Reece H. Dunn
d2d718d700
tokenizer.c: Tokenize line separator codepoints as newline tokens.
8 years ago
Reece H. Dunn
bf45e7ce36
tokenizer.c: Recognise U+0085 [NEW LINE (NEL)] as a newline codepoint.
8 years ago
Reece H. Dunn
df6ca7a22c
tokenizer.c: Support whitespace tokens.
8 years ago
Reece H. Dunn
539edac795
tokenizer.c: Create a codepoint_type helper function to classify codepoints for the tokenizer.
8 years ago
Reece H. Dunn
7b1243679f
Update the document for the new fr-CH French accent.
8 years ago
Reece H. Dunn
4bc3f15e79
Generalize the exclusion of Windows batch files.
8 years ago
claude beazley
c05e3898a4
Adding Swiss French Variant
Creating the language variant Swiss French, primarily for counting,
as Swiss French uses huitante for 80 and, like the Belgians, septante
and nonante for 70 and 90.
8 years ago
Reece H. Dunn
8f0dae6a38
tokenizer.c: Support windows newlines.
8 years ago
Reece H. Dunn
b897ff5aa8
encoding.c: Support calling peekc past the end of the buffer. This makes calling peekc easier.
8 years ago
Reece H. Dunn
3f692f498b
encoding.c: Implement a peekc API.
8 years ago
Reece H. Dunn
1c8ed9c190
tokenizer.c: Support mac newlines.
8 years ago
Reece H. Dunn
7602c9ac18
tokenizer.c: Support linux newlines.
8 years ago
Reece H. Dunn
bce44316bb
Create a basic tokenizer API using a structure that mirrors the TtsTokenizer interface in the tts-dev-studio project.
8 years ago
Reece H. Dunn
3cc53d98f4
Add ucd.h to tokenizer.c to provide the definition of the ucd_category identifier for the emscripten build.
8 years ago
Reece H. Dunn
ee61cc4358
Fix running 'make clean' when gradle is not present. Gradle is used for the Android build and is not needed when just building eSpeak NG on Linux/BSD systems.
8 years ago
Reece H. Dunn
a72199f714
Run the tests as part of the Travis build.
8 years ago
Reece H. Dunn
61d668c0cb
ucd-tools: Inverted_Terminal_Punctuation eSpeakNG extended property support; use in clause_type_from_codepoint.
8 years ago
Reece H. Dunn
5c6bc0e556
Armenian emphasis mark (U+055B) is used for interjections, so treat it as an exclamation mark.
8 years ago
Reece H. Dunn
bc13173ac4
ucd-tools: Punctuation_In_Word eSpeakNG extended property support; use in clause_type_from_codepoint.
8 years ago
Reece H. Dunn
1131d0924b
ucd-tools: Optional_Space_After eSpeakNG extended property support; use in clause_type_from_codepoint.
8 years ago
Reece H. Dunn
b932f3c493
ucd-tools: Extended_Dash eSpeakNG extended property support; use in clause_type_from_codepoint.
8 years ago
Reece H. Dunn
3100ca9d1b
Use ucd_properties to implement clause_type_from_codepoint for supported types.
8 years ago
Reece H. Dunn
be86091088
Use #defines for the ESPEAKNG_PROPERTY_ constants, so they can be used in things like switch expressions.
8 years ago
Reece H. Dunn
af88820954
Merge commit 'd18d98b92ca42bf7b098d3d1fd873d9b0ee82e00'
8 years ago
Reece H. Dunn
d18d98b92c
Use #defines for the UCD_PROPERTY_ constants, so they can be used in things like switch expressions.
8 years ago