Reece H. Dunn | 48ca2239bb | Fix non-Latin character languages falling back to English when reading Latin characters. | 8 years ago
Reece H. Dunn | d674529b9c | tests/languages.test: Rename the test function to test_lang, to avoid a conflict with the test command. | 8 years ago
Reece H. Dunn | 5030ff95cc | automake: don't make the tests print 'done', have the make rule print 'PASSED' instead. | 8 years ago
Reece H. Dunn | 8821c9e361 | Split out the readclause tests so the tokenizer tests can use public-only APIs. | 8 years ago
Reece H. Dunn | d72557aed2 | Add simple tests for testing voice selection by name. | 8 years ago
Reece H. Dunn | 6ec3e85007 | Add language.tests for de, en and jp to test the phoneme generation. | 8 years ago
Reece H. Dunn | dd90d3812d | tokenizer.c: Support general symbol tokens. | 8 years ago
Reece H. Dunn | 786575c6ed | tokenizer.c: Support general punctuation tokens. | 8 years ago
Reece H. Dunn | 683579f403 | Make the tokenizer.h API public. | 8 years ago
Reece H. Dunn | 9af96da469 | Make the encoding.h API public. | 8 years ago
Reece H. Dunn | 55bfbb4754 | tokenizer.c: Support ellipsis tokens. | 8 years ago
Reece H. Dunn | b847df63b5 | tokenizer.c: Support semicolon tokens. | 8 years ago
Reece H. Dunn | af7e8fc5a3 | tokenizer.c: Support colon tokens. | 8 years ago
Reece H. Dunn | 7560070dcd | tokenizer.c: Support comma tokens. | 8 years ago
Reece H. Dunn | c9199cfacb | tokenizer.c: Support exclamation mark tokens. | 8 years ago
Reece H. Dunn | 128ceaff6a | tokenizer.c: Support question mark tokens. | 8 years ago
Reece H. Dunn | 8f62e18324 | tokenizer.c: Support full stop tokens. | 8 years ago
Reece H. Dunn | d50f3f2fa5 | tokenizer.c: Support word tokens. | 8 years ago
Reece H. Dunn | a902f451d8 | tests/tokenizer.test: Support printing the tokens from a provided file, making it easy to investigate tokenizer issues. | 8 years ago
Reece H. Dunn | d093513b65 | tokenizer.c: Add an options parameter to the tokenizer_reset API. | 8 years ago
Reece H. Dunn | c41ac642fa | tokenizer.c: Tokenise Zp codepoints as paragraphs. | 8 years ago
Reece H. Dunn | f3ea6f68f3 | tokenizer.c: Tokenise U+000B [VERTICAL TAB (VT)] as whitespace, not as newlines. | 8 years ago
Reece H. Dunn | fc7a4e6701 | tokenizer.c: Recognise U+000C [FORM FEED (FF)] as a newline codepoint. | 8 years ago
Reece H. Dunn | d2d718d700 | tokenizer.c: Tokenize line separator codepoints as newline tokens. | 8 years ago
Reece H. Dunn | bf45e7ce36 | tokenizer.c: Recognise U+0085 [NEW LINE (NEL)] as a newline codepoint. | 8 years ago
Reece H. Dunn | df6ca7a22c | tokenizer.c: Support whitespace tokens. | 8 years ago
Reece H. Dunn | 8f0dae6a38 | tokenizer.c: Support windows newlines. | 8 years ago
Reece H. Dunn | b897ff5aa8 | encoding.c: Support calling peekc past the end of the buffer. This makes calling peekc easier. | 8 years ago
Reece H. Dunn | 3f692f498b | encoding.c: Implement a peekc API. | 8 years ago
Reece H. Dunn | 1c8ed9c190 | tokenizer.c: Support mac newlines. | 8 years ago
Reece H. Dunn | 7602c9ac18 | tokenizer.c: Support linux newlines. | 8 years ago
Reece H. Dunn | bce44316bb | Create a basic tokenizer API using a structure that mirrors the TtsTokenizer interface in the tts-dev-studio project. | 8 years ago
Reece H. Dunn | 5c6bc0e556 | Armenian emphasis mark (U+055B) is used for interjections, so treat it as an exclamation mark. | 8 years ago
Reece H. Dunn | 1c4ce3dcd3 | tokenizer.c: create and use a clause_type_from_codepoint function, with tests. | 8 years ago
Reece H. Dunn | 691457e98d | Add Prepended_Concatenation_Mark support from PropList.txt. | 8 years ago
Reece H. Dunn | 4ce8b61180 | Extend ucd_property to 64-bits to allow all properties to be specified. | 8 years ago
Reece H. Dunn | a9aabc6242 | Add tests for the PropList API. | 8 years ago
Reece H. Dunn | 9dabf64680 | encoding.c: Support determining the string length for length < 0. | 8 years ago
Reece H. Dunn | b5ed1f28a5 | encoding.c: Don't crash if NULL is passed as the string to the decode APIs. | 8 years ago
Reece H. Dunn | d167d5649b | encoding.c: Implement support for the auto-detected character set (utf-8 + codepoint-encoding). | 8 years ago
Reece H. Dunn | 6a0b5e4ae1 | encoding.c: Support using wchar_t strings with the text decoder API. | 8 years ago
Reece H. Dunn | b74f756f00 | encoding.c: Support the ISO-10646-UCS-2 encoding. | 8 years ago
Reece H. Dunn | fa5d31a8af | encoding.c: Support the UTF-8 encoding. | 8 years ago
Reece H. Dunn | 2499610433 | encoding.c: Support the ISCII encoding. | 8 years ago
Reece H. Dunn | 39f3ea54cf | encoding.c: Support the KOI8-R encoding. | 8 years ago
Reece H. Dunn | b8a1006dd8 | encoding.c: Support the ISO 8859-16 encoding. | 8 years ago
Reece H. Dunn | 166e815723 | encoding.c: Support the ISO 8859-15 encoding. | 8 years ago
Reece H. Dunn | 91e054ec7c | encoding.c: Fix the ISO 8859 encoding names with date suffices. | 8 years ago
Reece H. Dunn | 0235c42652 | encoding.c: Support the ISO 8859-14 encoding. | 8 years ago
Reece H. Dunn | 24faceab57 | encoding.c: Support the ISO 8859-13 encoding. | 8 years ago