Unicode Character Database Tools

Data Files
- Unicode Character Database
- ConScript Unicode Registry
C Library
Build Dependencies
- Debian
Building
Updating the UCD Data
Bugs
License Information

The Unicode Character Database (UCD) Tools is a set of Python tools and a C library. The Python tools are designed to support extracting and processing data from the text-based UCD source files, while the C library is designed to provide easy access to this information.

Data Files

The ucd-tools project supports UCD-formatted data files from several different sources.

Unicode Character Database

The following Unicode Character Database files from the Unicode Consortium are supported:

Blocks
DerivedAge
PropList
PropertyValueAliases
Scripts
UnicodeData

ConScript Unicode Registry

If enabled, the following data from the ConScript Unicode Registry (CSUR) is added:

| Code Range  | Script  |
|-------------|---------|
| `F8D0-F8FF` | Klingon |

This data is located in the `data/csur` directory in a form compatible with the Unicode Character Database files.
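As an illustration of what "compatible with the Unicode Character Database files" can mean in practice, the sketch below parses one line in the `FIRST..LAST ; VALUE # comment` layout used by UCD files such as Scripts.txt. It is not the parser used by ucd-tools (whose tools are written in Python); `RangeRecord`, `trim`, and `parse_ucd_line` are hypothetical names, and the code assumes a C++17 compiler.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>

// One record from a UCD-style property file: a code point range and its value.
struct RangeRecord {
    std::uint32_t first;
    std::uint32_t last;
    std::string value;
};

// Strip leading and trailing spaces and tabs.
static std::string trim(const std::string &s) {
    const auto begin = s.find_first_not_of(" \t");
    if (begin == std::string::npos) return {};
    const auto end = s.find_last_not_of(" \t");
    return s.substr(begin, end - begin + 1);
}

// Parse "FIRST..LAST ; VALUE # comment" or "CODE ; VALUE".
// Returns std::nullopt for blank lines, comment-only lines, and malformed input.
std::optional<RangeRecord> parse_ucd_line(const std::string &line) {
    // Drop the trailing comment, if any.
    const std::string data = trim(line.substr(0, line.find('#')));
    if (data.empty()) return std::nullopt;

    const auto semi = data.find(';');
    if (semi == std::string::npos) return std::nullopt;

    const std::string range = trim(data.substr(0, semi));
    const std::string value = trim(data.substr(semi + 1));
    if (range.empty() || value.empty()) return std::nullopt;

    // A single code point is treated as a range of length one.
    const auto dots = range.find("..");
    const std::string first_hex = range.substr(0, dots);
    const std::string last_hex =
        dots == std::string::npos ? first_hex : range.substr(dots + 2);

    try {
        std::size_t used_first = 0;
        std::size_t used_last = 0;
        const unsigned long first = std::stoul(first_hex, &used_first, 16);
        const unsigned long last = std::stoul(last_hex, &used_last, 16);
        // Reject trailing garbage after the hexadecimal digits.
        if (used_first != first_hex.size() || used_last != last_hex.size())
            return std::nullopt;
        return RangeRecord{static_cast<std::uint32_t>(first),
                           static_cast<std::uint32_t>(last), value};
    } catch (const std::exception &) {
        return std::nullopt;
    }
}
```

Given a made-up input line such as `F8D0..F8FF ; Klingon`, the function yields the range U+F8D0 to U+F8FF with the value `Klingon`. The actual contents of the files in `data/csur` may differ from this sample.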

C Library

The C library provides several different facilities that make use of the UCD data. It provides a compact and efficient representation of the different data tables.

Detailed documentation is provided in the `src/include/ucd/ucd.h` file in the Doxygen documentation format.

Querying Properties

The library exposes the following properties from the UCD data files:

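| Property           | Description                                                      |
|--------------------|------------------------------------------------------------------|
| `General_Category` | A General Category Value, including the higher-level grouping.   |
| `Script`           | An ISO 15924 script code.                                        |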
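To make the "higher-level grouping" concrete: in the UCD, the first letter of a two-letter General Category value names its group, so `Lu` (uppercase letter) and `Ll` (lowercase letter) both belong to `L` (Letter). The sketch below shows that relationship only. It is not the library's API; `category_group` and `group_name` are hypothetical helpers, and the real declarations are in `src/include/ucd/ucd.h`.

```cpp
#include <string>

// Return the one-letter group of a General_Category value such as "Lu" or "Nd".
// An empty input yields '\0'.
char category_group(const std::string &general_category) {
    return general_category.empty() ? '\0' : general_category[0];
}

// Human-readable name of a group letter, or "Unknown" for anything else.
const char *group_name(char group) {
    switch (group) {
    case 'C': return "Other";
    case 'L': return "Letter";
    case 'M': return "Mark";
    case 'N': return "Number";
    case 'P': return "Punctuation";
    case 'S': return "Symbol";
    case 'Z': return "Separator";
    default:  return "Unknown";
    }
}
```

For example, `group_name(category_group("Nd"))` gives `"Number"`.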

Case Conversion

The following character conversion functions are provided:

ucd::tolower -- convert letters to lower case
ucd::totitle -- convert letters to title case (UCD extension)
ucd::toupper -- convert letters to upper case

NOTE: These functions use the simple case mapping algorithm. That is, they only ever map one character to a single character. This keeps their signatures compatible with the standard C wctype.h APIs.
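The practical consequence of simple case mapping shows up with characters such as U+00DF (ß): its full upper-case mapping is the two-character sequence "SS", which a function returning a single code point cannot express, so a simple mapping leaves it unchanged. The sketch below illustrates this with a deliberately tiny, hypothetical mapping that covers only ASCII and part of Latin-1. It is not the table-driven implementation in ucd-tools, and `codepoint_t` and `simple_toupper` are names chosen for this example.

```cpp
#include <cstdint>

// Local alias for this example: one Unicode code point.
using codepoint_t = std::uint32_t;

// Simple (one-to-one) upper-case mapping for a small subset of characters.
// Anything outside the handled ranges is returned unchanged.
codepoint_t simple_toupper(codepoint_t c) {
    // ASCII a-z map to A-Z.
    if (c >= U'a' && c <= U'z') return c - 0x20;
    // Latin-1 U+00E0..U+00FE map to U+00C0..U+00DE,
    // except U+00F7 (division sign), which is not a letter.
    if (c >= 0xE0 && c <= 0xFE && c != 0xF7) return c - 0x20;
    // U+00DF has no single-code-point upper-case form, so it falls
    // through here and is returned as-is, like any other unhandled input.
    return c;
}
```

With this sketch, `simple_toupper(0x00E9)` returns `0x00C9`, while `simple_toupper(0x00DF)` returns `0x00DF`.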

wctype Compatibility

To facilitate working on platforms that don’t have a usable wide-character ctype library, or to provide more consistent behaviour across platforms, the ucd-tools C library provides a set of APIs that are compatible with wctype.h.

The following character classification functions are provided:

ucd::isalnum
ucd::isalpha
ucd::iscntrl
ucd::isdigit
ucd::isgraph
ucd::islower
ucd::isprint
ucd::ispunct
ucd::isspace
ucd::isupper

NOTE: Equivalents for isblank and isxdigit are not provided.
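Code that needs these two checks has to supply them itself. A minimal sketch, assuming ASCII-only behaviour is acceptable, might look like the following. The names `is_blank_ascii` and `is_xdigit_ascii` are hypothetical and are not part of ucd-tools.

```cpp
#include <cstdint>

// ASCII-only stand-in for isblank: space or horizontal tab.
inline bool is_blank_ascii(std::uint32_t c) {
    return c == 0x20 || c == 0x09;
}

// ASCII-only stand-in for isxdigit: 0-9, A-F, a-f.
inline bool is_xdigit_ascii(std::uint32_t c) {
    return (c >= U'0' && c <= U'9') ||
           (c >= U'A' && c <= U'F') ||
           (c >= U'a' && c <= U'f');
}
```

These helpers deliberately ignore non-ASCII blanks and fullwidth hexadecimal digits. Whether that is appropriate depends on the application.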

Build Dependencies

In order to build ucd-tools, you need:

a functional autotools system (make, autoconf, automake and libtool);
a functional C++ compiler.

To build the documentation, you need:

the doxygen program to build the API documentation;
the dot program from the Graphviz package to generate graphs in the API documentation.

Debian

Core Dependencies:

| Dependency   | Install                                                |
|--------------|--------------------------------------------------------|
| autotools    | `sudo apt-get install make autoconf automake libtool`  |
| C++ compiler | `sudo apt-get install gcc g++`                         |

Documentation Dependencies:

| Dependency | Install                         |
|------------|---------------------------------|
| doxygen    | `sudo apt-get install doxygen`  |
| graphviz   | `sudo apt-get install graphviz` |

Building

UCD Tools supports the standard GNU autotools build system. The source code does not contain the generated configure files, so to build it you need to run:

    ./autogen.sh
    ./configure --prefix=/usr
    make

The tests can be run by using:

    make check

The program can be installed using:

    sudo make install

The documentation can be built using:

    make html

Updating the UCD Data

To regenerate the source files from the UCD data when a new version of Unicode is released, you need to run:

    ./configure --prefix=/usr --with-unicode-version=VERSION
    make ucd-update

where VERSION is the Unicode version (e.g. 6.3.0).

Additionally, you can use the UCD_FLAGS option to control how the data is generated. The following flags are supported:

| Flag          | Description                          |
|---------------|--------------------------------------|
| `--with-csur` | Add ConScript Unicode Registry data. |

Bugs

Report bugs to the ucd-tools issues page on GitHub.

License Information

UCD Tools is released under the GNU General Public License (GPL), version 3 or later.
