The Unicode Character Database (UCD) Tools is a set of Python tools and a C library. The Python tools are designed to support extracting and processing data from the text-based UCD source files, while the C library is designed to provide easy access to this information.
The ucd-tools project provides support for UCD formatted data files from
several different sources.
The following Unicode Character Database files from the Unicode Consortium are supported:
If enabled, the following data from the ConScript Unicode Registry (CSUR) is added:
| Code Range | Script |
|---|---|
| F8D0-F8FF | Klingon |
This data is located in the data/csur directory in a form compatible with the
Unicode Character Data files.
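As an illustration of the text format mentioned above, here is a minimal Python sketch that parses lines laid out the way the Unicode Consortium's property files (such as Scripts.txt) are: semicolon-separated fields, `#` comments, and a leading `XXXX` code point or `XXXX..YYYY` range. The function name `parse_ucd_line` and the sample lines are hypothetical; this is not the parser used by the ucd-tools Python scripts, and the property value shown for the Klingon range is an assumption rather than the actual content of the data/csur files.

```python
from typing import List, Optional, Tuple


def parse_ucd_line(line: str) -> Optional[Tuple[int, int, List[str]]]:
    """Parse one line of a UCD-style property file.

    Returns (first, last, fields), or None for blank or comment-only lines.
    """
    # Everything after '#' is a comment.
    data = line.split('#', 1)[0].strip()
    if not data:
        return None
    fields = [field.strip() for field in data.split(';')]
    # The first field is a single code point or a 'first..last' range.
    first, _, last = fields[0].partition('..')
    first_cp = int(first, 16)
    last_cp = int(last, 16) if last else first_cp
    return first_cp, last_cp, fields[1:]


# Illustrative input only; the value on the Klingon line is an assumption.
sample = """\
# Comment lines and blank lines are skipped.
0041..005A    ; Latin # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
00AA          ; Latin # Lo       FEMININE ORDINAL INDICATOR
F8D0..F8FF    ; Klingon # hypothetical CSUR entry
"""

for raw in sample.splitlines():
    entry = parse_ucd_line(raw)
    if entry is not None:
        first, last, values = entry
        print(f"{first:04X}-{last:04X} {values}")
```

Returning the range as a pair of integers keeps single code points and ranges uniform, which makes it straightforward to merge entries from several sources before generating tables.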
The C library provides several different facilities that make use of the UCD data. It provides a compact and efficient representation of the different data tables.
Detailed documentation is provided in the src/include/ucd/ucd.h file in the
Doxygen documentation format.
The library exposes the following properties from the UCD data files:
| Property | Description |
|---|---|
| General_Category | A General Category Value, including the higher-level grouping. |
| Script | An ISO 15924 script code. |
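To illustrate what a General Category value and its higher-level grouping look like, the sketch below uses Python's standard `unicodedata` module rather than the ucd-tools library. In the UCD, each two-letter General_Category value (for example `Lu`, uppercase letter) belongs to the group named by its first letter (`L`, letter). The helper name `category_group` is hypothetical, and the values reported depend on the Unicode version bundled with the Python interpreter.

```python
import unicodedata


def category_group(ch: str) -> str:
    """Return the one-letter group (L, M, N, P, S, Z or C) of a character."""
    return unicodedata.category(ch)[0]


# Print code point, General_Category value and group for a few characters.
for ch in ("A", "a", "5", " ", "!"):
    print(f"U+{ord(ch):04X} {unicodedata.category(ch)} {category_group(ch)}")
```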
The following character conversion functions are provided:
- ucd::tolower -- convert letters to lower case
- ucd::totitle -- convert letters to title case (UCD extension)
- ucd::toupper -- convert letters to upper case

NOTE: These functions use the simple case mapping algorithm. That is, they
only ever map to a single character. This is to provide a compatible signature
to the standard C wctype.h APIs.
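The difference between simple and full case mapping can be shown without the library. In the sketch below, Python's `str.upper()` applies full case mapping, so it may return more than one character. The hypothetical `simple_toupper` helper instead looks up a single code point in a small hand-written table (an assumption standing in for the UCD data) and returns its input unchanged when there is no entry, which is what a one-character-in, one-character-out signature requires.

```python
# Tiny hand-written excerpt of simple uppercase mappings (illustrative only).
SIMPLE_UPPER = {
    0x0061: 0x0041,  # LATIN SMALL LETTER A -> LATIN CAPITAL LETTER A
    0x00E9: 0x00C9,  # LATIN SMALL LETTER E WITH ACUTE -> capital form
}


def simple_toupper(codepoint: int) -> int:
    """Map one code point to one code point; unmapped input is returned as-is."""
    return SIMPLE_UPPER.get(codepoint, codepoint)


sharp_s = 0x00DF  # LATIN SMALL LETTER SHARP S has no simple uppercase mapping

print(hex(simple_toupper(0x0061)))   # one code point in, one code point out
print(hex(simple_toupper(sharp_s)))  # unchanged under simple mapping
print(len(chr(sharp_s).upper()))     # full mapping expands to two characters
```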
To facilitate working on platforms that don’t have a usable wide-character
ctype library, or to provide a more consistent behaviour, the ucd-tools
C library provides a set of APIs that are compatible with wctype.h.
The following character classification functions are provided:
- ucd::isalnum
- ucd::isalpha
- ucd::iscntrl
- ucd::isdigit
- ucd::isgraph
- ucd::islower
- ucd::isprint
- ucd::ispunct
- ucd::isspace
- ucd::isupper

NOTE: Equivalents for isblank and isxdigit are not provided.
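One way wctype.h-style predicates can be built on Unicode data is to test a character's General_Category. The sketch below does this with Python's `unicodedata` module. The category chosen for each predicate is an assumption made for illustration and is not taken from the ucd-tools sources, whose definitions may differ.

```python
import unicodedata


def isalpha(ch: str) -> bool:
    """Letters: any category in the L group (assumed definition)."""
    return unicodedata.category(ch).startswith("L")


def isdigit(ch: str) -> bool:
    """Decimal digits: category Nd (assumed definition)."""
    return unicodedata.category(ch) == "Nd"


def isalnum(ch: str) -> bool:
    """Letters or decimal digits."""
    return isalpha(ch) or isdigit(ch)


def isupper(ch: str) -> bool:
    """Uppercase letters: category Lu (assumed definition)."""
    return unicodedata.category(ch) == "Lu"


def islower(ch: str) -> bool:
    """Lowercase letters: category Ll (assumed definition)."""
    return unicodedata.category(ch) == "Ll"


def iscntrl(ch: str) -> bool:
    """Control characters: category Cc (assumed definition)."""
    return unicodedata.category(ch) == "Cc"


# U+00E9 is a Latin letter, U+0663 is ARABIC-INDIC DIGIT THREE.
for ch in ("\u00e9", "\u0663", "\n"):
    print(f"U+{ord(ch):04X}", isalpha(ch), isdigit(ch), iscntrl(ch))
```

Because the predicates consult Unicode data instead of the C locale, they give the same answers on every platform, which is the consistency the text describes.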
In order to build ucd-tools, you need:

- a functional autotools system (make, autoconf, automake and libtool);
- a functional C++ compiler.

To build the documentation, you need:

- doxygen;
- graphviz.

Core Dependencies:
| Dependency | Install |
|---|---|
| autotools | sudo apt-get install make autoconf automake libtool |
| c++ compiler | sudo apt-get install gcc g++ |
Documentation Dependencies:
| Dependency | Install |
|---|---|
| doxygen | sudo apt-get install doxygen |
| graphviz | sudo apt-get install graphviz |
UCD Tools supports the standard GNU autotools build system. The source code
does not contain the generated configure files, so to build it you need to
run:

    ./autogen.sh
    ./configure --prefix=/usr
    make

The tests can be run by using:

    make check

The program can be installed using:

    sudo make install

The documentation can be built using:

    make html

To regenerate the source files from the UCD data when a new version of Unicode is released, you need to run:

    ./configure --prefix=/usr --with-unicode-version=VERSION
    make ucd-update

where VERSION is the Unicode version (e.g. 6.3.0).
Additionally, you can use the UCD_FLAGS option to control how the data is
generated. The following flags are supported:
| Flag | Description |
|---|---|
| --with-csur | Add ConScript Unicode Registry data. |
Report bugs to the ucd-tools issues page on GitHub.
UCD Tools is released under the GPL version 3 or later license.