The Unicode Character Database (UCD) Tools is a set of Python tools and a C library. The Python tools are designed to support extracting and processing data from the text-based UCD source files, while the C library is designed to provide easy access to this information.
The `ucd-tools` project provides support for UCD formatted data files from several different sources.
The following Unicode Character Database files from the Unicode Consortium are supported:
If enabled, the following data from the ConScript Unicode Registry (CSUR) is added:
Code Range | Script |
---|---|
F8D0-F8FF | Klingon |
This data is located in the `data/csur` directory in a form compatible with the Unicode Character Data files.
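
The range in the table above lies at the end of the Basic Multilingual Plane's Private Use Area (U+E000 to U+F8FF), which is why this data is optional rather than part of the standard UCD. As a minimal sketch, a range check for it could look like the following. The helper name `is_csur_klingon` is hypothetical and is not part of the ucd-tools API.

```cpp
#include <cstdint>

// Hypothetical helper (not part of ucd-tools): reports whether a code point
// falls inside the CSUR Klingon range F8D0-F8FF listed in the table above.
constexpr bool is_csur_klingon(std::uint32_t cp) {
    return cp >= 0xF8D0 && cp <= 0xF8FF;
}
```

A caller could use such a check to decide whether a code point needs the CSUR data at all before consulting the generated tables.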
The C library provides several different facilities that make use of the UCD data. It provides a compact and efficient representation of the different data tables.
Detailed documentation is provided in the `src/include/ucd/ucd.h` file in the Doxygen documentation format.
The library exposes the following properties from the UCD data files:
Property | Description |
---|---|
General_Category | A General Category Value, including the higher-level grouping. |
Script | An ISO 15924 script code. |
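
To illustrate the "higher-level grouping" mentioned for General_Category: in the Unicode two-letter abbreviations, the first letter names the group (for example `Lu`, `Ll` and `Lt` all belong to the letter group `L`). The sketch below works on those abbreviations as plain strings. Both helper names are hypothetical, and the library's own representation of categories may differ.

```cpp
// Hypothetical helpers (not part of ucd-tools), operating on the two-letter
// General_Category abbreviations used by the Unicode Standard.

// Returns the group letter of an abbreviation, e.g. "Lu" -> 'L'.
constexpr char category_group(const char *general_category) {
    return general_category[0];
}

// True when the abbreviation belongs to the letter group (Lu, Ll, Lt, Lm, Lo).
constexpr bool is_letter_category(const char *general_category) {
    return category_group(general_category) == 'L';
}
```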
The following character conversion functions are provided:
* `ucd::tolower` -- convert letters to lower case
* `ucd::totitle` -- convert letters to title case (UCD extension)
* `ucd::toupper` -- convert letters to upper case

NOTE: These functions use the simple case mapping algorithm. That is, they only ever map to a single character. This is to provide a compatible signature to the standard C `wctype.h` APIs.
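
The consequence of simple case mapping can be sketched with a tiny hand-written stand-in. The function `simple_toupper` and the `codepoint` type below are illustrative assumptions, not the library's implementation, and the table covers only ASCII and part of the Greek alphabet. The point is that a character such as U+00DF (ß), whose full upper-case form is the two-character string "SS", has no single-character upper-case mapping and is therefore returned unchanged.

```cpp
#include <cstdint>

typedef std::uint32_t codepoint; // assumption: stand-in for the library's code point type

// Illustrative stand-in for a simple (one-to-one) upper-case mapping.
// Only two ranges are covered; everything else maps to itself.
constexpr codepoint simple_toupper(codepoint c) {
    return (c >= 0x0061 && c <= 0x007A) ? c - 0x20   // ASCII a-z -> A-Z
         : (c >= 0x03B1 && c <= 0x03C1) ? c - 0x20   // Greek alpha-rho -> Alpha-Rho
         : c;                                        // no single-character mapping
}
```

Because the result is always one code point, a function of this shape can share the `wint_t`-in, `wint_t`-out style of `towupper`, which a one-to-many mapping could not.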
To facilitate working on platforms that don’t have a useable wide-character ctypes library, or to provide a more consistent behaviour, the `ucd-tools` C library provides a set of APIs that are compatible with `wctype.h`.
The following character classification functions are provided:
* `ucd::isalnum`
* `ucd::isalpha`
* `ucd::iscntrl`
* `ucd::isdigit`
* `ucd::isgraph`
* `ucd::islower`
* `ucd::isprint`
* `ucd::ispunct`
* `ucd::isspace`
* `ucd::isupper`

NOTE: Equivalents for `isblank` and `isxdigit` are not provided.
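
One practical benefit of `wctype.h` compatibility is that code written against the classification signature can swap implementations. The sketch below uses only the standard `<cwctype>` functions. The helper `count_matching` and the `classify_fn` typedef are hypothetical names, and whether a `ucd::` function can be passed directly depends on its exact signature in `ucd.h`, which is not reproduced here.

```cpp
#include <cstddef>
#include <cwctype>

// Signature shared by the standard wctype.h classification functions.
typedef int (*classify_fn)(std::wint_t);

// Hypothetical helper: counts the characters of a null-terminated wide string
// for which the given classification function returns non-zero.
inline std::size_t count_matching(const wchar_t *text, classify_fn is_class) {
    std::size_t count = 0;
    for (; *text != L'\0'; ++text) {
        if (is_class(static_cast<std::wint_t>(*text))) {
            ++count;
        }
    }
    return count;
}
```

For example, `count_matching(L"abc123", std::iswdigit)` counts the three digits. A classification function from another library could be substituted as long as it is adapted to the same signature.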
In order to build ucd-tools, you need:

* the autotools build system (`make`, `autoconf`, `automake` and `libtool`);
* a C++ compiler.

To build the documentation, you need:

* doxygen;
* graphviz.

Core Dependencies:
Dependency | Install |
---|---|
autotools | sudo apt-get install make autoconf automake libtool |
c++ compiler | sudo apt-get install gcc g++ |
Documentation Dependencies:
Dependency | Install |
---|---|
doxygen | sudo apt-get install doxygen |
graphviz | sudo apt-get install graphviz |
UCD Tools supports the standard GNU autotools build system. The source code does not contain the generated `configure` files, so to build it you need to run:

    ./autogen.sh
    ./configure --prefix=/usr
    make

The tests can be run by using:

    make check

The program can be installed using:

    sudo make install

The documentation can be built using:

    make html

To regenerate the source files from the UCD data when a new version of Unicode is released, you need to run:

    ./configure --prefix=/usr --with-unicode-version=VERSION
    make ucd-update

where `VERSION` is the Unicode version (e.g. `6.3.0`).
Additionally, you can use the `UCD_FLAGS` option to control how the data is generated. The following flags are supported:
Flag | Description |
---|---|
--with-csur | Add ConScript Unicode Registry data. |
Report bugs to the ucd-tools issues page on GitHub.
UCD Tools is released under the GPL version 3 or later license.