| Reece H. Dunn b0fc50b954 Build HTML versions of the README and CHANGELOG files. | 9 years ago | |
|---|---|---|
| _layouts | 9 years ago | |
| data/csur | 11 years ago | |
| docs | 9 years ago | |
| src | 9 years ago | |
| tests | 9 years ago | |
| tools | 9 years ago | |
| .gitignore | 9 years ago | |
| AUTHORS | 13 years ago | |
| CHANGELOG.md | 9 years ago | |
| COPYING | 13 years ago | |
| COPYING.UCD | 11 years ago | |
| Makefile.am | 9 years ago | |
| README.md | 11 years ago | |
| autogen.sh | 9 years ago | |
| configure.ac | 9 years ago | |
The Unicode Character Database (UCD) Tools is a set of Python tools and a C++ library. The Python tools are designed to support extracting and processing data from the text-based UCD source files, while the C++ library is designed to provide easy access to this information within a C++ program.
The ucd-tools project provides support for UCD formatted data files from
several different sources.
The following Unicode Character Database files are supported:
If enabled, the following data from the ConScript Unicode Registry (CSUR) is added:
| Code Range | Script | 
|---|---|
| F8D0-F8FF | Klingon | 
This data is located in the data/csur directory in a form compatible with the
Unicode Character Data files.
The C++ library provides several different facilities that make use of the UCD data. It provides a compact and efficient representation of the different data tables.
Detailed documentation is provided in the src/include/ucd/ucd.h file in the
Doxygen documentation format.
The library exposes the following properties from the UCD data files:
| Property | Description | 
|---|---|
| General_Category | A General Category Value, including the higher-level grouping. | 
| Script | An ISO 15924 script code. | 
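
To illustrate how a property such as Script can be served from a compact table, here is a minimal sketch of a sorted range table searched with binary search. It is not the ucd-tools implementation or API: the `ScriptRange` type, the `kRanges` table and the `lookup_script` function are hypothetical names, and the table holds only a few hand-picked ranges (including the CSUR Klingon range listed earlier) rather than real generated data.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <string>

// Hypothetical type for illustration only; not part of the ucd-tools API.
struct ScriptRange {
    std::uint32_t first;  // first code point in the range (inclusive)
    std::uint32_t last;   // last code point in the range (inclusive)
    const char *script;   // ISO 15924 script code
};

// Toy table: sorted by `first`, ranges do not overlap.
static const ScriptRange kRanges[] = {
    { 0x0041, 0x005A, "Latn" },  // LATIN CAPITAL LETTER A..Z
    { 0x0061, 0x007A, "Latn" },  // LATIN SMALL LETTER A..Z
    { 0x0391, 0x03A1, "Grek" },  // GREEK CAPITAL LETTER ALPHA..RHO
    { 0xF8D0, 0xF8FF, "Piqd" },  // CSUR Klingon range (Piqd is the ISO 15924 code for Klingon)
};

// Returns the script code for `c`, or "Zzzz" (the ISO 15924 code for an
// uncoded script) when `c` is not covered by the toy table.
std::string lookup_script(std::uint32_t c) {
    const ScriptRange *end = std::end(kRanges);
    // Find the first range whose last code point is not below `c`.
    const ScriptRange *it = std::lower_bound(
        std::begin(kRanges), end, c,
        [](const ScriptRange &r, std::uint32_t value) { return r.last < value; });
    if (it != end && it->first <= c) {
        return it->script;
    }
    return "Zzzz";
}

// Usage example.
void script_lookup_examples() {
    assert(lookup_script(0x0041) == "Latn");  // 'A'
    assert(lookup_script(0xF8D0) == "Piqd");  // start of the CSUR Klingon range
    assert(lookup_script(0x0040) == "Zzzz");  // '@' is not in the toy table
}
```

Storing ranges instead of one entry per code point keeps the table small, and the binary search keeps lookups logarithmic in the number of ranges. What the library itself reports for any given code point is defined by its generated tables, not by this sketch.
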
The following character conversion functions are provided:
- ucd::tolower -- convert letters to lower case
- ucd::totitle -- convert letters to title case (UCD extension)
- ucd::toupper -- convert letters to upper case

NOTE: These functions use the simple case mapping algorithm. That is, they
only ever map to a single character. This is to provide a compatible signature
to the standard C wctype.h APIs.
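
The difference between simple and full case mapping can be sketched as follows. This is a hypothetical stand-in, not the library's code: `simple_toupper` is an illustrative name and covers only two hard-coded ranges, whereas the library derives its mappings from the UCD data. The point is the shape of the function: one code point in, one code point out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper for illustration; not the ucd-tools implementation.
// A simple case mapping returns exactly one code point for each input.
std::uint32_t simple_toupper(std::uint32_t c) {
    if (c >= 0x0061 && c <= 0x007A) return c - 0x20;  // a..z -> A..Z
    if (c >= 0x03B1 && c <= 0x03C1) return c - 0x20;  // Greek alpha..rho -> ALPHA..RHO
    return c;  // no single-code-point mapping known here: return unchanged
}

// Usage example.
void simple_case_examples() {
    assert(simple_toupper(0x0061) == 0x0041);  // 'a' -> 'A'
    assert(simple_toupper(0x03B1) == 0x0391);  // GREEK SMALL LETTER ALPHA -> CAPITAL ALPHA
    // U+00DF (LATIN SMALL LETTER SHARP S) upper-cases to "SS" under full
    // case mapping, which cannot be expressed as a single return value,
    // so a simple mapping leaves it unchanged.
    assert(simple_toupper(0x00DF) == 0x00DF);
}
```
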
To facilitate working on platforms that don't have a usable wide-character
ctypes library, or to provide more consistent behaviour, the ucd-tools
C++ library provides a set of APIs that are compatible with wctype.h.
The following character classification functions are provided:
- ucd::isalnum
- ucd::isalpha
- ucd::iscntrl
- ucd::isdigit
- ucd::isgraph
- ucd::islower
- ucd::isprint
- ucd::ispunct
- ucd::isspace
- ucd::isupper

NOTE: Equivalents for isblank and isxdigit are not provided.
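
Because these functions follow the wctype.h shape, code written against a generic predicate can swap one implementation for the other. The sketch below uses the standard `<cwctype>` functions so that it is self-contained; `count_matching` is a hypothetical helper, and the idea that a `ucd::` function could be passed in the same position assumes its signature takes a single character value and returns a truthy result, which should be checked against src/include/ucd/ucd.h.

```cpp
#include <cassert>
#include <cstddef>
#include <cwctype>
#include <string>

// Counts the characters in `text` for which `pred` returns non-zero.
// `pred` is any callable with a wctype.h-style shape: one character in,
// a truthy or falsy value out.
template <typename Predicate>
std::size_t count_matching(const std::wstring &text, Predicate pred) {
    std::size_t n = 0;
    for (wchar_t ch : text) {
        if (pred(static_cast<std::wint_t>(ch))) {
            ++n;
        }
    }
    return n;
}

// Usage example with the standard library predicates.
void classification_examples() {
    auto is_digit = [](std::wint_t c) { return std::iswdigit(c); };
    auto is_alpha = [](std::wint_t c) { return std::iswalpha(c); };
    auto is_space = [](std::wint_t c) { return std::iswspace(c); };

    assert(count_matching(L"abc 123", is_digit) == 3);
    assert(count_matching(L"abc 123", is_alpha) == 3);
    assert(count_matching(L"abc 123", is_space) == 1);
}
```
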
In order to build ucd-tools, you need:
- a functional autotools system (make, autoconf, automake and libtool);
- a C++ compiler.

To build the documentation, you need:
- doxygen;
- graphviz.

Core Dependencies:
| Dependency | Install | 
|---|---|
| autotools | sudo apt-get install make autoconf automake libtool | 
| c++ compiler | sudo apt-get install gcc g++ | 
Documentation Dependencies:
| Dependency | Install | 
|---|---|
| doxygen | sudo apt-get install doxygen | 
| graphviz | sudo apt-get install graphviz | 
UCD Tools supports the standard GNU autotools build system. The source code
does not contain the generated configure files, so to build it you need to
run:

    ./autogen.sh
    ./configure --prefix=/usr
    make

The tests can be run by using:

    make check

The program can be installed using:

    sudo make install

The documentation can be built using:

    make html

To regenerate the source files from the UCD data when a new version of Unicode is released, you need to run:

    ./configure --prefix=/usr --with-unicode-version=VERSION
    make ucd-update

where VERSION is the Unicode version (e.g. 6.3.0).
Additionally, you can use the UCD_FLAGS option to control how the data is
generated. The following flags are supported:
| Flag | Description | 
|---|---|
| --with-csur | Add ConScript Unicode Registry data. | 
Report bugs to the ucd-tools issues page on GitHub.
UCD Tools is released under the GPL version 3 or later license.
The UCD data files in data/ucd are downloaded from the UCD website and are
licensed under the Unicode Terms of Use. These data files are
used in their unmodified form. They have the following Copyright notice:
Copyright © 1991-2014 Unicode, Inc. All rights reserved.
The files in data/csur are based on the information from the ConScript
Unicode Registry maintained by John Cowan and Michael Everson.