README: Improve the description of the project.

master
Reece H. Dunn 10 years ago
parent
commit
5915344e3a
1 changed file with 79 additions and 15 deletions

README.md (+79, -15)

@@ -1,6 +1,12 @@
-# Unicode Character Data Tools
-
-- [ConScript Unicode Registry](#conscript-unicode-registry)
+# Unicode Character Database Tools
+
+- [Data Files](#data-files)
+  - [Unicode Character Database](#unicode-character-database)
+  - [ConScript Unicode Registry](#conscript-unicode-registry)
+- [C Library](#c-library)
+  - [Querying Properties](#querying-properties)
+  - [Case Conversion](#case-conversion)
+  - [wctype Compatibility](#wctype-compatibility)
 - [Build Dependencies](#build-dependencies)
   - [Debian](#debian)
 - [Building](#building)
@@ -10,22 +16,29 @@

 ----------

-The Unicode Character Data (UCD) Tools is a library for working with the
-Unicode Character Data from unicode.org.
+The Unicode Character Database (UCD) Tools is a set of Python tools and a C
+library. The Python tools are designed to support extracting and processing
+data from the text-based UCD source files, while the C library is designed
+to provide easy access to this information.

+## Data Files
+
+The `ucd-tools` project provides support for UCD formatted data files from
+several different sources.
+
-It provides a compact replacement for various wide-character C APIs. These can
-be used in Android applications, as the Android C library does not have full
-wide-character support.
+### Unicode Character Database

-In addition to this it provides APIs for:
-- querying the [Unicode General Category](http://www.unicode.org/reports/tr44/) values and groups;
-- querying the [ISO 15924](http://www.unicode.org/iso15924/iso15924-codes.html) script;
-- converting to upper, lower and title case.
+The following [Unicode Character Database](http://www.unicode.org/Public/7.0.0/ucd/)
+files from the [Unicode Consortium](http://www.unicode.org) are supported:

-The following data sets are used for the data tables:
-- [Unicode Character Data 7.0.0](http://www.unicode.org/Public/7.0.0/ucd/).
+* Blocks
+* DerivedAge
+* PropList
+* PropertyValueAliases
+* Scripts
+* UnicodeData

-## ConScript Unicode Registry
+### ConScript Unicode Registry

 If enabled, the following data from the
 [ConScript Unicode Registry](http://www.evertype.com/standards/csur/) (CSUR) is
@@ -38,6 +51,57 @@ added:
 This data is located in the `data/csur` directory in a form compatible with the
 Unicode Character Data files.

+## C Library
+
+The C library provides several different facilities that make use of the UCD
+data. It provides a compact and efficient representation of the different data
+tables.
+
+Detailed documentation is provided in the `src/include/ucd/ucd.h` file in the
+Doxygen documentation format.
+
+### Querying Properties
+
+The library exposes the following properties from the UCD data files:
+
+| Property           | Description |
+|--------------------|-------------|
+| `General_Category` | A [General Category Value](http://www.unicode.org/reports/tr44/#General_Category_Values), including the higher-level grouping. |
+| `Script`           | An [ISO 15924](http://www.unicode.org/iso15924/iso15924-codes.html) script code. |
+
+### Case Conversion
+
+The following character conversion functions are provided:
+
+* `ucd::tolower` -- convert letters to lower case
+* `ucd::totitle` -- convert letters to title case (UCD extension)
+* `ucd::toupper` -- convert letters to upper case
+
+__NOTE:__ These functions use the simple case mapping algorithm. That is, they
+only ever map to a single character. This is to provide a compatible signature
+to the standard C `wctype.h` APIs.
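
The note on simple case mapping can be illustrated with a small, self-contained sketch. This is not the library's implementation: the `codepoint` alias, the `simple_case` table and the `simple_toupper`, `simple_tolower` and `simple_totitle` helpers are hypothetical names, and only a handful of rows copied from `UnicodeData.txt` are hard-coded. It shows that a simple mapping takes one code point and returns one code point, that title case can differ from upper case, and that a code point with no simple mapping is returned unchanged.

```cpp
#include <cstdint>

// Hypothetical code point type; the real library defines its own in ucd.h.
using codepoint = std::uint32_t;

// One row of simple case mappings: a code point and its simple
// upper, lower and title case forms (values from UnicodeData.txt).
struct simple_case { codepoint c, upper, lower, title; };

constexpr simple_case table[] = {
    { 0x0041, 0x0041, 0x0061, 0x0041 }, // LATIN CAPITAL LETTER A
    { 0x0061, 0x0041, 0x0061, 0x0041 }, // LATIN SMALL LETTER A
    { 0x01C4, 0x01C4, 0x01C6, 0x01C5 }, // LATIN CAPITAL LETTER DZ WITH CARON
    { 0x01C5, 0x01C4, 0x01C6, 0x01C5 }, // ...CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON
    { 0x01C6, 0x01C4, 0x01C6, 0x01C5 }, // LATIN SMALL LETTER DZ WITH CARON
};

// Each helper returns exactly one code point. A code point that is not in
// the table has no simple mapping here and is returned unchanged.
constexpr codepoint simple_toupper(codepoint c) {
    for (const simple_case &row : table)
        if (row.c == c) return row.upper;
    return c;
}

constexpr codepoint simple_tolower(codepoint c) {
    for (const simple_case &row : table)
        if (row.c == c) return row.lower;
    return c;
}

constexpr codepoint simple_totitle(codepoint c) {
    for (const simple_case &row : table)
        if (row.c == c) return row.title;
    return c;
}

// Title case and upper case differ for the DZ digraph.
static_assert(simple_toupper(0x01C6) == 0x01C4, "upper case of U+01C6");
static_assert(simple_totitle(0x01C6) == 0x01C5, "title case of U+01C6");

// U+00DF (sharp s) has no simple uppercase mapping in UnicodeData.txt;
// its full mapping to "SS" cannot be expressed with this signature.
static_assert(simple_toupper(0x00DF) == 0x00DF, "U+00DF is unchanged");
```

The single-code-point return type is what keeps the signature compatible with `towupper` and `towlower`; mappings that expand to several characters would need a string-based interface instead.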

+### wctype Compatibility
+
+To facilitate working on platforms that don't have a useable wide-character
+ctypes library, or to provide a more consistent behaviour, the `ucd-tools`
+C library provides a set of APIs that are compatible with `wctype.h`.
+
+The following character classification functions are provided:
+
+* `ucd::isalnum`
+* `ucd::isalpha`
+* `ucd::iscntrl`
+* `ucd::isdigit`
+* `ucd::isgraph`
+* `ucd::islower`
+* `ucd::isprint`
+* `ucd::ispunct`
+* `ucd::isspace`
+* `ucd::isupper`
+
+__NOTE:__ Equivalents for `isblank` and `isxdigit` are not provided.
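
Because these functions are described as compatible with `wctype.h`, code that takes the classifier as a parameter can switch between the standard functions and a replacement. The sketch below assumes nothing about the `ucd::` signatures beyond what is stated above: `count_matching`, `count_letters`, `count_digits` and `self_check` are hypothetical helpers, and `std::iswalpha` and `std::iswdigit` stand in for the library calls, so check `ucd.h` before substituting them.

```cpp
#include <cassert>
#include <cstddef>
#include <cwctype>
#include <string>

// Count the characters in `text` that `classify` accepts. Any callable
// taking a character and returning something convertible to bool works.
template <typename Classifier>
std::size_t count_matching(const std::wstring &text, Classifier classify) {
    std::size_t n = 0;
    for (wchar_t c : text)
        if (classify(c)) ++n;
    return n;
}

// Standard-library classifiers wrapped in lambdas. A wctype-compatible
// replacement could be swapped in here without touching count_matching.
inline std::size_t count_letters(const std::wstring &text) {
    return count_matching(text, [](wchar_t c) {
        return std::iswalpha(static_cast<std::wint_t>(c)) != 0;
    });
}

inline std::size_t count_digits(const std::wstring &text) {
    return count_matching(text, [](wchar_t c) {
        return std::iswdigit(static_cast<std::wint_t>(c)) != 0;
    });
}

// Usage example restricted to ASCII, which the standard classifiers
// handle even in the default "C" locale.
inline void self_check() {
    assert(count_letters(L"ab1 c") == 3);
    assert(count_digits(L"ab1 c") == 1);
}
```

Outside ASCII the standard classifiers depend on the active locale and on the platform's C library, which is the inconsistency this section says the replacement functions are meant to avoid.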

 ## Build Dependencies

 In order to build ucd-tools, you need:
