Reece H. Dunn
f109bb918f
isspace: don't include <noBreak> characters.
8 years ago
Reece H. Dunn
5f9dc111cf
Add tests for the isdigit and isxdigit ctype APIs.
8 years ago
Reece H. Dunn
bd71fed013
ctype: return true in isupper/islower if there is a simple case mapping present
8 years ago
Reece H. Dunn
e77b7c7b49
printdata: create an isspace helper function
8 years ago
Reece H. Dunn
ceda811b12
printdata: add the properties to the primary data map
8 years ago
Reece H. Dunn
a2193799e4
printdata: use get to return a default value if the map key is not present
8 years ago
Reece H. Dunn
cd9cc8e6e2
Unicode Character Data 9.0.0
8 years ago
Reece H. Dunn
a83ce9ee8e
Python 3 compatibility fixes.
8 years ago
Reece H. Dunn
7201a1a150
Convert scripts.cpp from C++ to C.
9 years ago
Reece H. Dunn
707998940d
Convert categories.cpp from C++ to C.
9 years ago
Reece H. Dunn
0afcb3f89f
Convert case.cpp from C++ to C.
9 years ago
Reece H. Dunn
454038dbfa
Create a C-based API in addition to the C++-based API in <ucd/ucd.h>.
9 years ago
Reece H. Dunn
bcf8be59b3
Support enabling the CSUR data.
10 years ago
Reece H. Dunn
28baabf72a
Remove the IANA subtag registry parser
This is not needed now that PropertyValueAliases is used for script
mapping.
10 years ago
Reece H. Dunn
1154409393
Use PropertyValueAliases for the script mapping.
The mapping of the script labels in the UCD data to ISO 15924
script tags is now done using the sc property map in the
PropertyValueAliases data.
This has the following benefits:
1. It removes the dependency on the IANA subtag registry.
2. It ensures the scripts are correct as specified in the
UCD data files.
10 years ago
Reece H. Dunn
8a8f021a2c
ucd: support parsing PropertyValueAliases data
10 years ago
Reece H. Dunn
9589e27f0f
tools/printdata.py: don't include CSUR data in the tests
11 years ago
Reece H. Dunn
ced06ed0f4
Do not include supplementary data in the UCD APIs.
This removes support for the CSUR (ConScript Unicode Registry) data
in the main Unicode APIs. This data should be accessed through a
different API.
11 years ago
Reece H. Dunn
b757f60c63
Unicode Character Data 7.0.0
11 years ago
Reece H. Dunn
4747999f57
tools/iana.py: read_data is not used, so remove it
11 years ago
Reece H. Dunn
88e72aeb0a
tools/ucd.py: support printing out the data as CSV with specified columns from the command-line
12 years ago
Reece H. Dunn
7e411b34e9
F8D0-F8FF: Klingon
12 years ago
Reece H. Dunn
c06f296d87
tools/scripts.py: merge some script set ranges
12 years ago
Reece H. Dunn
65f95033c8
Add support for querying the Script property.
12 years ago
Reece H. Dunn
349e225aae
Support mapping a General Category to a General Category Group.
12 years ago
Reece H. Dunn
6e15fd6d9b
Add tests for ucd::lookup_category_group.
12 years ago
Reece H. Dunn
3f9f6c0623
Add tests for ucd::isspace.
12 years ago
Reece H. Dunn
cc9abdff12
Fetch UnicodeData.txt from unicode.org if not present to make the build fully automated.
12 years ago
Reece H. Dunn
2d982956a5
Store the category data in uint8_t arrays to minimize their compiled size.
12 years ago
Reece H. Dunn
2df0e6abdb
Factor out the remaining single category tables.
12 years ago
Reece H. Dunn
7f1dd9cc96
Avoid duplicating Lo only tables.
12 years ago
Reece H. Dunn
ea09eb5c45
Add tests for querying UCD properties; fix discovered issues.
12 years ago
Reece H. Dunn
9c3a87dbeb
Add toupper, tolower and totitle case-conversion APIs.
12 years ago
Reece H. Dunn
e3e85d33f2
Rename Ci to Ii and move it to an I/Invalid category group as it is not part of the UCD C/Other category group.
12 years ago
Reece H. Dunn
bc6a5c23cc
Remove the Zc class as it is not part of the UCD; special case Cc-based whitespace instead.
12 years ago
Reece H. Dunn
ff7a5e0209
Add support for looking up the general category group for a codepoint.
12 years ago
Reece H. Dunn
a416b4090c
Display the Unicode Character Database version in the generated file.
12 years ago
Reece H. Dunn
12bafa6b4d
tools/categories.py: generate category lookup tables for the full Unicode range.
12 years ago
Reece H. Dunn
e6133fcafd
tools/ucd.py: fix up the codepoint ranges when processing the UnicodeData file.
12 years ago
Reece H. Dunn
a77e5a142c
tools/ucd.py: parse CodePoint/CodeRange entries to their numerical values.
12 years ago
Reece H. Dunn
2813950acc
Infrastructure for building libucd.a.
12 years ago
Reece H. Dunn
1b24e604ed
Parse the UCD data files.
12 years ago