
fonttools / unicodedata2


unicodedata backport/updates

License: Apache License 2.0

C 98.03% Python 1.97%

unicodedata2's Introduction

unicodedata2

unicodedata backport/updates.

The versions of this package match Unicode versions, so unicodedata2==13.0.0 is data from Unicode 13.0.0.

Pre-compiled wheel packages are available on PyPI and can be installed via pip.
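
A minimal sketch of checking which Unicode version is in use. It assumes that unicodedata2, being a backport of the standard-library unicodedata module, exposes the same `unidata_version` attribute. The fallback import is only there so the snippet also runs where unicodedata2 is not installed.

```python
# Prefer the backport; fall back to the standard library if it is not installed.
try:
    import unicodedata2 as ud
except ImportError:
    import unicodedata as ud

# unidata_version is a dotted string such as "13.0.0".
major, minor, patch = (int(part) for part in ud.unidata_version.split("."))
print(ud.__name__, "provides Unicode", ud.unidata_version)
```

Because package versions track Unicode versions, the printed version should agree with the installed unicodedata2 release when the backport is the module that was imported.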

Testing

We run the tests using tox. This can be installed as usual with pip install tox.

Without any options, tox will run the tests against all of the library's target Python versions. Any missing versions will be skipped.

To run tests against a specific python version you can use the -e option followed by a tox environment name. E.g. -e py38 will run tests against Python 3.8. For more info, check tox's documentation.


unicodedata2's Issues

Release 14.0.0

I'm going to be working on the next release, 14.0.0. This is what I anticipate being done:

Feedback is welcome.

unicode 12.0+ characters not supported

None of the new characters since unicode 12.0 are recognized.

Doing

>>> import unicodedata
>>> unicodedata.name('𛅥') # or any other of the new characters like ᳺ or 𛅐

returns:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: no such name

I'm using python3.7 with unicodedata2-12.1.0
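
One thing worth checking in a report like this: the session above imports the standard-library `unicodedata`, and installing unicodedata2 does not replace that module. Python 3.7's built-in module ships Unicode 11.0.0 data, which may explain the error. The sketch below compares both modules side by side; the `lookup` helper is a hypothetical name introduced here for illustration.

```python
import unicodedata

# The backport is optional here so the sketch runs without it.
try:
    import unicodedata2
except ImportError:
    unicodedata2 = None


def lookup(module, char):
    """Return the character name, or None if this module's data lacks it."""
    return module.name(char, None)


char = "𛅥"  # the first character quoted in the report above

for module in (unicodedata, unicodedata2):
    if module is not None:
        print(module.__name__, module.unidata_version, lookup(module, char))
```

Passing a default to `name()` avoids the `ValueError` and makes it easy to see which module knows the character.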

Add old tables

It would be useful to be able to refer to tables for the previous versions of Unicode.

jquast/wcwidth#23 is attempting to do that.

It would also be very helpful to facilitate Python-based analysis of the changes in Unicode data.

It seems the build infrastructure of unicodedata2 is perfect for that.

In order to avoid forcing all users to install all data, perhaps a separate PyPI package name could be used for the 'all unicodedata versions' edition of this.

Documentation

Please provide some documentation pages with examples of how to use this module, here in the README and, if possible, also at https://pypi.org/project/unicodedata2/

Distribute wheels

It would be great if this package provided installable wheels, so that users do not need to compile the package.

This would be especially useful for Python 2.6 and Python 2.7 on Linux, as Python 2.6.6, 2.7.0 and 2.7.1 have a regression in unicodedata.normalize NFC, and this backport may be the only 'solution' to allow support for those versions. This is particularly important for Python 2.6.6, as it is the Python version provided by Red Hat Enterprise Linux.
http://bugs.python.org/issue10254
https://phabricator.wikimedia.org/T102461
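
For readers unfamiliar with NFC, here is a minimal sketch of what a correct `normalize` call does. It does not reproduce the regression from the linked bug reports; it only shows the expected composition behaviour, using the backport when it is installed.

```python
# Prefer the backport; fall back to the standard library if it is not installed.
try:
    import unicodedata2 as ud
except ImportError:
    import unicodedata as ud

# "e" followed by U+0301 COMBINING ACUTE ACCENT (two code points).
decomposed = "e\u0301"

# NFC composes the pair into the single code point U+00E9.
composed = ud.normalize("NFC", decomposed)
print(len(decomposed), len(composed))
```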

Add is_normalized (port upstream)

There are a few commits at https://github.com/python/cpython/commits/master/Modules/unicodedata.c that aren’t in our versions. We should port/backport them.
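
A minimal sketch of how calling code could cope until the port lands. CPython's `unicodedata.is_normalized(form, unistr)` exists from Python 3.8; the `check_normalized` wrapper is a hypothetical helper that uses it when the imported module provides it and otherwise falls back to comparing against the normalized string.

```python
# Prefer the backport; fall back to the standard library if it is not installed.
try:
    import unicodedata2 as ud
except ImportError:
    import unicodedata as ud


def check_normalized(form, text):
    """Report whether text is already in the given normalization form."""
    native = getattr(ud, "is_normalized", None)
    if native is not None:
        return native(form, text)
    # Slower fallback: normalize and compare.
    return ud.normalize(form, text) == text


print(check_normalized("NFC", "\u00e9"))
print(check_normalized("NFC", "e\u0301"))
```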

Support PyPy?

I don't know how involved this would be, but it would be nice to make this extension module work with the PyPy implementation as well as CPython.

This is the error I get if I try to compile the extension module with PyPy on my Mac:

$ python setup.py bdist_wheel
running bdist_wheel
running build
running build_ext
building 'unicodedata2' extension
creating build
creating build/temp.macosx-10.12-x86_64-2.7
creating build/temp.macosx-10.12-x86_64-2.7/unicodedata2
creating build/temp.macosx-10.12-x86_64-2.7/unicodedata2/py2
cc -pthread -arch x86_64 -DNDEBUG -O2 -fPIC -I./unicodedata2/py2 -I./unicodedata2/ -I/Users/cosimolupo/Documents/Github/pyenv/versions/unicodedata2-pypy2/include -c ./unicodedata2/py2/unicodedata.c -o build/temp.macosx-10.12-x86_64-2.7/./unicodedata2/py2/unicodedata.o
./unicodedata2/py2/unicodedata.c:16:10: fatal error: 'ucnhash.h' file not found
#include "ucnhash.h"
         ^~~~~~~~~~~
1 error generated.
error: command 'cc' failed with exit status 1

Some info on writing extension modules compatible with PyPy:
http://doc.pypy.org/en/latest/extending.html

Unicode 12.1 Support

https://www.unicode.org/versions/Unicode12.1.0/
http://blog.unicode.org/2019/05/unicode-12-1-en.html
python/cpython@3aca40d
python/cpython#13214
https://bugs.python.org/issue36861

support Unicode 11

was published yesterday
http://blog.unicode.org/2018/06/announcing-unicode-standard-version-110.html

Update for Unicode 15.1

With the release of Unicode 15.1, it's time for another update to unicodedata2.

Aside from bumping UNIDATA_VERSION, the new CJK Ideograph Extension I needs to be added to the unified CJK ranges, and there's also a bug in makeunicodedata.py in how it handles character properties. This may also be a good opportunity to review the changes from #58, and to add Python 3.12 (scheduled to come out on 2 October 2023) to the library's tox configuration.

I have opened an upstream PR, since this involves a bugfix in the CPython tooling. I figure I won't open a PR for unicodedata2 until that is merged, to minimize drift from the upstream, but I can say that applying the same changeset to the relevant files in unicodedata2 does work and the test suite passes.

EDIT: the upstream PR has landed, PR forthcoming.
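
A minimal sketch of how one might check whether the loaded data covers the new block. It assumes U+2EBF0 is the first code point of CJK Unified Ideographs Extension I; `ideograph_name` is a hypothetical helper. On data older than 15.1 the second lookup is expected to return None.

```python
# Prefer the backport; fall back to the standard library if it is not installed.
try:
    import unicodedata2 as ud
except ImportError:
    import unicodedata as ud


def ideograph_name(code_point):
    """Return the name for a code point, or None if the data does not know it."""
    return ud.name(chr(code_point), None)


print(ud.unidata_version)
print(ideograph_name(0x4E00))   # an ideograph from the original CJK Unified Ideographs block
print(ideograph_name(0x2EBF0))  # assumed start of Extension I; None on older data
```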

Python 3 support

This package is most useful on Python 2, but it can also be useful on older Python 3.
I'd be happy to add the necessary compilation voodoo needed to support Python 3.

Synchronize with CPython

As mentioned in #55, unicodedata.c and other files extracted from CPython are pretty out of sync with the upstream. Filing this issue so that there's a coherent place for questions/discussion that might crop up in pursuit of that.

See also #39 which covers some of the same ground, but is itself out of date.

[feature-request] support Unicode 8.0.0?

I imagine this is already in the pipeline.

I'm not familiar with unicodedata internals. How hard is it to modify the unicodedata2 module to use updated data files from Unicode.org?

Thanks.

Unicode 9.0 support?

any plans of adding support for Unicode 9.0?
thanks!

Drop test code for narrow builds of Python

There is some code in the unicodedata2 tests associated with narrow (UCS2-only) builds of Python. I believe this code was added (in #35) when unicodedata2 still supported Python 2 and Python versions before 3.3, the release in which narrow builds were removed.

As far as I know, this code is entirely dead now that unicodedata2 does not support these older versions of Python.

(context: I noticed this while diffing the test suite against the upstream tests as I organize a PR for #60)
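
A minimal sketch of the kind of check such test code typically guards on. On Python 3.3 and later `sys.maxunicode` is always 0x10FFFF, so the narrow-build branch can never run; `is_narrow_build` is a hypothetical helper name.

```python
import sys


def is_narrow_build():
    """True only on old UCS2 builds, where sys.maxunicode was 0xFFFF."""
    return sys.maxunicode == 0xFFFF


# A code point outside the Basic Multilingual Plane.
astral = "\U0001F600"

# On a narrow build this string had length 2 (a surrogate pair).
print(is_narrow_build(), len(astral))
```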

Update to Unicode 13

https://unicode.org/versions/Unicode13.0.0/
https://bugs.python.org/issue39926

I'm gonna try to follow the instructions by @Harmon758 in #29 (comment)

Doesn't build for Python 3.11

Attempting to build unicodedata2 in a docker container (aarch64/amd64). Works fine using Python 3.10.4-slim-bullseye base image. Build fails using Python 3.11.0b1-slim-bullseye base image. No other changes made to the build process except the base container.

Error is as follows:

#17 17.42   Running setup.py install for unicodedata2: started
#17 17.42   Running command Running setup.py install for unicodedata2
#17 17.73   running build_ext
#17 17.74   building 'unicodedata2' extension
#17 17.74   creating build
#17 17.74   creating build/temp.linux-aarch64-cpython-311
#17 17.74   creating build/temp.linux-aarch64-cpython-311/unicodedata2
#17 17.74   gcc -pthread -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I./unicodedata2/ -I/usr/local/include/python3.11 -c ./unicodedata2/unicodectype.c -o build/temp.linux-aarch64-cpython-311/./unicodedata2/unicodectype.o
#17 18.07   gcc -pthread -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I./unicodedata2/ -I/usr/local/include/python3.11 -c ./unicodedata2/unicodedata.c -o build/temp.linux-aarch64-cpython-311/./unicodedata2/unicodedata.o
#17 18.39   ./unicodedata2/unicodedata.c: In function ‘PyInit_unicodedata2’:
#17 18.39   ./unicodedata2/unicodedata.c:1362:24: error: lvalue required as left operand of assignment
#17 18.39    1362 |     Py_TYPE(&UCD_Type) = &PyType_Type;
#17 18.39         |                        ^
#17 18.41   error: command '/usr/bin/gcc' failed with exit code 1
#17 18.43   error: subprocess-exited-with-error

Unicode 12.0 Support

https://unicode.org/versions/Unicode12.0.0/
http://blog.unicode.org/2019/03/announcing-unicode-standard-version-120.html
python/cpython@738c19f
python/cpython#12256
https://bugs.python.org/issue36252

request to move repo under fonttools organization

Hi @mikekap
would you be willing to share the maintenance burden with us by moving the repository under the fonttools organization?
We use your project in our open-source font build pipeline, and are interested in keeping it up to date with the latest Unicode versions as they get published upstream.
We would like to switch from Travis/Appveyor to Github Workflows and it makes things easier when developers can access the repository settings (e.g. for things like setting encrypted token in the repository's secrets to be able to upload wheels to PyPI).
Of course you will continue to be the main admin and won't lose control of the project.
Let us know what you think,
thanks

/cc @madig

PEP517

Could you move the buildsystem to PEP517?

Python 2.6: undefined symbol: Py_TOUPPER

unicodedata2 compiles OK on Python2.6.6, but on import it fails.

$ python2.6
Python 2.6.6 (r266:84292, Jun 18 2015, 19:35:20) 
[GCC 4.8.3 20140911 (Red Hat 4.8.3-7)] on linux4
Type "help", "copyright", "credits" or "license" for more information.
>>> import unicodedata2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: /usr/local/lib/python2.6/site-packages/unicodedata2.so: undefined symbol: Py_TOUPPER

On Python 2.7 it works OK.

add script data

Hi, thanks a lot for this wonderful and very useful code!

My main interest is to get the script for a character through UAX#24 (http://www.unicode.org/Public/UNIDATA/Scripts.txt), is it something that could be added? Thanks!
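
Until something like this is available, a pure-Python lookup is one workaround. The sketch below is not part of unicodedata2: `parse_scripts` and `script_of` are hypothetical helpers, and `SAMPLE` is a short excerpt in the Scripts.txt line format standing in for the full file linked above.

```python
import bisect

# A few lines in the Scripts.txt format, for illustration only.
SAMPLE = """\
# code point or range ; script name # comment
0041..005A    ; Latin # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
00AA          ; Latin # Lo       FEMININE ORDINAL INDICATOR
0370..0373    ; Greek # L&   [4] GREEK CAPITAL LETTER HETA..GREEK SMALL LETTER ARCHAIC SAMPI
"""


def parse_scripts(text):
    """Parse Scripts.txt-style text into sorted (first, last, script) tuples."""
    ranges = []
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not data:
            continue
        code_points, script = (field.strip() for field in data.split(";"))
        first, _, last = code_points.partition("..")
        ranges.append((int(first, 16), int(last or first, 16), script))
    ranges.sort()
    return ranges


def script_of(char, ranges, default="Unknown"):
    """Return the script of a character, or default if no range contains it."""
    code_point = ord(char)
    starts = [first for first, _, _ in ranges]
    index = bisect.bisect_right(starts, code_point) - 1
    if index >= 0 and ranges[index][0] <= code_point <= ranges[index][1]:
        return ranges[index][2]
    return default


ranges = parse_scripts(SAMPLE)
print(script_of("A", ranges), script_of("\u0370", ranges), script_of("1", ranges))
```

With the real Scripts.txt passed to `parse_scripts`, the same lookup would cover every assigned code point; building the `starts` list once rather than per call would be the obvious optimisation.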

Support localization

Please support localization for Unicode block descriptions and character descriptions. Translations are available from https://github.com/unicode-table/unicode-table-data/tree/master/loc
If possible, use a gettext approach similar to https://pypi.org/project/pycountry/
Implementing this feature would allow users to read Unicode descriptions in their own language rather than only in English.

For example, only English is currently possible:

from unicodedata import name
print(name('ß'))
LATIN SMALL LETTER SHARP S

So unicodedata2 could provide a way to translate LATIN SMALL LETTER SHARP S to, e.g., German with (proposed code):

from unicodedata import name
from unicodedata2 import LOCALED_DIR
from gettext import translation
german = translation('UnicodeData2', LOCALED_DIR, languages=['de'])
german.install()
print(_(name('ß')))
LATEINISCHER KLEINBUCHSTABE SCHARFES S
