unicodedata2's Introduction

unicodedata2

A backport of Python's unicodedata module, updated to newer Unicode versions.

The versions of this package match Unicode versions, so unicodedata2==13.0.0 is data from Unicode 13.0.0.

Pre-compiled wheel packages are available on PyPI and can be installed via pip.
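As a quick check after installing, the sketch below reads the Unicode version of whichever module is available. It assumes unicodedata2 exposes the same unidata_version attribute as the standard library module; the fallback import and the unicode_version helper are illustrative names, not part of the package.

```python
# Prefer the backport when it is installed; otherwise fall back to the
# standard library module (the fallback is an assumption for illustration).
try:
    import unicodedata2 as ucd
except ImportError:
    import unicodedata as ucd


def unicode_version(module=ucd):
    """Return the module's Unicode data version as a tuple of ints."""
    return tuple(int(part) for part in module.unidata_version.split("."))


# With unicodedata2==13.0.0 installed this should report (13, 0, 0).
print(ucd.__name__, unicode_version())
```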

Testing

We run the tests using tox. This can be installed as usual with pip install tox.

Without any options, tox will run the tests against all of the library's target Python versions. Any missing versions will be skipped.

To run tests against a specific Python version, use the -e option followed by a tox environment name. For example, -e py38 will run the tests against Python 3.8. For more information, see tox's documentation.

unicodedata2's People

Contributors

anthrotype, catap, harmon758, jayvdb, khaledhosny, madig, mikekap, moyogo, pnemade, scw, snoopj

unicodedata2's Issues

Release 14.0.0

I'm going to be working on the next release, 14.0.0. This is what I anticipate being done:

Feedback is welcome.

Unicode 12.0+ characters not supported

None of the characters added since Unicode 12.0 are recognized.

Doing

>>> import unicodedata
>>> unicodedata.name('𛅥') # or any other of the new characters like ᳺ or 𛅐

returns:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: no such name

I'm using Python 3.7 with unicodedata2-12.1.0.
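Note that the session above imports the standard library unicodedata, not unicodedata2; on Python 3.7 the standard module ships Unicode 11.0.0 data. A minimal sketch for comparing the two modules side by side follows. The describe helper is hypothetical, and unicodedata2 is treated as optional so the snippet also runs where it is not installed.

```python
import unicodedata

try:
    import unicodedata2
except ImportError:
    unicodedata2 = None  # backport not installed; only the stdlib is checked


def describe(ch):
    """Map each available module name to (Unicode version, name or None)."""
    modules = [unicodedata]
    if unicodedata2 is not None:
        modules.append(unicodedata2)
    # Passing a default to name() avoids the ValueError for unknown characters.
    return {m.__name__: (m.unidata_version, m.name(ch, None)) for m in modules}


# U+1B165 is the first character from the report above.
print(describe("\U0001B165"))
```

If the entry for unicodedata2 shows a name while the entry for unicodedata shows None, the lookup only fails because the wrong module was imported.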

Add old tables

It would be useful to be able to refer to tables for the previous versions of Unicode.

jquast/wcwidth#23 is attempting to do that.

It would also be very helpful to facilitate Python-based analysis of the changes in Unicode data.

It seems the build infrastructure of unicodedata2 is perfect for that.

In order to avoid forcing all users to install all data, perhaps a separate PyPI package name could be used for the 'all unicodedata versions' edition of this.

Distribute wheels

It would be great if this package provided installable wheels, so that users do not need to compile the package.

This would be especially useful for Python 2.6 and Python 2.7 on Linux, as Python 2.6.6, 2.7.0 and 2.7.1 have a regression in unicodedata.normalize with NFC, and this backport may be the only 'solution' to allow support for those versions. This is particularly important for Python 2.6.6, as it is the Python version provided by Red Hat Enterprise Linux.
http://bugs.python.org/issue10254
https://phabricator.wikimedia.org/T102461
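For readers who want to probe a given interpreter, here is a small NFC sanity check. It is only a generic composition test, not the specific failing input from the linked bug report, and the helper name is made up for illustration.

```python
import unicodedata


def nfc_composes_correctly(module=unicodedata):
    """Check that 'e' + COMBINING ACUTE ACCENT composes to U+00E9 under NFC."""
    decomposed = "e\u0301"
    return module.normalize("NFC", decomposed) == "\u00e9"


print(nfc_composes_correctly())
```

Passing the unicodedata2 module as the argument runs the same check against the backport.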

Support PyPy?

I don't know how involved this would be, but it would be nice to make this extension module work with the PyPy implementation as well as CPython.

This is the error I get if I try to compile the extension module with PyPy on my Mac:

$ python setup.py bdist_wheel
running bdist_wheel
running build
running build_ext
building 'unicodedata2' extension
creating build
creating build/temp.macosx-10.12-x86_64-2.7
creating build/temp.macosx-10.12-x86_64-2.7/unicodedata2
creating build/temp.macosx-10.12-x86_64-2.7/unicodedata2/py2
cc -pthread -arch x86_64 -DNDEBUG -O2 -fPIC -I./unicodedata2/py2 -I./unicodedata2/ -I/Users/cosimolupo/Documents/Github/pyenv/versions/unicodedata2-pypy2/include -c ./unicodedata2/py2/unicodedata.c -o build/temp.macosx-10.12-x86_64-2.7/./unicodedata2/py2/unicodedata.o
./unicodedata2/py2/unicodedata.c:16:10: fatal error: 'ucnhash.h' file not found
#include "ucnhash.h"
         ^~~~~~~~~~~
1 error generated.
error: command 'cc' failed with exit status 1

Some info on writing extension modules compatible with PyPy:
http://doc.pypy.org/en/latest/extending.html

Update for Unicode 15.1

With the release of Unicode 15.1, it's time for another update to unicodedata2.

Aside from bumping UNIDATA_VERSION, the new CJK Ideograph Extension I needs to be added to the unified CJK ranges, and there's also a bug in makeunicodedata.py in how it handles character properties. This may also be a good opportunity to review the changes from #58, and to add Python 3.12 (scheduled to come out on 2 October 2023) to the library's tox configuration.

I have opened an upstream PR, since this involves a bugfix in the CPython tooling. To minimize drift from the upstream, I won't open a PR for unicodedata2 until that one is merged, but I can say that applying the same changeset to the relevant files in unicodedata2 does work and the test suite passes.

EDIT: the upstream PR has landed, PR forthcoming.
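One way to confirm that the Extension I range made it into a build is to look up the names of its endpoints. The sketch below assumes the assigned range is U+2EBF0 to U+2EE5D and that names follow the usual "CJK UNIFIED IDEOGRAPH-XXXXX" pattern; the helper name and the fallback to the standard library module are illustrative only.

```python
try:
    import unicodedata2 as ucd
except ImportError:
    import unicodedata as ucd  # fallback so the sketch runs without the backport

# Assumed endpoints of the assigned CJK Unified Ideographs Extension I range.
EXT_I_FIRST = 0x2EBF0
EXT_I_LAST = 0x2EE5D


def supports_cjk_extension_i(module=ucd):
    """Return True if both range endpoints have the expected derived name."""
    return all(
        module.name(chr(cp), None) == "CJK UNIFIED IDEOGRAPH-%X" % cp
        for cp in (EXT_I_FIRST, EXT_I_LAST)
    )


print(ucd.unidata_version, supports_cjk_extension_i())
```

A build with data older than Unicode 15.1 is expected to print False here.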

Python 3 support

This package is most useful on Python 2, but it can also be useful on older Python 3.
I'd be happy to add the necessary compilation voodoo needed to support Python 3.

Synchronize with CPython

As mentioned in #55, unicodedata.c and other files extracted from CPython are pretty out of sync with the upstream. Filing this issue so that there's a coherent place for questions/discussion that might crop up in pursuit of that.

See also #39 which covers some of the same ground, but is itself out of date.

[feature-request] support Unicode 8.0.0?

I imagine this is already in the pipeline.

I'm not familiar with unicodedata internals. How hard is it to modify the unicodedata2 module to use updated data files from Unicode.org?

Thanks.

Drop test code for narrow builds of Python

There is some code in the unicodedata2 tests associated with narrow (UCS2-only) builds of Python. I believe this code was added (in #35) when unicodedata2 still supported Python 2 and Python 3 versions before 3.3, the release that removed narrow builds.

As far as I know, this code is entirely dead now that unicodedata2 does not support these older versions of Python.

(context: I noticed this while diffing the test suite against the upstream tests as I organize a PR for #60)
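To illustrate why the narrow-build branches cannot be reached any more: on Python 3.3 and later, sys.maxunicode is always 0x10FFFF and a character outside the Basic Multilingual Plane has length 1. The helper name below is made up for this sketch.

```python
import sys


def is_wide_build():
    """True when strings can hold any code point as a single character."""
    return sys.maxunicode == 0x10FFFF


# U+1F600 lies outside the BMP; a narrow build would store it as two units.
astral = "\U0001F600"
print(is_wide_build(), len(astral))
```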

Doesn't build for Python 3.11

Attempting to build unicodedata2 in a docker container (aarch64/amd64). Works fine using Python 3.10.4-slim-bullseye base image. Build fails using Python 3.11.0b1-slim-bullseye base image. No other changes made to the build process except the base container.

Error is as follows:

#17 17.42   Running setup.py install for unicodedata2: started
#17 17.42   Running command Running setup.py install for unicodedata2
#17 17.73   running build_ext
#17 17.74   building 'unicodedata2' extension
#17 17.74   creating build
#17 17.74   creating build/temp.linux-aarch64-cpython-311
#17 17.74   creating build/temp.linux-aarch64-cpython-311/unicodedata2
#17 17.74   gcc -pthread -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I./unicodedata2/ -I/usr/local/include/python3.11 -c ./unicodedata2/unicodectype.c -o build/temp.linux-aarch64-cpython-311/./unicodedata2/unicodectype.o
#17 18.07   gcc -pthread -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I./unicodedata2/ -I/usr/local/include/python3.11 -c ./unicodedata2/unicodedata.c -o build/temp.linux-aarch64-cpython-311/./unicodedata2/unicodedata.o
#17 18.39   ./unicodedata2/unicodedata.c: In function ‘PyInit_unicodedata2’:
#17 18.39   ./unicodedata2/unicodedata.c:1362:24: error: lvalue required as left operand of assignment
#17 18.39    1362 |     Py_TYPE(&UCD_Type) = &PyType_Type;
#17 18.39         |                        ^
#17 18.41   error: command '/usr/bin/gcc' failed with exit code 1
#17 18.43   error: subprocess-exited-with-error

request to move repo under fonttools organization

Hi @mikekap
would you be willing to share the maintenance burden with us by moving the repository under the fonttools organization?
We use your project in our open-source font build pipeline, and are interested in keeping it up to date with the latest Unicode versions as they get published upstream.
We would like to switch from Travis/Appveyor to GitHub Workflows, and it makes things easier when developers can access the repository settings (e.g. for setting an encrypted token in the repository's secrets to be able to upload wheels to PyPI).
Of course, you will continue to be the main admin and won't lose control of the project.
Let us know what you think,
thanks

/cc @madig

Python 2.6: undefined symbol: Py_TOUPPER

unicodedata2 compiles OK on Python 2.6.6, but it fails on import.

$ python2.6
Python 2.6.6 (r266:84292, Jun 18 2015, 19:35:20) 
[GCC 4.8.3 20140911 (Red Hat 4.8.3-7)] on linux4
Type "help", "copyright", "credits" or "license" for more information.
>>> import unicodedata2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: /usr/local/lib/python2.6/site-packages/unicodedata2.so: undefined symbol: Py_TOUPPER

On Python 2.7 it works OK.

Support localization

Please support localization of Unicode block descriptions and character descriptions. Translations are available from https://github.com/unicode-table/unicode-table-data/tree/master/loc. If possible, use a gettext approach similar to https://pypi.org/project/pycountry/. Implementing this feature would allow users to read Unicode descriptions in their own language, not only in English.

For example, names are currently available only in English:

from unicodedata import name
print(name('ß'))
LATIN SMALL LETTER SHARP S

So unicodedata2 could provide a way to translate LATIN SMALL LETTER SHARP S to, for example, German with (proposed code):

from unicodedata import name
from unicodedata2 import LOCALED_DIR
from gettext import translation
german = translation('UnicodeData2', LOCALED_DIR, languages=['de'])
german.install()
print(_(name('ß')))
LATEINISCHER KLEINBUCHSTABE SCHARFES S

Typing support

Would you be interested in adding type stubs if they were contributed to this repo?
And if not, could you give me permission (*) to contribute the stubs to https://github.com/python/typeshed instead?

(*) The permission is required per typeshed's contributing guidelines, but having the stubs for your library in typeshed does not require any work from you, as they are then maintained within the typeshed repo.
