
acora's Introduction

Hi there, I'm Stefan 👋

  • 🧑🏽‍💻 I am the maintainer of Cython, lxml and a few other Python data processing tools that you can find below.
  • 🧑🏽‍💻 I am also a core developer of CPython and contributor to many other open source software projects.
  • 🤝 In my city there's an old house from the 1600s with an inscription: "May God give to all those who know me, twice what they feel that they owe me" (*). With a donation, YOU can help me live the life that you think I deserve for the work that I do on Cython and on lxml, or my other projects.
  • 🤔 You can contact me at [email protected]
  • 😄 My pronouns are he/him
  • 🗣️ I speak and 🖋️ write in Deutsch, English, Français

(*) original German inscription: "Gott gebe allen, die mich kennen, doppelt so viel, als sie mir gönnen"

acora's People

Contributors

msabramo, scoder

acora's Issues

Python 3.11 support?

Hello, is this project dead, or is there a plan to support Python 3.11?

We've been using acora extensively in our projects. We are now migrating some important user scripts to run in a Python 3.11 environment. We decided to build acora manually, and this worked fine with Cython 0.29.36. However, we recently noticed a significant slowdown once we started building with Cython 3.0.0, and I'm unable to build using 3.0.1 or 3.0.2. Parsing one specific file takes about 0.5s with the fast build versus 18s with the slow build.

The slowdown is only observed on Python 3.11 with Cython >= 3.0.0; I cannot reproduce it on Python 3.9.

Here is the test I made, with pytest-benchmark:

from acora import AcoraBuilder
import string


def acora_find_all():
    builder = AcoraBuilder("ab")
    ac = builder.build()
    _ = ac.findall(string.ascii_lowercase * 100000)


def test_acora_findall(benchmark):
    benchmark(acora_find_all)

Results:

  • Python3.9 Cython==3.0.2:
---------------------------------------------- benchmark: 1 tests ---------------------------------------------
Name (time in ms)         Min      Max     Mean  StdDev   Median     IQR  Outliers      OPS  Rounds  Iterations
---------------------------------------------------------------------------------------------------------------
test_acora_findall    24.7092  31.3866  26.0660  1.3501  25.6888  0.7616       3;2  38.3642      23           1
---------------------------------------------------------------------------------------------------------------
  • Python 3.11 Cython==0.29.36
---------------------------------------------- benchmark: 1 tests ---------------------------------------------
Name (time in ms)         Min      Max     Mean  StdDev   Median     IQR  Outliers      OPS  Rounds  Iterations
---------------------------------------------------------------------------------------------------------------
test_acora_findall    18.3448  22.0383  19.7224  0.7655  19.5438  0.5929       8;5  50.7038      46           1
---------------------------------------------------------------------------------------------------------------
  • Python3.11 Cython==3.0.0
----------------------------------------------- benchmark: 1 tests -----------------------------------------------
Name (time in ms)          Min       Max      Mean  StdDev    Median     IQR  Outliers     OPS  Rounds  Iterations
------------------------------------------------------------------------------------------------------------------
test_acora_findall    173.6256  183.2379  178.6345  3.6809  178.3472  5.8426       2;0  5.5980       6           1
------------------------------------------------------------------------------------------------------------------
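
To put the three tables side by side, the slowdown factors can be computed from the reported mean times. This is only arithmetic on the numbers above; the dictionary layout is my own choice.

```python
# Mean times in milliseconds, copied from the benchmark tables above.
# Keys are (Python version, Cython version).
mean_ms = {
    ("3.9", "3.0.2"): 26.0660,
    ("3.11", "0.29.36"): 19.7224,
    ("3.11", "3.0.0"): 178.6345,
}

slow = mean_ms[("3.11", "3.0.0")]
vs_old_cython = slow / mean_ms[("3.11", "0.29.36")]
vs_py39 = slow / mean_ms[("3.9", "3.0.2")]

print(f"{vs_old_cython:.1f}x slower than Python 3.11 + Cython 0.29.36")
print(f"{vs_py39:.1f}x slower than Python 3.9 + Cython 3.0.2")
```

So the synthetic benchmark shows roughly a 9x regression, which is smaller than the 0.5s vs 18s difference seen on the real file.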

fused types

Would it be possible to fuse the _unicode and _byte functions? Apart from fetching the next element (and, technically, the hard-coded sizeofs), both Cython code paths are identical, I think. The pure-Python code is 100% identical. I'm not familiar with Cython, and the code seems more complicated than the standard ahocorasick implementation, so maybe not. I'm thinking of some lookup table mapping type -> get_next_element function. Perhaps the container type could then be extended to any (Python) sequence type, and the contained type to any (Python) comparable type. Cython automatically selects the fastest type, I think, so str->UCS4, bytes->char, int->int?, bool->bint, and so on.

I'd also like to support the C bitarray extension module, which is a char*-backed List[Bool], without the intermediate boxing of each bit into a Python boolean. Any ideas?
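
To make the idea concrete, here is a minimal pure-Python sketch of an Aho-Corasick matcher that works on any sequence of hashable items (str, bytes, tuples of bools). It is not acora's implementation and says nothing about how the Cython code would be fused; the names build_automaton and find_all are hypothetical, and matches are reported as (keyword, start position) tuples.

```python
from collections import deque


def build_automaton(keywords):
    """Build an Aho-Corasick automaton over sequences of hashable items."""
    # Parallel lists indexed by node id: transitions, failure link, outputs.
    goto, fail, out = [{}], [0], [[]]
    for kw in keywords:
        node = 0
        for item in kw:
            nxt = goto[node].get(item)
            if nxt is None:
                nxt = len(goto)
                goto[node][item] = nxt
                goto.append({})
                fail.append(0)
                out.append([])
            node = nxt
        out[node].append(kw)
    # Breadth-first pass: compute failure links and inherit their outputs.
    queue = deque(goto[0].values())
    while queue:
        node = queue.popleft()
        for item, child in goto[node].items():
            f = fail[node]
            while f and item not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(item, 0)
            out[child] = out[child] + out[fail[child]]
            queue.append(child)
    return goto, fail, out


def find_all(automaton, data):
    """Return (keyword, start) for every keyword occurrence in data."""
    goto, fail, out = automaton
    node, results = 0, []
    for pos, item in enumerate(data):
        while node and item not in goto[node]:
            node = fail[node]
        node = goto[node].get(item, 0)
        for kw in out[node]:
            results.append((kw, pos - len(kw) + 1))
    return results


text_hits = find_all(build_automaton(["ab", "b"]), "xaby")
byte_hits = find_all(build_automaton([b"\xa5\x66\x80"]), b"\xf0" * 10 + b"\xa5\x66\x80")
bool_hits = find_all(build_automaton([(True, False)]), [False, True, False, True])
```

The only type-specific step is iteration itself, which Python already dispatches per container. A typed Cython version would still need one specialization per element type to avoid boxing.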

Can't STOP when I want to use longest match

Hi, I came across a bug when using longest_match, as introduced in README.rst, to do a greedy search for the longest matching keywords.

My longest_match follows the README:

    def _longest_match(matches):
        spos_groupby_iter = groupby(matches, itemgetter(1)) # by spos
        for _, kw_iter_with_same_spos in spos_groupby_iter:
            l = list(kw_iter_with_same_spos)
            print(l)
            yield max(l) # max get the longest keyword

It never terminates when I match a sentence like

因为弱覆盖

(sorry for using Chinese; I did not have time to construct an English case) and the word dict has words like

覆盖
弱覆盖

What's more, the following two conditions do not trigger the bug:

  1. text 弱覆盖 is not the end of the text.

    like sentence = 因为弱覆盖的原因

  2. use list, instead of iterator.

    def _longest_match(matches):
        pre_get_list = list(matches) # get the list instead of the iterator
        spos_groupby_iter = groupby(pre_get_list, itemgetter(1)) # by spos
        for _, kw_iter_with_same_spos in spos_groupby_iter:
            l = list(kw_iter_with_same_spos)
            print(l)
            yield max(l) # max get the longest keyword

So maybe it is because a proper StopIteration is not raised? Sorry that I can't help at the moment. All I can do is report the bug...

I use unicode as the str representation, on Python 2, CentOS 7.2.
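
A sketch of a workaround along the lines of condition 2: materialize the matches first, then resolve overlaps by preferring the longer keyword. This is not the README recipe and does not explain the hang. The function name longest_matches is hypothetical, and the match list below is written by hand on the assumption that matches arrive as (keyword, start position) tuples.

```python
def longest_matches(matches):
    """Keep the longest keyword among overlapping (keyword, start) matches."""
    # sorted() consumes the iterator up front, so the grouping logic never
    # touches the lazy match iterator again.
    ordered = sorted(matches, key=lambda m: (m[1], -len(m[0])))
    result = []
    covered_until = 0  # end offset of the last match that was kept
    for keyword, start in ordered:
        if start >= covered_until:
            result.append((keyword, start))
            covered_until = start + len(keyword)
    return result


# Hand-written matches for the text 因为弱覆盖 with keywords 覆盖 and 弱覆盖.
matches = iter([("弱覆盖", 2), ("覆盖", 3)])
kept = longest_matches(matches)
print(kept)
```

Sorting by start position and then by descending length gives leftmost-longest behaviour; the shorter keyword 覆盖 is dropped because it lies inside the span already covered by 弱覆盖.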

Building mildly deep automatons takes a long time

With this snippet and the latest 2.0, which creates an automaton from 1000 strings of 2000 characters each, build() takes forever to complete; I eventually killed it:

>>> from array import array
>>> from acora import AcoraBuilder
>>> tks =[array('h', range(x, x+1000)).tostring() for x in range(1000)]
>>> builder = AcoraBuilder(*tks)
>>> ac=builder.build()
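
For reference, a Python 3 spelling of the same keyword generation, plus a quick check of how large the resulting keyword tree must be. It does not call acora; it only inspects the input data. array.tostring() no longer exists in current Python 3 releases, so tobytes() is used instead.

```python
from array import array

# Same keywords as in the snippet above, built with tobytes().
keywords = [array('h', range(x, x + 1000)).tobytes() for x in range(1000)]

item_size = array('h').itemsize  # 2 bytes on common platforms
first_items = {kw[:item_size] for kw in keywords}

# No two keywords start with the same element, so they share no prefix and
# the keyword tree needs one node per byte of every keyword.
no_shared_prefix = len(first_items) == len(keywords)
trie_nodes = sum(len(kw) for kw in keywords)

print(len(keywords), len(keywords[0]), no_shared_prefix, trie_nodes)
```

With 2-byte items this comes to 2,000,000 tree nodes before any failure links or DFA transitions are computed, which gives a sense of the input size even though the individual keywords are short.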

acora fails to match binary string patterns

Reported by Nate Lawson:

I have a problem with matching binary strings in a file. I've included a short snippet of code to demonstrate this. It works fine if the file is binary but the pattern is ASCII, but not if the pattern is also binary.

from acora import AcoraBuilder

# Works fine
#pattern = 'abc'

# Fails to match
pattern = '\xa5\x66\x80'

ac = AcoraBuilder(pattern).build()
mainString = (10 * '\xf0') + pattern
assert ac.findall(mainString) == [(pattern, 10)]

installation with pip fails on Solaris

When running pip install acora on a Solaris SunOS 5.11 machine, I get the following error:
unable to execute /opt/csw/bin/gcc-4.8: No such file or directory

gcc is installed, but not in that location.

How can I specify the correct location of gcc?

Thank you
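
One thing that may be worth trying: distutils-based builds on Unix generally honour the CC environment variable. Below is a small sketch that runs pip with CC set. The helper name pip_install_with_compiler and the compiler path are hypothetical, and whether CC alone is enough on this Solaris setup is an assumption (the linker command may need overriding too, e.g. via LDSHARED).

```python
import os
import subprocess
import sys


def pip_install_with_compiler(package, compiler_path, dry_run=False):
    """Run `pip install package` with CC pointing at compiler_path."""
    env = dict(os.environ, CC=compiler_path)
    cmd = [sys.executable, "-m", "pip", "install", package]
    if not dry_run:
        subprocess.check_call(cmd, env=env)
    return cmd, env


# dry_run=True only builds the command and environment without running pip.
# "/usr/bin/gcc" is a placeholder; use the real location of gcc.
cmd, env = pip_install_with_compiler("acora", "/usr/bin/gcc", dry_run=True)
print(" ".join(cmd), "with CC =", env["CC"])
```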

Searching for phrases across newlines.

I'd like to search a very long (OCR'd) text for a hundred or so multi-word phrases, where the phrases may appear with newlines where spaces should be. So I want to search for "the quick brown fox", and the text looks like:

the quick
brown fox
  • Is it possible to use acora for that? (How?)
  • Would it be possible to extend acora to treat all whitespace the same? Would you be open to a pull request to do so?
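
One approach that needs no change to acora is to normalize the text before searching and map match positions back afterwards. The sketch below collapses every whitespace run to a single space and records the original offset of each normalized character. The helper names are hypothetical, and str.find stands in for the actual keyword search.

```python
import re


def normalize_whitespace(text):
    """Collapse each whitespace run to one space.

    Returns the normalized text and, for each normalized character,
    its offset in the original text.
    """
    chars, offsets = [], []
    for m in re.finditer(r"\s+|\S", text):
        token = m.group()
        chars.append(" " if token.isspace() else token)
        offsets.append(m.start())
    return "".join(chars), offsets


def find_phrase(text, phrase):
    """Return the original-text offset of phrase, or -1 if absent."""
    normalized, offsets = normalize_whitespace(text)
    pos = normalized.find(phrase)  # stand-in for the real search
    return offsets[pos] if pos >= 0 else -1


text = "see  the quick\nbrown fox run"
print(normalize_whitespace(text)[0])
print(find_phrase(text, "the quick brown fox"))
```

The phrases themselves would need the same normalization, so that a phrase containing a tab or a double space still matches.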

cython compile error

I use Cython 0.19.1.

python setup.py build
running build
running build_py
running build_ext
cythoning acora/_acora.pyx to acora/_acora.c
building 'acora._acora' extension
gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/include/python2.7 -c acora/_acora.c -o build/temp.linux-x86_64-2.7/acora/_acora.o
gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro build/temp.linux-x86_64-2.7/acora/_acora.o -o build/lib.linux-x86_64-2.7/acora/_acora.so
cythoning acora/_nfa2dfa.py to acora/_nfa2dfa.c

Error compiling Cython file:

...

from _acora cimport _NfaState

cdef _visit_all(_NfaState tree, visitor)

@cython.locals(state=_NfaState, new_state=_NfaState, eq_states=set)

^

acora/_nfa2dfa.pxd:6:0: Cdef functions/classes cannot take arbitrary decorators.

building 'acora._nfa2dfa' extension
gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/include/python2.7 -c acora/_nfa2dfa.c -o build/temp.linux-x86_64-2.7/acora/_nfa2dfa.o
acora/_nfa2dfa.c:1:2: error: #error Do not use this file, it is the result of a failed Cython compilation.
error: command 'gcc' failed with exit status 1

Compilation succeeds after commenting out #@cython.locals(state=_NfaState, new_state=_NfaState, eq_states=set) in the _nfa2dfa.pxd file.

python test.py

WARNING: '_acora' C extension not imported, only testing Python implementation

Ran 46 tests in 0.930s

OK

Why is the Cython implementation not tested?

Segmentation fault

We're running into a segmentation fault when using acora in w3af:

kernel: [12705926.991919] python[21044]: segfault at 10 ip 00007f78305d7231 sp 00007f780ffb90a0 error 4 in _cacora.so[7f78305d1000+23000]

We replaced esm with acora a few days ago and the library was working fine until we used w3af to scan a specific target.

What information do you need in order to debug and fix this issue? I'm collecting kernel version, python version, and trying to come up with a minimalist PoC (1 file, ~30 lines of code) that will trigger the segmentation fault. Anything else?

The construction algorithm does not scale well.

The documentation notes that the construction algorithm is not suitable for more than a couple of thousand keywords. In my experience, the memory requirements go through the roof at around 10K keywords.

I am, however, quite motivated to (somehow) use Aho-Corasick search in Python with more than 100K keywords. Several workarounds are possible for my current use case, but if it is possible to improve the scalability of acora, I'd much rather spend my time contributing to that.

So, where do we start?
It seems that tree construction is possible in linear runtime: http://link.springer.com/chapter/10.1007%2F11496656_15

longest match

Hi,
I'm having issues with your sample code for longest_match.

  1. It doesn't seem to generate correct results for me.
    • It seems to group by string start rather than string end
    • It uses the maximum by string comparison, rather than maximum length
  2. For some reason my machine goes to 100% load and never finishes; I honestly don't understand why.

Below is an implementation that I believe gives "more correct" results for me:

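from itertools import groupby

def search_longest(matches):
    # Group matches by end position (keyword length + start offset).
    for pos, match_set in groupby(matches, lambda x: len(x[0]) + x[1]):
        yield max(match_set, key=lambda x: len(x[0]))  # longest keyword, not longest tuple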
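
A small illustration of why the comparison key matters, using hand-written (keyword, start) tuples rather than real acora output. Keywords that match at the same start position are prefixes of one another, so plain tuple comparison happens to pick the longest one. Keywords that match at the same end position are suffixes of one another, and there string comparison can pick a shorter keyword.

```python
# Matches that share a start position: each keyword is a prefix of the next.
same_start = [("a", 0), ("ab", 0), ("abc", 0)]

# Matches that share an end position: each keyword is a suffix of the first.
same_end = [("abc", 0), ("bc", 1), ("c", 2)]


def by_length(match):
    """Sort key: length of the keyword in a (keyword, start) tuple."""
    return len(match[0])


plain_start = max(same_start)               # tuple comparison
plain_end = max(same_end)                   # tuple comparison
longest_end = max(same_end, key=by_length)  # explicit length comparison

print(plain_start, plain_end, longest_end)
```

Here plain_end is ("c", 2), because "c" sorts after "bc" and "abc", while longest_end is ("abc", 0). Note that the key has to look at the keyword inside the tuple: the length of the tuple itself is always 2.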

Does not build with python3.9

FYI, this package does not install on Python versions after 3.8:

$ docker run -it --rm python:3.9.1-buster /bin/bash
root@3053158056ee:/# pip3 install acora
Collecting acora
  Downloading acora-2.2.tar.gz (210 kB)
     |████████████████████████████████| 210 kB 4.4 MB/s 
Building wheels for collected packages: acora
  Building wheel for acora (setup.py) ... error
  ERROR: Command errored out with exit status 1:
   command: /usr/local/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-0um7_g35/acora_ebe641025ee34edfbf51454eee319dc3/setup.py'"'"'; __file__='"'"'/tmp/pip-install-0um7_g35/acora_ebe641025ee34edfbf51454eee319dc3/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-zj45cgqs
       cwd: /tmp/pip-install-0um7_g35/acora_ebe641025ee34edfbf51454eee319dc3/
  Complete output (142 lines):
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build/lib.linux-x86_64-3.9
  creating build/lib.linux-x86_64-3.9/acora
  copying acora/__init__.py -> build/lib.linux-x86_64-3.9/acora
  copying acora/_acora.py -> build/lib.linux-x86_64-3.9/acora
  running build_ext
  building 'acora._acora' extension
  creating build/temp.linux-x86_64-3.9
  creating build/temp.linux-x86_64-3.9/acora
  gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I/usr/local/include/python3.9 -c acora/_acora.c -o build/temp.linux-x86_64-3.9/acora/_acora.o
  acora/_acora.c: In function ‘__Pyx_modinit_type_init_code’:
  acora/_acora.c:10468:43: error: ‘PyTypeObject’ {aka ‘struct _typeobject’} has no member named ‘tp_print’; did you mean ‘tp_dict’?
     __pyx_type_5acora_6_acora__MachineState.tp_print = 0;
                                             ^~~~~~~~
                                             tp_dict
  acora/_acora.c:10479:38: error: ‘PyTypeObject’ {aka ‘struct _typeobject’} has no member named ‘tp_print’; did you mean ‘tp_dict’?
     __pyx_type_5acora_6_acora__Machine.tp_print = 0;
                                        ^~~~~~~~
                                        tp_dict
  acora/_acora.c: In function ‘__Pyx_ParseOptionalKeywords’:
  acora/_acora.c:11096:21: warning: ‘_PyUnicode_get_wstr_length’ is deprecated [-Wdeprecated-declarations]
                       (PyUnicode_GET_SIZE(**name) != PyUnicode_GET_SIZE(key)) ? 1 :
                       ^
  In file included from /usr/local/include/python3.9/unicodeobject.h:1026,
                   from /usr/local/include/python3.9/Python.h:97,
                   from acora/_acora.c:16:
  /usr/local/include/python3.9/cpython/unicodeobject.h:446:26: note: declared here
   static inline Py_ssize_t _PyUnicode_get_wstr_length(PyObject *op) {
                            ^~~~~~~~~~~~~~~~~~~~~~~~~~
  acora/_acora.c:11096:21: warning: ‘PyUnicode_AsUnicode’ is deprecated [-Wdeprecated-declarations]
                       (PyUnicode_GET_SIZE(**name) != PyUnicode_GET_SIZE(key)) ? 1 :
                       ^
  In file included from /usr/local/include/python3.9/unicodeobject.h:1026,
                   from /usr/local/include/python3.9/Python.h:97,
                   from acora/_acora.c:16:
  /usr/local/include/python3.9/cpython/unicodeobject.h:580:45: note: declared here
   Py_DEPRECATED(3.3) PyAPI_FUNC(Py_UNICODE *) PyUnicode_AsUnicode(
                                               ^~~~~~~~~~~~~~~~~~~
  acora/_acora.c:11096:21: warning: ‘_PyUnicode_get_wstr_length’ is deprecated [-Wdeprecated-declarations]
                       (PyUnicode_GET_SIZE(**name) != PyUnicode_GET_SIZE(key)) ? 1 :
                       ^
  In file included from /usr/local/include/python3.9/unicodeobject.h:1026,
                   from /usr/local/include/python3.9/Python.h:97,
                   from acora/_acora.c:16:
  /usr/local/include/python3.9/cpython/unicodeobject.h:446:26: note: declared here
   static inline Py_ssize_t _PyUnicode_get_wstr_length(PyObject *op) {
                            ^~~~~~~~~~~~~~~~~~~~~~~~~~
  acora/_acora.c:11096:21: warning: ‘_PyUnicode_get_wstr_length’ is deprecated [-Wdeprecated-declarations]
                       (PyUnicode_GET_SIZE(**name) != PyUnicode_GET_SIZE(key)) ? 1 :
                       ^
  In file included from /usr/local/include/python3.9/unicodeobject.h:1026,
                   from /usr/local/include/python3.9/Python.h:97,
                   from acora/_acora.c:16:
  /usr/local/include/python3.9/cpython/unicodeobject.h:446:26: note: declared here
   static inline Py_ssize_t _PyUnicode_get_wstr_length(PyObject *op) {
                            ^~~~~~~~~~~~~~~~~~~~~~~~~~
  acora/_acora.c:11096:21: warning: ‘PyUnicode_AsUnicode’ is deprecated [-Wdeprecated-declarations]
                       (PyUnicode_GET_SIZE(**name) != PyUnicode_GET_SIZE(key)) ? 1 :
                       ^
  In file included from /usr/local/include/python3.9/unicodeobject.h:1026,
                   from /usr/local/include/python3.9/Python.h:97,
                   from acora/_acora.c:16:
  /usr/local/include/python3.9/cpython/unicodeobject.h:580:45: note: declared here
   Py_DEPRECATED(3.3) PyAPI_FUNC(Py_UNICODE *) PyUnicode_AsUnicode(
                                               ^~~~~~~~~~~~~~~~~~~
  acora/_acora.c:11096:21: warning: ‘_PyUnicode_get_wstr_length’ is deprecated [-Wdeprecated-declarations]
                       (PyUnicode_GET_SIZE(**name) != PyUnicode_GET_SIZE(key)) ? 1 :
                       ^
  In file included from /usr/local/include/python3.9/unicodeobject.h:1026,
                   from /usr/local/include/python3.9/Python.h:97,
                   from acora/_acora.c:16:
  /usr/local/include/python3.9/cpython/unicodeobject.h:446:26: note: declared here
   static inline Py_ssize_t _PyUnicode_get_wstr_length(PyObject *op) {
                            ^~~~~~~~~~~~~~~~~~~~~~~~~~
  acora/_acora.c:11112:25: warning: ‘_PyUnicode_get_wstr_length’ is deprecated [-Wdeprecated-declarations]
                           (PyUnicode_GET_SIZE(**argname) != PyUnicode_GET_SIZE(key)) ? 1 :
                           ^
  In file included from /usr/local/include/python3.9/unicodeobject.h:1026,
                   from /usr/local/include/python3.9/Python.h:97,
                   from acora/_acora.c:16:
  /usr/local/include/python3.9/cpython/unicodeobject.h:446:26: note: declared here
   static inline Py_ssize_t _PyUnicode_get_wstr_length(PyObject *op) {
                            ^~~~~~~~~~~~~~~~~~~~~~~~~~
  acora/_acora.c:11112:25: warning: ‘PyUnicode_AsUnicode’ is deprecated [-Wdeprecated-declarations]
                           (PyUnicode_GET_SIZE(**argname) != PyUnicode_GET_SIZE(key)) ? 1 :
                           ^
  In file included from /usr/local/include/python3.9/unicodeobject.h:1026,
                   from /usr/local/include/python3.9/Python.h:97,
                   from acora/_acora.c:16:
  /usr/local/include/python3.9/cpython/unicodeobject.h:580:45: note: declared here
   Py_DEPRECATED(3.3) PyAPI_FUNC(Py_UNICODE *) PyUnicode_AsUnicode(
                                               ^~~~~~~~~~~~~~~~~~~
  acora/_acora.c:11112:25: warning: ‘_PyUnicode_get_wstr_length’ is deprecated [-Wdeprecated-declarations]
                           (PyUnicode_GET_SIZE(**argname) != PyUnicode_GET_SIZE(key)) ? 1 :
                           ^
  In file included from /usr/local/include/python3.9/unicodeobject.h:1026,
                   from /usr/local/include/python3.9/Python.h:97,
                   from acora/_acora.c:16:
  /usr/local/include/python3.9/cpython/unicodeobject.h:446:26: note: declared here
   static inline Py_ssize_t _PyUnicode_get_wstr_length(PyObject *op) {
                            ^~~~~~~~~~~~~~~~~~~~~~~~~~
  acora/_acora.c:11112:25: warning: ‘_PyUnicode_get_wstr_length’ is deprecated [-Wdeprecated-declarations]
                           (PyUnicode_GET_SIZE(**argname) != PyUnicode_GET_SIZE(key)) ? 1 :
                           ^
  In file included from /usr/local/include/python3.9/unicodeobject.h:1026,
                   from /usr/local/include/python3.9/Python.h:97,
                   from acora/_acora.c:16:
  /usr/local/include/python3.9/cpython/unicodeobject.h:446:26: note: declared here
   static inline Py_ssize_t _PyUnicode_get_wstr_length(PyObject *op) {
                            ^~~~~~~~~~~~~~~~~~~~~~~~~~
  acora/_acora.c:11112:25: warning: ‘PyUnicode_AsUnicode’ is deprecated [-Wdeprecated-declarations]
                           (PyUnicode_GET_SIZE(**argname) != PyUnicode_GET_SIZE(key)) ? 1 :
                           ^
  In file included from /usr/local/include/python3.9/unicodeobject.h:1026,
                   from /usr/local/include/python3.9/Python.h:97,
                   from acora/_acora.c:16:
  /usr/local/include/python3.9/cpython/unicodeobject.h:580:45: note: declared here
   Py_DEPRECATED(3.3) PyAPI_FUNC(Py_UNICODE *) PyUnicode_AsUnicode(
                                               ^~~~~~~~~~~~~~~~~~~
  acora/_acora.c:11112:25: warning: ‘_PyUnicode_get_wstr_length’ is deprecated [-Wdeprecated-declarations]
                           (PyUnicode_GET_SIZE(**argname) != PyUnicode_GET_SIZE(key)) ? 1 :
                           ^
  In file included from /usr/local/include/python3.9/unicodeobject.h:1026,
                   from /usr/local/include/python3.9/Python.h:97,
                   from acora/_acora.c:16:
  /usr/local/include/python3.9/cpython/unicodeobject.h:446:26: note: declared here
   static inline Py_ssize_t _PyUnicode_get_wstr_length(PyObject *op) {
                            ^~~~~~~~~~~~~~~~~~~~~~~~~~
  acora/_acora.c: In function ‘__Pyx_decode_c_bytes’:
  acora/_acora.c:13555:9: warning: ‘PyUnicode_FromUnicode’ is deprecated [-Wdeprecated-declarations]
           return PyUnicode_FromUnicode(NULL, 0);
           ^~~~~~
  In file included from /usr/local/include/python3.9/unicodeobject.h:1026,
                   from /usr/local/include/python3.9/Python.h:97,
                   from acora/_acora.c:16:
  /usr/local/include/python3.9/cpython/unicodeobject.h:551:42: note: declared here
   Py_DEPRECATED(3.3) PyAPI_FUNC(PyObject*) PyUnicode_FromUnicode(
                                            ^~~~~~~~~~~~~~~~~~~~~
  error: command '/usr/bin/gcc' failed with exit code 1
  ----------------------------------------
  ERROR: Failed building wheel for acora
  Running setup.py clean for acora
Failed to build acora
Installing collected packages: acora
    Running setup.py install for acora ... error
    ERROR: Command errored out with exit status 1:
     command: /usr/local/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-0um7_g35/acora_ebe641025ee34edfbf51454eee319dc3/setup.py'"'"'; __file__='"'"'/tmp/pip-install-0um7_g35/acora_ebe641025ee34edfbf51454eee319dc3/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-1phktx0j/install-record.txt --single-version-externally-managed --compile --install-headers /usr/local/include/python3.9/acora
         cwd: /tmp/pip-install-0um7_g35/acora_ebe641025ee34edfbf51454eee319dc3/
    Complete output (142 lines):
    running install
    running build
    running build_py
    creating build
    creating build/lib.linux-x86_64-3.9
    creating build/lib.linux-x86_64-3.9/acora
    copying acora/__init__.py -> build/lib.linux-x86_64-3.9/acora
    copying acora/_acora.py -> build/lib.linux-x86_64-3.9/acora
    running build_ext
    building 'acora._acora' extension
    creating build/temp.linux-x86_64-3.9
    creating build/temp.linux-x86_64-3.9/acora
    gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I/usr/local/include/python3.9 -c acora/_acora.c -o build/temp.linux-x86_64-3.9/acora/_acora.o
    acora/_acora.c: In function ‘__Pyx_modinit_type_init_code’:
    acora/_acora.c:10468:43: error: ‘PyTypeObject’ {aka ‘struct _typeobject’} has no member named ‘tp_print’; did you mean ‘tp_dict’?
       __pyx_type_5acora_6_acora__MachineState.tp_print = 0;
                                               ^~~~~~~~
                                               tp_dict
    acora/_acora.c:10479:38: error: ‘PyTypeObject’ {aka ‘struct _typeobject’} has no member named ‘tp_print’; did you mean ‘tp_dict’?
       __pyx_type_5acora_6_acora__Machine.tp_print = 0;
                                          ^~~~~~~~
                                          tp_dict
    acora/_acora.c: In function ‘__Pyx_ParseOptionalKeywords’:
    acora/_acora.c:11096:21: warning: ‘_PyUnicode_get_wstr_length’ is deprecated [-Wdeprecated-declarations]
                         (PyUnicode_GET_SIZE(**name) != PyUnicode_GET_SIZE(key)) ? 1 :
                         ^
    In file included from /usr/local/include/python3.9/unicodeobject.h:1026,
                     from /usr/local/include/python3.9/Python.h:97,
                     from acora/_acora.c:16:
    /usr/local/include/python3.9/cpython/unicodeobject.h:446:26: note: declared here
     static inline Py_ssize_t _PyUnicode_get_wstr_length(PyObject *op) {
                              ^~~~~~~~~~~~~~~~~~~~~~~~~~
    acora/_acora.c:11096:21: warning: ‘PyUnicode_AsUnicode’ is deprecated [-Wdeprecated-declarations]
                         (PyUnicode_GET_SIZE(**name) != PyUnicode_GET_SIZE(key)) ? 1 :
                         ^
    In file included from /usr/local/include/python3.9/unicodeobject.h:1026,
                     from /usr/local/include/python3.9/Python.h:97,
                     from acora/_acora.c:16:
    /usr/local/include/python3.9/cpython/unicodeobject.h:580:45: note: declared here
     Py_DEPRECATED(3.3) PyAPI_FUNC(Py_UNICODE *) PyUnicode_AsUnicode(
                                                 ^~~~~~~~~~~~~~~~~~~
    acora/_acora.c:11096:21: warning: ‘_PyUnicode_get_wstr_length’ is deprecated [-Wdeprecated-declarations]
                         (PyUnicode_GET_SIZE(**name) != PyUnicode_GET_SIZE(key)) ? 1 :
                         ^
    In file included from /usr/local/include/python3.9/unicodeobject.h:1026,
                     from /usr/local/include/python3.9/Python.h:97,
                     from acora/_acora.c:16:
    /usr/local/include/python3.9/cpython/unicodeobject.h:446:26: note: declared here
     static inline Py_ssize_t _PyUnicode_get_wstr_length(PyObject *op) {
                              ^~~~~~~~~~~~~~~~~~~~~~~~~~
    acora/_acora.c:11096:21: warning: ‘_PyUnicode_get_wstr_length’ is deprecated [-Wdeprecated-declarations]
                         (PyUnicode_GET_SIZE(**name) != PyUnicode_GET_SIZE(key)) ? 1 :
                         ^
    In file included from /usr/local/include/python3.9/unicodeobject.h:1026,
                     from /usr/local/include/python3.9/Python.h:97,
                     from acora/_acora.c:16:
    /usr/local/include/python3.9/cpython/unicodeobject.h:446:26: note: declared here
     static inline Py_ssize_t _PyUnicode_get_wstr_length(PyObject *op) {
                              ^~~~~~~~~~~~~~~~~~~~~~~~~~
    acora/_acora.c:11096:21: warning: ‘PyUnicode_AsUnicode’ is deprecated [-Wdeprecated-declarations]
                         (PyUnicode_GET_SIZE(**name) != PyUnicode_GET_SIZE(key)) ? 1 :
                         ^
    In file included from /usr/local/include/python3.9/unicodeobject.h:1026,
                     from /usr/local/include/python3.9/Python.h:97,
                     from acora/_acora.c:16:
    /usr/local/include/python3.9/cpython/unicodeobject.h:580:45: note: declared here
     Py_DEPRECATED(3.3) PyAPI_FUNC(Py_UNICODE *) PyUnicode_AsUnicode(
                                                 ^~~~~~~~~~~~~~~~~~~
    acora/_acora.c:11096:21: warning: ‘_PyUnicode_get_wstr_length’ is deprecated [-Wdeprecated-declarations]
                         (PyUnicode_GET_SIZE(**name) != PyUnicode_GET_SIZE(key)) ? 1 :
                         ^
    In file included from /usr/local/include/python3.9/unicodeobject.h:1026,
                     from /usr/local/include/python3.9/Python.h:97,
                     from acora/_acora.c:16:
    /usr/local/include/python3.9/cpython/unicodeobject.h:446:26: note: declared here
     static inline Py_ssize_t _PyUnicode_get_wstr_length(PyObject *op) {
                              ^~~~~~~~~~~~~~~~~~~~~~~~~~
    acora/_acora.c:11112:25: warning: ‘_PyUnicode_get_wstr_length’ is deprecated [-Wdeprecated-declarations]
                             (PyUnicode_GET_SIZE(**argname) != PyUnicode_GET_SIZE(key)) ? 1 :
                             ^
    In file included from /usr/local/include/python3.9/unicodeobject.h:1026,
                     from /usr/local/include/python3.9/Python.h:97,
                     from acora/_acora.c:16:
    /usr/local/include/python3.9/cpython/unicodeobject.h:446:26: note: declared here
     static inline Py_ssize_t _PyUnicode_get_wstr_length(PyObject *op) {
                              ^~~~~~~~~~~~~~~~~~~~~~~~~~
    acora/_acora.c:11112:25: warning: ‘PyUnicode_AsUnicode’ is deprecated [-Wdeprecated-declarations]
                             (PyUnicode_GET_SIZE(**argname) != PyUnicode_GET_SIZE(key)) ? 1 :
                             ^
    In file included from /usr/local/include/python3.9/unicodeobject.h:1026,
                     from /usr/local/include/python3.9/Python.h:97,
                     from acora/_acora.c:16:
    /usr/local/include/python3.9/cpython/unicodeobject.h:580:45: note: declared here
     Py_DEPRECATED(3.3) PyAPI_FUNC(Py_UNICODE *) PyUnicode_AsUnicode(
                                                 ^~~~~~~~~~~~~~~~~~~
    acora/_acora.c:11112:25: warning: ‘_PyUnicode_get_wstr_length’ is deprecated [-Wdeprecated-declarations]
                             (PyUnicode_GET_SIZE(**argname) != PyUnicode_GET_SIZE(key)) ? 1 :
                             ^
    In file included from /usr/local/include/python3.9/unicodeobject.h:1026,
                     from /usr/local/include/python3.9/Python.h:97,
                     from acora/_acora.c:16:
    /usr/local/include/python3.9/cpython/unicodeobject.h:446:26: note: declared here
     static inline Py_ssize_t _PyUnicode_get_wstr_length(PyObject *op) {
                              ^~~~~~~~~~~~~~~~~~~~~~~~~~
    acora/_acora.c:11112:25: warning: ‘_PyUnicode_get_wstr_length’ is deprecated [-Wdeprecated-declarations]
                             (PyUnicode_GET_SIZE(**argname) != PyUnicode_GET_SIZE(key)) ? 1 :
                             ^
    In file included from /usr/local/include/python3.9/unicodeobject.h:1026,
                     from /usr/local/include/python3.9/Python.h:97,
                     from acora/_acora.c:16:
    /usr/local/include/python3.9/cpython/unicodeobject.h:446:26: note: declared here
     static inline Py_ssize_t _PyUnicode_get_wstr_length(PyObject *op) {
                              ^~~~~~~~~~~~~~~~~~~~~~~~~~
    acora/_acora.c:11112:25: warning: ‘PyUnicode_AsUnicode’ is deprecated [-Wdeprecated-declarations]
                             (PyUnicode_GET_SIZE(**argname) != PyUnicode_GET_SIZE(key)) ? 1 :
                             ^
    In file included from /usr/local/include/python3.9/unicodeobject.h:1026,
                     from /usr/local/include/python3.9/Python.h:97,
                     from acora/_acora.c:16:
    /usr/local/include/python3.9/cpython/unicodeobject.h:580:45: note: declared here
     Py_DEPRECATED(3.3) PyAPI_FUNC(Py_UNICODE *) PyUnicode_AsUnicode(
                                                 ^~~~~~~~~~~~~~~~~~~
    acora/_acora.c:11112:25: warning: ‘_PyUnicode_get_wstr_length’ is deprecated [-Wdeprecated-declarations]
                             (PyUnicode_GET_SIZE(**argname) != PyUnicode_GET_SIZE(key)) ? 1 :
                             ^
    In file included from /usr/local/include/python3.9/unicodeobject.h:1026,
                     from /usr/local/include/python3.9/Python.h:97,
                     from acora/_acora.c:16:
    /usr/local/include/python3.9/cpython/unicodeobject.h:446:26: note: declared here
     static inline Py_ssize_t _PyUnicode_get_wstr_length(PyObject *op) {
                              ^~~~~~~~~~~~~~~~~~~~~~~~~~
    acora/_acora.c: In function ‘__Pyx_decode_c_bytes’:
    acora/_acora.c:13555:9: warning: ‘PyUnicode_FromUnicode’ is deprecated [-Wdeprecated-declarations]
             return PyUnicode_FromUnicode(NULL, 0);
             ^~~~~~
    In file included from /usr/local/include/python3.9/unicodeobject.h:1026,
                     from /usr/local/include/python3.9/Python.h:97,
                     from acora/_acora.c:16:
    /usr/local/include/python3.9/cpython/unicodeobject.h:551:42: note: declared here
     Py_DEPRECATED(3.3) PyAPI_FUNC(PyObject*) PyUnicode_FromUnicode(
                                              ^~~~~~~~~~~~~~~~~~~~~
    error: command '/usr/bin/gcc' failed with exit code 1
    ----------------------------------------
ERROR: Command errored out with exit status 1: /usr/local/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-0um7_g35/acora_ebe641025ee34edfbf51454eee319dc3/setup.py'"'"'; __file__='"'"'/tmp/pip-install-0um7_g35/acora_ebe641025ee34edfbf51454eee319dc3/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-1phktx0j/install-record.txt --single-version-externally-managed --compile --install-headers /usr/local/include/python3.9/acora Check the logs for full command output.

This only happens from Python 3.9 onwards; the latest 3.8.x image installs it fine.
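The warnings in the log show that the shipped `acora/_acora.c` calls `Py_UNICODE` APIs that the headers mark as deprecated since 3.3. One way to check whether a generated C file still references them (for example before and after regenerating it with a different Cython version) is a small scan like the sketch below. The function name `count_deprecated_calls` is hypothetical, the API names are copied from the warnings, and a plain text match cannot tell whether a call sits behind a preprocessor guard, so treat the result as a hint only.

```python
import re
from collections import Counter

# API names copied from the deprecation warnings in the build log.
DEPRECATED_UNICODE_APIS = (
    "PyUnicode_GET_SIZE",
    "PyUnicode_AsUnicode",
    "PyUnicode_FromUnicode",
)

# Match the name only when it is followed by "(", so that longer
# identifiers sharing the same prefix are not counted.
_PATTERN = re.compile(r"\b(" + "|".join(DEPRECATED_UNICODE_APIS) + r")\s*\(")


def count_deprecated_calls(c_source):
    """Count textual call sites of the deprecated APIs in C source text."""
    return Counter(match.group(1) for match in _PATTERN.finditer(c_source))


# Two lines taken from the log, used here as sample input.
sample = """
(PyUnicode_GET_SIZE(**argname) != PyUnicode_GET_SIZE(key)) ? 1 :
return PyUnicode_FromUnicode(NULL, 0);
"""
counts = count_deprecated_calls(sample)
```

To scan a real file, read it first, e.g. `count_deprecated_calls(open("acora/_acora.c").read())`, assuming that path exists in your checkout.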

Can't build Acora 1.7

I tried to build acora 1.7 with both Python 3.2 and Python 2.7, as follows:

jd@outcrop:~/software/acora-1.7$ rm -rf build
jd@outcrop:~/software/acora-1.7$ python3 setup.py build
running build
running build_py
creating build
creating build/lib.linux-x86_64-3.2
creating build/lib.linux-x86_64-3.2/acora
copying acora/nfa2dfa.py -> build/lib.linux-x86_64-3.2/acora
copying acora/__init__.py -> build/lib.linux-x86_64-3.2/acora
copying acora/_nfa2dfa.py -> build/lib.linux-x86_64-3.2/acora
running build_ext
building 'acora._acora' extension
creating build/temp.linux-x86_64-3.2
creating build/temp.linux-x86_64-3.2/acora
gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/include/python3.2mu -c acora/_acora.c -o build/temp.linux-x86_64-3.2/acora/_acora.o
gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions build/temp.linux-x86_64-3.2/acora/_acora.o -o build/lib.linux-x86_64-3.2/acora/_acora.cpython-32mu.so
building 'acora._nfa2dfa' extension
gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/include/python3.2mu -c acora/_nfa2dfa.c -o build/temp.linux-x86_64-3.2/acora/_nfa2dfa.o
acora/_nfa2dfa.c: In function ‘__pyx_pf_5acora_8_nfa2dfa_8NfaState_1__init__’:
acora/_nfa2dfa.c:886:13: warning: variable ‘__pyx_v_state_id’ set but not used [-Wunused-but-set-variable]
acora/_nfa2dfa.c: In function ‘__pyx_pf_5acora_8_nfa2dfa_8NfaState_7__deepcopy__’:
acora/_nfa2dfa.c:1410:13: warning: variable ‘__pyx_v_memo’ set but not used [-Wunused-but-set-variable]
acora/_nfa2dfa.c: In function ‘__pyx_f_5acora_8_nfa2dfa__visit_all’:
acora/_nfa2dfa.c:2025:43: error: ‘struct __pyx_obj_5acora_6_acora__NfaState’ has no member named ‘None’
acora/_nfa2dfa.c: In function ‘__pyx_f_5acora_8_nfa2dfa_nfa2dfa’:
acora/_nfa2dfa.c:2423:45: error: ‘struct __pyx_obj_5acora_6_acora__NfaState’ has no member named ‘None’
error: command 'gcc' failed with exit status 1
jd@outcrop:~/software/acora-1.7$ echo $PYTHONPATH

jd@outcrop:~/software/acora-1.7$ python3 --version
Python 3.2

jd@outcrop:~/software/acora-1.7$ python setup.py build
running build
running build_py
creating build
creating build/lib.linux-x86_64-2.7
creating build/lib.linux-x86_64-2.7/acora
copying acora/nfa2dfa.py -> build/lib.linux-x86_64-2.7/acora
copying acora/__init__.py -> build/lib.linux-x86_64-2.7/acora
copying acora/_nfa2dfa.py -> build/lib.linux-x86_64-2.7/acora
running build_ext
building 'acora._acora' extension
creating build/temp.linux-x86_64-2.7
creating build/temp.linux-x86_64-2.7/acora
gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/include/python2.7 -c acora/_acora.c -o build/temp.linux-x86_64-2.7/acora/_acora.o
gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions build/temp.linux-x86_64-2.7/acora/_acora.o -o build/lib.linux-x86_64-2.7/acora/_acora.so
building 'acora._nfa2dfa' extension
gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/include/python2.7 -c acora/_nfa2dfa.c -o build/temp.linux-x86_64-2.7/acora/_nfa2dfa.o
acora/_nfa2dfa.c: In function ‘__pyx_pf_5acora_8_nfa2dfa_8NfaState_1__init__’:
acora/_nfa2dfa.c:886:13: warning: variable ‘__pyx_v_state_id’ set but not used [-Wunused-but-set-variable]
acora/_nfa2dfa.c: In function ‘__pyx_pf_5acora_8_nfa2dfa_8NfaState_7__deepcopy__’:
acora/_nfa2dfa.c:1410:13: warning: variable ‘__pyx_v_memo’ set but not used [-Wunused-but-set-variable]
acora/_nfa2dfa.c: In function ‘__pyx_f_5acora_8_nfa2dfa__visit_all’:
acora/_nfa2dfa.c:2025:43: error: ‘struct __pyx_obj_5acora_6_acora__NfaState’ has no member named ‘None’
acora/_nfa2dfa.c: In function ‘__pyx_f_5acora_8_nfa2dfa_nfa2dfa’:
acora/_nfa2dfa.c:2423:45: error: ‘struct __pyx_obj_5acora_6_acora__NfaState’ has no member named ‘None’
error: command 'gcc' failed with exit status 1
jd@outcrop:~/software/acora-1.7$ python --version
Python 2.7.1+

Any hints? Is this a problem with my environment? As shown above, PYTHONPATH is unset.

Non-ISO-Latin/ASCII character sets

I would like to use acora to search Japanese text for keywords. It claims to be Unicode compliant, but I'm wondering whether there is any experience with, or bug reports about, using it on strings outside ASCII and ISO Latin.
Also, +1 on improving the speed of construction, as we have a dictionary of 77K Japanese keywords to load.
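For context on why Unicode text should not be a special case in principle: a Python 3 `str` is a sequence of code points, so a multi-keyword matcher that walks the text one character at a time treats Japanese the same way as ASCII. The sketch below is a minimal pure-Python Aho-Corasick matcher written for illustration. It is not acora's implementation or API, the names `build_automaton` and `find_all` are made up here, and it does no Unicode normalization (full-width versus half-width forms, combining characters), which you would have to handle before searching.

```python
from collections import deque


def build_automaton(keywords):
    """Build goto, failure and output tables for a set of str keywords."""
    goto = [{}]    # goto[state][char] -> next state
    output = [[]]  # keywords recognised when a state is reached
    for word in keywords:
        if not word:
            continue  # ignore empty keywords
        state = 0
        for ch in word:
            nxt = goto[state].get(ch)
            if nxt is None:
                nxt = len(goto)
                goto[state][ch] = nxt
                goto.append({})
                output.append([])
            state = nxt
        output[state].append(word)

    # Breadth-first pass: the failure link of a state points to the
    # longest proper suffix of its path that is also a path in the trie.
    fail = [0] * len(goto)
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit keywords that end at the suffix state.
            output[nxt] = output[nxt] + output[fail[nxt]]
    return goto, fail, output


def find_all(text, automaton):
    """Return (keyword, start_index) pairs for every match in text."""
    goto, fail, output = automaton
    state = 0
    matches = []
    for pos, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for word in output[state]:
            matches.append((word, pos - len(word) + 1))
    return matches


automaton = build_automaton(["東京", "京都", "東京都"])
matches = find_all("東京都と京都", automaton)
```

Overlapping keywords are all reported, so `matches` here contains both "東京" and "東京都" at index 0 as well as "京都" at indices 1 and 4. Indices are code point offsets into the `str`, not byte offsets.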

Doesn't install on Windows

pip install acora produced an error:

[...]
building 'acora._acora' extension
error: Unable to find vcvarsall.bat

which is factually correct, as I don't have Visual Studio installed on this machine.

Could you please consider publishing a prebuilt wheel for acora, so that installing it on Windows does not require a compiler?
