nppoly / cyac
High performance Trie and Ahocorasick automata (AC automata) Keyword Match & Replace Tool for python
License: MIT License
Is this library thread-safe after calling the build function?
If the trie is built and several threads access it using the match function, is that thread-safe?
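Until this is answered, a conservative pattern is to share one built automaton but serialize match() calls behind a lock. The sketch below is library-agnostic and assumes only that the shared object has a match(text) method returning an iterable; LockedMatcher and DemoMatcher are hypothetical names, and DemoMatcher merely stands in for a built AC.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class LockedMatcher:
    """Wrap any object with a match(text) method so only one thread calls it at a time."""

    def __init__(self, matcher):
        self._matcher = matcher
        self._lock = threading.Lock()

    def match(self, text):
        with self._lock:
            # Materialize the results while holding the lock, in case match() is lazy.
            return list(self._matcher.match(text))


class DemoMatcher:
    """Hypothetical stand-in for a built automaton: reports every 'a' in the text."""

    def match(self, text):
        for i, ch in enumerate(text):
            if ch == "a":
                yield 0, i, i + 1


shared = LockedMatcher(DemoMatcher())
with ThreadPoolExecutor(max_workers=4) as pool:
    # pool.map keeps the input order, so results line up with the texts.
    results = list(pool.map(shared.match, ["banana", "cab", "xyz"]))
print(results)  # [[(0, 1, 2), (0, 3, 4), (0, 5, 6)], [(0, 1, 2)], []]
```

The lock gives up parallel matching, so it is only worth keeping if concurrent match() calls turn out to be unsafe.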
ERROR: Failed building wheel for cyac
Failed to build cyac
My Cython version is 3.0.0.
My Python version is 3.9.
My GCC version is 10.2.1.
system: debian:bullseye-slim
Thanks!
Hello,
I would like to run some memory analysis and tests using the new shared_buffer branch. I would post the results of that analysis in an issue, so that you can add them to the README.md if you like.
However, I am facing the following problem:
>>> import cyac
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.7/site-packages/cyac-1.0-py3.7-linux-x86_64.egg/cyac/__init__.py", line 2, in <module>
from .ac import AC
File "lib/cyac/ac.pyx", line 1, in init cyac.ac
#cython: language_level=3, boundscheck=False, overflowcheck=False
File "lib/cyac/trie.pyx", line 1, in init cyac.trie
#cython: language_level=3, boundscheck=False, overflowcheck=False
ModuleNotFoundError: No module named 'cyac.util'
What I did to install it:
I tried with Python 3.7 and 3.6.
When iterating over an AC trie, a segmentation fault occurs. For example:
import cyac
trie = cyac.AC.build(['hello', 'world'])
for w in trie:
    print(w)
Output:
hello
world
Segmentation fault
However, AC.items() works as expected.
I just want to point out that this library does not support overlapping matches.
Consider the following:
import cyac
ac = cyac.AC.build([u"[email protected]", u"gmail.com"])
var1 = "[email protected]"
for id, start, end in ac.match(var1):
    print(var1[start:end])
Outputs:
[email protected]
It should have output: [email protected] and gmail.com
This might also be one of the reasons why this library is faster than pyahocorasick.
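For comparison, here is a minimal pure-Python sketch of the textbook Aho-Corasick automaton (not cyac's implementation) that reports every occurrence, overlaps included. Each state inherits the outputs of its failure target, so a pattern that is a suffix of another pattern, like "mail.com" inside "gmail.com" (mirroring gmail.com at the end of the email above), is reported at the same end offset. build_automaton and find_all are hypothetical names.

```python
from collections import deque


def build_automaton(patterns):
    """Build the goto, failure and output tables of an Aho-Corasick automaton."""
    goto = [{}]  # goto[state][char] -> next state; state 0 is the root
    fail = [0]   # failure link of each state
    out = [[]]   # ids of the patterns recognized at each state
    for pid, pattern in enumerate(patterns):
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pid)
    # Breadth-first traversal: a failure link always points to a shallower state,
    # whose outputs are already complete when they are copied below.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit the failure target's outputs so suffix matches are reported too.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_all(patterns, text):
    """Yield (pattern_id, start, end) for every occurrence, overlapping ones included."""
    goto, fail, out = build_automaton(patterns)
    state = 0
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pid in out[state]:
            yield pid, i + 1 - len(patterns[pid]), i + 1


patterns = ["gmail.com", "mail.com"]
text = "gmail.com"
for pid, start, end in find_all(patterns, text):
    print(pid, start, end, text[start:end])
# Expected output:
# 0 0 9 gmail.com
# 1 1 9 mail.com
```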
There is an error when building the latest version (1.8) from the package downloaded from PyPI:
lib/cyac/xstring.cpp:1113:10: fatal error: 'unicode_portability.c' file not found
#include "unicode_portability.c"
^~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
error: command '/usr/bin/clang' failed with exit code 1
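The error suggests that the 1.8 sdist on PyPI does not ship lib/cyac/unicode_portability.c, which the generated lib/cyac/xstring.cpp includes. A small standard-library sketch to check an sdist for required files follows; missing_from_sdist is a hypothetical helper, and the archive name in the usage comment is an assumption. With setuptools, non-Python sources like this are normally added to an sdist through an include line in MANIFEST.in.

```python
import tarfile


def missing_from_sdist(sdist_path, required):
    """Return the entries of `required` (paths relative to the project root) absent from a .tar.gz sdist."""
    with tarfile.open(sdist_path, "r:gz") as tar:
        # sdist members live under a "<name>-<version>/" top-level directory; drop it.
        names = {name.split("/", 1)[1] for name in tar.getnames() if "/" in name}
    return [path for path in required if path not in names]


# Hypothetical usage against a downloaded sdist:
# missing_from_sdist("cyac-1.8.tar.gz", ["lib/cyac/unicode_portability.c"])
```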
Summary:
Previously saved case-sensitive data loads into the trie without error. However, trie.get(val) fails to find the item, even though iterating through trie.items() shows the correct item.
Env: Arch Linux using Python 3.9, Ubuntu 20.04 using Python 3.8.5
Cyac Version: 1.2
Cython Version:
Steps to reproduce:
Run the script below twice:
from cyac import Trie
import os
import mmap
p = ["M3 1AR", "M3 2AR", "M3 3AR"]
if os.path.isfile('./pinsens.bin'):
    # Load into new tries
    print("* * * Loaded from buffer * * * ")
    with open('pinsens.bin', 'r+b') as fins:
        bins = mmap.mmap(fins.fileno(), 0)
        cins = Trie.from_buff(bins)
        bins.flush()
    with open('psens.bin', 'r+b') as fsens:
        bsens = mmap.mmap(fsens.fileno(), 0)
        csen = Trie.from_buff(bsens)
        bsens.flush()
else:
    print("* * * Clean data")
    cins = Trie(ignore_case=True)
    csen = Trie()
    for x in p:
        cins.insert(x)
        csen.insert(x)
    cins.save('pinsens.bin')
    csen.save('psens.bin')
print("Case Insensitive")
print("{} is at {}".format(p[2], cins.get(p[2])))  # correctly returns 2
print("Case Sensitive")
print("{} is at {}".format(p[2], csen.get(p[2])))  # correctly returns 2 in first run, fails in second
for id in csen.items():
    print(id)
I am trying to use several ACs, each of them shared between several processes.
I cannot share the code, so I will provide pseudocode.
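import mmap
from multiprocessing import Process

from cyac import AC


def target_function(ac_name):
    with open(ac_name, "r+b") as bf:
        buff_object = mmap.mmap(bf.fileno(), 0)
        automaton = AC.from_buff(buff_object, copy=False)
        ...


if __name__ == "__main__":
    processes_per_AC = 3
    total_ac = 0
    for x in range(0, total_ac):
        x_patterns = [<some words here, different for every AC in the iteration>]
        ac = AC.build(x_patterns)
        ac.save("ac_{}".format(x))
        # Use a separate loop variable: reusing `x` here would make the children
        # open ac_0..ac_2 instead of the file just saved.
        for _ in range(0, processes_per_AC):
            # `target` must be passed by keyword, and `args` must be a tuple.
            p = Process(target=target_function, args=("ac_{}".format(x),))
            p.start()
    ...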
Basically, I create several ACs with different words in the main process, and then launch several child processes that share each AC, in this case 3 processes per AC.
The ACs I am building contain the following types of information:
The exception:
...
File "lib/cyac/ac.pyx", line 413, in cyac.ac.AC.from_buff
File "lib/cyac/ac.pyx", line 465, in cyac.ac.ac_from_buff
Exception: invalid data, buf size is not correct
It occurs when I try to load with from_buff in target_function. It occurs randomly, and I have not been able to understand why. It does not matter whether the AC has more or fewer words, and the type of words (emails, IPs, domains, etc.) does not seem to make a difference. I wish I could be more precise, but this is everything I can observe from the exception.
I can provide you with the file I am using; it is just randomly generated emails, domains, IPs, etc. in JSON format.
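One possible explanation for a random "invalid data, buf size is not correct" error, not confirmed in this thread, is a child process mapping an AC file while it is still being written or replaced. A common guard is to save to a temporary file in the same directory and move it into place with os.replace(), so readers only ever open a complete file. The sketch assumes ac.save() takes a file path and overwrites an existing file; atomic_save and write_demo are hypothetical names.

```python
import os
import tempfile


def atomic_save(save, path):
    """Call save(tmp_path) on a temporary file, then rename it over `path`.

    Readers that open `path` see either the old file or the complete new one.
    os.replace is atomic on POSIX when both paths are on the same filesystem,
    which is why the temporary file is created in the target's directory.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    os.close(fd)
    try:
        save(tmp_path)
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the temporary file if saving or renaming failed.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def write_demo(p):
    """Hypothetical stand-in for ac.save: writes a few bytes to p."""
    with open(p, "wb") as f:
        f.write(b"demo automaton bytes")


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "ac_0")
    atomic_save(write_demo, target)
    with open(target, "rb") as f:
        saved = f.read()
    leftovers = [n for n in os.listdir(tmp) if n.startswith(".tmp-")]
print(saved, leftovers)  # b'demo automaton bytes' []
```

With cyac, the call would presumably look like atomic_save(ac.save, "ac_{}".format(x)), placed before the child processes are started.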
At https://github.com/nppoly/cyac/blob/master/lib/cyac/ac.pyx#L261, the description says: "return_all: by default, only return the longest. ..."
But when I run the following code, I don't understand the output:
import cyac
ac = cyac.AC.build([u"py", u"python"])
text = 'python'
for idx, s, e in ac.match(text):
    print(s, e, ac[idx])
-- output --
0 2 py <== ??
0 6 python
Is it a bug?
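Reading the docstring as "without return_all, keep only the longest of the matches that share a start offset" (an interpretation, not confirmed by the library), the expected filtering can be sketched as a post-processing step over (id, start, end) tuples. longest_per_start is a hypothetical helper, and the ids below are assumed to follow the build order.

```python
def longest_per_start(matches):
    """Keep only the longest (id, start, end) match for each start offset."""
    best = {}
    for idx, start, end in matches:
        if start not in best or end > best[start][2]:
            best[start] = (idx, start, end)
    return [best[start] for start in sorted(best)]


# The two matches from the report, with hypothetical ids: 0 -> "py", 1 -> "python".
print(longest_per_start([(0, 0, 2), (1, 0, 6)]))  # [(1, 0, 6)]
```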
pip install cyac
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting cyac
Downloading https://pypi.tuna.tsinghua.edu.cn/packages/db/2e/4a4916514b64694dd478ab085f01fb4ca599e3127c05704f98f3067e8fc7/cyac-1.3.tar.gz (38 kB)
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [6 lines of output]
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "C:\Users\admin\AppData\Local\Temp\pip-install-h1651_cj\cyac_4973446135a546ce84624d967d791583\setup.py", line 10, in <module>
long_description = open("README.md").read()
UnicodeDecodeError: 'gbk' codec can't decode byte 0x8c in position 3452: illegal multibyte sequence
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
python version: Python 3.10.1
pip version: pip 22.0.4
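The traceback points at setup.py line 10, long_description = open("README.md").read(). Without an encoding argument, open() uses the locale's preferred encoding, which is GBK (code page 936) on a Chinese-locale Windows system, so a README containing non-ASCII UTF-8 bytes fails to decode. A minimal sketch of the usual fix, assuming README.md is saved as UTF-8, passes encoding="utf-8" explicitly; read_long_description is a hypothetical helper name.

```python
import os
import tempfile


def read_long_description(path="README.md"):
    """Read the README as UTF-8 instead of the locale's default encoding."""
    with open(path, encoding="utf-8") as f:
        return f.read()


# Demo: a README containing non-ASCII text, written as UTF-8.
with tempfile.TemporaryDirectory() as tmp:
    readme = os.path.join(tmp, "README.md")
    with open(readme, "w", encoding="utf-8") as f:
        f.write("# cyac\n高性能 Trie ✓\n")
    description = read_long_description(readme)
print(description)
```

In setup.py this would become long_description=read_long_description().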
The two projects above are both excellent. Could anyone who has used both explain the key differences between them?
I'm getting the following error when trying to install cyac:
% pip --version
pip 23.2.1 from /usr/local/lib/python3.9/site-packages/pip (python 3.9)
I don't even know if it's cyac's fault, but maybe you can help me?
% pip install cyac
Collecting cyac
Using cached cyac-1.9.tar.gz (47 kB)
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: cython>=0.29.0 in /usr/local/lib/python3.9/site-packages (from cyac) (3.0.2)
Building wheels for collected packages: cyac
Building wheel for cyac (pyproject.toml) ... error
error: subprocess-exited-with-error
× Building wheel for cyac (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [107 lines of output]
running bdist_wheel
running build
running build_py
creating build
creating build/lib.macosx-11-x86_64-cpython-39
creating build/lib.macosx-11-x86_64-cpython-39/cyac
copying lib/cyac/version.py -> build/lib.macosx-11-x86_64-cpython-39/cyac
copying lib/cyac/__init__.py -> build/lib.macosx-11-x86_64-cpython-39/cyac
running egg_info
writing cyac.egg-info/PKG-INFO
writing dependency_links to cyac.egg-info/dependency_links.txt
writing requirements to cyac.egg-info/requires.txt
writing top-level names to cyac.egg-info/top_level.txt
reading manifest file 'cyac.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no previously-included files matching '__pycache__' found anywhere in distribution
adding license file 'LICENSE'
writing manifest file 'cyac.egg-info/SOURCES.txt'
copying lib/cyac/ac.pxd -> build/lib.macosx-11-x86_64-cpython-39/cyac
copying lib/cyac/ac.pyx -> build/lib.macosx-11-x86_64-cpython-39/cyac
copying lib/cyac/trie.pxd -> build/lib.macosx-11-x86_64-cpython-39/cyac
copying lib/cyac/trie.pyx -> build/lib.macosx-11-x86_64-cpython-39/cyac
copying lib/cyac/unicode_portability.c -> build/lib.macosx-11-x86_64-cpython-39/cyac
copying lib/cyac/utf8.pxd -> build/lib.macosx-11-x86_64-cpython-39/cyac
copying lib/cyac/utf8.pyx -> build/lib.macosx-11-x86_64-cpython-39/cyac
copying lib/cyac/util.c -> build/lib.macosx-11-x86_64-cpython-39/cyac
copying lib/cyac/util.pxd -> build/lib.macosx-11-x86_64-cpython-39/cyac
copying lib/cyac/util.pyx -> build/lib.macosx-11-x86_64-cpython-39/cyac
copying lib/cyac/xstring.pxd -> build/lib.macosx-11-x86_64-cpython-39/cyac
copying lib/cyac/xstring.pyx -> build/lib.macosx-11-x86_64-cpython-39/cyac
running build_ext
building 'cyac.util' extension
creating build/temp.macosx-11-x86_64-cpython-39
creating build/temp.macosx-11-x86_64-cpython-39/lib
creating build/temp.macosx-11-x86_64-cpython-39/lib/cyac
clang -Wno-unused-result -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -I/usr/local/include -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk -I/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include -I/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/System/Library/Frameworks/Tk.framework/Versions/8.5/Headers -I/usr/local/opt/python@3.9/Frameworks/Python.framework/Versions/3.9/include/python3.9 -c lib/cyac/util.c -o build/temp.macosx-11-x86_64-cpython-39/lib/cyac/util.o
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
main()
File "/usr/local/lib/python3.9/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
json_out['return_val'] = hook(**hook_input['kwargs'])
File "/usr/local/lib/python3.9/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 251, in build_wheel
return _build_backend().build_wheel(wheel_directory, config_settings,
File "/private/var/folders/wk/xjxk9mdj7lz95cjvqc_nqxl80000gn/T/pip-build-env-tcws4eg3/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 434, in build_wheel
return self._build_with_temp_dir(
File "/private/var/folders/wk/xjxk9mdj7lz95cjvqc_nqxl80000gn/T/pip-build-env-tcws4eg3/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 419, in _build_with_temp_dir
self.run_setup()
File "/private/var/folders/wk/xjxk9mdj7lz95cjvqc_nqxl80000gn/T/pip-build-env-tcws4eg3/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 341, in run_setup
exec(code, locals())
File "<string>", line 21, in <module>
File "/private/var/folders/wk/xjxk9mdj7lz95cjvqc_nqxl80000gn/T/pip-build-env-tcws4eg3/overlay/lib/python3.9/site-packages/setuptools/__init__.py", line 103, in setup
return distutils.core.setup(**attrs)
File "/private/var/folders/wk/xjxk9mdj7lz95cjvqc_nqxl80000gn/T/pip-build-env-tcws4eg3/overlay/lib/python3.9/site-packages/setuptools/_distutils/core.py", line 185, in setup
return run_commands(dist)
File "/private/var/folders/wk/xjxk9mdj7lz95cjvqc_nqxl80000gn/T/pip-build-env-tcws4eg3/overlay/lib/python3.9/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
dist.run_commands()
File "/private/var/folders/wk/xjxk9mdj7lz95cjvqc_nqxl80000gn/T/pip-build-env-tcws4eg3/overlay/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
self.run_command(cmd)
File "/private/var/folders/wk/xjxk9mdj7lz95cjvqc_nqxl80000gn/T/pip-build-env-tcws4eg3/overlay/lib/python3.9/site-packages/setuptools/dist.py", line 989, in run_command
super().run_command(command)
File "/private/var/folders/wk/xjxk9mdj7lz95cjvqc_nqxl80000gn/T/pip-build-env-tcws4eg3/overlay/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/private/var/folders/wk/xjxk9mdj7lz95cjvqc_nqxl80000gn/T/pip-build-env-tcws4eg3/overlay/lib/python3.9/site-packages/wheel/bdist_wheel.py", line 364, in run
self.run_command("build")
File "/private/var/folders/wk/xjxk9mdj7lz95cjvqc_nqxl80000gn/T/pip-build-env-tcws4eg3/overlay/lib/python3.9/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
self.distribution.run_command(command)
File "/private/var/folders/wk/xjxk9mdj7lz95cjvqc_nqxl80000gn/T/pip-build-env-tcws4eg3/overlay/lib/python3.9/site-packages/setuptools/dist.py", line 989, in run_command
super().run_command(command)
File "/private/var/folders/wk/xjxk9mdj7lz95cjvqc_nqxl80000gn/T/pip-build-env-tcws4eg3/overlay/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/private/var/folders/wk/xjxk9mdj7lz95cjvqc_nqxl80000gn/T/pip-build-env-tcws4eg3/overlay/lib/python3.9/site-packages/setuptools/_distutils/command/build.py", line 131, in run
self.run_command(cmd_name)
File "/private/var/folders/wk/xjxk9mdj7lz95cjvqc_nqxl80000gn/T/pip-build-env-tcws4eg3/overlay/lib/python3.9/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
self.distribution.run_command(command)
File "/private/var/folders/wk/xjxk9mdj7lz95cjvqc_nqxl80000gn/T/pip-build-env-tcws4eg3/overlay/lib/python3.9/site-packages/setuptools/dist.py", line 989, in run_command
super().run_command(command)
File "/private/var/folders/wk/xjxk9mdj7lz95cjvqc_nqxl80000gn/T/pip-build-env-tcws4eg3/overlay/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/private/var/folders/wk/xjxk9mdj7lz95cjvqc_nqxl80000gn/T/pip-build-env-tcws4eg3/overlay/lib/python3.9/site-packages/setuptools/command/build_ext.py", line 88, in run
_build_ext.run(self)
File "/private/var/folders/wk/xjxk9mdj7lz95cjvqc_nqxl80000gn/T/pip-build-env-tcws4eg3/overlay/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 345, in run
self.build_extensions()
File "/private/var/folders/wk/xjxk9mdj7lz95cjvqc_nqxl80000gn/T/pip-build-env-tcws4eg3/overlay/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 467, in build_extensions
self._build_extensions_serial()
File "/private/var/folders/wk/xjxk9mdj7lz95cjvqc_nqxl80000gn/T/pip-build-env-tcws4eg3/overlay/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 493, in _build_extensions_serial
self.build_extension(ext)
File "/private/var/folders/wk/xjxk9mdj7lz95cjvqc_nqxl80000gn/T/pip-build-env-tcws4eg3/overlay/lib/python3.9/site-packages/setuptools/command/build_ext.py", line 249, in build_extension
_build_ext.build_extension(self, ext)
File "/private/var/folders/wk/xjxk9mdj7lz95cjvqc_nqxl80000gn/T/pip-build-env-tcws4eg3/overlay/lib/python3.9/site-packages/Cython/Distutils/build_ext.py", line 127, in build_extension
super(build_ext, self).build_extension(ext)
File "/private/var/folders/wk/xjxk9mdj7lz95cjvqc_nqxl80000gn/T/pip-build-env-tcws4eg3/overlay/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 548, in build_extension
objects = self.compiler.compile(
File "/private/var/folders/wk/xjxk9mdj7lz95cjvqc_nqxl80000gn/T/pip-build-env-tcws4eg3/overlay/lib/python3.9/site-packages/setuptools/_distutils/ccompiler.py", line 600, in compile
self._compile(obj, src, ext, cc_args, extra_postargs, pp_opts)
File "/private/var/folders/wk/xjxk9mdj7lz95cjvqc_nqxl80000gn/T/pip-build-env-tcws4eg3/overlay/lib/python3.9/site-packages/setuptools/_distutils/unixccompiler.py", line 185, in _compile
self.spawn(compiler_so + cc_args + [src, '-o', obj] + extra_postargs)
File "/private/var/folders/wk/xjxk9mdj7lz95cjvqc_nqxl80000gn/T/pip-build-env-tcws4eg3/overlay/lib/python3.9/site-packages/setuptools/_distutils/ccompiler.py", line 1041, in spawn
spawn(cmd, dry_run=self.dry_run, **kwargs)
File "/private/var/folders/wk/xjxk9mdj7lz95cjvqc_nqxl80000gn/T/pip-build-env-tcws4eg3/overlay/lib/python3.9/site-packages/setuptools/_distutils/spawn.py", line 57, in spawn
proc = subprocess.Popen(cmd, env=env)
File "/usr/local/Cellar/python@3.9/3.9.1_3/Frameworks/Python.framework/Versions/3.9/lib/python3.9/subprocess.py", line 947, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "/usr/local/Cellar/python@3.9/3.9.1_3/Frameworks/Python.framework/Versions/3.9/lib/python3.9/subprocess.py", line 1739, in _execute_child
env_list.append(k + b'=' + os.fsencode(v))
File "/usr/local/Cellar/python@3.9/3.9.1_3/Frameworks/Python.framework/Versions/3.9/lib/python3.9/os.py", line 810, in fsencode
filename = fspath(filename)  # Does type-checking of `filename`.
TypeError: expected str, bytes or os.PathLike object, not int
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for cyac
Failed to build cyac
ERROR: Could not build wheels for cyac, which is required to install pyproject.toml-based projects
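The final frames show subprocess failing on os.fsencode(v) because an environment value v is an int. A diagnostic sketch follows, under an assumption this thread does not confirm: on macOS, setuptools' spawn() may copy MACOSX_DEPLOYMENT_TARGET from the interpreter's sysconfig into the compiler's environment, so an integer value there (for example 11) would raise exactly this TypeError. Both helpers are hypothetical; on non-macOS interpreters the second one typically returns False.

```python
import sysconfig


def non_string_env_values(env):
    """Return the keys whose values subprocess cannot encode (neither str nor bytes)."""
    return sorted(key for key, value in env.items() if not isinstance(value, (str, bytes)))


def deployment_target_is_int():
    """True if this interpreter reports MACOSX_DEPLOYMENT_TARGET as an int rather than a str."""
    return isinstance(sysconfig.get_config_var("MACOSX_DEPLOYMENT_TARGET"), int)


print(non_string_env_values({"PATH": "/usr/bin", "MACOSX_DEPLOYMENT_TARGET": 11}))
# ['MACOSX_DEPLOYMENT_TARGET']
print(deployment_target_is_int())
```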