pytries / dawg
DAFSA-based dictionary-like read-only objects for Python. Based on `dawgdic` C++ library.
Home Page: http://dawg.readthedocs.org
License: MIT License
I couldn't install the package on Ubuntu 22 with Python 3.10 (I can't tell the exact CPython version).
The error was legacy-install-failure.
I tried git+https://github.com/pytries/DAWG.git and git+https://github.com/pytries/DAWG.git@actions
It seems the number of DAWG units (and thus transitions, but not elements) is currently limited to 2^29 (536,870,912). Is it possible to increase the number of units? Thanks!
Are RecordDAWGs compatible with using generator functions to fill them?
I have a very large TSV file that contains words, the contexts they appear in, and some statistics about the words and contexts. The data is calculated from Gigaword, so the file is 29 GB with 507 million lines. When I tried to build a dictionary mapping word-context pairs to their rank ratios, it took up over 80 GB of RAM (and the machine I'm using only has 64 GB), so I tried using RecordDAWGs.
Unfortunately, when I use code like the following, Python churns away for a long time, seemingly loading the DAWG (and using about 45 GB of RAM, which is more reasonable), but the DAWG it saves has only 6 elements and takes up 1.6 KB. Is this because the RecordDAWG constructor cannot read from a generator function?
import dawg

def ngram_rank_ratio_iter(rank_ratio_file):
    for line in rank_ratio_file:
        line = line.strip('\r\n')
        split_line = line.split('\t')
        word_context = '{}\t{}'.format(split_line[0], split_line[1])
        ratio = float(split_line[7])
        yield (word_context, (ratio,))

with open('huge_file.tsv') as rank_ratio_file:
    rr_dawg = dawg.RecordDAWG('d', ngram_rank_ratio_iter(rank_ratio_file))
rr_dawg.save('huge_file.dawg')
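The snippet above hands the generator straight to the constructor, so this is not a diagnosis of the 6-element result. One thing worth ruling out, though, is that generators are single-pass: if anything iterates the generator before the builder does, the builder sees nothing. The sketch below uses only the standard library and a tiny made-up in-memory sample (a stand-in for huge_file.tsv, not real data) to show this; counting records on a separate pass over a freshly opened file is also a cheap way to confirm how many items the generator really yields.

```python
import io


def ngram_rank_ratio_iter(rank_ratio_file):
    # Same parsing as the report: key is "word<TAB>context", value is column 8 as a 1-tuple.
    for line in rank_ratio_file:
        fields = line.strip('\r\n').split('\t')
        yield ('{}\t{}'.format(fields[0], fields[1]), (float(fields[7]),))


# Tiny in-memory stand-in for huge_file.tsv: two lines with 8 tab-separated columns each.
sample = io.StringIO(
    u"cat\tthe _\t0\t0\t0\t0\t0\t0.5\n"
    u"dog\ta _\t0\t0\t0\t0\t0\t1.25\n"
)

records = ngram_rank_ratio_iter(sample)
first_pass = sum(1 for _ in records)   # consumes the generator completely
second_pass = sum(1 for _ in records)  # the generator is exhausted, so this counts nothing
print(first_pass, second_pass)
```

If a count on a fresh file matches the expected 507 million lines, the input side is fine and the problem lies elsewhere.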
Hello!
I have an IntDAWG dictionary saved to a file. Unfortunately, I have no access to the original keys from which this dictionary was built, and the IntDAWG class has no methods to iterate over its keys, as mentioned in the docs https://dawg.readthedocs.io/en/latest/. Is there an easy way to extract all the keys from the IntDAWG dictionary?
BytesDAWG uses binascii.b2a_base64 to encode binary data. This function adds an unnecessary '\n' to the result:
In [16]: from binascii import b2a_base64
In [17]: b2a_base64('value1')
Out[17]: 'dmFsdWUx\n'
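The session above is Python 2. On Python 3, the standard library offers two ways to get the same base64 text without the trailing newline: the `newline=False` keyword of `b2a_base64` (added in Python 3.6) and `base64.b64encode`, which never appends one. Whether switching would change the keys stored in existing .dawg files is something I have not checked; presumably it would, since the encoded value is part of each stored key.

```python
import base64
from binascii import b2a_base64

value = b'value1'

with_newline = b2a_base64(value)                    # appends b'\n' by default
without_newline = b2a_base64(value, newline=False)  # keyword available since Python 3.6
via_b64encode = base64.b64encode(value)             # never appends a newline

print(with_newline, without_newline, via_b64encode)
```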
Hello,
I am trying to run the BytesDAWG implementation on a Raspberry Pi 4 (AArch64, 64-bit, Python 3.7.17), and the values returned by iteritems() seem to differ from what I get from the get() method. This does not happen when I run it on my laptop (x86_64, 64-bit, Python 3.7.17).
Here is the sample code I am running:
import dawg

data = []
keys = [u'foo', u'bar', u'foobar', u'fo']
for item in zip(keys, range(4)):
    data.append((item[0], item[1].to_bytes(3, 'little')))
bytes_dawg = dawg.BytesDAWG(data)

results = bytes_dawg.iteritems('fo')
for item in results:
    print('key::', item[0], 'value from get::', bytes_dawg.get(item[0], None), ' value from iteritems', item[1])
Its output on the Raspberry Pi is:
Output 1
key:: fo value from get:: [b'\x03\x00\x00'] value from iteritems b'\xfc\x0c\x00\x03'
key:: foo value from get:: [b'\x00\x00\x00'] value from iteritems b'\xfc\x00\x00\x03'
key:: foobar value from get:: [b'\x02\x00\x00'] value from iteritems b'\xfc\x08\x00\x03'
On the x86_64 machine, the output is:
Output 2
key:: fo value from get:: [b'\x03\x00\x00'] value from iteritems b'\x03\x00\x00'
key:: foo value from get:: [b'\x00\x00\x00'] value from iteritems b'\x00\x00\x00'
key:: foobar value from get:: [b'\x02\x00\x00'] value from iteritems b'\x02\x00\x00'
From output 1, when I convert the byte representation (returned from iteritems()) back to int, it seems to return incorrect values.
Any ideas on why this might be happening?
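The encoding in the sample is a plain little-endian round trip (`to_bytes(3, 'little')` / `int.from_bytes(..., 'little')`), which gives the same result on any platform. Decoding the recorded values for key 'fo' shows that the AArch64 iteritems() values are 4 bytes long instead of 3, so they cannot be a byte-order variant of the stored 3-byte payload. My inference, not a confirmed diagnosis, is that the problem sits in the extension's payload handling rather than in the int conversion. The byte strings below are copied from the outputs above; the helper names are illustrative.

```python
def encode_value(i):
    # Same encoding as the report: 3 bytes, little-endian.
    return i.to_bytes(3, 'little')


def decode_value(raw):
    return int.from_bytes(raw, 'little')


# Values recorded above for the key 'fo' (stored value 3).
from_get = b'\x03\x00\x00'
from_iteritems_x86 = b'\x03\x00\x00'
from_iteritems_arm = b'\xfc\x0c\x00\x03'

for label, raw in [('get', from_get),
                   ('iteritems x86_64', from_iteritems_x86),
                   ('iteritems AArch64', from_iteritems_arm)]:
    # Print the length, the decoded int, and whether it matches the stored payload.
    print(label, len(raw), decode_value(raw), raw == encode_value(3))
```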
When I run setup.py with Python 3 and the C locale, I get a UnicodeDecodeError exception:
$ LC_ALL=C python3 setup.py
Traceback (most recent call last):
File "setup.py", line 10, in <module>
long_description = open('README.rst').read() + open('CHANGES.rst').read(),
File "/usr/lib/python3.2/encodings/ascii.py", line 26, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1551: ordinal not in range(128)
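The failing line opens README.rst and CHANGES.rst without an encoding, so Python 3 falls back to the locale's encoding, which is ASCII under LC_ALL=C. Byte 0xc3 is the first byte of many UTF-8 encoded accented Latin letters (such as é). A common fix, sketched here and not necessarily what the project adopted, is to pass an explicit encoding; `io.open` accepts `encoding=` on both Python 2 and 3. The helper name `read_text` is hypothetical.

```python
import io
import os
import tempfile


def read_text(path):
    """Read a file as UTF-8 regardless of the current locale (e.g. LC_ALL=C)."""
    with io.open(path, encoding='utf-8') as f:
        return f.read()


# Demonstration with a temporary file whose UTF-8 bytes include 0xc3 (from 'é').
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'README.rst')
    with io.open(path, 'w', encoding='utf-8') as f:
        f.write(u'Caf\u00e9\n')
    text = read_text(path)

print(text)
```

In setup.py this would become `long_description=read_text('README.rst') + read_text('CHANGES.rst')`.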
Collecting DAWG>=0.7.3; extra == "fast" (from pymorphy2[fast])
Using cached https://files.pythonhosted.org/packages/29/c0/d8d967bcaa0b572f9dc1d878bbf5a7bfd5afa2102a5ae426731f6ce3bc26/DAWG-0.7.8.tar.gz
Installing collected packages: DAWG
Running setup.py install for DAWG ... error
Complete output from command c:\users\loren\appdata\local\programs\python\python37\python.exe -u -c "import setuptools, tokenize;__file__='C:\Users\loren\AppData\Local\Temp\pip-install-u9ywe6f1\DAWG\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record C:\Users\loren\AppData\Local\Temp\pip-record-4frb1boo\install-record.txt --single-version-externally-managed --compile:
running install
running build
running build_ext
building 'dawg' extension
creating build
creating build\temp.win-amd64-3.7
creating build\temp.win-amd64-3.7\Release
creating build\temp.win-amd64-3.7\Release\src
creating build\temp.win-amd64-3.7\Release\lib
creating build\temp.win-amd64-3.7\Release\lib\b64
C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\BIN\x86_amd64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -Ilib -Ic:\users\loren\appdata\local\programs\python\python37\include -Ic:\users\loren\appdata\local\programs\python\python37\include "-IC:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\INCLUDE" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.10240.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\8.1\include\shared" "-IC:\Program Files (x86)\Windows Kits\8.1\include\um" "-IC:\Program Files (x86)\Windows Kits\8.1\include\winrt" /EHsc /Tpsrc\b64_decode.cpp /Fobuild\temp.win-amd64-3.7\Release\src\b64_decode.obj
b64_decode.cpp
c:\users\loren\appdata\local\temp\pip-install-u9ywe6f1\dawg\src../lib/b64/decode.h(52): warning C4244: '=': conversion from 'std::streamsize' to 'int', possible loss of data
C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\BIN\x86_amd64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -Ilib -Ic:\users\loren\appdata\local\programs\python\python37\include -Ic:\users\loren\appdata\local\programs\python\python37\include "-IC:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\INCLUDE" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.10240.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\8.1\include\shared" "-IC:\Program Files (x86)\Windows Kits\8.1\include\um" "-IC:\Program Files (x86)\Windows Kits\8.1\include\winrt" /EHsc /Tpsrc\dawg.cpp /Fobuild\temp.win-amd64-3.7\Release\src\dawg.obj
dawg.cpp
c:\users\loren\appdata\local\temp\pip-install-u9ywe6f1\dawg\src../lib/b64/decode.h(52): warning C4244: '=': conversion from 'std::streamsize' to 'int', possible loss of data
src\dawg.cpp(4638): warning C4244: '=': conversion from 'Py_ssize_t' to 'int', possible loss of data
src\dawg.cpp(4652): warning C4244: '=': conversion from 'Py_ssize_t' to 'int', possible loss of data
src\dawg.cpp(10580): warning C4267: 'argument': conversion from 'size_t' to 'const int', possible loss of data
src\dawg.cpp(11052): warning C4267: '=': conversion from 'size_t' to 'int', possible loss of data
src\dawg.cpp(11526): warning C4267: '=': conversion from 'size_t' to 'int', possible loss of data
src\dawg.cpp(12419): warning C4244: '=': conversion from 'Py_ssize_t' to 'int', possible loss of data
src\dawg.cpp(12433): warning C4244: '=': conversion from 'Py_ssize_t' to 'int', possible loss of data
src\dawg.cpp(13011): warning C4244: '=': conversion from 'Py_ssize_t' to 'int', possible loss of data
src\dawg.cpp(21523): error C2039: 'exc_type': is not a member of '_ts'
c:\users\loren\appdata\local\programs\python\python37\include\pystate.h(212): note: see declaration of '_ts'
src\dawg.cpp(21524): error C2039: 'exc_value': is not a member of '_ts'
c:\users\loren\appdata\local\programs\python\python37\include\pystate.h(212): note: see declaration of '_ts'
src\dawg.cpp(21525): error C2039: 'exc_traceback': is not a member of '_ts'
c:\users\loren\appdata\local\programs\python\python37\include\pystate.h(212): note: see declaration of '_ts'
src\dawg.cpp(21526): error C2039: 'exc_type': is not a member of '_ts'
c:\users\loren\appdata\local\programs\python\python37\include\pystate.h(212): note: see declaration of '_ts'
src\dawg.cpp(21527): error C2039: 'exc_value': is not a member of '_ts'
c:\users\loren\appdata\local\programs\python\python37\include\pystate.h(212): note: see declaration of '_ts'
src\dawg.cpp(21528): error C2039: 'exc_traceback': is not a member of '_ts'
c:\users\loren\appdata\local\programs\python\python37\include\pystate.h(212): note: see declaration of '_ts'
src\dawg.cpp(21550): error C2039: 'exc_type': is not a member of '_ts'
c:\users\loren\appdata\local\programs\python\python37\include\pystate.h(212): note: see declaration of '_ts'
src\dawg.cpp(21551): error C2039: 'exc_value': is not a member of '_ts'
c:\users\loren\appdata\local\programs\python\python37\include\pystate.h(212): note: see declaration of '_ts'
src\dawg.cpp(21552): error C2039: 'exc_traceback': is not a member of '_ts'
c:\users\loren\appdata\local\programs\python\python37\include\pystate.h(212): note: see declaration of '_ts'
src\dawg.cpp(21553): error C2039: 'exc_type': is not a member of '_ts'
c:\users\loren\appdata\local\programs\python\python37\include\pystate.h(212): note: see declaration of '_ts'
src\dawg.cpp(21554): error C2039: 'exc_value': is not a member of '_ts'
c:\users\loren\appdata\local\programs\python\python37\include\pystate.h(212): note: see declaration of '_ts'
src\dawg.cpp(21555): error C2039: 'exc_traceback': is not a member of '_ts'
c:\users\loren\appdata\local\programs\python\python37\include\pystate.h(212): note: see declaration of '_ts'
src\dawg.cpp(21568): error C2039: 'exc_type': is not a member of '_ts'
c:\users\loren\appdata\local\programs\python\python37\include\pystate.h(212): note: see declaration of '_ts'
src\dawg.cpp(21569): error C2039: 'exc_value': is not a member of '_ts'
c:\users\loren\appdata\local\programs\python\python37\include\pystate.h(212): note: see declaration of '_ts'
src\dawg.cpp(21570): error C2039: 'exc_traceback': is not a member of '_ts'
c:\users\loren\appdata\local\programs\python\python37\include\pystate.h(212): note: see declaration of '_ts'
src\dawg.cpp(21582): error C2039: 'exc_type': is not a member of '_ts'
c:\users\loren\appdata\local\programs\python\python37\include\pystate.h(212): note: see declaration of '_ts'
src\dawg.cpp(21583): error C2039: 'exc_value': is not a member of '_ts'
c:\users\loren\appdata\local\programs\python\python37\include\pystate.h(212): note: see declaration of '_ts'
src\dawg.cpp(21584): error C2039: 'exc_traceback': is not a member of '_ts'
c:\users\loren\appdata\local\programs\python\python37\include\pystate.h(212): note: see declaration of '_ts'
src\dawg.cpp(21585): error C2039: 'exc_type': is not a member of '_ts'
c:\users\loren\appdata\local\programs\python\python37\include\pystate.h(212): note: see declaration of '_ts'
src\dawg.cpp(21586): error C2039: 'exc_value': is not a member of '_ts'
c:\users\loren\appdata\local\programs\python\python37\include\pystate.h(212): note: see declaration of '_ts'
src\dawg.cpp(21587): error C2039: 'exc_traceback': is not a member of '_ts'
c:\users\loren\appdata\local\programs\python\python37\include\pystate.h(212): note: see declaration of '_ts'
error: command 'C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\BIN\x86_amd64\cl.exe' failed with exit status 2
The character dictionary for dawg.DAWG.compile_replaces can hold only a single value per key. This is a problem if you want to map the same character to several diacritic characters, which is a common use case.
Example:
It can be solved by compiling the dictionary several times, once for each combination, but this is not quite... optimal, to put it mildly.
Thank you
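Until compile_replaces accepts several replacements per character, one workaround in plain Python is to expand a query into every variant and test each one against the DAWG. The helper `expand_variants` and its dict-of-lists input format are hypothetical and not part of the dawg API. The number of variants is the product of the alternatives at each position, so this only suits short keys or few replaceable characters.

```python
import itertools


def expand_variants(word, replaces):
    """Return every spelling of `word` where each character may be swapped for one of its alternatives.

    `replaces` maps a single character to a list of alternatives (hypothetical format).
    """
    choices = [[ch] + list(replaces.get(ch, ())) for ch in word]
    return [''.join(combo) for combo in itertools.product(*choices)]


variants = expand_variants(u'fee', {u'e': [u'\u00e9', u'\u00eb']})
print(len(variants), variants[:3])
```

With a built DAWG `d`, the matches would then be `[v for v in variants if v in d]`.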
Windows 7 x64, Python 2.7.1, VS2008 compiler (also tested with 2012). DAWG does not install properly. Is this an installation bug or a missing prerequisite?
C:\Python27\Lib\DAWG-308899fe8801c51483099ff9ea5757bb8ea1be55>pip install DAWG
Downloading/unpacking DAWG
Running setup.py egg_info for package DAWG
Installing collected packages: DAWG
Running setup.py install for DAWG
building 'dawg' extension
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "c:\users\jacob\appdata\local\temp\pip-build-Jacob\DAWG\setup.py", line 40, in <module>
'Topic :: Text Processing :: Linguistic',
File "C:\Python27\lib\distutils\core.py", line 152, in setup
dist.run_commands()
File "C:\Python27\lib\distutils\dist.py", line 953, in run_commands
self.run_command(cmd)
File "C:\Python27\lib\distutils\dist.py", line 972, in run_command
cmd_obj.run()
File "C:\Python27\lib\site-packages\setuptools-0.6c11-py2.7.egg\setuptools\command\install.py", line 56, in run
File "C:\Python27\lib\distutils\command\install.py", line 563, in run
self.run_command('build')
File "C:\Python27\lib\distutils\cmd.py", line 326, in run_command
self.distribution.run_command(command)
File "C:\Python27\lib\distutils\dist.py", line 972, in run_command
cmd_obj.run()
File "C:\Python27\lib\distutils\command\build.py", line 127, in run
self.run_command(cmd_name)
File "C:\Python27\lib\distutils\cmd.py", line 326, in run_command
self.distribution.run_command(command)
File "C:\Python27\lib\distutils\dist.py", line 972, in run_command
cmd_obj.run()
File "C:\Python27\lib\site-packages\setuptools-0.6c11-py2.7.egg\setuptools\command\build_ext.py", line 46, in run
File "C:\Python27\lib\distutils\command\build_ext.py", line 340, in run
self.build_extensions()
File "C:\Python27\lib\distutils\command\build_ext.py", line 449, in build_extensions
self.build_extension(ext)
File "C:\Python27\lib\site-packages\setuptools-0.6c11-py2.7.egg\setuptools\command\build_ext.py", line 175, in build_extension
File "C:\Python27\lib\distutils\command\build_ext.py", line 499, in build_extension
depends=ext.depends)
File "C:\Python27\lib\distutils\msvc9compiler.py", line 473, in compile
self.initialize()
File "C:\Python27\lib\distutils\msvc9compiler.py", line 383, in initialize
vc_env = query_vcvarsall(VERSION, plat_spec)
File "C:\Python27\lib\distutils\msvc9compiler.py", line 299, in query_vcvarsall
raise ValueError(str(list(result.keys())))
ValueError: [u'path']
Complete output from command C:\Python27\python.exe -c "import setuptools;__file__='c:\users\appdata\local\temp\pip-build\DAWG\setup.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record c:\users\appdata\local\temp\pip-oysw7h-record\install-record.txt --single-version-externally-managed:
running install
running build
running build_ext
building 'dawg' extension
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "c:\users\jacob\appdata\local\temp\pip-build-Jacob\DAWG\setup.py", line 40, in <module>
'Topic :: Text Processing :: Linguistic',
File "C:\Python27\lib\distutils\core.py", line 152, in setup
dist.run_commands()
File "C:\Python27\lib\distutils\dist.py", line 953, in run_commands
self.run_command(cmd)
File "C:\Python27\lib\distutils\dist.py", line 972, in run_command
cmd_obj.run()
File "C:\Python27\lib\site-packages\setuptools-0.6c11-py2.7.egg\setuptools\command\install.py", line 56, in run
File "C:\Python27\lib\distutils\command\install.py", line 563, in run
self.run_command('build')
File "C:\Python27\lib\distutils\cmd.py", line 326, in run_command
self.distribution.run_command(command)
File "C:\Python27\lib\distutils\dist.py", line 972, in run_command
cmd_obj.run()
File "C:\Python27\lib\distutils\command\build.py", line 127, in run
self.run_command(cmd_name)
File "C:\Python27\lib\distutils\cmd.py", line 326, in run_command
self.distribution.run_command(command)
File "C:\Python27\lib\distutils\dist.py", line 972, in run_command
cmd_obj.run()
File "C:\Python27\lib\site-packages\setuptools-0.6c11-py2.7.egg\setuptools\command\build_ext.py", line 46, in run
File "C:\Python27\lib\distutils\command\build_ext.py", line 340, in run
self.build_extensions()
File "C:\Python27\lib\distutils\command\build_ext.py", line 449, in build_extensions
self.build_extension(ext)
File "C:\Python27\lib\site-packages\setuptools-0.6c11-py2.7.egg\setuptools\command\build_ext.py", line 175, in build_extension
File "C:\Python27\lib\distutils\command\build_ext.py", line 499, in build_extension
depends=ext.depends)
File "C:\Python27\lib\distutils\msvc9compiler.py", line 473, in compile
self.initialize()
File "C:\Python27\lib\distutils\msvc9compiler.py", line 383, in initialize
vc_env = query_vcvarsall(VERSION, plat_spec)
File "C:\Python27\lib\distutils\msvc9compiler.py", line 299, in query_vcvarsall
raise ValueError(str(list(result.keys())))
ValueError: [u'path']
Command C:\Python27\python.exe -c "import setuptools;__file__='c:\users\appdata\local\temp\pip-build\DAWG\setup.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record c:\users\appdata\local\temp\pip-oysw7h-record\install-record.txt --single-version-externally-managed failed with error code 1 in c:\users\appdata\local\temp\pip-build\DAWG
Storing complete log in C:\Users\pip\pip.log
C:\Python27\Lib\DAWG-308899fe8801c51483099ff9ea5757bb8ea1be55>
We've forked the repository, the new repo is https://github.com/pymorphy2-fork/DAWG
What's changed:
We'll try to maintain the project as best we can: at the very least, answer issues and review and merge pull requests.
I'm trying to build this with Python 3.7 and Cython 0.28.5.
Running python setup.py install gives several compilation errors related to the new Python 3.7 C API.
As I understand it, I need to run the update_cpp.sh script. This gives me a Cython compilation error, which is fixed by the patch below. Re-generating the C++ files then works, and so does installing the package. I'm hesitant to open a pull request because a) there are enormous changes to the checked-in C++ files, and b) I'm not sure whether my change is a good idea. I'm not too familiar with Cython, unfortunately. Maybe someone can help me out.
The aforementioned patch:
diff --git a/src/dawg.pyx b/src/dawg.pyx
index 133c7fc..2e05087 100644
--- a/src/dawg.pyx
+++ b/src/dawg.pyx
@@ -346,7 +346,7 @@ cdef class CompletionDAWG(DAWG):
return completer.Next()
- cpdef bytes tobytes(self) except +:
+ cpdef bytes tobytes(self):
"""
Return raw DAWG content as bytes.
"""
I need to store multiple values under the same key. When I tried to use BytesDAWG and call keys(), I received duplicate keys. Is this the correct behaviour?
Here is an example:
import dawg
data = [(u'key1', b'value1'), (u'key2', b'value2'), (u'key1', b'value3')]
bytes_dawg = dawg.BytesDAWG(data)
keys = bytes_dawg.keys(u'key1')
print(keys)
The output is:
[u'key1', u'key1']
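As I understand the BytesDAWG design, each (key, value) pair is stored as its own entry, so keys() yields one key per stored value and a key with two values shows up twice, while looking up the key returns the list of its values. If only distinct keys are needed, deduplicating while keeping first-seen order is cheap. The sketch below mimics this multi-value grouping with plain Python data; it does not call the dawg API.

```python
from collections import OrderedDict

data = [(u'key1', b'value1'), (u'key2', b'value2'), (u'key1', b'value3')]

# Group values by key: one key, many values (a multi-value mapping).
grouped = OrderedDict()
for key, value in data:
    grouped.setdefault(key, []).append(value)

# Deduplicate a key listing while keeping first-seen order,
# e.g. the [u'key1', u'key1'] shown above.
unique_keys = list(OrderedDict.fromkeys([u'key1', u'key1']))

print(dict(grouped), unique_keys)
```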
Hi. I am trying to install the package through pip with PyPy 3.6 and get the following exception
Collecting dawg
Using cached https://files.pythonhosted.org/packages/29/c0/d8d967bcaa0b572f9dc1d878bbf5a7bfd5afa2102a5ae426731f6ce3bc26/DAWG-0.7.8.tar.gz
Installing collected packages: dawg
Running setup.py install for dawg ... error
ERROR: Command errored out with exit status 1:
command: /home/dev/pypy/bin/pypy3 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-jqjt78ga/dawg/setup.py'"'"'; __file__='"'"'/tmp/pip-install-jqjt78ga/dawg/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-jc6hctxn/install-record.txt --single-version-externally-managed --compile
cwd: /tmp/pip-install-jqjt78ga/dawg/
Complete output (22 lines):
running install
running build
running build_ext
building 'dawg' extension
creating build
creating build/temp.linux-x86_64-3.6
creating build/temp.linux-x86_64-3.6/src
creating build/temp.linux-x86_64-3.6/lib
creating build/temp.linux-x86_64-3.6/lib/b64
gcc -pthread -DNDEBUG -O2 -fPIC -Ilib -I/home/dev/pypy/include -c src/_dawg.cpp -o build/temp.linux-x86_64-3.6/src/_dawg.o
gcc -pthread -DNDEBUG -O2 -fPIC -Ilib -I/home/dev/pypy/include -c src/_base_types.cpp -o build/temp.linux-x86_64-3.6/src/_base_types.o
gcc -pthread -DNDEBUG -O2 -fPIC -Ilib -I/home/dev/pypy/include -c src/dawg.cpp -o build/temp.linux-x86_64-3.6/src/dawg.o
src/dawg.cpp: In function ‘void __Pyx_Raise(PyObject*, PyObject*, PyObject*, PyObject*)’:
src/dawg.cpp:21426:21: error: cannot convert ‘PyObject*’ {aka ‘_object*’} to ‘PyObject**’ {aka ‘_object**’}
PyErr_Fetch(tmp_type, tmp_value, tmp_tb);
^~~~~~~~
In file included from /home/dev/pypy/include/Python.h:142,
from src/dawg.cpp:16:
/home/dev/pypy/include/pypy_decl.h:222:41: note: initializing argument 1 of ‘void PyPyErr_Fetch(PyObject**, PyObject**, PyObject**)’
PyAPI_FUNC(void) PyErr_Fetch(PyObject **arg0, PyObject **arg1, PyObject **arg2);
~~~~~~~~~~~^~~~
error: command 'gcc' failed with exit status 1
----------------------------------------
ERROR: Command errored out with exit status 1: /home/dev/pypy/bin/pypy3 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-jqjt78ga/dawg/setup.py'"'"'; __file__='"'"'/tmp/pip-install-jqjt78ga/dawg/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-jc6hctxn/install-record.txt --single-version-externally-managed --compile Check the logs for full command output.
Hi,
I am trying to create a RecordDAWG object whose values are tuples of different data types, but I get an error.
Does RecordDAWG only accept numeric tuples?
data = [(u'key1', (1, b'a')), (u'key2', (2, b'b')),(u'key3', (3, b'c'))]
dawg.RecordDAWG(data)
Traceback (most recent call last):
File "", line 1, in <module>
File "dawg.pyx", line 830, in dawg.RecordDAWG.__init__ (src/dawg.cpp:13810)
struct.error: bad char in struct format
Br,
Eric
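The traceback ends in struct.error, so RecordDAWG values pass through Python's struct module, and struct formats are not limited to numbers: 'c' packs a single byte. Note also that the snippet calls dawg.RecordDAWG(data) without a format string, whereas the other examples on this page pass the format first (for instance dawg.RecordDAWG(format, data)); that is the first thing I would check. The '<Ic' format below is an illustrative assumption. The sketch shows a mixed int/bytes record, and that struct reports "bad char in struct format" for a string that is not a format, which is consistent with (though not proof of) the data list being taken as the format.

```python
import struct

# Hypothetical format for (int, one-byte bytes) values:
# '<' little-endian with standard sizes, 'I' uint32, 'c' a bytes object of length 1.
record = struct.Struct('<Ic')
packed = record.pack(1, b'a')
print(record.size, record.unpack(packed))

# A string that is not a struct format produces the message seen in the report.
error_message = None
try:
    struct.Struct(str([(u'key1', (1, b'a'))]))
except struct.error as exc:
    error_message = str(exc)
print(error_message)
```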
I'm trying to store a large number of 4-grams in memory to speed up computation as part of a larger program I'm working on, and I think I found a reasonable way to do it using your DAWG library, but I ran into an issue because you currently don't allow byte string keys (I'm not sure if this is a limitation of your wrapper or the underlying library).
The idea I had was to have one DAWG that maps from words to unique numbers, and then store the 4-gram counts in another DAWG by converting the 4-grams to struct-packed byte strings of the format 'LLLL' where each packed item in the byte string is one of the unique numbers from the word DAWG. However, I can't currently do this because the keys are required to be unicode.
Is this something that would be feasible to add in the future?
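Until byte-string keys are supported, one possible workaround is to turn the packed bytes into a text key. Base64 keeps the key ASCII-only and free of NUL characters (I'm assuming NUL bytes would be a problem for a key store built on C strings). Note that the native 'LLLL' format is 8 bytes per item on most 64-bit Linux builds; the '<' prefix below forces 4-byte items. The helper names are hypothetical, and base64 inflates each 16-byte key to 24 characters, which eats into the memory savings.

```python
import base64
import struct

# Four word ids as unsigned 32-bit ints; '<' fixes byte order and standard sizes (16 bytes total).
NGRAM = struct.Struct('<4L')


def ngram_key(word_ids):
    """Pack four word ids and turn the bytes into an ASCII-only text key (hypothetical helper)."""
    return base64.b64encode(NGRAM.pack(*word_ids)).decode('ascii')


def ngram_ids(key):
    """Inverse of ngram_key: recover the four word ids from a text key."""
    return NGRAM.unpack(base64.b64decode(key))


key = ngram_key((17, 4, 902, 31))
print(key, len(key), ngram_ids(key))
```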
This may be a misunderstanding on my part, or a bug:
>>> dawg.BytesDAWG([u'foo'])
Traceback (most recent call last):
File "<ipython-input-16-8701cbd4e408>", line 1, in <module>
dawg.BytesDAWG([u'foo'])
File "dawg.pyx", line 473, in dawg.BytesDAWG.__init__ (src/dawg.cpp:7904)
File "dawg.pyx", line 288, in dawg.CompletionDAWG.__init__ (src/dawg.cpp:5795)
File "dawg.pyx", line 37, in dawg.DAWG.__init__ (src/dawg.cpp:1982)
File "dawg.pyx", line 472, in genexpr (src/dawg.cpp:7725)
TypeError: Expected bytes, got unicode
>>> dawg.BytesDAWG(['foo'])
Traceback (most recent call last):
File "<ipython-input-17-4b1312367f59>", line 1, in <module>
dawg.BytesDAWG(['foo'])
File "dawg.pyx", line 473, in dawg.BytesDAWG.__init__ (src/dawg.cpp:7904)
File "dawg.pyx", line 288, in dawg.CompletionDAWG.__init__ (src/dawg.cpp:5795)
File "dawg.pyx", line 37, in dawg.DAWG.__init__ (src/dawg.cpp:1982)
File "dawg.pyx", line 472, in genexpr (src/dawg.cpp:7722)
TypeError: Expected unicode, got str
Python 2.7, dawg freshly installed with pip.
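Both error messages are consistent with each plain string being unpacked as a (key, value) pair of characters: u'foo' yields key u'f' and value u'o' ("Expected bytes, got unicode"), and 'foo' yields the str key 'f' ("Expected unicode, got str"). That suggests BytesDAWG expects (unicode key, bytes value) pairs; for keys without values, the other examples on this page use dawg.CompletionDAWG(keys). This reading is my inference from the messages. Below is a hypothetical pre-check (Python 3, where str is unicode) that raises a clearer error lazily, without materializing the input.

```python
def checked_pairs(items):
    """Hypothetical helper: lazily check that items are (str key, bytes value) pairs."""
    for item in items:
        if not (isinstance(item, tuple) and len(item) == 2):
            raise TypeError('expected a (key, value) pair, got %r' % (item,))
        key, value = item
        if not isinstance(key, str):
            raise TypeError('key must be str, got %s' % type(key).__name__)
        if not isinstance(value, bytes):
            raise TypeError('value must be bytes, got %s' % type(value).__name__)
        yield item


good = list(checked_pairs([(u'foo', b'')]))

try:
    list(checked_pairs([u'foo']))
except TypeError as exc:
    print(exc)  # expected a (key, value) pair, got 'foo'
```

It would wrap the input as `dawg.BytesDAWG(checked_pairs(data))`.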
Currently I am using DAWG, but I am unable to pickle the objects and hence cannot cache them in memcached.
I am loading a list of Django models and creating a DAWG object from the names and locations.
As this is a time-consuming operation, I am trying to cache it, and I assume it is pickled before it goes into memcache. Django is using cPickle, but I get the same error message with both pickle and cPickle. When setting the cache, the cache item is empty.
So at the moment I can't pickle or cache.
Is this a bug?
Are there any work arounds?
DAWG==0.7.8
Python 2.7.11 in virtualenv on mac os x
Error from django shell
>>> d2 = pickle.loads(data)
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 1388, in loads
return Unpickler(file).load()
File "/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 864, in load
dispatch[key](self)
File "/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 1139, in load_reduce
value = func(*args)
File "dawg.pyx", line 820, in dawg.RecordDAWG.__init__ (src/dawg.cpp:13552)
TypeError: __init__() takes at least 1 positional argument (0 given)
Code (Django):
from django.core.cache import cache
from models import Competition
import dawg
import pickle

comps = Competition.objects.all()
keys = list()
values = list()
if comps:
    for comp in comps:
        keys.append(comp.name.lower())
        values.append((comp.id, 1))
        if comp.address_city is not None:
            keys.append(comp.address_city.lower())
            values.append((0, 3))

data = zip(keys, values)
fmt = "II"
record_dawg = dawg.RecordDAWG(fmt, data)
completion_dawg = dawg.CompletionDAWG(keys)

data = pickle.dumps(record_dawg)
d2 = pickle.loads(data)
cache.set('record_dawg', record_dawg, 2000)
cache.get('record_dawg')
When I look at the data variable, it shows pickled data, but when I try to load it, it fails with the above error.
The last lines are straight from the documentation. Can someone please have a look and check whether this is a real bug, or whether I'm just not seeing what is wrong?
L.
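The traceback shows unpickling calling RecordDAWG.__init__() with no arguments, so the format string is evidently not part of what gets pickled. A workaround that avoids pickling the object itself is to cache the format together with the raw DAWG bytes and rebuild on read. This relies on tobytes() ("Return raw DAWG content as bytes", per the Cython diff on this page) and on frombytes(); I'm assuming frombytes() returns the loaded object the way load() does in the pymorphy2 traceback on this page. The helper names are hypothetical.

```python
def to_cache_payload(fmt, record_dawg):
    """Return a plain (format, raw bytes) tuple that pickle and memcached can handle."""
    return (fmt, record_dawg.tobytes())


def from_cache_payload(payload, dawg_class):
    """Rebuild the DAWG; dawg_class would be dawg.RecordDAWG (frombytes() assumed to return self)."""
    fmt, raw = payload
    return dawg_class(fmt).frombytes(raw)
```

In the Django code this would become `cache.set('record_dawg', to_cache_payload(fmt, record_dawg), 2000)` and later `record_dawg = from_cache_payload(cache.get('record_dawg'), dawg.RecordDAWG)`.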
I'm building BytesDAWGs of increasing size. The first several are fine, but once they exceed a certain size they all fail with the "Can't build dictionary" exception (see the traceback below). Prior to version 0.5 this caused a segmentation fault, but the new version raises a Python exception.
It happens after iterating through the data (bundleSubGenerator() generates a sequence of (unicode, bytes) tuples and runs to completion), but before the function returns the DAWG.
These are pretty huge data structures, somewhere over 3GB when written to disk using the dwg.save() method. There are about 55000 keys, each 20 to 50 characters long, and each with a potentially large number of values of between 8 and 300 bytes.
Is this simply an issue of size? It looks like it's getting a 'false' from a function in the dawgdic library, but I can't figure out what the problem is.
Thanks for the wonderful library. I hope this issue at least has a work-around.
---------------------------------------------------------------------------
---> 38 dwg = dawg.BytesDAWG(bundleSubGenerator(bundleDBBasePath+str(y)))
/Library/Frameworks/EPD64.framework/Versions/7.1/lib/python2.7/site-packages/DAWG-0.5-py2.7-macosx-10.5-x86_64.egg/dawg.so in dawg.BytesDAWG.__init__ (src/dawg.cpp:7321)()
/Library/Frameworks/EPD64.framework/Versions/7.1/lib/python2.7/site-packages/DAWG-0.5-py2.7-macosx-10.5-x86_64.egg/dawg.so in dawg.CompletionDAWG.__init__ (src/dawg.cpp:5318)()
/Library/Frameworks/EPD64.framework/Versions/7.1/lib/python2.7/site-packages/DAWG-0.5-py2.7-macosx-10.5-x86_64.egg/dawg.so in dawg.DAWG.__init__ (src/dawg.cpp:1739)()
/Library/Frameworks/EPD64.framework/Versions/7.1/lib/python2.7/site-packages/DAWG-0.5-py2.7-macosx-10.5-x86_64.egg/dawg.so in dawg.DAWG._build_from_iterable (src/dawg.cpp:1978)()
Exception: Can't build dictionary
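To compare the builds that succeed with the ones that fail, it may help to estimate how much data the builder receives. Since BytesDAWG encodes values with binascii.b2a_base64 (per the report about the trailing '\n' on this page), each value costs about 4/3 of its length plus a newline, on top of the key. The sketch below assumes a one-byte separator between key and encoded value; it counts input bytes, not DAWG units, so it says nothing definite about where the builder's limit lies.

```python
from binascii import b2a_base64

SEPARATOR_LEN = 1  # assumption: a one-byte separator between key and encoded value


def estimated_input_bytes(pairs):
    """Rough size of the key + encoded-payload strings a BytesDAWG builder would receive."""
    total = 0
    for key, value in pairs:
        total += len(key.encode('utf-8')) + SEPARATOR_LEN + len(b2a_base64(value))
    return total


# Per-value cost at the two ends of the range in the report (8 and 300 bytes).
print(len(b2a_base64(b'\x00' * 8)), len(b2a_base64(b'\x00' * 300)))
```

Running `estimated_input_bytes(bundleSubGenerator(path))` on a fresh generator for each dataset size would show how the input grows.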
TypeError: object of type 'dawg.IntCompletionDAWG' has no len()
Could you implement len()? Is it available in the underlying C++ library? There are references to size().
I don't know where to report this: pymorphy2, here, or somewhere else.
DAWG 0.7.2
pymorphy2 0.5
pymorphy2-dicts from 10/11/2013 (self-built)
This is what I get::
File "./django_pymorphy2/morph.py", line 8, in <module>
morph = MorphAnalyzer()
File "/usr/lib64/python2.7/site-packages/pymorphy2/analyzer.py", line 168, in __init__
self.prob_estimator = probability_estimator_cls(path)
File "/usr/lib64/python2.7/site-packages/pymorphy2/analyzer.py", line 65, in __init__
self.p_t_given_w = ConditionalProbDistDAWG().load(cpd_path)
File "dawg.pyx", line 387, in dawg.CompletionDAWG.load (src/dawg.cpp:7121)
IOError: It's not possible to read file stream
What could be the cause?
I know this is a work in progress, but I'm very excited by your project and can't wait to start using it. Two things are stopping me (the second will be in a separate feature request); the first is that there aren't yet memory-efficient ways to iterate through the DAWG.
Storing things is very efficient, but if I want to walk through all the items in the DAWG, I currently end up with a list of keys, which in my case is too large to hold in memory (which is why I wanted to use the DAWG in the first place).
While I have plenty of Python (and C) development experience, I've never messed with Cython, so I'm not sure I'd be able to help get this implemented correctly any faster than you'd be able to. Otherwise, I'd take care of it myself and issue a pull request.
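Where the wrapper exposes lazy iterators (the Raspberry Pi report on this page uses `bytes_dawg.iteritems('fo')`), items can be processed in fixed-size batches so that only one batch is in memory at a time. A sketch of a generic batching helper; the name `batched` is my own, though Python 3.12 ships a similar `itertools.batched` that yields tuples instead of lists.

```python
import itertools


def batched(iterable, size):
    """Yield lists of up to `size` items from any iterator without materializing it all."""
    iterator = iter(iterable)
    while True:
        chunk = list(itertools.islice(iterator, size))
        if not chunk:
            return
        yield chunk


# Works with any lazy source; a generator stands in here for a DAWG iterator.
for chunk in batched((u'key%d' % i for i in range(5)), 2):
    print(chunk)
```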
I have opened a PR to fix the issue: #44
This fixes an issue that can cause the DAWG to allocate a lot of memory when loading a corrupted file.
Unfortunately, the original dawgdic library appears to have long since ceased to be maintained, so I am opening the PR against the Python wrapper.
To replicate the issue, use the following script:
from dawg import BytesDAWG as dawg

with open('corrupt.dawg', 'w') as f:
    f.write('corrupt!')

try:
    d = dawg().load('corrupt.dawg')
except Exception:
    print('failed, as expected')
If you run this under gtime on OS X, you will see somewhere between 4 and 6 GB of RAM being used:
$ gtime -v python load_corrupted_dawg
failed, as expected
Command being timed: "python load_corrupted_dawg"
User time (seconds): 2.94
System time (seconds): 3.00
Percent of CPU this job got: 84%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:07.04
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 6256108 <------------------------ SIX GIGABYTES
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 258
Minor (reclaiming a frame) page faults: 1890722
Voluntary context switches: 338
Involuntary context switches: 15360
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
After this PR, this number is greatly reduced:
Maximum resident set size (kbytes): 35152
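The general idea behind a fix like this is to check a size field read from the file against the bytes actually available before allocating anything. The layout below (a little-endian uint32 unit count followed by 4-byte units) is an assumption made for illustration only; it is not a description of the dawgdic file format or of what PR #44 changes.

```python
import io
import struct

UNIT_SIZE = 4  # assumed size of one stored unit, in bytes


def read_units_checked(stream, total_size):
    """Read a count-prefixed block of units, refusing sizes the file cannot contain."""
    header = stream.read(4)
    if len(header) < 4:
        raise ValueError('file too short for header')
    (count,) = struct.unpack('<I', header)
    needed = count * UNIT_SIZE
    available = total_size - 4
    if needed > available:
        # Reject before allocating: the header claims more data than the file holds.
        raise ValueError('header claims %d bytes of units, only %d available' % (needed, available))
    return stream.read(needed)


corrupt = b'corrupt!'
try:
    read_units_checked(io.BytesIO(corrupt), len(corrupt))
except ValueError as exc:
    print(exc)
```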
Hi,
I've tried to use dawgdic to create a dictionary from C++ strings and to read it with dawg_python.
It does not show any items and reports an error related to UTF-8 encoding.
I found that dawg_python works only with UTF-8 strings. Am I right?
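If the Python side decodes keys as UTF-8 (the error message suggests so, though I haven't verified it in the source), a dictionary built from C++ strings in another encoding would fail this way. One fix is to re-encode the source strings to UTF-8 before they are inserted with dawgdic. The sketch shows the decode failure and the conversion in Python, using cp1251 as an example source encoding (an assumption); `to_utf8` is a hypothetical helper.

```python
def to_utf8(raw_key, source_encoding):
    """Re-encode a byte-string key from `source_encoding` to UTF-8."""
    return raw_key.decode(source_encoding).encode('utf-8')


legacy_key = u'привет'.encode('cp1251')  # bytes a non-UTF-8 C++ program might have written

try:
    legacy_key.decode('utf-8')
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False

utf8_key = to_utf8(legacy_key, 'cp1251')
print(decodes_as_utf8, utf8_key.decode('utf-8'))
```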
After installing Anaconda3-2019.07-Linux-x86_64.sh for a new user, running pip install DAWG directly fails with the following:
Building wheel for DAWG (setup.py) ... error
ERROR: Complete output from command /home/conda/anaconda3/bin/python -u -c 'import setuptools, tokenize;__file__='"'"'/tmp/pip-install-_lu0pouq/DAWG/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-k8u1ivdk --python-tag cp37:
ERROR: running bdist_wheel
running build
running build_ext
building 'dawg' extension
creating build
creating build/temp.linux-x86_64-3.7
creating build/temp.linux-x86_64-3.7/src
creating build/temp.linux-x86_64-3.7/lib
creating build/temp.linux-x86_64-3.7/lib/b64
gcc -pthread -B /home/conda/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -Ilib -I/home/conda/anaconda3/include/python3.7m -c src/dawg.cpp -o build/temp.linux-x86_64-3.7/src/dawg.o
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
In file included from src/dawg.cpp:266:0:
src/../lib/dawgdic/dictionary-builder.h: In member function ‘bool dawgdic::DictionaryBuilder::BuildDictionary(dawgdic::BaseType, dawgdic::BaseType)’:
src/../lib/dawgdic/dictionary-builder.h:138:5: warning: this ‘if’ clause does not guard... [-Wmisleading-indentation]
if (dawg_.is_merging(dawg_child_index))
^~
src/../lib/dawgdic/dictionary-builder.h:139:53: note: ...this statement, but the latter is misleadingly indented as if it were guarded by the ‘if’
link_table_.Insert(dawg_child_index, offset); {
^
src/dawg.cpp: In function ‘PyObject* __pyx_f_4dawg_9BytesDAWG_items(__pyx_obj_4dawg_BytesDAWG*, int, __pyx_opt_args_4dawg_9BytesDAWG_items*)’:
src/dawg.cpp:11011:37: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (__pyx_t_10 = 0; __pyx_t_10 < __pyx_t_9; __pyx_t_10+=1) {
~~~~~~~~~~~^~~~~~~~~~~
src/dawg.cpp: In function ‘PyObject* __pyx_gb_4dawg_9BytesDAWG_24generator2(__pyx_GeneratorObject*, PyObject*)’:
src/dawg.cpp:11485:35: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (__pyx_t_6 = 0; __pyx_t_6 < __pyx_t_5; __pyx_t_6+=1) {
~~~~~~~~~~^~~~~~~~~~~
src/dawg.cpp: In function ‘PyObject* __pyx_f_4dawg_9BytesDAWG_keys(__pyx_obj_4dawg_BytesDAWG*, int, __pyx_opt_args_4dawg_9BytesDAWG_keys*)’:
src/dawg.cpp:11814:37: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (__pyx_t_10 = 0; __pyx_t_10 < __pyx_t_9; __pyx_t_10+=1) {
~~~~~~~~~~~^~~~~~~~~~~
src/dawg.cpp: In function ‘PyObject* __pyx_gb_4dawg_9BytesDAWG_29generator3(__pyx_GeneratorObject*, PyObject*)’:
src/dawg.cpp:12222:35: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (__pyx_t_6 = 0; __pyx_t_6 < __pyx_t_5; __pyx_t_6+=1) {
~~~~~~~~~~^~~~~~~~~~~
src/dawg.cpp: In function ‘int __Pyx_GetException(PyObject**, PyObject**, PyObject**)’:
src/dawg.cpp:21523:24: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_type’; did you mean ‘curexc_type’?
tmp_type = tstate->exc_type;
^~~~~~~~
curexc_type
src/dawg.cpp:21524:25: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_value’; did you mean ‘curexc_value’?
tmp_value = tstate->exc_value;
^~~~~~~~~
curexc_value
src/dawg.cpp:21525:22: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_traceback’; did you mean ‘curexc_traceback’?
tmp_tb = tstate->exc_traceback;
^~~~~~~~~~~~~
curexc_traceback
src/dawg.cpp:21526:13: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_type’; did you mean ‘curexc_type’?
tstate->exc_type = local_type;
^~~~~~~~
curexc_type
src/dawg.cpp:21527:13: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_value’; did you mean ‘curexc_value’?
tstate->exc_value = local_value;
^~~~~~~~~
curexc_value
src/dawg.cpp:21528:13: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_traceback’; did you mean ‘curexc_traceback’?
tstate->exc_traceback = local_tb;
^~~~~~~~~~~~~
curexc_traceback
src/dawg.cpp: In function ‘void __Pyx_ExceptionSwap(PyObject**, PyObject**, PyObject**)’:
src/dawg.cpp:21550:24: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_type’; did you mean ‘curexc_type’?
tmp_type = tstate->exc_type;
^~~~~~~~
curexc_type
src/dawg.cpp:21551:25: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_value’; did you mean ‘curexc_value’?
tmp_value = tstate->exc_value;
^~~~~~~~~
curexc_value
src/dawg.cpp:21552:22: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_traceback’; did you mean ‘curexc_traceback’?
tmp_tb = tstate->exc_traceback;
^~~~~~~~~~~~~
curexc_traceback
src/dawg.cpp:21553:13: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_type’; did you mean ‘curexc_type’?
tstate->exc_type = *type;
^~~~~~~~
curexc_type
src/dawg.cpp:21554:13: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_value’; did you mean ‘curexc_value’?
tstate->exc_value = *value;
^~~~~~~~~
curexc_value
src/dawg.cpp:21555:13: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_traceback’; did you mean ‘curexc_traceback’?
tstate->exc_traceback = *tb;
^~~~~~~~~~~~~
curexc_traceback
src/dawg.cpp: In function ‘void __Pyx_ExceptionSave(PyObject**, PyObject**, PyObject**)’:
src/dawg.cpp:21568:21: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_type’; did you mean ‘curexc_type’?
*type = tstate->exc_type;
^~~~~~~~
curexc_type
src/dawg.cpp:21569:22: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_value’; did you mean ‘curexc_value’?
*value = tstate->exc_value;
^~~~~~~~~
curexc_value
src/dawg.cpp:21570:19: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_traceback’; did you mean ‘curexc_traceback’?
*tb = tstate->exc_traceback;
^~~~~~~~~~~~~
curexc_traceback
src/dawg.cpp: In function ‘void __Pyx_ExceptionReset(PyObject*, PyObject*, PyObject*)’:
src/dawg.cpp:21582:24: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_type’; did you mean ‘curexc_type’?
tmp_type = tstate->exc_type;
^~~~~~~~
curexc_type
src/dawg.cpp:21583:25: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_value’; did you mean ‘curexc_value’?
tmp_value = tstate->exc_value;
^~~~~~~~~
curexc_value
src/dawg.cpp:21584:22: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_traceback’; did you mean ‘curexc_traceback’?
tmp_tb = tstate->exc_traceback;
^~~~~~~~~~~~~
curexc_traceback
src/dawg.cpp:21585:13: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_type’; did you mean ‘curexc_type’?
tstate->exc_type = type;
^~~~~~~~
curexc_type
src/dawg.cpp:21586:13: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_value’; did you mean ‘curexc_value’?
tstate->exc_value = value;
^~~~~~~~~
curexc_value
src/dawg.cpp:21587:13: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_traceback’; did you mean ‘curexc_traceback’?
tstate->exc_traceback = tb;
^~~~~~~~~~~~~
curexc_traceback
error: command 'gcc' failed with exit status 1
----------------------------------------
ERROR: Failed building wheel for DAWG
Running setup.py clean for DAWG
Failed to build DAWG
Installing collected packages: DAWG
Running setup.py install for DAWG ... error
ERROR: Complete output from command /home/conda/anaconda3/bin/python -u -c 'import setuptools, tokenize;__file__='"'"'/tmp/pip-install-_lu0pouq/DAWG/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-6jwkhmli/install-record.txt --single-version-externally-managed --compile:
ERROR: running install
error: command 'gcc' failed with exit status 1
----------------------------------------
ERROR: Command "/home/conda/anaconda3/bin/python -u -c 'import setuptools, tokenize;__file__='"'"'/tmp/pip-install-_lu0pouq/DAWG/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-6jwkhmli/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-install-_lu0pouq/DAWG/
I'm using MinGW to compile dawg on Windows with the command 'python setup.py install'.
I've had no problems compiling other modules with MinGW on Windows.
When I try to import dawg, I get this error:
Python 2.7.6 (default, Nov 10 2013, 19:24:18) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import dawg
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "build\bdist.win32\egg\dawg.py", line 7, in <module>
  File "build\bdist.win32\egg\dawg.py", line 6, in __bootstrap__
ImportError: DLL load failed: The specified procedure could not be found.
This is the output when I install dawg:
running install
running bdist_egg
running egg_info
writing DAWG.egg-info\PKG-INFO
writing top-level names to DAWG.egg-info\top_level.txt
writing dependency_links to DAWG.egg-info\dependency_links.txt
reading manifest file 'DAWG.egg-info\SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'DAWG.egg-info\SOURCES.txt'
installing library code to build\bdist.win32\egg
running install_lib
running build_ext
building 'dawg' extension
C:\MinGW\bin\gcc.exe -mdll -O -Wall -Ilib -Ic:\kivy18\Python27\include -Ic:\kivy18\Python27\PC -c src\b64_decode.cpp -o build\temp.win32-2.7\Release\src\b64_decode.o
C:\MinGW\bin\gcc.exe -mdll -O -Wall -Ilib -Ic:\kivy18\Python27\include -Ic:\kivy18\Python27\PC -c src\dawg.cpp -o build\temp.win32-2.7\Release\src\dawg.o
src\dawg.cpp: In function 'PyObject* __pyx_f_4dawg_4DAWG_tobytes(__pyx_obj_4dawg_DAWG*, int)':
src\dawg.cpp:2877:7: warning: variable '__pyx_t_3' set but not used [-Wunused-but-set-variable]
src\dawg.cpp: In function 'PyObject* __pyx_f_4dawg_14CompletionDAWG_tobytes(__pyx_obj_4dawg_CompletionDAWG*, int)':
src\dawg.cpp:6856:7: warning: variable '__pyx_t_3' set but not used [-Wunused-but-set-variable]
src\dawg.cpp: In function 'PyObject* __pyx_f_4dawg_9BytesDAWG_items(__pyx_obj_4dawg_BytesDAWG*, int, __pyx_opt_args_4dawg_9BytesDAWG_items*)':
src\dawg.cpp:10063:37: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
src\dawg.cpp: In function 'PyObject* __pyx_gb_4dawg_9BytesDAWG_24generator2(__pyx_GeneratorObject*, PyObject*)':
src\dawg.cpp:10538:37: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
src\dawg.cpp: In function 'PyObject* __pyx_f_4dawg_9BytesDAWG_keys(__pyx_obj_4dawg_BytesDAWG*, int, __pyx_opt_args_4dawg_9BytesDAWG_keys*)':
src\dawg.cpp:10847:37: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
src\dawg.cpp: In function 'PyObject* __pyx_gb_4dawg_9BytesDAWG_29generator3(__pyx_GeneratorObject*, PyObject*)':
src\dawg.cpp:11256:37: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
src\dawg.cpp: In function 'void __Pyx_RaiseArgtupleInvalid(const char*, int, Py_ssize_t, Py_ssize_t, Py_ssize_t)':
src\dawg.cpp:19641:59: warning: unknown conversion type character 'z' in format [-Wformat]
src\dawg.cpp:19641:59: warning: format '%s' expects argument of type 'char*', but argument 5 has type 'Py_ssize_t {aka int}' [-Wformat]
src\dawg.cpp:19641:59: warning: unknown conversion type character 'z' in format [-Wformat]
src\dawg.cpp:19641:59: warning: too many arguments for format [-Wformat-extra-args]
src\dawg.cpp: In function 'void __Pyx_RaiseTooManyValuesError(Py_ssize_t)':
src\dawg.cpp:19665:94: warning: unknown conversion type character 'z' in format [-Wformat]
src\dawg.cpp:19665:94: warning: too many arguments for format [-Wformat-extra-args]
src\dawg.cpp: In function 'void __Pyx_RaiseNeedMoreValuesError(Py_ssize_t)':
src\dawg.cpp:19671:48: warning: unknown conversion type character 'z' in format [-Wformat]
src\dawg.cpp:19671:48: warning: format '%s' expects argument of type 'char*', but argument 3 has type 'Py_ssize_t {aka int}' [-Wformat]
src\dawg.cpp:19671:48: warning: too many arguments for format [-Wformat-extra-args]
src\dawg.cpp: In function 'PyObject* __pyx_f_4dawg_9BytesDAWG_keys(__pyx_obj_4dawg_BytesDAWG*, int, __pyx_opt_args_4dawg_9BytesDAWG_keys*)':
src\dawg.cpp:20133:5: warning: '__pyx_v_i' may be used uninitialized in this function [-Wmaybe-uninitialized]
src\dawg.cpp:10687:7: note: '__pyx_v_i' was declared here
src\dawg.cpp: In function 'PyObject* __pyx_f_4dawg_9BytesDAWG_items(__pyx_obj_4dawg_BytesDAWG*, int, __pyx_opt_args_4dawg_9BytesDAWG_items*)':
src\dawg.cpp:20133:5: warning: '__pyx_v_i' may be used uninitialized in this function [-Wmaybe-uninitialized]
src\dawg.cpp:9898:7: note: '__pyx_v_i' was declared here
C:\MinGW\bin\gcc.exe -mdll -O -Wall -Ilib -Ic:\kivy18\Python27\include -Ic:\kivy18\Python27\PC -c src\iostream.cpp -o build\temp.win32-2.7\Release\src\iostream.o
C:\MinGW\bin\gcc.exe -mdll -O -Wall -Ilib -Ic:\kivy18\Python27\include -Ic:\kivy18\Python27\PC -c src\_base_types.cpp -o build\temp.win32-2.7\Release\src\_base_types.o
C:\MinGW\bin\gcc.exe -mdll -O -Wall -Ilib -Ic:\kivy18\Python27\include -Ic:\kivy18\Python27\PC -c src\_completer.cpp -o build\temp.win32-2.7\Release\src\_completer.o
C:\MinGW\bin\gcc.exe -mdll -O -Wall -Ilib -Ic:\kivy18\Python27\include -Ic:\kivy18\Python27\PC -c src\_dawg.cpp -o build\temp.win32-2.7\Release\src\_dawg.o
C:\MinGW\bin\gcc.exe -mdll -O -Wall -Ilib -Ic:\kivy18\Python27\include -Ic:\kivy18\Python27\PC -c src\_dawg_builder.cpp -o build\temp.win32-2.7\Release\src\_dawg_builder.o
C:\MinGW\bin\gcc.exe -mdll -O -Wall -Ilib -Ic:\kivy18\Python27\include -Ic:\kivy18\Python27\PC -c src\_dictionary.cpp -o build\temp.win32-2.7\Release\src\_dictionary.o
C:\MinGW\bin\gcc.exe -mdll -O -Wall -Ilib -Ic:\kivy18\Python27\include -Ic:\kivy18\Python27\PC -c src\_dictionary_builder.cpp -o build\temp.win32-2.7\Release\src\_dictionary_builder.o
C:\MinGW\bin\gcc.exe -mdll -O -Wall -Ilib -Ic:\kivy18\Python27\include -Ic:\kivy18\Python27\PC -c src\_dictionary_unit.cpp -o build\temp.win32-2.7\Release\src\_dictionary_unit.o
C:\MinGW\bin\gcc.exe -mdll -O -Wall -Ilib -Ic:\kivy18\Python27\include -Ic:\kivy18\Python27\PC -c src\_guide.cpp -o build\temp.win32-2.7\Release\src\_guide.o
C:\MinGW\bin\gcc.exe -mdll -O -Wall -Ilib -Ic:\kivy18\Python27\include -Ic:\kivy18\Python27\PC -c src\_guide_builder.cpp -o build\temp.win32-2.7\Release\src\_guide_builder.o
C:\MinGW\bin\gcc.exe -mdll -O -Wall -Ilib -Ic:\kivy18\Python27\include -Ic:\kivy18\Python27\PC -c src\_guide_unit.cpp -o build\temp.win32-2.7\Release\src\_guide_unit.o
C:\MinGW\bin\gcc.exe -mdll -O -Wall -Ilib -Ic:\kivy18\Python27\include -Ic:\kivy18\Python27\PC -c lib/b64\cdecode.c -o build\temp.win32-2.7\Release\lib\b64\cdecode.o
C:\MinGW\bin\gcc.exe -mdll -O -Wall -Ilib -Ic:\kivy18\Python27\include -Ic:\kivy18\Python27\PC -c lib/b64\cencode.c -o build\temp.win32-2.7\Release\lib\b64\cencode.o
writing build\temp.win32-2.7\Release\src\dawg.def
creating build\lib.win32-2.7
C:\MinGW\bin\g++.exe -shared -s build\temp.win32-2.7\Release\src\b64_decode.o build\temp.win32-2.7\Release\src\dawg.o build\temp.win32-2.7\Release\src\iostream.o build\temp.win32-2.7\Release\src\_base_types.o build\temp.win32-2.7\Release\src\_completer.o build\temp.win32-2.7\Release\src\_dawg.o build\temp.win32-2.7\Release\src\_dawg_builder.o build\temp.win32-2.7\Release\src\_dictionary.o build\temp.win32-2.7\Release\src\_dictionary_builder.o build\temp.win32-2.7\Release\src\_dictionary_unit.o build\temp.win32-2.7\Release\src\_guide.o build\temp.win32-2.7\Release\src\_guide_builder.o build\temp.win32-2.7\Release\src\_guide_unit.o build\temp.win32-2.7\Release\lib\b64\cdecode.o build\temp.win32-2.7\Release\lib\b64\cencode.o build\temp.win32-2.7\Release\src\dawg.def -Lc:\kivy18\Python27\libs -Lc:\kivy18\Python27\PCbuild -lpython27 -lmsvcr90 -o build\lib.win32-2.7\dawg.pyd
creating build\bdist.win32
creating build\bdist.win32\egg
copying build\lib.win32-2.7\dawg.pyd -> build\bdist.win32\egg
creating stub loader for dawg.pyd
byte-compiling build\bdist.win32\egg\dawg.py to dawg.pyc
creating build\bdist.win32\egg\EGG-INFO
copying DAWG.egg-info\PKG-INFO -> build\bdist.win32\egg\EGG-INFO
copying DAWG.egg-info\SOURCES.txt -> build\bdist.win32\egg\EGG-INFO
copying DAWG.egg-info\dependency_links.txt -> build\bdist.win32\egg\EGG-INFO
copying DAWG.egg-info\top_level.txt -> build\bdist.win32\egg\EGG-INFO
writing build\bdist.win32\egg\EGG-INFO\native_libs.txt
zip_safe flag not set; analyzing archive contents...
creating dist
creating 'dist\DAWG-0.7.5-py2.7-win32.egg' and adding 'build\bdist.win32\egg' to it
removing 'build\bdist.win32\egg' (and everything under it)
Processing DAWG-0.7.5-py2.7-win32.egg
Copying DAWG-0.7.5-py2.7-win32.egg to c:\kivy18\python27\lib\site-packages
Adding DAWG 0.7.5 to easy-install.pth file
Installed c:\kivy18\python27\lib\site-packages\dawg-0.7.5-py2.7-win32.egg
Processing dependencies for DAWG==0.7.5
Finished processing dependencies for DAWG==0.7.5
Hi there. I am using the latest version of the library (da01324), but I am running into problems. I don't know whether it's just me or an actual bug in the code; I'm not sure, since the package doesn't have much documentation.
I have a simple txt file containing a lexicographically sorted set of strings. I just want to create a DAWG object containing the whole set. Here's what I'm doing:
>>> import dawg
>>> d = dawg.DAWG()
>>> d.load("..filename..")
Program received signal SIGSEGV, Segmentation fault.
__pyx_pf_4dawg_4DAWG_16read (__pyx_v_f=0xb7d85390, __pyx_v_self=0xb7b10158) at src/dawg.cpp:2407
2407 __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
(gdb) bt
#0 __pyx_pf_4dawg_4DAWG_16read (__pyx_v_f=0xb7d85390, __pyx_v_self=0xb7b10158) at src/dawg.cpp:2407
#1 __pyx_pw_4dawg_4DAWG_17read (__pyx_v_self=0xb7b10158, __pyx_v_f=0xb7d85390) at src/dawg.cpp:2366
#2 0x0806303d in PyObject_Call (func=0xb7b236ac, arg=0xb7b1482c, kw=0x0) at Objects/abstract.c:2529
#3 0xb7a68410 in __pyx_pf_4dawg_4DAWG_20load (__pyx_v_path=0xb7d853e8, __pyx_v_self=0x825f5ac) at src/dawg.cpp:2587
#4 __pyx_pw_4dawg_4DAWG_21load (__pyx_v_self=0xb7b10158, __pyx_v_path=0xb7d853e8) at src/dawg.cpp:2500
#5 0x080dbb84 in call_function (oparg=<optimized out>, pp_stack=0xbffff0b4) at Python/ceval.c:4009
#6 PyEval_EvalFrameEx (f=0x825f5ac, throwflag=0) at Python/ceval.c:2666
#7 0x080dcaa1 in PyEval_EvalCodeEx (co=0xb7b1a068, globals=0xb7db835c, locals=0xb7db835c, args=0x0, argcount=0, kws=0x0,
kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:3253
#8 0x080dcbf7 in PyEval_EvalCode (co=0xb7b1a068, globals=0xb7db835c, locals=0xb7db835c) at Python/ceval.c:667
#9 0x080fe41d in run_mod (arena=0x8244fb0, flags=0xbffff34c, locals=0xb7db835c, globals=0xb7db835c,
filename=0x8154c30 "<stdin>", mod=0x825b778) at Python/pythonrun.c:1353
#10 PyRun_InteractiveOneFlags (fp=0xb7f6aac0, filename=0x8154c30 "<stdin>", flags=0xbffff34c) at Python/pythonrun.c:852
#11 0x080fe618 in PyRun_InteractiveLoopFlags (fp=0xb7f6aac0, filename=0x8154c30 "<stdin>", flags=0xbffff34c)
at Python/pythonrun.c:772
#12 0x080fe737 in PyRun_AnyFileExFlags (fp=0xb7f6aac0, filename=<optimized out>, closeit=0, flags=0xbffff34c)
at Python/pythonrun.c:741
#13 0x08059d07 in Py_Main (argc=1, argv=0xbffff424) at Modules/main.c:639
#14 0x08058d6b in main (argc=1, argv=0xbffff424) at ./Modules/python.c:23
(gdb) q
A debugging session is active.
Inferior 1 [process 24546] will be killed.
Quit anyway? (y or n) y
Neither DAWG.load nor DAWG.read seems to work. I also tried regenerating the Cython-generated source code with the script ./update_cpp.sh,
but then I get a different error while parsing the .pyx file. For reference, I am using Cython 0.15.1.
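One likely cause, offered as an assumption rather than a diagnosis: load() and read() seem to expect the binary format written by save()/write(), not a plain-text word list, and a DAWG is normally built by passing the keys to the constructor. A minimal sketch under that assumption follows; `read_sorted_words` and `build_and_save` are hypothetical helper names, and only `dawg.DAWG(keys)`, `save()` and `load()` come from the library.

```python
def read_sorted_words(path, encoding="utf-8"):
    """Read one key per line from a plain-text file, skipping blank lines."""
    with open(path, encoding=encoding) as f:
        words = [line.rstrip("\r\n") for line in f]
    return sorted(w for w in words if w)


def build_and_save(text_path, dawg_path):
    """Build a DAWG from a word list, save it, then load the saved file back."""
    import dawg  # imported lazily so read_sorted_words() works without the extension

    d = dawg.DAWG(read_sorted_words(text_path))
    d.save(dawg_path)  # writes the binary format that load() expects

    loaded = dawg.DAWG()
    loaded.load(dawg_path)
    return loaded
```

With this approach the text file is only ever parsed by Python, and load() is pointed at the file produced by save().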
Hi. Would it be possible to publish wheels for Windows to PyPI? https://pypi.python.org/pypi/DAWG
This makes installation faster and more reliable http://pythonwheels.com/
Good day! I'm trying to build DAWG, but I get this error:
Command "/usr/local/opt/python/bin/python3.7 -u -c "import setuptools, tokenize;__file__='/private/var/folders/kj/9fyj5j0n2td6ss6g43f_tbs00000gn/T/pip-install-1ogot1h1/dawg/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /private/var/folders/kj/9fyj5j0n2td6ss6g43f_tbs00000gn/T/pip-record-7al0nz7k/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /private/var/folders/kj/9fyj5j0n2td6ss6g43f_tbs00000gn/T/pip-install-1ogot1h1/dawg/
Maybe someone could help me?
Could you please add it, to simplify installation directly from git?
Tests are available in the git repository, but they are not included in the tarball at PyPI. Please include them. Thanks!
There are actually two related but distinct data structures called DAWG: a type of suffix index, and a compacted trie. I assume this package implements the latter, given that dawgdic refers to Daciuk's papers? If so, it might be worth documenting that.
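To make the "compacted trie" sense concrete, here is a toy sketch that builds a plain trie and then merges equivalent subtrees bottom-up, so common suffixes are shared and the result is an acyclic automaton (a DAFSA). It only illustrates the data structure; it is not how dawgdic builds its automaton, and all names in it are hypothetical.

```python
from typing import Dict


class Node:
    __slots__ = ("final", "edges")

    def __init__(self):
        self.final = False                  # True if a key ends here
        self.edges: Dict[str, "Node"] = {}  # outgoing transitions by character


def build_trie(words):
    root = Node()
    for word in words:
        node = root
        for ch in word:
            node = node.edges.setdefault(ch, Node())
        node.final = True
    return root


def minimize(root):
    """Merge equivalent subtrees bottom-up so identical suffixes share nodes."""
    register = {}  # signature -> canonical node

    def canon(node):
        for ch in list(node.edges):
            node.edges[ch] = canon(node.edges[ch])
        # Children are already canonical, so equal subtrees are the same object.
        key = (node.final, tuple(sorted((ch, id(c)) for ch, c in node.edges.items())))
        return register.setdefault(key, node)

    return canon(root)


def count_nodes(root):
    seen, stack = set(), [root]
    while stack:
        node = stack.pop()
        if id(node) not in seen:
            seen.add(id(node))
            stack.extend(node.edges.values())
    return len(seen)


def contains(root, word):
    node = root
    for ch in word:
        node = node.edges.get(ch)
        if node is None:
            return False
    return node.final


words = ["tap", "taps", "top", "tops"]
print(count_nodes(build_trie(words)))            # 8 nodes as a trie
print(count_nodes(minimize(build_trie(words))))  # 5 nodes once suffixes are shared
```

The suffix-index kind of DAWG, by contrast, recognizes every substring of a single text, which is a different problem from storing a set of keys.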