ilanschnell / bitarray
efficient arrays of booleans for Python
License: Other
Thanks, I'm trying to iterate over a Redis bitmap value.
Can you comment on this issue:
http://stackoverflow.com/questions/41027865/extending-bitarray-to-allow-more-arguments
How can I use Cython memoryviews with bitarray?
Hi,
It would be cool to be able to add an integer of arbitrary size.
That is, you could specify how many bits the integer occupies.
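One way to approximate this today is to format the integer as a fixed-width string of '0' and '1' characters first. This is only a minimal sketch: the helper name `int_to_bit_string` is hypothetical and not part of bitarray, and it assumes a big-endian bit order (most significant bit first).

```python
def int_to_bit_string(value, width):
    """Return `value` as a string of exactly `width` '0'/'1' characters.

    Hypothetical helper, not part of bitarray. Most significant bit first.
    """
    if value < 0:
        raise ValueError("value must be non-negative")
    if value >= 1 << width:
        raise ValueError("value does not fit in %d bits" % width)
    if width == 0:
        return ""
    # "0{width}b" pads the binary representation with leading zeros.
    return format(value, "0%db" % width)


print(int_to_bit_string(5, 8))
```

The resulting string can be handed to the bitarray constructor, which accepts '0'/'1' strings (as in `bitarray('0101')`), and the result can then be concatenated with `+`.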
I was evaluating bitarray under pypy3 (6.0.0) and was hit with numerous breaks and tracebacks. When I escalated the problem to the PyPy developers (issue here), the feedback was:
The issue is this piece of code:
static PyTypeObject DecodeIter_Type = {
#ifdef IS_PY3K
    PyVarObject_HEAD_INIT(&DecodeIter_Type, 0)
#else
    PyObject_HEAD_INIT(NULL)
    0,                                        /* ob_size */
#endif
    ...
On py3k, it causes the type of the object returned by .iterdecode() to be an instance of itself. Additionally, PyType_Ready() is never called, which makes the type invalid, per the CPython documentation. This should be fixed in bitarray.
This is consistent with the guidelines for developing extensions.
Hi. I just installed bitarray and have found a segfault. I'm able to reduce it down to this code:
In [1]: from bitarray import *
In [2]: b = bitarray([False, True, False, False, True])
In [3]: t = bitarray([True])
In [4]: b.itersearch(t)
zsh: segmentation fault ipython
Using:
Python 2.7.2+ (default, Jul 20 2012, 22:15:08)
[GCC 4.6.1] on linux2
Installed bitarray using easy_install
In my application, I have thousands of small bitarrays that I need to concatenate to get one large bitarray. There doesn't seem to be an efficient way of doing this.
What I've tried:
# Try #1
# Allocate large bitarray, then assign into it at offsets:
numBitArrays = len(bitArrayList)
lengths = [len(bitArray) for bitArray in bitArrayList]
totalNumBits = sum(lengths)
newBitArray = bitarray.bitarray(totalNumBits)
offset = 0
for i in xrange(numBitArrays):
    length = lengths[i]
    newBitArray[offset : offset + length] = bitArrayList[i]
    offset += length
# Try #2
# Use the + operator to concatenate bitarrays:
newBitArray = bitarray.bitarray(0)
for bitArray in bitArrayList:
    newBitArray += bitArray
# Try #3
# Get the bits as strings, concatenate strings, then create large bitarray from string
bitsList = list()
for bitArray in bitArrayList:
    bits = bitArray.to01()
    bitsList.append(bits)
newBits = ''.join(bitsList)
newBitArray = bitarray.bitarray(newBits)
Surprisingly, the 3rd method seems to be about 25% faster than the others. But it takes more memory and is still a bit slow for my application, where 3000 bitarrays have to be concatenated.
A common need with a bit array is to iterate over all bits that are set (or all bits that are cleared). Right now this is accomplished via arr.itersearch(bitarray('1')), but the internal implementation of search() (which is used by itersearch()) simply compares bit by bit via two loops. This implementation is ultimately very naive, requiring multiple registers, increments, compares, and jumps for every single bit, regardless of whether the bit is set. This could be sped up dramatically by having a dedicated iterator for set bits. Such an implementation could walk 32 or 64 bits at a time and quickly skip over regions containing no set bits at potentially hundreds of times the speed of the current implementation (a single QWORD operation replacing 64 masks, compares, bitshifts, increments, and jumps). It is also possible to use a "count trailing zeros" and/or "count leading zeros" operation to further reduce the number of bitwise comparisons required in many cases (see https://lemire.me/blog/2018/02/21/iterating-over-set-bits-quickly/).
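The word-at-a-time idea can be illustrated in pure Python. This is only a sketch of the technique, not bitarray's implementation: it works on a plain bytes object, the function name `iter_set_bits` is hypothetical, and it assumes a little-endian bit order (bit i lives in byte i // 8, counted from the least significant bit). A real speedup would need the same loop in C.

```python
def iter_set_bits(data, word_bytes=8):
    """Yield the indices of set bits in `data` (bytes), lowest index first.

    Hypothetical helper. Assumes bit i is stored in byte i // 8 at
    position i % 8, counted from the least significant bit.
    """
    for offset in range(0, len(data), word_bytes):
        # Load up to `word_bytes` bytes as one integer "word".
        word = int.from_bytes(data[offset:offset + word_bytes], "little")
        base = offset * 8
        while word:  # an all-zero word is skipped with a single test
            lowest = word & -word                  # isolate the lowest set bit
            yield base + lowest.bit_length() - 1   # its position (trailing zeros)
            word ^= lowest                         # clear it and continue


print(list(iter_set_bits(bytes([0b00000101, 0, 0b10000000]))))
```

The inner loop runs once per set bit rather than once per bit, which is where the gain for sparse arrays would come from.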
I use the bitarray module to transform a DNA sequence, that is written in a binary file, to its reverse complement.
Each nucleotide is represented by two bits in the following format: A - 00, C - 01, G - 10, T - 11.
For example, the reverse complement of AGCTACGG (00 10 01 11 00 01 10 10) would be CCGTAGCT (01 01 10 11 00 10 01 11).
This sequence takes up exactly 16 bits (2 bytes), but a sequence of length 9 would take 18 bits and it is padded to take up 24 bits (3 bytes).
So it would be great if you could specify an offset that selects which part of the bitarray to transform (since there is no need to modify the padding bits).
At the moment, the following steps are needed to transform a bitarray to its reverse complement:
copy the bitarray
reverse the bitarray
invert the bitarray
use a for loop to shuffle bits (e.g. 01 to 10 and vice versa)
And if the bitarray has padding bits:
copy the bitarray without padding
reverse the bitarray
invert the bitarray
use a for loop to shuffle bits (e.g. 01 to 10 and vice versa)
reverse the bitarray
use fill() function
reverse the bitarray again
A reversecomp() feature would make this process much easier and probably a lot faster too.
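The transformation described above can be sketched on a plain '0'/'1' string. With the stated encoding (A = 00, C = 01, G = 10, T = 11) the complement of a base is its bitwise inverse, so reversing the order of the 2-bit pairs and then flipping every bit gives the reverse complement. The function name `reverse_complement_bits` is hypothetical, and the sketch assumes the padding bits have already been removed.

```python
def reverse_complement_bits(bits):
    """Reverse-complement a '0'/'1' string that holds 2 bits per nucleotide.

    Hypothetical helper. Assumes A=00, C=01, G=10, T=11 and no padding bits.
    """
    if len(bits) % 2:
        raise ValueError("expected an even number of bits (2 per nucleotide)")
    # Split into 2-bit pairs so that reversing keeps each pair intact.
    pairs = [bits[i:i + 2] for i in range(0, len(bits), 2)]
    flip = str.maketrans("01", "10")
    # Reverse the pair order, then invert every bit (A<->T, C<->G).
    return "".join(reversed(pairs)).translate(flip)


# AGCTACGG -> CCGTAGCT, the example from the text above
print(reverse_complement_bits("0010011100011010"))
```

The string form is convenient here because a bitarray can be converted with to01() and rebuilt from a '0'/'1' string.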
Is there any way to use bitarray as a daemon, or will it be supported in the future?
It would be great to have pre-built wheels on PyPI.
I am vendoring some here: https://github.com/nexB/scancode-toolkit/tree/1ff61e5858186210dd8393f7ee55c20c30168369/thirdparty/prod#cfc12116b3783dcbe14d11617e0230f8-170bd4072ed3b82dc55e8307cacd50d4982cdf8d and I build them with these "build loops" (on Linux, Mac and Windows through Travis and Appveyor):
I would be willing to help get this baked into the project.
top(count=1) -> bitarray
Cut items from the beginning of the array and return them.
>>> from bitarray import bitarray
>>> bitwise = bitarray('00111110')
>>> bitwise.top(2)
bitarray('00')
>>> bitwise
bitarray('111110')
tail(count=1) -> bitarray
Cut items from the end of the array and return them.
>>> from bitarray import bitarray
>>> bitwise = bitarray('00111110')
>>> bitwise.tail(2)
bitarray('10')
>>> bitwise
bitarray('001111')
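The proposed behaviour can be sketched with slicing and slice deletion. The helpers below are hypothetical stand-ins named `cut_top` and `cut_tail`, written for any mutable sequence; they are shown with a plain list, and whether they work unchanged on a bitarray depends on it supporting slice reads and `del` on slices, which is an assumption here.

```python
def cut_top(bits, count=1):
    """Remove the first `count` items from `bits` in place and return them."""
    if count < 0:
        raise ValueError("count must be non-negative")
    head = bits[:count]
    del bits[:count]
    return head


def cut_tail(bits, count=1):
    """Remove the last `count` items from `bits` in place and return them."""
    if count < 0:
        raise ValueError("count must be non-negative")
    if count == 0:
        return bits[:0]  # bits[-0:] would select everything, so handle 0 here
    tail = bits[-count:]
    del bits[-count:]
    return tail


bits = [0, 0, 1, 1, 1, 1, 1, 0]
print(cut_top(bits, 2), bits)
print(cut_tail(bits, 2), bits)
```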
There is the index function, which finds a value starting from the beginning of the bitarray and returns its index. Is there a way to do the same in reverse, starting from the end of the bitarray?
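Until something like this exists, a reverse search can be written as a plain loop over indices. This is a minimal sketch with a hypothetical helper named `rindex`; it works on any sequence that supports len() and integer indexing, is shown with a list, and will be slow for very large arrays because it runs in Python rather than C.

```python
def rindex(bits, value):
    """Return the highest index i with bits[i] == value.

    Hypothetical helper; raises ValueError if `value` does not occur,
    mirroring what list.index() does for a forward search.
    """
    for i in range(len(bits) - 1, -1, -1):  # walk from the end to the start
        if bits[i] == value:
            return i
    raise ValueError("%r is not in the sequence" % (value,))


print(rindex([0, 1, 0, 1, 0, 0], 1))
```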
from bitarray import bitarray
b = bitarray( 114873544900 )
with open( "test.bitarray", "wb" ) as file:
    b.tofile( file )
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
TypeError: open file expected
Given that dealing with large data is one of bitarray's primary use cases, it should be able to do so.
Can you add support for the << and >> operators? Perhaps look at https://bitbucket.org/quiark/bitarray/src/473cb720d09d/bitarray/_bitarray.c for a reference. Thanks.
Hey owners of bitarray,
Looks like the sdist package of bitarray is missing PKG-INFO, which causes an error when we call pkginfo.SDist(file_path).
ValueError: No PKG-INFO in archive: ../cache/bitarray-0.8.1.tar.gz
https://pypi.python.org/simple/bitarray/
It would be nice if PKG-INFO could be included in the sdist in the next release.
Thanks,
Hao
Currently you can only append/insert new bits. A useful feature would be to set/toggle certain bits or a range of bits, e.g.
bitarr.set(5, True)
bitarr.toggle(5)
bitarr.set_range(5, 10, True)
Hi, I have been using bitarray recently, and I realized that bitarray objects are not registered with the collections.Sequence abstract base class, even though they implement the necessary __len__ and __getitem__.
This makes code that checks for the Container, Iterable, Sized, and Sequence abstract base classes in the collections module fail when using bitarray.
It would be useful to have an immutable counterpart to the bitarray type, analogous to bytes for bytearray, for hashability and because immutable types are safer in important respects.
struct and other similar built-in functions and methods allow conversion to and from raw binary data at offsets, which is useful for dealing with bit arrays embedded inside larger data structures.
Am I correct in assuming that inserts to the middle of the array will require a resize?
It can't hurt to explicitly mention this in the docs.
Thanks for the project.
Mutable objects shouldn't be able to be used as keys, as they can cause collisions and unexpected behavior. I haven't dug into why yet but on Python 2 bitarrays are hashable, allowing programmers to use them as dictionary keys.
To reproduce:
import bitarray
array_one = bitarray.bitarray([True, False])
array_two = bitarray.bitarray([False, False])
dct = {
    array_one: "one",
    array_two: "two",
}
array_three = bitarray.bitarray([True, False])
array_three == array_one # Returns True
dct[array_three] # Should return "one", throws KeyError instead
Here's my test case:
>>> import bitarray, subprocess
>>> f = subprocess.Popen(['zstd', '-qdc', '20151111-rdns.gz.zstd.coverage.bin.zstd'], stdout=subprocess.PIPE).stdout
>>> b = bitarray.bitarray()
>>> b.fromfile(f)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
EOFError: could not find EOF
Only happens under Python2. Not that important since using Py3 is a workaround, but I thought I'd report it anyway.
I know this is outside the spec, but the behaviour is strange:
In : bitarray([True, False])
Out: bitarray('10')
In : bitarray('1') + bitarray('0')
Out: bitarray('10')
In : bitarray('1') + True
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-93-3d90d100a032> in <module>()
----> 1 bitarray('1') + True
TypeError: could not extend bitarray
In : bitarray(True) # here it interprets True as the integer 1 => int(True)
Out: bitarray('1')
In : bitarray(False)
Out: bitarray()
In : a = 2 * bitarray('1') + 2 * bitarray('0') + 2 * bitarray('1') + bitarray(False) + bitarray(True)
In : a
Out: bitarray('1100111')
# the False is missing because bitarray(False) evaluates to bitarray() -> zero length
Do I have to install this?
We have many machines on the (Jenkins) CI server. I guess some of them have an older version of VS C++...
In case I can't upgrade the CI servers soon, can I skip compilation and download only binaries or something?
Hello everyone,
For one of my projects I need to get the cardinality of a bit array, i.e. the number of bits set to 1, as defined in Java's BitSet class.
For now the simplest way of doing this is using sum:
cardinality = sum(1 for i in my_bitarray if i)
My issue is that it is not that fast (between 10 and 50 ms for a million-bit array on my machine), at least not if I need to compute it often.
Is there a better way to get the cardinality?
I suggest making the cardinality an attribute that is incremented or decremented whenever a bit is set or cleared.
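A common way to count set bits faster than a per-bit Python loop is a byte-wise lookup table. The sketch below is pure Python and works on a bytes object; the names `_POPCOUNT` and `cardinality` are hypothetical. Applying it to a bitarray would mean passing something like `my_bitarray.tobytes()`, and it assumes any padding bits in the last byte are zero, which is not guaranteed here. If the installed bitarray version offers a built-in counting method, that is likely the better choice.

```python
# Number of set bits for every possible byte value, computed once.
_POPCOUNT = bytes(bin(i).count("1") for i in range(256))


def cardinality(data):
    """Return the number of set bits in `data`, a bytes-like object.

    Hypothetical helper: each byte is mapped to its bit count with
    bytes.translate(), and the mapped bytes are then summed.
    """
    return sum(bytes(data).translate(_POPCOUNT))


print(cardinality(b"\xff\x01\x00"))
```

Both translate() and sum() over a bytes object run in C, so the per-byte work avoids the interpreter loop that makes the generator expression slow.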
For some tasks it would be useful if bitarray would be able to compute the lexicographically next bit permutation like mentioned on http://www-graphics.stanford.edu/~seander/bithacks.html#NextBitPermutation (at the very bottom). Also some of the other tricks there might be handy as well.
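For illustration, here is the idea on plain Python integers rather than on a bitarray. Note that this sketch uses Gosper's hack, a different formulation from the count-trailing-zeros variant on the linked page, chosen because it needs no special instruction; the function name `next_bit_permutation` is hypothetical.

```python
def next_bit_permutation(v):
    """Return the next larger integer with the same number of set bits as `v`.

    Hypothetical helper implementing Gosper's hack on Python integers.
    """
    if v <= 0:
        raise ValueError("v must be a positive integer")
    lowest = v & -v        # lowest set bit of v
    ripple = v + lowest    # the carry moves the lowest run of ones up by one
    # (ripple ^ v) marks the bits that changed; shifting by 2 and dividing
    # by `lowest` right-aligns the ones that must return to the bottom.
    return ripple | (((ripple ^ v) >> 2) // lowest)


v = 0b00010011
for _ in range(4):
    print(format(v, "08b"))
    v = next_bit_permutation(v)
```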
TypeError: unhashable type: 'bitarray'
It would be useful to be able to use bitarrays as keys in dictionaries, and I don't think implementing this would be too bad, but I'm not familiar with the codebase.
Here's the code that I was trying to use:
bitarray_dictionary = {}
reverse_bitarray_dictionary = {}
for i, base in enumerate(dictionary):
    binary = '{0:b}'.format(i)
    b = bitarray(binary)
    bitarray_dictionary[base] = b
    reverse_bitarray_dictionary[b] = base
which I could then turn into an iterator:
def characters(bitarray):
    for i in range(0, len(bitarray), 2):
        bits = bitarray[i:i+2]
        yield reverse_bitarray_dictionary[bits]
Sincerely,
Wesley
I thought I would get a zero from this code, but a segfault occurs instead:
>>> from bitarray import bitarray
>>> b = bitarray()
>>> int(b)
[1] 23442 segmentation fault (core dumped) python
Tested in Python 2.7.3 on Arch Linux 64 bit, installed through pip.
I came across this lib in http://kmike.ru/python-data-structures/ and got interested since it could replace bitfield in my projects.
Here is a link to Windows installers for different Py versions of bitarray:
http://www.lfd.uci.edu/~gohlke/pythonlibs/#bitarray
I execute the same command twice and get different results. Why would this happen?
I'm using bitarray 0.8.1 with Python 2.7.3 on Ubuntu Linux 12.04 64 bit.
In [301]: x = bitarray(20); x[0:2] = True; x[4:7] = True; x[9:20] = True; x
Out[301]: bitarray('11011110111111111111')
In [302]: x = bitarray(20); x[0:2] = True; x[4:7] = True; x[9:20] = True; x
Out[302]: bitarray('11011110011111111111')
In [303]: x = bitarray(20); x[0:2] = True; x[4:7] = True; x[9:20] = True; x
Out[303]: bitarray('11111110011111111111')
In [304]: x = bitarray(20); x[0:2] = True; x[4:7] = True; x[9:20] = True; x
Out[304]: bitarray('11101110011111111111')
class Sieve(bitarray):
    def __init__(self, init):
        bitarray.__init__(self, init)
        # and more code ...
s=Sieve([False,False,True,True])
fails with
TypeError: bitarray() argument 2 must be str, not list
(surprisingly when calling Sieve constructor, not the bitarray.init ...)
tried various super() variants, no change...
bitarray 0.8.3 (from Anaconda), python 3.7.1, Windows 10
Hi,
I'm using bitarray, and I noticed a change in behavior of tostring.
In 0.3.5, the following code works:
>>> from bitarray import bitarray
>>> bitarray("11111111").tostring()
'\xff'
While in 0.4.0 and HEAD, it doesn't work:
>>> from bitarray import bitarray
>>> bitarray("11111111").tostring()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/.../virtualenv/lib/python2.7/site-packages/bitarray/__init__.py", line 91, in tostring
return self.tobytes().decode()
UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 0: ordinal not in range(128)
I noticed that tobytes() and frombytes() were also introduced in 0.4.0. So my question is: is the change in behavior of tostring() intended, and should I update my code to use tobytes() if bitarray >= 0.4.0 is detected, or is this change of behavior a bug in bitarray?
-Tobi
I get an error when I attempt to install this package.
$ pip install bitarray==0.8.2
Collecting bitarray==0.8.2
Downloading [internal mirror]/bitarray-0.8.2.tar.gz
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/private/var/folders/b2/n_p6v7k57xzbxywzk1kykrnh0000gn/T/pip-build-s3sqrmv4/bitarray/setup.py", line 7, in <module>
kwds['long_description'] = open('README.rst').read()
FileNotFoundError: [Errno 2] No such file or directory: 'README.rst'
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /private/var/folders/b2/n_p6v7k57xzbxywzk1kykrnh0000gn/T/pip-build-s3sqrmv4/bitarray/
You are using pip version 9.0.1, however version 10.0.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
There is no error if I install 0.8.1.
I recently passed a float to bitarray() without realizing it. It crashed with "TypeError: could not extend bitarray", which had me running in circles for a few hours before I found the cause. A simple type check in __init__ may spare others the lost time.
Most of the code deals with things a bit or a byte at a time. Could some speedup be realized by dealing with chunks of 32 bits when possible?
In particular, doing a misaligned copy would be possible by bitshift operations on a 32-bit buffer, rather than doing everything bitwise.
bitarray's __setitem__ checks the existing bounds:
>>> b = bitarray.bitarray()
>>> i = 3
>>> b[i] = True
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: bitarray index out of range
For sparser bitarrays, sometimes it's desirable to set an index (or a sequence of indices) to 1, resizing if necessary to accommodate the extra bits.
This can be done with:
>>> b.extend(itertools.repeat(0, i+1-b.length()))
>>> b[i] = True
>>> b
bitarray('0001')
But it would be nice if there were a clean and efficient way to do that sort of thing.
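The workaround above can be wrapped in a small helper. This is only a sketch: `set_bit_growing` is a hypothetical name, it is written for any mutable sequence with extend() and item assignment, and it is shown here with a plain list rather than a bitarray.

```python
import itertools


def set_bit_growing(bits, index, value=True):
    """Set bits[index] = value, first padding `bits` with zeros if too short.

    Hypothetical helper, not part of bitarray.
    """
    if index < 0:
        raise IndexError("negative indices are not supported in this sketch")
    missing = index + 1 - len(bits)
    if missing > 0:
        # Pad with cleared bits so that `index` becomes a valid position.
        bits.extend(itertools.repeat(0, missing))
    bits[index] = value


b = []
set_bit_growing(b, 3)
print(b)
```

Each out-of-range write still costs one extend() call, so setting many scattered indices is cheaper if the largest index is set first.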
Hi,
I am just thinking out loud and have not thought about it too much, but:
It might be a good idea to detect endianness at runtime so that functions can be implemented in an endian-neutral way. Let's say I have a huge bitset in a database that holds the articles a user has read. I will search for the first free bit to find the first unread article, but the returned index might differ between big-endian and little-endian platforms.
Would it not be a good idea to have endian-neutral functions by default? (Naturally, the user might explicitly change this setting.)
Dumping a bitarray directly via marshal does not store the correct format:
>>> from bitarray import bitarray
>>> b = bitarray('10001101')
>>> b
bitarray('10001101')
>>> import marshal
>>> s = marshal.dumps(b)
>>> c = marshal.loads(s)
>>> c
'\x8d'
>>> b # for comparison
bitarray('10001101')
A workaround is to dump and load bytes:
>>> s = marshal.dumps(b.tobytes())
>>> c = bitarray()
>>> c.frombytes(marshal.loads(s))
>>> c
bitarray('10001101')
>>> b # for comparison
bitarray('10001101')
It would be more convenient, when marshaling bitarrays stored in lists, dicts, etc., if this conversion were not necessary.
It works fine with Python2, but not Python3:
$ python2
Python 2.7.12 (default, Jun 29 2016, 14:05:02)
[GCC 4.2.1 Compatible Apple LLVM 7.3.0 (clang-703.0.31)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from bitarray import bitarray
>>> b = bitarray("000110011110")
>>> m = memoryview(b)
>>>
$ python3
Python 3.5.2 (default, Jul 28 2016, 21:28:00)
[GCC 4.2.1 Compatible Apple LLVM 7.3.0 (clang-703.0.31)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from bitarray import bitarray
>>> b = bitarray("000110011110")
>>> m = memoryview(b)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: memoryview: a bytes-like object is required, not 'bitarray'
>>>
I have used this library, and it is really awesome.
I am just wondering if there is a way to trim zeros from the start and the end of the bitarray, just like trimming white space from a string in other languages.
Thanks a lot,
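A trim of this kind can be sketched with two index scans and one slice. The helper name `trim_zeros` is hypothetical; it works on any sequence that supports len(), indexing and slicing, and is shown with a list. For a bitarray the scans run in Python, so this is a sketch of the logic rather than a fast implementation.

```python
def trim_zeros(bits):
    """Return a copy of `bits` without leading and trailing zero items.

    Hypothetical helper, analogous to str.strip() for whitespace.
    """
    start, end = 0, len(bits)
    while start < end and not bits[start]:    # skip leading zeros
        start += 1
    while end > start and not bits[end - 1]:  # skip trailing zeros
        end -= 1
    return bits[start:end]


print(trim_zeros([0, 0, 1, 0, 1, 0]))
```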
The unittest.TestCase.assert_() alias is deprecated in Python 2.7 and Python 3.2+, so running test_bitarray.py prints 49 warnings like the following:
bitarray/test_bitarray.py:65: DeprecationWarning: Please use assertTrue instead.
self.assert_(0 <= unused < 8)
A quick solution might be to use assertTrue() instead; however, the unittest library also suggests using the purpose-specific functions available in Python 2.7/3.1+ (e.g. assertIs() and assertIsInstance()).
Hi Ilan,
It seems illogical to me (my task being to identify packet contents using a bitarray) that one cannot set the contents of the bitarray from bytes, but can only extend the bitarray. Furthermore, one cannot read the bitarray as an integer, nor construct one from bytes directly. Could you offer any assistance?
Thanks, Angus.
It would be great to have the tests running on the supported OSes on Travis (Linux and Mac) and Appveyor (Windows).
I can help as needed.
https://pypi.io/packages/source/b/bitarray/bitarray-0.8.3.tar.gz doesn't have LICENSE file.
Running setup.py install results in an error. This happens for both 2.7 & 3.4 versions of python
X:>c:/python34/python.exe setup.py install
running install
running build
running build_py
creating build
creating build\lib.win-amd64-3.4
creating build\lib.win-amd64-3.4\bitarray
copying bitarray\test_bitarray.py -> build\lib.win-amd64-3.4\bitarray
copying bitarray\__init__.py -> build\lib.win-amd64-3.4\bitarray
running build_ext
building 'bitarray._bitarray' extension
error: Unable to find vcvarsall.bat
Hello,
Following on the recent problems with 0.8.2, it appears that the latest version is uploaded to pypi as 0.8.2.1, despite the fact that it says at https://pypi.org/project/bitarray/0.8.2/#history it is 0.8.2.
pip install bitarray==0.8.2
does not work. It reports:
Could not find a version that satisfies the requirement bitarray==0.8.2 (from -r foo (line 1)) (from versions: 0.1.0, 0.2.0, 0.2.1, 0.2.2, 0.2.3, 0.2.4, 0.2.5, 0.3.0, 0.3.1, 0.3.2, 0.3.3, 0.3.4, 0.3.5, 0.4.0, 0.5.0, 0.5.1, 0.5.2, 0.6.0, 0.7.0, 0.8.0, 0.8.1, 0.8.2.1)
No matching distribution found for bitarray==0.8.2 (from -r foo (line 1))
Is it possible to deploy this as version 0.8.3, and remove 0.8.2.1 completely from pypi? Or alternatively, fix the 0.8.2 deployment to be saved as 0.8.2?
If n < size:
would call delete_n(size - n, n)
Elif n > size:
would call resize(size + n) and then memset(self->ob_item + old_size , 0x00, n);
If you think this is a good idea, I can submit a PR.
C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\BIN\amd64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -I%PREFIX%\include -I%PREFIX%\include "-IC:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\INCLUDE" "-IC:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\ATLMFC\INCLUDE" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\winrt" -I%PREFIX%\Library\include -I%PREFIX%\Library\include /Tcbitarray/_bitarray.c /Fobuild\temp.win-amd64-3.7\Release\bitarray/_bitarray.obj
_bitarray.c
bitarray/_bitarray.c(2431): error C2099: initializer is not a constant
bitarray/_bitarray.c(2546): error C2099: initializer is not a constant
bitarray/_bitarray.c(2910): error C2099: initializer is not a constant
bitarray/_bitarray.c(3037): error C2099: initializer is not a constant
error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio 14.0\\VC\\BIN\\amd64\\cl.exe' failed with exit status 2
MemoryError after running the following code:
bit_list = []
for i in range(3):
    print (i)
    a = bitarray(2**32)
    bit_list.append(a)
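For scale, the storage this loop asks for can be worked out directly. The sketch assumes one bit of storage per element, rounded up to whole bytes, and ignores object and allocator overhead; the helper name `buffer_bytes` is hypothetical.

```python
def buffer_bytes(nbits):
    """Bytes needed to store `nbits` bits, rounded up to whole bytes."""
    return (nbits + 7) // 8


per_array = buffer_bytes(2**32)   # one bitarray(2**32)
total = 3 * per_array             # the three arrays kept alive in bit_list

print(per_array // 2**20, "MiB per array")
print(total / 2**30, "GiB for three arrays")
```

Whether that total exceeds what the process can allocate depends on the platform and interpreter build, which the report does not state.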
I believe it is currently only set for 2.7:
https://github.com/ilanschnell/bitarray/blob/master/bitarray/_bitarray.c#L51-L54