sztal / pybdm

Python implementation of block decomposition method for approximating algorithmic complexity
Home Page: https://pybdm-docs.readthedocs.io/en/latest/
License: MIT License
Regarding the normalization of the BDM: how exactly is the normalization performed?
I was expecting the normalization to be:
NBDM = sum(BDM) / (|x| * log2(m))
In other words, the summed BDM of all blocks divided by the length of the sequence times log2 of the symbol cardinality m. The results, however, seem to indicate that this is not the case.
The problem is that to compare BDM with other measures (for example other compressors), we need a comparable normalization factor.
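The normalization the question expects can be sketched as follows. This is a minimal illustration of the formula as proposed, not necessarily what pybdm's own normalization does; expected_nbdm is a hypothetical helper name.

```python
import math

def expected_nbdm(bdm_value: float, length: int, nsymbols: int) -> float:
    """Normalization as proposed in the question (not confirmed to be what
    pybdm computes): divide the raw BDM value by the maximal information
    content of the sequence, i.e. length * log2(alphabet size)."""
    return bdm_value / (length * math.log2(nsymbols))

# e.g. a BDM of 28.5757 bits for a 36-symbol binary string, expressed as a
# fraction of the maximal 36 bits the string could carry
print(expected_nbdm(28.5757, 36, 2))
```

Under this convention a value of 1 would mean the sequence is maximally complex for its length and alphabet, which is what makes cross-compressor comparisons possible.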
Hello, what is the recommended way to handle a large 1D array of np.uint8 elements as input, initialized from a binary file using np.memmap()?
It seems the preferred behavior might be to specify nsymbols=256 in the BDM constructor, with each element then mapped internally to a sequence of 8 bits so that the CTM-B2-D12 dataset gets used:
import numpy as np
from pybdm import BDM

# Create a dataset (256 discrete symbols)
X = np.full((1,), 255, dtype=np.uint8)
# Initialize BDM object with a 256-symbol alphabet
bdm = BDM(ndim=1, nsymbols=256)
# Compute BDM
print(bdm.bdm(X))  # this fails, but I would expect it to transform [255] to [1 1 1 1 1 1 1 1] internally
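One possible workaround, assuming the goal is simply to analyze the bytes with the binary CTM data (this is my own suggestion, not an official pybdm recommendation), is to expand the bytes to bits yourself with np.unpackbits before handing the array to a 2-symbol BDM:

```python
import numpy as np

# Hypothetical pre-processing step: expand each uint8 element into its 8 bits
# so the array can be analyzed with the binary (2-symbol) CTM dataset.
X = np.full((4,), 255, dtype=np.uint8)  # stand-in for a memmapped array
bits = np.unpackbits(X)                 # shape (32,), dtype uint8, values 0/1

print(bits[:8])  # first byte expanded: [1 1 1 1 1 1 1 1]
# bits can now be passed to BDM(ndim=1) (default 2-symbol alphabet),
# e.g. bdm.bdm(bits), without needing a 256-symbol CTM dataset.
```

For memmapped inputs too large to unpack at once, the same call can be applied chunk by chunk, since np.unpackbits works element-wise on each byte.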
Hello PyBDM team!
My hope is to use PyBDM to analyze ocean temperature data (my data has 2 decimals, for example T = 13.52, 14.07, ... etc.). Since the data contains 10 different symbols (1, 2, 3, ..., 9 and 0), I can't use your code for non-binary sequences (1D), as it supports a maximum of 9 different symbols. My plan is therefore to remove the decimal points and convert the resulting digit strings into binary (e.g. 13.52, 14.07 -> 1352, 1407 -> 00110001 00110011 00110101 00110010, 00110001 00110100 00110000 00110111, etc.) and use your code for binary sequences (1D). To match this binary data representation I would like to use a block size of 8.
To make sure my PyBDM for Binary sequences (1D) works correctly I tested it against the Online Algorithmic Complexity Calculator (https://complexitycalculator.com) using the standard string (010101010101010101010101010101010101) and standard OACC settings (block size=12, overlap=0, alphabet=2). Both OACC and my PyBDM computed the algorithmic complexity to be 28.5757 bits.
While testing the code I noticed that when converting the sequence 1111111111 (i.e. ten repetitions of ones) into binary (i.e. s=00110001001100010011000100110001001100010011000100110001001100010011000100110001), the PyBDM code for Binary sequences (1D) computes the algorithmic complexity of the binary string s to be 68.71 bits, while the OACC manages to compute the algorithmic complexity of the binary string s to be 25.39 bits (block size=8, overlap=0, alphabet=2).
Furthermore, if I set the OACC to default setting (block size=12, overlap=0, alphabet=2), it computes the algorithmic complexity of the binary string s to be 90.7816 bits, which would wrongly imply that the string is Martin-Löf random since then K(s)/length(s)>1, for K(s)=90.78, length(s)=80.
Is there any way to set the block size to 8 in the PyBDM code for Binary sequences (1D) to replicate the efficient OACC result (25.39 bits)?
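The conversion step described in the question can be sketched as follows. This is a minimal illustration, assuming each digit is encoded via its 8-bit ASCII byte as in the example given; reading_to_bits is a hypothetical helper name.

```python
import numpy as np

def reading_to_bits(temperature: str) -> np.ndarray:
    """Drop the decimal point and expand each digit's ASCII byte to 8 bits,
    matching the encoding described in the question."""
    digits = temperature.replace(".", "")             # "13.52" -> "1352"
    ascii_bytes = np.frombuffer(digits.encode("ascii"), dtype=np.uint8)
    return np.unpackbits(ascii_bytes)                 # 8 bits per digit

bits = reading_to_bits("13.52")
print("".join(map(str, bits)))  # 00110001001100110011010100110010
```

Note that this encoding makes every block begin with the constant prefix 0011, which by itself lowers the apparent complexity of the binary string; a denser mapping (e.g. digits 0-9 to 4-bit values) would avoid that redundancy.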
I've set BDM(ndim=1, nsymbols=4) and I have lots of nucleotide strings. Sometimes not all 4 letters appear in a small snippet of a genome. In this case, the string [0, 2, 3] returns this error:
Computed BDM is 0, dataset may have incorrect dimensions
Is this a bug? Thanks!
Edit: This also happens for longer strings in which only 3 of the 4 letters appear, so it doesn't seem to depend much on the length of the string.
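A plausible explanation for the length-3 case (my reading of the decomposition step, not confirmed against the pybdm source) is that the default partition drops any leftover block shorter than the CTM block width, so a sequence shorter than the block width decomposes into zero blocks and the summed BDM is 0. A toy partition illustrates the effect; the block width of 12 here is hypothetical:

```python
def partition_ignore(seq, width):
    """Toy version of a 'drop the remainder' decomposition: split seq into
    consecutive blocks of exactly `width` symbols, discarding the leftover."""
    return [seq[i:i + width] for i in range(0, len(seq) - width + 1, width)]

# With a hypothetical block width of 12, a 3-symbol nucleotide snippet
# yields no blocks at all -- nothing to look up, so the total BDM is 0.
print(partition_ignore([0, 2, 3], 12))             # []
print(len(partition_ignore(list(range(24)), 12)))  # 2 full blocks
```

If this is indeed the cause, a partition strategy that recursively shrinks leftover blocks (rather than ignoring them) would be the thing to try for short genome snippets.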
As in the title. This should be done in the next release (or just as a patch).
Hi! I am trying to get an algorithmic complexity measure for length-10 binary sequences, but bdm = BDM(ndim=1); bdm.bdm(X) raises a warning implying that every length-10 binary sequence has a BDM of 0. bdm.ent also always yields 0, although it's impossible that the Shannon entropy of every such sequence is zero.
It would perhaps be useful to emit a warning when, for instance, entropy is zero only because the data is short and was decomposed into a single block, as this may be confusing for users who do not know the details of the implementation well.
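The single-block effect can be demonstrated directly. The sketch below computes Shannon entropy over the distribution of blocks, which is my assumption about how pybdm's ent() behaves; block_entropy is a hypothetical helper, not the library's API.

```python
import math
from collections import Counter

def block_entropy(seq, width):
    """Shannon entropy of the distribution of width-sized blocks in seq
    (remainder dropped). With a single block the distribution is degenerate
    and the entropy is exactly 0, whatever the block's content."""
    blocks = [tuple(seq[i:i + width]) for i in range(0, len(seq) - width + 1, width)]
    counts = Counter(blocks)
    total = sum(counts.values())
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())

# A length-10 sequence with block width 10 decomposes into exactly one block,
# so the entropy is 0 even though the sequence itself looks "mixed".
print(block_entropy([0, 1, 0, 1, 1, 0, 0, 1, 1, 0], 10))  # 0.0
print(block_entropy([0, 1, 0, 1, 1, 0, 0, 1, 1, 0], 1))   # 1.0 (five 0s, five 1s)
```

This is why a warning tied to the number of blocks (rather than to the zero value itself) would pinpoint the actual cause for the user.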
As in the title. Should be done in the next non-patch release.