sztal / pybdm

Python implementation of block decomposition method for approximating algorithmic complexity
Home Page: https://pybdm-docs.readthedocs.io/en/latest/
License: MIT License
Regarding the normalization of the BDM: how exactly is the normalization performed?
I was expecting the normalization to be:
NBDM = sum(BDM) / (|x| * log2(m))
In other words, the summed BDM of all blocks divided by the length of the sequence times log2 of the symbol cardinality m. The results, however, seem to indicate that this is not the case.
The problem is that to compare BDM with other measures (for example other compressors), we need a comparable normalization factor.
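The normalization the question expects can be sketched as follows. This is a minimal illustration of the formula as proposed, not necessarily what pybdm's own normalization does; expected_nbdm is a hypothetical helper name.

```python
import math

def expected_nbdm(bdm_value: float, length: int, nsymbols: int) -> float:
    """Normalization as proposed in the question (not confirmed to be what
    pybdm computes): divide the raw BDM value by the maximal information
    content of the sequence, i.e. length * log2(alphabet size)."""
    return bdm_value / (length * math.log2(nsymbols))

# e.g. a BDM of 28.5757 bits for a 36-symbol binary string, expressed as a
# fraction of the maximal 36 bits the string could carry
print(expected_nbdm(28.5757, 36, 2))
```

Under this convention a value of 1 would mean the sequence is maximally complex for its length and alphabet, which is what makes cross-compressor comparisons possible.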
Hello, what is the recommended way to handle a large 1D array of np.uint8 elements as input, initialized from a binary file using np.memmap()?
It seems the preferred behavior might be to specify nsymbols=256 in the BDM constructor, with each element then mapped internally to a sequence of 8 bits so that the CTM-B2-D12 dataset gets used:
import numpy as np
from pybdm import BDM

# Create a dataset (256 discrete symbols)
X = np.full((1,), 255, dtype=np.uint8)
# Initialize BDM object with a 256-symbol alphabet
bdm = BDM(ndim=1, nsymbols=256)
# Compute BDM
print(bdm.bdm(X))  # this fails, but I would expect it to transform [255] to [1 1 1 1 1 1 1 1] internally
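One possible workaround, assuming the goal is simply to analyze the bytes with the binary CTM data (this is my own suggestion, not an official pybdm recommendation), is to expand the bytes to bits yourself with np.unpackbits before handing the array to a 2-symbol BDM:

```python
import numpy as np

# Hypothetical pre-processing step: expand each uint8 element into its 8 bits
# so the array can be analyzed with the binary (2-symbol) CTM dataset.
X = np.full((4,), 255, dtype=np.uint8)  # stand-in for a memmapped array
bits = np.unpackbits(X)                 # shape (32,), dtype uint8, values 0/1

print(bits[:8])  # first byte expanded: [1 1 1 1 1 1 1 1]
# bits can now be passed to BDM(ndim=1) (default 2-symbol alphabet),
# e.g. bdm.bdm(bits), without needing a 256-symbol CTM dataset.
```

For memmapped inputs too large to unpack at once, the same call can be applied chunk by chunk, since np.unpackbits works element-wise on each byte.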
Hello PyBDM team!
My hope is to use PyBDM to analyze ocean temperature data (my data has 2 decimals, for example T = 13.52, 14.07, ... etc.). Since the data contains 10 different symbols (1, 2, 3, ..., 9 and 0), I can't use your code for non-binary sequences (1D), as it supports a maximum of 9 different symbols. My plan is therefore to remove the decimal points and convert the resulting digit strings into binary (e.g. 13.52, 14.07 -> 1352, 1407 -> 00110001 00110011 00110101 00110010, 00110001 00110100 00110000 00110111, etc.) and use your code for binary sequences (1D). To match this binary data representation I would like to use a block size of 8.
To make sure my PyBDM for Binary sequences (1D) works correctly I tested it against the Online Algorithmic Complexity Calculator (https://complexitycalculator.com) using the standard string (010101010101010101010101010101010101) and standard OACC settings (block size=12, overlap=0, alphabet=2). Both OACC and my PyBDM computed the algorithmic complexity to be 28.5757 bits.
While testing the code I noticed that when converting the sequence 1111111111 (i.e. ten repetitions of ones) into binary (i.e. s=00110001001100010011000100110001001100010011000100110001001100010011000100110001), the PyBDM code for Binary sequences (1D) computes the algorithmic complexity of the binary string s to be 68.71 bits, while the OACC manages to compute the algorithmic complexity of the binary string s to be 25.39 bits (block size=8, overlap=0, alphabet=2).
Furthermore, if I set the OACC to default setting (block size=12, overlap=0, alphabet=2), it computes the algorithmic complexity of the binary string s to be 90.7816 bits, which would wrongly imply that the string is Martin-Löf random since then K(s)/length(s)>1, for K(s)=90.78, length(s)=80.
Is there any way to set the block size to 8 in the PyBDM code for Binary sequences (1D) to replicate the efficient OACC result (25.39 bits)?
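The conversion step described in the question can be sketched as follows. This is a minimal illustration, assuming each digit is encoded via its 8-bit ASCII byte as in the example given; reading_to_bits is a hypothetical helper name.

```python
import numpy as np

def reading_to_bits(temperature: str) -> np.ndarray:
    """Drop the decimal point and expand each digit's ASCII byte to 8 bits,
    matching the encoding described in the question."""
    digits = temperature.replace(".", "")             # "13.52" -> "1352"
    ascii_bytes = np.frombuffer(digits.encode("ascii"), dtype=np.uint8)
    return np.unpackbits(ascii_bytes)                 # 8 bits per digit

bits = reading_to_bits("13.52")
print("".join(map(str, bits)))  # 00110001001100110011010100110010
```

Note that this encoding makes every block begin with the constant prefix 0011, which by itself lowers the apparent complexity of the binary string; a denser mapping (e.g. digits 0-9 to 4-bit values) would avoid that redundancy.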
I've set BDM(ndim=1, nsymbols=4) and I have lots of nucleotide strings. Sometimes not all 4 letters appear in a small snippet of a genome. In this case, the string [0, 2, 3] returns this error:
Computed BDM is 0, dataset may have incorrect dimensions
Is this a bug? Thanks!
Edit: This also happens for longer strings in which only 3 of the 4 letters appear, so it doesn't seem to depend much on the length of the string.
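A plausible explanation for the length-3 case (my reading of the decomposition step, not confirmed against the pybdm source) is that the default partition drops any leftover block shorter than the CTM block width, so a sequence shorter than the block width decomposes into zero blocks and the summed BDM is 0. A toy partition illustrates the effect; the block width of 12 here is hypothetical:

```python
def partition_ignore(seq, width):
    """Toy version of a 'drop the remainder' decomposition: split seq into
    consecutive blocks of exactly `width` symbols, discarding the leftover."""
    return [seq[i:i + width] for i in range(0, len(seq) - width + 1, width)]

# With a hypothetical block width of 12, a 3-symbol nucleotide snippet
# yields no blocks at all -- nothing to look up, so the total BDM is 0.
print(partition_ignore([0, 2, 3], 12))             # []
print(len(partition_ignore(list(range(24)), 12)))  # 2 full blocks
```

If this is indeed the cause, a partition strategy that recursively shrinks leftover blocks (rather than ignoring them) would be the thing to try for short genome snippets.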
As in the title. This should be done in the next release (or just as a patch).
Hi! I am trying to get an algorithmic complexity measure for length-10 binary sequences, but bdm = BDM(ndim=1); bdm.bdm(X) raises a warning implying that every length-10 binary sequence has a BDM of 0. bdm.ent also always yields 0, although it's impossible that the Shannon entropy of every such sequence is zero.
It would perhaps be useful to emit a warning when, for instance, entropy is zero only because the data is short and was decomposed into a single block, as this may be confusing for users who do not know the details of the implementation well.
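The single-block effect can be demonstrated directly. The sketch below computes Shannon entropy over the distribution of blocks, which is my assumption about how pybdm's ent() behaves; block_entropy is a hypothetical helper, not the library's API.

```python
import math
from collections import Counter

def block_entropy(seq, width):
    """Shannon entropy of the distribution of width-sized blocks in seq
    (remainder dropped). With a single block the distribution is degenerate
    and the entropy is exactly 0, whatever the block's content."""
    blocks = [tuple(seq[i:i + width]) for i in range(0, len(seq) - width + 1, width)]
    counts = Counter(blocks)
    total = sum(counts.values())
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())

# A length-10 sequence with block width 10 decomposes into exactly one block,
# so the entropy is 0 even though the sequence itself looks "mixed".
print(block_entropy([0, 1, 0, 1, 1, 0, 0, 1, 1, 0], 10))  # 0.0
print(block_entropy([0, 1, 0, 1, 1, 0, 0, 1, 1, 0], 1))   # 1.0 (five 0s, five 1s)
```

This is why a warning tied to the number of blocks (rather than to the zero value itself) would pinpoint the actual cause for the user.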
As in the title. Should be done in the next non-patch release.