
sztal / pybdm

34 stars · 11 forks · 98.38 MB

Python implementation of block decomposition method for approximating algorithmic complexity

Home Page: https://pybdm-docs.readthedocs.io/en/latest/

License: MIT License

Makefile 1.85% Python 98.15%
algorithmic-complexity algorithmic-information-dynamics algorithmic-information-theory complexity data-science kolmogorov-complexity

pybdm's People

Contributors

kostastsa, sztal



pybdm's Issues

Normalization of the BDM

Regarding the normalization of the BDM: how is the normalization performed?
I was expecting it to be:

    NBDM = BDM(x) / (|x| * log2(s))

In other words, the sum of the BDM values of all blocks divided by the length of the string times log2 of the alphabet size s. The results, however, seem to indicate that this is not the case.
The problem is that, to compare BDM with other measures (for example other compressors), we need a similar normalization factor.
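For reference, the normalization proposed above can be sketched in plain Python. Note this is the asker's proposed formula, not necessarily pybdm's built-in normalization:

```python
from math import log2

def normalized_bdm(bdm_value, length, nsymbols):
    # Proposed normalization: divide the raw BDM value (in bits) by the
    # length of the string times log2 of the alphabet size, i.e. by the
    # number of bits needed to spell the string out literally.
    return bdm_value / (length * log2(nsymbols))

# e.g. a raw BDM of 28.57 bits for a 36-symbol binary string:
print(normalized_bdm(28.57, 36, 2))
```

Under this formula a value near 1 would mean the string is close to incompressible; the question reports that pybdm's output does not match this scaling.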

Large 1D array of np.uint8

Hello, what is the recommended approach for a large 1D array of np.uint8 elements as input, initialized from a binary file using np.memmap()?

It seems the preferred behavior might be to specify nsymbols=256 in the BDM constructor, and then have each element mapped down to a sequence of 8 bits internally so that the CTM-B2-D12 dataset gets used:

    # Create a dataset (256 discrete symbols)
    X = np.full((1,), 255, dtype=np.uint8)

    # Initialize BDM object with 256-symbols alphabet
    bdm = BDM(ndim=1, nsymbols=256)

    # Compute BDM
    print(bdm.bdm(X))  # this fails; I would expect [255] to be transformed to [1 1 1 1 1 1 1 1] internally
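A workaround that avoids a 256-symbol alphabet altogether (pybdm ships no CTM data for 256 symbols, only e.g. CTM-B2-D12 for binary blocks) is to perform the uint8-to-bits mapping explicitly with np.unpackbits before calling the binary BDM. A minimal sketch; the BDM call is commented out since it assumes the setup from the snippet above:

```python
import numpy as np

X = np.full((1,), 255, dtype=np.uint8)
bits = np.unpackbits(X)   # each uint8 becomes its 8 bits, MSB first
print(bits)               # [1 1 1 1 1 1 1 1]

# With the array already binary, the standard binary BDM (CTM-B2-D12) applies:
# bdm = BDM(ndim=1)
# print(bdm.bdm(bits))
```

For np.memmap input, note that unpacking the whole array materializes eight times the data in memory, so processing in chunks may be necessary.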

Defining PyBDM block size

Hello PyBDM team!

My hope is to use PyBDM to analyze ocean temperature data (my data has 2 decimals, e.g. T = 13.52, 14.07, ... etc). Since the data contains 10 different symbols (the digits 0-9), I can't use your code for non-binary sequences (1D), which supports a maximum of 9 different symbols. My plan is therefore to remove the decimal points and convert the decimal data into binary (e.g. 13.52, 14.07 -> 1352, 1407 -> 00110001 00110011 00110101 00110010, 00110001 00110100 00110000 00110111, etc.) and use your code for binary sequences (1D). To match this binary representation I would like to use a block size of 8.

To make sure my PyBDM for Binary sequences (1D) works correctly I tested it against the Online Algorithmic Complexity Calculator (https://complexitycalculator.com) using the standard string (010101010101010101010101010101010101) and standard OACC settings (block size=12, overlap=0, alphabet=2). Both OACC and my PyBDM computed the algorithmic complexity to be 28.5757 bits.

While testing the code I noticed that when converting the sequence 1111111111 (i.e. ten ones) into binary (i.e. s=00110001001100010011000100110001001100010011000100110001001100010011000100110001), the PyBDM code for binary sequences (1D) computes the algorithmic complexity of the binary string s to be 68.71 bits, while the OACC computes it to be 25.39 bits (block size=8, overlap=0, alphabet=2).

Furthermore, if I set the OACC to its default settings (block size=12, overlap=0, alphabet=2), it computes the algorithmic complexity of the binary string s to be 90.7816 bits, which would wrongly imply that the string is Martin-Löf random, since then K(s)/length(s) > 1 for K(s) = 90.78 and length(s) = 80.

Is there any way to set the block size to 8 in the PyBDM code for binary sequences (1D) to replicate the OACC result (25.39 bits)?
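The digit-to-ASCII-bits encoding described above can be sketched as follows (temp_to_bits is a hypothetical helper, not part of pybdm):

```python
import numpy as np

def temp_to_bits(t):
    # Drop the decimal point, then unpack each digit character's ASCII
    # byte into its 8 bits, as in the encoding described above.
    digits = f"{t:.2f}".replace(".", "")   # 13.52 -> "1352"
    return np.unpackbits(np.frombuffer(digits.encode("ascii"), dtype=np.uint8))

bits = temp_to_bits(13.52)
print("".join(map(str, bits)))
# 00110001001100110011010100110010  ('1', '3', '5', '2' as 8-bit ASCII bytes)
```

Note that pybdm's bundled 1D binary CTM dataset appears to cover block size 12 (CTM-B2-D12), so exactly replicating the OACC's block-size-8 run may not be possible without a matching CTM dataset.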


Alphabet size is 4, but this string only uses 3...

I've set BDM(ndim=1, nsymbols=4) and I have lots of nucleotide strings. Sometimes not all 4 letters occur in a small snippet of a genome. In this case, the string [0, 2, 3] returns this error:

Computed BDM is 0, dataset may have incorrect dimensions

Is this a bug? Thanks!

Edit: This also happens for longer strings in which only 3 of the 4 letters occur. The length of the string doesn't seem to matter much.
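One way to separate the two possible causes (missing symbol vs. sequence length) is to build a longer string that still uses only 3 of the 4 symbols and test it. A sketch; the BDM call is commented out since it assumes the setup above:

```python
import numpy as np

# A length-24 string over symbols {0, 2, 3} only -- long enough for the
# block size of 12 used by the bundled 1D CTM data, but still missing
# symbol 1:
X = np.tile(np.array([0, 2, 3]), 8)
assert len(X) == 24 and set(X.tolist()) == {0, 2, 3}

# bdm = BDM(ndim=1, nsymbols=4)
# print(bdm.bdm(X))  # does the error persist when length is not the issue?
```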

Doesn't work for short sequences, and .ent doesn't work

Hi! I am trying to get an algorithmic complexity measure for length-10 binary sequences, but with bdm = BDM(ndim=1), bdm.bdm(X) raises a warning saying the BDM of every length-10 binary sequence is 0. Also, bdm.ent always yields 0, while it's impossible that the Shannon entropy of every sequence is zero.
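As a sanity check that such data does have nonzero entropy, plain Shannon entropy over symbol frequencies can be computed directly. This is independent of pybdm's .ent, which as far as I can tell counts blocks of the CTM shape, so a length-10 sequence may simply leave no complete length-12 block to count, which would explain both zeros:

```python
import numpy as np

def symbol_entropy(x):
    # Shannon entropy of the symbol frequencies in x, in bits.
    _, counts = np.unique(x, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

X = np.array([0, 1, 0, 1, 1, 0, 0, 1, 0, 1])
print(symbol_entropy(X))  # 1.0 for a balanced binary sequence
```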
