raphaelvallat / antropy

AntroPy: entropy and complexity of (EEG) time-series in Python
Home Page: https://raphaelvallat.com/antropy/
License: BSD 3-Clause "New" or "Revised" License
Detrended fluctuation analysis - 13598837587 - heart score: 1.1411
Detrended fluctuation analysis - 13598837587 - liver score: 1.1261
Detrended fluctuation analysis - 13598837587 - kidney score: 0.9356
Detrended fluctuation analysis - 13598837587 - spleen score: 0.9347
Detrended fluctuation analysis - 13598837587 - lung score: 0.8362
Detrended fluctuation analysis - 13609109801 - heart rate: 0.6007
Detrended fluctuation analysis - 13609109801 - heart score: 0.9536
Detrended fluctuation analysis - 13609109801 - liver score: 0.6253
Detrended fluctuation analysis - 13609109801 - kidney score: 0.5689
Detrended fluctuation analysis - 13609109801 - spleen score: 0.5131
Detrended fluctuation analysis - 13609109801 - lung score: 0.6028
Detrended fluctuation analysis - 13609121766 - heart rate: 0.0000
Detrended fluctuation analysis - 13609121766 - heart score: 0.0000
Detrended fluctuation analysis - 13609121766 - liver score: 0.0000
Detrended fluctuation analysis - 13609121766 - kidney score: 0.0000
Detrended fluctuation analysis - 13609121766 - spleen score: 0.0000
Detrended fluctuation analysis - 13609121766 - lung score: 0.0000
In the docs, there is:
"For alpha < 1 the underlying process is stationary and can be modelled as fractional Gaussian noise with H = alpha. This means for alpha = 0.5 we have no correlation or 'memory', for 0.5 < alpha < 1 we have a memory with positive correlation, and for alpha < 0.5 the correlation is negative."
But what does this 0.0000 mean?
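As a sanity check of those alpha interpretations, here is a minimal sketch using antropy's detrended_fluctuation: white noise should give alpha near 0.5, and its cumulative sum (a random walk) should give alpha near 1.5.

import numpy as np
import antropy as ant

rng = np.random.default_rng(42)
white = rng.standard_normal(5000)  # uncorrelated noise
walk = np.cumsum(white)            # integrated noise (random walk)

print(ant.detrended_fluctuation(white))  # expect alpha close to 0.5
print(ant.detrended_fluctuation(walk))   # expect alpha close to 1.5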
Hi, is this supposed to happen?
import antropy as ant
print(ant.sample_entropy([-1, 2, 1, 3, 3]))
C:\ProgramData\Anaconda3\lib\site-packages\antropy\entropy.py in sample_entropy(x, order, metric)
    663     x = np.asarray(x, dtype=np.float64)
    664     if metric == "chebyshev" and x.size < 5000:
--> 665         return _numba_sampen(x, order=order, r=(0.2 * x.std(ddof=0)))
    666     else:
    667         phi = _app_samp_entropy(x, order=order, metric=metric, approximate=False)
IndexError: getitem out of range
print(ant.higuchi_fd(x))
C:\ProgramData\Anaconda3\lib\site-packages\antropy\fractal.py in higuchi_fd(x, kmax)
    297     x = np.asarray(x, dtype=np.float64)
    298     kmax = int(kmax)
--> 299     return _higuchi_fd(x, kmax)
    300
    301
ZeroDivisionError: division by zero
Are there any plans to include Multiscale Sample Entropy (MSE) as a function in the package?
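For context, MSE is usually defined as sample entropy computed on coarse-grained copies of the signal; a minimal sketch on top of antropy's existing sample_entropy (the multiscale_entropy helper below is illustrative, not part of the package):

import numpy as np
import antropy as ant

def multiscale_entropy(x, max_scale=5, order=2):
    # Hypothetical MSE helper: coarse-grain x at each scale tau by
    # averaging non-overlapping blocks of length tau, then compute
    # antropy's sample entropy on each coarse-grained series.
    x = np.asarray(x, dtype=np.float64)
    mse = []
    for tau in range(1, max_scale + 1):
        n = x.size // tau
        coarse = x[: n * tau].reshape(n, tau).mean(axis=1)
        mse.append(ant.sample_entropy(coarse, order=order))
    return np.array(mse)

print(multiscale_entropy(np.random.rand(2000)))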
Hi there,
I've been playing with antropy on my main home machine, and have come to use the same code on a 32-bit Windows 7 machine, which has incurred an import error.
Currently using Python 3.8.10 32-bit. Can this be fixed, or is it likely I'm in need of moving to a 64-bit version?
The traceback is as follows:
Python 3.8.10 (tags/v3.8.10:3d8993a, May 3 2021, 11:34:34) [MSC v.1928 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import antropy
Traceback (most recent call last):
  File "C:\Python38\lib\site-packages\numba\core\errors.py", line 776, in new_error_context
    yield
  File "C:\Python38\lib\site-packages\numba\core\lowering.py", line 235, in lower_block
    self.lower_inst(inst)
  File "C:\Python38\lib\site-packages\numba\core\lowering.py", line 380, in lower_inst
    val = self.lower_assign(ty, inst)
  File "C:\Python38\lib\site-packages\numba\core\lowering.py", line 556, in lower_assign
    return self.lower_expr(ty, value)
  File "C:\Python38\lib\site-packages\numba\core\lowering.py", line 1084, in lower_expr
    res = self.lower_call(resty, expr)
  File "C:\Python38\lib\site-packages\numba\core\lowering.py", line 815, in lower_call
    res = self._lower_call_normal(fnty, expr, signature)
  File "C:\Python38\lib\site-packages\numba\core\lowering.py", line 1055, in _lower_call_normal
    res = impl(self.builder, argvals, self.loc)
  File "C:\Python38\lib\site-packages\numba\core\base.py", line 1194, in __call__
    res = self._imp(self._context, builder, self._sig, args, loc=loc)
  File "C:\Python38\lib\site-packages\numba\core\base.py", line 1224, in wrapper
    return fn(*args, **kwargs)
  File "C:\Python38\lib\site-packages\numba\np\unsafe\ndarray.py", line 31, in codegen
    res = _empty_nd_impl(context, builder, arrty, shapes)
  File "C:\Python38\lib\site-packages\numba\np\arrayobj.py", line 3468, in _empty_nd_impl
    arrlen_mult = builder.smul_with_overflow(arrlen, s)
  File "C:\Python38\lib\site-packages\llvmlite\ir\builder.py", line 50, in wrapped
    raise ValueError("Operands must be the same type, got (%s, %s)"
ValueError: Operands must be the same type, got (i32, i64)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python38\lib\site-packages\antropy\__init__.py", line 4, in <module>
    from .fractal import *
  File "C:\Python38\lib\site-packages\antropy\fractal.py", line 304, in <module>
    def _dfa(x):
  File "C:\Python38\lib\site-packages\numba\core\decorators.py", line 226, in wrapper
    disp.compile(sig)
  File "C:\Python38\lib\site-packages\numba\core\dispatcher.py", line 979, in compile
    cres = self._compiler.compile(args, return_type)
  File "C:\Python38\lib\site-packages\numba\core\dispatcher.py", line 141, in compile
    status, retval = self._compile_cached(args, return_type)
  File "C:\Python38\lib\site-packages\numba\core\dispatcher.py", line 155, in _compile_cached
    retval = self._compile_core(args, return_type)
  File "C:\Python38\lib\site-packages\numba\core\dispatcher.py", line 168, in _compile_core
    cres = compiler.compile_extra(self.targetdescr.typing_context,
  File "C:\Python38\lib\site-packages\numba\core\compiler.py", line 686, in compile_extra
    return pipeline.compile_extra(func)
  File "C:\Python38\lib\site-packages\numba\core\compiler.py", line 428, in compile_extra
    return self._compile_bytecode()
  File "C:\Python38\lib\site-packages\numba\core\compiler.py", line 492, in _compile_bytecode
    return self._compile_core()
  File "C:\Python38\lib\site-packages\numba\core\compiler.py", line 471, in _compile_core
    raise e
  File "C:\Python38\lib\site-packages\numba\core\compiler.py", line 462, in _compile_core
    pm.run(self.state)
  File "C:\Python38\lib\site-packages\numba\core\compiler_machinery.py", line 343, in run
    raise patched_exception
  File "C:\Python38\lib\site-packages\numba\core\compiler_machinery.py", line 334, in run
    self._runPass(idx, pass_inst, state)
  File "C:\Python38\lib\site-packages\numba\core\compiler_lock.py", line 35, in _acquire_compile_lock
    return func(*args, **kwargs)
  File "C:\Python38\lib\site-packages\numba\core\compiler_machinery.py", line 289, in _runPass
    mutated |= check(pss.run_pass, internal_state)
  File "C:\Python38\lib\site-packages\numba\core\compiler_machinery.py", line 262, in check
    mangled = func(compiler_state)
  File "C:\Python38\lib\site-packages\numba\core\typed_passes.py", line 396, in run_pass
    lower.lower()
  File "C:\Python38\lib\site-packages\numba\core\lowering.py", line 138, in lower
    self.lower_normal_function(self.fndesc)
  File "C:\Python38\lib\site-packages\numba\core\lowering.py", line 192, in lower_normal_function
    entry_block_tail = self.lower_function_body()
  File "C:\Python38\lib\site-packages\numba\core\lowering.py", line 221, in lower_function_body
    self.lower_block(block)
  File "C:\Python38\lib\site-packages\numba\core\lowering.py", line 235, in lower_block
    self.lower_inst(inst)
  File "C:\Python38\lib\contextlib.py", line 131, in __exit__
    self.gen.throw(type, value, traceback)
  File "C:\Python38\lib\site-packages\numba\core\errors.py", line 786, in new_error_context
    raise newerr.with_traceback(tb)
numba.core.errors.LoweringError: Failed in nopython mode pipeline (step: native lowering)
Operands must be the same type, got (i32, i64)

File "lib\site-packages\antropy\fractal.py", line 313:
def _dfa(x):
    <source elided>
    for i_n, n in enumerate(nvals):
    ^

During: lowering "array.70 = call empty_func.71(size_tuple.69, func=empty_func.71, args=(Var(size_tuple.69, fractal.py:313),), kws=[], vararg=None, target=None)" at C:\Python38\lib\site-packages\antropy\fractal.py (313)
>>>
Hi,
Is there a review paper available that surveys the performance of the different entropy measures implemented in this library on actual electrophysiological data?
Also, what would be the measure with the smallest number of non-optional parameters that is also guaranteed to work in most cases?
Thank you!
My own implementation:

import math
import numpy as np
from scipy.spatial.distance import pdist


def sample_entropy(signal, m, r, dist_type='chebyshev', result=None, scale=None):
    # Check errors
    if m > len(signal):
        raise ValueError('Embedding dimension must be smaller than the signal length (m<N).')
    if len(signal) != signal.size:
        raise ValueError('The signal parameter must be a [Nx1] vector.')
    if not isinstance(dist_type, str):
        raise ValueError('Distance type must be a string.')
    if dist_type not in ['braycurtis', 'canberra', 'chebyshev', 'cityblock',
                         'correlation', 'cosine', 'dice', 'euclidean', 'hamming',
                         'jaccard', 'jensenshannon', 'kulsinski', 'mahalanobis',
                         'matching', 'minkowski', 'rogerstanimoto', 'russellrao',
                         'seuclidean', 'sokalmichener', 'sokalsneath', 'sqeuclidean', 'yule']:
        raise ValueError('Distance type unknown.')

    # Useful parameters
    N = len(signal)
    sigma = np.std(signal)
    templates_m = []
    templates_m_plus_one = []
    signal = np.squeeze(signal)

    # Count template pairs of length m within tolerance r*sigma
    for i in range(N - m + 1):
        templates_m.append(signal[i:i + m])
    B = np.sum(pdist(templates_m, metric=dist_type) <= sigma * r)
    if B == 0:
        value = math.inf
    else:
        # Count template pairs of length m+1 within tolerance r*sigma
        for i in range(N - m):
            templates_m_plus_one.append(signal[i:i + m + 1])
        A = np.sum(pdist(templates_m_plus_one, metric=dist_type) <= sigma * r)
        if A == 0:
            value = math.inf
        else:
            A = A / len(templates_m_plus_one)
            B = B / len(templates_m)
            value = -np.log(A / B)

    """If A = 0 or B = 0, SampEn would return an infinite value.
    However, the lowest non-zero conditional probability that SampEn should
    report is A/B = 2/[(N-m-1)*(N-m)]."""
    if math.isinf(value):
        """Note: SampEn has the following limits:
        - Lower bound: 0
        - Upper bound: log(N-m) + log(N-m-1) - log(2)"""
        value = -np.log(2 / ((N - m - 1) * (N - m)))

    if result is not None:
        result[scale - 1] = value
    return value
signal = np.random.rand(200)  # rand(200,1) in MATLAB
Parameters: m = 1; r = 0.2
Outputs:
My implementation: 2.1812
Implementation adapted: 2.1969
NeuroKit2 entropy_sample function: 2.5316
Your implementation: 2.2431
Different implementation from GitHub: 1.0488
Currently, antropy.spectral_entropy only allows x to be in the time domain. We should add freqs=None and psd=None as possible inputs for users who want to calculate the spectral entropy of a pre-computed power spectrum. We should also add an example of how to calculate the spectral entropy from a multitaper power spectrum.
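As a sketch of what the psd= path could compute (the helper name is hypothetical; it assumes a one-sided PSD with strictly positive bins, e.g. from scipy.signal.welch):

import numpy as np
from scipy import signal

def spectral_entropy_from_psd(psd, normalize=False):
    # Normalize the PSD to a probability distribution, then take its
    # Shannon entropy (in bits).
    psd_norm = psd / psd.sum()
    se = -(psd_norm * np.log2(psd_norm)).sum()
    if normalize:
        se /= np.log2(psd_norm.size)
    return se

x = np.random.rand(4096)
freqs, psd = signal.welch(x, fs=100)
print(spectral_entropy_from_psd(psd, normalize=True))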
Hi Raph,
Was doing some cross-checking and I have a quick question to clear up a doubt in my mind regarding the counting of the number of inversions:
Line 908 in 88fea89
Shouldn't it be np.diff(np.signbit(np.diff(...))) here? I.e., counting the changes in sign of the consecutive differences, rather than the differences of the sign of consecutive samples 🤔
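A toy example of the difference (pure numpy; np.diff preserves boolean dtype, so it flags positions where the sign bit flips):

import numpy as np

x = np.array([1.0, 3.0, 2.0, 4.0, 1.0])
# sign changes of the samples themselves (zero-crossings): none here
print(np.diff(np.signbit(x)).sum())           # 0
# sign changes of the consecutive differences (inversions): up-down-up-down
print(np.diff(np.signbit(np.diff(x))).sum())  # 3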
Hi, I have used your package to process ECG signals and it achieves good results on classifying different heart diseases. Thanks a lot!
However, so far these functions can only deal with a one-dimensional signal, like array(~, 1). May I take a try at modifying the code to make it process data like sklearn.preprocessing.scale(X, axis=xx)? It would be more efficient for big arrays, because we would not need to run a for loop or something similar.
My email is [email protected], welcome to discuss with me!
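In the meantime, a workaround sketch using np.apply_along_axis (it still loops internally, but avoids writing the loop by hand):

import numpy as np
import antropy as ant

X = np.random.rand(32, 3000)  # e.g. 32 channels x 3000 samples

# apply a 1-D antropy function along the last axis of a 2-D array
perm = np.apply_along_axis(ant.perm_entropy, 1, X)
print(perm.shape)  # (32,)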
In the version currently on GitHub, _xlogx uses numpy.where to return valid results based on the condition x == 0, 0. However, numpy.where still applies the log function to all values of x before trimming the values that meet the condition, resulting in runtime warnings.
To avoid those issues, I would suggest changing the code to something like:

xlogx = np.zeros_like(x)
valid = np.nonzero(x)
xlogx[valid] = x[valid] * np.log(x[valid]) / np.log(base)
return xlogx

as this strictly applies the function to the nonzero elements of x.
If this looks good to you I could submit a PR. Let me know.
C:\ProgramData\Anaconda3\lib\site-packages\antropy\fractal.py:197: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
@jit((types.Array(types.float64, 1, "C", readonly=True), types.int32))
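For what it's worth, a sketch of the explicit fix the warning asks for (the decorated function below is a trivial placeholder, not antropy's actual kernel):

from numba import jit, types

# Passing nopython=True explicitly matches the behaviour that becomes the
# default in numba 0.59 and silences the deprecation warning.
@jit(
    (types.Array(types.float64, 1, "C", readonly=True), types.int32),
    nopython=True,
)
def _placeholder(x, kmax):
    # trivial body standing in for the real numba kernel
    return x[0] * kmax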
I recently implemented my own version of _numba_sampen as a C extension, mainly because I did not know about this package, but also because I thought it would be good to learn how to write C extensions for numpy.
From my benchmarks, it looks like my implementation is slightly faster than the current one, so I was wondering if you would be interested in a PR to update the kernel. It would also improve the memory requirement: the current implementation allocates several copies of the array, whereas my kernel uses constant memory (wrt. array size and window size).
Here is a benchmark figure:
And here is a pure python implementation of my kernel:
from math import log

import numpy as np


def sample_entropy(data, window_size: int, tolerance: float) -> float:
    """Calculate Sample Entropy.

    Sample entropy is a measure of irregularity of a series. In other words,
    it quantifies how difficult it is ("on average") to predict the next value
    of a series from a window of ``window_size`` known values.

    Read more at: https://en.wikipedia.org/wiki/Sample_entropy

    Parameters
    ----------
    data : array_like
        The input series.
    window_size : int
        How many elements to include in the template.
    tolerance : float
        How much sampling noise is tolerable and can be "ignored".

    Returns
    -------
    entropy : float
        A measure of how irregular the series is.
    """
    # Instead of sliding two windows independently, this implementation
    # slides both windows at a constant offset (inner loop) and then varies
    # the offset (outer loop). The constant offset between windows has the
    # advantage that a lot of computation between two applications can be
    # reused. This way, we only need to track what slides out of the window
    # and what slides in.
    # Further, instead of keeping the full window, we only need to know how
    # many pairs in the current window are above the threshold. If there is
    # at least one, we don't count the window, else we do. This way, we only
    # need to track whether a pair > threshold moved out/in and keep one
    # counter per window length.
    # Sliding an m-size and an (m+1)-size window also has overlap, so we can
    # apply some trickery to share intermediate results when computing the
    # numerator and denominator.
    sequence = np.asarray(data)
    size = sequence.size
    sequence = sequence.tolist()
    numerator = 0
    denominator = 0
    for offset in range(1, size - window_size):
        n_numerator = (
            abs(sequence[window_size] - sequence[window_size + offset]) > tolerance
        )
        n_denominator = 0
        for idx in range(window_size):
            n_numerator += abs(sequence[idx] - sequence[idx + offset]) > tolerance
            n_denominator += abs(sequence[idx] - sequence[idx + offset]) > tolerance
        if n_numerator == 0:
            numerator += 1
        if n_denominator == 0:
            denominator += 1
        prev_in_diff = (
            abs(sequence[window_size] - sequence[offset + window_size]) > tolerance
        )
        for idx in range(1, size - offset - window_size):
            out_diff = abs(sequence[idx - 1] - sequence[idx + offset - 1]) > tolerance
            in_diff = (
                abs(sequence[idx + window_size] - sequence[idx + offset + window_size])
                > tolerance
            )
            n_numerator += in_diff - out_diff
            n_denominator += prev_in_diff - out_diff
            prev_in_diff = in_diff
            if n_numerator == 0:
                numerator += 1
            if n_denominator == 0:
                denominator += 1
        # one extra idx for the denominator
        idx = size - offset - window_size
        out_diff = abs(sequence[idx - 1] - sequence[size - window_size - 1]) > tolerance
        n_denominator = n_denominator - out_diff + prev_in_diff
        if n_denominator == 0:
            denominator += 1
    # one extra offset for the denominator
    offset = size - window_size
    n_denominator = 0
    for idx in range(window_size):
        n_denominator += abs(sequence[idx] - sequence[idx + offset]) > tolerance
    if n_denominator == 0:
        denominator += 1
    return -log(numerator / denominator)
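Example usage of the kernel above, with the common r = 0.2 * SD convention (illustrative values, not from the benchmark):

import numpy as np

x = np.random.rand(1000)
print(sample_entropy(x, window_size=2, tolerance=0.2 * x.std()))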
Hello, I've added antropy to conda-forge; please let me know if you'd like to be added as a co-maintainer of the respective feedstock. It could also make sense to amend the installation instructions, WDYT?
Hi, I noticed that in new sklearn versions they have apparently changed KDTree.valid_metrics from a list to a method, and antropy does not work with it anymore.
Suggest adding the following at line 376 in antropy.py to resolve the issue:

if type(_all_metrics).__name__ == 'builtin_function_or_method':
    _all_metrics = _all_metrics()
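An alternative guard that should cover both old and new sklearn (a sketch assuming _all_metrics is read from KDTree.valid_metrics):

from sklearn.neighbors import KDTree

_all_metrics = KDTree.valid_metrics
# sklearn >= 1.3 exposes valid_metrics as a classmethod; older versions
# expose a plain list. Call it only when it is callable.
if callable(_all_metrics):
    _all_metrics = _all_metrics()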
I always get error messages with the function sample_entropy, saying that the argument types at _numba_sampen are not the correct ones:
"No matching definition for argument type(s) readonly array(float64, 1d, C), int64, float64".
I was able to fix the error by hard-specifying the data types:
Old:
if metric == "chebyshev" and x.size < 5000:
    return _numba_sampen(x, order=order, r=0.2 * x.std(ddof=0))
New:
if metric == "chebyshev" and x.size < 5000:
    return _numba_sampen(x.astype(np.float64), order=np.int32(order), r=np.float64(0.2 * x.std(ddof=0)))
Create a file called import.py with the single line import antropy. On my machine (Linux VM), this takes at least 10 seconds to run.
Using pyinstrument tells me that most of the time is spent importing numba. Is there any possibility of speeding this up? It seems like this is a known issue with numba, though: see e.g. numba/numba#4927.
$ pyinstrument import.py
  _     ._   __/__   _ _  _  _ _/_   Recorded: 16:36:28  Samples:  7842
 /_//_/// /_\ / //_// / //_'/ //     Duration: 12.368    CPU time: 11.963
/   _/                      v3.4.1
Program: import.py
12.368 <module> import.py:1
└─ 12.368 <module> antropy/__init__.py:2
├─ 6.711 <module> antropy/fractal.py:1
│ └─ 6.711 wrapper numba/core/decorators.py:191
│ [14277 frames hidden] numba, llvmlite, contextlib, pickle, ...
├─ 3.034 <module> antropy/entropy.py:1
│ ├─ 2.390 wrapper numba/core/decorators.py:191
│ │ [5009 frames hidden] numba, abc, llvmlite, inspect, contex...
│ └─ 0.522 <module> sklearn/__init__.py:14
│ [374 frames hidden] sklearn, scipy, inspect, enum, numpy,...
└─ 2.618 <module> antropy/utils.py:1
├─ 1.584 wrapper numba/core/decorators.py:191
│ [5027 frames hidden] numba, abc, functools, llvmlite, insp...
├─ 0.895 <module> numba/__init__.py:3
│ [1444 frames hidden] numba, llvmlite, pkg_resources, warni...
└─ 0.138 <module> numpy/__init__.py:106
[190 frames hidden] numpy, pathlib, urllib, collections, ...
To view this report with different options, run:
pyinstrument --load-prev 2021-06-17T16-36-28 [options]
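One possible mitigation, sketched below: numba only compiles at import time when @jit is given an explicit signature; leaving the signature out (lazy compilation) and adding cache=True moves the cost to the first call and caches the compiled kernel on disk. Whether that trade-off suits antropy is a separate question.

import numpy as np
from numba import jit

# Lazy variant: no explicit signature, so nothing is compiled at import
# time; cache=True reuses the compiled kernel across sessions.
@jit(nopython=True, cache=True)
def _demo(x):
    return x.sum()

_demo(np.arange(5.0))  # first call triggers (cached) compilation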
The current implementation of katz_fd in antropy/fractal.py looks like the following:

def katz_fd(x, axis=-1):
    x = np.asarray(x)
    dists = np.abs(np.diff(x, axis=axis))
    ll = dists.sum(axis=axis)
    ln = np.log10(ll / dists.mean(axis=axis))
    aux_d = x - np.take(x, indices=[0], axis=axis)
    d = np.max(np.abs(aux_d), axis=axis)
    kfd = np.squeeze(ln / (ln + np.log10(d / ll)))
    if not kfd.ndim:
        kfd = kfd.item()
    return kfd
However, both in the documentation and in the original paper, the distances (dists and d) are defined as Euclidean distances between the given points, which are calculated differently.
A BASIC implementation was also provided in the Appendix of the original paper: the important differences are in lines 130 and 150. (Note: the exponentiation operator ^ is probably missing due to scanning/printing issues.)
With this, and by looking at another implementation in MATLAB, I think the above code for katz_fd should be changed to match the original definition. However, I do not know whether this change would break existing work that relies on the current (apparently incorrect) behaviour, so I propose changing the dists and d calculations to something like the following (note: this was only tested with single-channel data):

ind = np.arange(len(x) - 1)
A = np.stack((ind, x[:-1]))
B = np.stack((ind + 1, x[1:]))
dists = np.linalg.norm(B - A, axis=0)

ind = np.arange(len(x))
A = np.stack((ind, x))
first = np.reshape([0, x[0]], (2, 1))
aux_d = np.linalg.norm(A - first, axis=0)
d = np.max(aux_d)
If there are any comments or suggestions, please let me know.
(Also this is my first contribution to an open source project, so I am a bit unsure about the etiquette/conventions)
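For reference, here is a self-contained sketch combining the proposed pieces into a Euclidean-distance variant (the name katz_fd_euclidean is hypothetical, and it is single-channel only, as noted above):

import numpy as np

def katz_fd_euclidean(x):
    # Katz fractal dimension using Euclidean distances between the
    # (index, value) points, per the original paper (sketch).
    x = np.asarray(x, dtype=np.float64)
    ind = np.arange(len(x) - 1)
    A = np.stack((ind, x[:-1]))
    B = np.stack((ind + 1, x[1:]))
    dists = np.linalg.norm(B - A, axis=0)  # curve-length segments
    ll = dists.sum()
    ln = np.log10(ll / dists.mean())
    ind = np.arange(len(x))
    first = np.reshape([0, x[0]], (2, 1))
    aux_d = np.linalg.norm(np.stack((ind, x)) - first, axis=0)
    d = np.max(aux_d)  # max distance from the first point
    return ln / (ln + np.log10(d / ll))

print(katz_fd_euclidean(np.random.rand(100)))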
Great package, and the fastest implementation of entropies I have found so far.
Is there a specific reason to hardcode the tolerance distance to r = 0.2 * SD?
On the one hand, this is a disadvantage compared to other implementations such as NeuroKit, and on the other hand, there is research indicating that the choice of r has an important impact on the results (https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=5335714).