diptest's People

Contributors

alexhagen, alimuldal, dependabot[bot], eldeveloper, prokolyvakis, rurlus

Forkers

prokolyvakis

diptest's Issues

IndexError when running on a sample of 2 million elements

I tried to run diptest.diptest() on a sample taken from a huge dataframe. The sample contains more than 2 million elements, and I got an IndexError about an index being out of bounds for an axis. However, if I pass a filtered subset of the same column, it runs normally, without error.

IndexError
Traceback (most recent call last)
----> 6 dip, pval = diptest.diptest(s)

File /env/lib/python3.10/site-packages/diptest/diptest.py:198, in diptest(x, full_output, sort_x, allow_zero, boot_pval, n_boot, n_threads, seed, stream)
    196     pval = func(**kwargs)
    197 else:
--> 198     pval = Consts.compute_pval_interpolation(n, dip)
    200 if full_output:
    201     return dip, pval, r[1]

File /home/paulo/routers/env/lib/python3.10/site-packages/diptest/consts.py:83, in Consts.compute_pval_interpolation(cls, n, dip)
     80 i1 = min(cls._CRIT_VALS.shape[0], i1)
     82 # Interpolate on sqrt(n)
---> 83 n0, n1 = cls._SAMPLE_SIZE[[i0, i1]]
     85 y0 = np.sqrt(n0) * cls._CRIT_VALS[i0]
     86 sD = np.sqrt(n) * dip

IndexError: index 21 is out of bounds for axis 0 with size 21

I took the code from the example in the README.md on the GitHub page.

import diptest

# Full version (raises the IndexError)
s = df.signal
# only the dip statistic
dip = diptest.dipstat(s)

# both the dip statistic and p-value
dip, pval = diptest.diptest(s)
print(dip, pval)

# Filtered version (runs without error)
s = df.query("col1 == 'A'").signal
# only the dip statistic
dip = diptest.dipstat(s)

# both the dip statistic and p-value
dip, pval = diptest.diptest(s)
print(dip, pval)
# 0.15625745645430683 0.0
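
Reading the traceback above, the line `i1 = min(cls._CRIT_VALS.shape[0], i1)` caps `i1` at the table length (21) rather than at the last valid index (20), which would explain `index 21 is out of bounds` once `n` is larger than every tabulated sample size. The sketch below reproduces that pattern in plain Python. The table values, the `bisect` lookup and the function name are assumptions made for illustration, not diptest's actual code.

```python
import bisect

# Hypothetical table of tabulated sample sizes (NOT diptest's real values).
SAMPLE_SIZE = [4, 10, 100, 1000, 10000]

def neighbours(n, cap):
    """Return the two tabulated sizes bracketing n, capping the upper index at `cap`."""
    # bisect_left returns len(SAMPLE_SIZE) when n is beyond the table.
    i1 = bisect.bisect_left(SAMPLE_SIZE, n)
    i0 = max(0, i1 - 1)
    i1 = min(cap, i1)
    return SAMPLE_SIZE[i0], SAMPLE_SIZE[i1]

n = 2_000_000

# Capping at the table length still allows an index one past the end.
try:
    neighbours(n, cap=len(SAMPLE_SIZE))
    outcome = "ok"
except IndexError:
    outcome = "IndexError"

# Capping at the last valid index keeps the lookup in bounds.
fixed = neighbours(n, cap=len(SAMPLE_SIZE) - 1)
```

With the cap at the last index, both neighbours collapse to the largest tabulated size, so any interpolation between them would have to handle that degenerate case. Whether the library should extrapolate, clamp or reject such large `n` is a separate decision.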

Finer resolution for a p-value of 0.0

First, thanks for putting together this package!

I have a distribution that clearly looks bimodal, but I'm getting a p-value of exactly 0.0. I'm sure the true p-value is very low, but it cannot be exactly 0. I'd like a more precise bound on it (e.g. p < 1e-5?). I'm using diptest 0.7.0; a snippet of my code is below.

import diptest
x = df['my_distribution_data'].values
dip, pval = diptest.diptest(x)
print(dip, pval)

Returns: 0.10526623882697403 0.0
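
One general way to get a bound instead of a bare 0.0 is a resampling p-value: with B null replicates and none of them at or above the observed statistic, the common (k + 1) / (B + 1) estimator reports 1 / (B + 1) rather than 0. The sketch below shows only that arithmetic in plain Python. The function name and the placeholder null statistics are hypothetical, and it makes no claim about how diptest computes or rounds its own p-values.

```python
def resampling_pvalue(observed, null_stats):
    """Resampling p-value with the (k + 1) / (B + 1) correction, so it is never exactly 0."""
    exceed = sum(1 for s in null_stats if s >= observed)
    return (exceed + 1) / (len(null_stats) + 1)

# Placeholder null statistics, all well below the observed value (illustrative only).
null_stats = [0.01] * 999
p = resampling_pvalue(0.105, null_stats)
```

Under this convention the smallest reportable value shrinks as the number of replicates grows, so a claim such as p < 1e-5 would need on the order of 1e5 replicates.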

Feature: Remove obsolete consistency checks

Hello @RUrlus,

Now that we have moved entirely to C++, I think we can remove the legacy ifault variable along with some obsolete consistency checks. Specifically, I propose the following:

  1. Remove the ifault variable completely.
  2. Create a function diptst_unsafe (the name is open to discussion, of course) that does not perform the checks for sorting and non-negativity. This can save some time when calling either diptest_pval or diptest_pval_mt.
  3. The original function diptst will call diptst_unsafe and additionally contain the following code:
if (ifault == 1) {
    throw std::runtime_error("N must be >= 1.");
} else if (ifault == 2) {
    throw std::runtime_error("x must be sorted in ascending order.");
}

(Here ifault will be removed and the actual checks will be performed in its place.) The wrapper method can then also be refactored a little!
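
The split proposed above, an unchecked routine plus a validating wrapper that raises instead of setting an error code, can be sketched in Python as follows. All names here are hypothetical, and the inner computation is a placeholder (the range of the data), not the dip statistic.

```python
def with_checks(compute):
    """Wrap an unchecked routine with the validation the proposal moves out of it."""
    def wrapper(x):
        if len(x) < 1:
            raise ValueError("N must be >= 1.")
        if any(a > b for a, b in zip(x, x[1:])):
            raise ValueError("x must be sorted in ascending order.")
        return compute(x)
    return wrapper

def data_range_unsafe(x):
    # Placeholder computation that assumes x is non-empty and sorted.
    return x[-1] - x[0]

# Checked entry point for external callers.
data_range = with_checks(data_range_unsafe)
```

Internal callers that have already sorted and validated their input once (for example a loop over resampled data) could call the unchecked routine directly and skip the repeated checks.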

WDYT? If you agree, I will be happy to start working on it and create a PR linked to this issue. I am looking forward to your reply!
