Comments (9)
Hi Oliver,
Thanks for reporting. You are right. It's been a while since I wrote this so I did a few things to confirm the problem.
I tried your values using this online tool and it reports a p-value of 0.557475. Note that the discrepancy on the 3rd decimal can be attributed to the use of different approximations.
I also compared my implementation directly with R's:
shapiro.test(c(488.0, 486.0, 492.0, 490.0, 489.0, 491.0, 488.0, 490.0, 496.0, 487.0, 487.0, 493.0))
The result is p-value = 0.5583
After applying your patch, my implementation reports a p-value of 0.55826969.
I can patch it, but if you prefer, send me a pull request and I'll merge it ASAP. I can also issue a hotfix if you want to unblock your work.
from datumbox-framework.
Hi Vasilis,
Thanks for the quick reply and confirmation. Take your time, no hotfix is necessary for me. I just wanted to notify you.
from datumbox-framework.
I patched it on the develop branch and also pushed a new snapshot version on the repo (0.8.2-SNAPSHOT). I'll probably take care of a few more minor improvements and release a stable version soon.
Thanks for the help!
Vasilis
from datumbox-framework.
OK. I took a deeper dive into this and I think the bugfix I proposed just introduced another bug.
ContinuousDistributions.gaussCdf calculates the cumulative probability of a Gaussian (normal) distribution, right? In which case m would be the mean of the distribution and s would be the sigma.
In this case the original line pw=ContinuousDistributions.gaussCdf((y-m)/s); is CORRECT, but fails when the mean becomes negative (which it is for the provided example). Switching from y-m to m-y inadvertently fixes this, as it moves the value to the "correct side" of the mean, as long as the mean is negative.
Unfortunately I haven't been able to produce any example where m becomes positive, so the fix may still apply, but the safer fix would be:
if (m < 0.0) {
m *= -1.0;
y *= -1.0;
}
pw=ContinuousDistributions.gaussCdf((y-m)/s);
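To make the sign discussion above concrete, here is a minimal Java sketch of what swapping y-m for m-y does to a normal CDF value. It assumes gaussCdf returns the lower-tail probability of the standard normal distribution, which this thread does not confirm. The class NormalTailDemo, the method phi, and the sample values of y, m and s are hypothetical and are not taken from the framework. Because the distribution is symmetric, phi(-z) = 1 - phi(z), so the swap turns a lower-tail probability into the matching upper-tail one.

```java
public class NormalTailDemo {
    // Standard normal CDF using the Abramowitz-Stegun 26.2.17 polynomial
    // approximation (absolute error below 7.5e-8). This is only a stand-in
    // for gaussCdf, assumed here to return the lower tail.
    static double phi(double z) {
        if (z < 0) {
            return 1.0 - phi(-z); // symmetry of the normal distribution
        }
        double t = 1.0 / (1.0 + 0.2316419 * z);
        double poly = t * (0.319381530 + t * (-0.356563782
                + t * (1.781477937 + t * (-1.821255978 + t * 1.330274429))));
        double density = Math.exp(-0.5 * z * z) / Math.sqrt(2.0 * Math.PI);
        return 1.0 - density * poly;
    }

    public static void main(String[] args) {
        // Illustrative values only; they do not come from the Shapiro-Wilk code.
        double y = -1.2, m = -1.5, s = 0.4;
        double lower = phi((y - m) / s); // lower tail at the standardized value
        double upper = phi((m - y) / s); // equals 1 - lower, i.e. the upper tail
        System.out.printf("lower=%.6f upper=%.6f sum=%.6f%n", lower, upper, lower + upper);
    }
}
```

The two printed probabilities always sum to 1 (up to rounding), so which expression is appropriate depends on which tail the p-value is meant to be.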
from datumbox-framework.
I think I will need to investigate this more thoroughly to align the behaviour of the class with the original implementation. I will be investigating this later this week.
Cheers
from datumbox-framework.
Could you please check the latest develop branch again? I believe it's now patched.
The results agree with R for the following examples:
Shapiro-Wilk normality test
data: c(33.4, 33.3, 31, 31.4, 33.5, 34.4, 33.7, 36.2, 34.9, 37)
W = 0.95509, p-value = 0.7288
Shapiro-Wilk normality test
data: c(35, 45, 55, 58, 61, 63, 65, 68, 70, 72, 74, 86)
W = 0.97107, p-value = 0.9216
Shapiro-Wilk normality test
data: c(488, 486, 492, 490, 489, 491, 488, 490, 496, 487, 487, 493)
W = 0.94449, p-value = 0.5583
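One way to pin the three R results above as a regression check is sketched below in Java. The class name, the firstMismatch helper, and the pValueFn parameter are hypothetical: pValueFn stands in for whatever call returns the Shapiro-Wilk p-value in your code, since the framework's actual API is not shown in this thread. The tolerance of 1e-4 is an assumption based on R printing four significant digits.

```java
import java.util.function.ToDoubleFunction;

public class ShapiroWilkReferenceCheck {
    // Samples and p-values copied from the R output quoted in this thread.
    static final double[][] SAMPLES = {
        {33.4, 33.3, 31, 31.4, 33.5, 34.4, 33.7, 36.2, 34.9, 37},
        {35, 45, 55, 58, 61, 63, 65, 68, 70, 72, 74, 86},
        {488, 486, 492, 490, 489, 491, 488, 490, 496, 487, 487, 493}
    };
    static final double[] R_P_VALUES = {0.7288, 0.9216, 0.5583};

    // Returns the index of the first sample whose p-value differs from the
    // R reference by more than tol, or -1 if every sample agrees.
    static int firstMismatch(ToDoubleFunction<double[]> pValueFn, double tol) {
        for (int i = 0; i < SAMPLES.length; i++) {
            double p = pValueFn.applyAsDouble(SAMPLES[i]);
            if (Double.isNaN(p) || Math.abs(p - R_P_VALUES[i]) > tol) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Placeholder stub, NOT a Shapiro-Wilk implementation: it returns one
        // constant for every sample. Replace it with the real p-value call.
        ToDoubleFunction<double[]> stub = sample -> 0.55826969;
        System.out.println("first mismatching sample index: " + firstMismatch(stub, 1e-4));
    }
}
```

Passing the implementation as a function keeps the reference data separate from any particular API, so the same check can be reused if the method signature changes.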
from datumbox-framework.
@oliver-krauss Did you manage by any chance to check the fix? It should be returning the right values now.
from datumbox-framework.
Hi, sorry for the delay, I was on a trip. I can confirm that this works with my own values as well, which I cross-referenced with different tools.
from datumbox-framework.
Awesome, thanks for reporting and for contributing to the fix. I will take care of a couple of remaining updates and issue a stable version soon. For now, the latest 0.8.2-SNAPSHOT version contains the patch.
from datumbox-framework.