Hey, I got a pull request for fixing the docs today :D
Student's t-test and Fisher's randomization test very often agree (always?), but Student's is immensely faster to compute.
That's why I changed the default statistical test.
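To make the speed argument concrete, here is a minimal pure-Python sketch of both paired tests applied to the per-query scores of two systems (e.g. one metric value per query). It is not ranx's implementation. The function names, the 10,000 permutations, the seed and the add-one p-value estimate are illustrative assumptions. The t-test needs one pass over the score differences plus one evaluation of the t distribution, while the randomization test recomputes the mean difference once per permutation, so its cost grows linearly with `n_permutations`.

```python
import math
import random
import statistics


def paired_t_test(a, b):
    """Two-sided paired Student's t-test on per-query scores (illustrative sketch)."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = statistics.fmean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    df = n - 1
    # Log of the normalising constant of Student's t density with df degrees of freedom.
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    # Simpson's rule for P(0 <= T <= |t|); the two-sided p-value is 1 - 2 * that area.
    steps = 2000  # must be even for Simpson's rule
    h = abs(t) / steps
    area = pdf(0.0) + pdf(abs(t))
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3
    return t, min(1.0, max(0.0, 1.0 - 2.0 * area))


def randomization_test(a, b, n_permutations=10_000, seed=42):
    """Two-sided paired (sign-flip) randomization test on the mean difference."""
    rng = random.Random(seed)
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(statistics.fmean(diffs))
    hits = 0
    for _ in range(n_permutations):
        # Under H0 the two systems are exchangeable on each query, so each
        # per-query difference keeps or flips its sign with probability 1/2.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(statistics.fmean(flipped)) >= observed:
            hits += 1
    # Add-one estimate so the p-value is never exactly 0.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical per-query scores for two systems on 20 queries.
sys_a = [0.50 + 0.02 * i for i in range(20)]
sys_b = [s - 0.10 - 0.01 * (i % 3) for i, s in enumerate(sys_a)]
print(paired_t_test(sys_a, sys_b))
print(randomization_test(sys_a, sys_b))
```

On data like this, where one system wins on every query, both tests report a p-value far below 0.05; the randomization test simply spends 10,000 loop iterations to get there.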
The paper you mentioned says the opposite. From the introduction:
“Student’s t, bootstrap, and randomization tests largely agree with each other. Researchers using any of these three tests are likely to draw the same conclusions regarding statistical significance of their results.”
There's also a nice table in the paper showing that.
This is the ‘practical’ conclusion from their experiments on TREC data. They discuss it further in §5.2 and ultimately recommend using Fisher’s randomization test.
Thank you for your quick answer (as always :))
Sorry, I completely forgot about that section.
In my experiments on datasets very different from those used in the paper, Student's and Fisher's tests always agreed (when comparing mean values).
I see the point about tiny test sets.
However, I question the validity of tiny test sets.
Are they representative of general user behavior?
How confident can we be that a model outperforming another on a small set of queries will also perform better on different ones?
Is a small set of queries representative of the actual population of queries in the real world?
Obviously, we could be interested in testing a specific niche of queries depending on the use case.
What do you think about it?
That makes sense, but I am not very familiar with statistical significance testing.
But I think that small test sets will not get low p-values anyway.
The opposite problem, namely that huge datasets will get low p-values very easily, is discussed in §4.4 of https://www.morganclaypool.com/doi/abs/10.2200/S00994ED1V01Y202002HLT045
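For intuition on both points, consider the paired t statistic used by Student's test, where $\bar{d}$ is the mean per-query score difference between two systems, $s_d$ the sample standard deviation of those differences and $n$ the number of queries:

$$t = \frac{\bar{d}}{s_d / \sqrt{n}} = \sqrt{n}\,\frac{\bar{d}}{s_d}$$

Holding $\bar{d}$ and $s_d$ fixed, $t$ grows with $\sqrt{n}$. With the illustrative values $\bar{d} = 0.01$ and $s_d = 0.1$ (not taken from any experiment), $n = 25$ gives $t = 0.5$, well below the two-sided 5% critical value of about 2.06 for 24 degrees of freedom, while $n = 2500$ gives $t = 5$, significant far beyond the 1% level even though the mean improvement is the same small 0.01.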