Comments (17)
This is debatable.
Pro: avoids any leakage of test data into model selection
Con: triples that we know are true (since they are in the test set) are treated as wrong during validation
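To make the trade-off concrete, here is a minimal sketch of the filtered ranking protocol under discussion; all names are illustrative, none of this is LibKGE's actual API:

```python
import numpy as np

def filtered_rank(scores, true_obj, known_objs):
    """Rank of the true object among all candidate entities, after
    removing ("filtering") every other object that is known to form a
    true triple with the query (s, p).

    scores:     array of shape (num_entities,), model score for each
                candidate object of the query (s, p, ?)
    true_obj:   index of the correct object of this validation triple
    known_objs: indices of all objects forming true triples with (s, p)
                according to the chosen filter sets
    """
    keep = np.ones_like(scores, dtype=bool)
    for o in known_objs:
        if o != true_obj:
            keep[o] = False  # filtered candidates never count as errors
    # rank = 1 + number of unfiltered candidates scoring above the truth
    return 1 + int((scores[keep] > scores[true_obj]).sum())

# The whole debate is about what goes into known_objs during validation:
#   filter with train+valid:       known_objs from train | valid
#   filter with train+valid+test:  known_objs from train | valid | test
# In the second variant, a model that ranks *test* objects high is never
# penalized for it, which is the leakage concern raised above.
```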
As far as I can see, ConvE also filters out all known data. But I agree that the test set should be unknown during model selection. This should be prominently noted in the release!
Bordes also removes all known triples during validation:
https://github.com/glorotxa/SME/blob/master/FB15k_exp.py#L143
https://github.com/glorotxa/SME/blob/master/FB15k_exp.py#L310
The CP guy does it as well:
https://github.com/facebookresearch/kbc/blob/master/kbc/datasets/process_datasets.py#L73
And I am not going through the ConvE code again now, but he does it as well.
My suggestion:
We can have our "we know better, even if it's worse" implementation and give people the choice to use it. But as I said, when we do it differently than everyone else by default and get different results, this should be prominently noted in the release!
Also, people should know that their dev set results are not comparable to others', and why!
The cleanest solution might be to produce both numbers and not exclude anything by default:
- valid filtered (train/dev)
- valid filtered (train/dev/test)

and then have an explanation/discussion in the documentation of what each one is and why it is there.
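Both numbers come almost for free from a single scoring pass per triple; a sketch of this idea (names illustrative, not LibKGE's API):

```python
import numpy as np

def both_filtered_mrrs(score_objects, valid_triples, filt_tv, filt_tvt):
    """Validation MRR filtered with train/valid and with train/valid/test,
    computed from one scoring pass per validation triple.

    score_objects(s, p): scores over all candidate objects (np.ndarray)
    filt_tv[(s, p)]:  true objects occurring in train+valid
    filt_tvt[(s, p)]: true objects occurring in train+valid+test
    """
    rr_tv, rr_tvt = [], []
    for s, p, o in valid_triples:
        scores = score_objects(s, p)
        for out, filt in ((rr_tv, filt_tv), (rr_tvt, filt_tvt)):
            keep = np.ones_like(scores, dtype=bool)
            for known in filt.get((s, p), ()):
                if known != o:
                    keep[known] = False  # drop other known-true objects
            rank = 1 + int((scores[keep] > scores[o]).sum())
            out.append(1.0 / rank)
    return float(np.mean(rr_tv)), float(np.mean(rr_tvt))
```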
Well, if everybody is using test data anyway, why shouldn't we just "validate" against it? ;)
But seriously: I do not like giving both numbers by default because this encourages use of test data.
Instead, we can have a clearly documented option such as valid.filter_with_test: False. Then we can publish training/grid-search jobs whose configs show how they have been trained/selected.
A discussion in the docs is good to have, independently of this.
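Such a published job config could then look roughly like this; only valid.filter_with_test is taken from this thread, the remaining keys are illustrative placeholders:

```yaml
# sketch of a published training/grid-search job config
job.type: train                # placeholder keys, for illustration only
dataset.name: fb15k-237
model: complex
valid.filter_with_test: False  # off by default: no test data in model selection
```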
Btw, the "even if its worse" would be interesting to understand (right know it's merely a conjecture). How much is everybody relying on test data to train their models?
Perhaps: when valid.filter_with_test: True, we output both the filtered and filtered_with_test metrics. But the default should be not to use test data.
My current conjecture is this:
Filter with test: we are more likely to pick models during validation in which the test data is ranked high, because then the validation data has a higher chance to be ranked high. -> bad because of leakage
Filter without test: we are more likely to pick models during validation in which the test data is not ranked high, because then the validation data has a higher chance to be ranked high. -> also bad, because now we treat test triples as wrong and pick models that do the same
So there is leakage in both directions. Terrible.
I currently see no solution for the current datasets. Both decisions are wrong, and Rainer's suggestion goes against community-accepted practices and makes experiments incomparable to previous work.
@rgemulla can you please include an option valid.filter_with_test: True for the time being? As long as we are developing the framework, it would be nice to be sure that what we are doing converges to SOTA. Then we can explore what kind of effect valid.filter_with_test: False actually has, and we can safely decide based on some evidence and not a hunch.
"community accepted" ;) --- Who accepted this? And who silently did it?
Leakage both ways? No; when no test data is used, it can't leak. Also, there are already many true triples that the validation does not know about. For many pairs in validation, whether or not test data is used may not even matter (nothing filtered).
An alternative way would be to use a different validation protocol (e.g., classification against random triples).
Anyway, I'll add this technique but it will be turned off by default and be marked with a red flag. Whoever wants can override it in their model config.
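For the alternative protocol mentioned above (classification against random triples), a minimal sketch that never touches test data; score_spo and all other names are illustrative, not LibKGE's API:

```python
import numpy as np

def classification_accuracy(score_spo, valid_triples, num_entities, seed=0):
    """Validate by classification: pair each true validation triple with a
    randomly corrupted one and check that the model scores the true triple
    higher. No test data is involved, so nothing can leak. (A random
    corruption may occasionally be a true triple itself; a sketch like
    this simply ignores that.)
    """
    rng = np.random.default_rng(seed)
    correct = 0
    for s, p, o in valid_triples:
        o_neg = int(rng.integers(num_entities))  # random corrupted object
        correct += score_spo(s, p, o) > score_spo(s, p, o_neg)
    return correct / len(valid_triples)
```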
Option valid.filter_with_test added in a0b1004.
If all of you agree with this solution, we can close this issue.
I adopted it from Yanjie's and Daniel's evaluation code, thinking that KB people do it like this. Maybe after Bordes everybody did it like this (btw, put your work with Martin on the list, and Analogy as well).
There is no use in speculating about the effect it might or might not have.
Martin's code
https://github.com/drachenbach/thrax/blob/master/src/main.cpp#L116
https://github.com/drachenbach/thrax/blob/master/include/thrax/optimizer/Optimizer.h#L76
https://github.com/drachenbach/thrax/blob/master/include/thrax/evaluation/Evaluation.h#L261
Analogy
https://github.com/quark0/ANALOGY/blob/master/main.cpp#L682
https://github.com/quark0/ANALOGY/blob/master/main.cpp#L724
https://github.com/quark0/ANALOGY/blob/master/main.cpp#L438
a0b1004#diff-92c133ecf2cbcc44179341f071c31dc5R78
a0b1004#diff-1dd7c36dae44d494543baa0f3c7dbcd5R65
"evil", really? :) The language could be toned down a bit.
For me, the issue will be resolved as soon as we have published a paper where we explain our findings, reviewers agree with this, and the community uses this method from then on. :)
"Evil" was the most intuitive label I could come up with ;) --- Feel free to change it, I did not expect it to survive the day.
I think this issue is resolved from the implementation point of view (perhaps apart from the default value). I suggest inserting a documentation stub into the README and moving on for now.
Nothing to do here anymore.