Comments (17)
This is debatable.
Pro: avoids any leakage of test data into model selection
Con: triples that we know are true (since they are in the test set) are treated as wrong during validation
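To make the trade-off concrete, here is a minimal sketch of the filtered ranking protocol under discussion; all names are illustrative, none of this is LibKGE's actual API:

```python
import numpy as np

def filtered_rank(scores, true_obj, known_objs):
    """Rank of the true object among all candidate entities, after
    removing ("filtering") every other object that is known to form a
    true triple with the query (s, p).

    scores:     array of shape (num_entities,), model score for each
                candidate object of the query (s, p, ?)
    true_obj:   index of the correct object of this validation triple
    known_objs: indices of all objects forming true triples with (s, p)
                according to the chosen filter sets
    """
    keep = np.ones_like(scores, dtype=bool)
    for o in known_objs:
        if o != true_obj:
            keep[o] = False  # filtered candidates never count as errors
    # rank = 1 + number of unfiltered candidates scoring above the truth
    return 1 + int((scores[keep] > scores[true_obj]).sum())

# The whole debate is about what goes into known_objs during validation:
#   filter with train+valid:       known_objs from train | valid
#   filter with train+valid+test:  known_objs from train | valid | test
# In the second variant, a model that ranks *test* objects high is never
# penalized for it, which is the leakage concern raised above.
```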
As far as I can see, ConvE also filters out all known data. But I agree that the test set should be unknown during model selection. This should be prominently noted in the release!
Bordes also removes all known triples during validation:
https://github.com/glorotxa/SME/blob/master/FB15k_exp.py#L143
https://github.com/glorotxa/SME/blob/master/FB15k_exp.py#L310
The CP guy does it as well:
https://github.com/facebookresearch/kbc/blob/master/kbc/datasets/process_datasets.py#L73
And I am not going through the ConvE code again now, but he does it as well.
My suggestion:
We can have our "we know better, even if it's worse" implementation and give people the choice to use it. But as I said, when we do it differently than everyone else by default and get different results, this should be prominently noted in the release!
Also, people should know that their dev set results are not comparable to others', and why!
The cleanest solution might be to produce both numbers and not exclude anything by default:
- valid filtered (train/dev)
- valid filtered (train/dev/test)

and then have an explanation/discussion in the documentation of what each one is and why it is there.
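Both numbers come almost for free from a single scoring pass per triple; a sketch of this idea (names illustrative, not LibKGE's API):

```python
import numpy as np

def both_filtered_mrrs(score_objects, valid_triples, filt_tv, filt_tvt):
    """Validation MRR filtered with train/valid and with train/valid/test,
    computed from one scoring pass per validation triple.

    score_objects(s, p): scores over all candidate objects (np.ndarray)
    filt_tv[(s, p)]:  true objects occurring in train+valid
    filt_tvt[(s, p)]: true objects occurring in train+valid+test
    """
    rr_tv, rr_tvt = [], []
    for s, p, o in valid_triples:
        scores = score_objects(s, p)
        for out, filt in ((rr_tv, filt_tv), (rr_tvt, filt_tvt)):
            keep = np.ones_like(scores, dtype=bool)
            for known in filt.get((s, p), ()):
                if known != o:
                    keep[known] = False  # drop other known-true objects
            rank = 1 + int((scores[keep] > scores[o]).sum())
            out.append(1.0 / rank)
    return float(np.mean(rr_tv)), float(np.mean(rr_tvt))
```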
Well, if everybody is using test data anyway, why shouldn't we just "validate" against it? ;)
But seriously: I do not like giving both numbers by default because this encourages use of test data.
Instead, we can have a clearly documented option such as valid.filter_with_test: False. Then we can publish training/grid-search jobs whose configs show how they have been trained/selected.
A discussion in the docs is good to have, independently of this.
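Such a published job config could then look roughly like this; only valid.filter_with_test is taken from this thread, the remaining keys are illustrative placeholders:

```yaml
# sketch of a published training/grid-search job config
job.type: train                # placeholder keys, for illustration only
dataset.name: fb15k-237
model: complex
valid.filter_with_test: False  # off by default: no test data in model selection
```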
Btw, the "even if its worse" would be interesting to understand (right know it's merely a conjecture). How much is everybody relying on test data to train their models?
Perhaps: when valid.filter_with_test: True, we output both the filtered and filtered_with_test metrics. But the default should be not to use test data.
My current conjecture is this:
Filter with test: we are more likely to pick models during validation in which the test data is ranked high, because then the validation data has a higher chance to be ranked high. -> bad because of leakage
Filter without test: we are more likely to pick models during validation in which the test data is not ranked high, because then the validation data has a higher chance to be ranked high. -> also bad, because now we treat test triples as wrong and pick models that do the same
So there is leakage in both directions. Terrible.
I currently see no solution for the current datasets. Both decisions are wrong, and Rainer's suggestion goes against community-accepted practices and makes experiments incomparable to previous work.
@rgemulla can you please include an option valid.filter_with_test: True for the time being? As long as we are developing the framework, it would be nice to be sure that what we are doing converges to SOTA. Then we can explore what kind of effect valid.filter_with_test: False actually has, and we can safely decide based on some evidence and not a hunch.
"community accepted" ;) --- Who accepted this? And who silently did it?
Leakage both ways? No; when no test data is used, it can't leak. Also, there are already many true triples that the validation does not know about. For many pairs in validation, whether or not test data is used may not even matter (nothing filtered).
An alternative way would be to use a different validation protocol (e.g., classification against random triples).
Anyway, I'll add this technique but it will be turned off by default and be marked with a red flag. Whoever wants can override it in their model config.
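For the alternative protocol mentioned above (classification against random triples), a minimal sketch that never touches test data; score_spo and all other names are illustrative, not LibKGE's API:

```python
import numpy as np

def classification_accuracy(score_spo, valid_triples, num_entities, seed=0):
    """Validate by classification: pair each true validation triple with a
    randomly corrupted one and check that the model scores the true triple
    higher. No test data is involved, so nothing can leak. (A random
    corruption may occasionally be a true triple itself; a sketch like
    this simply ignores that.)
    """
    rng = np.random.default_rng(seed)
    correct = 0
    for s, p, o in valid_triples:
        o_neg = int(rng.integers(num_entities))  # random corrupted object
        correct += score_spo(s, p, o) > score_spo(s, p, o_neg)
    return correct / len(valid_triples)
```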
Option valid.filter_with_test added in a0b1004.
If all of you agree with this solution, we can close this issue.
I adopted it from Yanjie's and Daniel's evaluation code, thinking that KB people do it like this. Maybe after Bordes everybody did it like this (btw, put your work with Martin on the list, and Analogy as well).
There is no use in speculating about the effect it might or might not have.
Martin's code
https://github.com/drachenbach/thrax/blob/master/src/main.cpp#L116
https://github.com/drachenbach/thrax/blob/master/include/thrax/optimizer/Optimizer.h#L76
https://github.com/drachenbach/thrax/blob/master/include/thrax/evaluation/Evaluation.h#L261
Analogy
https://github.com/quark0/ANALOGY/blob/master/main.cpp#L682
https://github.com/quark0/ANALOGY/blob/master/main.cpp#L724
https://github.com/quark0/ANALOGY/blob/master/main.cpp#L438
a0b1004#diff-92c133ecf2cbcc44179341f071c31dc5R78
a0b1004#diff-1dd7c36dae44d494543baa0f3c7dbcd5R65
"evil", really? :) The language could be toned down a bit.
For me, the issue will be resolved as soon as we have published a paper where we explain our findings, reviewers agree with this, and the community uses this method from then on. :)
"Evil" was the most intuitive label I could come up with ;) --- Feel free to change it, I did not expect it to survive the day.
I think this issue is resolved from the implementation point of view (perhaps apart from the default value). I suggest inserting a documentation stub into the README and moving on for now.
Nothing to do here anymore.