Comments (17)

rgemulla commented on August 14, 2024

This is debatable.

Pro: avoids any leakage of test data into model selection
Con: triples that we know are true (since they are in the test set) are treated as wrong during validation
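
For concreteness, a minimal sketch of the filtered ranking under discussion (plain Python/NumPy with illustrative names, not the actual kge API); the only point of contention is which splits feed the filter set:

```python
import numpy as np

def filtered_rank(scores, answer, known_true):
    """Rank of `answer` among all candidate entities after removing the
    other known-true candidates (`known_true`) from the comparison."""
    scores = scores.copy()
    filter_out = list(known_true - {answer})  # never filter the answer itself
    scores[filter_out] = -np.inf
    # rank = 1 + number of candidates scoring strictly above the answer
    return 1 + int((scores > scores[answer]).sum())

# During validation, the two variants differ only in the filter set:
#   filtered_rank(scores, o, known_from_train_and_valid)               # without test
#   filtered_rank(scores, o, known_from_train_and_valid | known_test)  # with test
```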

samuelbroscheit commented on August 14, 2024

As far as I can see ConvE also filters all the known data. But I agree that test should be unknown during model selection. This should be prominently noted in the release!

samuelbroscheit commented on August 14, 2024

Bordes also removes all known triples during validation:

https://github.com/glorotxa/SME/blob/master/FB15k_exp.py#L143
https://github.com/glorotxa/SME/blob/master/FB15k_exp.py#L310

The CP guy does it as well:

https://github.com/facebookresearch/kbc/blob/master/kbc/datasets/process_datasets.py#L73

And I am not going through the code for ConvE again now, but he is doing it as well.

My suggestion:

We can have our "we know better, even if it's worse" implementation and give people the choice to use it. But as I said, if we do it differently by default than everyone else and get different results, then this should be prominently noted in the release!

Also, people should know that their dev set results are not comparable to others', and why!

rufex2001 commented on August 14, 2024

samuelbroscheit commented on August 14, 2024

The cleanest solution might be to produce both numbers and not exclude anything by default

valid filtered(train/dev)
valid filtered(train/dev/test)

and then have an explanation/discussion in the documentation of what each number is and why both are there.
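
A rough sketch of what reporting both numbers side by side could look like (metric names are illustrative only, not actual kge output keys):

```python
def mrr(ranks):
    """Mean reciprocal rank over all validation queries."""
    return sum(1.0 / r for r in ranks) / len(ranks)

# ranks_td  : ranks filtered against train + dev only
# ranks_tdt : ranks additionally filtered against test
ranks_td, ranks_tdt = [2, 1, 5], [1, 1, 4]  # toy numbers for illustration
metrics = {
    "mrr_filtered(train/dev)": mrr(ranks_td),
    "mrr_filtered(train/dev/test)": mrr(ranks_tdt),
}
print(metrics)
```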

rgemulla commented on August 14, 2024

Well if everybody is using test data anyway, why shouldn't we just "validate" against it? ;)

But seriously: I do not like giving both numbers by default because this encourages use of test data.

Instead, we can have a clearly documented option such as valid.filter_with_test: False. Then we can publish training/grid search jobs whose configs show how they were trained and selected.
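
A sketch of how such a flag could steer the filter splits (hypothetical option handling; the actual kge config machinery may differ):

```python
def filter_splits(config: dict) -> list:
    """Return the dataset splits used for filtering during validation."""
    splits = ["train", "valid"]
    if config.get("valid.filter_with_test", False):  # off by default
        splits.append("test")
    return splits

# A published grid-search config would then carry the choice explicitly:
print(filter_splits({"valid.filter_with_test": False}))  # ['train', 'valid']
print(filter_splits({"valid.filter_with_test": True}))   # ['train', 'valid', 'test']
```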

A discussion in the docs is good to have, independently of this.

rgemulla commented on August 14, 2024

Btw, the "even if it's worse" part would be interesting to understand (right now it's merely a conjecture). How much is everybody relying on test data to train their models?

Perhaps: when valid.filter_with_test: True, we output both the filtered and filtered_with_test metrics. But the default should be not to use test data.

samuelbroscheit commented on August 14, 2024

My current conjecture is this:

Filter with test: during validation we are more likely to pick models that rank the test data high, because then the validation data has a higher chance of being ranked high. -> bad because of leakage

Filter without test: during validation we are more likely to pick models that do not rank the test data high, because then the validation data has a higher chance of being ranked high. -> also bad, because now we treat test triples as wrong and pick the models that do the same
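
A tiny worked example of this conjecture (scores are made up): for a validation query (s, r, ?) with true answer v, suppose a true test triple (s, r, t) exists and the model scores t = 0.9, v = 0.8, and a genuinely wrong candidate x = 0.7.

```python
scores = {"t": 0.9, "v": 0.8, "x": 0.7}

def rank_of_v(filtered_out):
    # rank = 1 + number of unfiltered candidates scoring above v
    return 1 + sum(s > scores["v"] for c, s in scores.items()
                   if c != "v" and c not in filtered_out)

# Filter WITH test: t is removed, v is ranked 1st -> models that rank test
# triples high look better during model selection (test leaks into selection).
print(rank_of_v(filtered_out={"t"}))   # 1

# Filter WITHOUT test: t stays and counts as wrong, v is ranked 2nd -> models
# that push the known-true t down look better (leakage in the other direction).
print(rank_of_v(filtered_out=set()))   # 2
```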

samuelbroscheit commented on August 14, 2024

So there is leakage in both directions. Terrible.

samuelbroscheit commented on August 14, 2024

I currently see no solution for the current datasets. Both decisions are wrong, and Rainer's suggestion goes against community-accepted practice and makes experiments incomparable to previous work.

samuelbroscheit commented on August 14, 2024

@rgemulla can you please include an option valid.filter_with_test: True for the time being? While we develop the framework it would be nice to be sure that what we are doing converges to SOTA. Then we can explore what effect valid.filter_with_test: False actually has, and we can decide safely, based on evidence rather than a hunch.

rgemulla commented on August 14, 2024

"community accepted" ;) --- Who accepted this? And who silently did it?

Leakage both ways? No; when no test data is used, it can't leak. Also, there are already many true triples that the validation does not know about. For many pairs in validation, whether or not test data is used may not even matter (nothing filtered).

An alternative way would be to use a different validation protocol (e.g., classification against random triples).
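
For illustration, a minimal sketch of such a protocol (triple classification against randomly corrupted triples; illustrative only, and note that random negatives may occasionally collide with true triples):

```python
import random

def classification_accuracy(score, valid_triples, num_entities, threshold=0.0):
    """Score each validation triple and a randomly corrupted counterpart;
    count a hit when the true triple scores above the threshold and the
    corrupted one does not."""
    hits = 0
    for (s, r, o) in valid_triples:
        o_neg = random.randrange(num_entities)   # random object corruption
        hits += score(s, r, o) > threshold       # true triple -> positive
        hits += score(s, r, o_neg) <= threshold  # corrupted triple -> negative
    return hits / (2 * len(valid_triples))
```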

Anyway, I'll add this technique, but it will be turned off by default and marked with a red flag. Whoever wants it can override it in their model config.

rgemulla commented on August 14, 2024

Option valid.filter_with_test added in a0b1004

If all of you agree with this solution, we can close this issue.

samuelbroscheit commented on August 14, 2024

I adopted it from Yanjie's and Daniel's evaluation code thinking that KB people do it like this. Maybe after Bordes everybody did it like this (btw, put your work with Martin on the list, and Analogy as well).

There is no use in speculating about the effect it might or might not have.

Martin's code

https://github.com/drachenbach/thrax/blob/master/src/main.cpp#L116
https://github.com/drachenbach/thrax/blob/master/include/thrax/optimizer/Optimizer.h#L76
https://github.com/drachenbach/thrax/blob/master/include/thrax/evaluation/Evaluation.h#L261

Analogy

https://github.com/quark0/ANALOGY/blob/master/main.cpp#L682
https://github.com/quark0/ANALOGY/blob/master/main.cpp#L724
https://github.com/quark0/ANALOGY/blob/master/main.cpp#L438

samuelbroscheit commented on August 14, 2024

a0b1004#diff-92c133ecf2cbcc44179341f071c31dc5R78
a0b1004#diff-1dd7c36dae44d494543baa0f3c7dbcd5R65

"evil", really? :) The language could be toned down a bit.

For me the issue will be resolved as soon as we have published a paper where we explain our findings, reviewers agree with this, and the community uses this method from then on. :)

rgemulla commented on August 14, 2024

"Evil" was the most intuitive label I could come up with ;) --- Feel free to change it, I did not expect it to survive the day.

I think this issue is resolved from the implementation point of view (perhaps apart from the default value). I suggest inserting a documentation stub into the README and moving on for now.

rgemulla commented on August 14, 2024

Nothing to do here anymore.
