
TuckER: Tensor Factorization for Knowledge Graph Completion

This codebase contains a PyTorch implementation of the paper:

TuckER: Tensor Factorization for Knowledge Graph Completion. Ivana Balažević, Carl Allen, and Timothy M. Hospedales. Empirical Methods in Natural Language Processing (EMNLP), 2019. [Paper]

TuckER: Tensor Factorization for Knowledge Graph Completion. Ivana Balažević, Carl Allen, and Timothy M. Hospedales. ICML Adaptive & Multitask Learning Workshop, 2019. [Short Paper]
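
For readers who want the gist before the paper: TuckER scores a triple (e_s, r, e_o) by contracting a shared core tensor with the subject, relation, and object embeddings. A minimal sketch with illustrative dimensions (not the shipped model class, which adds batch normalization and dropout):

    import torch

    d_e, d_r = 200, 30                 # entity / relation embedding dimensions
    W = torch.randn(d_e, d_r, d_e)     # shared core tensor, shape (d_e, d_r, d_e) as in the paper
    e_s = torch.randn(d_e)             # subject entity embedding
    w_r = torch.randn(d_r)             # relation embedding
    e_o = torch.randn(d_e)             # object entity embedding

    # phi(e_s, r, e_o) = W x_1 e_s x_2 w_r x_3 e_o  (Tucker contraction)
    score = torch.einsum("i,iaj,a,j->", e_s, W, w_r, e_o)
    prob = torch.sigmoid(score)        # trained against 0/1 labels with BCE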

Link Prediction Results

Dataset     MRR    Hits@10  Hits@3  Hits@1
FB15k       0.795  0.892    0.833   0.741
WN18        0.953  0.958    0.955   0.949
FB15k-237   0.358  0.544    0.394   0.266
WN18RR      0.470  0.526    0.482   0.443

Running a model

To run the model, execute the following command:

 CUDA_VISIBLE_DEVICES=0 python main.py --dataset FB15k-237 --num_iterations 500 --batch_size 128
                                       --lr 0.0005 --dr 1.0 --edim 200 --rdim 200 --input_dropout 0.3 
                                       --hidden_dropout1 0.4 --hidden_dropout2 0.5 --label_smoothing 0.1

Available datasets are:

FB15k-237
WN18RR
FB15k
WN18

To reproduce the results from the paper, use the following combinations of hyperparameters with batch_size=128:

dataset    lr      dr     edim  rdim  input_d  hidden_d1  hidden_d2  label_smoothing
FB15k      0.003   0.99   200   200   0.2      0.2        0.3        0.0
WN18       0.005   0.995  200   30    0.2      0.1        0.2        0.1
FB15k-237  0.0005  1.0    200   200   0.3      0.4        0.5        0.1
WN18RR     0.003   1.0    200   30    0.2      0.2        0.3        0.1
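
For example, the WN18RR row corresponds to the following command (num_iterations is not fixed by the table; 500 matches the example above):

 CUDA_VISIBLE_DEVICES=0 python main.py --dataset WN18RR --num_iterations 500 --batch_size 128
                                       --lr 0.003 --dr 1.0 --edim 200 --rdim 30 --input_dropout 0.2
                                       --hidden_dropout1 0.2 --hidden_dropout2 0.3 --label_smoothing 0.1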

Requirements

The codebase is implemented in Python 3.6.6. Required packages are:

numpy      1.15.1
pytorch    1.0.1

Citation

If you find this codebase useful, please cite:

@inproceedings{balazevic2019tucker,
  title={TuckER: Tensor Factorization for Knowledge Graph Completion},
  author={Bala\v{z}evi\'c, Ivana and Allen, Carl and Hospedales, Timothy M},
  booktitle={Empirical Methods in Natural Language Processing},
  year={2019}
}

tucker's People

Contributors: ibalazevic

tucker's Issues

why do you have reverse triples in evaluation?

In the code

self.valid_data = self.load_data(data_dir, "valid", reverse=reverse)
self.test_data = self.load_data(data_dir, "test", reverse=reverse)

it should be

self.valid_data = self.load_data(data_dir, "valid", reverse=False)
self.test_data = self.load_data(data_dir, "test", reverse=False)

I tested this change and the results are much better than those reported in the paper. Please let me know if I am wrong.

why not use a 1-x score function?

Hi, thanks for your elegant code and work!
I am wondering how to implement a 1-x score function (the x in 1-x is the number of entities used to form the loss; 1-N uses all entities). In my opinion, 1-x should have much better performance than 1-N, because 1-N is hard to train in high dimensions.
So why not use a 1-x score function?
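
For readers unfamiliar with the terminology, a rough sketch of the two regimes (hypothetical shapes and tensors, not the repo's actual training loop):

    import torch

    n_entities, d_e, batch, x = 10000, 200, 32, 256

    h = torch.randn(batch, d_e)          # transformed (e_s, r) representations
    E = torch.randn(n_entities, d_e)     # entity embedding matrix

    # 1-N: score every (e_s, r) pair against ALL entities at once
    scores_1N = h @ E.t()                                # (batch, n_entities)

    # 1-x: score each pair against only x sampled candidate entities
    idx = torch.randint(n_entities, (batch, x))          # sampled entity ids
    scores_1x = torch.einsum("bd,bxd->bx", h, E[idx])    # (batch, x)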

Unable to reproduce results on WN18RR

Hi Ivana

I am trying to get entity embeddings for a downstream application. For the WN18RR dataset, I was unable to reproduce the reported results of TuckER. I used the hyperparameters given in the README of this repo. The following is the command I used:

 CUDA_VISIBLE_DEVICES=3 python main.py --dataset WN18RR --num_iterations 500 --batch_size 128 \
                                       --lr 0.01 --dr 1.0 --edim 200 --rdim 30 --input_dropout 0.2 \
                                       --hidden_dropout1 0.2 --hidden_dropout2 0.3 --label_smoothing 0.1

And the results are:

495
12.792492151260376
0.00035594542557143403
Validation:
Number of data points: 6068
Hits @10: 0.5121951219512195
Hits @3: 0.4728081740276862
Hits @1: 0.43638760711931446
Mean rank: 6254.662491760053
Mean reciprocal rank: 0.4624483298017613
Test:
Number of data points: 6268
Hits @10: 0.5140395660497766
Hits @3: 0.4738353541799617
Hits @1: 0.43123803446075304
Mean rank: 6595.924856413529
Mean reciprocal rank: 0.45961590280892123
5.328977823257446

Should I increase the number of epochs or am I missing something?

Thanks

Parameters for reproducing results from paper

Can you provide the parameters for reproducing the results from the paper on FB15k and FB15k-237? I ran the command from the README:

 CUDA_VISIBLE_DEVICES=0 python main.py --dataset FB15k-237 --num_iterations 500 --batch_size 128
                                       --lr 0.0005 --dr 1.0 --edim 200 --rdim 200 --input_dropout 0.3 
                                       --hidden_dropout1 0.4 --hidden_dropout2 0.5 --label_smoothing 0.1

which gave a final performance of:

Number of data points: 35070
Hits @10: 0.4009124607927003
Hits @3: 0.2555460507556316
Hits @1: 0.1760193897918449
Mean rank: 291.46401482748786
Mean reciprocal rank: 0.24741750020439274
Test:
Number of data points: 40932
Hits @10: 0.3974396560148539
Hits @3: 0.2546662757744552
Hits @1: 0.17094205022964917
Mean rank: 304.61949086289457
Mean reciprocal rank: 0.24344486414937788

Any ideas?

UPDATE: I noticed in the paper that you mention the best learning rate for FB15k-237 is 0.005 instead of 0.0005 and the best learning rate decay is 0.995 instead of 1.0 -- might that be the issue?

Do you only test tails in the evaluation?

I am a bit confused about the evaluation protocol. In the evaluation, you only feed (head, rel) to the model and get predictions with n elements representing the scores of (head, rel, t_1) ... (head, rel, t_n). Why don't you repeat this process with the tail fixed, predicting heads? Could you explain the reason? I think it should be done, right? Previous works all conduct the evaluation this way.
Maybe I misunderstood your code. Looking forward to your reply.
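
For context, one plausible reading of the codebase (an assumption, not an official answer): the valid/test sets are also loaded with reverse=True (see the first issue above), so every triple (e_s, r, e_o) additionally appears as (e_o, r_reverse, e_s), and tail-only scoring over the augmented set implicitly covers head prediction:

    # Assumption: mirrors what load_data(..., reverse=True) does to the data.
    test = [["e_s", "r", "e_o"]]
    test += [[o, r + "_reverse", s] for (s, r, o) in test]
    # Ranking all tails for ("e_o", "r_reverse", ?) is the same task as
    # ranking all heads for ("?", "r", "e_o"), so both directions are tested.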

Why set "padding_idx=0" in nn.Embedding

Hi~
I have found that the code sets "padding_idx=0" in nn.Embedding, like
self.E = torch.nn.Embedding(len(d.entities), d1, padding_idx=0)
self.R = torch.nn.Embedding(len(d.relations), d2, padding_idx=0)
However, this makes the gradients of the first entity and the first relation embedding zero. This is very interesting and I would like to know the reason for it. Thank you!
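
For context, a minimal demonstration of the behaviour the asker describes (standard PyTorch semantics, not specific to this repo):

    import torch

    # padding_idx=0 tells nn.Embedding to keep row 0 frozen: its gradient
    # is always zero, so that embedding is never updated during training.
    emb = torch.nn.Embedding(5, 3, padding_idx=0)
    out = emb(torch.tensor([0, 1, 2])).sum()
    out.backward()
    print(emb.weight.grad[0])   # tensor([0., 0., 0.]) -- row 0 gets no gradient
    print(emb.weight.grad[1])   # tensor([1., 1., 1.]) -- other rows update normally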

Could the one-way evaluation be a problem?

Hi,

I have a question on the evaluation in the code.

When the test rank is evaluated, the scores seem to be calculated only for each head against all tails. I don't see scores calculated for each tail against all heads in the code. Don't people usually calculate both directions and average them for the final scores? Could this one-way evaluation be a problem, for example by introducing some bias?

Thank you!

License

Hey,
Thanks for releasing this. Could you clarify the license of the repo? I couldn't find it anywhere in the README or code.

Is it MIT?

Realistic Ranking Evaluation

This paper mentions that when using realistic ranking, TuckER's performance decreases.

I was wondering whether this is because of applying the sigmoid before ranking, which may be causing numerical issues. It would be great if you could let me know what you think of this realistic ranking evaluation scheme!
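
A minimal illustration of the saturation the issue suspects (generic float32 behaviour, not taken from the repo):

    import torch

    # In float32, sigmoid maps all sufficiently large logits to exactly 1.0,
    # so distinct raw scores can collapse into ties before ranking.
    logits = torch.tensor([18.0, 25.0, 40.0])
    print(torch.sigmoid(logits))            # tensor([1., 1., 1.]) -- ties
    print(logits.argsort(descending=True))  # the raw logits still rank fine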

paper attribution

I want to ask whether your paper was published in a journal or at a conference.

data processing

def load_data(self, data_dir, data_type="train", reverse=False):
    with open("%s%s.txt" % (data_dir, data_type), "r") as f:
        data = f.read().strip().split("\n")
        data = [i.split() for i in data]
        if reverse:
            data += [[i[2], i[1]+"_reverse", i[0]] for i in data]
        return data

Are you sure it should be data = [i.split() for i in data] and not data = [i.split("\t") for i in data]? Your data is separated by tabs, but you split on whitespace. If I use data = [i.split("\t") for i in data], I cannot reproduce the result you report in your paper on FB15k-237. Can you explain this?
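
For what it's worth, the two calls should agree on FB15k-237-style lines, since entity and relation names contain no spaces (illustrative line, not taken from the shipped files):

    line = "/m/027rn\t/location/country/form_of_government\t/m/06cx9"
    print(line.split())      # ['/m/027rn', '/location/country/form_of_government', '/m/06cx9']
    print(line.split("\t"))  # the same three fields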

Why is self-adversarial negative sampling unfair?

You have said that 'The RotatE (Sun et al., 2019) results are reported without their self-adversarial negative sampling (see Appendix H in the original paper) for fair comparison'

What is the reason for it being unfair?

question about evaluation

In the paper, you say

for a given triple, we generate 2*n_e test triples by

  1. keeping the subject entity e_s and relation r fixed and replacing the object entity e_o with all possible entities E and by
  2. keeping the object entity e_o and relation r fixed and replacing the subject entity e_s with all entities E.

In the evaluate function, it looks like you score all possible e_o's given an (e_s, r) tuple, then compute the rank of the true e_o. So I see how you're doing 1) above, but are you actually doing 2)?

Thanks!
~ Ben

Unable to reproduce results on FB15k

Hey, I ran the code with the suggested parameters; however, I was not able to reproduce the results on FB15k.

On FB15k, I got the following results (the best MRR is 0.789 within 500 epochs):
500
30.736143827438354
0.00022165691843464258
Validation:
Number of data points: 100000
Hits @10: 0.88763
Hits @3: 0.8288
Hits @1: 0.73175
Mean rank: 39.8066
Mean reciprocal rank: 0.789614621151087
Test:
Number of data points: 118142
Hits @10: 0.8898613532867228
Hits @3: 0.8294086776929458
Hits @1: 0.7293595842291479
Mean rank: 38.221682382218006
Mean reciprocal rank: 0.7889229105464421

Hyperparameters for Yago3-10

Hi, thanks for developing this amazing model.
I'd like to try and train it on the Yago3-10 dataset (I think you have used it in your other work titled "Hypernetwork Knowledge Graph Embeddings").

Have you ever tried to train TuckER on that dataset?
Can you suggest any hyperparameter settings before I start running a long grid search? :)

Thanks for your help!

Andrea

Reopening evaluation issue

Hi,

I was just going through your code and noticed that the training data has been augmented by adding new relations for reversed triples from the training set (correct me if I am wrong). I am not sure whether this is harmless, as it might have a regularizing effect on the weights the model learns.

Instead of adding new relations for reversing the triples, could you try the following and check whether this gives the same result?

  1. Create d.train_data_reversed, where for each triple from d.train_data you only switch e_s and e_o and keep the relation. (So you don't create any new relations in this dataset.)
  2. Add to class TuckER a method forward_reversed that is exactly the same as forward, but transposes the tensor W, so that the axes for e_s and e_o are switched.
  3. When training, use forward for d.train_data and use forward_reversed for d.train_data_reversed

I think this way one can guarantee that the evaluation is fair. It would also be interesting to know how you evaluate the other models you compare with, for example whether you use the BCE loss and augment the training data for them as well. This would make sure that it is not the BCE loss or the data augmentation that makes TuckER perform well.
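
A sketch of step 2 of this proposal, assuming the repo's core tensor self.W has shape (d_r, d_e, d_e) and omitting the batch-norm/dropout plumbing of the real forward:

    import torch

    def forward_reversed(self, e1_idx, r_idx):
        # Swapping the last two axes of the core tensor swaps the roles of
        # e_s and e_o, so no new relation types are introduced.
        W_T = self.W.transpose(1, 2)
        e1 = self.E(e1_idx)                      # here e1 plays the role of e_o
        r = self.R(r_idx)
        W_mat = torch.mm(r, W_T.reshape(r.size(1), -1))
        W_mat = W_mat.view(-1, e1.size(1), e1.size(1))
        x = torch.bmm(e1.unsqueeze(1), W_mat).squeeze(1)
        return torch.sigmoid(x @ self.E.weight.t())  # scores over candidate e_s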

Reverse flag implementation

Hey, if I just change line 194 (d = Data(data_dir=data_dir, reverse=True)) in main.py to reverse=False and run the code for FB15k-237 with the recommended settings, the MRR shoots up to 0.4067. Is this expected behaviour?

To replicate:

  1. Just change reverse=False in main.py
  2. CUDA_VISIBLE_DEVICES=0 python main.py --dataset FB15k-237 --num_iterations 500 --batch_size 128 --lr 0.0005 --dr 1.0 --edim 200 --rdim 200 --input_dropout 0.3 --hidden_dropout1 0.4 --hidden_dropout2 0.5 --label_smoothing 0.1

MRR keeps increasing.

Log for iteration 145:
145
21.162700176239014
0.001321860825107114
Test:
Number of data points: 20466
Hits @10: 0.6135053259063813
Hits @3: 0.4763998827323366
Hits @1: 0.3409557314570507
Mean rank: 147.47825662073683
Mean reciprocal rank: 0.4335430118109515

Relation Prediction

Dear Ivana,

firstly, thank you for the work and the clean implementation. As I am interested in relation prediction via TuckER, I was wondering whether you could provide a function like def forward_rel(self, e1_idx, e2_idx): so that one could use TuckER for the relation prediction task.

Cheers
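
A sketch of the requested method (a hypothetical addition, again assuming the repo's core tensor self.W has shape (d_r, d_e, d_e) and omitting batch norm and dropout):

    import torch

    def forward_rel(self, e1_idx, e2_idx):
        e1 = self.E(e1_idx)                                # (B, d_e) subject embeddings
        e2 = self.E(e2_idx)                                # (B, d_e) object embeddings
        # Contract the core tensor with subject and object:
        # v[b, a] = sum_ij W[a, i, j] * e1[b, i] * e2[b, j]
        v = torch.einsum("bi,aij,bj->ba", e1, self.W, e2)  # (B, d_r)
        return torch.sigmoid(v @ self.R.weight.t())        # (B, n_relations)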

question about speed

Hi, I ran your code and data on a GPU with the same parameters and settings as yours, but it runs very slowly; memory usage is 740M/12202M, and my PyTorch version is 0.4. How fast do your experiments run? How can I improve this?

