
RSN

Lingbing Guo, Zequn Sun, Wei Hu*. Learning to Exploit Long-term Relational Dependencies in Knowledge Graphs. In: ICML 2019

INSTALLATION

  1. Please first install Python 3.5+, and then unpack data.7z.

  2. Run pip install -r requirements in a shell to install the required packages. Note that with TensorFlow 1.2+ the learning rate has to be re-adjusted; we suggest using tensorflow-gpu == 1.1.

RUNNING

  1. Run jupyter by typing jupyter notebook in shell.

  2. In the opened browser, click RSN4EA.ipynb for entity alignment, or RSN4KGC.ipynb for KG completion.

  3. The files RSN4EA.ipynb and RSN4KGC.ipynb record the latest results on DBP-WD (normal) and FB15K, respectively.

  3. You can also click 'Kernel -> Restart & Run All' in the toolbar to run both experiments.

DATA

  1. Due to space limitations, we only uploaded FB15K for KG completion. WN18 and FB15K-237 can easily be downloaded from the Internet.

  2. Change options.data_path or other options.* to run RSN on different datasets with different settings.

  3. For RSN4KGC.ipynb, we adopt a matrix filter method for evaluation, which may use more than 64 GB of memory.

  4. For entity alignment, V1 denotes the normal datasets and V2 denotes the dense ones. Please use the first 10% of ref_ent_ids for validation.

  5. entity-alignment-full-data.7z provides a complete version of the EA datasets, including attribute files and datasets with different proportions.
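The validation split mentioned in item 4 can be sketched as follows. This is a minimal sketch: the helper name and the tab-separated two-column format of ref_ent_ids are assumptions, not the repo's actual loading code.

```python
def split_ref_ent_ids(path, valid_ratio=0.1):
    """Read entity link pairs and reserve the first valid_ratio for validation.

    Assumes each line of ref_ent_ids holds one tab-separated pair of entity ids.
    Returns (validation_pairs, remaining_pairs).
    """
    with open(path) as f:
        pairs = [tuple(line.strip().split('\t')) for line in f if line.strip()]
    n_valid = int(len(pairs) * valid_ratio)
    return pairs[:n_valid], pairs[n_valid:]
```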

CITATION

If you find our work useful, please kindly cite it as follows:

@inproceedings{RSN,
  author    = {Lingbing Guo and Zequn Sun and Wei Hu},
  title     = {Learning to Exploit Long-term Relational Dependencies in Knowledge Graphs},
  booktitle = {ICML},
  pages     = {2505--2514},
  year      = {2019}
}


ISSUES

Save and restore the model?

Hello, as far as I can see there is no way to save and restore a model once training is over. It would be nice to have that feature :)

Why is the depth bias calculated differently from what's described in the paper?

In the sampler, the 'pre' column is used to represent the previous entity in the path. However, it doesn't change with the length of the path so far; it is always set to the first entity in the path. This happens when the 'pre' column is initialized with the line curr.loc[:, 'pre'] = hrt[:, 0], which puts the first entity of the path into the pre column.

For example, if the path so far is:
e1, r1, e2, r2, e3, r3, e4
and we are computing the depth bias for the entities that are neighbors of e4, then for each candidate entity ei we will find the bias between (e1, ei) rather than the bias between (e3, ei).

In the paper, it's described such that we calculate the bias between (ei-1, ei+1) rather than (e1, ei+1). Can you please explain why this is happening?
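To make the two definitions concrete, here is a tiny illustration with made-up entity ids (a hypothetical sketch, not the repo's sampler code):

```python
# Given a sampled path e1 -> e2 -> e3 -> e4 (relations omitted), the two
# candidate definitions of the "previous" entity used for the depth bias:
path = [1, 2, 3, 4]  # toy ids standing in for e1..e4

pre_as_head = path[0]       # what 'pre' holds when set only once: always e1
pre_as_previous = path[-2]  # what the paper describes: e3, the entity before e4
```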

Thanks,

Type-based NCE

Hi,

I found that you used type-based noise-contrastive estimation (NCE) for negative sampling, which is very interesting. Could you please tell me which part of your code implements the NCE?

Best regards
Sirui

"min" method in cal_ranks function: scoring ties in RSN

Hi, thanks for developing RSN.
I believe this is a ground-breaking model.

I am writing because I saw that the cal_ranks function you use to compute ranks in evaluation accepts a method parameter that is "min" by default.

If I get it correctly, this conveys the policy for handling ties.
If some entities have the same score as the target one (i.e., there is a tie), the "min" policy will still give the target entity the minimum rank. Other policies could be applied instead, e.g., average or max.

I was wondering how the choice of this policy affects the performance reported for RSN.
Of course, it depends on how many ties RSN generates in evaluation.
In your experience, is this something that happens very often?
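For illustration, the effect of the tie policy can be sketched with made-up scores (a minimal sketch of the ranking arithmetic, not the repo's cal_ranks):

```python
import numpy as np

# Five candidate scores; the target has score 0.9, tied with two others.
# Higher score = better.
scores = np.array([0.9, 0.9, 0.9, 0.5, 0.3])
target_score = 0.9

# "min" policy: the target gets the best rank within its tie group.
rank_min = int(np.sum(scores > target_score)) + 1
# "max" policy: the target gets the worst rank within its tie group.
rank_max = int(np.sum(scores >= target_score))
# "average" policy: midpoint of the tie group.
rank_avg = (rank_min + rank_max) / 2
```

With three tied candidates, the same prediction is reported as rank 1, 3, or 2.0 depending solely on the policy, which is why the choice matters when ties are frequent.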

Concerning the attribute files of the datasets

Hi, thank you for sharing the code. It has been really helpful! However, I cannot find the attribute files, which should be essential for implementing JAPE and GCN-Align. I wonder whether you can provide these files, or am I missing something?

Many thanks,
Weixin.

Sampler doesn't work

There are multiple bugs in the code.

model.sample_paths() fails because sample_paths() is not a method of the model class. I fixed this bug and found another bug in sample_paths() at the line:
rt_x = rtailkb.loc[hrt[:, 2]].apply(perform_random, axis=1)
The error was:
File "kgc.py", line 448, in sample_paths
rt_x = rtailkb.loc[hrt[:, 2]].apply(perform_random, axis=1)
File "/home/local/QCRI/ahmohamed/anaconda3/envs/rsn/lib/python3.6/site-packages/pandas/core/indexing.py", line 1767, in __getitem__
return self._getitem_axis(maybe_callable, axis=axis)
File "/home/local/QCRI/ahmohamed/anaconda3/envs/rsn/lib/python3.6/site-packages/pandas/core/indexing.py", line 1953, in _getitem_axis
return self._getitem_iterable(key, axis=axis)
File "/home/local/QCRI/ahmohamed/anaconda3/envs/rsn/lib/python3.6/site-packages/pandas/core/indexing.py", line 1594, in _getitem_iterable
keyarr, indexer = self._get_listlike_indexer(key, axis, raise_missing=False)
File "/home/local/QCRI/ahmohamed/anaconda3/envs/rsn/lib/python3.6/site-packages/pandas/core/indexing.py", line 1552, in _get_listlike_indexer
keyarr, indexer, o._get_axis_number(axis), raise_missing=raise_missing
File "/home/local/QCRI/ahmohamed/anaconda3/envs/rsn/lib/python3.6/site-packages/pandas/core/indexing.py", line 1654, in _validate_read_indexer
"Passing list-likes to .loc or [] with any missing labels "
KeyError: 'Passing list-likes to .loc or [] with any missing labels is no longer supported, see https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#deprecate-loc-reindex-listlike'
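For reference, one way to work around this pandas behavior is to drop the missing labels before indexing, since newer pandas raises when .loc receives labels absent from the index. The data below is a stand-in: rtailkb and hrt mirror the variable names in sample_paths(), but this fix is a suggestion, not the authors' code.

```python
import numpy as np
import pandas as pd

# Stand-in for the (relation, tail) knowledge base indexed by head entity.
rtailkb = pd.DataFrame({'r': [10, 11], 't': [5, 6]}, index=[1, 2])
# Stand-in (h, r, t) triples; tail entity 3 is missing from rtailkb's index.
hrt = np.array([[0, 10, 1], [0, 11, 3]])

keys = pd.Index(hrt[:, 2])
present = keys[keys.isin(rtailkb.index)]  # keep only labels that exist
rt_x = rtailkb.loc[present]               # no KeyError for missing labels
```

Alternatively, rtailkb.reindex(hrt[:, 2]) keeps all keys and fills missing rows with NaN, if the downstream sampling logic can handle them.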
