rdgcn's Introduction

RDGCN

Source code and datasets for IJCAI 2019 paper: Relation-Aware Entity Alignment for Heterogeneous Knowledge Graphs.

Initial datasets are from GCN-Align and JAPE.

Dependencies

  • Python >= 3.5
  • TensorFlow >= 1.8.0
  • SciPy >= 1.1.0
  • NumPy

Due to limited GPU memory, we ran our code on CPUs (40 Intel(R) Xeon(R) E5-2640 v4 cores @ 2.40GHz).

Datasets

Please first download the datasets here and extract them into the data/ directory.

There are three cross-lingual datasets in this folder:

  • fr-en
  • ja-en
  • zh-en

Taking the dataset DBP15K (ZH-EN) as an example, the folder "zh_en" contains the following files (a minimal loading sketch follows the list):

  • ent_ids_1: ids for entities in source KG (ZH);
  • ent_ids_2: ids for entities in target KG (EN);
  • ref_ent_ids: entity links encoded by ids;
  • triples_1: relation triples encoded by ids in source KG (ZH);
  • triples_2: relation triples encoded by ids in target KG (EN);
  • zh_vectorList.json: the input entity feature matrix initialized by word vectors;
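The id and triple files are plain text. The sketch below assumes the tab-separated layout used by the GCN-Align/JAPE releases (one "id<TAB>URI" line per entity, one "head<TAB>relation<TAB>tail" id triple per line, and one "zh_id<TAB>en_id" pair per entity link); please verify this against the downloaded files.

    # Minimal inspection sketch; the file layout is an assumption (see above).
    import json

    def load_id_map(path):
        # each line: "<entity_id>\t<entity_uri>"
        with open(path, encoding='utf-8') as f:
            return dict(line.rstrip('\n').split('\t') for line in f if line.strip())

    def load_id_tuples(path):
        # each line: whitespace-separated integer ids (triples or link pairs)
        with open(path, encoding='utf-8') as f:
            return [tuple(int(x) for x in line.split()) for line in f if line.strip()]

    ents_zh = load_id_map('data/zh_en/ent_ids_1')
    triples_zh = load_id_tuples('data/zh_en/triples_1')   # (head, relation, tail)
    links = load_id_tuples('data/zh_en/ref_ent_ids')      # (zh_id, en_id)

    with open('data/zh_en/zh_vectorList.json', encoding='utf-8') as f:
        vectors = json.load(f)                            # one vector per entity id
    print(len(ents_zh), len(triples_zh), len(links), len(vectors))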

Running

  • Modify the language and other settings in include/Config.py (see the sketch below this list)
  • cd to the directory containing main.py
  • Run python3 main.py
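The snippet below is only an illustrative sketch of the kind of options include/Config.py exposes; the attribute names and default values are assumptions and may not match the actual file.

    # include/Config.py -- illustrative sketch only; the actual attribute
    # names and defaults in the repository may differ.
    class Config:
        language = 'zh'   # which DBP15K subset to use: 'zh', 'ja', or 'fr'
        epochs = 600      # number of training epochs
        dim = 300         # embedding dimension (matches the word-vector input)
        k = 125           # number of negative samples per positive pair

After adjusting the settings, a typical run is simply python3 main.py from the directory that contains main.py.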

Due to the instability of embedding-based methods, it is normal for the results to fluctuate slightly (±1%) across repeated runs.

If you have any questions about reproduction, please feel free to email [email protected].

Citation

If you use this model or code, please cite it as follows:

Yuting Wu, Xiao Liu, Yansong Feng, Zheng Wang, Rui Yan, Dongyan Zhao. Relation-Aware Entity Alignment for Heterogeneous Knowledge Graphs. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19, pages 5278-5284, 2019.

@inproceedings{ijcai2019-733,
  title={Relation-Aware Entity Alignment for Heterogeneous Knowledge Graphs},
  author={Wu, Yuting and Liu, Xiao and Feng, Yansong and Wang, Zheng and Yan, Rui and Zhao, Dongyan},
  booktitle={Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, {IJCAI-19}},            
  pages={5278--5284},
  year={2019},
}

rdgcn's People

Contributors

mberr, stephaniewyt


rdgcn's Issues

OOM when allocating tensor with shape[2259000,300]

    Caused by op 'gradients/concat', defined at:
      File "main.py", line 34, in <module>
        Config.epochs, train, e, Config.k, test)
      File ".\include\Model.py", line 275, in training
        train_step = tf.train.AdamOptimizer(learning_rate).minimize(loss)

What's the effect of the translated name embeddings?

Very nice work!

I replaced the initialization method with random initialization and found that RDGCN failed to achieve promising performance. So I would like to know: what is the effect of the translated name embeddings?

Thanks!

Ablation study using only structure information

Hi,
Thanks for sharing your code.

I tried to conduct an ablation study that uses only the structure information, so I made some modifications to the following function.

RDGCN/include/Model.py

Lines 187 to 195 in d60ab1a

    def get_input_layer(e, dimension, lang):
        print('adding the primal input layer...')
        with open(file='data/' + lang + '_en/' + lang + '_vectorList.json', mode='r', encoding='utf-8') as f:
            embedding_list = json.load(f)
            print(len(embedding_list), 'rows,', len(embedding_list[0]), 'columns.')
        input_embeddings = tf.convert_to_tensor(embedding_list)
        ent_embeddings = tf.Variable(input_embeddings)
        return tf.nn.l2_normalize(ent_embeddings, 1)

After the modification, the entity embeddings are no longer initialized from pretrained word embeddings; instead, they are randomly initialized. However, random initialization leads to a significant drop in performance.

def get_input_layer(e, dimension, lang):
    print('adding the primal input layer...')
    with open(file='data/' + lang + '_en/' + lang + '_vectorList.json', mode='r', encoding='utf-8') as f:
        embedding_list = json.load(f)
        print(len(embedding_list), 'rows,', len(embedding_list[0]), 'columns.')
    input_embeddings = tf.convert_to_tensor(embedding_list)
    # ent_embeddings = tf.Variable(input_embeddings)
    ent_embeddings = tf.Variable(tf.random.uniform(shape=tf.shape(input_embeddings)))
    return tf.nn.l2_normalize(ent_embeddings, 1)

On the JA_EN dataset, H@1 only reaches 0.53% after 150/600 epochs. Did I miss something? I would be grateful if you could reply to this issue.

Thanks and regards,

What does this line mean?

The idea of the paper is great!
I understand the paper, but I have a problem when reading the source code. At line 140 of main.py:
what does logits = f_1 + tf.transpose(f_2) mean?
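For context, that pattern is the standard broadcasting trick used in GAT-style attention layers to build the full matrix of pairwise attention logits without concatenating every node pair. The sketch below illustrates the idea with NumPy; it is an assumption about the intent of that line, not an excerpt from main.py, and presumes f_1 and f_2 are [N, 1] per-node score columns.

    # Broadcasting sketch of the GAT-style attention-logit construction.
    # Assumes f_1 and f_2 are [N, 1] per-node scores; not taken from main.py.
    import numpy as np

    N = 4
    f_1 = np.random.rand(N, 1)   # "source-side" score for each node i
    f_2 = np.random.rand(N, 1)   # "target-side" score for each node j

    # [N, 1] + [1, N] broadcasts to [N, N]; entry (i, j) equals
    # f_1[i] + f_2[j], the unnormalized attention logit for the pair (i, j).
    logits = f_1 + f_2.T

    assert np.isclose(logits[2, 3], f_1[2, 0] + f_2[3, 0])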
