

RPJE

AAAI 2020: Rule-Guided Compositional Representation Learning on Knowledge Graphs

This is the C++ source code and data for the paper:

Guanglin Niu, Yongfei Zhang, Bo Li, Peng Cui, Si Liu, Jingyang Li, Xiaowei Zhang. Rule-Guided Compositional Representation Learning on Knowledge Graphs. In AAAI, 2020. Paper available on arXiv.

Author: Dr. Guanglin Niu (beihangngl at buaa.edu.cn)

Introduction

Rule and Path-based Joint Embedding (RPJE) takes full advantage of the explainability and accuracy of logic rules, the generalization ability of knowledge graph (KG) embedding, and the supplementary semantic structure of paths. RPJE achieves better performance, with higher accuracy and explainability, on the KG completion task.

Dataset

We provide four datasets: FB15K, FB15K237, WN18 and NELL-995. You can find all the datasets, as well as the encoded rules mined from each dataset, in the folders ./data_FB15K, ./data_FB15K237, ./data_WN18 and ./data_NELL-995, each of which contains the following files:

  • entity2id.txt: Entity file containing all the entities in the dataset. Each line is an entity and its id: (entity name, entity id).
  • relation2id.txt: Relation file containing all the relations in the dataset. Each line is a relation and its id: (relation name, relation id).
  • train.txt: Training data file containing all the triples in the train set. Each line is a triple in the format (head entity name, tail entity name, relation name).
  • valid.txt: Validation data file containing all the triples in the valid set. Each line is a triple in the format (head entity name, tail entity name, relation name).
  • test.txt: Testing data file containing all the triples in the test set. Each line is a triple in the format (head entity name, tail entity name, relation name).
  • train_pra.txt: Training data file containing all the triples together with the paths linking their entity pairs. Each training instance consists of two lines. The first line is a triple from train.txt but in the format (head entity name, tail entity name, relation id), and the second line is the path information linking the entity pair of this triple, in the format (number of paths, relation id list of path 1, reliability of path 1, relation id list of path 2, reliability of path 2, ...).
  • test_pra.txt: Testing data file containing all the triples together with the paths linking their entity pairs. Each test instance consists of two lines. The first line is a triple from test.txt but in the format (head entity name, tail entity name, relation id), and the second line is the path information linking the entity pair of this triple, in the format (number of paths, length of path 1, relation id list of path 1, reliability of path 1, length of path 2, relation id list of path 2, reliability of path 2, ...). A parsing sketch for this two-line layout follows this list.
  • confidence.txt: Confidence file containing all the paths with their corresponding direct relations in the dataset. Each entry consists of two lines. The first line is a path in the format (length of the path, relation id list of the path), and the second line lists all the relations related to this path, in the format (number of relations, relation 1 id, reliability of the path representing relation 1, relation 2 id, reliability of the path representing relation 2, ...).
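
To make the two-line layout of test_pra.txt concrete, here is a minimal C++ parsing sketch. It assumes whitespace-separated fields; the file path, struct name and field names are illustrative only and are not taken from the released code.

#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// One test instance: a triple plus the paths connecting its entity pair.
struct PathInstance {
    std::string head, tail;   // entity names from the first line
    int relation;             // relation id from the first line
    // each path: a relation-id sequence and its reliability score
    std::vector<std::pair<std::vector<int>, double>> paths;
};

int main() {
    std::ifstream fin("./data_FB15K/test_pra.txt");   // path is an assumption
    std::vector<PathInstance> instances;
    std::string triple_line, path_line;
    while (std::getline(fin, triple_line) && std::getline(fin, path_line)) {
        PathInstance inst;
        std::istringstream ts(triple_line);
        ts >> inst.head >> inst.tail >> inst.relation;

        std::istringstream ps(path_line);
        int num_paths = 0;
        ps >> num_paths;
        for (int i = 0; i < num_paths; ++i) {
            int len = 0;
            ps >> len;                         // length of path i
            std::vector<int> rels(len);
            for (int j = 0; j < len; ++j) ps >> rels[j];
            double reliability = 0.0;
            ps >> reliability;                 // reliability of path i
            inst.paths.emplace_back(rels, reliability);
        }
        instances.push_back(inst);
    }
    std::cout << "loaded " << instances.size() << " test instances" << std::endl;
    return 0;
}

The same loop works for train_pra.txt, with the second line read according to the train format described above.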

In each dataset folder, the subfolder ./rule contains all the encoded rules mined with various confidence thresholds:

  • rule_path[n].txt: Rule file containing all the encoded rules of length 2 mined from the dataset with the confidence threshold n. Each line is an encoded rule in the format (id of the first relation in the rule body, id of the second relation in the rule body, id of the relation in the rule head).
  • rule_relation[n].txt: Rule file containing all the encoded rules of length 1 mined from the dataset with the confidence threshold n. Each line is an encoded rule in the format (id of the relation in the rule body, id of the relation in the rule head). A loading sketch for both rule files follows this list.
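
As a small illustration of these two formats, the sketch below loads both rule files into lookup maps. The threshold value 0.8 in the file names, the dataset folder and the container names are assumptions chosen for the example, not values fixed by the released code.

#include <fstream>
#include <map>
#include <utility>
#include <vector>

int main() {
    // length-2 rules: (first body relation, second body relation) -> head relations
    std::map<std::pair<int, int>, std::vector<int>> rules_len2;
    std::ifstream f2("./data_FB15K/rule/rule_path0.8.txt");
    int r1, r2, rh;
    while (f2 >> r1 >> r2 >> rh) rules_len2[{r1, r2}].push_back(rh);

    // length-1 rules: body relation -> head relations
    std::map<int, std::vector<int>> rules_len1;
    std::ifstream f1("./data_FB15K/rule/rule_relation0.8.txt");
    while (f1 >> r1 >> rh) rules_len1[r1].push_back(rh);
    return 0;
}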

Please note that all the above data contain only positive instances. The negative instances are generated during training.
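
For context, translation-based embedding methods of this family typically generate negatives by replacing the head or tail entity of a positive triple with a randomly sampled entity, rejecting corrupted triples that still appear in the KG. The sketch below only illustrates that idea; the function and struct names, the uniform sampling and the 50/50 head/tail choice are assumptions, not code taken from Train_RPJE.cpp.

#include <iostream>
#include <random>
#include <set>
#include <tuple>

struct Triple { int h, r, t; };   // head entity id, relation id, tail entity id

// Corrupt the head or tail of a positive triple; resample while the corrupted
// triple is still a known positive (to avoid false negatives).
Triple corrupt(const Triple& pos, int num_entities,
               const std::set<std::tuple<int, int, int>>& known,
               std::mt19937& rng) {
    std::uniform_int_distribution<int> ent(0, num_entities - 1);
    std::bernoulli_distribution replace_head(0.5);
    Triple neg;
    do {
        neg = pos;
        if (replace_head(rng)) neg.h = ent(rng);   // corrupt the head entity
        else                   neg.t = ent(rng);   // corrupt the tail entity
    } while (known.count(std::make_tuple(neg.h, neg.r, neg.t)));
    return neg;
}

int main() {
    std::mt19937 rng(42);
    std::set<std::tuple<int, int, int>> known;
    known.insert(std::make_tuple(0, 3, 7));        // a toy positive triple
    Triple neg = corrupt({0, 3, 7}, 100, known, rng);
    std::cout << neg.h << " " << neg.r << " " << neg.t << std::endl;
    return 0;
}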

Example to Run the Code

First, select the dataset and the rule confidence threshold in the training file Train_RPJE.cpp, then set the following hyperparameters (an illustrative settings block follows this list):

  • dimension: dimension of the entity and relation embeddings
  • nbatches: number of batches per epoch
  • nepoches: number of training epochs
  • alpha: learning rate
  • margin: margin in the max-margin loss for training
  • lambda: weight of the paths and length-2 rules in the loss function
  • lambda_rule: weight of the length-1 rules in the loss function
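
As a rough illustration of how such a settings block might look inside the training file, the snippet below declares the hyperparameters listed above; the concrete values are assumptions, not the defaults shipped with the code or used in the paper.

// Illustrative hyperparameter block; the values below are assumptions,
// not the defaults in Train_RPJE.cpp.
int    dimension   = 100;     // embedding dimension for entities and relations
int    nbatches    = 100;     // number of batches per epoch
int    nepoches    = 1000;    // number of training epochs
double alpha       = 0.001;   // learning rate
double margin      = 1.0;     // margin in the max-margin loss
double lambda      = 1.0;     // weight of paths and length-2 rules in the loss
double lambda_rule = 1.0;     // weight of length-1 rules in the loss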

Compile

g++ Train_RPJE.cpp -o Train_RPJE -O2
g++ Test_RPJE.cpp -o Test_RPJE -O2

Train

./Train_RPJE

Test

./Test_RPJE

Citation

@inproceedings{RPJE19,
  author    = {Guanglin Niu and
               Yongfei Zhang and
               Bo Li and
               Peng Cui and
               Si Liu and
               Jingyang Li and
               Xiaowei Zhang},
  title     = {Rule-Guided Compositional Representation Learning on Knowledge Graphs},
  booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence},
  year      = {2020}
}


Issues

rule folder?

Hello, after extracting rules with the AMIE+ tool cited in the paper, I obtain many more rules than the ones you released. How did you deal with this? Did you keep only the rules with higher confidence? I hope you can answer, thank you!

About the paper.

Dear author, I have some questions about the paper "Rule-Guided Compositional Representation Learning on Knowledge Graphs".

  1. About the loss L3: why is the confidence level, denoted as β, applied only to the positive energy? The paper says "The confidence levels of all the rules are considered to be penalty coefficients in optimization". Can you explain this for me?

  2. When evaluating the model, the paper does not use the third energy E3 in the triple score. Why?

Looking forward to your response.

Segmentation fault(core dumped)

Can you tell me the details of the runtime environment? My g++ version is 7.5.0, but I get the error "Segmentation fault (core dumped)" when I run the command ./Train_RPJE.

'n2n.txt' missing

Hello,

There is a file called 'n2n.txt' missing from the FB15K237 folder, which is required at line 666 of Test_RPJE.cpp: FILE* f7 = fopen("./n2n.txt","r"); Could you please double-check what happened? Or is there any other way I can reproduce your results on FB15K237?

Thank you
