
eea-eval

Experimental analysis of knowledge graph embedding for entity alignment

We developed a degree-based sampling method to generate 42 alignment-oriented datasets from real-world large-scale KGs, representing different heterogeneities of the original KGs. We selected three state-of-the-art embedding-based entity alignment methods for evaluation and comparison. Furthermore, we observed that multi-mapping relations and literal embedding are the two main obstacles for embedding-based entity alignment, and we attempted some preliminary solutions. Specifically, we leveraged several enhanced KG embedding models to handle multi-mapping relations and used word2vec to incorporate literal similarities into embeddings. Our findings indicate that the performance of existing embedding-based methods is influenced by the characteristics of the datasets, and that not all KG embedding models are suitable for entity alignment. Alignment-oriented KG embedding remains to be explored.
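To make the literal-embedding idea concrete, below is a minimal sketch of computing a literal similarity from averaged word vectors, in the spirit of Label2Vec. The vocabulary and vector values are made up for illustration; in practice the vectors would come from a trained word2vec model, and this is not the repository's implementation.

```python
import numpy as np

# Toy stand-in vectors for a few words; in practice these would be looked up
# in a trained word2vec model (values here are invented for illustration).
word_vecs = {
    "united": np.array([0.9, 0.1, 0.0]),
    "states": np.array([0.8, 0.2, 0.1]),
    "germany": np.array([0.1, 0.9, 0.4]),
    "deutschland": np.array([0.2, 0.8, 0.5]),
}

def label_embedding(label):
    """Embed a literal (e.g. an entity label) as the average of its word vectors."""
    vecs = [word_vecs[w] for w in label.lower().split() if w in word_vecs]
    return np.mean(vecs, axis=0) if vecs else np.zeros(3)

def literal_similarity(a, b):
    """Cosine similarity between two literal embeddings."""
    va, vb = label_embedding(a), label_embedding(b)
    denom = np.linalg.norm(va) * np.linalg.norm(vb)
    return float(va @ vb / denom) if denom else 0.0
```

With good word vectors, cross-lingual label pairs such as "Germany"/"Deutschland" score higher than unrelated pairs, which is the signal incorporated into the embeddings.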

Dataset

Description

We considered the following four aspects when building our datasets: source KG, dataset language, entity size, and the difference in degree distribution between the extracted datasets and the original KGs. We selected three well-known KGs as our sources: DBpedia (2016-10), Wikidata (20160801) and YAGO3. For DBpedia, we also formed two cross-lingual datasets: English-French and English-German. In terms of entity size, we sampled two kinds of datasets, with 15K and 100K entities respectively. Each dataset comes in two versions, V1 and V2: V1 approximates the degree distribution of the source KG, while V2 fits a doubled average degree. Due to the lack of sufficient prior alignments, we only built V1 for the cross-lingual DBP-100K datasets. For each version, three samples were generated to mitigate sampling randomness. Each dataset consists of five files:

  • ent_links: reference entity alignment
  • triples_1: relation triples of sampled entities in KG1
  • triples_2: relation triples of sampled entities in KG2
  • attr_triples_1: attribute triples of sampled entities in KG1
  • attr_triples_2: attribute triples of sampled entities in KG2
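As a concrete picture of the file layout, here is a minimal parsing sketch. It assumes each line holds one tab-separated record (two entity URIs in ent_links, three fields in the triple files), which you should verify against the downloaded files.

```python
def parse_links(lines):
    """Parse reference alignment lines of the form: <entity1>\t<entity2>."""
    pairs = []
    for line in lines:
        line = line.rstrip("\n")
        if line:
            e1, e2 = line.split("\t")
            pairs.append((e1, e2))
    return pairs

def parse_triples(lines):
    """Parse triple lines of the form: <subject>\t<predicate>\t<object>.
    Split at most twice, since attribute values may themselves contain tabs."""
    triples = []
    for line in lines:
        line = line.rstrip("\n")
        if line:
            s, p, o = line.split("\t", 2)
            triples.append((s, p, o))
    return triples

# Reading a downloaded file would then look like:
# with open("dbp_wd_15k_V1/ent_links", encoding="utf-8") as f:
#     links = parse_links(f)
```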

Download

All datasets can be downloaded from Datahub or Dropbox, where the three folders named "_1", "_2" and "_3" contain our three samples.

Degree distribution example

Below is an example of the degree distributions of a source KG and its sampled datasets (here, DBP-WD-15K). The red curves represent the V1 version and the blue curves the V2 version; solid curves denote the source KG and dotted curves the sampled dataset.
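For reference, the degree of an entity here is the number of relation triples it participates in, and a distribution like those plotted can be computed as follows (a simple sketch, not the repository's plotting code):

```python
from collections import Counter

def degree_distribution(triples):
    """Return {degree: number_of_entities}, where an entity's degree is the
    number of relation triples in which it appears as head or tail."""
    degree = Counter()
    for head, _, tail in triples:
        degree[head] += 1
        degree[tail] += 1
    return dict(Counter(degree.values()))
```

For a triangle of three entities, every entity has degree 2, so the distribution is {2: 3}.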

100K datasets statistics

The statistics of the 100K datasets are shown below.

DBP-WD-100K

|               |     | V1 DBpedia | V1 Wikidata | V2 DBpedia | V2 Wikidata |
|---------------|-----|-----------:|------------:|-----------:|------------:|
| Relations     | S1  | 358        | 216         | 333        | 221         |
|               | S2  | 364        | 211         | 333        | 226         |
|               | S3  | 368        | 217         | 347        | 221         |
|               | AVG | 363        | 215         | 338        | 223         |
| Attributes    | S1  | 463        | 807         | 349        | 740         |
|               | S2  | 486        | 791         | 390        | 731         |
|               | S3  | 466        | 783         | 402        | 756         |
|               | AVG | 472        | 794         | 380        | 742         |
| Rel. triples  | S1  | 257,398    | 226,585     | 497,241    | 503,836     |
|               | S2  | 259,100    | 224,863     | 493,865    | 484,209     |
|               | S3  | 269,471    | 237,846     | 519,713    | 517,948     |
|               | AVG | 261,990    | 229,765     | 503,606    | 501,998     |
| Attr. triples | S1  | 399,424    | 593,332     | 385,004    | 838,155     |
|               | S2  | 398,373    | 587,581     | 397,852    | 830,654     |
|               | S3  | 397,787    | 619,950     | 389,973    | 856,447     |
|               | AVG | 398,528    | 600,288     | 390,943    | 841,752     |
DBP-YG-100K

|               |     | V1 DBpedia | V1 YAGO | V2 DBpedia | V2 YAGO |
|---------------|-----|-----------:|--------:|-----------:|--------:|
| Relations     | S1  | 326        | 30      | 311        | 31      |
|               | S2  | 358        | 31      | 320        | 31      |
|               | S3  | 337        | 30      | 303        | 31      |
|               | AVG | 340        | 30      | 311        | 31      |
| Attributes    | S1  | 404        | 24      | 347        | 24      |
|               | S2  | 415        | 24      | 335        | 23      |
|               | S3  | 402        | 24      | 343        | 23      |
|               | AVG | 407        | 24      | 342        | 23      |
| Rel. triples  | S1  | 261,038    | 277,779 | 457,197    | 535,106 |
|               | S2  | 281,143    | 318,434 | 443,115    | 522,817 |
|               | S3  | 280,904    | 313,147 | 457,888    | 529,100 |
|               | AVG | 274,362    | 303,120 | 452,733    | 529,008 |
| Attr. triples | S1  | 425,648    | 141,936 | 442,973    | 108,338 |
|               | S2  | 413,532    | 131,411 | 442,122    | 111,467 |
|               | S3  | 420,947    | 136,464 | 448,000    | 105,639 |
|               | AVG | 420,042    | 136,604 | 444,365    | 108,481 |
DBP(en_fr)-100K-V1 and DBP(en_de)-100K-V1

|               |     | en_fr: en | en_fr: fr | en_de: en | en_de: de |
|---------------|-----|----------:|----------:|----------:|----------:|
| Relations     | S1  | 329       | 257       | 305       | 163       |
|               | S2  | 331       | 254       | 310       | 167       |
|               | S3  | 331       | 256       | 305       | 169       |
|               | AVG | 330       | 256       | 307       | 166       |
| Attributes    | S1  | 332       | 469       | 360       | 494       |
|               | S2  | 331       | 478       | 361       | 494       |
|               | S3  | 331       | 480       | 357       | 489       |
|               | AVG | 331       | 476       | 359       | 492       |
| Rel. triples  | S1  | 367,096   | 294,440   | 273,093   | 230,586   |
|               | S2  | 367,190   | 294,378   | 274,256   | 232,439   |
|               | S3  | 367,328   | 294,471   | 275,022   | 232,364   |
|               | AVG | 367,205   | 294,430   | 274,124   | 231,796   |
| Attr. triples | S1  | 403,321   | 361,330   | 437,144   | 684,663   |
|               | S2  | 402,443   | 361,648   | 436,472   | 685,318   |
|               | S3  | 402,764   | 361,788   | 439,633   | 689,150   |
|               | AVG | 402,843   | 361,589   | 437,750   | 686,377   |

Code

Code files

Folder "code" contains two subfolders:

  • "comparative_method" contains the code of all comparative methods. The correspondence between code files and methods is as follows:
    • "MTransE.py": MTransE
    • "IPTransE.py": IPTransE
    • "JAPE.py": JAPE
    • "TransD_plus.py": TransD+
    • "TransH_plus.py": TransH+
    • "TransH_2plus.py": TransH++
    • "Label2Vec.py": Label2Vec
  • "data_handler" contains the code of our degree-based sampling method.
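The exact sampling algorithm lives in "data_handler"; purely as an illustration of the degree-based idea (not the authors' implementation), one way to raise a sample's average degree, as in the V2 datasets, is to iteratively discard the lowest-degree entities:

```python
from collections import Counter

def prune_to_avg_degree(triples, target_avg):
    """Illustrative sketch: remove the lowest-degree entity (and its triples)
    until the average entity degree reaches target_avg."""
    triples = set(triples)
    while True:
        degree = Counter()
        for h, _, t in triples:
            degree[h] += 1
            degree[t] += 1
        # Average degree = total endpoint count / number of entities.
        if not degree or 2 * len(triples) / len(degree) >= target_avg:
            return triples
        victim = min(degree, key=degree.get)  # lowest-degree entity
        triples = {tr for tr in triples if victim not in (tr[0], tr[2])}
```

Dropping sparse entities shrinks the entity set faster than the triple set, so the average degree of the remainder rises toward the target.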

Dependencies

The code is based on Python 3 and depends on TensorFlow, SciPy, NumPy, and scikit-learn.

Code running

To run the code, modify the training data path and the supervision ratio in the code file, then execute python3 "code_file.py". For example, to run MTransE on DBP-WD-15K-V1 with 30% supervision, first set the two parameters in the main function of MTransE.py to "../ISWC2018/dbp_wd_15k_V1/" and 0.3, respectively, then execute python3 MTransE.py. During training, logs and results are printed to the screen.

A simpler alternative is to pass both values as command-line arguments: python3 "code_file.py" "data_folder" "supervision_ratio". For the example above, you can directly execute python3 MTransE.py ../ISWC2018/dbp_wd_15k_V1/ 0.3.

The parameters used by the compared methods can be modified as needed in the file "param.py".

Experimental Results

The file detailed_result.csv contains our detailed experimental results. Folder "figure" contains some figures about our experimental results.

If you have any difficulty or question about our datasets, source code or reproducing the experimental results, please email [email protected], [email protected] or [email protected].

