
owe's People

Contributors

haseebs


owe's Issues

Data for head prediction

Hi there,

I noticed that the dataset you provided (FB15k-237-OWE) represents the open-world tail-prediction data described in your paper. However, is there also a way to construct the head-prediction dataset?

Thanks

Process killed during dataload

During data loading for a large KG (900k+ triples) in data.py:

def init_labels(self) -> None:
    # ... other code ...
    self.head_labels = {k: to_one_hot(t, self.train.num_entities) for k, t in all_heads.items()}
    self.tail_labels = {k: to_one_hot(t, self.train.num_entities) for k, t in all_tails.items()}

I have around 1M head and tail items combined. My guess is that the process is being targeted by the OOM killer. Any ideas why?
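One way to cut memory here (a sketch under my assumptions, not the repository's code; `all_heads`/`all_tails` and `num_entities` follow the snippet above) is to keep only the label indices and materialize each one-hot vector on demand instead of holding ~1M dense rows at once:

```python
def init_label_indices(all_heads, all_tails):
    """Store label indices instead of dense one-hot vectors.

    A dense one-hot dict over ~1M keys needs O(keys * num_entities)
    floats, which can easily trigger the OOM killer; keeping the raw
    index lists is only O(total labels).
    """
    head_label_indices = {k: list(t) for k, t in all_heads.items()}
    tail_label_indices = {k: list(t) for k, t in all_tails.items()}
    return head_label_indices, tail_label_indices


def to_one_hot(indices, num_entities):
    """Materialize a single multi-hot vector only when a batch needs it."""
    vec = [0.0] * num_entities
    for i in indices:
        vec[i] = 1.0
    return vec
```

The per-batch `to_one_hot` call trades a little compute for a large, predictable drop in resident memory.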

training KGC models using Open-KE

Hi,

I wonder whether I can train the KGC model using only the code in the "owe" folder you provided, without using the OpenKE framework. Also, what is the relationship between the two folders you provided? I'm sorry, I'm new to this, so I may have some naive questions.

Thanks

Regarding training OpenKE with bigger graphs

This is technically not part of your contribution, but I was wondering if you could let me know how to train embeddings on bigger graphs. I am using a GPU with 32 GB of memory, and training exceeds its capacity.

WARNING: Config setting: 'cuda' not found

WARNING: Config setting: 'cuda' not found. Returning None!

If I understand correctly, this is thrown because a 'cuda' setting was not found in the config.ini file. Would this affect the speed of training?
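If that reading is right, adding the missing entry to config.ini should silence the warning. A minimal sketch (the section name and accepted values are assumptions; check the example config shipped with the repository):

```ini
[DEFAULT]
; hypothetical entry: whether to run on GPU; name and value are guesses
cuda = True
```

Whether the missing setting slows training depends on what the code falls back to when the lookup returns None.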

FB20k dataset

Hi, thanks for sharing this wonderful work!

Can you share the processed FB20K dataset?

AttributeError: 'NoneType' object has no attribute 'endswith'

This problem occurs when I run 'run_open_world.py'. The first thing that goes wrong is the following line in 'data.py':
return KeyedVectors.load_word2vec_format(embedding_file, binary=not embedding_file.endswith(".txt"))
I would like to know what this line of code does and what the endswith function means. Thank you!
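To unpack that line (a sketch of its logic, not a change to the repository): it loads pretrained word vectors with gensim, and `str.endswith(".txt")` checks the filename suffix to decide whether to read the file as word2vec text or binary format. The AttributeError in the issue title ('NoneType' object has no attribute 'endswith') suggests `embedding_file` itself is None, i.e. the `PretrainedEmbeddingFile` config value was likely never set:

```python
def load_vectors_binary_flag(embedding_file):
    """Decide whether vectors should be loaded in binary or text mode.

    str.endswith(".txt") is True for plain-text files, so `binary`
    becomes False for them and True for everything else. If
    embedding_file is None (e.g. the config entry is missing),
    .endswith raises AttributeError, so we fail early with a clearer
    message instead.
    """
    if embedding_file is None:
        raise ValueError("PretrainedEmbeddingFile is not set in config.ini")
    return not embedding_file.endswith(".txt")
```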

about the dataset

Hi, can you share the test set and validation set for head prediction on FB15k-237-OWE, as reported in Table 2 of your paper? In addition, can you provide the OpenKE-format dataset for FB20k? I can't find it. Thanks so much!

About <PATH_TO_KGC_EMBEDDINGS>

Hello,
In python owe/run_open_world.py -t -c <PATH_TO_DATASET> -d <PATH_TO_CONFIG_AND_OUTPUT_DIR> --<KGC_MODEL_NAME> <PATH_TO_KGC_EMBEDDINGS>, what does <PATH_TO_KGC_EMBEDDINGS> refer to?
What kind of file is it, and how should it be obtained?

Some help reproducing results of FB20k & DBPedia50k

Hi,
I'm able to reproduce the FB15k-237-OWE results reported in the paper. I'm unable to reproduce FB20k and DBpedia50k results.

  1. Can you tell me which splits you used? I'm using the splits of DKRL and ConMask respectively. For FB20k, do you use the same descriptions as DKRL or the shorter Wikidata descriptions?
  2. Do I need to change any of the hyperparameters? In the paper, only dropout is mentioned.
  3. The descriptions in FB20k and DBpedia50k are quite long. Do you still just use an average encoder? Is there any filtering before averaging?

unsupported operand type

Hi there,
I found that when I load pretrained embeddings from a ComplEx model, there is an error: "unsupported operand type(s) for /: 'str' and 'str'".
The corresponding code is entity_r_file = emb_dir / "entities_r.p".
Hope you can help me.
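The `/` operator only works when the left operand is a `pathlib.Path`; if `emb_dir` arrives as a plain string (for example, straight from the config file), `str / str` fails exactly like this. A minimal sketch of the likely fix:

```python
from pathlib import Path


def entity_r_path(emb_dir):
    """Join the embeddings directory with the pickle filename.

    Wrapping emb_dir in Path() makes the `/` operator valid even when
    the caller passes a plain string instead of a Path object.
    """
    return Path(emb_dir) / "entities_r.p"
```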

Question about sampling strategy for FB15k-237-OWE

Hi Haseeb,

I have a question regarding your sampling strategy to produce FB15k-237-OWE.

Each picked head x is removed from the training graph by moving all triples of the form (x, ?, t) to the test set and dropping all triples of the form (?, ?, x) if t still remains in the training set after these operations.

Regarding "dropping all triples of the form (?, ?, x) if t still remains": what does t refer to here? It seems that (?, ?, x) does not contain t.
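One plausible reading of the quoted description (a sketch of my interpretation, not the authors' script): triples with x as head move to the test set, triples with x as tail are dropped, and a moved triple is kept only if its tail t still occurs in the remaining training graph, so that tails stay closed-world:

```python
def split_open_world(triples, picked_heads):
    """Sketch of one reading of the FB15k-237-OWE sampling step.

    For each picked head x: (x, r, t) triples move to the test set and
    (h, r, x) triples are dropped; a moved triple survives only if its
    tail t still appears in the remaining training graph.
    """
    picked = set(picked_heads)
    # Remove every triple that mentions a picked entity from training.
    train = [(h, r, t) for (h, r, t) in triples
             if h not in picked and t not in picked]
    train_entities = {e for (h, _, t) in train for e in (h, t)}
    # Keep only test triples whose tail is still closed-world.
    test = [(h, r, t) for (h, r, t) in triples
            if h in picked and t in train_entities]
    return train, test
```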

How to deal with IndexError: list index out of range

This is what I am getting as error:

Traceback (most recent call last):
File "run_closed_world.py", line 115, in
main()
File "run_closed_world.py", line 100, in main
model.init_embeddings(dataset, model_dir)
File "/......./OWE-master/owe/models/closed_world/transe.py", line 56, in init_embeddings
for l in entity2id_file.open("rt")}
File "/....../OWE-master/owe/models/closed_world/transe.py", line 56, in
for l in entity2id_file.open("rt")}
IndexError: list index out of range
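The failing line is a dict comprehension that splits each line of the entity2id file and indexes the resulting fields. Assuming an OpenKE-style entity2id file (which typically begins with a count line followed by "<entity>\t<id>" rows), a count line, blank line, or malformed row would make a naive `line.split()[1]` raise exactly this IndexError. A defensive parsing sketch:

```python
def parse_entity2id(lines):
    """Parse OpenKE-style entity2id lines defensively.

    Rows are expected as "<entity>\t<id>"; the leading count line,
    blank lines, and malformed rows are skipped instead of raising
    IndexError on a missing field.
    """
    mapping = {}
    for line in lines:
        parts = line.strip().split()
        if len(parts) != 2:
            continue  # skip the count line, blanks, malformed rows
        entity, idx = parts
        mapping[entity] = int(idx)
    return mapping
```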

entity2wikidata.json file queries

Hi,
I was wondering about the entity2wikidata.json file. If I were to use Wikidata as the knowledge graph, what would this file look like? Also, are there any scripts that can be used to generate it?
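For illustration only, here is a guess at the file's shape. The field names (`label`, `description`, `wikidata_id`) and keying are assumptions inferred from the file name and should be checked against the file shipped with FB15k-237-OWE; for a Wikidata graph, the QID itself could plausibly serve as the key:

```python
import json


def make_entity2wikidata(entities):
    """Build a mapping in the (assumed) entity2wikidata.json shape.

    `entities` is an iterable of (entity_id, label, description)
    tuples; all field names here are hypothetical.
    """
    return {
        eid: {"label": label, "description": desc, "wikidata_id": eid}
        for eid, label, desc in entities
    }


# Serialize the mapping the way a dataset file would be written:
payload = json.dumps(
    make_entity2wikidata([("Q937", "Albert Einstein",
                           "German-born theoretical physicist")]),
    indent=2,
)
```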

Dataset link broken

It seems that the links to download the datasets and the pre-trained embeddings are broken; can you fix them? Thank you.

Predicting tails

The returned matrix for tail prediction has shape [1, E, D], where E is the number of distinct entities in the training set and D is the number of dimensions. Does that mean the model expects known entities as tails in the test set?
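As I read it, yes: scoring against a fixed E x D table of known-entity embeddings means candidate tails are ranked only over those E training entities. A pure-Python sketch of what such scoring implies (illustrative dot-product ranking, not the repository's exact scoring function):

```python
def rank_known_tails(predicted, entity_matrix):
    """Rank known entities as tail candidates for one test triple.

    `entity_matrix` is the E x D table of known-entity embeddings (the
    [1, E, D] output with the batch axis dropped). Scoring a predicted
    tail embedding against it means every candidate must be one of
    those E known entities; returns entity indices, best first.
    """
    scores = [sum(p * e for p, e in zip(predicted, row))
              for row in entity_matrix]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```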

can you give me an example?

Thank you for your reply.

I am running your code, but it is not going well.
I downloaded the dataset; could you give me an example?
For example:
Training:
python3 owe/run_open_world.py -t -c ./data/FB15k-237-OWE -d ./ --complex ./data/FB15k_embedding

evaluate:
python3 owe/run_open_world.py -e -lb -c ./data/preprocessed -d ./ --complex ./data/FB15k_embedding

Miao

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte

Hi haseebs,
I downloaded enwiki_20180420_300d.pkl.bz2 from Wikipedia2Vec. Should PretrainedEmbeddingFile = ./data/enwiki_20180420_300d.pkl.bz2 or = ./data/enwiki_20180420_300d.pkl? When running the code, I encountered the following problem. How can I solve it?

(venv) gzdx@2080:/home/OLD/rhl/OWE$ python owe/run_open_world.py -t -c ./data/FB15k-237-zeroshot -d /home/OLD/rhl/OWE/ --complex ./embeddings/
19:20:05: INFO: Git Hash: 7735eb5

19:20:05: INFO: Reading config from: /home/OLD/rhl/OWE/config.ini
19:20:05: INFO: Using entity2wikidata.json as wikidata file
data/FB15k-237-zeroshot/train.txt
data/FB15k-237-zeroshot/valid_zero.txt
data/FB15k-237-zeroshot/train.txt
data/FB15k-237-zeroshot/valid_zero.txt
19:20:05: INFO: 12324 distinct entities in train having 235 relations (242489 triples).
19:20:05: INFO: 6038 distinct entities in validation having 220 relations (9424 triples).
19:20:05: INFO: 8897 distinct entities in test having 224 relations (22393 triples).
19:20:05: INFO: Working with: 14405 distinct entities having 235 relations.
19:20:05: INFO: Converting entities...
19:20:06: INFO: Building Vocab...
19:20:06: INFO: Building triples...
19:20:17: INFO: Loading word vectors from: ./data/enwiki_20180420_300d.pkl.bz2...
Traceback (most recent call last):
File "owe/run_open_world.py", line 164, in
main()
File "owe/run_open_world.py", line 99, in main
word_vectors = data.load_embedding_file(Config.get('PretrainedEmbeddingFile'))
File "/home/OLD/rhl/OWE/owe/data.py", line 67, in load_embedding_file
return KeyedVectors.load_word2vec_format(embedding_file, binary=not embedding_file.endswith(".txt"))
File "/home/OLD/rhl/venv/lib/python3.8/site-packages/gensim/models/keyedvectors.py", line 1547, in load_word2vec_format
return _load_word2vec_format(
File "/home/OLD/rhl/venv/lib/python3.8/site-packages/gensim/models/utils_any2vec.py", line 276, in _load_word2vec_format
header = utils.to_unicode(fin.readline(), encoding=encoding)
File "/home/OLD/rhl/venv/lib/python3.8/site-packages/gensim/utils.py", line 368, in any2unicode
return unicode(text, encoding, errors=errors)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte
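The traceback shows gensim's `load_word2vec_format` trying to decode the file's header line as UTF-8 text. The Wikipedia2Vec `.pkl`/`.pkl.bz2` dumps are pickles, not word2vec-format files, so the first bytes are not valid UTF-8 and decoding fails; a word2vec-format text dump (Wikipedia2Vec also publishes `.txt.bz2` files) seems more likely to be what this loader expects, though that is my assumption. A small guard sketch, with the accepted suffixes also an assumption:

```python
def looks_like_word2vec_file(path):
    """Heuristic check before handing a file to load_word2vec_format.

    gensim's load_word2vec_format expects word2vec text (.txt) or
    binary (.bin) files; a Wikipedia2Vec pickle (.pkl / .pkl.bz2) is a
    different serialization, and feeding it in triggers exactly this
    UnicodeDecodeError on the header line. Accepted suffixes here are
    an assumption, not the repository's rule.
    """
    return path.endswith((".txt", ".bin"))
```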
