complex's People

Contributors

ttrouill

complex's Issues

Writing out the scores

Hi, I must admit that I have been finding the code too confusing to solve this myself.

How would I go about writing out the models' predicted scores to a file for further use? I need the actual predicted triples/heads/tails for a given test triple, along with their scores (from which the ranks can be derived).

Thank you!
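
A hedged sketch of one way to do this, not the repo's documented API: models.py compiles self.pred_func_compiled = theano.function(self.get_pred_symb_vars(), self.pred_func), so assuming that compiled function maps (head, relation, tail) index arrays to one score per triple, the scores can be dumped to a TSV and the ranks derived offline by sorting:

```python
import numpy as np

# Hedged sketch, not the repo's documented API: assume `model` exposes the
# compiled Theano prediction function built in models.py and that it maps
# (head_idx, rel_idx, tail_idx) arrays to one score per triple.
def dump_scores(model, triples, path="scores.tsv"):
    """Write one line per triple: head, relation, tail, score."""
    triples = np.asarray(triples, dtype=np.int64)
    h, r, t = triples[:, 0], triples[:, 1], triples[:, 2]
    scores = model.pred_func_compiled(h, r, t)  # assumed call signature
    with open(path, "w") as f:
        for (hi, ri, ti), s in zip(triples, scores):
            f.write("%d\t%d\t%d\t%.6f\n" % (hi, ri, ti, s))
```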

theano.function(self.get_pred_symb_vars(), self.pred_func)

Hi, thanks for sharing this repo. I am new to Theano. When I run the command python fb15k_run.py from your Readme, I get an error. Can you offer any advice on how to solve it? Thanks!

(env_complex) gyhu@mic119:/DATA/119/gyhu/code/complex$ python fb15k_run.py
WARNING (theano.configdefaults): install mkl with `conda install mkl-service`: No module named 'mkl'
WARNING (theano.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
2019-09-10 16:23:59,712 (EFE) [INFO] Nb entities: 14951
2019-09-10 16:23:59,713 (EFE) [INFO] Nb relations: 1345
2019-09-10 16:23:59,713 (EFE) [INFO] Nb obs triples: 483142
2019-09-10 16:24:07,993 (EFE) [INFO] Learning rate: 0.5
2019-09-10 16:24:07,994 (EFE) [INFO] Max iter: 1000
2019-09-10 16:24:07,994 (EFE) [INFO] Generated negatives ratio: 10
2019-09-10 16:24:07,994 (EFE) [INFO] Batch size: 4831
2019-09-10 16:24:07,994 (EFE) [INFO] Starting grid search on: Complex_Logistic_Model

You can find the C code in this temporary file: /tmp/theano_compilation_error_8ipm7__4
Traceback (most recent call last):
File "fb15k_run.py", line 39, in
fb15kexp.grid_search_on_all_models(all_params, embedding_size_grid = [emb_size], lmbda_grid = [lmbda], nb_runs = 1)
File "/DATA/119/gyhu/code/complex/efe/experiment.py", line 73, in grid_search_on_all_models
self.run_model(model_s,cur_params)
File "/DATA/119/gyhu/code/complex/efe/experiment.py", line 94, in run_model
model.fit(self.train, self.valid, Parameters(**vars(params)), self.n_entities, self.n_relations, self.n_entities, self.scorer)
File "/DATA/119/gyhu/code/complex/efe/models.py", line 152, in fit
self.setup_params_for_train(train_triples, valid_triples, hparams)
File "/DATA/119/gyhu/code/complex/efe/models.py", line 132, in setup_params_for_train
self.pred_func_compiled = theano.function(self.get_pred_symb_vars(), self.pred_func)
File "/DATA/119/gyhu/soft/Anaconda3/envs/env_complex/lib/python3.6/site-packages/theano/compile/function .py", line 317, in function
output_keys=output_keys)
File "/DATA/119/gyhu/soft/Anaconda3/envs/env_complex/lib/python3.6/site-packages/theano/compile/pfunc.py ", line 486, in pfunc
output_keys=output_keys)
File "/DATA/119/gyhu/soft/Anaconda3/envs/env_complex/lib/python3.6/site-packages/theano/compile/function _module.py", line 1841, in orig_function
fn = m.create(defaults)
File "/DATA/119/gyhu/soft/Anaconda3/envs/env_complex/lib/python3.6/site-packages/theano/compile/function _module.py", line 1715, in create
input_storage=input_storage_lists, storage_map=storage_map)
File "/DATA/119/gyhu/soft/Anaconda3/envs/env_complex/lib/python3.6/site-packages/theano/gof/link.py", li ne 699, in make_thunk
storage_map=storage_map)[:3]
File "/DATA/119/gyhu/soft/Anaconda3/envs/env_complex/lib/python3.6/site-packages/theano/gof/vm.py", line 1091, in make_all
impl=impl))
File "/DATA/119/gyhu/soft/Anaconda3/envs/env_complex/lib/python3.6/site-packages/theano/gof/op.py", line 955, in make_thunk
no_recycling)
File "/DATA/119/gyhu/soft/Anaconda3/envs/env_complex/lib/python3.6/site-packages/theano/gof/op.py", line 858, in make_c_thunk
output_storage=node_output_storage)
File "/DATA/119/gyhu/soft/Anaconda3/envs/env_complex/lib/python3.6/site-packages/theano/gof/cc.py", line 1217, in make_thunk
keep_lock=keep_lock)
File "/DATA/119/gyhu/soft/Anaconda3/envs/env_complex/lib/python3.6/site-packages/theano/gof/cc.py", line 1157, in compile
keep_lock=keep_lock)
File "/DATA/119/gyhu/soft/Anaconda3/envs/env_complex/lib/python3.6/site-packages/theano/gof/cc.py", line 1624, in cthunk_factory
key=key, lnk=self, keep_lock=keep_lock)
File "/DATA/119/gyhu/soft/Anaconda3/envs/env_complex/lib/python3.6/site-packages/theano/gof/cmodule.py", line 1189, in module_from_key
module = lnk.compile_cmodule(location)
File "/DATA/119/gyhu/soft/Anaconda3/envs/env_complex/lib/python3.6/site-packages/theano/gof/cc.py", line 1527, in compile_cmodule
preargs=preargs)
File "/DATA/119/gyhu/soft/Anaconda3/envs/env_complex/lib/python3.6/site-packages/theano/gof/cmodule.py", line 2396, in compile_str
(status, compile_stderr.replace('\n', '. ')))
Exception: ('The following error happened while compiling the node', AdvancedSubtensor1(e1, tubes), '\n', "Compilation failed (return status=1): /tmp/ccpoOASs.s: Assembler messages:. /tmp/ccpoOASs.s:1675: Error: no such instruction: `vinserti128 $0x1,%xmm0,%ymm1,%ymm0'. /tmp/ccpoOASs.s:1680: Error: no such instruction: `vextracti128 $0x1,%ymm0,16(%r12)'. ", '[AdvancedSubtensor1(e1, tubes)]')

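Not an official fix, but a hedged note: the assembler messages suggest that g++ emitted AVX2 instructions (vinserti128 / vextracti128) that the system's older binutils cannot assemble. Telling Theano's compiler to target a pre-AVX2 instruction set may avoid them; the flag value below is an example assumption, not a verified fix for this machine:

```python
import os

# Hedged workaround sketch: target a pre-AVX2 ISA so g++ stops emitting
# vinserti128/vextracti128. '-march=core2' is an example value, not a
# verified fix. THEANO_FLAGS must be set before the first theano import.
os.environ["THEANO_FLAGS"] = "gcc.cxxflags=-march=core2"

import theano  # noqa: E402  (import after setting the flags)
```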

Print scores at index

Hello,
First off: great module! I just implemented it and it works like a charm with my own data.
Now I want to do some error analysis by printing out the top 100 scores of certain relationship types.
Is there any built-in way to do that?

Thanks,
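
There doesn't appear to be a built-in for this, but a hedged numpy sketch of the post-processing (all data below is stand-in; real scores would come from the model's prediction function):

```python
import numpy as np

# Stand-in candidates and scores for one relation type (hypothetical).
rng = np.random.default_rng(0)
n_candidates = 5000
heads = rng.integers(0, 1000, size=n_candidates)
tails = rng.integers(0, 1000, size=n_candidates)
scores = rng.normal(size=n_candidates)      # stand-in for model scores

top = np.argsort(-scores)[:100]             # indices of the 100 highest
for i in top[:5]:                           # show the first few
    print(heads[i], tails[i], scores[i])
```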

ConceptNet

Hi Theo,
Sorry for contacting you through here, but the email address you've given doesn't work. Basically, I'm a knowledge graph embedding hobbyist and I'm trying to experiment with your ComplEx code on a 'real' dataset based on ConceptNet instead of FB15k. It's much larger than FB15k, with 1,345,609 entities and 1,995,411 triples, but far fewer relation types: only 36.
ComplEx doesn't seem to be as effective on this dataset. There is some improvement, but it's very slow. Could you please offer some insight into how I could improve the results? My guess is that, because there are many entities and not so many triples, the algorithm doesn't have enough triples to learn from. Perhaps increasing the negative-triple ratio would help.
But training is so slow on my computer (3 days for 50 iterations) that I'd like to hear your input before spending another 3 days.

Thank you, and keep up the great work!

a beginner's question

Dear author, I'm a Chinese student. After training on my data, I obtain a .mat model: an entities_num x dimension matrix. But I don't know how the rows correspond to the entities. I only started learning Python recently, and I love reading papers; this was my first time running an experiment. Sorry to bother you with such a simple question.
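
A hedged pointer rather than a definitive answer: row i of the saved matrix should be the embedding of the entity that the data loader assigned integer id i while indexing the triples, so rebuilding that string-to-id mapping lets you look rows up by name. A sketch with hypothetical file names and .mat keys (first-seen id order is an assumption about the loader, not a verified fact):

```python
from scipy.io import loadmat

# Hypothetical names throughout: rebuild the entity -> id mapping by
# replaying the training file, assuming ids were assigned in first-seen
# order, then index the saved matrix with it.
entity2id = {}
with open("train.txt") as f:
    for line in f:
        subj, _, obj = line.strip().split("\t")
        for e in (subj, obj):
            entity2id.setdefault(e, len(entity2id))

emb = loadmat("model.mat")["e"]          # "e" is a guess at the key name
print(emb[entity2id["some_entity"]])     # embedding row for one entity
```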

Loss Function of Trans_L1_Model

Hello,

I might be wrong, but is the self.loss function of the Trans_L1_Model correctly implemented in line 453 of model.py, especially the reshape() part?

I am doubting it because, let's say I have batch_size = 2 and neg_ratio = 4; the portion of the code just before reshape will give a matrix as in figure (a) of the attached Picture1.pdf, where c1, c3, c5, c7 represent corruptions of the same true triple and c2, c4, c6, c8 represent corruptions of another triple.

When we reshape it by calling reshape((int(batch_size), int(neg_ratio))) as in the current code, we get something as in figure (b). If this is followed by a sum along dimension 1, then the data of two different triples is added together, as in figure (c): the first column has sum = c1+c2+c3+c4, where c1, c3 belong to one triple but are added to c2, c4, which belong to another.

On the other hand, if we implement it as reshape((neg_ratio, batch_size)) followed by a sum along dimension 0, then it adds together only one triple's data; see the check below.
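
A tiny numpy check of the two orderings, using the interleaved layout described above (values made up so the per-triple sums are easy to see):

```python
import numpy as np

# batch_size = 2, neg_ratio = 4; negatives interleaved per true triple:
# c1, c3, c5, c7 corrupt triple 1 and c2, c4, c6, c8 corrupt triple 2.
c = np.array([1, 10, 2, 20, 3, 30, 4, 40])  # triple 1 -> 1,2,3,4; triple 2 -> 10,20,30,40
batch_size, neg_ratio = 2, 4

print(c.reshape(batch_size, neg_ratio).sum(axis=1))  # [33 77]  -> mixes the triples
print(c.reshape(neg_ratio, batch_size).sum(axis=0))  # [10 100] -> per-triple sums
```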

please correct me if my understanding of this code is wrong.

Thanks in advance
Navdeep

Regarding implementing constraints of TransR inside your code

Hello Theo,

I am trying to implement the TransR model using your code. As you know, TransR has the objective function ||hMr + r - tMr||_{2} for a given relation r(h,t), and it requires the constraints ||hMr||_{2} <= 1, ||tMr||_{2} <= 1 and ||r||_{2} <= 1 on the objective function.
If I look at other TransR implementations, for example here (https://github.com/thunlp/OpenKE/blob/master/models/TransR.py), they normalize hMr/||hMr||_{2}, tMr/||tMr||_{2} and r/||r||_{2} to implement the above constraints on the objective function before visiting each mini-batch.

Could you please tell me how I can implement the above constraints as hMr/||hMr||_{2}, tMr/||tMr||_{2} and r/||r||_{2} inside your code? I am getting confused because, in your code, h and Mr are stored separately, and I can only implement an objective function self.loss with h and Mr appearing separately in it.

(a) Here is one solution according to me (a rough sketch follows below): I implement a class TransR_Batch_Loader(Batch_Loader) inside batching.py which computes h = h/sqrt(L2_norm(hMr)) and Mr = Mr/sqrt(L2_norm(hMr)) separately inside the call() function and saves these parameters back into the model (like in line 81 of batching.py in your code). I can do a similar thing for ||tMr||_{2}, and with ||r||_{2} the constraint is straightforward. Afterwards, when the batch indices of h and Mr are passed separately to the optimization algorithm, it is optimizing hMr = h/sqrt(L2_norm(hMr)) * Mr/sqrt(L2_norm(hMr)) = hMr/L2_norm(hMr) in self.loss of Model. Can you please tell me if this is the right way of projecting hMr, tMr and r onto the unit L2-ball before visiting each mini-batch?
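
A rough numpy sketch of the projection idea in (a), with hypothetical names and shapes (my sketch of the technique, not code from this repo):

```python
import numpy as np

def project_rows_to_unit_ball(X):
    """Rescale each row of X so that its L2 norm is at most 1."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.maximum(norms, 1.0)       # shrink only rows with norm > 1

# Toy batch (hypothetical shapes): n triples, entity dim d, relation dim k.
rng = np.random.default_rng(0)
n, d, k = 5, 8, 4
h, t = rng.normal(size=(n, d)), rng.normal(size=(n, d))
Mr, r = rng.normal(size=(d, k)), rng.normal(size=(n, k))

h_proj = project_rows_to_unit_ball(h @ Mr)  # enforces ||h Mr||_{2} <= 1
t_proj = project_rows_to_unit_ball(t @ Mr)  # enforces ||t Mr||_{2} <= 1
r_unit = project_rows_to_unit_ball(r)       # enforces ||r||_{2} <= 1
loss = np.linalg.norm(h_proj + r_unit - t_proj, axis=1).sum()
```

Projecting the products h @ Mr directly sidesteps the question of how to split the rescaling between h and Mr; whether that split is needed for the stored parameters is exactly the question in (a).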

(b) Can you please suggest any other way of putting the above constraints in TransR model?

Best
Navdeep

Run question.

Hello,

I have done all the steps in your readme file.

However, I have encountered a problem, as follows:

[screenshot of the error omitted]

Could you help me solve this problem?

Thank you.

James.

Just replace the tail entity when testing.

Hello,

Thanks for your great work.

I have a question about how to change the corrupted-triple generation when testing.

I just want to replace the tail entities, not the head entities.

Thanks!
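
Not a built-in option I can point to, but a hedged sketch of what tail-only evaluation usually looks like, assuming a scoring function over index arrays (all names here are hypothetical):

```python
import numpy as np

# Hedged sketch (hypothetical names): rank one test triple by corrupting
# only the tail, i.e. score (h, r, t') for every candidate tail t'.
def tail_rank(score_fn, h, r, t, n_entities):
    cands = np.arange(n_entities)
    scores = score_fn(np.full(n_entities, h), np.full(n_entities, r), cands)
    return int((scores > scores[t]).sum()) + 1  # raw (unfiltered) rank

# Toy example with a lookup-table "model" over 5 entities and 2 relations:
rng = np.random.default_rng(0)
table = rng.normal(size=(5, 2, 5))              # table[h, r, t] is the score
print(tail_rank(lambda h, r, t: table[h, r, t], h=0, r=1, t=3, n_entities=5))
```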

Question RE: testing protocol

Hello Théo,

Thanks a lot for this piece of code. I've been trying to understand the testing protocol from the code, but unfortunately I couldn't.

I wonder what positive-to-negative ratio is used in testing, and where to find that in the code.

Thanks a lot

Data format

Your data is in three files: train.txt, valid.txt and test.txt. Each line is a triple, but not in a format like subject_entity_id relation_id object_entity_id. How does your program work with this data, as stored in the fb15k.zip and wn18.zip files?
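
For what it's worth, here is a hedged sketch of how string triples are typically turned into integer ids before training; the first-seen id assignment is an assumption about the loader, not necessarily what this repo does:

```python
# Hedged sketch: FB15k-style files are tab-separated
# "subject<TAB>relation<TAB>object" strings; integer ids are assigned
# internally. First-seen order below is an assumption, not verified.
entity2id, relation2id, triples = {}, {}, []
with open("train.txt") as f:
    for line in f:
        s, r, o = line.strip().split("\t")
        for e in (s, o):
            entity2id.setdefault(e, len(entity2id))
        relation2id.setdefault(r, len(relation2id))
        triples.append((entity2id[s], relation2id[r], entity2id[o]))

print(len(entity2id), "entities,", len(relation2id), "relations,",
      len(triples), "triples")
```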

Downhill version 0.3.2 not working anymore

As downhill v0.3.2 depends on the package climate, which does not seem to exist anymore, the repo cannot be installed anymore.

I have just checked v0.4.0 of downhill, and it seems to have mainly changed the logging part, i.e., removed the dependency on climate.

Therefore, commenting out the 4th line in *_run.py (downhill.base.logging.setLevel(20)) seems to do the trick and resolves the issue reported in #1.

Runtime requirement

Hello Théo,

firstly, I would like to genuinely thank you for the code and repo, which are well prepared and straightforward. However, the following situation confuses me.
Given the following settings (nb entities: 521,358, relations: 70, obs triples: 2,395,882):
Learning rate: 0.5
Max iter: 100
Generated negatives ratio: 10
Batch size: 23958
Starting grid search on: Complex_Logistic_Model
train: 100 mini-batches from callable
downhill: compiling evaluation function
downhill: compiling ADAGRAD optimizer
downhill: setting: rms_regularizer = 1e-08
downhill: setting: patience = 9999999
downhill: setting: validate_every = 9999999
downhill: setting: min_improvement = 0
downhill: setting: max_gradient_norm = 1
downhill: setting: max_gradient_elem = 0
downhill: setting: learning_rate = TensorConstant{0.5}
downhill: setting: momentum = 0
downhill: setting: nesterov = False

With these settings, it takes hours to train the models. Does this stem from the implementation, or am I lacking some theoretical background here?

Cheers

About Hermitian product

Dear author,
Hi, I'm a student from South Korea.
I have a question about the Hermitian product used in ComplEx's score function.
I would be glad if you could explain how equation (11) was derived.

In detail, let's assume we have entity A's embedding vector of dimension 100 (tensor shape (1, 100)).
To treat this embedding vector as a complex vector,
ComplEx splits this 100-dimensional embedding into two pieces, Re and Im.
(The real and imaginary parts would each have 50 dimensions in this case.)

Let's assume we have another entity B's embedding too.
Then the similarity (= inner product) of these two entities can be computed with the Hermitian product.
(Let's say <...> is a sum of element-wise multiplications.)

Similarity(A,B) = <Re(A), Re(B)> + <-Im(A), Im(B)>

This is what I understood (am I right?), and here is the question:
ComplEx's score function calculates the score with 3 vectors (2 entities and 1 relation)...
How can I do this? I mean, how did you make equation (11)?

Uhh, let's say the relation between A and B is R.
The paper says we can calculate the three vectors' Hermitian product just as equation (11) does:

<Re(R), Re(A), Re(B)> + <Re(R), Im(A), Im(B)> + <Im(R), Re(A), Im(B)> - <Im(R), Im(A), Re(B)>

How did you make this equation?

I'd be really glad if you could tell me how.
Thank you!
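
For readers stuck on the same point: equation (11) is the real part of the componentwise triple product written out, and a short derivation in ordinary complex arithmetic recovers it (this is my reading of the paper's notation, with w the relation embedding and s, o the subject and object embeddings):

```latex
% One component: w = a + ib, s = c + id, o = e + if, so \bar{o} = e - if.
\operatorname{Re}\bigl[(a+ib)(c+id)(e-if)\bigr]
  = \operatorname{Re}\bigl[((ac-bd) + i(ad+bc))(e-if)\bigr]
  = ace + adf + bcf - bde

% Summing over components and grouping the four products gives eq. (11):
\operatorname{Re}\langle w, s, \bar{o}\rangle
  = \langle \mathrm{Re}(w), \mathrm{Re}(s), \mathrm{Re}(o)\rangle
  + \langle \mathrm{Re}(w), \mathrm{Im}(s), \mathrm{Im}(o)\rangle
  + \langle \mathrm{Im}(w), \mathrm{Re}(s), \mathrm{Im}(o)\rangle
  - \langle \mathrm{Im}(w), \mathrm{Im}(s), \mathrm{Re}(o)\rangle
```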

Question about concatenation in extract_sub_scores function.

Hi Théo,

I wonder if this line

res = Result(res.preds[idxs], res.true_vals[idxs], res.ranks[np.concatenate((idxs,idxs))], res.raw_ranks[np.concatenate((idxs,idxs))])
should be "res = Result(res.preds[idxs], res.true_vals[idxs], res.ranks[np.concatenate((idxs, len(idxs)+idxs))], res.raw_ranks[np.concatenate((idxs, len(idxs)+idxs))])". That is, add len(idxs) to the concatenation.

Thanks a lot.
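
Not a confirmed answer, but a toy numpy illustration of the concern, assuming res.ranks stores head ranks for all n test triples followed by tail ranks (the layout implied by the suggestion above):

```python
import numpy as np

# Hypothetical layout: head ranks for all n triples, then tail ranks (n = 3).
ranks = np.array([10, 11, 12,   # head ranks h0, h1, h2
                  20, 21, 22])  # tail ranks t0, t1, t2
idxs = np.array([0, 2])
n = len(ranks) // 2

print(ranks[np.concatenate((idxs, idxs))])      # [10 12 10 12] -> heads twice
print(ranks[np.concatenate((idxs, n + idxs))])  # [10 12 20 22] -> heads + tails
```

If that layout is right, the offset should be the total number of test triples n, which equals len(idxs) only when idxs selects every triple.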

How to run Complex on GPU colab?

Hello,

I am currently trying to run Complex on Google Colab Pro using my data.
I have specified the requirements for GPU on Colab and installed all packages in requirements.txt
The algorithm works fine but uses the CPU instead of the Colab GPU. As a result, I have to wait a long time for my output.

Can you please specify what is wrong when using the command THEANO_FLAGS="device=gpu" & python fb15k_run.py?
I also noticed that it doesn't use all 15 cores on my Linux system. How can I make sure it is using all the cores?
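
Two hedged observations, not verified on Colab. First, in the command above the stray & makes the variable assignment and python run as separate shell commands, so THEANO_FLAGS may never reach the script; writing it as a one-line prefix (THEANO_FLAGS="device=gpu" python fb15k_run.py, no &) passes it through. Second, the same flags can be set from Python before Theano is imported; the device name and thread count below are assumptions that depend on the Theano build:

```python
import os

# Hedged sketch: Theano reads THEANO_FLAGS only at import time.
# 'device=cuda' is what newer Theano/libgpuarray builds expect; older
# builds use 'device=gpu' (assumption: check your Theano version).
os.environ["THEANO_FLAGS"] = "device=cuda,floatX=float32"
os.environ["OMP_NUM_THREADS"] = "15"  # let the BLAS backend use all 15 cores

import theano  # noqa: E402  (must come after the flags are set)
```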
