malllabiisc / reside

EMNLP 2018: RESIDE: Improving Distantly-Supervised Neural Relation Extraction using Side Information

License: Apache License 2.0

relation-extraction deep-learning graph-convolutional-networks natural-language-processing distant-supervision neural-relation-extraction


RESIDE

Improving Distantly-Supervised Neural Relation Extraction using Side Information

Overview of RESIDE (architecture figure; see the paper).

RESIDE first encodes each sentence in the bag by concatenating embeddings (denoted by ⊕) from a Bi-GRU and a Syntactic GCN for each token, followed by word attention. The resulting sentence embedding is then concatenated with relation alias information, which comes from the Side Information Acquisition section, before attention is computed over the sentences. Finally, the bag representation, concatenated with entity type information, is fed to a softmax classifier. Please refer to the paper for more details.
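
For intuition, here is a toy NumPy sketch of the aggregation pipeline described above. The shapes and the random stand-in "encoder outputs" are illustrative assumptions (dimensions follow the default configuration), not the repository's actual TensorFlow implementation:

    import numpy as np

    rng = np.random.default_rng(0)

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    n_sents, n_tokens = 3, 10                 # one bag of 3 toy sentences
    gru_dim, gcn_dim = 192, 16                # lstm_dim / de_gcn_dim defaults
    alias_dim, type_dim, n_rels = 32, 50, 53  # n_rels = assumed relation count

    # 1. Per-token encoding: Bi-GRU output ⊕ Syntactic GCN output (stand-ins here).
    tok = np.concatenate([rng.normal(size=(n_sents, n_tokens, gru_dim)),
                          rng.normal(size=(n_sents, n_tokens, gcn_dim))], axis=-1)

    # 2. Word attention pools tokens into one embedding per sentence.
    w_attn = softmax(tok @ rng.normal(size=tok.shape[-1]))      # (n_sents, n_tokens)
    sent = (w_attn[..., None] * tok).sum(axis=1)

    # 3. Concatenate relation alias side information per sentence.
    sent = np.concatenate([sent, rng.normal(size=(n_sents, alias_dim))], axis=-1)

    # 4. Sentence attention pools the bag into one representation.
    s_attn = softmax(sent @ rng.normal(size=sent.shape[-1]))    # (n_sents,)
    bag = s_attn @ sent

    # 5. Append entity type information and classify with softmax.
    bag = np.concatenate([bag, rng.normal(size=type_dim)])
    probs = softmax(bag @ rng.normal(size=(bag.size, n_rels)))
    print(probs.shape)                                          # one score per relation (incl. NA)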

The repository also includes implementations of the PCNN, PCNN+ATT, CNN, CNN+ATT, and BGWA models.

Dependencies

  • Compatible with TensorFlow 1.x and Python 3.x.
  • Dependencies can be installed using requirements.txt; for example:
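    pip install -r requirements.txt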

Dataset:

  • We use the Riedel NYT and Google IISc Distant Supervision (GIDS) datasets for evaluation.

  • The datasets in JSON list format, with side information, can be downloaded from here: RiedelNYT and GIDS.

  • The processed versions of the datasets can be downloaded from RiedelNYT and GIDS. The structure of the processed input data is as follows:

    {
        "voc2id":   {"w1": 0, "w2": 1, ...},
        "type2id":  {"type1": 0, "type2": 1, ...},
        "rel2id":   {"NA": 0, "/location/neighborhood/neighborhood_of": 1, ...},
        "max_pos":  123,
        "train": [
            {
                "X":        [[s1_w1, s1_w2, ...], [s2_w1, s2_w2, ...], ...],
                "Y":        [bag_label],
                "Pos1":     [[s1_p1_1, s1_p1_2, ...], [s2_p1_1, s2_p1_2, ...], ...],
                "Pos2":     [[s1_p2_1, s1_p2_2, ...], [s2_p2_1, s2_p2_2, ...], ...],
                "SubPos":   [s1_sub, s2_sub, ...],
                "ObjPos":   [s1_obj, s2_obj, ...],
                "SubType":  [s1_subType, s2_subType, ...],
                "ObjType":  [s1_objType, s2_objType, ...],
                "ProbY":    [[s1_rel_alias1, s1_rel_alias2, ...], [s2_rel_alias1, ...], ...],
                "DepEdges": [[s1_dep_edges], [s2_dep_edges], ...]
            },
            {}, ...
        ],
        "test":  { same as "train" },
        "valid": { same as "train" }
    }
    • voc2id is the mapping of word to its id.
    • type2id is the mapping of entity type to its id.
    • rel2id is the mapping of relation to its id.
    • max_pos is the maximum position to consider for positional embeddings.
    • Each entry of train, test, and valid is a bag of sentences, where
      • X denotes the sentences in the bag as a list of lists of word indices.
      • Y is the relation expressed by the sentences in the bag.
      • Pos1 and Pos2 are the positions of each word in the sentences with respect to target entity 1 and entity 2.
      • SubPos and ObjPos contain the positions of target entity 1 and entity 2 in each sentence.
      • SubType and ObjType contain the type information of target entity 1 and entity 2, obtained from the KG.
      • ProbY is the relation alias side information (refer to the paper) for the bag.
      • DepEdges is the edge list of the dependency parse for each sentence (required for the GCN).
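
As a quick sanity check, the processed pickle can be loaded and inspected as below (a minimal sketch; the field names follow the structure above, and the path matches the training commands later in this README):

    import pickle

    # Load the processed dataset.
    with open('data/riedel_processed.pkl', 'rb') as f:
        data = pickle.load(f)

    id2rel = {v: k for k, v in data['rel2id'].items()}   # invert the relation mapping
    print(len(data['voc2id']), 'words |', len(id2rel), 'relations | max_pos =', data['max_pos'])

    bag = data['train'][0]                               # first bag of sentences
    print('sentences in bag:', len(bag['X']))
    print('bag relation(s):', [id2rel[y] for y in bag['Y']])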

Evaluate pretrained model:

  • reside.py contains the TensorFlow (1.x) implementation of RESIDE (the proposed method).
  • Download the pretrained model's parameters from RiedelNYT and GIDS (put the downloaded folders in the checkpoints directory).
  • Execute evaluate.sh to compare the pretrained RESIDE model against the baselines (plots the Precision-Recall curve).
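
For example, from the repository root (this assumes the pretrained checkpoints from the previous step are in place):

    bash evaluate.sh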

Side Information:

  • Entity type information for both datasets is provided in side_info/type_info.zip.
    • Entity type information can be used directly in the model.
  • Relation alias information for both datasets is provided in side_info/relation_alias.zip.
    • The preprocessing code for using relation alias information is in rel_alias_side_info.py.
    • The figure in the repository summarizes the method; a sketch follows.
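
For intuition, a minimal sketch of how an alias file of the form {relation: [aliases, ...]} could be used to map an extracted relation phrase to candidate relations (the file format and matching rule are assumptions; rel_alias_side_info.py is the actual preprocessing):

    import json

    # Path as it appears in the repository's default configuration.
    alias_file = 'side_info/relation_alias/riedel/relation_alias_from_wikidata_ppdb_extended.json'
    with open(alias_file) as f:
        rel2alias = json.load(f)                  # assumed format: {relation: [alias, ...]}

    # Invert to alias -> relations for lookup.
    alias2rel = {}
    for rel, aliases in rel2alias.items():
        for alias in aliases:
            alias2rel.setdefault(alias.lower(), set()).add(rel)

    def match_relation(phrase):
        """Map a relation phrase (e.g. extracted by OpenIE) to candidate relations."""
        return sorted(alias2rel.get(phrase.lower(), set()))

    print(match_relation('located in'))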

Training from scratch:

  • Execute setup.sh to download the GloVe embeddings.
  • For training RESIDE, run:
    python reside.py -data data/riedel_processed.pkl -name new_run
  • The above model needs to be trained further with the SGD optimizer for a few epochs to match the performance reported in the paper. For that, execute:

    python reside.py -name new_run -restore -opt sgd -lr 0.001 -l2 0.0 -epoch 4
  • Finally, run python plot_pr.py -name new_run to get the plot.

Baselines:

  • The repository also includes code for PCNN, PCNN+ATT, CNN, CNN+ATT, BGWA models.

  • For training PCNN+ATT:

    python pcnnatt.py -data data/riedel_processed.pkl -name new_run -attn # remove -attn for PCNN
  • Similarly for training CNN+ATT:

    python cnnatt.py -data data/riedel_processed.pkl -name new_run -attn # remove -attn for CNN
  • For training BGWA:

    python bgwa.py -data data/riedel_processed.pkl -name new_run

Preprocessing a new dataset:

  • The preproc directory contains code for getting a new dataset into the required format (riedel_processed.pkl) for reside.py.
  • Get the data in the same format as in riedel_raw (for the Riedel NYT dataset) or gids_raw (for GIDS).
  • Finally, run the script preprocess.sh. make_bags.py is used for generating bags from sentences (a sketch of the idea follows), and generate_pickle.py converts the data into the required pickle format.
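
A minimal sketch of the bag-grouping idea (the field names here are assumptions; make_bags.py is the actual implementation):

    from collections import defaultdict

    # Group sentences by their (subject, object) entity pair; each group is one bag.
    def make_bags(examples):
        bags = defaultdict(list)
        for ex in examples:
            bags[(ex['sub_id'], ex['obj_id'])].append(ex['sentence'])
        return dict(bags)

    examples = [
        {'sub_id': 'm.0chrx', 'obj_id': 'm.01_d4', 'sentence': 'A was born in B .'},
        {'sub_id': 'm.0chrx', 'obj_id': 'm.01_d4', 'sentence': 'A grew up in B .'},
    ]
    print(make_bags(examples))   # one bag with two sentences for the entity pair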

Running pretrained model on new samples:

  • The code for running the pretrained model on new samples is included in the online directory.

  • A Flask-based server is also provided. Use python online/server.py to start the server, for example as below.
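
For example, to serve a trained model (the -name flag follows the usage reported in the issues below; per the configuration dumps there, the server appears to listen on port 3535 by default):

    python online/server.py -name new_run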

Citation:

Please cite the following paper if you use this code in your work.

@inproceedings{reside2018,
  author    = "Vashishth, Shikhar and
               Joshi, Rishabh and
               Prayaga, Sai Suman and
               Bhattacharyya, Chiranjib and
               Talukdar, Partha",
  title     = "{RESIDE}: Improving Distantly-Supervised Neural Relation Extraction using Side Information",
  booktitle = "Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing",
  month     = oct # "-" # nov,
  address   = "Brussels, Belgium",
  year      = "2018",
  publisher = "Association for Computational Linguistics",
  pages     = "1257--1266",
  url       = "http://aclweb.org/anthology/D18-1157"
}

For any clarification, comments, or suggestions please create an issue or contact Shikhar.

Contributors

apoorvumang, parthatalukdar, rishabhjoshi, stannislav, svjan5


reside's Issues

question about type mapping

Hi, @svjan5
Thanks for your contributions to this field.
I have seen the FIGER mapping file, and I basically understand how you map the original types to the 88 types.
However, I'm still confused about the details of the mapping process. I noticed there are 565 mappings, like /finance/stock_exchange -> /finance/stock_exchange.
I wonder whether you use all of these in your processing, or something else? Could you tell me the details of the processing?
Thanks a lot.

Best

Urgent help needed with segregation of the code

For my project, I need to segregate the code into functions as described below. The main method should take a txt file as input (the bag of sentences) and produce a txt file containing the entities, correct relations, and predicted relations. I have tried printing the actual and predicted relations in the 'predict' function, but it shows a one-hot representation, I believe. Can you please help me with how to refactor the RESIDE code to match the structure given below?

import abc

class RelationExtraction(abc.ABC):

	def __init__(self):
		pass

	@abc.abstractmethod
	def read_dataset(self, input_file, *args, **kwargs):
		"""
		Reads a dataset to be used for training.

		Note: The child class of each member overrides this function to read
		the dataset according to its data format.

		Args:
			input_file: Filepath with the list of files to be read
		Returns:
			(optional) Data from the file
		"""
		pass

	@abc.abstractmethod
	def data_preprocess(self, input_data, *args, **kwargs):
		"""
		(Optional for members who do not need preprocessing, e.g. .pkl files.)
		A common function for a set of data-cleaning techniques such as
		lemmatization, count vectorization, and so forth.

		Args:
			input_data: Raw data to tokenize
		Returns:
			Formatted data for further use
		"""
		pass

	@abc.abstractmethod
	def tokenize(self, input_data, ngram_size=None, *args, **kwargs):
		"""
		Tokenizes the dataset using Stanford CoreNLP (server/API).

		Args:
			input_data: str or [str]: data to tokenize
			ngram_size: size of the token combinations, defaults to None
		Returns:
			Tokenized version of the data
		"""
		pass

	@abc.abstractmethod
	def train(self, train_data, *args, **kwargs):
		"""
		Trains a model on the given training data.

		Note: The child class of each member overrides this function to train
		according to its algorithm.

		Args:
			train_data: post-processed data to be trained
		Returns:
			(optional) Trained model in an applicable format;
			None if the model is stored internally
		"""
		pass

	@abc.abstractmethod
	def predict(self, test_data, entity_1=None, entity_2=None, trained_model=None, *args, **kwargs):
		"""
		Predicts on the trained model using test data.

		Args:
			entity_1, entity_2: for some models, given the entities, return the most suitable relation
			test_data: data on which to test the model and predict results
			trained_model: the trained model from train(); None if the trained model is stored internally
		Returns:
			probabilities: which relation is more probable given entity_1 and entity_2,
				or
			relation: [tuple], list of tuples, e.g. (Entity 1, Relation, Entity 2), or another format
		"""
		pass

	@abc.abstractmethod
	def evaluate(self, input_data, trained_model=None, *args, **kwargs):
		"""
		Evaluates the result on a benchmark dataset with evaluation metrics
		(precision, recall, F1, or others).

		Args:
			input_data: benchmark/evaluation data
			trained_model: trained model, or None if stored internally
		Returns:
			performance metrics: tuple (p, r, f1) or similar
		"""
		pass

	@abc.abstractmethod
	def save_model(self, file):
		"""
		:param file: Where to save the model - optional function
		:return:
		"""
		pass

	@abc.abstractmethod
	def load_model(self, file):
		"""
		:param file: From where to load the model - optional function
		:return:
		"""
		pass

Wrong link to riedel_raw

Hello,
I wanted to download the two raw datasets, riedel and gids. But it seems that the link to the riedel raw dataset gives the riedel pretrained model instead (i.e., "Datasets in json list format with side information can be downloaded from here: RiedelNYT and GIDS." in the README).

How do you get type_info.json?

Hi, guys:
Thanks for the contributions.
Now I'm a little confused about how you get type_info.json.
Could you please tell me about it?

Thanks a lot.

questions about GCN when training and testing

Each sentence has its own dependency tree, so the dependency trees differ across sentences. I wonder how the parameters of the GCN are shared and learned across different sentences.
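
My current understanding is that the layer weights are structure-independent: each sentence contributes its own adjacency matrix, while the same weight matrix is applied everywhere. A minimal NumPy sketch (illustrative, not the repository's code):

import numpy as np

rng = np.random.default_rng(0)
dim = 16
W = rng.normal(size=(dim, dim))          # shared across ALL sentences

def gcn_layer(A, H):
    """One GCN layer: aggregate neighbours with A, transform with the shared W."""
    return np.maximum(A @ H @ W, 0.0)    # ReLU

# Two sentences with different dependency trees: different A, same W.
for n_tokens in (5, 9):
    A = (rng.random((n_tokens, n_tokens)) < 0.3).astype(float)   # toy adjacency
    H = rng.normal(size=(n_tokens, dim))                         # token features
    print(gcn_layer(A, H).shape)         # per-sentence output, shared parameters

Is this the right picture of how the parameters are shared?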

Question about the PR curves of the GIDS dataset

Hi,
I would like to know whether the PR curves for the GIDS dataset are obtained in the same way as for the NYT dataset, where we only count the non-NA labels.
And how many data points from the test set do you draw in the PR curves?

BTW, I notice that the GIDS dataset has a dev set. I'd like to know whether you use it, or just the train and test sets?

Looking forward to your reply.

Best.

Performance without relation alias

Hi,

I think the second bar in Figure 4 and the last bar in Figure 5 both represent the performance without relation aliases. But one seems to be 0.41, while the other looks close to 0.39. Why is there such a big difference?

Thanks,
Lisheng

Missing cuda support

Hi guys,

I am trying to train a model on the NYT dataset, as in your Readme.md.

But I keep running into errors with CUDA support. I wanted to try running it on a laptop with Ubuntu, but without an NVIDIA graphics card. Can I still train a model on it, or do you think a CPU is too weak/slow?

How can I most easily remove the need for CUDA/GPU support from your code?

Thanks a lot, Gerhard

Flask server bad file descriptor

Hello, I receive the following error when running the Flask-based server; any suggestions on how to resolve this?

Traceback (most recent call last):
  File "online/server.py", line 70, in <module>
    model = RESIDE(args)
  File "/Users/salahazekour/Documents/Projects/RESIDE-master/online/base_model.py", line 218, in __init__
    nn_out, self.accuracy = self.add_model()
  File "/Users/salahazekour/Documents/Projects/RESIDE-master/online/online_reside.py", line 260, in add_model
    embed_init = getEmbeddings(model, self.wrd_list, self.p.embed_dim)
  File "./helper.py", line 30, in getEmbeddings
    for line in open(embed_loc):
OSError: [Errno 9] Bad file descriptor

error when running online/server.py

When I ran python online/server.py, an error occurred: FileNotFoundError: [Errno 2] No such file or directory: './glove/glove.6B.50d_word2vec.txt'
Where can I download this file, or where is it generated?

A question about side_info

@svjan5
Hi, I want to use the entity type information on Riedel NYT, but I don't know what the ids in your
/side_info/entity_type/riedel_nyt/type_info.json are. More specifically, in "m.027typ": ["/person"], "m.07tzbr": ["/person"], "m.0ds41": ["/person", "/person/artist"], what is "m.027typ"? How can I get the mapping from the id to the entity?

PS: I have processed the data by myself, so I need to know what "m.027typ" refers to. I can't find the code that produces entity ids like "m.027typ".

Thank you for your help!

Label decoder for riedel_processed.pkl

I downloaded the provided pickle file, but I realized that the label values are encoded and the file seems to include no mapping.

Did you define a label mapping somewhere in the code? How can I reconstruct the original labels?

New training gets killed after some time and never finishes/converges?

I could reproduce the results in the paper,
but when I try training the model from scratch, the program always gets killed at
2019-02-20 07:53:55,385 - [INFO] - E:0 Train Accuracy (27232/292459): 93.46 975.36 new_run 0.0
Killed
I ran with a GPU and without a GPU, but nothing changed.
How can I run the model training from scratch?
Thanks,

About Riedel&GIDS raw dataset

Hi, guys:

Thanks for the awesome contributions.

I have read your paper and am now interested in the construction of the Riedel & GIDS raw datasets (riedel_train.json, riedel_test.json, gids_train.json, gids_test.json, gids_dev.json).

As mentioned in your paper, RESIDE used the Stanford CoreNLP tool to extract NLP features from sentences, and I'm a little confused about how you built these raw datasets:

  1. About CoreNLP usage: I found that many Python wrappers of CoreNLP don't work well with the kbp and entitylink annotators. Did you just start a CoreNLP server and then send requests to it for preprocessing?
  2. I've noticed that riedel_*.json's format is a little different from gids_*.json's. That is, riedel_*.json has an openie key, while gids_*.json's openie key is embedded in its corenlp key. To my understanding, for riedel you used CoreNLP to extract openie features from a sentence and then used CoreNLP again to extract depparse features from the same sentence; for gids, you used CoreNLP to extract all features from a sentence at the same time. Is that right? Also, the preprocessing code seems to be incompatible with the format of the gids_*.json dataset provided.
  3. For the CoreNLP openie features, did you just activate the tokenize, ssplit, pos, lemma, depparse, natlog, and openie annotators? For the dependency tree features, did you just activate the tokenize, ssplit, pos, lemma, parse, depparse, ner, entitylink, coref, and kbp annotators? Also, as RESIDE seems to use only the dependency tree feature extracted by the depparse annotator and the relation phrase feature extracted by openie, did other features from annotators like kbp, coref, and entitylink contribute to the preprocessing of the dataset?

Thanks a lot.

Best

How do you get baselines_pr?

I want to compare my model (maybe in PyTorch) with RESIDE and the baselines. However, reimplementing every model is tedious. RESIDE and the baselines are evaluated on GIDS, a new dataset, so I am curious where the implementations of the baselines come from.

In other words, how did you get baselines_pr?

Thanks

problems processing the GIDS dataset

Hello, following your instructions to process the data, I found the following in the case of GIDS:

  1. The format varies between the Riedel and GIDS datasets (this resulted in a separate post).
  2. There are entities that do not appear in entity_type (GIDS).
  3. It falls into an infinite loop in one of the parts of generate_pickle.py.
  4. There are null values (None) in corenlp -> sentences.

Thank you very much in advance.

question about the pre-train glove you used

Hi, @svjan5
Sorry to bother you again.
I have read your paper and noticed you used GloVe. Is the GloVe pre-trained on the NYT dataset? I also checked your code and found that you update the word embeddings during training. I'm confused about this update: why not freeze the embeddings during training? Intuitively, freezing the pretrained word embeddings seems a better way to represent the words' semantic information, since each word has already been trained and has an accurate representation.

Looking forward to your reply.
Best :^)

Question about the P@N in paper and experiment

Hi, sorry to bother you again. @svjan5
P@N
In your paper, the setting ALL means using all sentences in a bag, and the recall & precision should be consistent with the PR file you provided. But when I read your PR file, I find that P@100 (precision[99]) is 0.818, P@200 (precision[199]) is 0.754, and P@300 (precision[299]) is 0.742, which differs slightly from the results in your paper: P@100: 0.840, P@200: 0.785, P@300: 0.756.
So, is there something wrong with my procedure? First I read your PR file into a list named precision, then sorted the list in descending order, and took P@100 = precision[99], P@200 = precision[199], P@300 = precision[299].
I would appreciate it if you could point it out for me. Looking forward to your reply.

Error while running Flask server

Hi,

I'm trying to run the Flask server with a model trained from scratch (python online/server.py -name new_run), but got the following error:

WARNING: Logging before flag parsing goes to stderr.
W0624 21:49:30.760367 139833379484992 doc2vec.py:75] Slow version of gensim.models.doc2vec is being used
Loading RESIDE model.
2019-06-24 21:51:47,860 - [INFO] - {'dataset': 'riedel', 'gpu': '0', 'wGate': True, 'lstm_dim': 192, 'port': 3535, 'pos_dim': 16, 'type_dim': 50, 'alias_dim': 32, 'de_gcn_dim': 16, 'max_pos': 60, 'de_layers': 1, 'dropout': 0.8, 'rec_dropout': 0.8, 'lr': 0.001, 'l2': 0.001, 'max_epochs': 2, 'batch_size': 32, 'chunk_size': 1000, 'restore': False, 'only_eval': False, 'opt': 'sgd', 'eps': 1e-08, 'name': 'new_run', 'seed': 1234, 'log_dir': './log/', 'config_dir': './config/', 'embed_loc': './glove/glove.6B.50d_word2vec.txt', 'embed_dim': 50, 'rel2alias_file': './side_info/relation_alias/riedel/relation_alias_from_wikidata_ppdb_extended.json', 'type2id_file': './side_info/entity_type/riedel/type_info.json'}
{'alias_dim': 32,
 'batch_size': 32,
 'chunk_size': 1000,
 'config_dir': './config/',
 'dataset': 'riedel',
 'de_gcn_dim': 16,
 'de_layers': 1,
 'dropout': 0.8,
 'embed_dim': 50,
 'embed_loc': './glove/glove.6B.50d_word2vec.txt',
 'eps': 1e-08,
 'gpu': '0',
 'l2': 0.001,
 'log_dir': './log/',
 'lr': 0.001,
 'lstm_dim': 192,
 'max_epochs': 2,
 'max_pos': 60,
 'name': 'new_run',
 'only_eval': False,
 'opt': 'sgd',
 'port': 3535,
 'pos_dim': 16,
 'rec_dropout': 0.8,
 'rel2alias_file': './side_info/relation_alias/riedel/relation_alias_from_wikidata_ppdb_extended.json',
 'restore': False,
 'seed': 1234,
 'type2id_file': './side_info/entity_type/riedel/type_info.json',
 'type_dim': 50,
 'wGate': True}
/home/ec2-user/virtualenvs/RESIDE/lib/python3.7/site-packages/smart_open/smart_open_lib.py:398: UserWarning: This function is deprecated, use smart_open.open instead. See the migration notes for details: https://github.com/RaRe-Technologies/smart_open/blob/master/README.rst#migrating-to-the-new-open-function
  'See the migration notes for details: %s' % _MIGRATION_NOTES_URL
2019-06-24 21:53:10.510101: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2019-06-24 21:53:10.538389: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2500000000 Hz
2019-06-24 21:53:10.538861: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x11aedfe20 executing computations on platform Host. Devices:
2019-06-24 21:53:10.538891: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
2019-06-24 21:53:11.875817: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set.  If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU.  To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
2019-06-24 21:53:12.459249: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key Bi-LSTM/bidirectional_rnn/bw/BW_GRU/candidate/bias not found in checkpoint
Traceback (most recent call last):
  File "/home/ec2-user/virtualenvs/RESIDE/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
    return fn(*args)
  File "/home/ec2-user/virtualenvs/RESIDE/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/ec2-user/virtualenvs/RESIDE/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.NotFoundError: Key Bi-LSTM/bidirectional_rnn/bw/BW_GRU/candidate/bias not found in checkpoint
	 [[{{node save/RestoreV2}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ec2-user/virtualenvs/RESIDE/lib/python3.7/site-packages/tensorflow/python/training/saver.py", line 1286, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "/home/ec2-user/virtualenvs/RESIDE/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 950, in run
    run_metadata_ptr)
  File "/home/ec2-user/virtualenvs/RESIDE/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1173, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/ec2-user/virtualenvs/RESIDE/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1350, in _do_run
    run_metadata)
  File "/home/ec2-user/virtualenvs/RESIDE/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1370, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: Key Bi-LSTM/bidirectional_rnn/bw/BW_GRU/candidate/bias not found in checkpoint
	 [[node save/RestoreV2 (defined at online/server.py:77) ]]

Original stack trace for 'save/RestoreV2':
  File "online/server.py", line 77, in <module>
    saver		= tf.train.Saver()
  File "/home/ec2-user/virtualenvs/RESIDE/lib/python3.7/site-packages/tensorflow/python/training/saver.py", line 825, in __init__
    self.build()
  File "/home/ec2-user/virtualenvs/RESIDE/lib/python3.7/site-packages/tensorflow/python/training/saver.py", line 837, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/home/ec2-user/virtualenvs/RESIDE/lib/python3.7/site-packages/tensorflow/python/training/saver.py", line 875, in _build
    build_restore=build_restore)
  File "/home/ec2-user/virtualenvs/RESIDE/lib/python3.7/site-packages/tensorflow/python/training/saver.py", line 508, in _build_internal
    restore_sequentially, reshape)
  File "/home/ec2-user/virtualenvs/RESIDE/lib/python3.7/site-packages/tensorflow/python/training/saver.py", line 328, in _AddRestoreOps
    restore_sequentially)
  File "/home/ec2-user/virtualenvs/RESIDE/lib/python3.7/site-packages/tensorflow/python/training/saver.py", line 575, in bulk_restore
    return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
  File "/home/ec2-user/virtualenvs/RESIDE/lib/python3.7/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1696, in restore_v2
    name=name)
  File "/home/ec2-user/virtualenvs/RESIDE/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "/home/ec2-user/virtualenvs/RESIDE/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/ec2-user/virtualenvs/RESIDE/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 3616, in create_op
    op_def=op_def)
  File "/home/ec2-user/virtualenvs/RESIDE/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 2005, in __init__
    self._traceback = tf_stack.extract_stack()


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ec2-user/virtualenvs/RESIDE/lib/python3.7/site-packages/tensorflow/python/training/saver.py", line 1296, in restore
    names_to_keys = object_graph_key_mapping(save_path)
  File "/home/ec2-user/virtualenvs/RESIDE/lib/python3.7/site-packages/tensorflow/python/training/saver.py", line 1614, in object_graph_key_mapping
    object_graph_string = reader.get_tensor(trackable.OBJECT_GRAPH_PROTO_KEY)
  File "/home/ec2-user/virtualenvs/RESIDE/lib/python3.7/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 678, in get_tensor
    return CheckpointReader_GetTensor(self, compat.as_bytes(tensor_str))
tensorflow.python.framework.errors_impl.NotFoundError: Key _CHECKPOINTABLE_OBJECT_GRAPH not found in checkpoint

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "online/server.py", line 83, in <module>
    saver.restore(sess, save_path)
  File "/home/ec2-user/virtualenvs/RESIDE/lib/python3.7/site-packages/tensorflow/python/training/saver.py", line 1302, in restore
    err, "a Variable name or other graph key that is missing")
tensorflow.python.framework.errors_impl.NotFoundError: Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Key Bi-LSTM/bidirectional_rnn/bw/BW_GRU/candidate/bias not found in checkpoint
	 [[node save/RestoreV2 (defined at online/server.py:77) ]]

Original stack trace for 'save/RestoreV2':
  File "online/server.py", line 77, in <module>
    saver		= tf.train.Saver()
  File "/home/ec2-user/virtualenvs/RESIDE/lib/python3.7/site-packages/tensorflow/python/training/saver.py", line 825, in __init__
    self.build()
  File "/home/ec2-user/virtualenvs/RESIDE/lib/python3.7/site-packages/tensorflow/python/training/saver.py", line 837, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/home/ec2-user/virtualenvs/RESIDE/lib/python3.7/site-packages/tensorflow/python/training/saver.py", line 875, in _build
    build_restore=build_restore)
  File "/home/ec2-user/virtualenvs/RESIDE/lib/python3.7/site-packages/tensorflow/python/training/saver.py", line 508, in _build_internal
    restore_sequentially, reshape)
  File "/home/ec2-user/virtualenvs/RESIDE/lib/python3.7/site-packages/tensorflow/python/training/saver.py", line 328, in _AddRestoreOps
    restore_sequentially)
  File "/home/ec2-user/virtualenvs/RESIDE/lib/python3.7/site-packages/tensorflow/python/training/saver.py", line 575, in bulk_restore
    return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
  File "/home/ec2-user/virtualenvs/RESIDE/lib/python3.7/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1696, in restore_v2
    name=name)
  File "/home/ec2-user/virtualenvs/RESIDE/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "/home/ec2-user/virtualenvs/RESIDE/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/ec2-user/virtualenvs/RESIDE/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 3616, in create_op
    op_def=op_def)
  File "/home/ec2-user/virtualenvs/RESIDE/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 2005, in __init__
    self._traceback = tf_stack.extract_stack()

Thanks!

About The evaluation P@N

I read your paper and the paper of Lin et al., 2016. I found that you both used P@N as an evaluation metric. I noticed that P@One refers to predicting using a single sentence randomly drawn from the bag of each entity pair. However, this may produce different results for each experiment, so I am puzzled by this evaluation. Could you please help me understand it? Thank you very much!

Get accuracy at the instance level?

Hello, I would like to know how to obtain the accuracy at the level of each instance. Reviewing the code, the prediction is made on a bag, and the reported measurements are based on these results. The train set with 400,000 instances becomes 19,000 bags.
Is there a way to get instance-level measurements with your code?

Issue about running the model

Hi, I was trying to run the model with the RiedelNYT dataset. When it processes around 4,500 sentences, the program gets killed. I have installed all required libraries with the correct versions mentioned in the requirements file, and the program works with the GIDS dataset successfully. I am just confused about the error. Could you please give any help? The error is shown as follows:
2019-03-22 00:36:38.872704: I tensorflow/core/common_runtime/bfc_allocator.cc:660] Stats:
Limit: 5915344896
InUse: 5914959872
MaxInUse: 5914960384
NumAllocs: 1698569
MaxAllocSize: 167650304

2019-03-22 00:36:38.872817: W tensorflow/core/common_runtime/bfc_allocator.cc:275] ****************************************************************************************************
2019-03-22 00:36:38.872831: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at matmul_op.cc:478 : Resource exhausted: OOM when allocating tensor with shape[2607,192] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc

Regarding Freebase Links of Riedel Entities

The entity mentions are annotated using Stanford NER (Finkel et al., 2005) and are linked to Freebase.

In the Riedel dataset paragraph in the paper, you mention that these linkages to Freebase are done, but I don't see these Freebase links in the processed Riedel data. Can I know where exactly these Freebase linkages are present? There are guids already given to various entities in Riedel, but they seem to be different from Freebase linkages.

Problem about the GIDS dataset

Hi,

After checking, the max_len of all sentences in the GIDS dataset is 100; however, there are some values in SubPos and ObjPos larger than 100.

So it seems that the max_len of 100 is the length after preprocessing instead of the real maximum length of the sentences, is that right?

Thanks

Regarding generating relations from raw sentence

I need your input on generating relations from sentences. I've checked your Flask server code, and from that I could tell you are using samples present in 'riedel_test_bags.json'. Instead of using samples from riedel_test_bags.json, how can I convert sentences into the same format? Or is there a pipeline available that can take care of the preprocessing?

Some problems about code

Hello! I encountered a problem when reproducing the code. During training, line 432 of base.py raised an error: sess in "test_loss, test_acc, y, y_pred, logit_list, y_hot = self.predict(sess, data, label)" was not defined. I did not find a good fix after debugging. I hope to get your help.

question about how to get the performance in your paper?

Hi, @svjan5
Sorry to bother you, but I downloaded your code and ran it as described in the README.
I didn't change any parameters or hyper-parameters, but I got an AUC of 0.39, which doesn't match your paper's AUC of 0.416 on Riedel.
So, I wonder if you could tell me how to match the performance reported in the paper. Do I need to change some settings to match the AUC and PR curves shown in your paper?
Thank you very much.

Error while running online/server.py

Hi,
I got the following error log while running online/server.py. Can someone point out what is causing this error?


OSError Traceback (most recent call last)
~\RESIDE-master\online\server.py in
68 bag_list.append(bag)
69
---> 70 model = RESIDE(args)
71 config = tf.ConfigProto()
72 config.gpu_options.allow_growth=True

~\RESIDE-master\online\base_model.py in init(self, params)
216 self.add_placeholders()
217
--> 218 nn_out, self.accuracy = self.add_model()
219
220 self.loss = self.add_loss(nn_out)

~\RESIDE-master\online\online_reside.py in add_model(self)
256 with tf.variable_scope('Embeddings') as scope:
257 model = gensim.models.KeyedVectors.load_word2vec_format(self.p.embed_loc, binary=False)
--> 258 embed_init = getPhr2vec(model, self.wrd_list, self.p.embed_dim)
259 _wrd_embeddings = tf.get_variable('embeddings', initializer=embed_init, trainable=True, regularizer=self.regularizer)
260 wrd_pad = tf.zeros([1, self.p.embed_dim])

~\RESIDE-master\helper.py in getEmbeddings(wrd_list, embed_dims, embed_loc)
27 embed_list, wrd2vec = [], {}
28
---> 29 with open(embed_loc, "r+", encoding="utf-8") as f:
30 for line in f.readlines():
31 data = line.strip().split(' ')

OSError: [WinError 6] The handle is invalid

Question about the accuracy on test split

To my understanding, self.input_y is the one-hot representation of the relations in each bag:
https://github.com/malllabiisc/RESIDE/blob/master/reside.py#L352
In the training stage, the number of relations is ensured to be one, so this turns into a one-hot representation;
but in the testing stage there are multiple relations, so this is actually a multi-hot representation?

And then you use self.input_y to calculate y_actual using argmax; this means you only keep the relation with the minimum index in the bag?
https://github.com/malllabiisc/RESIDE/blob/master/reside.py#L634

Finally, this minimum-index relation is used to calculate the accuracy?
https://github.com/malllabiisc/RESIDE/blob/master/reside.py#L635

So for the testing stage, I cannot figure out the meaning of accuracy. Could you please explain that?

Error when executing cnnatt.py and pcnnatt.py

I got the following error when running cnnatt.py and pcnnatt.py.

  File "pcnnatt.py", line 332, in <module>
    model.fit(sess)
  File "/home/haojie/projects/RESIDE/base.py", line 498, in fit
    one_100, one_200, one_300 = self.getPscore(sess, self.test_one, label='P@1 Evaluation')
  File "/home/haojie/projects/RESIDE/base.py", line 432, in getPscore
    test_loss, test_acc, y, y_pred, logit_list, y_hot = self.predict(sess, data, label)
  File "/home/haojie/projects/RESIDE/base.py", line 326, in predict
    for step, batch in enumerate(self.getBatches(data, shuffle)):
  File "pcnnatt.py", line 55, in getBatches
    batch['PartPos']	+= get_sent_part(bag['SubPos'], bag['ObjPos'], bag['X'])
KeyError: 'SubPos'

I find that in base.py, when creating the p_one and p_two lists, neither SubPos nor ObjPos is added to the new dictionary.

p_one.append({
	'X':    	[bag['X'][indx[0]]],
	'Pos1': 	[bag['Pos1'][indx[0]]],
	'Pos2': 	[bag['Pos2'][indx[0]]],
	'DepEdges': 	[bag['DepEdges'][indx[0]]],
	'ProbY': 	[bag['ProbY'][indx[0]]],
	'Y':    	bag['Y'],
	'SubType':	bag['SubType'],
	'ObjType':	bag['ObjType']
})

Could you please fix this bug? Thanks!
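
A possible fix (an assumption based on how the other per-sentence fields are indexed) would be to add the missing keys when building p_one / p_two:

p_one.append({
	'X':        [bag['X'][indx[0]]],
	'Pos1':     [bag['Pos1'][indx[0]]],
	'Pos2':     [bag['Pos2'][indx[0]]],
	'SubPos':   [bag['SubPos'][indx[0]]],   # added: position of entity 1 in the chosen sentence
	'ObjPos':   [bag['ObjPos'][indx[0]]],   # added: position of entity 2 in the chosen sentence
	'DepEdges': [bag['DepEdges'][indx[0]]],
	'ProbY':    [bag['ProbY'][indx[0]]],
	'Y':        bag['Y'],
	'SubType':  bag['SubType'],
	'ObjType':  bag['ObjType']
})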

Flask Server Prediction and Online_Reside.py

Hi! I have been trying to use RESIDE's online code, but I keep running into an issue where the server just displays NA for the predicted relation. There are no errors in the Flask server. How can I get RESIDE to predict the relation?

Another question is: what is the output of online_reside.py? I have run the program successfully, but nothing seems to be output, and diving into the code has not been much help.

Problem about running pcnnatt

Traceback (most recent call last):
  File "/home/luoyang/reside/pcnnatt.py", line 347, in <module>
    model.fit(sess)
  File "/home/luoyang/reside/base.py", line 479, in fit
    train_loss, train_acc = self.run_epoch(sess, self.data['train'], epoch)
  File "/home/luoyang/reside/base.py", line 369, in run_epoch
    for step, batch in enumerate(self.getBatches(data, shuffle)):
  File "/home/luoyang/reside/pcnnatt.py", line 54, in getBatches
    batch['PartPos'] += get_sent_part(bag['Pos1'], bag['Pos2'], bag['X'])
  File "/home/luoyang/reside/pcnnatt.py", line 38, in get_sent_part
    if pos1 == pos2 or pos1 <= 0 or pos2 >= len(sent) - 1:
TypeError: '<=' not supported between instances of 'list' and 'int'

The same error occurred when running pcnnatt and cnnatt.

Regarding Setting up of server

Traceback (most recent call last):
  File "online/server.py", line 74, in <module>
    model  = RESIDE(args)
  File "/home/reidel_nyt/RESIDE/online/base_model.py", line 214, in __init__
    self.load_data()
  File "/home/reidel_nyt/RESIDE/online/base_model.py", line 177, in load_data
    self.voc2id	   	= json.load(open('./data/{}_voc2id.json'.format(self.p.dataset)))
FileNotFoundError: [Errno 2] No such file or directory: './data/riedel_voc2id.json'

Where can I get riedel_voc2id.json? It's nowhere mentioned in the server setup instructions.

OOM appears when training the model

Hi! I have a problem like issue #18, and I tried with a reduced batch size and dimensions as you answered. I set the batch size to one, but it still had the same problem, OOM. My graphics card is an NVIDIA GeForce RTX 2080 Ti (11 GB); why does the memory keep growing while the program is running? Moreover, when I set lstm_dim to 64 and the batch size to 32, I got an AUC of 0.397, which didn't match your paper's AUC of 0.416 on Riedel. Could you please give any help?

NameError: name 'sess' is not defined

WARNING:tensorflow:From reside.py:528: The name tf.set_random_seed is deprecated. Please use tf.compat.v1.set_random_seed instead.

2020-01-01 16:09:15,529 - [INFO] - {'dataset': 'data/riedel_processed.pkl', 'gpu': '0', 'wGate': True, 'lstm_dim': 192, 'pos_dim': 16, 'type_dim': 50, 'alias_dim': 32, 'de_gcn_dim': 16, 'de_layers': 1, 'dropout': 0.8, 'rec_dropout': 0.8, 'lr': 0.001, 'l2': 0.001, 'max_epochs': 2, 'batch_size': 32, 'chunk_size': 1000, 'restore': False, 'only_eval': False, 'opt': 'adam', 'eps': 1e-08, 'name': 'new_run', 'seed': 1234, 'log_dir': './log/', 'config_dir': './config/', 'embed_loc': './glove/glove.6B.50d.txt', 'embed_dim': 50}
{'alias_dim': 32,
'batch_size': 32,
'chunk_size': 1000,
'config_dir': './config/',
'dataset': 'data/riedel_processed.pkl',
'de_gcn_dim': 16,
'de_layers': 1,
'dropout': 0.8,
'embed_dim': 50,
'embed_loc': './glove/glove.6B.50d.txt',
'eps': 1e-08,
'gpu': '0',
'l2': 0.001,
'log_dir': './log/',
'lr': 0.001,
'lstm_dim': 192,
'max_epochs': 2,
'name': 'new_run',
'only_eval': False,
'opt': 'adam',
'pos_dim': 16,
'rec_dropout': 0.8,
'restore': False,
'seed': 1234,
'type_dim': 50,
'wGate': True}
WARNING:tensorflow:
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:

2020-01-01 16:09:36,994 - [INFO] - Document count [train]: 292459, [test]: 96676
WARNING:tensorflow:From reside.py:58: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

WARNING:tensorflow:From reside.py:82: The name tf.placeholder_with_default is deprecated. Please use tf.compat.v1.placeholder_with_default instead.

WARNING:tensorflow:From reside.py:360: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.

WARNING:tensorflow:From reside.py:362: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.

WARNING:tensorflow:From reside.py:383: GRUCell.init (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This class is equivalent as tf.keras.layers.GRUCell, and will be replaced by that in Tensorflow 2.0.
WARNING:tensorflow:From reside.py:385: bidirectional_dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.
Instructions for updating:
Please use keras.layers.Bidirectional(keras.layers.RNN(cell)), which is equivalent to this API
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/rnn.py:464: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.
Instructions for updating:
Please use keras.layers.RNN(cell), which is equivalent to this API
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/rnn_cell_impl.py:559: Layer.add_variable (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use layer.add_weight method instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/rnn_cell_impl.py:565: calling Constant.init (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/rnn_cell_impl.py:575: calling Zeros.init (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/rnn.py:244: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
WARNING:tensorflow:From reside.py:283: The name tf.sparse_transpose is deprecated. Please use tf.sparse.transpose instead.

WARNING:tensorflow:From reside.py:284: The name tf.sparse_tensor_dense_matmul is deprecated. Please use tf.sparse.sparse_dense_matmul instead.

WARNING:tensorflow:From reside.py:287: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use rate instead of keep_prob. Rate should be set to rate = 1 - keep_prob.
WARNING:tensorflow:From reside.py:458: The name tf.nn.xw_plus_b is deprecated. Please use tf.compat.v1.nn.xw_plus_b instead.

WARNING:tensorflow:From /content/RESIDE/base.py:277: The name tf.get_collection is deprecated. Please use tf.compat.v1.get_collection instead.

WARNING:tensorflow:From /content/RESIDE/base.py:277: The name tf.GraphKeys is deprecated. Please use tf.compat.v1.GraphKeys instead.

WARNING:tensorflow:From /content/RESIDE/base.py:294: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead.

WARNING:tensorflow:From /content/RESIDE/base.py:39: The name tf.summary.scalar is deprecated. Please use tf.compat.v1.summary.scalar instead.

WARNING:tensorflow:From /content/RESIDE/base.py:40: The name tf.summary.merge_all is deprecated. Please use tf.compat.v1.summary.merge_all instead.

WARNING:tensorflow:From reside.py:535: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

WARNING:tensorflow:From reside.py:537: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

2020-01-01 16:09:45.004854: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX512F
2020-01-01 16:09:45.112223: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2000179999 Hz
2020-01-01 16:09:45.115414: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x2ec5480 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-01-01 16:09:45.115452: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-01-01 16:09:45.148794: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-01-01 16:09:45.317052: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-01-01 16:09:45.317793: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x2ec52c0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-01-01 16:09:45.317823: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Tesla P100-PCIE-16GB, Compute Capability 6.0
2020-01-01 16:09:45.318236: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-01-01 16:09:45.318739: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:00:04.0
2020-01-01 16:09:45.339971: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1

...

2020-01-01 17:37:50,009 - [INFO] - E:1 Train Accuracy (163552/292459): 95.643 0.14488 new_run 94.474
2020-01-01 17:37:53,422 - [INFO] - E:1 Train Accuracy (163872/292459): 95.644 0.14483 new_run 94.474
2020-01-01 17:37:57,689 - [INFO] - E:1 Train Accuracy (164192/292459): 95.644 0.14484 new_run 94.474

...

2020-01-01 18:01:56,983 - [INFO] - E:1 Train Accuracy (287712/292459): 95.664 0.13823 new_run 94.474
2020-01-01 18:02:00,141 - [INFO] - E:1 Train Accuracy (288032/292459): 95.664 0.13822 new_run 94.474
2020-01-01 18:02:06,500 - [INFO] - E:1 Train Accuracy (288352/292459): 95.665 0.13823 new_run 94.474
2020-01-01 18:02:09,965 - [INFO] - E:1 Train Accuracy (288672/292459): 95.663 0.13825 new_run 94.474
2020-01-01 18:02:13,536 - [INFO] - E:1 Train Accuracy (288992/292459): 95.663 0.13824 new_run 94.474
2020-01-01 18:02:17,056 - [INFO] - E:1 Train Accuracy (289312/292459): 95.662 0.13824 new_run 94.474

...

2020-01-01 18:02:53,866 - [INFO] - E:1 Train Accuracy (292192/292459): 95.658 0.13819 new_run 94.474
2020-01-01 18:02:56,696 - [INFO] - Training Loss:0.1381186544895172, Accuracy: 95.6602156162262
2020-01-01 18:02:56,697 - [INFO] - [Epoch 1]: Training Loss: 0.13812, Training Acc: 95.66

INFO:tensorflow:Restoring parameters from checkpoints/new_run/best_model
2020-01-01 18:02:57,883 - [INFO] - Evaluating on Test (32/96676): 96.875 0.085916 new_run
2020-01-01 18:03:17,846 - [INFO] - Evaluating on Test (3232/96676): 97.556 0.081206 new_run
2020-01-01 18:03:37,863 - [INFO] - Evaluating on Test (6432/96676): 97.683 0.079807 new_run
2020-01-01 18:03:57,750 - [INFO] - Evaluating on Test (9632/96676): 97.799 0.077063 new_run
2020-01-01 18:04:18,123 - [INFO] - Evaluating on Test (12832/96676): 97.81 0.080971 new_run
2020-01-01 18:04:38,361 - [INFO] - Evaluating on Test (16032/96676): 97.754 0.084688 new_run
2020-01-01 18:04:58,311 - [INFO] - Evaluating on Test (19232/96676): 97.754 0.08457 new_run
2020-01-01 18:05:17,626 - [INFO] - Evaluating on Test (22432/96676): 97.762 0.084693 new_run
2020-01-01 18:05:36,826 - [INFO] - Evaluating on Test (25632/96676): 97.749 0.085677 new_run
2020-01-01 18:05:55,400 - [INFO] - Evaluating on Test (28832/96676): 97.739 0.084639 new_run
2020-01-01 18:06:14,745 - [INFO] - Evaluating on Test (32032/96676): 97.746 0.083909 new_run
2020-01-01 18:06:34,200 - [INFO] - Evaluating on Test (35232/96676): 97.775 0.083634 new_run
2020-01-01 18:06:54,216 - [INFO] - Evaluating on Test (38432/96676): 97.77 0.083335 new_run
2020-01-01 18:07:14,300 - [INFO] - Evaluating on Test (41632/96676): 97.771 0.082361 new_run
2020-01-01 18:07:34,194 - [INFO] - Evaluating on Test (44832/96676): 97.769 0.082067 new_run
2020-01-01 18:07:52,966 - [INFO] - Evaluating on Test (48032/96676): 97.772 0.081363 new_run
2020-01-01 18:08:12,089 - [INFO] - Evaluating on Test (51232/96676): 97.787 0.081485 new_run
2020-01-01 18:08:32,589 - [INFO] - Evaluating on Test (54432/96676): 97.766 0.082547 new_run
2020-01-01 18:08:52,092 - [INFO] - Evaluating on Test (57632/96676): 97.756 0.082637 new_run
2020-01-01 18:09:11,821 - [INFO] - Evaluating on Test (60832/96676): 97.756 0.083055 new_run
2020-01-01 18:09:31,836 - [INFO] - Evaluating on Test (64032/96676): 97.759 0.082874 new_run
2020-01-01 18:09:51,079 - [INFO] - Evaluating on Test (67232/96676): 97.764 0.082702 new_run
2020-01-01 18:10:10,474 - [INFO] - Evaluating on Test (70432/96676): 97.769 0.082184 new_run
2020-01-01 18:10:30,843 - [INFO] - Evaluating on Test (73632/96676): 97.758 0.082614 new_run
2020-01-01 18:10:50,781 - [INFO] - Evaluating on Test (76832/96676): 97.748 0.082557 new_run
2020-01-01 18:11:09,929 - [INFO] - Evaluating on Test (80032/96676): 97.75 0.082459 new_run
2020-01-01 18:11:29,581 - [INFO] - Evaluating on Test (83232/96676): 97.75 0.082377 new_run
2020-01-01 18:11:48,420 - [INFO] - Evaluating on Test (86432/96676): 97.743 0.082698 new_run
2020-01-01 18:12:08,105 - [INFO] - Evaluating on Test (89632/96676): 97.747 0.082548 new_run
2020-01-01 18:12:27,132 - [INFO] - Evaluating on Test (92832/96676): 97.758 0.082317 new_run
2020-01-01 18:12:47,005 - [INFO] - Evaluating on Test (96032/96676): 97.762 0.082606 new_run
2020-01-01 18:12:50,929 - [INFO] - Test Accuracy: 1.0
2020-01-01 18:12:53,532 - [INFO] - Final results: Prec:0.39756866276273045 | Rec:0.5014196479244667 | F1:0.44349572585304514 | Area:0.3837338804383346
Traceback (most recent call last):
  File "reside.py", line 539, in <module>
    model.fit(sess)
  File "/content/RESIDE/base.py", line 498, in fit
    one_100, one_200, one_300 = self.getPscore(self.test_one, label='P@1 Evaluation')
  File "/content/RESIDE/base.py", line 432, in getPscore
    test_loss, test_acc, y, y_pred, logit_list, y_hot = self.predict(sess, data, label)
NameError: name 'sess' is not defined

Could you please release the id2entitytype dict corresponding to the provided riedel_preprocessed.pkl?

Hi,

We are trying to do some comparisons based on the RESIDE model, and it turns out that we need to align the entity pairs in your dataset with those in our version (due to a random permutation, the indices are different, but the items are in total the same).

We wonder whether you could release the mapping from line index (e.g. 0, 1, 2, ...) in the test file in 'riedel_preprocessed.pkl' to the entity pair ids (e.g. ("m.0chrx", "m.01_d4"), ("m.0abcds", "m.01_-jfew"), ("m.0chrx", "m.01ojlll")).

Or alternatively, could you release the 'riedel_test_bags.json' and 'riedel_train_bags.json' files, so that we can generate the mapping on our own? We have tried running the preprocessing code from scratch, but perhaps due to randomness, we could not reproduce the result exactly.

Thank you!
