facebookresearch / lama

LAnguage Model Analysis

License: Other

Shell 4.60% Python 95.40%

lama's Issues

Horizon Zero Dawn Nexusmods

In the Nexusmods mods for Horizon Zero Dawn, whenever Fireclaw does its ground lava fountain attack, the game crashes. Please fix it as soon as possible. Thanks 👍🔥❤️
We are waiting!

download the pre-trained models

Good morning,

First, I should say that I'm new to macOS: I tried to install the requirements on Windows 10 but ran into many issues installing all the prerequisites manually, so I'm trying to switch.

When I try to download the pre-trained models (in particular "BERT BASE CASED") I get this error:

./download_models.sh: line 12: realpath: command not found

Ok, I see. The provided `common_vocab_cased.txt` does have `1990s` and `1970s `. So I think the problem probably lies in the `vocab_intersection.py` ?

    Ok, I see. The provided `common_vocab_cased.txt` does have `1990s` and `1970s `. So I think the problem probably lies in the `vocab_intersection.py` ?

Originally posted by @Hannibal046 in #51 (comment)

RuntimeError: invalid argument 2: k not in range for dimension.

Hi,
I followed the requirements to install the environment.
But when I run scripts/run_experiments.py, an error is raised in __max_probs_values_indices:
RuntimeError: invalid argument 2: k not in range for dimension at /pytorch/aten/src/TH/generic/THTensorMoreMath.cpp:1190

I checked, and the size of log_prob is 28 while k is 10000, which exceeds it.
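
One way to avoid this kind of error is to cap k at the number of available scores before taking the top k. The following is a minimal sketch in plain Python rather than PyTorch; `top_k_indices` is a hypothetical helper and not a function from the LAMA code base.

```python
import heapq


def top_k_indices(scores, k):
    """Return the indices of the k largest scores, capping k at len(scores)."""
    # Hypothetical helper: clamp k so it never exceeds the available dimension.
    k = min(k, len(scores))
    # nlargest yields (score, index) pairs ordered from largest to smallest.
    best = heapq.nlargest(k, ((s, i) for i, s in enumerate(scores)))
    return [i for _, i in best]


scores = [0.1, 0.7, 0.2]
print(top_k_indices(scores, 10000))
```

With tensors, the analogous guard would presumably be to pass the smaller of k and the size of the ranked dimension to the top-k call.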

RoBERTa mean P@1 for LAMA-T-REx and LAMA-Google-RE?

Hello,
Have you ever evaluated RoBERTa on LAMA-T-REx and LAMA-Google-RE?

RoBERTa evaluation on LAMA

Thank you for open-sourcing and maintaining such a great project! :)

I have an issue related to RoBERTa evaluation on LAMA.
To evaluate the RoBERTa performance on LAMA, I downloaded the RoBERTa {base,large} checkpoints from the fairseq repository. Then I slightly modified run_experiments.py to add RoBERTa to the target LMs as follows:

LMs = [
    {
        "lm": "roberta",
        "label": "roberta",
        "models_names": ["roberta"],
        "roberta_model_name": "model.pt",
        "roberta_model_dir": "pre-trained_language_models/roberta.large/",
        "roberta_vocab_name": "dict.txt"
    }, ...
]

Although the RoBERTa large model is loaded correctly, many warnings show up saying that words from vocab_subset are not in the model vocabulary.

2020-03-12 17:55:33,651 - LAMA - WARNING - word wingspan from vocab_subset not in model vocabulary!
2020-03-12 17:55:33,651 - LAMA - WARNING - word woken from vocab_subset not in model vocabulary!
2020-03-12 17:55:33,651 - LAMA - WARNING - word wooded from vocab_subset not in model vocabulary!
2020-03-12 17:55:33,651 - LAMA - WARNING - word wrestled from vocab_subset not in model vocabulary!
2020-03-12 17:55:33,651 - LAMA - WARNING - word wrinkled from vocab_subset not in model vocabulary!
2020-03-12 17:55:33,651 - LAMA - WARNING - word yanked from vocab_subset not in model vocabulary!
2020-03-12 17:55:33,651 - LAMA - WARNING - word yd from vocab_subset not in model vocabulary!
2020-03-12 17:55:33,651 - LAMA - WARNING - word yellowish from vocab_subset not in model vocabulary!
2020-03-12 17:55:33,651 - LAMA - WARNING - word yer from vocab_subset not in model vocabulary!
2020-03-12 17:55:33,651 - LAMA - WARNING - word zu from vocab_subset not in model vocabulary!
2020-03-12 17:55:33,651 - LAMA - WARNING - word zur from vocab_subset not in model vocabulary!

Then, it was terminated by the error shown below:

Traceback (most recent call last):
  File "scripts/run_experiments.py", line 221, in <module>
    run_all_LMs(parameters)
  File "scripts/run_experiments.py", line 214, in run_all_LMs
    run_experiments(*parameters, input_param=ip, use_negated_probes=False)
  File "scripts/run_experiments.py", line 135, in run_experiments
    Precision1 = run_evaluation(args, shuffle_data=False, model=model)
  File "/home/akari/projects/LAMA/scripts/batch_eval_KB_completion.py", line 390, in main
    model, data, vocab_subset, args.max_sentence_length, args.template
  File "/home/akari/projects/LAMA/scripts/batch_eval_KB_completion.py", line 233, in filter_samples
    if obj_label_ids:
RuntimeError: bool value of Tensor with more than one value is ambiguous

Do you have any insights into why so many words from vocab_subset are not found in the model vocabulary, and how we could solve the RuntimeError?
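
The RuntimeError in the traceback comes from using a multi-element tensor directly as a condition. A minimal sketch of an explicit emptiness check is shown below. `FakeTensor` is a hypothetical stand-in that only mimics the reported behaviour so the example runs without PyTorch, and `has_label_ids` is an illustrative name, not part of the LAMA code. Whether multi-token objects should then be kept or filtered is a separate question this sketch does not settle.

```python
class FakeTensor:
    """Stand-in for a multi-element tensor (hypothetical, for illustration only)."""

    def __init__(self, values):
        self.values = list(values)

    def __len__(self):
        return len(self.values)

    def __bool__(self):
        # Mimic the behaviour reported in the traceback above.
        if len(self.values) > 1:
            raise RuntimeError(
                "bool value of Tensor with more than one value is ambiguous"
            )
        return bool(self.values and self.values[0])


def has_label_ids(obj_label_ids):
    """Explicit emptiness check that never calls bool() on the container."""
    return obj_label_ids is not None and len(obj_label_ids) > 0


ids = FakeTensor([1012, 2054])
print(has_label_ids(ids))
```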

./download_models.sh and run_experiments.py: Torch invalid memory size - maybe an overflow?

Hi,

when I run ./download_models.sh, I get the following exception:

Building common vocab
Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex.
Namespace(lm='transformerxl', transformerxl_model_dir='pre-trained_language_models/transformerxl/transfo-xl-wt103/')
Loading transformerxl model...
Loading Transformer XL model from pre-trained_language_models/transformerxl/transfo-xl-wt103/
Traceback (most recent call last):
  File "lama/vocab_intersection.py", line 158, in <module>
    main()
  File "lama/vocab_intersection.py", line 152, in main
    __vocab_intersection(CASED_MODELS, CASED_COMMON_VOCAB_FILENAME)
  File "lama/vocab_intersection.py", line 97, in __vocab_intersection
    model = build_model_by_name(args.lm, args)
  File "/LAMA/lama/modules/__init__.py", line 31, in build_model_by_name
    return MODEL_NAME_TO_CLASS[lm](args)
  File "/LAMA/lama/modules/transformerxl_connector.py", line 37, in __init__
    self.model = TransfoXLLMHeadModel.from_pretrained(model_name)
  File "/home/user123/anaconda3/envs/lama37/lib/python3.7/site-packages/pytorch_pretrained_bert/modeling_transfo_xl.py", line 939, in from_pretrained
    model = cls(config, *inputs, **kwargs)
  File "/home/user123/anaconda3/envs/lama37/lib/python3.7/site-packages/pytorch_pretrained_bert/modeling_transfo_xl.py", line 1312, in __init__
    self.transformer = TransfoXLModel(config)
  File "/home/user123/anaconda3/envs/lama37/lib/python3.7/site-packages/pytorch_pretrained_bert/modeling_transfo_xl.py", line 1033, in __init__
    div_val=config.div_val)
  File "/home/user123/anaconda3/envs/lama37/lib/python3.7/site-packages/pytorch_pretrained_bert/modeling_transfo_xl.py", line 780, in __init__
    self.emb_layers.append(nn.Embedding(r_idx-l_idx, d_emb_i))
  File "/home/user123/anaconda3/envs/lama37/lib/python3.7/site-packages/torch/nn/modules/sparse.py", line 100, in __init__
    self.weight = Parameter(torch.Tensor(num_embeddings, embedding_dim))
RuntimeError: $ Torch: invalid memory size -- maybe an overflow? at /pytorch/aten/src/TH/THGeneral.cpp:188

I tried different (newer) versions of torch, but that led to the exact same dimension error that JXZe reports in Issue #32:

      RuntimeError: Trying to create tensor with negative dimension -200001: [-200001, 16]

But in #32 there is no recommendation on how to fix this dimension error.

All the packages from requirements.txt are installed correctly, except that I have overrides==3.1.0 instead of overrides==6.1.0: the import "from allennlp.modules.elmo import _ElmoBiLm" in elmo_connector.py only worked after changing to 3.1.0. I also tried skipping the vocab-building step and downloading the common_vocab files provided in the README, but the same "Torch: invalid memory size -- maybe an overflow?" error occurs when running run_experiments.py.

Does anybody have an idea how to fix this?

TREx directory does not exist in the dataset

run_experiments.py uses a TREx directory that does not exist in the published dataset (data.zip). The dataset does contain a TREx_alpaca directory. Is it OK to change the directory name in this line to TREx_alpaca to run the experiments?

Sentence selection for T-REx and GoogleRE

Hi there,

How were the sentences (evidences for T-REx and considered_sentences for GoogleRE) selected for T-REx and GoogleRE? Was it done manually or using some script?

Thanks,

ModuleNotFoundError: No module named 'lama'

I'm getting this error when trying to run ./download_models.sh

  File "/home/Downloads/LAMA-main/lama/vocab_intersection.py", line 7, in <module>
    from lama.modules import build_model_by_name
ModuleNotFoundError: No module named 'lama'

The bert-base-cased downloaded from s3 link has something wrong

The tokenizer.vocab_size is 30522, but it should be 28996, the same as config.vocab_size.
I suggest that we just download the model from Hugging Face.
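
A mismatch like the one described can be checked directly by comparing the number of lines in the vocabulary file with the size declared in the model config. This is only a sketch: the file names `vocab.txt` and `config.json`, the `vocab_size` key, and the helper `vocab_sizes` are assumptions about the downloaded model directory, and the demo uses made-up files rather than the real BERT vocabulary.

```python
import json
import os
import tempfile


def vocab_sizes(model_dir, vocab_name="vocab.txt", config_name="config.json"):
    """Return (tokens in the vocab file, vocab_size from the config)."""
    with open(os.path.join(model_dir, vocab_name), encoding="utf-8") as f:
        n_tokens = sum(1 for _ in f)
    with open(os.path.join(model_dir, config_name), encoding="utf-8") as f:
        config_size = json.load(f)["vocab_size"]
    return n_tokens, config_size


# Tiny self-contained demo with made-up files, not the real BERT vocabulary.
with tempfile.TemporaryDirectory() as demo_dir:
    with open(os.path.join(demo_dir, "vocab.txt"), "w", encoding="utf-8") as f:
        f.write("[PAD]\n[UNK]\nhello\n")
    with open(os.path.join(demo_dir, "config.json"), "w", encoding="utf-8") as f:
        json.dump({"vocab_size": 2}, f)
    n_tokens, config_size = vocab_sizes(demo_dir)
    print(n_tokens, config_size, "mismatch" if n_tokens != config_size else "ok")
```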

ModuleNotFoundError: No module named 'llama'

what happened?

Dataset size mismatch

Hi, thank you for open-sourcing this great project!

I looked into the datasets provided in this repository (https://dl.fbaipublicfiles.com/LAMA/data.zip) and some of their sizes do not match with the sizes described in the paper.

ConceptNet: 11458 (paper) vs 29774 (dataset)
Google-RE death-place: 765 (paper) vs 766 (dataset)

Also, for the TREx dataset, could you explain how the sentences are selected from the 'evidences' in each line of the jsonl file? There seem to be multiple 'masked_sentence' entries in 'evidences'.

Thank you.

mat-label is not displayed on the next line if it is too long in Angular

How can I make the label wrap onto the next line if it is too long in a mat-form-field in Angular?

If anybody knows, please share the solution.

ModuleNotFoundError: No module named 'lama'

I got the following error while executing ./download_dataset.sh for the first time. The same problem appeared when I executed the script again.

Building common vocab
Traceback (most recent call last):
  File "lama/vocab_intersection.py", line 7, in <module>
    from lama.modules import build_model_by_name
ModuleNotFoundError: No module named 'lama'
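
This error usually means that the repository root is not on Python's import path. Besides installing the package from the repository root, one possible workaround is to add that directory to `sys.path` explicitly. The sketch below assumes the current directory is the repository root containing the `lama/` package; `add_repo_to_path` is a hypothetical helper.

```python
import importlib.util
import os
import sys


def add_repo_to_path(repo_root):
    """Put repo_root at the front of sys.path so `import lama` can resolve."""
    repo_root = os.path.abspath(repo_root)
    if repo_root not in sys.path:
        sys.path.insert(0, repo_root)
    return repo_root


# Assumption: the current directory is the repository root containing `lama/`.
add_repo_to_path(".")
found = importlib.util.find_spec("lama") is not None
print("lama importable:", found)
```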

mac os installation issue

Hi,

I'm on Mojave, and using

CFLAGS="-Wno-deprecated-declarations -std=c++11 -stdlib=libc++" pip install --editable .

gives me

error: invalid argument '-std=c++11' not allowed with 'C'
error: command 'gcc' failed with exit status 1

Failed building wheel for cytoolz

Any solution please?

Missing relation data for LAMA dataset

I tried to replicate the results but I ran into a problem. There are some T-REx relations like P166, P69, P54 that are mentioned in the relations file but don't have data files in the downloadable dataset. Were the results in the paper achieved without such relations or are these relations missing in the dataset?

No Matching distribution found for torch

Hi everyone,

I want to install LAMA on my Windows 10 laptop and this error keeps interrupting the installation:

Can you help me ?

Thanks

Mathieu

Does the project contain baseline code (RE_o) for the LAMA probe?

I am doing some work that follows the reference paper and have reproduced the LM results. To evaluate the baseline with the LAMA probe, is there a baseline script in the project, or do we need to rewrite this part and evaluate it on the LAMA probe data ourselves?

Multiple Valid Objects

The "Language Models as Knowledge Bases" paper, in Section 4.4, mentions that the evaluation deals with multiple valid objects for the same subject and relation pair. Specifically, valid objects other than the one being tested are removed from the ranked list of answers before computing the metrics. However, the TREx data released in this repository only includes one object per tested fact, even for queries where multiple valid answers do exist (based on my browsing of Wikidata). So, does the LAMA evaluation account for multiple valid objects? If yes, how does it do that, given that the multiple objects are not in the data?
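
The filtering step described above can be sketched as follows. This is an illustration of the idea only, not the code used for the paper; `filtered_rank` and the example data are made up.

```python
def filtered_rank(ranked_predictions, target, other_valid_objects):
    """1-based rank of target after removing other valid objects, or None."""
    # Drop competing answers that are also correct for the same query.
    blocked = set(other_valid_objects) - {target}
    kept = [p for p in ranked_predictions if p not in blocked]
    return kept.index(target) + 1 if target in kept else None


# Made-up example: two valid objects for the same subject-relation pair.
ranked = ["French", "English", "German", "Spanish"]
print(filtered_rank(ranked, "English", ["French", "English"]))
```

Without the list of other valid objects, the same prediction list would rank the target second instead of first, which is why the missing objects in the released data matter for the metrics.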

Could you provide your common_vocab_*.txt

I have tried running the download_models.sh script. However, my machine can't handle Transformer-XL. Could you release your processed common_vocab_*.txt files?

AttributeError: 'NoneType' object has no attribute 'ids_to_tokens'

Hello,
I followed the tutorial, but when I run "./download_models.sh", I encounter this problem:

$ ./download_models.sh
BERT BASE LOWERCASED
BERT BASE CASED
Building common vocab
...
Traceback (most recent call last):
  File "lama/vocab_intersection.py", line 160, in <module>
    main()
  File "lama/vocab_intersection.py", line 154, in main
    __vocab_intersection(CASED_MODELS, CASED_COMMON_VOCAB_FILENAME)
  File "lama/vocab_intersection.py", line 99, in __vocab_intersection
    model = build_model_by_name(args.lm, args)
  File "/LAMA-master/lama/modules/__init__.py", line 33, in build_model_by_name
    return MODEL_NAME_TO_CLASS[lm](args)
  File "/LAMA-master/lama/modules/bert_connector.py", line 79, in __init__
    self.vocab = list(self.tokenizer.ids_to_tokens.values())
AttributeError: 'NoneType' object has no attribute 'ids_to_tokens'

Actually, when I try to run the experiments, I get:

$ python scripts/run_experiments.py
Model name 'pre-trained_language_models/transformerxl/transfo-xl-wt103/' was not found in model name list (transfo-xl-wt103). We assumed 'pre-trained_language_models/transformerxl/transfo-xl-wt103/' was a path or url but couldn't find files pre-trained_language_models/transformerxl/transfo-xl-wt103/vocab.bin at this path or url.
Traceback (most recent call last):
  File "scripts/run_experiments.py", line 214, in <module>
    run_all_LMs(parameters)
  File "scripts/run_experiments.py", line 207, in run_all_LMs
    run_experiments(*parameters, input_param=ip, use_negated_probes=False)
  File "scripts/run_experiments.py", line 125, in run_experiments
    model = build_model_by_name(model_type_name, args)
  File "LAMA-master/lama/modules/__init__.py", line 33, in build_model_by_name
    return MODEL_NAME_TO_CLASS[lm](args)
  File "LAMA-master/lama/modules/transformerxl_connector.py", line 33, in __init__
    self.vocab = list(self.tokenizer.idx2sym)
AttributeError: 'NoneType' object has no attribute 'idx2sym'

I wonder if this is the cause: for simplicity, I only downloaded the two BERT BASE models (cased and uncased), so I commented out all the other models in download_models.sh.

Any help will be appreciated, thanks a lot!

Negated templates results are not shown even though use_negated_probes=True

Hi,

I set use_negated_probes=True in run_experiments.py, but the written results are still for positive templates. Are the negated template results being written somewhere?

Thanks

Reproducing ELMo base results on Google-RE

Hi there,

Thank you for releasing the code to this wonderful project! I'm trying to reproduce the results on Google-RE (birth-date and death-place relations) but wasn't able to match the reported numbers in the paper. This is the output I get for ELMo original:

Birth-Date Google-RE
all_samples: 1825 list_of_results: 1825 global MRR: 0.00018831351372720835 global Precision at 10: 0.0 global Precision at 1: 0.0 P@1 : 0.0

Death-Place Google-RE
all_samples: 765 list_of_results: 765 global MRR: 3.727018365172404e-06 global Precision at 10: 0.0 global Precision at 1: 0.0 P@1 : 0.0

It seems like the paper reports Mean Precision@1 scores of 0.1 and 0.3 respectively for the above two relations.

Am I using the repository incorrectly? Please let me know how I can reproduce the results :)

Thank you in advance!

ConceptNet in Negated-LAMA

Hi,

In the paper, the ConceptNet part of Negated-LAMA contains 2996 items.

But the ConceptNet file in https://dl.fbaipublicfiles.com/LAMA/negated_data.tar.gz contains 20000+ items, apparently without any filtering.

Best,
Deming

Error in running ./download_models.sh

Hi,

When I execute ./download_models.sh, I encounter the following problem:

lowercase models
OpenAI GPT
BERT BASE LOWERCASED
BERT LARGE LOWERCASED
cased models
Transformer XL
ELMO ORIGINAL 5.5B
ELMO ORIGINAL
BERT BASE CASED
BERT LARGE CASED
Building common vocab
Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex.
Traceback (most recent call last):
  File "lama/vocab_intersection.py", line 7, in <module>
    from lama.modules import build_model_by_name
  File "/data/datasets/yuke/LAMA/lama/modules/__init__.py", line 8, in <module>
    from .elmo_connector import Elmo
  File "/data/datasets/yuke/LAMA/lama/modules/elmo_connector.py", line 10, in <module>
    from allennlp.modules.elmo import _ElmoBiLm #, Elmo as AllenNLP_Elmo
  File "/home/yuke_wang/anaconda3/envs/lama37/lib/python3.7/site-packages/allennlp/modules/__init__.py", line 9, in <module>
    from allennlp.modules.elmo import Elmo
  File "/home/yuke_wang/anaconda3/envs/lama37/lib/python3.7/site-packages/allennlp/modules/elmo.py", line 20, in <module>
    from allennlp.modules.highway import Highway
  File "/home/yuke_wang/anaconda3/envs/lama37/lib/python3.7/site-packages/allennlp/modules/highway.py", line 12, in <module>
    class Highway(torch.nn.Module):
  File "/home/yuke_wang/anaconda3/envs/lama37/lib/python3.7/site-packages/allennlp/modules/highway.py", line 49, in Highway
    def forward(self, inputs: torch.Tensor) -> torch.Tensor:  # pylint: disable=arguments-differ
  File "/home/yuke_wang/anaconda3/envs/lama37/lib/python3.7/site-packages/overrides/overrides.py", line 83, in overrides
    return _overrides(method, check_signature, check_at_runtime)
  File "/home/yuke_wang/anaconda3/envs/lama37/lib/python3.7/site-packages/overrides/overrides.py", line 170, in _overrides
    _validate_method(method, super_class, check_signature)
  File "/home/yuke_wang/anaconda3/envs/lama37/lib/python3.7/site-packages/overrides/overrides.py", line 189, in _validate_method
    ensure_signature_is_compatible(super_method, method, is_static)
  File "/home/yuke_wang/anaconda3/envs/lama37/lib/python3.7/site-packages/overrides/signature.py", line 113, in ensure_signature_is_compatible
    method_name,
  File "/home/yuke_wang/anaconda3/envs/lama37/lib/python3.7/site-packages/overrides/signature.py", line 220, in ensure_all_positional_args_defined_in_sub
    raise TypeError(f"{method_name}: `{super_param.name}` must be present")
TypeError: Highway.forward: `input` must be present

Is there any solution to this? Thanks!

Mathematical expression corpus

Could LAMA be extended to accept mathematical corpora in order to learn a mathematical knowledge base?

Windows, Ubuntu, Allennlp

Hi There - I could use your help on a few questions I have:

I am on Win10 and running Ubuntu with Linux dev. I was wondering why I am getting the following error when I run "python scripts/run_experiments.py":

Traceback (most recent call last):
  File "scripts/run_experiments.py", line 8, in <module>
    from batch_eval_KB_completion import main as run_evaluation
  File "/home/USER/LAMA/scripts/batch_eval_KB_completion.py", line 7, in <module>
    from lama.modules import build_model_by_name
  File "build/bdist.linux-x86_64/egg/lama/modules/__init__.py", line 8, in <module>
  File "build/bdist.linux-x86_64/egg/lama/modules/elmo_connector.py", line 10, in <module>
ImportError: No module named allennlp.modules.elmo

I have reinstalled Allennlp several times and upgraded. Same error.

Also, I was wondering if I can run LAMA on Windows without using Linux at all. Thank you!

How to load a huggingface RoBERTa checkpoint?

Hello,
Thanks for your great work.
I have one problem. I see that "roberta_connector.py" in the project loads a fairseq RoBERTa, but I want to load a Hugging Face RoBERTa. What should I do? Could you give me some advice?
Thanks a lot.

A tutorial on how to hack on LAMA with MacOS

Any chance you can add a detailed tutorial on how to work with LAMA on macOS (M1 or M2 chips)?
Many developers use Macs, and it would be helpful to know how to start hacking on this powerful open-source project from a Mac.

Use a fine-tuned model

Do you have examples of using a fine-tuned model for the evaluation? I have saved the checkpoint, but just editing the LM list in run_experiments.py made the code fail.

Hi,

when I run ./download_models.sh, I get the following exception:

Building common vocab
Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex.
Namespace(lm='transformerxl', transformerxl_model_dir='pre-trained_language_models/transformerxl/transfo-xl-wt103/')
Loading transformerxl model...
Loading Transformer XL model from pre-trained_language_models/transformerxl/transfo-xl-wt103/
Traceback (most recent call last):
  File "lama/vocab_intersection.py", line 158, in <module>
    main()
  File "lama/vocab_intersection.py", line 152, in main
    __vocab_intersection(CASED_MODELS, CASED_COMMON_VOCAB_FILENAME)
  File "lama/vocab_intersection.py", line 97, in __vocab_intersection
    model = build_model_by_name(args.lm, args)
  File "/LAMA/lama/modules/__init__.py", line 31, in build_model_by_name
    return MODEL_NAME_TO_CLASS[lm](args)
  File "/LAMA/lama/modules/transformerxl_connector.py", line 37, in __init__
    self.model = TransfoXLLMHeadModel.from_pretrained(model_name)
  File "/home/user123/anaconda3/envs/lama37/lib/python3.7/site-packages/pytorch_pretrained_bert/modeling_transfo_xl.py", line 939, in from_pretrained
    model = cls(config, *inputs, **kwargs)
  File "/home/user123/anaconda3/envs/lama37/lib/python3.7/site-packages/pytorch_pretrained_bert/modeling_transfo_xl.py", line 1312, in __init__
    self.transformer = TransfoXLModel(config)
  File "/home/user123/anaconda3/envs/lama37/lib/python3.7/site-packages/pytorch_pretrained_bert/modeling_transfo_xl.py", line 1033, in __init__
    div_val=config.div_val)
  File "/home/user123/anaconda3/envs/lama37/lib/python3.7/site-packages/pytorch_pretrained_bert/modeling_transfo_xl.py", line 780, in __init__
    self.emb_layers.append(nn.Embedding(r_idx-l_idx, d_emb_i))
  File "/home/user123/anaconda3/envs/lama37/lib/python3.7/site-packages/torch/nn/modules/sparse.py", line 100, in __init__
    self.weight = Parameter(torch.Tensor(num_embeddings, embedding_dim))
RuntimeError: $ Torch: invalid memory size -- maybe an overflow? at /pytorch/aten/src/TH/THGeneral.cpp:188

I tried different (newer) versions of torch, but that led to the exact same dimension error that JXZe reports in Issue #32:

      RuntimeError: Trying to create tensor with negative dimension -200001: [-200001, 16]

But in #32 there is no recommendation on how to fix this dimension error.

Does anybody have an idea how to fix this?

Originally posted by @blrtvs in #47

Duplicates in data?

Hi,

The following is most likely a misunderstanding on my part but I notice that there are many duplicates and pseudo-duplicates in the jsonl files.

For instance, this line in lama/TREx/P17.jsonl:

{
	"uuid": "df10f035-6269-4cdf-88df-26395e0dc3b4",
	"obj_uri": "Q16",
	"obj_label": "Canada",
	"sub_uri": "Q7517499",
	"sub_label": "Simcoe Composite School",
	"predicate_id": "P17",
	"evidences": [{
		"sub_surface": "Simcoe Composite School",
		"obj_surface": "Canada",
		"masked_sentence": "Simcoe Composite School is a high school in Simcoe, Ontario, [MASK]."
	}, {
		"sub_surface": "Simcoe Composite School",
		"obj_surface": "Canada",
		"masked_sentence": "Simcoe Composite School is a high school in Simcoe, Ontario, [MASK]."
	}]
}

has two evidences both of which are the same. This is not always the case, i.e., in many other cases the evidences are different sentences.

Further, in the conceptnet corpora, apparently every UUID appears twice. As an example, here are two instances with the same UUID:

{
	"sub": "alive",
	"obj": "think",
	"pred": "HasSubevent",
	"masked_sentences": ["One of the things you do when you are alive is [MASK]."],
	"obj_label": "think",
	"uuid": "d4f11631dde8a43beda613ec845ff7d1"
}

and

{
	"pred": "HasSubevent",
	"masked_sentences": ["One of the things you do when you are alive is [MASK]."],
	"obj_label": "think",
	"uuid": "d4f11631dde8a43beda613ec845ff7d1",
	"sub_label": "alive"
}

Here, the second instance lacks the sub and obj fields (and carries sub_label instead) but otherwise seems unchanged.

So, based on this, my question is:

Are the duplicates intentional? For instance, when computing metrics of my model over the probe, am I to treat the task as-is and if need be, make predictions twice over the same instance?

Alternatively, I could easily remove the duplicates when processing the files. Should I do that instead? Have others done that?

I know for a fact that LAMA on HuggingFace datasets (https://huggingface.co/datasets/lama) contains these duplicates.
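
If removing the duplicates turns out to be the right choice, a minimal sketch of de-duplicating by UUID could look like this. It keeps the first record seen for each uuid; whether the first or the second variant is the one to keep is an open question, and `dedupe_by_uuid` is a hypothetical helper. The sample records are shortened versions of the two instances quoted above.

```python
import json


def dedupe_by_uuid(lines):
    """Keep the first record seen for each uuid in an iterable of JSON lines."""
    seen = set()
    unique = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if record["uuid"] in seen:
            continue  # duplicate uuid: drop the later record
        seen.add(record["uuid"])
        unique.append(record)
    return unique


# Two shortened records sharing the uuid quoted above.
sample = [
    '{"uuid": "d4f11631dde8a43beda613ec845ff7d1", "sub": "alive", "obj_label": "think"}',
    '{"uuid": "d4f11631dde8a43beda613ec845ff7d1", "sub_label": "alive", "obj_label": "think"}',
]
print(len(dedupe_by_uuid(sample)))
```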

import allennlp.modules.highway fails

I just installed LAMA without error on Ubuntu 20.
The run_experiments script immediately crashes because of an import.
When I open Python and just run

import allennlp.modules

I've got this error:

>>> import allennlp.modules
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/xtof/anaconda3/envs/lama37/lib/python3.7/site-packages/allennlp/modules/__init__.py", line 9, in <module>
    from allennlp.modules.elmo import Elmo
  File "/home/xtof/anaconda3/envs/lama37/lib/python3.7/site-packages/allennlp/modules/elmo.py", line 20, in <module>
    from allennlp.modules.highway import Highway
  File "/home/xtof/anaconda3/envs/lama37/lib/python3.7/site-packages/allennlp/modules/highway.py", line 12, in <module>
    class Highway(torch.nn.Module):
  File "/home/xtof/anaconda3/envs/lama37/lib/python3.7/site-packages/allennlp/modules/highway.py", line 49, in Highway
    def forward(self, inputs: torch.Tensor) -> torch.Tensor:  # pylint: disable=arguments-differ
  File "/home/xtof/anaconda3/envs/lama37/lib/python3.7/site-packages/overrides/overrides.py", line 88, in overrides
    return _overrides(method, check_signature, check_at_runtime)
  File "/home/xtof/anaconda3/envs/lama37/lib/python3.7/site-packages/overrides/overrides.py", line 114, in _overrides
    _validate_method(method, super_class, check_signature)
  File "/home/xtof/anaconda3/envs/lama37/lib/python3.7/site-packages/overrides/overrides.py", line 135, in _validate_method
    ensure_signature_is_compatible(super_method, method, is_static)
  File "/home/xtof/anaconda3/envs/lama37/lib/python3.7/site-packages/overrides/signature.py", line 104, in ensure_signature_is_compatible
    method_name,
  File "/home/xtof/anaconda3/envs/lama37/lib/python3.7/site-packages/overrides/signature.py", line 211, in ensure_all_positional_args_defined_in_sub
    raise TypeError(f"{method_name}: `{super_param.name}` must be present")
TypeError: Highway.forward: `input` must be present

Thank you !

Code to create the data

Hi all,

Is the code to create the sentences from KBs available? e.g. the code with the templating and the code to generate the sentences with [MASK]?

Thanks,
Arian

RuntimeError: Trying to create tensor with negative dimension -200001: [-200001, 16]

For anyone else encountering this error when running the vocab_intersection.py, it looks like there is an issue with the config.json in transfo-xl-wt103 directory.

Update the config.json to the JSON in the link below and it should resolve the error.
https://huggingface.co/transfo-xl/transfo-xl-wt103/raw/50554b1a7e440d988096dbdf0b3a0edc73470d3d/config.json

Hope this helps!
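
One possible reading of the -200001 in the error, offered as an assumption rather than something confirmed in this thread: if the adaptive embedding sizes each cluster as the difference between neighbouring cutoffs, and the loaded config ends up with a vocabulary size of -1, the last cluster gets a negative size. The cutoffs and vocabulary size below are assumed values for transfo-xl-wt103, and `cluster_sizes` is a hypothetical helper.

```python
def cluster_sizes(cutoffs, n_token):
    """Sizes of the adaptive-embedding clusters implied by cutoffs and n_token."""
    # Assumption: cluster boundaries are [0] + cutoffs + [n_token], and each
    # embedding table is sized as the difference of neighbouring boundaries.
    ends = [0] + list(cutoffs) + [n_token]
    return [right - left for left, right in zip(ends, ends[1:])]


# Assumed values for transfo-xl-wt103; -1 models a config that does not
# provide a usable vocabulary size.
cutoffs = [20000, 40000, 200000]
print(cluster_sizes(cutoffs, 267735))
print(cluster_sizes(cutoffs, -1))
```

Under these assumptions the second call produces a last cluster of size -200001, which matches the first dimension in the error message and would explain why replacing config.json helps.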

KG evaluation datasets?

Nice work on the EMNLP paper "Language Models as Knowledge Bases?", and thanks for making the code available.

Are the datasets available to reproduce the results from the paper, namely Table 2?

evaluating customized model (similar to BERT)

Is there an option to load and evaluate a custom pre-trained BERT model?

Can it be tried on subword ?

Hi, I just wanted to know whether this can be extended to the subword level.
Can we mask like this:

Sentence: The cat is sitting at highest point.
Input: The cat is sitting at high_[MASK]_ point

Should the model be able to predict highest?

Results on GPT

Hi,
Do you plan to release results on GPT?
Especially on GPT-2, if you ran experiments.

Thanks for this project, great job here

GPT-2 Support

Are you going to support GPT-2? There is unofficial training code here: https://github.com/nshepperd/gpt-2

Requirements in requirements.txt and setup.py are different

Hi,

Just wanted to point out that the requirement versions in setup.py and requirements.txt are different (e.g. 'fairseq==0.6.1' vs fairseq==0.8.0).

Thanks!

🐛Bug for common_vocab

Hi, I encountered the same problem as in #10.
I found that the 2 examples are filtered because their obj_label values are 1970s and 1990s, and the common_vocab_cased.txt generated by vocab_intersection.py contains neither 1970s nor 1990s.

236: {"masked_sentences": ["Income inequality began to increase in the US in the [MASK]."], "obj_label": "1970s", "id": "57287b322ca10214002da3bf_0", "sub_label": "Squad"}
206: {"masked_sentences": ["The perception of Genghis Khan in Mongolia brightened in the [MASK]."], "obj_label": "1990s", "id": "5727404b708984140094db59_0", "sub_label": "Squad"}

Reproduce results in the paper

Hi,

I was trying to reproduce results by running your code, and couldn't get exactly the same precision on SQuAD.
Here is what I got for bert_large model on SQuAD:
all_samples: 303
list_of_results: 303
global MRR: 0.3018861233236291
global Precision at 10: 0.5676567656765676
global Precision at 1: 0.16831683168316833

However, in the paper, the table shows that there should be 305 samples and the precision should be 17.4%.

At first, I guessed that 2 samples are excluded because their object labels are outside the common vocabulary, but even after testing without the common vocabulary, I got global Precision at 1: 0.1704918, which still differs from the result in the paper.

Is there a way to reproduce the same results in the paper?
Please correct me if I made any mistakes! Thanks!
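
The precision values above can be converted back into counts of correct top-1 predictions, which makes the size of the gap easier to see. This is a small arithmetic sketch; `hits` is a hypothetical helper, and the paper value is taken as the rounded 17.4% quoted in this issue.

```python
def hits(precision, n_samples):
    """Number of correct top-1 predictions implied by a P@1 value."""
    return round(precision * n_samples)


print(hits(0.16831683168316833, 303))  # run with the common vocabulary
print(hits(0.1704918, 305))            # run without the common vocabulary
print(hits(0.174, 305))                # value quoted from the paper (17.4%)
```

Under that reading, the run without the common vocabulary and the paper figure differ by a single correct prediction out of 305.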

Running setup.py install for fastBPE ... error

I am struggling with the requirements step:
fastBPE and fairseq are the last 2 modules to be installed, and I run into a problem:

How to fix that ?

Thank you for your time,

Mathieu

The template for P27 in T-REx is wrong

Hi,

The template of P27 (T-REx) gets zero accuracy, but I find that this is because the template is wrong:
[CLS] Harashima is [MASK] citizen . rightanswer: Japan modelpredict: Japanese
In this case, it's actually right for the model to predict Japanese instead of Japan.

So I changed it to [X] is a citizen of [Y]. And the accuracy for this specific relation goes up to 55%.
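
To see how the two templates differ once they are filled in, here is a minimal sketch of template substitution. The original template string is inferred from the probe sentence quoted above, so treat it as an assumption; `fill_template` is a hypothetical helper, not the function used in LAMA.

```python
def fill_template(template, subject, mask_token="[MASK]"):
    """Replace [X] with the subject and [Y] with the mask token."""
    return template.replace("[X]", subject).replace("[Y]", mask_token)


# The first template is inferred from the probe sentence quoted above.
old_template = "[X] is [Y] citizen ."
new_template = "[X] is a citizen of [Y]."
print(fill_template(old_template, "Harashima"))
print(fill_template(new_template, "Harashima"))
```

The first sentence leaves a slot where an adjective such as "Japanese" fits naturally, while the second asks for a country name such as "Japan".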

Negated templates for T-REx?

Hi,

When trying to run experiments, I'm getting errors in T-REx regarding "template_negated" (full error below). It seems the file relations.jsonl contains a number of relations where the key "template" is present but the key "template_negated" is not, which makes this part of the code (in run_experiments.py) fail:

if "template" in relation:
    PARAMETERS["template"] = relation["template"]
    PARAMETERS["template_negated"] = relation["template_negated"]

Is the file relations.jsonl missing these keys?

Thanks!

Error + relation print:

2. T-REx
bert_base
{'description': 'most specific known '
                '(e.g. city instead of '
                'country, or hospital '
                'instead of city) birth '
                'location of a person, '
                'animal or fictional '
                'character',
 'label': 'place of birth',
 'relation': 'P19',
 'template': '[X] was born in [Y] .',
 'type': 'N-1'}
Traceback (most recent call last):
  File "scripts/run_experiments.py", line 215, in <module>
    run_all_LMs(parameters)
  File "scripts/run_experiments.py", line 204, in run_all_LMs
    run_experiments(*parameters, input_param=ip, use_negated_probes=True)
  File "scripts/run_experiments.py", line 114, in run_experiments
    PARAMETERS["template_negated"] = relation["template_negated"]
KeyError: 'template_negated'
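
One way to sidestep the KeyError, sketched here as an assumption rather than the maintainers' fix, is to read the optional key with `dict.get` so that relations without a negated template yield `None`. `set_templates` is a hypothetical helper; any code that later uses the negated template would still need to handle the `None` case.

```python
def set_templates(parameters, relation):
    """Copy template fields from a relation, tolerating a missing negated one."""
    if "template" in relation:
        parameters["template"] = relation["template"]
        # .get avoids the KeyError when "template_negated" is absent.
        parameters["template_negated"] = relation.get("template_negated")
    return parameters


relation = {"relation": "P19", "template": "[X] was born in [Y] ."}
print(set_templates({}, relation))
```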

how do i get uniforms for my business split sides comedy club

looking to get uniforms for split sides comedy club