facebookresearch / lama
LAnguage Model Analysis
License: Other
In the Nexus Mods Horizon Zero Dawn mods, the game crashes whenever Fireclaw does its ground lava fountain attack. Please fix it as soon as possible. Thanks 👍🔥❤️
WE ARE WAITING!
Good morning,
I should say up front that I'm new to macOS: I first tried to install the requirements on Windows 10, but I ran into many issues installing all the prerequisites manually, so I'm trying to switch.
When I try to download the pre-trained models (in particular "BERT BASE CASED") I get this error:
./download_models.sh: line 12: realpath: command not found
Ok, I see. The provided `common_vocab_cased.txt` does have `1990s` and `1970s`. So I think the problem probably lies in `vocab_intersection.py`?
Originally posted by @Hannibal046 in #51 (comment)
Hi,
I followed the requirements to install the environment.
But when I run scripts/run_experiments.py, an error is raised in __max_probs_values_indices:
RuntimeError: invalid argument 2: k not in range for dimension at /pytorch/aten/src/TH/generic/THTensorMoreMath.cpp:1190
I checked: log_prob has size 28 but k is 10000, which is out of range.
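One way to avoid this kind of out-of-range error is to cap k at the number of available scores before taking the top k. The sketch below shows the idea with the standard library only. `top_k_indices` is a hypothetical helper, not the repository's `__max_probs_values_indices`; with PyTorch, the analogous guard would be passing `min(k, log_probs.size(-1))` to `torch.topk`.

```python
import heapq

def top_k_indices(scores, k):
    """Return the indices of the k largest scores, capping k at len(scores)."""
    k = min(k, len(scores))  # guard: never request more items than exist
    return heapq.nlargest(k, range(len(scores)), key=scores.__getitem__)

# Hypothetical scores; k deliberately exceeds the number of entries.
log_probs = [-2.3, -0.1, -5.0, -1.2]
print(top_k_indices(log_probs, 10000))  # prints [1, 3, 0, 2]
```

Whether capping k is the right fix here, or whether the small vocabulary itself points to a setup problem, is not settled in this thread.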
Hello,
Have you ever evaluated RoBERTa on LAMA-T-REx and LAMA-Google-RE?
Thank you for open-sourcing and maintaining such a great project! :)
I have an issue related to RoBERTa evaluation on LAMA.
To evaluate the RoBERTa performance on LAMA, I downloaded RoBERTa {base,large} checkpoints from the fairseq repository. Then I slightly modified run_experiments.py to add RoBERTa to the target LMs as follows:
LMs = [
{
"lm": "roberta",
"label": "roberta",
"models_names": ["roberta"],
"roberta_model_name": "model.pt",
"roberta_model_dir": "pre-trained_language_models/roberta.large/",
"roberta_vocab_name": "dict.txt"
}, ...
]
Although the RoBERTa large model is loaded correctly, many warnings show up saying that words from vocab_subset are not in the model vocabulary.
2020-03-12 17:55:33,651 - LAMA - WARNING - word wingspan from vocab_subset not in model vocabulary!
2020-03-12 17:55:33,651 - LAMA - WARNING - word woken from vocab_subset not in model vocabulary!
2020-03-12 17:55:33,651 - LAMA - WARNING - word wooded from vocab_subset not in model vocabulary!
2020-03-12 17:55:33,651 - LAMA - WARNING - word wrestled from vocab_subset not in model vocabulary!
2020-03-12 17:55:33,651 - LAMA - WARNING - word wrinkled from vocab_subset not in model vocabulary!
2020-03-12 17:55:33,651 - LAMA - WARNING - word yanked from vocab_subset not in model vocabulary!
2020-03-12 17:55:33,651 - LAMA - WARNING - word yd from vocab_subset not in model vocabulary!
2020-03-12 17:55:33,651 - LAMA - WARNING - word yellowish from vocab_subset not in model vocabulary!
2020-03-12 17:55:33,651 - LAMA - WARNING - word yer from vocab_subset not in model vocabulary!
2020-03-12 17:55:33,651 - LAMA - WARNING - word zu from vocab_subset not in model vocabulary!
2020-03-12 17:55:33,651 - LAMA - WARNING - word zur from vocab_subset not in model vocabulary!
Then, it was terminated by the error shown below:
Traceback (most recent call last):
File "scripts/run_experiments.py", line 221, in <module>
run_all_LMs(parameters)
File "scripts/run_experiments.py", line 214, in run_all_LMs
run_experiments(*parameters, input_param=ip, use_negated_probes=False)
File "scripts/run_experiments.py", line 135, in run_experiments
Precision1 = run_evaluation(args, shuffle_data=False, model=model)
File "/home/akari/projects/LAMA/scripts/batch_eval_KB_completion.py", line 390, in main
model, data, vocab_subset, args.max_sentence_length, args.template
File "/home/akari/projects/LAMA/scripts/batch_eval_KB_completion.py", line 233, in filter_samples
if obj_label_ids:
RuntimeError: bool value of Tensor with more than one value is ambiguous
Do you have any insights into why many words from vocab_subset are not found in the model vocabulary, and how we could solve the RuntimeError?
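The RuntimeError in the traceback above comes from using a multi-element tensor directly in an `if` condition. A common workaround is to test emptiness explicitly by length. The sketch below is an assumption-laden illustration: `has_label_ids` is a hypothetical helper, and `AmbiguousBool` is a stand-in class that mimics the tensor's refusal to convert to `bool`. It is not the repository's `filter_samples` logic, and it assumes the ids form a one-dimensional container.

```python
class AmbiguousBool(list):
    """Stand-in for a tensor: raises when used as a truth value."""
    def __bool__(self):
        raise RuntimeError("bool value of container with more than one value is ambiguous")

def has_label_ids(obj_label_ids):
    """True when at least one id is present, without calling bool() on the container."""
    return obj_label_ids is not None and len(obj_label_ids) > 0

ids = AmbiguousBool([101, 2023])
print(has_label_ids(ids))   # prints True
print(has_label_ids([]))    # prints False
```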
Hi,
When I run ./download_models.sh, I get the following exception:
Building common vocab
Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex.
Namespace(lm='transformerxl', transformerxl_model_dir='pre-trained_language_models/transformerxl/transfo-xl-wt103/')
Loading transformerxl model...
Loading Transformer XL model from pre-trained_language_models/transformerxl/transfo-xl-wt103/
Traceback (most recent call last):
File "lama/vocab_intersection.py", line 158, in <module>
main()
File "lama/vocab_intersection.py", line 152, in main
__vocab_intersection(CASED_MODELS, CASED_COMMON_VOCAB_FILENAME)
File "lama/vocab_intersection.py", line 97, in __vocab_intersection
model = build_model_by_name(args.lm, args)
File "/LAMA/lama/modules/__init__.py", line 31, in build_model_by_name
return MODEL_NAME_TO_CLASS[lm](args)
File "/LAMA/lama/modules/transformerxl_connector.py", line 37, in __init__
self.model = TransfoXLLMHeadModel.from_pretrained(model_name)
File "/home/user123/anaconda3/envs/lama37/lib/python3.7/site-packages/pytorch_pretrained_bert/modeling_transfo_xl.py", line 939, in from_pretrained
model = cls(config, *inputs, **kwargs)
File "/home/user123/anaconda3/envs/lama37/lib/python3.7/site-packages/pytorch_pretrained_bert/modeling_transfo_xl.py", line 1312, in __init__
self.transformer = TransfoXLModel(config)
File "/home/user123/anaconda3/envs/lama37/lib/python3.7/site-packages/pytorch_pretrained_bert/modeling_transfo_xl.py", line 1033, in __init__
div_val=config.div_val)
File "/home/user123/anaconda3/envs/lama37/lib/python3.7/site-packages/pytorch_pretrained_bert/modeling_transfo_xl.py", line 780, in __init__
self.emb_layers.append(nn.Embedding(r_idx-l_idx, d_emb_i))
File "/home/user123/anaconda3/envs/lama37/lib/python3.7/site-packages/torch/nn/modules/sparse.py", line 100, in __init__
self.weight = Parameter(torch.Tensor(num_embeddings, embedding_dim))
RuntimeError: $ Torch: invalid memory size -- maybe an overflow? at /pytorch/aten/src/TH/THGeneral.cpp:188
I tried different (newer) versions of torch, but that led to the exact same dimension error that JXZe reports in Issue #32:
RuntimeError: Trying to create tensor with negative dimension -200001: [-200001, 16]
But in #32 there is no recommendation on how to fix this dimension error.
All the packages from requirements.txt are installed correctly, except that I have overrides==3.1.0 instead of overrides==6.1.0: the import "from allennlp.modules.elmo import _ElmoBiLm" in elmo_connector.py worked only after changing to 3.1.0. I also tried skipping the vocab-building step and downloading the provided common_vocab .txt files from the README, but the same "Torch: invalid memory size -- maybe an overflow?" error occurs when running run_experiments.py.
Does anybody have an idea how to fix this?
run_experiments.py uses a TREx directory which does not exist in the published dataset (data.zip). However, the dataset contains a TREx_alpaca directory. Is it OK to change the directory name in this line to TREx_alpaca to run experiments?
Hi there,
How were the sentences (evidences for T-REx and considered_sentences for GoogleRE) selected for T-REx and GoogleRE? Was it done manually or using some script?
Thanks,
I'm getting this error when trying to run ./download_models.sh
File "/home/Downloads/LAMA-main/lama/vocab_intersection.py", line 7, in <module>
from lama.modules import build_model_by_name
ModuleNotFoundError: No module named 'lama'
The tokenizer.vocab_size is 30522, but it should be 28996, the same as config.vocab_size.
I suggest simply downloading the model from Hugging Face.
Hi, thank you for open-sourcing this great project!
I looked into the datasets provided in this repository (https://dl.fbaipublicfiles.com/LAMA/data.zip) and some of their sizes do not match the sizes described in the paper.
ConceptNet: 11458 (paper) vs 29774 (dataset)
Google-RE death-place: 765 (paper) vs 766 (dataset)
Also, for the TREx dataset, could you explain how the sentences are selected from the 'evidences' in each line of the jsonl file? There seem to be multiple 'masked_sentence' entries in 'evidences'.
Thank you.
How can I make the label wrap onto the next line if the label is too long in a mat-form-field in Angular?
If anybody knows, please share the solution.
I got the following error while executing ./download_dataset.sh for the first time. The same problem appears when I tried to execute the same script again.
Building common vocab
Traceback (most recent call last):
File "lama/vocab_intersection.py", line 7, in <module>
from lama.modules import build_model_by_name
ModuleNotFoundError: No module named 'lama'
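A `ModuleNotFoundError` for `lama` usually means the repository root is not on Python's import path. Installing the package in editable mode or exporting the repository root in `PYTHONPATH` are the usual remedies. As a minimal sketch of the same idea from inside Python, the hypothetical helper `ensure_on_path` below prepends a directory to `sys.path`; the `"."` argument assumes the script is started from the repository root.

```python
import sys
from pathlib import Path

def ensure_on_path(repo_root):
    """Prepend repo_root to sys.path if it is absent; return the resolved path."""
    root = str(Path(repo_root).resolve())
    if root not in sys.path:
        sys.path.insert(0, root)  # earlier entries take precedence on import
    return root

# Assumption: the current working directory is the LAMA checkout.
print(ensure_on_path(".") in sys.path)  # prints True
```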
Hi,
I'm on Mojave, and using
CFLAGS="-Wno-deprecated-declarations -std=c++11 -stdlib=libc++" pip install --editable .
gives me
error: invalid argument '-std=c++11' not allowed with 'C'
error: command 'gcc' failed with exit status 1
Failed building wheel for cytoolz
Any solution please?
I tried to replicate the results but I ran into a problem. There are some T-REx relations like P166, P69, P54 that are mentioned in the relations file but don't have data files in the downloadable dataset. Were the results in the paper achieved without such relations or are these relations missing in the dataset?
I am doing some work that follows the reference paper, and I have reproduced the results for the LMs. I would now like to evaluate the baselines with the LAMA probe. Is there a baseline script in the project, or do we need to rewrite this part ourselves and evaluate it on the LAMA probe data?
The "Language Models as Knowledge Bases" paper, in Section 4.4, mentions that the evaluation deals with multiple valid objects for the same subject and relation pair. Specifically, valid objects other than the one being tested are removed from the ranked list of answers before computing the metrics. However, the TREx data released in this repository includes only one object per tested fact, even for queries where multiple valid answers exist (based on my browsing of Wikidata). So, does the LAMA evaluation account for multiple valid objects? If so, how does it do that, given that the multiple objects are not in the data?
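To make the procedure described in this question concrete, here is a minimal sketch of ranking with other valid objects removed. It illustrates the idea as stated above, not the repository's implementation; `filtered_rank` and the example data are hypothetical.

```python
def filtered_rank(ranked, gold, other_valid):
    """1-based rank of gold after removing the other valid objects from the ranking."""
    skip = set(other_valid) - {gold}          # never filter out the tested object
    kept = [tok for tok in ranked if tok not in skip]
    return kept.index(gold) + 1

# Hypothetical model ranking for a query with three valid answers.
ranked = ["Paris", "Lyon", "London", "Nice"]
print(filtered_rank(ranked, "Nice", ["Paris", "Lyon"]))  # prints 2
```

Without the filtering step the tested object would sit at rank 4, so the extra valid answers would count against the model.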
I have tried running the download_models.sh script. However, my machine can't deal with Transformer XL. Could you release your processed common_vocab_*.txt?
Hello,
I followed the tutorial, but when I run "./download_models.sh", I get this problem:
$ ./download_models.sh
BERT BASE LOWERCASED
BERT BASE CASED
Building common vocab
...
Traceback (most recent call last):
File "lama/vocab_intersection.py", line 160, in <module>
main()
File "lama/vocab_intersection.py", line 154, in main
__vocab_intersection(CASED_MODELS, CASED_COMMON_VOCAB_FILENAME)
File "lama/vocab_intersection.py", line 99, in __vocab_intersection
model = build_model_by_name(args.lm, args)
File "/LAMA-master/lama/modules/__init__.py", line 33, in build_model_by_name
return MODEL_NAME_TO_CLASS[lm](args)
File "/LAMA-master/lama/modules/bert_connector.py", line 79, in __init__
self.vocab = list(self.tokenizer.ids_to_tokens.values())
AttributeError: 'NoneType' object has no attribute 'ids_to_tokens'
Actually, when I try to run the experiments, I get:
$ python scripts/run_experiments.py
Model name 'pre-trained_language_models/transformerxl/transfo-xl-wt103/' was not found in model name list (transfo-xl-wt103). We assumed 'pre-trained_language_models/transformerxl/transfo-xl-wt103/' was a path or url but couldn't find files pre-trained_language_models/transformerxl/transfo-xl-wt103/vocab.bin at this path or url.
Traceback (most recent call last):
File "scripts/run_experiments.py", line 214, in <module>
run_all_LMs(parameters)
File "scripts/run_experiments.py", line 207, in run_all_LMs
run_experiments(*parameters, input_param=ip, use_negated_probes=False)
File "scripts/run_experiments.py", line 125, in run_experiments
model = build_model_by_name(model_type_name, args)
File "LAMA-master/lama/modules/__init__.py", line 33, in build_model_by_name
return MODEL_NAME_TO_CLASS[lm](args)
File "LAMA-master/lama/modules/transformerxl_connector.py", line 33, in __init__
self.vocab = list(self.tokenizer.idx2sym)
AttributeError: 'NoneType' object has no attribute 'idx2sym'
I wonder if this is the cause: for simplicity, I only downloaded the two BERT BASE models (CASED and UNCASED), so I commented out all the other models in download_models.sh.
Any help will be appreciated, thanks a lot!
Hi,
I set use_negated_probes=True in run_experiments.py, but the written results are still for positive templates. Are the negated template results being written somewhere?
Thanks
Hi there,
Thank you for releasing the code to this wonderful project! I'm trying to reproduce the results on Google-RE (birth-date and death-place relations) but wasn't able to match the reported numbers in the paper. This is the output I get for ELMo original:
Birth-Date Google-RE
all_samples: 1825 list_of_results: 1825 global MRR: 0.00018831351372720835 global Precision at 10: 0.0 global Precision at 1: 0.0 P@1 : 0.0
Death-Place Google-RE
all_samples: 765 list_of_results: 765 global MRR: 3.727018365172404e-06 global Precision at 10: 0.0 global Precision at 1: 0.0 P@1 : 0.0
It seems like the paper reports Mean Precision@1 scores of 0.1 and 0.3 respectively for the above two relations.
Am I using the repository incorrectly? Please let me know how I can reproduce the results :)
Thank you in advance!
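For readers comparing these numbers, the sketch below computes mean reciprocal rank and precision at k from a list of gold-answer ranks. It assumes the "global MRR" and "global Precision at k" values in the log follow these standard definitions, which this thread does not confirm; the function names and ranks are hypothetical.

```python
def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all samples (ranks are 1-based)."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def precision_at_k(ranks, k):
    """Fraction of samples whose gold answer is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

ranks = [1, 3, 12, 2]  # hypothetical gold ranks for four samples
print(precision_at_k(ranks, 1))   # prints 0.25
print(precision_at_k(ranks, 10))  # prints 0.75
print(round(mean_reciprocal_rank(ranks), 4))  # prints 0.4792
```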
Hi,
In the paper, the ConceptNet set in Negated-LAMA contains 2996 items.
But the ConceptNet set in https://dl.fbaipublicfiles.com/LAMA/negated_data.tar.gz contains 20000+ items, unfiltered.
Best,
Deming
Hi,
When I execute ./download_models.sh, I encounter the following problem:
lowercase models
OpenAI GPT
BERT BASE LOWERCASED
BERT LARGE LOWERCASED
cased models
Transformer XL
ELMO ORIGINAL 5.5B
ELMO ORIGINAL
BERT BASE CASED
BERT LARGE CASED
Building common vocab
Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex.
Traceback (most recent call last):
File "lama/vocab_intersection.py", line 7, in <module>
from lama.modules import build_model_by_name
File "/data/datasets/yuke/LAMA/lama/modules/__init__.py", line 8, in <module>
from .elmo_connector import Elmo
File "/data/datasets/yuke/LAMA/lama/modules/elmo_connector.py", line 10, in <module>
from allennlp.modules.elmo import _ElmoBiLm #, Elmo as AllenNLP_Elmo
File "/home/yuke_wang/anaconda3/envs/lama37/lib/python3.7/site-packages/allennlp/modules/__init__.py", line 9, in <module>
from allennlp.modules.elmo import Elmo
File "/home/yuke_wang/anaconda3/envs/lama37/lib/python3.7/site-packages/allennlp/modules/elmo.py", line 20, in <module>
from allennlp.modules.highway import Highway
File "/home/yuke_wang/anaconda3/envs/lama37/lib/python3.7/site-packages/allennlp/modules/highway.py", line 12, in <module>
class Highway(torch.nn.Module):
File "/home/yuke_wang/anaconda3/envs/lama37/lib/python3.7/site-packages/allennlp/modules/highway.py", line 49, in Highway
def forward(self, inputs: torch.Tensor) -> torch.Tensor: # pylint: disable=arguments-differ
File "/home/yuke_wang/anaconda3/envs/lama37/lib/python3.7/site-packages/overrides/overrides.py", line 83, in overrides
return _overrides(method, check_signature, check_at_runtime)
File "/home/yuke_wang/anaconda3/envs/lama37/lib/python3.7/site-packages/overrides/overrides.py", line 170, in _overrides
_validate_method(method, super_class, check_signature)
File "/home/yuke_wang/anaconda3/envs/lama37/lib/python3.7/site-packages/overrides/overrides.py", line 189, in _validate_method
ensure_signature_is_compatible(super_method, method, is_static)
File "/home/yuke_wang/anaconda3/envs/lama37/lib/python3.7/site-packages/overrides/signature.py", line 113, in ensure_signature_is_compatible
method_name,
File "/home/yuke_wang/anaconda3/envs/lama37/lib/python3.7/site-packages/overrides/signature.py", line 220, in ensure_all_positional_args_defined_in_sub
raise TypeError(f"{method_name}: `{super_param.name}` must be present")
TypeError: Highway.forward: `input` must be present
Is there any solution to this? Thanks!
Could LAMA be augmented to accept mathematical corpora in order to learn a mathematical knowledge base?
Hi There - I could use your help on a few questions I have:
I am on Win10, running Ubuntu with Linux dev. Why am I getting the following error when I run "python scripts/run_experiments.py"?
Traceback (most recent call last):
File "scripts/run_experiments.py", line 8, in <module>
from batch_eval_KB_completion import main as run_evaluation
File "/home/USER/LAMA/scripts/batch_eval_KB_completion.py", line 7, in <module>
from lama.modules import build_model_by_name
File "build/bdist.linux-x86_64/egg/lama/modules/__init__.py", line 8, in <module>
File "build/bdist.linux-x86_64/egg/lama/modules/elmo_connector.py", line 10, in <module>
ImportError: No module named allennlp.modules.elmo
I have reinstalled AllenNLP several times and upgraded it, but I get the same error.
Also, I was wondering whether I can run LAMA on Windows without using Linux at all. Thank you!
Hello,
Thanks for your great work.
I have one problem. I saw that "roberta_connector.py" in the project loads a fairseq RoBERTa, but I want to load a Hugging Face RoBERTa. What should I do? Could you give me some advice?
Thanks a lot.
Any chance you can add a detailed tutorial on how to work with LAMA on macOS (M1 or M2 chips)?
Many developers use Macs, and it would be helpful to know how to start hacking on this powerful open-source project from a Mac.
Do you have examples of using a fine-tuned model for the evaluation? I have saved the checkpoint, but the code failed when I just edited the LM list in run_experiments.py.
Hi,
The following is most likely a misunderstanding on my part but I notice that there are many duplicates and pseudo-duplicates in the jsonl files.
For instance, this line in lama/TREx/P17.jsonl:
{
"uuid": "df10f035-6269-4cdf-88df-26395e0dc3b4",
"obj_uri": "Q16",
"obj_label": "Canada",
"sub_uri": "Q7517499",
"sub_label": "Simcoe Composite School",
"predicate_id": "P17",
"evidences": [{
"sub_surface": "Simcoe Composite School",
"obj_surface": "Canada",
"masked_sentence": "Simcoe Composite School is a high school in Simcoe, Ontario, [MASK]."
}, {
"sub_surface": "Simcoe Composite School",
"obj_surface": "Canada",
"masked_sentence": "Simcoe Composite School is a high school in Simcoe, Ontario, [MASK]."
}]
}
has two evidences, both of which are the same. This is not always the case, i.e., in many other cases the evidences are different sentences.
Further, in the conceptnet corpora, apparently every UUID appears twice. As an example, here are two instances with the same UUID:
{
"sub": "alive",
"obj": "think",
"pred": "HasSubevent",
"masked_sentences": ["One of the things you do when you are alive is [MASK]."],
"obj_label": "think",
"uuid": "d4f11631dde8a43beda613ec845ff7d1"
}
and
{
"pred": "HasSubevent",
"masked_sentences": ["One of the things you do when you are alive is [MASK]."],
"obj_label": "think",
"uuid": "d4f11631dde8a43beda613ec845ff7d1",
"sub_label": "alive"
}
Here, the second time, the instance does not have the fields sub and obj, but otherwise seems to remain unchanged.
So, based on this, my question is:
Are the duplicates intentional? For instance, when computing metrics of my model over the probe, am I to treat the task as-is and, if need be, make predictions twice over the same instance?
Alternatively, I could easily remove the duplicates when processing the files. Should I do that instead? Have others done that?
I know for a fact that LAMA on HuggingFace datasets (https://huggingface.co/datasets/lama) contains these duplicates.
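If one decides to drop the duplicates (whether they are intentional remains an open question in this thread), a minimal sketch could look like the following. Both helpers are hypothetical: `dedupe_by_uuid` keeps the first record seen for each `uuid`, and `dedupe_evidences` removes repeated entries inside a record's `evidences` list.

```python
import json

def dedupe_by_uuid(lines):
    """Parse jsonl lines and keep only the first record for each uuid."""
    seen, unique = set(), []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if record["uuid"] not in seen:
            seen.add(record["uuid"])
            unique.append(record)
    return unique

def dedupe_evidences(record):
    """Return a copy of record with identical evidence entries collapsed."""
    seen, kept = set(), []
    for evidence in record.get("evidences", []):
        key = json.dumps(evidence, sort_keys=True)  # order-independent identity
        if key not in seen:
            seen.add(key)
            kept.append(evidence)
    return {**record, "evidences": kept}
```

Note that keeping the first record per `uuid` would retain the ConceptNet variant that still carries the `sub` and `obj` fields only if that variant appears first in the file.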
I just installed LAMA without error on Ubuntu 20.
The run_experiment script immediately crashes because of an import.
When I open Python and just run import allennlp.modules, I get this error:
>>> import allennlp.modules
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/xtof/anaconda3/envs/lama37/lib/python3.7/site-packages/allennlp/modules/__init__.py", line 9, in <module>
from allennlp.modules.elmo import Elmo
File "/home/xtof/anaconda3/envs/lama37/lib/python3.7/site-packages/allennlp/modules/elmo.py", line 20, in <module>
from allennlp.modules.highway import Highway
File "/home/xtof/anaconda3/envs/lama37/lib/python3.7/site-packages/allennlp/modules/highway.py", line 12, in <module>
class Highway(torch.nn.Module):
File "/home/xtof/anaconda3/envs/lama37/lib/python3.7/site-packages/allennlp/modules/highway.py", line 49, in Highway
def forward(self, inputs: torch.Tensor) -> torch.Tensor: # pylint: disable=arguments-differ
File "/home/xtof/anaconda3/envs/lama37/lib/python3.7/site-packages/overrides/overrides.py", line 88, in overrides
return _overrides(method, check_signature, check_at_runtime)
File "/home/xtof/anaconda3/envs/lama37/lib/python3.7/site-packages/overrides/overrides.py", line 114, in _overrides
_validate_method(method, super_class, check_signature)
File "/home/xtof/anaconda3/envs/lama37/lib/python3.7/site-packages/overrides/overrides.py", line 135, in _validate_method
ensure_signature_is_compatible(super_method, method, is_static)
File "/home/xtof/anaconda3/envs/lama37/lib/python3.7/site-packages/overrides/signature.py", line 104, in ensure_signature_is_compatible
method_name,
File "/home/xtof/anaconda3/envs/lama37/lib/python3.7/site-packages/overrides/signature.py", line 211, in ensure_all_positional_args_defined_in_sub
raise TypeError(f"{method_name}: `{super_param.name}` must be present")
TypeError: Highway.forward: `input` must be present
Thank you !
Hi all,
Is the code to create the sentences from KBs available? e.g. the code with the templating and the code to generate the sentences with [MASK]?
Thanks,
Arian
For anyone else encountering this error when running the vocab_intersection.py, it looks like there is an issue with the config.json in transfo-xl-wt103 directory.
Update the config.json to the JSON in the link below and it should resolve the error.
https://huggingface.co/transfo-xl/transfo-xl-wt103/raw/50554b1a7e440d988096dbdf0b3a0edc73470d3d/config.json
Hope this helps!
Nice work on the EMNLP paper "Language Models as Knowledge Bases?", and thanks for making the code available.
Are the datasets available to reproduce the results from the paper, namely Table 2?
Is there an option to load and evaluate a custom pre-trained BERT model?
Hi, I just wanted to know whether this can be extended to the subword level.
Can we mask like this:
Sentence: The cat is sitting at highest point.
Input: The cat is sitting at high_[MASK]_ point
Should the model be able to predict highest?
Hi,
Do you plan to release results on GPT?
Especially on GPT-2, if you ran experiments.
Thanks for this project, great job here
Are you going to support GPT-2? There is an unofficial training implementation here: https://github.com/nshepperd/gpt-2
Hi,
Just wanted to point out that the requirement versions in setup.py and requirements.txt are different (e.g. 'fairseq==0.6.1' vs fairseq==0.8.0).
Thanks!
Hi, I encounter the same problem as in #10.
And I found the reason why 2 examples are filtered: their obj_label values are 1970s and 1990s, and the common_vocab_cased.txt generated by vocab_intersection.py contains neither 1970s nor 1990s.
236: {"masked_sentences": ["Income inequality began to increase in the US in the [MASK]."], "obj_label": "1970s", "id": "57287b322ca10214002da3bf_0", "sub_label": "Squad"}
206: {"masked_sentences": ["The perception of Genghis Khan in Mongolia brightened in the [MASK]."], "obj_label": "1990s", "id": "5727404b708984140094db59_0", "sub_label": "Squad"}
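The filtering effect described here can be reproduced in isolation. The sketch below splits samples by whether their `obj_label` appears in a vocabulary set; `split_by_vocab` and the tiny vocabulary are hypothetical and only mirror the behaviour reported in this issue, not the repository's filtering code.

```python
def split_by_vocab(samples, vocab):
    """Split samples into (kept, dropped) by membership of obj_label in vocab."""
    kept, dropped = [], []
    for sample in samples:
        target = kept if sample["obj_label"] in vocab else dropped
        target.append(sample)
    return kept, dropped

samples = [
    {"obj_label": "1970s", "sub_label": "Squad"},
    {"obj_label": "1990s", "sub_label": "Squad"},
    {"obj_label": "Canada", "sub_label": "Squad"},
]
vocab = {"Canada", "Japan"}  # hypothetical common vocabulary without the decades
kept, dropped = split_by_vocab(samples, vocab)
print(len(kept), len(dropped))  # prints 1 2
```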
Hi,
I was trying to reproduce results by running your code, and couldn't get exactly the same precision on SQuAD.
Here is what I got for bert_large model on SQuAD:
all_samples: 303
list_of_results: 303
global MRR: 0.3018861233236291
global Precision at 10: 0.5676567656765676
global Precision at 1: 0.16831683168316833
However, the table in the paper shows 305 samples and a precision of 17.4%.
At first, I guessed that 2 samples were excluded because their object labels are outside the common vocabulary, but even after testing without the common vocabulary, I got global Precision at 1: 0.1704918, which still differs from the results in the paper.
Is there a way to reproduce the same results in the paper?
Please correct me if I made any mistakes! Thanks!
Hi,
The template of P27 (T-REx) got zero accuracy, but I found that this is because the template is wrong:
[CLS] Harashima is [MASK] citizen . rightanswer: Japan modelpredict: Japanese
In this case, it's actually right for the model to predict Japanese instead of Japan.
So I changed it to [X] is a citizen of [Y]. And the accuracy for this specific relation goes up to 55%.
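To see how the two templates differ once instantiated, here is a minimal sketch of template filling. `fill_template` is a hypothetical helper based on plain string replacement; the repository's own template handling may differ.

```python
def fill_template(template, subject, obj="[MASK]"):
    """Substitute the subject for [X] and the object (or a mask) for [Y]."""
    return template.replace("[X]", subject).replace("[Y]", obj)

old = "[X] is [Y] citizen ."
new = "[X] is a citizen of [Y] ."
print(fill_template(old, "Harashima"))  # prints Harashima is [MASK] citizen .
print(fill_template(new, "Harashima"))  # prints Harashima is a citizen of [MASK] .
```

The first sentence grammatically calls for a demonym such as "Japanese", while the second calls for a country name, which matches the gold label "Japan".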
Hi,
When trying to run experiments, I'm getting errors in T-REx regarding "template_negated" (full error below). It seems the file relations.jsonl contains a bunch of examples where the key "template" is present but the key "template_negated" is not, which makes this part of the code (in run_experiments.py) fail:
if "template" in relation:
PARAMETERS["template"] = relation["template"]
PARAMETERS["template_negated"] = relation["template_negated"]
Is the file relations.jsonl missing these keys?
Thanks!
Error + relation print:
2. T-REx
bert_base
{'description': 'most specific known '
'(e.g. city instead of '
'country, or hospital '
'instead of city) birth '
'location of a person, '
'animal or fictional '
'character',
'label': 'place of birth',
'relation': 'P19',
'template': '[X] was born in [Y] .',
'type': 'N-1'}
Traceback (most recent call last):
File "scripts/run_experiments.py", line 215, in <module>
run_all_LMs(parameters)
File "scripts/run_experiments.py", line 204, in run_all_LMs
run_experiments(*parameters, input_param=ip, use_negated_probes=True)
File "scripts/run_experiments.py", line 114, in run_experiments
PARAMETERS["template_negated"] = relation["template_negated"]
KeyError: 'template_negated'
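A defensive way to read an optional key is `dict.get`, which returns `None` instead of raising `KeyError`. The sketch below applies that to the snippet quoted in this issue. `template_parameters` is a hypothetical helper, and whether the downstream evaluation accepts a missing negated template (for instance by skipping the relation when `use_negated_probes=True`) is an assumption that would need checking.

```python
def template_parameters(relation):
    """Collect template settings, tolerating a missing 'template_negated' key."""
    params = {}
    if "template" in relation:
        params["template"] = relation["template"]
        # .get yields None when the key is absent, avoiding the KeyError
        params["template_negated"] = relation.get("template_negated")
    return params

relation = {"relation": "P19", "template": "[X] was born in [Y] .", "type": "N-1"}
print(template_parameters(relation)["template_negated"])  # prints None
```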