Comments (10)
This is because you are not using the constrained search as shown in the example code. https://github.com/facebookresearch/GENRE/tree/main/examples_genre
from genre.
Thanks! But I thought that it should always give me a valid entity name, even without any constraint on the candidate set.
In the e2e examples, you constrain the candidates via "candidates_trie" with a few candidates, including the name "Einstein". But how can I constrain the search for a sentence where I do not know which entities it contains and hence cannot create a candidate list?
from genre.
Then the candidates_trie is a trie with all possible entities in your KB, similar to what is shown for Entity Disambiguation.
from genre.
Thanks. I tried the following: passing the whole BPE prefix tree as candidates_trie:
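import pickle
from genre.trie import Trie

# Load the precomputed KILT titles trie and pass it as the candidate set.
with open("../data/kilt_titles_trie_dict.pkl", "rb") as f:
    trie = Trie.load_from_dict(pickle.load(f))
prefix_allowed_tokens_fn = get_prefix_allowed_tokens_fn(
    model,
    sentences,
    candidates_trie=trie)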
But that does not work: the output does not make sense. This is what you suggested, right? Or is it more complicated, and I have to rewrite the code in entity_linking.py?
from genre.
As I show in the example, the trie needs to be formatted as follows:
candidates_trie=Trie([
    model.encode(" }} [ {} ]".format(e))[1:].tolist()
    for e in ["Albert Einstein", "Nobel Prize in Physics", "NIL"]
])
The trie from ../data/kilt_titles_trie_dict.pkl is not formatted like that. You need to generate the trie from a list of valid entity names (i.e. all titles from Wikipedia).
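To illustrate why the formatting of each trie entry matters, here is a minimal sketch of a prefix trie used for constrained decoding. It is not GENRE's own `Trie` class: `PrefixTrie` and `toy_encode` are hypothetical names, and the whitespace tokenizer only stands in for `model.encode`. The point is that every stored sequence must include the same " } [ ... ]" wrapper the decoder emits, otherwise no stored prefix matches what has been generated so far.

```python
class PrefixTrie:
    """Toy prefix trie over token sequences (illustrative, not GENRE's Trie)."""

    def __init__(self, sequences=()):
        self.root = {}
        for seq in sequences:
            self.add(seq)

    def add(self, seq):
        # Walk down the nested dicts, creating nodes as needed.
        node = self.root
        for tok in seq:
            node = node.setdefault(tok, {})

    def next_tokens(self, prefix):
        # Return the tokens allowed after `prefix`, or [] if the prefix is unknown.
        node = self.root
        for tok in prefix:
            if tok not in node:
                return []
            node = node[tok]
        return sorted(node)


def toy_encode(text):
    # Hypothetical stand-in for model.encode: whitespace tokens instead of BPE ids.
    return text.split()


entity_names = ["Albert Einstein", "Albert Camus", "NIL"]
trie = PrefixTrie(toy_encode(" }} [ {} ]".format(e)) for e in entity_names)

print(trie.next_tokens(["}", "["]))            # tokens that may start an entity name
print(trie.next_tokens(["}", "[", "Albert"]))  # continuations after "Albert"
print(trie.next_tokens(["Albert"]))            # unwrapped prefix: nothing is allowed
```

A trie built from bare titles, without the wrapper, would behave like the last call: the decoder's prefix never matches, so the constraint cannot guide generation.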
from genre.
Sorry for asking again.
As you suggested in #56, I used the KILT knowledge source to extract all Wikipedia titles.
I saved all titles in the list candidate_list and did what you suggested:
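from genre.hf_model import GENRE
from genre.trie import Trie
from genre.entity_linking import get_end_to_end_prefix_allowed_tokens_fn_hf as get_prefix_allowed_tokens_fn

model = GENRE.from_pretrained("models/hf_e2e_entity_linking_aidayago").eval()
sentences = ["For some people he's the John Travolta of early 80's art."]
# candidate_list holds all Wikipedia titles extracted from the KILT knowledge source.
prefix_allowed_tokens_fn = get_prefix_allowed_tokens_fn(
    model,
    sentences,
    candidates_trie=Trie([
        model.encode(" }} [ {} ]".format(e))[1:].tolist()
        for e in candidate_list]))
print(model.sample(
    sentences,
    prefix_allowed_tokens_fn=prefix_allowed_tokens_fn))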
Output:
[[{'text': "For some { people } [ People (magazine) ] he's the { John Travolta } [ John Tromp ] of early 80's { art } [ Art ].", 'score': tensor(-1.0483)}, {'text': "For some { people } [ People (magazine) ] he's the { John Travolta } [ John Tromp ] of early 80's art.", 'score': tensor(-1.1079)}, {'text': "For some { people } [ People (magazine) ] he's the { John Travolta } [ John Tromp ] of early 80's { art } [ Visual arts ].", 'score': tensor(-1.2112)}, {'text': "For some { people } [ People (magazine) ] he's the { John Travolta } [ John Tromp ] of early 80's { art } [ Fine art ].", 'score': tensor(-1.2184)}, {'text': "For some { people } [ People (magazine) ] he's the { John Travolta } [ John Tromp ] of early 80's { art } [ Artist ].", 'score': tensor(-1.2463)}]]
The output for John Travolta was not John Travolta, as expected, but John Tromp. If I use the disambiguation model and tag John Travolta, the output is John Travolta. But it should be the same, right?
I checked the candidate_list, and John Travolta is in it.
from genre.
This looks correct to me. One suggestion: save your trie so you do not need to recompute it every time.
The disambiguation model and the end2end linking model are not the same, so they may give different outputs. The disambiguation model is usually much more precise than the end2end linking model.
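A minimal sketch of caching the trie on disk, so the full title list is encoded only once. The earlier snippet in this thread loads a trie with `Trie.load_from_dict(pickle.load(f))`, so the sketch assumes the trie can be exported as a plain nested dict; how to obtain that dict from a GENRE `Trie` (for example through an attribute such as `trie_dict`) is an assumption to verify against the installed version. The helper names and the toy dict below are hypothetical.

```python
import os
import pickle
import tempfile


def save_trie_dict(trie_dict, path):
    # Persist the nested-dict form of the trie; the highest protocol keeps large files compact.
    with open(path, "wb") as f:
        pickle.dump(trie_dict, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_trie_dict(path):
    # Read the nested dict back; with GENRE it would then go to Trie.load_from_dict(...).
    with open(path, "rb") as f:
        return pickle.load(f)


# Toy stand-in for a real trie dict: token ids as keys, children as nested dicts.
toy_trie_dict = {2: {10: {11: {}}, 12: {}}}

path = os.path.join(tempfile.mkdtemp(), "candidates_trie_dict.pkl")
save_trie_dict(toy_trie_dict, path)
restored = load_trie_dict(path)
print(restored == toy_trie_dict)
```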
from genre.
Okay, thank you.
I thought that, after finding a mention in the text, the end2end linking model links it to a Wikipedia entity in the same way as the disambiguation model.
Then, it would make sense to first compute all mentions with the e2e model, discard its linked Wikipedia entities, and use the disambiguation model to link those mentions, wouldn't it?
from genre.
The two models operate in different ways. Please refer to the paper for details.
> Then, it would make sense to first compute all mentions with the e2e model, discard its linked Wikipedia entities, and use the disambiguation model to link those mentions, wouldn't it?
Yes, I think so. It is also faster to use something like FLAIR to get the mentions.
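The two-stage idea discussed above can be sketched as follows: take the mentions from an end2end output string, drop the linked entities, and re-tag each mention for the disambiguation model. The "{ mention } [ Entity ]" pattern is taken from the output shown earlier in this thread; the "[START_ENT] ... [END_ENT]" input markers are an assumption about the disambiguation model's expected input, and both helper functions are hypothetical names.

```python
import re

# Matches "{ mention } [ Entity ]" as it appears in the e2e output above.
ANNOTATION = re.compile(r"\{ (.+?) \} \[ (.+?) \]")


def extract_mentions(annotated):
    """Return (plain_text, [(start, end, mention, e2e_entity), ...])."""
    plain_parts, mentions = [], []
    cursor = 0  # position in the annotated string
    offset = 0  # length of the plain text rebuilt so far
    for match in ANNOTATION.finditer(annotated):
        before = annotated[cursor:match.start()]
        plain_parts.append(before)
        offset += len(before)
        mention, entity = match.group(1), match.group(2)
        mentions.append((offset, offset + len(mention), mention, entity))
        plain_parts.append(mention)
        offset += len(mention)
        cursor = match.end()
    plain_parts.append(annotated[cursor:])
    return "".join(plain_parts), mentions


def tag_for_disambiguation(plain, start, end):
    # Assumed input format of the disambiguation model: one tagged mention per sentence.
    return f"{plain[:start]}[START_ENT] {plain[start:end]} [END_ENT]{plain[end:]}"


annotated = ("For some { people } [ People (magazine) ] he's the "
             "{ John Travolta } [ John Tromp ] of early 80's art.")
plain, mentions = extract_mentions(annotated)
tagged = [tag_for_disambiguation(plain, s, e) for s, e, _, _ in mentions]
for sentence in tagged:
    print(sentence)
```

Each tagged sentence could then be passed to the disambiguation model, whose prediction replaces the entity the e2e model proposed for that mention.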
from genre.
Thank you for the fast and detailed answers! :)
from genre.