Comments (5)
Can you post the full error stack?
from genre.
Sure.
2023-02-03 09:13:50 | INFO | fairseq.tasks.fairseq_task | can_reuse_epoch_itr = False
2023-02-03 09:13:50 | INFO | fairseq.tasks.fairseq_task | reuse_dataloader = True
2023-02-03 09:13:50 | INFO | fairseq.tasks.fairseq_task | rebuild_batches = False
2023-02-03 09:13:50 | INFO | fairseq.tasks.fairseq_task | creating new batches for epoch 1
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In [137], line 1
----> 1 model.sample(
2 sentences=["[START] Einstein [END] era un fisico tedesco."],
3 # Italian for "[START] Einstein [END] was a German physicist."
4 prefix_allowed_tokens_fn=lambda batch_id, sent: [
5 e for e in trie.get(sent.tolist()) if e < len(model.task.target_dictionary)
6 ],
7 text_to_id=lambda x: max(lang_title2wikidataID[
8 tuple(reversed(x.split(" >> ")))
9 ], key=lambda y: int(y[1:])),
10 marginalize=True,
11 )
File ~/GENRE/genre/fairseq_model.py:53, in _GENREHubInterface.sample(self, sentences, beam, verbose, text_to_id, marginalize, marginalize_lenpen, max_len_a, max_len_b, **kwargs)
36 batched_hypos = self.generate(
37 tokenized_sentences,
38 beam,
(...)
42 **kwargs,
43 )
45 outputs = [
46 [
47 {"text": self.decode(hypo["tokens"]), "score": hypo["score"]}
(...)
50 for hypos in batched_hypos
51 ]
---> 53 outputs = post_process_wikidata(
54 outputs, text_to_id=text_to_id, marginalize=marginalize
55 )
57 return outputs
File ~/GENRE/genre/utils.py:492, in post_process_wikidata(outputs, text_to_id, marginalize)
486 outputs = [
487 [{**hypo, "id": text_to_id(hypo["text"])} for hypo in hypos]
488 for hypos in outputs
489 ]
491 if marginalize:
--> 492 for (i, hypos), hypos_tok in zip(enumerate(outputs), batched_hypos):
493 outputs_dict = defaultdict(list)
494 for hypo, hypo_tok in zip(hypos, hypos_tok):
NameError: name 'batched_hypos' is not defined
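For anyone reading along: the traceback suggests that post_process_wikidata refers to batched_hypos, a name that only exists as a local variable inside _GENREHubInterface.sample. A minimal sketch of that kind of scoping error, using hypothetical stand-in functions (caller_broken, helper_broken, helper_fixed) rather than the real GENRE code:

```python
def caller_broken():
    # batched_hypos is a local variable of the caller only.
    batched_hypos = [["tok"]]
    return helper_broken([["out"]])


def helper_broken(outputs):
    # batched_hypos is neither a parameter nor a global here,
    # so Python raises NameError when this line runs.
    return list(zip(outputs, batched_hypos))


def helper_fixed(outputs, batched_hypos=None):
    # Passing the value in explicitly makes the name visible.
    return list(zip(outputs, batched_hypos))


try:
    caller_broken()
    error_message = None
except NameError as exc:
    error_message = str(exc)

fixed_result = helper_fixed([["out"]], batched_hypos=[["tok"]])
print(error_message)
print(fixed_result)
```

A function cannot see the local variables of whoever called it, so the value has to be passed in as an argument.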
Has the problem been solved? How did you solve it?
Same issue here. Any update?
The solution is to modify post_process_wikidata (in genre/utils.py) so that it receives batched_hypos as an argument:
from collections import defaultdict

import torch


def post_process_wikidata(outputs, text_to_id=False, marginalize=False,
                          batched_hypos=None, marginalize_lenpen=0.5):
    if text_to_id:
        # Attach an ID to every hypothesis via the user-supplied mapping.
        outputs = [
            [{**hypo, "id": text_to_id(hypo["text"])} for hypo in hypos]
            for hypos in outputs
        ]

        if marginalize:
            # batched_hypos now arrives as an argument instead of being
            # looked up as an undefined name.
            for (i, hypos), hypos_tok in zip(enumerate(outputs), batched_hypos):
                # Group hypotheses that map to the same ID.
                outputs_dict = defaultdict(list)
                for hypo, hypo_tok in zip(hypos, hypos_tok):
                    outputs_dict[hypo["id"]].append(
                        {**hypo, "len": len(hypo_tok["tokens"])}
                    )

                # One entry per ID, scored by logsumexp over its hypotheses.
                outputs[i] = sorted(
                    [
                        {
                            "id": _id,
                            "texts": [hypo["text"] for hypo in hypos],
                            "scores": torch.stack([hypo["score"] for hypo in hypos]),
                            "score": torch.stack(
                                [
                                    hypo["score"]
                                    * hypo["len"]
                                    / (hypo["len"] ** marginalize_lenpen)
                                    for hypo in hypos
                                ]
                            ).logsumexp(-1),
                        }
                        for _id, hypos in outputs_dict.items()
                    ],
                    key=lambda x: x["score"],
                    reverse=True,
                )

    return outputs
Then call it from _GENREHubInterface.sample (in genre/fairseq_model.py) with:
outputs = post_process_wikidata(
    outputs,
    text_to_id=text_to_id,
    marginalize=marginalize,
    batched_hypos=batched_hypos,
    marginalize_lenpen=marginalize_lenpen,
)
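To see what the marginalize branch computes, here is a torch-free sketch of the same arithmetic in plain Python. It is an illustration under assumptions, not the GENRE implementation: the helper name marginalize_scores, the IDs "A" and "B", and all scores and lengths are made-up toy values.

```python
import math
from collections import defaultdict


def marginalize_scores(hypos, lenpen=0.5):
    """Group hypotheses by ID and combine their rescaled scores with logsumexp."""
    grouped = defaultdict(list)
    for hypo in hypos:
        # Same rescaling as in the fix: score * len / len ** lenpen.
        rescaled = hypo["score"] * hypo["len"] / (hypo["len"] ** lenpen)
        grouped[hypo["id"]].append(rescaled)

    merged = []
    for _id, scores in grouped.items():
        # Numerically stable logsumexp: subtract the maximum first.
        top = max(scores)
        total = top + math.log(sum(math.exp(s - top) for s in scores))
        merged.append({"id": _id, "score": total})

    # Highest combined score first.
    return sorted(merged, key=lambda x: x["score"], reverse=True)


# Hypothetical toy hypotheses: two surface forms map to "A", one to "B".
toy_hypos = [
    {"id": "A", "score": -0.5, "len": 4},
    {"id": "A", "score": -1.0, "len": 4},
    {"id": "B", "score": -0.4, "len": 4},
]
ranked = marginalize_scores(toy_hypos)
print(ranked)
```

In this toy input the single best hypothesis belongs to "B", but "A" ranks first after marginalization because its two hypotheses pool their probability mass.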