persiannlp / parsinlu
A comprehensive suite of high-level NLP tasks for Persian language
Home Page: https://arxiv.org/abs/2012.06154
License: Other
The lines are not proper JSON. Unlike Python syntax, JSON strings only use double-quotations: https://stackoverflow.com/a/4162651/1164246
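A minimal sketch of the difference, assuming a line written as a Python dict literal (the sample line below is made up for illustration and is not taken from the dataset). `json.loads` rejects single-quoted strings, while `ast.literal_eval` can read the literal and `json.dumps` can re-serialize it as valid JSON:

```python
import ast
import json

# Hypothetical sample line written as a Python dict literal (single quotes).
python_style_line = "{'question': 'sample', 'label': 'e'}"


def is_valid_json(text):
    """Return True if `text` parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


def python_literal_to_json(text):
    """Safely parse a Python literal and re-serialize it as JSON."""
    record = ast.literal_eval(text)
    # ensure_ascii=False keeps Persian text readable instead of \u escapes.
    return json.dumps(record, ensure_ascii=False)


fixed_line = python_literal_to_json(python_style_line)
```

Here `is_valid_json(python_style_line)` is false and `is_valid_json(fixed_line)` is true, so the same conversion could be applied line by line to a whole file.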
I ran train_and_evaluate_entailment_baselines.sh and got results in /scripts/runs/. All events.out.tfevents files have the same size of 40 bytes and are essentially empty. How can I solve this?
What is ParsiNLU's best model for entailment?
Have you trained 'persiannlp/parsbert-base-parsinlu-entailment' on the FarsTail dataset?
Hello,
I am trying to use the reading comprehension data in another model. I found that most questions have several answers. I tried to concatenate these answers, but some of them complement each other while others are duplicates.
For example:
I would like to know how I should use the answers to solve this problem.
Thanks for your consideration.
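One way to handle the duplicates before concatenating can be sketched as follows. It assumes that an answer contained verbatim in a longer answer is redundant, which is an assumption and not something the dataset documentation states; the sample answers in the usage note are invented. In SQuAD-style evaluation, multiple answers are usually treated as alternative references rather than joined, so concatenation may not be needed at all.

```python
def merge_answers(answers):
    """Drop exact duplicates and answers contained in a longer answer."""
    # Normalize whitespace, skip empty strings, and keep first occurrences only.
    unique = []
    for answer in answers:
        text = " ".join(answer.split())
        if text and text not in unique:
            unique.append(text)
    # Assumption: an answer that is a substring of another answer adds nothing.
    return [a for a in unique if not any(a != b and a in b for b in unique)]
```

For example, `merge_answers(["Tehran", "Tehran", "in Tehran, Iran", "1979"])` keeps only `"in Tehran, Iran"` and `"1979"`.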
Hi, first I want to thank you for sharing your great work. I'm new to the NLP field and have a task to do, so I looked at your model on Hugging Face and I want to fine-tune it. How can I fine-tune your model with the Python transformers package?
Thanks a lot
I cloned the package and installed requirements.txt.
Then I went to the scripts directory and ran this command:
./train_and_evaluate_reading_comprehension_baselines.sh
But I got the following error:
Traceback (most recent call last):
  File "../src/run_seq2seq.py", line 344, in <module>
    main(args)
  File "../src/run_seq2seq.py", line 322, in main
    logger=logger,
  File "/home/pouramini/parsinlu/src/lightning_base.py", line 329, in generic_train
    **train_params,
  File "/home/pouramini/miniconda3/lib/python3.7/site-packages/pytorch_lightning/trainer/connectors/env_vars_connector.py", line 41, in overwrite_by_env_vars
    return fn(self, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'early_stop_callback'
It might be related to some environment setting or a version mismatch. If you know the cause of this error, please guide me.
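A "got an unexpected keyword argument" error usually means the installed library version no longer accepts an argument the calling script passes. A minimal diagnostic sketch, using only the standard library, lists which keyword arguments a constructor would reject. `FakeTrainer` and `train_params` below are hypothetical stand-ins for illustration; they are not the real `pytorch_lightning.Trainer` signature or the repository's actual parameters.

```python
import inspect


def unsupported_kwargs(callable_obj, kwargs):
    """Return the keyword names that `callable_obj` does not accept."""
    params = inspect.signature(callable_obj).parameters
    # A **kwargs parameter accepts any keyword, so nothing is unsupported.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return []
    return [name for name in kwargs if name not in params]


class FakeTrainer:
    """Hypothetical stand-in for a trainer class in a newer library version."""

    def __init__(self, max_epochs=1, callbacks=None):
        self.max_epochs = max_epochs
        self.callbacks = callbacks or []


# Hypothetical arguments resembling what a training script might pass.
train_params = {"max_epochs": 3, "early_stop_callback": True}
rejected = unsupported_kwargs(FakeTrainer, train_params)
```

Running the same check against the installed trainer class would show whether `early_stop_callback` is still part of its signature, in which case installing the library version pinned by the repository, or adapting the call, are the two options to consider.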
Hi,
Does anyone know how to use ParsiNLU with RASA?
https://rasa.com/docs/rasa/language-support/
Regards,
I tried this example for wikibert-base-parsinlu-multiple-choice.
I got these outputs:
MultipleChoiceModelOutput(loss=None, logits=tensor([[2.1362],
[2.1362],
[2.1362],
[2.1362]], grad_fn=<ViewBackward>), hidden_states=None, attentions=None)
MultipleChoiceModelOutput(loss=None, logits=tensor([[2.1362],
[2.1362],
[2.1362],
[2.1362]], grad_fn=<ViewBackward>), hidden_states=None, attentions=None)
MultipleChoiceModelOutput(loss=None, logits=tensor([[2.1362],
[2.1362],
[2.1362],
[2.1362]], grad_fn=<ViewBackward>), hidden_states=None, attentions=None)
But how do I find out which candidate was chosen? These outputs do not make sense.
The other BERT-based examples are identical except for the model name, so I guess they would output something similar. Only the MT5 ones seemed to output what I expected.
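In general, the predicted candidate in a multiple-choice head is the one with the largest logit. A minimal sketch with plain Python lists follows (for a tensor like the one above, `output.logits.tolist()` would produce the nested list). The candidate strings are hypothetical, and returning `None` for identical scores is a design choice of this sketch: identical logits, as in the outputs above, express no preference and more likely point to a problem in how the inputs were fed to the model.

```python
def pick_choice(logits, candidates):
    """Return the candidate with the highest score, or None if all scores tie."""
    # Flatten [[a], [b], ...] or [[a, b, ...]] into one flat list of scores.
    scores = [float(x) for row in logits for x in row]
    if len(scores) != len(candidates):
        raise ValueError("expected one score per candidate")
    # Identical scores carry no preference between candidates.
    if len(scores) > 1 and len(set(scores)) == 1:
        return None
    best = max(range(len(scores)), key=scores.__getitem__)
    return candidates[best]
```

With the logits shown above, `pick_choice([[2.1362], [2.1362], [2.1362], [2.1362]], ["a", "b", "c", "d"])` returns `None`, while distinct scores return the matching candidate.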
Hi,
Thank you for providing this benchmark.
However, I can't find the data for reading comprehension in the data folder.
I'm translating English sentences into Farsi with mt5-base-parsinlu-translation_en_fa (from Huggingface). For sentences longer than about 8 words, only the first part of the sentence is translated and the rest is ignored. For example:
English sentences:
Terry's side fell to their second Premier League loss of the season at Loftus Road
Following a four-day hiatus, UN envoy Ismail Ould Cheikh Ahmed on Thursday will resume mediation efforts in the second round of Kuwait-hosted peace talks between Yemen’s warring rivals.
Mark Woods is a writer and broadcaster who has covered the NBA, and British basketball, for over a decade.
Translations:
طرفدار تری در فوتبال دوم فصل در لئوپوس رود به
پس از چهار روز توقف، سفیر سازمان ملل، ایمیل اولد شیخ
مارک ولز نویسنده و پخش کننده ای است که بیش از یک دهه
which according to Google Translate translates back to this:
More fans in the second football season in Leopard
After a four-day hiatus, the ambassador to the United Nations, Old Sheikh Sheikh
Mark Wells has been a writer and broadcaster for over a decade
I can't find any configuration setting that would limit the number of tokens being translated.
Here is my code:
#!/usr/bin/python3
import sys
#from transformers import MarianTokenizer, MarianMTModel
from transformers import MT5ForConditionalGeneration, MT5Tokenizer
from typing import List
import torch
device = "cuda:0"
dir=sys.argv[1] + "persiannlp"
size="base"
mname = f'{dir}/data/mt5-{size}-parsinlu-translation_en_fa'
tokenizer = MT5Tokenizer.from_pretrained(mname)
model = MT5ForConditionalGeneration.from_pretrained(mname)
model = model.to(device)
lines = []
while True:
    for line in sys.stdin:
        line = line.strip()
        if line == 'EOD':
            inputs = tokenizer(lines, return_tensors="pt", padding=True).to(device)
            translated = model.generate(**inputs).to(device)
            [print(tokenizer.decode(t, skip_special_tokens=True)) for t in translated]
            print('EOL')
            sys.stdout.flush()
            lines.clear()
        elif line.startswith('EOF'):
            sys.exit(0)
        else:
            lines.append(line)
    sys.exit(0)
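A likely cause, though not confirmed in this thread: `generate()` stops at a short default output length (20 tokens in many transformers releases) unless a larger limit is passed or set in the model's generation configuration. The sketch below wraps the batch translation step from the script above and passes `max_length` explicitly. The function name `translate` and the value 256 are assumptions chosen for illustration; `tokenizer` and `model` are expected to be the objects loaded in the script.

```python
def translate(lines, tokenizer, model, device="cpu", max_length=256):
    """Translate a batch of sentences, allowing up to `max_length` output tokens."""
    inputs = tokenizer(lines, return_tensors="pt", padding=True).to(device)
    # Pass the limit explicitly instead of relying on the library default.
    outputs = model.generate(**inputs, max_length=max_length)
    return [tokenizer.decode(t, skip_special_tokens=True) for t in outputs]
```

In the script above this would be called as `translate(lines, tokenizer, model, device=device)` in place of the separate tokenizer, `generate` and `decode` lines.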
I was trying to experiment with your reading comprehension mt5 model and ran into a problem. I followed the code provided in the Hugging Face model card of mt5-small-parsinlu.
When I run
tokenizer = MT5Tokenizer.from_pretrained("persiannlp/mt5-small-parsinlu-squad-reading-comprehension")
no error is raised, but when I call run_model() I get "AttributeError: 'NoneType' object has no attribute 'encode'", which points to the tokenizer. I checked the type of the tokenizer with both the "base" and "small" models, and in both cases it is NoneType. The model, however, loads fine from the same checkpoint name and has the correct type.
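One possible cause, an assumption not confirmed in this thread: `MT5Tokenizer` depends on the `sentencepiece` package, and some transformers releases reportedly returned `None` from `from_pretrained` instead of raising an error when that package was missing. A minimal standard-library check (the helper name `has_module` is made up for this sketch):

```python
import importlib.util


def has_module(name):
    """Return True if a top-level module called `name` can be imported."""
    return importlib.util.find_spec(name) is not None


if not has_module("sentencepiece"):
    # Installing the package and restarting the interpreter may be enough.
    print("sentencepiece is not installed; try: pip install sentencepiece")
```

If the package is present, the tokenizer problem has a different cause and the installed transformers version would be the next thing to check.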