persiannlp / parsinlu

A comprehensive suite of high-level NLP tasks for Persian language

Home Page: https://arxiv.org/abs/2012.06154

License: Other

Python 74.70% Shell 25.30%
query-paraphrasing machine-translation sentiment-analysis persian-language persian natural-language-processing farsi reading-comprehension textual-entailment natural-language-inference

parsinlu's People

Contributors

armancohan, armankabiri, erfannoury, maxsonate, niloofarsafi, phosseini, pouyapez



parsinlu's Issues

empty result

I ran train_and_evaluate_entailment_baselines.sh and got results in /scripts/runs/. All of the events.out.tfevents files are the same 40-byte size and are essentially empty. How can I fix this?

Looking for best NLI model

What is ParsiNLU's best model for entailment?

Have you trained 'persiannlp/parsbert-base-parsinlu-entailment' on the FarsTail dataset?

Reading comprehension data (Answers)

Hello,
I am trying to use the reading comprehension data in another model. I noticed that most questions have several answers. I tried concatenating these answers, but while some of them complement each other, others are duplicates.
For example:

  1. 20 دندان / 10 عدد در هر آرواره (20 teeth / 10 in each jaw)
  2. استان فارس / فارس (Fars province / Fars)

I would like to know how I should use the answers and resolve this.
Thanks for your consideration.
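As a general note (not an official answer from the authors): the standard SQuAD-style way to handle multiple gold answers is not to concatenate them but to treat each one as an alternative reference and take the maximum score over them. A minimal sketch, with illustrative answer strings:

```python
from collections import Counter

def f1_score(prediction, gold):
    """Token-overlap F1 between a prediction and a single gold answer."""
    pred_tokens = prediction.split()
    gold_tokens = gold.split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

def best_f1(prediction, gold_answers):
    # Each gold answer is an alternative phrasing, not a continuation,
    # so score against each one and keep the best match.
    return max(f1_score(prediction, g) for g in gold_answers)

print(best_f1("20 teeth", ["20 teeth", "10 in each jaw"]))  # 1.0
```

With this convention a prediction matching any one of the listed answers scores full credit, so duplicated or alternative answers stop being a problem.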

Fine-tune Huggingface model

Hi, first I want to express my gratitude for sharing your great work. I'm new to the NLP field and have a task to do, so I've looked at your model on Hugging Face and want to fine-tune it. How can I fine-tune your model with the Python transformers package?
Thanks a lot
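Not an official recipe, but a minimal fine-tuning sketch with the transformers Trainer API might look like the following. The checkpoint name is taken from the thread above; the two training sentences, labels, and hyperparameters are toy placeholders, not real ParsiNLU data:

```python
# Minimal sketch (assumptions flagged): fine-tune a ParsiNLU checkpoint for
# sequence classification with the Hugging Face Trainer. The training
# sentences and labels below are toy placeholders.
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_name = "persiannlp/parsbert-base-parsinlu-entailment"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

texts = ["جمله اول", "جمله دوم"]  # toy examples ("first sentence", "second sentence")
labels = [0, 1]

enc = tokenizer(texts, truncation=True, padding=True)
train_dataset = [
    {"input_ids": enc["input_ids"][i],
     "attention_mask": enc["attention_mask"][i],
     "labels": labels[i]}
    for i in range(len(texts))
]

args = TrainingArguments(output_dir="finetuned", num_train_epochs=1,
                         per_device_train_batch_size=2)
Trainer(model=model, args=args, train_dataset=train_dataset).train()
```

For real use you would swap in a proper dataset and an evaluation split; the Trainer handles batching, optimization, and checkpointing.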

TypeError: __init__() got an unexpected keyword argument 'early_stop_callback'

I cloned the package and installed requirements.txt

Then I went to the scripts directory and ran this command:

./train_and_evaluate_reading_comprehension_baselines.sh

But I got the following error:

Traceback (most recent call last):
  File "../src/run_seq2seq.py", line 344, in <module>
    main(args)
  File "../src/run_seq2seq.py", line 322, in main
    logger=logger,
  File "/home/pouramini/parsinlu/src/lightning_base.py", line 329, in generic_train
    **train_params,
  File "/home/pouramini/miniconda3/lib/python3.7/site-packages/pytorch_lightning/trainer/connectors/env_vars_connector.py", line 41, in overwrite_by_env_vars
    return fn(self, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'early_stop_callback'


It might be related to an environment setting or a version mismatch; if you recognize the error, please advise.
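This looks like a pytorch-lightning API change rather than an environment issue: the early_stop_callback Trainer argument was removed in newer pytorch-lightning releases, so an unpinned install picks up a Trainer that no longer accepts it. One workaround, assuming the repo's code targets the old API, is to pin an older release:

```shell
# Workaround sketch (assumption: the repo targets the pre-1.0 Trainer API,
# which still accepted early_stop_callback as a keyword argument).
pip install "pytorch-lightning<1.0"
```

Alternatively, the call in lightning_base.py could be ported to the newer API, which passes an EarlyStopping instance via the callbacks list instead.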

Example for wikibert-base-parsinlu-multiple-choice does not seem to work

I tried this example for wikibert-base-parsinlu-multiple-choice.

I got these outputs:

MultipleChoiceModelOutput(loss=None, logits=tensor([[2.1362],
        [2.1362],
        [2.1362],
        [2.1362]], grad_fn=<ViewBackward>), hidden_states=None, attentions=None)
MultipleChoiceModelOutput(loss=None, logits=tensor([[2.1362],
        [2.1362],
        [2.1362],
        [2.1362]], grad_fn=<ViewBackward>), hidden_states=None, attentions=None)
MultipleChoiceModelOutput(loss=None, logits=tensor([[2.1362],
        [2.1362],
        [2.1362],
        [2.1362]], grad_fn=<ViewBackward>), hidden_states=None, attentions=None)

But how do I find out which candidate was predicted? Those outputs did not make sense.

The examples that involve BERT were identical except for the model name, so I would expect them to output something similar. Only the MT5 ones seemed to output what I expected.
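A general observation (not specific to this repo's checkpoints): for a correctly batched multiple-choice input, the logits should have shape (batch, num_choices), and the prediction is the argmax over the choice dimension. The identical [4, 1] rows above suggest the four candidates were encoded as four independent examples rather than one example with four choices. A toy sketch of the selection step, with made-up scores:

```python
# Toy sketch: select the predicted candidate from multiple-choice logits.
# For one question with four candidates, logits should have shape (1, 4);
# the scores below are made up for illustration.
logits = [[2.1, -0.3, 0.7, 1.5]]
scores = logits[0]
predicted_choice = max(range(len(scores)), key=lambda i: scores[i])
print(predicted_choice)  # 0
```

If every row carries the same score, as in the output above, the argmax is meaningless, which is consistent with the input batching being the underlying problem.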

Machine translation - long sentences cause incomplete translation

I'm translating English sentences into Farsi with mt5-base-parsinlu-translation_en_fa (from Hugging Face). Sentences longer than about 8 words yield a translation of only the first part of the sentence; the rest is ignored. For example:

English sentences:

Terry's side fell to their second Premier League loss of the season at Loftus Road

Following a four-day hiatus, UN envoy Ismail Ould Cheikh Ahmed on Thursday will resume mediation efforts in the second round of Kuwait-hosted peace talks between Yemen’s warring rivals.

Mark Woods is a writer and broadcaster who has covered the NBA, and British basketball, for over a decade.

Translations:

طرفدار تری در فوتبال دوم فصل در لئوپوس رود به

پس از چهار روز توقف، سفیر سازمان ملل، ایمیل اولد شیخ

مارک ولز نویسنده و پخش کننده ای است که بیش از یک دهه

which, according to Google Translate, back-translates to:

More fans in the second football season in Leopard

After a four-day hiatus, the ambassador to the United Nations, Old Sheikh Sheikh

Mark Wells has been a writer and broadcaster for over a decade

I can't find any configuration setting that would limit the number of tokens being translated. Here is my code:

#!/usr/bin/python3
import sys

from transformers import MT5ForConditionalGeneration, MT5Tokenizer

device = "cuda:0"

model_dir = sys.argv[1] + "persiannlp"
size = "base"
mname = f"{model_dir}/data/mt5-{size}-parsinlu-translation_en_fa"

tokenizer = MT5Tokenizer.from_pretrained(mname)
model = MT5ForConditionalGeneration.from_pretrained(mname).to(device)

lines = []
for line in sys.stdin:
    line = line.strip()
    if line == 'EOD':
        # Translate the buffered batch and print one line per translation.
        inputs = tokenizer(lines, return_tensors="pt", padding=True).to(device)
        translated = model.generate(**inputs)
        for t in translated:
            print(tokenizer.decode(t, skip_special_tokens=True))
        print('EOL')
        sys.stdout.flush()
        lines.clear()
    elif line.startswith('EOF'):
        sys.exit(0)
    else:
        lines.append(line)
sys.exit(0)
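One plausible cause, an assumption rather than a confirmed diagnosis: model.generate defaults to a small maximum output length (20 tokens in transformers releases of this era), which lines up with translations cutting off after roughly 8 words. Raising the cap in the script above would be a one-line change:

```python
# Hypothetical one-line change to the script above: allow longer outputs by
# overriding generate()'s default maximum output length.
translated = model.generate(**inputs, max_length=512)
```

If that is the cause, the translations should run to the end of each sentence once the limit is raised.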

mt5 tokenizer not loading

I was trying to experiment with your reading comprehension MT5 model and ran into a problem. I followed the code provided in the Hugging Face model card for mt5-small-parsinlu.

When I run

tokenizer = MT5Tokenizer.from_pretrained("persiannlp/mt5-small-parsinlu-squad-reading-comprehension")

I get no error, but when I call run_model() I get "AttributeError: 'NoneType' object has no attribute 'encode'", which points at the tokenizer. I checked the type of the tokenizer with both the "base" and "small" models, and it is NoneType. The model, however, loads fine and has the correct type for the same checkpoint name.
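A tokenizer silently coming back as None is commonly a missing optional dependency: MT5Tokenizer needs the sentencepiece package, and some transformers versions returned None instead of raising when it was absent. Installing it, an assumption worth trying, and then restarting the Python runtime may resolve this:

```shell
# Assumption: the tokenizer is None because sentencepiece (and protobuf,
# which some transformers versions also require) is missing.
pip install sentencepiece protobuf
```

After installing, re-run the from_pretrained call in a fresh process, since an already-imported transformers module will not pick up the new dependency.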
