jglue's Issues

Unable to generate MARC-ja because of 403 Forbidden

Thank you for the great benchmark.

Amazon Reviews Corpus seems to be inaccessible.

$ wget https://s3.amazonaws.com/amazon-reviews-pds/tsv/amazon_reviews_multilingual_JP_v1_00.tsv.gz
--2023-07-31 15:22:11--  https://s3.amazonaws.com/amazon-reviews-pds/tsv/amazon_reviews_multilingual_JP_v1_00.tsv.gz
Resolving s3.amazonaws.com (s3.amazonaws.com)... 52.216.98.53, 52.216.41.112, 52.216.249.70, ...
Connecting to s3.amazonaws.com (s3.amazonaws.com)|52.216.98.53|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2023-07-31 15:22:11 ERROR 403: Forbidden.

The command from https://registry.opendata.aws/amazon-reviews-ml/ fails as well:

$ aws s3 ls --no-sign-request s3://amazon-reviews-ml/

An error occurred (AccessDenied) when calling the ListObjectsV2 operation: Access Denied

We may be able to switch to the copy on Hugging Face: https://huggingface.co/datasets/amazon_reviews_multi
(I cannot verify that it would generate the same dataset as the original one.)
(That dataset is also not available.)

The "label" column in the JSTS dataset is a string dtype

Hi, thanks for publishing JGLUE.

The label column in the JSTS dataset is stored as a string rather than a number:

{"sentence_pair_id": "0", "yjcaptions_id": "100312_421853-104611-31624", "sentence1": "レンガの建物の前を、乳母車を押した女性が歩いています。", "sentence2": "厩舎で馬と女性とが寄り添っています。", "label": "0.0"}

Why?

I think that run_glue.py determines if a task is a regression task or not by the dtype of the label column, so if it is a string dtype, it is treated as a classification task.
https://github.com/huggingface/transformers/blob/v4.9.2/examples/pytorch/text-classification/run_glue.py

In fact, fine-tuning BERT on JSTS resulted in a 26-class classification model.
(I have patched run_glue.py.)
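
As an alternative to patching the script, the labels could be cast to floats before training. The following is only a minimal sketch under that assumption, not the patch mentioned above; the helper name `load_jsts_with_float_labels` and the shortened sample record are hypothetical.

```python
import json


def load_jsts_with_float_labels(lines):
    """Parse JSON Lines records and cast the string "label" field to float."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # JSTS stores the similarity score as a string such as "0.0".
        record["label"] = float(record["label"])
        records.append(record)
    return records


# Hypothetical, shortened record in the same shape as the JSTS example.
sample = ['{"sentence_pair_id": "0", "sentence1": "a", "sentence2": "b", "label": "0.0"}']
records = load_jsts_with_float_labels(sample)
print(type(records[0]["label"]).__name__)
```

Writing the converted records back out as JSON Lines should leave numeric labels in the file, which a dtype-based check would then see as a float column.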

DeBERTa models support

Thank you for releasing JGLUE, but I could not evaluate my deberta-base-japanese-aozora. There seem to be two problems:

  • DeBERTaV2ForMultipleChoice requires transformers v4.19.0 or later, but JGLUE requires v4.9.2
  • Fast tokenizers (including DeBERTaV2TokenizerFast) are not supported on JSQuAD with --use_fast_tokenizer

I tried forcing v4.19.2 to work around these problems, but I could not resolve the latter. Please see the details in my diary (written in Japanese). Do you have any ideas?
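
The version requirement in the first point can be checked with a small comparison. This is a rough sketch with hypothetical helper names (`parse_version`, `meets_minimum`); it assumes plain dotted release numbers and would raise an error on suffixes such as `.dev0`.

```python
def parse_version(version):
    """Turn a plain dotted release string such as "4.19.2" into (4, 19, 2)."""
    return tuple(int(part) for part in version.split("."))


def meets_minimum(installed, minimum):
    """Return True if `installed` is at least `minimum`, compared numerically."""
    return parse_version(installed) >= parse_version(minimum)


# v4.9.2 (pinned by JGLUE) is older than v4.19.0, although a plain string
# comparison would order them the other way round.
print(meets_minimum("4.9.2", "4.19.0"))
print(meets_minimum("4.19.2", "4.19.0"))
```

Comparing integer tuples rather than raw strings matters here because "4.9.2" sorts after "4.19.0" lexicographically.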

Cannot reproduce the baseline score of question answering with transformers v4.19.2

I tried to reproduce the baseline score using the run_squad.py parameters you provided and a patched transformers v4.19.2,
but the resulting score in eval_results.json is much lower than the baseline.

    "exact": 42.30076542098154,
    "f1": 42.390814948221525,

Based on fine-tuning/README.md, I think you confirmed that transformers v4.19.2 works.
What score did you get then?

I'm attaching the requirements.txt and eval_results.json from my test with transformers v4.19.2.
