KorNLU Datasets

This is the dataset repository for our paper "KorNLI and KorSTS: New Benchmark Datasets for Korean Natural Language Understanding".

We introduce KorNLI and KorSTS, Korean datasets for natural language inference (NLI) and semantic textual similarity (STS).

KorNLI

Dataset Overview

| KorNLI | Total | Train | Dev. | Test |
|---|---|---|---|---|
| Source | - | SNLI, MNLI | XNLI | XNLI |
| Translated by | - | Machine | Human | Human |
| # Examples | 950,354 | 942,854 | 2,490 | 5,010 |
| Avg. # words (premise) | 13.6 | 13.6 | 13.0 | 13.1 |
| Avg. # words (hypothesis) | 7.1 | 7.2 | 6.8 | 6.8 |

Examples

| Example | English Translation | Label |
|---|---|---|
| P: 저는, 그냥 알아내려고 거기 있었어요. H: 이해하려고 노력하고 있었어요. | P: I was just there just trying to figure it out. H: I was trying to understand. | Entailment |
| P: 저는, 그냥 알아내려고 거기 있었어요. H: 나는 처음부터 그것을 잘 이해했다. | P: I was just there just trying to figure it out. H: I understood it well from the beginning. | Contradiction |
| P: 저는, 그냥 알아내려고 거기 있었어요. H: 나는 돈이 어디로 갔는지 이해하려고 했어요. | P: I was just there just trying to figure it out. H: I was trying to understand where the money went. | Neutral |
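
The examples above can be held in a small typed record when working with the data in code. The following is a minimal sketch, not the repository's loader: the three-column tab-separated layout (premise, hypothesis, label) and the names `NLIExample` and `read_nli` are assumptions made for illustration, so check the actual files before relying on it.

```python
import csv
import io
from dataclasses import dataclass

# The three NLI labels that appear in the examples above.
LABELS = ("entailment", "contradiction", "neutral")


@dataclass(frozen=True)
class NLIExample:
    premise: str
    hypothesis: str
    label: str


def read_nli(stream):
    """Parse tab-separated rows of (premise, hypothesis, label).

    ASSUMPTION: this three-column layout is hypothetical and is not
    the documented format of the released files.
    """
    examples = []
    # QUOTE_NONE keeps quotation marks inside sentences as literal text.
    for row in csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE):
        premise, hypothesis, label = row
        label = label.strip().lower()
        if label not in LABELS:
            raise ValueError(f"unknown label: {label!r}")
        examples.append(NLIExample(premise, hypothesis, label))
    return examples


# The three example rows above, written as an in-memory TSV.
sample = (
    "저는, 그냥 알아내려고 거기 있었어요.\t이해하려고 노력하고 있었어요.\tEntailment\n"
    "저는, 그냥 알아내려고 거기 있었어요.\t나는 처음부터 그것을 잘 이해했다.\tContradiction\n"
    "저는, 그냥 알아내려고 거기 있었어요.\t나는 돈이 어디로 갔는지 이해하려고 했어요.\tNeutral\n"
)
nli_examples = read_nli(io.StringIO(sample))
```

Normalizing the label to lower case and rejecting unknown values up front makes a malformed row fail at load time rather than during training.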

KorSTS

Dataset Overview

| KorSTS | Total | Train | Dev. | Test |
|---|---|---|---|---|
| Source | - | STS-B | STS-B | STS-B |
| Translated by | - | Machine | Human | Human |
| # Examples | 8,628 | 5,749 | 1,500 | 1,379 |
| Avg. # words | 7.7 | 7.5 | 8.7 | 7.6 |

Examples

| Example | English Translation | Label |
|---|---|---|
| 한 남자가 음식을 먹고 있다. / 한 남자가 뭔가를 먹고 있다. | A man is eating food. / A man is eating something. | 4.2 |
| 한 비행기가 착륙하고 있다. / 애니메이션화된 비행기 하나가 착륙하고 있다. | A plane is landing. / A animated airplane is landing. | 2.8 |
| 한 여성이 고기를 요리하고 있다. / 한 남자가 말하고 있다. | A woman is cooking meat. / A man is speaking. | 0.0 |
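
Unlike the NLI labels, each KorSTS label is a real-valued similarity score. The sketch below shows one way to represent the sentence pairs above and rescale their scores to the unit interval, which some training setups expect. It assumes the 0 to 5 scale used by STS-B; `STSPair`, `MAX_SCORE`, and `normalized` are illustrative names, not part of this repository.

```python
from dataclasses import dataclass

# ASSUMPTION: scores follow the STS-B scale, from 0 (unrelated)
# to 5 (equivalent in meaning).
MAX_SCORE = 5.0


@dataclass(frozen=True)
class STSPair:
    sentence1: str
    sentence2: str
    score: float

    def __post_init__(self):
        # Reject scores outside the assumed 0-5 range.
        if not 0.0 <= self.score <= MAX_SCORE:
            raise ValueError(f"score out of range: {self.score}")

    def normalized(self):
        """Return the score rescaled to the interval [0, 1]."""
        return self.score / MAX_SCORE


# The three example pairs above.
sts_pairs = [
    STSPair("한 남자가 음식을 먹고 있다.", "한 남자가 뭔가를 먹고 있다.", 4.2),
    STSPair("한 비행기가 착륙하고 있다.", "애니메이션화된 비행기 하나가 착륙하고 있다.", 2.8),
    STSPair("한 여성이 고기를 요리하고 있다.", "한 남자가 말하고 있다.", 0.0),
]
normalized_scores = [pair.normalized() for pair in sts_pairs]
```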

License

Creative Commons Attribution-ShareAlike license (CC BY-SA 4.0)

References

If you use KorNLI or KorSTS for research, please cite our paper:

@article{ham2020kornli,
  title={KorNLI and KorSTS: New Benchmark Datasets for Korean Natural Language Understanding},
  author={Ham, Jiyeon and Choe, Yo Joong and Park, Kyubyong and Choi, Ilji and Soh, Hyungjoon},
  journal={arXiv preprint arXiv:2004.03289},
  year={2020}
}
