Git Product home page Git Product logo

zaloqa2019's Introduction

1st place solution for Zalo AI 2019 - Vietnamese Wiki Question Answering

Overview

Given a question, related paragraphs from a Wikipedia article (in the shuffle order), the task is finding paragraph which answers the question of each test case.

F1 measure based on precision and recall is used to rank competing submissions.

For detailed, visit: https://challenge.zalo.ai/portal/question-answering

About Data?

  • train data: 18108 question text pairs
  • test data: 2678 question text pairs

Some interesting/confusing/mistake question text pairs:

Question Text Title Label
Việt nam ra nhập liên hợp quốc năm bao nhiêu Quốc gia Việt Nam và Việt Nam Dân Chủ Cộng Hoà cùng nộp đơn vào tháng 10 năm 1949 . Quá trình gia nhập Liên Hợp Quốc của Việt Nam ?
Ngọn núi cao nhất thế giới tính từ mực nước biển Độ cao so với mực nước biển của mặt đất thay đổi từ -418 m ở biển Chết tới 8.848 m trên đỉnh Everest và độ cao trung bình trên mặt nước biển là 840 m . Trái Đất ?
Lý Thái Tổ trị vì bao nhiêu năm? Lý Thái Tổ ( ) , là vị hoàng đế sáng lập nhà Lý trong lịch sử Việt Nam , trị vì từ năm 1009 đến khi qua đời vào năm 1028 . Lý Thái Tổ ?

Approaches

We used handcrafted features, bert and combined on original data (only use question, text), or add title data, add transitivity rules and add external data (Squad 2.0 translated to Vietnamese).

We have made 73 single models from the above combinations.

Final Solution

We selected the best model to submit. Although ensemble version have the better result, however the predict time is so much longer than one model, so one model is submit. The config is below:

Init checkpoint at https://github.com/lampts/bert4vn (BERT for Vietnamese)

num_train_epochs = 5.0

max_seq_length = 512 

train_batch_size = 128

learning_rate = 2e-5

Train

Please download the init checkpoint at here and put into the folder checkpoint.

Put data in the folder /data/

Run command sh train.sh

Predict

If you want to only predict, please download the checkpoint of trained model at here

Run command: sh predict.sh.

What's next?

ALBERT for Vietnamese will be release at https://github.com/ngoanpv/albert_vi

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.