
Sailor: Open Language Models for South-East Asia

Homepage    HuggingFace    Technical Report    SailCraft

This repository contains the evaluation code for Sailor, a suite of open language models for South-East Asia.

Sailor is developed by the Sea AI Lab and Singapore University of Technology and Design.

Introduction

Sailor is a suite of open language models tailored for South-East Asia (SEA), focusing on languages such as 🇮🇩Indonesian, 🇹🇭Thai, 🇻🇳Vietnamese, 🇲🇾Malay, and 🇱🇦Lao. Developed with careful data curation, Sailor models are designed to understand and generate text across the diverse linguistic landscape of the SEA region. Built from Qwen1.5, Sailor encompasses models of varying sizes, from 0.5B to 7B, to suit different requirements. Benchmarking results demonstrate Sailor's proficiency in tasks such as question answering, commonsense reasoning, and reading comprehension in SEA languages.

  • Continually pretrained on 200 Billion to 400 Billion tokens over 7 languages, including Indonesian, Thai, Vietnamese, Malay, Lao, English and Chinese.
  • Various model sizes (0.5B, 1.8B, 4B and 7B) to support different requirements.
  • Strong performance on SEA benchmarks such as XQuAD, TydiQA, XCOPA, Belebele and M3Exam.
  • No restrictions on research or commercial use, provided usage complies with the Qwen1.5 license.

To learn more details, please access the technical report.

Models

You can find all the Sailor models on our HuggingFace page.
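As a quick start, the base models can be loaded with Hugging Face `transformers`. This is a minimal sketch: the model id `sail/Sailor-0.5B` is an assumed checkpoint name (check the HuggingFace page for the actual list), and base models perform text continuation rather than chat.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed HF model id; see the HuggingFace page for all available sizes
model_id = "sail/Sailor-0.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Base models continue text rather than follow chat instructions
prompt = "Ibu kota Indonesia adalah"  # Indonesian: "The capital of Indonesia is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

For the chat models (e.g. Sailor-7B-Chat), use the tokenizer's chat template instead of raw continuation.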

Evaluation

Here are the results of evaluating the models on question answering tasks. The results are presented as tables, where the first column is the model name and the remaining columns are the performance on Thai (th), Indonesian (id), and Vietnamese (vi), respectively. The results of Sailor models are highlighted in bold. You can find the full evaluation results on the different tasks, and our evaluation code to reproduce them, in the eval directory.

Question Answering

| 3-shot (EM / F1) | XQuAD (th) | TydiQA (id) | XQuAD (vi) |
|---|---|---|---|
| Qwen1.5-0.5B | 14.19 / 23.35 | 20.71 / 32.64 | 19.85 / 35.38 |
| **Sailor-0.5B** | 15.84 / 27.58 | 30.44 / 54.74 | 21.13 / 40.57 |
| Qwen1.5-1.8B | 27.24 / 43.56 | 29.73 / 53.76 | 29.17 / 48.15 |
| **Sailor-1.8B** | 32.72 / 48.66 | 40.88 / 65.37 | 34.22 / 53.35 |
| Qwen1.5-4B | 34.03 / 53.40 | 48.32 / 72.68 | 43.71 / 63.86 |
| **Sailor-4B** | 46.82 / 63.34 | 53.98 / 73.48 | 47.65 / 67.09 |
| Llama-2-7b | 30.64 / 43.80 | 56.64 / 72.14 | 46.96 / 66.16 |
| Mistral-7B-v0.1 | 48.48 / 63.27 | 63.54 / 78.73 | 53.72 / 72.75 |
| SeaLLM-7b-Hybrid | 49.70 / 67.62 | 50.62 / 75.21 | 49.62 / 70.74 |
| SeaLLM-7b-v2 | 34.55 / 55.13 | 52.21 / 77.00 | 46.19 / 72.11 |
| Qwen1.5-7B | 53.79 / 69.30 | 57.17 / 77.28 | 56.63 / 76.99 |
| **Sailor-7B** | 57.88 / 71.06 | 60.53 / 75.42 | 53.81 / 74.62 |
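EM (exact match) and F1 in the table follow the usual SQuAD-style extractive QA metrics. The sketch below shows the idea with simplified normalization; the actual evaluators apply fuller answer normalization and language-specific tokenization (e.g. pythainlp for Thai), so treat this as illustrative only.

```python
from collections import Counter

def normalize(text: str) -> str:
    # Simplified: lowercase and collapse whitespace. Real evaluators also
    # strip punctuation/articles and use language-specific tokenizers.
    return " ".join(text.lower().split())

def exact_match(pred: str, gold: str) -> float:
    # 1.0 if the normalized prediction equals the normalized gold answer
    return float(normalize(pred) == normalize(gold))

def f1(pred: str, gold: str) -> float:
    # Token-overlap F1 between prediction and gold answer
    pred_toks = normalize(pred).split()
    gold_toks = normalize(gold).split()
    common = Counter(pred_toks) & Counter(gold_toks)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)

print(exact_match("Hanoi", "hanoi"))          # 1.0
print(f1("the city of hanoi", "hanoi"))       # 0.4
```

Scores in the table are averaged over the dataset and reported as percentages.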

Setup

We use OpenCompass to evaluate the models. To install the required packages, run the following commands from this folder:

# setup opencompass environment
conda create --name opencompass python=3.10 pytorch torchvision pytorch-cuda -c nvidia -c pytorch -y
conda activate opencompass
git clone https://github.com/open-compass/opencompass opencompass
cd opencompass
pip install -e .
pip install pythainlp langid
mkdir data

Build Evaluation Script

To build the evaluation script, run the following commands from this folder:

cp -r eval/configs/* opencompass/configs/
cp -r eval/data/* opencompass/data/
cp -r eval/datasets/* opencompass/opencompass/datasets/
cp eval/icl_sailor_evaluator.py opencompass/opencompass/openicl/icl_evaluator/
cp eval/sailor_text_postprocessors.py opencompass/opencompass/utils/
echo "from .icl_sailor_evaluator import AnsEvaluator, TextGenEvaluator  # noqa" >> "opencompass/opencompass/openicl/icl_evaluator/__init__.py"
echo "from .sailor_text_postprocessors import *  # noqa" >> "opencompass/opencompass/utils/__init__.py"
echo "from .xquad import *  # noqa: F401, F403" >> "opencompass/opencompass/datasets/__init__.py"
echo "from .tydiqa_id import *  # noqa: F401, F403" >> "opencompass/opencompass/datasets/__init__.py"
echo "from .xcopa_sea import *  # noqa: F401, F403" >> "opencompass/opencompass/datasets/__init__.py"
echo "from .m3exam import *  # noqa: F401, F403" >> "opencompass/opencompass/datasets/__init__.py"
echo "from .belebele import *  # noqa: F401, F403" >> "opencompass/opencompass/datasets/__init__.py"
cp eval/eval_sailor.py opencompass/configs/
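The `icl_sailor_evaluator.py` file registered above provides the `AnsEvaluator` and `TextGenEvaluator` classes. In OpenCompass, an evaluator is essentially a class exposing a `score(predictions, references)` method that returns a dict of metrics. The following is a schematic stand-in, not the actual implementation:

```python
class AnsEvaluatorSketch:
    """Schematic stand-in for an OpenCompass-style answer evaluator."""

    def score(self, predictions, references):
        # Compare each prediction against its reference and report accuracy
        assert len(predictions) == len(references)
        correct = sum(p.strip() == r.strip()
                      for p, r in zip(predictions, references))
        return {"accuracy": 100.0 * correct / len(references)}

print(AnsEvaluatorSketch().score(["A", "B"], ["A", "C"]))  # {'accuracy': 50.0}
```

The `echo` lines above simply register the new evaluators, postprocessors, and dataset loaders in OpenCompass's package `__init__.py` files so the configs can reference them by name.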

Run Evaluation

To run the evaluation, run the following commands from this folder:

cd opencompass
python run.py configs/eval_sailor.py -w outputs/sailor --num-gpus 1 --max-num-workers 64

You can also modify the script to evaluate other models such as Qwen1.5, Llama, or Mistral.
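Adding another model means extending the `models` list in the config. The fragment below is a hypothetical addition following common OpenCompass config conventions; the exact field names and defaults should be checked against `configs/eval_sailor.py` itself.

```python
# Hypothetical entry for configs/eval_sailor.py; field names follow common
# OpenCompass conventions and may differ from the actual config schema.
from opencompass.models import HuggingFaceCausalLM

models = [
    dict(
        type=HuggingFaceCausalLM,
        abbr="mistral-7b-v0.1",          # short name used in result tables
        path="mistralai/Mistral-7B-v0.1",  # HuggingFace model id
        max_out_len=100,
        batch_size=8,
        run_cfg=dict(num_gpus=1),        # must not exceed available GPUs
    ),
]
```

Note that `--num-gpus` on the command line and `num_gpus` in `run_cfg` must be satisfiable by the GPUs actually visible to the process, or the local runner will fail its GPU-count assertion.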

Demo

We provide a simple demo to chat with Sailor-7B-Chat. You can also build your own application on top of the provided demo code.

Citing this work

If you use this repository or the Sailor models, please cite:

@misc{dou2024sailor,
      title={Sailor: Open Language Models for South-East Asia}, 
      author={Longxu Dou and Qian Liu and Guangtao Zeng and Jia Guo and Jiahui Zhou and Wei Lu and Min Lin},
      year={2024},
      eprint={2404.03608},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Contact

If you have any questions, please raise an issue on GitHub or contact us at [email protected] and [email protected].

