Comments (4)
The code and data in this repo can reproduce the reported results.
We filtered out the samples that are too long to fit in the context length of our model. Therefore, the number of samples in the repo is smaller than that reported in the paper (12.4k vs. 15k, and 242 vs. 252). For validation on dolly, we do indeed use 1k samples in our experiments.
We will fix and explain this in the paper. Thanks for pointing it out!
from lmops.
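The filtering step described above can be sketched as follows. This is a minimal illustration, not the repo's actual preprocessing script: the function names and the whitespace-based token count are stand-ins (a real pipeline would count tokens with the model's tokenizer).

```python
# Hedged sketch of dropping samples that exceed the model's context length.
# approx_token_count is a hypothetical stand-in for a real tokenizer call.

def filter_by_length(samples, max_len, length_fn):
    """Keep only samples whose token count fits within the context window."""
    return [s for s in samples if length_fn(s) <= max_len]

def approx_token_count(sample):
    # Stand-in for tokenizer(sample)["input_ids"]; real token counts differ.
    return len(sample.split())

samples = ["short prompt", "a much longer prompt " * 50, "fits fine"]
kept = filter_by_length(samples, max_len=64, length_fn=approx_token_count)
print(len(kept))  # the overlong sample is dropped
```

With a real tokenizer, `length_fn` would be something like `lambda s: len(tokenizer(s)["input_ids"])`, which explains why the filtered counts (12.4k, 242) depend on the specific model used.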
Thanks very much for your quick reply! So for dolly, you use 11.4k samples for training, 1k for validation (to select the best checkpoint), and 500 for testing (to reproduce the results in the paper)?
yes
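The confirmed split (11.4k train / 1k validation out of the 12.4k filtered samples, with 500 held out for testing) can be sketched as below. The seed and function names are illustrative assumptions, not the repo's actual split code.

```python
# Hedged sketch of the dolly split described in this thread: 12.4k filtered
# samples divided into ~11.4k train / 1k validation. The seed is arbitrary.
import random

def split_dataset(samples, n_valid, seed=42):
    """Shuffle deterministically, then carve off a validation set."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    return shuffled[n_valid:], shuffled[:n_valid]  # (train, valid)

data = [f"sample-{i}" for i in range(12400)]  # 12.4k filtered samples
train, valid = split_dataset(data, n_valid=1000)
print(len(train), len(valid))  # 11400 1000
```

The 500-sample test set is kept separate from this pool, so validation (checkpoint selection) never overlaps with the reported evaluation data.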
Thanks!