Comments (7)
We used approximately 1845 A100 GPU hours to train the 1.3B model and 3310 A100 GPU hours for the 2.7B model. However, the actual wall-clock time also depends heavily on your setup and cluster speed.
from llm-shearing.
We use an in-house cluster at Princeton! I think an A100 should cost more than $0.5 per hour, though.
from llm-shearing.
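For a rough sense of scale, the quoted GPU budgets translate to dollars as follows. This is a back-of-envelope sketch: the $/hour rate below is an assumption for illustration only (the authors note actual A100 pricing is likely above $0.5/hr), not a figure from the project.

```python
# Back-of-envelope cost estimate for the reported GPU budgets.
# ASSUMPTION: the hourly rate is illustrative; real A100 pricing varies widely.
A100_RATE_USD_PER_HOUR = 2.0  # assumed on-demand rate, not from the project

# GPU-hour figures quoted in the comment above
gpu_hours = {"1.3B": 1845, "2.7B": 3310}

for size, hours in gpu_hours.items():
    cost = hours * A100_RATE_USD_PER_HOUR
    print(f"{size}: {hours} A100 GPU hours ~= ${cost:,.0f} at ${A100_RATE_USD_PER_HOUR}/hr")
```

Scaling the rate to your own cluster or cloud pricing gives a quick budget estimate before launching a run.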
Also, are you planning to release a sheared Mistral version?
We intend to add support for the Mistral and Pythia models in the coming weeks. We are short on compute, though -- so I am not sure whether we will end up delivering these models before the next, stronger 7B model comes out.
from llm-shearing.
Hi, when you have full control over all fine-tuning data, does it make the most sense to shear the base model first and then fine-tune on top? Or is it better to fine-tune in advance (or a mixture of both)? Disregarding cost completely -- purely in terms of performance and overfitting.
from llm-shearing.
Hi! Yeah, I think it makes the most sense to prune the base model first and then fine-tune, as it is widely believed that the abilities of language models are acquired during pre-training. This is the cleanest way to do it.
However, I am not sure what the performance would be like when mixing pre-training and fine-tuning data during pruning -- it might help the pruning process find a submodel that better follows instructions.
from llm-shearing.
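The recommended ordering above (prune on pre-training-style data first, fine-tune the resulting submodel second) can be sketched as a two-stage pipeline. All function names below are placeholders to illustrate the ordering, not the actual LLM-Shearing API.

```python
# Hypothetical two-stage pipeline: prune first, then fine-tune.
# `prune` and `finetune` are illustrative stubs, NOT the LLM-Shearing API.

def prune(model: str, data: str) -> str:
    """Placeholder for structured pruning + continued pre-training on `data`."""
    return model + "-pruned"

def finetune(model: str, data: str) -> str:
    """Placeholder for instruction tuning on `data`."""
    return model + "-finetuned"

base = "llama-7b"
submodel = prune(base, data="pretraining-mix")    # stage 1: find the submodel on pre-training data
final = finetune(submodel, data="instructions")   # stage 2: instruction-tune the pruned submodel
print(final)  # llama-7b-pruned-finetuned
```

The design point is the ordering: the pruning stage sees the same kind of data the base model was pre-trained on, so the submodel retains general abilities before any task-specific adaptation.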
Alright, tysm!
from llm-shearing.
Related Issues (20)
- TypeError: buffer is too small for requested array
- Pruning fine-tuned model HOT 2
- save model meet problem HOT 1
- Instruction tuning dataset HOT 2
- If I can't configure Slurm on a cluster, does that mean I can't use multi-node multi-GPU setups? HOT 5
- Is there a way to run pruning without Slurm?
- Start training but only output config information HOT 3
- The Project is not implemented for 70B llama? HOT 7
- LlamaRMSNorm() layer differs from original llama HOT 1
- composer model trans to pythia problem
- The dtype of tokenized data should be uint32 HOT 1
- Why the rope params are ignored while converting hf checkpoint to composer checkpoint? HOT 3
- about shearing params config HOT 1
- Can LLM-Shearing be used on ViT models? HOT 1
- Support for Llama-3 / GQA? HOT 1
- Open source the pruning mask. HOT 2
- Default Initialization of Lambda Parameters to Zero HOT 3
- About the NQ EM Score in Table 2
- Request for Fine-tuning Data for Continued Pre-training