Comments (10)
Hey guys, I just made a video on how to do this in Google Colab: https://youtu.be/3de0Utr9XnI
Hope it helps!
from llm-foundry.
@baptistejamin is it possible to make a Jupyter notebook? That way we could fine-tune MPT-7B using cloud GPUs (paid, of course), and in notebook format it will be easy for others to pick up as well.
A solution was found here: #94 (comment)
To set your starting model to our MPT-7B-Instruct model on the Hugging Face hub, you'd use this model config in your YAML:

```yaml
# Model
model:
  name: hf_causal_lm
  pretrained_model_name_or_path: mosaicml/mpt-7b-instruct
  init_device: cpu
  pretrained: true
  # Comment out the attn_config block to use default "torch" attention
  config_overrides:
    attn_config:
      attn_impl: triton
```
Note that `pretrained_model_name_or_path` determines what value is passed to Hugging Face's `AutoModelForCausalLM.from_pretrained(pretrained_model_name_or_path=...)` when building the model. `from_pretrained` supports local paths.
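As a sketch of the local-path case, the same config can point at a folder on disk instead of a Hub name (the path below is a hypothetical placeholder, not a real location):

```yaml
model:
  name: hf_causal_lm
  # Hypothetical local folder containing an HF-format checkpoint
  pretrained_model_name_or_path: /path/to/local/hf-checkpoint
  init_device: cpu
  pretrained: true
```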
For freezing all the weights except the last layer, you can add some custom logic to scripts/train/train.py. The presenter in this video does that freezing with our model inside a notebook, which might be a useful reference.
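A rough sketch of that custom logic, using a toy torch model rather than MPT itself (MPT's actual module names would need to be checked against the real model):

```python
import torch.nn as nn

# Toy stand-in for the real model; MPT's module names differ.
model = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 2))

# Freeze every parameter...
for p in model.parameters():
    p.requires_grad = False

# ...then unfreeze only the last layer.
for p in model[-1].parameters():
    p.requires_grad = True

# Only the last Linear(8, 2) remains trainable: 16 weights + 2 biases.
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(trainable)  # 18
```

The same pattern (freeze all, then selectively re-enable `requires_grad`) applies regardless of the model class, as long as you know the name of the final module.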
@alextrott16 So I managed to run train.py successfully. Two questions, however:
- I froze all the weights except the last layer, and it took maybe 10 seconds to fine-tune on a dataset of just 6 samples. Is that normal? I'm using an A100-40GB.
- Where is the updated model saved? I see in the YAML there's a property called `save_folder`, and having specified that I get a file called "latest-rank0.pt", about 23.61 GB. Is this the updated model? How do I use it? Sorry if I'm not asking the right questions.
Thank you!
Hi, we fine-tuned an instruction model without freezing any weights. The resulting checkpoint is 77 GB! How do we convert it back to the HF format and use it?
Also, when trying to fine-tune, the training loss profile is much better with `name: mpt_causal_lm` than with `hf_causal_lm`. What is the difference?
Thank you.
> Hi, we fine-tuned an instruction model without freezing any weights. The resulting checkpoint is 77 GB! How do we convert it back to the HF format and use it?
Hi @SoumitriKolavennu , the 77 GB is expected given that it is a 7B model and the Composer checkpoint holds both model and optimizer state. To convert to an HF checkpoint folder, you can use the instructions in the scripts/inference folder: https://github.com/mosaicml/llm-foundry/tree/main/scripts/inference#converting-a-composer-checkpoint-to-an-hf-checkpoint-folder
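To see why the Composer checkpoint is so much larger than the weights alone, here is a toy illustration of what it bundles. The key names below are simplified assumptions for the sketch; the real conversion script handles the actual checkpoint layout:

```python
import torch
import torch.nn as nn

# Toy model + optimizer standing in for MPT-7B.
model = nn.Linear(4, 2)
opt = torch.optim.Adam(model.parameters())

# A Composer checkpoint bundles model weights AND optimizer state
# (key names simplified for illustration).
composer_ckpt = {
    "state": {
        "model": model.state_dict(),
        "optimizers": {"Adam": opt.state_dict()},
    }
}

# Converting to an HF folder keeps only the model weights,
# which is why the exported artifact is much smaller.
weights_only = composer_ckpt["state"]["model"]
print(sorted(weights_only))  # ['bias', 'weight']
```

Adam in particular keeps per-parameter moment estimates, so the optimizer state alone can be roughly twice the size of the weights.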
> Also, when trying to fine-tune, the training loss profile is much better with `name: mpt_causal_lm` than with `hf_causal_lm`. What is the difference?
Could you clarify exactly what differs between the training loss profiles (screenshots would help)? In one case (`mpt_causal_lm`), you are probably initializing a from-scratch MPT and fine-tuning it. In the other case (`hf_causal_lm` pointed at the HF model `mosaicml/mpt-7b-instruct`), you are starting from the pretrained weights of our MPT-7B-Instruct model on the HF Hub. I would expect the latter to have a much lower initial loss and result in a higher-quality model.
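In config terms, the distinction boils down to something like the following sketch (fields abbreviated; the from-scratch case would also need its architecture fields filled in):

```yaml
# From-scratch MPT, randomly initialized:
model:
  name: mpt_causal_lm
  # ...architecture fields (d_model, n_layers, etc.)...

# Pretrained MPT-7B-Instruct weights pulled from the HF Hub:
model:
  name: hf_causal_lm
  pretrained_model_name_or_path: mosaicml/mpt-7b-instruct
  pretrained: true
```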
Hi @SoumitriKolavennu , I'm closing this issue for now but feel free to reopen if you need further assistance.
> Hi @SoumitriKolavennu , I'm closing this issue for now but feel free to reopen if you need further assistance.
Hi @abhi-mosaic, thank you for the help converting to Hugging Face; following your instructions worked great. One minor suggestion would be to include the converter in the training folder instead of the inference folder. My other question about `name` is still relevant, but perhaps it deserves a thread of its own. Please close this issue, and thank you for the help.
> Hey guys, I just made a video on how to do this in Google Colab: https://youtu.be/3de0Utr9XnI Hope it helps!
@VRSEN could you make a Colab or video that generalizes the input data preprocessing a bit more? For example, to fine-tune for a news summarization task, how would I preprocess a dataset (e.g., the HF dataset multi_news) that has only two columns, "document" and "summary"?