Comments (4)
Hi, Gaudi support is alpha, and it uses DeepSpeed rather than FSDP. Most of the LLM Foundry repo is built around FSDP; in particular, that helper script assumes a single checkpoint file, which is different from what DeepSpeed produces. Unfortunately, we don't have an easy script for converting a DeepSpeed checkpoint into another format.
from llm-foundry.
Hi Daniel, thanks for the response. To be clear, I am an Intel Gaudi employee. My goal here is to take what was documented in the blog (https://www.databricks.com/blog/llm-training-and-inference-intel-gaudi2-ai-accelerators) and provide the specific instructions to run the MPT-1B model with 8 Gaudi cards.
Note that the mpt-1b-gaudi2.yaml on your GitHub page has FSDP commented out for Gaudi usage, so FSDP is not being used.
So how is the blog executing the commands? You show how to run inference on 8 Gaudi cards using Hugging Face, and it seems like you have to run convert_composer_to_hf.py to get to Optimum Habana.
We can close this. As the support is early, we'll stay focused on the training section only.
closed