Comments (7)
Hi @feifeibear. Thank you so much for your effort. We would appreciate it if you could also share the configurations used to test the same models with DeepSpeed and PatrickStar. We would like to evaluate and improve the performance at a similar node scale as well as at a larger scale.
BTW, did you try 3d-8mp? 3D requires the mp size to be a perfect cube (for example, 8 = 2 x 2 x 2).
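The cube constraint can be checked before launching a run. The following is only a minimal sketch: `is_perfect_cube` is a hypothetical helper written for illustration, not part of Colossal-AI, and the candidate sizes in the loop are arbitrary examples.

```python
def is_perfect_cube(n: int) -> bool:
    """Return True if n == k**3 for some positive integer k."""
    if n < 1:
        return False
    k = round(n ** (1.0 / 3.0))
    # Check the neighbours of k as well, to guard against floating-point rounding.
    return any((k + d) ** 3 == n for d in (-1, 0, 1))


# Arbitrary example sizes: only the perfect cubes would satisfy the constraint.
for mp in (1, 4, 8, 16, 27, 64):
    print(mp, is_perfect_cube(mp))
```

Under this check, an mp size of 8 qualifies while 4 or 16 does not.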
from colossalai.
The DeepSpeed benchmark script:
https://github.com/feifeibear/DeepSpeedZeRO3Benchmark
The PatrickStar benchmark script:
https://github.com/Tencent/PatrickStar/blob/master/examples/run_transformers.sh
Running the benchmark is straightforward:
export SUFFIX="colossal_compare"
env GPU_NUM=8 MODEL_TYPE="GPT" MODEL_NAME=GPT3_10B BS=2 CPU_EBD=0 AMM=1 MSC=1 CACHE=1 SP=0 CS=288 HYB=1 TILING=0 ACT_OFFLOAD=0 SUFFIX=${SUFFIX} bash run_transformers.sh
I have uploaded the logs of DeepSpeed and PatrickStar to Baidu WangPan...
Note that for DeepSpeed, the reported SamplesPerSec is not the same as 'Throughput'. You have to calculate the throughput yourself as batch size divided by elapsed time.
link: https://pan.baidu.com/s/1vEHl0hPuxDb7HjOlpuW-YA?pwd=1mfd
code: 1mfd
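The batch/elapse calculation mentioned above can be sketched as follows. This is an illustrative helper, not code from the DeepSpeed benchmark script: the function name and the example numbers are assumptions, and whether the batch size should be per-GPU or global depends on what the log reports.

```python
def throughput(batch_size: int, elapsed_seconds: float) -> float:
    """Samples processed per second for one iteration: batch / elapse."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return batch_size / elapsed_seconds


# Made-up numbers for illustration only: 16 samples in 4.0 seconds.
print(throughput(16, 4.0))
```

Take the batch size and the per-iteration elapsed time from the log lines and apply the same division.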
@feifeibear Thank you!
This issue is stale because it has been open for 14 days with no activity.
Thanks for your report. Detailed tests with stable code will come soon.
We have updated a lot. This issue was closed due to inactivity. Thanks.