
Comments (9)

delock commented on July 19, 2024

Hi @daehuikim, I use the following commands and can see the CUDA op status. Note that I don't have the CUDA toolchain installed. If your environment does have the CUDA toolchain, you should be able to see the desired result on your master node.

DS_ACCELERATOR=cuda DS_BUILD_FUSED_ADAM=1 pip install deepspeed
DS_ACCELERATOR=cuda ds_report

You may want to set DS_ACCELERATOR=cuda in your .bashrc if you wish to build for CUDA by default on the master node. You don't need this env var on the compute nodes, but it will work there as well. See the sketch below.
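For reference, a minimal sketch of what that could look like in ~/.bashrc on the master node; the DS_BUILD_FUSED_ADAM line is optional and only matters at pip install time:

# Make DeepSpeed build/report against CUDA by default, even without a GPU attached
export DS_ACCELERATOR=cuda
# Optional: pre-build the fused Adam op when installing DeepSpeed
export DS_BUILD_FUSED_ADAM=1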

(dscpu) 22:07:19|~/machine_learning/DeepSpeed$ DS_ACCELERATOR=cuda ds_report
[2024-05-27 22:07:36,330] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (override)
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-dev package with apt
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
 [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
 [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.1
 [WARNING]  using untested triton version (2.1.0), only 1.0.0 is known to be compatible
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-dev package with apt
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
fused_adam ............. [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_lion ............... [NO] ....... [OKAY]
 [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
evoformer_attn ......... [NO] ....... [NO]
fp_quantizer ........... [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
fused_lion ............. [NO] ....... [OKAY]
inference_core_ops ..... [NO] ....... [OKAY]
cutlass_ops ............ [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
ragged_device_ops ...... [NO] ....... [OKAY]
ragged_ops ............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
 [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.1
 [WARNING]  using untested triton version (2.1.0), only 1.0.0 is known to be compatible
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/home/akey/anaconda3/envs/dscpu/lib/python3.10/site-packages/torch']
torch version .................... 2.1.0+cu121
deepspeed install path ........... ['/home/akey/anaconda3/envs/dscpu/lib/python3.10/site-packages/deepspeed']
deepspeed info ................... 0.14.2, unknown, unknown
torch cuda version ............... 12.1
torch hip version ................ None
nvcc version .....................  [FAIL] cannot find CUDA_HOME via torch.utils.cpp_extension.CUDA_HOME=None 
deepspeed wheel compiled w. ...... torch 2.1, cuda 12.1
shared memory (/dev/shm) size .... 31.18 GB


loadams commented on July 19, 2024

Hi @daehuikim - are you able to run pip install deepspeed with no errors? And do you hit any errors when installing other ops?

It appears that your system is being detected as CPU-only, but you have installed torch+cuda. Can you tell us more about which accelerator you are trying to use?

[2024-05-21 11:34:41,285] [WARNING] [real_accelerator.py:162:get_accelerator] Setting accelerator to CPU. If you have GPU or other accelerator, we were unable to detect it.


daehuikim commented on July 19, 2024

Hello @loadams, thanks for your reply.
pip install deepspeed works without any errors for me.
I am running my script from my master node, which is CPU-only, using the Slurm scheduler.
Specifically, I activate a conda virtual environment that has the packages, and Slurm propagates the work to worker nodes that have multiple GPUs.
Therefore, I am trying to install DeepSpeed with the ops pre-built in my conda virtual environment.


loadams commented on July 19, 2024

I see. Is there a reason that you need to precompile the ops? You should be able to run DeepSpeed directly on the GPU nodes, where it will detect the GPU and then JIT-compile the ops (information here); see the sketch below.
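For context, a minimal sketch of that flow on a GPU compute node. The script name train.py and the GPU count are hypothetical placeholders, and the script-level arguments depend on how your training script wires up DeepSpeed:

# Plain install on the GPU node: no DS_BUILD_* flags needed
pip install deepspeed
# Ops such as fused_adam are JIT-compiled the first time they are used
deepspeed --num_gpus 4 train.py --deepspeed --deepspeed_config ds_config.json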


daehuikim commented on July 19, 2024

@loadams
There is no particular reason for doing this.
I was just following this tutorial about finetuning a T5 model.
I have now found another workaround: adding
torch_adam=true
to the optimizer params in the DeepSpeed config (this falls back to torch's Adam implementation; a sketch follows below).
I just wanted to let the contributors know that this (a failing pre-build installation in some environments) happens.
Thanks for replying!
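For reference, a minimal sketch of that config workaround. The optimizer type and learning rate are illustrative and other required config fields (e.g. batch size) are omitted; the torch_adam flag is the point:

# Illustrative only: with torch_adam set, DeepSpeed uses torch's Adam,
# so the fused CUDA Adam op does not need to be pre-built
cat > ds_config.json <<'EOF'
{
  "optimizer": {
    "type": "Adam",
    "params": {
      "lr": 0.0001,
      "torch_adam": true
    }
  }
}
EOF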


loadams commented on July 19, 2024

Thanks @daehuikim - that makes sense. Since DeepSpeed currently detects your master node as a CPU-only environment, it believes it can only run the ops that work there. Can you try the following? It may not work since you don't have CUDA installed on the node, but if you do, you can specify the DeepSpeed accelerator to build for by adding the DS_ACCELERATOR=cuda env var before your pip install command.


daehuikim commented on July 19, 2024

pip uninstall deepspeed
DS_ACCELERATOR=cuda pip install deepspeed
ds_report

produces a result like the one below:

[2024-05-24 09:17:18,762] [WARNING] [real_accelerator.py:162:get_accelerator] Setting accelerator to CPU. If you have GPU or other accelerator, we were unable to detect it.
[2024-05-24 09:17:18,763] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cpu (auto detect)
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
deepspeed_not_implemented  [NO] ....... [OKAY]
deepspeed_ccl_comm ..... [NO] ....... [OKAY]
deepspeed_shm_comm ..... [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['TORCH_INSTALL_PATH']
torch version .................... 2.1.2+cu121
deepspeed install path ........... ['DEEPSPEED_INSTALL_PATH']
deepspeed info ................... 0.14.2+cu118torch2.0, unknown, unknown
deepspeed wheel compiled w. ...... torch 2.0
shared memory (/dev/shm) size .... 125.67 GB

@loadams I tried the recommended variable and got the same result.


daehuikim commented on July 19, 2024

DS_ACCELERATOR=cuda ds_report

@delock Your recommendation made everything work perfectly! Thanks for the nice advice.
I got the same results as you. Thanks again :)


loadams commented on July 19, 2024

Thanks for clarifying the env var use, @delock!

