Comments (3)
QLoRA should work with DeepSpeed. FSDP should also work, but it may require the latest versions of the involved libraries (bitsandbytes, transformers, trl, peft).
from peft.
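For context, a basic QLoRA setup with these libraries looks roughly like the sketch below (the model name, LoRA rank, and target modules are illustrative assumptions, not taken from the thread; running it requires a CUDA GPU and a model download):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Quantize the base model to 4-bit NF4 via bitsandbytes.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# Model name is a placeholder; substitute your own checkpoint.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=bnb_config,
)

# Prepare the quantized model for k-bit training, then attach LoRA adapters.
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    r=16,                                  # illustrative rank
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],   # depends on the architecture
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

The wrapped model can then be handed to a trainer (e.g. trl's `SFTTrainer`) launched under DeepSpeed or FSDP via `accelerate launch`.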
Agree with you. But DeepSpeed and FSDP are both data parallel. If I want to fine-tune a much larger model, such as a 130B LLM, there will be OOM even with 4x A100 80GB. I think pipeline or tensor parallelism is needed in this case.
from peft.
DS and FSDP are not just data parallel: with ZeRO-3 / full sharding they also shard the model states (parameters, gradients, optimizer states) across GPUs. Regarding PP and TP, I'm not aware of working examples. This doesn't mean it can't work, but I wouldn't expect it to work out of the box.
from peft.
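The full sharding mentioned above is typically enabled through an `accelerate` config along these lines (a minimal sketch; exact keys can vary slightly between accelerate versions, and the 4-process value is an assumption matching the 4x A100 setup discussed):

```yaml
compute_environment: LOCAL_MACHINE
distributed_type: FSDP
mixed_precision: bf16
num_processes: 4
fsdp_config:
  fsdp_sharding_strategy: FULL_SHARD        # shard params, grads, optimizer states
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_state_dict_type: SHARDED_STATE_DICT
  fsdp_offload_params: false
```

With DeepSpeed, the analogous behavior comes from ZeRO stage 3 in the DeepSpeed JSON config; either way, training is launched with `accelerate launch --config_file <config> train.py`.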