
Comments (7)

axelmagn commented on May 26, 2024

After some tweaking, I think I've got it working. I ended up using the HazyResearch/flash-attention fork. For others trying via Docker, this is the Dockerfile I used:

FROM nvidia/cuda:11.8.0-devel-ubuntu22.04

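# Target compute capability 8.0 (Ampere, e.g. A100) and embed PTX for forward compatibility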
ARG TORCH_CUDA_ARCH_LIST="8.0+PTX"

RUN apt-get update && apt-get install -y \
    build-essential \
    apt-utils \
    python3.10 \
    python3-pip \
    git \
    && rm -rf /var/lib/apt/lists/*

RUN pip install --upgrade pip
RUN pip install \
    torch==2.1.2 \
    torchvision==0.16.2 \
    torchaudio==2.1.2 \
    --index-url https://download.pytorch.org/whl/cu118 # due to observed causal-conv1d dependency

RUN pip install \
    jupyter==1.0.0 \
    hydra-core==1.3.2 \
    packaging==23.2 \
    ninja==1.11.1.1 


# install apex
RUN pip install -v \
    --disable-pip-version-check \
    --no-cache-dir \
    --no-build-isolation \
    --config-settings "--build-option=--cpp_ext" \
    --config-settings "--build-option=--cuda_ext" \
    'git+https://github.com/NVIDIA/apex@b496d85'

RUN pip install 'git+https://github.com/HazyResearch/[email protected]' --no-build-isolation
RUN pip install 'git+https://github.com/HazyResearch/[email protected]#subdirectory=csrc/fused_dense_lib'  --no-build-isolation
RUN pip install 'git+https://github.com/HazyResearch/[email protected]#subdirectory=csrc/layer_norm' --no-build-isolation

# install based
RUN mkdir -p /app
WORKDIR /app
COPY . .
RUN pip install .

CMD python3 test_script.py

It requires the NVIDIA Container Toolkit to run, with the command:

docker run --rm --runtime=nvidia --gpus all based
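
For completeness, the based tag assumed in that command comes from building the image first at the repository root:

docker build -t based .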


simran-arora commented on May 26, 2024

Hi,
I think it's because this RMSNorm is being set to None:

norm_cls = partial(

due to the import structure here: RMSNorm comes from an optional flash-attn import, so it silently ends up as None when that extension isn't built.

The options are to install flash-attn's layer_norm extension (which provides the fused RMSNorm) or to configure the model not to use the fused norm.

Sorry for the difficulty -- we will fix the install / instructions for this.
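
For anyone hitting this, here is a minimal sketch of the failure mode (not the repo's exact code; the flash_attn.ops.rms_norm import path is an assumption):

from functools import partial

import torch.nn as nn

try:
    from flash_attn.ops.rms_norm import RMSNorm  # provided by flash-attn's csrc/layer_norm
except ImportError:
    RMSNorm = None  # silently None when the extension isn't built

rms_norm = True  # hypothetical stand-in for the config flag
# functools.partial checks callability up front, so when RMSNorm is None this
# raises "TypeError: the first argument must be callable", as seen later in this thread
norm_cls = partial(nn.LayerNorm if not rms_norm else RMSNorm, eps=1e-5)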


axelmagn commented on May 26, 2024

No worries, and thanks for the speedy reply. Your guidance helped me get past the above error by installing the norm from flash-attn, but there seem to be more undocumented dependency issues:

root@d75213223120:/app# python3 test_script.py 
tokenizer_config.json: 100%|██████████████████████████████████████████████| 26.0/26.0 [00:00<00:00, 149kB/s]
config.json: 100%|█████████████████████████████████████████████████████████| 665/665 [00:00<00:00, 8.40MB/s]
vocab.json: 100%|██████████████████████████████████████████████████████| 1.04M/1.04M [00:00<00:00, 9.97MB/s]
merges.txt: 100%|████████████████████████████████████████████████████████| 456k/456k [00:00<00:00, 28.5MB/s]
tokenizer.json: 100%|██████████████████████████████████████████████████| 1.36M/1.36M [00:00<00:00, 9.81MB/s]
config.json: 100%|█████████████████████████████████████████████████████| 2.86k/2.86k [00:00<00:00, 35.0MB/s]
No module named 'causal_attention_cuda'
Traceback (most recent call last):
  File "/app/test_script.py", line 6, in <module>
    model = GPTLMHeadModel.from_pretrained_hf("hazyresearch/based-360m").to("cuda", dtype=torch.float16)
  File "/app/based/models/gpt.py", line 468, in from_pretrained_hf
    model = cls(config, device=device, **kwargs)
  File "/app/based/models/gpt.py", line 741, in __init__
    self.transformer = GPTModel(config, process_group=process_group, **factory_kwargs)
  File "/app/based/models/gpt.py", line 585, in __init__
    [
  File "/app/based/models/gpt.py", line 586, in <listcomp>
    create_block(config, layer_idx=i, process_group=process_group, **factory_kwargs)
  File "/app/based/models/gpt.py", line 382, in create_block
    block = Block(
  File "/app/based/models/block.py", line 86, in __init__
    self.mixer = mixer_cls(dim)
  File "/app/based/models/mixers/slide_attention.py", line 357, in __init__
    if fused_bias_fc and FusedDense is None: raise ImportError("fused_dense is not installed")
ImportError: fused_dense is not installed

I'm a little baffled, since it seems like FusedDense is being imported from flash_attn here:

from flash_attn.ops.fused_dense import ColumnParallelLinear, FusedDense, RowParallelLinear

Are there additional subpackages within flash-attn that need to be installed?
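
A quick way to check whether the extension is importable, independent of the model code (using the exact import from above):

python3 -c "from flash_attn.ops.fused_dense import FusedDense; print('ok')"

If that raises ImportError, the compiled fused_dense extension isn't installed.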

For reference, here is my updated Dockerfile:

FROM nvidia/cuda:11.8.0-devel-ubuntu22.04

RUN apt-get update && apt-get install -y \
    apt-utils \
    python3.10 \
    python3-pip \
    git \
    && rm -rf /var/lib/apt/lists/*

RUN pip install --upgrade pip
RUN pip install \
    torch==2.1.2 \
    torchvision==0.16.2 \
    torchaudio==2.1.2 \
    --index-url https://download.pytorch.org/whl/cu118 # due to observed causal-conv1d dependency

RUN pip install \
    jupyter==1.0.0 \
    hydra-core==1.3.2 \
    packaging==23.2 \
    ninja==1.11.1.1 

# RUN pip install 'git+https://github.com/Dao-AILab/flash-attention.git@6c9e60d' 
RUN pip install 'git+https://github.com/Dao-AILab/flash-attention.git@6c9e60d#subdirectory=csrc/layer_norm'

# install apex
RUN pip install -v \
    --disable-pip-version-check \
    --no-cache-dir \
    --no-build-isolation \
    --config-settings "--build-option=--cpp_ext" \
    --config-settings "--build-option=--cuda_ext" \
    'git+https://github.com/NVIDIA/apex@b496d85'

# install based
RUN mkdir -p /app
WORKDIR /app
COPY . .
RUN pip install .

CMD python3 test_script.py


simran-arora commented on May 26, 2024

That line you pointed out requires this to be installed: https://github.com/Dao-AILab/flash-attention/tree/main/csrc/fused_dense_lib

I would recommend cloning flash-attention and running python setup.py install within that directory. An alternative workaround, without the install, is to set fused_bias_fc = False in the config.
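
Concretely, that looks something like the following (a sketch; it assumes ninja and a CUDA toolkit matching your PyTorch build are available):

git clone https://github.com/Dao-AILab/flash-attention.git
cd flash-attention/csrc/fused_dense_lib
python setup.py install

The pip subdirectory pattern used earlier in the thread should also work, e.g. pip install 'git+https://github.com/Dao-AILab/flash-attention.git@6c9e60d#subdirectory=csrc/fused_dense_lib' --no-build-isolation.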


melisa-writer commented on May 26, 2024

Hi! I got a similar problem while running the sample code:

import torch
from transformers import AutoTokenizer
from based.models.gpt import GPTLMHeadModel

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = GPTLMHeadModel.from_pretrained_hf("hazyresearch/based-360m").to("cuda", dtype=torch.float16)

input = tokenizer.encode("If I take one more step, it will be", return_tensors="pt").to("cuda")
output = model.generate(input, max_length=20)
print(tokenizer.decode(output[0]))

Error:

Traceback (most recent call last):
  File "/home/melisarussak/based/inference_test.py", line 6, in <module>
    model = GPTLMHeadModel.from_pretrained_hf("hazyresearch/based-360m").to("cuda", dtype=torch.float16)
  File "/home/melisarussak/based/based/models/gpt.py", line 470, in from_pretrained_hf
    model = cls(config, device=device, **kwargs)
  File "/home/melisarussak/based/based/models/gpt.py", line 743, in __init__
    self.transformer = GPTModel(config, process_group=process_group, **factory_kwargs)
  File "/home/melisarussak/based/based/models/gpt.py", line 587, in __init__
    [
  File "/home/melisarussak/based/based/models/gpt.py", line 588, in <listcomp>
    create_block(config, layer_idx=i, process_group=process_group, **factory_kwargs)
  File "/home/melisarussak/based/based/models/gpt.py", line 373, in create_block
    norm_cls = partial(
TypeError: the first argument must be callable

so I used the Dockerfile given by @axelmagn and now I get:

No module named 'causal_attention_cuda'
Successfully imported the causal dot product kernel!
Could not import the FLA triton kernels...
Traceback (most recent call last):
  File "/app/inference_test.py", line 9, in <module>
    output = model.generate(input, max_length=20)
  File "/app/based/generation.py", line 573, in generate
    output = decode(
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/app/based/generation.py", line 194, in decode
    scores.append(get_logits(sequences[-1], inference_params))
  File "/app/based/generation.py", line 155, in get_logits
    logits = model(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/app/based/models/gpt.py", line 806, in forward
    hidden_states = self.transformer(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/app/based/models/gpt.py", line 674, in forward
    hidden_states, residual = layer(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/app/based/models/block.py", line 189, in forward
    hidden_states = self.mixer(hidden_states, position_ids=position_ids, decay=decay, **mixer_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/app/based/models/mixers/linear_attention.py", line 127, in forward
    return self.recurrent_forward(hidden_states, kv_state, k_state, q, k, v)
  File "/app/based/models/mixers/linear_attention.py", line 195, in recurrent_forward
    kv_state += k[:, :, -1:] * v[:, :, -1:]
RuntimeError: The size of tensor a (16) must match the size of tensor b (273) at non-singleton dimension 4

Is this due to the code changes from 2 days ago, or am I missing some steps?


simran-arora commented on May 26, 2024

Yes, that was due to the changes. Please try again and let me know if you run into issues.
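
For the Docker workflow above, trying again amounts to pulling the latest based source, rebuilding the image, and rerunning:

git pull
docker build -t based .
docker run --rm --runtime=nvidia --gpus all based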


melisa-writer commented on May 26, 2024

It works now! 🎉 Thank you!

