llama-70b-chat-4-shards's People

Contributors

bravo325806, mexiqq

Forkers

bravo325806

llama-70b-chat-4-shards's Issues

multi-node inference

Hi MexiQQ, thanks for this great implementation!
It does not fully resolve my concern, though: even with 4 shards, the VRAM on my single node (4 GPUs) cannot hold the 70B model, so I probably need multi-node inference.

Do you have any suggestions or scripts for multi-node inference?
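
For context on the VRAM concern, here is a rough back-of-the-envelope sketch of the weight memory each GPU must hold. It assumes roughly 70 billion parameters stored in 16-bit precision and counts weights only; the KV cache and activations add to this. The function name `weights_gib_per_shard` is hypothetical and is not part of this repository.

```python
def weights_gib_per_shard(n_params: float, bytes_per_param: int, shards: int) -> float:
    """Approximate weight memory per shard in GiB (weights only)."""
    total_bytes = n_params * bytes_per_param
    return total_bytes / shards / 2**30


if __name__ == "__main__":
    # Assumption: ~70e9 parameters at 2 bytes each (fp16/bf16).
    for shards in (4, 8):
        gib = weights_gib_per_shard(70e9, 2, shards)
        print(f"{shards} shards: ~{gib:.1f} GiB of weights per GPU")
```

Under these assumptions, going from 8 shards to 4 doubles the weight memory each GPU must hold, so a node whose GPUs are too small for a 4-way split would need the model spread across more GPUs, possibly on more than one node.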

runtime error

When running the following command on a machine with 4 A6000 GPUs:

 python convert.py \
    --input_llama_path ~/llama/llama-2-70b-chat \
    --input_shards 8 \
    --output_llama_path ~/llama/llama-2-70b-chat-4-shards \
    --output_shards 4

the following error appears:

Fetching all parameters from the checkpoint at /home/yizzhan/llama/llama-2-70b-chat.
Traceback (most recent call last):
File "/home/yizzhan/miniconda3/envs/llama/lib/python3.8/site-packages/torch/serialization.py", line 619, in save
_save(obj, opened_zipfile, pickle_module, pickle_protocol, _disable_byteorder_record)
File "/home/yizzhan/miniconda3/envs/llama/lib/python3.8/site-packages/torch/serialization.py", line 853, in _save
zip_file.write_record(name, storage.data_ptr(), num_bytes)
RuntimeError: [enforce fail at inline_container.cc:588] . PytorchStreamWriter failed writing file data/338: file write failed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "convert.py", line 254, in <module>
main()
File "convert.py", line 246, in main
convert_to_llama_70b_2(
File "convert.py", line 219, in convert_to_llama_70b_2
torch.save(shard, path)
File "/home/yizzhan/miniconda3/envs/llama/lib/python3.8/site-packages/torch/serialization.py", line 620, in save
return
File "/home/yizzhan/miniconda3/envs/llama/lib/python3.8/site-packages/torch/serialization.py", line 466, in __exit__
self.file_like.write_end_of_file()
RuntimeError: [enforce fail at inline_container.cc:424] . unexpected pos 12609115648 vs 12609115544
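
A "file write failed" error raised while `torch.save` is writing is often a sign that the target filesystem ran out of space or hit a quota, although the traceback alone does not confirm that here. As a minimal sketch, the free space in the output directory can be checked before converting. The helper `has_room` and the size estimate are assumptions for illustration and are not part of `convert.py`.

```python
import shutil


def has_room(directory: str, needed_bytes: int, margin: float = 0.05) -> bool:
    """Return True if `directory` has at least `needed_bytes` free, plus a safety margin.

    `directory` must already exist; shutil.disk_usage raises otherwise.
    """
    free = shutil.disk_usage(directory).free
    return free >= needed_bytes * (1 + margin)


if __name__ == "__main__":
    # Assumption: ~70e9 parameters at 2 bytes each, so about 140 GB of output in total.
    estimated_output_bytes = int(70e9 * 2)
    out_dir = "."  # replace with the existing parent of the output path
    if has_room(out_dir, estimated_output_bytes):
        print("Enough free space for the estimated output.")
    else:
        print("Not enough free space; torch.save may fail partway through a shard.")
```

If the check fails, pointing `--output_llama_path` at a larger disk is one thing to try before looking for other causes.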

llama3

Hi mexiQQ, thank you for open-sourcing this incredible script. Does this also support llama3?
