lynn1 / llama3-on-2gpus
A script tool that re-shards the original llama3_70B_instruct model into 2 or 4 shards, so that the model can be run efficiently in a `2x80GB` or `4x40GB` GPU environment.
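The core idea of re-sharding is to redistribute a checkpoint's weight tensors across a different number of shard files so each GPU loads a roughly equal share. A minimal sketch of that balancing step is below; the function name `reshard` and the use of plain Python lists as stand-ins for tensors are assumptions for illustration, not the tool's actual code.

```python
# Illustrative sketch: split a state dict into N shards, balancing by size.
# Plain lists stand in for weight tensors; the real tool works on the
# llama3_70B_instruct checkpoint files.

def reshard(state_dict, num_shards):
    """Greedily assign each entry (largest first) to the lightest shard."""
    shards = [{} for _ in range(num_shards)]
    sizes = [0] * num_shards
    # Sorting largest-first before greedy assignment keeps shards balanced.
    for name, tensor in sorted(state_dict.items(), key=lambda kv: -len(kv[1])):
        i = sizes.index(min(sizes))  # index of the currently lightest shard
        shards[i][name] = tensor
        sizes[i] += len(tensor)
    return shards

if __name__ == "__main__":
    toy = {f"layer.{i}.weight": [0.0] * (i + 1) for i in range(8)}
    for n, shard in enumerate(reshard(toy, 2)):
        print(f"shard {n}: {sorted(shard)}")
```

The same greedy heuristic generalizes from 2 to 4 shards by changing `num_shards`, which mirrors the tool's 2-GPU and 4-GPU targets.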
License: GNU Lesser General Public License v2.1