b4rtaz / distributed-llama
Tensor parallelism is all you need. Run LLMs on weak devices, or make powerful devices even more powerful, by distributing the workload and dividing the RAM usage across machines.
License: MIT
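The core idea behind the project's pitch can be sketched in plain Python. This is a hypothetical illustration, not the repository's actual API: a weight matrix is sharded row-wise across "devices", so each device stores only its shard (dividing RAM) and computes only its slice of the output (dividing work). In a real cluster the slices would be combined with an all-gather over the network.

```python
def matvec(W, x):
    # Dense matrix-vector product: W is a list of row vectors.
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in W]

def shard_rows(W, n_devices):
    # Split W into n_devices row shards; each "device" keeps only its shard,
    # so per-device memory is roughly 1/n_devices of the full matrix.
    k = (len(W) + n_devices - 1) // n_devices
    return [W[i * k:(i + 1) * k] for i in range(n_devices)]

W = [[1, 2], [3, 4], [5, 6], [7, 8]]
x = [1, 1]

# Each device computes its output slice independently...
shards = shard_rows(W, 2)
partials = [matvec(shard, x) for shard in shards]

# ...and concatenating the slices (an all-gather, in a real setup)
# reconstructs the result of the full, unsharded computation.
y = [v for partial in partials for v in partial]
assert y == matvec(W, x)  # [3, 7, 11, 15]
```

The same decomposition applies per layer in a transformer, which is why tensor parallelism lets several small machines jointly serve a model none of them could hold alone.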