Comments (3)
Might be a good idea to keep it open in case someone else has the same issue. I'll close it myself once we have real bitsandbytes support.
from aphrodite-engine.
The quant name in aphrodite is unfortunately a bit misleading - I intend to fix this with the next release. The load_in_4bit quant isn't actually bitsandbytes, it's SmoothQuant+. We don't allow loading bnb weights directly yet. This will also be addressed with the next release.
Note that SQ+ is faster and offers better quality than bnb 4-bit. bnb reduces throughput compared to fp16, while SQ+ increases it by close to 3x.
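For anyone landing here with the same confusion, here is a minimal sketch of the naming mismatch described in this comment. It is illustrative only: `SUPPORTED_QUANTS` and `resolve_quant_method` are hypothetical names, not aphrodite-engine internals, and the mapping just restates what is said in this thread.

```python
# Hypothetical sketch: these names are illustrative, not aphrodite-engine's code.
SUPPORTED_QUANTS = {
    # option name as discussed in this thread -> method actually applied
    "load_in_4bit": "SmoothQuant+",
}


def resolve_quant_method(option: str) -> str:
    """Return the quantization method that an option name actually selects."""
    if option in ("bnb", "bitsandbytes"):
        # Per this thread, bnb weights cannot be loaded directly yet.
        raise ValueError("bitsandbytes weights are not supported yet")
    try:
        return SUPPORTED_QUANTS[option]
    except KeyError:
        raise ValueError(f"unknown quant option: {option!r}") from None


print(resolve_quant_method("load_in_4bit"))  # prints: SmoothQuant+
```

The point is only that the option name and the method behind it differ, so a 4-bit option here should not be read as bitsandbytes.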
Got it, ty for looking at this and helping me understand. Do you want me to close this issue?