Comments (12)
https://github.com/huggingface/transformers-bloom-inference/blob/abe365066fec6e03ce0ea2cc8136f2da1254e2ea/bloom-inference-server/ds_inference/grpc_server.py#L33
@cderinbogaz I hacked my way around it for now
I pass the path to the downloaded model and the checkpoint dict for the model I need, and set model="bigscience/bloom".
I know this is not the most elegant way to do it :(
from deepspeed-mii.
@mrwyattii I believe your commit yesterday has fixed this?
Let me know.
I am closely watching this repo :)
It seems there is a check in place that prevents the new weights from working with MII.
Any updates on this?
@jeffra @RezaYazdaniAminabadi
The same thing also happens with bigscience/bloom-350m for some reason.
I just ran the example in the README and got this error:
AssertionError: text-generation only supports [.....]
Thanks for the response @mayank31398 !
I think it's a neat solution :)
    weight_quantizer.quantize(transpose(sd[0][prefix + 'self_attention.query_key_value.' + 'weight'])))
  File "/opt/conda/lib/python3.7/site-packages/deepspeed/module_inject/replace_module.py", line 100, in copy
    dim=self.in_dim)[self.gpu_index].to(
This is the error I got today while trying int8 inference with bloom.
Hi @TahaBinhuraib I think MII doesn't support int8 models.
Can you try vanilla DS-inference?
https://github.com/huggingface/transformers-bloom-inference/tree/main/bloom-inference-server
You can run it via the CLI or deploy a generation server, as described in the instructions there.
The fp16 Bloom weights are now supported. Int8 models are also supported, but currently the DeepSpeed sharded int8 weights for the Bloom model will throw an error. I'm working on a fix for this and automatic loading of the sharded weights (so you don't have to manually download the weights and define the checkpoint file list). Those changes will come in #69 and likely another PR.
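Until the automatic loading lands, the manual step of defining the checkpoint file list can be scripted. The following is only a rough sketch and not MII's own code: the helper name `build_checkpoint_json`, the `*.pt` shard pattern, and the JSON keys (`type`, `checkpoints`, `version`) are assumptions modelled on the checkpoint JSON used in the transformers-bloom-inference scripts, so check them against the weights you actually downloaded.

```python
import json
from pathlib import Path


def build_checkpoint_json(ckpt_dir, out_file, model_type="BLOOM"):
    """Collect shard files under ckpt_dir and write a checkpoint-list JSON.

    Assumptions (not confirmed by this thread): shards end in ".pt" and the
    loader expects the keys "type", "checkpoints" and "version".
    """
    ckpt_dir = Path(ckpt_dir)
    # Sort so the shard order is deterministic across runs.
    shards = sorted(str(p) for p in ckpt_dir.rglob("*.pt"))
    if not shards:
        raise FileNotFoundError(f"no *.pt shards found under {ckpt_dir}")

    payload = {"type": model_type, "checkpoints": shards, "version": 1.0}
    Path(out_file).write_text(json.dumps(payload, indent=2))
    return payload


# Hypothetical usage; both paths are placeholders:
# build_checkpoint_json("/data/bloom-ckpts", "/data/bloom-ckpts/checkpoints.json")
```

The helper raises early when no shards are found, which tends to be easier to debug than a failure deep inside model loading.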
Thanks @mrwyattii
Thanks @mrwyattii can't wait!
@mayank31398 @TahaBinhuraib I finally found the time to fix #69 so that it works with int8. You no longer need to download the sharded checkpoint files separately and MII will handle this for you (but it will take a while as the checkpoints are quite large). I just confirmed that it's working on my side, but if you have the opportunity to test it out, please do. The script I used:
import mii

mii_configs = {
    "dtype": "int8",
    "tensor_parallel": 4,
    "port_number": 50950,
}
name = "microsoft/bloom-deepspeed-inference-int8"

mii.deploy(task='text-generation',
           model=name,
           deployment_name="bloom_deployment",
           model_path="/data/bloom-ckpts",
           mii_config=mii_configs)
You will probably want to change the model_path parameter if you run this on your local machine.
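One way to avoid hard-coding model_path is to read it from the environment. This is just a sketch: the helper `build_deploy_kwargs` and the `BLOOM_CKPT_DIR` variable are made-up names for illustration, and the remaining values are copied from the script above.

```python
import os


def build_deploy_kwargs(model_path=None):
    """Return keyword arguments mirroring the mii.deploy call above.

    BLOOM_CKPT_DIR is a hypothetical environment variable, not something
    MII reads itself; the fallback path is the one from the script above.
    """
    if model_path is None:
        model_path = os.environ.get("BLOOM_CKPT_DIR", "/data/bloom-ckpts")

    mii_configs = {
        "dtype": "int8",
        "tensor_parallel": 4,
        "port_number": 50950,
    }
    return {
        "task": "text-generation",
        "model": "microsoft/bloom-deepspeed-inference-int8",
        "deployment_name": "bloom_deployment",
        "model_path": model_path,
        "mii_config": mii_configs,
    }


# On a machine with MII installed, the deployment would then be:
# import mii
# mii.deploy(**build_deploy_kwargs())
```

An explicit argument takes precedence over the environment variable, so the same helper works both in scripts and in interactive sessions.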