llm-speed-benchmark's People

Contributors

mehmetmhy, mmirman


llm-speed-benchmark's Issues

4bit/Llama-2-7b-chat-hf failed to run

f3c0b913-b189-48c5-bd85-e5655cf81a4c - model running - running model with following parameters Namespace(max_length=50, temperature=0.9, top_k=50, top_p=0.9, num_return_sequences=1, uuid='f3c0b913-b189-48c5-bd85-e5655cf81a4c', prompt='Hello World!', model='4bit/Llama-2-7b-chat-hf', device='cuda:0')
f3c0b913-b189-48c5-bd85-e5655cf81a4c - metrics collector - Collected metrics for the 6 time, now waiting for 0 sec
Traceback (most recent call last):
  File "/workspace/benchllm/model.py", line 54, in <module>
    output = hf.run_llm(args.model, args.prompt, args.device, {
  File "/workspace/benchllm/src/hf.py", line 43, in run_llm
    tokenizer = AutoTokenizer.from_pretrained(model_name)
  File "/workspace/benchllm/env/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 768, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/workspace/benchllm/env/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2024, in from_pretrained
    return cls._from_pretrained(
  File "/workspace/benchllm/env/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2256, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/workspace/benchllm/env/lib/python3.10/site-packages/transformers/models/llama/tokenization_llama_fast.py", line 124, in __init__
    super().__init__(
  File "/workspace/benchllm/env/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 120, in __init__
    raise ValueError(
ValueError: Couldn't instantiate the backend tokenizer from one of:
(1) a `tokenizers` library serialization file,
(2) a slow tokenizer instance to convert or
(3) an equivalent slow tokenizer class to instantiate and convert.
You need to have sentencepiece installed to convert a slow tokenizer to a fast one.
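The last line of the ValueError points at the cause: the environment is missing the sentencepiece package, which transformers needs to convert LLaMA's slow tokenizer into a fast one (`pip install sentencepiece` should resolve it). As a sketch, a pre-flight check like the one below could fail early with an actionable message instead of the deep traceback; `has_sentencepiece` is a hypothetical helper, not part of benchllm.

```python
import importlib.util


def has_sentencepiece() -> bool:
    """Return True if the sentencepiece package is importable.

    find_spec() only looks the module up; it does not import it,
    so this check is cheap and side-effect free.
    """
    return importlib.util.find_spec("sentencepiece") is not None
```

Calling this in `hf.py` before `AutoTokenizer.from_pretrained(...)` would let the benchmark abort with a clear "missing dependency: pip install sentencepiece" message rather than surfacing the error from deep inside the tokenizer constructor.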

question

Hi,

Does this also run on AWS SageMaker? And how can it be run with TGI, vLLM, TensorRT-LLM, etc.?

Thanks,
Gerald

Add option to change dtype

Currently, for most models, Hugging Face defaults to a dtype of float32. There should be an option to reduce this precision (e.g., to float16 or bfloat16), which would roughly halve memory use and typically speed up inference.
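One way this could look: a `--dtype` flag on the benchmark's CLI whose value is then forwarded to `from_pretrained`. The sketch below is hypothetical (benchllm's actual parser in `model.py` may differ); only the argparse part is shown, and the resolved name would be passed as `torch_dtype=getattr(torch, args.dtype)` when loading the model.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a --dtype option (illustrative, not benchllm's real CLI)."""
    parser = argparse.ArgumentParser(description="LLM speed benchmark (sketch)")
    parser.add_argument(
        "--dtype",
        choices=["float32", "float16", "bfloat16"],
        default="float32",  # matches the Hugging Face default noted above
        help="weight precision; float16/bfloat16 roughly halve GPU memory",
    )
    return parser
```

The string is kept as a name (rather than a torch dtype object) at the CLI layer so argparse stays importable without torch; the loading code resolves it with `getattr(torch, name)`.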
