Comments (4)
Thanks for your advice. We are testing fp16 correctness and speed internally and will make it public soon.
from web-llm.
I'm wondering what the web demo currently uses. The model size is similar to a q4_0 ggml model, so is it running 4-bit? I couldn't find any specific info on what precision you're using.
It is using 4-bit quantization and fp32 for compute.
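For context, here is a minimal sketch of what "4-bit quantization with fp32 compute" can look like: weights are stored as 4-bit signed integers (range [-8, 7]) with one fp32 scale per group, then dequantized back to fp32 before the matmul. This is in the spirit of schemes like ggml's q4_0; the function names and group size are illustrative, not web-llm's actual implementation.

```python
# Illustrative group-wise 4-bit quantization (NOT web-llm's actual code).
# Each group of weights stores one fp32 scale plus 4-bit signed integers.

def quantize_q4(weights, group_size=32):
    """Quantize fp32 weights to 4-bit ints with one fp32 scale per group."""
    groups = []
    for i in range(0, len(weights), group_size):
        block = weights[i:i + group_size]
        # Symmetric scale mapping the largest magnitude to +/-7.
        scale = max(abs(w) for w in block) / 7.0 or 1.0
        q = [max(-8, min(7, round(w / scale))) for w in block]
        groups.append((scale, q))
    return groups

def dequantize_q4(groups):
    """Recover approximate fp32 weights; compute then proceeds in fp32."""
    return [scale * q for scale, block in groups for q in block]
```

The storage cost is roughly 4 bits per weight plus one fp32 scale per group, which is why a model quantized this way ends up similar in size to a q4_0 ggml file, while all arithmetic still runs at fp32 precision.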
Thank you. Good luck with everything; I'm looking forward to seeing how this progresses.
Related Issues (20)
- Link in main readme doesn't work HOT 1
- Error running the function calling example: Cannot find global function mlc.serve.BNFGrammarGetGrammarOfJSON HOT 3
- BufferSource argument is empty HOT 4
- [MLC-LLM] Uncaught (in promise) LinkError: WebAssembly.instantiate(): Import #4 "env" HOT 2
- IndexedDB cache fails like the caches HOT 6
- Create a simpler web-workers example HOT 2
- Error: Cannot find global function tvmjs.runtime.ArrayConcat HOT 3
- next-simple chat - ReferenceError: require is not defined in ES module scope, you can use import instead HOT 10
- Do you plan to support LLaVA or video-LLaVA?
- Check failed: (!free_page_ids_.empty()) is false: The KV cache is full. No page can be allocated.
- Models output is scrambled in Safari Technology Preview, which has WebGPU support HOT 1
- Generate error, OperationError: Device lost during onSubmittedWorkDone (do not use this error for recovery - it is NOT guaranteed to happen on device loss) HOT 2
- Strange reply from Phi2-q4f32_1-1k model in running the Web-llm Chat Demo
- Cache.add() encountered a network error HOT 6
- Create a chat webapp with elegant UI on mlc.ai HOT 5
- Cannot find WebGPU on Safari (works on Arc) HOT 2
- Fetching model param super slow on Vercel HOT 1
- Engine not instantiating for WebWorker
- In the Llama-2-7b-chat-hf-q4f32_1-1k model, the number of tokens in the prefill is 36 when inputting 'hello'. HOT 2
- [Tracking] WebLLM: Frontend Compatibility Issues and CDN Delivery HOT 2