Comments (9)

marella commented on June 11, 2024

Thanks @mt-v, I hope nilvaes's comment answered your questions.


@nilvaes if you are still looking for a gpt4all-j model, you can use this file: https://huggingface.co/rustformers/gpt4all-j-ggml/blob/main/gpt4all-j-q4_0.bin which is in the standard ggml format.

chatdocs.yml:

ctransformers:
  model: rustformers/gpt4all-j-ggml
  model_file: gpt4all-j-q4_0.bin
  model_type: gptj

marella commented on June 11, 2024

I don't think gpt4all-j will be faster than the default llama model. On the Open LLM Leaderboard, gpt4all-13b-snoozy doesn't appear to be good compared to other 13B models like Wizard-Vicuna-13B-Uncensored.
Depending on your RAM, you may or may not be able to run 13B models; RAM requirements are mentioned in the model card.

Recently some new quantization formats were released which significantly reduce model size and memory requirements.
Try the ...q2_K.bin files from Wizard-Vicuna-7B-Uncensored-GGML and Wizard-Vicuna-13B-Uncensored-GGML. They will be faster but will have lower quality.

chatdocs.yml:

ctransformers:
  model: TheBloke/Wizard-Vicuna-7B-Uncensored-GGML
  model_file: Wizard-Vicuna-7B-Uncensored.ggmlv3.q2_K.bin
  model_type: llama

Also try running with gpu_layers: 0. Sometimes running on just the CPU can be faster when there isn't enough VRAM.
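
For example, a CPU-only variant of the config above might look like the sketch below. Note this is only a sketch: I'm assuming gpu_layers goes under a config block as shown in the chatdocs README, so double-check the exact placement there.

chatdocs.yml:

ctransformers:
  model: TheBloke/Wizard-Vicuna-7B-Uncensored-GGML
  model_file: Wizard-Vicuna-7B-Uncensored.ggmlv3.q2_K.bin
  model_type: llama
  config:
    gpu_layers: 0  # offload no layers, i.e. run entirely on the CPU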

nilvaes commented on June 11, 2024

Hi @mt-v, there is a list of commercially usable LLMs here:
https://github.com/eugeneyan/open-llms

I would recommend checking and researching which models are suitable for your project. If you want faster responses, you need a better CPU, more RAM, or a better GPU, at least while you are running it locally.

If your GPU runs out of VRAM, you should experiment with the gpu_layers setting (e.g. gpu_layers: 50). I have a GTX 1660 SUPER (6 GB VRAM), and gpu_layers: 30 was the best setting for me.
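
As a rough sketch, that tuning goes in chatdocs.yml like this (again assuming gpu_layers belongs under the ctransformers config block; verify against the chatdocs README):

ctransformers:
  config:
    gpu_layers: 30  # lower this value until the model fits in your VRAM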

I would love to hear what you accomplish with your project, so keep me posted.

marella commented on June 11, 2024

I'm sorry if these questions/problems are easy. I'm still a beginner on this subject, but I really love the work you're putting in.

Hey, no worries. Actually this is not an easy problem to figure out. I think the gpt4all team changed their models and are using custom formats instead of the standard ggml format, so they don't work with the ggml library.

Any reason you want to use the gpt4all-j model? I think the default model Wizard-Vicuna-7B-Uncensored is better than gpt4all-j and has a similar size. Please note that only llama-based models like Wizard-Vicuna support GPU, so gpt4all-j doesn't support GPU. If you want to use a gpt4all model, you can try https://huggingface.co/TheBloke/GPT4All-13B-snoozy-GGML/tree/main which is also better than gpt4all-j.
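
If you go with that model, the config follows the same pattern as the other examples in this thread. A minimal sketch, with the caveat that the model_file name below is my guess; pick an actual .bin file from the repo's file list:

chatdocs.yml:

ctransformers:
  model: TheBloke/GPT4All-13B-snoozy-GGML
  model_file: GPT4All-13B-snoozy.ggmlv3.q4_0.bin
  model_type: llama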

Also any of the GGML models from https://huggingface.co/TheBloke will work.

nilvaes commented on June 11, 2024

My specs:
CPU: AMD Ryzen 5 2600X (6 cores)
GPU: GTX 1660 Super

I wanted to get faster responses. For now, with gpu_layers: 30, I'm using nearly all my VRAM as well as my CPU, and I get a response in 37 seconds.

What do you think about this one? ggml-gpt4all-l13b-snoozy.bin

v4rm3t commented on June 11, 2024

First of all, great work @marella. This library makes it so easy to install and run.

So, I have a similar issue where my 2 GB Nvidia Quadro P620 runs out of memory.
I am making a chatbot app for commercial usage, so which model can I use for it? I know that gpt4all-j models can be used, but the results with them are very poor. How can I achieve this?
(This is just for testing until I buy cloud infrastructure for commercial use.)

v4rm3t commented on June 11, 2024

@nilvaes @marella Thank you very much, guys! This is exactly what I was looking for :)

I will keep you posted on the project. Once again, thanks for your response and this wonderful project!

v4rm3t commented on June 11, 2024

Hey guys! I have upgraded to an RTX 3060 12 GB for testing the models. Is there support for configuring this as an API server, as you see in LocalAI, so that you can switch between different models, backends, and the OpenAI API?

marella commented on June 11, 2024

Hey, it uses WebSockets, so it doesn't have a REST API. See the backend and frontend code for reference.
Switching models might not be feasible because each loaded model requires additional memory.
