Comments (9)

marella commented on June 11, 2024

Thanks @mt-v, I hope nilvaes's comment answered your questions.


@nilvaes if you are still looking for a gpt4all-j model, you can use this file: https://huggingface.co/rustformers/gpt4all-j-ggml/blob/main/gpt4all-j-q4_0.bin which is in the standard ggml format.

chatdocs.yml:

ctransformers:
  model: rustformers/gpt4all-j-ggml
  model_file: gpt4all-j-q4_0.bin
  model_type: gptj

marella commented on June 11, 2024

I don't think gpt4all-j will be faster than the default llama model. On the Open LLM Leaderboard, gpt4all-13b-snoozy doesn't appear to be good compared to other 13B models like Wizard-Vicuna-13B-Uncensored.
Depending on your RAM, you may or may not be able to run 13B models; RAM requirements are mentioned in the model card.

Recently some new quantization formats were released which significantly reduce model size and memory requirements.
Try the ...q2_K.bin files from Wizard-Vicuna-7B-Uncensored-GGML and Wizard-Vicuna-13B-Uncensored-GGML. They will be faster but will have lower quality.

chatdocs.yml:

ctransformers:
  model: TheBloke/Wizard-Vicuna-7B-Uncensored-GGML
  model_file: Wizard-Vicuna-7B-Uncensored.ggmlv3.q2_K.bin
  model_type: llama

Also try running with gpu_layers: 0. Sometimes running on just the CPU can be faster when there isn't enough VRAM.
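
For example, a CPU-only variant of the config above might look like the sketch below. Note this is only a sketch: I'm assuming gpu_layers goes under a config block as shown in the chatdocs README, so double-check the exact placement there.

chatdocs.yml:

ctransformers:
  model: TheBloke/Wizard-Vicuna-7B-Uncensored-GGML
  model_file: Wizard-Vicuna-7B-Uncensored.ggmlv3.q2_K.bin
  model_type: llama
  config:
    gpu_layers: 0  # offload no layers, i.e. run entirely on the CPU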

nilvaes commented on June 11, 2024

Hi @mt-v, there is a list of commercially usable LLMs here:
https://github.com/eugeneyan/open-llms

I would recommend checking and researching which models are suitable for your project. If you want faster responses, you need a better CPU, more RAM, or a better GPU, at least while you are running it locally.

If your GPU runs out of VRAM, you should experiment with the gpu_layers setting (e.g. gpu_layers: 50). I have a GTX 1660 SUPER (6 GB VRAM), and gpu_layers: 30 was the best setting for me.
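
As a rough sketch, that tuning goes in chatdocs.yml like this (again assuming gpu_layers belongs under the ctransformers config block; verify against the chatdocs README):

ctransformers:
  config:
    gpu_layers: 30  # lower this value until the model fits in your VRAM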

I would love to hear what you accomplish with your project, so keep me posted.

marella commented on June 11, 2024

I'm sorry if these questions/problems are easy. I'm still a beginner on this subject, but I really love the work you're putting in.

Hey, no worries. Actually this is not an easy problem to figure out. I think the gpt4all team changed their models and are using custom formats instead of the standard ggml format, so they don't work with the ggml library.

Any reason you want to use the gpt4all-j model? I think the default model Wizard-Vicuna-7B-Uncensored is better than gpt4all-j and has a similar size. Please note that only llama-based models like Wizard-Vicuna support GPU, so gpt4all-j doesn't support GPU. If you want to use a gpt4all model, you can try https://huggingface.co/TheBloke/GPT4All-13B-snoozy-GGML/tree/main which is also better than gpt4all-j.
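
If you go with that model, the config follows the same pattern as the other examples in this thread. A minimal sketch, with the caveat that the model_file name below is my guess; pick an actual .bin file from the repo's file list:

chatdocs.yml:

ctransformers:
  model: TheBloke/GPT4All-13B-snoozy-GGML
  model_file: GPT4All-13B-snoozy.ggmlv3.q4_0.bin
  model_type: llama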

Also any of the GGML models from https://huggingface.co/TheBloke will work.

nilvaes commented on June 11, 2024

My specs:
CPU: AMD Ryzen 5 2600X (6 cores)
GPU: GTX 1660 Super

I wanted to get faster responses. For now, with gpu_layers: 30, I'm using nearly all my VRAM as well as my CPU, and I get a response in 37 seconds.

What do you think about this one? ggml-gpt4all-l13b-snoozy.bin

v4rm3t commented on June 11, 2024

First of all, great work @marella. This library makes it so easy to install and run.

So, I have a similar issue where my 2 GB Nvidia Quadro P620 runs out of memory.
I am making a chatbot app for commercial usage, so which model can I use for it? I know that gpt4all-j models can be used, but the results with them are very poor. How can I achieve this?
(This is just for testing until I buy cloud infrastructure for commercial use.)

v4rm3t commented on June 11, 2024

@nilvaes @marella Thank you very much, guys! This is exactly what I was looking for :)

I will keep you posted on the project. Once again, thanks for your response and this wonderful project!

v4rm3t commented on June 11, 2024

Hey guys! I have upgraded to an RTX 3060 12 GB for testing the models. Is there support for configuring this as an API server, as you see in LocalAI, so that you can switch between different models, backends, and the OpenAI API?

marella commented on June 11, 2024

Hey, it uses WebSockets, so it doesn't have a REST API. See the backend and frontend code for reference.
Switching models might not be feasible because each loaded model requires additional memory.
