

Fast-LLM powered by Candle 🦀

Yo! I really have no clue what I'm doing here, but this is me learning Rust by turning Candle's quantised LLM examples into their own package.

None of the work here is original; all credit goes to Laurent & Nicolas, who made this gem of a library along with ready-to-use examples.

What does it do?

It lets you run popular GGUF checkpoints from the Hugging Face Hub via Candle. It works on Macs with Metal or on CPU (though CPU is much, much slower).

This is an alpha release and I expect quite a lot of this to change in the short term.

How do you run this bad boi?

Step 1: git clone https://github.com/Vaibhavs10/fast-llm.rs/

Step 2: cd fast-llm.rs

Step 3: cargo run --features metal --release -- --which 7b-mistral-instruct-v0.2 --prompt "What is the meaning of life according to a dog?" --sample-len 100

Note: you can remove --features metal to run inference on the CPU instead.
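For example, the same prompt on CPU only is the command from Step 3 without the Metal feature flag:

```shell
# CPU-only inference: identical to the Metal command, minus --features metal.
# Expect this to be considerably slower than the Metal build.
cargo run --release -- --which 7b-mistral-instruct-v0.2 --prompt "What is the meaning of life according to a dog?" --sample-len 100
```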

Check how to install Rust and how to use the CLI below if you need to.

Which models are supported?

  1. Mistral 7B
  2. Llama 7B / 13B / 70B
  3. CodeLlama 7B / 13B / 34B
  4. Mixtral 8x7B

You can also bring your own GGUF checkpoint by passing it via --model.
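For instance, a local checkpoint could be loaded like this (the file path below is a placeholder for illustration, not a file shipped with the repo):

```shell
# Point --model at any local GGUF file; this path is hypothetical.
cargo run --features metal --release -- --model ./models/my-model.Q4_K_M.gguf --prompt "Hello!" --sample-len 50
```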

More details

Installing Rust

Just follow the official instructions at rustup.rs.

How to use the CLI

When you use cargo run, command-line arguments are consumed by cargo itself. Use -- to forward them to the fast-llm binary. The following compiles the code in release mode (a cargo option) and then lists all the options fast-llm supports.

cargo run --release -- --help

By default, fast-llm sends your prompt to the LLM, prints the response, and exits. You can also use interactive or chat mode:

  • cargo run --release -- --prompt interactive. Runs in interactive mode: you can ask multiple independent queries; previous context is not retained.

  • cargo run --release -- --prompt chat. Runs in chat mode. Carries conversation history, just like ChatGPT or HuggingChat. In this mode you'll get the best results with one of the Instruct versions of the models (Mistral, Zephyr, or OpenChat), as these models are designed for chat.
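Putting the two flags together, a chat session with the instruct model used in the earlier example might be started like this:

```shell
# Chat mode with an instruct model; conversation history is retained across turns.
cargo run --features metal --release -- --which 7b-mistral-instruct-v0.2 --prompt chat
```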

fast-llm.rs's People

Contributors

pcuenca, vaibhavs10

