Orca

Orca is an LLM Orchestration Framework written in Rust. It is designed to be a simple, easy-to-use, and easy-to-extend framework for building LLM orchestrations. It is currently in development, so it may contain bugs and its functionality is limited.

About Orca

Orca is currently in development. It is hard to say what the future of Orca looks like, as I am still learning about LLM orchestration and its many applications. These are some ideas I want to explore. Suggestions are welcome!

  • WebAssembly to create simple, portable, yet powerful LLM applications that can run serverless across platforms.
  • Taking advantage of Rust for fast, memory-safe, distributed LLM applications.
  • Deploying LLMs to the edge (think IoT devices, mobile devices, etc.).

Setup

To set up Orca, you will need to install Rust; you can do this by following the official Rust installation instructions. Once you have Rust installed, add Orca to your Cargo.toml as a dependency:

[dependencies]
orca = { git = "https://github.com/scrippt-tech/orca", package = "orca-core" }

Features

  • Prompt templating using handlebars-like syntax (see example below)
  • Loading records (documents)
    • HTML from URLs or local files
    • PDF from bytes or local files
  • Vector store support with Qdrant
  • Current LLM support:
    • OpenAI Chat
    • Limited Bert support using the Candle ML framework
  • Pipelines:
    • Simple pipelines
    • Sequential pipelines

Examples

Orca supports simple and sequential LLM pipelines, as well as reading PDF and HTML records (documents); a record-loading sketch follows the chat example below.

OpenAI Chat

use orca::pipeline::simple::LLMPipeline;
use orca::pipeline::Pipeline;
use orca::llm::openai::OpenAI;
use orca::prompt::context::Context;
use serde::Serialize;

#[derive(Serialize)]
pub struct Data {
    country1: String,
    country2: String,
}

#[tokio::main]
async fn main() -> anyhow::Result<()> {
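    // Create the OpenAI client; the API key is typically read from the
    // OPENAI_API_KEY environment variable (assumed; check the crate docs).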
    let client = OpenAI::new();
    let prompt = r#"
            {{#chat}}
            {{#user}}
            What is the capital of {{country1}}?
            {{/user}}
            {{#assistant}}
            Paris
            {{/assistant}}
            {{#user}}
            What is the capital of {{country2}}?
            {{/user}}
            {{/chat}}
            "#;
    let pipeline = LLMPipeline::new(&client)
        .load_template("capitals", prompt)?
        .load_context(&Context::new(Data {
            country1: "France".to_string(),
            country2: "Germany".to_string(),
        })?)?;
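    // Render the "capitals" template with the loaded context and query the model.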
    let res = pipeline.execute("capitals").await?.content();

    assert!(res.contains("Berlin") || res.contains("berlin"));
    Ok(())
}
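
Loading records

Records can also be loaded from PDF or HTML sources. Below is a minimal sketch of loading a PDF from a local file; the module paths, the boolean flag to from_file, and the spin() conversion are assumptions based on the crate's examples and may have changed, so treat this as illustrative rather than authoritative.

use orca::record::pdf::Pdf;
use orca::record::Spin;

fn main() -> anyhow::Result<()> {
    // Parse a local PDF into a record (assumed API: the boolean flag and the
    // `spin` conversion follow the crate's examples and may differ today).
    let record = Pdf::from_file("./docs/sample.pdf", false).spin()?;
    // The record can then be split into chunks or fed into a pipeline.
    let _ = record;
    Ok(())
}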

Contributing

Contributors are welcome! If you would like to contribute, please open an issue or a pull request. If you would like to add a new feature, please open an issue first so we can discuss it.

Running locally

We use cargo-make (https://github.com/sagiegurari/cargo-make) to run Orca locally. To install it, run:

cargo install cargo-make

Once you have cargo-make installed, you can build or test Orca by running:

$ makers build # Build Orca
$ makers test # Test Orca


orca's Issues

Add more models to Orca

Right now we only support a small number of models. Let's expand this! A good start would be to look into https://github.com/huggingface/candle/tree/main/candle-examples/examples and port those examples over to Orca.

Note: Copy-pasting is not enough; look at how I did it for Bert and Quantized for inspiration. Those could also be improved to expose a cleaner interface.

Edit: I am refactoring to have a models API in a separate crate, orca-models. This crate's goal is to provide an API for hosted models such as OpenAI's, as well as an easy way to use Candle transformer models. It should become the main point of development and replace orca-core's model implementations.

[suggestion] Share an example / use case as a blogpost

Completely up to you, but it could be a nice way to increase visibility and gather feedback. If you have some nice examples of how this could be useful, it would be interesting to feature them as a Community Blogpost on HF. What are your thoughts?

Refactor Prompt polymorphism

Currently, prompts implement the Prompt trait, and we use dynamic dispatch to handle them. This is not a clean implementation. The goal is to let every user implement their own prompt types and use them as they wish. Unfortunately, my mistake was hardcoding the use of ChatPrompt and String prompts into the codebase. We need to refactor this to handle generic prompts. I am thinking of replacing the Prompt trait with plain ToString and handling this accordingly; another option would be to leverage JSON. Ideas are welcome; a sketch of the ToString direction follows.
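
As a rough illustration of that direction, a pipeline method could accept any type implementing ToString, so user-defined prompt types work without trait objects or boxing. The names below are illustrative, not Orca's actual API.

struct MyPipeline;

impl MyPipeline {
    // Accept any prompt type that can render itself to a string.
    fn run<P: ToString>(&self, prompt: P) -> String {
        let rendered = prompt.to_string();
        // ... hand `rendered` to the model here ...
        rendered
    }
}

struct ChatTurn {
    role: &'static str,
    content: &'static str,
}

impl std::fmt::Display for ChatTurn {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        write!(f, "{}: {}", self.role, self.content)
    }
}

fn main() {
    let p = MyPipeline;
    // Plain strings and custom prompt types both work, with no boxing.
    println!("{}", p.run("What is the capital of Germany?"));
    println!("{}", p.run(ChatTurn { role: "user", content: "Hi" }));
}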

General solution to generate embeddings concurrently

Concurrent processing is a key strategy for speeding up embedding generation, especially across a series of texts. However, a general issue with concurrency is maintaining the correct order of the outputs. Because tasks complete asynchronously, the order of the output embeddings (Vec<Vec<f32>>) often doesn't align with the order of the input prompts (Vec<Box<dyn Prompt>>). This misalignment causes problems such as text-to-embedding mismatches when inserting into a vector database such as Qdrant.

Concurrency:

  • Rayon crate: Experiments with the Rayon crate have shown drastic speed improvements (80% faster) for Bert embeddings, but have fallen short of solving the ordering issue.
  • Tokio: Using tokio::spawn to start multiple asynchronous embedding tasks is another potential solution under exploration for async embedding contexts (such as OpenAI embedding API calls).

The goal here is to identify a solution that not only enhances processing speed but also ensures data integrity by preserving the correct order of embeddings; a sketch of one approach follows. Any insights or experiences that could help resolve this would be highly valuable.
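
One way to keep outputs aligned is to tag each input with its index before spawning, then reassemble by index. The sketch below uses tokio and a placeholder embed function standing in for the real Bert or OpenAI calls; it illustrates the pattern and is not Orca's implementation.

use anyhow::Result;

// Placeholder standing in for a real embedding call (Bert, OpenAI, etc.).
async fn embed(text: String) -> Result<Vec<f32>> {
    let _ = text;
    Ok(vec![0.0; 384])
}

// Spawn one task per input, tag each result with its input index, and
// reassemble in order so embeddings stay aligned with their prompts.
async fn embed_in_order(texts: Vec<String>) -> Result<Vec<Vec<f32>>> {
    let handles: Vec<_> = texts
        .into_iter()
        .enumerate()
        .map(|(i, t)| tokio::spawn(async move { (i, embed(t).await) }))
        .collect();

    let mut indexed = Vec::with_capacity(handles.len());
    for handle in handles {
        let (i, res) = handle.await?;
        indexed.push((i, res?));
    }
    // Awaiting in spawn order already preserves order; the explicit sort
    // documents the invariant and survives any future reordering.
    indexed.sort_by_key(|&(i, _)| i);
    Ok(indexed.into_iter().map(|(_, v)| v).collect())
}

#[tokio::main]
async fn main() -> Result<()> {
    let embeddings = embed_in_order(vec!["a".into(), "b".into()]).await?;
    assert_eq!(embeddings.len(), 2);
    Ok(())
}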

Split OpenAI model into directory with multiple modules

It might be a good idea to split the OpenAI model into separate modules for readability and scalability (e.g., ChatCompletion and Embeddings in separate modules); one possible layout is sketched below. This would also provide a cleaner interface for users.
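
A possible layout, purely as a sketch (file and module names are illustrative):

src/llm/openai/
    mod.rs         shared client, configuration, and re-exports
    chat.rs        ChatCompletion request/response types and calls
    embeddings.rs  embedding request/response types and calls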

Taking templates to the next level

I think there is a lot of potential for templates in Orca. They could become the main point of entry and the primary interface for users to develop their LLM applications. Maybe think of them as a language for writing LLM applications(?)

I keep thinking of ideas to extend this interface. What other ways are there for the user to communicate with an LLM? For example, I think it would be possible to specify functions for the LLM to use within a template, as specified in #11 (see the sketch below).

Prompt engineering is basically pseudocoding in and of itself, so why not formalize it?
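
As a purely hypothetical illustration (no such block exists today), a function trigger could be declared with the same handlebars-like syntax used for chat roles:

{{#chat}}
{{#user}}
What is the weather in {{city}}?
{{/user}}
{{#function name="get_weather" args="city"}}{{/function}}
{{/chat}}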

Any ideas, thoughts, or comments are more than welcome.

Add sqlite-vss wrapper

I'm thinking of using this project, but I'd like to use SQLite instead of Qdrant. Would you accept such a PR?

Integrate functions for LLMs to use

I want to integrate the use of functions for LLMs. Inspired by the ReAct framework, this would allow users to pass in a function they would like the LLM to use when a trigger happens (i.e., the LLM decides it needs to use it). I am holding off on implementing this for now, since I want a clear idea of a function's lifetime in the context of Orca.

Even cooler would be the ability to pass the function's trigger through a template: similar to how you load records or context, you would load the function for the LLM to use, and it would be interfaced through the template.
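
A rough sketch of what a function registry might look like; every name here is hypothetical, and the real design would need to settle the lifetime questions above first.

use std::collections::HashMap;

// A tool is just a named function the LLM may trigger.
type Tool = fn(&str) -> String;

struct ToolRegistry {
    tools: HashMap<String, Tool>,
}

impl ToolRegistry {
    fn new() -> Self {
        Self { tools: HashMap::new() }
    }

    // Register a function the LLM may trigger by name.
    fn register(&mut self, name: &str, tool: Tool) {
        self.tools.insert(name.to_string(), tool);
    }

    // Invoked when the model's output signals a trigger, e.g. get_weather("Berlin").
    fn call(&self, name: &str, arg: &str) -> Option<String> {
        self.tools.get(name).map(|tool| tool(arg))
    }
}

fn main() {
    let mut registry = ToolRegistry::new();
    registry.register("get_weather", |city| format!("It is sunny in {city}"));
    assert_eq!(
        registry.call("get_weather", "Berlin").as_deref(),
        Some("It is sunny in Berlin")
    );
}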

Any thoughts, advice, or ideas are more than welcome.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.