Jockey

Jockey is a conversational video agent built on top of the Twelve Labs APIs and LangGraph.

Join Twelve Labs' Multimodal Minds Discord server if you have questions or encounter issues when working with Jockey!

Description

ATTENTION: Jockey is in alpha development and may break or behave unexpectedly!

Jockey combines the capabilities of existing Large Language Models (LLMs) with Twelve Labs' APIs using LangGraph. This allows workloads to be allocated to the appropriate foundation models for handling complex video workflows. LLMs are used to logically plan execution steps and interact with users, while video-related tasks are passed to Twelve Labs APIs, powered by video foundation models (VFMs), to work with video natively, without the need for intermediary representations like pre-generated captions.

Quickstart

Dependencies

  • FFMPEG: Must have ffmpeg accessible in $PATH for the Video Editing worker.

  • Docker: Required for running the Jockey API server.

  • Docker Compose: Required for running the Jockey API server.

    • Needs to be accessible via:

      docker compose

      Depending on your install method, it may only be accessible as:

      docker-compose

      which can cause issues with the langgraph-cli. In that case, you can install Docker Desktop to make docker compose a valid system command.

  • Required Python Packages (For Local Dev): requirements.txt

  • Twelve Labs API Key: Twelve Labs Dashboard

  • LLM Provider API Key (currently Azure OpenAI or OpenAI only)

Setup Steps (macOS)

  1. Install the external dependencies listed above (FFMPEG, Docker, Docker Compose)

  2. Grab the repo using this command:

    git clone https://github.com/twelvelabs-io/tl-jockey.git
  3. Enter the tl-jockey directory:

    cd tl-jockey
  4. Create a virtual environment:

    python3 -m venv venv
  5. Activate your virtual environment:

    source venv/bin/activate
  6. Install Python package requirements:

    pip3 install -r requirements.txt
  7. Create your .env file and add the correct variables:

    AZURE_OPENAI_ENDPOINT=<IF USING AZURE GPTs>
    AZURE_OPENAI_API_KEY=<IF USING AZURE GPTs>
    OPENAI_API_VERSION=<IF USING AZURE GPTs>
    OPENAI_API_KEY=<IF USING OPEN AI GPTs>
    # Determines which Langchain classes are used to construct Jockey LLM instances.
    LLM_PROVIDER=<MUST BE ONE OF [AZURE, OPENAI]>
    TWELVE_LABS_API_KEY=<YOUR TWELVE LABS API KEY>
    # This variable is used to persist rendered videos and make them servable from within the LangGraph API container.
    # Please make sure this directory exists on the host machine.
    # Please make sure this directory is available as a File Sharing resource in Docker For Mac.
    HOST_PUBLIC_DIR=<VOLUME MOUNTED TO LANGGRAPH API SERVER CONTAINER WHERE RENDERED VIDEOS GO>
    # This variable is a placeholder for an upcoming Jockey core worker; it currently doesn't impact anything.
    # Please make sure this directory exists on the host machine.
    # Please make sure this directory is available as a File Sharing resource in Docker For Mac.
    HOST_VECTOR_DB_DIR=<VOLUME MOUNTED TO LANGGRAPH API SERVER CONTAINER WHERE VECTOR DB GOES>

    Make sure your .env file is somewhere in the directory tree of the tl-jockey directory.
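
    For reference, a filled-in .env for an OpenAI-based setup might look like the following. All values are illustrative placeholders, and the Azure-specific variables are omitted on the assumption that they are only needed when LLM_PROVIDER=AZURE:

    OPENAI_API_KEY=<YOUR OPENAI API KEY>
    LLM_PROVIDER=OPENAI
    TWELVE_LABS_API_KEY=<YOUR TWELVE LABS API KEY>
    HOST_PUBLIC_DIR=/Users/you/jockey/public
    HOST_VECTOR_DB_DIR=/Users/you/jockey/vector_db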

Deploying in the Terminal

This is an easy and lightweight way to run an instance of Jockey in your terminal. Great for quick testing or validation during local dev work.

Terminal Example Jockey Video Walkthrough

  1. Modify your .env file with your desired configuration for an LLM provider.

  2. Run the following command from the tl-jockey directory (where the langgraph.json and compose.yaml files are):

    python3 -m jockey terminal

Jockey Terminal Startup

  3. Currently, Jockey requires that an Index ID (and in some cases a Video ID) be supplied as part of the conversation history. You are free to modify how this is handled. An example of an initial prompt might be something like:

Use index 65f747a50db0463b8996bde2. I'm trying to create a funny video focusing on Gordon Ramsay. Can you find 3 clips of Gordon yelling at his chefs about scrambled eggs and then a final clip where Gordon bangs his head on a table? After you find all those clips, let's edit them together into one video.

Since the Index ID is now stored as part of the conversation history, subsequent requests that use the same Index ID do not need to include it explicitly in the prompt (assuming some reference to the Index ID is still within the context window). For example, we could now use the following as a prompt:

This is awesome, but the last clip is too long. Let's shorten the last clip where Gordon hits his head on the table by making it start one second later. Then combine all the clips into a single video again.

Debugging In The Terminal

The terminal version of Jockey is designed as a lightweight way to do dev work while still exposing a useful level of information for debugging. For example, running the Gordon Ramsay prompt from earlier produces this intermediate output:

Jockey Terminal Debugging Example

As you can see, we get the output from all of the individual components of Jockey, including the inputs and outputs of tool calls. This debugging output is handled by the parse_langchain_events_terminal() function, which relies on event names and component tags to decide how to parse incoming events. You can modify this code to suit your needs if you need additional information exposed in the terminal. Please note that the tags for the individual components are set in app.py.
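
To illustrate the general pattern (this is a sketch, not Jockey's exact implementation), a handler built on LangChain's astream_events stream can branch on event names and tags like this:

# Sketch of tag-based event filtering in the spirit of
# parse_langchain_events_terminal(). The "supervisor" tag is an
# assumption here; see app.py for the tags Jockey actually sets.
async def parse_events(graph, user_input: dict):
    async for event in graph.astream_events(user_input, version="v2"):
        kind = event["event"]
        tags = event.get("tags", [])

        # Stream LLM tokens only for components tagged as user-facing.
        if kind == "on_chat_model_stream" and "supervisor" in tags:
            print(event["data"]["chunk"].content, end="", flush=True)

        # Surface tool inputs and outputs for debugging.
        elif kind == "on_tool_start":
            print(f"\n[tool start] {event['name']}: {event['data'].get('input')}")
        elif kind == "on_tool_end":
            print(f"[tool end] {event['name']}: {event['data'].get('output')}")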

Deploying with the LangGraph API Server

This approach is more suitable for building and testing end-to-end user applications, although the initial build takes longer.

  1. Run the following command from the tl-jockey directory (where the langgraph.json and compose.yaml files are):

    python3 -m jockey server

LangGraph Debugger UI Walkthrough

  1. Open the LangGraph Debugger in your browser to ensure the LangGraph API server is up and reachable. jockey should appear under "Assistants".
  2. If you click on jockey, another panel will open, and you can click New Thread to step into your running Jockey instance!
  3. Visit LangGraph Examples for more detailed information on integrating into a user application. You can also check out client.ipynb for a basic example in a Jupyter notebook, or the sketch that follows this list.
  4. You can use the LangGraph Debugger to step into the Jockey instance and debug end-to-end, including adding breakpoints to examine and validate the graph state for any given input.

Jockey LangGraph Debugger
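
If you would rather drive the server from code than from the debugger UI, the LangGraph Python SDK can stream runs against it directly. This is a minimal sketch: the port and the input shape are assumptions, so check compose.yaml and client.ipynb for the values your deployment actually uses.

# Minimal sketch using the langgraph_sdk client.
import asyncio

from langgraph_sdk import get_client

async def main():
    client = get_client(url="http://localhost:8123")  # assumed local port
    thread = await client.threads.create()
    # Assumed input shape: JockeyState's chat_history accepts message dicts.
    user_input = {"chat_history": [{
        "type": "human",
        "content": "Use index <YOUR INDEX ID>. Find 3 clips of chefs making scrambled eggs.",
    }]}
    async for part in client.runs.stream(thread["thread_id"], "jockey",
                                         input=user_input, stream_mode="values"):
        print(part.event, part.data)

asyncio.run(main())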

Jockey Agent Architecture

Jockey Architecture

Jockey Architecture Walkthrough Video

Jockey consists of three main parts (a simplified wiring sketch follows the list):

  1. Supervisor: Responsible for node routing.
  2. Planner: Creates step-by-step plans for complex user requests.
  3. Workers: The worker nodes consist of two components:
    1. Instructor: Generates exact and complete task instructions for a single worker based on the plan created by the Planner.
    2. Actual Workers: Agents that ingest the instructions from the Instructor and execute them using the tools they have available.
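
In LangGraph terms, this shape maps onto a StateGraph whose supervisor node routes to the others via conditional edges. The sketch below is a simplified illustration of that wiring, with stand-in node logic and state fields rather than Jockey's real jockey_graph.py:

# Simplified illustration of the supervisor/planner/worker wiring.
from typing import TypedDict

from langgraph.graph import END, StateGraph

class State(TypedDict, total=False):
    next_worker: str
    active_plan: str
    done: bool

def supervisor(state: State) -> dict:
    # In Jockey an LLM makes this routing decision; here it's hard-coded.
    if not state.get("active_plan"):
        return {"next_worker": "planner"}
    if not state.get("done"):
        return {"next_worker": "instructor"}
    return {"next_worker": "REFLECT"}

def planner(state: State) -> dict:
    return {"active_plan": "1. search for clips 2. edit them together"}

def instructor(state: State) -> dict:
    return {}  # would generate exact task instructions for the worker

def video_worker(state: State) -> dict:
    return {"done": True}  # would execute the instructions with its tools

graph = StateGraph(State)
graph.add_node("supervisor", supervisor)
graph.add_node("planner", planner)
graph.add_node("instructor", instructor)
graph.add_node("video_worker", video_worker)
graph.set_entry_point("supervisor")
graph.add_conditional_edges(
    "supervisor",
    lambda s: s["next_worker"],
    {"planner": "planner", "instructor": "instructor", "REFLECT": END},
)
graph.add_edge("planner", "supervisor")
graph.add_edge("instructor", "video_worker")
graph.add_edge("video_worker", "supervisor")
app = graph.compile()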

Customizing Jockey

Jockey comes with a core set of workers for powering a conversational video agent. However, video-related use cases are diverse and complex, and the vanilla core workers may not be adequate for your needs. There are two main ways to customize Jockey that are explored below.

Prompt as a Feature

This is the most lightweight way to customize Jockey. Use this approach when the agent's functionality or vanilla core workers are sufficient, but the planning capabilities of the LLM are not robust enough for consistent performance across various inputs.

To apply this customization, modify some or all of the planner.md prompt with more targeted and specific instructions. This approach works best for less open-ended use cases that involve minimal dynamic planning.
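
As a hypothetical example (this text does not ship with Jockey), a planner.md addition for a narrow highlight-reel use case might look like:

    When the user asks for a highlight reel, always plan exactly these steps:
    1. Search for each requested clip in the given index.
    2. Trim each clip to the requested bounds.
    3. Combine the trimmed clips into a single video.
    Never invent Index IDs or Video IDs; ask the user if one is missing.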

Extending or Modifying Jockey

For more complex use cases, the out-of-the-box capabilities of the agent or vanilla core workers may be insufficient. In these instances, consider extending or modifying Jockey in the following ways:

Modifying Prompts

This is similar to the Prompt as a Feature approach but may require heavily modifying planner.md, supervisor.md, and the worker prompts.

Extending or Modifying State

Given the complex nature of video workflows, the default state that Jockey uses may be inadequate for your use case. Each node in the underlying StateGraph has full access to the JockeyState, allowing you to add or modify state variables to track complex state information that isn't easily handled by LLMs via conversation history. This can help ensure cohesive planning and correct worker selection.

Example:

# Default JockeyState
from typing import Annotated, Sequence

from langchain_core.messages import BaseMessage
from langgraph.graph.message import add_messages
from typing_extensions import TypedDict

class JockeyState(TypedDict):
    """Used to track the state between nodes in the graph."""
    chat_history: Annotated[Sequence[BaseMessage], add_messages]
    next_worker: str
    made_plan: bool
    active_plan: str

To handle multiple sequences of video that could be rendered, extend JockeyState:

from typing import Annotated, Dict, List, Sequence

from langchain_core.messages import BaseMessage
from langgraph.graph.message import add_messages
from pydantic import BaseModel, Field
from typing_extensions import TypedDict

class Clip(BaseModel):
    """Define what constitutes a clip."""
    index_id: str = Field(description="A UUID for the index a video belongs to. This is different from the video_id.")
    video_id: str = Field(description="A UUID for the video a clip belongs to.")
    start: float = Field(description="The start time of the clip in seconds.")
    end: float = Field(description="The end time of the clip in seconds.")

# Named ClipSequence rather than Sequence so it doesn't shadow
# typing.Sequence, which the chat_history annotation below relies on.
class ClipSequence(BaseModel):
    """Represents a single sequence of renderable video and its constituents."""
    clips: List[Clip]
    sequence_name: str

# Extended JockeyState
class JockeyState(TypedDict):
    """Used to track the state between nodes in the graph."""
    chat_history: Annotated[Sequence[BaseMessage], add_messages]
    next_worker: str
    made_plan: bool
    active_plan: str
    sequences: Dict[str, ClipSequence]

This could allow you to manage multiple versions of a sequence that can be selectively rendered or easily modified.
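
For instance, a worker node could record a new cut under its own key. A small sketch against the extended state above (the node name and values are illustrative):

# A node returns a partial state update, which LangGraph merges into JockeyState.
def save_rough_cut(state: JockeyState) -> dict:
    rough_cut = ClipSequence(
        sequence_name="rough_cut",
        clips=[Clip(index_id="<INDEX ID>", video_id="<VIDEO ID>", start=12.0, end=18.5)],
    )
    # sequences has no reducer, so we merge the existing dict ourselves.
    return {"sequences": {**state.get("sequences", {}), "rough_cut": rough_cut}}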

Adding or Modifying Workers

If the first two customization options aren't sufficient, you can add or modify existing workers:

  • Modifying an Existing Worker: Add custom logic or tools to an existing worker. You may also need to refine the appropriate prompts.
  • Adding a New Worker: If none of the existing workers provide the needed capabilities, you can create a new worker. Use the Stirrup class to define your custom worker and modify jockey_graph.py to integrate the new worker into Jockey. Update the appropriate prompts to ensure Jockey correctly uses the new worker(s). A sketch of a custom worker tool follows this list.
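
The Stirrup class itself is specific to Jockey (see the source for its exact interface), but the tools a worker carries are ordinary LangChain tools. Below is a sketch of one such tool with a hypothetical GIF-export capability; the name, arguments, and behavior are illustrative, not part of Jockey's core workers:

# Hypothetical tool a custom worker could expose.
import subprocess

from langchain_core.tools import tool

@tool
def export_gif(video_path: str, start: float, end: float) -> str:
    """Export the [start, end] span of a video as an animated GIF."""
    out_path = video_path.rsplit(".", 1)[0] + ".gif"
    # Relies on ffmpeg being on $PATH, which Jockey already requires.
    subprocess.run(
        ["ffmpeg", "-y", "-ss", str(start), "-t", str(end - start),
         "-i", video_path, out_path],
        check=True,
    )
    return out_path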
